linux-mm.kvack.org archive mirror
* [PATCH v3 0/7] mm/hotplug: Only use subsection map for VMEMMAP
@ 2020-03-07  8:42 Baoquan He
  2020-03-07  8:42 ` [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case Baoquan He
                   ` (6 more replies)
  0 siblings, 7 replies; 29+ messages in thread
From: Baoquan He @ 2020-03-07  8:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, akpm, mhocko, david, richardw.yang, dan.j.williams,
	osalvador, rppt, bhe

Memory sub-section hotplug was added to fix the issue that nvdimm could
be mapped at a non-section-aligned starting address. A subsection map was
added into struct mem_section_usage to implement it.

However, config ZONE_DEVICE depends on SPARSEMEM_VMEMMAP. This means the
subsection map only makes sense when SPARSEMEM_VMEMMAP is enabled. For
classic sparse, the subsection map is meaningless and confusing.
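
For reference, patch 4/7 makes the map VMEMMAP-only, so struct
mem_section_usage in include/linux/mmzone.h ends up looking like:

struct mem_section_usage {
#ifdef CONFIG_SPARSEMEM_VMEMMAP
	DECLARE_BITMAP(subsection_map, SUBSECTIONS_PER_SECTION);
#endif
	/* See declaration of similar field in struct zone */
	unsigned long pageblock_flags[0];
};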

As for classic sparse not supporting subsection hotplug, Dan said it is
mainly because the effort and maintenance burden outweigh the benefit.
Besides, all current 64-bit ARCHes enable SPARSEMEM_VMEMMAP_ENABLE by
default.

In this patchset, patches 2~4 make the sub-section map and the relevant
operations available only for VMEMMAP.

Patch 1 fixes a hot remove failure when classic sparse is enabled.

Patches 5~7 add documentation and clean up doc and code.

Changelog

v2->v3:
  David spotted a code bug in the old patch 1: the old local variable
  subsection_map becomes invalid once ms->usage is reset. Add a local
  variable 'empty' to cache whether subsection_map is empty or not.

  Remove the kernel-doc comments for the newly added functions
  fill_subsection_map() and clear_subsection_map(). Michal and David
  suggested this.

  Add a new static function is_subsection_map_empty() to check whether the
  handled section's subsection map is empty, instead of returning that
  value from clear_subsection_map(). David suggested this.

  Add documentation about only VMEMMAP supporting sub-section hotplug, and
  about check_pfn_span() gating the alignment and size. Michal helped
  rephrase the wording.

v1->v2:
  Move the hot remove fix to the front so that people can backport it
  more easily. Suggested by David.

  Split the old patch which invalidates the sub-section map in the
  !VMEMMAP case into two patches, patch 4/7 and patch 6/7. This makes
  patch review easier. Suggested by David.

  Take Wei Yang's fix out to post separately, since it has already been
  reviewed and acked. Suggested by Andrew.

  Fix a code comment mistake in the current patch 2/7. Found by Wei Yang
  during review.

Baoquan He (7):
  mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
  mm/sparse.c: introduce new function fill_subsection_map()
  mm/sparse.c: introduce a new function clear_subsection_map()
  mm/sparse.c: only use subsection map in VMEMMAP case
  mm/sparse.c: add note about only VMEMMAP supporting sub-section
    support
  mm/sparse.c: move subsection_map related codes together
  mm/sparse.c: Use __get_free_pages() instead in
    populate_section_memmap()

 include/linux/mmzone.h |   2 +
 mm/sparse.c            | 159 +++++++++++++++++++++++++++--------------
 2 files changed, 107 insertions(+), 54 deletions(-)

-- 
2.17.2



^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
  2020-03-07  8:42 [PATCH v3 0/7] mm/hotplug: Only use subsection map for VMEMMAP Baoquan He
@ 2020-03-07  8:42 ` Baoquan He
  2020-03-07 20:59   ` Andrew Morton
                     ` (4 more replies)
  2020-03-07  8:42 ` [PATCH v3 2/7] mm/sparse.c: introduce new function fill_subsection_map() Baoquan He
                   ` (5 subsequent siblings)
  6 siblings, 5 replies; 29+ messages in thread
From: Baoquan He @ 2020-03-07  8:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, akpm, mhocko, david, richardw.yang, dan.j.williams,
	osalvador, rppt, bhe

In section_deactivate(), pfn_to_page() no longer works once
ms->section_mem_map has been reset to NULL in the SPARSEMEM|!VMEMMAP case.
It caused a hot remove failure:

kernel BUG at mm/page_alloc.c:4806!
invalid opcode: 0000 [#1] SMP PTI
CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G        W         5.5.0-next-20200205+ #340
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Workqueue: kacpi_hotplug acpi_hotplug_work_fn
RIP: 0010:free_pages+0x85/0xa0
Call Trace:
 __remove_pages+0x99/0xc0
 arch_remove_memory+0x23/0x4d
 try_remove_memory+0xc8/0x130
 ? walk_memory_blocks+0x72/0xa0
 __remove_memory+0xa/0x11
 acpi_memory_device_remove+0x72/0x100
 acpi_bus_trim+0x55/0x90
 acpi_device_hotplug+0x2eb/0x3d0
 acpi_hotplug_work_fn+0x1a/0x30
 process_one_work+0x1a7/0x370
 worker_thread+0x30/0x380
 ? flush_rcu_work+0x30/0x30
 kthread+0x112/0x130
 ? kthread_create_on_node+0x60/0x60
 ret_from_fork+0x35/0x40

Let's move the ->section_mem_map reset to after depopulate_section_memmap()
to fix it.
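
For context, a rough sketch of the !VMEMMAP free path (paraphrased from
mm/sparse.c, not part of this patch): depopulate_section_memmap() still
needs pfn_to_page(), and classic-sparse pfn_to_page() decodes the memmap
address from ms->section_mem_map, so clearing ->section_mem_map first hands
free_pages() a bogus address:

static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
		struct vmem_altmap *altmap)
{
	/* in classic sparse, pfn_to_page() reads ms->section_mem_map */
	struct page *memmap = pfn_to_page(pfn);

	if (is_vmalloc_addr(memmap))
		vfree(memmap);
	else
		free_pages((unsigned long)memmap,
			   get_order(sizeof(struct page) * PAGES_PER_SECTION));
}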

Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: stable@vger.kernel.org
---
 mm/sparse.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 42c18a38ffaa..1b50c15677d7 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -734,6 +734,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 	struct mem_section *ms = __pfn_to_section(pfn);
 	bool section_is_early = early_section(ms);
 	struct page *memmap = NULL;
+	bool empty = false;
 	unsigned long *subsection_map = ms->usage
 		? &ms->usage->subsection_map[0] : NULL;
 
@@ -764,7 +765,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 	 * For 2/ and 3/ the SPARSEMEM_VMEMMAP={y,n} cases are unified
 	 */
 	bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
-	if (bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION)) {
+	empty = bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION);
+	if (empty) {
 		unsigned long section_nr = pfn_to_section_nr(pfn);
 
 		/*
@@ -779,13 +781,15 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 			ms->usage = NULL;
 		}
 		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
-		ms->section_mem_map = (unsigned long)NULL;
 	}
 
 	if (section_is_early && memmap)
 		free_map_bootmem(memmap);
 	else
 		depopulate_section_memmap(pfn, nr_pages, altmap);
+
+	if (empty)
+		ms->section_mem_map = (unsigned long)NULL;
 }
 
 static struct page * __meminit section_activate(int nid, unsigned long pfn,
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 2/7] mm/sparse.c: introduce new function fill_subsection_map()
  2020-03-07  8:42 [PATCH v3 0/7] mm/hotplug: Only use subsection map for VMEMMAP Baoquan He
  2020-03-07  8:42 ` [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case Baoquan He
@ 2020-03-07  8:42 ` Baoquan He
  2020-03-07  8:42 ` [PATCH v3 3/7] mm/sparse.c: introduce a new function clear_subsection_map() Baoquan He
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 29+ messages in thread
From: Baoquan He @ 2020-03-07  8:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, akpm, mhocko, david, richardw.yang, dan.j.williams,
	osalvador, rppt, bhe

Factor out the code that fills the subsection map from section_activate()
into fill_subsection_map(); this makes section_activate() cleaner and
easier to follow.

Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
 mm/sparse.c | 32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 1b50c15677d7..e37c0abcdc89 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -792,24 +792,15 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 		ms->section_mem_map = (unsigned long)NULL;
 }
 
-static struct page * __meminit section_activate(int nid, unsigned long pfn,
-		unsigned long nr_pages, struct vmem_altmap *altmap)
+static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
 {
-	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
 	struct mem_section *ms = __pfn_to_section(pfn);
-	struct mem_section_usage *usage = NULL;
+	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
 	unsigned long *subsection_map;
-	struct page *memmap;
 	int rc = 0;
 
 	subsection_mask_set(map, pfn, nr_pages);
 
-	if (!ms->usage) {
-		usage = kzalloc(mem_section_usage_size(), GFP_KERNEL);
-		if (!usage)
-			return ERR_PTR(-ENOMEM);
-		ms->usage = usage;
-	}
 	subsection_map = &ms->usage->subsection_map[0];
 
 	if (bitmap_empty(map, SUBSECTIONS_PER_SECTION))
@@ -820,6 +811,25 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
 		bitmap_or(subsection_map, map, subsection_map,
 				SUBSECTIONS_PER_SECTION);
 
+	return rc;
+}
+
+static struct page * __meminit section_activate(int nid, unsigned long pfn,
+		unsigned long nr_pages, struct vmem_altmap *altmap)
+{
+	struct mem_section *ms = __pfn_to_section(pfn);
+	struct mem_section_usage *usage = NULL;
+	struct page *memmap;
+	int rc = 0;
+
+	if (!ms->usage) {
+		usage = kzalloc(mem_section_usage_size(), GFP_KERNEL);
+		if (!usage)
+			return ERR_PTR(-ENOMEM);
+		ms->usage = usage;
+	}
+
+	rc = fill_subsection_map(pfn, nr_pages);
 	if (rc) {
 		if (usage)
 			ms->usage = NULL;
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 3/7] mm/sparse.c: introduce a new function clear_subsection_map()
  2020-03-07  8:42 [PATCH v3 0/7] mm/hotplug: Only use subsection map for VMEMMAP Baoquan He
  2020-03-07  8:42 ` [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case Baoquan He
  2020-03-07  8:42 ` [PATCH v3 2/7] mm/sparse.c: introduce new function fill_subsection_map() Baoquan He
@ 2020-03-07  8:42 ` Baoquan He
  2020-03-09  8:59   ` David Hildenbrand
  2020-03-07  8:42 ` [PATCH v3 4/7] mm/sparse.c: only use subsection map in VMEMMAP case Baoquan He
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Baoquan He @ 2020-03-07  8:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, akpm, mhocko, david, richardw.yang, dan.j.williams,
	osalvador, rppt, bhe

Factor out the code which clears the subsection map of one memory region
from section_deactivate() into clear_subsection_map().

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/sparse.c | 31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index e37c0abcdc89..d9dcd58d5c1d 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -726,15 +726,11 @@ static void free_map_bootmem(struct page *memmap)
 }
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
-static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
-		struct vmem_altmap *altmap)
+static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
 {
 	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
 	DECLARE_BITMAP(tmp, SUBSECTIONS_PER_SECTION) = { 0 };
 	struct mem_section *ms = __pfn_to_section(pfn);
-	bool section_is_early = early_section(ms);
-	struct page *memmap = NULL;
-	bool empty = false;
 	unsigned long *subsection_map = ms->usage
 		? &ms->usage->subsection_map[0] : NULL;
 
@@ -745,8 +741,31 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 	if (WARN(!subsection_map || !bitmap_equal(tmp, map, SUBSECTIONS_PER_SECTION),
 				"section already deactivated (%#lx + %ld)\n",
 				pfn, nr_pages))
+		return -EINVAL;
+
+	bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
+
+	return 0;
+}
+
+static bool is_subsection_map_empty(struct mem_section *ms)
+{
+	return bitmap_empty(&ms->usage->subsection_map[0],
+			    SUBSECTIONS_PER_SECTION);
+}
+
+static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
+		struct vmem_altmap *altmap)
+{
+	struct mem_section *ms = __pfn_to_section(pfn);
+	bool section_is_early = early_section(ms);
+	struct page *memmap = NULL;
+	bool empty = false;
+
+	if (clear_subsection_map(pfn, nr_pages))
 		return;
 
+	empty = is_subsection_map_empty(ms);
 	/*
 	 * There are 3 cases to handle across two configurations
 	 * (SPARSEMEM_VMEMMAP={y,n}):
@@ -764,8 +783,6 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 	 *
 	 * For 2/ and 3/ the SPARSEMEM_VMEMMAP={y,n} cases are unified
 	 */
-	bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
-	empty = bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION);
 	if (empty) {
 		unsigned long section_nr = pfn_to_section_nr(pfn);
 
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 4/7] mm/sparse.c: only use subsection map in VMEMMAP case
  2020-03-07  8:42 [PATCH v3 0/7] mm/hotplug: Only use subsection map for VMEMMAP Baoquan He
                   ` (2 preceding siblings ...)
  2020-03-07  8:42 ` [PATCH v3 3/7] mm/sparse.c: introduce a new function clear_subsection_map() Baoquan He
@ 2020-03-07  8:42 ` Baoquan He
  2020-03-09  9:00   ` David Hildenbrand
  2020-03-07  8:42 ` [PATCH v3 5/7] mm/sparse.c: add note about only VMEMMAP supporting sub-section support Baoquan He
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Baoquan He @ 2020-03-07  8:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, akpm, mhocko, david, richardw.yang, dan.j.williams,
	osalvador, rppt, bhe

Currently, to support adding subsection-aligned memory regions for pmem,
a subsection map is added to track which subsections are present.

However, config ZONE_DEVICE depends on SPARSEMEM_VMEMMAP. This means the
subsection map only makes sense when SPARSEMEM_VMEMMAP is enabled. For
classic sparse, the subsection map is meaningless and confusing.

As for classic sparse not supporting subsection hotplug, Dan said it is
mainly because the effort and maintenance burden outweigh the benefit.
Besides, all current 64-bit ARCHes enable SPARSEMEM_VMEMMAP_ENABLE by
default.

Given the above reasons, there is no need to provide the subsection map
and the relevant handling for classic sparse. Handle that in this patch.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 include/linux/mmzone.h |  2 ++
 mm/sparse.c            | 25 +++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 42b77d3b68e8..f3f264826423 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1143,7 +1143,9 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec)
 #define SUBSECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SUBSECTION_MASK)
 
 struct mem_section_usage {
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
 	DECLARE_BITMAP(subsection_map, SUBSECTIONS_PER_SECTION);
+#endif
 	/* See declaration of similar field in struct zone */
 	unsigned long pageblock_flags[0];
 };
diff --git a/mm/sparse.c b/mm/sparse.c
index d9dcd58d5c1d..2142045ab5c5 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -209,6 +209,7 @@ static inline unsigned long first_present_section_nr(void)
 	return next_present_section_nr(-1);
 }
 
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
 static void subsection_mask_set(unsigned long *map, unsigned long pfn,
 		unsigned long nr_pages)
 {
@@ -243,6 +244,11 @@ void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
 		nr_pages -= pfns;
 	}
 }
+#else
+void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
+{
+}
+#endif
 
 /* Record a memory area against a node. */
 void __init memory_present(int nid, unsigned long start, unsigned long end)
@@ -726,6 +732,7 @@ static void free_map_bootmem(struct page *memmap)
 }
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
 static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
 {
 	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
@@ -753,6 +760,17 @@ static bool is_subsection_map_empty(struct mem_section *ms)
 	return bitmap_empty(&ms->usage->subsection_map[0],
 			    SUBSECTIONS_PER_SECTION);
 }
+#else
+static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
+{
+	return 0;
+}
+
+static bool is_subsection_map_empty(struct mem_section *ms)
+{
+	return true;
+}
+#endif
 
 static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 		struct vmem_altmap *altmap)
@@ -809,6 +827,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 		ms->section_mem_map = (unsigned long)NULL;
 }
 
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
 static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
 {
 	struct mem_section *ms = __pfn_to_section(pfn);
@@ -830,6 +849,12 @@ static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
 
 	return rc;
 }
+#else
+static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
+{
+	return 0;
+}
+#endif
 
 static struct page * __meminit section_activate(int nid, unsigned long pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap)
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 5/7] mm/sparse.c: add note about only VMEMMAP supporting sub-section support
  2020-03-07  8:42 [PATCH v3 0/7] mm/hotplug: Only use subsection map for VMEMMAP Baoquan He
                   ` (3 preceding siblings ...)
  2020-03-07  8:42 ` [PATCH v3 4/7] mm/sparse.c: only use subsection map in VMEMMAP case Baoquan He
@ 2020-03-07  8:42 ` Baoquan He
  2020-03-07 11:55   ` Baoquan He
  2020-03-10 14:46   ` Michal Hocko
  2020-03-07  8:42 ` [PATCH v3 6/7] mm/sparse.c: move subsection_map related codes together Baoquan He
  2020-03-07  8:42 ` [PATCH v3 7/7] mm/sparse.c: Use __get_free_pages() instead in populate_section_memmap() Baoquan He
  6 siblings, 2 replies; 29+ messages in thread
From: Baoquan He @ 2020-03-07  8:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, akpm, mhocko, david, richardw.yang, dan.j.williams,
	osalvador, rppt, bhe

Also note that check_pfn_span() gates the proper alignment and size of
the hot-added memory region.

Also move the code comments from inside section_deactivate() to above it.
The comments apply to the whole function, and moving them makes the code
cleaner.
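
For reference, a rough sketch of the gating done by check_pfn_span() in
mm/memory_hotplug.c (paraphrased, not part of this patch):

static int check_pfn_span(unsigned long pfn, unsigned long nr_pages,
			  const char *reason)
{
	unsigned long min_align;

	/*
	 * Only VMEMMAP supports sub-section granularity; classic sparse
	 * requires full section alignment.
	 */
	if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
		min_align = PAGES_PER_SUBSECTION;
	else
		min_align = PAGES_PER_SECTION;
	if (!IS_ALIGNED(pfn, min_align) || !IS_ALIGNED(nr_pages, min_align)) {
		WARN(1, "Misaligned __%s_pages start: %#lx end: %#lx\n",
		     reason, pfn, pfn + nr_pages - 1);
		return -EINVAL;
	}
	return 0;
}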

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/sparse.c | 37 ++++++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 2142045ab5c5..0fbd79c4ad81 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -772,6 +772,22 @@ static bool is_subsection_map_empty(struct mem_section *ms)
 }
 #endif
 
+/*
+ * To deactivate a memory region, there are 3 cases to handle across
+ * two configurations (SPARSEMEM_VMEMMAP={y,n}):
+ *
+ * 1. deactivation of a partial hot-added section (only possible in
+ *    the SPARSEMEM_VMEMMAP=y case).
+ *      a) section was present at memory init.
+ *      b) section was hot-added post memory init.
+ * 2. deactivation of a complete hot-added section.
+ * 3. deactivation of a complete section from memory init.
+ *
+ * For 1, when subsection_map does not empty we will not be freeing the
+ * usage map, but still need to free the vmemmap range.
+ *
+ * For 2 and 3, the SPARSEMEM_VMEMMAP={y,n} cases are unified
+ */
 static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 		struct vmem_altmap *altmap)
 {
@@ -784,23 +800,6 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 		return;
 
 	empty = is_subsection_map_empty(ms);
-	/*
-	 * There are 3 cases to handle across two configurations
-	 * (SPARSEMEM_VMEMMAP={y,n}):
-	 *
-	 * 1/ deactivation of a partial hot-added section (only possible
-	 * in the SPARSEMEM_VMEMMAP=y case).
-	 *    a/ section was present at memory init
-	 *    b/ section was hot-added post memory init
-	 * 2/ deactivation of a complete hot-added section
-	 * 3/ deactivation of a complete section from memory init
-	 *
-	 * For 1/, when subsection_map does not empty we will not be
-	 * freeing the usage map, but still need to free the vmemmap
-	 * range.
-	 *
-	 * For 2/ and 3/ the SPARSEMEM_VMEMMAP={y,n} cases are unified
-	 */
 	if (empty) {
 		unsigned long section_nr = pfn_to_section_nr(pfn);
 
@@ -907,6 +906,10 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
  *
  * This is only intended for hotplug.
  *
+ * Note that only VMEMMAP supports sub-section aligned hotplug,
+ * the proper alignment and size are gated by check_pfn_span().
+ *
+ *
  * Return:
  * * 0		- On success.
  * * -EEXIST	- Section has been present.
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 6/7] mm/sparse.c: move subsection_map related codes together
  2020-03-07  8:42 [PATCH v3 0/7] mm/hotplug: Only use subsection map for VMEMMAP Baoquan He
                   ` (4 preceding siblings ...)
  2020-03-07  8:42 ` [PATCH v3 5/7] mm/sparse.c: add note about only VMEMMAP supporting sub-section support Baoquan He
@ 2020-03-07  8:42 ` Baoquan He
  2020-03-09  9:08   ` David Hildenbrand
  2020-03-07  8:42 ` [PATCH v3 7/7] mm/sparse.c: Use __get_free_pages() instead in populate_section_memmap() Baoquan He
  6 siblings, 1 reply; 29+ messages in thread
From: Baoquan He @ 2020-03-07  8:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, akpm, mhocko, david, richardw.yang, dan.j.williams,
	osalvador, rppt, bhe

No functional change.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/sparse.c | 134 +++++++++++++++++++++++++---------------------------
 1 file changed, 65 insertions(+), 69 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 0fbd79c4ad81..fde651ab8741 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -244,10 +244,75 @@ void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
 		nr_pages -= pfns;
 	}
 }
+
+static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
+{
+	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
+	DECLARE_BITMAP(tmp, SUBSECTIONS_PER_SECTION) = { 0 };
+	struct mem_section *ms = __pfn_to_section(pfn);
+	unsigned long *subsection_map = ms->usage
+		? &ms->usage->subsection_map[0] : NULL;
+
+	subsection_mask_set(map, pfn, nr_pages);
+	if (subsection_map)
+		bitmap_and(tmp, map, subsection_map, SUBSECTIONS_PER_SECTION);
+
+	if (WARN(!subsection_map || !bitmap_equal(tmp, map, SUBSECTIONS_PER_SECTION),
+				"section already deactivated (%#lx + %ld)\n",
+				pfn, nr_pages))
+		return -EINVAL;
+
+	bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
+
+	return 0;
+}
+
+static bool is_subsection_map_empty(struct mem_section *ms)
+{
+	return bitmap_empty(&ms->usage->subsection_map[0],
+			    SUBSECTIONS_PER_SECTION);
+}
+
+static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
+{
+	struct mem_section *ms = __pfn_to_section(pfn);
+	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
+	unsigned long *subsection_map;
+	int rc = 0;
+
+	subsection_mask_set(map, pfn, nr_pages);
+
+	subsection_map = &ms->usage->subsection_map[0];
+
+	if (bitmap_empty(map, SUBSECTIONS_PER_SECTION))
+		rc = -EINVAL;
+	else if (bitmap_intersects(map, subsection_map, SUBSECTIONS_PER_SECTION))
+		rc = -EEXIST;
+	else
+		bitmap_or(subsection_map, map, subsection_map,
+				SUBSECTIONS_PER_SECTION);
+
+	return rc;
+}
 #else
 void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
 {
 }
+
+static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
+{
+	return 0;
+}
+
+static bool is_subsection_map_empty(struct mem_section *ms)
+{
+	return true;
+}
+
+static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
+{
+	return 0;
+}
 #endif
 
 /* Record a memory area against a node. */
@@ -732,46 +797,6 @@ static void free_map_bootmem(struct page *memmap)
 }
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
-#ifdef CONFIG_SPARSEMEM_VMEMMAP
-static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
-{
-	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
-	DECLARE_BITMAP(tmp, SUBSECTIONS_PER_SECTION) = { 0 };
-	struct mem_section *ms = __pfn_to_section(pfn);
-	unsigned long *subsection_map = ms->usage
-		? &ms->usage->subsection_map[0] : NULL;
-
-	subsection_mask_set(map, pfn, nr_pages);
-	if (subsection_map)
-		bitmap_and(tmp, map, subsection_map, SUBSECTIONS_PER_SECTION);
-
-	if (WARN(!subsection_map || !bitmap_equal(tmp, map, SUBSECTIONS_PER_SECTION),
-				"section already deactivated (%#lx + %ld)\n",
-				pfn, nr_pages))
-		return -EINVAL;
-
-	bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
-
-	return 0;
-}
-
-static bool is_subsection_map_empty(struct mem_section *ms)
-{
-	return bitmap_empty(&ms->usage->subsection_map[0],
-			    SUBSECTIONS_PER_SECTION);
-}
-#else
-static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
-{
-	return 0;
-}
-
-static bool is_subsection_map_empty(struct mem_section *ms)
-{
-	return true;
-}
-#endif
-
 /*
  * To deactivate a memory region, there are 3 cases to handle across
  * two configurations (SPARSEMEM_VMEMMAP={y,n}):
@@ -826,35 +851,6 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 		ms->section_mem_map = (unsigned long)NULL;
 }
 
-#ifdef CONFIG_SPARSEMEM_VMEMMAP
-static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
-{
-	struct mem_section *ms = __pfn_to_section(pfn);
-	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
-	unsigned long *subsection_map;
-	int rc = 0;
-
-	subsection_mask_set(map, pfn, nr_pages);
-
-	subsection_map = &ms->usage->subsection_map[0];
-
-	if (bitmap_empty(map, SUBSECTIONS_PER_SECTION))
-		rc = -EINVAL;
-	else if (bitmap_intersects(map, subsection_map, SUBSECTIONS_PER_SECTION))
-		rc = -EEXIST;
-	else
-		bitmap_or(subsection_map, map, subsection_map,
-				SUBSECTIONS_PER_SECTION);
-
-	return rc;
-}
-#else
-static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
-{
-	return 0;
-}
-#endif
-
 static struct page * __meminit section_activate(int nid, unsigned long pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap)
 {
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 7/7] mm/sparse.c: Use __get_free_pages() instead in populate_section_memmap()
  2020-03-07  8:42 [PATCH v3 0/7] mm/hotplug: Only use subsection map for VMEMMAP Baoquan He
                   ` (5 preceding siblings ...)
  2020-03-07  8:42 ` [PATCH v3 6/7] mm/sparse.c: move subsection_map related codes together Baoquan He
@ 2020-03-07  8:42 ` Baoquan He
  2020-03-10 14:56   ` Michal Hocko
  6 siblings, 1 reply; 29+ messages in thread
From: Baoquan He @ 2020-03-07  8:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, akpm, mhocko, david, richardw.yang, dan.j.williams,
	osalvador, rppt, bhe

This removes the unnecessary goto and simplifies the code.
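
For context, a minimal sketch of the equivalence relied on here (a
paraphrase, not part of the patch): __get_free_pages() returns the kernel
virtual address directly, making the alloc_pages() plus
pfn_to_kaddr(page_to_pfn()) conversion and the gotos unnecessary.

	/* before: allocate struct pages, then convert to a kernel address */
	struct page *page = alloc_pages(GFP_KERNEL | __GFP_NOWARN,
					get_order(memmap_size));
	struct page *memmap = page ?
			(struct page *)pfn_to_kaddr(page_to_pfn(page)) : NULL;

	/* after: __get_free_pages() hands back that kernel address itself */
	memmap = (struct page *)__get_free_pages(GFP_KERNEL | __GFP_NOWARN,
						 get_order(memmap_size));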

Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
---
 mm/sparse.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index fde651ab8741..266f7f5040fb 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -735,23 +735,19 @@ static void free_map_bootmem(struct page *memmap)
 struct page * __meminit populate_section_memmap(unsigned long pfn,
 		unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
 {
-	struct page *page, *ret;
+	struct page *ret;
 	unsigned long memmap_size = sizeof(struct page) * PAGES_PER_SECTION;
 
-	page = alloc_pages(GFP_KERNEL|__GFP_NOWARN, get_order(memmap_size));
-	if (page)
-		goto got_map_page;
+	ret = (void*)__get_free_pages(GFP_KERNEL|__GFP_NOWARN,
+				get_order(memmap_size));
+	if (ret)
+		return ret;
 
 	ret = vmalloc(memmap_size);
 	if (ret)
-		goto got_map_ptr;
+		return ret;
 
 	return NULL;
-got_map_page:
-	ret = (struct page *)pfn_to_kaddr(page_to_pfn(page));
-got_map_ptr:
-
-	return ret;
 }
 
 static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 5/7] mm/sparse.c: add note about only VMEMMAP supporting sub-section support
  2020-03-07  8:42 ` [PATCH v3 5/7] mm/sparse.c: add note about only VMEMMAP supporting sub-section support Baoquan He
@ 2020-03-07 11:55   ` Baoquan He
  2020-03-10 14:46   ` Michal Hocko
  1 sibling, 0 replies; 29+ messages in thread
From: Baoquan He @ 2020-03-07 11:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, akpm, mhocko, david, richardw.yang, dan.j.williams,
	osalvador, rppt

On 03/07/20 at 04:42pm, Baoquan He wrote:

Sorry, the subject should be:

mm/sparse.c: add note about only VMEMMAP supporting sub-section hotplug

> And tell check_pfn_span() gating the porper alignment and size of
> hot added memory region.
> 
> And also move the code comments from inside section_deactivate()
> to being above it. The code comments are reasonable for the whole
> function, and the moving makes code cleaner.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  mm/sparse.c | 37 ++++++++++++++++++++-----------------
>  1 file changed, 20 insertions(+), 17 deletions(-)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 2142045ab5c5..0fbd79c4ad81 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -772,6 +772,22 @@ static bool is_subsection_map_empty(struct mem_section *ms)
>  }
>  #endif
>  
> +/*
> + * To deactivate a memory region, there are 3 cases to handle across
> + * two configurations (SPARSEMEM_VMEMMAP={y,n}):
> + *
> + * 1. deactivation of a partial hot-added section (only possible in
> + *    the SPARSEMEM_VMEMMAP=y case).
> + *      a) section was present at memory init.
> + *      b) section was hot-added post memory init.
> + * 2. deactivation of a complete hot-added section.
> + * 3. deactivation of a complete section from memory init.
> + *
> + * For 1, when subsection_map does not empty we will not be freeing the
> + * usage map, but still need to free the vmemmap range.
> + *
> + * For 2 and 3, the SPARSEMEM_VMEMMAP={y,n} cases are unified
> + */
>  static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>  		struct vmem_altmap *altmap)
>  {
> @@ -784,23 +800,6 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>  		return;
>  
>  	empty = is_subsection_map_empty(ms);
> -	/*
> -	 * There are 3 cases to handle across two configurations
> -	 * (SPARSEMEM_VMEMMAP={y,n}):
> -	 *
> -	 * 1/ deactivation of a partial hot-added section (only possible
> -	 * in the SPARSEMEM_VMEMMAP=y case).
> -	 *    a/ section was present at memory init
> -	 *    b/ section was hot-added post memory init
> -	 * 2/ deactivation of a complete hot-added section
> -	 * 3/ deactivation of a complete section from memory init
> -	 *
> -	 * For 1/, when subsection_map does not empty we will not be
> -	 * freeing the usage map, but still need to free the vmemmap
> -	 * range.
> -	 *
> -	 * For 2/ and 3/ the SPARSEMEM_VMEMMAP={y,n} cases are unified
> -	 */
>  	if (empty) {
>  		unsigned long section_nr = pfn_to_section_nr(pfn);
>  
> @@ -907,6 +906,10 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
>   *
>   * This is only intended for hotplug.
>   *
> + * Note that only VMEMMAP supports sub-section aligned hotplug,
> + * the proper alignment and size are gated by check_pfn_span().
> + *
> + *
>   * Return:
>   * * 0		- On success.
>   * * -EEXIST	- Section has been present.
> -- 
> 2.17.2
> 



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
  2020-03-07  8:42 ` [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case Baoquan He
@ 2020-03-07 20:59   ` Andrew Morton
  2020-03-07 22:55     ` Baoquan He
  2020-03-09  8:56   ` David Hildenbrand
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 29+ messages in thread
From: Andrew Morton @ 2020-03-07 20:59 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, linux-mm, mhocko, david, richardw.yang,
	dan.j.williams, osalvador, rppt

On Sat,  7 Mar 2020 16:42:23 +0800 Baoquan He <bhe@redhat.com> wrote:

> In section_deactivate(), pfn_to_page() doesn't work any more after
> ms->section_mem_map is resetting to NULL in SPARSEMEM|!VMEMMAP case.
> It caused hot remove failure:
> 
> kernel BUG at mm/page_alloc.c:4806!
> invalid opcode: 0000 [#1] SMP PTI
> CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G        W         5.5.0-next-20200205+ #340
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
> Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> RIP: 0010:free_pages+0x85/0xa0
> Call Trace:
>  __remove_pages+0x99/0xc0
>  arch_remove_memory+0x23/0x4d
>  try_remove_memory+0xc8/0x130
>  ? walk_memory_blocks+0x72/0xa0
>  __remove_memory+0xa/0x11
>  acpi_memory_device_remove+0x72/0x100
>  acpi_bus_trim+0x55/0x90
>  acpi_device_hotplug+0x2eb/0x3d0
>  acpi_hotplug_work_fn+0x1a/0x30
>  process_one_work+0x1a7/0x370
>  worker_thread+0x30/0x380
>  ? flush_rcu_work+0x30/0x30
>  kthread+0x112/0x130
>  ? kthread_create_on_node+0x60/0x60
>  ret_from_fork+0x35/0x40
> 
> Let's move the ->section_mem_map resetting after depopulate_section_memmap()
> to fix it.

Thanks.  I think I'll cherrypick this fix and shall await more
review/testing input on the rest of the series.



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
  2020-03-07 20:59   ` Andrew Morton
@ 2020-03-07 22:55     ` Baoquan He
  0 siblings, 0 replies; 29+ messages in thread
From: Baoquan He @ 2020-03-07 22:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, mhocko, david, richardw.yang,
	dan.j.williams, osalvador, rppt

On 03/07/20 at 12:59pm, Andrew Morton wrote:
> On Sat,  7 Mar 2020 16:42:23 +0800 Baoquan He <bhe@redhat.com> wrote:
> 
> > In section_deactivate(), pfn_to_page() doesn't work any more after
> > ms->section_mem_map is resetting to NULL in SPARSEMEM|!VMEMMAP case.
> > It caused hot remove failure:
> > 
> > kernel BUG at mm/page_alloc.c:4806!
> > invalid opcode: 0000 [#1] SMP PTI
> > CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G        W         5.5.0-next-20200205+ #340
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
> > Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> > RIP: 0010:free_pages+0x85/0xa0
> > Call Trace:
> >  __remove_pages+0x99/0xc0
> >  arch_remove_memory+0x23/0x4d
> >  try_remove_memory+0xc8/0x130
> >  ? walk_memory_blocks+0x72/0xa0
> >  __remove_memory+0xa/0x11
> >  acpi_memory_device_remove+0x72/0x100
> >  acpi_bus_trim+0x55/0x90
> >  acpi_device_hotplug+0x2eb/0x3d0
> >  acpi_hotplug_work_fn+0x1a/0x30
> >  process_one_work+0x1a7/0x370
> >  worker_thread+0x30/0x380
> >  ? flush_rcu_work+0x30/0x30
> >  kthread+0x112/0x130
> >  ? kthread_create_on_node+0x60/0x60
> >  ret_from_fork+0x35/0x40
> > 
> > Let's move the ->section_mem_map resetting after depopulate_section_memmap()
> > to fix it.
> 
> Thanks.  I think I'll cherrypick this fix and shall await more
> review/testing input on the rest of the series.

Sure, thanks.



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
  2020-03-07  8:42 ` [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case Baoquan He
  2020-03-07 20:59   ` Andrew Morton
@ 2020-03-09  8:56   ` David Hildenbrand
  2020-03-09  8:58   ` David Hildenbrand
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2020-03-09  8:56 UTC (permalink / raw)
  To: Baoquan He, linux-kernel
  Cc: linux-mm, akpm, mhocko, richardw.yang, dan.j.williams, osalvador, rppt

On 07.03.20 09:42, Baoquan He wrote:
> In section_deactivate(), pfn_to_page() doesn't work any more after
> ms->section_mem_map is resetting to NULL in SPARSEMEM|!VMEMMAP case.
> It caused hot remove failure:
> 
> kernel BUG at mm/page_alloc.c:4806!
> invalid opcode: 0000 [#1] SMP PTI
> CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G        W         5.5.0-next-20200205+ #340
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
> Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> RIP: 0010:free_pages+0x85/0xa0
> Call Trace:
>  __remove_pages+0x99/0xc0
>  arch_remove_memory+0x23/0x4d
>  try_remove_memory+0xc8/0x130
>  ? walk_memory_blocks+0x72/0xa0
>  __remove_memory+0xa/0x11
>  acpi_memory_device_remove+0x72/0x100
>  acpi_bus_trim+0x55/0x90
>  acpi_device_hotplug+0x2eb/0x3d0
>  acpi_hotplug_work_fn+0x1a/0x30
>  process_one_work+0x1a7/0x370
>  worker_thread+0x30/0x380
>  ? flush_rcu_work+0x30/0x30
>  kthread+0x112/0x130
>  ? kthread_create_on_node+0x60/0x60
>  ret_from_fork+0x35/0x40
> 
> Let's move the ->section_mem_map resetting after depopulate_section_memmap()
> to fix it.
> 
> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> Signed-off-by: Baoquan He <bhe@redhat.com>
> Cc: stable@vger.kernel.org
> ---
>  mm/sparse.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 42c18a38ffaa..1b50c15677d7 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -734,6 +734,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>  	struct mem_section *ms = __pfn_to_section(pfn);
>  	bool section_is_early = early_section(ms);
>  	struct page *memmap = NULL;
> +	bool empty = false;
>  	unsigned long *subsection_map = ms->usage
>  		? &ms->usage->subsection_map[0] : NULL;
>  
> @@ -764,7 +765,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>  	 * For 2/ and 3/ the SPARSEMEM_VMEMMAP={y,n} cases are unified
>  	 */
>  	bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
> -	if (bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION)) {
> +	empty = bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION);
> +	if (empty) {
>  		unsigned long section_nr = pfn_to_section_nr(pfn);
>  
>  		/*
> @@ -779,13 +781,15 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>  			ms->usage = NULL;
>  		}
>  		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> -		ms->section_mem_map = (unsigned long)NULL;
>  	}
>  
>  	if (section_is_early && memmap)
>  		free_map_bootmem(memmap);
>  	else
>  		depopulate_section_memmap(pfn, nr_pages, altmap);
> +
> +	if (empty)
> +		ms->section_mem_map = (unsigned long)NULL;
>  }
>  
>  static struct page * __meminit section_activate(int nid, unsigned long pfn,
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
  2020-03-07  8:42 ` [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case Baoquan He
  2020-03-07 20:59   ` Andrew Morton
  2020-03-09  8:56   ` David Hildenbrand
@ 2020-03-09  8:58   ` David Hildenbrand
  2020-03-09 13:18     ` Baoquan He
  2020-03-09 10:13   ` Pankaj Gupta
  2020-03-09 12:56   ` Michal Hocko
  4 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand @ 2020-03-09  8:58 UTC (permalink / raw)
  To: Baoquan He, linux-kernel
  Cc: linux-mm, akpm, mhocko, richardw.yang, dan.j.williams, osalvador, rppt

On 07.03.20 09:42, Baoquan He wrote:
> In section_deactivate(), pfn_to_page() doesn't work any more after
> ms->section_mem_map is resetting to NULL in SPARSEMEM|!VMEMMAP case.
> It caused hot remove failure:
> 
> kernel BUG at mm/page_alloc.c:4806!
> invalid opcode: 0000 [#1] SMP PTI
> CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G        W         5.5.0-next-20200205+ #340
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
> Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> RIP: 0010:free_pages+0x85/0xa0
> Call Trace:
>  __remove_pages+0x99/0xc0
>  arch_remove_memory+0x23/0x4d
>  try_remove_memory+0xc8/0x130
>  ? walk_memory_blocks+0x72/0xa0
>  __remove_memory+0xa/0x11
>  acpi_memory_device_remove+0x72/0x100
>  acpi_bus_trim+0x55/0x90
>  acpi_device_hotplug+0x2eb/0x3d0
>  acpi_hotplug_work_fn+0x1a/0x30
>  process_one_work+0x1a7/0x370
>  worker_thread+0x30/0x380
>  ? flush_rcu_work+0x30/0x30
>  kthread+0x112/0x130
>  ? kthread_create_on_node+0x60/0x60
>  ret_from_fork+0x35/0x40
> 
> Let's move the ->section_mem_map resetting after depopulate_section_memmap()
> to fix it.
> 
> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> Signed-off-by: Baoquan He <bhe@redhat.com>
> Cc: stable@vger.kernel.org
> ---
>  mm/sparse.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 42c18a38ffaa..1b50c15677d7 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -734,6 +734,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>  	struct mem_section *ms = __pfn_to_section(pfn);
>  	bool section_is_early = early_section(ms);
>  	struct page *memmap = NULL;
> +	bool empty = false;

Oh, one NIT: no need to initialize empty to false.


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 3/7] mm/sparse.c: introduce a new function clear_subsection_map()
  2020-03-07  8:42 ` [PATCH v3 3/7] mm/sparse.c: introduce a new function clear_subsection_map() Baoquan He
@ 2020-03-09  8:59   ` David Hildenbrand
  2020-03-09 13:32     ` Baoquan He
  0 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand @ 2020-03-09  8:59 UTC (permalink / raw)
  To: Baoquan He, linux-kernel
  Cc: linux-mm, akpm, mhocko, richardw.yang, dan.j.williams, osalvador, rppt

On 07.03.20 09:42, Baoquan He wrote:
> Factor out the code which clear subsection map of one memory region from
> section_deactivate() into clear_subsection_map().
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  mm/sparse.c | 31 ++++++++++++++++++++++++-------
>  1 file changed, 24 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index e37c0abcdc89..d9dcd58d5c1d 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -726,15 +726,11 @@ static void free_map_bootmem(struct page *memmap)
>  }
>  #endif /* CONFIG_SPARSEMEM_VMEMMAP */
>  
> -static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> -		struct vmem_altmap *altmap)
> +static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
>  {
>  	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
>  	DECLARE_BITMAP(tmp, SUBSECTIONS_PER_SECTION) = { 0 };
>  	struct mem_section *ms = __pfn_to_section(pfn);
> -	bool section_is_early = early_section(ms);
> -	struct page *memmap = NULL;
> -	bool empty = false;
>  	unsigned long *subsection_map = ms->usage
>  		? &ms->usage->subsection_map[0] : NULL;
>  
> @@ -745,8 +741,31 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>  	if (WARN(!subsection_map || !bitmap_equal(tmp, map, SUBSECTIONS_PER_SECTION),
>  				"section already deactivated (%#lx + %ld)\n",
>  				pfn, nr_pages))
> +		return -EINVAL;
> +
> +	bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
> +

Nit: I'd drop this line.

> +	return 0;
> +}
> +
> +static bool is_subsection_map_empty(struct mem_section *ms)
> +{
> +	return bitmap_empty(&ms->usage->subsection_map[0],
> +			    SUBSECTIONS_PER_SECTION);
> +}
> +
> +static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> +		struct vmem_altmap *altmap)
> +{
> +	struct mem_section *ms = __pfn_to_section(pfn);
> +	bool section_is_early = early_section(ms);
> +	struct page *memmap = NULL;
> +	bool empty = false;

Nit: No need to initialize empty.

> +
> +	if (clear_subsection_map(pfn, nr_pages))
>  		return;
>  

Nit: I'd drop this empty line.

> +	empty = is_subsection_map_empty(ms);
>  	/*
>  	 * There are 3 cases to handle across two configurations
>  	 * (SPARSEMEM_VMEMMAP={y,n}):
> @@ -764,8 +783,6 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>  	 *
>  	 * For 2/ and 3/ the SPARSEMEM_VMEMMAP={y,n} cases are unified
>  	 */
> -	bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
> -	empty = bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION);

I do wonder why you moved this up above the comment?

>  	if (empty) {
>  		unsigned long section_nr = pfn_to_section_nr(pfn);
>  
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 4/7] mm/sparse.c: only use subsection map in VMEMMAP case
  2020-03-07  8:42 ` [PATCH v3 4/7] mm/sparse.c: only use subsection map in VMEMMAP case Baoquan He
@ 2020-03-09  9:00   ` David Hildenbrand
  0 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2020-03-09  9:00 UTC (permalink / raw)
  To: Baoquan He, linux-kernel
  Cc: linux-mm, akpm, mhocko, richardw.yang, dan.j.williams, osalvador, rppt

On 07.03.20 09:42, Baoquan He wrote:
> Currently, to support subsection aligned memory region adding for pmem,
> subsection map is added to track which subsection is present.
> 
> However, config ZONE_DEVICE depends on SPARSEMEM_VMEMMAP. It means
> subsection map only makes sense when SPARSEMEM_VMEMMAP enabled. For the
> classic sparse, subsection map is meaningless and confusing.
> 
> About the classic sparse which doesn't support subsection hotplug, Dan
> said it's more because the effort and maintenance burden outweighs the
> benefit. Besides, the current 64 bit ARCHes all enable
> SPARSEMEM_VMEMMAP_ENABLE by default.
> 
> Combining the above reasons, no need to provide subsection map and the
> relevant handling for the classic sparse. Handle it with this patch.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  include/linux/mmzone.h |  2 ++
>  mm/sparse.c            | 25 +++++++++++++++++++++++++
>  2 files changed, 27 insertions(+)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 42b77d3b68e8..f3f264826423 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1143,7 +1143,9 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec)
>  #define SUBSECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SUBSECTION_MASK)
>  
>  struct mem_section_usage {
> +#ifdef CONFIG_SPARSEMEM_VMEMMAP
>  	DECLARE_BITMAP(subsection_map, SUBSECTIONS_PER_SECTION);
> +#endif
>  	/* See declaration of similar field in struct zone */
>  	unsigned long pageblock_flags[0];
>  };
> diff --git a/mm/sparse.c b/mm/sparse.c
> index d9dcd58d5c1d..2142045ab5c5 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -209,6 +209,7 @@ static inline unsigned long first_present_section_nr(void)
>  	return next_present_section_nr(-1);
>  }
>  
> +#ifdef CONFIG_SPARSEMEM_VMEMMAP
>  static void subsection_mask_set(unsigned long *map, unsigned long pfn,
>  		unsigned long nr_pages)
>  {
> @@ -243,6 +244,11 @@ void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
>  		nr_pages -= pfns;
>  	}
>  }
> +#else
> +void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
> +{
> +}
> +#endif
>  
>  /* Record a memory area against a node. */
>  void __init memory_present(int nid, unsigned long start, unsigned long end)
> @@ -726,6 +732,7 @@ static void free_map_bootmem(struct page *memmap)
>  }
>  #endif /* CONFIG_SPARSEMEM_VMEMMAP */
>  
> +#ifdef CONFIG_SPARSEMEM_VMEMMAP
>  static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
>  {
>  	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
> @@ -753,6 +760,17 @@ static bool is_subsection_map_empty(struct mem_section *ms)
>  	return bitmap_empty(&ms->usage->subsection_map[0],
>  			    SUBSECTIONS_PER_SECTION);
>  }
> +#else
> +static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
> +{
> +	return 0;
> +}
> +
> +static bool is_subsection_map_empty(struct mem_section *ms)
> +{
> +	return true;
> +}
> +#endif
>  
>  static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>  		struct vmem_altmap *altmap)
> @@ -809,6 +827,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>  		ms->section_mem_map = (unsigned long)NULL;
>  }
>  
> +#ifdef CONFIG_SPARSEMEM_VMEMMAP
>  static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
>  {
>  	struct mem_section *ms = __pfn_to_section(pfn);
> @@ -830,6 +849,12 @@ static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
>  
>  	return rc;
>  }
> +#else
> +static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
> +{
> +	return 0;
> +}
> +#endif
>  
>  static struct page * __meminit section_activate(int nid, unsigned long pfn,
>  		unsigned long nr_pages, struct vmem_altmap *altmap)
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 6/7] mm/sparse.c: move subsection_map related codes together
  2020-03-07  8:42 ` [PATCH v3 6/7] mm/sparse.c: move subsection_map related codes together Baoquan He
@ 2020-03-09  9:08   ` David Hildenbrand
  2020-03-09 13:41     ` Baoquan He
  0 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand @ 2020-03-09  9:08 UTC (permalink / raw)
  To: Baoquan He, linux-kernel
  Cc: linux-mm, akpm, mhocko, richardw.yang, dan.j.williams, osalvador, rppt

On 07.03.20 09:42, Baoquan He wrote:
> No functional change.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  mm/sparse.c | 134 +++++++++++++++++++++++++---------------------------
>  1 file changed, 65 insertions(+), 69 deletions(-)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 0fbd79c4ad81..fde651ab8741 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -244,10 +244,75 @@ void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
>  		nr_pages -= pfns;
>  	}
>  }
> +
> +static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
> +{
> +	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
> +	DECLARE_BITMAP(tmp, SUBSECTIONS_PER_SECTION) = { 0 };
> +	struct mem_section *ms = __pfn_to_section(pfn);
> +	unsigned long *subsection_map = ms->usage
> +		? &ms->usage->subsection_map[0] : NULL;
> +
> +	subsection_mask_set(map, pfn, nr_pages);
> +	if (subsection_map)
> +		bitmap_and(tmp, map, subsection_map, SUBSECTIONS_PER_SECTION);
> +
> +	if (WARN(!subsection_map || !bitmap_equal(tmp, map, SUBSECTIONS_PER_SECTION),
> +				"section already deactivated (%#lx + %ld)\n",
> +				pfn, nr_pages))
> +		return -EINVAL;
> +
> +	bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
> +
> +	return 0;
> +}
> +
> +static bool is_subsection_map_empty(struct mem_section *ms)
> +{
> +	return bitmap_empty(&ms->usage->subsection_map[0],
> +			    SUBSECTIONS_PER_SECTION);
> +}
> +
> +static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
> +{
> +	struct mem_section *ms = __pfn_to_section(pfn);
> +	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
> +	unsigned long *subsection_map;
> +	int rc = 0;
> +
> +	subsection_mask_set(map, pfn, nr_pages);
> +
> +	subsection_map = &ms->usage->subsection_map[0];
> +
> +	if (bitmap_empty(map, SUBSECTIONS_PER_SECTION))
> +		rc = -EINVAL;
> +	else if (bitmap_intersects(map, subsection_map, SUBSECTIONS_PER_SECTION))
> +		rc = -EEXIST;
> +	else
> +		bitmap_or(subsection_map, map, subsection_map,
> +				SUBSECTIONS_PER_SECTION);
> +
> +	return rc;
> +}
>  #else
>  void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
>  {
>  }
> +
> +static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
> +{
> +	return 0;
> +}
> +
> +static bool is_subsection_map_empty(struct mem_section *ms)
> +{
> +	return true;
> +}
> +
> +static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
> +{
> +	return 0;
> +}
>  #endif
>  
>  /* Record a memory area against a node. */
> @@ -732,46 +797,6 @@ static void free_map_bootmem(struct page *memmap)
>  }
>  #endif /* CONFIG_SPARSEMEM_VMEMMAP */
>  
> -#ifdef CONFIG_SPARSEMEM_VMEMMAP
> -static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
> -{
> -	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
> -	DECLARE_BITMAP(tmp, SUBSECTIONS_PER_SECTION) = { 0 };
> -	struct mem_section *ms = __pfn_to_section(pfn);
> -	unsigned long *subsection_map = ms->usage
> -		? &ms->usage->subsection_map[0] : NULL;
> -
> -	subsection_mask_set(map, pfn, nr_pages);
> -	if (subsection_map)
> -		bitmap_and(tmp, map, subsection_map, SUBSECTIONS_PER_SECTION);
> -
> -	if (WARN(!subsection_map || !bitmap_equal(tmp, map, SUBSECTIONS_PER_SECTION),
> -				"section already deactivated (%#lx + %ld)\n",
> -				pfn, nr_pages))
> -		return -EINVAL;
> -
> -	bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
> -
> -	return 0;
> -}
> -
> -static bool is_subsection_map_empty(struct mem_section *ms)
> -{
> -	return bitmap_empty(&ms->usage->subsection_map[0],
> -			    SUBSECTIONS_PER_SECTION);
> -}
> -#else
> -static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
> -{
> -	return 0;
> -}
> -
> -static bool is_subsection_map_empty(struct mem_section *ms)
> -{
> -	return true;
> -}
> -#endif
> -
>  /*
>   * To deactivate a memory region, there are 3 cases to handle across
>   * two configurations (SPARSEMEM_VMEMMAP={y,n}):
> @@ -826,35 +851,6 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>  		ms->section_mem_map = (unsigned long)NULL;
>  }
>  
> -#ifdef CONFIG_SPARSEMEM_VMEMMAP
> -static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
> -{
> -	struct mem_section *ms = __pfn_to_section(pfn);
> -	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
> -	unsigned long *subsection_map;
> -	int rc = 0;
> -
> -	subsection_mask_set(map, pfn, nr_pages);
> -
> -	subsection_map = &ms->usage->subsection_map[0];
> -
> -	if (bitmap_empty(map, SUBSECTIONS_PER_SECTION))
> -		rc = -EINVAL;
> -	else if (bitmap_intersects(map, subsection_map, SUBSECTIONS_PER_SECTION))
> -		rc = -EEXIST;
> -	else
> -		bitmap_or(subsection_map, map, subsection_map,
> -				SUBSECTIONS_PER_SECTION);
> -
> -	return rc;
> -}
> -#else
> -static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
> -{
> -	return 0;
> -}
> -#endif
> -
>  static struct page * __meminit section_activate(int nid, unsigned long pfn,
>  		unsigned long nr_pages, struct vmem_altmap *altmap)
>  {
> 

IMHO, we don't need this patch - but just my personal opinion. Change
itself looks good on a quick glance.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
  2020-03-07  8:42 ` [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case Baoquan He
                     ` (2 preceding siblings ...)
  2020-03-09  8:58   ` David Hildenbrand
@ 2020-03-09 10:13   ` Pankaj Gupta
  2020-03-09 12:56   ` Michal Hocko
  4 siblings, 0 replies; 29+ messages in thread
From: Pankaj Gupta @ 2020-03-09 10:13 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, linux-mm, Andrew Morton, mhocko, David Hildenbrand,
	richardw.yang, dan.j.williams, osalvador, Mike Rapoport

>
> In section_deactivate(), pfn_to_page() doesn't work any more after
> ms->section_mem_map is resetting to NULL in SPARSEMEM|!VMEMMAP case.
> It caused hot remove failure:
>
> kernel BUG at mm/page_alloc.c:4806!
> invalid opcode: 0000 [#1] SMP PTI
> CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G        W         5.5.0-next-20200205+ #340
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
> Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> RIP: 0010:free_pages+0x85/0xa0
> Call Trace:
>  __remove_pages+0x99/0xc0
>  arch_remove_memory+0x23/0x4d
>  try_remove_memory+0xc8/0x130
>  ? walk_memory_blocks+0x72/0xa0
>  __remove_memory+0xa/0x11
>  acpi_memory_device_remove+0x72/0x100
>  acpi_bus_trim+0x55/0x90
>  acpi_device_hotplug+0x2eb/0x3d0
>  acpi_hotplug_work_fn+0x1a/0x30
>  process_one_work+0x1a7/0x370
>  worker_thread+0x30/0x380
>  ? flush_rcu_work+0x30/0x30
>  kthread+0x112/0x130
>  ? kthread_create_on_node+0x60/0x60
>  ret_from_fork+0x35/0x40
>
> Let's move the ->section_mem_map resetting after depopulate_section_memmap()
> to fix it.
>
> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> Signed-off-by: Baoquan He <bhe@redhat.com>
> Cc: stable@vger.kernel.org
> ---
>  mm/sparse.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 42c18a38ffaa..1b50c15677d7 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -734,6 +734,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>         struct mem_section *ms = __pfn_to_section(pfn);
>         bool section_is_early = early_section(ms);
>         struct page *memmap = NULL;
> +       bool empty = false;
>         unsigned long *subsection_map = ms->usage
>                 ? &ms->usage->subsection_map[0] : NULL;
>
> @@ -764,7 +765,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>          * For 2/ and 3/ the SPARSEMEM_VMEMMAP={y,n} cases are unified
>          */
>         bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
> -       if (bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION)) {
> +       empty = bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION);
> +       if (empty) {
>                 unsigned long section_nr = pfn_to_section_nr(pfn);
>
>                 /*
> @@ -779,13 +781,15 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>                         ms->usage = NULL;
>                 }
>                 memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> -               ms->section_mem_map = (unsigned long)NULL;
>         }
>
>         if (section_is_early && memmap)
>                 free_map_bootmem(memmap);
>         else
>                 depopulate_section_memmap(pfn, nr_pages, altmap);
> +
> +       if (empty)
> +               ms->section_mem_map = (unsigned long)NULL;
>  }
>
>  static struct page * __meminit section_activate(int nid, unsigned long pfn,
> --

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>

> 2.17.2
>
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
  2020-03-07  8:42 ` [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case Baoquan He
                     ` (3 preceding siblings ...)
  2020-03-09 10:13   ` Pankaj Gupta
@ 2020-03-09 12:56   ` Michal Hocko
  4 siblings, 0 replies; 29+ messages in thread
From: Michal Hocko @ 2020-03-09 12:56 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, linux-mm, akpm, david, richardw.yang,
	dan.j.williams, osalvador, rppt

On Sat 07-03-20 16:42:23, Baoquan He wrote:
> In section_deactivate(), pfn_to_page() doesn't work any more after
> ms->section_mem_map is resetting to NULL in SPARSEMEM|!VMEMMAP case.
> It caused hot remove failure:
> 
> kernel BUG at mm/page_alloc.c:4806!
> invalid opcode: 0000 [#1] SMP PTI
> CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G        W         5.5.0-next-20200205+ #340
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
> Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> RIP: 0010:free_pages+0x85/0xa0
> Call Trace:
>  __remove_pages+0x99/0xc0
>  arch_remove_memory+0x23/0x4d
>  try_remove_memory+0xc8/0x130
>  ? walk_memory_blocks+0x72/0xa0
>  __remove_memory+0xa/0x11
>  acpi_memory_device_remove+0x72/0x100
>  acpi_bus_trim+0x55/0x90
>  acpi_device_hotplug+0x2eb/0x3d0
>  acpi_hotplug_work_fn+0x1a/0x30
>  process_one_work+0x1a7/0x370
>  worker_thread+0x30/0x380
>  ? flush_rcu_work+0x30/0x30
>  kthread+0x112/0x130
>  ? kthread_create_on_node+0x60/0x60
>  ret_from_fork+0x35/0x40
> 
> Let's move the ->section_mem_map resetting after depopulate_section_memmap()
> to fix it.
> 
> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> Signed-off-by: Baoquan He <bhe@redhat.com>
> Cc: stable@vger.kernel.org

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/sparse.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 42c18a38ffaa..1b50c15677d7 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -734,6 +734,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>  	struct mem_section *ms = __pfn_to_section(pfn);
>  	bool section_is_early = early_section(ms);
>  	struct page *memmap = NULL;
> +	bool empty = false;
>  	unsigned long *subsection_map = ms->usage
>  		? &ms->usage->subsection_map[0] : NULL;
>  
> @@ -764,7 +765,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>  	 * For 2/ and 3/ the SPARSEMEM_VMEMMAP={y,n} cases are unified
>  	 */
>  	bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
> -	if (bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION)) {
> +	empty = bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION);
> +	if (empty) {
>  		unsigned long section_nr = pfn_to_section_nr(pfn);
>  
>  		/*
> @@ -779,13 +781,15 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>  			ms->usage = NULL;
>  		}
>  		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> -		ms->section_mem_map = (unsigned long)NULL;
>  	}
>  
>  	if (section_is_early && memmap)
>  		free_map_bootmem(memmap);
>  	else
>  		depopulate_section_memmap(pfn, nr_pages, altmap);
> +
> +	if (empty)
> +		ms->section_mem_map = (unsigned long)NULL;
>  }
>  
>  static struct page * __meminit section_activate(int nid, unsigned long pfn,
> -- 
> 2.17.2
> 

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
  2020-03-09  8:58   ` David Hildenbrand
@ 2020-03-09 13:18     ` Baoquan He
  2020-03-09 13:22       ` David Hildenbrand
  0 siblings, 1 reply; 29+ messages in thread
From: Baoquan He @ 2020-03-09 13:18 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, akpm, mhocko, richardw.yang,
	dan.j.williams, osalvador, rppt

On 03/09/20 at 09:58am, David Hildenbrand wrote:
> On 07.03.20 09:42, Baoquan He wrote:
> > In section_deactivate(), pfn_to_page() doesn't work any more after
> > ms->section_mem_map is resetting to NULL in SPARSEMEM|!VMEMMAP case.
> > It caused hot remove failure:
> > 
> > kernel BUG at mm/page_alloc.c:4806!
> > invalid opcode: 0000 [#1] SMP PTI
> > CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G        W         5.5.0-next-20200205+ #340
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
> > Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> > RIP: 0010:free_pages+0x85/0xa0
> > Call Trace:
> >  __remove_pages+0x99/0xc0
> >  arch_remove_memory+0x23/0x4d
> >  try_remove_memory+0xc8/0x130
> >  ? walk_memory_blocks+0x72/0xa0
> >  __remove_memory+0xa/0x11
> >  acpi_memory_device_remove+0x72/0x100
> >  acpi_bus_trim+0x55/0x90
> >  acpi_device_hotplug+0x2eb/0x3d0
> >  acpi_hotplug_work_fn+0x1a/0x30
> >  process_one_work+0x1a7/0x370
> >  worker_thread+0x30/0x380
> >  ? flush_rcu_work+0x30/0x30
> >  kthread+0x112/0x130
> >  ? kthread_create_on_node+0x60/0x60
> >  ret_from_fork+0x35/0x40
> > 
> > Let's move the ->section_mem_map resetting after depopulate_section_memmap()
> > to fix it.
> > 
> > Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> > Cc: stable@vger.kernel.org
> > ---
> >  mm/sparse.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index 42c18a38ffaa..1b50c15677d7 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -734,6 +734,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> >  	struct mem_section *ms = __pfn_to_section(pfn);
> >  	bool section_is_early = early_section(ms);
> >  	struct page *memmap = NULL;
> > +	bool empty = false;
> 
> Oh, one NIT: no need to initialize empty to false.

Thanks for the careful review, David.

I'm not very sure about this; do you have a doc or a discussion thread
about not initializing local variables? Maybe Andrew can help update it
if this is not recommended.



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
  2020-03-09 13:18     ` Baoquan He
@ 2020-03-09 13:22       ` David Hildenbrand
  0 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2020-03-09 13:22 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, linux-mm, akpm, mhocko, richardw.yang,
	dan.j.williams, osalvador, rppt

On 09.03.20 14:18, Baoquan He wrote:
> On 03/09/20 at 09:58am, David Hildenbrand wrote:
>> On 07.03.20 09:42, Baoquan He wrote:
>>> In section_deactivate(), pfn_to_page() doesn't work any more after
>>> ms->section_mem_map is resetting to NULL in SPARSEMEM|!VMEMMAP case.
>>> It caused hot remove failure:
>>>
>>> kernel BUG at mm/page_alloc.c:4806!
>>> invalid opcode: 0000 [#1] SMP PTI
>>> CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G        W         5.5.0-next-20200205+ #340
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
>>> Workqueue: kacpi_hotplug acpi_hotplug_work_fn
>>> RIP: 0010:free_pages+0x85/0xa0
>>> Call Trace:
>>>  __remove_pages+0x99/0xc0
>>>  arch_remove_memory+0x23/0x4d
>>>  try_remove_memory+0xc8/0x130
>>>  ? walk_memory_blocks+0x72/0xa0
>>>  __remove_memory+0xa/0x11
>>>  acpi_memory_device_remove+0x72/0x100
>>>  acpi_bus_trim+0x55/0x90
>>>  acpi_device_hotplug+0x2eb/0x3d0
>>>  acpi_hotplug_work_fn+0x1a/0x30
>>>  process_one_work+0x1a7/0x370
>>>  worker_thread+0x30/0x380
>>>  ? flush_rcu_work+0x30/0x30
>>>  kthread+0x112/0x130
>>>  ? kthread_create_on_node+0x60/0x60
>>>  ret_from_fork+0x35/0x40
>>>
>>> Let's move the ->section_mem_map resetting after depopulate_section_memmap()
>>> to fix it.
>>>
>>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>>> Signed-off-by: Baoquan He <bhe@redhat.com>
>>> Cc: stable@vger.kernel.org
>>> ---
>>>  mm/sparse.c | 8 ++++++--
>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/sparse.c b/mm/sparse.c
>>> index 42c18a38ffaa..1b50c15677d7 100644
>>> --- a/mm/sparse.c
>>> +++ b/mm/sparse.c
>>> @@ -734,6 +734,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>>>  	struct mem_section *ms = __pfn_to_section(pfn);
>>>  	bool section_is_early = early_section(ms);
>>>  	struct page *memmap = NULL;
>>> +	bool empty = false;
>>
>> Oh, one NIT: no need to initialize empty to false.
> 
> Thanks for careful reviewing, David.
> 
> Not very sure about this, do you have a doc or discussion thread about
> not initializing local variable? Maybe Andrew can help update it if this
> is not suggested. 

The general rule is to not initialize what will always be initialized
later. Compare with most other code in-tree - e.g., sparse_init_nid().

Makes the code usually easier to follow.
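
For illustration, a minimal sketch of that style (a hypothetical helper,
not code from this series): when a local is assigned on every path
before it is read, an initializer only hides the data flow.

/* Hypothetical example only, loosely based on the helpers in this series. */
static bool demo_is_empty(struct mem_section *ms)
{
        bool empty;     /* no "= false" needed */

        if (!ms->usage)
                return true;    /* 'empty' is never read on this path */

        empty = bitmap_empty(&ms->usage->subsection_map[0],
                             SUBSECTIONS_PER_SECTION);
        return empty;
}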

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 3/7] mm/sparse.c: introduce a new function clear_subsection_map()
  2020-03-09  8:59   ` David Hildenbrand
@ 2020-03-09 13:32     ` Baoquan He
  2020-03-09 13:38       ` David Hildenbrand
  0 siblings, 1 reply; 29+ messages in thread
From: Baoquan He @ 2020-03-09 13:32 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, akpm, mhocko, richardw.yang,
	dan.j.williams, osalvador, rppt

On 03/09/20 at 09:59am, David Hildenbrand wrote:
> On 07.03.20 09:42, Baoquan He wrote:
> > Factor out the code which clear subsection map of one memory region from
> > section_deactivate() into clear_subsection_map().
> > 
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> > ---
> >  mm/sparse.c | 31 ++++++++++++++++++++++++-------
> >  1 file changed, 24 insertions(+), 7 deletions(-)
> > 
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index e37c0abcdc89..d9dcd58d5c1d 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -726,15 +726,11 @@ static void free_map_bootmem(struct page *memmap)
> >  }
> >  #endif /* CONFIG_SPARSEMEM_VMEMMAP */
> >  
> > -static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> > -		struct vmem_altmap *altmap)
> > +static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
> >  {
> >  	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
> >  	DECLARE_BITMAP(tmp, SUBSECTIONS_PER_SECTION) = { 0 };
> >  	struct mem_section *ms = __pfn_to_section(pfn);
> > -	bool section_is_early = early_section(ms);
> > -	struct page *memmap = NULL;
> > -	bool empty = false;
> >  	unsigned long *subsection_map = ms->usage
> >  		? &ms->usage->subsection_map[0] : NULL;
> >  
> > @@ -745,8 +741,31 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> >  	if (WARN(!subsection_map || !bitmap_equal(tmp, map, SUBSECTIONS_PER_SECTION),
> >  				"section already deactivated (%#lx + %ld)\n",
> >  				pfn, nr_pages))
> > +		return -EINVAL;
> > +
> > +	bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
> > +
> 
> Nit: I'd drop this line.

It's fine with me. I usually keep one blank line before the return. I
will remove it in the next update.

> 
> > +	return 0;
> > +}
> > +
> > +static bool is_subsection_map_empty(struct mem_section *ms)
> > +{
> > +	return bitmap_empty(&ms->usage->subsection_map[0],
> > +			    SUBSECTIONS_PER_SECTION);
> > +}
> > +
> > +static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> > +		struct vmem_altmap *altmap)
> > +{
> > +	struct mem_section *ms = __pfn_to_section(pfn);
> > +	bool section_is_early = early_section(ms);
> > +	struct page *memmap = NULL;
> > +	bool empty = false;
> 
> Nit: No need to initialize empty.

This is inherited from patch 1.

> 
> > +
> > +	if (clear_subsection_map(pfn, nr_pages))
> >  		return;
> >  
> 
> Nit: I'd drop this empty line.
> 
> > +	empty = is_subsection_map_empty(ms);
> >  	/*
> >  	 * There are 3 cases to handle across two configurations
> >  	 * (SPARSEMEM_VMEMMAP={y,n}):
> > @@ -764,8 +783,6 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> >  	 *
> >  	 * For 2/ and 3/ the SPARSEMEM_VMEMMAP={y,n} cases are unified
> >  	 */
> > -	bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
> > -	empty = bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION);
> 
> I do wonder why you moved this up the comment?

Since 'empty' covers two places of handling, I moved it up; that seems
to be what I was thinking. I can move it back here.
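
For reference, the two places 'empty' ends up gating, abridged from
patch 1 plus this patch (the usage-freeing details are elided, so this
is a sketch rather than the verbatim result):

static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
                struct vmem_altmap *altmap)
{
        struct mem_section *ms = __pfn_to_section(pfn);
        bool section_is_early = early_section(ms);
        struct page *memmap = NULL;
        bool empty;

        if (clear_subsection_map(pfn, nr_pages))
                return;

        empty = is_subsection_map_empty(ms);
        if (empty) {
                /* free ms->usage and decode memmap from ->section_mem_map */
        }

        if (section_is_early && memmap)
                free_map_bootmem(memmap);
        else
                depopulate_section_memmap(pfn, nr_pages, altmap);

        if (empty)
                ms->section_mem_map = (unsigned long)NULL;
}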

> 
> >  	if (empty) {
> >  		unsigned long section_nr = pfn_to_section_nr(pfn);
> >  
> > 
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>
> 
> -- 
> Thanks,
> 
> David / dhildenb



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 3/7] mm/sparse.c: introduce a new function clear_subsection_map()
  2020-03-09 13:32     ` Baoquan He
@ 2020-03-09 13:38       ` David Hildenbrand
  2020-03-09 14:07         ` Baoquan He
  0 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand @ 2020-03-09 13:38 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, linux-mm, akpm, mhocko, richardw.yang,
	dan.j.williams, osalvador, rppt

On 09.03.20 14:32, Baoquan He wrote:
> On 03/09/20 at 09:59am, David Hildenbrand wrote:
>> On 07.03.20 09:42, Baoquan He wrote:
>>> Factor out the code which clear subsection map of one memory region from
>>> section_deactivate() into clear_subsection_map().
>>>
>>> Signed-off-by: Baoquan He <bhe@redhat.com>
>>> ---
>>>  mm/sparse.c | 31 ++++++++++++++++++++++++-------
>>>  1 file changed, 24 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/mm/sparse.c b/mm/sparse.c
>>> index e37c0abcdc89..d9dcd58d5c1d 100644
>>> --- a/mm/sparse.c
>>> +++ b/mm/sparse.c
>>> @@ -726,15 +726,11 @@ static void free_map_bootmem(struct page *memmap)
>>>  }
>>>  #endif /* CONFIG_SPARSEMEM_VMEMMAP */
>>>  
>>> -static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>>> -		struct vmem_altmap *altmap)
>>> +static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
>>>  {
>>>  	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
>>>  	DECLARE_BITMAP(tmp, SUBSECTIONS_PER_SECTION) = { 0 };
>>>  	struct mem_section *ms = __pfn_to_section(pfn);
>>> -	bool section_is_early = early_section(ms);
>>> -	struct page *memmap = NULL;
>>> -	bool empty = false;
>>>  	unsigned long *subsection_map = ms->usage
>>>  		? &ms->usage->subsection_map[0] : NULL;
>>>  
>>> @@ -745,8 +741,31 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>>>  	if (WARN(!subsection_map || !bitmap_equal(tmp, map, SUBSECTIONS_PER_SECTION),
>>>  				"section already deactivated (%#lx + %ld)\n",
>>>  				pfn, nr_pages))
>>> +		return -EINVAL;
>>> +
>>> +	bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
>>> +
>>
>> Nit: I'd drop this line.
> 
> It's fine with me. I usually keep one blank line before the return. I
> will remove it in the next update.
> 
>>
>>> +	return 0;
>>> +}
>>> +
>>> +static bool is_subsection_map_empty(struct mem_section *ms)
>>> +{
>>> +	return bitmap_empty(&ms->usage->subsection_map[0],
>>> +			    SUBSECTIONS_PER_SECTION);
>>> +}
>>> +
>>> +static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>>> +		struct vmem_altmap *altmap)
>>> +{
>>> +	struct mem_section *ms = __pfn_to_section(pfn);
>>> +	bool section_is_early = early_section(ms);
>>> +	struct page *memmap = NULL;
>>> +	bool empty = false;
>>
>> Nit: No need to initialize empty.
> 
> This is inherited from patch 1.
> 
>>
>>> +
>>> +	if (clear_subsection_map(pfn, nr_pages))
>>>  		return;
>>>  
>>
>> Nit: I'd drop this empty line.
>>
>>> +	empty = is_subsection_map_empty(ms);
>>>  	/*
>>>  	 * There are 3 cases to handle across two configurations
>>>  	 * (SPARSEMEM_VMEMMAP={y,n}):
>>> @@ -764,8 +783,6 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>>>  	 *
>>>  	 * For 2/ and 3/ the SPARSEMEM_VMEMMAP={y,n} cases are unified
>>>  	 */
>>> -	bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
>>> -	empty = bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION);
>>
>> I do wonder why you moved this up the comment?
> 
> Since 'empty' covers two places of handling, I moved it up; that seems
> to be what I was thinking. I can move it back here.

You're moving the whole comment later, was just wondering (makes it
slightly harder to review).


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 6/7] mm/sparse.c: move subsection_map related codes together
  2020-03-09  9:08   ` David Hildenbrand
@ 2020-03-09 13:41     ` Baoquan He
  0 siblings, 0 replies; 29+ messages in thread
From: Baoquan He @ 2020-03-09 13:41 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, akpm, mhocko, dan.j.williams, osalvador, rppt

On 03/09/20 at 10:08am, David Hildenbrand wrote:
> On 07.03.20 09:42, Baoquan He wrote:
> > No functional change.
> > 
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> > ---
> >  mm/sparse.c | 134 +++++++++++++++++++++++++---------------------------
> >  1 file changed, 65 insertions(+), 69 deletions(-)
 
> 
> IMHO, we don't need this patch - but just my personal opinion. Change
> itself looks good on a quick glance.

I personally like seeing the set of functions that operate on one data
structure kept together. I use vi+ctags+cscope to jump to a called
function easily, and when trying to get a picture of a data structure
and its handling, e.g. here the subsection map and the relevant
functions, having them together makes the code easier to understand. I
am also fine with dropping this patch; no other patch in this series
depends on it, so it's easy to leave out if no one likes it.



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 3/7] mm/sparse.c: introduce a new function clear_subsection_map()
  2020-03-09 13:38       ` David Hildenbrand
@ 2020-03-09 14:07         ` Baoquan He
  0 siblings, 0 replies; 29+ messages in thread
From: Baoquan He @ 2020-03-09 14:07 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, akpm, mhocko, richardw.yang,
	dan.j.williams, osalvador, rppt

On 03/09/20 at 02:38pm, David Hildenbrand wrote:
> On 09.03.20 14:32, Baoquan He wrote:
> > On 03/09/20 at 09:59am, David Hildenbrand wrote:
> >> On 07.03.20 09:42, Baoquan He wrote:
> >>> Factor out the code which clear subsection map of one memory region from
> >>> section_deactivate() into clear_subsection_map().
> >>>
> >>> Signed-off-by: Baoquan He <bhe@redhat.com>
> >>> ---
> >>>  mm/sparse.c | 31 ++++++++++++++++++++++++-------
> >>>  1 file changed, 24 insertions(+), 7 deletions(-)
> >>>
> >>> diff --git a/mm/sparse.c b/mm/sparse.c
> >>> index e37c0abcdc89..d9dcd58d5c1d 100644
> >>> --- a/mm/sparse.c
> >>> +++ b/mm/sparse.c
> >>> @@ -726,15 +726,11 @@ static void free_map_bootmem(struct page *memmap)
> >>>  }
> >>>  #endif /* CONFIG_SPARSEMEM_VMEMMAP */
> >>>  
> >>> -static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> >>> -		struct vmem_altmap *altmap)
> >>> +static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
> >>>  {
> >>>  	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
> >>>  	DECLARE_BITMAP(tmp, SUBSECTIONS_PER_SECTION) = { 0 };
> >>>  	struct mem_section *ms = __pfn_to_section(pfn);
> >>> -	bool section_is_early = early_section(ms);
> >>> -	struct page *memmap = NULL;
> >>> -	bool empty = false;
> >>>  	unsigned long *subsection_map = ms->usage
> >>>  		? &ms->usage->subsection_map[0] : NULL;
> >>>  
> >>> @@ -745,8 +741,31 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> >>>  	if (WARN(!subsection_map || !bitmap_equal(tmp, map, SUBSECTIONS_PER_SECTION),
> >>>  				"section already deactivated (%#lx + %ld)\n",
> >>>  				pfn, nr_pages))
> >>> +		return -EINVAL;
> >>> +
> >>> +	bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
> >>> +
> >>
> >> Nit: I'd drop this line.
> > 
> > It's fine with me. I usually keep one blank line before the return. I
> > will remove it in the next update.
> > 
> >>
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +static bool is_subsection_map_empty(struct mem_section *ms)
> >>> +{
> >>> +	return bitmap_empty(&ms->usage->subsection_map[0],
> >>> +			    SUBSECTIONS_PER_SECTION);
> >>> +}
> >>> +
> >>> +static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> >>> +		struct vmem_altmap *altmap)
> >>> +{
> >>> +	struct mem_section *ms = __pfn_to_section(pfn);
> >>> +	bool section_is_early = early_section(ms);
> >>> +	struct page *memmap = NULL;
> >>> +	bool empty = false;
> >>
> >> Nit: No need to initialize empty.
> > 
> > This is inherited from patch 1.
> > 
> >>
> >>> +
> >>> +	if (clear_subsection_map(pfn, nr_pages))
> >>>  		return;
> >>>  
> >>
> >> Nit: I'd drop this empty line.
> >>
> >>> +	empty = is_subsection_map_empty(ms);
> >>>  	/*
> >>>  	 * There are 3 cases to handle across two configurations
> >>>  	 * (SPARSEMEM_VMEMMAP={y,n}):
> >>> @@ -764,8 +783,6 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> >>>  	 *
> >>>  	 * For 2/ and 3/ the SPARSEMEM_VMEMMAP={y,n} cases are unified
> >>>  	 */
> >>> -	bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
> >>> -	empty = bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION);
> >>
> >> I do wonder why you moved this up the comment?
> > 
> > Since 'empty' covers two places of handling, I moved it up; that seems
> > to be what I was thinking. I can move it back here.
> 
> You're moving the whole comment later, was just wondering (makes it
> slightly harder to review).

I see, sorry for the confusion. I will move it back when I repost.



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 5/7] mm/sparse.c: add note about only VMEMMAP supporting sub-section support
  2020-03-07  8:42 ` [PATCH v3 5/7] mm/sparse.c: add note about only VMEMMAP supporting sub-section support Baoquan He
  2020-03-07 11:55   ` Baoquan He
@ 2020-03-10 14:46   ` Michal Hocko
  2020-03-11  4:20     ` Baoquan He
  1 sibling, 1 reply; 29+ messages in thread
From: Michal Hocko @ 2020-03-10 14:46 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, linux-mm, akpm, david, richardw.yang,
	dan.j.williams, osalvador, rppt

On Sat 07-03-20 16:42:27, Baoquan He wrote:
> And tell check_pfn_span() gating the porper alignment and size of
> hot added memory region.
> 
> And also move the code comments from inside section_deactivate()
> to being above it. The code comments are reasonable for the whole
> function, and the moving makes code cleaner.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>

Acked-by: Michal Hocko <mhocko@suse.com>

I have glanced through other patches and they seem sane but I do not
have time to go deeper to give an ack. I like this one though because it
really makes the intention clearer.
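
For reference, a simplified sketch of the gating that check_pfn_span()
performs (paraphrased, not the verbatim kernel code): sub-section
granularity is only accepted when SPARSEMEM_VMEMMAP is enabled,
otherwise whole-section alignment is required.

static int check_pfn_span(unsigned long pfn, unsigned long nr_pages,
                          const char *reason)
{
        unsigned long min_align;

        if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
                min_align = PAGES_PER_SUBSECTION;
        else
                min_align = PAGES_PER_SECTION;

        if (!IS_ALIGNED(pfn, min_align) || !IS_ALIGNED(nr_pages, min_align)) {
                WARN(1, "%s: misaligned range [%#lx, +%lx)\n",
                     reason, pfn, nr_pages);
                return -EINVAL;
        }
        return 0;
}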

> ---
>  mm/sparse.c | 37 ++++++++++++++++++++-----------------
>  1 file changed, 20 insertions(+), 17 deletions(-)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 2142045ab5c5..0fbd79c4ad81 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -772,6 +772,22 @@ static bool is_subsection_map_empty(struct mem_section *ms)
>  }
>  #endif
>  
> +/*
> + * To deactivate a memory region, there are 3 cases to handle across
> + * two configurations (SPARSEMEM_VMEMMAP={y,n}):
> + *
> + * 1. deactivation of a partial hot-added section (only possible in
> + *    the SPARSEMEM_VMEMMAP=y case).
> + *      a) section was present at memory init.
> + *      b) section was hot-added post memory init.
> + * 2. deactivation of a complete hot-added section.
> + * 3. deactivation of a complete section from memory init.
> + *
> + * For 1, when subsection_map does not empty we will not be freeing the
> + * usage map, but still need to free the vmemmap range.
> + *
> + * For 2 and 3, the SPARSEMEM_VMEMMAP={y,n} cases are unified
> + */
>  static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>  		struct vmem_altmap *altmap)
>  {
> @@ -784,23 +800,6 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>  		return;
>  
>  	empty = is_subsection_map_empty(ms);
> -	/*
> -	 * There are 3 cases to handle across two configurations
> -	 * (SPARSEMEM_VMEMMAP={y,n}):
> -	 *
> -	 * 1/ deactivation of a partial hot-added section (only possible
> -	 * in the SPARSEMEM_VMEMMAP=y case).
> -	 *    a/ section was present at memory init
> -	 *    b/ section was hot-added post memory init
> -	 * 2/ deactivation of a complete hot-added section
> -	 * 3/ deactivation of a complete section from memory init
> -	 *
> -	 * For 1/, when subsection_map does not empty we will not be
> -	 * freeing the usage map, but still need to free the vmemmap
> -	 * range.
> -	 *
> -	 * For 2/ and 3/ the SPARSEMEM_VMEMMAP={y,n} cases are unified
> -	 */
>  	if (empty) {
>  		unsigned long section_nr = pfn_to_section_nr(pfn);
>  
> @@ -907,6 +906,10 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
>   *
>   * This is only intended for hotplug.
>   *
> + * Note that only VMEMMAP supports sub-section aligned hotplug,
> + * the proper alignment and size are gated by check_pfn_span().
> + *
> + *
>   * Return:
>   * * 0		- On success.
>   * * -EEXIST	- Section has been present.
> -- 
> 2.17.2
> 

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 7/7] mm/sparse.c: Use __get_free_pages() instead in populate_section_memmap()
  2020-03-07  8:42 ` [PATCH v3 7/7] mm/sparse.c: Use __get_free_pages() instead in populate_section_memmap() Baoquan He
@ 2020-03-10 14:56   ` Michal Hocko
  2020-03-10 14:59     ` David Hildenbrand
  2020-03-11  9:31     ` Baoquan He
  0 siblings, 2 replies; 29+ messages in thread
From: Michal Hocko @ 2020-03-10 14:56 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, linux-mm, akpm, david, richardw.yang,
	dan.j.williams, osalvador, rppt

On Sat 07-03-20 16:42:29, Baoquan He wrote:
> This removes the unnecessary goto, and simplify codes.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
> ---
>  mm/sparse.c | 16 ++++++----------
>  1 file changed, 6 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index fde651ab8741..266f7f5040fb 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -735,23 +735,19 @@ static void free_map_bootmem(struct page *memmap)
>  struct page * __meminit populate_section_memmap(unsigned long pfn,
>  		unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
>  {
> -	struct page *page, *ret;
> +	struct page *ret;
>  	unsigned long memmap_size = sizeof(struct page) * PAGES_PER_SECTION;
>  
> -	page = alloc_pages(GFP_KERNEL|__GFP_NOWARN, get_order(memmap_size));
> -	if (page)
> -		goto got_map_page;
> +	ret = (void*)__get_free_pages(GFP_KERNEL|__GFP_NOWARN,
> +				get_order(memmap_size));
> +	if (ret)
> +		return ret;
>  
>  	ret = vmalloc(memmap_size);
>  	if (ret)
> -		goto got_map_ptr;
> +		return ret;
>  
>  	return NULL;
> -got_map_page:
> -	ret = (struct page *)pfn_to_kaddr(page_to_pfn(page));
> -got_map_ptr:
> -
> -	return ret;
>  }

Boy this code is ugly. Is there any reason we cannot simply use
kvmalloc_array(PAGES_PER_SECTION, sizeof(struct page), GFP_KERNEL | __GFP_NOWARN)

And if we care about locality then go even one step further
kvmalloc_node(PAGES_PER_SECTION * sizeof(struct page), GFP_KERNEL | __GFP_NOWARN, nid)
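
To make that concrete, a rough sketch of the direction (untested;
kvmalloc_node() falls back to vmalloc() on its own, and the free side
would then want kvfree() instead of the free_pages()/vfree() pair):

struct page * __meminit populate_section_memmap(unsigned long pfn,
                unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
{
        return kvmalloc_node(array_size(sizeof(struct page),
                                        PAGES_PER_SECTION),
                             GFP_KERNEL | __GFP_NOWARN, nid);
}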

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 7/7] mm/sparse.c: Use __get_free_pages() instead in populate_section_memmap()
  2020-03-10 14:56   ` Michal Hocko
@ 2020-03-10 14:59     ` David Hildenbrand
  2020-03-11  9:31     ` Baoquan He
  1 sibling, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2020-03-10 14:59 UTC (permalink / raw)
  To: Michal Hocko, Baoquan He
  Cc: linux-kernel, linux-mm, akpm, richardw.yang, dan.j.williams,
	osalvador, rppt

On 10.03.20 15:56, Michal Hocko wrote:
> On Sat 07-03-20 16:42:29, Baoquan He wrote:
>> This removes the unnecessary goto, and simplify codes.
>>
>> Signed-off-by: Baoquan He <bhe@redhat.com>
>> Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
>> ---
>>  mm/sparse.c | 16 ++++++----------
>>  1 file changed, 6 insertions(+), 10 deletions(-)
>>
>> diff --git a/mm/sparse.c b/mm/sparse.c
>> index fde651ab8741..266f7f5040fb 100644
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -735,23 +735,19 @@ static void free_map_bootmem(struct page *memmap)
>>  struct page * __meminit populate_section_memmap(unsigned long pfn,
>>  		unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
>>  {
>> -	struct page *page, *ret;
>> +	struct page *ret;
>>  	unsigned long memmap_size = sizeof(struct page) * PAGES_PER_SECTION;
>>  
>> -	page = alloc_pages(GFP_KERNEL|__GFP_NOWARN, get_order(memmap_size));
>> -	if (page)
>> -		goto got_map_page;
>> +	ret = (void*)__get_free_pages(GFP_KERNEL|__GFP_NOWARN,
>> +				get_order(memmap_size));
>> +	if (ret)
>> +		return ret;
>>  
>>  	ret = vmalloc(memmap_size);
>>  	if (ret)
>> -		goto got_map_ptr;
>> +		return ret;
>>  
>>  	return NULL;
>> -got_map_page:
>> -	ret = (struct page *)pfn_to_kaddr(page_to_pfn(page));
>> -got_map_ptr:
>> -
>> -	return ret;
>>  }
> 
> Boy this code is ugly. Is there any reason we cannot simply use
> kvmalloc_array(PAGES_PER_SECTION, sizeof(struct page), GFP_KERNEL | __GFP_NOWARN)
> 
> And if we care about locality then go even one step further
> kvmalloc_node(PAGES_PER_SECTION * sizeof(struct page), GFP_KERNEL | __GFP_NOWARN, nid)
> 

Makes perfect sense to me.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 5/7] mm/sparse.c: add note about only VMEMMAP supporting sub-section support
  2020-03-10 14:46   ` Michal Hocko
@ 2020-03-11  4:20     ` Baoquan He
  0 siblings, 0 replies; 29+ messages in thread
From: Baoquan He @ 2020-03-11  4:20 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, akpm, david, richardw.yang,
	dan.j.williams, osalvador, rppt

On 03/10/20 at 03:46pm, Michal Hocko wrote:
> On Sat 07-03-20 16:42:27, Baoquan He wrote:
> > And tell check_pfn_span() gating the porper alignment and size of
> > hot added memory region.
> > 
> > And also move the code comments from inside section_deactivate()
> > to being above it. The code comments are reasonable for the whole
> > function, and the moving makes code cleaner.
> > 
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> 
> Acked-by: Michal Hocko <mhocko@suse.com>
> 
> I have glanced through other patches and they seem sane but I do not
> have time to go deeper to give an ack. I like this one though because it
> really makes the intention clearer.

Thanks for your review and for providing an ack on this patch.

I will post a new version rebased on top of patch 1 and its appended
fix, and then address David's concerns.



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 7/7] mm/sparse.c: Use __get_free_pages() instead in populate_section_memmap()
  2020-03-10 14:56   ` Michal Hocko
  2020-03-10 14:59     ` David Hildenbrand
@ 2020-03-11  9:31     ` Baoquan He
  1 sibling, 0 replies; 29+ messages in thread
From: Baoquan He @ 2020-03-11  9:31 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, akpm, david, richardw.yang,
	dan.j.williams, osalvador, rppt

On 03/10/20 at 03:56pm, Michal Hocko wrote:
> On Sat 07-03-20 16:42:29, Baoquan He wrote:
> > This removes the unnecessary goto, and simplify codes.
> > 
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> > Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
> > ---
> >  mm/sparse.c | 16 ++++++----------
> >  1 file changed, 6 insertions(+), 10 deletions(-)
> > 
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index fde651ab8741..266f7f5040fb 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -735,23 +735,19 @@ static void free_map_bootmem(struct page *memmap)
> >  struct page * __meminit populate_section_memmap(unsigned long pfn,
> >  		unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
> >  {
> > -	struct page *page, *ret;
> > +	struct page *ret;
> >  	unsigned long memmap_size = sizeof(struct page) * PAGES_PER_SECTION;
> >  
> > -	page = alloc_pages(GFP_KERNEL|__GFP_NOWARN, get_order(memmap_size));
> > -	if (page)
> > -		goto got_map_page;
> > +	ret = (void*)__get_free_pages(GFP_KERNEL|__GFP_NOWARN,
> > +				get_order(memmap_size));
> > +	if (ret)
> > +		return ret;
> >  
> >  	ret = vmalloc(memmap_size);
> >  	if (ret)
> > -		goto got_map_ptr;
> > +		return ret;
> >  
> >  	return NULL;
> > -got_map_page:
> > -	ret = (struct page *)pfn_to_kaddr(page_to_pfn(page));
> > -got_map_ptr:
> > -
> > -	return ret;
> >  }
> 
> Boy this code is ugly. Is there any reason we cannot simply use
> kvmalloc_array(PAGES_PER_SECTION, sizeof(struct page), GFP_KERNEL | __GFP_NOWARN)
> 
> And if we care about locality then go even one step further
> kvmalloc_node(PAGES_PER_SECTION * sizeof(struct page), GFP_KERNEL | __GFP_NOWARN, nid)

Yes, this looks better. I will use this to make a new version. Thanks.
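
A matching sketch for the free side, assuming the allocation moves to
kvmalloc_node() as suggested (untested; kvfree() copes with both the
page-allocator and vmalloc cases, and pfn_to_page() still works at this
point because ->section_mem_map is only reset afterwards, per patch 1):

static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
                struct vmem_altmap *altmap)
{
        kvfree(pfn_to_page(pfn));
}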



^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2020-03-11  9:31 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-07  8:42 [PATCH v3 0/7] mm/hotplug: Only use subsection map for VMEMMAP Baoquan He
2020-03-07  8:42 ` [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case Baoquan He
2020-03-07 20:59   ` Andrew Morton
2020-03-07 22:55     ` Baoquan He
2020-03-09  8:56   ` David Hildenbrand
2020-03-09  8:58   ` David Hildenbrand
2020-03-09 13:18     ` Baoquan He
2020-03-09 13:22       ` David Hildenbrand
2020-03-09 10:13   ` Pankaj Gupta
2020-03-09 12:56   ` Michal Hocko
2020-03-07  8:42 ` [PATCH v3 2/7] mm/sparse.c: introduce new function fill_subsection_map() Baoquan He
2020-03-07  8:42 ` [PATCH v3 3/7] mm/sparse.c: introduce a new function clear_subsection_map() Baoquan He
2020-03-09  8:59   ` David Hildenbrand
2020-03-09 13:32     ` Baoquan He
2020-03-09 13:38       ` David Hildenbrand
2020-03-09 14:07         ` Baoquan He
2020-03-07  8:42 ` [PATCH v3 4/7] mm/sparse.c: only use subsection map in VMEMMAP case Baoquan He
2020-03-09  9:00   ` David Hildenbrand
2020-03-07  8:42 ` [PATCH v3 5/7] mm/sparse.c: add note about only VMEMMAP supporting sub-section support Baoquan He
2020-03-07 11:55   ` Baoquan He
2020-03-10 14:46   ` Michal Hocko
2020-03-11  4:20     ` Baoquan He
2020-03-07  8:42 ` [PATCH v3 6/7] mm/sparse.c: move subsection_map related codes together Baoquan He
2020-03-09  9:08   ` David Hildenbrand
2020-03-09 13:41     ` Baoquan He
2020-03-07  8:42 ` [PATCH v3 7/7] mm/sparse.c: Use __get_free_pages() instead in populate_section_memmap() Baoquan He
2020-03-10 14:56   ` Michal Hocko
2020-03-10 14:59     ` David Hildenbrand
2020-03-11  9:31     ` Baoquan He
