* [PATCH v6 00/12] mm: Sub-section memory hotplug support
@ 2019-04-17 18:38 ` Dan Williams
  0 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-04-17 18:38 UTC (permalink / raw)
  To: akpm
  Cc: mhocko, david, linux-nvdimm, stable, linux-kernel, linux-mm,
	Jérôme Glisse, Vlastimil Babka

Changes since v5 [1]:

- Rebase on next-20190416 and the new 'struct mhp_restrictions'
  infrastructure.

- Extend mhp_restrictions to the 'remove' case so the sub-section policy
  can be clarified with respect to the memblock-api symmetrically with the
  'add' case.

- Kill is_dev_zone() since cleanups have now made it moot

[1]: https://lwn.net/Articles/783808/

---

The memory hotplug section is an arbitrary / convenient unit for memory
hotplug. 'Section-size' units have bled into the user interface
('memblock' sysfs) and cannot be changed without breaking existing
userspace. The section-size constraint, while mostly benign for typical
memory hotplug, has wreaked and continues to wreak havoc on 'device-memory'
use cases, persistent memory (pmem) in particular. Recall that pmem uses
devm_memremap_pages(), and subsequently arch_add_memory(), to allocate a
'struct page' memmap for pmem. However, it does not use the 'bottom
half' of memory hotplug, i.e. never marks pmem pages online and never
exposes the userspace memblock interface for pmem. This leaves an
opening to redress the section-size constraint.

To date, the libnvdimm subsystem has attempted to inject padding to
satisfy the internal constraints of arch_add_memory(). Beyond
complicating the code, leading to bugs [2], wasting memory, and limiting
configuration flexibility, the padding hack is broken when the platform
changes the physical memory alignment of pmem from one boot to the
next. Device failure (intermittent or permanent) and physical
reconfiguration are events that can cause the platform firmware to
change the physical placement of pmem on a subsequent boot, and device
failure is an everyday event in a data-center.

It turns out that sections are only a hard requirement of the
user-facing interface for memory hotplug; with a bit more infrastructure,
sub-section support can be added to arch_add_memory() for kernel-internal
users like devm_memremap_pages(). Here is an analysis of the design
assumptions in the current code and how they are addressed in the new
implementation:

Current design assumptions:

- Sections that describe boot memory (early sections) are never
  unplugged / removed.

- pfn_valid(), in the CONFIG_SPARSEMEM_VMEMMAP=y case, devolves to a
  valid_section() check

- __add_pages() and helper routines assume all operations occur in
  PAGES_PER_SECTION units.

- The memblock sysfs interface only comprehends full sections

New design assumptions:

- Sections are instrumented with a sub-section bitmask to track (on x86)
  individual 2MB sub-divisions of a 128MB section (a worked example of this
  arithmetic follows this list).

- Partially populated early sections can be extended with additional
  sub-sections, and those sub-sections can be removed with
  arch_remove_memory(). With this in place we no longer lose usable memory
  capacity to padding.

- pfn_valid() is updated to look deeper than valid_section() and also check
  the active-sub-section mask. This indication lives in the same cacheline as
  the valid_section() data, so the performance impact is expected to be
  negligible. So far the lkp robot has not reported any regressions.

- Outside of the core vmemmap population routines which are replaced,
  other helper routines like shrink_{zone,pgdat}_span() are updated to
  handle the smaller granularity. Core memory hotplug routines that deal
  with online memory are not touched.

- The existing memblock sysfs user api guarantees / assumptions are
  not touched since this capability is limited to !online
  !memblock-sysfs-accessible sections.
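
As a sanity check on the sub-section arithmetic above, here is a small,
self-contained userspace sketch. It is illustrative only (not part of the
series) and assumes the x86_64 values used throughout these patches
(SECTION_SIZE_BITS=27, BITS_PER_LONG=64):

#include <stdio.h>

#define SECTION_SIZE_BITS	27	/* assumed: 128MB sections on x86_64 */
#define BITS_PER_LONG		64
#define SECTION_ACTIVE_SIZE	((1UL << SECTION_SIZE_BITS) / BITS_PER_LONG)

int main(void)
{
	/* 2MB of address space per bit in 'map_active' */
	printf("sub-section size: %lu MB\n", SECTION_ACTIVE_SIZE >> 20);
	/* 64 sub-sections per 128MB section */
	printf("sub-sections per section: %lu\n",
			(1UL << SECTION_SIZE_BITS) / SECTION_ACTIVE_SIZE);
	return 0;
}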

Meanwhile, issue reports continue to roll in from users who do not
understand when and how the 128MB constraint will bite them. The current
implementation relies on being able to support at least one misaligned
namespace, but that immediately falls over on any moderately complex
namespace creation attempt. Beyond the initial problem of 'System RAM'
colliding with pmem, and the unsolvable problem of physical alignment
changes, Linux is now being exposed to platforms that collide pmem
ranges with other pmem ranges by default [3]. In short,
devm_memremap_pages() has pushed the venerable section-size constraint
past the breaking point, and the simplicity of section-aligned
arch_add_memory() is no longer tenable.

These patches are exposed to the kbuild robot on my libnvdimm-pending
branch [4], and a preview of the unit test for this functionality is
available on the 'subsection-pending' branch of ndctl [5].

[2]: https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwillia2-desk3.amr.corp.intel.com
[3]: https://github.com/pmem/ndctl/issues/76
[4]: https://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git/log/?h=libnvdimm-pending
[5]: https://github.com/pmem/ndctl/commit/7c59b4867e1c

---

Dan Williams (12):
      mm/sparsemem: Introduce struct mem_section_usage
      mm/sparsemem: Introduce common definitions for the size and mask of a section
      mm/sparsemem: Add helpers track active portions of a section at boot
      mm/hotplug: Prepare shrink_{zone,pgdat}_span for sub-section removal
      mm/sparsemem: Convert kmalloc_section_memmap() to populate_section_memmap()
      mm/hotplug: Add mem-hotplug restrictions for remove_memory()
      mm: Kill is_dev_zone() helper
      mm/sparsemem: Prepare for sub-section ranges
      mm/sparsemem: Support sub-section hotplug
      mm/devm_memremap_pages: Enable sub-section remap
      libnvdimm/pfn: Fix fsdax-mode namespace info-block zero-fields
      libnvdimm/pfn: Stop padding pmem namespaces to section alignment


 arch/ia64/mm/init.c            |    4 
 arch/powerpc/mm/mem.c          |    5 -
 arch/s390/mm/init.c            |    2 
 arch/sh/mm/init.c              |    4 
 arch/x86/mm/init_32.c          |    4 
 arch/x86/mm/init_64.c          |    9 +
 drivers/nvdimm/dax_devs.c      |    2 
 drivers/nvdimm/pfn.h           |   12 -
 drivers/nvdimm/pfn_devs.c      |   93 +++-------
 include/linux/memory_hotplug.h |   12 +
 include/linux/mm.h             |    4 
 include/linux/mmzone.h         |   72 ++++++--
 kernel/memremap.c              |   70 +++-----
 mm/hmm.c                       |    2 
 mm/memory_hotplug.c            |  148 +++++++++-------
 mm/page_alloc.c                |    8 +
 mm/sparse-vmemmap.c            |   21 ++
 mm/sparse.c                    |  371 +++++++++++++++++++++++++++-------------
 18 files changed, 503 insertions(+), 340 deletions(-)

* [PATCH v6 01/12] mm/sparsemem: Introduce struct mem_section_usage
  2019-04-17 18:38 ` Dan Williams
@ 2019-04-17 18:39   ` Dan Williams
  -1 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-04-17 18:39 UTC (permalink / raw)
  To: akpm; +Cc: mhocko, linux-nvdimm, david, linux-kernel, linux-mm, Vlastimil Babka

Towards enabling memory hotplug to track partial population of a
section, introduce 'struct mem_section_usage'.

A pointer to a 'struct mem_section_usage' instance replaces the existing
pointer to a 'pageblock_flags' bitmap. Effectively it adds one more
'unsigned long' beyond the 'pageblock_flags' (usemap) allocation to
house a new 'map_active' bitmap.  The new bitmap enables the memory
hot{plug,remove} implementation to act on incremental sub-divisions of a
section.

The primary motivation for this functionality is to support platforms
that mix "System RAM" and "Persistent Memory" within a single section,
or multiple PMEM ranges with different mapping lifetimes within a single
section. The section restriction for hotplug has caused an ongoing saga
of hacks and bugs for devm_memremap_pages() users.

Beyond the fixups to teach existing paths how to retrieve the 'usemap'
from a section, and updates to the usemap allocation path, there are no
expected behavior changes.
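
For illustration only (not something this patch adds), a minimal userspace
model of the new layout, assuming the same field ordering as the kernel
structure in the diff below; it shows that the usemap simply trails the new
'map_active' word in a single allocation:

#include <stdio.h>
#include <stdlib.h>

struct mem_section_usage {
	unsigned long map_active;		/* new: one bit per sub-section */
	unsigned long pageblock_flags[];	/* old usemap, now trailing */
};

int main(void)
{
	/* mem_section_usage_size(): header plus usemap, allocated together */
	size_t usemap_bytes = 4 * sizeof(unsigned long);	/* arbitrary size */
	struct mem_section_usage *usage =
		calloc(1, sizeof(*usage) + usemap_bytes);

	if (!usage)
		return 1;
	usage->map_active |= 1UL << 3;	/* mark sub-section 3 populated */
	printf("map_active: %#lx, usemap longs: %zu\n", usage->map_active,
			usemap_bytes / sizeof(unsigned long));
	free(usage);
	return 0;
}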

Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/mmzone.h |   23 ++++++++++++--
 mm/memory_hotplug.c    |   18 ++++++-----
 mm/page_alloc.c        |    2 +
 mm/sparse.c            |   81 ++++++++++++++++++++++++------------------------
 4 files changed, 71 insertions(+), 53 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 70394cabaf4e..f0bbd85dc19a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1160,6 +1160,19 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec)
 #define SECTION_ALIGN_UP(pfn)	(((pfn) + PAGES_PER_SECTION - 1) & PAGE_SECTION_MASK)
 #define SECTION_ALIGN_DOWN(pfn)	((pfn) & PAGE_SECTION_MASK)
 
+#define SECTION_ACTIVE_SIZE ((1UL << SECTION_SIZE_BITS) / BITS_PER_LONG)
+#define SECTION_ACTIVE_MASK (~(SECTION_ACTIVE_SIZE - 1))
+
+struct mem_section_usage {
+	/*
+	 * SECTION_ACTIVE_SIZE portions of the section that are populated in
+	 * the memmap
+	 */
+	unsigned long map_active;
+	/* See declaration of similar field in struct zone */
+	unsigned long pageblock_flags[0];
+};
+
 struct page;
 struct page_ext;
 struct mem_section {
@@ -1177,8 +1190,7 @@ struct mem_section {
 	 */
 	unsigned long section_mem_map;
 
-	/* See declaration of similar field in struct zone */
-	unsigned long *pageblock_flags;
+	struct mem_section_usage *usage;
 #ifdef CONFIG_PAGE_EXTENSION
 	/*
 	 * If SPARSEMEM, pgdat doesn't have page_ext pointer. We use
@@ -1209,6 +1221,11 @@ extern struct mem_section **mem_section;
 extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT];
 #endif
 
+static inline unsigned long *section_to_usemap(struct mem_section *ms)
+{
+	return ms->usage->pageblock_flags;
+}
+
 static inline struct mem_section *__nr_to_section(unsigned long nr)
 {
 #ifdef CONFIG_SPARSEMEM_EXTREME
@@ -1220,7 +1237,7 @@ static inline struct mem_section *__nr_to_section(unsigned long nr)
 	return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK];
 }
 extern int __section_nr(struct mem_section* ms);
-extern unsigned long usemap_size(void);
+extern size_t mem_section_usage_size(void);
 
 /*
  * We use the lower bits of the mem_map pointer to store
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 52fef4a81e4c..8b7415736d21 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -165,9 +165,10 @@ void put_page_bootmem(struct page *page)
 #ifndef CONFIG_SPARSEMEM_VMEMMAP
 static void register_page_bootmem_info_section(unsigned long start_pfn)
 {
-	unsigned long *usemap, mapsize, section_nr, i;
+	unsigned long mapsize, section_nr, i;
 	struct mem_section *ms;
 	struct page *page, *memmap;
+	struct mem_section_usage *usage;
 
 	section_nr = pfn_to_section_nr(start_pfn);
 	ms = __nr_to_section(section_nr);
@@ -187,10 +188,10 @@ static void register_page_bootmem_info_section(unsigned long start_pfn)
 	for (i = 0; i < mapsize; i++, page++)
 		get_page_bootmem(section_nr, page, SECTION_INFO);
 
-	usemap = ms->pageblock_flags;
-	page = virt_to_page(usemap);
+	usage = ms->usage;
+	page = virt_to_page(usage);
 
-	mapsize = PAGE_ALIGN(usemap_size()) >> PAGE_SHIFT;
+	mapsize = PAGE_ALIGN(mem_section_usage_size()) >> PAGE_SHIFT;
 
 	for (i = 0; i < mapsize; i++, page++)
 		get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
@@ -199,9 +200,10 @@ static void register_page_bootmem_info_section(unsigned long start_pfn)
 #else /* CONFIG_SPARSEMEM_VMEMMAP */
 static void register_page_bootmem_info_section(unsigned long start_pfn)
 {
-	unsigned long *usemap, mapsize, section_nr, i;
+	unsigned long mapsize, section_nr, i;
 	struct mem_section *ms;
 	struct page *page, *memmap;
+	struct mem_section_usage *usage;
 
 	section_nr = pfn_to_section_nr(start_pfn);
 	ms = __nr_to_section(section_nr);
@@ -210,10 +212,10 @@ static void register_page_bootmem_info_section(unsigned long start_pfn)
 
 	register_page_bootmem_memmap(section_nr, memmap, PAGES_PER_SECTION);
 
-	usemap = ms->pageblock_flags;
-	page = virt_to_page(usemap);
+	usage = ms->usage;
+	page = virt_to_page(usage);
 
-	mapsize = PAGE_ALIGN(usemap_size()) >> PAGE_SHIFT;
+	mapsize = PAGE_ALIGN(mem_section_usage_size()) >> PAGE_SHIFT;
 
 	for (i = 0; i < mapsize; i++, page++)
 		get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index deea16489e2b..f671401a7c0b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -390,7 +390,7 @@ static inline unsigned long *get_pageblock_bitmap(struct page *page,
 							unsigned long pfn)
 {
 #ifdef CONFIG_SPARSEMEM
-	return __pfn_to_section(pfn)->pageblock_flags;
+	return section_to_usemap(__pfn_to_section(pfn));
 #else
 	return page_zone(page)->pageblock_flags;
 #endif /* CONFIG_SPARSEMEM */
diff --git a/mm/sparse.c b/mm/sparse.c
index fd13166949b5..f87de7ad32c8 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -288,33 +288,31 @@ struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pn
 
 static void __meminit sparse_init_one_section(struct mem_section *ms,
 		unsigned long pnum, struct page *mem_map,
-		unsigned long *pageblock_bitmap)
+		struct mem_section_usage *usage)
 {
 	ms->section_mem_map &= ~SECTION_MAP_MASK;
 	ms->section_mem_map |= sparse_encode_mem_map(mem_map, pnum) |
 							SECTION_HAS_MEM_MAP;
- 	ms->pageblock_flags = pageblock_bitmap;
+	ms->usage = usage;
 }
 
-unsigned long usemap_size(void)
+static unsigned long usemap_size(void)
 {
 	return BITS_TO_LONGS(SECTION_BLOCKFLAGS_BITS) * sizeof(unsigned long);
 }
 
-#ifdef CONFIG_MEMORY_HOTPLUG
-static unsigned long *__kmalloc_section_usemap(void)
+size_t mem_section_usage_size(void)
 {
-	return kmalloc(usemap_size(), GFP_KERNEL);
+	return sizeof(struct mem_section_usage) + usemap_size();
 }
-#endif /* CONFIG_MEMORY_HOTPLUG */
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-static unsigned long * __init
+static struct mem_section_usage * __init
 sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
 					 unsigned long size)
 {
+	struct mem_section_usage *usage;
 	unsigned long goal, limit;
-	unsigned long *p;
 	int nid;
 	/*
 	 * A page may contain usemaps for other sections preventing the
@@ -330,15 +328,16 @@ sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
 	limit = goal + (1UL << PA_SECTION_SHIFT);
 	nid = early_pfn_to_nid(goal >> PAGE_SHIFT);
 again:
-	p = memblock_alloc_try_nid(size, SMP_CACHE_BYTES, goal, limit, nid);
-	if (!p && limit) {
+	usage = memblock_alloc_try_nid(size, SMP_CACHE_BYTES, goal, limit, nid);
+	if (!usage && limit) {
 		limit = 0;
 		goto again;
 	}
-	return p;
+	return usage;
 }
 
-static void __init check_usemap_section_nr(int nid, unsigned long *usemap)
+static void __init check_usemap_section_nr(int nid,
+		struct mem_section_usage *usage)
 {
 	unsigned long usemap_snr, pgdat_snr;
 	static unsigned long old_usemap_snr;
@@ -352,7 +351,7 @@ static void __init check_usemap_section_nr(int nid, unsigned long *usemap)
 		old_pgdat_snr = NR_MEM_SECTIONS;
 	}
 
-	usemap_snr = pfn_to_section_nr(__pa(usemap) >> PAGE_SHIFT);
+	usemap_snr = pfn_to_section_nr(__pa(usage) >> PAGE_SHIFT);
 	pgdat_snr = pfn_to_section_nr(__pa(pgdat) >> PAGE_SHIFT);
 	if (usemap_snr == pgdat_snr)
 		return;
@@ -380,14 +379,15 @@ static void __init check_usemap_section_nr(int nid, unsigned long *usemap)
 		usemap_snr, pgdat_snr, nid);
 }
 #else
-static unsigned long * __init
+static struct mem_section_usage * __init
 sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
 					 unsigned long size)
 {
 	return memblock_alloc_node(size, SMP_CACHE_BYTES, pgdat->node_id);
 }
 
-static void __init check_usemap_section_nr(int nid, unsigned long *usemap)
+static void __init check_usemap_section_nr(int nid,
+		struct mem_section_usage *usage)
 {
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
@@ -474,14 +474,13 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
 				   unsigned long pnum_end,
 				   unsigned long map_count)
 {
-	unsigned long pnum, usemap_longs, *usemap;
+	struct mem_section_usage *usage;
+	unsigned long pnum;
 	struct page *map;
 
-	usemap_longs = BITS_TO_LONGS(SECTION_BLOCKFLAGS_BITS);
-	usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nid),
-							  usemap_size() *
-							  map_count);
-	if (!usemap) {
+	usage = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nid),
+			mem_section_usage_size() * map_count);
+	if (!usage) {
 		pr_err("%s: node[%d] usemap allocation failed", __func__, nid);
 		goto failed;
 	}
@@ -497,9 +496,9 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
 			pnum_begin = pnum;
 			goto failed;
 		}
-		check_usemap_section_nr(nid, usemap);
-		sparse_init_one_section(__nr_to_section(pnum), pnum, map, usemap);
-		usemap += usemap_longs;
+		check_usemap_section_nr(nid, usage);
+		sparse_init_one_section(__nr_to_section(pnum), pnum, map, usage);
+		usage = (void *) usage + mem_section_usage_size();
 	}
 	sparse_buffer_fini();
 	return;
@@ -701,9 +700,9 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn,
 				     struct vmem_altmap *altmap)
 {
 	unsigned long section_nr = pfn_to_section_nr(start_pfn);
+	struct mem_section_usage *usage;
 	struct mem_section *ms;
 	struct page *memmap;
-	unsigned long *usemap;
 	int ret;
 
 	/*
@@ -717,8 +716,8 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn,
 	memmap = kmalloc_section_memmap(section_nr, nid, altmap);
 	if (!memmap)
 		return -ENOMEM;
-	usemap = __kmalloc_section_usemap();
-	if (!usemap) {
+	usage = kzalloc(mem_section_usage_size(), GFP_KERNEL);
+	if (!usage) {
 		__kfree_section_memmap(memmap, altmap);
 		return -ENOMEM;
 	}
@@ -736,11 +735,11 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn,
 	page_init_poison(memmap, sizeof(struct page) * PAGES_PER_SECTION);
 
 	section_mark_present(ms);
-	sparse_init_one_section(ms, section_nr, memmap, usemap);
+	sparse_init_one_section(ms, section_nr, memmap, usage);
 
 out:
 	if (ret < 0) {
-		kfree(usemap);
+		kfree(usage);
 		__kfree_section_memmap(memmap, altmap);
 	}
 	return ret;
@@ -777,20 +776,20 @@ static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
 }
 #endif
 
-static void free_section_usemap(struct page *memmap, unsigned long *usemap,
-		struct vmem_altmap *altmap)
+static void free_section_usage(struct page *memmap,
+		struct mem_section_usage *usage, struct vmem_altmap *altmap)
 {
-	struct page *usemap_page;
+	struct page *usage_page;
 
-	if (!usemap)
+	if (!usage)
 		return;
 
-	usemap_page = virt_to_page(usemap);
+	usage_page = virt_to_page(usage);
 	/*
 	 * Check to see if allocation came from hot-plug-add
 	 */
-	if (PageSlab(usemap_page) || PageCompound(usemap_page)) {
-		kfree(usemap);
+	if (PageSlab(usage_page) || PageCompound(usage_page)) {
+		kfree(usage);
 		if (memmap)
 			__kfree_section_memmap(memmap, altmap);
 		return;
@@ -809,19 +808,19 @@ void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
 		unsigned long map_offset, struct vmem_altmap *altmap)
 {
 	struct page *memmap = NULL;
-	unsigned long *usemap = NULL;
+	struct mem_section_usage *usage = NULL;
 
 	if (ms->section_mem_map) {
-		usemap = ms->pageblock_flags;
+		usage = ms->usage;
 		memmap = sparse_decode_mem_map(ms->section_mem_map,
 						__section_nr(ms));
 		ms->section_mem_map = 0;
-		ms->pageblock_flags = NULL;
+		ms->usage = NULL;
 	}
 
 	clear_hwpoisoned_pages(memmap + map_offset,
 			PAGES_PER_SECTION - map_offset);
-	free_section_usemap(memmap, usemap, altmap);
+	free_section_usage(memmap, usage, altmap);
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 #endif /* CONFIG_MEMORY_HOTPLUG */

* [PATCH v6 02/12] mm/sparsemem: Introduce common definitions for the size and mask of a section
  2019-04-17 18:38 ` Dan Williams
@ 2019-04-17 18:39   ` Dan Williams
  -1 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-04-17 18:39 UTC (permalink / raw)
  To: akpm
  Cc: mhocko, linux-nvdimm, david, linux-kernel, linux-mm,
	Jérôme Glisse, Vlastimil Babka

Up-level the local section size and mask from kernel/memremap.c to
global definitions.  These will be used by the new sub-section hotplug
support.
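
As an aside, a small, self-contained userspace sketch of the alignment math
these global definitions serve in kernel/memremap.c. It is illustrative only;
SECTION_SIZE_BITS=27 (x86_64) and the start address are assumptions:

#include <stdio.h>

#define SECTION_SIZE_BITS	27	/* assumed: x86_64 */
#define PA_SECTION_SHIFT	(SECTION_SIZE_BITS)
#define PA_SECTION_SIZE		(1UL << PA_SECTION_SHIFT)
#define PA_SECTION_MASK		(~(PA_SECTION_SIZE - 1))

int main(void)
{
	unsigned long start = 0x148200000UL;	/* hypothetical pmem start */

	/* round a physical address down / up to a 128MB section boundary */
	printf("aligned down: %#lx\n", start & PA_SECTION_MASK);
	printf("aligned up:   %#lx\n",
			(start + PA_SECTION_SIZE - 1) & PA_SECTION_MASK);
	return 0;
}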

Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/mmzone.h |    2 ++
 kernel/memremap.c      |   10 ++++------
 mm/hmm.c               |    2 --
 3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index f0bbd85dc19a..6726fc175b51 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1134,6 +1134,8 @@ static inline unsigned long early_pfn_to_nid(unsigned long pfn)
  * PFN_SECTION_SHIFT		pfn to/from section number
  */
 #define PA_SECTION_SHIFT	(SECTION_SIZE_BITS)
+#define PA_SECTION_SIZE		(1UL << PA_SECTION_SHIFT)
+#define PA_SECTION_MASK		(~(PA_SECTION_SIZE-1))
 #define PFN_SECTION_SHIFT	(SECTION_SIZE_BITS - PAGE_SHIFT)
 
 #define NR_MEM_SECTIONS		(1UL << SECTIONS_SHIFT)
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 4e59d29245f4..f355586ea54a 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -14,8 +14,6 @@
 #include <linux/hmm.h>
 
 static DEFINE_XARRAY(pgmap_array);
-#define SECTION_MASK ~((1UL << PA_SECTION_SHIFT) - 1)
-#define SECTION_SIZE (1UL << PA_SECTION_SHIFT)
 
 #if IS_ENABLED(CONFIG_DEVICE_PRIVATE)
 vm_fault_t device_private_entry_fault(struct vm_area_struct *vma,
@@ -98,8 +96,8 @@ static void devm_memremap_pages_release(void *data)
 		put_page(pfn_to_page(pfn));
 
 	/* pages are dead and unused, undo the arch mapping */
-	align_start = res->start & ~(SECTION_SIZE - 1);
-	align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
+	align_start = res->start & ~(PA_SECTION_SIZE - 1);
+	align_size = ALIGN(res->start + resource_size(res), PA_SECTION_SIZE)
 		- align_start;
 
 	nid = page_to_nid(pfn_to_page(align_start >> PAGE_SHIFT));
@@ -160,8 +158,8 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 	if (!pgmap->ref || !pgmap->kill)
 		return ERR_PTR(-EINVAL);
 
-	align_start = res->start & ~(SECTION_SIZE - 1);
-	align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
+	align_start = res->start & ~(PA_SECTION_SIZE - 1);
+	align_size = ALIGN(res->start + resource_size(res), PA_SECTION_SIZE)
 		- align_start;
 	align_end = align_start + align_size - 1;
 
diff --git a/mm/hmm.c b/mm/hmm.c
index ecd16718285e..def451a56c3e 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -34,8 +34,6 @@
 #include <linux/mmu_notifier.h>
 #include <linux/memory_hotplug.h>
 
-#define PA_SECTION_SIZE (1UL << PA_SECTION_SHIFT)
-
 #if IS_ENABLED(CONFIG_HMM_MIRROR)
 static const struct mmu_notifier_ops hmm_mmu_notifier_ops;
 

* [PATCH v6 03/12] mm/sparsemem: Add helpers track active portions of a section at boot
  2019-04-17 18:38 ` Dan Williams
@ 2019-04-17 18:39   ` Dan Williams
  -1 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-04-17 18:39 UTC (permalink / raw)
  To: akpm; +Cc: mhocko, linux-nvdimm, david, linux-kernel, linux-mm, Vlastimil Babka

Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
section active bitmask, each bit representing 2MB (SECTION_SIZE (128M) /
map_active bitmask length (64)). If it turns out that 2MB is too large a
tracking granularity, it is trivial to increase the size of the
map_active bitmap.

The implication of a partially populated section is that pfn_valid()
needs to go beyond a valid_section() check and read the sub-section
active ranges from the bitmask.
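
For reference, a self-contained userspace sketch of how a range maps to bits
in 'map_active'. It is illustrative only: x86_64 constants are assumed and it
mirrors just the central arithmetic of section_active_mask(), not its size
alignment or full-section handling:

#include <stdio.h>

#define SECTION_SIZE_BITS	27	/* assumed: x86_64 */
#define PA_SECTION_SIZE		(1UL << SECTION_SIZE_BITS)
#define PA_SECTION_MASK		(~(PA_SECTION_SIZE - 1))
#define BITS_PER_LONG		64
#define SECTION_ACTIVE_SIZE	(PA_SECTION_SIZE / BITS_PER_LONG)

static int section_active_index(unsigned long phys)
{
	/* offset within the section, in 2MB sub-section units */
	return (phys & ~PA_SECTION_MASK) / SECTION_ACTIVE_SIZE;
}

int main(void)
{
	/* hypothetical 6MB range starting 4MB into a section */
	unsigned long start = 0x400000, size = 0x600000;
	int idx_start = section_active_index(start);
	int idx_size = section_active_index(size);
	unsigned long mask = ((1UL << idx_size) - 1) << idx_start;

	/* bits 2, 3 and 4 set: three 2MB sub-sections become active */
	printf("mask: %#lx\n", mask);
	return 0;
}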

Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/mmzone.h |   29 ++++++++++++++++++++++++++++-
 mm/page_alloc.c        |    4 +++-
 mm/sparse.c            |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6726fc175b51..cffde898e345 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1175,6 +1175,8 @@ struct mem_section_usage {
 	unsigned long pageblock_flags[0];
 };
 
+void section_active_init(unsigned long pfn, unsigned long nr_pages);
+
 struct page;
 struct page_ext;
 struct mem_section {
@@ -1312,12 +1314,36 @@ static inline struct mem_section *__pfn_to_section(unsigned long pfn)
 
 extern int __highest_present_section_nr;
 
+static inline int section_active_index(phys_addr_t phys)
+{
+	return (phys & ~(PA_SECTION_MASK)) / SECTION_ACTIVE_SIZE;
+}
+
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
+{
+	int idx = section_active_index(PFN_PHYS(pfn));
+
+	return !!(ms->usage->map_active & (1UL << idx));
+}
+#else
+static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
+{
+	return 1;
+}
+#endif
+
 #ifndef CONFIG_HAVE_ARCH_PFN_VALID
 static inline int pfn_valid(unsigned long pfn)
 {
+	struct mem_section *ms;
+
 	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
 		return 0;
-	return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
+	ms = __nr_to_section(pfn_to_section_nr(pfn));
+	if (!valid_section(ms))
+		return 0;
+	return pfn_section_valid(ms, pfn);
 }
 #endif
 
@@ -1349,6 +1375,7 @@ void sparse_init(void);
 #define sparse_init()	do {} while (0)
 #define sparse_index_init(_sec, _nid)  do {} while (0)
 #define pfn_present pfn_valid
+#define section_active_init(_pfn, _nr_pages) do {} while (0)
 #endif /* CONFIG_SPARSEMEM */
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f671401a7c0b..c9ad28a78018 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7273,10 +7273,12 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 
 	/* Print out the early node map */
 	pr_info("Early memory node ranges\n");
-	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
+	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
 		pr_info("  node %3d: [mem %#018Lx-%#018Lx]\n", nid,
 			(u64)start_pfn << PAGE_SHIFT,
 			((u64)end_pfn << PAGE_SHIFT) - 1);
+		section_active_init(start_pfn, end_pfn - start_pfn);
+	}
 
 	/* Initialise every node */
 	mminit_verify_pageflags_layout();
diff --git a/mm/sparse.c b/mm/sparse.c
index f87de7ad32c8..5ef2f884c4e1 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -210,6 +210,54 @@ static inline unsigned long first_present_section_nr(void)
 	return next_present_section_nr(-1);
 }
 
+static unsigned long section_active_mask(unsigned long pfn,
+		unsigned long nr_pages)
+{
+	int idx_start, idx_size;
+	phys_addr_t start, size;
+
+	if (!nr_pages)
+		return 0;
+
+	start = PFN_PHYS(pfn);
+	size = PFN_PHYS(min(nr_pages, PAGES_PER_SECTION
+				- (pfn & ~PAGE_SECTION_MASK)));
+	size = ALIGN(size, SECTION_ACTIVE_SIZE);
+
+	idx_start = section_active_index(start);
+	idx_size = section_active_index(size);
+
+	if (idx_size == 0)
+		return -1;
+	return ((1UL << idx_size) - 1) << idx_start;
+}
+
+void section_active_init(unsigned long pfn, unsigned long nr_pages)
+{
+	int end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
+	int i, start_sec = pfn_to_section_nr(pfn);
+
+	if (!nr_pages)
+		return;
+
+	for (i = start_sec; i <= end_sec; i++) {
+		struct mem_section *ms;
+		unsigned long mask;
+		unsigned long pfns;
+
+		pfns = min(nr_pages, PAGES_PER_SECTION
+				- (pfn & ~PAGE_SECTION_MASK));
+		mask = section_active_mask(pfn, pfns);
+
+		ms = __nr_to_section(i);
+		pr_debug("%s: sec: %d mask: %#018lx\n", __func__, i, mask);
+		ms->usage->map_active = mask;
+
+		pfn += pfns;
+		nr_pages -= pfns;
+	}
+}
+
 /* Record a memory area against a node. */
 void __init memory_present(int nid, unsigned long start, unsigned long end)
 {

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH v6 03/12] mm/sparsemem: Add helpers track active portions of a section at boot
@ 2019-04-17 18:39   ` Dan Williams
  0 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-04-17 18:39 UTC (permalink / raw)
  To: akpm
  Cc: Michal Hocko, Vlastimil Babka, Logan Gunthorpe, linux-mm,
	linux-nvdimm, linux-kernel, mhocko, david

Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
section active bitmask, each bit representing 2MB (SECTION_SIZE (128M) /
map_active bitmask length (64)). If it turns out that 2MB is too large
of an active tracking granularity it is trivial to increase the size of
the map_active bitmap.

The implications of a partially populated section is that pfn_valid()
needs to go beyond a valid_section() check and read the sub-section
active ranges from the bitmask.

Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/mmzone.h |   29 ++++++++++++++++++++++++++++-
 mm/page_alloc.c        |    4 +++-
 mm/sparse.c            |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6726fc175b51..cffde898e345 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1175,6 +1175,8 @@ struct mem_section_usage {
 	unsigned long pageblock_flags[0];
 };
 
+void section_active_init(unsigned long pfn, unsigned long nr_pages);
+
 struct page;
 struct page_ext;
 struct mem_section {
@@ -1312,12 +1314,36 @@ static inline struct mem_section *__pfn_to_section(unsigned long pfn)
 
 extern int __highest_present_section_nr;
 
+static inline int section_active_index(phys_addr_t phys)
+{
+	return (phys & ~(PA_SECTION_MASK)) / SECTION_ACTIVE_SIZE;
+}
+
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
+{
+	int idx = section_active_index(PFN_PHYS(pfn));
+
+	return !!(ms->usage->map_active & (1UL << idx));
+}
+#else
+static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
+{
+	return 1;
+}
+#endif
+
 #ifndef CONFIG_HAVE_ARCH_PFN_VALID
 static inline int pfn_valid(unsigned long pfn)
 {
+	struct mem_section *ms;
+
 	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
 		return 0;
-	return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
+	ms = __nr_to_section(pfn_to_section_nr(pfn));
+	if (!valid_section(ms))
+		return 0;
+	return pfn_section_valid(ms, pfn);
 }
 #endif
 
@@ -1349,6 +1375,7 @@ void sparse_init(void);
 #define sparse_init()	do {} while (0)
 #define sparse_index_init(_sec, _nid)  do {} while (0)
 #define pfn_present pfn_valid
+#define section_active_init(_pfn, _nr_pages) do {} while (0)
 #endif /* CONFIG_SPARSEMEM */
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f671401a7c0b..c9ad28a78018 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7273,10 +7273,12 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 
 	/* Print out the early node map */
 	pr_info("Early memory node ranges\n");
-	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
+	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
 		pr_info("  node %3d: [mem %#018Lx-%#018Lx]\n", nid,
 			(u64)start_pfn << PAGE_SHIFT,
 			((u64)end_pfn << PAGE_SHIFT) - 1);
+		section_active_init(start_pfn, end_pfn - start_pfn);
+	}
 
 	/* Initialise every node */
 	mminit_verify_pageflags_layout();
diff --git a/mm/sparse.c b/mm/sparse.c
index f87de7ad32c8..5ef2f884c4e1 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -210,6 +210,54 @@ static inline unsigned long first_present_section_nr(void)
 	return next_present_section_nr(-1);
 }
 
+static unsigned long section_active_mask(unsigned long pfn,
+		unsigned long nr_pages)
+{
+	int idx_start, idx_size;
+	phys_addr_t start, size;
+
+	if (!nr_pages)
+		return 0;
+
+	start = PFN_PHYS(pfn);
+	size = PFN_PHYS(min(nr_pages, PAGES_PER_SECTION
+				- (pfn & ~PAGE_SECTION_MASK)));
+	size = ALIGN(size, SECTION_ACTIVE_SIZE);
+
+	idx_start = section_active_index(start);
+	idx_size = section_active_index(size);
+
+	if (idx_size == 0)
+		return -1;
+	return ((1UL << idx_size) - 1) << idx_start;
+}
+
+void section_active_init(unsigned long pfn, unsigned long nr_pages)
+{
+	int end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
+	int i, start_sec = pfn_to_section_nr(pfn);
+
+	if (!nr_pages)
+		return;
+
+	for (i = start_sec; i <= end_sec; i++) {
+		struct mem_section *ms;
+		unsigned long mask;
+		unsigned long pfns;
+
+		pfns = min(nr_pages, PAGES_PER_SECTION
+				- (pfn & ~PAGE_SECTION_MASK));
+		mask = section_active_mask(pfn, pfns);
+
+		ms = __nr_to_section(i);
+		pr_debug("%s: sec: %d mask: %#018lx\n", __func__, i, mask);
+		ms->usage->map_active = mask;
+
+		pfn += pfns;
+		nr_pages -= pfns;
+	}
+}
+
 /* Record a memory area against a node. */
 void __init memory_present(int nid, unsigned long start, unsigned long end)
 {
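
For illustration only (not part of the patch): a minimal user-space model of
the sub-section bitmask above, assuming x86_64 defaults of 4KB pages, 128MB
sections and 64-bit longs, so each of the 64 'map_active' bits covers a 2MB
sub-section. This is a simplified model of which bits a pfn range maps to,
not a line-for-line copy of section_active_mask():

#include <stdio.h>

#define PFNS_PER_SUBSECTION     512UL   /* 2MB / 4KB */
#define SUBSECTIONS_PER_SECTION 64UL    /* 128MB / 2MB */

/* mask bits covered by a pfn range that lies within one section */
static unsigned long model_active_mask(unsigned long pfn_offset,
                unsigned long nr_pages)
{
        unsigned long idx_start = pfn_offset / PFNS_PER_SUBSECTION;
        unsigned long idx_end = (pfn_offset + nr_pages +
                        PFNS_PER_SUBSECTION - 1) / PFNS_PER_SUBSECTION;

        if (idx_end - idx_start >= SUBSECTIONS_PER_SECTION)
                return ~0UL;    /* every sub-section active */
        return ((1UL << (idx_end - idx_start)) - 1) << idx_start;
}

int main(void)
{
        /* 4MB starting 4MB into a section: bits 2 and 3, i.e. 0xc */
        printf("%#lx\n", model_active_mask(2 * PFNS_PER_SUBSECTION,
                                2 * PFNS_PER_SUBSECTION));
        /* a fully populated 128MB section: 0xffffffffffffffff */
        printf("%#lx\n", model_active_mask(0, 32768));
        return 0;
}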


^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH v6 04/12] mm/hotplug: Prepare shrink_{zone, pgdat}_span for sub-section removal
  2019-04-17 18:38 ` Dan Williams
@ 2019-04-17 18:39   ` Dan Williams
  -1 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-04-17 18:39 UTC (permalink / raw)
  To: akpm; +Cc: mhocko, linux-nvdimm, david, linux-kernel, linux-mm, Vlastimil Babka

Sub-section hotplug support reduces the unit of operation of hotplug
from section-sized units (PAGES_PER_SECTION) to sub-section-sized units
(PAGES_PER_SUB_SECTION). Teach shrink_{zone,pgdat}_span() to consider
PAGES_PER_SUB_SECTION boundaries as the points where pfn_valid(), not
valid_section(), can toggle.
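
As a rough sketch (illustrative only; first_pfn_in_zone() is a hypothetical
helper, not code from this patch), the scan shape becomes: step by
PAGES_PER_SUB_SECTION (512 pfns / 2MB with x86_64 defaults) instead of
PAGES_PER_SECTION (32768 pfns / 128MB), and test each step with pfn_valid():

static unsigned long first_pfn_in_zone(struct zone *zone,
                unsigned long start_pfn, unsigned long end_pfn)
{
        unsigned long pfn;

        for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SUB_SECTION) {
                if (!pfn_valid(pfn))    /* now consults the sub-section mask */
                        continue;
                if (page_zone(pfn_to_page(pfn)) != zone)
                        continue;
                return pfn;
        }
        return 0;
}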

Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/mmzone.h |    2 ++
 mm/memory_hotplug.c    |   16 ++++++++--------
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index cffde898e345..b13f0cddf75e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1164,6 +1164,8 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec)
 
 #define SECTION_ACTIVE_SIZE ((1UL << SECTION_SIZE_BITS) / BITS_PER_LONG)
 #define SECTION_ACTIVE_MASK (~(SECTION_ACTIVE_SIZE - 1))
+#define PAGES_PER_SUB_SECTION (SECTION_ACTIVE_SIZE / PAGE_SIZE)
+#define PAGE_SUB_SECTION_MASK (~(PAGES_PER_SUB_SECTION-1))
 
 struct mem_section_usage {
 	/*
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 8b7415736d21..d5874f9d4043 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -327,10 +327,10 @@ static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
 {
 	struct mem_section *ms;
 
-	for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SECTION) {
+	for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SUB_SECTION) {
 		ms = __pfn_to_section(start_pfn);
 
-		if (unlikely(!valid_section(ms)))
+		if (unlikely(!pfn_valid(start_pfn)))
 			continue;
 
 		if (unlikely(pfn_to_nid(start_pfn) != nid))
@@ -355,10 +355,10 @@ static unsigned long find_biggest_section_pfn(int nid, struct zone *zone,
 
 	/* pfn is the end pfn of a memory section. */
 	pfn = end_pfn - 1;
-	for (; pfn >= start_pfn; pfn -= PAGES_PER_SECTION) {
+	for (; pfn >= start_pfn; pfn -= PAGES_PER_SUB_SECTION) {
 		ms = __pfn_to_section(pfn);
 
-		if (unlikely(!valid_section(ms)))
+		if (unlikely(!pfn_valid(pfn)))
 			continue;
 
 		if (unlikely(pfn_to_nid(pfn) != nid))
@@ -417,10 +417,10 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 	 * it check the zone has only hole or not.
 	 */
 	pfn = zone_start_pfn;
-	for (; pfn < zone_end_pfn; pfn += PAGES_PER_SECTION) {
+	for (; pfn < zone_end_pfn; pfn += PAGES_PER_SUB_SECTION) {
 		ms = __pfn_to_section(pfn);
 
-		if (unlikely(!valid_section(ms)))
+		if (unlikely(!pfn_valid(pfn)))
 			continue;
 
 		if (page_zone(pfn_to_page(pfn)) != zone)
@@ -485,10 +485,10 @@ static void shrink_pgdat_span(struct pglist_data *pgdat,
 	 * has only hole or not.
 	 */
 	pfn = pgdat_start_pfn;
-	for (; pfn < pgdat_end_pfn; pfn += PAGES_PER_SECTION) {
+	for (; pfn < pgdat_end_pfn; pfn += PAGES_PER_SUB_SECTION) {
 		ms = __pfn_to_section(pfn);
 
-		if (unlikely(!valid_section(ms)))
+		if (unlikely(!pfn_valid(pfn)))
 			continue;
 
 		if (pfn_to_nid(pfn) != nid)


^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH v6 05/12] mm/sparsemem: Convert kmalloc_section_memmap() to populate_section_memmap()
  2019-04-17 18:38 ` Dan Williams
@ 2019-04-17 18:39   ` Dan Williams
  -1 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-04-17 18:39 UTC (permalink / raw)
  To: akpm; +Cc: mhocko, linux-nvdimm, david, linux-kernel, linux-mm

Allow sub-section-sized ranges to be added to the memmap.
populate_section_memmap() takes an explicit pfn range rather than
assuming a full section, and those parameters are plumbed all the way
through to vmemmap_populate(). There should be no sub-section usage in
current deployments. New warnings are added to clarify which memmap
allocation paths are sub-section capable.
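
To illustrate the rounding that __populate_section_memmap() applies (sketch
only; constants assume x86_64's 4KB pages and 2MB sub-sections, i.e. 512 pfns
per SECTION_ACTIVE_SIZE):

#include <stdio.h>

#define PFNS_PER_SUBSECTION     512UL   /* PHYS_PFN(SECTION_ACTIVE_SIZE) */

/* round a pfn range out to sub-section boundaries, as the patch does */
static void round_to_subsection(unsigned long *pfn, unsigned long *nr_pages)
{
        unsigned long end = *pfn + *nr_pages;

        end = (end + PFNS_PER_SUBSECTION - 1) & ~(PFNS_PER_SUBSECTION - 1);
        *pfn &= ~(PFNS_PER_SUBSECTION - 1);
        *nr_pages = end - *pfn;
}

int main(void)
{
        unsigned long pfn = 0x1100, nr_pages = 0x80;

        round_to_subsection(&pfn, &nr_pages);
        /* prints "pfn 0x1000 nr 0x200": one full 2MB sub-section of memmap */
        printf("pfn %#lx nr %#lx\n", pfn, nr_pages);
        return 0;
}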

Cc: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/mm/init_64.c |    4 ++-
 include/linux/mm.h    |    4 ++-
 mm/sparse-vmemmap.c   |   21 +++++++++++------
 mm/sparse.c           |   61 +++++++++++++++++++++++++++++++------------------
 4 files changed, 57 insertions(+), 33 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 20d14254b686..bb018d09d2dc 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1457,7 +1457,9 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 {
 	int err;
 
-	if (boot_cpu_has(X86_FEATURE_PSE))
+	if (end - start < PAGES_PER_SECTION * sizeof(struct page))
+		err = vmemmap_populate_basepages(start, end, node);
+	else if (boot_cpu_has(X86_FEATURE_PSE))
 		err = vmemmap_populate_hugepages(start, end, node, altmap);
 	else if (altmap) {
 		pr_err_once("%s: no cpu support for altmap allocations\n",
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 91a19229452b..3cc599fd3ae0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2750,8 +2750,8 @@ const char * arch_vma_name(struct vm_area_struct *vma);
 void print_vma_addr(char *prefix, unsigned long rip);
 
 void *sparse_buffer_alloc(unsigned long size);
-struct page *sparse_mem_map_populate(unsigned long pnum, int nid,
-		struct vmem_altmap *altmap);
+struct page * __populate_section_memmap(unsigned long pfn,
+		unsigned long nr_pages, int nid, struct vmem_altmap *altmap);
 pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
 p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node);
 pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 7fec05796796..dcb023aa23d1 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -245,19 +245,26 @@ int __meminit vmemmap_populate_basepages(unsigned long start,
 	return 0;
 }
 
-struct page * __meminit sparse_mem_map_populate(unsigned long pnum, int nid,
-		struct vmem_altmap *altmap)
+struct page * __meminit __populate_section_memmap(unsigned long pfn,
+		unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
 {
 	unsigned long start;
 	unsigned long end;
-	struct page *map;
 
-	map = pfn_to_page(pnum * PAGES_PER_SECTION);
-	start = (unsigned long)map;
-	end = (unsigned long)(map + PAGES_PER_SECTION);
+	/*
+	 * The minimum granularity of memmap extensions is
+	 * SECTION_ACTIVE_SIZE as allocations are tracked in the
+	 * 'map_active' bitmap of the section.
+	 */
+	end = ALIGN(pfn + nr_pages, PHYS_PFN(SECTION_ACTIVE_SIZE));
+	pfn &= PHYS_PFN(SECTION_ACTIVE_MASK);
+	nr_pages = end - pfn;
+
+	start = (unsigned long) pfn_to_page(pfn);
+	end = start + nr_pages * sizeof(struct page);
 
 	if (vmemmap_populate(start, end, nid, altmap))
 		return NULL;
 
-	return map;
+	return pfn_to_page(pfn);
 }
diff --git a/mm/sparse.c b/mm/sparse.c
index 5ef2f884c4e1..98408c0da060 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -452,8 +452,8 @@ static unsigned long __init section_map_size(void)
 	return PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION);
 }
 
-struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid,
-		struct vmem_altmap *altmap)
+struct page __init *__populate_section_memmap(unsigned long pfn,
+		unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
 {
 	unsigned long size = section_map_size();
 	struct page *map = sparse_buffer_alloc(size);
@@ -534,10 +534,13 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
 	}
 	sparse_buffer_init(map_count * section_map_size(), nid);
 	for_each_present_section_nr(pnum_begin, pnum) {
+		unsigned long pfn = section_nr_to_pfn(pnum);
+
 		if (pnum >= pnum_end)
 			break;
 
-		map = sparse_mem_map_populate(pnum, nid, NULL);
+		map = __populate_section_memmap(pfn, PAGES_PER_SECTION,
+				nid, NULL);
 		if (!map) {
 			pr_err("%s: node[%d] memory map backing failed. Some memory will not be available.",
 			       __func__, nid);
@@ -637,17 +640,17 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 #endif
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
-static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
-		struct vmem_altmap *altmap)
+static struct page *populate_section_memmap(unsigned long pfn,
+		unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
 {
-	/* This will make the necessary allocations eventually. */
-	return sparse_mem_map_populate(pnum, nid, altmap);
+	return __populate_section_memmap(pfn, nr_pages, nid, altmap);
 }
-static void __kfree_section_memmap(struct page *memmap,
+
+static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
 		struct vmem_altmap *altmap)
 {
-	unsigned long start = (unsigned long)memmap;
-	unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
+	unsigned long start = (unsigned long) pfn_to_page(pfn);
+	unsigned long end = start + nr_pages * sizeof(struct page);
 
 	vmemmap_free(start, end, altmap);
 }
@@ -661,11 +664,18 @@ static void free_map_bootmem(struct page *memmap)
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 #else
-static struct page *__kmalloc_section_memmap(void)
+struct page *populate_section_memmap(unsigned long pfn,
+		unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
 {
 	struct page *page, *ret;
 	unsigned long memmap_size = sizeof(struct page) * PAGES_PER_SECTION;
 
+	if ((pfn & ~PAGE_SECTION_MASK) || nr_pages != PAGES_PER_SECTION) {
+		WARN(1, "%s: called with section unaligned parameters\n",
+				__func__);
+		return NULL;
+	}
+
 	page = alloc_pages(GFP_KERNEL|__GFP_NOWARN, get_order(memmap_size));
 	if (page)
 		goto got_map_page;
@@ -682,15 +692,17 @@ static struct page *__kmalloc_section_memmap(void)
 	return ret;
 }
 
-static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
+static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
 		struct vmem_altmap *altmap)
 {
-	return __kmalloc_section_memmap();
-}
+	struct page *memmap = pfn_to_page(pfn);
+
+	if ((pfn & ~PAGE_SECTION_MASK) || nr_pages != PAGES_PER_SECTION) {
+		WARN(1, "%s: called with section unaligned parameters\n",
+				__func__);
+		return;
+	}
 
-static void __kfree_section_memmap(struct page *memmap,
-		struct vmem_altmap *altmap)
-{
 	if (is_vmalloc_addr(memmap))
 		vfree(memmap);
 	else
@@ -761,12 +773,13 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn,
 	if (ret < 0 && ret != -EEXIST)
 		return ret;
 	ret = 0;
-	memmap = kmalloc_section_memmap(section_nr, nid, altmap);
+	memmap = populate_section_memmap(start_pfn, PAGES_PER_SECTION, nid,
+			altmap);
 	if (!memmap)
 		return -ENOMEM;
 	usage = kzalloc(mem_section_usage_size(), GFP_KERNEL);
 	if (!usage) {
-		__kfree_section_memmap(memmap, altmap);
+		depopulate_section_memmap(start_pfn, PAGES_PER_SECTION, altmap);
 		return -ENOMEM;
 	}
 
@@ -788,7 +801,7 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn,
 out:
 	if (ret < 0) {
 		kfree(usage);
-		__kfree_section_memmap(memmap, altmap);
+		depopulate_section_memmap(start_pfn, PAGES_PER_SECTION, altmap);
 	}
 	return ret;
 }
@@ -825,7 +838,8 @@ static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
 #endif
 
 static void free_section_usage(struct page *memmap,
-		struct mem_section_usage *usage, struct vmem_altmap *altmap)
+		struct mem_section_usage *usage, unsigned long pfn,
+		unsigned long nr_pages, struct vmem_altmap *altmap)
 {
 	struct page *usage_page;
 
@@ -839,7 +853,7 @@ static void free_section_usage(struct page *memmap,
 	if (PageSlab(usage_page) || PageCompound(usage_page)) {
 		kfree(usage);
 		if (memmap)
-			__kfree_section_memmap(memmap, altmap);
+			depopulate_section_memmap(pfn, nr_pages, altmap);
 		return;
 	}
 
@@ -868,7 +882,8 @@ void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
 
 	clear_hwpoisoned_pages(memmap + map_offset,
 			PAGES_PER_SECTION - map_offset);
-	free_section_usage(memmap, usage, altmap);
+	free_section_usage(memmap, usage, section_nr_to_pfn(__section_nr(ms)),
+			PAGES_PER_SECTION, altmap);
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 #endif /* CONFIG_MEMORY_HOTPLUG */


^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH v6 06/12] mm/hotplug: Add mem-hotplug restrictions for remove_memory()
  2019-04-17 18:38 ` Dan Williams
@ 2019-04-17 18:39   ` Dan Williams
  -1 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-04-17 18:39 UTC (permalink / raw)
  To: akpm; +Cc: mhocko, linux-nvdimm, david, linux-kernel, linux-mm

Teach the arch_remove_memory() path to consult the same 'struct
mhp_restrictions' context as was specified at arch_add_memory() time.

No functional change; this is a preparation step for teaching
__remove_pages() about how and when to allow sub-section hot-remove, and
a cleanup for an unnecessary "is_dev_zone()" special case.
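
For context, a sketch of how a remove-side caller now packs its context (the
struct layout shown is inferred from the fields this series references --
'flags' and an optional 'altmap' -- and remove_device_memory() is a
hypothetical wrapper, not code from this patch):

struct mhp_restrictions {
        unsigned long flags;            /* e.g. MHP_MEMBLOCK_API */
        struct vmem_altmap *altmap;     /* pre-allocated memmap, or NULL */
};

static void remove_device_memory(int nid, u64 start, u64 size,
                struct vmem_altmap *altmap)
{
        struct mhp_restrictions restrictions = {
                .altmap = altmap,       /* same context as at add time */
        };

        arch_remove_memory(nid, start, size, &restrictions);
}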

Cc: Michal Hocko <mhocko@suse.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/ia64/mm/init.c            |    4 ++--
 arch/powerpc/mm/mem.c          |    5 +++--
 arch/s390/mm/init.c            |    2 +-
 arch/sh/mm/init.c              |    4 ++--
 arch/x86/mm/init_32.c          |    4 ++--
 arch/x86/mm/init_64.c          |    5 +++--
 include/linux/memory_hotplug.h |    5 +++--
 kernel/memremap.c              |   14 ++++++++------
 mm/memory_hotplug.c            |   17 ++++++++---------
 9 files changed, 32 insertions(+), 28 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index d28e29103bdb..86c69c87e7e8 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -683,14 +683,14 @@ int arch_add_memory(int nid, u64 start, u64 size,
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
 void arch_remove_memory(int nid, u64 start, u64 size,
-			struct vmem_altmap *altmap)
+		struct mhp_restrictions *restrictions)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	struct zone *zone;
 
 	zone = page_zone(pfn_to_page(start_pfn));
-	__remove_pages(zone, start_pfn, nr_pages, altmap);
+	__remove_pages(zone, start_pfn, nr_pages, restrictions);
 }
 #endif
 #endif
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index cc9425fb9056..ccab989f397d 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -132,10 +132,11 @@ int __meminit arch_add_memory(int nid, u64 start, u64 size,
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
 void __meminit arch_remove_memory(int nid, u64 start, u64 size,
-				  struct vmem_altmap *altmap)
+		struct mhp_restrictions *restrictions)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
+	struct vmem_altmap *altmap = restrictions->altmap;
 	struct page *page;
 	int ret;
 
@@ -147,7 +148,7 @@ void __meminit arch_remove_memory(int nid, u64 start, u64 size,
 	if (altmap)
 		page += vmem_altmap_offset(altmap);
 
-	__remove_pages(page_zone(page), start_pfn, nr_pages, altmap);
+	__remove_pages(page_zone(page), start_pfn, nr_pages, restrictions);
 
 	/* Remove htab bolted mappings for this section of memory */
 	start = (unsigned long)__va(start);
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 31b1071315d7..3af7b99af1b1 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -235,7 +235,7 @@ int arch_add_memory(int nid, u64 start, u64 size,
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
 void arch_remove_memory(int nid, u64 start, u64 size,
-			struct vmem_altmap *altmap)
+		struct mhp_restrictions *restrictions)
 {
 	/*
 	 * There is no hardware or firmware interface which could trigger a
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index 5aeb4d7099a1..3cff7e4723e6 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -430,14 +430,14 @@ EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
 void arch_remove_memory(int nid, u64 start, u64 size,
-			struct vmem_altmap *altmap)
+		struct mhp_restrictions *restrictions)
 {
 	unsigned long start_pfn = PFN_DOWN(start);
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	struct zone *zone;
 
 	zone = page_zone(pfn_to_page(start_pfn));
-	__remove_pages(zone, start_pfn, nr_pages, altmap);
+	__remove_pages(zone, start_pfn, nr_pages, restrictions);
 }
 #endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 075e568098f2..ba888fd38f5d 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -861,14 +861,14 @@ int arch_add_memory(int nid, u64 start, u64 size,
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
 void arch_remove_memory(int nid, u64 start, u64 size,
-			struct vmem_altmap *altmap)
+		struct mhp_restrictions *restrictions)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	struct zone *zone;
 
 	zone = page_zone(pfn_to_page(start_pfn));
-	__remove_pages(zone, start_pfn, nr_pages, altmap);
+	__remove_pages(zone, start_pfn, nr_pages, restrictions);
 }
 #endif
 #endif
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index bb018d09d2dc..4071632be007 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1142,8 +1142,9 @@ kernel_physical_mapping_remove(unsigned long start, unsigned long end)
 }
 
 void __ref arch_remove_memory(int nid, u64 start, u64 size,
-			      struct vmem_altmap *altmap)
+		struct mhp_restrictions *restrictions)
 {
+	struct vmem_altmap *altmap = restrictions->altmap;
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	struct page *page = pfn_to_page(start_pfn);
@@ -1153,7 +1154,7 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
 	if (altmap)
 		page += vmem_altmap_offset(altmap);
 	zone = page_zone(page);
-	__remove_pages(zone, start_pfn, nr_pages, altmap);
+	__remove_pages(zone, start_pfn, nr_pages, restrictions);
 	kernel_physical_mapping_remove(start, start + size);
 }
 #endif
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index ae892eef8b82..31b768bd1268 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -125,9 +125,10 @@ static inline bool movable_node_is_enabled(void)
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
 extern void arch_remove_memory(int nid, u64 start, u64 size,
-			       struct vmem_altmap *altmap);
+		struct mhp_restrictions *restrictions);
 extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
-			   unsigned long nr_pages, struct vmem_altmap *altmap);
+		unsigned long nr_pages,
+		struct mhp_restrictions *restrictions);
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
 /*
diff --git a/kernel/memremap.c b/kernel/memremap.c
index f355586ea54a..33475e211568 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -108,8 +108,11 @@ static void devm_memremap_pages_release(void *data)
 		__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
 				align_size >> PAGE_SHIFT, NULL);
 	} else {
-		arch_remove_memory(nid, align_start, align_size,
-				pgmap->altmap_valid ? &pgmap->altmap : NULL);
+		struct mhp_restrictions restrictions = {
+			.altmap = pgmap->altmap_valid ? &pgmap->altmap : NULL,
+		};
+
+		arch_remove_memory(nid, align_start, align_size, &restrictions);
 		kasan_remove_zero_shadow(__va(align_start), align_size);
 	}
 	mem_hotplug_done();
@@ -142,15 +145,14 @@ static void devm_memremap_pages_release(void *data)
 void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 {
 	resource_size_t align_start, align_size, align_end;
-	struct vmem_altmap *altmap = pgmap->altmap_valid ?
-			&pgmap->altmap : NULL;
 	struct resource *res = &pgmap->res;
 	struct dev_pagemap *conflict_pgmap;
 	struct mhp_restrictions restrictions = {
 		/*
 		 * We do not want any optional features only our own memmap
 		*/
-		.altmap = altmap,
+
+		.altmap = pgmap->altmap_valid ? &pgmap->altmap : NULL,
 	};
 	pgprot_t pgprot = PAGE_KERNEL;
 	int error, nid, is_ram;
@@ -235,7 +237,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 
 		zone = &NODE_DATA(nid)->node_zones[ZONE_DEVICE];
 		move_pfn_range_to_zone(zone, align_start >> PAGE_SHIFT,
-				align_size >> PAGE_SHIFT, altmap);
+				align_size >> PAGE_SHIFT, restrictions.altmap);
 	}
 
 	mem_hotplug_done();
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d5874f9d4043..055cea62be6e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -543,7 +543,7 @@ static void __remove_section(struct zone *zone, struct mem_section *ms,
  * @zone: zone from which pages need to be removed
  * @phys_start_pfn: starting pageframe (must be aligned to start of a section)
  * @nr_pages: number of pages to remove (must be multiple of section size)
- * @altmap: alternative device page map or %NULL if default memmap is used
+ * @restrictions: optional alternative device page map and other features
  *
  * Generic helper function to remove section mappings and sysfs entries
  * for the section of the memory we are removing. Caller needs to make
@@ -551,17 +551,15 @@ static void __remove_section(struct zone *zone, struct mem_section *ms,
  * calling offline_pages().
  */
 void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
-		    unsigned long nr_pages, struct vmem_altmap *altmap)
+		unsigned long nr_pages, struct mhp_restrictions *restrictions)
 {
 	unsigned long i;
-	unsigned long map_offset = 0;
 	int sections_to_remove;
+	unsigned long map_offset = 0;
+	struct vmem_altmap *altmap = restrictions->altmap;
 
-	/* In the ZONE_DEVICE case device driver owns the memory region */
-	if (is_dev_zone(zone)) {
-		if (altmap)
-			map_offset = vmem_altmap_offset(altmap);
-	}
+	if (altmap)
+		map_offset = vmem_altmap_offset(altmap);
 
 	clear_zone_contiguous(zone);
 
@@ -1832,6 +1830,7 @@ static void __release_memory_resource(u64 start, u64 size)
  */
 void __ref __remove_memory(int nid, u64 start, u64 size)
 {
+	struct mhp_restrictions restrictions = { 0 };
 	int ret;
 
 	BUG_ON(check_hotplug_memory_range(start, size));
@@ -1853,7 +1852,7 @@ void __ref __remove_memory(int nid, u64 start, u64 size)
 	memblock_free(start, size);
 	memblock_remove(start, size);
 
-	arch_remove_memory(nid, start, size, NULL);
+	arch_remove_memory(nid, start, size, &restrictions);
 	__release_memory_resource(start, size);
 
 	try_offline_node(nid);


^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH v6 07/12] mm: Kill is_dev_zone() helper
  2019-04-17 18:38 ` Dan Williams
@ 2019-04-17 18:39   ` Dan Williams
  -1 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-04-17 18:39 UTC (permalink / raw)
  To: akpm; +Cc: mhocko, linux-nvdimm, david, linux-kernel, linux-mm

Given there are no more usages of is_dev_zone() outside of 'ifdef
CONFIG_ZONE_DEVICE' protection, kill off the compilation helper.

Cc: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/mmzone.h |   12 ------------
 mm/page_alloc.c        |    2 +-
 2 files changed, 1 insertion(+), 13 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b13f0cddf75e..3237c5e456df 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -855,18 +855,6 @@ static inline int local_memory_node(int node_id) { return node_id; };
  */
 #define zone_idx(zone)		((zone) - (zone)->zone_pgdat->node_zones)
 
-#ifdef CONFIG_ZONE_DEVICE
-static inline bool is_dev_zone(const struct zone *zone)
-{
-	return zone_idx(zone) == ZONE_DEVICE;
-}
-#else
-static inline bool is_dev_zone(const struct zone *zone)
-{
-	return false;
-}
-#endif
-
 /*
  * Returns true if a zone has pages managed by the buddy allocator.
  * All the reclaim decisions have to use this function rather than
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c9ad28a78018..fd455bd742d5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5844,7 +5844,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
 	unsigned long start = jiffies;
 	int nid = pgdat->node_id;
 
-	if (WARN_ON_ONCE(!pgmap || !is_dev_zone(zone)))
+	if (WARN_ON_ONCE(!pgmap || zone_idx(zone) != ZONE_DEVICE))
 		return;
 
 	/*


^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH v6 08/12] mm/sparsemem: Prepare for sub-section ranges
  2019-04-17 18:38 ` Dan Williams
@ 2019-04-17 18:39   ` Dan Williams
  -1 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-04-17 18:39 UTC (permalink / raw)
  To: akpm; +Cc: mhocko, linux-nvdimm, david, linux-kernel, linux-mm, Vlastimil Babka

Prepare the memory hot-{add,remove} paths for handling sub-section
ranges by plumbing the starting page frame and number of pages being
handled through arch_{add,remove}_memory() to
sparse_{add,remove}_one_section().

This is simply plumbing, small cleanups, and some identifier renames. No
intended functional changes.
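
The plumbing relies on the same per-section chunking used by __add_pages()
and __remove_pages() below; a minimal user-space model (illustrative only,
assuming x86_64's 32768-page sections):

#include <stdio.h>

#define PAGES_PER_SECTION       0x8000UL        /* 128MB / 4KB */

int main(void)
{
        unsigned long pfn = 0x7000, nr_pages = 0x3000;

        while (nr_pages) {
                unsigned long offset = pfn & (PAGES_PER_SECTION - 1);
                unsigned long pfns = nr_pages < PAGES_PER_SECTION - offset ?
                                nr_pages : PAGES_PER_SECTION - offset;

                /* prints "section 0: 0x1000 pfns", then "section 1: 0x2000 pfns" */
                printf("section %lu: %#lx pfns\n",
                                pfn / PAGES_PER_SECTION, pfns);
                pfn += pfns;
                nr_pages -= pfns;
        }
        return 0;
}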

Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/memory_hotplug.h |    7 ++-
 mm/memory_hotplug.c            |  103 ++++++++++++++++++++++++----------------
 mm/sparse.c                    |    7 ++-
 3 files changed, 71 insertions(+), 46 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 31b768bd1268..70dd3b4d9ceb 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -355,9 +355,10 @@ extern int add_memory_resource(int nid, struct resource *resource);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap);
 extern bool is_memblock_offlined(struct memory_block *mem);
-extern int sparse_add_one_section(int nid, unsigned long start_pfn,
-				  struct vmem_altmap *altmap);
-extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
+extern int sparse_add_section(int nid, unsigned long pfn,
+		unsigned long nr_pages, struct vmem_altmap *altmap);
+extern void sparse_remove_section(struct zone *zone, struct mem_section *ms,
+		unsigned long pfn, unsigned long nr_pages,
 		unsigned long map_offset, struct vmem_altmap *altmap);
 extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
 					  unsigned long pnum);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 055cea62be6e..6622c4d06ac3 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -251,22 +251,44 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
 }
 #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
 
-static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
-		struct vmem_altmap *altmap, bool want_memblock)
+static int __meminit __add_section(int nid, unsigned long pfn,
+		unsigned long nr_pages,	struct vmem_altmap *altmap,
+		bool want_memblock)
 {
 	int ret;
 
-	if (pfn_valid(phys_start_pfn))
+	if (pfn_valid(pfn))
 		return -EEXIST;
 
-	ret = sparse_add_one_section(nid, phys_start_pfn, altmap);
+	ret = sparse_add_section(nid, pfn, nr_pages, altmap);
 	if (ret < 0)
 		return ret;
 
 	if (!want_memblock)
 		return 0;
 
-	return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn));
+	return hotplug_memory_register(nid, __pfn_to_section(pfn));
+}
+
+static int subsection_check(unsigned long pfn, unsigned long nr_pages,
+		struct mhp_restrictions *restrictions, const char *reason)
+{
+	/*
+	 * Only allow partial section hotplug for !memblock ranges,
+	 * since register_new_memory() requires section alignment, and
+	 * CONFIG_SPARSEMEM_VMEMMAP=n requires sections to be fully
+	 * populated.
+	 */
+	if ((!IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP)
+				|| (restrictions->flags & MHP_MEMBLOCK_API))
+			&& ((pfn & ~PAGE_SECTION_MASK)
+				|| (nr_pages & ~PAGE_SECTION_MASK))) {
+		WARN(1, "Sub-section hot-%s incompatible with %s\n", reason,
+				(restrictions->flags & MHP_MEMBLOCK_API)
+				? "memblock api" : "!CONFIG_SPARSEMEM_VMEMMAP");
+		return -EINVAL;
+	}
+	return 0;
 }
 
 /*
@@ -275,23 +297,19 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
  * call this function after deciding the zone to which to
  * add the new pages.
  */
-int __ref __add_pages(int nid, unsigned long phys_start_pfn,
-		unsigned long nr_pages, struct mhp_restrictions *restrictions)
+int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
+		struct mhp_restrictions *restrictions)
 {
 	unsigned long i;
 	int err = 0;
 	int start_sec, end_sec;
 	struct vmem_altmap *altmap = restrictions->altmap;
 
-	/* during initialize mem_map, align hot-added range to section */
-	start_sec = pfn_to_section_nr(phys_start_pfn);
-	end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
-
 	if (altmap) {
 		/*
 		 * Validate altmap is within bounds of the total request
 		 */
-		if (altmap->base_pfn != phys_start_pfn
+		if (altmap->base_pfn != pfn
 				|| vmem_altmap_offset(altmap) > nr_pages) {
 			pr_warn_once("memory add fail, invalid altmap\n");
 			err = -EINVAL;
@@ -300,9 +318,17 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
 		altmap->alloc = 0;
 	}
 
+	start_sec = pfn_to_section_nr(pfn);
+	end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
 	for (i = start_sec; i <= end_sec; i++) {
-		err = __add_section(nid, section_nr_to_pfn(i), altmap,
+		unsigned long pfns;
+
+		pfns = min(nr_pages, PAGES_PER_SECTION
+				- (pfn & ~PAGE_SECTION_MASK));
+		err = __add_section(nid, pfn, pfns, altmap,
 				restrictions->flags & MHP_MEMBLOCK_API);
+		pfn += pfns;
+		nr_pages -= pfns;
 
 		/*
 		 * EEXIST is finally dealt with by ioresource collision
@@ -507,10 +533,10 @@ static void shrink_pgdat_span(struct pglist_data *pgdat,
 	pgdat->node_spanned_pages = 0;
 }
 
-static void __remove_zone(struct zone *zone, unsigned long start_pfn)
+static void __remove_zone(struct zone *zone, unsigned long start_pfn,
+		unsigned long nr_pages)
 {
 	struct pglist_data *pgdat = zone->zone_pgdat;
-	int nr_pages = PAGES_PER_SECTION;
 	unsigned long flags;
 
 	pgdat_resize_lock(zone->zone_pgdat, &flags);
@@ -519,29 +545,26 @@ static void __remove_zone(struct zone *zone, unsigned long start_pfn)
 	pgdat_resize_unlock(zone->zone_pgdat, &flags);
 }
 
-static void __remove_section(struct zone *zone, struct mem_section *ms,
-			     unsigned long map_offset,
-			     struct vmem_altmap *altmap)
+static void __remove_section(struct zone *zone, unsigned long pfn,
+		unsigned long nr_pages, unsigned long map_offset,
+		struct vmem_altmap *altmap)
 {
-	unsigned long start_pfn;
-	int scn_nr;
+	struct mem_section *ms = __nr_to_section(pfn_to_section_nr(pfn));
 
 	if (WARN_ON_ONCE(!valid_section(ms)))
 		return;
 
 	unregister_memory_section(ms);
 
-	scn_nr = __section_nr(ms);
-	start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
-	__remove_zone(zone, start_pfn);
+	__remove_zone(zone, pfn, nr_pages);
 
-	sparse_remove_one_section(zone, ms, map_offset, altmap);
+	sparse_remove_section(zone, ms, pfn, nr_pages, map_offset, altmap);
 }
 
 /**
  * __remove_pages() - remove sections of pages from a zone
  * @zone: zone from which pages need to be removed
- * @phys_start_pfn: starting pageframe (must be aligned to start of a section)
+ * @pfn: starting pageframe (must be aligned to start of a section)
  * @nr_pages: number of pages to remove (must be multiple of section size)
  * @restrictions: optional alternative device page map and other features
  *
@@ -550,11 +573,10 @@ static void __remove_section(struct zone *zone, struct mem_section *ms,
  * sure that pages are marked reserved and zones are adjust properly by
  * calling offline_pages().
  */
-void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
-		unsigned long nr_pages, struct mhp_restrictions *restrictions)
+void __remove_pages(struct zone *zone, unsigned long pfn,
+		 unsigned long nr_pages, struct mhp_restrictions *restrictions)
 {
-	unsigned long i;
-	int sections_to_remove;
+	int i, start_sec, end_sec;
 	unsigned long map_offset = 0;
 	struct vmem_altmap *altmap = restrictions->altmap;
 
@@ -563,19 +585,20 @@ void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
 
 	clear_zone_contiguous(zone);
 
-	/*
-	 * We can only remove entire sections
-	 */
-	BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
-	BUG_ON(nr_pages % PAGES_PER_SECTION);
+	if (subsection_check(pfn, nr_pages, restrictions, "remove"))
+		return;
 
-	sections_to_remove = nr_pages / PAGES_PER_SECTION;
-	for (i = 0; i < sections_to_remove; i++) {
-		unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
+	start_sec = pfn_to_section_nr(pfn);
+	end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
+	for (i = start_sec; i <= end_sec; i++) {
+		unsigned long pfns;
 
 		cond_resched();
-		__remove_section(zone, __pfn_to_section(pfn), map_offset,
-				 altmap);
+		pfns = min(nr_pages, PAGES_PER_SECTION
+				- (pfn & ~PAGE_SECTION_MASK));
+		__remove_section(zone, pfn, pfns, map_offset, altmap);
+		pfn += pfns;
+		nr_pages -= pfns;
 		map_offset = 0;
 	}
 
@@ -1830,7 +1853,7 @@ static void __release_memory_resource(u64 start, u64 size)
  */
 void __ref __remove_memory(int nid, u64 start, u64 size)
 {
-	struct mhp_restrictions restrictions = { 0 };
+	struct mhp_restrictions restrictions = { .flags = MHP_MEMBLOCK_API };
 	int ret;
 
 	BUG_ON(check_hotplug_memory_range(start, size));
diff --git a/mm/sparse.c b/mm/sparse.c
index 98408c0da060..bd45bff78ca1 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -756,8 +756,8 @@ static void free_map_bootmem(struct page *memmap)
  * * -EEXIST	- Section has been present.
  * * -ENOMEM	- Out of memory.
  */
-int __meminit sparse_add_one_section(int nid, unsigned long start_pfn,
-				     struct vmem_altmap *altmap)
+int __meminit sparse_add_section(int nid, unsigned long start_pfn,
+		unsigned long nr_pages, struct vmem_altmap *altmap)
 {
 	unsigned long section_nr = pfn_to_section_nr(start_pfn);
 	struct mem_section_usage *usage;
@@ -866,7 +866,8 @@ static void free_section_usage(struct page *memmap,
 		free_map_bootmem(memmap);
 }
 
-void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
+void sparse_remove_section(struct zone *zone, struct mem_section *ms,
+		unsigned long pfn, unsigned long nr_pages,
 		unsigned long map_offset, struct vmem_altmap *altmap)
 {
 	struct page *memmap = NULL;

* [PATCH v6 09/12] mm/sparsemem: Support sub-section hotplug
  2019-04-17 18:38 ` Dan Williams
@ 2019-04-17 18:39   ` Dan Williams
  -1 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-04-17 18:39 UTC (permalink / raw)
  To: akpm; +Cc: mhocko, linux-nvdimm, david, linux-kernel, linux-mm, Vlastimil Babka

The libnvdimm sub-system has suffered a series of hacks and broken
workarounds for the memory-hotplug implementation's awkward
section-aligned (128MB) granularity. For example, the following backtrace
is emitted when attempting arch_add_memory() with physical address
ranges that intersect 'System RAM' (RAM) with 'Persistent Memory' (PMEM)
within a given section:

 WARNING: CPU: 0 PID: 558 at kernel/memremap.c:300 devm_memremap_pages+0x3b5/0x4c0
 devm_memremap_pages attempted on mixed region [mem 0x200000000-0x2fbffffff flags 0x200]
 [..]
 Call Trace:
   dump_stack+0x86/0xc3
   __warn+0xcb/0xf0
   warn_slowpath_fmt+0x5f/0x80
   devm_memremap_pages+0x3b5/0x4c0
   __wrap_devm_memremap_pages+0x58/0x70 [nfit_test_iomap]
   pmem_attach_disk+0x19a/0x440 [nd_pmem]

Recently it was discovered that the problem goes beyond RAM vs PMEM
collisions: some platforms produce PMEM vs PMEM collisions within a
given section. The libnvdimm workaround for that case revealed that the
libnvdimm section-alignment-padding implementation has been broken for a
long while. A fix for that long-standing breakage would introduce as
many problems as it solves, because it would require a
backward-incompatible change to the namespace metadata interpretation.
Instead of that dubious route [1], address the root problem in the
memory-hotplug implementation.

[1]: https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwillia2-desk3.amr.corp.intel.com
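
For a concrete feel of the collision, here is a stand-alone sketch (not
part of the patch: the end address comes from the warning above, while
the neighboring start address and the 128MB / SECTION_SIZE_BITS=27
section size are assumptions for the worked example). Two
non-overlapping physical ranges still resolve to the same sparsemem
section:

  #include <stdio.h>

  #define SECTION_SIZE_BITS 27    /* 128MB sections, the x86_64 default */

  int main(void)
  {
          unsigned long long pmem_end   = 0x2fbffffffULL; /* end of the mixed region above */
          unsigned long long next_start = 0x2fc000000ULL; /* hypothetical neighboring range */

          /* both physical addresses land in section 95 ... */
          printf("pmem section: %llu\n", pmem_end >> SECTION_SIZE_BITS);
          printf("next section: %llu\n", next_start >> SECTION_SIZE_BITS);

          /* ... so section-granularity hotplug of one interferes with the other */
          return 0;
  }
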
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 mm/sparse.c |  224 ++++++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 150 insertions(+), 74 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index bd45bff78ca1..3411321998b1 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -83,8 +83,15 @@ static int __meminit sparse_index_init(unsigned long section_nr, int nid)
 	unsigned long root = SECTION_NR_TO_ROOT(section_nr);
 	struct mem_section *section;
 
+	/*
+	 * An existing section is possible in the sub-section hotplug
+	 * case. First hot-add instantiates, follow-on hot-add reuses
+	 * the existing section.
+	 *
+	 * The mem_hotplug_lock resolves the apparent race below.
+	 */
 	if (mem_section[root])
-		return -EEXIST;
+		return 0;
 
 	section = sparse_index_alloc(nid);
 	if (!section)
@@ -338,6 +345,15 @@ static void __meminit sparse_init_one_section(struct mem_section *ms,
 		unsigned long pnum, struct page *mem_map,
 		struct mem_section_usage *usage)
 {
+	/*
+	 * Given that SPARSEMEM_VMEMMAP=y supports sub-section hotplug,
+	 * ->section_mem_map can not be guaranteed to point to a full
+	 *  section's worth of memory.  The field is only valid / used
+	 *  in the SPARSEMEM_VMEMMAP=n case.
+	 */
+	if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
+		mem_map = NULL;
+
 	ms->section_mem_map &= ~SECTION_MAP_MASK;
 	ms->section_mem_map |= sparse_encode_mem_map(mem_map, pnum) |
 							SECTION_HAS_MEM_MAP;
@@ -743,10 +759,130 @@ static void free_map_bootmem(struct page *memmap)
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
+#ifndef CONFIG_MEMORY_HOTREMOVE
+static void free_map_bootmem(struct page *memmap)
+{
+}
+#endif
+
+static bool is_early_section(struct mem_section *ms)
+{
+	struct page *usage_page;
+
+	usage_page = virt_to_page(ms->usage);
+	if (PageSlab(usage_page) || PageCompound(usage_page))
+		return false;
+	else
+		return true;
+}
+
+static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
+		int nid, struct vmem_altmap *altmap)
+{
+	unsigned long mask = section_active_mask(pfn, nr_pages);
+	struct mem_section *ms = __pfn_to_section(pfn);
+	bool early_section = is_early_section(ms);
+	struct page *memmap = NULL;
+
+	if (WARN(!ms->usage || (ms->usage->map_active & mask) != mask,
+			"section already deactivated: active: %#lx mask: %#lx\n",
+			ms->usage ? ms->usage->map_active : 0, mask))
+		return;
+
+	if (WARN(!IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP)
+				&& nr_pages < PAGES_PER_SECTION,
+				"partial memory section removal not supported\n"))
+		return;
+
+	/*
+	 * There are 3 cases to handle across two configurations
+	 * (SPARSEMEM_VMEMMAP={y,n}):
+	 *
+	 * 1/ deactivation of a partial hot-added section (only possible
+	 * in the SPARSEMEM_VMEMMAP=y case).
+	 *    a/ section was present at memory init
+	 *    b/ section was hot-added post memory init
+	 * 2/ deactivation of a complete hot-added section
+	 * 3/ deactivation of a complete section from memory init
+	 *
+	 * For 1/, when map_active does not go to zero we will not be
+	 * freeing the usage map, but still need to free the vmemmap
+	 * range.
+	 *
+	 * For 2/ and 3/ the SPARSEMEM_VMEMMAP={y,n} cases are unified
+	 */
+	ms->usage->map_active ^= mask;
+	if (ms->usage->map_active == 0) {
+		unsigned long section_nr = pfn_to_section_nr(pfn);
+
+		if (!early_section) {
+			kfree(ms->usage);
+			ms->usage = NULL;
+		}
+		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
+		ms->section_mem_map = sparse_encode_mem_map(NULL, section_nr);
+	}
+
+	if (early_section && memmap)
+		free_map_bootmem(memmap);
+	else
+		depopulate_section_memmap(pfn, nr_pages, altmap);
+}
+
+static struct page * __meminit section_activate(int nid, unsigned long pfn,
+		unsigned long nr_pages, struct vmem_altmap *altmap)
+{
+	unsigned long mask = section_active_mask(pfn, nr_pages);
+	struct mem_section *ms = __pfn_to_section(pfn);
+	struct mem_section_usage *usage = NULL;
+	struct page *memmap;
+	int rc = 0;
+
+	if (!ms->usage) {
+		usage = kzalloc(mem_section_usage_size(), GFP_KERNEL);
+		if (!usage)
+			return ERR_PTR(-ENOMEM);
+		ms->usage = usage;
+	}
+
+	if (!mask)
+		rc = -EINVAL;
+	else if (mask & ms->usage->map_active)
+		rc = -EEXIST;
+	else
+		ms->usage->map_active |= mask;
+
+	if (rc) {
+		if (usage)
+			ms->usage = NULL;
+		kfree(usage);
+		return ERR_PTR(rc);
+	}
+
+	/*
+	 * The early init code does not consider partially populated
+	 * initial sections, it simply assumes that memory will never be
+	 * referenced.  If we hot-add memory into such a section then we
+	 * do not need to populate the memmap and can simply reuse what
+	 * is already there.
+	 */
+	if (nr_pages < PAGES_PER_SECTION && is_early_section(ms))
+		return pfn_to_page(pfn);
+
+	memmap = populate_section_memmap(pfn, nr_pages, nid, altmap);
+	if (!memmap) {
+		section_deactivate(pfn, nr_pages, nid, altmap);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return memmap;
+}
+
 /**
- * sparse_add_one_section - add a memory section
+ * sparse_add_section - add a memory section, or populate an existing one
  * @nid: The node to add section on
  * @start_pfn: start pfn of the memory range
+ * @nr_pages: number of pfns to add in the section
  * @altmap: device page map
  *
  * This is only intended for hotplug.
@@ -760,49 +896,30 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap)
 {
 	unsigned long section_nr = pfn_to_section_nr(start_pfn);
-	struct mem_section_usage *usage;
-	struct mem_section *ms;
+	struct mem_section *ms = __pfn_to_section(start_pfn);
 	struct page *memmap;
 	int ret;
 
-	/*
-	 * no locking for this, because it does its own
-	 * plus, it does a kmalloc
-	 */
 	ret = sparse_index_init(section_nr, nid);
 	if (ret < 0 && ret != -EEXIST)
 		return ret;
-	ret = 0;
-	memmap = populate_section_memmap(start_pfn, PAGES_PER_SECTION, nid,
-			altmap);
-	if (!memmap)
-		return -ENOMEM;
-	usage = kzalloc(mem_section_usage_size(), GFP_KERNEL);
-	if (!usage) {
-		depopulate_section_memmap(start_pfn, PAGES_PER_SECTION, altmap);
-		return -ENOMEM;
-	}
 
-	ms = __pfn_to_section(start_pfn);
-	if (ms->section_mem_map & SECTION_MARKED_PRESENT) {
-		ret = -EEXIST;
-		goto out;
-	}
+	memmap = section_activate(nid, start_pfn, nr_pages, altmap);
+	if (IS_ERR(memmap))
+		return PTR_ERR(memmap);
+	ret = 0;
 
 	/*
 	 * Poison uninitialized struct pages in order to catch invalid flags
 	 * combinations.
 	 */
-	page_init_poison(memmap, sizeof(struct page) * PAGES_PER_SECTION);
+	page_init_poison(pfn_to_page(start_pfn), sizeof(struct page) * nr_pages);
 
 	section_mark_present(ms);
-	sparse_init_one_section(ms, section_nr, memmap, usage);
+	sparse_init_one_section(ms, section_nr, memmap, ms->usage);
 
-out:
-	if (ret < 0) {
-		kfree(usage);
-		depopulate_section_memmap(start_pfn, PAGES_PER_SECTION, altmap);
-	}
+	if (ret < 0)
+		section_deactivate(start_pfn, nr_pages, nid, altmap);
 	return ret;
 }
 
@@ -837,54 +954,13 @@ static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
 }
 #endif
 
-static void free_section_usage(struct page *memmap,
-		struct mem_section_usage *usage, unsigned long pfn,
-		unsigned long nr_pages, struct vmem_altmap *altmap)
-{
-	struct page *usage_page;
-
-	if (!usage)
-		return;
-
-	usage_page = virt_to_page(usage);
-	/*
-	 * Check to see if allocation came from hot-plug-add
-	 */
-	if (PageSlab(usage_page) || PageCompound(usage_page)) {
-		kfree(usage);
-		if (memmap)
-			depopulate_section_memmap(pfn, nr_pages, altmap);
-		return;
-	}
-
-	/*
-	 * The usemap came from bootmem. This is packed with other usemaps
-	 * on the section which has pgdat at boot time. Just keep it as is now.
-	 */
-
-	if (memmap)
-		free_map_bootmem(memmap);
-}
-
 void sparse_remove_section(struct zone *zone, struct mem_section *ms,
 		unsigned long pfn, unsigned long nr_pages,
 		unsigned long map_offset, struct vmem_altmap *altmap)
 {
-	struct page *memmap = NULL;
-	struct mem_section_usage *usage = NULL;
-
-	if (ms->section_mem_map) {
-		usage = ms->usage;
-		memmap = sparse_decode_mem_map(ms->section_mem_map,
-						__section_nr(ms));
-		ms->section_mem_map = 0;
-		ms->usage = NULL;
-	}
-
-	clear_hwpoisoned_pages(memmap + map_offset,
-			PAGES_PER_SECTION - map_offset);
-	free_section_usage(memmap, usage, section_nr_to_pfn(__section_nr(ms)),
-			PAGES_PER_SECTION, altmap);
+	clear_hwpoisoned_pages(pfn_to_page(pfn) + map_offset,
+			nr_pages - map_offset);
+	section_deactivate(pfn, nr_pages, zone_to_nid(zone), altmap);
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 #endif /* CONFIG_MEMORY_HOTPLUG */

* [PATCH v6 10/12] mm/devm_memremap_pages: Enable sub-section remap
  2019-04-17 18:38 ` Dan Williams
@ 2019-04-17 18:39   ` Dan Williams
  -1 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-04-17 18:39 UTC (permalink / raw)
  To: akpm
  Cc: mhocko, linux-nvdimm, david, linux-kernel, linux-mm,
	Jérôme Glisse

Teach devm_memremap_pages() about the new sub-section capabilities of
arch_{add,remove}_memory(). Effectively, just replace all usage of
align_start, align_end, and align_size with res->start, res->end, and
resource_size(res). The existing sanity check will still make sure that
the two separate remap attempts do not collide within a sub-section (2MB
on x86).
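
To see what the old rounding dragged in, here is a stand-alone sketch
(illustrative only; the resource start and size are assumed values, and
the 128MB section / 2MB sub-section figures are the x86_64 defaults)
comparing the section-aligned remap span with the new pass-through:

  #include <stdio.h>

  #define SZ_2M           (2ULL << 20)
  #define PA_SECTION_SIZE (128ULL << 20)  /* x86_64 sparsemem section */
  #define ALIGN_UP(x, a)  (((x) + (a) - 1) & ~((a) - 1))

  int main(void)
  {
          /* hypothetical 1GB pmem resource starting 48MB into a section */
          unsigned long long start = (64ULL << 30) + 24 * SZ_2M;
          unsigned long long size  = 1ULL << 30;

          /* old behavior: round the remap out to 128MB section boundaries */
          unsigned long long align_start = start & ~(PA_SECTION_SIZE - 1);
          unsigned long long align_size  =
                  ALIGN_UP(start + size, PA_SECTION_SIZE) - align_start;

          printf("section-aligned remap: %#llx + %#llx (%llu MB extra)\n",
                 align_start, align_size, (align_size - size) >> 20);

          /* new behavior: the resource is passed through at its native
           * bounds and only needs 2MB sub-section alignment */
          printf("sub-section remap:     %#llx + %#llx\n", start, size);
          return 0;
  }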

Cc: Michal Hocko <mhocko@suse.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 kernel/memremap.c |   58 ++++++++++++++++++++++-------------------------------
 1 file changed, 24 insertions(+), 34 deletions(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index 33475e211568..ca74a006371a 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -59,7 +59,7 @@ static unsigned long pfn_first(struct dev_pagemap *pgmap)
 	struct vmem_altmap *altmap = &pgmap->altmap;
 	unsigned long pfn;
 
-	pfn = res->start >> PAGE_SHIFT;
+	pfn = PHYS_PFN(res->start);
 	if (pgmap->altmap_valid)
 		pfn += vmem_altmap_offset(altmap);
 	return pfn;
@@ -87,7 +87,6 @@ static void devm_memremap_pages_release(void *data)
 	struct dev_pagemap *pgmap = data;
 	struct device *dev = pgmap->dev;
 	struct resource *res = &pgmap->res;
-	resource_size_t align_start, align_size;
 	unsigned long pfn;
 	int nid;
 
@@ -96,28 +95,25 @@ static void devm_memremap_pages_release(void *data)
 		put_page(pfn_to_page(pfn));
 
 	/* pages are dead and unused, undo the arch mapping */
-	align_start = res->start & ~(PA_SECTION_SIZE - 1);
-	align_size = ALIGN(res->start + resource_size(res), PA_SECTION_SIZE)
-		- align_start;
-
-	nid = page_to_nid(pfn_to_page(align_start >> PAGE_SHIFT));
+	nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start)));
 
 	mem_hotplug_begin();
 	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
-		pfn = align_start >> PAGE_SHIFT;
+		pfn = PHYS_PFN(res->start);
 		__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
-				align_size >> PAGE_SHIFT, NULL);
+				PHYS_PFN(resource_size(res)), NULL);
 	} else {
 		struct mhp_restrictions restrictions = {
 			.altmap = pgmap->altmap_valid ? &pgmap->altmap : NULL,
 		};
 
-		arch_remove_memory(nid, align_start, align_size, &restrictions);
-		kasan_remove_zero_shadow(__va(align_start), align_size);
+		arch_remove_memory(nid, res->start, resource_size(res),
+				&restrictions);
+		kasan_remove_zero_shadow(__va(res->start), resource_size(res));
 	}
 	mem_hotplug_done();
 
-	untrack_pfn(NULL, PHYS_PFN(align_start), align_size);
+	untrack_pfn(NULL, PHYS_PFN(res->start), resource_size(res));
 	pgmap_array_delete(res);
 	dev_WARN_ONCE(dev, pgmap->altmap.alloc,
 		      "%s: failed to free all reserved pages\n", __func__);
@@ -144,7 +140,6 @@ static void devm_memremap_pages_release(void *data)
  */
 void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 {
-	resource_size_t align_start, align_size, align_end;
 	struct resource *res = &pgmap->res;
 	struct dev_pagemap *conflict_pgmap;
 	struct mhp_restrictions restrictions = {
@@ -160,26 +155,21 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 	if (!pgmap->ref || !pgmap->kill)
 		return ERR_PTR(-EINVAL);
 
-	align_start = res->start & ~(PA_SECTION_SIZE - 1);
-	align_size = ALIGN(res->start + resource_size(res), PA_SECTION_SIZE)
-		- align_start;
-	align_end = align_start + align_size - 1;
-
-	conflict_pgmap = get_dev_pagemap(PHYS_PFN(align_start), NULL);
+	conflict_pgmap = get_dev_pagemap(PHYS_PFN(res->start), NULL);
 	if (conflict_pgmap) {
 		dev_WARN(dev, "Conflicting mapping in same section\n");
 		put_dev_pagemap(conflict_pgmap);
 		return ERR_PTR(-ENOMEM);
 	}
 
-	conflict_pgmap = get_dev_pagemap(PHYS_PFN(align_end), NULL);
+	conflict_pgmap = get_dev_pagemap(PHYS_PFN(res->end), NULL);
 	if (conflict_pgmap) {
 		dev_WARN(dev, "Conflicting mapping in same section\n");
 		put_dev_pagemap(conflict_pgmap);
 		return ERR_PTR(-ENOMEM);
 	}
 
-	is_ram = region_intersects(align_start, align_size,
+	is_ram = region_intersects(res->start, resource_size(res),
 		IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE);
 
 	if (is_ram != REGION_DISJOINT) {
@@ -200,8 +190,8 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 	if (nid < 0)
 		nid = numa_mem_id();
 
-	error = track_pfn_remap(NULL, &pgprot, PHYS_PFN(align_start), 0,
-			align_size);
+	error = track_pfn_remap(NULL, &pgprot, PHYS_PFN(res->start), 0,
+			resource_size(res));
 	if (error)
 		goto err_pfn_remap;
 
@@ -219,25 +209,25 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 	 * arch_add_memory().
 	 */
 	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
-		error = add_pages(nid, align_start >> PAGE_SHIFT,
-				align_size >> PAGE_SHIFT, &restrictions);
+		error = add_pages(nid, PHYS_PFN(res->start),
+				PHYS_PFN(resource_size(res)), &restrictions);
 	} else {
-		error = kasan_add_zero_shadow(__va(align_start), align_size);
+		error = kasan_add_zero_shadow(__va(res->start), resource_size(res));
 		if (error) {
 			mem_hotplug_done();
 			goto err_kasan;
 		}
 
-		error = arch_add_memory(nid, align_start, align_size,
-					&restrictions);
+		error = arch_add_memory(nid, res->start, resource_size(res),
+				&restrictions);
 	}
 
 	if (!error) {
 		struct zone *zone;
 
 		zone = &NODE_DATA(nid)->node_zones[ZONE_DEVICE];
-		move_pfn_range_to_zone(zone, align_start >> PAGE_SHIFT,
-				align_size >> PAGE_SHIFT, restrictions.altmap);
+		move_pfn_range_to_zone(zone, PHYS_PFN(res->start),
+				PHYS_PFN(resource_size(res)), restrictions.altmap);
 	}
 
 	mem_hotplug_done();
@@ -249,8 +239,8 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 	 * to allow us to do the work while not holding the hotplug lock.
 	 */
 	memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
-				align_start >> PAGE_SHIFT,
-				align_size >> PAGE_SHIFT, pgmap);
+				PHYS_PFN(res->start),
+				PHYS_PFN(resource_size(res)), pgmap);
 	percpu_ref_get_many(pgmap->ref, pfn_end(pgmap) - pfn_first(pgmap));
 
 	error = devm_add_action_or_reset(dev, devm_memremap_pages_release,
@@ -261,9 +251,9 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 	return __va(res->start);
 
  err_add_memory:
-	kasan_remove_zero_shadow(__va(align_start), align_size);
+	kasan_remove_zero_shadow(__va(res->start), resource_size(res));
  err_kasan:
-	untrack_pfn(NULL, PHYS_PFN(align_start), align_size);
+	untrack_pfn(NULL, PHYS_PFN(res->start), resource_size(res));
  err_pfn_remap:
 	pgmap_array_delete(res);
  err_array:

* [PATCH v6 11/12] libnvdimm/pfn: Fix fsdax-mode namespace info-block zero-fields
  2019-04-17 18:38 ` Dan Williams
@ 2019-04-17 18:39   ` Dan Williams
  -1 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-04-17 18:39 UTC (permalink / raw)
  To: akpm; +Cc: mhocko, david, linux-nvdimm, linux-kernel, stable, linux-mm

At namespace creation time there is the potential for the "expected to
be zero" fields of a 'pfn' info-block to be filled with indeterminate
data. While the kernel buffer is zeroed on allocation it is immediately
overwritten by nd_pfn_validate() filling it with the current contents of
the on-media info-block location. For fields like 'flags' and the
'padding' it potentially means that future implementations can not rely
on those fields being zero.

In preparation to stop using the 'start_pad' and 'end_trunc' fields for
section alignment, arrange for fields that are not explicitly
initialized to be guaranteed zero. Bump the minor version to indicate it
is safe to assume the 'padding' and 'flags' are zero. Otherwise, this
corruption is expected to be benign since all other critical fields are
explicitly initialized.
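
A minimal userspace analogue of that ordering hazard (all names and the
struct layout here are hypothetical, not the libnvdimm code): zeroing at
allocation time buys nothing because validation overwrites the buffer,
so the zero-fill has to happen after validation fails and before init:

  #include <errno.h>
  #include <stdlib.h>
  #include <string.h>

  struct info_block {
          unsigned int flags;
          unsigned char padding[4000];
  };

  /* stand-in for nd_pfn_validate(): fills *sb from "media" and reports
   * -ENODEV when no valid info block is found there */
  static int validate(struct info_block *sb)
  {
          memset(sb, 0xff, sizeof(*sb));  /* pretend the media holds stale data */
          return -ENODEV;
  }

  int main(void)
  {
          struct info_block *sb = malloc(sizeof(*sb));    /* zeroing here would be futile */

          if (!sb)
                  return 1;
          if (validate(sb) != -ENODEV) {
                  free(sb);       /* a valid info block was read back: use it as-is */
                  return 0;
          }

          /* no info block on media: zero-fill now, right before init, so every
           * field that is not explicitly initialized is guaranteed zero */
          memset(sb, 0, sizeof(*sb));
          free(sb);
          return 0;
  }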

Fixes: 32ab0a3f5170 ("libnvdimm, pmem: 'struct page' for pmem")
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/nvdimm/dax_devs.c |    2 +-
 drivers/nvdimm/pfn.h      |    1 +
 drivers/nvdimm/pfn_devs.c |   18 +++++++++++++++---
 3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/nvdimm/dax_devs.c b/drivers/nvdimm/dax_devs.c
index 0453f49dc708..326f02ffca81 100644
--- a/drivers/nvdimm/dax_devs.c
+++ b/drivers/nvdimm/dax_devs.c
@@ -126,7 +126,7 @@ int nd_dax_probe(struct device *dev, struct nd_namespace_common *ndns)
 	nvdimm_bus_unlock(&ndns->dev);
 	if (!dax_dev)
 		return -ENOMEM;
-	pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
+	pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
 	nd_pfn->pfn_sb = pfn_sb;
 	rc = nd_pfn_validate(nd_pfn, DAX_SIG);
 	dev_dbg(dev, "dax: %s\n", rc == 0 ? dev_name(dax_dev) : "<none>");
diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h
index dde9853453d3..e901e3a3b04c 100644
--- a/drivers/nvdimm/pfn.h
+++ b/drivers/nvdimm/pfn.h
@@ -36,6 +36,7 @@ struct nd_pfn_sb {
 	__le32 end_trunc;
 	/* minor-version-2 record the base alignment of the mapping */
 	__le32 align;
+	/* minor-version-3 guarantee the padding and flags are zero */
 	u8 padding[4000];
 	__le64 checksum;
 };
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 01f40672507f..a2406253eb70 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -420,6 +420,15 @@ static int nd_pfn_clear_memmap_errors(struct nd_pfn *nd_pfn)
 	return 0;
 }
 
+/**
+ * nd_pfn_validate - read and validate info-block
+ * @nd_pfn: fsdax namespace runtime state / properties
+ * @sig: 'devdax' or 'fsdax' signature
+ *
+ * Upon return the info-block buffer contents (->pfn_sb) are
+ * indeterminate when validation fails, and a coherent info-block
+ * otherwise.
+ */
 int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
 {
 	u64 checksum, offset;
@@ -565,7 +574,7 @@ int nd_pfn_probe(struct device *dev, struct nd_namespace_common *ndns)
 	nvdimm_bus_unlock(&ndns->dev);
 	if (!pfn_dev)
 		return -ENOMEM;
-	pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
+	pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
 	nd_pfn = to_nd_pfn(pfn_dev);
 	nd_pfn->pfn_sb = pfn_sb;
 	rc = nd_pfn_validate(nd_pfn, PFN_SIG);
@@ -702,7 +711,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 	u64 checksum;
 	int rc;
 
-	pfn_sb = devm_kzalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL);
+	pfn_sb = devm_kmalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL);
 	if (!pfn_sb)
 		return -ENOMEM;
 
@@ -711,11 +720,14 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 		sig = DAX_SIG;
 	else
 		sig = PFN_SIG;
+
 	rc = nd_pfn_validate(nd_pfn, sig);
 	if (rc != -ENODEV)
 		return rc;
 
 	/* no info block, do init */;
+	memset(pfn_sb, 0, sizeof(*pfn_sb));
+
 	nd_region = to_nd_region(nd_pfn->dev.parent);
 	if (nd_region->ro) {
 		dev_info(&nd_pfn->dev,
@@ -768,7 +780,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 	memcpy(pfn_sb->uuid, nd_pfn->uuid, 16);
 	memcpy(pfn_sb->parent_uuid, nd_dev_to_uuid(&ndns->dev), 16);
 	pfn_sb->version_major = cpu_to_le16(1);
-	pfn_sb->version_minor = cpu_to_le16(2);
+	pfn_sb->version_minor = cpu_to_le16(3);
 	pfn_sb->start_pad = cpu_to_le32(start_pad);
 	pfn_sb->end_trunc = cpu_to_le32(end_trunc);
 	pfn_sb->align = cpu_to_le32(nd_pfn->align);

* [PATCH v6 12/12] libnvdimm/pfn: Stop padding pmem namespaces to section alignment
  2019-04-17 18:38 ` Dan Williams
@ 2019-04-17 18:39   ` Dan Williams
  -1 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-04-17 18:39 UTC (permalink / raw)
  To: akpm; +Cc: mhocko, david, linux-nvdimm, linux-kernel, linux-mm

Now that the mm core supports section-unaligned hotplug of ZONE_DEVICE
memory, we no longer need to add padding at pfn/dax device creation
time. The kernel will still honor padding established by older kernels.
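
For reference, the reserve/offset arithmetic that remains after this
change reduces to the userspace sketch below. The 4K page size, 2M
sub-section size, 64-byte 'struct page', and the example base/size/align
values are illustrative assumptions, not values taken from the patch:

#include <stdio.h>
#include <stdint.h>

#define SZ_8K		(8ULL << 10)
#define PAGE_SZ		4096ULL
#define SUBSECTION_SZ	(2ULL << 20)	/* stand-in for SECTION_ACTIVE_SIZE */
#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
	uint64_t start = 0x240000000ULL;	/* example namespace base */
	uint64_t size  = 16ULL << 30;		/* example namespace size */
	uint64_t align = 2ULL << 20;		/* example nd_pfn->align */
	uint64_t a = align > SUBSECTION_SZ ? align : SUBSECTION_SZ;
	uint64_t npfns, offset;

	/* first pass: pfns counted over everything past the 8K info block */
	npfns = (size - SZ_8K) / PAGE_SZ;

	/* PFN_MODE_PMEM: reserve info block plus memmap, sub-section aligned */
	offset = ALIGN_UP(start + SZ_8K + 64 * npfns, a) - start;

	/* second pass: pfns actually covered once the reserve is carved out */
	npfns = (size - offset) / PAGE_SZ;

	printf("offset=%llu bytes, npfns=%llu\n",
			(unsigned long long)offset, (unsigned long long)npfns);
	return 0;
}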

Reported-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/nvdimm/pfn.h      |   11 ++-----
 drivers/nvdimm/pfn_devs.c |   75 +++++++--------------------------------------
 include/linux/mmzone.h    |    4 ++
 3 files changed, 19 insertions(+), 71 deletions(-)

diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h
index e901e3a3b04c..ae589cc528f2 100644
--- a/drivers/nvdimm/pfn.h
+++ b/drivers/nvdimm/pfn.h
@@ -41,18 +41,13 @@ struct nd_pfn_sb {
 	__le64 checksum;
 };
 
-#ifdef CONFIG_SPARSEMEM
-#define PFN_SECTION_ALIGN_DOWN(x) SECTION_ALIGN_DOWN(x)
-#define PFN_SECTION_ALIGN_UP(x) SECTION_ALIGN_UP(x)
-#else
 /*
  * In this case ZONE_DEVICE=n and we will disable 'pfn' device support,
  * but we still want pmem to compile.
  */
-#define PFN_SECTION_ALIGN_DOWN(x) (x)
-#define PFN_SECTION_ALIGN_UP(x) (x)
+#ifndef SUB_SECTION_ALIGN_DOWN
+#define SUB_SECTION_ALIGN_DOWN(x) (x)
+#define SUB_SECTION_ALIGN_UP(x) (x)
 #endif
 
-#define PHYS_SECTION_ALIGN_DOWN(x) PFN_PHYS(PFN_SECTION_ALIGN_DOWN(PHYS_PFN(x)))
-#define PHYS_SECTION_ALIGN_UP(x) PFN_PHYS(PFN_SECTION_ALIGN_UP(PHYS_PFN(x)))
 #endif /* __NVDIMM_PFN_H */
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index a2406253eb70..7bdaaf3dc77e 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -595,14 +595,14 @@ static u32 info_block_reserve(void)
 }
 
 /*
- * We hotplug memory at section granularity, pad the reserved area from
- * the previous section base to the namespace base address.
+ * We hotplug memory at sub-section granularity, pad the reserved area
+ * from the previous section base to the namespace base address.
  */
 static unsigned long init_altmap_base(resource_size_t base)
 {
 	unsigned long base_pfn = PHYS_PFN(base);
 
-	return PFN_SECTION_ALIGN_DOWN(base_pfn);
+	return SUB_SECTION_ALIGN_DOWN(base_pfn);
 }
 
 static unsigned long init_altmap_reserve(resource_size_t base)
@@ -610,7 +610,7 @@ static unsigned long init_altmap_reserve(resource_size_t base)
 	unsigned long reserve = info_block_reserve() >> PAGE_SHIFT;
 	unsigned long base_pfn = PHYS_PFN(base);
 
-	reserve += base_pfn - PFN_SECTION_ALIGN_DOWN(base_pfn);
+	reserve += base_pfn - SUB_SECTION_ALIGN_DOWN(base_pfn);
 	return reserve;
 }
 
@@ -641,8 +641,7 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap)
 		nd_pfn->npfns = le64_to_cpu(pfn_sb->npfns);
 		pgmap->altmap_valid = false;
 	} else if (nd_pfn->mode == PFN_MODE_PMEM) {
-		nd_pfn->npfns = PFN_SECTION_ALIGN_UP((resource_size(res)
-					- offset) / PAGE_SIZE);
+		nd_pfn->npfns = PHYS_PFN((resource_size(res) - offset));
 		if (le64_to_cpu(nd_pfn->pfn_sb->npfns) > nd_pfn->npfns)
 			dev_info(&nd_pfn->dev,
 					"number of pfns truncated from %lld to %ld\n",
@@ -658,50 +657,10 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap)
 	return 0;
 }
 
-static u64 phys_pmem_align_down(struct nd_pfn *nd_pfn, u64 phys)
-{
-	return min_t(u64, PHYS_SECTION_ALIGN_DOWN(phys),
-			ALIGN_DOWN(phys, nd_pfn->align));
-}
-
-/*
- * Check if pmem collides with 'System RAM', or other regions when
- * section aligned.  Trim it accordingly.
- */
-static void trim_pfn_device(struct nd_pfn *nd_pfn, u32 *start_pad, u32 *end_trunc)
-{
-	struct nd_namespace_common *ndns = nd_pfn->ndns;
-	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
-	struct nd_region *nd_region = to_nd_region(nd_pfn->dev.parent);
-	const resource_size_t start = nsio->res.start;
-	const resource_size_t end = start + resource_size(&nsio->res);
-	resource_size_t adjust, size;
-
-	*start_pad = 0;
-	*end_trunc = 0;
-
-	adjust = start - PHYS_SECTION_ALIGN_DOWN(start);
-	size = resource_size(&nsio->res) + adjust;
-	if (region_intersects(start - adjust, size, IORESOURCE_SYSTEM_RAM,
-				IORES_DESC_NONE) == REGION_MIXED
-			|| nd_region_conflict(nd_region, start - adjust, size))
-		*start_pad = PHYS_SECTION_ALIGN_UP(start) - start;
-
-	/* Now check that end of the range does not collide. */
-	adjust = PHYS_SECTION_ALIGN_UP(end) - end;
-	size = resource_size(&nsio->res) + adjust;
-	if (region_intersects(start, size, IORESOURCE_SYSTEM_RAM,
-				IORES_DESC_NONE) == REGION_MIXED
-			|| !IS_ALIGNED(end, nd_pfn->align)
-			|| nd_region_conflict(nd_region, start, size))
-		*end_trunc = end - phys_pmem_align_down(nd_pfn, end);
-}
-
 static int nd_pfn_init(struct nd_pfn *nd_pfn)
 {
 	struct nd_namespace_common *ndns = nd_pfn->ndns;
 	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
-	u32 start_pad, end_trunc, reserve = info_block_reserve();
 	resource_size_t start, size;
 	struct nd_region *nd_region;
 	struct nd_pfn_sb *pfn_sb;
@@ -736,43 +695,35 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 		return -ENXIO;
 	}
 
-	memset(pfn_sb, 0, sizeof(*pfn_sb));
-
-	trim_pfn_device(nd_pfn, &start_pad, &end_trunc);
-	if (start_pad + end_trunc)
-		dev_info(&nd_pfn->dev, "%s alignment collision, truncate %d bytes\n",
-				dev_name(&ndns->dev), start_pad + end_trunc);
-
 	/*
 	 * Note, we use 64 here for the standard size of struct page,
 	 * debugging options may cause it to be larger in which case the
 	 * implementation will limit the pfns advertised through
 	 * ->direct_access() to those that are included in the memmap.
 	 */
-	start = nsio->res.start + start_pad;
+	start = nsio->res.start;
 	size = resource_size(&nsio->res);
-	npfns = PFN_SECTION_ALIGN_UP((size - start_pad - end_trunc - reserve)
-			/ PAGE_SIZE);
+	npfns = PHYS_PFN(size - SZ_8K);
 	if (nd_pfn->mode == PFN_MODE_PMEM) {
 		/*
 		 * The altmap should be padded out to the block size used
 		 * when populating the vmemmap. This *should* be equal to
 		 * PMD_SIZE for most architectures.
 		 */
-		offset = ALIGN(start + reserve + 64 * npfns,
-				max(nd_pfn->align, PMD_SIZE)) - start;
+		offset = ALIGN(start + SZ_8K + 64 * npfns,
+				max(nd_pfn->align, SECTION_ACTIVE_SIZE)) - start;
 	} else if (nd_pfn->mode == PFN_MODE_RAM)
-		offset = ALIGN(start + reserve, nd_pfn->align) - start;
+		offset = ALIGN(start + SZ_8K, nd_pfn->align) - start;
 	else
 		return -ENXIO;
 
-	if (offset + start_pad + end_trunc >= size) {
+	if (offset >= size) {
 		dev_err(&nd_pfn->dev, "%s unable to satisfy requested alignment\n",
 				dev_name(&ndns->dev));
 		return -ENXIO;
 	}
 
-	npfns = (size - offset - start_pad - end_trunc) / SZ_4K;
+	npfns = PHYS_PFN(size - offset);
 	pfn_sb->mode = cpu_to_le32(nd_pfn->mode);
 	pfn_sb->dataoff = cpu_to_le64(offset);
 	pfn_sb->npfns = cpu_to_le64(npfns);
@@ -781,8 +732,6 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 	memcpy(pfn_sb->parent_uuid, nd_dev_to_uuid(&ndns->dev), 16);
 	pfn_sb->version_major = cpu_to_le16(1);
 	pfn_sb->version_minor = cpu_to_le16(3);
-	pfn_sb->start_pad = cpu_to_le32(start_pad);
-	pfn_sb->end_trunc = cpu_to_le32(end_trunc);
 	pfn_sb->align = cpu_to_le32(nd_pfn->align);
 	checksum = nd_sb_checksum((struct nd_gen_sb *) pfn_sb);
 	pfn_sb->checksum = cpu_to_le64(checksum);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 3237c5e456df..d2445c483ad4 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1155,6 +1155,10 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec)
 #define PAGES_PER_SUB_SECTION (SECTION_ACTIVE_SIZE / PAGE_SIZE)
 #define PAGE_SUB_SECTION_MASK (~(PAGES_PER_SUB_SECTION-1))
 
+#define SUB_SECTION_ALIGN_UP(pfn) (((pfn) + PAGES_PER_SUB_SECTION - 1) \
+		& PAGE_SUB_SECTION_MASK)
+#define SUB_SECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SUB_SECTION_MASK)
+
 struct mem_section_usage {
 	/*
 	 * SECTION_ACTIVE_SIZE portions of the section that are populated in
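
The SUB_SECTION_ALIGN_* helpers added above are plain mask arithmetic.
A standalone sketch of their behavior, assuming x86_64 defaults (128M
sections, 64-bit longs, 4K pages, so 512 pages per sub-section):

#include <stdio.h>

#define PAGES_PER_SUB_SECTION 512UL	/* assumed x86_64 value */
#define PAGE_SUB_SECTION_MASK (~(PAGES_PER_SUB_SECTION - 1))

#define SUB_SECTION_ALIGN_UP(pfn) (((pfn) + PAGES_PER_SUB_SECTION - 1) \
		& PAGE_SUB_SECTION_MASK)
#define SUB_SECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SUB_SECTION_MASK)

int main(void)
{
	unsigned long pfn = 0x240000UL + 300;	/* arbitrary unaligned pfn */

	printf("pfn %lu aligns down to %lu and up to %lu\n",
			pfn, SUB_SECTION_ALIGN_DOWN(pfn),
			SUB_SECTION_ALIGN_UP(pfn));
	return 0;
}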

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 07/12] mm: Kill is_dev_zone() helper
  2019-04-17 18:39   ` Dan Williams
@ 2019-04-17 20:17     ` David Hildenbrand
  -1 siblings, 0 replies; 111+ messages in thread
From: David Hildenbrand @ 2019-04-17 20:17 UTC (permalink / raw)
  To: Dan Williams, akpm; +Cc: linux-mm, Michal Hocko, linux-kernel, linux-nvdimm

On 17.04.19 20:39, Dan Williams wrote:
> Given there are no more usages of is_dev_zone() outside of 'ifdef
> CONFIG_ZONE_DEVICE' protection, kill off the compilation helper.
> 
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  include/linux/mmzone.h |   12 ------------
>  mm/page_alloc.c        |    2 +-
>  2 files changed, 1 insertion(+), 13 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index b13f0cddf75e..3237c5e456df 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -855,18 +855,6 @@ static inline int local_memory_node(int node_id) { return node_id; };
>   */
>  #define zone_idx(zone)		((zone) - (zone)->zone_pgdat->node_zones)
>  
> -#ifdef CONFIG_ZONE_DEVICE
> -static inline bool is_dev_zone(const struct zone *zone)
> -{
> -	return zone_idx(zone) == ZONE_DEVICE;
> -}
> -#else
> -static inline bool is_dev_zone(const struct zone *zone)
> -{
> -	return false;
> -}
> -#endif
> -
>  /*
>   * Returns true if a zone has pages managed by the buddy allocator.
>   * All the reclaim decisions have to use this function rather than
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c9ad28a78018..fd455bd742d5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5844,7 +5844,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
>  	unsigned long start = jiffies;
>  	int nid = pgdat->node_id;
>  
> -	if (WARN_ON_ONCE(!pgmap || !is_dev_zone(zone)))
> +	if (WARN_ON_ONCE(!pgmap || zone_idx(zone) != ZONE_DEVICE))
>  		return;
>  
>  	/*
> 

I like seeing that go

Acked-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 11/12] libnvdimm/pfn: Fix fsdax-mode namespace info-block zero-fields
  2019-04-17 18:39   ` Dan Williams
@ 2019-04-17 22:02     ` Andrew Morton
  -1 siblings, 0 replies; 111+ messages in thread
From: Andrew Morton @ 2019-04-17 22:02 UTC (permalink / raw)
  To: Dan Williams; +Cc: mhocko, david, linux-nvdimm, linux-kernel, stable, linux-mm

On Wed, 17 Apr 2019 11:39:52 -0700 Dan Williams <dan.j.williams@intel.com> wrote:

> At namespace creation time there is the potential for the "expected to
> be zero" fields of a 'pfn' info-block to be filled with indeterminate
> data. While the kernel buffer is zeroed on allocation it is immediately
> overwritten by nd_pfn_validate() filling it with the current contents of
> the on-media info-block location. For fields like, 'flags' and the
> 'padding' it potentially means that future implementations can not rely
> on those fields being zero.
> 
> In preparation to stop using the 'start_pad' and 'end_trunc' fields for
> section alignment, arrange for fields that are not explicitly
> initialized to be guaranteed zero. Bump the minor version to indicate it
> is safe to assume the 'padding' and 'flags' are zero. Otherwise, this
> corruption is expected to benign since all other critical fields are
> explicitly initialized.
> 
> Fixes: 32ab0a3f5170 ("libnvdimm, pmem: 'struct page' for pmem")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Buried at the end of a 12 patch series.  Should this be a standalone
patch, suitable for a prompt merge?

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 00/12] mm: Sub-section memory hotplug support
  2019-04-17 18:38 ` Dan Williams
@ 2019-04-17 22:03   ` Andrew Morton
  -1 siblings, 0 replies; 111+ messages in thread
From: Andrew Morton @ 2019-04-17 22:03 UTC (permalink / raw)
  To: Dan Williams
  Cc: Michal Hocko, David Hildenbrand, linux-nvdimm, stable,
	linux-kernel, linux-mm, Jérôme Glisse, Vlastimil Babka

On Wed, 17 Apr 2019 11:38:55 -0700 Dan Williams <dan.j.williams@intel.com> wrote:

> The memory hotplug section is an arbitrary / convenient unit for memory
> hotplug. 'Section-size' units have bled into the user interface
> ('memblock' sysfs) and can not be changed without breaking existing
> userspace. The section-size constraint, while mostly benign for typical
> memory hotplug, has and continues to wreak havoc with 'device-memory'
> use cases, persistent memory (pmem) in particular. Recall that pmem uses
> devm_memremap_pages(), and subsequently arch_add_memory(), to allocate a
> 'struct page' memmap for pmem. However, it does not use the 'bottom
> half' of memory hotplug, i.e. never marks pmem pages online and never
> exposes the userspace memblock interface for pmem. This leaves an
> opening to redress the section-size constraint.

v6 and we're not showing any review activity.  Who would be suitable
people to help out here?

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 11/12] libnvdimm/pfn: Fix fsdax-mode namespace info-block zero-fields
  2019-04-17 22:02     ` Andrew Morton
@ 2019-04-17 22:09       ` Dan Williams
  -1 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-04-17 22:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: stable, Linux MM, linux-nvdimm, Linux Kernel Mailing List,
	Michal Hocko, David Hildenbrand

On Wed, Apr 17, 2019 at 3:02 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Wed, 17 Apr 2019 11:39:52 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
>
> > At namespace creation time there is the potential for the "expected to
> > be zero" fields of a 'pfn' info-block to be filled with indeterminate
> > data. While the kernel buffer is zeroed on allocation it is immediately
> > overwritten by nd_pfn_validate() filling it with the current contents of
> > the on-media info-block location. For fields like, 'flags' and the
> > 'padding' it potentially means that future implementations can not rely
> > on those fields being zero.
> >
> > In preparation to stop using the 'start_pad' and 'end_trunc' fields for
> > section alignment, arrange for fields that are not explicitly
> > initialized to be guaranteed zero. Bump the minor version to indicate it
> > is safe to assume the 'padding' and 'flags' are zero. Otherwise, this
> > corruption is expected to benign since all other critical fields are
> > explicitly initialized.
> >
> > Fixes: 32ab0a3f5170 ("libnvdimm, pmem: 'struct page' for pmem")
> > Cc: <stable@vger.kernel.org>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>
> Buried at the end of a 12 patch series.  Should this be a standalone
> patch, suitable for a prompt merge?

It's not a problem unless a kernel implementation explicitly expects
those fields to be zero-initialized. I only marked it for -stable in
case some future kernel backports patch 12. Otherwise it's benign on
older kernels that don't have patch 12, since all fields are indeed
initialized.
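
To make the lifecycle concrete, here is a userspace simulation of the
pattern being discussed; the info-block layout, the stale "media"
contents, and the stubbed validate() are made-up stand-ins, not the
nvdimm code itself:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct info_block {
	unsigned int flags;
	unsigned char padding[60];
};

/* stand-in for nd_pfn_validate(): copies whatever is "on media" into buf */
static int validate(struct info_block *buf, const struct info_block *media)
{
	memcpy(buf, media, sizeof(*buf));
	return -ENODEV;		/* pretend no valid info block was found */
}

int main(void)
{
	struct info_block media = { .flags = 0xdeadbeef };	/* stale bytes */
	struct info_block *buf = calloc(1, sizeof(*buf));	/* zeroed... */

	if (!buf)
		return 1;
	if (validate(buf, &media) == -ENODEV) {	/* ...but overwritten here */
		/* init path: guarantee the unwritten fields read back as zero */
		memset(buf, 0, sizeof(*buf));
		/* fill in only the fields the current version defines */
	}
	printf("flags after init: %#x\n", buf->flags);
	free(buf);
	return 0;
}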

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 00/12] mm: Sub-section memory hotplug support
  2019-04-17 22:03   ` Andrew Morton
@ 2019-04-17 22:59     ` Dan Williams
  -1 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-04-17 22:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, David Hildenbrand, linux-nvdimm, stable,
	Linux Kernel Mailing List, Linux MM, Jérôme Glisse,
	Vlastimil Babka, osalvador

On Wed, Apr 17, 2019 at 3:04 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Wed, 17 Apr 2019 11:38:55 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
>
> > The memory hotplug section is an arbitrary / convenient unit for memory
> > hotplug. 'Section-size' units have bled into the user interface
> > ('memblock' sysfs) and can not be changed without breaking existing
> > userspace. The section-size constraint, while mostly benign for typical
> > memory hotplug, has and continues to wreak havoc with 'device-memory'
> > use cases, persistent memory (pmem) in particular. Recall that pmem uses
> > devm_memremap_pages(), and subsequently arch_add_memory(), to allocate a
> > 'struct page' memmap for pmem. However, it does not use the 'bottom
> > half' of memory hotplug, i.e. never marks pmem pages online and never
> > exposes the userspace memblock interface for pmem. This leaves an
> > opening to redress the section-size constraint.
>
> v6 and we're not showing any review activity.  Who would be suitable
> people to help out here?

There was quite a bit of review of the cover letter from Michal and
David, but you're right that the details have not seen much review as
of yet. I'd like to call out other people where I can reciprocate with
some review of my own. Oscar's altmap work looks like a good candidate
for that.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 00/12] mm: Sub-section memory hotplug support
  2019-04-17 22:59     ` Dan Williams
@ 2019-04-18  2:09       ` Dan Williams
  -1 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-04-18  2:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, David Hildenbrand, linux-nvdimm, stable,
	Linux Kernel Mailing List, Linux MM, Jérôme Glisse,
	Vlastimil Babka, osalvador

On Wed, Apr 17, 2019 at 3:59 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Wed, Apr 17, 2019 at 3:04 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Wed, 17 Apr 2019 11:38:55 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > > The memory hotplug section is an arbitrary / convenient unit for memory
> > > hotplug. 'Section-size' units have bled into the user interface
> > > ('memblock' sysfs) and can not be changed without breaking existing
> > > userspace. The section-size constraint, while mostly benign for typical
> > > memory hotplug, has and continues to wreak havoc with 'device-memory'
> > > use cases, persistent memory (pmem) in particular. Recall that pmem uses
> > > devm_memremap_pages(), and subsequently arch_add_memory(), to allocate a
> > > 'struct page' memmap for pmem. However, it does not use the 'bottom
> > > half' of memory hotplug, i.e. never marks pmem pages online and never
> > > exposes the userspace memblock interface for pmem. This leaves an
> > > opening to redress the section-size constraint.
> >
> > v6 and we're not showing any review activity.  Who would be suitable
> > people to help out here?
>
> There was quite a bit of review of the cover letter from Michal and
> David, but you're right the details not so much as of yet. I'd like to
> call out other people where I can reciprocate with some review of my
> own. Oscar's altmap work looks like a good candidate for that.

I'm also hoping Jeff can give a tested-by for the customer scenarios
that fall over with the current implementation.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 00/12] mm: Sub-section memory hotplug support
  2019-04-18  2:09       ` Dan Williams
@ 2019-04-18 12:45         ` Jeff Moyer
  -1 siblings, 0 replies; 111+ messages in thread
From: Jeff Moyer @ 2019-04-18 12:45 UTC (permalink / raw)
  To: Dan Williams
  Cc: Michal Hocko, David Hildenbrand, linux-nvdimm,
	Linux Kernel Mailing List, stable, Linux MM,
	Jérôme Glisse, Andrew Morton, Vlastimil Babka,
	osalvador

Dan Williams <dan.j.williams@intel.com> writes:

>> On Wed, Apr 17, 2019 at 3:59 PM Dan Williams <dan.j.williams@intel.com> wrote:
>>
>> On Wed, Apr 17, 2019 at 3:04 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>> >
>> > On Wed, 17 Apr 2019 11:38:55 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
>> >
>> > > The memory hotplug section is an arbitrary / convenient unit for memory
>> > > hotplug. 'Section-size' units have bled into the user interface
>> > > ('memblock' sysfs) and can not be changed without breaking existing
>> > > userspace. The section-size constraint, while mostly benign for typical
>> > > memory hotplug, has and continues to wreak havoc with 'device-memory'
>> > > use cases, persistent memory (pmem) in particular. Recall that pmem uses
>> > > devm_memremap_pages(), and subsequently arch_add_memory(), to allocate a
>> > > 'struct page' memmap for pmem. However, it does not use the 'bottom
>> > > half' of memory hotplug, i.e. never marks pmem pages online and never
>> > > exposes the userspace memblock interface for pmem. This leaves an
>> > > opening to redress the section-size constraint.
>> >
>> > v6 and we're not showing any review activity.  Who would be suitable
>> > people to help out here?
>>
>> There was quite a bit of review of the cover letter from Michal and
>> David, but you're right the details not so much as of yet. I'd like to
>> call out other people where I can reciprocate with some review of my
>> own. Oscar's altmap work looks like a good candidate for that.
>
> I'm also hoping Jeff can give a tested-by for the customer scenarios
> that fall over with the current implementation.

Sure.  I'll also have a look over the patches.

-Jeff

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 00/12] mm: Sub-section memory hotplug support
  2019-04-18 12:45         ` Jeff Moyer
@ 2019-04-19  3:25           ` Dan Williams
  -1 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-04-19  3:25 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Michal Hocko, David Hildenbrand, linux-nvdimm,
	Linux Kernel Mailing List, stable, Linux MM,
	Jérôme Glisse, Andrew Morton, Vlastimil Babka,
	osalvador

On Thu, Apr 18, 2019 at 5:45 AM Jeff Moyer <jmoyer@redhat.com> wrote:
[..]
> >> > v6 and we're not showing any review activity.  Who would be suitable
> >> > people to help out here?
> >>
> >> There was quite a bit of review of the cover letter from Michal and
> >> David, but you're right the details not so much as of yet. I'd like to
> >> call out other people where I can reciprocate with some review of my
> >> own. Oscar's altmap work looks like a good candidate for that.
> >
> > I'm also hoping Jeff can give a tested-by for the customer scenarios
> > that fall over with the current implementation.
>
> Sure.  I'll also have a look over the patches.

Andrew, heads up: it looks like there is a memory corruption bug in
these patches, as I've gotten a few reports of "bad page state" at
boot. Please drop them until I can track down the failure.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 04/12] mm/hotplug: Prepare shrink_{zone, pgdat}_span for sub-section removal
  2019-04-17 18:39   ` Dan Williams
@ 2019-04-19 23:09     ` Ralph Campbell
  -1 siblings, 0 replies; 111+ messages in thread
From: Ralph Campbell @ 2019-04-19 23:09 UTC (permalink / raw)
  To: Dan Williams, akpm
  Cc: Michal Hocko, linux-nvdimm, david, linux-kernel, linux-mm,
	Vlastimil Babka

Just noticed this by inspection.
I can't say I'm very familiar with the code.

On 4/17/19 11:39 AM, Dan Williams wrote:
> Sub-section hotplug support reduces the unit of operation of hotplug
> from section-sized-units (PAGES_PER_SECTION) to sub-section-sized units
> (PAGES_PER_SUBSECTION). Teach shrink_{zone,pgdat}_span() to consider
> PAGES_PER_SUBSECTION boundaries as the points where pfn_valid(), not
> valid_section(), can toggle.
> 
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>   include/linux/mmzone.h |    2 ++
>   mm/memory_hotplug.c    |   16 ++++++++--------
>   2 files changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index cffde898e345..b13f0cddf75e 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1164,6 +1164,8 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec)
>   
>   #define SECTION_ACTIVE_SIZE ((1UL << SECTION_SIZE_BITS) / BITS_PER_LONG)
>   #define SECTION_ACTIVE_MASK (~(SECTION_ACTIVE_SIZE - 1))
> +#define PAGES_PER_SUB_SECTION (SECTION_ACTIVE_SIZE / PAGE_SIZE)
> +#define PAGE_SUB_SECTION_MASK (~(PAGES_PER_SUB_SECTION-1))
>   
>   struct mem_section_usage {
>   	/*
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 8b7415736d21..d5874f9d4043 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -327,10 +327,10 @@ static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
>   {
>   	struct mem_section *ms;
>   
> -	for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SECTION) {
> +	for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SUB_SECTION) {
>   		ms = __pfn_to_section(start_pfn);
>   
> -		if (unlikely(!valid_section(ms)))
> +		if (unlikely(!pfn_valid(start_pfn)))
>   			continue;

Note that "struct mem_section *ms;" is now set but not used.
You can remove the definition and initialization of "ms".

>   		if (unlikely(pfn_to_nid(start_pfn) != nid))
> @@ -355,10 +355,10 @@ static unsigned long find_biggest_section_pfn(int nid, struct zone *zone,
>   
>   	/* pfn is the end pfn of a memory section. */
>   	pfn = end_pfn - 1;
> -	for (; pfn >= start_pfn; pfn -= PAGES_PER_SECTION) {
> +	for (; pfn >= start_pfn; pfn -= PAGES_PER_SUB_SECTION) {
>   		ms = __pfn_to_section(pfn);
>   
> -		if (unlikely(!valid_section(ms)))
> +		if (unlikely(!pfn_valid(pfn)))
>   			continue;

Ditto about "ms".

>   		if (unlikely(pfn_to_nid(pfn) != nid))
> @@ -417,10 +417,10 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
>   	 * it check the zone has only hole or not.
>   	 */
>   	pfn = zone_start_pfn;
> -	for (; pfn < zone_end_pfn; pfn += PAGES_PER_SECTION) {
> +	for (; pfn < zone_end_pfn; pfn += PAGES_PER_SUB_SECTION) {
>   		ms = __pfn_to_section(pfn);
>   
> -		if (unlikely(!valid_section(ms)))
> +		if (unlikely(!pfn_valid(pfn)))
>   			continue;

Ditto about "ms".

>   		if (page_zone(pfn_to_page(pfn)) != zone)
> @@ -485,10 +485,10 @@ static void shrink_pgdat_span(struct pglist_data *pgdat,
>   	 * has only hole or not.
>   	 */
>   	pfn = pgdat_start_pfn;
> -	for (; pfn < pgdat_end_pfn; pfn += PAGES_PER_SECTION) {
> +	for (; pfn < pgdat_end_pfn; pfn += PAGES_PER_SUB_SECTION) {
>   		ms = __pfn_to_section(pfn);
>   
> -		if (unlikely(!valid_section(ms)))
> +		if (unlikely(!pfn_valid(pfn)))
>   			continue;

Ditto about "ms".

>   		if (pfn_to_nid(pfn) != nid)
> 

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 04/12] mm/hotplug: Prepare shrink_{zone, pgdat}_span for sub-section removal
  2019-04-19 23:09     ` Ralph Campbell
  (?)
@ 2019-04-19 23:13       ` Dan Williams
  -1 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-04-19 23:13 UTC (permalink / raw)
  To: Ralph Campbell
  Cc: Michal Hocko, linux-nvdimm, David Hildenbrand,
	Linux Kernel Mailing List, Linux MM, Andrew Morton,
	Vlastimil Babka

On Fri, Apr 19, 2019 at 4:09 PM Ralph Campbell <rcampbell@nvidia.com> wrote:
[..]
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index 8b7415736d21..d5874f9d4043 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -327,10 +327,10 @@ static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
> >   {
> >       struct mem_section *ms;
> >
> > -     for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SECTION) {
> > +     for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SUB_SECTION) {
> >               ms = __pfn_to_section(start_pfn);
> >
> > -             if (unlikely(!valid_section(ms)))
> > +             if (unlikely(!pfn_valid(start_pfn)))
> >                       continue;
>
> Note that "struct mem_section *ms;" is now set but not used.
> You can remove the definition and initialization of "ms".

Good eye, yes, will clean up.
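
Roughly, each of those loops would then collapse to something like this
(untested sketch, same treatment for the other three sites):

	for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SUB_SECTION) {
		if (unlikely(!pfn_valid(start_pfn)))
			continue;

		if (unlikely(pfn_to_nid(start_pfn) != nid))
			continue;
		...
	}

i.e. no local 'ms' and no __pfn_to_section() call left in the loop body.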

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 00/12] mm: Sub-section memory hotplug support
  2019-04-17 22:59     ` Dan Williams
@ 2019-04-23 13:16       ` Oscar Salvador
  -1 siblings, 0 replies; 111+ messages in thread
From: Oscar Salvador @ 2019-04-23 13:16 UTC (permalink / raw)
  To: Dan Williams, Andrew Morton
  Cc: Michal Hocko, David Hildenbrand, linux-nvdimm, stable,
	Linux Kernel Mailing List, Linux MM, Jérôme Glisse,
	Vlastimil Babka

On Wed, 2019-04-17 at 15:59 -0700, Dan Williams wrote:
> On Wed, Apr 17, 2019 at 3:04 PM Andrew Morton <akpm@linux-foundation.
> org> wrote:
> > 
> > On Wed, 17 Apr 2019 11:38:55 -0700 Dan Williams <dan.j.williams@int
> > el.com> wrote:
> > 
> > > The memory hotplug section is an arbitrary / convenient unit for
> > > memory
> > > hotplug. 'Section-size' units have bled into the user interface
> > > ('memblock' sysfs) and can not be changed without breaking
> > > existing
> > > userspace. The section-size constraint, while mostly benign for
> > > typical
> > > memory hotplug, has and continues to wreak havoc with 'device-
> > > memory'
> > > use cases, persistent memory (pmem) in particular. Recall that
> > > pmem uses
> > > devm_memremap_pages(), and subsequently arch_add_memory(), to
> > > allocate a
> > > 'struct page' memmap for pmem. However, it does not use the
> > > 'bottom
> > > half' of memory hotplug, i.e. never marks pmem pages online and
> > > never
> > > exposes the userspace memblock interface for pmem. This leaves an
> > > opening to redress the section-size constraint.
> > 
> > v6 and we're not showing any review activity.  Who would be
> > suitable
> > people to help out here?
> 
> There was quite a bit of review of the cover letter from Michal and
> David, but you're right the details not so much as of yet. I'd like
> to
> call out other people where I can reciprocate with some review of my
> own. Oscar's altmap work looks like a good candidate for that.

Thanks Dan for ccing me.
I will take a look at the patches soon.

-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 06/12] mm/hotplug: Add mem-hotplug restrictions for remove_memory()
  2019-04-17 18:39   ` Dan Williams
@ 2019-04-23 21:21     ` David Hildenbrand
  -1 siblings, 0 replies; 111+ messages in thread
From: David Hildenbrand @ 2019-04-23 21:21 UTC (permalink / raw)
  To: Dan Williams, akpm; +Cc: linux-mm, Michal Hocko, linux-kernel, linux-nvdimm

On 17.04.19 20:39, Dan Williams wrote:
> Teach the arch_remove_memory() path to consult the same 'struct
> mhp_restrictions' context as was specified at arch_add_memory() time.
> 
> No functional change, this is a preparation step for teaching
> __remove_pages() about how and when to allow sub-section hot-remove, and
> a cleanup for an unnecessary "is_dev_zone()" special case.

I am not yet sure if this is the right thing to do. When adding memory,
we obviously have to specify the "how". When removing memory, we usually
should be able to look such stuff up.


>  void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
> -		    unsigned long nr_pages, struct vmem_altmap *altmap)
> +		unsigned long nr_pages, struct mhp_restrictions *restrictions)
>  {
>  	unsigned long i;
> -	unsigned long map_offset = 0;
>  	int sections_to_remove;
> +	unsigned long map_offset = 0;
> +	struct vmem_altmap *altmap = restrictions->altmap;
>  
> -	/* In the ZONE_DEVICE case device driver owns the memory region */
> -	if (is_dev_zone(zone)) {
> -		if (altmap)
> -			map_offset = vmem_altmap_offset(altmap);
> -	}
> +	if (altmap)
> +		map_offset = vmem_altmap_offset(altmap);
>  

Why weren't we able to use this exact same hunk before? (after my
resource deletion cleanup of course)

IOW, do we really need struct mhp_restrictions here?

After I factor out memory device handling into the caller of
arch_remove_memory(), also the next patch ("mm/sparsemem: Prepare for
sub-section ranges") should no longer need it. Or am I missing something?

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 06/12] mm/hotplug: Add mem-hotplug restrictions for remove_memory()
  2019-04-23 21:21     ` David Hildenbrand
@ 2019-04-24 18:07       ` Dan Williams
  -1 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-04-24 18:07 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Michal Hocko, Logan Gunthorpe, Linux MM,
	linux-nvdimm, Linux Kernel Mailing List

On Tue, Apr 23, 2019 at 2:21 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 17.04.19 20:39, Dan Williams wrote:
> > Teach the arch_remove_memory() path to consult the same 'struct
> > mhp_restrictions' context as was specified at arch_add_memory() time.
> >
> > No functional change, this is a preparation step for teaching
> > __remove_pages() about how and when to allow sub-section hot-remove, and
> > a cleanup for an unnecessary "is_dev_zone()" special case.
>
> I am not yet sure if this is the right thing to do. When adding memory,
> we obviously have to specify the "how". When removing memory, we usually
> should be able to look such stuff up.

True, the implementation can just use find_memory_block(), and no need
to plumb this flag.
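
Something like the below, completely untested, and assuming
find_memory_block() keeps its current 'struct mem_section *' signature
(the helper name is made up):

	static bool section_has_memory_block(unsigned long pfn)
	{
		struct memory_block *mem;

		mem = find_memory_block(__pfn_to_section(pfn));
		if (!mem)
			return false;
		/* find_memory_block() took a device reference, drop it */
		put_device(&mem->dev);
		return true;
	}

...so whether the range is fronted by the memblock sysfs interface can be
looked up at remove time rather than passed down.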

>
>
> >  void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
> > -                 unsigned long nr_pages, struct vmem_altmap *altmap)
> > +             unsigned long nr_pages, struct mhp_restrictions *restrictions)
> >  {
> >       unsigned long i;
> > -     unsigned long map_offset = 0;
> >       int sections_to_remove;
> > +     unsigned long map_offset = 0;
> > +     struct vmem_altmap *altmap = restrictions->altmap;
> >
> > -     /* In the ZONE_DEVICE case device driver owns the memory region */
> > -     if (is_dev_zone(zone)) {
> > -             if (altmap)
> > -                     map_offset = vmem_altmap_offset(altmap);
> > -     }
> > +     if (altmap)
> > +             map_offset = vmem_altmap_offset(altmap);
> >
>
> Why weren't we able to use this exact same hunk before? (after my
> resource deletion cleanup of course)
>
> IOW, do we really need struct mhp_restrictions here?

We don't need it. It was only for the memblock info that I added the
"restrictions" argument.

> After I factor out memory device handling into the caller of
> arch_remove_memory(), also the next patch ("mm/sparsemem: Prepare for
> sub-section ranges") should no longer need it. Or am I missing something?

That patch is still needed for the places where it adds the @nr_pages
argument, but the mhp_restrictions related bits can be dropped. The
subsection_check() helper needs to be refactored a bit to not rely on
mhp_restrictions.
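
IOW the end state would keep the original altmap argument, something like
(sketch only):

	void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
			unsigned long nr_pages, struct vmem_altmap *altmap)
	{
		unsigned long map_offset = 0;

		if (altmap)
			map_offset = vmem_altmap_offset(altmap);
		...
	}

...still dropping the is_dev_zone() special case, just without the
restrictions plumbing.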

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 00/12] mm: Sub-section memory hotplug support
  2019-04-23 13:16       ` Oscar Salvador
@ 2019-04-24 20:43         ` Pavel Tatashin
  -1 siblings, 0 replies; 111+ messages in thread
From: Pavel Tatashin @ 2019-04-24 20:43 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Michal Hocko, David Hildenbrand, linux-nvdimm, stable,
	Linux Kernel Mailing List, Linux MM, Jérôme Glisse,
	Andrew Morton, Vlastimil Babka

I am also taking a look at this work now. I will review and test it in
the next couple of days.

Pasha

On Tue, Apr 23, 2019 at 9:17 AM Oscar Salvador <osalvador@suse.de> wrote:
>
> On Wed, 2019-04-17 at 15:59 -0700, Dan Williams wrote:
> > On Wed, Apr 17, 2019 at 3:04 PM Andrew Morton <akpm@linux-foundation.
> > org> wrote:
> > >
> > > On Wed, 17 Apr 2019 11:38:55 -0700 Dan Williams <dan.j.williams@int
> > > el.com> wrote:
> > >
> > > > The memory hotplug section is an arbitrary / convenient unit for
> > > > memory
> > > > hotplug. 'Section-size' units have bled into the user interface
> > > > ('memblock' sysfs) and can not be changed without breaking
> > > > existing
> > > > userspace. The section-size constraint, while mostly benign for
> > > > typical
> > > > memory hotplug, has and continues to wreak havoc with 'device-
> > > > memory'
> > > > use cases, persistent memory (pmem) in particular. Recall that
> > > > pmem uses
> > > > devm_memremap_pages(), and subsequently arch_add_memory(), to
> > > > allocate a
> > > > 'struct page' memmap for pmem. However, it does not use the
> > > > 'bottom
> > > > half' of memory hotplug, i.e. never marks pmem pages online and
> > > > never
> > > > exposes the userspace memblock interface for pmem. This leaves an
> > > > opening to redress the section-size constraint.
> > >
> > > v6 and we're not showing any review activity.  Who would be
> > > suitable
> > > people to help out here?
> >
> > There was quite a bit of review of the cover letter from Michal and
> > David, but you're right the details not so much as of yet. I'd like
> > to
> > call out other people where I can reciprocate with some review of my
> > own. Oscar's altmap work looks like a good candidate for that.
>
> Thanks Dan for ccing me.
> I will take a look at the patches soon.
>
> --
> Oscar Salvador
> SUSE L3

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 03/12] mm/sparsemem: Add helpers track active portions of a section at boot
  2019-04-17 18:39   ` Dan Williams
  (?)
@ 2019-04-25 14:33   ` Oscar Salvador
  -1 siblings, 0 replies; 111+ messages in thread
From: Oscar Salvador @ 2019-04-25 14:33 UTC (permalink / raw)
  To: Dan Williams
  Cc: akpm, Michal Hocko, Vlastimil Babka, Logan Gunthorpe, linux-mm,
	linux-nvdimm, linux-kernel, david

On Wed, Apr 17, 2019 at 11:39:11AM -0700, Dan Williams wrote:
> Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
> section active bitmask, each bit representing 2MB (SECTION_SIZE (128M) /
> map_active bitmask length (64)). If it turns out that 2MB is too large
> of an active tracking granularity it is trivial to increase the size of
> the map_active bitmap.
> 
> The implications of a partially populated section is that pfn_valid()
> needs to go beyond a valid_section() check and read the sub-section
> active ranges from the bitmask.
> 
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

I am still going through the patchset but:

> +static unsigned long section_active_mask(unsigned long pfn,
> +		unsigned long nr_pages)
> +{
> +	int idx_start, idx_size;
> +	phys_addr_t start, size;
> +
> +	if (!nr_pages)
> +		return 0;
> +
> +	start = PFN_PHYS(pfn);
> +	size = PFN_PHYS(min(nr_pages, PAGES_PER_SECTION
> +				- (pfn & ~PAGE_SECTION_MASK)));

We already picked the lowest value in section_active_init, didn't we?
This min() operation seems redundant to me here.

> +	size = ALIGN(size, SECTION_ACTIVE_SIZE);

> +
> +	idx_start = section_active_index(start);
> +	idx_size = section_active_index(size);
> +
> +	if (idx_size == 0)
> +		return -1;
> +	return ((1UL << idx_size) - 1) << idx_start;
> +}
> +
> +void section_active_init(unsigned long pfn, unsigned long nr_pages)
> +{
> +	int end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
> +	int i, start_sec = pfn_to_section_nr(pfn);
> +
> +	if (!nr_pages)
> +		return;
> +
> +	for (i = start_sec; i <= end_sec; i++) {
> +		struct mem_section *ms;
> +		unsigned long mask;
> +		unsigned long pfns;

s/pfns/nr_pfns/ instead?

> +
> +		pfns = min(nr_pages, PAGES_PER_SECTION
> +				- (pfn & ~PAGE_SECTION_MASK));
> +		mask = section_active_mask(pfn, pfns);
> +
> +		ms = __nr_to_section(i);
> +		pr_debug("%s: sec: %d mask: %#018lx\n", __func__, i, mask);
> +		ms->usage->map_active = mask;
> +
> +		pfn += pfns;
> +		nr_pages -= pfns;
> +	}
> +}

Although the code is not very complicated, it could use some comments here and
there.
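
For example something like this for the loop, just to show the kind of
comments I had in mind:

	for (i = start_sec; i <= end_sec; i++) {
		struct mem_section *ms;
		unsigned long mask;
		unsigned long pfns;

		/* pfns that land in section 'i', clamped at the section end */
		pfns = min(nr_pages, PAGES_PER_SECTION
				- (pfn & ~PAGE_SECTION_MASK));
		/* bitmap of the sub-sections covered by [pfn, pfn + pfns) */
		mask = section_active_mask(pfn, pfns);

		ms = __nr_to_section(i);
		pr_debug("%s: sec: %d mask: %#018lx\n", __func__, i, mask);
		ms->usage->map_active = mask;

		/* advance to the first pfn of the next section */
		pfn += pfns;
		nr_pages -= pfns;
	}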

-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 03/12] mm/sparsemem: Add helpers track active portions of a section at boot
  2019-04-17 18:39   ` Dan Williams
  (?)
@ 2019-04-25 14:43     ` Oscar Salvador
  -1 siblings, 0 replies; 111+ messages in thread
From: Oscar Salvador @ 2019-04-25 14:43 UTC (permalink / raw)
  To: Dan Williams, akpm
  Cc: Michal Hocko, linux-nvdimm, david, linux-kernel, linux-mm,
	Vlastimil Babka

On Wed, 2019-04-17 at 11:39 -0700, Dan Williams wrote:
> Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
> section active bitmask, each bit representing 2MB (SECTION_SIZE
> (128M) /
> map_active bitmask length (64)). If it turns out that 2MB is too
> large
> of an active tracking granularity it is trivial to increase the size
> of
> the map_active bitmap.
> 
> The implications of a partially populated section is that pfn_valid()
> needs to go beyond a valid_section() check and read the sub-section
> active ranges from the bitmask.
> 
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Hi Dan,

I am still going through the patchset but:
 
> +static unsigned long section_active_mask(unsigned long pfn,
> +		unsigned long nr_pages)
> +{
> +	int idx_start, idx_size;
> +	phys_addr_t start, size;
> +
> +	if (!nr_pages)
> +		return 0;
> +
> +	start = PFN_PHYS(pfn);
> +	size = PFN_PHYS(min(nr_pages, PAGES_PER_SECTION
> +				- (pfn & ~PAGE_SECTION_MASK)));

We already picked the lowest value in section_active_init, didn't we?
This min() operation seems redundant to me here.

> +	size = ALIGN(size, SECTION_ACTIVE_SIZE);
> +
> +	idx_start = section_active_index(start);
> +	idx_size = section_active_index(size);
> +
> +	if (idx_size == 0)
> +		return -1;
> +	return ((1UL << idx_size) - 1) << idx_start;
> +}
> +
> +void section_active_init(unsigned long pfn, unsigned long nr_pages)
> +{
> +	int end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
> +	int i, start_sec = pfn_to_section_nr(pfn);
> +
> +	if (!nr_pages)
> +		return;
> +
> +	for (i = start_sec; i <= end_sec; i++) {
> +		struct mem_section *ms;
> +		unsigned long mask;
> +		unsigned long pfns;

s/pfns/nr_pfns/ instead?

> +		pfns = min(nr_pages, PAGES_PER_SECTION
> +				- (pfn & ~PAGE_SECTION_MASK));
> +		mask = section_active_mask(pfn, pfns);
> +
> +		ms = __nr_to_section(i);
> +		pr_debug("%s: sec: %d mask: %#018lx\n", __func__, i,
> mask);
> +		ms->usage->map_active = mask;
> +
> +		pfn += pfns;
> +		nr_pages -= pfns;
> +	}
> +}

Although the code is not very complicated, it could use some comments
here and there I think.

> +
>  /* Record a memory area against a node. */
>  void __init memory_present(int nid, unsigned long start, unsigned
> long end)
>  {
> 
-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 03/12] mm/sparsemem: Add helpers track active portions of a section at boot
  2019-04-17 18:39   ` Dan Williams
@ 2019-04-26 12:57     ` Oscar Salvador
  -1 siblings, 0 replies; 111+ messages in thread
From: Oscar Salvador @ 2019-04-26 12:57 UTC (permalink / raw)
  To: Dan Williams
  Cc: Michal Hocko, linux-nvdimm, david, linux-kernel, linux-mm, akpm,
	Vlastimil Babka

On Wed, Apr 17, 2019 at 11:39:11AM -0700, Dan Williams wrote:
> Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
> section active bitmask, each bit representing 2MB (SECTION_SIZE (128M) /
> map_active bitmask length (64)). If it turns out that 2MB is too large
> of an active tracking granularity it is trivial to increase the size of
> the map_active bitmap.
> 
> The implications of a partially populated section is that pfn_valid()
> needs to go beyond a valid_section() check and read the sub-section
> active ranges from the bitmask.
> 
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
[...]  
> +static unsigned long section_active_mask(unsigned long pfn,
> +		unsigned long nr_pages)
> +{
> +	int idx_start, idx_size;
> +	phys_addr_t start, size;
> +
> +	if (!nr_pages)
> +		return 0;
> +
> +	start = PFN_PHYS(pfn);
> +	size = PFN_PHYS(min(nr_pages, PAGES_PER_SECTION
> +				- (pfn & ~PAGE_SECTION_MASK)));
> +	size = ALIGN(size, SECTION_ACTIVE_SIZE);

I am probably missing something, and this is more a question than anything else, but:
is there a reason for shifting pfn and pages to get the size and the address?
Could we not operate on pfn/pages, so we do not have to shift every time?
(even for pfn_section_valid() calls)

Something like:

#define SUB_SECTION_ACTIVE_PAGES        (SECTION_ACTIVE_SIZE / PAGE_SIZE)

static inline int section_active_index(unsigned long pfn)
{
	return (pfn & ~(PAGE_SECTION_MASK)) / SUB_SECTION_ACTIVE_PAGES;
}
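
Then section_active_mask() could stay in pfn units too; a rough, untested
sketch of what I mean, reusing SUB_SECTION_ACTIVE_PAGES and the pfn-based
section_active_index() from above:

static unsigned long section_active_mask(unsigned long pfn,
		unsigned long nr_pages)
{
	int idx_start, nr_subs;

	if (!nr_pages)
		return 0;

	/* same clamp as today, just without PFN_PHYS() */
	nr_pages = min(nr_pages, PAGES_PER_SECTION
			- (pfn & ~PAGE_SECTION_MASK));
	nr_subs = ALIGN(nr_pages, SUB_SECTION_ACTIVE_PAGES)
			/ SUB_SECTION_ACTIVE_PAGES;
	idx_start = section_active_index(pfn);

	if (nr_subs >= BITS_PER_LONG)
		return -1;
	return ((1UL << nr_subs) - 1) << idx_start;
}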

> +
> +	idx_start = section_active_index(start);
> +	idx_size = section_active_index(size);
> +
> +	if (idx_size == 0)
> +		return -1;

What about turning that into something more intuitive?
Since -1 here represents a full section, we could define something like:

#define FULL_SECTION	(-1UL)

Or a better name; it is just that I find "-1" not really easy to interpret.
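
so that the return path would read e.g.:

	if (idx_size == 0)
		return FULL_SECTION;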

> +	return ((1UL << idx_size) - 1) << idx_start;
> +}
> +
> +void section_active_init(unsigned long pfn, unsigned long nr_pages)
> +{
> +	int end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
> +	int i, start_sec = pfn_to_section_nr(pfn);
> +
> +	if (!nr_pages)
> +		return;
> +
> +	for (i = start_sec; i <= end_sec; i++) {
> +		struct mem_section *ms;
> +		unsigned long mask;
> +		unsigned long pfns;
> +
> +		pfns = min(nr_pages, PAGES_PER_SECTION
> +				- (pfn & ~PAGE_SECTION_MASK));
> +		mask = section_active_mask(pfn, pfns);
> +
> +		ms = __nr_to_section(i);
> +		pr_debug("%s: sec: %d mask: %#018lx\n", __func__, i, mask);
> +		ms->usage->map_active = mask;
> +
> +		pfn += pfns;
> +		nr_pages -= pfns;
> +	}
> +}
> +
>  /* Record a memory area against a node. */
>  void __init memory_present(int nid, unsigned long start, unsigned long end)
>  {
> 

-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 04/12] mm/hotplug: Prepare shrink_{zone, pgdat}_span for sub-section removal
  2019-04-17 18:39   ` Dan Williams
  (?)
  (?)
@ 2019-04-26 13:59   ` Oscar Salvador
  2019-04-26 14:00     ` Oscar Salvador
  -1 siblings, 1 reply; 111+ messages in thread
From: Oscar Salvador @ 2019-04-26 13:59 UTC (permalink / raw)
  To: Dan Williams
  Cc: akpm, Michal Hocko, Vlastimil Babka, Logan Gunthorpe, linux-mm,
	linux-nvdimm, linux-kernel, david

On Wed, Apr 17, 2019 at 11:39:16AM -0700, Dan Williams wrote:
> @@ -417,10 +417,10 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
>  	 * it check the zone has only hole or not.
>  	 */
>  	pfn = zone_start_pfn;
> -	for (; pfn < zone_end_pfn; pfn += PAGES_PER_SECTION) {
> +	for (; pfn < zone_end_pfn; pfn += PAGES_PER_SUB_SECTION) {
>  		ms = __pfn_to_section(pfn);
>  
> -		if (unlikely(!valid_section(ms)))
> +		if (unlikely(!pfn_valid(pfn)))
>  			continue;
>  
>  		if (page_zone(pfn_to_page(pfn)) != zone)
> @@ -485,10 +485,10 @@ static void shrink_pgdat_span(struct pglist_data *pgdat,
>  	 * has only hole or not.
>  	 */
>  	pfn = pgdat_start_pfn;
> -	for (; pfn < pgdat_end_pfn; pfn += PAGES_PER_SECTION) {
> +	for (; pfn < pgdat_end_pfn; pfn += PAGES_PER_SUB_SECTION) {
>  		ms = __pfn_to_section(pfn);
>  
> -		if (unlikely(!valid_section(ms)))
> +		if (unlikely(!pfn_valid(pfn)))
>  			continue;
>  
>  		if (pfn_to_nid(pfn) != nid)

The last loop from shrink_{pgdat,zone}_span can be reworked to unify both
into one function, and both functions can be factored out a bit.
Actually, I do have a patch that does that; I might dig it up.
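
Something like the below is what I have in mind (untested, names made up):

static bool range_has_memory(int nid, struct zone *zone,
		unsigned long start_pfn, unsigned long end_pfn)
{
	unsigned long pfn;

	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SUB_SECTION) {
		if (unlikely(!pfn_valid(pfn)))
			continue;

		/*
		 * shrink_zone_span() cares about the zone,
		 * shrink_pgdat_span() only about the node.
		 */
		if (zone) {
			if (page_zone(pfn_to_page(pfn)) != zone)
				continue;
		} else if (pfn_to_nid(pfn) != nid) {
			continue;
		}

		return true;
	}

	return false;
}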

The rest looks good:

Reviewed-by: Oscar Salvador <osalvador@suse.de>

> 

-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 04/12] mm/hotplug: Prepare shrink_{zone, pgdat}_span for sub-section removal
  2019-04-26 13:59   ` Oscar Salvador
@ 2019-04-26 14:00     ` Oscar Salvador
  0 siblings, 0 replies; 111+ messages in thread
From: Oscar Salvador @ 2019-04-26 14:00 UTC (permalink / raw)
  To: Dan Williams
  Cc: akpm, Michal Hocko, Vlastimil Babka, Logan Gunthorpe, linux-mm,
	linux-nvdimm, linux-kernel, david

On Fri, Apr 26, 2019 at 03:59:12PM +0200, Oscar Salvador wrote:
> On Wed, Apr 17, 2019 at 11:39:16AM -0700, Dan Williams wrote:
> > @@ -417,10 +417,10 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
> >  	 * it check the zone has only hole or not.
> >  	 */
> >  	pfn = zone_start_pfn;
> > -	for (; pfn < zone_end_pfn; pfn += PAGES_PER_SECTION) {
> > +	for (; pfn < zone_end_pfn; pfn += PAGES_PER_SUB_SECTION) {
> >  		ms = __pfn_to_section(pfn);
> >  
> > -		if (unlikely(!valid_section(ms)))
> > +		if (unlikely(!pfn_valid(pfn)))
> >  			continue;
> >  
> >  		if (page_zone(pfn_to_page(pfn)) != zone)
> > @@ -485,10 +485,10 @@ static void shrink_pgdat_span(struct pglist_data *pgdat,
> >  	 * has only hole or not.
> >  	 */
> >  	pfn = pgdat_start_pfn;
> > -	for (; pfn < pgdat_end_pfn; pfn += PAGES_PER_SECTION) {
> > +	for (; pfn < pgdat_end_pfn; pfn += PAGES_PER_SUB_SECTION) {
> >  		ms = __pfn_to_section(pfn);
> >  
> > -		if (unlikely(!valid_section(ms)))
> > +		if (unlikely(!pfn_valid(pfn)))
> >  			continue;
> >  
> >  		if (pfn_to_nid(pfn) != nid)
> 
> The last loop from shrink_{pgdat,zone}_span can be reworked to unify both
> in one function, and both functions can be factored out a bit.
> Actually, I do have a patch that does that, I might dig it up.
> 
> The rest looks good:
> 
> Reviewed-by: Oscar Salvador <osalvador@suse.de>

I mean of course besides Ralph's comment.

-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 07/12] mm: Kill is_dev_zone() helper
  2019-04-17 18:39   ` Dan Williams
@ 2019-04-26 14:04     ` Oscar Salvador
  -1 siblings, 0 replies; 111+ messages in thread
From: Oscar Salvador @ 2019-04-26 14:04 UTC (permalink / raw)
  To: Dan Williams
  Cc: Michal Hocko, linux-nvdimm, David Hildenbrand, linux-kernel,
	linux-mm, akpm

On Wed, Apr 17, 2019 at 11:39:32AM -0700, Dan Williams wrote:
> Given there are no more usages of is_dev_zone() outside of 'ifdef
> CONFIG_ZONE_DEVICE' protection, kill off the compilation helper.
> 
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>

> ---
>  include/linux/mmzone.h |   12 ------------
>  mm/page_alloc.c        |    2 +-
>  2 files changed, 1 insertion(+), 13 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index b13f0cddf75e..3237c5e456df 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -855,18 +855,6 @@ static inline int local_memory_node(int node_id) { return node_id; };
>   */
>  #define zone_idx(zone)		((zone) - (zone)->zone_pgdat->node_zones)
>  
> -#ifdef CONFIG_ZONE_DEVICE
> -static inline bool is_dev_zone(const struct zone *zone)
> -{
> -	return zone_idx(zone) == ZONE_DEVICE;
> -}
> -#else
> -static inline bool is_dev_zone(const struct zone *zone)
> -{
> -	return false;
> -}
> -#endif
> -
>  /*
>   * Returns true if a zone has pages managed by the buddy allocator.
>   * All the reclaim decisions have to use this function rather than
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c9ad28a78018..fd455bd742d5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5844,7 +5844,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
>  	unsigned long start = jiffies;
>  	int nid = pgdat->node_id;
>  
> -	if (WARN_ON_ONCE(!pgmap || !is_dev_zone(zone)))
> +	if (WARN_ON_ONCE(!pgmap || zone_idx(zone) != ZONE_DEVICE))
>  		return;
>  
>  	/*
> 

-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 01/12] mm/sparsemem: Introduce struct mem_section_usage
  2019-04-17 18:39   ` Dan Williams
@ 2019-05-01 23:25     ` Pavel Tatashin
  -1 siblings, 0 replies; 111+ messages in thread
From: Pavel Tatashin @ 2019-05-01 23:25 UTC (permalink / raw)
  To: Dan Williams
  Cc: Michal Hocko, linux-nvdimm, david, linux-kernel, linux-mm, akpm,
	Vlastimil Babka

On 19-04-17 11:39:00, Dan Williams wrote:
> Towards enabling memory hotplug to track partial population of a
> section, introduce 'struct mem_section_usage'.
> 
> A pointer to a 'struct mem_section_usage' instance replaces the existing
> pointer to a 'pageblock_flags' bitmap. Effectively it adds one more
> 'unsigned long' beyond the 'pageblock_flags' (usemap) allocation to
> house a new 'map_active' bitmap.  The new bitmap enables the memory
> hot{plug,remove} implementation to act on incremental sub-divisions of a
> section.
> 
> The primary motivation for this functionality is to support platforms
> that mix "System RAM" and "Persistent Memory" within a single section,
> or multiple PMEM ranges with different mapping lifetimes within a single
> section. The section restriction for hotplug has caused an ongoing saga
> of hacks and bugs for devm_memremap_pages() users.
> 
> Beyond the fixups to teach existing paths how to retrieve the 'usemap'
> from a section, and updates to usemap allocation path, there are no
> expected behavior changes.
> 
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  include/linux/mmzone.h |   23 ++++++++++++--
>  mm/memory_hotplug.c    |   18 ++++++-----
>  mm/page_alloc.c        |    2 +
>  mm/sparse.c            |   81 ++++++++++++++++++++++++------------------------
>  4 files changed, 71 insertions(+), 53 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 70394cabaf4e..f0bbd85dc19a 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1160,6 +1160,19 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec)
>  #define SECTION_ALIGN_UP(pfn)	(((pfn) + PAGES_PER_SECTION - 1) & PAGE_SECTION_MASK)
>  #define SECTION_ALIGN_DOWN(pfn)	((pfn) & PAGE_SECTION_MASK)
>  
> +#define SECTION_ACTIVE_SIZE ((1UL << SECTION_SIZE_BITS) / BITS_PER_LONG)
> +#define SECTION_ACTIVE_MASK (~(SECTION_ACTIVE_SIZE - 1))
> +
> +struct mem_section_usage {
> +	/*
> +	 * SECTION_ACTIVE_SIZE portions of the section that are populated in
> +	 * the memmap
> +	 */
> +	unsigned long map_active;

I think this should be proportional to section_size / subsection_size.
For example, on Intel section size = 128M, and subsection is 2M, so
64 bits work nicely. But, on arm64 section size is 1G, so subsection is
16M.

On the other hand 16M is already much better than what we have: with 1G
section size and 2M pmem alignment we are guaranteed to lose 1022M. And
with 16M subsection it is only 14M.
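
One way to do that (untested sketch, SUBSECTION_SHIFT is a made-up name
for the 2M granularity):

#define SUBSECTION_SHIFT	21
#define SUBSECTIONS_PER_SECTION	(1UL << (SECTION_SIZE_BITS - SUBSECTION_SHIFT))

struct mem_section_usage {
	/* one bit per SUBSECTION_SHIFT-sized piece of the section */
	DECLARE_BITMAP(map_active, SUBSECTIONS_PER_SECTION);
	/* existing pageblock_flags / usemap storage stays as it is */
};

With that, x86 still ends up with a single unsigned long (128M / 2M = 64
bits), while a 1G arm64 section gets 512 bits and could keep the 2M
granularity instead of 16M.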

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 01/12] mm/sparsemem: Introduce struct mem_section_usage
  2019-05-01 23:25     ` Pavel Tatashin
@ 2019-05-02  6:07       ` Dan Williams
  -1 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-05-02  6:07 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Andrew Morton, Michal Hocko, Vlastimil Babka, Logan Gunthorpe,
	Linux MM, linux-nvdimm, Linux Kernel Mailing List,
	David Hildenbrand

On Wed, May 1, 2019 at 4:25 PM Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
>
> On 19-04-17 11:39:00, Dan Williams wrote:
> > Towards enabling memory hotplug to track partial population of a
> > section, introduce 'struct mem_section_usage'.
> >
> > A pointer to a 'struct mem_section_usage' instance replaces the existing
> > pointer to a 'pageblock_flags' bitmap. Effectively it adds one more
> > 'unsigned long' beyond the 'pageblock_flags' (usemap) allocation to
> > house a new 'map_active' bitmap.  The new bitmap enables the memory
> > hot{plug,remove} implementation to act on incremental sub-divisions of a
> > section.
> >
> > The primary motivation for this functionality is to support platforms
> > that mix "System RAM" and "Persistent Memory" within a single section,
> > or multiple PMEM ranges with different mapping lifetimes within a single
> > section. The section restriction for hotplug has caused an ongoing saga
> > of hacks and bugs for devm_memremap_pages() users.
> >
> > Beyond the fixups to teach existing paths how to retrieve the 'usemap'
> > from a section, and updates to usemap allocation path, there are no
> > expected behavior changes.
> >
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Logan Gunthorpe <logang@deltatee.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  include/linux/mmzone.h |   23 ++++++++++++--
> >  mm/memory_hotplug.c    |   18 ++++++-----
> >  mm/page_alloc.c        |    2 +
> >  mm/sparse.c            |   81 ++++++++++++++++++++++++------------------------
> >  4 files changed, 71 insertions(+), 53 deletions(-)
> >
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index 70394cabaf4e..f0bbd85dc19a 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -1160,6 +1160,19 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec)
> >  #define SECTION_ALIGN_UP(pfn)        (((pfn) + PAGES_PER_SECTION - 1) & PAGE_SECTION_MASK)
> >  #define SECTION_ALIGN_DOWN(pfn)      ((pfn) & PAGE_SECTION_MASK)
> >
> > +#define SECTION_ACTIVE_SIZE ((1UL << SECTION_SIZE_BITS) / BITS_PER_LONG)
> > +#define SECTION_ACTIVE_MASK (~(SECTION_ACTIVE_SIZE - 1))
> > +
> > +struct mem_section_usage {
> > +     /*
> > +      * SECTION_ACTIVE_SIZE portions of the section that are populated in
> > +      * the memmap
> > +      */
> > +     unsigned long map_active;
>
> I think this should be proportional to section_size / subsection_size.
> For example, on Intel the section size is 128M and the subsection is 2M,
> so 64 bits work nicely. But on arm64 the section size is 1G, so the
> subsection becomes 16M.
>
> On the other hand, 16M is already much better than what we have: with a
> 1G section size and 2M pmem alignment we are guaranteed to lose 1022M,
> while with a 16M subsection it is only 14M.

I'm ok with it being 16M for now unless it causes a problem in
practice, i.e. something like the minimum hardware mapping alignment
for physical memory being less than 16M.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 01/12] mm/sparsemem: Introduce struct mem_section_usage
  2019-05-02  6:07       ` Dan Williams
@ 2019-05-02 14:16         ` Pavel Tatashin
  -1 siblings, 0 replies; 111+ messages in thread
From: Pavel Tatashin @ 2019-05-02 14:16 UTC (permalink / raw)
  To: Dan Williams
  Cc: Andrew Morton, Michal Hocko, Vlastimil Babka, Logan Gunthorpe,
	Linux MM, linux-nvdimm, Linux Kernel Mailing List,
	David Hildenbrand

On Thu, May 2, 2019 at 2:07 AM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Wed, May 1, 2019 at 4:25 PM Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
> >
> > On 19-04-17 11:39:00, Dan Williams wrote:
> > > Towards enabling memory hotplug to track partial population of a
> > > section, introduce 'struct mem_section_usage'.
> > >
> > > A pointer to a 'struct mem_section_usage' instance replaces the existing
> > > pointer to a 'pageblock_flags' bitmap. Effectively it adds one more
> > > 'unsigned long' beyond the 'pageblock_flags' (usemap) allocation to
> > > house a new 'map_active' bitmap.  The new bitmap enables the memory
> > > hot{plug,remove} implementation to act on incremental sub-divisions of a
> > > section.
> > >
> > > The primary motivation for this functionality is to support platforms
> > > that mix "System RAM" and "Persistent Memory" within a single section,
> > > or multiple PMEM ranges with different mapping lifetimes within a single
> > > section. The section restriction for hotplug has caused an ongoing saga
> > > of hacks and bugs for devm_memremap_pages() users.
> > >
> > > Beyond the fixups to teach existing paths how to retrieve the 'usemap'
> > > from a section, and updates to usemap allocation path, there are no
> > > expected behavior changes.
> > >
> > > Cc: Michal Hocko <mhocko@suse.com>
> > > Cc: Vlastimil Babka <vbabka@suse.cz>
> > > Cc: Logan Gunthorpe <logang@deltatee.com>
> > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 02/12] mm/sparsemem: Introduce common definitions for the size and mask of a section
  2019-04-17 18:39   ` Dan Williams
  (?)
@ 2019-05-02 14:53   ` Pavel Tatashin
  2019-05-03  0:41       ` Dan Williams
  -1 siblings, 1 reply; 111+ messages in thread
From: Pavel Tatashin @ 2019-05-02 14:53 UTC (permalink / raw)
  To: Dan Williams
  Cc: Andrew Morton, Michal Hocko, Vlastimil Babka,
	Jérôme Glisse, Logan Gunthorpe, linux-mm, linux-nvdimm,
	LKML, David Hildenbrand

On Wed, Apr 17, 2019 at 2:52 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> Up-level the local section size and mask from kernel/memremap.c to
> global definitions.  These will be used by the new sub-section hotplug
> support.
>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Should be dropped from this series as it has been replaced by a very
similar patch in the mainline:

7c697d7fb5cb14ef60e2b687333ba3efb74f73da
 mm/memremap: Rename and consolidate SECTION_SIZE

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 03/12] mm/sparsemem: Add helpers track active portions of a section at boot
  2019-04-17 18:39   ` Dan Williams
  (?)
@ 2019-05-02 16:12     ` Pavel Tatashin
  -1 siblings, 0 replies; 111+ messages in thread
From: Pavel Tatashin @ 2019-05-02 16:12 UTC (permalink / raw)
  To: Dan Williams
  Cc: Michal Hocko, linux-nvdimm, David Hildenbrand, LKML, linux-mm,
	Andrew Morton, Vlastimil Babka

On Wed, Apr 17, 2019 at 2:53 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
> section active bitmask, each bit representing 2MB (SECTION_SIZE (128M) /
> map_active bitmask length (64)). If it turns out that 2MB is too large
> of an active tracking granularity it is trivial to increase the size of
> the map_active bitmap.

Please mention that this is 2M on Intel and 16M on arm64.

>
> The implications of a partially populated section is that pfn_valid()
> needs to go beyond a valid_section() check and read the sub-section
> active ranges from the bitmask.
>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  include/linux/mmzone.h |   29 ++++++++++++++++++++++++++++-
>  mm/page_alloc.c        |    4 +++-
>  mm/sparse.c            |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 79 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 6726fc175b51..cffde898e345 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1175,6 +1175,8 @@ struct mem_section_usage {
>         unsigned long pageblock_flags[0];
>  };
>
> +void section_active_init(unsigned long pfn, unsigned long nr_pages);
> +
>  struct page;
>  struct page_ext;
>  struct mem_section {
> @@ -1312,12 +1314,36 @@ static inline struct mem_section *__pfn_to_section(unsigned long pfn)
>
>  extern int __highest_present_section_nr;
>
> +static inline int section_active_index(phys_addr_t phys)
> +{
> +       return (phys & ~(PA_SECTION_MASK)) / SECTION_ACTIVE_SIZE;

How about also defining SECTION_ACTIVE_SHIFT like this:

/* BITS_PER_LONG = 2^6 */
#define BITS_PER_LONG_SHIFT 6
#define SECTION_ACTIVE_SHIFT (SECTION_SIZE_BITS - BITS_PER_LONG_SHIFT)
#define SECTION_ACTIVE_SIZE (1 << SECTION_ACTIVE_SHIFT)

The return above would become:
return (phys & ~(PA_SECTION_MASK)) >> SECTION_ACTIVE_SHIFT;
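
As a quick sanity check of those definitions against the sizes
discussed in this thread, a throwaway snippet (illustrative only; the
numeric section sizes are hard-coded here instead of coming from the
arch headers, and SZ_2M/SZ_16M are from <linux/sizes.h>):

static inline void section_active_shift_selftest(void)
{
	/* x86_64: SECTION_SIZE_BITS = 27 -> 2M per map_active bit */
	BUILD_BUG_ON((1UL << (27 - BITS_PER_LONG_SHIFT)) != SZ_2M);
	/* arm64 (4K pages): SECTION_SIZE_BITS = 30 -> 16M per map_active bit */
	BUILD_BUG_ON((1UL << (30 - BITS_PER_LONG_SHIFT)) != SZ_16M);
}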

> +}
> +
> +#ifdef CONFIG_SPARSEMEM_VMEMMAP
> +static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
> +{
> +       int idx = section_active_index(PFN_PHYS(pfn));
> +
> +       return !!(ms->usage->map_active & (1UL << idx));
> +}
> +#else
> +static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
> +{
> +       return 1;
> +}
> +#endif
> +
>  #ifndef CONFIG_HAVE_ARCH_PFN_VALID
>  static inline int pfn_valid(unsigned long pfn)
>  {
> +       struct mem_section *ms;
> +
>         if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
>                 return 0;
> -       return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
> +       ms = __nr_to_section(pfn_to_section_nr(pfn));
> +       if (!valid_section(ms))
> +               return 0;
> +       return pfn_section_valid(ms, pfn);
>  }
>  #endif
>
> @@ -1349,6 +1375,7 @@ void sparse_init(void);
>  #define sparse_init()  do {} while (0)
>  #define sparse_index_init(_sec, _nid)  do {} while (0)
>  #define pfn_present pfn_valid
> +#define section_active_init(_pfn, _nr_pages) do {} while (0)
>  #endif /* CONFIG_SPARSEMEM */
>
>  /*
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f671401a7c0b..c9ad28a78018 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7273,10 +7273,12 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
>
>         /* Print out the early node map */
>         pr_info("Early memory node ranges\n");
> -       for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
> +       for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
>                 pr_info("  node %3d: [mem %#018Lx-%#018Lx]\n", nid,
>                         (u64)start_pfn << PAGE_SHIFT,
>                         ((u64)end_pfn << PAGE_SHIFT) - 1);
> +               section_active_init(start_pfn, end_pfn - start_pfn);
> +       }
>
>         /* Initialise every node */
>         mminit_verify_pageflags_layout();
> diff --git a/mm/sparse.c b/mm/sparse.c
> index f87de7ad32c8..5ef2f884c4e1 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -210,6 +210,54 @@ static inline unsigned long first_present_section_nr(void)
>         return next_present_section_nr(-1);
>  }
>
> +static unsigned long section_active_mask(unsigned long pfn,
> +               unsigned long nr_pages)
> +{
> +       int idx_start, idx_size;
> +       phys_addr_t start, size;
> +
> +       if (!nr_pages)
> +               return 0;
> +
> +       start = PFN_PHYS(pfn);
> +       size = PFN_PHYS(min(nr_pages, PAGES_PER_SECTION
> +                               - (pfn & ~PAGE_SECTION_MASK)));
> +       size = ALIGN(size, SECTION_ACTIVE_SIZE);
> +
> +       idx_start = section_active_index(start);
> +       idx_size = section_active_index(size);
> +
> +       if (idx_size == 0)
> +               return -1;
> +       return ((1UL << idx_size) - 1) << idx_start;
> +}
> +
> +void section_active_init(unsigned long pfn, unsigned long nr_pages)
> +{
> +       int end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
> +       int i, start_sec = pfn_to_section_nr(pfn);
> +
> +       if (!nr_pages)
> +               return;
> +
> +       for (i = start_sec; i <= end_sec; i++) {
> +               struct mem_section *ms;
> +               unsigned long mask;
> +               unsigned long pfns;
> +
> +               pfns = min(nr_pages, PAGES_PER_SECTION
> +                               - (pfn & ~PAGE_SECTION_MASK));
> +               mask = section_active_mask(pfn, pfns);
> +
> +               ms = __nr_to_section(i);
> +               pr_debug("%s: sec: %d mask: %#018lx\n", __func__, i, mask);
> +               ms->usage->map_active = mask;
> +
> +               pfn += pfns;
> +               nr_pages -= pfns;
> +       }
> +}

For some reason the above code is confusing to me. It seems all the
code is supposed to do is set every map_active to -1 and then trim the
first and last sections (which can be the same section, of course). So
I would replace the above two functions with one function like this:

void section_active_init(unsigned long pfn, unsigned long nr_pages)
{
        int end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
        int i, idx, start_sec = pfn_to_section_nr(pfn);
        struct mem_section *ms;

        if (!nr_pages)
                return;

        for (i = start_sec; i <= end_sec; i++) {
                ms = __nr_to_section(i);
                ms->usage->map_active = ~0ul;
        }

        /* Might need to trim active pfns from the beginning and end */
        idx = section_active_index(PFN_PHYS(pfn));
        ms = __nr_to_section(start_sec);
        ms->usage->map_active &= (~0ul << idx);

        idx = section_active_index(PFN_PHYS(pfn + nr_pages - 1));
        ms = __nr_to_section(end_sec);
        ms->usage->map_active &= (~0ul >> (BITS_PER_LONG - idx - 1));
}
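
A quick worked example of that trimming, assuming the 64-bit
map_active layout discussed above (one section, with the pfn range
covering subsections 3 through 10; the values are illustrative):

	unsigned long map_active = ~0ul;

	map_active &= ~0ul << 3;		/* idx of the first pfn */
	map_active &= ~0ul >> (64 - 10 - 1);	/* idx of the last pfn  */
	/* map_active == 0x7f8, i.e. bits 3..10 set */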

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 04/12] mm/hotplug: Prepare shrink_{zone, pgdat}_span for sub-section removal
  2019-04-17 18:39   ` Dan Williams
  (?)
@ 2019-05-02 19:18     ` Pavel Tatashin
  -1 siblings, 0 replies; 111+ messages in thread
From: Pavel Tatashin @ 2019-05-02 19:18 UTC (permalink / raw)
  To: Dan Williams
  Cc: Michal Hocko, linux-nvdimm, David Hildenbrand, LKML, linux-mm,
	Andrew Morton, Vlastimil Babka

On Wed, Apr 17, 2019 at 2:53 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> Sub-section hotplug support reduces the unit of operation of hotplug
> from section-sized-units (PAGES_PER_SECTION) to sub-section-sized units
> (PAGES_PER_SUBSECTION). Teach shrink_{zone,pgdat}_span() to consider
> PAGES_PER_SUBSECTION boundaries as the points where pfn_valid(), not
> valid_section(), can toggle.
>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  include/linux/mmzone.h |    2 ++
>  mm/memory_hotplug.c    |   16 ++++++++--------
>  2 files changed, 10 insertions(+), 8 deletions(-)

Given that all the now-unused "*ms" variables are removed:

Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 05/12] mm/sparsemem: Convert kmalloc_section_memmap() to populate_section_memmap()
  2019-04-17 18:39   ` Dan Williams
@ 2019-05-02 19:28     ` Pavel Tatashin
  -1 siblings, 0 replies; 111+ messages in thread
From: Pavel Tatashin @ 2019-05-02 19:28 UTC (permalink / raw)
  To: Dan Williams
  Cc: Andrew Morton, Michal Hocko, David Hildenbrand, Logan Gunthorpe,
	linux-mm, linux-nvdimm, LKML

On Wed, Apr 17, 2019 at 2:53 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> Allow sub-section sized ranges to be added to the memmap.
> populate_section_memmap() takes an explict pfn range rather than
> assuming a full section, and those parameters are plumbed all the way
> through to vmmemap_populate(). There should be no sub-section usage in
> current deployments. New warnings are added to clarify which memmap
> allocation paths are sub-section capable.
>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 07/12] mm: Kill is_dev_zone() helper
  2019-04-17 18:39   ` Dan Williams
  (?)
@ 2019-05-02 20:37     ` Pavel Tatashin
  -1 siblings, 0 replies; 111+ messages in thread
From: Pavel Tatashin @ 2019-05-02 20:37 UTC (permalink / raw)
  To: Dan Williams
  Cc: Michal Hocko, linux-nvdimm, David Hildenbrand, LKML, linux-mm,
	Andrew Morton

On Wed, Apr 17, 2019 at 2:53 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> Given there are no more usages of is_dev_zone() outside of 'ifdef
> CONFIG_ZONE_DEVICE' protection, kill off the compilation helper.
>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 08/12] mm/sparsemem: Prepare for sub-section ranges
  2019-04-17 18:39   ` Dan Williams
  (?)
@ 2019-05-02 21:25     ` Pavel Tatashin
  -1 siblings, 0 replies; 111+ messages in thread
From: Pavel Tatashin @ 2019-05-02 21:25 UTC (permalink / raw)
  To: Dan Williams
  Cc: Michal Hocko, linux-nvdimm, David Hildenbrand, LKML, linux-mm,
	Andrew Morton, Vlastimil Babka

On Wed, Apr 17, 2019 at 2:53 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> Prepare the memory hot-{add,remove} paths for handling sub-section
> ranges by plumbing the starting page frame and number of pages being
> handled through arch_{add,remove}_memory() to
> sparse_{add,remove}_one_section().
>
> This is simply plumbing, small cleanups, and some identifier renames. No
> intended functional changes.
>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 00/12] mm: Sub-section memory hotplug support
  2019-04-17 18:38 ` Dan Williams
                   ` (13 preceding siblings ...)
  (?)
@ 2019-05-02 22:46 ` Pavel Tatashin
  2019-05-02 23:20     ` Dan Williams
  -1 siblings, 1 reply; 111+ messages in thread
From: Pavel Tatashin @ 2019-05-02 22:46 UTC (permalink / raw)
  To: Dan Williams
  Cc: Andrew Morton, David Hildenbrand, Jérôme Glisse,
	Logan Gunthorpe, Toshi Kani, Jeff Moyer, Michal Hocko,
	Vlastimil Babka, stable, linux-mm, linux-nvdimm, LKML

Hi Dan,

How do you test these patches? Do you have any instructions?

I see for example that check_hotplug_memory_range() still enforces
memory_block_size_bytes() alignment.

Also, after removing check_hotplug_memory_range(), I tried to online
16M-aligned DAX memory and got the following warning:

# echo online > /sys/devices/system/memory/memory7/state
[  202.193132] WARNING: CPU: 2 PID: 351 at drivers/base/memory.c:207
memory_block_action+0x110/0x178
[  202.193391] Modules linked in:
[  202.193698] CPU: 2 PID: 351 Comm: sh Not tainted
5.1.0-rc7_pt_devdax-00038-g865af4385544-dirty #9
[  202.193909] Hardware name: linux,dummy-virt (DT)
[  202.194122] pstate: 60000005 (nZCv daif -PAN -UAO)
[  202.194243] pc : memory_block_action+0x110/0x178
[  202.194404] lr : memory_block_action+0x90/0x178
[  202.194506] sp : ffff000016763ca0
[  202.194592] x29: ffff000016763ca0 x28: ffff80016fd29b80
[  202.194724] x27: 0000000000000000 x26: 0000000000000000
[  202.194838] x25: ffff000015546000 x24: 00000000001c0000
[  202.194949] x23: 0000000000000000 x22: 0000000000040000
[  202.195058] x21: 00000000001c0000 x20: 0000000000000008
[  202.195168] x19: 0000000000000007 x18: 0000000000000000
[  202.195281] x17: 0000000000000000 x16: 0000000000000000
[  202.195393] x15: 0000000000000000 x14: 0000000000000000
[  202.195505] x13: 0000000000000000 x12: 0000000000000000
[  202.195614] x11: 0000000000000000 x10: 0000000000000000
[  202.195744] x9 : 0000000000000000 x8 : 0000000180000000
[  202.195858] x7 : 0000000000000018 x6 : ffff000015541930
[  202.195966] x5 : ffff000015541930 x4 : 0000000000000001
[  202.196074] x3 : 0000000000000001 x2 : 0000000000000000
[  202.196185] x1 : 0000000000000070 x0 : 0000000000000000
[  202.196366] Call trace:
[  202.196455]  memory_block_action+0x110/0x178
[  202.196589]  memory_subsys_online+0x3c/0x80
[  202.196681]  device_online+0x6c/0x90
[  202.196761]  state_store+0x84/0x100
[  202.196841]  dev_attr_store+0x18/0x28
[  202.196927]  sysfs_kf_write+0x40/0x58
[  202.197010]  kernfs_fop_write+0xcc/0x1d8
[  202.197099]  __vfs_write+0x18/0x40
[  202.197187]  vfs_write+0xa4/0x1b0
[  202.197295]  ksys_write+0x64/0xd8
[  202.197430]  __arm64_sys_write+0x18/0x20
[  202.197521]  el0_svc_common.constprop.0+0x7c/0xe8
[  202.197621]  el0_svc_handler+0x28/0x78
[  202.197706]  el0_svc+0x8/0xc
[  202.197828] ---[ end trace 57719823dda6d21e ]---

Thank you,
Pasha

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 00/12] mm: Sub-section memory hotplug support
  2019-05-02 22:46 ` Pavel Tatashin
@ 2019-05-02 23:20     ` Dan Williams
  0 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-05-02 23:20 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Michal Hocko, David Hildenbrand, linux-nvdimm,
	Jérôme Glisse, stable, LKML, linux-mm, Andrew Morton,
	Vlastimil Babka

On Thu, May 2, 2019 at 3:46 PM Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
>
> Hi Dan,
>
> How do you test these patches? Do you have any instructions?

Yes, I briefly mentioned this in the cover letter, but here is the
test I am using:

>
> I see for example that check_hotplug_memory_range() still enforces
> memory_block_size_bytes() alignment.
>
> Also, after removing check_hotplug_memory_range(), I tried to online
> 16M-aligned DAX memory and got the following warning:

Right, this functionality is currently strictly limited to the
devm_memremap_pages() case, where there are guarantees that the memory
will never be onlined. This is due to the fact that the section size
is entangled with the memblock API. That said, I would have expected
you to trigger the warning in subsection_check() before getting this
far into the hotplug process.
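
For illustration, the kind of early sanity check being referred to
might look something like the sketch below; the helper name, its
signature, and the 'want_memblock' parameter are assumptions for this
sketch, not the actual subsection_check() from the series:

static int check_subsection_hotplug(unsigned long pfn,
		unsigned long nr_pages, bool want_memblock)
{
	/* full-section requests are always acceptable */
	if (IS_ALIGNED(pfn | nr_pages, PAGES_PER_SECTION))
		return 0;
	/*
	 * Sub-section ranges are only meant for the
	 * devm_memremap_pages() case, which never creates memblock
	 * sysfs entries and never onlines the pages.
	 */
	if (WARN_ON_ONCE(want_memblock))
		return -EINVAL;
	return 0;
}
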
>
> # echo online > /sys/devices/system/memory/memory7/state
> [  202.193132] WARNING: CPU: 2 PID: 351 at drivers/base/memory.c:207
> memory_block_action+0x110/0x178
> [  202.193391] Modules linked in:
> [  202.193698] CPU: 2 PID: 351 Comm: sh Not tainted
> 5.1.0-rc7_pt_devdax-00038-g865af4385544-dirty #9
> [  202.193909] Hardware name: linux,dummy-virt (DT)
> [  202.194122] pstate: 60000005 (nZCv daif -PAN -UAO)
> [  202.194243] pc : memory_block_action+0x110/0x178
> [  202.194404] lr : memory_block_action+0x90/0x178
> [  202.194506] sp : ffff000016763ca0
> [  202.194592] x29: ffff000016763ca0 x28: ffff80016fd29b80
> [  202.194724] x27: 0000000000000000 x26: 0000000000000000
> [  202.194838] x25: ffff000015546000 x24: 00000000001c0000
> [  202.194949] x23: 0000000000000000 x22: 0000000000040000
> [  202.195058] x21: 00000000001c0000 x20: 0000000000000008
> [  202.195168] x19: 0000000000000007 x18: 0000000000000000
> [  202.195281] x17: 0000000000000000 x16: 0000000000000000
> [  202.195393] x15: 0000000000000000 x14: 0000000000000000
> [  202.195505] x13: 0000000000000000 x12: 0000000000000000
> [  202.195614] x11: 0000000000000000 x10: 0000000000000000
> [  202.195744] x9 : 0000000000000000 x8 : 0000000180000000
> [  202.195858] x7 : 0000000000000018 x6 : ffff000015541930
> [  202.195966] x5 : ffff000015541930 x4 : 0000000000000001
> [  202.196074] x3 : 0000000000000001 x2 : 0000000000000000
> [  202.196185] x1 : 0000000000000070 x0 : 0000000000000000
> [  202.196366] Call trace:
> [  202.196455]  memory_block_action+0x110/0x178
> [  202.196589]  memory_subsys_online+0x3c/0x80
> [  202.196681]  device_online+0x6c/0x90
> [  202.196761]  state_store+0x84/0x100
> [  202.196841]  dev_attr_store+0x18/0x28
> [  202.196927]  sysfs_kf_write+0x40/0x58
> [  202.197010]  kernfs_fop_write+0xcc/0x1d8
> [  202.197099]  __vfs_write+0x18/0x40
> [  202.197187]  vfs_write+0xa4/0x1b0
> [  202.197295]  ksys_write+0x64/0xd8
> [  202.197430]  __arm64_sys_write+0x18/0x20
> [  202.197521]  el0_svc_common.constprop.0+0x7c/0xe8
> [  202.197621]  el0_svc_handler+0x28/0x78
> [  202.197706]  el0_svc+0x8/0xc
> [  202.197828] ---[ end trace 57719823dda6d21e ]---
>
> Thank you,
> Pasha

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 00/12] mm: Sub-section memory hotplug support
  2019-05-02 23:20     ` Dan Williams
@ 2019-05-02 23:21       ` Dan Williams
  -1 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-05-02 23:21 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Michal Hocko, David Hildenbrand, linux-nvdimm,
	Jérôme Glisse, stable, LKML, linux-mm, Andrew Morton,
	Vlastimil Babka

On Thu, May 2, 2019 at 4:20 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Thu, May 2, 2019 at 3:46 PM Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
> >
> > Hi Dan,
> >
> > How do you test these patches? Do you have any instructions?
>
> Yes, I briefly mentioned this in the cover letter, but here is the
> test I am using:

Sorry, I fumble-fingered the 'send' button; here is that link:

https://github.com/pmem/ndctl/blob/subsection-pending/test/sub-section.sh

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 02/12] mm/sparsemem: Introduce common definitions for the size and mask of a section
  2019-05-02 14:53   ` Pavel Tatashin
@ 2019-05-03  0:41       ` Dan Williams
  0 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-05-03  0:41 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Michal Hocko, linux-nvdimm, Robin Murphy, David Hildenbrand,
	LKML, linux-mm, Jérôme Glisse, Andrew Morton,
	Vlastimil Babka

On Thu, May 2, 2019 at 7:53 AM Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
>
> On Wed, Apr 17, 2019 at 2:52 PM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > Up-level the local section size and mask from kernel/memremap.c to
> > global definitions.  These will be used by the new sub-section hotplug
> > support.
> >
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Logan Gunthorpe <logang@deltatee.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>
> Should be dropped from this series as it has been replaced by a very
> similar patch in the mainline:
>
> 7c697d7fb5cb14ef60e2b687333ba3efb74f73da
>  mm/memremap: Rename and consolidate SECTION_SIZE

I saw that patch fly by and acked it, but I have not seen it picked up
anywhere. I grabbed latest -linus and -next, but don't see that
commit.

$ git show 7c697d7fb5cb14ef60e2b687333ba3efb74f73da
fatal: bad object 7c697d7fb5cb14ef60e2b687333ba3efb74f73da
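
For reference, a minimal sketch of the kind of common definitions the quoted
changelog describes, built on the per-arch SECTION_SIZE_BITS. The exact names
and placement in the merged patch may differ; this is only illustrative:

/* Sketch: global section size/mask derived from the per-arch SECTION_SIZE_BITS */
#define PA_SECTION_SHIFT	(SECTION_SIZE_BITS)
#define PA_SECTION_SIZE		(1UL << PA_SECTION_SHIFT)
#define PA_SECTION_MASK		(~(PA_SECTION_SIZE - 1))

A consumer such as kernel/memremap.c can then drop its local SECTION_SIZE /
SECTION_MASK copies and use the global definitions instead.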

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 02/12] mm/sparsemem: Introduce common definitions for the size and mask of a section
  2019-05-03  0:41       ` Dan Williams
@ 2019-05-03 10:35         ` Robin Murphy
  -1 siblings, 0 replies; 111+ messages in thread
From: Robin Murphy @ 2019-05-03 10:35 UTC (permalink / raw)
  To: Dan Williams, Pavel Tatashin
  Cc: Michal Hocko, linux-nvdimm, David Hildenbrand, LKML, linux-mm,
	Jérôme Glisse, Andrew Morton, Vlastimil Babka

On 03/05/2019 01:41, Dan Williams wrote:
> On Thu, May 2, 2019 at 7:53 AM Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
>>
>> On Wed, Apr 17, 2019 at 2:52 PM Dan Williams <dan.j.williams@intel.com> wrote:
>>>
>>> Up-level the local section size and mask from kernel/memremap.c to
>>> global definitions.  These will be used by the new sub-section hotplug
>>> support.
>>>
>>> Cc: Michal Hocko <mhocko@suse.com>
>>> Cc: Vlastimil Babka <vbabka@suse.cz>
>>> Cc: Jérôme Glisse <jglisse@redhat.com>
>>> Cc: Logan Gunthorpe <logang@deltatee.com>
>>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>>
>> Should be dropped from this series as it has been replaced by a very
>> similar patch in the mainline:
>>
>> 7c697d7fb5cb14ef60e2b687333ba3efb74f73da
>>   mm/memremap: Rename and consolidate SECTION_SIZE
> 
> I saw that patch fly by and acked it, but I have not seen it picked up
> anywhere. I grabbed latest -linus and -next, but don't see that
> commit.
> 
> $ git show 7c697d7fb5cb14ef60e2b687333ba3efb74f73da
> fatal: bad object 7c697d7fb5cb14ef60e2b687333ba3efb74f73da

Yeah, I don't recognise that ID either, nor have I had any notifications 
that Andrew's picked up anything of mine yet :/

Robin.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 00/12] mm: Sub-section memory hotplug support
  2019-05-02 23:20     ` Dan Williams
@ 2019-05-03 10:48       ` Oscar Salvador
  -1 siblings, 0 replies; 111+ messages in thread
From: Oscar Salvador @ 2019-05-03 10:48 UTC (permalink / raw)
  To: Dan Williams
  Cc: Michal Hocko, Pavel Tatashin, David Hildenbrand, linux-nvdimm,
	stable, LKML, linux-mm, Jérôme Glisse, Andrew Morton,
	Vlastimil Babka

On Thu, May 02, 2019 at 04:20:03PM -0700, Dan Williams wrote:
> On Thu, May 2, 2019 at 3:46 PM Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
> >
> > Hi Dan,
> >
> > How do you test these patches? Do you have any instructions?
> 
> Yes, I briefly mentioned this in the cover letter, but here is the
> test I am using:
> 
> >
> > I see for example that check_hotplug_memory_range() still enforces
> > memory_block_size_bytes() alignment.
> >
> > Also, after removing check_hotplug_memory_range(), I tried to online
> > 16M aligned DAX memory, and got the following panic:
> 
> Right, this functionality is currently strictly limited to the
> devm_memremap_pages() case where there are guarantees that the memory
> will never be onlined. This is due to the fact that the section size
> is entangled with the memblock api. That said I would have expected
> you to trigger the warning in subsection_check() before getting this
> far into the hotplug process.
> >
> > # echo online > /sys/devices/system/memory/memory7/state
> > [  202.193132] WARNING: CPU: 2 PID: 351 at drivers/base/memory.c:207
> > memory_block_action+0x110/0x178
> > [  202.193391] Modules linked in:
> > [  202.193698] CPU: 2 PID: 351 Comm: sh Not tainted
> > 5.1.0-rc7_pt_devdax-00038-g865af4385544-dirty #9
> > [  202.193909] Hardware name: linux,dummy-virt (DT)
> > [  202.194122] pstate: 60000005 (nZCv daif -PAN -UAO)
> > [  202.194243] pc : memory_block_action+0x110/0x178
> > [  202.194404] lr : memory_block_action+0x90/0x178
> > [  202.194506] sp : ffff000016763ca0
> > [  202.194592] x29: ffff000016763ca0 x28: ffff80016fd29b80
> > [  202.194724] x27: 0000000000000000 x26: 0000000000000000
> > [  202.194838] x25: ffff000015546000 x24: 00000000001c0000
> > [  202.194949] x23: 0000000000000000 x22: 0000000000040000
> > [  202.195058] x21: 00000000001c0000 x20: 0000000000000008
> > [  202.195168] x19: 0000000000000007 x18: 0000000000000000
> > [  202.195281] x17: 0000000000000000 x16: 0000000000000000
> > [  202.195393] x15: 0000000000000000 x14: 0000000000000000
> > [  202.195505] x13: 0000000000000000 x12: 0000000000000000
> > [  202.195614] x11: 0000000000000000 x10: 0000000000000000
> > [  202.195744] x9 : 0000000000000000 x8 : 0000000180000000
> > [  202.195858] x7 : 0000000000000018 x6 : ffff000015541930
> > [  202.195966] x5 : ffff000015541930 x4 : 0000000000000001
> > [  202.196074] x3 : 0000000000000001 x2 : 0000000000000000
> > [  202.196185] x1 : 0000000000000070 x0 : 0000000000000000
> > [  202.196366] Call trace:
> > [  202.196455]  memory_block_action+0x110/0x178
> > [  202.196589]  memory_subsys_online+0x3c/0x80
> > [  202.196681]  device_online+0x6c/0x90
> > [  202.196761]  state_store+0x84/0x100
> > [  202.196841]  dev_attr_store+0x18/0x28
> > [  202.196927]  sysfs_kf_write+0x40/0x58
> > [  202.197010]  kernfs_fop_write+0xcc/0x1d8
> > [  202.197099]  __vfs_write+0x18/0x40
> > [  202.197187]  vfs_write+0xa4/0x1b0
> > [  202.197295]  ksys_write+0x64/0xd8
> > [  202.197430]  __arm64_sys_write+0x18/0x20
> > [  202.197521]  el0_svc_common.constprop.0+0x7c/0xe8
> > [  202.197621]  el0_svc_handler+0x28/0x78
> > [  202.197706]  el0_svc+0x8/0xc
> > [  202.197828] ---[ end trace 57719823dda6d21e ]---

This warning relates to:

        for (; section_nr < section_nr_end; section_nr++) {
                if (WARN_ON_ONCE(!pfn_valid(pfn)))
                        return false;

from pages_correctly_probed().
AFAICS, this is orthogonal to subsection_check().
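
To make the connection concrete, here is a rough sketch of the shape of that
check; it is not the actual drivers/base/memory.c code, and the function name
is made up for illustration. Onlining operates at memory-block granularity, so
a block that is only partially backed by a sub-section-sized range still has
pfns for which pfn_valid() is false:

static bool block_fully_populated(unsigned long start_pfn,
				  unsigned long sections_per_block)
{
	unsigned long section_nr = pfn_to_section_nr(start_pfn);
	unsigned long end = section_nr + sections_per_block;
	unsigned long pfn = start_pfn;

	for (; section_nr < end; section_nr++) {
		/* this is the WARN_ON_ONCE() seen in the trace above */
		if (WARN_ON_ONCE(!pfn_valid(pfn)))
			return false;
		pfn += PAGES_PER_SECTION;
	}
	return true;
}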


-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 02/12] mm/sparsemem: Introduce common definitions for the size and mask of a section
  2019-05-03 10:35         ` Robin Murphy
  (?)
@ 2019-05-03 12:57         ` Pavel Tatashin
  2019-05-03 13:00             ` Oscar Salvador
  -1 siblings, 1 reply; 111+ messages in thread
From: Pavel Tatashin @ 2019-05-03 12:57 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Dan Williams, Andrew Morton, Michal Hocko, Vlastimil Babka,
	Jérôme Glisse, Logan Gunthorpe, linux-mm, linux-nvdimm,
	LKML, David Hildenbrand

On Fri, May 3, 2019 at 6:35 AM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 03/05/2019 01:41, Dan Williams wrote:
> > On Thu, May 2, 2019 at 7:53 AM Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
> >>
> >> On Wed, Apr 17, 2019 at 2:52 PM Dan Williams <dan.j.williams@intel.com> wrote:
> >>>
> >>> Up-level the local section size and mask from kernel/memremap.c to
> >>> global definitions.  These will be used by the new sub-section hotplug
> >>> support.
> >>>
> >>> Cc: Michal Hocko <mhocko@suse.com>
> >>> Cc: Vlastimil Babka <vbabka@suse.cz>
> >>> Cc: Jérôme Glisse <jglisse@redhat.com>
> >>> Cc: Logan Gunthorpe <logang@deltatee.com>
> >>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> >>
> >> Should be dropped from this series as it has been replaced by a very
> >> similar patch in the mainline:
> >>
> >> 7c697d7fb5cb14ef60e2b687333ba3efb74f73da
> >>   mm/memremap: Rename and consolidate SECTION_SIZE
> >
> > I saw that patch fly by and acked it, but I have not seen it picked up
> > anywhere. I grabbed latest -linus and -next, but don't see that
> > commit.
> >
> > $ git show 7c697d7fb5cb14ef60e2b687333ba3efb74f73da
> > fatal: bad object 7c697d7fb5cb14ef60e2b687333ba3efb74f73da
>
> Yeah, I don't recognise that ID either, nor have I had any notifications
> that Andrew's picked up anything of mine yet :/

Sorry for the confusion. I thought I checked in a master branch, but it
turns out I checked in a branch where I applied the arm hotremove patches
and Robin's patch as well. These two patches are essentially the same,
so whichever one goes in first, the other should be dropped.

Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>

Thank you,
Pasha

>
> Robin.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 02/12] mm/sparsemem: Introduce common definitions for the size and mask of a section
  2019-05-03 12:57         ` Pavel Tatashin
@ 2019-05-03 13:00             ` Oscar Salvador
  0 siblings, 0 replies; 111+ messages in thread
From: Oscar Salvador @ 2019-05-03 13:00 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Michal Hocko, linux-nvdimm, David Hildenbrand, LKML, linux-mm,
	Jérôme Glisse, Robin Murphy, Andrew Morton,
	Vlastimil Babka

On Fri, May 03, 2019 at 08:57:09AM -0400, Pavel Tatashin wrote:
> On Fri, May 3, 2019 at 6:35 AM Robin Murphy <robin.murphy@arm.com> wrote:
> >
> > On 03/05/2019 01:41, Dan Williams wrote:
> > > On Thu, May 2, 2019 at 7:53 AM Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
> > >>
> > >> On Wed, Apr 17, 2019 at 2:52 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > >>>
> > >>> Up-level the local section size and mask from kernel/memremap.c to
> > >>> global definitions.  These will be used by the new sub-section hotplug
> > >>> support.
> > >>>
> > >>> Cc: Michal Hocko <mhocko@suse.com>
> > >>> Cc: Vlastimil Babka <vbabka@suse.cz>
> > >>> Cc: Jérôme Glisse <jglisse@redhat.com>
> > >>> Cc: Logan Gunthorpe <logang@deltatee.com>
> > >>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > >>
> > >> Should be dropped from this series as it has been replaced by a very
> > >> similar patch in the mainline:
> > >>
> > >> 7c697d7fb5cb14ef60e2b687333ba3efb74f73da
> > >>   mm/memremap: Rename and consolidate SECTION_SIZE
> > >
> > > I saw that patch fly by and acked it, but I have not seen it picked up
> > > anywhere. I grabbed latest -linus and -next, but don't see that
> > > commit.
> > >
> > > $ git show 7c697d7fb5cb14ef60e2b687333ba3efb74f73da
> > > fatal: bad object 7c697d7fb5cb14ef60e2b687333ba3efb74f73da
> >
> > Yeah, I don't recognise that ID either, nor have I had any notifications
> > that Andrew's picked up anything of mine yet :/
> 
> Sorry for the confusion. I thought I checked in a master branch, but
> turns out I checked in a branch where I applied arm hotremove patches
> and Robin's patch as well. These two patches are essentially the same,
> so which one goes first the other should be dropped.
> 
> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>

Hey Pavel,

just a friendly note :-) :

you are reviewing v6; I think you might want to review v7 [1] instead ;-)

[1] https://patchwork.kernel.org/cover/10926035/
 

-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 01/12] mm/sparsemem: Introduce struct mem_section_usage
  2019-05-02  6:07       ` Dan Williams
  (?)
@ 2019-05-04  0:22         ` Dan Williams
  -1 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-05-04  0:22 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Michal Hocko, linux-nvdimm, David Hildenbrand,
	Linux Kernel Mailing List, Linux MM, Andrew Morton,
	Vlastimil Babka

On Wed, May 1, 2019 at 11:07 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Wed, May 1, 2019 at 4:25 PM Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
> >
> > On 19-04-17 11:39:00, Dan Williams wrote:
> > > Towards enabling memory hotplug to track partial population of a
> > > section, introduce 'struct mem_section_usage'.
> > >
> > > A pointer to a 'struct mem_section_usage' instance replaces the existing
> > > pointer to a 'pageblock_flags' bitmap. Effectively it adds one more
> > > 'unsigned long' beyond the 'pageblock_flags' (usemap) allocation to
> > > house a new 'map_active' bitmap.  The new bitmap enables the memory
> > > hot{plug,remove} implementation to act on incremental sub-divisions of a
> > > section.
> > >
> > > The primary motivation for this functionality is to support platforms
> > > that mix "System RAM" and "Persistent Memory" within a single section,
> > > or multiple PMEM ranges with different mapping lifetimes within a single
> > > section. The section restriction for hotplug has caused an ongoing saga
> > > of hacks and bugs for devm_memremap_pages() users.
> > >
> > > Beyond the fixups to teach existing paths how to retrieve the 'usemap'
> > > from a section, and updates to usemap allocation path, there are no
> > > expected behavior changes.
> > >
> > > Cc: Michal Hocko <mhocko@suse.com>
> > > Cc: Vlastimil Babka <vbabka@suse.cz>
> > > Cc: Logan Gunthorpe <logang@deltatee.com>
> > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > > ---
> > >  include/linux/mmzone.h |   23 ++++++++++++--
> > >  mm/memory_hotplug.c    |   18 ++++++-----
> > >  mm/page_alloc.c        |    2 +
> > >  mm/sparse.c            |   81 ++++++++++++++++++++++++------------------------
> > >  4 files changed, 71 insertions(+), 53 deletions(-)
> > >
> > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > > index 70394cabaf4e..f0bbd85dc19a 100644
> > > --- a/include/linux/mmzone.h
> > > +++ b/include/linux/mmzone.h
> > > @@ -1160,6 +1160,19 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec)
> > >  #define SECTION_ALIGN_UP(pfn)        (((pfn) + PAGES_PER_SECTION - 1) & PAGE_SECTION_MASK)
> > >  #define SECTION_ALIGN_DOWN(pfn)      ((pfn) & PAGE_SECTION_MASK)
> > >
> > > +#define SECTION_ACTIVE_SIZE ((1UL << SECTION_SIZE_BITS) / BITS_PER_LONG)
> > > +#define SECTION_ACTIVE_MASK (~(SECTION_ACTIVE_SIZE - 1))
> > > +
> > > +struct mem_section_usage {
> > > +     /*
> > > +      * SECTION_ACTIVE_SIZE portions of the section that are populated in
> > > +      * the memmap
> > > +      */
> > > +     unsigned long map_active;
> >
> > I think this should be proportional to section_size / subsection_size.
> > For example, on Intel the section size = 128M and the subsection is 2M, so
> > 64 bits work nicely. But on arm64 the section size is 1G, so the subsection is
> > 16M.
> >
> > On the other hand 16M is already much better than what we have: with 1G
> > section size and 2M pmem alignment we are guaranteed to lose 1022M. And
> > with 16M subsection it is only 14M.
>
> I'm ok with it being 16M for now unless it causes a problem in
> practice, i.e. something like the minimum hardware mapping alignment
> for physical memory being less than 16M.

On second thought, arbitrary differences across architectures are a bit
sad. The most common nvdimm namespace alignment granularity is
PMD_SIZE, so perhaps the default sub-section size should try to match
that default.
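
For illustration, a PMD-sized sub-section would look roughly like the below;
the SUBSECTION_* names are hypothetical and assume 4K pages on both x86-64
and arm64:

/* Sketch: a fixed 2M sub-section (PMD_SIZE with 4K pages) */
#define SUBSECTION_SHIFT	21
#define SUBSECTION_SIZE		(1UL << SUBSECTION_SHIFT)
#define SUBSECTIONS_PER_SECTION	(1UL << (SECTION_SIZE_BITS - SUBSECTION_SHIFT))
/* x86-64 (128M sections): 64 sub-sections; arm64 (1G sections): 512 */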

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 01/12] mm/sparsemem: Introduce struct mem_section_usage
  2019-05-04  0:22         ` Dan Williams
  (?)
@ 2019-05-04 15:55           ` Pavel Tatashin
  -1 siblings, 0 replies; 111+ messages in thread
From: Pavel Tatashin @ 2019-05-04 15:55 UTC (permalink / raw)
  To: Dan Williams
  Cc: Michal Hocko, linux-nvdimm, David Hildenbrand,
	Linux Kernel Mailing List, Linux MM, Andrew Morton,
	Vlastimil Babka

> > I'm ok with it being 16M for now unless it causes a problem in
> > practice, i.e. something like the minimum hardware mapping alignment
> > for physical memory being less than 16M.
>
> On second thought, arbitrary differences across architectures are a bit
> sad. The most common nvdimm namespace alignment granularity is
> PMD_SIZE, so perhaps the default sub-section size should try to match
> that default.

I think that even if you keep it 16M for now, at the very least you should
make the map_active bitmap scalable so it will be possible to change it
as required later without revisiting all functions that use it. Making
it a static array won't slow down x86, as it will still be a single
64-bit word on x86.
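
A minimal sketch of that suggestion, assuming a SUBSECTIONS_PER_SECTION
constant (section size divided by sub-section size; the name is hypothetical
here):

struct mem_section_usage {
	/* scales with the per-arch section / sub-section ratio */
	DECLARE_BITMAP(map_active, SUBSECTIONS_PER_SECTION);
	unsigned long pageblock_flags[0];
};

On x86-64 this stays a single 64-bit word; on arm64 with 1G sections and 2M
sub-sections it grows to 512 bits, and callers that go through the bitmap_*()
helpers need no changes.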

Pasha

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 03/12] mm/sparsemem: Add helpers track active portions of a section at boot
  2019-05-02 16:12     ` Pavel Tatashin
  (?)
@ 2019-05-04 19:26       ` Dan Williams
  -1 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-05-04 19:26 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Michal Hocko, linux-nvdimm, David Hildenbrand, LKML, linux-mm,
	Andrew Morton, Vlastimil Babka

On Thu, May 2, 2019 at 9:12 AM Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
>
> On Wed, Apr 17, 2019 at 2:53 PM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
> > section active bitmask, each bit representing 2MB (SECTION_SIZE (128M) /
> > map_active bitmask length (64)). If it turns out that 2MB is too large
> > of an active tracking granularity it is trivial to increase the size of
> > the map_active bitmap.
>
> Please mention that 2M on Intel, and 16M on Arm64.
>
> >
> > The implication of a partially populated section is that pfn_valid()
> > needs to go beyond a valid_section() check and read the sub-section
> > active ranges from the bitmask.
> >
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Logan Gunthorpe <logang@deltatee.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  include/linux/mmzone.h |   29 ++++++++++++++++++++++++++++-
> >  mm/page_alloc.c        |    4 +++-
> >  mm/sparse.c            |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 79 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index 6726fc175b51..cffde898e345 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -1175,6 +1175,8 @@ struct mem_section_usage {
> >         unsigned long pageblock_flags[0];
> >  };
> >
> > +void section_active_init(unsigned long pfn, unsigned long nr_pages);
> > +
> >  struct page;
> >  struct page_ext;
> >  struct mem_section {
> > @@ -1312,12 +1314,36 @@ static inline struct mem_section *__pfn_to_section(unsigned long pfn)
> >
> >  extern int __highest_present_section_nr;
> >
> > +static inline int section_active_index(phys_addr_t phys)
> > +{
> > +       return (phys & ~(PA_SECTION_MASK)) / SECTION_ACTIVE_SIZE;
>
> How about also defining SECTION_ACTIVE_SHIFT like this:
>
> /* BITS_PER_LONG = 2^6 */
> #define BITS_PER_LONG_SHIFT 6
> #define SECTION_ACTIVE_SHIFT (SECTION_SIZE_BITS - BITS_PER_LONG_SHIFT)
> #define SECTION_ACTIVE_SIZE (1 << SECTION_ACTIVE_SHIFT)
>
> The return above would become:
> return (phys & ~(PA_SECTION_MASK)) >> SECTION_ACTIVE_SHIFT;
>
> > +}
> > +
> > +#ifdef CONFIG_SPARSEMEM_VMEMMAP
> > +static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
> > +{
> > +       int idx = section_active_index(PFN_PHYS(pfn));
> > +
> > +       return !!(ms->usage->map_active & (1UL << idx));
> > +}
> > +#else
> > +static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
> > +{
> > +       return 1;
> > +}
> > +#endif
> > +
> >  #ifndef CONFIG_HAVE_ARCH_PFN_VALID
> >  static inline int pfn_valid(unsigned long pfn)
> >  {
> > +       struct mem_section *ms;
> > +
> >         if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
> >                 return 0;
> > -       return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
> > +       ms = __nr_to_section(pfn_to_section_nr(pfn));
> > +       if (!valid_section(ms))
> > +               return 0;
> > +       return pfn_section_valid(ms, pfn);
> >  }
> >  #endif
> >
> > @@ -1349,6 +1375,7 @@ void sparse_init(void);
> >  #define sparse_init()  do {} while (0)
> >  #define sparse_index_init(_sec, _nid)  do {} while (0)
> >  #define pfn_present pfn_valid
> > +#define section_active_init(_pfn, _nr_pages) do {} while (0)
> >  #endif /* CONFIG_SPARSEMEM */
> >
> >  /*
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index f671401a7c0b..c9ad28a78018 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -7273,10 +7273,12 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
> >
> >         /* Print out the early node map */
> >         pr_info("Early memory node ranges\n");
> > -       for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
> > +       for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
> >                 pr_info("  node %3d: [mem %#018Lx-%#018Lx]\n", nid,
> >                         (u64)start_pfn << PAGE_SHIFT,
> >                         ((u64)end_pfn << PAGE_SHIFT) - 1);
> > +               section_active_init(start_pfn, end_pfn - start_pfn);
> > +       }
> >
> >         /* Initialise every node */
> >         mminit_verify_pageflags_layout();
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index f87de7ad32c8..5ef2f884c4e1 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -210,6 +210,54 @@ static inline unsigned long first_present_section_nr(void)
> >         return next_present_section_nr(-1);
> >  }
> >
> > +static unsigned long section_active_mask(unsigned long pfn,
> > +               unsigned long nr_pages)
> > +{
> > +       int idx_start, idx_size;
> > +       phys_addr_t start, size;
> > +
> > +       if (!nr_pages)
> > +               return 0;
> > +
> > +       start = PFN_PHYS(pfn);
> > +       size = PFN_PHYS(min(nr_pages, PAGES_PER_SECTION
> > +                               - (pfn & ~PAGE_SECTION_MASK)));
> > +       size = ALIGN(size, SECTION_ACTIVE_SIZE);
> > +
> > +       idx_start = section_active_index(start);
> > +       idx_size = section_active_index(size);
> > +
> > +       if (idx_size == 0)
> > +               return -1;
> > +       return ((1UL << idx_size) - 1) << idx_start;
> > +}
> > +
> > +void section_active_init(unsigned long pfn, unsigned long nr_pages)
> > +{
> > +       int end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
> > +       int i, start_sec = pfn_to_section_nr(pfn);
> > +
> > +       if (!nr_pages)
> > +               return;
> > +
> > +       for (i = start_sec; i <= end_sec; i++) {
> > +               struct mem_section *ms;
> > +               unsigned long mask;
> > +               unsigned long pfns;
> > +
> > +               pfns = min(nr_pages, PAGES_PER_SECTION
> > +                               - (pfn & ~PAGE_SECTION_MASK));
> > +               mask = section_active_mask(pfn, pfns);
> > +
> > +               ms = __nr_to_section(i);
> > +               pr_debug("%s: sec: %d mask: %#018lx\n", __func__, i, mask);
> > +               ms->usage->map_active = mask;
> > +
> > +               pfn += pfns;
> > +               nr_pages -= pfns;
> > +       }
> > +}
>
> For some reason the above code is confusing to me. It seems all the
> code is supposed to do is set all map_active to -1, and trim the first
> and last sections (can be the same section of course). So, I would
> replace the above two functions with one function like this:
>
> void section_active_init(unsigned long pfn, unsigned long nr_pages)
> {
>         int end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
>         int i, idx, start_sec = pfn_to_section_nr(pfn);
>         struct mem_section *ms;
>
>         if (!nr_pages)
>                 return;
>
>         for (i = start_sec; i <= end_sec; i++) {
>                 ms = __nr_to_section(i);
>                 ms->usage->map_active = ~0ul;
>         }
>
>         /* Might need to trim active pfns from the beginning and end */
>         idx = section_active_index(PFN_PHYS(pfn));
>         ms = __nr_to_section(start_sec);
>         ms->usage->map_active &= (~0ul << idx);
>
>         idx = section_active_index(PFN_PHYS(pfn + nr_pages -1));
>         ms = __nr_to_section(end_sec);
>         ms->usage->map_active &= (~0ul >> (BITS_PER_LONG - idx - 1));
> }

I like the cleanup, but one of the fixes in v7 resulted in the
realization that a given section may be populated twice at init time.
For example, enabling that pr_debug() yields:

    section_active_init: sec: 12 mask: 0x00000003ffffffff
    section_active_init: sec: 12 mask: 0xe000000000000000

So, the implementation can't blindly clear bits based on the current
parameters. However, I'm switching this code over to use bitmap_*()
helpers, which should help with readability.
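
As an illustration of that direction, a sketch of how the init helper might
mark sub-sections with bitmap_set() instead of assigning a pre-computed mask.
This is not the actual v7 code; it just reuses the helpers quoted above:

void section_active_init(unsigned long pfn, unsigned long nr_pages)
{
	while (nr_pages) {
		unsigned long section_nr = pfn_to_section_nr(pfn);
		unsigned long pfns = min(nr_pages, PAGES_PER_SECTION
				- (pfn & ~PAGE_SECTION_MASK));
		struct mem_section *ms = __nr_to_section(section_nr);
		int start = section_active_index(PFN_PHYS(pfn));
		int end = section_active_index(PFN_PHYS(pfn + pfns - 1));

		/*
		 * Set bits rather than overwrite the whole word, so a
		 * section that is populated twice at init time keeps
		 * both ranges.
		 */
		bitmap_set(&ms->usage->map_active, start, end - start + 1);

		pfn += pfns;
		nr_pages -= pfns;
	}
}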

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 03/12] mm/sparsemem: Add helpers track active portions of a section at boot
@ 2019-05-04 19:26       ` Dan Williams
  0 siblings, 0 replies; 111+ messages in thread
From: Dan Williams @ 2019-05-04 19:26 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Andrew Morton, Michal Hocko, Vlastimil Babka, Logan Gunthorpe,
	linux-mm, linux-nvdimm, LKML, David Hildenbrand

On Thu, May 2, 2019 at 9:12 AM Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
>
> On Wed, Apr 17, 2019 at 2:53 PM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
> > section active bitmask, each bit representing 2MB (SECTION_SIZE (128M) /
> > map_active bitmask length (64)). If it turns out that 2MB is too large
> > of an active tracking granularity it is trivial to increase the size of
> > the map_active bitmap.
>
> Please mention that 2M on Intel, and 16M on Arm64.
>
> >
> > The implication of a partially populated section is that pfn_valid()
> > needs to go beyond a valid_section() check and read the sub-section
> > active ranges from the bitmask.
> >
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Logan Gunthorpe <logang@deltatee.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  include/linux/mmzone.h |   29 ++++++++++++++++++++++++++++-
> >  mm/page_alloc.c        |    4 +++-
> >  mm/sparse.c            |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 79 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index 6726fc175b51..cffde898e345 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -1175,6 +1175,8 @@ struct mem_section_usage {
> >         unsigned long pageblock_flags[0];
> >  };
> >
> > +void section_active_init(unsigned long pfn, unsigned long nr_pages);
> > +
> >  struct page;
> >  struct page_ext;
> >  struct mem_section {
> > @@ -1312,12 +1314,36 @@ static inline struct mem_section *__pfn_to_section(unsigned long pfn)
> >
> >  extern int __highest_present_section_nr;
> >
> > +static inline int section_active_index(phys_addr_t phys)
> > +{
> > +       return (phys & ~(PA_SECTION_MASK)) / SECTION_ACTIVE_SIZE;
>
> How about also defining SECTION_ACTIVE_SHIFT like this:
>
> /* BITS_PER_LONG = 2^6 */
> #define BITS_PER_LONG_SHIFT 6
> #define SECTION_ACTIVE_SHIFT (SECTION_SIZE_BITS - BITS_PER_LONG_SHIFT)
> #define SECTION_ACTIVE_SIZE (1 << SECTION_ACTIVE_SHIFT)
>
> The return above would become:
> return (phys & ~(PA_SECTION_MASK)) >> SECTION_ACTIVE_SHIFT;
>
> > +}
> > +
> > +#ifdef CONFIG_SPARSEMEM_VMEMMAP
> > +static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
> > +{
> > +       int idx = section_active_index(PFN_PHYS(pfn));
> > +
> > +       return !!(ms->usage->map_active & (1UL << idx));
> > +}
> > +#else
> > +static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
> > +{
> > +       return 1;
> > +}
> > +#endif
> > +
> >  #ifndef CONFIG_HAVE_ARCH_PFN_VALID
> >  static inline int pfn_valid(unsigned long pfn)
> >  {
> > +       struct mem_section *ms;
> > +
> >         if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
> >                 return 0;
> > -       return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
> > +       ms = __nr_to_section(pfn_to_section_nr(pfn));
> > +       if (!valid_section(ms))
> > +               return 0;
> > +       return pfn_section_valid(ms, pfn);
> >  }
> >  #endif
> >
> > @@ -1349,6 +1375,7 @@ void sparse_init(void);
> >  #define sparse_init()  do {} while (0)
> >  #define sparse_index_init(_sec, _nid)  do {} while (0)
> >  #define pfn_present pfn_valid
> > +#define section_active_init(_pfn, _nr_pages) do {} while (0)
> >  #endif /* CONFIG_SPARSEMEM */
> >
> >  /*
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index f671401a7c0b..c9ad28a78018 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -7273,10 +7273,12 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
> >
> >         /* Print out the early node map */
> >         pr_info("Early memory node ranges\n");
> > -       for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
> > +       for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
> >                 pr_info("  node %3d: [mem %#018Lx-%#018Lx]\n", nid,
> >                         (u64)start_pfn << PAGE_SHIFT,
> >                         ((u64)end_pfn << PAGE_SHIFT) - 1);
> > +               section_active_init(start_pfn, end_pfn - start_pfn);
> > +       }
> >
> >         /* Initialise every node */
> >         mminit_verify_pageflags_layout();
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index f87de7ad32c8..5ef2f884c4e1 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -210,6 +210,54 @@ static inline unsigned long first_present_section_nr(void)
> >         return next_present_section_nr(-1);
> >  }
> >
> > +static unsigned long section_active_mask(unsigned long pfn,
> > +               unsigned long nr_pages)
> > +{
> > +       int idx_start, idx_size;
> > +       phys_addr_t start, size;
> > +
> > +       if (!nr_pages)
> > +               return 0;
> > +
> > +       start = PFN_PHYS(pfn);
> > +       size = PFN_PHYS(min(nr_pages, PAGES_PER_SECTION
> > +                               - (pfn & ~PAGE_SECTION_MASK)));
> > +       size = ALIGN(size, SECTION_ACTIVE_SIZE);
> > +
> > +       idx_start = section_active_index(start);
> > +       idx_size = section_active_index(size);
> > +
> > +       if (idx_size == 0)
> > +               return -1;
> > +       return ((1UL << idx_size) - 1) << idx_start;
> > +}
> > +
> > +void section_active_init(unsigned long pfn, unsigned long nr_pages)
> > +{
> > +       int end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
> > +       int i, start_sec = pfn_to_section_nr(pfn);
> > +
> > +       if (!nr_pages)
> > +               return;
> > +
> > +       for (i = start_sec; i <= end_sec; i++) {
> > +               struct mem_section *ms;
> > +               unsigned long mask;
> > +               unsigned long pfns;
> > +
> > +               pfns = min(nr_pages, PAGES_PER_SECTION
> > +                               - (pfn & ~PAGE_SECTION_MASK));
> > +               mask = section_active_mask(pfn, pfns);
> > +
> > +               ms = __nr_to_section(i);
> > +               pr_debug("%s: sec: %d mask: %#018lx\n", __func__, i, mask);
> > +               ms->usage->map_active = mask;
> > +
> > +               pfn += pfns;
> > +               nr_pages -= pfns;
> > +       }
> > +}
>
> For some reasons the above code is confusing to me. It seems all the
> code supposed to do is set all map_active to -1, and trim the first
> and last sections (can be the same section of course). So, I would
> replace the above two functions with one function like this:
>
> void section_active_init(unsigned long pfn, unsigned long nr_pages)
> {
>         int end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
>         int i, idx, start_sec = pfn_to_section_nr(pfn);
>         struct mem_section *ms;
>
>         if (!nr_pages)
>                 return;
>
>         for (i = start_sec; i <= end_sec; i++) {
>                 ms = __nr_to_section(i);
>                 ms->usage->map_active = ~0ul;
>         }
>
>         /* Might need to trim active pfns from the beginning and end */
>         idx = section_active_index(PFN_PHYS(pfn));
>         ms = __nr_to_section(start_sec);
>         ms->usage->map_active &= (~0ul << idx);
>
>         idx = section_active_index(PFN_PHYS(pfn + nr_pages -1));
>         ms = __nr_to_section(end_sec);
>         ms->usage->map_active &= (~0ul >> (BITS_PER_LONG - idx - 1));
> }

I like the cleanup, but one of the fixes in v7 resulted in the
realization that a given section may be populated twice at init time.
For example, enabling that pr_debug() yields:

    section_active_init: sec: 12 mask: 0x00000003ffffffff
    section_active_init: sec: 12 mask: 0xe000000000000000

So, the implementation can't blindly clear bits based on the current
parameters. However, I'm switching this code over to use bitmap_*()
helpers which should help with the readability.


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v6 03/12] mm/sparsemem: Add helpers track active portions of a section at boot
  2019-05-04 19:26       ` Dan Williams
@ 2019-05-04 19:40         ` Pavel Tatashin
  -1 siblings, 0 replies; 111+ messages in thread
From: Pavel Tatashin @ 2019-05-04 19:40 UTC (permalink / raw)
  To: Dan Williams
  Cc: mhocko, linux-nvdimm, David Hildenbrand, LKML, linux-mm, akpm,
	Vlastimil Babka

On Sat, May 4, 2019, 3:26 PM Dan Williams <dan.j.williams@intel.com> wrote:

> On Thu, May 2, 2019 at 9:12 AM Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
> >
> > On Wed, Apr 17, 2019 at 2:53 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > >
> > > Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
> > > section active bitmask, each bit representing 2MB (SECTION_SIZE (128M) /
> > > map_active bitmask length (64)). If it turns out that 2MB is too large
> > > of an active tracking granularity it is trivial to increase the size of
> > > the map_active bitmap.
> >
> > Please mention that this is 2M on Intel and 16M on Arm64.
> >
> > >
> > > The implication of a partially populated section is that pfn_valid()
> > > needs to go beyond a valid_section() check and read the sub-section
> > > active ranges from the bitmask.
> > >
> > > Cc: Michal Hocko <mhocko@suse.com>
> > > Cc: Vlastimil Babka <vbabka@suse.cz>
> > > Cc: Logan Gunthorpe <logang@deltatee.com>
> > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > > ---
> > >  include/linux/mmzone.h |   29 ++++++++++++++++++++++++++++-
> > >  mm/page_alloc.c        |    4 +++-
> > >  mm/sparse.c            |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
> > >  3 files changed, 79 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > > index 6726fc175b51..cffde898e345 100644
> > > --- a/include/linux/mmzone.h
> > > +++ b/include/linux/mmzone.h
> > > @@ -1175,6 +1175,8 @@ struct mem_section_usage {
> > >         unsigned long pageblock_flags[0];
> > >  };
> > >
> > > +void section_active_init(unsigned long pfn, unsigned long nr_pages);
> > > +
> > >  struct page;
> > >  struct page_ext;
> > >  struct mem_section {
> > > @@ -1312,12 +1314,36 @@ static inline struct mem_section *__pfn_to_section(unsigned long pfn)
> > >
> > >  extern int __highest_present_section_nr;
> > >
> > > +static inline int section_active_index(phys_addr_t phys)
> > > +{
> > > +       return (phys & ~(PA_SECTION_MASK)) / SECTION_ACTIVE_SIZE;
> >
> > How about also defining SECTION_ACTIVE_SHIFT like this:
> >
> > /* BITS_PER_LONG = 2^6 */
> > #define BITS_PER_LONG_SHIFT 6
> > #define SECTION_ACTIVE_SHIFT (SECTION_SIZE_BITS - BITS_PER_LONG_SHIFT)
> > #define SECTION_ACTIVE_SIZE (1 << SECTION_ACTIVE_SHIFT)
> >
> > The return above would become:
> > return (phys & ~(PA_SECTION_MASK)) >> SECTION_ACTIVE_SHIFT;
> >
> > > +}
> > > +
> > > +#ifdef CONFIG_SPARSEMEM_VMEMMAP
> > > +static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
> > > +{
> > > +       int idx = section_active_index(PFN_PHYS(pfn));
> > > +
> > > +       return !!(ms->usage->map_active & (1UL << idx));
> > > +}
> > > +#else
> > > +static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
> > > +{
> > > +       return 1;
> > > +}
> > > +#endif
> > > +
> > >  #ifndef CONFIG_HAVE_ARCH_PFN_VALID
> > >  static inline int pfn_valid(unsigned long pfn)
> > >  {
> > > +       struct mem_section *ms;
> > > +
> > >         if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
> > >                 return 0;
> > > -       return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
> > > +       ms = __nr_to_section(pfn_to_section_nr(pfn));
> > > +       if (!valid_section(ms))
> > > +               return 0;
> > > +       return pfn_section_valid(ms, pfn);
> > >  }
> > >  #endif
> > >
> > > @@ -1349,6 +1375,7 @@ void sparse_init(void);
> > >  #define sparse_init()  do {} while (0)
> > >  #define sparse_index_init(_sec, _nid)  do {} while (0)
> > >  #define pfn_present pfn_valid
> > > +#define section_active_init(_pfn, _nr_pages) do {} while (0)
> > >  #endif /* CONFIG_SPARSEMEM */
> > >
> > >  /*
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index f671401a7c0b..c9ad28a78018 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -7273,10 +7273,12 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
> > >
> > >         /* Print out the early node map */
> > >         pr_info("Early memory node ranges\n");
> > > -       for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
> > > +       for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
> > >                 pr_info("  node %3d: [mem %#018Lx-%#018Lx]\n", nid,
> > >                         (u64)start_pfn << PAGE_SHIFT,
> > >                         ((u64)end_pfn << PAGE_SHIFT) - 1);
> > > +               section_active_init(start_pfn, end_pfn - start_pfn);
> > > +       }
> > >
> > >         /* Initialise every node */
> > >         mminit_verify_pageflags_layout();
> > > diff --git a/mm/sparse.c b/mm/sparse.c
> > > index f87de7ad32c8..5ef2f884c4e1 100644
> > > --- a/mm/sparse.c
> > > +++ b/mm/sparse.c
> > > @@ -210,6 +210,54 @@ static inline unsigned long first_present_section_nr(void)
> > >         return next_present_section_nr(-1);
> > >  }
> > >
> > > +static unsigned long section_active_mask(unsigned long pfn,
> > > +               unsigned long nr_pages)
> > > +{
> > > +       int idx_start, idx_size;
> > > +       phys_addr_t start, size;
> > > +
> > > +       if (!nr_pages)
> > > +               return 0;
> > > +
> > > +       start = PFN_PHYS(pfn);
> > > +       size = PFN_PHYS(min(nr_pages, PAGES_PER_SECTION
> > > +                               - (pfn & ~PAGE_SECTION_MASK)));
> > > +       size = ALIGN(size, SECTION_ACTIVE_SIZE);
> > > +
> > > +       idx_start = section_active_index(start);
> > > +       idx_size = section_active_index(size);
> > > +
> > > +       if (idx_size == 0)
> > > +               return -1;
> > > +       return ((1UL << idx_size) - 1) << idx_start;
> > > +}
> > > +
> > > +void section_active_init(unsigned long pfn, unsigned long nr_pages)
> > > +{
> > > +       int end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
> > > +       int i, start_sec = pfn_to_section_nr(pfn);
> > > +
> > > +       if (!nr_pages)
> > > +               return;
> > > +
> > > +       for (i = start_sec; i <= end_sec; i++) {
> > > +               struct mem_section *ms;
> > > +               unsigned long mask;
> > > +               unsigned long pfns;
> > > +
> > > +               pfns = min(nr_pages, PAGES_PER_SECTION
> > > +                               - (pfn & ~PAGE_SECTION_MASK));
> > > +               mask = section_active_mask(pfn, pfns);
> > > +
> > > +               ms = __nr_to_section(i);
> > > +               pr_debug("%s: sec: %d mask: %#018lx\n", __func__, i, mask);
> > > +               ms->usage->map_active = mask;
> > > +
> > > +               pfn += pfns;
> > > +               nr_pages -= pfns;
> > > +       }
> > > +}
> >
> > For some reason the above code is confusing to me. It seems all the
> > code is supposed to do is set every map_active to -1, and trim the
> > first and last sections (which can be the same section). So, I would
> > replace the above two functions with one function like this:
> >
> > void section_active_init(unsigned long pfn, unsigned long nr_pages)
> > {
> >         int end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
> >         int i, idx, start_sec = pfn_to_section_nr(pfn);
> >         struct mem_section *ms;
> >
> >         if (!nr_pages)
> >                 return;
> >
> >         for (i = start_sec; i <= end_sec; i++) {
> >                 ms = __nr_to_section(i);
> >                 ms->usage->map_active = ~0ul;
> >         }
> >
> >         /* Might need to trim active pfns from the beginning and end */
> >         idx = section_active_index(PFN_PHYS(pfn));
> >         ms = __nr_to_section(start_sec);
> >         ms->usage->map_active &= (~0ul << idx);
> >
> >         idx = section_active_index(PFN_PHYS(pfn + nr_pages -1));
> >         ms = __nr_to_section(end_sec);
> >         ms->usage->map_active &= (~0ul >> (BITS_PER_LONG - idx - 1));
> > }
>
> I like the cleanup, but one of the fixes in v7 resulted in the
> realization that a given section may be populated twice at init time.
> For example, enabling that pr_debug() yields:
>
>     section_active_init: sec: 12 mask: 0x00000003ffffffff
>     section_active_init: sec: 12 mask: 0xe000000000000000
>
> So the implementation can't blindly clear bits based only on the
> parameters of the current call. However, I'm switching this code over
> to the bitmap_*() helpers, which should help with readability.
>

Yes, the bitmap_*() helpers will help, and I assume you will also make
the map_active size scalable?
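
For example, something like this (just a sketch; SUB_SECTIONS_PER_SECTION
is a made-up name here for SECTION_SIZE / SECTION_ACTIVE_SIZE):

struct mem_section_usage {
        DECLARE_BITMAP(map_active, SUB_SECTIONS_PER_SECTION);
        unsigned long pageblock_flags[0];
};

static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
{
        int idx = section_active_index(PFN_PHYS(pfn));

        return test_bit(idx, ms->usage->map_active);
}

That way map_active can grow naturally if the sub-section granularity
ever needs to change.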

I will take another look at version 8.


Thank you,
Pasha

^ permalink raw reply	[flat|nested] 111+ messages in thread

end of thread, other threads:[~2019-05-04 19:40 UTC | newest]

Thread overview: 111+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-17 18:38 [PATCH v6 00/12] mm: Sub-section memory hotplug support Dan Williams
2019-04-17 18:38 ` Dan Williams
2019-04-17 18:39 ` [PATCH v6 01/12] mm/sparsemem: Introduce struct mem_section_usage Dan Williams
2019-04-17 18:39   ` Dan Williams
2019-05-01 23:25   ` Pavel Tatashin
2019-05-01 23:25     ` Pavel Tatashin
2019-05-02  6:07     ` Dan Williams
2019-05-02  6:07       ` Dan Williams
2019-05-02 14:16       ` Pavel Tatashin
2019-05-02 14:16         ` Pavel Tatashin
2019-05-04  0:22       ` Dan Williams
2019-05-04  0:22         ` Dan Williams
2019-05-04  0:22         ` Dan Williams
2019-05-04 15:55         ` Pavel Tatashin
2019-05-04 15:55           ` Pavel Tatashin
2019-05-04 15:55           ` Pavel Tatashin
2019-04-17 18:39 ` [PATCH v6 02/12] mm/sparsemem: Introduce common definitions for the size and mask of a section Dan Williams
2019-04-17 18:39   ` Dan Williams
2019-05-02 14:53   ` Pavel Tatashin
2019-05-03  0:41     ` Dan Williams
2019-05-03  0:41       ` Dan Williams
2019-05-03 10:35       ` Robin Murphy
2019-05-03 10:35         ` Robin Murphy
2019-05-03 12:57         ` Pavel Tatashin
2019-05-03 13:00           ` Oscar Salvador
2019-05-03 13:00             ` Oscar Salvador
2019-04-17 18:39 ` [PATCH v6 03/12] mm/sparsemem: Add helpers track active portions of a section at boot Dan Williams
2019-04-17 18:39   ` Dan Williams
2019-04-25 14:33   ` Oscar Salvador
2019-04-25 14:43   ` Oscar Salvador
2019-04-25 14:43     ` Oscar Salvador
2019-04-25 14:43     ` Oscar Salvador
2019-04-26 12:57   ` Oscar Salvador
2019-04-26 12:57     ` Oscar Salvador
2019-05-02 16:12   ` Pavel Tatashin
2019-05-02 16:12     ` Pavel Tatashin
2019-05-02 16:12     ` Pavel Tatashin
2019-05-04 19:26     ` Dan Williams
2019-05-04 19:26       ` Dan Williams
2019-05-04 19:26       ` Dan Williams
2019-05-04 19:40       ` Pavel Tatashin
2019-05-04 19:40         ` Pavel Tatashin
2019-04-17 18:39 ` [PATCH v6 04/12] mm/hotplug: Prepare shrink_{zone, pgdat}_span for sub-section removal Dan Williams
2019-04-17 18:39   ` Dan Williams
2019-04-19 23:09   ` Ralph Campbell
2019-04-19 23:09     ` Ralph Campbell
2019-04-19 23:13     ` Dan Williams
2019-04-19 23:13       ` Dan Williams
2019-04-19 23:13       ` Dan Williams
2019-04-26 13:59   ` Oscar Salvador
2019-04-26 14:00     ` Oscar Salvador
2019-05-02 19:18   ` Pavel Tatashin
2019-05-02 19:18     ` Pavel Tatashin
2019-05-02 19:18     ` Pavel Tatashin
2019-04-17 18:39 ` [PATCH v6 05/12] mm/sparsemem: Convert kmalloc_section_memmap() to populate_section_memmap() Dan Williams
2019-04-17 18:39   ` Dan Williams
2019-05-02 19:28   ` Pavel Tatashin
2019-05-02 19:28     ` Pavel Tatashin
2019-04-17 18:39 ` [PATCH v6 06/12] mm/hotplug: Add mem-hotplug restrictions for remove_memory() Dan Williams
2019-04-17 18:39   ` Dan Williams
2019-04-23 21:21   ` David Hildenbrand
2019-04-23 21:21     ` David Hildenbrand
2019-04-24 18:07     ` Dan Williams
2019-04-24 18:07       ` Dan Williams
2019-04-17 18:39 ` [PATCH v6 07/12] mm: Kill is_dev_zone() helper Dan Williams
2019-04-17 18:39   ` Dan Williams
2019-04-17 20:17   ` David Hildenbrand
2019-04-17 20:17     ` David Hildenbrand
2019-04-26 14:04   ` Oscar Salvador
2019-04-26 14:04     ` Oscar Salvador
2019-05-02 20:37   ` Pavel Tatashin
2019-05-02 20:37     ` Pavel Tatashin
2019-05-02 20:37     ` Pavel Tatashin
2019-04-17 18:39 ` [PATCH v6 08/12] mm/sparsemem: Prepare for sub-section ranges Dan Williams
2019-04-17 18:39   ` Dan Williams
2019-05-02 21:25   ` Pavel Tatashin
2019-05-02 21:25     ` Pavel Tatashin
2019-05-02 21:25     ` Pavel Tatashin
2019-04-17 18:39 ` [PATCH v6 09/12] mm/sparsemem: Support sub-section hotplug Dan Williams
2019-04-17 18:39   ` Dan Williams
2019-04-17 18:39 ` [PATCH v6 10/12] mm/devm_memremap_pages: Enable sub-section remap Dan Williams
2019-04-17 18:39   ` Dan Williams
2019-04-17 18:39 ` [PATCH v6 11/12] libnvdimm/pfn: Fix fsdax-mode namespace info-block zero-fields Dan Williams
2019-04-17 18:39   ` Dan Williams
2019-04-17 22:02   ` Andrew Morton
2019-04-17 22:02     ` Andrew Morton
2019-04-17 22:09     ` Dan Williams
2019-04-17 22:09       ` Dan Williams
2019-04-17 18:39 ` [PATCH v6 12/12] libnvdimm/pfn: Stop padding pmem namespaces to section alignment Dan Williams
2019-04-17 18:39   ` Dan Williams
2019-04-17 22:03 ` [PATCH v6 00/12] mm: Sub-section memory hotplug support Andrew Morton
2019-04-17 22:03   ` Andrew Morton
2019-04-17 22:59   ` Dan Williams
2019-04-17 22:59     ` Dan Williams
2019-04-18  2:09     ` Dan Williams
2019-04-18  2:09       ` Dan Williams
2019-04-18 12:45       ` Jeff Moyer
2019-04-18 12:45         ` Jeff Moyer
2019-04-19  3:25         ` Dan Williams
2019-04-19  3:25           ` Dan Williams
2019-04-23 13:16     ` Oscar Salvador
2019-04-23 13:16       ` Oscar Salvador
2019-04-24 20:43       ` Pavel Tatashin
2019-04-24 20:43         ` Pavel Tatashin
2019-05-02 22:46 ` Pavel Tatashin
2019-05-02 23:20   ` Dan Williams
2019-05-02 23:20     ` Dan Williams
2019-05-02 23:21     ` Dan Williams
2019-05-02 23:21       ` Dan Williams
2019-05-03 10:48     ` Oscar Salvador
2019-05-03 10:48       ` Oscar Salvador
