linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/5] Allocate memmap from hotadded memory
@ 2019-07-25 16:02 Oscar Salvador
  2019-07-25 16:02 ` [PATCH v3 1/5] mm,memory_hotplug: Introduce MHP_MEMMAP_ON_MEMORY Oscar Salvador
                   ` (7 more replies)
  0 siblings, 8 replies; 22+ messages in thread
From: Oscar Salvador @ 2019-07-25 16:02 UTC (permalink / raw)
  To: akpm
  Cc: dan.j.williams, david, pasha.tatashin, mhocko, anshuman.khandual,
	Jonathan.Cameron, vbabka, linux-mm, linux-kernel, Oscar Salvador

Here we go with v3.

v2 -> v3:
        * Rewrite of the vmemmap pages handling.
          Prior to this version, I was (ab)using hugepages fields
          from struct page, while here I am officially adding a new
          sub-page type with the fields I need.

        * Drop MHP_MEMMAP_{MEMBLOCK,DEVICE} in favor of MHP_MEMMAP_ON_MEMORY.
          While I am still not 100% sure this is the right decision, and while I
          still see some gain in having MHP_MEMMAP_{MEMBLOCK,DEVICE},
          having only one flag simplifies the code.
          If the user wants to allocate memmaps per memblock, it'll
          have to call add_memory() variants with memory-block granularity.

          If a clearer use case for a MHP_MEMMAP_MEMBLOCK flag shows up in the
          future, so that the user does not have to care about how it calls the
          add_memory() variants and only has to pass a flag, we can add it.
          Actually, I already had the code, so adding it in the future is going
          to be easy.

        * Granularity check when hot-removing memory.
          Just checking that the granularity is the same.

[Testing]

 - x86_64: small and large memblocks (128MB, 1G and 2G)

So far, only acpi memory hotplug uses the new flag.
The other callers can be changed depending on their needs.

[Coverletter]

This is another step to make memory hotplug more usable. The primary
goal of this patchset is to reduce the memory overhead of hot-added
memory (at least for the SPARSEMEM_VMEMMAP memory model). The way we
currently populate the memmap (struct page array) has two main drawbacks:

a) it consumes additional memory until the hotadded memory itself is
   onlined and
b) the memmap might end up on a different numa node, which is especially
   true for the movable_node configuration.

a) is a problem especially for memory hotplug based memory "ballooning"
   solutions, where the delay between physical memory hotplug and
   onlining can lead to OOM; that led to the introduction of hacks like
   auto onlining (see 31bc3858ea3e ("memory-hotplug: add automatic onlining
   policy for the newly added memory")).

b) can have performance drawbacks.

One way to mitigate all these issues is to simply allocate the memmap array
(which is the largest memory footprint of the physical memory hotplug)
from the hot-added memory itself. The SPARSEMEM_VMEMMAP memory model allows
us to map any pfn range, so the memory doesn't need to be online to be
usable for the array. See patch 4 for more details.
This feature is only usable when CONFIG_SPARSEMEM_VMEMMAP is set.
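
To put a number on that footprint, here is a small userspace sketch (not
part of the series) that computes how many base pages the memmap of a
hot-added range needs. The 4KB page size and the 64-byte struct page are
assumptions that match a typical x86_64 configuration; the EX_/ex_ names
are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for a typical x86_64 config; not taken from the patches. */
#define EX_PAGE_SIZE        4096ULL /* 4KB base pages */
#define EX_STRUCT_PAGE_SIZE 64ULL   /* assumed sizeof(struct page) */

/*
 * Number of base pages needed to hold the memmap (struct page array)
 * describing a physical range of range_bytes bytes.
 */
static uint64_t ex_memmap_pages(uint64_t range_bytes)
{
	uint64_t nr_pages, memmap_bytes;

	/* The range is expected to be page aligned. */
	assert(range_bytes % EX_PAGE_SIZE == 0);

	nr_pages = range_bytes / EX_PAGE_SIZE;
	memmap_bytes = nr_pages * EX_STRUCT_PAGE_SIZE;

	/* Round up to whole pages. */
	return (memmap_bytes + EX_PAGE_SIZE - 1) / EX_PAGE_SIZE;
}
```

Under those assumptions a 128MB section needs 512 pages (2MB) of memmap,
and a 1GB range needs 4096 pages (16MB).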

[Overall design]:

Implementation-wise, we reuse the vmem_altmap infrastructure to override
the default allocator used by vmemmap_populate. Once the memmap is
allocated, we need a way to mark the altmap pfns used for the allocation.
If the MHP_MEMMAP_ON_MEMORY flag was passed, we set up the layout of the
altmap structure at the beginning of __add_pages(), and then we call
mhp_mark_vmemmap_pages().

The MHP_MEMMAP_ON_MEMORY flag tells the add_memory() variants to allocate
the memmaps from the hot-added range.
If a caller wants memmaps to be allocated per memory block, it will
have to call add_memory() variants in memory-block granularity
spanning the whole range, while if it wants to allocate memmaps
per whole memory range, just one call will do.

E.g., to add 384MB (3 sections, 3 memory-blocks):

add_memory(0x1000, size_memory_block);
add_memory(0x2000, size_memory_block);
add_memory(0x3000, size_memory_block);

or

add_memory(0x1000, size_memory_block * 3);

One thing worth mentioning is that vmemmap pages residing in movable memory
are not a show-stopper for that memory to be offlined/migrated away.
Vmemmap pages are just ignored in that case, and they stick around until the
sections referred to by those vmemmap pages are hot-removed.

Oscar Salvador (5):
  mm,memory_hotplug: Introduce MHP_MEMMAP_ON_MEMORY
  mm: Introduce a new Vmemmap page-type
  mm,sparse: Add SECTION_USE_VMEMMAP flag
  mm,memory_hotplug: Allocate memmap from the added memory range for
    sparse-vmemmap
  mm,memory_hotplug: Allow userspace to enable/disable vmemmap

 arch/powerpc/mm/init_64.c      |   7 ++
 arch/s390/mm/init.c            |   6 ++
 arch/x86/mm/init_64.c          |  10 +++
 drivers/acpi/acpi_memhotplug.c |   3 +-
 drivers/base/memory.c          |  35 +++++++++-
 drivers/dax/kmem.c             |   2 +-
 drivers/hv/hv_balloon.c        |   2 +-
 drivers/s390/char/sclp_cmd.c   |   2 +-
 drivers/xen/balloon.c          |   2 +-
 include/linux/memory_hotplug.h |  37 ++++++++--
 include/linux/memremap.h       |   2 +-
 include/linux/mm.h             |  17 +++++
 include/linux/mm_types.h       |   5 ++
 include/linux/mmzone.h         |   8 ++-
 include/linux/page-flags.h     |  19 +++++
 mm/compaction.c                |   7 ++
 mm/memory_hotplug.c            | 153 +++++++++++++++++++++++++++++++++++++----
 mm/page_alloc.c                |  26 ++++++-
 mm/page_isolation.c            |  14 +++-
 mm/sparse.c                    | 116 ++++++++++++++++++++++++++++++-
 20 files changed, 441 insertions(+), 32 deletions(-)

-- 
2.12.3



* [PATCH v3 1/5] mm,memory_hotplug: Introduce MHP_MEMMAP_ON_MEMORY
  2019-07-25 16:02 [PATCH v3 0/5] Allocate memmap from hotadded memory Oscar Salvador
@ 2019-07-25 16:02 ` Oscar Salvador
  2019-07-26  8:34   ` David Hildenbrand
  2019-07-25 16:02 ` [PATCH v3 2/5] mm: Introduce a new Vmemmap page-type Oscar Salvador
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Oscar Salvador @ 2019-07-25 16:02 UTC (permalink / raw)
  To: akpm
  Cc: dan.j.williams, david, pasha.tatashin, mhocko, anshuman.khandual,
	Jonathan.Cameron, vbabka, linux-mm, linux-kernel, Oscar Salvador

This patch introduces the MHP_MEMMAP_ON_MEMORY flag,
and prepares the functions that add memory to take a "flags" parameter.
This "flags" parameter will be evaluated later on, in Patch#4,
to initialize the mhp_restrictions struct.

The functions are:

add_memory
__add_memory
add_memory_resource

Unfortunately, we do not have a single entry point to add memory: depending
on their requirements, callers hook in at different places
(e.g. Xen's reserve_additional_memory()), so we have to add the parameter
to all three functions.

The MHP_MEMMAP_ON_MEMORY flag tells the add_memory() variants to allocate
the memmaps from the hot-added range.
If a caller wants memmaps to be allocated per memory block, it will
have to call add_memory() variants in memory-block granularity
spanning the whole range, while if it wants to allocate memmaps
per whole memory range, just one call will do.

E.g., to add 384MB (3 sections, 3 memory-blocks):

	add_memory(0x1000, size_memory_block);
	add_memory(0x2000, size_memory_block);
	add_memory(0x3000, size_memory_block);

	[memblock#0  ]
	[0 - 511 pfns      ] - vmemmaps for section#0
	[512 - 32767 pfns  ] - normal memory

	[memblock#1 ]
	[32768 - 33279 pfns] - vmemmaps for section#1
	[33280 - 65535 pfns] - normal memory

	[memblock#2 ]
	[65536 - 66047 pfns] - vmemmaps for section#2
	[66048 - 98303 pfns] - normal memory

or
	add_memory(0x1000, size_memory_block * 3);

	[memblock#0  ]
	[0 - 1535 pfns     ] - vmemmaps for section#{0-2}
	[1536 - 98303 pfns ] - normal memory

When using larger memory blocks (1GB or 2GB), the principle is the same.

Of course, whole-range granularity is nicer when it comes to having a large
contiguous area, while memory-block granularity gives us flexibility
when removing the memory.
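
The layouts above can be reproduced with a little arithmetic. The following
userspace sketch is only an illustration, not code from this series: it
assumes 128MB memory blocks made of 4KB pages (32768 pfns per block) and
512 vmemmap pages per block, and the ex_ names are hypothetical.

```c
#include <assert.h>

/* Assumed x86_64 values, chosen to match the example above. */
#define EX_PAGES_PER_BLOCK   32768UL /* 128MB block / 4KB pages */
#define EX_VMEMMAP_PER_BLOCK 512UL   /* 32768 struct pages * 64 bytes / 4KB */

struct ex_layout {
	unsigned long vmemmap_start, vmemmap_end; /* inclusive pfns */
	unsigned long normal_start, normal_end;   /* inclusive pfns */
};

/*
 * Layout produced by a single add_memory() call that covers nr_blocks
 * memory blocks starting at start_pfn: the memmap for the whole call sits
 * at the beginning of the range, the rest is normal memory.
 */
static struct ex_layout ex_range_layout(unsigned long start_pfn,
					unsigned long nr_blocks)
{
	struct ex_layout l;
	unsigned long nr_vmemmap = nr_blocks * EX_VMEMMAP_PER_BLOCK;

	assert(nr_blocks > 0);

	l.vmemmap_start = start_pfn;
	l.vmemmap_end = start_pfn + nr_vmemmap - 1;
	l.normal_start = l.vmemmap_end + 1;
	l.normal_end = start_pfn + nr_blocks * EX_PAGES_PER_BLOCK - 1;
	return l;
}
```

Calling ex_range_layout() once per block gives the per-memblock layout,
while one call with nr_blocks = 3 gives the whole-range layout, with all
1536 vmemmap pages grouped at the start of the range.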

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 drivers/acpi/acpi_memhotplug.c |  2 +-
 drivers/base/memory.c          |  2 +-
 drivers/dax/kmem.c             |  2 +-
 drivers/hv/hv_balloon.c        |  2 +-
 drivers/s390/char/sclp_cmd.c   |  2 +-
 drivers/xen/balloon.c          |  2 +-
 include/linux/memory_hotplug.h | 25 ++++++++++++++++++++++---
 mm/memory_hotplug.c            | 10 +++++-----
 8 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index e294f44a7850..d91b3584d4b2 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -207,7 +207,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 		if (node < 0)
 			node = memory_add_physaddr_to_nid(info->start_addr);
 
-		result = __add_memory(node, info->start_addr, info->length);
+		result = __add_memory(node, info->start_addr, info->length, 0);
 
 		/*
 		 * If the memory block has been used by the kernel, add_memory()
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 154d5d4a0779..d30d0f6c8ad0 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -521,7 +521,7 @@ static ssize_t probe_store(struct device *dev, struct device_attribute *attr,
 
 	nid = memory_add_physaddr_to_nid(phys_addr);
 	ret = __add_memory(nid, phys_addr,
-			   MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+			   MIN_MEMORY_BLOCK_SIZE * sections_per_block, 0);
 
 	if (ret)
 		goto out;
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 3d0a7e702c94..e159184e0ba0 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -65,7 +65,7 @@ int dev_dax_kmem_probe(struct device *dev)
 	new_res->flags = IORESOURCE_SYSTEM_RAM;
 	new_res->name = dev_name(dev);
 
-	rc = add_memory(numa_node, new_res->start, resource_size(new_res));
+	rc = add_memory(numa_node, new_res->start, resource_size(new_res), 0);
 	if (rc) {
 		release_resource(new_res);
 		kfree(new_res);
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 6fb4ea5f0304..beb92bc56186 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -731,7 +731,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 
 		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
 		ret = add_memory(nid, PFN_PHYS((start_pfn)),
-				(HA_CHUNK << PAGE_SHIFT));
+				(HA_CHUNK << PAGE_SHIFT), 0);
 
 		if (ret) {
 			pr_err("hot_add memory failed error is %d\n", ret);
diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c
index 37d42de06079..f61026c7db7e 100644
--- a/drivers/s390/char/sclp_cmd.c
+++ b/drivers/s390/char/sclp_cmd.c
@@ -406,7 +406,7 @@ static void __init add_memory_merged(u16 rn)
 	if (!size)
 		goto skip_add;
 	for (addr = start; addr < start + size; addr += block_size)
-		add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size);
+		add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size, 0);
 skip_add:
 	first_rn = rn;
 	num = 1;
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 4e11de6cde81..e4934ce40478 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -349,7 +349,7 @@ static enum bp_state reserve_additional_memory(void)
 	mutex_unlock(&balloon_mutex);
 	/* add_memory_resource() requires the device_hotplug lock */
 	lock_device_hotplug();
-	rc = add_memory_resource(nid, resource);
+	rc = add_memory_resource(nid, resource, 0);
 	unlock_device_hotplug();
 	mutex_lock(&balloon_mutex);
 
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index f46ea71b4ffd..45dece922d7c 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -54,6 +54,25 @@ enum {
 };
 
 /*
+ * We want memmap (struct page array) to be allocated from the hotadded range.
+ * To do so, there are two possible ways depending on what the caller wants.
+ * 1) Allocate memmap pages per whole hot-added range.
+ *    Here, the caller will call an add_memory() variant once, with the
+ *    whole memory range.
+ * 2) Allocate memmap pages per memblock.
+ *    Here, the caller will call an add_memory() variant with memblock
+ *    granularity.
+ * The former implies that we will use the beginning of the hot-added range
+ * to store the memmap pages of the whole range, while the latter implies
+ * that we will use the beginning of each memblock to store its own memmap
+ * pages.
+ *
+ * Please note that this is only a hint, not a guarantee. Only selected
+ * architectures support it with SPARSEMEM_VMEMMAP.
+ */
+#define MHP_MEMMAP_ON_MEMORY	(1UL<<1)
+
+/*
  * Restrictions for the memory hotplug:
  * flags:  MHP_ flags
  * altmap: alternative allocator for memmap array
@@ -340,9 +359,9 @@ static inline void __remove_memory(int nid, u64 start, u64 size) {}
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
 extern void __ref free_area_init_core_hotplug(int nid);
-extern int __add_memory(int nid, u64 start, u64 size);
-extern int add_memory(int nid, u64 start, u64 size);
-extern int add_memory_resource(int nid, struct resource *resource);
+extern int __add_memory(int nid, u64 start, u64 size, unsigned long flags);
+extern int add_memory(int nid, u64 start, u64 size, unsigned long flags);
+extern int add_memory_resource(int nid, struct resource *resource, unsigned long flags);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap);
 extern bool is_memblock_offlined(struct memory_block *mem);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9a82e12bd0e7..3d97c3711333 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1046,7 +1046,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
  *
  * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
  */
-int __ref add_memory_resource(int nid, struct resource *res)
+int __ref add_memory_resource(int nid, struct resource *res, unsigned long flags)
 {
 	struct mhp_restrictions restrictions = {};
 	u64 start, size;
@@ -1123,7 +1123,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
 }
 
 /* requires device_hotplug_lock, see add_memory_resource() */
-int __ref __add_memory(int nid, u64 start, u64 size)
+int __ref __add_memory(int nid, u64 start, u64 size, unsigned long flags)
 {
 	struct resource *res;
 	int ret;
@@ -1132,18 +1132,18 @@ int __ref __add_memory(int nid, u64 start, u64 size)
 	if (IS_ERR(res))
 		return PTR_ERR(res);
 
-	ret = add_memory_resource(nid, res);
+	ret = add_memory_resource(nid, res, flags);
 	if (ret < 0)
 		release_memory_resource(res);
 	return ret;
 }
 
-int add_memory(int nid, u64 start, u64 size)
+int add_memory(int nid, u64 start, u64 size, unsigned long flags)
 {
 	int rc;
 
 	lock_device_hotplug();
-	rc = __add_memory(nid, start, size);
+	rc = __add_memory(nid, start, size, flags);
 	unlock_device_hotplug();
 
 	return rc;
-- 
2.12.3



* [PATCH v3 2/5] mm: Introduce a new Vmemmap page-type
  2019-07-25 16:02 [PATCH v3 0/5] Allocate memmap from hotadded memory Oscar Salvador
  2019-07-25 16:02 ` [PATCH v3 1/5] mm,memory_hotplug: Introduce MHP_MEMMAP_ON_MEMORY Oscar Salvador
@ 2019-07-25 16:02 ` Oscar Salvador
  2019-07-26  8:48   ` David Hildenbrand
  2019-07-25 16:02 ` [PATCH v3 3/5] mm,sparse: Add SECTION_USE_VMEMMAP flag Oscar Salvador
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Oscar Salvador @ 2019-07-25 16:02 UTC (permalink / raw)
  To: akpm
  Cc: dan.j.williams, david, pasha.tatashin, mhocko, anshuman.khandual,
	Jonathan.Cameron, vbabka, linux-mm, linux-kernel, Oscar Salvador

This patch introduces a new Vmemmap page-type.

It also introduces some functions to ease the handling of vmemmap pages:

- vmemmap_nr_sections: Returns the number of sections that use vmemmap.

- vmemmap_nr_pages: Returns the number of vmemmap pages that remain,
  counting from the given vmemmap page to the end of the vmemmap area.
  Useful for accounting and to know how much we have to skip in the case
  where vmemmap pages need to be ignored.

- vmemmap_head: Returns the vmemmap head page

- SetPageVmemmap: Sets Reserved flag bit, and sets page->type to Vmemmap.
  Setting the Reserved flag bit is just for extra protection; we do not
  actually expect anyone to use these pages for anything.

- ClearPageVmemmap: Clears Reserved flag bit and page->type.
  Only used when sections containing vmemmap pages are removed.

These functions will be used for the code handling Vmemmap pages.
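
To illustrate what the lookup helpers return, here is a userspace model of
them. It is a sketch under assumptions, not the kernel code: struct ex_page
is a simplified stand-in for struct page, ex_init_vmemmap() and
ex_remaining_from() are hypothetical helpers that do not exist in the
series, and the model assumes that the head page also points at itself.

```c
#include <assert.h>

/* Simplified stand-in for struct page; only the fields this patch adds. */
struct ex_page {
	struct ex_page *vmemmap_head;   /* assumed: set in every page, head included */
	unsigned long vmemmap_sections; /* meaningful in the head page only */
	unsigned long vmemmap_pages;    /* meaningful in the head page only */
};

static struct ex_page *ex_vmemmap_head(struct ex_page *page)
{
	return page->vmemmap_head;
}

static unsigned long ex_vmemmap_nr_pages(struct ex_page *page)
{
	struct ex_page *head = ex_vmemmap_head(page);

	/* Pages left from 'page' up to the end of the vmemmap area. */
	return head->vmemmap_pages - (unsigned long)(page - head);
}

/* Hypothetical initializer: mark nr pages as one vmemmap area. */
static void ex_init_vmemmap(struct ex_page *pages, unsigned long nr,
			    unsigned long sections)
{
	unsigned long i;

	assert(nr > 0);
	for (i = 0; i < nr; i++) {
		pages[i].vmemmap_head = &pages[0];
		pages[i].vmemmap_sections = 0;
		pages[i].vmemmap_pages = 0;
	}
	pages[0].vmemmap_sections = sections;
	pages[0].vmemmap_pages = nr;
}

#define EX_NR_VMEMMAP 8UL

/* Hypothetical demo: remaining pages when starting at index idx. */
static unsigned long ex_remaining_from(unsigned long idx)
{
	struct ex_page pages[EX_NR_VMEMMAP];

	assert(idx < EX_NR_VMEMMAP);
	ex_init_vmemmap(pages, EX_NR_VMEMMAP, 1);
	return ex_vmemmap_nr_pages(&pages[idx]);
}
```

A pfn walker that finds a vmemmap page at some pfn can therefore advance
by ex_vmemmap_nr_pages(page) to land on the first pfn after the vmemmap
area, no matter where inside the area it started.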

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 include/linux/mm.h         | 17 +++++++++++++++++
 include/linux/mm_types.h   |  5 +++++
 include/linux/page-flags.h | 19 +++++++++++++++++++
 3 files changed, 41 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 45f0ab0ed4f7..432175f8f8d2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2904,6 +2904,23 @@ static inline bool debug_guardpage_enabled(void) { return false; }
 static inline bool page_is_guard(struct page *page) { return false; }
 #endif /* CONFIG_DEBUG_PAGEALLOC */
 
+static __always_inline struct page *vmemmap_head(struct page *page)
+{
+	return (struct page *)page->vmemmap_head;
+}
+
+static __always_inline unsigned long vmemmap_nr_sections(struct page *page)
+{
+	struct page *head = vmemmap_head(page);
+	return head->vmemmap_sections;
+}
+
+static __always_inline unsigned long vmemmap_nr_pages(struct page *page)
+{
+	struct page *head = vmemmap_head(page);
+	return head->vmemmap_pages - (page - head);
+}
+
 #if MAX_NUMNODES > 1
 void __init setup_nr_node_ids(void);
 #else
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6a7a1083b6fb..51dd227f2a6b 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -170,6 +170,11 @@ struct page {
 			 * pmem backed DAX files are mapped.
 			 */
 		};
+		struct {        /* Vmemmap pages */
+			unsigned long vmemmap_head;
+			unsigned long vmemmap_sections; /* Number of sections */
+			unsigned long vmemmap_pages;    /* Number of pages */
+		};
 
 		/** @rcu_head: You can use this to free a page by RCU. */
 		struct rcu_head rcu_head;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index f91cb8898ff0..75f302a532f9 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -708,6 +708,7 @@ PAGEFLAG_FALSE(DoubleMap)
 #define PG_kmemcg	0x00000200
 #define PG_table	0x00000400
 #define PG_guard	0x00000800
+#define PG_vmemmap     0x00001000
 
 #define PageType(page, flag)						\
 	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
@@ -764,6 +765,24 @@ PAGE_TYPE_OPS(Table, table)
  */
 PAGE_TYPE_OPS(Guard, guard)
 
+/*
+ * Vmemmap pages refer to those pages that are used to create the memmap
+ * array, and reside within the same memory range that was hotplugged, so
+ * they are self-hosted. (see include/linux/memory_hotplug.h)
+ */
+PAGE_TYPE_OPS(Vmemmap, vmemmap)
+static __always_inline void SetPageVmemmap(struct page *page)
+{
+	__SetPageVmemmap(page);
+	__SetPageReserved(page);
+}
+
+static __always_inline void ClearPageVmemmap(struct page *page)
+{
+	__ClearPageVmemmap(page);
+	__ClearPageReserved(page);
+}
+
 extern bool is_free_buddy_page(struct page *page);
 
 __PAGEFLAG(Isolated, isolated, PF_ANY);
-- 
2.12.3



* [PATCH v3 3/5] mm,sparse: Add SECTION_USE_VMEMMAP flag
  2019-07-25 16:02 [PATCH v3 0/5] Allocate memmap from hotadded memory Oscar Salvador
  2019-07-25 16:02 ` [PATCH v3 1/5] mm,memory_hotplug: Introduce MHP_MEMMAP_ON_MEMORY Oscar Salvador
  2019-07-25 16:02 ` [PATCH v3 2/5] mm: Introduce a new Vmemmap page-type Oscar Salvador
@ 2019-07-25 16:02 ` Oscar Salvador
  2019-08-01 14:45   ` David Hildenbrand
  2019-07-25 16:02 ` [PATCH v3 4/5] mm,memory_hotplug: Allocate memmap from the added memory range for sparse-vmemmap Oscar Salvador
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Oscar Salvador @ 2019-07-25 16:02 UTC (permalink / raw)
  To: akpm
  Cc: dan.j.williams, david, pasha.tatashin, mhocko, anshuman.khandual,
	Jonathan.Cameron, vbabka, linux-mm, linux-kernel, Oscar Salvador

When hot-removing memory, we need to be careful about two things:

1) Memory range must be memory_block aligned. This is what
   check_hotplug_memory_range() checks for.

2) If a range was hot-added using MHP_MEMMAP_ON_MEMORY, we need to check
   whether the caller is removing memory with the same granularity with
   which it was added.

To check for case 2), we mark all sections used by vmemmap
(not only the ones containing vmemmap pages, but all sections spanning
the memory range) with SECTION_USE_VMEMMAP.

This will allow us to do some sanity checks in the hot-remove stage.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 include/linux/memory_hotplug.h | 3 ++-
 include/linux/mmzone.h         | 8 +++++++-
 mm/memory_hotplug.c            | 2 +-
 mm/sparse.c                    | 9 +++++++--
 4 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 45dece922d7c..6b20008d9297 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -366,7 +366,8 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap);
 extern bool is_memblock_offlined(struct memory_block *mem);
 extern int sparse_add_section(int nid, unsigned long pfn,
-		unsigned long nr_pages, struct vmem_altmap *altmap);
+		unsigned long nr_pages, struct vmem_altmap *altmap,
+		bool vmemmap_section);
 extern void sparse_remove_section(struct mem_section *ms,
 		unsigned long pfn, unsigned long nr_pages,
 		unsigned long map_offset, struct vmem_altmap *altmap);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index d77d717c620c..259c326962f5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1254,7 +1254,8 @@ extern size_t mem_section_usage_size(void);
 #define SECTION_HAS_MEM_MAP	(1UL<<1)
 #define SECTION_IS_ONLINE	(1UL<<2)
 #define SECTION_IS_EARLY	(1UL<<3)
-#define SECTION_MAP_LAST_BIT	(1UL<<4)
+#define SECTION_USE_VMEMMAP	(1UL<<4)
+#define SECTION_MAP_LAST_BIT	(1UL<<5)
 #define SECTION_MAP_MASK	(~(SECTION_MAP_LAST_BIT-1))
 #define SECTION_NID_SHIFT	3
 
@@ -1265,6 +1266,11 @@ static inline struct page *__section_mem_map_addr(struct mem_section *section)
 	return (struct page *)map;
 }
 
+static inline int vmemmap_section(struct mem_section *section)
+{
+	return (section && (section->section_mem_map & SECTION_USE_VMEMMAP));
+}
+
 static inline int present_section(struct mem_section *section)
 {
 	return (section && (section->section_mem_map & SECTION_MARKED_PRESENT));
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3d97c3711333..c2338703ce80 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -314,7 +314,7 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
 
 		pfns = min(nr_pages, PAGES_PER_SECTION
 				- (pfn & ~PAGE_SECTION_MASK));
-		err = sparse_add_section(nid, pfn, pfns, altmap);
+		err = sparse_add_section(nid, pfn, pfns, altmap, 0);
 		if (err)
 			break;
 		pfn += pfns;
diff --git a/mm/sparse.c b/mm/sparse.c
index 79355a86064f..09cac39e39d9 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -856,13 +856,18 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
  * * -ENOMEM	- Out of memory.
  */
 int __meminit sparse_add_section(int nid, unsigned long start_pfn,
-		unsigned long nr_pages, struct vmem_altmap *altmap)
+		unsigned long nr_pages, struct vmem_altmap *altmap,
+		bool vmemmap_section)
 {
 	unsigned long section_nr = pfn_to_section_nr(start_pfn);
+	unsigned long flags = 0;
 	struct mem_section *ms;
 	struct page *memmap;
 	int ret;
 
+	if (vmemmap_section)
+		flags = SECTION_USE_VMEMMAP;
+
 	ret = sparse_index_init(section_nr, nid);
 	if (ret < 0)
 		return ret;
@@ -884,7 +889,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
 	/* Align memmap to section boundary in the subsection case */
 	if (section_nr_to_pfn(section_nr) != start_pfn)
 		memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
-	sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);
+	sparse_init_one_section(ms, section_nr, memmap, ms->usage, flags);
 
 	return 0;
 }
-- 
2.12.3



* [PATCH v3 4/5] mm,memory_hotplug: Allocate memmap from the added memory range for sparse-vmemmap
  2019-07-25 16:02 [PATCH v3 0/5] Allocate memmap from hotadded memory Oscar Salvador
                   ` (2 preceding siblings ...)
  2019-07-25 16:02 ` [PATCH v3 3/5] mm,sparse: Add SECTION_USE_VMEMMAP flag Oscar Salvador
@ 2019-07-25 16:02 ` Oscar Salvador
  2019-08-01 15:04   ` David Hildenbrand
  2019-07-25 16:02 ` [PATCH v3 5/5] mm,memory_hotplug: Allow userspace to enable/disable vmemmap Oscar Salvador
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Oscar Salvador @ 2019-07-25 16:02 UTC (permalink / raw)
  To: akpm
  Cc: dan.j.williams, david, pasha.tatashin, mhocko, anshuman.khandual,
	Jonathan.Cameron, vbabka, linux-mm, linux-kernel, Oscar Salvador

Physical memory hotadd has to allocate a memmap (struct page array) for
the newly added memory section. Currently, alloc_pages_node() is used
for those allocations.

This has some disadvantages:
 a) existing memory is consumed for that purpose
    (~2MB per 128MB memory section on x86_64)
 b) if the whole node is movable then we have off-node struct pages,
    which has performance drawbacks.

a) has turned out to be a problem for memory hotplug based ballooning
   because userspace might not react in time to online memory, while
   the memory consumed during physical hotadd is enough to push the
   system to OOM. 31bc3858ea3e ("memory-hotplug: add automatic onlining
   policy for the newly added memory") was added to work around that
   problem.

This can be improved when CONFIG_SPARSEMEM_VMEMMAP is enabled.

Vmemmap page tables can map arbitrary memory.
That means that we can simply use the beginning of each memory section and
map struct pages there.
The struct pages which back the allocated space then just need to be treated
carefully.

Implementation-wise, we will reuse the vmem_altmap infrastructure to override
the default allocator used by __vmemmap_populate. Once the memmap is
allocated, we are going to need a way to mark the altmap pfns used for the
allocation.
If the MHP_MEMMAP_ON_MEMORY flag was passed, we will set up the layout of the
altmap structure at the beginning of __add_pages(), and then we will call
mhp_mark_vmemmap_pages() to do the proper marking.

mhp_mark_vmemmap_pages() marks the pages as vmemmap and sets some metadata:

Vmemmap's pages layout is as follows:

        * Layout:
        * Head:
        *      head->vmemmap_pages     : nr of vmemmap pages
        *      head->vmemmap_sections  : nr of sections used by this altmap
        * Tail:
        *      tail->vmemmap_head      : head
        * All:
        *      page->type              : Vmemmap

E.g., when hot-adding 1GB on x86_64:

head->vmemmap_pages = 4096
head->vmemmap_sections = 8

We keep this information within the struct pages as we need it in certain
stages like offline, online and hot-remove.

head->vmemmap_sections is a kind of refcount, because when using
MHP_MEMMAP_ON_MEMORY, we need to know how long we have to defer the call to
vmemmap_free().
The thing is that the first pages of the memory range are used to store the
memmap mapping, so we cannot remove those first, otherwise we would blow up
when accessing the other pages.

So, instead of actually removing the section (with vmemmap_free), we wait
until we remove the last one, and then we call vmemmap_free() for all
batched sections.
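
The deferral described above can be modelled in userspace like this. It is
only a sketch of the idea, not the code from this patch: struct
ex_altmap_range and the ex_ functions are hypothetical, sections_left plays
the role of head->vmemmap_sections, and bumping the freed counter stands in
for the real vmemmap_free() calls.

```c
#include <assert.h>
#include <stdbool.h>

struct ex_altmap_range {
	unsigned long sections_left; /* models head->vmemmap_sections */
	unsigned long deferred;      /* sections whose teardown is pending */
	unsigned long freed;         /* sections whose vmemmap was torn down */
};

static void ex_range_init(struct ex_altmap_range *r, unsigned long nr_sections)
{
	assert(nr_sections > 0);
	r->sections_left = nr_sections;
	r->deferred = 0;
	r->freed = 0;
}

/* Remove one section; returns true when the batched free actually ran. */
static bool ex_remove_section(struct ex_altmap_range *r)
{
	assert(r->sections_left > 0);

	r->deferred++;
	if (--r->sections_left > 0)
		return false; /* the memmap still backs the remaining sections */

	/* Last section gone: tear down the vmemmap of all batched sections. */
	r->freed += r->deferred;
	r->deferred = 0;
	return true;
}

/* Hypothetical demo: sections freed after 'removals' section removals. */
static unsigned long ex_freed_after(unsigned long nr_sections,
				    unsigned long removals)
{
	struct ex_altmap_range r;
	unsigned long i;

	assert(removals <= nr_sections);
	ex_range_init(&r, nr_sections);
	for (i = 0; i < removals; i++)
		ex_remove_section(&r);
	return r.freed;
}
```

For a 1GB range with 8 sections, nothing is freed after the first seven
removals, and all eight sections are freed together on the eighth.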

We also have to be careful about those pages during online and offline
operations. They are simply skipped, so online will keep them
reserved and so unusable for any other purpose and offline ignores them
so they do not block the offline operation.

In the offline operation we only have to check for one special case.
Depending on how the hot-added range was added, one or more memory blocks
at the beginning might be filled with only vmemmap pages.
We just need to check for this case and skip both isolating and migrating,
because those pages do not need to be migrated anywhere, as they are
self-hosted.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 arch/powerpc/mm/init_64.c      |   7 +++
 arch/s390/mm/init.c            |   6 ++
 arch/x86/mm/init_64.c          |  10 +++
 drivers/acpi/acpi_memhotplug.c |   3 +-
 include/linux/memory_hotplug.h |   6 ++
 include/linux/memremap.h       |   2 +-
 mm/compaction.c                |   7 +++
 mm/memory_hotplug.c            | 136 ++++++++++++++++++++++++++++++++++++++---
 mm/page_alloc.c                |  26 +++++++-
 mm/page_isolation.c            |  14 ++++-
 mm/sparse.c                    | 107 ++++++++++++++++++++++++++++++++
 11 files changed, 309 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index a44f6281ca3a..f19aa006ca6d 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -292,6 +292,13 @@ void __ref vmemmap_free(unsigned long start, unsigned long end,
 
 		if (base_pfn >= alt_start && base_pfn < alt_end) {
 			vmem_altmap_free(altmap, nr_pages);
+		} else if (PageVmemmap(page)) {
+			/*
+			 * runtime vmemmap pages are residing inside the memory
+			 * section so they do not have to be freed anywhere.
+			 */
+			while (PageVmemmap(page))
+				ClearPageVmemmap(page++);
 		} else if (PageReserved(page)) {
 			/* allocated from bootmem */
 			if (page_size < PAGE_SIZE) {
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 20340a03ad90..adb04f3977eb 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -278,6 +278,12 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	unsigned long size_pages = PFN_DOWN(size);
 	int rc;
 
+	/*
+	 * Physical memory is added only later during the memory online so we
+	 * cannot use the added range at this stage unfortunately.
+	 */
+	restrictions->flags &= ~MHP_MEMMAP_ON_MEMORY;
+
 	if (WARN_ON_ONCE(restrictions->altmap))
 		return -EINVAL;
 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index a6b5c653727b..f9f720a28b3e 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -876,6 +876,16 @@ static void __meminit free_pagetable(struct page *page, int order)
 	unsigned long magic;
 	unsigned int nr_pages = 1 << order;
 
+	/*
+	 * Runtime vmemmap pages are residing inside the memory section so
+	 * they do not have to be freed anywhere.
+	 */
+	if (PageVmemmap(page)) {
+		while (nr_pages--)
+			ClearPageVmemmap(page++);
+		return;
+	}
+
 	/* bootmem page has reserved flag */
 	if (PageReserved(page)) {
 		__ClearPageReserved(page);
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index d91b3584d4b2..e0148dde5313 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -207,7 +207,8 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 		if (node < 0)
 			node = memory_add_physaddr_to_nid(info->start_addr);
 
-		result = __add_memory(node, info->start_addr, info->length, 0);
+		result = __add_memory(node, info->start_addr, info->length,
+				      MHP_MEMMAP_ON_MEMORY);
 
 		/*
 		 * If the memory block has been used by the kernel, add_memory()
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 6b20008d9297..e1e8abf22a80 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -377,4 +377,10 @@ extern bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_
 		int online_type);
 extern struct zone *zone_for_pfn_range(int online_type, int nid, unsigned start_pfn,
 		unsigned long nr_pages);
+
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+extern void mhp_mark_vmemmap_pages(struct vmem_altmap *self);
+#else
+static inline void mhp_mark_vmemmap_pages(struct vmem_altmap *self) {}
+#endif
 #endif /* __LINUX_MEMORY_HOTPLUG_H */
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 2cfc3c289d01..0a7355b8c1cf 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -16,7 +16,7 @@ struct device;
  * @alloc: track pages consumed, private to vmemmap_populate()
  */
 struct vmem_altmap {
-	const unsigned long base_pfn;
+	unsigned long base_pfn;
 	const unsigned long reserve;
 	unsigned long free;
 	unsigned long align;
diff --git a/mm/compaction.c b/mm/compaction.c
index ac4ead029b4a..2faf769375c4 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -857,6 +857,13 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		nr_scanned++;
 
 		page = pfn_to_page(low_pfn);
+		/*
+		 * Vmemmap pages do not need to be isolated.
+		 */
+		if (PageVmemmap(page)) {
+			low_pfn += vmemmap_nr_pages(page) - 1;
+			continue;
+		}
 
 		/*
 		 * Check if the pageblock has already been marked skipped.
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c2338703ce80..09d41339cd11 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -278,6 +278,13 @@ static int check_pfn_span(unsigned long pfn, unsigned long nr_pages,
 	return 0;
 }
 
+static void mhp_init_altmap(unsigned long pfn, unsigned long nr_pages,
+			    struct vmem_altmap *altmap)
+{
+	altmap->free = nr_pages;
+	altmap->base_pfn = pfn;
+}
+
 /*
  * Reasonably generic function for adding memory.  It is
  * expected that archs that support memory hotplug will
@@ -289,8 +296,18 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
 {
 	int err;
 	unsigned long nr, start_sec, end_sec;
-	struct vmem_altmap *altmap = restrictions->altmap;
+	struct vmem_altmap *altmap;
+	struct vmem_altmap mhp_altmap = {};
+	unsigned long mhp_flags = restrictions->flags;
+	bool vmemmap_section = false;
+
+	if (mhp_flags) {
+		mhp_init_altmap(pfn, nr_pages, &mhp_altmap);
+		restrictions->altmap = &mhp_altmap;
+		vmemmap_section = true;
+	}
 
+	altmap = restrictions->altmap;
 	if (altmap) {
 		/*
 		 * Validate altmap is within bounds of the total request
@@ -314,7 +331,7 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
 
 		pfns = min(nr_pages, PAGES_PER_SECTION
 				- (pfn & ~PAGE_SECTION_MASK));
-		err = sparse_add_section(nid, pfn, pfns, altmap, 0);
+		err = sparse_add_section(nid, pfn, pfns, altmap, vmemmap_section);
 		if (err)
 			break;
 		pfn += pfns;
@@ -322,6 +339,10 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
 		cond_resched();
 	}
 	vmemmap_populate_print_last();
+
+	if (mhp_flags)
+		mhp_mark_vmemmap_pages(altmap);
+
 	return err;
 }
 
@@ -640,6 +661,14 @@ static int online_pages_blocks(unsigned long start, unsigned long nr_pages)
 	while (start < end) {
 		order = min(MAX_ORDER - 1,
 			get_order(PFN_PHYS(end) - PFN_PHYS(start)));
+		/*
+		 * Check if the pfn is aligned to its order.
+		 * If not, we decrement the order until it is,
+		 * otherwise __free_one_page will bug us.
+		 */
+		while (start & ((1 << order) - 1))
+			order--;
+
 		(*online_page_callback)(pfn_to_page(start), order);
 
 		onlined_pages += (1UL << order);
@@ -648,17 +677,51 @@ static int online_pages_blocks(unsigned long start, unsigned long nr_pages)
 	return onlined_pages;
 }
 
+static bool vmemmap_skip_block(unsigned long pfn, unsigned long nr_pages,
+		       unsigned long *nr_vmemmap_pages)
+{
+	bool skip = false;
+	unsigned long vmemmap_pages = 0;
+
+	/*
+	 * This function gets called from {online,offline}_pages.
+	 * It has two goals:
+	 *
+	 * 1) Account number of vmemmap pages within the range
+	 * 2) Check if the whole range contains only vmemmap_pages.
+	 */
+
+	if (PageVmemmap(pfn_to_page(pfn))) {
+		struct page *page = pfn_to_page(pfn);
+
+		vmemmap_pages = min(vmemmap_nr_pages(page), nr_pages);
+		if (vmemmap_pages == nr_pages)
+			skip = true;
+	}
+
+	*nr_vmemmap_pages = vmemmap_pages;
+	return skip;
+}
+
 static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
 			void *arg)
 {
 	unsigned long onlined_pages = *(unsigned long *)arg;
-
-	if (PageReserved(pfn_to_page(start_pfn)))
-		onlined_pages += online_pages_blocks(start_pfn, nr_pages);
-
+	unsigned long pfn = start_pfn;
+	unsigned long nr_vmemmap_pages = 0;
+	bool skip;
+
+	skip = vmemmap_skip_block(pfn, nr_pages, &nr_vmemmap_pages);
+	if (skip)
+		goto skip_online_pages;
+
+	pfn += nr_vmemmap_pages;
+	if (PageReserved(pfn_to_page(pfn)))
+		onlined_pages += online_pages_blocks(pfn, nr_pages - nr_vmemmap_pages);
+skip_online_pages:
 	online_mem_sections(start_pfn, start_pfn + nr_pages);
 
-	*(unsigned long *)arg = onlined_pages;
+	*(unsigned long *)arg = onlined_pages + nr_vmemmap_pages;
 	return 0;
 }
 
@@ -1040,6 +1103,19 @@ static int online_memory_block(struct memory_block *mem, void *arg)
 	return device_online(&mem->dev);
 }
 
+static unsigned long mhp_check_flags(unsigned long flags)
+{
+	if (!flags)
+		return 0;
+
+	if (flags != MHP_MEMMAP_ON_MEMORY) {
+		WARN(1, "Wrong flags value (%lx). Ignoring flags.\n", flags);
+		return 0;
+	}
+
+	return flags;
+}
+
 /*
  * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
  * and online/offline operations (triggered e.g. by sysfs).
@@ -1075,6 +1151,8 @@ int __ref add_memory_resource(int nid, struct resource *res, unsigned long flags
 		goto error;
 	new_node = ret;
 
+	restrictions.flags = mhp_check_flags(flags);
+
 	/* call arch's memory hotadd */
 	ret = arch_add_memory(nid, start, size, &restrictions);
 	if (ret < 0)
@@ -1502,12 +1580,14 @@ static int __ref __offline_pages(unsigned long start_pfn,
 {
 	unsigned long pfn, nr_pages;
 	unsigned long offlined_pages = 0;
+	unsigned long nr_vmemmap_pages = 0;
 	int ret, node, nr_isolate_pageblock;
 	unsigned long flags;
 	unsigned long valid_start, valid_end;
 	struct zone *zone;
 	struct memory_notify arg;
 	char *reason;
+	bool skip = false;
 
 	mem_hotplug_begin();
 
@@ -1524,8 +1604,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	node = zone_to_nid(zone);
 	nr_pages = end_pfn - start_pfn;
 
+	skip = vmemmap_skip_block(start_pfn, nr_pages, &nr_vmemmap_pages);
+
 	/* set above range as isolated */
-	ret = start_isolate_page_range(start_pfn, end_pfn,
+	ret = start_isolate_page_range(start_pfn + nr_vmemmap_pages, end_pfn,
 				       MIGRATE_MOVABLE,
 				       SKIP_HWPOISON | REPORT_FAILURE);
 	if (ret < 0) {
@@ -1545,6 +1627,9 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		goto failed_removal_isolated;
 	}
 
+	if (skip)
+		goto skip_migration;
+
 	do {
 		for (pfn = start_pfn; pfn;) {
 			if (signal_pending(current)) {
@@ -1581,6 +1666,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
 					    NULL, check_pages_isolated_cb);
 	} while (ret);
 
+skip_migration:
 	/* Ok, all of our target is isolated.
 	   We cannot do rollback at this point. */
 	walk_system_ram_range(start_pfn, end_pfn - start_pfn,
@@ -1596,7 +1682,9 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	spin_unlock_irqrestore(&zone->lock, flags);
 
 	/* removal success */
-	adjust_managed_page_count(pfn_to_page(start_pfn), -offlined_pages);
+	if (offlined_pages)
+		adjust_managed_page_count(pfn_to_page(start_pfn), -offlined_pages);
+	offlined_pages += nr_vmemmap_pages;
 	zone->present_pages -= offlined_pages;
 
 	pgdat_resize_lock(zone->zone_pgdat, &flags);
@@ -1739,11 +1827,41 @@ static void __release_memory_resource(resource_size_t start,
 	}
 }
 
+static int check_hotplug_granularity(u64 start, u64 size)
+{
+	unsigned long pfn = PHYS_PFN(start);
+
+	/*
+	 * Sanity check in case the range used MHP_MEMMAP_ON_MEMORY.
+	 */
+	if (vmemmap_section(__pfn_to_section(pfn))) {
+		struct page *page = pfn_to_page(pfn);
+		unsigned long nr_pages = size >> PAGE_SHIFT;
+		unsigned long sections;
+
+		/*
+		 * The start of the memory range is not correct.
+		 */
+		if (!PageVmemmap(page) || (vmemmap_head(page) != page))
+			return -EINVAL;
+
+		sections = vmemmap_nr_sections(page);
+		if (sections * PAGES_PER_SECTION != nr_pages)
+			/*
+			 * Check that granularity is the same.
+			 */
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int __ref try_remove_memory(int nid, u64 start, u64 size)
 {
 	int rc = 0;
 
 	BUG_ON(check_hotplug_memory_range(start, size));
+	BUG_ON(check_hotplug_granularity(start, size));
 
 	mem_hotplug_begin();
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d3bb601c461b..7c7d7130b627 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1340,14 +1340,21 @@ static void free_one_page(struct zone *zone,
 static void __meminit __init_single_page(struct page *page, unsigned long pfn,
 				unsigned long zone, int nid)
 {
+	if (PageVmemmap(page))
+		/*
+		 * Vmemmap pages need to preserve their state.
+		 */
+		goto preserve_state;
+
 	mm_zero_struct_page(page);
-	set_page_links(page, zone, nid, pfn);
-	init_page_count(page);
 	page_mapcount_reset(page);
+	INIT_LIST_HEAD(&page->lru);
+preserve_state:
+	init_page_count(page);
+	set_page_links(page, zone, nid, pfn);
 	page_cpupid_reset_last(page);
 	page_kasan_tag_reset(page);
 
-	INIT_LIST_HEAD(&page->lru);
 #ifdef WANT_PAGE_VIRTUAL
 	/* The shift won't overflow because ZONE_NORMAL is below 4G. */
 	if (!is_highmem_idx(zone))
@@ -8184,6 +8191,14 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 
 		page = pfn_to_page(check);
 
+		/*
+		 * Vmemmap pages do not need to be moved around.
+		 */
+		if (PageVmemmap(page)) {
+			iter += vmemmap_nr_pages(page) - 1;
+			continue;
+		}
+
 		if (PageReserved(page))
 			goto unmovable;
 
@@ -8551,6 +8566,11 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 			continue;
 		}
 		page = pfn_to_page(pfn);
+
+		if (PageVmemmap(page)) {
+			pfn += vmemmap_nr_pages(page);
+			continue;
+		}
 		/*
 		 * The HWPoisoned page may be not in buddy system, and
 		 * page_count() is not 0.
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 89c19c0feadb..ee26ea41c9eb 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -146,7 +146,7 @@ static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
 static inline struct page *
 __first_valid_page(unsigned long pfn, unsigned long nr_pages)
 {
-	int i;
+	unsigned long i;
 
 	for (i = 0; i < nr_pages; i++) {
 		struct page *page;
@@ -154,6 +154,10 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
 		page = pfn_to_online_page(pfn + i);
 		if (!page)
 			continue;
+		if (PageVmemmap(page)) {
+			i += vmemmap_nr_pages(page) - 1;
+			continue;
+		}
 		return page;
 	}
 	return NULL;
@@ -267,6 +271,14 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
 			continue;
 		}
 		page = pfn_to_page(pfn);
+		/*
+		 * Vmemmap pages are not isolated. Skip them.
+		 */
+		if (PageVmemmap(page)) {
+			pfn += vmemmap_nr_pages(page);
+			continue;
+		}
+
 		if (PageBuddy(page))
 			/*
 			 * If the page is on a free list, it has to be on
diff --git a/mm/sparse.c b/mm/sparse.c
index 09cac39e39d9..2cc2e5af1986 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -645,18 +645,125 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 #endif
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
+static void vmemmap_init_page(struct page *page, struct page *head)
+{
+	page_mapcount_reset(page);
+	SetPageVmemmap(page);
+	page->vmemmap_head = (unsigned long)head;
+}
+
+static void vmemmap_init_head(struct page *page, unsigned long nr_sections,
+			      unsigned long nr_pages)
+{
+	page->vmemmap_sections = nr_sections;
+	page->vmemmap_pages = nr_pages;
+}
+
+void mhp_mark_vmemmap_pages(struct vmem_altmap *self)
+{
+	unsigned long pfn = self->base_pfn + self->reserve;
+	unsigned long nr_pages = self->alloc;
+	unsigned long nr_sects = self->free / PAGES_PER_SECTION;
+	unsigned long i;
+	struct page *head;
+
+	if (!nr_pages)
+		return;
+
+	/*
+	 * All allocations for memory hotplug are the same size, so align
+	 * should be 0.
+	 */
+	WARN_ON(self->align);
+
+	memset(pfn_to_page(pfn), 0, sizeof(struct page) * nr_pages);
+
+	/*
+	 * Mark pages as Vmemmap pages
+	 * Layout:
+	 * Head:
+	 * 	head->vmemmap_pages	: nr of vmemmap pages
+	 *	head->mhp_flags    	: MHP_flags
+	 *	head->vmemmap_sections	: nr of sections used by this altmap
+	 * Tail:
+	 *	tail->vmemmap_head	: head
+	 * All:
+	 *	page->type		: Vmemmap
+	 */
+	head = pfn_to_page(pfn);
+	for (i = 0; i < nr_pages; i++) {
+		struct page *page = head + i;
+
+		vmemmap_init_page(page, head);
+	}
+	vmemmap_init_head(head, nr_sects, nr_pages);
+}
+
+/*
+ * If the range we are trying to remove was hot-added with vmemmap pages
+ * using MHP_MEMMAP_ON_MEMORY, we need to keep track of it to know how
+ * long we have to defer freeing it.
+ * Since sections are removed sequentially in __remove_pages()->
+ * __remove_section(), we just wait until we hit the last section.
+ * Once that happens, we can call vmemmap_free_deferred_range() to actually
+ * free the whole memory range.
+ */
+static struct page *__vmemmap_head = NULL;
+
 static struct page *populate_section_memmap(unsigned long pfn,
 		unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
 {
 	return __populate_section_memmap(pfn, nr_pages, nid, altmap);
 }
 
+static void vmemmap_free_deferred_range(unsigned long start,
+					unsigned long end)
+{
+	unsigned long nr_pages = end - start;
+	unsigned long first_section;
+
+	first_section = (unsigned long)__vmemmap_head;
+	while (start >= first_section) {
+		vmemmap_free(start, end, NULL);
+		end = start;
+		start -= nr_pages;
+	}
+	__vmemmap_head = NULL;
+}
+
+static inline bool vmemmap_dec_and_test(void)
+{
+	__vmemmap_head->vmemmap_sections--;
+	return !__vmemmap_head->vmemmap_sections;
+}
+
+static void vmemmap_defer_free(unsigned long start, unsigned long end)
+{
+	if (vmemmap_dec_and_test())
+		vmemmap_free_deferred_range(start, end);
+}
+
+static inline bool should_defer_freeing(unsigned long start)
+{
+	if (PageVmemmap((struct page *)start) || __vmemmap_head) {
+		if (!__vmemmap_head)
+			__vmemmap_head = (struct page *)start;
+		return true;
+	}
+	return false;
+}
+
 static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
 		struct vmem_altmap *altmap)
 {
 	unsigned long start = (unsigned long) pfn_to_page(pfn);
 	unsigned long end = start + nr_pages * sizeof(struct page);
 
+	if (should_defer_freeing(start)) {
+		vmemmap_defer_free(start, end);
+		return;
+	}
+
 	vmemmap_free(start, end, altmap);
 }
 static void free_map_bootmem(struct page *memmap)
-- 
2.12.3


* [PATCH v3 5/5] mm,memory_hotplug: Allow userspace to enable/disable vmemmap
  2019-07-25 16:02 [PATCH v3 0/5] Allocate memmap from hotadded memory Oscar Salvador
                   ` (3 preceding siblings ...)
  2019-07-25 16:02 ` [PATCH v3 4/5] mm,memory_hotplug: Allocate memmap from the added memory range for sparse-vmemmap Oscar Salvador
@ 2019-07-25 16:02 ` Oscar Salvador
  2019-08-01 15:07   ` David Hildenbrand
  2019-07-25 16:56 ` [PATCH v3 0/5] Allocate memmap from hotadded memory David Hildenbrand
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Oscar Salvador @ 2019-07-25 16:02 UTC (permalink / raw)
  To: akpm
  Cc: dan.j.williams, david, pasha.tatashin, mhocko, anshuman.khandual,
	Jonathan.Cameron, vbabka, linux-mm, linux-kernel, Oscar Salvador

It seems that some users want to expose all hotpluggable memory to
userspace, so this implements a sysfs toggle that lets those users disable
allocating the memmap from the hot-added range.

By default, the vmemmap pages mechanism is enabled.
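
As a rough illustration of the toggle described above, the following
userspace C sketch models how the flag check behaves once the mechanism is
disabled. The names toggle_store() and check_flags() are hypothetical
stand-ins for the sysfs store handler and mhp_check_flags(); plain strcmp()
is used where the kernel code uses sysfs_streq(), so a trailing newline is
not tolerated here, and the WARN() on unknown flags is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MHP_MEMMAP_ON_MEMORY (1UL << 1) /* value as defined in patch 1 */

/* Models the global switch added by this patch; enabled by default. */
bool vmemmap_enabled = true;

/*
 * Hypothetical userspace stand-in for the sysfs store handler.
 * Accepts "enable" or "disable"; returns 0 on success, -1 otherwise.
 */
int toggle_store(const char *buf)
{
	assert(buf != NULL);

	if (strcmp(buf, "enable") == 0)
		vmemmap_enabled = true;
	else if (strcmp(buf, "disable") == 0)
		vmemmap_enabled = false;
	else
		return -1;

	return 0;
}

/*
 * Hypothetical model of the flag check: the flag survives only if the
 * mechanism is enabled and the value is exactly MHP_MEMMAP_ON_MEMORY.
 */
unsigned long check_flags(unsigned long flags)
{
	if (!flags || !vmemmap_enabled)
		return 0;

	if (flags != MHP_MEMMAP_ON_MEMORY)
		return 0; /* the kernel version also emits a WARN() here */

	return flags;
}
```

In this model, check_flags(MHP_MEMMAP_ON_MEMORY) returns the flag while the
switch is on and 0 after toggle_store("disable"), so a caller that asked for
the memmap on the hot-added range silently falls back to the regular
allocation path.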

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 drivers/base/memory.c          | 33 +++++++++++++++++++++++++++++++++
 include/linux/memory_hotplug.h |  3 +++
 mm/memory_hotplug.c            |  7 +++++++
 3 files changed, 43 insertions(+)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index d30d0f6c8ad0..5ec6b80de9dd 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -578,6 +578,35 @@ static DEVICE_ATTR_WO(soft_offline_page);
 static DEVICE_ATTR_WO(hard_offline_page);
 #endif
 
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+static ssize_t vmemmap_hotplug_show(struct device *dev,
+				    struct device_attribute *attr, char *buf)
+{
+	if (vmemmap_enabled)
+		return sprintf(buf, "enabled\n");
+	else
+		return sprintf(buf, "disabled\n");
+}
+
+static ssize_t vmemmap_hotplug_store(struct device *dev,
+			   struct device_attribute *attr,
+			   const char *buf, size_t count)
+{
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (sysfs_streq(buf, "enable"))
+		vmemmap_enabled = true;
+	else if (sysfs_streq(buf, "disable"))
+		vmemmap_enabled = false;
+	else
+		return -EINVAL;
+
+	return count;
+}
+static DEVICE_ATTR_RW(vmemmap_hotplug);
+#endif
+
 /*
  * Note that phys_device is optional.  It is here to allow for
  * differentiation between which *physical* devices each
@@ -794,6 +823,10 @@ static struct attribute *memory_root_attrs[] = {
 	&dev_attr_hard_offline_page.attr,
 #endif
 
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+	&dev_attr_vmemmap_hotplug.attr,
+#endif
+
 	&dev_attr_block_size_bytes.attr,
 	&dev_attr_auto_online_blocks.attr,
 	NULL
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index e1e8abf22a80..03d227d13301 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -134,6 +134,9 @@ extern int arch_add_memory(int nid, u64 start, u64 size,
 			struct mhp_restrictions *restrictions);
 extern u64 max_mem_size;
 
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+extern bool vmemmap_enabled;
+#endif
 extern bool memhp_auto_online;
 /* If movable_node boot option specified */
 extern bool movable_node_enabled;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 09d41339cd11..5ffe5375b87c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -68,6 +68,10 @@ void put_online_mems(void)
 
 bool movable_node_enabled = false;
 
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+bool vmemmap_enabled __read_mostly = true;
+#endif
+
 #ifndef CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE
 bool memhp_auto_online;
 #else
@@ -1108,6 +1112,9 @@ static unsigned long mhp_check_flags(unsigned long flags)
 	if (!flags)
 		return 0;
 
+	if (!vmemmap_enabled)
+		return 0;
+
 	if (flags != MHP_MEMMAP_ON_MEMORY) {
 		WARN(1, "Wrong flags value (%lx). Ignoring flags.\n", flags);
 		return 0;
-- 
2.12.3


* Re: [PATCH v3 0/5] Allocate memmap from hotadded memory
  2019-07-25 16:02 [PATCH v3 0/5] Allocate memmap from hotadded memory Oscar Salvador
                   ` (4 preceding siblings ...)
  2019-07-25 16:02 ` [PATCH v3 5/5] mm,memory_hotplug: Allow userspace to enable/disable vmemmap Oscar Salvador
@ 2019-07-25 16:56 ` David Hildenbrand
  2019-08-01  7:39 ` Oscar Salvador
  2019-08-01 18:46 ` David Hildenbrand
  7 siblings, 0 replies; 22+ messages in thread
From: David Hildenbrand @ 2019-07-25 16:56 UTC (permalink / raw)
  To: Oscar Salvador, akpm
  Cc: dan.j.williams, pasha.tatashin, mhocko, anshuman.khandual,
	Jonathan.Cameron, vbabka, linux-mm, linux-kernel

On 25.07.19 18:02, Oscar Salvador wrote:
> Here we go with v3.
> 
> v3 -> v2:
>         * Rewrite about vmemmap pages handling.
>           Prior to this version, I was (ab)using hugepages fields
>           from struct page, while here I am officially adding a new
>           sub-page type with the fields I need.
> 
>         * Drop MHP_MEMMAP_{MEMBLOCK,DEVICE} in favor of MHP_MEMMAP_ON_MEMORY.
>           While I am still not 100% if this the right decision, and while I
>           still see some gaining in having MHP_MEMMAP_{MEMBLOCK,DEVICE},
>           having only one flag ease the code.
>           If the user wants to allocate memmaps per memblock, it'll
>           have to call add_memory() variants with memory-block granularity.
> 
>           If we happen to have a more clear usecase MHP_MEMMAP_MEMBLOCK
>           flag in the future, so user does not have to bother about the way
>           it calls add_memory() variants, but only pass a flag, we can add it.
>           Actually, I already had the code, so add it in the future is going to be
>           easy.

FWIW, for now I think this is the right thing to do. Whoever roots for
this now has to propose an interface on how this is going to be used
now. Otherwise, this is untested, dead code. Nobody wants that :)

> 
>         * Granularity check when hot-removing memory.
>           Just checking that the granularity is the same.

This is for the powernv/memtrace.c case, right?

> 
> [Testing]
> 
>  - x86_64: small and large memblocks (128MB, 1G and 2G)
> 
> So far, only acpi memory hotplug uses the new flag.
> The other callers can be changed depending on their needs.
> 
> [Coverletter]
> 
> This is another step to make memory hotplug more usable. The primary
> goal of this patchset is to reduce memory overhead of the hot-added
> memory (at least for SPARSEMEM_VMEMMAP memory model). The current way we use
> to populate memmap (struct page array) has two main drawbacks:
> 
> a) it consumes an additional memory until the hotadded memory itself is
>    onlined and
> b) memmap might end up on a different numa node which is especially true
>    for movable_node configuration.
> 
> a) it is a problem especially for memory hotplug based memory "ballooning"
>    solutions when the delay between physical memory hotplug and the
>    onlining can lead to OOM and that led to introduction of hacks like auto
>    onlining (see 31bc3858ea3e ("memory-hotplug: add automatic onlining
>    policy for the newly added memory")).
> 
> b) can have performance drawbacks.

We now also consume less NORMAL memory when onlining DIMMs to the
MOVABLE_ZONE, as the vmemmap no longer ends up in the NORMAL zone -
which is nice. (not perfect, but nice :) )

I'm curious about how/when you are initializing the vmemmap and setting all
vmemmap pages to the new page type. Right now, we initialize it when
onlining memory - will have a look at how you sorted that out :)

> 
> One way to mitigate all these issues is to simply allocate memmap array
> (which is the largest memory footprint of the physical memory hotplug)
> from the hot-added memory itself. SPARSEMEM_VMEMMAP memory model allows
> us to map any pfn range so the memory doesn't need to be online to be
> usable for the array. See patch 3 for more details.
> This feature is only usable when CONFIG_SPARSEMEM_VMEMMAP is set.
> 
> [Overall design]:
> 
> Implementation wise we reuse vmem_altmap infrastructure to override
> the default allocator used by vmemap_populate. Once the memmap is
> allocated we need a way to mark altmap pfns used for the allocation.
> If MHP_MEMMAP_ON_MEMORY flag was passed, we set up the layout of the
> altmap structure at the beginning of __add_pages(), and then we call
> mark_vmemmap_pages().
> 
> MHP_MEMMAP_ON_MEMORY flag parameter will specify to allocate memmaps
> from the hot-added range.
> If callers wants memmaps to be allocated per memory block, it will
> have to call add_memory() variants in memory-block granularity
> spanning the whole range, while if it wants to allocate memmaps
> per whole memory range, just one call will do.

I assume you played with all kinds of offlining/onlining of affected
memory blocks and especially checked that the vmemmap pages remain set to
the new page type?
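
To make the design quoted above a bit more concrete, here is a minimal
userspace sketch of the marking step and of the granularity check done on
hot-remove. It is only a model: struct page_model, mark_vmemmap_pages() and
check_granularity() are hypothetical names that loosely mirror
mhp_mark_vmemmap_pages() and check_hotplug_granularity() from patch 4, and it
assumes 128MB sections with 4K pages and a range that was added with
MHP_MEMMAP_ON_MEMORY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGES_PER_SECTION 32768UL /* assumption: 128MB sections, 4K pages */

/* Hypothetical, trimmed-down stand-in for struct page. */
struct page_model {
	bool is_vmemmap;           /* models the Vmemmap page type */
	struct page_model *head;   /* models page->vmemmap_head */
	unsigned long nr_pages;    /* head only: models vmemmap_pages */
	unsigned long nr_sections; /* head only: models vmemmap_sections */
};

/*
 * Mark the first nr_vmemmap entries of a hot-added range as vmemmap pages.
 * Every marked page points back at the head; the head additionally records
 * how many vmemmap pages and how many sections the range covers.
 */
void mark_vmemmap_pages(struct page_model *range, unsigned long nr_vmemmap,
			unsigned long range_pages)
{
	unsigned long i;

	assert(range != NULL && nr_vmemmap <= range_pages);

	for (i = 0; i < nr_vmemmap; i++) {
		range[i].is_vmemmap = true;
		range[i].head = &range[0];
	}

	if (nr_vmemmap) {
		range[0].nr_pages = nr_vmemmap;
		range[0].nr_sections = range_pages / PAGES_PER_SECTION;
	}
}

/*
 * A removal request is accepted only if it starts at the head page and
 * spans exactly the sections that were added together.
 * Returns 0 if the request is fine, -1 otherwise.
 */
int check_granularity(const struct page_model *page, unsigned long nr_pages)
{
	assert(page != NULL);

	if (!page->is_vmemmap || page->head != page)
		return -1;

	if (page->nr_sections * PAGES_PER_SECTION != nr_pages)
		return -1;

	return 0;
}
```

With a two-section range marked this way, removing both sections from the
head page passes the check, while removing a single section, or starting
from a tail page, is rejected.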

-- 

Thanks,

David / dhildenb

* Re: [PATCH v3 1/5] mm,memory_hotplug: Introduce MHP_MEMMAP_ON_MEMORY
  2019-07-25 16:02 ` [PATCH v3 1/5] mm,memory_hotplug: Introduce MHP_MEMMAP_ON_MEMORY Oscar Salvador
@ 2019-07-26  8:34   ` David Hildenbrand
  2019-07-26  9:29     ` Oscar Salvador
  0 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2019-07-26  8:34 UTC (permalink / raw)
  To: Oscar Salvador, akpm
  Cc: dan.j.williams, pasha.tatashin, mhocko, anshuman.khandual,
	Jonathan.Cameron, vbabka, linux-mm, linux-kernel

On 25.07.19 18:02, Oscar Salvador wrote:
> This patch introduces MHP_MEMMAP_ON_MEMORY flag,
> and prepares the callers that add memory to take a "flags" parameter.
> This "flags" parameter will be evaluated later on in Patch#3
> to init mhp_restrictions struct.
> 
> The callers are:
> 
> add_memory
> __add_memory
> add_memory_resource
> 
> Unfortunately, we do not have a single entry point to add memory, as depending
> on the requisites of the caller, they want to hook up in different places,
> (e.g: Xen reserve_additional_memory()), so we have to spread the parameter
> in the three callers.
> 
> MHP_MEMMAP_ON_MEMORY flag parameter will specify to allocate memmaps
> from the hot-added range.
> If callers wants memmaps to be allocated per memory block, it will
> have to call add_memory() variants in memory-block granularity
> spanning the whole range, while if it wants to allocate memmaps
> per whole memory range, just one call will do.
> 
> Want to add 384MB (3 sections, 3 memory-blocks)
> e.g:
> 
> 	add_memory(0x1000, size_memory_block);
> 	add_memory(0x2000, size_memory_block);
> 	add_memory(0x3000, size_memory_block);
> 
> 	[memblock#0  ]
> 	[0 - 511 pfns      ] - vmemmaps for section#0
> 	[512 - 32767 pfns  ] - normal memory
> 
> 	[memblock#1 ]
> 	[32768 - 33279 pfns] - vmemmaps for section#1
> 	[33280 - 65535 pfns] - normal memory
> 
> 	[memblock#2 ]
> 	[65536 - 66047 pfns] - vmemmap for section#2
> 	[66048 - 98303 pfns] - normal memory

I wouldn't even care about documenting this right now. We have no user
so far, so spending 50% of the description on this topic isn't really
needed IMHO :)

> 
> or
> 	add_memory(0x1000, size_memory_block * 3);
> 
> 	[memblock #0 ]
>         [0 - 1535 pfns    ] - vmemmap for section#{0-2}
>         [1536 - 98303 pfns] - normal memory
> 
> When using larger memory blocks (1GB or 2GB), the principle is the same.
> 
> Of course, per whole-range granularity is nicer when it comes to have a large
> contigous area, while per memory-block granularity allows us to have flexibility
> when removing the memory.

E.g., in my virtio-mem I am currently adding all memory blocks
separately either way (to guarantee that remove_memory() works cleanly -
see __release_memory_resource()), and to control the amount of
not-offlined memory blocks (e.g., to make sure user space is actually
onlining them). As it's just a prototype, this might change of course in
the future.
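
For reference, the pfn layouts quoted above follow from simple arithmetic,
sketched here in userspace C. The helper names are hypothetical, and the
numbers assume 4K pages, 128MB sections and a 64-byte struct page; other
configurations give different values.

```c
#include <assert.h>

#define PAGE_SIZE_BYTES   4096UL        /* assumption: 4K pages */
#define SECTION_BYTES     (128UL << 20) /* assumption: 128MB sections */
#define STRUCT_PAGE_BYTES 64UL          /* assumption: sizeof(struct page) */

/* Number of pfns covered by one section. */
unsigned long pages_per_section(void)
{
	return SECTION_BYTES / PAGE_SIZE_BYTES;
}

/* Pages needed to hold the memmap of nr_sections sections (rounded up). */
unsigned long vmemmap_pages(unsigned long nr_sections)
{
	unsigned long bytes;

	bytes = nr_sections * pages_per_section() * STRUCT_PAGE_BYTES;
	return (bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}

/* First pfn of normal memory when the memmap sits at the range start. */
unsigned long first_normal_pfn(unsigned long start_pfn,
			       unsigned long nr_sections)
{
	assert(nr_sections > 0);
	return start_pfn + vmemmap_pages(nr_sections);
}

/* Last pfn that belongs to the hot-added range. */
unsigned long last_pfn(unsigned long start_pfn, unsigned long nr_sections)
{
	assert(nr_sections > 0);
	return start_pfn + nr_sections * pages_per_section() - 1;
}
```

Under these assumptions one section needs 512 memmap pages, so a block
starting at pfn 32768 has its normal memory starting at pfn 33280. A single
call covering three sections needs 1536 memmap pages, i.e. pfns 0 to 1535,
with normal memory running from pfn 1536 to pfn 98303.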

> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>  drivers/acpi/acpi_memhotplug.c |  2 +-
>  drivers/base/memory.c          |  2 +-
>  drivers/dax/kmem.c             |  2 +-
>  drivers/hv/hv_balloon.c        |  2 +-
>  drivers/s390/char/sclp_cmd.c   |  2 +-
>  drivers/xen/balloon.c          |  2 +-
>  include/linux/memory_hotplug.h | 25 ++++++++++++++++++++++---
>  mm/memory_hotplug.c            | 10 +++++-----
>  8 files changed, 33 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index e294f44a7850..d91b3584d4b2 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -207,7 +207,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
>  		if (node < 0)
>  			node = memory_add_physaddr_to_nid(info->start_addr);
>  
> -		result = __add_memory(node, info->start_addr, info->length);
> +		result = __add_memory(node, info->start_addr, info->length, 0);
>  
>  		/*
>  		 * If the memory block has been used by the kernel, add_memory()
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 154d5d4a0779..d30d0f6c8ad0 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -521,7 +521,7 @@ static ssize_t probe_store(struct device *dev, struct device_attribute *attr,
>  
>  	nid = memory_add_physaddr_to_nid(phys_addr);
>  	ret = __add_memory(nid, phys_addr,
> -			   MIN_MEMORY_BLOCK_SIZE * sections_per_block);
> +			   MIN_MEMORY_BLOCK_SIZE * sections_per_block, 0);
>  
>  	if (ret)
>  		goto out;
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index 3d0a7e702c94..e159184e0ba0 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -65,7 +65,7 @@ int dev_dax_kmem_probe(struct device *dev)
>  	new_res->flags = IORESOURCE_SYSTEM_RAM;
>  	new_res->name = dev_name(dev);
>  
> -	rc = add_memory(numa_node, new_res->start, resource_size(new_res));
> +	rc = add_memory(numa_node, new_res->start, resource_size(new_res), 0);
>  	if (rc) {
>  		release_resource(new_res);
>  		kfree(new_res);
> diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> index 6fb4ea5f0304..beb92bc56186 100644
> --- a/drivers/hv/hv_balloon.c
> +++ b/drivers/hv/hv_balloon.c
> @@ -731,7 +731,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
>  
>  		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
>  		ret = add_memory(nid, PFN_PHYS((start_pfn)),
> -				(HA_CHUNK << PAGE_SHIFT));
> +				(HA_CHUNK << PAGE_SHIFT), 0);
>  
>  		if (ret) {
>  			pr_err("hot_add memory failed error is %d\n", ret);
> diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c
> index 37d42de06079..f61026c7db7e 100644
> --- a/drivers/s390/char/sclp_cmd.c
> +++ b/drivers/s390/char/sclp_cmd.c
> @@ -406,7 +406,7 @@ static void __init add_memory_merged(u16 rn)
>  	if (!size)
>  		goto skip_add;
>  	for (addr = start; addr < start + size; addr += block_size)
> -		add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size);
> +		add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size, 0);
>  skip_add:
>  	first_rn = rn;
>  	num = 1;
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 4e11de6cde81..e4934ce40478 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -349,7 +349,7 @@ static enum bp_state reserve_additional_memory(void)
>  	mutex_unlock(&balloon_mutex);
>  	/* add_memory_resource() requires the device_hotplug lock */
>  	lock_device_hotplug();
> -	rc = add_memory_resource(nid, resource);
> +	rc = add_memory_resource(nid, resource, 0);
>  	unlock_device_hotplug();
>  	mutex_lock(&balloon_mutex);
>  
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index f46ea71b4ffd..45dece922d7c 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -54,6 +54,25 @@ enum {
>  };
>  
>  /*
> + * We want memmap (struct page array) to be allocated from the hotadded range.
> + * To do so, there are two possible ways depending on what the caller wants.
> + * 1) Allocate memmap pages whole hot-added range.
> + *    Here the caller will only call any add_memory() variant with the whole
> + *    memory address.
> + * 2) Allocate memmap pages per memblock
> + *    Here, the caller will call any add_memory() variant per memblock
> + *    granularity.
> + * The former implies that we will use the beginning of the hot-added range
> + * to store the memmap pages of the whole range, while the latter implies
> + * that we will use the beginning of each memblock to store its own memmap
> + * pages.

Can you make this documentation only state how MHP_MEMMAP_ON_MEMORY
works? (IOW, shrink it heavily to what we actually implement)

> + *
> + * Please note that this is only a hint, not a guarantee. Only selected
> + * architectures support it with SPARSE_VMEMMAP.
> + */
> +#define MHP_MEMMAP_ON_MEMORY	(1UL<<1)
> +
> +/*
>   * Restrictions for the memory hotplug:
>   * flags:  MHP_ flags
>   * altmap: alternative allocator for memmap array
> @@ -340,9 +359,9 @@ static inline void __remove_memory(int nid, u64 start, u64 size) {}
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>  
>  extern void __ref free_area_init_core_hotplug(int nid);
> -extern int __add_memory(int nid, u64 start, u64 size);
> -extern int add_memory(int nid, u64 start, u64 size);
> -extern int add_memory_resource(int nid, struct resource *resource);
> +extern int __add_memory(int nid, u64 start, u64 size, unsigned long flags);
> +extern int add_memory(int nid, u64 start, u64 size, unsigned long flags);
> +extern int add_memory_resource(int nid, struct resource *resource, unsigned long flags);
>  extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
>  		unsigned long nr_pages, struct vmem_altmap *altmap);
>  extern bool is_memblock_offlined(struct memory_block *mem);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 9a82e12bd0e7..3d97c3711333 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1046,7 +1046,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
>   *
>   * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
>   */
> -int __ref add_memory_resource(int nid, struct resource *res)
> +int __ref add_memory_resource(int nid, struct resource *res, unsigned long flags)
>  {
>  	struct mhp_restrictions restrictions = {};
>  	u64 start, size;
> @@ -1123,7 +1123,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
>  }
>  
>  /* requires device_hotplug_lock, see add_memory_resource() */
> -int __ref __add_memory(int nid, u64 start, u64 size)
> +int __ref __add_memory(int nid, u64 start, u64 size, unsigned long flags)
>  {
>  	struct resource *res;
>  	int ret;
> @@ -1132,18 +1132,18 @@ int __ref __add_memory(int nid, u64 start, u64 size)
>  	if (IS_ERR(res))
>  		return PTR_ERR(res);
>  
> -	ret = add_memory_resource(nid, res);
> +	ret = add_memory_resource(nid, res, flags);
>  	if (ret < 0)
>  		release_memory_resource(res);
>  	return ret;
>  }
>  
> -int add_memory(int nid, u64 start, u64 size)
> +int add_memory(int nid, u64 start, u64 size, unsigned long flags)
>  {
>  	int rc;
>  
>  	lock_device_hotplug();
> -	rc = __add_memory(nid, start, size);
> +	rc = __add_memory(nid, start, size, flags);
>  	unlock_device_hotplug();
>  
>  	return rc;
> 

Apart from the requested description/documentation changes

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb

* Re: [PATCH v3 2/5] mm: Introduce a new Vmemmap page-type
  2019-07-25 16:02 ` [PATCH v3 2/5] mm: Introduce a new Vmemmap page-type Oscar Salvador
@ 2019-07-26  8:48   ` David Hildenbrand
  2019-07-26  9:25     ` Oscar Salvador
  0 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2019-07-26  8:48 UTC (permalink / raw)
  To: Oscar Salvador, akpm
  Cc: dan.j.williams, pasha.tatashin, mhocko, anshuman.khandual,
	Jonathan.Cameron, vbabka, linux-mm, linux-kernel

On 25.07.19 18:02, Oscar Salvador wrote:
> This patch introduces a new Vmemmap page-type.
> 
> It also introduces some functions to ease the handling of vmemmap pages:
> 
> - vmemmap_nr_sections: Returns the number of sections that used vmemmap.
> 
> - vmemmap_nr_pages: Allows us to retrieve the amount of vmemmap pages
>   derivated from any vmemmap-page in the section. Useful for accounting
>   and to know how much to we have to skip in the case where vmemmap pages
>   need to be ignored.
> 
> - vmemmap_head: Returns the vmemmap head page
> 
> - SetPageVmemmap: Sets Reserved flag bit, and sets page->type to Vmemmap.
>   Setting the Reserved flag bit is just for extra protection, actually
>   we do not expect anyone to use these pages for anything.
> 
> - ClearPageVmemmap: Clears Reserved flag bit and page->type.
>   Only used when sections containing vmemmap pages are removed.
> 
> These functions will be used for the code handling Vmemmap pages.
> 

Much cleaner using the page type :)

> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>  include/linux/mm.h         | 17 +++++++++++++++++
>  include/linux/mm_types.h   |  5 +++++
>  include/linux/page-flags.h | 19 +++++++++++++++++++
>  3 files changed, 41 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 45f0ab0ed4f7..432175f8f8d2 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2904,6 +2904,23 @@ static inline bool debug_guardpage_enabled(void) { return false; }
>  static inline bool page_is_guard(struct page *page) { return false; }
>  #endif /* CONFIG_DEBUG_PAGEALLOC */
>  
> +static __always_inline struct page *vmemmap_head(struct page *page)
> +{
> +	return (struct page *)page->vmemmap_head;
> +}
> +
> +static __always_inline unsigned long vmemmap_nr_sections(struct page *page)
> +{
> +	struct page *head = vmemmap_head(page);
> +	return head->vmemmap_sections;
> +}
> +
> +static __always_inline unsigned long vmemmap_nr_pages(struct page *page)
> +{
> +	struct page *head = vmemmap_head(page);
> +	return head->vmemmap_pages - (page - head);
> +}
> +
>  #if MAX_NUMNODES > 1
>  void __init setup_nr_node_ids(void);
>  #else
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 6a7a1083b6fb..51dd227f2a6b 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -170,6 +170,11 @@ struct page {
>  			 * pmem backed DAX files are mapped.
>  			 */
>  		};
> +		struct {        /* Vmemmap pages */
> +			unsigned long vmemmap_head;
> +			unsigned long vmemmap_sections; /* Number of sections */
> +			unsigned long vmemmap_pages;    /* Number of pages */
> +		};
>  
>  		/** @rcu_head: You can use this to free a page by RCU. */
>  		struct rcu_head rcu_head;
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index f91cb8898ff0..75f302a532f9 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -708,6 +708,7 @@ PAGEFLAG_FALSE(DoubleMap)
>  #define PG_kmemcg	0x00000200
>  #define PG_table	0x00000400
>  #define PG_guard	0x00000800
> +#define PG_vmemmap     0x00001000
>  
>  #define PageType(page, flag)						\
>  	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
> @@ -764,6 +765,24 @@ PAGE_TYPE_OPS(Table, table)
>   */
>  PAGE_TYPE_OPS(Guard, guard)
>  
> +/*
> + * Vmemmap pages refers to those pages that are used to create the memmap
> + * array, and reside within the same memory range that was hotppluged, so
> + * they are self-hosted. (see include/linux/memory_hotplug.h)
> + */
> +PAGE_TYPE_OPS(Vmemmap, vmemmap)
> +static __always_inline void SetPageVmemmap(struct page *page)
> +{
> +	__SetPageVmemmap(page);
> +	__SetPageReserved(page);

So, the issue with some vmemmap pages is that the "struct pages" reside
on the memory they manage. (it is nice, but complicated - e.g. when
onlining/offlining)

I would expect that you properly initialize the struct pages for the
vmemmap pages (now it gets confusing :) ) when adding memory. The other
struct pages are initialized when onlining/offlining.

So, at this point, the pages should already be marked reserved, no? Or
are the struct pages for the vmemmap never initialized?

What zone do these vmemmap pages have? They are not assigned to any zone
and will never be :/

> +}
> +
> +static __always_inline void ClearPageVmemmap(struct page *page)
> +{
> +	__ClearPageVmemmap(page);
> +	__ClearPageReserved(page);

You sure you want to clear the reserved flag here? Is this function
really needed?

(when you add memory, you can mark all relevant pages as vmemmap pages,
which is valid until removing the memory)

Let's draw a picture so I am not confused

[ ------ added memory ------ ]
[ vmemmap]

The first page of the added memory is a vmemmap page AND contains its
own vmemmap, right?

When adding memory, you would initialize all struct pages of the
vmemmap (residing on itself) and mark them with SetPageVmemmap().

When removing memory, there is nothing to do, all struct pages are
dropped. So why do we need the ClearPageVmemmap() ?

-- 

Thanks,

David / dhildenb

* Re: [PATCH v3 2/5] mm: Introduce a new Vmemmap page-type
  2019-07-26  8:48   ` David Hildenbrand
@ 2019-07-26  9:25     ` Oscar Salvador
  2019-07-26  9:41       ` David Hildenbrand
  0 siblings, 1 reply; 22+ messages in thread
From: Oscar Salvador @ 2019-07-26  9:25 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: akpm, dan.j.williams, pasha.tatashin, mhocko, anshuman.khandual,
	Jonathan.Cameron, vbabka, linux-mm, linux-kernel

On Fri, Jul 26, 2019 at 10:48:54AM +0200, David Hildenbrand wrote:
> > Signed-off-by: Oscar Salvador <osalvador@suse.de>
> > ---
> >  include/linux/mm.h         | 17 +++++++++++++++++
> >  include/linux/mm_types.h   |  5 +++++
> >  include/linux/page-flags.h | 19 +++++++++++++++++++
> >  3 files changed, 41 insertions(+)
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 45f0ab0ed4f7..432175f8f8d2 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -2904,6 +2904,23 @@ static inline bool debug_guardpage_enabled(void) { return false; }
> >  static inline bool page_is_guard(struct page *page) { return false; }
> >  #endif /* CONFIG_DEBUG_PAGEALLOC */
> >  
> > +static __always_inline struct page *vmemmap_head(struct page *page)
> > +{
> > +	return (struct page *)page->vmemmap_head;
> > +}
> > +
> > +static __always_inline unsigned long vmemmap_nr_sections(struct page *page)
> > +{
> > +	struct page *head = vmemmap_head(page);
> > +	return head->vmemmap_sections;
> > +}
> > +
> > +static __always_inline unsigned long vmemmap_nr_pages(struct page *page)
> > +{
> > +	struct page *head = vmemmap_head(page);
> > +	return head->vmemmap_pages - (page - head);
> > +}
> > +
> >  #if MAX_NUMNODES > 1
> >  void __init setup_nr_node_ids(void);
> >  #else
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 6a7a1083b6fb..51dd227f2a6b 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -170,6 +170,11 @@ struct page {
> >  			 * pmem backed DAX files are mapped.
> >  			 */
> >  		};
> > +		struct {        /* Vmemmap pages */
> > +			unsigned long vmemmap_head;
> > +			unsigned long vmemmap_sections; /* Number of sections */
> > +			unsigned long vmemmap_pages;    /* Number of pages */
> > +		};
> >  
> >  		/** @rcu_head: You can use this to free a page by RCU. */
> >  		struct rcu_head rcu_head;
> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > index f91cb8898ff0..75f302a532f9 100644
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -708,6 +708,7 @@ PAGEFLAG_FALSE(DoubleMap)
> >  #define PG_kmemcg	0x00000200
> >  #define PG_table	0x00000400
> >  #define PG_guard	0x00000800
> > +#define PG_vmemmap     0x00001000
> >  
> >  #define PageType(page, flag)						\
> >  	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
> > @@ -764,6 +765,24 @@ PAGE_TYPE_OPS(Table, table)
> >   */
> >  PAGE_TYPE_OPS(Guard, guard)
> >  
> > +/*
> > + * Vmemmap pages refers to those pages that are used to create the memmap
> > + * array, and reside within the same memory range that was hotppluged, so
> > + * they are self-hosted. (see include/linux/memory_hotplug.h)
> > + */
> > +PAGE_TYPE_OPS(Vmemmap, vmemmap)
> > +static __always_inline void SetPageVmemmap(struct page *page)
> > +{
> > +	__SetPageVmemmap(page);
> > +	__SetPageReserved(page);
> 
> So, the issue with some vmemmap pages is that the "struct pages" reside
> on the memory they manage. (it is nice, but complicated - e.g. when
> onlining/offlining)

Hi David,

Not really.
Vmemmap pages are simply skipped when onlining/offlining.
We do not need to a) send them to the buddy allocator or b) migrate them.
A look at patch#4 will probably help, as the crux of the matter is there.

> 
> I would expect that you properly initialize the struct pages for the
> vmemmap pages (now it gets confusing :) ) when adding memory. The other
> struct pages are initialized when onlining/offlining.
> 
> So, at this point, the pages should already be marked reserved, no? Or
> are the struct pages for the vmemmap never initialized?
> 
> What zone do these vmemmap pages have? They are not assigned to any zone
> and will never be :/

This patch is only a preparation, the real "fun" is in patch#4.

Vmemmap pages initialization occurs in mhp_mark_vmemmap_pages, called from
__add_pages() (patch#4).
In there we a) mark the page as vmemmap and b) initialize the fields we need to
track some metadata (sections, pages and head).
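
As a rough illustration of that metadata (a userspace sketch, not code from
the patch), the three fields and the vmemmap_nr_pages() arithmetic can be
modelled as below. The mock_* names are hypothetical, and the head is stored
as a pointer here, whereas the patch stores it as an unsigned long.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the three fields the patch adds to struct page. */
struct mock_page_meta {
	struct mock_page_meta *vmemmap_head;	/* first vmemmap page of the range */
	unsigned long vmemmap_sections;		/* meaningful on the head page only */
	unsigned long vmemmap_pages;		/* meaningful on the head page only */
};

/* Hypothetical helper: record the metadata for nr_pages vmemmap pages. */
void mock_mark_vmemmap_pages(struct mock_page_meta *pages,
			     unsigned long nr_pages,
			     unsigned long nr_sections)
{
	unsigned long i;

	assert(pages != NULL && nr_pages > 0);
	pages[0].vmemmap_sections = nr_sections;
	pages[0].vmemmap_pages = nr_pages;
	/* Every vmemmap page, the head included, points back at the head. */
	for (i = 0; i < nr_pages; i++)
		pages[i].vmemmap_head = &pages[0];
}

/* Same arithmetic as vmemmap_nr_pages(): pages left from 'page' onwards. */
unsigned long mock_vmemmap_nr_pages(const struct mock_page_meta *page)
{
	const struct mock_page_meta *head = page->vmemmap_head;

	return head->vmemmap_pages - (unsigned long)(page - head);
}
```

With 512 vmemmap pages, asking from the head yields 512 and asking from the
last page yields 1, which is the "how much do we have to skip" value.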

In __init_single_page(), when onlining, the rest of the fields will be set up
properly (zone, refcount, etc).

Chunk from patch#4:

static void __meminit __init_single_page(struct page *page, unsigned long pfn,
                                unsigned long zone, int nid)
{
        if (PageVmemmap(page))
                /*
                 * Vmemmap pages need to preserve their state.
                 */
                goto preserve_state;

        mm_zero_struct_page(page);
        page_mapcount_reset(page);
        INIT_LIST_HEAD(&page->lru);
preserve_state:
        init_page_count(page);
        set_page_links(page, zone, nid, pfn);
        page_cpupid_reset_last(page);
        page_kasan_tag_reset(page);

So, vmemmap pages will fall within the same zone as the range we are adding,
that does not change.
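
The control flow of that chunk can be sketched in userspace as follows. This
is only a model under stated assumptions: the struct and function names are
hypothetical, a bool stands in for PageVmemmap(), and a single field stands
in for the head/sections/pages metadata.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for struct page. */
struct mock_page_state {
	bool is_vmemmap;	/* stands in for PageVmemmap(page) */
	unsigned long metadata;	/* stands in for head/sections/pages */
	int refcount;
	int zone;
};

void mock_init_single_page(struct mock_page_state *page, int zone)
{
	assert(page != NULL);

	/* Normal pages are wiped; vmemmap pages keep what was set at add time. */
	if (!page->is_vmemmap)
		memset(page, 0, sizeof(*page));

	/* Common tail: runs for vmemmap and normal pages alike. */
	page->refcount = 1;
	page->zone = zone;
}
```

Both kinds of page end up with a refcount and the zone of the range being
onlined; only the vmemmap page keeps its earlier metadata.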

> > +}
> > +
> > +static __always_inline void ClearPageVmemmap(struct page *page)
> > +{
> > +	__ClearPageVmemmap(page);
> > +	__ClearPageReserved(page);
> 
> You sure you want to clear the reserved flag here? Is this function
> really needed?
> 
> (when you add memory, you can mark all relevant pages as vmemmap pages,
> which is valid until removing the memory)
> 
> Let's draw a picture so I am not confused
> 
> [ ------ added memory ------ ]
> [ vmemmap]
> 
> The first page of the added memory is a vmemmap page AND contains its
> own vmemmap, right?

Not only the first page.
Depending on how large the chunk you are adding is, the number of vmemmap
pages will vary, because we need to cover the memmaps for the whole range.

e.g:

 - 128MB (1 section) = 512 vmemmap pages at the beginning of the range
 - 256MB (2 sections) = 1024 vmemmap pages at the beginning of the range
 ...
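
Those numbers follow from simple arithmetic. A minimal sketch, assuming 4 KiB
base pages and a 64-byte struct page (both are assumptions typical for
x86_64, not values stated in this thread), with a hypothetical helper name:

```c
#include <assert.h>

#define ASSUMED_PAGE_SIZE	4096ULL	/* assumption: 4 KiB base pages */
#define ASSUMED_STRUCT_PAGE	64ULL	/* assumption: sizeof(struct page) */

/* Number of pages needed to hold the memmap of a range of 'bytes' bytes. */
unsigned long long vmemmap_pages_for(unsigned long long bytes)
{
	unsigned long long nr_pages, memmap_bytes;

	assert(bytes % ASSUMED_PAGE_SIZE == 0);
	nr_pages = bytes / ASSUMED_PAGE_SIZE;
	memmap_bytes = nr_pages * ASSUMED_STRUCT_PAGE;

	/* Round up to whole pages. */
	return (memmap_bytes + ASSUMED_PAGE_SIZE - 1) / ASSUMED_PAGE_SIZE;
}
```

A 128 MiB section has 32768 pages, so its memmap takes 32768 * 64 bytes =
2 MiB, i.e. 512 pages. Under the same assumptions, 384 MiB added in one call
needs 1536 vmemmap pages.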

> When adding memory, you would initialize set all struct pages of the
> vmemmap (residing on itself) and set them to SetPageVmemmap().
> 
> When removing memory, there is nothing to do, all struct pages are
> dropped. So why do we need the ClearPageVmemmap() ?

Well, it is not really needed as we only call ClearPageVmemmap when we are
actually removing the memory with vmemmap_free()->...
So one could argue that since the memory is going away, there is no need
to clear anything in there.

I just did it for consistency.

I can drop it if you feel strongly about it.

-- 
Oscar Salvador
SUSE L3

* Re: [PATCH v3 1/5] mm,memory_hotplug: Introduce MHP_MEMMAP_ON_MEMORY
  2019-07-26  8:34   ` David Hildenbrand
@ 2019-07-26  9:29     ` Oscar Salvador
  2019-07-26  9:37       ` David Hildenbrand
  0 siblings, 1 reply; 22+ messages in thread
From: Oscar Salvador @ 2019-07-26  9:29 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: akpm, dan.j.williams, pasha.tatashin, mhocko, anshuman.khandual,
	Jonathan.Cameron, vbabka, linux-mm, linux-kernel

On Fri, Jul 26, 2019 at 10:34:47AM +0200, David Hildenbrand wrote:
> > Want to add 384MB (3 sections, 3 memory-blocks)
> > e.g:
> > 
> > 	add_memory(0x1000, size_memory_block);
> > 	add_memory(0x2000, size_memory_block);
> > 	add_memory(0x3000, size_memory_block);
> > 
> > 	[memblock#0  ]
> > 	[0 - 511 pfns      ] - vmemmaps for section#0
> > 	[512 - 32767 pfns  ] - normal memory
> > 
> > 	[memblock#1 ]
> > 	[32768 - 33279 pfns] - vmemmaps for section#1
> > 	[33280 - 65535 pfns] - normal memory
> > 
> > 	[memblock#2 ]
> > 	[65536 - 66047 pfns] - vmemmap for section#2
> > 	[66048 - 98304 pfns] - normal memory
> 
> I wouldn't even care about documenting this right now. We have no user
> so far, so spending 50% of the description on this topic isn't really
> needed IMHO :)

Fair enough, I could drop it.
Was just trying to be extra clear.

> 
> > 
> > or
> > 	add_memory(0x1000, size_memory_block * 3);
> > 
> > 	[memblock #0 ]
> >         [0 - 1533 pfns    ] - vmemmap for section#{0-2}
> >         [1534 - 98304 pfns] - normal memory
> > 
> > When using larger memory blocks (1GB or 2GB), the principle is the same.
> > 
> > Of course, per whole-range granularity is nicer when it comes to have a large
> > contigous area, while per memory-block granularity allows us to have flexibility
> > when removing the memory.
> 
> E.g., in my virtio-mem I am currently adding all memory blocks
> separately either way (to guranatee that remove_memory() works cleanly -
> see __release_memory_resource()), and to control the amount of
> not-offlined memory blocks (e.g., to make user space is actually
> onlining them). As it's just a prototype, this might change of course in
> the future.

What is virtio-mem for? Did it arise from a concrete need?
Is it something you could try this patch on?

> >  /*
> > + * We want memmap (struct page array) to be allocated from the hotadded range.
> > + * To do so, there are two possible ways depending on what the caller wants.
> > + * 1) Allocate memmap pages whole hot-added range.
> > + *    Here the caller will only call any add_memory() variant with the whole
> > + *    memory address.
> > + * 2) Allocate memmap pages per memblock
> > + *    Here, the caller will call any add_memory() variant per memblock
> > + *    granularity.
> > + * The former implies that we will use the beginning of the hot-added range
> > + * to store the memmap pages of the whole range, while the latter implies
> > + * that we will use the beginning of each memblock to store its own memmap
> > + * pages.
> 
> Can you make this documentation only state how MHP_MEMMAP_ON_MEMORY
> works? (IOW, shrink it heavily to what we actually implement)

Sure.

> Apart from the requested description/documentation changes
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>

Thanks for having a look David ;-)
> 
> -- 
> 
> Thanks,
> 
> David / dhildenb

-- 
Oscar Salvador
SUSE L3

* Re: [PATCH v3 1/5] mm,memory_hotplug: Introduce MHP_MEMMAP_ON_MEMORY
  2019-07-26  9:29     ` Oscar Salvador
@ 2019-07-26  9:37       ` David Hildenbrand
  0 siblings, 0 replies; 22+ messages in thread
From: David Hildenbrand @ 2019-07-26  9:37 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: akpm, dan.j.williams, pasha.tatashin, mhocko, anshuman.khandual,
	Jonathan.Cameron, vbabka, linux-mm, linux-kernel

>>
>>>
>>> or
>>> 	add_memory(0x1000, size_memory_block * 3);
>>>
>>> 	[memblock #0 ]
>>>         [0 - 1533 pfns    ] - vmemmap for section#{0-2}
>>>         [1534 - 98304 pfns] - normal memory
>>>
>>> When using larger memory blocks (1GB or 2GB), the principle is the same.
>>>
>>> Of course, per whole-range granularity is nicer when it comes to have a large
>>> contigous area, while per memory-block granularity allows us to have flexibility
>>> when removing the memory.
>>
>> E.g., in my virtio-mem I am currently adding all memory blocks
>> separately either way (to guranatee that remove_memory() works cleanly -
>> see __release_memory_resource()), and to control the amount of
>> not-offlined memory blocks (e.g., to make user space is actually
>> onlining them). As it's just a prototype, this might change of course in
>> the future.
> 
> What is virtio-mem for? Did it that raised from a need?
> Is it something you could try this patch on?

virtio-mem is a paravirtualized way of hotplugging/removing memory to/from a
guest (similar to, but different from, e.g., the hv-balloon). It
adds/removes memory to/from the system. In the long term, it will also try to
act similar to (but different from) a balloon, but that will require more
work. In the first shot, it's all about adding/removing memory in the
smallest granularity possible.

The old prototype was

https://lwn.net/Articles/755423/

Since then, a lot has changed. More up-to-date information is at

https://events.linuxfoundation.org/wp-content/uploads/2017/12/virtio-mem-Paravirtualized-Memory-David-Hildenbrand-Red-Hat-1.pdf

There is also a recording of the presentation on youtube.

The current prototype is unfortunately not in a state yet that allows me
to test with this patch set - my Master's thesis consumed most of my
energy during the last year. I just started hacking on it again.

-- 

Thanks,

David / dhildenb

* Re: [PATCH v3 2/5] mm: Introduce a new Vmemmap page-type
  2019-07-26  9:25     ` Oscar Salvador
@ 2019-07-26  9:41       ` David Hildenbrand
  2019-07-26 10:11         ` Oscar Salvador
  0 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2019-07-26  9:41 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: akpm, dan.j.williams, pasha.tatashin, mhocko, anshuman.khandual,
	Jonathan.Cameron, vbabka, linux-mm, linux-kernel

On 26.07.19 11:25, Oscar Salvador wrote:
> On Fri, Jul 26, 2019 at 10:48:54AM +0200, David Hildenbrand wrote:
>>> Signed-off-by: Oscar Salvador <osalvador@suse.de>
>>> ---
>>>  include/linux/mm.h         | 17 +++++++++++++++++
>>>  include/linux/mm_types.h   |  5 +++++
>>>  include/linux/page-flags.h | 19 +++++++++++++++++++
>>>  3 files changed, 41 insertions(+)
>>>
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index 45f0ab0ed4f7..432175f8f8d2 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -2904,6 +2904,23 @@ static inline bool debug_guardpage_enabled(void) { return false; }
>>>  static inline bool page_is_guard(struct page *page) { return false; }
>>>  #endif /* CONFIG_DEBUG_PAGEALLOC */
>>>  
>>> +static __always_inline struct page *vmemmap_head(struct page *page)
>>> +{
>>> +	return (struct page *)page->vmemmap_head;
>>> +}
>>> +
>>> +static __always_inline unsigned long vmemmap_nr_sections(struct page *page)
>>> +{
>>> +	struct page *head = vmemmap_head(page);
>>> +	return head->vmemmap_sections;
>>> +}
>>> +
>>> +static __always_inline unsigned long vmemmap_nr_pages(struct page *page)
>>> +{
>>> +	struct page *head = vmemmap_head(page);
>>> +	return head->vmemmap_pages - (page - head);
>>> +}
>>> +
>>>  #if MAX_NUMNODES > 1
>>>  void __init setup_nr_node_ids(void);
>>>  #else
>>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>>> index 6a7a1083b6fb..51dd227f2a6b 100644
>>> --- a/include/linux/mm_types.h
>>> +++ b/include/linux/mm_types.h
>>> @@ -170,6 +170,11 @@ struct page {
>>>  			 * pmem backed DAX files are mapped.
>>>  			 */
>>>  		};
>>> +		struct {        /* Vmemmap pages */
>>> +			unsigned long vmemmap_head;
>>> +			unsigned long vmemmap_sections; /* Number of sections */
>>> +			unsigned long vmemmap_pages;    /* Number of pages */
>>> +		};
>>>  
>>>  		/** @rcu_head: You can use this to free a page by RCU. */
>>>  		struct rcu_head rcu_head;
>>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>>> index f91cb8898ff0..75f302a532f9 100644
>>> --- a/include/linux/page-flags.h
>>> +++ b/include/linux/page-flags.h
>>> @@ -708,6 +708,7 @@ PAGEFLAG_FALSE(DoubleMap)
>>>  #define PG_kmemcg	0x00000200
>>>  #define PG_table	0x00000400
>>>  #define PG_guard	0x00000800
>>> +#define PG_vmemmap     0x00001000
>>>  
>>>  #define PageType(page, flag)						\
>>>  	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
>>> @@ -764,6 +765,24 @@ PAGE_TYPE_OPS(Table, table)
>>>   */
>>>  PAGE_TYPE_OPS(Guard, guard)
>>>  
>>> +/*
>>> + * Vmemmap pages refers to those pages that are used to create the memmap
>>> + * array, and reside within the same memory range that was hotppluged, so
>>> + * they are self-hosted. (see include/linux/memory_hotplug.h)
>>> + */
>>> +PAGE_TYPE_OPS(Vmemmap, vmemmap)
>>> +static __always_inline void SetPageVmemmap(struct page *page)
>>> +{
>>> +	__SetPageVmemmap(page);
>>> +	__SetPageReserved(page);
>>
>> So, the issue with some vmemmap pages is that the "struct pages" reside
>> on the memory they manage. (it is nice, but complicated - e.g. when
>> onlining/offlining)
> 
> Hi David,
> 
> Not really.
> Vemmap pages are just skipped when onling/offlining handling.
> We do not need them to a) send to the buddy and b) migrate them over.
> A look at patch#4 will probably help, as the crux of the matter is there.

Right, but you have to prevent the onlining code from reinitializing
the vmemmap when you online the first memory block. Will dive
into the details (patch #4) next (maybe not today, but early next week) :)

> 
>>
>> I would expect that you properly initialize the struct pages for the
>> vmemmap pages (now it gets confusing :) ) when adding memory. The other
>> struct pages are initialized when onlining/offlining.
>>
>> So, at this point, the pages should already be marked reserved, no? Or
>> are the struct pages for the vmemmap never initialized?
>>
>> What zone do these vmemmap pages have? They are not assigned to any zone
>> and will never be :/
> 
> This patch is only a preparation, the real "fun" is in patch#4.
> 
> Vmemmap pages initialization occurs in mhp_mark_vmemmap_pages, called from
> __add_pages() (patch#4).
> In there we a) mark the page as vmemmap and b) initialize the fields we need to
> track some medata (sections, pages and head).
> 
> In __init_single_page(), when onlining, the rest of the fields will be set up
> properly (zone, refcount, etc).
> 
> Chunk from patch#4:
> 
> static void __meminit __init_single_page(struct page *page, unsigned long pfn,
>                                 unsigned long zone, int nid)
> {
>         if (PageVmemmap(page))
>                 /*
>                  * Vmemmap pages need to preserve their state.
>                  */
>                 goto preserve_state;

Can you be sure there are no false positives? (if I remember correctly,
this memory might be completely uninitialized - I might be wrong)

> 
>         mm_zero_struct_page(page);
>         page_mapcount_reset(page);
>         INIT_LIST_HEAD(&page->lru);
> preserve_state:
>         init_page_count(page);
>         set_page_links(page, zone, nid, pfn);
>         page_cpupid_reset_last(page);
>         page_kasan_tag_reset(page);
> 
> So, vmemmap pages will fall within the same zone as the range we are adding,
> that does not change.

I wonder if that is the right thing to do, hmmmm, because they are
effectively not part of that zone (not online)

Will have a look at the details :)

> 
>>> +}
>>> +
>>> +static __always_inline void ClearPageVmemmap(struct page *page)
>>> +{
>>> +	__ClearPageVmemmap(page);
>>> +	__ClearPageReserved(page);
>>
>> You sure you want to clear the reserved flag here? Is this function
>> really needed?
>>
>> (when you add memory, you can mark all relevant pages as vmemmap pages,
>> which is valid until removing the memory)
>>
>> Let's draw a picture so I am not confused
>>
>> [ ------ added memory ------ ]
>> [ vmemmap]
>>
>> The first page of the added memory is a vmemmap page AND contains its
>> own vmemmap, right?
> 
> Not only the first page.
> Depending on how large is the chunk you are adding, the number of vmemmap
> pages will vary, because we need to cover the memmaps for the range.
> 
> e.g:
> 
>  - 128MB (1 section) = 512 vmemmap pages at the beginning of the range
>  - 256MB (2 section) = 1024 vmemmap pages at the beginning of the range
>  ...
> 

Right.

>> When adding memory, you would initialize set all struct pages of the
>> vmemmap (residing on itself) and set them to SetPageVmemmap().
>>
>> When removing memory, there is nothing to do, all struct pages are
>> dropped. So why do we need the ClearPageVmemmap() ?
> 
> Well, it is not really needed as we only call ClearPageVmemmap when we are
> actually removing the memory with vmemmap_free()->...
> So one could argue that since the memory is going away, there is no need
> to clear anything in there.
> 
> I just made it for consistency purposes.
> 
> Can drop it if feeling strong here.

Not strong, was just wondering why that is needed at all in the big
picture :)

-- 

Thanks,

David / dhildenb

* Re: [PATCH v3 2/5] mm: Introduce a new Vmemmap page-type
  2019-07-26  9:41       ` David Hildenbrand
@ 2019-07-26 10:11         ` Oscar Salvador
  0 siblings, 0 replies; 22+ messages in thread
From: Oscar Salvador @ 2019-07-26 10:11 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: akpm, dan.j.williams, pasha.tatashin, mhocko, anshuman.khandual,
	Jonathan.Cameron, vbabka, linux-mm, linux-kernel

On Fri, Jul 26, 2019 at 11:41:46AM +0200, David Hildenbrand wrote:
> > static void __meminit __init_single_page(struct page *page, unsigned long pfn,
> >                                 unsigned long zone, int nid)
> > {
> >         if (PageVmemmap(page))
> >                 /*
> >                  * Vmemmap pages need to preserve their state.
> >                  */
> >                 goto preserve_state;
> 
> Can you be sure there are no false positives? (if I remember correctly,
> this memory might be completely uninitialized - I might be wrong)

Normal pages reaching this point will be uninitialized or
poison-initialized.

Vmemmap pages are initialized to 0 in mhp_mark_vmemmap_pages,
before reaching here.

For a false positive to occur, the page would have to be marked reserved and
page->page_type would have to hold a specific value.
If we feel unsure about this, I could add another check just for
this situation, where we initialize another field of struct page
to a specific/magic value, so we would have three checks at
this stage.
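
To make the "specific value" part concrete, here is a userspace sketch of the
PageType() test quoted from page-flags.h in patch 2. PG_vmemmap is the value
from the patch; PAGE_TYPE_BASE = 0xf0000000 and the "set a type by clearing
its bit" convention are assumptions about kernels of that era, and the mock_*
names are hypothetical. The Reserved flag check is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_TYPE_BASE	0xf0000000u	/* assumption: value in kernels of that era */
#define PG_vmemmap	0x00001000u	/* value introduced by patch 2 */

/* Mirrors PageType(page, PG_vmemmap): base bits set, type bit clear. */
bool mock_page_is_vmemmap(uint32_t page_type)
{
	return (page_type & (PAGE_TYPE_BASE | PG_vmemmap)) == PAGE_TYPE_BASE;
}

/* Assumed convention: a type is set by clearing its bit in page_type. */
uint32_t mock_set_vmemmap(uint32_t page_type)
{
	return page_type & ~PG_vmemmap;
}
```

In this model neither an all-ones word nor an all-zeroes word passes the
test; a stray value would need the top four bits set and bit 12 clear.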

> 
> > 
> >         mm_zero_struct_page(page);
> >         page_mapcount_reset(page);
> >         INIT_LIST_HEAD(&page->lru);
> > preserve_state:
> >         init_page_count(page);
> >         set_page_links(page, zone, nid, pfn);
> >         page_cpupid_reset_last(page);
> >         page_kasan_tag_reset(page);
> > 
> > So, vmemmap pages will fall within the same zone as the range we are adding,
> > that does not change.
> 
> I wonder if that is the right thing to do, hmmmm, because they are
> effectively not part of that zone (not online)
> 
> Will have a look at the details :)

I might be wrong here, but last time I checked, pages that are used for memmaps
at boot time (not hotplugged) are still linked to some zone.

Will have to double check though.

If that is not the case, it would be easier, but I am afraid it is.


-- 
Oscar Salvador
SUSE L3

* Re: [PATCH v3 0/5] Allocate memmap from hotadded memory
  2019-07-25 16:02 [PATCH v3 0/5] Allocate memmap from hotadded memory Oscar Salvador
                   ` (5 preceding siblings ...)
  2019-07-25 16:56 ` [PATCH v3 0/5] Allocate memmap from hotadded memory David Hildenbrand
@ 2019-08-01  7:39 ` Oscar Salvador
  2019-08-01  8:17   ` David Hildenbrand
  2019-08-01 18:46 ` David Hildenbrand
  7 siblings, 1 reply; 22+ messages in thread
From: Oscar Salvador @ 2019-08-01  7:39 UTC (permalink / raw)
  To: akpm
  Cc: dan.j.williams, david, pasha.tatashin, mhocko, anshuman.khandual,
	Jonathan.Cameron, vbabka, linux-mm, linux-kernel

On Thu, Jul 25, 2019 at 06:02:02PM +0200, Oscar Salvador wrote:
> Here we go with v3.
> 
> v3 -> v2:
>         * Rewrite about vmemmap pages handling.
>           Prior to this version, I was (ab)using hugepages fields
>           from struct page, while here I am officially adding a new
>           sub-page type with the fields I need.
> 
>         * Drop MHP_MEMMAP_{MEMBLOCK,DEVICE} in favor of MHP_MEMMAP_ON_MEMORY.
>           While I am still not 100% if this the right decision, and while I
>           still see some gaining in having MHP_MEMMAP_{MEMBLOCK,DEVICE},
>           having only one flag ease the code.
>           If the user wants to allocate memmaps per memblock, it'll
>           have to call add_memory() variants with memory-block granularity.
> 
>           If we happen to find a clearer use case for a MHP_MEMMAP_MEMBLOCK
>           flag in the future, so that the user does not have to bother about the way
>           it calls add_memory() variants, but only passes a flag, we can add it.
>           Actually, I already had the code, so adding it in the future is going to be
>           easy.
> 
>         * Granularity check when hot-removing memory.
>           Just checking that the granularity is the same.
> 
> [Testing]
> 
>  - x86_64: small and large memblocks (128MB, 1G and 2G)
> 
> So far, only acpi memory hotplug uses the new flag.
> The other callers can be changed depending on their needs.
> 
> [Coverletter]
> 
> This is another step to make memory hotplug more usable. The primary
> goal of this patchset is to reduce memory overhead of the hot-added
> memory (at least for SPARSEMEM_VMEMMAP memory model). The current way we use
> to populate memmap (struct page array) has two main drawbacks:
> 
> a) it consumes additional memory until the hotadded memory itself is
>    onlined and
> b) memmap might end up on a different numa node which is especially true
>    for movable_node configuration.
> 
> a) is a problem especially for memory hotplug based memory "ballooning"
>    solutions, where the delay between physical memory hotplug and the
>    onlining can lead to OOM, and that led to the introduction of hacks like auto
>    onlining (see 31bc3858ea3e ("memory-hotplug: add automatic onlining
>    policy for the newly added memory")).
> 
> b) can have performance drawbacks.
> 
> One way to mitigate all these issues is to simply allocate memmap array
> (which is the largest memory footprint of the physical memory hotplug)
> from the hot-added memory itself. SPARSEMEM_VMEMMAP memory model allows
> us to map any pfn range so the memory doesn't need to be online to be
> usable for the array. See patch 3 for more details.
> This feature is only usable when CONFIG_SPARSEMEM_VMEMMAP is set.
> 
> [Overall design]:
> 
> Implementation-wise, we reuse the vmem_altmap infrastructure to override
> the default allocator used by vmemmap_populate. Once the memmap is
> allocated, we need a way to mark the altmap pfns used for the allocation.
> If MHP_MEMMAP_ON_MEMORY flag was passed, we set up the layout of the
> altmap structure at the beginning of __add_pages(), and then we call
> mark_vmemmap_pages().
> 
> The MHP_MEMMAP_ON_MEMORY flag requests that memmaps be allocated
> from the hot-added range.
> If a caller wants memmaps to be allocated per memory block, it will
> have to call add_memory() variants in memory-block granularity
> spanning the whole range, while if it wants to allocate memmaps
> per whole memory range, just one call will do.
> 
> Want to add 384MB (3 sections, 3 memory-blocks)
> e.g:
> 
> add_memory(0x1000, size_memory_block);
> add_memory(0x2000, size_memory_block);
> add_memory(0x3000, size_memory_block);
> 
> or
> 
> add_memory(0x1000, size_memory_block * 3);
> 
> One thing worth mentioning is that vmemmap pages residing in movable memory are not a
> show-stopper for that memory to be offlined/migrated away.
> Vmemmap pages are just ignored in that case and they stick around until the sections
> referred to by those vmemmap pages are hot-removed.

Gentle ping :-)

-- 
Oscar Salvador
SUSE L3


* Re: [PATCH v3 0/5] Allocate memmap from hotadded memory
  2019-08-01  7:39 ` Oscar Salvador
@ 2019-08-01  8:17   ` David Hildenbrand
  2019-08-01  8:39     ` Oscar Salvador
  0 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2019-08-01  8:17 UTC (permalink / raw)
  To: Oscar Salvador, akpm
  Cc: dan.j.williams, pasha.tatashin, mhocko, anshuman.khandual,
	Jonathan.Cameron, vbabka, linux-mm, linux-kernel

On 01.08.19 09:39, Oscar Salvador wrote:
> Gentle ping :-)
> 

I am not yet sure about two things:


1. Checking uninitialized pages for PageVmemmap() when onlining. I
consider this very bad.

I wonder if it would be better to remember for each memory block the pfn
offset, which will be used when onlining/offlining.

I have some patches that convert online_pages() to
__online_memory_block(struct memory_block *mem) - which fits perfectly with
the current user. So taking the offset and processing only these pages
when onlining would be easy. To do the same for offline_pages(), we
first have to rework memtrace code. But when offlining, all memmaps have
already been initialized.


2. Setting the Vmemmap pages to the zone of the online type. This would
mean we would have unmovable data on pages marked to belong to the
movable zone. I would suggest always setting them to the NORMAL zone when
onlining - and initializing the vmemmap of the vmemmap pages directly
during add_memory() instead.
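
To make point 1 concrete, here is a minimal userspace sketch of the "remember a pfn offset per memory block" idea. It is not code from this series: struct memblock_sketch, its nr_vmemmap_pages field and onlineable_range() are hypothetical names used only for illustration, and the real struct memory_block has no such field in this patchset.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct memory_block. */
struct memblock_sketch {
	unsigned long start_pfn;        /* first pfn of the block */
	unsigned long nr_pages;         /* pages spanned by the block */
	unsigned long nr_vmemmap_pages; /* leading pages that host the memmap */
};

/*
 * Compute the pfn range that onlining/offlining would actually process,
 * skipping the leading vmemmap pages without looking at any struct page.
 * Returns false if the block holds nothing but vmemmap pages.
 */
static bool onlineable_range(const struct memblock_sketch *mem,
			     unsigned long *start_pfn,
			     unsigned long *nr_pages)
{
	assert(mem->nr_vmemmap_pages <= mem->nr_pages);

	*start_pfn = mem->start_pfn + mem->nr_vmemmap_pages;
	*nr_pages = mem->nr_pages - mem->nr_vmemmap_pages;
	return *nr_pages != 0;
}
```

With such an offset, the online and offline paths would only ever touch pfns at or above start_pfn + nr_vmemmap_pages, so no PageVmemmap() check on possibly uninitialized pages would be needed.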

-- 

Thanks,

David / dhildenb


* Re: [PATCH v3 0/5] Allocate memmap from hotadded memory
  2019-08-01  8:17   ` David Hildenbrand
@ 2019-08-01  8:39     ` Oscar Salvador
  2019-08-01  8:44       ` David Hildenbrand
  0 siblings, 1 reply; 22+ messages in thread
From: Oscar Salvador @ 2019-08-01  8:39 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: akpm, dan.j.williams, pasha.tatashin, mhocko, anshuman.khandual,
	Jonathan.Cameron, vbabka, linux-mm, linux-kernel

On Thu, Aug 01, 2019 at 10:17:23AM +0200, David Hildenbrand wrote:
> I am not yet sure about two things:
> 
> 
> 1. Checking uninitialized pages for PageVmemmap() when onlining. I
> consider this very bad.
> 
> I wonder if it would be better to remember for each memory block the pfn
> offset, which will be used when onlining/offlining.
> 
> I have some patches that convert online_pages() to
> __online_memory_block(struct memory_block *mem) - which fits perfectly with
> the current user. So taking the offset and processing only these pages
> when onlining would be easy. To do the same for offline_pages(), we
> first have to rework memtrace code. But when offlining, all memmaps have
> already been initialized.

This is true. I did not really like that either, but it was one of the things
I came up with.
I already have some ideas on how to avoid checking the page; I will work on it.

> 2. Setting the Vmemmap pages to the zone of the online type. This would
> mean we would have unmovable data on pages marked to belong to the
> movable zone. I would suggest always setting them to the NORMAL zone when
> onlining - and initializing the vmemmap of the vmemmap pages directly
> during add_memory() instead.

IMHO, having vmemmap pages in ZONE_MOVABLE does not matter that much.
They are not counted as managed_pages, and they are not a show-stopper for
moving all the other data around (migrate); they are just skipped.
Conceptually, they are not pages we can deal with.

I thought they should lie wherever the range lies.
Having said that, I am not opposed to placing them in ZONE_NORMAL, as they might
fit there better under the theory that ZONE_NORMAL has memory that might not be
movable/migratable.

As for initializing them in add_memory(), we cannot do that.
The first problem is that we need sparse_mem_map_populate to create
the mapping first, and to take the pages from our altmap.

Then, we can access and initialize those pages.
So we cannot do that in add_memory() because that happens before.

And I really think that it fits much better in __add_pages than in add_memory.

That said, I would appreciate some comments on patch#3 and patch#4,
especially patch#4.
So I would like to collect some feedback on those before sending a new version.

Thanks David

-- 
Oscar Salvador
SUSE L3


* Re: [PATCH v3 0/5] Allocate memmap from hotadded memory
  2019-08-01  8:39     ` Oscar Salvador
@ 2019-08-01  8:44       ` David Hildenbrand
  0 siblings, 0 replies; 22+ messages in thread
From: David Hildenbrand @ 2019-08-01  8:44 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: akpm, dan.j.williams, pasha.tatashin, mhocko, anshuman.khandual,
	Jonathan.Cameron, vbabka, linux-mm, linux-kernel

On 01.08.19 10:39, Oscar Salvador wrote:
> On Thu, Aug 01, 2019 at 10:17:23AM +0200, David Hildenbrand wrote:
>> I am not yet sure about two things:
>>
>>
>> 1. Checking uninitialized pages for PageVmemmap() when onlining. I
>> consider this very bad.
>>
>> I wonder if it would be better to remember for each memory block the pfn
>> offset, which will be used when onlining/offlining.
>>
>> I have some patches that convert online_pages() to
>> __online_memory_block(struct memory_block *mem) - which fits perfectly with
>> the current user. So taking the offset and processing only these pages
>> when onlining would be easy. To do the same for offline_pages(), we
>> first have to rework memtrace code. But when offlining, all memmaps have
>> already been initialized.
> 
> This is true. I did not really like that either, but it was one of the things
> I came up with.
> I already have some ideas on how to avoid checking the page; I will work on it.

I think it would be best if we find some way that during
onlining/offlining we skip the vmemmap part completely. (e.g., as
discussed via an offset in the memblock or similar)

> 
>> 2. Setting the Vmemmap pages to the zone of the online type. This would
>> mean we would have unmovable data on pages marked to belong to the
>> movable zone. I would suggest always setting them to the NORMAL zone when
>> onlining - and initializing the vmemmap of the vmemmap pages directly
>> during add_memory() instead.
> 
> IMHO, having vmemmap pages in ZONE_MOVABLE does not matter that much.
> They are not counted as managed_pages, and they are not a show-stopper for
> moving all the other data around (migrate); they are just skipped.
> Conceptually, they are not pages we can deal with.

I am not sure yet about the implications of having these belong to a
zone they are not really part of, hmmmm. Will the pages be PG_reserved?

> 
> I thought they should lie wherever the range lies.
> Having said that, I am not opposed to placing them in ZONE_NORMAL, as they might
> fit there better under the theory that ZONE_NORMAL has memory that might not be
> movable/migratable.
> 
> As for initializing them in add_memory(), we cannot do that.
> The first problem is that we need sparse_mem_map_populate to create
> the mapping first, and to take the pages from our altmap.
> 
> Then, we can access and initialize those pages.
> So we cannot do that in add_memory() because that happens before.
> 
> And I really think that it fits much better in __add_pages than in add_memory.

Sorry, I rather meant when adding memory, not when onlining. But you
seem to do that already. :)

> 
> That said, I would appreciate some comments on patch#3 and patch#4,
> especially patch#4.

Will have a look!

> So I would like to collect some feedback on those before sending a new version.
> 
> Thanks David
> 


-- 

Thanks,

David / dhildenb


* Re: [PATCH v3 3/5] mm,sparse: Add SECTION_USE_VMEMMAP flag
  2019-07-25 16:02 ` [PATCH v3 3/5] mm,sparse: Add SECTION_USE_VMEMMAP flag Oscar Salvador
@ 2019-08-01 14:45   ` David Hildenbrand
  0 siblings, 0 replies; 22+ messages in thread
From: David Hildenbrand @ 2019-08-01 14:45 UTC (permalink / raw)
  To: Oscar Salvador, akpm
  Cc: dan.j.williams, pasha.tatashin, mhocko, anshuman.khandual,
	Jonathan.Cameron, vbabka, linux-mm, linux-kernel

On 25.07.19 18:02, Oscar Salvador wrote:
> When hot-removing memory, we need to be careful about two things:
> 
> 1) Memory range must be memory_block aligned. This is what
>    check_hotplug_memory_range() checks for.
> 
> 2) If a range was hot-added using MHP_MEMMAP_ON_MEMORY, we need to check
>    whether the caller is removing memory with the same granularity that
>    it was added.

The second point applies only to MHP_MEMMAP_ON_MEMORY and is not universally
true.

> 
> So to check against case 2), we mark all sections used by vmemmap
> (not only the ones containing vmemmap pages, but all sections spanning
> the memory range) with SECTION_USE_VMEMMAP.

SECTION_USE_VMEMMAP is misleading.

Rather SECTION_MMAP_ON_MEMORY (TBD). Please *really* add a description
(these sections)

> 
> This will allow us to do some sanity checks when in hot-remove stage.
> 

One idea: look up the struct page of the lowest memory address you are
removing and test if it is a PageVmemmap() page. Then, from the info stored
along with the vmemmap page (start + length), you can test if all memory
the vmemmap is responsible for is removed.

This should work, or am I missing something?
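
A minimal userspace sketch of that check, under stated assumptions: struct vmemmap_head_sketch and removal_covers_vmemmap_range() are hypothetical names, and the "start + length" metadata is modelled as two plain fields rather than the struct page fields the series actually uses. Looking up the head (the PageVmemmap() test on the lowest pfn) is left to the caller, which passes NULL when the lowest page is not a vmemmap page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical metadata stored along with the head vmemmap page. */
struct vmemmap_head_sketch {
	unsigned long start_pfn; /* first pfn of the range added in one call */
	unsigned long nr_pages;  /* length of that range, in pages */
};

/*
 * head is the metadata found at the lowest pfn being removed, or NULL
 * if that page is not a vmemmap page. Returns true when the removal
 * request covers everything this vmemmap is responsible for.
 */
static bool removal_covers_vmemmap_range(const struct vmemmap_head_sketch *head,
					 unsigned long pfn,
					 unsigned long nr_pages)
{
	/* Nothing recorded at the lowest pfn: nothing to check in this sketch. */
	if (head == NULL)
		return true;

	assert(head->nr_pages > 0);

	/* The head page must sit at the very start of the removed range. */
	if (head->start_pfn != pfn)
		return false;

	/* Removing less than what was added would leave a dangling memmap. */
	return nr_pages >= head->nr_pages;
}
```

Note that this only checks coverage. Requiring nr_pages == head->nr_pages instead would give the stricter "same granularity" rule from the patch description.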

> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>  include/linux/memory_hotplug.h | 3 ++-
>  include/linux/mmzone.h         | 8 +++++++-
>  mm/memory_hotplug.c            | 2 +-
>  mm/sparse.c                    | 9 +++++++--
>  4 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 45dece922d7c..6b20008d9297 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -366,7 +366,8 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
>  		unsigned long nr_pages, struct vmem_altmap *altmap);
>  extern bool is_memblock_offlined(struct memory_block *mem);
>  extern int sparse_add_section(int nid, unsigned long pfn,
> -		unsigned long nr_pages, struct vmem_altmap *altmap);
> +		unsigned long nr_pages, struct vmem_altmap *altmap,
> +		bool vmemmap_section);
>  extern void sparse_remove_section(struct mem_section *ms,
>  		unsigned long pfn, unsigned long nr_pages,
>  		unsigned long map_offset, struct vmem_altmap *altmap);
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index d77d717c620c..259c326962f5 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1254,7 +1254,8 @@ extern size_t mem_section_usage_size(void);
>  #define SECTION_HAS_MEM_MAP	(1UL<<1)
>  #define SECTION_IS_ONLINE	(1UL<<2)
>  #define SECTION_IS_EARLY	(1UL<<3)
> -#define SECTION_MAP_LAST_BIT	(1UL<<4)
> +#define SECTION_USE_VMEMMAP	(1UL<<4)
> +#define SECTION_MAP_LAST_BIT	(1UL<<5)
>  #define SECTION_MAP_MASK	(~(SECTION_MAP_LAST_BIT-1))
>  #define SECTION_NID_SHIFT	3
>  
> @@ -1265,6 +1266,11 @@ static inline struct page *__section_mem_map_addr(struct mem_section *section)
>  	return (struct page *)map;
>  }
>  
> +static inline int vmemmap_section(struct mem_section *section)
> +{
> +	return (section && (section->section_mem_map & SECTION_USE_VMEMMAP));
> +}
> +
>  static inline int present_section(struct mem_section *section)
>  {
>  	return (section && (section->section_mem_map & SECTION_MARKED_PRESENT));
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 3d97c3711333..c2338703ce80 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -314,7 +314,7 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
>  
>  		pfns = min(nr_pages, PAGES_PER_SECTION
>  				- (pfn & ~PAGE_SECTION_MASK));
> -		err = sparse_add_section(nid, pfn, pfns, altmap);
> +		err = sparse_add_section(nid, pfn, pfns, altmap, 0);
>  		if (err)
>  			break;
>  		pfn += pfns;
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 79355a86064f..09cac39e39d9 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -856,13 +856,18 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
>   * * -ENOMEM	- Out of memory.
>   */
>  int __meminit sparse_add_section(int nid, unsigned long start_pfn,
> -		unsigned long nr_pages, struct vmem_altmap *altmap)
> +		unsigned long nr_pages, struct vmem_altmap *altmap,
> +		bool vmemmap_section)
>  {
>  	unsigned long section_nr = pfn_to_section_nr(start_pfn);
> +	unsigned long flags = 0;
>  	struct mem_section *ms;
>  	struct page *memmap;
>  	int ret;
>  
> +	if (vmemmap_section)
> +		flags = SECTION_USE_VMEMMAP;
> +
>  	ret = sparse_index_init(section_nr, nid);
>  	if (ret < 0)
>  		return ret;
> @@ -884,7 +889,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>  	/* Align memmap to section boundary in the subsection case */
>  	if (section_nr_to_pfn(section_nr) != start_pfn)
>  		memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
> -	sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);
> +	sparse_init_one_section(ms, section_nr, memmap, ms->usage, flags);
>  
>  	return 0;
>  }
> 


-- 

Thanks,

David / dhildenb


* Re: [PATCH v3 4/5] mm,memory_hotplug: Allocate memmap from the added memory range for sparse-vmemmap
  2019-07-25 16:02 ` [PATCH v3 4/5] mm,memory_hotplug: Allocate memmap from the added memory range for sparse-vmemmap Oscar Salvador
@ 2019-08-01 15:04   ` David Hildenbrand
  0 siblings, 0 replies; 22+ messages in thread
From: David Hildenbrand @ 2019-08-01 15:04 UTC (permalink / raw)
  To: Oscar Salvador, akpm
  Cc: dan.j.williams, pasha.tatashin, mhocko, anshuman.khandual,
	Jonathan.Cameron, vbabka, linux-mm, linux-kernel

On 25.07.19 18:02, Oscar Salvador wrote:
> Physical memory hotadd has to allocate a memmap (struct page array) for
> the newly added memory section. Currently, alloc_pages_node() is used
> for those allocations.
> 
> This has some disadvantages:
>  a) existing memory is consumed for that purpose
>     (~2MB per 128MB memory section on x86_64)
>  b) if the whole node is movable then we have off-node struct pages
>     which has performance drawbacks.
> 
> a) has turned out to be a problem for memory hotplug based ballooning
>    because the userspace might not react in time to online memory while
>    the memmap allocated during physical hotadd consumes enough memory to
>    push the system to OOM. 31bc3858ea3e ("memory-hotplug: add automatic onlining
>    policy for the newly added memory") was added to work around that
>    problem.

FWIW, e.g., in my current virtio-mem prototype, I add a bunch of memory
blocks and wait until they have been onlined to add more memory blocks
(max 10 offline at a time). So I am not sure if this is actually a
problem that couldn't have been solved differently. Or I am missing
something :)

Anyhow, the enumeration a) b) a) is strange :)

> 
> This can be improved when CONFIG_SPARSEMEM_VMEMMAP is enabled.
> 
> Vmemmap page tables can map arbitrary memory.
> That means that we can simply use the beginning of each memory section and
> map struct pages there.
> struct pages which back the allocated space then just need to be treated
> carefully.
> 
> Implementation-wise, we will reuse the vmem_altmap infrastructure to override
> the default allocator used by vmemmap_populate. Once the memmap is
> allocated, we are going to need a way to mark altmap pfns used for the allocation.
> If MHP_MEMMAP_ON_MEMORY flag was passed, we will set up the layout of the
> altmap structure at the beginning of __add_pages(), and then we will call
> mhp_mark_vmemmap_pages() to do the proper marking.
> 
> mhp_mark_vmemmap_pages() marks the pages as vmemmap and sets some metadata:
> 
> Vmemmap's pages layout is as follows:
> 
>         * Layout:
>         * Head:
>         *      head->vmemmap_pages     : nr of vmemmap pages
>         *      head->vmemmap_sections  : nr of sections used by this altmap
>         * Tail:
>         *      tail->vmemmap_head      : head
>         * All:
>         *      page->type              : Vmemmap
> 

This description belongs in the patch that introduces it :)

> E.g:
> When hot-adding 1GB on x86_64:
> 
> head->vmemmap_pages = 4096
> head->vmemmap_sections = 8
> 
> We keep this information within the struct pages as we need them in certain
> stages like offline, online and hot-remove.
> 
> head->vmemmap_sections is a kind of refcount, because when using MHP_MEMMAP_ON_MEMORY,
> we need to know how long we have to defer the call to vmemmap_free().

Why is it used as a refcount (see my comment to the previous patch,
storing the section count still makes sense)? As you validate that the
same granularity is removed as was added, I would have guessed this does
not matter. But as we discussed, the whole ClearVmemmapPage() stuff
might not be needed at all (implying this patch can be simplified).

> The thing is that the first pages of the memory range are used to store the
> memmap mapping, so we cannot remove those first, otherwise we would blow up
> when accessing the other pages.

That is interesting: struct pages are initialized when onlining. That
makes me assume after offlining, the content is stale (especially when
never onlined). We should really fix any accesses to struct pages when
removing memory first. This smells like working around something that is
already broken.

> 
> So, instead of actually removing the section (with vmemmap_free), we wait
> until we remove the last one, and then we call vmemmap_free() for all
> batched sections.
> 
> We also have to be careful about those pages during online and offline
> operations. They are simply skipped, so online will keep them
> reserved and so unusable for any other purpose and offline ignores them
> so they do not block the offline operation.

As discussed, maybe storing an offset in the memory block can avoid
having to look at struct pages when onlining/offlining - simply skip
that part.

> 
> In offline operation we only have to check for one particularity.
> Depending on the way the hot-added range was added, it might be
> that one or more of the memory blocks at the beginning are filled with
> only vmemmap pages.
> We just need to check for this case and skip 1) isolating 2) migrating,
> because those pages do not need to be migrated anywhere, as they are
> self-hosted.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>  arch/powerpc/mm/init_64.c      |   7 +++
>  arch/s390/mm/init.c            |   6 ++
>  arch/x86/mm/init_64.c          |  10 +++
>  drivers/acpi/acpi_memhotplug.c |   3 +-
>  include/linux/memory_hotplug.h |   6 ++
>  include/linux/memremap.h       |   2 +-
>  mm/compaction.c                |   7 +++
>  mm/memory_hotplug.c            | 136 ++++++++++++++++++++++++++++++++++++++---
>  mm/page_alloc.c                |  26 +++++++-
>  mm/page_isolation.c            |  14 ++++-
>  mm/sparse.c                    | 107 ++++++++++++++++++++++++++++++++
>  11 files changed, 309 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
> index a44f6281ca3a..f19aa006ca6d 100644
> --- a/arch/powerpc/mm/init_64.c
> +++ b/arch/powerpc/mm/init_64.c
> @@ -292,6 +292,13 @@ void __ref vmemmap_free(unsigned long start, unsigned long end,
>  
>  		if (base_pfn >= alt_start && base_pfn < alt_end) {
>  			vmem_altmap_free(altmap, nr_pages);
> +		} else if (PageVmemmap(page)) {
> +			/*
> +			 * runtime vmemmap pages are residing inside the memory
> +			 * section so they do not have to be freed anywhere.
> +			 */
> +			while (PageVmemmap(page))
> +				ClearPageVmemmap(page++);
>  		} else if (PageReserved(page)) {
>  			/* allocated from bootmem */
>  			if (page_size < PAGE_SIZE) {
> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> index 20340a03ad90..adb04f3977eb 100644
> --- a/arch/s390/mm/init.c
> +++ b/arch/s390/mm/init.c
> @@ -278,6 +278,12 @@ int arch_add_memory(int nid, u64 start, u64 size,
>  	unsigned long size_pages = PFN_DOWN(size);
>  	int rc;
>  
> +	/*
> +	 * Physical memory is added only later during the memory online so we
> +	 * cannot use the added range at this stage unfortunately.
> +	 */
> +	restrictions->flags &= ~restrictions->flags;
> +
>  	if (WARN_ON_ONCE(restrictions->altmap))
>  		return -EINVAL;
>  
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index a6b5c653727b..f9f720a28b3e 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -876,6 +876,16 @@ static void __meminit free_pagetable(struct page *page, int order)
>  	unsigned long magic;
>  	unsigned int nr_pages = 1 << order;
>  
> +	/*
> +	 * Runtime vmemmap pages are residing inside the memory section so
> +	 * they do not have to be freed anywhere.
> +	 */
> +	if (PageVmemmap(page)) {
> +		while (nr_pages--)
> +			ClearPageVmemmap(page++);
> +		return;
> +	}
> +
>  	/* bootmem page has reserved flag */
>  	if (PageReserved(page)) {
>  		__ClearPageReserved(page);
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index d91b3584d4b2..e0148dde5313 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -207,7 +207,8 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
>  		if (node < 0)
>  			node = memory_add_physaddr_to_nid(info->start_addr);
>  
> -		result = __add_memory(node, info->start_addr, info->length, 0);
> +		result = __add_memory(node, info->start_addr, info->length,
> +				      MHP_MEMMAP_ON_MEMORY);
>  
>  		/*
>  		 * If the memory block has been used by the kernel, add_memory()
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 6b20008d9297..e1e8abf22a80 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -377,4 +377,10 @@ extern bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_
>  		int online_type);
>  extern struct zone *zone_for_pfn_range(int online_type, int nid, unsigned start_pfn,
>  		unsigned long nr_pages);
> +
> +#ifdef CONFIG_SPARSEMEM_VMEMMAP
> +extern void mhp_mark_vmemmap_pages(struct vmem_altmap *self);
> +#else
> +static inline void mhp_mark_vmemmap_pages(struct vmem_altmap *self) {}
> +#endif
>  #endif /* __LINUX_MEMORY_HOTPLUG_H */
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index 2cfc3c289d01..0a7355b8c1cf 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -16,7 +16,7 @@ struct device;
>   * @alloc: track pages consumed, private to vmemmap_populate()
>   */
>  struct vmem_altmap {
> -	const unsigned long base_pfn;
> +	unsigned long base_pfn;
>  	const unsigned long reserve;
>  	unsigned long free;
>  	unsigned long align;
> diff --git a/mm/compaction.c b/mm/compaction.c
> index ac4ead029b4a..2faf769375c4 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -857,6 +857,13 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  		nr_scanned++;
>  
>  		page = pfn_to_page(low_pfn);
> +		/*
> +		 * Vmemmap pages do not need to be isolated.
> +		 */
> +		if (PageVmemmap(page)) {
> +			low_pfn += vmemmap_nr_pages(page) - 1;
> +			continue;
> +		}

What if somebody uses this e.g. via alloc_contig_range()? Are we sure
only the actual memory offlining path will skip these? (maybe looking at
the pageblock migratetype might be necessary).

This makes me think that we should not even try to offline/online these
pages right from online_pages()/offline_pages() but instead skip the
vmemmap part there completely.

>  
>  		/*
>  		 * Check if the pageblock has already been marked skipped.
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index c2338703ce80..09d41339cd11 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -278,6 +278,13 @@ static int check_pfn_span(unsigned long pfn, unsigned long nr_pages,
>  	return 0;
>  }
>  
> +static void mhp_init_altmap(unsigned long pfn, unsigned long nr_pages,
> +			    struct vmem_altmap *altmap)
> +{
> +	altmap->free = nr_pages;
> +	altmap->base_pfn = pfn;
> +}
> +
>  /*
>   * Reasonably generic function for adding memory.  It is
>   * expected that archs that support memory hotplug will
> @@ -289,8 +296,18 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
>  {
>  	int err;
>  	unsigned long nr, start_sec, end_sec;
> -	struct vmem_altmap *altmap = restrictions->altmap;
> +	struct vmem_altmap *altmap;
> +	struct vmem_altmap mhp_altmap = {};
> +	unsigned long mhp_flags = restrictions->flags;
> +	bool vmemmap_section = false;
> +
> +	if (mhp_flags) {
> +		mhp_init_altmap(pfn, nr_pages, &mhp_altmap);
> +		restrictions->altmap = &mhp_altmap;
> +		vmemmap_section = true;
> +	}
>  
> +	altmap = restrictions->altmap;
>  	if (altmap) {
>  		/*
>  		 * Validate altmap is within bounds of the total request
> @@ -314,7 +331,7 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
>  
>  		pfns = min(nr_pages, PAGES_PER_SECTION
>  				- (pfn & ~PAGE_SECTION_MASK));
> -		err = sparse_add_section(nid, pfn, pfns, altmap, 0);
> +		err = sparse_add_section(nid, pfn, pfns, altmap, vmemmap_section);
>  		if (err)
>  			break;
>  		pfn += pfns;
> @@ -322,6 +339,10 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
>  		cond_resched();
>  	}
>  	vmemmap_populate_print_last();
> +
> +	if (mhp_flags)
> +		mhp_mark_vmemmap_pages(altmap);
> +
>  	return err;
>  }
>  
> @@ -640,6 +661,14 @@ static int online_pages_blocks(unsigned long start, unsigned long nr_pages)
>  	while (start < end) {
>  		order = min(MAX_ORDER - 1,
>  			get_order(PFN_PHYS(end) - PFN_PHYS(start)));
> +		/*
> +		 * Check if the pfn is aligned to its order.
> +		 * If not, we decrement the order until it is,
> +		 * otherwise __free_one_page will bug us.
> +		 */
> +		while (start & ((1 << order) - 1))
> +			order--;
> +
>  		(*online_page_callback)(pfn_to_page(start), order);
>  
>  		onlined_pages += (1UL << order);
> @@ -648,17 +677,51 @@ static int online_pages_blocks(unsigned long start, unsigned long nr_pages)
>  	return onlined_pages;
>  }
>  
> +static bool vmemmap_skip_block(unsigned long pfn, unsigned long nr_pages,
> +		       unsigned long *nr_vmemmap_pages)
> +{
> +	bool skip = false;
> +	unsigned long vmemmap_pages = 0;
> +
> +	/*
> +	 * This function gets called from {online,offline}_pages.
> +	 * It has two goals:
> +	 *
> +	 * 1) Account number of vmemmap pages within the range
> +	 * 2) Check if the whole range contains only vmemmap_pages.
> +	 */
> +
> +	if (PageVmemmap(pfn_to_page(pfn))) {
> +		struct page *page = pfn_to_page(pfn);
> +
> +		vmemmap_pages = min(vmemmap_nr_pages(page), nr_pages);
> +		if (vmemmap_pages == nr_pages)
> +			skip = true;
> +	}
> +
> +	*nr_vmemmap_pages = vmemmap_pages;
> +	return skip;
> +}
> +
>  static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
>  			void *arg)
>  {
>  	unsigned long onlined_pages = *(unsigned long *)arg;
> -
> -	if (PageReserved(pfn_to_page(start_pfn)))
> -		onlined_pages += online_pages_blocks(start_pfn, nr_pages);
> -
> +	unsigned long pfn = start_pfn;
> +	unsigned long nr_vmemmap_pages = 0;
> +	bool skip;
> +
> +	skip = vmemmap_skip_block(pfn, nr_pages, &nr_vmemmap_pages);
> +	if (skip)
> +		goto skip_online_pages;
> +
> +	pfn += nr_vmemmap_pages;
> +	if (PageReserved(pfn_to_page(pfn)))
> +		onlined_pages += online_pages_blocks(pfn, nr_pages - nr_vmemmap_pages);
> +skip_online_pages:
>  	online_mem_sections(start_pfn, start_pfn + nr_pages);
>  
> -	*(unsigned long *)arg = onlined_pages;
> +	*(unsigned long *)arg = onlined_pages + nr_vmemmap_pages;
>  	return 0;
>  }
>  
> @@ -1040,6 +1103,19 @@ static int online_memory_block(struct memory_block *mem, void *arg)
>  	return device_online(&mem->dev);
>  }
>  
> +static unsigned long mhp_check_flags(unsigned long flags)
> +{
> +	if (!flags)
> +		return 0;
> +
> +	if (flags != MHP_MEMMAP_ON_MEMORY) {
> +		WARN(1, "Wrong flags value (%lx). Ignoring flags.\n", flags);
> +		return 0;
> +	}
> +
> +	return flags;
> +}
> +
>  /*
>   * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
>   * and online/offline operations (triggered e.g. by sysfs).
> @@ -1075,6 +1151,8 @@ int __ref add_memory_resource(int nid, struct resource *res, unsigned long flags
>  		goto error;
>  	new_node = ret;
>  
> +	restrictions.flags = mhp_check_flags(flags);
> +
>  	/* call arch's memory hotadd */
>  	ret = arch_add_memory(nid, start, size, &restrictions);
>  	if (ret < 0)
> @@ -1502,12 +1580,14 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  {
>  	unsigned long pfn, nr_pages;
>  	unsigned long offlined_pages = 0;
> +	unsigned long nr_vmemmap_pages = 0;
>  	int ret, node, nr_isolate_pageblock;
>  	unsigned long flags;
>  	unsigned long valid_start, valid_end;
>  	struct zone *zone;
>  	struct memory_notify arg;
>  	char *reason;
> +	bool skip = false;
>  
>  	mem_hotplug_begin();
>  
> @@ -1524,8 +1604,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  	node = zone_to_nid(zone);
>  	nr_pages = end_pfn - start_pfn;
>  
> +	skip = vmemmap_skip_block(start_pfn, nr_pages, &nr_vmemmap_pages);
> +
>  	/* set above range as isolated */
> -	ret = start_isolate_page_range(start_pfn, end_pfn,
> +	ret = start_isolate_page_range(start_pfn + nr_vmemmap_pages, end_pfn,
>  				       MIGRATE_MOVABLE,
>  				       SKIP_HWPOISON | REPORT_FAILURE);
>  	if (ret < 0) {
> @@ -1545,6 +1627,9 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  		goto failed_removal_isolated;
>  	}
>  
> +	if (skip)
> +		goto skip_migration;
> +
>  	do {
>  		for (pfn = start_pfn; pfn;) {
>  			if (signal_pending(current)) {
> @@ -1581,6 +1666,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  					    NULL, check_pages_isolated_cb);
>  	} while (ret);
>  
> +skip_migration:
>  	/* Ok, all of our target is isolated.
>  	   We cannot do rollback at this point. */
>  	walk_system_ram_range(start_pfn, end_pfn - start_pfn,
> @@ -1596,7 +1682,9 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  
>  	/* removal success */
> -	adjust_managed_page_count(pfn_to_page(start_pfn), -offlined_pages);
> +	if (offlined_pages)
> +		adjust_managed_page_count(pfn_to_page(start_pfn), -offlined_pages);
> +	offlined_pages += nr_vmemmap_pages;
>  	zone->present_pages -= offlined_pages;
>  
>  	pgdat_resize_lock(zone->zone_pgdat, &flags);
> @@ -1739,11 +1827,41 @@ static void __release_memory_resource(resource_size_t start,
>  	}
>  }
>  
> +static int check_hotplug_granularity(u64 start, u64 size)
> +{
> +	unsigned long pfn = PHYS_PFN(start);
> +
> +	/*
> +	 * Sanity check in case the range used MHP_MEMMAP_ON_MEMORY.
> +	 */
> +	if (vmemmap_section(__pfn_to_section(pfn))) {
> +		struct page *page = pfn_to_page(pfn);
> +		unsigned long nr_pages = size >> PAGE_SHIFT;
> +		unsigned long sections;
> +
> +		/*
> +		 * The start of the memory range is not correct.
> +		 */
> +		if (!PageVmemmap(page) || (vmemmap_head(page) != page))
> +			return -EINVAL;
> +
> +		sections = vmemmap_nr_sections(page);
> +		if (sections * PAGES_PER_SECTION != nr_pages)
> +			/*
> +			 * Check that granularity is the same.
> +			 */
> +			return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
>  static int __ref try_remove_memory(int nid, u64 start, u64 size)
>  {
>  	int rc = 0;
>  
>  	BUG_ON(check_hotplug_memory_range(start, size));
> +	BUG_ON(check_hotplug_granularity(start, size));
>  
>  	mem_hotplug_begin();
>  
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d3bb601c461b..7c7d7130b627 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1340,14 +1340,21 @@ static void free_one_page(struct zone *zone,
>  static void __meminit __init_single_page(struct page *page, unsigned long pfn,
>  				unsigned long zone, int nid)
>  {
> +	if (PageVmemmap(page))
> +		/*
> +		 * Vmemmap pages need to preserve their state.
> +		 */
> +		goto preserve_state;
> +
>  	mm_zero_struct_page(page);
> -	set_page_links(page, zone, nid, pfn);
> -	init_page_count(page);
>  	page_mapcount_reset(page);
> +	INIT_LIST_HEAD(&page->lru);
> +preserve_state:
> +	init_page_count(page);
> +	set_page_links(page, zone, nid, pfn);
>  	page_cpupid_reset_last(page);
>  	page_kasan_tag_reset(page);
>  
> -	INIT_LIST_HEAD(&page->lru);
>  #ifdef WANT_PAGE_VIRTUAL
>  	/* The shift won't overflow because ZONE_NORMAL is below 4G. */
>  	if (!is_highmem_idx(zone))
> @@ -8184,6 +8191,14 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  
>  		page = pfn_to_page(check);
>  
> +		/*
> +		 * Vmemmap pages are not needed to be moved around.
> +		 */
> +		if (PageVmemmap(page)) {
> +			iter += vmemmap_nr_pages(page) - 1;
> +			continue;
> +		}

Same applies, maybe we can skip such stuff right from the caller. Not
sure :(

> +
>  		if (PageReserved(page))
>  			goto unmovable;
>  
> @@ -8551,6 +8566,11 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
>  			continue;
>  		}
>  		page = pfn_to_page(pfn);
> +
> +		if (PageVmemmap(page)) {
> +			pfn += vmemmap_nr_pages(page);
> +			continue;
> +		}
>  		/*
>  		 * The HWPoisoned page may be not in buddy system, and
>  		 * page_count() is not 0.
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 89c19c0feadb..ee26ea41c9eb 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -146,7 +146,7 @@ static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
>  static inline struct page *
>  __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>  {
> -	int i;
> +	unsigned long i;
>  
>  	for (i = 0; i < nr_pages; i++) {
>  		struct page *page;
> @@ -154,6 +154,10 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>  		page = pfn_to_online_page(pfn + i);
>  		if (!page)
>  			continue;
> +		if (PageVmemmap(page)) {
> +			i += vmemmap_nr_pages(page) - 1;
> +			continue;
> +		}
>  		return page;
>  	}
>  	return NULL;
> @@ -267,6 +271,14 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
>  			continue;
>  		}
>  		page = pfn_to_page(pfn);
> +		/*
> +		 * Vmemmap pages are not isolated. Skip them.
> +		 */
> +		if (PageVmemmap(page)) {
> +			pfn += vmemmap_nr_pages(page);
> +			continue;
> +		}
> +
>  		if (PageBuddy(page))
>  			/*
>  			 * If the page is on a free list, it has to be on
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 09cac39e39d9..2cc2e5af1986 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -645,18 +645,125 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
>  #endif
>  
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
> +static void vmemmap_init_page(struct page *page, struct page *head)
> +{
> +	page_mapcount_reset(page);
> +	SetPageVmemmap(page);
> +	page->vmemmap_head = (unsigned long)head;
> +}
> +
> +static void vmemmap_init_head(struct page *page, unsigned long nr_sections,
> +			      unsigned long nr_pages)
> +{
> +	page->vmemmap_sections = nr_sections;
> +	page->vmemmap_pages = nr_pages;
> +}
> +
> +void mhp_mark_vmemmap_pages(struct vmem_altmap *self)
> +{
> +	unsigned long pfn = self->base_pfn + self->reserve;
> +	unsigned long nr_pages = self->alloc;
> +	unsigned long nr_sects = self->free / PAGES_PER_SECTION;
> +	unsigned long i;
> +	struct page *head;
> +
> +	if (!nr_pages)
> +		return;
> +
> +	/*
> +	 * All allocations for the memory hotplug are the same sized so align
> +	 * should be 0.
> +	 */
> +	WARN_ON(self->align);
> +
> +	memset(pfn_to_page(pfn), 0, sizeof(struct page) * nr_pages);
> +
> +	/*
> +	 * Mark pages as Vmemmap pages
> +	 * Layout:
> +	 * Head:
> +	 * 	head->vmemmap_pages	: nr of vmemmap pages
> +	 *	head->mhp_flags    	: MHP_flags
> +	 *	head->vmemmap_sections	: nr of sections used by this altmap
> +	 * Tail:
> +	 *	tail->vmemmap_head	: head
> +	 * All:
> +	 *	page->type		: Vmemmap
> +	 */

I think this documentation is better kept at the place where the fields
actually reside.

> +	head = pfn_to_page(pfn);
> +	for (i = 0; i < nr_pages; i++) {
> +		struct page *page = head + i;
> +
> +		vmemmap_init_page(page, head);
> +	}
> +	vmemmap_init_head(head, nr_sects, nr_pages);
> +}
> +
> +/*
> + * If the range we are trying to remove was hot-added with vmemmap pages
> + * using MHP_MEMMAP_*, we need to keep track of it to know how much
> + * do we have do defer the free up.
> + * Since sections are removed sequentally in __remove_pages()->
> + * __remove_section(), we just wait until we hit the last section.
> + * Once that happens, we can trigger free_deferred_vmemmap_range to actually
> + * free the whole memory-range.
> + */
> +static struct page *__vmemmap_head = NULL;
> +
>  static struct page *populate_section_memmap(unsigned long pfn,
>  		unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
>  {
>  	return __populate_section_memmap(pfn, nr_pages, nid, altmap);
>  }
>  
> +static void vmemmap_free_deferred_range(unsigned long start,
> +					unsigned long end)
> +{
> +	unsigned long nr_pages = end - start;
> +	unsigned long first_section;
> +
> +	first_section = (unsigned long)__vmemmap_head;
> +	while (start >= first_section) {
> +		vmemmap_free(start, end, NULL);
> +		end = start;
> +		start -= nr_pages;
> +	}
> +	__vmemmap_head = NULL;
> +}
> +
> +static inline bool vmemmap_dec_and_test(void)
> +{
> +	__vmemmap_head->vmemmap_sections--;
> +	return !__vmemmap_head->vmemmap_sections;
> +}
> +
> +static void vmemmap_defer_free(unsigned long start, unsigned long end)
> +{
> +	if (vmemmap_dec_and_test())
> +		vmemmap_free_deferred_range(start, end);
> +}
> +
> +static inline bool should_defer_freeing(unsigned long start)
> +{
> +	if (PageVmemmap((struct page *)start) || __vmemmap_head) {
> +		if (!__vmemmap_head)
> +			__vmemmap_head = (struct page *)start;
> +		return true;
> +	}
> +	return false;
> +}
> +
>  static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
>  		struct vmem_altmap *altmap)
>  {
>  	unsigned long start = (unsigned long) pfn_to_page(pfn);
>  	unsigned long end = start + nr_pages * sizeof(struct page);
>  
> +	if (should_defer_freeing(start)) {
> +		vmemmap_defer_free(start, end);
> +		return;
> +	}
> +
>  	vmemmap_free(start, end, altmap);
>  }
>  static void free_map_bootmem(struct page *memmap)
> 

Complicated stuff :)

-- 

Thanks,

David / dhildenb


* Re: [PATCH v3 5/5] mm,memory_hotplug: Allow userspace to enable/disable vmemmap
  2019-07-25 16:02 ` [PATCH v3 5/5] mm,memory_hotplug: Allow userspace to enable/disable vmemmap Oscar Salvador
@ 2019-08-01 15:07   ` David Hildenbrand
  0 siblings, 0 replies; 22+ messages in thread
From: David Hildenbrand @ 2019-08-01 15:07 UTC (permalink / raw)
  To: Oscar Salvador, akpm
  Cc: dan.j.williams, pasha.tatashin, mhocko, anshuman.khandual,
	Jonathan.Cameron, vbabka, linux-mm, linux-kernel

On 25.07.19 18:02, Oscar Salvador wrote:
> It seems that we have some users out there that want to expose all
> hotpluggable memory to userspace, so this implements a toggling mechanism
> for those users who want to disable it.
> 
> By default, vmemmap pages mechanism is enabled.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>  drivers/base/memory.c          | 33 +++++++++++++++++++++++++++++++++
>  include/linux/memory_hotplug.h |  3 +++
>  mm/memory_hotplug.c            |  7 +++++++
>  3 files changed, 43 insertions(+)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index d30d0f6c8ad0..5ec6b80de9dd 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -578,6 +578,35 @@ static DEVICE_ATTR_WO(soft_offline_page);
>  static DEVICE_ATTR_WO(hard_offline_page);
>  #endif
>  

-ENODOCUMENTATION :)

> +#ifdef CONFIG_SPARSEMEM_VMEMMAP
> +static ssize_t vmemmap_hotplug_show(struct device *dev,
> +				    struct device_attribute *attr, char *buf)
> +{
> +	if (vmemmap_enabled)
> +		return sprintf(buf, "enabled\n");
> +	else
> +		return sprintf(buf, "disabled\n");
> +}
> +
> +static ssize_t vmemmap_hotplug_store(struct device *dev,
> +			   struct device_attribute *attr,
> +			   const char *buf, size_t count)
> +{
> +	if (!capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +
> +	if (sysfs_streq(buf, "enable"))
> +		vmemmap_enabled = true;
> +	else if (sysfs_streq(buf, "disable"))
> +		vmemmap_enabled = false;
> +	else
> +		return -EINVAL;
> +
> +	return count;
> +}
> +static DEVICE_ATTR_RW(vmemmap_hotplug);
> +#endif
> +
>  /*
>   * Note that phys_device is optional.  It is here to allow for
>   * differentiation between which *physical* devices each
> @@ -794,6 +823,10 @@ static struct attribute *memory_root_attrs[] = {
>  	&dev_attr_hard_offline_page.attr,
>  #endif
>  
> +#ifdef CONFIG_SPARSEMEM_VMEMMAP
> +	&dev_attr_vmemmap_hotplug.attr,

Don't like the name of that property, sorry.

> +#endif
> +
>  	&dev_attr_block_size_bytes.attr,
>  	&dev_attr_auto_online_blocks.attr,
>  	NULL
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index e1e8abf22a80..03d227d13301 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -134,6 +134,9 @@ extern int arch_add_memory(int nid, u64 start, u64 size,
>  			struct mhp_restrictions *restrictions);
>  extern u64 max_mem_size;
>  
> +#ifdef CONFIG_SPARSEMEM_VMEMMAP
> +extern bool vmemmap_enabled;
> +#endif
>  extern bool memhp_auto_online;
>  /* If movable_node boot option specified */
>  extern bool movable_node_enabled;
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 09d41339cd11..5ffe5375b87c 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -68,6 +68,10 @@ void put_online_mems(void)
>  
>  bool movable_node_enabled = false;
>  
> +#ifdef CONFIG_SPARSEMEM_VMEMMAP
> +bool vmemmap_enabled __read_mostly = true;
> +#endif
> +
>  #ifndef CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE
>  bool memhp_auto_online;
>  #else
> @@ -1108,6 +1112,9 @@ static unsigned long mhp_check_flags(unsigned long flags)
>  	if (!flags)
>  		return 0;
>  
> +	if (!vmemmap_enabled)
> +		return 0;
> +
>  	if (flags != MHP_MEMMAP_ON_MEMORY) {
>  		WARN(1, "Wrong flags value (%lx). Ignoring flags.\n", flags);
>  		return 0;
> 

Hmmm, I wonder if that should rather be a per-memory-device-driver
thingy? E.g., a toggle for ACPI which will then not pass in
MHP_MEMMAP_ON_MEMORY.
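
To illustrate what I mean, here is a minimal userspace sketch of a
per-driver toggle. Everything named sketch_* is hypothetical, the flag
value is a made-up stand-in for the one from patch 1, and none of this
is existing kernel code. The point is only that the decision lives in
the driver, so the core never has to consult a global switch.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the flag from patch 1; the value here is made up. */
#define SKETCH_MHP_MEMMAP_ON_MEMORY 0x1UL

/* Hypothetical per-driver policy, e.g. backed by a driver parameter. */
struct sketch_hotplug_driver {
	const char *name;
	bool memmap_on_memory;
};

/* Flags this driver would hand to the add_memory() variants. */
static unsigned long
sketch_driver_mhp_flags(const struct sketch_hotplug_driver *drv)
{
	assert(drv);
	return drv->memmap_on_memory ? SKETCH_MHP_MEMMAP_ON_MEMORY : 0;
}
```

With something like this, a driver that has the toggle off simply passes
0 and mhp_check_flags() would not need the vmemmap_enabled check at all.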

-- 

Thanks,

David / dhildenb


* Re: [PATCH v3 0/5] Allocate memmap from hotadded memory
  2019-07-25 16:02 [PATCH v3 0/5] Allocate memmap from hotadded memory Oscar Salvador
                   ` (6 preceding siblings ...)
  2019-08-01  7:39 ` Oscar Salvador
@ 2019-08-01 18:46 ` David Hildenbrand
  7 siblings, 0 replies; 22+ messages in thread
From: David Hildenbrand @ 2019-08-01 18:46 UTC (permalink / raw)
  To: Oscar Salvador, akpm
  Cc: dan.j.williams, pasha.tatashin, mhocko, anshuman.khandual,
	Jonathan.Cameron, vbabka, linux-mm, linux-kernel

On 25.07.19 18:02, Oscar Salvador wrote:
> Here we go with v3.
> 
> v3 -> v2:
>         * Rewrite about vmemmap pages handling.
>           Prior to this version, I was (ab)using hugepages fields
>           from struct page, while here I am officially adding a new
>           sub-page type with the fields I need.
> 
>         * Drop MHP_MEMMAP_{MEMBLOCK,DEVICE} in favor of MHP_MEMMAP_ON_MEMORY.
>           While I am still not 100% if this the right decision, and while I
>           still see some gaining in having MHP_MEMMAP_{MEMBLOCK,DEVICE},
>           having only one flag ease the code.
>           If the user wants to allocate memmaps per memblock, it'll
>           have to call add_memory() variants with memory-block granularity.
> 
>           If we happen to have a more clear usecase MHP_MEMMAP_MEMBLOCK
>           flag in the future, so user does not have to bother about the way
>           it calls add_memory() variants, but only pass a flag, we can add it.
>           Actually, I already had the code, so add it in the future is going to be
>           easy.
> 
>         * Granularity check when hot-removing memory.
>           Just checking that the granularity is the same.
> 
> [Testing]
> 
>  - x86_64: small and large memblocks (128MB, 1G and 2G)
> 
> So far, only acpi memory hotplug uses the new flag.
> The other callers can be changed depending on their needs.
> 
> [Coverletter]
> 
> This is another step to make memory hotplug more usable. The primary
> goal of this patchset is to reduce memory overhead of the hot-added
> memory (at least for SPARSEMEM_VMEMMAP memory model). The current way we use
> to populate memmap (struct page array) has two main drawbacks:
> 
> a) it consumes an additional memory until the hotadded memory itself is
>    onlined and
> b) memmap might end up on a different numa node which is especially true
>    for movable_node configuration.
> 
> a) it is a problem especially for memory hotplug based memory "ballooning"
>    solutions when the delay between physical memory hotplug and the
>    onlining can lead to OOM and that led to introduction of hacks like auto
>    onlining (see 31bc3858ea3e ("memory-hotplug: add automatic onlining
>    policy for the newly added memory")).
> 
> b) can have performance drawbacks.
> 
> One way to mitigate all these issues is to simply allocate memmap array
> (which is the largest memory footprint of the physical memory hotplug)
> from the hot-added memory itself. SPARSEMEM_VMEMMAP memory model allows
> us to map any pfn range so the memory doesn't need to be online to be
> usable for the array. See patch 3 for more details.
> This feature is only usable when CONFIG_SPARSEMEM_VMEMMAP is set.
> 
> [Overall design]:
> 
> Implementation wise we reuse vmem_altmap infrastructure to override
> the default allocator used by vmemap_populate. Once the memmap is
> allocated we need a way to mark altmap pfns used for the allocation.
> If MHP_MEMMAP_ON_MEMORY flag was passed, we set up the layout of the
> altmap structure at the beginning of __add_pages(), and then we call
> mark_vmemmap_pages().
> 
> MHP_MEMMAP_ON_MEMORY flag parameter will specify to allocate memmaps
> from the hot-added range.
> If callers wants memmaps to be allocated per memory block, it will
> have to call add_memory() variants in memory-block granularity
> spanning the whole range, while if it wants to allocate memmaps
> per whole memory range, just one call will do.
> 
> Want to add 384MB (3 sections, 3 memory-blocks)
> e.g:
> 
> add_memory(0x1000, size_memory_block);
> add_memory(0x2000, size_memory_block);
> add_memory(0x3000, size_memory_block);
> 
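
For a sense of scale, the number of pages that the memmap takes out of a
hot-added range can be estimated as below. This is only a sketch: the
4 KiB page size and the 64-byte struct page are assumptions (typical for
x86_64, but config dependent), and the sketch_* names are made up for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration: 4 KiB pages, 64-byte struct page. */
#define SKETCH_PAGE_SIZE        4096ULL
#define SKETCH_STRUCT_PAGE_SIZE 64ULL

/* Number of pages needed to hold the memmap describing `bytes` of memory. */
static uint64_t sketch_memmap_pages(uint64_t bytes)
{
	uint64_t nr_pages, memmap_bytes;

	/* Hotplugged ranges are page aligned. */
	assert(bytes % SKETCH_PAGE_SIZE == 0);

	nr_pages = bytes / SKETCH_PAGE_SIZE;
	memmap_bytes = nr_pages * SKETCH_STRUCT_PAGE_SIZE;

	/* Round up to whole pages. */
	return (memmap_bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

Under those assumptions a 128MB memory block needs 512 pages (2MB) for
its memmap, and a 1G block needs 4096 pages (16MB).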

Some more thoughts:

1. It can happen that pfn_online() for a vmemmap page returns either
true or false, depending on the state of the section. It could be that
the memory block holding the vmemmap is offline while another memory
block making use of it is online.

I guess this isn't bad (I assume it is similar for the altmap), however
it could be that makedumpfile will exclude the vmemmap from dumps (as it
will usually only dump pages in sections marked online if I am not wrong
- maybe it special cases vmemmaps already). Also, could be that it is
not saved/restored during hibernation. We'll have to verify.


2. memmap access when adding/removing memory

The memmap is initialized when onlining memory. We still have to clean
up accessing the memmap in remove_memory(). You seem to introduce new
users - which is bad. Especially when removing memory we never onlined.

When removing memory, you shouldn't have to worry about any orders -
nobody should touch the memmap. I am aware that we still query the zone
- are there other users that touch the memmap when removing memory?


3. isolation/compaction

I am not sure if simply unconditionally skipping over Vmemmap pages is a
good idea. I would have guessed it is better to hinder callers from even
triggering this.

E.g., only online the pieces that don't contain the vmemmap. When
offlining a memory block, only actually try to offline the pieces that
were onlined - excluding the vmemmap.

Might require some smaller reworks but shouldn't be too hard as far as I
can tell.
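
Roughly what I have in mind, as a userspace sketch. It assumes the
vmemmap pages sit at the very start of the range (which is how this
series lays them out); the sketch_* names are hypothetical and this is
not a proposal for the actual interface.

```c
#include <assert.h>

struct sketch_pfn_range {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

/*
 * Given a range [start_pfn, start_pfn + nr_pages) whose first
 * nr_vmemmap_pages hold the memmap, return the part that would be
 * handed to the page allocator when onlining, and that would be
 * isolated/migrated when offlining.
 */
static struct sketch_pfn_range
sketch_onlineable_range(unsigned long start_pfn, unsigned long nr_pages,
			unsigned long nr_vmemmap_pages)
{
	struct sketch_pfn_range r;

	/* The memmap cannot be larger than the range holding it. */
	assert(nr_vmemmap_pages <= nr_pages);

	r.start_pfn = start_pfn + nr_vmemmap_pages;
	r.nr_pages = nr_pages - nr_vmemmap_pages;
	return r;
}
```

If the caller trims the range like this up front, isolation and
compaction never get to see a Vmemmap page in the first place, and a
range that consists only of vmemmap pages ends up with nr_pages == 0,
i.e. nothing to online or offline.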


4. mhp_flags and altmap with __add_pages()

I had hoped that we could handle the specifics of MHP_MEMMAP_ON_MEMORY
completely in add_memory() - nobody else needs MHP_MEMMAP_ON_MEMORY (we
have the generic altmap concept already).

So, set up the struct vmem_altmap in add_memory() and pass it directly.
During arch_add_memory(), nobody should be touching the vmemmap either
way, as it is completely uninitialized.

When we return from arch_add_memory() in add_memory(), we could then
initialize the memmap for the vmemmap pages (e.g., set them to
PageVmemmap) - via mhp_mark_vmemmap_pages() or such.

What exactly speaks against this approach (moving the
MHP_MEMMAP_ON_MEMORY handling completely out of __add_pages())? Am I
missing some access that could be evil while the pages are not mapped?

(I'd love to see __add_pages() only eat an altmap again, and keep the
MHP_MEMMAP_ON_MEMORY thingy specific to add_memory())
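
A rough userspace model of the flow I am thinking of, just to make the
layering explicit. All sketch_* names and the cut-down altmap struct are
stand-ins invented for this mail, the flag value is made up, and the
real functions obviously take different arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the flag from patch 1; the value here is made up. */
#define SKETCH_MHP_MEMMAP_ON_MEMORY 0x1UL

/* Cut-down stand-in for struct vmem_altmap. */
struct sketch_altmap {
	unsigned long base_pfn;	/* first pfn available for the memmap */
	unsigned long free;	/* pages available for the memmap */
	unsigned long alloc;	/* pages consumed so far */
};

/* Stand-in for __add_pages(): only sees an altmap, never any MHP flags. */
static int sketch_add_pages(unsigned long pfn, unsigned long nr_pages,
			    struct sketch_altmap *altmap,
			    unsigned long memmap_pages)
{
	(void)pfn;
	(void)nr_pages;

	if (altmap) {
		if (memmap_pages > altmap->free - altmap->alloc)
			return -1;
		altmap->alloc += memmap_pages;
	}
	return 0;
}

/*
 * Stand-in for add_memory(): the only place that knows about the flag.
 * Returns the number of pages that would be marked Vmemmap afterwards,
 * or -1 on failure.
 */
static long sketch_add_memory(unsigned long pfn, unsigned long nr_pages,
			      unsigned long memmap_pages, unsigned long flags)
{
	struct sketch_altmap altmap = { .base_pfn = pfn, .free = nr_pages };
	struct sketch_altmap *use = NULL;

	if (flags & SKETCH_MHP_MEMMAP_ON_MEMORY)
		use = &altmap;

	if (sketch_add_pages(pfn, nr_pages, use, memmap_pages))
		return -1;

	/*
	 * This is where the real code would mark
	 * [base_pfn, base_pfn + alloc) as Vmemmap pages.
	 */
	return use ? (long)use->alloc : 0;
}
```

The lower layer stays flag-agnostic here; whether the memmap lands on
the hot-added range is decided and finalized in one place.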

-- 

Thanks,

David / dhildenb


Thread overview: 22+ messages
2019-07-25 16:02 [PATCH v3 0/5] Allocate memmap from hotadded memory Oscar Salvador
2019-07-25 16:02 ` [PATCH v3 1/5] mm,memory_hotplug: Introduce MHP_MEMMAP_ON_MEMORY Oscar Salvador
2019-07-26  8:34   ` David Hildenbrand
2019-07-26  9:29     ` Oscar Salvador
2019-07-26  9:37       ` David Hildenbrand
2019-07-25 16:02 ` [PATCH v3 2/5] mm: Introduce a new Vmemmap page-type Oscar Salvador
2019-07-26  8:48   ` David Hildenbrand
2019-07-26  9:25     ` Oscar Salvador
2019-07-26  9:41       ` David Hildenbrand
2019-07-26 10:11         ` Oscar Salvador
2019-07-25 16:02 ` [PATCH v3 3/5] mm,sparse: Add SECTION_USE_VMEMMAP flag Oscar Salvador
2019-08-01 14:45   ` David Hildenbrand
2019-07-25 16:02 ` [PATCH v3 4/5] mm,memory_hotplug: Allocate memmap from the added memory range for sparse-vmemmap Oscar Salvador
2019-08-01 15:04   ` David Hildenbrand
2019-07-25 16:02 ` [PATCH v3 5/5] mm,memory_hotplug: Allow userspace to enable/disable vmemmap Oscar Salvador
2019-08-01 15:07   ` David Hildenbrand
2019-07-25 16:56 ` [PATCH v3 0/5] Allocate memmap from hotadded memory David Hildenbrand
2019-08-01  7:39 ` Oscar Salvador
2019-08-01  8:17   ` David Hildenbrand
2019-08-01  8:39     ` Oscar Salvador
2019-08-01  8:44       ` David Hildenbrand
2019-08-01 18:46 ` David Hildenbrand
