* [RFC PATCH v2 0/6] mm, memory_hotplug: allocate memmap from hotadded memory
From: Michal Hocko @ 2017-08-01 12:41 UTC
  To: linux-mm
  Cc: Andrew Morton, Mel Gorman, Vlastimil Babka, Andrea Arcangeli,
	Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen, Daniel Kiper,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Benjamin Herrenschmidt,
	Catalin Marinas, Dan Williams, Fenghua Yu, Gerald Schaefer,
	Heiko Carstens, H. Peter Anvin, Ingo Molnar, Martin Schwidefsky,
	Michael Ellerman, Michal Hocko, Paul Mackerras, Thomas Gleixner,
	Tony Luck, Will Deacon, x86

Hi,
this is the second version of the RFC posted previously [1]. It is still
an RFC; the main reason for the repost is that I have made some changes
and review should be easier this way. The biggest difference is that
users of memory hotplug can now opt in to this new feature. This is mainly
because HMM doesn't provide its own altmap, nor does it want struct pages
in the added range. Architectures can then veto the feature if they cannot
support it. The only such example is s390. See [2] for a more detailed
explanation why.

Original cover:

This is another step to make memory hotplug more usable. The primary
goal of this patchset is to reduce the memory overhead of hot added
memory (at least for the SPARSE_VMEMMAP memory model). Currently we use
kmalloc to populate the memmap (struct page array), which has two main
drawbacks: a) it consumes additional memory until the hotadded memory
itself is onlined, and b) the memmap might end up on a different NUMA
node, which is especially true for the movable_node configuration.

a) is a problem especially for memory hotplug based memory "ballooning"
solutions, where the delay between the physical memory hotplug and the
onlining can lead to OOM; that led to the introduction of hacks like auto
onlining (see 31bc3858ea3e ("memory-hotplug: add automatic onlining
policy for the newly added memory")).
b) can have performance drawbacks.
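
For a rough sense of the memmap overhead in question, here is a
back-of-the-envelope calculation. It assumes 4KiB base pages, a 64 byte
struct page and 128MB sparsemem sections (typical for x86_64); the
numbers are illustrative only, not taken from the patches:

	/*
	 * Illustrative only: memmap cost per amount of hotadded RAM,
	 * assuming 4KiB pages and a 64 byte struct page.
	 */
	#define PAGE_SIZE_BYTES		(4UL << 10)
	#define STRUCT_PAGE_BYTES	64UL
	#define SECTION_SIZE_BYTES	(128UL << 20)

	static unsigned long memmap_bytes(unsigned long ram_bytes)
	{
		return ram_bytes / PAGE_SIZE_BYTES * STRUCT_PAGE_BYTES;
	}

	/* memmap_bytes(SECTION_SIZE_BYTES) == 2MiB, ~1.6% of a section */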

One way to mitigate both issues is to simply allocate the memmap array
(which is the largest memory footprint of the physical memory hotplug)
from the hotadded memory itself. The VMEMMAP memory model allows us to map
any pfn range, so the memory doesn't need to be online to be usable
for the array. See patch 3 for more details. In short, I am reusing the
existing vmem_altmap which wants to achieve the same thing for nvdimm
device memory.
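
To illustrate the idea, here is a minimal sketch of an altmap-style
allocation, using the vmem_altmap fields as they appear in the
include/linux/memremap.h hunk of patch 2. The helper itself is made up
for illustration and ignores the alignment handling of the real
allocator:

	/*
	 * Sketch only: hand out pfns for memmap storage from the head
	 * of the hotadded range itself. @free pages were set aside for
	 * the memmap, @alloc tracks how many were already consumed.
	 */
	static unsigned long altmap_alloc_pfns(struct vmem_altmap *altmap,
					       unsigned long nr_pfns)
	{
		if (altmap->alloc + nr_pfns > altmap->free)
			return ULONG_MAX;	/* reserved area exhausted */

		altmap->alloc += nr_pfns;
		return altmap->base_pfn + altmap->reserve +
			altmap->alloc - nr_pfns;
	}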

I am sending this as an RFC because it has seen only very limited
testing and I am mostly interested in opinions on the chosen
approach. I had to touch some arch code and I have no idea whether my
changes make sense there (especially ppc). Therefore I would highly
appreciate it if arch maintainers could check patch 3.

Patches 5 and 6 should be straightforward cleanups.

There is also one potential drawback, though. If somebody uses memory
hotplug for 1G (gigantic) hugetlb pages then this scheme will obviously
not work for them, because each memory block will contain a reserved
area. Large x86 machines will use 2G memory blocks, so at least one 1G
page will still be available, but this is still not 2G...

I am not really sure anybody does that, or how reliably it can work in
practice. Nevertheless, I _believe_ that onlining more memory into
virtual machines is a much more common use case. Anyway, if there ever is
a strong demand for such a use case, we have basically three options:
a) enlarge memory blocks even more, b) enhance the altmap allocation
strategy and reuse low memory sections to host memmaps of other sections
on the same NUMA node, or c) make the memmap allocation strategy
configurable to fall back to the current allocation.

Are there any other concerns, ideas, comments?

The patches are based on the current mmotm tree (mmotm-2017-07-31-16-56).

Diffstat says
 arch/arm64/mm/mmu.c            |  9 +++--
 arch/ia64/mm/discontig.c       |  4 ++-
 arch/ia64/mm/init.c            |  5 +--
 arch/powerpc/mm/init_64.c      | 34 +++++++++++++-----
 arch/powerpc/mm/mem.c          |  5 +--
 arch/s390/mm/init.c            | 11 ++++--
 arch/s390/mm/vmem.c            |  9 ++---
 arch/sh/mm/init.c              |  5 +--
 arch/sparc/mm/init_64.c        |  6 ++--
 arch/x86/mm/init_32.c          |  5 +--
 arch/x86/mm/init_64.c          | 18 +++++++---
 include/linux/memory_hotplug.h | 32 ++++++++++++++---
 include/linux/memremap.h       | 39 ++++++++++++++------
 include/linux/mm.h             | 25 +++++++++++--
 include/linux/page-flags.h     | 18 ++++++++++
 kernel/memremap.c              | 12 +++----
 mm/compaction.c                |  3 ++
 mm/memory_hotplug.c            | 80 +++++++++++++++++++++---------------------
 mm/page_alloc.c                | 25 +++++++++++--
 mm/page_isolation.c            | 11 +++++-
 mm/sparse-vmemmap.c            | 13 +++++--
 mm/sparse.c                    | 36 +++++++++++++------
 22 files changed, 290 insertions(+), 115 deletions(-)

Shortlog
Michal Hocko (6):
      mm, memory_hotplug: cleanup memory offline path
      mm, arch: unify vmemmap_populate altmap handling
      mm, memory_hotplug: provide a more generic restrictions for memory hotplug
      mm, memory_hotplug: allocate memmap from the added memory range for sparse-vmemmap
      mm, sparse: complain about implicit altmap usage in vmemmap_populate
      mm, sparse: rename kmalloc_section_memmap, __kfree_section_memmap


[1] http://lkml.kernel.org/r/20170726083333.17754-1-mhocko@kernel.org
[2] http://lkml.kernel.org/r/20170731195830.0d0ebf2f@thinkpad

* [PATCH 1/6] mm, memory_hotplug: cleanup memory offline path
From: Michal Hocko @ 2017-08-01 12:41 UTC
  To: linux-mm
  Cc: Andrew Morton, Mel Gorman, Vlastimil Babka, Andrea Arcangeli,
	Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen, Daniel Kiper,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

check_pages_isolated_cb currently accounts the whole pfn range as being
offlined if test_pages_isolated succeeds on the range. This is based on
the assumption that all pages in the range are freed, which currently
holds in most cases but won't once later changes land. I haven't
double checked, but if the range contains invalid pfns we could
theoretically over-account and underflow the zone's managed pages.

Move the offlined pages counting to offline_isolated_pages_cb and
rely on __offline_isolated_pages to return the correct value.
check_pages_isolated_cb will still do its primary job and check the pfn
range.

While we are at it, remove check_pages_isolated and offline_isolated_pages
and use walk_system_ram_range directly, as done in online_pages.
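
For orientation before the diff, the new contract (condensed from the
hunks below) is that __offline_isolated_pages reports how many pages it
actually offlined and the ram-range walk accumulates that through the
opaque data pointer:

	/* condensed from the mm/memory_hotplug.c hunk below */
	static int offline_isolated_pages_cb(unsigned long start,
			unsigned long nr_pages, void *data)
	{
		*(unsigned long *)data +=
			__offline_isolated_pages(start, start + nr_pages);
		return 0;
	}

	/* caller side, in __offline_pages() */
	walk_system_ram_range(start_pfn, end_pfn - start_pfn,
			      &offlined_pages, offline_isolated_pages_cb);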

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/memory_hotplug.h |  2 +-
 mm/memory_hotplug.c            | 43 ++++++++++--------------------------------
 mm/page_alloc.c                | 11 +++++++++--
 3 files changed, 20 insertions(+), 36 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 5e6e4cc36ff4..f64321b35e88 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -101,7 +101,7 @@ extern int add_one_highpage(struct page *page, int pfn, int bad_ppro);
 extern int online_pages(unsigned long, unsigned long, int);
 extern int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn,
 	unsigned long *valid_start, unsigned long *valid_end);
-extern void __offline_isolated_pages(unsigned long, unsigned long);
+extern unsigned long __offline_isolated_pages(unsigned long, unsigned long);
 
 typedef void (*online_page_callback_t)(struct page *page);
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1de2f132bca3..8031cc41bc5c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1444,17 +1444,12 @@ static int
 offline_isolated_pages_cb(unsigned long start, unsigned long nr_pages,
 			void *data)
 {
-	__offline_isolated_pages(start, start + nr_pages);
+	unsigned long offlined_pages;
+	offlined_pages = __offline_isolated_pages(start, start + nr_pages);
+	*(unsigned long *)data += offlined_pages;
 	return 0;
 }
 
-static void
-offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
-{
-	walk_system_ram_range(start_pfn, end_pfn - start_pfn, NULL,
-				offline_isolated_pages_cb);
-}
-
 /*
  * Check all pages in range, recoreded as memory resource, are isolated.
  */
@@ -1462,26 +1457,7 @@ static int
 check_pages_isolated_cb(unsigned long start_pfn, unsigned long nr_pages,
 			void *data)
 {
-	int ret;
-	long offlined = *(long *)data;
-	ret = test_pages_isolated(start_pfn, start_pfn + nr_pages, true);
-	offlined = nr_pages;
-	if (!ret)
-		*(long *)data += offlined;
-	return ret;
-}
-
-static long
-check_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
-{
-	long offlined = 0;
-	int ret;
-
-	ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn, &offlined,
-			check_pages_isolated_cb);
-	if (ret < 0)
-		offlined = (long)ret;
-	return offlined;
+	return test_pages_isolated(start_pfn, start_pfn + nr_pages, true);
 }
 
 static int __init cmdline_parse_movable_node(char *p)
@@ -1590,7 +1566,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		  unsigned long end_pfn, unsigned long timeout)
 {
 	unsigned long pfn, nr_pages, expire;
-	long offlined_pages;
+	unsigned long offlined_pages = 0;
 	int ret, drain, retry_max, node;
 	unsigned long flags;
 	unsigned long valid_start, valid_end;
@@ -1673,15 +1649,16 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	if (ret)
 		goto failed_removal;
 	/* check again */
-	offlined_pages = check_pages_isolated(start_pfn, end_pfn);
-	if (offlined_pages < 0) {
+	if (walk_system_ram_range(start_pfn, end_pfn - start_pfn, NULL,
+			check_pages_isolated_cb)) {
 		ret = -EBUSY;
 		goto failed_removal;
 	}
-	pr_info("Offlined Pages %ld\n", offlined_pages);
 	/* Ok, all of our target is isolated.
 	   We cannot do rollback at this point. */
-	offline_isolated_pages(start_pfn, end_pfn);
+	walk_system_ram_range(start_pfn, end_pfn - start_pfn, &offlined_pages,
+				offline_isolated_pages_cb);
+	pr_info("Offlined Pages %ld\n", offlined_pages);
 	/* reset pagetype flags and makes migrate type to be MOVABLE */
 	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
 	/* removal success */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index af224632c779..f4e5db85ebfc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7612,7 +7612,7 @@ void zone_pcp_reset(struct zone *zone)
  * All pages in the range must be in a single zone and isolated
  * before calling this.
  */
-void
+unsigned long
 __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 {
 	struct page *page;
@@ -7620,12 +7620,15 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 	unsigned int order, i;
 	unsigned long pfn;
 	unsigned long flags;
+	unsigned long offlined_pages = 0;
+
 	/* find the first valid pfn */
 	for (pfn = start_pfn; pfn < end_pfn; pfn++)
 		if (pfn_valid(pfn))
 			break;
 	if (pfn == end_pfn)
-		return;
+		return offlined_pages;
+
 	offline_mem_sections(pfn, end_pfn);
 	zone = page_zone(pfn_to_page(pfn));
 	spin_lock_irqsave(&zone->lock, flags);
@@ -7643,12 +7646,14 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 		if (unlikely(!PageBuddy(page) && PageHWPoison(page))) {
 			pfn++;
 			SetPageReserved(page);
+			offlined_pages++;
 			continue;
 		}
 
 		BUG_ON(page_count(page));
 		BUG_ON(!PageBuddy(page));
 		order = page_order(page);
+		offlined_pages += 1 << order;
 #ifdef CONFIG_DEBUG_VM
 		pr_info("remove from free list %lx %d %lx\n",
 			pfn, 1 << order, end_pfn);
@@ -7661,6 +7666,8 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 		pfn += (1 << order);
 	}
 	spin_unlock_irqrestore(&zone->lock, flags);
+
+	return offlined_pages;
 }
 #endif
 
-- 
2.13.2

* [PATCH 2/6] mm, arch: unify vmemmap_populate altmap handling
From: Michal Hocko @ 2017-08-01 12:41 UTC
  To: linux-mm
  Cc: Andrew Morton, Mel Gorman, Vlastimil Babka, Andrea Arcangeli,
	Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen, Daniel Kiper,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Michal Hocko,
	Benjamin Herrenschmidt, Catalin Marinas, Fenghua Yu,
	Heiko Carstens, H. Peter Anvin, Ingo Molnar, Martin Schwidefsky,
	Michael Ellerman, Paul Mackerras, Thomas Gleixner, Tony Luck,
	Will Deacon

From: Michal Hocko <mhocko@suse.com>

vmem_altmap allows vmemmap_populate to allocate the memmap (struct page
array) from an alternative allocator rather than bootmem or kmalloc.
Only x86 currently supports altmap handling, most likely because only
the nvdimm code uses this mechanism at the moment and that code depends
on ZONE_DEVICE, which is present only for x86_64. This will change in
follow up changes, so we would like other architectures to support it
as well.

Provide a generic vmemmap_populate implementation which simply resolves
the altmap and then calls into the arch-specific __vmemmap_populate.
Architectures then only need to use __vmemmap_alloc_block_buf to
allocate the memmap. vmemmap_free then needs to call vmem_altmap_free
if there is any altmap associated with the address.

This patch shouldn't introduce any functional changes because
to_vmem_altmap always returns NULL on !x86_64.
Changes since v1
- s390: use the altmap even for ptes in case the HW doesn't support
  large pages, as per Gerald Schaefer
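
For orientation before the per-arch hunks, the new generic entry point
(condensed from the include/linux/mm.h hunk below) boils down to:

	/* condensed from the include/linux/mm.h hunk below */
	int __vmemmap_populate(unsigned long start, unsigned long end,
			int node, struct vmem_altmap *altmap);

	static inline int vmemmap_populate(unsigned long start,
			unsigned long end, int node)
	{
		/* resolve a potential altmap once, in generic code ... */
		struct vmem_altmap *altmap = to_vmem_altmap(start);

		/* ... and hand it down to the architecture */
		return __vmemmap_populate(start, end, node, altmap);
	}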

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ia64@vger.kernel.org
Cc: x86@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/arm64/mm/mmu.c       |  9 ++++++---
 arch/ia64/mm/discontig.c  |  4 +++-
 arch/powerpc/mm/init_64.c | 29 ++++++++++++++++++++---------
 arch/s390/mm/vmem.c       |  9 +++++----
 arch/sparc/mm/init_64.c   |  6 +++---
 arch/x86/mm/init_64.c     |  4 ++--
 include/linux/memremap.h  | 13 ++-----------
 include/linux/mm.h        | 19 ++++++++++++++++++-
 mm/sparse-vmemmap.c       |  2 +-
 9 files changed, 60 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 0c429ec6fde8..5de1161e7a1b 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -649,12 +649,15 @@ int kern_addr_valid(unsigned long addr)
 }
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 #if !ARM64_SWAPPER_USES_SECTION_MAPS
-int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
+int __meminit __vmemmap_populate(unsigned long start, unsigned long end, int node,
+		struct vmem_altmap *altmap)
 {
+	WARN(altmap, "altmap unsupported\n");
 	return vmemmap_populate_basepages(start, end, node);
 }
 #else	/* !ARM64_SWAPPER_USES_SECTION_MAPS */
-int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
+int __meminit __vmemmap_populate(unsigned long start, unsigned long end, int node,
+		struct vmem_altmap *altmap)
 {
 	unsigned long addr = start;
 	unsigned long next;
@@ -677,7 +680,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
 		if (pmd_none(*pmd)) {
 			void *p = NULL;
 
-			p = vmemmap_alloc_block_buf(PMD_SIZE, node);
+			p = __vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
 			if (!p)
 				return -ENOMEM;
 
diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
index 878626805369..2a939e877ced 100644
--- a/arch/ia64/mm/discontig.c
+++ b/arch/ia64/mm/discontig.c
@@ -753,8 +753,10 @@ void arch_refresh_nodedata(int update_node, pg_data_t *update_pgdat)
 #endif
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
-int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
+int __meminit __vmemmap_populate(unsigned long start, unsigned long end, int node,
+		struct vmem_altmap *altmap)
 {
+	WARN(altmap, "altmap unsupported\n");
 	return vmemmap_populate_basepages(start, end, node);
 }
 
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index ec84b31c6c86..5ea5e870a589 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -44,6 +44,7 @@
 #include <linux/slab.h>
 #include <linux/of_fdt.h>
 #include <linux/libfdt.h>
+#include <linux/memremap.h>
 
 #include <asm/pgalloc.h>
 #include <asm/page.h>
@@ -115,7 +116,8 @@ static struct vmemmap_backing *next;
 static int num_left;
 static int num_freed;
 
-static __meminit struct vmemmap_backing * vmemmap_list_alloc(int node)
+static __meminit struct vmemmap_backing * vmemmap_list_alloc(int node,
+		struct vmem_altmap *altmap)
 {
 	struct vmemmap_backing *vmem_back;
 	/* get from freed entries first */
@@ -129,7 +131,7 @@ static __meminit struct vmemmap_backing * vmemmap_list_alloc(int node)
 
 	/* allocate a page when required and hand out chunks */
 	if (!num_left) {
-		next = vmemmap_alloc_block(PAGE_SIZE, node);
+		next = __vmemmap_alloc_block_buf(PAGE_SIZE, node, altmap);
 		if (unlikely(!next)) {
 			WARN_ON(1);
 			return NULL;
@@ -144,11 +146,12 @@ static __meminit struct vmemmap_backing * vmemmap_list_alloc(int node)
 
 static __meminit void vmemmap_list_populate(unsigned long phys,
 					    unsigned long start,
-					    int node)
+					    int node,
+					    struct vmem_altmap *altmap)
 {
 	struct vmemmap_backing *vmem_back;
 
-	vmem_back = vmemmap_list_alloc(node);
+	vmem_back = vmemmap_list_alloc(node, altmap);
 	if (unlikely(!vmem_back)) {
 		WARN_ON(1);
 		return;
@@ -161,14 +164,15 @@ static __meminit void vmemmap_list_populate(unsigned long phys,
 	vmemmap_list = vmem_back;
 }
 
-int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
+int __meminit __vmemmap_populate(unsigned long start, unsigned long end, int node,
+		struct vmem_altmap *altmap)
 {
 	unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;
 
 	/* Align to the page size of the linear mapping. */
 	start = _ALIGN_DOWN(start, page_size);
 
-	pr_debug("vmemmap_populate %lx..%lx, node %d\n", start, end, node);
+	pr_debug("__vmemmap_populate %lx..%lx, node %d\n", start, end, node);
 
 	for (; start < end; start += page_size) {
 		void *p;
@@ -177,11 +181,11 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
 		if (vmemmap_populated(start, page_size))
 			continue;
 
-		p = vmemmap_alloc_block(page_size, node);
+		p = __vmemmap_alloc_block_buf(page_size, node, altmap);
 		if (!p)
 			return -ENOMEM;
 
-		vmemmap_list_populate(__pa(p), start, node);
+		vmemmap_list_populate(__pa(p), start, node, altmap);
 
 		pr_debug("      * %016lx..%016lx allocated at %p\n",
 			 start, start + page_size, p);
@@ -189,7 +193,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
 		rc = vmemmap_create_mapping(start, page_size, __pa(p));
 		if (rc < 0) {
 			pr_warning(
-				"vmemmap_populate: Unable to create vmemmap mapping: %d\n",
+				"__vmemmap_populate: Unable to create vmemmap mapping: %d\n",
 				rc);
 			return -EFAULT;
 		}
@@ -253,6 +257,12 @@ void __ref vmemmap_free(unsigned long start, unsigned long end)
 		addr = vmemmap_list_free(start);
 		if (addr) {
 			struct page *page = pfn_to_page(addr >> PAGE_SHIFT);
+			struct vmem_altmap *altmap = to_vmem_altmap((unsigned long) page);
+
+			if (altmap) {
+				vmem_altmap_free(altmap, page_size >> PAGE_SHIFT);
+				goto unmap;
+			}
 
 			if (PageReserved(page)) {
 				/* allocated from bootmem */
@@ -272,6 +282,7 @@ void __ref vmemmap_free(unsigned long start, unsigned long end)
 				free_pages((unsigned long)(__va(addr)),
 							get_order(page_size));
 
+unmap:
 			vmemmap_remove_mapping(start, page_size);
 		}
 	}
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index c33c94b4be60..764b6393e66c 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -208,7 +208,8 @@ static void vmem_remove_range(unsigned long start, unsigned long size)
 /*
  * Add a backed mem_map array to the virtual mem_map array.
  */
-int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
+int __meminit __vmemmap_populate(unsigned long start, unsigned long end, int node,
+		struct vmem_altmap *altmap)
 {
 	unsigned long pgt_prot, sgt_prot;
 	unsigned long address = start;
@@ -247,12 +248,12 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
 			 * use large frames even if they are only partially
 			 * used.
 			 * Otherwise we would have also page tables since
-			 * vmemmap_populate gets called for each section
+			 * __vmemmap_populate gets called for each section
 			 * separately. */
 			if (MACHINE_HAS_EDAT1) {
 				void *new_page;
 
-				new_page = vmemmap_alloc_block(PMD_SIZE, node);
+				new_page = __vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
 				if (!new_page)
 					goto out;
 				pmd_val(*pm_dir) = __pa(new_page) | sgt_prot;
@@ -272,7 +273,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
 		if (pte_none(*pt_dir)) {
 			void *new_page;
 
-			new_page = vmemmap_alloc_block(PAGE_SIZE, node);
+			new_page = __vmemmap_alloc_block_buf(PAGE_SIZE, node, altmap);
 			if (!new_page)
 				goto out;
 			pte_val(*pt_dir) = __pa(new_page) | pgt_prot;
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 3c40ebd50f92..4a77dfa85468 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2544,8 +2544,8 @@ unsigned long _PAGE_CACHE __read_mostly;
 EXPORT_SYMBOL(_PAGE_CACHE);
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
-int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend,
-			       int node)
+int __meminit __vmemmap_populate(unsigned long vstart, unsigned long vend,
+			       int node, struct vmem_altmap *altmap)
 {
 	unsigned long pte_base;
 
@@ -2587,7 +2587,7 @@ int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend,
 
 		pte = pmd_val(*pmd);
 		if (!(pte & _PAGE_VALID)) {
-			void *block = vmemmap_alloc_block(PMD_SIZE, node);
+			void *block = __vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
 
 			if (!block)
 				return -ENOMEM;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 136422d7d539..cfe01150f24f 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1398,9 +1398,9 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start,
 	return 0;
 }
 
-int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
+int __meminit __vmemmap_populate(unsigned long start, unsigned long end, int node,
+		struct vmem_altmap *altmap)
 {
-	struct vmem_altmap *altmap = to_vmem_altmap(start);
 	int err;
 
 	if (boot_cpu_has(X86_FEATURE_PSE))
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 93416196ba64..6f8f65d8ebdd 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -8,12 +8,12 @@ struct resource;
 struct device;
 
 /**
- * struct vmem_altmap - pre-allocated storage for vmemmap_populate
+ * struct vmem_altmap - pre-allocated storage for __vmemmap_populate
  * @base_pfn: base of the entire dev_pagemap mapping
  * @reserve: pages mapped, but reserved for driver use (relative to @base)
  * @free: free pages set aside in the mapping for memmap storage
  * @align: pages reserved to meet allocation alignments
- * @alloc: track pages consumed, private to vmemmap_populate()
+ * @alloc: track pages consumed, private to __vmemmap_populate()
  */
 struct vmem_altmap {
 	const unsigned long base_pfn;
@@ -26,15 +26,6 @@ struct vmem_altmap {
 unsigned long vmem_altmap_offset(struct vmem_altmap *altmap);
 void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns);
 
-#ifdef CONFIG_ZONE_DEVICE
-struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start);
-#else
-static inline struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start)
-{
-	return NULL;
-}
-#endif
-
 /**
  * struct dev_pagemap - metadata for ZONE_DEVICE mappings
  * @altmap: pre-allocated/reserved memory for vmemmap allocations
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6f543a47fc92..957d4658977d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2441,10 +2441,27 @@ static inline void *vmemmap_alloc_block_buf(unsigned long size, int node)
 	return __vmemmap_alloc_block_buf(size, node, NULL);
 }
 
+#ifdef CONFIG_ZONE_DEVICE
+struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start);
+#else
+static inline struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start)
+{
+	return NULL;
+}
+#endif
+
 void vmemmap_verify(pte_t *, int, unsigned long, unsigned long);
 int vmemmap_populate_basepages(unsigned long start, unsigned long end,
 			       int node);
-int vmemmap_populate(unsigned long start, unsigned long end, int node);
+int __vmemmap_populate(unsigned long start, unsigned long end, int node,
+		struct vmem_altmap *altmap);
+static inline int vmemmap_populate(unsigned long start, unsigned long end,
+		int node)
+{
+	struct vmem_altmap *altmap = to_vmem_altmap(start);
+	return __vmemmap_populate(start, end, node, altmap);
+}
+
 void vmemmap_populate_print_last(void);
 #ifdef CONFIG_MEMORY_HOTPLUG
 void vmemmap_free(unsigned long start, unsigned long end);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index d1a39b8051e0..48c44eda3254 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -14,7 +14,7 @@
  * case the overhead consists of a few additional pages that are
  * allocated to create a view of memory for vmemmap.
  *
- * The architecture is expected to provide a vmemmap_populate() function
+ * The architecture is expected to provide a __vmemmap_populate() function
  * to instantiate the mapping.
  */
 #include <linux/mm.h>
-- 
2.13.2

* [PATCH 3/6] mm, memory_hotplug: provide a more generic restrictions for memory hotplug
From: Michal Hocko @ 2017-08-01 12:41 UTC
  To: linux-mm
  Cc: Andrew Morton, Mel Gorman, Vlastimil Babka, Andrea Arcangeli,
	Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen, Daniel Kiper,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Michal Hocko,
	Dan Williams

From: Michal Hocko <mhocko@suse.com>

arch_add_memory and __add_pages take a want_memblock argument which
controls whether the newly added memory should get the sysfs memblock
user API (e.g. ZONE_DEVICE users do not want/need this interface). Some
callers even want to control where we allocate the memmap from, by
configuring an altmap. This is currently done quite awkwardly by
searching for the altmap down in the memory hotplug code
(to_vmem_altmap). It should be the caller that provides the altmap down
the call chain.

Add a more generic hotplug context for arch_add_memory and __add_pages.
struct mhp_restrictions contains a flags field which carries additional
features to be enabled by the memory hotplug (currently only
MHP_MEMBLOCK_API) and an altmap for an alternative memmap allocator.

Please note that the complete altmap propagation down to the vmemmap
code is still not done in this patch. It will be done in a follow up to
reduce the churn here.

This patch shouldn't introduce any functional change.
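
Since the include/linux/memory_hotplug.h hunk is truncated at the end
of this excerpt, here is a sketch of the new context, reconstructed
from the description above and the arch_add_memory call sites below;
treat the exact layout as an assumption, not the authoritative
definition:

	/*
	 * Sketch only: reconstructed from the commit message; the
	 * authoritative definition lives in the (truncated)
	 * include/linux/memory_hotplug.h hunk.
	 */
	struct mhp_restrictions {
		unsigned long flags;		/* e.g. MHP_MEMBLOCK_API */
		struct vmem_altmap *altmap;	/* alternative memmap allocator */
	};

	int arch_add_memory(int nid, u64 start, u64 size,
			struct mhp_restrictions *restrictions);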

Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/ia64/mm/init.c            |  5 +++--
 arch/powerpc/mm/mem.c          |  5 +++--
 arch/s390/mm/init.c            |  5 +++--
 arch/sh/mm/init.c              |  5 +++--
 arch/x86/mm/init_32.c          |  5 +++--
 arch/x86/mm/init_64.c          |  5 +++--
 include/linux/memory_hotplug.h | 18 ++++++++++++++++--
 kernel/memremap.c              |  6 +++++-
 mm/memory_hotplug.c            | 14 +++++++++-----
 9 files changed, 48 insertions(+), 20 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index a4e8d6bd9cfa..7fab6a4bdda7 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -646,13 +646,14 @@ mem_init (void)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
+int arch_add_memory(int nid, u64 start, u64 size,
+		struct mhp_restrictions *restrictions)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	int ret;
 
-	ret = __add_pages(nid, start_pfn, nr_pages, want_memblock);
+	ret = __add_pages(nid, start_pfn, nr_pages, restrictions);
 	if (ret)
 		printk("%s: Problem encountered in __add_pages() as ret=%d\n",
 		       __func__,  ret);
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index de5a90e1ceaa..dcbf278c4e5f 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -126,7 +126,8 @@ int __weak remove_section_mapping(unsigned long start, unsigned long end)
 	return -ENODEV;
 }
 
-int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
+int arch_add_memory(int nid, u64 start, u64 size,
+		struct mhp_restrictions *restrictions)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
@@ -143,7 +144,7 @@ int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
 		return -EFAULT;
 	}
 
-	return __add_pages(nid, start_pfn, nr_pages, want_memblock);
+	return __add_pages(nid, start_pfn, nr_pages, restrictions);
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index f4fb5d191562..e3db0079ebfc 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -162,7 +162,8 @@ unsigned long memory_block_size_bytes(void)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
+int arch_add_memory(int nid, u64 start, u64 size,
+		struct mhp_restrictions *restrictions)
 {
 	unsigned long start_pfn = PFN_DOWN(start);
 	unsigned long size_pages = PFN_DOWN(size);
@@ -172,7 +173,7 @@ int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
 	if (rc)
 		return rc;
 
-	rc = __add_pages(nid, start_pfn, size_pages, want_memblock);
+	rc = __add_pages(nid, start_pfn, size_pages, restrictions);
 	if (rc)
 		vmem_remove_mapping(start, size);
 	return rc;
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index bf726af5f1a5..a603e9be989b 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -485,14 +485,15 @@ void free_initrd_mem(unsigned long start, unsigned long end)
 #endif
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
+int arch_add_memory(int nid, u64 start, u64 size,
+		struct mhp_restrictions *restrictions)
 {
 	unsigned long start_pfn = PFN_DOWN(start);
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	int ret;
 
 	/* We only have ZONE_NORMAL, so this is easy.. */
-	ret = __add_pages(nid, start_pfn, nr_pages, want_memblock);
+	ret = __add_pages(nid, start_pfn, nr_pages, restrictions);
 	if (unlikely(ret))
 		printk("%s: Failed, __add_pages() == %d\n", __func__, ret);
 
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 8a64a6f2848d..8465555cac90 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -823,12 +823,13 @@ void __init mem_init(void)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
+int arch_add_memory(int nid, u64 start, u64 size,
+		struct mhp_restrictions *restrictions)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
-	return __add_pages(nid, start_pfn, nr_pages, want_memblock);
+	return __add_pages(nid, start_pfn, nr_pages, restrictions);
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index cfe01150f24f..da57a9c9c218 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -772,7 +772,8 @@ static void  update_end_of_memory_vars(u64 start, u64 size)
 	}
 }
 
-int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
+int arch_add_memory(int nid, u64 start, u64 size,
+		struct mhp_restrictions *restrictions)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
@@ -780,7 +781,7 @@ int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
 
 	init_memory_mapping(start, start + size);
 
-	ret = __add_pages(nid, start_pfn, nr_pages, want_memblock);
+	ret = __add_pages(nid, start_pfn, nr_pages, restrictions);
 	WARN_ON_ONCE(ret);
 
 	/* update max_pfn, max_low_pfn and high_memory */
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index f64321b35e88..76e5bfde8050 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -129,9 +129,22 @@ extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
 	unsigned long nr_pages);
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
+/*
+ * Do we want sysfs memblock files created? This will allow userspace to online
+ * and offline memory explicitly. Lack of this bit means that the caller has to
+ * call move_pfn_range_to_zone to finish the initialization.
+ */
+#define MHP_MEMBLOCK_API		(1<<0)
+
+/* Restrictions for the memory hotplug */
+struct mhp_restrictions {
+	unsigned long flags;	/* MHP_ flags */
+	struct vmem_altmap *altmap; /* use this alternative allocator for memmaps */
+};
+
 /* reasonably generic interface to expand the physical pages */
 extern int __add_pages(int nid, unsigned long start_pfn,
-	unsigned long nr_pages, bool want_memblock);
+	unsigned long nr_pages, struct mhp_restrictions *restrictions);
 
 #ifdef CONFIG_NUMA
 extern int memory_add_physaddr_to_nid(u64 start);
@@ -306,7 +319,8 @@ extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
 		void *arg, int (*func)(struct memory_block *, void *));
 extern int add_memory(int nid, u64 start, u64 size);
 extern int add_memory_resource(int nid, struct resource *resource, bool online);
-extern int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock);
+extern int arch_add_memory(int nid, u64 start, u64 size,
+		struct mhp_restrictions *restrictions);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 		unsigned long nr_pages);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 124bed776532..02029a993329 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -286,6 +286,7 @@ void *devm_memremap_pages(struct device *dev, struct resource *res,
 	struct dev_pagemap *pgmap;
 	struct page_map *page_map;
 	int error, nid, is_ram;
+	struct mhp_restrictions restrictions = {};
 	unsigned long pfn;
 
 	align_start = res->start & ~(SECTION_SIZE - 1);
@@ -357,8 +358,11 @@ void *devm_memremap_pages(struct device *dev, struct resource *res,
 	if (error)
 		goto err_pfn_remap;
 
+	/* We do not want any optional features, only our own memmap */
+	restrictions.altmap = altmap;
+
 	mem_hotplug_begin();
-	error = arch_add_memory(nid, align_start, align_size, false);
+	error = arch_add_memory(nid, align_start, align_size, &restrictions);
 	if (!error)
 		move_pfn_range_to_zone(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
 					align_start >> PAGE_SHIFT,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 8031cc41bc5c..d28883aea475 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -286,18 +286,18 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
  * add the new pages.
  */
 int __ref __add_pages(int nid, unsigned long phys_start_pfn,
-			unsigned long nr_pages, bool want_memblock)
+			unsigned long nr_pages,
+			struct mhp_restrictions *restrictions)
 {
 	unsigned long i;
 	int err = 0;
 	int start_sec, end_sec;
-	struct vmem_altmap *altmap;
+	struct vmem_altmap *altmap = restrictions->altmap;
 
 	/* during initialize mem_map, align hot-added range to section */
 	start_sec = pfn_to_section_nr(phys_start_pfn);
 	end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
 
-	altmap = to_vmem_altmap((unsigned long) pfn_to_page(phys_start_pfn));
 	if (altmap) {
 		/*
 		 * Validate altmap is within bounds of the total request
@@ -312,7 +312,8 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
 	}
 
 	for (i = start_sec; i <= end_sec; i++) {
-		err = __add_section(nid, section_nr_to_pfn(i), want_memblock);
+		err = __add_section(nid, section_nr_to_pfn(i),
+				restrictions->flags & MHP_MEMBLOCK_API);
 
 		/*
 		 * EEXIST is finally dealt with by ioresource collision
@@ -1114,6 +1115,7 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
 	bool new_pgdat;
 	bool new_node;
 	int ret;
+	struct mhp_restrictions restrictions = {};
 
 	start = res->start;
 	size = resource_size(res);
@@ -1145,8 +1147,10 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
 			goto error;
 	}
 
+	restrictions.flags = MHP_MEMBLOCK_API;
+
 	/* call arch's memory hotadd */
-	ret = arch_add_memory(nid, start, size, true);
+	ret = arch_add_memory(nid, start, size, &restrictions);
 
 	if (ret < 0)
 		goto error;
-- 
2.13.2

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 4/6] mm, memory_hotplug: allocate memmap from the added memory range for sparse-vmemmap
  2017-08-01 12:41 ` Michal Hocko
@ 2017-08-01 12:41   ` Michal Hocko
  -1 siblings, 0 replies; 26+ messages in thread
From: Michal Hocko @ 2017-08-01 12:41 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Mel Gorman, Vlastimil Babka, Andrea Arcangeli,
	Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen, Daniel Kiper,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Michal Hocko,
	Benjamin Herrenschmidt, Dan Williams, Gerald Schaefer,
	H. Peter Anvin, Ingo Molnar, Michael Ellerman, Paul Mackerras,
	Thomas Gleixner, x86

From: Michal Hocko <mhocko@suse.com>

Physical memory hotadd has to allocate a memmap (struct page array) for
the newly added memory section. kmalloc is currently used for those
allocations.

This has some disadvantages: a) existing memory is consumed for
that purpose (~2MB per 128MB memory section) and b) if the whole node
is movable then we have off-node struct pages which have performance
drawbacks.
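
(To make the ~2MB figure concrete, assuming the common x86_64
configuration with 4kB pages and a 64B struct page: a 128MB section
contains 128MB / 4kB = 32768 pages, so its memmap takes
32768 * 64B = 2MB, i.e. 512 pages of the section itself.)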

a) has turned out to be a problem for memory hotplug based ballooning
because userspace might not react in time to online the memory, while
the memory consumed during the physical hotadd itself is enough to push
the system to OOM. 31bc3858ea3e ("memory-hotplug: add automatic onlining
policy for the newly added memory") has been added to work around that
problem.

We can do much better when CONFIG_SPARSEMEM_VMEMMAP=y because vmemmap
page tables can map arbitrary memory. That means that we can simply
use the beginning of each memory block and map struct pages there.
The struct pages which back the allocated space then just need to be
treated carefully so that we know they are not usable.

Add {__Set,__Clear}PageVmemmap helpers to distinguish those pages in
pfn walkers. We do not have any spare page flag for this purpose, so use
a combination of the PageReserved bit, which already tells the core mm
code that the page should be ignored, and VMEMMAP_PAGE (which sets all
bits but PAGE_MAPPING_FLAGS) stored in page->mapping.
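
The sentinel cannot collide with a real page->mapping value in
practice: ~PAGE_MAPPING_FLAGS has every bit set except the low type-tag
bits, i.e. it points at the very top of the address space where no
address_space or anon_vma object ever lives. A sketch, assuming the
generic PAGE_MAPPING_FLAGS (PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE ==
0x3):

	/* VMEMMAP_PAGE == ~0x3UL == 0xfffffffffffffffc on 64bit;
	 * PageVmemmap() is then just PageReserved() plus an equality
	 * check of page->mapping against this sentinel */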

On the memory hotplug front add a new MHP_MEMMAP_FROM_RANGE restriction
flag. The user is supposed to set the flag if the memmap should be
allocated from the hotadded range. Please note that this is just a hint
and architecture code can veto it if it cannot be supported. E.g. s390
cannot support this currently because the physical memory range is made
accessible only during memory online.

Implementation-wise we reuse the vmem_altmap infrastructure to override
the default allocator used by __vmemmap_populate. Once the memmap is
allocated we need a way to mark the altmap pfns used for the allocation,
and this is done by a new vmem_altmap::flush_alloc_pfns callback. The
mark_vmemmap_pages implementation then simply calls __SetPageVmemmap on
all struct pages backing those pfns. The callback is called from
sparse_add_one_section after the memmap has been initialized to 0.
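
The resulting per-section flow, as a rough sketch of what the
mm/memory_hotplug.c and mm/sparse.c hunks below implement:

	/* __add_pages(): build a per-memblock altmap when the caller
	 * asked for MHP_MEMMAP_FROM_RANGE but provided no altmap */
	__memblk_altmap.base_pfn = phys_start_pfn;
	__memblk_altmap.free = nr_pages;
	__memblk_altmap.flush_alloc_pfns = mark_vmemmap_pages;

	/* sparse_add_one_section(): populate the memmap from the
	 * altmap, zero it and then mark the consumed pfns */
	memmap = kmalloc_section_memmap(section_nr, pgdat->node_id, altmap);
	memset(memmap, 0, sizeof(struct page) * PAGES_PER_SECTION);
	if (altmap && altmap->flush_alloc_pfns)
		altmap->flush_alloc_pfns(altmap);	/* __SetPageVmemmap */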

We also have to be careful about those pages during online and offline
operations. They are now simply skipped: online keeps them reserved and
thus unusable for any other purpose, while offline ignores them so they
do not block the offline operation.

Finally __ClearPageVmemmap is called when the vmemmap page tables are
torn down.

Please note that only the memory hotplug is currently using this
allocation scheme. The boot time memmap allocation could use the same
trick as well but this is not done yet.

Changes since RFC
- rewind base pfn of the vmem_altmap when flushing one section
  (mark_vmemmap_pages) because we do not want to flush the same range
  all over again when memblock has more than one section
- Allow memory hotplug users to explicitly opt in to this feature by
  using the MHP_MEMMAP_FROM_RANGE flag.
- s390 disables MHP_MEMMAP_FROM_RANGE unconditionally because allocating
  from the added memory will blow up as the memory range is still not
  accessible that early.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <x86@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/powerpc/mm/init_64.c      |  5 +++++
 arch/s390/mm/init.c            |  6 ++++++
 arch/x86/mm/init_64.c          |  9 +++++++++
 include/linux/memory_hotplug.h | 12 +++++++++++-
 include/linux/memremap.h       | 30 ++++++++++++++++++++++++++++--
 include/linux/mm.h             |  6 ++----
 include/linux/page-flags.h     | 18 ++++++++++++++++++
 kernel/memremap.c              |  6 ------
 mm/compaction.c                |  3 +++
 mm/memory_hotplug.c            | 27 +++++++++++++++++++++++----
 mm/page_alloc.c                | 14 ++++++++++++++
 mm/page_isolation.c            | 11 ++++++++++-
 mm/sparse-vmemmap.c            | 11 +++++++++--
 mm/sparse.c                    | 26 +++++++++++++++++++++-----
 14 files changed, 159 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 5ea5e870a589..b14f2239a152 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -264,6 +264,11 @@ void __ref vmemmap_free(unsigned long start, unsigned long end)
 				goto unmap;
 			}
 
+			if (PageVmemmap(page)) {
+				__ClearPageVmemmap(page);
+				goto unmap;
+			}
+
 			if (PageReserved(page)) {
 				/* allocated from bootmem */
 				if (page_size < PAGE_SIZE) {
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index e3db0079ebfc..85dacb95cfa4 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -169,6 +169,12 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	unsigned long size_pages = PFN_DOWN(size);
 	int rc;
 
+	/*
+	 * Physical memory is added only later, during memory online, so we
+	 * unfortunately cannot use the added range at this stage
+	 */
+	restrictions->flags &= ~MHP_MEMMAP_FROM_RANGE;
+
 	rc = vmem_add_mapping(start, size);
 	if (rc)
 		return rc;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index da57a9c9c218..66951bf03629 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -804,6 +804,15 @@ static void __meminit free_pagetable(struct page *page, int order)
 		return;
 	}
 
+	/*
+	 * Runtime vmemmap pages reside inside the memory section so they
+	 * do not have to be freed anywhere.
+	 */
+	if (PageVmemmap(page)) {
+		__ClearPageVmemmap(page);
+		return;
+	}
+
 	/* bootmem page has reserved flag */
 	if (PageReserved(page)) {
 		__ClearPageReserved(page);
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 76e5bfde8050..6fe47d1c9c74 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -136,6 +136,13 @@ extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
  */
 #define MHP_MEMBLOCK_API		1<<0
 
+/*
+ * Do we want memmap (struct page array) allocated from the hotadded range.
+ * Please note that only SPARSE_VMEMMAP implements this feature and some
+ * architectures might not support it even for that memory model (e.g. s390)
+ */
+#define MHP_MEMMAP_FROM_RANGE		(1<<1)
+
 /* Restrictions for the memory hotplug */
 struct mhp_restrictions {
 	unsigned long flags;	/* MHP_ flags */
@@ -326,7 +333,10 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
 extern bool is_memblock_offlined(struct memory_block *mem);
 extern void remove_memory(int nid, u64 start, u64 size);
-extern int sparse_add_one_section(struct pglist_data *pgdat, unsigned long start_pfn);
+
+struct vmem_altmap;
+extern int sparse_add_one_section(struct pglist_data *pgdat, unsigned long start_pfn,
+		struct vmem_altmap *altmap);
 extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
 		unsigned long map_offset);
 extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 6f8f65d8ebdd..55f82c652d51 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -14,18 +14,44 @@ struct device;
  * @free: free pages set aside in the mapping for memmap storage
  * @align: pages reserved to meet allocation alignments
  * @alloc: track pages consumed, private to __vmemmap_populate()
+ * @flush_alloc_pfns: callback to be called on the allocated range after it
+ *  is mapped to the vmemmap - see mark_vmemmap_pages
  */
 struct vmem_altmap {
-	const unsigned long base_pfn;
+	unsigned long base_pfn;
 	const unsigned long reserve;
 	unsigned long free;
 	unsigned long align;
 	unsigned long alloc;
+	void (*flush_alloc_pfns)(struct vmem_altmap *self);
 };
 
-unsigned long vmem_altmap_offset(struct vmem_altmap *altmap);
+static inline unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
+{
+	/* number of pfns from base where pfn_to_page() is valid */
+	return altmap->reserve + altmap->free;
+}
 void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns);
 
+static inline void mark_vmemmap_pages(struct vmem_altmap *self)
+{
+	unsigned long pfn = self->base_pfn + self->reserve;
+	unsigned long nr_pages = self->alloc;
+	unsigned long i;
+
+	/*
+	 * All allocations for the memory hotplug are the same size so align
+	 * should be 0
+	 */
+	WARN_ON(self->align);
+	for (i = 0; i < nr_pages; i++, pfn++) {
+		struct page *page = pfn_to_page(pfn);
+		__SetPageVmemmap(page);
+	}
+
+	self->alloc = 0;
+	self->base_pfn += nr_pages + self->reserve;
+}
 /**
  * struct dev_pagemap - metadata for ZONE_DEVICE mappings
  * @altmap: pre-allocated/reserved memory for vmemmap allocations
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 957d4658977d..3ce673570fb8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2426,6 +2426,8 @@ void sparse_mem_maps_populate_node(struct page **map_map,
 				   unsigned long map_count,
 				   int nodeid);
 
+struct page *__sparse_mem_map_populate(unsigned long pnum, int nid,
+		struct vmem_altmap *altmap);
 struct page *sparse_mem_map_populate(unsigned long pnum, int nid);
 pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
 p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node);
@@ -2436,10 +2438,6 @@ void *vmemmap_alloc_block(unsigned long size, int node);
 struct vmem_altmap;
 void *__vmemmap_alloc_block_buf(unsigned long size, int node,
 		struct vmem_altmap *altmap);
-static inline void *vmemmap_alloc_block_buf(unsigned long size, int node)
-{
-	return __vmemmap_alloc_block_buf(size, node, NULL);
-}
 
 #ifdef CONFIG_ZONE_DEVICE
 struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start);
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index ba2d470d2d0a..dcc7706eed8e 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -412,6 +412,24 @@ static __always_inline int __PageMovable(struct page *page)
 				PAGE_MAPPING_MOVABLE;
 }
 
+#define VMEMMAP_PAGE ~PAGE_MAPPING_FLAGS
+static __always_inline int PageVmemmap(struct page *page)
+{
+	return PageReserved(page) && (unsigned long)page->mapping == VMEMMAP_PAGE;
+}
+
+static __always_inline void __ClearPageVmemmap(struct page *page)
+{
+	ClearPageReserved(page);
+	page->mapping = NULL;
+}
+
+static __always_inline void __SetPageVmemmap(struct page *page)
+{
+	SetPageReserved(page);
+	page->mapping = (void *)VMEMMAP_PAGE;
+}
+
 #ifdef CONFIG_KSM
 /*
  * A KSM page is one of those write-protected "shared pages" or "merged pages"
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 02029a993329..844f5ba93383 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -397,12 +397,6 @@ void *devm_memremap_pages(struct device *dev, struct resource *res,
 }
 EXPORT_SYMBOL(devm_memremap_pages);
 
-unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
-{
-	/* number of pfns from base where pfn_to_page() is valid */
-	return altmap->reserve + altmap->free;
-}
-
 void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns)
 {
 	altmap->alloc -= nr_pfns;
diff --git a/mm/compaction.c b/mm/compaction.c
index fb548e4c7bd4..1ba5ad029258 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -741,6 +741,9 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 		page = pfn_to_page(low_pfn);
 
+		if (PageVmemmap(page))
+			goto isolate_fail;
+
 		if (!valid_page)
 			valid_page = page;
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d28883aea475..08545a567de6 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -35,6 +35,7 @@
 #include <linux/memblock.h>
 #include <linux/bootmem.h>
 #include <linux/compaction.h>
+#include <linux/memremap.h>
 
 #include <asm/tlbflush.h>
 
@@ -244,7 +245,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
 #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
 
 static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
-		bool want_memblock)
+		bool want_memblock, struct vmem_altmap *altmap)
 {
 	int ret;
 	int i;
@@ -252,7 +253,7 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
 	if (pfn_valid(phys_start_pfn))
 		return -EEXIST;
 
-	ret = sparse_add_one_section(NODE_DATA(nid), phys_start_pfn);
+	ret = sparse_add_one_section(NODE_DATA(nid), phys_start_pfn, altmap);
 	if (ret < 0)
 		return ret;
 
@@ -293,11 +294,23 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
 	int err = 0;
 	int start_sec, end_sec;
 	struct vmem_altmap *altmap = restrictions->altmap;
+	struct vmem_altmap __memblk_altmap = {}; /* zero reserve/align/alloc */
 
 	/* during initialize mem_map, align hot-added range to section */
 	start_sec = pfn_to_section_nr(phys_start_pfn);
 	end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
 
+	/*
+	 * Check for a device specific altmap and fall back to allocating from
+	 * the beginning of the memblock if requested
+	 */
+	if (!altmap && (restrictions->flags & MHP_MEMMAP_FROM_RANGE)) {
+		__memblk_altmap.base_pfn = phys_start_pfn;
+		__memblk_altmap.free = nr_pages;
+		__memblk_altmap.flush_alloc_pfns = mark_vmemmap_pages;
+		altmap = &__memblk_altmap;
+	}
+
 	if (altmap) {
 		/*
 		 * Validate altmap is within bounds of the total request
@@ -313,7 +326,8 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
 
 	for (i = start_sec; i <= end_sec; i++) {
 		err = __add_section(nid, section_nr_to_pfn(i),
-				restrictions->flags & MHP_MEMBLOCK_API);
+				restrictions->flags & MHP_MEMBLOCK_API,
+				altmap);
 
 		/*
 		 * EEXIST is finally dealt with by ioresource collision
@@ -690,6 +704,8 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
 	if (PageReserved(pfn_to_page(start_pfn)))
 		for (i = 0; i < nr_pages; i++) {
 			page = pfn_to_page(start_pfn + i);
+			if (PageVmemmap(page))
+				continue;
 			if (PageHWPoison(page)) {
 				ClearPageReserved(page);
 				continue;
@@ -1147,7 +1163,7 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
 			goto error;
 	}
 
-	restrictions.flags = MHP_MEMBLOCK_API;
+	restrictions.flags = MHP_MEMBLOCK_API | MHP_MEMMAP_FROM_RANGE;
 
 	/* call arch's memory hotadd */
 	ret = arch_add_memory(nid, start, size, &restrictions);
@@ -1378,6 +1394,9 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 			continue;
 		page = pfn_to_page(pfn);
 
+		if (PageVmemmap(page))
+			continue;
+
 		if (PageHuge(page)) {
 			struct page *head = compound_head(page);
 			pfn = page_to_pfn(head) + (1<<compound_order(head)) - 1;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f4e5db85ebfc..145187d5656b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1168,6 +1168,10 @@ static void free_one_page(struct zone *zone,
 static void __meminit __init_single_page(struct page *page, unsigned long pfn,
 				unsigned long zone, int nid)
 {
+	/*
+	 * Please note that the page already has some state which has to be
+	 * preserved (e.g. PageVmemmap)
+	 */
 	set_page_links(page, zone, nid, pfn);
 	init_page_count(page);
 	page_mapcount_reset(page);
@@ -7639,6 +7643,16 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 			continue;
 		}
 		page = pfn_to_page(pfn);
+
+		/*
+		 * vmemmap pages reside inside the memory section so they
+		 * do not have to be freed anywhere.
+		 */
+		if (PageVmemmap(page)) {
+			pfn++;
+			continue;
+		}
+
 		/*
 		 * The HWPoisoned page may be not in buddy system, and
 		 * page_count() is not 0.
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 757410d9f758..6e976aa94dee 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -242,6 +242,10 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
 			continue;
 		}
 		page = pfn_to_page(pfn);
+		if (PageVmemmap(page)) {
+			pfn++;
+			continue;
+		}
 		if (PageBuddy(page))
 			/*
 			 * If the page is on a free list, it has to be on
@@ -272,10 +276,15 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
 	 * are not aligned to pageblock_nr_pages.
 	 * Then we just check migratetype first.
 	 */
-	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
+	for (pfn = start_pfn; pfn < end_pfn; ) {
 		page = __first_valid_page(pfn, pageblock_nr_pages);
+		if (page && PageVmemmap(page)) {
+			pfn++;
+			continue;
+		}
 		if (page && !is_migrate_isolate_page(page))
 			break;
+		pfn += pageblock_nr_pages;
 	}
 	page = __first_valid_page(start_pfn, end_pfn - start_pfn);
 	if ((pfn < end_pfn) || !page)
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 48c44eda3254..0af022afc951 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -259,7 +259,8 @@ int __meminit vmemmap_populate_basepages(unsigned long start,
 	return 0;
 }
 
-struct page * __meminit sparse_mem_map_populate(unsigned long pnum, int nid)
+struct page * __meminit __sparse_mem_map_populate(unsigned long pnum, int nid,
+		struct vmem_altmap *altmap)
 {
 	unsigned long start;
 	unsigned long end;
@@ -269,12 +270,18 @@ struct page * __meminit sparse_mem_map_populate(unsigned long pnum, int nid)
 	start = (unsigned long)map;
 	end = (unsigned long)(map + PAGES_PER_SECTION);
 
-	if (vmemmap_populate(start, end, nid))
+	if (__vmemmap_populate(start, end, nid, altmap))
 		return NULL;
 
 	return map;
 }
 
+struct page * __meminit sparse_mem_map_populate(unsigned long pnum, int nid)
+{
+	WARN_ON(to_vmem_altmap((unsigned long)pfn_to_page(section_nr_to_pfn(pnum))));
+	return __sparse_mem_map_populate(pnum, nid, NULL);
+}
+
 void __init sparse_mem_maps_populate_node(struct page **map_map,
 					  unsigned long pnum_begin,
 					  unsigned long pnum_end,
diff --git a/mm/sparse.c b/mm/sparse.c
index a9783acf2bb9..19b9aa60f48a 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -10,6 +10,7 @@
 #include <linux/export.h>
 #include <linux/spinlock.h>
 #include <linux/vmalloc.h>
+#include <linux/memremap.h>
 
 #include "internal.h"
 #include <asm/dma.h>
@@ -662,10 +663,11 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 #endif
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
-static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid)
+static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
+		struct vmem_altmap *altmap)
 {
 	/* This will make the necessary allocations eventually. */
-	return sparse_mem_map_populate(pnum, nid);
+	return __sparse_mem_map_populate(pnum, nid, altmap);
 }
 static void __kfree_section_memmap(struct page *memmap)
 {
@@ -705,7 +707,8 @@ static struct page *__kmalloc_section_memmap(void)
 	return ret;
 }
 
-static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid)
+static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
+		struct vmem_altmap *altmap)
 {
 	return __kmalloc_section_memmap();
 }
@@ -757,7 +760,8 @@ static void free_map_bootmem(struct page *memmap)
  * set.  If this is <=0, then that means that the passed-in
  * map was not consumed and must be freed.
  */
-int __meminit sparse_add_one_section(struct pglist_data *pgdat, unsigned long start_pfn)
+int __meminit sparse_add_one_section(struct pglist_data *pgdat, unsigned long start_pfn,
+		struct vmem_altmap *altmap)
 {
 	unsigned long section_nr = pfn_to_section_nr(start_pfn);
 	struct mem_section *ms;
@@ -773,7 +777,7 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat, unsigned long st
 	ret = sparse_index_init(section_nr, pgdat->node_id);
 	if (ret < 0 && ret != -EEXIST)
 		return ret;
-	memmap = kmalloc_section_memmap(section_nr, pgdat->node_id);
+	memmap = kmalloc_section_memmap(section_nr, pgdat->node_id, altmap);
 	if (!memmap)
 		return -ENOMEM;
 	usemap = __kmalloc_section_usemap();
@@ -790,8 +794,20 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat, unsigned long st
 		goto out;
 	}
 
+	/*
+	 * TODO get rid of this somehow - we want to postpone the full
+	 * initialization until memmap_init_zone.
+	 */
 	memset(memmap, 0, sizeof(struct page) * PAGES_PER_SECTION);
 
+	/*
+	 * now that we have a valid vmemmap mapping we can use
+	 * pfn_to_page and flush struct pages which back the
+	 * memmap
+	 */
+	if (altmap && altmap->flush_alloc_pfns)
+		altmap->flush_alloc_pfns(altmap);
+
 	section_mark_present(ms);
 
 	ret = sparse_init_one_section(ms, section_nr, memmap, usemap);
-- 
2.13.2

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 4/6] mm, memory_hotplug: allocate memmap from the added memory range for sparse-vmemmap
@ 2017-08-01 12:41   ` Michal Hocko
  0 siblings, 0 replies; 26+ messages in thread
From: Michal Hocko @ 2017-08-01 12:41 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Mel Gorman, Vlastimil Babka, Andrea Arcangeli,
	Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen, Daniel Kiper,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Michal Hocko,
	Benjamin Herrenschmidt, Dan Williams, Gerald Schaefer,
	H. Peter Anvin, Ingo Molnar, Michael Ellerman, Paul Mackerras,
	Thomas Gleixner, x86

From: Michal Hocko <mhocko@suse.com>

Physical memory hotadd has to allocate a memmap (struct page array) for
the newly added memory section. kmalloc is currantly used for those
allocations.

This has some disadvantages a) an existing memory is consumed for
that purpose (~2MB per 128MB memory section) and b) if the whole node
is movable then we have off-node struct pages which has performance
drawbacks.

a) has turned out to be a problem for memory hotplug based ballooning
because the userspace might not react in time to online memory while
to memory consumed during physical hotadd consumes enough memory to push
system to OOM. 31bc3858ea3e ("memory-hotplug: add automatic onlining
policy for the newly added memory") has been added to workaround that
problem.

We can do much better when CONFIG_SPARSEMEM_VMEMMAP=y because vmemap
page tables can map arbitrary memory. That means that we can simply
use the beginning of each memory block and map struct pages there.
struct pages which back the allocated space then just need to be treated
carefully so that we know they are not usable.

Add {_Set,_Clear}PageVmemmap helpers to distinguish those pages in pfn
walkers. We do not have any spare page flag for this purpose so use the
combination of PageReserved bit which already tells that the page should
be ignored by the core mm code and store VMEMMAP_PAGE (which sets all
bits but PAGE_MAPPING_FLAGS) into page->mapping.

On the memory hotplug front add a new MHP_MEMMAP_FROM_RANGE restriction
flag. User is supposed to set the flag if the memmap should be allocated
from the hotadded range. Please note that this is just a hint and
architecture code can veto this if this cannot be supported. E.g. s390
cannot support this currently beause the physical memory range is made
accessible only during memory online.

Implementation wise we reuse vmem_altmap infrastructure to override
the default allocator used by __vmemap_populate. Once the memmap is
allocated we need a way to mark altmap pfns used for the allocation
and this is done by a new vmem_altmap::flush_alloc_pfns callback.
mark_vmemmap_pages implementation then simply __SetPageVmemmap all
struct pages backing those pfns. The callback is called from
sparse_add_one_section after the memmap has been initialized to 0.

We also have to be careful about those pages during online and offline
operations. They are simply skipped now so online will keep them
reserved and so unusable for any other purpose and offline ignores them
so they do not block the offline operation.

Finally __ClearPageVmemmap is called when the vmemmap page tables are
torn down.

Please note that only the memory hotplug is currently using this
allocation scheme. The boot time memmap allocation could use the same
trick as well but this is not done yet.

Changes since RFC
- rewind base pfn of the vmem_altmap when flushing one section
  (mark_vmemmap_pages) because we do not want to flush the same range
  all over again when memblock has more than one section
- Allow memory hotplug users to explicitly opt-in for this feature by
  using MHP_MEMMAP_FROM_RANGE flag.
- s390 disables MHP_MEMMAP_FROM_RANGE unconditionally because allocating
  from the added memory will blow up as the memory range is still not
  accessible that early.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <x86@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/powerpc/mm/init_64.c      |  5 +++++
 arch/s390/mm/init.c            |  6 ++++++
 arch/x86/mm/init_64.c          |  9 +++++++++
 include/linux/memory_hotplug.h | 12 +++++++++++-
 include/linux/memremap.h       | 30 ++++++++++++++++++++++++++++--
 include/linux/mm.h             |  6 ++----
 include/linux/page-flags.h     | 18 ++++++++++++++++++
 kernel/memremap.c              |  6 ------
 mm/compaction.c                |  3 +++
 mm/memory_hotplug.c            | 27 +++++++++++++++++++++++----
 mm/page_alloc.c                | 14 ++++++++++++++
 mm/page_isolation.c            | 11 ++++++++++-
 mm/sparse-vmemmap.c            | 11 +++++++++--
 mm/sparse.c                    | 26 +++++++++++++++++++++-----
 14 files changed, 159 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 5ea5e870a589..b14f2239a152 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -264,6 +264,11 @@ void __ref vmemmap_free(unsigned long start, unsigned long end)
 				goto unmap;
 			}
 
+			if (PageVmemmap(page)) {
+				__ClearPageVmemmap(page);
+				goto unmap;
+			}
+
 			if (PageReserved(page)) {
 				/* allocated from bootmem */
 				if (page_size < PAGE_SIZE) {
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index e3db0079ebfc..85dacb95cfa4 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -169,6 +169,12 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	unsigned long size_pages = PFN_DOWN(size);
 	int rc;
 
+	/*
+	 * Physical memory is added only later during the memory online so we
+	 * cannot use the added range at this stage unfortunatelly
+	 */
+	restrictions->flags &= ~MHP_MEMMAP_FROM_RANGE;
+
 	rc = vmem_add_mapping(start, size);
 	if (rc)
 		return rc;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index da57a9c9c218..66951bf03629 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -804,6 +804,15 @@ static void __meminit free_pagetable(struct page *page, int order)
 		return;
 	}
 
+	/*
+	 * runtime vmemmap pages are residing inside the memory section so
+	 * they do not have to be freed anywhere.
+	 */
+	if (PageVmemmap(page)) {
+		__ClearPageVmemmap(page);
+		return;
+	}
+
 	/* bootmem page has reserved flag */
 	if (PageReserved(page)) {
 		__ClearPageReserved(page);
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 76e5bfde8050..6fe47d1c9c74 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -136,6 +136,13 @@ extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
  */
 #define MHP_MEMBLOCK_API		1<<0
 
+/*
+ * Do we want memmap (struct page array) allocated from the hotadded range.
+ * Please note that only SPARSE_VMEMMAP implements this feauture and some
+ * architectures might not support it even for that memory model (e.g. s390)
+ */
+#define MHP_MEMMAP_FROM_RANGE		1<<1
+
 /* Restrictions for the memory hotplug */
 struct mhp_restrictions {
 	unsigned long flags;	/* MHP_ flags */
@@ -326,7 +333,10 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
 extern bool is_memblock_offlined(struct memory_block *mem);
 extern void remove_memory(int nid, u64 start, u64 size);
-extern int sparse_add_one_section(struct pglist_data *pgdat, unsigned long start_pfn);
+
+struct vmem_altmap;
+extern int sparse_add_one_section(struct pglist_data *pgdat, unsigned long start_pfn,
+		struct vmem_altmap *altmap);
 extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
 		unsigned long map_offset);
 extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 6f8f65d8ebdd..55f82c652d51 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -14,18 +14,44 @@ struct device;
  * @free: free pages set aside in the mapping for memmap storage
  * @align: pages reserved to meet allocation alignments
  * @alloc: track pages consumed, private to __vmemmap_populate()
+ * @flush_alloc_pfns: callback to be called on the allocated range after it
+ *  is mapped to the vmemmap - see mark_vmemmap_pages
  */
 struct vmem_altmap {
-	const unsigned long base_pfn;
+	unsigned long base_pfn;
 	const unsigned long reserve;
 	unsigned long free;
 	unsigned long align;
 	unsigned long alloc;
+	void (*flush_alloc_pfns)(struct vmem_altmap *self);
 };
 
-unsigned long vmem_altmap_offset(struct vmem_altmap *altmap);
+static inline unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
+{
+	/* number of pfns from base where pfn_to_page() is valid */
+	return altmap->reserve + altmap->free;
+}
 void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns);
 
+static inline void mark_vmemmap_pages(struct vmem_altmap *self)
+{
+	unsigned long pfn = self->base_pfn + self->reserve;
+	unsigned long nr_pages = self->alloc;
+	unsigned long i;
+
+	/*
+	 * All allocations for the memory hotplug are the same sized so align
+	 * should be 0
+	 */
+	WARN_ON(self->align);
+	for (i = 0; i < nr_pages; i++, pfn++) {
+		struct page *page = pfn_to_page(pfn);
+		__SetPageVmemmap(page);
+	}
+
+	self->alloc = 0;
+	self->base_pfn += nr_pages + self->reserve;
+}
 /**
  * struct dev_pagemap - metadata for ZONE_DEVICE mappings
  * @altmap: pre-allocated/reserved memory for vmemmap allocations
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 957d4658977d..3ce673570fb8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2426,6 +2426,8 @@ void sparse_mem_maps_populate_node(struct page **map_map,
 				   unsigned long map_count,
 				   int nodeid);
 
+struct page *__sparse_mem_map_populate(unsigned long pnum, int nid,
+		struct vmem_altmap *altmap);
 struct page *sparse_mem_map_populate(unsigned long pnum, int nid);
 pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
 p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node);
@@ -2436,10 +2438,6 @@ void *vmemmap_alloc_block(unsigned long size, int node);
 struct vmem_altmap;
 void *__vmemmap_alloc_block_buf(unsigned long size, int node,
 		struct vmem_altmap *altmap);
-static inline void *vmemmap_alloc_block_buf(unsigned long size, int node)
-{
-	return __vmemmap_alloc_block_buf(size, node, NULL);
-}
 
 #ifdef CONFIG_ZONE_DEVICE
 struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start);
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index ba2d470d2d0a..dcc7706eed8e 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -412,6 +412,24 @@ static __always_inline int __PageMovable(struct page *page)
 				PAGE_MAPPING_MOVABLE;
 }
 
+#define VMEMMAP_PAGE ~PAGE_MAPPING_FLAGS
+static __always_inline int PageVmemmap(struct page *page)
+{
+	return PageReserved(page) && (unsigned long)page->mapping == VMEMMAP_PAGE;
+}
+
+static __always_inline void __ClearPageVmemmap(struct page *page)
+{
+	ClearPageReserved(page);
+	page->mapping = NULL;
+}
+
+static __always_inline void __SetPageVmemmap(struct page *page)
+{
+	SetPageReserved(page);
+	page->mapping = (void *)VMEMMAP_PAGE;
+}
+
 #ifdef CONFIG_KSM
 /*
  * A KSM page is one of those write-protected "shared pages" or "merged pages"
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 02029a993329..844f5ba93383 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -397,12 +397,6 @@ void *devm_memremap_pages(struct device *dev, struct resource *res,
 }
 EXPORT_SYMBOL(devm_memremap_pages);
 
-unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
-{
-	/* number of pfns from base where pfn_to_page() is valid */
-	return altmap->reserve + altmap->free;
-}
-
 void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns)
 {
 	altmap->alloc -= nr_pfns;
diff --git a/mm/compaction.c b/mm/compaction.c
index fb548e4c7bd4..1ba5ad029258 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -741,6 +741,9 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 		page = pfn_to_page(low_pfn);
 
+		if (PageVmemmap(page))
+			goto isolate_fail;
+
 		if (!valid_page)
 			valid_page = page;
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d28883aea475..08545a567de6 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -35,6 +35,7 @@
 #include <linux/memblock.h>
 #include <linux/bootmem.h>
 #include <linux/compaction.h>
+#include <linux/memremap.h>
 
 #include <asm/tlbflush.h>
 
@@ -244,7 +245,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
 #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
 
 static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
-		bool want_memblock)
+		bool want_memblock, struct vmem_altmap *altmap)
 {
 	int ret;
 	int i;
@@ -252,7 +253,7 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
 	if (pfn_valid(phys_start_pfn))
 		return -EEXIST;
 
-	ret = sparse_add_one_section(NODE_DATA(nid), phys_start_pfn);
+	ret = sparse_add_one_section(NODE_DATA(nid), phys_start_pfn, altmap);
 	if (ret < 0)
 		return ret;
 
@@ -293,11 +294,23 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
 	int err = 0;
 	int start_sec, end_sec;
 	struct vmem_altmap *altmap = restrictions->altmap;
+	struct vmem_altmap __memblk_altmap;
 
 	/* during initialize mem_map, align hot-added range to section */
 	start_sec = pfn_to_section_nr(phys_start_pfn);
 	end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
 
+	/*
+	 * Check device specific altmap and fallback to allocating from the
+	 * begining of the memblock if requested
+	 */
+	if (!altmap && (restrictions->flags & MHP_MEMMAP_FROM_RANGE)) {
+		__memblk_altmap.base_pfn = phys_start_pfn;
+		__memblk_altmap.free = nr_pages;
+		__memblk_altmap.flush_alloc_pfns = mark_vmemmap_pages;
+		altmap = &__memblk_altmap;
+	}
+
 	if (altmap) {
 		/*
 		 * Validate altmap is within bounds of the total request
@@ -313,7 +326,8 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
 
 	for (i = start_sec; i <= end_sec; i++) {
 		err = __add_section(nid, section_nr_to_pfn(i),
-				restrictions->flags & MHP_MEMBLOCK_API);
+				restrictions->flags & MHP_MEMBLOCK_API,
+				altmap);
 
 		/*
 		 * EEXIST is finally dealt with by ioresource collision
@@ -690,6 +704,8 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
 	if (PageReserved(pfn_to_page(start_pfn)))
 		for (i = 0; i < nr_pages; i++) {
 			page = pfn_to_page(start_pfn + i);
+			if (PageVmemmap(page))
+				continue;
 			if (PageHWPoison(page)) {
 				ClearPageReserved(page);
 				continue;
@@ -1147,7 +1163,7 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
 			goto error;
 	}
 
-	restrictions.flags = MHP_MEMBLOCK_API;
+	restrictions.flags = MHP_MEMBLOCK_API | MHP_MEMMAP_FROM_RANGE;
 
 	/* call arch's memory hotadd */
 	ret = arch_add_memory(nid, start, size, &restrictions);
@@ -1378,6 +1394,9 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 			continue;
 		page = pfn_to_page(pfn);
 
+		if (PageVmemmap(page))
+			continue;
+
 		if (PageHuge(page)) {
 			struct page *head = compound_head(page);
 			pfn = page_to_pfn(head) + (1<<compound_order(head)) - 1;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f4e5db85ebfc..145187d5656b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1168,6 +1168,10 @@ static void free_one_page(struct zone *zone,
 static void __meminit __init_single_page(struct page *page, unsigned long pfn,
 				unsigned long zone, int nid)
 {
+	/*
+	 * Please note that the page already has some state which has to be
+	 * preserved (e.g. PageVmemmap)
+	 */
 	set_page_links(page, zone, nid, pfn);
 	init_page_count(page);
 	page_mapcount_reset(page);
@@ -7639,6 +7643,16 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 			continue;
 		}
 		page = pfn_to_page(pfn);
+
+	/*
+	 * Vmemmap pages reside inside the memory section, so they do
+	 * not have to be freed anywhere.
+	 */
+		if (PageVmemmap(page)) {
+			pfn++;
+			continue;
+		}
+
 		/*
 		 * The HWPoisoned page may be not in buddy system, and
 		 * page_count() is not 0.
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 757410d9f758..6e976aa94dee 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -242,6 +242,10 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
 			continue;
 		}
 		page = pfn_to_page(pfn);
+		if (PageVmemmap(page)) {
+			pfn++;
+			continue;
+		}
 		if (PageBuddy(page))
 			/*
 			 * If the page is on a free list, it has to be on
@@ -272,10 +276,15 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
 	 * are not aligned to pageblock_nr_pages.
 	 * Then we just check migratetype first.
 	 */
-	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
+	for (pfn = start_pfn; pfn < end_pfn; ) {
 		page = __first_valid_page(pfn, pageblock_nr_pages);
+		if (page && PageVmemmap(page)) {
+			pfn++;
+			continue;
+		}
 		if (page && !is_migrate_isolate_page(page))
 			break;
+		pfn += pageblock_nr_pages;
 	}
 	page = __first_valid_page(start_pfn, end_pfn - start_pfn);
 	if ((pfn < end_pfn) || !page)
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 48c44eda3254..0af022afc951 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -259,7 +259,8 @@ int __meminit vmemmap_populate_basepages(unsigned long start,
 	return 0;
 }
 
-struct page * __meminit sparse_mem_map_populate(unsigned long pnum, int nid)
+struct page * __meminit __sparse_mem_map_populate(unsigned long pnum, int nid,
+		struct vmem_altmap *altmap)
 {
 	unsigned long start;
 	unsigned long end;
@@ -269,12 +270,18 @@ struct page * __meminit sparse_mem_map_populate(unsigned long pnum, int nid)
 	start = (unsigned long)map;
 	end = (unsigned long)(map + PAGES_PER_SECTION);
 
-	if (vmemmap_populate(start, end, nid))
+	if (__vmemmap_populate(start, end, nid, altmap))
 		return NULL;
 
 	return map;
 }
 
+struct page * __meminit sparse_mem_map_populate(unsigned long pnum, int nid)
+{
+	WARN_ON(to_vmem_altmap((unsigned long)pfn_to_page(section_nr_to_pfn(pnum))));
+	return __sparse_mem_map_populate(pnum, nid, NULL);
+}
+
 void __init sparse_mem_maps_populate_node(struct page **map_map,
 					  unsigned long pnum_begin,
 					  unsigned long pnum_end,
diff --git a/mm/sparse.c b/mm/sparse.c
index a9783acf2bb9..19b9aa60f48a 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -10,6 +10,7 @@
 #include <linux/export.h>
 #include <linux/spinlock.h>
 #include <linux/vmalloc.h>
+#include <linux/memremap.h>
 
 #include "internal.h"
 #include <asm/dma.h>
@@ -662,10 +663,11 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 #endif
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
-static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid)
+static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
+		struct vmem_altmap *altmap)
 {
 	/* This will make the necessary allocations eventually. */
-	return sparse_mem_map_populate(pnum, nid);
+	return __sparse_mem_map_populate(pnum, nid, altmap);
 }
 static void __kfree_section_memmap(struct page *memmap)
 {
@@ -705,7 +707,8 @@ static struct page *__kmalloc_section_memmap(void)
 	return ret;
 }
 
-static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid)
+static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
+		struct vmem_altmap *altmap)
 {
 	return __kmalloc_section_memmap();
 }
@@ -757,7 +760,8 @@ static void free_map_bootmem(struct page *memmap)
  * set.  If this is <=0, then that means that the passed-in
  * map was not consumed and must be freed.
  */
-int __meminit sparse_add_one_section(struct pglist_data *pgdat, unsigned long start_pfn)
+int __meminit sparse_add_one_section(struct pglist_data *pgdat, unsigned long start_pfn,
+		struct vmem_altmap *altmap)
 {
 	unsigned long section_nr = pfn_to_section_nr(start_pfn);
 	struct mem_section *ms;
@@ -773,7 +777,7 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat, unsigned long st
 	ret = sparse_index_init(section_nr, pgdat->node_id);
 	if (ret < 0 && ret != -EEXIST)
 		return ret;
-	memmap = kmalloc_section_memmap(section_nr, pgdat->node_id);
+	memmap = kmalloc_section_memmap(section_nr, pgdat->node_id, altmap);
 	if (!memmap)
 		return -ENOMEM;
 	usemap = __kmalloc_section_usemap();
@@ -790,8 +794,20 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat, unsigned long st
 		goto out;
 	}
 
+	/*
+	 * TODO get rid of this somehow - we want to postpone the full
+	 * initialization until memmap_init_zone.
+	 */
 	memset(memmap, 0, sizeof(struct page) * PAGES_PER_SECTION);
 
+	/*
+	 * Now that we have a valid vmemmap mapping we can use
+	 * pfn_to_page and flush the struct pages which back the
+	 * memmap.
+	 */
+	if (altmap && altmap->flush_alloc_pfns)
+		altmap->flush_alloc_pfns(altmap);
+
 	section_mark_present(ms);
 
 	ret = sparse_init_one_section(ms, section_nr, memmap, usemap);
-- 
2.13.2
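
A rough illustrative sketch (not part of the posted patch; the
restrictions type name is assumed here, while the flag names are the
ones introduced by this series) of a hotplug user opting in, with the
memmap then carved out of the hot-added range itself:

	struct mhp_restrictions restrictions = {
		/* opt in: allocate the memmap from the hot-added range */
		.flags = MHP_MEMBLOCK_API | MHP_MEMMAP_FROM_RANGE,
	};
	int ret;

	/*
	 * __add_pages() sees MHP_MEMMAP_FROM_RANGE, builds an implicit
	 * vmem_altmap based at phys_start_pfn, and the vmemmap pages are
	 * then allocated from the beginning of the new memblock.
	 */
	ret = arch_add_memory(nid, start, size, &restrictions);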

* [PATCH 5/6] mm, sparse: complain about implicit altmap usage in vmemmap_populate
  2017-08-01 12:41 ` Michal Hocko
@ 2017-08-01 12:41   ` Michal Hocko
  -1 siblings, 0 replies; 26+ messages in thread
From: Michal Hocko @ 2017-08-01 12:41 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Mel Gorman, Vlastimil Babka, Andrea Arcangeli,
	Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen, Daniel Kiper,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

All current users of the altmap are in the memory hotplug code and
they use __vmemmap_populate explicitly (via __sparse_mem_map_populate).
Complain if somebody uses vmemmap_populate with an altmap registered,
because that would be an unexpected usage, and call __vmemmap_populate
with NULL from that code path.
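
As a rough sketch (not part of the patch), the intended split between
the two entry points after this change is:

	/* memory hotplug: the only legitimate altmap user, always explicit */
	__sparse_mem_map_populate(pnum, nid, altmap);

	/*
	 * everybody else: an implicit altmap covering the range would
	 * now trigger the WARN_ON and is deliberately not used
	 */
	vmemmap_populate(start, end, nid);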

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/mm.h | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3ce673570fb8..ae1fa053d09e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2456,8 +2456,12 @@ int __vmemmap_populate(unsigned long start, unsigned long end, int node,
 static inline int vmemmap_populate(unsigned long start, unsigned long end,
 		int node)
 {
-	struct vmem_altmap *altmap = to_vmem_altmap(start);
-	return __vmemmap_populate(start, end, node, altmap);
+	/*
+	 * All users of the altmap have to be explicit and use
+	 * __vmemmap_populate directly
+	 */
+	WARN_ON(to_vmem_altmap(start));
+	return __vmemmap_populate(start, end, node, NULL);
 }
 
 void vmemmap_populate_print_last(void);
-- 
2.13.2

* [PATCH 6/6] mm, sparse: rename kmalloc_section_memmap, __kfree_section_memmap
  2017-08-01 12:41 ` Michal Hocko
@ 2017-08-01 12:41   ` Michal Hocko
  -1 siblings, 0 replies; 26+ messages in thread
From: Michal Hocko @ 2017-08-01 12:41 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Mel Gorman, Vlastimil Babka, Andrea Arcangeli,
	Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen, Daniel Kiper,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Both functions will use the altmap rather than kmalloc for
sparsemem-vmemmap, so rename them to alloc_section_memmap and
free_section_memmap, which better reflects what they actually do.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/sparse.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 19b9aa60f48a..be1527a37112 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -663,13 +663,13 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 #endif
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
-static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
+static inline struct page *alloc_section_memmap(unsigned long pnum, int nid,
 		struct vmem_altmap *altmap)
 {
 	/* This will make the necessary allocations eventually. */
 	return __sparse_mem_map_populate(pnum, nid, altmap);
 }
-static void __kfree_section_memmap(struct page *memmap)
+static void free_section_memmap(struct page *memmap)
 {
 	unsigned long start = (unsigned long)memmap;
 	unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
@@ -707,13 +707,13 @@ static struct page *__kmalloc_section_memmap(void)
 	return ret;
 }
 
-static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
+static inline struct page *alloc_section_memmap(unsigned long pnum, int nid,
 		struct vmem_altmap *altmap)
 {
 	return __kmalloc_section_memmap();
 }
 
-static void __kfree_section_memmap(struct page *memmap)
+static void free_section_memmap(struct page *memmap)
 {
 	if (is_vmalloc_addr(memmap))
 		vfree(memmap);
@@ -777,12 +777,12 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat, unsigned long st
 	ret = sparse_index_init(section_nr, pgdat->node_id);
 	if (ret < 0 && ret != -EEXIST)
 		return ret;
-	memmap = kmalloc_section_memmap(section_nr, pgdat->node_id, altmap);
+	memmap = alloc_section_memmap(section_nr, pgdat->node_id, altmap);
 	if (!memmap)
 		return -ENOMEM;
 	usemap = __kmalloc_section_usemap();
 	if (!usemap) {
-		__kfree_section_memmap(memmap);
+		free_section_memmap(memmap);
 		return -ENOMEM;
 	}
 
@@ -816,7 +816,7 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat, unsigned long st
 	pgdat_resize_unlock(pgdat, &flags);
 	if (ret <= 0) {
 		kfree(usemap);
-		__kfree_section_memmap(memmap);
+		free_section_memmap(memmap);
 	}
 	return ret;
 }
@@ -857,7 +857,7 @@ static void free_section_usemap(struct page *memmap, unsigned long *usemap)
 	if (PageSlab(usemap_page) || PageCompound(usemap_page)) {
 		kfree(usemap);
 		if (memmap)
-			__kfree_section_memmap(memmap);
+			free_section_memmap(memmap);
 		return;
 	}
 
-- 
2.13.2

* Re: [RFC PATCH v2 0/6] mm, memory_hotplug: allocate memmap from hotadded memory
  2017-08-01 12:41 ` Michal Hocko
@ 2017-08-07  7:00   ` Michal Hocko
  -1 siblings, 0 replies; 26+ messages in thread
From: Michal Hocko @ 2017-08-07  7:00 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Mel Gorman, Vlastimil Babka, Andrea Arcangeli,
	Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen, Daniel Kiper,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Benjamin Herrenschmidt,
	Catalin Marinas, Dan Williams, Fenghua Yu, Gerald Schaefer,
	Heiko Carstens, H. Peter Anvin, Ingo Molnar, Martin Schwidefsky,
	Michael Ellerman, Paul Mackerras, Thomas Gleixner, Tony Luck,
	Will Deacon, x86

Any comments? Especially on the arch-specific parts? Has anybody had a
chance to test this? I do not want to rush this, but I would be really
glad if we could push this work into the 4.14 merge window.
-- 
Michal Hocko
SUSE Labs

* Re: [RFC PATCH v2 0/6] mm, memory_hotplug: allocate memmap from hotadded memory
  2017-08-07  7:00   ` Michal Hocko
@ 2017-08-08 20:01     ` Dan Williams
  -1 siblings, 0 replies; 26+ messages in thread
From: Dan Williams @ 2017-08-08 20:01 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linux MM, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, LKML,
	Benjamin Herrenschmidt, Catalin Marinas, Fenghua Yu,
	Gerald Schaefer, Heiko Carstens, H. Peter Anvin, Ingo Molnar,
	Martin Schwidefsky, Michael Ellerman, Paul Mackerras,
	Thomas Gleixner, Tony Luck, Will Deacon, X86 ML

On Mon, Aug 7, 2017 at 12:00 AM, Michal Hocko <mhocko@kernel.org> wrote:
> Any comments? Especially on the arch-specific parts? Has anybody had a
> chance to test this? I do not want to rush this, but I would be really
> glad if we could push this work into the 4.14 merge window.

Hi Michal,

I'm interested in taking a look at this especially if we might be able
to get rid of vmem_altmap, but this is currently stuck behind some
other work in my queue. I'll try to circle back in the next couple
weeks.

* Re: [RFC PATCH v2 0/6] mm, memory_hotplug: allocate memmap from hotadded memory
  2017-08-08 20:01     ` Dan Williams
@ 2017-08-10 11:40       ` Michal Hocko
  -1 siblings, 0 replies; 26+ messages in thread
From: Michal Hocko @ 2017-08-10 11:40 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux MM, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, LKML,
	Benjamin Herrenschmidt, Catalin Marinas, Fenghua Yu,
	Gerald Schaefer, Heiko Carstens, H. Peter Anvin, Ingo Molnar,
	Martin Schwidefsky, Michael Ellerman, Paul Mackerras,
	Thomas Gleixner, Tony Luck, Will Deacon, X86 ML

On Tue 08-08-17 13:01:36, Dan Williams wrote:
> On Mon, Aug 7, 2017 at 12:00 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > Any comments? Especially on the arch-specific parts? Has anybody had a
> > chance to test this? I do not want to rush this, but I would be really
> > glad if we could push this work into the 4.14 merge window.
> 
> Hi Michal,
> 
> I'm interested in taking a look at this especially if we might be able
> to get rid of vmem_altmap, but this is currently stuck behind some
> other work in my queue. I'll try to circle back in the next couple
> weeks.

Well, vmem_altmap was there and easy to reuse. Replacing it with
something else is certainly possible, but I really need some way to
hook a dedicated allocator into the vmemmap code.
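
For reference, a minimal sketch of what such an allocator hook boils
down to, assuming the vmem_altmap fields used by this series (base_pfn,
reserve, alloc, align, free); the real logic lives around
__vmemmap_alloc_block_buf() in mm/sparse-vmemmap.c:

	static unsigned long altmap_alloc_pfns(struct vmem_altmap *altmap,
					       unsigned long nr_pfns)
	{
		unsigned long allocated = altmap->alloc + altmap->align;

		/* hand out pfns from the reserved part of the range */
		if (altmap->free < allocated + nr_pfns)
			return ULONG_MAX; /* caller falls back to the page allocator */
		altmap->alloc += nr_pfns;
		return altmap->base_pfn + altmap->reserve + allocated;
	}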

-- 
Michal Hocko
SUSE Labs

* Re: [RFC PATCH v2 0/6] mm, memory_hotplug: allocate memmap from hotadded memory
  2017-08-10 11:40       ` Michal Hocko
@ 2017-08-10 15:27         ` Dan Williams
  -1 siblings, 0 replies; 26+ messages in thread
From: Dan Williams @ 2017-08-10 15:27 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linux MM, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, LKML,
	Benjamin Herrenschmidt, Catalin Marinas, Fenghua Yu,
	Gerald Schaefer, Heiko Carstens, H. Peter Anvin, Ingo Molnar,
	Martin Schwidefsky, Michael Ellerman, Paul Mackerras,
	Thomas Gleixner, Tony Luck, Will Deacon, X86 ML

On Thu, Aug 10, 2017 at 4:40 AM, Michal Hocko <mhocko@kernel.org> wrote:
> On Tue 08-08-17 13:01:36, Dan Williams wrote:
>> On Mon, Aug 7, 2017 at 12:00 AM, Michal Hocko <mhocko@kernel.org> wrote:
>> > Any comments? Especially on the arch-specific parts? Has anybody had a
>> > chance to test this? I do not want to rush this, but I would be really
>> > glad if we could push this work into the 4.14 merge window.
>>
>> Hi Michal,
>>
>> I'm interested in taking a look at this especially if we might be able
>> to get rid of vmem_altmap, but this is currently stuck behind some
>> other work in my queue. I'll try to circle back in the next couple
>> weeks.
>
> Well, vmem_altmap was there and easy to reuse. Replacing it with
> something else is certainly possible, but I really need some way to
> hook a dedicated allocator into the vmemmap code.

Oh, you're reusing it, that's great. Then I definitely got the wrong
impression from my first glance at the patch set; I'll dig deeper.

* Re: [RFC PATCH v2 0/6] mm, memory_hotplug: allocate memmap from hotadded memory
  2017-08-10 15:27         ` Dan Williams
@ 2017-08-10 15:33           ` Michal Hocko
  -1 siblings, 0 replies; 26+ messages in thread
From: Michal Hocko @ 2017-08-10 15:33 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux MM, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, LKML,
	Benjamin Herrenschmidt, Catalin Marinas, Fenghua Yu,
	Gerald Schaefer, Heiko Carstens, H. Peter Anvin, Ingo Molnar,
	Martin Schwidefsky, Michael Ellerman, Paul Mackerras,
	Thomas Gleixner, Tony Luck, Will Deacon, X86 ML

On Thu 10-08-17 08:27:56, Dan Williams wrote:
> On Thu, Aug 10, 2017 at 4:40 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > On Tue 08-08-17 13:01:36, Dan Williams wrote:
> >> On Mon, Aug 7, 2017 at 12:00 AM, Michal Hocko <mhocko@kernel.org> wrote:
> >> > Any comments? Especially on the arch-specific parts? Has anybody had a
> >> > chance to test this? I do not want to rush this, but I would be really
> >> > glad if we could push this work into the 4.14 merge window.
> >>
> >> Hi Michal,
> >>
> >> I'm interested in taking a look at this especially if we might be able
> >> to get rid of vmem_altmap, but this is currently stuck behind some
> >> other work in my queue. I'll try to circle back in the next couple
> >> weeks.
> >
> > Well, vmem_altmap was there and easy to reuse. Replacing it with
> > something else is certainly possible, but I really need some way to
> > hook a dedicated allocator into the vmemmap code.
> 
> Oh, you're reusing it, that's great. Then I definitely got the wrong
> impression from my first glance at the patch set; I'll dig deeper.

Yeah, reusing and extending it a bit, so I would highly appreciate it
if you could check that I am doing it in a sane way.

-- 
Michal Hocko
SUSE Labs

* Re: [RFC PATCH v2 0/6] mm, memory_hotplug: allocate memmap from hotadded memory
  2017-08-01 12:41 ` Michal Hocko
@ 2017-09-04 16:32   ` Michal Hocko
  -1 siblings, 0 replies; 26+ messages in thread
From: Michal Hocko @ 2017-09-04 16:32 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Mel Gorman, Vlastimil Babka, Andrea Arcangeli,
	Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen, Daniel Kiper,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Benjamin Herrenschmidt,
	Catalin Marinas, Dan Williams, Fenghua Yu, Gerald Schaefer,
	Heiko Carstens, H. Peter Anvin, Ingo Molnar, Martin Schwidefsky,
	Michael Ellerman, Paul Mackerras, Thomas Gleixner, Tony Luck,
	Will Deacon, x86

Hi,
I have finally found some more time and given this more serious
testing on real HW. Multiple issues popped up and I am not done yet,
so please ignore this version. I will resubmit as soon as I am more
confident about the quality.
-- 
Michal Hocko
SUSE Labs
