linux-kernel.vger.kernel.org archive mirror
* revamp vmem_altmap / dev_pagemap handling V2
@ 2017-12-15 14:09 Christoph Hellwig
  2017-12-15 14:09 ` [PATCH 01/17] memremap: provide stubs for vmem_altmap_offset and vmem_altmap_free Christoph Hellwig
                   ` (17 more replies)
  0 siblings, 18 replies; 43+ messages in thread
From: Christoph Hellwig @ 2017-12-15 14:09 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, x86, linux-mm, linux-kernel

Hi all,

this series started with two patches from Logan, now in the middle of
the series, that kill the memremap-internal pgmap structure and redo
the devm_memremap_pages interface to be better suited for future PCI
P2P uses.  While reviewing them I noticed that there isn't really any
good reason to keep struct vmem_altmap either, and that a lot of these
alternative device page map accesses would be better abstracted out
instead of being sprinkled all over the mm code.  But after the RCU
warnings in V1 I went for yet another approach: struct vmem_altmap is
kept for now, but passed explicitly through the memory hotplug code
instead of being looked up through unprotected radix tree accesses.
The end result is that only the get_user_pages path ever looks up
struct dev_pagemap, while struct vmem_altmap is now always embedded
into struct dev_pagemap and passed explicitly where needed.
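
For orientation, a rough sketch of the end state the series aims for.
This is illustrative only -- the field names and exact layout below are
an assumption, see the individual patches for the real definitions:

struct dev_pagemap {
	struct vmem_altmap altmap;	/* embedded, no more radix lookup */
	bool altmap_valid;		/* illustrative flag name */
	/* ... resource, percpu_ref, memory type, ... */
};

/* memory hotplug entry points grow an explicit altmap argument: */
int arch_add_memory(int nid, u64 start, u64 size,
		struct vmem_altmap *altmap, bool want_memblock);
int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
		struct vmem_altmap *altmap, bool want_memblock);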

Please review carefully; this has only been tested with my legacy
e820 NVDIMM system.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 01/17] memremap: provide stubs for vmem_altmap_offset and vmem_altmap_free
  2017-12-15 14:09 revamp vmem_altmap / dev_pagemap handling V2 Christoph Hellwig
@ 2017-12-15 14:09 ` Christoph Hellwig
  2017-12-16  1:41   ` Dan Williams
  2017-12-15 14:09 ` [PATCH 02/17] mm: don't export arch_add_memory Christoph Hellwig
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2017-12-15 14:09 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, x86, linux-mm, linux-kernel

Currently all calls to those functions are eliminated by the compiler when
CONFIG_ZONE_DEVICE is not set, but this soon won't be the case.
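
For illustration only (not part of the patch): with these stubs in
place a caller no longer needs an #ifdef CONFIG_ZONE_DEVICE around the
calls.  The hypothetical helper below compiles either way; the stub
simply returns 0 when ZONE_DEVICE is disabled:

/* hypothetical example, not in-tree code */
static unsigned long first_data_pfn(unsigned long base_pfn,
		struct vmem_altmap *altmap)
{
	if (altmap)
		return base_pfn + vmem_altmap_offset(altmap);
	return base_pfn;
}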

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/memremap.h | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 10d23c367048..d5a6736d9737 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -26,9 +26,6 @@ struct vmem_altmap {
 	unsigned long alloc;
 };
 
-unsigned long vmem_altmap_offset(struct vmem_altmap *altmap);
-void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns);
-
 #ifdef CONFIG_ZONE_DEVICE
 struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start);
 #else
@@ -138,6 +135,9 @@ void *devm_memremap_pages(struct device *dev, struct resource *res,
 		struct percpu_ref *ref, struct vmem_altmap *altmap);
 struct dev_pagemap *find_dev_pagemap(resource_size_t phys);
 
+unsigned long vmem_altmap_offset(struct vmem_altmap *altmap);
+void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns);
+
 static inline bool is_zone_device_page(const struct page *page);
 #else
 static inline void *devm_memremap_pages(struct device *dev,
@@ -157,7 +157,17 @@ static inline struct dev_pagemap *find_dev_pagemap(resource_size_t phys)
 {
 	return NULL;
 }
-#endif
+
+static inline unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
+{
+	return 0;
+}
+
+static inline void vmem_altmap_free(struct vmem_altmap *altmap,
+		unsigned long nr_pfns)
+{
+}
+#endif /* CONFIG_ZONE_DEVICE */
 
 #if defined(CONFIG_DEVICE_PRIVATE) || defined(CONFIG_DEVICE_PUBLIC)
 static inline bool is_device_private_page(const struct page *page)
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 02/17] mm: don't export arch_add_memory
  2017-12-15 14:09 revamp vmem_altmap / dev_pagemap handling V2 Christoph Hellwig
  2017-12-15 14:09 ` [PATCH 01/17] memremap: provide stubs for vmem_altmap_offset and vmem_altmap_free Christoph Hellwig
@ 2017-12-15 14:09 ` Christoph Hellwig
  2017-12-16  1:41   ` Dan Williams
  2017-12-15 14:09 ` [PATCH 03/17] mm: don't export __add_pages Christoph Hellwig
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2017-12-15 14:09 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, x86, linux-mm, linux-kernel

Only x86_64 and sh export this symbol, and it is not used by any
modular code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/sh/mm/init.c     | 1 -
 arch/x86/mm/init_64.c | 1 -
 2 files changed, 2 deletions(-)

diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index bf726af5f1a5..afc54d593a26 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -498,7 +498,6 @@ int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(arch_add_memory);
 
 #ifdef CONFIG_NUMA
 int memory_add_physaddr_to_nid(u64 addr)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 4a837289f2ad..8acdc35c2dfa 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -796,7 +796,6 @@ int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
 
 	return add_pages(nid, start_pfn, nr_pages, want_memblock);
 }
-EXPORT_SYMBOL_GPL(arch_add_memory);
 
 #define PAGE_INUSE 0xFD
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 03/17] mm: don't export __add_pages
  2017-12-15 14:09 revamp vmem_altmap / dev_pagemap handling V2 Christoph Hellwig
  2017-12-15 14:09 ` [PATCH 01/17] memremap: provide stubs for vmem_altmap_offset and vmem_altmap_free Christoph Hellwig
  2017-12-15 14:09 ` [PATCH 02/17] mm: don't export arch_add_memory Christoph Hellwig
@ 2017-12-15 14:09 ` Christoph Hellwig
  2017-12-16  1:42   ` Dan Williams
  2017-12-15 14:09 ` [PATCH 04/17] mm: pass the vmem_altmap to arch_add_memory and __add_pages Christoph Hellwig
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2017-12-15 14:09 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, x86, linux-mm, linux-kernel

This function isn't used by any modules, and is only to be called
from core MM code.  This also covers calls made through the add_pages
wrapper, which might be inlined into its callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mm/memory_hotplug.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c52aa05b106c..5c6f96e6b334 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -334,7 +334,6 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
 out:
 	return err;
 }
-EXPORT_SYMBOL_GPL(__add_pages);
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
 /* find the smallest valid pfn in the range [start_pfn, end_pfn) */
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 04/17] mm: pass the vmem_altmap to arch_add_memory and __add_pages
  2017-12-15 14:09 revamp vmem_altmap / dev_pagemap handling V2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2017-12-15 14:09 ` [PATCH 03/17] mm: don't export __add_pages Christoph Hellwig
@ 2017-12-15 14:09 ` Christoph Hellwig
  2017-12-16  1:48   ` Dan Williams
                     ` (2 more replies)
  2017-12-15 14:09 ` [PATCH 05/17] mm: pass the vmem_altmap to vmemmap_populate Christoph Hellwig
                   ` (13 subsequent siblings)
  17 siblings, 3 replies; 43+ messages in thread
From: Christoph Hellwig @ 2017-12-15 14:09 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, x86, linux-mm, linux-kernel

We can just pass this on instead of having to do an unprotected radix
tree lookup two levels into the callchain.
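
Schematically, condensed from the hunks below:

	/* before: each level re-derived the altmap via an unlocked lookup */
	altmap = to_vmem_altmap((unsigned long) pfn_to_page(phys_start_pfn));

	/* after: the owner of the mapping passes it down explicitly */
	error = arch_add_memory(nid, align_start, align_size, altmap, false);
	/*	-> __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock) */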

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/ia64/mm/init.c            |  5 +++--
 arch/powerpc/mm/mem.c          |  5 +++--
 arch/s390/mm/init.c            |  5 +++--
 arch/sh/mm/init.c              |  5 +++--
 arch/x86/mm/init_32.c          |  5 +++--
 arch/x86/mm/init_64.c          | 11 ++++++-----
 include/linux/memory_hotplug.h | 17 ++++++++++-------
 kernel/memremap.c              |  2 +-
 mm/hmm.c                       |  5 +++--
 mm/memory_hotplug.c            |  7 +++----
 10 files changed, 38 insertions(+), 29 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 7af4e05bb61e..2e2e4f532204 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -647,13 +647,14 @@ mem_init (void)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
+int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
+		bool want_memblock)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	int ret;
 
-	ret = __add_pages(nid, start_pfn, nr_pages, want_memblock);
+	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
 	if (ret)
 		printk("%s: Problem encountered in __add_pages() as ret=%d\n",
 		       __func__,  ret);
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 4362b86ef84c..e670cfc2766e 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -127,7 +127,8 @@ int __weak remove_section_mapping(unsigned long start, unsigned long end)
 	return -ENODEV;
 }
 
-int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
+int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
+		bool want_memblock)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
@@ -144,7 +145,7 @@ int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
 		return -EFAULT;
 	}
 
-	return __add_pages(nid, start_pfn, nr_pages, want_memblock);
+	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 671535e64aba..e12c5af50cd7 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -222,7 +222,8 @@ device_initcall(s390_cma_mem_init);
 
 #endif /* CONFIG_CMA */
 
-int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
+int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
+		bool want_memblock)
 {
 	unsigned long start_pfn = PFN_DOWN(start);
 	unsigned long size_pages = PFN_DOWN(size);
@@ -232,7 +233,7 @@ int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
 	if (rc)
 		return rc;
 
-	rc = __add_pages(nid, start_pfn, size_pages, want_memblock);
+	rc = __add_pages(nid, start_pfn, size_pages, altmap, want_memblock);
 	if (rc)
 		vmem_remove_mapping(start, size);
 	return rc;
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index afc54d593a26..552afbf55bad 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -485,14 +485,15 @@ void free_initrd_mem(unsigned long start, unsigned long end)
 #endif
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
+int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
+		bool want_memblock)
 {
 	unsigned long start_pfn = PFN_DOWN(start);
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	int ret;
 
 	/* We only have ZONE_NORMAL, so this is easy.. */
-	ret = __add_pages(nid, start_pfn, nr_pages, want_memblock);
+	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
 	if (unlikely(ret))
 		printk("%s: Failed, __add_pages() == %d\n", __func__, ret);
 
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 8a64a6f2848d..cdf19ec6460c 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -823,12 +823,13 @@ void __init mem_init(void)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
+int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
+		bool want_memblock)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
-	return __add_pages(nid, start_pfn, nr_pages, want_memblock);
+	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 8acdc35c2dfa..e26ade50ae18 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -772,12 +772,12 @@ static void update_end_of_memory_vars(u64 start, u64 size)
 	}
 }
 
-int add_pages(int nid, unsigned long start_pfn,
-	      unsigned long nr_pages, bool want_memblock)
+int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
+		struct vmem_altmap *altmap, bool want_memblock)
 {
 	int ret;
 
-	ret = __add_pages(nid, start_pfn, nr_pages, want_memblock);
+	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
 	WARN_ON_ONCE(ret);
 
 	/* update max_pfn, max_low_pfn and high_memory */
@@ -787,14 +787,15 @@ int add_pages(int nid, unsigned long start_pfn,
 	return ret;
 }
 
-int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
+int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
+		bool want_memblock)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
 	init_memory_mapping(start, start + size);
 
-	return add_pages(nid, start_pfn, nr_pages, want_memblock);
+	return add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
 }
 
 #define PAGE_INUSE 0xFD
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 58e110aee7ab..db276afbefcc 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -13,6 +13,7 @@ struct pglist_data;
 struct mem_section;
 struct memory_block;
 struct resource;
+struct vmem_altmap;
 
 #ifdef CONFIG_MEMORY_HOTPLUG
 /*
@@ -131,18 +132,19 @@ extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
 /* reasonably generic interface to expand the physical pages */
-extern int __add_pages(int nid, unsigned long start_pfn,
-	unsigned long nr_pages, bool want_memblock);
+extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
+		struct vmem_altmap *altmap, bool want_memblock);
 
 #ifndef CONFIG_ARCH_HAS_ADD_PAGES
 static inline int add_pages(int nid, unsigned long start_pfn,
-			    unsigned long nr_pages, bool want_memblock)
+		unsigned long nr_pages, struct vmem_altmap *altmap,
+		bool want_memblock)
 {
-	return __add_pages(nid, start_pfn, nr_pages, want_memblock);
+	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
 }
 #else /* ARCH_HAS_ADD_PAGES */
-int add_pages(int nid, unsigned long start_pfn,
-	      unsigned long nr_pages, bool want_memblock);
+int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
+		struct vmem_altmap *altmap, bool want_memblock);
 #endif /* ARCH_HAS_ADD_PAGES */
 
 #ifdef CONFIG_NUMA
@@ -318,7 +320,8 @@ extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
 		void *arg, int (*func)(struct memory_block *, void *));
 extern int add_memory(int nid, u64 start, u64 size);
 extern int add_memory_resource(int nid, struct resource *resource, bool online);
-extern int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock);
+extern int arch_add_memory(int nid, u64 start, u64 size,
+		struct vmem_altmap *altmap, bool want_memblock);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 		unsigned long nr_pages);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 403ab9cdb949..16456117a1b1 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -427,7 +427,7 @@ void *devm_memremap_pages(struct device *dev, struct resource *res,
 		goto err_pfn_remap;
 
 	mem_hotplug_begin();
-	error = arch_add_memory(nid, align_start, align_size, false);
+	error = arch_add_memory(nid, align_start, align_size, altmap, false);
 	if (!error)
 		move_pfn_range_to_zone(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
 					align_start >> PAGE_SHIFT,
diff --git a/mm/hmm.c b/mm/hmm.c
index 3a5c172af560..cff2aa6910af 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -931,10 +931,11 @@ static int hmm_devmem_pages_create(struct hmm_devmem *devmem)
 	 * want the linear mapping and thus use arch_add_memory().
 	 */
 	if (devmem->pagemap.type == MEMORY_DEVICE_PUBLIC)
-		ret = arch_add_memory(nid, align_start, align_size, false);
+		ret = arch_add_memory(nid, align_start, align_size, NULL,
+				false);
 	else
 		ret = add_pages(nid, align_start >> PAGE_SHIFT,
-				align_size >> PAGE_SHIFT, false);
+				align_size >> PAGE_SHIFT, NULL, false);
 	if (ret) {
 		mem_hotplug_done();
 		goto error_add_memory;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 5c6f96e6b334..fc0485dcece1 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -292,18 +292,17 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
  * add the new pages.
  */
 int __ref __add_pages(int nid, unsigned long phys_start_pfn,
-			unsigned long nr_pages, bool want_memblock)
+		unsigned long nr_pages, struct vmem_altmap *altmap,
+		bool want_memblock)
 {
 	unsigned long i;
 	int err = 0;
 	int start_sec, end_sec;
-	struct vmem_altmap *altmap;
 
 	/* during initialize mem_map, align hot-added range to section */
 	start_sec = pfn_to_section_nr(phys_start_pfn);
 	end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
 
-	altmap = to_vmem_altmap((unsigned long) pfn_to_page(phys_start_pfn));
 	if (altmap) {
 		/*
 		 * Validate altmap is within bounds of the total request
@@ -1148,7 +1147,7 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
 	}
 
 	/* call arch's memory hotadd */
-	ret = arch_add_memory(nid, start, size, true);
+	ret = arch_add_memory(nid, start, size, NULL, true);
 
 	if (ret < 0)
 		goto error;
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 05/17] mm: pass the vmem_altmap to vmemmap_populate
  2017-12-15 14:09 revamp vmem_altmap / dev_pagemap handling V2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2017-12-15 14:09 ` [PATCH 04/17] mm: pass the vmem_altmap to arch_add_memory and __add_pages Christoph Hellwig
@ 2017-12-15 14:09 ` Christoph Hellwig
  2017-12-16  2:03   ` Dan Williams
  2017-12-15 14:09 ` [PATCH 06/17] mm: pass the vmem_altmap to arch_remove_memory and __remove_pages Christoph Hellwig
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2017-12-15 14:09 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, x86, linux-mm, linux-kernel

We can just pass this on instead of having to do an unprotected radix
tree lookup a few levels into the callchain.
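
For context, the altmap handed to vmemmap_populate() is what lets the
memmap be allocated out of the device's own pfns instead of regular
RAM.  A simplified model of such an allocation (the helper name and the
missing alignment handling are illustrative; the real logic sits in the
vmem_altmap allocator behind __vmemmap_alloc_block_buf()):

/* sketch only -- ignores the align accounting of the real allocator */
static unsigned long sketch_altmap_alloc(struct vmem_altmap *altmap,
		unsigned long nr_pfns)
{
	if (nr_pfns > altmap->free - altmap->alloc)
		return ULONG_MAX;	/* altmap space exhausted */

	altmap->alloc += nr_pfns;
	/* the memmap pages come from the device range itself */
	return altmap->base_pfn + altmap->reserve + altmap->alloc - nr_pfns;
}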

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm64/mm/mmu.c            |  6 ++++--
 arch/ia64/mm/discontig.c       |  3 ++-
 arch/powerpc/mm/init_64.c      |  7 ++-----
 arch/s390/mm/vmem.c            |  3 ++-
 arch/sparc/mm/init_64.c        |  2 +-
 arch/x86/mm/init_64.c          |  4 ++--
 include/linux/memory_hotplug.h |  3 ++-
 include/linux/mm.h             |  6 ++++--
 mm/memory_hotplug.c            |  7 ++++---
 mm/sparse-vmemmap.c            |  7 ++++---
 mm/sparse.c                    | 20 ++++++++++++--------
 11 files changed, 39 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 267d2b79d52d..ec8952ff13be 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -654,12 +654,14 @@ int kern_addr_valid(unsigned long addr)
 }
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 #if !ARM64_SWAPPER_USES_SECTION_MAPS
-int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
+int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
+		struct vmem_altmap *altmap)
 {
 	return vmemmap_populate_basepages(start, end, node);
 }
 #else	/* !ARM64_SWAPPER_USES_SECTION_MAPS */
-int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
+int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
+		struct vmem_altmap *altmap)
 {
 	unsigned long addr = start;
 	unsigned long next;
diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
index 9b2d994cddf6..1555aecaaf85 100644
--- a/arch/ia64/mm/discontig.c
+++ b/arch/ia64/mm/discontig.c
@@ -754,7 +754,8 @@ void arch_refresh_nodedata(int update_node, pg_data_t *update_pgdat)
 #endif
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
-int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
+int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
+		struct vmem_altmap *altmap)
 {
 	return vmemmap_populate_basepages(start, end, node);
 }
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index a07722531b32..779b74a96b8f 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -183,7 +183,8 @@ static __meminit void vmemmap_list_populate(unsigned long phys,
 	vmemmap_list = vmem_back;
 }
 
-int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
+int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
+		struct vmem_altmap *altmap)
 {
 	unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;
 
@@ -193,16 +194,12 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
 	pr_debug("vmemmap_populate %lx..%lx, node %d\n", start, end, node);
 
 	for (; start < end; start += page_size) {
-		struct vmem_altmap *altmap;
 		void *p;
 		int rc;
 
 		if (vmemmap_populated(start, page_size))
 			continue;
 
-		/* altmap lookups only work at section boundaries */
-		altmap = to_vmem_altmap(SECTION_ALIGN_DOWN(start));
-
 		p =  __vmemmap_alloc_block_buf(page_size, node, altmap);
 		if (!p)
 			return -ENOMEM;
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index 3316d463fc29..c44ef0e7c466 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -211,7 +211,8 @@ static void vmem_remove_range(unsigned long start, unsigned long size)
 /*
  * Add a backed mem_map array to the virtual mem_map array.
  */
-int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
+int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
+		struct vmem_altmap *altmap)
 {
 	unsigned long pgt_prot, sgt_prot;
 	unsigned long address = start;
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 55ba62957e64..42d27a1a042a 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2628,7 +2628,7 @@ EXPORT_SYMBOL(_PAGE_CACHE);
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend,
-			       int node)
+			       int node, struct vmem_altmap *altmap)
 {
 	unsigned long pte_base;
 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index e26ade50ae18..0c898098feaf 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1411,9 +1411,9 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start,
 	return 0;
 }
 
-int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
+int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
+		struct vmem_altmap *altmap)
 {
-	struct vmem_altmap *altmap = to_vmem_altmap(start);
 	int err;
 
 	if (boot_cpu_has(X86_FEATURE_PSE))
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index db276afbefcc..cbdd6d52e877 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -327,7 +327,8 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
 extern bool is_memblock_offlined(struct memory_block *mem);
 extern void remove_memory(int nid, u64 start, u64 size);
-extern int sparse_add_one_section(struct pglist_data *pgdat, unsigned long start_pfn);
+extern int sparse_add_one_section(struct pglist_data *pgdat,
+		unsigned long start_pfn, struct vmem_altmap *altmap);
 extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
 		unsigned long map_offset);
 extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ea818ff739cd..2f3a7ebecbe2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2538,7 +2538,8 @@ void sparse_mem_maps_populate_node(struct page **map_map,
 				   unsigned long map_count,
 				   int nodeid);
 
-struct page *sparse_mem_map_populate(unsigned long pnum, int nid);
+struct page *sparse_mem_map_populate(unsigned long pnum, int nid,
+		struct vmem_altmap *altmap);
 pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
 p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node);
 pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node);
@@ -2556,7 +2557,8 @@ static inline void *vmemmap_alloc_block_buf(unsigned long size, int node)
 void vmemmap_verify(pte_t *, int, unsigned long, unsigned long);
 int vmemmap_populate_basepages(unsigned long start, unsigned long end,
 			       int node);
-int vmemmap_populate(unsigned long start, unsigned long end, int node);
+int vmemmap_populate(unsigned long start, unsigned long end, int node,
+		struct vmem_altmap *altmap);
 void vmemmap_populate_print_last(void);
 #ifdef CONFIG_MEMORY_HOTPLUG
 void vmemmap_free(unsigned long start, unsigned long end);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index fc0485dcece1..b36f1822c432 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -250,7 +250,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
 #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
 
 static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
-		bool want_memblock)
+		struct vmem_altmap *altmap, bool want_memblock)
 {
 	int ret;
 	int i;
@@ -258,7 +258,7 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
 	if (pfn_valid(phys_start_pfn))
 		return -EEXIST;
 
-	ret = sparse_add_one_section(NODE_DATA(nid), phys_start_pfn);
+	ret = sparse_add_one_section(NODE_DATA(nid), phys_start_pfn, altmap);
 	if (ret < 0)
 		return ret;
 
@@ -317,7 +317,8 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
 	}
 
 	for (i = start_sec; i <= end_sec; i++) {
-		err = __add_section(nid, section_nr_to_pfn(i), want_memblock);
+		err = __add_section(nid, section_nr_to_pfn(i), altmap,
+				want_memblock);
 
 		/*
 		 * EEXIST is finally dealt with by ioresource collision
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 17acf01791fa..376dcf05a39c 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -278,7 +278,8 @@ int __meminit vmemmap_populate_basepages(unsigned long start,
 	return 0;
 }
 
-struct page * __meminit sparse_mem_map_populate(unsigned long pnum, int nid)
+struct page * __meminit sparse_mem_map_populate(unsigned long pnum, int nid,
+		struct vmem_altmap *altmap)
 {
 	unsigned long start;
 	unsigned long end;
@@ -288,7 +289,7 @@ struct page * __meminit sparse_mem_map_populate(unsigned long pnum, int nid)
 	start = (unsigned long)map;
 	end = (unsigned long)(map + PAGES_PER_SECTION);
 
-	if (vmemmap_populate(start, end, nid))
+	if (vmemmap_populate(start, end, nid, altmap))
 		return NULL;
 
 	return map;
@@ -318,7 +319,7 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 		if (!present_section_nr(pnum))
 			continue;
 
-		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid);
+		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
 		if (map_map[pnum])
 			continue;
 		ms = __nr_to_section(pnum);
diff --git a/mm/sparse.c b/mm/sparse.c
index 7a5dacaa06e3..5f4a0dac7836 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -417,7 +417,8 @@ static void __init sparse_early_usemaps_alloc_node(void *data,
 }
 
 #ifndef CONFIG_SPARSEMEM_VMEMMAP
-struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid)
+struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid,
+		struct vmem_altmap *altmap)
 {
 	struct page *map;
 	unsigned long size;
@@ -472,7 +473,7 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 
 		if (!present_section_nr(pnum))
 			continue;
-		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid);
+		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
 		if (map_map[pnum])
 			continue;
 		ms = __nr_to_section(pnum);
@@ -500,7 +501,7 @@ static struct page __init *sparse_early_mem_map_alloc(unsigned long pnum)
 	struct mem_section *ms = __nr_to_section(pnum);
 	int nid = sparse_early_nid(ms);
 
-	map = sparse_mem_map_populate(pnum, nid);
+	map = sparse_mem_map_populate(pnum, nid, NULL);
 	if (map)
 		return map;
 
@@ -678,10 +679,11 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 #endif
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
-static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid)
+static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
+		struct vmem_altmap *altmap)
 {
 	/* This will make the necessary allocations eventually. */
-	return sparse_mem_map_populate(pnum, nid);
+	return sparse_mem_map_populate(pnum, nid, altmap);
 }
 static void __kfree_section_memmap(struct page *memmap)
 {
@@ -721,7 +723,8 @@ static struct page *__kmalloc_section_memmap(void)
 	return ret;
 }
 
-static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid)
+static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
+		struct vmem_altmap *altmap)
 {
 	return __kmalloc_section_memmap();
 }
@@ -773,7 +776,8 @@ static void free_map_bootmem(struct page *memmap)
  * set.  If this is <=0, then that means that the passed-in
  * map was not consumed and must be freed.
  */
-int __meminit sparse_add_one_section(struct pglist_data *pgdat, unsigned long start_pfn)
+int __meminit sparse_add_one_section(struct pglist_data *pgdat,
+		unsigned long start_pfn, struct vmem_altmap *altmap)
 {
 	unsigned long section_nr = pfn_to_section_nr(start_pfn);
 	struct mem_section *ms;
@@ -789,7 +793,7 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat, unsigned long st
 	ret = sparse_index_init(section_nr, pgdat->node_id);
 	if (ret < 0 && ret != -EEXIST)
 		return ret;
-	memmap = kmalloc_section_memmap(section_nr, pgdat->node_id);
+	memmap = kmalloc_section_memmap(section_nr, pgdat->node_id, altmap);
 	if (!memmap)
 		return -ENOMEM;
 	usemap = __kmalloc_section_usemap();
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 06/17] mm: pass the vmem_altmap to arch_remove_memory and __remove_pages
  2017-12-15 14:09 revamp vmem_altmap / dev_pagemap handling V2 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2017-12-15 14:09 ` [PATCH 05/17] mm: pass the vmem_altmap to vmemmap_populate Christoph Hellwig
@ 2017-12-15 14:09 ` Christoph Hellwig
  2017-12-16  2:04   ` Dan Williams
  2017-12-15 14:09 ` [PATCH 07/17] mm: pass the vmem_altmap to vmemmap_free Christoph Hellwig
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2017-12-15 14:09 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, x86, linux-mm, linux-kernel

We can just pass this on instead of having to do an unprotected radix
tree lookup two levels into the callchain.
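
For reference, the offset skipped in the remove path below is the part
of the range that backs the memmap itself; vmem_altmap_offset() in this
era amounts to roughly the following (a sketch, not the literal source):

/* pfns from base_pfn consumed by the reserved area plus the memmap
 * pages; pages holding actual data start after this offset */
static unsigned long sketch_vmem_altmap_offset(struct vmem_altmap *altmap)
{
	return altmap->reserve + altmap->free;
}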

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/ia64/mm/init.c            | 4 ++--
 arch/powerpc/mm/mem.c          | 6 ++----
 arch/s390/mm/init.c            | 2 +-
 arch/sh/mm/init.c              | 4 ++--
 arch/x86/mm/init_32.c          | 4 ++--
 arch/x86/mm/init_64.c          | 6 ++----
 include/linux/memory_hotplug.h | 5 +++--
 kernel/memremap.c              | 2 +-
 mm/hmm.c                       | 4 ++--
 mm/memory_hotplug.c            | 8 ++------
 10 files changed, 19 insertions(+), 26 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 2e2e4f532204..6a8ce9e1536e 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -663,7 +663,7 @@ int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-int arch_remove_memory(u64 start, u64 size)
+int arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
@@ -671,7 +671,7 @@ int arch_remove_memory(u64 start, u64 size)
 	int ret;
 
 	zone = page_zone(pfn_to_page(start_pfn));
-	ret = __remove_pages(zone, start_pfn, nr_pages);
+	ret = __remove_pages(zone, start_pfn, nr_pages, altmap);
 	if (ret)
 		pr_warn("%s: Problem encountered in __remove_pages() as"
 			" ret=%d\n", __func__,  ret);
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index e670cfc2766e..22aa528b78a2 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -149,11 +149,10 @@ int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-int arch_remove_memory(u64 start, u64 size)
+int arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
-	struct vmem_altmap *altmap;
 	struct page *page;
 	int ret;
 
@@ -162,11 +161,10 @@ int arch_remove_memory(u64 start, u64 size)
 	 * when querying the zone.
 	 */
 	page = pfn_to_page(start_pfn);
-	altmap = to_vmem_altmap((unsigned long) page);
 	if (altmap)
 		page += vmem_altmap_offset(altmap);
 
-	ret = __remove_pages(page_zone(page), start_pfn, nr_pages);
+	ret = __remove_pages(page_zone(page), start_pfn, nr_pages, altmap);
 	if (ret)
 		return ret;
 
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index e12c5af50cd7..3fa3e5323612 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -240,7 +240,7 @@ int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-int arch_remove_memory(u64 start, u64 size)
+int arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
 {
 	/*
 	 * There is no hardware or firmware interface which could trigger a
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index 552afbf55bad..ce0bbaa7e404 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -510,7 +510,7 @@ EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
 #endif
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-int arch_remove_memory(u64 start, u64 size)
+int arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
 {
 	unsigned long start_pfn = PFN_DOWN(start);
 	unsigned long nr_pages = size >> PAGE_SHIFT;
@@ -518,7 +518,7 @@ int arch_remove_memory(u64 start, u64 size)
 	int ret;
 
 	zone = page_zone(pfn_to_page(start_pfn));
-	ret = __remove_pages(zone, start_pfn, nr_pages);
+	ret = __remove_pages(zone, start_pfn, nr_pages, altmap);
 	if (unlikely(ret))
 		pr_warn("%s: Failed, __remove_pages() == %d\n", __func__,
 			ret);
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index cdf19ec6460c..c3bf36fc78d5 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -833,14 +833,14 @@ int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-int arch_remove_memory(u64 start, u64 size)
+int arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	struct zone *zone;
 
 	zone = page_zone(pfn_to_page(start_pfn));
-	return __remove_pages(zone, start_pfn, nr_pages);
+	return __remove_pages(zone, start_pfn, nr_pages, altmap);
 }
 #endif
 #endif
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 0c898098feaf..c5bba00fe71f 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1132,21 +1132,19 @@ kernel_physical_mapping_remove(unsigned long start, unsigned long end)
 	remove_pagetable(start, end, true);
 }
 
-int __ref arch_remove_memory(u64 start, u64 size)
+int __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	struct page *page = pfn_to_page(start_pfn);
-	struct vmem_altmap *altmap;
 	struct zone *zone;
 	int ret;
 
 	/* With altmap the first mapped page is offset from @start */
-	altmap = to_vmem_altmap((unsigned long) page);
 	if (altmap)
 		page += vmem_altmap_offset(altmap);
 	zone = page_zone(page);
-	ret = __remove_pages(zone, start_pfn, nr_pages);
+	ret = __remove_pages(zone, start_pfn, nr_pages, altmap);
 	WARN_ON_ONCE(ret);
 	kernel_physical_mapping_remove(start, start + size);
 
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index cbdd6d52e877..e71927d0d46b 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -126,9 +126,10 @@ static inline bool movable_node_is_enabled(void)
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
 extern bool is_pageblock_removable_nolock(struct page *page);
-extern int arch_remove_memory(u64 start, u64 size);
+extern int arch_remove_memory(u64 start, u64 size,
+		struct vmem_altmap *altmap);
 extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
-	unsigned long nr_pages);
+	unsigned long nr_pages, struct vmem_altmap *altmap);
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
 /* reasonably generic interface to expand the physical pages */
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 16456117a1b1..b707ac60d13c 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -304,7 +304,7 @@ static void devm_memremap_pages_release(struct device *dev, void *data)
 	align_size = ALIGN(resource_size(res), SECTION_SIZE);
 
 	mem_hotplug_begin();
-	arch_remove_memory(align_start, align_size);
+	arch_remove_memory(align_start, align_size, pgmap->altmap);
 	mem_hotplug_done();
 
 	untrack_pfn(NULL, PHYS_PFN(align_start), align_size);
diff --git a/mm/hmm.c b/mm/hmm.c
index cff2aa6910af..b08105e2cd3b 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -838,10 +838,10 @@ static void hmm_devmem_release(struct device *dev, void *data)
 
 	mem_hotplug_begin();
 	if (resource->desc == IORES_DESC_DEVICE_PRIVATE_MEMORY)
-		__remove_pages(zone, start_pfn, npages);
+		__remove_pages(zone, start_pfn, npages, NULL);
 	else
 		arch_remove_memory(start_pfn << PAGE_SHIFT,
-				   npages << PAGE_SHIFT);
+				   npages << PAGE_SHIFT, NULL);
 	mem_hotplug_done();
 
 	hmm_devmem_radix_release(resource);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b36f1822c432..eae6bf47caf7 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -569,7 +569,7 @@ static int __remove_section(struct zone *zone, struct mem_section *ms,
  * calling offline_pages().
  */
 int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
-		 unsigned long nr_pages)
+		 unsigned long nr_pages, struct vmem_altmap *altmap)
 {
 	unsigned long i;
 	unsigned long map_offset = 0;
@@ -577,10 +577,6 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
 
 	/* In the ZONE_DEVICE case device driver owns the memory region */
 	if (is_dev_zone(zone)) {
-		struct page *page = pfn_to_page(phys_start_pfn);
-		struct vmem_altmap *altmap;
-
-		altmap = to_vmem_altmap((unsigned long) page);
 		if (altmap)
 			map_offset = vmem_altmap_offset(altmap);
 	} else {
@@ -1890,7 +1886,7 @@ void __ref remove_memory(int nid, u64 start, u64 size)
 	memblock_free(start, size);
 	memblock_remove(start, size);
 
-	arch_remove_memory(start, size);
+	arch_remove_memory(start, size, NULL);
 
 	try_offline_node(nid);
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 07/17] mm: pass the vmem_altmap to vmemmap_free
  2017-12-15 14:09 revamp vmem_altmap / dev_pagemap handling V2 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2017-12-15 14:09 ` [PATCH 06/17] mm: pass the vmem_altmap to arch_remove_memory and __remove_pages Christoph Hellwig
@ 2017-12-15 14:09 ` Christoph Hellwig
  2017-12-16  2:12   ` Dan Williams
  2017-12-15 14:09 ` [PATCH 08/17] mm: pass the vmem_altmap to memmap_init_zone Christoph Hellwig
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2017-12-15 14:09 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, x86, linux-mm, linux-kernel

We can just pass this on instead of having to do an unprotected radix
tree lookup a few levels into the callchain.
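
For reference, freeing memmap pages back into an altmap is pure
accounting; vmem_altmap_free() amounts to roughly the following (a
sketch, not the literal source):

static void sketch_vmem_altmap_free(struct vmem_altmap *altmap,
		unsigned long nr_pfns)
{
	altmap->alloc -= nr_pfns;	/* hand the pfns back to the altmap */
}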

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm64/mm/mmu.c            |  3 +-
 arch/ia64/mm/discontig.c       |  3 +-
 arch/powerpc/mm/init_64.c      |  5 ++--
 arch/s390/mm/vmem.c            |  3 +-
 arch/sparc/mm/init_64.c        |  3 +-
 arch/x86/mm/init_64.c          | 67 ++++++++++++++++++++++++------------------
 include/linux/memory_hotplug.h |  2 +-
 include/linux/mm.h             |  3 +-
 mm/memory_hotplug.c            |  7 +++--
 mm/sparse.c                    | 23 ++++++++-------
 10 files changed, 68 insertions(+), 51 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index ec8952ff13be..0b1f13e0b4b3 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -696,7 +696,8 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 	return 0;
 }
 #endif	/* CONFIG_ARM64_64K_PAGES */
-void vmemmap_free(unsigned long start, unsigned long end)
+void vmemmap_free(unsigned long start, unsigned long end,
+		struct vmem_altmap *altmap)
 {
 }
 #endif	/* CONFIG_SPARSEMEM_VMEMMAP */
diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
index 1555aecaaf85..5ea0d8d0968b 100644
--- a/arch/ia64/mm/discontig.c
+++ b/arch/ia64/mm/discontig.c
@@ -760,7 +760,8 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 	return vmemmap_populate_basepages(start, end, node);
 }
 
-void vmemmap_free(unsigned long start, unsigned long end)
+void vmemmap_free(unsigned long start, unsigned long end,
+		struct vmem_altmap *altmap)
 {
 }
 #endif
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 779b74a96b8f..db7d4e092157 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -254,7 +254,8 @@ static unsigned long vmemmap_list_free(unsigned long start)
 	return vmem_back->phys;
 }
 
-void __ref vmemmap_free(unsigned long start, unsigned long end)
+void __ref vmemmap_free(unsigned long start, unsigned long end,
+		struct vmem_altmap *altmap)
 {
 	unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;
 	unsigned long page_order = get_order(page_size);
@@ -265,7 +266,6 @@ void __ref vmemmap_free(unsigned long start, unsigned long end)
 
 	for (; start < end; start += page_size) {
 		unsigned long nr_pages, addr;
-		struct vmem_altmap *altmap;
 		struct page *section_base;
 		struct page *page;
 
@@ -285,7 +285,6 @@ void __ref vmemmap_free(unsigned long start, unsigned long end)
 		section_base = pfn_to_page(vmemmap_section_start(start));
 		nr_pages = 1 << page_order;
 
-		altmap = to_vmem_altmap((unsigned long) section_base);
 		if (altmap) {
 			vmem_altmap_free(altmap, nr_pages);
 		} else if (PageReserved(page)) {
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index c44ef0e7c466..db55561c5981 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -297,7 +297,8 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 	return ret;
 }
 
-void vmemmap_free(unsigned long start, unsigned long end)
+void vmemmap_free(unsigned long start, unsigned long end,
+		struct vmem_altmap *altmap)
 {
 }
 
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 42d27a1a042a..995f9490334d 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2671,7 +2671,8 @@ int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend,
 	return 0;
 }
 
-void vmemmap_free(unsigned long start, unsigned long end)
+void vmemmap_free(unsigned long start, unsigned long end,
+		struct vmem_altmap *altmap)
 {
 }
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index c5bba00fe71f..37dd79646a8b 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -800,11 +800,11 @@ int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
 
 #define PAGE_INUSE 0xFD
 
-static void __meminit free_pagetable(struct page *page, int order)
+static void __meminit free_pagetable(struct page *page, int order,
+		struct vmem_altmap *altmap)
 {
 	unsigned long magic;
 	unsigned int nr_pages = 1 << order;
-	struct vmem_altmap *altmap = to_vmem_altmap((unsigned long) page);
 
 	if (altmap) {
 		vmem_altmap_free(altmap, nr_pages);
@@ -826,7 +826,8 @@ static void __meminit free_pagetable(struct page *page, int order)
 		free_pages((unsigned long)page_address(page), order);
 }
 
-static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd)
+static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd,
+		struct vmem_altmap *altmap)
 {
 	pte_t *pte;
 	int i;
@@ -838,13 +839,14 @@ static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd)
 	}
 
 	/* free a pte talbe */
-	free_pagetable(pmd_page(*pmd), 0);
+	free_pagetable(pmd_page(*pmd), 0, altmap);
 	spin_lock(&init_mm.page_table_lock);
 	pmd_clear(pmd);
 	spin_unlock(&init_mm.page_table_lock);
 }
 
-static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud)
+static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud,
+		struct vmem_altmap *altmap)
 {
 	pmd_t *pmd;
 	int i;
@@ -856,13 +858,14 @@ static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud)
 	}
 
 	/* free a pmd talbe */
-	free_pagetable(pud_page(*pud), 0);
+	free_pagetable(pud_page(*pud), 0, altmap);
 	spin_lock(&init_mm.page_table_lock);
 	pud_clear(pud);
 	spin_unlock(&init_mm.page_table_lock);
 }
 
-static void __meminit free_pud_table(pud_t *pud_start, p4d_t *p4d)
+static void __meminit free_pud_table(pud_t *pud_start, p4d_t *p4d,
+		struct vmem_altmap *altmap)
 {
 	pud_t *pud;
 	int i;
@@ -874,7 +877,7 @@ static void __meminit free_pud_table(pud_t *pud_start, p4d_t *p4d)
 	}
 
 	/* free a pud talbe */
-	free_pagetable(p4d_page(*p4d), 0);
+	free_pagetable(p4d_page(*p4d), 0, altmap);
 	spin_lock(&init_mm.page_table_lock);
 	p4d_clear(p4d);
 	spin_unlock(&init_mm.page_table_lock);
@@ -882,7 +885,7 @@ static void __meminit free_pud_table(pud_t *pud_start, p4d_t *p4d)
 
 static void __meminit
 remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
-		 bool direct)
+		 struct vmem_altmap *altmap, bool direct)
 {
 	unsigned long next, pages = 0;
 	pte_t *pte;
@@ -913,7 +916,7 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
 			 * freed when offlining, or simplely not in use.
 			 */
 			if (!direct)
-				free_pagetable(pte_page(*pte), 0);
+				free_pagetable(pte_page(*pte), 0, altmap);
 
 			spin_lock(&init_mm.page_table_lock);
 			pte_clear(&init_mm, addr, pte);
@@ -936,7 +939,7 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
 
 			page_addr = page_address(pte_page(*pte));
 			if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
-				free_pagetable(pte_page(*pte), 0);
+				free_pagetable(pte_page(*pte), 0, altmap);
 
 				spin_lock(&init_mm.page_table_lock);
 				pte_clear(&init_mm, addr, pte);
@@ -953,7 +956,7 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
 
 static void __meminit
 remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
-		 bool direct)
+		 bool direct, struct vmem_altmap *altmap)
 {
 	unsigned long next, pages = 0;
 	pte_t *pte_base;
@@ -972,7 +975,8 @@ remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
 			    IS_ALIGNED(next, PMD_SIZE)) {
 				if (!direct)
 					free_pagetable(pmd_page(*pmd),
-						       get_order(PMD_SIZE));
+						       get_order(PMD_SIZE),
+						       altmap);
 
 				spin_lock(&init_mm.page_table_lock);
 				pmd_clear(pmd);
@@ -986,7 +990,8 @@ remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
 				if (!memchr_inv(page_addr, PAGE_INUSE,
 						PMD_SIZE)) {
 					free_pagetable(pmd_page(*pmd),
-						       get_order(PMD_SIZE));
+						       get_order(PMD_SIZE),
+						       altmap);
 
 					spin_lock(&init_mm.page_table_lock);
 					pmd_clear(pmd);
@@ -998,8 +1003,8 @@ remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
 		}
 
 		pte_base = (pte_t *)pmd_page_vaddr(*pmd);
-		remove_pte_table(pte_base, addr, next, direct);
-		free_pte_table(pte_base, pmd);
+		remove_pte_table(pte_base, addr, next, altmap, direct);
+		free_pte_table(pte_base, pmd, altmap);
 	}
 
 	/* Call free_pmd_table() in remove_pud_table(). */
@@ -1009,7 +1014,7 @@ remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
 
 static void __meminit
 remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
-		 bool direct)
+		 struct vmem_altmap *altmap, bool direct)
 {
 	unsigned long next, pages = 0;
 	pmd_t *pmd_base;
@@ -1028,7 +1033,8 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
 			    IS_ALIGNED(next, PUD_SIZE)) {
 				if (!direct)
 					free_pagetable(pud_page(*pud),
-						       get_order(PUD_SIZE));
+						       get_order(PUD_SIZE),
+						       altmap);
 
 				spin_lock(&init_mm.page_table_lock);
 				pud_clear(pud);
@@ -1042,7 +1048,8 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
 				if (!memchr_inv(page_addr, PAGE_INUSE,
 						PUD_SIZE)) {
 					free_pagetable(pud_page(*pud),
-						       get_order(PUD_SIZE));
+						       get_order(PUD_SIZE),
+						       altmap);
 
 					spin_lock(&init_mm.page_table_lock);
 					pud_clear(pud);
@@ -1054,8 +1061,8 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
 		}
 
 		pmd_base = pmd_offset(pud, 0);
-		remove_pmd_table(pmd_base, addr, next, direct);
-		free_pmd_table(pmd_base, pud);
+		remove_pmd_table(pmd_base, addr, next, direct, altmap);
+		free_pmd_table(pmd_base, pud, altmap);
 	}
 
 	if (direct)
@@ -1064,7 +1071,7 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
 
 static void __meminit
 remove_p4d_table(p4d_t *p4d_start, unsigned long addr, unsigned long end,
-		 bool direct)
+		 struct vmem_altmap *altmap, bool direct)
 {
 	unsigned long next, pages = 0;
 	pud_t *pud_base;
@@ -1080,14 +1087,14 @@ remove_p4d_table(p4d_t *p4d_start, unsigned long addr, unsigned long end,
 		BUILD_BUG_ON(p4d_large(*p4d));
 
 		pud_base = pud_offset(p4d, 0);
-		remove_pud_table(pud_base, addr, next, direct);
+		remove_pud_table(pud_base, addr, next, altmap, direct);
 		/*
 		 * For 4-level page tables we do not want to free PUDs, but in the
 		 * 5-level case we should free them. This code will have to change
 		 * to adapt for boot-time switching between 4 and 5 level page tables.
 		 */
 		if (CONFIG_PGTABLE_LEVELS == 5)
-			free_pud_table(pud_base, p4d);
+			free_pud_table(pud_base, p4d, altmap);
 	}
 
 	if (direct)
@@ -1096,7 +1103,8 @@ remove_p4d_table(p4d_t *p4d_start, unsigned long addr, unsigned long end,
 
 /* start and end are both virtual address. */
 static void __meminit
-remove_pagetable(unsigned long start, unsigned long end, bool direct)
+remove_pagetable(unsigned long start, unsigned long end, bool direct,
+		struct vmem_altmap *altmap)
 {
 	unsigned long next;
 	unsigned long addr;
@@ -1111,15 +1119,16 @@ remove_pagetable(unsigned long start, unsigned long end, bool direct)
 			continue;
 
 		p4d = p4d_offset(pgd, 0);
-		remove_p4d_table(p4d, addr, next, direct);
+		remove_p4d_table(p4d, addr, next, altmap, direct);
 	}
 
 	flush_tlb_all();
 }
 
-void __ref vmemmap_free(unsigned long start, unsigned long end)
+void __ref vmemmap_free(unsigned long start, unsigned long end,
+		struct vmem_altmap *altmap)
 {
-	remove_pagetable(start, end, false);
+	remove_pagetable(start, end, false, altmap);
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
@@ -1129,7 +1138,7 @@ kernel_physical_mapping_remove(unsigned long start, unsigned long end)
 	start = (unsigned long)__va(start);
 	end = (unsigned long)__va(end);
 
-	remove_pagetable(start, end, true);
+	remove_pagetable(start, end, true, NULL);
 }
 
 int __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index e71927d0d46b..20dd98ad44a0 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -331,7 +331,7 @@ extern void remove_memory(int nid, u64 start, u64 size);
 extern int sparse_add_one_section(struct pglist_data *pgdat,
 		unsigned long start_pfn, struct vmem_altmap *altmap);
 extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
-		unsigned long map_offset);
+		unsigned long map_offset, struct vmem_altmap *altmap);
 extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
 					  unsigned long pnum);
 extern bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2f3a7ebecbe2..9d4cd4c1dc6d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2561,7 +2561,8 @@ int vmemmap_populate(unsigned long start, unsigned long end, int node,
 		struct vmem_altmap *altmap);
 void vmemmap_populate_print_last(void);
 #ifdef CONFIG_MEMORY_HOTPLUG
-void vmemmap_free(unsigned long start, unsigned long end);
+void vmemmap_free(unsigned long start, unsigned long end,
+		struct vmem_altmap *altmap);
 #endif
 void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
 				  unsigned long nr_pages);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index eae6bf47caf7..a8dde9734120 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -536,7 +536,7 @@ static void __remove_zone(struct zone *zone, unsigned long start_pfn)
 }
 
 static int __remove_section(struct zone *zone, struct mem_section *ms,
-		unsigned long map_offset)
+		unsigned long map_offset, struct vmem_altmap *altmap)
 {
 	unsigned long start_pfn;
 	int scn_nr;
@@ -553,7 +553,7 @@ static int __remove_section(struct zone *zone, struct mem_section *ms,
 	start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
 	__remove_zone(zone, start_pfn);
 
-	sparse_remove_one_section(zone, ms, map_offset);
+	sparse_remove_one_section(zone, ms, map_offset, altmap);
 	return 0;
 }
 
@@ -607,7 +607,8 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
 	for (i = 0; i < sections_to_remove; i++) {
 		unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
 
-		ret = __remove_section(zone, __pfn_to_section(pfn), map_offset);
+		ret = __remove_section(zone, __pfn_to_section(pfn), map_offset,
+				altmap);
 		map_offset = 0;
 		if (ret)
 			break;
diff --git a/mm/sparse.c b/mm/sparse.c
index 5f4a0dac7836..06130c13dc99 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -685,12 +685,13 @@ static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
 	/* This will make the necessary allocations eventually. */
 	return sparse_mem_map_populate(pnum, nid, altmap);
 }
-static void __kfree_section_memmap(struct page *memmap)
+static void __kfree_section_memmap(struct page *memmap,
+		struct vmem_altmap *altmap)
 {
 	unsigned long start = (unsigned long)memmap;
 	unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
 
-	vmemmap_free(start, end);
+	vmemmap_free(start, end, altmap);
 }
 #ifdef CONFIG_MEMORY_HOTREMOVE
 static void free_map_bootmem(struct page *memmap)
@@ -698,7 +699,7 @@ static void free_map_bootmem(struct page *memmap)
 	unsigned long start = (unsigned long)memmap;
 	unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
 
-	vmemmap_free(start, end);
+	vmemmap_free(start, end, NULL);
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 #else
@@ -729,7 +730,8 @@ static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
 	return __kmalloc_section_memmap();
 }
 
-static void __kfree_section_memmap(struct page *memmap)
+static void __kfree_section_memmap(struct page *memmap,
+		struct vmem_altmap *altmap)
 {
 	if (is_vmalloc_addr(memmap))
 		vfree(memmap);
@@ -798,7 +800,7 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat,
 		return -ENOMEM;
 	usemap = __kmalloc_section_usemap();
 	if (!usemap) {
-		__kfree_section_memmap(memmap);
+		__kfree_section_memmap(memmap, altmap);
 		return -ENOMEM;
 	}
 
@@ -820,7 +822,7 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat,
 	pgdat_resize_unlock(pgdat, &flags);
 	if (ret <= 0) {
 		kfree(usemap);
-		__kfree_section_memmap(memmap);
+		__kfree_section_memmap(memmap, altmap);
 	}
 	return ret;
 }
@@ -847,7 +849,8 @@ static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
 }
 #endif
 
-static void free_section_usemap(struct page *memmap, unsigned long *usemap)
+static void free_section_usemap(struct page *memmap, unsigned long *usemap,
+		struct vmem_altmap *altmap)
 {
 	struct page *usemap_page;
 
@@ -861,7 +864,7 @@ static void free_section_usemap(struct page *memmap, unsigned long *usemap)
 	if (PageSlab(usemap_page) || PageCompound(usemap_page)) {
 		kfree(usemap);
 		if (memmap)
-			__kfree_section_memmap(memmap);
+			__kfree_section_memmap(memmap, altmap);
 		return;
 	}
 
@@ -875,7 +878,7 @@ static void free_section_usemap(struct page *memmap, unsigned long *usemap)
 }
 
 void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
-		unsigned long map_offset)
+		unsigned long map_offset, struct vmem_altmap *altmap)
 {
 	struct page *memmap = NULL;
 	unsigned long *usemap = NULL, flags;
@@ -893,7 +896,7 @@ void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
 
 	clear_hwpoisoned_pages(memmap + map_offset,
 			PAGES_PER_SECTION - map_offset);
-	free_section_usemap(memmap, usemap);
+	free_section_usemap(memmap, usemap, altmap);
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 #endif /* CONFIG_MEMORY_HOTPLUG */
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 08/17] mm: pass the vmem_altmap to memmap_init_zone
  2017-12-15 14:09 revamp vmem_altmap / dev_pagemap handling V2 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2017-12-15 14:09 ` [PATCH 07/17] mm: pass the vmem_altmap to vmemmap_free Christoph Hellwig
@ 2017-12-15 14:09 ` Christoph Hellwig
  2017-12-16  2:15   ` Dan Williams
  2017-12-15 14:09 ` [PATCH 09/17] mm: split altmap memory map allocation from normal case Christoph Hellwig
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2017-12-15 14:09 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, x86, linux-mm, linux-kernel

Pass the vmem_altmap two levels down, through move_pfn_range_to_zone into
memmap_init_zone, instead of having memmap_init_zone look it up itself
with to_vmem_altmap().

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/ia64/mm/init.c            | 9 +++++----
 include/linux/memory_hotplug.h | 2 +-
 include/linux/mm.h             | 4 ++--
 kernel/memremap.c              | 2 +-
 mm/hmm.c                       | 2 +-
 mm/memory_hotplug.c            | 9 +++++----
 mm/page_alloc.c                | 6 +++---
 7 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 6a8ce9e1536e..18278b448530 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -501,7 +501,7 @@ virtual_memmap_init(u64 start, u64 end, void *arg)
 	if (map_start < map_end)
 		memmap_init_zone((unsigned long)(map_end - map_start),
 				 args->nid, args->zone, page_to_pfn(map_start),
-				 MEMMAP_EARLY);
+				 MEMMAP_EARLY, NULL);
 	return 0;
 }
 
@@ -509,9 +509,10 @@ void __meminit
 memmap_init (unsigned long size, int nid, unsigned long zone,
 	     unsigned long start_pfn)
 {
-	if (!vmem_map)
-		memmap_init_zone(size, nid, zone, start_pfn, MEMMAP_EARLY);
-	else {
+	if (!vmem_map) {
+		memmap_init_zone(size, nid, zone, start_pfn, MEMMAP_EARLY,
+				NULL);
+	} else {
 		struct page *start;
 		struct memmap_init_callback_data args;
 
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 20dd98ad44a0..aba5f86eb038 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -324,7 +324,7 @@ extern int add_memory_resource(int nid, struct resource *resource, bool online);
 extern int arch_add_memory(int nid, u64 start, u64 size,
 		struct vmem_altmap *altmap, bool want_memblock);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
-		unsigned long nr_pages);
+		unsigned long nr_pages, struct vmem_altmap *altmap);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
 extern bool is_memblock_offlined(struct memory_block *mem);
 extern void remove_memory(int nid, u64 start, u64 size);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9d4cd4c1dc6d..fd01135324b6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2069,8 +2069,8 @@ static inline void zero_resv_unavail(void) {}
 #endif
 
 extern void set_dma_reserve(unsigned long new_dma_reserve);
-extern void memmap_init_zone(unsigned long, int, unsigned long,
-				unsigned long, enum memmap_context);
+extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long,
+		enum memmap_context, struct vmem_altmap *);
 extern void setup_per_zone_wmarks(void);
 extern int __meminit init_per_zone_wmark_min(void);
 extern void mem_init(void);
diff --git a/kernel/memremap.c b/kernel/memremap.c
index b707ac60d13c..8e85803b6b0e 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -431,7 +431,7 @@ void *devm_memremap_pages(struct device *dev, struct resource *res,
 	if (!error)
 		move_pfn_range_to_zone(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
 					align_start >> PAGE_SHIFT,
-					align_size >> PAGE_SHIFT);
+					align_size >> PAGE_SHIFT, altmap);
 	mem_hotplug_done();
 	if (error)
 		goto err_add_memory;
diff --git a/mm/hmm.c b/mm/hmm.c
index b08105e2cd3b..5d0f488e66bc 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -942,7 +942,7 @@ static int hmm_devmem_pages_create(struct hmm_devmem *devmem)
 	}
 	move_pfn_range_to_zone(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
 				align_start >> PAGE_SHIFT,
-				align_size >> PAGE_SHIFT);
+				align_size >> PAGE_SHIFT, NULL);
 	mem_hotplug_done();
 
 	for (pfn = devmem->pfn_first; pfn < devmem->pfn_last; pfn++) {
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a8dde9734120..12df8a5fadcc 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -798,8 +798,8 @@ static void __meminit resize_pgdat_range(struct pglist_data *pgdat, unsigned lon
 	pgdat->node_spanned_pages = max(start_pfn + nr_pages, old_end_pfn) - pgdat->node_start_pfn;
 }
 
-void __ref move_pfn_range_to_zone(struct zone *zone,
-		unsigned long start_pfn, unsigned long nr_pages)
+void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
+		unsigned long nr_pages, struct vmem_altmap *altmap)
 {
 	struct pglist_data *pgdat = zone->zone_pgdat;
 	int nid = pgdat->node_id;
@@ -824,7 +824,8 @@ void __ref move_pfn_range_to_zone(struct zone *zone,
 	 * expects the zone spans the pfn range. All the pages in the range
 	 * are reserved so nobody should be touching them so we should be safe
 	 */
-	memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn, MEMMAP_HOTPLUG);
+	memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn,
+			MEMMAP_HOTPLUG, altmap);
 
 	set_zone_contiguous(zone);
 }
@@ -896,7 +897,7 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
 	struct zone *zone;
 
 	zone = zone_for_pfn_range(online_type, nid, start_pfn, nr_pages);
-	move_pfn_range_to_zone(zone, start_pfn, nr_pages);
+	move_pfn_range_to_zone(zone, start_pfn, nr_pages, NULL);
 	return zone;
 }
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 73f5d4556b3d..ad8820304916 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5303,9 +5303,9 @@ void __ref build_all_zonelists(pg_data_t *pgdat)
  * done. Non-atomic initialization, single-pass.
  */
 void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
-		unsigned long start_pfn, enum memmap_context context)
+		unsigned long start_pfn, enum memmap_context context,
+		struct vmem_altmap *altmap)
 {
-	struct vmem_altmap *altmap = to_vmem_altmap(__pfn_to_phys(start_pfn));
 	unsigned long end_pfn = start_pfn + size;
 	pg_data_t *pgdat = NODE_DATA(nid);
 	unsigned long pfn;
@@ -5406,7 +5406,7 @@ static void __meminit zone_init_free_lists(struct zone *zone)
 
 #ifndef __HAVE_ARCH_MEMMAP_INIT
 #define memmap_init(size, nid, zone, start_pfn) \
-	memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY)
+	memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY, NULL)
 #endif
 
 static int zone_batchsize(struct zone *zone)
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 09/17] mm: split altmap memory map allocation from normal case
  2017-12-15 14:09 revamp vmem_altmap / dev_pagemap handling V2 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2017-12-15 14:09 ` [PATCH 08/17] mm: pass the vmem_altmap to memmap_init_zone Christoph Hellwig
@ 2017-12-15 14:09 ` Christoph Hellwig
  2017-12-16  2:18   ` Dan Williams
  2017-12-15 14:09 ` [PATCH 10/17] mm: merge vmem_altmap_alloc into altmap_alloc_block_buf Christoph Hellwig
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2017-12-15 14:09 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, x86, linux-mm, linux-kernel

No functional changes, just untangling the call chain: instead of having
__vmemmap_alloc_block_buf pick between the normal and the altmap allocator
internally, let the callers choose between vmemmap_alloc_block_buf and
altmap_alloc_block_buf directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
---
 arch/powerpc/mm/init_64.c |  5 ++++-
 arch/x86/mm/init_64.c     |  5 ++++-
 include/linux/mm.h        |  9 ++-------
 mm/sparse-vmemmap.c       | 15 +++------------
 4 files changed, 13 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index db7d4e092157..7a2251d99ed3 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -200,7 +200,10 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 		if (vmemmap_populated(start, page_size))
 			continue;
 
-		p =  __vmemmap_alloc_block_buf(page_size, node, altmap);
+		if (altmap)
+			p = altmap_alloc_block_buf(page_size, altmap);
+		else
+			p = vmemmap_alloc_block_buf(page_size, node);
 		if (!p)
 			return -ENOMEM;
 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 37dd79646a8b..39c5051cf7c2 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1385,7 +1385,10 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start,
 		if (pmd_none(*pmd)) {
 			void *p;
 
-			p = __vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
+			if (altmap)
+				p = altmap_alloc_block_buf(PMD_SIZE, altmap);
+			else
+				p = vmemmap_alloc_block_buf(PMD_SIZE, node);
 			if (p) {
 				pte_t entry;
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index fd01135324b6..09637c353de0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2547,13 +2547,8 @@ pmd_t *vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node);
 pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node);
 void *vmemmap_alloc_block(unsigned long size, int node);
 struct vmem_altmap;
-void *__vmemmap_alloc_block_buf(unsigned long size, int node,
-		struct vmem_altmap *altmap);
-static inline void *vmemmap_alloc_block_buf(unsigned long size, int node)
-{
-	return __vmemmap_alloc_block_buf(size, node, NULL);
-}
-
+void *vmemmap_alloc_block_buf(unsigned long size, int node);
+void *altmap_alloc_block_buf(unsigned long size, struct vmem_altmap *altmap);
 void vmemmap_verify(pte_t *, int, unsigned long, unsigned long);
 int vmemmap_populate_basepages(unsigned long start, unsigned long end,
 			       int node);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 376dcf05a39c..d012c9e2811b 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -74,7 +74,7 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int node)
 }
 
 /* need to make sure size is all the same during early stage */
-static void * __meminit alloc_block_buf(unsigned long size, int node)
+void * __meminit vmemmap_alloc_block_buf(unsigned long size, int node)
 {
 	void *ptr;
 
@@ -129,7 +129,7 @@ static unsigned long __meminit vmem_altmap_alloc(struct vmem_altmap *altmap,
 	return pfn + nr_align;
 }
 
-static void * __meminit altmap_alloc_block_buf(unsigned long size,
+void * __meminit altmap_alloc_block_buf(unsigned long size,
 		struct vmem_altmap *altmap)
 {
 	unsigned long pfn, nr_pfns;
@@ -153,15 +153,6 @@ static void * __meminit altmap_alloc_block_buf(unsigned long size,
 	return ptr;
 }
 
-/* need to make sure size is all the same during early stage */
-void * __meminit __vmemmap_alloc_block_buf(unsigned long size, int node,
-		struct vmem_altmap *altmap)
-{
-	if (altmap)
-		return altmap_alloc_block_buf(size, altmap);
-	return alloc_block_buf(size, node);
-}
-
 void __meminit vmemmap_verify(pte_t *pte, int node,
 				unsigned long start, unsigned long end)
 {
@@ -178,7 +169,7 @@ pte_t * __meminit vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node)
 	pte_t *pte = pte_offset_kernel(pmd, addr);
 	if (pte_none(*pte)) {
 		pte_t entry;
-		void *p = alloc_block_buf(PAGE_SIZE, node);
+		void *p = vmemmap_alloc_block_buf(PAGE_SIZE, node);
 		if (!p)
 			return NULL;
 		entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 10/17] mm: merge vmem_altmap_alloc into altmap_alloc_block_buf
  2017-12-15 14:09 revamp vmem_altmap / dev_pagemap handling V2 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2017-12-15 14:09 ` [PATCH 09/17] mm: split altmap memory map allocation from normal case Christoph Hellwig
@ 2017-12-15 14:09 ` Christoph Hellwig
  2017-12-16  2:24   ` Dan Williams
  2017-12-15 14:09 ` [PATCH 11/17] mm: move get_dev_pagemap out of line Christoph Hellwig
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2017-12-15 14:09 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, x86, linux-mm, linux-kernel

There is no clear separation between the two functions, so merge
vmem_altmap_alloc into altmap_alloc_block_buf.
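
For illustration, the alignment logic (unchanged by the merge, assuming
4k pages): a PMD_SIZE request is nr_pfns = 512 pages, so nr_align starts
out as 512; if the next free pfn in the altmap is e.g. 1000, then
ALIGN(1000, 512) - 1000 = 24 pfns are skipped, 512 pages are charged to
->alloc and 24 to ->align, and the block starts at pfn 1024.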

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
---
 mm/sparse-vmemmap.c | 45 ++++++++++++++++-----------------------------
 1 file changed, 16 insertions(+), 29 deletions(-)

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index d012c9e2811b..bd0276d5f66b 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -107,33 +107,16 @@ static unsigned long __meminit vmem_altmap_nr_free(struct vmem_altmap *altmap)
 }
 
 /**
- * vmem_altmap_alloc - allocate pages from the vmem_altmap reservation
- * @altmap - reserved page pool for the allocation
- * @nr_pfns - size (in pages) of the allocation
+ * altmap_alloc_block_buf - allocate pages from the device page map
+ * @altmap:	device page map
+ * @size:	size (in bytes) of the allocation
  *
- * Allocations are aligned to the size of the request
+ * Allocations are aligned to the size of the request.
  */
-static unsigned long __meminit vmem_altmap_alloc(struct vmem_altmap *altmap,
-		unsigned long nr_pfns)
-{
-	unsigned long pfn = vmem_altmap_next_pfn(altmap);
-	unsigned long nr_align;
-
-	nr_align = 1UL << find_first_bit(&nr_pfns, BITS_PER_LONG);
-	nr_align = ALIGN(pfn, nr_align) - pfn;
-
-	if (nr_pfns + nr_align > vmem_altmap_nr_free(altmap))
-		return ULONG_MAX;
-	altmap->alloc += nr_pfns;
-	altmap->align += nr_align;
-	return pfn + nr_align;
-}
-
 void * __meminit altmap_alloc_block_buf(unsigned long size,
 		struct vmem_altmap *altmap)
 {
-	unsigned long pfn, nr_pfns;
-	void *ptr;
+	unsigned long pfn, nr_pfns, nr_align;
 
 	if (size & ~PAGE_MASK) {
 		pr_warn_once("%s: allocations must be multiple of PAGE_SIZE (%ld)\n",
@@ -141,16 +124,20 @@ void * __meminit altmap_alloc_block_buf(unsigned long size,
 		return NULL;
 	}
 
+	pfn = vmem_altmap_next_pfn(altmap);
 	nr_pfns = size >> PAGE_SHIFT;
-	pfn = vmem_altmap_alloc(altmap, nr_pfns);
-	if (pfn < ULONG_MAX)
-		ptr = __va(__pfn_to_phys(pfn));
-	else
-		ptr = NULL;
+	nr_align = 1UL << find_first_bit(&nr_pfns, BITS_PER_LONG);
+	nr_align = ALIGN(pfn, nr_align) - pfn;
+	if (nr_pfns + nr_align > vmem_altmap_nr_free(altmap))
+		return NULL;
+
+	altmap->alloc += nr_pfns;
+	altmap->align += nr_align;
+	pfn += nr_align;
+
 	pr_debug("%s: pfn: %#lx alloc: %ld align: %ld nr: %#lx\n",
 			__func__, pfn, altmap->alloc, altmap->align, nr_pfns);
-
-	return ptr;
+	return __va(__pfn_to_phys(pfn));
 }
 
 void __meminit vmemmap_verify(pte_t *pte, int node,
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 11/17] mm: move get_dev_pagemap out of line
  2017-12-15 14:09 revamp vmem_altmap / dev_pagemap handling V2 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2017-12-15 14:09 ` [PATCH 10/17] mm: merge vmem_altmap_alloc into altmap_alloc_block_buf Christoph Hellwig
@ 2017-12-15 14:09 ` Christoph Hellwig
  2017-12-17 17:26   ` Dan Williams
  2017-12-15 14:09 ` [PATCH 12/17] mm: optimize dev_pagemap reference counting around get_dev_pagemap Christoph Hellwig
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2017-12-15 14:09 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, x86, linux-mm, linux-kernel

This is a pretty big function, which should be out of line in general,
and a no-op stub if CONFIG_ZONE_DEVICE is not set.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
---
 include/linux/memremap.h | 39 ++++-----------------------------------
 kernel/memremap.c        | 36 ++++++++++++++++++++++++++++++++++--
 2 files changed, 38 insertions(+), 37 deletions(-)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index d5a6736d9737..26e8aaba27d5 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -133,7 +133,8 @@ struct dev_pagemap {
 #ifdef CONFIG_ZONE_DEVICE
 void *devm_memremap_pages(struct device *dev, struct resource *res,
 		struct percpu_ref *ref, struct vmem_altmap *altmap);
-struct dev_pagemap *find_dev_pagemap(resource_size_t phys);
+struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
+		struct dev_pagemap *pgmap);
 
 unsigned long vmem_altmap_offset(struct vmem_altmap *altmap);
 void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns);
@@ -153,7 +154,8 @@ static inline void *devm_memremap_pages(struct device *dev,
 	return ERR_PTR(-ENXIO);
 }
 
-static inline struct dev_pagemap *find_dev_pagemap(resource_size_t phys)
+static inline struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
+		struct dev_pagemap *pgmap)
 {
 	return NULL;
 }
@@ -183,39 +185,6 @@ static inline bool is_device_public_page(const struct page *page)
 }
 #endif /* CONFIG_DEVICE_PRIVATE || CONFIG_DEVICE_PUBLIC */
 
-/**
- * get_dev_pagemap() - take a new live reference on the dev_pagemap for @pfn
- * @pfn: page frame number to lookup page_map
- * @pgmap: optional known pgmap that already has a reference
- *
- * @pgmap allows the overhead of a lookup to be bypassed when @pfn lands in the
- * same mapping.
- */
-static inline struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
-		struct dev_pagemap *pgmap)
-{
-	const struct resource *res = pgmap ? pgmap->res : NULL;
-	resource_size_t phys = PFN_PHYS(pfn);
-
-	/*
-	 * In the cached case we're already holding a live reference so
-	 * we can simply do a blind increment
-	 */
-	if (res && phys >= res->start && phys <= res->end) {
-		percpu_ref_get(pgmap->ref);
-		return pgmap;
-	}
-
-	/* fall back to slow path lookup */
-	rcu_read_lock();
-	pgmap = find_dev_pagemap(phys);
-	if (pgmap && !percpu_ref_tryget_live(pgmap->ref))
-		pgmap = NULL;
-	rcu_read_unlock();
-
-	return pgmap;
-}
-
 static inline void put_dev_pagemap(struct dev_pagemap *pgmap)
 {
 	if (pgmap)
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 8e85803b6b0e..43d94db97ff4 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -314,7 +314,7 @@ static void devm_memremap_pages_release(struct device *dev, void *data)
 }
 
 /* assumes rcu_read_lock() held at entry */
-struct dev_pagemap *find_dev_pagemap(resource_size_t phys)
+static struct dev_pagemap *find_dev_pagemap(resource_size_t phys)
 {
 	struct page_map *page_map;
 
@@ -500,8 +500,40 @@ struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start)
 
 	return pgmap ? pgmap->altmap : NULL;
 }
-#endif /* CONFIG_ZONE_DEVICE */
 
+/**
+ * get_dev_pagemap() - take a new live reference on the dev_pagemap for @pfn
+ * @pfn: page frame number to lookup page_map
+ * @pgmap: optional known pgmap that already has a reference
+ *
+ * @pgmap allows the overhead of a lookup to be bypassed when @pfn lands in the
+ * same mapping.
+ */
+struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
+		struct dev_pagemap *pgmap)
+{
+	const struct resource *res = pgmap ? pgmap->res : NULL;
+	resource_size_t phys = PFN_PHYS(pfn);
+
+	/*
+	 * In the cached case we're already holding a live reference so
+	 * we can simply do a blind increment
+	 */
+	if (res && phys >= res->start && phys <= res->end) {
+		percpu_ref_get(pgmap->ref);
+		return pgmap;
+	}
+
+	/* fall back to slow path lookup */
+	rcu_read_lock();
+	pgmap = find_dev_pagemap(phys);
+	if (pgmap && !percpu_ref_tryget_live(pgmap->ref))
+		pgmap = NULL;
+	rcu_read_unlock();
+
+	return pgmap;
+}
+#endif /* CONFIG_ZONE_DEVICE */
 
 #if IS_ENABLED(CONFIG_DEVICE_PRIVATE) ||  IS_ENABLED(CONFIG_DEVICE_PUBLIC)
 void put_zone_device_private_or_public_page(struct page *page)
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 12/17] mm: optimize dev_pagemap reference counting around get_dev_pagemap
  2017-12-15 14:09 revamp vmem_altmap / dev_pagemap handling V2 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2017-12-15 14:09 ` [PATCH 11/17] mm: move get_dev_pagemap out of line Christoph Hellwig
@ 2017-12-15 14:09 ` Christoph Hellwig
  2017-12-17 17:28   ` Dan Williams
  2017-12-15 14:09 ` [PATCH 13/17] memremap: remove to_vmem_altmap Christoph Hellwig
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2017-12-15 14:09 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, x86, linux-mm, linux-kernel

Change the calling convention so that get_dev_pagemap always consumes the
previous reference instead of requiring an explicit earlier call to
put_dev_pagemap in the callers.

The callers will still need to put the final reference after finishing the
loop over the pages.
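
For reference, a minimal sketch of the caller-side pattern after this
change (the loop and pfn bounds are illustrative, not the exact gup code):

        struct dev_pagemap *pgmap = NULL;
        unsigned long pfn;

        for (pfn = start_pfn; pfn < end_pfn; pfn++) {
                /* drops the reference to a pgmap that does not cover pfn */
                pgmap = get_dev_pagemap(pfn, pgmap);
                if (!pgmap)
                        break;
                /* ... work on pfn_to_page(pfn) ... */
        }
        /* the final reference is still the caller's to drop */
        if (pgmap)
                put_dev_pagemap(pgmap);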

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
---
 kernel/memremap.c | 17 +++++++++--------
 mm/gup.c          |  7 +++++--
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index 43d94db97ff4..26764085785d 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -506,22 +506,23 @@ struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start)
  * @pfn: page frame number to lookup page_map
  * @pgmap: optional known pgmap that already has a reference
  *
- * @pgmap allows the overhead of a lookup to be bypassed when @pfn lands in the
- * same mapping.
+ * If @pgmap is non-NULL and covers @pfn it will be returned as-is.  If @pgmap
+ * is non-NULL but does not cover @pfn the reference to it will be released.
  */
 struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
 		struct dev_pagemap *pgmap)
 {
-	const struct resource *res = pgmap ? pgmap->res : NULL;
 	resource_size_t phys = PFN_PHYS(pfn);
 
 	/*
-	 * In the cached case we're already holding a live reference so
-	 * we can simply do a blind increment
+	 * In the cached case we're already holding a live reference.
 	 */
-	if (res && phys >= res->start && phys <= res->end) {
-		percpu_ref_get(pgmap->ref);
-		return pgmap;
+	if (pgmap) {
+		const struct resource *res = pgmap ? pgmap->res : NULL;
+
+		if (res && phys >= res->start && phys <= res->end)
+			return pgmap;
+		put_dev_pagemap(pgmap);
 	}
 
 	/* fall back to slow path lookup */
diff --git a/mm/gup.c b/mm/gup.c
index d3fb60e5bfac..9d142eb9e2e9 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1410,7 +1410,6 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 
 		VM_BUG_ON_PAGE(compound_head(page) != head, page);
 
-		put_dev_pagemap(pgmap);
 		SetPageReferenced(page);
 		pages[*nr] = page;
 		(*nr)++;
@@ -1420,6 +1419,8 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 	ret = 1;
 
 pte_unmap:
+	if (pgmap)
+		put_dev_pagemap(pgmap);
 	pte_unmap(ptem);
 	return ret;
 }
@@ -1459,10 +1460,12 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
 		SetPageReferenced(page);
 		pages[*nr] = page;
 		get_page(page);
-		put_dev_pagemap(pgmap);
 		(*nr)++;
 		pfn++;
 	} while (addr += PAGE_SIZE, addr != end);
+
+	if (pgmap)
+		put_dev_pagemap(pgmap);
 	return 1;
 }
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 13/17] memremap: remove to_vmem_altmap
  2017-12-15 14:09 revamp vmem_altmap / dev_pagemap handling V2 Christoph Hellwig
                   ` (11 preceding siblings ...)
  2017-12-15 14:09 ` [PATCH 12/17] mm: optimize dev_pagemap reference counting around get_dev_pagemap Christoph Hellwig
@ 2017-12-15 14:09 ` Christoph Hellwig
  2017-12-17 17:30   ` Dan Williams
  2017-12-15 14:09 ` [PATCH 14/17] memremap: simplify duplicate region handling in devm_memremap_pages Christoph Hellwig
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2017-12-15 14:09 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, x86, linux-mm, linux-kernel

All callers are gone now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/memremap.h |  9 ---------
 kernel/memremap.c        | 26 --------------------------
 2 files changed, 35 deletions(-)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 26e8aaba27d5..3fddcfe57bb0 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -26,15 +26,6 @@ struct vmem_altmap {
 	unsigned long alloc;
 };
 
-#ifdef CONFIG_ZONE_DEVICE
-struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start);
-#else
-static inline struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start)
-{
-	return NULL;
-}
-#endif
-
 /*
  * Specialize ZONE_DEVICE memory into multiple types each having differents
  * usage.
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 26764085785d..891491ddccdb 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -475,32 +475,6 @@ void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns)
 	altmap->alloc -= nr_pfns;
 }
 
-struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start)
-{
-	/*
-	 * 'memmap_start' is the virtual address for the first "struct
-	 * page" in this range of the vmemmap array.  In the case of
-	 * CONFIG_SPARSEMEM_VMEMMAP a page_to_pfn conversion is simple
-	 * pointer arithmetic, so we can perform this to_vmem_altmap()
-	 * conversion without concern for the initialization state of
-	 * the struct page fields.
-	 */
-	struct page *page = (struct page *) memmap_start;
-	struct dev_pagemap *pgmap;
-
-	/*
-	 * Unconditionally retrieve a dev_pagemap associated with the
-	 * given physical address, this is only for use in the
-	 * arch_{add|remove}_memory() for setting up and tearing down
-	 * the memmap.
-	 */
-	rcu_read_lock();
-	pgmap = find_dev_pagemap(__pfn_to_phys(page_to_pfn(page)));
-	rcu_read_unlock();
-
-	return pgmap ? pgmap->altmap : NULL;
-}
-
 /**
  * get_dev_pagemap() - take a new live reference on the dev_pagemap for @pfn
  * @pfn: page frame number to lookup page_map
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 14/17] memremap: simplify duplicate region handling in devm_memremap_pages
  2017-12-15 14:09 revamp vmem_altmap / dev_pagemap handling V2 Christoph Hellwig
                   ` (12 preceding siblings ...)
  2017-12-15 14:09 ` [PATCH 13/17] memremap: remove to_vmem_altmap Christoph Hellwig
@ 2017-12-15 14:09 ` Christoph Hellwig
  2017-12-17 17:34   ` Dan Williams
  2017-12-15 14:09 ` [PATCH 15/17] memremap: drop private struct page_map Christoph Hellwig
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2017-12-15 14:09 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, x86, linux-mm, linux-kernel

__radix_tree_insert already checks for duplicates and returns -EEXIST in
that case, so remove the separate (and racy) duplicate check.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
---
 kernel/memremap.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index 891491ddccdb..901404094df1 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -395,17 +395,6 @@ void *devm_memremap_pages(struct device *dev, struct resource *res,
 	align_end = align_start + align_size - 1;
 
 	foreach_order_pgoff(res, order, pgoff) {
-		struct dev_pagemap *dup;
-
-		rcu_read_lock();
-		dup = find_dev_pagemap(res->start + PFN_PHYS(pgoff));
-		rcu_read_unlock();
-		if (dup) {
-			dev_err(dev, "%s: %pr collides with mapping for %s\n",
-					__func__, res, dev_name(dup->dev));
-			error = -EBUSY;
-			break;
-		}
 		error = __radix_tree_insert(&pgmap_radix,
 				PHYS_PFN(res->start) + pgoff, order, page_map);
 		if (error) {
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 15/17] memremap: drop private struct page_map
  2017-12-15 14:09 revamp vmem_altmap / dev_pagemap handling V2 Christoph Hellwig
                   ` (13 preceding siblings ...)
  2017-12-15 14:09 ` [PATCH 14/17] memremap: simplify duplicate region handling in devm_memremap_pages Christoph Hellwig
@ 2017-12-15 14:09 ` Christoph Hellwig
  2017-12-17 18:43   ` Dan Williams
  2017-12-15 14:09 ` [PATCH 16/17] memremap: change devm_memremap_pages interface to use struct dev_pagemap Christoph Hellwig
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2017-12-15 14:09 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, x86, linux-mm, linux-kernel

From: Logan Gunthorpe <logang@deltatee.com>

'struct page_map' is a private wrapper around 'struct dev_pagemap', but the
latter replicates all the same fields as the former, so there isn't much
value in keeping it. Thus drop it in favour of a completely public struct.

This is a clean up in preparation for a more generally useful
'devm_memremap_pages' interface.
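
With the altmap embedded, code that used to test the pgmap->altmap
pointer now keys off altmap_valid instead; the pattern (shown here as a
hypothetical helper, not something this patch adds) is:

        static struct vmem_altmap *pgmap_altmap(struct dev_pagemap *pgmap)
        {
                return pgmap->altmap_valid ? &pgmap->altmap : NULL;
        }

which is exactly what the arch_remove_memory() call below open-codes.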

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/memremap.h |  5 ++--
 kernel/memremap.c        | 66 ++++++++++++++++++------------------------------
 mm/hmm.c                 |  2 +-
 3 files changed, 29 insertions(+), 44 deletions(-)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 3fddcfe57bb0..1cb5f39d25c1 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -113,8 +113,9 @@ typedef void (*dev_page_free_t)(struct page *page, void *data);
 struct dev_pagemap {
 	dev_page_fault_t page_fault;
 	dev_page_free_t page_free;
-	struct vmem_altmap *altmap;
-	const struct resource *res;
+	struct vmem_altmap altmap;
+	bool altmap_valid;
+	struct resource res;
 	struct percpu_ref *ref;
 	struct device *dev;
 	void *data;
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 901404094df1..97782215bbd4 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -188,13 +188,6 @@ static RADIX_TREE(pgmap_radix, GFP_KERNEL);
 #define SECTION_MASK ~((1UL << PA_SECTION_SHIFT) - 1)
 #define SECTION_SIZE (1UL << PA_SECTION_SHIFT)
 
-struct page_map {
-	struct resource res;
-	struct percpu_ref *ref;
-	struct dev_pagemap pgmap;
-	struct vmem_altmap altmap;
-};
-
 static unsigned long order_at(struct resource *res, unsigned long pgoff)
 {
 	unsigned long phys_pgoff = PHYS_PFN(res->start) + pgoff;
@@ -260,22 +253,21 @@ static void pgmap_radix_release(struct resource *res)
 	synchronize_rcu();
 }
 
-static unsigned long pfn_first(struct page_map *page_map)
+static unsigned long pfn_first(struct dev_pagemap *pgmap)
 {
-	struct dev_pagemap *pgmap = &page_map->pgmap;
-	const struct resource *res = &page_map->res;
-	struct vmem_altmap *altmap = pgmap->altmap;
+	const struct resource *res = &pgmap->res;
+	struct vmem_altmap *altmap = &pgmap->altmap;
 	unsigned long pfn;
 
 	pfn = res->start >> PAGE_SHIFT;
-	if (altmap)
+	if (pgmap->altmap_valid)
 		pfn += vmem_altmap_offset(altmap);
 	return pfn;
 }
 
-static unsigned long pfn_end(struct page_map *page_map)
+static unsigned long pfn_end(struct dev_pagemap *pgmap)
 {
-	const struct resource *res = &page_map->res;
+	const struct resource *res = &pgmap->res;
 
 	return (res->start + resource_size(res)) >> PAGE_SHIFT;
 }
@@ -285,13 +277,12 @@ static unsigned long pfn_end(struct page_map *page_map)
 
 static void devm_memremap_pages_release(struct device *dev, void *data)
 {
-	struct page_map *page_map = data;
-	struct resource *res = &page_map->res;
+	struct dev_pagemap *pgmap = data;
+	struct resource *res = &pgmap->res;
 	resource_size_t align_start, align_size;
-	struct dev_pagemap *pgmap = &page_map->pgmap;
 	unsigned long pfn;
 
-	for_each_device_pfn(pfn, page_map)
+	for_each_device_pfn(pfn, pgmap)
 		put_page(pfn_to_page(pfn));
 
 	if (percpu_ref_tryget_live(pgmap->ref)) {
@@ -304,24 +295,22 @@ static void devm_memremap_pages_release(struct device *dev, void *data)
 	align_size = ALIGN(resource_size(res), SECTION_SIZE);
 
 	mem_hotplug_begin();
-	arch_remove_memory(align_start, align_size, pgmap->altmap);
+	arch_remove_memory(align_start, align_size, pgmap->altmap_valid ?
+			&pgmap->altmap : NULL);
 	mem_hotplug_done();
 
 	untrack_pfn(NULL, PHYS_PFN(align_start), align_size);
 	pgmap_radix_release(res);
-	dev_WARN_ONCE(dev, pgmap->altmap && pgmap->altmap->alloc,
-			"%s: failed to free all reserved pages\n", __func__);
+	dev_WARN_ONCE(dev, pgmap->altmap.alloc,
+		      "%s: failed to free all reserved pages\n", __func__);
 }
 
 /* assumes rcu_read_lock() held at entry */
 static struct dev_pagemap *find_dev_pagemap(resource_size_t phys)
 {
-	struct page_map *page_map;
-
 	WARN_ON_ONCE(!rcu_read_lock_held());
 
-	page_map = radix_tree_lookup(&pgmap_radix, PHYS_PFN(phys));
-	return page_map ? &page_map->pgmap : NULL;
+	return radix_tree_lookup(&pgmap_radix, PHYS_PFN(phys));
 }
 
 /**
@@ -349,7 +338,6 @@ void *devm_memremap_pages(struct device *dev, struct resource *res,
 	unsigned long pfn, pgoff, order;
 	pgprot_t pgprot = PAGE_KERNEL;
 	struct dev_pagemap *pgmap;
-	struct page_map *page_map;
 	int error, nid, is_ram, i = 0;
 
 	align_start = res->start & ~(SECTION_SIZE - 1);
@@ -370,21 +358,19 @@ void *devm_memremap_pages(struct device *dev, struct resource *res,
 	if (!ref)
 		return ERR_PTR(-EINVAL);
 
-	page_map = devres_alloc_node(devm_memremap_pages_release,
-			sizeof(*page_map), GFP_KERNEL, dev_to_node(dev));
-	if (!page_map)
+	pgmap = devres_alloc_node(devm_memremap_pages_release,
+			sizeof(*pgmap), GFP_KERNEL, dev_to_node(dev));
+	if (!pgmap)
 		return ERR_PTR(-ENOMEM);
-	pgmap = &page_map->pgmap;
 
-	memcpy(&page_map->res, res, sizeof(*res));
+	memcpy(&pgmap->res, res, sizeof(*res));
 
 	pgmap->dev = dev;
 	if (altmap) {
-		memcpy(&page_map->altmap, altmap, sizeof(*altmap));
-		pgmap->altmap = &page_map->altmap;
+		memcpy(&pgmap->altmap, altmap, sizeof(*altmap));
+		pgmap->altmap_valid = true;
 	}
 	pgmap->ref = ref;
-	pgmap->res = &page_map->res;
 	pgmap->type = MEMORY_DEVICE_HOST;
 	pgmap->page_fault = NULL;
 	pgmap->page_free = NULL;
@@ -396,7 +382,7 @@ void *devm_memremap_pages(struct device *dev, struct resource *res,
 
 	foreach_order_pgoff(res, order, pgoff) {
 		error = __radix_tree_insert(&pgmap_radix,
-				PHYS_PFN(res->start) + pgoff, order, page_map);
+				PHYS_PFN(res->start) + pgoff, order, pgmap);
 		if (error) {
 			dev_err(dev, "%s: failed: %d\n", __func__, error);
 			break;
@@ -425,7 +411,7 @@ void *devm_memremap_pages(struct device *dev, struct resource *res,
 	if (error)
 		goto err_add_memory;
 
-	for_each_device_pfn(pfn, page_map) {
+	for_each_device_pfn(pfn, pgmap) {
 		struct page *page = pfn_to_page(pfn);
 
 		/*
@@ -440,7 +426,7 @@ void *devm_memremap_pages(struct device *dev, struct resource *res,
 		if (!(++i % 1024))
 			cond_resched();
 	}
-	devres_add(dev, page_map);
+	devres_add(dev, pgmap);
 	return __va(res->start);
 
  err_add_memory:
@@ -448,7 +434,7 @@ void *devm_memremap_pages(struct device *dev, struct resource *res,
  err_pfn_remap:
  err_radix:
 	pgmap_radix_release(res);
-	devres_free(page_map);
+	devres_free(pgmap);
 	return ERR_PTR(error);
 }
 EXPORT_SYMBOL(devm_memremap_pages);
@@ -481,9 +467,7 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
 	 * In the cached case we're already holding a live reference.
 	 */
 	if (pgmap) {
-		const struct resource *res = pgmap ? pgmap->res : NULL;
-
-		if (res && phys >= res->start && phys <= res->end)
+		if (phys >= pgmap->res.start && phys <= pgmap->res.end)
 			return pgmap;
 		put_dev_pagemap(pgmap);
 	}
diff --git a/mm/hmm.c b/mm/hmm.c
index 5d0f488e66bc..ee75b2923dde 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -882,7 +882,7 @@ static int hmm_devmem_pages_create(struct hmm_devmem *devmem)
 	else
 		devmem->pagemap.type = MEMORY_DEVICE_PRIVATE;
 
-	devmem->pagemap.res = devmem->resource;
+	devmem->pagemap.res = *devmem->resource;
 	devmem->pagemap.page_fault = hmm_devmem_fault;
 	devmem->pagemap.page_free = hmm_devmem_free;
 	devmem->pagemap.dev = devmem->device;
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 16/17] memremap: change devm_memremap_pages interface to use struct dev_pagemap
  2017-12-15 14:09 revamp vmem_altmap / dev_pagemap handling V2 Christoph Hellwig
                   ` (14 preceding siblings ...)
  2017-12-15 14:09 ` [PATCH 15/17] memremap: drop private struct page_map Christoph Hellwig
@ 2017-12-15 14:09 ` Christoph Hellwig
  2017-12-17 18:51   ` Dan Williams
  2017-12-15 14:09 ` [PATCH 17/17] memremap: merge find_dev_pagemap into get_dev_pagemap Christoph Hellwig
  2017-12-19 20:36 ` revamp vmem_altmap / dev_pagemap handling V2 Dan Williams
  17 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2017-12-15 14:09 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, x86, linux-mm, linux-kernel

From: Logan Gunthorpe <logang@deltatee.com>

This new interface is similar to how struct device (and many others)
work. The caller initializes a 'struct dev_pagemap' as required
and calls 'devm_memremap_pages'. This allows the pagemap structure to
be embedded in another structure and thus container_of can be used. In
this way application-specific members can be stored in a containing
struct.

This will be used by the P2P infrastructure and HMM could probably
be cleaned up to use it as well (instead of having its own, similar
'hmm_devmem_pages_create' function).
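
As a rough, hypothetical driver-side sketch of the new convention (the
foo_* names are made up; only the dev_pagemap members are real):

        struct foo_dev {
                struct percpu_ref       ref;
                struct dev_pagemap      pgmap;
        };

        static void *foo_map_pages(struct device *dev, struct foo_dev *foo,
                        struct resource *res)
        {
                /* foo->ref is assumed to be percpu_ref_init()ed already */
                memcpy(&foo->pgmap.res, res, sizeof(*res));
                foo->pgmap.ref = &foo->ref;
                foo->pgmap.type = MEMORY_DEVICE_HOST;
                foo->pgmap.altmap_valid = false;
                return devm_memremap_pages(dev, &foo->pgmap);
        }

Callbacks that later get hold of the struct dev_pagemap (e.g. via
page->pgmap) can use container_of(pgmap, struct foo_dev, pgmap) to reach
the containing structure.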

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/dax/pmem.c                | 20 +++++++++-------
 drivers/nvdimm/nd.h               |  9 ++++---
 drivers/nvdimm/pfn_devs.c         | 25 ++++++++++----------
 drivers/nvdimm/pmem.c             | 37 ++++++++++++++++-------------
 drivers/nvdimm/pmem.h             |  1 +
 include/linux/memremap.h          |  6 ++---
 kernel/memremap.c                 | 50 ++++++++++++++++-----------------------
 tools/testing/nvdimm/test/iomap.c |  7 +++---
 8 files changed, 75 insertions(+), 80 deletions(-)

diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
index 8d8c852ba8f2..31b6ecce4c64 100644
--- a/drivers/dax/pmem.c
+++ b/drivers/dax/pmem.c
@@ -21,6 +21,7 @@
 struct dax_pmem {
 	struct device *dev;
 	struct percpu_ref ref;
+	struct dev_pagemap pgmap;
 	struct completion cmp;
 };
 
@@ -69,20 +70,23 @@ static int dax_pmem_probe(struct device *dev)
 	struct nd_namespace_common *ndns;
 	struct nd_dax *nd_dax = to_nd_dax(dev);
 	struct nd_pfn *nd_pfn = &nd_dax->nd_pfn;
-	struct vmem_altmap __altmap, *altmap = NULL;
 
 	ndns = nvdimm_namespace_common_probe(dev);
 	if (IS_ERR(ndns))
 		return PTR_ERR(ndns);
 	nsio = to_nd_namespace_io(&ndns->dev);
 
+	dax_pmem = devm_kzalloc(dev, sizeof(*dax_pmem), GFP_KERNEL);
+	if (!dax_pmem)
+		return -ENOMEM;
+
 	/* parse the 'pfn' info block via ->rw_bytes */
 	rc = devm_nsio_enable(dev, nsio);
 	if (rc)
 		return rc;
-	altmap = nvdimm_setup_pfn(nd_pfn, &res, &__altmap);
-	if (IS_ERR(altmap))
-		return PTR_ERR(altmap);
+	rc = nvdimm_setup_pfn(nd_pfn, &dax_pmem->pgmap);
+	if (rc)
+		return rc;
 	devm_nsio_disable(dev, nsio);
 
 	pfn_sb = nd_pfn->pfn_sb;
@@ -94,10 +98,6 @@ static int dax_pmem_probe(struct device *dev)
 		return -EBUSY;
 	}
 
-	dax_pmem = devm_kzalloc(dev, sizeof(*dax_pmem), GFP_KERNEL);
-	if (!dax_pmem)
-		return -ENOMEM;
-
 	dax_pmem->dev = dev;
 	init_completion(&dax_pmem->cmp);
 	rc = percpu_ref_init(&dax_pmem->ref, dax_pmem_percpu_release, 0,
@@ -110,7 +110,8 @@ static int dax_pmem_probe(struct device *dev)
 	if (rc)
 		return rc;
 
-	addr = devm_memremap_pages(dev, &res, &dax_pmem->ref, altmap);
+	dax_pmem->pgmap.ref = &dax_pmem->ref;
+	addr = devm_memremap_pages(dev, &dax_pmem->pgmap);
 	if (IS_ERR(addr))
 		return PTR_ERR(addr);
 
@@ -120,6 +121,7 @@ static int dax_pmem_probe(struct device *dev)
 		return rc;
 
 	/* adjust the dax_region resource to the start of data */
+	memcpy(&res, &dax_pmem->pgmap.res, sizeof(res));
 	res.start += le64_to_cpu(pfn_sb->dataoff);
 
 	rc = sscanf(dev_name(&ndns->dev), "namespace%d.%d", &region_id, &id);
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index e958f3724c41..8d6375ee0fda 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -368,15 +368,14 @@ unsigned int pmem_sector_size(struct nd_namespace_common *ndns);
 void nvdimm_badblocks_populate(struct nd_region *nd_region,
 		struct badblocks *bb, const struct resource *res);
 #if IS_ENABLED(CONFIG_ND_CLAIM)
-struct vmem_altmap *nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
-		struct resource *res, struct vmem_altmap *altmap);
+int nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap);
 int devm_nsio_enable(struct device *dev, struct nd_namespace_io *nsio);
 void devm_nsio_disable(struct device *dev, struct nd_namespace_io *nsio);
 #else
-static inline struct vmem_altmap *nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
-		struct resource *res, struct vmem_altmap *altmap)
+static inline int nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
+				   struct dev_pagemap *pgmap)
 {
-	return ERR_PTR(-ENXIO);
+	return -ENXIO;
 }
 static inline int devm_nsio_enable(struct device *dev,
 		struct nd_namespace_io *nsio)
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 65cc171c721d..6f58615ddb85 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -541,9 +541,10 @@ static unsigned long init_altmap_reserve(resource_size_t base)
 	return reserve;
 }
 
-static struct vmem_altmap *__nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
-		struct resource *res, struct vmem_altmap *altmap)
+static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap)
 {
+	struct resource *res = &pgmap->res;
+	struct vmem_altmap *altmap = &pgmap->altmap;
 	struct nd_pfn_sb *pfn_sb = nd_pfn->pfn_sb;
 	u64 offset = le64_to_cpu(pfn_sb->dataoff);
 	u32 start_pad = __le32_to_cpu(pfn_sb->start_pad);
@@ -562,9 +563,9 @@ static struct vmem_altmap *__nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
 
 	if (nd_pfn->mode == PFN_MODE_RAM) {
 		if (offset < SZ_8K)
-			return ERR_PTR(-EINVAL);
+			return -EINVAL;
 		nd_pfn->npfns = le64_to_cpu(pfn_sb->npfns);
-		altmap = NULL;
+		pgmap->altmap_valid = false;
 	} else if (nd_pfn->mode == PFN_MODE_PMEM) {
 		nd_pfn->npfns = PFN_SECTION_ALIGN_UP((resource_size(res)
 					- offset) / PAGE_SIZE);
@@ -576,10 +577,11 @@ static struct vmem_altmap *__nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
 		memcpy(altmap, &__altmap, sizeof(*altmap));
 		altmap->free = PHYS_PFN(offset - SZ_8K);
 		altmap->alloc = 0;
+		pgmap->altmap_valid = true;
 	} else
-		return ERR_PTR(-ENXIO);
+		return -ENXIO;
 
-	return altmap;
+	return 0;
 }
 
 static int nd_pfn_init(struct nd_pfn *nd_pfn)
@@ -698,19 +700,18 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
  * Determine the effective resource range and vmem_altmap from an nd_pfn
  * instance.
  */
-struct vmem_altmap *nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
-		struct resource *res, struct vmem_altmap *altmap)
+int nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap)
 {
 	int rc;
 
 	if (!nd_pfn->uuid || !nd_pfn->ndns)
-		return ERR_PTR(-ENODEV);
+		return -ENODEV;
 
 	rc = nd_pfn_init(nd_pfn);
 	if (rc)
-		return ERR_PTR(rc);
+		return rc;
 
-	/* we need a valid pfn_sb before we can init a vmem_altmap */
-	return __nvdimm_setup_pfn(nd_pfn, res, altmap);
+	/* we need a valid pfn_sb before we can init a dev_pagemap */
+	return __nvdimm_setup_pfn(nd_pfn, pgmap);
 }
 EXPORT_SYMBOL_GPL(nvdimm_setup_pfn);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 7fbc5c5dc8e1..cf074b1ce219 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -298,34 +298,34 @@ static int pmem_attach_disk(struct device *dev,
 {
 	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
 	struct nd_region *nd_region = to_nd_region(dev->parent);
-	struct vmem_altmap __altmap, *altmap = NULL;
 	int nid = dev_to_node(dev), fua, wbc;
 	struct resource *res = &nsio->res;
+	struct resource bb_res;
 	struct nd_pfn *nd_pfn = NULL;
 	struct dax_device *dax_dev;
 	struct nd_pfn_sb *pfn_sb;
 	struct pmem_device *pmem;
-	struct resource pfn_res;
 	struct request_queue *q;
 	struct device *gendev;
 	struct gendisk *disk;
 	void *addr;
+	int rc;
+
+	pmem = devm_kzalloc(dev, sizeof(*pmem), GFP_KERNEL);
+	if (!pmem)
+		return -ENOMEM;
 
 	/* while nsio_rw_bytes is active, parse a pfn info block if present */
 	if (is_nd_pfn(dev)) {
 		nd_pfn = to_nd_pfn(dev);
-		altmap = nvdimm_setup_pfn(nd_pfn, &pfn_res, &__altmap);
-		if (IS_ERR(altmap))
-			return PTR_ERR(altmap);
+		rc = nvdimm_setup_pfn(nd_pfn, &pmem->pgmap);
+		if (rc)
+			return rc;
 	}
 
 	/* we're attaching a block device, disable raw namespace access */
 	devm_nsio_disable(dev, nsio);
 
-	pmem = devm_kzalloc(dev, sizeof(*pmem), GFP_KERNEL);
-	if (!pmem)
-		return -ENOMEM;
-
 	dev_set_drvdata(dev, pmem);
 	pmem->phys_addr = res->start;
 	pmem->size = resource_size(res);
@@ -350,19 +350,22 @@ static int pmem_attach_disk(struct device *dev,
 		return -ENOMEM;
 
 	pmem->pfn_flags = PFN_DEV;
+	pmem->pgmap.ref = &q->q_usage_counter;
 	if (is_nd_pfn(dev)) {
-		addr = devm_memremap_pages(dev, &pfn_res, &q->q_usage_counter,
-				altmap);
+		addr = devm_memremap_pages(dev, &pmem->pgmap);
 		pfn_sb = nd_pfn->pfn_sb;
 		pmem->data_offset = le64_to_cpu(pfn_sb->dataoff);
-		pmem->pfn_pad = resource_size(res) - resource_size(&pfn_res);
+		pmem->pfn_pad = resource_size(res) -
+			resource_size(&pmem->pgmap.res);
 		pmem->pfn_flags |= PFN_MAP;
-		res = &pfn_res; /* for badblocks populate */
-		res->start += pmem->data_offset;
+		memcpy(&bb_res, &pmem->pgmap.res, sizeof(bb_res));
+		bb_res.start += pmem->data_offset;
 	} else if (pmem_should_map_pages(dev)) {
-		addr = devm_memremap_pages(dev, &nsio->res,
-				&q->q_usage_counter, NULL);
+		memcpy(&pmem->pgmap.res, &nsio->res, sizeof(pmem->pgmap.res));
+		pmem->pgmap.altmap_valid = false;
+		addr = devm_memremap_pages(dev, &pmem->pgmap);
 		pmem->pfn_flags |= PFN_MAP;
+		memcpy(&bb_res, &pmem->pgmap.res, sizeof(bb_res));
 	} else
 		addr = devm_memremap(dev, pmem->phys_addr,
 				pmem->size, ARCH_MEMREMAP_PMEM);
@@ -401,7 +404,7 @@ static int pmem_attach_disk(struct device *dev,
 			/ 512);
 	if (devm_init_badblocks(dev, &pmem->bb))
 		return -ENOMEM;
-	nvdimm_badblocks_populate(nd_region, &pmem->bb, res);
+	nvdimm_badblocks_populate(nd_region, &pmem->bb, &bb_res);
 	disk->bb = &pmem->bb;
 
 	dax_dev = alloc_dax(pmem, disk->disk_name, &pmem_dax_ops);
diff --git a/drivers/nvdimm/pmem.h b/drivers/nvdimm/pmem.h
index 6a3cd2a10db6..a64ebc78b5df 100644
--- a/drivers/nvdimm/pmem.h
+++ b/drivers/nvdimm/pmem.h
@@ -22,6 +22,7 @@ struct pmem_device {
 	struct badblocks	bb;
 	struct dax_device	*dax_dev;
 	struct gendisk		*disk;
+	struct dev_pagemap	pgmap;
 };
 
 long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 1cb5f39d25c1..7b4899c06f49 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -123,8 +123,7 @@ struct dev_pagemap {
 };
 
 #ifdef CONFIG_ZONE_DEVICE
-void *devm_memremap_pages(struct device *dev, struct resource *res,
-		struct percpu_ref *ref, struct vmem_altmap *altmap);
+void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap);
 struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
 		struct dev_pagemap *pgmap);
 
@@ -134,8 +133,7 @@ void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns);
 static inline bool is_zone_device_page(const struct page *page);
 #else
 static inline void *devm_memremap_pages(struct device *dev,
-		struct resource *res, struct percpu_ref *ref,
-		struct vmem_altmap *altmap)
+		struct dev_pagemap *pgmap)
 {
 	/*
 	 * Fail attempts to call devm_memremap_pages() without
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 97782215bbd4..fd0e7c44e6bd 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -275,9 +275,10 @@ static unsigned long pfn_end(struct dev_pagemap *pgmap)
 #define for_each_device_pfn(pfn, map) \
 	for (pfn = pfn_first(map); pfn < pfn_end(map); pfn++)
 
-static void devm_memremap_pages_release(struct device *dev, void *data)
+static void devm_memremap_pages_release(void *data)
 {
 	struct dev_pagemap *pgmap = data;
+	struct device *dev = pgmap->dev;
 	struct resource *res = &pgmap->res;
 	resource_size_t align_start, align_size;
 	unsigned long pfn;
@@ -316,29 +317,34 @@ static struct dev_pagemap *find_dev_pagemap(resource_size_t phys)
 /**
  * devm_memremap_pages - remap and provide memmap backing for the given resource
  * @dev: hosting device for @res
- * @res: "host memory" address range
- * @ref: a live per-cpu reference count
- * @altmap: optional descriptor for allocating the memmap from @res
+ * @pgmap: pointer to a struct dev_pagemap
  *
  * Notes:
- * 1/ @ref must be 'live' on entry and 'dead' before devm_memunmap_pages() time
- *    (or devm release event). The expected order of events is that @ref has
+ * 1/ At a minimum the res, ref and type members of @pgmap must be initialized
+ *    by the caller before passing it to this function
+ *
+ * 2/ The altmap field may optionally be initialized, in which case altmap_valid
+ *    must be set to true
+ *
+ * 3/ pgmap.ref must be 'live' on entry and 'dead' before devm_memunmap_pages()
+ *    time (or devm release event). The expected order of events is that ref has
  *    been through percpu_ref_kill() before devm_memremap_pages_release(). The
  *    wait for the completion of all references being dropped and
  *    percpu_ref_exit() must occur after devm_memremap_pages_release().
  *
- * 2/ @res is expected to be a host memory range that could feasibly be
+ * 4/ res is expected to be a host memory range that could feasibly be
  *    treated as a "System RAM" range, i.e. not a device mmio range, but
  *    this is not enforced.
  */
-void *devm_memremap_pages(struct device *dev, struct resource *res,
-		struct percpu_ref *ref, struct vmem_altmap *altmap)
+void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 {
 	resource_size_t align_start, align_size, align_end;
+	struct vmem_altmap *altmap = pgmap->altmap_valid ?
+			&pgmap->altmap : NULL;
 	unsigned long pfn, pgoff, order;
 	pgprot_t pgprot = PAGE_KERNEL;
-	struct dev_pagemap *pgmap;
 	int error, nid, is_ram, i = 0;
+	struct resource *res = &pgmap->res;
 
 	align_start = res->start & ~(SECTION_SIZE - 1);
 	align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
@@ -355,26 +361,10 @@ void *devm_memremap_pages(struct device *dev, struct resource *res,
 	if (is_ram == REGION_INTERSECTS)
 		return __va(res->start);
 
-	if (!ref)
+	if (!pgmap->ref)
 		return ERR_PTR(-EINVAL);
 
-	pgmap = devres_alloc_node(devm_memremap_pages_release,
-			sizeof(*pgmap), GFP_KERNEL, dev_to_node(dev));
-	if (!pgmap)
-		return ERR_PTR(-ENOMEM);
-
-	memcpy(&pgmap->res, res, sizeof(*res));
-
 	pgmap->dev = dev;
-	if (altmap) {
-		memcpy(&pgmap->altmap, altmap, sizeof(*altmap));
-		pgmap->altmap_valid = true;
-	}
-	pgmap->ref = ref;
-	pgmap->type = MEMORY_DEVICE_HOST;
-	pgmap->page_fault = NULL;
-	pgmap->page_free = NULL;
-	pgmap->data = NULL;
 
 	mutex_lock(&pgmap_lock);
 	error = 0;
@@ -422,11 +412,13 @@ void *devm_memremap_pages(struct device *dev, struct resource *res,
 		 */
 		list_del(&page->lru);
 		page->pgmap = pgmap;
-		percpu_ref_get(ref);
+		percpu_ref_get(pgmap->ref);
 		if (!(++i % 1024))
 			cond_resched();
 	}
-	devres_add(dev, pgmap);
+
+	devm_add_action(dev, devm_memremap_pages_release, pgmap);
+
 	return __va(res->start);
 
  err_add_memory:
diff --git a/tools/testing/nvdimm/test/iomap.c b/tools/testing/nvdimm/test/iomap.c
index e1f75a1914a1..ff9d3a5825e1 100644
--- a/tools/testing/nvdimm/test/iomap.c
+++ b/tools/testing/nvdimm/test/iomap.c
@@ -104,15 +104,14 @@ void *__wrap_devm_memremap(struct device *dev, resource_size_t offset,
 }
 EXPORT_SYMBOL(__wrap_devm_memremap);
 
-void *__wrap_devm_memremap_pages(struct device *dev, struct resource *res,
-		struct percpu_ref *ref, struct vmem_altmap *altmap)
+void *__wrap_devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 {
-	resource_size_t offset = res->start;
+	resource_size_t offset = pgmap->res.start;
 	struct nfit_test_resource *nfit_res = get_nfit_res(offset);
 
 	if (nfit_res)
 		return nfit_res->buf + offset - nfit_res->res.start;
-	return devm_memremap_pages(dev, res, ref, altmap);
+	return devm_memremap_pages(dev, pgmap);
 }
 EXPORT_SYMBOL(__wrap_devm_memremap_pages);
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 17/17] memremap: merge find_dev_pagemap into get_dev_pagemap
  2017-12-15 14:09 revamp vmem_altmap / dev_pagemap handling V2 Christoph Hellwig
                   ` (15 preceding siblings ...)
  2017-12-15 14:09 ` [PATCH 16/17] memremap: change devm_memremap_pages interface to use struct dev_pagemap Christoph Hellwig
@ 2017-12-15 14:09 ` Christoph Hellwig
  2017-12-17 18:53   ` Dan Williams
  2017-12-19 20:36 ` revamp vmem_altmap / dev_pagemap handling V2 Dan Williams
  17 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2017-12-15 14:09 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, x86, linux-mm, linux-kernel

There is only one caller of the trivial function find_dev_pagemap left,
so just merge it into the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/memremap.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index fd0e7c44e6bd..c04000361664 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -306,14 +306,6 @@ static void devm_memremap_pages_release(void *data)
 		      "%s: failed to free all reserved pages\n", __func__);
 }
 
-/* assumes rcu_read_lock() held at entry */
-static struct dev_pagemap *find_dev_pagemap(resource_size_t phys)
-{
-	WARN_ON_ONCE(!rcu_read_lock_held());
-
-	return radix_tree_lookup(&pgmap_radix, PHYS_PFN(phys));
-}
-
 /**
  * devm_memremap_pages - remap and provide memmap backing for the given resource
  * @dev: hosting device for @res
@@ -466,7 +458,7 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
 
 	/* fall back to slow path lookup */
 	rcu_read_lock();
-	pgmap = find_dev_pagemap(phys);
+	pgmap = radix_tree_lookup(&pgmap_radix, PHYS_PFN(phys));
 	if (pgmap && !percpu_ref_tryget_live(pgmap->ref))
 		pgmap = NULL;
 	rcu_read_unlock();
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH 01/17] memremap: provide stubs for vmem_altmap_offset and vmem_altmap_free
  2017-12-15 14:09 ` [PATCH 01/17] memremap: provide stubs for vmem_altmap_offset and vmem_altmap_free Christoph Hellwig
@ 2017-12-16  1:41   ` Dan Williams
  0 siblings, 0 replies; 43+ messages in thread
From: Dan Williams @ 2017-12-16  1:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, X86 ML, Linux MM, linux-kernel

On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig <hch@lst.de> wrote:
> Currently all calls to those functions are eliminated by the compiler when
> CONFIG_ZONE_DEVICE is not set, but this soon won't be the case.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good,

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 02/17] mm: don't export arch_add_memory
  2017-12-15 14:09 ` [PATCH 02/17] mm: don't export arch_add_memory Christoph Hellwig
@ 2017-12-16  1:41   ` Dan Williams
  0 siblings, 0 replies; 43+ messages in thread
From: Dan Williams @ 2017-12-16  1:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, X86 ML, Linux MM, linux-kernel

On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig <hch@lst.de> wrote:
> Only x86_64 and sh export this symbol, and it is not used by any
> modular code.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good,

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 03/17] mm: don't export __add_pages
  2017-12-15 14:09 ` [PATCH 03/17] mm: don't export __add_pages Christoph Hellwig
@ 2017-12-16  1:42   ` Dan Williams
  0 siblings, 0 replies; 43+ messages in thread
From: Dan Williams @ 2017-12-16  1:42 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, X86 ML, Linux MM, linux-kernel

On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig <hch@lst.de> wrote:
> This function isn't used by any modules, and is only to be called
> from core MM code.  This includes the calls for the add_pages wrapper
> that might be inlined.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good,

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 04/17] mm: pass the vmem_altmap to arch_add_memory and __add_pages
  2017-12-15 14:09 ` [PATCH 04/17] mm: pass the vmem_altmap to arch_add_memory and __add_pages Christoph Hellwig
@ 2017-12-16  1:48   ` Dan Williams
  2017-12-17 17:22     ` Dan Williams
  2017-12-23  1:49   ` Dan Williams
  2017-12-23  1:54   ` Dan Williams
  2 siblings, 1 reply; 43+ messages in thread
From: Dan Williams @ 2017-12-16  1:48 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, X86 ML, Linux MM, linux-kernel

On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig <hch@lst.de> wrote:
> We can just pass this on instead of having to do a radix tree lookup
> without proper locking 2 levels into the callchain.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Yeah, the lookup of vmem_altmap is too magical and surprising; this is better.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 05/17] mm: pass the vmem_altmap to vmemmap_populate
  2017-12-15 14:09 ` [PATCH 05/17] mm: pass the vmem_altmap to vmemmap_populate Christoph Hellwig
@ 2017-12-16  2:03   ` Dan Williams
  0 siblings, 0 replies; 43+ messages in thread
From: Dan Williams @ 2017-12-16  2:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, X86 ML, Linux MM, linux-kernel, Michal Hocko

[ cc Michal ]

On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig <hch@lst.de> wrote:
> We can just pass this on instead of having to do a radix tree lookup
> without proper locking a few levels into the callchain.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

I know Michal has concerns about the complexity of the memory hotplug
implementation, but I think this just means I need to go write up
better kerneldoc for the vmem_altmap definition so that memory hotplug
developers know what's happening.

Other than that:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

Including the patch for Michal just in case he doesn't have it in his archives.
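
Something along these lines is what I have in mind, as a rough draft
only (the field descriptions would need to be double checked against
the actual allocator semantics):

	/**
	 * struct vmem_altmap - pre-allocated storage for vmemmap_populate
	 * @base_pfn: base of the entire dev_pagemap mapping
	 * @reserve: pages mapped, but reserved for driver use (relative to @base_pfn)
	 * @free: free pages set aside in the mapping for memmap storage
	 * @align: pages reserved to meet allocation alignments
	 * @alloc: track pages consumed, private to vmemmap_populate()
	 */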

> ---
>  arch/arm64/mm/mmu.c            |  6 ++++--
>  arch/ia64/mm/discontig.c       |  3 ++-
>  arch/powerpc/mm/init_64.c      |  7 ++-----
>  arch/s390/mm/vmem.c            |  3 ++-
>  arch/sparc/mm/init_64.c        |  2 +-
>  arch/x86/mm/init_64.c          |  4 ++--
>  include/linux/memory_hotplug.h |  3 ++-
>  include/linux/mm.h             |  6 ++++--
>  mm/memory_hotplug.c            |  7 ++++---
>  mm/sparse-vmemmap.c            |  7 ++++---
>  mm/sparse.c                    | 20 ++++++++++++--------
>  11 files changed, 39 insertions(+), 29 deletions(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 267d2b79d52d..ec8952ff13be 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -654,12 +654,14 @@ int kern_addr_valid(unsigned long addr)
>  }
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>  #if !ARM64_SWAPPER_USES_SECTION_MAPS
> -int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
> +int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
> +               struct vmem_altmap *altmap)
>  {
>         return vmemmap_populate_basepages(start, end, node);
>  }
>  #else  /* !ARM64_SWAPPER_USES_SECTION_MAPS */
> -int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
> +int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
> +               struct vmem_altmap *altmap)
>  {
>         unsigned long addr = start;
>         unsigned long next;
> diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
> index 9b2d994cddf6..1555aecaaf85 100644
> --- a/arch/ia64/mm/discontig.c
> +++ b/arch/ia64/mm/discontig.c
> @@ -754,7 +754,8 @@ void arch_refresh_nodedata(int update_node, pg_data_t *update_pgdat)
>  #endif
>
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
> -int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
> +int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
> +               struct vmem_altmap *altmap)
>  {
>         return vmemmap_populate_basepages(start, end, node);
>  }
> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
> index a07722531b32..779b74a96b8f 100644
> --- a/arch/powerpc/mm/init_64.c
> +++ b/arch/powerpc/mm/init_64.c
> @@ -183,7 +183,8 @@ static __meminit void vmemmap_list_populate(unsigned long phys,
>         vmemmap_list = vmem_back;
>  }
>
> -int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
> +int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
> +               struct vmem_altmap *altmap)
>  {
>         unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;
>
> @@ -193,16 +194,12 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
>         pr_debug("vmemmap_populate %lx..%lx, node %d\n", start, end, node);
>
>         for (; start < end; start += page_size) {
> -               struct vmem_altmap *altmap;
>                 void *p;
>                 int rc;
>
>                 if (vmemmap_populated(start, page_size))
>                         continue;
>
> -               /* altmap lookups only work at section boundaries */
> -               altmap = to_vmem_altmap(SECTION_ALIGN_DOWN(start));
> -
>                 p =  __vmemmap_alloc_block_buf(page_size, node, altmap);
>                 if (!p)
>                         return -ENOMEM;
> diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
> index 3316d463fc29..c44ef0e7c466 100644
> --- a/arch/s390/mm/vmem.c
> +++ b/arch/s390/mm/vmem.c
> @@ -211,7 +211,8 @@ static void vmem_remove_range(unsigned long start, unsigned long size)
>  /*
>   * Add a backed mem_map array to the virtual mem_map array.
>   */
> -int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
> +int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
> +               struct vmem_altmap *altmap)
>  {
>         unsigned long pgt_prot, sgt_prot;
>         unsigned long address = start;
> diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
> index 55ba62957e64..42d27a1a042a 100644
> --- a/arch/sparc/mm/init_64.c
> +++ b/arch/sparc/mm/init_64.c
> @@ -2628,7 +2628,7 @@ EXPORT_SYMBOL(_PAGE_CACHE);
>
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>  int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend,
> -                              int node)
> +                              int node, struct vmem_altmap *altmap)
>  {
>         unsigned long pte_base;
>
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index e26ade50ae18..0c898098feaf 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1411,9 +1411,9 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start,
>         return 0;
>  }
>
> -int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
> +int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
> +               struct vmem_altmap *altmap)
>  {
> -       struct vmem_altmap *altmap = to_vmem_altmap(start);
>         int err;
>
>         if (boot_cpu_has(X86_FEATURE_PSE))
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index db276afbefcc..cbdd6d52e877 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -327,7 +327,8 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
>  extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
>  extern bool is_memblock_offlined(struct memory_block *mem);
>  extern void remove_memory(int nid, u64 start, u64 size);
> -extern int sparse_add_one_section(struct pglist_data *pgdat, unsigned long start_pfn);
> +extern int sparse_add_one_section(struct pglist_data *pgdat,
> +               unsigned long start_pfn, struct vmem_altmap *altmap);
>  extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
>                 unsigned long map_offset);
>  extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ea818ff739cd..2f3a7ebecbe2 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2538,7 +2538,8 @@ void sparse_mem_maps_populate_node(struct page **map_map,
>                                    unsigned long map_count,
>                                    int nodeid);
>
> -struct page *sparse_mem_map_populate(unsigned long pnum, int nid);
> +struct page *sparse_mem_map_populate(unsigned long pnum, int nid,
> +               struct vmem_altmap *altmap);
>  pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
>  p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node);
>  pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node);
> @@ -2556,7 +2557,8 @@ static inline void *vmemmap_alloc_block_buf(unsigned long size, int node)
>  void vmemmap_verify(pte_t *, int, unsigned long, unsigned long);
>  int vmemmap_populate_basepages(unsigned long start, unsigned long end,
>                                int node);
> -int vmemmap_populate(unsigned long start, unsigned long end, int node);
> +int vmemmap_populate(unsigned long start, unsigned long end, int node,
> +               struct vmem_altmap *altmap);
>  void vmemmap_populate_print_last(void);
>  #ifdef CONFIG_MEMORY_HOTPLUG
>  void vmemmap_free(unsigned long start, unsigned long end);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index fc0485dcece1..b36f1822c432 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -250,7 +250,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
>  #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
>
>  static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
> -               bool want_memblock)
> +               struct vmem_altmap *altmap, bool want_memblock)
>  {
>         int ret;
>         int i;
> @@ -258,7 +258,7 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
>         if (pfn_valid(phys_start_pfn))
>                 return -EEXIST;
>
> -       ret = sparse_add_one_section(NODE_DATA(nid), phys_start_pfn);
> +       ret = sparse_add_one_section(NODE_DATA(nid), phys_start_pfn, altmap);
>         if (ret < 0)
>                 return ret;
>
> @@ -317,7 +317,8 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
>         }
>
>         for (i = start_sec; i <= end_sec; i++) {
> -               err = __add_section(nid, section_nr_to_pfn(i), want_memblock);
> +               err = __add_section(nid, section_nr_to_pfn(i), altmap,
> +                               want_memblock);
>
>                 /*
>                  * EEXIST is finally dealt with by ioresource collision
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 17acf01791fa..376dcf05a39c 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -278,7 +278,8 @@ int __meminit vmemmap_populate_basepages(unsigned long start,
>         return 0;
>  }
>
> -struct page * __meminit sparse_mem_map_populate(unsigned long pnum, int nid)
> +struct page * __meminit sparse_mem_map_populate(unsigned long pnum, int nid,
> +               struct vmem_altmap *altmap)
>  {
>         unsigned long start;
>         unsigned long end;
> @@ -288,7 +289,7 @@ struct page * __meminit sparse_mem_map_populate(unsigned long pnum, int nid)
>         start = (unsigned long)map;
>         end = (unsigned long)(map + PAGES_PER_SECTION);
>
> -       if (vmemmap_populate(start, end, nid))
> +       if (vmemmap_populate(start, end, nid, altmap))
>                 return NULL;
>
>         return map;
> @@ -318,7 +319,7 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
>                 if (!present_section_nr(pnum))
>                         continue;
>
> -               map_map[pnum] = sparse_mem_map_populate(pnum, nodeid);
> +               map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
>                 if (map_map[pnum])
>                         continue;
>                 ms = __nr_to_section(pnum);
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 7a5dacaa06e3..5f4a0dac7836 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -417,7 +417,8 @@ static void __init sparse_early_usemaps_alloc_node(void *data,
>  }
>
>  #ifndef CONFIG_SPARSEMEM_VMEMMAP
> -struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid)
> +struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid,
> +               struct vmem_altmap *altmap)
>  {
>         struct page *map;
>         unsigned long size;
> @@ -472,7 +473,7 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
>
>                 if (!present_section_nr(pnum))
>                         continue;
> -               map_map[pnum] = sparse_mem_map_populate(pnum, nodeid);
> +               map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
>                 if (map_map[pnum])
>                         continue;
>                 ms = __nr_to_section(pnum);
> @@ -500,7 +501,7 @@ static struct page __init *sparse_early_mem_map_alloc(unsigned long pnum)
>         struct mem_section *ms = __nr_to_section(pnum);
>         int nid = sparse_early_nid(ms);
>
> -       map = sparse_mem_map_populate(pnum, nid);
> +       map = sparse_mem_map_populate(pnum, nid, NULL);
>         if (map)
>                 return map;
>
> @@ -678,10 +679,11 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
>  #endif
>
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
> -static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid)
> +static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
> +               struct vmem_altmap *altmap)
>  {
>         /* This will make the necessary allocations eventually. */
> -       return sparse_mem_map_populate(pnum, nid);
> +       return sparse_mem_map_populate(pnum, nid, altmap);
>  }
>  static void __kfree_section_memmap(struct page *memmap)
>  {
> @@ -721,7 +723,8 @@ static struct page *__kmalloc_section_memmap(void)
>         return ret;
>  }
>
> -static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid)
> +static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
> +               struct vmem_altmap *altmap)
>  {
>         return __kmalloc_section_memmap();
>  }
> @@ -773,7 +776,8 @@ static void free_map_bootmem(struct page *memmap)
>   * set.  If this is <=0, then that means that the passed-in
>   * map was not consumed and must be freed.
>   */
> -int __meminit sparse_add_one_section(struct pglist_data *pgdat, unsigned long start_pfn)
> +int __meminit sparse_add_one_section(struct pglist_data *pgdat,
> +               unsigned long start_pfn, struct vmem_altmap *altmap)
>  {
>         unsigned long section_nr = pfn_to_section_nr(start_pfn);
>         struct mem_section *ms;
> @@ -789,7 +793,7 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat, unsigned long st
>         ret = sparse_index_init(section_nr, pgdat->node_id);
>         if (ret < 0 && ret != -EEXIST)
>                 return ret;
> -       memmap = kmalloc_section_memmap(section_nr, pgdat->node_id);
> +       memmap = kmalloc_section_memmap(section_nr, pgdat->node_id, altmap);
>         if (!memmap)
>                 return -ENOMEM;
>         usemap = __kmalloc_section_usemap();
> --
> 2.14.2
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 06/17] mm: pass the vmem_altmap to arch_remove_memory and __remove_pages
  2017-12-15 14:09 ` [PATCH 06/17] mm: pass the vmem_altmap to arch_remove_memory and __remove_pages Christoph Hellwig
@ 2017-12-16  2:04   ` Dan Williams
  2017-12-19 15:02     ` Christoph Hellwig
  0 siblings, 1 reply; 43+ messages in thread
From: Dan Williams @ 2017-12-16  2:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, X86 ML, Linux MM, linux-kernel

On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig <hch@lst.de> wrote:
> We can just pass this on instead of having to do a radix tree lookup
> without proper locking 2 levels into the callchain.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>wip

I assume that "wip" is a typo?

Otherwise,

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 07/17] mm: pass the vmem_altmap to vmemmap_free
  2017-12-15 14:09 ` [PATCH 07/17] mm: pass the vmem_altmap to vmemmap_free Christoph Hellwig
@ 2017-12-16  2:12   ` Dan Williams
  0 siblings, 0 replies; 43+ messages in thread
From: Dan Williams @ 2017-12-16  2:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, X86 ML, Linux MM, linux-kernel

On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig <hch@lst.de> wrote:
> We can just pass this on instead of having to do a radix tree lookup
> without proper locking a few levels into the callchain.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Now I remember why I went with the radix lookup, laziness!

This looks good to me, I appreciate you digging in.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 08/17] mm: pass the vmem_altmap to memmap_init_zone
  2017-12-15 14:09 ` [PATCH 08/17] mm: pass the vmem_altmap to memmap_init_zone Christoph Hellwig
@ 2017-12-16  2:15   ` Dan Williams
  0 siblings, 0 replies; 43+ messages in thread
From: Dan Williams @ 2017-12-16  2:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, X86 ML, Linux MM, linux-kernel

On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig <hch@lst.de> wrote:
> Pass the vmem_altmap two levels down instead of needing a lookup.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Given that HMM and now P2P are attracted to devm_memremap_pages(), I
think this churn is worth it. vmem_altmap deserves to be treated as a
first-class citizen of memory hotplug rather than as a hidden hack.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 09/17] mm: split altmap memory map allocation from normal case
  2017-12-15 14:09 ` [PATCH 09/17] mm: split altmap memory map allocation from normal case Christoph Hellwig
@ 2017-12-16  2:18   ` Dan Williams
  0 siblings, 0 replies; 43+ messages in thread
From: Dan Williams @ 2017-12-16  2:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, X86 ML, Linux MM, linux-kernel

On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig <hch@lst.de> wrote:
> No functional changes, just untangling the call chain.

I'd also mention that creating more helper functions in the altmap_
namespace helps document why altmap is passed all around the hotplug
code.

>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 10/17] mm: merge vmem_altmap_alloc into altmap_alloc_block_buf
  2017-12-15 14:09 ` [PATCH 10/17] mm: merge vmem_altmap_alloc into altmap_alloc_block_buf Christoph Hellwig
@ 2017-12-16  2:24   ` Dan Williams
  0 siblings, 0 replies; 43+ messages in thread
From: Dan Williams @ 2017-12-16  2:24 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, X86 ML, Linux MM, linux-kernel

On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig <hch@lst.de> wrote:
> There is no clear separation between the two, so merge them.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>

Looks good,

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 04/17] mm: pass the vmem_altmap to arch_add_memory and __add_pages
  2017-12-16  1:48   ` Dan Williams
@ 2017-12-17 17:22     ` Dan Williams
  0 siblings, 0 replies; 43+ messages in thread
From: Dan Williams @ 2017-12-17 17:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, X86 ML, Linux MM, linux-kernel

On Fri, Dec 15, 2017 at 5:48 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig <hch@lst.de> wrote:
>> We can just pass this on instead of having to do a radix tree lookup
>> without proper locking 2 levels into the callchain.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> Yeah, the lookup of vmem_altmap is too magical and surprising this is better.
>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>

I'll also note that the locking is not necessary in the memory map
init path because we can't possibly be racing mutations of the radix
as everyone who might touch the radix is serialized by the
mem_hotplug_begin() lock. It's only accesses outside of the
arch_{add,remove}_memory() that need the rcu lock. However, that is
another subtle/magic assumption of this code and it's better to pass
the altmap down through the call chain. I just don't want people
thinking that -stable needs to pick any of this up, because afaics the
locking is fine as is, and we can drop that mention from the
changelog.
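
To illustrate (sketch only, the helper name below is made up): the only
lookup that still needs RCU after this series is the slow path outside
of the hotplug-serialized sections, essentially what get_dev_pagemap()
ends up doing:

	static struct dev_pagemap *lookup_pagemap_slow(resource_size_t phys)
	{
		struct dev_pagemap *pgmap;

		rcu_read_lock();
		pgmap = radix_tree_lookup(&pgmap_radix, PHYS_PFN(phys));
		if (pgmap && !percpu_ref_tryget_live(pgmap->ref))
			pgmap = NULL;
		rcu_read_unlock();

		return pgmap;
	}

Everything inside arch_{add,remove}_memory() runs with
mem_hotplug_begin() held, so it never needs this.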

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 11/17] mm: move get_dev_pagemap out of line
  2017-12-15 14:09 ` [PATCH 11/17] mm: move get_dev_pagemap out of line Christoph Hellwig
@ 2017-12-17 17:26   ` Dan Williams
  0 siblings, 0 replies; 43+ messages in thread
From: Dan Williams @ 2017-12-17 17:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, X86 ML, Linux MM, linux-kernel

On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig <hch@lst.de> wrote:
> This is a pretty big function, which should be out of line in general,
> and a no-op stub if CONFIG_ZONE_DEVICE is not set.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
[..]
> +/**
> + * get_dev_pagemap() - take a new live reference on the dev_pagemap for @pfn
> + * @pfn: page frame number to lookup page_map
> + * @pgmap: optional known pgmap that already has a reference
> + *
> + * @pgmap allows the overhead of a lookup to be bypassed when @pfn lands in the
> + * same mapping.
> + */
> +struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
> +               struct dev_pagemap *pgmap)
> +{
> +       const struct resource *res = pgmap ? pgmap->res : NULL;
> +       resource_size_t phys = PFN_PHYS(pfn);
> +
> +       /*
> +        * In the cached case we're already holding a live reference so
> +        * we can simply do a blind increment
> +        */
> +       if (res && phys >= res->start && phys <= res->end) {
> +               percpu_ref_get(pgmap->ref);
> +               return pgmap;
> +       }

I was going to say keep the cached case in the static inline, but with
the optimization to the calling convention in the following patch I
think that makes this moot.

So,

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 12/17] mm: optimize dev_pagemap reference counting around get_dev_pagemap
  2017-12-15 14:09 ` [PATCH 12/17] mm: optimize dev_pagemap reference counting around get_dev_pagemap Christoph Hellwig
@ 2017-12-17 17:28   ` Dan Williams
  0 siblings, 0 replies; 43+ messages in thread
From: Dan Williams @ 2017-12-17 17:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, X86 ML, Linux MM, linux-kernel

On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig <hch@lst.de> wrote:
> Change the calling convention so that get_dev_pagemap always consumes the
> previous reference instead of doing this using an explicit earlier call to
> put_dev_pagemap in the callers.
>
> The callers will still need to put the final reference after finishing the
> loop over the pages.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>  kernel/memremap.c | 17 +++++++++--------
>  mm/gup.c          |  7 +++++--
>  2 files changed, 14 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/memremap.c b/kernel/memremap.c
> index 43d94db97ff4..26764085785d 100644
> --- a/kernel/memremap.c
> +++ b/kernel/memremap.c
> @@ -506,22 +506,23 @@ struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start)
>   * @pfn: page frame number to lookup page_map
>   * @pgmap: optional known pgmap that already has a reference
>   *
> - * @pgmap allows the overhead of a lookup to be bypassed when @pfn lands in the
> - * same mapping.
> + * If @pgmap is non-NULL and covers @pfn it will be returned as-is.  If @pgmap
> + * is non-NULL but does not cover @pfn the reference to it while be released.

s/while/will/


Other than that you can add:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
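
For the record, the caller pattern this convention gives us looks
roughly like the sketch below; the function and variable names are
illustrative, not the literal mm/gup.c hunk:

	static void walk_zone_device_pfns(unsigned long start_pfn,
			unsigned long end_pfn)
	{
		struct dev_pagemap *pgmap = NULL;
		unsigned long pfn;

		for (pfn = start_pfn; pfn < end_pfn; pfn++) {
			/* consumes the reference held on the pgmap passed in */
			pgmap = get_dev_pagemap(pfn, pgmap);
			if (!pgmap)
				break;
			/* ... operate on pfn_to_page(pfn) ... */
		}
		if (pgmap)
			put_dev_pagemap(pgmap);	/* drop the final reference */
	}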

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 13/17] memremap: remove to_vmem_altmap
  2017-12-15 14:09 ` [PATCH 13/17] memremap: remove to_vmem_altmap Christoph Hellwig
@ 2017-12-17 17:30   ` Dan Williams
  0 siblings, 0 replies; 43+ messages in thread
From: Dan Williams @ 2017-12-17 17:30 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, X86 ML, Linux MM, linux-kernel

On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig <hch@lst.de> wrote:
> All callers are gone now.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

Nice,

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 14/17] memremap: simplify duplicate region handling in devm_memremap_pages
  2017-12-15 14:09 ` [PATCH 14/17] memremap: simplify duplicate region handling in devm_memremap_pages Christoph Hellwig
@ 2017-12-17 17:34   ` Dan Williams
  2017-12-19 15:03     ` Christoph Hellwig
  0 siblings, 1 reply; 43+ messages in thread
From: Dan Williams @ 2017-12-17 17:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, X86 ML, Linux MM, linux-kernel

On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig <hch@lst.de> wrote:
> __radix_tree_insert already checks for duplicates and returns -EEXIST in
> that case, so remove the duplicate (and racy) duplicates check.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>  kernel/memremap.c | 11 -----------
>  1 file changed, 11 deletions(-)
>
> diff --git a/kernel/memremap.c b/kernel/memremap.c
> index 891491ddccdb..901404094df1 100644
> --- a/kernel/memremap.c
> +++ b/kernel/memremap.c
> @@ -395,17 +395,6 @@ void *devm_memremap_pages(struct device *dev, struct resource *res,
>         align_end = align_start + align_size - 1;
>
>         foreach_order_pgoff(res, order, pgoff) {
> -               struct dev_pagemap *dup;
> -
> -               rcu_read_lock();
> -               dup = find_dev_pagemap(res->start + PFN_PHYS(pgoff));
> -               rcu_read_unlock();
> -               if (dup) {
> -                       dev_err(dev, "%s: %pr collides with mapping for %s\n",
> -                                       __func__, res, dev_name(dup->dev));
> -                       error = -EBUSY;
> -                       break;
> -               }
>                 error = __radix_tree_insert(&pgmap_radix,
>                                 PHYS_PFN(res->start) + pgoff, order, page_map);
>                 if (error) {


This is not racy; we'll catch the error on insert, and I think the
extra debug information is useful for debugging a broken memory map or
alignment math.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 15/17] memremap: drop private struct page_map
  2017-12-15 14:09 ` [PATCH 15/17] memremap: drop private struct page_map Christoph Hellwig
@ 2017-12-17 18:43   ` Dan Williams
  0 siblings, 0 replies; 43+ messages in thread
From: Dan Williams @ 2017-12-17 18:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, X86 ML, Linux MM, linux-kernel

On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig <hch@lst.de> wrote:
> From: Logan Gunthorpe <logang@deltatee.com>
>
> 'struct page_map' is a private structure of 'struct dev_pagemap' but the
> latter replicates all the same fields as the former so there isn't much
> value in it. Thus drop it in favour of a completely public struct.
>
> This is a cleanup in preparation for a more generally useful
> 'devm_memremap_pages' interface.
>
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good,

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 16/17] memremap: change devm_memremap_pages interface to use struct dev_pagemap
  2017-12-15 14:09 ` [PATCH 16/17] memremap: change devm_memremap_pages interface to use struct dev_pagemap Christoph Hellwig
@ 2017-12-17 18:51   ` Dan Williams
  2017-12-19 15:03     ` Christoph Hellwig
  0 siblings, 1 reply; 43+ messages in thread
From: Dan Williams @ 2017-12-17 18:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, X86 ML, Linux MM, linux-kernel

On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig <hch@lst.de> wrote:
> From: Logan Gunthorpe <logang@deltatee.com>
>
> This new interface is similar to how struct device (and many others)
> work. The caller initializes a 'struct dev_pagemap' as required
> and calls 'devm_memremap_pages'. This allows the pagemap structure to
> be embedded in another structure and thus container_of can be used. In
> this way application specific members can be stored in a containing
> struct.
>
> This will be used by the P2P infrastructure and HMM could probably
> be cleaned up to use it as well (instead of having its own, similar
> 'hmm_devmem_pages_create' function).
>
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good, I notice that this does not initialize pgmap->type to
MEMORY_DEVICE_HOST, but since that value is zero and likely won't
change we're ok.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
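
To sketch the embedding pattern this enables (the struct and helper
names below are made up for illustration, and the exact set of fields a
caller must initialize should be taken from the patch itself):

	struct my_p2p_dev {
		struct dev_pagemap pgmap;	/* embedded, not allocated by the core */
		void *driver_data;
	};

	static void *my_p2p_map(struct device *dev, struct my_p2p_dev *p,
			struct resource *res, struct percpu_ref *ref)
	{
		p->pgmap.res = *res;
		p->pgmap.ref = ref;
		p->pgmap.altmap_valid = false;
		/* ->type is left at 0 == MEMORY_DEVICE_HOST, as noted above */
		return devm_memremap_pages(dev, &p->pgmap);
	}

	static struct my_p2p_dev *page_to_p2p_dev(struct page *page)
	{
		return container_of(page->pgmap, struct my_p2p_dev, pgmap);
	}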

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 17/17] memremap: merge find_dev_pagemap into get_dev_pagemap
  2017-12-15 14:09 ` [PATCH 17/17] memremap: merge find_dev_pagemap into get_dev_pagemap Christoph Hellwig
@ 2017-12-17 18:53   ` Dan Williams
  0 siblings, 0 replies; 43+ messages in thread
From: Dan Williams @ 2017-12-17 18:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, X86 ML, Linux MM, linux-kernel

On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig <hch@lst.de> wrote:
> There is only one caller of the trivial function find_dev_pagemap left,
> so just merge it into the caller.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good,

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

...and all of these pass the nvdimm unit tests, so I think we're good
to go. I'll rebase the filesystem-DAX vs DMA collision series on top
of this.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 06/17] mm: pass the vmem_altmap to arch_remove_memory and __remove_pages
  2017-12-16  2:04   ` Dan Williams
@ 2017-12-19 15:02     ` Christoph Hellwig
  0 siblings, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2017-12-19 15:02 UTC (permalink / raw)
  To: Dan Williams
  Cc: Christoph Hellwig, Jérôme Glisse, Logan Gunthorpe,
	linux-nvdimm, linuxppc-dev, X86 ML, Linux MM, linux-kernel

On Fri, Dec 15, 2017 at 06:04:37PM -0800, Dan Williams wrote:
> On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig <hch@lst.de> wrote:
> > We can just pass this on instead of having to do a radix tree lookup
> > without proper locking 2 levels into the callchain.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>wip
> 
> I assume that "wip" is a typo?

It was the description of the patch this got folded into in my
local tree.  So basically equivalent to a typo :)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 14/17] memremap: simplify duplicate region handling in devm_memremap_pages
  2017-12-17 17:34   ` Dan Williams
@ 2017-12-19 15:03     ` Christoph Hellwig
  0 siblings, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2017-12-19 15:03 UTC (permalink / raw)
  To: Dan Williams
  Cc: Christoph Hellwig, Jérôme Glisse, Logan Gunthorpe,
	linux-nvdimm, linuxppc-dev, X86 ML, Linux MM, linux-kernel

On Sun, Dec 17, 2017 at 09:34:11AM -0800, Dan Williams wrote:
> This is not racy, we'll catch the error on insert, and I think the
> extra debug information is useful for debugging a broken memory map or
> alignment math.

We can check for -EEXIST and print the warning, but it's a weird pattern
for sure.
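
Something like this, for instance (untested sketch on top of the quoted
hunk; the message text is illustrative, since without the separate
lookup we no longer have the colliding pgmap's device name at hand):

	foreach_order_pgoff(res, order, pgoff) {
		error = __radix_tree_insert(&pgmap_radix,
				PHYS_PFN(res->start) + pgoff, order, page_map);
		if (error) {
			if (error == -EEXIST)
				dev_err(dev, "%s: %pr collides with an existing mapping\n",
						__func__, res);
			break;
		}
	}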

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 16/17] memremap: change devm_memremap_pages interface to use struct dev_pagemap
  2017-12-17 18:51   ` Dan Williams
@ 2017-12-19 15:03     ` Christoph Hellwig
  0 siblings, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2017-12-19 15:03 UTC (permalink / raw)
  To: Dan Williams
  Cc: Christoph Hellwig, Jérôme Glisse, Logan Gunthorpe,
	linux-nvdimm, linuxppc-dev, X86 ML, Linux MM, linux-kernel

On Sun, Dec 17, 2017 at 10:51:56AM -0800, Dan Williams wrote:
> On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig <hch@lst.de> wrote:
> > From: Logan Gunthorpe <logang@deltatee.com>
> >
> > This new interface is similar to how struct device (and many others)
> > work. The caller initializes a 'struct dev_pagemap' as required
> > and calls 'devm_memremap_pages'. This allows the pagemap structure to
> > be embedded in another structure and thus container_of can be used. In
> > this way application specific members can be stored in a containing
> > struct.
> >
> > This will be used by the P2P infrastructure and HMM could probably
> > be cleaned up to use it as well (instead of having its own, similar
> > 'hmm_devmem_pages_create' function).
> >
> > Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Looks good, I notice that this does not initialize pgmap->type to
> MEMORY_DEVICE_HOST, but since that value is zero and likely won't
> change we're ok.

I'll add it just for clarity.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: revamp vmem_altmap / dev_pagemap handling V2
  2017-12-15 14:09 revamp vmem_altmap / dev_pagemap handling V2 Christoph Hellwig
                   ` (16 preceding siblings ...)
  2017-12-15 14:09 ` [PATCH 17/17] memremap: merge find_dev_pagemap into get_dev_pagemap Christoph Hellwig
@ 2017-12-19 20:36 ` Dan Williams
  17 siblings, 0 replies; 43+ messages in thread
From: Dan Williams @ 2017-12-19 20:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, X86 ML, Linux MM, linux-kernel

On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig <hch@lst.de> wrote:
>
> Hi all,
>
> this series started with two patches from Logan that now are in the
> middle of the series to kill the memremap-internal pgmap structure
> and to redo the dev_memreamp_pages interface to be better suitable
> for future PCI P2P uses.  I reviewed them and noticed that there
> isn't really any good reason to keep struct vmem_altmap either,
> and that a lot of these alternative device page map access should
> be better abstracted out instead of being sprinkled all over the
> mm code.  But when we got the RCU warnings in V1 I went for yet
> another approach, and now struct vmem_altmap is kept for now,
> but passed explicitly through the memory hotplug code instead of
> having to do unprotected lookups through the radix tree.  The
> end result is that only the get_user_pages path ever looks up
> struct dev_pagemap, and struct vmem_altmap is now always embedded
> into struct dev_pagemap, and explicitly passed where needed.
>
> Please review carefully, this has only been tested with my legacy
> e820 NVDIMM system.

I hit the following regression in the error path with these patches
applied. I'm working on a bisect and updating the unit tests to
capture this scenario. 4.15-rc2 works as expected.

[   47.102064] ------------[ cut here ]------------
[   47.103099] dax_pmem dax1.0: devm_memremap_pages_release: failed to
free all reserved pages
[   47.104773] WARNING: CPU: 6 PID: 1226 at kernel/memremap.c:306
devm_memremap_pages_release+0x399/0x3e0
[   47.106578] Modules linked in: ip6t_rpfilter ip6t_REJECT
nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_raw ip6table_security iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
iptable_mangle iptable_raw iptable_security ebtable_filter ebtables
ip6table_filter ip6_tables crct10dif_pclmul crc32_pclmul crc32c_intel
ghash_clmulni_intel dax_pmem(O) nd_pmem(O) device_dax(O) nd_btt(O)
nd_e820(O) nfit(O) serio_raw libnvdimm(O) nfit_test_iomap(O)
[   47.114722] CPU: 6 PID: 1226 Comm: ndctl Tainted: G           O
4.15.0-rc2+ #981
[   47.116082] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
[   47.117993] task: 00000000f9fb534d task.stack: 00000000575f2a25
[   47.119004] RIP: 0010:devm_memremap_pages_release+0x399/0x3e0
[   47.119993] RSP: 0018:ffffc90002f2fd30 EFLAGS: 00010282
[   47.120909] RAX: 0000000000000000 RBX: ffff88043715fa80 RCX: 0000000000000000
[   47.122095] RDX: ffff8801f88d6900 RSI: ffff8801f88ce478 RDI: ffff8801f88ce478
[   47.123284] RBP: ffffc90002f2fd50 R08: 0000000000000000 R09: 0000000000000000
[   47.124466] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8801f1fd2d10
[   47.125648] R13: 0000000440000000 R14: ffff8801f4dc8018 R15: ffffffff81ed6dfe
[   47.126831] FS:  00007fd93f2ba840(0000) GS:ffff8801f88c0000(0000)
knlGS:0000000000000000
[   47.128233] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   47.129216] CR2: 000055fa3e090fc0 CR3: 00000001f3dce000 CR4: 00000000000406e0
[   47.130404] Call Trace:
[   47.130913]  release_nodes+0x160/0x2a0
[   47.131617]  driver_probe_device+0xf9/0x490
[   47.132378]  bind_store+0x109/0x160
[   47.133035]  kernfs_fop_write+0x110/0x1b0
[   47.133775]  __vfs_write+0x33/0x170
[   47.134438]  ? rcu_read_lock_sched_held+0x3f/0x70
[   47.135275]  ? rcu_sync_lockdep_assert+0x2a/0x50
[   47.136091]  ? __sb_start_write+0xd0/0x1b0
[   47.136840]  ? vfs_write+0x18b/0x1b0
[   47.137519]  vfs_write+0xc5/0x1b0
[   47.138151]  SyS_write+0x55/0xc0
[   47.138776]  entry_SYSCALL_64_fastpath+0x1f/0x96
[   47.139600] RIP: 0033:0x7fd93e3a8f84
[   47.140270] RSP: 002b:00007ffca9dc0f68 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[   47.141593] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd93e3a8f84
[   47.142778] RDX: 0000000000000007 RSI: 0000000001d2de90 RDI: 0000000000000004
[   47.143962] RBP: 00007ffca9dc0fa0 R08: 0000000001d283d0 R09: 00000000fffffff8
[   47.145147] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000407d50
[   47.146330] R13: 00007ffca9dc15a0 R14: 0000000000000000 R15: 0000000000000000
[   47.147520] Code: f9 57 16 01 01 48 85 db 74 55 4c 89 f7 e8 00 21
44 00 48 c7 c1 80 62 c2 81 48 89 da 48 89 c6 48 c7 c7 08 6a ee 81 e8
c7 9f ea ff <0f> ff e9 ce fe ff ff 48 c7 c2 08 cf ec 81 be ed 02 00 00
48 c7
[   47.150607] ---[ end trace f384c72daa2ac9c5 ]---
[   47.151458] dax_pmem dax1.0: dax_pmem_percpu_exit
[   47.152478] dax_pmem: probe of dax1.0 failed with error -12

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 04/17] mm: pass the vmem_altmap to arch_add_memory and __add_pages
  2017-12-15 14:09 ` [PATCH 04/17] mm: pass the vmem_altmap to arch_add_memory and __add_pages Christoph Hellwig
  2017-12-16  1:48   ` Dan Williams
@ 2017-12-23  1:49   ` Dan Williams
  2017-12-23  1:54   ` Dan Williams
  2 siblings, 0 replies; 43+ messages in thread
From: Dan Williams @ 2017-12-23  1:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, X86 ML, Linux MM, linux-kernel

On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig <hch@lst.de> wrote:
> We can just pass this on instead of having to do a radix tree lookup
> without proper locking 2 levels into the callchain.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
[..]
> diff --git a/kernel/memremap.c b/kernel/memremap.c
> index 403ab9cdb949..16456117a1b1 100644
> --- a/kernel/memremap.c
> +++ b/kernel/memremap.c
> @@ -427,7 +427,7 @@ void *devm_memremap_pages(struct device *dev, struct resource *res,
>                 goto err_pfn_remap;
>
>         mem_hotplug_begin();
> -       error = arch_add_memory(nid, align_start, align_size, false);
> +       error = arch_add_memory(nid, align_start, align_size, altmap, false);
>         if (!error)
>                 move_pfn_range_to_zone(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
>                                         align_start >> PAGE_SHIFT,

Subtle bug here. This altmap is the caller-provided one, which we copy
into its permanent location in the pgmap, so it looks like this patch
needs to fold in the following fix:

diff --git a/kernel/memremap.c b/kernel/memremap.c
index f277bf5b8c57..157a3756e1d5 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -382,6 +382,7 @@ void *devm_memremap_pages(struct device *dev,
struct resource *res,
        if (altmap) {
                memcpy(&page_map->altmap, altmap, sizeof(*altmap));
                pgmap->altmap = &page_map->altmap;
+               altmap = pgmap->altmap;
        }
        pgmap->ref = ref;
        pgmap->res = &page_map->res;

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH 04/17] mm: pass the vmem_altmap to arch_add_memory and __add_pages
  2017-12-15 14:09 ` [PATCH 04/17] mm: pass the vmem_altmap to arch_add_memory and __add_pages Christoph Hellwig
  2017-12-16  1:48   ` Dan Williams
  2017-12-23  1:49   ` Dan Williams
@ 2017-12-23  1:54   ` Dan Williams
  2 siblings, 0 replies; 43+ messages in thread
From: Dan Williams @ 2017-12-23  1:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Logan Gunthorpe, linux-nvdimm,
	linuxppc-dev, X86 ML, Linux MM, linux-kernel

On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig <hch@lst.de> wrote:
> We can just pass this on instead of having to do a radix tree lookup
> without proper locking 2 levels into the callchain.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
[..]
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 8acdc35c2dfa..e26ade50ae18 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -772,12 +772,12 @@ static void update_end_of_memory_vars(u64 start, u64 size)
>         }
>  }
>
> -int add_pages(int nid, unsigned long start_pfn,
> -             unsigned long nr_pages, bool want_memblock)
> +int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
> +               struct vmem_altmap *altmap, bool want_memblock)
>  {
>         int ret;
>
> -       ret = __add_pages(nid, start_pfn, nr_pages, want_memblock);
> +       ret = __add_pages(nid, start_pfn, nr_pages, NULL, want_memblock);
>         WARN_ON_ONCE(ret);

Should be 'altmap' instead of NULL.
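
i.e. the call should presumably read:

	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);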

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 07/17] mm: pass the vmem_altmap to vmemmap_free
  2017-12-29  7:53 revamp vmem_altmap / dev_pagemap handling V3 Christoph Hellwig
@ 2017-12-29  7:53 ` Christoph Hellwig
  0 siblings, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2017-12-29  7:53 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jérôme Glisse, Logan Gunthorpe, Michal Hocko,
	linux-nvdimm, linuxppc-dev, x86, linux-mm, linux-kernel

We can just pass this on instead of having to do a radix tree lookup
without proper locking a few levels into the callchain.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/arm64/mm/mmu.c            |  3 +-
 arch/ia64/mm/discontig.c       |  3 +-
 arch/powerpc/mm/init_64.c      |  5 ++--
 arch/s390/mm/vmem.c            |  3 +-
 arch/sparc/mm/init_64.c        |  3 +-
 arch/x86/mm/init_64.c          | 67 ++++++++++++++++++++++++------------------
 include/linux/memory_hotplug.h |  2 +-
 include/linux/mm.h             |  3 +-
 mm/memory_hotplug.c            |  7 +++--
 mm/sparse.c                    | 23 ++++++++-------
 10 files changed, 68 insertions(+), 51 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index ec8952ff13be..0b1f13e0b4b3 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -696,7 +696,8 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 	return 0;
 }
 #endif	/* CONFIG_ARM64_64K_PAGES */
-void vmemmap_free(unsigned long start, unsigned long end)
+void vmemmap_free(unsigned long start, unsigned long end,
+		struct vmem_altmap *altmap)
 {
 }
 #endif	/* CONFIG_SPARSEMEM_VMEMMAP */
diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
index 1555aecaaf85..5ea0d8d0968b 100644
--- a/arch/ia64/mm/discontig.c
+++ b/arch/ia64/mm/discontig.c
@@ -760,7 +760,8 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 	return vmemmap_populate_basepages(start, end, node);
 }
 
-void vmemmap_free(unsigned long start, unsigned long end)
+void vmemmap_free(unsigned long start, unsigned long end,
+		struct vmem_altmap *altmap)
 {
 }
 #endif
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 779b74a96b8f..db7d4e092157 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -254,7 +254,8 @@ static unsigned long vmemmap_list_free(unsigned long start)
 	return vmem_back->phys;
 }
 
-void __ref vmemmap_free(unsigned long start, unsigned long end)
+void __ref vmemmap_free(unsigned long start, unsigned long end,
+		struct vmem_altmap *altmap)
 {
 	unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;
 	unsigned long page_order = get_order(page_size);
@@ -265,7 +266,6 @@ void __ref vmemmap_free(unsigned long start, unsigned long end)
 
 	for (; start < end; start += page_size) {
 		unsigned long nr_pages, addr;
-		struct vmem_altmap *altmap;
 		struct page *section_base;
 		struct page *page;
 
@@ -285,7 +285,6 @@ void __ref vmemmap_free(unsigned long start, unsigned long end)
 		section_base = pfn_to_page(vmemmap_section_start(start));
 		nr_pages = 1 << page_order;
 
-		altmap = to_vmem_altmap((unsigned long) section_base);
 		if (altmap) {
 			vmem_altmap_free(altmap, nr_pages);
 		} else if (PageReserved(page)) {
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index c44ef0e7c466..db55561c5981 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -297,7 +297,8 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 	return ret;
 }
 
-void vmemmap_free(unsigned long start, unsigned long end)
+void vmemmap_free(unsigned long start, unsigned long end,
+		struct vmem_altmap *altmap)
 {
 }
 
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 42d27a1a042a..995f9490334d 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2671,7 +2671,8 @@ int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend,
 	return 0;
 }
 
-void vmemmap_free(unsigned long start, unsigned long end)
+void vmemmap_free(unsigned long start, unsigned long end,
+		struct vmem_altmap *altmap)
 {
 }
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 3c046618cc7e..0cab4b5b59ba 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -800,11 +800,11 @@ int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
 
 #define PAGE_INUSE 0xFD
 
-static void __meminit free_pagetable(struct page *page, int order)
+static void __meminit free_pagetable(struct page *page, int order,
+		struct vmem_altmap *altmap)
 {
 	unsigned long magic;
 	unsigned int nr_pages = 1 << order;
-	struct vmem_altmap *altmap = to_vmem_altmap((unsigned long) page);
 
 	if (altmap) {
 		vmem_altmap_free(altmap, nr_pages);
@@ -826,7 +826,8 @@ static void __meminit free_pagetable(struct page *page, int order)
 		free_pages((unsigned long)page_address(page), order);
 }
 
-static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd)
+static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd,
+		struct vmem_altmap *altmap)
 {
 	pte_t *pte;
 	int i;
@@ -838,13 +839,14 @@ static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd)
 	}
 
 	/* free a pte talbe */
-	free_pagetable(pmd_page(*pmd), 0);
+	free_pagetable(pmd_page(*pmd), 0, altmap);
 	spin_lock(&init_mm.page_table_lock);
 	pmd_clear(pmd);
 	spin_unlock(&init_mm.page_table_lock);
 }
 
-static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud)
+static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud,
+		struct vmem_altmap *altmap)
 {
 	pmd_t *pmd;
 	int i;
@@ -856,13 +858,14 @@ static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud)
 	}
 
 	/* free a pmd talbe */
-	free_pagetable(pud_page(*pud), 0);
+	free_pagetable(pud_page(*pud), 0, altmap);
 	spin_lock(&init_mm.page_table_lock);
 	pud_clear(pud);
 	spin_unlock(&init_mm.page_table_lock);
 }
 
-static void __meminit free_pud_table(pud_t *pud_start, p4d_t *p4d)
+static void __meminit free_pud_table(pud_t *pud_start, p4d_t *p4d,
+		struct vmem_altmap *altmap)
 {
 	pud_t *pud;
 	int i;
@@ -874,7 +877,7 @@ static void __meminit free_pud_table(pud_t *pud_start, p4d_t *p4d)
 	}
 
 	/* free a pud talbe */
-	free_pagetable(p4d_page(*p4d), 0);
+	free_pagetable(p4d_page(*p4d), 0, altmap);
 	spin_lock(&init_mm.page_table_lock);
 	p4d_clear(p4d);
 	spin_unlock(&init_mm.page_table_lock);
@@ -882,7 +885,7 @@ static void __meminit free_pud_table(pud_t *pud_start, p4d_t *p4d)
 
 static void __meminit
 remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
-		 bool direct)
+		 struct vmem_altmap *altmap, bool direct)
 {
 	unsigned long next, pages = 0;
 	pte_t *pte;
@@ -913,7 +916,7 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
 			 * freed when offlining, or simplely not in use.
 			 */
 			if (!direct)
-				free_pagetable(pte_page(*pte), 0);
+				free_pagetable(pte_page(*pte), 0, altmap);
 
 			spin_lock(&init_mm.page_table_lock);
 			pte_clear(&init_mm, addr, pte);
@@ -936,7 +939,7 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
 
 			page_addr = page_address(pte_page(*pte));
 			if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
-				free_pagetable(pte_page(*pte), 0);
+				free_pagetable(pte_page(*pte), 0, altmap);
 
 				spin_lock(&init_mm.page_table_lock);
 				pte_clear(&init_mm, addr, pte);
@@ -953,7 +956,7 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
 
 static void __meminit
 remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
-		 bool direct)
+		 bool direct, struct vmem_altmap *altmap)
 {
 	unsigned long next, pages = 0;
 	pte_t *pte_base;
@@ -972,7 +975,8 @@ remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
 			    IS_ALIGNED(next, PMD_SIZE)) {
 				if (!direct)
 					free_pagetable(pmd_page(*pmd),
-						       get_order(PMD_SIZE));
+						       get_order(PMD_SIZE),
+						       altmap);
 
 				spin_lock(&init_mm.page_table_lock);
 				pmd_clear(pmd);
@@ -986,7 +990,8 @@ remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
 				if (!memchr_inv(page_addr, PAGE_INUSE,
 						PMD_SIZE)) {
 					free_pagetable(pmd_page(*pmd),
-						       get_order(PMD_SIZE));
+						       get_order(PMD_SIZE),
+						       altmap);
 
 					spin_lock(&init_mm.page_table_lock);
 					pmd_clear(pmd);
@@ -998,8 +1003,8 @@ remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
 		}
 
 		pte_base = (pte_t *)pmd_page_vaddr(*pmd);
-		remove_pte_table(pte_base, addr, next, direct);
-		free_pte_table(pte_base, pmd);
+		remove_pte_table(pte_base, addr, next, altmap, direct);
+		free_pte_table(pte_base, pmd, altmap);
 	}
 
 	/* Call free_pmd_table() in remove_pud_table(). */
@@ -1009,7 +1014,7 @@ remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
 
 static void __meminit
 remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
-		 bool direct)
+		 struct vmem_altmap *altmap, bool direct)
 {
 	unsigned long next, pages = 0;
 	pmd_t *pmd_base;
@@ -1028,7 +1033,8 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
 			    IS_ALIGNED(next, PUD_SIZE)) {
 				if (!direct)
 					free_pagetable(pud_page(*pud),
-						       get_order(PUD_SIZE));
+						       get_order(PUD_SIZE),
+						       altmap);
 
 				spin_lock(&init_mm.page_table_lock);
 				pud_clear(pud);
@@ -1042,7 +1048,8 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
 				if (!memchr_inv(page_addr, PAGE_INUSE,
 						PUD_SIZE)) {
 					free_pagetable(pud_page(*pud),
-						       get_order(PUD_SIZE));
+						       get_order(PUD_SIZE),
+						       altmap);
 
 					spin_lock(&init_mm.page_table_lock);
 					pud_clear(pud);
@@ -1054,8 +1061,8 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
 		}
 
 		pmd_base = pmd_offset(pud, 0);
-		remove_pmd_table(pmd_base, addr, next, direct);
-		free_pmd_table(pmd_base, pud);
+		remove_pmd_table(pmd_base, addr, next, direct, altmap);
+		free_pmd_table(pmd_base, pud, altmap);
 	}
 
 	if (direct)
@@ -1064,7 +1071,7 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
 
 static void __meminit
 remove_p4d_table(p4d_t *p4d_start, unsigned long addr, unsigned long end,
-		 bool direct)
+		 struct vmem_altmap *altmap, bool direct)
 {
 	unsigned long next, pages = 0;
 	pud_t *pud_base;
@@ -1080,14 +1087,14 @@ remove_p4d_table(p4d_t *p4d_start, unsigned long addr, unsigned long end,
 		BUILD_BUG_ON(p4d_large(*p4d));
 
 		pud_base = pud_offset(p4d, 0);
-		remove_pud_table(pud_base, addr, next, direct);
+		remove_pud_table(pud_base, addr, next, altmap, direct);
 		/*
 		 * For 4-level page tables we do not want to free PUDs, but in the
 		 * 5-level case we should free them. This code will have to change
 		 * to adapt for boot-time switching between 4 and 5 level page tables.
 		 */
 		if (CONFIG_PGTABLE_LEVELS == 5)
-			free_pud_table(pud_base, p4d);
+			free_pud_table(pud_base, p4d, altmap);
 	}
 
 	if (direct)
@@ -1096,7 +1103,8 @@ remove_p4d_table(p4d_t *p4d_start, unsigned long addr, unsigned long end,
 
 /* start and end are both virtual address. */
 static void __meminit
-remove_pagetable(unsigned long start, unsigned long end, bool direct)
+remove_pagetable(unsigned long start, unsigned long end, bool direct,
+		struct vmem_altmap *altmap)
 {
 	unsigned long next;
 	unsigned long addr;
@@ -1111,15 +1119,16 @@ remove_pagetable(unsigned long start, unsigned long end, bool direct)
 			continue;
 
 		p4d = p4d_offset(pgd, 0);
-		remove_p4d_table(p4d, addr, next, direct);
+		remove_p4d_table(p4d, addr, next, altmap, direct);
 	}
 
 	flush_tlb_all();
 }
 
-void __ref vmemmap_free(unsigned long start, unsigned long end)
+void __ref vmemmap_free(unsigned long start, unsigned long end,
+		struct vmem_altmap *altmap)
 {
-	remove_pagetable(start, end, false);
+	remove_pagetable(start, end, false, altmap);
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
@@ -1129,7 +1138,7 @@ kernel_physical_mapping_remove(unsigned long start, unsigned long end)
 	start = (unsigned long)__va(start);
 	end = (unsigned long)__va(end);
 
-	remove_pagetable(start, end, true);
+	remove_pagetable(start, end, true, NULL);
 }
 
 int __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index e71927d0d46b..20dd98ad44a0 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -331,7 +331,7 @@ extern void remove_memory(int nid, u64 start, u64 size);
 extern int sparse_add_one_section(struct pglist_data *pgdat,
 		unsigned long start_pfn, struct vmem_altmap *altmap);
 extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
-		unsigned long map_offset);
+		unsigned long map_offset, struct vmem_altmap *altmap);
 extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
 					  unsigned long pnum);
 extern bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2f3a7ebecbe2..9d4cd4c1dc6d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2561,7 +2561,8 @@ int vmemmap_populate(unsigned long start, unsigned long end, int node,
 		struct vmem_altmap *altmap);
 void vmemmap_populate_print_last(void);
 #ifdef CONFIG_MEMORY_HOTPLUG
-void vmemmap_free(unsigned long start, unsigned long end);
+void vmemmap_free(unsigned long start, unsigned long end,
+		struct vmem_altmap *altmap);
 #endif
 void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
 				  unsigned long nr_pages);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index eae6bf47caf7..a8dde9734120 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -536,7 +536,7 @@ static void __remove_zone(struct zone *zone, unsigned long start_pfn)
 }
 
 static int __remove_section(struct zone *zone, struct mem_section *ms,
-		unsigned long map_offset)
+		unsigned long map_offset, struct vmem_altmap *altmap)
 {
 	unsigned long start_pfn;
 	int scn_nr;
@@ -553,7 +553,7 @@ static int __remove_section(struct zone *zone, struct mem_section *ms,
 	start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
 	__remove_zone(zone, start_pfn);
 
-	sparse_remove_one_section(zone, ms, map_offset);
+	sparse_remove_one_section(zone, ms, map_offset, altmap);
 	return 0;
 }
 
@@ -607,7 +607,8 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
 	for (i = 0; i < sections_to_remove; i++) {
 		unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
 
-		ret = __remove_section(zone, __pfn_to_section(pfn), map_offset);
+		ret = __remove_section(zone, __pfn_to_section(pfn), map_offset,
+				altmap);
 		map_offset = 0;
 		if (ret)
 			break;
diff --git a/mm/sparse.c b/mm/sparse.c
index 5f4a0dac7836..06130c13dc99 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -685,12 +685,13 @@ static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
 	/* This will make the necessary allocations eventually. */
 	return sparse_mem_map_populate(pnum, nid, altmap);
 }
-static void __kfree_section_memmap(struct page *memmap)
+static void __kfree_section_memmap(struct page *memmap,
+		struct vmem_altmap *altmap)
 {
 	unsigned long start = (unsigned long)memmap;
 	unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
 
-	vmemmap_free(start, end);
+	vmemmap_free(start, end, altmap);
 }
 #ifdef CONFIG_MEMORY_HOTREMOVE
 static void free_map_bootmem(struct page *memmap)
@@ -698,7 +699,7 @@ static void free_map_bootmem(struct page *memmap)
 	unsigned long start = (unsigned long)memmap;
 	unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
 
-	vmemmap_free(start, end);
+	vmemmap_free(start, end, NULL);
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 #else
@@ -729,7 +730,8 @@ static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
 	return __kmalloc_section_memmap();
 }
 
-static void __kfree_section_memmap(struct page *memmap)
+static void __kfree_section_memmap(struct page *memmap,
+		struct vmem_altmap *altmap)
 {
 	if (is_vmalloc_addr(memmap))
 		vfree(memmap);
@@ -798,7 +800,7 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat,
 		return -ENOMEM;
 	usemap = __kmalloc_section_usemap();
 	if (!usemap) {
-		__kfree_section_memmap(memmap);
+		__kfree_section_memmap(memmap, altmap);
 		return -ENOMEM;
 	}
 
@@ -820,7 +822,7 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat,
 	pgdat_resize_unlock(pgdat, &flags);
 	if (ret <= 0) {
 		kfree(usemap);
-		__kfree_section_memmap(memmap);
+		__kfree_section_memmap(memmap, altmap);
 	}
 	return ret;
 }
@@ -847,7 +849,8 @@ static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
 }
 #endif
 
-static void free_section_usemap(struct page *memmap, unsigned long *usemap)
+static void free_section_usemap(struct page *memmap, unsigned long *usemap,
+		struct vmem_altmap *altmap)
 {
 	struct page *usemap_page;
 
@@ -861,7 +864,7 @@ static void free_section_usemap(struct page *memmap, unsigned long *usemap)
 	if (PageSlab(usemap_page) || PageCompound(usemap_page)) {
 		kfree(usemap);
 		if (memmap)
-			__kfree_section_memmap(memmap);
+			__kfree_section_memmap(memmap, altmap);
 		return;
 	}
 
@@ -875,7 +878,7 @@ static void free_section_usemap(struct page *memmap, unsigned long *usemap)
 }
 
 void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
-		unsigned long map_offset)
+		unsigned long map_offset, struct vmem_altmap *altmap)
 {
 	struct page *memmap = NULL;
 	unsigned long *usemap = NULL, flags;
@@ -893,7 +896,7 @@ void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
 
 	clear_hwpoisoned_pages(memmap + map_offset,
 			PAGES_PER_SECTION - map_offset);
-	free_section_usemap(memmap, usemap);
+	free_section_usemap(memmap, usemap, altmap);
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 #endif /* CONFIG_MEMORY_HOTPLUG */
-- 
2.14.2

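To see the net effect of the hunks above in one place, here is a condensed sketch of the call chain once the patch is applied. It is only a sketch: function bodies are reduced to the altmap plumbing, locking and the bootmem corner cases are dropped, and the free_pagetable() body is an approximation of a hunk earlier in this patch that is not quoted above, so treat it as illustrative rather than the literal code.

void __ref vmemmap_free(unsigned long start, unsigned long end,
		struct vmem_altmap *altmap)
{
	/* vmemmap teardown: hand the caller's altmap down the walk */
	remove_pagetable(start, end, false, altmap);
}

static void __meminit
kernel_physical_mapping_remove(unsigned long start, unsigned long end)
{
	/* direct-map teardown never has an altmap */
	remove_pagetable((unsigned long)__va(start),
			 (unsigned long)__va(end), true, NULL);
}

/*
 * remove_pagetable() -> remove_p4d_table() -> remove_pud_table() ->
 * remove_pmd_table() -> remove_pte_table() all forward the altmap
 * unchanged until a page backing the memmap is actually freed:
 */
static void __meminit free_pagetable(struct page *page, int order,
		struct vmem_altmap *altmap)
{
	if (altmap) {
		/* memmap was carved out of the device's reserved area */
		vmem_altmap_free(altmap, 1 << order);
		return;
	}
	/* ... otherwise the existing bootmem / page allocator paths ... */
	free_pages((unsigned long)page_address(page), order);
}

The point of the plumbing is visible in free_pagetable(): with the altmap passed explicitly, the teardown path no longer needs any global lookup to decide whether a memmap page belongs to the page allocator or to the device's reserved region.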

Thread overview: 43+ messages
2017-12-15 14:09 revamp vmem_altmap / dev_pagemap handling V2 Christoph Hellwig
2017-12-15 14:09 ` [PATCH 01/17] memremap: provide stubs for vmem_altmap_offset and vmem_altmap_free Christoph Hellwig
2017-12-16  1:41   ` Dan Williams
2017-12-15 14:09 ` [PATCH 02/17] mm: don't export arch_add_memory Christoph Hellwig
2017-12-16  1:41   ` Dan Williams
2017-12-15 14:09 ` [PATCH 03/17] mm: don't export __add_pages Christoph Hellwig
2017-12-16  1:42   ` Dan Williams
2017-12-15 14:09 ` [PATCH 04/17] mm: pass the vmem_altmap to arch_add_memory and __add_pages Christoph Hellwig
2017-12-16  1:48   ` Dan Williams
2017-12-17 17:22     ` Dan Williams
2017-12-23  1:49   ` Dan Williams
2017-12-23  1:54   ` Dan Williams
2017-12-15 14:09 ` [PATCH 05/17] mm: pass the vmem_altmap to vmemmap_populate Christoph Hellwig
2017-12-16  2:03   ` Dan Williams
2017-12-15 14:09 ` [PATCH 06/17] mm: pass the vmem_altmap to arch_remove_memory and __remove_pages Christoph Hellwig
2017-12-16  2:04   ` Dan Williams
2017-12-19 15:02     ` Christoph Hellwig
2017-12-15 14:09 ` [PATCH 07/17] mm: pass the vmem_altmap to vmemmap_free Christoph Hellwig
2017-12-16  2:12   ` Dan Williams
2017-12-15 14:09 ` [PATCH 08/17] mm: pass the vmem_altmap to memmap_init_zone Christoph Hellwig
2017-12-16  2:15   ` Dan Williams
2017-12-15 14:09 ` [PATCH 09/17] mm: split altmap memory map allocation from normal case Christoph Hellwig
2017-12-16  2:18   ` Dan Williams
2017-12-15 14:09 ` [PATCH 10/17] mm: merge vmem_altmap_alloc into altmap_alloc_block_buf Christoph Hellwig
2017-12-16  2:24   ` Dan Williams
2017-12-15 14:09 ` [PATCH 11/17] mm: move get_dev_pagemap out of line Christoph Hellwig
2017-12-17 17:26   ` Dan Williams
2017-12-15 14:09 ` [PATCH 12/17] mm: optimize dev_pagemap reference counting around get_dev_pagemap Christoph Hellwig
2017-12-17 17:28   ` Dan Williams
2017-12-15 14:09 ` [PATCH 13/17] memremap: remove to_vmem_altmap Christoph Hellwig
2017-12-17 17:30   ` Dan Williams
2017-12-15 14:09 ` [PATCH 14/17] memremap: simplify duplicate region handling in devm_memremap_pages Christoph Hellwig
2017-12-17 17:34   ` Dan Williams
2017-12-19 15:03     ` Christoph Hellwig
2017-12-15 14:09 ` [PATCH 15/17] memremap: drop private struct page_map Christoph Hellwig
2017-12-17 18:43   ` Dan Williams
2017-12-15 14:09 ` [PATCH 16/17] memremap: change devm_memremap_pages interface to use struct dev_pagemap Christoph Hellwig
2017-12-17 18:51   ` Dan Williams
2017-12-19 15:03     ` Christoph Hellwig
2017-12-15 14:09 ` [PATCH 17/17] memremap: merge find_dev_pagemap into get_dev_pagemap Christoph Hellwig
2017-12-17 18:53   ` Dan Williams
2017-12-19 20:36 ` revamp vmem_altmap / dev_pagemap handling V2 Dan Williams
2017-12-29  7:53 revamp vmem_altmap / dev_pagemap handling V3 Christoph Hellwig
2017-12-29  7:53 ` [PATCH 07/17] mm: pass the vmem_altmap to vmemmap_free Christoph Hellwig
