linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v7 0/7] mm: Merge hmm into devm_memremap_pages, mark GPL-only
@ 2018-10-12 17:49 Dan Williams
  2018-10-12 17:49 ` [PATCH v7 1/7] mm, devm_memremap_pages: Mark devm_memremap_pages() EXPORT_SYMBOL_GPL Dan Williams
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: Dan Williams @ 2018-10-12 17:49 UTC (permalink / raw)
  To: akpm
  Cc: stable, Balbir Singh, Logan Gunthorpe, Christoph Hellwig,
	Jérôme Glisse, Michal Hocko, Jérôme Glisse,
	linux-mm, linux-kernel

[ apologies for the resend, script error ]

Changes since v6 [1]:
* Rebase on next-20181008 and fixup conflicts with the xarray conversion
  and hotplug optimizations
* It has soaked on a 0day visible branch for a few days without any
  reports.

[1]: https://lkml.org/lkml/2018/9/13/104

---

Hi Andrew,

Jérôme has reviewed the cleanups, thanks Jérôme. We still disagree on
the EXPORT_SYMBOL_GPL status of the core HMM implementation, but Logan,
Christoph and I continue to support marking all devm_memremap_pages()
derivatives EXPORT_SYMBOL_GPL.

HMM has been upstream for over a year, with no in-tree users it is clear
it was designed first and foremost for out of tree drivers. It takes
advantage of a facility Christoph and I spearheaded to support
persistent memory. It continues to see expanding use cases with no clear
end date when it will stop attracting features / revisions. It is not
suitable to export devm_memremap_pages() as a stable 3rd party driver
api.

devm_memremap_pages() is a facility that can create struct page entries
for any arbitrary range and give out-of-tree drivers the ability to
subvert core aspects of page management. It, and anything derived from
it (e.g. hmm, pcip2p, etc...), is a deep integration point into the core
kernel, and an EXPORT_SYMBOL_GPL() interface. 

Commit 31c5bda3a656 "mm: fix exports that inadvertently make put_page()
EXPORT_SYMBOL_GPL" was merged ahead of this series to relieve some of
the pressure from innocent consumers of put_page(), but now we need this
series to address *producers* of device pages.

More details and justification in the changelogs. The 0day
infrastructure has reported success across 152 configs and this survives
the libnvdimm unit test suite. Aside from the controversial bits the
diffstat is compelling at:

	7 files changed, 126 insertions(+), 323 deletions(-)

Note that the series has some minor collisions with Alex's recent series
to improve devm_memremap_pages() scalability [2]. So, whichever you take
first the other will need a minor rebase.

[2]: https://www.lkml.org/lkml/2018/9/11/10

Dan Williams (7):
      mm, devm_memremap_pages: Mark devm_memremap_pages() EXPORT_SYMBOL_GPL
      mm, devm_memremap_pages: Kill mapping "System RAM" support
      mm, devm_memremap_pages: Fix shutdown handling
      mm, devm_memremap_pages: Add MEMORY_DEVICE_PRIVATE support
      mm, hmm: Use devm semantics for hmm_devmem_{add,remove}
      mm, hmm: Replace hmm_devmem_pages_create() with devm_memremap_pages()
      mm, hmm: Mark hmm_devmem_{add,add_resource} EXPORT_SYMBOL_GPL


 drivers/dax/pmem.c                |   14 --
 drivers/nvdimm/pmem.c             |   13 +-
 include/linux/hmm.h               |    4 
 include/linux/memremap.h          |    2 
 kernel/memremap.c                 |   94 +++++++----
 mm/hmm.c                          |  305 +++++--------------------------------
 tools/testing/nvdimm/test/iomap.c |   17 ++
 7 files changed, 126 insertions(+), 323 deletions(-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v7 1/7] mm, devm_memremap_pages: Mark devm_memremap_pages() EXPORT_SYMBOL_GPL
  2018-10-12 17:49 [PATCH v7 0/7] mm: Merge hmm into devm_memremap_pages, mark GPL-only Dan Williams
@ 2018-10-12 17:49 ` Dan Williams
  2018-10-17  8:17   ` Michal Hocko
  2018-10-12 17:49 ` [PATCH v7 2/7] mm, devm_memremap_pages: Kill mapping "System RAM" support Dan Williams
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 12+ messages in thread
From: Dan Williams @ 2018-10-12 17:49 UTC (permalink / raw)
  To: akpm
  Cc: Michal Hocko, Jérôme Glisse, Christoph Hellwig,
	linux-mm, linux-kernel

devm_memremap_pages() is a facility that can create struct page entries
for any arbitrary range and give drivers the ability to subvert core
aspects of page management.

Specifically the facility is tightly integrated with the kernel's memory
hotplug functionality. It injects an altmap argument deep into the
architecture specific vmemmap implementation to allow allocating from
specific reserved pages, and it has Linux specific assumptions about
page structure reference counting relative to get_user_pages() and
get_user_pages_fast(). It was an oversight and a mistake that this was
not marked EXPORT_SYMBOL_GPL from the outset.

Again, devm_memremap_pagex() exposes and relies upon core kernel
internal assumptions and will continue to evolve along with 'struct
page', memory hotplug, and support for new memory types / topologies.
Only an in-kernel GPL-only driver is expected to keep up with this
ongoing evolution. This interface, and functionality derived from this
interface, is not suitable for kernel-external drivers.

Cc: Michal Hocko <mhocko@suse.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 kernel/memremap.c                 |    2 +-
 tools/testing/nvdimm/test/iomap.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index 6ec81e0d7a33..1bbb2e892941 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -232,7 +232,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
  err_array:
 	return ERR_PTR(error);
 }
-EXPORT_SYMBOL(devm_memremap_pages);
+EXPORT_SYMBOL_GPL(devm_memremap_pages);
 
 unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
 {
diff --git a/tools/testing/nvdimm/test/iomap.c b/tools/testing/nvdimm/test/iomap.c
index ff9d3a5825e1..ed18a0cbc0c8 100644
--- a/tools/testing/nvdimm/test/iomap.c
+++ b/tools/testing/nvdimm/test/iomap.c
@@ -113,7 +113,7 @@ void *__wrap_devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 		return nfit_res->buf + offset - nfit_res->res.start;
 	return devm_memremap_pages(dev, pgmap);
 }
-EXPORT_SYMBOL(__wrap_devm_memremap_pages);
+EXPORT_SYMBOL_GPL(__wrap_devm_memremap_pages);
 
 pfn_t __wrap_phys_to_pfn_t(phys_addr_t addr, unsigned long flags)
 {


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v7 2/7] mm, devm_memremap_pages: Kill mapping "System RAM" support
  2018-10-12 17:49 [PATCH v7 0/7] mm: Merge hmm into devm_memremap_pages, mark GPL-only Dan Williams
  2018-10-12 17:49 ` [PATCH v7 1/7] mm, devm_memremap_pages: Mark devm_memremap_pages() EXPORT_SYMBOL_GPL Dan Williams
@ 2018-10-12 17:49 ` Dan Williams
  2018-10-12 17:49 ` [PATCH v7 3/7] mm, devm_memremap_pages: Fix shutdown handling Dan Williams
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2018-10-12 17:49 UTC (permalink / raw)
  To: akpm
  Cc: Jérôme Glisse, Christoph Hellwig, Logan Gunthorpe,
	linux-mm, linux-kernel

Given the fact that devm_memremap_pages() requires a percpu_ref that is
torn down by devm_memremap_pages_release() the current support for
mapping RAM is broken.

Support for remapping "System RAM" has been broken since the beginning
and there is no existing user of this this code path, so just kill the
support and make it an explicit error.

This cleanup also simplifies a follow-on patch to fix the error path
when setting a devm release action for devm_memremap_pages_release()
fails.

Reviewed-by: "Jérôme Glisse" <jglisse@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 kernel/memremap.c |    9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index 1bbb2e892941..871d81bf0c69 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -167,15 +167,12 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 	is_ram = region_intersects(align_start, align_size,
 		IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE);
 
-	if (is_ram == REGION_MIXED) {
-		WARN_ONCE(1, "%s attempted on mixed region %pr\n",
-				__func__, res);
+	if (is_ram != REGION_DISJOINT) {
+		WARN_ONCE(1, "%s attempted on %s region %pr\n", __func__,
+				is_ram == REGION_MIXED ? "mixed" : "ram", res);
 		return ERR_PTR(-ENXIO);
 	}
 
-	if (is_ram == REGION_INTERSECTS)
-		return __va(res->start);
-
 	if (!pgmap->ref)
 		return ERR_PTR(-EINVAL);
 


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v7 3/7] mm, devm_memremap_pages: Fix shutdown handling
  2018-10-12 17:49 [PATCH v7 0/7] mm: Merge hmm into devm_memremap_pages, mark GPL-only Dan Williams
  2018-10-12 17:49 ` [PATCH v7 1/7] mm, devm_memremap_pages: Mark devm_memremap_pages() EXPORT_SYMBOL_GPL Dan Williams
  2018-10-12 17:49 ` [PATCH v7 2/7] mm, devm_memremap_pages: Kill mapping "System RAM" support Dan Williams
@ 2018-10-12 17:49 ` Dan Williams
  2018-10-12 17:49 ` [PATCH v7 4/7] mm, devm_memremap_pages: Add MEMORY_DEVICE_PRIVATE support Dan Williams
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2018-10-12 17:49 UTC (permalink / raw)
  To: akpm
  Cc: stable, Jérôme Glisse, Logan Gunthorpe,
	Logan Gunthorpe, Christoph Hellwig, linux-mm, linux-kernel

The last step before devm_memremap_pages() returns success is to
allocate a release action, devm_memremap_pages_release(), to tear the
entire setup down. However, the result from devm_add_action() is not
checked.

Checking the error from devm_add_action() is not enough. The api
currently relies on the fact that the percpu_ref it is using is killed
by the time the devm_memremap_pages_release() is run. Rather than
continue this awkward situation, offload the responsibility of killing
the percpu_ref to devm_memremap_pages_release() directly. This allows
devm_memremap_pages() to do the right thing  relative to init failures
and shutdown.

Without this change we could fail to register the teardown of
devm_memremap_pages(). The likelihood of hitting this failure is tiny as
small memory allocations almost always succeed. However, the impact of
the failure is large given any future reconfiguration, or
disable/enable, of an nvdimm namespace will fail forever as subsequent
calls to devm_memremap_pages() will fail to setup the pgmap_radix since
there will be stale entries for the physical address range.

An argument could be made to require that the ->kill() operation be set
in the @pgmap arg rather than passed in separately. However, it helps
code readability, tracking the lifetime of a given instance, to be able
to grep the kill routine directly at the devm_memremap_pages() call
site.

Cc: <stable@vger.kernel.org>
Fixes: e8d513483300 ("memremap: change devm_memremap_pages interface...")
Reviewed-by: "Jérôme Glisse" <jglisse@redhat.com>
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/pmem.c                |   14 +++-----------
 drivers/nvdimm/pmem.c             |   13 +++++--------
 include/linux/memremap.h          |    2 ++
 kernel/memremap.c                 |   30 ++++++++++++++----------------
 tools/testing/nvdimm/test/iomap.c |   15 ++++++++++++++-
 5 files changed, 38 insertions(+), 36 deletions(-)

diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
index 99e2aace8078..2c1f459c0c63 100644
--- a/drivers/dax/pmem.c
+++ b/drivers/dax/pmem.c
@@ -48,9 +48,8 @@ static void dax_pmem_percpu_exit(void *data)
 	percpu_ref_exit(ref);
 }
 
-static void dax_pmem_percpu_kill(void *data)
+static void dax_pmem_percpu_kill(struct percpu_ref *ref)
 {
-	struct percpu_ref *ref = data;
 	struct dax_pmem *dax_pmem = to_dax_pmem(ref);
 
 	dev_dbg(dax_pmem->dev, "trace\n");
@@ -112,17 +111,10 @@ static int dax_pmem_probe(struct device *dev)
 	}
 
 	dax_pmem->pgmap.ref = &dax_pmem->ref;
+	dax_pmem->pgmap.kill = dax_pmem_percpu_kill;
 	addr = devm_memremap_pages(dev, &dax_pmem->pgmap);
-	if (IS_ERR(addr)) {
-		devm_remove_action(dev, dax_pmem_percpu_exit, &dax_pmem->ref);
-		percpu_ref_exit(&dax_pmem->ref);
+	if (IS_ERR(addr))
 		return PTR_ERR(addr);
-	}
-
-	rc = devm_add_action_or_reset(dev, dax_pmem_percpu_kill,
-							&dax_pmem->ref);
-	if (rc)
-		return rc;
 
 	/* adjust the dax_region resource to the start of data */
 	memcpy(&res, &dax_pmem->pgmap.res, sizeof(res));
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index a75d10c23d80..c7548e712a16 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -309,8 +309,11 @@ static void pmem_release_queue(void *q)
 	blk_cleanup_queue(q);
 }
 
-static void pmem_freeze_queue(void *q)
+static void pmem_freeze_queue(struct percpu_ref *ref)
 {
+	struct request_queue *q;
+
+	q = container_of(ref, typeof(*q), q_usage_counter);
 	blk_freeze_queue_start(q);
 }
 
@@ -402,6 +405,7 @@ static int pmem_attach_disk(struct device *dev,
 
 	pmem->pfn_flags = PFN_DEV;
 	pmem->pgmap.ref = &q->q_usage_counter;
+	pmem->pgmap.kill = pmem_freeze_queue;
 	if (is_nd_pfn(dev)) {
 		if (setup_pagemap_fsdax(dev, &pmem->pgmap))
 			return -ENOMEM;
@@ -425,13 +429,6 @@ static int pmem_attach_disk(struct device *dev,
 		addr = devm_memremap(dev, pmem->phys_addr,
 				pmem->size, ARCH_MEMREMAP_PMEM);
 
-	/*
-	 * At release time the queue must be frozen before
-	 * devm_memremap_pages is unwound
-	 */
-	if (devm_add_action_or_reset(dev, pmem_freeze_queue, q))
-		return -ENOMEM;
-
 	if (IS_ERR(addr))
 		return PTR_ERR(addr);
 	pmem->virt_addr = addr;
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index f91f9e763557..a84572cdc438 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -106,6 +106,7 @@ typedef void (*dev_page_free_t)(struct page *page, void *data);
  * @altmap: pre-allocated/reserved memory for vmemmap allocations
  * @res: physical address range covered by @ref
  * @ref: reference count that pins the devm_memremap_pages() mapping
+ * @kill: callback to transition @ref to the dead state
  * @dev: host device of the mapping for debug
  * @data: private data pointer for page_free()
  * @type: memory type: see MEMORY_* in memory_hotplug.h
@@ -117,6 +118,7 @@ struct dev_pagemap {
 	bool altmap_valid;
 	struct resource res;
 	struct percpu_ref *ref;
+	void (*kill)(struct percpu_ref *ref);
 	struct device *dev;
 	void *data;
 	enum memory_type type;
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 871d81bf0c69..ec7d8ca8db2c 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -88,14 +88,10 @@ static void devm_memremap_pages_release(void *data)
 	resource_size_t align_start, align_size;
 	unsigned long pfn;
 
+	pgmap->kill(pgmap->ref);
 	for_each_device_pfn(pfn, pgmap)
 		put_page(pfn_to_page(pfn));
 
-	if (percpu_ref_tryget_live(pgmap->ref)) {
-		dev_WARN(dev, "%s: page mapping is still live!\n", __func__);
-		percpu_ref_put(pgmap->ref);
-	}
-
 	/* pages are dead and unused, undo the arch mapping */
 	align_start = res->start & ~(SECTION_SIZE - 1);
 	align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
@@ -116,7 +112,7 @@ static void devm_memremap_pages_release(void *data)
 /**
  * devm_memremap_pages - remap and provide memmap backing for the given resource
  * @dev: hosting device for @res
- * @pgmap: pointer to a struct dev_pgmap
+ * @pgmap: pointer to a struct dev_pagemap
  *
  * Notes:
  * 1/ At a minimum the res, ref and type members of @pgmap must be initialized
@@ -125,11 +121,8 @@ static void devm_memremap_pages_release(void *data)
  * 2/ The altmap field may optionally be initialized, in which case altmap_valid
  *    must be set to true
  *
- * 3/ pgmap.ref must be 'live' on entry and 'dead' before devm_memunmap_pages()
- *    time (or devm release event). The expected order of events is that ref has
- *    been through percpu_ref_kill() before devm_memremap_pages_release(). The
- *    wait for the completion of all references being dropped and
- *    percpu_ref_exit() must occur after devm_memremap_pages_release().
+ * 3/ pgmap->ref must be 'live' on entry and will be killed at
+ *    devm_memremap_pages_release() time, or if this routine fails.
  *
  * 4/ res is expected to be a host memory range that could feasibly be
  *    treated as a "System RAM" range, i.e. not a device mmio range, but
@@ -145,6 +138,9 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 	pgprot_t pgprot = PAGE_KERNEL;
 	int error, nid, is_ram;
 
+	if (!pgmap->ref || !pgmap->kill)
+		return ERR_PTR(-EINVAL);
+
 	align_start = res->start & ~(SECTION_SIZE - 1);
 	align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
 		- align_start;
@@ -170,12 +166,10 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 	if (is_ram != REGION_DISJOINT) {
 		WARN_ONCE(1, "%s attempted on %s region %pr\n", __func__,
 				is_ram == REGION_MIXED ? "mixed" : "ram", res);
-		return ERR_PTR(-ENXIO);
+		error = -ENXIO;
+		goto err_array;
 	}
 
-	if (!pgmap->ref)
-		return ERR_PTR(-EINVAL);
-
 	pgmap->dev = dev;
 
 	error = xa_err(xa_store_range(&pgmap_array, PHYS_PFN(res->start),
@@ -216,7 +210,10 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 				align_start >> PAGE_SHIFT,
 				align_size >> PAGE_SHIFT, pgmap);
 
-	devm_add_action(dev, devm_memremap_pages_release, pgmap);
+	error = devm_add_action_or_reset(dev, devm_memremap_pages_release,
+			pgmap);
+	if (error)
+		return ERR_PTR(error);
 
 	return __va(res->start);
 
@@ -227,6 +224,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
  err_pfn_remap:
 	pgmap_array_delete(res);
  err_array:
+	pgmap->kill(pgmap->ref);
 	return ERR_PTR(error);
 }
 EXPORT_SYMBOL_GPL(devm_memremap_pages);
diff --git a/tools/testing/nvdimm/test/iomap.c b/tools/testing/nvdimm/test/iomap.c
index ed18a0cbc0c8..c6635fee27d8 100644
--- a/tools/testing/nvdimm/test/iomap.c
+++ b/tools/testing/nvdimm/test/iomap.c
@@ -104,13 +104,26 @@ void *__wrap_devm_memremap(struct device *dev, resource_size_t offset,
 }
 EXPORT_SYMBOL(__wrap_devm_memremap);
 
+static void nfit_test_kill(void *_pgmap)
+{
+	struct dev_pagemap *pgmap = _pgmap;
+
+	pgmap->kill(pgmap->ref);
+}
+
 void *__wrap_devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 {
 	resource_size_t offset = pgmap->res.start;
 	struct nfit_test_resource *nfit_res = get_nfit_res(offset);
 
-	if (nfit_res)
+	if (nfit_res) {
+		int rc;
+
+		rc = devm_add_action_or_reset(dev, nfit_test_kill, pgmap);
+		if (rc)
+			return ERR_PTR(rc);
 		return nfit_res->buf + offset - nfit_res->res.start;
+	}
 	return devm_memremap_pages(dev, pgmap);
 }
 EXPORT_SYMBOL_GPL(__wrap_devm_memremap_pages);


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v7 4/7] mm, devm_memremap_pages: Add MEMORY_DEVICE_PRIVATE support
  2018-10-12 17:49 [PATCH v7 0/7] mm: Merge hmm into devm_memremap_pages, mark GPL-only Dan Williams
                   ` (2 preceding siblings ...)
  2018-10-12 17:49 ` [PATCH v7 3/7] mm, devm_memremap_pages: Fix shutdown handling Dan Williams
@ 2018-10-12 17:49 ` Dan Williams
  2018-10-12 17:49 ` [PATCH v7 5/7] mm, hmm: Use devm semantics for hmm_devmem_{add, remove} Dan Williams
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2018-10-12 17:49 UTC (permalink / raw)
  To: akpm
  Cc: Jérôme Glisse, Christoph Hellwig, Logan Gunthorpe,
	Logan Gunthorpe, linux-mm, linux-kernel

In preparation for consolidating all ZONE_DEVICE enabling via
devm_memremap_pages(), teach it how to handle the constraints of
MEMORY_DEVICE_PRIVATE ranges.

Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
[jglisse: call move_pfn_range_to_zone for MEMORY_DEVICE_PRIVATE]
Acked-by: Christoph Hellwig <hch@lst.de>
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 kernel/memremap.c |   53 +++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 41 insertions(+), 12 deletions(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index ec7d8ca8db2c..7275a227c4ec 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -98,9 +98,15 @@ static void devm_memremap_pages_release(void *data)
 		- align_start;
 
 	mem_hotplug_begin();
-	arch_remove_memory(align_start, align_size, pgmap->altmap_valid ?
-			&pgmap->altmap : NULL);
-	kasan_remove_zero_shadow(__va(align_start), align_size);
+	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
+		pfn = align_start >> PAGE_SHIFT;
+		__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
+				align_size >> PAGE_SHIFT, NULL);
+	} else {
+		arch_remove_memory(align_start, align_size,
+				pgmap->altmap_valid ? &pgmap->altmap : NULL);
+		kasan_remove_zero_shadow(__va(align_start), align_size);
+	}
 	mem_hotplug_done();
 
 	untrack_pfn(NULL, PHYS_PFN(align_start), align_size);
@@ -187,17 +193,40 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 		goto err_pfn_remap;
 
 	mem_hotplug_begin();
-	error = kasan_add_zero_shadow(__va(align_start), align_size);
-	if (error) {
-		mem_hotplug_done();
-		goto err_kasan;
+
+	/*
+	 * For device private memory we call add_pages() as we only need to
+	 * allocate and initialize struct page for the device memory. More-
+	 * over the device memory is un-accessible thus we do not want to
+	 * create a linear mapping for the memory like arch_add_memory()
+	 * would do.
+	 *
+	 * For all other device memory types, which are accessible by
+	 * the CPU, we do want the linear mapping and thus use
+	 * arch_add_memory().
+	 */
+	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
+		error = add_pages(nid, align_start >> PAGE_SHIFT,
+				align_size >> PAGE_SHIFT, NULL, false);
+	} else {
+		error = kasan_add_zero_shadow(__va(align_start), align_size);
+		if (error) {
+			mem_hotplug_done();
+			goto err_kasan;
+		}
+
+		error = arch_add_memory(nid, align_start, align_size, altmap,
+				false);
+	}
+
+	if (!error) {
+		struct zone *zone;
+
+		zone = &NODE_DATA(nid)->node_zones[ZONE_DEVICE];
+		move_pfn_range_to_zone(zone, align_start >> PAGE_SHIFT,
+				align_size >> PAGE_SHIFT, altmap);
 	}
 
-	error = arch_add_memory(nid, align_start, align_size, altmap, false);
-	if (!error)
-		move_pfn_range_to_zone(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
-					align_start >> PAGE_SHIFT,
-					align_size >> PAGE_SHIFT, altmap);
 	mem_hotplug_done();
 	if (error)
 		goto err_add_memory;


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v7 5/7] mm, hmm: Use devm semantics for hmm_devmem_{add, remove}
  2018-10-12 17:49 [PATCH v7 0/7] mm: Merge hmm into devm_memremap_pages, mark GPL-only Dan Williams
                   ` (3 preceding siblings ...)
  2018-10-12 17:49 ` [PATCH v7 4/7] mm, devm_memremap_pages: Add MEMORY_DEVICE_PRIVATE support Dan Williams
@ 2018-10-12 17:49 ` Dan Williams
  2018-10-12 17:50 ` [PATCH v7 6/7] mm, hmm: Replace hmm_devmem_pages_create() with devm_memremap_pages() Dan Williams
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2018-10-12 17:49 UTC (permalink / raw)
  To: akpm
  Cc: Christoph Hellwig, Jérôme Glisse,
	Jérôme Glisse, Logan Gunthorpe, linux-mm, linux-kernel

devm semantics arrange for resources to be torn down when
device-driver-probe fails or when device-driver-release completes.
Similar to devm_memremap_pages() there is no need to support an explicit
remove operation when the users properly adhere to devm semantics.

Note that devm_kzalloc() automatically handles allocating node-local
memory.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/hmm.h |    4 --
 mm/hmm.c            |  127 ++++++++++-----------------------------------------
 2 files changed, 25 insertions(+), 106 deletions(-)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index dde947083d4e..5888ae9f6abf 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -499,8 +499,7 @@ struct hmm_devmem {
  * enough and allocate struct page for it.
  *
  * The device driver can wrap the hmm_devmem struct inside a private device
- * driver struct. The device driver must call hmm_devmem_remove() before the
- * device goes away and before freeing the hmm_devmem struct memory.
+ * driver struct.
  */
 struct hmm_devmem *hmm_devmem_add(const struct hmm_devmem_ops *ops,
 				  struct device *device,
@@ -508,7 +507,6 @@ struct hmm_devmem *hmm_devmem_add(const struct hmm_devmem_ops *ops,
 struct hmm_devmem *hmm_devmem_add_resource(const struct hmm_devmem_ops *ops,
 					   struct device *device,
 					   struct resource *res);
-void hmm_devmem_remove(struct hmm_devmem *devmem);
 
 /*
  * hmm_devmem_page_set_drvdata - set per-page driver data field
diff --git a/mm/hmm.c b/mm/hmm.c
index 774d684fa2b4..60e4b275ad78 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -939,7 +939,6 @@ static void hmm_devmem_ref_exit(void *data)
 
 	devmem = container_of(ref, struct hmm_devmem, ref);
 	percpu_ref_exit(ref);
-	devm_remove_action(devmem->device, &hmm_devmem_ref_exit, data);
 }
 
 static void hmm_devmem_ref_kill(void *data)
@@ -950,7 +949,6 @@ static void hmm_devmem_ref_kill(void *data)
 	devmem = container_of(ref, struct hmm_devmem, ref);
 	percpu_ref_kill(ref);
 	wait_for_completion(&devmem->completion);
-	devm_remove_action(devmem->device, &hmm_devmem_ref_kill, data);
 }
 
 static int hmm_devmem_fault(struct vm_area_struct *vma,
@@ -988,7 +986,7 @@ static void hmm_devmem_radix_release(struct resource *resource)
 	mutex_unlock(&hmm_devmem_lock);
 }
 
-static void hmm_devmem_release(struct device *dev, void *data)
+static void hmm_devmem_release(void *data)
 {
 	struct hmm_devmem *devmem = data;
 	struct resource *resource = devmem->resource;
@@ -996,11 +994,6 @@ static void hmm_devmem_release(struct device *dev, void *data)
 	struct zone *zone;
 	struct page *page;
 
-	if (percpu_ref_tryget_live(&devmem->ref)) {
-		dev_WARN(dev, "%s: page mapping is still live!\n", __func__);
-		percpu_ref_put(&devmem->ref);
-	}
-
 	/* pages are dead and unused, undo the arch mapping */
 	start_pfn = (resource->start & ~(PA_SECTION_SIZE - 1)) >> PAGE_SHIFT;
 	npages = ALIGN(resource_size(resource), PA_SECTION_SIZE) >> PAGE_SHIFT;
@@ -1126,19 +1119,6 @@ static int hmm_devmem_pages_create(struct hmm_devmem *devmem)
 	return ret;
 }
 
-static int hmm_devmem_match(struct device *dev, void *data, void *match_data)
-{
-	struct hmm_devmem *devmem = data;
-
-	return devmem->resource == match_data;
-}
-
-static void hmm_devmem_pages_remove(struct hmm_devmem *devmem)
-{
-	devres_release(devmem->device, &hmm_devmem_release,
-		       &hmm_devmem_match, devmem->resource);
-}
-
 /*
  * hmm_devmem_add() - hotplug ZONE_DEVICE memory for device memory
  *
@@ -1166,8 +1146,7 @@ struct hmm_devmem *hmm_devmem_add(const struct hmm_devmem_ops *ops,
 
 	dev_pagemap_get_ops();
 
-	devmem = devres_alloc_node(&hmm_devmem_release, sizeof(*devmem),
-				   GFP_KERNEL, dev_to_node(device));
+	devmem = devm_kzalloc(device, sizeof(*devmem), GFP_KERNEL);
 	if (!devmem)
 		return ERR_PTR(-ENOMEM);
 
@@ -1181,11 +1160,11 @@ struct hmm_devmem *hmm_devmem_add(const struct hmm_devmem_ops *ops,
 	ret = percpu_ref_init(&devmem->ref, &hmm_devmem_ref_release,
 			      0, GFP_KERNEL);
 	if (ret)
-		goto error_percpu_ref;
+		return ERR_PTR(ret);
 
-	ret = devm_add_action(device, hmm_devmem_ref_exit, &devmem->ref);
+	ret = devm_add_action_or_reset(device, hmm_devmem_ref_exit, &devmem->ref);
 	if (ret)
-		goto error_devm_add_action;
+		return ERR_PTR(ret);
 
 	size = ALIGN(size, PA_SECTION_SIZE);
 	addr = min((unsigned long)iomem_resource.end,
@@ -1205,16 +1184,12 @@ struct hmm_devmem *hmm_devmem_add(const struct hmm_devmem_ops *ops,
 
 		devmem->resource = devm_request_mem_region(device, addr, size,
 							   dev_name(device));
-		if (!devmem->resource) {
-			ret = -ENOMEM;
-			goto error_no_resource;
-		}
+		if (!devmem->resource)
+			return ERR_PTR(-ENOMEM);
 		break;
 	}
-	if (!devmem->resource) {
-		ret = -ERANGE;
-		goto error_no_resource;
-	}
+	if (!devmem->resource)
+		return ERR_PTR(-ERANGE);
 
 	devmem->resource->desc = IORES_DESC_DEVICE_PRIVATE_MEMORY;
 	devmem->pfn_first = devmem->resource->start >> PAGE_SHIFT;
@@ -1223,28 +1198,13 @@ struct hmm_devmem *hmm_devmem_add(const struct hmm_devmem_ops *ops,
 
 	ret = hmm_devmem_pages_create(devmem);
 	if (ret)
-		goto error_pages;
-
-	devres_add(device, devmem);
+		return ERR_PTR(ret);
 
-	ret = devm_add_action(device, hmm_devmem_ref_kill, &devmem->ref);
-	if (ret) {
-		hmm_devmem_remove(devmem);
+	ret = devm_add_action_or_reset(device, hmm_devmem_release, devmem);
+	if (ret)
 		return ERR_PTR(ret);
-	}
 
 	return devmem;
-
-error_pages:
-	devm_release_mem_region(device, devmem->resource->start,
-				resource_size(devmem->resource));
-error_no_resource:
-error_devm_add_action:
-	hmm_devmem_ref_kill(&devmem->ref);
-	hmm_devmem_ref_exit(&devmem->ref);
-error_percpu_ref:
-	devres_free(devmem);
-	return ERR_PTR(ret);
 }
 EXPORT_SYMBOL(hmm_devmem_add);
 
@@ -1260,8 +1220,7 @@ struct hmm_devmem *hmm_devmem_add_resource(const struct hmm_devmem_ops *ops,
 
 	dev_pagemap_get_ops();
 
-	devmem = devres_alloc_node(&hmm_devmem_release, sizeof(*devmem),
-				   GFP_KERNEL, dev_to_node(device));
+	devmem = devm_kzalloc(device, sizeof(*devmem), GFP_KERNEL);
 	if (!devmem)
 		return ERR_PTR(-ENOMEM);
 
@@ -1275,12 +1234,12 @@ struct hmm_devmem *hmm_devmem_add_resource(const struct hmm_devmem_ops *ops,
 	ret = percpu_ref_init(&devmem->ref, &hmm_devmem_ref_release,
 			      0, GFP_KERNEL);
 	if (ret)
-		goto error_percpu_ref;
+		return ERR_PTR(ret);
 
-	ret = devm_add_action(device, hmm_devmem_ref_exit, &devmem->ref);
+	ret = devm_add_action_or_reset(device, hmm_devmem_ref_exit,
+			&devmem->ref);
 	if (ret)
-		goto error_devm_add_action;
-
+		return ERR_PTR(ret);
 
 	devmem->pfn_first = devmem->resource->start >> PAGE_SHIFT;
 	devmem->pfn_last = devmem->pfn_first +
@@ -1288,59 +1247,21 @@ struct hmm_devmem *hmm_devmem_add_resource(const struct hmm_devmem_ops *ops,
 
 	ret = hmm_devmem_pages_create(devmem);
 	if (ret)
-		goto error_devm_add_action;
+		return ERR_PTR(ret);
 
-	devres_add(device, devmem);
+	ret = devm_add_action_or_reset(device, hmm_devmem_release, devmem);
+	if (ret)
+		return ERR_PTR(ret);
 
-	ret = devm_add_action(device, hmm_devmem_ref_kill, &devmem->ref);
-	if (ret) {
-		hmm_devmem_remove(devmem);
+	ret = devm_add_action_or_reset(device, hmm_devmem_ref_kill,
+			&devmem->ref);
+	if (ret)
 		return ERR_PTR(ret);
-	}
 
 	return devmem;
-
-error_devm_add_action:
-	hmm_devmem_ref_kill(&devmem->ref);
-	hmm_devmem_ref_exit(&devmem->ref);
-error_percpu_ref:
-	devres_free(devmem);
-	return ERR_PTR(ret);
 }
 EXPORT_SYMBOL(hmm_devmem_add_resource);
 
-/*
- * hmm_devmem_remove() - remove device memory (kill and free ZONE_DEVICE)
- *
- * @devmem: hmm_devmem struct use to track and manage the ZONE_DEVICE memory
- *
- * This will hot-unplug memory that was hotplugged by hmm_devmem_add on behalf
- * of the device driver. It will free struct page and remove the resource that
- * reserved the physical address range for this device memory.
- */
-void hmm_devmem_remove(struct hmm_devmem *devmem)
-{
-	resource_size_t start, size;
-	struct device *device;
-	bool cdm = false;
-
-	if (!devmem)
-		return;
-
-	device = devmem->device;
-	start = devmem->resource->start;
-	size = resource_size(devmem->resource);
-
-	cdm = devmem->resource->desc == IORES_DESC_DEVICE_PUBLIC_MEMORY;
-	hmm_devmem_ref_kill(&devmem->ref);
-	hmm_devmem_ref_exit(&devmem->ref);
-	hmm_devmem_pages_remove(devmem);
-
-	if (!cdm)
-		devm_release_mem_region(device, start, size);
-}
-EXPORT_SYMBOL(hmm_devmem_remove);
-
 /*
  * A device driver that wants to handle multiple devices memory through a
  * single fake device can use hmm_device to do so. This is purely a helper


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v7 6/7] mm, hmm: Replace hmm_devmem_pages_create() with devm_memremap_pages()
  2018-10-12 17:49 [PATCH v7 0/7] mm: Merge hmm into devm_memremap_pages, mark GPL-only Dan Williams
                   ` (4 preceding siblings ...)
  2018-10-12 17:49 ` [PATCH v7 5/7] mm, hmm: Use devm semantics for hmm_devmem_{add, remove} Dan Williams
@ 2018-10-12 17:50 ` Dan Williams
  2018-10-12 17:50 ` [PATCH v7 7/7] mm, hmm: Mark hmm_devmem_{add, add_resource} EXPORT_SYMBOL_GPL Dan Williams
  2018-10-12 18:14 ` [PATCH v7 0/7] mm: Merge hmm into devm_memremap_pages, mark GPL-only Jerome Glisse
  7 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2018-10-12 17:50 UTC (permalink / raw)
  To: akpm
  Cc: Christoph Hellwig, Jérôme Glisse, Balbir Singh,
	Logan Gunthorpe, linux-mm, linux-kernel

Commit e8d513483300 "memremap: change devm_memremap_pages interface to
use struct dev_pagemap" refactored devm_memremap_pages() to allow a
dev_pagemap instance to be supplied. Passing in a dev_pagemap interface
simplifies the design of pgmap type drivers in that they can rely on
container_of() to lookup any private data associated with the given
dev_pagemap instance.

In addition to the cleanups this also gives hmm users multi-order-radix
improvements that arrived with commit ab1b597ee0e4 "mm,
devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups"

As part of the conversion to the devm_memremap_pages() method of
handling the percpu_ref relative to when pages are put, the percpu_ref
completion needs to move to hmm_devmem_ref_exit(). See commit
71389703839e ("mm, zone_device: Replace {get, put}_zone_device_page...")
for details.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 mm/hmm.c |  196 ++++++++------------------------------------------------------
 1 file changed, 26 insertions(+), 170 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 60e4b275ad78..2e72cb4188ca 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -938,17 +938,16 @@ static void hmm_devmem_ref_exit(void *data)
 	struct hmm_devmem *devmem;
 
 	devmem = container_of(ref, struct hmm_devmem, ref);
+	wait_for_completion(&devmem->completion);
 	percpu_ref_exit(ref);
 }
 
-static void hmm_devmem_ref_kill(void *data)
+static void hmm_devmem_ref_kill(struct percpu_ref *ref)
 {
-	struct percpu_ref *ref = data;
 	struct hmm_devmem *devmem;
 
 	devmem = container_of(ref, struct hmm_devmem, ref);
 	percpu_ref_kill(ref);
-	wait_for_completion(&devmem->completion);
 }
 
 static int hmm_devmem_fault(struct vm_area_struct *vma,
@@ -971,154 +970,6 @@ static void hmm_devmem_free(struct page *page, void *data)
 	devmem->ops->free(devmem, page);
 }
 
-static DEFINE_MUTEX(hmm_devmem_lock);
-static RADIX_TREE(hmm_devmem_radix, GFP_KERNEL);
-
-static void hmm_devmem_radix_release(struct resource *resource)
-{
-	resource_size_t key;
-
-	mutex_lock(&hmm_devmem_lock);
-	for (key = resource->start;
-	     key <= resource->end;
-	     key += PA_SECTION_SIZE)
-		radix_tree_delete(&hmm_devmem_radix, key >> PA_SECTION_SHIFT);
-	mutex_unlock(&hmm_devmem_lock);
-}
-
-static void hmm_devmem_release(void *data)
-{
-	struct hmm_devmem *devmem = data;
-	struct resource *resource = devmem->resource;
-	unsigned long start_pfn, npages;
-	struct zone *zone;
-	struct page *page;
-
-	/* pages are dead and unused, undo the arch mapping */
-	start_pfn = (resource->start & ~(PA_SECTION_SIZE - 1)) >> PAGE_SHIFT;
-	npages = ALIGN(resource_size(resource), PA_SECTION_SIZE) >> PAGE_SHIFT;
-
-	page = pfn_to_page(start_pfn);
-	zone = page_zone(page);
-
-	mem_hotplug_begin();
-	if (resource->desc == IORES_DESC_DEVICE_PRIVATE_MEMORY)
-		__remove_pages(zone, start_pfn, npages, NULL);
-	else
-		arch_remove_memory(start_pfn << PAGE_SHIFT,
-				   npages << PAGE_SHIFT, NULL);
-	mem_hotplug_done();
-
-	hmm_devmem_radix_release(resource);
-}
-
-static int hmm_devmem_pages_create(struct hmm_devmem *devmem)
-{
-	resource_size_t key, align_start, align_size, align_end;
-	struct device *device = devmem->device;
-	int ret, nid, is_ram;
-
-	align_start = devmem->resource->start & ~(PA_SECTION_SIZE - 1);
-	align_size = ALIGN(devmem->resource->start +
-			   resource_size(devmem->resource),
-			   PA_SECTION_SIZE) - align_start;
-
-	is_ram = region_intersects(align_start, align_size,
-				   IORESOURCE_SYSTEM_RAM,
-				   IORES_DESC_NONE);
-	if (is_ram == REGION_MIXED) {
-		WARN_ONCE(1, "%s attempted on mixed region %pr\n",
-				__func__, devmem->resource);
-		return -ENXIO;
-	}
-	if (is_ram == REGION_INTERSECTS)
-		return -ENXIO;
-
-	if (devmem->resource->desc == IORES_DESC_DEVICE_PUBLIC_MEMORY)
-		devmem->pagemap.type = MEMORY_DEVICE_PUBLIC;
-	else
-		devmem->pagemap.type = MEMORY_DEVICE_PRIVATE;
-
-	devmem->pagemap.res = *devmem->resource;
-	devmem->pagemap.page_fault = hmm_devmem_fault;
-	devmem->pagemap.page_free = hmm_devmem_free;
-	devmem->pagemap.dev = devmem->device;
-	devmem->pagemap.ref = &devmem->ref;
-	devmem->pagemap.data = devmem;
-
-	mutex_lock(&hmm_devmem_lock);
-	align_end = align_start + align_size - 1;
-	for (key = align_start; key <= align_end; key += PA_SECTION_SIZE) {
-		struct hmm_devmem *dup;
-
-		dup = radix_tree_lookup(&hmm_devmem_radix,
-					key >> PA_SECTION_SHIFT);
-		if (dup) {
-			dev_err(device, "%s: collides with mapping for %s\n",
-				__func__, dev_name(dup->device));
-			mutex_unlock(&hmm_devmem_lock);
-			ret = -EBUSY;
-			goto error;
-		}
-		ret = radix_tree_insert(&hmm_devmem_radix,
-					key >> PA_SECTION_SHIFT,
-					devmem);
-		if (ret) {
-			dev_err(device, "%s: failed: %d\n", __func__, ret);
-			mutex_unlock(&hmm_devmem_lock);
-			goto error_radix;
-		}
-	}
-	mutex_unlock(&hmm_devmem_lock);
-
-	nid = dev_to_node(device);
-	if (nid < 0)
-		nid = numa_mem_id();
-
-	mem_hotplug_begin();
-	/*
-	 * For device private memory we call add_pages() as we only need to
-	 * allocate and initialize struct page for the device memory. More-
-	 * over the device memory is un-accessible thus we do not want to
-	 * create a linear mapping for the memory like arch_add_memory()
-	 * would do.
-	 *
-	 * For device public memory, which is accesible by the CPU, we do
-	 * want the linear mapping and thus use arch_add_memory().
-	 */
-	if (devmem->pagemap.type == MEMORY_DEVICE_PUBLIC)
-		ret = arch_add_memory(nid, align_start, align_size, NULL,
-				false);
-	else
-		ret = add_pages(nid, align_start >> PAGE_SHIFT,
-				align_size >> PAGE_SHIFT, NULL, false);
-	if (ret) {
-		mem_hotplug_done();
-		goto error_add_memory;
-	}
-	move_pfn_range_to_zone(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
-				align_start >> PAGE_SHIFT,
-				align_size >> PAGE_SHIFT, NULL);
-	mem_hotplug_done();
-
-	/*
-	 * Initialization of the pages has been deferred until now in order
-	 * to allow us to do the work while not holding the hotplug lock.
-	 */
-	memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
-				align_start >> PAGE_SHIFT,
-				align_size >> PAGE_SHIFT, &devmem->pagemap);
-
-	return 0;
-
-error_add_memory:
-	untrack_pfn(NULL, PHYS_PFN(align_start), align_size);
-error_radix:
-	hmm_devmem_radix_release(devmem->resource);
-error:
-	return ret;
-}
-
 /*
  * hmm_devmem_add() - hotplug ZONE_DEVICE memory for device memory
  *
@@ -1142,6 +993,7 @@ struct hmm_devmem *hmm_devmem_add(const struct hmm_devmem_ops *ops,
 {
 	struct hmm_devmem *devmem;
 	resource_size_t addr;
+	void *result;
 	int ret;
 
 	dev_pagemap_get_ops();
@@ -1196,14 +1048,18 @@ struct hmm_devmem *hmm_devmem_add(const struct hmm_devmem_ops *ops,
 	devmem->pfn_last = devmem->pfn_first +
 			   (resource_size(devmem->resource) >> PAGE_SHIFT);
 
-	ret = hmm_devmem_pages_create(devmem);
-	if (ret)
-		return ERR_PTR(ret);
-
-	ret = devm_add_action_or_reset(device, hmm_devmem_release, devmem);
-	if (ret)
-		return ERR_PTR(ret);
+	devmem->pagemap.type = MEMORY_DEVICE_PRIVATE;
+	devmem->pagemap.res = *devmem->resource;
+	devmem->pagemap.page_fault = hmm_devmem_fault;
+	devmem->pagemap.page_free = hmm_devmem_free;
+	devmem->pagemap.altmap_valid = false;
+	devmem->pagemap.ref = &devmem->ref;
+	devmem->pagemap.data = devmem;
+	devmem->pagemap.kill = hmm_devmem_ref_kill;
 
+	result = devm_memremap_pages(devmem->device, &devmem->pagemap);
+	if (IS_ERR(result))
+		return result;
 	return devmem;
 }
 EXPORT_SYMBOL(hmm_devmem_add);
@@ -1213,6 +1069,7 @@ struct hmm_devmem *hmm_devmem_add_resource(const struct hmm_devmem_ops *ops,
 					   struct resource *res)
 {
 	struct hmm_devmem *devmem;
+	void *result;
 	int ret;
 
 	if (res->desc != IORES_DESC_DEVICE_PUBLIC_MEMORY)
@@ -1245,19 +1102,18 @@ struct hmm_devmem *hmm_devmem_add_resource(const struct hmm_devmem_ops *ops,
 	devmem->pfn_last = devmem->pfn_first +
 			   (resource_size(devmem->resource) >> PAGE_SHIFT);
 
-	ret = hmm_devmem_pages_create(devmem);
-	if (ret)
-		return ERR_PTR(ret);
-
-	ret = devm_add_action_or_reset(device, hmm_devmem_release, devmem);
-	if (ret)
-		return ERR_PTR(ret);
-
-	ret = devm_add_action_or_reset(device, hmm_devmem_ref_kill,
-			&devmem->ref);
-	if (ret)
-		return ERR_PTR(ret);
+	devmem->pagemap.type = MEMORY_DEVICE_PUBLIC;
+	devmem->pagemap.res = *devmem->resource;
+	devmem->pagemap.page_fault = hmm_devmem_fault;
+	devmem->pagemap.page_free = hmm_devmem_free;
+	devmem->pagemap.altmap_valid = false;
+	devmem->pagemap.ref = &devmem->ref;
+	devmem->pagemap.data = devmem;
+	devmem->pagemap.kill = hmm_devmem_ref_kill;
 
+	result = devm_memremap_pages(devmem->device, &devmem->pagemap);
+	if (IS_ERR(result))
+		return result;
 	return devmem;
 }
 EXPORT_SYMBOL(hmm_devmem_add_resource);


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v7 7/7] mm, hmm: Mark hmm_devmem_{add, add_resource} EXPORT_SYMBOL_GPL
  2018-10-12 17:49 [PATCH v7 0/7] mm: Merge hmm into devm_memremap_pages, mark GPL-only Dan Williams
                   ` (5 preceding siblings ...)
  2018-10-12 17:50 ` [PATCH v7 6/7] mm, hmm: Replace hmm_devmem_pages_create() with devm_memremap_pages() Dan Williams
@ 2018-10-12 17:50 ` Dan Williams
  2018-10-12 18:14 ` [PATCH v7 0/7] mm: Merge hmm into devm_memremap_pages, mark GPL-only Jerome Glisse
  7 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2018-10-12 17:50 UTC (permalink / raw)
  To: akpm
  Cc: Jérôme Glisse, Logan Gunthorpe, Christoph Hellwig,
	linux-mm, linux-kernel

The routines hmm_devmem_add(), and hmm_devmem_add_resource() duplicated
devm_memremap_pages() and are now simple now wrappers around the core
facility to inject a dev_pagemap instance into the global pgmap_radix
and hook page-idle events. The devm_memremap_pages() interface is base
infrastructure for HMM. HMM has more and deeper ties into the kernel
memory management implementation than base ZONE_DEVICE which is itself a
EXPORT_SYMBOL_GPL facility.

Originally, the HMM page structure creation routines copied the
devm_memremap_pages() code and reused ZONE_DEVICE. A cleanup to unify
the implementations was discussed during the initial review:
http://lkml.iu.edu/hypermail/linux/kernel/1701.2/00812.html
Recent work to extend devm_memremap_pages() for the peer-to-peer-DMA
facility enabled this cleanup to move forward.

In addition to the integration with devm_memremap_pages() HMM depends on
other GPL-only symbols:

    mmu_notifier_unregister_no_release
    percpu_ref
    region_intersects
    __class_create

It goes further to consume / indirectly expose functionality that is not
exported to any other driver:

    alloc_pages_vma
    walk_page_range

HMM is derived from devm_memremap_pages(), and extends deep core-kernel
fundamentals. Similar to devm_memremap_pages(), mark its entry points
EXPORT_SYMBOL_GPL().

Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 mm/hmm.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 2e72cb4188ca..90d1383c7e24 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -1062,7 +1062,7 @@ struct hmm_devmem *hmm_devmem_add(const struct hmm_devmem_ops *ops,
 		return result;
 	return devmem;
 }
-EXPORT_SYMBOL(hmm_devmem_add);
+EXPORT_SYMBOL_GPL(hmm_devmem_add);
 
 struct hmm_devmem *hmm_devmem_add_resource(const struct hmm_devmem_ops *ops,
 					   struct device *device,
@@ -1116,7 +1116,7 @@ struct hmm_devmem *hmm_devmem_add_resource(const struct hmm_devmem_ops *ops,
 		return result;
 	return devmem;
 }
-EXPORT_SYMBOL(hmm_devmem_add_resource);
+EXPORT_SYMBOL_GPL(hmm_devmem_add_resource);
 
 /*
  * A device driver that wants to handle multiple devices memory through a


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v7 0/7] mm: Merge hmm into devm_memremap_pages, mark GPL-only
  2018-10-12 17:49 [PATCH v7 0/7] mm: Merge hmm into devm_memremap_pages, mark GPL-only Dan Williams
                   ` (6 preceding siblings ...)
  2018-10-12 17:50 ` [PATCH v7 7/7] mm, hmm: Mark hmm_devmem_{add, add_resource} EXPORT_SYMBOL_GPL Dan Williams
@ 2018-10-12 18:14 ` Jerome Glisse
  7 siblings, 0 replies; 12+ messages in thread
From: Jerome Glisse @ 2018-10-12 18:14 UTC (permalink / raw)
  To: Dan Williams
  Cc: akpm, stable, Balbir Singh, Logan Gunthorpe, Christoph Hellwig,
	Michal Hocko, linux-mm, linux-kernel

On Fri, Oct 12, 2018 at 10:49:31AM -0700, Dan Williams wrote:
> [ apologies for the resend, script error ]
> 
> Changes since v6 [1]:
> * Rebase on next-20181008 and fixup conflicts with the xarray conversion
>   and hotplug optimizations
> * It has soaked on a 0day visible branch for a few days without any
>   reports.
> 
> [1]: https://lkml.org/lkml/2018/9/13/104
> 
> ---
> 
> Hi Andrew,
> 
> Jérôme has reviewed the cleanups, thanks Jérôme. We still disagree on
> the EXPORT_SYMBOL_GPL status of the core HMM implementation, but Logan,
> Christoph and I continue to support marking all devm_memremap_pages()
> derivatives EXPORT_SYMBOL_GPL.
> 
> HMM has been upstream for over a year, with no in-tree users it is clear
> it was designed first and foremost for out of tree drivers. It takes
> advantage of a facility Christoph and I spearheaded to support
> persistent memory. It continues to see expanding use cases with no clear
> end date when it will stop attracting features / revisions. It is not
> suitable to export devm_memremap_pages() as a stable 3rd party driver
> api.
> 
> devm_memremap_pages() is a facility that can create struct page entries
> for any arbitrary range and give out-of-tree drivers the ability to
> subvert core aspects of page management. It, and anything derived from
> it (e.g. hmm, pcip2p, etc...), is a deep integration point into the core
> kernel, and an EXPORT_SYMBOL_GPL() interface. 
> 
> Commit 31c5bda3a656 "mm: fix exports that inadvertently make put_page()
> EXPORT_SYMBOL_GPL" was merged ahead of this series to relieve some of
> the pressure from innocent consumers of put_page(), but now we need this
> series to address *producers* of device pages.
> 
> More details and justification in the changelogs. The 0day
> infrastructure has reported success across 152 configs and this survives
> the libnvdimm unit test suite. Aside from the controversial bits the
> diffstat is compelling at:
> 
> 	7 files changed, 126 insertions(+), 323 deletions(-)
> 
> Note that the series has some minor collisions with Alex's recent series
> to improve devm_memremap_pages() scalability [2]. So, whichever you take
> first the other will need a minor rebase.
> 
> [2]: https://www.lkml.org/lkml/2018/9/11/10

I am fine with this patchset going in (i reviewed it and tested it with
HMM on nouveau), Dan and Christoph did author the original code around
devm_memremap_pages() and thus ultimately they are the one who should
decide over GPL export or not.

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v7 1/7] mm, devm_memremap_pages: Mark devm_memremap_pages() EXPORT_SYMBOL_GPL
  2018-10-12 17:49 ` [PATCH v7 1/7] mm, devm_memremap_pages: Mark devm_memremap_pages() EXPORT_SYMBOL_GPL Dan Williams
@ 2018-10-17  8:17   ` Michal Hocko
  2018-10-17 16:30     ` Dan Williams
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2018-10-17  8:17 UTC (permalink / raw)
  To: Dan Williams
  Cc: akpm, Jérôme Glisse, Christoph Hellwig, linux-mm, linux-kernel

On Fri 12-10-18 10:49:37, Dan Williams wrote:
> devm_memremap_pages() is a facility that can create struct page entries
> for any arbitrary range and give drivers the ability to subvert core
> aspects of page management.
> 
> Specifically the facility is tightly integrated with the kernel's memory
> hotplug functionality. It injects an altmap argument deep into the
> architecture specific vmemmap implementation to allow allocating from
> specific reserved pages, and it has Linux specific assumptions about
> page structure reference counting relative to get_user_pages() and
> get_user_pages_fast(). It was an oversight and a mistake that this was
> not marked EXPORT_SYMBOL_GPL from the outset.

One thing is still not clear to me. Both devm_memremap_* and
hmm_devmem_add essentially do the same thing AFAICS. They both allow to
hotplug a device memory. Both rely on the hotplug code (namely
add_pages) which itself is not exported to modules. One is GPL only
while the later is a general export. Is this mismatch desirable?

API exported by the core hotplug is ad-hoc to say the least. Symbols
that we actually export are GPL mostly (only try_offline_node is
EXPORT_SYMBOL without any explanation whatsoever). So I would call it a
general mess tweaked for specific usecases.

I personally do not care about EXPORT_SYMBOL vs. EXPORT_SYMBOL_GPL
much to be honest. I understand an argument that we do not care about
out-of-tree modules a wee bit. I would just be worried those will find a
way around and my experience tells me that it would be much uglier than
what the core kernel can provide. But this seems more political than
technical discussion.
 
> Again, devm_memremap_pagex() exposes and relies upon core kernel
> internal assumptions and will continue to evolve along with 'struct
> page', memory hotplug, and support for new memory types / topologies.
> Only an in-kernel GPL-only driver is expected to keep up with this
> ongoing evolution. This interface, and functionality derived from this
> interface, is not suitable for kernel-external drivers.

I do not follow this line of argumentation though. We generally do not
care about out-of-tree modules and breaking them if the interface has to
be updated. Also what about GPL out of tree modules?

That being said, I do not mind this patch. You and Christoph are the
authors and therefore it is you to decide. I just find the current
situation confusing to say the least.

> Cc: Michal Hocko <mhocko@suse.com>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  kernel/memremap.c                 |    2 +-
>  tools/testing/nvdimm/test/iomap.c |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/memremap.c b/kernel/memremap.c
> index 6ec81e0d7a33..1bbb2e892941 100644
> --- a/kernel/memremap.c
> +++ b/kernel/memremap.c
> @@ -232,7 +232,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
>   err_array:
>  	return ERR_PTR(error);
>  }
> -EXPORT_SYMBOL(devm_memremap_pages);
> +EXPORT_SYMBOL_GPL(devm_memremap_pages);
>  
>  unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
>  {
> diff --git a/tools/testing/nvdimm/test/iomap.c b/tools/testing/nvdimm/test/iomap.c
> index ff9d3a5825e1..ed18a0cbc0c8 100644
> --- a/tools/testing/nvdimm/test/iomap.c
> +++ b/tools/testing/nvdimm/test/iomap.c
> @@ -113,7 +113,7 @@ void *__wrap_devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
>  		return nfit_res->buf + offset - nfit_res->res.start;
>  	return devm_memremap_pages(dev, pgmap);
>  }
> -EXPORT_SYMBOL(__wrap_devm_memremap_pages);
> +EXPORT_SYMBOL_GPL(__wrap_devm_memremap_pages);
>  
>  pfn_t __wrap_phys_to_pfn_t(phys_addr_t addr, unsigned long flags)
>  {
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v7 1/7] mm, devm_memremap_pages: Mark devm_memremap_pages() EXPORT_SYMBOL_GPL
  2018-10-17  8:17   ` Michal Hocko
@ 2018-10-17 16:30     ` Dan Williams
  2018-10-23 17:01       ` Michal Hocko
  0 siblings, 1 reply; 12+ messages in thread
From: Dan Williams @ 2018-10-17 16:30 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Jérôme Glisse, Christoph Hellwig,
	Linux MM, Linux Kernel Mailing List

On Wed, Oct 17, 2018 at 1:18 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Fri 12-10-18 10:49:37, Dan Williams wrote:
> > devm_memremap_pages() is a facility that can create struct page entries
> > for any arbitrary range and give drivers the ability to subvert core
> > aspects of page management.
> >
> > Specifically the facility is tightly integrated with the kernel's memory
> > hotplug functionality. It injects an altmap argument deep into the
> > architecture specific vmemmap implementation to allow allocating from
> > specific reserved pages, and it has Linux specific assumptions about
> > page structure reference counting relative to get_user_pages() and
> > get_user_pages_fast(). It was an oversight and a mistake that this was
> > not marked EXPORT_SYMBOL_GPL from the outset.
>
> One thing is still not clear to me. Both devm_memremap_* and
> hmm_devmem_add essentially do the same thing AFAICS. They both allow to
> hotplug a device memory. Both rely on the hotplug code (namely
> add_pages) which itself is not exported to modules. One is GPL only
> while the later is a general export. Is this mismatch desirable?

That is resolved by the last patch in this series.

> API exported by the core hotplug is ad-hoc to say the least. Symbols
> that we actually export are GPL mostly (only try_offline_node is
> EXPORT_SYMBOL without any explanation whatsoever). So I would call it a
> general mess tweaked for specific usecases.
>
> I personally do not care about EXPORT_SYMBOL vs. EXPORT_SYMBOL_GPL
> much to be honest. I understand an argument that we do not care about
> out-of-tree modules a wee bit. I would just be worried those will find a
> way around and my experience tells me that it would be much uglier than
> what the core kernel can provide. But this seems more political than
> technical discussion.

I'm not worried about people finding a way around, I'm sure
cringe-worthy workarounds of core kernel details happen on a regular
basis in out-of-tree code. EXPORT_SYMBOL_GPL in this instance is not a
political statement, it is a statement of the rate of evolution and
depth of the api.

>
> > Again, devm_memremap_pagex() exposes and relies upon core kernel
> > internal assumptions and will continue to evolve along with 'struct
> > page', memory hotplug, and support for new memory types / topologies.
> > Only an in-kernel GPL-only driver is expected to keep up with this
> > ongoing evolution. This interface, and functionality derived from this
> > interface, is not suitable for kernel-external drivers.
>
> I do not follow this line of argumentation though. We generally do not
> care about out-of-tree modules and breaking them if the interface has to
> be updated.

Exactly right. The EXPORT_SYMBOL_GPL is there to say that this api has
deep enough ties into the core kernel to lower the confidence that the
API will stay stable from one kernel revision to the next. It's also
an api that is attracting widening and varied reuse and the long term
health of the implementation depends on being able to peer deeply into
its users and refactor the interface and the core kernel as a result.

> Also what about GPL out of tree modules?

These are precisely the modules we want upstream.

> That being said, I do not mind this patch. You and Christoph are the
> authors and therefore it is you to decide. I just find the current
> situation confusing to say the least.

Hopefully I clarified, let me know if not.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v7 1/7] mm, devm_memremap_pages: Mark devm_memremap_pages() EXPORT_SYMBOL_GPL
  2018-10-17 16:30     ` Dan Williams
@ 2018-10-23 17:01       ` Michal Hocko
  0 siblings, 0 replies; 12+ messages in thread
From: Michal Hocko @ 2018-10-23 17:01 UTC (permalink / raw)
  To: Dan Williams
  Cc: Andrew Morton, Jérôme Glisse, Christoph Hellwig,
	Linux MM, Linux Kernel Mailing List

On Wed 17-10-18 09:30:58, Dan Williams wrote:
> On Wed, Oct 17, 2018 at 1:18 AM Michal Hocko <mhocko@kernel.org> wrote:
[...]
> > > Again, devm_memremap_pagex() exposes and relies upon core kernel
> > > internal assumptions and will continue to evolve along with 'struct
> > > page', memory hotplug, and support for new memory types / topologies.
> > > Only an in-kernel GPL-only driver is expected to keep up with this
> > > ongoing evolution. This interface, and functionality derived from this
> > > interface, is not suitable for kernel-external drivers.
> >
> > I do not follow this line of argumentation though. We generally do not
> > care about out-of-tree modules and breaking them if the interface has to
> > be updated.
> 
> Exactly right. The EXPORT_SYMBOL_GPL is there to say that this api has
> deep enough ties into the core kernel to lower the confidence that the
> API will stay stable from one kernel revision to the next. It's also
> an api that is attracting widening and varied reuse and the long term
> health of the implementation depends on being able to peer deeply into
> its users and refactor the interface and the core kernel as a result.

I am sorry I do not follow. For in-tree modules you have to update users
whether the export is GPL or not and we do not care _at all_ about out
of tree because we do not guarantee _any_ kABI/API stability
(Documentation/process/stable-api-nonsense.rst).

Anyway, I do not really care much, but I find the way of the
argumentation dubious. I can clearly understand a simple line "me as the
author want this GPL - live with that".
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-10-23 17:01 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-12 17:49 [PATCH v7 0/7] mm: Merge hmm into devm_memremap_pages, mark GPL-only Dan Williams
2018-10-12 17:49 ` [PATCH v7 1/7] mm, devm_memremap_pages: Mark devm_memremap_pages() EXPORT_SYMBOL_GPL Dan Williams
2018-10-17  8:17   ` Michal Hocko
2018-10-17 16:30     ` Dan Williams
2018-10-23 17:01       ` Michal Hocko
2018-10-12 17:49 ` [PATCH v7 2/7] mm, devm_memremap_pages: Kill mapping "System RAM" support Dan Williams
2018-10-12 17:49 ` [PATCH v7 3/7] mm, devm_memremap_pages: Fix shutdown handling Dan Williams
2018-10-12 17:49 ` [PATCH v7 4/7] mm, devm_memremap_pages: Add MEMORY_DEVICE_PRIVATE support Dan Williams
2018-10-12 17:49 ` [PATCH v7 5/7] mm, hmm: Use devm semantics for hmm_devmem_{add, remove} Dan Williams
2018-10-12 17:50 ` [PATCH v7 6/7] mm, hmm: Replace hmm_devmem_pages_create() with devm_memremap_pages() Dan Williams
2018-10-12 17:50 ` [PATCH v7 7/7] mm, hmm: Mark hmm_devmem_{add, add_resource} EXPORT_SYMBOL_GPL Dan Williams
2018-10-12 18:14 ` [PATCH v7 0/7] mm: Merge hmm into devm_memremap_pages, mark GPL-only Jerome Glisse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).