* [PATCH v5 00/17] device-dax: support sub-dividing soft-reserved ranges
From: Dan Williams @ 2020-09-25 19:11 UTC
  To: akpm
  Cc: David Hildenbrand, Ira Weiny, Bjorn Helgaas, Vishal Verma,
	Dave Hansen, David Airlie, Vivek Goyal, Joao Martins, Dave Jiang,
	Jonathan Cameron, Greg Kroah-Hartman, Pavel Tatashin, Hulk Robot,
	Ben Skeggs, Benjamin Herrenschmidt, Jia He,
	Jérôme Glisse, Jason Yan, Paul Mackerras,
	Boris Ostrovsky, Brice Goglin, Stefano Stabellini,
	Michael Ellerman, Juergen Gross, Daniel Vetter, linux-mm,
	linux-nvdimm, linux-kernel

Changes since v4 [1]:
- Rebased on
  device-dax-move-instance-creation-parameters-to-struct-dev_dax_data.patch
  in -mm [2]. I.e. patches that did not need fixups from v4 are not
  included.

- Folded all fixes

- Replaced "device-dax: kill dax_kmem_res" with:

      device-dax/kmem: introduce dax_kmem_range()
      device-dax/kmem: move resource name tracking to drvdata
      device-dax/kmem: replace release_resource() with release_mem_region()

  ...to address David's request to make those cleanups easier to review.
  Note that I dropped changes to how IORESOURCE_BUSY is manipulated since
  David and I are still debating the best way forward there.

- Broke out some of the dax-bus reworks in "device-dax: introduce 'seed'
  devices" into a new patch: "device-dax: introduce 'struct dev_dax'
  typed-driver operations"

- Added a conversion of xen_alloc_unallocated_pages() from pgmap.res to
  pgmap.range. One oddity I noticed: there is no corresponding
  memunmap_pages() triggered by xen_free_unallocated_pages(); is that
  intentional?

- Not included: a conversion of virtio_fs to use pgmap.range for its new
  usage of devm_memremap_pages(). It appears the virtio_fs changes are
  merged after -mm? My mental model of -mm was that it applies on top of
  linux-next. In any event, Vivek, you will need to coordinate a
  conversion to pgmap.range for the virtio_fs dax-support merge. Maybe
  that should go through Andrew as well?

- Lowercased all the subject lines per akpm's preference

- Received a 0day robot build-success notification over 122 configs

- Thanks to Joao for looking after this set while I was out.

[1]: http://lore.kernel.org/r/159625229779.3040297.11363509688097221416.stgit@dwillia2-desk3.amr.corp.intel.com
[2]: https://ozlabs.org/~akpm/mmots/broken-out/device-dax-move-instance-creation-parameters-to-struct-dev_dax_data.patch

---

Andrew, this series replaces

device-dax-make-pgmap-optional-for-instance-creation.patch

...through...

dax-hmem-introduce-dax_hmemregion_idle-parameter.patch

...in your stack.

Let me know if there is a different / preferred way to refresh a bulk of
patches in your queue when only a subset need updates.

---

The device-dax facility allows an address range to be directly mapped
through a chardev, or optionally hotplugged to the core kernel page
allocator as System-RAM. It is the mechanism for converting persistent
memory (pmem) for use as another volatile memory pool, i.e. the current
Memory Tiering hot topic on linux-mm.

In the case of pmem, the nvdimm namespace-label mechanism can sub-divide
it, but that labeling mechanism is neither available nor applicable to
soft-reserved ("EFI specific purpose") memory [3]. This series provides
a sysfs mechanism for the daxctl utility to enable provisioning of
volatile soft-reserved memory ranges.
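
As a rough sketch of the intended userspace flow (illustrative only: the
device name, sysfs path, and a writable 'size' attribute all depend on
later patches in this series, and daxctl wraps this interface in
practice):

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    /* carve 1 GiB out of a dynamic region by resizing a dax instance */
    int main(void)
    {
            const char *size = "1073741824";        /* 1 GiB */
            int fd = open("/sys/bus/dax/devices/dax0.0/size", O_WRONLY);

            if (fd < 0)
                    return 1;
            if (write(fd, size, strlen(size)) < 0) {
                    close(fd);
                    return 1;
            }
            close(fd);
            return 0;
    }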

The motivations for this facility are:

1/ Allow performance differentiated memory ranges to be split between
   kernel-managed and directly-accessed use cases.

2/ Allow physical memory to be provisioned along performance relevant
   address boundaries. For example, divide a memory-side cache [4] along
   cache-color boundaries.

3/ Parcel out soft-reserved memory to VMs using device-dax as a security
   / permissions boundary [5]. Specifically I have seen people (ab)using
   memmap=nn!ss (mark System-RAM as Persistent Memory) just to get the
   device-dax interface on custom address ranges. A follow-on for the VM
   use case is to teach device-dax to dynamically allocate 'struct page' at
   runtime to reduce the duplication of 'struct page' space in both the
   guest and the host kernel for the same physical pages.

[3]: http://lore.kernel.org/r/157309097008.1579826.12818463304589384434.stgit@dwillia2-desk3.amr.corp.intel.com
[4]: http://lore.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.com
[5]: http://lore.kernel.org/r/20200110190313.17144-1-joao.m.martins@oracle.com

---

Dan Williams (14):
      device-dax: make pgmap optional for instance creation
      device-dax/kmem: introduce dax_kmem_range()
      device-dax/kmem: move resource name tracking to drvdata
      device-dax/kmem: replace release_resource() with release_mem_region()
      device-dax: add an allocation interface for device-dax instances
      device-dax: introduce 'struct dev_dax' typed-driver operations
      device-dax: introduce 'seed' devices
      drivers/base: make device_find_child_by_name() compatible with sysfs inputs
      device-dax: add resize support
      mm/memremap_pages: convert to 'struct range'
      mm/memremap_pages: support multiple ranges per invocation
      device-dax: add dis-contiguous resource support
      device-dax: introduce 'mapping' devices
      device-dax: add an 'align' attribute

Joao Martins (3):
      device-dax: make align a per-device property
      dax/hmem: introduce dax_hmem.region_idle parameter
      device-dax: add a range mapping allocation attribute


 arch/powerpc/kvm/book3s_hv_uvmem.c     |   14 
 drivers/base/core.c                    |    2 
 drivers/dax/bus.c                      | 1039 ++++++++++++++++++++++++++++++--
 drivers/dax/bus.h                      |   11 
 drivers/dax/dax-private.h              |   58 ++
 drivers/dax/device.c                   |  112 ++-
 drivers/dax/hmem/hmem.c                |   17 -
 drivers/dax/kmem.c                     |  178 +++--
 drivers/dax/pmem/compat.c              |    2 
 drivers/dax/pmem/core.c                |   14 
 drivers/gpu/drm/nouveau/nouveau_dmem.c |   15 
 drivers/nvdimm/badrange.c              |   26 -
 drivers/nvdimm/claim.c                 |   13 
 drivers/nvdimm/nd.h                    |    3 
 drivers/nvdimm/pfn_devs.c              |   13 
 drivers/nvdimm/pmem.c                  |   27 -
 drivers/nvdimm/region.c                |   21 -
 drivers/pci/p2pdma.c                   |   12 
 drivers/xen/unpopulated-alloc.c        |   45 +
 include/linux/memremap.h               |   11 
 include/linux/range.h                  |    6 
 lib/test_hmm.c                         |   15 
 mm/memremap.c                          |  299 +++++----
 tools/testing/nvdimm/dax-dev.c         |   22 -
 tools/testing/nvdimm/test/iomap.c      |    2 
 25 files changed, 1557 insertions(+), 420 deletions(-)

base-commit: 6764736525f27a411ba2c0c430aaa2df7375f3ac


* [PATCH v5 01/17] device-dax: make pgmap optional for instance creation
From: Dan Williams @ 2020-09-25 19:11 UTC
  To: akpm
  Cc: David Hildenbrand, Vishal Verma, Dave Hansen, Pavel Tatashin,
	Brice Goglin, Dave Jiang, Ira Weiny, Jia He, Joao Martins,
	Jonathan Cameron, linux-mm, linux-nvdimm, linux-kernel

The passed-in dev_pagemap is only required in the pmem case, as the
libnvdimm core may have reserved a vmem_altmap for devm_memremap_pages()
to place the memmap in pmem directly.  In the hmem case there is no
agent reserving an altmap, so it can all be handled by a core-internal
default.

Pass the resource range via a new @range property of 'struct
dev_dax_data'.
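
For illustration, a minimal sketch of a pgmap-less caller after this
change (a fragment mirroring the hmem conversion below; when no pgmap is
passed, dev_dax_probe() allocates a default one):

    struct dev_dax_data data = {
            .dax_region = dax_region,
            .id = 0,
            /* .pgmap left NULL: the core builds a default at probe time */
            .range = {
                    .start = res->start,
                    .end = res->end,
            },
    };
    struct dev_dax *dev_dax = devm_create_dev_dax(&data);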

Link: https://lkml.kernel.org/r/159643099958.4062302.10379230791041872886.stgit@dwillia2-desk3.amr.corp.intel.com
Cc: David Hildenbrand <david@redhat.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jia He <justin.he@arm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/bus.c              |   29 +++++++++++++++--------------
 drivers/dax/bus.h              |    2 ++
 drivers/dax/dax-private.h      |    9 ++++++++-
 drivers/dax/device.c           |   28 +++++++++++++++++++---------
 drivers/dax/hmem/hmem.c        |    8 ++++----
 drivers/dax/kmem.c             |   12 ++++++------
 drivers/dax/pmem/core.c        |    4 ++++
 tools/testing/nvdimm/dax-dev.c |    8 ++++----
 8 files changed, 62 insertions(+), 38 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index dffa4655e128..96bd64ba95a5 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -271,7 +271,7 @@ static ssize_t size_show(struct device *dev,
 		struct device_attribute *attr, char *buf)
 {
 	struct dev_dax *dev_dax = to_dev_dax(dev);
-	unsigned long long size = resource_size(&dev_dax->region->res);
+	unsigned long long size = range_len(&dev_dax->range);
 
 	return sprintf(buf, "%llu\n", size);
 }
@@ -293,19 +293,12 @@ static ssize_t target_node_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(target_node);
 
-static unsigned long long dev_dax_resource(struct dev_dax *dev_dax)
-{
-	struct dax_region *dax_region = dev_dax->region;
-
-	return dax_region->res.start;
-}
-
 static ssize_t resource_show(struct device *dev,
 		struct device_attribute *attr, char *buf)
 {
 	struct dev_dax *dev_dax = to_dev_dax(dev);
 
-	return sprintf(buf, "%#llx\n", dev_dax_resource(dev_dax));
+	return sprintf(buf, "%#llx\n", dev_dax->range.start);
 }
 static DEVICE_ATTR(resource, 0400, resource_show, NULL);
 
@@ -376,6 +369,7 @@ static void dev_dax_release(struct device *dev)
 
 	dax_region_put(dax_region);
 	put_dax(dax_dev);
+	kfree(dev_dax->pgmap);
 	kfree(dev_dax);
 }
 
@@ -412,7 +406,12 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
 	if (!dev_dax)
 		return ERR_PTR(-ENOMEM);
 
-	memcpy(&dev_dax->pgmap, data->pgmap, sizeof(struct dev_pagemap));
+	if (data->pgmap) {
+		dev_dax->pgmap = kmemdup(data->pgmap,
+				sizeof(struct dev_pagemap), GFP_KERNEL);
+		if (!dev_dax->pgmap)
+			goto err_pgmap;
+	}
 
 	/*
 	 * No 'host' or dax_operations since there is no access to this
@@ -421,18 +420,19 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
 	dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
 	if (IS_ERR(dax_dev)) {
 		rc = PTR_ERR(dax_dev);
-		goto err;
+		goto err_alloc_dax;
 	}
 
 	/* a device_dax instance is dead while the driver is not attached */
 	kill_dax(dax_dev);
 
-	/* from here on we're committed to teardown via dax_dev_release() */
+	/* from here on we're committed to teardown via dev_dax_release() */
 	dev = &dev_dax->dev;
 	device_initialize(dev);
 
 	dev_dax->dax_dev = dax_dev;
 	dev_dax->region = dax_region;
+	dev_dax->range = data->range;
 	dev_dax->target_node = dax_region->target_node;
 	kref_get(&dax_region->kref);
 
@@ -458,8 +458,9 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
 		return ERR_PTR(rc);
 
 	return dev_dax;
-
- err:
+err_alloc_dax:
+	kfree(dev_dax->pgmap);
+err_pgmap:
 	kfree(dev_dax);
 
 	return ERR_PTR(rc);
diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
index 299c2e7fac09..4aeb36da83a4 100644
--- a/drivers/dax/bus.h
+++ b/drivers/dax/bus.h
@@ -3,6 +3,7 @@
 #ifndef __DAX_BUS_H__
 #define __DAX_BUS_H__
 #include <linux/device.h>
+#include <linux/range.h>
 
 struct dev_dax;
 struct resource;
@@ -21,6 +22,7 @@ struct dev_dax_data {
 	struct dax_region *dax_region;
 	struct dev_pagemap *pgmap;
 	enum dev_dax_subsys subsys;
+	struct range range;
 	int id;
 };
 
diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
index 8a4c40ccd2ef..6779f683671d 100644
--- a/drivers/dax/dax-private.h
+++ b/drivers/dax/dax-private.h
@@ -41,6 +41,7 @@ struct dax_region {
  * @target_node: effective numa node if dev_dax memory range is onlined
  * @dev - device core
  * @pgmap - pgmap for memmap setup / lifetime (driver owned)
+ * @range: resource range for the instance
  * @dax_mem_res: physical address range of hotadded DAX memory
  * @dax_mem_name: name for hotadded DAX memory via add_memory_driver_managed()
  */
@@ -49,10 +50,16 @@ struct dev_dax {
 	struct dax_device *dax_dev;
 	int target_node;
 	struct device dev;
-	struct dev_pagemap pgmap;
+	struct dev_pagemap *pgmap;
+	struct range range;
 	struct resource *dax_kmem_res;
 };
 
+static inline u64 range_len(struct range *range)
+{
+	return range->end - range->start + 1;
+}
+
 static inline struct dev_dax *to_dev_dax(struct device *dev)
 {
 	return container_of(dev, struct dev_dax, dev);
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index c528b725789b..287cf0a3db23 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -55,12 +55,12 @@ static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma,
 __weak phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff,
 		unsigned long size)
 {
-	struct resource *res = &dev_dax->region->res;
+	struct range *range = &dev_dax->range;
 	phys_addr_t phys;
 
-	phys = pgoff * PAGE_SIZE + res->start;
-	if (phys >= res->start && phys <= res->end) {
-		if (phys + size - 1 <= res->end)
+	phys = pgoff * PAGE_SIZE + range->start;
+	if (phys >= range->start && phys <= range->end) {
+		if (phys + size - 1 <= range->end)
 			return phys;
 	}
 
@@ -396,21 +396,31 @@ int dev_dax_probe(struct device *dev)
 {
 	struct dev_dax *dev_dax = to_dev_dax(dev);
 	struct dax_device *dax_dev = dev_dax->dax_dev;
-	struct resource *res = &dev_dax->region->res;
+	struct range *range = &dev_dax->range;
+	struct dev_pagemap *pgmap;
 	struct inode *inode;
 	struct cdev *cdev;
 	void *addr;
 	int rc;
 
 	/* 1:1 map region resource range to device-dax instance range */
-	if (!devm_request_mem_region(dev, res->start, resource_size(res),
+	if (!devm_request_mem_region(dev, range->start, range_len(range),
 				dev_name(dev))) {
-		dev_warn(dev, "could not reserve region %pR\n", res);
+		dev_warn(dev, "could not reserve range: %#llx - %#llx\n",
+				range->start, range->end);
 		return -EBUSY;
 	}
 
-	dev_dax->pgmap.type = MEMORY_DEVICE_GENERIC;
-	addr = devm_memremap_pages(dev, &dev_dax->pgmap);
+	pgmap = dev_dax->pgmap;
+	if (!pgmap) {
+		pgmap = devm_kzalloc(dev, sizeof(*pgmap), GFP_KERNEL);
+		if (!pgmap)
+			return -ENOMEM;
+		pgmap->res.start = range->start;
+		pgmap->res.end = range->end;
+	}
+	pgmap->type = MEMORY_DEVICE_GENERIC;
+	addr = devm_memremap_pages(dev, pgmap);
 	if (IS_ERR(addr))
 		return PTR_ERR(addr);
 
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index b84fe17178d8..af82d6ba820a 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -8,7 +8,6 @@
 static int dax_hmem_probe(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
-	struct dev_pagemap pgmap = { };
 	struct dax_region *dax_region;
 	struct memregion_info *mri;
 	struct dev_dax_data data;
@@ -20,8 +19,6 @@ static int dax_hmem_probe(struct platform_device *pdev)
 		return -ENOMEM;
 
 	mri = dev->platform_data;
-	memcpy(&pgmap.res, res, sizeof(*res));
-
 	dax_region = alloc_dax_region(dev, pdev->id, res, mri->target_node,
 			PMD_SIZE);
 	if (!dax_region)
@@ -30,7 +27,10 @@ static int dax_hmem_probe(struct platform_device *pdev)
 	data = (struct dev_dax_data) {
 		.dax_region = dax_region,
 		.id = 0,
-		.pgmap = &pgmap,
+		.range = {
+			.start = res->start,
+			.end = res->end,
+		},
 	};
 	dev_dax = devm_create_dev_dax(&data);
 	if (IS_ERR(dev_dax))
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 275aa5f87399..5bb133df147d 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -22,7 +22,7 @@ static bool any_hotremove_failed;
 int dev_dax_kmem_probe(struct device *dev)
 {
 	struct dev_dax *dev_dax = to_dev_dax(dev);
-	struct resource *res = &dev_dax->region->res;
+	struct range *range = &dev_dax->range;
 	resource_size_t kmem_start;
 	resource_size_t kmem_size;
 	resource_size_t kmem_end;
@@ -39,17 +39,17 @@ int dev_dax_kmem_probe(struct device *dev)
 	 */
 	numa_node = dev_dax->target_node;
 	if (numa_node < 0) {
-		dev_warn(dev, "rejecting DAX region %pR with invalid node: %d\n",
-			 res, numa_node);
+		dev_warn(dev, "rejecting DAX region with invalid node: %d\n",
+				numa_node);
 		return -EINVAL;
 	}
 
 	/* Hotplug starting at the beginning of the next block: */
-	kmem_start = ALIGN(res->start, memory_block_size_bytes());
+	kmem_start = ALIGN(range->start, memory_block_size_bytes());
 
-	kmem_size = resource_size(res);
+	kmem_size = range_len(range);
 	/* Adjust the size down to compensate for moving up kmem_start: */
-	kmem_size -= kmem_start - res->start;
+	kmem_size -= kmem_start - range->start;
 	/* Align the size down to cover only complete blocks: */
 	kmem_size &= ~(memory_block_size_bytes() - 1);
 	kmem_end = kmem_start + kmem_size;
diff --git a/drivers/dax/pmem/core.c b/drivers/dax/pmem/core.c
index 08ee5947a49c..4fa81d3d2f65 100644
--- a/drivers/dax/pmem/core.c
+++ b/drivers/dax/pmem/core.c
@@ -63,6 +63,10 @@ struct dev_dax *__dax_pmem_probe(struct device *dev, enum dev_dax_subsys subsys)
 		.id = id,
 		.pgmap = &pgmap,
 		.subsys = subsys,
+		.range = {
+			.start = res.start,
+			.end = res.end,
+		},
 	};
 	dev_dax = devm_create_dev_dax(&data);
 
diff --git a/tools/testing/nvdimm/dax-dev.c b/tools/testing/nvdimm/dax-dev.c
index 7e5d979e73cb..38d8e55c4a0d 100644
--- a/tools/testing/nvdimm/dax-dev.c
+++ b/tools/testing/nvdimm/dax-dev.c
@@ -9,12 +9,12 @@
 phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff,
 		unsigned long size)
 {
-	struct resource *res = &dev_dax->region->res;
+	struct range *range = &dev_dax->range;
 	phys_addr_t addr;
 
-	addr = pgoff * PAGE_SIZE + res->start;
-	if (addr >= res->start && addr <= res->end) {
-		if (addr + size - 1 <= res->end) {
+	addr = pgoff * PAGE_SIZE + range->start;
+	if (addr >= range->start && addr <= range->end) {
+		if (addr + size - 1 <= range->end) {
 			if (get_nfit_res(addr)) {
 				struct page *page;
 



* [PATCH v5 02/17] device-dax/kmem: introduce dax_kmem_range()
From: Dan Williams @ 2020-09-25 19:11 UTC
  To: akpm
  Cc: David Hildenbrand, Vishal Verma, Dave Hansen, Pavel Tatashin,
	Brice Goglin, Dave Jiang, Ira Weiny, Jia He, Joao Martins,
	Jonathan Cameron, linux-mm, linux-nvdimm, linux-kernel

Towards removing the mode-specific @dax_kmem_res attribute from the
generic 'struct dev_dax', and preparing for multi-range support, teach
the driver to calculate the hotplug range from the device range. The
hotplug range is simply the memory-block-size-aligned version of the
device range.
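
A worked example of that calculation (a standalone sketch; the 128 MiB
block size and the sample device range are assumptions for
illustration):

    #include <stdint.h>
    #include <stdio.h>

    #define ALIGN(x, a)      (((x) + (a) - 1) & ~((a) - 1))
    #define ALIGN_DOWN(x, a) ((x) & ~((a) - 1))

    int main(void)
    {
            uint64_t block = 128ULL << 20;  /* assumed memory_block_size_bytes() */
            uint64_t dev_start = 0x14800000, dev_end = 0x4c7fffff;
            uint64_t start = ALIGN(dev_start, block);
            uint64_t end = ALIGN_DOWN(dev_end + 1, block) - 1;

            /* prints 0x18000000-0x47ffffff: strictly inside the device range */
            printf("hotplug range: %#llx-%#llx\n",
                   (unsigned long long)start, (unsigned long long)end);
            return 0;
    }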

Cc: David Hildenbrand <david@redhat.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jia He <justin.he@arm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/kmem.c |   40 +++++++++++++++++-----------------------
 1 file changed, 17 insertions(+), 23 deletions(-)

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 5bb133df147d..b0d6a99cf12d 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -19,13 +19,20 @@ static const char *kmem_name;
 /* Set if any memory will remain added when the driver will be unloaded. */
 static bool any_hotremove_failed;
 
+static struct range dax_kmem_range(struct dev_dax *dev_dax)
+{
+	struct range range;
+
+	/* memory-block align the hotplug range */
+	range.start = ALIGN(dev_dax->range.start, memory_block_size_bytes());
+	range.end = ALIGN_DOWN(dev_dax->range.end + 1, memory_block_size_bytes()) - 1;
+	return range;
+}
+
 int dev_dax_kmem_probe(struct device *dev)
 {
 	struct dev_dax *dev_dax = to_dev_dax(dev);
-	struct range *range = &dev_dax->range;
-	resource_size_t kmem_start;
-	resource_size_t kmem_size;
-	resource_size_t kmem_end;
+	struct range range = dax_kmem_range(dev_dax);
 	struct resource *new_res;
 	const char *new_res_name;
 	int numa_node;
@@ -44,25 +51,14 @@ int dev_dax_kmem_probe(struct device *dev)
 		return -EINVAL;
 	}
 
-	/* Hotplug starting at the beginning of the next block: */
-	kmem_start = ALIGN(range->start, memory_block_size_bytes());
-
-	kmem_size = range_len(range);
-	/* Adjust the size down to compensate for moving up kmem_start: */
-	kmem_size -= kmem_start - range->start;
-	/* Align the size down to cover only complete blocks: */
-	kmem_size &= ~(memory_block_size_bytes() - 1);
-	kmem_end = kmem_start + kmem_size;
-
 	new_res_name = kstrdup(dev_name(dev), GFP_KERNEL);
 	if (!new_res_name)
 		return -ENOMEM;
 
 	/* Region is permanently reserved if hotremove fails. */
-	new_res = request_mem_region(kmem_start, kmem_size, new_res_name);
+	new_res = request_mem_region(range.start, range_len(&range), new_res_name);
 	if (!new_res) {
-		dev_warn(dev, "could not reserve region [%pa-%pa]\n",
-			 &kmem_start, &kmem_end);
+		dev_warn(dev, "could not reserve region [%#llx-%#llx]\n", range.start, range.end);
 		kfree(new_res_name);
 		return -EBUSY;
 	}
@@ -96,9 +92,8 @@ int dev_dax_kmem_probe(struct device *dev)
 static int dev_dax_kmem_remove(struct device *dev)
 {
 	struct dev_dax *dev_dax = to_dev_dax(dev);
+	struct range range = dax_kmem_range(dev_dax);
 	struct resource *res = dev_dax->dax_kmem_res;
-	resource_size_t kmem_start = res->start;
-	resource_size_t kmem_size = resource_size(res);
 	const char *res_name = res->name;
 	int rc;
 
@@ -108,12 +103,11 @@ static int dev_dax_kmem_remove(struct device *dev)
 	 * there is no way to hotremove this memory until reboot because device
 	 * unbind will succeed even if we return failure.
 	 */
-	rc = remove_memory(dev_dax->target_node, kmem_start, kmem_size);
+	rc = remove_memory(dev_dax->target_node, range.start, range_len(&range));
 	if (rc) {
 		any_hotremove_failed = true;
-		dev_err(dev,
-			"DAX region %pR cannot be hotremoved until the next reboot\n",
-			res);
+		dev_err(dev, "%#llx-%#llx cannot be hotremoved until the next reboot\n",
+				range.start, range.end);
 		return rc;
 	}
 



* [PATCH v5 03/17] device-dax/kmem: move resource name tracking to drvdata
From: Dan Williams @ 2020-09-25 19:11 UTC
  To: akpm
  Cc: David Hildenbrand, Vishal Verma, Dave Hansen, Pavel Tatashin,
	Brice Goglin, Dave Jiang, Ira Weiny, Jia He, Joao Martins,
	Jonathan Cameron, linux-mm, linux-nvdimm, linux-kernel

Towards removing the mode-specific @dax_kmem_res attribute from the
generic 'struct dev_dax', and preparing for multi-range support, move
resource name tracking to driver data.  The memory for the resource name
needs a lifetime separate from the device-bind lifetime, for cases where
the driver is unbound but the kmem range could not be unplugged from the
page allocator.
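
A condensed sketch of the resulting lifetime split (my_probe/my_remove
are placeholder names; the request_mem_region() and hotplug steps are
elided):

    static int my_probe(struct device *dev)
    {
            /* the name string is owned by drvdata, not the resource */
            char *res_name = kstrdup(dev_name(dev), GFP_KERNEL);

            if (!res_name)
                    return -ENOMEM;
            dev_set_drvdata(dev, res_name);
            return 0;
    }

    static int my_remove(struct device *dev)
    {
            const char *res_name = dev_get_drvdata(dev);

            /* only freed once the memory is actually unplugged */
            kfree(res_name);
            return 0;
    }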

Cc: David Hildenbrand <david@redhat.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jia He <justin.he@arm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/kmem.c |   16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index b0d6a99cf12d..6fe2cb1c5f7c 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -34,7 +34,7 @@ int dev_dax_kmem_probe(struct device *dev)
 	struct dev_dax *dev_dax = to_dev_dax(dev);
 	struct range range = dax_kmem_range(dev_dax);
 	struct resource *new_res;
-	const char *new_res_name;
+	char *res_name;
 	int numa_node;
 	int rc;
 
@@ -51,15 +51,15 @@ int dev_dax_kmem_probe(struct device *dev)
 		return -EINVAL;
 	}
 
-	new_res_name = kstrdup(dev_name(dev), GFP_KERNEL);
-	if (!new_res_name)
+	res_name = kstrdup(dev_name(dev), GFP_KERNEL);
+	if (!res_name)
 		return -ENOMEM;
 
 	/* Region is permanently reserved if hotremove fails. */
-	new_res = request_mem_region(range.start, range_len(&range), new_res_name);
+	new_res = request_mem_region(range.start, range_len(&range), res_name);
 	if (!new_res) {
 		dev_warn(dev, "could not reserve region [%#llx-%#llx]\n", range.start, range.end);
-		kfree(new_res_name);
+		kfree(res_name);
 		return -EBUSY;
 	}
 
@@ -80,9 +80,11 @@ int dev_dax_kmem_probe(struct device *dev)
 	if (rc) {
 		release_resource(new_res);
 		kfree(new_res);
-		kfree(new_res_name);
+		kfree(res_name);
 		return rc;
 	}
+
+	dev_set_drvdata(dev, res_name);
 	dev_dax->dax_kmem_res = new_res;
 
 	return 0;
@@ -94,7 +96,7 @@ static int dev_dax_kmem_remove(struct device *dev)
 	struct dev_dax *dev_dax = to_dev_dax(dev);
 	struct range range = dax_kmem_range(dev_dax);
 	struct resource *res = dev_dax->dax_kmem_res;
-	const char *res_name = res->name;
+	const char *res_name = dev_get_drvdata(dev);
 	int rc;
 
 	/*



* [PATCH v5 04/17] device-dax/kmem: replace release_resource() with release_mem_region()
From: Dan Williams @ 2020-09-25 19:12 UTC
  To: akpm
  Cc: David Hildenbrand, Vishal Verma, Dave Hansen, Pavel Tatashin,
	Brice Goglin, Dave Jiang, Ira Weiny, Jia He, Joao Martins,
	Jonathan Cameron, linux-mm, linux-nvdimm, linux-kernel

Towards removing the mode-specific @dax_kmem_res attribute from the
generic 'struct dev_dax', and preparing for multi-range support, change
the kmem driver to use the idiomatic release_mem_region() to pair with
the initial request_mem_region(). This also eliminates the need to
open-code the release of the resource allocated by request_mem_region().

As there are no more dax_kmem_res users, delete the struct member.
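
A condensed sketch of the pairing this patch adopts (the hotplug steps
in between are elided):

    /* no 'struct resource *' needs to be retained; the range suffices */
    if (!request_mem_region(range.start, range_len(&range), res_name))
            return -EBUSY;
    /* ... add_memory_driver_managed() ... */

    /* teardown: release_mem_region() removes and frees the resource,
     * replacing open-coded release_resource() + kfree() */
    release_mem_region(range.start, range_len(&range));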

Cc: David Hildenbrand <david@redhat.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jia He <justin.he@arm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/dax-private.h |    3 ---
 drivers/dax/kmem.c        |   20 +++++++-------------
 2 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
index 6779f683671d..12a2dbc43b40 100644
--- a/drivers/dax/dax-private.h
+++ b/drivers/dax/dax-private.h
@@ -42,8 +42,6 @@ struct dax_region {
  * @dev - device core
  * @pgmap - pgmap for memmap setup / lifetime (driver owned)
  * @range: resource range for the instance
- * @dax_mem_res: physical address range of hotadded DAX memory
- * @dax_mem_name: name for hotadded DAX memory via add_memory_driver_managed()
  */
 struct dev_dax {
 	struct dax_region *region;
@@ -52,7 +50,6 @@ struct dev_dax {
 	struct device dev;
 	struct dev_pagemap *pgmap;
 	struct range range;
-	struct resource *dax_kmem_res;
 };
 
 static inline u64 range_len(struct range *range)
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 6fe2cb1c5f7c..e56fc688bdc5 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -33,7 +33,7 @@ int dev_dax_kmem_probe(struct device *dev)
 {
 	struct dev_dax *dev_dax = to_dev_dax(dev);
 	struct range range = dax_kmem_range(dev_dax);
-	struct resource *new_res;
+	struct resource *res;
 	char *res_name;
 	int numa_node;
 	int rc;
@@ -56,8 +56,8 @@ int dev_dax_kmem_probe(struct device *dev)
 		return -ENOMEM;
 
 	/* Region is permanently reserved if hotremove fails. */
-	new_res = request_mem_region(range.start, range_len(&range), res_name);
-	if (!new_res) {
+	res = request_mem_region(range.start, range_len(&range), res_name);
+	if (!res) {
 		dev_warn(dev, "could not reserve region [%#llx-%#llx]\n", range.start, range.end);
 		kfree(res_name);
 		return -EBUSY;
@@ -69,23 +69,20 @@ int dev_dax_kmem_probe(struct device *dev)
 	 * inherit flags from the parent since it may set new flags
 	 * unknown to us that will break add_memory() below.
 	 */
-	new_res->flags = IORESOURCE_SYSTEM_RAM;
+	res->flags = IORESOURCE_SYSTEM_RAM;
 
 	/*
 	 * Ensure that future kexec'd kernels will not treat this as RAM
 	 * automatically.
 	 */
-	rc = add_memory_driver_managed(numa_node, new_res->start,
-				       resource_size(new_res), kmem_name);
+	rc = add_memory_driver_managed(numa_node, range.start, range_len(&range), kmem_name);
 	if (rc) {
-		release_resource(new_res);
-		kfree(new_res);
+		release_mem_region(range.start, range_len(&range));
 		kfree(res_name);
 		return rc;
 	}
 
 	dev_set_drvdata(dev, res_name);
-	dev_dax->dax_kmem_res = new_res;
 
 	return 0;
 }
@@ -95,7 +92,6 @@ static int dev_dax_kmem_remove(struct device *dev)
 {
 	struct dev_dax *dev_dax = to_dev_dax(dev);
 	struct range range = dax_kmem_range(dev_dax);
-	struct resource *res = dev_dax->dax_kmem_res;
 	const char *res_name = dev_get_drvdata(dev);
 	int rc;
 
@@ -114,10 +110,8 @@ static int dev_dax_kmem_remove(struct device *dev)
 	}
 
 	/* Release and free dax resources */
-	release_resource(res);
-	kfree(res);
+	release_mem_region(range.start, range_len(&range));
 	kfree(res_name);
-	dev_dax->dax_kmem_res = NULL;
 
 	return 0;
 }



* [PATCH v5 05/17] device-dax: add an allocation interface for device-dax instances
From: Dan Williams @ 2020-09-25 19:12 UTC
  To: akpm
  Cc: Vishal Verma, Brice Goglin, Dave Hansen, Dave Jiang,
	David Hildenbrand, Ira Weiny, Jia He, Joao Martins,
	Jonathan Cameron, linux-mm, linux-nvdimm, linux-kernel

In preparation for a facility that enables dax regions to be sub-divided,
introduce infrastructure to track and allocate region capacity.

The new dax_region/available_size attribute is only enabled for volatile
hmem devices, not pmem devices that are defined by nvdimm namespace
boundaries.  This is per Jeff's feedback from the last time dynamic
device-dax capacity allocation support was discussed.
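
For illustration, reading the new attribute from userspace might look
like this (the region sysfs path here is a hypothetical example):

    #include <stdio.h>

    int main(void)
    {
            unsigned long long avail;
            FILE *f = fopen("/sys/devices/platform/hmem.0/dax_region/available_size", "r");

            if (!f)
                    return 1;
            if (fscanf(f, "%llu", &avail) != 1) {
                    fclose(f);
                    return 1;
            }
            fclose(f);
            printf("available: %llu bytes\n", avail);
            return 0;
    }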

Link: https://lore.kernel.org/linux-nvdimm/x49shpp3zn8.fsf@segfault.boston.devel.redhat.com
Link: https://lkml.kernel.org/r/159643101035.4062302.6785857915652647857.stgit@dwillia2-desk3.amr.corp.intel.com
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jia He <justin.he@arm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/bus.c         |  120 +++++++++++++++++++++++++++++++++++++++++----
 drivers/dax/bus.h         |    7 ++-
 drivers/dax/dax-private.h |    2 -
 drivers/dax/hmem/hmem.c   |    7 +--
 drivers/dax/pmem/core.c   |    8 +--
 5 files changed, 121 insertions(+), 23 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 96bd64ba95a5..0a48ce378686 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -130,6 +130,11 @@ ATTRIBUTE_GROUPS(dax_drv);
 
 static int dax_bus_match(struct device *dev, struct device_driver *drv);
 
+static bool is_static(struct dax_region *dax_region)
+{
+	return (dax_region->res.flags & IORESOURCE_DAX_STATIC) != 0;
+}
+
 static struct bus_type dax_bus_type = {
 	.name = "dax",
 	.uevent = dax_bus_uevent,
@@ -185,7 +190,48 @@ static ssize_t align_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(align);
 
+#define for_each_dax_region_resource(dax_region, res) \
+	for (res = (dax_region)->res.child; res; res = res->sibling)
+
+static unsigned long long dax_region_avail_size(struct dax_region *dax_region)
+{
+	resource_size_t size = resource_size(&dax_region->res);
+	struct resource *res;
+
+	device_lock_assert(dax_region->dev);
+
+	for_each_dax_region_resource(dax_region, res)
+		size -= resource_size(res);
+	return size;
+}
+
+static ssize_t available_size_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct dax_region *dax_region = dev_get_drvdata(dev);
+	unsigned long long size;
+
+	device_lock(dev);
+	size = dax_region_avail_size(dax_region);
+	device_unlock(dev);
+
+	return sprintf(buf, "%llu\n", size);
+}
+static DEVICE_ATTR_RO(available_size);
+
+static umode_t dax_region_visible(struct kobject *kobj, struct attribute *a,
+		int n)
+{
+	struct device *dev = container_of(kobj, struct device, kobj);
+	struct dax_region *dax_region = dev_get_drvdata(dev);
+
+	if (is_static(dax_region) && a == &dev_attr_available_size.attr)
+		return 0;
+	return a->mode;
+}
+
 static struct attribute *dax_region_attributes[] = {
+	&dev_attr_available_size.attr,
 	&dev_attr_region_size.attr,
 	&dev_attr_align.attr,
 	&dev_attr_id.attr,
@@ -195,6 +241,7 @@ static struct attribute *dax_region_attributes[] = {
 static const struct attribute_group dax_region_attribute_group = {
 	.name = "dax_region",
 	.attrs = dax_region_attributes,
+	.is_visible = dax_region_visible,
 };
 
 static const struct attribute_group *dax_region_attribute_groups[] = {
@@ -226,7 +273,8 @@ static void dax_region_unregister(void *region)
 }
 
 struct dax_region *alloc_dax_region(struct device *parent, int region_id,
-		struct resource *res, int target_node, unsigned int align)
+		struct resource *res, int target_node, unsigned int align,
+		unsigned long flags)
 {
 	struct dax_region *dax_region;
 
@@ -249,12 +297,17 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
 		return NULL;
 
 	dev_set_drvdata(parent, dax_region);
-	memcpy(&dax_region->res, res, sizeof(*res));
 	kref_init(&dax_region->kref);
 	dax_region->id = region_id;
 	dax_region->align = align;
 	dax_region->dev = parent;
 	dax_region->target_node = target_node;
+	dax_region->res = (struct resource) {
+		.start = res->start,
+		.end = res->end,
+		.flags = IORESOURCE_MEM | flags,
+	};
+
 	if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups)) {
 		kfree(dax_region);
 		return NULL;
@@ -267,6 +320,32 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
 }
 EXPORT_SYMBOL_GPL(alloc_dax_region);
 
+static int alloc_dev_dax_range(struct dev_dax *dev_dax, resource_size_t size)
+{
+	struct dax_region *dax_region = dev_dax->region;
+	struct resource *res = &dax_region->res;
+	struct device *dev = &dev_dax->dev;
+	struct resource *alloc;
+
+	device_lock_assert(dax_region->dev);
+
+	/* TODO: handle multiple allocations per region */
+	if (res->child)
+		return -ENOMEM;
+
+	alloc = __request_region(res, res->start, size, dev_name(dev), 0);
+
+	if (!alloc)
+		return -ENOMEM;
+
+	dev_dax->range = (struct range) {
+		.start = alloc->start,
+		.end = alloc->end,
+	};
+
+	return 0;
+}
+
 static ssize_t size_show(struct device *dev,
 		struct device_attribute *attr, char *buf)
 {
@@ -361,6 +440,15 @@ void kill_dev_dax(struct dev_dax *dev_dax)
 }
 EXPORT_SYMBOL_GPL(kill_dev_dax);
 
+static void free_dev_dax_range(struct dev_dax *dev_dax)
+{
+	struct dax_region *dax_region = dev_dax->region;
+	struct range *range = &dev_dax->range;
+
+	device_lock_assert(dax_region->dev);
+	__release_region(&dax_region->res, range->start, range_len(range));
+}
+
 static void dev_dax_release(struct device *dev)
 {
 	struct dev_dax *dev_dax = to_dev_dax(dev);
@@ -385,6 +473,7 @@ static void unregister_dev_dax(void *dev)
 	dev_dbg(dev, "%s\n", __func__);
 
 	kill_dev_dax(dev_dax);
+	free_dev_dax_range(dev_dax);
 	device_del(dev);
 	put_device(dev);
 }
@@ -397,7 +486,7 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
 	struct dev_dax *dev_dax;
 	struct inode *inode;
 	struct device *dev;
-	int rc = -ENOMEM;
+	int rc;
 
 	if (data->id < 0)
 		return ERR_PTR(-EINVAL);
@@ -406,11 +495,25 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
 	if (!dev_dax)
 		return ERR_PTR(-ENOMEM);
 
+	dev_dax->region = dax_region;
+	dev = &dev_dax->dev;
+	device_initialize(dev);
+	dev_set_name(dev, "dax%d.%d", dax_region->id, data->id);
+
+	rc = alloc_dev_dax_range(dev_dax, data->size);
+	if (rc)
+		goto err_range;
+
 	if (data->pgmap) {
+		dev_WARN_ONCE(parent, !is_static(dax_region),
+			"custom dev_pagemap requires a static dax_region\n");
+
 		dev_dax->pgmap = kmemdup(data->pgmap,
 				sizeof(struct dev_pagemap), GFP_KERNEL);
-		if (!dev_dax->pgmap)
+		if (!dev_dax->pgmap) {
+			rc = -ENOMEM;
 			goto err_pgmap;
+		}
 	}
 
 	/*
@@ -427,12 +530,7 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
 	kill_dax(dax_dev);
 
 	/* from here on we're committed to teardown via dev_dax_release() */
-	dev = &dev_dax->dev;
-	device_initialize(dev);
-
 	dev_dax->dax_dev = dax_dev;
-	dev_dax->region = dax_region;
-	dev_dax->range = data->range;
 	dev_dax->target_node = dax_region->target_node;
 	kref_get(&dax_region->kref);
 
@@ -444,7 +542,6 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
 		dev->class = dax_class;
 	dev->parent = parent;
 	dev->type = &dev_dax_type;
-	dev_set_name(dev, "dax%d.%d", dax_region->id, data->id);
 
 	rc = device_add(dev);
 	if (rc) {
@@ -458,9 +555,12 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
 		return ERR_PTR(rc);
 
 	return dev_dax;
+
 err_alloc_dax:
 	kfree(dev_dax->pgmap);
 err_pgmap:
+	free_dev_dax_range(dev_dax);
+err_range:
 	kfree(dev_dax);
 
 	return ERR_PTR(rc);
diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
index 4aeb36da83a4..44592a8cac0f 100644
--- a/drivers/dax/bus.h
+++ b/drivers/dax/bus.h
@@ -10,8 +10,11 @@ struct resource;
 struct dax_device;
 struct dax_region;
 void dax_region_put(struct dax_region *dax_region);
+
+#define IORESOURCE_DAX_STATIC (1UL << 0)
 struct dax_region *alloc_dax_region(struct device *parent, int region_id,
-		struct resource *res, int target_node, unsigned int align);
+		struct resource *res, int target_node, unsigned int align,
+		unsigned long flags);
 
 enum dev_dax_subsys {
 	DEV_DAX_BUS = 0, /* zeroed dev_dax_data picks this by default */
@@ -22,7 +25,7 @@ struct dev_dax_data {
 	struct dax_region *dax_region;
 	struct dev_pagemap *pgmap;
 	enum dev_dax_subsys subsys;
-	struct range range;
+	resource_size_t size;
 	int id;
 };
 
diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
index 12a2dbc43b40..99b1273bb232 100644
--- a/drivers/dax/dax-private.h
+++ b/drivers/dax/dax-private.h
@@ -22,7 +22,7 @@ void dax_bus_exit(void);
  * @kref: to pin while other agents have a need to do lookups
  * @dev: parent device backing this region
  * @align: allocation and mapping alignment for child dax devices
- * @res: physical address range of the region
+ * @res: resource tree to track instance allocations
  */
 struct dax_region {
 	int id;
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index af82d6ba820a..e7b64539e23e 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -20,17 +20,14 @@ static int dax_hmem_probe(struct platform_device *pdev)
 
 	mri = dev->platform_data;
 	dax_region = alloc_dax_region(dev, pdev->id, res, mri->target_node,
-			PMD_SIZE);
+			PMD_SIZE, 0);
 	if (!dax_region)
 		return -ENOMEM;
 
 	data = (struct dev_dax_data) {
 		.dax_region = dax_region,
 		.id = 0,
-		.range = {
-			.start = res->start,
-			.end = res->end,
-		},
+		.size = resource_size(res),
 	};
 	dev_dax = devm_create_dev_dax(&data);
 	if (IS_ERR(dev_dax))
diff --git a/drivers/dax/pmem/core.c b/drivers/dax/pmem/core.c
index 4fa81d3d2f65..4fe700884338 100644
--- a/drivers/dax/pmem/core.c
+++ b/drivers/dax/pmem/core.c
@@ -54,7 +54,8 @@ struct dev_dax *__dax_pmem_probe(struct device *dev, enum dev_dax_subsys subsys)
 	memcpy(&res, &pgmap.res, sizeof(res));
 	res.start += offset;
 	dax_region = alloc_dax_region(dev, region_id, &res,
-			nd_region->target_node, le32_to_cpu(pfn_sb->align));
+			nd_region->target_node, le32_to_cpu(pfn_sb->align),
+			IORESOURCE_DAX_STATIC);
 	if (!dax_region)
 		return ERR_PTR(-ENOMEM);
 
@@ -63,10 +64,7 @@ struct dev_dax *__dax_pmem_probe(struct device *dev, enum dev_dax_subsys subsys)
 		.id = id,
 		.pgmap = &pgmap,
 		.subsys = subsys,
-		.range = {
-			.start = res.start,
-			.end = res.end,
-		},
+		.size = resource_size(&res),
 	};
 	dev_dax = devm_create_dev_dax(&data);
 



* [PATCH v5 06/17] device-dax: introduce 'struct dev_dax' typed-driver operations
From: Dan Williams @ 2020-09-25 19:12 UTC
  To: akpm
  Cc: Jason Yan, Vishal Verma, Brice Goglin, Dave Hansen, Dave Jiang,
	David Hildenbrand, Ira Weiny, Jia He, Joao Martins,
	Jonathan Cameron, Hulk Robot, linux-mm, linux-nvdimm,
	linux-kernel

In preparation for introducing seed devices, the dax-bus core needs to be
able to intercept ->probe() and ->remove() operations. Towards that end,
arrange for the bus and drivers to switch from raw 'struct device'
driver operations to 'struct dev_dax' typed operations.
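
A sketch of what a driver looks like after the conversion
(my_probe/my_remove/my_driver are placeholder names):

    /* driver ops now take 'struct dev_dax *' instead of 'struct device *' */
    static int my_probe(struct dev_dax *dev_dax)
    {
            return 0;
    }

    static int my_remove(struct dev_dax *dev_dax)
    {
            return 0;
    }

    static struct dax_device_driver my_driver = {
            .probe = my_probe,
            .remove = my_remove,
    };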

Cc: Jason Yan <yanaijie@huawei.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jia He <justin.he@arm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/bus.c         |   18 ++++++++++++++++++
 drivers/dax/bus.h         |    4 +++-
 drivers/dax/device.c      |   12 +++++-------
 drivers/dax/kmem.c        |   18 ++++++++----------
 drivers/dax/pmem/compat.c |    2 +-
 5 files changed, 35 insertions(+), 19 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 0a48ce378686..9549f11ebed0 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -135,10 +135,28 @@ static bool is_static(struct dax_region *dax_region)
 	return (dax_region->res.flags & IORESOURCE_DAX_STATIC) != 0;
 }
 
+static int dax_bus_probe(struct device *dev)
+{
+	struct dax_device_driver *dax_drv = to_dax_drv(dev->driver);
+	struct dev_dax *dev_dax = to_dev_dax(dev);
+
+	return dax_drv->probe(dev_dax);
+}
+
+static int dax_bus_remove(struct device *dev)
+{
+	struct dax_device_driver *dax_drv = to_dax_drv(dev->driver);
+	struct dev_dax *dev_dax = to_dev_dax(dev);
+
+	return dax_drv->remove(dev_dax);
+}
+
 static struct bus_type dax_bus_type = {
 	.name = "dax",
 	.uevent = dax_bus_uevent,
 	.match = dax_bus_match,
+	.probe = dax_bus_probe,
+	.remove = dax_bus_remove,
 	.drv_groups = dax_drv_groups,
 };
 
diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
index 44592a8cac0f..da27ea70a19a 100644
--- a/drivers/dax/bus.h
+++ b/drivers/dax/bus.h
@@ -38,6 +38,8 @@ struct dax_device_driver {
 	struct device_driver drv;
 	struct list_head ids;
 	int match_always;
+	int (*probe)(struct dev_dax *dev);
+	int (*remove)(struct dev_dax *dev);
 };
 
 int __dax_driver_register(struct dax_device_driver *dax_drv,
@@ -48,7 +50,7 @@ void dax_driver_unregister(struct dax_device_driver *dax_drv);
 void kill_dev_dax(struct dev_dax *dev_dax);
 
 #if IS_ENABLED(CONFIG_DEV_DAX_PMEM_COMPAT)
-int dev_dax_probe(struct device *dev);
+int dev_dax_probe(struct dev_dax *dev_dax);
 #endif
 
 /*
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 287cf0a3db23..9833fa83b537 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -392,11 +392,11 @@ static void dev_dax_kill(void *dev_dax)
 	kill_dev_dax(dev_dax);
 }
 
-int dev_dax_probe(struct device *dev)
+int dev_dax_probe(struct dev_dax *dev_dax)
 {
-	struct dev_dax *dev_dax = to_dev_dax(dev);
 	struct dax_device *dax_dev = dev_dax->dax_dev;
 	struct range *range = &dev_dax->range;
+	struct device *dev = &dev_dax->dev;
 	struct dev_pagemap *pgmap;
 	struct inode *inode;
 	struct cdev *cdev;
@@ -446,17 +446,15 @@ int dev_dax_probe(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(dev_dax_probe);
 
-static int dev_dax_remove(struct device *dev)
+static int dev_dax_remove(struct dev_dax *dev_dax)
 {
 	/* all probe actions are unwound by devm */
 	return 0;
 }
 
 static struct dax_device_driver device_dax_driver = {
-	.drv = {
-		.probe = dev_dax_probe,
-		.remove = dev_dax_remove,
-	},
+	.probe = dev_dax_probe,
+	.remove = dev_dax_remove,
 	.match_always = 1,
 };
 
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index e56fc688bdc5..c2ac465cc342 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -29,10 +29,10 @@ static struct range dax_kmem_range(struct dev_dax *dev_dax)
 	return range;
 }
 
-int dev_dax_kmem_probe(struct device *dev)
+static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
 {
-	struct dev_dax *dev_dax = to_dev_dax(dev);
 	struct range range = dax_kmem_range(dev_dax);
+	struct device *dev = &dev_dax->dev;
 	struct resource *res;
 	char *res_name;
 	int numa_node;
@@ -88,12 +88,12 @@ int dev_dax_kmem_probe(struct device *dev)
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-static int dev_dax_kmem_remove(struct device *dev)
+static int dev_dax_kmem_remove(struct dev_dax *dev_dax)
 {
-	struct dev_dax *dev_dax = to_dev_dax(dev);
+	int rc;
+	struct device *dev = &dev_dax->dev;
 	struct range range = dax_kmem_range(dev_dax);
 	const char *res_name = dev_get_drvdata(dev);
-	int rc;
 
 	/*
 	 * We have one shot for removing memory, if some memory blocks were not
@@ -116,7 +116,7 @@ static int dev_dax_kmem_remove(struct device *dev)
 	return 0;
 }
 #else
-static int dev_dax_kmem_remove(struct device *dev)
+static int dev_dax_kmem_remove(struct dev_dax *dev_dax)
 {
 	/*
 	 * Without hotremove purposely leak the request_mem_region() for the
@@ -131,10 +131,8 @@ static int dev_dax_kmem_remove(struct device *dev)
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
 static struct dax_device_driver device_dax_kmem_driver = {
-	.drv = {
-		.probe = dev_dax_kmem_probe,
-		.remove = dev_dax_kmem_remove,
-	},
+	.probe = dev_dax_kmem_probe,
+	.remove = dev_dax_kmem_remove,
 };
 
 static int __init dax_kmem_init(void)
diff --git a/drivers/dax/pmem/compat.c b/drivers/dax/pmem/compat.c
index d7b15e6f30c5..863c114fd88c 100644
--- a/drivers/dax/pmem/compat.c
+++ b/drivers/dax/pmem/compat.c
@@ -22,7 +22,7 @@ static int dax_pmem_compat_probe(struct device *dev)
 		return -ENOMEM;
 
 	device_lock(&dev_dax->dev);
-	rc = dev_dax_probe(&dev_dax->dev);
+	rc = dev_dax_probe(dev_dax);
 	device_unlock(&dev_dax->dev);
 
 	devres_close_group(&dev_dax->dev, dev_dax);



* [PATCH v5 07/17] device-dax: introduce 'seed' devices
From: Dan Williams @ 2020-09-25 19:12 UTC
  To: akpm
  Cc: Vishal Verma, Brice Goglin, Dave Hansen, Dave Jiang,
	David Hildenbrand, Ira Weiny, Jia He, Joao Martins,
	Jonathan Cameron, linux-mm, linux-nvdimm, linux-kernel

Add a seed device concept so that dynamic dax regions can be split
amongst multiple sub-instances.  The seed device, similar to libnvdimm
seed devices, is a device that starts with zero capacity allocated and
unbound to a driver.  In contrast to libnvdimm seed devices, explicit
'create' and 'delete' interfaces are added to the region to trigger
seeds to be created and unused devices to be reclaimed.  Explicit create
and delete replace libnvdimm's implicit create as a side effect of probe
and implicit delete on writing 0 to the size attribute.

Delete can be performed on any 0-sized and idle device.  This avoids the
gymnastics of needing to move device_unregister() to its own async
context.  Specifically, it avoids the deadlock of deleting a device via
one of its own attributes.  It is also less surprising to userspace which
never sees an extra device it did not request.

For now just add the device creation, teardown, and ->probe() prevention.
A later patch will arrange for the 'dax/size' attribute to be writable to
allocate capacity from the region.
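
For illustration, reclaiming an idle 0-sized device from userspace might
look like this (hypothetical sysfs path; delete only succeeds for an
unbound, 0-sized instance):

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
            const char *victim = "dax0.1";  /* assumed idle and 0-sized */
            int fd = open("/sys/devices/platform/hmem.0/dax_region/delete",
                          O_WRONLY);

            if (fd < 0)
                    return 1;
            if (write(fd, victim, strlen(victim)) < 0) {
                    close(fd);
                    return 1;
            }
            close(fd);
            return 0;
    }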

Link: https://lkml.kernel.org/r/159643101583.4062302.12255093902950754962.stgit@dwillia2-desk3.amr.corp.intel.com
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jia He <justin.he@arm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/bus.c         |  301 +++++++++++++++++++++++++++++++++++++++------
 drivers/dax/dax-private.h |    9 +
 drivers/dax/hmem/hmem.c   |    2 
 3 files changed, 272 insertions(+), 40 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 9549f11ebed0..dce9413a4394 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -139,8 +139,26 @@ static int dax_bus_probe(struct device *dev)
 {
 	struct dax_device_driver *dax_drv = to_dax_drv(dev->driver);
 	struct dev_dax *dev_dax = to_dev_dax(dev);
+	struct dax_region *dax_region = dev_dax->region;
+	struct range *range = &dev_dax->range;
+	int rc;
+
+	if (range_len(range) == 0 || dev_dax->id < 0)
+		return -ENXIO;
+
+	rc = dax_drv->probe(dev_dax);
+
+	if (rc || is_static(dax_region))
+		return rc;
+
+	/*
+	 * Track new seed creation only after successful probe of the
+	 * previous seed.
+	 */
+	if (dax_region->seed == dev)
+		dax_region->seed = NULL;
 
-	return dax_drv->probe(dev_dax);
+	return 0;
 }
 
 static int dax_bus_remove(struct device *dev)
@@ -237,14 +255,216 @@ static ssize_t available_size_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(available_size);
 
+static ssize_t seed_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct dax_region *dax_region = dev_get_drvdata(dev);
+	struct device *seed;
+	ssize_t rc;
+
+	if (is_static(dax_region))
+		return -EINVAL;
+
+	device_lock(dev);
+	seed = dax_region->seed;
+	rc = sprintf(buf, "%s\n", seed ? dev_name(seed) : "");
+	device_unlock(dev);
+
+	return rc;
+}
+static DEVICE_ATTR_RO(seed);
+
+static ssize_t create_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct dax_region *dax_region = dev_get_drvdata(dev);
+	struct device *youngest;
+	ssize_t rc;
+
+	if (is_static(dax_region))
+		return -EINVAL;
+
+	device_lock(dev);
+	youngest = dax_region->youngest;
+	rc = sprintf(buf, "%s\n", youngest ? dev_name(youngest) : "");
+	device_unlock(dev);
+
+	return rc;
+}
+
+static ssize_t create_store(struct device *dev, struct device_attribute *attr,
+		const char *buf, size_t len)
+{
+	struct dax_region *dax_region = dev_get_drvdata(dev);
+	unsigned long long avail;
+	ssize_t rc;
+	int val;
+
+	if (is_static(dax_region))
+		return -EINVAL;
+
+	rc = kstrtoint(buf, 0, &val);
+	if (rc)
+		return rc;
+	if (val != 1)
+		return -EINVAL;
+
+	device_lock(dev);
+	avail = dax_region_avail_size(dax_region);
+	if (avail == 0)
+		rc = -ENOSPC;
+	else {
+		struct dev_dax_data data = {
+			.dax_region = dax_region,
+			.size = 0,
+			.id = -1,
+		};
+		struct dev_dax *dev_dax = devm_create_dev_dax(&data);
+
+		if (IS_ERR(dev_dax))
+			rc = PTR_ERR(dev_dax);
+		else {
+			/*
+			 * In support of crafting multiple new devices
+			 * simultaneously multiple seeds can be created,
+			 * but only the first one that has not been
+			 * successfully bound is tracked as the region
+			 * seed.
+			 */
+			if (!dax_region->seed)
+				dax_region->seed = &dev_dax->dev;
+			dax_region->youngest = &dev_dax->dev;
+			rc = len;
+		}
+	}
+	device_unlock(dev);
+
+	return rc;
+}
+static DEVICE_ATTR_RW(create);
+
+void kill_dev_dax(struct dev_dax *dev_dax)
+{
+	struct dax_device *dax_dev = dev_dax->dax_dev;
+	struct inode *inode = dax_inode(dax_dev);
+
+	kill_dax(dax_dev);
+	unmap_mapping_range(inode->i_mapping, 0, 0, 1);
+}
+EXPORT_SYMBOL_GPL(kill_dev_dax);
+
+static void free_dev_dax_range(struct dev_dax *dev_dax)
+{
+	struct dax_region *dax_region = dev_dax->region;
+	struct range *range = &dev_dax->range;
+
+	device_lock_assert(dax_region->dev);
+	if (range_len(range))
+		__release_region(&dax_region->res, range->start,
+				range_len(range));
+}
+
+static void unregister_dev_dax(void *dev)
+{
+	struct dev_dax *dev_dax = to_dev_dax(dev);
+
+	dev_dbg(dev, "%s\n", __func__);
+
+	kill_dev_dax(dev_dax);
+	free_dev_dax_range(dev_dax);
+	device_del(dev);
+	put_device(dev);
+}
+
+/* a return value >= 0 indicates this invocation invalidated the id */
+static int __free_dev_dax_id(struct dev_dax *dev_dax)
+{
+	struct dax_region *dax_region = dev_dax->region;
+	struct device *dev = &dev_dax->dev;
+	int rc = dev_dax->id;
+
+	device_lock_assert(dev);
+
+	if (is_static(dax_region) || dev_dax->id < 0)
+		return -1;
+	ida_free(&dax_region->ida, dev_dax->id);
+	dev_dax->id = -1;
+	return rc;
+}
+
+static int free_dev_dax_id(struct dev_dax *dev_dax)
+{
+	struct device *dev = &dev_dax->dev;
+	int rc;
+
+	device_lock(dev);
+	rc = __free_dev_dax_id(dev_dax);
+	device_unlock(dev);
+	return rc;
+}
+
+static ssize_t delete_store(struct device *dev, struct device_attribute *attr,
+		const char *buf, size_t len)
+{
+	struct dax_region *dax_region = dev_get_drvdata(dev);
+	struct dev_dax *dev_dax;
+	struct device *victim;
+	bool do_del = false;
+	int rc;
+
+	if (is_static(dax_region))
+		return -EINVAL;
+
+	victim = device_find_child_by_name(dax_region->dev, buf);
+	if (!victim)
+		return -ENXIO;
+
+	device_lock(dev);
+	device_lock(victim);
+	dev_dax = to_dev_dax(victim);
+	if (victim->driver || range_len(&dev_dax->range))
+		rc = -EBUSY;
+	else {
+		/*
+		 * Invalidate the device so it does not become active
+		 * again, but always preserve device-id-0 so that
+		 * /sys/bus/dax/ is guaranteed to be populated while any
+		 * dax_region is registered.
+		 */
+		if (dev_dax->id > 0) {
+			do_del = __free_dev_dax_id(dev_dax) >= 0;
+			rc = len;
+			if (dax_region->seed == victim)
+				dax_region->seed = NULL;
+			if (dax_region->youngest == victim)
+				dax_region->youngest = NULL;
+		} else
+			rc = -EBUSY;
+	}
+	device_unlock(victim);
+
+	/* won the race to invalidate the device, clean it up */
+	if (do_del)
+		devm_release_action(dev, unregister_dev_dax, victim);
+	device_unlock(dev);
+	put_device(victim);
+
+	return rc;
+}
+static DEVICE_ATTR_WO(delete);
+
 static umode_t dax_region_visible(struct kobject *kobj, struct attribute *a,
 		int n)
 {
 	struct device *dev = container_of(kobj, struct device, kobj);
 	struct dax_region *dax_region = dev_get_drvdata(dev);
 
-	if (is_static(dax_region) && a == &dev_attr_available_size.attr)
-		return 0;
+	if (is_static(dax_region))
+		if (a == &dev_attr_available_size.attr
+				|| a == &dev_attr_create.attr
+				|| a == &dev_attr_seed.attr
+				|| a == &dev_attr_delete.attr)
+			return 0;
 	return a->mode;
 }
 
@@ -252,6 +472,9 @@ static struct attribute *dax_region_attributes[] = {
 	&dev_attr_available_size.attr,
 	&dev_attr_region_size.attr,
 	&dev_attr_align.attr,
+	&dev_attr_create.attr,
+	&dev_attr_seed.attr,
+	&dev_attr_delete.attr,
 	&dev_attr_id.attr,
 	NULL,
 };
@@ -320,6 +543,7 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
 	dax_region->align = align;
 	dax_region->dev = parent;
 	dax_region->target_node = target_node;
+	ida_init(&dax_region->ida);
 	dax_region->res = (struct resource) {
 		.start = res->start,
 		.end = res->end,
@@ -347,6 +571,15 @@ static int alloc_dev_dax_range(struct dev_dax *dev_dax, resource_size_t size)
 
 	device_lock_assert(dax_region->dev);
 
+	/* handle the seed alloc special case */
+	if (!size) {
+		dev_dax->range = (struct range) {
+			.start = res->start,
+			.end = res->start - 1,
+		};
+		return 0;
+	}
+
 	/* TODO: handle multiple allocations per region */
 	if (res->child)
 		return -ENOMEM;
@@ -448,33 +681,15 @@ static const struct attribute_group *dax_attribute_groups[] = {
 	NULL,
 };
 
-void kill_dev_dax(struct dev_dax *dev_dax)
-{
-	struct dax_device *dax_dev = dev_dax->dax_dev;
-	struct inode *inode = dax_inode(dax_dev);
-
-	kill_dax(dax_dev);
-	unmap_mapping_range(inode->i_mapping, 0, 0, 1);
-}
-EXPORT_SYMBOL_GPL(kill_dev_dax);
-
-static void free_dev_dax_range(struct dev_dax *dev_dax)
-{
-	struct dax_region *dax_region = dev_dax->region;
-	struct range *range = &dev_dax->range;
-
-	device_lock_assert(dax_region->dev);
-	__release_region(&dax_region->res, range->start, range_len(range));
-}
-
 static void dev_dax_release(struct device *dev)
 {
 	struct dev_dax *dev_dax = to_dev_dax(dev);
 	struct dax_region *dax_region = dev_dax->region;
 	struct dax_device *dax_dev = dev_dax->dax_dev;
 
-	dax_region_put(dax_region);
 	put_dax(dax_dev);
+	free_dev_dax_id(dev_dax);
+	dax_region_put(dax_region);
 	kfree(dev_dax->pgmap);
 	kfree(dev_dax);
 }
@@ -484,18 +699,6 @@ static const struct device_type dev_dax_type = {
 	.groups = dax_attribute_groups,
 };
 
-static void unregister_dev_dax(void *dev)
-{
-	struct dev_dax *dev_dax = to_dev_dax(dev);
-
-	dev_dbg(dev, "%s\n", __func__);
-
-	kill_dev_dax(dev_dax);
-	free_dev_dax_range(dev_dax);
-	device_del(dev);
-	put_device(dev);
-}
-
 struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
 {
 	struct dax_region *dax_region = data->dax_region;
@@ -506,17 +709,35 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
 	struct device *dev;
 	int rc;
 
-	if (data->id < 0)
-		return ERR_PTR(-EINVAL);
-
 	dev_dax = kzalloc(sizeof(*dev_dax), GFP_KERNEL);
 	if (!dev_dax)
 		return ERR_PTR(-ENOMEM);
 
+	if (is_static(dax_region)) {
+		if (dev_WARN_ONCE(parent, data->id < 0,
+				"dynamic id specified to static region\n")) {
+			rc = -EINVAL;
+			goto err_id;
+		}
+
+		dev_dax->id = data->id;
+	} else {
+		if (dev_WARN_ONCE(parent, data->id >= 0,
+				"static id specified to dynamic region\n")) {
+			rc = -EINVAL;
+			goto err_id;
+		}
+
+		rc = ida_alloc(&dax_region->ida, GFP_KERNEL);
+		if (rc < 0)
+			goto err_id;
+		dev_dax->id = rc;
+	}
+
 	dev_dax->region = dax_region;
 	dev = &dev_dax->dev;
 	device_initialize(dev);
-	dev_set_name(dev, "dax%d.%d", dax_region->id, data->id);
+	dev_set_name(dev, "dax%d.%d", dax_region->id, dev_dax->id);
 
 	rc = alloc_dev_dax_range(dev_dax, data->size);
 	if (rc)
@@ -579,6 +800,8 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
 err_pgmap:
 	free_dev_dax_range(dev_dax);
 err_range:
+	free_dev_dax_id(dev_dax);
+err_id:
 	kfree(dev_dax);
 
 	return ERR_PTR(rc);
diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
index 99b1273bb232..b81a1494d82b 100644
--- a/drivers/dax/dax-private.h
+++ b/drivers/dax/dax-private.h
@@ -7,6 +7,7 @@
 
 #include <linux/device.h>
 #include <linux/cdev.h>
+#include <linux/idr.h>
 
 /* private routines between core files */
 struct dax_device;
@@ -22,7 +23,10 @@ void dax_bus_exit(void);
  * @kref: to pin while other agents have a need to do lookups
  * @dev: parent device backing this region
  * @align: allocation and mapping alignment for child dax devices
+ * @ida: instance id allocator
  * @res: resource tree to track instance allocations
+ * @seed: allow userspace to find the first unbound seed device
+ * @youngest: allow userspace to find the most recently created device
  */
 struct dax_region {
 	int id;
@@ -30,7 +34,10 @@ struct dax_region {
 	struct kref kref;
 	struct device *dev;
 	unsigned int align;
+	struct ida ida;
 	struct resource res;
+	struct device *seed;
+	struct device *youngest;
 };
 
 /**
@@ -39,6 +46,7 @@ struct dax_region {
  * @region - parent region
  * @dax_dev - core dax functionality
  * @target_node: effective numa node if dev_dax memory range is onlined
+ * @id: ida allocated id
  * @dev - device core
  * @pgmap - pgmap for memmap setup / lifetime (driver owned)
  * @range: resource range for the instance
@@ -47,6 +55,7 @@ struct dev_dax {
 	struct dax_region *region;
 	struct dax_device *dax_dev;
 	int target_node;
+	int id;
 	struct device dev;
 	struct dev_pagemap *pgmap;
 	struct range range;
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index e7b64539e23e..aa260009dfc7 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -26,7 +26,7 @@ static int dax_hmem_probe(struct platform_device *pdev)
 
 	data = (struct dev_dax_data) {
 		.dax_region = dax_region,
-		.id = 0,
+		.id = -1,
 		.size = resource_size(res),
 	};
 	dev_dax = devm_create_dev_dax(&data);



* [PATCH v5 08/17] drivers/base: make device_find_child_by_name() compatible with sysfs inputs
  2020-09-25 19:11 [PATCH v5 00/17] device-dax: support sub-dividing soft-reserved ranges Dan Williams
                   ` (6 preceding siblings ...)
  2020-09-25 19:12 ` [PATCH v5 07/17] device-dax: introduce 'seed' devices Dan Williams
@ 2020-09-25 19:12 ` Dan Williams
  2020-09-25 19:12 ` [PATCH v5 09/17] device-dax: add resize support Dan Williams
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Dan Williams @ 2020-09-25 19:12 UTC (permalink / raw)
  To: akpm
  Cc: Greg Kroah-Hartman, vishal.l.verma, dave.hansen, linux-mm,
	linux-nvdimm, linux-kernel

Use sysfs_streq() in device_find_child_by_name() so that it can match a
sysfs input string that might contain a trailing newline.

The other "device by name" interfaces,
{bus,driver,class}_find_device_by_name(), already account for sysfs
strings.
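
A minimal sketch of the failure mode this avoids (hypothetical buffer
contents; a write like 'echo dax0.1 > .../delete' arrives in the
->store() handler with its newline intact):

    const char *buf = "dax0.1\n";

    strcmp(dev_name(child), buf);      /* non-zero: "dax0.1" != "dax0.1\n" */
    sysfs_streq(dev_name(child), buf); /* true: trailing newline tolerated */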

Link: https://lkml.kernel.org/r/159643102106.4062302.12229802117645312104.stgit@dwillia2-desk3.amr.corp.intel.com
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/base/core.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index bb5806a2bd4c..8dd753539c06 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -3324,7 +3324,7 @@ struct device *device_find_child_by_name(struct device *parent,
 
 	klist_iter_init(&parent->p->klist_children, &i);
 	while ((child = next_device(&i)))
-		if (!strcmp(dev_name(child), name) && get_device(child))
+		if (sysfs_streq(dev_name(child), name) && get_device(child))
 			break;
 	klist_iter_exit(&i);
 	return child;



* [PATCH v5 09/17] device-dax: add resize support
  2020-09-25 19:11 [PATCH v5 00/17] device-dax: support sub-dividing soft-reserved ranges Dan Williams
                   ` (7 preceding siblings ...)
  2020-09-25 19:12 ` [PATCH v5 08/17] drivers/base: make device_find_child_by_name() compatible with sysfs inputs Dan Williams
@ 2020-09-25 19:12 ` Dan Williams
  2020-09-25 19:12 ` [PATCH v5 10/17] mm/memremap_pages: convert to 'struct range' Dan Williams
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Dan Williams @ 2020-09-25 19:12 UTC (permalink / raw)
  To: akpm
  Cc: Vishal Verma, Brice Goglin, Dave Hansen, Dave Jiang,
	David Hildenbrand, Ira Weiny, Jia He, Joao Martins,
	Jonathan Cameron, linux-mm, linux-nvdimm, linux-kernel

Make the device-dax 'size' attribute writable to allow capacity to be
split between multiple instances in a region.  The intended consumers of
this capability are users that want to split a scarce memory resource
between device-dax and System-RAM access, or users that want to have
multiple security domains for a large region.

By default the hmem instance provider allocates an entire region to the
first instance.  The process of creating a new instance (assuming a
region-id of 0) is to find the region and trigger the 'create'
attribute, which yields an empty instance to configure.  For example:

    cd /sys/bus/dax/devices
    echo dax0.0 > dax0.0/driver/unbind
    echo $new_size > dax0.0/size
    echo 1 > $(readlink -f dax0.0)/../dax_region/create
    seed=$(cat $(readlink -f dax0.0)/../dax_region/seed)
    echo $new_size > $seed/size
    echo dax0.0 > ../drivers/{device_dax,kmem}/bind
    echo dax0.1 > ../drivers/{device_dax,kmem}/bind

Instances can be destroyed by:

    echo $device > $(readlink -f $device)/../dax_region/delete
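
The region's remaining capacity can be checked before choosing
$new_size (a sketch; same dax_region directory as used above):

    cat $(readlink -f dax0.0)/../dax_region/available_size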

Link: https://lkml.kernel.org/r/159643102625.4062302.7431838945566033852.stgit@dwillia2-desk3.amr.corp.intel.com
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jia He <justin.he@arm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/bus.c |  161 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 152 insertions(+), 9 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index dce9413a4394..53d07f2f1285 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -6,6 +6,7 @@
 #include <linux/list.h>
 #include <linux/slab.h>
 #include <linux/dax.h>
+#include <linux/io.h>
 #include "dax-private.h"
 #include "bus.h"
 
@@ -562,7 +563,8 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
 }
 EXPORT_SYMBOL_GPL(alloc_dax_region);
 
-static int alloc_dev_dax_range(struct dev_dax *dev_dax, resource_size_t size)
+static int alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start,
+		resource_size_t size)
 {
 	struct dax_region *dax_region = dev_dax->region;
 	struct resource *res = &dax_region->res;
@@ -580,12 +582,7 @@ static int alloc_dev_dax_range(struct dev_dax *dev_dax, resource_size_t size)
 		return 0;
 	}
 
-	/* TODO: handle multiple allocations per region */
-	if (res->child)
-		return -ENOMEM;
-
-	alloc = __request_region(res, res->start, size, dev_name(dev), 0);
-
+	alloc = __request_region(res, start, size, dev_name(dev), 0);
 	if (!alloc)
 		return -ENOMEM;
 
@@ -597,6 +594,29 @@ static int alloc_dev_dax_range(struct dev_dax *dev_dax, resource_size_t size)
 	return 0;
 }
 
+static int adjust_dev_dax_range(struct dev_dax *dev_dax, struct resource *res, resource_size_t size)
+{
+	struct dax_region *dax_region = dev_dax->region;
+	struct range *range = &dev_dax->range;
+	int rc = 0;
+
+	device_lock_assert(dax_region->dev);
+
+	if (size)
+		rc = adjust_resource(res, range->start, size);
+	else
+		__release_region(&dax_region->res, range->start, range_len(range));
+	if (rc)
+		return rc;
+
+	dev_dax->range = (struct range) {
+		.start = range->start,
+		.end = range->start + size - 1,
+	};
+
+	return 0;
+}
+
 static ssize_t size_show(struct device *dev,
 		struct device_attribute *attr, char *buf)
 {
@@ -605,7 +625,127 @@ static ssize_t size_show(struct device *dev,
 
 	return sprintf(buf, "%llu\n", size);
 }
-static DEVICE_ATTR_RO(size);
+
+static bool alloc_is_aligned(struct dax_region *dax_region,
+		resource_size_t size)
+{
+	/*
+	 * The minimum mapping granularity for a device instance is a
+	 * single subsection, unless the arch says otherwise.
+	 */
+	return IS_ALIGNED(size, max_t(unsigned long, dax_region->align,
+				memremap_compat_align()));
+}
+
+static int dev_dax_shrink(struct dev_dax *dev_dax, resource_size_t size)
+{
+	struct dax_region *dax_region = dev_dax->region;
+	struct range *range = &dev_dax->range;
+	struct resource *res, *adjust = NULL;
+	struct device *dev = &dev_dax->dev;
+
+	for_each_dax_region_resource(dax_region, res)
+		if (strcmp(res->name, dev_name(dev)) == 0
+				&& res->start == range->start) {
+			adjust = res;
+			break;
+		}
+
+	if (dev_WARN_ONCE(dev, !adjust, "failed to find matching resource\n"))
+		return -ENXIO;
+	return adjust_dev_dax_range(dev_dax, adjust, size);
+}
+
+static ssize_t dev_dax_resize(struct dax_region *dax_region,
+		struct dev_dax *dev_dax, resource_size_t size)
+{
+	resource_size_t avail = dax_region_avail_size(dax_region), to_alloc;
+	resource_size_t dev_size = range_len(&dev_dax->range);
+	struct resource *region_res = &dax_region->res;
+	struct device *dev = &dev_dax->dev;
+	const char *name = dev_name(dev);
+	struct resource *res, *first;
+
+	if (dev->driver)
+		return -EBUSY;
+	if (size == dev_size)
+		return 0;
+	if (size > dev_size && size - dev_size > avail)
+		return -ENOSPC;
+	if (size < dev_size)
+		return dev_dax_shrink(dev_dax, size);
+
+	to_alloc = size - dev_size;
+	if (dev_WARN_ONCE(dev, !alloc_is_aligned(dax_region, to_alloc),
+			"resize of %pa misaligned\n", &to_alloc))
+		return -ENXIO;
+
+	/*
+	 * Expand the device into the unused portion of the region. This
+	 * may involve adjusting the end of an existing resource, or
+	 * allocating a new resource.
+	 */
+	first = region_res->child;
+	if (!first)
+		return alloc_dev_dax_range(dev_dax, dax_region->res.start, to_alloc);
+	for (res = first; to_alloc && res; res = res->sibling) {
+		struct resource *next = res->sibling;
+		resource_size_t free;
+
+		/* space at the beginning of the region */
+		free = 0;
+		if (res == first && res->start > dax_region->res.start)
+			free = res->start - dax_region->res.start;
+		if (free >= to_alloc && dev_size == 0)
+			return alloc_dev_dax_range(dev_dax, dax_region->res.start, to_alloc);
+
+		free = 0;
+		/* space between allocations */
+		if (next && next->start > res->end + 1)
+			free = next->start - res->end - 1;
+
+		/* space at the end of the region */
+		if (free < to_alloc && !next && res->end < region_res->end)
+			free = region_res->end - res->end;
+
+		if (free >= to_alloc && strcmp(name, res->name) == 0)
+			return adjust_dev_dax_range(dev_dax, res, resource_size(res) + to_alloc);
+		else if (free >= to_alloc && dev_size == 0)
+			return alloc_dev_dax_range(dev_dax, res->end + 1, to_alloc);
+	}
+	return -ENOSPC;
+}
+
+static ssize_t size_store(struct device *dev, struct device_attribute *attr,
+		const char *buf, size_t len)
+{
+	ssize_t rc;
+	unsigned long long val;
+	struct dev_dax *dev_dax = to_dev_dax(dev);
+	struct dax_region *dax_region = dev_dax->region;
+
+	rc = kstrtoull(buf, 0, &val);
+	if (rc)
+		return rc;
+
+	if (!alloc_is_aligned(dax_region, val)) {
+		dev_dbg(dev, "%s: size: %llu misaligned\n", __func__, val);
+		return -EINVAL;
+	}
+
+	device_lock(dax_region->dev);
+	if (!dax_region->dev->driver) {
+		device_unlock(dax_region->dev);
+		return -ENXIO;
+	}
+	device_lock(dev);
+	rc = dev_dax_resize(dax_region, dev_dax, val);
+	device_unlock(dev);
+	device_unlock(dax_region->dev);
+
+	return rc == 0 ? len : rc;
+}
+static DEVICE_ATTR_RW(size);
 
 static int dev_dax_target_node(struct dev_dax *dev_dax)
 {
@@ -654,11 +794,14 @@ static umode_t dev_dax_visible(struct kobject *kobj, struct attribute *a, int n)
 {
 	struct device *dev = container_of(kobj, struct device, kobj);
 	struct dev_dax *dev_dax = to_dev_dax(dev);
+	struct dax_region *dax_region = dev_dax->region;
 
 	if (a == &dev_attr_target_node.attr && dev_dax_target_node(dev_dax) < 0)
 		return 0;
 	if (a == &dev_attr_numa_node.attr && !IS_ENABLED(CONFIG_NUMA))
 		return 0;
+	if (a == &dev_attr_size.attr && is_static(dax_region))
+		return 0444;
 	return a->mode;
 }
 
@@ -739,7 +882,7 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
 	device_initialize(dev);
 	dev_set_name(dev, "dax%d.%d", dax_region->id, dev_dax->id);
 
-	rc = alloc_dev_dax_range(dev_dax, data->size);
+	rc = alloc_dev_dax_range(dev_dax, dax_region->res.start, data->size);
 	if (rc)
 		goto err_range;
 


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 10/17] mm/memremap_pages: convert to 'struct range'
  2020-09-25 19:11 [PATCH v5 00/17] device-dax: support sub-dividing soft-reserved ranges Dan Williams
                   ` (8 preceding siblings ...)
  2020-09-25 19:12 ` [PATCH v5 09/17] device-dax: add resize support Dan Williams
@ 2020-09-25 19:12 ` Dan Williams
  2020-09-28 19:12   ` boris.ostrovsky
  2020-09-25 19:12 ` [PATCH v5 11/17] mm/memremap_pages: support multiple ranges per invocation Dan Williams
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 31+ messages in thread
From: Dan Williams @ 2020-09-25 19:12 UTC (permalink / raw)
  To: akpm
  Cc: Paul Mackerras, Michael Ellerman, Benjamin Herrenschmidt,
	Vishal Verma, Vivek Goyal, Dave Jiang, Ben Skeggs, David Airlie,
	Daniel Vetter, Ira Weiny, Bjorn Helgaas, Boris Ostrovsky,
	Juergen Gross, Stefano Stabellini, Jérôme Glisse,
	dave.hansen, linux-mm, linux-nvdimm, linux-kernel

The 'struct resource' in 'struct dev_pagemap' is only used for holding
resource span information.  The other fields, 'name', 'flags', 'desc',
'parent', 'sibling', and 'child', are all unused, wasted space.

This is in preparation for introducing a multi-range extension of
devm_memremap_pages().

The bulk of this change is unwinding all the places internal to libnvdimm
that used 'struct resource' unnecessarily, and replacing instances of
'struct dev_pagemap'.res with 'struct dev_pagemap'.range.

P2PDMA had a minor usage of the resource flags field, but only to report
failures with "%pR".  That is replaced with an open-coded print of the
range.
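
The conversion is mechanical; a representative sketch of the before /
after pattern the hunks below apply (names as used in this patch):

    /* before: the span carried in a full 'struct resource' */
    pgmap->res.start = res->start;
    pgmap->res.end = res->end;
    size = resource_size(&pgmap->res);

    /* after: a bare span, with range_len() standing in for resource_size() */
    pgmap->range = (struct range) {
            .start = res->start,
            .end = res->end,
    };
    size = range_len(&pgmap->range);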

Link: https://lkml.kernel.org/r/159643103173.4062302.768998885691711532.stgit@dwillia2-desk3.amr.corp.intel.com
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/powerpc/kvm/book3s_hv_uvmem.c     |   13 +++--
 drivers/dax/bus.c                      |   10 ++--
 drivers/dax/bus.h                      |    2 -
 drivers/dax/dax-private.h              |    5 --
 drivers/dax/device.c                   |    3 -
 drivers/dax/hmem/hmem.c                |    5 ++
 drivers/dax/pmem/core.c                |   12 ++---
 drivers/gpu/drm/nouveau/nouveau_dmem.c |   14 +++---
 drivers/nvdimm/badrange.c              |   26 +++++------
 drivers/nvdimm/claim.c                 |   13 +++--
 drivers/nvdimm/nd.h                    |    3 +
 drivers/nvdimm/pfn_devs.c              |   12 ++---
 drivers/nvdimm/pmem.c                  |   26 ++++++-----
 drivers/nvdimm/region.c                |   21 +++++----
 drivers/pci/p2pdma.c                   |   11 ++---
 drivers/xen/unpopulated-alloc.c        |   44 ++++++++++++------
 include/linux/memremap.h               |    5 +-
 include/linux/range.h                  |    6 ++
 lib/test_hmm.c                         |   14 +++---
 mm/memremap.c                          |   77 ++++++++++++++++----------------
 tools/testing/nvdimm/test/iomap.c      |    2 -
 21 files changed, 177 insertions(+), 147 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 7705d5557239..29ec555055c2 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -687,9 +687,9 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm)
 	struct kvmppc_uvmem_page_pvt *pvt;
 	unsigned long pfn_last, pfn_first;
 
-	pfn_first = kvmppc_uvmem_pgmap.res.start >> PAGE_SHIFT;
+	pfn_first = kvmppc_uvmem_pgmap.range.start >> PAGE_SHIFT;
 	pfn_last = pfn_first +
-		   (resource_size(&kvmppc_uvmem_pgmap.res) >> PAGE_SHIFT);
+		   (range_len(&kvmppc_uvmem_pgmap.range) >> PAGE_SHIFT);
 
 	spin_lock(&kvmppc_uvmem_bitmap_lock);
 	bit = find_first_zero_bit(kvmppc_uvmem_bitmap,
@@ -1007,7 +1007,7 @@ static vm_fault_t kvmppc_uvmem_migrate_to_ram(struct vm_fault *vmf)
 static void kvmppc_uvmem_page_free(struct page *page)
 {
 	unsigned long pfn = page_to_pfn(page) -
-			(kvmppc_uvmem_pgmap.res.start >> PAGE_SHIFT);
+			(kvmppc_uvmem_pgmap.range.start >> PAGE_SHIFT);
 	struct kvmppc_uvmem_page_pvt *pvt;
 
 	spin_lock(&kvmppc_uvmem_bitmap_lock);
@@ -1170,7 +1170,8 @@ int kvmppc_uvmem_init(void)
 	}
 
 	kvmppc_uvmem_pgmap.type = MEMORY_DEVICE_PRIVATE;
-	kvmppc_uvmem_pgmap.res = *res;
+	kvmppc_uvmem_pgmap.range.start = res->start;
+	kvmppc_uvmem_pgmap.range.end = res->end;
 	kvmppc_uvmem_pgmap.ops = &kvmppc_uvmem_ops;
 	/* just one global instance: */
 	kvmppc_uvmem_pgmap.owner = &kvmppc_uvmem_pgmap;
@@ -1205,7 +1206,7 @@ void kvmppc_uvmem_free(void)
 		return;
 
 	memunmap_pages(&kvmppc_uvmem_pgmap);
-	release_mem_region(kvmppc_uvmem_pgmap.res.start,
-			   resource_size(&kvmppc_uvmem_pgmap.res));
+	release_mem_region(kvmppc_uvmem_pgmap.range.start,
+			   range_len(&kvmppc_uvmem_pgmap.range));
 	kfree(kvmppc_uvmem_bitmap);
 }
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 53d07f2f1285..00fa73a8dfb4 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -515,7 +515,7 @@ static void dax_region_unregister(void *region)
 }
 
 struct dax_region *alloc_dax_region(struct device *parent, int region_id,
-		struct resource *res, int target_node, unsigned int align,
+		struct range *range, int target_node, unsigned int align,
 		unsigned long flags)
 {
 	struct dax_region *dax_region;
@@ -530,8 +530,8 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
 		return NULL;
 	}
 
-	if (!IS_ALIGNED(res->start, align)
-			|| !IS_ALIGNED(resource_size(res), align))
+	if (!IS_ALIGNED(range->start, align)
+			|| !IS_ALIGNED(range_len(range), align))
 		return NULL;
 
 	dax_region = kzalloc(sizeof(*dax_region), GFP_KERNEL);
@@ -546,8 +546,8 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
 	dax_region->target_node = target_node;
 	ida_init(&dax_region->ida);
 	dax_region->res = (struct resource) {
-		.start = res->start,
-		.end = res->end,
+		.start = range->start,
+		.end = range->end,
 		.flags = IORESOURCE_MEM | flags,
 	};
 
diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
index da27ea70a19a..72b92f95509f 100644
--- a/drivers/dax/bus.h
+++ b/drivers/dax/bus.h
@@ -13,7 +13,7 @@ void dax_region_put(struct dax_region *dax_region);
 
 #define IORESOURCE_DAX_STATIC (1UL << 0)
 struct dax_region *alloc_dax_region(struct device *parent, int region_id,
-		struct resource *res, int target_node, unsigned int align,
+		struct range *range, int target_node, unsigned int align,
 		unsigned long flags);
 
 enum dev_dax_subsys {
diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
index b81a1494d82b..0cbb2ec81ca7 100644
--- a/drivers/dax/dax-private.h
+++ b/drivers/dax/dax-private.h
@@ -61,11 +61,6 @@ struct dev_dax {
 	struct range range;
 };
 
-static inline u64 range_len(struct range *range)
-{
-	return range->end - range->start + 1;
-}
-
 static inline struct dev_dax *to_dev_dax(struct device *dev)
 {
 	return container_of(dev, struct dev_dax, dev);
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 9833fa83b537..a14448bca83d 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -416,8 +416,7 @@ int dev_dax_probe(struct dev_dax *dev_dax)
 		pgmap = devm_kzalloc(dev, sizeof(*pgmap), GFP_KERNEL);
 		if (!pgmap)
 			return -ENOMEM;
-		pgmap->res.start = range->start;
-		pgmap->res.end = range->end;
+		pgmap->range = *range;
 	}
 	pgmap->type = MEMORY_DEVICE_GENERIC;
 	addr = devm_memremap_pages(dev, pgmap);
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index aa260009dfc7..1a3347bb6143 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -13,13 +13,16 @@ static int dax_hmem_probe(struct platform_device *pdev)
 	struct dev_dax_data data;
 	struct dev_dax *dev_dax;
 	struct resource *res;
+	struct range range;
 
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	if (!res)
 		return -ENOMEM;
 
 	mri = dev->platform_data;
-	dax_region = alloc_dax_region(dev, pdev->id, res, mri->target_node,
+	range.start = res->start;
+	range.end = res->end;
+	dax_region = alloc_dax_region(dev, pdev->id, &range, mri->target_node,
 			PMD_SIZE, 0);
 	if (!dax_region)
 		return -ENOMEM;
diff --git a/drivers/dax/pmem/core.c b/drivers/dax/pmem/core.c
index 4fe700884338..62b26bfceab1 100644
--- a/drivers/dax/pmem/core.c
+++ b/drivers/dax/pmem/core.c
@@ -9,7 +9,7 @@
 
 struct dev_dax *__dax_pmem_probe(struct device *dev, enum dev_dax_subsys subsys)
 {
-	struct resource res;
+	struct range range;
 	int rc, id, region_id;
 	resource_size_t offset;
 	struct nd_pfn_sb *pfn_sb;
@@ -50,10 +50,10 @@ struct dev_dax *__dax_pmem_probe(struct device *dev, enum dev_dax_subsys subsys)
 	if (rc != 2)
 		return ERR_PTR(-EINVAL);
 
-	/* adjust the dax_region resource to the start of data */
-	memcpy(&res, &pgmap.res, sizeof(res));
-	res.start += offset;
-	dax_region = alloc_dax_region(dev, region_id, &res,
+	/* adjust the dax_region range to the start of data */
+	range = pgmap.range;
+	range.start += offset;
+	dax_region = alloc_dax_region(dev, region_id, &range,
 			nd_region->target_node, le32_to_cpu(pfn_sb->align),
 			IORESOURCE_DAX_STATIC);
 	if (!dax_region)
@@ -64,7 +64,7 @@ struct dev_dax *__dax_pmem_probe(struct device *dev, enum dev_dax_subsys subsys)
 		.id = id,
 		.pgmap = &pgmap,
 		.subsys = subsys,
-		.size = resource_size(&res),
+		.size = range_len(&range),
 	};
 	dev_dax = devm_create_dev_dax(&data);
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 4e8112fde3e6..25811ed7e274 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -101,7 +101,7 @@ unsigned long nouveau_dmem_page_addr(struct page *page)
 {
 	struct nouveau_dmem_chunk *chunk = nouveau_page_to_chunk(page);
 	unsigned long off = (page_to_pfn(page) << PAGE_SHIFT) -
-				chunk->pagemap.res.start;
+				chunk->pagemap.range.start;
 
 	return chunk->bo->offset + off;
 }
@@ -249,7 +249,8 @@ nouveau_dmem_chunk_alloc(struct nouveau_drm *drm, struct page **ppage)
 
 	chunk->drm = drm;
 	chunk->pagemap.type = MEMORY_DEVICE_PRIVATE;
-	chunk->pagemap.res = *res;
+	chunk->pagemap.range.start = res->start;
+	chunk->pagemap.range.end = res->end;
 	chunk->pagemap.ops = &nouveau_dmem_pagemap_ops;
 	chunk->pagemap.owner = drm->dev;
 
@@ -273,7 +274,7 @@ nouveau_dmem_chunk_alloc(struct nouveau_drm *drm, struct page **ppage)
 	list_add(&chunk->list, &drm->dmem->chunks);
 	mutex_unlock(&drm->dmem->mutex);
 
-	pfn_first = chunk->pagemap.res.start >> PAGE_SHIFT;
+	pfn_first = chunk->pagemap.range.start >> PAGE_SHIFT;
 	page = pfn_to_page(pfn_first);
 	spin_lock(&drm->dmem->lock);
 	for (i = 0; i < DMEM_CHUNK_NPAGES - 1; ++i, ++page) {
@@ -294,8 +295,7 @@ nouveau_dmem_chunk_alloc(struct nouveau_drm *drm, struct page **ppage)
 out_bo_free:
 	nouveau_bo_ref(NULL, &chunk->bo);
 out_release:
-	release_mem_region(chunk->pagemap.res.start,
-			   resource_size(&chunk->pagemap.res));
+	release_mem_region(chunk->pagemap.range.start, range_len(&chunk->pagemap.range));
 out_free:
 	kfree(chunk);
 out:
@@ -382,8 +382,8 @@ nouveau_dmem_fini(struct nouveau_drm *drm)
 		nouveau_bo_ref(NULL, &chunk->bo);
 		list_del(&chunk->list);
 		memunmap_pages(&chunk->pagemap);
-		release_mem_region(chunk->pagemap.res.start,
-				   resource_size(&chunk->pagemap.res));
+		release_mem_region(chunk->pagemap.range.start,
+				   range_len(&chunk->pagemap.range));
 		kfree(chunk);
 	}
 
diff --git a/drivers/nvdimm/badrange.c b/drivers/nvdimm/badrange.c
index b9eeefa27e3a..aaf6e215a8c6 100644
--- a/drivers/nvdimm/badrange.c
+++ b/drivers/nvdimm/badrange.c
@@ -211,7 +211,7 @@ static void __add_badblock_range(struct badblocks *bb, u64 ns_offset, u64 len)
 }
 
 static void badblocks_populate(struct badrange *badrange,
-		struct badblocks *bb, const struct resource *res)
+		struct badblocks *bb, const struct range *range)
 {
 	struct badrange_entry *bre;
 
@@ -222,34 +222,34 @@ static void badblocks_populate(struct badrange *badrange,
 		u64 bre_end = bre->start + bre->length - 1;
 
 		/* Discard intervals with no intersection */
-		if (bre_end < res->start)
+		if (bre_end < range->start)
 			continue;
-		if (bre->start >  res->end)
+		if (bre->start > range->end)
 			continue;
 		/* Deal with any overlap after start of the namespace */
-		if (bre->start >= res->start) {
+		if (bre->start >= range->start) {
 			u64 start = bre->start;
 			u64 len;
 
-			if (bre_end <= res->end)
+			if (bre_end <= range->end)
 				len = bre->length;
 			else
-				len = res->start + resource_size(res)
+				len = range->start + range_len(range)
 					- bre->start;
-			__add_badblock_range(bb, start - res->start, len);
+			__add_badblock_range(bb, start - range->start, len);
 			continue;
 		}
 		/*
 		 * Deal with overlap for badrange starting before
 		 * the namespace.
 		 */
-		if (bre->start < res->start) {
+		if (bre->start < range->start) {
 			u64 len;
 
-			if (bre_end < res->end)
-				len = bre->start + bre->length - res->start;
+			if (bre_end < range->end)
+				len = bre->start + bre->length - range->start;
 			else
-				len = resource_size(res);
+				len = range_len(range);
 			__add_badblock_range(bb, 0, len);
 		}
 	}
@@ -267,7 +267,7 @@ static void badblocks_populate(struct badrange *badrange,
  * and add badblocks entries for all matching sub-ranges
  */
 void nvdimm_badblocks_populate(struct nd_region *nd_region,
-		struct badblocks *bb, const struct resource *res)
+		struct badblocks *bb, const struct range *range)
 {
 	struct nvdimm_bus *nvdimm_bus;
 
@@ -279,7 +279,7 @@ void nvdimm_badblocks_populate(struct nd_region *nd_region,
 	nvdimm_bus = walk_to_nvdimm_bus(&nd_region->dev);
 
 	nvdimm_bus_lock(&nvdimm_bus->dev);
-	badblocks_populate(&nvdimm_bus->badrange, bb, res);
+	badblocks_populate(&nvdimm_bus->badrange, bb, range);
 	nvdimm_bus_unlock(&nvdimm_bus->dev);
 }
 EXPORT_SYMBOL_GPL(nvdimm_badblocks_populate);
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index 45964acba944..290267e1ff9f 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -303,13 +303,16 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
 int devm_nsio_enable(struct device *dev, struct nd_namespace_io *nsio,
 		resource_size_t size)
 {
-	struct resource *res = &nsio->res;
 	struct nd_namespace_common *ndns = &nsio->common;
+	struct range range = {
+		.start = nsio->res.start,
+		.end = nsio->res.end,
+	};
 
 	nsio->size = size;
-	if (!devm_request_mem_region(dev, res->start, size,
+	if (!devm_request_mem_region(dev, range.start, size,
 				dev_name(&ndns->dev))) {
-		dev_warn(dev, "could not reserve region %pR\n", res);
+		dev_warn(dev, "could not reserve region %pR\n", &nsio->res);
 		return -EBUSY;
 	}
 
@@ -317,9 +320,9 @@ int devm_nsio_enable(struct device *dev, struct nd_namespace_io *nsio,
 	if (devm_init_badblocks(dev, &nsio->bb))
 		return -ENOMEM;
 	nvdimm_badblocks_populate(to_nd_region(ndns->dev.parent), &nsio->bb,
-			&nsio->res);
+			&range);
 
-	nsio->addr = devm_memremap(dev, res->start, size, ARCH_MEMREMAP_PMEM);
+	nsio->addr = devm_memremap(dev, range.start, size, ARCH_MEMREMAP_PMEM);
 
 	return PTR_ERR_OR_ZERO(nsio->addr);
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 85c1ae813ea3..bac90afa4604 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -377,8 +377,9 @@ int nvdimm_namespace_detach_btt(struct nd_btt *nd_btt);
 const char *nvdimm_namespace_disk_name(struct nd_namespace_common *ndns,
 		char *name);
 unsigned int pmem_sector_size(struct nd_namespace_common *ndns);
+struct range;
 void nvdimm_badblocks_populate(struct nd_region *nd_region,
-		struct badblocks *bb, const struct resource *res);
+		struct badblocks *bb, const struct range *range);
 int devm_namespace_enable(struct device *dev, struct nd_namespace_common *ndns,
 		resource_size_t size);
 void devm_namespace_disable(struct device *dev,
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 3e11ef8d3f5b..3c4787b92a6a 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -672,7 +672,7 @@ static unsigned long init_altmap_reserve(resource_size_t base)
 
 static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap)
 {
-	struct resource *res = &pgmap->res;
+	struct range *range = &pgmap->range;
 	struct vmem_altmap *altmap = &pgmap->altmap;
 	struct nd_pfn_sb *pfn_sb = nd_pfn->pfn_sb;
 	u64 offset = le64_to_cpu(pfn_sb->dataoff);
@@ -689,16 +689,16 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap)
 		.end_pfn = PHYS_PFN(end),
 	};
 
-	memcpy(res, &nsio->res, sizeof(*res));
-	res->start += start_pad;
-	res->end -= end_trunc;
-
+	*range = (struct range) {
+		.start = nsio->res.start + start_pad,
+		.end = nsio->res.end - end_trunc,
+	};
 	if (nd_pfn->mode == PFN_MODE_RAM) {
 		if (offset < reserve)
 			return -EINVAL;
 		nd_pfn->npfns = le64_to_cpu(pfn_sb->npfns);
 	} else if (nd_pfn->mode == PFN_MODE_PMEM) {
-		nd_pfn->npfns = PHYS_PFN((resource_size(res) - offset));
+		nd_pfn->npfns = PHYS_PFN((range_len(range) - offset));
 		if (le64_to_cpu(nd_pfn->pfn_sb->npfns) > nd_pfn->npfns)
 			dev_info(&nd_pfn->dev,
 					"number of pfns truncated from %lld to %ld\n",
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index fab29b514372..69cc0e783709 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -376,7 +376,7 @@ static int pmem_attach_disk(struct device *dev,
 	struct nd_region *nd_region = to_nd_region(dev->parent);
 	int nid = dev_to_node(dev), fua;
 	struct resource *res = &nsio->res;
-	struct resource bb_res;
+	struct range bb_range;
 	struct nd_pfn *nd_pfn = NULL;
 	struct dax_device *dax_dev;
 	struct nd_pfn_sb *pfn_sb;
@@ -435,24 +435,26 @@ static int pmem_attach_disk(struct device *dev,
 		pfn_sb = nd_pfn->pfn_sb;
 		pmem->data_offset = le64_to_cpu(pfn_sb->dataoff);
 		pmem->pfn_pad = resource_size(res) -
-			resource_size(&pmem->pgmap.res);
+			range_len(&pmem->pgmap.range);
 		pmem->pfn_flags |= PFN_MAP;
-		memcpy(&bb_res, &pmem->pgmap.res, sizeof(bb_res));
-		bb_res.start += pmem->data_offset;
+		bb_range = pmem->pgmap.range;
+		bb_range.start += pmem->data_offset;
 	} else if (pmem_should_map_pages(dev)) {
-		memcpy(&pmem->pgmap.res, &nsio->res, sizeof(pmem->pgmap.res));
+		pmem->pgmap.range.start = res->start;
+		pmem->pgmap.range.end = res->end;
 		pmem->pgmap.type = MEMORY_DEVICE_FS_DAX;
 		pmem->pgmap.ops = &fsdax_pagemap_ops;
 		addr = devm_memremap_pages(dev, &pmem->pgmap);
 		pmem->pfn_flags |= PFN_MAP;
-		memcpy(&bb_res, &pmem->pgmap.res, sizeof(bb_res));
+		bb_range = pmem->pgmap.range;
 	} else {
 		if (devm_add_action_or_reset(dev, pmem_release_queue,
 					&pmem->pgmap))
 			return -ENOMEM;
 		addr = devm_memremap(dev, pmem->phys_addr,
 				pmem->size, ARCH_MEMREMAP_PMEM);
-		memcpy(&bb_res, &nsio->res, sizeof(bb_res));
+	bb_range.start = res->start;
+		bb_range.end = res->end;
 	}
 
 	if (IS_ERR(addr))
@@ -482,7 +484,7 @@ static int pmem_attach_disk(struct device *dev,
 			/ 512);
 	if (devm_init_badblocks(dev, &pmem->bb))
 		return -ENOMEM;
-	nvdimm_badblocks_populate(nd_region, &pmem->bb, &bb_res);
+	nvdimm_badblocks_populate(nd_region, &pmem->bb, &bb_range);
 	disk->bb = &pmem->bb;
 
 	if (is_nvdimm_sync(nd_region))
@@ -593,8 +595,8 @@ static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
 	resource_size_t offset = 0, end_trunc = 0;
 	struct nd_namespace_common *ndns;
 	struct nd_namespace_io *nsio;
-	struct resource res;
 	struct badblocks *bb;
+	struct range range;
 	struct kernfs_node *bb_state;
 
 	if (event != NVDIMM_REVALIDATE_POISON)
@@ -630,9 +632,9 @@ static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
 		nsio = to_nd_namespace_io(&ndns->dev);
 	}
 
-	res.start = nsio->res.start + offset;
-	res.end = nsio->res.end - end_trunc;
-	nvdimm_badblocks_populate(nd_region, bb, &res);
+	range.start = nsio->res.start + offset;
+	range.end = nsio->res.end - end_trunc;
+	nvdimm_badblocks_populate(nd_region, bb, &range);
 	if (bb_state)
 		sysfs_notify_dirent(bb_state);
 }
diff --git a/drivers/nvdimm/region.c b/drivers/nvdimm/region.c
index 0f6978e72e7c..bfce87ed72ab 100644
--- a/drivers/nvdimm/region.c
+++ b/drivers/nvdimm/region.c
@@ -35,7 +35,10 @@ static int nd_region_probe(struct device *dev)
 		return rc;
 
 	if (is_memory(&nd_region->dev)) {
-		struct resource ndr_res;
+		struct range range = {
+			.start = nd_region->ndr_start,
+			.end = nd_region->ndr_start + nd_region->ndr_size - 1,
+		};
 
 		if (devm_init_badblocks(dev, &nd_region->bb))
 			return -ENODEV;
@@ -44,9 +47,7 @@ static int nd_region_probe(struct device *dev)
 		if (!nd_region->bb_state)
 			dev_warn(&nd_region->dev,
 					"'badblocks' notification disabled\n");
-		ndr_res.start = nd_region->ndr_start;
-		ndr_res.end = nd_region->ndr_start + nd_region->ndr_size - 1;
-		nvdimm_badblocks_populate(nd_region, &nd_region->bb, &ndr_res);
+		nvdimm_badblocks_populate(nd_region, &nd_region->bb, &range);
 	}
 
 	rc = nd_region_register_namespaces(nd_region, &err);
@@ -121,14 +122,16 @@ static void nd_region_notify(struct device *dev, enum nvdimm_event event)
 {
 	if (event == NVDIMM_REVALIDATE_POISON) {
 		struct nd_region *nd_region = to_nd_region(dev);
-		struct resource res;
 
 		if (is_memory(&nd_region->dev)) {
-			res.start = nd_region->ndr_start;
-			res.end = nd_region->ndr_start +
-				nd_region->ndr_size - 1;
+			struct range range = {
+				.start = nd_region->ndr_start,
+				.end = nd_region->ndr_start +
+					nd_region->ndr_size - 1,
+			};
+
 			nvdimm_badblocks_populate(nd_region,
-					&nd_region->bb, &res);
+					&nd_region->bb, &range);
 			if (nd_region->bb_state)
 				sysfs_notify_dirent(nd_region->bb_state);
 		}
diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index f357f9a32b3a..256850513813 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -185,9 +185,8 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
 		return -ENOMEM;
 
 	pgmap = &p2p_pgmap->pgmap;
-	pgmap->res.start = pci_resource_start(pdev, bar) + offset;
-	pgmap->res.end = pgmap->res.start + size - 1;
-	pgmap->res.flags = pci_resource_flags(pdev, bar);
+	pgmap->range.start = pci_resource_start(pdev, bar) + offset;
+	pgmap->range.end = pgmap->range.start + size - 1;
 	pgmap->type = MEMORY_DEVICE_PCI_P2PDMA;
 
 	p2p_pgmap->provider = pdev;
@@ -202,13 +201,13 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
 
 	error = gen_pool_add_owner(pdev->p2pdma->pool, (unsigned long)addr,
 			pci_bus_address(pdev, bar) + offset,
-			resource_size(&pgmap->res), dev_to_node(&pdev->dev),
+			range_len(&pgmap->range), dev_to_node(&pdev->dev),
 			pgmap->ref);
 	if (error)
 		goto pages_free;
 
-	pci_info(pdev, "added peer-to-peer DMA memory %pR\n",
-		 &pgmap->res);
+	pci_info(pdev, "added peer-to-peer DMA memory %#llx-%#llx\n",
+		 pgmap->range.start, pgmap->range.end);
 
 	return 0;
 
diff --git a/drivers/xen/unpopulated-alloc.c b/drivers/xen/unpopulated-alloc.c
index 3b98dc921426..091b8669eca3 100644
--- a/drivers/xen/unpopulated-alloc.c
+++ b/drivers/xen/unpopulated-alloc.c
@@ -18,27 +18,37 @@ static unsigned int list_count;
 static int fill_list(unsigned int nr_pages)
 {
 	struct dev_pagemap *pgmap;
+	struct resource *res;
 	void *vaddr;
 	unsigned int i, alloc_pages = round_up(nr_pages, PAGES_PER_SECTION);
-	int ret;
+	int ret = -ENOMEM;
+
+	res = kzalloc(sizeof(*res), GFP_KERNEL);
+	if (!res)
+		return -ENOMEM;
 
 	pgmap = kzalloc(sizeof(*pgmap), GFP_KERNEL);
 	if (!pgmap)
-		return -ENOMEM;
+		goto err_pgmap;
 
 	pgmap->type = MEMORY_DEVICE_GENERIC;
-	pgmap->res.name = "Xen scratch";
-	pgmap->res.flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+	res->name = "Xen scratch";
+	res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
 
-	ret = allocate_resource(&iomem_resource, &pgmap->res,
+	ret = allocate_resource(&iomem_resource, res,
 				alloc_pages * PAGE_SIZE, 0, -1,
 				PAGES_PER_SECTION * PAGE_SIZE, NULL, NULL);
 	if (ret < 0) {
 		pr_err("Cannot allocate new IOMEM resource\n");
-		kfree(pgmap);
-		return ret;
+		goto err_resource;
 	}
 
+	pgmap->range = (struct range) {
+		.start = res->start,
+		.end = res->end,
+	};
+	pgmap->owner = res;
+
 #ifdef CONFIG_XEN_HAVE_PVMMU
         /*
          * memremap will build page tables for the new memory so
@@ -50,14 +60,13 @@ static int fill_list(unsigned int nr_pages)
          * conflict with any devices.
          */
 	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
-		xen_pfn_t pfn = PFN_DOWN(pgmap->res.start);
+		xen_pfn_t pfn = PFN_DOWN(res->start);
 
 		for (i = 0; i < alloc_pages; i++) {
 			if (!set_phys_to_machine(pfn + i, INVALID_P2M_ENTRY)) {
 				pr_warn("set_phys_to_machine() failed, no memory added\n");
-				release_resource(&pgmap->res);
-				kfree(pgmap);
-				return -ENOMEM;
+				ret = -ENOMEM;
+				goto err_memremap;
 			}
                 }
 	}
@@ -66,9 +75,8 @@ static int fill_list(unsigned int nr_pages)
 	vaddr = memremap_pages(pgmap, NUMA_NO_NODE);
 	if (IS_ERR(vaddr)) {
 		pr_err("Cannot remap memory range\n");
-		release_resource(&pgmap->res);
-		kfree(pgmap);
-		return PTR_ERR(vaddr);
+		ret = PTR_ERR(vaddr);
+		goto err_memremap;
 	}
 
 	for (i = 0; i < alloc_pages; i++) {
@@ -80,6 +88,14 @@ static int fill_list(unsigned int nr_pages)
 	}
 
 	return 0;
+
+err_memremap:
+	release_resource(res);
+err_resource:
+	kfree(pgmap);
+err_pgmap:
+	kfree(res);
+	return ret;
 }
 
 /**
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index b37686803a6d..375b9e87a5cf 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef _LINUX_MEMREMAP_H_
 #define _LINUX_MEMREMAP_H_
+#include <linux/range.h>
 #include <linux/ioport.h>
 #include <linux/percpu-refcount.h>
 
@@ -93,7 +94,7 @@ struct dev_pagemap_ops {
 /**
  * struct dev_pagemap - metadata for ZONE_DEVICE mappings
  * @altmap: pre-allocated/reserved memory for vmemmap allocations
- * @res: physical address range covered by @ref
+ * @range: physical address range covered by @ref
  * @ref: reference count that pins the devm_memremap_pages() mapping
  * @internal_ref: internal reference if @ref is not provided by the caller
  * @done: completion for @internal_ref
@@ -106,7 +107,7 @@ struct dev_pagemap_ops {
  */
 struct dev_pagemap {
 	struct vmem_altmap altmap;
-	struct resource res;
+	struct range range;
 	struct percpu_ref *ref;
 	struct percpu_ref internal_ref;
 	struct completion done;
diff --git a/include/linux/range.h b/include/linux/range.h
index d1fbeb664012..274681cc3154 100644
--- a/include/linux/range.h
+++ b/include/linux/range.h
@@ -1,12 +1,18 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef _LINUX_RANGE_H
 #define _LINUX_RANGE_H
+#include <linux/types.h>
 
 struct range {
 	u64   start;
 	u64   end;
 };
 
+static inline u64 range_len(const struct range *range)
+{
+	return range->end - range->start + 1;
+}
+
 int add_range(struct range *range, int az, int nr_range,
 		u64 start, u64 end);
 
diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index e7dc3de355b7..5b4521991621 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -487,7 +487,8 @@ static bool dmirror_allocate_chunk(struct dmirror_device *mdevice,
 		goto err_release;
 
 	devmem->pagemap.type = MEMORY_DEVICE_PRIVATE;
-	devmem->pagemap.res = *res;
+	devmem->pagemap.range.start = res->start;
+	devmem->pagemap.range.end = res->end;
 	devmem->pagemap.ops = &dmirror_devmem_ops;
 	devmem->pagemap.owner = mdevice;
 
@@ -496,9 +497,8 @@ static bool dmirror_allocate_chunk(struct dmirror_device *mdevice,
 		goto err_free;
 
 	devmem->mdevice = mdevice;
-	pfn_first = devmem->pagemap.res.start >> PAGE_SHIFT;
-	pfn_last = pfn_first +
-		(resource_size(&devmem->pagemap.res) >> PAGE_SHIFT);
+	pfn_first = devmem->pagemap.range.start >> PAGE_SHIFT;
+	pfn_last = pfn_first + (range_len(&devmem->pagemap.range) >> PAGE_SHIFT);
 	mdevice->devmem_chunks[mdevice->devmem_count++] = devmem;
 
 	mutex_unlock(&mdevice->devmem_lock);
@@ -528,7 +528,7 @@ static bool dmirror_allocate_chunk(struct dmirror_device *mdevice,
 err_free:
 	kfree(devmem);
 err_release:
-	release_mem_region(res->start, resource_size(res));
+	release_mem_region(devmem->pagemap.range.start, range_len(&devmem->pagemap.range));
 err:
 	mutex_unlock(&mdevice->devmem_lock);
 	return false;
@@ -1100,8 +1100,8 @@ static void dmirror_device_remove(struct dmirror_device *mdevice)
 				mdevice->devmem_chunks[i];
 
 			memunmap_pages(&devmem->pagemap);
-			release_mem_region(devmem->pagemap.res.start,
-					   resource_size(&devmem->pagemap.res));
+			release_mem_region(devmem->pagemap.range.start,
+					   range_len(&devmem->pagemap.range));
 			kfree(devmem);
 		}
 		kfree(mdevice->devmem_chunks);
diff --git a/mm/memremap.c b/mm/memremap.c
index f008706b685e..7c895e1477b0 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -70,24 +70,24 @@ static void devmap_managed_enable_put(void)
 }
 #endif /* CONFIG_DEV_PAGEMAP_OPS */
 
-static void pgmap_array_delete(struct resource *res)
+static void pgmap_array_delete(struct range *range)
 {
-	xa_store_range(&pgmap_array, PHYS_PFN(res->start), PHYS_PFN(res->end),
+	xa_store_range(&pgmap_array, PHYS_PFN(range->start), PHYS_PFN(range->end),
 			NULL, GFP_KERNEL);
 	synchronize_rcu();
 }
 
 static unsigned long pfn_first(struct dev_pagemap *pgmap)
 {
-	return PHYS_PFN(pgmap->res.start) +
+	return PHYS_PFN(pgmap->range.start) +
 		vmem_altmap_offset(pgmap_altmap(pgmap));
 }
 
 static unsigned long pfn_end(struct dev_pagemap *pgmap)
 {
-	const struct resource *res = &pgmap->res;
+	const struct range *range = &pgmap->range;
 
-	return (res->start + resource_size(res)) >> PAGE_SHIFT;
+	return (range->start + range_len(range)) >> PAGE_SHIFT;
 }
 
 static unsigned long pfn_next(unsigned long pfn)
@@ -146,7 +146,7 @@ static void dev_pagemap_cleanup(struct dev_pagemap *pgmap)
 
 void memunmap_pages(struct dev_pagemap *pgmap)
 {
-	struct resource *res = &pgmap->res;
+	struct range *range = &pgmap->range;
 	struct page *first_page;
 	unsigned long pfn;
 	int nid;
@@ -163,20 +163,20 @@ void memunmap_pages(struct dev_pagemap *pgmap)
 	nid = page_to_nid(first_page);
 
 	mem_hotplug_begin();
-	remove_pfn_range_from_zone(page_zone(first_page), PHYS_PFN(res->start),
-				   PHYS_PFN(resource_size(res)));
+	remove_pfn_range_from_zone(page_zone(first_page), PHYS_PFN(range->start),
+				   PHYS_PFN(range_len(range)));
 	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
-		__remove_pages(PHYS_PFN(res->start),
-			       PHYS_PFN(resource_size(res)), NULL);
+		__remove_pages(PHYS_PFN(range->start),
+			       PHYS_PFN(range_len(range)), NULL);
 	} else {
-		arch_remove_memory(nid, res->start, resource_size(res),
+		arch_remove_memory(nid, range->start, range_len(range),
 				pgmap_altmap(pgmap));
-		kasan_remove_zero_shadow(__va(res->start), resource_size(res));
+		kasan_remove_zero_shadow(__va(range->start), range_len(range));
 	}
 	mem_hotplug_done();
 
-	untrack_pfn(NULL, PHYS_PFN(res->start), resource_size(res));
-	pgmap_array_delete(res);
+	untrack_pfn(NULL, PHYS_PFN(range->start), range_len(range));
+	pgmap_array_delete(range);
 	WARN_ONCE(pgmap->altmap.alloc, "failed to free all reserved pages\n");
 	devmap_managed_enable_put();
 }
@@ -202,7 +202,7 @@ static void dev_pagemap_percpu_release(struct percpu_ref *ref)
  */
 void *memremap_pages(struct dev_pagemap *pgmap, int nid)
 {
-	struct resource *res = &pgmap->res;
+	struct range *range = &pgmap->range;
 	struct dev_pagemap *conflict_pgmap;
 	struct mhp_params params = {
 		/*
@@ -271,7 +271,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
 			return ERR_PTR(error);
 	}
 
-	conflict_pgmap = get_dev_pagemap(PHYS_PFN(res->start), NULL);
+	conflict_pgmap = get_dev_pagemap(PHYS_PFN(range->start), NULL);
 	if (conflict_pgmap) {
 		WARN(1, "Conflicting mapping in same section\n");
 		put_dev_pagemap(conflict_pgmap);
@@ -279,7 +279,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
 		goto err_array;
 	}
 
-	conflict_pgmap = get_dev_pagemap(PHYS_PFN(res->end), NULL);
+	conflict_pgmap = get_dev_pagemap(PHYS_PFN(range->end), NULL);
 	if (conflict_pgmap) {
 		WARN(1, "Conflicting mapping in same section\n");
 		put_dev_pagemap(conflict_pgmap);
@@ -287,26 +287,27 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
 		goto err_array;
 	}
 
-	is_ram = region_intersects(res->start, resource_size(res),
+	is_ram = region_intersects(range->start, range_len(range),
 		IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE);
 
 	if (is_ram != REGION_DISJOINT) {
-		WARN_ONCE(1, "%s attempted on %s region %pr\n", __func__,
-				is_ram == REGION_MIXED ? "mixed" : "ram", res);
+		WARN_ONCE(1, "attempted on %s region %#llx-%#llx\n",
+				is_ram == REGION_MIXED ? "mixed" : "ram",
+				range->start, range->end);
 		error = -ENXIO;
 		goto err_array;
 	}
 
-	error = xa_err(xa_store_range(&pgmap_array, PHYS_PFN(res->start),
-				PHYS_PFN(res->end), pgmap, GFP_KERNEL));
+	error = xa_err(xa_store_range(&pgmap_array, PHYS_PFN(range->start),
+				PHYS_PFN(range->end), pgmap, GFP_KERNEL));
 	if (error)
 		goto err_array;
 
 	if (nid < 0)
 		nid = numa_mem_id();
 
-	error = track_pfn_remap(NULL, &params.pgprot, PHYS_PFN(res->start),
-				0, resource_size(res));
+	error = track_pfn_remap(NULL, &params.pgprot, PHYS_PFN(range->start), 0,
+			range_len(range));
 	if (error)
 		goto err_pfn_remap;
 
@@ -324,16 +325,16 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
 	 * arch_add_memory().
 	 */
 	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
-		error = add_pages(nid, PHYS_PFN(res->start),
-				PHYS_PFN(resource_size(res)), &params);
+		error = add_pages(nid, PHYS_PFN(range->start),
+				PHYS_PFN(range_len(range)), &params);
 	} else {
-		error = kasan_add_zero_shadow(__va(res->start), resource_size(res));
+		error = kasan_add_zero_shadow(__va(range->start), range_len(range));
 		if (error) {
 			mem_hotplug_done();
 			goto err_kasan;
 		}
 
-		error = arch_add_memory(nid, res->start, resource_size(res),
+		error = arch_add_memory(nid, range->start, range_len(range),
 					&params);
 	}
 
@@ -341,8 +342,8 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
 		struct zone *zone;
 
 		zone = &NODE_DATA(nid)->node_zones[ZONE_DEVICE];
-		move_pfn_range_to_zone(zone, PHYS_PFN(res->start),
-				PHYS_PFN(resource_size(res)), params.altmap);
+		move_pfn_range_to_zone(zone, PHYS_PFN(range->start),
+				PHYS_PFN(range_len(range)), params.altmap);
 	}
 
 	mem_hotplug_done();
@@ -354,17 +355,17 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
 	 * to allow us to do the work while not holding the hotplug lock.
 	 */
 	memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
-				PHYS_PFN(res->start),
-				PHYS_PFN(resource_size(res)), pgmap);
+				PHYS_PFN(range->start),
+				PHYS_PFN(range_len(range)), pgmap);
 	percpu_ref_get_many(pgmap->ref, pfn_end(pgmap) - pfn_first(pgmap));
-	return __va(res->start);
+	return __va(range->start);
 
  err_add_memory:
-	kasan_remove_zero_shadow(__va(res->start), resource_size(res));
+	kasan_remove_zero_shadow(__va(range->start), range_len(range));
  err_kasan:
-	untrack_pfn(NULL, PHYS_PFN(res->start), resource_size(res));
+	untrack_pfn(NULL, PHYS_PFN(range->start), range_len(range));
  err_pfn_remap:
-	pgmap_array_delete(res);
+	pgmap_array_delete(range);
  err_array:
 	dev_pagemap_kill(pgmap);
 	dev_pagemap_cleanup(pgmap);
@@ -389,7 +390,7 @@ EXPORT_SYMBOL_GPL(memremap_pages);
  *    'live' on entry and will be killed and reaped at
  *    devm_memremap_pages_release() time, or if this routine fails.
  *
- * 4/ res is expected to be a host memory range that could feasibly be
+ * 4/ range is expected to be a host memory range that could feasibly be
  *    treated as a "System RAM" range, i.e. not a device mmio range, but
  *    this is not enforced.
  */
@@ -446,7 +447,7 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
 	 * In the cached case we're already holding a live reference.
 	 */
 	if (pgmap) {
-		if (phys >= pgmap->res.start && phys <= pgmap->res.end)
+		if (phys >= pgmap->range.start && phys <= pgmap->range.end)
 			return pgmap;
 		put_dev_pagemap(pgmap);
 	}
diff --git a/tools/testing/nvdimm/test/iomap.c b/tools/testing/nvdimm/test/iomap.c
index 03e40b3b0106..c62d372d426f 100644
--- a/tools/testing/nvdimm/test/iomap.c
+++ b/tools/testing/nvdimm/test/iomap.c
@@ -126,7 +126,7 @@ static void dev_pagemap_percpu_release(struct percpu_ref *ref)
 void *__wrap_devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 {
 	int error;
-	resource_size_t offset = pgmap->res.start;
+	resource_size_t offset = pgmap->range.start;
 	struct nfit_test_resource *nfit_res = get_nfit_res(offset);
 
 	if (!nfit_res)



* [PATCH v5 11/17] mm/memremap_pages: support multiple ranges per invocation
  2020-09-25 19:11 [PATCH v5 00/17] device-dax: support sub-dividing soft-reserved ranges Dan Williams
                   ` (9 preceding siblings ...)
  2020-09-25 19:12 ` [PATCH v5 10/17] mm/memremap_pages: convert to 'struct range' Dan Williams
@ 2020-09-25 19:12 ` Dan Williams
  2020-09-25 19:12 ` [PATCH v5 12/17] device-dax: add dis-contiguous resource support Dan Williams
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Dan Williams @ 2020-09-25 19:12 UTC (permalink / raw)
  To: akpm
  Cc: Paul Mackerras, Michael Ellerman, Benjamin Herrenschmidt,
	Vishal Verma, Vivek Goyal, Dave Jiang, Ben Skeggs, David Airlie,
	Daniel Vetter, Ira Weiny, Bjorn Helgaas, Boris Ostrovsky,
	Juergen Gross, Stefano Stabellini, Jérôme Glisse,
	dave.hansen, linux-mm, linux-nvdimm, linux-kernel

In support of device-dax growing the ability to front physically
dis-contiguous ranges of memory, update devm_memremap_pages() to track
multiple ranges with a single reference counter and devm instance.

Convert all [devm_]memremap_pages() users to specify the number of
ranges they are mapping in their 'struct dev_pagemap' instance.
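
For illustration, a hypothetical multi-range caller sizes the tail of
'struct dev_pagemap' for the extra entries and populates the ranges[]
array before calling devm_memremap_pages().  A minimal sketch, where
'res0' and 'res1' are made-up resources (this mirrors the
dev_dax_probe() conversion below, it is not a new API):

	int nr_range = 2;
	struct dev_pagemap *pgmap;
	void *addr;

	/* one 'struct range' is embedded, allocate space for the rest */
	pgmap = devm_kzalloc(dev, sizeof(*pgmap)
			+ sizeof(struct range) * (nr_range - 1), GFP_KERNEL);
	if (!pgmap)
		return -ENOMEM;
	pgmap->nr_range = nr_range;
	pgmap->ranges[0] = (struct range) {
		.start = res0->start,
		.end = res0->end,
	};
	pgmap->ranges[1] = (struct range) {
		.start = res1->start,
		.end = res1->end,
	};
	pgmap->type = MEMORY_DEVICE_GENERIC;
	addr = devm_memremap_pages(dev, pgmap);
	if (IS_ERR(addr))
		return PTR_ERR(addr);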

Link: https://lkml.kernel.org/r/159643103789.4062302.18426128170217903785.stgit@dwillia2-desk3.amr.corp.intel.com
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/powerpc/kvm/book3s_hv_uvmem.c     |    1 +
 drivers/dax/device.c                   |    1 +
 drivers/gpu/drm/nouveau/nouveau_dmem.c |    1 +
 drivers/nvdimm/pfn_devs.c              |    1 +
 drivers/nvdimm/pmem.c                  |    1 +
 drivers/pci/p2pdma.c                   |    1 +
 drivers/xen/unpopulated-alloc.c        |    1 +
 include/linux/memremap.h               |   10 +
 lib/test_hmm.c                         |    1 +
 mm/memremap.c                          |  258 +++++++++++++++++++-------------
 10 files changed, 166 insertions(+), 110 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 29ec555055c2..84e5a2dc8be5 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -1172,6 +1172,7 @@ int kvmppc_uvmem_init(void)
 	kvmppc_uvmem_pgmap.type = MEMORY_DEVICE_PRIVATE;
 	kvmppc_uvmem_pgmap.range.start = res->start;
 	kvmppc_uvmem_pgmap.range.end = res->end;
+	kvmppc_uvmem_pgmap.nr_range = 1;
 	kvmppc_uvmem_pgmap.ops = &kvmppc_uvmem_ops;
 	/* just one global instance: */
 	kvmppc_uvmem_pgmap.owner = &kvmppc_uvmem_pgmap;
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index a14448bca83d..5f808617672a 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -417,6 +417,7 @@ int dev_dax_probe(struct dev_dax *dev_dax)
 		if (!pgmap)
 			return -ENOMEM;
 		pgmap->range = *range;
+		pgmap->nr_range = 1;
 	}
 	pgmap->type = MEMORY_DEVICE_GENERIC;
 	addr = devm_memremap_pages(dev, pgmap);
diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 25811ed7e274..a13c6215bba8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -251,6 +251,7 @@ nouveau_dmem_chunk_alloc(struct nouveau_drm *drm, struct page **ppage)
 	chunk->pagemap.type = MEMORY_DEVICE_PRIVATE;
 	chunk->pagemap.range.start = res->start;
 	chunk->pagemap.range.end = res->end;
+	chunk->pagemap.nr_range = 1;
 	chunk->pagemap.ops = &nouveau_dmem_pagemap_ops;
 	chunk->pagemap.owner = drm->dev;
 
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 3c4787b92a6a..b499df630d4d 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -693,6 +693,7 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap)
 		.start = nsio->res.start + start_pad,
 		.end = nsio->res.end - end_trunc,
 	};
+	pgmap->nr_range = 1;
 	if (nd_pfn->mode == PFN_MODE_RAM) {
 		if (offset < reserve)
 			return -EINVAL;
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 69cc0e783709..1f45af363a94 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -442,6 +442,7 @@ static int pmem_attach_disk(struct device *dev,
 	} else if (pmem_should_map_pages(dev)) {
 		pmem->pgmap.range.start = res->start;
 		pmem->pgmap.range.end = res->end;
+		pmem->pgmap.nr_range = 1;
 		pmem->pgmap.type = MEMORY_DEVICE_FS_DAX;
 		pmem->pgmap.ops = &fsdax_pagemap_ops;
 		addr = devm_memremap_pages(dev, &pmem->pgmap);
diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 256850513813..9d53c16b7329 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -187,6 +187,7 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
 	pgmap = &p2p_pgmap->pgmap;
 	pgmap->range.start = pci_resource_start(pdev, bar) + offset;
 	pgmap->range.end = pgmap->range.start + size - 1;
+	pgmap->nr_range = 1;
 	pgmap->type = MEMORY_DEVICE_PCI_P2PDMA;
 
 	p2p_pgmap->provider = pdev;
diff --git a/drivers/xen/unpopulated-alloc.c b/drivers/xen/unpopulated-alloc.c
index 091b8669eca3..8c512ea550bb 100644
--- a/drivers/xen/unpopulated-alloc.c
+++ b/drivers/xen/unpopulated-alloc.c
@@ -47,6 +47,7 @@ static int fill_list(unsigned int nr_pages)
 		.start = res->start,
 		.end = res->end,
 	};
+	pgmap->nr_range = 1;
 	pgmap->owner = res;
 
 #ifdef CONFIG_XEN_HAVE_PVMMU
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 375b9e87a5cf..86c6c368ce9b 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -94,7 +94,6 @@ struct dev_pagemap_ops {
 /**
  * struct dev_pagemap - metadata for ZONE_DEVICE mappings
  * @altmap: pre-allocated/reserved memory for vmemmap allocations
- * @range: physical address range covered by @ref
  * @ref: reference count that pins the devm_memremap_pages() mapping
  * @internal_ref: internal reference if @ref is not provided by the caller
  * @done: completion for @internal_ref
@@ -104,10 +103,12 @@ struct dev_pagemap_ops {
  * @owner: an opaque pointer identifying the entity that manages this
  *	instance.  Used by various helpers to make sure that no
  *	foreign ZONE_DEVICE memory is accessed.
+ * @nr_range: number of ranges to be mapped
+ * @range: range to be mapped when nr_range == 1
+ * @ranges: array of ranges to be mapped when nr_range > 1
  */
 struct dev_pagemap {
 	struct vmem_altmap altmap;
-	struct range range;
 	struct percpu_ref *ref;
 	struct percpu_ref internal_ref;
 	struct completion done;
@@ -115,6 +116,11 @@ struct dev_pagemap {
 	unsigned int flags;
 	const struct dev_pagemap_ops *ops;
 	void *owner;
+	int nr_range;
+	union {
+		struct range range;
+		struct range ranges[0];
+	};
 };
 
 static inline struct vmem_altmap *pgmap_altmap(struct dev_pagemap *pgmap)
diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 5b4521991621..e3065d6123f0 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -489,6 +489,7 @@ static bool dmirror_allocate_chunk(struct dmirror_device *mdevice,
 	devmem->pagemap.type = MEMORY_DEVICE_PRIVATE;
 	devmem->pagemap.range.start = res->start;
 	devmem->pagemap.range.end = res->end;
+	devmem->pagemap.nr_range = 1;
 	devmem->pagemap.ops = &dmirror_devmem_ops;
 	devmem->pagemap.owner = mdevice;
 
diff --git a/mm/memremap.c b/mm/memremap.c
index 7c895e1477b0..282849f2e319 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -77,15 +77,19 @@ static void pgmap_array_delete(struct range *range)
 	synchronize_rcu();
 }
 
-static unsigned long pfn_first(struct dev_pagemap *pgmap)
+static unsigned long pfn_first(struct dev_pagemap *pgmap, int range_id)
 {
-	return PHYS_PFN(pgmap->range.start) +
-		vmem_altmap_offset(pgmap_altmap(pgmap));
+	struct range *range = &pgmap->ranges[range_id];
+	unsigned long pfn = PHYS_PFN(range->start);
+
+	if (range_id)
+		return pfn;
+	return pfn + vmem_altmap_offset(pgmap_altmap(pgmap));
 }
 
-static unsigned long pfn_end(struct dev_pagemap *pgmap)
+static unsigned long pfn_end(struct dev_pagemap *pgmap, int range_id)
 {
-	const struct range *range = &pgmap->range;
+	const struct range *range = &pgmap->ranges[range_id];
 
 	return (range->start + range_len(range)) >> PAGE_SHIFT;
 }
@@ -117,8 +121,8 @@ bool pfn_zone_device_reserved(unsigned long pfn)
 	return ret;
 }
 
-#define for_each_device_pfn(pfn, map) \
-	for (pfn = pfn_first(map); pfn < pfn_end(map); pfn = pfn_next(pfn))
+#define for_each_device_pfn(pfn, map, i) \
+	for (pfn = pfn_first(map, i); pfn < pfn_end(map, i); pfn = pfn_next(pfn))
 
 static void dev_pagemap_kill(struct dev_pagemap *pgmap)
 {
@@ -144,20 +148,14 @@ static void dev_pagemap_cleanup(struct dev_pagemap *pgmap)
 		pgmap->ref = NULL;
 }
 
-void memunmap_pages(struct dev_pagemap *pgmap)
+static void pageunmap_range(struct dev_pagemap *pgmap, int range_id)
 {
-	struct range *range = &pgmap->range;
+	struct range *range = &pgmap->ranges[range_id];
 	struct page *first_page;
-	unsigned long pfn;
 	int nid;
 
-	dev_pagemap_kill(pgmap);
-	for_each_device_pfn(pfn, pgmap)
-		put_page(pfn_to_page(pfn));
-	dev_pagemap_cleanup(pgmap);
-
 	/* make sure to access a memmap that was actually initialized */
-	first_page = pfn_to_page(pfn_first(pgmap));
+	first_page = pfn_to_page(pfn_first(pgmap, range_id));
 
 	/* pages are dead and unused, undo the arch mapping */
 	nid = page_to_nid(first_page);
@@ -177,6 +175,22 @@ void memunmap_pages(struct dev_pagemap *pgmap)
 
 	untrack_pfn(NULL, PHYS_PFN(range->start), range_len(range));
 	pgmap_array_delete(range);
+}
+
+void memunmap_pages(struct dev_pagemap *pgmap)
+{
+	unsigned long pfn;
+	int i;
+
+	dev_pagemap_kill(pgmap);
+	for (i = 0; i < pgmap->nr_range; i++)
+		for_each_device_pfn(pfn, pgmap, i)
+			put_page(pfn_to_page(pfn));
+	dev_pagemap_cleanup(pgmap);
+
+	for (i = 0; i < pgmap->nr_range; i++)
+		pageunmap_range(pgmap, i);
+
 	WARN_ONCE(pgmap->altmap.alloc, "failed to free all reserved pages\n");
 	devmap_managed_enable_put();
 }
@@ -195,96 +209,29 @@ static void dev_pagemap_percpu_release(struct percpu_ref *ref)
 	complete(&pgmap->done);
 }
 
-/*
- * Not device managed version of dev_memremap_pages, undone by
- * memunmap_pages().  Please use dev_memremap_pages if you have a struct
- * device available.
- */
-void *memremap_pages(struct dev_pagemap *pgmap, int nid)
+static int pagemap_range(struct dev_pagemap *pgmap, struct mhp_params *params,
+		int range_id, int nid)
 {
-	struct range *range = &pgmap->range;
+	struct range *range = &pgmap->ranges[range_id];
 	struct dev_pagemap *conflict_pgmap;
-	struct mhp_params params = {
-		/*
-		 * We do not want any optional features only our own memmap
-		 */
-		.altmap = pgmap_altmap(pgmap),
-		.pgprot = PAGE_KERNEL,
-	};
 	int error, is_ram;
-	bool need_devmap_managed = true;
 
-	switch (pgmap->type) {
-	case MEMORY_DEVICE_PRIVATE:
-		if (!IS_ENABLED(CONFIG_DEVICE_PRIVATE)) {
-			WARN(1, "Device private memory not supported\n");
-			return ERR_PTR(-EINVAL);
-		}
-		if (!pgmap->ops || !pgmap->ops->migrate_to_ram) {
-			WARN(1, "Missing migrate_to_ram method\n");
-			return ERR_PTR(-EINVAL);
-		}
-		if (!pgmap->owner) {
-			WARN(1, "Missing owner\n");
-			return ERR_PTR(-EINVAL);
-		}
-		break;
-	case MEMORY_DEVICE_FS_DAX:
-		if (!IS_ENABLED(CONFIG_ZONE_DEVICE) ||
-		    IS_ENABLED(CONFIG_FS_DAX_LIMITED)) {
-			WARN(1, "File system DAX not supported\n");
-			return ERR_PTR(-EINVAL);
-		}
-		break;
-	case MEMORY_DEVICE_GENERIC:
-		need_devmap_managed = false;
-		break;
-	case MEMORY_DEVICE_PCI_P2PDMA:
-		params.pgprot = pgprot_noncached(params.pgprot);
-		need_devmap_managed = false;
-		break;
-	default:
-		WARN(1, "Invalid pgmap type %d\n", pgmap->type);
-		break;
-	}
-
-	if (!pgmap->ref) {
-		if (pgmap->ops && (pgmap->ops->kill || pgmap->ops->cleanup))
-			return ERR_PTR(-EINVAL);
-
-		init_completion(&pgmap->done);
-		error = percpu_ref_init(&pgmap->internal_ref,
-				dev_pagemap_percpu_release, 0, GFP_KERNEL);
-		if (error)
-			return ERR_PTR(error);
-		pgmap->ref = &pgmap->internal_ref;
-	} else {
-		if (!pgmap->ops || !pgmap->ops->kill || !pgmap->ops->cleanup) {
-			WARN(1, "Missing reference count teardown definition\n");
-			return ERR_PTR(-EINVAL);
-		}
-	}
-
-	if (need_devmap_managed) {
-		error = devmap_managed_enable_get(pgmap);
-		if (error)
-			return ERR_PTR(error);
-	}
+	if (WARN_ONCE(pgmap_altmap(pgmap) && range_id > 0,
+				"altmap not supported for multiple ranges\n"))
+		return -EINVAL;
 
 	conflict_pgmap = get_dev_pagemap(PHYS_PFN(range->start), NULL);
 	if (conflict_pgmap) {
 		WARN(1, "Conflicting mapping in same section\n");
 		put_dev_pagemap(conflict_pgmap);
-		error = -ENOMEM;
-		goto err_array;
+		return -ENOMEM;
 	}
 
 	conflict_pgmap = get_dev_pagemap(PHYS_PFN(range->end), NULL);
 	if (conflict_pgmap) {
 		WARN(1, "Conflicting mapping in same section\n");
 		put_dev_pagemap(conflict_pgmap);
-		error = -ENOMEM;
-		goto err_array;
+		return -ENOMEM;
 	}
 
 	is_ram = region_intersects(range->start, range_len(range),
@@ -294,19 +241,18 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
 		WARN_ONCE(1, "attempted on %s region %#llx-%#llx\n",
 				is_ram == REGION_MIXED ? "mixed" : "ram",
 				range->start, range->end);
-		error = -ENXIO;
-		goto err_array;
+		return -ENXIO;
 	}
 
 	error = xa_err(xa_store_range(&pgmap_array, PHYS_PFN(range->start),
 				PHYS_PFN(range->end), pgmap, GFP_KERNEL));
 	if (error)
-		goto err_array;
+		return error;
 
 	if (nid < 0)
 		nid = numa_mem_id();
 
-	error = track_pfn_remap(NULL, &params.pgprot, PHYS_PFN(range->start), 0,
+	error = track_pfn_remap(NULL, &params->pgprot, PHYS_PFN(range->start), 0,
 			range_len(range));
 	if (error)
 		goto err_pfn_remap;
@@ -326,7 +272,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
 	 */
 	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
 		error = add_pages(nid, PHYS_PFN(range->start),
-				PHYS_PFN(range_len(range)), &params);
+				PHYS_PFN(range_len(range)), params);
 	} else {
 		error = kasan_add_zero_shadow(__va(range->start), range_len(range));
 		if (error) {
@@ -335,7 +281,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
 		}
 
 		error = arch_add_memory(nid, range->start, range_len(range),
-					&params);
+					params);
 	}
 
 	if (!error) {
@@ -343,7 +289,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
 
 		zone = &NODE_DATA(nid)->node_zones[ZONE_DEVICE];
 		move_pfn_range_to_zone(zone, PHYS_PFN(range->start),
-				PHYS_PFN(range_len(range)), params.altmap);
+				PHYS_PFN(range_len(range)), params->altmap);
 	}
 
 	mem_hotplug_done();
@@ -357,20 +303,116 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
 	memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
 				PHYS_PFN(range->start),
 				PHYS_PFN(range_len(range)), pgmap);
-	percpu_ref_get_many(pgmap->ref, pfn_end(pgmap) - pfn_first(pgmap));
-	return __va(range->start);
+	percpu_ref_get_many(pgmap->ref, pfn_end(pgmap, range_id)
+			- pfn_first(pgmap, range_id));
+	return 0;
 
- err_add_memory:
+err_add_memory:
 	kasan_remove_zero_shadow(__va(range->start), range_len(range));
- err_kasan:
+err_kasan:
 	untrack_pfn(NULL, PHYS_PFN(range->start), range_len(range));
- err_pfn_remap:
+err_pfn_remap:
 	pgmap_array_delete(range);
- err_array:
-	dev_pagemap_kill(pgmap);
-	dev_pagemap_cleanup(pgmap);
-	devmap_managed_enable_put();
-	return ERR_PTR(error);
+	return error;
+}
+
+
+/*
+ * Not device managed version of dev_memremap_pages, undone by
+ * memunmap_pages().  Please use dev_memremap_pages if you have a struct
+ * device available.
+ */
+void *memremap_pages(struct dev_pagemap *pgmap, int nid)
+{
+	struct mhp_params params = {
+		.altmap = pgmap_altmap(pgmap),
+		.pgprot = PAGE_KERNEL,
+	};
+	const int nr_range = pgmap->nr_range;
+	bool need_devmap_managed = true;
+	int error, i;
+
+	if (WARN_ONCE(!nr_range, "nr_range must be specified\n"))
+		return ERR_PTR(-EINVAL);
+
+	switch (pgmap->type) {
+	case MEMORY_DEVICE_PRIVATE:
+		if (!IS_ENABLED(CONFIG_DEVICE_PRIVATE)) {
+			WARN(1, "Device private memory not supported\n");
+			return ERR_PTR(-EINVAL);
+		}
+		if (!pgmap->ops || !pgmap->ops->migrate_to_ram) {
+			WARN(1, "Missing migrate_to_ram method\n");
+			return ERR_PTR(-EINVAL);
+		}
+		if (!pgmap->owner) {
+			WARN(1, "Missing owner\n");
+			return ERR_PTR(-EINVAL);
+		}
+		break;
+	case MEMORY_DEVICE_FS_DAX:
+		if (!IS_ENABLED(CONFIG_ZONE_DEVICE) ||
+		    IS_ENABLED(CONFIG_FS_DAX_LIMITED)) {
+			WARN(1, "File system DAX not supported\n");
+			return ERR_PTR(-EINVAL);
+		}
+		break;
+	case MEMORY_DEVICE_GENERIC:
+		need_devmap_managed = false;
+		break;
+	case MEMORY_DEVICE_PCI_P2PDMA:
+		params.pgprot = pgprot_noncached(params.pgprot);
+		need_devmap_managed = false;
+		break;
+	default:
+		WARN(1, "Invalid pgmap type %d\n", pgmap->type);
+		break;
+	}
+
+	if (!pgmap->ref) {
+		if (pgmap->ops && (pgmap->ops->kill || pgmap->ops->cleanup))
+			return ERR_PTR(-EINVAL);
+
+		init_completion(&pgmap->done);
+		error = percpu_ref_init(&pgmap->internal_ref,
+				dev_pagemap_percpu_release, 0, GFP_KERNEL);
+		if (error)
+			return ERR_PTR(error);
+		pgmap->ref = &pgmap->internal_ref;
+	} else {
+		if (!pgmap->ops || !pgmap->ops->kill || !pgmap->ops->cleanup) {
+			WARN(1, "Missing reference count teardown definition\n");
+			return ERR_PTR(-EINVAL);
+		}
+	}
+
+	if (need_devmap_managed) {
+		error = devmap_managed_enable_get(pgmap);
+		if (error)
+			return ERR_PTR(error);
+	}
+
+	/*
+	 * Clear the pgmap nr_range as it will be incremented for each
+	 * successfully processed range. This communicates how many
+	 * regions to unwind in the abort case.
+	 */
+	pgmap->nr_range = 0;
+	error = 0;
+	for (i = 0; i < nr_range; i++) {
+		error = pagemap_range(pgmap, &params, i, nid);
+		if (error)
+			break;
+		pgmap->nr_range++;
+	}
+
+	if (i < nr_range) {
+		memunmap_pages(pgmap);
+		pgmap->nr_range = nr_range;
+		return ERR_PTR(error);
+	}
+
+	return __va(pgmap->ranges[0].start);
 }
 EXPORT_SYMBOL_GPL(memremap_pages);
 



* [PATCH v5 12/17] device-dax: add dis-contiguous resource support
  2020-09-25 19:11 [PATCH v5 00/17] device-dax: support sub-dividing soft-reserved ranges Dan Williams
                   ` (10 preceding siblings ...)
  2020-09-25 19:12 ` [PATCH v5 11/17] mm/memremap_pages: support multiple ranges per invocation Dan Williams
@ 2020-09-25 19:12 ` Dan Williams
  2020-09-25 19:12 ` [PATCH v5 13/17] device-dax: introduce 'mapping' devices Dan Williams
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Dan Williams @ 2020-09-25 19:12 UTC (permalink / raw)
  To: akpm
  Cc: Joao Martins, vishal.l.verma, dave.hansen, linux-mm,
	linux-nvdimm, linux-kernel

Break the requirement that device-dax instances are physically contiguous.
With this constraint removed, fragmented available capacity can be
fully allocated.

This capability is useful to mitigate the "noisy neighbor" problem with
memory-side-cache management for virtual machines, or any other scenario
where a platform address boundary also designates a performance boundary.
For example, a direct-mapped memory-side cache might rotate cache colors
at 1GB boundaries.  With dis-contiguous allocations a device-dax
instance could be configured to contain only one cache color.

It also satisfies Joao's use case (see link) for partitioning memory for
exclusive guest access.  It allows for a potential future mode where the
host kernel need not allocate 'struct page' capacity up-front.
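
As an illustration of the new bookkeeping, an instance built from two
separate 256MB allocations that straddle a reserved hole carries two
pgoff-ordered entries (the addresses below are made up):

	/*
	 * dev_dax->nr_range == 2
	 * dev_dax->ranges[0] = { .pgoff = 0x0,
	 *			  .range = { 0x100000000, 0x10fffffff } }
	 * dev_dax->ranges[1] = { .pgoff = 0x10000,
	 *			  .range = { 0x120000000, 0x12fffffff } }
	 */

A pgoff into the device is translated to a physical address by walking
this array, see the dax_pgoff_to_phys() changes below.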

Link: https://lore.kernel.org/lkml/20200110190313.17144-1-joao.m.martins@oracle.com/
Link: https://lkml.kernel.org/r/159643104304.4062302.16561669534797528660.stgit@dwillia2-desk3.amr.corp.intel.com
Reported-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/bus.c              |  233 +++++++++++++++++++++++++++++++---------
 drivers/dax/dax-private.h      |    9 +-
 drivers/dax/device.c           |   55 ++++++---
 drivers/dax/kmem.c             |  130 +++++++++++++++-------
 tools/testing/nvdimm/dax-dev.c |   20 ++-
 5 files changed, 323 insertions(+), 124 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 00fa73a8dfb4..06a789aba58a 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -136,15 +136,27 @@ static bool is_static(struct dax_region *dax_region)
 	return (dax_region->res.flags & IORESOURCE_DAX_STATIC) != 0;
 }
 
+static u64 dev_dax_size(struct dev_dax *dev_dax)
+{
+	u64 size = 0;
+	int i;
+
+	device_lock_assert(&dev_dax->dev);
+
+	for (i = 0; i < dev_dax->nr_range; i++)
+		size += range_len(&dev_dax->ranges[i].range);
+
+	return size;
+}
+
 static int dax_bus_probe(struct device *dev)
 {
 	struct dax_device_driver *dax_drv = to_dax_drv(dev->driver);
 	struct dev_dax *dev_dax = to_dev_dax(dev);
 	struct dax_region *dax_region = dev_dax->region;
-	struct range *range = &dev_dax->range;
 	int rc;
 
-	if (range_len(range) == 0 || dev_dax->id < 0)
+	if (dev_dax_size(dev_dax) == 0 || dev_dax->id < 0)
 		return -ENXIO;
 
 	rc = dax_drv->probe(dev_dax);
@@ -354,15 +366,19 @@ void kill_dev_dax(struct dev_dax *dev_dax)
 }
 EXPORT_SYMBOL_GPL(kill_dev_dax);
 
-static void free_dev_dax_range(struct dev_dax *dev_dax)
+static void free_dev_dax_ranges(struct dev_dax *dev_dax)
 {
 	struct dax_region *dax_region = dev_dax->region;
-	struct range *range = &dev_dax->range;
+	int i;
 
 	device_lock_assert(dax_region->dev);
-	if (range_len(range))
+	for (i = 0; i < dev_dax->nr_range; i++) {
+		struct range *range = &dev_dax->ranges[i].range;
+
 		__release_region(&dax_region->res, range->start,
 				range_len(range));
+	}
+	dev_dax->nr_range = 0;
 }
 
 static void unregister_dev_dax(void *dev)
@@ -372,7 +388,7 @@ static void unregister_dev_dax(void *dev)
 	dev_dbg(dev, "%s\n", __func__);
 
 	kill_dev_dax(dev_dax);
-	free_dev_dax_range(dev_dax);
+	free_dev_dax_ranges(dev_dax);
 	device_del(dev);
 	put_device(dev);
 }
@@ -423,7 +439,7 @@ static ssize_t delete_store(struct device *dev, struct device_attribute *attr,
 	device_lock(dev);
 	device_lock(victim);
 	dev_dax = to_dev_dax(victim);
-	if (victim->driver || range_len(&dev_dax->range))
+	if (victim->driver || dev_dax_size(dev_dax))
 		rc = -EBUSY;
 	else {
 		/*
@@ -569,51 +585,86 @@ static int alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start,
 	struct dax_region *dax_region = dev_dax->region;
 	struct resource *res = &dax_region->res;
 	struct device *dev = &dev_dax->dev;
+	struct dev_dax_range *ranges;
+	unsigned long pgoff = 0;
 	struct resource *alloc;
+	int i;
 
 	device_lock_assert(dax_region->dev);
 
 	/* handle the seed alloc special case */
 	if (!size) {
-		dev_dax->range = (struct range) {
-			.start = res->start,
-			.end = res->start - 1,
-		};
+		if (dev_WARN_ONCE(dev, dev_dax->nr_range,
+					"0-size allocation must be first\n"))
+			return -EBUSY;
+		/* nr_range == 0 is elsewhere special cased as 0-size device */
 		return 0;
 	}
 
+	ranges = krealloc(dev_dax->ranges, sizeof(*ranges)
+			* (dev_dax->nr_range + 1), GFP_KERNEL);
+	if (!ranges)
+		return -ENOMEM;
+
 	alloc = __request_region(res, start, size, dev_name(dev), 0);
-	if (!alloc)
+	if (!alloc) {
+		/*
+		 * If this was an empty set of ranges nothing else
+		 * will release @ranges, so do it now.
+		 */
+		if (!dev_dax->nr_range) {
+			kfree(ranges);
+			ranges = NULL;
+		}
+		dev_dax->ranges = ranges;
 		return -ENOMEM;
+	}
 
-	dev_dax->range = (struct range) {
-		.start = alloc->start,
-		.end = alloc->end,
+	for (i = 0; i < dev_dax->nr_range; i++)
+		pgoff += PHYS_PFN(range_len(&ranges[i].range));
+	dev_dax->ranges = ranges;
+	ranges[dev_dax->nr_range++] = (struct dev_dax_range) {
+		.pgoff = pgoff,
+		.range = {
+			.start = alloc->start,
+			.end = alloc->end,
+		},
 	};
 
+	dev_dbg(dev, "alloc range[%d]: %pa:%pa\n", dev_dax->nr_range - 1,
+			&alloc->start, &alloc->end);
+
 	return 0;
 }
 
 static int adjust_dev_dax_range(struct dev_dax *dev_dax, struct resource *res, resource_size_t size)
 {
+	int last_range = dev_dax->nr_range - 1;
+	struct dev_dax_range *dax_range = &dev_dax->ranges[last_range];
 	struct dax_region *dax_region = dev_dax->region;
-	struct range *range = &dev_dax->range;
-	int rc = 0;
+	bool is_shrink = resource_size(res) > size;
+	struct range *range = &dax_range->range;
+	struct device *dev = &dev_dax->dev;
+	int rc;
 
 	device_lock_assert(dax_region->dev);
 
-	if (size)
-		rc = adjust_resource(res, range->start, size);
-	else
-		__release_region(&dax_region->res, range->start, range_len(range));
+	if (dev_WARN_ONCE(dev, !size, "deletion is handled by dev_dax_shrink\n"))
+		return -EINVAL;
+
+	rc = adjust_resource(res, range->start, size);
 	if (rc)
 		return rc;
 
-	dev_dax->range = (struct range) {
+	*range = (struct range) {
 		.start = range->start,
 		.end = range->start + size - 1,
 	};
 
+	dev_dbg(dev, "%s range[%d]: %#llx:%#llx\n", is_shrink ? "shrink" : "extend",
+			last_range, (unsigned long long) range->start,
+			(unsigned long long) range->end);
+
 	return 0;
 }
 
@@ -621,7 +672,11 @@ static ssize_t size_show(struct device *dev,
 		struct device_attribute *attr, char *buf)
 {
 	struct dev_dax *dev_dax = to_dev_dax(dev);
-	unsigned long long size = range_len(&dev_dax->range);
+	unsigned long long size;
+
+	device_lock(dev);
+	size = dev_dax_size(dev_dax);
+	device_unlock(dev);
 
 	return sprintf(buf, "%llu\n", size);
 }
@@ -639,32 +694,82 @@ static bool alloc_is_aligned(struct dax_region *dax_region,
 
 static int dev_dax_shrink(struct dev_dax *dev_dax, resource_size_t size)
 {
+	resource_size_t to_shrink = dev_dax_size(dev_dax) - size;
 	struct dax_region *dax_region = dev_dax->region;
-	struct range *range = &dev_dax->range;
-	struct resource *res, *adjust = NULL;
 	struct device *dev = &dev_dax->dev;
-
-	for_each_dax_region_resource(dax_region, res)
-		if (strcmp(res->name, dev_name(dev)) == 0
-				&& res->start == range->start) {
-			adjust = res;
-			break;
+	int i;
+
+	for (i = dev_dax->nr_range - 1; i >= 0; i--) {
+		struct range *range = &dev_dax->ranges[i].range;
+		struct resource *adjust = NULL, *res;
+		resource_size_t shrink;
+
+		shrink = min_t(u64, to_shrink, range_len(range));
+		if (shrink >= range_len(range)) {
+			__release_region(&dax_region->res, range->start,
+					range_len(range));
+			dev_dax->nr_range--;
+			dev_dbg(dev, "delete range[%d]: %#llx:%#llx\n", i,
+					(unsigned long long) range->start,
+					(unsigned long long) range->end);
+			to_shrink -= shrink;
+			if (!to_shrink)
+				break;
+			continue;
 		}
 
-	if (dev_WARN_ONCE(dev, !adjust, "failed to find matching resource\n"))
-		return -ENXIO;
-	return adjust_dev_dax_range(dev_dax, adjust, size);
+		for_each_dax_region_resource(dax_region, res)
+			if (strcmp(res->name, dev_name(dev)) == 0
+					&& res->start == range->start) {
+				adjust = res;
+				break;
+			}
+
+		if (dev_WARN_ONCE(dev, !adjust || i != dev_dax->nr_range - 1,
+					"failed to find matching resource\n"))
+			return -ENXIO;
+		return adjust_dev_dax_range(dev_dax, adjust, range_len(range)
+				- shrink);
+	}
+	return 0;
+}
+
+/*
+ * Only allow adjustments that preserve the relative pgoff of existing
+ * allocations. I.e. the dev_dax->ranges array is ordered by increasing pgoff.
+ */
+static bool adjust_ok(struct dev_dax *dev_dax, struct resource *res)
+{
+	struct dev_dax_range *last;
+	int i;
+
+	if (dev_dax->nr_range == 0)
+		return false;
+	if (strcmp(res->name, dev_name(&dev_dax->dev)) != 0)
+		return false;
+	last = &dev_dax->ranges[dev_dax->nr_range - 1];
+	if (last->range.start != res->start || last->range.end != res->end)
+		return false;
+	for (i = 0; i < dev_dax->nr_range - 1; i++) {
+		struct dev_dax_range *dax_range = &dev_dax->ranges[i];
+
+		if (dax_range->pgoff > last->pgoff)
+			return false;
+	}
+
+	return true;
 }
 
 static ssize_t dev_dax_resize(struct dax_region *dax_region,
 		struct dev_dax *dev_dax, resource_size_t size)
 {
 	resource_size_t avail = dax_region_avail_size(dax_region), to_alloc;
-	resource_size_t dev_size = range_len(&dev_dax->range);
+	resource_size_t dev_size = dev_dax_size(dev_dax);
 	struct resource *region_res = &dax_region->res;
 	struct device *dev = &dev_dax->dev;
-	const char *name = dev_name(dev);
 	struct resource *res, *first;
+	resource_size_t alloc = 0;
+	int rc;
 
 	if (dev->driver)
 		return -EBUSY;
@@ -685,35 +790,47 @@ static ssize_t dev_dax_resize(struct dax_region *dax_region,
 	 * may involve adjusting the end of an existing resource, or
 	 * allocating a new resource.
 	 */
+retry:
 	first = region_res->child;
 	if (!first)
 		return alloc_dev_dax_range(dev_dax, dax_region->res.start, to_alloc);
-	for (res = first; to_alloc && res; res = res->sibling) {
+
+	rc = -ENOSPC;
+	for (res = first; res; res = res->sibling) {
 		struct resource *next = res->sibling;
-		resource_size_t free;
 
 		/* space at the beginning of the region */
-		free = 0;
-		if (res == first && res->start > dax_region->res.start)
-			free = res->start - dax_region->res.start;
-		if (free >= to_alloc && dev_size == 0)
-			return alloc_dev_dax_range(dev_dax, dax_region->res.start, to_alloc);
+		if (res == first && res->start > dax_region->res.start) {
+			alloc = min(res->start - dax_region->res.start, to_alloc);
+			rc = alloc_dev_dax_range(dev_dax, dax_region->res.start, alloc);
+			break;
+		}
 
-		free = 0;
+		alloc = 0;
 		/* space between allocations */
 		if (next && next->start > res->end + 1)
-			free = next->start - res->end + 1;
+			alloc = min(next->start - (res->end + 1), to_alloc);
 
 		/* space at the end of the region */
-		if (free < to_alloc && !next && res->end < region_res->end)
-			free = region_res->end - res->end;
+		if (!alloc && !next && res->end < region_res->end)
+			alloc = min(region_res->end - res->end, to_alloc);
 
-		if (free >= to_alloc && strcmp(name, res->name) == 0)
-			return adjust_dev_dax_range(dev_dax, res, resource_size(res) + to_alloc);
-		else if (free >= to_alloc && dev_size == 0)
-			return alloc_dev_dax_range(dev_dax, res->end + 1, to_alloc);
+		if (!alloc)
+			continue;
+
+		if (adjust_ok(dev_dax, res)) {
+			rc = adjust_dev_dax_range(dev_dax, res, resource_size(res) + alloc);
+			break;
+		}
+		rc = alloc_dev_dax_range(dev_dax, res->end + 1, alloc);
+		break;
 	}
-	return -ENOSPC;
+	if (rc)
+		return rc;
+	to_alloc -= alloc;
+	if (to_alloc)
+		goto retry;
+	return 0;
 }
 
 static ssize_t size_store(struct device *dev, struct device_attribute *attr,
@@ -767,8 +884,15 @@ static ssize_t resource_show(struct device *dev,
 		struct device_attribute *attr, char *buf)
 {
 	struct dev_dax *dev_dax = to_dev_dax(dev);
+	struct dax_region *dax_region = dev_dax->region;
+	unsigned long long start;
+
+	if (dev_dax->nr_range < 1)
+		start = dax_region->res.start;
+	else
+		start = dev_dax->ranges[0].range.start;
 
-	return sprintf(buf, "%#llx\n", dev_dax->range.start);
+	return sprintf(buf, "%#llx\n", start);
 }
 static DEVICE_ATTR(resource, 0400, resource_show, NULL);
 
@@ -833,6 +957,7 @@ static void dev_dax_release(struct device *dev)
 	put_dax(dax_dev);
 	free_dev_dax_id(dev_dax);
 	dax_region_put(dax_region);
+	kfree(dev_dax->ranges);
 	kfree(dev_dax->pgmap);
 	kfree(dev_dax);
 }
@@ -941,7 +1066,7 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
 err_alloc_dax:
 	kfree(dev_dax->pgmap);
 err_pgmap:
-	free_dev_dax_range(dev_dax);
+	free_dev_dax_ranges(dev_dax);
 err_range:
 	free_dev_dax_id(dev_dax);
 err_id:
diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
index 0cbb2ec81ca7..f863287107fd 100644
--- a/drivers/dax/dax-private.h
+++ b/drivers/dax/dax-private.h
@@ -49,7 +49,8 @@ struct dax_region {
  * @id: ida allocated id
  * @dev - device core
  * @pgmap - pgmap for memmap setup / lifetime (driver owned)
- * @range: resource range for the instance
+ * @nr_range: size of @ranges
+ * @ranges: resource-span + pgoff tuples for the instance
  */
 struct dev_dax {
 	struct dax_region *region;
@@ -58,7 +59,11 @@ struct dev_dax {
 	int id;
 	struct device dev;
 	struct dev_pagemap *pgmap;
-	struct range range;
+	int nr_range;
+	struct dev_dax_range {
+		unsigned long pgoff;
+		struct range range;
+	} *ranges;
 };
 
 static inline struct dev_dax *to_dev_dax(struct device *dev)
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 5f808617672a..bf389712a20b 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -55,15 +55,22 @@ static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma,
 __weak phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff,
 		unsigned long size)
 {
-	struct range *range = &dev_dax->range;
-	phys_addr_t phys;
-
-	phys = pgoff * PAGE_SIZE + range->start;
-	if (phys >= range->start && phys <= range->end) {
+	int i;
+
+	for (i = 0; i < dev_dax->nr_range; i++) {
+		struct dev_dax_range *dax_range = &dev_dax->ranges[i];
+		struct range *range = &dax_range->range;
+		unsigned long long pgoff_end;
+		phys_addr_t phys;
+
+		pgoff_end = dax_range->pgoff + PHYS_PFN(range_len(range)) - 1;
+		if (pgoff < dax_range->pgoff || pgoff > pgoff_end)
+			continue;
+		phys = PFN_PHYS(pgoff - dax_range->pgoff) + range->start;
 		if (phys + size - 1 <= range->end)
 			return phys;
+		break;
 	}
-
 	return -1;
 }
 
@@ -395,30 +402,40 @@ static void dev_dax_kill(void *dev_dax)
 int dev_dax_probe(struct dev_dax *dev_dax)
 {
 	struct dax_device *dax_dev = dev_dax->dax_dev;
-	struct range *range = &dev_dax->range;
 	struct device *dev = &dev_dax->dev;
 	struct dev_pagemap *pgmap;
 	struct inode *inode;
 	struct cdev *cdev;
 	void *addr;
-	int rc;
-
-	/* 1:1 map region resource range to device-dax instance range */
-	if (!devm_request_mem_region(dev, range->start, range_len(range),
-				dev_name(dev))) {
-		dev_warn(dev, "could not reserve range: %#llx - %#llx\n",
-				range->start, range->end);
-		return -EBUSY;
-	}
+	int rc, i;
 
 	pgmap = dev_dax->pgmap;
+	if (dev_WARN_ONCE(dev, pgmap && dev_dax->nr_range > 1,
+			"static pgmap / multi-range device conflict\n"))
+		return -EINVAL;
+
 	if (!pgmap) {
-		pgmap = devm_kzalloc(dev, sizeof(*pgmap), GFP_KERNEL);
+		pgmap = devm_kzalloc(dev, sizeof(*pgmap) + sizeof(struct range)
+				* (dev_dax->nr_range - 1), GFP_KERNEL);
 		if (!pgmap)
 			return -ENOMEM;
-		pgmap->range = *range;
-		pgmap->nr_range = 1;
+		pgmap->nr_range = dev_dax->nr_range;
+	}
+
+	for (i = 0; i < dev_dax->nr_range; i++) {
+		struct range *range = &dev_dax->ranges[i].range;
+
+		if (!devm_request_mem_region(dev, range->start,
+					range_len(range), dev_name(dev))) {
+			dev_warn(dev, "mapping%d: %#llx-%#llx could not reserve range\n",
+					i, range->start, range->end);
+			return -EBUSY;
+		}
+		/* don't update the range for static pgmap */
+		if (!dev_dax->pgmap)
+			pgmap->ranges[i] = *range;
 	}
+
 	pgmap->type = MEMORY_DEVICE_GENERIC;
 	addr = devm_memremap_pages(dev, pgmap);
 	if (IS_ERR(addr))
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index c2ac465cc342..6c933f2b604e 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -19,24 +19,28 @@ static const char *kmem_name;
 /* Set if any memory will remain added when the driver will be unloaded. */
 static bool any_hotremove_failed;
 
-static struct range dax_kmem_range(struct dev_dax *dev_dax)
+static int dax_kmem_range(struct dev_dax *dev_dax, int i, struct range *r)
 {
-	struct range range;
+	struct dev_dax_range *dax_range = &dev_dax->ranges[i];
+	struct range *range = &dax_range->range;
 
 	/* memory-block align the hotplug range */
-	range.start = ALIGN(dev_dax->range.start, memory_block_size_bytes());
-	range.end = ALIGN_DOWN(dev_dax->range.end + 1, memory_block_size_bytes()) - 1;
-	return range;
+	r->start = ALIGN(range->start, memory_block_size_bytes());
+	r->end = ALIGN_DOWN(range->end + 1, memory_block_size_bytes()) - 1;
+	if (r->start >= r->end) {
+		r->start = range->start;
+		r->end = range->end;
+		return -ENOSPC;
+	}
+	return 0;
 }
 
 static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
 {
-	struct range range = dax_kmem_range(dev_dax);
 	struct device *dev = &dev_dax->dev;
-	struct resource *res;
+	int i, mapped = 0;
 	char *res_name;
 	int numa_node;
-	int rc;
 
 	/*
 	 * Ensure good NUMA information for the persistent memory.
@@ -55,31 +59,58 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
 	if (!res_name)
 		return -ENOMEM;
 
-	/* Region is permanently reserved if hotremove fails. */
-	res = request_mem_region(range.start, range_len(&range), res_name);
-	if (!res) {
-		dev_warn(dev, "could not reserve region [%#llx-%#llx]\n", range.start, range.end);
-		kfree(res_name);
-		return -EBUSY;
-	}
-
-	/*
-	 * Set flags appropriate for System RAM.  Leave ..._BUSY clear
-	 * so that add_memory() can add a child resource.  Do not
-	 * inherit flags from the parent since it may set new flags
-	 * unknown to us that will break add_memory() below.
-	 */
-	res->flags = IORESOURCE_SYSTEM_RAM;
-
-	/*
-	 * Ensure that future kexec'd kernels will not treat this as RAM
-	 * automatically.
-	 */
-	rc = add_memory_driver_managed(numa_node, range.start, range_len(&range), kmem_name);
-	if (rc) {
-		release_mem_region(range.start, range_len(&range));
-		kfree(res_name);
-		return rc;
+	for (i = 0; i < dev_dax->nr_range; i++) {
+		struct resource *res;
+		struct range range;
+		int rc;
+
+		rc = dax_kmem_range(dev_dax, i, &range);
+		if (rc) {
+			dev_info(dev, "mapping%d: %#llx-%#llx too small after alignment\n",
+					i, range.start, range.end);
+			continue;
+		}
+
+		/* Region is permanently reserved if hotremove fails. */
+		res = request_mem_region(range.start, range_len(&range), res_name);
+		if (!res) {
+			dev_warn(dev, "mapping%d: %#llx-%#llx could not reserve region\n",
+					i, range.start, range.end);
+			/*
+			 * Once some memory has been onlined we can't
+			 * assume that it can be un-onlined safely.
+			 */
+			if (mapped)
+				continue;
+			kfree(res_name);
+			return -EBUSY;
+		}
+
+		/*
+		 * Set flags appropriate for System RAM.  Leave ..._BUSY clear
+		 * so that add_memory() can add a child resource.  Do not
+		 * inherit flags from the parent since it may set new flags
+		 * unknown to us that will break add_memory() below.
+		 */
+		res->flags = IORESOURCE_SYSTEM_RAM;
+
+		/*
+		 * Ensure that future kexec'd kernels will not treat
+		 * this as RAM automatically.
+		 */
+		rc = add_memory_driver_managed(numa_node, range.start,
+				range_len(&range), kmem_name);
+
+		if (rc) {
+			dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n",
+					i, range.start, range.end);
+			release_mem_region(range.start, range_len(&range));
+			if (mapped)
+				continue;
+			kfree(res_name);
+			return rc;
+		}
+		mapped++;
 	}
 
 	dev_set_drvdata(dev, res_name);
@@ -90,9 +121,8 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
 #ifdef CONFIG_MEMORY_HOTREMOVE
 static int dev_dax_kmem_remove(struct dev_dax *dev_dax)
 {
-	int rc;
+	int i, success = 0;
 	struct device *dev = &dev_dax->dev;
-	struct range range = dax_kmem_range(dev_dax);
 	const char *res_name = dev_get_drvdata(dev);
 
 	/*
@@ -101,17 +131,31 @@ static int dev_dax_kmem_remove(struct dev_dax *dev_dax)
 	 * there is no way to hotremove this memory until reboot because device
 	 * unbind will succeed even if we return failure.
 	 */
-	rc = remove_memory(dev_dax->target_node, range.start, range_len(&range));
-	if (rc) {
+	for (i = 0; i < dev_dax->nr_range; i++) {
+		struct range range;
+		int rc;
+
+		rc = dax_kmem_range(dev_dax, i, &range);
+		if (rc)
+			continue;
+
+		rc = remove_memory(dev_dax->target_node, range.start,
+				range_len(&range));
+		if (rc == 0) {
+			release_mem_region(range.start, range_len(&range));
+			success++;
+			continue;
+		}
 		any_hotremove_failed = true;
-		dev_err(dev, "%#llx-%#llx cannot be hotremoved until the next reboot\n",
-				range.start, range.end);
-		return rc;
+		dev_err(dev,
+			"mapping%d: %#llx-%#llx cannot be hotremoved until the next reboot\n",
+				i, range.start, range.end);
 	}
 
-	/* Release and free dax resources */
-	release_mem_region(range.start, range_len(&range));
-	kfree(res_name);
+	if (success >= dev_dax->nr_range) {
+		kfree(res_name);
+		dev_set_drvdata(dev, NULL);
+	}
 
 	return 0;
 }
diff --git a/tools/testing/nvdimm/dax-dev.c b/tools/testing/nvdimm/dax-dev.c
index 38d8e55c4a0d..fb342a8c98d3 100644
--- a/tools/testing/nvdimm/dax-dev.c
+++ b/tools/testing/nvdimm/dax-dev.c
@@ -9,11 +9,18 @@
 phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff,
 		unsigned long size)
 {
-	struct range *range = &dev_dax->range;
-	phys_addr_t addr;
+	int i;
 
-	addr = pgoff * PAGE_SIZE + range->start;
-	if (addr >= range->start && addr <= range->end) {
+	for (i = 0; i < dev_dax->nr_range; i++) {
+		struct dev_dax_range *dax_range = &dev_dax->ranges[i];
+		struct range *range = &dax_range->range;
+		unsigned long long pgoff_end;
+		phys_addr_t addr;
+
+		pgoff_end = dax_range->pgoff + PHYS_PFN(range_len(range)) - 1;
+		if (pgoff < dax_range->pgoff || pgoff > pgoff_end)
+			continue;
+		addr = PFN_PHYS(pgoff - dax_range->pgoff) + range->start;
 		if (addr + size - 1 <= range->end) {
 			if (get_nfit_res(addr)) {
 				struct page *page;
@@ -23,9 +30,10 @@ phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff,
 
 				page = vmalloc_to_page((void *)addr);
 				return PFN_PHYS(page_to_pfn(page));
-			} else
-				return addr;
+			}
+			return addr;
 		}
+		break;
 	}
 	return -1;
 }



* [PATCH v5 13/17] device-dax: introduce 'mapping' devices
  2020-09-25 19:11 [PATCH v5 00/17] device-dax: support sub-dividing soft-reserved ranges Dan Williams
                   ` (11 preceding siblings ...)
  2020-09-25 19:12 ` [PATCH v5 12/17] device-dax: add dis-contiguous resource support Dan Williams
@ 2020-09-25 19:12 ` Dan Williams
  2020-09-25 19:12 ` [PATCH v5 14/17] device-dax: make align a per-device property Dan Williams
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Dan Williams @ 2020-09-25 19:12 UTC (permalink / raw)
  To: akpm
  Cc: Joao Martins, vishal.l.verma, dave.hansen, linux-mm,
	linux-nvdimm, linux-kernel

In support of interrogating the physical address layout of a device with
dis-contiguous ranges, introduce a sysfs directory with 'start', 'end',
and 'page_offset' attributes.  The alternative is parsing /proc/iomem,
but that file does not reflect the extent layout until the device is
enabled.
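
With these devices in place the extent layout of a dis-contiguous
instance is discoverable before it is enabled, e.g. (hypothetical
dax0.0 instance with two mappings):

	/sys/bus/dax/devices/dax0.0/mapping0/start
	/sys/bus/dax/devices/dax0.0/mapping0/end
	/sys/bus/dax/devices/dax0.0/mapping0/page_offset
	/sys/bus/dax/devices/dax0.0/mapping1/...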

Link: https://lkml.kernel.org/r/159643104819.4062302.13691281391423291589.stgit@dwillia2-desk3.amr.corp.intel.com
Cc: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/bus.c         |  191 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/dax/dax-private.h |   14 +++
 2 files changed, 203 insertions(+), 2 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 06a789aba58a..005fa3e6d41c 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -579,6 +579,167 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
 }
 EXPORT_SYMBOL_GPL(alloc_dax_region);
 
+static void dax_mapping_release(struct device *dev)
+{
+	struct dax_mapping *mapping = to_dax_mapping(dev);
+	struct dev_dax *dev_dax = to_dev_dax(dev->parent);
+
+	ida_free(&dev_dax->ida, mapping->id);
+	kfree(mapping);
+}
+
+static void unregister_dax_mapping(void *data)
+{
+	struct device *dev = data;
+	struct dax_mapping *mapping = to_dax_mapping(dev);
+	struct dev_dax *dev_dax = to_dev_dax(dev->parent);
+	struct dax_region *dax_region = dev_dax->region;
+
+	dev_dbg(dev, "%s\n", __func__);
+
+	device_lock_assert(dax_region->dev);
+
+	dev_dax->ranges[mapping->range_id].mapping = NULL;
+	mapping->range_id = -1;
+
+	device_del(dev);
+	put_device(dev);
+}
+
+static struct dev_dax_range *get_dax_range(struct device *dev)
+{
+	struct dax_mapping *mapping = to_dax_mapping(dev);
+	struct dev_dax *dev_dax = to_dev_dax(dev->parent);
+	struct dax_region *dax_region = dev_dax->region;
+
+	device_lock(dax_region->dev);
+	if (mapping->range_id < 0) {
+		device_unlock(dax_region->dev);
+		return NULL;
+	}
+
+	return &dev_dax->ranges[mapping->range_id];
+}
+
+static void put_dax_range(struct dev_dax_range *dax_range)
+{
+	struct dax_mapping *mapping = dax_range->mapping;
+	struct dev_dax *dev_dax = to_dev_dax(mapping->dev.parent);
+	struct dax_region *dax_region = dev_dax->region;
+
+	device_unlock(dax_region->dev);
+}
+
+static ssize_t start_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct dev_dax_range *dax_range;
+	ssize_t rc;
+
+	dax_range = get_dax_range(dev);
+	if (!dax_range)
+		return -ENXIO;
+	rc = sprintf(buf, "%#llx\n", dax_range->range.start);
+	put_dax_range(dax_range);
+
+	return rc;
+}
+static DEVICE_ATTR(start, 0400, start_show, NULL);
+
+static ssize_t end_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct dev_dax_range *dax_range;
+	ssize_t rc;
+
+	dax_range = get_dax_range(dev);
+	if (!dax_range)
+		return -ENXIO;
+	rc = sprintf(buf, "%#llx\n", dax_range->range.end);
+	put_dax_range(dax_range);
+
+	return rc;
+}
+static DEVICE_ATTR(end, 0400, end_show, NULL);
+
+static ssize_t pgoff_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct dev_dax_range *dax_range;
+	ssize_t rc;
+
+	dax_range = get_dax_range(dev);
+	if (!dax_range)
+		return -ENXIO;
+	rc = sprintf(buf, "%#lx\n", dax_range->pgoff);
+	put_dax_range(dax_range);
+
+	return rc;
+}
+static DEVICE_ATTR(page_offset, 0400, pgoff_show, NULL);
+
+static struct attribute *dax_mapping_attributes[] = {
+	&dev_attr_start.attr,
+	&dev_attr_end.attr,
+	&dev_attr_page_offset.attr,
+	NULL,
+};
+
+static const struct attribute_group dax_mapping_attribute_group = {
+	.attrs = dax_mapping_attributes,
+};
+
+static const struct attribute_group *dax_mapping_attribute_groups[] = {
+	&dax_mapping_attribute_group,
+	NULL,
+};
+
+static struct device_type dax_mapping_type = {
+	.release = dax_mapping_release,
+	.groups = dax_mapping_attribute_groups,
+};
+
+static int devm_register_dax_mapping(struct dev_dax *dev_dax, int range_id)
+{
+	struct dax_region *dax_region = dev_dax->region;
+	struct dax_mapping *mapping;
+	struct device *dev;
+	int rc;
+
+	device_lock_assert(dax_region->dev);
+
+	if (dev_WARN_ONCE(&dev_dax->dev, !dax_region->dev->driver,
+				"region disabled\n"))
+		return -ENXIO;
+
+	mapping = kzalloc(sizeof(*mapping), GFP_KERNEL);
+	if (!mapping)
+		return -ENOMEM;
+	mapping->range_id = range_id;
+	mapping->id = ida_alloc(&dev_dax->ida, GFP_KERNEL);
+	if (mapping->id < 0) {
+		kfree(mapping);
+		return -ENOMEM;
+	}
+	dev_dax->ranges[range_id].mapping = mapping;
+	dev = &mapping->dev;
+	device_initialize(dev);
+	dev->parent = &dev_dax->dev;
+	dev->type = &dax_mapping_type;
+	dev_set_name(dev, "mapping%d", mapping->id);
+	rc = device_add(dev);
+	if (rc) {
+		put_device(dev);
+		return rc;
+	}
+
+	rc = devm_add_action_or_reset(dax_region->dev, unregister_dax_mapping,
+			dev);
+	if (rc)
+		return rc;
+	return 0;
+}
+
 static int alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start,
 		resource_size_t size)
 {
@@ -588,7 +749,7 @@ static int alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start,
 	struct dev_dax_range *ranges;
 	unsigned long pgoff = 0;
 	struct resource *alloc;
-	int i;
+	int i, rc;
 
 	device_lock_assert(dax_region->dev);
 
@@ -633,6 +794,22 @@ static int alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start,
 
 	dev_dbg(dev, "alloc range[%d]: %pa:%pa\n", dev_dax->nr_range - 1,
 			&alloc->start, &alloc->end);
+	/*
+	 * A dev_dax instance must be registered before mapping device
+	 * children can be added. Defer to devm_create_dev_dax() to add
+	 * the initial mapping device.
+	 */
+	if (!device_is_registered(&dev_dax->dev))
+		return 0;
+
+	rc = devm_register_dax_mapping(dev_dax, dev_dax->nr_range - 1);
+	if (rc) {
+		dev_dbg(dev, "delete range[%d]: %pa:%pa\n", dev_dax->nr_range - 1,
+				&alloc->start, &alloc->end);
+		dev_dax->nr_range--;
+		__release_region(res, alloc->start, resource_size(alloc));
+		return rc;
+	}
 
 	return 0;
 }
@@ -701,11 +878,14 @@ static int dev_dax_shrink(struct dev_dax *dev_dax, resource_size_t size)
 
 	for (i = dev_dax->nr_range - 1; i >= 0; i--) {
 		struct range *range = &dev_dax->ranges[i].range;
+		struct dax_mapping *mapping = dev_dax->ranges[i].mapping;
 		struct resource *adjust = NULL, *res;
 		resource_size_t shrink;
 
 		shrink = min_t(u64, to_shrink, range_len(range));
 		if (shrink >= range_len(range)) {
+			devm_release_action(dax_region->dev,
+					unregister_dax_mapping, &mapping->dev);
 			__release_region(&dax_region->res, range->start,
 					range_len(range));
 			dev_dax->nr_range--;
@@ -1036,9 +1216,9 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
 	/* a device_dax instance is dead while the driver is not attached */
 	kill_dax(dax_dev);
 
-	/* from here on we're committed to teardown via dev_dax_release() */
 	dev_dax->dax_dev = dax_dev;
 	dev_dax->target_node = dax_region->target_node;
+	ida_init(&dev_dax->ida);
 	kref_get(&dax_region->kref);
 
 	inode = dax_inode(dax_dev);
@@ -1061,6 +1241,13 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
 	if (rc)
 		return ERR_PTR(rc);
 
+	/* register mapping device for the initial allocation range */
+	if (dev_dax->nr_range && range_len(&dev_dax->ranges[0].range)) {
+		rc = devm_register_dax_mapping(dev_dax, 0);
+		if (rc)
+			return ERR_PTR(rc);
+	}
+
 	return dev_dax;
 
 err_alloc_dax:
diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
index f863287107fd..13780f62b95e 100644
--- a/drivers/dax/dax-private.h
+++ b/drivers/dax/dax-private.h
@@ -40,6 +40,12 @@ struct dax_region {
 	struct device *youngest;
 };
 
+struct dax_mapping {
+	struct device dev;
+	int range_id;
+	int id;
+};
+
 /**
  * struct dev_dax - instance data for a subdivision of a dax region, and
  * data while the device is activated in the driver.
@@ -47,6 +53,7 @@ struct dax_region {
  * @dax_dev - core dax functionality
  * @target_node: effective numa node if dev_dax memory range is onlined
  * @id: ida allocated id
+ * @ida: mapping id allocator
  * @dev - device core
  * @pgmap - pgmap for memmap setup / lifetime (driver owned)
  * @nr_range: size of @ranges
@@ -57,12 +64,14 @@ struct dev_dax {
 	struct dax_device *dax_dev;
 	int target_node;
 	int id;
+	struct ida ida;
 	struct device dev;
 	struct dev_pagemap *pgmap;
 	int nr_range;
 	struct dev_dax_range {
 		unsigned long pgoff;
 		struct range range;
+		struct dax_mapping *mapping;
 	} *ranges;
 };
 
@@ -70,4 +79,9 @@ static inline struct dev_dax *to_dev_dax(struct device *dev)
 {
 	return container_of(dev, struct dev_dax, dev);
 }
+
+static inline struct dax_mapping *to_dax_mapping(struct device *dev)
+{
+	return container_of(dev, struct dax_mapping, dev);
+}
 #endif



* [PATCH v5 14/17] device-dax: make align a per-device property
  2020-09-25 19:11 [PATCH v5 00/17] device-dax: support sub-dividing soft-reserved ranges Dan Williams
                   ` (12 preceding siblings ...)
  2020-09-25 19:12 ` [PATCH v5 13/17] device-dax: introduce 'mapping' devices Dan Williams
@ 2020-09-25 19:12 ` Dan Williams
  2020-09-25 19:13 ` [PATCH v5 16/17] dax/hmem: introduce dax_hmem.region_idle parameter Dan Williams
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Dan Williams @ 2020-09-25 19:12 UTC (permalink / raw)
  To: akpm
  Cc: Joao Martins, vishal.l.verma, dave.hansen, linux-mm,
	linux-nvdimm, linux-kernel

From: Joao Martins <joao.m.martins@oracle.com>

Introduce @align to struct dev_dax.

When creating a new device, it still initializes to the default
dax_region @align.  Child devices belonging to a region may, however,
wish to keep an alignment property different from the global
region-defined one.
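
The net effect is that the fault handlers compare the faulting size
against the device, not the region.  Roughly, for the huge-page
handlers (a sketch of the policy in the changes below):

	if (fault_size < dev_dax->align)
		return VM_FAULT_SIGBUS;		/* can never be satisfied */
	else if (fault_size > dev_dax->align)
		return VM_FAULT_FALLBACK;	/* retry at a smaller size */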

Link: https://lkml.kernel.org/r/159643105377.4062302.4159447829955683131.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lore.kernel.org/r/20200716172913.19658-2-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/bus.c         |    1 +
 drivers/dax/dax-private.h |    3 +++
 drivers/dax/device.c      |   41 +++++++++++++++--------------------------
 3 files changed, 19 insertions(+), 26 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 005fa3e6d41c..852899084d13 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -1218,6 +1218,7 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
 
 	dev_dax->dax_dev = dax_dev;
 	dev_dax->target_node = dax_region->target_node;
+	dev_dax->align = dax_region->align;
 	ida_init(&dev_dax->ida);
 	kref_get(&dax_region->kref);
 
diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
index 13780f62b95e..5fd3a26cfcea 100644
--- a/drivers/dax/dax-private.h
+++ b/drivers/dax/dax-private.h
@@ -62,6 +62,7 @@ struct dax_mapping {
 struct dev_dax {
 	struct dax_region *region;
 	struct dax_device *dax_dev;
+	unsigned int align;
 	int target_node;
 	int id;
 	struct ida ida;
@@ -84,4 +85,6 @@ static inline struct dax_mapping *to_dax_mapping(struct device *dev)
 {
 	return container_of(dev, struct dax_mapping, dev);
 }
+
+phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff, unsigned long size);
 #endif
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index bf389712a20b..25e0b84a4296 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -17,7 +17,6 @@
 static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma,
 		const char *func)
 {
-	struct dax_region *dax_region = dev_dax->region;
 	struct device *dev = &dev_dax->dev;
 	unsigned long mask;
 
@@ -32,7 +31,7 @@ static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma,
 		return -EINVAL;
 	}
 
-	mask = dax_region->align - 1;
+	mask = dev_dax->align - 1;
 	if (vma->vm_start & mask || vma->vm_end & mask) {
 		dev_info_ratelimited(dev,
 				"%s: %s: fail, unaligned vma (%#lx - %#lx, %#lx)\n",
@@ -78,21 +77,19 @@ static vm_fault_t __dev_dax_pte_fault(struct dev_dax *dev_dax,
 				struct vm_fault *vmf, pfn_t *pfn)
 {
 	struct device *dev = &dev_dax->dev;
-	struct dax_region *dax_region;
 	phys_addr_t phys;
 	unsigned int fault_size = PAGE_SIZE;
 
 	if (check_vma(dev_dax, vmf->vma, __func__))
 		return VM_FAULT_SIGBUS;
 
-	dax_region = dev_dax->region;
-	if (dax_region->align > PAGE_SIZE) {
+	if (dev_dax->align > PAGE_SIZE) {
 		dev_dbg(dev, "alignment (%#x) > fault size (%#x)\n",
-			dax_region->align, fault_size);
+			dev_dax->align, fault_size);
 		return VM_FAULT_SIGBUS;
 	}
 
-	if (fault_size != dax_region->align)
+	if (fault_size != dev_dax->align)
 		return VM_FAULT_SIGBUS;
 
 	phys = dax_pgoff_to_phys(dev_dax, vmf->pgoff, PAGE_SIZE);
@@ -111,7 +108,6 @@ static vm_fault_t __dev_dax_pmd_fault(struct dev_dax *dev_dax,
 {
 	unsigned long pmd_addr = vmf->address & PMD_MASK;
 	struct device *dev = &dev_dax->dev;
-	struct dax_region *dax_region;
 	phys_addr_t phys;
 	pgoff_t pgoff;
 	unsigned int fault_size = PMD_SIZE;
@@ -119,16 +115,15 @@ static vm_fault_t __dev_dax_pmd_fault(struct dev_dax *dev_dax,
 	if (check_vma(dev_dax, vmf->vma, __func__))
 		return VM_FAULT_SIGBUS;
 
-	dax_region = dev_dax->region;
-	if (dax_region->align > PMD_SIZE) {
+	if (dev_dax->align > PMD_SIZE) {
 		dev_dbg(dev, "alignment (%#x) > fault size (%#x)\n",
-			dax_region->align, fault_size);
+			dev_dax->align, fault_size);
 		return VM_FAULT_SIGBUS;
 	}
 
-	if (fault_size < dax_region->align)
+	if (fault_size < dev_dax->align)
 		return VM_FAULT_SIGBUS;
-	else if (fault_size > dax_region->align)
+	else if (fault_size > dev_dax->align)
 		return VM_FAULT_FALLBACK;
 
 	/* if we are outside of the VMA */
@@ -154,7 +149,6 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax,
 {
 	unsigned long pud_addr = vmf->address & PUD_MASK;
 	struct device *dev = &dev_dax->dev;
-	struct dax_region *dax_region;
 	phys_addr_t phys;
 	pgoff_t pgoff;
 	unsigned int fault_size = PUD_SIZE;
@@ -163,16 +157,15 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax,
 	if (check_vma(dev_dax, vmf->vma, __func__))
 		return VM_FAULT_SIGBUS;
 
-	dax_region = dev_dax->region;
-	if (dax_region->align > PUD_SIZE) {
+	if (dev_dax->align > PUD_SIZE) {
 		dev_dbg(dev, "alignment (%#x) > fault size (%#x)\n",
-			dax_region->align, fault_size);
+			dev_dax->align, fault_size);
 		return VM_FAULT_SIGBUS;
 	}
 
-	if (fault_size < dax_region->align)
+	if (fault_size < dev_dax->align)
 		return VM_FAULT_SIGBUS;
-	else if (fault_size > dax_region->align)
+	else if (fault_size > dev_dax->align)
 		return VM_FAULT_FALLBACK;
 
 	/* if we are outside of the VMA */
@@ -267,9 +260,8 @@ static int dev_dax_split(struct vm_area_struct *vma, unsigned long addr)
 {
 	struct file *filp = vma->vm_file;
 	struct dev_dax *dev_dax = filp->private_data;
-	struct dax_region *dax_region = dev_dax->region;
 
-	if (!IS_ALIGNED(addr, dax_region->align))
+	if (!IS_ALIGNED(addr, dev_dax->align))
 		return -EINVAL;
 	return 0;
 }
@@ -278,9 +270,8 @@ static unsigned long dev_dax_pagesize(struct vm_area_struct *vma)
 {
 	struct file *filp = vma->vm_file;
 	struct dev_dax *dev_dax = filp->private_data;
-	struct dax_region *dax_region = dev_dax->region;
 
-	return dax_region->align;
+	return dev_dax->align;
 }
 
 static const struct vm_operations_struct dax_vm_ops = {
@@ -319,13 +310,11 @@ static unsigned long dax_get_unmapped_area(struct file *filp,
 {
 	unsigned long off, off_end, off_align, len_align, addr_align, align;
 	struct dev_dax *dev_dax = filp ? filp->private_data : NULL;
-	struct dax_region *dax_region;
 
 	if (!dev_dax || addr)
 		goto out;
 
-	dax_region = dev_dax->region;
-	align = dax_region->align;
+	align = dev_dax->align;
 	off = pgoff << PAGE_SHIFT;
 	off_end = off + len;
 	off_align = round_up(off, align);


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 16/17] dax/hmem: introduce dax_hmem.region_idle parameter
  2020-09-25 19:11 [PATCH v5 00/17] device-dax: support sub-dividing soft-reserved ranges Dan Williams
                   ` (13 preceding siblings ...)
  2020-09-25 19:12 ` [PATCH v5 14/17] device-dax: make align a per-device property Dan Williams
@ 2020-09-25 19:13 ` Dan Williams
  2020-09-25 19:13 ` [PATCH v5 17/17] device-dax: add a range mapping allocation attribute Dan Williams
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 31+ messages in thread
From: Dan Williams @ 2020-09-25 19:13 UTC (permalink / raw)
  To: akpm
  Cc: Joao Martins, vishal.l.verma, dave.hansen, linux-mm,
	linux-nvdimm, linux-kernel

From: Joao Martins <joao.m.martins@oracle.com>

Introduce a new module parameter for dax_hmem which initializes all region
devices as free, rather than allocating a pagemap for the region by
default.

All hmem devices created with dax_hmem.region_idle=1 will have their
full size available for creating dynamic dax devices.
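
For illustration, a minimal userspace sketch that checks the idle
state; the sysfs path and the assumption that the initial device is
named dax0.0 are examples, not part of this patch:

#include <stdio.h>

int main(void)
{
	unsigned long long size;
	FILE *f = fopen("/sys/bus/dax/devices/dax0.0/size", "r");

	if (!f) {
		perror("open size attribute");
		return 1;
	}
	if (fscanf(f, "%llu", &size) != 1) {
		fclose(f);
		return 1;
	}
	fclose(f);
	/* 0 means the region is idle and fully available for allocation */
	printf("dax0.0 size: %llu\n", size);
	return 0;
}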

Link: https://lkml.kernel.org/r/159643106460.4062302.5868522341307530091.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lore.kernel.org/r/20200716172913.19658-4-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/hmem/hmem.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 1a3347bb6143..1bf040dbc834 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -5,6 +5,9 @@
 #include <linux/pfn_t.h>
 #include "../bus.h"
 
+static bool region_idle;
+module_param_named(region_idle, region_idle, bool, 0644);
+
 static int dax_hmem_probe(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
@@ -30,7 +33,7 @@ static int dax_hmem_probe(struct platform_device *pdev)
 	data = (struct dev_dax_data) {
 		.dax_region = dax_region,
 		.id = -1,
-		.size = resource_size(res),
+		.size = region_idle ? 0 : resource_size(res),
 	};
 	dev_dax = devm_create_dev_dax(&data);
 	if (IS_ERR(dev_dax))


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 17/17] device-dax: add a range mapping allocation attribute
  2020-09-25 19:11 [PATCH v5 00/17] device-dax: support sub-dividing soft-reserved ranges Dan Williams
                   ` (14 preceding siblings ...)
  2020-09-25 19:13 ` [PATCH v5 16/17] dax/hmem: introduce dax_hmem.region_idle parameter Dan Williams
@ 2020-09-25 19:13 ` Dan Williams
  2020-09-25 20:51 ` [PATCH v5 00/17] device-dax: support sub-dividing soft-reserved ranges Joao Martins
       [not found] ` <160106118486.30709.13012322227204800596.stgit@dwillia2-desk3.amr.corp.intel.com>
  17 siblings, 0 replies; 31+ messages in thread
From: Dan Williams @ 2020-09-25 19:13 UTC (permalink / raw)
  To: akpm
  Cc: Joao Martins, vishal.l.verma, dave.hansen, linux-mm,
	linux-nvdimm, linux-kernel

From: Joao Martins <joao.m.martins@oracle.com>

Add a sysfs attribute which denotes a range from the dax region to be
allocated.  It's a write-only @mapping sysfs attribute in the format of
'<start>-<end>' to allocate a range.  @start and @end use hexadecimal
values, and the @pgoff is implicitly ordered with respect to previous
writes to @mapping, e.g. after a write of a range of length 1G the
pgoff covers 0..1G(-4K), and a second write will use @pgoff starting at
1G+4K..<size>.

This range mapping interface is useful for:

 1) Applications which want to implement their own allocation logic,
 and thus pick the desired ranges from the dax_region.

 2) Use cases like VMM fast restart[0] where, after kexec, we want to
 recreate the same gpa<->phys mappings (as originally created before
 kexec).

[0] https://static.sched.com/hosted_files/kvmforum2019/66/VMM-fast-restart_kvmforum2019.pdf
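
As an illustration of the '<start>-<end>' hexadecimal format above, a
minimal userspace sketch; the device path and the physical range are
assumed example values, not taken from this patch:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* allocate a 1G range; the end address is inclusive */
	unsigned long long start = 0x140000000ULL;
	unsigned long long end = start + (1ULL << 30) - 1;
	char buf[64];
	int len, fd = open("/sys/bus/dax/devices/dax0.1/mapping", O_WRONLY);

	if (fd < 0) {
		perror("open mapping attribute");
		return 1;
	}
	/* hexadecimal '<start>-<end>', parsed with kstrtoull(..., 16, ...) */
	len = snprintf(buf, sizeof(buf), "%llx-%llx", start, end);
	if (write(fd, buf, len) < 0)
		perror("write mapping");
	close(fd);
	return 0;
}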

Link: https://lkml.kernel.org/r/159643106970.4062302.10402616567780784722.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lore.kernel.org/r/20200716172913.19658-5-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/bus.c |   64 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 0ac4a9c0fd18..27513d311242 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -1043,6 +1043,67 @@ static ssize_t size_store(struct device *dev, struct device_attribute *attr,
 }
 static DEVICE_ATTR_RW(size);
 
+static ssize_t range_parse(const char *opt, size_t len, struct range *range)
+{
+	unsigned long long addr = 0;
+	char *start, *end, *str;
+	ssize_t rc = -EINVAL;
+
+	str = kstrdup(opt, GFP_KERNEL);
+	if (!str)
+		return rc;
+
+	end = str;
+	start = strsep(&end, "-");
+	if (!start || !end)
+		goto err;
+
+	rc = kstrtoull(start, 16, &addr);
+	if (rc)
+		goto err;
+	range->start = addr;
+
+	rc = kstrtoull(end, 16, &addr);
+	if (rc)
+		goto err;
+	range->end = addr;
+
+err:
+	kfree(str);
+	return rc;
+}
+
+static ssize_t mapping_store(struct device *dev, struct device_attribute *attr,
+		const char *buf, size_t len)
+{
+	struct dev_dax *dev_dax = to_dev_dax(dev);
+	struct dax_region *dax_region = dev_dax->region;
+	size_t to_alloc;
+	struct range r;
+	ssize_t rc;
+
+	rc = range_parse(buf, len, &r);
+	if (rc)
+		return rc;
+
+	rc = -ENXIO;
+	device_lock(dax_region->dev);
+	if (!dax_region->dev->driver) {
+		device_unlock(dax_region->dev);
+		return rc;
+	}
+	device_lock(dev);
+
+	to_alloc = range_len(&r);
+	if (alloc_is_aligned(dev_dax, to_alloc))
+		rc = alloc_dev_dax_range(dev_dax, r.start, to_alloc);
+	device_unlock(dev);
+	device_unlock(dax_region->dev);
+
+	return rc == 0 ? len : rc;
+}
+static DEVICE_ATTR_WO(mapping);
+
 static ssize_t align_show(struct device *dev,
 		struct device_attribute *attr, char *buf)
 {
@@ -1175,6 +1236,8 @@ static umode_t dev_dax_visible(struct kobject *kobj, struct attribute *a, int n)
 		return 0;
 	if (a == &dev_attr_numa_node.attr && !IS_ENABLED(CONFIG_NUMA))
 		return 0;
+	if (a == &dev_attr_mapping.attr && is_static(dax_region))
+		return 0;
 	if ((a == &dev_attr_align.attr ||
 	     a == &dev_attr_size.attr) && is_static(dax_region))
 		return 0444;
@@ -1184,6 +1247,7 @@ static umode_t dev_dax_visible(struct kobject *kobj, struct attribute *a, int n)
 static struct attribute *dev_dax_attributes[] = {
 	&dev_attr_modalias.attr,
 	&dev_attr_size.attr,
+	&dev_attr_mapping.attr,
 	&dev_attr_target_node.attr,
 	&dev_attr_align.attr,
 	&dev_attr_resource.attr,


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 00/17] device-dax: support sub-dividing soft-reserved ranges
  2020-09-25 19:11 [PATCH v5 00/17] device-dax: support sub-dividing soft-reserved ranges Dan Williams
                   ` (15 preceding siblings ...)
  2020-09-25 19:13 ` [PATCH v5 17/17] device-dax: add a range mapping allocation attribute Dan Williams
@ 2020-09-25 20:51 ` Joao Martins
  2020-09-25 21:01   ` Dan Williams
       [not found] ` <160106118486.30709.13012322227204800596.stgit@dwillia2-desk3.amr.corp.intel.com>
  17 siblings, 1 reply; 31+ messages in thread
From: Joao Martins @ 2020-09-25 20:51 UTC (permalink / raw)
  To: Dan Williams, akpm
  Cc: David Hildenbrand, Ira Weiny, Bjorn Helgaas, Vishal Verma,
	Dave Hansen, David Airlie, Vivek Goyal, Dave Jiang,
	Jonathan Cameron, Greg Kroah-Hartman, Pavel Tatashin, Hulk Robot,
	Ben Skeggs, Benjamin Herrenschmidt, Jia He,
	Jérôme Glisse, Jason Yan, Paul Mackerras,
	Boris Ostrovsky, Brice Goglin, Stefano Stabellini,
	Michael Ellerman, Juergen Gross, Daniel Vetter, linux-mm,
	linux-nvdimm, linux-kernel

Hey Dan,

On 9/25/20 8:11 PM, Dan Williams wrote:
> Changes since v4 [1]:
> - Rebased on
>   device-dax-move-instance-creation-parameters-to-struct-dev_dax_data.patch
>   in -mm [2]. I.e. patches that did not need fixups from v4 are not
>   included.
> 
> - Folded all fixes
> 

Hmm, perhaps you missed the fixups before the above mentioned patch?

From:

	https://www.ozlabs.org/~akpm/mmots/series

under "mm/dax", I am listing those fixups here:

x86-numa-add-nohmat-option-fix.patch
acpi-hmat-refactor-hmat_register_target_device-to-hmem_register_device-fix.patch
mm-memory_hotplug-introduce-default-phys_to_target_node-implementation-fix.patch
acpi-hmat-attach-a-device-for-each-soft-reserved-range-fix.patch

(in https://www.ozlabs.org/~akpm/mmots/broken-out/)

[...]

> ---
> 
> Andrew, this series replaces
> 
> device-dax-make-pgmap-optional-for-instance-creation.patch
> 
> ...through...
> 
> dax-hmem-introduce-dax_hmemregion_idle-parameter.patch
> 
> ...in your stack.
> 
> Let me know if there is a different / preferred way to refresh a bulk of
> patches in your queue when only a subset need updates.
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 00/17] device-dax: support sub-dividing soft-reserved ranges
  2020-09-25 20:51 ` [PATCH v5 00/17] device-dax: support sub-dividing soft-reserved ranges Joao Martins
@ 2020-09-25 21:01   ` Dan Williams
  2020-09-25 21:05     ` Joao Martins
  0 siblings, 1 reply; 31+ messages in thread
From: Dan Williams @ 2020-09-25 21:01 UTC (permalink / raw)
  To: Joao Martins
  Cc: Andrew Morton, David Hildenbrand, Ira Weiny, Bjorn Helgaas,
	Vishal Verma, Dave Hansen, David Airlie, Vivek Goyal, Dave Jiang,
	Jonathan Cameron, Greg Kroah-Hartman, Pavel Tatashin, Hulk Robot,
	Ben Skeggs, Benjamin Herrenschmidt, Jia He,
	Jérôme Glisse, Jason Yan, Paul Mackerras,
	Boris Ostrovsky, Brice Goglin, Stefano Stabellini,
	Michael Ellerman, Juergen Gross, Daniel Vetter, Linux MM,
	linux-nvdimm, Linux Kernel Mailing List

On Fri, Sep 25, 2020 at 1:52 PM Joao Martins <joao.m.martins@oracle.com> wrote:
>
> Hey Dan,
>
> On 9/25/20 8:11 PM, Dan Williams wrote:
> > Changes since v4 [1]:
> > - Rebased on
> >   device-dax-move-instance-creation-parameters-to-struct-dev_dax_data.patch
> >   in -mm [2]. I.e. patches that did not need fixups from v4 are not
> >   included.
> >
> > - Folded all fixes
> >
>
> Hmm, perhaps you missed the fixups before the above mentioned patch?
>
> From:
>
>         https://www.ozlabs.org/~akpm/mmots/series
>
> under "mm/dax", I am listing those fixups here:
>
> x86-numa-add-nohmat-option-fix.patch
> acpi-hmat-refactor-hmat_register_target_device-to-hmem_register_device-fix.patch
> mm-memory_hotplug-introduce-default-phys_to_target_node-implementation-fix.patch
> acpi-hmat-attach-a-device-for-each-soft-reserved-range-fix.patch
>
> (in https://www.ozlabs.org/~akpm/mmots/broken-out/)

I left those for Andrew to handle. I actually should have started this
set one more down in his stack because that's where my new changes
start.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 00/17] device-dax: support sub-dividing soft-reserved ranges
  2020-09-25 21:01   ` Dan Williams
@ 2020-09-25 21:05     ` Joao Martins
  0 siblings, 0 replies; 31+ messages in thread
From: Joao Martins @ 2020-09-25 21:05 UTC (permalink / raw)
  To: Dan Williams
  Cc: Andrew Morton, David Hildenbrand, Ira Weiny, Bjorn Helgaas,
	Vishal Verma, Dave Hansen, David Airlie, Vivek Goyal, Dave Jiang,
	Jonathan Cameron, Greg Kroah-Hartman, Pavel Tatashin, Hulk Robot,
	Ben Skeggs, Benjamin Herrenschmidt, Jia He,
	Jérôme Glisse, Jason Yan, Paul Mackerras,
	Boris Ostrovsky, Brice Goglin, Stefano Stabellini,
	Michael Ellerman, Juergen Gross, Daniel Vetter, Linux MM,
	linux-nvdimm, Linux Kernel Mailing List

On 9/25/20 10:01 PM, Dan Williams wrote:
> On Fri, Sep 25, 2020 at 1:52 PM Joao Martins <joao.m.martins@oracle.com> wrote:
>>
>> Hey Dan,
>>
>> On 9/25/20 8:11 PM, Dan Williams wrote:
>>> Changes since v4 [1]:
>>> - Rebased on
>>>   device-dax-move-instance-creation-parameters-to-struct-dev_dax_data.patch
>>>   in -mm [2]. I.e. patches that did not need fixups from v4 are not
>>>   included.
>>>
>>> - Folded all fixes
>>>
>>
>> Hmm, perhaps you missed the fixups before the above mentioned patch?
>>
>> From:
>>
>>         https://www.ozlabs.org/~akpm/mmots/series
>>
>> under "mm/dax", I am listing those fixups here:
>>
>> x86-numa-add-nohmat-option-fix.patch
>> acpi-hmat-refactor-hmat_register_target_device-to-hmem_register_device-fix.patch
>> mm-memory_hotplug-introduce-default-phys_to_target_node-implementation-fix.patch
>> acpi-hmat-attach-a-device-for-each-soft-reserved-range-fix.patch
>>
>> (in https://www.ozlabs.org/~akpm/mmots/broken-out/)
> 
> I left those for Andrew to handle. I actually should have started this
> set one more down in his stack because that's where my new changes
> start.
> 
Ah, got it!

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 15/17] device-dax: add an 'align' attribute
       [not found] ` <160106118486.30709.13012322227204800596.stgit@dwillia2-desk3.amr.corp.intel.com>
@ 2020-09-26  2:22   ` Andrew Morton
  2020-09-26  3:31     ` Dan Williams
  0 siblings, 1 reply; 31+ messages in thread
From: Andrew Morton @ 2020-09-26  2:22 UTC (permalink / raw)
  To: Dan Williams
  Cc: Joao Martins, vishal.l.verma, dave.hansen, linux-mm,
	linux-nvdimm, linux-kernel

On Fri, 25 Sep 2020 12:13:04 -0700 Dan Williams <dan.j.williams@intel.com> wrote:

> Introduce a device align attribute.  While doing so, rename the region
> align attribute to be more explicitly named as such, but keep it named as
> @align to retain the API for tools like daxctl.
> 
> Changes to align may not always be valid when, say, certain mappings were
> created with 2M and then we switch to 1G.  So, we validate all ranges
> against the new value being attempted, post resizing.
> 
> Link: https://lkml.kernel.org/r/159643105944.4062302.3131761052969132784.stgit@dwillia2-desk3.amr.corp.intel.com
> Link: https://lore.kernel.org/r/20200716172913.19658-3-joao.m.martins@oracle.com
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>

The signoff chain implies that this was From:Joao.  Please clarify?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 15/17] device-dax: add an 'align' attribute
  2020-09-26  2:22   ` [PATCH v5 15/17] device-dax: add an 'align' attribute Andrew Morton
@ 2020-09-26  3:31     ` Dan Williams
  0 siblings, 0 replies; 31+ messages in thread
From: Dan Williams @ 2020-09-26  3:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joao Martins, Vishal L Verma, Dave Hansen, Linux MM,
	linux-nvdimm, Linux Kernel Mailing List

On Fri, Sep 25, 2020 at 7:22 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Fri, 25 Sep 2020 12:13:04 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
>
> > Introduce a device align attribute.  While doing so, rename the region
> > align attribute to be more explicitly named as such, but keep it named as
> > @align to retain the API for tools like daxctl.
> >
> > Changes to align may not always be valid when, say, certain mappings were
> > created with 2M and then we switch to 1G.  So, we validate all ranges
> > against the new value being attempted, post resizing.
> >
> > Link: https://lkml.kernel.org/r/159643105944.4062302.3131761052969132784.stgit@dwillia2-desk3.amr.corp.intel.com
> > Link: https://lore.kernel.org/r/20200716172913.19658-3-joao.m.martins@oracle.com
> > Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> >
>
> The signoff chain implies that this was From:Joao.  Please clarify?

Yes, sorry, my script to squash the fix rewrote the author and I
failed to catch it. This indeed should be:

From: Joao Martins <joao.m.martins@oracle.com>

I double-checked, and it looks like this was the only one in the
series with that problem.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 10/17] mm/memremap_pages: convert to 'struct range'
  2020-09-25 19:12 ` [PATCH v5 10/17] mm/memremap_pages: convert to 'struct range' Dan Williams
@ 2020-09-28 19:12   ` boris.ostrovsky
  0 siblings, 0 replies; 31+ messages in thread
From: boris.ostrovsky @ 2020-09-28 19:12 UTC (permalink / raw)
  To: Dan Williams, akpm
  Cc: Paul Mackerras, Michael Ellerman, Benjamin Herrenschmidt,
	Vishal Verma, Vivek Goyal, Dave Jiang, Ben Skeggs, David Airlie,
	Daniel Vetter, Ira Weiny, Bjorn Helgaas, Juergen Gross,
	Stefano Stabellini, Jérôme Glisse, dave.hansen,
	linux-mm, linux-nvdimm, linux-kernel


On 9/25/20 3:12 PM, Dan Williams wrote:
>  
> diff --git a/drivers/xen/unpopulated-alloc.c b/drivers/xen/unpopulated-alloc.c
> index 3b98dc921426..091b8669eca3 100644
> --- a/drivers/xen/unpopulated-alloc.c
> +++ b/drivers/xen/unpopulated-alloc.c
> @@ -18,27 +18,37 @@ static unsigned int list_count;
>  static int fill_list(unsigned int nr_pages)
>  {
>  	struct dev_pagemap *pgmap;
> +	struct resource *res;
>  	void *vaddr;
>  	unsigned int i, alloc_pages = round_up(nr_pages, PAGES_PER_SECTION);
> -	int ret;
> +	int ret = -ENOMEM;
> +
> +	res = kzalloc(sizeof(*res), GFP_KERNEL);
> +	if (!res)
> +		return -ENOMEM;
>  
>  	pgmap = kzalloc(sizeof(*pgmap), GFP_KERNEL);
>  	if (!pgmap)
> -		return -ENOMEM;
> +		goto err_pgmap;
>  
>  	pgmap->type = MEMORY_DEVICE_GENERIC;


Can you move these last 5 lines ...


> -	pgmap->res.name = "Xen scratch";
> -	pgmap->res.flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> +	res->name = "Xen scratch";
> +	res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>  
> -	ret = allocate_resource(&iomem_resource, &pgmap->res,
> +	ret = allocate_resource(&iomem_resource, res,
>  				alloc_pages * PAGE_SIZE, 0, -1,
>  				PAGES_PER_SECTION * PAGE_SIZE, NULL, NULL);
>  	if (ret < 0) {
>  		pr_err("Cannot allocate new IOMEM resource\n");
> -		kfree(pgmap);
> -		return ret;
> +		goto err_resource;
>  	}
>  


... here, so that we deal with pgmap in the same place? The diff will be slightly larger but the code will read better I think.


-boris


> +	pgmap->range = (struct range) {
> +		.start = res->start,
> +		.end = res->end,
> +	};
> +	pgmap->owner = res;
> +
>  #ifdef CONFIG_XEN_HAVE_PVMMU
>          /*
>           * memremap will build page tables for the new memory so

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 02/17] device-dax/kmem: introduce dax_kmem_range()
  2020-09-25 19:11 ` [PATCH v5 02/17] device-dax/kmem: introduce dax_kmem_range() Dan Williams
@ 2020-09-30 16:14   ` David Hildenbrand
  0 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2020-09-30 16:14 UTC (permalink / raw)
  To: Dan Williams, akpm
  Cc: Vishal Verma, Dave Hansen, Pavel Tatashin, Brice Goglin,
	Dave Jiang, Ira Weiny, Jia He, Joao Martins, Jonathan Cameron,
	linux-mm, linux-nvdimm, linux-kernel

On 25.09.20 21:11, Dan Williams wrote:
> Towards removing the mode specific @dax_kmem_res attribute from the
> generic 'struct dev_dax', and preparing for multi-range support, teach
> the driver to calculate the hotplug range from the device range. The
> hotplug range is the trivially calculated memory-block-size aligned
> version of the device range.
> 
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Vishal Verma <vishal.l.verma@intel.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Brice Goglin <Brice.Goglin@inria.fr>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Jia He <justin.he@arm.com>
> Cc: Joao Martins <joao.m.martins@oracle.com>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/dax/kmem.c |   40 +++++++++++++++++-----------------------
>  1 file changed, 17 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index 5bb133df147d..b0d6a99cf12d 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -19,13 +19,20 @@ static const char *kmem_name;
>  /* Set if any memory will remain added when the driver will be unloaded. */
>  static bool any_hotremove_failed;
>  
> +static struct range dax_kmem_range(struct dev_dax *dev_dax)
> +{
> +	struct range range;
> +
> +	/* memory-block align the hotplug range */
> +	range.start = ALIGN(dev_dax->range.start, memory_block_size_bytes());
> +	range.end = ALIGN_DOWN(dev_dax->range.end + 1, memory_block_size_bytes()) - 1;
> +	return range;
> +}
> +
>  int dev_dax_kmem_probe(struct device *dev)
>  {
>  	struct dev_dax *dev_dax = to_dev_dax(dev);
> -	struct range *range = &dev_dax->range;
> -	resource_size_t kmem_start;
> -	resource_size_t kmem_size;
> -	resource_size_t kmem_end;
> +	struct range range = dax_kmem_range(dev_dax);
>  	struct resource *new_res;
>  	const char *new_res_name;
>  	int numa_node;
> @@ -44,25 +51,14 @@ int dev_dax_kmem_probe(struct device *dev)
>  		return -EINVAL;
>  	}
>  
> -	/* Hotplug starting at the beginning of the next block: */
> -	kmem_start = ALIGN(range->start, memory_block_size_bytes());
> -
> -	kmem_size = range_len(range);
> -	/* Adjust the size down to compensate for moving up kmem_start: */
> -	kmem_size -= kmem_start - range->start;
> -	/* Align the size down to cover only complete blocks: */
> -	kmem_size &= ~(memory_block_size_bytes() - 1);
> -	kmem_end = kmem_start + kmem_size;
> -
>  	new_res_name = kstrdup(dev_name(dev), GFP_KERNEL);
>  	if (!new_res_name)
>  		return -ENOMEM;
>  
>  	/* Region is permanently reserved if hotremove fails. */
> -	new_res = request_mem_region(kmem_start, kmem_size, new_res_name);
> +	new_res = request_mem_region(range.start, range_len(&range), new_res_name);
>  	if (!new_res) {
> -		dev_warn(dev, "could not reserve region [%pa-%pa]\n",
> -			 &kmem_start, &kmem_end);
> +		dev_warn(dev, "could not reserve region [%#llx-%#llx]\n", range.start, range.end);
>  		kfree(new_res_name);
>  		return -EBUSY;
>  	}
> @@ -96,9 +92,8 @@ int dev_dax_kmem_probe(struct device *dev)
>  static int dev_dax_kmem_remove(struct device *dev)
>  {
>  	struct dev_dax *dev_dax = to_dev_dax(dev);
> +	struct range range = dax_kmem_range(dev_dax);
>  	struct resource *res = dev_dax->dax_kmem_res;
> -	resource_size_t kmem_start = res->start;
> -	resource_size_t kmem_size = resource_size(res);
>  	const char *res_name = res->name;
>  	int rc;
>  
> @@ -108,12 +103,11 @@ static int dev_dax_kmem_remove(struct device *dev)
>  	 * there is no way to hotremove this memory until reboot because device
>  	 * unbind will succeed even if we return failure.
>  	 */
> -	rc = remove_memory(dev_dax->target_node, kmem_start, kmem_size);
> +	rc = remove_memory(dev_dax->target_node, range.start, range_len(&range));
>  	if (rc) {
>  		any_hotremove_failed = true;
> -		dev_err(dev,
> -			"DAX region %pR cannot be hotremoved until the next reboot\n",
> -			res);
> +		dev_err(dev, "%#llx-%#llx cannot be hotremoved until the next reboot\n",
> +				range.start, range.end);
>  		return rc;
>  	}
>  
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb
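
Restating the rounding in the hunk above as a standalone sketch; the
128M memory block size and the device range are assumed example values:

#include <stdio.h>

#define MB(x)			((unsigned long long)(x) << 20)
#define ALIGN_UP(x, a)		(((x) + (a) - 1) & ~((a) - 1))
#define ALIGN_DOWN(x, a)	((x) & ~((a) - 1))

int main(void)
{
	unsigned long long block = MB(128);	/* memory_block_size_bytes() */
	unsigned long long dev_start = MB(2);	/* deliberately unaligned */
	unsigned long long dev_end = MB(258) - 1;
	/* same math as dax_kmem_range(): round inward to whole blocks */
	unsigned long long start = ALIGN_UP(dev_start, block);
	unsigned long long end = ALIGN_DOWN(dev_end + 1, block) - 1;

	printf("device:  [%#llx-%#llx]\n", dev_start, dev_end);
	printf("hotplug: [%#llx-%#llx]\n", start, end);	/* one 128M block */
	return 0;
}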


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 03/17] device-dax/kmem: move resource name tracking to drvdata
  2020-09-25 19:11 ` [PATCH v5 03/17] device-dax/kmem: move resource name tracking to drvdata Dan Williams
@ 2020-09-30 16:19   ` David Hildenbrand
  0 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2020-09-30 16:19 UTC (permalink / raw)
  To: Dan Williams, akpm
  Cc: Vishal Verma, Dave Hansen, Pavel Tatashin, Brice Goglin,
	Dave Jiang, Ira Weiny, Jia He, Joao Martins, Jonathan Cameron,
	linux-mm, linux-nvdimm, linux-kernel

On 25.09.20 21:11, Dan Williams wrote:
> Towards removing the mode specific @dax_kmem_res attribute from the
> generic 'struct dev_dax', and preparing for multi-range support, move
> resource name tracking to driver data.  The memory for the resource name
> needs to have its own lifetime separate from the device bind lifetime
> for cases where the driver is unbound, but the kmem range could not be
> unplugged from the page allocator.
> 
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Vishal Verma <vishal.l.verma@intel.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Brice Goglin <Brice.Goglin@inria.fr>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Jia He <justin.he@arm.com>
> Cc: Joao Martins <joao.m.martins@oracle.com>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/dax/kmem.c |   16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index b0d6a99cf12d..6fe2cb1c5f7c 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -34,7 +34,7 @@ int dev_dax_kmem_probe(struct device *dev)
>  	struct dev_dax *dev_dax = to_dev_dax(dev);
>  	struct range range = dax_kmem_range(dev_dax);
>  	struct resource *new_res;
> -	const char *new_res_name;
> +	char *res_name;

I wonder why that change ...

>  	int numa_node;
>  	int rc;
>  
> @@ -51,15 +51,15 @@ int dev_dax_kmem_probe(struct device *dev)
>  		return -EINVAL;
>  	}
>  
> -	new_res_name = kstrdup(dev_name(dev), GFP_KERNEL);
> -	if (!new_res_name)
> +	res_name = kstrdup(dev_name(dev), GFP_KERNEL);
> +	if (!res_name)
>  		return -ENOMEM;
>  
>  	/* Region is permanently reserved if hotremove fails. */
> -	new_res = request_mem_region(range.start, range_len(&range), new_res_name);
> +	new_res = request_mem_region(range.start, range_len(&range), res_name);
>  	if (!new_res) {
>  		dev_warn(dev, "could not reserve region [%#llx-%#llx]\n", range.start, range.end);
> -		kfree(new_res_name);
> +		kfree(res_name);
>  		return -EBUSY;
>  	}
>  
> @@ -80,9 +80,11 @@ int dev_dax_kmem_probe(struct device *dev)
>  	if (rc) {
>  		release_resource(new_res);
>  		kfree(new_res);
> -		kfree(new_res_name);
> +		kfree(res_name);
>  		return rc;
>  	}
> +
> +	dev_set_drvdata(dev, res_name);
>  	dev_dax->dax_kmem_res = new_res;
>  
>  	return 0;
> @@ -94,7 +96,7 @@ static int dev_dax_kmem_remove(struct device *dev)
>  	struct dev_dax *dev_dax = to_dev_dax(dev);
>  	struct range range = dax_kmem_range(dev_dax);
>  	struct resource *res = dev_dax->dax_kmem_res;
> -	const char *res_name = res->name;
> +	const char *res_name = dev_get_drvdata(dev);

... because here you're back to "const".

>  	int rc;
>  
>  	/*
> 

I do wonder if it wouldn't all be easier to just have some

struct {
	const char *res_name;
	struct resource *res;
};

allocating that and storing it as drvdata. But let's see what the next
patch brings :)

-- 
Thanks,

David / dhildenb
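
A rough sketch of that suggestion; the struct and function names are
illustrative, not from this series:

#include <linux/device.h>
#include <linux/ioport.h>
#include <linux/slab.h>
#include <linux/string.h>

/* bundle everything the remove path needs into one drvdata object */
struct dax_kmem_data {
	const char *res_name;
	struct resource *res;
};

static int dax_kmem_probe_sketch(struct device *dev)
{
	struct dax_kmem_data *data;

	data = kzalloc(sizeof(*data), GFP_KERNEL);
	if (!data)
		return -ENOMEM;
	data->res_name = kstrdup(dev_name(dev), GFP_KERNEL);
	if (!data->res_name) {
		kfree(data);
		return -ENOMEM;
	}
	/* ... request_mem_region() / add_memory_driver_managed() ... */
	dev_set_drvdata(dev, data);
	return 0;
}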


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 04/17] device-dax/kmem: replace release_resource() with release_mem_region()
  2020-09-25 19:12 ` [PATCH v5 04/17] device-dax/kmem: replace release_resource() with release_mem_region() Dan Williams
@ 2020-09-30 16:23   ` David Hildenbrand
  2020-09-30 17:28     ` Dan Williams
  0 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2020-09-30 16:23 UTC (permalink / raw)
  To: Dan Williams, akpm
  Cc: Vishal Verma, Dave Hansen, Pavel Tatashin, Brice Goglin,
	Dave Jiang, Ira Weiny, Jia He, Joao Martins, Jonathan Cameron,
	linux-mm, linux-nvdimm, linux-kernel

On 25.09.20 21:12, Dan Williams wrote:
> Towards removing the mode specific @dax_kmem_res attribute from the
> generic 'struct dev_dax', and preparing for multi-range support, change
> the kmem driver to use the idiomatic release_mem_region() to pair with
> the initial request_mem_region(). This also eliminates the need to open
> code the release of the resource allocated by request_mem_region().
> 
> As there are no more dax_kmem_res users, delete this struct member.
> 
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Vishal Verma <vishal.l.verma@intel.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Brice Goglin <Brice.Goglin@inria.fr>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Jia He <justin.he@arm.com>
> Cc: Joao Martins <joao.m.martins@oracle.com>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/dax/dax-private.h |    3 ---
>  drivers/dax/kmem.c        |   20 +++++++-------------
>  2 files changed, 7 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
> index 6779f683671d..12a2dbc43b40 100644
> --- a/drivers/dax/dax-private.h
> +++ b/drivers/dax/dax-private.h
> @@ -42,8 +42,6 @@ struct dax_region {
>   * @dev - device core
>   * @pgmap - pgmap for memmap setup / lifetime (driver owned)
>   * @range: resource range for the instance
> - * @dax_mem_res: physical address range of hotadded DAX memory
> - * @dax_mem_name: name for hotadded DAX memory via add_memory_driver_managed()
>   */
>  struct dev_dax {
>  	struct dax_region *region;
> @@ -52,7 +50,6 @@ struct dev_dax {
>  	struct device dev;
>  	struct dev_pagemap *pgmap;
>  	struct range range;
> -	struct resource *dax_kmem_res;
>  };
>  
>  static inline u64 range_len(struct range *range)
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index 6fe2cb1c5f7c..e56fc688bdc5 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -33,7 +33,7 @@ int dev_dax_kmem_probe(struct device *dev)
>  {
>  	struct dev_dax *dev_dax = to_dev_dax(dev);
>  	struct range range = dax_kmem_range(dev_dax);
> -	struct resource *new_res;
> +	struct resource *res;
>  	char *res_name;
>  	int numa_node;
>  	int rc;
> @@ -56,8 +56,8 @@ int dev_dax_kmem_probe(struct device *dev)
>  		return -ENOMEM;
>  
>  	/* Region is permanently reserved if hotremove fails. */
> -	new_res = request_mem_region(range.start, range_len(&range), res_name);
> -	if (!new_res) {
> +	res = request_mem_region(range.start, range_len(&range), res_name);
> +	if (!res) {
>  		dev_warn(dev, "could not reserve region [%#llx-%#llx]\n", range.start, range.end);
>  		kfree(res_name);
>  		return -EBUSY;
> @@ -69,23 +69,20 @@ int dev_dax_kmem_probe(struct device *dev)
>  	 * inherit flags from the parent since it may set new flags
>  	 * unknown to us that will break add_memory() below.
>  	 */
> -	new_res->flags = IORESOURCE_SYSTEM_RAM;
> +	res->flags = IORESOURCE_SYSTEM_RAM;
>  
>  	/*
>  	 * Ensure that future kexec'd kernels will not treat this as RAM
>  	 * automatically.
>  	 */
> -	rc = add_memory_driver_managed(numa_node, new_res->start,
> -				       resource_size(new_res), kmem_name);
> +	rc = add_memory_driver_managed(numa_node, range.start, range_len(&range), kmem_name);
>  	if (rc) {
> -		release_resource(new_res);
> -		kfree(new_res);
> +		release_mem_region(range.start, range_len(&range));
>  		kfree(res_name);
>  		return rc;
>  	}
>  
>  	dev_set_drvdata(dev, res_name);
> -	dev_dax->dax_kmem_res = new_res;
>  
>  	return 0;
>  }
> @@ -95,7 +92,6 @@ static int dev_dax_kmem_remove(struct device *dev)
>  {
>  	struct dev_dax *dev_dax = to_dev_dax(dev);
>  	struct range range = dax_kmem_range(dev_dax);
> -	struct resource *res = dev_dax->dax_kmem_res;
>  	const char *res_name = dev_get_drvdata(dev);
>  	int rc;
>  
> @@ -114,10 +110,8 @@ static int dev_dax_kmem_remove(struct device *dev)
>  	}
>  
>  	/* Release and free dax resources */
> -	release_resource(res);
> -	kfree(res);
> +	release_mem_region(range.start, range_len(&range));

Does that work? AFAIK,

release_mem_region(start, n) -> __release_region(&iomem_resource, (start), (n))

will only remove stuff that is IORESOURCE_BUSY.

Maybe storing it in drvdata is indeed the easiest way to remove it from
struct dev_dax.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 04/17] device-dax/kmem: replace release_resource() with release_mem_region()
  2020-09-30 16:23   ` David Hildenbrand
@ 2020-09-30 17:28     ` Dan Williams
  0 siblings, 0 replies; 31+ messages in thread
From: Dan Williams @ 2020-09-30 17:28 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Vishal Verma, Dave Hansen, Pavel Tatashin,
	Brice Goglin, Dave Jiang, Ira Weiny, Jia He, Joao Martins,
	Jonathan Cameron, Linux MM, linux-nvdimm,
	Linux Kernel Mailing List

On Wed, Sep 30, 2020 at 9:23 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 25.09.20 21:12, Dan Williams wrote:
> > Towards removing the mode specific @dax_kmem_res attribute from the
> > generic 'struct dev_dax', and preparing for multi-range support, change
> > the kmem driver to use the idiomatic release_mem_region() to pair with
> > the initial request_mem_region(). This also eliminates the need to open
> > code the release of the resource allocated by request_mem_region().
> >
> > As there are no more dax_kmem_res users, delete this struct member.
> >
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Vishal Verma <vishal.l.verma@intel.com>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> > Cc: Brice Goglin <Brice.Goglin@inria.fr>
> > Cc: Dave Jiang <dave.jiang@intel.com>
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Ira Weiny <ira.weiny@intel.com>
> > Cc: Jia He <justin.he@arm.com>
> > Cc: Joao Martins <joao.m.martins@oracle.com>
> > Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  drivers/dax/dax-private.h |    3 ---
> >  drivers/dax/kmem.c        |   20 +++++++-------------
> >  2 files changed, 7 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
> > index 6779f683671d..12a2dbc43b40 100644
> > --- a/drivers/dax/dax-private.h
> > +++ b/drivers/dax/dax-private.h
> > @@ -42,8 +42,6 @@ struct dax_region {
> >   * @dev - device core
> >   * @pgmap - pgmap for memmap setup / lifetime (driver owned)
> >   * @range: resource range for the instance
> > - * @dax_mem_res: physical address range of hotadded DAX memory
> > - * @dax_mem_name: name for hotadded DAX memory via add_memory_driver_managed()
> >   */
> >  struct dev_dax {
> >       struct dax_region *region;
> > @@ -52,7 +50,6 @@ struct dev_dax {
> >       struct device dev;
> >       struct dev_pagemap *pgmap;
> >       struct range range;
> > -     struct resource *dax_kmem_res;
> >  };
> >
> >  static inline u64 range_len(struct range *range)
> > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> > index 6fe2cb1c5f7c..e56fc688bdc5 100644
> > --- a/drivers/dax/kmem.c
> > +++ b/drivers/dax/kmem.c
> > @@ -33,7 +33,7 @@ int dev_dax_kmem_probe(struct device *dev)
> >  {
> >       struct dev_dax *dev_dax = to_dev_dax(dev);
> >       struct range range = dax_kmem_range(dev_dax);
> > -     struct resource *new_res;
> > +     struct resource *res;
> >       char *res_name;
> >       int numa_node;
> >       int rc;
> > @@ -56,8 +56,8 @@ int dev_dax_kmem_probe(struct device *dev)
> >               return -ENOMEM;
> >
> >       /* Region is permanently reserved if hotremove fails. */
> > -     new_res = request_mem_region(range.start, range_len(&range), res_name);
> > -     if (!new_res) {
> > +     res = request_mem_region(range.start, range_len(&range), res_name);
> > +     if (!res) {
> >               dev_warn(dev, "could not reserve region [%#llx-%#llx]\n", range.start, range.end);
> >               kfree(res_name);
> >               return -EBUSY;
> > @@ -69,23 +69,20 @@ int dev_dax_kmem_probe(struct device *dev)
> >        * inherit flags from the parent since it may set new flags
> >        * unknown to us that will break add_memory() below.
> >        */
> > -     new_res->flags = IORESOURCE_SYSTEM_RAM;
> > +     res->flags = IORESOURCE_SYSTEM_RAM;
> >
> >       /*
> >        * Ensure that future kexec'd kernels will not treat this as RAM
> >        * automatically.
> >        */
> > -     rc = add_memory_driver_managed(numa_node, new_res->start,
> > -                                    resource_size(new_res), kmem_name);
> > +     rc = add_memory_driver_managed(numa_node, range.start, range_len(&range), kmem_name);
> >       if (rc) {
> > -             release_resource(new_res);
> > -             kfree(new_res);
> > +             release_mem_region(range.start, range_len(&range));
> >               kfree(res_name);
> >               return rc;
> >       }
> >
> >       dev_set_drvdata(dev, res_name);
> > -     dev_dax->dax_kmem_res = new_res;
> >
> >       return 0;
> >  }
> > @@ -95,7 +92,6 @@ static int dev_dax_kmem_remove(struct device *dev)
> >  {
> >       struct dev_dax *dev_dax = to_dev_dax(dev);
> >       struct range range = dax_kmem_range(dev_dax);
> > -     struct resource *res = dev_dax->dax_kmem_res;
> >       const char *res_name = dev_get_drvdata(dev);
> >       int rc;
> >
> > @@ -114,10 +110,8 @@ static int dev_dax_kmem_remove(struct device *dev)
> >       }
> >
> >       /* Release and free dax resources */
> > -     release_resource(res);
> > -     kfree(res);
> > +     release_mem_region(range.start, range_len(&range));
>
> Does that work? AFAIK,
>
> release_mem_region(start, n) -> __release_region(&iomem_resource, (start), (n))
>
> will only remove stuff that is IORESOURCE_BUSY.

Oh, crud, indeed. More fallout from futzing with the flags.

> Maybe storing it in drvdata is indeed the easiest way to remove it from
> struct dev_dax.

My first choice is still to get the driver out of the layering
violation of touching resource flags, but we don't seem to be
converging on a way forward there. So, stuffing more tracking into
drvdata (especially for the multi-range case) is the next best option.
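
To make the failure mode concrete, a toy userspace model (not kernel
code; the flag values are illustrative) of the busy-only release rule
David points out:

#include <stdio.h>

#define IORESOURCE_BUSY		0x80000000UL
#define IORESOURCE_SYSTEM_RAM	0x00000300UL	/* illustrative value */

struct toy_res { unsigned long flags; int linked; };

/* models the __release_region() rule: only busy entries are unlinked */
static int toy_release(struct toy_res *r)
{
	if (!(r->flags & IORESOURCE_BUSY))
		return 0;	/* skipped, nothing released */
	r->linked = 0;
	return 1;
}

int main(void)
{
	struct toy_res r = { .flags = IORESOURCE_BUSY, .linked = 1 };

	/* dev_dax_kmem_probe() rewrites the flags, dropping BUSY */
	r.flags = IORESOURCE_SYSTEM_RAM;
	printf("released: %d\n", toy_release(&r));	/* prints 0 */
	return 0;
}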

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 01/17] device-dax: make pgmap optional for instance creation
  2020-09-25 19:11 ` [PATCH v5 01/17] device-dax: make pgmap optional for instance creation Dan Williams
@ 2020-10-01  8:41   ` David Hildenbrand
  2020-10-01 16:54     ` Dan Williams
  0 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2020-10-01  8:41 UTC (permalink / raw)
  To: Dan Williams, akpm
  Cc: Vishal Verma, Dave Hansen, Pavel Tatashin, Brice Goglin,
	Dave Jiang, Ira Weiny, Jia He, Joao Martins, Jonathan Cameron,
	linux-mm, linux-nvdimm, linux-kernel

On 25.09.20 21:11, Dan Williams wrote:
> The passed in dev_pagemap is only required in the pmem case as the
> libnvdimm core may have reserved a vmem_altmap for devm_memremap_pages() to
> place the memmap in pmem directly.  In the hmem case there is no agent
> reserving an altmap so it can all be handled by a core internal default.
> 
> Pass the resource range via a new @range property of 'struct
> dev_dax_data'.
> 
> Link: https://lkml.kernel.org/r/159643099958.4062302.10379230791041872886.stgit@dwillia2-desk3.amr.corp.intel.com
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Vishal Verma <vishal.l.verma@intel.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Brice Goglin <Brice.Goglin@inria.fr>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Jia He <justin.he@arm.com>
> Cc: Joao Martins <joao.m.martins@oracle.com>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/dax/bus.c              |   29 +++++++++++++++--------------
>  drivers/dax/bus.h              |    2 ++
>  drivers/dax/dax-private.h      |    9 ++++++++-
>  drivers/dax/device.c           |   28 +++++++++++++++++++---------
>  drivers/dax/hmem/hmem.c        |    8 ++++----
>  drivers/dax/kmem.c             |   12 ++++++------
>  drivers/dax/pmem/core.c        |    4 ++++
>  tools/testing/nvdimm/dax-dev.c |    8 ++++----
>  8 files changed, 62 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index dffa4655e128..96bd64ba95a5 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -271,7 +271,7 @@ static ssize_t size_show(struct device *dev,
>  		struct device_attribute *attr, char *buf)
>  {
>  	struct dev_dax *dev_dax = to_dev_dax(dev);
> -	unsigned long long size = resource_size(&dev_dax->region->res);
> +	unsigned long long size = range_len(&dev_dax->range);
>  
>  	return sprintf(buf, "%llu\n", size);
>  }
> @@ -293,19 +293,12 @@ static ssize_t target_node_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RO(target_node);
>  
> -static unsigned long long dev_dax_resource(struct dev_dax *dev_dax)
> -{
> -	struct dax_region *dax_region = dev_dax->region;
> -
> -	return dax_region->res.start;
> -}
> -
>  static ssize_t resource_show(struct device *dev,
>  		struct device_attribute *attr, char *buf)
>  {
>  	struct dev_dax *dev_dax = to_dev_dax(dev);
>  
> -	return sprintf(buf, "%#llx\n", dev_dax_resource(dev_dax));
> +	return sprintf(buf, "%#llx\n", dev_dax->range.start);
>  }
>  static DEVICE_ATTR(resource, 0400, resource_show, NULL);
>  
> @@ -376,6 +369,7 @@ static void dev_dax_release(struct device *dev)
>  
>  	dax_region_put(dax_region);
>  	put_dax(dax_dev);
> +	kfree(dev_dax->pgmap);
>  	kfree(dev_dax);
>  }
>  
> @@ -412,7 +406,12 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
>  	if (!dev_dax)
>  		return ERR_PTR(-ENOMEM);
>  
> -	memcpy(&dev_dax->pgmap, data->pgmap, sizeof(struct dev_pagemap));
> +	if (data->pgmap) {
> +		dev_dax->pgmap = kmemdup(data->pgmap,
> +				sizeof(struct dev_pagemap), GFP_KERNEL);
> +		if (!dev_dax->pgmap)
> +			goto err_pgmap;
> +	}
>  
>  	/*
>  	 * No 'host' or dax_operations since there is no access to this
> @@ -421,18 +420,19 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
>  	dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
>  	if (IS_ERR(dax_dev)) {
>  		rc = PTR_ERR(dax_dev);
> -		goto err;
> +		goto err_alloc_dax;
>  	}
>  
>  	/* a device_dax instance is dead while the driver is not attached */
>  	kill_dax(dax_dev);
>  
> -	/* from here on we're committed to teardown via dax_dev_release() */
> +	/* from here on we're committed to teardown via dev_dax_release() */
>  	dev = &dev_dax->dev;
>  	device_initialize(dev);
>  
>  	dev_dax->dax_dev = dax_dev;
>  	dev_dax->region = dax_region;
> +	dev_dax->range = data->range;
>  	dev_dax->target_node = dax_region->target_node;
>  	kref_get(&dax_region->kref);
>  
> @@ -458,8 +458,9 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
>  		return ERR_PTR(rc);
>  
>  	return dev_dax;
> -
> - err:
> +err_alloc_dax:
> +	kfree(dev_dax->pgmap);
> +err_pgmap:
>  	kfree(dev_dax);
>  
>  	return ERR_PTR(rc);
> diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
> index 299c2e7fac09..4aeb36da83a4 100644
> --- a/drivers/dax/bus.h
> +++ b/drivers/dax/bus.h
> @@ -3,6 +3,7 @@
>  #ifndef __DAX_BUS_H__
>  #define __DAX_BUS_H__
>  #include <linux/device.h>
> +#include <linux/range.h>
>  
>  struct dev_dax;
>  struct resource;
> @@ -21,6 +22,7 @@ struct dev_dax_data {
>  	struct dax_region *dax_region;
>  	struct dev_pagemap *pgmap;
>  	enum dev_dax_subsys subsys;
> +	struct range range;
>  	int id;
>  };
>  
> diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
> index 8a4c40ccd2ef..6779f683671d 100644
> --- a/drivers/dax/dax-private.h
> +++ b/drivers/dax/dax-private.h
> @@ -41,6 +41,7 @@ struct dax_region {
>   * @target_node: effective numa node if dev_dax memory range is onlined
>   * @dev - device core
>   * @pgmap - pgmap for memmap setup / lifetime (driver owned)
> + * @range: resource range for the instance
>   * @dax_mem_res: physical address range of hotadded DAX memory
>   * @dax_mem_name: name for hotadded DAX memory via add_memory_driver_managed()
>   */
> @@ -49,10 +50,16 @@ struct dev_dax {
>  	struct dax_device *dax_dev;
>  	int target_node;
>  	struct device dev;
> -	struct dev_pagemap pgmap;
> +	struct dev_pagemap *pgmap;
> +	struct range range;
>  	struct resource *dax_kmem_res;
>  };
>  
> +static inline u64 range_len(struct range *range)
> +{
> +	return range->end - range->start + 1;
> +}

include/linux/range.h seems to have this function - why is it needed here?

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 01/17] device-dax: make pgmap optional for instance creation
  2020-10-01  8:41   ` David Hildenbrand
@ 2020-10-01 16:54     ` Dan Williams
  2020-10-01 17:39       ` David Hildenbrand
  0 siblings, 1 reply; 31+ messages in thread
From: Dan Williams @ 2020-10-01 16:54 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Vishal Verma, Dave Hansen, Pavel Tatashin,
	Brice Goglin, Dave Jiang, Ira Weiny, Jia He, Joao Martins,
	Jonathan Cameron, Linux MM, linux-nvdimm,
	Linux Kernel Mailing List

On Thu, Oct 1, 2020 at 1:41 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 25.09.20 21:11, Dan Williams wrote:
> > The passed in dev_pagemap is only required in the pmem case as the
> > libnvdimm core may have reserved a vmem_altmap for devm_memremap_pages() to
> > place the memmap in pmem directly.  In the hmem case there is no agent
> > reserving an altmap so it can all be handled by a core internal default.
> >
> > Pass the resource range via a new @range property of 'struct
> > dev_dax_data'.
> >
> > Link: https://lkml.kernel.org/r/159643099958.4062302.10379230791041872886.stgit@dwillia2-desk3.amr.corp.intel.com
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Vishal Verma <vishal.l.verma@intel.com>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> > Cc: Brice Goglin <Brice.Goglin@inria.fr>
> > Cc: Dave Jiang <dave.jiang@intel.com>
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Ira Weiny <ira.weiny@intel.com>
> > Cc: Jia He <justin.he@arm.com>
> > Cc: Joao Martins <joao.m.martins@oracle.com>
> > Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  drivers/dax/bus.c              |   29 +++++++++++++++--------------
> >  drivers/dax/bus.h              |    2 ++
> >  drivers/dax/dax-private.h      |    9 ++++++++-
> >  drivers/dax/device.c           |   28 +++++++++++++++++++---------
> >  drivers/dax/hmem/hmem.c        |    8 ++++----
> >  drivers/dax/kmem.c             |   12 ++++++------
> >  drivers/dax/pmem/core.c        |    4 ++++
> >  tools/testing/nvdimm/dax-dev.c |    8 ++++----
> >  8 files changed, 62 insertions(+), 38 deletions(-)
> >
> > diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> > index dffa4655e128..96bd64ba95a5 100644
> > --- a/drivers/dax/bus.c
> > +++ b/drivers/dax/bus.c
> > @@ -271,7 +271,7 @@ static ssize_t size_show(struct device *dev,
> >               struct device_attribute *attr, char *buf)
> >  {
> >       struct dev_dax *dev_dax = to_dev_dax(dev);
> > -     unsigned long long size = resource_size(&dev_dax->region->res);
> > +     unsigned long long size = range_len(&dev_dax->range);
> >
> >       return sprintf(buf, "%llu\n", size);
> >  }
> > @@ -293,19 +293,12 @@ static ssize_t target_node_show(struct device *dev,
> >  }
> >  static DEVICE_ATTR_RO(target_node);
> >
> > -static unsigned long long dev_dax_resource(struct dev_dax *dev_dax)
> > -{
> > -     struct dax_region *dax_region = dev_dax->region;
> > -
> > -     return dax_region->res.start;
> > -}
> > -
> >  static ssize_t resource_show(struct device *dev,
> >               struct device_attribute *attr, char *buf)
> >  {
> >       struct dev_dax *dev_dax = to_dev_dax(dev);
> >
> > -     return sprintf(buf, "%#llx\n", dev_dax_resource(dev_dax));
> > +     return sprintf(buf, "%#llx\n", dev_dax->range.start);
> >  }
> >  static DEVICE_ATTR(resource, 0400, resource_show, NULL);
> >
> > @@ -376,6 +369,7 @@ static void dev_dax_release(struct device *dev)
> >
> >       dax_region_put(dax_region);
> >       put_dax(dax_dev);
> > +     kfree(dev_dax->pgmap);
> >       kfree(dev_dax);
> >  }
> >
> > @@ -412,7 +406,12 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
> >       if (!dev_dax)
> >               return ERR_PTR(-ENOMEM);
> >
> > -     memcpy(&dev_dax->pgmap, data->pgmap, sizeof(struct dev_pagemap));
> > +     if (data->pgmap) {
> > +             dev_dax->pgmap = kmemdup(data->pgmap,
> > +                             sizeof(struct dev_pagemap), GFP_KERNEL);
> > +             if (!dev_dax->pgmap)
> > +                     goto err_pgmap;
> > +     }
> >
> >       /*
> >        * No 'host' or dax_operations since there is no access to this
> > @@ -421,18 +420,19 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
> >       dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
> >       if (IS_ERR(dax_dev)) {
> >               rc = PTR_ERR(dax_dev);
> > -             goto err;
> > +             goto err_alloc_dax;
> >       }
> >
> >       /* a device_dax instance is dead while the driver is not attached */
> >       kill_dax(dax_dev);
> >
> > -     /* from here on we're committed to teardown via dax_dev_release() */
> > +     /* from here on we're committed to teardown via dev_dax_release() */
> >       dev = &dev_dax->dev;
> >       device_initialize(dev);
> >
> >       dev_dax->dax_dev = dax_dev;
> >       dev_dax->region = dax_region;
> > +     dev_dax->range = data->range;
> >       dev_dax->target_node = dax_region->target_node;
> >       kref_get(&dax_region->kref);
> >
> > @@ -458,8 +458,9 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
> >               return ERR_PTR(rc);
> >
> >       return dev_dax;
> > -
> > - err:
> > +err_alloc_dax:
> > +     kfree(dev_dax->pgmap);
> > +err_pgmap:
> >       kfree(dev_dax);
> >
> >       return ERR_PTR(rc);
> > diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
> > index 299c2e7fac09..4aeb36da83a4 100644
> > --- a/drivers/dax/bus.h
> > +++ b/drivers/dax/bus.h
> > @@ -3,6 +3,7 @@
> >  #ifndef __DAX_BUS_H__
> >  #define __DAX_BUS_H__
> >  #include <linux/device.h>
> > +#include <linux/range.h>
> >
> >  struct dev_dax;
> >  struct resource;
> > @@ -21,6 +22,7 @@ struct dev_dax_data {
> >       struct dax_region *dax_region;
> >       struct dev_pagemap *pgmap;
> >       enum dev_dax_subsys subsys;
> > +     struct range range;
> >       int id;
> >  };
> >
> > diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
> > index 8a4c40ccd2ef..6779f683671d 100644
> > --- a/drivers/dax/dax-private.h
> > +++ b/drivers/dax/dax-private.h
> > @@ -41,6 +41,7 @@ struct dax_region {
> >   * @target_node: effective numa node if dev_dax memory range is onlined
> >   * @dev - device core
> >   * @pgmap - pgmap for memmap setup / lifetime (driver owned)
> > + * @range: resource range for the instance
> >   * @dax_mem_res: physical address range of hotadded DAX memory
> >   * @dax_mem_name: name for hotadded DAX memory via add_memory_driver_managed()
> >   */
> > @@ -49,10 +50,16 @@ struct dev_dax {
> >       struct dax_device *dax_dev;
> >       int target_node;
> >       struct device dev;
> > -     struct dev_pagemap pgmap;
> > +     struct dev_pagemap *pgmap;
> > +     struct range range;
> >       struct resource *dax_kmem_res;
> >  };
> >
> > +static inline u64 range_len(struct range *range)
> > +{
> > +     return range->end - range->start + 1;
> > +}
>
> include/linux/range.h seems to have this function - why is it needed here?

It's there because I add it later in this series. I waited until
"mm/memremap_pages: convert to 'struct range'" to make it global as
that's the first kernel-wide visible usage of it.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 01/17] device-dax: make pgmap optional for instance creation
  2020-10-01 16:54     ` Dan Williams
@ 2020-10-01 17:39       ` David Hildenbrand
  2020-10-01 19:12         ` Dan Williams
  0 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2020-10-01 17:39 UTC (permalink / raw)
  To: Dan Williams
  Cc: Andrew Morton, Vishal Verma, Dave Hansen, Pavel Tatashin,
	Brice Goglin, Dave Jiang, Ira Weiny, Jia He, Joao Martins,
	Jonathan Cameron, Linux MM, linux-nvdimm,
	Linux Kernel Mailing List

On 01.10.20 18:54, Dan Williams wrote:
> On Thu, Oct 1, 2020 at 1:41 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 25.09.20 21:11, Dan Williams wrote:
>>> The passed in dev_pagemap is only required in the pmem case as the
>>> libnvdimm core may have reserved a vmem_altmap for devm_memremap_pages() to
>>> place the memmap in pmem directly.  In the hmem case there is no agent
>>> reserving an altmap so it can all be handled by a core internal default.
>>>
>>> Pass the resource range via a new @range property of 'struct
>>> dev_dax_data'.
>>>
>>> Link: https://lkml.kernel.org/r/159643099958.4062302.10379230791041872886.stgit@dwillia2-desk3.amr.corp.intel.com
>>> Cc: David Hildenbrand <david@redhat.com>
>>> Cc: Vishal Verma <vishal.l.verma@intel.com>
>>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>>> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>>> Cc: Brice Goglin <Brice.Goglin@inria.fr>
>>> Cc: Dave Jiang <dave.jiang@intel.com>
>>> Cc: David Hildenbrand <david@redhat.com>
>>> Cc: Ira Weiny <ira.weiny@intel.com>
>>> Cc: Jia He <justin.he@arm.com>
>>> Cc: Joao Martins <joao.m.martins@oracle.com>
>>> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>>> ---
>>>  drivers/dax/bus.c              |   29 +++++++++++++++--------------
>>>  drivers/dax/bus.h              |    2 ++
>>>  drivers/dax/dax-private.h      |    9 ++++++++-
>>>  drivers/dax/device.c           |   28 +++++++++++++++++++---------
>>>  drivers/dax/hmem/hmem.c        |    8 ++++----
>>>  drivers/dax/kmem.c             |   12 ++++++------
>>>  drivers/dax/pmem/core.c        |    4 ++++
>>>  tools/testing/nvdimm/dax-dev.c |    8 ++++----
>>>  8 files changed, 62 insertions(+), 38 deletions(-)
>>>
>>> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
>>> index dffa4655e128..96bd64ba95a5 100644
>>> --- a/drivers/dax/bus.c
>>> +++ b/drivers/dax/bus.c
>>> @@ -271,7 +271,7 @@ static ssize_t size_show(struct device *dev,
>>>               struct device_attribute *attr, char *buf)
>>>  {
>>>       struct dev_dax *dev_dax = to_dev_dax(dev);
>>> -     unsigned long long size = resource_size(&dev_dax->region->res);
>>> +     unsigned long long size = range_len(&dev_dax->range);
>>>
>>>       return sprintf(buf, "%llu\n", size);
>>>  }
>>> @@ -293,19 +293,12 @@ static ssize_t target_node_show(struct device *dev,
>>>  }
>>>  static DEVICE_ATTR_RO(target_node);
>>>
>>> -static unsigned long long dev_dax_resource(struct dev_dax *dev_dax)
>>> -{
>>> -     struct dax_region *dax_region = dev_dax->region;
>>> -
>>> -     return dax_region->res.start;
>>> -}
>>> -
>>>  static ssize_t resource_show(struct device *dev,
>>>               struct device_attribute *attr, char *buf)
>>>  {
>>>       struct dev_dax *dev_dax = to_dev_dax(dev);
>>>
>>> -     return sprintf(buf, "%#llx\n", dev_dax_resource(dev_dax));
>>> +     return sprintf(buf, "%#llx\n", dev_dax->range.start);
>>>  }
>>>  static DEVICE_ATTR(resource, 0400, resource_show, NULL);
>>>
>>> @@ -376,6 +369,7 @@ static void dev_dax_release(struct device *dev)
>>>
>>>       dax_region_put(dax_region);
>>>       put_dax(dax_dev);
>>> +     kfree(dev_dax->pgmap);
>>>       kfree(dev_dax);
>>>  }
>>>
>>> @@ -412,7 +406,12 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
>>>       if (!dev_dax)
>>>               return ERR_PTR(-ENOMEM);
>>>
>>> -     memcpy(&dev_dax->pgmap, data->pgmap, sizeof(struct dev_pagemap));
>>> +     if (data->pgmap) {
>>> +             dev_dax->pgmap = kmemdup(data->pgmap,
>>> +                             sizeof(struct dev_pagemap), GFP_KERNEL);
>>> +             if (!dev_dax->pgmap)
>>> +                     goto err_pgmap;
>>> +     }
>>>
>>>       /*
>>>        * No 'host' or dax_operations since there is no access to this
>>> @@ -421,18 +420,19 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
>>>       dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
>>>       if (IS_ERR(dax_dev)) {
>>>               rc = PTR_ERR(dax_dev);
>>> -             goto err;
>>> +             goto err_alloc_dax;
>>>       }
>>>
>>>       /* a device_dax instance is dead while the driver is not attached */
>>>       kill_dax(dax_dev);
>>>
>>> -     /* from here on we're committed to teardown via dax_dev_release() */
>>> +     /* from here on we're committed to teardown via dev_dax_release() */
>>>       dev = &dev_dax->dev;
>>>       device_initialize(dev);
>>>
>>>       dev_dax->dax_dev = dax_dev;
>>>       dev_dax->region = dax_region;
>>> +     dev_dax->range = data->range;
>>>       dev_dax->target_node = dax_region->target_node;
>>>       kref_get(&dax_region->kref);
>>>
>>> @@ -458,8 +458,9 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
>>>               return ERR_PTR(rc);
>>>
>>>       return dev_dax;
>>> -
>>> - err:
>>> +err_alloc_dax:
>>> +     kfree(dev_dax->pgmap);
>>> +err_pgmap:
>>>       kfree(dev_dax);
>>>
>>>       return ERR_PTR(rc);
>>> diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
>>> index 299c2e7fac09..4aeb36da83a4 100644
>>> --- a/drivers/dax/bus.h
>>> +++ b/drivers/dax/bus.h
>>> @@ -3,6 +3,7 @@
>>>  #ifndef __DAX_BUS_H__
>>>  #define __DAX_BUS_H__
>>>  #include <linux/device.h>
>>> +#include <linux/range.h>
>>>
>>>  struct dev_dax;
>>>  struct resource;
>>> @@ -21,6 +22,7 @@ struct dev_dax_data {
>>>       struct dax_region *dax_region;
>>>       struct dev_pagemap *pgmap;
>>>       enum dev_dax_subsys subsys;
>>> +     struct range range;
>>>       int id;
>>>  };
>>>
>>> diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
>>> index 8a4c40ccd2ef..6779f683671d 100644
>>> --- a/drivers/dax/dax-private.h
>>> +++ b/drivers/dax/dax-private.h
>>> @@ -41,6 +41,7 @@ struct dax_region {
>>>   * @target_node: effective numa node if dev_dax memory range is onlined
>>>   * @dev - device core
>>>   * @pgmap - pgmap for memmap setup / lifetime (driver owned)
>>> + * @range: resource range for the instance
>>>   * @dax_mem_res: physical address range of hotadded DAX memory
>>>   * @dax_mem_name: name for hotadded DAX memory via add_memory_driver_managed()
>>>   */
>>> @@ -49,10 +50,16 @@ struct dev_dax {
>>>       struct dax_device *dax_dev;
>>>       int target_node;
>>>       struct device dev;
>>> -     struct dev_pagemap pgmap;
>>> +     struct dev_pagemap *pgmap;
>>> +     struct range range;
>>>       struct resource *dax_kmem_res;
>>>  };
>>>
>>> +static inline u64 range_len(struct range *range)
>>> +{
>>> +     return range->end - range->start + 1;
>>> +}
>>
>> include/linux/range.h seems to have this function - why is it needed here?
> 
> It's there because I add it later in this series. I waited until
> "mm/memremap_pages: convert to 'struct range'" to make it global as
> that's the first kernel-wide visible usage of it.

Ah, okay - I'd just place it right at its final destination instead of
moving fresh code around within a single series.

-- 
Thanks,

David / dhildenb


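[Context for the change quoted above: with @range moved into 'struct
dev_dax_data', a caller with no reserved altmap can leave @pgmap NULL
and describe the instance geometry directly. A minimal caller sketch,
with probe-local names (res, dax_region) assumed for illustration since
the hmem.c hunk itself is not quoted in this message:]

	/*
	 * Sketch only: build the instance geometry straight from the
	 * platform resource and leave @pgmap NULL so the core falls
	 * back to its internal default pagemap.
	 */
	struct dev_dax_data data = {
		.dax_region = dax_region,
		.id = -1,
		/* .pgmap left NULL: no reserved altmap to convey */
		.range = {
			.start = res->start,
			.end = res->end, /* 'struct range' end is inclusive */
		},
	};
	struct dev_dax *dev_dax = devm_create_dev_dax(&data);

	if (IS_ERR(dev_dax))
		return PTR_ERR(dev_dax);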

* Re: [PATCH v5 01/17] device-dax: make pgmap optional for instance creation
  2020-10-01 17:39       ` David Hildenbrand
@ 2020-10-01 19:12         ` Dan Williams
  0 siblings, 0 replies; 31+ messages in thread
From: Dan Williams @ 2020-10-01 19:12 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Vishal Verma, Dave Hansen, Pavel Tatashin,
	Brice Goglin, Dave Jiang, Ira Weiny, Jia He, Joao Martins,
	Jonathan Cameron, Linux MM, linux-nvdimm,
	Linux Kernel Mailing List

On Thu, Oct 1, 2020 at 10:39 AM David Hildenbrand <david@redhat.com> wrote:
[..]
> >> include/linux/range.h seems to have this function - why is it needed here?
> >
> > It's there because I add it later in this series. I waited until
> > "mm/memremap_pages: convert to 'struct range'" to make it global as
> > that's the first kernel-wide visible usage of it.
>
> Ah, okay - I'd just place it right at its final destination instead of
> moving fresh code around within a single series.

Yeah, it's looking like this series will all land together, so I'll go
ahead and move it.
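
[For reference in the exchange above - a sketch of the definitions as
they land in include/linux/range.h via "mm/memremap_pages: convert to
'struct range'"; @end is inclusive, hence the +1, mirroring
resource_size() for 'struct resource':]

	struct range {
		u64 start;
		u64 end;
	};

	static inline u64 range_len(struct range *range)
	{
		return range->end - range->start + 1;
	}

	/* e.g. the usage from size_show() in the quoted patch: */
	unsigned long long size = range_len(&dev_dax->range);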



Thread overview: 31+ messages
2020-09-25 19:11 [PATCH v5 00/17] device-dax: support sub-dividing soft-reserved ranges Dan Williams
2020-09-25 19:11 ` [PATCH v5 01/17] device-dax: make pgmap optional for instance creation Dan Williams
2020-10-01  8:41   ` David Hildenbrand
2020-10-01 16:54     ` Dan Williams
2020-10-01 17:39       ` David Hildenbrand
2020-10-01 19:12         ` Dan Williams
2020-09-25 19:11 ` [PATCH v5 02/17] device-dax/kmem: introduce dax_kmem_range() Dan Williams
2020-09-30 16:14   ` David Hildenbrand
2020-09-25 19:11 ` [PATCH v5 03/17] device-dax/kmem: move resource name tracking to drvdata Dan Williams
2020-09-30 16:19   ` David Hildenbrand
2020-09-25 19:12 ` [PATCH v5 04/17] device-dax/kmem: replace release_resource() with release_mem_region() Dan Williams
2020-09-30 16:23   ` David Hildenbrand
2020-09-30 17:28     ` Dan Williams
2020-09-25 19:12 ` [PATCH v5 05/17] device-dax: add an allocation interface for device-dax instances Dan Williams
2020-09-25 19:12 ` [PATCH v5 06/17] device-dax: introduce 'struct dev_dax' typed-driver operations Dan Williams
2020-09-25 19:12 ` [PATCH v5 07/17] device-dax: introduce 'seed' devices Dan Williams
2020-09-25 19:12 ` [PATCH v5 08/17] drivers/base: make device_find_child_by_name() compatible with sysfs inputs Dan Williams
2020-09-25 19:12 ` [PATCH v5 09/17] device-dax: add resize support Dan Williams
2020-09-25 19:12 ` [PATCH v5 10/17] mm/memremap_pages: convert to 'struct range' Dan Williams
2020-09-28 19:12   ` boris.ostrovsky
2020-09-25 19:12 ` [PATCH v5 11/17] mm/memremap_pages: support multiple ranges per invocation Dan Williams
2020-09-25 19:12 ` [PATCH v5 12/17] device-dax: add dis-contiguous resource support Dan Williams
2020-09-25 19:12 ` [PATCH v5 13/17] device-dax: introduce 'mapping' devices Dan Williams
2020-09-25 19:12 ` [PATCH v5 14/17] device-dax: make align a per-device property Dan Williams
2020-09-25 19:13 ` [PATCH v5 16/17] dax/hmem: introduce dax_hmem.region_idle parameter Dan Williams
2020-09-25 19:13 ` [PATCH v5 17/17] device-dax: add a range mapping allocation attribute Dan Williams
2020-09-25 20:51 ` [PATCH v5 00/17] device-dax: support sub-dividing soft-reserved ranges Joao Martins
2020-09-25 21:01   ` Dan Williams
2020-09-25 21:05     ` Joao Martins
     [not found] ` <160106118486.30709.13012322227204800596.stgit@dwillia2-desk3.amr.corp.intel.com>
2020-09-26  2:22   ` [PATCH v5 15/17] device-dax: add an 'align' attribute Andrew Morton
2020-09-26  3:31     ` Dan Williams
