linux-kernel.vger.kernel.org archive mirror
* [PATCH v7 0/3] PCI/IOMMU: Reserve IOVAs for PCI inbound memory
@ 2017-05-22 16:39 Oza Pawandeep
  2017-05-22 16:39 ` [PATCH v7 1/3] OF/PCI: Export inbound memory interface to PCI RC drivers Oza Pawandeep
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Oza Pawandeep @ 2017-05-22 16:39 UTC (permalink / raw)
  To: Joerg Roedel, Robin Murphy
  Cc: iommu, linux-pci, linux-kernel, linux-arm-kernel, devicetree,
	bcm-kernel-feedback-list, Oza Pawandeep, Oza Pawandeep

The iProc-based PCI RC on the Stingray SoC has a limitation: it can
address only 512GB of memory at once.

IOVA allocation honors the device's coherent_dma_mask/dma_mask.
In the PCI case, the current code honors the DMA mask set by the EP;
there is no concept of a PCI host bridge dma-mask, although there
should be, since it would truly reflect the limitation of the PCI
host bridge.

However, even assuming Linux takes care of the largest possible
dma_mask, the limitation can still exist because of the way the
memory banks are laid out.

For example, the memory banks:
<0x00000000 0x80000000 0x0 0x80000000>, /* 2G @ 2G */
<0x00000008 0x80000000 0x3 0x80000000>, /* 14G @ 34G */
<0x00000090 0x00000000 0x4 0x00000000>, /* 16G @ 576G */
<0x000000a0 0x00000000 0x4 0x00000000>; /* 16G @ 640G */

Consider running user space (SPDK) which internally uses vfio in order
to access a PCI endpoint directly.

vfio uses hugepages, which could come from the 640G/0x000000a0 bank.
vfio maps a hugepage by using its physical address as the IOVA, so
VFIO_IOMMU_MAP_DMA ends up calling iommu_map(), which in turn calls
arm_lpae_map(), mapping IOVAs that are out of range.

So the way the kernel allocates IOVAs (where it honours the device
dma_mask) and the way user space gets IOVAs are different.

A single dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>; will
not work.

Instead we have to use scattered dma-ranges, leaving holes.
Hence we have to reserve the corresponding IOVA ranges for inbound memory.
This patch set addresses only the IOVA allocation problem.
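
To make the holes concrete, here is an illustrative sketch (not part of
the series) of the IOVA ranges that end up unusable for the example
banks above, assuming the inbound windows mirror the memory banks and
IOVAs are identity-mapped to physical addresses:

#include <linux/types.h>

/* Illustrative only: the gaps between the example memory banks. */
static const struct { u64 start, end; } reserved_iova[] = {
	{ 0x0000000000ULL, 0x007fffffffULL },	/* below 2G	*/
	{ 0x0100000000ULL, 0x087fffffffULL },	/* 4G   .. 34G	*/
	{ 0x0c00000000ULL, 0x8fffffffffULL },	/* 48G  .. 576G	*/
	{ 0x9400000000ULL, 0x9fffffffffULL },	/* 592G .. 640G	*/
	{ 0xa400000000ULL, ~0ULL },		/* above 656G	*/
};

These are exactly the regions the IOMMU layer has to reserve so that the
IOVA allocator never hands them out to a PCI master.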

Changes since v7:
- Robin's comment addressed
where he wanted to remove the dependency between the IOMMU and OF layers.
- Bjorn Helgaas's comments addressed.

Changes since v6:
- Robin's comments addressed.

Changes since v5:
Changes since v4:
Changes since v3:
Changes since v2:
- minor changes, redundant checks removed
- removed internal review

Changes since v1:
- address Rob's comments.
- Add a get_dma_ranges() function to the of_bus struct.
- Convert existing contents of of_dma_get_range function to
  of_bus_default_dma_get_ranges and adding that to the
  default of_bus struct.
- Make of_dma_get_range call of_bus_match() and then bus->get_dma_ranges.


Oza Pawandeep (3):
  OF/PCI: expose inbound memory interface to PCI RC drivers.
  IOMMU/PCI: reserve IOVA for inbound memory for PCI masters
  PCI: add support for inbound windows resources

 drivers/iommu/dma-iommu.c | 44 ++++++++++++++++++++--
 drivers/of/of_pci.c       | 96 +++++++++++++++++++++++++++++++++++++++++++++++
 drivers/pci/probe.c       | 30 +++++++++++++--
 include/linux/of_pci.h    |  7 ++++
 include/linux/pci.h       |  1 +
 5 files changed, 170 insertions(+), 8 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v7 1/3] OF/PCI: Export inbound memory interface to PCI RC drivers.
  2017-05-22 16:39 [PATCH v7 0/3] PCI/IOMMU: Reserve IOVAs for PCI inbound memory Oza Pawandeep
@ 2017-05-22 16:39 ` Oza Pawandeep
  2017-05-22 16:39 ` [PATCH v7 2/3] PCI: Add support for PCI inbound windows resources Oza Pawandeep
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Oza Pawandeep @ 2017-05-22 16:39 UTC (permalink / raw)
  To: Joerg Roedel, Robin Murphy
  Cc: iommu, linux-pci, linux-kernel, linux-arm-kernel, devicetree,
	bcm-kernel-feedback-list, Oza Pawandeep, Oza Pawandeep

This patch exports an interface to PCIe RC drivers so that the
drivers can get their inbound memory configuration.

It provides the basis for IOVA reservations for inbound memory
holes when the RC is not capable of addressing all of host memory,
specifically when the IOMMU is enabled on ARMv8, where 64-bit IOVAs
can be allocated.

It handles multiple inbound windows and returns them as resources;
how they are used is left to the caller.
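
As a rough usage sketch (not part of this patch; the calling driver,
device pointer and error handling are illustrative), an RC driver could
do something like:

#include <linux/device.h>
#include <linux/of_pci.h>
#include <linux/pci.h>

static int rc_get_inbound_windows(struct device *dev,
				  struct list_head *inbound)
{
	struct resource_entry *entry;
	int ret;

	/* Parse the host bridge "dma-ranges" into a resource list. */
	ret = of_pci_get_dma_ranges(dev->of_node, inbound);
	if (ret)
		return ret;

	resource_list_for_each_entry(entry, inbound)
		dev_dbg(dev, "inbound window %pR, offset %pa\n",
			entry->res, &entry->offset);

	return 0;
}

The caller owns the returned list: it either hands it to the host
bridge (as in the PCI patch of this series) or frees it with
pci_free_resource_list() when done.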

Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index c9d4d3a..20cf527 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,102 @@ int of_pci_get_host_bridge_resources(struct device_node *dev,
 	return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+/**
+ * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
+ * @dn: device node of the host bridge having the dma-ranges property
+ * @resources: list where the range of resources will be added after DT parsing
+ *
+ * It is the caller's job to free the @resources list.
+ *
+ * This function will parse the "dma-ranges" property of a
+ * PCI host bridge device node and setup the resource mapping based
+ * on its content.
+ *
+ * It returns zero if the range parsing has been successful or a standard error
+ * value if it failed.
+ */
+
+int of_pci_get_dma_ranges(struct device_node *dn, struct list_head *resources)
+{
+	struct device_node *node = of_node_get(dn);
+	int rlen;
+	int pna = of_n_addr_cells(node);
+	const int na = 3, ns = 2;
+	int np = pna + na + ns;
+	int ret = 0;
+	struct resource *res;
+	const u32 *dma_ranges;
+	struct of_pci_range range;
+
+	if (!node)
+		return -EINVAL;
+
+	while (1) {
+		dma_ranges = of_get_property(node, "dma-ranges", &rlen);
+
+		/* Ignore empty ranges, they imply no translation required. */
+		if (dma_ranges && rlen > 0)
+			break;
+
+		/* no dma-ranges, they imply no translation required. */
+		if (!dma_ranges)
+			break;
+
+		node = of_get_next_parent(node);
+
+		if (!node)
+			break;
+	}
+
+	if (!dma_ranges) {
+		pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
+			  dn->full_name);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	while ((rlen -= np * 4) >= 0) {
+		range.pci_space = be32_to_cpup((const __be32 *) &dma_ranges[0]);
+		range.pci_addr = of_read_number(dma_ranges + 1, ns);
+		range.cpu_addr = of_translate_dma_address(node,
+							dma_ranges + na);
+		range.size = of_read_number(dma_ranges + pna + na, ns);
+		range.flags = IORESOURCE_MEM;
+
+		dma_ranges += np;
+
+		/*
+		 * If we failed translation or got a zero-sized region
+		 * then skip this range.
+		 */
+		if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+			continue;
+
+		res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+		if (!res) {
+			ret = -ENOMEM;
+			goto parse_failed;
+		}
+
+		ret = of_pci_range_to_resource(&range, dn, res);
+		if (ret) {
+			kfree(res);
+			continue;
+		}
+
+		pci_add_resource_offset(resources, res,
+					res->start - range.pci_addr);
+	}
+	return ret;
+
+parse_failed:
+	pci_free_resource_list(resources);
+out:
+	of_node_put(node);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */
 
 /**
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 518c8d2..0eafe86 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
 			struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
@@ -83,6 +84,12 @@ static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
 {
 	return -EINVAL;
 }
+
+static inline int of_pci_get_dma_ranges(struct device_node *np,
+					struct list_head *resources)
+{
+	return -EINVAL;
+}
 #endif
 
 #endif
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v7 2/3] PCI: Add support for PCI inbound windows resources
  2017-05-22 16:39 [PATCH v7 0/3] PCI/IOMMU: Reserve IOVAs for PCI inbound memory Oza Pawandeep
  2017-05-22 16:39 ` [PATCH v7 1/3] OF/PCI: Export inbound memory interface to PCI RC drivers Oza Pawandeep
@ 2017-05-22 16:39 ` Oza Pawandeep
  2017-05-30 22:42   ` Bjorn Helgaas
  2017-05-22 16:39 ` [PATCH v7 3/3] IOMMU/PCI: Reserve IOVA for inbound memory for PCI masters Oza Pawandeep
  2017-05-22 19:18 ` [PATCH v7 0/3] PCI/IOMMU: Reserve IOVAs for PCI inbound memory Alex Williamson
  3 siblings, 1 reply; 11+ messages in thread
From: Oza Pawandeep @ 2017-05-22 16:39 UTC (permalink / raw)
  To: Joerg Roedel, Robin Murphy
  Cc: iommu, linux-pci, linux-kernel, linux-arm-kernel, devicetree,
	bcm-kernel-feedback-list, Oza Pawandeep, Oza Pawandeep

This patch adds support for inbound memory windows
for PCI RC drivers.

It defines a new function, pci_create_root_bus2(), which
takes inbound resources as an argument and fills the
memory resources into the PCI host bridge structure
as inbound_windows.

Legacy RC drivers can continue to use pci_create_root_bus(),
but any RC driver that wants to reserve IOVAs for its
inbound memory holes should use the new API pci_create_root_bus2().
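
As a rough sketch of how an RC driver's probe path might combine the
new call with the existing outbound parsing (my_pcie_scan, its
arguments and the error handling are placeholders, and
of_pci_get_dma_ranges() comes from the OF/PCI patch in this series):

#include <linux/device.h>
#include <linux/of_pci.h>
#include <linux/pci.h>

static struct pci_bus *my_pcie_scan(struct device *dev,
				    struct device_node *np,
				    struct pci_ops *ops, void *sysdata)
{
	resource_size_t iobase;
	LIST_HEAD(res);		/* outbound windows */
	LIST_HEAD(in_res);	/* inbound windows from dma-ranges */

	if (of_pci_get_host_bridge_resources(np, 0, 0xff, &res, &iobase))
		return NULL;

	/* Drivers that do not need IOVA reservation may skip this step. */
	if (of_pci_get_dma_ranges(np, &in_res))
		dev_warn(dev, "no usable dma-ranges; inbound windows not reserved\n");

	return pci_create_root_bus2(dev, 0, ops, sysdata, &res, &in_res);
}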

Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 19c8950..a95b9bb 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -531,6 +531,7 @@ struct pci_host_bridge *pci_alloc_host_bridge(size_t priv)
 		return NULL;
 
 	INIT_LIST_HEAD(&bridge->windows);
+	INIT_LIST_HEAD(&bridge->inbound_windows);
 
 	return bridge;
 }
@@ -726,6 +727,7 @@ int pci_register_host_bridge(struct pci_host_bridge *bridge)
 	struct pci_bus *bus, *b;
 	resource_size_t offset;
 	LIST_HEAD(resources);
+	LIST_HEAD(inbound_resources);
 	struct resource *res;
 	char addr[64], *fmt;
 	const char *name;
@@ -739,6 +741,8 @@ int pci_register_host_bridge(struct pci_host_bridge *bridge)
 
 	/* temporarily move resources off the list */
 	list_splice_init(&bridge->windows, &resources);
+	list_splice_init(&bridge->inbound_windows, &inbound_resources);
+
 	bus->sysdata = bridge->sysdata;
 	bus->msi = bridge->msi;
 	bus->ops = bridge->ops;
@@ -794,6 +798,10 @@ int pci_register_host_bridge(struct pci_host_bridge *bridge)
 	else
 		pr_info("PCI host bridge to bus %s\n", name);
 
+	/* Add inbound mem resource. */
+	resource_list_for_each_entry_safe(window, n, &inbound_resources)
+		list_move_tail(&window->node, &bridge->inbound_windows);
+
 	/* Add initial resources to the bus */
 	resource_list_for_each_entry_safe(window, n, &resources) {
 		list_move_tail(&window->node, &bridge->windows);
@@ -2300,7 +2308,8 @@ void __weak pcibios_remove_bus(struct pci_bus *bus)
 
 static struct pci_bus *pci_create_root_bus_msi(struct device *parent,
 		int bus, struct pci_ops *ops, void *sysdata,
-		struct list_head *resources, struct msi_controller *msi)
+		struct list_head *resources, struct list_head *in_res,
+		struct msi_controller *msi)
 {
 	int error;
 	struct pci_host_bridge *bridge;
@@ -2313,6 +2322,9 @@ static struct pci_bus *pci_create_root_bus_msi(struct device *parent,
 	bridge->dev.release = pci_release_host_bridge_dev;
 
 	list_splice_init(resources, &bridge->windows);
+	if (in_res)
+		list_splice_init(in_res, &bridge->inbound_windows);
+
 	bridge->sysdata = sysdata;
 	bridge->busnr = bus;
 	bridge->ops = ops;
@@ -2329,11 +2341,20 @@ static struct pci_bus *pci_create_root_bus_msi(struct device *parent,
 	return NULL;
 }
 
+struct pci_bus *pci_create_root_bus2(struct device *parent, int bus,
+		struct pci_ops *ops, void *sysdata, struct list_head *resources,
+		struct list_head *in_res)
+{
+	return pci_create_root_bus_msi(parent, bus, ops, sysdata,
+				       resources, in_res, NULL);
+}
+EXPORT_SYMBOL_GPL(pci_create_root_bus2);
+
 struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
 		struct pci_ops *ops, void *sysdata, struct list_head *resources)
 {
-	return pci_create_root_bus_msi(parent, bus, ops, sysdata, resources,
-				       NULL);
+	return pci_create_root_bus_msi(parent, bus, ops, sysdata,
+				       resources, NULL, NULL);
 }
 EXPORT_SYMBOL_GPL(pci_create_root_bus);
 
@@ -2415,7 +2436,8 @@ struct pci_bus *pci_scan_root_bus_msi(struct device *parent, int bus,
 			break;
 		}
 
-	b = pci_create_root_bus_msi(parent, bus, ops, sysdata, resources, msi);
+	b = pci_create_root_bus_msi(parent, bus, ops, sysdata,
+				    resources, NULL, msi);
 	if (!b)
 		return NULL;
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 33c2b0b..d2df107 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -432,6 +432,7 @@ struct pci_host_bridge {
 	void *sysdata;
 	int busnr;
 	struct list_head windows;	/* resource_entry */
+	struct list_head inbound_windows;	/* inbound memory */
 	void (*release_fn)(struct pci_host_bridge *);
 	void *release_data;
 	struct msi_controller *msi;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v7 3/3] IOMMU/PCI: Reserve IOVA for inbound memory for PCI masters
  2017-05-22 16:39 [PATCH v7 0/3] PCI/IOMMU: Reserve IOVAs for PCI inbound memory Oza Pawandeep
  2017-05-22 16:39 ` [PATCH v7 1/3] OF/PCI: Export inbound memory interface to PCI RC drivers Oza Pawandeep
  2017-05-22 16:39 ` [PATCH v7 2/3] PCI: Add support for PCI inbound windows resources Oza Pawandeep
@ 2017-05-22 16:39 ` Oza Pawandeep
  2017-07-19 12:07   ` Oza Oza
  2017-05-22 19:18 ` [PATCH v7 0/3] PCI/IOMMU: Reserve IOVAs for PCI inbound memory Alex Williamson
  3 siblings, 1 reply; 11+ messages in thread
From: Oza Pawandeep @ 2017-05-22 16:39 UTC (permalink / raw)
  To: Joerg Roedel, Robin Murphy
  Cc: iommu, linux-pci, linux-kernel, linux-arm-kernel, devicetree,
	bcm-kernel-feedback-list, Oza Pawandeep, Oza Pawandeep

This patch reserves the inbound memory holes for PCI masters.
ARM64-based SoCs may have scattered memory banks.
For example, an iProc-based SoC has:

<0x00000000 0x80000000 0x0 0x80000000>, /* 2G @ 2G */
<0x00000008 0x80000000 0x3 0x80000000>, /* 14G @ 34G */
<0x00000090 0x00000000 0x4 0x00000000>, /* 16G @ 576G */
<0x000000a0 0x00000000 0x4 0x00000000>; /* 16G @ 640G */

But the addressing capability for incoming PCI transactions is limited
by the host bridge; for example, if the maximum incoming window
capability is 512GB, then the banks at 0x00000090 and 0x000000a0 fall
beyond it.

To address this problem, the IOMMU has to avoid allocating IOVAs that
are reserved, so that no IOVA falls into a hole; the holes must be
reserved before any IOVA allocations can happen.
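
The intent of the reservation loop added below can be restated in a
simplified form (illustrative only; reserve_region() is a hypothetical
stand-in for iommu_alloc_resv_region() plus list_add_tail(), and the
inbound windows are assumed to be sorted by bus address):

static void reserve_inbound_holes(struct pci_host_bridge *bridge)
{
	struct resource_entry *window;
	dma_addr_t next_free = 0;

	resource_list_for_each_entry(window, &bridge->inbound_windows) {
		/* Convert back to the bus (IOVA) view of the window. */
		dma_addr_t lo = window->res->start - window->offset;
		dma_addr_t hi = lo + resource_size(window->res) - 1;

		if (lo > next_free)		/* hole before this window */
			reserve_region(next_free, lo - 1);
		next_free = hi + 1;
	}

	/* Tail: everything above the last window, up to the DMA limit. */
	if (next_free)
		reserve_region(next_free, DMA_BIT_MASK(sizeof(dma_addr_t) * 8));
}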

Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 8348f366..efe3d07 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -171,16 +171,15 @@ void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list)
 {
 	struct pci_host_bridge *bridge;
 	struct resource_entry *window;
+	struct iommu_resv_region *region;
+	phys_addr_t start, end;
+	size_t length;
 
 	if (!dev_is_pci(dev))
 		return;
 
 	bridge = pci_find_host_bridge(to_pci_dev(dev)->bus);
 	resource_list_for_each_entry(window, &bridge->windows) {
-		struct iommu_resv_region *region;
-		phys_addr_t start;
-		size_t length;
-
 		if (resource_type(window->res) != IORESOURCE_MEM)
 			continue;
 
@@ -193,6 +192,43 @@ void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list)
 
 		list_add_tail(&region->list, list);
 	}
+
+	/* PCI inbound memory reservation. */
+	start = length = 0;
+	resource_list_for_each_entry(window, &bridge->inbound_windows) {
+		end = window->res->start - window->offset;
+
+		if (start > end) {
+			/* multiple ranges assumed sorted. */
+			pr_warn("PCI: failed to reserve iovas\n");
+			return;
+		}
+
+		if (start != end) {
+			length = end - start - 1;
+			region = iommu_alloc_resv_region(start, length, 0,
+				IOMMU_RESV_RESERVED);
+			if (!region)
+				return;
+
+			list_add_tail(&region->list, list);
+		}
+
+		start += end + length + 1;
+	}
+	/*
+	 * Reserve the region after the last dma-range, up to the
+	 * 32/64-bit DMA address limit.
+	 */
+	if ((start) && (start < DMA_BIT_MASK(sizeof(dma_addr_t) * 8))) {
+		length = DMA_BIT_MASK((sizeof(dma_addr_t) * 8)) - 1;
+		region = iommu_alloc_resv_region(start, length, 0,
+			IOMMU_RESV_RESERVED);
+		if (!region)
+			return;
+
+		list_add_tail(&region->list, list);
+	}
 }
 EXPORT_SYMBOL(iommu_dma_get_resv_regions);
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v7 0/3] PCI/IOMMU: Reserve IOVAs for PCI inbound memory
  2017-05-22 16:39 [PATCH v7 0/3] PCI/IOMMU: Reserve IOVAs for PCI inbound memory Oza Pawandeep
                   ` (2 preceding siblings ...)
  2017-05-22 16:39 ` [PATCH v7 3/3] IOMMU/PCI: Reserve IOVA for inbound memory for PCI masters Oza Pawandeep
@ 2017-05-22 19:18 ` Alex Williamson
  2017-05-23  5:00   ` Oza Oza
  3 siblings, 1 reply; 11+ messages in thread
From: Alex Williamson @ 2017-05-22 19:18 UTC (permalink / raw)
  To: Oza Pawandeep
  Cc: Joerg Roedel, Robin Murphy, iommu, linux-pci, linux-kernel,
	linux-arm-kernel, devicetree, bcm-kernel-feedback-list,
	Oza Pawandeep

On Mon, 22 May 2017 22:09:39 +0530
Oza Pawandeep <oza.oza@broadcom.com> wrote:

> The iProc-based PCI RC on the Stingray SoC has a limitation: it can
> address only 512GB of memory at once.
>
> IOVA allocation honors the device's coherent_dma_mask/dma_mask.
> In the PCI case, the current code honors the DMA mask set by the EP;
> there is no concept of a PCI host bridge dma-mask, although there
> should be, since it would truly reflect the limitation of the PCI
> host bridge.
>
> However, even assuming Linux takes care of the largest possible
> dma_mask, the limitation can still exist because of the way the
> memory banks are laid out.
>
> For example, the memory banks:
> <0x00000000 0x80000000 0x0 0x80000000>, /* 2G @ 2G */
> <0x00000008 0x80000000 0x3 0x80000000>, /* 14G @ 34G */
> <0x00000090 0x00000000 0x4 0x00000000>, /* 16G @ 576G */
> <0x000000a0 0x00000000 0x4 0x00000000>; /* 16G @ 640G */
>
> Consider running user space (SPDK) which internally uses vfio in order
> to access a PCI endpoint directly.
>
> vfio uses hugepages, which could come from the 640G/0x000000a0 bank.
> vfio maps a hugepage by using its physical address as the IOVA, so
> VFIO_IOMMU_MAP_DMA ends up calling iommu_map(), which in turn calls
> arm_lpae_map(), mapping IOVAs that are out of range.
>
> So the way the kernel allocates IOVAs (where it honours the device
> dma_mask) and the way user space gets IOVAs are different.
>
> A single dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>; will
> not work.
>
> Instead we have to use scattered dma-ranges, leaving holes.
> Hence we have to reserve the corresponding IOVA ranges for inbound memory.
> This patch set addresses only the IOVA allocation problem.


The description here confuses me, with vfio the user owns the iova
allocation problem.  Mappings are only identity mapped if the user
chooses to do so.  The dma_mask of the device is set by the driver and
only relevant to the DMA-API.  vfio is a meta-driver and doesn't know
the dma_mask of any particular device, that's the user's job.  Is the
net result of what's happening here for the vfio case simply to expose
extra reserved regions in sysfs, which the user can then consume to
craft a compatible iova?  Thanks,

Alex

> 
> Changes since v7:
> - Robin's comment addressed
> where he wanted to remove the dependency between the IOMMU and OF layers.
> - Bjorn Helgaas's comments addressed.
> 
> Changes since v6:
> - Robin's comments addressed.
> 
> Changes since v5:
> Changes since v4:
> Changes since v3:
> Changes since v2:
> - minor changes, redundant checks removed
> - removed internal review
> 
> Changes since v1:
> - address Rob's comments.
> - Add a get_dma_ranges() function to the of_bus struct.
> - Convert existing contents of of_dma_get_range function to
>   of_bus_default_dma_get_ranges and adding that to the
>   default of_bus struct.
> - Make of_dma_get_range call of_bus_match() and then bus->get_dma_ranges.
> 
> 
> Oza Pawandeep (3):
>   OF/PCI: expose inbound memory interface to PCI RC drivers.
>   IOMMU/PCI: reserve IOVA for inbound memory for PCI masters
>   PCI: add support for inbound windows resources
> 
>  drivers/iommu/dma-iommu.c | 44 ++++++++++++++++++++--
>  drivers/of/of_pci.c       | 96 +++++++++++++++++++++++++++++++++++++++++++++++
>  drivers/pci/probe.c       | 30 +++++++++++++--
>  include/linux/of_pci.h    |  7 ++++
>  include/linux/pci.h       |  1 +
>  5 files changed, 170 insertions(+), 8 deletions(-)
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v7 0/3] PCI/IOMMU: Reserve IOVAs for PCI inbound memory
  2017-05-22 19:18 ` [PATCH v7 0/3] PCI/IOMMU: Reserve IOVAs for PCI inbound memory Alex Williamson
@ 2017-05-23  5:00   ` Oza Oza
  0 siblings, 0 replies; 11+ messages in thread
From: Oza Oza @ 2017-05-23  5:00 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Joerg Roedel, Robin Murphy, Linux IOMMU, linux-pci, linux-kernel,
	Linux ARM, devicetree, BCM Kernel Feedback, Oza Pawandeep

On Tue, May 23, 2017 at 12:48 AM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Mon, 22 May 2017 22:09:39 +0530
> Oza Pawandeep <oza.oza@broadcom.com> wrote:
>
>> The iProc-based PCI RC on the Stingray SoC has a limitation: it can
>> address only 512GB of memory at once.
>>
>> IOVA allocation honors the device's coherent_dma_mask/dma_mask.
>> In the PCI case, the current code honors the DMA mask set by the EP;
>> there is no concept of a PCI host bridge dma-mask, although there
>> should be, since it would truly reflect the limitation of the PCI
>> host bridge.
>>
>> However, even assuming Linux takes care of the largest possible
>> dma_mask, the limitation can still exist because of the way the
>> memory banks are laid out.
>>
>> For example, the memory banks:
>> <0x00000000 0x80000000 0x0 0x80000000>, /* 2G @ 2G */
>> <0x00000008 0x80000000 0x3 0x80000000>, /* 14G @ 34G */
>> <0x00000090 0x00000000 0x4 0x00000000>, /* 16G @ 576G */
>> <0x000000a0 0x00000000 0x4 0x00000000>; /* 16G @ 640G */
>>
>> Consider running user space (SPDK) which internally uses vfio in order
>> to access a PCI endpoint directly.
>>
>> vfio uses hugepages, which could come from the 640G/0x000000a0 bank.
>> vfio maps a hugepage by using its physical address as the IOVA, so
>> VFIO_IOMMU_MAP_DMA ends up calling iommu_map(), which in turn calls
>> arm_lpae_map(), mapping IOVAs that are out of range.
>>
>> So the way the kernel allocates IOVAs (where it honours the device
>> dma_mask) and the way user space gets IOVAs are different.
>>
>> A single dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>; will
>> not work.
>>
>> Instead we have to use scattered dma-ranges, leaving holes.
>> Hence we have to reserve the corresponding IOVA ranges for inbound memory.
>> This patch set addresses only the IOVA allocation problem.
>
>
> The description here confuses me, with vfio the user owns the iova
> allocation problem.  Mappings are only identity mapped if the user
> chooses to do so.  The dma_mask of the device is set by the driver and
> only relevant to the DMA-API.  vfio is a meta-driver and doesn't know
> the dma_mask of any particular device, that's the user's job.  Is the
> net result of what's happening here for the vfio case simply to expose
> extra reserved regions in sysfs, which the user can then consume to
> craft a compatible iova?  Thanks,
>
> Alex

Hi Alex,

This is not a VFIO problem; I mentioned VFIO because I wanted to
present the problem statement as a whole (covering both kernel space
and user space).
The way the SPDK pipeline is set up, yes, the mappings are identity
mapped, and whatever IOVA user space passes down, VFIO uses as-is,
which is fine and expected.

But the problem is that the user-space physical memory (hugepages)
can reside high enough in memory to be beyond the PCI RC's capability.

Again, this is neither VFIO's problem nor user space's.
In fact, neither has anything to do with the dma_mask either.
My reference to the dma_mask was about the Linux IOMMU framework
(not VFIO).

Regards,
Oza.
>
>>
>> Changes since v7:
>> - Robin's comment addressed
>> where he wanted to remove the dependency between the IOMMU and OF layers.
>> - Bjorn Helgaas's comments addressed.
>>
>> Changes since v6:
>> - Robin's comments addressed.
>>
>> Changes since v5:
>> Changes since v4:
>> Changes since v3:
>> Changes since v2:
>> - minor changes, redundant checks removed
>> - removed internal review
>>
>> Changes since v1:
>> - address Rob's comments.
>> - Add a get_dma_ranges() function to the of_bus struct.
>> - Convert existing contents of of_dma_get_range function to
>>   of_bus_default_dma_get_ranges and adding that to the
>>   default of_bus struct.
>> - Make of_dma_get_range call of_bus_match() and then bus->get_dma_ranges.
>>
>>
>> Oza Pawandeep (3):
>>   OF/PCI: expose inbound memory interface to PCI RC drivers.
>>   IOMMU/PCI: reserve IOVA for inbound memory for PCI masters
>>   PCI: add support for inbound windows resources
>>
>>  drivers/iommu/dma-iommu.c | 44 ++++++++++++++++++++--
>>  drivers/of/of_pci.c       | 96 +++++++++++++++++++++++++++++++++++++++++++++++
>>  drivers/pci/probe.c       | 30 +++++++++++++--
>>  include/linux/of_pci.h    |  7 ++++
>>  include/linux/pci.h       |  1 +
>>  5 files changed, 170 insertions(+), 8 deletions(-)
>>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v7 2/3] PCI: Add support for PCI inbound windows resources
  2017-05-22 16:39 ` [PATCH v7 2/3] PCI: Add support for PCI inbound windows resources Oza Pawandeep
@ 2017-05-30 22:42   ` Bjorn Helgaas
  2017-05-31 16:17     ` Oza Oza
  0 siblings, 1 reply; 11+ messages in thread
From: Bjorn Helgaas @ 2017-05-30 22:42 UTC (permalink / raw)
  To: Oza Pawandeep
  Cc: Joerg Roedel, Robin Murphy, open list:INTEL IOMMU (VT-d),
	linux-pci, linux-kernel, linux-arm, devicetree,
	bcm-kernel-feedback-list, Oza Pawandeep

On Mon, May 22, 2017 at 11:39 AM, Oza Pawandeep <oza.oza@broadcom.com> wrote:
> This patch adds support for inbound memory windows
> for PCI RC drivers.
>
> It defines a new function, pci_create_root_bus2(), which
> takes inbound resources as an argument and fills the
> memory resources into the PCI host bridge structure
> as inbound_windows.
>
> Legacy RC drivers can continue to use pci_create_root_bus(),
> but any RC driver that wants to reserve IOVAs for its
> inbound memory holes should use the new API pci_create_root_bus2().
>
> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
> ...

> +struct pci_bus *pci_create_root_bus2(struct device *parent, int bus,
> +               struct pci_ops *ops, void *sysdata, struct list_head *resources,
> +               struct list_head *in_res)
> +{
> +       return pci_create_root_bus_msi(parent, bus, ops, sysdata,
> +                                      resources, in_res, NULL);
> +}
> +EXPORT_SYMBOL_GPL(pci_create_root_bus2);

Based on your response to Lorenzo's "[RFC/RFT PATCH 03/18] PCI:
Introduce pci_scan_root_bus_bridge()", I'm hoping you can avoid adding
yet another variant of pci_create_root_bus().

So I think I can wait for that to settle out and look for a v8?

Bjorn

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v7 2/3] PCI: Add support for PCI inbound windows resources
  2017-05-30 22:42   ` Bjorn Helgaas
@ 2017-05-31 16:17     ` Oza Oza
  2017-06-01 17:08       ` Bjorn Helgaas
  0 siblings, 1 reply; 11+ messages in thread
From: Oza Oza @ 2017-05-31 16:17 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Joerg Roedel, Robin Murphy, open list:INTEL IOMMU (VT-d),
	linux-pci, linux-kernel, linux-arm, devicetree,
	bcm-kernel-feedback-list, Oza Pawandeep

On Wed, May 31, 2017 at 4:12 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Mon, May 22, 2017 at 11:39 AM, Oza Pawandeep <oza.oza@broadcom.com> wrote:
>> This patch adds support for inbound memory windows
>> for PCI RC drivers.
>>
>> It defines a new function, pci_create_root_bus2(), which
>> takes inbound resources as an argument and fills the
>> memory resources into the PCI host bridge structure
>> as inbound_windows.
>>
>> Legacy RC drivers can continue to use pci_create_root_bus(),
>> but any RC driver that wants to reserve IOVAs for its
>> inbound memory holes should use the new API pci_create_root_bus2().
>>
>> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
>> ...
>
>> +struct pci_bus *pci_create_root_bus2(struct device *parent, int bus,
>> +               struct pci_ops *ops, void *sysdata, struct list_head *resources,
>> +               struct list_head *in_res)
>> +{
>> +       return pci_create_root_bus_msi(parent, bus, ops, sysdata,
>> +                                      resources, in_res, NULL);
>> +}
>> +EXPORT_SYMBOL_GPL(pci_create_root_bus2);
>
> Based on your response to Lorenzo's "[RFC/RFT PATCH 03/18] PCI:
> Introduce pci_scan_root_bus_bridge()", I'm hoping you can avoid adding
> yet another variant of pci_create_root_bus().
>
> So I think I can wait for that to settle out and look for a v8?
>
> Bjorn

Sure Bjorn, please wait for v8.

But there is one more associated patch,
[PATCH v7 1/3] OF/PCI: Export inbound memory interface to PCI RC drivers,
which basically aims to provide an interface for RC drivers to get
their inbound resources.
RC drivers already get their outbound resources from
of_pci_get_host_bridge_resources(); this is a similar attempt for the
inbound dma-ranges.

Thank you for looking into this.

Regards,
Oza.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v7 2/3] PCI: Add support for PCI inbound windows resources
  2017-05-31 16:17     ` Oza Oza
@ 2017-06-01 17:08       ` Bjorn Helgaas
  2017-06-01 18:06         ` Oza Oza
  0 siblings, 1 reply; 11+ messages in thread
From: Bjorn Helgaas @ 2017-06-01 17:08 UTC (permalink / raw)
  To: Oza Oza
  Cc: Joerg Roedel, Robin Murphy, open list:INTEL IOMMU (VT-d),
	linux-pci, linux-kernel, linux-arm, devicetree,
	bcm-kernel-feedback-list, Oza Pawandeep

On Wed, May 31, 2017 at 11:17 AM, Oza Oza <oza.oza@broadcom.com> wrote:
> On Wed, May 31, 2017 at 4:12 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> On Mon, May 22, 2017 at 11:39 AM, Oza Pawandeep <oza.oza@broadcom.com> wrote:
>>> This patch adds support for inbound memory windows
>>> for PCI RC drivers.
>>>
>>> It defines a new function, pci_create_root_bus2(), which
>>> takes inbound resources as an argument and fills the
>>> memory resources into the PCI host bridge structure
>>> as inbound_windows.
>>>
>>> Legacy RC drivers can continue to use pci_create_root_bus(),
>>> but any RC driver that wants to reserve IOVAs for its
>>> inbound memory holes should use the new API pci_create_root_bus2().
>>>
>>> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
>>> ...
>>
>>> +struct pci_bus *pci_create_root_bus2(struct device *parent, int bus,
>>> +               struct pci_ops *ops, void *sysdata, struct list_head *resources,
>>> +               struct list_head *in_res)
>>> +{
>>> +       return pci_create_root_bus_msi(parent, bus, ops, sysdata,
>>> +                                      resources, in_res, NULL);
>>> +}
>>> +EXPORT_SYMBOL_GPL(pci_create_root_bus2);
>>
>> Based on your response to Lorenzo's "[RFC/RFT PATCH 03/18] PCI:
>> Introduce pci_scan_root_bus_bridge()", I'm hoping you can avoid adding
>> yet another variant of pci_create_root_bus().
>>
>> So I think I can wait for that to settle out and look for a v8?
>>
>> Bjorn
>
> Sure Bjorn, please wait for v8.
>
> But there is one more associated patch
> [PATCH v7 1/3] OF/PCI: Export inbound memory interface to PCI RC
> which basically aims to provide an interface to RC drivers for their
> inbound resources.
> RC driver already get their outbound resources from
> of_pci_get_host_bridge_resources,
> similar attempt for inbound dma-ranges.

Not sure I understand.  Patch 1/3 adds of_pci_get_dma_ranges(), but
none of the patches adds a caller, so I don't see the point of it yet.

In general, if I'm expecting another revision of one patch in a
series, I expect the next revision to include *all* the patches in the
series.  I normally don't pick out and apply individual patches from
the series.

Bjorn

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v7 2/3] PCI: Add support for PCI inbound windows resources
  2017-06-01 17:08       ` Bjorn Helgaas
@ 2017-06-01 18:06         ` Oza Oza
  0 siblings, 0 replies; 11+ messages in thread
From: Oza Oza @ 2017-06-01 18:06 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Joerg Roedel, Robin Murphy, open list:INTEL IOMMU (VT-d),
	linux-pci, linux-kernel, linux-arm, devicetree,
	bcm-kernel-feedback-list, Oza Pawandeep

On Thu, Jun 1, 2017 at 10:38 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Wed, May 31, 2017 at 11:17 AM, Oza Oza <oza.oza@broadcom.com> wrote:
>> On Wed, May 31, 2017 at 4:12 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>> On Mon, May 22, 2017 at 11:39 AM, Oza Pawandeep <oza.oza@broadcom.com> wrote:
>>>> This patch adds support for inbound memory windows
>>>> for PCI RC drivers.
>>>>
>>>> It defines a new function, pci_create_root_bus2(), which
>>>> takes inbound resources as an argument and fills the
>>>> memory resources into the PCI host bridge structure
>>>> as inbound_windows.
>>>>
>>>> Legacy RC drivers can continue to use pci_create_root_bus(),
>>>> but any RC driver that wants to reserve IOVAs for its
>>>> inbound memory holes should use the new API pci_create_root_bus2().
>>>>
>>>> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
>>>> ...
>>>
>>>> +struct pci_bus *pci_create_root_bus2(struct device *parent, int bus,
>>>> +               struct pci_ops *ops, void *sysdata, struct list_head *resources,
>>>> +               struct list_head *in_res)
>>>> +{
>>>> +       return pci_create_root_bus_msi(parent, bus, ops, sysdata,
>>>> +                                      resources, in_res, NULL);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(pci_create_root_bus2);
>>>
>>> Based on your response to Lorenzo's "[RFC/RFT PATCH 03/18] PCI:
>>> Introduce pci_scan_root_bus_bridge()", I'm hoping you can avoid adding
>>> yet another variant of pci_create_root_bus().
>>>
>>> So I think I can wait for that to settle out and look for a v8?
>>>
>>> Bjorn
>>
>> Sure Bjorn, please wait for v8.
>>
>> But there is one more associated patch
>> [PATCH v7 1/3] OF/PCI: Export inbound memory interface to PCI RC
>> which basically aims to provide an interface to RC drivers for their
>> inbound resources.
>> RC driver already get their outbound resources from
>> of_pci_get_host_bridge_resources,
>> similar attempt for inbound dma-ranges.
>
> Not sure I understand.  Patch 1/3 adds of_pci_get_dma_ranges(), but
> none of the patches adds a caller, so I don't see the point of it yet.
>
> In general, if I'm expecting another revision of one patch in a
> series, I expect the next revision to include *all* the patches in the
> series.  I normally don't pick out and apply individual patches from
> the series.
>
> Bjorn

Yes, nothing calls it yet, because it is meant to be called by RC
drivers that want to reserve IOVAs. Not every PCI host bridge driver
will call it, but the iProc-based PCI driver certainly has to.

I will then post PATCH v8, and with that I will include the PCI RC
driver patch that calls it as well.
Thanks for the review.

Regards,
Oza.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v7 3/3] IOMMU/PCI: Reserve IOVA for inbound memory for PCI masters
  2017-05-22 16:39 ` [PATCH v7 3/3] IOMMU/PCI: Reserve IOVA for inbound memory for PCI masters Oza Pawandeep
@ 2017-07-19 12:07   ` Oza Oza
  0 siblings, 0 replies; 11+ messages in thread
From: Oza Oza @ 2017-07-19 12:07 UTC (permalink / raw)
  To: Joerg Roedel, Robin Murphy
  Cc: Linux IOMMU, linux-pci, linux-kernel, Linux ARM, devicetree,
	BCM Kernel Feedback, Oza Pawandeep, Oza Pawandeep

Hi Robin,

My apologies for the noise.

I have taken care of your comments, but this whole patch set
(especially the PCI patches adding inbound memory) depends on
Lorenzo's patch set.
So I will post version 8 of the IOVA reservation patches soon after
Lorenzo's patches are merged.

Regards,
Oza.

On Mon, May 22, 2017 at 10:09 PM, Oza Pawandeep <oza.oza@broadcom.com> wrote:
> This patch reserves the inbound memory holes for PCI masters.
> ARM64-based SoCs may have scattered memory banks.
> For example, an iProc-based SoC has:
>
> <0x00000000 0x80000000 0x0 0x80000000>, /* 2G @ 2G */
> <0x00000008 0x80000000 0x3 0x80000000>, /* 14G @ 34G */
> <0x00000090 0x00000000 0x4 0x00000000>, /* 16G @ 576G */
> <0x000000a0 0x00000000 0x4 0x00000000>; /* 16G @ 640G */
>
> But the addressing capability for incoming PCI transactions is limited
> by the host bridge; for example, if the maximum incoming window
> capability is 512GB, then the banks at 0x00000090 and 0x000000a0 fall
> beyond it.
>
> To address this problem, the IOMMU has to avoid allocating IOVAs that
> are reserved, so that no IOVA falls into a hole; the holes must be
> reserved before any IOVA allocations can happen.
>
> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 8348f366..efe3d07 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -171,16 +171,15 @@ void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list)
>  {
>         struct pci_host_bridge *bridge;
>         struct resource_entry *window;
> +       struct iommu_resv_region *region;
> +       phys_addr_t start, end;
> +       size_t length;
>
>         if (!dev_is_pci(dev))
>                 return;
>
>         bridge = pci_find_host_bridge(to_pci_dev(dev)->bus);
>         resource_list_for_each_entry(window, &bridge->windows) {
> -               struct iommu_resv_region *region;
> -               phys_addr_t start;
> -               size_t length;
> -
>                 if (resource_type(window->res) != IORESOURCE_MEM)
>                         continue;
>
> @@ -193,6 +192,43 @@ void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list)
>
>                 list_add_tail(&region->list, list);
>         }
> +
> +       /* PCI inbound memory reservation. */
> +       start = length = 0;
> +       resource_list_for_each_entry(window, &bridge->inbound_windows) {
> +               end = window->res->start - window->offset;
> +
> +               if (start > end) {
> +                       /* multiple ranges assumed sorted. */
> +                       pr_warn("PCI: failed to reserve iovas\n");
> +                       return;
> +               }
> +
> +               if (start != end) {
> +                       length = end - start - 1;
> +                       region = iommu_alloc_resv_region(start, length, 0,
> +                               IOMMU_RESV_RESERVED);
> +                       if (!region)
> +                               return;
> +
> +                       list_add_tail(&region->list, list);
> +               }
> +
> +               start += end + length + 1;
> +       }
> +       /*
> +        * Reserve the region after the last dma-range, up to the
> +        * 32/64-bit DMA address limit.
> +        */
> +       if ((start) && (start < DMA_BIT_MASK(sizeof(dma_addr_t) * 8))) {
> +               length = DMA_BIT_MASK((sizeof(dma_addr_t) * 8)) - 1;
> +               region = iommu_alloc_resv_region(start, length, 0,
> +                       IOMMU_RESV_RESERVED);
> +               if (!region)
> +                       return;
> +
> +               list_add_tail(&region->list, list);
> +       }
>  }
>  EXPORT_SYMBOL(iommu_dma_get_resv_regions);
>
> --
> 1.9.1
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-07-19 12:07 UTC | newest]

Thread overview: 11+ messages
2017-05-22 16:39 [PATCH v7 0/3] PCI/IOMMU: Reserve IOVAs for PCI inbound memory Oza Pawandeep
2017-05-22 16:39 ` [PATCH v7 1/3] OF/PCI: Export inbound memory interface to PCI RC drivers Oza Pawandeep
2017-05-22 16:39 ` [PATCH v7 2/3] PCI: Add support for PCI inbound windows resources Oza Pawandeep
2017-05-30 22:42   ` Bjorn Helgaas
2017-05-31 16:17     ` Oza Oza
2017-06-01 17:08       ` Bjorn Helgaas
2017-06-01 18:06         ` Oza Oza
2017-05-22 16:39 ` [PATCH v7 3/3] IOMMU/PCI: Reserve IOVA for inbound memory for PCI masters Oza Pawandeep
2017-07-19 12:07   ` Oza Oza
2017-05-22 19:18 ` [PATCH v7 0/3] PCI/IOMMU: Reserve IOVAs for PCI inbound memory Alex Williamson
2017-05-23  5:00   ` Oza Oza
