All of lore.kernel.org
* [PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters
@ 2017-05-03  4:46 ` Oza Pawandeep via iommu
  0 siblings, 0 replies; 61+ messages in thread
From: Oza Pawandeep @ 2017-05-03  4:46 UTC (permalink / raw)
  To: Joerg Roedel, Robin Murphy
  Cc: iommu, linux-pci, linux-kernel, linux-arm-kernel, devicetree,
	bcm-kernel-feedback-list, Oza Pawandeep, Oza Pawandeep

The current integration of the device framework and the OF framework
assumes that memory-mapped devices describe their inbound windows
through dma-ranges entries of the form
(child-bus-address, parent-bus-address, length).

of_dma_configure() is written specifically for memory-mapped devices;
no implementation exists for PCI to handle PCIe-based memory ranges.

For example, iProc-based SoCs and other SoCs (such as R-Car) have
PCI-world dma-ranges:
dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;

This patch does the following:

1) exposes an interface through which PCI host bridge drivers can
retrieve their inbound memory ranges.

2) provides an interface for callers such as of_dma_get_range(), so
that the returned size yields the best possible (largest) dma_mask.
PCI RC drivers do not call APIs such as dma_set_coherent_mask();
instead, the host bridge expresses its addressing capability through
dma-ranges. For example, with
dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
we should end up with dev->coherent_dma_mask = 0x7fffffffff.

3) handles multiple inbound windows and dma-ranges entries; how they
are used is left to the caller. The new function returns the resources
in a standard and uniform way.

4) as a result, callers such as of_dma_get_range() do not need to
change.

5) leaves room for the new function to add PCI flag handling for
inbound memory later.

Bug: SOC-5216
Change-Id: Ie045386df91e1e0587846bb147ae40d96f6d7d2e
Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
Reviewed-on: http://gerrit-ccxsw.broadcom.net/40428
Reviewed-by: vpx_checkpatch status <vpx_checkpatch@broadcom.com>
Reviewed-by: CCXSW <ccxswbuild@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Tested-by: vpx_autobuild status <vpx_autobuild@broadcom.com>
Tested-by: vpx_smoketest status <vpx_smoketest@broadcom.com>
Tested-by: CCXSW <ccxswbuild@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..ed6e69a 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node *dev,
 	return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+/**
+ * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
+ * @np: device node of the host bridge having the dma-ranges property
+ * @resources: list where the range of resources will be added after DT parsing
+ *
+ * It is the caller's job to free the @resources list.
+ *
+ * This function will parse the "dma-ranges" property of a
+ * PCI host bridge device node and setup the resource mapping based
+ * on its content.
+ *
+ * It returns zero if the range parsing has been successful or a standard error
+ * value if it failed.
+ */
+
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
+{
+	struct device_node *node = of_node_get(np);
+	int rlen;
+	int ret = 0;
+	const int na = 3, ns = 2;
+	struct resource *res;
+	struct of_pci_range_parser parser;
+	struct of_pci_range range;
+
+	if (!node)
+		return -EINVAL;
+
+	parser.node = node;
+	parser.pna = of_n_addr_cells(node);
+	parser.np = parser.pna + na + ns;
+
+	parser.range = of_get_property(node, "dma-ranges", &rlen);
+
+	if (!parser.range) {
+		pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
+			  np->full_name);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	parser.end = parser.range + rlen / sizeof(__be32);
+
+	for_each_of_pci_range(&parser, &range) {
+		/*
+		 * If we failed translation or got a zero-sized region
+		 * then skip this range
+		 */
+		if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+			continue;
+
+		res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+		if (!res) {
+			ret = -ENOMEM;
+			goto parse_failed;
+		}
+
+		ret = of_pci_range_to_resource(&range, np, res);
+		if (ret) {
+			kfree(res);
+			continue;
+		}
+
+		pci_add_resource_offset(resources, res,
+					res->start - range.pci_addr);
+	}
+
+	goto out;
+
+parse_failed:
+	pci_free_resource_list(resources);
+out:
+	of_node_put(node);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */
 
 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..617b90d 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
 			struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
@@ -83,6 +84,12 @@ static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
 {
 	return -EINVAL;
 }
+
+static inline int of_pci_get_dma_ranges(struct device_node *np,
+					struct list_head *resources)
+{
+	return -EINVAL;
+}
 #endif
 
 #if defined(CONFIG_OF) && defined(CONFIG_PCI_MSI)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 61+ messages in thread
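
For illustration, a minimal usage sketch of the interface added above
(not part of the patch; the helper name and the mask arithmetic are
assumptions): it walks the list built by of_pci_get_dma_ranges(), picks
the largest inbound window and derives the DMA mask mentioned in the
commit message. In the example dma-ranges the seven cells decode as a
3-cell PCI address (flags 0x43000000 plus bus address 0x0), a 2-cell
CPU address (0x0) and a 2-cell size (0x80 0x00, i.e. 0x8000000000 bytes).

#include <linux/of_pci.h>
#include <linux/pci.h>

/* Hypothetical caller: derive a coherent DMA mask from the largest
 * inbound window described by the host bridge's dma-ranges.
 */
static int example_pci_inbound_mask(struct device_node *np, u64 *mask)
{
	struct resource_entry *entry;
	u64 dma_addr = 0, size = 0;
	LIST_HEAD(res);
	int ret;

	ret = of_pci_get_dma_ranges(np, &res);
	if (ret)
		return ret;

	resource_list_for_each_entry(entry, &res) {
		if (resource_size(entry->res) > size) {
			size = resource_size(entry->res);
			dma_addr = entry->res->start - entry->offset;
		}
	}
	pci_free_resource_list(&res);

	if (!size)
		return -ENODEV;

	/* 0x0 + 0x8000000000 - 1 == 0x7fffffffff for the example above */
	*mask = dma_addr + size - 1;
	return 0;
}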

* [PATCH 2/3] iommu/pci: reserve iova for PCI masters
@ 2017-05-03  4:46   ` Oza Pawandeep via iommu
  0 siblings, 0 replies; 61+ messages in thread
From: Oza Pawandeep @ 2017-05-03  4:46 UTC (permalink / raw)
  To: Joerg Roedel, Robin Murphy
  Cc: iommu, linux-pci, linux-kernel, linux-arm-kernel, devicetree,
	bcm-kernel-feedback-list, Oza Pawandeep, Oza Pawandeep

This patch reserves IOVA ranges for PCI masters.
ARM64-based SoCs may have scattered memory banks; an iProc-based SoC,
for example, has

<0x00000000 0x80000000 0x0 0x80000000>, /* 2G @ 2G */
<0x00000008 0x80000000 0x3 0x80000000>, /* 14G @ 34G */
<0x00000090 0x00000000 0x4 0x00000000>, /* 16G @ 576G */
<0x000000a0 0x00000000 0x4 0x00000000>; /* 16G @ 640G */

However, the addressing capability for incoming PCI transactions is
limited by the host bridge. For example, if the maximum inbound window
capability is 512 GB, the banks starting at 0x00000090 and 0x000000a0
(i.e. at 576G and 640G) fall beyond it.

To address this, the IOMMU layer has to treat such holes as reserved
and must not allocate IOVAs that fall into them.

Bug: SOC-5216
Change-Id: Icbfc99a045d730be143fef427098c937b9d46353
Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
Reviewed-on: http://gerrit-ccxsw.broadcom.net/40760
Reviewed-by: vpx_checkpatch status <vpx_checkpatch@broadcom.com>
Reviewed-by: CCXSW <ccxswbuild@broadcom.com>
Tested-by: vpx_autobuild status <vpx_autobuild@broadcom.com>
Tested-by: vpx_smoketest status <vpx_smoketest@broadcom.com>
Tested-by: CCXSW <ccxswbuild@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 48d36ce..08764b0 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -27,6 +27,7 @@
 #include <linux/iova.h>
 #include <linux/irq.h>
 #include <linux/mm.h>
+#include <linux/of_pci.h>
 #include <linux/pci.h>
 #include <linux/scatterlist.h>
 #include <linux/vmalloc.h>
@@ -171,8 +172,12 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
 		struct iova_domain *iovad)
 {
 	struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
+	struct device_node *np = bridge->dev.parent->of_node;
 	struct resource_entry *window;
 	unsigned long lo, hi;
+	int ret;
+	dma_addr_t tmp_dma_addr = 0, dma_addr;
+	LIST_HEAD(res);
 
 	resource_list_for_each_entry(window, &bridge->windows) {
 		if (resource_type(window->res) != IORESOURCE_MEM &&
@@ -183,6 +188,36 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
 		hi = iova_pfn(iovad, window->res->end - window->offset);
 		reserve_iova(iovad, lo, hi);
 	}
+
+	/* PCI inbound memory reservation. */
+	ret = of_pci_get_dma_ranges(np, &res);
+	if (!ret) {
+		resource_list_for_each_entry(window, &res) {
+			struct resource *res_dma = window->res;
+
+			dma_addr = res_dma->start - window->offset;
+			if (tmp_dma_addr > dma_addr) {
+				pr_warn("PCI: failed to reserve iovas; ranges should be sorted\n");
+				return;
+			}
+			if (tmp_dma_addr != dma_addr) {
+				lo = iova_pfn(iovad, tmp_dma_addr);
+				hi = iova_pfn(iovad, dma_addr - 1);
+				reserve_iova(iovad, lo, hi);
+			}
+			tmp_dma_addr = window->res->end - window->offset;
+		}
+		/*
+		 * Reserve everything above the last dma-range, up to
+		 * the limit of the 32/64-bit DMA address space.
+		 */
+		if (tmp_dma_addr < DMA_BIT_MASK(sizeof(dma_addr_t) * 8)) {
+			lo = iova_pfn(iovad, tmp_dma_addr);
+			hi = iova_pfn(iovad,
+				      DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1);
+			reserve_iova(iovad, lo, hi);
+		}
+	}
 }
 
 /**
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 61+ messages in thread
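
For illustration, a minimal sketch of what the reservation loop added
above ends up doing (hypothetical values: two sorted inbound windows at
bus addresses 0x0-0x7fffffffff and 0x9000000000-0x9fffffffff;
reserve_iova() and iova_pfn() are the existing IOVA helpers the patch
already uses):

#include <linux/dma-mapping.h>
#include <linux/iova.h>

/* Hypothetical helper: reserve the holes around two example windows. */
static void example_reserve_inbound_holes(struct iova_domain *iovad)
{
	/* Gap between the first and the second inbound window. */
	reserve_iova(iovad, iova_pfn(iovad, 0x8000000000ULL),
		     iova_pfn(iovad, 0x9000000000ULL - 1));

	/* Everything above the last window, up to the DMA address limit. */
	reserve_iova(iovad, iova_pfn(iovad, 0xa000000000ULL),
		     iova_pfn(iovad, DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1));
}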

* [PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges
@ 2017-05-03  4:46   ` Oza Pawandeep via iommu
  0 siblings, 0 replies; 61+ messages in thread
From: Oza Pawandeep @ 2017-05-03  4:46 UTC (permalink / raw)
  To: Joerg Roedel, Robin Murphy
  Cc: iommu, linux-pci, linux-kernel, linux-arm-kernel, devicetree,
	bcm-kernel-feedback-list, Oza Pawandeep, Oza Pawandeep

The current integration of the device framework and the OF framework
assumes that memory-mapped devices describe their inbound windows
through dma-ranges entries of the form
(child-bus-address, parent-bus-address, length).

of_dma_configure() is written specifically for memory-mapped devices;
no implementation exists for PCI to handle PCIe-based memory ranges.

For example, iProc-based SoCs and other SoCs (such as R-Car) have
PCI-world dma-ranges:
dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;

This patch fixes the bug in of_dma_get_range() which, as it stands,
parses PCI memory ranges and wrongly returns the size as 0.

To get the largest possible dma_mask, this patch also returns the
largest possible size based on dma-ranges. For example, with
dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
we should get dev->coherent_dma_mask = 0x7fffffffff, based on which
the IOVA allocation space will honour PCI host bridge limitations.

Bug: SOC-5216
Change-Id: I4c534bdd17e70c6b27327d39d1656e8ed0cf56d6
Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
Reviewed-on: http://gerrit-ccxsw.broadcom.net/40762
Reviewed-by: vpx_checkpatch status <vpx_checkpatch@broadcom.com>
Reviewed-by: CCXSW <ccxswbuild@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Tested-by: vpx_autobuild status <vpx_autobuild@broadcom.com>
Tested-by: vpx_smoketest status <vpx_smoketest@broadcom.com>

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 02b2903..f7734fc 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -6,6 +6,7 @@
 #include <linux/ioport.h>
 #include <linux/module.h>
 #include <linux/of_address.h>
+#include <linux/of_pci.h>
 #include <linux/pci.h>
 #include <linux/pci_regs.h>
 #include <linux/sizes.h>
@@ -830,6 +831,54 @@ int of_dma_get_range(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *siz
 	int ret = 0;
 	u64 dmaaddr;
 
+#ifdef CONFIG_PCI
+	struct resource_entry *window;
+	LIST_HEAD(res);
+
+	if (!node)
+		return -EINVAL;
+
+	if (of_bus_pci_match(np)) {
+		*size = 0;
+		/*
+		 * The PCI dma-ranges property is not mandatory.
+		 * Many devices do not need it, either because the
+		 * host bridge needs no inbound memory configuration
+		 * or because of design limitations. So we look for
+		 * dma-ranges; if it is missing, we return the full
+		 * size to the caller, since the absence of
+		 * dma-ranges suggests the host bridge accepts
+		 * whatever comes in, and we set dma_addr to 0.
+		 */
+		ret = of_pci_get_dma_ranges(np, &res);
+		if (!ret) {
+			resource_list_for_each_entry(window, &res) {
+				struct resource *res_dma = window->res;
+
+				if (*size < resource_size(res_dma)) {
+					*dma_addr = res_dma->start - window->offset;
+					*paddr = res_dma->start;
+					*size = resource_size(res_dma);
+				}
+			}
+		}
+		pci_free_resource_list(&res);
+
+		/* ignore the empty ranges. */
+		if (*size == 0) {
+			pr_debug("empty/zero size dma-ranges found for node(%s)\n",
+				np->full_name);
+			*size = DMA_BIT_MASK(sizeof(dma_addr_t) * 8);
+			*dma_addr = *paddr = 0;
+			ret = 0;
+		}
+
+		pr_err("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+			 *dma_addr, *paddr, *size);
+		goto out;
+	}
+#endif
+
 	if (!node)
 		return -EINVAL;
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 61+ messages in thread
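
For illustration, a minimal sketch (an assumption about the caller, not
part of the patch) of how of_dma_configure() can turn the range now
returned by of_dma_get_range() into the coherent DMA mask quoted above.
The helper name is hypothetical, and the real code also honours a mask
the driver may already have set:

#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/log2.h>
#include <linux/of_address.h>

static void example_set_coherent_mask(struct device *dev)
{
	u64 dma_addr, paddr, size;

	if (of_dma_get_range(dev->of_node, &dma_addr, &paddr, &size))
		return;

	/* The mask must cover [dma_addr, dma_addr + size). */
	dev->coherent_dma_mask = DMA_BIT_MASK(ilog2(dma_addr + size));
	/* 0x0 + 0x8000000000 -> ilog2() == 39 -> mask 0x7fffffffff */
}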

* Re: [PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters
@ 2017-05-03  5:06   ` Oza Oza via iommu
  0 siblings, 0 replies; 61+ messages in thread
From: Oza Oza @ 2017-05-03  5:06 UTC (permalink / raw)
  To: Joerg Roedel, Robin Murphy
  Cc: Linux IOMMU, linux-pci, linux-kernel, linux-arm-kernel,
	devicetree, BCM Kernel Feedback, Oza Pawandeep, Oza Pawandeep

I will send v2 after removing the GERRIT details from the
commit message. My apologies for the noise.

Regards,
Oza

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2/3] iommu/pci: reserve iova for PCI masters
@ 2017-05-03  5:07     ` Oza Oza via iommu
  0 siblings, 0 replies; 61+ messages in thread
From: Oza Oza @ 2017-05-03  5:07 UTC (permalink / raw)
  To: Joerg Roedel, Robin Murphy
  Cc: Linux IOMMU, linux-pci, linux-kernel, linux-arm-kernel,
	devicetree, BCM Kernel Feedback, Oza Pawandeep, Oza Pawandeep

I will send v2 after removing the GERRIT details from the
commit message. My apologies for the noise.

Regards,
Oza

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges
@ 2017-05-03  5:07     ` Oza Oza via iommu
  0 siblings, 0 replies; 61+ messages in thread
From: Oza Oza @ 2017-05-03  5:07 UTC (permalink / raw)
  To: Joerg Roedel, Robin Murphy
  Cc: Linux IOMMU, linux-pci, linux-kernel, linux-arm-kernel,
	devicetree, BCM Kernel Feedback, Oza Pawandeep, Oza Pawandeep

I will send v2 after removing the GERRIT details from the
commit message. My apologies for the noise.

Regards,
Oza

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters
@ 2017-05-03 19:55   ` Rob Herring
  0 siblings, 0 replies; 61+ messages in thread
From: Rob Herring @ 2017-05-03 19:55 UTC (permalink / raw)
  To: Oza Pawandeep
  Cc: Joerg Roedel, Robin Murphy, devicetree, Oza Pawandeep, linux-pci,
	linux-kernel, Linux IOMMU, bcm-kernel-feedback-list,
	linux-arm-kernel

On Tue, May 2, 2017 at 11:46 PM, Oza Pawandeep <oza.oza@broadcom.com> wrote:
> The current integration of the device framework and the OF framework
> assumes that memory-mapped devices describe their inbound windows
> through dma-ranges entries of the form
> (child-bus-address, parent-bus-address, length).
>
> of_dma_configure() is written specifically for memory-mapped devices;
> no implementation exists for PCI to handle PCIe-based memory ranges.
>
> For example, iProc-based SoCs and other SoCs (such as R-Car) have
> PCI-world dma-ranges:
> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
>
> This patch does the following:
>
> 1) exposes an interface through which PCI host bridge drivers can
> retrieve their inbound memory ranges.
>
> 2) provides an interface for callers such as of_dma_get_range(), so
> that the returned size yields the best possible (largest) dma_mask.
> PCI RC drivers do not call APIs such as dma_set_coherent_mask();
> instead, the host bridge expresses its addressing capability through
> dma-ranges. For example, with
> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
> we should end up with dev->coherent_dma_mask = 0x7fffffffff.
>
> 3) handles multiple inbound windows and dma-ranges entries; how they
> are used is left to the caller. The new function returns the resources
> in a standard and uniform way.
>
> 4) as a result, callers such as of_dma_get_range() do not need to
> change.
>
> 5) leaves room for the new function to add PCI flag handling for
> inbound memory later.
>
> Bug: SOC-5216
> Change-Id: Ie045386df91e1e0587846bb147ae40d96f6d7d2e
> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
> Reviewed-on: http://gerrit-ccxsw.broadcom.net/40428
> Reviewed-by: vpx_checkpatch status <vpx_checkpatch@broadcom.com>
> Reviewed-by: CCXSW <ccxswbuild@broadcom.com>
> Reviewed-by: Ray Jui <ray.jui@broadcom.com>
> Tested-by: vpx_autobuild status <vpx_autobuild@broadcom.com>
> Tested-by: vpx_smoketest status <vpx_smoketest@broadcom.com>
> Tested-by: CCXSW <ccxswbuild@broadcom.com>

All these non-person, probably internal Broadcom Tested-by and
Reviewed-by's should be removed too.

> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>

Rob

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges
@ 2017-05-03 20:06     ` Rob Herring
  0 siblings, 0 replies; 61+ messages in thread
From: Rob Herring @ 2017-05-03 20:06 UTC (permalink / raw)
  To: Oza Pawandeep
  Cc: Joerg Roedel, Robin Murphy, Linux IOMMU, linux-pci, linux-kernel,
	linux-arm-kernel, devicetree, bcm-kernel-feedback-list,
	Oza Pawandeep

On Tue, May 2, 2017 at 11:46 PM, Oza Pawandeep <oza.oza@broadcom.com> wrote:
> The current integration of the device framework and the OF framework
> assumes that memory-mapped devices describe their inbound windows
> through dma-ranges entries of the form
> (child-bus-address, parent-bus-address, length).
>
> of_dma_configure() is written specifically for memory-mapped devices;
> no implementation exists for PCI to handle PCIe-based memory ranges.
>
> For example, iProc-based SoCs and other SoCs (such as R-Car) have
> PCI-world dma-ranges:
> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
>
> This patch fixes the bug in of_dma_get_range() which, as it stands,
> parses PCI memory ranges and wrongly returns the size as 0.
>
> To get the largest possible dma_mask, this patch also returns the
> largest possible size based on dma-ranges. For example, with
> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
> we should get dev->coherent_dma_mask = 0x7fffffffff, based on which
> the IOVA allocation space will honour PCI host bridge limitations.
>
> Bug: SOC-5216
> Change-Id: I4c534bdd17e70c6b27327d39d1656e8ed0cf56d6
> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
> Reviewed-on: http://gerrit-ccxsw.broadcom.net/40762
> Reviewed-by: vpx_checkpatch status <vpx_checkpatch@broadcom.com>
> Reviewed-by: CCXSW <ccxswbuild@broadcom.com>
> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
> Tested-by: vpx_autobuild status <vpx_autobuild@broadcom.com>
> Tested-by: vpx_smoketest status <vpx_smoketest@broadcom.com>
>
> diff --git a/drivers/of/address.c b/drivers/of/address.c
> index 02b2903..f7734fc 100644
> --- a/drivers/of/address.c
> +++ b/drivers/of/address.c
> @@ -6,6 +6,7 @@
>  #include <linux/ioport.h>
>  #include <linux/module.h>
>  #include <linux/of_address.h>
> +#include <linux/of_pci.h>
>  #include <linux/pci.h>
>  #include <linux/pci_regs.h>
>  #include <linux/sizes.h>
> @@ -830,6 +831,54 @@ int of_dma_get_range(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *siz
>         int ret = 0;
>         u64 dmaaddr;
>
> +#ifdef CONFIG_PCI
> +       struct resource_entry *window;
> +       LIST_HEAD(res);
> +
> +       if (!node)
> +               return -EINVAL;
> +
> +       if (of_bus_pci_match(np)) {

You are not following what I'm saying. Let me spell it out:

- Add a get_dma_ranges() function to the of_bus struct. Or maybe it should
cover ranges too (e.g. get_ranges). I'm not sure.
- Convert the existing contents of this function to
of_bus_default_dma_get_ranges and add that to the default of_bus
struct.
- Make of_dma_get_range call of_bus_match() and then bus->get_dma_ranges
(roughly as in the sketch below).
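
A rough, untested sketch of that direction (the struct is abbreviated,
of_bus_pci_get_dma_ranges / of_bus_default_dma_get_ranges are only the
illustrative names used above, and the existing lookup helper in
drivers/of/address.c is actually called of_match_bus()):

struct of_bus {
	const char	*name;
	const char	*addresses;
	int		(*match)(struct device_node *parent);
	/* ... existing count_cells/map/translate/get_flags callbacks ... */
	int		(*get_dma_ranges)(struct device_node *np, u64 *dma_addr,
					  u64 *paddr, u64 *size);
};

static struct of_bus of_busses[] = {
#ifdef CONFIG_PCI
	{
		.name		= "pci",
		.match		= of_bus_pci_match,
		.get_dma_ranges	= of_bus_pci_get_dma_ranges,
	},
#endif
	{
		.name		= "default",
		.match		= NULL,
		.get_dma_ranges	= of_bus_default_dma_get_ranges,
	},
};

int of_dma_get_range(struct device_node *np, u64 *dma_addr, u64 *paddr,
		     u64 *size)
{
	/*
	 * Pick the bus whose match() hits, falling back to "default";
	 * whether to match on np or on its parent is one of the details
	 * still to be worked out.
	 */
	struct of_bus *bus = of_match_bus(np);

	return bus->get_dma_ranges(np, dma_addr, paddr, size);
}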

Rob

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters
@ 2017-05-04 18:02   ` Robin Murphy
  0 siblings, 0 replies; 61+ messages in thread
From: Robin Murphy @ 2017-05-04 18:02 UTC (permalink / raw)
  To: Oza Pawandeep
  Cc: Joerg Roedel, iommu, linux-pci, linux-kernel, linux-arm-kernel,
	devicetree, bcm-kernel-feedback-list, Oza Pawandeep

[apologies for the silence - I've been on holiday]

On 03/05/17 05:46, Oza Pawandeep wrote:
> current device framework and of framework integration assumes
> dma-ranges in a way where memory-mapped devices define their
> dma-ranges. (child-bus-address, parent-bus-address, length).

Well, yes, that is simply the definition of dma-ranges, and remains true
regardless of the particular format of either bus address.

> of_dma_configure is specifically written to take care of memory
> mapped devices. but no implementation exists for pci to take
> care of pcie based memory ranges.

That still doesn't make sense. To repeat myself again, PCI devices *ARE*
memory-mapped devices. Yes, there do exist some platforms where I/O
space is not treated as MMIO, but config space and memory space are very
much memory-mapped however you look at them, and in the context of DMA,
only memory space is relevant anyway.

What *is* true about the current code is that of_dma_get_range() expects
to be passed an OF node representing the device itself, and doesn't work
properly when passed the node of the device's parent bus directly, which
happens to be what pci_dma_configure() currently does. That's the only
reason why it doesn't work for (single-entry) host controller dma-ranges
today. This does not mean it's a PCI problem, it is simply the case that
pci_dma_configure() is the only caller currently hitting it. Other
discoverable, DMA-capable buses like fsl-mc are still going to face the
exact same problem with or without this patch.
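
A paraphrased sketch of the v4.11-era call path (from memory, not a
verbatim quote) showing the node mix-up described above:

static void pci_dma_configure(struct pci_dev *dev)
{
	struct device *bridge = pci_get_host_bridge_device(dev);

	if (IS_ENABLED(CONFIG_OF) && bridge->parent &&
	    bridge->parent->of_node) {
		/*
		 * The node handed over here is the host controller's
		 * node, i.e. the device's parent bus, but
		 * of_dma_get_range() steps to the parent of whatever
		 * node it is given before looking for "dma-ranges",
		 * so the host bridge's own property is never seen.
		 */
		of_dma_configure(&dev->dev, bridge->parent->of_node);
	}

	pci_put_host_bridge_device(bridge);
}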

> for e.g. iproc based SOCs and other SOCs (such as rcar) have PCI
> world dma-ranges.
> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
> 
> this patch serves following:
> 
> 1) exposes interface to the pci host driver for their
> inbound memory ranges
> 
> 2) provide an interface to callers such as of_dma_get_ranges.
> so then the returned size get best possible (largest) dma_mask.
> because PCI RC drivers do not call APIs such as
> dma_set_coherent_mask() and hence rather it shows its addressing
> capabilities based on dma-ranges.
> for e.g.
> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
> we should get dev->coherent_dma_mask=0x7fffffffff.
> 
> 3) this patch handles multiple inbound windows and dma-ranges.
> it is left to the caller, how it wants to use them.
> the new function returns the resources in a standard and uniform way
> 
> 4) this way the callers of for e.g. of_dma_get_ranges
> does not need to change.
> 
> 5) leaves scope of adding PCI flag handling for inbound memory
> by the new function.

Which flags would ever actually matter? DMA windows aren't going to be
to config or I/O space, so the memory type can be assumed, and the
32/64-bit distinction is irrelevant as it's not a relocatable BAR;
DMA-able system memory isn't going to be read-sensitive, so the
prefetchable flag shouldn't matter; and not being a BAR none of the
others would be relevant either.
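
For reference, assuming the usual 3-cell PCI child address / 2-cell
parent address / 2-cell size layout, the quoted example decomposes as
follows (the arithmetic is just what the commit message's mask value
implies):

/*
 * dma-ranges = <0x43000000 0x00 0x00  0x00 0x00  0x80 0x00>;
 *               |---- PCI address --|  |CPU addr|  |- size -|
 *
 * phys.hi 0x43000000: ss = 0b11 (64-bit memory space), p = 1 (prefetchable)
 * PCI address = 0x0, CPU address = 0x0
 */
u64 size = (u64)0x80 << 32;	/* 0x80_0000_0000, i.e. a 512 GiB window     */
u64 mask = size - 1;		/* 0x7f_ffff_ffff, the quoted coherent_dma_mask */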

> 
> Bug: SOC-5216
> Change-Id: Ie045386df91e1e0587846bb147ae40d96f6d7d2e
> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
> Reviewed-on: http://gerrit-ccxsw.broadcom.net/40428
> Reviewed-by: vpx_checkpatch status <vpx_checkpatch@broadcom.com>
> Reviewed-by: CCXSW <ccxswbuild@broadcom.com>
> Reviewed-by: Ray Jui <ray.jui@broadcom.com>
> Tested-by: vpx_autobuild status <vpx_autobuild@broadcom.com>
> Tested-by: vpx_smoketest status <vpx_smoketest@broadcom.com>
> Tested-by: CCXSW <ccxswbuild@broadcom.com>
> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
> 
> diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
> index 0ee42c3..ed6e69a 100644
> --- a/drivers/of/of_pci.c
> +++ b/drivers/of/of_pci.c
> @@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node *dev,
>  	return err;
>  }
>  EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
> +
> +/**
> + * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
> + * @np: device node of the host bridge having the dma-ranges property
> + * @resources: list where the range of resources will be added after DT parsing
> + *
> + * It is the caller's job to free the @resources list.
> + *
> + * This function will parse the "dma-ranges" property of a
> + * PCI host bridge device node and setup the resource mapping based
> + * on its content.
> + *
> + * It returns zero if the range parsing has been successful or a standard error
> + * value if it failed.
> + */
> +
> +int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
> +{
> +	struct device_node *node = of_node_get(np);
> +	int rlen;
> +	int ret = 0;
> +	const int na = 3, ns = 2;
> +	struct resource *res;
> +	struct of_pci_range_parser parser;
> +	struct of_pci_range range;
> +
> +	if (!node)
> +		return -EINVAL;
> +
> +	parser.node = node;
> +	parser.pna = of_n_addr_cells(node);
> +	parser.np = parser.pna + na + ns;
> +
> +	parser.range = of_get_property(node, "dma-ranges", &rlen);
> +
> +	if (!parser.range) {
> +		pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
> +			  np->full_name);
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	parser.end = parser.range + rlen / sizeof(__be32);
> +
> +	for_each_of_pci_range(&parser, &range) {

This is plain wrong - of_pci_range_parser_one() will translate upwards
through parent "ranges" properties, which is completely backwards for
DMA addresses.

Robin.

> +		/*
> +		 * If we failed translation or got a zero-sized region
> +		 * then skip this range
> +		 */
> +		if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
> +			continue;
> +
> +		res = kzalloc(sizeof(struct resource), GFP_KERNEL);
> +		if (!res) {
> +			ret = -ENOMEM;
> +			goto parse_failed;
> +		}
> +
> +		ret = of_pci_range_to_resource(&range, np, res);
> +		if (ret) {
> +			kfree(res);
> +			continue;
> +		}
> +
> +		pci_add_resource_offset(resources, res,
> +					res->start - range.pci_addr);
> +	}
> +
> +	return ret;
> +
> +parse_failed:
> +	pci_free_resource_list(resources);
> +out:
> +	of_node_put(node);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
>  #endif /* CONFIG_OF_ADDRESS */
>  
>  #ifdef CONFIG_PCI_MSI
> diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
> index 0e0974e..617b90d 100644
> --- a/include/linux/of_pci.h
> +++ b/include/linux/of_pci.h
> @@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
>  int of_pci_get_host_bridge_resources(struct device_node *dev,
>  			unsigned char busno, unsigned char bus_max,
>  			struct list_head *resources, resource_size_t *io_base);
> +int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
>  #else
>  static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
>  			unsigned char busno, unsigned char bus_max,
> @@ -83,6 +84,12 @@ static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
>  {
>  	return -EINVAL;
>  }
> +
> +static inline int of_pci_get_dma_ranges(struct device_node *np,
> +					struct list_head *resources)
> +{
> +	return -EINVAL;
> +}
>  #endif
>  
>  #if defined(CONFIG_OF) && defined(CONFIG_PCI_MSI)
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2/3] iommu/pci: reserve iova for PCI masters
@ 2017-05-04 18:20     ` Robin Murphy
  0 siblings, 0 replies; 61+ messages in thread
From: Robin Murphy @ 2017-05-04 18:20 UTC (permalink / raw)
  To: Oza Pawandeep
  Cc: Joerg Roedel, iommu, linux-pci, linux-kernel, linux-arm-kernel,
	devicetree, bcm-kernel-feedback-list, Oza Pawandeep

On 03/05/17 05:46, Oza Pawandeep wrote:
> this patch reserves the iova for PCI masters.
> ARM64 based SOCs may have scattered memory banks.
> such as iproc based SOC has
> 
> <0x00000000 0x80000000 0x0 0x80000000>, /* 2G @ 2G */
> <0x00000008 0x80000000 0x3 0x80000000>, /* 14G @ 34G */
> <0x00000090 0x00000000 0x4 0x00000000>, /* 16G @ 576G */
> <0x000000a0 0x00000000 0x4 0x00000000>; /* 16G @ 640G */
> 
> but incoming PCI transaction addressing capability is limited
> by host bridge, for example if max incoming window capability
> is 512 GB, then 0x00000090 and 0x000000a0 will fall beyond it.
> 
> to address this problem, the iommu has to avoid allocating iovas which
> are reserved, which in turn means no iova is allocated if it falls into a hole.

I don't necessarily disagree with doing this, as we could do with facing
up to the issue of discontiguous DMA ranges in particular (I too have a
platform with this problem), but I'm still not overly keen on pulling DT
specifics into this layer. More than that, though, if we are going to do
it, then we should do it for all devices with a restrictive
"dma-ranges", not just PCI ones.

> Bug: SOC-5216
> Change-Id: Icbfc99a045d730be143fef427098c937b9d46353
> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
> Reviewed-on: http://gerrit-ccxsw.broadcom.net/40760
> Reviewed-by: vpx_checkpatch status <vpx_checkpatch@broadcom.com>
> Reviewed-by: CCXSW <ccxswbuild@broadcom.com>
> Tested-by: vpx_autobuild status <vpx_autobuild@broadcom.com>
> Tested-by: vpx_smoketest status <vpx_smoketest@broadcom.com>
> Tested-by: CCXSW <ccxswbuild@broadcom.com>
> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 48d36ce..08764b0 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -27,6 +27,7 @@
>  #include <linux/iova.h>
>  #include <linux/irq.h>
>  #include <linux/mm.h>
> +#include <linux/of_pci.h>
>  #include <linux/pci.h>
>  #include <linux/scatterlist.h>
>  #include <linux/vmalloc.h>
> @@ -171,8 +172,12 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
>  		struct iova_domain *iovad)
>  {
>  	struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
> +	struct device_node *np = bridge->dev.parent->of_node;
>  	struct resource_entry *window;
>  	unsigned long lo, hi;
> +	int ret;
> +	dma_addr_t tmp_dma_addr = 0, dma_addr;
> +	LIST_HEAD(res);
>  
>  	resource_list_for_each_entry(window, &bridge->windows) {
>  		if (resource_type(window->res) != IORESOURCE_MEM &&
> @@ -183,6 +188,36 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
>  		hi = iova_pfn(iovad, window->res->end - window->offset);
>  		reserve_iova(iovad, lo, hi);
>  	}
> +
> +	/* PCI inbound memory reservation. */
> +	ret = of_pci_get_dma_ranges(np, &res);
> +	if (!ret) {
> +		resource_list_for_each_entry(window, &res) {
> +			struct resource *res_dma = window->res;
> +
> +			dma_addr = res_dma->start - window->offset;
> +			if (tmp_dma_addr > dma_addr) {
> +				pr_warn("PCI: failed to reserve iovas; ranges should be sorted\n");

I don't see anything in the DT spec about the entries having to be
sorted, and it's not exactly impossible to sort a list if you need it so
(and if I'm being really pedantic, one could still trigger this with a
list that *is* sorted, only by different criteria).
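
(One way to drop the ordering requirement entirely - an untested sketch;
dma_window_cmp is a made-up name and this assumes the <linux/list_sort.h>
API of this era:)

#include <linux/list_sort.h>

/* order parsed inbound windows by bus (DMA) address */
static int dma_window_cmp(void *priv, struct list_head *a,
			  struct list_head *b)
{
	struct resource_entry *ea = container_of(a, struct resource_entry, node);
	struct resource_entry *eb = container_of(b, struct resource_entry, node);
	dma_addr_t da = ea->res->start - ea->offset;
	dma_addr_t db = eb->res->start - eb->offset;

	if (da < db)
		return -1;
	return da > db;
}

	/* in iova_reserve_pci_windows(), after of_pci_get_dma_ranges(np, &res): */
	list_sort(NULL, &res, dma_window_cmp);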

Robin.

> +				return;
> +			}
> +			if (tmp_dma_addr != dma_addr) {
> +				lo = iova_pfn(iovad, tmp_dma_addr);
> +				hi = iova_pfn(iovad, dma_addr - 1);
> +				reserve_iova(iovad, lo, hi);
> +			}
> +			tmp_dma_addr = window->res->end - window->offset;
> +		}
> +		/*
> +		 * the last dma-range should honour based on the
> +		 * 32/64-bit dma addresses.
> +		 */
> +		if (tmp_dma_addr < DMA_BIT_MASK(sizeof(dma_addr_t) * 8)) {
> +			lo = iova_pfn(iovad, tmp_dma_addr);
> +			hi = iova_pfn(iovad,
> +				      DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1);
> +			reserve_iova(iovad, lo, hi);
> +		}
> +	}
>  }
>  
>  /**
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters
@ 2017-05-04 18:41     ` Oza Oza
  0 siblings, 0 replies; 61+ messages in thread
From: Oza Oza @ 2017-05-04 18:41 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Joerg Roedel, Linux IOMMU, linux-pci, linux-kernel,
	linux-arm-kernel, devicetree, BCM Kernel Feedback, Oza Pawandeep

On Thu, May 4, 2017 at 11:32 PM, Robin Murphy <robin.murphy@arm.com> wrote:
> [apologies for the silence - I've been on holiday]
>
> On 03/05/17 05:46, Oza Pawandeep wrote:
>> current device framework and of framework integration assumes
>> dma-ranges in a way where memory-mapped devices define their
>> dma-ranges. (child-bus-address, parent-bus-address, length).
>
> Well, yes, that is simply the definition of dma-ranges, and remains true
> regardless of the particular format of either bus address.
>
>> of_dma_configure is specifically written to take care of memory
>> mapped devices. but no implementation exists for pci to take
>> care of pcie based memory ranges.
>
> That still doesn't make sense. To repeat myself again, PCI devices *ARE*
> memory-mapped devices. Yes, there do exist some platforms where I/O
> space is not treated as MMIO, but config space and memory space are very
> much memory-mapped however you look at them, and in the context of DMA,
> only memory space is relevant anyway.
>
> What *is* true about the current code is that of_dma_get_range() expects
> to be passed an OF node representing the device itself, and doesn't work
> properly when passed the node of the device's parent bus directly, which
> happens to be what pci_dma_configure() currently does. That's the only
> reason why it doesn't work for (single-entry) host controller dma-ranges
> today. This does not mean it's a PCI problem, it is simply the case that
> pci_dma_configure() is the only caller currently hitting it. Other
> discoverable, DMA-capable buses like fsl-mc are still going to face the
> exact same problem with or without this patch.
>
>> for e.g. iproc based SOCs and other SOCs (such as rcar) have PCI
>> world dma-ranges.
>> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
>>
>> this patch serves following:
>>
>> 1) exposes interface to the pci host driver for their
>> inbound memory ranges
>>
>> 2) provide an interface to callers such as of_dma_get_ranges.
>> so then the returned size get best possible (largest) dma_mask.
>> because PCI RC drivers do not call APIs such as
>> dma_set_coherent_mask() and hence rather it shows its addressing
>> capabilities based on dma-ranges.
>> for e.g.
>> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
>> we should get dev->coherent_dma_mask=0x7fffffffff.
>>
>> 3) this patch handles multiple inbound windows and dma-ranges.
>> it is left to the caller, how it wants to use them.
>> the new function returns the resources in a standard and uniform way
>>
>> 4) this way the callers of for e.g. of_dma_get_ranges
>> does not need to change.
>>
>> 5) leaves scope of adding PCI flag handling for inbound memory
>> by the new function.
>
> Which flags would ever actually matter? DMA windows aren't going to be
> to config or I/O space, so the memory type can be assumed, and the
> 32/64-bit distinction is irrelevant as it's not a relocatable BAR;
> DMA-able system memory isn't going to be read-sensitive, so the
> prefetchable flag shouldn't matter; and not being a BAR none of the
> others would be relevant either.
>

Thanks Robin, for your reply and attention.

I agree that at present it would not matter, but that does not mean we
should not leave scope for it to matter in future.

Here is where it could matter: there is Relaxed Ordering for inbound
memory for PCI. According to the standard, the Relaxed Ordering (RO)
bit can be set only for Memory requests and completions (if present in
the original request). Also, according to the transaction ordering
rules, I/O and configuration requests can still be re-ordered ahead of
each other, and we would like to make use of that.

For example, say we mark a memory region as Relaxed Ordered with a
flag; the special thing about that memory is that incoming PCI
transactions to it can be reordered, while the rest of memory has to
stay strongly ordered.

How our SOC would make use of this is out of scope for the discussion
at this point, but I am bringing up the idea of how flags could be
useful for inbound memory, since we would not like to throw away the
flags completely.

>>
>> Bug: SOC-5216
>> Change-Id: Ie045386df91e1e0587846bb147ae40d96f6d7d2e
>> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
>> Reviewed-on: http://gerrit-ccxsw.broadcom.net/40428
>> Reviewed-by: vpx_checkpatch status <vpx_checkpatch@broadcom.com>
>> Reviewed-by: CCXSW <ccxswbuild@broadcom.com>
>> Reviewed-by: Ray Jui <ray.jui@broadcom.com>
>> Tested-by: vpx_autobuild status <vpx_autobuild@broadcom.com>
>> Tested-by: vpx_smoketest status <vpx_smoketest@broadcom.com>
>> Tested-by: CCXSW <ccxswbuild@broadcom.com>
>> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>>
>> diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
>> index 0ee42c3..ed6e69a 100644
>> --- a/drivers/of/of_pci.c
>> +++ b/drivers/of/of_pci.c
>> @@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node *dev,
>>       return err;
>>  }
>>  EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
>> +
>> +/**
>> + * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
>> + * @np: device node of the host bridge having the dma-ranges property
>> + * @resources: list where the range of resources will be added after DT parsing
>> + *
>> + * It is the caller's job to free the @resources list.
>> + *
>> + * This function will parse the "dma-ranges" property of a
>> + * PCI host bridge device node and setup the resource mapping based
>> + * on its content.
>> + *
>> + * It returns zero if the range parsing has been successful or a standard error
>> + * value if it failed.
>> + */
>> +
>> +int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
>> +{
>> +     struct device_node *node = of_node_get(np);
>> +     int rlen;
>> +     int ret = 0;
>> +     const int na = 3, ns = 2;
>> +     struct resource *res;
>> +     struct of_pci_range_parser parser;
>> +     struct of_pci_range range;
>> +
>> +     if (!node)
>> +             return -EINVAL;
>> +
>> +     parser.node = node;
>> +     parser.pna = of_n_addr_cells(node);
>> +     parser.np = parser.pna + na + ns;
>> +
>> +     parser.range = of_get_property(node, "dma-ranges", &rlen);
>> +
>> +     if (!parser.range) {
>> +             pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
>> +                       np->full_name);
>> +             ret = -EINVAL;
>> +             goto out;
>> +     }
>> +
>> +     parser.end = parser.range + rlen / sizeof(__be32);
>> +
>> +     for_each_of_pci_range(&parser, &range) {
>
> This is plain wrong - of_pci_range_parser_one() will translate upwards
> through parent "ranges" properties, which is completely backwards for
> DMA addresses.
>
> Robin.
>

No, it does not; this patch is thoroughly tested on our SOC and it works.
of_pci_range_parser_one does not translate upwards through the parent; it
just sticks to the given PCI parser.

>> +             /*
>> +              * If we failed translation or got a zero-sized region
>> +              * then skip this range
>> +              */
>> +             if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
>> +                     continue;
>> +
>> +             res = kzalloc(sizeof(struct resource), GFP_KERNEL);
>> +             if (!res) {
>> +                     ret = -ENOMEM;
>> +                     goto parse_failed;
>> +             }
>> +
>> +             ret = of_pci_range_to_resource(&range, np, res);
>> +             if (ret) {
>> +                     kfree(res);
>> +                     continue;
>> +             }
>> +
>> +             pci_add_resource_offset(resources, res,
>> +                                     res->start - range.pci_addr);
>> +     }
>> +
>> +     return ret;
>> +
>> +parse_failed:
>> +     pci_free_resource_list(resources);
>> +out:
>> +     of_node_put(node);
>> +     return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
>>  #endif /* CONFIG_OF_ADDRESS */
>>
>>  #ifdef CONFIG_PCI_MSI
>> diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
>> index 0e0974e..617b90d 100644
>> --- a/include/linux/of_pci.h
>> +++ b/include/linux/of_pci.h
>> @@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
>>  int of_pci_get_host_bridge_resources(struct device_node *dev,
>>                       unsigned char busno, unsigned char bus_max,
>>                       struct list_head *resources, resource_size_t *io_base);
>> +int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
>>  #else
>>  static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
>>                       unsigned char busno, unsigned char bus_max,
>> @@ -83,6 +84,12 @@ static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
>>  {
>>       return -EINVAL;
>>  }
>> +
>> +static inline int of_pci_get_dma_ranges(struct device_node *np,
>> +                                     struct list_head *resources)
>> +{
>> +     return -EINVAL;
>> +}
>>  #endif
>>
>>  #if defined(CONFIG_OF) && defined(CONFIG_PCI_MSI)
>>
>

I am posting v2; please have a look at it. It is a much improved
design which addresses Rob's comments.

Regards,
Oza.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters
@ 2017-05-04 18:41     ` Oza Oza
  0 siblings, 0 replies; 61+ messages in thread
From: Oza Oza @ 2017-05-04 18:41 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Joerg Roedel, Linux IOMMU, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA, BCM Kernel Feedback,
	Oza Pawandeep

On Thu, May 4, 2017 at 11:32 PM, Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org> wrote:
> [apologies for the silence - I've been on holiday]
>
> On 03/05/17 05:46, Oza Pawandeep wrote:
>> current device framework and of framework integration assumes
>> dma-ranges in a way where memory-mapped devices define their
>> dma-ranges. (child-bus-address, parent-bus-address, length).
>
> Well, yes, that is simply the definition of dma-ranges, and remains true
> regardless of the particular format of either bus address.
>
>> of_dma_configure is specifically written to take care of memory
>> mapped devices. but no implementation exists for pci to take
>> care of pcie based memory ranges.
>
> That still doesn't make sense. To repeat myself again, PCI devices *ARE*
> memory-mapped devices. Yes, there do exist some platforms where I/O
> space is not treated as MMIO, but config space and memory space are very
> much memory-mapped however you look at them, and in the context of DMA,
> only memory space is relevant anyway.
>
> What *is* true about the current code is that of_dma_get_range() expects
> to be passed an OF node representing the device itself, and doesn't work
> properly when passed the node of the device's parent bus directly, which
> happens to be what pci_dma_configure() currently does. That's the only
> reason why it doesn't work for (single-entry) host controller dma-ranges
> today. This does not mean it's a PCI problem, it is simply the case that
> pci_dma_configure() is the only caller currently hitting it. Other
> discoverable, DMA-capable buses like fsl-mc are still going to face the
> exact same problem with or without this patch.
>
>> for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
>> world dma-ranges.
>> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
>>
>> this patch serves following:
>>
>> 1) exposes interface to the pci host driver for their
>> inbound memory ranges
>>
>> 2) provide an interface to callers such as of_dma_get_ranges.
>> so then the returned size get best possible (largest) dma_mask.
>> because PCI RC drivers do not call APIs such as
>> dma_set_coherent_mask() and hence rather it shows its addressing
>> capabilities based on dma-ranges.
>> for e.g.
>> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
>> we should get dev->coherent_dma_mask=0x7fffffffff.
>>
>> 3) this patch handles multiple inbound windows and dma-ranges.
>> it is left to the caller, how it wants to use them.
>> the new function returns the resources in a standard and unform way
>>
>> 4) this way the callers of for e.g. of_dma_get_ranges
>> does not need to change.
>>
>> 5) leaves scope of adding PCI flag handling for inbound memory
>> by the new function.
>
> Which flags would ever actually matter? DMA windows aren't going to be
> to config or I/O space, so the memory type can be assumed, and the
> 32/64-bit distinction is irrelevant as it's not a relocatable BAR;
> DMA-able system memory isn't going to be read-sensitive, so the
> prefetchable flag shouldn't matter; and not being a BAR none of the
> others would be relevant either.
>

Thanks, Robin, for your reply and attention.

I agree with you that at present it would not matter, but that does not mean
we should not leave scope for it to matter in the future.

Here is where it could matter: there is Relaxed Ordering for PCI inbound
memory. According to the standard, the Relaxed Ordering (RO) bit can be set
only for memory requests and completions (if present in the original
request). Also, according to the transaction ordering rules, I/O and
configuration requests can still be re-ordered ahead of each other, and we
would like to make use of that. For example, say we mark a window as Relaxed
Ordered with a flag: the special thing about that memory is that incoming PCI
transactions to it may be reordered, while the rest of memory has to stay
strongly ordered.

How our SoC would make use of this is out of scope for the discussion at this
point, but I am bringing up the idea of how flags could be useful for inbound
memory, since we would not like to throw away the flags completely (see the
sketch below).
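
To make the point concrete, here is a minimal sketch of how a caller could
act on such a flag. This is an assumption-laden illustration only:
IORESOURCE_PCI_RELAXED_ORDER and mark_relaxed_ordered_windows() are invented
names, and of_pci_get_dma_ranges() as posted does not yet propagate a flags
cell; nothing here reflects an existing kernel API.

#include <linux/kernel.h>
#include <linux/of.h>
#include <linux/of_pci.h>
#include <linux/pci.h>
#include <linux/resource_ext.h>

/* Hypothetical flag bit, not an upstream define: it stands in for whatever
 * bit the dma-ranges flags cell would be translated into by the parser. */
#define IORESOURCE_PCI_RELAXED_ORDER	0x80000000

static void mark_relaxed_ordered_windows(struct device_node *np)
{
	struct resource_entry *window;
	LIST_HEAD(res);

	if (of_pci_get_dma_ranges(np, &res))
		return;

	resource_list_for_each_entry(window, &res) {
		/* SoC-specific ordering setup could be keyed off a flag
		 * carried through from the dma-ranges entry. */
		if (window->res->flags & IORESOURCE_PCI_RELAXED_ORDER)
			pr_info("inbound window %pR allows relaxed ordering\n",
				window->res);
	}

	pci_free_resource_list(&res);
}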

>>
>> Bug: SOC-5216
>> Change-Id: Ie045386df91e1e0587846bb147ae40d96f6d7d2e
>> Signed-off-by: Oza Pawandeep <oza.oza-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>> Reviewed-on: http://gerrit-ccxsw.broadcom.net/40428
>> Reviewed-by: vpx_checkpatch status <vpx_checkpatch-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>> Reviewed-by: CCXSW <ccxswbuild-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>> Reviewed-by: Ray Jui <ray.jui-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>> Tested-by: vpx_autobuild status <vpx_autobuild-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>> Tested-by: vpx_smoketest status <vpx_smoketest-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>> Tested-by: CCXSW <ccxswbuild-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>> Reviewed-by: Scott Branden <scott.branden-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>>
>> diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
>> index 0ee42c3..ed6e69a 100644
>> --- a/drivers/of/of_pci.c
>> +++ b/drivers/of/of_pci.c
>> @@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node *dev,
>>       return err;
>>  }
>>  EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
>> +
>> +/**
>> + * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
>> + * @np: device node of the host bridge having the dma-ranges property
>> + * @resources: list where the range of resources will be added after DT parsing
>> + *
>> + * It is the caller's job to free the @resources list.
>> + *
>> + * This function will parse the "dma-ranges" property of a
>> + * PCI host bridge device node and setup the resource mapping based
>> + * on its content.
>> + *
>> + * It returns zero if the range parsing has been successful or a standard error
>> + * value if it failed.
>> + */
>> +
>> +int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
>> +{
>> +     struct device_node *node = of_node_get(np);
>> +     int rlen;
>> +     int ret = 0;
>> +     const int na = 3, ns = 2;
>> +     struct resource *res;
>> +     struct of_pci_range_parser parser;
>> +     struct of_pci_range range;
>> +
>> +     if (!node)
>> +             return -EINVAL;
>> +
>> +     parser.node = node;
>> +     parser.pna = of_n_addr_cells(node);
>> +     parser.np = parser.pna + na + ns;
>> +
>> +     parser.range = of_get_property(node, "dma-ranges", &rlen);
>> +
>> +     if (!parser.range) {
>> +             pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
>> +                       np->full_name);
>> +             ret = -EINVAL;
>> +             goto out;
>> +     }
>> +
>> +     parser.end = parser.range + rlen / sizeof(__be32);
>> +
>> +     for_each_of_pci_range(&parser, &range) {
>
> This is plain wrong - of_pci_range_parser_one() will translate upwards
> through parent "ranges" properties, which is completely backwards for
> DMA addresses.
>
> Robin.
>

No, it does not; this patch has been thoroughly tested on our SoC and it works.
of_pci_range_parser_one() does not translate upwards through the parent; it
sticks to the given PCI parser.

>> +             /*
>> +              * If we failed translation or got a zero-sized region
>> +              * then skip this range
>> +              */
>> +             if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
>> +                     continue;
>> +
>> +             res = kzalloc(sizeof(struct resource), GFP_KERNEL);
>> +             if (!res) {
>> +                     ret = -ENOMEM;
>> +                     goto parse_failed;
>> +             }
>> +
>> +             ret = of_pci_range_to_resource(&range, np, res);
>> +             if (ret) {
>> +                     kfree(res);
>> +                     continue;
>> +             }
>> +
>> +             pci_add_resource_offset(resources, res,
>> +                                     res->start - range.pci_addr);
>> +     }
>> +
>> +     return ret;
>> +
>> +parse_failed:
>> +     pci_free_resource_list(resources);
>> +out:
>> +     of_node_put(node);
>> +     return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
>>  #endif /* CONFIG_OF_ADDRESS */
>>
>>  #ifdef CONFIG_PCI_MSI
>> diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
>> index 0e0974e..617b90d 100644
>> --- a/include/linux/of_pci.h
>> +++ b/include/linux/of_pci.h
>> @@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
>>  int of_pci_get_host_bridge_resources(struct device_node *dev,
>>                       unsigned char busno, unsigned char bus_max,
>>                       struct list_head *resources, resource_size_t *io_base);
>> +int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
>>  #else
>>  static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
>>                       unsigned char busno, unsigned char bus_max,
>> @@ -83,6 +84,12 @@ static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
>>  {
>>       return -EINVAL;
>>  }
>> +
>> +static inline int of_pci_get_dma_ranges(struct device_node *np,
>> +                                     struct list_head *resources)
>> +{
>> +     return -EINVAL;
>> +}
>>  #endif
>>
>>  #if defined(CONFIG_OF) && defined(CONFIG_PCI_MSI)
>>
>

I am posting v2; please have a look at it. It is a much improved
design which addresses Rob's comments.

Regards,
Oza.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters
@ 2017-05-04 18:41     ` Oza Oza
  0 siblings, 0 replies; 61+ messages in thread
From: Oza Oza @ 2017-05-04 18:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 4, 2017 at 11:32 PM, Robin Murphy <robin.murphy@arm.com> wrote:
> [apologies for the silence - I've been on holiday]
>
> On 03/05/17 05:46, Oza Pawandeep wrote:
>> current device framework and of framework integration assumes
>> dma-ranges in a way where memory-mapped devices define their
>> dma-ranges. (child-bus-address, parent-bus-address, length).
>
> Well, yes, that is simply the definition of dma-ranges, and remains true
> regardless of the particular format of either bus address.
>
>> of_dma_configure is specifically written to take care of memory
>> mapped devices. but no implementation exists for pci to take
>> care of pcie based memory ranges.
>
> That still doesn't make sense. To repeat myself again, PCI devices *ARE*
> memory-mapped devices. Yes, there do exist some platforms where I/O
> space is not treated as MMIO, but config space and memory space are very
> much memory-mapped however you look at them, and in the context of DMA,
> only memory space is relevant anyway.
>
> What *is* true about the current code is that of_dma_get_range() expects
> to be passed an OF node representing the device itself, and doesn't work
> properly when passed the node of the device's parent bus directly, which
> happens to be what pci_dma_configure() currently does. That's the only
> reason why it doesn't work for (single-entry) host controller dma-ranges
> today. This does not mean it's a PCI problem, it is simply the case that
> pci_dma_configure() is the only caller currently hitting it. Other
> discoverable, DMA-capable buses like fsl-mc are still going to face the
> exact same problem with or without this patch.
>
>> for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
>> world dma-ranges.
>> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
>>
>> this patch serves following:
>>
>> 1) exposes interface to the pci host driver for their
>> inbound memory ranges
>>
>> 2) provide an interface to callers such as of_dma_get_ranges.
>> so then the returned size get best possible (largest) dma_mask.
>> because PCI RC drivers do not call APIs such as
>> dma_set_coherent_mask() and hence rather it shows its addressing
>> capabilities based on dma-ranges.
>> for e.g.
>> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
>> we should get dev->coherent_dma_mask=0x7fffffffff.
>>
>> 3) this patch handles multiple inbound windows and dma-ranges.
>> it is left to the caller, how it wants to use them.
>> the new function returns the resources in a standard and unform way
>>
>> 4) this way the callers of for e.g. of_dma_get_ranges
>> does not need to change.
>>
>> 5) leaves scope of adding PCI flag handling for inbound memory
>> by the new function.
>
> Which flags would ever actually matter? DMA windows aren't going to be
> to config or I/O space, so the memory type can be assumed, and the
> 32/64-bit distinction is irrelevant as it's not a relocatable BAR;
> DMA-able system memory isn't going to be read-sensitive, so the
> prefetchable flag shouldn't matter; and not being a BAR none of the
> others would be relevant either.
>

Thanks, Robin, for your reply and attention.

I agree with you that at present it would not matter, but that does not mean
we should not leave scope for it to matter in the future.

Here is where it could matter: there is Relaxed Ordering for PCI inbound
memory. According to the standard, the Relaxed Ordering (RO) bit can be set
only for memory requests and completions (if present in the original
request). Also, according to the transaction ordering rules, I/O and
configuration requests can still be re-ordered ahead of each other, and we
would like to make use of that. For example, say we mark a window as Relaxed
Ordered with a flag: the special thing about that memory is that incoming PCI
transactions to it may be reordered, while the rest of memory has to stay
strongly ordered.

How our SoC would make use of this is out of scope for the discussion at this
point, but I am bringing up the idea of how flags could be useful for inbound
memory, since we would not like to throw away the flags completely.

>>
>> Bug: SOC-5216
>> Change-Id: Ie045386df91e1e0587846bb147ae40d96f6d7d2e
>> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
>> Reviewed-on: http://gerrit-ccxsw.broadcom.net/40428
>> Reviewed-by: vpx_checkpatch status <vpx_checkpatch@broadcom.com>
>> Reviewed-by: CCXSW <ccxswbuild@broadcom.com>
>> Reviewed-by: Ray Jui <ray.jui@broadcom.com>
>> Tested-by: vpx_autobuild status <vpx_autobuild@broadcom.com>
>> Tested-by: vpx_smoketest status <vpx_smoketest@broadcom.com>
>> Tested-by: CCXSW <ccxswbuild@broadcom.com>
>> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>>
>> diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
>> index 0ee42c3..ed6e69a 100644
>> --- a/drivers/of/of_pci.c
>> +++ b/drivers/of/of_pci.c
>> @@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node *dev,
>>       return err;
>>  }
>>  EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
>> +
>> +/**
>> + * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
>> + * @np: device node of the host bridge having the dma-ranges property
>> + * @resources: list where the range of resources will be added after DT parsing
>> + *
>> + * It is the caller's job to free the @resources list.
>> + *
>> + * This function will parse the "dma-ranges" property of a
>> + * PCI host bridge device node and setup the resource mapping based
>> + * on its content.
>> + *
>> + * It returns zero if the range parsing has been successful or a standard error
>> + * value if it failed.
>> + */
>> +
>> +int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
>> +{
>> +     struct device_node *node = of_node_get(np);
>> +     int rlen;
>> +     int ret = 0;
>> +     const int na = 3, ns = 2;
>> +     struct resource *res;
>> +     struct of_pci_range_parser parser;
>> +     struct of_pci_range range;
>> +
>> +     if (!node)
>> +             return -EINVAL;
>> +
>> +     parser.node = node;
>> +     parser.pna = of_n_addr_cells(node);
>> +     parser.np = parser.pna + na + ns;
>> +
>> +     parser.range = of_get_property(node, "dma-ranges", &rlen);
>> +
>> +     if (!parser.range) {
>> +             pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
>> +                       np->full_name);
>> +             ret = -EINVAL;
>> +             goto out;
>> +     }
>> +
>> +     parser.end = parser.range + rlen / sizeof(__be32);
>> +
>> +     for_each_of_pci_range(&parser, &range) {
>
> This is plain wrong - of_pci_range_parser_one() will translate upwards
> through parent "ranges" properties, which is completely backwards for
> DMA addresses.
>
> Robin.
>

No, it does not; this patch has been thoroughly tested on our SoC and it works.
of_pci_range_parser_one() does not translate upwards through the parent; it
sticks to the given PCI parser.

>> +             /*
>> +              * If we failed translation or got a zero-sized region
>> +              * then skip this range
>> +              */
>> +             if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
>> +                     continue;
>> +
>> +             res = kzalloc(sizeof(struct resource), GFP_KERNEL);
>> +             if (!res) {
>> +                     ret = -ENOMEM;
>> +                     goto parse_failed;
>> +             }
>> +
>> +             ret = of_pci_range_to_resource(&range, np, res);
>> +             if (ret) {
>> +                     kfree(res);
>> +                     continue;
>> +             }
>> +
>> +             pci_add_resource_offset(resources, res,
>> +                                     res->start - range.pci_addr);
>> +     }
>> +
>> +     return ret;
>> +
>> +parse_failed:
>> +     pci_free_resource_list(resources);
>> +out:
>> +     of_node_put(node);
>> +     return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
>>  #endif /* CONFIG_OF_ADDRESS */
>>
>>  #ifdef CONFIG_PCI_MSI
>> diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
>> index 0e0974e..617b90d 100644
>> --- a/include/linux/of_pci.h
>> +++ b/include/linux/of_pci.h
>> @@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
>>  int of_pci_get_host_bridge_resources(struct device_node *dev,
>>                       unsigned char busno, unsigned char bus_max,
>>                       struct list_head *resources, resource_size_t *io_base);
>> +int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
>>  #else
>>  static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
>>                       unsigned char busno, unsigned char bus_max,
>> @@ -83,6 +84,12 @@ static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
>>  {
>>       return -EINVAL;
>>  }
>> +
>> +static inline int of_pci_get_dma_ranges(struct device_node *np,
>> +                                     struct list_head *resources)
>> +{
>> +     return -EINVAL;
>> +}
>>  #endif
>>
>>  #if defined(CONFIG_OF) && defined(CONFIG_PCI_MSI)
>>
>

I am posting v2; please have a look at it. It is a much improved
design which addresses Rob's comments.

Regards,
Oza.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2/3] iommu/pci: reserve iova for PCI masters
@ 2017-05-04 18:52       ` Oza Oza via iommu
  0 siblings, 0 replies; 61+ messages in thread
From: Oza Oza @ 2017-05-04 18:52 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Joerg Roedel, Linux IOMMU, linux-pci, linux-kernel,
	linux-arm-kernel, devicetree, BCM Kernel Feedback, Oza Pawandeep

On Thu, May 4, 2017 at 11:50 PM, Robin Murphy <robin.murphy@arm.com> wrote:
> On 03/05/17 05:46, Oza Pawandeep wrote:
>> this patch reserves the iova for PCI masters.
>> ARM64 based SOCs may have scattered memory banks.
>> such as iproc based SOC has
>>
>> <0x00000000 0x80000000 0x0 0x80000000>, /* 2G @ 2G */
>> <0x00000008 0x80000000 0x3 0x80000000>, /* 14G @ 34G */
>> <0x00000090 0x00000000 0x4 0x00000000>, /* 16G @ 576G */
>> <0x000000a0 0x00000000 0x4 0x00000000>; /* 16G @ 640G */
>>
>> but incoming PCI transcation addressing capability is limited
>> by host bridge, for example if max incoming window capability
>> is 512 GB, then 0x00000090 and 0x000000a0 will fall beyond it.
>>
>> to address this problem, iommu has to avoid allocating iovas which
>> are reserved. which inturn does not allocate iova if it falls into hole.
>
> I don't necessarily disagree with doing this, as we could do with facing
> up to the issue of discontiguous DMA ranges in particular (I too have a
> platform with this problem), but I'm still not overly keen on pulling DT
> specifics into this layer. More than that, though, if we are going to do
> it, then we should do it for all devices with a restrictive
> "dma-ranges", not just PCI ones.
>

How do you propose to do it?

My thinking is this: iova_reserve_pci_windows() is written specifically for
PCI, and that is where I am adding the reservation.

Ideally, struct pci_host_bridge should have a new member:

struct list_head inbound_windows; /* resource_entry */

But somehow this resource list has to be filled well before
iommu_dma_init_domain() happens, and then iova_reserve_pci_windows() can use
the bridge resources directly, just as it already does for outbound memory.

This would detach the DT specifics from the dma-iommu layer. Let me know how
this sounds; a rough sketch follows below.
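
For the sake of discussion, here is a rough sketch of that direction. It
assumes a new inbound_windows resource_entry list is added to struct
pci_host_bridge and populated by the host bridge driver before
iommu_dma_init_domain() runs; iova_reserve_inbound_windows() is a
hypothetical helper, not part of any posted patch.

#include <linux/iova.h>
#include <linux/resource_ext.h>
#include <linux/types.h>

/* Walk the inbound windows (assumed sorted by bus address) and reserve the
 * holes between them, so the IOVA allocator never hands out an address the
 * host bridge cannot accept as an incoming transaction. */
static void iova_reserve_inbound_windows(struct list_head *inbound_windows,
					 struct iova_domain *iovad)
{
	struct resource_entry *entry;
	dma_addr_t next = 0;

	resource_list_for_each_entry(entry, inbound_windows) {
		dma_addr_t lo = entry->res->start - entry->offset;

		if (lo > next)
			reserve_iova(iovad, iova_pfn(iovad, next),
				     iova_pfn(iovad, lo - 1));
		next = entry->res->end - entry->offset + 1;
	}
}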


>> Bug: SOC-5216
>> Change-Id: Icbfc99a045d730be143fef427098c937b9d46353
>> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
>> Reviewed-on: http://gerrit-ccxsw.broadcom.net/40760
>> Reviewed-by: vpx_checkpatch status <vpx_checkpatch@broadcom.com>
>> Reviewed-by: CCXSW <ccxswbuild@broadcom.com>
>> Tested-by: vpx_autobuild status <vpx_autobuild@broadcom.com>
>> Tested-by: vpx_smoketest status <vpx_smoketest@broadcom.com>
>> Tested-by: CCXSW <ccxswbuild@broadcom.com>
>> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index 48d36ce..08764b0 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -27,6 +27,7 @@
>>  #include <linux/iova.h>
>>  #include <linux/irq.h>
>>  #include <linux/mm.h>
>> +#include <linux/of_pci.h>
>>  #include <linux/pci.h>
>>  #include <linux/scatterlist.h>
>>  #include <linux/vmalloc.h>
>> @@ -171,8 +172,12 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
>>               struct iova_domain *iovad)
>>  {
>>       struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
>> +     struct device_node *np = bridge->dev.parent->of_node;
>>       struct resource_entry *window;
>>       unsigned long lo, hi;
>> +     int ret;
>> +     dma_addr_t tmp_dma_addr = 0, dma_addr;
>> +     LIST_HEAD(res);
>>
>>       resource_list_for_each_entry(window, &bridge->windows) {
>>               if (resource_type(window->res) != IORESOURCE_MEM &&
>> @@ -183,6 +188,36 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
>>               hi = iova_pfn(iovad, window->res->end - window->offset);
>>               reserve_iova(iovad, lo, hi);
>>       }
>> +
>> +     /* PCI inbound memory reservation. */
>> +     ret = of_pci_get_dma_ranges(np, &res);
>> +     if (!ret) {
>> +             resource_list_for_each_entry(window, &res) {
>> +                     struct resource *res_dma = window->res;
>> +
>> +                     dma_addr = res_dma->start - window->offset;
>> +                     if (tmp_dma_addr > dma_addr) {
>> +                             pr_warn("PCI: failed to reserve iovas; ranges should be sorted\n");
>
> I don't see anything in the DT spec about the entries having to be
> sorted, and it's not exactly impossible to sort a list if you need it so
> (and if I'm being really pedantic, one could still trigger this with a
> list that *is* sorted, only by different criteria).
>

We will have to sort it the way we want then; I can make the code sort the
list (something along the lines of the sketch below). Thanks for the
suggestion.
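
For what it is worth, a sketch of such a sort using list_sort();
cmp_inbound_windows() is a hypothetical comparator, not an existing function,
and the resource list is simply what of_pci_get_dma_ranges() from patch 1/3
returns.

#include <linux/list_sort.h>
#include <linux/resource_ext.h>

/* Order the inbound windows by bus (DMA) address so the hole-reservation
 * loop above can rely on a sorted list instead of warning and bailing out. */
static int cmp_inbound_windows(void *priv, struct list_head *a,
			       struct list_head *b)
{
	struct resource_entry *ra = list_entry(a, struct resource_entry, node);
	struct resource_entry *rb = list_entry(b, struct resource_entry, node);
	dma_addr_t da = ra->res->start - ra->offset;
	dma_addr_t db = rb->res->start - rb->offset;

	if (da < db)
		return -1;
	return da > db;
}

/* ... then, after of_pci_get_dma_ranges(np, &res) succeeds: */
/* list_sort(NULL, &res, cmp_inbound_windows); */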

> Robin.
>
>> +                             return;
>> +                     }
>> +                     if (tmp_dma_addr != dma_addr) {
>> +                             lo = iova_pfn(iovad, tmp_dma_addr);
>> +                             hi = iova_pfn(iovad, dma_addr - 1);
>> +                             reserve_iova(iovad, lo, hi);
>> +                     }
>> +                     tmp_dma_addr = window->res->end - window->offset;
>> +             }
>> +             /*
>> +              * the last dma-range should honour based on the
>> +              * 32/64-bit dma addresses.
>> +              */
>> +             if (tmp_dma_addr < DMA_BIT_MASK(sizeof(dma_addr_t) * 8)) {
>> +                     lo = iova_pfn(iovad, tmp_dma_addr);
>> +                     hi = iova_pfn(iovad,
>> +                                   DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1);
>> +                     reserve_iova(iovad, lo, hi);
>> +             }
>> +     }
>>  }
>>
>>  /**
>>
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2/3] iommu/pci: reserve iova for PCI masters
@ 2017-05-04 18:52       ` Oza Oza via iommu
  0 siblings, 0 replies; 61+ messages in thread
From: Oza Oza via iommu @ 2017-05-04 18:52 UTC (permalink / raw)
  To: Robin Murphy
  Cc: devicetree-u79uwXL29TY76Z2rM5mHXA, Oza Pawandeep,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Linux IOMMU,
	BCM Kernel Feedback,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Thu, May 4, 2017 at 11:50 PM, Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org> wrote:
> On 03/05/17 05:46, Oza Pawandeep wrote:
>> this patch reserves the iova for PCI masters.
>> ARM64 based SOCs may have scattered memory banks.
>> such as iproc based SOC has
>>
>> <0x00000000 0x80000000 0x0 0x80000000>, /* 2G @ 2G */
>> <0x00000008 0x80000000 0x3 0x80000000>, /* 14G @ 34G */
>> <0x00000090 0x00000000 0x4 0x00000000>, /* 16G @ 576G */
>> <0x000000a0 0x00000000 0x4 0x00000000>; /* 16G @ 640G */
>>
>> but incoming PCI transcation addressing capability is limited
>> by host bridge, for example if max incoming window capability
>> is 512 GB, then 0x00000090 and 0x000000a0 will fall beyond it.
>>
>> to address this problem, iommu has to avoid allocating iovas which
>> are reserved. which inturn does not allocate iova if it falls into hole.
>
> I don't necessarily disagree with doing this, as we could do with facing
> up to the issue of discontiguous DMA ranges in particular (I too have a
> platform with this problem), but I'm still not overly keen on pulling DT
> specifics into this layer. More than that, though, if we are going to do
> it, then we should do it for all devices with a restrictive
> "dma-ranges", not just PCI ones.
>

How do you propose to do it?

My thinking is this: iova_reserve_pci_windows() is written specifically for
PCI, and that is where I am adding the reservation.

Ideally, struct pci_host_bridge should have a new member:

struct list_head inbound_windows; /* resource_entry */

But somehow this resource list has to be filled well before
iommu_dma_init_domain() happens, and then iova_reserve_pci_windows() can use
the bridge resources directly, just as it already does for outbound memory.

This would detach the DT specifics from the dma-iommu layer. Let me know how
this sounds.


>> Bug: SOC-5216
>> Change-Id: Icbfc99a045d730be143fef427098c937b9d46353
>> Signed-off-by: Oza Pawandeep <oza.oza-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>> Reviewed-on: http://gerrit-ccxsw.broadcom.net/40760
>> Reviewed-by: vpx_checkpatch status <vpx_checkpatch-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>> Reviewed-by: CCXSW <ccxswbuild-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>> Tested-by: vpx_autobuild status <vpx_autobuild-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>> Tested-by: vpx_smoketest status <vpx_smoketest-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>> Tested-by: CCXSW <ccxswbuild-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>> Reviewed-by: Scott Branden <scott.branden-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index 48d36ce..08764b0 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -27,6 +27,7 @@
>>  #include <linux/iova.h>
>>  #include <linux/irq.h>
>>  #include <linux/mm.h>
>> +#include <linux/of_pci.h>
>>  #include <linux/pci.h>
>>  #include <linux/scatterlist.h>
>>  #include <linux/vmalloc.h>
>> @@ -171,8 +172,12 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
>>               struct iova_domain *iovad)
>>  {
>>       struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
>> +     struct device_node *np = bridge->dev.parent->of_node;
>>       struct resource_entry *window;
>>       unsigned long lo, hi;
>> +     int ret;
>> +     dma_addr_t tmp_dma_addr = 0, dma_addr;
>> +     LIST_HEAD(res);
>>
>>       resource_list_for_each_entry(window, &bridge->windows) {
>>               if (resource_type(window->res) != IORESOURCE_MEM &&
>> @@ -183,6 +188,36 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
>>               hi = iova_pfn(iovad, window->res->end - window->offset);
>>               reserve_iova(iovad, lo, hi);
>>       }
>> +
>> +     /* PCI inbound memory reservation. */
>> +     ret = of_pci_get_dma_ranges(np, &res);
>> +     if (!ret) {
>> +             resource_list_for_each_entry(window, &res) {
>> +                     struct resource *res_dma = window->res;
>> +
>> +                     dma_addr = res_dma->start - window->offset;
>> +                     if (tmp_dma_addr > dma_addr) {
>> +                             pr_warn("PCI: failed to reserve iovas; ranges should be sorted\n");
>
> I don't see anything in the DT spec about the entries having to be
> sorted, and it's not exactly impossible to sort a list if you need it so
> (and if I'm being really pedantic, one could still trigger this with a
> list that *is* sorted, only by different criteria).
>

We will have to sort it the way we want then; I can make the code sort the
list. Thanks for the suggestion.

> Robin.
>
>> +                             return;
>> +                     }
>> +                     if (tmp_dma_addr != dma_addr) {
>> +                             lo = iova_pfn(iovad, tmp_dma_addr);
>> +                             hi = iova_pfn(iovad, dma_addr - 1);
>> +                             reserve_iova(iovad, lo, hi);
>> +                     }
>> +                     tmp_dma_addr = window->res->end - window->offset;
>> +             }
>> +             /*
>> +              * the last dma-range should honour based on the
>> +              * 32/64-bit dma addresses.
>> +              */
>> +             if (tmp_dma_addr < DMA_BIT_MASK(sizeof(dma_addr_t) * 8)) {
>> +                     lo = iova_pfn(iovad, tmp_dma_addr);
>> +                     hi = iova_pfn(iovad,
>> +                                   DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1);
>> +                     reserve_iova(iovad, lo, hi);
>> +             }
>> +     }
>>  }
>>
>>  /**
>>
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 2/3] iommu/pci: reserve iova for PCI masters
@ 2017-05-04 18:52       ` Oza Oza via iommu
  0 siblings, 0 replies; 61+ messages in thread
From: Oza Oza @ 2017-05-04 18:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 4, 2017 at 11:50 PM, Robin Murphy <robin.murphy@arm.com> wrote:
> On 03/05/17 05:46, Oza Pawandeep wrote:
>> this patch reserves the iova for PCI masters.
>> ARM64 based SOCs may have scattered memory banks.
>> such as iproc based SOC has
>>
>> <0x00000000 0x80000000 0x0 0x80000000>, /* 2G @ 2G */
>> <0x00000008 0x80000000 0x3 0x80000000>, /* 14G @ 34G */
>> <0x00000090 0x00000000 0x4 0x00000000>, /* 16G @ 576G */
>> <0x000000a0 0x00000000 0x4 0x00000000>; /* 16G @ 640G */
>>
>> but incoming PCI transcation addressing capability is limited
>> by host bridge, for example if max incoming window capability
>> is 512 GB, then 0x00000090 and 0x000000a0 will fall beyond it.
>>
>> to address this problem, iommu has to avoid allocating iovas which
>> are reserved. which inturn does not allocate iova if it falls into hole.
>
> I don't necessarily disagree with doing this, as we could do with facing
> up to the issue of discontiguous DMA ranges in particular (I too have a
> platform with this problem), but I'm still not overly keen on pulling DT
> specifics into this layer. More than that, though, if we are going to do
> it, then we should do it for all devices with a restrictive
> "dma-ranges", not just PCI ones.
>

How do you propose to do it?

My thinking is this: iova_reserve_pci_windows() is written specifically for
PCI, and that is where I am adding the reservation.

Ideally, struct pci_host_bridge should have a new member:

struct list_head inbound_windows; /* resource_entry */

But somehow this resource list has to be filled well before
iommu_dma_init_domain() happens, and then iova_reserve_pci_windows() can use
the bridge resources directly, just as it already does for outbound memory.

This would detach the DT specifics from the dma-iommu layer. Let me know how
this sounds.


>> Bug: SOC-5216
>> Change-Id: Icbfc99a045d730be143fef427098c937b9d46353
>> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
>> Reviewed-on: http://gerrit-ccxsw.broadcom.net/40760
>> Reviewed-by: vpx_checkpatch status <vpx_checkpatch@broadcom.com>
>> Reviewed-by: CCXSW <ccxswbuild@broadcom.com>
>> Tested-by: vpx_autobuild status <vpx_autobuild@broadcom.com>
>> Tested-by: vpx_smoketest status <vpx_smoketest@broadcom.com>
>> Tested-by: CCXSW <ccxswbuild@broadcom.com>
>> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index 48d36ce..08764b0 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -27,6 +27,7 @@
>>  #include <linux/iova.h>
>>  #include <linux/irq.h>
>>  #include <linux/mm.h>
>> +#include <linux/of_pci.h>
>>  #include <linux/pci.h>
>>  #include <linux/scatterlist.h>
>>  #include <linux/vmalloc.h>
>> @@ -171,8 +172,12 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
>>               struct iova_domain *iovad)
>>  {
>>       struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
>> +     struct device_node *np = bridge->dev.parent->of_node;
>>       struct resource_entry *window;
>>       unsigned long lo, hi;
>> +     int ret;
>> +     dma_addr_t tmp_dma_addr = 0, dma_addr;
>> +     LIST_HEAD(res);
>>
>>       resource_list_for_each_entry(window, &bridge->windows) {
>>               if (resource_type(window->res) != IORESOURCE_MEM &&
>> @@ -183,6 +188,36 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
>>               hi = iova_pfn(iovad, window->res->end - window->offset);
>>               reserve_iova(iovad, lo, hi);
>>       }
>> +
>> +     /* PCI inbound memory reservation. */
>> +     ret = of_pci_get_dma_ranges(np, &res);
>> +     if (!ret) {
>> +             resource_list_for_each_entry(window, &res) {
>> +                     struct resource *res_dma = window->res;
>> +
>> +                     dma_addr = res_dma->start - window->offset;
>> +                     if (tmp_dma_addr > dma_addr) {
>> +                             pr_warn("PCI: failed to reserve iovas; ranges should be sorted\n");
>
> I don't see anything in the DT spec about the entries having to be
> sorted, and it's not exactly impossible to sort a list if you need it so
> (and if I'm being really pedantic, one could still trigger this with a
> list that *is* sorted, only by different criteria).
>

We will have to sort it the way we want then; I can make the code sort the
list. Thanks for the suggestion.

> Robin.
>
>> +                             return;
>> +                     }
>> +                     if (tmp_dma_addr != dma_addr) {
>> +                             lo = iova_pfn(iovad, tmp_dma_addr);
>> +                             hi = iova_pfn(iovad, dma_addr - 1);
>> +                             reserve_iova(iovad, lo, hi);
>> +                     }
>> +                     tmp_dma_addr = window->res->end - window->offset;
>> +             }
>> +             /*
>> +              * the last dma-range should honour based on the
>> +              * 32/64-bit dma addresses.
>> +              */
>> +             if (tmp_dma_addr < DMA_BIT_MASK(sizeof(dma_addr_t) * 8)) {
>> +                     lo = iova_pfn(iovad, tmp_dma_addr);
>> +                     hi = iova_pfn(iovad,
>> +                                   DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1);
>> +                     reserve_iova(iovad, lo, hi);
>> +             }
>> +     }
>>  }
>>
>>  /**
>>
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters
@ 2017-05-04 19:12     ` Oza Oza via iommu
  0 siblings, 0 replies; 61+ messages in thread
From: Oza Oza @ 2017-05-04 19:12 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Joerg Roedel, Linux IOMMU, linux-pci, linux-kernel,
	linux-arm-kernel, devicetree, BCM Kernel Feedback, Oza Pawandeep

On Thu, May 4, 2017 at 11:32 PM, Robin Murphy <robin.murphy@arm.com> wrote:
> [apologies for the silence - I've been on holiday]
>
> On 03/05/17 05:46, Oza Pawandeep wrote:
>> current device framework and of framework integration assumes
>> dma-ranges in a way where memory-mapped devices define their
>> dma-ranges. (child-bus-address, parent-bus-address, length).
>
> Well, yes, that is simply the definition of dma-ranges, and remains true
> regardless of the particular format of either bus address.
>
>> of_dma_configure is specifically written to take care of memory
>> mapped devices. but no implementation exists for pci to take
>> care of pcie based memory ranges.
>
> That still doesn't make sense. To repeat myself again, PCI devices *ARE*
> memory-mapped devices. Yes, there do exist some platforms where I/O
> space is not treated as MMIO, but config space and memory space are very
> much memory-mapped however you look at them, and in the context of DMA,
> only memory space is relevant anyway.
>
> What *is* true about the current code is that of_dma_get_range() expects
> to be passed an OF node representing the device itself, and doesn't work
> properly when passed the node of the device's parent bus directly, which
> happens to be what pci_dma_configure() currently does. That's the only
> reason why it doesn't work for (single-entry) host controller dma-ranges
> today. This does not mean it's a PCI problem, it is simply the case that
> pci_dma_configure() is the only caller currently hitting it. Other
> discoverable, DMA-capable buses like fsl-mc are still going to face the
> exact same problem with or without this patch.
>

The new v2 hooks callbacks for the default and PCI buses, so the
implementation does not really look cluttered any more. For fsl-mc buses, we
could choose to implement it in the default bus callbacks. I will post the
patch-set soon; a purely illustrative sketch of the idea is below.

Also, with this patch-set we really do not need to prepare an emulated child
node.
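
Purely as an illustration of the per-bus-callback idea (the names below,
struct dma_ranges_ops and the two ops tables, are invented here and do not
necessarily match what the actual v2 patches do):

#include <linux/of.h>
#include <linux/of_pci.h>

/* One callback table per bus type; the dma-configure path would pick the
 * table for its bus instead of open-coding PCI handling everywhere. */
struct dma_ranges_ops {
	int (*get_dma_ranges)(struct device_node *np,
			      struct list_head *resources);
};

static const struct dma_ranges_ops pci_dma_ranges_ops = {
	.get_dma_ranges	= of_pci_get_dma_ranges,
};

static const struct dma_ranges_ops default_dma_ranges_ops = {
	.get_dma_ranges	= NULL,	/* fall back to plain "dma-ranges" parsing */
};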


>> for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
>> world dma-ranges.
>> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
>>
>> this patch serves following:
>>
>> 1) exposes interface to the pci host driver for their
>> inbound memory ranges
>>
>> 2) provide an interface to callers such as of_dma_get_ranges.
>> so then the returned size get best possible (largest) dma_mask.
>> because PCI RC drivers do not call APIs such as
>> dma_set_coherent_mask() and hence rather it shows its addressing
>> capabilities based on dma-ranges.
>> for e.g.
>> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
>> we should get dev->coherent_dma_mask=0x7fffffffff.
>>
>> 3) this patch handles multiple inbound windows and dma-ranges.
>> it is left to the caller, how it wants to use them.
>> the new function returns the resources in a standard and unform way
>>
>> 4) this way the callers of for e.g. of_dma_get_ranges
>> does not need to change.
>>
>> 5) leaves scope of adding PCI flag handling for inbound memory
>> by the new function.
>
> Which flags would ever actually matter? DMA windows aren't going to be
> to config or I/O space, so the memory type can be assumed, and the
> 32/64-bit distinction is irrelevant as it's not a relocatable BAR;
> DMA-able system memory isn't going to be read-sensitive, so the
> prefetchable flag shouldn't matter; and not being a BAR none of the
> others would be relevant either.
>
>>
>> Bug: SOC-5216
>> Change-Id: Ie045386df91e1e0587846bb147ae40d96f6d7d2e
>> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
>> Reviewed-on: http://gerrit-ccxsw.broadcom.net/40428
>> Reviewed-by: vpx_checkpatch status <vpx_checkpatch@broadcom.com>
>> Reviewed-by: CCXSW <ccxswbuild@broadcom.com>
>> Reviewed-by: Ray Jui <ray.jui@broadcom.com>
>> Tested-by: vpx_autobuild status <vpx_autobuild@broadcom.com>
>> Tested-by: vpx_smoketest status <vpx_smoketest@broadcom.com>
>> Tested-by: CCXSW <ccxswbuild@broadcom.com>
>> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>>
>> diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
>> index 0ee42c3..ed6e69a 100644
>> --- a/drivers/of/of_pci.c
>> +++ b/drivers/of/of_pci.c
>> @@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node *dev,
>>       return err;
>>  }
>>  EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
>> +
>> +/**
>> + * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
>> + * @np: device node of the host bridge having the dma-ranges property
>> + * @resources: list where the range of resources will be added after DT parsing
>> + *
>> + * It is the caller's job to free the @resources list.
>> + *
>> + * This function will parse the "dma-ranges" property of a
>> + * PCI host bridge device node and setup the resource mapping based
>> + * on its content.
>> + *
>> + * It returns zero if the range parsing has been successful or a standard error
>> + * value if it failed.
>> + */
>> +
>> +int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
>> +{
>> +     struct device_node *node = of_node_get(np);
>> +     int rlen;
>> +     int ret = 0;
>> +     const int na = 3, ns = 2;
>> +     struct resource *res;
>> +     struct of_pci_range_parser parser;
>> +     struct of_pci_range range;
>> +
>> +     if (!node)
>> +             return -EINVAL;
>> +
>> +     parser.node = node;
>> +     parser.pna = of_n_addr_cells(node);
>> +     parser.np = parser.pna + na + ns;
>> +
>> +     parser.range = of_get_property(node, "dma-ranges", &rlen);
>> +
>> +     if (!parser.range) {
>> +             pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
>> +                       np->full_name);
>> +             ret = -EINVAL;
>> +             goto out;
>> +     }
>> +
>> +     parser.end = parser.range + rlen / sizeof(__be32);
>> +
>> +     for_each_of_pci_range(&parser, &range) {
>
> This is plain wrong - of_pci_range_parser_one() will translate upwards
> through parent "ranges" properties, which is completely backwards for
> DMA addresses.
>
> Robin.
>
>> +             /*
>> +              * If we failed translation or got a zero-sized region
>> +              * then skip this range
>> +              */
>> +             if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
>> +                     continue;
>> +
>> +             res = kzalloc(sizeof(struct resource), GFP_KERNEL);
>> +             if (!res) {
>> +                     ret = -ENOMEM;
>> +                     goto parse_failed;
>> +             }
>> +
>> +             ret = of_pci_range_to_resource(&range, np, res);
>> +             if (ret) {
>> +                     kfree(res);
>> +                     continue;
>> +             }
>> +
>> +             pci_add_resource_offset(resources, res,
>> +                                     res->start - range.pci_addr);
>> +     }
>> +
>> +     return ret;
>> +
>> +parse_failed:
>> +     pci_free_resource_list(resources);
>> +out:
>> +     of_node_put(node);
>> +     return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
>>  #endif /* CONFIG_OF_ADDRESS */
>>
>>  #ifdef CONFIG_PCI_MSI
>> diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
>> index 0e0974e..617b90d 100644
>> --- a/include/linux/of_pci.h
>> +++ b/include/linux/of_pci.h
>> @@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
>>  int of_pci_get_host_bridge_resources(struct device_node *dev,
>>                       unsigned char busno, unsigned char bus_max,
>>                       struct list_head *resources, resource_size_t *io_base);
>> +int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
>>  #else
>>  static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
>>                       unsigned char busno, unsigned char bus_max,
>> @@ -83,6 +84,12 @@ static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
>>  {
>>       return -EINVAL;
>>  }
>> +
>> +static inline int of_pci_get_dma_ranges(struct device_node *np,
>> +                                     struct list_head *resources)
>> +{
>> +     return -EINVAL;
>> +}
>>  #endif
>>
>>  #if defined(CONFIG_OF) && defined(CONFIG_PCI_MSI)
>>
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters
@ 2017-05-04 19:12     ` Oza Oza via iommu
  0 siblings, 0 replies; 61+ messages in thread
From: Oza Oza via iommu @ 2017-05-04 19:12 UTC (permalink / raw)
  To: Robin Murphy
  Cc: devicetree-u79uwXL29TY76Z2rM5mHXA, Oza Pawandeep,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Linux IOMMU,
	BCM Kernel Feedback,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Thu, May 4, 2017 at 11:32 PM, Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org> wrote:
> [apologies for the silence - I've been on holiday]
>
> On 03/05/17 05:46, Oza Pawandeep wrote:
>> current device framework and of framework integration assumes
>> dma-ranges in a way where memory-mapped devices define their
>> dma-ranges. (child-bus-address, parent-bus-address, length).
>
> Well, yes, that is simply the definition of dma-ranges, and remains true
> regardless of the particular format of either bus address.
>
>> of_dma_configure is specifically written to take care of memory
>> mapped devices. but no implementation exists for pci to take
>> care of pcie based memory ranges.
>
> That still doesn't make sense. To repeat myself again, PCI devices *ARE*
> memory-mapped devices. Yes, there do exist some platforms where I/O
> space is not treated as MMIO, but config space and memory space are very
> much memory-mapped however you look at them, and in the context of DMA,
> only memory space is relevant anyway.
>
> What *is* true about the current code is that of_dma_get_range() expects
> to be passed an OF node representing the device itself, and doesn't work
> properly when passed the node of the device's parent bus directly, which
> happens to be what pci_dma_configure() currently does. That's the only
> reason why it doesn't work for (single-entry) host controller dma-ranges
> today. This does not mean it's a PCI problem, it is simply the case that
> pci_dma_configure() is the only caller currently hitting it. Other
> discoverable, DMA-capable buses like fsl-mc are still going to face the
> exact same problem with or without this patch.
>

The new v2 hooks callbacks for the default and PCI buses, so the
implementation does not really look cluttered any more. For fsl-mc buses, we
could choose to implement it in the default bus callbacks. I will post the
patch-set soon.

Also, with this patch-set we really do not need to prepare an emulated child
node.


>> for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
>> world dma-ranges.
>> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
>>
>> this patch serves following:
>>
>> 1) exposes interface to the pci host driver for their
>> inbound memory ranges
>>
>> 2) provide an interface to callers such as of_dma_get_ranges.
>> so then the returned size get best possible (largest) dma_mask.
>> because PCI RC drivers do not call APIs such as
>> dma_set_coherent_mask() and hence rather it shows its addressing
>> capabilities based on dma-ranges.
>> for e.g.
>> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
>> we should get dev->coherent_dma_mask=0x7fffffffff.
>>
>> 3) this patch handles multiple inbound windows and dma-ranges.
>> it is left to the caller, how it wants to use them.
>> the new function returns the resources in a standard and unform way
>>
>> 4) this way the callers of for e.g. of_dma_get_ranges
>> does not need to change.
>>
>> 5) leaves scope of adding PCI flag handling for inbound memory
>> by the new function.
>
> Which flags would ever actually matter? DMA windows aren't going to be
> to config or I/O space, so the memory type can be assumed, and the
> 32/64-bit distinction is irrelevant as it's not a relocatable BAR;
> DMA-able system memory isn't going to be read-sensitive, so the
> prefetchable flag shouldn't matter; and not being a BAR none of the
> others would be relevant either.
>
>>
>> Bug: SOC-5216
>> Change-Id: Ie045386df91e1e0587846bb147ae40d96f6d7d2e
>> Signed-off-by: Oza Pawandeep <oza.oza-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>> Reviewed-on: http://gerrit-ccxsw.broadcom.net/40428
>> Reviewed-by: vpx_checkpatch status <vpx_checkpatch-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>> Reviewed-by: CCXSW <ccxswbuild-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>> Reviewed-by: Ray Jui <ray.jui-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>> Tested-by: vpx_autobuild status <vpx_autobuild-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>> Tested-by: vpx_smoketest status <vpx_smoketest-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>> Tested-by: CCXSW <ccxswbuild-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>> Reviewed-by: Scott Branden <scott.branden-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>>
>> diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
>> index 0ee42c3..ed6e69a 100644
>> --- a/drivers/of/of_pci.c
>> +++ b/drivers/of/of_pci.c
>> @@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node *dev,
>>       return err;
>>  }
>>  EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
>> +
>> +/**
>> + * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
>> + * @np: device node of the host bridge having the dma-ranges property
>> + * @resources: list where the range of resources will be added after DT parsing
>> + *
>> + * It is the caller's job to free the @resources list.
>> + *
>> + * This function will parse the "dma-ranges" property of a
>> + * PCI host bridge device node and setup the resource mapping based
>> + * on its content.
>> + *
>> + * It returns zero if the range parsing has been successful or a standard error
>> + * value if it failed.
>> + */
>> +
>> +int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
>> +{
>> +     struct device_node *node = of_node_get(np);
>> +     int rlen;
>> +     int ret = 0;
>> +     const int na = 3, ns = 2;
>> +     struct resource *res;
>> +     struct of_pci_range_parser parser;
>> +     struct of_pci_range range;
>> +
>> +     if (!node)
>> +             return -EINVAL;
>> +
>> +     parser.node = node;
>> +     parser.pna = of_n_addr_cells(node);
>> +     parser.np = parser.pna + na + ns;
>> +
>> +     parser.range = of_get_property(node, "dma-ranges", &rlen);
>> +
>> +     if (!parser.range) {
>> +             pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
>> +                       np->full_name);
>> +             ret = -EINVAL;
>> +             goto out;
>> +     }
>> +
>> +     parser.end = parser.range + rlen / sizeof(__be32);
>> +
>> +     for_each_of_pci_range(&parser, &range) {
>
> This is plain wrong - of_pci_range_parser_one() will translate upwards
> through parent "ranges" properties, which is completely backwards for
> DMA addresses.
>
> Robin.
>
>> +             /*
>> +              * If we failed translation or got a zero-sized region
>> +              * then skip this range
>> +              */
>> +             if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
>> +                     continue;
>> +
>> +             res = kzalloc(sizeof(struct resource), GFP_KERNEL);
>> +             if (!res) {
>> +                     ret = -ENOMEM;
>> +                     goto parse_failed;
>> +             }
>> +
>> +             ret = of_pci_range_to_resource(&range, np, res);
>> +             if (ret) {
>> +                     kfree(res);
>> +                     continue;
>> +             }
>> +
>> +             pci_add_resource_offset(resources, res,
>> +                                     res->start - range.pci_addr);
>> +     }
>> +
>> +     return ret;
>> +
>> +parse_failed:
>> +     pci_free_resource_list(resources);
>> +out:
>> +     of_node_put(node);
>> +     return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
>>  #endif /* CONFIG_OF_ADDRESS */
>>
>>  #ifdef CONFIG_PCI_MSI
>> diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
>> index 0e0974e..617b90d 100644
>> --- a/include/linux/of_pci.h
>> +++ b/include/linux/of_pci.h
>> @@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
>>  int of_pci_get_host_bridge_resources(struct device_node *dev,
>>                       unsigned char busno, unsigned char bus_max,
>>                       struct list_head *resources, resource_size_t *io_base);
>> +int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
>>  #else
>>  static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
>>                       unsigned char busno, unsigned char bus_max,
>> @@ -83,6 +84,12 @@ static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
>>  {
>>       return -EINVAL;
>>  }
>> +
>> +static inline int of_pci_get_dma_ranges(struct device_node *np,
>> +                                     struct list_head *resources)
>> +{
>> +     return -EINVAL;
>> +}
>>  #endif
>>
>>  #if defined(CONFIG_OF) && defined(CONFIG_PCI_MSI)
>>
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters
@ 2017-05-04 19:12     ` Oza Oza via iommu
  0 siblings, 0 replies; 61+ messages in thread
From: Oza Oza @ 2017-05-04 19:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 4, 2017 at 11:32 PM, Robin Murphy <robin.murphy@arm.com> wrote:
> [apologies for the silence - I've been on holiday]
>
> On 03/05/17 05:46, Oza Pawandeep wrote:
>> current device framework and of framework integration assumes
>> dma-ranges in a way where memory-mapped devices define their
>> dma-ranges. (child-bus-address, parent-bus-address, length).
>
> Well, yes, that is simply the definition of dma-ranges, and remains true
> regardless of the particular format of either bus address.
>
>> of_dma_configure is specifically written to take care of memory
>> mapped devices. but no implementation exists for pci to take
>> care of pcie based memory ranges.
>
> That still doesn't make sense. To repeat myself again, PCI devices *ARE*
> memory-mapped devices. Yes, there do exist some platforms where I/O
> space is not treated as MMIO, but config space and memory space are very
> much memory-mapped however you look at them, and in the context of DMA,
> only memory space is relevant anyway.
>
> What *is* true about the current code is that of_dma_get_range() expects
> to be passed an OF node representing the device itself, and doesn't work
> properly when passed the node of the device's parent bus directly, which
> happens to be what pci_dma_configure() currently does. That's the only
> reason why it doesn't work for (single-entry) host controller dma-ranges
> today. This does not mean it's a PCI problem, it is simply the case that
> pci_dma_configure() is the only caller currently hitting it. Other
> discoverable, DMA-capable buses like fsl-mc are still going to face the
> exact same problem with or without this patch.
>

The new v2 hooks callbacks for the default and PCI buses, so the
implementation does not really look cluttered any more. For fsl-mc buses, we
could choose to implement it in the default bus callbacks. I will post the
patch-set soon.

Also, with this patch-set we really do not need to prepare an emulated child
node.


>> for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
>> world dma-ranges.
>> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
>>
>> this patch serves following:
>>
>> 1) exposes interface to the pci host driver for their
>> inbound memory ranges
>>
>> 2) provide an interface to callers such as of_dma_get_ranges.
>> so then the returned size get best possible (largest) dma_mask.
>> because PCI RC drivers do not call APIs such as
>> dma_set_coherent_mask() and hence rather it shows its addressing
>> capabilities based on dma-ranges.
>> for e.g.
>> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
>> we should get dev->coherent_dma_mask=0x7fffffffff.
>>
>> 3) this patch handles multiple inbound windows and dma-ranges.
>> it is left to the caller, how it wants to use them.
>> the new function returns the resources in a standard and unform way
>>
>> 4) this way the callers of for e.g. of_dma_get_ranges
>> does not need to change.
>>
>> 5) leaves scope of adding PCI flag handling for inbound memory
>> by the new function.
>
> Which flags would ever actually matter? DMA windows aren't going to be
> to config or I/O space, so the memory type can be assumed, and the
> 32/64-bit distinction is irrelevant as it's not a relocatable BAR;
> DMA-able system memory isn't going to be read-sensitive, so the
> prefetchable flag shouldn't matter; and not being a BAR none of the
> others would be relevant either.
>
>>
>> Bug: SOC-5216
>> Change-Id: Ie045386df91e1e0587846bb147ae40d96f6d7d2e
>> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
>> Reviewed-on: http://gerrit-ccxsw.broadcom.net/40428
>> Reviewed-by: vpx_checkpatch status <vpx_checkpatch@broadcom.com>
>> Reviewed-by: CCXSW <ccxswbuild@broadcom.com>
>> Reviewed-by: Ray Jui <ray.jui@broadcom.com>
>> Tested-by: vpx_autobuild status <vpx_autobuild@broadcom.com>
>> Tested-by: vpx_smoketest status <vpx_smoketest@broadcom.com>
>> Tested-by: CCXSW <ccxswbuild@broadcom.com>
>> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>>
>> diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
>> index 0ee42c3..ed6e69a 100644
>> --- a/drivers/of/of_pci.c
>> +++ b/drivers/of/of_pci.c
>> @@ -283,6 +283,83 @@ int of_pci_get_host_bridge_resources(struct device_node *dev,
>>       return err;
>>  }
>>  EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
>> +
>> +/**
>> + * of_pci_get_dma_ranges - Parse PCI host bridge inbound resources from DT
>> + * @np: device node of the host bridge having the dma-ranges property
>> + * @resources: list where the range of resources will be added after DT parsing
>> + *
>> + * It is the caller's job to free the @resources list.
>> + *
>> + * This function will parse the "dma-ranges" property of a
>> + * PCI host bridge device node and setup the resource mapping based
>> + * on its content.
>> + *
>> + * It returns zero if the range parsing has been successful or a standard error
>> + * value if it failed.
>> + */
>> +
>> +int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
>> +{
>> +     struct device_node *node = of_node_get(np);
>> +     int rlen;
>> +     int ret = 0;
>> +     const int na = 3, ns = 2;
>> +     struct resource *res;
>> +     struct of_pci_range_parser parser;
>> +     struct of_pci_range range;
>> +
>> +     if (!node)
>> +             return -EINVAL;
>> +
>> +     parser.node = node;
>> +     parser.pna = of_n_addr_cells(node);
>> +     parser.np = parser.pna + na + ns;
>> +
>> +     parser.range = of_get_property(node, "dma-ranges", &rlen);
>> +
>> +     if (!parser.range) {
>> +             pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
>> +                       np->full_name);
>> +             ret = -EINVAL;
>> +             goto out;
>> +     }
>> +
>> +     parser.end = parser.range + rlen / sizeof(__be32);
>> +
>> +     for_each_of_pci_range(&parser, &range) {
>
> This is plain wrong - of_pci_range_parser_one() will translate upwards
> through parent "ranges" properties, which is completely backwards for
> DMA addresses.
>
> Robin.
>
>> +             /*
>> +              * If we failed translation or got a zero-sized region
>> +              * then skip this range
>> +              */
>> +             if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
>> +                     continue;
>> +
>> +             res = kzalloc(sizeof(struct resource), GFP_KERNEL);
>> +             if (!res) {
>> +                     ret = -ENOMEM;
>> +                     goto parse_failed;
>> +             }
>> +
>> +             ret = of_pci_range_to_resource(&range, np, res);
>> +             if (ret) {
>> +                     kfree(res);
>> +                     continue;
>> +             }
>> +
>> +             pci_add_resource_offset(resources, res,
>> +                                     res->start - range.pci_addr);
>> +     }
>> +
>> +     return ret;
>> +
>> +parse_failed:
>> +     pci_free_resource_list(resources);
>> +out:
>> +     of_node_put(node);
>> +     return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
>>  #endif /* CONFIG_OF_ADDRESS */
>>
>>  #ifdef CONFIG_PCI_MSI
>> diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
>> index 0e0974e..617b90d 100644
>> --- a/include/linux/of_pci.h
>> +++ b/include/linux/of_pci.h
>> @@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
>>  int of_pci_get_host_bridge_resources(struct device_node *dev,
>>                       unsigned char busno, unsigned char bus_max,
>>                       struct list_head *resources, resource_size_t *io_base);
>> +int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources);
>>  #else
>>  static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
>>                       unsigned char busno, unsigned char bus_max,
>> @@ -83,6 +84,12 @@ static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
>>  {
>>       return -EINVAL;
>>  }
>> +
>> +static inline int of_pci_get_dma_ranges(struct device_node *np,
>> +                                     struct list_head *resources)
>> +{
>> +     return -EINVAL;
>> +}
>>  #endif
>>
>>  #if defined(CONFIG_OF) && defined(CONFIG_PCI_MSI)
>>
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2/3] iommu/pci: reserve iova for PCI masters
@ 2017-05-05  8:10       ` Oza Oza via iommu
  0 siblings, 0 replies; 61+ messages in thread
From: Oza Oza @ 2017-05-05  8:10 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Joerg Roedel, Linux IOMMU, linux-pci, linux-kernel,
	linux-arm-kernel, devicetree, BCM Kernel Feedback, Oza Pawandeep

On Thu, May 4, 2017 at 11:50 PM, Robin Murphy <robin.murphy@arm.com> wrote:
> On 03/05/17 05:46, Oza Pawandeep wrote:
>> this patch reserves the iova for PCI masters.
>> ARM64 based SOCs may have scattered memory banks.
>> such as iproc based SOC has
>>
>> <0x00000000 0x80000000 0x0 0x80000000>, /* 2G @ 2G */
>> <0x00000008 0x80000000 0x3 0x80000000>, /* 14G @ 34G */
>> <0x00000090 0x00000000 0x4 0x00000000>, /* 16G @ 576G */
>> <0x000000a0 0x00000000 0x4 0x00000000>; /* 16G @ 640G */
>>
>> but incoming PCI transcation addressing capability is limited
>> by host bridge, for example if max incoming window capability
>> is 512 GB, then 0x00000090 and 0x000000a0 will fall beyond it.
>>
>> to address this problem, iommu has to avoid allocating iovas which
>> are reserved. which inturn does not allocate iova if it falls into hole.
>
> I don't necessarily disagree with doing this, as we could do with facing
> up to the issue of discontiguous DMA ranges in particular (I too have a
> platform with this problem), but I'm still not overly keen on pulling DT
> specifics into this layer. More than that, though, if we are going to do
> it, then we should do it for all devices with a restrictive
> "dma-ranges", not just PCI ones.
>

pci_create_root_bus allocates the host bridge, and currently it takes only
outbound resources.

If inbound memory were also passed as part of the pci_create_root_bus
parameters, then IOVA allocation could directly make use of an
inbound_windows member of struct pci_host_bridge:

struct pci_host_bridge {
        struct device dev;
        struct pci_bus *bus; /* root bus */
        struct list_head windows; /* resource_entry */
        struct list_head inbound_windows; /* resource_entry */
        .
        .
}

Then iova_reserve_pci_windows could simply iterate with
resource_list_for_each_entry(window, &bridge->inbound_windows),
which would remove the dependency of dma-iommu.c on the OF layer.
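
A minimal sketch of that iteration, assuming pci_host_bridge really did grow
the proposed inbound_windows list (a hypothetical member, mirroring how the
outbound windows are handled today):

/* Sketch only: bridge->inbound_windows does not exist yet, and the list is
 * assumed to be sorted by bus (DMA) address. */
static void iova_reserve_pci_inbound_windows(struct pci_host_bridge *bridge,
					     struct iova_domain *iovad)
{
	struct resource_entry *entry;
	dma_addr_t prev_end = 0;
	unsigned long lo, hi;

	/* Reserve the gaps between inbound windows, so IOVAs are only
	 * allocated from ranges the host bridge can actually accept. */
	resource_list_for_each_entry(entry, &bridge->inbound_windows) {
		dma_addr_t start = entry->res->start - entry->offset;

		if (start > prev_end) {
			lo = iova_pfn(iovad, prev_end);
			hi = iova_pfn(iovad, start - 1);
			reserve_iova(iovad, lo, hi);
		}
		prev_end = entry->res->end - entry->offset + 1;
	}
}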

The only catch is that pci_create_root_bus is called by a handful of RC
drivers, which would all need to change. Ideally both inbound and outbound
resources should belong to pci_host_bridge anyway, and today the inbound
side is completely missing.

Let me know your thoughts on this, Robin.

>> Bug: SOC-5216
>> Change-Id: Icbfc99a045d730be143fef427098c937b9d46353
>> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
>> Reviewed-on: http://gerrit-ccxsw.broadcom.net/40760
>> Reviewed-by: vpx_checkpatch status <vpx_checkpatch@broadcom.com>
>> Reviewed-by: CCXSW <ccxswbuild@broadcom.com>
>> Tested-by: vpx_autobuild status <vpx_autobuild@broadcom.com>
>> Tested-by: vpx_smoketest status <vpx_smoketest@broadcom.com>
>> Tested-by: CCXSW <ccxswbuild@broadcom.com>
>> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index 48d36ce..08764b0 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -27,6 +27,7 @@
>>  #include <linux/iova.h>
>>  #include <linux/irq.h>
>>  #include <linux/mm.h>
>> +#include <linux/of_pci.h>
>>  #include <linux/pci.h>
>>  #include <linux/scatterlist.h>
>>  #include <linux/vmalloc.h>
>> @@ -171,8 +172,12 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
>>               struct iova_domain *iovad)
>>  {
>>       struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
>> +     struct device_node *np = bridge->dev.parent->of_node;
>>       struct resource_entry *window;
>>       unsigned long lo, hi;
>> +     int ret;
>> +     dma_addr_t tmp_dma_addr = 0, dma_addr;
>> +     LIST_HEAD(res);
>>
>>       resource_list_for_each_entry(window, &bridge->windows) {
>>               if (resource_type(window->res) != IORESOURCE_MEM &&
>> @@ -183,6 +188,36 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
>>               hi = iova_pfn(iovad, window->res->end - window->offset);
>>               reserve_iova(iovad, lo, hi);
>>       }
>> +
>> +     /* PCI inbound memory reservation. */
>> +     ret = of_pci_get_dma_ranges(np, &res);
>> +     if (!ret) {
>> +             resource_list_for_each_entry(window, &res) {
>> +                     struct resource *res_dma = window->res;
>> +
>> +                     dma_addr = res_dma->start - window->offset;
>> +                     if (tmp_dma_addr > dma_addr) {
>> +                             pr_warn("PCI: failed to reserve iovas; ranges should be sorted\n");
>
> I don't see anything in the DT spec about the entries having to be
> sorted, and it's not exactly impossible to sort a list if you need it so
> (and if I'm being really pedantic, one could still trigger this with a
> list that *is* sorted, only by different criteria).
>
> Robin.
>
>> +                             return;
>> +                     }
>> +                     if (tmp_dma_addr != dma_addr) {
>> +                             lo = iova_pfn(iovad, tmp_dma_addr);
>> +                             hi = iova_pfn(iovad, dma_addr - 1);
>> +                             reserve_iova(iovad, lo, hi);
>> +                     }
>> +                     tmp_dma_addr = window->res->end - window->offset;
>> +             }
>> +             /*
>> +              * the last dma-range should honour based on the
>> +              * 32/64-bit dma addresses.
>> +              */
>> +             if (tmp_dma_addr < DMA_BIT_MASK(sizeof(dma_addr_t) * 8)) {
>> +                     lo = iova_pfn(iovad, tmp_dma_addr);
>> +                     hi = iova_pfn(iovad,
>> +                                   DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1);
>> +                     reserve_iova(iovad, lo, hi);
>> +             }
>> +     }
>>  }
>>
>>  /**
>>
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters
@ 2017-05-05 15:25       ` Robin Murphy
  0 siblings, 0 replies; 61+ messages in thread
From: Robin Murphy @ 2017-05-05 15:25 UTC (permalink / raw)
  To: Oza Oza
  Cc: Joerg Roedel, Linux IOMMU, linux-pci, linux-kernel,
	linux-arm-kernel, devicetree, BCM Kernel Feedback, Oza Pawandeep

On 04/05/17 19:41, Oza Oza wrote:
[...]
>>> 5) leaves scope of adding PCI flag handling for inbound memory
>>> by the new function.
>>
>> Which flags would ever actually matter? DMA windows aren't going to be
>> to config or I/O space, so the memory type can be assumed, and the
>> 32/64-bit distinction is irrelevant as it's not a relocatable BAR;
>> DMA-able system memory isn't going to be read-sensitive, so the
>> prefetchable flag shouldn't matter; and not being a BAR none of the
>> others would be relevant either.
>>
> 
> Thanks Robin; for your reply and attention:
> 
> agree with you, at present it would not matter,
> but it does not mean that we do not scope it to make it matter in future.
> 
> now where it could matter:
> there is Relaxed Ordering for inbound memory for PCI.
> According to standard, Relaxed Ordering (RO) bit can be set only for
> Memory requests and completions (if present in the original request).
> Also, according to transaction ordering rules, I/O and configuration
> requests can still be re-ordered ahead of each other.
> and we would like to make use of it.
> for e.g. lets say we mark memory as Relaxed Ordered with flag.
> the special about this memory is incoming PCI transactions can be
> reordered and rest memory has to be strongly ordered.

Please look at "PCI Bus Binding to: IEEE Std 1275-1994 Standard for Boot
(Initialization Configuration) Firmware" (as referenced in DTSpec) and
explain how PCIe Relaxed Order has anything to do with the DT binding.

> how it our SOC would make use of this is out of scope for the
> discussion at this point of time, but I am just bringing in the
> idea/point how flags could be useful
> for inbound memory, since we would not like throw-away flags completely.

The premise for implementing a PCI-specific parser is that you claim we
need to do something with the phys.hi cell of a DT PCI address, rather
than just taking the numerical part out of the phys.mid and phys.lo
cells. Please make that argument in reference to the flags which that
upper cell actually encodes, not unrelated things.
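
For reference, those flags are the ones the existing of_bus_pci_get_flags()
in drivers/of/address.c already decodes from the top byte of phys.hi (the
npt000ss bits); roughly:

static unsigned int of_bus_pci_get_flags(const __be32 *addr)
{
	unsigned int flags = 0;
	u32 w = be32_to_cpup(addr);

	switch ((w >> 24) & 0x03) {
	case 0x01:
		flags |= IORESOURCE_IO;
		break;
	case 0x02: /* 32 bits */
	case 0x03: /* 64 bits */
		flags |= IORESOURCE_MEM;
		break;
	}
	if (w & 0x40000000)
		flags |= IORESOURCE_PREFETCH;
	return flags;
}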

[...]
>>> +int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
>>> +{
>>> +     struct device_node *node = of_node_get(np);
>>> +     int rlen;
>>> +     int ret = 0;
>>> +     const int na = 3, ns = 2;
>>> +     struct resource *res;
>>> +     struct of_pci_range_parser parser;
>>> +     struct of_pci_range range;
>>> +
>>> +     if (!node)
>>> +             return -EINVAL;
>>> +
>>> +     parser.node = node;
>>> +     parser.pna = of_n_addr_cells(node);
>>> +     parser.np = parser.pna + na + ns;
>>> +
>>> +     parser.range = of_get_property(node, "dma-ranges", &rlen);
>>> +
>>> +     if (!parser.range) {
>>> +             pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
>>> +                       np->full_name);
>>> +             ret = -EINVAL;
>>> +             goto out;
>>> +     }
>>> +
>>> +     parser.end = parser.range + rlen / sizeof(__be32);
>>> +
>>> +     for_each_of_pci_range(&parser, &range) {
>>
>> This is plain wrong - of_pci_range_parser_one() will translate upwards
>> through parent "ranges" properties, which is completely backwards for
>> DMA addresses.
>>
>> Robin.
>>
> 
> No it does not, this patch is thoroughly tested on our SOC and it works.
> of_pci_range_parser_one does not translate upwards through parent. it
> just sticks to given PCI parser.

Frankly, I'm losing patience with this attitude. Please look at the code
you call:

#define for_each_of_pci_range(parser, range) \
	for (; of_pci_range_parser_one(parser, range);)


struct of_pci_range *of_pci_range_parser_one(struct of_pci_range_parser
*parser,
						struct of_pci_range *range)
{
	const int na = 3, ns = 2;

	if (!range)
		return NULL;

	if (!parser->range || parser->range + parser->np > parser->end)
		return NULL;

	range->pci_space = parser->range[0];
	range->flags = of_bus_pci_get_flags(parser->range);
	range->pci_addr = of_read_number(parser->range + 1, ns);
	range->cpu_addr = of_translate_address(parser->node,
				parser->range + na);
...


u64 of_translate_address(struct device_node *dev, const __be32 *in_addr)
{
	return __of_translate_address(dev, in_addr, "ranges");
}
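
For contrast, the kernel's existing "dma-ranges" counterpart (also in
drivers/of/address.c) does the same upward walk, but over the property that
actually applies to DMA addresses:

u64 of_translate_dma_address(struct device_node *dev, const __be32 *in_addr)
{
	return __of_translate_address(dev, in_addr, "dma-ranges");
}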


I don't doubt that you still manage to get the right result on *your*
SoC, because you probably have neither further "ranges" nor "dma-ranges"
translations above your host controller node anyway. That does not
change the fact that the proposed code is still obviously wrong for more
complex DT topologies that do.

We're doing upstream work in core code here: I don't particularly care
about making your SoC work; I don't really care about making Juno work
properly either; what I do care about is that code to parse dma-ranges
actually parses dma-ranges *correctly* for all possible valid uses of
dma-ranges, which means fixing the existing bugs and not introducing
more. The principal side-effect of that is that *all* systems with valid
DTs will then work correctly.

Robin.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2/3] iommu/pci: reserve iova for PCI masters
@ 2017-05-05 15:51         ` Robin Murphy
  0 siblings, 0 replies; 61+ messages in thread
From: Robin Murphy @ 2017-05-05 15:51 UTC (permalink / raw)
  To: Oza Oza
  Cc: Joerg Roedel, Linux IOMMU, linux-pci, linux-kernel,
	linux-arm-kernel, devicetree, BCM Kernel Feedback, Oza Pawandeep

On 04/05/17 19:52, Oza Oza wrote:
> On Thu, May 4, 2017 at 11:50 PM, Robin Murphy <robin.murphy@arm.com> wrote:
>> On 03/05/17 05:46, Oza Pawandeep wrote:
>>> this patch reserves the iova for PCI masters.
>>> ARM64 based SOCs may have scattered memory banks.
>>> such as iproc based SOC has
>>>
>>> <0x00000000 0x80000000 0x0 0x80000000>, /* 2G @ 2G */
>>> <0x00000008 0x80000000 0x3 0x80000000>, /* 14G @ 34G */
>>> <0x00000090 0x00000000 0x4 0x00000000>, /* 16G @ 576G */
>>> <0x000000a0 0x00000000 0x4 0x00000000>; /* 16G @ 640G */
>>>
>>> but incoming PCI transcation addressing capability is limited
>>> by host bridge, for example if max incoming window capability
>>> is 512 GB, then 0x00000090 and 0x000000a0 will fall beyond it.
>>>
>>> to address this problem, iommu has to avoid allocating iovas which
>>> are reserved. which inturn does not allocate iova if it falls into hole.
>>
>> I don't necessarily disagree with doing this, as we could do with facing
>> up to the issue of discontiguous DMA ranges in particular (I too have a
>> platform with this problem), but I'm still not overly keen on pulling DT
>> specifics into this layer. More than that, though, if we are going to do
>> it, then we should do it for all devices with a restrictive
>> "dma-ranges", not just PCI ones.
>>
> 
> How do you propose to do it ?
> 
> my thinking is this:
> iova_reserve_pci_windows is written specific for PCI, and I am adding there.
> 
> ideally
> struct pci_host_bridge should have new member:
> 
> struct list_head inbound_windows; /* resource_entry */
> 
> but somehow this resource have to be filled much before
> iommu_dma_init_domain happens.
> and use brdge resource directly in iova_reserve_pci_windows as it is
> already doing it for outbound memory.
> 
> this will detach the DT specifics from dma-immu layer.
> let me know how this sounds.

Please check the state of the code currently queued in Joerg's tree and
in linux-next - iommu_dma_get_resv_regions() has room for
device-agnostic stuff before the if (!dev_is_pci(dev)) check.

Furthermore, with the probe-deferral changes we end up with a common
dma_configure() routine to abstract the firmware-specifics of
of_dma_configure() vs. acpi_dma_configure(), so it would make sense to
give drivers etc. a similar interface for interrogating ranges. i.e.
some common function that abstracts the difference between parsing DT
dma-ranges vs. the ACPI _DMA object, either with a list-based get/put
model or perhaps an iterator with a user-provided callback (so users
could process in-place or create their own list as necessary). Unless of
course we go all the way to making the ranges an inherent part of the
device layer like some MIPS platforms currently do.
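
As a rough sketch of the iterator-with-callback shape (the names here are
hypothetical, purely to illustrate the suggestion, not an existing interface):

/* Hypothetical: walk the device's usable DMA ranges, regardless of whether
 * they came from DT "dma-ranges" or from an ACPI _DMA object. */
typedef int (*dma_range_cb)(phys_addr_t cpu_start, dma_addr_t dma_start,
			    u64 size, void *data);

int dma_for_each_range(struct device *dev, dma_range_cb cb, void *data);

/* A caller such as dma-iommu could then reserve the gaps between ranges
 * in its callback without ever touching OF or ACPI directly. */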

Robin.

>>> Bug: SOC-5216
>>> Change-Id: Icbfc99a045d730be143fef427098c937b9d46353
>>> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
>>> Reviewed-on: http://gerrit-ccxsw.broadcom.net/40760
>>> Reviewed-by: vpx_checkpatch status <vpx_checkpatch@broadcom.com>
>>> Reviewed-by: CCXSW <ccxswbuild@broadcom.com>
>>> Tested-by: vpx_autobuild status <vpx_autobuild@broadcom.com>
>>> Tested-by: vpx_smoketest status <vpx_smoketest@broadcom.com>
>>> Tested-by: CCXSW <ccxswbuild@broadcom.com>
>>> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>>>
>>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>>> index 48d36ce..08764b0 100644
>>> --- a/drivers/iommu/dma-iommu.c
>>> +++ b/drivers/iommu/dma-iommu.c
>>> @@ -27,6 +27,7 @@
>>>  #include <linux/iova.h>
>>>  #include <linux/irq.h>
>>>  #include <linux/mm.h>
>>> +#include <linux/of_pci.h>
>>>  #include <linux/pci.h>
>>>  #include <linux/scatterlist.h>
>>>  #include <linux/vmalloc.h>
>>> @@ -171,8 +172,12 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
>>>               struct iova_domain *iovad)
>>>  {
>>>       struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
>>> +     struct device_node *np = bridge->dev.parent->of_node;
>>>       struct resource_entry *window;
>>>       unsigned long lo, hi;
>>> +     int ret;
>>> +     dma_addr_t tmp_dma_addr = 0, dma_addr;
>>> +     LIST_HEAD(res);
>>>
>>>       resource_list_for_each_entry(window, &bridge->windows) {
>>>               if (resource_type(window->res) != IORESOURCE_MEM &&
>>> @@ -183,6 +188,36 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
>>>               hi = iova_pfn(iovad, window->res->end - window->offset);
>>>               reserve_iova(iovad, lo, hi);
>>>       }
>>> +
>>> +     /* PCI inbound memory reservation. */
>>> +     ret = of_pci_get_dma_ranges(np, &res);
>>> +     if (!ret) {
>>> +             resource_list_for_each_entry(window, &res) {
>>> +                     struct resource *res_dma = window->res;
>>> +
>>> +                     dma_addr = res_dma->start - window->offset;
>>> +                     if (tmp_dma_addr > dma_addr) {
>>> +                             pr_warn("PCI: failed to reserve iovas; ranges should be sorted\n");
>>
>> I don't see anything in the DT spec about the entries having to be
>> sorted, and it's not exactly impossible to sort a list if you need it so
>> (and if I'm being really pedantic, one could still trigger this with a
>> list that *is* sorted, only by different criteria).
>>
> 
> we have to sort it the way we want then. I can make it sort then.
> thanks for the suggestion.
> 
>> Robin.
>>
>>> +                             return;
>>> +                     }
>>> +                     if (tmp_dma_addr != dma_addr) {
>>> +                             lo = iova_pfn(iovad, tmp_dma_addr);
>>> +                             hi = iova_pfn(iovad, dma_addr - 1);
>>> +                             reserve_iova(iovad, lo, hi);
>>> +                     }
>>> +                     tmp_dma_addr = window->res->end - window->offset;
>>> +             }
>>> +             /*
>>> +              * the last dma-range should honour based on the
>>> +              * 32/64-bit dma addresses.
>>> +              */
>>> +             if (tmp_dma_addr < DMA_BIT_MASK(sizeof(dma_addr_t) * 8)) {
>>> +                     lo = iova_pfn(iovad, tmp_dma_addr);
>>> +                     hi = iova_pfn(iovad,
>>> +                                   DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1);
>>> +                     reserve_iova(iovad, lo, hi);
>>> +             }
>>> +     }
>>>  }
>>>
>>>  /**
>>>
>>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2/3] iommu/pci: reserve iova for PCI masters
@ 2017-05-05 15:51         ` Robin Murphy
  0 siblings, 0 replies; 61+ messages in thread
From: Robin Murphy @ 2017-05-05 15:51 UTC (permalink / raw)
  To: Oza Oza
  Cc: devicetree-u79uwXL29TY76Z2rM5mHXA, Oza Pawandeep,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Linux IOMMU,
	BCM Kernel Feedback,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 04/05/17 19:52, Oza Oza wrote:
> On Thu, May 4, 2017 at 11:50 PM, Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org> wrote:
>> On 03/05/17 05:46, Oza Pawandeep wrote:
>>> this patch reserves the iova for PCI masters.
>>> ARM64 based SOCs may have scattered memory banks.
>>> such as iproc based SOC has
>>>
>>> <0x00000000 0x80000000 0x0 0x80000000>, /* 2G @ 2G */
>>> <0x00000008 0x80000000 0x3 0x80000000>, /* 14G @ 34G */
>>> <0x00000090 0x00000000 0x4 0x00000000>, /* 16G @ 576G */
>>> <0x000000a0 0x00000000 0x4 0x00000000>; /* 16G @ 640G */
>>>
>>> but incoming PCI transcation addressing capability is limited
>>> by host bridge, for example if max incoming window capability
>>> is 512 GB, then 0x00000090 and 0x000000a0 will fall beyond it.
>>>
>>> to address this problem, iommu has to avoid allocating iovas which
>>> are reserved. which inturn does not allocate iova if it falls into hole.
>>
>> I don't necessarily disagree with doing this, as we could do with facing
>> up to the issue of discontiguous DMA ranges in particular (I too have a
>> platform with this problem), but I'm still not overly keen on pulling DT
>> specifics into this layer. More than that, though, if we are going to do
>> it, then we should do it for all devices with a restrictive
>> "dma-ranges", not just PCI ones.
>>
> 
> How do you propose to do it ?
> 
> my thinking is this:
> iova_reserve_pci_windows is written specific for PCI, and I am adding there.
> 
> ideally
> struct pci_host_bridge should have new member:
> 
> struct list_head inbound_windows; /* resource_entry */
> 
> but somehow this resource have to be filled much before
> iommu_dma_init_domain happens.
> and use brdge resource directly in iova_reserve_pci_windows as it is
> already doing it for outbound memory.
> 
> this will detach the DT specifics from dma-immu layer.
> let me know how this sounds.

Please check the state of the code currently queued in Joerg's tree and
in linux-next - iommu_dma_get_resv_regions() has room for
device-agnostic stuff before the if (!dev_is_pci(dev)) check.

Furthermore, with the probe-deferral changes we end up with a common
dma_configure() routine to abstract the firmware-specifics of
of_dma_configure() vs. acpi_dma_configure(), so it would make sense to
give drivers etc. a similar interface for interrogating ranges. i.e.
some common function that abstracts the difference between parsing DT
dma-ranges vs. the ACPI _DMA object, either with a list-based get/put
model or perhaps an iterator with a user-provided callback (so users
could process in-place or create their own list as necessary). Unless of
course we go all the way to making the ranges an inherent part of the
device layer like some MIPS platforms currently do.

Robin.

>>> Bug: SOC-5216
>>> Change-Id: Icbfc99a045d730be143fef427098c937b9d46353
>>> Signed-off-by: Oza Pawandeep <oza.oza-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>>> Reviewed-on: http://gerrit-ccxsw.broadcom.net/40760
>>> Reviewed-by: vpx_checkpatch status <vpx_checkpatch-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>>> Reviewed-by: CCXSW <ccxswbuild-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>>> Tested-by: vpx_autobuild status <vpx_autobuild-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>>> Tested-by: vpx_smoketest status <vpx_smoketest-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>>> Tested-by: CCXSW <ccxswbuild-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>>> Reviewed-by: Scott Branden <scott.branden-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>>>
>>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>>> index 48d36ce..08764b0 100644
>>> --- a/drivers/iommu/dma-iommu.c
>>> +++ b/drivers/iommu/dma-iommu.c
>>> @@ -27,6 +27,7 @@
>>>  #include <linux/iova.h>
>>>  #include <linux/irq.h>
>>>  #include <linux/mm.h>
>>> +#include <linux/of_pci.h>
>>>  #include <linux/pci.h>
>>>  #include <linux/scatterlist.h>
>>>  #include <linux/vmalloc.h>
>>> @@ -171,8 +172,12 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
>>>               struct iova_domain *iovad)
>>>  {
>>>       struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
>>> +     struct device_node *np = bridge->dev.parent->of_node;
>>>       struct resource_entry *window;
>>>       unsigned long lo, hi;
>>> +     int ret;
>>> +     dma_addr_t tmp_dma_addr = 0, dma_addr;
>>> +     LIST_HEAD(res);
>>>
>>>       resource_list_for_each_entry(window, &bridge->windows) {
>>>               if (resource_type(window->res) != IORESOURCE_MEM &&
>>> @@ -183,6 +188,36 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
>>>               hi = iova_pfn(iovad, window->res->end - window->offset);
>>>               reserve_iova(iovad, lo, hi);
>>>       }
>>> +
>>> +     /* PCI inbound memory reservation. */
>>> +     ret = of_pci_get_dma_ranges(np, &res);
>>> +     if (!ret) {
>>> +             resource_list_for_each_entry(window, &res) {
>>> +                     struct resource *res_dma = window->res;
>>> +
>>> +                     dma_addr = res_dma->start - window->offset;
>>> +                     if (tmp_dma_addr > dma_addr) {
>>> +                             pr_warn("PCI: failed to reserve iovas; ranges should be sorted\n");
>>
>> I don't see anything in the DT spec about the entries having to be
>> sorted, and it's not exactly impossible to sort a list if you need it so
>> (and if I'm being really pedantic, one could still trigger this with a
>> list that *is* sorted, only by different criteria).
>>
> 
> we will have to sort it the way we want, then; I can make it sort the list.
> thanks for the suggestion.
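
If sorting ends up being the answer, a rough sketch of ordering the returned
list by bus (DMA) address before walking it; untested, just to illustrate:

	#include <linux/list_sort.h>

	static int cmp_inbound_windows(void *priv, struct list_head *a,
				       struct list_head *b)
	{
		struct resource_entry *ra = list_entry(a, struct resource_entry, node);
		struct resource_entry *rb = list_entry(b, struct resource_entry, node);
		u64 da = ra->res->start - ra->offset;
		u64 db = rb->res->start - rb->offset;

		return da < db ? -1 : da > db;
	}

	/* after of_pci_get_dma_ranges(np, &res) succeeds: */
	list_sort(NULL, &res, cmp_inbound_windows);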
> 
>> Robin.
>>
>>> +                             return;
>>> +                     }
>>> +                     if (tmp_dma_addr != dma_addr) {
>>> +                             lo = iova_pfn(iovad, tmp_dma_addr);
>>> +                             hi = iova_pfn(iovad, dma_addr - 1);
>>> +                             reserve_iova(iovad, lo, hi);
>>> +                     }
>>> +                     tmp_dma_addr = window->res->end - window->offset;
>>> +             }
>>> +             /*
>>> +              * the last dma-range should honour based on the
>>> +              * 32/64-bit dma addresses.
>>> +              */
>>> +             if (tmp_dma_addr < DMA_BIT_MASK(sizeof(dma_addr_t) * 8)) {
>>> +                     lo = iova_pfn(iovad, tmp_dma_addr);
>>> +                     hi = iova_pfn(iovad,
>>> +                                   DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1);
>>> +                     reserve_iova(iovad, lo, hi);
>>> +             }
>>> +     }
>>>  }
>>>
>>>  /**
>>>
>>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters
@ 2017-05-06  5:30         ` Oza Oza via iommu
  0 siblings, 0 replies; 61+ messages in thread
From: Oza Oza @ 2017-05-06  5:30 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Joerg Roedel, Linux IOMMU, linux-pci, linux-kernel,
	linux-arm-kernel, devicetree, BCM Kernel Feedback, Oza Pawandeep

On Fri, May 5, 2017 at 8:55 PM, Robin Murphy <robin.murphy@arm.com> wrote:
> On 04/05/17 19:41, Oza Oza wrote:
> [...]
>>>> 5) leaves scope of adding PCI flag handling for inbound memory
>>>> by the new function.
>>>
>>> Which flags would ever actually matter? DMA windows aren't going to be
>>> to config or I/O space, so the memory type can be assumed, and the
>>> 32/64-bit distinction is irrelevant as it's not a relocatable BAR;
>>> DMA-able system memory isn't going to be read-sensitive, so the
>>> prefetchable flag shouldn't matter; and not being a BAR none of the
>>> others would be relevant either.
>>>
>>
>> Thanks Robin; for your reply and attention:
>>
>> agree with you, at present it would not matter,
>> but that does not mean we should not leave scope for it to matter in future.
>>
>> now where it could matter:
>> there is Relaxed Ordering for inbound memory for PCI.
>> According to the standard, the Relaxed Ordering (RO) bit can be set only for
>> Memory requests and completions (if present in the original request).
>> Also, according to transaction ordering rules, I/O and configuration
>> requests can still be re-ordered ahead of each other,
>> and we would like to make use of that.
>> for example, let's say we mark memory as Relaxed Ordered with a flag;
>> what is special about this memory is that incoming PCI transactions can be
>> reordered while the rest of memory has to be strongly ordered.
>
> Please look at "PCI Bus Binding to: IEEE Std 1275-1994 Standard for Boot
> (Initialization Configuration) Firmware" (as referenced in DTSpec) and
> explain how PCIe Relaxed Order has anything to do with the DT binding.
>
>> how our SOC would make use of this is out of scope for the
>> discussion at this point in time, but I am just bringing in the
>> idea/point of how flags could be useful
>> for inbound memory, since we would not like to throw away flags completely.
>
> The premise for implementing a PCI-specific parser is that you claim we
> need to do something with the phys.hi cell of a DT PCI address, rather
> than just taking the numerical part out of the phys.mid and phys.lo
> cells. Please make that argument in reference to the flags which that
> upper cell actually encodes, not unrelated things.
>

I think the whole discussion around inbound flags is not what I
intended to bring up.
this patch does nothing about inbound flags and never intends to solve
anything regarding them.
in fact I would like to remove point 5 from the commit message, which
should keep it out of the discussion completely.

let me state what this patch is trying to address/solve (it fixes 2 BUGs
along the way):
1) fixes the wrong size returned from of_dma_configure for PCI masters
(which is currently a BUG)

2) handles multiple dma-ranges cleanly

3) takes care of dma-ranges being optional.

4) the following is the comment on the function of_dma_get_range (which is also a BUG):
                "It returns -ENODEV if "dma-ranges" property was not found
                 * for this device in DT."
which I think is wrong for a PCI device, because if dma-ranges is
absent then, instead of returning -ENODEV,
it should return 0 with the largest possible host memory (a rough
caller-side sketch of this follows below).

it solves all of the above 4 problems.
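
(a rough caller-side sketch of points 1) and 4); this is assumed logic, not a
quote of the real of_dma_configure() code: whatever list of_pci_get_dma_ranges()
hands back, the widest usable mask falls out of the highest bus address in it)

	struct resource_entry *window;
	u64 dma_limit = 0;

	resource_list_for_each_entry(window, &res)
		dma_limit = max_t(u64, dma_limit, window->res->end - window->offset);

	/* e.g. an inbound window ending at bus address 0xff_ffff_ffff gives
	 * fls64() == 40, i.e. DMA_BIT_MASK(40) */
	dev->coherent_dma_mask = DMA_BIT_MASK(fls64(dma_limit));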

> [...]
>>>> +int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
>>>> +{
>>>> +     struct device_node *node = of_node_get(np);
>>>> +     int rlen;
>>>> +     int ret = 0;
>>>> +     const int na = 3, ns = 2;
>>>> +     struct resource *res;
>>>> +     struct of_pci_range_parser parser;
>>>> +     struct of_pci_range range;
>>>> +
>>>> +     if (!node)
>>>> +             return -EINVAL;
>>>> +
>>>> +     parser.node = node;
>>>> +     parser.pna = of_n_addr_cells(node);
>>>> +     parser.np = parser.pna + na + ns;
>>>> +
>>>> +     parser.range = of_get_property(node, "dma-ranges", &rlen);
>>>> +
>>>> +     if (!parser.range) {
>>>> +             pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
>>>> +                       np->full_name);
>>>> +             ret = -EINVAL;
>>>> +             goto out;
>>>> +     }
>>>> +
>>>> +     parser.end = parser.range + rlen / sizeof(__be32);
>>>> +
>>>> +     for_each_of_pci_range(&parser, &range) {
>>>
>>> This is plain wrong - of_pci_range_parser_one() will translate upwards
>>> through parent "ranges" properties, which is completely backwards for
>>> DMA addresses.
>>>
>>> Robin.
>>>
>>
>> No it does not, this patch is thoroughly tested on our SOC and it works.
>> of_pci_range_parser_one does not translate upwards through parent. it
>> just sticks to given PCI parser.
>
> Frankly, I'm losing patience with this attitude. Please look at the code
> you call:
>
> #define for_each_of_pci_range(parser, range) \
>         for (; of_pci_range_parser_one(parser, range);)
>
>
> struct of_pci_range *of_pci_range_parser_one(struct of_pci_range_parser
> *parser,
>                                                 struct of_pci_range *range)
> {
>         const int na = 3, ns = 2;
>
>         if (!range)
>                 return NULL;
>
>         if (!parser->range || parser->range + parser->np > parser->end)
>                 return NULL;
>
>         range->pci_space = parser->range[0];
>         range->flags = of_bus_pci_get_flags(parser->range);
>         range->pci_addr = of_read_number(parser->range + 1, ns);
>         range->cpu_addr = of_translate_address(parser->node,
>                                 parser->range + na);
> ...
>
>
> u64 of_translate_address(struct device_node *dev, const __be32 *in_addr)
> {
>         return __of_translate_address(dev, in_addr, "ranges");
> }
>
>
> I don't doubt that you still manage to get the right result on *your*
> SoC, because you probably have neither further "ranges" nor "dma-ranges"
> translations above your host controller node anyway. That does not
> change the fact that the proposed code is still obviously wrong for more
> complex DT topologies that do.

sorry, but I am confused, and sorry again for not following what you
said.

this patch assumes that the root bus would have dma-ranges.
are you saying this code doesn't iterate all the way up until it finds
valid dma-ranges?

or

you are saying that
of_pci_range_parser_one() will translate upwards
through parent "ranges" properties?

>
> We're doing upstream work in core code here: I don't particularly care
> about making your SoC work; I don't really care about making Juno work
> properly either; what I do care about is that code to parse dma-ranges
> actually parses dma-ranges *correctly* for all possible valid uses of
> dma-ranges, which means fixing the existing bugs and not introducing
> more. The principal side-effect of that is that *all* systems with valid
> DTs will then work correctly.
>

I do see your point now... and my apologies for not getting it right
the first time.

but I would not know all the nitty-gritty of the framework code and
every complex DT topology of other SOCs.
that is the reason we seek comments from experts like you, to make
the patch better.

when I say it works on our SOC, I only meant that this patch is
tested. so again, apologies there.

there is one obvious problem:
of_translate_dma_address should get called instead of
of_translate_address (i.e. "dma-ranges" should be used instead of "ranges").

but even with that, as you said, it will traverse all the way up the
DT hierarchy.
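
(for reference, the dma-ranges flavour already exists in drivers/of/address.c
and is, roughly, the same walk keyed on "dma-ranges":)

	u64 of_translate_dma_address(struct device_node *dev, const __be32 *in_addr)
	{
		return __of_translate_address(dev, in_addr, "dma-ranges");
	}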

I think there are 2 problems with this patch.

1) this patch should iterate all the way up to find the
first dma-ranges and stop when it finds it (see the sketch below).
    it should not assume that dma-ranges will always be found at the
current node.

2) of_translate_dma_address is problematic here, because once point 1 is
achieved, there is no need to traverse any further.
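
a rough sketch of point 1), assumed logic only, similar in spirit to what
of_dma_get_range() already does for memory-mapped buses:

	/* np is the host bridge node passed to of_pci_get_dma_ranges() */
	struct device_node *node = of_node_get(np);
	const __be32 *ranges = NULL;
	int len;

	while (node) {
		ranges = of_get_property(node, "dma-ranges", &len);
		if (ranges)
			break;
		node = of_get_next_parent(node);	/* drops the ref on the old node */
	}
	/* if node is non-NULL here, ranges/len describe the nearest dma-ranges */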

but before that, I am again seeking your opinion on whether we want to go
down this path:
registering bus-specific get_ranges callbacks as in PATCH v5? in my opinion that
is a better way of handling it.

with the original patch which you had in mind, you will have to club both
the PCI and memory-mapped implementations together.
even if dma-ranges (ignoring flags) remains the same in nature, you still
have to parse it differently, because the address cells are different,
and you will have to handle multiple ranges.

I just tried to take it down a different path by registering bus-specific
callbacks.

> Robin.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2/3] iommu/pci: reserve iova for PCI masters
  2017-05-05 15:51         ` Robin Murphy
@ 2017-05-06  6:01           ` Oza Oza
  -1 siblings, 0 replies; 61+ messages in thread
From: Oza Oza @ 2017-05-06  6:01 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Joerg Roedel, Linux IOMMU, linux-pci, linux-kernel,
	linux-arm-kernel, devicetree, BCM Kernel Feedback, Oza Pawandeep

On Fri, May 5, 2017 at 9:21 PM, Robin Murphy <robin.murphy@arm.com> wrote:
> On 04/05/17 19:52, Oza Oza wrote:
>> On Thu, May 4, 2017 at 11:50 PM, Robin Murphy <robin.murphy@arm.com> wrote:
>>> On 03/05/17 05:46, Oza Pawandeep wrote:
>>>> this patch reserves the iova for PCI masters.
>>>> ARM64 based SOCs may have scattered memory banks.
>>>> such as iproc based SOC has
>>>>
>>>> <0x00000000 0x80000000 0x0 0x80000000>, /* 2G @ 2G */
>>>> <0x00000008 0x80000000 0x3 0x80000000>, /* 14G @ 34G */
>>>> <0x00000090 0x00000000 0x4 0x00000000>, /* 16G @ 576G */
>>>> <0x000000a0 0x00000000 0x4 0x00000000>; /* 16G @ 640G */
>>>>
>>>> but incoming PCI transaction addressing capability is limited
>>>> by host bridge, for example if max incoming window capability
>>>> is 512 GB, then 0x00000090 and 0x000000a0 will fall beyond it.
>>>>
>>>> to address this problem, the iommu has to avoid allocating iovas which
>>>> are reserved; in turn no iova is allocated if it falls into such a hole.
>>>
>>> I don't necessarily disagree with doing this, as we could do with facing
>>> up to the issue of discontiguous DMA ranges in particular (I too have a
>>> platform with this problem), but I'm still not overly keen on pulling DT
>>> specifics into this layer. More than that, though, if we are going to do
>>> it, then we should do it for all devices with a restrictive
>>> "dma-ranges", not just PCI ones.
>>>
>>
>> How do you propose to do it?
>>
>> my thinking is this:
>> iova_reserve_pci_windows is written specifically for PCI, and I am adding there.
>>
>> ideally
>> struct pci_host_bridge should have a new member:
>>
>> struct list_head inbound_windows; /* resource_entry */
>>
>> but somehow these resources have to be filled much before
>> iommu_dma_init_domain happens,
>> and the bridge resources used directly in iova_reserve_pci_windows, as it
>> already does for outbound memory.
>>
>> this will detach the DT specifics from the dma-iommu layer.
>> let me know how this sounds.
>
> Please check the state of the code currently queued in Joerg's tree and
> in linux-next - iommu_dma_get_resv_regions() has room for
> device-agnostic stuff before the if (!dev_is_pci(dev)) check.
>
> Furthermore, with the probe-deferral changes we end up with a common
> dma_configure() routine to abstract the firmware-specifics of
> of_dma_configure() vs. acpi_dma_configure(), so it would make sense to
> give drivers etc. a similar interface for interrogating ranges. i.e.
> some common function that abstracts the difference between parsing DT
> dma-ranges vs. the ACPI _DMA object, either with a list-based get/put
> model or perhaps an iterator with a user-provided callback (so users
> could process in-place or create their own list as necessary). Unless of
> course we go all the way to making the ranges an inherent part of the
> device layer like some MIPS platforms currently do.
>
> Robin.
>

you are suggesting to wait till iommu_dma_get_resv_regions gets in?

Oza.


>>>> Bug: SOC-5216
>>>> Change-Id: Icbfc99a045d730be143fef427098c937b9d46353
>>>> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
>>>> Reviewed-on: http://gerrit-ccxsw.broadcom.net/40760
>>>> Reviewed-by: vpx_checkpatch status <vpx_checkpatch@broadcom.com>
>>>> Reviewed-by: CCXSW <ccxswbuild@broadcom.com>
>>>> Tested-by: vpx_autobuild status <vpx_autobuild@broadcom.com>
>>>> Tested-by: vpx_smoketest status <vpx_smoketest@broadcom.com>
>>>> Tested-by: CCXSW <ccxswbuild@broadcom.com>
>>>> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>>>>
>>>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>>>> index 48d36ce..08764b0 100644
>>>> --- a/drivers/iommu/dma-iommu.c
>>>> +++ b/drivers/iommu/dma-iommu.c
>>>> @@ -27,6 +27,7 @@
>>>>  #include <linux/iova.h>
>>>>  #include <linux/irq.h>
>>>>  #include <linux/mm.h>
>>>> +#include <linux/of_pci.h>
>>>>  #include <linux/pci.h>
>>>>  #include <linux/scatterlist.h>
>>>>  #include <linux/vmalloc.h>
>>>> @@ -171,8 +172,12 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
>>>>               struct iova_domain *iovad)
>>>>  {
>>>>       struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
>>>> +     struct device_node *np = bridge->dev.parent->of_node;
>>>>       struct resource_entry *window;
>>>>       unsigned long lo, hi;
>>>> +     int ret;
>>>> +     dma_addr_t tmp_dma_addr = 0, dma_addr;
>>>> +     LIST_HEAD(res);
>>>>
>>>>       resource_list_for_each_entry(window, &bridge->windows) {
>>>>               if (resource_type(window->res) != IORESOURCE_MEM &&
>>>> @@ -183,6 +188,36 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
>>>>               hi = iova_pfn(iovad, window->res->end - window->offset);
>>>>               reserve_iova(iovad, lo, hi);
>>>>       }
>>>> +
>>>> +     /* PCI inbound memory reservation. */
>>>> +     ret = of_pci_get_dma_ranges(np, &res);
>>>> +     if (!ret) {
>>>> +             resource_list_for_each_entry(window, &res) {
>>>> +                     struct resource *res_dma = window->res;
>>>> +
>>>> +                     dma_addr = res_dma->start - window->offset;
>>>> +                     if (tmp_dma_addr > dma_addr) {
>>>> +                             pr_warn("PCI: failed to reserve iovas; ranges should be sorted\n");
>>>
>>> I don't see anything in the DT spec about the entries having to be
>>> sorted, and it's not exactly impossible to sort a list if you need it so
>>> (and if I'm being really pedantic, one could still trigger this with a
>>> list that *is* sorted, only by different criteria).
>>>
>>
>> we have to sort it the way we want then. I can make it sort then.
>> thanks for the suggestion.
>>
>>> Robin.
>>>
>>>> +                             return;
>>>> +                     }
>>>> +                     if (tmp_dma_addr != dma_addr) {
>>>> +                             lo = iova_pfn(iovad, tmp_dma_addr);
>>>> +                             hi = iova_pfn(iovad, dma_addr - 1);
>>>> +                             reserve_iova(iovad, lo, hi);
>>>> +                     }
>>>> +                     tmp_dma_addr = window->res->end - window->offset;
>>>> +             }
>>>> +             /*
>>>> +              * the last dma-range should honour based on the
>>>> +              * 32/64-bit dma addresses.
>>>> +              */
>>>> +             if (tmp_dma_addr < DMA_BIT_MASK(sizeof(dma_addr_t) * 8)) {
>>>> +                     lo = iova_pfn(iovad, tmp_dma_addr);
>>>> +                     hi = iova_pfn(iovad,
>>>> +                                   DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1);
>>>> +                     reserve_iova(iovad, lo, hi);
>>>> +             }
>>>> +     }
>>>>  }
>>>>
>>>>  /**
>>>>
>>>
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters
  2017-05-06  5:30         ` Oza Oza via iommu
@ 2017-05-16  5:24           ` Oza Oza
  -1 siblings, 0 replies; 61+ messages in thread
From: Oza Oza @ 2017-05-16  5:24 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Joerg Roedel, Linux IOMMU, linux-pci, linux-kernel,
	linux-arm-kernel, devicetree, BCM Kernel Feedback, Oza Pawandeep

On Sat, May 6, 2017 at 11:00 AM, Oza Oza <oza.oza@broadcom.com> wrote:
> On Fri, May 5, 2017 at 8:55 PM, Robin Murphy <robin.murphy@arm.com> wrote:
>> On 04/05/17 19:41, Oza Oza wrote:
>> [...]
>>>>> 5) leaves scope of adding PCI flag handling for inbound memory
>>>>> by the new function.
>>>>
>>>> Which flags would ever actually matter? DMA windows aren't going to be
>>>> to config or I/O space, so the memory type can be assumed, and the
>>>> 32/64-bit distinction is irrelevant as it's not a relocatable BAR;
>>>> DMA-able system memory isn't going to be read-sensitive, so the
>>>> prefetchable flag shouldn't matter; and not being a BAR none of the
>>>> others would be relevant either.
>>>>
>>>
>>> Thanks, Robin, for your reply and attention.
>>>
>>> I agree that at present it would not matter, but that does not mean
>>> we should not leave scope for it to matter in future.
>>>
>>> Here is where it could matter: there is Relaxed Ordering for inbound
>>> memory on PCI. According to the standard, the Relaxed Ordering (RO)
>>> bit can be set only for memory requests and completions (if present
>>> in the original request). Also, according to the transaction ordering
>>> rules, I/O and configuration requests can still be re-ordered ahead
>>> of each other, and we would like to make use of that. For example,
>>> say we mark some memory as Relaxed Ordered with a flag; what is
>>> special about that memory is that incoming PCI transactions to it may
>>> be reordered, while the rest of memory has to be strongly ordered.
>>
>> Please look at "PCI Bus Binding to: IEEE Std 1275-1994 Standard for Boot
>> (Initialization Configuration) Firmware" (as referenced in DTSpec) and
>> explain how PCIe Relaxed Order has anything to do with the DT binding.
>>
>>> How our SoC would make use of this is out of scope for the discussion
>>> at this point, but I am just bringing up the idea of how flags could
>>> be useful for inbound memory, since we would not like to throw away
>>> the flags completely.
>>
>> The premise for implementing a PCI-specific parser is that you claim we
>> need to do something with the phys.hi cell of a DT PCI address, rather
>> than just taking the numerical part out of the phys.mid and phys.lo
>> cells. Please make that argument in reference to the flags which that
>> upper cell actually encodes, not unrelated things.
>>
>
> I think the whole discussion around inbound flags is not what I
> intended to bring up. This patch does nothing about inbound flags and
> never intends to solve anything regarding them. In fact, I would like
> to remove point 5 from the commit message, which should keep it out of
> the discussion completely.
>
> Let me state what this patch is trying to address/solve (including two
> existing bugs):
>
> 1) fix the wrong size returned from of_dma_configure for PCI masters
> (which is a bug right now)
>
> 2) handle multiple dma-ranges cleanly
>
> 3) take care of dma-ranges being optional
>
> 4) the following is the comment on the function of_dma_get_range (which
> is also a bug):
>                 "It returns -ENODEV if "dma-ranges" property was not found
>                  * for this device in DT."
> which I think is wrong for PCI devices, because if dma-ranges is absent
> then, instead of returning -ENODEV, it should return 0 with the largest
> possible host memory (a rough sketch of that fallback follows below).
>
> the patch solves all of the above 4 problems.
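>
> For example, conceptually (a rough, untested sketch only -- not the
> actual patch; memblock_end_of_DRAM() is just one illustrative way of
> expressing "all of host memory"):
>
>	if (!of_get_property(np, "dma-ranges", &len)) {
>		/*
>		 * dma-ranges absent: do not fail for a PCI bridge,
>		 * assume the master can reach all of host memory.
>		 */
>		*dma_addr = *paddr = 0;
>		*size = memblock_end_of_DRAM();
>		return 0;
>	}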
>
>> [...]
>>>>> +int of_pci_get_dma_ranges(struct device_node *np, struct list_head *resources)
>>>>> +{
>>>>> +     struct device_node *node = of_node_get(np);
>>>>> +     int rlen;
>>>>> +     int ret = 0;
>>>>> +     const int na = 3, ns = 2;
>>>>> +     struct resource *res;
>>>>> +     struct of_pci_range_parser parser;
>>>>> +     struct of_pci_range range;
>>>>> +
>>>>> +     if (!node)
>>>>> +             return -EINVAL;
>>>>> +
>>>>> +     parser.node = node;
>>>>> +     parser.pna = of_n_addr_cells(node);
>>>>> +     parser.np = parser.pna + na + ns;
>>>>> +
>>>>> +     parser.range = of_get_property(node, "dma-ranges", &rlen);
>>>>> +
>>>>> +     if (!parser.range) {
>>>>> +             pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
>>>>> +                       np->full_name);
>>>>> +             ret = -EINVAL;
>>>>> +             goto out;
>>>>> +     }
>>>>> +
>>>>> +     parser.end = parser.range + rlen / sizeof(__be32);
>>>>> +
>>>>> +     for_each_of_pci_range(&parser, &range) {
>>>>
>>>> This is plain wrong - of_pci_range_parser_one() will translate upwards
>>>> through parent "ranges" properties, which is completely backwards for
>>>> DMA addresses.
>>>>
>>>> Robin.
>>>>
>>>
>>> No, it does not; this patch is thoroughly tested on our SoC and it
>>> works. of_pci_range_parser_one does not translate upwards through the
>>> parent; it just sticks to the given PCI parser.
>>
>> Frankly, I'm losing patience with this attitude. Please look at the code
>> you call:
>>
>> #define for_each_of_pci_range(parser, range) \
>>         for (; of_pci_range_parser_one(parser, range);)
>>
>>
>> struct of_pci_range *of_pci_range_parser_one(struct of_pci_range_parser
>> *parser,
>>                                                 struct of_pci_range *range)
>> {
>>         const int na = 3, ns = 2;
>>
>>         if (!range)
>>                 return NULL;
>>
>>         if (!parser->range || parser->range + parser->np > parser->end)
>>                 return NULL;
>>
>>         range->pci_space = parser->range[0];
>>         range->flags = of_bus_pci_get_flags(parser->range);
>>         range->pci_addr = of_read_number(parser->range + 1, ns);
>>         range->cpu_addr = of_translate_address(parser->node,
>>                                 parser->range + na);
>> ...
>>
>>
>> u64 of_translate_address(struct device_node *dev, const __be32 *in_addr)
>> {
>>         return __of_translate_address(dev, in_addr, "ranges");
>> }
>>
>>
>> I don't doubt that you still manage to get the right result on *your*
>> SoC, because you probably have neither further "ranges" nor "dma-ranges"
>> translations above your host controller node anyway. That does not
>> change the fact that the proposed code is still obviously wrong for more
>> complex DT topologies that do.
>
> Sorry, but I am confused, and sorry again for not following what you
> said.
>
> This patch assumes that the root bus would have dma-ranges.
> Are you saying this code doesn't iterate all the way up until it finds
> a valid dma-ranges?
>
> Or are you saying that of_pci_range_parser_one() will translate
> upwards through parent "ranges" properties?
>
>>
>> We're doing upstream work in core code here: I don't particularly care
>> about making your SoC work; I don't really care about making Juno work
>> properly either; what I do care about is that code to parse dma-ranges
>> actually parses dma-ranges *correctly* for all possible valid uses of
>> dma-ranges, which means fixing the existing bugs and not introducing
>> more. The principal side-effect of that is that *all* systems with valid
>> DTs will then work correctly.
>>
>
> I do see your point now, and my apologies for not getting it right the
> first time.
>
> But I do not know all the nitty-gritty of the framework code and every
> complex DT topology of other SoCs; that is the reason we seek comments
> from experts like you, to make the patch better.
>
> When I say it works on our SoC, I only meant that this patch is
> tested. So again, apologies there.
>
> There is one obvious problem: of_translate_dma_address() should be
> called instead of of_translate_address(), so that "dma-ranges" is
> walked rather than "ranges".
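>
> I.e. in of_pci_range_parser_one(), roughly (sketch only):
>
>	range->cpu_addr = of_translate_dma_address(parser->node,
>						   parser->range + na);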
>
> But even with that, as you said, it will still traverse all the way up
> the DT hierarchy.
>
> I think there are 2 problems with this patch.
>
> 1) the patch should iterate all the way up to find the first
> dma-ranges and stop when it finds it (see the sketch after this list);
>     it should not assume that dma-ranges will always be found at the
> current node.
>
> 2) of_translate_dma_address is broken for this use, because once point
> 1 is achieved there is no need to traverse any more.
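>
> Something along these lines for point 1 (an untested sketch; error
> handling and refcounting kept to a minimum):
>
>	struct device_node *node = of_node_get(np);
>
>	/* walk up the tree until a node carrying dma-ranges is found */
>	while (node && !of_get_property(node, "dma-ranges", &rlen))
>		node = of_get_next_parent(node);
>	if (!node)
>		return -ENODEV;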
>
> But before that, I am again seeking your opinion on whether we want to
> go down this path, or register bus-specific get_ranges callbacks as in
> PATCH v5; in my opinion that is the better way of handling it.
>
> With the original patch which you had in mind, you will have to club
> both the PCI and memory-mapped implementations together. Even if
> dma-ranges (ignoring the flags) remains the same in nature, you still
> have to parse it differently, because the address cells are different,
> and you will have to handle multiple ranges.
>
> I just tried to take it down a different path by registering
> bus-specific callbacks.
>
>> Robin.

Hi Robin,

I have addressed your comments to the best of my understanding;
please have a look at PATCH v6.

Regards,
Oza.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2/3] iommu/pci: reserve iova for PCI masters
  2017-05-05 15:51         ` Robin Murphy
@ 2017-05-22 16:45           ` Oza Oza
  -1 siblings, 0 replies; 61+ messages in thread
From: Oza Oza @ 2017-05-22 16:45 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Joerg Roedel, Linux IOMMU, linux-pci, linux-kernel, Linux ARM,
	devicetree, BCM Kernel Feedback, Oza Pawandeep

On Fri, May 5, 2017 at 9:21 PM, Robin Murphy <robin.murphy@arm.com> wrote:
> On 04/05/17 19:52, Oza Oza wrote:
>> On Thu, May 4, 2017 at 11:50 PM, Robin Murphy <robin.murphy@arm.com> wrote:
>>> On 03/05/17 05:46, Oza Pawandeep wrote:
>>>> this patch reserves the iova for PCI masters.
>>>> ARM64 based SOCs may have scattered memory banks.
>>>> such as iproc based SOC has
>>>>
>>>> <0x00000000 0x80000000 0x0 0x80000000>, /* 2G @ 2G */
>>>> <0x00000008 0x80000000 0x3 0x80000000>, /* 14G @ 34G */
>>>> <0x00000090 0x00000000 0x4 0x00000000>, /* 16G @ 576G */
>>>> <0x000000a0 0x00000000 0x4 0x00000000>; /* 16G @ 640G */
>>>>
>>>> but incoming PCI transaction addressing capability is limited
>>>> by host bridge, for example if max incoming window capability
>>>> is 512 GB, then 0x00000090 and 0x000000a0 will fall beyond it.
>>>>
>>>> to address this problem, the iommu has to avoid allocating iovas which
>>>> are reserved, which in turn means no iova is allocated if it falls into a hole.
>>>
>>> I don't necessarily disagree with doing this, as we could do with facing
>>> up to the issue of discontiguous DMA ranges in particular (I too have a
>>> platform with this problem), but I'm still not overly keen on pulling DT
>>> specifics into this layer. More than that, though, if we are going to do
>>> it, then we should do it for all devices with a restrictive
>>> "dma-ranges", not just PCI ones.
>>>
>>
>> How do you propose to do it?
>>
>> My thinking is this:
>> iova_reserve_pci_windows is written specifically for PCI, and I am
>> adding to it there.
>>
>> Ideally, struct pci_host_bridge should have a new member:
>>
>> struct list_head inbound_windows; /* resource_entry */
>>
>> but somehow this resource list has to be filled well before
>> iommu_dma_init_domain happens, and then the bridge resource can be
>> used directly in iova_reserve_pci_windows, as is already done for
>> outbound memory.
>>
>> This will detach the DT specifics from the dma-iommu layer.
>> Let me know how this sounds.
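>>
>> I.e. roughly (a sketch of the idea only, not actual code;
>> bridge->inbound_windows is the hypothetical new member above, and
>> window/iovad are as in iova_reserve_pci_windows):
>>
>>	dma_addr_t prev_end = 0;
>>
>>	resource_list_for_each_entry(window, &bridge->inbound_windows) {
>>		dma_addr_t start = window->res->start - window->offset;
>>
>>		/* reserve the IOVA hole below this inbound window */
>>		if (start > prev_end)
>>			reserve_iova(iovad, iova_pfn(iovad, prev_end),
>>				     iova_pfn(iovad, start - 1));
>>		prev_end = window->res->end - window->offset + 1;
>>	}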
>
> Please check the state of the code currently queued in Joerg's tree and
> in linux-next - iommu_dma_get_resv_regions() has room for
> device-agnostic stuff before the if (!dev_is_pci(dev)) check.
>
> Furthermore, with the probe-deferral changes we end up with a common
> dma_configure() routine to abstract the firmware-specifics of
> of_dma_configure() vs. acpi_dma_configure(), so it would make sense to
> give drivers etc. a similar interface for interrogating ranges. i.e.
> some common function that abstracts the difference between parsing DT
> dma-ranges vs. the ACPI _DMA object, either with a list-based get/put
> model or perhaps an iterator with a user-provided callback (so users
> could process in-place or create their own list as necessary). Unless of
> course we go all the way to making the ranges an inherent part of the
> device layer like some MIPS platforms currently do.
>
> Robin.
>

Hi Robin,

Your above comments have been taken care of to the best of my
understanding; kindly have a look at PATCH v7.
The whole patch-set takes care of IOVA reservation for PCI inbound
memory, and there is now no dependency between the IOMMU layer and the
OF layer.
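
For illustration only (this is not the actual v7 code), the
reserved-region style hook Robin points at above might look roughly
like this, assuming a hypothetical, pre-sorted bridge->inbound_windows
list and an output "list" of struct iommu_resv_region as used by
iommu_dma_get_resv_regions():

	phys_addr_t start = 0;
	struct resource_entry *entry;

	resource_list_for_each_entry(entry, &bridge->inbound_windows) {
		phys_addr_t end = entry->res->start - entry->offset;

		if (end > start) {
			struct iommu_resv_region *rgn;

			/* reserve the hole below this inbound window */
			rgn = iommu_alloc_resv_region(start, end - start, 0,
						      IOMMU_RESV_RESERVED);
			if (rgn)
				list_add_tail(&rgn->list, list);
		}
		start = entry->res->end - entry->offset + 1;
	}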

Regards,
Oza.


>>>> Bug: SOC-5216
>>>> Change-Id: Icbfc99a045d730be143fef427098c937b9d46353
>>>> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
>>>> Reviewed-on: http://gerrit-ccxsw.broadcom.net/40760
>>>> Reviewed-by: vpx_checkpatch status <vpx_checkpatch@broadcom.com>
>>>> Reviewed-by: CCXSW <ccxswbuild@broadcom.com>
>>>> Tested-by: vpx_autobuild status <vpx_autobuild@broadcom.com>
>>>> Tested-by: vpx_smoketest status <vpx_smoketest@broadcom.com>
>>>> Tested-by: CCXSW <ccxswbuild@broadcom.com>
>>>> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>>>>
>>>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>>>> index 48d36ce..08764b0 100644
>>>> --- a/drivers/iommu/dma-iommu.c
>>>> +++ b/drivers/iommu/dma-iommu.c
>>>> @@ -27,6 +27,7 @@
>>>>  #include <linux/iova.h>
>>>>  #include <linux/irq.h>
>>>>  #include <linux/mm.h>
>>>> +#include <linux/of_pci.h>
>>>>  #include <linux/pci.h>
>>>>  #include <linux/scatterlist.h>
>>>>  #include <linux/vmalloc.h>
>>>> @@ -171,8 +172,12 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
>>>>               struct iova_domain *iovad)
>>>>  {
>>>>       struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
>>>> +     struct device_node *np = bridge->dev.parent->of_node;
>>>>       struct resource_entry *window;
>>>>       unsigned long lo, hi;
>>>> +     int ret;
>>>> +     dma_addr_t tmp_dma_addr = 0, dma_addr;
>>>> +     LIST_HEAD(res);
>>>>
>>>>       resource_list_for_each_entry(window, &bridge->windows) {
>>>>               if (resource_type(window->res) != IORESOURCE_MEM &&
>>>> @@ -183,6 +188,36 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
>>>>               hi = iova_pfn(iovad, window->res->end - window->offset);
>>>>               reserve_iova(iovad, lo, hi);
>>>>       }
>>>> +
>>>> +     /* PCI inbound memory reservation. */
>>>> +     ret = of_pci_get_dma_ranges(np, &res);
>>>> +     if (!ret) {
>>>> +             resource_list_for_each_entry(window, &res) {
>>>> +                     struct resource *res_dma = window->res;
>>>> +
>>>> +                     dma_addr = res_dma->start - window->offset;
>>>> +                     if (tmp_dma_addr > dma_addr) {
>>>> +                             pr_warn("PCI: failed to reserve iovas; ranges should be sorted\n");
>>>
>>> I don't see anything in the DT spec about the entries having to be
>>> sorted, and it's not exactly impossible to sort a list if you need it so
>>> (and if I'm being really pedantic, one could still trigger this with a
>>> list that *is* sorted, only by different criteria).
>>>
>>
>> we have to sort it the way we want then. I can make it sort then.
>> thanks for the suggestion.
>>
>>> Robin.
>>>
>>>> +                             return;
>>>> +                     }
>>>> +                     if (tmp_dma_addr != dma_addr) {
>>>> +                             lo = iova_pfn(iovad, tmp_dma_addr);
>>>> +                             hi = iova_pfn(iovad, dma_addr - 1);
>>>> +                             reserve_iova(iovad, lo, hi);
>>>> +                     }
>>>> +                     tmp_dma_addr = window->res->end - window->offset;
>>>> +             }
>>>> +             /*
>>>> +              * the last dma-range should honour based on the
>>>> +              * 32/64-bit dma addresses.
>>>> +              */
>>>> +             if (tmp_dma_addr < DMA_BIT_MASK(sizeof(dma_addr_t) * 8)) {
>>>> +                     lo = iova_pfn(iovad, tmp_dma_addr);
>>>> +                     hi = iova_pfn(iovad,
>>>> +                                   DMA_BIT_MASK(sizeof(dma_addr_t) * 8) - 1);
>>>> +                     reserve_iova(iovad, lo, hi);
>>>> +             }
>>>> +     }
>>>>  }
>>>>
>>>>  /**
>>>>
>>>
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread, other threads:[~2017-05-22 16:45 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-03  4:46 [PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters Oza Pawandeep
2017-05-03  4:46 ` Oza Pawandeep
2017-05-03  4:46 ` Oza Pawandeep via iommu
2017-05-03  4:46 ` [PATCH 2/3] iommu/pci: reserve iova " Oza Pawandeep
2017-05-03  4:46   ` Oza Pawandeep
2017-05-03  4:46   ` Oza Pawandeep via iommu
2017-05-03  5:07   ` Oza Oza
2017-05-03  5:07     ` Oza Oza
2017-05-03  5:07     ` Oza Oza via iommu
2017-05-04 18:20   ` Robin Murphy
2017-05-04 18:20     ` Robin Murphy
2017-05-04 18:20     ` Robin Murphy
2017-05-04 18:52     ` Oza Oza
2017-05-04 18:52       ` Oza Oza
2017-05-04 18:52       ` Oza Oza via iommu
2017-05-05 15:51       ` Robin Murphy
2017-05-05 15:51         ` Robin Murphy
2017-05-05 15:51         ` Robin Murphy
2017-05-05 15:51         ` Robin Murphy
2017-05-06  6:01         ` Oza Oza
2017-05-06  6:01           ` Oza Oza
2017-05-22 16:45         ` Oza Oza
2017-05-22 16:45           ` Oza Oza
2017-05-05  8:10     ` Oza Oza
2017-05-05  8:10       ` Oza Oza
2017-05-05  8:10       ` Oza Oza via iommu
2017-05-03  4:46 ` [PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges Oza Pawandeep
2017-05-03  4:46   ` Oza Pawandeep
2017-05-03  4:46   ` Oza Pawandeep via iommu
2017-05-03  5:07   ` Oza Oza
2017-05-03  5:07     ` Oza Oza
2017-05-03  5:07     ` Oza Oza via iommu
2017-05-03 20:06   ` Rob Herring
2017-05-03 20:06     ` Rob Herring
2017-05-03 20:06     ` Rob Herring
2017-05-03 20:06     ` Rob Herring
2017-05-03  5:06 ` [PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters Oza Oza
2017-05-03  5:06   ` Oza Oza
2017-05-03  5:06   ` Oza Oza via iommu
2017-05-03 19:55 ` Rob Herring
2017-05-03 19:55   ` Rob Herring
2017-05-03 19:55   ` Rob Herring
2017-05-03 19:55   ` Rob Herring
2017-05-04 18:02 ` Robin Murphy
2017-05-04 18:02   ` Robin Murphy
2017-05-04 18:02   ` Robin Murphy
2017-05-04 18:41   ` Oza Oza
2017-05-04 18:41     ` Oza Oza
2017-05-04 18:41     ` Oza Oza
2017-05-05 15:25     ` Robin Murphy
2017-05-05 15:25       ` Robin Murphy
2017-05-05 15:25       ` Robin Murphy
2017-05-05 15:25       ` Robin Murphy
2017-05-06  5:30       ` Oza Oza
2017-05-06  5:30         ` Oza Oza
2017-05-06  5:30         ` Oza Oza via iommu
2017-05-16  5:24         ` Oza Oza
2017-05-16  5:24           ` Oza Oza
2017-05-04 19:12   ` Oza Oza
2017-05-04 19:12     ` Oza Oza
2017-05-04 19:12     ` Oza Oza via iommu
