* [RFC PATCH] iommu/dma/pci: account pci host bridge dma_mask for IOVA allocation
@ 2017-03-20  8:57 ` Oza Oza via iommu
  0 siblings, 0 replies; 14+ messages in thread
From: Oza Oza @ 2017-03-20  8:57 UTC (permalink / raw)
  To: Joerg Roedel, Robin Murphy, linux-pci
  Cc: iommu, linux-kernel, linux-arm-kernel, devicetree,
	bcm-kernel-feedback-list

+  linux-pci

Regards,
Oza.

-----Original Message-----
From: Oza Pawandeep [mailto:oza.oza@broadcom.com]
Sent: Friday, March 17, 2017 11:41 AM
To: Joerg Roedel; Robin Murphy
Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
linux-arm-kernel@lists.infradead.org; devicetree@vger.kernel.org;
bcm-kernel-feedback-list@broadcom.com; Oza Pawandeep
Subject: [RFC PATCH] iommu/dma: account pci host bridge dma_mask for IOVA
allocation

It is possible that a PCI device supports 64-bit DMA addressing, and thus
its driver sets the device's dma_mask to DMA_BIT_MASK(64); however, the
PCI host bridge may have limitations on inbound transaction addressing.
As an example, consider an NVMe SSD connected to an iproc-PCIe controller.

Currently, the IOMMU DMA ops consider only the PCI device's dma_mask when
allocating an IOVA. This is particularly problematic on ARM/ARM64 SoCs,
where the IOMMU (i.e. the SMMU) translates IOVA to PA for inbound
transactions only after the PCI host has forwarded these transactions
onto the SoC IO bus. This means that on such SoCs the IOVA of inbound
transactions has to honor the addressing restrictions of the PCI host.

This patch is inspired by:
http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1306545.html
http://www.spinics.net/lists/arm-kernel/msg566947.html

However, the above work solves only half of the problem. The rest of the
problem, which we face on iproc-based SoCs, is described below.

The current integration of the PCIe framework and the OF framework
assumes dma-ranges in the form that memory-mapped devices use:
dma-ranges: (child-bus-address, parent-bus-address, length).

However, iproc-based SoCs, and even R-Car-based SoCs, have PCI-style
dma-ranges:
dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;

of_dma_configure() is written specifically to take care of memory-mapped
devices, but no implementation exists for PCI to take care of PCIe-style
memory ranges. In fact, the PCI world does not seem to define a standard
dma-ranges binding; in its absence, the dma_mask used to remain 32-bit
because of_dma_configure() parsed a size of 0.

This patch therefore also implements of_pci_get_dma_ranges() to cater to
PCI-style dma-ranges, so that the returned size yields the best possible
(largest) dma_mask. For example, with
dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>; we should get
dev->coherent_dma_mask = 0x7fffffffff.

Conclusion: there are the following problems:
1) The integration of the Linux PCI and IOMMU frameworks has glitches
   with respect to dma-ranges.
2) The Linux PCI framework looks uncertain about dma-ranges; the binding
   is not defined the way it is for memory-mapped devices.
   R-Car- and iproc-based SoCs each use their own custom dma-ranges
   (which could instead be standardized).
3) Even the default parser, of_dma_get_range(), throws the error
   "no dma-ranges found for node"
   because of an existing bug:
   the following lines should be moved to the end of the while(1) loop:
	839                 node = of_get_next_parent(node);
	840                 if (!node)
	841                         break;

Reviewed-by: Anup Patel <anup.patel@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 8c7c244..20cfff7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -217,6 +217,9 @@ config NEED_DMA_MAP_STATE
 config NEED_SG_DMA_LENGTH
 	def_bool y

+config ARCH_HAS_DMA_SET_COHERENT_MASK
+	def_bool y
+
 config SMP
 	def_bool y

diff --git a/arch/arm64/include/asm/device.h b/arch/arm64/include/asm/device.h
index 73d5bab..64b4dc3 100644
--- a/arch/arm64/include/asm/device.h
+++ b/arch/arm64/include/asm/device.h
@@ -20,6 +20,7 @@ struct dev_archdata {
 #ifdef CONFIG_IOMMU_API
 	void *iommu;			/* private IOMMU data */
 #endif
+	u64 parent_dma_mask;
 	bool dma_coherent;
 };

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 81cdb2e..5845ecd 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -564,6 +564,7 @@ static void flush_page(struct device *dev, const void *virt, phys_addr_t phys)
 	__dma_flush_area(virt, PAGE_SIZE);
 }

+
 static void *__iommu_alloc_attrs(struct device *dev, size_t size,
 				 dma_addr_t *handle, gfp_t gfp,
 				 unsigned long attrs)
@@ -795,6 +796,20 @@ static void __iommu_unmap_sg_attrs(struct device *dev,
 	iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);
 }

+static int __iommu_set_dma_mask(struct device *dev, u64 mask)
+{
+	/* device is not DMA capable */
+	if (!dev->dma_mask)
+		return -EIO;
+
+	if (mask > dev->archdata.parent_dma_mask)
+		mask = dev->archdata.parent_dma_mask;
+
+	*dev->dma_mask = mask;
+
+	return 0;
+}
+
 static const struct dma_map_ops iommu_dma_ops = {
 	.alloc = __iommu_alloc_attrs,
 	.free = __iommu_free_attrs,
@@ -811,8 +826,21 @@ static void __iommu_unmap_sg_attrs(struct device *dev,
 	.map_resource = iommu_dma_map_resource,
 	.unmap_resource = iommu_dma_unmap_resource,
 	.mapping_error = iommu_dma_mapping_error,
+	.set_dma_mask = __iommu_set_dma_mask,
 };

+int dma_set_coherent_mask(struct device *dev, u64 mask)
+{
+	if (get_dma_ops(dev) == &iommu_dma_ops &&
+	    mask > dev->archdata.parent_dma_mask)
+		mask = dev->archdata.parent_dma_mask;
+
+	dev->coherent_dma_mask = mask;
+	return 0;
+}
+EXPORT_SYMBOL(dma_set_coherent_mask);
+
+
 /*
  * TODO: Right now __iommu_setup_dma_ops() gets called too early to do
  * everything it needs to - the device is only partially created and the
@@ -975,6 +1003,8 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 	if (!dev->dma_ops)
 		dev->dma_ops = &swiotlb_dma_ops;

+	dev->archdata.parent_dma_mask = size - 1;
+
 	dev->archdata.dma_coherent = coherent;
 	__iommu_setup_dma_ops(dev, dma_base, size, iommu);
 }

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..5804717 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,51 @@ int of_pci_get_host_bridge_resources(struct device_node *dev,
 	return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr,
+			  u64 *paddr, u64 *size)
+{
+	struct device_node *node = of_node_get(np);
+	int rlen;
+	int ret = 0;
+	const int na = 3, ns = 2;
+	struct of_pci_range_parser parser;
+	struct of_pci_range range;
+
+	if (!node)
+		return -EINVAL;
+
+	parser.node = node;
+	parser.pna = of_n_addr_cells(node);
+	parser.np = parser.pna + na + ns;
+
+	parser.range = of_get_property(node, "dma-ranges", &rlen);
+
+	if (!parser.range) {
+		pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
+			 np->full_name);
+		ret = -ENODEV;
+		goto out;
+	}
+
+	parser.end = parser.range + rlen / sizeof(__be32);
+
+	/* TODO: handle multiple DMA windows; for now the last entry wins. */
+	for_each_of_pci_range(&parser, &range) {
+		*dma_addr = range.pci_addr;
+		*size = range.size;
+		*paddr = range.cpu_addr;
+	}
+
+	pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+		 *dma_addr, *paddr, *size);
+
+out:
+	of_node_put(node);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */

 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..907ace0 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
 			struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr,
+			  u64 *paddr, u64 *size);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
@@ -83,6 +84,11 @@ static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
 {
 	return -EINVAL;
 }
+
+static inline int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr,
+					u64 *paddr, u64 *size)
+{
+	return -EINVAL;
+}
 #endif

 #if defined(CONFIG_OF) && defined(CONFIG_PCI_MSI)
--
1.9.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH] iommu/dma/pci: account pci host bridge dma_mask for IOVA allocation
@ 2017-03-20  8:57 ` Oza Oza via iommu
  0 siblings, 0 replies; 14+ messages in thread
From: Oza Oza via iommu @ 2017-03-20  8:57 UTC (permalink / raw)
  To: Joerg Roedel, Robin Murphy, linux-pci-u79uwXL29TY76Z2rM5mHXA
  Cc: devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	bcm-kernel-feedback-list-dY08KVG/lbpWk0Htik3J/w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

+  linux-pci

Regards,
Oza.

-----Original Message-----
From: Oza Pawandeep [mailto:oza.oza-dY08KVG/lbpWk0Htik3J/w@public.gmane.org]
Sent: Friday, March 17, 2017 11:41 AM
To: Joerg Roedel; Robin Murphy
Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org; linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org;
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org; devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org;
bcm-kernel-feedback-list-dY08KVG/lbpWk0Htik3J/w@public.gmane.org; Oza Pawandeep
Subject: [RFC PATCH] iommu/dma: account pci host bridge dma_mask for IOVA
allocation

It is possible that PCI device supports 64-bit DMA addressing, and thus
it's driver sets device's dma_mask to DMA_BIT_MASK(64), however PCI host
bridge may have limitations on the inbound transaction addressing. As an
example, consider NVME SSD device connected to iproc-PCIe controller.

Currently, the IOMMU DMA ops only considers PCI device dma_mask when
allocating an IOVA. This is particularly problematic on
ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to PA for
in-bound transactions only after PCI Host has forwarded these transactions
on SOC IO bus. This means on such ARM/ARM64 SOCs the IOVA of in-bound
transactions has to honor the addressing restrictions of the PCI Host.

this patch is inspired by
http://www.mail-archive.com/linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg1306545.html
http://www.spinics.net/lists/arm-kernel/msg566947.html

but above inspiraiton solves the half of the problem.
the rest of the problem is descrbied below, what we face on iproc based
SOCs.

current pcie frmework and of framework integration assumes dma-ranges in a
way where memory-mapped devices define their dma-ranges.
dma-ranges: (child-bus-address, parent-bus-address, length).

but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges.
dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;

of_dma_configure is specifically witten to take care of memory mapped
devices.
but no implementation exists for pci to take care of pcie based memory
ranges.
in fact pci world doesnt seem to define standard dma-ranges since there is
an absense of the same, the dma_mask used to remain 32bit because of
0 size return (parsed by of_dma_configure())

this patch also implements of_pci_get_dma_ranges to cater to pci world
dma-ranges.
so then the returned size get best possible (largest) dma_mask.
for e.g.
dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>; we should get
dev->coherent_dma_mask=0x7fffffffff.

conclusion: there are following problems
1) linux pci and iommu framework integration has glitches with respect to
dma-ranges
2) pci linux framework look very uncertain about dma-ranges, rather
binding is not defined
   the way it is defined for memory mapped devices.
   rcar and iproc based SOCs use their custom one dma-ranges
   (rather can be standard)
3) even if in case of default parser of_dma_get_ranges,:
   it throws and erro"
   "no dma-ranges found for node"
   because of the bug which exists.
   following lines should be moved to the end of while(1)
	839                 node = of_get_next_parent(node);
	840                 if (!node)
	841                         break;

Reviewed-by: Anup Patel <anup.patel-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
Reviewed-by: Scott Branden <scott.branden-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
Signed-off-by: Oza Pawandeep <oza.oza-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index
8c7c244..20cfff7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -217,6 +217,9 @@ config NEED_DMA_MAP_STATE  config NEED_SG_DMA_LENGTH
 	def_bool y

+config ARCH_HAS_DMA_SET_COHERENT_MASK
+	def_bool y
+
 config SMP
 	def_bool y

diff --git a/arch/arm64/include/asm/device.h
b/arch/arm64/include/asm/device.h index 73d5bab..64b4dc3 100644
--- a/arch/arm64/include/asm/device.h
+++ b/arch/arm64/include/asm/device.h
@@ -20,6 +20,7 @@ struct dev_archdata {
 #ifdef CONFIG_IOMMU_API
 	void *iommu;			/* private IOMMU data */
 #endif
+	u64 parent_dma_mask;
 	bool dma_coherent;
 };

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 81cdb2e..5845ecd 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -564,6 +564,7 @@ static void flush_page(struct device *dev, const void
*virt, phys_addr_t phys)
 	__dma_flush_area(virt, PAGE_SIZE);
 }

+
 static void *__iommu_alloc_attrs(struct device *dev, size_t size,
 				 dma_addr_t *handle, gfp_t gfp,
 				 unsigned long attrs)
@@ -795,6 +796,20 @@ static void __iommu_unmap_sg_attrs(struct device
*dev,
 	iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);  }

+static int __iommu_set_dma_mask(struct device *dev, u64 mask) {
+	/* device is not DMA capable */
+	if (!dev->dma_mask)
+		return -EIO;
+
+	if (mask > dev->archdata.parent_dma_mask)
+		mask = dev->archdata.parent_dma_mask;
+
+	*dev->dma_mask = mask;
+
+	return 0;
+}
+
 static const struct dma_map_ops iommu_dma_ops = {
 	.alloc = __iommu_alloc_attrs,
 	.free = __iommu_free_attrs,
@@ -811,8 +826,21 @@ static void __iommu_unmap_sg_attrs(struct device
*dev,
 	.map_resource = iommu_dma_map_resource,
 	.unmap_resource = iommu_dma_unmap_resource,
 	.mapping_error = iommu_dma_mapping_error,
+	.set_dma_mask = __iommu_set_dma_mask,
 };

+int dma_set_coherent_mask(struct device *dev, u64 mask) {
+	if (get_dma_ops(dev) == &iommu_dma_ops &&
+	    mask > dev->archdata.parent_dma_mask)
+		mask = dev->archdata.parent_dma_mask;
+
+	dev->coherent_dma_mask = mask;
+	return 0;
+}
+EXPORT_SYMBOL(dma_set_coherent_mask);
+
+
 /*
  * TODO: Right now __iommu_setup_dma_ops() gets called too early to do
  * everything it needs to - the device is only partially created and the
@@ -975,6 +1003,8 @@ void arch_setup_dma_ops(struct device *dev, u64
dma_base, u64 size,
 	if (!dev->dma_ops)
 		dev->dma_ops = &swiotlb_dma_ops;

+	dev->archdata.parent_dma_mask = size - 1;
+
 	dev->archdata.dma_coherent = coherent;
 	__iommu_setup_dma_ops(dev, dma_base, size, iommu);  } diff --git
a/drivers/of/of_pci.c b/drivers/of/of_pci.c index 0ee42c3..5804717 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,51 @@ int of_pci_get_host_bridge_resources(struct
device_node *dev,
 	return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64
+*paddr, u64 *size) {
+	struct device_node *node = of_node_get(np);
+	int rlen, naddr, nsize, pna;
+	int ret = 0;
+	const int na = 3, ns = 2;
+	struct of_pci_range_parser parser;
+	struct of_pci_range range;
+
+	if (!node)
+		return -EINVAL;
+
+	parser.node = node;
+	parser.pna = of_n_addr_cells(node);
+	parser.np = parser.pna + na + ns;
+
+	parser.range = of_get_property(node, "dma-ranges", &rlen);
+
+	if (!parser.range) {
+		pr_debug("pcie device has no dma-ranges defined for
node(%s)\n", np->full_name);
+		ret = -ENODEV;
+		goto out;
+	}
+
+	parser.end = parser.range + rlen / sizeof(__be32);
+
+	/* how do we take care of multiple dma windows ?. */
+	for_each_of_pci_range(&parser, &range) {
+		*dma_addr = range.pci_addr;
+		*size = range.size;
+		*paddr = range.cpu_addr;
+	}
+
+	pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+		 *dma_addr, *paddr, *size);
+		 *dma_addr = range.pci_addr;
+		 *size = range.size;
+
+out:
+	of_node_put(node);
+	return ret;
+
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */

 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h index
0e0974e..907ace0 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
 			struct list_head *resources, resource_size_t
*io_base);
+int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64
+*paddr, u64 *size);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node
*dev,
 			unsigned char busno, unsigned char bus_max, @@
-83,6 +84,11 @@ static inline int of_pci_get_host_bridge_resources(struct
device_node *dev,  {
 	return -EINVAL;
 }
+
+static inline int of_pci_get_dma_ranges(struct device_node *np, u64
+*dma_addr, u64 *paddr, u64 *size) {
+	return -EINVAL;
+}
 #endif

 #if defined(CONFIG_OF) && defined(CONFIG_PCI_MSI)
--
1.9.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH] iommu/dma/pci: account pci host bridge dma_mask for IOVA allocation
@ 2017-03-20  8:57 ` Oza Oza via iommu
  0 siblings, 0 replies; 14+ messages in thread
From: Oza Oza @ 2017-03-20  8:57 UTC (permalink / raw)
  To: Joerg Roedel, Robin Murphy, linux-pci
  Cc: devicetree, iommu, bcm-kernel-feedback-list, linux-kernel,
	linux-arm-kernel

+  linux-pci

Regards,
Oza.

-----Original Message-----
From: Oza Pawandeep [mailto:oza.oza@broadcom.com]
Sent: Friday, March 17, 2017 11:41 AM
To: Joerg Roedel; Robin Murphy
Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
linux-arm-kernel@lists.infradead.org; devicetree@vger.kernel.org;
bcm-kernel-feedback-list@broadcom.com; Oza Pawandeep
Subject: [RFC PATCH] iommu/dma: account pci host bridge dma_mask for IOVA
allocation

It is possible that PCI device supports 64-bit DMA addressing, and thus
it's driver sets device's dma_mask to DMA_BIT_MASK(64), however PCI host
bridge may have limitations on the inbound transaction addressing. As an
example, consider NVME SSD device connected to iproc-PCIe controller.

Currently, the IOMMU DMA ops only considers PCI device dma_mask when
allocating an IOVA. This is particularly problematic on
ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to PA for
in-bound transactions only after PCI Host has forwarded these transactions
on SOC IO bus. This means on such ARM/ARM64 SOCs the IOVA of in-bound
transactions has to honor the addressing restrictions of the PCI Host.

this patch is inspired by
http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1306545.html
http://www.spinics.net/lists/arm-kernel/msg566947.html

but above inspiraiton solves the half of the problem.
the rest of the problem is descrbied below, what we face on iproc based
SOCs.

current pcie frmework and of framework integration assumes dma-ranges in a
way where memory-mapped devices define their dma-ranges.
dma-ranges: (child-bus-address, parent-bus-address, length).

but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges.
dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;

of_dma_configure is specifically witten to take care of memory mapped
devices.
but no implementation exists for pci to take care of pcie based memory
ranges.
in fact pci world doesnt seem to define standard dma-ranges since there is
an absense of the same, the dma_mask used to remain 32bit because of
0 size return (parsed by of_dma_configure())

this patch also implements of_pci_get_dma_ranges to cater to pci world
dma-ranges.
so then the returned size get best possible (largest) dma_mask.
for e.g.
dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>; we should get
dev->coherent_dma_mask=0x7fffffffff.

conclusion: there are following problems
1) linux pci and iommu framework integration has glitches with respect to
dma-ranges
2) pci linux framework look very uncertain about dma-ranges, rather
binding is not defined
   the way it is defined for memory mapped devices.
   rcar and iproc based SOCs use their custom one dma-ranges
   (rather can be standard)
3) even if in case of default parser of_dma_get_ranges,:
   it throws and erro"
   "no dma-ranges found for node"
   because of the bug which exists.
   following lines should be moved to the end of while(1)
	839                 node = of_get_next_parent(node);
	840                 if (!node)
	841                         break;

Reviewed-by: Anup Patel <anup.patel@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index
8c7c244..20cfff7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -217,6 +217,9 @@ config NEED_DMA_MAP_STATE  config NEED_SG_DMA_LENGTH
 	def_bool y

+config ARCH_HAS_DMA_SET_COHERENT_MASK
+	def_bool y
+
 config SMP
 	def_bool y

diff --git a/arch/arm64/include/asm/device.h
b/arch/arm64/include/asm/device.h index 73d5bab..64b4dc3 100644
--- a/arch/arm64/include/asm/device.h
+++ b/arch/arm64/include/asm/device.h
@@ -20,6 +20,7 @@ struct dev_archdata {
 #ifdef CONFIG_IOMMU_API
 	void *iommu;			/* private IOMMU data */
 #endif
+	u64 parent_dma_mask;
 	bool dma_coherent;
 };

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 81cdb2e..5845ecd 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -564,6 +564,7 @@ static void flush_page(struct device *dev, const void
*virt, phys_addr_t phys)
 	__dma_flush_area(virt, PAGE_SIZE);
 }

+
 static void *__iommu_alloc_attrs(struct device *dev, size_t size,
 				 dma_addr_t *handle, gfp_t gfp,
 				 unsigned long attrs)
@@ -795,6 +796,20 @@ static void __iommu_unmap_sg_attrs(struct device
*dev,
 	iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);  }

+static int __iommu_set_dma_mask(struct device *dev, u64 mask) {
+	/* device is not DMA capable */
+	if (!dev->dma_mask)
+		return -EIO;
+
+	if (mask > dev->archdata.parent_dma_mask)
+		mask = dev->archdata.parent_dma_mask;
+
+	*dev->dma_mask = mask;
+
+	return 0;
+}
+
 static const struct dma_map_ops iommu_dma_ops = {
 	.alloc = __iommu_alloc_attrs,
 	.free = __iommu_free_attrs,
@@ -811,8 +826,21 @@ static void __iommu_unmap_sg_attrs(struct device
*dev,
 	.map_resource = iommu_dma_map_resource,
 	.unmap_resource = iommu_dma_unmap_resource,
 	.mapping_error = iommu_dma_mapping_error,
+	.set_dma_mask = __iommu_set_dma_mask,
 };

+int dma_set_coherent_mask(struct device *dev, u64 mask) {
+	if (get_dma_ops(dev) == &iommu_dma_ops &&
+	    mask > dev->archdata.parent_dma_mask)
+		mask = dev->archdata.parent_dma_mask;
+
+	dev->coherent_dma_mask = mask;
+	return 0;
+}
+EXPORT_SYMBOL(dma_set_coherent_mask);
+
+
 /*
  * TODO: Right now __iommu_setup_dma_ops() gets called too early to do
  * everything it needs to - the device is only partially created and the
@@ -975,6 +1003,8 @@ void arch_setup_dma_ops(struct device *dev, u64
dma_base, u64 size,
 	if (!dev->dma_ops)
 		dev->dma_ops = &swiotlb_dma_ops;

+	dev->archdata.parent_dma_mask = size - 1;
+
 	dev->archdata.dma_coherent = coherent;
 	__iommu_setup_dma_ops(dev, dma_base, size, iommu);  } diff --git
a/drivers/of/of_pci.c b/drivers/of/of_pci.c index 0ee42c3..5804717 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,51 @@ int of_pci_get_host_bridge_resources(struct
device_node *dev,
 	return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64
+*paddr, u64 *size) {
+	struct device_node *node = of_node_get(np);
+	int rlen, naddr, nsize, pna;
+	int ret = 0;
+	const int na = 3, ns = 2;
+	struct of_pci_range_parser parser;
+	struct of_pci_range range;
+
+	if (!node)
+		return -EINVAL;
+
+	parser.node = node;
+	parser.pna = of_n_addr_cells(node);
+	parser.np = parser.pna + na + ns;
+
+	parser.range = of_get_property(node, "dma-ranges", &rlen);
+
+	if (!parser.range) {
+		pr_debug("pcie device has no dma-ranges defined for
node(%s)\n", np->full_name);
+		ret = -ENODEV;
+		goto out;
+	}
+
+	parser.end = parser.range + rlen / sizeof(__be32);
+
+	/* how do we take care of multiple dma windows ?. */
+	for_each_of_pci_range(&parser, &range) {
+		*dma_addr = range.pci_addr;
+		*size = range.size;
+		*paddr = range.cpu_addr;
+	}
+
+	pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+		 *dma_addr, *paddr, *size);
+		 *dma_addr = range.pci_addr;
+		 *size = range.size;
+
+out:
+	of_node_put(node);
+	return ret;
+
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */

 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h index
0e0974e..907ace0 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
 			struct list_head *resources, resource_size_t
*io_base);
+int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64
+*paddr, u64 *size);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node
*dev,
 			unsigned char busno, unsigned char bus_max, @@
-83,6 +84,11 @@ static inline int of_pci_get_host_bridge_resources(struct
device_node *dev,  {
 	return -EINVAL;
 }
+
+static inline int of_pci_get_dma_ranges(struct device_node *np, u64
+*dma_addr, u64 *paddr, u64 *size) {
+	return -EINVAL;
+}
 #endif

 #if defined(CONFIG_OF) && defined(CONFIG_PCI_MSI)
--
1.9.1

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH] iommu/dma/pci: account pci host bridge dma_mask for IOVA allocation
@ 2017-03-20  8:57 ` Oza Oza via iommu
  0 siblings, 0 replies; 14+ messages in thread
From: Oza Oza @ 2017-03-20  8:57 UTC (permalink / raw)
  To: linux-arm-kernel

+  linux-pci

Regards,
Oza.

-----Original Message-----
From: Oza Pawandeep [mailto:oza.oza at broadcom.com]
Sent: Friday, March 17, 2017 11:41 AM
To: Joerg Roedel; Robin Murphy
Cc: iommu at lists.linux-foundation.org; linux-kernel at vger.kernel.org;
linux-arm-kernel at lists.infradead.org; devicetree at vger.kernel.org;
bcm-kernel-feedback-list at broadcom.com; Oza Pawandeep
Subject: [RFC PATCH] iommu/dma: account pci host bridge dma_mask for IOVA
allocation

It is possible that PCI device supports 64-bit DMA addressing, and thus
it's driver sets device's dma_mask to DMA_BIT_MASK(64), however PCI host
bridge may have limitations on the inbound transaction addressing. As an
example, consider NVME SSD device connected to iproc-PCIe controller.

Currently, the IOMMU DMA ops only considers PCI device dma_mask when
allocating an IOVA. This is particularly problematic on
ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to PA for
in-bound transactions only after PCI Host has forwarded these transactions
on SOC IO bus. This means on such ARM/ARM64 SOCs the IOVA of in-bound
transactions has to honor the addressing restrictions of the PCI Host.

this patch is inspired by
http://www.mail-archive.com/linux-kernel at vger.kernel.org/msg1306545.html
http://www.spinics.net/lists/arm-kernel/msg566947.html

but above inspiraiton solves the half of the problem.
the rest of the problem is descrbied below, what we face on iproc based
SOCs.

current pcie frmework and of framework integration assumes dma-ranges in a
way where memory-mapped devices define their dma-ranges.
dma-ranges: (child-bus-address, parent-bus-address, length).

but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges.
dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;

of_dma_configure is specifically witten to take care of memory mapped
devices.
but no implementation exists for pci to take care of pcie based memory
ranges.
in fact pci world doesnt seem to define standard dma-ranges since there is
an absense of the same, the dma_mask used to remain 32bit because of
0 size return (parsed by of_dma_configure())

this patch also implements of_pci_get_dma_ranges to cater to pci world
dma-ranges.
so then the returned size get best possible (largest) dma_mask.
for e.g.
dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>; we should get
dev->coherent_dma_mask=0x7fffffffff.

conclusion: there are following problems
1) linux pci and iommu framework integration has glitches with respect to
dma-ranges
2) pci linux framework look very uncertain about dma-ranges, rather
binding is not defined
   the way it is defined for memory mapped devices.
   rcar and iproc based SOCs use their custom one dma-ranges
   (rather can be standard)
3) even if in case of default parser of_dma_get_ranges,:
   it throws and erro"
   "no dma-ranges found for node"
   because of the bug which exists.
   following lines should be moved to the end of while(1)
	839                 node = of_get_next_parent(node);
	840                 if (!node)
	841                         break;

Reviewed-by: Anup Patel <anup.patel@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index
8c7c244..20cfff7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -217,6 +217,9 @@ config NEED_DMA_MAP_STATE  config NEED_SG_DMA_LENGTH
 	def_bool y

+config ARCH_HAS_DMA_SET_COHERENT_MASK
+	def_bool y
+
 config SMP
 	def_bool y

diff --git a/arch/arm64/include/asm/device.h
b/arch/arm64/include/asm/device.h index 73d5bab..64b4dc3 100644
--- a/arch/arm64/include/asm/device.h
+++ b/arch/arm64/include/asm/device.h
@@ -20,6 +20,7 @@ struct dev_archdata {
 #ifdef CONFIG_IOMMU_API
 	void *iommu;			/* private IOMMU data */
 #endif
+	u64 parent_dma_mask;
 	bool dma_coherent;
 };

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 81cdb2e..5845ecd 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -564,6 +564,7 @@ static void flush_page(struct device *dev, const void
*virt, phys_addr_t phys)
 	__dma_flush_area(virt, PAGE_SIZE);
 }

+
 static void *__iommu_alloc_attrs(struct device *dev, size_t size,
 				 dma_addr_t *handle, gfp_t gfp,
 				 unsigned long attrs)
@@ -795,6 +796,20 @@ static void __iommu_unmap_sg_attrs(struct device
*dev,
 	iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);  }

+static int __iommu_set_dma_mask(struct device *dev, u64 mask) {
+	/* device is not DMA capable */
+	if (!dev->dma_mask)
+		return -EIO;
+
+	if (mask > dev->archdata.parent_dma_mask)
+		mask = dev->archdata.parent_dma_mask;
+
+	*dev->dma_mask = mask;
+
+	return 0;
+}
+
 static const struct dma_map_ops iommu_dma_ops = {
 	.alloc = __iommu_alloc_attrs,
 	.free = __iommu_free_attrs,
@@ -811,8 +826,21 @@ static void __iommu_unmap_sg_attrs(struct device
*dev,
 	.map_resource = iommu_dma_map_resource,
 	.unmap_resource = iommu_dma_unmap_resource,
 	.mapping_error = iommu_dma_mapping_error,
+	.set_dma_mask = __iommu_set_dma_mask,
 };

+int dma_set_coherent_mask(struct device *dev, u64 mask) {
+	if (get_dma_ops(dev) == &iommu_dma_ops &&
+	    mask > dev->archdata.parent_dma_mask)
+		mask = dev->archdata.parent_dma_mask;
+
+	dev->coherent_dma_mask = mask;
+	return 0;
+}
+EXPORT_SYMBOL(dma_set_coherent_mask);
+
+
 /*
  * TODO: Right now __iommu_setup_dma_ops() gets called too early to do
  * everything it needs to - the device is only partially created and the
@@ -975,6 +1003,8 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 	if (!dev->dma_ops)
 		dev->dma_ops = &swiotlb_dma_ops;

+	dev->archdata.parent_dma_mask = size - 1;
+
 	dev->archdata.dma_coherent = coherent;
 	__iommu_setup_dma_ops(dev, dma_base, size, iommu);
 }
diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..5804717 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,49 @@ int of_pci_get_host_bridge_resources(struct device_node *dev,
 	return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *size)
+{
+	struct device_node *node = of_node_get(np);
+	int rlen;
+	int ret = 0;
+	const int na = 3, ns = 2;
+	struct of_pci_range_parser parser;
+	struct of_pci_range range;
+
+	if (!node)
+		return -EINVAL;
+
+	parser.node = node;
+	parser.pna = of_n_addr_cells(node);
+	parser.np = parser.pna + na + ns;
+
+	parser.range = of_get_property(node, "dma-ranges", &rlen);
+
+	if (!parser.range) {
+		pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
+			 np->full_name);
+		ret = -ENODEV;
+		goto out;
+	}
+
+	parser.end = parser.range + rlen / sizeof(__be32);
+
+	/* how do we take care of multiple dma windows? */
+	for_each_of_pci_range(&parser, &range) {
+		*dma_addr = range.pci_addr;
+		*size = range.size;
+		*paddr = range.cpu_addr;
+	}
+
+	pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+		 *dma_addr, *paddr, *size);
+
+out:
+	of_node_put(node);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */

 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..907ace0 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
 			struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *size);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
@@ -83,6 +84,11 @@ static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
 {
 	return -EINVAL;
 }
+
+static inline int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *size)
+{
+	return -EINVAL;
+}
 #endif

 #if defined(CONFIG_OF) && defined(CONFIG_PCI_MSI)
--
1.9.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] iommu/dma/pci: account pci host bridge dma_mask for IOVA allocation
  2017-03-20  8:57 ` Oza Oza via iommu
@ 2017-03-20 15:43   ` Robin Murphy
  -1 siblings, 0 replies; 14+ messages in thread
From: Robin Murphy @ 2017-03-20 15:43 UTC (permalink / raw)
  To: Oza Oza
  Cc: Joerg Roedel, linux-pci, iommu, linux-kernel, linux-arm-kernel,
	devicetree, bcm-kernel-feedback-list

On 20/03/17 08:57, Oza Oza wrote:
> +  linux-pci
> 
> Regards,
> Oza.
> 
> -----Original Message-----
> From: Oza Pawandeep [mailto:oza.oza@broadcom.com]
> Sent: Friday, March 17, 2017 11:41 AM
> To: Joerg Roedel; Robin Murphy
> Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> linux-arm-kernel@lists.infradead.org; devicetree@vger.kernel.org;
> bcm-kernel-feedback-list@broadcom.com; Oza Pawandeep
> Subject: [RFC PATCH] iommu/dma: account pci host bridge dma_mask for IOVA
> allocation
> 
> It is possible that a PCI device supports 64-bit DMA addressing, and thus
> its driver sets the device's dma_mask to DMA_BIT_MASK(64); however, the
> PCI host bridge may have limitations on inbound transaction addressing.
> As an example, consider an NVMe SSD connected to an iproc PCIe controller.
> 
> Currently, the IOMMU DMA ops only considers PCI device dma_mask when
> allocating an IOVA. This is particularly problematic on
> ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to PA for
> in-bound transactions only after PCI Host has forwarded these transactions
> on SOC IO bus. This means on such ARM/ARM64 SOCs the IOVA of in-bound
> transactions has to honor the addressing restrictions of the PCI Host.
> 
> this patch is inspired by
> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1306545.html
> http://www.spinics.net/lists/arm-kernel/msg566947.html
> 
> but the above inspiration solves only half of the problem.
> the rest of the problem, which we face on iproc-based SOCs, is described
> below.
> 
> the current pcie framework and OF framework integration assumes dma-ranges
> in a way where memory-mapped devices define their dma-ranges:
> dma-ranges: (child-bus-address, parent-bus-address, length).
> 
> but iproc-based SOCs, and even Rcar-based SOCs, have PCI-world dma-ranges:
> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
> 
> of_dma_configure is specifically written to take care of memory-mapped
> devices, but no implementation exists for pci to take care of pcie-based
> memory ranges. in fact, the pci world doesn't seem to define standard
> dma-ranges; in their absence, the dma_mask used to remain 32-bit because
> of_dma_configure() parsed a size of 0.
> 
> this patch also implements of_pci_get_dma_ranges to cater to pci-world
> dma-ranges, so that the returned size yields the best possible (largest)
> dma_mask. e.g. for
> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>; we should get
> dev->coherent_dma_mask=0x7fffffffff.
> 
> conclusion: there are the following problems
> 1) linux pci and iommu framework integration has glitches with respect to
>    dma-ranges
> 2) the linux pci framework looks very uncertain about dma-ranges; the
>    binding is not defined the way it is defined for memory-mapped devices.
>    rcar- and iproc-based SOCs each use their own custom dma-ranges
>    (which could instead be standard)
> 3) even in the case of the default parser, of_dma_get_range(), it throws
>    the error "no dma-ranges found for node" because of an existing bug.
>    the following lines should be moved to the end of the while(1) loop:
> 	839                 node = of_get_next_parent(node);
> 	840                 if (!node)
> 	841                         break;

Right, having made sense of this and looked into things myself I think I
understand now; what this boils down to is that the existing
implementation of of_dma_get_range() expects always to be given a leaf
device_node, and doesn't cope with being given a device_node for the
given device's parent bus directly. That's really all there is; it's not
specific to PCI (there are other probeable and DMA-capable buses whose
children aren't described in DT, like the fsl-mc thing), and it
definitely doesn't have anything to do with IOMMUs.

Now, that's certainly something to fix, but AFAICS this patch doesn't do
that, only adds some PCI-specific code which is never called.

DMA mask inheritance for arm64 is another issue, which again is general,
but does tend to be more visible in the IOMMU case. That still needs
some work on the ACPI side - all the DT-centric approaches so far either
regress or at best do nothing for ACPI. I've made a note to try to look
into that soon, but from what I recall I fear there is still an open
question about what to do for a default in the absence of IORT or _DMA
(once the current assumption that drivers can override our arbitrary
default at will is closed down).

In the meantime, have you tried 4.11-rc1 or later on the affected
system? One of the ulterior motives behind 122fac030e91 was that in many
cases it also happens to paper over most versions of this problem for
PCI devices, and makes the IOMMU at least useable (on systems which
don't need to dma_map_*() vast amounts of RAM all at once) while we fix
the underlying things properly.

Robin.

> Reviewed-by: Anup Patel <anup.patel@broadcom.com>
> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] iommu/dma/pci: account pci host bridge dma_mask for IOVA allocation
@ 2017-03-20 15:43   ` Robin Murphy
  0 siblings, 0 replies; 14+ messages in thread
From: Robin Murphy @ 2017-03-20 15:43 UTC (permalink / raw)
  To: Oza Oza
  Cc: devicetree, linux-pci, Joerg Roedel, linux-kernel, iommu,
	bcm-kernel-feedback-list, linux-arm-kernel

On 20/03/17 08:57, Oza Oza wrote:
> +  linux-pci
> 
> Regards,
> Oza.
> 
> -----Original Message-----
> From: Oza Pawandeep [mailto:oza.oza@broadcom.com]
> Sent: Friday, March 17, 2017 11:41 AM
> To: Joerg Roedel; Robin Murphy
> Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> linux-arm-kernel@lists.infradead.org; devicetree@vger.kernel.org;
> bcm-kernel-feedback-list@broadcom.com; Oza Pawandeep
> Subject: [RFC PATCH] iommu/dma: account pci host bridge dma_mask for IOVA
> allocation
> 
> It is possible that PCI device supports 64-bit DMA addressing, and thus
> it's driver sets device's dma_mask to DMA_BIT_MASK(64), however PCI host
> bridge may have limitations on the inbound transaction addressing. As an
> example, consider NVME SSD device connected to iproc-PCIe controller.
> 
> Currently, the IOMMU DMA ops only considers PCI device dma_mask when
> allocating an IOVA. This is particularly problematic on
> ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to PA for
> in-bound transactions only after PCI Host has forwarded these transactions
> on SOC IO bus. This means on such ARM/ARM64 SOCs the IOVA of in-bound
> transactions has to honor the addressing restrictions of the PCI Host.
> 
> this patch is inspired by
> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1306545.html
> http://www.spinics.net/lists/arm-kernel/msg566947.html
> 
> but above inspiraiton solves the half of the problem.
> the rest of the problem is descrbied below, what we face on iproc based
> SOCs.
> 
> current pcie frmework and of framework integration assumes dma-ranges in a
> way where memory-mapped devices define their dma-ranges.
> dma-ranges: (child-bus-address, parent-bus-address, length).
> 
> but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges.
> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
> 
> of_dma_configure is specifically witten to take care of memory mapped
> devices.
> but no implementation exists for pci to take care of pcie based memory
> ranges.
> in fact pci world doesnt seem to define standard dma-ranges since there is
> an absense of the same, the dma_mask used to remain 32bit because of
> 0 size return (parsed by of_dma_configure())
> 
> this patch also implements of_pci_get_dma_ranges to cater to pci world
> dma-ranges.
> so then the returned size get best possible (largest) dma_mask.
> for e.g.
> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>; we should get
> dev->coherent_dma_mask=0x7fffffffff.
> 
> conclusion: there are following problems
> 1) linux pci and iommu framework integration has glitches with respect to
> dma-ranges
> 2) pci linux framework look very uncertain about dma-ranges, rather
> binding is not defined
>    the way it is defined for memory mapped devices.
>    rcar and iproc based SOCs use their custom one dma-ranges
>    (rather can be standard)
> 3) even if in case of default parser of_dma_get_ranges,:
>    it throws and erro"
>    "no dma-ranges found for node"
>    because of the bug which exists.
>    following lines should be moved to the end of while(1)
> 	839                 node = of_get_next_parent(node);
> 	840                 if (!node)
> 	841                         break;

Right, having made sense of this and looked into things myself I think I
understand now; what this boils down to is that the existing
implementation of of_dma_get_range() expects always to be given a leaf
device_node, and doesn't cope with being given a device_node for the
given device's parent bus directly. That's really all there is; it's not
specific to PCI (there are other probeable and DMA-capable buses whose
children aren't described in DT, like the fsl-mc thing), and it
definitely doesn't have anything to do with IOMMUs.

Now, that's certainly something to fix, but AFAICS this patch doesn't do
that, only adds some PCI-specific code which is never called.

DMA mask inheritance for arm64 is another issue, which again is general,
but does tend to be more visible in the IOMMU case. That still needs
some work on the APCI side - all the DT-centric approaches so far either
regress or at best do nothing for ACPI. I've made a note to try to look
into that soon, but from what I recall I fear there is still an open
question about what to do for a default in the absence of IORT or _DMA
(once the current assumption that drivers can override our arbitrary
default at will is closed down).

In the meantime, have you tried 4.11-rc1 or later on the affected
system? One of the ulterior motives behind 122fac030e91 was that in many
cases it also happens to paper over most versions of this problem for
PCI devices, and makes the IOMMU at least useable (on systems which
don't need to dma_map_*() vast amounts of RAM all at once) while we fix
the underlying things properly.

Robin.

> Reviewed-by: Anup Patel <anup.patel@broadcom.com>
> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index
> 8c7c244..20cfff7 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -217,6 +217,9 @@ config NEED_DMA_MAP_STATE  config NEED_SG_DMA_LENGTH
>  	def_bool y
> 
> +config ARCH_HAS_DMA_SET_COHERENT_MASK
> +	def_bool y
> +
>  config SMP
>  	def_bool y
> 
> diff --git a/arch/arm64/include/asm/device.h
> b/arch/arm64/include/asm/device.h index 73d5bab..64b4dc3 100644
> --- a/arch/arm64/include/asm/device.h
> +++ b/arch/arm64/include/asm/device.h
> @@ -20,6 +20,7 @@ struct dev_archdata {
>  #ifdef CONFIG_IOMMU_API
>  	void *iommu;			/* private IOMMU data */
>  #endif
> +	u64 parent_dma_mask;
>  	bool dma_coherent;
>  };
> 
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 81cdb2e..5845ecd 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -564,6 +564,7 @@ static void flush_page(struct device *dev, const void
> *virt, phys_addr_t phys)
>  	__dma_flush_area(virt, PAGE_SIZE);
>  }
> 
> +
>  static void *__iommu_alloc_attrs(struct device *dev, size_t size,
>  				 dma_addr_t *handle, gfp_t gfp,
>  				 unsigned long attrs)
> @@ -795,6 +796,20 @@ static void __iommu_unmap_sg_attrs(struct device
> *dev,
>  	iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);  }
> 
> +static int __iommu_set_dma_mask(struct device *dev, u64 mask) {
> +	/* device is not DMA capable */
> +	if (!dev->dma_mask)
> +		return -EIO;
> +
> +	if (mask > dev->archdata.parent_dma_mask)
> +		mask = dev->archdata.parent_dma_mask;
> +
> +	*dev->dma_mask = mask;
> +
> +	return 0;
> +}
> +
>  static const struct dma_map_ops iommu_dma_ops = {
>  	.alloc = __iommu_alloc_attrs,
>  	.free = __iommu_free_attrs,
> @@ -811,8 +826,21 @@ static void __iommu_unmap_sg_attrs(struct device
> *dev,
>  	.map_resource = iommu_dma_map_resource,
>  	.unmap_resource = iommu_dma_unmap_resource,
>  	.mapping_error = iommu_dma_mapping_error,
> +	.set_dma_mask = __iommu_set_dma_mask,
>  };
> 
> +int dma_set_coherent_mask(struct device *dev, u64 mask) {
> +	if (get_dma_ops(dev) == &iommu_dma_ops &&
> +	    mask > dev->archdata.parent_dma_mask)
> +		mask = dev->archdata.parent_dma_mask;
> +
> +	dev->coherent_dma_mask = mask;
> +	return 0;
> +}
> +EXPORT_SYMBOL(dma_set_coherent_mask);
> +
> +
>  /*
>   * TODO: Right now __iommu_setup_dma_ops() gets called too early to do
>   * everything it needs to - the device is only partially created and the
> @@ -975,6 +1003,8 @@ void arch_setup_dma_ops(struct device *dev, u64
> dma_base, u64 size,
>  	if (!dev->dma_ops)
>  		dev->dma_ops = &swiotlb_dma_ops;
> 
> +	dev->archdata.parent_dma_mask = size - 1;
> +
>  	dev->archdata.dma_coherent = coherent;
>  	__iommu_setup_dma_ops(dev, dma_base, size, iommu);  } diff --git
> a/drivers/of/of_pci.c b/drivers/of/of_pci.c index 0ee42c3..5804717 100644
> --- a/drivers/of/of_pci.c
> +++ b/drivers/of/of_pci.c
> @@ -283,6 +283,51 @@ int of_pci_get_host_bridge_resources(struct
> device_node *dev,
>  	return err;
>  }
>  EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
> +
> +int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64
> +*paddr, u64 *size) {
> +	struct device_node *node = of_node_get(np);
> +	int rlen, naddr, nsize, pna;
> +	int ret = 0;
> +	const int na = 3, ns = 2;
> +	struct of_pci_range_parser parser;
> +	struct of_pci_range range;
> +
> +	if (!node)
> +		return -EINVAL;
> +
> +	parser.node = node;
> +	parser.pna = of_n_addr_cells(node);
> +	parser.np = parser.pna + na + ns;
> +
> +	parser.range = of_get_property(node, "dma-ranges", &rlen);
> +
> +	if (!parser.range) {
> +		pr_debug("pcie device has no dma-ranges defined for
> node(%s)\n", np->full_name);
> +		ret = -ENODEV;
> +		goto out;
> +	}
> +
> +	parser.end = parser.range + rlen / sizeof(__be32);
> +
> +	/* how do we take care of multiple dma windows ?. */
> +	for_each_of_pci_range(&parser, &range) {
> +		*dma_addr = range.pci_addr;
> +		*size = range.size;
> +		*paddr = range.cpu_addr;
> +	}
> +
> +	pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
> +		 *dma_addr, *paddr, *size);
> +		 *dma_addr = range.pci_addr;
> +		 *size = range.size;
> +
> +out:
> +	of_node_put(node);
> +	return ret;
> +
> +}
> +EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
>  #endif /* CONFIG_OF_ADDRESS */
> 
>  #ifdef CONFIG_PCI_MSI
> diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h index
> 0e0974e..907ace0 100644
> --- a/include/linux/of_pci.h
> +++ b/include/linux/of_pci.h
> @@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
> int of_pci_get_host_bridge_resources(struct device_node *dev,
>  			unsigned char busno, unsigned char bus_max,
>  			struct list_head *resources, resource_size_t
> *io_base);
> +int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64
> +*paddr, u64 *size);
>  #else
>  static inline int of_pci_get_host_bridge_resources(struct device_node
> *dev,
>  			unsigned char busno, unsigned char bus_max, @@
> -83,6 +84,11 @@ static inline int of_pci_get_host_bridge_resources(struct
> device_node *dev,  {
>  	return -EINVAL;
>  }
> +
> +static inline int of_pci_get_dma_ranges(struct device_node *np, u64
> +*dma_addr, u64 *paddr, u64 *size) {
> +	return -EINVAL;
> +}
>  #endif
> 
>  #if defined(CONFIG_OF) && defined(CONFIG_PCI_MSI)
> --
> 1.9.1
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC PATCH] iommu/dma/pci: account pci host bridge dma_mask for IOVA allocation
@ 2017-03-20 15:43   ` Robin Murphy
  0 siblings, 0 replies; 14+ messages in thread
From: Robin Murphy @ 2017-03-20 15:43 UTC (permalink / raw)
  To: linux-arm-kernel

On 20/03/17 08:57, Oza Oza wrote:
> +  linux-pci
> 
> Regards,
> Oza.
> 
> -----Original Message-----
> From: Oza Pawandeep [mailto:oza.oza at broadcom.com]
> Sent: Friday, March 17, 2017 11:41 AM
> To: Joerg Roedel; Robin Murphy
> Cc: iommu at lists.linux-foundation.org; linux-kernel at vger.kernel.org;
> linux-arm-kernel at lists.infradead.org; devicetree at vger.kernel.org;
> bcm-kernel-feedback-list at broadcom.com; Oza Pawandeep
> Subject: [RFC PATCH] iommu/dma: account pci host bridge dma_mask for IOVA
> allocation
> 
> It is possible that PCI device supports 64-bit DMA addressing, and thus
> it's driver sets device's dma_mask to DMA_BIT_MASK(64), however PCI host
> bridge may have limitations on the inbound transaction addressing. As an
> example, consider NVME SSD device connected to iproc-PCIe controller.
> 
> Currently, the IOMMU DMA ops only considers PCI device dma_mask when
> allocating an IOVA. This is particularly problematic on
> ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to PA for
> in-bound transactions only after PCI Host has forwarded these transactions
> on SOC IO bus. This means on such ARM/ARM64 SOCs the IOVA of in-bound
> transactions has to honor the addressing restrictions of the PCI Host.
> 
> this patch is inspired by
> http://www.mail-archive.com/linux-kernel at vger.kernel.org/msg1306545.html
> http://www.spinics.net/lists/arm-kernel/msg566947.html
> 
> but above inspiraiton solves the half of the problem.
> the rest of the problem is descrbied below, what we face on iproc based
> SOCs.
> 
> current pcie frmework and of framework integration assumes dma-ranges in a
> way where memory-mapped devices define their dma-ranges.
> dma-ranges: (child-bus-address, parent-bus-address, length).
> 
> but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges.
> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
> 
> of_dma_configure is specifically witten to take care of memory mapped
> devices.
> but no implementation exists for pci to take care of pcie based memory
> ranges.
> in fact pci world doesnt seem to define standard dma-ranges since there is
> an absense of the same, the dma_mask used to remain 32bit because of
> 0 size return (parsed by of_dma_configure())
> 
> this patch also implements of_pci_get_dma_ranges to cater to pci world
> dma-ranges.
> so then the returned size get best possible (largest) dma_mask.
> for e.g.
> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>; we should get
> dev->coherent_dma_mask=0x7fffffffff.
> 
> conclusion: there are following problems
> 1) linux pci and iommu framework integration has glitches with respect to
> dma-ranges
> 2) pci linux framework look very uncertain about dma-ranges, rather
> binding is not defined
>    the way it is defined for memory mapped devices.
>    rcar and iproc based SOCs use their custom one dma-ranges
>    (rather can be standard)
> 3) even if in case of default parser of_dma_get_ranges,:
>    it throws and erro"
>    "no dma-ranges found for node"
>    because of the bug which exists.
>    following lines should be moved to the end of while(1)
> 	839                 node = of_get_next_parent(node);
> 	840                 if (!node)
> 	841                         break;

Right, having made sense of this and looked into things myself I think I
understand now; what this boils down to is that the existing
implementation of of_dma_get_range() expects always to be given a leaf
device_node, and doesn't cope with being given a device_node for the
given device's parent bus directly. That's really all there is; it's not
specific to PCI (there are other probeable and DMA-capable buses whose
children aren't described in DT, like the fsl-mc thing), and it
definitely doesn't have anything to do with IOMMUs.

Now, that's certainly something to fix, but AFAICS this patch doesn't do
that, only adds some PCI-specific code which is never called.

DMA mask inheritance for arm64 is another issue, which again is general,
but does tend to be more visible in the IOMMU case. That still needs
some work on the APCI side - all the DT-centric approaches so far either
regress or at best do nothing for ACPI. I've made a note to try to look
into that soon, but from what I recall I fear there is still an open
question about what to do for a default in the absence of IORT or _DMA
(once the current assumption that drivers can override our arbitrary
default at will is closed down).

In the meantime, have you tried 4.11-rc1 or later on the affected
system? One of the ulterior motives behind 122fac030e91 was that in many
cases it also happens to paper over most versions of this problem for
PCI devices, and makes the IOMMU at least useable (on systems which
don't need to dma_map_*() vast amounts of RAM all at once) while we fix
the underlying things properly.

Robin.

> Reviewed-by: Anup Patel <anup.patel@broadcom.com>
> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index
> 8c7c244..20cfff7 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -217,6 +217,9 @@ config NEED_DMA_MAP_STATE  config NEED_SG_DMA_LENGTH
>  	def_bool y
> 
> +config ARCH_HAS_DMA_SET_COHERENT_MASK
> +	def_bool y
> +
>  config SMP
>  	def_bool y
> 
> diff --git a/arch/arm64/include/asm/device.h
> b/arch/arm64/include/asm/device.h index 73d5bab..64b4dc3 100644
> --- a/arch/arm64/include/asm/device.h
> +++ b/arch/arm64/include/asm/device.h
> @@ -20,6 +20,7 @@ struct dev_archdata {
>  #ifdef CONFIG_IOMMU_API
>  	void *iommu;			/* private IOMMU data */
>  #endif
> +	u64 parent_dma_mask;
>  	bool dma_coherent;
>  };
> 
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 81cdb2e..5845ecd 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -564,6 +564,7 @@ static void flush_page(struct device *dev, const void
> *virt, phys_addr_t phys)
>  	__dma_flush_area(virt, PAGE_SIZE);
>  }
> 
> +
>  static void *__iommu_alloc_attrs(struct device *dev, size_t size,
>  				 dma_addr_t *handle, gfp_t gfp,
>  				 unsigned long attrs)
> @@ -795,6 +796,20 @@ static void __iommu_unmap_sg_attrs(struct device
> *dev,
>  	iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);  }
> 
> +static int __iommu_set_dma_mask(struct device *dev, u64 mask) {
> +	/* device is not DMA capable */
> +	if (!dev->dma_mask)
> +		return -EIO;
> +
> +	if (mask > dev->archdata.parent_dma_mask)
> +		mask = dev->archdata.parent_dma_mask;
> +
> +	*dev->dma_mask = mask;
> +
> +	return 0;
> +}
> +
>  static const struct dma_map_ops iommu_dma_ops = {
>  	.alloc = __iommu_alloc_attrs,
>  	.free = __iommu_free_attrs,
> @@ -811,8 +826,21 @@ static void __iommu_unmap_sg_attrs(struct device
> *dev,
>  	.map_resource = iommu_dma_map_resource,
>  	.unmap_resource = iommu_dma_unmap_resource,
>  	.mapping_error = iommu_dma_mapping_error,
> +	.set_dma_mask = __iommu_set_dma_mask,
>  };
> 
> +int dma_set_coherent_mask(struct device *dev, u64 mask) {
> +	if (get_dma_ops(dev) == &iommu_dma_ops &&
> +	    mask > dev->archdata.parent_dma_mask)
> +		mask = dev->archdata.parent_dma_mask;
> +
> +	dev->coherent_dma_mask = mask;
> +	return 0;
> +}
> +EXPORT_SYMBOL(dma_set_coherent_mask);
> +
> +
>  /*
>   * TODO: Right now __iommu_setup_dma_ops() gets called too early to do
>   * everything it needs to - the device is only partially created and the
> @@ -975,6 +1003,8 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>  	if (!dev->dma_ops)
>  		dev->dma_ops = &swiotlb_dma_ops;
> 
> +	dev->archdata.parent_dma_mask = size - 1;
> +
>  	dev->archdata.dma_coherent = coherent;
>  	__iommu_setup_dma_ops(dev, dma_base, size, iommu);
>  }
> 
> diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
> index 0ee42c3..5804717 100644
> --- a/drivers/of/of_pci.c
> +++ b/drivers/of/of_pci.c
> @@ -283,6 +283,51 @@ int of_pci_get_host_bridge_resources(struct device_node *dev,
>  	return err;
>  }
>  EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
> +
> +int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64 *paddr,
> +			  u64 *size)
> +{
> +	struct device_node *node = of_node_get(np);
> +	int rlen, naddr, nsize, pna;
> +	int ret = 0;
> +	const int na = 3, ns = 2;
> +	struct of_pci_range_parser parser;
> +	struct of_pci_range range;
> +
> +	if (!node)
> +		return -EINVAL;
> +
> +	parser.node = node;
> +	parser.pna = of_n_addr_cells(node);
> +	parser.np = parser.pna + na + ns;
> +
> +	parser.range = of_get_property(node, "dma-ranges", &rlen);
> +
> +	if (!parser.range) {
> +		pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
> +			 np->full_name);
> +		ret = -ENODEV;
> +		goto out;
> +	}
> +
> +	parser.end = parser.range + rlen / sizeof(__be32);
> +
> +	/* TODO: how do we take care of multiple DMA windows? */
> +	for_each_of_pci_range(&parser, &range) {
> +		*dma_addr = range.pci_addr;
> +		*size = range.size;
> +		*paddr = range.cpu_addr;
> +	}
> +
> +	pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
> +		 *dma_addr, *paddr, *size);
> +
> +out:
> +	of_node_put(node);
> +	return ret;
> +
> +}
> +EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
>  #endif /* CONFIG_OF_ADDRESS */
> 
>  #ifdef CONFIG_PCI_MSI
> diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
> index 0e0974e..907ace0 100644
> --- a/include/linux/of_pci.h
> +++ b/include/linux/of_pci.h
> @@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
> int of_pci_get_host_bridge_resources(struct device_node *dev,
>  			unsigned char busno, unsigned char bus_max,
>  			struct list_head *resources, resource_size_t *io_base);
> +int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64 *paddr,
> +			  u64 *size);
>  #else
>  static inline int of_pci_get_host_bridge_resources(struct device_node
> *dev,
>  			unsigned char busno, unsigned char bus_max,
> @@ -83,6 +84,11 @@ static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
>  {
>  	return -EINVAL;
>  }
> +
> +static inline int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr,
> +					u64 *paddr, u64 *size)
> +{
> +	return -EINVAL;
> +}
>  #endif
> 
>  #if defined(CONFIG_OF) && defined(CONFIG_PCI_MSI)
> --
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [RFC PATCH] iommu/dma/pci: account pci host bridge dma_mask for IOVA allocation
@ 2017-03-20 17:49     ` Oza Oza via iommu
  0 siblings, 0 replies; 14+ messages in thread
From: Oza Oza @ 2017-03-20 17:49 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Joerg Roedel, linux-pci, iommu, linux-kernel, linux-arm-kernel,
	devicetree, bcm-kernel-feedback-list

Hi Robin,

Please find my comments inline.

Regards,
Oza.

-----Original Message-----
From: Robin Murphy [mailto:robin.murphy@arm.com]
Sent: Monday, March 20, 2017 9:14 PM
To: Oza Oza
Cc: Joerg Roedel; linux-pci@vger.kernel.org;
iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
linux-arm-kernel@lists.infradead.org; devicetree@vger.kernel.org;
bcm-kernel-feedback-list@broadcom.com
Subject: Re: [RFC PATCH] iommu/dma/pci: account pci host bridge dma_mask for
IOVA allocation

On 20/03/17 08:57, Oza Oza wrote:
> +  linux-pci
>
> Regards,
> Oza.
>
> -----Original Message-----
> From: Oza Pawandeep [mailto:oza.oza@broadcom.com]
> Sent: Friday, March 17, 2017 11:41 AM
> To: Joerg Roedel; Robin Murphy
> Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> linux-arm-kernel@lists.infradead.org; devicetree@vger.kernel.org;
> bcm-kernel-feedback-list@broadcom.com; Oza Pawandeep
> Subject: [RFC PATCH] iommu/dma: account pci host bridge dma_mask for
> IOVA allocation
>
> It is possible that a PCI device supports 64-bit DMA addressing, and
> thus its driver sets the device's dma_mask to DMA_BIT_MASK(64); however,
> the PCI host bridge may have limitations on inbound transaction
> addressing. As an example, consider an NVMe SSD connected to the
> iproc-PCIe controller.
>
> Currently, the IOMMU DMA ops only considers PCI device dma_mask when
> allocating an IOVA. This is particularly problematic on
> ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to PA for
> in-bound transactions only after PCI Host has forwarded these
> transactions on SOC IO bus. This means on such ARM/ARM64 SOCs the IOVA
> of in-bound transactions has to honor the addressing restrictions of the
> PCI Host.
>
> this patch is inspired by
> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1306545.ht
> ml http://www.spinics.net/lists/arm-kernel/msg566947.html
>
> but the above inspiration solves only half of the problem.
> the rest of the problem, which we face on iproc-based SOCs, is
> described below.
>
> the current PCIe framework and OF framework integration assumes
> dma-ranges are defined the way memory-mapped devices define them:
> dma-ranges: (child-bus-address, parent-bus-address, length).
>
> but iproc-based SOCs, and even R-Car based SOCs, have PCI-world dma-ranges.
> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
>
> of_dma_configure is specifically written to take care of memory-mapped
> devices, but no implementation exists for PCI to take care of
> PCIe-based memory ranges.
> in fact, the PCI world doesn't seem to define standard dma-ranges; in
> the absence of the same, the dma_mask used to remain 32-bit because a
> size of 0 is returned (as parsed by of_dma_configure()).
>
> this patch also implements of_pci_get_dma_ranges to cater to PCI-world
> dma-ranges, so that the returned size yields the best possible
> (largest) dma_mask.
> for e.g.
> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>; we should get
> dev->coherent_dma_mask=0x7fffffffff.
>
> conclusion: there are the following problems
> 1) linux PCI and IOMMU framework integration has glitches with respect
> to dma-ranges
> 2) the PCI linux framework looks very uncertain about dma-ranges; the
>    binding is not defined the way it is defined for memory-mapped
>    devices.
>    rcar and iproc based SOCs each use their own custom dma-ranges
>    (which could instead be standard)
> 3) even in the case of the default parser, of_dma_get_range():
>    it throws an error:
>    "no dma-ranges found for node"
>    because of an existing bug.
>    the following lines should be moved to the end of the while(1) loop:
> 	839                 node = of_get_next_parent(node);
> 	840                 if (!node)
> 	841                         break;

Right, having made sense of this and looked into things myself I think I
understand now; what this boils down to is that the existing implementation
of of_dma_get_range() expects always to be given a leaf device_node, and
doesn't cope with being given a device_node for the given device's parent
bus directly. That's really all there is; it's not specific to PCI (there
are other probeable and DMA-capable buses whose children aren't described in
DT, like the fsl-mc thing), and it definitely doesn't have anything to do
with IOMMUs.

>Oza: I think it's the other way around; or rather, it is given the leaf
>device node correctly, at least in this case.
>The problem is that of_dma_get_range() jumps to the parent node (node =
>of_get_next_parent(node);) without examining the child.
>I tried to fix that, but even then the dma-ranges parsing code doesn't
>really parse PCI ranges, and the size returned is 0.
>It parses the dma-ranges of memory-mapped devices, whose format is
>different.

Now, that's certainly something to fix, but AFAICS this patch doesn't do
that, only adds some PCI-specific code which is never called.

>Oza: it defines of_pci_get_dma_ranges(), which does get called (my bad:
>I left that call out of this patch-set, probably missed that file, sorry
>about that).
>I have pasted the patch at the end; see drivers/of/device.c.
>Again, this code is specific to the dma-ranges defined by the PCI host,
>which differ from the way memory-mapped devices define their ranges.
>At least that is what the binding document suggests, and the current
>dma-ranges code doesn't parse PCI dma-ranges correctly.

>So this patch fixes that.
>When of_dma_configure() calls of_dma_get_range(), or in this case
>of_pci_get_dma_ranges(), either one should return the size correctly,
>because all the later statements use size to derive dma_mask.
>From there, especially for PCI, it derives the root bridge mask, which
>reflects the addressing limitation of the PCI host bridge.
>Now, the strange thing is that this limitation does not exist for us
>when the IOMMU is disabled, which is expected because our inbound memory
>window is programmed to cover all the available memory in the system,
>which need not be physically contiguous.
>But when the IOMMU is enabled, the IOVA address size becomes the
>limitation, and our max window cannot go beyond 512GB, which is just 39
>bits.
>Having said that, at least the parsing of dma-ranges is broken, and this
>patch is an attempt to fix that.
>Ideally I should split this into a PCI dma-ranges patch and a device
>patch to make it a proper series. Do you think this is the only and
>right way to fix it, or do you have any other opinion?

DMA mask inheritance for arm64 is another issue, which again is general, but
does tend to be more visible in the IOMMU case. That still needs some work
on the ACPI side - all the DT-centric approaches so far either regress or at
best do nothing for ACPI. I've made a note to try to look into that soon,
but from what I recall I fear there is still an open question about what to
do for a default in the absence of IORT or _DMA (once the current assumption
that drivers can override our arbitrary default at will is closed down).

In the meantime, have you tried 4.11-rc1 or later on the affected system?
One of the ulterior motives behind 122fac030e91 was that in many cases it
also happens to paper over most versions of this problem for PCI devices,
and makes the IOMMU at least useable (on systems which don't need to
dma_map_*() vast amounts of RAM all at once) while we fix the underlying
things properly.

Robin.

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 0ee42c3..5804717 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,51 @@ int of_pci_get_host_bridge_resources(struct device_node *dev,
 	return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64 *paddr,
+			  u64 *size)
+{
+	struct device_node *node = of_node_get(np);
+	int rlen, naddr, nsize, pna;
+	int ret = 0;
+	const int na = 3, ns = 2;
+	struct of_pci_range_parser parser;
+	struct of_pci_range range;
+
+	if (!node)
+		return -EINVAL;
+
+	parser.node = node;
+	parser.pna = of_n_addr_cells(node);
+	parser.np = parser.pna + na + ns;
+
+	parser.range = of_get_property(node, "dma-ranges", &rlen);
+
+	if (!parser.range) {
+		pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
+			 np->full_name);
+		ret = -ENODEV;
+		goto out;
+	}
+
+	parser.end = parser.range + rlen / sizeof(__be32);
+
+	/* TODO: how do we take care of multiple DMA windows? */
+	for_each_of_pci_range(&parser, &range) {
+		*dma_addr = range.pci_addr;
+		*size = range.size;
+		*paddr = range.cpu_addr;
+	}
+
+	pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+		 *dma_addr, *paddr, *size);
+
+out:
+	of_node_put(node);
+	return ret;
+
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */

 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 0e0974e..907ace0 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
 int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
 			struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64 *paddr,
+			  u64 *size);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
@@ -83,6 +84,11 @@ static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
 {
 	return -EINVAL;
 }
+
+static inline int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr,
+					u64 *paddr, u64 *size)
+{
+	return -EINVAL;
+}
 #endif

 #if defined(CONFIG_OF) && defined(CONFIG_PCI_MSI)
--
1.9.1

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 8c7c244..20cfff7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -217,6 +217,9 @@ config NEED_DMA_MAP_STATE
 config NEED_SG_DMA_LENGTH
 	def_bool y

+config ARCH_HAS_DMA_SET_COHERENT_MASK
+	def_bool y
+
 config SMP
 	def_bool y

diff --git a/arch/arm64/include/asm/device.h b/arch/arm64/include/asm/device.h
index 73d5bab..64b4dc3 100644
--- a/arch/arm64/include/asm/device.h
+++ b/arch/arm64/include/asm/device.h
@@ -20,6 +20,7 @@ struct dev_archdata {
 #ifdef CONFIG_IOMMU_API
 	void *iommu;			/* private IOMMU data */
 #endif
+	u64 parent_dma_mask;
 	bool dma_coherent;
 };

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 81cdb2e..5845ecd 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -564,6 +564,7 @@ static void flush_page(struct device *dev, const void *virt, phys_addr_t phys)
 	__dma_flush_area(virt, PAGE_SIZE);
 }

+
 static void *__iommu_alloc_attrs(struct device *dev, size_t size,
 				 dma_addr_t *handle, gfp_t gfp,
 				 unsigned long attrs)
@@ -795,6 +796,20 @@ static void __iommu_unmap_sg_attrs(struct device *dev,
 	iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);
 }

+static int __iommu_set_dma_mask(struct device *dev, u64 mask)
+{
+	/* device is not DMA capable */
+	if (!dev->dma_mask)
+		return -EIO;
+
+	if (mask > dev->archdata.parent_dma_mask)
+		mask = dev->archdata.parent_dma_mask;
+
+	*dev->dma_mask = mask;
+
+	return 0;
+}
+
 static const struct dma_map_ops iommu_dma_ops = {
 	.alloc = __iommu_alloc_attrs,
 	.free = __iommu_free_attrs,
@@ -811,8 +826,21 @@ static void __iommu_unmap_sg_attrs(struct device *dev,
 	.map_resource = iommu_dma_map_resource,
 	.unmap_resource = iommu_dma_unmap_resource,
 	.mapping_error = iommu_dma_mapping_error,
+	.set_dma_mask = __iommu_set_dma_mask,
 };

+int dma_set_coherent_mask(struct device *dev, u64 mask)
+{
+	if (get_dma_ops(dev) == &iommu_dma_ops &&
+	    mask > dev->archdata.parent_dma_mask)
+		mask = dev->archdata.parent_dma_mask;
+
+	dev->coherent_dma_mask = mask;
+	return 0;
+}
+EXPORT_SYMBOL(dma_set_coherent_mask);
+
+
 /*
  * TODO: Right now __iommu_setup_dma_ops() gets called too early to do
  * everything it needs to - the device is only partially created and the
@@ -975,6 +1003,8 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 	if (!dev->dma_ops)
 		dev->dma_ops = &swiotlb_dma_ops;

+	dev->archdata.parent_dma_mask = size - 1;
+
 	dev->archdata.dma_coherent = coherent;
 	__iommu_setup_dma_ops(dev, dma_base, size, iommu);
 }

diff --git a/drivers/of/device.c b/drivers/of/device.c
index b1e6beb..10ada4a 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -9,6 +9,7 @@
 #include <linux/module.h>
 #include <linux/mod_devicetable.h>
 #include <linux/slab.h>
+#include <linux/of_pci.h>

 #include <asm/errno.h>
 #include "of_private.h"
@@ -104,7 +105,11 @@ void of_dma_configure(struct device *dev, struct device_node *np)
 	if (!dev->dma_mask)
 		dev->dma_mask = &dev->coherent_dma_mask;

-	ret = of_dma_get_range(np, &dma_addr, &paddr, &size);
+	if (dev_is_pci(dev))
+		ret = of_pci_get_dma_ranges(np, &dma_addr, &paddr, &size);
+	else
+		ret = of_dma_get_range(np, &dma_addr, &paddr, &size);
+
 	if (ret < 0) {
 		dma_addr = offset = 0;
 		size = dev->coherent_dma_mask + 1;
@@ -134,10 +139,8 @@ void of_dma_configure(struct device *dev, struct device_node *np)
 	 * Limit coherent and dma mask based on size and default mask
 	 * set by the driver.
 	 */
-	dev->coherent_dma_mask = min(dev->coherent_dma_mask,
-				     DMA_BIT_MASK(ilog2(dma_addr + size)));
-	*dev->dma_mask = min((*dev->dma_mask),
-			     DMA_BIT_MASK(ilog2(dma_addr + size)));
+	dev->coherent_dma_mask = DMA_BIT_MASK(ilog2(dma_addr + size));
+	*dev->dma_mask = dev->coherent_dma_mask;

 	coherent = of_dma_is_coherent(np);
 	dev_dbg(dev, "device is%sdma coherent\n",
@@ -225,30 +228,6 @@ ssize_t of_device_get_modalias(struct device *dev, char *str, ssize_t len)

 	return tsize;
 }
--
1.9.1

^ permalink raw reply	[flat|nested] 14+ messages in thread


* [RFC PATCH] iommu/dma/pci: account pci host bridge dma_mask for IOVA allocation
@ 2017-03-20 17:49     ` Oza Oza via iommu
  0 siblings, 0 replies; 14+ messages in thread
From: Oza Oza @ 2017-03-20 17:49 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Robin,

Please find my comments inline.

Regards,
Oza.

-----Original Message-----
From: Robin Murphy [mailto:robin.murphy at arm.com]
Sent: Monday, March 20, 2017 9:14 PM
To: Oza Oza
Cc: Joerg Roedel; linux-pci at vger.kernel.org;
iommu at lists.linux-foundation.org; linux-kernel at vger.kernel.org;
linux-arm-kernel at lists.infradead.org; devicetree at vger.kernel.org;
bcm-kernel-feedback-list at broadcom.com
Subject: Re: [RFC PATCH] iommu/dma/pci: account pci host bridge dma_mask for
IOVA allocation

On 20/03/17 08:57, Oza Oza wrote:
> +  linux-pci
>
> Regards,
> Oza.
>
> -----Original Message-----
> From: Oza Pawandeep [mailto:oza.oza at broadcom.com]
> Sent: Friday, March 17, 2017 11:41 AM
> To: Joerg Roedel; Robin Murphy
> Cc: iommu at lists.linux-foundation.org; linux-kernel at vger.kernel.org;
> linux-arm-kernel at lists.infradead.org; devicetree at vger.kernel.org;
> bcm-kernel-feedback-list at broadcom.com; Oza Pawandeep
> Subject: [RFC PATCH] iommu/dma: account pci host bridge dma_mask for
> IOVA allocation
>
> It is possible that a PCI device supports 64-bit DMA addressing, and
> thus its driver sets the device's dma_mask to DMA_BIT_MASK(64); however,
> PCI host bridge may have limitations on the inbound transaction
> addressing. As an example, consider NVME SSD device connected to
> iproc-PCIe controller.
>
> Currently, the IOMMU DMA ops only considers PCI device dma_mask when
> allocating an IOVA. This is particularly problematic on
> ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to PA for
> in-bound transactions only after PCI Host has forwarded these
> transactions on SOC IO bus. This means on such ARM/ARM64 SOCs the IOVA
> of in-bound transactions has to honor the addressing restrictions of the
> PCI Host.
>
> this patch is inspired by
> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1306545.html
> http://www.spinics.net/lists/arm-kernel/msg566947.html
>
> but the above inspiration solves only half of the problem.
> the rest of the problem, described below, is what we face on iproc
> based SOCs.
>
> current pcie framework and of framework integration assumes dma-ranges
> in a way where memory-mapped devices define their dma-ranges.
> dma-ranges: (child-bus-address, parent-bus-address, length).
>
> but iproc based SOCs and even R-Car based SOCs have PCI world dma-ranges.
> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
>
> of_dma_configure is specifically written to take care of memory mapped
> devices.
> but no implementation exists for pci to take care of pcie based memory
> ranges.
> in fact the pci world doesn't seem to define standard dma-ranges; since
> there is an absence of the same, the dma_mask used to remain 32-bit
> because of the
> 0 size returned (parsed by of_dma_configure())
>
> this patch also implements of_pci_get_dma_ranges to cater to pci world
> dma-ranges.
> so then the returned size gets the best possible (largest) dma_mask.
> for e.g.
> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>; we should get
> dev->coherent_dma_mask=0x7fffffffff.
>
> conclusion: there are following problems
> 1) linux pci and iommu framework integration has glitches with respect
> to dma-ranges
> 2) the pci linux framework looks very uncertain about dma-ranges; the
> binding is not defined
>    the way it is defined for memory mapped devices.
>    rcar and iproc based SOCs use their custom one dma-ranges
>    (rather can be standard)
> 3) even in the case of the default parser of_dma_get_range:
>    it throws an error,
>    "no dma-ranges found for node",
>    because of a bug which exists.
>    following lines should be moved to the end of while(1)
> 	839                 node = of_get_next_parent(node);
> 	840                 if (!node)
> 	841                         break;

Right, having made sense of this and looked into things myself I think I
understand now; what this boils down to is that the existing implementation
of of_dma_get_range() expects always to be given a leaf device_node, and
doesn't cope with being given a device_node for the given device's parent
bus directly. That's really all there is; it's not specific to PCI (there
are other probeable and DMA-capable buses whose children aren't described in
DT, like the fsl-mc thing), and it definitely doesn't have anything to do
with IOMMUs.

>Oza: I think it's the other way around; rather, it is given the leaf device
>node correctly, at least in this case.
>The problem is of_dma_get_range jumps to the parent node  <node =
>of_get_next_parent(node);> without examining the child.
>Although I tried to fix it, in that case the dma-ranges parse code
>doesn't really parse pci ranges, and the size returned is 0.
>Rather it parses memory-mapped devices' dma-ranges, and that format is
>different.

Now, that's certainly something to fix, but AFAICS this patch doesn't do
that, only adds some PCI-specific code which is never called.

>Oza: it defines of_pci_get_dma_ranges, which does get called (ahhh... it's
>my bad that I don't have that call in this patch-set; I probably missed that
>file, sorry about that.)
>I have just pasted the patch at the end, check drivers/of/device.c
>Again, this code is specific to dma-ranges defined by the pci host, which
>differs from the way memory-mapped devices define their ranges.
>At least that is the way the binding document suggests, and the current
>dma-ranges parser doesn't parse pci dma-ranges correctly.

>So this patch fixes that.
>of_dma_configure, when it calls of_dma_get_range or, in this case,
>of_pci_get_dma_ranges, should have the size returned correctly by either.
>Because all the later statements make use of size to derive dma_mask.
>And from there, especially for pci, it derives the root bridge mask, which
>reflects the limitations of pci host bridges.
>Now the strange thing is that this limitation does not exist for us when
>IOMMU is disabled, which is expected because our inbound memory window is
>simply programmed to address all the available memory in the system, which
>need not be physically contiguous.
>But when IOMMU is enabled, the IOVA address size becomes a limitation, and
>our max window cannot go beyond 512GB, which is just 39 bits.
> having said that, at least the parsing of dma-ranges is broken, and this
> patch is an attempt to fix that.
> ideally I should be making a pci dma-ranges patch and a device patch to
> make it look like a proper patch. Do you think this is the only and right
> way to fix it,
or do you have any other opinions?

DMA mask inheritance for arm64 is another issue, which again is general, but
does tend to be more visible in the IOMMU case. That still needs some work
on the ACPI side - all the DT-centric approaches so far either regress or at
best do nothing for ACPI. I've made a note to try to look into that soon,
but from what I recall I fear there is still an open question about what to
do for a default in the absence of IORT or _DMA (once the current assumption
that drivers can override our arbitrary default at will is closed down).

In the meantime, have you tried 4.11-rc1 or later on the affected system?
One of the ulterior motives behind 122fac030e91 was that in many cases it
also happens to paper over most versions of this problem for PCI devices,
and makes the IOMMU at least useable (on systems which don't need to
dma_map_*() vast amounts of RAM all at once) while we fix the underlying
things properly.

Robin.

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c index
0ee42c3..5804717 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -283,6 +283,51 @@ int of_pci_get_host_bridge_resources(struct device_node
*dev,
 	return err;
 }
 EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
+
+int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64
+*paddr, u64 *size) {
+	struct device_node *node = of_node_get(np);
+	int rlen, naddr, nsize, pna;
+	int ret = 0;
+	const int na = 3, ns = 2;
+	struct of_pci_range_parser parser;
+	struct of_pci_range range;
+
+	if (!node)
+		return -EINVAL;
+
+	parser.node = node;
+	parser.pna = of_n_addr_cells(node);
+	parser.np = parser.pna + na + ns;
+
+	parser.range = of_get_property(node, "dma-ranges", &rlen);
+
+	if (!parser.range) {
+		pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
np->full_name);
+		ret = -ENODEV;
+		goto out;
+	}
+
+	parser.end = parser.range + rlen / sizeof(__be32);
+
+	/* how do we take care of multiple dma windows ?. */
+	for_each_of_pci_range(&parser, &range) {
+		*dma_addr = range.pci_addr;
+		*size = range.size;
+		*paddr = range.cpu_addr;
+	}
+
+	pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
+		 *dma_addr, *paddr, *size);
+
+out:
+	of_node_put(node);
+	return ret;
+
+}
+EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
 #endif /* CONFIG_OF_ADDRESS */

 #ifdef CONFIG_PCI_MSI
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h index
0e0974e..907ace0 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }  int
of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max,
 			struct list_head *resources, resource_size_t *io_base);
+int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64
+*paddr, u64 *size);
 #else
 static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
 			unsigned char busno, unsigned char bus_max, @@ -83,6 +84,11 @@ static
inline int of_pci_get_host_bridge_resources(struct device_node *dev,  {
 	return -EINVAL;
 }
+
+static inline int of_pci_get_dma_ranges(struct device_node *np, u64
+*dma_addr, u64 *paddr, u64 *size) {
+	return -EINVAL;
+}
 #endif

 #if defined(CONFIG_OF) && defined(CONFIG_PCI_MSI)
--
1.9.1

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 8c7c244..20cfff7
100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -217,6 +217,9 @@ config NEED_DMA_MAP_STATE  config NEED_SG_DMA_LENGTH
 	def_bool y

+config ARCH_HAS_DMA_SET_COHERENT_MASK
+	def_bool y
+
 config SMP
 	def_bool y

diff --git a/arch/arm64/include/asm/device.h
b/arch/arm64/include/asm/device.h index 73d5bab..64b4dc3 100644
--- a/arch/arm64/include/asm/device.h
+++ b/arch/arm64/include/asm/device.h
@@ -20,6 +20,7 @@ struct dev_archdata {
 #ifdef CONFIG_IOMMU_API
 	void *iommu;			/* private IOMMU data */
 #endif
+	u64 parent_dma_mask;
 	bool dma_coherent;
 };

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c index
81cdb2e..5845ecd 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -564,6 +564,7 @@ static void flush_page(struct device *dev, const void
*virt, phys_addr_t phys)
 	__dma_flush_area(virt, PAGE_SIZE);
 }

+
 static void *__iommu_alloc_attrs(struct device *dev, size_t size,
 				 dma_addr_t *handle, gfp_t gfp,
 				 unsigned long attrs)
@@ -795,6 +796,20 @@ static void __iommu_unmap_sg_attrs(struct device *dev,
 	iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);  }

+static int __iommu_set_dma_mask(struct device *dev, u64 mask) {
+	/* device is not DMA capable */
+	if (!dev->dma_mask)
+		return -EIO;
+
+	if (mask > dev->archdata.parent_dma_mask)
+		mask = dev->archdata.parent_dma_mask;
+
+	*dev->dma_mask = mask;
+
+	return 0;
+}
+
 static const struct dma_map_ops iommu_dma_ops = {
 	.alloc = __iommu_alloc_attrs,
 	.free = __iommu_free_attrs,
@@ -811,8 +826,21 @@ static void __iommu_unmap_sg_attrs(struct device *dev,
 	.map_resource = iommu_dma_map_resource,
 	.unmap_resource = iommu_dma_unmap_resource,
 	.mapping_error = iommu_dma_mapping_error,
+	.set_dma_mask = __iommu_set_dma_mask,
 };

+int dma_set_coherent_mask(struct device *dev, u64 mask) {
+	if (get_dma_ops(dev) == &iommu_dma_ops &&
+	    mask > dev->archdata.parent_dma_mask)
+		mask = dev->archdata.parent_dma_mask;
+
+	dev->coherent_dma_mask = mask;
+	return 0;
+}
+EXPORT_SYMBOL(dma_set_coherent_mask);
+
+
 /*
  * TODO: Right now __iommu_setup_dma_ops() gets called too early to do
  * everything it needs to - the device is only partially created and the
@@ -975,6 +1003,8 @@ void arch_setup_dma_ops(struct device *dev, u64
dma_base, u64 size,
 	if (!dev->dma_ops)
 		dev->dma_ops = &swiotlb_dma_ops;

+	dev->archdata.parent_dma_mask = size - 1;
+
 	dev->archdata.dma_coherent = coherent;
 	__iommu_setup_dma_ops(dev, dma_base, size, iommu);  }

diff --git a/drivers/of/device.c b/drivers/of/device.c index
b1e6beb..10ada4a 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -9,6 +9,7 @@
 #include <linux/module.h>
 #include <linux/mod_devicetable.h>
 #include <linux/slab.h>
+#include <linux/of_pci.h>

 #include <asm/errno.h>
 #include "of_private.h"
@@ -104,7 +105,11 @@ void of_dma_configure(struct device *dev, struct
device_node *np)
 	if (!dev->dma_mask)
 		dev->dma_mask = &dev->coherent_dma_mask;

-	ret = of_dma_get_range(np, &dma_addr, &paddr, &size);
+	if (dev_is_pci(dev))
+		ret = of_pci_get_dma_ranges(np, &dma_addr, &paddr, &size);
+	else
+		ret = of_dma_get_range(np, &dma_addr, &paddr, &size);
+
 	if (ret < 0) {
 		dma_addr = offset = 0;
 		size = dev->coherent_dma_mask + 1;
@@ -134,10 +139,8 @@ void of_dma_configure(struct device *dev, struct
device_node *np)
 	 * Limit coherent and dma mask based on size and default mask
 	 * set by the driver.
 	 */
-	dev->coherent_dma_mask = min(dev->coherent_dma_mask,
-				     DMA_BIT_MASK(ilog2(dma_addr + size)));
-	*dev->dma_mask = min((*dev->dma_mask),
-			     DMA_BIT_MASK(ilog2(dma_addr + size)));
+	dev->coherent_dma_mask = DMA_BIT_MASK(ilog2(dma_addr + size));
+	*dev->dma_mask = dev->coherent_dma_mask;

 	coherent = of_dma_is_coherent(np);
 	dev_dbg(dev, "device is%sdma coherent\n",
@@ -225,30 +228,6 @@ ssize_t of_device_get_modalias(struct device *dev, char *str, ssize_t len)

 	return tsize;
 }
--
1.9.1

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [RFC PATCH] iommu/dma/pci: account pci host bridge dma_mask for IOVA allocation
  2017-03-20 15:43   ` Robin Murphy
@ 2017-03-25  5:34     ` Oza Oza
  -1 siblings, 0 replies; 14+ messages in thread
From: Oza Oza @ 2017-03-25  5:34 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Joerg Roedel, linux-pci, iommu, linux-kernel, linux-arm-kernel,
	devicetree, bcm-kernel-feedback-list

Hi Robin,

I have now made 3 separate patches, which give a clearer idea of the
changes; we can continue the discussion there.

Regards,
Oza.

-----Original Message-----
From: Robin Murphy [mailto:robin.murphy@arm.com]
Sent: Monday, March 20, 2017 9:14 PM
To: Oza Oza
Cc: Joerg Roedel; linux-pci@vger.kernel.org;
iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
linux-arm-kernel@lists.infradead.org; devicetree@vger.kernel.org;
bcm-kernel-feedback-list@broadcom.com
Subject: Re: [RFC PATCH] iommu/dma/pci: account pci host bridge dma_mask for
IOVA allocation

On 20/03/17 08:57, Oza Oza wrote:
> +  linux-pci
>
> Regards,
> Oza.
>
> -----Original Message-----
> From: Oza Pawandeep [mailto:oza.oza@broadcom.com]
> Sent: Friday, March 17, 2017 11:41 AM
> To: Joerg Roedel; Robin Murphy
> Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> linux-arm-kernel@lists.infradead.org; devicetree@vger.kernel.org;
> bcm-kernel-feedback-list@broadcom.com; Oza Pawandeep
> Subject: [RFC PATCH] iommu/dma: account pci host bridge dma_mask for
> IOVA allocation
>
> It is possible that a PCI device supports 64-bit DMA addressing, and
> thus its driver sets the device's dma_mask to DMA_BIT_MASK(64); however,
> the PCI host bridge may have limitations on inbound transaction
> addressing. As an example, consider an NVMe SSD connected to an
> iproc-PCIe controller.
>
> Currently, the IOMMU DMA ops only considers PCI device dma_mask when
> allocating an IOVA. This is particularly problematic on
> ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to PA for
> in-bound transactions only after PCI Host has forwarded these
> transactions on SOC IO bus. This means on such ARM/ARM64 SOCs the IOVA
> of in-bound transactions has to honor the addressing restrictions of the
> PCI Host.
>
> This patch is inspired by
> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1306545.html
> http://www.spinics.net/lists/arm-kernel/msg566947.html
>
> However, the above inspiration solves only half of the problem.
> The rest of the problem, which we face on iproc-based SOCs, is
> described below.
>
> The current integration of the PCIe framework and the OF framework
> assumes dma-ranges as defined by memory-mapped devices:
> dma-ranges: (child-bus-address, parent-bus-address, length).
>
> But iproc-based SOCs, and even R-Car-based SOCs, have PCI-style
> dma-ranges:
> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
>
> of_dma_configure() is written specifically to take care of
> memory-mapped devices; no implementation exists for PCI to handle
> PCIe-style memory ranges. In fact, the PCI world does not seem to
> define standard dma-ranges. In the absence of such a binding, the
> dma_mask used to remain 32-bit, because of_dma_configure() parsed a
> size of 0.
>
> This patch also implements of_pci_get_dma_ranges() to cater to
> PCI-style dma-ranges, so that the returned size yields the best
> possible (largest) dma_mask. For example, with
> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>, we should get
> dev->coherent_dma_mask = 0x7fffffffff.
>
> Conclusion: there are the following problems:
> 1) The Linux PCI and IOMMU framework integration has glitches with
>    respect to dma-ranges.
> 2) The Linux PCI framework looks very uncertain about dma-ranges;
>    the binding is not defined the way it is for memory-mapped devices.
>    R-Car and iproc-based SOCs each use their own custom dma-ranges
>    (which could instead be standardized).
> 3) Even with the default parser of_dma_get_range(), it throws the
>    error "no dma-ranges found for node" because of an existing bug:
>    the following lines should be moved to the end of the while(1) loop:
> 	839                 node = of_get_next_parent(node);
> 	840                 if (!node)
> 	841                         break;

Right, having made sense of this and looked into things myself I think I
understand now; what this boils down to is that the existing implementation
of of_dma_get_range() expects always to be given a leaf device_node, and
doesn't cope with being given a device_node for the given device's parent
bus directly. That's really all there is; it's not specific to PCI (there
are other probeable and DMA-capable buses whose children aren't described in
DT, like the fsl-mc thing), and it definitely doesn't have anything to do
with IOMMUs.

Now, that's certainly something to fix, but AFAICS this patch doesn't do
that, only adds some PCI-specific code which is never called.

DMA mask inheritance for arm64 is another issue, which again is general, but
does tend to be more visible in the IOMMU case. That still needs some work
on the ACPI side - all the DT-centric approaches so far either regress or at
best do nothing for ACPI. I've made a note to try to look into that soon,
but from what I recall I fear there is still an open question about what to
do for a default in the absence of IORT or _DMA (once the current assumption
that drivers can override our arbitrary default at will is closed down).

In the meantime, have you tried 4.11-rc1 or later on the affected system?
One of the ulterior motives behind 122fac030e91 was that in many cases it
also happens to paper over most versions of this problem for PCI devices,
and makes the IOMMU at least useable (on systems which don't need to
dma_map_*() vast amounts of RAM all at once) while we fix the underlying
things properly.

Robin.

> Reviewed-by: Anup Patel <anup.patel@broadcom.com>
> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index
> 8c7c244..20cfff7 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -217,6 +217,9 @@ config NEED_DMA_MAP_STATE
>  config NEED_SG_DMA_LENGTH
>  	def_bool y
>
> +config ARCH_HAS_DMA_SET_COHERENT_MASK
> +	def_bool y
> +
>  config SMP
>  	def_bool y
>
> diff --git a/arch/arm64/include/asm/device.h
> b/arch/arm64/include/asm/device.h index 73d5bab..64b4dc3 100644
> --- a/arch/arm64/include/asm/device.h
> +++ b/arch/arm64/include/asm/device.h
> @@ -20,6 +20,7 @@ struct dev_archdata {
>  #ifdef CONFIG_IOMMU_API
>  	void *iommu;			/* private IOMMU data */
>  #endif
> +	u64 parent_dma_mask;
>  	bool dma_coherent;
>  };
>
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 81cdb2e..5845ecd 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -564,6 +564,7 @@ static void flush_page(struct device *dev, const
> void *virt, phys_addr_t phys)
>  	__dma_flush_area(virt, PAGE_SIZE);
>  }
>
> +
>  static void *__iommu_alloc_attrs(struct device *dev, size_t size,
>  				 dma_addr_t *handle, gfp_t gfp,
>  				 unsigned long attrs)
> @@ -795,6 +796,20 @@ static void __iommu_unmap_sg_attrs(struct device
> *dev,
>  	iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);
>  }
>
> +static int __iommu_set_dma_mask(struct device *dev, u64 mask)
> +{
> +	/* device is not DMA capable */
> +	if (!dev->dma_mask)
> +		return -EIO;
> +
> +	if (mask > dev->archdata.parent_dma_mask)
> +		mask = dev->archdata.parent_dma_mask;
> +
> +	*dev->dma_mask = mask;
> +
> +	return 0;
> +}
> +
>  static const struct dma_map_ops iommu_dma_ops = {
>  	.alloc = __iommu_alloc_attrs,
>  	.free = __iommu_free_attrs,
> @@ -811,8 +826,21 @@ static void __iommu_unmap_sg_attrs(struct device
> *dev,
>  	.map_resource = iommu_dma_map_resource,
>  	.unmap_resource = iommu_dma_unmap_resource,
>  	.mapping_error = iommu_dma_mapping_error,
> +	.set_dma_mask = __iommu_set_dma_mask,
>  };
>
> +int dma_set_coherent_mask(struct device *dev, u64 mask)
> +{
> +	if (get_dma_ops(dev) == &iommu_dma_ops &&
> +	    mask > dev->archdata.parent_dma_mask)
> +		mask = dev->archdata.parent_dma_mask;
> +
> +	dev->coherent_dma_mask = mask;
> +	return 0;
> +}
> +EXPORT_SYMBOL(dma_set_coherent_mask);
> +
> +
>  /*
>   * TODO: Right now __iommu_setup_dma_ops() gets called too early to do
>   * everything it needs to - the device is only partially created and
> the @@ -975,6 +1003,8 @@ void arch_setup_dma_ops(struct device *dev,
> u64 dma_base, u64 size,
>  	if (!dev->dma_ops)
>  		dev->dma_ops = &swiotlb_dma_ops;
>
> +	dev->archdata.parent_dma_mask = size - 1;
> +
>  	dev->archdata.dma_coherent = coherent;
>  	__iommu_setup_dma_ops(dev, dma_base, size, iommu);
>  }
>
> diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
> index 0ee42c3..5804717 100644
> --- a/drivers/of/of_pci.c
> +++ b/drivers/of/of_pci.c
> @@ -283,6 +283,51 @@ int of_pci_get_host_bridge_resources(struct
> device_node *dev,
>  	return err;
>  }
>  EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
> +
> +int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *size)
> +{
> +	struct device_node *node = of_node_get(np);
> +	int rlen, naddr, nsize, pna;
> +	int ret = 0;
> +	const int na = 3, ns = 2;
> +	struct of_pci_range_parser parser;
> +	struct of_pci_range range;
> +
> +	if (!node)
> +		return -EINVAL;
> +
> +	parser.node = node;
> +	parser.pna = of_n_addr_cells(node);
> +	parser.np = parser.pna + na + ns;
> +
> +	parser.range = of_get_property(node, "dma-ranges", &rlen);
> +
> +	if (!parser.range) {
> +		pr_debug("pcie device has no dma-ranges defined for node(%s)\n", np->full_name);
> +		ret = -ENODEV;
> +		goto out;
> +	}
> +
> +	parser.end = parser.range + rlen / sizeof(__be32);
> +
> +	/* how do we take care of multiple dma windows ?. */
> +	for_each_of_pci_range(&parser, &range) {
> +		*dma_addr = range.pci_addr;
> +		*size = range.size;
> +		*paddr = range.cpu_addr;
> +	}
> +
> +	pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
> +		 *dma_addr, *paddr, *size);
> +
> +out:
> +	of_node_put(node);
> +	return ret;
> +
> +}
> +EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
>  #endif /* CONFIG_OF_ADDRESS */
>
>  #ifdef CONFIG_PCI_MSI
> diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h index
> 0e0974e..907ace0 100644
> --- a/include/linux/of_pci.h
> +++ b/include/linux/of_pci.h
> @@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
> int of_pci_get_host_bridge_resources(struct device_node *dev,
>  			unsigned char busno, unsigned char bus_max,
>  			struct list_head *resources, resource_size_t *io_base);
> +int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64
> +*paddr, u64 *size);
>  #else
>  static inline int of_pci_get_host_bridge_resources(struct device_node
> *dev,
>  			unsigned char busno, unsigned char bus_max,
> @@ -83,6 +84,11 @@ static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
>  {
>  	return -EINVAL;
>  }
> +
> +static inline int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *size)
> +{
> +	return -EINVAL;
> +}
>  #endif
>
>  #if defined(CONFIG_OF) && defined(CONFIG_PCI_MSI)
> --
> 1.9.1
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC PATCH] iommu/dma/pci: account pci host bridge dma_mask for IOVA allocation
@ 2017-03-25  5:34     ` Oza Oza
  0 siblings, 0 replies; 14+ messages in thread
From: Oza Oza @ 2017-03-25  5:34 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Robin,

I have made 3 separate patches now, which gives clear idea about the
changes.
we can have discussion there.

Regards,
Oza.

-----Original Message-----
From: Robin Murphy [mailto:robin.murphy at arm.com]
Sent: Monday, March 20, 2017 9:14 PM
To: Oza Oza
Cc: Joerg Roedel; linux-pci at vger.kernel.org;
iommu at lists.linux-foundation.org; linux-kernel at vger.kernel.org;
linux-arm-kernel at lists.infradead.org; devicetree at vger.kernel.org;
bcm-kernel-feedback-list at broadcom.com
Subject: Re: [RFC PATCH] iommu/dma/pci: account pci host bridge dma_mask for
IOVA allocation

On 20/03/17 08:57, Oza Oza wrote:
> +  linux-pci
>
> Regards,
> Oza.
>
> -----Original Message-----
> From: Oza Pawandeep [mailto:oza.oza at broadcom.com]
> Sent: Friday, March 17, 2017 11:41 AM
> To: Joerg Roedel; Robin Murphy
> Cc: iommu at lists.linux-foundation.org; linux-kernel at vger.kernel.org;
> linux-arm-kernel at lists.infradead.org; devicetree at vger.kernel.org;
> bcm-kernel-feedback-list at broadcom.com; Oza Pawandeep
> Subject: [RFC PATCH] iommu/dma: account pci host bridge dma_mask for
> IOVA allocation
>
> It is possible that PCI device supports 64-bit DMA addressing, and
> thus it's driver sets device's dma_mask to DMA_BIT_MASK(64), however
> PCI host bridge may have limitations on the inbound transaction
> addressing. As an example, consider NVME SSD device connected to
> iproc-PCIe controller.
>
> Currently, the IOMMU DMA ops only considers PCI device dma_mask when
> allocating an IOVA. This is particularly problematic on
> ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to PA for
> in-bound transactions only after PCI Host has forwarded these
> transactions on SOC IO bus. This means on such ARM/ARM64 SOCs the IOVA
> of in-bound transactions has to honor the addressing restrictions of the
> PCI Host.
>
> this patch is inspired by
> http://www.mail-archive.com/linux-kernel at vger.kernel.org/msg1306545.ht
> ml http://www.spinics.net/lists/arm-kernel/msg566947.html
>
> but above inspiraiton solves the half of the problem.
> the rest of the problem is descrbied below, what we face on iproc
> based SOCs.
>
> current pcie frmework and of framework integration assumes dma-ranges
> in a way where memory-mapped devices define their dma-ranges.
> dma-ranges: (child-bus-address, parent-bus-address, length).
>
> but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges.
> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
>
> of_dma_configure is specifically witten to take care of memory mapped
> devices.
> but no implementation exists for pci to take care of pcie based memory
> ranges.
> in fact pci world doesnt seem to define standard dma-ranges since
> there is an absense of the same, the dma_mask used to remain 32bit
> because of
> 0 size return (parsed by of_dma_configure())
>
> this patch also implements of_pci_get_dma_ranges to cater to pci world
> dma-ranges.
> so then the returned size get best possible (largest) dma_mask.
> for e.g.
> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>; we should get
> dev->coherent_dma_mask=0x7fffffffff.
>
> conclusion: there are following problems
> 1) linux pci and iommu framework integration has glitches with respect
> to dma-ranges
> 2) pci linux framework look very uncertain about dma-ranges, rather
> binding is not defined
>    the way it is defined for memory mapped devices.
>    rcar and iproc based SOCs use their custom one dma-ranges
>    (rather can be standard)
> 3) even if in case of default parser of_dma_get_ranges,:
>    it throws and erro"
>    "no dma-ranges found for node"
>    because of the bug which exists.
>    following lines should be moved to the end of while(1)
> 	839                 node = of_get_next_parent(node);
> 	840                 if (!node)
> 	841                         break;

Right, having made sense of this and looked into things myself I think I
understand now; what this boils down to is that the existing implementation
of of_dma_get_range() expects always to be given a leaf device_node, and
doesn't cope with being given a device_node for the given device's parent
bus directly. That's really all there is; it's not specific to PCI (there
are other probeable and DMA-capable buses whose children aren't described in
DT, like the fsl-mc thing), and it definitely doesn't have anything to do
with IOMMUs.

Now, that's certainly something to fix, but AFAICS this patch doesn't do
that, only adds some PCI-specific code which is never called.

DMA mask inheritance for arm64 is another issue, which again is general, but
does tend to be more visible in the IOMMU case. That still needs some work
on the APCI side - all the DT-centric approaches so far either regress or at
best do nothing for ACPI. I've made a note to try to look into that soon,
but from what I recall I fear there is still an open question about what to
do for a default in the absence of IORT or _DMA (once the current assumption
that drivers can override our arbitrary default at will is closed down).

In the meantime, have you tried 4.11-rc1 or later on the affected system?
One of the ulterior motives behind 122fac030e91 was that in many cases it
also happens to paper over most versions of this problem for PCI devices,
and makes the IOMMU at least useable (on systems which don't need to
dma_map_*() vast amounts of RAM all at once) while we fix the underlying
things properly.

Robin.

> Reviewed-by: Anup Patel <anup.patel@broadcom.com>
> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index
> 8c7c244..20cfff7 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -217,6 +217,9 @@ config NEED_DMA_MAP_STATE  config NEED_SG_DMA_LENGTH
>  	def_bool y
>
> +config ARCH_HAS_DMA_SET_COHERENT_MASK
> +	def_bool y
> +
>  config SMP
>  	def_bool y
>
> diff --git a/arch/arm64/include/asm/device.h
> b/arch/arm64/include/asm/device.h index 73d5bab..64b4dc3 100644
> --- a/arch/arm64/include/asm/device.h
> +++ b/arch/arm64/include/asm/device.h
> @@ -20,6 +20,7 @@ struct dev_archdata {  #ifdef CONFIG_IOMMU_API
>  	void *iommu;			/* private IOMMU data */
>  #endif
> +	u64 parent_dma_mask;
>  	bool dma_coherent;
>  };
>
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 81cdb2e..5845ecd 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -564,6 +564,7 @@ static void flush_page(struct device *dev, const
> void *virt, phys_addr_t phys)
>  	__dma_flush_area(virt, PAGE_SIZE);
>  }
>
> +
>  static void *__iommu_alloc_attrs(struct device *dev, size_t size,
>  				 dma_addr_t *handle, gfp_t gfp,
>  				 unsigned long attrs)
> @@ -795,6 +796,20 @@ static void __iommu_unmap_sg_attrs(struct device
> *dev,
>  	iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);  }
>
> +static int __iommu_set_dma_mask(struct device *dev, u64 mask) {
> +	/* device is not DMA capable */
> +	if (!dev->dma_mask)
> +		return -EIO;
> +
> +	if (mask > dev->archdata.parent_dma_mask)
> +		mask = dev->archdata.parent_dma_mask;
> +
> +	*dev->dma_mask = mask;
> +
> +	return 0;
> +}
> +
>  static const struct dma_map_ops iommu_dma_ops = {
>  	.alloc = __iommu_alloc_attrs,
>  	.free = __iommu_free_attrs,
> @@ -811,8 +826,21 @@ static void __iommu_unmap_sg_attrs(struct device
> *dev,
>  	.map_resource = iommu_dma_map_resource,
>  	.unmap_resource = iommu_dma_unmap_resource,
>  	.mapping_error = iommu_dma_mapping_error,
> +	.set_dma_mask = __iommu_set_dma_mask,
>  };
>
> +int dma_set_coherent_mask(struct device *dev, u64 mask) {
> +	if (get_dma_ops(dev) == &iommu_dma_ops &&
> +	    mask > dev->archdata.parent_dma_mask)
> +		mask = dev->archdata.parent_dma_mask;
> +
> +	dev->coherent_dma_mask = mask;
> +	return 0;
> +}
> +EXPORT_SYMBOL(dma_set_coherent_mask);
> +
> +
>  /*
>   * TODO: Right now __iommu_setup_dma_ops() gets called too early to do
>   * everything it needs to - the device is only partially created and the
> @@ -975,6 +1003,8 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>  	if (!dev->dma_ops)
>  		dev->dma_ops = &swiotlb_dma_ops;
>
> +	dev->archdata.parent_dma_mask = size - 1;
> +
>  	dev->archdata.dma_coherent = coherent;
>  	__iommu_setup_dma_ops(dev, dma_base, size, iommu);
>  }
> diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
> index 0ee42c3..5804717 100644
> --- a/drivers/of/of_pci.c
> +++ b/drivers/of/of_pci.c
> @@ -283,6 +283,51 @@ int of_pci_get_host_bridge_resources(struct device_node *dev,
>  	return err;
>  }
>  EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
> +
> +int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr,
> +			  u64 *paddr, u64 *size)
> +{
> +	struct device_node *node = of_node_get(np);
> +	int rlen, naddr, nsize, pna;
> +	int ret = 0;
> +	const int na = 3, ns = 2;
> +	struct of_pci_range_parser parser;
> +	struct of_pci_range range;
> +
> +	if (!node)
> +		return -EINVAL;
> +
> +	parser.node = node;
> +	parser.pna = of_n_addr_cells(node);
> +	parser.np = parser.pna + na + ns;
> +
> +	parser.range = of_get_property(node, "dma-ranges", &rlen);
> +
> +	if (!parser.range) {
> +		pr_debug("pcie device has no dma-ranges defined for node(%s)\n",
> +			 np->full_name);
> +		ret = -ENODEV;
> +		goto out;
> +	}
> +
> +	parser.end = parser.range + rlen / sizeof(__be32);
> +
> +	/* TODO: handle multiple dma windows; for now the last one wins. */
> +	for_each_of_pci_range(&parser, &range) {
> +		*dma_addr = range.pci_addr;
> +		*size = range.size;
> +		*paddr = range.cpu_addr;
> +	}
> +
> +	pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
> +		 *dma_addr, *paddr, *size);
> +out:
> +	of_node_put(node);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges);
>  #endif /* CONFIG_OF_ADDRESS */
>
>  #ifdef CONFIG_PCI_MSI
> diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
> index 0e0974e..907ace0 100644
> --- a/include/linux/of_pci.h
> +++ b/include/linux/of_pci.h
> @@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { }
> int of_pci_get_host_bridge_resources(struct device_node *dev,
>  			unsigned char busno, unsigned char bus_max,
>  			struct list_head *resources, resource_size_t *io_base);
> +int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr,
> +			  u64 *paddr, u64 *size);
>  #else
>  static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
>  			unsigned char busno, unsigned char bus_max,
> @@ -83,6 +84,11 @@ static inline int of_pci_get_host_bridge_resources(struct device_node *dev,
>  {
>  	return -EINVAL;
>  }
> +
> +static inline int of_pci_get_dma_ranges(struct device_node *np,
> +			u64 *dma_addr, u64 *paddr, u64 *size)
> +{
> +	return -EINVAL;
> +}
>  #endif
>
>  #if defined(CONFIG_OF) && defined(CONFIG_PCI_MSI)
> --
> 1.9.1
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-03-25  5:34 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-20  8:57 [RFC PATCH] iommu/dma/pci: account pci host bridge dma_mask for IOVA allocation Oza Oza
2017-03-20 15:43 ` Robin Murphy
2017-03-20 17:49   ` Oza Oza
2017-03-25  5:34   ` Oza Oza
