linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/7] IOMMU/DMA map_resource support for peer-to-peer
@ 2015-05-29 17:14 wdavis
  2015-05-29 17:14 ` [PATCH v3 1/7] dma-debug: add checking for map/unmap_resource wdavis
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: wdavis @ 2015-05-29 17:14 UTC (permalink / raw)
  To: Joerg Roedel, Bjorn Helgaas
  Cc: Terence Ripperda, John Hubbard, Jerome Glisse, Mark Hounschell,
	Konrad Rzeszutek Wilk, Jonathan Corbet, David S. Miller,
	Yijing Wang, Alex Williamson, Dave Jiang, iommu, linux-pci,
	Will Davis

From: Will Davis <wdavis@nvidia.com>

Hi,

This is the third version of a patchset to add the DMA APIs necessary to
map and unmap a struct resource to and from a PCI device's IOVA domain.
This allows a PCI device to access a peer device's BAR resource when a
hardware IOMMU is enabled.

Thanks,
Will

Changelog:

v3:
- changed dma_map_resource() to BUG() if the map_resource DMA op is not
  provided, instead of returning 0
- updated documentation to remove requirement to check for 0 return value
  as an error
- remove const keyword from struct dma_map_ops in new DMA APIs for
  consistency with other APIs

v2: http://www.spinics.net/lists/linux-pci/msg41192.html
- added documentation for the new DMA APIs
- fixed physical-to-bus address conversion in the nommu implementation

v1: http://www.spinics.net/lists/linux-pci/msg40747.html

Will Davis (7):
  dma-debug: add checking for map/unmap_resource
  DMA-API: Introduce dma_(un)map_resource
  dma-mapping: pci: add pci_(un)map_resource
  DMA-API: Add dma_(un)map_resource() documentation
  iommu/amd: Implement (un)map_resource
  iommu/vt-d: implement (un)map_resource
  x86: add pci-nommu implementation of map_resource

 Documentation/DMA-API-HOWTO.txt          | 36 ++++++++++++++-
 Documentation/DMA-API.txt                | 31 ++++++++++---
 arch/x86/kernel/pci-nommu.c              | 32 ++++++++++++++
 drivers/iommu/amd_iommu.c                | 76 ++++++++++++++++++++++++++------
 drivers/iommu/intel-iommu.c              | 18 ++++++++
 include/asm-generic/dma-mapping-broken.h |  9 ++++
 include/asm-generic/dma-mapping-common.h | 34 ++++++++++++++
 include/asm-generic/pci-dma-compat.h     | 14 ++++++
 include/linux/dma-debug.h                | 20 +++++++++
 include/linux/dma-mapping.h              |  7 +++
 lib/dma-debug.c                          | 47 ++++++++++++++++++++
 11 files changed, 304 insertions(+), 20 deletions(-)

-- 
2.4.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v3 1/7] dma-debug: add checking for map/unmap_resource
  2015-05-29 17:14 [PATCH v3 0/7] IOMMU/DMA map_resource support for peer-to-peer wdavis
@ 2015-05-29 17:14 ` wdavis
  2015-05-29 17:14 ` [PATCH v3 2/7] DMA-API: Introduce dma_(un)map_resource wdavis
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: wdavis @ 2015-05-29 17:14 UTC (permalink / raw)
  To: Joerg Roedel, Bjorn Helgaas
  Cc: Terence Ripperda, John Hubbard, Jerome Glisse, Mark Hounschell,
	Konrad Rzeszutek Wilk, Jonathan Corbet, David S. Miller,
	Yijing Wang, Alex Williamson, Dave Jiang, iommu, linux-pci,
	Will Davis

From: Will Davis <wdavis@nvidia.com>

Add debug callbacks for the new dma_map_resource and dma_unmap_resource
functions.

Signed-off-by: Will Davis <wdavis@nvidia.com>
Reviewed-by: Terence Ripperda <tripperda@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
---
 include/linux/dma-debug.h | 20 ++++++++++++++++++++
 lib/dma-debug.c           | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 67 insertions(+)

diff --git a/include/linux/dma-debug.h b/include/linux/dma-debug.h
index fe8cb61..19f328c 100644
--- a/include/linux/dma-debug.h
+++ b/include/linux/dma-debug.h
@@ -44,6 +44,13 @@ extern void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
 extern void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
 				 size_t size, int direction, bool map_single);
 
+extern void debug_dma_map_resource(struct device *dev, struct resource *res,
+				   size_t offset, size_t size, int direction,
+				   dma_addr_t dma_addr);
+
+extern void debug_dma_unmap_resource(struct device *dev, dma_addr_t addr,
+				     size_t size, int direction);
+
 extern void debug_dma_map_sg(struct device *dev, struct scatterlist *sg,
 			     int nents, int mapped_ents, int direction);
 
@@ -120,6 +127,19 @@ static inline void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
 {
 }
 
+static inline void debug_dma_map_resource(struct device *dev,
+					  struct resource *res, size_t offset,
+					  size_t size, int direction,
+					  dma_addr_t dma_addr)
+{
+}
+
+static inline void debug_dma_unmap_resource(struct device *dev,
+					    dma_addr_t addr, size_t size,
+					    int direction)
+{
+}
+
 static inline void debug_dma_map_sg(struct device *dev, struct scatterlist *sg,
 				    int nents, int mapped_ents, int direction)
 {
diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index ae4b65e..a6d8fa7 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -43,6 +43,7 @@ enum {
 	dma_debug_page,
 	dma_debug_sg,
 	dma_debug_coherent,
+	dma_debug_resource,
 };
 
 enum map_err_types {
@@ -1348,6 +1349,52 @@ void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
 }
 EXPORT_SYMBOL(debug_dma_unmap_page);
 
+void debug_dma_map_resource(struct device *dev, struct resource *resource,
+			    size_t offset, size_t size, int direction,
+			    dma_addr_t dma_addr)
+{
+	struct dma_debug_entry *entry;
+
+	if (unlikely(dma_debug_disabled()))
+		return;
+
+	if (dma_mapping_error(dev, dma_addr))
+		return;
+
+	entry = dma_entry_alloc();
+	if (!entry)
+		return;
+
+	entry->dev       = dev;
+	entry->type      = dma_debug_resource;
+	entry->pfn       = resource->start >> PAGE_SHIFT;
+	entry->offset    = offset,
+	entry->dev_addr  = dma_addr;
+	entry->size      = size;
+	entry->direction = direction;
+
+	add_dma_entry(entry);
+}
+EXPORT_SYMBOL(debug_dma_map_resource);
+
+void debug_dma_unmap_resource(struct device *dev, dma_addr_t addr,
+			      size_t size, int direction)
+{
+	struct dma_debug_entry ref = {
+		.type           = dma_debug_resource,
+		.dev            = dev,
+		.dev_addr       = addr,
+		.size           = size,
+		.direction      = direction,
+	};
+
+	if (unlikely(dma_debug_disabled()))
+		return;
+
+	check_unmap(&ref);
+}
+EXPORT_SYMBOL(debug_dma_unmap_resource);
+
 void debug_dma_map_sg(struct device *dev, struct scatterlist *sg,
 		      int nents, int mapped_ents, int direction)
 {
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 2/7] DMA-API: Introduce dma_(un)map_resource
  2015-05-29 17:14 [PATCH v3 0/7] IOMMU/DMA map_resource support for peer-to-peer wdavis
  2015-05-29 17:14 ` [PATCH v3 1/7] dma-debug: add checking for map/unmap_resource wdavis
@ 2015-05-29 17:14 ` wdavis
  2015-05-29 17:14 ` [PATCH v3 3/7] dma-mapping: pci: add pci_(un)map_resource wdavis
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: wdavis @ 2015-05-29 17:14 UTC (permalink / raw)
  To: Joerg Roedel, Bjorn Helgaas
  Cc: Terence Ripperda, John Hubbard, Jerome Glisse, Mark Hounschell,
	Konrad Rzeszutek Wilk, Jonathan Corbet, David S. Miller,
	Yijing Wang, Alex Williamson, Dave Jiang, iommu, linux-pci,
	Will Davis

From: Will Davis <wdavis@nvidia.com>

Add functions to DMA-map and -unmap a resource for a given device. This
will allow devices to DMA-map a peer device's resource (for example,
another device's BAR region on PCI) to enable peer-to-peer transactions.

Signed-off-by: Will Davis <wdavis@nvidia.com>
---
 include/asm-generic/dma-mapping-broken.h |  9 +++++++++
 include/asm-generic/dma-mapping-common.h | 34 ++++++++++++++++++++++++++++++++
 include/linux/dma-mapping.h              |  7 +++++++
 3 files changed, 50 insertions(+)

diff --git a/include/asm-generic/dma-mapping-broken.h b/include/asm-generic/dma-mapping-broken.h
index 6c32af9..d171f01 100644
--- a/include/asm-generic/dma-mapping-broken.h
+++ b/include/asm-generic/dma-mapping-broken.h
@@ -59,6 +59,15 @@ extern void
 dma_unmap_page(struct device *dev, dma_addr_t dma_address, size_t size,
 	       enum dma_data_direction direction);
 
+extern dma_addr_t
+dma_map_resource(struct device *dev, struct resource *res,
+		 unsigned long offset, size_t size,
+		 enum dma_data_direction direction);
+
+extern void
+dma_unmap_resource(struct device *dev, dma_addr_t dma_address, size_t size,
+		   enum dma_data_direction direction);
+
 extern void
 dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle, size_t size,
 			enum dma_data_direction direction);
diff --git a/include/asm-generic/dma-mapping-common.h b/include/asm-generic/dma-mapping-common.h
index 940d5ec..9e6c201 100644
--- a/include/asm-generic/dma-mapping-common.h
+++ b/include/asm-generic/dma-mapping-common.h
@@ -73,6 +73,36 @@ static inline void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg
 		ops->unmap_sg(dev, sg, nents, dir, attrs);
 }
 
+static inline dma_addr_t dma_map_resource_attrs(struct device *dev,
+						struct resource *res,
+						size_t offset, size_t size,
+						enum dma_data_direction dir,
+						struct dma_attrs *attrs)
+{
+	struct dma_map_ops *ops = get_dma_ops(dev);
+	dma_addr_t addr;
+
+	BUG_ON(!valid_dma_direction(dir));
+	BUG_ON(ops->map_resource == NULL);
+	addr = ops->map_resource(dev, res, offset, size, dir, attrs);
+	debug_dma_map_resource(dev, res, offset, size, dir, addr);
+
+	return addr;
+}
+
+static inline void dma_unmap_resource_attrs(struct device *dev, dma_addr_t addr,
+					    size_t size,
+					    enum dma_data_direction dir,
+					    struct dma_attrs *attrs)
+{
+	struct dma_map_ops *ops = get_dma_ops(dev);
+
+	BUG_ON(!valid_dma_direction(dir));
+	if (ops->unmap_resource)
+		ops->unmap_resource(dev, addr, size, dir, attrs);
+	debug_dma_unmap_resource(dev, addr, size, dir);
+}
+
 static inline dma_addr_t dma_map_page(struct device *dev, struct page *page,
 				      size_t offset, size_t size,
 				      enum dma_data_direction dir)
@@ -180,6 +210,10 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 #define dma_unmap_single(d, a, s, r) dma_unmap_single_attrs(d, a, s, r, NULL)
 #define dma_map_sg(d, s, n, r) dma_map_sg_attrs(d, s, n, r, NULL)
 #define dma_unmap_sg(d, s, n, r) dma_unmap_sg_attrs(d, s, n, r, NULL)
+#define dma_map_resource(d, e, o, s, r) \
+	dma_map_resource_attrs(d, e, o, s, r, NULL)
+#define dma_unmap_resource(d, a, s, r) \
+	dma_unmap_resource_attrs(d, a, s, r, NULL)
 
 extern int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
 			   void *cpu_addr, dma_addr_t dma_addr, size_t size);
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index ac07ff0..05b0b51 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -34,6 +34,13 @@ struct dma_map_ops {
 	void (*unmap_page)(struct device *dev, dma_addr_t dma_handle,
 			   size_t size, enum dma_data_direction dir,
 			   struct dma_attrs *attrs);
+	dma_addr_t (*map_resource)(struct device *dev, struct resource *res,
+				   unsigned long offset, size_t size,
+				   enum dma_data_direction dir,
+				   struct dma_attrs *attrs);
+	void (*unmap_resource)(struct device *dev, dma_addr_t dma_handle,
+			       size_t size, enum dma_data_direction dir,
+			       struct dma_attrs *attrs);
 	/*
 	 * map_sg returns 0 on error and a value > 0 on success.
 	 * It should never return a value < 0.
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 3/7] dma-mapping: pci: add pci_(un)map_resource
  2015-05-29 17:14 [PATCH v3 0/7] IOMMU/DMA map_resource support for peer-to-peer wdavis
  2015-05-29 17:14 ` [PATCH v3 1/7] dma-debug: add checking for map/unmap_resource wdavis
  2015-05-29 17:14 ` [PATCH v3 2/7] DMA-API: Introduce dma_(un)map_resource wdavis
@ 2015-05-29 17:14 ` wdavis
  2015-07-01 16:06   ` Bjorn Helgaas
  2015-05-29 17:14 ` [PATCH v3 4/7] DMA-API: Add dma_(un)map_resource() documentation wdavis
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: wdavis @ 2015-05-29 17:14 UTC (permalink / raw)
  To: Joerg Roedel, Bjorn Helgaas
  Cc: Terence Ripperda, John Hubbard, Jerome Glisse, Mark Hounschell,
	Konrad Rzeszutek Wilk, Jonathan Corbet, David S. Miller,
	Yijing Wang, Alex Williamson, Dave Jiang, iommu, linux-pci,
	Will Davis

From: Will Davis <wdavis@nvidia.com>

Simply route these through to the new dma_(un)map_resource APIs.

Signed-off-by: Will Davis <wdavis@nvidia.com>
Reviewed-by: Terence Ripperda <tripperda@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
---
 include/asm-generic/pci-dma-compat.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/include/asm-generic/pci-dma-compat.h b/include/asm-generic/pci-dma-compat.h
index c110843..ac4a4ad 100644
--- a/include/asm-generic/pci-dma-compat.h
+++ b/include/asm-generic/pci-dma-compat.h
@@ -61,6 +61,20 @@ pci_unmap_page(struct pci_dev *hwdev, dma_addr_t dma_address,
 	dma_unmap_page(hwdev == NULL ? NULL : &hwdev->dev, dma_address, size, (enum dma_data_direction)direction);
 }
 
+static inline dma_addr_t
+pci_map_resource(struct pci_dev *hwdev, struct resource *resource,
+		 unsigned long offset, size_t size, int direction)
+{
+	return dma_map_resource(hwdev == NULL ? NULL : &hwdev->dev, resource, offset, size, (enum dma_data_direction)direction);
+}
+
+static inline void
+pci_unmap_resource(struct pci_dev *hwdev, dma_addr_t dma_address, size_t size,
+		   int direction)
+{
+	dma_unmap_resource(hwdev == NULL ? NULL : &hwdev->dev, dma_address, size, (enum dma_data_direction)direction);
+}
+
 static inline int
 pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg,
 	   int nents, int direction)
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 4/7] DMA-API: Add dma_(un)map_resource() documentation
  2015-05-29 17:14 [PATCH v3 0/7] IOMMU/DMA map_resource support for peer-to-peer wdavis
                   ` (2 preceding siblings ...)
  2015-05-29 17:14 ` [PATCH v3 3/7] dma-mapping: pci: add pci_(un)map_resource wdavis
@ 2015-05-29 17:14 ` wdavis
  2015-05-29 17:14 ` [PATCH v3 5/7] iommu/amd: Implement (un)map_resource wdavis
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: wdavis @ 2015-05-29 17:14 UTC (permalink / raw)
  To: Joerg Roedel, Bjorn Helgaas
  Cc: Terence Ripperda, John Hubbard, Jerome Glisse, Mark Hounschell,
	Konrad Rzeszutek Wilk, Jonathan Corbet, David S. Miller,
	Yijing Wang, Alex Williamson, Dave Jiang, iommu, linux-pci,
	Will Davis

From: Will Davis <wdavis@nvidia.com>

Add references to both the general API documentation as well as the HOWTO.

Signed-off-by: Will Davis <wdavis@nvidia.com>
---
 Documentation/DMA-API-HOWTO.txt | 36 ++++++++++++++++++++++++++++++++++--
 Documentation/DMA-API.txt       | 31 ++++++++++++++++++++++++++-----
 2 files changed, 60 insertions(+), 7 deletions(-)

diff --git a/Documentation/DMA-API-HOWTO.txt b/Documentation/DMA-API-HOWTO.txt
index 0f7afb2..837af63 100644
--- a/Documentation/DMA-API-HOWTO.txt
+++ b/Documentation/DMA-API-HOWTO.txt
@@ -138,6 +138,10 @@ What about block I/O and networking buffers?  The block I/O and
 networking subsystems make sure that the buffers they use are valid
 for you to DMA from/to.
 
+In some systems, it may also be possible to DMA to and/or from a peer
+device's MMIO region, as described by a 'struct resource'. This is
+referred to as a peer-to-peer mapping.
+
 			DMA addressing limitations
 
 Does your device have any DMA addressing limitations?  For example, is
@@ -648,6 +652,34 @@ Every dma_map_{single,sg}() call should have its dma_unmap_{single,sg}()
 counterpart, because the bus address space is a shared resource and
 you could render the machine unusable by consuming all bus addresses.
 
+Peer-to-peer DMA mappings can be obtained using dma_map_resource() to map
+another device's MMIO region for the given device:
+
+	struct resource *peer_mmio_res = &other_dev->resource[0];
+	dma_addr_t dma_handle = dma_map_resource(dev, peer_mmio_res,
+						 offset, size, direction);
+	if (dma_mapping_error(dev, dma_handle))
+	{
+		/*
+		 * reduce current DMA mapping usage,
+		 * delay and try again later or
+		 * reset driver.
+		 */
+		goto map_error_handling;
+	}
+
+	...
+
+	dma_unmap_resource(dev, dma_handle, size, direction);
+
+Here, "offset" means byte offset within the given resource.
+
+You should call dma_mapping_error() as dma_map_resource() could fail and
+return error as outlined under the dma_map_single() discussion.
+
+You should call dma_unmap_resource() when DMA activity is finished, e.g.,
+from the interrupt which told you that the DMA transfer is done.
+
 If you need to use the same streaming DMA region multiple times and touch
 the data in between the DMA transfers, the buffer needs to be synced
 properly in order for the CPU and device to see the most up-to-date and
@@ -765,8 +797,8 @@ failure can be determined by:
 
 - checking if dma_alloc_coherent() returns NULL or dma_map_sg returns 0
 
-- checking the dma_addr_t returned from dma_map_single() and dma_map_page()
-  by using dma_mapping_error():
+- checking the dma_addr_t returned from dma_map_single(), dma_map_resource(),
+  and dma_map_page() by using dma_mapping_error():
 
 	dma_addr_t dma_handle;
 
diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 5208840..8158f4c 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -283,14 +283,35 @@ and <size> parameters are provided to do partial page mapping, it is
 recommended that you never use these unless you really know what the
 cache width is.
 
+dma_addr_t
+dma_map_resource(struct device *dev, struct resource *res,
+		 unsigned long offset, size_t size,
+		 enum dma_data_direction_direction)
+
+API for mapping resources. This API allows a driver to map a peer
+device's resource for DMA. All the notes and warnings for the other
+APIs apply here. Also, the success of this API does not validate or
+guarantee that peer-to-peer transactions between the device and its
+peer will be functional. They only grant access so that if such
+transactions are possible, an IOMMU will not prevent them from
+succeeding.
+
+void
+dma_unmap_resource(struct device *dev, dma_addr_t dma_address, size_t size,
+		   enum dma_data_direction direction)
+
+Unmaps the resource previously mapped. All the parameters passed in
+must be identical to those passed in to (and returned by) the mapping
+API.
+
 int
 dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
 
-In some circumstances dma_map_single() and dma_map_page() will fail to create
-a mapping. A driver can check for these errors by testing the returned
-DMA address with dma_mapping_error(). A non-zero return value means the mapping
-could not be created and the driver should take appropriate action (e.g.
-reduce current DMA mapping usage or delay and try again later).
+In some circumstances dma_map_single(), dma_map_page() and dma_map_resource()
+will fail to create a mapping. A driver can check for these errors by testing
+the returned DMA address with dma_mapping_error(). A non-zero return value
+means the mapping could not be created and the driver should take appropriate
+action (e.g. reduce current DMA mapping usage or delay and try again later).
 
 	int
 	dma_map_sg(struct device *dev, struct scatterlist *sg,
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 5/7] iommu/amd: Implement (un)map_resource
  2015-05-29 17:14 [PATCH v3 0/7] IOMMU/DMA map_resource support for peer-to-peer wdavis
                   ` (3 preceding siblings ...)
  2015-05-29 17:14 ` [PATCH v3 4/7] DMA-API: Add dma_(un)map_resource() documentation wdavis
@ 2015-05-29 17:14 ` wdavis
  2015-07-01 16:13   ` Bjorn Helgaas
  2015-05-29 17:14 ` [PATCH v3 6/7] iommu/vt-d: implement (un)map_resource wdavis
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: wdavis @ 2015-05-29 17:14 UTC (permalink / raw)
  To: Joerg Roedel, Bjorn Helgaas
  Cc: Terence Ripperda, John Hubbard, Jerome Glisse, Mark Hounschell,
	Konrad Rzeszutek Wilk, Jonathan Corbet, David S. Miller,
	Yijing Wang, Alex Williamson, Dave Jiang, iommu, linux-pci,
	Will Davis

From: Will Davis <wdavis@nvidia.com>

Implement 'map_resource' for the AMD IOMMU driver. Generalize the existing
map_page implementation to operate on a physical address, and make both
map_page and map_resource wrappers around that helper (and similiarly, for
unmap_page and unmap_resource).

This allows a device to map another's resource, to enable peer-to-peer
transactions.

Signed-off-by: Will Davis <wdavis@nvidia.com>
Reviewed-by: Terence Ripperda <tripperda@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/iommu/amd_iommu.c | 76 +++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 63 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index e43d489..ca2dac6 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -503,6 +503,8 @@ DECLARE_STATS_COUNTER(cnt_map_single);
 DECLARE_STATS_COUNTER(cnt_unmap_single);
 DECLARE_STATS_COUNTER(cnt_map_sg);
 DECLARE_STATS_COUNTER(cnt_unmap_sg);
+DECLARE_STATS_COUNTER(cnt_map_resource);
+DECLARE_STATS_COUNTER(cnt_unmap_resource);
 DECLARE_STATS_COUNTER(cnt_alloc_coherent);
 DECLARE_STATS_COUNTER(cnt_free_coherent);
 DECLARE_STATS_COUNTER(cross_page);
@@ -541,6 +543,8 @@ static void amd_iommu_stats_init(void)
 	amd_iommu_stats_add(&cnt_unmap_single);
 	amd_iommu_stats_add(&cnt_map_sg);
 	amd_iommu_stats_add(&cnt_unmap_sg);
+	amd_iommu_stats_add(&cnt_map_resource);
+	amd_iommu_stats_add(&cnt_unmap_resource);
 	amd_iommu_stats_add(&cnt_alloc_coherent);
 	amd_iommu_stats_add(&cnt_free_coherent);
 	amd_iommu_stats_add(&cross_page);
@@ -2752,20 +2756,16 @@ static void __unmap_single(struct dma_ops_domain *dma_dom,
 }
 
 /*
- * The exported map_single function for dma_ops.
+ * Wrapper function that contains code common to mapping a physical address
+ * range from a page or a resource.
  */
-static dma_addr_t map_page(struct device *dev, struct page *page,
-			   unsigned long offset, size_t size,
-			   enum dma_data_direction dir,
-			   struct dma_attrs *attrs)
+static dma_addr_t __map_phys(struct device *dev, phys_addr_t paddr,
+			     size_t size, enum dma_data_direction dir)
 {
 	unsigned long flags;
 	struct protection_domain *domain;
 	dma_addr_t addr;
 	u64 dma_mask;
-	phys_addr_t paddr = page_to_phys(page) + offset;
-
-	INC_STATS_COUNTER(cnt_map_single);
 
 	domain = get_domain(dev);
 	if (PTR_ERR(domain) == -EINVAL)
@@ -2791,16 +2791,15 @@ out:
 }
 
 /*
- * The exported unmap_single function for dma_ops.
+ * Wrapper function that contains code common to unmapping a physical address
+ * range from a page or a resource.
  */
-static void unmap_page(struct device *dev, dma_addr_t dma_addr, size_t size,
-		       enum dma_data_direction dir, struct dma_attrs *attrs)
+static void __unmap_phys(struct device *dev, dma_addr_t dma_addr, size_t size,
+			 enum dma_data_direction dir)
 {
 	unsigned long flags;
 	struct protection_domain *domain;
 
-	INC_STATS_COUNTER(cnt_unmap_single);
-
 	domain = get_domain(dev);
 	if (IS_ERR(domain))
 		return;
@@ -2815,6 +2814,55 @@ static void unmap_page(struct device *dev, dma_addr_t dma_addr, size_t size,
 }
 
 /*
+ * The exported map_single function for dma_ops.
+ */
+static dma_addr_t map_page(struct device *dev, struct page *page,
+			   unsigned long offset, size_t size,
+			   enum dma_data_direction dir,
+			   struct dma_attrs *attrs)
+{
+	INC_STATS_COUNTER(cnt_map_single);
+
+	return __map_phys(dev, page_to_phys(page) + offset, size, dir);
+}
+
+/*
+ * The exported unmap_single function for dma_ops.
+ */
+static void unmap_page(struct device *dev, dma_addr_t dma_addr, size_t size,
+		       enum dma_data_direction dir, struct dma_attrs *attrs)
+{
+	INC_STATS_COUNTER(cnt_unmap_single);
+
+	__unmap_phys(dev, dma_addr, size, dir);
+}
+
+/*
+ * The exported map_resource function for dma_ops.
+ */
+static dma_addr_t map_resource(struct device *dev, struct resource *res,
+			       unsigned long offset, size_t size,
+			       enum dma_data_direction dir,
+			       struct dma_attrs *attrs)
+{
+	INC_STATS_COUNTER(cnt_map_resource);
+
+	return __map_phys(dev, res->start + offset, size, dir);
+}
+
+/*
+ * The exported unmap_resource function for dma_ops.
+ */
+static void unmap_resource(struct device *dev, dma_addr_t dma_addr,
+			   size_t size, enum dma_data_direction dir,
+			   struct dma_attrs *attrs)
+{
+	INC_STATS_COUNTER(cnt_unmap_resource);
+
+	__unmap_phys(dev, dma_addr, size, dir);
+}
+
+/*
  * The exported map_sg function for dma_ops (handles scatter-gather
  * lists).
  */
@@ -3066,6 +3114,8 @@ static struct dma_map_ops amd_iommu_dma_ops = {
 	.unmap_page = unmap_page,
 	.map_sg = map_sg,
 	.unmap_sg = unmap_sg,
+	.map_resource = map_resource,
+	.unmap_resource = unmap_resource,
 	.dma_supported = amd_iommu_dma_supported,
 };
 
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 6/7] iommu/vt-d: implement (un)map_resource
  2015-05-29 17:14 [PATCH v3 0/7] IOMMU/DMA map_resource support for peer-to-peer wdavis
                   ` (4 preceding siblings ...)
  2015-05-29 17:14 ` [PATCH v3 5/7] iommu/amd: Implement (un)map_resource wdavis
@ 2015-05-29 17:14 ` wdavis
  2015-05-29 17:14 ` [PATCH v3 7/7] x86: add pci-nommu implementation of map_resource wdavis
  2015-06-15 15:49 ` [PATCH v3 0/7] IOMMU/DMA map_resource support for peer-to-peer William Davis
  7 siblings, 0 replies; 18+ messages in thread
From: wdavis @ 2015-05-29 17:14 UTC (permalink / raw)
  To: Joerg Roedel, Bjorn Helgaas
  Cc: Terence Ripperda, John Hubbard, Jerome Glisse, Mark Hounschell,
	Konrad Rzeszutek Wilk, Jonathan Corbet, David S. Miller,
	Yijing Wang, Alex Williamson, Dave Jiang, iommu, linux-pci,
	Will Davis

From: Will Davis <wdavis@nvidia.com>

Implement 'map_resource' for the Intel IOMMU driver. Simply translate the
resource to a physical address and route it to the same handlers used by
the 'map_page' API.

This allows a device to map another's resource, to enable peer-to-peer
transactions.

Signed-off-by: Will Davis <wdavis@nvidia.com>
Reviewed-by: Terence Ripperda <tripperda@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/iommu/intel-iommu.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 68d43be..0f49eff 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3095,6 +3095,15 @@ static dma_addr_t intel_map_page(struct device *dev, struct page *page,
 				  dir, *dev->dma_mask);
 }
 
+static dma_addr_t intel_map_resource(struct device *dev, struct resource *res,
+				     unsigned long offset, size_t size,
+				     enum dma_data_direction dir,
+				     struct dma_attrs *attrs)
+{
+	return __intel_map_single(dev, res->start + offset, size,
+				  dir, *dev->dma_mask);
+}
+
 static void flush_unmaps(void)
 {
 	int i, j;
@@ -3226,6 +3235,13 @@ static void intel_unmap_page(struct device *dev, dma_addr_t dev_addr,
 	intel_unmap(dev, dev_addr);
 }
 
+static void intel_unmap_resource(struct device *dev, dma_addr_t dev_addr,
+				 size_t size, enum dma_data_direction dir,
+				 struct dma_attrs *attrs)
+{
+	intel_unmap(dev, dev_addr);
+}
+
 static void *intel_alloc_coherent(struct device *dev, size_t size,
 				  dma_addr_t *dma_handle, gfp_t flags,
 				  struct dma_attrs *attrs)
@@ -3382,6 +3398,8 @@ struct dma_map_ops intel_dma_ops = {
 	.unmap_sg = intel_unmap_sg,
 	.map_page = intel_map_page,
 	.unmap_page = intel_unmap_page,
+	.map_resource = intel_map_resource,
+	.unmap_resource = intel_unmap_resource,
 	.mapping_error = intel_mapping_error,
 };
 
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 7/7] x86: add pci-nommu implementation of map_resource
  2015-05-29 17:14 [PATCH v3 0/7] IOMMU/DMA map_resource support for peer-to-peer wdavis
                   ` (5 preceding siblings ...)
  2015-05-29 17:14 ` [PATCH v3 6/7] iommu/vt-d: implement (un)map_resource wdavis
@ 2015-05-29 17:14 ` wdavis
  2015-07-01 16:45   ` Bjorn Helgaas
  2015-06-15 15:49 ` [PATCH v3 0/7] IOMMU/DMA map_resource support for peer-to-peer William Davis
  7 siblings, 1 reply; 18+ messages in thread
From: wdavis @ 2015-05-29 17:14 UTC (permalink / raw)
  To: Joerg Roedel, Bjorn Helgaas
  Cc: Terence Ripperda, John Hubbard, Jerome Glisse, Mark Hounschell,
	Konrad Rzeszutek Wilk, Jonathan Corbet, David S. Miller,
	Yijing Wang, Alex Williamson, Dave Jiang, iommu, linux-pci,
	Will Davis

From: Will Davis <wdavis@nvidia.com>

Lookup the bus address of the resource by finding the parent host bridge,
which may be different than the parent host bridge of the target device.

Signed-off-by: Will Davis <wdavis@nvidia.com>
---
 arch/x86/kernel/pci-nommu.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
index da15918..6384482 100644
--- a/arch/x86/kernel/pci-nommu.c
+++ b/arch/x86/kernel/pci-nommu.c
@@ -38,6 +38,37 @@ static dma_addr_t nommu_map_page(struct device *dev, struct page *page,
 	return bus;
 }
 
+static dma_addr_t nommu_map_resource(struct device *dev, struct resource *res,
+				     unsigned long offset, size_t size,
+				     enum dma_data_direction dir,
+				     struct dma_attrs *attrs)
+{
+	struct pci_bus *bus;
+	struct pci_host_bridge *bridge;
+	struct resource_entry *window;
+	resource_size_t bus_offset = 0;
+	dma_addr_t dma_address;
+
+	/* Find the parent host bridge of the resource, and determine the
+	 * relative offset.
+	 */
+	list_for_each_entry(bus, &pci_root_buses, node) {
+		bridge = to_pci_host_bridge(bus->bridge);
+		resource_list_for_each_entry(window, &bridge->windows) {
+			if (resource_contains(window->res, res))
+				bus_offset = window->offset;
+		}
+	}
+
+	dma_address = (res->start - bus_offset) + offset;
+	WARN_ON(size == 0);
+	if (!check_addr("map_resource", dev, dma_address, size))
+		return DMA_ERROR_CODE;
+	flush_write_buffers();
+	return dma_address;
+}
+
+
 /* Map a set of buffers described by scatterlist in streaming
  * mode for DMA.  This is the scatter-gather version of the
  * above pci_map_single interface.  Here the scatter gather list
@@ -93,6 +124,7 @@ struct dma_map_ops nommu_dma_ops = {
 	.free			= dma_generic_free_coherent,
 	.map_sg			= nommu_map_sg,
 	.map_page		= nommu_map_page,
+	.map_resource		= nommu_map_resource,
 	.sync_single_for_device = nommu_sync_single_for_device,
 	.sync_sg_for_device	= nommu_sync_sg_for_device,
 	.is_phys		= 1,
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* RE: [PATCH v3 0/7] IOMMU/DMA map_resource support for peer-to-peer
  2015-05-29 17:14 [PATCH v3 0/7] IOMMU/DMA map_resource support for peer-to-peer wdavis
                   ` (6 preceding siblings ...)
  2015-05-29 17:14 ` [PATCH v3 7/7] x86: add pci-nommu implementation of map_resource wdavis
@ 2015-06-15 15:49 ` William Davis
  2015-07-01 15:11   ` William Davis
  7 siblings, 1 reply; 18+ messages in thread
From: William Davis @ 2015-06-15 15:49 UTC (permalink / raw)
  To: Joerg Roedel, Bjorn Helgaas, David S. Miller
  Cc: Terence Ripperda, John Hubbard, Jerome Glisse, Mark Hounschell,
	Konrad Rzeszutek Wilk, Jonathan Corbet, Yijing Wang,
	Alex Williamson, Dave Jiang, iommu, linux-pci


> From: Will Davis <wdavis@nvidia.com>
> 
> Hi,
> 
> This is the third version of a patchset to add the DMA APIs necessary to
> map and unmap a struct resource to and from a PCI device's IOVA domain.
> This allows a PCI device to access a peer device's BAR resource when a
> hardware IOMMU is enabled.
> 

Bjorn, Dave, do you have any remaining concerns with this patch series?

> Thanks,
> Will
> 
> Changelog:
> 
> v3:
> - changed dma_map_resource() to BUG() if the map_resource DMA op is not
>   provided, instead of returning 0
> - updated documentation to remove requirement to check for 0 return value
>   as an error
> - remove const keyword from struct dma_map_ops in new DMA APIs for
>   consistency with other APIs
> 
> v2: http://www.spinics.net/lists/linux-pci/msg41192.html
> - added documentation for the new DMA APIs
> - fixed physical-to-bus address conversion in the nommu implementation
> 
> v1: http://www.spinics.net/lists/linux-pci/msg40747.html
> 
> Will Davis (7):
>   dma-debug: add checking for map/unmap_resource
>   DMA-API: Introduce dma_(un)map_resource
>   dma-mapping: pci: add pci_(un)map_resource
>   DMA-API: Add dma_(un)map_resource() documentation
>   iommu/amd: Implement (un)map_resource
>   iommu/vt-d: implement (un)map_resource
>   x86: add pci-nommu implementation of map_resource
> 
>  Documentation/DMA-API-HOWTO.txt          | 36 ++++++++++++++-
>  Documentation/DMA-API.txt                | 31 ++++++++++---
>  arch/x86/kernel/pci-nommu.c              | 32 ++++++++++++++
>  drivers/iommu/amd_iommu.c                | 76 ++++++++++++++++++++++++++--
> ----
>  drivers/iommu/intel-iommu.c              | 18 ++++++++
>  include/asm-generic/dma-mapping-broken.h |  9 ++++  include/asm-
> generic/dma-mapping-common.h | 34 ++++++++++++++
>  include/asm-generic/pci-dma-compat.h     | 14 ++++++
>  include/linux/dma-debug.h                | 20 +++++++++
>  include/linux/dma-mapping.h              |  7 +++
>  lib/dma-debug.c                          | 47 ++++++++++++++++++++
>  11 files changed, 304 insertions(+), 20 deletions(-)
> 
> --
> 2.4.0

--
nvpublic


^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH v3 0/7] IOMMU/DMA map_resource support for peer-to-peer
  2015-06-15 15:49 ` [PATCH v3 0/7] IOMMU/DMA map_resource support for peer-to-peer William Davis
@ 2015-07-01 15:11   ` William Davis
  0 siblings, 0 replies; 18+ messages in thread
From: William Davis @ 2015-07-01 15:11 UTC (permalink / raw)
  To: Joerg Roedel, Bjorn Helgaas, David S. Miller
  Cc: Terence Ripperda, John Hubbard, Jerome Glisse, Mark Hounschell,
	Konrad Rzeszutek Wilk, Jonathan Corbet, Yijing Wang,
	Alex Williamson, Dave Jiang, iommu, linux-pci

> From: Will Davis <wdavis@nvidia.com>
>
> > From: Will Davis <wdavis@nvidia.com>
> >
> > Hi,
> >
> > This is the third version of a patchset to add the DMA APIs necessary to
> > map and unmap a struct resource to and from a PCI device's IOVA domain.
> > This allows a PCI device to access a peer device's BAR resource when a
> > hardware IOMMU is enabled.
> >
> 
> Bjorn, Dave, do you have any remaining concerns with this patch series?
> 

Bjorn and/or Dave, do you have any more comments or concerns about these patches, or are they Ack-worthy as is?

Thanks,
Will

> > Thanks,
> > Will
> >
> > Changelog:
> >
> > v3:
> > - changed dma_map_resource() to BUG() if the map_resource DMA op is not
> >   provided, instead of returning 0
> > - updated documentation to remove requirement to check for 0 return value
> >   as an error
> > - remove const keyword from struct dma_map_ops in new DMA APIs for
> >   consistency with other APIs
> >
> > v2: http://www.spinics.net/lists/linux-pci/msg41192.html
> > - added documentation for the new DMA APIs
> > - fixed physical-to-bus address conversion in the nommu implementation
> >
> > v1: http://www.spinics.net/lists/linux-pci/msg40747.html
> >
> > Will Davis (7):
> >   dma-debug: add checking for map/unmap_resource
> >   DMA-API: Introduce dma_(un)map_resource
> >   dma-mapping: pci: add pci_(un)map_resource
> >   DMA-API: Add dma_(un)map_resource() documentation
> >   iommu/amd: Implement (un)map_resource
> >   iommu/vt-d: implement (un)map_resource
> >   x86: add pci-nommu implementation of map_resource
> >
> >  Documentation/DMA-API-HOWTO.txt          | 36 ++++++++++++++-
> >  Documentation/DMA-API.txt                | 31 ++++++++++---
> >  arch/x86/kernel/pci-nommu.c              | 32 ++++++++++++++
> >  drivers/iommu/amd_iommu.c                | 76 ++++++++++++++++++++++++++--
> > ----
> >  drivers/iommu/intel-iommu.c              | 18 ++++++++
> >  include/asm-generic/dma-mapping-broken.h |  9 ++++  include/asm-
> > generic/dma-mapping-common.h | 34 ++++++++++++++
> >  include/asm-generic/pci-dma-compat.h     | 14 ++++++
> >  include/linux/dma-debug.h                | 20 +++++++++
> >  include/linux/dma-mapping.h              |  7 +++
> >  lib/dma-debug.c                          | 47 ++++++++++++++++++++
> >  11 files changed, 304 insertions(+), 20 deletions(-)
> >
> > --
> > 2.4.0
> 
--
nvpublic


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 3/7] dma-mapping: pci: add pci_(un)map_resource
  2015-05-29 17:14 ` [PATCH v3 3/7] dma-mapping: pci: add pci_(un)map_resource wdavis
@ 2015-07-01 16:06   ` Bjorn Helgaas
  2015-07-01 18:31     ` Bjorn Helgaas
  0 siblings, 1 reply; 18+ messages in thread
From: Bjorn Helgaas @ 2015-07-01 16:06 UTC (permalink / raw)
  To: wdavis
  Cc: Joerg Roedel, Terence Ripperda, John Hubbard, Jerome Glisse,
	Mark Hounschell, Konrad Rzeszutek Wilk, Jonathan Corbet,
	David S. Miller, Yijing Wang, Alex Williamson, Dave Jiang, iommu,
	linux-pci

On Fri, May 29, 2015 at 12:14:42PM -0500, wdavis@nvidia.com wrote:
> From: Will Davis <wdavis@nvidia.com>
> 
> Simply route these through to the new dma_(un)map_resource APIs.
> 
> Signed-off-by: Will Davis <wdavis@nvidia.com>
> Reviewed-by: Terence Ripperda <tripperda@nvidia.com>
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  include/asm-generic/pci-dma-compat.h | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/include/asm-generic/pci-dma-compat.h b/include/asm-generic/pci-dma-compat.h
> index c110843..ac4a4ad 100644
> --- a/include/asm-generic/pci-dma-compat.h
> +++ b/include/asm-generic/pci-dma-compat.h
> @@ -61,6 +61,20 @@ pci_unmap_page(struct pci_dev *hwdev, dma_addr_t dma_address,
>  	dma_unmap_page(hwdev == NULL ? NULL : &hwdev->dev, dma_address, size, (enum dma_data_direction)direction);
>  }
>  
> +static inline dma_addr_t
> +pci_map_resource(struct pci_dev *hwdev, struct resource *resource,
> +		 unsigned long offset, size_t size, int direction)

After 3a9ad0b4fdcd ("PCI: Add pci_bus_addr_t"), which will appear in
v4.2-rc1, there is a pci_bus_addr_t, and I think you need that instead
of dma_addr_t.

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3a9ad0b4fdcd

> +{
> +	return dma_map_resource(hwdev == NULL ? NULL : &hwdev->dev, resource, offset, size, (enum dma_data_direction)direction);
> +}
> +
> +static inline void
> +pci_unmap_resource(struct pci_dev *hwdev, dma_addr_t dma_address, size_t size,
> +		   int direction)
> +{
> +	dma_unmap_resource(hwdev == NULL ? NULL : &hwdev->dev, dma_address, size, (enum dma_data_direction)direction);
> +}
> +
>  static inline int
>  pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg,
>  	   int nents, int direction)
> -- 
> 2.4.0
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 5/7] iommu/amd: Implement (un)map_resource
  2015-05-29 17:14 ` [PATCH v3 5/7] iommu/amd: Implement (un)map_resource wdavis
@ 2015-07-01 16:13   ` Bjorn Helgaas
  0 siblings, 0 replies; 18+ messages in thread
From: Bjorn Helgaas @ 2015-07-01 16:13 UTC (permalink / raw)
  To: wdavis
  Cc: Joerg Roedel, Terence Ripperda, John Hubbard, Jerome Glisse,
	Mark Hounschell, Konrad Rzeszutek Wilk, Jonathan Corbet,
	David S. Miller, Yijing Wang, Alex Williamson, Dave Jiang, iommu,
	linux-pci

On Fri, May 29, 2015 at 12:14:44PM -0500, wdavis@nvidia.com wrote:
> From: Will Davis <wdavis@nvidia.com>
> 
> Implement 'map_resource' for the AMD IOMMU driver. Generalize the existing
> map_page implementation to operate on a physical address, and make both
> map_page and map_resource wrappers around that helper (and similiarly, for

s/similiarly/similarly/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 7/7] x86: add pci-nommu implementation of map_resource
  2015-05-29 17:14 ` [PATCH v3 7/7] x86: add pci-nommu implementation of map_resource wdavis
@ 2015-07-01 16:45   ` Bjorn Helgaas
  2015-07-01 18:14     ` Will Davis
  0 siblings, 1 reply; 18+ messages in thread
From: Bjorn Helgaas @ 2015-07-01 16:45 UTC (permalink / raw)
  To: wdavis
  Cc: Joerg Roedel, Terence Ripperda, John Hubbard, Jerome Glisse,
	Mark Hounschell, Konrad Rzeszutek Wilk, Jonathan Corbet,
	David S. Miller, Yijing Wang, Alex Williamson, Dave Jiang, iommu,
	linux-pci

On Fri, May 29, 2015 at 12:14:46PM -0500, wdavis@nvidia.com wrote:
> From: Will Davis <wdavis@nvidia.com>
> 
> Lookup the bus address of the resource by finding the parent host bridge,
> which may be different than the parent host bridge of the target device.
> 
> Signed-off-by: Will Davis <wdavis@nvidia.com>
> ---
>  arch/x86/kernel/pci-nommu.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
> index da15918..6384482 100644
> --- a/arch/x86/kernel/pci-nommu.c
> +++ b/arch/x86/kernel/pci-nommu.c
> @@ -38,6 +38,37 @@ static dma_addr_t nommu_map_page(struct device *dev, struct page *page,
>  	return bus;
>  }
>  
> +static dma_addr_t nommu_map_resource(struct device *dev, struct resource *res,
> +				     unsigned long offset, size_t size,
> +				     enum dma_data_direction dir,
> +				     struct dma_attrs *attrs)
> +{
> +	struct pci_bus *bus;
> +	struct pci_host_bridge *bridge;
> +	struct resource_entry *window;
> +	resource_size_t bus_offset = 0;
> +	dma_addr_t dma_address;
> +
> +	/* Find the parent host bridge of the resource, and determine the
> +	 * relative offset.
> +	 */
> +	list_for_each_entry(bus, &pci_root_buses, node) {
> +		bridge = to_pci_host_bridge(bus->bridge);
> +		resource_list_for_each_entry(window, &bridge->windows) {
> +			if (resource_contains(window->res, res))
> +				bus_offset = window->offset;
> +		}
> +	}

I don't think this is safe.  Assume we have the following topology, and
we want to set it up so 0000:00:00.0 can perform peer-to-peer DMA to
0001:00:01.0:

  pci_bus 0000:00: root bus resource [mem 0x80000000-0xffffffff] (bus address [0x80000000-0xffffffff])
  pci 0000:00:00.0: ...
  pci_bus 0001:00: root bus resource [mem 0x180000000-0x1ffffffff] (bus address [0x80000000-0xffffffff])
  pci 0001:00:01.0: reg 0x10: [mem 0x180000000-0x1803fffff 64bit]

I assume the way this works is that the driver for 0000:00:00.0 would call
this function with 0001:00:01.0 and [mem 0x180000000-0x1803fffff 64bit].

We'll figure out that the resource belongs to 0001:00, so we return a
dma_addr of 0x80000000, which is the bus address as seen by 0001:00:01.0.
But if 0000:00:00.0 uses that address, it refers to something in the
0000:00 hierarchy, not the 0001:00 hierarchy.

We talked about pci_bus_address() and pcibios_resource_to_bus() earlier.
What's the subtlety that makes them unusable here?  I'd rather not add more
uses of the pci_root_buses list if we can avoid it.

> +	dma_address = (res->start - bus_offset) + offset;
> +	WARN_ON(size == 0);
> +	if (!check_addr("map_resource", dev, dma_address, size))
> +		return DMA_ERROR_CODE;
> +	flush_write_buffers();
> +	return dma_address;
> +}
> +
> +

You added an extra blank line here (there was already an extra one before
nommu_sync_sg_for_device(), which is probably what you copied).

>  /* Map a set of buffers described by scatterlist in streaming
>   * mode for DMA.  This is the scatter-gather version of the
>   * above pci_map_single interface.  Here the scatter gather list
> @@ -93,6 +124,7 @@ struct dma_map_ops nommu_dma_ops = {
>  	.free			= dma_generic_free_coherent,
>  	.map_sg			= nommu_map_sg,
>  	.map_page		= nommu_map_page,
> +	.map_resource		= nommu_map_resource,
>  	.sync_single_for_device = nommu_sync_single_for_device,
>  	.sync_sg_for_device	= nommu_sync_sg_for_device,
>  	.is_phys		= 1,
> -- 
> 2.4.0
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 7/7] x86: add pci-nommu implementation of map_resource
  2015-07-01 16:45   ` Bjorn Helgaas
@ 2015-07-01 18:14     ` Will Davis
  2015-07-07 15:34       ` Bjorn Helgaas
  0 siblings, 1 reply; 18+ messages in thread
From: Will Davis @ 2015-07-01 18:14 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Dave Jiang, John Hubbard, Jonathan Corbet, iommu, Jerome Glisse,
	linux-pci, Terence Ripperda, David S. Miller, Will Davis

> From: Bjorn Helgaas <bhelgaas@google.com>
> On Fri, May 29, 2015 at 12:14:46PM -0500, wdavis@nvidia.com wrote:
> > From: Will Davis <wdavis@nvidia.com>
> > 
> > Lookup the bus address of the resource by finding the parent host bridge,
> > which may be different than the parent host bridge of the target device.
> > 
> > Signed-off-by: Will Davis <wdavis@nvidia.com>
> > ---
> >  arch/x86/kernel/pci-nommu.c | 32 ++++++++++++++++++++++++++++++++
> >  1 file changed, 32 insertions(+)
> > 
> > diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
> > index da15918..6384482 100644
> > --- a/arch/x86/kernel/pci-nommu.c
> > +++ b/arch/x86/kernel/pci-nommu.c
> > @@ -38,6 +38,37 @@ static dma_addr_t nommu_map_page(struct device *dev, struct page *page,
> >  	return bus;
> >  }
> >  
> > +static dma_addr_t nommu_map_resource(struct device *dev, struct resource *res,
> > +				     unsigned long offset, size_t size,
> > +				     enum dma_data_direction dir,
> > +				     struct dma_attrs *attrs)
> > +{
> > +	struct pci_bus *bus;
> > +	struct pci_host_bridge *bridge;
> > +	struct resource_entry *window;
> > +	resource_size_t bus_offset = 0;
> > +	dma_addr_t dma_address;
> > +
> > +	/* Find the parent host bridge of the resource, and determine the
> > +	 * relative offset.
> > +	 */
> > +	list_for_each_entry(bus, &pci_root_buses, node) {
> > +		bridge = to_pci_host_bridge(bus->bridge);
> > +		resource_list_for_each_entry(window, &bridge->windows) {
> > +			if (resource_contains(window->res, res))
> > +				bus_offset = window->offset;
> > +		}
> > +	}
> 
> I don't think this is safe.  Assume we have the following topology, and
> we want to set it up so 0000:00:00.0 can perform peer-to-peer DMA to
> 0001:00:01.0:
> 
>   pci_bus 0000:00: root bus resource [mem 0x80000000-0xffffffff] (bus address [0x80000000-0xffffffff])
>   pci 0000:00:00.0: ...
>   pci_bus 0001:00: root bus resource [mem 0x180000000-0x1ffffffff] (bus address [0x80000000-0xffffffff])
>   pci 0001:00:01.0: reg 0x10: [mem 0x180000000-0x1803fffff 64bit]
> 
> I assume the way this works is that the driver for 0000:00:00.0 would call
> this function with 0001:00:01.0 and [mem 0x180000000-0x1803fffff 64bit].
> 

The intention is that pci_map_resource() would be called with the device to
map the region to, and the resource to map. So in this example, we would
call pci_map_resource(0000:00:00.0, [mem 0x180000000-0x1803fffff 64bit]).
The driver for 0000:00:00.0 needs to pass some information to
pci_map_resource() indicating that the mapping is for device 0000:00:00.0.

> We'll figure out that the resource belongs to 0001:00, so we return a
> dma_addr of 0x80000000, which is the bus address as seen by 0001:00:01.0.
> But if 0000:00:00.0 uses that address, it refers to something in the
> 0000:00 hierarchy, not the 0001:00 hierarchy.
> 

If the bus addresses are organized as described, is peer-to-peer DMA even
possible with this nommu topology? Is there any way in which device
0000:00:00.0 can address resources under the 0001:00: root bus, since the
bus address range is identical?

> We talked about pci_bus_address() and pcibios_resource_to_bus() earlier.
> What's the subtlety that makes them unusable here?  I'd rather not add more
> uses of the pci_root_buses list if we can avoid it.
> 

I ran into the challenge I mentioned above, where the only struct pci_dev*
we have is for the target device, and not that of the device that owns the
resource.

> > +	dma_address = (res->start - bus_offset) + offset;
> > +	WARN_ON(size == 0);
> > +	if (!check_addr("map_resource", dev, dma_address, size))
> > +		return DMA_ERROR_CODE;
> > +	flush_write_buffers();
> > +	return dma_address;
> > +}
> > +
> > +
> 
> You added an extra blank line here (there was already an extra one before
> nommu_sync_sg_for_device(), which is probably what you copied).
> 

Thanks, I'll remove the extra line.

> >  /* Map a set of buffers described by scatterlist in streaming
> >   * mode for DMA.  This is the scatter-gather version of the
> >   * above pci_map_single interface.  Here the scatter gather list
> > @@ -93,6 +124,7 @@ struct dma_map_ops nommu_dma_ops = {
> >  	.free			= dma_generic_free_coherent,
> >  	.map_sg			= nommu_map_sg,
> >  	.map_page		= nommu_map_page,
> > +	.map_resource		= nommu_map_resource,
> >  	.sync_single_for_device = nommu_sync_single_for_device,
> >  	.sync_sg_for_device	= nommu_sync_sg_for_device,
> >  	.is_phys		= 1,
> > -- 
> > 2.4.0
> > 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 3/7] dma-mapping: pci: add pci_(un)map_resource
  2015-07-01 16:06   ` Bjorn Helgaas
@ 2015-07-01 18:31     ` Bjorn Helgaas
  2015-07-06 15:16       ` Will Davis
  0 siblings, 1 reply; 18+ messages in thread
From: Bjorn Helgaas @ 2015-07-01 18:31 UTC (permalink / raw)
  To: wdavis
  Cc: Joerg Roedel, Terence Ripperda, John Hubbard, Jerome Glisse,
	Mark Hounschell, Konrad Rzeszutek Wilk, Jonathan Corbet,
	David S. Miller, Yijing Wang, Alex Williamson, Dave Jiang, iommu,
	linux-pci

On Wed, Jul 01, 2015 at 11:06:06AM -0500, Bjorn Helgaas wrote:
> On Fri, May 29, 2015 at 12:14:42PM -0500, wdavis@nvidia.com wrote:
> > From: Will Davis <wdavis@nvidia.com>
> > 
> > Simply route these through to the new dma_(un)map_resource APIs.
> > 
> > Signed-off-by: Will Davis <wdavis@nvidia.com>
> > Reviewed-by: Terence Ripperda <tripperda@nvidia.com>
> > Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> > ---
> >  include/asm-generic/pci-dma-compat.h | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> > 
> > diff --git a/include/asm-generic/pci-dma-compat.h b/include/asm-generic/pci-dma-compat.h
> > index c110843..ac4a4ad 100644
> > --- a/include/asm-generic/pci-dma-compat.h
> > +++ b/include/asm-generic/pci-dma-compat.h
> > @@ -61,6 +61,20 @@ pci_unmap_page(struct pci_dev *hwdev, dma_addr_t dma_address,
> >  	dma_unmap_page(hwdev == NULL ? NULL : &hwdev->dev, dma_address, size, (enum dma_data_direction)direction);
> >  }
> >  
> > +static inline dma_addr_t
> > +pci_map_resource(struct pci_dev *hwdev, struct resource *resource,
> > +		 unsigned long offset, size_t size, int direction)
> 
> After 3a9ad0b4fdcd ("PCI: Add pci_bus_addr_t"), which will appear in
> v4.2-rc1, there is a pci_bus_addr_t, and I think you need that instead
> of dma_addr_t.
> 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3a9ad0b4fdcd

This wasn't a very good response, sorry.  I don't think just
changing dma_addr_t to pci_bus_addr_t is a good resolution.

Isn't there an implicit assumption here that either you're using an IOMMU
driver that always returns bus addresses that fit in a dma_addr_t, or
you're not using an IOMMU and dma_addr_t is big enough for any bus address?

What happens on a system with 64-bit PCI bus addresses, 32-bit dma_addr_t,
and no IOMMU?  Would this return failure somehow?

> > +{
> > +	return dma_map_resource(hwdev == NULL ? NULL : &hwdev->dev, resource, offset, size, (enum dma_data_direction)direction);
> > +}
> > +
> > +static inline void
> > +pci_unmap_resource(struct pci_dev *hwdev, dma_addr_t dma_address, size_t size,
> > +		   int direction)
> > +{
> > +	dma_unmap_resource(hwdev == NULL ? NULL : &hwdev->dev, dma_address, size, (enum dma_data_direction)direction);
> > +}
> > +
> >  static inline int
> >  pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg,
> >  	   int nents, int direction)
> > -- 
> > 2.4.0
> > 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 3/7] dma-mapping: pci: add pci_(un)map_resource
  2015-07-01 18:31     ` Bjorn Helgaas
@ 2015-07-06 15:16       ` Will Davis
  0 siblings, 0 replies; 18+ messages in thread
From: Will Davis @ 2015-07-06 15:16 UTC (permalink / raw)
  To: Bjorn Helgaas, Joerg Roedel
  Cc: Will Davis, Dave Jiang, John Hubbard, Jonathan Corbet, iommu,
	Jerome Glisse, linux-pci, Terence Ripperda, David S. Miller


> On Wed, Jul 01, 2015 at 11:06:06AM -0500, Bjorn Helgaas wrote:
> > On Fri, May 29, 2015 at 12:14:42PM -0500, wdavis@... wrote:
> > > From: Will Davis <wdavis@...>
> > > 
> > > Simply route these through to the new dma_(un)map_resource APIs.
> > > 
> > > Signed-off-by: Will Davis <wdavis@...>
> > > Reviewed-by: Terence Ripperda <tripperda@...>
> > > Reviewed-by: John Hubbard <jhubbard@...>
> > > ---
> > >  include/asm-generic/pci-dma-compat.h | 14 ++++++++++++++
> > >  1 file changed, 14 insertions(+)
> > > 
> > > diff --git a/include/asm-generic/pci-dma-compat.h b/include/asm-generic/pci-dma-compat.h
> > > index c110843..ac4a4ad 100644
> > > --- a/include/asm-generic/pci-dma-compat.h
> > > +++ b/include/asm-generic/pci-dma-compat.h
> > >  <at>  <at>  -61,6 +61,20  <at>  <at>  pci_unmap_page(struct pci_dev *hwdev, dma_addr_t dma_address,
> > >  	dma_unmap_page(hwdev == NULL ? NULL : &hwdev->dev, dma_address, size, (enum dma_data_direction)direction);
> > >  }
> > >  
> > > +static inline dma_addr_t
> > > +pci_map_resource(struct pci_dev *hwdev, struct resource *resource,
> > > +		 unsigned long offset, size_t size, int direction)
> > 
> > After 3a9ad0b4fdcd ("PCI: Add pci_bus_addr_t"), which will appear in
> > v4.2-rc1, there is a pci_bus_addr_t, and I think you need that instead
> > of dma_addr_t.
> > 
> > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3a9ad0b4fdcd
> 
> This wasn't a very good response, sorry.  I don't think just
> changing dma_addr_t to pci_bus_addr_t is a good resolution.
> 
> Isn't there an implicit assumption here that either you're using an IOMMU
> driver that always returns bus addresses that fit in a dma_addr_t, or
> you're not using an IOMMU and dma_addr_t is big enough for any bus address?
> 
> What happens on a system with 64-bit PCI bus addresses, 32-bit dma_addr_t,
> and no IOMMU?  Would this return failure somehow?
> 

I think I see your point. Even if pci_map_resource() were changed to return
a pci_bus_addr_t, we would have the same problem again where the dma_addr_t
would either need to expand (a non-starter as already discussed) or replace
the return type of dma_map_resource() with something wider to accommodate,
or prevent such platforms from attempting to return a valid address that
will be truncated.

PCI devices on the platform in question (64-bit PCI bus addresses, 32-bit
dma_addr_t, and no IOMMU) should theoretically still be capable of PCI
peer-to-peer traffic, but would only be limited by the API, which we do not
want.

What about introducing another always-64-bit type, like peer_dma_addr_t,
for use with the new DMA APIs?

> > > +{
> > > +	return dma_map_resource(hwdev == NULL ? NULL : &hwdev->dev, resource, offset, size, (enum dma_data_direction)direction);
> > > +}
> > > +
> > > +static inline void
> > > +pci_unmap_resource(struct pci_dev *hwdev, dma_addr_t dma_address, size_t size,
> > > +		   int direction)
> > > +{
> > > +	dma_unmap_resource(hwdev == NULL ? NULL : &hwdev->dev, dma_address, size, (enum dma_data_direction)direction);
> > > +}
> > > +
> > >  static inline int
> > >  pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg,
> > >  	   int nents, int direction)
> > > -- 
> > > 2.4.0
> > >

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 7/7] x86: add pci-nommu implementation of map_resource
  2015-07-01 18:14     ` Will Davis
@ 2015-07-07 15:34       ` Bjorn Helgaas
  2015-07-07 18:59         ` Will Davis
  0 siblings, 1 reply; 18+ messages in thread
From: Bjorn Helgaas @ 2015-07-07 15:34 UTC (permalink / raw)
  To: Will Davis
  Cc: Dave Jiang, John Hubbard, Jonathan Corbet, iommu, Jerome Glisse,
	linux-pci, Terence Ripperda, David S. Miller, Mark Hounschell,
	joro, Konrad Rzeszutek Wilk, Alex Williamson

[+cc Mark, Joerg, Konrad, Alex]

Hi Will,

On Wed, Jul 01, 2015 at 01:14:30PM -0500, Will Davis wrote:
> > From: Bjorn Helgaas <bhelgaas@google.com>
> > On Fri, May 29, 2015 at 12:14:46PM -0500, wdavis@nvidia.com wrote:
> > > From: Will Davis <wdavis@nvidia.com>
> > > 
> > > Lookup the bus address of the resource by finding the parent host bridge,
> > > which may be different than the parent host bridge of the target device.
> > > 
> > > Signed-off-by: Will Davis <wdavis@nvidia.com>
> > > ---
> > >  arch/x86/kernel/pci-nommu.c | 32 ++++++++++++++++++++++++++++++++
> > >  1 file changed, 32 insertions(+)
> > > 
> > > diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
> > > index da15918..6384482 100644
> > > --- a/arch/x86/kernel/pci-nommu.c
> > > +++ b/arch/x86/kernel/pci-nommu.c
> > > @@ -38,6 +38,37 @@ static dma_addr_t nommu_map_page(struct device *dev, struct page *page,
> > >  	return bus;
> > >  }
> > >  
> > > +static dma_addr_t nommu_map_resource(struct device *dev, struct resource *res,
> > > +				     unsigned long offset, size_t size,
> > > +				     enum dma_data_direction dir,
> > > +				     struct dma_attrs *attrs)
> > > +{
> > > +	struct pci_bus *bus;
> > > +	struct pci_host_bridge *bridge;
> > > +	struct resource_entry *window;
> > > +	resource_size_t bus_offset = 0;
> > > +	dma_addr_t dma_address;
> > > +
> > > +	/* Find the parent host bridge of the resource, and determine the
> > > +	 * relative offset.
> > > +	 */
> > > +	list_for_each_entry(bus, &pci_root_buses, node) {
> > > +		bridge = to_pci_host_bridge(bus->bridge);
> > > +		resource_list_for_each_entry(window, &bridge->windows) {
> > > +			if (resource_contains(window->res, res))
> > > +				bus_offset = window->offset;
> > > +		}
> > > +	}
> > 
> > I don't think this is safe.  Assume we have the following topology, and
> > we want to set it up so 0000:00:00.0 can perform peer-to-peer DMA to
> > 0001:00:01.0:
> > 
> >   pci_bus 0000:00: root bus resource [mem 0x80000000-0xffffffff] (bus address [0x80000000-0xffffffff])
> >   pci 0000:00:00.0: ...
> >   pci_bus 0001:00: root bus resource [mem 0x180000000-0x1ffffffff] (bus address [0x80000000-0xffffffff])
> >   pci 0001:00:01.0: reg 0x10: [mem 0x180000000-0x1803fffff 64bit]
> > 
> > I assume the way this works is that the driver for 0000:00:00.0 would call
> > this function with 0001:00:01.0 and [mem 0x180000000-0x1803fffff 64bit].
> > 
> 
> The intention is that pci_map_resource() would be called with the device to
> map the region to, and the resource to map. So in this example, we would
> call pci_map_resource(0000:00:00.0, [mem 0x180000000-0x1803fffff 64bit]).
> The driver for 0000:00:00.0 needs to pass some information to
> pci_map_resource() indicating that the mapping is for device 0000:00:00.0.

Oh, of course; that's sort of analogous to the way the other DMA
mapping interfaces work.

> > We'll figure out that the resource belongs to 0001:00, so we return a
> > dma_addr of 0x80000000, which is the bus address as seen by 0001:00:01.0.
> > But if 0000:00:00.0 uses that address, it refers to something in the
> > 0000:00 hierarchy, not the 0001:00 hierarchy.
> 
> If the bus addresses are organized as described, is peer-to-peer DMA even
> possible with this nommu topology? Is there any way in which device
> 0000:00:00.0 can address resources under the 0001:00: root bus, since the
> bus address range is identical?

It doesn't seem possible on conventional PCI, because the host bridge
to 0000:00 believes the transaction is intended for a device under it,
not for a device under 0001:00.

On PCIe, I think it would depend on ACS configuration and the IOMMU
and whether there's anything that can route transactions between host
bridges.

Is it important to support peer-to-peer between host bridges?  If it's
not important, you could probably simplify things by disallowing that
case.

The pci_map_resource(struct pci_dev *, struct resource *, offset, ...)
interface is analogous to dma_map_single() and similar interfaces.
But we're essentially using the resource as a proxy to identify the
other device: we use the resource, i.e., the CPU physical address of
one of the BARs, to search for the host bridge.

What would you think about explicitly passing both devices, e.g.,
replacing the "struct resource *" with a "struct pci_dev *, int bar"
pair?  It seems like then we'd be better prepared to figure out
whether it's even possible to do peer-to-peer between the two devices.

I don't know how to discover that today, but I assume that's just
because I'm ignorant or there's a hole in the system description that
might be filled eventually.

Bjorn

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 7/7] x86: add pci-nommu implementation of map_resource
  2015-07-07 15:34       ` Bjorn Helgaas
@ 2015-07-07 18:59         ` Will Davis
  0 siblings, 0 replies; 18+ messages in thread
From: Will Davis @ 2015-07-07 18:59 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Dave Jiang, John Hubbard, Jonathan Corbet, iommu, Jerome Glisse,
	linux-pci, Terence Ripperda, David S. Miller, Mark Hounschell,
	joro, Konrad Rzeszutek Wilk, Alex Williamson

> [+cc Mark, Joerg, Konrad, Alex]
> 
> Hi Will,
> 
> On Wed, Jul 01, 2015 at 01:14:30PM -0500, Will Davis wrote:
> > > From: Bjorn Helgaas <bhelgaas@google.com>
> > > On Fri, May 29, 2015 at 12:14:46PM -0500, wdavis@nvidia.com wrote:
> > > > From: Will Davis <wdavis@nvidia.com>
> > > > 
> > > > Lookup the bus address of the resource by finding the parent host bridge,
> > > > which may be different than the parent host bridge of the target device.
> > > > 
> > > > Signed-off-by: Will Davis <wdavis@nvidia.com>
> > > > ---
> > > >  arch/x86/kernel/pci-nommu.c | 32 ++++++++++++++++++++++++++++++++
> > > >  1 file changed, 32 insertions(+)
> > > > 
> > > > diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
> > > > index da15918..6384482 100644
> > > > --- a/arch/x86/kernel/pci-nommu.c
> > > > +++ b/arch/x86/kernel/pci-nommu.c
> > > > @@ -38,6 +38,37 @@ static dma_addr_t nommu_map_page(struct device *dev, struct page *page,
> > > >  	return bus;
> > > >  }
> > > >  
> > > > +static dma_addr_t nommu_map_resource(struct device *dev, struct resource *res,
> > > > +				     unsigned long offset, size_t size,
> > > > +				     enum dma_data_direction dir,
> > > > +				     struct dma_attrs *attrs)
> > > > +{
> > > > +	struct pci_bus *bus;
> > > > +	struct pci_host_bridge *bridge;
> > > > +	struct resource_entry *window;
> > > > +	resource_size_t bus_offset = 0;
> > > > +	dma_addr_t dma_address;
> > > > +
> > > > +	/* Find the parent host bridge of the resource, and determine the
> > > > +	 * relative offset.
> > > > +	 */
> > > > +	list_for_each_entry(bus, &pci_root_buses, node) {
> > > > +		bridge = to_pci_host_bridge(bus->bridge);
> > > > +		resource_list_for_each_entry(window, &bridge->windows) {
> > > > +			if (resource_contains(window->res, res))
> > > > +				bus_offset = window->offset;
> > > > +		}
> > > > +	}
> > > 
> > > I don't think this is safe.  Assume we have the following topology, and
> > > we want to set it up so 0000:00:00.0 can perform peer-to-peer DMA to
> > > 0001:00:01.0:
> > > 
> > >   pci_bus 0000:00: root bus resource [mem 0x80000000-0xffffffff] (bus address [0x80000000-0xffffffff])
> > >   pci 0000:00:00.0: ...
> > >   pci_bus 0001:00: root bus resource [mem 0x180000000-0x1ffffffff] (bus address [0x80000000-0xffffffff])
> > >   pci 0001:00:01.0: reg 0x10: [mem 0x180000000-0x1803fffff 64bit]
> > > 
> > > I assume the way this works is that the driver for 0000:00:00.0 would call
> > > this function with 0001:00:01.0 and [mem 0x180000000-0x1803fffff 64bit].
> > > 
> > 
> > The intention is that pci_map_resource() would be called with the device to
> > map the region to, and the resource to map. So in this example, we would
> > call pci_map_resource(0000:00:00.0, [mem 0x180000000-0x1803fffff 64bit]).
> > The driver for 0000:00:00.0 needs to pass some information to
> > pci_map_resource() indicating that the mapping is for device 0000:00:00.0.
> 
> Oh, of course; that's sort of analogous to the way the other DMA
> mapping interfaces work.
> 
> > > We'll figure out that the resource belongs to 0001:00, so we return a
> > > dma_addr of 0x80000000, which is the bus address as seen by 0001:00:01.0.
> > > But if 0000:00:00.0 uses that address, it refers to something in the
> > > 0000:00 hierarchy, not the 0001:00 hierarchy.
> > 
> > If the bus addresses are organized as described, is peer-to-peer DMA even
> > possible with this nommu topology? Is there any way in which device
> > 0000:00:00.0 can address resources under the 0001:00: root bus, since the
> > bus address range is identical?
> 
> It doesn't seem possible on conventional PCI, because the host bridge
> to 0000:00 believes the transaction is intended for a device under it,
> not for a device under 0001:00.
> 
> On PCIe, I think it would depend on ACS configuration and the IOMMU
> and whether there's anything that can route transactions between host
> bridges.
> 
> Is it important to support peer-to-peer between host bridges?  If it's
> not important, you could probably simplify things by disallowing that
> case.
>

I've mostly been focused on peer-to-peer under a single root complex. At
least for our hardware (which is what I've had to test with), we only
support peer-to-peer when both devices are under the same root complex,
due to some performance and/or functional issues that I am not entirely
clear on, seen when trying to peer-to-peer to a device under another root
complex via Intel QPI [1]:

"The ability to use the peer-to-peer
protocol among GPUs, and its performance, is constrained
by the PCIe topology; performance is excellent when two
GPUs share the same PCIe root-complex, e.g. they are directly
connected to a PCIe switch or to the same hub. Otherwise,
when GPUs are linked to different bus branches, performance
may suffers or malfunctionings can arise. This can be an issue
on multi-socket Sandy Bridge Xeon platforms, where PCIe
slots might be connected to different processors, therefore
requiring GPU peer-to-peer traffic to cross the inter-socket
QPI channel(s)."

Apparently the QPI protocol is not quite compatible with PCIe peer-to-peer,
so perhaps that is what is being referred to [2]:

"The IOH does not support non-contiguous byte enables from PCI Express for
remote peer-to-peer MMIO transactions. This is an additional restriction
over the PCI Express standard requirements to prevent incompatibility with
Intel QuickPath Interconnect."

[1] http://arxiv.org/pdf/1307.8276.pdf
[2] http://www.intel.com/content/www/us/en/chipsets/5520-5500-chipset-ioh-datasheet.html

I'm trying to find some more details on the issues behind this but it
seems that they are Intel-specific. Although if we can't determine an
easy way to tell if inter-root-complex peer-to-peer is supported, per
the other sub-thread, then maybe it would be best to just disable that
case for now.

> The pci_map_resource(struct pci_dev *, struct resource *, offset, ...)
> interface is analogous to dma_map_single() and similar interfaces.
> But we're essentially using the resource as a proxy to identify the
> other device: we use the resource, i.e., the CPU physical address of
> one of the BARs, to search for the host bridge.
> 
> What would you think about explicitly passing both devices, e.g.,
> replacing the "struct resource *" with a "struct pci_dev *, int bar"
> pair?  It seems like then we'd be better prepared to figure out
> whether it's even possible to do peer-to-peer between the two devices.
> 

Yes, that does sound better to me. I'll work this into the next version.

Thanks,
Will

> I don't know how to discover that today, but I assume that's just
> because I'm ignorant or there's a hole in the system description that
> might be filled eventually.
> 
> Bjorn
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2015-07-07 19:14 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-29 17:14 [PATCH v3 0/7] IOMMU/DMA map_resource support for peer-to-peer wdavis
2015-05-29 17:14 ` [PATCH v3 1/7] dma-debug: add checking for map/unmap_resource wdavis
2015-05-29 17:14 ` [PATCH v3 2/7] DMA-API: Introduce dma_(un)map_resource wdavis
2015-05-29 17:14 ` [PATCH v3 3/7] dma-mapping: pci: add pci_(un)map_resource wdavis
2015-07-01 16:06   ` Bjorn Helgaas
2015-07-01 18:31     ` Bjorn Helgaas
2015-07-06 15:16       ` Will Davis
2015-05-29 17:14 ` [PATCH v3 4/7] DMA-API: Add dma_(un)map_resource() documentation wdavis
2015-05-29 17:14 ` [PATCH v3 5/7] iommu/amd: Implement (un)map_resource wdavis
2015-07-01 16:13   ` Bjorn Helgaas
2015-05-29 17:14 ` [PATCH v3 6/7] iommu/vt-d: implement (un)map_resource wdavis
2015-05-29 17:14 ` [PATCH v3 7/7] x86: add pci-nommu implementation of map_resource wdavis
2015-07-01 16:45   ` Bjorn Helgaas
2015-07-01 18:14     ` Will Davis
2015-07-07 15:34       ` Bjorn Helgaas
2015-07-07 18:59         ` Will Davis
2015-06-15 15:49 ` [PATCH v3 0/7] IOMMU/DMA map_resource support for peer-to-peer William Davis
2015-07-01 15:11   ` William Davis

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).