linux-kernel.vger.kernel.org archive mirror
* [PATCH v6 0/7] Fixes for dma-iommu swiotlb bounce buffers
@ 2021-08-17  1:38 David Stevens
  2021-08-17  1:38 ` [PATCH v6 1/7] dma-iommu: fix sync_sg with swiotlb David Stevens
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: David Stevens @ 2021-08-17  1:38 UTC (permalink / raw)
  To: Robin Murphy, Christoph Hellwig
  Cc: Joerg Roedel, Will Deacon, Lu Baolu, Tom Murphy, iommu,
	linux-kernel, David Stevens

From: David Stevens <stevensd@chromium.org>

This patch set includes various fixes for dma-iommu's swiotlb bounce
buffers for untrusted devices. There are four fixes for correctness
issues, one for a performance issue, and two for general cleanup.

The min_align_mask issue was found when running fio on an untrusted NVMe
device with bs=512. The other issues were found via code inspection, so
I don't have any specific use cases where things were not working, nor
any concrete performance numbers.

v5 -> v6:
 - Remove unnecessary line break
 - Remove redundant config check

v4 -> v5:
 - Fix xen build error
 - Move _swiotlb refactor into its own patch

v3 -> v4:
 - Fold _swiotlb functions into _page functions
 - Add patch to align swiotlb buffer to iovad granule
 - Combine if checks in iommu_dma_sync_sg_* functions

v2 -> v3:
 - Add new patch to address min_align_mask bug
 - Set SKIP_CPU_SYNC flag after syncing in map/unmap
 - Properly call arch_sync_dma_for_cpu in iommu_dma_sync_sg_for_cpu

v1 -> v2:
 - Split fixes into dedicated patches
 - Less invasive changes to fix arch_sync when mapping
 - Leave dev_is_untrusted check for strict iommu

David Stevens (7):
  dma-iommu: fix sync_sg with swiotlb
  dma-iommu: fix arch_sync_dma for map
  dma-iommu: skip extra sync during unmap w/swiotlb
  dma-iommu: fold _swiotlb helpers into callers
  dma-iommu: Check CONFIG_SWIOTLB more broadly
  swiotlb: support aligned swiotlb buffers
  dma-iommu: account for min_align_mask

 drivers/iommu/dma-iommu.c | 191 +++++++++++++++++---------------------
 drivers/xen/swiotlb-xen.c |   2 +-
 include/linux/swiotlb.h   |   3 +-
 kernel/dma/swiotlb.c      |  11 ++-
 4 files changed, 96 insertions(+), 111 deletions(-)

-- 
2.33.0.rc1.237.g0d66db33f3-goog



* [PATCH v6 1/7] dma-iommu: fix sync_sg with swiotlb
  2021-08-17  1:38 [PATCH v6 0/7] Fixes for dma-iommu swiotlb bounce buffers David Stevens
@ 2021-08-17  1:38 ` David Stevens
  2021-08-17  1:38 ` [PATCH v6 2/7] dma-iommu: fix arch_sync_dma for map David Stevens
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: David Stevens @ 2021-08-17  1:38 UTC (permalink / raw)
  To: Robin Murphy, Christoph Hellwig
  Cc: Joerg Roedel, Will Deacon, Lu Baolu, Tom Murphy, iommu,
	linux-kernel, David Stevens

From: David Stevens <stevensd@chromium.org>

The is_swiotlb_buffer function takes the physical address of the swiotlb
buffer, not the physical address of the original buffer. The sglist
contains the physical addresses of the original buffer, so for the
sync_sg functions to work properly when a bounce buffer might have been
used, we need to use iommu_iova_to_phys to look up the physical address.
This is what sync_single does, so call that function on each sglist
segment.

The previous code mostly worked because swiotlb does the transfer on map
and unmap. However, any callers which use DMA_ATTR_SKIP_CPU_SYNC with
sglists or which call sync_sg would not have had anything copied to the
bounce buffer.
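
As an illustration (not from the patch; the device, buffer, and error
handling here are hypothetical), this is the sort of driver-side pattern
that was affected, using only standard DMA API calls:

#include <linux/dma-mapping.h>
#include <linux/errno.h>
#include <linux/scatterlist.h>

/* Sketch: map with a deferred CPU sync, then sync explicitly before DMA. */
static int example_queue_tx(struct device *dev, void *buf, size_t len)
{
	struct scatterlist sg;

	sg_init_one(&sg, buf, len);

	if (!dma_map_sg_attrs(dev, &sg, 1, DMA_TO_DEVICE,
			      DMA_ATTR_SKIP_CPU_SYNC))
		return -ENOMEM;

	/*
	 * Before this fix, for an untrusted device this did cache
	 * maintenance on the original pages but copied nothing into the
	 * swiotlb bounce buffer, so the device saw stale data.
	 */
	dma_sync_sg_for_device(dev, &sg, 1, DMA_TO_DEVICE);

	/* ... start the transfer; on completion: ... */

	dma_unmap_sg_attrs(dev, &sg, 1, DMA_TO_DEVICE,
			   DMA_ATTR_SKIP_CPU_SYNC);
	return 0;
}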

Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
Signed-off-by: David Stevens <stevensd@chromium.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/iommu/dma-iommu.c | 33 +++++++++++++--------------------
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 98ba927aee1a..968e0150c95e 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -810,17 +810,13 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
 	struct scatterlist *sg;
 	int i;
 
-	if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
-		return;
-
-	for_each_sg(sgl, sg, nelems, i) {
-		if (!dev_is_dma_coherent(dev))
+	if (dev_is_untrusted(dev))
+		for_each_sg(sgl, sg, nelems, i)
+			iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
+						      sg->length, dir);
+	else if (!dev_is_dma_coherent(dev))
+		for_each_sg(sgl, sg, nelems, i)
 			arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
-
-		if (is_swiotlb_buffer(sg_phys(sg)))
-			swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
-						    sg->length, dir);
-	}
 }
 
 static void iommu_dma_sync_sg_for_device(struct device *dev,
@@ -830,17 +826,14 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
 	struct scatterlist *sg;
 	int i;
 
-	if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
-		return;
-
-	for_each_sg(sgl, sg, nelems, i) {
-		if (is_swiotlb_buffer(sg_phys(sg)))
-			swiotlb_sync_single_for_device(dev, sg_phys(sg),
-						       sg->length, dir);
-
-		if (!dev_is_dma_coherent(dev))
+	if (dev_is_untrusted(dev))
+		for_each_sg(sgl, sg, nelems, i)
+			iommu_dma_sync_single_for_device(dev,
+							 sg_dma_address(sg),
+							 sg->length, dir);
+	else if (!dev_is_dma_coherent(dev))
+		for_each_sg(sgl, sg, nelems, i)
 			arch_sync_dma_for_device(sg_phys(sg), sg->length, dir);
-	}
 }
 
 static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
-- 
2.33.0.rc1.237.g0d66db33f3-goog



* [PATCH v6 2/7] dma-iommu: fix arch_sync_dma for map
  2021-08-17  1:38 [PATCH v6 0/7] Fixes for dma-iommu swiotlb bounce buffers David Stevens
  2021-08-17  1:38 ` [PATCH v6 1/7] dma-iommu: fix sync_sg with swiotlb David Stevens
@ 2021-08-17  1:38 ` David Stevens
  2021-08-19  8:59   ` Robin Murphy
  2021-08-17  1:38 ` [PATCH v6 3/7] dma-iommu: skip extra sync during unmap w/swiotlb David Stevens
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: David Stevens @ 2021-08-17  1:38 UTC (permalink / raw)
  To: Robin Murphy, Christoph Hellwig
  Cc: Joerg Roedel, Will Deacon, Lu Baolu, Tom Murphy, iommu,
	linux-kernel, David Stevens

From: David Stevens <stevensd@chromium.org>

When calling arch_sync_dma, we need to pass it the memory that's
actually being used for dma. When using swiotlb bounce buffers, this is
the bounce buffer. Move arch_sync_dma into the __iommu_dma_map_swiotlb
helper, so it can use the bounce buffer address if necessary.

Now that iommu_dma_map_sg delegates to a function which takes care of
architectural syncing in the untrusted device case, the call to
iommu_dma_sync_sg_for_device can be moved so it only occurs for trusted
devices. Doing the sync for untrusted devices before mapping never
really worked, since it needs to be able to target swiotlb buffers.

This also moves the architectural sync to before the call to
__iommu_dma_map, to guarantee that untrusted devices can't see stale
data they shouldn't see.
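
To sketch the resulting order for a bounced, non-coherent mapping (a
simplified call-order illustration of the diff below, not literal code):

	/* before: maintenance hit the original pages, not the bounce buffer */
	__iommu_dma_map_swiotlb(dev, phys, ...);     /* bounce + IOMMU map */
	arch_sync_dma_for_device(phys, size, dir);   /* phys = original pages */

	/* after: the sync targets what the device will actually access */
	phys = swiotlb_tbl_map_single(dev, phys, ...);  /* phys = bounce slot */
	arch_sync_dma_for_device(phys, org_size, dir);
	iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);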

Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
Signed-off-by: David Stevens <stevensd@chromium.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/iommu/dma-iommu.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 968e0150c95e..8098ce593640 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -576,6 +576,9 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
 		memset(padding_start, 0, padding_size);
 	}
 
+	if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+		arch_sync_dma_for_device(phys, org_size, dir);
+
 	iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
 	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
 		swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
@@ -842,14 +845,9 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 {
 	phys_addr_t phys = page_to_phys(page) + offset;
 	bool coherent = dev_is_dma_coherent(dev);
-	dma_addr_t dma_handle;
 
-	dma_handle = __iommu_dma_map_swiotlb(dev, phys, size, dma_get_mask(dev),
+	return __iommu_dma_map_swiotlb(dev, phys, size, dma_get_mask(dev),
 			coherent, dir, attrs);
-	if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
-	    dma_handle != DMA_MAPPING_ERROR)
-		arch_sync_dma_for_device(phys, size, dir);
-	return dma_handle;
 }
 
 static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
@@ -992,12 +990,12 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 	    iommu_deferred_attach(dev, domain))
 		return 0;
 
-	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-		iommu_dma_sync_sg_for_device(dev, sg, nents, dir);
-
 	if (dev_is_untrusted(dev))
 		return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, attrs);
 
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+		iommu_dma_sync_sg_for_device(dev, sg, nents, dir);
+
 	/*
 	 * Work out how much IOVA space we need, and align the segments to
 	 * IOVA granules for the IOMMU driver to handle. With some clever
-- 
2.33.0.rc1.237.g0d66db33f3-goog



* [PATCH v6 3/7] dma-iommu: skip extra sync during unmap w/swiotlb
  2021-08-17  1:38 [PATCH v6 0/7] Fixes for dma-iommu swiotlb bounce buffers David Stevens
  2021-08-17  1:38 ` [PATCH v6 1/7] dma-iommu: fix sync_sg with swiotlb David Stevens
  2021-08-17  1:38 ` [PATCH v6 2/7] dma-iommu: fix arch_sync_dma for map David Stevens
@ 2021-08-17  1:38 ` David Stevens
  2021-08-19  8:59   ` Robin Murphy
  2021-08-17  1:38 ` [PATCH v6 4/7] dma-iommu: fold _swiotlb helpers into callers David Stevens
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: David Stevens @ 2021-08-17  1:38 UTC (permalink / raw)
  To: Robin Murphy, Christoph Hellwig
  Cc: Joerg Roedel, Will Deacon, Lu Baolu, Tom Murphy, iommu,
	linux-kernel, David Stevens

From: David Stevens <stevensd@chromium.org>

Calling the iommu_dma_sync_*_for_cpu functions during unmap can cause
two copies out of the swiotlb buffer. Do the arch sync directly in
__iommu_dma_unmap_swiotlb instead to avoid this. This makes the call to
iommu_dma_sync_sg_for_cpu for untrusted devices in iommu_dma_unmap_sg no
longer necessary, so move that invocation later in the function.
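
For a bounced DMA_FROM_DEVICE buffer, the old unmap path was roughly (a
simplified call sketch, not literal code):

	iommu_dma_unmap_page()
	  iommu_dma_sync_single_for_cpu()
	    swiotlb_sync_single_for_cpu()   /* copy bounce buffer -> original */
	  __iommu_dma_unmap_swiotlb()
	    swiotlb_tbl_unmap_single()      /* copies the data out a second time */

With this change the arch sync is done in __iommu_dma_unmap_swiotlb() itself
and the single copy-out is left to swiotlb_tbl_unmap_single().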

Signed-off-by: David Stevens <stevensd@chromium.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/iommu/dma-iommu.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 8098ce593640..5dd2c517dbf5 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -504,6 +504,9 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,
 	if (WARN_ON(!phys))
 		return;
 
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) && !dev_is_dma_coherent(dev))
+		arch_sync_dma_for_cpu(phys, size, dir);
+
 	__iommu_dma_unmap(dev, dma_addr, size);
 
 	if (unlikely(is_swiotlb_buffer(phys)))
@@ -853,8 +856,6 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-		iommu_dma_sync_single_for_cpu(dev, dma_handle, size, dir);
 	__iommu_dma_unmap_swiotlb(dev, dma_handle, size, dir, attrs);
 }
 
@@ -1062,14 +1063,14 @@ static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
 	struct scatterlist *tmp;
 	int i;
 
-	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-		iommu_dma_sync_sg_for_cpu(dev, sg, nents, dir);
-
 	if (dev_is_untrusted(dev)) {
 		iommu_dma_unmap_sg_swiotlb(dev, sg, nents, dir, attrs);
 		return;
 	}
 
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+		iommu_dma_sync_sg_for_cpu(dev, sg, nents, dir);
+
 	/*
 	 * The scatterlist segments are mapped into a single
 	 * contiguous IOVA allocation, so this is incredibly easy.
-- 
2.33.0.rc1.237.g0d66db33f3-goog



* [PATCH v6 4/7] dma-iommu: fold _swiotlb helpers into callers
  2021-08-17  1:38 [PATCH v6 0/7] Fixes for dma-iommu swiotlb bounce buffers David Stevens
                   ` (2 preceding siblings ...)
  2021-08-17  1:38 ` [PATCH v6 3/7] dma-iommu: skip extra sync during unmap w/swiotlb David Stevens
@ 2021-08-17  1:38 ` David Stevens
  2021-08-19  9:00   ` Robin Murphy
  2021-08-17  1:38 ` [PATCH v6 5/7] dma-iommu: Check CONFIG_SWIOTLB more broadly David Stevens
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: David Stevens @ 2021-08-17  1:38 UTC (permalink / raw)
  To: Robin Murphy, Christoph Hellwig
  Cc: Joerg Roedel, Will Deacon, Lu Baolu, Tom Murphy, iommu,
	linux-kernel, David Stevens

From: David Stevens <stevensd@chromium.org>

Fold the _swiotlb helper functions into the respective _page functions,
since recent fixes have moved all logic from the _page functions to the
_swiotlb functions.

Signed-off-by: David Stevens <stevensd@chromium.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/iommu/dma-iommu.c | 135 +++++++++++++++++---------------------
 1 file changed, 59 insertions(+), 76 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 5dd2c517dbf5..8152efada8b2 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -493,26 +493,6 @@ static void __iommu_dma_unmap(struct device *dev, dma_addr_t dma_addr,
 	iommu_dma_free_iova(cookie, dma_addr, size, iotlb_gather.freelist);
 }
 
-static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,
-		size_t size, enum dma_data_direction dir,
-		unsigned long attrs)
-{
-	struct iommu_domain *domain = iommu_get_dma_domain(dev);
-	phys_addr_t phys;
-
-	phys = iommu_iova_to_phys(domain, dma_addr);
-	if (WARN_ON(!phys))
-		return;
-
-	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) && !dev_is_dma_coherent(dev))
-		arch_sync_dma_for_cpu(phys, size, dir);
-
-	__iommu_dma_unmap(dev, dma_addr, size);
-
-	if (unlikely(is_swiotlb_buffer(phys)))
-		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
-}
-
 static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
 		size_t size, int prot, u64 dma_mask)
 {
@@ -539,55 +519,6 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
 	return iova + iova_off;
 }
 
-static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
-		size_t org_size, dma_addr_t dma_mask, bool coherent,
-		enum dma_data_direction dir, unsigned long attrs)
-{
-	int prot = dma_info_to_prot(dir, coherent, attrs);
-	struct iommu_domain *domain = iommu_get_dma_domain(dev);
-	struct iommu_dma_cookie *cookie = domain->iova_cookie;
-	struct iova_domain *iovad = &cookie->iovad;
-	size_t aligned_size = org_size;
-	void *padding_start;
-	size_t padding_size;
-	dma_addr_t iova;
-
-	/*
-	 * If both the physical buffer start address and size are
-	 * page aligned, we don't need to use a bounce page.
-	 */
-	if (IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev) &&
-	    iova_offset(iovad, phys | org_size)) {
-		aligned_size = iova_align(iovad, org_size);
-		phys = swiotlb_tbl_map_single(dev, phys, org_size,
-					      aligned_size, dir, attrs);
-
-		if (phys == DMA_MAPPING_ERROR)
-			return DMA_MAPPING_ERROR;
-
-		/* Cleanup the padding area. */
-		padding_start = phys_to_virt(phys);
-		padding_size = aligned_size;
-
-		if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
-		    (dir == DMA_TO_DEVICE ||
-		     dir == DMA_BIDIRECTIONAL)) {
-			padding_start += org_size;
-			padding_size -= org_size;
-		}
-
-		memset(padding_start, 0, padding_size);
-	}
-
-	if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-		arch_sync_dma_for_device(phys, org_size, dir);
-
-	iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
-	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
-		swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
-	return iova;
-}
-
 static void __iommu_dma_free_pages(struct page **pages, int count)
 {
 	while (count--)
@@ -848,15 +779,68 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 {
 	phys_addr_t phys = page_to_phys(page) + offset;
 	bool coherent = dev_is_dma_coherent(dev);
+	int prot = dma_info_to_prot(dir, coherent, attrs);
+	struct iommu_domain *domain = iommu_get_dma_domain(dev);
+	struct iommu_dma_cookie *cookie = domain->iova_cookie;
+	struct iova_domain *iovad = &cookie->iovad;
+	size_t aligned_size = size;
+	dma_addr_t iova, dma_mask = dma_get_mask(dev);
+
+	/*
+	 * If both the physical buffer start address and size are
+	 * page aligned, we don't need to use a bounce page.
+	 */
+	if (IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev) &&
+	    iova_offset(iovad, phys | size)) {
+		void *padding_start;
+		size_t padding_size;
+
+		aligned_size = iova_align(iovad, size);
+		phys = swiotlb_tbl_map_single(dev, phys, size,
+					      aligned_size, dir, attrs);
+
+		if (phys == DMA_MAPPING_ERROR)
+			return DMA_MAPPING_ERROR;
 
-	return __iommu_dma_map_swiotlb(dev, phys, size, dma_get_mask(dev),
-			coherent, dir, attrs);
+		/* Cleanup the padding area. */
+		padding_start = phys_to_virt(phys);
+		padding_size = aligned_size;
+
+		if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+		    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)) {
+			padding_start += size;
+			padding_size -= size;
+		}
+
+		memset(padding_start, 0, padding_size);
+	}
+
+	if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+		arch_sync_dma_for_device(phys, size, dir);
+
+	iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
+	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
+		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
+	return iova;
 }
 
 static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-	__iommu_dma_unmap_swiotlb(dev, dma_handle, size, dir, attrs);
+	struct iommu_domain *domain = iommu_get_dma_domain(dev);
+	phys_addr_t phys;
+
+	phys = iommu_iova_to_phys(domain, dma_handle);
+	if (WARN_ON(!phys))
+		return;
+
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) && !dev_is_dma_coherent(dev))
+		arch_sync_dma_for_cpu(phys, size, dir);
+
+	__iommu_dma_unmap(dev, dma_handle, size);
+
+	if (unlikely(is_swiotlb_buffer(phys)))
+		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
 }
 
 /*
@@ -941,7 +925,7 @@ static void iommu_dma_unmap_sg_swiotlb(struct device *dev, struct scatterlist *s
 	int i;
 
 	for_each_sg(sg, s, nents, i)
-		__iommu_dma_unmap_swiotlb(dev, sg_dma_address(s),
+		iommu_dma_unmap_page(dev, sg_dma_address(s),
 				sg_dma_len(s), dir, attrs);
 }
 
@@ -952,9 +936,8 @@ static int iommu_dma_map_sg_swiotlb(struct device *dev, struct scatterlist *sg,
 	int i;
 
 	for_each_sg(sg, s, nents, i) {
-		sg_dma_address(s) = __iommu_dma_map_swiotlb(dev, sg_phys(s),
-				s->length, dma_get_mask(dev),
-				dev_is_dma_coherent(dev), dir, attrs);
+		sg_dma_address(s) = iommu_dma_map_page(dev, sg_page(s),
+				s->offset, s->length, dir, attrs);
 		if (sg_dma_address(s) == DMA_MAPPING_ERROR)
 			goto out_unmap;
 		sg_dma_len(s) = s->length;
-- 
2.33.0.rc1.237.g0d66db33f3-goog



* [PATCH v6 5/7] dma-iommu: Check CONFIG_SWIOTLB more broadly
  2021-08-17  1:38 [PATCH v6 0/7] Fixes for dma-iommu swiotlb bounce buffers David Stevens
                   ` (3 preceding siblings ...)
  2021-08-17  1:38 ` [PATCH v6 4/7] dma-iommu: fold _swiotlb helpers into callers David Stevens
@ 2021-08-17  1:38 ` David Stevens
  2021-08-17  1:38 ` [PATCH v6 6/7] swiotlb: support aligned swiotlb buffers David Stevens
  2021-08-17  1:38 ` [PATCH v6 7/7] dma-iommu: account for min_align_mask David Stevens
  6 siblings, 0 replies; 13+ messages in thread
From: David Stevens @ 2021-08-17  1:38 UTC (permalink / raw)
  To: Robin Murphy, Christoph Hellwig
  Cc: Joerg Roedel, Will Deacon, Lu Baolu, Tom Murphy, iommu,
	linux-kernel, David Stevens

From: David Stevens <stevensd@chromium.org>

Introduce a new dev_use_swiotlb function to guard swiotlb code, instead
of overloading dev_is_untrusted. This allows CONFIG_SWIOTLB to be
checked more broadly, so the swiotlb-related code can be removed more
aggressively.
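
For example, with CONFIG_SWIOTLB=n, IS_ENABLED(CONFIG_SWIOTLB) is a
compile-time 0, so a guard such as

	if (dev_use_swiotlb(dev)) {
		iommu_dma_unmap_sg_swiotlb(dev, sg, nents, dir, attrs);
		return;
	}

lets the compiler discard the swiotlb path entirely, rather than merely
skipping it at runtime as a bare dev_is_untrusted() check would.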

Signed-off-by: David Stevens <stevensd@chromium.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/iommu/dma-iommu.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 8152efada8b2..49a0d4de5f6c 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -317,6 +317,11 @@ static bool dev_is_untrusted(struct device *dev)
 	return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
 }
 
+static bool dev_use_swiotlb(struct device *dev)
+{
+	return IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev);
+}
+
 /**
  * iommu_dma_init_domain - Initialise a DMA mapping domain
  * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
@@ -713,7 +718,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
 {
 	phys_addr_t phys;
 
-	if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
+	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev))
 		return;
 
 	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
@@ -729,7 +734,7 @@ static void iommu_dma_sync_single_for_device(struct device *dev,
 {
 	phys_addr_t phys;
 
-	if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
+	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev))
 		return;
 
 	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
@@ -747,7 +752,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
 	struct scatterlist *sg;
 	int i;
 
-	if (dev_is_untrusted(dev))
+	if (dev_use_swiotlb(dev))
 		for_each_sg(sgl, sg, nelems, i)
 			iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
 						      sg->length, dir);
@@ -763,7 +768,7 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
 	struct scatterlist *sg;
 	int i;
 
-	if (dev_is_untrusted(dev))
+	if (dev_use_swiotlb(dev))
 		for_each_sg(sgl, sg, nelems, i)
 			iommu_dma_sync_single_for_device(dev,
 							 sg_dma_address(sg),
@@ -790,8 +795,7 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 	 * If both the physical buffer start address and size are
 	 * page aligned, we don't need to use a bounce page.
 	 */
-	if (IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev) &&
-	    iova_offset(iovad, phys | size)) {
+	if (dev_use_swiotlb(dev) && iova_offset(iovad, phys | size)) {
 		void *padding_start;
 		size_t padding_size;
 
@@ -974,7 +978,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 	    iommu_deferred_attach(dev, domain))
 		return 0;
 
-	if (dev_is_untrusted(dev))
+	if (dev_use_swiotlb(dev))
 		return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, attrs);
 
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
@@ -1046,7 +1050,7 @@ static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
 	struct scatterlist *tmp;
 	int i;
 
-	if (dev_is_untrusted(dev)) {
+	if (dev_use_swiotlb(dev)) {
 		iommu_dma_unmap_sg_swiotlb(dev, sg, nents, dir, attrs);
 		return;
 	}
-- 
2.33.0.rc1.237.g0d66db33f3-goog



* [PATCH v6 6/7] swiotlb: support aligned swiotlb buffers
  2021-08-17  1:38 [PATCH v6 0/7] Fixes for dma-iommu swiotlb bounce buffers David Stevens
                   ` (4 preceding siblings ...)
  2021-08-17  1:38 ` [PATCH v6 5/7] dma-iommu: Check CONFIG_SWIOTLB more broadly David Stevens
@ 2021-08-17  1:38 ` David Stevens
  2021-08-17  1:38 ` [PATCH v6 7/7] dma-iommu: account for min_align_mask David Stevens
  6 siblings, 0 replies; 13+ messages in thread
From: David Stevens @ 2021-08-17  1:38 UTC (permalink / raw)
  To: Robin Murphy, Christoph Hellwig
  Cc: Joerg Roedel, Will Deacon, Lu Baolu, Tom Murphy, iommu,
	linux-kernel, David Stevens

From: David Stevens <stevensd@chromium.org>

Add an argument to swiotlb_tbl_map_single that specifies the desired
alignment of the allocated buffer. This is used by dma-iommu to ensure
the buffer is aligned to the iova granule size when using swiotlb with
untrusted sub-granule mappings. This addresses an issue where adjacent
slots could be exposed to the untrusted device if IO_TLB_SIZE < iova
granule < PAGE_SIZE.
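
As a concrete example (the numbers are illustrative): with 2 KiB swiotlb
slots (IO_TLB_SHIFT = 11) and a 4 KiB IOVA granule, dma-iommu passes
iova_mask(iovad) = 0xfff, so find_slots() computes

	stride = max(stride, (0xfff >> IO_TLB_SHIFT) + 1) = max(stride, 2)

and every bounce allocation starts on a 4 KiB boundary, so a neighbouring
allocation can no longer share a granule with it.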

Signed-off-by: David Stevens <stevensd@chromium.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/iommu/dma-iommu.c |  4 ++--
 drivers/xen/swiotlb-xen.c |  2 +-
 include/linux/swiotlb.h   |  3 ++-
 kernel/dma/swiotlb.c      | 11 +++++++----
 4 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 49a0d4de5f6c..6738420fc081 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -800,8 +800,8 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 		size_t padding_size;
 
 		aligned_size = iova_align(iovad, size);
-		phys = swiotlb_tbl_map_single(dev, phys, size,
-					      aligned_size, dir, attrs);
+		phys = swiotlb_tbl_map_single(dev, phys, size, aligned_size,
+					      iova_mask(iovad), dir, attrs);
 
 		if (phys == DMA_MAPPING_ERROR)
 			return DMA_MAPPING_ERROR;
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 24d11861ac7d..8b03d2c93428 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -382,7 +382,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
 	 */
 	trace_swiotlb_bounced(dev, dev_addr, size, swiotlb_force);
 
-	map = swiotlb_tbl_map_single(dev, phys, size, size, dir, attrs);
+	map = swiotlb_tbl_map_single(dev, phys, size, size, 0, dir, attrs);
 	if (map == (phys_addr_t)DMA_MAPPING_ERROR)
 		return DMA_MAPPING_ERROR;
 
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 216854a5e513..93d82e43eb3a 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -44,7 +44,8 @@ extern void __init swiotlb_update_mem_attributes(void);
 
 phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys,
 		size_t mapping_size, size_t alloc_size,
-		enum dma_data_direction dir, unsigned long attrs);
+		unsigned int alloc_aligned_mask, enum dma_data_direction dir,
+		unsigned long attrs);
 
 extern void swiotlb_tbl_unmap_single(struct device *hwdev,
 				     phys_addr_t tlb_addr,
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index e50df8d8f87e..d4c45d8cd1fa 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -427,7 +427,7 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index)
  * allocate a buffer from that IO TLB pool.
  */
 static int find_slots(struct device *dev, phys_addr_t orig_addr,
-		size_t alloc_size)
+		size_t alloc_size, unsigned int alloc_align_mask)
 {
 	struct io_tlb_mem *mem = io_tlb_default_mem;
 	unsigned long boundary_mask = dma_get_seg_boundary(dev);
@@ -450,6 +450,7 @@ static int find_slots(struct device *dev, phys_addr_t orig_addr,
 	stride = (iotlb_align_mask >> IO_TLB_SHIFT) + 1;
 	if (alloc_size >= PAGE_SIZE)
 		stride = max(stride, stride << (PAGE_SHIFT - IO_TLB_SHIFT));
+	stride = max(stride, (alloc_align_mask >> IO_TLB_SHIFT) + 1);
 
 	spin_lock_irqsave(&mem->lock, flags);
 	if (unlikely(nslots > mem->nslabs - mem->used))
@@ -504,7 +505,8 @@ static int find_slots(struct device *dev, phys_addr_t orig_addr,
 
 phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 		size_t mapping_size, size_t alloc_size,
-		enum dma_data_direction dir, unsigned long attrs)
+		unsigned int alloc_align_mask, enum dma_data_direction dir,
+		unsigned long attrs)
 {
 	struct io_tlb_mem *mem = io_tlb_default_mem;
 	unsigned int offset = swiotlb_align_offset(dev, orig_addr);
@@ -524,7 +526,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 		return (phys_addr_t)DMA_MAPPING_ERROR;
 	}
 
-	index = find_slots(dev, orig_addr, alloc_size + offset);
+	index = find_slots(dev, orig_addr,
+			   alloc_size + offset, alloc_align_mask);
 	if (index == -1) {
 		if (!(attrs & DMA_ATTR_NO_WARN))
 			dev_warn_ratelimited(dev,
@@ -636,7 +639,7 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t paddr, size_t size,
 	trace_swiotlb_bounced(dev, phys_to_dma(dev, paddr), size,
 			      swiotlb_force);
 
-	swiotlb_addr = swiotlb_tbl_map_single(dev, paddr, size, size, dir,
+	swiotlb_addr = swiotlb_tbl_map_single(dev, paddr, size, size, 0, dir,
 			attrs);
 	if (swiotlb_addr == (phys_addr_t)DMA_MAPPING_ERROR)
 		return DMA_MAPPING_ERROR;
-- 
2.33.0.rc1.237.g0d66db33f3-goog



* [PATCH v6 7/7] dma-iommu: account for min_align_mask
  2021-08-17  1:38 [PATCH v6 0/7] Fixes for dma-iommu swiotlb bounce buffers David Stevens
                   ` (5 preceding siblings ...)
  2021-08-17  1:38 ` [PATCH v6 6/7] swiotlb: support aligned swiotlb buffers David Stevens
@ 2021-08-17  1:38 ` David Stevens
  2021-08-19  9:03   ` Robin Murphy
  6 siblings, 1 reply; 13+ messages in thread
From: David Stevens @ 2021-08-17  1:38 UTC (permalink / raw)
  To: Robin Murphy, Christoph Hellwig
  Cc: Joerg Roedel, Will Deacon, Lu Baolu, Tom Murphy, iommu,
	linux-kernel, David Stevens

From: David Stevens <stevensd@chromium.org>

For devices which set min_align_mask, swiotlb preserves the offset of
the original physical address within that mask. Since __iommu_dma_map
accounts for non-aligned addresses, passing a non-aligned swiotlb
address with the swiotlb aligned size results in the offset being
accounted for twice in the size passed to iommu_map_atomic. The extra
page exposed to DMA is also not cleaned up by __iommu_dma_unmap, since
that function unmaps with the correct size. This causes mapping failures
if the iova gets reused, due to collisions in the iommu page tables.

To fix this, pass the original size to __iommu_dma_map, since that
function already handles alignment.
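
For example, with min_align_mask = 4096 and a 4 KiB IOVA granule, a 512-byte
mapping at offset 2048 into a page gives aligned_size = 4096 and a bounce
address that is also at offset 2048. Passing aligned_size makes
__iommu_dma_map align 4096 + 2048 up to 8192 and map two granules, while
__iommu_dma_unmap later aligns 512 + 2048 up to only 4096, leaving a stale
mapping behind. Passing the original size gives 4096 on both sides.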

Additionally, when swiotlb returns non-aligned addresses, there is
padding at the start of the bounce buffer that needs to be cleared.

Fixes: 1f221a0d0dbf ("swiotlb: respect min_align_mask")
Signed-off-by: David Stevens <stevensd@chromium.org>
---
 drivers/iommu/dma-iommu.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 6738420fc081..f2fb360c2907 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -788,7 +788,6 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 	struct iommu_domain *domain = iommu_get_dma_domain(dev);
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	struct iova_domain *iovad = &cookie->iovad;
-	size_t aligned_size = size;
 	dma_addr_t iova, dma_mask = dma_get_mask(dev);
 
 	/*
@@ -796,8 +795,8 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 	 * page aligned, we don't need to use a bounce page.
 	 */
 	if (dev_use_swiotlb(dev) && iova_offset(iovad, phys | size)) {
-		void *padding_start;
-		size_t padding_size;
+		void *tlb_start;
+		size_t aligned_size, iova_off, mapping_end_off;
 
 		aligned_size = iova_align(iovad, size);
 		phys = swiotlb_tbl_map_single(dev, phys, size, aligned_size,
@@ -806,23 +805,26 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 		if (phys == DMA_MAPPING_ERROR)
 			return DMA_MAPPING_ERROR;
 
-		/* Cleanup the padding area. */
-		padding_start = phys_to_virt(phys);
-		padding_size = aligned_size;
+		iova_off = iova_offset(iovad, phys);
+		tlb_start = phys_to_virt(phys - iova_off);
 
 		if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
 		    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)) {
-			padding_start += size;
-			padding_size -= size;
+			/* Cleanup the padding area. */
+			mapping_end_off = iova_off + size;
+			memset(tlb_start, 0, iova_off);
+			memset(tlb_start + mapping_end_off, 0,
+			       aligned_size - mapping_end_off);
+		} else {
+			/* Nothing was sync'ed, so clear the whole buffer. */
+			memset(tlb_start, 0, aligned_size);
 		}
-
-		memset(padding_start, 0, padding_size);
 	}
 
 	if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
 		arch_sync_dma_for_device(phys, size, dir);
 
-	iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
+	iova = __iommu_dma_map(dev, phys, size, prot, dma_mask);
 	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
 		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
 	return iova;
-- 
2.33.0.rc1.237.g0d66db33f3-goog



* Re: [PATCH v6 2/7] dma-iommu: fix arch_sync_dma for map
  2021-08-17  1:38 ` [PATCH v6 2/7] dma-iommu: fix arch_sync_dma for map David Stevens
@ 2021-08-19  8:59   ` Robin Murphy
  0 siblings, 0 replies; 13+ messages in thread
From: Robin Murphy @ 2021-08-19  8:59 UTC (permalink / raw)
  To: David Stevens, Christoph Hellwig
  Cc: Joerg Roedel, Will Deacon, Lu Baolu, Tom Murphy, iommu, linux-kernel

On 2021-08-17 02:38, David Stevens wrote:
> From: David Stevens <stevensd@chromium.org>
> 
> When calling arch_sync_dma, we need to pass it the memory that's
> actually being used for dma. When using swiotlb bounce buffers, this is
> the bounce buffer. Move arch_sync_dma into the __iommu_dma_map_swiotlb
> helper, so it can use the bounce buffer address if necessary.
> 
> Now that iommu_dma_map_sg delegates to a function which takes care of
> architectural syncing in the untrusted device case, the call to
> iommu_dma_sync_sg_for_device can be moved so it only occurs for trusted
> devices. Doing the sync for untrusted devices before mapping never
> really worked, since it needs to be able to target swiotlb buffers.
> 
> This also moves the architectural sync to before the call to
> __iommu_dma_map, to guarantee that untrusted devices can't see stale
> data they shouldn't see.

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

> Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
> Signed-off-by: David Stevens <stevensd@chromium.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/iommu/dma-iommu.c | 16 +++++++---------
>   1 file changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 968e0150c95e..8098ce593640 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -576,6 +576,9 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
>   		memset(padding_start, 0, padding_size);
>   	}
>   
> +	if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> +		arch_sync_dma_for_device(phys, org_size, dir);
> +
>   	iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
>   	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
>   		swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
> @@ -842,14 +845,9 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
>   {
>   	phys_addr_t phys = page_to_phys(page) + offset;
>   	bool coherent = dev_is_dma_coherent(dev);
> -	dma_addr_t dma_handle;
>   
> -	dma_handle = __iommu_dma_map_swiotlb(dev, phys, size, dma_get_mask(dev),
> +	return __iommu_dma_map_swiotlb(dev, phys, size, dma_get_mask(dev),
>   			coherent, dir, attrs);
> -	if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
> -	    dma_handle != DMA_MAPPING_ERROR)
> -		arch_sync_dma_for_device(phys, size, dir);
> -	return dma_handle;
>   }
>   
>   static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
> @@ -992,12 +990,12 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>   	    iommu_deferred_attach(dev, domain))
>   		return 0;
>   
> -	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> -		iommu_dma_sync_sg_for_device(dev, sg, nents, dir);
> -
>   	if (dev_is_untrusted(dev))
>   		return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, attrs);
>   
> +	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> +		iommu_dma_sync_sg_for_device(dev, sg, nents, dir);
> +
>   	/*
>   	 * Work out how much IOVA space we need, and align the segments to
>   	 * IOVA granules for the IOMMU driver to handle. With some clever
> 


* Re: [PATCH v6 3/7] dma-iommu: skip extra sync during unmap w/swiotlb
  2021-08-17  1:38 ` [PATCH v6 3/7] dma-iommu: skip extra sync during unmap w/swiotlb David Stevens
@ 2021-08-19  8:59   ` Robin Murphy
  0 siblings, 0 replies; 13+ messages in thread
From: Robin Murphy @ 2021-08-19  8:59 UTC (permalink / raw)
  To: David Stevens, Christoph Hellwig
  Cc: Joerg Roedel, Will Deacon, Lu Baolu, Tom Murphy, iommu, linux-kernel

On 2021-08-17 02:38, David Stevens wrote:
> From: David Stevens <stevensd@chromium.org>
> 
> Calling the iommu_dma_sync_*_for_cpu functions during unmap can cause
> two copies out of the swiotlb buffer. Do the arch sync directly in
> __iommu_dma_unmap_swiotlb instead to avoid this. This makes the call to
> iommu_dma_sync_sg_for_cpu for untrusted devices in iommu_dma_unmap_sg no
> longer necessary, so move that invocation later in the function.

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

> Signed-off-by: David Stevens <stevensd@chromium.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/iommu/dma-iommu.c | 11 ++++++-----
>   1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 8098ce593640..5dd2c517dbf5 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -504,6 +504,9 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,
>   	if (WARN_ON(!phys))
>   		return;
>   
> +	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) && !dev_is_dma_coherent(dev))
> +		arch_sync_dma_for_cpu(phys, size, dir);
> +
>   	__iommu_dma_unmap(dev, dma_addr, size);
>   
>   	if (unlikely(is_swiotlb_buffer(phys)))
> @@ -853,8 +856,6 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
>   static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
>   		size_t size, enum dma_data_direction dir, unsigned long attrs)
>   {
> -	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> -		iommu_dma_sync_single_for_cpu(dev, dma_handle, size, dir);
>   	__iommu_dma_unmap_swiotlb(dev, dma_handle, size, dir, attrs);
>   }
>   
> @@ -1062,14 +1063,14 @@ static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
>   	struct scatterlist *tmp;
>   	int i;
>   
> -	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> -		iommu_dma_sync_sg_for_cpu(dev, sg, nents, dir);
> -
>   	if (dev_is_untrusted(dev)) {
>   		iommu_dma_unmap_sg_swiotlb(dev, sg, nents, dir, attrs);
>   		return;
>   	}
>   
> +	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> +		iommu_dma_sync_sg_for_cpu(dev, sg, nents, dir);
> +
>   	/*
>   	 * The scatterlist segments are mapped into a single
>   	 * contiguous IOVA allocation, so this is incredibly easy.
> 


* Re: [PATCH v6 4/7] dma-iommu: fold _swiotlb helpers into callers
  2021-08-17  1:38 ` [PATCH v6 4/7] dma-iommu: fold _swiotlb helpers into callers David Stevens
@ 2021-08-19  9:00   ` Robin Murphy
  0 siblings, 0 replies; 13+ messages in thread
From: Robin Murphy @ 2021-08-19  9:00 UTC (permalink / raw)
  To: David Stevens, Christoph Hellwig
  Cc: Joerg Roedel, Will Deacon, Lu Baolu, Tom Murphy, iommu, linux-kernel

On 2021-08-17 02:38, David Stevens wrote:
> From: David Stevens <stevensd@chromium.org>
> 
> Fold the _swiotlb helper functions into the respective _page functions,
> since recent fixes have moved all logic from the _page functions to the
> _swiotlb functions.

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

> Signed-off-by: David Stevens <stevensd@chromium.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/iommu/dma-iommu.c | 135 +++++++++++++++++---------------------
>   1 file changed, 59 insertions(+), 76 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 5dd2c517dbf5..8152efada8b2 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -493,26 +493,6 @@ static void __iommu_dma_unmap(struct device *dev, dma_addr_t dma_addr,
>   	iommu_dma_free_iova(cookie, dma_addr, size, iotlb_gather.freelist);
>   }
>   
> -static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,
> -		size_t size, enum dma_data_direction dir,
> -		unsigned long attrs)
> -{
> -	struct iommu_domain *domain = iommu_get_dma_domain(dev);
> -	phys_addr_t phys;
> -
> -	phys = iommu_iova_to_phys(domain, dma_addr);
> -	if (WARN_ON(!phys))
> -		return;
> -
> -	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) && !dev_is_dma_coherent(dev))
> -		arch_sync_dma_for_cpu(phys, size, dir);
> -
> -	__iommu_dma_unmap(dev, dma_addr, size);
> -
> -	if (unlikely(is_swiotlb_buffer(phys)))
> -		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
> -}
> -
>   static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
>   		size_t size, int prot, u64 dma_mask)
>   {
> @@ -539,55 +519,6 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
>   	return iova + iova_off;
>   }
>   
> -static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
> -		size_t org_size, dma_addr_t dma_mask, bool coherent,
> -		enum dma_data_direction dir, unsigned long attrs)
> -{
> -	int prot = dma_info_to_prot(dir, coherent, attrs);
> -	struct iommu_domain *domain = iommu_get_dma_domain(dev);
> -	struct iommu_dma_cookie *cookie = domain->iova_cookie;
> -	struct iova_domain *iovad = &cookie->iovad;
> -	size_t aligned_size = org_size;
> -	void *padding_start;
> -	size_t padding_size;
> -	dma_addr_t iova;
> -
> -	/*
> -	 * If both the physical buffer start address and size are
> -	 * page aligned, we don't need to use a bounce page.
> -	 */
> -	if (IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev) &&
> -	    iova_offset(iovad, phys | org_size)) {
> -		aligned_size = iova_align(iovad, org_size);
> -		phys = swiotlb_tbl_map_single(dev, phys, org_size,
> -					      aligned_size, dir, attrs);
> -
> -		if (phys == DMA_MAPPING_ERROR)
> -			return DMA_MAPPING_ERROR;
> -
> -		/* Cleanup the padding area. */
> -		padding_start = phys_to_virt(phys);
> -		padding_size = aligned_size;
> -
> -		if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
> -		    (dir == DMA_TO_DEVICE ||
> -		     dir == DMA_BIDIRECTIONAL)) {
> -			padding_start += org_size;
> -			padding_size -= org_size;
> -		}
> -
> -		memset(padding_start, 0, padding_size);
> -	}
> -
> -	if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> -		arch_sync_dma_for_device(phys, org_size, dir);
> -
> -	iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
> -	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
> -		swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
> -	return iova;
> -}
> -
>   static void __iommu_dma_free_pages(struct page **pages, int count)
>   {
>   	while (count--)
> @@ -848,15 +779,68 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
>   {
>   	phys_addr_t phys = page_to_phys(page) + offset;
>   	bool coherent = dev_is_dma_coherent(dev);
> +	int prot = dma_info_to_prot(dir, coherent, attrs);
> +	struct iommu_domain *domain = iommu_get_dma_domain(dev);
> +	struct iommu_dma_cookie *cookie = domain->iova_cookie;
> +	struct iova_domain *iovad = &cookie->iovad;
> +	size_t aligned_size = size;
> +	dma_addr_t iova, dma_mask = dma_get_mask(dev);
> +
> +	/*
> +	 * If both the physical buffer start address and size are
> +	 * page aligned, we don't need to use a bounce page.
> +	 */
> +	if (IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev) &&
> +	    iova_offset(iovad, phys | size)) {
> +		void *padding_start;
> +		size_t padding_size;
> +
> +		aligned_size = iova_align(iovad, size);
> +		phys = swiotlb_tbl_map_single(dev, phys, size,
> +					      aligned_size, dir, attrs);
> +
> +		if (phys == DMA_MAPPING_ERROR)
> +			return DMA_MAPPING_ERROR;
>   
> -	return __iommu_dma_map_swiotlb(dev, phys, size, dma_get_mask(dev),
> -			coherent, dir, attrs);
> +		/* Cleanup the padding area. */
> +		padding_start = phys_to_virt(phys);
> +		padding_size = aligned_size;
> +
> +		if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
> +		    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)) {
> +			padding_start += size;
> +			padding_size -= size;
> +		}
> +
> +		memset(padding_start, 0, padding_size);
> +	}
> +
> +	if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> +		arch_sync_dma_for_device(phys, size, dir);
> +
> +	iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
> +	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
> +		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
> +	return iova;
>   }
>   
>   static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
>   		size_t size, enum dma_data_direction dir, unsigned long attrs)
>   {
> -	__iommu_dma_unmap_swiotlb(dev, dma_handle, size, dir, attrs);
> +	struct iommu_domain *domain = iommu_get_dma_domain(dev);
> +	phys_addr_t phys;
> +
> +	phys = iommu_iova_to_phys(domain, dma_handle);
> +	if (WARN_ON(!phys))
> +		return;
> +
> +	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) && !dev_is_dma_coherent(dev))
> +		arch_sync_dma_for_cpu(phys, size, dir);
> +
> +	__iommu_dma_unmap(dev, dma_handle, size);
> +
> +	if (unlikely(is_swiotlb_buffer(phys)))
> +		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
>   }
>   
>   /*
> @@ -941,7 +925,7 @@ static void iommu_dma_unmap_sg_swiotlb(struct device *dev, struct scatterlist *s
>   	int i;
>   
>   	for_each_sg(sg, s, nents, i)
> -		__iommu_dma_unmap_swiotlb(dev, sg_dma_address(s),
> +		iommu_dma_unmap_page(dev, sg_dma_address(s),
>   				sg_dma_len(s), dir, attrs);
>   }
>   
> @@ -952,9 +936,8 @@ static int iommu_dma_map_sg_swiotlb(struct device *dev, struct scatterlist *sg,
>   	int i;
>   
>   	for_each_sg(sg, s, nents, i) {
> -		sg_dma_address(s) = __iommu_dma_map_swiotlb(dev, sg_phys(s),
> -				s->length, dma_get_mask(dev),
> -				dev_is_dma_coherent(dev), dir, attrs);
> +		sg_dma_address(s) = iommu_dma_map_page(dev, sg_page(s),
> +				s->offset, s->length, dir, attrs);
>   		if (sg_dma_address(s) == DMA_MAPPING_ERROR)
>   			goto out_unmap;
>   		sg_dma_len(s) = s->length;
> 


* Re: [PATCH v6 7/7] dma-iommu: account for min_align_mask
  2021-08-17  1:38 ` [PATCH v6 7/7] dma-iommu: account for min_align_mask David Stevens
@ 2021-08-19  9:03   ` Robin Murphy
  2021-08-19 10:05     ` David Stevens
  0 siblings, 1 reply; 13+ messages in thread
From: Robin Murphy @ 2021-08-19  9:03 UTC (permalink / raw)
  To: David Stevens, Christoph Hellwig
  Cc: Joerg Roedel, Will Deacon, Lu Baolu, Tom Murphy, iommu, linux-kernel

On 2021-08-17 02:38, David Stevens wrote:
> From: David Stevens <stevensd@chromium.org>
> 
> For devices which set min_align_mask, swiotlb preserves the offset of
> the original physical address within that mask. Since __iommu_dma_map
> accounts for non-aligned addresses, passing a non-aligned swiotlb
> address with the swiotlb aligned size results in the offset being
> accounted for twice in the size passed to iommu_map_atomic. The extra
> page exposed to DMA is also not cleaned up by __iommu_dma_unmap, since
> that function unmaps with the correct size. This causes mapping failures
> if the iova gets reused, due to collisions in the iommu page tables.
> 
> To fix this, pass the original size to __iommu_dma_map, since that
> function already handles alignment.
> 
> Additionally, when swiotlb returns non-aligned addresses, there is
> padding at the start of the bounce buffer that needs to be cleared.
> 
> Fixes: 1f221a0d0dbf ("swiotlb: respect min_align_mask")
> Signed-off-by: David Stevens <stevensd@chromium.org>
> ---
>   drivers/iommu/dma-iommu.c | 24 +++++++++++++-----------
>   1 file changed, 13 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 6738420fc081..f2fb360c2907 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -788,7 +788,6 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
>   	struct iommu_domain *domain = iommu_get_dma_domain(dev);
>   	struct iommu_dma_cookie *cookie = domain->iova_cookie;
>   	struct iova_domain *iovad = &cookie->iovad;
> -	size_t aligned_size = size;
>   	dma_addr_t iova, dma_mask = dma_get_mask(dev);
>   
>   	/*
> @@ -796,8 +795,8 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
>   	 * page aligned, we don't need to use a bounce page.
>   	 */
>   	if (dev_use_swiotlb(dev) && iova_offset(iovad, phys | size)) {
> -		void *padding_start;
> -		size_t padding_size;
> +		void *tlb_start;
> +		size_t aligned_size, iova_off, mapping_end_off;
>   
>   		aligned_size = iova_align(iovad, size);
>   		phys = swiotlb_tbl_map_single(dev, phys, size, aligned_size,
> @@ -806,23 +805,26 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
>   		if (phys == DMA_MAPPING_ERROR)
>   			return DMA_MAPPING_ERROR;
>   
> -		/* Cleanup the padding area. */
> -		padding_start = phys_to_virt(phys);
> -		padding_size = aligned_size;
> +		iova_off = iova_offset(iovad, phys);
> +		tlb_start = phys_to_virt(phys - iova_off);
>   
>   		if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
>   		    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)) {
> -			padding_start += size;
> -			padding_size -= size;
> +			/* Cleanup the padding area. */
> +			mapping_end_off = iova_off + size;
> +			memset(tlb_start, 0, iova_off);
> +			memset(tlb_start + mapping_end_off, 0,
> +			       aligned_size - mapping_end_off);
> +		} else {
> +			/* Nothing was sync'ed, so clear the whole buffer. */
> +			memset(tlb_start, 0, aligned_size);
>   		}
> -
> -		memset(padding_start, 0, padding_size);
>   	}
>   
>   	if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
>   		arch_sync_dma_for_device(phys, size, dir);
>   
> -	iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
> +	iova = __iommu_dma_map(dev, phys, size, prot, dma_mask);

I still don't see how this preserves min_align_mask if it is larger than 
the IOVA granule (either way this change here does nothing since the 
first thing __iommu_dma_map() does is iova_align() the size right back 
anyway).

Robin.

>   	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
>   		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
>   	return iova;
> 


* Re: [PATCH v6 7/7] dma-iommu: account for min_align_mask
  2021-08-19  9:03   ` Robin Murphy
@ 2021-08-19 10:05     ` David Stevens
  0 siblings, 0 replies; 13+ messages in thread
From: David Stevens @ 2021-08-19 10:05 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Christoph Hellwig, Joerg Roedel, Will Deacon, Lu Baolu,
	Tom Murphy, iommu, open list

  On Thu, Aug 19, 2021 at 6:03 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2021-08-17 02:38, David Stevens wrote:
> > From: David Stevens <stevensd@chromium.org>
> >
> > For devices which set min_align_mask, swiotlb preserves the offset of
> > the original physical address within that mask. Since __iommu_dma_map
> > accounts for non-aligned addresses, passing a non-aligned swiotlb
> > address with the swiotlb aligned size results in the offset being
> > accounted for twice in the size passed to iommu_map_atomic. The extra
> > page exposed to DMA is also not cleaned up by __iommu_dma_unmap, since
> > that function unmaps with the correct size. This causes mapping failures
> > if the iova gets reused, due to collisions in the iommu page tables.
> >
> > To fix this, pass the original size to __iommu_dma_map, since that
> > function already handles alignment.
> >
> > Additionally, when swiotlb returns non-aligned addresses, there is
> > padding at the start of the bounce buffer that needs to be cleared.
> >
> > Fixes: 1f221a0d0dbf ("swiotlb: respect min_align_mask")
> > Signed-off-by: David Stevens <stevensd@chromium.org>
> > ---
> >   drivers/iommu/dma-iommu.c | 24 +++++++++++++-----------
> >   1 file changed, 13 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index 6738420fc081..f2fb360c2907 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -788,7 +788,6 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
> >       struct iommu_domain *domain = iommu_get_dma_domain(dev);
> >       struct iommu_dma_cookie *cookie = domain->iova_cookie;
> >       struct iova_domain *iovad = &cookie->iovad;
> > -     size_t aligned_size = size;
> >       dma_addr_t iova, dma_mask = dma_get_mask(dev);
> >
> >       /*
> > @@ -796,8 +795,8 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
> >        * page aligned, we don't need to use a bounce page.
> >        */
> >       if (dev_use_swiotlb(dev) && iova_offset(iovad, phys | size)) {
> > -             void *padding_start;
> > -             size_t padding_size;
> > +             void *tlb_start;
> > +             size_t aligned_size, iova_off, mapping_end_off;
> >
> >               aligned_size = iova_align(iovad, size);
> >               phys = swiotlb_tbl_map_single(dev, phys, size, aligned_size,
> > @@ -806,23 +805,26 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
> >               if (phys == DMA_MAPPING_ERROR)
> >                       return DMA_MAPPING_ERROR;
> >
> > -             /* Cleanup the padding area. */
> > -             padding_start = phys_to_virt(phys);
> > -             padding_size = aligned_size;
> > +             iova_off = iova_offset(iovad, phys);
> > +             tlb_start = phys_to_virt(phys - iova_off);
> >
> >               if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
> >                   (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)) {
> > -                     padding_start += size;
> > -                     padding_size -= size;
> > +                     /* Cleanup the padding area. */
> > +                     mapping_end_off = iova_off + size;
> > +                     memset(tlb_start, 0, iova_off);
> > +                     memset(tlb_start + mapping_end_off, 0,
> > +                            aligned_size - mapping_end_off);
> > +             } else {
> > +                     /* Nothing was sync'ed, so clear the whole buffer. */
> > +                     memset(tlb_start, 0, aligned_size);
> >               }
> > -
> > -             memset(padding_start, 0, padding_size);
> >       }
> >
> >       if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> >               arch_sync_dma_for_device(phys, size, dir);
> >
> > -     iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
> > +     iova = __iommu_dma_map(dev, phys, size, prot, dma_mask);
>
> I still don't see how this preserves min_align_mask if it is larger than
> the IOVA granule

That's a slightly different issue, and not addressed in this series. I
guess the commit message should be 'dma-iommu: account for
min_align_mask w/swiotlb'. At least from my understanding of
min_align_mask, getting min_align_mask larger than the IOVA granule to
work would require changes to IOVA allocation, not anything to do
directly with swiotlb bounce buffers. Also, probably changes to
scatterlist coalescing. That being said, it looks like the only driver
that sets min_align_mask is the nvme driver, which sets it to 4096.

> (either way this change here does nothing since the
> first thing __iommu_dma_map() does is iova_align() the size right back
> anyway).
>

__iommu_dma_map() doesn't just align the size, it aligns
size+iova_off. Let's say you're doing a read of size 512 bytes at
offset 2048 within a page. In this case, aligned_size will be 4096.
Without min_align_mask, phys will be page aligned, so that's fine. But
with min_align_mask=4096, phys will also be at offset 2048. This
causes __iommu_dma_map to align 4096 + 2048, which becomes 8192. That
results in an extra page being mapped, which then doesn't get cleaned
up by __iommu_dma_unmap. That causes collisions in the IOMMU driver
the next time the iova is reused.

Passing size to __iommu_dma_map is sufficient. iommu_dma_map_page
needs to map [phys, phys+size), regardless of whether or not bounce
buffers are being used. __iommu_dma_map already takes care of cleaning
up the alignment, there's no need to do any extra alignment specific
to the bounce buffer case.

-David

> Robin.
>
> >       if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
> >               swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
> >       return iova;
> >
