* preserve DMA offsets when using swiotlb
@ 2021-02-04 19:30 ` Christoph Hellwig
  0 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2021-02-04 19:30 UTC (permalink / raw)
  To: jxgao, gregkh
  Cc: saravanak, konrad.wilk, marcorr, linux-nvme, kbusch, iommu,
	erdemaktas, robin.murphy, m.szyprowski

Hi all,

this series makes NVMe happy when running with swiotlb.  It caters
to completely broken NVMe controllers that ignore the specification
(hello to the biggest cloud provider on the planet!), to crappy SoCs
that have addressing limitations, and to "secure" virtualization that
forces bounce buffering to enhance the user experience.  In other
words, no one sane should hit it, but people do.

It is basically a respin of the

    "SWIOTLB: Preserve swiotlb map offset when needed."

series from Jianxiong Gao.  It completely rewrites the swiotlb part so
that the offset really is preserved and not just the offset into the
swiotlb slot, and to do so it grew half a dozen patches that refactor
the swiotlb code so that a mere mortal like me could actually
understand it.
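
To make the goal concrete, the invariant the series establishes can be
summarized in one line (a sketch only, using the min_align_mask added
in patch 1):

	/*
	 * For a device that called dma_set_min_align_mask(dev, mask), a
	 * swiotlb-bounced mapping must satisfy
	 *
	 *	(tlb_addr & mask) == (orig_addr & mask)
	 *
	 * i.e. the low address bits are preserved, not merely the offset
	 * into a single swiotlb slot.
	 */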



* [PATCH 1/8] driver core: add a min_align_mask field to struct device_dma_parameters
  2021-02-04 19:30 ` Christoph Hellwig
@ 2021-02-04 19:30   ` Christoph Hellwig
  -1 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2021-02-04 19:30 UTC (permalink / raw)
  To: jxgao, gregkh
  Cc: saravanak, konrad.wilk, marcorr, linux-nvme, kbusch, iommu,
	erdemaktas, robin.murphy, m.szyprowski

From: Jianxiong Gao <jxgao@google.com>

Some devices rely on the address offset in a page to function
correctly (the NVMe driver is an example). These devices may use
a different page size than the Linux kernel. The address offset
has to be preserved upon mapping, and in order to do so, we
need to record the min_align_mask first.
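
As an illustration (hypothetical driver code, not part of this patch),
a driver whose device addresses memory in 4 KiB units would record the
mask from its probe path roughly like this, assuming the bus has
already allocated dev->dma_parms:

	#include <linux/dma-mapping.h>
	#include <linux/sizes.h>

	static int foo_probe(struct device *dev)
	{
		/* the device needs the low 12 address bits preserved */
		return dma_set_min_align_mask(dev, SZ_4K - 1);
	}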

Signed-off-by: Jianxiong Gao <jxgao@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/device.h      |  1 +
 include/linux/dma-mapping.h | 16 ++++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index 1779f90eeb4cb4..7960bf516dd7fe 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -291,6 +291,7 @@ struct device_dma_parameters {
 	 * sg limitations.
 	 */
 	unsigned int max_segment_size;
+	unsigned int min_align_mask;
 	unsigned long segment_boundary_mask;
 };
 
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 2e49996a8f391a..9c26225754e719 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -500,6 +500,22 @@ static inline int dma_set_seg_boundary(struct device *dev, unsigned long mask)
 	return -EIO;
 }
 
+static inline unsigned int dma_get_min_align_mask(struct device *dev)
+{
+	if (dev->dma_parms)
+		return dev->dma_parms->min_align_mask;
+	return 0;
+}
+
+static inline int dma_set_min_align_mask(struct device *dev,
+		unsigned int min_align_mask)
+{
+	if (WARN_ON_ONCE(!dev->dma_parms))
+		return -EIO;
+	dev->dma_parms->min_align_mask = min_align_mask;
+	return 0;
+}
+
 static inline int dma_get_cache_alignment(void)
 {
 #ifdef ARCH_DMA_MINALIGN
-- 
2.29.2



* [PATCH 2/8] swiotlb: add a io_tlb_offset helper
  2021-02-04 19:30 ` Christoph Hellwig
@ 2021-02-04 19:30   ` Christoph Hellwig
  -1 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2021-02-04 19:30 UTC (permalink / raw)
  To: jxgao, gregkh
  Cc: saravanak, konrad.wilk, marcorr, linux-nvme, kbusch, iommu,
	erdemaktas, robin.murphy, m.szyprowski

Replace the very generically named OFFSET macro with a little inline
helper that hardcodes the alignment to the only value ever passed.
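
For reference, with the kernel's IO_TLB_SEGSIZE of 128 the helper
behaves exactly like the old OFFSET(val, IO_TLB_SEGSIZE), e.g.:

	io_tlb_offset(0)   == 0
	io_tlb_offset(127) == 127
	io_tlb_offset(128) == 0
	io_tlb_offset(130) == 2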

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/dma/swiotlb.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 7c42df6e61001d..838dbad10ab916 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -50,9 +50,6 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/swiotlb.h>
 
-#define OFFSET(val,align) ((unsigned long)	\
-	                   ( (val) & ( (align) - 1)))
-
 #define SLABS_PER_PAGE (1 << (PAGE_SHIFT - IO_TLB_SHIFT))
 
 /*
@@ -192,6 +189,11 @@ void swiotlb_print_info(void)
 	       bytes >> 20);
 }
 
+static inline unsigned long io_tlb_offset(unsigned long val)
+{
+	return val & (IO_TLB_SEGSIZE - 1);
+}
+
 /*
  * Early SWIOTLB allocation may be too early to allow an architecture to
  * perform the desired operations.  This function allows the architecture to
@@ -241,7 +243,7 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 		      __func__, alloc_size, PAGE_SIZE);
 
 	for (i = 0; i < io_tlb_nslabs; i++) {
-		io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
+		io_tlb_list[i] = IO_TLB_SEGSIZE - io_tlb_offset(i);
 		io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
 	}
 	io_tlb_index = 0;
@@ -375,7 +377,7 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 		goto cleanup4;
 
 	for (i = 0; i < io_tlb_nslabs; i++) {
-		io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
+		io_tlb_list[i] = IO_TLB_SEGSIZE - io_tlb_offset(i);
 		io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
 	}
 	io_tlb_index = 0;
@@ -546,7 +548,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
 
 			for (i = index; i < (int) (index + nslots); i++)
 				io_tlb_list[i] = 0;
-			for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE - 1) && io_tlb_list[i]; i--)
+			for (i = index - 1;
+			     io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
+			     io_tlb_list[i]; i--)
 				io_tlb_list[i] = ++count;
 			tlb_addr = io_tlb_start + (index << IO_TLB_SHIFT);
 
@@ -632,7 +636,9 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
 		 * Step 2: merge the returned slots with the preceding slots,
 		 * if available (non zero)
 		 */
-		for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE -1) && io_tlb_list[i]; i--)
+		for (i = index - 1;
+		     io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
+		     io_tlb_list[i]; i--)
 			io_tlb_list[i] = ++count;
 
 		io_tlb_used -= nslots;
-- 
2.29.2



* [PATCH 3/8] swiotlb: factor out a nr_slots helper
  2021-02-04 19:30 ` Christoph Hellwig
@ 2021-02-04 19:30   ` Christoph Hellwig
  -1 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2021-02-04 19:30 UTC (permalink / raw)
  To: jxgao, gregkh
  Cc: saravanak, konrad.wilk, marcorr, linux-nvme, kbusch, iommu,
	erdemaktas, robin.murphy, m.szyprowski

Factor out a helper to find the number of slots for a given size.
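
With IO_TLB_SHIFT == 11 (2048-byte slots) the helper rounds up to whole
slots, e.g.:

	nr_slots(0)    == 0
	nr_slots(1)    == 1
	nr_slots(2048) == 1
	nr_slots(2049) == 2
	nr_slots(8192) == 4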

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/dma/swiotlb.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 838dbad10ab916..0c0b81799edbdb 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -194,6 +194,11 @@ static inline unsigned long io_tlb_offset(unsigned long val)
 	return val & (IO_TLB_SEGSIZE - 1);
 }
 
+static unsigned long nr_slots(u64 val)
+{
+	return ALIGN(val, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
+}
+
 /*
  * Early SWIOTLB allocation may be too early to allow an architecture to
  * perform the desired operations.  This function allows the architecture to
@@ -493,20 +498,20 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
 
 	tbl_dma_addr &= mask;
 
-	offset_slots = ALIGN(tbl_dma_addr, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
+	offset_slots = nr_slots(tbl_dma_addr);
 
 	/*
 	 * Carefully handle integer overflow which can occur when mask == ~0UL.
 	 */
 	max_slots = mask + 1
-		    ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
+		    ? nr_slots(mask + 1)
 		    : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
 
 	/*
 	 * For mappings greater than or equal to a page, we limit the stride
 	 * (and hence alignment) to a page size.
 	 */
-	nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
+	nslots = nr_slots(alloc_size);
 	if (alloc_size >= PAGE_SIZE)
 		stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT));
 	else
@@ -602,7 +607,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
 			      enum dma_data_direction dir, unsigned long attrs)
 {
 	unsigned long flags;
-	int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
+	int i, count, nslots = nr_slots(alloc_size);
 	int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
 	phys_addr_t orig_addr = io_tlb_orig_addr[index];
 
-- 
2.29.2



* [PATCH 4/8] swiotlb: clean up swiotlb_tbl_unmap_single
  2021-02-04 19:30 ` Christoph Hellwig
@ 2021-02-04 19:30   ` Christoph Hellwig
  -1 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2021-02-04 19:30 UTC (permalink / raw)
  To: jxgao, gregkh
  Cc: saravanak, konrad.wilk, marcorr, linux-nvme, kbusch, iommu,
	erdemaktas, robin.murphy, m.szyprowski

Remove a layer of pointless indentation and replace a hard-to-follow
ternary expression with a plain if/else.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/dma/swiotlb.c | 41 +++++++++++++++++++++--------------------
 1 file changed, 21 insertions(+), 20 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 0c0b81799edbdb..79d5b236f25f10 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -626,28 +626,29 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
 	 * with slots below and above the pool being returned.
 	 */
 	spin_lock_irqsave(&io_tlb_lock, flags);
-	{
-		count = ((index + nslots) < ALIGN(index + 1, IO_TLB_SEGSIZE) ?
-			 io_tlb_list[index + nslots] : 0);
-		/*
-		 * Step 1: return the slots to the free list, merging the
-		 * slots with superceeding slots
-		 */
-		for (i = index + nslots - 1; i >= index; i--) {
-			io_tlb_list[i] = ++count;
-			io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
-		}
-		/*
-		 * Step 2: merge the returned slots with the preceding slots,
-		 * if available (non zero)
-		 */
-		for (i = index - 1;
-		     io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
-		     io_tlb_list[i]; i--)
-			io_tlb_list[i] = ++count;
+	if (index + nslots < ALIGN(index + 1, IO_TLB_SEGSIZE))
+		count = io_tlb_list[index + nslots];
+	else
+		count = 0;
 
-		io_tlb_used -= nslots;
+	/*
+	 * Step 1: return the slots to the free list, merging the slots with
+	 * superceeding slots
+	 */
+	for (i = index + nslots - 1; i >= index; i--) {
+		io_tlb_list[i] = ++count;
+		io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
 	}
+
+	/*
+	 * Step 2: merge the returned slots with the preceding slots, if
+	 * available (non zero)
+	 */
+	for (i = index - 1;
+	     io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 && io_tlb_list[i];
+	     i--)
+		io_tlb_list[i] = ++count;
+	io_tlb_used -= nslots;
 	spin_unlock_irqrestore(&io_tlb_lock, flags);
 }
 
-- 
2.29.2



* [PATCH 5/8] swiotlb: refactor swiotlb_tbl_map_single
  2021-02-04 19:30 ` Christoph Hellwig
@ 2021-02-04 19:30   ` Christoph Hellwig
  -1 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2021-02-04 19:30 UTC (permalink / raw)
  To: jxgao, gregkh
  Cc: saravanak, konrad.wilk, marcorr, linux-nvme, kbusch, iommu,
	erdemaktas, robin.murphy, m.szyprowski

Split out a bunch of self-contained helpers to make the function easier
to follow.
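
The resulting call flow looks roughly like this (a simplified outline,
not the literal code):

	/*
	 * swiotlb_tbl_map_single()
	 *	index = find_slots(dev, alloc_size, tbl_dma_addr);
	 *		uses get_max_slots() for the segment boundary and
	 *		wrap_index() to wrap around the slot array
	 *	record io_tlb_orig_addr[] for each allocated slot
	 *	bounce the buffer if needed and return tlb_addr
	 */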

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/dma/swiotlb.c | 177 +++++++++++++++++++++----------------------
 1 file changed, 87 insertions(+), 90 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 79d5b236f25f10..e78615e3be2906 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -468,134 +468,131 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
 	}
 }
 
-phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
-		size_t mapping_size, size_t alloc_size,
-		enum dma_data_direction dir, unsigned long attrs)
+/*
+ * Carefully handle integer overflow which can occur when boundary_mask == ~0UL.
+ */
+static inline unsigned long get_max_slots(unsigned long boundary_mask)
 {
-	dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start);
-	unsigned long flags;
-	phys_addr_t tlb_addr;
-	unsigned int nslots, stride, index, wrap;
-	int i;
-	unsigned long mask;
-	unsigned long offset_slots;
-	unsigned long max_slots;
-	unsigned long tmp_io_tlb_used;
-
-	if (no_iotlb_memory)
-		panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
-
-	if (mem_encrypt_active())
-		pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
-
-	if (mapping_size > alloc_size) {
-		dev_warn_once(hwdev, "Invalid sizes (mapping: %zd bytes, alloc: %zd bytes)",
-			      mapping_size, alloc_size);
-		return (phys_addr_t)DMA_MAPPING_ERROR;
-	}
-
-	mask = dma_get_seg_boundary(hwdev);
+	if (boundary_mask + 1 == ~0UL)
+		return 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
+	return nr_slots(boundary_mask + 1);
+}
 
-	tbl_dma_addr &= mask;
+static unsigned int wrap_index(unsigned int index)
+{
+	if (index >= io_tlb_nslabs)
+		return 0;
+	return index;
+}
 
-	offset_slots = nr_slots(tbl_dma_addr);
+/*
+ * Find a suitable number of IO TLB entries size that will fit this request and
+ * allocate a buffer from that IO TLB pool.
+ */
+static int find_slots(struct device *dev, size_t alloc_size,
+		dma_addr_t tbl_dma_addr)
+{
+	unsigned int max_slots = get_max_slots(dma_get_seg_boundary(dev));
+	unsigned int nslots = nr_slots(alloc_size), stride = 1;
+	unsigned int index, wrap, count = 0, i;
+	unsigned long flags;
 
-	/*
-	 * Carefully handle integer overflow which can occur when mask == ~0UL.
-	 */
-	max_slots = mask + 1
-		    ? nr_slots(mask + 1)
-		    : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
+	BUG_ON(!nslots);
 
 	/*
 	 * For mappings greater than or equal to a page, we limit the stride
 	 * (and hence alignment) to a page size.
 	 */
-	nslots = nr_slots(alloc_size);
 	if (alloc_size >= PAGE_SIZE)
-		stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT));
-	else
-		stride = 1;
-
-	BUG_ON(!nslots);
+		stride <<= (PAGE_SHIFT - IO_TLB_SHIFT);
 
-	/*
-	 * Find suitable number of IO TLB entries size that will fit this
-	 * request and allocate a buffer from that IO TLB pool.
-	 */
 	spin_lock_irqsave(&io_tlb_lock, flags);
-
 	if (unlikely(nslots > io_tlb_nslabs - io_tlb_used))
 		goto not_found;
 
-	index = ALIGN(io_tlb_index, stride);
-	if (index >= io_tlb_nslabs)
-		index = 0;
-	wrap = index;
-
+	index = wrap = wrap_index(ALIGN(io_tlb_index, stride));
 	do {
-		while (iommu_is_span_boundary(index, nslots, offset_slots,
-					      max_slots)) {
-			index += stride;
-			if (index >= io_tlb_nslabs)
-				index = 0;
-			if (index == wrap)
-				goto not_found;
-		}
-
 		/*
 		 * If we find a slot that indicates we have 'nslots' number of
 		 * contiguous buffers, we allocate the buffers from that slot
 		 * and mark the entries as '0' indicating unavailable.
 		 */
-		if (io_tlb_list[index] >= nslots) {
-			int count = 0;
-
-			for (i = index; i < (int) (index + nslots); i++)
-				io_tlb_list[i] = 0;
-			for (i = index - 1;
-			     io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
-			     io_tlb_list[i]; i--)
-				io_tlb_list[i] = ++count;
-			tlb_addr = io_tlb_start + (index << IO_TLB_SHIFT);
-
-			/*
-			 * Update the indices to avoid searching in the next
-			 * round.
-			 */
-			io_tlb_index = ((index + nslots) < io_tlb_nslabs
-					? (index + nslots) : 0);
-
-			goto found;
+		if (!iommu_is_span_boundary(index, nslots,
+					    nr_slots(tbl_dma_addr),
+					    max_slots)) {
+			if (io_tlb_list[index] >= nslots)
+				goto found;
 		}
-		index += stride;
-		if (index >= io_tlb_nslabs)
-			index = 0;
+		index = wrap_index(index + stride);
 	} while (index != wrap);
 
 not_found:
-	tmp_io_tlb_used = io_tlb_used;
-
 	spin_unlock_irqrestore(&io_tlb_lock, flags);
-	if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
-		dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes), total %lu (slots), used %lu (slots)\n",
-			 alloc_size, io_tlb_nslabs, tmp_io_tlb_used);
-	return (phys_addr_t)DMA_MAPPING_ERROR;
+	return -1;
+
 found:
+	for (i = index; i < index + nslots; i++)
+		io_tlb_list[i] = 0;
+	for (i = index - 1;
+	     io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
+	     io_tlb_list[i]; i--)
+		io_tlb_list[i] = ++count;
+
+	/*
+	 * Update the indices to avoid searching in the next round.
+	 */
+	if (index + nslots < io_tlb_nslabs)
+		io_tlb_index = index + nslots;
+	else
+		io_tlb_index = 0;
 	io_tlb_used += nslots;
+
 	spin_unlock_irqrestore(&io_tlb_lock, flags);
+	return index;
+}
+
+phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
+		size_t mapping_size, size_t alloc_size,
+		enum dma_data_direction dir, unsigned long attrs)
+{
+	dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(dev, io_tlb_start) &
+			dma_get_seg_boundary(dev);
+	unsigned int index, i;
+	phys_addr_t tlb_addr;
+
+	if (no_iotlb_memory)
+		panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
+
+	if (mem_encrypt_active())
+		pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
+
+	if (mapping_size > alloc_size) {
+		dev_warn_once(dev, "Invalid sizes (mapping: %zd bytes, alloc: %zd bytes)",
+			      mapping_size, alloc_size);
+		return (phys_addr_t)DMA_MAPPING_ERROR;
+	}
+
+	index = find_slots(dev, alloc_size, tbl_dma_addr);
+	if (index == -1) {
+		if (!(attrs & DMA_ATTR_NO_WARN))
+			dev_warn_ratelimited(dev,
+	"swiotlb buffer is full (sz: %zd bytes), total %lu (slots), used %lu (slots)\n",
+				 alloc_size, io_tlb_nslabs, io_tlb_used);
+		return (phys_addr_t)DMA_MAPPING_ERROR;
+	}
 
 	/*
 	 * Save away the mapping from the original address to the DMA address.
 	 * This is needed when we sync the memory.  Then we sync the buffer if
 	 * needed.
 	 */
-	for (i = 0; i < nslots; i++)
-		io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
+	for (i = 0; i < nr_slots(alloc_size); i++)
+		io_tlb_orig_addr[index + i] = orig_addr + (i << IO_TLB_SHIFT);
+
+	tlb_addr = io_tlb_start + (index << IO_TLB_SHIFT);
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
 	    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
 		swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE);
-
 	return tlb_addr;
 }
 
-- 
2.29.2



* [PATCH 6/8] swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single
  2021-02-04 19:30 ` Christoph Hellwig
@ 2021-02-04 19:30   ` Christoph Hellwig
  -1 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2021-02-04 19:30 UTC (permalink / raw)
  To: jxgao, gregkh
  Cc: saravanak, konrad.wilk, marcorr, linux-nvme, kbusch, iommu,
	erdemaktas, robin.murphy, m.szyprowski

swiotlb_tbl_map_single currently never sets a tlb_addr that is not
aligned to the tlb bucket size.  But we're going to add such a case
soon, for which this adjustment would be bogus.
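
To illustrate the upcoming case (numbers for illustration only): once a
mapping can start at an offset inside a slot, io_tlb_orig_addr[] already
stores the unaligned original address, so the removed adjustment would
apply the offset twice:

	/*
	 * orig_addr (stored)  = 0x...1600
	 * tlb_addr            = slot_base + 0x600	(offset preserved)
	 *
	 * orig_addr += tlb_addr & ((1 << IO_TLB_SHIFT) - 1);
	 *	-> 0x...1600 + 0x600 = 0x...1c00, which is wrong
	 */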

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/dma/swiotlb.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index e78615e3be2906..6a2439826a1ba4 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -658,7 +658,6 @@ void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
 
 	if (orig_addr == INVALID_PHYS_ADDR)
 		return;
-	orig_addr += (unsigned long)tlb_addr & ((1 << IO_TLB_SHIFT) - 1);
 
 	switch (target) {
 	case SYNC_FOR_CPU:
-- 
2.29.2



* [PATCH 7/8] swiotlb: respect min_align_mask
  2021-02-04 19:30 ` Christoph Hellwig
@ 2021-02-04 19:30   ` Christoph Hellwig
  -1 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2021-02-04 19:30 UTC (permalink / raw)
  To: jxgao, gregkh
  Cc: saravanak, konrad.wilk, marcorr, linux-nvme, kbusch, iommu,
	erdemaktas, robin.murphy, m.szyprowski

Respect the min_align_mask in struct device_dma_parameters in swiotlb.

There are two parts to it:
 1) for the lower bits of the alignment inside the io tlb slot, just
    extend the size of the allocation and leave the start of the slot
    empty
 2) for the high bits, ensure we find a slot that matches the high bits
    of the alignment to avoid wasting too much memory

Based on an earlier patch from Jianxiong Gao <jxgao@google.com>.
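
A worked example (illustrative numbers only), with min_align_mask ==
0xfff and 2 KiB swiotlb slots (IO_TLB_SHIFT == 11):

	/*
	 * orig_addr & 0xfff == 0xa00
	 *
	 * 1) low bits: swiotlb_align_offset() yields 0xa00 & 0x7ff == 0x200,
	 *    so the allocation is grown by 0x200 and the first 0x200 bytes
	 *    of the first slot are left unused
	 * 2) high bits: check_alignment() only accepts slots whose DMA
	 *    address has bit 11 set, matching orig_addr & 0x800
	 *
	 * Result: tlb_addr & 0xfff == 0x800 + 0x200 == 0xa00, the same
	 * page offset the device saw in the original buffer.
	 */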

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/dma/swiotlb.c | 49 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 42 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 6a2439826a1ba4..ab3192142b9906 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -468,6 +468,18 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
 	}
 }
 
+/*
+ * Return the offset into a iotlb slot required to keep the device happy.
+ */
+static unsigned int swiotlb_align_offset(struct device *dev, u64 addr)
+{
+	unsigned min_align_mask = dma_get_min_align_mask(dev);
+
+	if (!min_align_mask)
+		return 0;
+	return addr & min_align_mask & ((1 << IO_TLB_SHIFT) - 1);
+}
+
 /*
  * Carefully handle integer overflow which can occur when boundary_mask == ~0UL.
  */
@@ -478,6 +490,16 @@ static inline unsigned long get_max_slots(unsigned long boundary_mask)
 	return nr_slots(boundary_mask + 1);
 }
 
+static inline bool check_alignment(phys_addr_t orig_addr,
+		dma_addr_t tbl_dma_addr, unsigned int index,
+		unsigned int min_align_mask)
+{
+	if (!min_align_mask)
+		return true;
+	return ((tbl_dma_addr + (index << IO_TLB_SHIFT)) & min_align_mask) ==
+		(orig_addr & min_align_mask);
+}
+
 static unsigned int wrap_index(unsigned int index)
 {
 	if (index >= io_tlb_nslabs)
@@ -489,9 +511,11 @@ static unsigned int wrap_index(unsigned int index)
  * Find a suitable number of IO TLB entries size that will fit this request and
  * allocate a buffer from that IO TLB pool.
  */
-static int find_slots(struct device *dev, size_t alloc_size,
-		dma_addr_t tbl_dma_addr)
+static int find_slots(struct device *dev, phys_addr_t orig_addr,
+		size_t alloc_size, dma_addr_t tbl_dma_addr)
 {
+	unsigned int min_align_mask = dma_get_min_align_mask(dev) &
+			~((1 << IO_TLB_SHIFT) - 1);
 	unsigned int max_slots = get_max_slots(dma_get_seg_boundary(dev));
 	unsigned int nslots = nr_slots(alloc_size), stride = 1;
 	unsigned int index, wrap, count = 0, i;
@@ -503,7 +527,9 @@ static int find_slots(struct device *dev, size_t alloc_size,
 	 * For mappings greater than or equal to a page, we limit the stride
 	 * (and hence alignment) to a page size.
 	 */
-	if (alloc_size >= PAGE_SIZE)
+	if (min_align_mask)
+		stride = (min_align_mask + 1) >> IO_TLB_SHIFT;
+	else if (alloc_size >= PAGE_SIZE)
 		stride <<= (PAGE_SHIFT - IO_TLB_SHIFT);
 
 	spin_lock_irqsave(&io_tlb_lock, flags);
@@ -512,6 +538,12 @@ static int find_slots(struct device *dev, size_t alloc_size,
 
 	index = wrap = wrap_index(ALIGN(io_tlb_index, stride));
 	do {
+		if (!check_alignment(orig_addr, tbl_dma_addr, index,
+				     min_align_mask)) {
+			index = wrap_index(index + 1);
+			continue;
+		}
+
 		/*
 		 * If we find a slot that indicates we have 'nslots' number of
 		 * contiguous buffers, we allocate the buffers from that slot
@@ -557,6 +589,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 {
 	dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(dev, io_tlb_start) &
 			dma_get_seg_boundary(dev);
+	unsigned int offset = swiotlb_align_offset(dev, orig_addr);
 	unsigned int index, i;
 	phys_addr_t tlb_addr;
 
@@ -572,7 +605,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 		return (phys_addr_t)DMA_MAPPING_ERROR;
 	}
 
-	index = find_slots(dev, alloc_size, tbl_dma_addr);
+	alloc_size += offset;
+	index = find_slots(dev, orig_addr, alloc_size, tbl_dma_addr);
 	if (index == -1) {
 		if (!(attrs & DMA_ATTR_NO_WARN))
 			dev_warn_ratelimited(dev,
@@ -589,7 +623,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 	for (i = 0; i < nr_slots(alloc_size); i++)
 		io_tlb_orig_addr[index + i] = orig_addr + (i << IO_TLB_SHIFT);
 
-	tlb_addr = io_tlb_start + (index << IO_TLB_SHIFT);
+	tlb_addr = io_tlb_start + (index << IO_TLB_SHIFT) + offset;
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
 	    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
 		swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE);
@@ -604,8 +638,9 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
 			      enum dma_data_direction dir, unsigned long attrs)
 {
 	unsigned long flags;
-	int i, count, nslots = nr_slots(alloc_size);
-	int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
+	unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
+	int i, count, nslots = nr_slots(alloc_size + offset);
+	int index = (tlb_addr - offset - io_tlb_start) >> IO_TLB_SHIFT;
 	phys_addr_t orig_addr = io_tlb_orig_addr[index];
 
 	/*
-- 
2.29.2



* [PATCH 8/8] nvme-pci: set min_align_mask
  2021-02-04 19:30 ` Christoph Hellwig
@ 2021-02-04 19:30   ` Christoph Hellwig
  -1 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2021-02-04 19:30 UTC (permalink / raw)
  To: jxgao, gregkh
  Cc: saravanak, konrad.wilk, marcorr, linux-nvme, kbusch, iommu,
	erdemaktas, robin.murphy, m.szyprowski

From: Jianxiong Gao <jxgao@google.com>

The PRP addressing scheme requires all PRP entries except for the
first one to have a zero offset into the NVMe controller pages (which
can be different from the Linux PAGE_SIZE).  Use the min_align_mask
device parameter to ensure that swiotlb does not change the address
of the buffer modulo the device page size, so that the PRPs won't
be malformed.
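
For background, a simplified sketch (not from this patch) of why the
alignment matters, with NVME_CTRL_PAGE_SIZE == 4096:

	/*
	 * PRP1 = dma_addr		(may carry a page offset)
	 * PRP2, PRP3, ...		(must have a zero page offset)
	 *
	 * If swiotlb moved the data to a different offset within the
	 * 4 KiB controller page, entries after PRP1 would end up with a
	 * non-zero offset, which the PRP rules do not allow.
	 */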

Signed-off-by: Jianxiong Gao <jxgao@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 81e6389b204205..5d194b4e814719 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2629,6 +2629,7 @@ static void nvme_reset_work(struct work_struct *work)
 	 * Don't limit the IOMMU merged segment size.
 	 */
 	dma_set_max_seg_size(dev->dev, 0xffffffff);
+	dma_set_min_align_mask(&pdev->dev, NVME_CTRL_PAGE_SIZE - 1);
 
 	mutex_unlock(&dev->shutdown_lock);
 
-- 
2.29.2



* Re: [PATCH 8/8] nvme-pci: set min_align_mask
  2021-02-04 19:30   ` Christoph Hellwig
@ 2021-02-04 19:32     ` Christoph Hellwig
  -1 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2021-02-04 19:32 UTC (permalink / raw)
  To: jxgao, gregkh
  Cc: saravanak, konrad.wilk, marcorr, linux-nvme, erdemaktas, iommu,
	kbusch, robin.murphy, m.szyprowski

> +	dma_set_min_align_mask(&pdev->dev, NVME_CTRL_PAGE_SIZE - 1);

And due to a last-minute change from me this doesn't actually compile,
as pdev should be dev.


* Re: [PATCH 1/8] driver core: add a min_align_mask field to struct device_dma_parameters
  2021-02-04 19:30   ` Christoph Hellwig
@ 2021-02-04 19:44     ` Greg KH
  -1 siblings, 0 replies; 36+ messages in thread
From: Greg KH @ 2021-02-04 19:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: saravanak, konrad.wilk, marcorr, linux-nvme, kbusch, iommu,
	erdemaktas, jxgao, robin.murphy, m.szyprowski

On Thu, Feb 04, 2021 at 08:30:28PM +0100, Christoph Hellwig wrote:
> From: Jianxiong Gao <jxgao@google.com>
> 
> Some devices rely on the address offset in a page to function
> correctly (NVMe driver as an example). These devices may use
> a different page size than the Linux kernel. The address offset
> has to be preserved upon mapping, and in order to do so, we
> need to record the page_offset_mask first.
> 
> Signed-off-by: Jianxiong Gao <jxgao@google.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  include/linux/device.h      |  1 +
>  include/linux/dma-mapping.h | 16 ++++++++++++++++
>  2 files changed, 17 insertions(+)
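
For reference, the two helpers presumably end up looking roughly like
this (a sketch inferred from how dma_get_min_align_mask() and
dma_set_min_align_mask() are used later in the series; the exact error
handling is an assumption):

	static inline unsigned int dma_get_min_align_mask(struct device *dev)
	{
		if (dev->dma_parms)
			return dev->dma_parms->min_align_mask;
		return 0;
	}

	static inline int dma_set_min_align_mask(struct device *dev,
			unsigned int min_align_mask)
	{
		if (WARN_ON_ONCE(!dev->dma_parms))
			return -EIO;
		dev->dma_parms->min_align_mask = min_align_mask;
		return 0;
	}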

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 3/8] swiotlb: factor out a nr_slots helper
  2021-02-04 19:30   ` Christoph Hellwig
@ 2021-02-04 22:09     ` Robin Murphy
  -1 siblings, 0 replies; 36+ messages in thread
From: Robin Murphy @ 2021-02-04 22:09 UTC (permalink / raw)
  To: Christoph Hellwig, jxgao, gregkh
  Cc: saravanak, konrad.wilk, marcorr, linux-nvme, kbusch, iommu,
	erdemaktas, m.szyprowski

On 2021-02-04 19:30, Christoph Hellwig wrote:
> Factor out a helper to find the number of slots for a given size.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   kernel/dma/swiotlb.c | 13 +++++++++----
>   1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 838dbad10ab916..0c0b81799edbdb 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -194,6 +194,11 @@ static inline unsigned long io_tlb_offset(unsigned long val)
>   	return val & (IO_TLB_SEGSIZE - 1);
>   }
>   
> +static unsigned long nr_slots(u64 val)
> +{
> +	return ALIGN(val, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;

Would DIV_ROUND_UP(val, 1 << IO_TLB_SHIFT) be even clearer?

Robin.

> +}
> +
>   /*
>    * Early SWIOTLB allocation may be too early to allow an architecture to
>    * perform the desired operations.  This function allows the architecture to
> @@ -493,20 +498,20 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
>   
>   	tbl_dma_addr &= mask;
>   
> -	offset_slots = ALIGN(tbl_dma_addr, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
> +	offset_slots = nr_slots(tbl_dma_addr);
>   
>   	/*
>   	 * Carefully handle integer overflow which can occur when mask == ~0UL.
>   	 */
>   	max_slots = mask + 1
> -		    ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
> +		    ? nr_slots(mask + 1)
>   		    : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
>   
>   	/*
>   	 * For mappings greater than or equal to a page, we limit the stride
>   	 * (and hence alignment) to a page size.
>   	 */
> -	nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
> +	nslots = nr_slots(alloc_size);
>   	if (alloc_size >= PAGE_SIZE)
>   		stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT));
>   	else
> @@ -602,7 +607,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
>   			      enum dma_data_direction dir, unsigned long attrs)
>   {
>   	unsigned long flags;
> -	int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
> +	int i, count, nslots = nr_slots(alloc_size);
>   	int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
>   	phys_addr_t orig_addr = io_tlb_orig_addr[index];
>   
> 

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 5/8] swiotlb: refactor swiotlb_tbl_map_single
  2021-02-04 19:30   ` Christoph Hellwig
@ 2021-02-04 22:12     ` Robin Murphy
  -1 siblings, 0 replies; 36+ messages in thread
From: Robin Murphy @ 2021-02-04 22:12 UTC (permalink / raw)
  To: Christoph Hellwig, jxgao, gregkh
  Cc: saravanak, konrad.wilk, marcorr, linux-nvme, kbusch, iommu,
	erdemaktas, m.szyprowski

On 2021-02-04 19:30, Christoph Hellwig wrote:
> Split out a bunch of self-contained helpers to make the function easier
> to follow.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   kernel/dma/swiotlb.c | 177 +++++++++++++++++++++----------------------
>   1 file changed, 87 insertions(+), 90 deletions(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 79d5b236f25f10..e78615e3be2906 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -468,134 +468,131 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
>   	}
>   }
>   
> -phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
> -		size_t mapping_size, size_t alloc_size,
> -		enum dma_data_direction dir, unsigned long attrs)
> +/*
> + * Carefully handle integer overflow which can occur when boundary_mask == ~0UL.
> + */
> +static inline unsigned long get_max_slots(unsigned long boundary_mask)
>   {
> -	dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start);
> -	unsigned long flags;
> -	phys_addr_t tlb_addr;
> -	unsigned int nslots, stride, index, wrap;
> -	int i;
> -	unsigned long mask;
> -	unsigned long offset_slots;
> -	unsigned long max_slots;
> -	unsigned long tmp_io_tlb_used;
> -
> -	if (no_iotlb_memory)
> -		panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
> -
> -	if (mem_encrypt_active())
> -		pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
> -
> -	if (mapping_size > alloc_size) {
> -		dev_warn_once(hwdev, "Invalid sizes (mapping: %zd bytes, alloc: %zd bytes)",
> -			      mapping_size, alloc_size);
> -		return (phys_addr_t)DMA_MAPPING_ERROR;
> -	}
> -
> -	mask = dma_get_seg_boundary(hwdev);
> +	if (boundary_mask + 1 == ~0UL)

Either "mask == ~0UL" or "mask + 1 == 0", surely?

> +		return 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
> +	return nr_slots(boundary_mask + 1);
> +}
>   
> -	tbl_dma_addr &= mask;
> +static unsigned int wrap_index(unsigned int index)
> +{
> +	if (index >= io_tlb_nslabs)
> +		return 0;
> +	return index;
> +}
>   
> -	offset_slots = nr_slots(tbl_dma_addr);
> +/*
> + * Find a suitable number of IO TLB entries size that will fit this request and
> + * allocate a buffer from that IO TLB pool.
> + */
> +static int find_slots(struct device *dev, size_t alloc_size,
> +		dma_addr_t tbl_dma_addr)
> +{
> +	unsigned int max_slots = get_max_slots(dma_get_seg_boundary(dev));
> +	unsigned int nslots = nr_slots(alloc_size), stride = 1;
> +	unsigned int index, wrap, count = 0, i;
> +	unsigned long flags;
>   
> -	/*
> -	 * Carefully handle integer overflow which can occur when mask == ~0UL.
> -	 */
> -	max_slots = mask + 1
> -		    ? nr_slots(mask + 1)
> -		    : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);

...since the condition here is effectively the latter.

Robin.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 7/8] swiotlb: respect min_align_mask
  2021-02-04 19:30   ` Christoph Hellwig
@ 2021-02-04 23:13     ` Robin Murphy
  -1 siblings, 0 replies; 36+ messages in thread
From: Robin Murphy @ 2021-02-04 23:13 UTC (permalink / raw)
  To: Christoph Hellwig, jxgao, gregkh
  Cc: saravanak, konrad.wilk, marcorr, linux-nvme, kbusch, iommu,
	erdemaktas, m.szyprowski

On 2021-02-04 19:30, Christoph Hellwig wrote:
> Respect the min_align_mask in struct device_dma_parameters in swiotlb.
> 
> There are two parts to it:
>   1) for the lower bits of the alignment inside the io tlb slot, just
>      extend the size of the allocation and leave the start of the slot
>      empty
>   2) for the high bits ensure we find a slot that matches the high bits
>      of the alignment to avoid wasting too much memory
> 
> Based on an earlier patch from Jianxiong Gao <jxgao@google.com>.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   kernel/dma/swiotlb.c | 49 +++++++++++++++++++++++++++++++++++++-------
>   1 file changed, 42 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 6a2439826a1ba4..ab3192142b9906 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -468,6 +468,18 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
>   	}
>   }
>   
> +/*
> + * Return the offset into a iotlb slot required to keep the device happy.
> + */
> +static unsigned int swiotlb_align_offset(struct device *dev, u64 addr)
> +{
> +	unsigned min_align_mask = dma_get_min_align_mask(dev);
> +
> +	if (!min_align_mask)
> +		return 0;

I doubt that's beneficial - even if the compiler can convert it into a 
csel, it'll then be doing unnecessary work to throw away a 
cheaply-calculated 0 in favour of hard-coded 0 in the one case it matters ;)

> +	return addr & min_align_mask & ((1 << IO_TLB_SHIFT) - 1);

(BTW, for readability throughout, "#define IO_TLB_SIZE (1 << 
IO_TLB_SHIFT)" sure wouldn't go amiss...)

> +}
> +
>   /*
>    * Carefully handle integer overflow which can occur when boundary_mask == ~0UL.
>    */
> @@ -478,6 +490,16 @@ static inline unsigned long get_max_slots(unsigned long boundary_mask)
>   	return nr_slots(boundary_mask + 1);
>   }
>   
> +static inline bool check_alignment(phys_addr_t orig_addr,
> +		dma_addr_t tbl_dma_addr, unsigned int index,
> +		unsigned int min_align_mask)
> +{
> +	if (!min_align_mask)
> +		return true;

Ditto - even the 5 or so operations this might skip is unlikely to 
outweigh a branch on anything that matters, and again csel would be a 
net loss since x & 0 == y & 0 is still the correct answer.

> +	return ((tbl_dma_addr + (index << IO_TLB_SHIFT)) & min_align_mask) ==
> +		(orig_addr & min_align_mask);
> +}
> +
>   static unsigned int wrap_index(unsigned int index)
>   {
>   	if (index >= io_tlb_nslabs)
> @@ -489,9 +511,11 @@ static unsigned int wrap_index(unsigned int index)
>    * Find a suitable number of IO TLB entries size that will fit this request and
>    * allocate a buffer from that IO TLB pool.
>    */
> -static int find_slots(struct device *dev, size_t alloc_size,
> -		dma_addr_t tbl_dma_addr)
> +static int find_slots(struct device *dev, phys_addr_t orig_addr,
> +		size_t alloc_size, dma_addr_t tbl_dma_addr)
>   {
> +	unsigned int min_align_mask = dma_get_min_align_mask(dev) &
> +			~((1 << IO_TLB_SHIFT) - 1);
>   	unsigned int max_slots = get_max_slots(dma_get_seg_boundary(dev));
>   	unsigned int nslots = nr_slots(alloc_size), stride = 1;
>   	unsigned int index, wrap, count = 0, i;
> @@ -503,7 +527,9 @@ static int find_slots(struct device *dev, size_t alloc_size,
>   	 * For mappings greater than or equal to a page, we limit the stride
>   	 * (and hence alignment) to a page size.
>   	 */
> -	if (alloc_size >= PAGE_SIZE)
> +	if (min_align_mask)
> +		stride = (min_align_mask + 1) >> IO_TLB_SHIFT;

So this can't underflow because "min_align_mask" is actually just the 
high-order bits representing the number of iotlb slots needed to meet 
the requirement, right? (It took a good 5 minutes to realise this wasn't 
doing what I initially thought it did...)

In that case, a) could the local var be called something like 
iotlb_align_mask to clarify that it's *not* just a copy of the device's 
min_align_mask, and b) maybe just have an unconditional initialisation 
that works either way:

	stride = (min_align_mask >> IO_TLB_SHIFT) + 1;

In fact with that, I think could just mask orig_addr with ~IO_TLB_SIZE 
in the call to check_alignment() below, or shift everything down by 
IO_TLB_SHIFT in check_alignment() itself, instead of mangling 
min_align_mask at all (I'm assuming we do need to ignore the low-order 
bits of orig_addr at this point).

Robin.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 3/8] swiotlb: factor out a nr_slots helper
  2021-02-04 22:09     ` Robin Murphy
@ 2021-02-05  9:45       ` Christoph Hellwig
  -1 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2021-02-05  9:45 UTC (permalink / raw)
  To: Robin Murphy
  Cc: saravanak, konrad.wilk, marcorr, gregkh, linux-nvme, kbusch,
	iommu, erdemaktas, m.szyprowski, Christoph Hellwig, jxgao

On Thu, Feb 04, 2021 at 10:09:02PM +0000, Robin Murphy wrote:
>> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
>> index 838dbad10ab916..0c0b81799edbdb 100644
>> --- a/kernel/dma/swiotlb.c
>> +++ b/kernel/dma/swiotlb.c
>> @@ -194,6 +194,11 @@ static inline unsigned long io_tlb_offset(unsigned long val)
>>   	return val & (IO_TLB_SEGSIZE - 1);
>>   }
>>   +static unsigned long nr_slots(u64 val)
>> +{
>> +	return ALIGN(val, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
>
> Would DIV_ROUND_UP(val, 1 << IO_TLB_SHIFT) be even clearer?

Not sure it is all that much cleaner, but it does fit a common pattern,
so I'll switch to that.
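
Presumably the reworked helper then just becomes something like
(sketch, not the repushed code):

	static unsigned long nr_slots(u64 val)
	{
		return DIV_ROUND_UP(val, 1 << IO_TLB_SHIFT);
	}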

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 5/8] swiotlb: refactor swiotlb_tbl_map_single
  2021-02-04 22:12     ` Robin Murphy
@ 2021-02-05  9:45       ` Christoph Hellwig
  -1 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2021-02-05  9:45 UTC (permalink / raw)
  To: Robin Murphy
  Cc: saravanak, konrad.wilk, marcorr, gregkh, linux-nvme, kbusch,
	iommu, erdemaktas, m.szyprowski, Christoph Hellwig, jxgao

On Thu, Feb 04, 2021 at 10:12:31PM +0000, Robin Murphy wrote:
>> -	mask = dma_get_seg_boundary(hwdev);
>> +	if (boundary_mask + 1 == ~0UL)
>
> Either "mask == ~0UL" or "mask + 1 == 0", surely?

I switched forth and back a few times and ended up with the broken
variant in the middle.  Fixed.
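
For reference, the fixed version is presumably just (sketch):

	/*
	 * Carefully handle integer overflow which can occur when mask == ~0UL.
	 */
	static inline unsigned long get_max_slots(unsigned long boundary_mask)
	{
		if (boundary_mask == ~0UL)
			return 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
		return nr_slots(boundary_mask + 1);
	}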

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 7/8] swiotlb: respect min_align_mask
  2021-02-04 23:13     ` Robin Murphy
@ 2021-02-05 10:34       ` Christoph Hellwig
  -1 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2021-02-05 10:34 UTC (permalink / raw)
  To: Robin Murphy
  Cc: saravanak, konrad.wilk, marcorr, gregkh, linux-nvme, kbusch,
	iommu, erdemaktas, m.szyprowski, Christoph Hellwig, jxgao

On Thu, Feb 04, 2021 at 11:13:45PM +0000, Robin Murphy wrote:
>> + */
>> +static unsigned int swiotlb_align_offset(struct device *dev, u64 addr)
>> +{
>> +	unsigned min_align_mask = dma_get_min_align_mask(dev);
>> +
>> +	if (!min_align_mask)
>> +		return 0;
>
> I doubt that's beneficial - even if the compiler can convert it into a 
> csel, it'll then be doing unnecessary work to throw away a 
> cheaply-calculated 0 in favour of hard-coded 0 in the one case it matters 

True, I'll drop the checks.

> ;)
>
>> +	return addr & min_align_mask & ((1 << IO_TLB_SHIFT) - 1);
>
> (BTW, for readability throughout, "#define IO_TLB_SIZE (1 << IO_TLB_SHIFT)" 
> sure wouldn't go amiss...)

I actually had a patch doing just that, but as it is the only patch
touching swiotlb.h it caused endless rebuilds for me, so I dropped it
as it only had a few uses anyway.  But I've added it back.

>> -	if (alloc_size >= PAGE_SIZE)
>> +	if (min_align_mask)
>> +		stride = (min_align_mask + 1) >> IO_TLB_SHIFT;
>
> So this can't underflow because "min_align_mask" is actually just the 
> high-order bits representing the number of iotlb slots needed to meet the 
> requirement, right? (It took a good 5 minutes to realise this wasn't doing 
> what I initially thought it did...)

Yes.

> In that case, a) could the local var be called something like 
> iotlb_align_mask to clarify that it's *not* just a copy of the device's 
> min_align_mask,

Ok.

> and b) maybe just have an unconditional initialisation that 
> works either way:
>
> 	stride = (min_align_mask >> IO_TLB_SHIFT) + 1;

Sure.

> In fact with that, I think could just mask orig_addr with ~IO_TLB_SIZE in 
> the call to check_alignment() below, or shift everything down by 
> IO_TLB_SHIFT in check_alignment() itself, instead of mangling 
> min_align_mask at all (I'm assuming we do need to ignore the low-order bits 
> of orig_addr at this point).

Yes, we do need to ignore the low bits as they won't ever be set in
tbl_dma_addr.  Not sure the shift helps as we need to mask first.

I ended up killing check_alignment entirely, in favor of a new
slot_addr helper that calculates the address based off the base and
index, and which can be used in a few other places besides this one.
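
So the next spin presumably grows something along these lines (a rough
sketch of the pieces described above, not the actual repushed code):

	#define IO_TLB_SIZE		(1 << IO_TLB_SHIFT)

	/* address of the idx-th slot counted from the start of the bounce buffer */
	static inline phys_addr_t slot_addr(phys_addr_t start, phys_addr_t idx)
	{
		return start + (idx << IO_TLB_SHIFT);
	}

with find_slots() doing an unconditional

	stride = (iotlb_align_mask >> IO_TLB_SHIFT) + 1;

as suggested above.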

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: preserve DMA offsets when using swiotlb
  2021-02-04 19:30 ` Christoph Hellwig
@ 2021-02-05 11:50   ` Christoph Hellwig
  -1 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2021-02-05 11:50 UTC (permalink / raw)
  To: jxgao, gregkh
  Cc: saravanak, konrad.wilk, marcorr, linux-nvme, iommu, kbusch, robin.murphy

I've pushed the updated series out to:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/swiotlb-offset

but I'm going to wait until next week before patch bombing everyone
again.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2021-02-05 11:50 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-04 19:30 preserve DMA offsets when using swiotlb Christoph Hellwig
2021-02-04 19:30 ` Christoph Hellwig
2021-02-04 19:30 ` [PATCH 1/8] driver core: add a min_align_mask field to struct device_dma_parameters Christoph Hellwig
2021-02-04 19:30   ` Christoph Hellwig
2021-02-04 19:44   ` Greg KH
2021-02-04 19:44     ` Greg KH
2021-02-04 19:30 ` [PATCH 2/8] swiotlb: add a io_tlb_offset helper Christoph Hellwig
2021-02-04 19:30   ` Christoph Hellwig
2021-02-04 19:30 ` [PATCH 3/8] swiotlb: factor out a nr_slots helper Christoph Hellwig
2021-02-04 19:30   ` Christoph Hellwig
2021-02-04 22:09   ` Robin Murphy
2021-02-04 22:09     ` Robin Murphy
2021-02-05  9:45     ` Christoph Hellwig
2021-02-05  9:45       ` Christoph Hellwig
2021-02-04 19:30 ` [PATCH 4/8] swiotlb: clean up swiotlb_tbl_unmap_single Christoph Hellwig
2021-02-04 19:30   ` Christoph Hellwig
2021-02-04 19:30 ` [PATCH 5/8] swiotlb: refactor swiotlb_tbl_map_single Christoph Hellwig
2021-02-04 19:30   ` Christoph Hellwig
2021-02-04 22:12   ` Robin Murphy
2021-02-04 22:12     ` Robin Murphy
2021-02-05  9:45     ` Christoph Hellwig
2021-02-05  9:45       ` Christoph Hellwig
2021-02-04 19:30 ` [PATCH 6/8] swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single Christoph Hellwig
2021-02-04 19:30   ` Christoph Hellwig
2021-02-04 19:30 ` [PATCH 7/8] swiotlb: respect min_align_mask Christoph Hellwig
2021-02-04 19:30   ` Christoph Hellwig
2021-02-04 23:13   ` Robin Murphy
2021-02-04 23:13     ` Robin Murphy
2021-02-05 10:34     ` Christoph Hellwig
2021-02-05 10:34       ` Christoph Hellwig
2021-02-04 19:30 ` [PATCH 8/8] nvme-pci: set min_align_mask Christoph Hellwig
2021-02-04 19:30   ` Christoph Hellwig
2021-02-04 19:32   ` Christoph Hellwig
2021-02-04 19:32     ` Christoph Hellwig
2021-02-05 11:50 ` preserve DMA offsets when using swiotlb Christoph Hellwig
2021-02-05 11:50   ` Christoph Hellwig
