* [PATCH v5.10 0/8] preserve DMA offsets when using swiotlb
@ 2021-04-05 21:02 Jianxiong Gao
  2021-04-05 21:02 ` [PATCH v5.10 1/8] driver core: add a min_align_mask field to struct device_dma_parameters Jianxiong Gao
                   ` (8 more replies)
  0 siblings, 9 replies; 19+ messages in thread
From: Jianxiong Gao @ 2021-04-05 21:02 UTC (permalink / raw)
  To: stable; +Cc: Jianxiong Gao

Hi all,

This series of backports fixes the SWIOTLB library to maintain the
page offset when mapping a DMA address. The bug that motivated this
patch series manifested when running a 5.4 kernel as a SEV guest with
an NVMe device. However, any device that infers information from the
page offset and is accessed through the SWIOTLB will benefit from this
bug fix.
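
To make the effect concrete, here is a small stand-alone illustration
(not part of the series; the 2 KiB IO TLB slot size, the 4 KiB NVMe
controller page size and the addresses are assumed values):

#include <stdio.h>

#define IO_TLB_SHIFT		11			/* assumed kernel value */
#define IO_TLB_SIZE		(1u << IO_TLB_SHIFT)
#define NVME_CTRL_PAGE_SIZE	4096u			/* assumed controller page */

int main(void)
{
	unsigned long long orig_addr  = 0x1a2000ULL + 0x300;	/* offset 0x300 */
	unsigned long long slot_start = 0x940000ULL;		/* bounce slot */
	unsigned int min_align_mask   = NVME_CTRL_PAGE_SIZE - 1;

	/* Old behaviour: the mapping starts at the slot boundary, so the
	 * in-page offset the device depends on is silently lost. */
	unsigned long long old_tlb_addr = slot_start;

	/* New behaviour: the low bits covered by min_align_mask (capped to
	 * one slot) are carried over into the bounce buffer address. */
	unsigned long long new_tlb_addr = slot_start +
		(orig_addr & min_align_mask & (IO_TLB_SIZE - 1));

	printf("old offset %#llx, new offset %#llx\n",
	       old_tlb_addr & min_align_mask, new_tlb_addr & min_align_mask);
	return 0;
}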

Jianxiong Gao (8):
  driver core: add a min_align_mask field to struct 
    device_dma_parameters
  swiotlb: factor out an io_tlb_offset helper
  swiotlb: factor out a nr_slots helper
  swiotlb: clean up swiotlb_tbl_unmap_single
  swiotlb: refactor swiotlb_tbl_map_single
  swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single
  swiotlb: respect min_align_mask
  nvme-pci: set min_align_mask

 drivers/nvme/host/pci.c     |   1 +
 include/linux/device.h      |   1 +
 include/linux/dma-mapping.h |  16 +++
 include/linux/swiotlb.h     |   1 +
 kernel/dma/swiotlb.c        | 256 ++++++++++++++++++++----------------
 5 files changed, 160 insertions(+), 115 deletions(-)

-- 
2.27.0



* [PATCH v5.10 1/8] driver core: add a min_align_mask field to struct device_dma_parameters
  2021-04-05 21:02 [PATCH v5.10 0/8] preserve DMA offsets when using swiotlb Jianxiong Gao
@ 2021-04-05 21:02 ` Jianxiong Gao
  2021-04-07 12:50   ` Greg KH
  2021-04-05 21:02 ` [PATCH v5.10 2/8] swiotlb: factor out an io_tlb_offset helper Jianxiong Gao
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Jianxiong Gao @ 2021-04-05 21:02 UTC (permalink / raw)
  To: stable; +Cc: Jianxiong Gao, Christoph Hellwig

'commit 36950f2da1ea ("driver core: add a min_align_mask field to struct device_dma_parameters")'

Some devices rely on the address offset in a page to function
correctly (NVMe driver as an example). These devices may use
a different page size than the Linux kernel. The address offset
has to be preserved upon mapping, and in order to do so, we
need to record the min_align_mask first.
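
As a rough sketch of the intended usage (illustration only: the
example_* functions are made up, only dma_set_min_align_mask() and
dma_get_min_align_mask() come from this patch, and dev->dma_parms is
assumed to be allocated already, as the PCI core does for PCI devices):

#include <linux/dma-mapping.h>

/* Hypothetical driver probe: tell the DMA layer that the device itself
 * interprets the low 12 address bits, so they must survive bouncing. */
static int example_probe(struct device *dev)
{
	return dma_set_min_align_mask(dev, 4096 - 1);
}

/* The mapping code can later query the mask and preserve exactly the
 * address bits it covers. */
static unsigned int example_preserved_bits(struct device *dev, phys_addr_t addr)
{
	return addr & dma_get_min_align_mask(dev);
}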

Signed-off-by: Jianxiong Gao <jxgao@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/device.h      |  1 +
 include/linux/dma-mapping.h | 16 ++++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index 5ed101be7b2e..cfa8ead48f1e 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -291,6 +291,7 @@ struct device_dma_parameters {
 	 * sg limitations.
 	 */
 	unsigned int max_segment_size;
+	unsigned int min_align_mask;
 	unsigned long segment_boundary_mask;
 };
 
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 956151052d45..a7d70cdee25e 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -500,6 +500,22 @@ static inline int dma_set_seg_boundary(struct device *dev, unsigned long mask)
 	return -EIO;
 }
 
+static inline unsigned int dma_get_min_align_mask(struct device *dev)
+{
+	if (dev->dma_parms)
+		return dev->dma_parms->min_align_mask;
+	return 0;
+}
+
+static inline int dma_set_min_align_mask(struct device *dev,
+		unsigned int min_align_mask)
+{
+	if (WARN_ON_ONCE(!dev->dma_parms))
+		return -EIO;
+	dev->dma_parms->min_align_mask = min_align_mask;
+	return 0;
+}
+
 static inline int dma_get_cache_alignment(void)
 {
 #ifdef ARCH_DMA_MINALIGN
-- 
2.27.0



* [PATCH v5.10 2/8] swiotlb: factor out an io_tlb_offset helper
  2021-04-05 21:02 [PATCH v5.10 0/8] preserve DMA offsets when using swiotlb Jianxiong Gao
  2021-04-05 21:02 ` [PATCH v5.10 1/8] driver core: add a min_align_mask field to struct device_dma_parameters Jianxiong Gao
@ 2021-04-05 21:02 ` Jianxiong Gao
  2021-04-05 21:02 ` [PATCH v5.10 3/8] swiotlb: factor out a nr_slots helper Jianxiong Gao
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Jianxiong Gao @ 2021-04-05 21:02 UTC (permalink / raw)
  To: stable; +Cc: Jianxiong Gao, Christoph Hellwig

From: Christoph Hellwig <hch@lst.de>
'commit c7fbeca757fe ("swiotlb: factor out an io_tlb_offset helper")'

Replace the very generically named OFFSET macro with a little inline
helper that hardcodes the alignment to the only value ever passed.
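
The conversion is mechanical; a quick user-space check of the
equivalence (IO_TLB_SEGSIZE is assumed to be 128, matching the kernel
header) could look like this:

#include <assert.h>

#define IO_TLB_SEGSIZE		128	/* assumed to match the kernel */
#define OFFSET(val, align)	((unsigned long)((val) & ((align) - 1)))

static unsigned long io_tlb_offset(unsigned long val)
{
	return val & (IO_TLB_SEGSIZE - 1);
}

int main(void)
{
	unsigned long i;

	/* The removed macro and the new helper agree for every slot index. */
	for (i = 0; i < 8 * IO_TLB_SEGSIZE; i++)
		assert(io_tlb_offset(i) == OFFSET(i, IO_TLB_SEGSIZE));
	return 0;
}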

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jianxiong Gao <jxgao@google.com>
---
 kernel/dma/swiotlb.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 781b9dca197c..6d6ff626c937 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -50,9 +50,6 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/swiotlb.h>
 
-#define OFFSET(val,align) ((unsigned long)	\
-	                   ( (val) & ( (align) - 1)))
-
 #define SLABS_PER_PAGE (1 << (PAGE_SHIFT - IO_TLB_SHIFT))
 
 /*
@@ -176,6 +173,11 @@ void swiotlb_print_info(void)
 	       bytes >> 20);
 }
 
+static inline unsigned long io_tlb_offset(unsigned long val)
+{
+	return val & (IO_TLB_SEGSIZE - 1);
+}
+
 /*
  * Early SWIOTLB allocation may be too early to allow an architecture to
  * perform the desired operations.  This function allows the architecture to
@@ -225,7 +227,7 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 		      __func__, alloc_size, PAGE_SIZE);
 
 	for (i = 0; i < io_tlb_nslabs; i++) {
-		io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
+		io_tlb_list[i] = IO_TLB_SEGSIZE - io_tlb_offset(i);
 		io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
 	}
 	io_tlb_index = 0;
@@ -359,7 +361,7 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 		goto cleanup4;
 
 	for (i = 0; i < io_tlb_nslabs; i++) {
-		io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
+		io_tlb_list[i] = IO_TLB_SEGSIZE - io_tlb_offset(i);
 		io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
 	}
 	io_tlb_index = 0;
@@ -530,7 +532,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
 
 			for (i = index; i < (int) (index + nslots); i++)
 				io_tlb_list[i] = 0;
-			for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE - 1) && io_tlb_list[i]; i--)
+			for (i = index - 1;
+			     io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
+			     io_tlb_list[i]; i--)
 				io_tlb_list[i] = ++count;
 			tlb_addr = io_tlb_start + (index << IO_TLB_SHIFT);
 
@@ -616,7 +620,9 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
 		 * Step 2: merge the returned slots with the preceding slots,
 		 * if available (non zero)
 		 */
-		for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE -1) && io_tlb_list[i]; i--)
+		for (i = index - 1;
+		     io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
+		     io_tlb_list[i]; i--)
 			io_tlb_list[i] = ++count;
 
 		io_tlb_used -= nslots;
-- 
2.27.0



* [PATCH v5.10 3/8] swiotlb: factor out a nr_slots helper
  2021-04-05 21:02 [PATCH v5.10 0/8] preserve DMA offsets when using swiotlb Jianxiong Gao
  2021-04-05 21:02 ` [PATCH v5.10 1/8] driver core: add a min_align_mask field to struct device_dma_parameters Jianxiong Gao
  2021-04-05 21:02 ` [PATCH v5.10 2/8] swiotlb: factor out an io_tlb_offset helper Jianxiong Gao
@ 2021-04-05 21:02 ` Jianxiong Gao
  2021-04-05 21:02 ` [PATCH v5.10 4/8] swiotlb: clean up swiotlb_tbl_unmap_single Jianxiong Gao
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Jianxiong Gao @ 2021-04-05 21:02 UTC (permalink / raw)
  To: stable; +Cc: Jianxiong Gao, Christoph Hellwig, Konrad Rzeszutek Wilk

From: Christoph Hellwig <hch@lst.de>

'commit c32a77fd1878 ("swiotlb: factor out a nr_slots helper")'

Factor out a helper to find the number of slots for a given size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jianxiong Gao <jxgao@google.com>
Tested-by: Jianxiong Gao <jxgao@google.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jianxiong Gao <jxgao@google.com>
---
 include/linux/swiotlb.h |  1 +
 kernel/dma/swiotlb.c    | 13 +++++++++----
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index fbdc65782195..5d2dbe7e04c3 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -29,6 +29,7 @@ enum swiotlb_force {
  * controllable.
  */
 #define IO_TLB_SHIFT 11
+#define IO_TLB_SIZE (1 << IO_TLB_SHIFT)
 
 extern void swiotlb_init(int verbose);
 int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 6d6ff626c937..1ba8e1095dfc 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -178,6 +178,11 @@ static inline unsigned long io_tlb_offset(unsigned long val)
 	return val & (IO_TLB_SEGSIZE - 1);
 }
 
+static inline unsigned long nr_slots(u64 val)
+{
+	return DIV_ROUND_UP(val, IO_TLB_SIZE);
+}
+
 /*
  * Early SWIOTLB allocation may be too early to allow an architecture to
  * perform the desired operations.  This function allows the architecture to
@@ -477,20 +482,20 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
 
 	tbl_dma_addr &= mask;
 
-	offset_slots = ALIGN(tbl_dma_addr, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
+	offset_slots = nr_slots(tbl_dma_addr);
 
 	/*
 	 * Carefully handle integer overflow which can occur when mask == ~0UL.
 	 */
 	max_slots = mask + 1
-		    ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
+		    ? nr_slots(mask + 1)
 		    : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
 
 	/*
 	 * For mappings greater than or equal to a page, we limit the stride
 	 * (and hence alignment) to a page size.
 	 */
-	nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
+	nslots = nr_slots(alloc_size);
 	if (alloc_size >= PAGE_SIZE)
 		stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT));
 	else
@@ -586,7 +591,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
 			      enum dma_data_direction dir, unsigned long attrs)
 {
 	unsigned long flags;
-	int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
+	int i, count, nslots = nr_slots(alloc_size);
 	int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
 	phys_addr_t orig_addr = io_tlb_orig_addr[index];
 
-- 
2.27.0



* [PATCH v5.10 4/8] swiotlb: clean up swiotlb_tbl_unmap_single
  2021-04-05 21:02 [PATCH v5.10 0/8] preserve DMA offsets when using swiotlb Jianxiong Gao
                   ` (2 preceding siblings ...)
  2021-04-05 21:02 ` [PATCH v5.10 3/8] swiotlb: factor out a nr_slots helper Jianxiong Gao
@ 2021-04-05 21:02 ` Jianxiong Gao
  2021-04-05 21:02 ` [PATCH v5.10 5/8] swiotlb: refactor swiotlb_tbl_map_single Jianxiong Gao
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Jianxiong Gao @ 2021-04-05 21:02 UTC (permalink / raw)
  To: stable; +Cc: Jianxiong Gao, Christoph Hellwig

From: Christoph Hellwig <hch@lst.de>

'commit ca10d0f8e530 ("swiotlb: clean up swiotlb_tbl_unmap_single")'

Remove a layer of pointless indentation and replace a hard-to-follow
ternary expression with a plain if/else.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jianxiong Gao <jxgao@google.com>
---
 kernel/dma/swiotlb.c | 42 +++++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 1ba8e1095dfc..47d5dfeffd0a 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -610,28 +610,28 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
 	 * with slots below and above the pool being returned.
 	 */
 	spin_lock_irqsave(&io_tlb_lock, flags);
-	{
-		count = ((index + nslots) < ALIGN(index + 1, IO_TLB_SEGSIZE) ?
-			 io_tlb_list[index + nslots] : 0);
-		/*
-		 * Step 1: return the slots to the free list, merging the
-		 * slots with superceeding slots
-		 */
-		for (i = index + nslots - 1; i >= index; i--) {
-			io_tlb_list[i] = ++count;
-			io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
-		}
-		/*
-		 * Step 2: merge the returned slots with the preceding slots,
-		 * if available (non zero)
-		 */
-		for (i = index - 1;
-		     io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
-		     io_tlb_list[i]; i--)
-			io_tlb_list[i] = ++count;
-
-		io_tlb_used -= nslots;
+	if (index + nslots < ALIGN(index + 1, IO_TLB_SEGSIZE))
+		count = io_tlb_list[index + nslots];
+	else
+		count = 0;
+	/*
+	 * Step 1: return the slots to the free list, merging the slots with
+	 * superceeding slots
+	 */
+	for (i = index + nslots - 1; i >= index; i--) {
+		io_tlb_list[i] = ++count;
+		io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
 	}
+
+	/*
+	 * Step 2: merge the returned slots with the preceding slots, if
+	 * available (non zero)
+	 */
+	for (i = index - 1;
+	     io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 && io_tlb_list[i];
+	     i--)
+		io_tlb_list[i] = ++count;
+	io_tlb_used -= nslots;
 	spin_unlock_irqrestore(&io_tlb_lock, flags);
 }
 
-- 
2.27.0



* [PATCH v5.10 5/8] swiotlb: refactor swiotlb_tbl_map_single
  2021-04-05 21:02 [PATCH v5.10 0/8] preserve DMA offsets when using swiotlb Jianxiong Gao
                   ` (3 preceding siblings ...)
  2021-04-05 21:02 ` [PATCH v5.10 4/8] swiotlb: clean up swiotlb_tbl_unmap_single Jianxiong Gao
@ 2021-04-05 21:02 ` Jianxiong Gao
  2021-04-05 21:02 ` [PATCH v5.10 6/8] swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single Jianxiong Gao
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Jianxiong Gao @ 2021-04-05 21:02 UTC (permalink / raw)
  To: stable; +Cc: Jianxiong Gao, Christoph Hellwig, Konrad Rzeszutek Wilk

From: Christoph Hellwig <hch@lst.de>

'commit 26a7e094783d ("swiotlb: refactor swiotlb_tbl_map_single")'

Split out a bunch of self-contained helpers to make the function easier
to follow.
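
A minimal user-space model of the resulting search loop (illustration
only: the pool size, stride, start index and free-slot bookkeeping are
made up; only wrap_index() mirrors the helper added by the patch):

#include <stdio.h>

#define NSLABS	32u	/* made-up pool size for the illustration */

static unsigned int wrap_index(unsigned int index)
{
	return index >= NSLABS ? 0 : index;
}

int main(void)
{
	unsigned int free_run[NSLABS] = { 0 };
	unsigned int nslots = 4, stride = 2;
	unsigned int index, wrap;

	free_run[20] = 4;	/* pretend a run of 4 free slots starts at 20 */

	/* Walk the pool with a fixed stride and wrap around once, the same
	 * shape as the new find_slots() loop under io_tlb_lock. */
	index = wrap = wrap_index(6);
	do {
		if (free_run[index] >= nslots) {
			printf("found %u slots at index %u\n", nslots, index);
			return 0;
		}
		index = wrap_index(index + stride);
	} while (index != wrap);

	printf("pool exhausted\n");
	return 1;
}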

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jianxiong Gao <jxgao@google.com>
Tested-by: Jianxiong Gao <jxgao@google.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jianxiong Gao <jxgao@google.com>
---
 kernel/dma/swiotlb.c | 177 +++++++++++++++++++++----------------------
 1 file changed, 86 insertions(+), 91 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 47d5dfeffd0a..a3c0ccc53b8d 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -452,134 +452,129 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
 	}
 }
 
-phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
-		size_t mapping_size, size_t alloc_size,
-		enum dma_data_direction dir, unsigned long attrs)
+#define slot_addr(start, idx)  ((start) + ((idx) << IO_TLB_SHIFT))
+/*
+ * Carefully handle integer overflow which can occur when boundary_mask == ~0UL.
+ */
+static inline unsigned long get_max_slots(unsigned long boundary_mask)
 {
-	dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start);
-	unsigned long flags;
-	phys_addr_t tlb_addr;
-	unsigned int nslots, stride, index, wrap;
-	int i;
-	unsigned long mask;
-	unsigned long offset_slots;
-	unsigned long max_slots;
-	unsigned long tmp_io_tlb_used;
-
-	if (no_iotlb_memory)
-		panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
-
-	if (mem_encrypt_active())
-		pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
-
-	if (mapping_size > alloc_size) {
-		dev_warn_once(hwdev, "Invalid sizes (mapping: %zd bytes, alloc: %zd bytes)",
-			      mapping_size, alloc_size);
-		return (phys_addr_t)DMA_MAPPING_ERROR;
-	}
-
-	mask = dma_get_seg_boundary(hwdev);
-
-	tbl_dma_addr &= mask;
+	if (boundary_mask == ~0UL)
+		return 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
+	return nr_slots(boundary_mask + 1);
+}
 
-	offset_slots = nr_slots(tbl_dma_addr);
+static unsigned int wrap_index(unsigned int index)
+{
+	if (index >= io_tlb_nslabs)
+		return 0;
+	return index;
+}
 
-	/*
-	 * Carefully handle integer overflow which can occur when mask == ~0UL.
-	 */
-	max_slots = mask + 1
-		    ? nr_slots(mask + 1)
-		    : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
+/*
+ * Find a suitable number of IO TLB entries size that will fit this request and
+ * allocate a buffer from that IO TLB pool.
+ */
+static int find_slots(struct device *dev, size_t alloc_size)
+{
+	unsigned long boundary_mask = dma_get_seg_boundary(dev);
+	dma_addr_t tbl_dma_addr =
+		phys_to_dma_unencrypted(dev, io_tlb_start) & boundary_mask;
+	unsigned long max_slots = get_max_slots(boundary_mask);
+	unsigned int nslots = nr_slots(alloc_size), stride = 1;
+	unsigned int index, wrap, count = 0, i;
+	unsigned long flags;
+	BUG_ON(!nslots);
 
 	/*
 	 * For mappings greater than or equal to a page, we limit the stride
 	 * (and hence alignment) to a page size.
 	 */
-	nslots = nr_slots(alloc_size);
 	if (alloc_size >= PAGE_SIZE)
-		stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT));
-	else
-		stride = 1;
+		stride <<= (PAGE_SHIFT - IO_TLB_SHIFT);
 
-	BUG_ON(!nslots);
-
-	/*
-	 * Find suitable number of IO TLB entries size that will fit this
-	 * request and allocate a buffer from that IO TLB pool.
-	 */
 	spin_lock_irqsave(&io_tlb_lock, flags);
-
 	if (unlikely(nslots > io_tlb_nslabs - io_tlb_used))
 		goto not_found;
 
-	index = ALIGN(io_tlb_index, stride);
-	if (index >= io_tlb_nslabs)
-		index = 0;
-	wrap = index;
-
+	index = wrap = wrap_index(ALIGN(io_tlb_index, stride));
 	do {
-		while (iommu_is_span_boundary(index, nslots, offset_slots,
-					      max_slots)) {
-			index += stride;
-			if (index >= io_tlb_nslabs)
-				index = 0;
-			if (index == wrap)
-				goto not_found;
-		}
-
 		/*
 		 * If we find a slot that indicates we have 'nslots' number of
 		 * contiguous buffers, we allocate the buffers from that slot
 		 * and mark the entries as '0' indicating unavailable.
 		 */
-		if (io_tlb_list[index] >= nslots) {
-			int count = 0;
-
-			for (i = index; i < (int) (index + nslots); i++)
-				io_tlb_list[i] = 0;
-			for (i = index - 1;
-			     io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
-			     io_tlb_list[i]; i--)
-				io_tlb_list[i] = ++count;
-			tlb_addr = io_tlb_start + (index << IO_TLB_SHIFT);
-
-			/*
-			 * Update the indices to avoid searching in the next
-			 * round.
-			 */
-			io_tlb_index = ((index + nslots) < io_tlb_nslabs
-					? (index + nslots) : 0);
-
-			goto found;
+		if (!iommu_is_span_boundary(index, nslots,
+					    nr_slots(tbl_dma_addr),
+					    max_slots)) {
+			if (io_tlb_list[index] >= nslots)
+				goto found;
 		}
-		index += stride;
-		if (index >= io_tlb_nslabs)
-			index = 0;
+		index = wrap_index(index + stride);
 	} while (index != wrap);
 
 not_found:
-	tmp_io_tlb_used = io_tlb_used;
-
 	spin_unlock_irqrestore(&io_tlb_lock, flags);
-	if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
-		dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes), total %lu (slots), used %lu (slots)\n",
-			 alloc_size, io_tlb_nslabs, tmp_io_tlb_used);
-	return (phys_addr_t)DMA_MAPPING_ERROR;
+	return -1;
 found:
+	for (i = index; i < index + nslots; i++)
+		io_tlb_list[i] = 0;
+	for (i = index - 1;
+	     io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
+	     io_tlb_list[i]; i--)
+		io_tlb_list[i] = ++count;
+
+	/*
+	 * Update the indices to avoid searching in the next round.
+	 */
+	if (index + nslots < io_tlb_nslabs)
+		io_tlb_index = index + nslots;
+	else
+		io_tlb_index = 0;
 	io_tlb_used += nslots;
+
 	spin_unlock_irqrestore(&io_tlb_lock, flags);
+	return index;
+}
 
+phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
+		size_t mapping_size, size_t alloc_size,
+		enum dma_data_direction dir, unsigned long attrs)
+{
+	unsigned int index, i;
+	phys_addr_t tlb_addr;
+
+	if (no_iotlb_memory)
+		panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
+
+	if (mem_encrypt_active())
+		pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
+
+	if (mapping_size > alloc_size) {
+		dev_warn_once(dev, "Invalid sizes (mapping: %zd bytes, alloc: %zd bytes)",
+			mapping_size, alloc_size);
+		return (phys_addr_t)DMA_MAPPING_ERROR;
+	}
+
+	index = find_slots(dev, alloc_size);
+	if (index == -1) {
+		if (!(attrs & DMA_ATTR_NO_WARN))
+			dev_warn_ratelimited(dev,
+	"swiotlb buffer is full (sz: %zd bytes), total %lu (slots), used %lu (slots)\n",
+				 alloc_size, io_tlb_nslabs, io_tlb_used);
+		return (phys_addr_t)DMA_MAPPING_ERROR;
+	}
 	/*
 	 * Save away the mapping from the original address to the DMA address.
 	 * This is needed when we sync the memory.  Then we sync the buffer if
 	 * needed.
 	 */
-	for (i = 0; i < nslots; i++)
-		io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
+	for (i = 0; i < nr_slots(alloc_size); i++)
+		io_tlb_orig_addr[index + i] = slot_addr(orig_addr, i);
+
+	tlb_addr = slot_addr(io_tlb_start, index);
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
 	    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
 		swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE);
-
 	return tlb_addr;
 }
 
-- 
2.27.0



* [PATCH v5.10 6/8] swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single
  2021-04-05 21:02 [PATCH v5.10 0/8] preserve DMA offsets when using swiotlb Jianxiong Gao
                   ` (4 preceding siblings ...)
  2021-04-05 21:02 ` [PATCH v5.10 5/8] swiotlb: refactor swiotlb_tbl_map_single Jianxiong Gao
@ 2021-04-05 21:02 ` Jianxiong Gao
  2021-04-05 21:02 ` [PATCH v5.10 7/8] swiotlb: respect min_align_mask Jianxiong Gao
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Jianxiong Gao @ 2021-04-05 21:02 UTC (permalink / raw)
  To: stable; +Cc: Jianxiong Gao, Christoph Hellwig

From: Christoph Hellwig <hch@lst.de>

'commit 16fc3cef33a0 ("swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single")'

swiotlb_tbl_map_single currently never sets a tlb_addr that is not
aligned to the tlb bucket size.  But we're going to add such a case
soon, for which this adjustment would be bogus.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jianxiong Gao <jxgao@google.com>
---
 kernel/dma/swiotlb.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index a3c0ccc53b8d..d1349971f099 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -639,7 +639,6 @@ void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
 
 	if (orig_addr == INVALID_PHYS_ADDR)
 		return;
-	orig_addr += (unsigned long)tlb_addr & ((1 << IO_TLB_SHIFT) - 1);
 
 	switch (target) {
 	case SYNC_FOR_CPU:
-- 
2.27.0



* [PATCH v5.10 7/8] swiotlb: respect min_align_mask
  2021-04-05 21:02 [PATCH v5.10 0/8] preserve DMA offsets when using swiotlb Jianxiong Gao
                   ` (5 preceding siblings ...)
  2021-04-05 21:02 ` [PATCH v5.10 6/8] swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single Jianxiong Gao
@ 2021-04-05 21:02 ` Jianxiong Gao
  2021-04-05 21:02 ` [PATCH v5.10 8/8] nvme-pci: set min_align_mask Jianxiong Gao
  2021-04-07 12:51 ` [PATCH v5.10 0/8] preserve DMA offsets when using swiotlb Greg KH
  8 siblings, 0 replies; 19+ messages in thread
From: Jianxiong Gao @ 2021-04-05 21:02 UTC (permalink / raw)
  To: stable; +Cc: Jianxiong Gao, Christoph Hellwig, Konrad Rzeszutek Wilk

From: Christoph Hellwig <hch@lst.de>

'commit 1f221a0d0dbf ("swiotlb: respect min_align_mask")'

Respect the min_align_mask in struct device_dma_parameters in swiotlb.

There are two parts to it:
 1) for the lower bits of the alignment inside the io tlb slot, just
    extend the size of the allocation and leave the start of the slot
    empty
 2) for the high bits ensure we find a slot that matches the high bits
    of the alignment to avoid wasting too much memory

Based on an earlier patch from Jianxiong Gao <jxgao@google.com>.
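
With concrete numbers the split looks like this (stand-alone sketch,
not kernel code; the 2 KiB IO TLB slot size and a 4 KiB min_align_mask
are assumed values):

#include <stdio.h>

#define IO_TLB_SHIFT	11		/* assumed kernel value */
#define IO_TLB_SIZE	(1u << IO_TLB_SHIFT)

int main(void)
{
	unsigned int min_align_mask = 0xfff;		/* e.g. 4 KiB pages */
	unsigned long long orig_addr = 0x12345a40ULL;

	/* Part 1: bits below the slot size become an offset into the bounce
	 * slot, and the allocation is grown by the same amount. */
	unsigned int offset = orig_addr & min_align_mask & (IO_TLB_SIZE - 1);

	/* Part 2: bits above the slot size restrict which slot indices are
	 * acceptable, expressed as a search stride over slot numbers. */
	unsigned int iotlb_align_mask = min_align_mask & ~(IO_TLB_SIZE - 1);
	unsigned int stride = (iotlb_align_mask >> IO_TLB_SHIFT) + 1;

	printf("offset into slot %#x, search stride %u slots\n", offset, stride);
	return 0;
}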

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jianxiong Gao <jxgao@google.com>
Tested-by: Jianxiong Gao <jxgao@google.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jianxiong Gao <jxgao@google.com>
---
 kernel/dma/swiotlb.c | 41 +++++++++++++++++++++++++++++++----------
 1 file changed, 31 insertions(+), 10 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index d1349971f099..ba6e9269a1cc 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -453,6 +453,15 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
 }
 
 #define slot_addr(start, idx)  ((start) + ((idx) << IO_TLB_SHIFT))
+
+/*
+ * Return the offset into a iotlb slot required to keep the device happy.
+ */
+static unsigned int swiotlb_align_offset(struct device *dev, u64 addr)
+{
+	return addr & dma_get_min_align_mask(dev) & (IO_TLB_SIZE - 1);
+}
+
 /*
  * Carefully handle integer overflow which can occur when boundary_mask == ~0UL.
  */
@@ -474,23 +483,27 @@ static unsigned int wrap_index(unsigned int index)
  * Find a suitable number of IO TLB entries size that will fit this request and
  * allocate a buffer from that IO TLB pool.
  */
-static int find_slots(struct device *dev, size_t alloc_size)
+static int find_slots(struct device *dev, phys_addr_t orig_addr, size_t alloc_size)
 {
 	unsigned long boundary_mask = dma_get_seg_boundary(dev);
 	dma_addr_t tbl_dma_addr = 
 		phys_to_dma_unencrypted(dev, io_tlb_start) & boundary_mask;
 	unsigned long max_slots = get_max_slots(boundary_mask);
-	unsigned int nslots = nr_slots(alloc_size), stride = 1;
+	unsigned int iotlb_align_mask =
+		dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
+	unsigned int nslots = nr_slots(alloc_size), stride;
 	unsigned int index, wrap, count = 0, i;
 	unsigned long flags;
 	BUG_ON(!nslots);
 
 	/*
-	 * For mappings greater than or equal to a page, we limit the stride
-	 * (and hence alignment) to a page size.
+	 * For mappings with an alignment requirement don't bother looping to
+	 * unaligned slots once we found an aligned one.  For allocations of
+	 * PAGE_SIZE or larger only look for page aligned allocations.
 	 */
+	stride = (iotlb_align_mask >> IO_TLB_SHIFT) + 1;
 	if (alloc_size >= PAGE_SIZE)
-		stride <<= (PAGE_SHIFT - IO_TLB_SHIFT);
+		stride = max(stride, stride << (PAGE_SHIFT - IO_TLB_SHIFT));
 
 	spin_lock_irqsave(&io_tlb_lock, flags);
 	if (unlikely(nslots > io_tlb_nslabs - io_tlb_used))
@@ -498,6 +511,12 @@ static int find_slots(struct device *dev, size_t alloc_size)
 
 	index = wrap = wrap_index(ALIGN(io_tlb_index, stride));
 	do {
+		if ((slot_addr(tbl_dma_addr, index) & iotlb_align_mask) !=
+		    (orig_addr & iotlb_align_mask)) {
+			index = wrap_index(index + 1);
+			continue;
+		}
+
 		/*
 		 * If we find a slot that indicates we have 'nslots' number of
 		 * contiguous buffers, we allocate the buffers from that slot
@@ -540,6 +559,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 		size_t mapping_size, size_t alloc_size,
 		enum dma_data_direction dir, unsigned long attrs)
 {
+	unsigned int offset = swiotlb_align_offset(dev, orig_addr);
 	unsigned int index, i;
 	phys_addr_t tlb_addr;
 
@@ -555,7 +575,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 		return (phys_addr_t)DMA_MAPPING_ERROR;
 	}
 
-	index = find_slots(dev, alloc_size);
+	index = find_slots(dev, orig_addr, alloc_size + offset);
 	if (index == -1) {
 		if (!(attrs & DMA_ATTR_NO_WARN))
 			dev_warn_ratelimited(dev,
@@ -568,10 +588,10 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 	 * This is needed when we sync the memory.  Then we sync the buffer if
 	 * needed.
 	 */
-	for (i = 0; i < nr_slots(alloc_size); i++)
+	for (i = 0; i < nr_slots(alloc_size + offset); i++)
 		io_tlb_orig_addr[index + i] = slot_addr(orig_addr, i);
 
-	tlb_addr = slot_addr(io_tlb_start, index);
+	tlb_addr = slot_addr(io_tlb_start, index) + offset;
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
 	    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
 		swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE);
@@ -586,8 +606,9 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
 			      enum dma_data_direction dir, unsigned long attrs)
 {
 	unsigned long flags;
-	int i, count, nslots = nr_slots(alloc_size);
-	int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
+	unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
+	int i, count, nslots = nr_slots(alloc_size + offset);
+	int index = (tlb_addr - offset - io_tlb_start) >> IO_TLB_SHIFT;
 	phys_addr_t orig_addr = io_tlb_orig_addr[index];
 
 	/*
-- 
2.27.0



* [PATCH v5.10 8/8] nvme-pci: set min_align_mask
  2021-04-05 21:02 [PATCH v5.10 0/8] preserve DMA offsets when using swiotlb Jianxiong Gao
                   ` (6 preceding siblings ...)
  2021-04-05 21:02 ` [PATCH v5.10 7/8] swiotlb: respect min_align_mask Jianxiong Gao
@ 2021-04-05 21:02 ` Jianxiong Gao
  2021-04-07 12:51 ` [PATCH v5.10 0/8] preserve DMA offsets when using swiotlb Greg KH
  8 siblings, 0 replies; 19+ messages in thread
From: Jianxiong Gao @ 2021-04-05 21:02 UTC (permalink / raw)
  To: stable; +Cc: Jianxiong Gao, Christoph Hellwig, Konrad Rzeszutek Wilk

From: Christoph Hellwig <hch@lst.de>

'commit 3d2d861eb03e ("nvme-pci: set min_align_mask")'

The PRP addressing scheme requires all PRP entries except for the
first one to have a zero offset into the NVMe controller pages (which
can be different from the Linux PAGE_SIZE).  Use the min_align_mask
device parameter to ensure that swiotlb does not change the address
of the buffer modulo the device page size, so that the PRPs won't be
malformed.
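
For reference, a stand-alone sketch of the PRP rule this relies on
(illustration only; the 4 KiB controller page size and the DMA address
are assumed values):

#include <stdio.h>

#define NVME_CTRL_PAGE_SIZE	4096u	/* assumed controller page size */

int main(void)
{
	/* A bounced buffer whose low bits were preserved by swiotlb. */
	unsigned long long dma_addr = 0x7f0000001a30ULL;
	unsigned long long prp = dma_addr;
	unsigned int len = 3 * NVME_CTRL_PAGE_SIZE;
	unsigned int done;

	/* Only the first PRP entry may carry an in-page offset; every later
	 * entry must point at a controller-page aligned address. */
	for (done = 0; done < len; ) {
		unsigned int chunk = NVME_CTRL_PAGE_SIZE -
			(unsigned int)(prp % NVME_CTRL_PAGE_SIZE);

		printf("PRP %#llx (page offset %#llx)\n",
		       prp, prp % NVME_CTRL_PAGE_SIZE);
		done += chunk;
		prp += chunk;		/* subsequent entries are aligned */
	}
	return 0;
}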

Signed-off-by: Jianxiong Gao <jxgao@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jianxiong Gao <jxgao@google.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/nvme/host/pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 3be352403839..dddccfe82abc 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2590,6 +2590,7 @@ static void nvme_reset_work(struct work_struct *work)
 	 * Don't limit the IOMMU merged segment size.
 	 */
 	dma_set_max_seg_size(dev->dev, 0xffffffff);
+	dma_set_min_align_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1);
 
 	mutex_unlock(&dev->shutdown_lock);
 
-- 
2.27.0



* Re: [PATCH v5.10 1/8] driver core: add a min_align_mask field to struct device_dma_parameters
  2021-04-05 21:02 ` [PATCH v5.10 1/8] driver core: add a min_align_mask field to struct device_dma_parameters Jianxiong Gao
@ 2021-04-07 12:50   ` Greg KH
  0 siblings, 0 replies; 19+ messages in thread
From: Greg KH @ 2021-04-07 12:50 UTC (permalink / raw)
  To: Jianxiong Gao; +Cc: stable, Christoph Hellwig

On Mon, Apr 05, 2021 at 09:02:23PM +0000, Jianxiong Gao wrote:
> 'commit 36950f2da1ea ("driver core: add a min_align_mask field to struct device_dma_parameters")'

Odd first line, can you at least try to use the normal format we use for
stable kernels?

> Some devices rely on the address offset in a page to function
> correctly (NVMe driver as an example). These devices may use
> a different page size than the Linux kernel. The address offset
> has to be preserved upon mapping, and in order to do so, we
> need to record the page_offset_mask first.
> 
> Signed-off-by: Jianxiong Gao <jxgao@google.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

What happened to all the other signed-off-by lines?

And yours should go last, right?

This series needs work :(

thanks,

greg k-h


* Re: [PATCH v5.10 0/8] preserve DMA offsets when using swiotlb
  2021-04-05 21:02 [PATCH v5.10 0/8] preserve DMA offsets when using swiotlb Jianxiong Gao
                   ` (7 preceding siblings ...)
  2021-04-05 21:02 ` [PATCH v5.10 8/8] nvme-pci: set min_align_mask Jianxiong Gao
@ 2021-04-07 12:51 ` Greg KH
  2021-04-20 23:38   ` Jianxiong Gao
  8 siblings, 1 reply; 19+ messages in thread
From: Greg KH @ 2021-04-07 12:51 UTC (permalink / raw)
  To: Jianxiong Gao; +Cc: stable

On Mon, Apr 05, 2021 at 09:02:22PM +0000, Jianxiong Gao wrote:
> Hi all,
> 
> This series of backports fixes the SWIOTLB library to maintain the
> page offset when mapping a DMA address. The bug that motivated this
> patch series manifested when running a 5.4 kernel as a SEV guest with
> an NVMe device. However, any device that infers information from the
> page offset and is accessed through the SWIOTLB will benefit from this
> bug fix.

But this is 5.10, not 5.4, why mention 5.4 here?

And you are backporting a 5.12-rc feature to 5.10, what happened to
5.11?

Why not just use 5.12 to get this new feature instead of using an older
kernel?  It's not like this has ever worked before, right?

thanks,

greg k-h


* Re: [PATCH v5.10 0/8] preserve DMA offsets when using swiotlb
  2021-04-07 12:51 ` [PATCH v5.10 0/8] preserve DMA offsets when using swiotlb Greg KH
@ 2021-04-20 23:38   ` Jianxiong Gao
  2021-04-23 15:14     ` Greg KH
  0 siblings, 1 reply; 19+ messages in thread
From: Jianxiong Gao @ 2021-04-20 23:38 UTC (permalink / raw)
  To: Greg KH; +Cc: stable

On Wed, Apr 7, 2021 at 5:51 AM Greg KH <greg@kroah.com> wrote:
>
> On Mon, Apr 05, 2021 at 09:02:22PM +0000, Jianxiong Gao wrote:
> > Hi all,
> >
> > This series of backports fixes the SWIOTLB library to maintain the
> > page offset when mapping a DMA address. The bug that motivated this
> > patch series manifested when running a 5.4 kernel as a SEV guest with
> > an NVMe device. However, any device that infers information from the
> > page offset and is accessed through the SWIOTLB will benefit from this
> > bug fix.
>
> But this is 5.10, not 5.4, why mention 5.4 here?
Oops. The cover letter shouldn't mention the kernel version. The bug
is present in both 5.4 and 5.10. Sorry for the confusion.
> And you are backporting a 5.12-rc feature to 5.10, what happened to
> 5.11?
No. The goal is to backport a bug fix to the LTS releases.
> Why not just use 5.12 to get this new feature instead of using an older
> kernel?  It's not like this has ever worked before, right?
>
It's true, that a new feature (SEV virtualization) is what motivated
the bug fix. However, I still think this makes sense to backport to
the LTS releases because it does fix a pre-existing bug that may be
impacting pre-existing setups.

In particular, while working on these patches, I got the following feedback:
"There are plenty of other hardware designs that rely on dma mapping
not adding offsets that did not exist, e.g. ahci and various RDMA
NICs."

[1] https://lkml.org/lkml/2020/11/24/520
> thanks,
>
> greg k-h



-- 
Jianxiong Gao


* Re: [PATCH v5.10 0/8] preserve DMA offsets when using swiotlb
  2021-04-20 23:38   ` Jianxiong Gao
@ 2021-04-23 15:14     ` Greg KH
  2021-04-23 17:28       ` Jianxiong Gao
  0 siblings, 1 reply; 19+ messages in thread
From: Greg KH @ 2021-04-23 15:14 UTC (permalink / raw)
  To: Jianxiong Gao; +Cc: stable

On Tue, Apr 20, 2021 at 04:38:13PM -0700, Jianxiong Gao wrote:
> On Wed, Apr 7, 2021 at 5:51 AM Greg KH <greg@kroah.com> wrote:
> >
> > On Mon, Apr 05, 2021 at 09:02:22PM +0000, Jianxiong Gao wrote:
> > > Hi all,
> > >
> > > This series of backports fixes the SWIOTLB library to maintain the
> > > page offset when mapping a DMA address. The bug that motivated this
> > > patch series manifested when running a 5.4 kernel as a SEV guest with
> > > an NVMe device. However, any device that infers information from the
> > > page offset and is accessed through the SWIOTLB will benefit from this
> > > bug fix.
> >
> > But this is 5.10, not 5.4, why mention 5.4 here?
> Oops. The cover letter shouldn't mention the kernel version. The bug
> is present in both 5.4 and 5.10. Sorry for the confusion.>
> > And you are backporting a 5.12-rc feature to 5.10, what happened to
> > 5.11?
> No. The goal is to backport a bug fix to the LTS releases.
> > Why not just use 5.12 to get this new feature instead of using an older
> > kernel?  It's not like this has ever worked before, right?
> >
> It's true, that a new feature (SEV virtualization) is what motivated
> the bug fix. However, I still think this makes sense to backport to
> the LTS releases because it does fix a pre-existing bug that may be
> impacting pre-existing setups.

How?  Anything that installed 5.10 when it was released never had this
working, they had to move to 5.12 to get that to work.

> In particular, while working on these patches, I got the following feedback:
> "There are plenty of other hardware designs that rely on dma mapping
> not adding offsets that did not exist, e.g. ahci and various RDMA
> NICs."

I do not understand that statement, how does that pertain to this patch
set?

confused,

greg k-h


* Re: [PATCH v5.10 0/8] preserve DMA offsets when using swiotlb
  2021-04-23 15:14     ` Greg KH
@ 2021-04-23 17:28       ` Jianxiong Gao
  2021-04-23 22:10         ` Jianxiong Gao
  2021-04-24 14:43         ` Greg KH
  0 siblings, 2 replies; 19+ messages in thread
From: Jianxiong Gao @ 2021-04-23 17:28 UTC (permalink / raw)
  To: Greg KH; +Cc: stable

> How?  Anything that installed 5.10 when it was released never had this
> working, they had to move to 5.12 to get that to work.

I wasn't clear. The bug is not specific to SEV virtualization. We
simply encountered it while working on SEV virtualization. This is a
pre-existing bug.

Briefly, the NVMe spec expects a page offset to be retained from the
memory address space to the IO address space.

Before these patches, the SWIOTLB truncates any page offset.

Thus, all NVMe + SWIOTLB systems are broken due to this bug without
these patches.

I searched online and found what appeared to be a very similar bug
from a few years ago [1]. Ultimately, it was fixed in the device
firmware. However, it began with NVMe + SWIOTLB resulting in similar
issues to what we observed without these patches.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1402533
-- 
Jianxiong Gao


* Re: [PATCH v5.10 0/8] preserve DMA offsets when using swiotlb
  2021-04-23 17:28       ` Jianxiong Gao
@ 2021-04-23 22:10         ` Jianxiong Gao
  2021-04-23 22:33           ` Marc Orr
  2021-04-24 14:43         ` Greg KH
  1 sibling, 1 reply; 19+ messages in thread
From: Jianxiong Gao @ 2021-04-23 22:10 UTC (permalink / raw)
  To: Greg KH; +Cc: stable, marcorr

+Marc, who can help fill in the gaps.
-- 
Jianxiong Gao


* Re: [PATCH v5.10 0/8] preserve DMA offsets when using swiotlb
  2021-04-23 22:10         ` Jianxiong Gao
@ 2021-04-23 22:33           ` Marc Orr
  0 siblings, 0 replies; 19+ messages in thread
From: Marc Orr @ 2021-04-23 22:33 UTC (permalink / raw)
  To: Jianxiong Gao; +Cc: Greg KH, stable

On Fri, Apr 23, 2021 at 3:11 PM Jianxiong Gao <jxgao@google.com> wrote:
>
> +Marc, who can help filling the gaps.
> --
> Jianxiong Gao

Oops. Gao over-trimmed. From lkml, here's Gao's last reply.

>> How?  Anything that installed 5.10 when it was released never had this
>> working, they had to move to 5.12 to get that to work.

> I wasn't clear. The bug is not specific to SEV virtualization. We
> simply encountered it while working on SEV virtualization. This is a
> pre-existing bug.
>
> Briefly, the NVMe spec expects a page offset to be retained from the
> memory address space to the IO address space.
>
> Before these patches, the SWIOTLB truncates any page offset.
>
> Thus, all NVMe + SWIOTLB systems are broken due to this bug without
> these patches.
>
> I searched online and found what appeared to be a very similar bug
> from a few years ago [1]. Ultimately, it was fixed in the device
> firmware. However, it began with NVMe + SWIOTLB resulting in similar
> issues to what we observed without these patches.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1402533

The bug is not specific to SEV virtualization.  We've repro'd the bug
on vanilla NVMe + SWIOTLB kernels, and confirmed that these patches
fix the issue. We simply first encountered it while working on SEV
virtualization.

The bug itself is that on an NVMe + SWIOTLB setup, `mkfs.xfs -f
/dev/nvme2n1` triggers the following error "mkfs.xfs: pwrite failed:
Input/output error". We observed this on a RHEL system.

An example system where a user might encounter this bug is the
following. On a system with NVMe + older 32-bit devices that has been
configured with the `swiotlb=force` kernel command line flag to ensure
that the 32-bit devices work properly.


* Re: [PATCH v5.10 0/8] preserve DMA offsets when using swiotlb
  2021-04-23 17:28       ` Jianxiong Gao
  2021-04-23 22:10         ` Jianxiong Gao
@ 2021-04-24 14:43         ` Greg KH
  2021-04-26 18:00           ` Jianxiong Gao
  1 sibling, 1 reply; 19+ messages in thread
From: Greg KH @ 2021-04-24 14:43 UTC (permalink / raw)
  To: Jianxiong Gao; +Cc: stable

On Fri, Apr 23, 2021 at 10:28:32AM -0700, Jianxiong Gao wrote:
> > How?  Anything that installed 5.10 when it was released never had this
> > working, they had to move to 5.12 to get that to work.
> 
> I wasn't clear. The bug is not specific to SEV virtualization. We
> simply encountered it while working on SEV virtualization. This is a
> pre-existing bug.
> 
> Briefly, the NVMe spec expects a page offset to be retained from the
> memory address space to the IO address space.
> 
> Before these patches, the SWIOTLB truncates any page offset.
> 
> Thus, all NVMe + SWIOTLB systems are broken due to this bug without
> these patches.

Ok, and what prevents you from adding this new feature to your "custom"
kernel, or to use 5.12 instead?

This is a new feature that Linux has never supported, and these patches
are not "trivial" at all.  I also do not see the maintainer of the
subsystem agreeing that these are needed to be backported, which is not
a good sign.

So I recommend just using a newer kernel version, that way all will be
good and no need to backport anything.  What is preventing you from
doing that today?

thanks,

greg k-h


* Re: [PATCH v5.10 0/8] preserve DMA offsets when using swiotlb
  2021-04-24 14:43         ` Greg KH
@ 2021-04-26 18:00           ` Jianxiong Gao
  2021-04-28 16:18             ` Sasha Levin
  0 siblings, 1 reply; 19+ messages in thread
From: Jianxiong Gao @ 2021-04-26 18:00 UTC (permalink / raw)
  To: Greg KH; +Cc: stable, marcorr

On Sat, Apr 24, 2021 at 7:43 AM Greg KH <greg@kroah.com> wrote:
> Ok, and what prevents you from adding this new feature do your "custom"
> kernel, or to use 5.12 instead?
>
> This is a new feature that Linux has never supported, and these patches
> are not "trivial" at all.  I also do not see the maintainer of the
> subsystem agreeing that these are needed to be backported, which is not
> a good sign.
>
So this is not about a new feature. This is about an existing bug that
we stumbled onto while using SEV virtualization. However, SEV is not
needed to trigger the bug. We have reproduced the bug with just the
NVMe + swiotlb=force combination in a RHEL 8 environment. Please note
that NVMe and swiotlb=force are both existing features, and without
these patches they don't work together. This is why we are proposing to
merge the patches into the LTS kernels.
> So I recommend just using a newer kernel version, that way all will be
> good and no need to backport anything.  What is preventing you from
> doing that today?
>
> thanks,
>
> greg k-h



-- 
Jianxiong Gao


* Re: [PATCH v5.10 0/8] preserve DMA offsets when using swiotlb
  2021-04-26 18:00           ` Jianxiong Gao
@ 2021-04-28 16:18             ` Sasha Levin
  0 siblings, 0 replies; 19+ messages in thread
From: Sasha Levin @ 2021-04-28 16:18 UTC (permalink / raw)
  To: Jianxiong Gao; +Cc: Greg KH, stable, marcorr

On Mon, Apr 26, 2021 at 11:00:56AM -0700, Jianxiong Gao wrote:
>On Sat, Apr 24, 2021 at 7:43 AM Greg KH <greg@kroah.com> wrote:
>> Ok, and what prevents you from adding this new feature do your "custom"
>> kernel, or to use 5.12 instead?
>>
>> This is a new feature that Linux has never supported, and these patches
>> are not "trivial" at all.  I also do not see the maintainer of the
>> subsystem agreeing that these are needed to be backported, which is not
>> a good sign.
>>
>So this is not about a new feature. This is about an existing bug that
>we stumbled onto while using SEV virtualization. However SEV is not
>needed to trigger the bug. We have reproduced the bug with just
>NVMe + SWIOTLB=force option in Rhel 8 environment. Please note
>that NVMe and SWIOTLB=force are both existing feature and without
>the patch they don't work together. This is why we are proposing to merge
>the patches into the LTS kernels.

Could you re-spin this series with the comments around patch formatting
addressed, and explicitly cc hch on this to get his ack?

-- 
Thanks,
Sasha

