* [PATCH 5.4 v2 0/9] preserve DMA offsets when using swiotlb
@ 2021-05-18 22:18 Jianxiong Gao
  2021-05-18 22:18 ` [PATCH 5.4 v2 1/9] driver core: add a min_align_mask field to struct device_dma_parameters Jianxiong Gao
                   ` (9 more replies)
  0 siblings, 10 replies; 17+ messages in thread
From: Jianxiong Gao @ 2021-05-18 22:18 UTC (permalink / raw)
  To: stable, hch, marcorr, sashal; +Cc: Jianxiong Gao

We observed several NVMe failures when running with SWIOTLB. The root
cause of the issue is that when data is mapped via SWIOTLB, the address
offset is not preserved. Several device drivers, including the NVMe
driver, rely on this offset to function correctly.

Even though we discovered the error when running with AMD SEV, we have
reproduced the same error on RHEL 8 without SEV. By adding the
swiotlb=force option to the kernel boot command line, NVMe functionality
is impacted; for example, formatting a disk with an XFS filesystem
returns an error.

----
Changes in v2:
Rebased patches to 5.4.115
Updated patch descriptions to the correct format.

Jianxiong Gao (9):
  driver core: add a min_align_mask field to struct
    device_dma_parameters
  swiotlb: add a IO_TLB_SIZE define
  swiotlb: factor out an io_tlb_offset helper
  swiotlb: factor out a nr_slots helper
  swiotlb: clean up swiotlb_tbl_unmap_single
  swiotlb: refactor swiotlb_tbl_map_single
  swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single
  swiotlb: respect min_align_mask
  nvme-pci: set min_align_mask

 drivers/nvme/host/pci.c     |   1 +
 include/linux/device.h      |   1 +
 include/linux/dma-mapping.h |  16 +++
 include/linux/swiotlb.h     |   1 +
 kernel/dma/swiotlb.c        | 269 ++++++++++++++++++++----------------
 5 files changed, 169 insertions(+), 119 deletions(-)

-- 
2.31.1.751.gd2f1c929bd-goog



* [PATCH 5.4 v2 1/9] driver core: add a min_align_mask field to struct device_dma_parameters
  2021-05-18 22:18 [PATCH 5.4 v2 0/9] preserve DMA offsets when using swiotlb Jianxiong Gao
@ 2021-05-18 22:18 ` Jianxiong Gao
  2021-05-18 22:18 ` [PATCH 5.4 v2 2/9] swiotlb: add a IO_TLB_SIZE define Jianxiong Gao
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Jianxiong Gao @ 2021-05-18 22:18 UTC (permalink / raw)
  To: stable, hch, marcorr, sashal
  Cc: Jianxiong Gao, Greg Kroah-Hartman, Konrad Rzeszutek Wilk

Some devices rely on the address offset in a page to function
correctly (NVMe driver as an example). These devices may use
a different page size than the Linux kernel. The address offset
has to be preserved upon mapping, and in order to do so, we
need to record this required alignment as min_align_mask first.

Signed-off-by: Jianxiong Gao <jxgao@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Upstream: 36950f2da1ea4cb683be174f6f581e25b2d33e71
Signed-off-by: Jianxiong Gao <jxgao@google.com>
---
 include/linux/device.h      |  1 +
 include/linux/dma-mapping.h | 16 ++++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index 297239a08bb7..2d30a6d7249e 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -993,6 +993,7 @@ struct device_dma_parameters {
 	 * sg limitations.
 	 */
 	unsigned int max_segment_size;
+	unsigned int min_align_mask;
 	unsigned long segment_boundary_mask;
 };
 
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 4d450672b7d6..953e1e3078f7 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -751,6 +751,22 @@ static inline int dma_set_seg_boundary(struct device *dev, unsigned long mask)
 	return -EIO;
 }
 
+static inline unsigned int dma_get_min_align_mask(struct device *dev)
+{
+	if (dev->dma_parms)
+		return dev->dma_parms->min_align_mask;
+	return 0;
+}
+
+static inline int dma_set_min_align_mask(struct device *dev,
+		unsigned int min_align_mask)
+{
+	if (WARN_ON_ONCE(!dev->dma_parms))
+		return -EIO;
+	dev->dma_parms->min_align_mask = min_align_mask;
+	return 0;
+}
+
 static inline int dma_get_cache_alignment(void)
 {
 #ifdef ARCH_DMA_MINALIGN
-- 
2.31.1.751.gd2f1c929bd-goog
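
For readers who want to see how the two helpers are meant to be used,
here is a minimal user-space model; the struct layouts mirror the hunks
above, while the 4 KiB device page size and the simplified error
handling are assumptions made for the example.

#include <stdio.h>

struct device_dma_parameters {
	unsigned int min_align_mask;
};

struct device {
	struct device_dma_parameters *dma_parms;
};

static unsigned int dma_get_min_align_mask(struct device *dev)
{
	if (dev->dma_parms)
		return dev->dma_parms->min_align_mask;
	return 0;	/* no constraint recorded */
}

static int dma_set_min_align_mask(struct device *dev, unsigned int mask)
{
	if (!dev->dma_parms)
		return -1;	/* the kernel version WARNs and returns -EIO */
	dev->dma_parms->min_align_mask = mask;
	return 0;
}

int main(void)
{
	struct device_dma_parameters parms = { 0 };
	struct device dev = { &parms };

	/* A driver whose device addresses data in 4 KiB pages would record: */
	dma_set_min_align_mask(&dev, 4096 - 1);
	printf("min_align_mask = 0x%x\n", dma_get_min_align_mask(&dev));
	return 0;
}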



* [PATCH 5.4 v2 2/9] swiotlb: add a IO_TLB_SIZE define
  2021-05-18 22:18 [PATCH 5.4 v2 0/9] preserve DMA offsets when using swiotlb Jianxiong Gao
  2021-05-18 22:18 ` [PATCH 5.4 v2 1/9] driver core: add a min_align_mask field to struct device_dma_parameters Jianxiong Gao
@ 2021-05-18 22:18 ` Jianxiong Gao
  2021-05-18 22:18 ` [PATCH 5.4 v2 3/9] swiotlb: factor out an io_tlb_offset helper Jianxiong Gao
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Jianxiong Gao @ 2021-05-18 22:18 UTC (permalink / raw)
  To: stable, hch, marcorr, sashal; +Cc: Jianxiong Gao, Konrad Rzeszutek Wilk

Add a new IO_TLB_SIZE define instead of open coding it using
IO_TLB_SHIFT all over.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jianxiong Gao <jxgao@google.com>
Tested-by: Jianxiong Gao <jxgao@google.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Upstream: b5d7ccb7aac3895c2138fe0980a109116ce15eff
Signed-off-by: Jianxiong Gao <jxgao@google.com>
---
 include/linux/swiotlb.h |  1 +
 kernel/dma/swiotlb.c    | 12 ++++++------
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 0a8fced6aaec..f7aadd297aa9 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -29,6 +29,7 @@ enum swiotlb_force {
  * controllable.
  */
 #define IO_TLB_SHIFT 11
+#define IO_TLB_SIZE (1 << IO_TLB_SHIFT)
 
 extern void swiotlb_init(int verbose);
 int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index f99b79d7e123..af4130059202 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -479,20 +479,20 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
 
 	tbl_dma_addr &= mask;
 
-	offset_slots = ALIGN(tbl_dma_addr, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
+	offset_slots = ALIGN(tbl_dma_addr, IO_TLB_SIZE) >> IO_TLB_SHIFT;
 
 	/*
 	 * Carefully handle integer overflow which can occur when mask == ~0UL.
 	 */
 	max_slots = mask + 1
-		    ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
+		    ? ALIGN(mask + 1, IO_TLB_SIZE) >> IO_TLB_SHIFT
 		    : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
 
 	/*
 	 * For mappings greater than or equal to a page, we limit the stride
 	 * (and hence alignment) to a page size.
 	 */
-	nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
+	nslots = ALIGN(alloc_size, IO_TLB_SIZE) >> IO_TLB_SHIFT;
 	if (alloc_size >= PAGE_SIZE)
 		stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT));
 	else
@@ -586,7 +586,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
 			      enum dma_data_direction dir, unsigned long attrs)
 {
 	unsigned long flags;
-	int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
+	int i, count, nslots = ALIGN(alloc_size, IO_TLB_SIZE) >> IO_TLB_SHIFT;
 	int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
 	phys_addr_t orig_addr = io_tlb_orig_addr[index];
 
@@ -637,7 +637,7 @@ void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
 
 	if (orig_addr == INVALID_PHYS_ADDR)
 		return;
-	orig_addr += (unsigned long)tlb_addr & ((1 << IO_TLB_SHIFT) - 1);
+	orig_addr += (unsigned long)tlb_addr & (IO_TLB_SIZE - 1);
 
 	switch (target) {
 	case SYNC_FOR_CPU:
@@ -693,7 +693,7 @@ bool swiotlb_map(struct device *dev, phys_addr_t *phys, dma_addr_t *dma_addr,
 
 size_t swiotlb_max_mapping_size(struct device *dev)
 {
-	return ((size_t)1 << IO_TLB_SHIFT) * IO_TLB_SEGSIZE;
+	return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE;
 }
 
 bool is_swiotlb_active(void)
-- 
2.31.1.751.gd2f1c929bd-goog
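
With IO_TLB_SHIFT being 11, the new define evaluates to 2048-byte
slots, and the ALIGN(size, IO_TLB_SIZE) >> IO_TLB_SHIFT pattern it
replaces is simply a round-up division by the slot size. A quick
stand-alone check (the sizes are arbitrary examples):

#include <stdio.h>

#define IO_TLB_SHIFT 11
#define IO_TLB_SIZE  (1UL << IO_TLB_SHIFT)	/* 2048 bytes per bounce slot */
#define ALIGN(x, a)  (((x) + (a) - 1) & ~((a) - 1))	/* power-of-two align, as in the kernel */

int main(void)
{
	unsigned long sizes[] = { 1, 2048, 2049, 4096, 65536 };
	unsigned int i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
		printf("mapping %6lu bytes needs %2lu slot(s)\n", sizes[i],
		       ALIGN(sizes[i], IO_TLB_SIZE) >> IO_TLB_SHIFT);
	return 0;
}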



* [PATCH 5.4 v2 3/9] swiotlb: factor out an io_tlb_offset helper
  2021-05-18 22:18 [PATCH 5.4 v2 0/9] preserve DMA offsets when using swiotlb Jianxiong Gao
  2021-05-18 22:18 ` [PATCH 5.4 v2 1/9] driver core: add a min_align_mask field to struct device_dma_parameters Jianxiong Gao
  2021-05-18 22:18 ` [PATCH 5.4 v2 2/9] swiotlb: add a IO_TLB_SIZE define Jianxiong Gao
@ 2021-05-18 22:18 ` Jianxiong Gao
  2021-05-18 22:18 ` [PATCH 5.4 v2 4/9] swiotlb: factor out a nr_slots helper Jianxiong Gao
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Jianxiong Gao @ 2021-05-18 22:18 UTC (permalink / raw)
  To: stable, hch, marcorr, sashal; +Cc: Jianxiong Gao, Konrad Rzeszutek Wilk

Replace the very generically named OFFSET macro with a little inline
helper that hardcodes the alignment to the only value ever passed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jianxiong Gao <jxgao@google.com>
Tested-by: Jianxiong Gao <jxgao@google.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Upstream: c7fbeca757fe74135d8b6a4c8ddaef76f5775d68
Signed-off-by: Jianxiong Gao <jxgao@google.com>
---
 kernel/dma/swiotlb.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index af4130059202..db265dc324b9 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -49,9 +49,6 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/swiotlb.h>
 
-#define OFFSET(val,align) ((unsigned long)	\
-	                   ( (val) & ( (align) - 1)))
-
 #define SLABS_PER_PAGE (1 << (PAGE_SHIFT - IO_TLB_SHIFT))
 
 /*
@@ -177,6 +174,11 @@ void swiotlb_print_info(void)
 	       bytes >> 20);
 }
 
+static inline unsigned long io_tlb_offset(unsigned long val)
+{
+	return val & (IO_TLB_SEGSIZE - 1);
+}
+
 /*
  * Early SWIOTLB allocation may be too early to allow an architecture to
  * perform the desired operations.  This function allows the architecture to
@@ -226,7 +228,7 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 		      __func__, alloc_size, PAGE_SIZE);
 
 	for (i = 0; i < io_tlb_nslabs; i++) {
-		io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
+		io_tlb_list[i] = IO_TLB_SEGSIZE - io_tlb_offset(i);
 		io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
 	}
 	io_tlb_index = 0;
@@ -360,7 +362,7 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 		goto cleanup4;
 
 	for (i = 0; i < io_tlb_nslabs; i++) {
-		io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
+		io_tlb_list[i] = IO_TLB_SEGSIZE - io_tlb_offset(i);
 		io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
 	}
 	io_tlb_index = 0;
@@ -534,7 +536,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
 
 			for (i = index; i < (int) (index + nslots); i++)
 				io_tlb_list[i] = 0;
-			for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE - 1) && io_tlb_list[i]; i--)
+			for (i = index - 1;
+			     io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
+			     io_tlb_list[i]; i--)
 				io_tlb_list[i] = ++count;
 			tlb_addr = io_tlb_start + (index << IO_TLB_SHIFT);
 
@@ -620,7 +624,9 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
 		 * Step 2: merge the returned slots with the preceding slots,
 		 * if available (non zero)
 		 */
-		for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE -1) && io_tlb_list[i]; i--)
+		for (i = index - 1;
+		     io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
+		     io_tlb_list[i]; i--)
 			io_tlb_list[i] = ++count;
 
 		io_tlb_used -= nslots;
-- 
2.31.1.751.gd2f1c929bd-goog
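
IO_TLB_SEGSIZE is 128 in this kernel, so io_tlb_offset() is just the
slot index modulo the segment size, exactly what OFFSET(i,
IO_TLB_SEGSIZE) used to compute. A stand-alone check of the free-count
initialisation it feeds (slot indices are arbitrary examples):

#include <stdio.h>

#define IO_TLB_SEGSIZE 128	/* slots per segment (include/linux/swiotlb.h) */

static unsigned long io_tlb_offset(unsigned long val)
{
	return val & (IO_TLB_SEGSIZE - 1);
}

int main(void)
{
	/* io_tlb_list[i] starts as "free slots from i to the end of its segment" */
	unsigned long idx[] = { 0, 1, 127, 128, 200 };
	unsigned int i;

	for (i = 0; i < sizeof(idx) / sizeof(idx[0]); i++)
		printf("slot %3lu: offset in segment %3lu, initial free count %3lu\n",
		       idx[i], io_tlb_offset(idx[i]),
		       IO_TLB_SEGSIZE - io_tlb_offset(idx[i]));
	return 0;
}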



* [PATCH 5.4 v2 4/9] swiotlb: factor out a nr_slots helper
  2021-05-18 22:18 [PATCH 5.4 v2 0/9] preserve DMA offsets when using swiotlb Jianxiong Gao
                   ` (2 preceding siblings ...)
  2021-05-18 22:18 ` [PATCH 5.4 v2 3/9] swiotlb: factor out an io_tlb_offset helper Jianxiong Gao
@ 2021-05-18 22:18 ` Jianxiong Gao
  2021-05-18 22:18 ` [PATCH 5.4 v2 5/9] swiotlb: clean up swiotlb_tbl_unmap_single Jianxiong Gao
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Jianxiong Gao @ 2021-05-18 22:18 UTC (permalink / raw)
  To: stable, hch, marcorr, sashal; +Cc: Jianxiong Gao, Konrad Rzeszutek Wilk

Factor out a helper to find the number of slots for a given size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jianxiong Gao <jxgao@google.com>
Tested-by: Jianxiong Gao <jxgao@google.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Upstream: c32a77fd18780a5192dfb6eec69f239faebf28fd
Signed-off-by: Jianxiong Gao <jxgao@google.com>
---
 kernel/dma/swiotlb.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index db265dc324b9..b57e0741ce2f 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -179,6 +179,11 @@ static inline unsigned long io_tlb_offset(unsigned long val)
 	return val & (IO_TLB_SEGSIZE - 1);
 }
 
+static inline unsigned long nr_slots(u64 val)
+{
+	return DIV_ROUND_UP(val, IO_TLB_SIZE);
+}
+
 /*
  * Early SWIOTLB allocation may be too early to allow an architecture to
  * perform the desired operations.  This function allows the architecture to
@@ -481,20 +486,20 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
 
 	tbl_dma_addr &= mask;
 
-	offset_slots = ALIGN(tbl_dma_addr, IO_TLB_SIZE) >> IO_TLB_SHIFT;
+	offset_slots = nr_slots(tbl_dma_addr);
 
 	/*
 	 * Carefully handle integer overflow which can occur when mask == ~0UL.
 	 */
 	max_slots = mask + 1
-		    ? ALIGN(mask + 1, IO_TLB_SIZE) >> IO_TLB_SHIFT
+		    ? nr_slots(mask + 1)
 		    : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
 
 	/*
 	 * For mappings greater than or equal to a page, we limit the stride
 	 * (and hence alignment) to a page size.
 	 */
-	nslots = ALIGN(alloc_size, IO_TLB_SIZE) >> IO_TLB_SHIFT;
+	nslots = nr_slots(alloc_size);
 	if (alloc_size >= PAGE_SIZE)
 		stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT));
 	else
@@ -590,7 +595,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
 			      enum dma_data_direction dir, unsigned long attrs)
 {
 	unsigned long flags;
-	int i, count, nslots = ALIGN(alloc_size, IO_TLB_SIZE) >> IO_TLB_SHIFT;
+	int i, count, nslots = nr_slots(alloc_size);
 	int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
 	phys_addr_t orig_addr = io_tlb_orig_addr[index];
 
-- 
2.31.1.751.gd2f1c929bd-goog
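
The DIV_ROUND_UP form is equivalent to the ALIGN(val, IO_TLB_SIZE) >>
IO_TLB_SHIFT expression it replaces; a stand-alone check over a range
of sizes (purely illustrative):

#include <stdio.h>

#define IO_TLB_SHIFT 11
#define IO_TLB_SIZE  (1UL << IO_TLB_SHIFT)
#define ALIGN(x, a)        (((x) + (a) - 1) & ~((a) - 1))
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

static unsigned long nr_slots(unsigned long val)
{
	return DIV_ROUND_UP(val, IO_TLB_SIZE);
}

int main(void)
{
	unsigned long val, mismatches = 0;

	for (val = 0; val < 16 * IO_TLB_SIZE; val++)
		if (nr_slots(val) != (ALIGN(val, IO_TLB_SIZE) >> IO_TLB_SHIFT))
			mismatches++;
	printf("%lu mismatches against the open-coded expression\n", mismatches);
	return 0;
}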



* [PATCH 5.4 v2 5/9] swiotlb: clean up swiotlb_tbl_unmap_single
  2021-05-18 22:18 [PATCH 5.4 v2 0/9] preserve DMA offsets when using swiotlb Jianxiong Gao
                   ` (3 preceding siblings ...)
  2021-05-18 22:18 ` [PATCH 5.4 v2 4/9] swiotlb: factor out a nr_slots helper Jianxiong Gao
@ 2021-05-18 22:18 ` Jianxiong Gao
  2021-05-18 22:18 ` [PATCH 5.4 v2 6/9] swiotlb: refactor swiotlb_tbl_map_single Jianxiong Gao
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Jianxiong Gao @ 2021-05-18 22:18 UTC (permalink / raw)
  To: stable, hch, marcorr, sashal; +Cc: Jianxiong Gao, Konrad Rzeszutek Wilk

Remove a layer of pointless indentation, replace a hard to follow
ternary expression with a plain if/else.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jianxiong Gao <jxgao@google.com>
Tested-by: Jianxiong Gao <jxgao@google.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Upstream: ca10d0f8e530600ec63c603dbace2c30927d70b7
Signed-off-by: Jianxiong Gao <jxgao@google.com>
---
 kernel/dma/swiotlb.c | 41 +++++++++++++++++++++--------------------
 1 file changed, 21 insertions(+), 20 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index b57e0741ce2f..af22c3c5e488 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -614,28 +614,29 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
 	 * with slots below and above the pool being returned.
 	 */
 	spin_lock_irqsave(&io_tlb_lock, flags);
-	{
-		count = ((index + nslots) < ALIGN(index + 1, IO_TLB_SEGSIZE) ?
-			 io_tlb_list[index + nslots] : 0);
-		/*
-		 * Step 1: return the slots to the free list, merging the
-		 * slots with superceeding slots
-		 */
-		for (i = index + nslots - 1; i >= index; i--) {
-			io_tlb_list[i] = ++count;
-			io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
-		}
-		/*
-		 * Step 2: merge the returned slots with the preceding slots,
-		 * if available (non zero)
-		 */
-		for (i = index - 1;
-		     io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
-		     io_tlb_list[i]; i--)
-			io_tlb_list[i] = ++count;
+	if (index + nslots < ALIGN(index + 1, IO_TLB_SEGSIZE))
+		count = io_tlb_list[index + nslots];
+	else
+		count = 0;
 
-		io_tlb_used -= nslots;
+	/*
+	 * Step 1: return the slots to the free list, merging the slots with
+	 * superceeding slots
+	 */
+	for (i = index + nslots - 1; i >= index; i--) {
+		io_tlb_list[i] = ++count;
+		io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
 	}
+
+	/*
+	 * Step 2: merge the returned slots with the preceding slots, if
+	 * available (non zero)
+	 */
+	for (i = index - 1;
+	     io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 && io_tlb_list[i];
+	     i--)
+		io_tlb_list[i] = ++count;
+	io_tlb_used -= nslots;
 	spin_unlock_irqrestore(&io_tlb_lock, flags);
 }
 
-- 
2.31.1.751.gd2f1c929bd-goog
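
For readers new to the io_tlb_list encoding that the two steps above
maintain: for a free slot i, io_tlb_list[i] holds the number of
contiguous free slots starting at i, never counted past a segment
boundary. A scaled-down, stand-alone simulation of the free path (the
segment size is reduced to 8 for readable output; everything else
mirrors the logic in this hunk):

#include <stdio.h>

#define IO_TLB_SEGSIZE	8	/* scaled down from 128 to keep the output short */
#define NSLABS		16

#define ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))

static unsigned int io_tlb_list[NSLABS];

static unsigned int io_tlb_offset(unsigned int val)
{
	return val & (IO_TLB_SEGSIZE - 1);
}

static void dump(const char *what)
{
	int i;

	printf("%-18s", what);
	for (i = 0; i < NSLABS; i++)
		printf(" %u", io_tlb_list[i]);
	printf("\n");
}

/* Mirrors the two steps of swiotlb_tbl_unmap_single() shown above. */
static void free_slots(int index, int nslots)
{
	int i, count;

	if (index + nslots < ALIGN(index + 1, IO_TLB_SEGSIZE))
		count = io_tlb_list[index + nslots];
	else
		count = 0;

	/* Step 1: return the slots, merging with the following free run */
	for (i = index + nslots - 1; i >= index; i--)
		io_tlb_list[i] = ++count;

	/* Step 2: extend the run into the preceding free slots */
	for (i = index - 1;
	     io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 && io_tlb_list[i]; i--)
		io_tlb_list[i] = ++count;
}

int main(void)
{
	int i;

	for (i = 0; i < NSLABS; i++)	/* everything starts out free */
		io_tlb_list[i] = IO_TLB_SEGSIZE - io_tlb_offset(i);
	dump("initial:");

	for (i = 2; i <= 5; i++)	/* pretend slots 2..5 were allocated */
		io_tlb_list[i] = 0;
	io_tlb_list[1] = 1;		/* free run before the allocation recounted */
	io_tlb_list[0] = 2;
	dump("after alloc 2..5:");

	free_slots(2, 4);
	dump("after free 2..5:");	/* back to the initial state */
	return 0;
}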



* [PATCH 5.4 v2 6/9] swiotlb: refactor swiotlb_tbl_map_single
  2021-05-18 22:18 [PATCH 5.4 v2 0/9] preserve DMA offsets when using swiotlb Jianxiong Gao
                   ` (4 preceding siblings ...)
  2021-05-18 22:18 ` [PATCH 5.4 v2 5/9] swiotlb: clean up swiotlb_tbl_unmap_single Jianxiong Gao
@ 2021-05-18 22:18 ` Jianxiong Gao
  2021-05-18 22:18 ` [PATCH 5.4 v2 7/9] swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single Jianxiong Gao
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Jianxiong Gao @ 2021-05-18 22:18 UTC (permalink / raw)
  To: stable, hch, marcorr, sashal; +Cc: Jianxiong Gao, Konrad Rzeszutek Wilk

Split out a bunch of self-contained helpers to make the function easier
to follow.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jianxiong Gao <jxgao@google.com>
Tested-by: Jianxiong Gao <jxgao@google.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Upstream: 26a7e094783d482f3e125f09945a5bb1d867b2e6
Signed-off-by: Jianxiong Gao <jxgao@google.com>
---
 kernel/dma/swiotlb.c | 184 +++++++++++++++++++++----------------------
 1 file changed, 91 insertions(+), 93 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index af22c3c5e488..d71f05a33aa4 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -453,133 +453,132 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
 	}
 }
 
-phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
-				   dma_addr_t tbl_dma_addr,
-				   phys_addr_t orig_addr,
-				   size_t mapping_size,
-				   size_t alloc_size,
-				   enum dma_data_direction dir,
-				   unsigned long attrs)
+#define slot_addr(start, idx)  ((start) + ((idx) << IO_TLB_SHIFT))
+/*
+ * Carefully handle integer overflow which can occur when boundary_mask == ~0UL.
+ */
+static inline unsigned long get_max_slots(unsigned long boundary_mask)
 {
-	unsigned long flags;
-	phys_addr_t tlb_addr;
-	unsigned int nslots, stride, index, wrap;
-	int i;
-	unsigned long mask;
-	unsigned long offset_slots;
-	unsigned long max_slots;
-	unsigned long tmp_io_tlb_used;
-
-	if (no_iotlb_memory)
-		panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
-
-	if (mem_encrypt_active())
-		pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
-
-	if (mapping_size > alloc_size) {
-		dev_warn_once(hwdev, "Invalid sizes (mapping: %zd bytes, alloc: %zd bytes)",
-			      mapping_size, alloc_size);
-		return (phys_addr_t)DMA_MAPPING_ERROR;
-	}
-
-	mask = dma_get_seg_boundary(hwdev);
+	if (boundary_mask == ~0UL)
+		return 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
+	return nr_slots(boundary_mask + 1);
+}
 
-	tbl_dma_addr &= mask;
+static unsigned int wrap_index(unsigned int index)
+{
+	if (index >= io_tlb_nslabs)
+		return 0;
+	return index;
+}
 
-	offset_slots = nr_slots(tbl_dma_addr);
+/*
+ * Find a suitable number of IO TLB entries size that will fit this request and
+ * allocate a buffer from that IO TLB pool.
+ */
+static int find_slots(struct device *dev, size_t alloc_size)
+{
+	unsigned long boundary_mask = dma_get_seg_boundary(dev);
+	dma_addr_t tbl_dma_addr =
+		__phys_to_dma(dev, io_tlb_start) & boundary_mask;
+	unsigned long max_slots = get_max_slots(boundary_mask);
+	unsigned int nslots = nr_slots(alloc_size), stride = 1;
+	unsigned int index, wrap, count = 0, i;
+	unsigned long flags;
 
-	/*
-	 * Carefully handle integer overflow which can occur when mask == ~0UL.
-	 */
-	max_slots = mask + 1
-		    ? nr_slots(mask + 1)
-		    : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
+	BUG_ON(!nslots);
 
 	/*
 	 * For mappings greater than or equal to a page, we limit the stride
 	 * (and hence alignment) to a page size.
 	 */
-	nslots = nr_slots(alloc_size);
 	if (alloc_size >= PAGE_SIZE)
-		stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT));
-	else
-		stride = 1;
+	       stride <<= (PAGE_SHIFT - IO_TLB_SHIFT);
 
-	BUG_ON(!nslots);
-
-	/*
-	 * Find suitable number of IO TLB entries size that will fit this
-	 * request and allocate a buffer from that IO TLB pool.
-	 */
 	spin_lock_irqsave(&io_tlb_lock, flags);
-
 	if (unlikely(nslots > io_tlb_nslabs - io_tlb_used))
 		goto not_found;
 
-	index = ALIGN(io_tlb_index, stride);
-	if (index >= io_tlb_nslabs)
-		index = 0;
-	wrap = index;
-
+	index = wrap = wrap_index(ALIGN(io_tlb_index, stride));
 	do {
-		while (iommu_is_span_boundary(index, nslots, offset_slots,
-					      max_slots)) {
-			index += stride;
-			if (index >= io_tlb_nslabs)
-				index = 0;
-			if (index == wrap)
-				goto not_found;
-		}
-
 		/*
 		 * If we find a slot that indicates we have 'nslots' number of
 		 * contiguous buffers, we allocate the buffers from that slot
 		 * and mark the entries as '0' indicating unavailable.
 		 */
-		if (io_tlb_list[index] >= nslots) {
-			int count = 0;
-
-			for (i = index; i < (int) (index + nslots); i++)
-				io_tlb_list[i] = 0;
-			for (i = index - 1;
-			     io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
-			     io_tlb_list[i]; i--)
-				io_tlb_list[i] = ++count;
-			tlb_addr = io_tlb_start + (index << IO_TLB_SHIFT);
-
-			/*
-			 * Update the indices to avoid searching in the next
-			 * round.
-			 */
-			io_tlb_index = ((index + nslots) < io_tlb_nslabs
-					? (index + nslots) : 0);
-
-			goto found;
+		if (!iommu_is_span_boundary(index, nslots,
+					   nr_slots(tbl_dma_addr),
+					   max_slots)) {
+			if (io_tlb_list[index] >= nslots)
+				goto found;
 		}
-		index += stride;
-		if (index >= io_tlb_nslabs)
-			index = 0;
+		index = wrap_index(index + stride);
 	} while (index != wrap);
 
 not_found:
-	tmp_io_tlb_used = io_tlb_used;
-
 	spin_unlock_irqrestore(&io_tlb_lock, flags);
-	if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
-		dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes), total %lu (slots), used %lu (slots)\n",
-			 alloc_size, io_tlb_nslabs, tmp_io_tlb_used);
-	return (phys_addr_t)DMA_MAPPING_ERROR;
+	return -1;
+
 found:
-	io_tlb_used += nslots;
+	for (i = index; i < index + nslots; i++)
+		io_tlb_list[i] = 0;
+	for (i = index - 1;
+	     io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
+	     io_tlb_list[i]; i--)
+		io_tlb_list[i] = ++count;
+
+	/*
+	 * Update the indices to avoid searching in the next round.
+	 */
+	if (index + nslots < io_tlb_nslabs)
+		io_tlb_index = index + nslots;
+	else
+		io_tlb_index = 0;
+
+	io_tlb_used += nslots;
+
 	spin_unlock_irqrestore(&io_tlb_lock, flags);
+	return index;
+}
+
+phys_addr_t swiotlb_tbl_map_single(struct device *dev, dma_addr_t dma_addr,
+				   phys_addr_t orig_addr, size_t mapping_size,
+				   size_t alloc_size,
+				   enum dma_data_direction dir,
+				   unsigned long attrs)
+{
+	unsigned int index, i;
+	phys_addr_t tlb_addr;
+
+	if (no_iotlb_memory)
+		panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
+
+	if (mem_encrypt_active())
+		pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
+
+	if (mapping_size > alloc_size) {
+		dev_warn_once(dev, "Invalid sizes (mapping: %zd bytes, alloc: %zd bytes)",
+			      mapping_size, alloc_size);
+		return (phys_addr_t)DMA_MAPPING_ERROR;
+	}
+
+	index = find_slots(dev, alloc_size);
+	if (index == -1) {
+		if (!(attrs & DMA_ATTR_NO_WARN))
+			dev_warn_ratelimited(dev,
+	"swiotlb buffer is full (sz: %zd bytes), total %lu (slots), used %lu (slots)\n",
+				 alloc_size, io_tlb_nslabs, io_tlb_used);
+		return (phys_addr_t)DMA_MAPPING_ERROR;
+	}
 
 	/*
 	 * Save away the mapping from the original address to the DMA address.
 	 * This is needed when we sync the memory.  Then we sync the buffer if
 	 * needed.
 	 */
-	for (i = 0; i < nslots; i++)
-		io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
+	for (i = 0; i < nr_slots(alloc_size); i++)
+		io_tlb_orig_addr[index + i] = slot_addr(orig_addr, i);
+
+	tlb_addr = slot_addr(io_tlb_start, index);
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
 	    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
 		swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE);
-- 
2.31.1.751.gd2f1c929bd-goog
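
The pure helpers split out above are easy to sanity-check in isolation.
A stand-alone sketch follows; BITS_PER_LONG is assumed to be 64 and the
pool size is set to the default 64 MB, i.e. 32768 slots, both of which
are assumptions made for the example.

#include <stdio.h>

#define IO_TLB_SHIFT	11
#define IO_TLB_SIZE	(1UL << IO_TLB_SHIFT)
#define BITS_PER_LONG	64
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

static unsigned long io_tlb_nslabs = 32768;	/* default 64 MB pool */

#define slot_addr(start, idx) ((start) + ((unsigned long)(idx) << IO_TLB_SHIFT))

static unsigned long nr_slots(unsigned long val)
{
	return DIV_ROUND_UP(val, IO_TLB_SIZE);
}

/* Carefully handle integer overflow which can occur when boundary_mask == ~0UL. */
static unsigned long get_max_slots(unsigned long boundary_mask)
{
	if (boundary_mask == ~0UL)
		return 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
	return nr_slots(boundary_mask + 1);
}

static unsigned int wrap_index(unsigned int index)
{
	if (index >= io_tlb_nslabs)
		return 0;
	return index;
}

int main(void)
{
	printf("get_max_slots(0xffffffff) = %lu\n", get_max_slots(0xffffffffUL));
	printf("get_max_slots(~0UL)       = %lu (no overflow)\n", get_max_slots(~0UL));
	printf("wrap_index(32767)         = %u\n", wrap_index(32767));
	printf("wrap_index(32768)         = %u (wraps to the start)\n", wrap_index(32768));
	printf("slot_addr(0x1000000, 3)   = 0x%lx\n", slot_addr(0x1000000UL, 3));
	return 0;
}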



* [PATCH 5.4 v2 7/9] swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single
  2021-05-18 22:18 [PATCH 5.4 v2 0/9] preserve DMA offsets when using swiotlb Jianxiong Gao
                   ` (5 preceding siblings ...)
  2021-05-18 22:18 ` [PATCH 5.4 v2 6/9] swiotlb: refactor swiotlb_tbl_map_single Jianxiong Gao
@ 2021-05-18 22:18 ` Jianxiong Gao
  2021-05-18 22:18 ` [PATCH 5.4 v2 8/9] swiotlb: respect min_align_mask Jianxiong Gao
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Jianxiong Gao @ 2021-05-18 22:18 UTC (permalink / raw)
  To: stable, hch, marcorr, sashal; +Cc: Jianxiong Gao, Konrad Rzeszutek Wilk

swiotlb_tbl_map_single currently never sets a tlb_addr that is not
aligned to the tlb bucket size.  But we're going to add such a case
soon, for which this adjustment would be bogus.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jianxiong Gao <jxgao@google.com>
Tested-by: Jianxiong Gao <jxgao@google.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Upstream: 16fc3cef33a04632ab6b31758abdd77563a20759
Signed-off-by: Jianxiong Gao <jxgao@google.com>
---
 kernel/dma/swiotlb.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index d71f05a33aa4..f4e18ae33507 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -647,7 +647,6 @@ void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
 
 	if (orig_addr == INVALID_PHYS_ADDR)
 		return;
-	orig_addr += (unsigned long)tlb_addr & (IO_TLB_SIZE - 1);
 
 	switch (target) {
 	case SYNC_FOR_CPU:
-- 
2.31.1.751.gd2f1c929bd-goog
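
As background, an illustration with invented addresses: once the next
patch lets swiotlb_tbl_map_single() return a tlb_addr with a non-zero
offset inside its first slot, io_tlb_orig_addr[] already records the
matching unaligned original address, so re-adding
tlb_addr & (IO_TLB_SIZE - 1) in the sync path would count that offset
twice.

#include <stdio.h>
#include <stdint.h>

#define IO_TLB_SHIFT 11
#define IO_TLB_SIZE  (1ULL << IO_TLB_SHIFT)

int main(void)
{
	uint64_t orig_addr    = 0x12345200ULL;	/* data starts 0x200 into a slot */
	uint64_t io_tlb_start = 0x80000000ULL;
	uint64_t tlb_addr     = io_tlb_start + 0x200; /* offset-preserving mapping */

	/* io_tlb_orig_addr[index] holds the unaligned original address */
	uint64_t stored = orig_addr;

	uint64_t old_sync = stored + (tlb_addr & (IO_TLB_SIZE - 1));	/* removed line */
	uint64_t new_sync = stored;					/* after this patch */

	printf("data lives at          0x%llx\n", (unsigned long long)orig_addr);
	printf("old adjustment syncs   0x%llx (0x200 past the data)\n",
	       (unsigned long long)old_sync);
	printf("without the adjustment 0x%llx\n", (unsigned long long)new_sync);
	return 0;
}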



* [PATCH 5.4 v2 8/9] swiotlb: respect min_align_mask
  2021-05-18 22:18 [PATCH 5.4 v2 0/9] preserve DMA offsets when using swiotlb Jianxiong Gao
                   ` (6 preceding siblings ...)
  2021-05-18 22:18 ` [PATCH 5.4 v2 7/9] swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single Jianxiong Gao
@ 2021-05-18 22:18 ` Jianxiong Gao
  2021-05-18 22:18 ` [PATCH 5.4 v2 9/9] nvme-pci: set min_align_mask Jianxiong Gao
  2021-05-19  8:11 ` [PATCH 5.4 v2 0/9] preserve DMA offsets when using swiotlb Greg KH
  9 siblings, 0 replies; 17+ messages in thread
From: Jianxiong Gao @ 2021-05-18 22:18 UTC (permalink / raw)
  To: stable, hch, marcorr, sashal; +Cc: Jianxiong Gao, Konrad Rzeszutek Wilk


Respect the min_align_mask in struct device_dma_parameters in swiotlb.

There are two parts to it:
 1) for the lower bits of the alignment inside the io tlb slot, just
    extend the size of the allocation and leave the start of the slot
    empty
 2) for the high bits, ensure we find a slot that matches the high bits
    of the alignment to avoid wasting too much memory

(A worked example of the resulting offset and stride math is appended
after this patch.)

Based on an earlier patch from Jianxiong Gao <jxgao@google.com>.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jianxiong Gao <jxgao@google.com>
Tested-by: Jianxiong Gao <jxgao@google.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Upstream: 1f221a0d0dbf0e48ef3a9c62871281d6a7819f05
Signed-off-by: Jianxiong Gao <jxgao@google.com>
---
 kernel/dma/swiotlb.c | 42 ++++++++++++++++++++++++++++++++----------
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index f4e18ae33507..743bf7e36385 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -454,6 +454,15 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
 }
 
 #define slot_addr(start, idx)  ((start) + ((idx) << IO_TLB_SHIFT))
+
+/*
+ * Return the offset into a iotlb slot required to keep the device happy.
+ */
+static unsigned int swiotlb_align_offset(struct device *dev, u64 addr)
+{
+	return addr & dma_get_min_align_mask(dev) & (IO_TLB_SIZE - 1);
+}
+
 /*
  * Carefully handle integer overflow which can occur when boundary_mask == ~0UL.
  */
@@ -475,24 +484,29 @@ static unsigned int wrap_index(unsigned int index)
  * Find a suitable number of IO TLB entries size that will fit this request and
  * allocate a buffer from that IO TLB pool.
  */
-static int find_slots(struct device *dev, size_t alloc_size)
+static int find_slots(struct device *dev, phys_addr_t orig_addr,
+		      size_t alloc_size)
 {
         unsigned long boundary_mask = dma_get_seg_boundary(dev);
         dma_addr_t tbl_dma_addr =
                 __phys_to_dma(dev, io_tlb_start) & boundary_mask;
         unsigned long max_slots = get_max_slots(boundary_mask);
-        unsigned int nslots = nr_slots(alloc_size), stride = 1;
+	unsigned int iotlb_align_mask =
+	    dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
+	unsigned int nslots = nr_slots(alloc_size), stride;
         unsigned int index, wrap, count = 0, i;
         unsigned long flags;
 
         BUG_ON(!nslots);
 
 	/*
-	 * For mappings greater than or equal to a page, we limit the stride
-	 * (and hence alignment) to a page size.
+	 * For mappings with an alignment requirement don't bother looping to
+	 * unaligned slots once we found an aligned one.  For allocations of
+	 * PAGE_SIZE or larger only look for page aligned allocations.
 	 */
+	stride = (iotlb_align_mask >> IO_TLB_SHIFT) + 1;
 	if (alloc_size >= PAGE_SIZE)
-               stride <<= (PAGE_SHIFT - IO_TLB_SHIFT);
+		stride = max(stride, stride << (PAGE_SHIFT - IO_TLB_SHIFT));
 
 	spin_lock_irqsave(&io_tlb_lock, flags);
 	if (unlikely(nslots > io_tlb_nslabs - io_tlb_used))
@@ -500,6 +514,12 @@ static int find_slots(struct device *dev, size_t alloc_size)
 
         index = wrap = wrap_index(ALIGN(io_tlb_index, stride));
 	do {
+		if ((slot_addr(tbl_dma_addr, index) & iotlb_align_mask) !=
+		    (orig_addr & iotlb_align_mask)) {
+			index = wrap_index(index + 1);
+			continue;
+		}
+
 		/*
 		 * If we find a slot that indicates we have 'nslots' number of
 		 * contiguous buffers, we allocate the buffers from that slot
@@ -545,6 +565,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, dma_addr_t dma_addr,
                                    size_t alloc_size, enum dma_data_direction dir,
                                    unsigned long attrs)
  {
+	unsigned int offset = swiotlb_align_offset(dev, orig_addr);
         unsigned int index, i;
         phys_addr_t tlb_addr;
 
@@ -560,7 +581,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, dma_addr_t dma_addr,
                 return (phys_addr_t)DMA_MAPPING_ERROR;
         }
 
-        index = find_slots(dev, alloc_size);
+	index = find_slots(dev, orig_addr, alloc_size + offset);
         if (index == -1) {
                 if (!(attrs & DMA_ATTR_NO_WARN))
                         dev_warn_ratelimited(dev,
@@ -574,10 +595,10 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, dma_addr_t dma_addr,
 	 * This is needed when we sync the memory.  Then we sync the buffer if
 	 * needed.
 	 */
-        for (i = 0; i < nr_slots(alloc_size); i++)
+	for (i = 0; i < nr_slots(alloc_size + offset); i++)
                 io_tlb_orig_addr[index + i] = slot_addr(orig_addr, i);
 
-        tlb_addr = slot_addr(io_tlb_start, index);
+	tlb_addr = slot_addr(io_tlb_start, index) + offset;
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
 	    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
 		swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE);
@@ -593,8 +614,9 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
 			      enum dma_data_direction dir, unsigned long attrs)
 {
 	unsigned long flags;
-	int i, count, nslots = nr_slots(alloc_size);
-	int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
+	unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
+	int i, count, nslots = nr_slots(alloc_size + offset);
+	int index = (tlb_addr - offset - io_tlb_start) >> IO_TLB_SHIFT;
 	phys_addr_t orig_addr = io_tlb_orig_addr[index];
 
 	/*
-- 
2.31.1.751.gd2f1c929bd-goog
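
A worked example of the new offset and stride math referenced above,
using the 4 KiB device page size that the NVMe patch at the end of this
series sets, i.e. min_align_mask = 0xfff; the buffer address is
invented, and the code just evaluates the expressions introduced in
this patch:

#include <stdio.h>
#include <stdint.h>

#define IO_TLB_SHIFT 11
#define IO_TLB_SIZE  (1u << IO_TLB_SHIFT)

int main(void)
{
	unsigned int min_align_mask = 0xfff;	/* e.g. NVMe: 4 KiB pages */
	uint64_t orig_addr = 0x12345a00ULL;	/* buffer starts 0xa00 into a page */

	/* swiotlb_align_offset(): part of the offset kept inside the first slot */
	unsigned int offset = orig_addr & min_align_mask & (IO_TLB_SIZE - 1);

	/* iotlb_align_mask: high alignment bits the chosen slot has to match */
	unsigned int iotlb_align_mask = min_align_mask & ~(IO_TLB_SIZE - 1);

	/* search stride in find_slots(); PAGE_SIZE allocations scale it up further */
	unsigned int stride = (iotlb_align_mask >> IO_TLB_SHIFT) + 1;

	printf("offset into first slot : 0x%x\n", offset);
	printf("iotlb_align_mask       : 0x%x\n", iotlb_align_mask);
	printf("search stride (slots)  : %u\n", stride);
	printf("candidate slot must have (addr & 0x%x) == 0x%x\n",
	       iotlb_align_mask, (unsigned int)(orig_addr & iotlb_align_mask));
	return 0;
}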



* [PATCH 5.4 v2 9/9] nvme-pci: set min_align_mask
  2021-05-18 22:18 [PATCH 5.4 v2 0/9] preserve DMA offsets when using swiotlb Jianxiong Gao
                   ` (7 preceding siblings ...)
  2021-05-18 22:18 ` [PATCH 5.4 v2 8/9] swiotlb: respect min_align_mask Jianxiong Gao
@ 2021-05-18 22:18 ` Jianxiong Gao
  2021-05-19  8:11 ` [PATCH 5.4 v2 0/9] preserve DMA offsets when using swiotlb Greg KH
  9 siblings, 0 replies; 17+ messages in thread
From: Jianxiong Gao @ 2021-05-18 22:18 UTC (permalink / raw)
  To: stable, hch, marcorr, sashal; +Cc: Jianxiong Gao, Konrad Rzeszutek Wilk

The PRP addressing scheme requires all PRP entries except for the
first one to have a zero offset into the NVMe controller pages (which
can be different from the Linux PAGE_SIZE).  Use the min_align_mask
device parameter to ensure that swiotlb does not change the address
of the buffer modulo the device page size, so that the PRPs won't be
malformed.

Signed-off-by: Jianxiong Gao <jxgao@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jianxiong Gao <jxgao@google.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Upstream: 3d2d861eb03e8ee96dc430a54361c900cbe28afd
Signed-off-by: Jianxiong Gao <jxgao@google.com>
---
 drivers/nvme/host/pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 3bee3724e9fa..0fe86858f39c 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2628,6 +2628,7 @@ static void nvme_reset_work(struct work_struct *work)
 	 * Don't limit the IOMMU merged segment size.
 	 */
 	dma_set_max_seg_size(dev->dev, 0xffffffff);
+	dma_set_min_align_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1);
 
 	mutex_unlock(&dev->shutdown_lock);
 
-- 
2.31.1.751.gd2f1c929bd-goog
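
For context, a stand-alone sketch of the PRP rule the commit message
refers to (the addresses are invented and NVME_CTRL_PAGE_SIZE is
assumed to be 4096 here): PRP1 may carry an offset into a controller
page, every later entry must not, and the entries are derived from the
DMA address, which is why that address has to keep the buffer's offset
modulo the controller page size.

#include <stdio.h>
#include <stdint.h>

#define NVME_CTRL_PAGE_SIZE 4096ULL	/* controller page size assumed here */

int main(void)
{
	uint64_t dma_addr = 0x80000a00ULL;	/* offset 0xa00 kept by swiotlb */
	size_t length = 12288;			/* three controller pages of data */
	uint64_t prp = dma_addr;
	size_t left = length;
	int n;

	for (n = 0; left; n++) {
		size_t chunk = NVME_CTRL_PAGE_SIZE -
			       (prp & (NVME_CTRL_PAGE_SIZE - 1));

		if (chunk > left)
			chunk = left;
		printf("PRP%d = 0x%llx%s\n", n + 1, (unsigned long long)prp,
		       (n > 0 && (prp & (NVME_CTRL_PAGE_SIZE - 1))) ?
		       "  <-- malformed: non-zero page offset" : "");
		prp += chunk;
		left -= chunk;
	}
	return 0;
}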



* Re: [PATCH 5.4 v2 0/9] preserve DMA offsets when using swiotlb
  2021-05-18 22:18 [PATCH 5.4 v2 0/9] preserve DMA offsets when using swiotlb Jianxiong Gao
                   ` (8 preceding siblings ...)
  2021-05-18 22:18 ` [PATCH 5.4 v2 9/9] nvme-pci: set min_align_mask Jianxiong Gao
@ 2021-05-19  8:11 ` Greg KH
  2021-05-19 16:42   ` Jianxiong Gao
  9 siblings, 1 reply; 17+ messages in thread
From: Greg KH @ 2021-05-19  8:11 UTC (permalink / raw)
  To: Jianxiong Gao; +Cc: stable, hch, marcorr, sashal

On Tue, May 18, 2021 at 10:18:09PM +0000, Jianxiong Gao wrote:
> We observed several NVMe failures when running with SWIOTLB. The root
> cause of the issue is that when data is mapped via SWIOTLB, the address
> offset is not preserved. Several device drivers, including the NVMe
> driver, rely on this offset to function correctly.
>
> Even though we discovered the error when running with AMD SEV, we have
> reproduced the same error on RHEL 8 without SEV. By adding the
> swiotlb=force option to the kernel boot command line, NVMe functionality
> is impacted; for example, formatting a disk with an XFS filesystem
> returns an error.

I still fail to understand why you can not just use the 5.10.y kernel or
newer.  What is preventing you from doing this if you wish to use this
type of hardware?  This is not a "regression" in that the 5.4.y kernel
has never worked with this hardware before, it feels like a new feature.

Please, just use 5.10.y or newer, your life will be so much easier in
the long run.

thanks,

greg k-h


* Re: [PATCH 5.4 v2 0/9] preserve DMA offsets when using swiotlb
  2021-05-19  8:11 ` [PATCH 5.4 v2 0/9] preserve DMA offsets when using swiotlb Greg KH
@ 2021-05-19 16:42   ` Jianxiong Gao
  2021-05-19 17:03     ` Greg KH
  0 siblings, 1 reply; 17+ messages in thread
From: Jianxiong Gao @ 2021-05-19 16:42 UTC (permalink / raw)
  To: Greg KH; +Cc: stable, Christoph Hellwig, Marc Orr, sashal

On Wed, May 19, 2021 at 1:11 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> I still fail to understand why you can not just use the 5.10.y kernel or
> newer.  What is preventing you from doing this if you wish to use this
> type of hardware?  This is not a "regression" in that the 5.4.y kernel
> has never worked with this hardware before, it feels like a new feature.
>
NVMe + SWIOTLB is not a new feature. From my understanding it should
be supported correctly by the 5.4.y kernel. Currently, without the patch,
any NVMe device (along with some other devices that rely on the offset
to work correctly) could be broken if SWIOTLB is used on a 5.4.y kernel.

Neither the NVMe driver nor SWIOTLB is new in the 5.4.y kernel.

If by new feature you mean SEV, then yes, SEV relies on NVMe and SWIOTLB
to work correctly. But we can, and have, reproduced the same bug without
SEV. So I believe this patch is necessary even without considering SEV.

Thanks
> Please, just use 5.10.y or newer, your life will be so much easier in
> the long run.
>
> thanks,
>
> greg k-h



-- 
Jianxiong Gao


* Re: [PATCH 5.4 v2 0/9] preserve DMA offsets when using swiotlb
  2021-05-19 16:42   ` Jianxiong Gao
@ 2021-05-19 17:03     ` Greg KH
  2021-05-19 17:18       ` Marc Orr
  0 siblings, 1 reply; 17+ messages in thread
From: Greg KH @ 2021-05-19 17:03 UTC (permalink / raw)
  To: Jianxiong Gao; +Cc: stable, Christoph Hellwig, Marc Orr, sashal

On Wed, May 19, 2021 at 09:42:42AM -0700, Jianxiong Gao wrote:
> On Wed, May 19, 2021 at 1:11 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > I still fail to understand why you can not just use the 5.10.y kernel or
> > newer.  What is preventing you from doing this if you wish to use this
> > type of hardware?  This is not a "regression" in that the 5.4.y kernel
> > has never worked with this hardware before, it feels like a new feature.
> >
> NVMe + SWIOTLB is not a new feature. From my understanding it should
> be supported correctly by the 5.4.y kernel. Currently, without the patch,
> any NVMe device (along with some other devices that rely on the offset
> to work correctly) could be broken if SWIOTLB is used on a 5.4.y kernel.

Then do not do that, as obviously it never worked without your fixes, so
this isn't a "regression".

And again, why can you not just use 5.10.y?

thanks,

greg k-h


* Re: [PATCH 5.4 v2 0/9] preserve DMA offsets when using swiotlb
  2021-05-19 17:03     ` Greg KH
@ 2021-05-19 17:18       ` Marc Orr
  2021-05-19 17:25         ` Greg KH
  0 siblings, 1 reply; 17+ messages in thread
From: Marc Orr @ 2021-05-19 17:18 UTC (permalink / raw)
  To: Greg KH; +Cc: Jianxiong Gao, stable, Christoph Hellwig, sashal

On Wed, May 19, 2021 at 10:03 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Wed, May 19, 2021 at 09:42:42AM -0700, Jianxiong Gao wrote:
> > On Wed, May 19, 2021 at 1:11 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> > >
> > > I still fail to understand why you can not just use the 5.10.y kernel or
> > > newer.  What is preventing you from doing this if you wish to use this
> > > type of hardware?  This is not a "regression" in that the 5.4.y kernel
> > > has never worked with this hardware before, it feels like a new feature.
> > >
> > NVMe + SWIOTLB is not a new feature. From my understanding it should
> > be supported correctly by the 5.4.y kernel. Currently, without the patch,
> > any NVMe device (along with some other devices that rely on the offset
> > to work correctly) could be broken if SWIOTLB is used on a 5.4.y kernel.
>
> Then do not do that, as obviously it never worked without your fixes, so
> this isn't a "regression".

NVMe + SWIOTLB works fine without this bug fix. By fine I mean that a
guest kernel using this configuration boots and runs stably for weeks
and months under general-purpose usage. The bug that this patch set
fixes was only encountered when a user tried to format an xfs
filesystem under a RHEL guest kernel.

> And again, why can you not just use 5.10.y?

For our use case, this fixes the guest kernel, not the host kernel.
The guest distros that we support use 5.4 kernels. We do not control
the kernel that the distros deploy for usage as a guest OS on cloud.
We only control the host kernel.

Thanks,
Marc


* Re: [PATCH 5.4 v2 0/9] preserve DMA offsets when using swiotlb
  2021-05-19 17:18       ` Marc Orr
@ 2021-05-19 17:25         ` Greg KH
  2021-05-19 20:01           ` Marc Orr
  0 siblings, 1 reply; 17+ messages in thread
From: Greg KH @ 2021-05-19 17:25 UTC (permalink / raw)
  To: Marc Orr; +Cc: Jianxiong Gao, stable, Christoph Hellwig, sashal

On Wed, May 19, 2021 at 10:18:38AM -0700, Marc Orr wrote:
> On Wed, May 19, 2021 at 10:03 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Wed, May 19, 2021 at 09:42:42AM -0700, Jianxiong Gao wrote:
> > > On Wed, May 19, 2021 at 1:11 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> > > >
> > > > I still fail to understand why you can not just use the 5.10.y kernel or
> > > > newer.  What is preventing you from doing this if you wish to use this
> > > > type of hardware?  This is not a "regression" in that the 5.4.y kernel
> > > > has never worked with this hardware before, it feels like a new feature.
> > > >
> > > NVMe + SWIOTLB is not a new feature. From my understanding it should
> > > be supported correctly by the 5.4.y kernel. Currently, without the patch,
> > > any NVMe device (along with some other devices that rely on the offset
> > > to work correctly) could be broken if SWIOTLB is used on a 5.4.y kernel.
> >
> > Then do not do that, as obviously it never worked without your fixes, so
> > this isn't a "regression".
> 
> NVMe + SWIOTLB works fine without this bug fix. By fine I mean that a
> guest kernel using this configuration boots and runs stably for weeks
> and months under general-purpose usage. The bug that this patch set
> fixes was only encountered when a user tried to format an xfs
> filesystem under a RHEL guest kernel.
> 
> > And again, why can you not just use 5.10.y?
> 
> For our use case, this fixes the guest kernel, not the host kernel.
> The guest distros that we support use 5.4 kernels. We do not control
> the kernel that the distros deploy for usage as a guest OS on cloud.
> We only control the host kernel.

And how are you going to get your guest kernels to update to these
patches?  What specific ones are you concerned about?

RHEL ignores stable kernel updates, so if you are worried about them,
please just work with that company directly.

Good luck!

greg k-h


* Re: [PATCH 5.4 v2 0/9] preserve DMA offsets when using swiotlb
  2021-05-19 17:25         ` Greg KH
@ 2021-05-19 20:01           ` Marc Orr
  2021-05-20  8:32             ` Greg KH
  0 siblings, 1 reply; 17+ messages in thread
From: Marc Orr @ 2021-05-19 20:01 UTC (permalink / raw)
  To: Greg KH; +Cc: Jianxiong Gao, stable, Christoph Hellwig, sashal

On Wed, May 19, 2021 at 10:25 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Wed, May 19, 2021 at 10:18:38AM -0700, Marc Orr wrote:
> > On Wed, May 19, 2021 at 10:03 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Wed, May 19, 2021 at 09:42:42AM -0700, Jianxiong Gao wrote:
> > > > On Wed, May 19, 2021 at 1:11 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> > > > >
> > > > > I still fail to understand why you can not just use the 5.10.y kernel or
> > > > > newer.  What is preventing you from doing this if you wish to use this
> > > > > type of hardware?  This is not a "regression" in that the 5.4.y kernel
> > > > > has never worked with this hardware before, it feels like a new feature.
> > > > >
> > > > NVMe + SWIOTLB is not a new feature. From my understanding it should
> > > > be supported correctly by the 5.4.y kernel. Currently, without the patch,
> > > > any NVMe device (along with some other devices that rely on the offset
> > > > to work correctly) could be broken if SWIOTLB is used on a 5.4.y kernel.
> > >
> > > Then do not do that, as obviously it never worked without your fixes, so
> > > this isn't a "regression".
> >
> > NVMe + SWIOTLB works fine without this bug fix. By fine I mean that a
> > guest kernel using this configuration boots and runs stably for weeks
> > and months under general-purpose usage. The bug that this patch set
> > fixes was only encountered when a user tried to format an xfs
> > filesystem under a RHEL guest kernel.
> >
> > > And again, why can you not just use 5.10.y?
> >
> > For our use case, this fixes the guest kernel, not the host kernel.
> > The guest distros that we support use 5.4 kernels. We do not control
> > the kernel that the distros deploy for usage as a guest OS on cloud.
> > We only control the host kernel.
>
> And how are you going to get your guest kernels to update to these
> patches?  What specific ones are you concerned about?
>
> RHEL ignores stable kernel updates, so if you are worried about them,
> please just work with that company directly.

We support COS as a guest [1], which does base their kernel on 5.4
LTS. If these patches were accepted into 5.4 LTS, they would
automatically get picked up by COS.

[1] https://cloud.google.com/container-optimized-os

Thanks,
Marc


* Re: [PATCH 5.4 v2 0/9] preserve DMA offsets when using swiotlb
  2021-05-19 20:01           ` Marc Orr
@ 2021-05-20  8:32             ` Greg KH
  0 siblings, 0 replies; 17+ messages in thread
From: Greg KH @ 2021-05-20  8:32 UTC (permalink / raw)
  To: Marc Orr; +Cc: Jianxiong Gao, stable, Christoph Hellwig, sashal

On Wed, May 19, 2021 at 01:01:25PM -0700, Marc Orr wrote:
> On Wed, May 19, 2021 at 10:25 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Wed, May 19, 2021 at 10:18:38AM -0700, Marc Orr wrote:
> > > On Wed, May 19, 2021 at 10:03 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> > > >
> > > > On Wed, May 19, 2021 at 09:42:42AM -0700, Jianxiong Gao wrote:
> > > > > On Wed, May 19, 2021 at 1:11 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> > > > > >
> > > > > > I still fail to understand why you can not just use the 5.10.y kernel or
> > > > > > newer.  What is preventing you from doing this if you wish to use this
> > > > > > type of hardware?  This is not a "regression" in that the 5.4.y kernel
> > > > > > has never worked with this hardware before, it feels like a new feature.
> > > > > >
> > > > > NVMe + SWIOTLB is not a new feature. From my understanding it should
> > > > > be supported correctly by the 5.4.y kernel. Currently, without the patch,
> > > > > any NVMe device (along with some other devices that rely on the offset
> > > > > to work correctly) could be broken if SWIOTLB is used on a 5.4.y kernel.
> > > >
> > > > Then do not do that, as obviously it never worked without your fixes, so
> > > > this isn't a "regression".
> > >
> > > NVMe + SWIOTLB works fine without this bug fix. By fine I mean that a
> > > guest kernel using this configuration boots and runs stably for weeks
> > > and months under general-purpose usage. The bug that this patch set
> > > fixes was only encountered when a user tried to format an xfs
> > > filesystem under a RHEL guest kernel.
> > >
> > > > And again, why can you not just use 5.10.y?
> > >
> > > For our use case, this fixes the guest kernel, not the host kernel.
> > > The guest distros that we support use 5.4 kernels. We do not control
> > > the kernel that the distros deploy for usage as a guest OS on cloud.
> > > We only control the host kernel.
> >
> > And how are you going to get your guest kernels to update to these
> > patches?  What specific ones are you concerned about?
> >
> > RHEL ignores stable kernel updates, so if you are worried about them,
> > please just work with that company directly.
> 
> We support COS as a guest [1], which does base their kernel on 5.4
> LTS. If these patches were accepted into 5.4 LTS, they would
> automatically get picked up by COS.
> 
> [1] https://cloud.google.com/container-optimized-os

Then go work with that group to add this "required" set of new features
for your cloud systems that require this, as again, I fail to see how
this is a "regression" at all.

Maybe I should just go rip these out of 5.10.y as well, as it feels
like a _very_ platform-specific issue that you all are having here.

thanks,

greg k-h


Thread overview: 17+ messages
2021-05-18 22:18 [PATCH 5.4 v2 0/9] preserve DMA offsets when using swiotlb Jianxiong Gao
2021-05-18 22:18 ` [PATCH 5.4 v2 1/9] driver core: add a min_align_mask field to struct device_dma_parameters Jianxiong Gao
2021-05-18 22:18 ` [PATCH 5.4 v2 2/9] swiotlb: add a IO_TLB_SIZE define Jianxiong Gao
2021-05-18 22:18 ` [PATCH 5.4 v2 3/9] swiotlb: factor out an io_tlb_offset helper Jianxiong Gao
2021-05-18 22:18 ` [PATCH 5.4 v2 4/9] swiotlb: factor out a nr_slots helper Jianxiong Gao
2021-05-18 22:18 ` [PATCH 5.4 v2 5/9] swiotlb: clean up swiotlb_tbl_unmap_single Jianxiong Gao
2021-05-18 22:18 ` [PATCH 5.4 v2 6/9] swiotlb: refactor swiotlb_tbl_map_single Jianxiong Gao
2021-05-18 22:18 ` [PATCH 5.4 v2 7/9] swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single Jianxiong Gao
2021-05-18 22:18 ` [PATCH 5.4 v2 8/9] swiotlb: respect min_align_mask Jianxiong Gao
2021-05-18 22:18 ` [PATCH 5.4 v2 9/9] nvme-pci: set min_align_mask Jianxiong Gao
2021-05-19  8:11 ` [PATCH 5.4 v2 0/9] preserve DMA offsets when using swiotlb Greg KH
2021-05-19 16:42   ` Jianxiong Gao
2021-05-19 17:03     ` Greg KH
2021-05-19 17:18       ` Marc Orr
2021-05-19 17:25         ` Greg KH
2021-05-19 20:01           ` Marc Orr
2021-05-20  8:32             ` Greg KH
