* [PATCH v2 0/2] swiotlb: allocate padding slots if necessary
@ 2024-03-18 13:04 Petr Tesarik
  2024-03-18 13:04 ` [PATCH v2 1/2] swiotlb: extend buffer pre-padding to alloc_align_mask " Petr Tesarik
  2024-03-18 13:04 ` [PATCH v2 2/2] bug: introduce ASSERT_VAR_CAN_HOLD() Petr Tesarik
  0 siblings, 2 replies; 6+ messages in thread
From: Petr Tesarik @ 2024-03-18 13:04 UTC (permalink / raw)
  To: Christoph Hellwig, Marek Szyprowski, Robin Murphy, Petr Tesarik,
	Michael Kelley, Will Deacon, open list,
	open list:DMA MAPPING HELPERS
  Cc: Roberto Sassu, Petr Tesarik

From: Petr Tesarik <petr.tesarik1@huawei-partners.com>

If the allocation alignment is bigger than IO_TLB_SIZE and min_align_mask
covers some bits in the original address between IO_TLB_SIZE and
alloc_align_mask, preserve these bits by allocating additional padding
slots before the actual swiotlb buffer.
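
A concrete example (the numbers are hypothetical): with IO_TLB_SIZE =
2048 and an NVMe-like device that sets min_align_mask = 0xfff, mapping
a buffer at orig_addr = 0x12345800 with alloc_align_mask = 0xfff must
return a bounce address whose bit 11 matches the original address, yet
slot alignment alone can only zero that bit. The padding logic computes:

	offset    = 0x12345800 & 0xfff & (0xfff | (2048 - 1)); /* 0x800 */
	pad_slots = offset / 2048;	/* 1 padding slot */
	offset   %= 2048;		/* 0: data starts on a slot boundary */

so one padding slot precedes the buffer and the returned address
preserves bit 11.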

Changes from v1
---------------
* Rename padding to pad_slots.
* Set pad_slots only for the first allocated non-padding slot.
* Do not bother initializing orig_addr to INVALID_PHYS_ADDR.
* Change list and pad_slots to unsigned short to avoid growing
  struct io_tlb_slot on 32-bit targets (see the size sketch below).
* Add build-time check that list and pad_slots can hold the maximum
  allowed value of IO_TLB_SEGSIZE.
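
For reference, a back-of-the-envelope layout check (field sizes assume
a 32-bit target without PAE, so this is only a sketch):

	struct io_tlb_slot {
		phys_addr_t orig_addr;		/* 4 bytes */
		size_t alloc_size;		/* 4 bytes */
		unsigned short list;		/* 2 bytes */
		unsigned short pad_slots;	/* 2 bytes */
	};	/* still 12 bytes; the two shorts fit in the 4 bytes
		 * previously taken by 'unsigned int list' */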

Petr Tesarik (2):
  swiotlb: extend buffer pre-padding to alloc_align_mask if necessary
  bug: introduce ASSERT_VAR_CAN_HOLD()

 include/linux/build_bug.h | 10 ++++++++++
 kernel/dma/swiotlb.c      | 37 +++++++++++++++++++++++++++++++------
 2 files changed, 41 insertions(+), 6 deletions(-)

-- 
2.34.1



* [PATCH v2 1/2] swiotlb: extend buffer pre-padding to alloc_align_mask if necessary
  2024-03-18 13:04 [PATCH v2 0/2] swiotlb: allocate padding slots if necessary Petr Tesarik
@ 2024-03-18 13:04 ` Petr Tesarik
  2024-03-18 15:37   ` Michael Kelley
  2024-03-19  4:45   ` kernel test robot
  2024-03-18 13:04 ` [PATCH v2 2/2] bug: introduce ASSERT_VAR_CAN_HOLD() Petr Tesarik
  1 sibling, 2 replies; 6+ messages in thread
From: Petr Tesarik @ 2024-03-18 13:04 UTC (permalink / raw)
  To: Christoph Hellwig, Marek Szyprowski, Robin Murphy, Petr Tesarik,
	Michael Kelley, Will Deacon, open list,
	open list:DMA MAPPING HELPERS
  Cc: Roberto Sassu, Petr Tesarik

From: Petr Tesarik <petr.tesarik1@huawei-partners.com>

Allow a buffer pre-padding of up to alloc_align_mask. If the allocation
alignment is bigger than IO_TLB_SIZE and min_align_mask covers any non-zero
bits in the original address between IO_TLB_SIZE and alloc_align_mask,
these bits are not preserved in the swiotlb buffer address.

To fix this case, increase the allocation size and use a larger offset
within the allocated buffer. As a result, extra padding slots may be
allocated before the mapping start address.

Set the orig_addr in these padding slots to INVALID_PHYS_ADDR, because they
do not correspond to any CPU buffer and the data must never be synced.

The padding slots should be automatically released when the buffer is
unmapped. However, swiotlb_tbl_unmap_single() takes only the address of the
DMA buffer slot, not the first padding slot. Save the number of padding
slots in struct io_tlb_slot and use it to adjust the slot index in
swiotlb_release_slots(), so all allocated slots are properly freed.
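
In other words, the release path first steps back from the data slot to
the start of the allocation (this mirrors the swiotlb_release_slots()
hunk below):

	index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
	index -= mem->slots[index].pad_slots;	/* first padding slot */
	nslots = nr_slots(mem->slots[index].alloc_size + offset);

and then frees the padding slots together with the data slots.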

Fixes: 2fd4fa5d3fb5 ("swiotlb: Fix alignment checks when both allocation and DMA masks are present")
Link: https://lore.kernel.org/linux-iommu/20240311210507.217daf8b@meshulam.tesarici.cz/
Signed-off-by: Petr Tesarik <petr.tesarik1@huawei-partners.com>
---
 kernel/dma/swiotlb.c | 35 +++++++++++++++++++++++++++++------
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 86fe172b5958..aefb05ff55e7 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -69,11 +69,14 @@
  * @alloc_size:	Size of the allocated buffer.
  * @list:	The free list describing the number of free entries available
  *		from each index.
+ * @pad_slots:	Number of preceding padding slots. Valid only in the first
+ *		allocated non-padding slot.
  */
 struct io_tlb_slot {
 	phys_addr_t orig_addr;
 	size_t alloc_size;
-	unsigned int list;
+	unsigned short list;
+	unsigned short pad_slots;
 };
 
 static bool swiotlb_force_bounce;
@@ -287,6 +290,7 @@ static void swiotlb_init_io_tlb_pool(struct io_tlb_pool *mem, phys_addr_t start,
 					 mem->nslabs - i);
 		mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
 		mem->slots[i].alloc_size = 0;
+		mem->slots[i].pad_slots = 0;
 	}
 
 	memset(vaddr, 0, bytes);
@@ -1328,11 +1332,12 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 		unsigned long attrs)
 {
 	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
-	unsigned int offset = swiotlb_align_offset(dev, orig_addr);
+	unsigned int offset;
 	struct io_tlb_pool *pool;
 	unsigned int i;
 	int index;
 	phys_addr_t tlb_addr;
+	unsigned short pad_slots;
 
 	if (!mem || !mem->nslabs) {
 		dev_warn_ratelimited(dev,
@@ -1349,6 +1354,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 		return (phys_addr_t)DMA_MAPPING_ERROR;
 	}
 
+	/*
+	 * Calculate buffer pre-padding within the allocated space. Use it to
+	 * preserve the low bits of the original address according to device's
+	 * min_align_mask. Limit the padding to alloc_align_mask or slot size
+	 * (whichever is bigger); higher bits of the original address are
+	 * preserved by selecting a suitable IO TLB slot.
+	 */
+	offset = orig_addr & dma_get_min_align_mask(dev) &
+		(alloc_align_mask | (IO_TLB_SIZE - 1));
 	index = swiotlb_find_slots(dev, orig_addr,
 				   alloc_size + offset, alloc_align_mask, &pool);
 	if (index == -1) {
@@ -1364,6 +1378,10 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 	 * This is needed when we sync the memory.  Then we sync the buffer if
 	 * needed.
 	 */
+	pad_slots = offset / IO_TLB_SIZE;
+	offset %= IO_TLB_SIZE;
+	index += pad_slots;
+	pool->slots[index].pad_slots = i;
 	for (i = 0; i < nr_slots(alloc_size + offset); i++)
 		pool->slots[index + i].orig_addr = slot_addr(orig_addr, i);
 	tlb_addr = slot_addr(pool->start, index) + offset;
@@ -1385,12 +1403,16 @@ static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
 	struct io_tlb_pool *mem = swiotlb_find_pool(dev, tlb_addr);
 	unsigned long flags;
 	unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
-	int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
-	int nslots = nr_slots(mem->slots[index].alloc_size + offset);
-	int aindex = index / mem->area_nslabs;
-	struct io_tlb_area *area = &mem->areas[aindex];
+	int index, nslots, aindex;
+	struct io_tlb_area *area;
 	int count, i;
 
+	index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
+	index -= mem->slots[index].pad_slots;
+	nslots = nr_slots(mem->slots[index].alloc_size + offset);
+	aindex = index / mem->area_nslabs;
+	area = &mem->areas[aindex];
+
 	/*
 	 * Return the buffer to the free list by setting the corresponding
 	 * entries to indicate the number of contiguous entries available.
@@ -1413,6 +1435,7 @@ static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
 		mem->slots[i].list = ++count;
 		mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
 		mem->slots[i].alloc_size = 0;
+		mem->slots[i].pad_slots = 0;
 	}
 
 	/*
-- 
2.34.1



* [PATCH v2 2/2] bug: introduce ASSERT_VAR_CAN_HOLD()
  2024-03-18 13:04 [PATCH v2 0/2] swiotlb: allocate padding slots if necessary Petr Tesarik
  2024-03-18 13:04 ` [PATCH v2 1/2] swiotlb: extend buffer pre-padding to alloc_align_mask " Petr Tesarik
@ 2024-03-18 13:04 ` Petr Tesarik
  1 sibling, 0 replies; 6+ messages in thread
From: Petr Tesarik @ 2024-03-18 13:04 UTC (permalink / raw)
  To: Christoph Hellwig, Marek Szyprowski, Robin Murphy, Petr Tesarik,
	Michael Kelley, Will Deacon, open list,
	open list:DMA MAPPING HELPERS
  Cc: Roberto Sassu, Petr Tesarik

From: Petr Tesarik <petr.tesarik1@huawei-partners.com>

Introduce an ASSERT_VAR_CAN_HOLD() macro to check at build time that a
variable can hold the given value.

Use this macro in swiotlb to make sure that the list and pad_slots fields
of struct io_tlb_slot are big enough to hold the maximum possible value of
IO_TLB_SEGSIZE.
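
For example (hypothetical, not part of this patch), a variable that is
too narrow fails the round-trip comparison at compile time:

	unsigned char nbits;
	ASSERT_VAR_CAN_HOLD(nbits, 0x1ff);
	/* (unsigned char)0x1ff truncates to 0xff != 0x1ff,
	 * so BUILD_BUG_ON_MSG() fires at build time. */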

Signed-off-by: Petr Tesarik <petr.tesarik1@huawei-partners.com>
---
 include/linux/build_bug.h | 10 ++++++++++
 kernel/dma/swiotlb.c      |  2 ++
 2 files changed, 12 insertions(+)

diff --git a/include/linux/build_bug.h b/include/linux/build_bug.h
index 3aa3640f8c18..6e2486508af0 100644
--- a/include/linux/build_bug.h
+++ b/include/linux/build_bug.h
@@ -86,4 +86,14 @@
 		"Offset of " #field " in " #type " has changed.")
 
 
+/*
+ * Compile time check that a variable can hold the given value
+ */
+#define ASSERT_VAR_CAN_HOLD(var, value) ({		\
+	typeof(value) __val = (value);			\
+	typeof(var) __tmp = __val;			\
+	BUILD_BUG_ON_MSG(__tmp != __val,		\
+		#var " cannot hold " #value ".");	\
+})
+
 #endif	/* _LINUX_BUILD_BUG_H */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index aefb05ff55e7..0737c1283f86 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -285,6 +285,8 @@ static void swiotlb_init_io_tlb_pool(struct io_tlb_pool *mem, phys_addr_t start,
 		mem->areas[i].used = 0;
 	}
 
+	ASSERT_VAR_CAN_HOLD(mem->slots[0].list, IO_TLB_SEGSIZE);
+	ASSERT_VAR_CAN_HOLD(mem->slots[0].pad_slots, IO_TLB_SEGSIZE);
 	for (i = 0; i < mem->nslabs; i++) {
 		mem->slots[i].list = min(IO_TLB_SEGSIZE - io_tlb_offset(i),
 					 mem->nslabs - i);
-- 
2.34.1



* RE: [PATCH v2 1/2] swiotlb: extend buffer pre-padding to alloc_align_mask if necessary
  2024-03-18 13:04 ` [PATCH v2 1/2] swiotlb: extend buffer pre-padding to alloc_align_mask " Petr Tesarik
@ 2024-03-18 15:37   ` Michael Kelley
  2024-03-18 18:50     ` Petr Tesařík
  2024-03-19  4:45   ` kernel test robot
  1 sibling, 1 reply; 6+ messages in thread
From: Michael Kelley @ 2024-03-18 15:37 UTC (permalink / raw)
  To: Petr Tesarik, Christoph Hellwig, Marek Szyprowski, Robin Murphy,
	Petr Tesarik, Will Deacon, open list,
	open list:DMA MAPPING HELPERS
  Cc: Roberto Sassu, Petr Tesarik

From: Petr Tesarik <petrtesarik@huaweicloud.com> Sent: Monday, March 18, 2024 6:05 AM
> 
> Allow a buffer pre-padding of up to alloc_align_mask. If the allocation
> alignment is bigger than IO_TLB_SIZE and min_align_mask covers any non-zero
> bits in the original address between IO_TLB_SIZE and alloc_align_mask,
> these bits are not preserved in the swiotlb buffer address.
> 
> To fix this case, increase the allocation size and use a larger offset
> within the allocated buffer. As a result, extra padding slots may be
> allocated before the mapping start address.
> 
> Set the orig_addr in these padding slots to INVALID_PHYS_ADDR, because they
> do not correspond to any CPU buffer and the data must never be synced.

This paragraph is now obsolete.

> 
> The padding slots should be automatically released when the buffer is
> unmapped. However, swiotlb_tbl_unmap_single() takes only the address of the
> DMA buffer slot, not the first padding slot. Save the number of padding
> slots in struct io_tlb_slot and use it to adjust the slot index in
> swiotlb_release_slots(), so all allocated slots are properly freed.
> 
> Fixes: 2fd4fa5d3fb5 ("swiotlb: Fix alignment checks when both allocation and DMA masks are present")
> Link: https://lore.kernel.org/linux-iommu/20240311210507.217daf8b@meshulam.tesarici.cz/
> Signed-off-by: Petr Tesarik <petr.tesarik1@huawei-partners.com>
> ---
>  kernel/dma/swiotlb.c | 35 +++++++++++++++++++++++++++++------
>  1 file changed, 29 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 86fe172b5958..aefb05ff55e7 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -69,11 +69,14 @@
>   * @alloc_size:	Size of the allocated buffer.
>   * @list:	The free list describing the number of free entries available
>   *		from each index.
> + * @pad_slots:	Number of preceding padding slots. Valid only in the first
> + *		allocated non-padding slot.
>   */
>  struct io_tlb_slot {
>  	phys_addr_t orig_addr;
>  	size_t alloc_size;
> -	unsigned int list;
> +	unsigned short list;
> +	unsigned short pad_slots;
>  };
> 
>  static bool swiotlb_force_bounce;
> @@ -287,6 +290,7 @@ static void swiotlb_init_io_tlb_pool(struct io_tlb_pool *mem, phys_addr_t start,
>  					 mem->nslabs - i);
>  		mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
>  		mem->slots[i].alloc_size = 0;
> +		mem->slots[i].pad_slots = 0;
>  	}
> 
>  	memset(vaddr, 0, bytes);
> @@ -1328,11 +1332,12 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
>  		unsigned long attrs)
>  {
>  	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> -	unsigned int offset = swiotlb_align_offset(dev, orig_addr);
> +	unsigned int offset;
>  	struct io_tlb_pool *pool;
>  	unsigned int i;
>  	int index;
>  	phys_addr_t tlb_addr;
> +	unsigned short pad_slots;
> 
>  	if (!mem || !mem->nslabs) {
>  		dev_warn_ratelimited(dev,
> @@ -1349,6 +1354,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
>  		return (phys_addr_t)DMA_MAPPING_ERROR;
>  	}
> 
> +	/*
> +	 * Calculate buffer pre-padding within the allocated space. Use it to
> +	 * preserve the low bits of the original address according to device's
> +	 * min_align_mask. Limit the padding to alloc_align_mask or slot size
> +	 * (whichever is bigger); higher bits of the original address are
> +	 * preserved by selecting a suitable IO TLB slot.
> +	 */
> +	offset = orig_addr & dma_get_min_align_mask(dev) &
> +		(alloc_align_mask | (IO_TLB_SIZE - 1));
>  	index = swiotlb_find_slots(dev, orig_addr,
>  				   alloc_size + offset, alloc_align_mask, &pool);
>  	if (index == -1) {
> @@ -1364,6 +1378,10 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
>  	 * This is needed when we sync the memory.  Then we sync the buffer if
>  	 * needed.
>  	 */
> +	pad_slots = offset / IO_TLB_SIZE;
> +	offset %= IO_TLB_SIZE;
> +	index += pad_slots;
> +	pool->slots[index].pad_slots = i;

The above line should be:
	pool->slots[index].pad_slots = pad_slots;

At this point 'i' has not been initialized yet (the loop below is what
sets it), so the value stored in pad_slots is garbage.

>  	for (i = 0; i < nr_slots(alloc_size + offset); i++)
>  		pool->slots[index + i].orig_addr = slot_addr(orig_addr, i);
>  	tlb_addr = slot_addr(pool->start, index) + offset;
> @@ -1385,12 +1403,16 @@ static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
>  	struct io_tlb_pool *mem = swiotlb_find_pool(dev, tlb_addr);
>  	unsigned long flags;
>  	unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
> -	int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
> -	int nslots = nr_slots(mem->slots[index].alloc_size + offset);
> -	int aindex = index / mem->area_nslabs;
> -	struct io_tlb_area *area = &mem->areas[aindex];
> +	int index, nslots, aindex;
> +	struct io_tlb_area *area;
>  	int count, i;
> 
> +	index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
> +	index -= mem->slots[index].pad_slots;
> +	nslots = nr_slots(mem->slots[index].alloc_size + offset);
> +	aindex = index / mem->area_nslabs;
> +	area = &mem->areas[aindex];
> +
>  	/*
>  	 * Return the buffer to the free list by setting the corresponding
>  	 * entries to indicate the number of contiguous entries available.
> @@ -1413,6 +1435,7 @@ static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
>  		mem->slots[i].list = ++count;
>  		mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
>  		mem->slots[i].alloc_size = 0;
> +		mem->slots[i].pad_slots = 0;
>  	}
> 
>  	/*
> --
> 2.34.1



* Re: [PATCH v2 1/2] swiotlb: extend buffer pre-padding to alloc_align_mask if necessary
  2024-03-18 15:37   ` Michael Kelley
@ 2024-03-18 18:50     ` Petr Tesařík
  0 siblings, 0 replies; 6+ messages in thread
From: Petr Tesařík @ 2024-03-18 18:50 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Christoph Hellwig, Marek Szyprowski, Robin Murphy, Petr Tesarik,
	Will Deacon, open list, open list:DMA MAPPING HELPERS,
	Roberto Sassu, Petr Tesarik

On Mon, 18 Mar 2024 15:37:16 +0000
Michael Kelley <mhklinux@outlook.com> wrote:

> From: Petr Tesarik <petrtesarik@huaweicloud.com> Sent: Monday, March 18, 2024 6:05 AM
> > 
> > Allow a buffer pre-padding of up to alloc_align_mask. If the allocation
> > alignment is bigger than IO_TLB_SIZE and min_align_mask covers any non-zero
> > bits in the original address between IO_TLB_SIZE and alloc_align_mask,
> > these bits are not preserved in the swiotlb buffer address.
> > 
> > To fix this case, increase the allocation size and use a larger offset
> > within the allocated buffer. As a result, extra padding slots may be
> > allocated before the mapping start address.
> > 
> > Set the orig_addr in these padding slots to INVALID_PHYS_ADDR, because they
> > do not correspond to any CPU buffer and the data must never be synced.  
> 
> This paragraph is now obsolete.

Right. It should now say that orig_addr already _is_ set to
INVALID_PHYS_ADDR, so attempts to sync data will be ignored.
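
(For context, this relies on the existing guard in swiotlb_bounce(),
which looks roughly like this:

	if (orig_addr == INVALID_PHYS_ADDR)
		return;

so a sync on a padding slot is simply a no-op.)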

> > 
> > The padding slots should be automatically released when the buffer is
> > unmapped. However, swiotlb_tbl_unmap_single() takes only the address of the
> > DMA buffer slot, not the first padding slot. Save the number of padding
> > slots in struct io_tlb_slot and use it to adjust the slot index in
> > swiotlb_release_slots(), so all allocated slots are properly freed.
> > 
> > Fixes: 2fd4fa5d3fb5 ("swiotlb: Fix alignment checks when both allocation and DMA masks are present")
> > Link: https://lore.kernel.org/linux-iommu/20240311210507.217daf8b@meshulam.tesarici.cz/
> > Signed-off-by: Petr Tesarik <petr.tesarik1@huawei-partners.com>
> > ---
> >  kernel/dma/swiotlb.c | 35 +++++++++++++++++++++++++++++------
> >  1 file changed, 29 insertions(+), 6 deletions(-)
> > 
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index 86fe172b5958..aefb05ff55e7 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -69,11 +69,14 @@
> >   * @alloc_size:	Size of the allocated buffer.
> >   * @list:	The free list describing the number of free entries available
> >   *		from each index.
> > + * @pad_slots:	Number of preceding padding slots. Valid only in the first
> > + *		allocated non-padding slot.
> >   */
> >  struct io_tlb_slot {
> >  	phys_addr_t orig_addr;
> >  	size_t alloc_size;
> > -	unsigned int list;
> > +	unsigned short list;
> > +	unsigned short pad_slots;
> >  };
> > 
> >  static bool swiotlb_force_bounce;
> > @@ -287,6 +290,7 @@ static void swiotlb_init_io_tlb_pool(struct io_tlb_pool *mem, phys_addr_t start,
> >  					 mem->nslabs - i);
> >  		mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> >  		mem->slots[i].alloc_size = 0;
> > +		mem->slots[i].pad_slots = 0;
> >  	}
> > 
> >  	memset(vaddr, 0, bytes);
> > @@ -1328,11 +1332,12 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> >  		unsigned long attrs)
> >  {
> >  	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> > -	unsigned int offset = swiotlb_align_offset(dev, orig_addr);
> > +	unsigned int offset;
> >  	struct io_tlb_pool *pool;
> >  	unsigned int i;
> >  	int index;
> >  	phys_addr_t tlb_addr;
> > +	unsigned short pad_slots;
> > 
> >  	if (!mem || !mem->nslabs) {
> >  		dev_warn_ratelimited(dev,
> > @@ -1349,6 +1354,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> >  		return (phys_addr_t)DMA_MAPPING_ERROR;
> >  	}
> > 
> > +	/*
> > +	 * Calculate buffer pre-padding within the allocated space. Use it to
> > +	 * preserve the low bits of the original address according to device's
> > +	 * min_align_mask. Limit the padding to alloc_align_mask or slot size
> > +	 * (whichever is bigger); higher bits of the original address are
> > +	 * preserved by selecting a suitable IO TLB slot.
> > +	 */
> > +	offset = orig_addr & dma_get_min_align_mask(dev) &
> > +		(alloc_align_mask | (IO_TLB_SIZE - 1));
> >  	index = swiotlb_find_slots(dev, orig_addr,
> >  				   alloc_size + offset, alloc_align_mask, &pool);
> >  	if (index == -1) {
> > @@ -1364,6 +1378,10 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> >  	 * This is needed when we sync the memory.  Then we sync the buffer if
> >  	 * needed.
> >  	 */
> > +	pad_slots = offset / IO_TLB_SIZE;
> > +	offset %= IO_TLB_SIZE;
> > +	index += pad_slots;
> > +	pool->slots[index].pad_slots = i;  
> 
> The above line should be:
> 	pool->slots[index].pad_slots = pad_slots;

Doh. Yes, I rewrote it a few times and then forgot to change this.

How did it even pass the test suite? Presumably the stack garbage in
'i' happened to be zero here, so the bogus assignment did no visible
harm in my testing.

Thank you for the review.

Petr T

> >  	for (i = 0; i < nr_slots(alloc_size + offset); i++)
> >  		pool->slots[index + i].orig_addr = slot_addr(orig_addr, i);
> >  	tlb_addr = slot_addr(pool->start, index) + offset;
> > @@ -1385,12 +1403,16 @@ static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
> >  	struct io_tlb_pool *mem = swiotlb_find_pool(dev, tlb_addr);
> >  	unsigned long flags;
> >  	unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
> > -	int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
> > -	int nslots = nr_slots(mem->slots[index].alloc_size + offset);
> > -	int aindex = index / mem->area_nslabs;
> > -	struct io_tlb_area *area = &mem->areas[aindex];
> > +	int index, nslots, aindex;
> > +	struct io_tlb_area *area;
> >  	int count, i;
> > 
> > +	index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
> > +	index -= mem->slots[index].pad_slots;
> > +	nslots = nr_slots(mem->slots[index].alloc_size + offset);
> > +	aindex = index / mem->area_nslabs;
> > +	area = &mem->areas[aindex];
> > +
> >  	/*
> >  	 * Return the buffer to the free list by setting the corresponding
> >  	 * entries to indicate the number of contiguous entries available.
> > @@ -1413,6 +1435,7 @@ static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
> >  		mem->slots[i].list = ++count;
> >  		mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> >  		mem->slots[i].alloc_size = 0;
> > +		mem->slots[i].pad_slots = 0;
> >  	}
> > 
> >  	/*
> > --
> > 2.34.1  
> 



* Re: [PATCH v2 1/2] swiotlb: extend buffer pre-padding to alloc_align_mask if necessary
  2024-03-18 13:04 ` [PATCH v2 1/2] swiotlb: extend buffer pre-padding to alloc_align_mask " Petr Tesarik
  2024-03-18 15:37   ` Michael Kelley
@ 2024-03-19  4:45   ` kernel test robot
  1 sibling, 0 replies; 6+ messages in thread
From: kernel test robot @ 2024-03-19  4:45 UTC (permalink / raw)
  To: Petr Tesarik, Christoph Hellwig, Marek Szyprowski, Robin Murphy,
	Petr Tesarik, Michael Kelley, Will Deacon, open list,
	open list:DMA MAPPING HELPERS
  Cc: llvm, oe-kbuild-all, Roberto Sassu

Hi Petr,

kernel test robot noticed the following build warnings:

[auto build test WARNING on v6.8]
[also build test WARNING on linus/master next-20240318]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Petr-Tesarik/swiotlb-extend-buffer-pre-padding-to-alloc_align_mask-if-necessary/20240318-212500
base:   v6.8
patch link:    https://lore.kernel.org/r/20240318130447.594-2-petrtesarik%40huaweicloud.com
patch subject: [PATCH v2 1/2] swiotlb: extend buffer pre-padding to alloc_align_mask if necessary
config: s390-allnoconfig (https://download.01.org/0day-ci/archive/20240319/202403191203.AtV7fvue-lkp@intel.com/config)
compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project 8f68022f8e6e54d1aeae4ed301f5a015963089b7)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240319/202403191203.AtV7fvue-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202403191203.AtV7fvue-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from kernel/dma/swiotlb.c:27:
   In file included from include/linux/dma-direct.h:9:
   In file included from include/linux/dma-mapping.h:8:
   In file included from include/linux/device.h:32:
   In file included from include/linux/device/driver.h:21:
   In file included from include/linux/module.h:19:
   In file included from include/linux/elf.h:6:
   In file included from arch/s390/include/asm/elf.h:173:
   In file included from arch/s390/include/asm/mmu_context.h:11:
   In file included from arch/s390/include/asm/pgalloc.h:18:
   In file included from include/linux/mm.h:2188:
   include/linux/vmstat.h:522:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     522 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   In file included from kernel/dma/swiotlb.c:27:
   In file included from include/linux/dma-direct.h:9:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:78:
   include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     547 |         val = __raw_readb(PCI_IOBASE + addr);
         |                           ~~~~~~~~~~ ^
   include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     560 |         val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
         |                                                         ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:37:59: note: expanded from macro '__le16_to_cpu'
      37 | #define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
         |                                                           ^
   include/uapi/linux/swab.h:102:54: note: expanded from macro '__swab16'
     102 | #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
         |                                                      ^
   In file included from kernel/dma/swiotlb.c:27:
   In file included from include/linux/dma-direct.h:9:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:78:
   include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     573 |         val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
         |                                                         ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:35:59: note: expanded from macro '__le32_to_cpu'
      35 | #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
         |                                                           ^
   include/uapi/linux/swab.h:115:54: note: expanded from macro '__swab32'
     115 | #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
         |                                                      ^
   In file included from kernel/dma/swiotlb.c:27:
   In file included from include/linux/dma-direct.h:9:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:78:
   include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     584 |         __raw_writeb(value, PCI_IOBASE + addr);
         |                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     594 |         __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
         |                                                       ~~~~~~~~~~ ^
   include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     604 |         __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
         |                                                       ~~~~~~~~~~ ^
   include/asm-generic/io.h:692:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     692 |         readsb(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:700:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     700 |         readsw(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:708:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     708 |         readsl(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:717:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     717 |         writesb(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
   include/asm-generic/io.h:726:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     726 |         writesw(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
   include/asm-generic/io.h:735:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     735 |         writesl(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
>> kernel/dma/swiotlb.c:1348:33: warning: variable 'i' is uninitialized when used here [-Wuninitialized]
    1348 |         pool->slots[index].pad_slots = i;
         |                                        ^
   kernel/dma/swiotlb.c:1301:16: note: initialize the variable 'i' to silence this warning
    1301 |         unsigned int i;
         |                       ^
         |                        = 0
   kernel/dma/swiotlb.c:1643:20: warning: unused function 'swiotlb_create_debugfs_files' [-Wunused-function]
    1643 | static inline void swiotlb_create_debugfs_files(struct io_tlb_mem *mem,
         |                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
   15 warnings generated.


vim +/i +1348 kernel/dma/swiotlb.c

  1292	
  1293	phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
  1294			size_t mapping_size, size_t alloc_size,
  1295			unsigned int alloc_align_mask, enum dma_data_direction dir,
  1296			unsigned long attrs)
  1297	{
  1298		struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
  1299		unsigned int offset;
  1300		struct io_tlb_pool *pool;
  1301		unsigned int i;
  1302		int index;
  1303		phys_addr_t tlb_addr;
  1304		unsigned short pad_slots;
  1305	
  1306		if (!mem || !mem->nslabs) {
  1307			dev_warn_ratelimited(dev,
  1308				"Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
  1309			return (phys_addr_t)DMA_MAPPING_ERROR;
  1310		}
  1311	
  1312		if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
  1313			pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
  1314	
  1315		if (mapping_size > alloc_size) {
  1316			dev_warn_once(dev, "Invalid sizes (mapping: %zd bytes, alloc: %zd bytes)",
  1317				      mapping_size, alloc_size);
  1318			return (phys_addr_t)DMA_MAPPING_ERROR;
  1319		}
  1320	
  1321		/*
  1322		 * Calculate buffer pre-padding within the allocated space. Use it to
  1323		 * preserve the low bits of the original address according to device's
  1324		 * min_align_mask. Limit the padding to alloc_align_mask or slot size
  1325		 * (whichever is bigger); higher bits of the original address are
  1326		 * preserved by selecting a suitable IO TLB slot.
  1327		 */
  1328		offset = orig_addr & dma_get_min_align_mask(dev) &
  1329			(alloc_align_mask | (IO_TLB_SIZE - 1));
  1330		index = swiotlb_find_slots(dev, orig_addr,
  1331					   alloc_size + offset, alloc_align_mask, &pool);
  1332		if (index == -1) {
  1333			if (!(attrs & DMA_ATTR_NO_WARN))
  1334				dev_warn_ratelimited(dev,
  1335		"swiotlb buffer is full (sz: %zd bytes), total %lu (slots), used %lu (slots)\n",
  1336					 alloc_size, mem->nslabs, mem_used(mem));
  1337			return (phys_addr_t)DMA_MAPPING_ERROR;
  1338		}
  1339	
  1340		/*
  1341		 * Save away the mapping from the original address to the DMA address.
  1342		 * This is needed when we sync the memory.  Then we sync the buffer if
  1343		 * needed.
  1344		 */
  1345		pad_slots = offset / IO_TLB_SIZE;
  1346		offset %= IO_TLB_SIZE;
  1347		index += pad_slots;
> 1348		pool->slots[index].pad_slots = i;
  1349		for (i = 0; i < nr_slots(alloc_size + offset); i++)
  1350			pool->slots[index + i].orig_addr = slot_addr(orig_addr, i);
  1351		tlb_addr = slot_addr(pool->start, index) + offset;
  1352		/*
  1353		 * When the device is writing memory, i.e. dir == DMA_FROM_DEVICE, copy
  1354		 * the original buffer to the TLB buffer before initiating DMA in order
  1355		 * to preserve the original's data if the device does a partial write,
  1356		 * i.e. if the device doesn't overwrite the entire buffer.  Preserving
  1357		 * the original data, even if it's garbage, is necessary to match
  1358		 * hardware behavior.  Use of swiotlb is supposed to be transparent,
  1359		 * i.e. swiotlb must not corrupt memory by clobbering unwritten bytes.
  1360		 */
  1361		swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);
  1362		return tlb_addr;
  1363	}
  1364	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
