* [PATCH] dma-direct: avoid redundant memory sync for swiotlb
From: Chao Gao @ 2022-04-12 11:38 UTC
  To: linux-kernel, iommu
  Cc: Kevin Tian, Wang Zhaoyang1, Gao Liang, robin.murphy, hch

While looking into FIO performance with swiotlb enabled in a VM, we found
that swiotlb_bounce() is always called one more time than expected for each
DMA read request.

It turns out that the bounce buffer is copied back to the original DMA
buffer twice after the completion of a DMA request: once in
dma_direct_sync_single_for_cpu() and again in swiotlb_tbl_unmap_single().
But the content of the bounce buffer doesn't change between the two rounds
of copying, so the second round is redundant.
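
For reference, a simplified sketch of the path that performs the first copy
(adapted from dma_direct_sync_single_for_cpu() in kernel/dma/direct.h;
unrelated details trimmed):

static inline void dma_direct_sync_single_for_cpu(struct device *dev,
		dma_addr_t addr, size_t size, enum dma_data_direction dir)
{
	phys_addr_t paddr = dma_to_phys(dev, addr);

	if (!dev_is_dma_coherent(dev)) {
		arch_sync_dma_for_cpu(paddr, size, dir);
		arch_sync_dma_for_cpu_all();
	}

	/* 1st copy: bounce buffer -> original DMA buffer */
	if (unlikely(is_swiotlb_buffer(dev, paddr)))
		swiotlb_sync_single_for_cpu(dev, paddr, size, dir);

	if (dir == DMA_FROM_DEVICE)
		arch_dma_mark_clean(paddr, size);
}

The second copy then happens inside swiotlb_tbl_unmap_single(), which
bounces again unless DMA_ATTR_SKIP_CPU_SYNC is set.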

Pass the DMA_ATTR_SKIP_CPU_SYNC flag to swiotlb_tbl_unmap_single() to skip
the memory copy there.

This fix increases FIO 64KB sequential read throughput in a guest with
swiotlb=force by 5.6%.
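
A representative fio invocation for this kind of measurement (the exact job
file isn't part of this report; the device path and job parameters below
are illustrative only):

  fio --name=seqread --filename=/dev/vdb --rw=read --bs=64k \
      --ioengine=libaio --iodepth=16 --direct=1 --runtime=60 \
      --time_based --group_reporting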

Reported-by: Wang Zhaoyang1 <zhaoyang1.wang@intel.com>
Reported-by: Gao Liang <liang.gao@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 kernel/dma/direct.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index 4632b0f4f72e..8a6cd53dbe8c 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -114,6 +114,7 @@ static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
 		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
 
 	if (unlikely(is_swiotlb_buffer(dev, phys)))
-		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
+		swiotlb_tbl_unmap_single(dev, phys, size, dir,
+					 attrs | DMA_ATTR_SKIP_CPU_SYNC);
 }
 #endif /* _KERNEL_DMA_DIRECT_H */
-- 
2.25.1

* Re: [PATCH] dma-direct: avoid redundant memory sync for swiotlb
From: Chao Gao @ 2022-04-12 13:21 UTC
  To: linux-kernel, iommu
  Cc: Kevin Tian, Wang Zhaoyang1, Gao Liang, robin.murphy, hch

On Tue, Apr 12, 2022 at 07:38:05PM +0800, Chao Gao wrote:
>While looking into FIO performance with swiotlb enabled in a VM, we found
>that swiotlb_bounce() is always called one more time than expected for each
>DMA read request.
>
>It turns out that the bounce buffer is copied back to the original DMA
>buffer twice after the completion of a DMA request: once in
>dma_direct_sync_single_for_cpu() and again in swiotlb_tbl_unmap_single().
>But the content of the bounce buffer doesn't change between the two rounds
>of copying, so the second round is redundant.
>
>Pass the DMA_ATTR_SKIP_CPU_SYNC flag to swiotlb_tbl_unmap_single() to skip
>the memory copy there.
>
>This fix increases FIO 64KB sequential read throughput in a guest with
>swiotlb=force by 5.6%.
>

Sorry, a Fixes tag is missing:

Fixes: 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code")

>Reported-by: Wang Zhaoyang1 <zhaoyang1.wang@intel.com>
>Reported-by: Gao Liang <liang.gao@intel.com>
>Signed-off-by: Chao Gao <chao.gao@intel.com>
>Reviewed-by: Kevin Tian <kevin.tian@intel.com>
>---
> kernel/dma/direct.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
>index 4632b0f4f72e..8a6cd53dbe8c 100644
>--- a/kernel/dma/direct.h
>+++ b/kernel/dma/direct.h
>@@ -114,6 +114,7 @@ static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
> 		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
> 
> 	if (unlikely(is_swiotlb_buffer(dev, phys)))
>-		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
>+		swiotlb_tbl_unmap_single(dev, phys, size, dir,
>+					 attrs | DMA_ATTR_SKIP_CPU_SYNC);
> }
> #endif /* _KERNEL_DMA_DIRECT_H */
>-- 
>2.25.1
>

* Re: [PATCH] dma-direct: avoid redundant memory sync for swiotlb
From: Robin Murphy @ 2022-04-12 13:33 UTC
  To: Chao Gao, linux-kernel, iommu
  Cc: m.szyprowski, hch, Wang Zhaoyang1, Gao Liang, Kevin Tian

On 12/04/2022 12:38 pm, Chao Gao wrote:
> While looking into FIO performance with swiotlb enabled in a VM, we found
> that swiotlb_bounce() is always called one more time than expected for each
> DMA read request.
> 
> It turns out that the bounce buffer is copied back to the original DMA
> buffer twice after the completion of a DMA request: once in
> dma_direct_sync_single_for_cpu() and again in swiotlb_tbl_unmap_single().
> But the content of the bounce buffer doesn't change between the two rounds
> of copying, so the second round is redundant.
> 
> Pass the DMA_ATTR_SKIP_CPU_SYNC flag to swiotlb_tbl_unmap_single() to skip
> the memory copy there.

It's still a little suboptimal and non-obvious to call into SWIOTLB 
twice though - even better might be for SWIOTLB to call 
arch_sync_dma_for_cpu() at the appropriate place internally, then put 
the dma_direct_sync in an else path here. I'm really not sure why we 
have the current disparity between map and unmap in this regard... :/
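
Something along these lines, say (hypothetical and untested, assuming
swiotlb_tbl_unmap_single() grew the arch_sync_dma_for_cpu() call
internally):

static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
		size_t size, enum dma_data_direction dir, unsigned long attrs)
{
	phys_addr_t phys = dma_to_phys(dev, addr);

	if (unlikely(is_swiotlb_buffer(dev, phys)))
		/* would bounce and sync for the CPU internally */
		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
	else if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
}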

Robin.

> This fix increases FIO 64KB sequential read throughput in a guest with
> swiotlb=force by 5.6%.
> 
> Reported-by: Wang Zhaoyang1 <zhaoyang1.wang@intel.com>
> Reported-by: Gao Liang <liang.gao@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
>   kernel/dma/direct.h | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
> index 4632b0f4f72e..8a6cd53dbe8c 100644
> --- a/kernel/dma/direct.h
> +++ b/kernel/dma/direct.h
> @@ -114,6 +114,7 @@ static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
>   		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
>   
>   	if (unlikely(is_swiotlb_buffer(dev, phys)))
> -		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
> +		swiotlb_tbl_unmap_single(dev, phys, size, dir,
> +					 attrs | DMA_ATTR_SKIP_CPU_SYNC);
>   }
>   #endif /* _KERNEL_DMA_DIRECT_H */

* Re: [PATCH] dma-direct: avoid redundant memory sync for swiotlb
From: Chao Gao @ 2022-04-13  1:02 UTC
  To: Robin Murphy
  Cc: linux-kernel, iommu, m.szyprowski, hch, Wang Zhaoyang1,
	Gao Liang, Kevin Tian

On Tue, Apr 12, 2022 at 02:33:05PM +0100, Robin Murphy wrote:
>On 12/04/2022 12:38 pm, Chao Gao wrote:
>> While looking into FIO performance with swiotlb enabled in a VM, we found
>> that swiotlb_bounce() is always called one more time than expected for each
>> DMA read request.
>> 
>> It turns out that the bounce buffer is copied back to the original DMA
>> buffer twice after the completion of a DMA request: once in
>> dma_direct_sync_single_for_cpu() and again in swiotlb_tbl_unmap_single().
>> But the content of the bounce buffer doesn't change between the two rounds
>> of copying, so the second round is redundant.
>> 
>> Pass the DMA_ATTR_SKIP_CPU_SYNC flag to swiotlb_tbl_unmap_single() to skip
>> the memory copy there.
>
>It's still a little suboptimal and non-obvious to call into SWIOTLB twice
>though - even better might be for SWIOTLB to call arch_sync_dma_for_cpu() at
>the appropriate place internally,

Hi Robin,

dma_direct_sync_single_for_cpu() also calls arch_sync_dma_for_cpu_all()
and arch_dma_mark_clean() in some cases. If SWIOTLB does the sync
internally, should these two functions also be called by SWIOTLB?

Personally, I think it would be better if swiotlb just focused on bounce
buffer alloc/free; adding more DMA coherence logic into swiotlb would make
it a little more complicated.

How about an open-coded version of dma_direct_sync_single_for_cpu() in
dma_direct_unmap_page(), with swiotlb_sync_single_for_cpu() replaced by
swiotlb_tbl_unmap_single()?
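
Roughly like this (hypothetical and untested, just to show the shape):

static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
		size_t size, enum dma_data_direction dir, unsigned long attrs)
{
	phys_addr_t phys = dma_to_phys(dev, addr);

	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) && !dev_is_dma_coherent(dev)) {
		arch_sync_dma_for_cpu(phys, size, dir);
		arch_sync_dma_for_cpu_all();
	}

	/* the unmap itself bounces back to the original buffer */
	if (unlikely(is_swiotlb_buffer(dev, phys)))
		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);

	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) && dir == DMA_FROM_DEVICE)
		arch_dma_mark_clean(phys, size);
}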

* Re: [PATCH] dma-direct: avoid redundant memory sync for swiotlb
From: Christoph Hellwig @ 2022-04-13  4:59 UTC
  To: Chao Gao
  Cc: Kevin Tian, Wang Zhaoyang1, linux-kernel, Gao Liang, iommu,
	Robin Murphy, hch

On Wed, Apr 13, 2022 at 09:02:02AM +0800, Chao Gao wrote:
> dma_direct_sync_single_for_cpu() also calls arch_sync_dma_for_cpu_all()
> and arch_dma_mark_clean() in some cases. If SWIOTLB does the sync
> internally, should these two functions also be called by SWIOTLB?
> 
> Personally, I think it would be better if swiotlb just focused on bounce
> buffer alloc/free; adding more DMA coherence logic into swiotlb would make
> it a little more complicated.
> 
> How about an open-coded version of dma_direct_sync_single_for_cpu() in
> dma_direct_unmap_page(), with swiotlb_sync_single_for_cpu() replaced by
> swiotlb_tbl_unmap_single()?

I don't think the swiotlb and non-coherent case ever fully worked.
Before the merge of swiotlb into dma-direct they obviously were
mutually exclusive, and even now all the cache maintenance is done
on the physical address of the original data, not the swiotlb buffer.

If we want to fix that properly, all the arch dma calls will need to
move into swiotlb, but that is a much bigger patch.

So for now I'd be happy with the one-liner presented here, but
eventually the whole area could use an overhaul.

* Re: [PATCH] dma-direct: avoid redundant memory sync for swiotlb
From: Chao Gao @ 2022-04-13  5:46 UTC
  To: Christoph Hellwig
  Cc: Robin Murphy, linux-kernel, iommu, m.szyprowski, Wang Zhaoyang1,
	Gao Liang, Kevin Tian

On Wed, Apr 13, 2022 at 06:59:58AM +0200, Christoph Hellwig wrote:
>So for now I'd be happy with the one-liner presented here, but
>eventually the whole area could use an overhaul.

Thanks. Do you want me to post a new version with the Fixes tag, or will
you take care of it?

Fixes: 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code")

* Re: [PATCH] dma-direct: avoid redundant memory sync for swiotlb
From: Christoph Hellwig @ 2022-04-13  5:49 UTC
  To: Chao Gao
  Cc: Christoph Hellwig, Robin Murphy, linux-kernel, iommu,
	m.szyprowski, Wang Zhaoyang1, Gao Liang, Kevin Tian

On Wed, Apr 13, 2022 at 01:46:06PM +0800, Chao Gao wrote:
> On Wed, Apr 13, 2022 at 06:59:58AM +0200, Christoph Hellwig wrote:
> >So for now I'd be happy with the one liner presented here, but
> >eventually the whole area could use an overhaul.
> 
> Thanks. Do you want me to post a new version with the Fixes tag, or will
> you take care of it?

I can add the Fixes tag.  I'll wait another day or two for more feedback,
though.

> 
> Fixes: 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code")
---end quoted text---

* Re: [PATCH] dma-direct: avoid redundant memory sync for swiotlb
From: Robin Murphy @ 2022-04-13 13:10 UTC
  To: Christoph Hellwig, Chao Gao
  Cc: linux-kernel, iommu, m.szyprowski, Wang Zhaoyang1, Gao Liang, Kevin Tian

On 2022-04-13 05:59, Christoph Hellwig wrote:
> On Wed, Apr 13, 2022 at 09:02:02AM +0800, Chao Gao wrote:
>> dma_direct_sync_single_for_cpu() also calls arch_sync_dma_for_cpu_all()
>> and arch_dma_mark_clean() in some cases. If SWIOTLB does the sync
>> internally, should these two functions also be called by SWIOTLB?
>>
>> Personally, I think it would be better if swiotlb just focused on bounce
>> buffer alloc/free; adding more DMA coherence logic into swiotlb would make
>> it a little more complicated.
>>
>> How about an open-coded version of dma_direct_sync_single_for_cpu() in
>> dma_direct_unmap_page(), with swiotlb_sync_single_for_cpu() replaced by
>> swiotlb_tbl_unmap_single()?
> 
> I don't think the swiotlb and non-coherent case ever fully worked.
> Before the merge of swiotlb into dma-direct they obviously were
> mutually exclusive, and even now all the cache maintenance is done
> on the physical address of the original data, not the swiotlb buffer.

Are you sure? AFAICS swiotlb_map() does the right thing, and 
dma_direct_{sync,unmap} are working off the DMA address, which is that 
of the bounce slot when SWIOTLB is involved (not least, how would the 
is_swiotlb_buffer() checks work otherwise?)
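
For example, the tail of swiotlb_map() (trimmed from kernel/dma/swiotlb.c;
a sketch, not verbatim) does its cache maintenance on the bounce-slot
address, not the original one:

	if (!dev_is_dma_coherent(dev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
		arch_sync_dma_for_device(swiotlb_addr, size, dir);
	return dma_addr;

where swiotlb_addr is the physical address of the bounce slot returned by
swiotlb_tbl_map_single().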

> If we want to fix that properly, all the arch dma calls will need to
> move into swiotlb, but that is a much bigger patch.
> 
> So for now I'd be happy with the one-liner presented here, but
> eventually the whole area could use an overhaul.

Sure, whoever gets round to tackling DMA_ATTR_NO_SNOOP first will need 
to go through all the cache maintenance hooks anyway, so happy to kick 
the can down the road until then.

Thanks,
Robin.

* Re: [PATCH] dma-direct: avoid redundant memory sync for swiotlb
From: Christoph Hellwig @ 2022-04-13 16:44 UTC
  To: Robin Murphy
  Cc: Kevin Tian, Wang Zhaoyang1, linux-kernel, Gao Liang, iommu,
	Christoph Hellwig

On Wed, Apr 13, 2022 at 02:10:56PM +0100, Robin Murphy wrote:
> Are you sure? AFAICS swiotlb_map() does the right thing, and 
> dma_direct_{sync,unmap} are working off the DMA address, which is that of 
> the bounce slot when SWIOTLB is involved (not least, how would the 
> is_swiotlb_buffer() checks work otherwise?)

Yeah, actually this should be fine.

end of thread [newest: 2022-04-13 16:44 UTC]

Thread overview: 9 messages
-- links below jump to the message on this page --
2022-04-12 11:38 [PATCH] dma-direct: avoid redundant memory sync for swiotlb Chao Gao
2022-04-12 13:21 ` Chao Gao
2022-04-12 13:33 ` Robin Murphy
2022-04-13  1:02   ` Chao Gao
2022-04-13  4:59     ` Christoph Hellwig
2022-04-13  5:46       ` Chao Gao
2022-04-13  5:49         ` Christoph Hellwig
2022-04-13 13:10       ` Robin Murphy
2022-04-13 16:44         ` Christoph Hellwig
