linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] iommu/iova: Free all CPU rcache for retry when iova alloc failure
@ 2022-03-04  4:46 yf.wang
  2022-03-04  5:00 ` yf.wang
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: yf.wang @ 2022-03-04  4:46 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Matthias Brugger,
	open list:IOMMU DRIVERS, open list,
	moderated list:ARM/Mediatek SoC support,
	moderated list:ARM/Mediatek SoC support
  Cc: wsd_upstream, Libo Kang, Ning Li, Yong Wu, Yunfei Wang, stable

From: Yunfei Wang <yf.wang@mediatek.com>

In alloc_iova_fast function, if an iova alloc request fail,
it will free the iova ranges present in the percpu iova
rcaches and free global iova rcache and then retry, but
flushing CPU iova rcaches only for each online CPU, which
will cause incomplete rcache cleaning, and iova rcaches of
not online CPU cannot be flushed, because iova rcaches may
also lead to fragmentation of iova space, so the next retry
action may still be fail.

Based on the above, so need to flushing all iova rcaches
for each possible CPU, use for_each_possible_cpu instead of
for_each_online_cpu like in free_iova_rcaches function,
so that all rcaches can be completely released to try
replenishing IOVAs.

Signed-off-by: Yunfei Wang <yf.wang@mediatek.com>
Cc: <stable@vger.kernel.org> # 5.4.*
---
 drivers/iommu/iova.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index b28c9435b898..5a0637cd7bc2 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -460,7 +460,7 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned long size,
 
 		/* Try replenishing IOVAs by flushing rcache. */
 		flush_rcache = false;
-		for_each_online_cpu(cpu)
+		for_each_possible_cpu(cpu)
 			free_cpu_cached_iovas(cpu, iovad);
 		free_global_cached_iovas(iovad);
 		goto retry;
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] iommu/iova: Free all CPU rcache for retry when iova alloc failure
  2022-03-04  4:46 [PATCH] iommu/iova: Free all CPU rcache for retry when iova alloc failure yf.wang
@ 2022-03-04  5:00 ` yf.wang
  2022-03-04  9:22 ` John Garry
  2022-03-04 14:03 ` Robin Murphy
  2 siblings, 0 replies; 7+ messages in thread
From: yf.wang @ 2022-03-04  5:00 UTC (permalink / raw)
  To: iommu
  Cc: Libo.Kang, Ning.Li, joro, linux-arm-kernel, linux-kernel,
	linux-mediatek, matthias.bgg, stable, will, wsd_upstream,
	yf.wang

On Fri, 2022-03-04 at 12:46 +0800, yf.wang@mediatek.com wrote:
> From: Yunfei Wang <yf.wang@mediatek.com>
> 
> In alloc_iova_fast function, if an iova alloc request fail,
> it will free the iova ranges present in the percpu iova
> rcaches and free global iova rcache and then retry, but
> flushing CPU iova rcaches only for each online CPU, which
> will cause incomplete rcache cleaning, and iova rcaches of
> not online CPU cannot be flushed, because iova rcaches may
> also lead to fragmentation of iova space, so the next retry
> action may still be fail.
> 
> Based on the above, so need to flushing all iova rcaches
> for each possible CPU, use for_each_possible_cpu instead of
> for_each_online_cpu like in free_iova_rcaches function,
> so that all rcaches can be completely released to try
> replenishing IOVAs.
> 
> Signed-off-by: Yunfei Wang <yf.wang@mediatek.com>
> Cc: <stable@vger.kernel.org> # 5.4.*
> ---
>  drivers/iommu/iova.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index b28c9435b898..5a0637cd7bc2 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -460,7 +460,7 @@ alloc_iova_fast(struct iova_domain *iovad,
> unsigned long size,
>  
>  		/* Try replenishing IOVAs by flushing rcache. */
>  		flush_rcache = false;
> -		for_each_online_cpu(cpu)
> +		for_each_possible_cpu(cpu)
>  			free_cpu_cached_iovas(cpu, iovad);
>  		free_global_cached_iovas(iovad);
>  		goto retry;

The following is the actual stress test log dump: 
1. __alloc_and_insert_iova_range_fail\x1a
[name:iova&]__alloc_and_insert_iova_range_fail:  iovad:0xffffff80dc9e6008 {granule:0x1000, start_pfn:0x40000, dma_32bit_pfn:0x100000, max32_alloc_size:0x731f},size:0x731f

2. dump all iova nodes of rbtree, contain cached iova:
[name:iova&]dump_iova_rbtree start:
[name:iova&]index    iova            pfn_lo      size      pfn_hi
[name:iova&]   1 0xffffff80680f8f80  0x47000000  0x4c9000  0x474c8000
[name:iova&]   2 0xffffff813e69c140  0x47500000  0x100000  0x475ff000
  ...317 lines of log are omitted here...
[name:iova&] 319 0xffffff80c8cea540  0xffefc000  0x4000    0xffeff000
[name:iova&] 320 0xffffff80de3a14c0  0xfff00000  0x100000  0xfffff000
[name:iova&] 321 0xffffff80dc9e6060  0xfffffffffffff000 0x1000 0xfffffffffffff000
[name:iova&]dump_iova_rbtree done, iovad:0xffffff80dc9e6008, count:321, total_size:606964KB

3. for_each_possible_cpu dump all cached iovas by iova_magazine pfns:
cpu:*-1: online cpu, cpu:*-0: not online cpu
[name:iova&]dump_iova_list, for_each_possible_cpu iovad:0xffffff80dc9e6008

3.1 online cpu cached iovas:
[name:iova&]dump_cpu_cached_iovas, iovad:0xffffff80dc9e6008, cpu:0-1, cache:0, iova_size:0x1000
[name:iova&]mag    iova              pfn_lo      size    pfn_h
[name:iova&]0/2  0xffffff8089c93c40  0xb8aff000  0x1000  0xb8aff000
[name:iova&]1/2  0xffffff8072046c40  0x716cf000  0x1000  0x716cf000
... omit other online cpu cached iovas ...

3.2 not online cpu cached iovas\x1a
... omit other not online cpu cached iovas ...
[name:iova&]dump_cpu_cached_iovas, iovad:0xffffff80dc9e6008, cpu:7-0, cache:5, iova_size:0x20000
[name:iova&]mag    iova               pfn_lo      size    pfn_hi
[name:iova&]0/10  0xffffff82b728ea40  0xdfbe0000  0x20000  0xdfbff000
[name:iova&]1/10  0xffffff82b728e480  0xee1e0000  0x20000  0xee1ff000
[name:iova&]2/10  0xffffff81a9fc5440  0xf27c0000  0x20000  0xf27df000
[name:iova&]3/10  0xffffff816f213800  0xd7fe0000  0x20000  0xd7fff000
[name:iova&]4/10  0xffffff80aa3fbf00  0xdb0e0000  0x20000  0xdb0ff000
[name:iova&]5/10  0xffffff8109217b00  0x4a7e0000  0x20000  0x4a7ff000
[name:iova&]6/10  0xffffff8109217540  0xb55a0000  0x20000  0xb55bf000
[name:iova&]7/10  0xffffff82b728e580  0xf33c0000  0x20000  0xf33df000
[name:iova&]8/10  0xffffff8109217500  0x7b5e0000  0x20000  0x7b5ff000
[name:iova&]9/10  0xffffff81a9fc5a80  0xf2da0000  0x20000  0xf2dbf000
[name:iova&]dump_iova_list, done iovad:0xffffff80dc9e6008, cpu_mask:0xff(11111111), online_cpu_mask:(00111111)

4. alloc fail, flushing rcache and retry:
[name:iova&]alloc_iova_fast, flushing rcache and retry, iovad:0xffffff80dc9e6008, size:0x731f, limit_pfn:0xfffff

5. retry and alloc fail again.
[name:iova&]__alloc_and_insert_iova_range_fail, iovad:0xffffff80dc9e6008 {granule:0x1000, start_pfn:0x40000, dma_32bit_pfn:0x100000, max32_alloc_size:0x731f},size:0x731f

6. dump all iova nodes of rbtree, contain cached iova:
[name:iova&]dump_iova_rbtree start:
[name:iova&]index    iova                pfn_lo      size    pfn_hi
[name:iova&]   1 0xffffff80680f8f80 0x47000000 0x4c9000  0x474c8000    
[name:iova&]   2 0xffffff813e69c140 0x47500000 0x100000  0x475ff000    
... 215 lines of log are omitted here ...
[name:iova&] 217 0xffffff80c8cea540 0xffefc000 0x4000    0xffeff000    
[name:iova&] 218 0xffffff80de3a14c0 0xfff00000 0x100000  0xfffff000    
[name:iova&] 219 0xffffff80dc9e6060 0xfffffffffffff000 0x1000 0xfffffffffffff000
[name:iova&]dump_iova_rbtree done, iovad:0xffffff80dc9e6008, count:219, total_size:603640KB

7. for_each_possible_cpu dump all cached iovas by iova_magazine pfns:
cpu:*-1: online cpu, cpu:*-0: not online cpu
[name:iova&]dump_iova_list, for_each_possible_cpu iovad:0xffffff80dc9e6008

7.1 online cpu cached iovas empty\x1a
[name:iova&]dump_cpu_cached_iovas, iovad:0xffffff80dc9e6008, cpu:0-1, cache:0, iova_size:0x1000 empty
[name:iova&]dump_cpu_cached_iovas, iovad:0xffffff80dc9e6008, cpu:0-1, cache:1, iova_size:0x2000 empty
... omit other online cpus ...
[name:iova&]dump_cpu_cached_iovas, iovad:0xffffff80dc9e6008, cpu:5-1, cache:4, iova_size:0x10000 empty
[name:iova&]dump_cpu_cached_iovas, iovad:0xffffff80dc9e6008, cpu:5-1, cache:5, iova_size:0x20000 empty

7.1 not online cpu cached iovas\x1a still have cached iovas:
... omit other not online cpu cached iovas ...
[name:iova&]dump_cpu_cached_iovas, iovad:0xffffff80dc9e6008, cpu:7-0, cache:5, iova_size:0x20000
[name:iova&]mag    iova               pfn_lo      size    pfn_hi
[name:iova&]0/10  0xffffff82b728ea40  0xdfbe0000  0x20000  0xdfbff000
[name:iova&]1/10  0xffffff82b728e480  0xee1e0000  0x20000  0xee1ff000
[name:iova&]2/10  0xffffff81a9fc5440  0xf27c0000  0x20000  0xf27df000
[name:iova&]3/10  0xffffff816f213800  0xd7fe0000  0x20000  0xd7fff000
[name:iova&]4/10  0xffffff80aa3fbf00  0xdb0e0000  0x20000  0xdb0ff000
[name:iova&]5/10  0xffffff8109217b00  0x4a7e0000  0x20000  0x4a7ff000
[name:iova&]6/10  0xffffff8109217540  0xb55a0000  0x20000  0xb55bf000
[name:iova&]7/10  0xffffff82b728e580  0xf33c0000  0x20000  0xf33df000
[name:iova&]8/10  0xffffff8109217500  0x7b5e0000  0x20000  0x7b5ff000
[name:iova&]9/10  0xffffff81a9fc5a80  0xf2da0000  0x20000  0xf2dbf000
[name:iova&]dump_iova_list, done iovad:0xffffff80dc9e6008, cpu_mask:0xff(11111111), online_cpu_mask:0x3f(00111111)

Thanks,
Yunfei.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] iommu/iova: Free all CPU rcache for retry when iova alloc failure
  2022-03-04  4:46 [PATCH] iommu/iova: Free all CPU rcache for retry when iova alloc failure yf.wang
  2022-03-04  5:00 ` yf.wang
@ 2022-03-04  9:22 ` John Garry
  2022-03-07  3:18   ` yf.wang
  2022-03-04 14:03 ` Robin Murphy
  2 siblings, 1 reply; 7+ messages in thread
From: John Garry @ 2022-03-04  9:22 UTC (permalink / raw)
  To: yf.wang, Joerg Roedel, Will Deacon, Matthias Brugger,
	open list:IOMMU DRIVERS, open list,
	moderated list:ARM/Mediatek SoC support,
	moderated list:ARM/Mediatek SoC support
  Cc: wsd_upstream, Libo Kang, stable, Ning Li

On 04/03/2022 04:46, yf.wang--- via iommu wrote:
> ************* MEDIATEK Confidentiality Notice ********************
> The information contained in this e-mail message (including any
> attachments) may be confidential, proprietary, privileged, or otherwise
> exempt from disclosure under applicable laws. It is intended to be

Can you please stop sending patches with this?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] iommu/iova: Free all CPU rcache for retry when iova alloc failure
  2022-03-04  4:46 [PATCH] iommu/iova: Free all CPU rcache for retry when iova alloc failure yf.wang
  2022-03-04  5:00 ` yf.wang
  2022-03-04  9:22 ` John Garry
@ 2022-03-04 14:03 ` Robin Murphy
  2022-03-07  3:32   ` yf.wang
  2022-03-07  3:49   ` yf.wang
  2 siblings, 2 replies; 7+ messages in thread
From: Robin Murphy @ 2022-03-04 14:03 UTC (permalink / raw)
  To: yf.wang, Joerg Roedel, Will Deacon, Matthias Brugger,
	open list:IOMMU DRIVERS, open list,
	moderated list:ARM/Mediatek SoC support,
	moderated list:ARM/Mediatek SoC support
  Cc: wsd_upstream, Libo Kang, stable, Ning Li

On 2022-03-04 04:46, yf.wang--- via iommu wrote:
> From: Yunfei Wang <yf.wang@mediatek.com>
> 
> In alloc_iova_fast function, if an iova alloc request fail,
> it will free the iova ranges present in the percpu iova
> rcaches and free global iova rcache and then retry, but
> flushing CPU iova rcaches only for each online CPU, which
> will cause incomplete rcache cleaning, and iova rcaches of
> not online CPU cannot be flushed, because iova rcaches may
> also lead to fragmentation of iova space, so the next retry
> action may still be fail.
> 
> Based on the above, so need to flushing all iova rcaches
> for each possible CPU, use for_each_possible_cpu instead of
> for_each_online_cpu like in free_iova_rcaches function,
> so that all rcaches can be completely released to try
> replenishing IOVAs.

OK, so either there's a mystery bug where IOVAs somehow get freed on 
offline CPUs, or the hotplug notifier isn't working correctly, or you've 
contrived a situation where alloc_iova_fast() is actually racing against 
iova_cpuhp_dead(). In the latter case, the solution is "don't do that".

This change should not be necessary.

Thanks,
Robin.

> Signed-off-by: Yunfei Wang <yf.wang@mediatek.com>
> Cc: <stable@vger.kernel.org> # 5.4.*
> ---
>   drivers/iommu/iova.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index b28c9435b898..5a0637cd7bc2 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -460,7 +460,7 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned long size,
>   
>   		/* Try replenishing IOVAs by flushing rcache. */
>   		flush_rcache = false;
> -		for_each_online_cpu(cpu)
> +		for_each_possible_cpu(cpu)
>   			free_cpu_cached_iovas(cpu, iovad);
>   		free_global_cached_iovas(iovad);
>   		goto retry;
> 
> 
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] iommu/iova: Free all CPU rcache for retry when iova alloc failure
  2022-03-04  9:22 ` John Garry
@ 2022-03-07  3:18   ` yf.wang
  0 siblings, 0 replies; 7+ messages in thread
From: yf.wang @ 2022-03-07  3:18 UTC (permalink / raw)
  To: iommu
  Cc: Libo.Kang, Ning.Li, john.garry, joro, linux-arm-kernel,
	linux-kernel, linux-mediatek, matthias.bgg, stable, will,
	wsd_upstream, yf.wang

On 2022-03-04 9:22, John Garry wrote:
> On 04/03/2022 04:46, yf.wang--- via iommu wrote:
> > ************* MEDIATEK Confidentiality Notice ********************
> > The 
> > information contained in this e-mail message (including any
> > attachments) may be confidential, proprietary, privileged, or 
> > otherwise exempt from disclosure under applicable laws. It is
> > intended 
> > to be
> 
> Can you please stop sending patches with this?

Hi John,

I will remote it later.

Thanks,
Yunfei.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] iommu/iova: Free all CPU rcache for retry when iova alloc failure
  2022-03-04 14:03 ` Robin Murphy
@ 2022-03-07  3:32   ` yf.wang
  2022-03-07  3:49   ` yf.wang
  1 sibling, 0 replies; 7+ messages in thread
From: yf.wang @ 2022-03-07  3:32 UTC (permalink / raw)
  To: robin.murphy
  Cc: Libo.Kang, Ning.Li, iommu, joro, linux-arm-kernel, linux-kernel,
	linux-mediatek, matthias.bgg, stable, will, wsd_upstream,
	yf.wang

On Fri, 2022-03-04 at 14:03 +0000, Robin Murphy wrote:
> 
> OK, so either there's a mystery bug where IOVAs somehow get freed on 
> offline CPUs, or the hotplug notifier isn't working correctly, or
> you've 
> contrived a situation where alloc_iova_fast() is actually racing
> against 
> iova_cpuhp_dead(). In the latter case, the solution is "don't do
> that".
> 
> This change should not be necessary.
> 
> Thanks,
> Robin.

Hi Robin,

1.As long as iova domain is not destroyed, the cached iovas will always
exist, the only chance to free the cache is the retry flushing
mechanism when alloc fail, but not free cached iova of not online CPU.

2.Iova rcache mechanism is by cpu, but there is no free rcache
mechanism when the CPU state switch.

3.iova.c does not know about CPU state switching, eg.online <--> offline.

Based on the above basic information, this is not a user bug, it is
more like a defect of the iova rcache mechanism.

Thanks,
Yunfei.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] iommu/iova: Free all CPU rcache for retry when iova alloc failure
  2022-03-04 14:03 ` Robin Murphy
  2022-03-07  3:32   ` yf.wang
@ 2022-03-07  3:49   ` yf.wang
  1 sibling, 0 replies; 7+ messages in thread
From: yf.wang @ 2022-03-07  3:49 UTC (permalink / raw)
  To: robin.murphy
  Cc: Libo.Kang, Ning.Li, iommu, joro, linux-arm-kernel, linux-kernel,
	linux-mediatek, matthias.bgg, stable, will, wsd_upstream,
	yf.wang

On Fri, 2022-03-04 at 14:03 +0000, Robin Murphy wrote:
> 
> OK, so either there's a mystery bug where IOVAs somehow get freed on 
> offline CPUs, or the hotplug notifier isn't working correctly, or
> you've 
> contrived a situation where alloc_iova_fast() is actually racing
> against 
> iova_cpuhp_dead(). In the latter case, the solution is "don't do
> that".
> 
> This change should not be necessary.
> 
> Thanks,
> Robin.

Hi Robin,

You are right, kernel 5.15 already has this patch, but kernel 5.10
not contain.

Thanks,
Yunfei.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-03-07  3:56 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-04  4:46 [PATCH] iommu/iova: Free all CPU rcache for retry when iova alloc failure yf.wang
2022-03-04  5:00 ` yf.wang
2022-03-04  9:22 ` John Garry
2022-03-07  3:18   ` yf.wang
2022-03-04 14:03 ` Robin Murphy
2022-03-07  3:32   ` yf.wang
2022-03-07  3:49   ` yf.wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).