linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()
@ 2021-11-04 17:51 Vincent Donnefort
  2021-11-05 15:08 ` Vincent Guittot
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Vincent Donnefort @ 2021-11-04 17:51 UTC (permalink / raw)
  To: peterz, mingo, vincent.guittot
  Cc: linux-kernel, dietmar.eggemann, valentin.schneider, jing-ting.wu,
	Vincent Donnefort

Nothing protects the access to the per_cpu variable sd_llc_id. When testing
the same CPU (i.e. this_cpu == that_cpu), a race condition exists with
update_top_cache_domain(). One scenario being:

              CPU1                            CPU2
  ==================================================================

  per_cpu(sd_llc_id, CPUX) => 0
                                    partition_sched_domains_locked()
      				      detach_destroy_domains()
  cpus_share_cache(CPUX, CPUX)          update_top_cache_domain(CPUX)
    per_cpu(sd_llc_id, CPUX) => 0
                                          per_cpu(sd_llc_id, CPUX) = CPUX
    per_cpu(sd_llc_id, CPUX) => CPUX
    return false

ttwu_queue_cond() wouldn't catch smp_processor_id() == cpu and the result
is a warning triggered from ttwu_queue_wakelist().

Avoid a such race in cpus_share_cache() by always returning true when
this_cpu == that_cpu.

Fixes: 518cd6234178 ("sched: Only queue remote wakeups when crossing cache boundaries")
Reported-by: Jing-Ting Wu <jing-ting.wu@mediatek.com>
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f2611b9cf503..f5ca15cdcff4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3726,6 +3726,9 @@ void wake_up_if_idle(int cpu)
 
 bool cpus_share_cache(int this_cpu, int that_cpu)
 {
+	if (this_cpu == that_cpu)
+		return true;
+
 	return per_cpu(sd_llc_id, this_cpu) == per_cpu(sd_llc_id, that_cpu);
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()
  2021-11-04 17:51 [PATCH] sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain() Vincent Donnefort
@ 2021-11-05 15:08 ` Vincent Guittot
  2021-11-08 11:19   ` Peter Zijlstra
  2021-11-11 12:22 ` [tip: sched/urgent] " tip-bot2 for Vincent Donnefort
  2021-11-15 10:46 ` [PATCH] " Shinichiro Kawasaki
  2 siblings, 1 reply; 8+ messages in thread
From: Vincent Guittot @ 2021-11-05 15:08 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: peterz, mingo, linux-kernel, dietmar.eggemann,
	Valentin.Schneider, jing-ting.wu

On Thu, 4 Nov 2021 at 18:52, Vincent Donnefort
<vincent.donnefort@arm.com> wrote:
>
> Nothing protects the access to the per_cpu variable sd_llc_id. When testing
> the same CPU (i.e. this_cpu == that_cpu), a race condition exists with
> update_top_cache_domain(). One scenario being:
>
>               CPU1                            CPU2
>   ==================================================================
>
>   per_cpu(sd_llc_id, CPUX) => 0
>                                     partition_sched_domains_locked()
>                                       detach_destroy_domains()
>   cpus_share_cache(CPUX, CPUX)          update_top_cache_domain(CPUX)
>     per_cpu(sd_llc_id, CPUX) => 0
>                                           per_cpu(sd_llc_id, CPUX) = CPUX
>     per_cpu(sd_llc_id, CPUX) => CPUX
>     return false
>
> ttwu_queue_cond() wouldn't catch smp_processor_id() == cpu and the result
> is a warning triggered from ttwu_queue_wakelist().
>
> Avoid a such race in cpus_share_cache() by always returning true when
> this_cpu == that_cpu.
>
> Fixes: 518cd6234178 ("sched: Only queue remote wakeups when crossing cache boundaries")
> Reported-by: Jing-Ting Wu <jing-ting.wu@mediatek.com>
> Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>

 Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f2611b9cf503..f5ca15cdcff4 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3726,6 +3726,9 @@ void wake_up_if_idle(int cpu)
>
>  bool cpus_share_cache(int this_cpu, int that_cpu)
>  {
> +       if (this_cpu == that_cpu)
> +               return true;
> +
>         return per_cpu(sd_llc_id, this_cpu) == per_cpu(sd_llc_id, that_cpu);
>  }
>
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()
  2021-11-05 15:08 ` Vincent Guittot
@ 2021-11-08 11:19   ` Peter Zijlstra
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2021-11-08 11:19 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Vincent Donnefort, mingo, linux-kernel, dietmar.eggemann,
	Valentin.Schneider, jing-ting.wu

On Fri, Nov 05, 2021 at 04:08:33PM +0100, Vincent Guittot wrote:
> On Thu, 4 Nov 2021 at 18:52, Vincent Donnefort
> <vincent.donnefort@arm.com> wrote:
> >
> > Nothing protects the access to the per_cpu variable sd_llc_id. When testing
> > the same CPU (i.e. this_cpu == that_cpu), a race condition exists with
> > update_top_cache_domain(). One scenario being:
> >
> >               CPU1                            CPU2
> >   ==================================================================
> >
> >   per_cpu(sd_llc_id, CPUX) => 0
> >                                     partition_sched_domains_locked()
> >                                       detach_destroy_domains()
> >   cpus_share_cache(CPUX, CPUX)          update_top_cache_domain(CPUX)
> >     per_cpu(sd_llc_id, CPUX) => 0
> >                                           per_cpu(sd_llc_id, CPUX) = CPUX
> >     per_cpu(sd_llc_id, CPUX) => CPUX
> >     return false
> >
> > ttwu_queue_cond() wouldn't catch smp_processor_id() == cpu and the result
> > is a warning triggered from ttwu_queue_wakelist().
> >
> > Avoid a such race in cpus_share_cache() by always returning true when
> > this_cpu == that_cpu.
> >
> > Fixes: 518cd6234178 ("sched: Only queue remote wakeups when crossing cache boundaries")
> > Reported-by: Jing-Ting Wu <jing-ting.wu@mediatek.com>
> > Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
> > Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
> 
>  Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> 
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index f2611b9cf503..f5ca15cdcff4 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3726,6 +3726,9 @@ void wake_up_if_idle(int cpu)
> >
> >  bool cpus_share_cache(int this_cpu, int that_cpu)
> >  {
> > +       if (this_cpu == that_cpu)
> > +               return true;
> > +
> >         return per_cpu(sd_llc_id, this_cpu) == per_cpu(sd_llc_id, that_cpu);
> >  }

Blergh, that's annoying. Thanks guys, picked it up for /urgent

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [tip: sched/urgent] sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()
  2021-11-04 17:51 [PATCH] sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain() Vincent Donnefort
  2021-11-05 15:08 ` Vincent Guittot
@ 2021-11-11 12:22 ` tip-bot2 for Vincent Donnefort
  2021-11-15 10:46 ` [PATCH] " Shinichiro Kawasaki
  2 siblings, 0 replies; 8+ messages in thread
From: tip-bot2 for Vincent Donnefort @ 2021-11-11 12:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Jing-Ting Wu, Vincent Donnefort, Peter Zijlstra (Intel),
	Valentin Schneider, Vincent Guittot, x86, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     42dc938a590c96eeb429e1830123fef2366d9c80
Gitweb:        https://git.kernel.org/tip/42dc938a590c96eeb429e1830123fef2366d9c80
Author:        Vincent Donnefort <vincent.donnefort@arm.com>
AuthorDate:    Thu, 04 Nov 2021 17:51:20 
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 11 Nov 2021 13:09:32 +01:00

sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()

Nothing protects the access to the per_cpu variable sd_llc_id. When testing
the same CPU (i.e. this_cpu == that_cpu), a race condition exists with
update_top_cache_domain(). One scenario being:

              CPU1                            CPU2
  ==================================================================

  per_cpu(sd_llc_id, CPUX) => 0
                                    partition_sched_domains_locked()
      				      detach_destroy_domains()
  cpus_share_cache(CPUX, CPUX)          update_top_cache_domain(CPUX)
    per_cpu(sd_llc_id, CPUX) => 0
                                          per_cpu(sd_llc_id, CPUX) = CPUX
    per_cpu(sd_llc_id, CPUX) => CPUX
    return false

ttwu_queue_cond() wouldn't catch smp_processor_id() == cpu and the result
is a warning triggered from ttwu_queue_wakelist().

Avoid a such race in cpus_share_cache() by always returning true when
this_cpu == that_cpu.

Fixes: 518cd6234178 ("sched: Only queue remote wakeups when crossing cache boundaries")
Reported-by: Jing-Ting Wu <jing-ting.wu@mediatek.com>
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20211104175120.857087-1-vincent.donnefort@arm.com
---
 kernel/sched/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 523fd60..cec173a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3726,6 +3726,9 @@ out:
 
 bool cpus_share_cache(int this_cpu, int that_cpu)
 {
+	if (this_cpu == that_cpu)
+		return true;
+
 	return per_cpu(sd_llc_id, this_cpu) == per_cpu(sd_llc_id, that_cpu);
 }
 

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()
  2021-11-04 17:51 [PATCH] sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain() Vincent Donnefort
  2021-11-05 15:08 ` Vincent Guittot
  2021-11-11 12:22 ` [tip: sched/urgent] " tip-bot2 for Vincent Donnefort
@ 2021-11-15 10:46 ` Shinichiro Kawasaki
  2021-11-15 18:33   ` Valentin Schneider
  2 siblings, 1 reply; 8+ messages in thread
From: Shinichiro Kawasaki @ 2021-11-15 10:46 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: peterz, mingo, vincent.guittot, linux-kernel, dietmar.eggemann,
	valentin.schneider, jing-ting.wu, Damien Le Moal

On Nov 04, 2021 / 17:51, Vincent Donnefort wrote:
> Nothing protects the access to the per_cpu variable sd_llc_id. When testing
> the same CPU (i.e. this_cpu == that_cpu), a race condition exists with
> update_top_cache_domain(). One scenario being:
> 
>               CPU1                            CPU2
>   ==================================================================
> 
>   per_cpu(sd_llc_id, CPUX) => 0
>                                     partition_sched_domains_locked()
>       				      detach_destroy_domains()
>   cpus_share_cache(CPUX, CPUX)          update_top_cache_domain(CPUX)
>     per_cpu(sd_llc_id, CPUX) => 0
>                                           per_cpu(sd_llc_id, CPUX) = CPUX
>     per_cpu(sd_llc_id, CPUX) => CPUX
>     return false
> 
> ttwu_queue_cond() wouldn't catch smp_processor_id() == cpu and the result
> is a warning triggered from ttwu_queue_wakelist().
> 
> Avoid a such race in cpus_share_cache() by always returning true when
> this_cpu == that_cpu.
> 
> Fixes: 518cd6234178 ("sched: Only queue remote wakeups when crossing cache boundaries")
> Reported-by: Jing-Ting Wu <jing-ting.wu@mediatek.com>
> Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f2611b9cf503..f5ca15cdcff4 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3726,6 +3726,9 @@ void wake_up_if_idle(int cpu)
>  
>  bool cpus_share_cache(int this_cpu, int that_cpu)
>  {
> +	if (this_cpu == that_cpu)
> +		return true;
> +
>  	return per_cpu(sd_llc_id, this_cpu) == per_cpu(sd_llc_id, that_cpu);
>  }
>  
> -- 
> 2.25.1
> 

Oh, this is the exactly same fix as I posted before [1]. It is a little bit sad
that my post did not get reviewed. Anyway, good to see the issue fixed. Thanks.

[1] https://lore.kernel.org/all/20211029005618.773579-1-shinichiro.kawasaki@wdc.com/

-- 
Best Regards,
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()
  2021-11-15 10:46 ` [PATCH] " Shinichiro Kawasaki
@ 2021-11-15 18:33   ` Valentin Schneider
  2021-11-16  8:02     ` Vincent Guittot
  0 siblings, 1 reply; 8+ messages in thread
From: Valentin Schneider @ 2021-11-15 18:33 UTC (permalink / raw)
  To: Shinichiro Kawasaki, Vincent Donnefort
  Cc: peterz, mingo, vincent.guittot, linux-kernel, dietmar.eggemann,
	jing-ting.wu, Damien Le Moal

On 15/11/21 10:46, Shinichiro Kawasaki wrote:
> Oh, this is the exactly same fix as I posted before [1]. It is a little bit sad
> that my post did not get reviewed. Anyway, good to see the issue fixed. Thanks.
>
> [1] https://lore.kernel.org/all/20211029005618.773579-1-shinichiro.kawasaki@wdc.com/
>

Oh, that sucks, sorry about that. I do try to keep an eye out for sched
stuff sent to LKML, but clearly there are some that fall through :(

> --
> Best Regards,
> Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()
  2021-11-15 18:33   ` Valentin Schneider
@ 2021-11-16  8:02     ` Vincent Guittot
  2021-11-16  8:45       ` Shinichiro Kawasaki
  0 siblings, 1 reply; 8+ messages in thread
From: Vincent Guittot @ 2021-11-16  8:02 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Shinichiro Kawasaki, Vincent Donnefort, peterz, mingo,
	linux-kernel, dietmar.eggemann, jing-ting.wu, Damien Le Moal

Hi Shinichiro

On Mon, 15 Nov 2021 at 19:34, Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
> On 15/11/21 10:46, Shinichiro Kawasaki wrote:
> > Oh, this is the exactly same fix as I posted before [1]. It is a little bit sad
> > that my post did not get reviewed. Anyway, good to see the issue fixed. Thanks.
> >
> > [1] https://lore.kernel.org/all/20211029005618.773579-1-shinichiro.kawasaki@wdc.com/
> >
>
> Oh, that sucks, sorry about that. I do try to keep an eye out for sched
> stuff sent to LKML, but clearly there are some that fall through :(

I would advise you to use get_maintainer.pl to make sure to not miss
to cc someone.
Like valentin, this patch has been lost in the lkml flow and i haven't
noticed it

Regards,
Vincent
>
> > --
> > Best Regards,
> > Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()
  2021-11-16  8:02     ` Vincent Guittot
@ 2021-11-16  8:45       ` Shinichiro Kawasaki
  0 siblings, 0 replies; 8+ messages in thread
From: Shinichiro Kawasaki @ 2021-11-16  8:45 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Valentin Schneider, Vincent Donnefort, peterz, mingo,
	linux-kernel, dietmar.eggemann, jing-ting.wu, Damien Le Moal

On Nov 16, 2021 / 09:02, Vincent Guittot wrote:
> Hi Shinichiro
> 
> On Mon, 15 Nov 2021 at 19:34, Valentin Schneider
> <valentin.schneider@arm.com> wrote:
> >
> > On 15/11/21 10:46, Shinichiro Kawasaki wrote:
> > > Oh, this is the exactly same fix as I posted before [1]. It is a little bit sad
> > > that my post did not get reviewed. Anyway, good to see the issue fixed. Thanks.
> > >
> > > [1] https://lore.kernel.org/all/20211029005618.773579-1-shinichiro.kawasaki@wdc.com/
> > >
> >
> > Oh, that sucks, sorry about that. I do try to keep an eye out for sched
> > stuff sent to LKML, but clearly there are some that fall through :(
> 
> I would advise you to use get_maintainer.pl to make sure to not miss
> to cc someone.
> Like valentin, this patch has been lost in the lkml flow and i haven't
> noticed it

Hi Velentin, Vincent, thank you for the comments.

I have run get_maintainer.pl on may patch, and it listed 9 maintainers and
reviewers. But my patch had only 2 out of the 9 in its To/Cc list, which was
too small. For next time, I will use get_maintainer.pl and post with full
To/Cc list. Thanks!

-- 
Best Regards,
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-11-16  8:45 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-04 17:51 [PATCH] sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain() Vincent Donnefort
2021-11-05 15:08 ` Vincent Guittot
2021-11-08 11:19   ` Peter Zijlstra
2021-11-11 12:22 ` [tip: sched/urgent] " tip-bot2 for Vincent Donnefort
2021-11-15 10:46 ` [PATCH] " Shinichiro Kawasaki
2021-11-15 18:33   ` Valentin Schneider
2021-11-16  8:02     ` Vincent Guittot
2021-11-16  8:45       ` Shinichiro Kawasaki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).