* [PATCH 1/1] sched/fair: Fix inaccurate tally of ttwu_move_affine
@ 2022-08-10 22:33 Libo Chen
  2022-08-15 11:01 ` Peter Zijlstra
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Libo Chen @ 2022-08-10 22:33 UTC (permalink / raw)
  To: mingo, peterz, vincent.guittot, juri.lelli, dietmar.eggemann, mgorman
  Cc: linux-kernel

There are scenarios where non-affine wakeups are incorrectly counted as
affine wakeups by schedstats.

When wake_affine_idle() returns prev_cpu, which is not equal to
nr_cpumask_bits, it will slip through the target == nr_cpumask_bits
check in wake_affine() and be counted as if target == this_cpu in
schedstats.

Replace target == nr_cpumask_bits with target != this_cpu to make sure
affine wakeups are accurately tallied.

Fixes: 806486c377e33 (sched/fair: Do not migrate if the prev_cpu is idle)
Suggested-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Libo Chen <libo.chen@oracle.com>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index da388657d5ac..b179da4f8105 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6114,7 +6114,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
 		target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);
 
 	schedstat_inc(p->stats.nr_wakeups_affine_attempts);
-	if (target == nr_cpumask_bits)
+	if (target != this_cpu)
 		return prev_cpu;
 
 	schedstat_inc(sd->ttwu_move_affine);
-- 
2.31.1
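
For context: wake_affine_idle() and wake_affine_weight() each return one
of this_cpu, prev_cpu, or nr_cpumask_bits ("no recommendation"). The
sketch below paraphrases the hunk above plus the function's trailing
return; it is not verbatim kernel code, but it shows why the old check
misclassifies the target == prev_cpu case:

static int wake_affine(struct sched_domain *sd, struct task_struct *p,
		       int this_cpu, int prev_cpu, int sync)
{
	int target = wake_affine_idle(this_cpu, prev_cpu, sync);

	if (target == nr_cpumask_bits)
		target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);

	schedstat_inc(p->stats.nr_wakeups_affine_attempts);

	/*
	 * Old check: if (target == nr_cpumask_bits) return prev_cpu;
	 * With it, target == prev_cpu falls through, bumps
	 * ttwu_move_affine and returns target (== prev_cpu), i.e. a
	 * non-affine wakeup counted as affine.  The new check treats
	 * anything other than this_cpu as non-affine.
	 */
	if (target != this_cpu)
		return prev_cpu;

	schedstat_inc(sd->ttwu_move_affine);
	return target;
}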



* Re: [PATCH 1/1] sched/fair: Fix inaccurate tally of ttwu_move_affine
  2022-08-10 22:33 [PATCH 1/1] sched/fair: Fix inaccurate tally of ttwu_move_affine Libo Chen
@ 2022-08-15 11:01 ` Peter Zijlstra
  2022-08-15 19:19   ` Libo Chen
  2022-08-25  7:30 ` Gautham R. Shenoy
  2023-04-06 10:05 ` [tip: sched/core] " tip-bot2 for Libo Chen
  2 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2022-08-15 11:01 UTC (permalink / raw)
  To: Libo Chen
  Cc: mingo, vincent.guittot, juri.lelli, dietmar.eggemann, mgorman,
	linux-kernel

On Wed, Aug 10, 2022 at 03:33:13PM -0700, Libo Chen wrote:
> There are scenarios where non-affine wakeups are incorrectly counted as
> affine wakeups by schedstats.
> 
> When wake_affine_idle() returns prev_cpu which doesn't equal to
> nr_cpumask_bits, it will slip through the check: target == nr_cpumask_bits
> in wake_affine() and be counted as if target == this_cpu in schedstats.
> 
> Replace target == nr_cpumask_bits with target != this_cpu to make sure
> affine wakeups are accurately tallied.
> 
> Fixes: 806486c377e33 (sched/fair: Do not migrate if the prev_cpu is idle)
> Suggested-by: Daniel Jordan <daniel.m.jordan@oracle.com>
> Signed-off-by: Libo Chen <libo.chen@oracle.com>
> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index da388657d5ac..b179da4f8105 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6114,7 +6114,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
>  		target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);
>  
>  	schedstat_inc(p->stats.nr_wakeups_affine_attempts);
> -	if (target == nr_cpumask_bits)
> +	if (target != this_cpu)
>  		return prev_cpu;
>  
>  	schedstat_inc(sd->ttwu_move_affine);

This not only changes the accounting but also the placement, no?


* Re: [PATCH 1/1] sched/fair: Fix inaccurate tally of ttwu_move_affine
  2022-08-15 11:01 ` Peter Zijlstra
@ 2022-08-15 19:19   ` Libo Chen
  2022-08-17 13:20     ` Vincent Guittot
  2023-01-09 22:00     ` Libo Chen
  0 siblings, 2 replies; 10+ messages in thread
From: Libo Chen @ 2022-08-15 19:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, vincent.guittot, juri.lelli, dietmar.eggemann, mgorman,
	linux-kernel, Daniel Jordan



On 8/15/22 04:01, Peter Zijlstra wrote:
> On Wed, Aug 10, 2022 at 03:33:13PM -0700, Libo Chen wrote:
>> There are scenarios where non-affine wakeups are incorrectly counted as
>> affine wakeups by schedstats.
>>
>> When wake_affine_idle() returns prev_cpu which doesn't equal to
>> nr_cpumask_bits, it will slip through the check: target == nr_cpumask_bits
>> in wake_affine() and be counted as if target == this_cpu in schedstats.
>>
>> Replace target == nr_cpumask_bits with target != this_cpu to make sure
>> affine wakeups are accurately tallied.
>>
>> Fixes: 806486c377e33 (sched/fair: Do not migrate if the prev_cpu is idle)
>> Suggested-by: Daniel Jordan <daniel.m.jordan@oracle.com>
>> Signed-off-by: Libo Chen <libo.chen@oracle.com>
>> ---
>>   kernel/sched/fair.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index da388657d5ac..b179da4f8105 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6114,7 +6114,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
>>   		target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);
>>   
>>   	schedstat_inc(p->stats.nr_wakeups_affine_attempts);
>> -	if (target == nr_cpumask_bits)
>> +	if (target != this_cpu)
>>   		return prev_cpu;
>>   
>>   	schedstat_inc(sd->ttwu_move_affine);
> This not only changes the accounting but also the placement, no?
No, it should only change the accounting. wake_affine() still returns
prev_cpu when target equals prev_cpu or nr_cpumask_bits, which is the
same behavior as before.


Libo
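
To spell out the three cases (derived from the hunk above; in the kernel
source wake_affine() ends with "return target;" after the
ttwu_move_affine increment):

  target == this_cpu          old: count affine, return this_cpu | new: count affine, return this_cpu
  target == prev_cpu          old: count affine, return prev_cpu | new: no count, return prev_cpu
  target == nr_cpumask_bits   old: no count, return prev_cpu     | new: no count, return prev_cpu

Only the target == prev_cpu row changes, and only in the accounting
column; the CPU handed back to select_task_rq_fair() is identical in
every case.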


* Re: [PATCH 1/1] sched/fair: Fix inaccurate tally of ttwu_move_affine
  2022-08-15 19:19   ` Libo Chen
@ 2022-08-17 13:20     ` Vincent Guittot
  2023-01-09 22:00     ` Libo Chen
  1 sibling, 0 replies; 10+ messages in thread
From: Vincent Guittot @ 2022-08-17 13:20 UTC (permalink / raw)
  To: Libo Chen
  Cc: Peter Zijlstra, mingo, juri.lelli, dietmar.eggemann, mgorman,
	linux-kernel, Daniel Jordan

On Mon, 15 Aug 2022 at 21:19, Libo Chen <libo.chen@oracle.com> wrote:
>
>
>
> On 8/15/22 04:01, Peter Zijlstra wrote:
> > On Wed, Aug 10, 2022 at 03:33:13PM -0700, Libo Chen wrote:
> >> There are scenarios where non-affine wakeups are incorrectly counted as
> >> affine wakeups by schedstats.
> >>
> >> When wake_affine_idle() returns prev_cpu which doesn't equal to
> >> nr_cpumask_bits, it will slip through the check: target == nr_cpumask_bits
> >> in wake_affine() and be counted as if target == this_cpu in schedstats.
> >>
> >> Replace target == nr_cpumask_bits with target != this_cpu to make sure
> >> affine wakeups are accurately tallied.
> >>
> >> Fixes: 806486c377e33 (sched/fair: Do not migrate if the prev_cpu is idle)
> >> Suggested-by: Daniel Jordan <daniel.m.jordan@oracle.com>
> >> Signed-off-by: Libo Chen <libo.chen@oracle.com>
> >> ---
> >>   kernel/sched/fair.c | 2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index da388657d5ac..b179da4f8105 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -6114,7 +6114,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
> >>              target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);
> >>
> >>      schedstat_inc(p->stats.nr_wakeups_affine_attempts);
> >> -    if (target == nr_cpumask_bits)
> >> +    if (target != this_cpu)
> >>              return prev_cpu;
> >>
> >>      schedstat_inc(sd->ttwu_move_affine);
> > This not only changes the accounting but also the placement, no?
> No, it should only change the accounting. wake_affine() still returns
> prev_cpu if target equals to prev_cpu or nr_cpumask_bits, the same
> behavior as before.

Looks good to me

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

>
>
> Libo


* Re: [PATCH 1/1] sched/fair: Fix inaccurate tally of ttwu_move_affine
  2022-08-10 22:33 [PATCH 1/1] sched/fair: Fix inaccurate tally of ttwu_move_affine Libo Chen
  2022-08-15 11:01 ` Peter Zijlstra
@ 2022-08-25  7:30 ` Gautham R. Shenoy
  2022-08-25  9:13   ` Libo Chen
  2023-04-06 10:05 ` [tip: sched/core] " tip-bot2 for Libo Chen
  2 siblings, 1 reply; 10+ messages in thread
From: Gautham R. Shenoy @ 2022-08-25  7:30 UTC (permalink / raw)
  To: Libo Chen
  Cc: mingo, peterz, vincent.guittot, juri.lelli, dietmar.eggemann,
	mgorman, linux-kernel

On Wed, Aug 10, 2022 at 03:33:13PM -0700, Libo Chen wrote:
> There are scenarios where non-affine wakeups are incorrectly counted as
> affine wakeups by schedstats.
> 
> When wake_affine_idle() returns prev_cpu which doesn't equal to
> nr_cpumask_bits, it will slip through the check: target == nr_cpumask_bits
> in wake_affine() and be counted as if target == this_cpu in schedstats.
> 
> Replace target == nr_cpumask_bits with target != this_cpu to make sure
> affine wakeups are accurately tallied.
> 
> Fixes: 806486c377e33 (sched/fair: Do not migrate if the prev_cpu is idle)
> Suggested-by: Daniel Jordan <daniel.m.jordan@oracle.com>
> Signed-off-by: Libo Chen <libo.chen@oracle.com>
> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index da388657d5ac..b179da4f8105 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6114,7 +6114,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
>  		target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);
>  
>  	schedstat_inc(p->stats.nr_wakeups_affine_attempts);
> -	if (target == nr_cpumask_bits)
> +	if (target != this_cpu)
>  		return prev_cpu;


This seems to be the right thing to do. However,..

if this_cpu and prev_cpu were in the same LLC and we pick prev_cpu,
technically is it still not an affine wakeup?


>  
>  	schedstat_inc(sd->ttwu_move_affine);
> -- 
> 2.31.1
>

--
Thanks and Regards
gautham.


* Re: [PATCH 1/1] sched/fair: Fix inaccurate tally of ttwu_move_affine
  2022-08-25  7:30 ` Gautham R. Shenoy
@ 2022-08-25  9:13   ` Libo Chen
  2023-03-30 10:39     ` Gautham R. Shenoy
  0 siblings, 1 reply; 10+ messages in thread
From: Libo Chen @ 2022-08-25  9:13 UTC (permalink / raw)
  To: Gautham R. Shenoy
  Cc: mingo, peterz, vincent.guittot, juri.lelli, dietmar.eggemann,
	mgorman, linux-kernel, Daniel Jordan



On 8/25/22 00:30, Gautham R. Shenoy wrote:
> On Wed, Aug 10, 2022 at 03:33:13PM -0700, Libo Chen wrote:
>> There are scenarios where non-affine wakeups are incorrectly counted as
>> affine wakeups by schedstats.
>>
>> When wake_affine_idle() returns prev_cpu which doesn't equal to
>> nr_cpumask_bits, it will slip through the check: target == nr_cpumask_bits
>> in wake_affine() and be counted as if target == this_cpu in schedstats.
>>
>> Replace target == nr_cpumask_bits with target != this_cpu to make sure
>> affine wakeups are accurately tallied.
>>
>> Fixes: 806486c377e33 (sched/fair: Do not migrate if the prev_cpu is idle)
>> Suggested-by: Daniel Jordan <daniel.m.jordan@oracle.com>
>> Signed-off-by: Libo Chen <libo.chen@oracle.com>
>> ---
>>   kernel/sched/fair.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index da388657d5ac..b179da4f8105 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6114,7 +6114,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
>>   		target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);
>>   
>>   	schedstat_inc(p->stats.nr_wakeups_affine_attempts);
>> -	if (target == nr_cpumask_bits)
>> +	if (target != this_cpu)
>>   		return prev_cpu;
>
> This seems to be the right thing to do. However,..
>
> if this_cpu and prev_cpu were in the same LLC and we pick prev_cpu,
> technically is it still not an affine wakeup?
>
I think schedstats like ttwu_move_affine/ttwu_wake_remote are defined
per sched domain, so if the waking CPU and the previous CPU are in the
same MC domain, then picking the previous CPU is a remote wakeup within
that MC domain. If the two candidate CPUs are from two different NUMA
nodes, then picking the waking CPU is an affine wakeup within that NUMA
domain. Correct me if I am wrong, but this definition is consistent
across all levels of sched domains.

But I do understand that when the two candidate CPUs are within an LLC:
     a) all fast-path wakeups should be affine wakeups if your
definition of an affine wakeup is a wakeup to the same LLC as the waker.
     b) select_idle_sibling() may pick a CPU in that LLC other than the
two candidate CPUs, which makes the affine/remote stats here useless
even if we stay consistent with the ttwu_move_affine/ttwu_wake_remote
definition.

I personally think it's too much trouble to add additional code in the
kernel to, say, treat all wakeups within an LLC as ttwu_move_affine.
It's a lot easier to do that when you process the schedstats data,
whether you want to treat all wakeups in LLC domains as affine wakeups
or ignore the ttwu_move_affine/ttwu_wake_remote stats from LLC domains.
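
For reference, the sched domain whose ttwu_move_affine counter gets
bumped is the one handed to wake_affine() by select_task_rq_fair(): the
lowest SD_WAKE_AFFINE domain whose span contains both the waking CPU and
prev_cpu. Roughly (a simplified sketch of the domain walk, not the exact
code of this kernel version):

	for_each_domain(cpu, tmp) {
		/*
		 * First (lowest) domain that allows affine wakeups and
		 * spans both the waking CPU and prev_cpu: its
		 * ttwu_move_affine counter describes wakeups at this
		 * level of the topology.
		 */
		if (want_affine && (tmp->flags & SD_WAKE_AFFINE) &&
		    cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))) {
			if (cpu != prev_cpu)
				new_cpu = wake_affine(tmp, p, cpu, prev_cpu, sync);
			break;
		}
	}

(ttwu_wake_remote, by contrast, is incremented in ttwu_stat() in
core.c, also against the lowest domain spanning both CPUs.)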



>>   
>>   	schedstat_inc(sd->ttwu_move_affine);
>> -- 
>> 2.31.1
>>
> --
> Thanks and Regards
> gautham.



* Re: [PATCH 1/1] sched/fair: Fix inaccurate tally of ttwu_move_affine
  2022-08-15 19:19   ` Libo Chen
  2022-08-17 13:20     ` Vincent Guittot
@ 2023-01-09 22:00     ` Libo Chen
  2023-03-09  3:17       ` Libo Chen
  1 sibling, 1 reply; 10+ messages in thread
From: Libo Chen @ 2023-01-09 22:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, vincent.guittot, juri.lelli, dietmar.eggemann, mgorman,
	linux-kernel, Daniel Jordan

Hi Peter,

A gentle ping~ Vincent has given his Reviewed-by. Let me know what else
I should do for this patch.

Libo

On 8/15/22 12:19 PM, Libo Chen wrote:
>
>
> On 8/15/22 04:01, Peter Zijlstra wrote:
>> On Wed, Aug 10, 2022 at 03:33:13PM -0700, Libo Chen wrote:
>>> There are scenarios where non-affine wakeups are incorrectly counted as
>>> affine wakeups by schedstats.
>>>
>>> When wake_affine_idle() returns prev_cpu which doesn't equal to
>>> nr_cpumask_bits, it will slip through the check: target == 
>>> nr_cpumask_bits
>>> in wake_affine() and be counted as if target == this_cpu in schedstats.
>>>
>>> Replace target == nr_cpumask_bits with target != this_cpu to make sure
>>> affine wakeups are accurately tallied.
>>>
>>> Fixes: 806486c377e33 (sched/fair: Do not migrate if the prev_cpu is 
>>> idle)
>>> Suggested-by: Daniel Jordan <daniel.m.jordan@oracle.com>
>>> Signed-off-by: Libo Chen <libo.chen@oracle.com>
>>> ---
>>>   kernel/sched/fair.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index da388657d5ac..b179da4f8105 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -6114,7 +6114,7 @@ static int wake_affine(struct sched_domain 
>>> *sd, struct task_struct *p,
>>>           target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);
>>>         schedstat_inc(p->stats.nr_wakeups_affine_attempts);
>>> -    if (target == nr_cpumask_bits)
>>> +    if (target != this_cpu)
>>>           return prev_cpu;
>>>         schedstat_inc(sd->ttwu_move_affine);
>> This not only changes the accounting but also the placement, no?
> No, it should only change the accounting. wake_affine() still returns 
> prev_cpu if target equals to prev_cpu or nr_cpumask_bits, the same 
> behavior as before.
>
>
> Libo



* Re: [PATCH 1/1] sched/fair: Fix inaccurate tally of ttwu_move_affine
  2023-01-09 22:00     ` Libo Chen
@ 2023-03-09  3:17       ` Libo Chen
  0 siblings, 0 replies; 10+ messages in thread
From: Libo Chen @ 2023-03-09  3:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, vincent.guittot, juri.lelli, dietmar.eggemann, mgorman,
	linux-kernel, Daniel Jordan



> On Jan 9, 2023, at 2:00 PM, Libo Chen <libo.chen@oracle.com> wrote:
> 
> Hi Peter,
> 
> A gentle ping~ Vincent has signed it off. Let me know what else I should do for this patch.
> 
> Libo
> 
> On 8/15/22 12:19 PM, Libo Chen wrote:
>> 
>> 
>> On 8/15/22 04:01, Peter Zijlstra wrote:
>>> On Wed, Aug 10, 2022 at 03:33:13PM -0700, Libo Chen wrote:
>>>> There are scenarios where non-affine wakeups are incorrectly counted as
>>>> affine wakeups by schedstats.
>>>> 
>>>> When wake_affine_idle() returns prev_cpu which doesn't equal to
>>>> nr_cpumask_bits, it will slip through the check: target == nr_cpumask_bits
>>>> in wake_affine() and be counted as if target == this_cpu in schedstats.
>>>> 
>>>> Replace target == nr_cpumask_bits with target != this_cpu to make sure
>>>> affine wakeups are accurately tallied.
>>>> 
>>>> Fixes: 806486c377e33 (sched/fair: Do not migrate if the prev_cpu is idle)
>>>> Suggested-by: Daniel Jordan <daniel.m.jordan@oracle.com>
>>>> Signed-off-by: Libo Chen <libo.chen@oracle.com>
>>>> ---
>>>>   kernel/sched/fair.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>> 
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index da388657d5ac..b179da4f8105 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -6114,7 +6114,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
>>>>           target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);
>>>>         schedstat_inc(p->stats.nr_wakeups_affine_attempts);
>>>> -    if (target == nr_cpumask_bits)
>>>> +    if (target != this_cpu)
>>>>           return prev_cpu;
>>>>         schedstat_inc(sd->ttwu_move_affine);
>>> This not only changes the accounting but also the placement, no?
>> No, it should only change the accounting. wake_affine() still returns prev_cpu if target equals to prev_cpu or nr_cpumask_bits, the same behavior as before.
>> 

Hi Peter,

A second ping in case you missed the first one. What else should I do to get this fix in?


Libo 

>> 
>> Libo
> 



* Re: [PATCH 1/1] sched/fair: Fix inaccurate tally of ttwu_move_affine
  2022-08-25  9:13   ` Libo Chen
@ 2023-03-30 10:39     ` Gautham R. Shenoy
  0 siblings, 0 replies; 10+ messages in thread
From: Gautham R. Shenoy @ 2023-03-30 10:39 UTC (permalink / raw)
  To: Libo Chen
  Cc: mingo, peterz, vincent.guittot, juri.lelli, dietmar.eggemann,
	mgorman, linux-kernel, Daniel Jordan

Hello Libo,

On Thu, Aug 25, 2022 at 02:13:17AM -0700, Libo Chen wrote:
>

Sorry, looks like this message got buried under the pile.

> 
> On 8/25/22 00:30, Gautham R. Shenoy wrote:
> > On Wed, Aug 10, 2022 at 03:33:13PM -0700, Libo Chen wrote:
> > > There are scenarios where non-affine wakeups are incorrectly counted as
> > > affine wakeups by schedstats.
> > > 
> > > When wake_affine_idle() returns prev_cpu which doesn't equal to
> > > nr_cpumask_bits, it will slip through the check: target == nr_cpumask_bits
> > > in wake_affine() and be counted as if target == this_cpu in schedstats.
> > > 
> > > Replace target == nr_cpumask_bits with target != this_cpu to make sure
> > > affine wakeups are accurately tallied.
> > > 
> > > Fixes: 806486c377e33 (sched/fair: Do not migrate if the prev_cpu is idle)
> > > Suggested-by: Daniel Jordan <daniel.m.jordan@oracle.com>
> > > Signed-off-by: Libo Chen <libo.chen@oracle.com>
> > > ---
> > >   kernel/sched/fair.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index da388657d5ac..b179da4f8105 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -6114,7 +6114,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
> > >   		target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);
> > >   	schedstat_inc(p->stats.nr_wakeups_affine_attempts);
> > > -	if (target == nr_cpumask_bits)
> > > +	if (target != this_cpu)
> > >   		return prev_cpu;
> > 
> > This seems to be the right thing to do. However,..
> > 
> > if this_cpu and prev_cpu were in the same LLC and we pick prev_cpu,
> > technically is it still not an affine wakeup?
> > 
> I think schedstats like ttwu_move_affine/ttwu_wake_remote is defined within
> a sched domain, so if the waking CPU and the previous CPU are in the same MC
> domain, then picking the previous CPU is a remote wakeup
> within that MC. If the two candidate CPUs are from two different NUMA nodes,
> then picking the waking CPU is an affine wakeup within that NUMA domain.
> Correct me if I am wrong, this definition is consistent across
> all levels of sched domains.

Yes, the definition of ttwu_wake_remote in the lowest sched-domain
containing both prev_cpu and this_cpu is target_cpu != this_cpu. This
is fairly unambiguous.


From the code, ttwu_move_affine captures an _attempt_ to choose
this_cpu as the target CPU in the lowest sched-domain containing both
prev_cpu and this_cpu. It is merely an attempt, since the actual target
CPU is selected by SIS and could be any idle CPU in the LLC of
prev_cpu/this_cpu.

ttwu_move_affine makes sense for sched-domains higher than the LLC
domain, since we move the task to the LLC of this_cpu and away from the
LLC of prev_cpu (this is possible on AMD processors, which contain
multiple LLC domains within a NUMA node). Having given it some more
thought, I am not sure how to interpret this metric for the LLC domain
and lower ones, since the eventual target CPU may not even be "closer"
to this_cpu.
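
To make the SIS point concrete, wake_affine() only seeds the search; a
rough sketch of the surrounding flow in select_task_rq_fair()
(simplified, not verbatim):

	if (want_affine)
		/* may bump ttwu_move_affine if it picks this_cpu */
		new_cpu = wake_affine(sd, p, cpu, prev_cpu, sync);

	/*
	 * SIS can then return any idle CPU sharing a last-level cache
	 * with new_cpu, so the final placement may match neither
	 * this_cpu nor prev_cpu, which is what makes the counter hard
	 * to interpret at and below the LLC level.
	 */
	new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);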

> 
> But I do understand that when two candidate CPUs are within an LLC,
>     a) all the fast-path wakeups should be affine wakeups if your definition
> of an affine wakeup is a wakeup to the same LLC of the waker.
>     b) select_idle_sibling() may pick a CPU in that LLC other than the two
> candidate CPUs which makes the affine/remote stats here useless even if we
> are consistent with ttwu_move_affine/ttwu_wake_remote
>        definition.

Fair enough.

> 
> I personally think it's just too much trouble to add additional code in the
> kernel to, let's say, treat all wakeups within an LLC as ttwu_move_affine.
> It's a lot easier to do that when you process schedstats data,
> whether you want to treat all wakeups in LLC domains as affine wakeups or
> ignore ttwu_move_affine/ttwu_wake_remote stats from LLC domains.


I agree.

I think having your fix is the right thing to do, since currently the
move_affine data in schedstats isn't accurate when wake_affine_idle()
or wake_affine_weight() picks prev_cpu, especially when prev_cpu and
this_cpu are in sched-domains higher than the LLC. Thus today we
overcount affine wakeups, which is incorrect.

So,
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>

--
Thanks and Regards
gautham.


* [tip: sched/core] sched/fair: Fix inaccurate tally of ttwu_move_affine
  2022-08-10 22:33 [PATCH 1/1] sched/fair: Fix inaccurate tally of ttwu_move_affine Libo Chen
  2022-08-15 11:01 ` Peter Zijlstra
  2022-08-25  7:30 ` Gautham R. Shenoy
@ 2023-04-06 10:05 ` tip-bot2 for Libo Chen
  2 siblings, 0 replies; 10+ messages in thread
From: tip-bot2 for Libo Chen @ 2023-04-06 10:05 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Daniel Jordan, Libo Chen, Peter Zijlstra (Intel),
	Gautham R. Shenoy, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     39afe5d6fc59237ff7738bf3ede5a8856822d59d
Gitweb:        https://git.kernel.org/tip/39afe5d6fc59237ff7738bf3ede5a8856822d59d
Author:        Libo Chen <libo.chen@oracle.com>
AuthorDate:    Wed, 10 Aug 2022 15:33:13 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 05 Apr 2023 09:58:48 +02:00

sched/fair: Fix inaccurate tally of ttwu_move_affine

There are scenarios where non-affine wakeups are incorrectly counted as
affine wakeups by schedstats.

When wake_affine_idle() returns prev_cpu which doesn't equal to
nr_cpumask_bits, it will slip through the check: target == nr_cpumask_bits
in wake_affine() and be counted as if target == this_cpu in schedstats.

Replace target == nr_cpumask_bits with target != this_cpu to make sure
affine wakeups are accurately tallied.

Fixes: 806486c377e33 (sched/fair: Do not migrate if the prev_cpu is idle)
Suggested-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Libo Chen <libo.chen@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20220810223313.386614-1-libo.chen@oracle.com
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bc358dc..f5da01a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6582,7 +6582,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
 		target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);
 
 	schedstat_inc(p->stats.nr_wakeups_affine_attempts);
-	if (target == nr_cpumask_bits)
+	if (target != this_cpu)
 		return prev_cpu;
 
 	schedstat_inc(sd->ttwu_move_affine);

