All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] sched/fair: Track target domain's avg_scan_cost in select_idle_cpu
@ 2021-12-11 10:43 Yicong Yang
  2021-12-22 10:47 ` Vincent Guittot
  0 siblings, 1 reply; 8+ messages in thread
From: Yicong Yang @ 2021-12-11 10:43 UTC (permalink / raw)
  To: peterz, mingo, juri.lelli, vincent.guittot, mgorman, linux-kernel
  Cc: dietmar.eggemann, rostedt, bsegall, bristot, prime.zeng,
	yangyicong, linuxarm, 21cnbao

We regulate the LLC domain scan in select_idle_cpu() by comparing
the average scan cost of this_sd against the average idle time of
this_rq. This is correct when the domain to scan is the LLC domain
of this cpu. But when the domain to scan is different from this
LLC domain, we'll have an inaccurate estimation of the scan cost
on the target domain as this_sd->avg_scan_cost contains contributions
of scanning other domains besides the target domain.

Track the avg_scan_cost of the target domain to make the estimation
more accurate.

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
---
 kernel/sched/fair.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6e476f6d9435..6301740d98cb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6267,7 +6267,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
 		}
 
 		avg_idle = this_rq->wake_avg_idle;
-		avg_cost = this_sd->avg_scan_cost + 1;
+		avg_cost = sd->avg_scan_cost + 1;
 
 		span_avg = sd->span_weight * avg_idle;
 		if (span_avg > 4*avg_cost)
@@ -6305,7 +6305,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
 		 */
 		this_rq->wake_avg_idle -= min(this_rq->wake_avg_idle, time);
 
-		update_avg(&this_sd->avg_scan_cost, time);
+		update_avg(&sd->avg_scan_cost, time);
 	}
 
 	return idle_cpu;
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] sched/fair: Track target domain's avg_scan_cost in select_idle_cpu
  2021-12-11 10:43 [PATCH] sched/fair: Track target domain's avg_scan_cost in select_idle_cpu Yicong Yang
@ 2021-12-22 10:47 ` Vincent Guittot
  2021-12-23  8:23   ` Yicong Yang
  0 siblings, 1 reply; 8+ messages in thread
From: Vincent Guittot @ 2021-12-22 10:47 UTC (permalink / raw)
  To: Yicong Yang
  Cc: peterz, mingo, juri.lelli, mgorman, linux-kernel,
	dietmar.eggemann, rostedt, bsegall, bristot, prime.zeng,
	linuxarm, 21cnbao

On Sat, 11 Dec 2021 at 11:43, Yicong Yang <yangyicong@hisilicon.com> wrote:
>
> We regulate the LLC domain scan in select_idle_cpu() by comparing
> the average scan cost of this_sd against the average idle time of
> this_rq. This is correct when the domain to scan is the LLC domain
> of this cpu. But when the domain to scan is different from this
> LLC domain, we'll have an inaccurate estimation of the scan cost
> on the target domain as this_sd->avg_scan_cost contains contributions
> of scanning other domains besides the target domain.
>
> Track the avg_scan_cost of the target domain to make the estimation
> more accurate.
>
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> ---
>  kernel/sched/fair.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6e476f6d9435..6301740d98cb 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6267,7 +6267,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
>                 }
>
>                 avg_idle = this_rq->wake_avg_idle;
> -               avg_cost = this_sd->avg_scan_cost + 1;
> +               avg_cost = sd->avg_scan_cost + 1;
>
>                 span_avg = sd->span_weight * avg_idle;
>                 if (span_avg > 4*avg_cost)
> @@ -6305,7 +6305,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
>                  */
>                 this_rq->wake_avg_idle -= min(this_rq->wake_avg_idle, time);
>
> -               update_avg(&this_sd->avg_scan_cost, time);
> +               update_avg(&sd->avg_scan_cost, time);

But then you can have several cpus updating the same value simultaneously

>         }
>
>         return idle_cpu;
> --
> 2.33.0
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] sched/fair: Track target domain's avg_scan_cost in select_idle_cpu
  2021-12-22 10:47 ` Vincent Guittot
@ 2021-12-23  8:23   ` Yicong Yang
  2022-01-04 14:18     ` Vincent Guittot
  0 siblings, 1 reply; 8+ messages in thread
From: Yicong Yang @ 2021-12-23  8:23 UTC (permalink / raw)
  To: Vincent Guittot, Yicong Yang
  Cc: peterz, mingo, juri.lelli, mgorman, linux-kernel,
	dietmar.eggemann, rostedt, bsegall, bristot, prime.zeng,
	linuxarm, 21cnbao

On 2021/12/22 18:47, Vincent Guittot wrote:
> On Sat, 11 Dec 2021 at 11:43, Yicong Yang <yangyicong@hisilicon.com> wrote:
>>
>> We regulate the LLC domain scan in select_idle_cpu() by comparing
>> the average scan cost of this_sd against the average idle time of
>> this_rq. This is correct when the domain to scan is the LLC domain
>> of this cpu. But when the domain to scan is different from this
>> LLC domain, we'll have an inaccurate estimation of the scan cost
>> on the target domain as this_sd->avg_scan_cost contains contributions
>> of scanning other domains besides the target domain.
>>
>> Track the avg_scan_cost of the target domain to make the estimation
>> more accurate.
>>
>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
>> ---
>>  kernel/sched/fair.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 6e476f6d9435..6301740d98cb 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6267,7 +6267,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
>>                 }
>>
>>                 avg_idle = this_rq->wake_avg_idle;
>> -               avg_cost = this_sd->avg_scan_cost + 1;
>> +               avg_cost = sd->avg_scan_cost + 1;
>>
>>                 span_avg = sd->span_weight * avg_idle;
>>                 if (span_avg > 4*avg_cost)
>> @@ -6305,7 +6305,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
>>                  */
>>                 this_rq->wake_avg_idle -= min(this_rq->wake_avg_idle, time);
>>
>> -               update_avg(&this_sd->avg_scan_cost, time);
>> +               update_avg(&sd->avg_scan_cost, time);
> 
> But then you can have several cpus updating the same value simultaneously
> 

yes. sd->avg_scan_cost should includes the contributions of all the cpus scanned the sd.

We regulated the scanning nr based on two things:
- avg_idle: to indicate how much time we can have for this time scanning
- avg_cost: to indicate how much time we'll spend for scanning the target domain based
            on the history cost

Previously sd->avg_scan_cost may not reflect the cost as it count the scanning cost
on the domain of the scanner cpu, which may not be the domain the cpu scanned.
For example, cpu 0 on llc A scanned llc B and llc C, we'll count the cost of scanning B
and C on llc A's avg_scan_cost and we'll use this to estimate the cost for scanning
llc A, which is not accurate.

>>         }
>>
>>         return idle_cpu;
>> --
>> 2.33.0
>>
> .
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] sched/fair: Track target domain's avg_scan_cost in select_idle_cpu
  2021-12-23  8:23   ` Yicong Yang
@ 2022-01-04 14:18     ` Vincent Guittot
  2022-01-06  9:10       ` Yicong Yang
  0 siblings, 1 reply; 8+ messages in thread
From: Vincent Guittot @ 2022-01-04 14:18 UTC (permalink / raw)
  To: Yicong Yang
  Cc: Yicong Yang, peterz, mingo, juri.lelli, mgorman, linux-kernel,
	dietmar.eggemann, rostedt, bsegall, bristot, prime.zeng,
	linuxarm, 21cnbao

On Thu, 23 Dec 2021 at 09:23, Yicong Yang <yangyicong@huawei.com> wrote:
>
> On 2021/12/22 18:47, Vincent Guittot wrote:
> > On Sat, 11 Dec 2021 at 11:43, Yicong Yang <yangyicong@hisilicon.com> wrote:
> >>
> >> We regulate the LLC domain scan in select_idle_cpu() by comparing
> >> the average scan cost of this_sd against the average idle time of
> >> this_rq. This is correct when the domain to scan is the LLC domain
> >> of this cpu. But when the domain to scan is different from this
> >> LLC domain, we'll have an inaccurate estimation of the scan cost
> >> on the target domain as this_sd->avg_scan_cost contains contributions
> >> of scanning other domains besides the target domain.
> >>
> >> Track the avg_scan_cost of the target domain to make the estimation
> >> more accurate.
> >>
> >> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> >> ---
> >>  kernel/sched/fair.c | 4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index 6e476f6d9435..6301740d98cb 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -6267,7 +6267,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> >>                 }
> >>
> >>                 avg_idle = this_rq->wake_avg_idle;
> >> -               avg_cost = this_sd->avg_scan_cost + 1;
> >> +               avg_cost = sd->avg_scan_cost + 1;
> >>
> >>                 span_avg = sd->span_weight * avg_idle;
> >>                 if (span_avg > 4*avg_cost)
> >> @@ -6305,7 +6305,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> >>                  */
> >>                 this_rq->wake_avg_idle -= min(this_rq->wake_avg_idle, time);
> >>
> >> -               update_avg(&this_sd->avg_scan_cost, time);
> >> +               update_avg(&sd->avg_scan_cost, time);
> >
> > But then you can have several cpus updating the same value simultaneously
> >
>
> yes. sd->avg_scan_cost should includes the contributions of all the cpus scanned the sd.
>
> We regulated the scanning nr based on two things:
> - avg_idle: to indicate how much time we can have for this time scanning
> - avg_cost: to indicate how much time we'll spend for scanning the target domain based
>             on the history cost
>
> Previously sd->avg_scan_cost may not reflect the cost as it count the scanning cost
> on the domain of the scanner cpu, which may not be the domain the cpu scanned.
> For example, cpu 0 on llc A scanned llc B and llc C, we'll count the cost of scanning B
> and C on llc A's avg_scan_cost and we'll use this to estimate the cost for scanning
> llc A, which is not accurate.

I mean that you can now have several CPUs that will read, modify,
write sd->avg_scan_cost simultaneously without any protection

>
> >>         }
> >>
> >>         return idle_cpu;
> >> --
> >> 2.33.0
> >>
> > .
> >

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] sched/fair: Track target domain's avg_scan_cost in select_idle_cpu
  2022-01-04 14:18     ` Vincent Guittot
@ 2022-01-06  9:10       ` Yicong Yang
  2022-01-11  7:58         ` Vincent Guittot
  0 siblings, 1 reply; 8+ messages in thread
From: Yicong Yang @ 2022-01-06  9:10 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: yangyicong, peterz, mingo, juri.lelli, mgorman, linux-kernel,
	dietmar.eggemann, rostedt, bsegall, bristot, prime.zeng,
	linuxarm, 21cnbao, tiantao (H)

On 2022/1/4 22:18, Vincent Guittot wrote:
> On Thu, 23 Dec 2021 at 09:23, Yicong Yang <yangyicong@huawei.com> wrote:
>>
>> On 2021/12/22 18:47, Vincent Guittot wrote:
>>> On Sat, 11 Dec 2021 at 11:43, Yicong Yang <yangyicong@hisilicon.com> wrote:
>>>>
>>>> We regulate the LLC domain scan in select_idle_cpu() by comparing
>>>> the average scan cost of this_sd against the average idle time of
>>>> this_rq. This is correct when the domain to scan is the LLC domain
>>>> of this cpu. But when the domain to scan is different from this
>>>> LLC domain, we'll have an inaccurate estimation of the scan cost
>>>> on the target domain as this_sd->avg_scan_cost contains contributions
>>>> of scanning other domains besides the target domain.
>>>>
>>>> Track the avg_scan_cost of the target domain to make the estimation
>>>> more accurate.
>>>>
>>>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
>>>> ---
>>>>  kernel/sched/fair.c | 4 ++--
>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index 6e476f6d9435..6301740d98cb 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -6267,7 +6267,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
>>>>                 }
>>>>
>>>>                 avg_idle = this_rq->wake_avg_idle;
>>>> -               avg_cost = this_sd->avg_scan_cost + 1;
>>>> +               avg_cost = sd->avg_scan_cost + 1;
>>>>
>>>>                 span_avg = sd->span_weight * avg_idle;
>>>>                 if (span_avg > 4*avg_cost)
>>>> @@ -6305,7 +6305,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
>>>>                  */
>>>>                 this_rq->wake_avg_idle -= min(this_rq->wake_avg_idle, time);
>>>>
>>>> -               update_avg(&this_sd->avg_scan_cost, time);
>>>> +               update_avg(&sd->avg_scan_cost, time);
>>>
>>> But then you can have several cpus updating the same value simultaneously
>>>
>>
>> yes. sd->avg_scan_cost should includes the contributions of all the cpus scanned the sd.
>>
>> We regulated the scanning nr based on two things:
>> - avg_idle: to indicate how much time we can have for this time scanning
>> - avg_cost: to indicate how much time we'll spend for scanning the target domain based
>>             on the history cost
>>
>> Previously sd->avg_scan_cost may not reflect the cost as it count the scanning cost
>> on the domain of the scanner cpu, which may not be the domain the cpu scanned.
>> For example, cpu 0 on llc A scanned llc B and llc C, we'll count the cost of scanning B
>> and C on llc A's avg_scan_cost and we'll use this to estimate the cost for scanning
>> llc A, which is not accurate.
> 
> I mean that you can now have several CPUs that will read, modify,
> write sd->avg_scan_cost simultaneously without any protection
> 

uh I misunderstood. not sure I've missed something, but looks like we also have this problem
when updating &this_sd->avg_scan_cost?

>>
>>>>         }
>>>>
>>>>         return idle_cpu;
>>>> --
>>>> 2.33.0
>>>>
>>> .
>>>
> .
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] sched/fair: Track target domain's avg_scan_cost in select_idle_cpu
  2022-01-06  9:10       ` Yicong Yang
@ 2022-01-11  7:58         ` Vincent Guittot
  2022-01-11  8:31           ` Barry Song
  0 siblings, 1 reply; 8+ messages in thread
From: Vincent Guittot @ 2022-01-11  7:58 UTC (permalink / raw)
  To: Yicong Yang
  Cc: yangyicong, peterz, mingo, juri.lelli, mgorman, linux-kernel,
	dietmar.eggemann, rostedt, bsegall, bristot, prime.zeng,
	linuxarm, 21cnbao, tiantao (H)

On Thu, 6 Jan 2022 at 10:10, Yicong Yang <yangyicong@huawei.com> wrote:
>
> On 2022/1/4 22:18, Vincent Guittot wrote:
> > On Thu, 23 Dec 2021 at 09:23, Yicong Yang <yangyicong@huawei.com> wrote:
> >>
> >> On 2021/12/22 18:47, Vincent Guittot wrote:
> >>> On Sat, 11 Dec 2021 at 11:43, Yicong Yang <yangyicong@hisilicon.com> wrote:
> >>>>
> >>>> We regulate the LLC domain scan in select_idle_cpu() by comparing
> >>>> the average scan cost of this_sd against the average idle time of
> >>>> this_rq. This is correct when the domain to scan is the LLC domain
> >>>> of this cpu. But when the domain to scan is different from this
> >>>> LLC domain, we'll have an inaccurate estimation of the scan cost
> >>>> on the target domain as this_sd->avg_scan_cost contains contributions
> >>>> of scanning other domains besides the target domain.
> >>>>
> >>>> Track the avg_scan_cost of the target domain to make the estimation
> >>>> more accurate.
> >>>>
> >>>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> >>>> ---
> >>>>  kernel/sched/fair.c | 4 ++--
> >>>>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >>>> index 6e476f6d9435..6301740d98cb 100644
> >>>> --- a/kernel/sched/fair.c
> >>>> +++ b/kernel/sched/fair.c
> >>>> @@ -6267,7 +6267,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> >>>>                 }
> >>>>
> >>>>                 avg_idle = this_rq->wake_avg_idle;
> >>>> -               avg_cost = this_sd->avg_scan_cost + 1;
> >>>> +               avg_cost = sd->avg_scan_cost + 1;
> >>>>
> >>>>                 span_avg = sd->span_weight * avg_idle;
> >>>>                 if (span_avg > 4*avg_cost)
> >>>> @@ -6305,7 +6305,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> >>>>                  */
> >>>>                 this_rq->wake_avg_idle -= min(this_rq->wake_avg_idle, time);
> >>>>
> >>>> -               update_avg(&this_sd->avg_scan_cost, time);
> >>>> +               update_avg(&sd->avg_scan_cost, time);
> >>>
> >>> But then you can have several cpus updating the same value simultaneously
> >>>
> >>
> >> yes. sd->avg_scan_cost should includes the contributions of all the cpus scanned the sd.
> >>
> >> We regulated the scanning nr based on two things:
> >> - avg_idle: to indicate how much time we can have for this time scanning
> >> - avg_cost: to indicate how much time we'll spend for scanning the target domain based
> >>             on the history cost
> >>
> >> Previously sd->avg_scan_cost may not reflect the cost as it count the scanning cost
> >> on the domain of the scanner cpu, which may not be the domain the cpu scanned.
> >> For example, cpu 0 on llc A scanned llc B and llc C, we'll count the cost of scanning B
> >> and C on llc A's avg_scan_cost and we'll use this to estimate the cost for scanning
> >> llc A, which is not accurate.
> >
> > I mean that you can now have several CPUs that will read, modify,
> > write sd->avg_scan_cost simultaneously without any protection
> >
>
> uh I misunderstood. not sure I've missed something, but looks like we also have this problem
> when updating &this_sd->avg_scan_cost?

No because this_sd->avg_scan_cost is only used by the local cpu so you
don't have several cpus trying to update it simultaneously.

>
> >>
> >>>>         }
> >>>>
> >>>>         return idle_cpu;
> >>>> --
> >>>> 2.33.0
> >>>>
> >>> .
> >>>
> > .
> >

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] sched/fair: Track target domain's avg_scan_cost in select_idle_cpu
  2022-01-11  7:58         ` Vincent Guittot
@ 2022-01-11  8:31           ` Barry Song
  2022-01-11 16:30             ` Vincent Guittot
  0 siblings, 1 reply; 8+ messages in thread
From: Barry Song @ 2022-01-11  8:31 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Yicong Yang, Yicong Yang, Peter Zijlstra, Ingo Molnar,
	Juri Lelli, Mel Gorman, LKML, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Daniel Bristot de Oliveira, prime.zeng, Linuxarm,
	tiantao (H)

On Tue, Jan 11, 2022 at 8:59 PM Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> On Thu, 6 Jan 2022 at 10:10, Yicong Yang <yangyicong@huawei.com> wrote:
> >
> > On 2022/1/4 22:18, Vincent Guittot wrote:
> > > On Thu, 23 Dec 2021 at 09:23, Yicong Yang <yangyicong@huawei.com> wrote:
> > >>
> > >> On 2021/12/22 18:47, Vincent Guittot wrote:
> > >>> On Sat, 11 Dec 2021 at 11:43, Yicong Yang <yangyicong@hisilicon.com> wrote:
> > >>>>
> > >>>> We regulate the LLC domain scan in select_idle_cpu() by comparing
> > >>>> the average scan cost of this_sd against the average idle time of
> > >>>> this_rq. This is correct when the domain to scan is the LLC domain
> > >>>> of this cpu. But when the domain to scan is different from this
> > >>>> LLC domain, we'll have an inaccurate estimation of the scan cost
> > >>>> on the target domain as this_sd->avg_scan_cost contains contributions
> > >>>> of scanning other domains besides the target domain.
> > >>>>
> > >>>> Track the avg_scan_cost of the target domain to make the estimation
> > >>>> more accurate.
> > >>>>
> > >>>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> > >>>> ---
> > >>>>  kernel/sched/fair.c | 4 ++--
> > >>>>  1 file changed, 2 insertions(+), 2 deletions(-)
> > >>>>
> > >>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > >>>> index 6e476f6d9435..6301740d98cb 100644
> > >>>> --- a/kernel/sched/fair.c
> > >>>> +++ b/kernel/sched/fair.c
> > >>>> @@ -6267,7 +6267,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> > >>>>                 }
> > >>>>
> > >>>>                 avg_idle = this_rq->wake_avg_idle;
> > >>>> -               avg_cost = this_sd->avg_scan_cost + 1;
> > >>>> +               avg_cost = sd->avg_scan_cost + 1;
> > >>>>
> > >>>>                 span_avg = sd->span_weight * avg_idle;
> > >>>>                 if (span_avg > 4*avg_cost)
> > >>>> @@ -6305,7 +6305,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> > >>>>                  */
> > >>>>                 this_rq->wake_avg_idle -= min(this_rq->wake_avg_idle, time);
> > >>>>
> > >>>> -               update_avg(&this_sd->avg_scan_cost, time);
> > >>>> +               update_avg(&sd->avg_scan_cost, time);
> > >>>
> > >>> But then you can have several cpus updating the same value simultaneously
> > >>>
> > >>
> > >> yes. sd->avg_scan_cost should includes the contributions of all the cpus scanned the sd.
> > >>
> > >> We regulated the scanning nr based on two things:
> > >> - avg_idle: to indicate how much time we can have for this time scanning
> > >> - avg_cost: to indicate how much time we'll spend for scanning the target domain based
> > >>             on the history cost
> > >>
> > >> Previously sd->avg_scan_cost may not reflect the cost as it count the scanning cost
> > >> on the domain of the scanner cpu, which may not be the domain the cpu scanned.
> > >> For example, cpu 0 on llc A scanned llc B and llc C, we'll count the cost of scanning B
> > >> and C on llc A's avg_scan_cost and we'll use this to estimate the cost for scanning
> > >> llc A, which is not accurate.
> > >
> > > I mean that you can now have several CPUs that will read, modify,
> > > write sd->avg_scan_cost simultaneously without any protection
> > >
> >
> > uh I misunderstood. not sure I've missed something, but looks like we also have this problem
> > when updating &this_sd->avg_scan_cost?
>
> No because this_sd->avg_scan_cost is only used by the local cpu so you
> don't have several cpus trying to update it simultaneously.

Though this is a problem, does the original idea in this patch make some sense?
CPUx might scan its own LLC and it might scan other LLCs.  the scan cost should
be depending on how busy the target LLC is?

it seems improper to have one single scan cost for all targets/LLCs?

>
> >
> > >>
> > >>>>         }
> > >>>>
> > >>>>         return idle_cpu;
> > >>>> --
> > >>>> 2.33.0
> > >>>>
> > >>> .
> > >>>
> > > .
> > >

Thanks
Barry

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] sched/fair: Track target domain's avg_scan_cost in select_idle_cpu
  2022-01-11  8:31           ` Barry Song
@ 2022-01-11 16:30             ` Vincent Guittot
  0 siblings, 0 replies; 8+ messages in thread
From: Vincent Guittot @ 2022-01-11 16:30 UTC (permalink / raw)
  To: Barry Song
  Cc: Yicong Yang, Yicong Yang, Peter Zijlstra, Ingo Molnar,
	Juri Lelli, Mel Gorman, LKML, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Daniel Bristot de Oliveira, prime.zeng, Linuxarm,
	tiantao (H)

On Tue, 11 Jan 2022 at 09:32, Barry Song <21cnbao@gmail.com> wrote:
>
> On Tue, Jan 11, 2022 at 8:59 PM Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
> >
> > On Thu, 6 Jan 2022 at 10:10, Yicong Yang <yangyicong@huawei.com> wrote:
> > >
> > > On 2022/1/4 22:18, Vincent Guittot wrote:
> > > > On Thu, 23 Dec 2021 at 09:23, Yicong Yang <yangyicong@huawei.com> wrote:
> > > >>
> > > >> On 2021/12/22 18:47, Vincent Guittot wrote:
> > > >>> On Sat, 11 Dec 2021 at 11:43, Yicong Yang <yangyicong@hisilicon.com> wrote:
> > > >>>>
> > > >>>> We regulate the LLC domain scan in select_idle_cpu() by comparing
> > > >>>> the average scan cost of this_sd against the average idle time of
> > > >>>> this_rq. This is correct when the domain to scan is the LLC domain
> > > >>>> of this cpu. But when the domain to scan is different from this
> > > >>>> LLC domain, we'll have an inaccurate estimation of the scan cost
> > > >>>> on the target domain as this_sd->avg_scan_cost contains contributions
> > > >>>> of scanning other domains besides the target domain.
> > > >>>>
> > > >>>> Track the avg_scan_cost of the target domain to make the estimation
> > > >>>> more accurate.
> > > >>>>
> > > >>>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> > > >>>> ---
> > > >>>>  kernel/sched/fair.c | 4 ++--
> > > >>>>  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >>>>
> > > >>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > >>>> index 6e476f6d9435..6301740d98cb 100644
> > > >>>> --- a/kernel/sched/fair.c
> > > >>>> +++ b/kernel/sched/fair.c
> > > >>>> @@ -6267,7 +6267,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> > > >>>>                 }
> > > >>>>
> > > >>>>                 avg_idle = this_rq->wake_avg_idle;
> > > >>>> -               avg_cost = this_sd->avg_scan_cost + 1;
> > > >>>> +               avg_cost = sd->avg_scan_cost + 1;
> > > >>>>
> > > >>>>                 span_avg = sd->span_weight * avg_idle;
> > > >>>>                 if (span_avg > 4*avg_cost)
> > > >>>> @@ -6305,7 +6305,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> > > >>>>                  */
> > > >>>>                 this_rq->wake_avg_idle -= min(this_rq->wake_avg_idle, time);
> > > >>>>
> > > >>>> -               update_avg(&this_sd->avg_scan_cost, time);
> > > >>>> +               update_avg(&sd->avg_scan_cost, time);
> > > >>>
> > > >>> But then you can have several cpus updating the same value simultaneously
> > > >>>
> > > >>
> > > >> yes. sd->avg_scan_cost should includes the contributions of all the cpus scanned the sd.
> > > >>
> > > >> We regulated the scanning nr based on two things:
> > > >> - avg_idle: to indicate how much time we can have for this time scanning
> > > >> - avg_cost: to indicate how much time we'll spend for scanning the target domain based
> > > >>             on the history cost
> > > >>
> > > >> Previously sd->avg_scan_cost may not reflect the cost as it count the scanning cost
> > > >> on the domain of the scanner cpu, which may not be the domain the cpu scanned.
> > > >> For example, cpu 0 on llc A scanned llc B and llc C, we'll count the cost of scanning B
> > > >> and C on llc A's avg_scan_cost and we'll use this to estimate the cost for scanning
> > > >> llc A, which is not accurate.
> > > >
> > > > I mean that you can now have several CPUs that will read, modify,
> > > > write sd->avg_scan_cost simultaneously without any protection
> > > >
> > >
> > > uh I misunderstood. not sure I've missed something, but looks like we also have this problem
> > > when updating &this_sd->avg_scan_cost?
> >
> > No because this_sd->avg_scan_cost is only used by the local cpu so you
> > don't have several cpus trying to update it simultaneously.
>
> Though this is a problem, does the original idea in this patch make some sense?
> CPUx might scan its own LLC and it might scan other LLCs.  the scan cost should
> be depending on how busy the target LLC is?
>
> it seems improper to have one single scan cost for all targets/LLCs?

I'm not sure that this would make any real difference. We use local
cpu to prevent costly protection and cache line bouncing between cpus.
There are several shortfalls in the current way to estimate how many
cores we should scan.

>
> >
> > >
> > > >>
> > > >>>>         }
> > > >>>>
> > > >>>>         return idle_cpu;
> > > >>>> --
> > > >>>> 2.33.0
> > > >>>>
> > > >>> .
> > > >>>
> > > > .
> > > >
>
> Thanks
> Barry

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-01-11 16:30 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-11 10:43 [PATCH] sched/fair: Track target domain's avg_scan_cost in select_idle_cpu Yicong Yang
2021-12-22 10:47 ` Vincent Guittot
2021-12-23  8:23   ` Yicong Yang
2022-01-04 14:18     ` Vincent Guittot
2022-01-06  9:10       ` Yicong Yang
2022-01-11  7:58         ` Vincent Guittot
2022-01-11  8:31           ` Barry Song
2022-01-11 16:30             ` Vincent Guittot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.