* [RFC PATCH v1 0/1] select idle cpu from idle cpumask in sched domain
@ 2020-09-10  5:42 Aubrey Li
  2020-09-10  5:42 ` [RFC PATCH v1 1/1] sched/fair: " Aubrey Li
  0 siblings, 1 reply; 15+ messages in thread
From: Aubrey Li @ 2020-09-10  5:42 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, valentin.schneider
  Cc: tim.c.chen, linux-kernel, Aubrey Li

I'm writing to see if it makes sense to track idle cpus in a shared cpumask
in the sched domain, so that when a task wakes up it can select an idle cpu
from this cpumask instead of scanning all the cpus in the last level cache
domain. Especially when the system is heavily loaded, the scanning cost can
be significantly reduced. The price is that atomic cpumask ops are added to
the idle entry and exit paths.

I tested the following benchmarks on an x86 4-socket system with 24 cores per
socket and 2 hyperthreads per core, 192 CPUs in total. Results are normalized
to the baseline average:

uperf throughput: netperf workload, tcp_nodelay, r/w size = 90

  threads	baseline-avg	%std	patch-avg	%std
  96		1		1.24	0.98		2.76
  144		1		1.13	1.35		4.01
  192		1		0.58	1.67		3.25
  240		1		2.49	1.68		3.55

hackbench: process mode, 100000 loops, 40 file descriptors per group

  group		baseline-avg	%std	patch-avg	%std
  2(80)		1		12.05	0.97		9.88
  3(120)	1		12.48	0.95		11.62
  4(160)	1		13.83	0.97		13.22
  5(200)	1		2.76	1.01		2.94	 

schbench: 99th percentile latency, 16 workers per message thread

  mthread	baseline-avg	%std	patch-avg	%std
  6(96)		1		1.24	0.993		1.73
  9(144)	1		0.38	0.998		0.39
  12(192)	1		1.58	0.995		1.64
  15(240)	1		51.71	0.606		37.41

sysbench mysql throughput: read/write, table size = 10,000,000

  thread	baseline-avg	%std	patch-avg	%std
  96		1		1.77	1.015		1.71
  144		1		3.39	0.998		4.05
  192		1		2.88	1.002		2.81
  240		1		2.07	1.011		2.09

kbuild: kernel build, with a kexec reboot between runs

  baseline-avg	patch-avg
  1		1

Any suggestions are highly appreciated!

Thanks,
-Aubrey

Aubrey Li (1):
  sched/fair: select idle cpu from idle cpumask in sched domain

 include/linux/sched/topology.h | 13 +++++++++++++
 kernel/sched/fair.c            |  4 +++-
 kernel/sched/topology.c        |  2 +-
 3 files changed, 17 insertions(+), 2 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC PATCH v1 1/1] sched/fair: select idle cpu from idle cpumask in sched domain
  2020-09-10  5:42 [RFC PATCH v1 0/1] select idle cpu from idle cpumask in sched domain Aubrey Li
@ 2020-09-10  5:42 ` Aubrey Li
  2020-09-11 16:28   ` Qais Yousef
                     ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Aubrey Li @ 2020-09-10  5:42 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, valentin.schneider
  Cc: tim.c.chen, linux-kernel, Aubrey Li, Aubrey Li

Added idle cpumask to track idle cpus in sched domain. When a CPU
enters idle, its corresponding bit in the idle cpumask will be set,
and when the CPU exits idle, its bit will be cleared.

When a task wakes up to select an idle cpu, scanning the idle cpumask
has lower cost than scanning all the cpus in the last level cache domain,
especially when the system is heavily loaded.

Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
---
 include/linux/sched/topology.h | 13 +++++++++++++
 kernel/sched/fair.c            |  4 +++-
 kernel/sched/topology.c        |  2 +-
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index fb11091129b3..43a641d26154 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -65,8 +65,21 @@ struct sched_domain_shared {
 	atomic_t	ref;
 	atomic_t	nr_busy_cpus;
 	int		has_idle_cores;
+	/*
+	 * Span of all idle CPUs in this domain.
+	 *
+	 * NOTE: this field is variable length. (Allocated dynamically
+	 * by attaching extra space to the end of the structure,
+	 * depending on how many CPUs the kernel has booted up with)
+	 */
+	unsigned long	idle_cpus_span[];
 };
 
+static inline struct cpumask *sds_idle_cpus(struct sched_domain_shared *sds)
+{
+	return to_cpumask(sds->idle_cpus_span);
+}
+
 struct sched_domain {
 	/* These fields must be setup */
 	struct sched_domain __rcu *parent;	/* top domain must be null terminated */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6b3b59cc51d6..3b6f8a3589be 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6136,7 +6136,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 
 	time = cpu_clock(this);
 
-	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
+	cpumask_and(cpus, sds_idle_cpus(sd->shared), p->cpus_ptr);
 
 	for_each_cpu_wrap(cpu, cpus, target) {
 		if (!--nr)
@@ -10182,6 +10182,7 @@ static void set_cpu_sd_state_busy(int cpu)
 	sd->nohz_idle = 0;
 
 	atomic_inc(&sd->shared->nr_busy_cpus);
+	cpumask_clear_cpu(cpu, sds_idle_cpus(sd->shared));
 unlock:
 	rcu_read_unlock();
 }
@@ -10212,6 +10213,7 @@ static void set_cpu_sd_state_idle(int cpu)
 	sd->nohz_idle = 1;
 
 	atomic_dec(&sd->shared->nr_busy_cpus);
+	cpumask_set_cpu(cpu, sds_idle_cpus(sd->shared));
 unlock:
 	rcu_read_unlock();
 }
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 9079d865a935..92d0aeef86bf 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1769,7 +1769,7 @@ static int __sdt_alloc(const struct cpumask *cpu_map)
 
 			*per_cpu_ptr(sdd->sd, j) = sd;
 
-			sds = kzalloc_node(sizeof(struct sched_domain_shared),
+			sds = kzalloc_node(sizeof(struct sched_domain_shared) + cpumask_size(),
 					GFP_KERNEL, cpu_to_node(j));
 			if (!sds)
 				return -ENOMEM;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v1 1/1] sched/fair: select idle cpu from idle cpumask in sched domain
  2020-09-10  5:42 ` [RFC PATCH v1 1/1] sched/fair: " Aubrey Li
@ 2020-09-11 16:28   ` Qais Yousef
  2020-09-11 23:04     ` Li, Aubrey
  2020-09-13  3:59   ` Jiang Biao
  2020-09-14 12:32   ` Vincent Guittot
  2 siblings, 1 reply; 15+ messages in thread
From: Qais Yousef @ 2020-09-11 16:28 UTC (permalink / raw)
  To: Aubrey Li
  Cc: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, valentin.schneider, tim.c.chen,
	linux-kernel, Aubrey Li

On 09/10/20 13:42, Aubrey Li wrote:
> Added idle cpumask to track idle cpus in sched domain. When a CPU
> enters idle, its corresponding bit in the idle cpumask will be set,
> and when the CPU exits idle, its bit will be cleared.
> 
> When a task wakes up to select an idle cpu, scanning the idle cpumask
> has lower cost than scanning all the cpus in the last level cache domain,
> especially when the system is heavily loaded.
> 
> Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
> ---
>  include/linux/sched/topology.h | 13 +++++++++++++
>  kernel/sched/fair.c            |  4 +++-
>  kernel/sched/topology.c        |  2 +-
>  3 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> index fb11091129b3..43a641d26154 100644
> --- a/include/linux/sched/topology.h
> +++ b/include/linux/sched/topology.h
> @@ -65,8 +65,21 @@ struct sched_domain_shared {
>  	atomic_t	ref;
>  	atomic_t	nr_busy_cpus;
>  	int		has_idle_cores;
> +	/*
> +	 * Span of all idle CPUs in this domain.
> +	 *
> +	 * NOTE: this field is variable length. (Allocated dynamically
> +	 * by attaching extra space to the end of the structure,
> +	 * depending on how many CPUs the kernel has booted up with)
> +	 */
> +	unsigned long	idle_cpus_span[];

Can't you use cpumask_var_t and zalloc_cpumask_var() instead?

The patch looks useful. Did it help you with any particular workload? It'd be
good to expand on that in the commit message.

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 15+ messages in thread
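
For reference, a minimal sketch of the cpumask_var_t alternative Qais
suggests (not part of the posted patch; it assumes the standard
zalloc_cpumask_var_node()/free_cpumask_var() helpers):

struct sched_domain_shared {
	atomic_t	ref;
	atomic_t	nr_busy_cpus;
	int		has_idle_cores;
	/* A pointer with CONFIG_CPUMASK_OFFSTACK=y, embedded otherwise */
	cpumask_var_t	idle_cpus_span;
};

/* Allocation side, e.g. in __sdt_alloc(): */
sds = kzalloc_node(sizeof(struct sched_domain_shared),
		GFP_KERNEL, cpu_to_node(j));
if (!sds)
	return -ENOMEM;
/* A second allocation only actually happens with CONFIG_CPUMASK_OFFSTACK=y */
if (!zalloc_cpumask_var_node(&sds->idle_cpus_span, GFP_KERNEL,
			     cpu_to_node(j)))
	return -ENOMEM;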

* Re: [RFC PATCH v1 1/1] sched/fair: select idle cpu from idle cpumask in sched domain
  2020-09-11 16:28   ` Qais Yousef
@ 2020-09-11 23:04     ` Li, Aubrey
  2020-09-11 23:16       ` Li, Aubrey
  2020-09-14 10:31       ` Valentin Schneider
  0 siblings, 2 replies; 15+ messages in thread
From: Li, Aubrey @ 2020-09-11 23:04 UTC (permalink / raw)
  To: Qais Yousef, Aubrey Li
  Cc: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, valentin.schneider, tim.c.chen,
	linux-kernel

On 2020/9/12 0:28, Qais Yousef wrote:
> On 09/10/20 13:42, Aubrey Li wrote:
>> Added idle cpumask to track idle cpus in sched domain. When a CPU
>> enters idle, its corresponding bit in the idle cpumask will be set,
>> and when the CPU exits idle, its bit will be cleared.
>>
>> When a task wakes up to select an idle cpu, scanning the idle cpumask
>> has lower cost than scanning all the cpus in the last level cache domain,
>> especially when the system is heavily loaded.
>>
>> Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
>> ---
>>  include/linux/sched/topology.h | 13 +++++++++++++
>>  kernel/sched/fair.c            |  4 +++-
>>  kernel/sched/topology.c        |  2 +-
>>  3 files changed, 17 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
>> index fb11091129b3..43a641d26154 100644
>> --- a/include/linux/sched/topology.h
>> +++ b/include/linux/sched/topology.h
>> @@ -65,8 +65,21 @@ struct sched_domain_shared {
>>  	atomic_t	ref;
>>  	atomic_t	nr_busy_cpus;
>>  	int		has_idle_cores;
>> +	/*
>> +	 * Span of all idle CPUs in this domain.
>> +	 *
>> +	 * NOTE: this field is variable length. (Allocated dynamically
>> +	 * by attaching extra space to the end of the structure,
>> +	 * depending on how many CPUs the kernel has booted up with)
>> +	 */
>> +	unsigned long	idle_cpus_span[];
> 
> Can't you use cpumask_var_t and zalloc_cpumask_var() instead?

I can use the existing free code. Is there a problem with this?

> 
> The patch looks useful. Did it help you with any particular workload? It'd be
> good to expand on that in the commit message.
> 
Odd, that was included in patch v1 0/1, did you receive it?

Thanks,
-Aubrey

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v1 1/1] sched/fair: select idle cpu from idle cpumask in sched domain
  2020-09-11 23:04     ` Li, Aubrey
@ 2020-09-11 23:16       ` Li, Aubrey
  2020-09-14 10:31       ` Valentin Schneider
  1 sibling, 0 replies; 15+ messages in thread
From: Li, Aubrey @ 2020-09-11 23:16 UTC (permalink / raw)
  To: Qais Yousef, Aubrey Li
  Cc: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, valentin.schneider, tim.c.chen,
	linux-kernel

On 2020/9/12 7:04, Li, Aubrey wrote:
> On 2020/9/12 0:28, Qais Yousef wrote:
>> On 09/10/20 13:42, Aubrey Li wrote:
>>> Added idle cpumask to track idle cpus in sched domain. When a CPU
>>> enters idle, its corresponding bit in the idle cpumask will be set,
>>> and when the CPU exits idle, its bit will be cleared.
>>>
>>> When a task wakes up to select an idle cpu, scanning the idle cpumask
>>> has lower cost than scanning all the cpus in the last level cache domain,
>>> especially when the system is heavily loaded.
>>>
>>> Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
>>> ---
>>>  include/linux/sched/topology.h | 13 +++++++++++++
>>>  kernel/sched/fair.c            |  4 +++-
>>>  kernel/sched/topology.c        |  2 +-
>>>  3 files changed, 17 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
>>> index fb11091129b3..43a641d26154 100644
>>> --- a/include/linux/sched/topology.h
>>> +++ b/include/linux/sched/topology.h
>>> @@ -65,8 +65,21 @@ struct sched_domain_shared {
>>>  	atomic_t	ref;
>>>  	atomic_t	nr_busy_cpus;
>>>  	int		has_idle_cores;
>>> +	/*
>>> +	 * Span of all idle CPUs in this domain.
>>> +	 *
>>> +	 * NOTE: this field is variable length. (Allocated dynamically
>>> +	 * by attaching extra space to the end of the structure,
>>> +	 * depending on how many CPUs the kernel has booted up with)
>>> +	 */
>>> +	unsigned long	idle_cpus_span[];
>>
>> Can't you use cpumask_var_t and zalloc_cpumask_var() instead?
> 
> I can use the existing free code. Is there a problem with this?
> 
>>
>> The patch looks useful. Did it help you with any particular workload? It'd be
>> good to expand on that in the commit message.
>>
> Odd, that was included in patch v1 0/1, did you receive it?

I found it here:

https://lkml.org/lkml/2020/9/11/645

> 
> Thanks,
> -Aubrey
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v1 1/1] sched/fair: select idle cpu from idle cpumask in sched domain
  2020-09-10  5:42 ` [RFC PATCH v1 1/1] sched/fair: " Aubrey Li
  2020-09-11 16:28   ` Qais Yousef
@ 2020-09-13  3:59   ` Jiang Biao
  2020-09-14 12:26     ` Vincent Guittot
  2020-09-14 12:32   ` Vincent Guittot
  2 siblings, 1 reply; 15+ messages in thread
From: Jiang Biao @ 2020-09-13  3:59 UTC (permalink / raw)
  To: Aubrey Li
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Tim Chen, linux-kernel, Aubrey Li

Hi, Aubrey

On Fri, 11 Sep 2020 at 23:48, Aubrey Li <aubrey.li@intel.com> wrote:
>
> Added idle cpumask to track idle cpus in sched domain. When a CPU
> enters idle, its corresponding bit in the idle cpumask will be set,
> and when the CPU exits idle, its bit will be cleared.
>
> When a task wakes up to select an idle cpu, scanning the idle cpumask
> has lower cost than scanning all the cpus in the last level cache domain,
> especially when the system is heavily loaded.
>
> Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
> ---
>  include/linux/sched/topology.h | 13 +++++++++++++
>  kernel/sched/fair.c            |  4 +++-
>  kernel/sched/topology.c        |  2 +-
>  3 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> index fb11091129b3..43a641d26154 100644
> --- a/include/linux/sched/topology.h
> +++ b/include/linux/sched/topology.h
> @@ -65,8 +65,21 @@ struct sched_domain_shared {
>         atomic_t        ref;
>         atomic_t        nr_busy_cpus;
>         int             has_idle_cores;
> +       /*
> +        * Span of all idle CPUs in this domain.
> +        *
> +        * NOTE: this field is variable length. (Allocated dynamically
> +        * by attaching extra space to the end of the structure,
> +        * depending on how many CPUs the kernel has booted up with)
> +        */
> +       unsigned long   idle_cpus_span[];
>  };
>
> +static inline struct cpumask *sds_idle_cpus(struct sched_domain_shared *sds)
> +{
> +       return to_cpumask(sds->idle_cpus_span);
> +}
> +
>  struct sched_domain {
>         /* These fields must be setup */
>         struct sched_domain __rcu *parent;      /* top domain must be null terminated */
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6b3b59cc51d6..3b6f8a3589be 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6136,7 +6136,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>
>         time = cpu_clock(this);
>
> -       cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> +       cpumask_and(cpus, sds_idle_cpus(sd->shared), p->cpus_ptr);
Is the sds_idle_cpus() always empty if nohz=off?
Do we need to initialize the idle_cpus_span with sched_domain_span(sd)?

>
>         for_each_cpu_wrap(cpu, cpus, target) {
>                 if (!--nr)
> @@ -10182,6 +10182,7 @@ static void set_cpu_sd_state_busy(int cpu)
>         sd->nohz_idle = 0;
>
>         atomic_inc(&sd->shared->nr_busy_cpus);
> +       cpumask_clear_cpu(cpu, sds_idle_cpus(sd->shared));
>  unlock:
>         rcu_read_unlock();
>  }
> @@ -10212,6 +10213,7 @@ static void set_cpu_sd_state_idle(int cpu)
>         sd->nohz_idle = 1;
>
>         atomic_dec(&sd->shared->nr_busy_cpus);
> +       cpumask_set_cpu(cpu, sds_idle_cpus(sd->shared));
This only works when entering/exiting tickless mode? :)
Why not update idle_cpus_span during tick_nohz_idle_enter()/exit()?

Thx.
Regards,
Jiang

>  unlock:
>         rcu_read_unlock();
>  }
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 9079d865a935..92d0aeef86bf 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1769,7 +1769,7 @@ static int __sdt_alloc(const struct cpumask *cpu_map)
>
>                         *per_cpu_ptr(sdd->sd, j) = sd;
>
> -                       sds = kzalloc_node(sizeof(struct sched_domain_shared),
> +                       sds = kzalloc_node(sizeof(struct sched_domain_shared) + cpumask_size(),
>                                         GFP_KERNEL, cpu_to_node(j));
>                         if (!sds)
>                                 return -ENOMEM;
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v1 1/1] sched/fair: select idle cpu from idle cpumask in sched domain
  2020-09-11 23:04     ` Li, Aubrey
  2020-09-11 23:16       ` Li, Aubrey
@ 2020-09-14 10:31       ` Valentin Schneider
  2020-09-14 11:08         ` Qais Yousef
  1 sibling, 1 reply; 15+ messages in thread
From: Valentin Schneider @ 2020-09-14 10:31 UTC (permalink / raw)
  To: Li, Aubrey
  Cc: Qais Yousef, Aubrey Li, mingo, peterz, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	tim.c.chen, linux-kernel


On 12/09/20 00:04, Li, Aubrey wrote:
>>> +++ b/include/linux/sched/topology.h
>>> @@ -65,8 +65,21 @@ struct sched_domain_shared {
>>>     atomic_t	ref;
>>>     atomic_t	nr_busy_cpus;
>>>     int		has_idle_cores;
>>> +	/*
>>> +	 * Span of all idle CPUs in this domain.
>>> +	 *
>>> +	 * NOTE: this field is variable length. (Allocated dynamically
>>> +	 * by attaching extra space to the end of the structure,
>>> +	 * depending on how many CPUs the kernel has booted up with)
>>> +	 */
>>> +	unsigned long	idle_cpus_span[];
>>
>> Can't you use cpumask_var_t and zalloc_cpumask_var() instead?
>
> I can use the existing free code. Is there a problem with this?
>

Nah, flexible array members are the preferred approach here; this also
means we don't let CONFIG_CPUMASK_OFFSTACK dictate where this gets
allocated.

See struct numa_group, struct sched_group, struct sched_domain, struct
em_perf_domain...

>>
>> The patch looks useful. Did it help you with any particular workload? It'd be
>> good to expand on that in the commit message.
>>
> Odd, that was included in patch v1 0/1, did you receive it?
>
> Thanks,
> -Aubrey

^ permalink raw reply	[flat|nested] 15+ messages in thread
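
The CONFIG_CPUMASK_OFFSTACK point is easiest to see from the (simplified)
definition of cpumask_var_t in include/linux/cpumask.h: with the option
enabled the mask needs its own allocation, otherwise it is embedded in
the enclosing object:

#ifdef CONFIG_CPUMASK_OFFSTACK
/* Mask lives in a separate allocation (zalloc_cpumask_var() etc.) */
typedef struct cpumask *cpumask_var_t;
#else
/* Embedded inline; alloc is a no-op, zalloc just zeroes the mask */
typedef struct cpumask cpumask_var_t[1];
#endif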

* Re: [RFC PATCH v1 1/1] sched/fair: select idle cpu from idle cpumask in sched domain
  2020-09-14 10:31       ` Valentin Schneider
@ 2020-09-14 11:08         ` Qais Yousef
  2020-09-14 11:26           ` Valentin Schneider
  0 siblings, 1 reply; 15+ messages in thread
From: Qais Yousef @ 2020-09-14 11:08 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Li, Aubrey, Aubrey Li, mingo, peterz, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	tim.c.chen, linux-kernel

On 09/14/20 11:31, Valentin Schneider wrote:
> 
> On 12/09/20 00:04, Li, Aubrey wrote:
> >>> +++ b/include/linux/sched/topology.h
> >>> @@ -65,8 +65,21 @@ struct sched_domain_shared {
> >>>     atomic_t	ref;
> >>>     atomic_t	nr_busy_cpus;
> >>>     int		has_idle_cores;
> >>> +	/*
> >>> +	 * Span of all idle CPUs in this domain.
> >>> +	 *
> >>> +	 * NOTE: this field is variable length. (Allocated dynamically
> >>> +	 * by attaching extra space to the end of the structure,
> >>> +	 * depending on how many CPUs the kernel has booted up with)
> >>> +	 */
> >>> +	unsigned long	idle_cpus_span[];
> >>
> >> Can't you use cpumask_var_t and zalloc_cpumask_var() instead?
> >
> > I can use the existing free code. Is there a problem with this?
> >
> 
> Nah, flexible array members are the preferred approach here; this also

Is this your opinion or a rule written somewhere I missed?

> means we don't let CONFIG_CPUMASK_OFFSTACK dictate where this gets
> allocated.
> 
> See struct numa_group, struct sched_group, struct sched_domain, struct
> em_perf_domain...

struct root_domain, struct cpupri_vec, struct generic_pm_domain,
struct irq_common_data..

Use cpumask_var_t.

Both approaches look correct to me, so no objection in principle. cpumask_var_t
looks neater IMO and will be necessary once more than one cpumask is required
in a struct.

> 
> >>
> >> The patch looks useful. Did it help you with any particular workload? It'd be
> >> good to expand on that in the commit message.
> >>
> > Odd, that was included in patch v1 0/1, did you receive it?

Aubrey,

Sorry, I didn't see that, no. It's important justification to be part of the
commit message; I think it's worth adding.

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v1 1/1] sched/fair: select idle cpu from idle cpumask in sched domain
  2020-09-14 11:08         ` Qais Yousef
@ 2020-09-14 11:26           ` Valentin Schneider
  2020-09-14 11:31             ` Qais Yousef
  0 siblings, 1 reply; 15+ messages in thread
From: Valentin Schneider @ 2020-09-14 11:26 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Li, Aubrey, Aubrey Li, mingo, peterz, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	tim.c.chen, linux-kernel


On 14/09/20 12:08, Qais Yousef wrote:
> On 09/14/20 11:31, Valentin Schneider wrote:
>>
>> On 12/09/20 00:04, Li, Aubrey wrote:
>> >>> +++ b/include/linux/sched/topology.h
>> >>> @@ -65,8 +65,21 @@ struct sched_domain_shared {
>> >>>     atomic_t	ref;
>> >>>     atomic_t	nr_busy_cpus;
>> >>>     int		has_idle_cores;
>> >>> +	/*
>> >>> +	 * Span of all idle CPUs in this domain.
>> >>> +	 *
>> >>> +	 * NOTE: this field is variable length. (Allocated dynamically
>> >>> +	 * by attaching extra space to the end of the structure,
>> >>> +	 * depending on how many CPUs the kernel has booted up with)
>> >>> +	 */
>> >>> +	unsigned long	idle_cpus_span[];
>> >>
>> >> Can't you use cpumask_var_t and zalloc_cpumask_var() instead?
>> >
>> > I can use the existing free code. Is there a problem with this?
>> >
>>
>> Nah, flexible array members are the preferred approach here; this also
>
> Is this your opinion or a rule written somewhere I missed?

I don't think there's a written rule, but AIUI it is preferred by at
least Peter:

https://lore.kernel.org/linux-pm/20180612125930.GP12217@hirez.programming.kicks-ass.net/
https://lore.kernel.org/lkml/20180619110734.GO2458@hirez.programming.kicks-ass.net/

And my opinion is that, if you can, having fewer separate allocations is better.

>
>> means we don't let CONFIG_CPUMASK_OFFSTACK dictate where this gets
>> allocated.
>>
>> See struct numa_group, struct sched_group, struct sched_domain, struct
>> em_perf_domain...
>
> struct root_domain, struct cpupri_vec, struct generic_pm_domain,
> struct irq_common_data..
>
> Use cpumask_var_t.
>
> Both approaches look correct to me, so no objection in principle. cpumask_var_t
> looks neater IMO and will be necessary once more than one cpumask is required
> in a struct.
>

You're right in that cpumask_var_t becomes necessary when you need more
than one mask. For those that use it despite requiring only one mask
(cpupri stuff, struct nohz too), I'm not sure.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v1 1/1] sched/fair: select idle cpu from idle cpumask in sched domain
  2020-09-14 11:26           ` Valentin Schneider
@ 2020-09-14 11:31             ` Qais Yousef
  0 siblings, 0 replies; 15+ messages in thread
From: Qais Yousef @ 2020-09-14 11:31 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Li, Aubrey, Aubrey Li, mingo, peterz, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	tim.c.chen, linux-kernel

On 09/14/20 12:26, Valentin Schneider wrote:
> 
> On 14/09/20 12:08, Qais Yousef wrote:
> > On 09/14/20 11:31, Valentin Schneider wrote:
> >>
> >> On 12/09/20 00:04, Li, Aubrey wrote:
> >> >>> +++ b/include/linux/sched/topology.h
> >> >>> @@ -65,8 +65,21 @@ struct sched_domain_shared {
> >> >>>     atomic_t	ref;
> >> >>>     atomic_t	nr_busy_cpus;
> >> >>>     int		has_idle_cores;
> >> >>> +	/*
> >> >>> +	 * Span of all idle CPUs in this domain.
> >> >>> +	 *
> >> >>> +	 * NOTE: this field is variable length. (Allocated dynamically
> >> >>> +	 * by attaching extra space to the end of the structure,
> >> >>> +	 * depending on how many CPUs the kernel has booted up with)
> >> >>> +	 */
> >> >>> +	unsigned long	idle_cpus_span[];
> >> >>
> >> >> Can't you use cpumask_var_t and zalloc_cpumask_var() instead?
> >> >
> >> > I can use the existing free code. Is there a problem with this?
> >> >
> >>
> >> Nah, flexible array members are the preferred approach here; this also
> >
> > Is this your opinion or a rule written somewhere I missed?
> 
> I don't think there's a written rule, but AIUI it is preferred by at
> least Peter:
> 
> https://lore.kernel.org/linux-pm/20180612125930.GP12217@hirez.programming.kicks-ass.net/
> https://lore.kernel.org/lkml/20180619110734.GO2458@hirez.programming.kicks-ass.net/
> 
> And my opinion is that, if you can, having fewer separate allocations is better.

+1

> 
> >
> >> means we don't let CONFIG_CPUMASK_OFFSTACK dictate where this gets
> >> allocated.
> >>
> >> See struct numa_group, struct sched_group, struct sched_domain, struct
> >> em_perf_domain...
> >
> > struct root_domain, struct cpupri_vec, struct generic_pm_domain,
> > struct irq_common_data..
> >
> > Use cpumask_var_t.
> >
> > Both approaches look correct to me, so no objection in principle. cpumask_var_t
> > looks neater IMO and will be necessary once more than one cpumask is required
> > in a struct.
> >
> 
> You're right in that cpumask_var_t becomes necessary when you need more
> than one mask. For those that use it despite requiring only one mask
> (cpupri stuff, struct nohz too), I'm not sure.

I don't have a strong opinion. cpumask_var_t is more readable and maintainable
IMO. But it's not a big deal. Either form can be easily changed.

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v1 1/1] sched/fair: select idle cpu from idle cpumask in sched domain
  2020-09-13  3:59   ` Jiang Biao
@ 2020-09-14 12:26     ` Vincent Guittot
  2020-09-15  8:47       ` Jiang Biao
  0 siblings, 1 reply; 15+ messages in thread
From: Vincent Guittot @ 2020-09-14 12:26 UTC (permalink / raw)
  To: Jiang Biao
  Cc: Aubrey Li, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Tim Chen, linux-kernel, Aubrey Li

On Sun, 13 Sep 2020 at 05:59, Jiang Biao <benbjiang@gmail.com> wrote:
>
> Hi, Aubrey
>
> On Fri, 11 Sep 2020 at 23:48, Aubrey Li <aubrey.li@intel.com> wrote:
> >
> > Added idle cpumask to track idle cpus in sched domain. When a CPU
> > enters idle, its corresponding bit in the idle cpumask will be set,
> > and when the CPU exits idle, its bit will be cleared.
> >
> > When a task wakes up to select an idle cpu, scanning the idle cpumask
> > has lower cost than scanning all the cpus in the last level cache domain,
> > especially when the system is heavily loaded.
> >
> > Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
> > ---
> >  include/linux/sched/topology.h | 13 +++++++++++++
> >  kernel/sched/fair.c            |  4 +++-
> >  kernel/sched/topology.c        |  2 +-
> >  3 files changed, 17 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> > index fb11091129b3..43a641d26154 100644
> > --- a/include/linux/sched/topology.h
> > +++ b/include/linux/sched/topology.h
> > @@ -65,8 +65,21 @@ struct sched_domain_shared {
> >         atomic_t        ref;
> >         atomic_t        nr_busy_cpus;
> >         int             has_idle_cores;
> > +       /*
> > +        * Span of all idle CPUs in this domain.
> > +        *
> > +        * NOTE: this field is variable length. (Allocated dynamically
> > +        * by attaching extra space to the end of the structure,
> > +        * depending on how many CPUs the kernel has booted up with)
> > +        */
> > +       unsigned long   idle_cpus_span[];
> >  };
> >
> > +static inline struct cpumask *sds_idle_cpus(struct sched_domain_shared *sds)
> > +{
> > +       return to_cpumask(sds->idle_cpus_span);
> > +}
> > +
> >  struct sched_domain {
> >         /* These fields must be setup */
> >         struct sched_domain __rcu *parent;      /* top domain must be null terminated */
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 6b3b59cc51d6..3b6f8a3589be 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6136,7 +6136,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
> >
> >         time = cpu_clock(this);
> >
> > -       cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> > +       cpumask_and(cpus, sds_idle_cpus(sd->shared), p->cpus_ptr);
> Is the sds_idle_cpus() always empty if nohz=off?

Good point

> Do we need to initialize the idle_cpus_span with sched_domain_span(sd)?
>
> >
> >         for_each_cpu_wrap(cpu, cpus, target) {
> >                 if (!--nr)
> > @@ -10182,6 +10182,7 @@ static void set_cpu_sd_state_busy(int cpu)
> >         sd->nohz_idle = 0;
> >
> >         atomic_inc(&sd->shared->nr_busy_cpus);
> > +       cpumask_clear_cpu(cpu, sds_idle_cpus(sd->shared));
> >  unlock:
> >         rcu_read_unlock();
> >  }
> > @@ -10212,6 +10213,7 @@ static void set_cpu_sd_state_idle(int cpu)
> >         sd->nohz_idle = 1;
> >
> >         atomic_dec(&sd->shared->nr_busy_cpus);
> > +       cpumask_set_cpu(cpu, sds_idle_cpus(sd->shared));
> This only works when entering/exiting tickless mode? :)
> Why not update idle_cpus_span during tick_nohz_idle_enter()/exit()?

set_cpu_sd_state_busy is only called during a tick in order to limit
the rate of the update to at most once per tick per cpu and to prevent
any storm of updates if short-running tasks wake/sleep all the
time. We don't want to update a cpumask on each and every idle
entry/exit.

>
> Thx.
> Regards,
> Jiang
>
> >  unlock:
> >         rcu_read_unlock();
> >  }
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 9079d865a935..92d0aeef86bf 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -1769,7 +1769,7 @@ static int __sdt_alloc(const struct cpumask *cpu_map)
> >
> >                         *per_cpu_ptr(sdd->sd, j) = sd;
> >
> > -                       sds = kzalloc_node(sizeof(struct sched_domain_shared),
> > +                       sds = kzalloc_node(sizeof(struct sched_domain_shared) + cpumask_size(),
> >                                         GFP_KERNEL, cpu_to_node(j));
> >                         if (!sds)
> >                                 return -ENOMEM;
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 15+ messages in thread
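
For context on the rate limiting Vincent describes: both update paths are
guarded by the sd->nohz_idle flag. Roughly, with the patch applied, the
busy side looks like this (a sketch assembled from the hunks above plus
the surrounding v5.9-era fair.c code, not verbatim kernel source):

static void set_cpu_sd_state_busy(int cpu)
{
	struct sched_domain *sd;

	rcu_read_lock();
	sd = rcu_dereference(per_cpu(sd_llc, cpu));

	/* Already marked busy since last going idle: nothing to do */
	if (!sd || !sd->nohz_idle)
		goto unlock;
	sd->nohz_idle = 0;

	atomic_inc(&sd->shared->nr_busy_cpus);
	/* The patch's addition: remove this CPU from the shared idle mask */
	cpumask_clear_cpu(cpu, sds_idle_cpus(sd->shared));
unlock:
	rcu_read_unlock();
}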

* Re: [RFC PATCH v1 1/1] sched/fair: select idle cpu from idle cpumask in sched domain
  2020-09-10  5:42 ` [RFC PATCH v1 1/1] sched/fair: " Aubrey Li
  2020-09-11 16:28   ` Qais Yousef
  2020-09-13  3:59   ` Jiang Biao
@ 2020-09-14 12:32   ` Vincent Guittot
  2 siblings, 0 replies; 15+ messages in thread
From: Vincent Guittot @ 2020-09-14 12:32 UTC (permalink / raw)
  To: Aubrey Li
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Tim Chen, linux-kernel, Aubrey Li

On Fri, 11 Sep 2020 at 15:28, Aubrey Li <aubrey.li@intel.com> wrote:
>
> Added idle cpumask to track idle cpus in sched domain. When a CPU
> enters idle, its corresponding bit in the idle cpumask will be set,
> and when the CPU exits idle, its bit will be cleared.
>
> When a task wakes up to select an idle cpu, scanning the idle cpumask
> has lower cost than scanning all the cpus in the last level cache domain,
> especially when the system is heavily loaded.
>
> Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
> ---
>  include/linux/sched/topology.h | 13 +++++++++++++
>  kernel/sched/fair.c            |  4 +++-
>  kernel/sched/topology.c        |  2 +-
>  3 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> index fb11091129b3..43a641d26154 100644
> --- a/include/linux/sched/topology.h
> +++ b/include/linux/sched/topology.h
> @@ -65,8 +65,21 @@ struct sched_domain_shared {
>         atomic_t        ref;
>         atomic_t        nr_busy_cpus;
>         int             has_idle_cores;
> +       /*
> +        * Span of all idle CPUs in this domain.
> +        *
> +        * NOTE: this field is variable length. (Allocated dynamically
> +        * by attaching extra space to the end of the structure,
> +        * depending on how many CPUs the kernel has booted up with)
> +        */
> +       unsigned long   idle_cpus_span[];
>  };
>
> +static inline struct cpumask *sds_idle_cpus(struct sched_domain_shared *sds)
> +{
> +       return to_cpumask(sds->idle_cpus_span);
> +}
> +
>  struct sched_domain {
>         /* These fields must be setup */
>         struct sched_domain __rcu *parent;      /* top domain must be null terminated */
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6b3b59cc51d6..3b6f8a3589be 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6136,7 +6136,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>
>         time = cpu_clock(this);
>
> -       cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> +       cpumask_and(cpus, sds_idle_cpus(sd->shared), p->cpus_ptr);

sched_domain_shared is set only at the shared cache level, so this works
only because select_idle_cpu() is called with sd_llc. It's worth adding
a comment to describe this.

>
>         for_each_cpu_wrap(cpu, cpus, target) {
>                 if (!--nr)
> @@ -10182,6 +10182,7 @@ static void set_cpu_sd_state_busy(int cpu)
>         sd->nohz_idle = 0;
>
>         atomic_inc(&sd->shared->nr_busy_cpus);
> +       cpumask_clear_cpu(cpu, sds_idle_cpus(sd->shared));
>  unlock:
>         rcu_read_unlock();
>  }
> @@ -10212,6 +10213,7 @@ static void set_cpu_sd_state_idle(int cpu)
>         sd->nohz_idle = 1;
>
>         atomic_dec(&sd->shared->nr_busy_cpus);
> +       cpumask_set_cpu(cpu, sds_idle_cpus(sd->shared));
>  unlock:
>         rcu_read_unlock();
>  }
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 9079d865a935..92d0aeef86bf 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1769,7 +1769,7 @@ static int __sdt_alloc(const struct cpumask *cpu_map)
>
>                         *per_cpu_ptr(sdd->sd, j) = sd;
>
> -                       sds = kzalloc_node(sizeof(struct sched_domain_shared),
> +                       sds = kzalloc_node(sizeof(struct sched_domain_shared) + cpumask_size(),
>                                         GFP_KERNEL, cpu_to_node(j));
>                         if (!sds)
>                                 return -ENOMEM;
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 15+ messages in thread
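
The comment Vincent asks for could read along these lines (a suggestion,
not yet part of the patch):

	/*
	 * sd->shared is only allocated for domains with
	 * SD_SHARE_PKG_RESOURCES, i.e. at the LLC level, so this relies
	 * on select_idle_cpu() being called with the per-cpu sd_llc
	 * domain.
	 */
	cpumask_and(cpus, sds_idle_cpus(sd->shared), p->cpus_ptr);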

* Re: [RFC PATCH v1 1/1] sched/fair: select idle cpu from idle cpumask in sched domain
  2020-09-14 12:26     ` Vincent Guittot
@ 2020-09-15  8:47       ` Jiang Biao
  2020-09-15  9:23         ` Vincent Guittot
  0 siblings, 1 reply; 15+ messages in thread
From: Jiang Biao @ 2020-09-15  8:47 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Aubrey Li, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Tim Chen, linux-kernel, Aubrey Li

Hi, Vincent

On Mon, 14 Sep 2020 at 20:26, Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> On Sun, 13 Sep 2020 at 05:59, Jiang Biao <benbjiang@gmail.com> wrote:
> >
> > Hi, Aubrey
> >
> > On Fri, 11 Sep 2020 at 23:48, Aubrey Li <aubrey.li@intel.com> wrote:
> > >
> > > Added idle cpumask to track idle cpus in sched domain. When a CPU
> > > enters idle, its corresponding bit in the idle cpumask will be set,
> > > and when the CPU exits idle, its bit will be cleared.
> > >
> > > When a task wakes up to select an idle cpu, scanning the idle cpumask
> > > has lower cost than scanning all the cpus in the last level cache domain,
> > > especially when the system is heavily loaded.
> > >
> > > Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
> > > ---
> > >  include/linux/sched/topology.h | 13 +++++++++++++
> > >  kernel/sched/fair.c            |  4 +++-
> > >  kernel/sched/topology.c        |  2 +-
> > >  3 files changed, 17 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> > > index fb11091129b3..43a641d26154 100644
> > > --- a/include/linux/sched/topology.h
> > > +++ b/include/linux/sched/topology.h
> > > @@ -65,8 +65,21 @@ struct sched_domain_shared {
> > >         atomic_t        ref;
> > >         atomic_t        nr_busy_cpus;
> > >         int             has_idle_cores;
> > > +       /*
> > > +        * Span of all idle CPUs in this domain.
> > > +        *
> > > +        * NOTE: this field is variable length. (Allocated dynamically
> > > +        * by attaching extra space to the end of the structure,
> > > +        * depending on how many CPUs the kernel has booted up with)
> > > +        */
> > > +       unsigned long   idle_cpus_span[];
> > >  };
> > >
> > > +static inline struct cpumask *sds_idle_cpus(struct sched_domain_shared *sds)
> > > +{
> > > +       return to_cpumask(sds->idle_cpus_span);
> > > +}
> > > +
> > >  struct sched_domain {
> > >         /* These fields must be setup */
> > >         struct sched_domain __rcu *parent;      /* top domain must be null terminated */
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 6b3b59cc51d6..3b6f8a3589be 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -6136,7 +6136,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
> > >
> > >         time = cpu_clock(this);
> > >
> > > -       cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> > > +       cpumask_and(cpus, sds_idle_cpus(sd->shared), p->cpus_ptr);
> > Is the sds_idle_cpus() always empty if nohz=off?
>
> Good point
>
> > Do we need to initialize the idle_cpus_span with sched_domain_span(sd)?
> >
> > >
> > >         for_each_cpu_wrap(cpu, cpus, target) {
> > >                 if (!--nr)
> > > @@ -10182,6 +10182,7 @@ static void set_cpu_sd_state_busy(int cpu)
> > >         sd->nohz_idle = 0;
> > >
> > >         atomic_inc(&sd->shared->nr_busy_cpus);
> > > +       cpumask_clear_cpu(cpu, sds_idle_cpus(sd->shared));
> > >  unlock:
> > >         rcu_read_unlock();
> > >  }
> > > @@ -10212,6 +10213,7 @@ static void set_cpu_sd_state_idle(int cpu)
> > >         sd->nohz_idle = 1;
> > >
> > >         atomic_dec(&sd->shared->nr_busy_cpus);
> > > +       cpumask_set_cpu(cpu, sds_idle_cpus(sd->shared));
> > This only works when entering/exiting tickless mode? :)
> > Why not update idle_cpus_span during tick_nohz_idle_enter()/exit()?
>
> set_cpu_sd_state_busy is only called during a tick in order to limit
> the rate of the update to at most once per tick per cpu and to prevent
> any storm of updates if short-running tasks wake/sleep all the
> time. We don't want to update a cpumask on each and every idle
> entry/exit.
>
Agree. But set_cpu_sd_state_busy does not seem to be reached when
nohz=off, which means it will not work for that case? :)

Thx.
Regards,
Jiang

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v1 1/1] sched/fair: select idle cpu from idle cpumask in sched domain
  2020-09-15  8:47       ` Jiang Biao
@ 2020-09-15  9:23         ` Vincent Guittot
  2020-09-15 13:38           ` Li, Aubrey
  0 siblings, 1 reply; 15+ messages in thread
From: Vincent Guittot @ 2020-09-15  9:23 UTC (permalink / raw)
  To: Jiang Biao
  Cc: Aubrey Li, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Tim Chen, linux-kernel, Aubrey Li

On Tue, 15 Sep 2020 at 10:47, Jiang Biao <benbjiang@gmail.com> wrote:
>
> Hi, Vincent
>
> On Mon, 14 Sep 2020 at 20:26, Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
> >
> > On Sun, 13 Sep 2020 at 05:59, Jiang Biao <benbjiang@gmail.com> wrote:
> > >
> > > Hi, Aubrey
> > >
> > > On Fri, 11 Sep 2020 at 23:48, Aubrey Li <aubrey.li@intel.com> wrote:
> > > >
> > > > Added idle cpumask to track idle cpus in sched domain. When a CPU
> > > > enters idle, its corresponding bit in the idle cpumask will be set,
> > > > and when the CPU exits idle, its bit will be cleared.
> > > >
> > > > When a task wakes up to select an idle cpu, scanning the idle cpumask
> > > > has lower cost than scanning all the cpus in the last level cache domain,
> > > > especially when the system is heavily loaded.
> > > >
> > > > Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
> > > > ---
> > > >  include/linux/sched/topology.h | 13 +++++++++++++
> > > >  kernel/sched/fair.c            |  4 +++-
> > > >  kernel/sched/topology.c        |  2 +-
> > > >  3 files changed, 17 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> > > > index fb11091129b3..43a641d26154 100644
> > > > --- a/include/linux/sched/topology.h
> > > > +++ b/include/linux/sched/topology.h
> > > > @@ -65,8 +65,21 @@ struct sched_domain_shared {
> > > >         atomic_t        ref;
> > > >         atomic_t        nr_busy_cpus;
> > > >         int             has_idle_cores;
> > > > +       /*
> > > > +        * Span of all idle CPUs in this domain.
> > > > +        *
> > > > +        * NOTE: this field is variable length. (Allocated dynamically
> > > > +        * by attaching extra space to the end of the structure,
> > > > +        * depending on how many CPUs the kernel has booted up with)
> > > > +        */
> > > > +       unsigned long   idle_cpus_span[];
> > > >  };
> > > >
> > > > +static inline struct cpumask *sds_idle_cpus(struct sched_domain_shared *sds)
> > > > +{
> > > > +       return to_cpumask(sds->idle_cpus_span);
> > > > +}
> > > > +
> > > >  struct sched_domain {
> > > >         /* These fields must be setup */
> > > >         struct sched_domain __rcu *parent;      /* top domain must be null terminated */
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index 6b3b59cc51d6..3b6f8a3589be 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -6136,7 +6136,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
> > > >
> > > >         time = cpu_clock(this);
> > > >
> > > > -       cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> > > > +       cpumask_and(cpus, sds_idle_cpus(sd->shared), p->cpus_ptr);
> > > Is the sds_idle_cpus() always empty if nohz=off?
> >
> > Good point
> >
> > > Do we need to initialize the idle_cpus_span with sched_domain_span(sd)?
> > >
> > > >
> > > >         for_each_cpu_wrap(cpu, cpus, target) {
> > > >                 if (!--nr)
> > > > @@ -10182,6 +10182,7 @@ static void set_cpu_sd_state_busy(int cpu)
> > > >         sd->nohz_idle = 0;
> > > >
> > > >         atomic_inc(&sd->shared->nr_busy_cpus);
> > > > +       cpumask_clear_cpu(cpu, sds_idle_cpus(sd->shared));
> > > >  unlock:
> > > >         rcu_read_unlock();
> > > >  }
> > > > @@ -10212,6 +10213,7 @@ static void set_cpu_sd_state_idle(int cpu)
> > > >         sd->nohz_idle = 1;
> > > >
> > > >         atomic_dec(&sd->shared->nr_busy_cpus);
> > > > +       cpumask_set_cpu(cpu, sds_idle_cpus(sd->shared));
> > > This only works when entering/exiting tickless mode? :)
> > > Why not update idle_cpus_span during tick_nohz_idle_enter()/exit()?
> >
> > set_cpu_sd_state_busy is only called during a tick in order to limit
> > the rate of the update to at most once per tick per cpu and to prevent
> > any storm of updates if short-running tasks wake/sleep all the
> > time. We don't want to update a cpumask on each and every idle
> > entry/exit.
> >
> Agree. But set_cpu_sd_state_busy does not seem to be reached when
> nohz=off, which means it will not work for that case? :)

Yes, set_cpu_sd_state_idle/busy are nohz functions
>
> Thx.
> Regards,
> Jiang

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v1 1/1] sched/fair: select idle cpu from idle cpumask in sched domain
  2020-09-15  9:23         ` Vincent Guittot
@ 2020-09-15 13:38           ` Li, Aubrey
  0 siblings, 0 replies; 15+ messages in thread
From: Li, Aubrey @ 2020-09-15 13:38 UTC (permalink / raw)
  To: Vincent Guittot, Jiang Biao
  Cc: Aubrey Li, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Tim Chen, linux-kernel

On 2020/9/15 17:23, Vincent Guittot wrote:
> On Tue, 15 Sep 2020 at 10:47, Jiang Biao <benbjiang@gmail.com> wrote:
>>
>> Hi, Vincent
>>
>> On Mon, 14 Sep 2020 at 20:26, Vincent Guittot
>> <vincent.guittot@linaro.org> wrote:
>>>
>>> On Sun, 13 Sep 2020 at 05:59, Jiang Biao <benbjiang@gmail.com> wrote:
>>>>
>>>> Hi, Aubrey
>>>>
>>>> On Fri, 11 Sep 2020 at 23:48, Aubrey Li <aubrey.li@intel.com> wrote:
>>>>>
>>>>> Added idle cpumask to track idle cpus in sched domain. When a CPU
>>>>> enters idle, its corresponding bit in the idle cpumask will be set,
>>>>> and when the CPU exits idle, its bit will be cleared.
>>>>>
>>>>> When a task wakes up to select an idle cpu, scanning the idle cpumask
>>>>> has lower cost than scanning all the cpus in the last level cache domain,
>>>>> especially when the system is heavily loaded.
>>>>>
>>>>> Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
>>>>> ---
>>>>>  include/linux/sched/topology.h | 13 +++++++++++++
>>>>>  kernel/sched/fair.c            |  4 +++-
>>>>>  kernel/sched/topology.c        |  2 +-
>>>>>  3 files changed, 17 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
>>>>> index fb11091129b3..43a641d26154 100644
>>>>> --- a/include/linux/sched/topology.h
>>>>> +++ b/include/linux/sched/topology.h
>>>>> @@ -65,8 +65,21 @@ struct sched_domain_shared {
>>>>>         atomic_t        ref;
>>>>>         atomic_t        nr_busy_cpus;
>>>>>         int             has_idle_cores;
>>>>> +       /*
>>>>> +        * Span of all idle CPUs in this domain.
>>>>> +        *
>>>>> +        * NOTE: this field is variable length. (Allocated dynamically
>>>>> +        * by attaching extra space to the end of the structure,
>>>>> +        * depending on how many CPUs the kernel has booted up with)
>>>>> +        */
>>>>> +       unsigned long   idle_cpus_span[];
>>>>>  };
>>>>>
>>>>> +static inline struct cpumask *sds_idle_cpus(struct sched_domain_shared *sds)
>>>>> +{
>>>>> +       return to_cpumask(sds->idle_cpus_span);
>>>>> +}
>>>>> +
>>>>>  struct sched_domain {
>>>>>         /* These fields must be setup */
>>>>>         struct sched_domain __rcu *parent;      /* top domain must be null terminated */
>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>> index 6b3b59cc51d6..3b6f8a3589be 100644
>>>>> --- a/kernel/sched/fair.c
>>>>> +++ b/kernel/sched/fair.c
>>>>> @@ -6136,7 +6136,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>>>>>
>>>>>         time = cpu_clock(this);
>>>>>
>>>>> -       cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
>>>>> +       cpumask_and(cpus, sds_idle_cpus(sd->shared), p->cpus_ptr);
>>>> Is the sds_idle_cpus() always empty if nohz=off?
>>>
>>> Good point
>>>
>>>> Do we need to initialize the idle_cpus_span with sched_domain_span(sd)?
>>>>
>>>>>
>>>>>         for_each_cpu_wrap(cpu, cpus, target) {
>>>>>                 if (!--nr)
>>>>> @@ -10182,6 +10182,7 @@ static void set_cpu_sd_state_busy(int cpu)
>>>>>         sd->nohz_idle = 0;
>>>>>
>>>>>         atomic_inc(&sd->shared->nr_busy_cpus);
>>>>> +       cpumask_clear_cpu(cpu, sds_idle_cpus(sd->shared));
>>>>>  unlock:
>>>>>         rcu_read_unlock();
>>>>>  }
>>>>> @@ -10212,6 +10213,7 @@ static void set_cpu_sd_state_idle(int cpu)
>>>>>         sd->nohz_idle = 1;
>>>>>
>>>>>         atomic_dec(&sd->shared->nr_busy_cpus);
>>>>> +       cpumask_set_cpu(cpu, sds_idle_cpus(sd->shared));
>>>> This only works when entering/exiting tickless mode? :)
>>>> Why not update idle_cpus_span during tick_nohz_idle_enter()/exit()?
>>>
>>> set_cpu_sd_state_busy is only called during a tick in order to limit
>>> the rate of the update to at most once per tick per cpu and to prevent
>>> any storm of updates if short-running tasks wake/sleep all the
>>> time. We don't want to update a cpumask on each and every idle
>>> entry/exit.
>>>
>> Agree. But set_cpu_sd_state_busy does not seem to be reached when
>> nohz=off, which means it will not work for that case? :)
> 
> Yes, set_cpu_sd_state_idle/busy are nohz functions

Thanks, Biao, for pointing this out.

If the shared idle cpumask is initialized with sched_domain_span(sd),
then the nohz=off case will retain the previous behavior.

Thanks,
-Aubrey

^ permalink raw reply	[flat|nested] 15+ messages in thread
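
A minimal sketch of that initialization (hypothetical; the posted patch
does not include it, and sds_idle_cpus_init() is a made-up name):

/*
 * Seed the shared idle mask with the full domain span when the domain
 * hierarchy is (re)built. With nohz=off, set_cpu_sd_state_idle()/busy()
 * never run, so this makes select_idle_cpu() fall back to scanning the
 * whole LLC span as before the patch, instead of an always-empty mask.
 */
static void sds_idle_cpus_init(struct sched_domain *sd)
{
	if (sd->shared)
		cpumask_copy(sds_idle_cpus(sd->shared), sched_domain_span(sd));
}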

Thread overview: 15 messages
2020-09-10  5:42 [RFC PATCH v1 0/1] select idle cpu from idle cpumask in sched domain Aubrey Li
2020-09-10  5:42 ` [RFC PATCH v1 1/1] sched/fair: " Aubrey Li
2020-09-11 16:28   ` Qais Yousef
2020-09-11 23:04     ` Li, Aubrey
2020-09-11 23:16       ` Li, Aubrey
2020-09-14 10:31       ` Valentin Schneider
2020-09-14 11:08         ` Qais Yousef
2020-09-14 11:26           ` Valentin Schneider
2020-09-14 11:31             ` Qais Yousef
2020-09-13  3:59   ` Jiang Biao
2020-09-14 12:26     ` Vincent Guittot
2020-09-15  8:47       ` Jiang Biao
2020-09-15  9:23         ` Vincent Guittot
2020-09-15 13:38           ` Li, Aubrey
2020-09-14 12:32   ` Vincent Guittot
