linux-kernel.vger.kernel.org archive mirror
* [PATCH] sched: pull tasks when CPU is about to run SCHED_IDLE tasks
@ 2020-12-23  8:09 ultrachin
  2020-12-23 11:30 ` Vincent Guittot
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: ultrachin @ 2020-12-23  8:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, heddchen, xiaoggchen

From: Chen Xiaoguang <xiaoggchen@tencent.com>

Before a CPU switches from running a SCHED_NORMAL task to a
SCHED_IDLE task, try to pull SCHED_NORMAL tasks from other
CPUs by doing a load_balance first.

Signed-off-by: Chen Xiaoguang <xiaoggchen@tencent.com>
Signed-off-by: Chen He <heddchen@tencent.com>
---
 kernel/sched/fair.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ae7ceba..0a26132 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7004,6 +7004,11 @@ struct task_struct *
 	struct task_struct *p;
 	int new_tasks;
 
+	if (prev &&
+	    fair_policy(prev->policy) &&
+	    sched_idle_cpu(rq->cpu))
+		goto idle;
+
 again:
 	if (!sched_fair_runnable(rq))
 		goto idle;
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] sched: pull tasks when CPU is about to run SCHED_IDLE tasks
  2020-12-23  8:09 [PATCH] sched: pull tasks when CPU is about to run SCHED_IDLE tasks ultrachin
@ 2020-12-23 11:30 ` Vincent Guittot
  2021-01-11  8:26   ` chin
  2021-01-11  9:15   ` He Chen
  2020-12-27 19:13 ` kernel test robot
  2020-12-27 19:42 ` kernel test robot
  2 siblings, 2 replies; 20+ messages in thread
From: Vincent Guittot @ 2020-12-23 11:30 UTC (permalink / raw)
  To: ultrachin
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, heddchen,
	xiaoggchen(陈小光)

On Wed, 23 Dec 2020 at 09:32, <ultrachin@163.com> wrote:
>
> From: Chen Xiaoguang <xiaoggchen@tencent.com>
>
> Before a CPU switches from running a SCHED_NORMAL task to a
> SCHED_IDLE task, try to pull SCHED_NORMAL tasks from other

Could you explain more in detail why you only care about this use case
in particular and not the general case?

> CPUs by doing a load_balance first.
>
> Signed-off-by: Chen Xiaoguang <xiaoggchen@tencent.com>
> Signed-off-by: Chen He <heddchen@tencent.com>
> ---
>  kernel/sched/fair.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ae7ceba..0a26132 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7004,6 +7004,11 @@ struct task_struct *
>         struct task_struct *p;
>         int new_tasks;
>
> +       if (prev &&
> +           fair_policy(prev->policy) &&

Why do you need a prev and a fair task? You seem to target the special
case of pick_next_task, but in that case why not only test rf != NULL
to make sure we do not return immediately after jumping to the idle
label?

Also, why not do that for the default case too, i.e. balance_fair()?

> +           sched_idle_cpu(rq->cpu))
> +               goto idle;
> +
>  again:
>         if (!sched_fair_runnable(rq))
>                 goto idle;
> --
> 1.8.3.1
>
>


* Re: [PATCH] sched: pull tasks when CPU is about to run SCHED_IDLE tasks
  2020-12-23  8:09 [PATCH] sched: pull tasks when CPU is about to run SCHED_IDLE tasks ultrachin
  2020-12-23 11:30 ` Vincent Guittot
@ 2020-12-27 19:13 ` kernel test robot
  2020-12-27 19:42 ` kernel test robot
  2 siblings, 0 replies; 20+ messages in thread
From: kernel test robot @ 2020-12-27 19:13 UTC (permalink / raw)
  To: ultrachin, linux-kernel
  Cc: kbuild-all, mingo, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot

[-- Attachment #1: Type: text/plain, Size: 2697 bytes --]

Hi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on tip/master linux/master linus/master v5.10 next-20201223]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/ultrachin-163-com/sched-pull-tasks-when-CPU-is-about-to-run-SCHED_IDLE-tasks/20201223-175522
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 5b78f2dc315354c05300795064f587366a02c6ff
config: microblaze-randconfig-r023-20201223 (attached as .config)
compiler: microblaze-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/3dc62b606dd00e8c8935ff6b85d8bf26a960842b
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review ultrachin-163-com/sched-pull-tasks-when-CPU-is-about-to-run-SCHED_IDLE-tasks/20201223-175522
        git checkout 3dc62b606dd00e8c8935ff6b85d8bf26a960842b
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=microblaze 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   kernel/sched/fair.c: In function 'pick_next_task_fair':
   kernel/sched/fair.c:7034:6: error: implicit declaration of function 'sched_idle_cpu'; did you mean 'sched_idle_rq'? [-Werror=implicit-function-declaration]
    7034 |      sched_idle_cpu(rq->cpu))
         |      ^~~~~~~~~~~~~~
         |      sched_idle_rq
>> kernel/sched/fair.c:7034:23: error: 'struct rq' has no member named 'cpu'
    7034 |      sched_idle_cpu(rq->cpu))
         |                       ^~
   cc1: some warnings being treated as errors
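[Editorial note: both errors point at a !CONFIG_SMP build: sched_idle_cpu() is
only declared when CONFIG_SMP is set, and struct rq only has a cpu member on
SMP. One possible way to make the guard build everywhere, sketched here only
and untested, is to use the existing cpu_of() helper and compile the check out
on UP, where there is no other CPU to pull from anyway:]

```c
/* kernel/sched/fair.c, in pick_next_task_fair() -- sketch only, untested */
#ifdef CONFIG_SMP
	if (prev &&
	    fair_policy(prev->policy) &&
	    sched_idle_cpu(cpu_of(rq)))
		goto idle;
#endif
```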


vim +7034 kernel/sched/fair.c

  7023	
  7024	struct task_struct *
  7025	pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
  7026	{
  7027		struct cfs_rq *cfs_rq = &rq->cfs;
  7028		struct sched_entity *se;
  7029		struct task_struct *p;
  7030		int new_tasks;
  7031	
  7032		if (prev &&
  7033		    fair_policy(prev->policy) &&
> 7034		    sched_idle_cpu(rq->cpu))
  7035			goto idle;
  7036	
  7037	again:
  7038		if (!sched_fair_runnable(rq))
  7039			goto idle;
  7040	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34704 bytes --]


* Re: [PATCH] sched: pull tasks when CPU is about to run SCHED_IDLE tasks
  2020-12-23  8:09 [PATCH] sched: pull tasks when CPU is about to run SCHED_IDLE tasks ultrachin
  2020-12-23 11:30 ` Vincent Guittot
  2020-12-27 19:13 ` kernel test robot
@ 2020-12-27 19:42 ` kernel test robot
  2 siblings, 0 replies; 20+ messages in thread
From: kernel test robot @ 2020-12-27 19:42 UTC (permalink / raw)
  To: ultrachin, linux-kernel
  Cc: kbuild-all, mingo, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot

[-- Attachment #1: Type: text/plain, Size: 2697 bytes --]

Hi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on tip/master linux/master linus/master v5.10 next-20201223]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/ultrachin-163-com/sched-pull-tasks-when-CPU-is-about-to-run-SCHED_IDLE-tasks/20201223-175522
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 5b78f2dc315354c05300795064f587366a02c6ff
config: microblaze-randconfig-r002-20201223 (attached as .config)
compiler: microblaze-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/3dc62b606dd00e8c8935ff6b85d8bf26a960842b
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review ultrachin-163-com/sched-pull-tasks-when-CPU-is-about-to-run-SCHED_IDLE-tasks/20201223-175522
        git checkout 3dc62b606dd00e8c8935ff6b85d8bf26a960842b
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=microblaze 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   kernel/sched/fair.c: In function 'pick_next_task_fair':
   kernel/sched/fair.c:7034:6: error: implicit declaration of function 'sched_idle_cpu'; did you mean 'sched_idle_rq'? [-Werror=implicit-function-declaration]
    7034 |      sched_idle_cpu(rq->cpu))
         |      ^~~~~~~~~~~~~~
         |      sched_idle_rq
>> kernel/sched/fair.c:7034:23: error: 'struct rq' has no member named 'cpu'
    7034 |      sched_idle_cpu(rq->cpu))
         |                       ^~
   cc1: some warnings being treated as errors


vim +7034 kernel/sched/fair.c

  7023	
  7024	struct task_struct *
  7025	pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
  7026	{
  7027		struct cfs_rq *cfs_rq = &rq->cfs;
  7028		struct sched_entity *se;
  7029		struct task_struct *p;
  7030		int new_tasks;
  7031	
  7032		if (prev &&
  7033		    fair_policy(prev->policy) &&
> 7034		    sched_idle_cpu(rq->cpu))
  7035			goto idle;
  7036	
  7037	again:
  7038		if (!sched_fair_runnable(rq))
  7039			goto idle;
  7040	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 30107 bytes --]


* Re: [PATCH] sched: pull tasks when CPU is about to run SCHED_IDLE tasks
  2020-12-23 11:30 ` Vincent Guittot
@ 2021-01-11  8:26   ` chin
  2021-01-11 11:04     ` Vincent Guittot
  2021-01-11  9:15   ` He Chen
  1 sibling, 1 reply; 20+ messages in thread
From: chin @ 2021-01-11  8:26 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, heddchen,
	xiaoggchen(陈小光)


At 2020-12-23 19:30:26, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
>On Wed, 23 Dec 2020 at 09:32, <ultrachin@163.com> wrote:
>>
>> From: Chen Xiaoguang <xiaoggchen@tencent.com>
>>
>> Before a CPU switches from running a SCHED_NORMAL task to a
>> SCHED_IDLE task, try to pull SCHED_NORMAL tasks from other
>
>Could you explain more in detail why you only care about this use case
>in particular and not the general case?


We want to run online tasks using the SCHED_NORMAL policy and offline tasks
using the SCHED_IDLE policy. The online tasks and the offline tasks run on
the same machine in order to use it efficiently.
The online tasks sleep most of the time but should respond quickly once
woken up. The offline tasks have low priority and run only when there are
no online tasks.

The online tasks are more important than the offline tasks and are latency
sensitive, so we should make sure the online tasks preempt the offline tasks
as soon as possible while there are online tasks waiting to run.
So in our situation we want a SCHED_NORMAL task to run if there is one.

Let's assume we have 2 CPUs,
In CPU1 we got 2 SCHED_NORMAL tasks.
in CPU2 we got 1 SCHED_NORMAL task and 2 SCHED_IDLE tasks.

             CPU1                      CPU2
        curr       rq1            curr          rq2
      +------+ | +------+       +------+ | +----+ +----+
t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
      +------+ | +------+       +------+ | +----+ +----+

                                 NORMAL exits or blocked
      +------+ | +------+                | +----+ +----+
t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
      +------+ | +------+                | +----+ +----+

                                 pick_next_task_fair
      +------+ | +------+         +----+ | +----+
t2    |NORMAL| | |NORMAL|         |IDLE| | |IDLE|
      +------+ | +------+         +----+ | +----+

                                 SCHED_IDLE running
t3    +------+ | +------+        +----+  | +----+
      |NORMAL| | |NORMAL|        |IDLE|  | |IDLE|
      +------+ | +------+        +----+  | +----+
                 
                                 run_rebalance_domains
      +------+ |                +------+ | +----+ +----+
t4    |NORMAL| |                |NORMAL| | |IDLE| |IDLE|
      +------+ |                +------+ | +----+ +----+

As we can see:
t1: the NORMAL task on CPU2 exits or blocks.
t2: pick_next_task_fair on CPU2 picks a SCHED_IDLE task to run while
another SCHED_NORMAL task in rq1 is waiting.
t3: a SCHED_IDLE task runs on CPU2 while a SCHED_NORMAL task waits on CPU1.
t4: after a short time, the periodic load_balance is triggered and pulls
the SCHED_NORMAL task from rq1 to rq2, where it likely preempts SCHED_IDLE.

In this scenario, SCHED_IDLE runs while SCHED_NORMAL is waiting to run.
The latency of that SCHED_NORMAL task will be high, which is not acceptable.

Doing a load_balance before running the SCHED_IDLE task may fix this problem.

This patch works as below:

             CPU1                      CPU2
        curr       rq1            curr          rq2
      +------+ | +------+       +------+ | +----+ +----+
t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
      +------+ | +------+       +------+ | +----+ +----+

                                 NORMAL exits or blocked
      +------+ | +------+                | +----+ +----+
t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
      +------+ | +------+                | +----+ +----+

t2                            pick_next_task_fair (all se are SCHED_IDLE)

                                 newidle_balance
      +------+ |                 +------+ | +----+ +----+
t3    |NORMAL| |                 |NORMAL| | |IDLE| |IDLE|
      +------+ |                 +------+ | +----+ +----+


t1: the NORMAL task on CPU2 exits or blocks.
t2: pick_next_task_fair sees that all se in the rbtree are SCHED_IDLE and calls
newidle_balance, which tries to pull a SCHED_NORMAL task (if there is one).
t3: pick_next_task_fair then (likely) picks a SCHED_NORMAL task to run instead of
SCHED_IDLE.
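[Editorial note: the condition the patch adds can be modeled outside the
kernel as a small sketch. The names here are simplified and hypothetical:
toy_rq, sched_idle_rq_only() and should_try_pull() are illustrations
standing in for the kernel's runqueue, sched_idle_cpu() and the guard in
pick_next_task_fair(), not real kernel APIs:]

```c
#include <assert.h>   /* used by the accompanying checks */
#include <stdbool.h>

/*
 * Toy model of the condition the patch adds to pick_next_task_fair().
 * sched_idle_rq_only() plays the role of sched_idle_cpu(): the rq is
 * busy, but only with SCHED_IDLE tasks.
 */

enum policy { POLICY_NORMAL, POLICY_IDLE, POLICY_RT };

struct toy_rq {
	int nr_running;      /* all runnable CFS tasks */
	int nr_idle_running; /* runnable SCHED_IDLE tasks among them */
};

static bool fair_policy(enum policy p)
{
	return p == POLICY_NORMAL; /* SCHED_BATCH omitted for brevity */
}

static bool sched_idle_rq_only(const struct toy_rq *rq)
{
	return rq->nr_running && rq->nr_running == rq->nr_idle_running;
}

/*
 * Try a pull (newidle_balance) only when a fair SCHED_NORMAL prev is
 * about to hand the CPU over to SCHED_IDLE-only work.
 */
static bool should_try_pull(bool have_prev, enum policy prev_policy,
			    const struct toy_rq *rq)
{
	return have_prev &&
	       fair_policy(prev_policy) &&
	       sched_idle_rq_only(rq);
}
```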

>
>> CPUs by doing a load_balance first.
>>
>> Signed-off-by: Chen Xiaoguang <xiaoggchen@tencent.com>
>> Signed-off-by: Chen He <heddchen@tencent.com>
>> ---
>>  kernel/sched/fair.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index ae7ceba..0a26132 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -7004,6 +7004,11 @@ struct task_struct *
>>         struct task_struct *p;
>>         int new_tasks;
>>
>> +       if (prev &&
>> +           fair_policy(prev->policy) &&
>
>Why do you need a prev and a fair task? You seem to target the special
>case of pick_next_task, but in that case why not only test rf != NULL
>to make sure we do not return immediately after jumping to the idle
>label?
We just want to do a load_balance only when the CPU switches from SCHED_NORMAL
to SCHED_IDLE.
If we did not check prev, then when the running tasks are all SCHED_IDLE we would
do a newidle_balance every time in pick_next_task_fair, which makes no sense
and is wasteful.

>
>Also, why not do that for the default case too, i.e. balance_fair()?
You are right, if you think this scenario makes sense, we will send a
refined patch soon :-)

>
>> +           sched_idle_cpu(rq->cpu))
>> +               goto idle;
>> +
>>  again:
>>         if (!sched_fair_runnable(rq))
>>                 goto idle;
>> --
>> 1.8.3.1
>>
>>


* Re: [PATCH] sched: pull tasks when CPU is about to run SCHED_IDLE tasks
  2020-12-23 11:30 ` Vincent Guittot
  2021-01-11  8:26   ` chin
@ 2021-01-11  9:15   ` He Chen
  1 sibling, 0 replies; 20+ messages in thread
From: He Chen @ 2021-01-11  9:15 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, He Chen, He Chen, Xiaoguang Chen,
	Xiaoguang Chen

On Wed, Dec 23, 2020 at 12:30:26PM +0100, Vincent Guittot wrote:
> On Wed, 23 Dec 2020 at 09:32, <ultrachin@163.com> wrote:
> >
> > From: Chen Xiaoguang <xiaoggchen@tencent.com>
> >
> > Before a CPU switches from running a SCHED_NORMAL task to a
> > SCHED_IDLE task, try to pull SCHED_NORMAL tasks from other
> 
> Could you explain more in detail why you only care about this use case
> in particular and not the general case?
> 

We want to run online tasks using the SCHED_NORMAL policy and offline tasks
using the SCHED_IDLE policy. The online tasks and the offline tasks run on
the same machine in order to use it efficiently.
The online tasks sleep most of the time but should respond quickly once
woken up. The offline tasks have low priority and run only when there are
no online tasks.

The online tasks are more important than the offline tasks and are latency
sensitive, so we should make sure the online tasks preempt the offline tasks
as soon as possible while there are online tasks waiting to run.
So in our situation we want a SCHED_NORMAL task to run if there is one.

Let's assume we have 2 CPUs,
In CPU1 we got 2 SCHED_NORMAL tasks.
in CPU2 we got 1 SCHED_NORMAL task and 2 SCHED_IDLE tasks.

             CPU1                      CPU2
        curr       rq1            curr          rq2
      +------+ | +------+       +------+ | +----+ +----+
t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
      +------+ | +------+       +------+ | +----+ +----+

                                 NORMAL exits or blocked
      +------+ | +------+                | +----+ +----+
t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
      +------+ | +------+                | +----+ +----+

                                 pick_next_task_fair
      +------+ | +------+         +----+ | +----+
t2    |NORMAL| | |NORMAL|         |IDLE| | |IDLE|
      +------+ | +------+         +----+ | +----+

                                 SCHED_IDLE running
t3    +------+ | +------+        +----+  | +----+
      |NORMAL| | |NORMAL|        |IDLE|  | |IDLE|
      +------+ | +------+        +----+  | +----+
                 
                                 run_rebalance_domains
      +------+ |                +------+ | +----+ +----+
t4    |NORMAL| |                |NORMAL| | |IDLE| |IDLE|
      +------+ |                +------+ | +----+ +----+

As we can see:
t1: the NORMAL task on CPU2 exits or blocks.
t2: pick_next_task_fair on CPU2 picks a SCHED_IDLE task to run while
another SCHED_NORMAL task in rq1 is waiting.
t3: a SCHED_IDLE task runs on CPU2 while a SCHED_NORMAL task waits on CPU1.
t4: after a short time, the periodic load_balance is triggered and pulls
the SCHED_NORMAL task from rq1 to rq2, where it likely preempts SCHED_IDLE.

In this scenario, SCHED_IDLE runs while SCHED_NORMAL is waiting to run.
The latency of that SCHED_NORMAL task will be high, which is not acceptable.

Doing a load_balance before running the SCHED_IDLE task may fix this problem.

This patch works as below:

             CPU1                      CPU2
        curr       rq1            curr          rq2
      +------+ | +------+       +------+ | +----+ +----+
t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
      +------+ | +------+       +------+ | +----+ +----+

                                 NORMAL exits or blocked
      +------+ | +------+                | +----+ +----+
t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
      +------+ | +------+                | +----+ +----+

t2                            pick_next_task_fair (all se are SCHED_IDLE)

                                 newidle_balance
      +------+ |                 +------+ | +----+ +----+
t3    |NORMAL| |                 |NORMAL| | |IDLE| |IDLE|
      +------+ |                 +------+ | +----+ +----+


t1: the NORMAL task on CPU2 exits or blocks.
t2: pick_next_task_fair sees that all se in the rbtree are SCHED_IDLE and calls
newidle_balance, which tries to pull a SCHED_NORMAL task (if there is one).
t3: pick_next_task_fair then (likely) picks a SCHED_NORMAL task to run instead of
SCHED_IDLE.

> > CPU by doing load_balance first.
> >
> > Signed-off-by: Chen Xiaoguang <xiaoggchen@tencent.com>
> > Signed-off-by: Chen He <heddchen@tencent.com>
> > ---
> >  kernel/sched/fair.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index ae7ceba..0a26132 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7004,6 +7004,11 @@ struct task_struct *
> >         struct task_struct *p;
> >         int new_tasks;
> >
> > +       if (prev &&
> > +           fair_policy(prev->policy) &&
> 
> Why do you need a prev and a fair task? You seem to target the special
> case of pick_next_task, but in that case why not only test rf != NULL
> to make sure we do not return immediately after jumping to the idle
> label?
> 

We just want to do a load_balance only when the CPU switches from SCHED_NORMAL
to SCHED_IDLE.
If we did not check prev, then when the running tasks are all SCHED_IDLE we would
do a newidle_balance every time in pick_next_task_fair, which makes no sense
and is wasteful.

> Also, why not do that for the default case too, i.e. balance_fair()?

You are right, if you think this scenario makes sense, we will send a
refined patch soon :-)

Thanks,
-He

> 
> > +           sched_idle_cpu(rq->cpu))
> > +               goto idle;
> > +
> >  again:
> >         if (!sched_fair_runnable(rq))
> >                 goto idle;
> > --
> > 1.8.3.1
> >
> >
> 


* Re: [PATCH] sched: pull tasks when CPU is about to run SCHED_IDLE tasks
  2021-01-11  8:26   ` chin
@ 2021-01-11 11:04     ` Vincent Guittot
  2021-01-12  6:57       ` chin
  0 siblings, 1 reply; 20+ messages in thread
From: Vincent Guittot @ 2021-01-11 11:04 UTC (permalink / raw)
  To: chin
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, heddchen,
	xiaoggchen(陈小光)

On Mon, 11 Jan 2021 at 09:27, chin <ultrachin@163.com> wrote:
>
>
> At 2020-12-23 19:30:26, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> >On Wed, 23 Dec 2020 at 09:32, <ultrachin@163.com> wrote:
> >>
> >> From: Chen Xiaoguang <xiaoggchen@tencent.com>
> >>
> >> Before a CPU switches from running a SCHED_NORMAL task to a
> >> SCHED_IDLE task, try to pull SCHED_NORMAL tasks from other
> >
> >Could you explain more in detail why you only care about this use case
> >in particular and not the general case?
>
>
> We want to run online tasks using the SCHED_NORMAL policy and offline tasks
> using the SCHED_IDLE policy. The online tasks and the offline tasks run on
> the same machine in order to use it efficiently.
> The online tasks sleep most of the time but should respond quickly once
> woken up. The offline tasks have low priority and run only when there are
> no online tasks.
>
> The online tasks are more important than the offline tasks and are latency
> sensitive, so we should make sure the online tasks preempt the offline tasks
> as soon as possible while there are online tasks waiting to run.
> So in our situation we want a SCHED_NORMAL task to run if there is one.
>
> Let's assume we have 2 CPUs,
> In CPU1 we got 2 SCHED_NORMAL tasks.
> in CPU2 we got 1 SCHED_NORMAL task and 2 SCHED_IDLE tasks.
>
>              CPU1                      CPU2
>         curr       rq1            curr          rq2
>       +------+ | +------+       +------+ | +----+ +----+
> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
>       +------+ | +------+       +------+ | +----+ +----+
>
>                                  NORMAL exits or blocked
>       +------+ | +------+                | +----+ +----+
> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
>       +------+ | +------+                | +----+ +----+
>
>                                  pick_next_task_fair
>       +------+ | +------+         +----+ | +----+
> t2    |NORMAL| | |NORMAL|         |IDLE| | |IDLE|
>       +------+ | +------+         +----+ | +----+
>
>                                  SCHED_IDLE running
> t3    +------+ | +------+        +----+  | +----+
>       |NORMAL| | |NORMAL|        |IDLE|  | |IDLE|
>       +------+ | +------+        +----+  | +----+
>
>                                  run_rebalance_domains
>       +------+ |                +------+ | +----+ +----+
> t4    |NORMAL| |                |NORMAL| | |IDLE| |IDLE|
>       +------+ |                +------+ | +----+ +----+
>
> As we can see:
> t1: the NORMAL task on CPU2 exits or blocks.
> t2: pick_next_task_fair on CPU2 picks a SCHED_IDLE task to run while
> another SCHED_NORMAL task in rq1 is waiting.
> t3: a SCHED_IDLE task runs on CPU2 while a SCHED_NORMAL task waits on CPU1.
> t4: after a short time, the periodic load_balance is triggered and pulls
> the SCHED_NORMAL task from rq1 to rq2, where it likely preempts SCHED_IDLE.
>
> In this scenario, SCHED_IDLE runs while SCHED_NORMAL is waiting to run.
> The latency of that SCHED_NORMAL task will be high, which is not acceptable.
>
> Doing a load_balance before running the SCHED_IDLE task may fix this problem.
>
> This patch works as below:
>
>              CPU1                      CPU2
>         curr       rq1            curr          rq2
>       +------+ | +------+       +------+ | +----+ +----+
> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
>       +------+ | +------+       +------+ | +----+ +----+
>
>                                  NORMAL exits or blocked
>       +------+ | +------+                | +----+ +----+
> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
>       +------+ | +------+                | +----+ +----+
>
> t2                            pick_next_task_fair (all se are SCHED_IDLE)
>
>                                  newidle_balance
>       +------+ |                 +------+ | +----+ +----+
> t3    |NORMAL| |                 |NORMAL| | |IDLE| |IDLE|
>       +------+ |                 +------+ | +----+ +----+
>
>
> t1: the NORMAL task on CPU2 exits or blocks.
> t2: pick_next_task_fair sees that all se in the rbtree are SCHED_IDLE and calls
> newidle_balance, which tries to pull a SCHED_NORMAL task (if there is one).
> t3: pick_next_task_fair then (likely) picks a SCHED_NORMAL task to run instead of
> SCHED_IDLE.
>
> >
> >> CPUs by doing a load_balance first.
> >>
> >> Signed-off-by: Chen Xiaoguang <xiaoggchen@tencent.com>
> >> Signed-off-by: Chen He <heddchen@tencent.com>
> >> ---
> >>  kernel/sched/fair.c | 5 +++++
> >>  1 file changed, 5 insertions(+)
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index ae7ceba..0a26132 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -7004,6 +7004,11 @@ struct task_struct *
> >>         struct task_struct *p;
> >>         int new_tasks;
> >>
> >> +       if (prev &&
> >> +           fair_policy(prev->policy) &&
> >
> >Why do you need a prev and a fair task? You seem to target the special
> >case of pick_next_task, but in that case why not only test rf != NULL
> >to make sure we do not return immediately after jumping to the idle
> >label?
> We just want to do a load_balance only when the CPU switches from SCHED_NORMAL
> to SCHED_IDLE.
> If we did not check prev, then when the running tasks are all SCHED_IDLE we would
> do a newidle_balance every time in pick_next_task_fair, which makes no sense
> and is wasteful.

I agree that calling newidle_balance every time pick_next_task_fair is
called when there are only sched_idle tasks is useless.
But you also have to take into account cases where another class of
task, like an RT one, was running on the cpu. In your example above,
if you replace the normal task on CPU2 with an RT task, you still want to
pick the normal task on CPU1 once the RT task goes to sleep.

Another point: you will have to consider the impact on
rq->idle_stamp, because newidle_balance is assumed to be called before
going idle, which is no longer the case with your use case.
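[Editorial note: the RT objection above can be made concrete with a tiny
model. Names are simplified and hypothetical, not kernel code: under the
guard as posted, an RT task handing the CPU back to SCHED_IDLE-only fair
work never triggers the pull, even though a SCHED_NORMAL task may be
waiting on another CPU:]

```c
#include <assert.h>   /* used by the accompanying checks */
#include <stdbool.h>

/*
 * Toy illustration: the posted guard requires prev to have a fair
 * SCHED_NORMAL policy, so the RT -> SCHED_IDLE handover is missed.
 */

enum policy { POLICY_NORMAL, POLICY_IDLE, POLICY_RT };

static bool fair_policy(enum policy p)
{
	return p == POLICY_NORMAL;
}

/* The guard as posted: prev must be fair, and only idle tasks remain. */
static bool patch_would_pull(enum policy prev_policy, bool only_idle_left)
{
	return fair_policy(prev_policy) && only_idle_left;
}
```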

>
> >
>
> >Also, why not do that for the default case too, i.e. balance_fair()?
> You are right, if you think this scenario makes sense, we will send a
> refined patch soon :-)
>
> >
> >> +           sched_idle_cpu(rq->cpu))
> >> +               goto idle;
> >> +
> >>  again:
> >>         if (!sched_fair_runnable(rq))
> >>                 goto idle;
> >> --
> >> 1.8.3.1
> >>
> >>


* Re: [PATCH] sched: pull tasks when CPU is about to run SCHED_IDLE tasks
  2021-01-11 11:04     ` Vincent Guittot
@ 2021-01-12  6:57       ` chin
  2021-01-12  8:18         ` Vincent Guittot
  0 siblings, 1 reply; 20+ messages in thread
From: chin @ 2021-01-12  6:57 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, heddchen,
	xiaoggchen(陈小光)




At 2021-01-11 19:04:19, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
>On Mon, 11 Jan 2021 at 09:27, chin <ultrachin@163.com> wrote:
>>
>>
>> At 2020-12-23 19:30:26, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
>> >On Wed, 23 Dec 2020 at 09:32, <ultrachin@163.com> wrote:
>> >>
>> >> From: Chen Xiaoguang <xiaoggchen@tencent.com>
>> >>
>> >> Before a CPU switches from running a SCHED_NORMAL task to a
>> >> SCHED_IDLE task, try to pull SCHED_NORMAL tasks from other
>> >
>> >Could you explain more in detail why you only care about this use case
>> >in particular and not the general case?
>>
>>
>> We want to run online tasks using the SCHED_NORMAL policy and offline tasks
>> using the SCHED_IDLE policy. The online tasks and the offline tasks run on
>> the same machine in order to use it efficiently.
>> The online tasks sleep most of the time but should respond quickly once
>> woken up. The offline tasks have low priority and run only when there are
>> no online tasks.
>>
>> The online tasks are more important than the offline tasks and are latency
>> sensitive, so we should make sure the online tasks preempt the offline tasks
>> as soon as possible while there are online tasks waiting to run.
>> So in our situation we want a SCHED_NORMAL task to run if there is one.
>>
>> Let's assume we have 2 CPUs,
>> In CPU1 we got 2 SCHED_NORMAL tasks.
>> in CPU2 we got 1 SCHED_NORMAL task and 2 SCHED_IDLE tasks.
>>
>>              CPU1                      CPU2
>>         curr       rq1            curr          rq2
>>       +------+ | +------+       +------+ | +----+ +----+
>> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
>>       +------+ | +------+       +------+ | +----+ +----+
>>
>>                                  NORMAL exits or blocked
>>       +------+ | +------+                | +----+ +----+
>> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
>>       +------+ | +------+                | +----+ +----+
>>
>>                                  pick_next_task_fair
>>       +------+ | +------+         +----+ | +----+
>> t2    |NORMAL| | |NORMAL|         |IDLE| | |IDLE|
>>       +------+ | +------+         +----+ | +----+
>>
>>                                  SCHED_IDLE running
>> t3    +------+ | +------+        +----+  | +----+
>>       |NORMAL| | |NORMAL|        |IDLE|  | |IDLE|
>>       +------+ | +------+        +----+  | +----+
>>
>>                                  run_rebalance_domains
>>       +------+ |                +------+ | +----+ +----+
>> t4    |NORMAL| |                |NORMAL| | |IDLE| |IDLE|
>>       +------+ |                +------+ | +----+ +----+
>>
>> As we can see:
>> t1: the NORMAL task in CPU2 exits or blocks.
>> t2: CPU2's pick_next_task_fair would pick a SCHED_IDLE task to run while
>> another SCHED_NORMAL task in rq1 is waiting.
>> t3: a SCHED_IDLE task runs on CPU2 while a SCHED_NORMAL task waits on CPU1.
>> t4: after a short time, the periodic load_balance triggers and pulls the
>> SCHED_NORMAL task from rq1 to rq2, where it likely preempts SCHED_IDLE.
>>
>> In this scenario, SCHED_IDLE is running while SCHED_NORMAL is waiting to run.
>> The latency of this SCHED_NORMAL task will be high, which is not acceptable.
>>
>> Doing a load_balance before running the SCHED_IDLE task may fix this problem.
>>
>> This patch works as below:
>>
>>              CPU1                      CPU2
>>         curr       rq1            curr          rq2
>>       +------+ | +------+       +------+ | +----+ +----+
>> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
>>       +------+ | +------+       +------+ | +----+ +----+
>>
>>                                  NORMAL exits or blocked
>>       +------+ | +------+                | +----+ +----+
>> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
>>       +------+ | +------+                | +----+ +----+
>>
>> t2                            pick_next_task_fair (all se are SCHED_IDLE)
>>
>>                                  newidle_balance
>>       +------+ |                 +------+ | +----+ +----+
>> t3    |NORMAL| |                 |NORMAL| | |IDLE| |IDLE|
>>       +------+ |                 +------+ | +----+ +----+
>>
>>
>> t1: the NORMAL task in CPU2 exits or blocks.
>> t2: pick_next_task_fair checks that all se in the rbtree are SCHED_IDLE and
>> calls newidle_balance, which tries to pull a SCHED_NORMAL task (if there is one).
>> t3: pick_next_task_fair would then (likely) pick a SCHED_NORMAL task to run
>> instead of SCHED_IDLE.
>>
>> >
>> >> CPU by doing load_balance first.
>> >>
>> >> Signed-off-by: Chen Xiaoguang <xiaoggchen@tencent.com>
>> >> Signed-off-by: Chen He <heddchen@tencent.com>
>> >> ---
>> >>  kernel/sched/fair.c | 5 +++++
>> >>  1 file changed, 5 insertions(+)
>> >>
>> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> >> index ae7ceba..0a26132 100644
>> >> --- a/kernel/sched/fair.c
>> >> +++ b/kernel/sched/fair.c
>> >> @@ -7004,6 +7004,11 @@ struct task_struct *
>> >>         struct task_struct *p;
>> >>         int new_tasks;
>> >>
>> >> +       if (prev &&
>> >> +           fair_policy(prev->policy) &&
>> >
>> >Why do you need a prev and fair task? You seem to target the special
>> >case of pick_next_task, but in this case why not only test rf != null
>> >to make sure not to return immediately after jumping to the idle label?
>> We just want to do load_balance only when the CPU switches from SCHED_NORMAL
>> to SCHED_IDLE.
>> Without checking prev, when the running tasks are all SCHED_IDLE, we would
>> do newidle_balance every time in pick_next_task_fair, which makes no sense
>> and is wasteful.
>
>I agree that calling newidle_balance every time pick_next_task_fair is
>called when there are only sched_idle tasks is useless.
>But you also have to take into account cases where there was another
>class of task running on the cpu, like an RT one. In your example above,
>if you replace the normal task on CPU2 by an RT task, you still want to
>pick the normal task on CPU1 once the RT task goes to sleep.
Sure, this case should be taken into account; we should also try to
pick a normal task in this case.

>
>Another point: you will have to consider the impact on
>rq->idle_stamp, because newidle_balance is assumed to be called before
>going idle, which is no longer the case with your use case.
Yes. rq->idle_stamp should not be changed in this case.



Actually we want to pull a SCHED_NORMAL task (if possible) to run when a CPU is
about to run a SCHED_IDLE task. But currently newidle_balance is not
designed for SCHED_IDLE, so SCHED_IDLE tasks can also be pulled, which
is useless in our situation.

So we plan to add a new function, sched_idle_balance, which only tries to
pull SCHED_NORMAL tasks from the busiest CPU. We will call
sched_idle_balance when the previous task is normal or RT,
hoping we can pull a SCHED_NORMAL task to run.

Do you think it is ok to add a new sched_idle_balance?

>
>>
>> >
>>
>> >Also why not doing that for default case too ? i.e. balance_fair() ?
>> You are right, if you think this scenario makes sense, we will send a
>> refined patch soon :-)
>>
>> >
>> >> +           sched_idle_cpu(rq->cpu))
>> >> +               goto idle;
>> >> +
>> >>  again:
>> >>         if (!sched_fair_runnable(rq))
>> >>                 goto idle;
>> >> --
>> >> 1.8.3.1
>> >>
>> >>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] sched: pull tasks when CPU is about to run SCHED_IDLE tasks
  2021-01-12  6:57       ` chin
@ 2021-01-12  8:18         ` Vincent Guittot
  2021-01-13  3:12           ` chin
  0 siblings, 1 reply; 20+ messages in thread
From: Vincent Guittot @ 2021-01-12  8:18 UTC (permalink / raw)
  To: chin
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, heddchen,
	xiaoggchen(陈小光)

On Tue, 12 Jan 2021 at 07:59, chin <ultrachin@163.com> wrote:
>
>
>
>
> At 2021-01-11 19:04:19, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> >On Mon, 11 Jan 2021 at 09:27, chin <ultrachin@163.com> wrote:
> >>
> >>
> >> At 2020-12-23 19:30:26, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> >> >On Wed, 23 Dec 2020 at 09:32, <ultrachin@163.com> wrote:
> >> >>
> >> >> From: Chen Xiaoguang <xiaoggchen@tencent.com>
> >> >>
> >> >> Before a CPU switches from running SCHED_NORMAL task to
> >> >> SCHED_IDLE task, trying to pull SCHED_NORMAL tasks from other
> >> >
> >> >Could you explain in more detail why you only care about this use case
> >> >in particular and not the general case?
> >>
> >>
> >> We want to run online tasks using the SCHED_NORMAL policy and offline tasks
> >> using the SCHED_IDLE policy. The online tasks and the offline tasks run on
> >> the same computer in order to use the computer efficiently.
> >> The online tasks sleep most of the time but should respond quickly once
> >> they wake up. The offline tasks have low priority and will run only when
> >> there are no online tasks.
> >>
> >> The online tasks are more important than the offline tasks and are latency
> >> sensitive, so we should make sure the online tasks preempt the offline tasks
> >> as soon as possible whenever there are online tasks waiting to run.
> >> So in our situation we want a SCHED_NORMAL task to run if there is one.
> >>
> >> Let's assume we have 2 CPUs,
> >> In CPU1 we got 2 SCHED_NORMAL tasks.
> >> in CPU2 we got 1 SCHED_NORMAL task and 2 SCHED_IDLE tasks.
> >>
> >>              CPU1                      CPU2
> >>         curr       rq1            curr          rq2
> >>       +------+ | +------+       +------+ | +----+ +----+
> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
> >>       +------+ | +------+       +------+ | +----+ +----+
> >>
> >>                                  NORMAL exits or blocked
> >>       +------+ | +------+                | +----+ +----+
> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
> >>       +------+ | +------+                | +----+ +----+
> >>
> >>                                  pick_next_task_fair
> >>       +------+ | +------+         +----+ | +----+
> >> t2    |NORMAL| | |NORMAL|         |IDLE| | |IDLE|
> >>       +------+ | +------+         +----+ | +----+
> >>
> >>                                  SCHED_IDLE running
> >> t3    +------+ | +------+        +----+  | +----+
> >>       |NORMAL| | |NORMAL|        |IDLE|  | |IDLE|
> >>       +------+ | +------+        +----+  | +----+
> >>
> >>                                  run_rebalance_domains
> >>       +------+ |                +------+ | +----+ +----+
> >> t4    |NORMAL| |                |NORMAL| | |IDLE| |IDLE|
> >>       +------+ |                +------+ | +----+ +----+
> >>
> >> As we can see:
> >> t1: the NORMAL task in CPU2 exits or blocks.
> >> t2: CPU2's pick_next_task_fair would pick a SCHED_IDLE task to run while
> >> another SCHED_NORMAL task in rq1 is waiting.
> >> t3: a SCHED_IDLE task runs on CPU2 while a SCHED_NORMAL task waits on CPU1.
> >> t4: after a short time, the periodic load_balance triggers and pulls the
> >> SCHED_NORMAL task from rq1 to rq2, where it likely preempts SCHED_IDLE.
> >>
> >> In this scenario, SCHED_IDLE is running while SCHED_NORMAL is waiting to run.
> >> The latency of this SCHED_NORMAL task will be high, which is not acceptable.
> >>
> >> Doing a load_balance before running the SCHED_IDLE task may fix this problem.
> >>
> >> This patch works as below:
> >>
> >>              CPU1                      CPU2
> >>         curr       rq1            curr          rq2
> >>       +------+ | +------+       +------+ | +----+ +----+
> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
> >>       +------+ | +------+       +------+ | +----+ +----+
> >>
> >>                                  NORMAL exits or blocked
> >>       +------+ | +------+                | +----+ +----+
> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
> >>       +------+ | +------+                | +----+ +----+
> >>
> >> t2                            pick_next_task_fair (all se are SCHED_IDLE)
> >>
> >>                                  newidle_balance
> >>       +------+ |                 +------+ | +----+ +----+
> >> t3    |NORMAL| |                 |NORMAL| | |IDLE| |IDLE|
> >>       +------+ |                 +------+ | +----+ +----+
> >>
> >>
> >> t1: the NORMAL task in CPU2 exits or blocks.
> >> t2: pick_next_task_fair checks that all se in the rbtree are SCHED_IDLE and
> >> calls newidle_balance, which tries to pull a SCHED_NORMAL task (if there is one).
> >> t3: pick_next_task_fair would then (likely) pick a SCHED_NORMAL task to run
> >> instead of SCHED_IDLE.
> >>
> >> >
> >> >> CPU by doing load_balance first.
> >> >>
> >> >> Signed-off-by: Chen Xiaoguang <xiaoggchen@tencent.com>
> >> >> Signed-off-by: Chen He <heddchen@tencent.com>
> >> >> ---
> >> >>  kernel/sched/fair.c | 5 +++++
> >> >>  1 file changed, 5 insertions(+)
> >> >>
> >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> >> index ae7ceba..0a26132 100644
> >> >> --- a/kernel/sched/fair.c
> >> >> +++ b/kernel/sched/fair.c
> >> >> @@ -7004,6 +7004,11 @@ struct task_struct *
> >> >>         struct task_struct *p;
> >> >>         int new_tasks;
> >> >>
> >> >> +       if (prev &&
> >> >> +           fair_policy(prev->policy) &&
> >> >
> >> >Why do you need a prev and fair task? You seem to target the special
> >> >case of pick_next_task, but in this case why not only test rf != null
> >> >to make sure not to return immediately after jumping to the idle label?
> >> We just want to do load_balance only when the CPU switches from SCHED_NORMAL
> >> to SCHED_IDLE.
> >> Without checking prev, when the running tasks are all SCHED_IDLE, we would
> >> do newidle_balance every time in pick_next_task_fair, which makes no sense
> >> and is wasteful.
> >
> >I agree that calling newidle_balance every time pick_next_task_fair is
> >called when there are only sched_idle tasks is useless.
> >But you also have to take into account cases where there was another
> >class of task running on the cpu, like an RT one. In your example above,
> >if you replace the normal task on CPU2 by an RT task, you still want to
> >pick the normal task on CPU1 once the RT task goes to sleep.
> Sure, this case should be taken into account; we should also try to
> pick a normal task in this case.
>
> >
> >Another point: you will have to consider the impact on
> >rq->idle_stamp, because newidle_balance is assumed to be called before
> >going idle, which is no longer the case with your use case.
> Yes. rq->idle_stamp should not be changed in this case.
>
>
>
> Actually we want to pull a SCHED_NORMAL task (if possible) to run when a CPU is
> about to run a SCHED_IDLE task. But currently newidle_balance is not
> designed for SCHED_IDLE, so SCHED_IDLE tasks can also be pulled, which
> is useless in our situation.

newidle_balance will pull a sched_idle task only if there is an
imbalance, which is the right thing to do IMO to ensure fairness
between sched_idle tasks. Being a sched_idle task doesn't mean that
we should break fairness.

>
> So we plan to add a new function, sched_idle_balance, which only tries to
> pull SCHED_NORMAL tasks from the busiest CPU. We will call
> sched_idle_balance when the previous task is normal or RT,
> hoping we can pull a SCHED_NORMAL task to run.
>
> Do you think it is ok to add a new sched_idle_balance?

I don't see any reason why the scheduler should not pull a sched_idle
task if there is an imbalance. That will happen anyway during the next
periodic load balance.

>
> >
> >>
> >> >
> >>
> >> >Also why not doing that for default case too ? i.e. balance_fair() ?
> >> You are right, if you think this scenario makes sense, we will send a
> >> refined patch soon :-)
> >>
> >> >
> >> >> +           sched_idle_cpu(rq->cpu))
> >> >> +               goto idle;
> >> >> +
> >> >>  again:
> >> >>         if (!sched_fair_runnable(rq))
> >> >>                 goto idle;
> >> >> --
> >> >> 1.8.3.1
> >> >>
> >> >>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] sched: pull tasks when CPU is about to run SCHED_IDLE tasks
  2021-01-12  8:18         ` Vincent Guittot
@ 2021-01-13  3:12           ` chin
  2021-01-13  8:30             ` Vincent Guittot
  0 siblings, 1 reply; 20+ messages in thread
From: chin @ 2021-01-13  3:12 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, heddchen,
	xiaoggchen(陈小光)




At 2021-01-12 16:18:51, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
>On Tue, 12 Jan 2021 at 07:59, chin <ultrachin@163.com> wrote:
>>
>>
>>
>>
>> At 2021-01-11 19:04:19, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
>> >On Mon, 11 Jan 2021 at 09:27, chin <ultrachin@163.com> wrote:
>> >>
>> >>
>> >> At 2020-12-23 19:30:26, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
>> >> >On Wed, 23 Dec 2020 at 09:32, <ultrachin@163.com> wrote:
>> >> >>
>> >> >> From: Chen Xiaoguang <xiaoggchen@tencent.com>
>> >> >>
>> >> >> Before a CPU switches from running SCHED_NORMAL task to
>> >> >> SCHED_IDLE task, trying to pull SCHED_NORMAL tasks from other
>> >> >
>> >> >Could you explain in more detail why you only care about this use case
>> >> >in particular and not the general case?
>> >>
>> >>
>> >> We want to run online tasks using the SCHED_NORMAL policy and offline tasks
>> >> using the SCHED_IDLE policy. The online tasks and the offline tasks run on
>> >> the same computer in order to use the computer efficiently.
>> >> The online tasks sleep most of the time but should respond quickly once
>> >> they wake up. The offline tasks have low priority and will run only when
>> >> there are no online tasks.
>> >>
>> >> The online tasks are more important than the offline tasks and are latency
>> >> sensitive, so we should make sure the online tasks preempt the offline tasks
>> >> as soon as possible whenever there are online tasks waiting to run.
>> >> So in our situation we want a SCHED_NORMAL task to run if there is one.
>> >>
>> >> Let's assume we have 2 CPUs,
>> >> In CPU1 we got 2 SCHED_NORMAL tasks.
>> >> in CPU2 we got 1 SCHED_NORMAL task and 2 SCHED_IDLE tasks.
>> >>
>> >>              CPU1                      CPU2
>> >>         curr       rq1            curr          rq2
>> >>       +------+ | +------+       +------+ | +----+ +----+
>> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
>> >>       +------+ | +------+       +------+ | +----+ +----+
>> >>
>> >>                                  NORMAL exits or blocked
>> >>       +------+ | +------+                | +----+ +----+
>> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
>> >>       +------+ | +------+                | +----+ +----+
>> >>
>> >>                                  pick_next_task_fair
>> >>       +------+ | +------+         +----+ | +----+
>> >> t2    |NORMAL| | |NORMAL|         |IDLE| | |IDLE|
>> >>       +------+ | +------+         +----+ | +----+
>> >>
>> >>                                  SCHED_IDLE running
>> >> t3    +------+ | +------+        +----+  | +----+
>> >>       |NORMAL| | |NORMAL|        |IDLE|  | |IDLE|
>> >>       +------+ | +------+        +----+  | +----+
>> >>
>> >>                                  run_rebalance_domains
>> >>       +------+ |                +------+ | +----+ +----+
>> >> t4    |NORMAL| |                |NORMAL| | |IDLE| |IDLE|
>> >>       +------+ |                +------+ | +----+ +----+
>> >>
>> >> As we can see:
>> >> t1: the NORMAL task in CPU2 exits or blocks.
>> >> t2: CPU2's pick_next_task_fair would pick a SCHED_IDLE task to run while
>> >> another SCHED_NORMAL task in rq1 is waiting.
>> >> t3: a SCHED_IDLE task runs on CPU2 while a SCHED_NORMAL task waits on CPU1.
>> >> t4: after a short time, the periodic load_balance triggers and pulls the
>> >> SCHED_NORMAL task from rq1 to rq2, where it likely preempts SCHED_IDLE.
>> >>
>> >> In this scenario, SCHED_IDLE is running while SCHED_NORMAL is waiting to run.
>> >> The latency of this SCHED_NORMAL task will be high, which is not acceptable.
>> >>
>> >> Doing a load_balance before running the SCHED_IDLE task may fix this problem.
>> >>
>> >> This patch works as below:
>> >>
>> >>              CPU1                      CPU2
>> >>         curr       rq1            curr          rq2
>> >>       +------+ | +------+       +------+ | +----+ +----+
>> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
>> >>       +------+ | +------+       +------+ | +----+ +----+
>> >>
>> >>                                  NORMAL exits or blocked
>> >>       +------+ | +------+                | +----+ +----+
>> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
>> >>       +------+ | +------+                | +----+ +----+
>> >>
>> >> t2                            pick_next_task_fair (all se are SCHED_IDLE)
>> >>
>> >>                                  newidle_balance
>> >>       +------+ |                 +------+ | +----+ +----+
>> >> t3    |NORMAL| |                 |NORMAL| | |IDLE| |IDLE|
>> >>       +------+ |                 +------+ | +----+ +----+
>> >>
>> >>
>> >> t1: the NORMAL task in CPU2 exits or blocks.
>> >> t2: pick_next_task_fair checks that all se in the rbtree are SCHED_IDLE and
>> >> calls newidle_balance, which tries to pull a SCHED_NORMAL task (if there is one).
>> >> t3: pick_next_task_fair would then (likely) pick a SCHED_NORMAL task to run
>> >> instead of SCHED_IDLE.
>> >>
>> >> >
>> >> >> CPU by doing load_balance first.
>> >> >>
>> >> >> Signed-off-by: Chen Xiaoguang <xiaoggchen@tencent.com>
>> >> >> Signed-off-by: Chen He <heddchen@tencent.com>
>> >> >> ---
>> >> >>  kernel/sched/fair.c | 5 +++++
>> >> >>  1 file changed, 5 insertions(+)
>> >> >>
>> >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> >> >> index ae7ceba..0a26132 100644
>> >> >> --- a/kernel/sched/fair.c
>> >> >> +++ b/kernel/sched/fair.c
>> >> >> @@ -7004,6 +7004,11 @@ struct task_struct *
>> >> >>         struct task_struct *p;
>> >> >>         int new_tasks;
>> >> >>
>> >> >> +       if (prev &&
>> >> >> +           fair_policy(prev->policy) &&
>> >> >
>> >> >Why do you need a prev and fair task? You seem to target the special
>> >> >case of pick_next_task, but in this case why not only test rf != null
>> >> >to make sure not to return immediately after jumping to the idle label?
>> >> We just want to do load_balance only when the CPU switches from SCHED_NORMAL
>> >> to SCHED_IDLE.
>> >> Without checking prev, when the running tasks are all SCHED_IDLE, we would
>> >> do newidle_balance every time in pick_next_task_fair, which makes no sense
>> >> and is wasteful.
>> >
>> >I agree that calling newidle_balance every time pick_next_task_fair is
>> >called when there are only sched_idle tasks is useless.
>> >But you also have to take into account cases where there was another
>> >class of task running on the cpu, like an RT one. In your example above,
>> >if you replace the normal task on CPU2 by an RT task, you still want to
>> >pick the normal task on CPU1 once the RT task goes to sleep.
>> Sure, this case should be taken into account; we should also try to
>> pick a normal task in this case.
>>
>> >
>> >Another point: you will have to consider the impact on
>> >rq->idle_stamp, because newidle_balance is assumed to be called before
>> >going idle, which is no longer the case with your use case.
>> Yes. rq->idle_stamp should not be changed in this case.
>>
>>
>>
>> Actually we want to pull a SCHED_NORMAL task (if possible) to run when a CPU is
>> about to run a SCHED_IDLE task. But currently newidle_balance is not
>> designed for SCHED_IDLE, so SCHED_IDLE tasks can also be pulled, which
>> is useless in our situation.
>
>newidle_balance will pull a sched_idle task only if there is an
>imbalance, which is the right thing to do IMO to ensure fairness
>between sched_idle tasks. Being a sched_idle task doesn't mean that
>we should break fairness.
>
>>
>> So we plan to add a new function, sched_idle_balance, which only tries to
>> pull SCHED_NORMAL tasks from the busiest CPU. We will call
>> sched_idle_balance when the previous task is normal or RT,
>> hoping we can pull a SCHED_NORMAL task to run.
>>
>> Do you think it is ok to add a new sched_idle_balance?
>
>I don't see any reason why the scheduler should not pull a sched_idle
>task if there is an imbalance. That will happen anyway during the next
>periodic load balance.
OK. We should not exclude SCHED_IDLE tasks from being pulled in load_balance.


Do you think it makes sense to do an extra load_balance when a CPU is
about to run a SCHED_IDLE task (switched from normal/RT)?
By doing this, SCHED_NORMAL tasks waiting on other CPUs would get
a chance to be pulled to this CPU and run, which helps reduce the latency
of SCHED_NORMAL tasks.


>>>
>> >
>> >>
>> >> >
>> >>
>> >> >Also why not doing that for default case too ? i.e. balance_fair() ?
>> >> You are right, if you think this scenario makes sense, we will send a
>> >> refined patch soon :-)
>> >>
>> >> >
>> >> >> +           sched_idle_cpu(rq->cpu))
>> >> >> +               goto idle;
>> >> >> +
>> >> >>  again:
>> >> >>         if (!sched_fair_runnable(rq))
>> >> >>                 goto idle;
>> >> >> --
>> >> >> 1.8.3.1
>> >> >>
>> >> >>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] sched: pull tasks when CPU is about to run SCHED_IDLE tasks
  2021-01-13  3:12           ` chin
@ 2021-01-13  8:30             ` Vincent Guittot
  2021-02-02  7:54               ` chin
  0 siblings, 1 reply; 20+ messages in thread
From: Vincent Guittot @ 2021-01-13  8:30 UTC (permalink / raw)
  To: chin
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, heddchen,
	xiaoggchen(陈小光)

On Wed, 13 Jan 2021 at 04:14, chin <ultrachin@163.com> wrote:
>
>
>
>
> At 2021-01-12 16:18:51, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> >On Tue, 12 Jan 2021 at 07:59, chin <ultrachin@163.com> wrote:
> >>
> >>
> >>
> >>
> >> At 2021-01-11 19:04:19, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> >> >On Mon, 11 Jan 2021 at 09:27, chin <ultrachin@163.com> wrote:
> >> >>
> >> >>
> >> >> At 2020-12-23 19:30:26, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> >> >> >On Wed, 23 Dec 2020 at 09:32, <ultrachin@163.com> wrote:
> >> >> >>
> >> >> >> From: Chen Xiaoguang <xiaoggchen@tencent.com>
> >> >> >>
> >> >> >> Before a CPU switches from running SCHED_NORMAL task to
> >> >> >> SCHED_IDLE task, trying to pull SCHED_NORMAL tasks from other
> >> >> >
> >> >> >Could you explain in more detail why you only care about this use case
> >> >> >in particular and not the general case?
> >> >>
> >> >>
> >> >> We want to run online tasks using the SCHED_NORMAL policy and offline tasks
> >> >> using the SCHED_IDLE policy. The online tasks and the offline tasks run on
> >> >> the same computer in order to use the computer efficiently.
> >> >> The online tasks sleep most of the time but should respond quickly once
> >> >> they wake up. The offline tasks have low priority and will run only when
> >> >> there are no online tasks.
> >> >>
> >> >> The online tasks are more important than the offline tasks and are latency
> >> >> sensitive, so we should make sure the online tasks preempt the offline tasks
> >> >> as soon as possible whenever there are online tasks waiting to run.
> >> >> So in our situation we want a SCHED_NORMAL task to run if there is one.
> >> >>
> >> >> Let's assume we have 2 CPUs,
> >> >> In CPU1 we got 2 SCHED_NORMAL tasks.
> >> >> in CPU2 we got 1 SCHED_NORMAL task and 2 SCHED_IDLE tasks.
> >> >>
> >> >>              CPU1                      CPU2
> >> >>         curr       rq1            curr          rq2
> >> >>       +------+ | +------+       +------+ | +----+ +----+
> >> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
> >> >>       +------+ | +------+       +------+ | +----+ +----+
> >> >>
> >> >>                                  NORMAL exits or blocked
> >> >>       +------+ | +------+                | +----+ +----+
> >> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
> >> >>       +------+ | +------+                | +----+ +----+
> >> >>
> >> >>                                  pick_next_task_fair
> >> >>       +------+ | +------+         +----+ | +----+
> >> >> t2    |NORMAL| | |NORMAL|         |IDLE| | |IDLE|
> >> >>       +------+ | +------+         +----+ | +----+
> >> >>
> >> >>                                  SCHED_IDLE running
> >> >> t3    +------+ | +------+        +----+  | +----+
> >> >>       |NORMAL| | |NORMAL|        |IDLE|  | |IDLE|
> >> >>       +------+ | +------+        +----+  | +----+
> >> >>
> >> >>                                  run_rebalance_domains
> >> >>       +------+ |                +------+ | +----+ +----+
> >> >> t4    |NORMAL| |                |NORMAL| | |IDLE| |IDLE|
> >> >>       +------+ |                +------+ | +----+ +----+
> >> >>
> >> >> As we can see:
> >> >> t1: the NORMAL task in CPU2 exits or blocks.
> >> >> t2: CPU2's pick_next_task_fair would pick a SCHED_IDLE task to run while
> >> >> another SCHED_NORMAL task in rq1 is waiting.
> >> >> t3: a SCHED_IDLE task runs on CPU2 while a SCHED_NORMAL task waits on CPU1.
> >> >> t4: after a short time, the periodic load_balance triggers and pulls the
> >> >> SCHED_NORMAL task from rq1 to rq2, where it likely preempts SCHED_IDLE.
> >> >>
> >> >> In this scenario, SCHED_IDLE is running while SCHED_NORMAL is waiting to run.
> >> >> The latency of this SCHED_NORMAL task will be high, which is not acceptable.
> >> >>
> >> >> Doing a load_balance before running the SCHED_IDLE task may fix this problem.
> >> >>
> >> >> This patch works as below:
> >> >>
> >> >>              CPU1                      CPU2
> >> >>         curr       rq1            curr          rq2
> >> >>       +------+ | +------+       +------+ | +----+ +----+
> >> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
> >> >>       +------+ | +------+       +------+ | +----+ +----+
> >> >>
> >> >>                                  NORMAL exits or blocked
> >> >>       +------+ | +------+                | +----+ +----+
> >> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
> >> >>       +------+ | +------+                | +----+ +----+
> >> >>
> >> >> t2                            pick_next_task_fair (all se are SCHED_IDLE)
> >> >>
> >> >>                                  newidle_balance
> >> >>       +------+ |                 +------+ | +----+ +----+
> >> >> t3    |NORMAL| |                 |NORMAL| | |IDLE| |IDLE|
> >> >>       +------+ |                 +------+ | +----+ +----+
> >> >>
> >> >>
> >> >> t1: the NORMAL task in CPU2 exits or blocks.
> >> >> t2: pick_next_task_fair checks that all se in the rbtree are SCHED_IDLE and
> >> >> calls newidle_balance, which tries to pull a SCHED_NORMAL task (if there is one).
> >> >> t3: pick_next_task_fair would then (likely) pick a SCHED_NORMAL task to run
> >> >> instead of SCHED_IDLE.
> >> >>
> >> >> >
> >> >> >> CPU by doing load_balance first.
> >> >> >>
> >> >> >> Signed-off-by: Chen Xiaoguang <xiaoggchen@tencent.com>
> >> >> >> Signed-off-by: Chen He <heddchen@tencent.com>
> >> >> >> ---
> >> >> >>  kernel/sched/fair.c | 5 +++++
> >> >> >>  1 file changed, 5 insertions(+)
> >> >> >>
> >> >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> >> >> index ae7ceba..0a26132 100644
> >> >> >> --- a/kernel/sched/fair.c
> >> >> >> +++ b/kernel/sched/fair.c
> >> >> >> @@ -7004,6 +7004,11 @@ struct task_struct *
> >> >> >>         struct task_struct *p;
> >> >> >>         int new_tasks;
> >> >> >>
> >> >> >> +       if (prev &&
> >> >> >> +           fair_policy(prev->policy) &&
> >> >> >
> >> >> >Why do you need a prev and fair task? You seem to target the special
> >> >> >case of pick_next_task, but in this case why not only test rf != null
> >> >> >to make sure not to return immediately after jumping to the idle label?
> >> >> We just want to do load_balance only when the CPU switches from SCHED_NORMAL
> >> >> to SCHED_IDLE.
> >> >> Without checking prev, when the running tasks are all SCHED_IDLE, we would
> >> >> do newidle_balance every time in pick_next_task_fair, which makes no sense
> >> >> and is wasteful.
> >> >
> >> >I agree that calling newidle_balance every time pick_next_task_fair is
> >> >called when there are only sched_idle tasks is useless.
> >> >But you also have to take into account cases where there was another
> >> >class of task running on the cpu, like an RT one. In your example above,
> >> >if you replace the normal task on CPU2 by an RT task, you still want to
> >> >pick the normal task on CPU1 once the RT task goes to sleep.
> >> Sure, this case should be taken into account; we should also try to
> >> pick a normal task in this case.
> >>
> >> >
> >> >Another point: you will have to consider the impact on
> >> >rq->idle_stamp, because newidle_balance is assumed to be called before
> >> >going idle, which is no longer the case with your use case.
> >> Yes. rq->idle_stamp should not be changed in this case.
> >>
> >>
> >>
> >> Actually we want to pull a SCHED_NORMAL task (if possible) to run when a CPU is
> >> about to run a SCHED_IDLE task. But currently newidle_balance is not
> >> designed for SCHED_IDLE, so SCHED_IDLE tasks can also be pulled, which
> >> is useless in our situation.
> >
> >newidle_balance will pull a sched_idle task only if there is an
> >imbalance, which is the right thing to do IMO to ensure fairness
> >between sched_idle tasks. Being a sched_idle task doesn't mean that
> >we should break fairness.
> >
> >>
> >> So we plan to add a new function, sched_idle_balance, which only tries to
> >> pull SCHED_NORMAL tasks from the busiest CPU. We will call
> >> sched_idle_balance when the previous task is normal or RT,
> >> hoping we can pull a SCHED_NORMAL task to run.
> >>
> >> Do you think it is ok to add a new sched_idle_balance?
> >
> >I don't see any reason why the scheduler should not pull a sched_idle
> >task if there is an imbalance. That will happen anyway during the next
> >periodic load balance.
> OK. We should not exclude SCHED_IDLE tasks from being pulled in load_balance.
>
>
> Do you think it makes sense to do an extra load_balance when a CPU is
> about to run a SCHED_IDLE task (switched from normal/RT)?

I'm not sure I get your point here.
Do you mean a sched_idle task being picked to become the running task
while there are runnable normal tasks? This can happen if the normal
tasks are long-running. We should not balance in this case. The only
case is when the running task, which is not a sched_idle task but a
normal/rt/deadline one, goes to sleep and there are only sched_idle
tasks enqueued. In this case, and only in this case, we should trigger
a load_balance to get a chance to pull a waiting normal task from
another CPU.

This means checking this state in pick_next_task_fair() and in balance_fair()

> By doing this, SCHED_NORMAL tasks waiting on other CPUs would get
> a chance to be pulled to this CPU and run, which helps reduce the
> latency of SCHED_NORMAL tasks.
>
>
> >>>
> >> >
> >> >>
> >> >> >
> >> >>
> >> >> >Also why not doing that for default case too ? i.e. balance_fair() ?
> >> >> You are right, if you think this scenario makes sense, we will send a
> >> >> refined patch soon :-)
> >> >>
> >> >> >
> >> >> >> +           sched_idle_cpu(rq->cpu))
> >> >> >> +               goto idle;
> >> >> >> +
> >> >> >>  again:
> >> >> >>         if (!sched_fair_runnable(rq))
> >> >> >>                 goto idle;
> >> >> >> --
> >> >> >> 1.8.3.1
> >> >> >>
> >> >> >>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] sched: pull tasks when CPU is about to run SCHED_IDLE tasks
  2021-01-13  8:30             ` Vincent Guittot
@ 2021-02-02  7:54               ` chin
  2021-02-02 15:54                 ` Vincent Guittot
  2021-02-04  8:01                 ` Vincent Guittot
  0 siblings, 2 replies; 20+ messages in thread
From: chin @ 2021-02-02  7:54 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, heddchen,
	xiaoggchen(陈小光)




At 2021-01-13 16:30:14, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
>On Wed, 13 Jan 2021 at 04:14, chin <ultrachin@163.com> wrote:
>>
>>
>>
>>
>> At 2021-01-12 16:18:51, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
>> >On Tue, 12 Jan 2021 at 07:59, chin <ultrachin@163.com> wrote:
>> >>
>> >>
>> >>
>> >>
>> >> At 2021-01-11 19:04:19, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
>> >> >On Mon, 11 Jan 2021 at 09:27, chin <ultrachin@163.com> wrote:
>> >> >>
>> >> >>
>> >> >> At 2020-12-23 19:30:26, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
>> >> >> >On Wed, 23 Dec 2020 at 09:32, <ultrachin@163.com> wrote:
>> >> >> >>
>> >> >> >> From: Chen Xiaoguang <xiaoggchen@tencent.com>
>> >> >> >>
>> >> >> >> Before a CPU switches from running SCHED_NORMAL task to
>> >> >> >> SCHED_IDLE task, trying to pull SCHED_NORMAL tasks from other
>> >> >> >
>> >> >> >Could you explain more in detail why you only care about this use case
>> >> >>
>> >> >> >in particular and not the general case?
>> >> >>
>> >> >>
>> >> >> We want to run online tasks using SCHED_NORMAL policy and offline tasks
>> >> >> using SCHED_IDLE policy. The online tasks and the offline tasks run in
>> >> >> the same computer in order to use the computer efficiently.
>> >> >> The online tasks are asleep most of the time but should respond soon
>> >> >> once woken up. The offline tasks have low priority and will run only
>> >> >> when there are no online tasks.
>> >> >>
>> >> >> The online tasks are more important than the offline tasks and are
>> >> >> latency sensitive; we should make sure the online tasks preempt the
>> >> >> offline tasks as soon as possible while there are online tasks
>> >> >> waiting to run. So in our situation we hope a SCHED_NORMAL task will
>> >> >> run if there is one.
>> >> >>
>> >> >> Let's assume we have 2 CPUs,
>> >> >> In CPU1 we got 2 SCHED_NORMAL tasks.
>> >> >> in CPU2 we got 1 SCHED_NORMAL task and 2 SCHED_IDLE tasks.
>> >> >>
>> >> >>              CPU1                      CPU2
>> >> >>         curr       rq1            curr          rq2
>> >> >>       +------+ | +------+       +------+ | +----+ +----+
>> >> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
>> >> >>       +------+ | +------+       +------+ | +----+ +----+
>> >> >>
>> >> >>                                  NORMAL exits or blocked
>> >> >>       +------+ | +------+                | +----+ +----+
>> >> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
>> >> >>       +------+ | +------+                | +----+ +----+
>> >> >>
>> >> >>                                  pick_next_task_fair
>> >> >>       +------+ | +------+         +----+ | +----+
>> >> >> t2    |NORMAL| | |NORMAL|         |IDLE| | |IDLE|
>> >> >>       +------+ | +------+         +----+ | +----+
>> >> >>
>> >> >>                                  SCHED_IDLE running
>> >> >> t3    +------+ | +------+        +----+  | +----+
>> >> >>       |NORMAL| | |NORMAL|        |IDLE|  | |IDLE|
>> >> >>       +------+ | +------+        +----+  | +----+
>> >> >>
>> >> >>                                  run_rebalance_domains
>> >> >>       +------+ |                +------+ | +----+ +----+
>> >> >> t4    |NORMAL| |                |NORMAL| | |IDLE| |IDLE|
>> >> >>       +------+ |                +------+ | +----+ +----+
>> >> >>
>> >> >> As we can see
>> >> >> t1: NORMAL task in CPU2 exits or blocked
>> >> >> t2: CPU2 pick_next_task_fair would pick a SCHED_IDLE to run while
>> >> >> another SCHED_NORMAL in rq1 is waiting.
>> >> >> t3: SCHED_IDLE run in CPU2 while a SCHED_NORMAL wait in CPU1.
>> >> >> t4: after a short time, periodic load_balance is triggered and pulls the
>> >> >> SCHED_NORMAL in rq1 to rq2, and SCHED_NORMAL likely preempts SCHED_IDLE.
>> >> >>
>> >> >> In this scenario, SCHED_IDLE is running while SCHED_NORMAL is waiting to run.
>> >> >> The latency of this SCHED_NORMAL will be high, which is not acceptable.
>> >> >>
>> >> >> Doing a load_balance before running the SCHED_IDLE task may fix this
>> >> >> problem.
>> >> >>
>> >> >> This patch works as below:
>> >> >>
>> >> >>              CPU1                      CPU2
>> >> >>         curr       rq1            curr          rq2
>> >> >>       +------+ | +------+       +------+ | +----+ +----+
>> >> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
>> >> >>       +------+ | +------+       +------+ | +----+ +----+
>> >> >>
>> >> >>                                  NORMAL exits or blocked
>> >> >>       +------+ | +------+                | +----+ +----+
>> >> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
>> >> >>       +------+ | +------+                | +----+ +----+
>> >> >>
>> >> >> t2                            pick_next_task_fair (all se are SCHED_IDLE)
>> >> >>
>> >> >>                                  newidle_balance
>> >> >>       +------+ |                 +------+ | +----+ +----+
>> >> >> t3    |NORMAL| |                 |NORMAL| | |IDLE| |IDLE|
>> >> >>       +------+ |                 +------+ | +----+ +----+
>> >> >>
>> >> >>
>> >> >> t1: NORMAL task in CPU2 exits or blocked
>> >> >> t2: pick_next_task_fair checks that all SEs in the rbtree are
>> >> >> SCHED_IDLE and calls newidle_balance, which tries to pull a
>> >> >> SCHED_NORMAL (if there is one).
>> >> >> t3: pick_next_task_fair would pick a SCHED_NORMAL to run instead of
>> >> >> SCHED_IDLE(likely).
>> >> >>
>> >> >> >
>> >> >> >> CPU by doing load_balance first.
>> >> >> >>
>> >> >> >> Signed-off-by: Chen Xiaoguang <xiaoggchen@tencent.com>
>> >> >> >> Signed-off-by: Chen He <heddchen@tencent.com>
>> >> >> >> ---
>> >> >> >>  kernel/sched/fair.c | 5 +++++
>> >> >> >>  1 file changed, 5 insertions(+)
>> >> >> >>
>> >> >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> >> >> >> index ae7ceba..0a26132 100644
>> >> >> >> --- a/kernel/sched/fair.c
>> >> >> >> +++ b/kernel/sched/fair.c
>> >> >> >> @@ -7004,6 +7004,11 @@ struct task_struct *
>> >> >> >>         struct task_struct *p;
>> >> >> >>         int new_tasks;
>> >> >> >>
>> >> >> >> +       if (prev &&
>> >> >> >> +           fair_policy(prev->policy) &&
>> >> >> >
>> >> >> >Why do you need a prev and fair task  ? You seem to target the special
>> >> >> >case of pick_next_task  but in this case why not only testing rf!=null
>> >> >> > to make sure to not return immediately after jumping to the idle
>> >> >>
>> >> >> >label?
>> >> >> We just want to do load_balance only when CPU switches from SCHED_NORMAL
>> >> >> to SCHED_IDLE.
>> >> >> If we do not check prev, then when the running tasks are all
>> >> >> SCHED_IDLE we would do newidle_balance every time in
>> >> >> pick_next_task_fair, which makes no sense and is wasteful.
>> >> >
>> >> >I agree that calling newidle_balance every time pick_next_task_fair is
>> >> >called when there are only sched_idle tasks is useless.
>> >> >But you also have to take into account cases where there was another
>> >> >class of task running on the cpu like RT one. In your example above,
>> >> >if you replace the normal task on CPU2 by a RT task, you still want to
>> >>
>> >> >pick the normal task on CPU1 once RT task goes to sleep.
>> >> Sure, this case should be taken into account,  we should also try to
>> >> pick normal task in this case.
>> >>
>> >> >
>> >> >Another point that you will have to consider the impact on
>> >> >rq->idle_stamp because newidle_balance is assumed to be called before
>> >>
>> >> >going idle which is not the case anymore with your use case
>> >> Yes. rq->idle_stamp should not be changed in this case.
>> >>
>> >>
>> >>
>> >> Actually we want to pull a SCHED_NORMAL task (if possible) to run when a cpu is
>> >> about to run SCHED_IDLE task. But currently newidle_balance is not
>> >> designed for SCHED_IDLE  so SCHED_IDLE can also be pulled which
>> >> is useless in our situation.
>> >
>> >newidle_balance will pull a sched_idle task only if there is an
>> >imbalance which is the right thing to do IMO to ensure fairness
>> >between sched_idle tasks.  Being a sched_idle task doesn't mean that
>> >we should break the fairness
>> >
>> >>
>> >> So we plan to add a new function sched_idle_balance which only try to
>> >> pull SCHED_NORMAL tasks from the busiest cpu. And we will call
>> >> sched_idle_balance when the previous task is normal or RT and
>> >> hoping we can pull a SCHED_NORMAL task to run.
>> >>
>> >> Do you think it is ok to add a new sched_idle_balance?
>> >
>> >I don't see any reason why the scheduler should not pull a sched_idle
>> >task if there is an imbalance. That will happen anyway during the next
>>
>> >periodic load balance
>> OK. We should not pull the SCHED_IDLE tasks only in load_balance.
>>
>>
>> Do you think it makes sense to do an extra load_balance when a CPU is
>> about to run a SCHED_IDLE task (switched from normal/RT)?
>
>I'm not sure I get your point here.
>Do you mean the case where a sched_idle task is picked to become the
>running task while there are runnable normal tasks? This can happen if
>the normal tasks are long-running tasks. We should not balance in that
>case. The only case is when the running task, which is not a sched_idle
>task but a normal/rt/deadline one, goes to sleep and there are only
>sched_idle tasks enqueued. In this case, and only in this case, we
>should trigger a load_balance to get a chance to pull a waiting normal
>task from another CPU.
>
>This means checking this state in pick_next_task_fair() and in balance_fair()

We made another change; would you please give some comments?

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 04a3ce2..2357301 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7029,6 +7029,10 @@ struct task_struct *
        struct task_struct *p;
        int new_tasks;
 
+       if (sched_idle_rq(rq) && prev && prev->state &&
+           prev->policy != SCHED_IDLE)
+               goto idle;
+
 again:
        if (!sched_fair_runnable(rq))
                goto idle;
@@ -10571,7 +10575,8 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
         * We must set idle_stamp _before_ calling idle_balance(), such that we
         * measure the duration of idle_balance() as idle time.
         */
-       this_rq->idle_stamp = rq_clock(this_rq);
+       if (!rq->nr_running)
+               this_rq->idle_stamp = rq_clock(this_rq);
 
        /*
         * Do not pull tasks towards !active CPUs...

>
>> By doing this, SCHED_NORMAL tasks waiting on other CPUs would get
>> a chance to be pulled to this CPU and run, which helps reduce the
>> latency of SCHED_NORMAL tasks.
>>
>>
>> >>>
>> >> >
>> >> >>
>> >> >> >
>> >> >>
>> >> >> >Also why not doing that for default case too ? i.e. balance_fair() ?
>> >> >> You are right, if you think this scenario makes sense, we will send a
>> >> >> refined patch soon :-)
>> >> >>
>> >> >> >
>> >> >> >> +           sched_idle_cpu(rq->cpu))
>> >> >> >> +               goto idle;
>> >> >> >> +
>> >> >> >>  again:
>> >> >> >>         if (!sched_fair_runnable(rq))
>> >> >> >>                 goto idle;
>> >> >> >> --
>> >> >> >> 1.8.3.1
>> >> >> >>
>> >> >> >>

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] sched: pull tasks when CPU is about to run SCHED_IDLE tasks
  2021-02-02  7:54               ` chin
@ 2021-02-02 15:54                 ` Vincent Guittot
  2021-02-03  2:53                   ` chin
  2021-02-04  8:01                 ` Vincent Guittot
  1 sibling, 1 reply; 20+ messages in thread
From: Vincent Guittot @ 2021-02-02 15:54 UTC (permalink / raw)
  To: chin
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, heddchen,
	xiaoggchen(陈小光)

On Tue, 2 Feb 2021 at 08:56, chin <ultrachin@163.com> wrote:
>
>
>
>
> At 2021-01-13 16:30:14, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> >On Wed, 13 Jan 2021 at 04:14, chin <ultrachin@163.com> wrote:
> >>
> >>
> >>
> >>
> >> At 2021-01-12 16:18:51, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> >> >On Tue, 12 Jan 2021 at 07:59, chin <ultrachin@163.com> wrote:
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> At 2021-01-11 19:04:19, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> >> >> >On Mon, 11 Jan 2021 at 09:27, chin <ultrachin@163.com> wrote:
> >> >> >>
> >> >> >>
> >> >> >> At 2020-12-23 19:30:26, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> >> >> >> >On Wed, 23 Dec 2020 at 09:32, <ultrachin@163.com> wrote:
> >> >> >> >>
> >> >> >> >> From: Chen Xiaoguang <xiaoggchen@tencent.com>
> >> >> >> >>
> >> >> >> >> Before a CPU switches from running SCHED_NORMAL task to
> >> >> >> >> SCHED_IDLE task, trying to pull SCHED_NORMAL tasks from other
> >> >> >> >
> >> >> >> >Could you explain more in detail why you only care about this use case
> >> >> >>
> >> >> >> >in particular and not the general case?
> >> >> >>
> >> >> >>
> >> >> >> We want to run online tasks using SCHED_NORMAL policy and offline tasks
> >> >> >> using SCHED_IDLE policy. The online tasks and the offline tasks run in
> >> >> >> the same computer in order to use the computer efficiently.
> >> >> >> The online tasks are asleep most of the time but should respond soon
> >> >> >> once woken up. The offline tasks have low priority and will run only
> >> >> >> when there are no online tasks.
> >> >> >>
> >> >> >> The online tasks are more important than the offline tasks and are
> >> >> >> latency sensitive; we should make sure the online tasks preempt the
> >> >> >> offline tasks as soon as possible while there are online tasks
> >> >> >> waiting to run. So in our situation we hope a SCHED_NORMAL task will
> >> >> >> run if there is one.
> >> >> >>
> >> >> >> Let's assume we have 2 CPUs,
> >> >> >> In CPU1 we got 2 SCHED_NORMAL tasks.
> >> >> >> in CPU2 we got 1 SCHED_NORMAL task and 2 SCHED_IDLE tasks.
> >> >> >>
> >> >> >>              CPU1                      CPU2
> >> >> >>         curr       rq1            curr          rq2
> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
> >> >> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
> >> >> >>
> >> >> >>                                  NORMAL exits or blocked
> >> >> >>       +------+ | +------+                | +----+ +----+
> >> >> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
> >> >> >>       +------+ | +------+                | +----+ +----+
> >> >> >>
> >> >> >>                                  pick_next_task_fair
> >> >> >>       +------+ | +------+         +----+ | +----+
> >> >> >> t2    |NORMAL| | |NORMAL|         |IDLE| | |IDLE|
> >> >> >>       +------+ | +------+         +----+ | +----+
> >> >> >>
> >> >> >>                                  SCHED_IDLE running
> >> >> >> t3    +------+ | +------+        +----+  | +----+
> >> >> >>       |NORMAL| | |NORMAL|        |IDLE|  | |IDLE|
> >> >> >>       +------+ | +------+        +----+  | +----+
> >> >> >>
> >> >> >>                                  run_rebalance_domains
> >> >> >>       +------+ |                +------+ | +----+ +----+
> >> >> >> t4    |NORMAL| |                |NORMAL| | |IDLE| |IDLE|
> >> >> >>       +------+ |                +------+ | +----+ +----+
> >> >> >>
> >> >> >> As we can see
> >> >> >> t1: NORMAL task in CPU2 exits or blocked
> >> >> >> t2: CPU2 pick_next_task_fair would pick a SCHED_IDLE to run while
> >> >> >> another SCHED_NORMAL in rq1 is waiting.
> >> >> >> t3: SCHED_IDLE run in CPU2 while a SCHED_NORMAL wait in CPU1.
> >> >> >> t4: after a short time, periodic load_balance is triggered and pulls the
> >> >> >> SCHED_NORMAL in rq1 to rq2, and SCHED_NORMAL likely preempts SCHED_IDLE.
> >> >> >>
> >> >> >> In this scenario, SCHED_IDLE is running while SCHED_NORMAL is waiting to run.
> >> >> >> The latency of this SCHED_NORMAL will be high, which is not acceptable.
> >> >> >>
> >> >> >> Doing a load_balance before running the SCHED_IDLE task may fix this
> >> >> >> problem.
> >> >> >>
> >> >> >> This patch works as below:
> >> >> >>
> >> >> >>              CPU1                      CPU2
> >> >> >>         curr       rq1            curr          rq2
> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
> >> >> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
> >> >> >>
> >> >> >>                                  NORMAL exits or blocked
> >> >> >>       +------+ | +------+                | +----+ +----+
> >> >> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
> >> >> >>       +------+ | +------+                | +----+ +----+
> >> >> >>
> >> >> >> t2                            pick_next_task_fair (all se are SCHED_IDLE)
> >> >> >>
> >> >> >>                                  newidle_balance
> >> >> >>       +------+ |                 +------+ | +----+ +----+
> >> >> >> t3    |NORMAL| |                 |NORMAL| | |IDLE| |IDLE|
> >> >> >>       +------+ |                 +------+ | +----+ +----+
> >> >> >>
> >> >> >>
> >> >> >> t1: NORMAL task in CPU2 exits or blocked
> >> >> >> t2: pick_next_task_fair checks that all SEs in the rbtree are
> >> >> >> SCHED_IDLE and calls newidle_balance, which tries to pull a
> >> >> >> SCHED_NORMAL (if there is one).
> >> >> >> t3: pick_next_task_fair would pick a SCHED_NORMAL to run instead of
> >> >> >> SCHED_IDLE(likely).
> >> >> >>
> >> >> >> >
> >> >> >> >> CPU by doing load_balance first.
> >> >> >> >>
> >> >> >> >> Signed-off-by: Chen Xiaoguang <xiaoggchen@tencent.com>
> >> >> >> >> Signed-off-by: Chen He <heddchen@tencent.com>
> >> >> >> >> ---
> >> >> >> >>  kernel/sched/fair.c | 5 +++++
> >> >> >> >>  1 file changed, 5 insertions(+)
> >> >> >> >>
> >> >> >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> >> >> >> index ae7ceba..0a26132 100644
> >> >> >> >> --- a/kernel/sched/fair.c
> >> >> >> >> +++ b/kernel/sched/fair.c
> >> >> >> >> @@ -7004,6 +7004,11 @@ struct task_struct *
> >> >> >> >>         struct task_struct *p;
> >> >> >> >>         int new_tasks;
> >> >> >> >>
> >> >> >> >> +       if (prev &&
> >> >> >> >> +           fair_policy(prev->policy) &&
> >> >> >> >
> >> >> >> >Why do you need a prev and fair task  ? You seem to target the special
> >> >> >> >case of pick_next_task  but in this case why not only testing rf!=null
> >> >> >> > to make sure to not return immediately after jumping to the idle
> >> >> >>
> >> >> >> >label?
> >> >> >> We just want to do load_balance only when CPU switches from SCHED_NORMAL
> >> >> >> to SCHED_IDLE.
> >> >> >> If we do not check prev, then when the running tasks are all
> >> >> >> SCHED_IDLE we would do newidle_balance every time in
> >> >> >> pick_next_task_fair, which makes no sense and is wasteful.
> >> >> >
> >> >> >I agree that calling newidle_balance every time pick_next_task_fair is
> >> >> >called when there are only sched_idle tasks is useless.
> >> >> >But you also have to take into account cases where there was another
> >> >> >class of task running on the cpu like RT one. In your example above,
> >> >> >if you replace the normal task on CPU2 by a RT task, you still want to
> >> >>
> >> >> >pick the normal task on CPU1 once RT task goes to sleep.
> >> >> Sure, this case should be taken into account,  we should also try to
> >> >> pick normal task in this case.
> >> >>
> >> >> >
> >> >> >Another point that you will have to consider the impact on
> >> >> >rq->idle_stamp because newidle_balance is assumed to be called before
> >> >>
> >> >> >going idle which is not the case anymore with your use case
> >> >> Yes. rq->idle_stamp should not be changed in this case.
> >> >>
> >> >>
> >> >>
> >> >> Actually we want to pull a SCHED_NORMAL task (if possible) to run when a cpu is
> >> >> about to run SCHED_IDLE task. But currently newidle_balance is not
> >> >> designed for SCHED_IDLE  so SCHED_IDLE can also be pulled which
> >> >> is useless in our situation.
> >> >
> >> >newidle_balance will pull a sched_idle task only if there is an
> >> >imbalance which is the right thing to do IMO to ensure fairness
> >> >between sched_idle tasks.  Being a sched_idle task doesn't mean that
> >> >we should break the fairness
> >> >
> >> >>
> >> >> So we plan to add a new function sched_idle_balance which only try to
> >> >> pull SCHED_NORMAL tasks from the busiest cpu. And we will call
> >> >> sched_idle_balance when the previous task is normal or RT and
> >> >> hoping we can pull a SCHED_NORMAL task to run.
> >> >>
> >> >> Do you think it is ok to add a new sched_idle_balance?
> >> >
> >> >I don't see any reason why the scheduler should not pull a sched_idle
> >> >task if there is an imbalance. That will happen anyway during the next
> >>
> >> >periodic load balance
> >> OK. We should not pull the SCHED_IDLE tasks only in load_balance.
> >>
> >>
> >> Do you think it makes sense to do an extra load_balance when a CPU is
> >> about to run a SCHED_IDLE task (switched from normal/RT)?
> >
> >I'm not sure I get your point here.
> >Do you mean the case where a sched_idle task is picked to become the
> >running task while there are runnable normal tasks? This can happen if
> >the normal tasks are long-running tasks. We should not balance in that
> >case. The only case is when the running task, which is not a sched_idle
> >task but a normal/rt/deadline one, goes to sleep and there are only
> >sched_idle tasks enqueued. In this case, and only in this case, we
> >should trigger a load_balance to get a chance to pull a waiting normal
> >task from another CPU.
> >
> >This means checking this state in pick_next_task_fair() and in balance_fair()
>
> We made another change would you please give some comments?
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 04a3ce2..2357301 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7029,6 +7029,10 @@ struct task_struct *
>         struct task_struct *p;
>         int new_tasks;
>
> +       if (sched_idle_rq(rq) && prev && prev->state &&
> +           prev->policy != SCHED_IDLE)
> +               goto idle;
> +
>  again:
>         if (!sched_fair_runnable(rq))
>                 goto idle;
> @@ -10571,7 +10575,8 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
>          * We must set idle_stamp _before_ calling idle_balance(), such that we
>          * measure the duration of idle_balance() as idle time.
>          */
> -       this_rq->idle_stamp = rq_clock(this_rq);
> +       if (!rq->nr_running)
> +               this_rq->idle_stamp = rq_clock(this_rq);

I know that I asked you to take care of not setting idle_stamp during
the last review. But I forgot that it is cleared anyway at the end of
newidle_balance() if there are tasks running on the cpu, so this is
not needed and makes the code less readable.

>
>         /*
>          * Do not pull tasks towards !active CPUs...
>

I don't see the change for balance_fair(),
for example when an RT task goes back to sleep and there are only
sched_idle tasks left.


> >
> >> By doing this, SCHED_NORMAL tasks waiting on other CPUs would get
> >> a chance to be pulled to this CPU and run, which helps reduce the
> >> latency of SCHED_NORMAL tasks.
> >>
> >>
> >> >>>
> >> >> >
> >> >> >>
> >> >> >> >
> >> >> >>
> >> >> >> >Also why not doing that for default case too ? i.e. balance_fair() ?
> >> >> >> You are right, if you think this scenario makes sense, we will send a
> >> >> >> refined patch soon :-)
> >> >> >>
> >> >> >> >
> >> >> >> >> +           sched_idle_cpu(rq->cpu))
> >> >> >> >> +               goto idle;
> >> >> >> >> +
> >> >> >> >>  again:
> >> >> >> >>         if (!sched_fair_runnable(rq))
> >> >> >> >>                 goto idle;
> >> >> >> >> --
> >> >> >> >> 1.8.3.1
> >> >> >> >>
> >> >> >> >>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] sched: pull tasks when CPU is about to run SCHED_IDLE tasks
  2021-02-02 15:54                 ` Vincent Guittot
@ 2021-02-03  2:53                   ` chin
  2021-02-04  3:57                     ` Jiang Biao
  0 siblings, 1 reply; 20+ messages in thread
From: chin @ 2021-02-03  2:53 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, heddchen,
	xiaoggchen(陈小光)




At 2021-02-02 23:54:15, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
>On Tue, 2 Feb 2021 at 08:56, chin <ultrachin@163.com> wrote:
>>
>>
>>
>>
>> At 2021-01-13 16:30:14, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
>> >On Wed, 13 Jan 2021 at 04:14, chin <ultrachin@163.com> wrote:
>> >>
>> >>
>> >>
>> >>
>> >> At 2021-01-12 16:18:51, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
>> >> >On Tue, 12 Jan 2021 at 07:59, chin <ultrachin@163.com> wrote:
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> At 2021-01-11 19:04:19, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
>> >> >> >On Mon, 11 Jan 2021 at 09:27, chin <ultrachin@163.com> wrote:
>> >> >> >>
>> >> >> >>
>> >> >> >> At 2020-12-23 19:30:26, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
>> >> >> >> >On Wed, 23 Dec 2020 at 09:32, <ultrachin@163.com> wrote:
>> >> >> >> >>
>> >> >> >> >> From: Chen Xiaoguang <xiaoggchen@tencent.com>
>> >> >> >> >>
>> >> >> >> >> Before a CPU switches from running SCHED_NORMAL task to
>> >> >> >> >> SCHED_IDLE task, trying to pull SCHED_NORMAL tasks from other
>> >> >> >> >
>> >> >> >> >Could you explain more in detail why you only care about this use case
>> >> >> >>
>> >> >> >> >in particular and not the general case?
>> >> >> >>
>> >> >> >>
>> >> >> >> We want to run online tasks using SCHED_NORMAL policy and offline tasks
>> >> >> >> using SCHED_IDLE policy. The online tasks and the offline tasks run in
>> >> >> >> the same computer in order to use the computer efficiently.
>> >> >> >> The online tasks are asleep most of the time but should respond soon
>> >> >> >> once woken up. The offline tasks have low priority and will run only
>> >> >> >> when there are no online tasks.
>> >> >> >>
>> >> >> >> The online tasks are more important than the offline tasks and are
>> >> >> >> latency sensitive; we should make sure the online tasks preempt the
>> >> >> >> offline tasks as soon as possible while there are online tasks
>> >> >> >> waiting to run. So in our situation we hope a SCHED_NORMAL task will
>> >> >> >> run if there is one.
>> >> >> >>
>> >> >> >> Let's assume we have 2 CPUs,
>> >> >> >> In CPU1 we got 2 SCHED_NORMAL tasks.
>> >> >> >> in CPU2 we got 1 SCHED_NORMAL task and 2 SCHED_IDLE tasks.
>> >> >> >>
>> >> >> >>              CPU1                      CPU2
>> >> >> >>         curr       rq1            curr          rq2
>> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
>> >> >> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
>> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
>> >> >> >>
>> >> >> >>                                  NORMAL exits or blocked
>> >> >> >>       +------+ | +------+                | +----+ +----+
>> >> >> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
>> >> >> >>       +------+ | +------+                | +----+ +----+
>> >> >> >>
>> >> >> >>                                  pick_next_task_fair
>> >> >> >>       +------+ | +------+         +----+ | +----+
>> >> >> >> t2    |NORMAL| | |NORMAL|         |IDLE| | |IDLE|
>> >> >> >>       +------+ | +------+         +----+ | +----+
>> >> >> >>
>> >> >> >>                                  SCHED_IDLE running
>> >> >> >> t3    +------+ | +------+        +----+  | +----+
>> >> >> >>       |NORMAL| | |NORMAL|        |IDLE|  | |IDLE|
>> >> >> >>       +------+ | +------+        +----+  | +----+
>> >> >> >>
>> >> >> >>                                  run_rebalance_domains
>> >> >> >>       +------+ |                +------+ | +----+ +----+
>> >> >> >> t4    |NORMAL| |                |NORMAL| | |IDLE| |IDLE|
>> >> >> >>       +------+ |                +------+ | +----+ +----+
>> >> >> >>
>> >> >> >> As we can see
>> >> >> >> t1: NORMAL task in CPU2 exits or blocked
>> >> >> >> t2: CPU2 pick_next_task_fair would pick a SCHED_IDLE to run while
>> >> >> >> another SCHED_NORMAL in rq1 is waiting.
>> >> >> >> t3: SCHED_IDLE run in CPU2 while a SCHED_NORMAL wait in CPU1.
>> >> >> >> t4: after a short time, periodic load_balance is triggered and pulls the
>> >> >> >> SCHED_NORMAL in rq1 to rq2, and SCHED_NORMAL likely preempts SCHED_IDLE.
>> >> >> >>
>> >> >> >> In this scenario, SCHED_IDLE is running while SCHED_NORMAL is waiting to run.
>> >> >> >> The latency of this SCHED_NORMAL will be high, which is not acceptable.
>> >> >> >>
>> >> >> >> Doing a load_balance before running the SCHED_IDLE task may fix this
>> >> >> >> problem.
>> >> >> >>
>> >> >> >> This patch works as below:
>> >> >> >>
>> >> >> >>              CPU1                      CPU2
>> >> >> >>         curr       rq1            curr          rq2
>> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
>> >> >> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
>> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
>> >> >> >>
>> >> >> >>                                  NORMAL exits or blocked
>> >> >> >>       +------+ | +------+                | +----+ +----+
>> >> >> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
>> >> >> >>       +------+ | +------+                | +----+ +----+
>> >> >> >>
>> >> >> >> t2                            pick_next_task_fair (all se are SCHED_IDLE)
>> >> >> >>
>> >> >> >>                                  newidle_balance
>> >> >> >>       +------+ |                 +------+ | +----+ +----+
>> >> >> >> t3    |NORMAL| |                 |NORMAL| | |IDLE| |IDLE|
>> >> >> >>       +------+ |                 +------+ | +----+ +----+
>> >> >> >>
>> >> >> >>
>> >> >> >> t1: NORMAL task in CPU2 exits or blocked
>> >> >> >> t2: pick_next_task_fair checks that all SEs in the rbtree are
>> >> >> >> SCHED_IDLE and calls newidle_balance, which tries to pull a
>> >> >> >> SCHED_NORMAL (if there is one).
>> >> >> >> t3: pick_next_task_fair would pick a SCHED_NORMAL to run instead of
>> >> >> >> SCHED_IDLE(likely).
>> >> >> >>
>> >> >> >> >
>> >> >> >> >> CPU by doing load_balance first.
>> >> >> >> >>
>> >> >> >> >> Signed-off-by: Chen Xiaoguang <xiaoggchen@tencent.com>
>> >> >> >> >> Signed-off-by: Chen He <heddchen@tencent.com>
>> >> >> >> >> ---
>> >> >> >> >>  kernel/sched/fair.c | 5 +++++
>> >> >> >> >>  1 file changed, 5 insertions(+)
>> >> >> >> >>
>> >> >> >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> >> >> >> >> index ae7ceba..0a26132 100644
>> >> >> >> >> --- a/kernel/sched/fair.c
>> >> >> >> >> +++ b/kernel/sched/fair.c
>> >> >> >> >> @@ -7004,6 +7004,11 @@ struct task_struct *
>> >> >> >> >>         struct task_struct *p;
>> >> >> >> >>         int new_tasks;
>> >> >> >> >>
>> >> >> >> >> +       if (prev &&
>> >> >> >> >> +           fair_policy(prev->policy) &&
>> >> >> >> >
>> >> >> >> >Why do you need a prev and fair task  ? You seem to target the special
>> >> >> >> >case of pick_next_task  but in this case why not only testing rf!=null
>> >> >> >> > to make sure to not return immediately after jumping to the idle
>> >> >> >>
>> >> >> >> >label?
>> >> >> >> We just want to do load_balance only when CPU switches from SCHED_NORMAL
>> >> >> >> to SCHED_IDLE.
>> >> >> >> If we do not check prev, then when the running tasks are all SCHED_IDLE we
>> >> >> >> would do newidle_balance every time in pick_next_task_fair, which makes no
>> >> >> >> sense and is rather wasteful.
>> >> >> >
>> >> >> >I agree that calling newidle_balance every time pick_next_task_fair is
>> >> >> >called when there are only sched_idle tasks is useless.
>> >> >> >But you also have to take into account cases where there was another
>> >> >> >class of task running on the cpu like RT one. In your example above,
>> >> >> >if you replace the normal task on CPU2 by a RT task, you still want to
>> >> >>
>> >> >> >pick the normal task on CPU1 once RT task goes to sleep.
>> >> >> Sure, this case should be taken into account; we should also try to
>> >> >> pick a normal task in this case.
>> >> >>
>> >> >> >
>> >> >> >Another point that you will have to consider the impact on
>> >> >> >rq->idle_stamp because newidle_balance is assumed to be called before
>> >> >>
>> >> >> >going idle which is not the case anymore with your use case
>> >> >> Yes. rq->idle_stamp should not be changed in this case.
>> >> >>
>> >> >>
>> >> >>
>> >> >> Actually we want to pull a SCHED_NORMAL task (if possible) to run when a CPU
>> >> >> is about to run a SCHED_IDLE task. But currently newidle_balance is not
>> >> >> designed with SCHED_IDLE in mind, so SCHED_IDLE tasks can also be pulled,
>> >> >> which is useless in our situation.
>> >> >
>> >> >newidle_balance will pull a sched_idle task only if there is an
>> >> >imbalance which is the right thing to do IMO to ensure fairness
>> >> >between sched_idle tasks.  Being a sched_idle task doesn't mean that
>> >> >we should break the fairness
>> >> >
>> >> >>
>> >> >> So we plan to add a new function, sched_idle_balance, which only tries to
>> >> >> pull SCHED_NORMAL tasks from the busiest CPU. We will call
>> >> >> sched_idle_balance when the previous task is normal or RT, hoping we can
>> >> >> pull a SCHED_NORMAL task to run.
>> >> >>
>> >> >> Do you think it is ok to add a new sched_idle_balance?
>> >> >
>> >> >I don't see any reason why the scheduler should not pull a sched_idle
>> >> >task if there is an imbalance. That will happen anyway during the next
>> >>
>> >> >periodic load balance
>> >> OK. We should not prevent load_balance from pulling SCHED_IDLE tasks.
>> >>
>> >>
>> >> Do you think it makes sense to do an extra load_balance when a CPU is
>> >> about to run a SCHED_IDLE task (switched from normal/RT)?
>> >
>> >I'm not sure to get your point here.
>> >Do you mean if a sched_idle task is picked to become the running task
>> >whereas there are runnable normal tasks ? This can happen if normal
>> >tasks are long running tasks. We should not in this case. The only
>> >case is when the running task, which is not a sched_idle task but a
>> >normal/rt/deadline one, goes to sleep and there are only sched_idle
>> >tasks enqueued. In this case and only in this case, we should trigger
>> >a load_balance to get a chance to pull a waiting normal task from
>> >another CPU.
>> >
>> >This means checking this state in pick_next_task_fair() and in balance_fair()
>>
>> We made another change would you please give some comments?
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 04a3ce2..2357301 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -7029,6 +7029,10 @@ struct task_struct *
>>         struct task_struct *p;
>>         int new_tasks;
>>
>> +       if (sched_idle_rq(rq) && prev && prev->state &&
>> +           prev->policy != SCHED_IDLE)
>> +               goto idle;
>> +
>>  again:
>>         if (!sched_fair_runnable(rq))
>>                 goto idle;
>> @@ -10571,7 +10575,8 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
>>          * We must set idle_stamp _before_ calling idle_balance(), such that we
>>          * measure the duration of idle_balance() as idle time.
>>          */
>> -       this_rq->idle_stamp = rq_clock(this_rq);
>> +       if (!rq->nr_running)
>> +               this_rq->idle_stamp = rq_clock(this_rq);
>
>I know that I asked you to take care of not setting idle_stamp during
>the last review. But I forgot that it was cleared anyway at the end of
>newidle_balance() if there is some tasks running on the cpu so this is

>not needed and make the code less readable
Yes, the idle_stamp was cleared.

>
>>
>>         /*
>>          * Do not pull tasks towards !active CPUs...
>>
>
>I don't see the change for balance_fair()
>When a rt task goes back to sleep and there is only sched_idle tasks

>as an example


Yes, we should consider this situation too.
How about this one ?


diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 04a3ce2..982b842 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6849,6 +6849,9 @@ static void task_dead_fair(struct task_struct *p)
 static int
 balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
+       if (sched_idle_rq(rq))
+               return newidle_balance(rq, rf) != 0;
+
        if (rq->nr_running)
                return 1;
 
@@ -7029,6 +7032,10 @@ struct task_struct *
        struct task_struct *p;
        int new_tasks;
 
+       if (sched_idle_rq(rq) && prev && prev->state &&
+           prev->policy != SCHED_IDLE)
+               goto idle;
+
 again:
        if (!sched_fair_runnable(rq))
                goto idle;


>
>
>> >
>> >> By doing this, SCHED_NORMAL tasks waiting on other CPUs would get
>> >> a chance to be pulled to this CPU and run, which helps reduce the
>> >> latency of SCHED_NORMAL tasks.
>> >>
>> >>
>> >> >>>
>> >> >> >
>> >> >> >>
>> >> >> >> >
>> >> >> >>
>> >> >> >> >Also why not doing that for default case too ? i.e. balance_fair() ?
>> >> >> >> You are right, if you think this scenario makes sense, we will send a
>> >> >> >> refined patch soon :-)
>> >> >> >>
>> >> >> >> >
>> >> >> >> >> +           sched_idle_cpu(rq->cpu))
>> >> >> >> >> +               goto idle;
>> >> >> >> >> +
>> >> >> >> >>  again:
>> >> >> >> >>         if (!sched_fair_runnable(rq))
>> >> >> >> >>                 goto idle;
>> >> >> >> >> --
>> >> >> >> >> 1.8.3.1
>> >> >> >> >>
>> >> >> >> >>

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] sched: pull tasks when CPU is about to run SCHED_IDLE tasks
  2021-02-03  2:53                   ` chin
@ 2021-02-04  3:57                     ` Jiang Biao
  2021-02-04  8:03                       ` Vincent Guittot
  0 siblings, 1 reply; 20+ messages in thread
From: Jiang Biao @ 2021-02-04  3:57 UTC (permalink / raw)
  To: chin
  Cc: Vincent Guittot, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, heddchen,
	xiaoggchen(陈小光)

Hi,

On Wed, 3 Feb 2021 at 19:17, chin <ultrachin@163.com> wrote:
>
>
>
>
> At 2021-02-02 23:54:15, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> >On Tue, 2 Feb 2021 at 08:56, chin <ultrachin@163.com> wrote:
> >>
> >>
> >>
> >>
> >> At 2021-01-13 16:30:14, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> >> >On Wed, 13 Jan 2021 at 04:14, chin <ultrachin@163.com> wrote:
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> At 2021-01-12 16:18:51, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> >> >> >On Tue, 12 Jan 2021 at 07:59, chin <ultrachin@163.com> wrote:
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> At 2021-01-11 19:04:19, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> >> >> >> >On Mon, 11 Jan 2021 at 09:27, chin <ultrachin@163.com> wrote:
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> At 2020-12-23 19:30:26, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> >> >> >> >> >On Wed, 23 Dec 2020 at 09:32, <ultrachin@163.com> wrote:
> >> >> >> >> >>
> >> >> >> >> >> From: Chen Xiaoguang <xiaoggchen@tencent.com>
> >> >> >> >> >>
> >> >> >> >> >> Before a CPU switches from running SCHED_NORMAL task to
> >> >> >> >> >> SCHED_IDLE task, trying to pull SCHED_NORMAL tasks from other
> >> >> >> >> >
> >> >> >> >> >Could you explain more in detail why you only care about this use case
> >> >> >> >>
> >> >> >> >> >in particular and not the general case?
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> We want to run online tasks using SCHED_NORMAL policy and offline tasks
> >> >> >> >> using SCHED_IDLE policy. The online tasks and the offline tasks run in
> >> >> >> >> the same computer in order to use the computer efficiently.
> >> >> >> >> The online tasks sleep most of the time but should respond quickly once
> >> >> >> >> they wake up. The offline tasks run at low priority and only when there
> >> >> >> >> are no online tasks.
> >> >> >> >>
> >> >> >> >> The online tasks are more important than the offline tasks and are latency
> >> >> >> >> sensitive; we should make sure the online tasks preempt the offline tasks
> >> >> >> >> as soon as possible while there are online tasks waiting to run.
> >> >> >> >> So in our situation we hope the SCHED_NORMAL to run if has any.
> >> >> >> >>
> >> >> >> >> Let's assume we have 2 CPUs,
> >> >> >> >> In CPU1 we got 2 SCHED_NORMAL tasks.
> >> >> >> >> in CPU2 we got 1 SCHED_NORMAL task and 2 SCHED_IDLE tasks.
> >> >> >> >>
> >> >> >> >>              CPU1                      CPU2
> >> >> >> >>         curr       rq1            curr          rq2
> >> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
> >> >> >> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
> >> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
> >> >> >> >>
> >> >> >> >>                                  NORMAL exits or blocked
> >> >> >> >>       +------+ | +------+                | +----+ +----+
> >> >> >> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
> >> >> >> >>       +------+ | +------+                | +----+ +----+
> >> >> >> >>
> >> >> >> >>                                  pick_next_task_fair
> >> >> >> >>       +------+ | +------+         +----+ | +----+
> >> >> >> >> t2    |NORMAL| | |NORMAL|         |IDLE| | |IDLE|
> >> >> >> >>       +------+ | +------+         +----+ | +----+
> >> >> >> >>
> >> >> >> >>                                  SCHED_IDLE running
> >> >> >> >> t3    +------+ | +------+        +----+  | +----+
> >> >> >> >>       |NORMAL| | |NORMAL|        |IDLE|  | |IDLE|
> >> >> >> >>       +------+ | +------+        +----+  | +----+
> >> >> >> >>
> >> >> >> >>                                  run_rebalance_domains
> >> >> >> >>       +------+ |                +------+ | +----+ +----+
> >> >> >> >> t4    |NORMAL| |                |NORMAL| | |IDLE| |IDLE|
> >> >> >> >>       +------+ |                +------+ | +----+ +----+
> >> >> >> >>
> >> >> >> >> As we can see
> >> >> >> >> t1: NORMAL task in CPU2 exits or blocked
> >> >> >> >> t2: CPU2 pick_next_task_fair would pick a SCHED_IDLE to run while
> >> >> >> >> another SCHED_NORMAL in rq1 is waiting.
> >> >> >> >> t3: SCHED_IDLE run in CPU2 while a SCHED_NORMAL wait in CPU1.
> >> >> >> >> t4: after a short time, the periodic load_balance is triggered and pulls
> >> >> >> >> the SCHED_NORMAL task in rq1 to rq2, where it likely preempts SCHED_IDLE.
> >> >> >> >>
> >> >> >> >> In this scenario, SCHED_IDLE is running while SCHED_NORMAL is waiting to run.
> >> >> >> >> The latency of this SCHED_NORMAL task will be high, which is not acceptable.
> >> >> >> >>
> >> >> >> >> Doing a load_balance before running the SCHED_IDLE task may fix this problem.
> >> >> >> >>
> >> >> >> >> This patch works as below:
> >> >> >> >>
> >> >> >> >>              CPU1                      CPU2
> >> >> >> >>         curr       rq1            curr          rq2
> >> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
> >> >> >> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
> >> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
> >> >> >> >>
> >> >> >> >>                                  NORMAL exits or blocked
> >> >> >> >>       +------+ | +------+                | +----+ +----+
> >> >> >> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
> >> >> >> >>       +------+ | +------+                | +----+ +----+
> >> >> >> >>
> >> >> >> >> t2                            pick_next_task_fair (all se are SCHED_IDLE)
> >> >> >> >>
> >> >> >> >>                                  newidle_balance
> >> >> >> >>       +------+ |                 +------+ | +----+ +----+
> >> >> >> >> t3    |NORMAL| |                 |NORMAL| | |IDLE| |IDLE|
> >> >> >> >>       +------+ |                 +------+ | +----+ +----+
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> t1: NORMAL task in CPU2 exits or blocked
> >> >> >> >> t2: pick_next_task_fair finds that all se in the rbtree are SCHED_IDLE and
> >> >> >> >> calls newidle_balance, which tries to pull a SCHED_NORMAL task (if any).
> >> >> >> >> t3: pick_next_task_fair would then (likely) pick a SCHED_NORMAL task to run
> >> >> >> >> instead of a SCHED_IDLE one.
> >> >> >> >>
> >> >> >> >> >
> >> >> >> >> >> CPU by doing load_balance first.
> >> >> >> >> >>
> >> >> >> >> >> Signed-off-by: Chen Xiaoguang <xiaoggchen@tencent.com>
> >> >> >> >> >> Signed-off-by: Chen He <heddchen@tencent.com>
> >> >> >> >> >> ---
> >> >> >> >> >>  kernel/sched/fair.c | 5 +++++
> >> >> >> >> >>  1 file changed, 5 insertions(+)
> >> >> >> >> >>
> >> >> >> >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> >> >> >> >> index ae7ceba..0a26132 100644
> >> >> >> >> >> --- a/kernel/sched/fair.c
> >> >> >> >> >> +++ b/kernel/sched/fair.c
> >> >> >> >> >> @@ -7004,6 +7004,11 @@ struct task_struct *
> >> >> >> >> >>         struct task_struct *p;
> >> >> >> >> >>         int new_tasks;
> >> >> >> >> >>
> >> >> >> >> >> +       if (prev &&
> >> >> >> >> >> +           fair_policy(prev->policy) &&
> >> >> >> >> >
> >> >> >> >> >Why do you need a prev and fair task  ? You seem to target the special
> >> >> >> >> >case of pick_next_task  but in this case why not only testing rf!=null
> >> >> >> >> > to make sure to not return immediately after jumping to the idle
> >> >> >> >>
> >> >> >> >> >label?
> >> >> >> >> We just want to do load_balance only when CPU switches from SCHED_NORMAL
> >> >> >> >> to SCHED_IDLE.
> >> >> >> >> If we do not check prev, then when the running tasks are all SCHED_IDLE we
> >> >> >> >> would do newidle_balance every time in pick_next_task_fair, which makes no
> >> >> >> >> sense and is rather wasteful.
> >> >> >> >
> >> >> >> >I agree that calling newidle_balance every time pick_next_task_fair is
> >> >> >> >called when there are only sched_idle tasks is useless.
> >> >> >> >But you also have to take into account cases where there was another
> >> >> >> >class of task running on the cpu like RT one. In your example above,
> >> >> >> >if you replace the normal task on CPU2 by a RT task, you still want to
> >> >> >>
> >> >> >> >pick the normal task on CPU1 once RT task goes to sleep.
> >> >> >> Sure, this case should be taken into account; we should also try to
> >> >> >> pick a normal task in this case.
> >> >> >>
> >> >> >> >
> >> >> >> >Another point that you will have to consider the impact on
> >> >> >> >rq->idle_stamp because newidle_balance is assumed to be called before
> >> >> >>
> >> >> >> >going idle which is not the case anymore with your use case
> >> >> >> Yes. rq->idle_stamp should not be changed in this case.
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> Actually we want to pull a SCHED_NORMAL task (if possible) to run when a CPU
> >> >> >> is about to run a SCHED_IDLE task. But currently newidle_balance is not
> >> >> >> designed with SCHED_IDLE in mind, so SCHED_IDLE tasks can also be pulled,
> >> >> >> which is useless in our situation.
> >> >> >
> >> >> >newidle_balance will pull a sched_idle task only if there is an
> >> >> >imbalance which is the right thing to do IMO to ensure fairness
> >> >> >between sched_idle tasks.  Being a sched_idle task doesn't mean that
> >> >> >we should break the fairness
> >> >> >
> >> >> >>
> >> >> >> So we plan to add a new function, sched_idle_balance, which only tries to
> >> >> >> pull SCHED_NORMAL tasks from the busiest CPU. We will call
> >> >> >> sched_idle_balance when the previous task is normal or RT, hoping we can
> >> >> >> pull a SCHED_NORMAL task to run.
> >> >> >>
> >> >> >> Do you think it is ok to add a new sched_idle_balance?
> >> >> >
> >> >> >I don't see any reason why the scheduler should not pull a sched_idle
> >> >> >task if there is an imbalance. That will happen anyway during the next
> >> >>
> >> >> >periodic load balance
> >> >> OK. We should not prevent load_balance from pulling SCHED_IDLE tasks.
> >> >>
> >> >>
> >> >> Do you think it makes sense to do an extra load_balance when a CPU is
> >> >> about to run a SCHED_IDLE task (switched from normal/RT)?
> >> >
> >> >I'm not sure to get your point here.
> >> >Do you mean if a sched_idle task is picked to become the running task
> >> >whereas there are runnable normal tasks ? This can happen if normal
> >> >tasks are long running tasks. We should not in this case. The only
> >> >case is when the running task, which is not a sched_idle task but a
> >> >normal/rt/deadline one, goes to sleep and there are only sched_idle
> >> >tasks enqueued. In this case and only in this case, we should trigger
> >> >a load_balance to get a chance to pull a waiting normal task from
> >> >another CPU.
> >> >
> >> >This means checking this state in pick_next_task_fair() and in balance_fair()
> >>
> >> We made another change would you please give some comments?
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index 04a3ce2..2357301 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -7029,6 +7029,10 @@ struct task_struct *
> >>         struct task_struct *p;
> >>         int new_tasks;
> >>
> >> +       if (sched_idle_rq(rq) && prev && prev->state &&
> >> +           prev->policy != SCHED_IDLE)
> >> +               goto idle;
> >> +
> >>  again:
> >>         if (!sched_fair_runnable(rq))
> >>                 goto idle;
> >> @@ -10571,7 +10575,8 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
> >>          * We must set idle_stamp _before_ calling idle_balance(), such that we
> >>          * measure the duration of idle_balance() as idle time.
> >>          */
> >> -       this_rq->idle_stamp = rq_clock(this_rq);
> >> +       if (!rq->nr_running)
> >> +               this_rq->idle_stamp = rq_clock(this_rq);
> >
> >I know that I asked you to take care of not setting idle_stamp during
> >the last review. But I forgot that it was cleared anyway at the end of
> >newidle_balance() if there is some tasks running on the cpu so this is
>
> >not needed and make the code less readable
> Yes, the idle_stamp was cleared.
>
> >
> >>
> >>         /*
> >>          * Do not pull tasks towards !active CPUs...
> >>
> >
> >I don't see the change for balance_fair()
> >When a rt task goes back to sleep and there is only sched_idle tasks
>
> >as an example
>
>
> Yes, we should consider this situation too.
> How about this one ?
>
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 04a3ce2..982b842 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6849,6 +6849,9 @@ static void task_dead_fair(struct task_struct *p)
>  static int
>  balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
>  {
> +       if (sched_idle_rq(rq))
> +               return newidle_balance(rq, rf) != 0;
> +
>         if (rq->nr_running)
>                 return 1;
>
Maybe we'd better merge the branches, like:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ac950ac950bc..259deda79c06 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6852,7 +6852,7 @@ static void task_dead_fair(struct task_struct *p)
 static int
 balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
-       if (rq->nr_running)
+       if (rq->nr_running && !sched_idle_rq(rq))
                return 1;

        return newidle_balance(rq, rf) != 0;

Thx.
Regards,
Jiang

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] sched: pull tasks when CPU is about to run SCHED_IDLE tasks
  2021-02-02  7:54               ` chin
  2021-02-02 15:54                 ` Vincent Guittot
@ 2021-02-04  8:01                 ` Vincent Guittot
  2021-02-04  8:52                   ` chin
  2021-02-04  9:13                   ` Jiang Biao
  1 sibling, 2 replies; 20+ messages in thread
From: Vincent Guittot @ 2021-02-04  8:01 UTC (permalink / raw)
  To: chin
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, heddchen,
	xiaoggchen(陈小光)

On Tue, 2 Feb 2021 at 08:56, chin <ultrachin@163.com> wrote:
>
>
>
>
> At 2021-01-13 16:30:14, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> >On Wed, 13 Jan 2021 at 04:14, chin <ultrachin@163.com> wrote:
> >>
> >>
> >>
> >>
> >> At 2021-01-12 16:18:51, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> >> >On Tue, 12 Jan 2021 at 07:59, chin <ultrachin@163.com> wrote:
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> At 2021-01-11 19:04:19, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> >> >> >On Mon, 11 Jan 2021 at 09:27, chin <ultrachin@163.com> wrote:
> >> >> >>
> >> >> >>
> >> >> >> At 2020-12-23 19:30:26, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> >> >> >> >On Wed, 23 Dec 2020 at 09:32, <ultrachin@163.com> wrote:
> >> >> >> >>
> >> >> >> >> From: Chen Xiaoguang <xiaoggchen@tencent.com>
> >> >> >> >>
> >> >> >> >> Before a CPU switches from running SCHED_NORMAL task to
> >> >> >> >> SCHED_IDLE task, trying to pull SCHED_NORMAL tasks from other
> >> >> >> >
> >> >> >> >Could you explain more in detail why you only care about this use case
> >> >> >>
> >> >> >> >in particular and not the general case?
> >> >> >>
> >> >> >>
> >> >> >> We want to run online tasks using SCHED_NORMAL policy and offline tasks
> >> >> >> using SCHED_IDLE policy. The online tasks and the offline tasks run in
> >> >> >> the same computer in order to use the computer efficiently.
> >> >> >> The online tasks sleep most of the time but should respond quickly once
> >> >> >> they wake up. The offline tasks run at low priority and only when there
> >> >> >> are no online tasks.
> >> >> >>
> >> >> >> The online tasks are more important than the offline tasks and are latency
> >> >> >> sensitive; we should make sure the online tasks preempt the offline tasks
> >> >> >> as soon as possible while there are online tasks waiting to run.
> >> >> >> So in our situation we hope the SCHED_NORMAL to run if has any.
> >> >> >>
> >> >> >> Let's assume we have 2 CPUs,
> >> >> >> In CPU1 we got 2 SCHED_NORMAL tasks.
> >> >> >> in CPU2 we got 1 SCHED_NORMAL task and 2 SCHED_IDLE tasks.
> >> >> >>
> >> >> >>              CPU1                      CPU2
> >> >> >>         curr       rq1            curr          rq2
> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
> >> >> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
> >> >> >>
> >> >> >>                                  NORMAL exits or blocked
> >> >> >>       +------+ | +------+                | +----+ +----+
> >> >> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
> >> >> >>       +------+ | +------+                | +----+ +----+
> >> >> >>
> >> >> >>                                  pick_next_task_fair
> >> >> >>       +------+ | +------+         +----+ | +----+
> >> >> >> t2    |NORMAL| | |NORMAL|         |IDLE| | |IDLE|
> >> >> >>       +------+ | +------+         +----+ | +----+
> >> >> >>
> >> >> >>                                  SCHED_IDLE running
> >> >> >> t3    +------+ | +------+        +----+  | +----+
> >> >> >>       |NORMAL| | |NORMAL|        |IDLE|  | |IDLE|
> >> >> >>       +------+ | +------+        +----+  | +----+
> >> >> >>
> >> >> >>                                  run_rebalance_domains
> >> >> >>       +------+ |                +------+ | +----+ +----+
> >> >> >> t4    |NORMAL| |                |NORMAL| | |IDLE| |IDLE|
> >> >> >>       +------+ |                +------+ | +----+ +----+
> >> >> >>
> >> >> >> As we can see
> >> >> >> t1: NORMAL task in CPU2 exits or blocked
> >> >> >> t2: CPU2 pick_next_task_fair would pick a SCHED_IDLE to run while
> >> >> >> another SCHED_NORMAL in rq1 is waiting.
> >> >> >> t3: SCHED_IDLE run in CPU2 while a SCHED_NORMAL wait in CPU1.
> >> >> >> t4: after a short time, the periodic load_balance is triggered and pulls
> >> >> >> the SCHED_NORMAL task in rq1 to rq2, where it likely preempts SCHED_IDLE.
> >> >> >>
> >> >> >> In this scenario, SCHED_IDLE is running while SCHED_NORMAL is waiting to run.
> >> >> >> The latency of this SCHED_NORMAL task will be high, which is not acceptable.
> >> >> >>
> >> >> >> Doing a load_balance before running the SCHED_IDLE task may fix this problem.
> >> >> >>
> >> >> >> This patch works as below:
> >> >> >>
> >> >> >>              CPU1                      CPU2
> >> >> >>         curr       rq1            curr          rq2
> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
> >> >> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
> >> >> >>
> >> >> >>                                  NORMAL exits or blocked
> >> >> >>       +------+ | +------+                | +----+ +----+
> >> >> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
> >> >> >>       +------+ | +------+                | +----+ +----+
> >> >> >>
> >> >> >> t2                            pick_next_task_fair (all se are SCHED_IDLE)
> >> >> >>
> >> >> >>                                  newidle_balance
> >> >> >>       +------+ |                 +------+ | +----+ +----+
> >> >> >> t3    |NORMAL| |                 |NORMAL| | |IDLE| |IDLE|
> >> >> >>       +------+ |                 +------+ | +----+ +----+
> >> >> >>
> >> >> >>
> >> >> >> t1: NORMAL task in CPU2 exits or blocked
> >> >> >> t2: pick_next_task_fair finds that all se in the rbtree are SCHED_IDLE and
> >> >> >> calls newidle_balance, which tries to pull a SCHED_NORMAL task (if any).
> >> >> >> t3: pick_next_task_fair would then (likely) pick a SCHED_NORMAL task to run
> >> >> >> instead of a SCHED_IDLE one.
> >> >> >>
> >> >> >> >
> >> >> >> >> CPU by doing load_balance first.
> >> >> >> >>
> >> >> >> >> Signed-off-by: Chen Xiaoguang <xiaoggchen@tencent.com>
> >> >> >> >> Signed-off-by: Chen He <heddchen@tencent.com>
> >> >> >> >> ---
> >> >> >> >>  kernel/sched/fair.c | 5 +++++
> >> >> >> >>  1 file changed, 5 insertions(+)
> >> >> >> >>
> >> >> >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> >> >> >> index ae7ceba..0a26132 100644
> >> >> >> >> --- a/kernel/sched/fair.c
> >> >> >> >> +++ b/kernel/sched/fair.c
> >> >> >> >> @@ -7004,6 +7004,11 @@ struct task_struct *
> >> >> >> >>         struct task_struct *p;
> >> >> >> >>         int new_tasks;
> >> >> >> >>
> >> >> >> >> +       if (prev &&
> >> >> >> >> +           fair_policy(prev->policy) &&
> >> >> >> >
> >> >> >> >Why do you need a prev and fair task  ? You seem to target the special
> >> >> >> >case of pick_next_task  but in this case why not only testing rf!=null
> >> >> >> > to make sure to not return immediately after jumping to the idle
> >> >> >>
> >> >> >> >label?
> >> >> >> We just want to do load_balance only when CPU switches from SCHED_NORMAL
> >> >> >> to SCHED_IDLE.
> >> >> >> If we do not check prev, then when the running tasks are all SCHED_IDLE we
> >> >> >> would do newidle_balance every time in pick_next_task_fair, which makes no
> >> >> >> sense and is rather wasteful.
> >> >> >
> >> >> >I agree that calling newidle_balance every time pick_next_task_fair is
> >> >> >called when there are only sched_idle tasks is useless.
> >> >> >But you also have to take into account cases where there was another
> >> >> >class of task running on the cpu like RT one. In your example above,
> >> >> >if you replace the normal task on CPU2 by a RT task, you still want to
> >> >>
> >> >> >pick the normal task on CPU1 once RT task goes to sleep.
> >> >> Sure, this case should be taken into account; we should also try to
> >> >> pick a normal task in this case.
> >> >>
> >> >> >
> >> >> >Another point that you will have to consider the impact on
> >> >> >rq->idle_stamp because newidle_balance is assumed to be called before
> >> >>
> >> >> >going idle which is not the case anymore with your use case
> >> >> Yes. rq->idle_stamp should not be changed in this case.
> >> >>
> >> >>
> >> >>
> >> >> Actually we want to pull a SCHED_NORMAL task (if possible) to run when a CPU
> >> >> is about to run a SCHED_IDLE task. But currently newidle_balance is not
> >> >> designed with SCHED_IDLE in mind, so SCHED_IDLE tasks can also be pulled,
> >> >> which is useless in our situation.
> >> >
> >> >newidle_balance will pull a sched_idle task only if there is an
> >> >imbalance which is the right thing to do IMO to ensure fairness
> >> >between sched_idle tasks.  Being a sched_idle task doesn't mean that
> >> >we should break the fairness
> >> >
> >> >>
> >> >> So we plan to add a new function, sched_idle_balance, which only tries to
> >> >> pull SCHED_NORMAL tasks from the busiest CPU. We will call
> >> >> sched_idle_balance when the previous task is normal or RT, hoping we can
> >> >> pull a SCHED_NORMAL task to run.
> >> >>
> >> >> Do you think it is ok to add a new sched_idle_balance?
> >> >
> >> >I don't see any reason why the scheduler should not pull a sched_idle
> >> >task if there is an imbalance. That will happen anyway during the next
> >>
> >> >periodic load balance
> >> OK. We should not prevent load_balance from pulling SCHED_IDLE tasks.
> >>
> >>
> >> Do you think it makes sense to do an extra load_balance when a CPU is
> >> about to run a SCHED_IDLE task (switched from normal/RT)?
> >
> >I'm not sure to get your point here.
> >Do you mean if a sched_idle task is picked to become the running task
> >whereas there are runnable normal tasks ? This can happen if normal
> >tasks are long running tasks. We should not in this case. The only
> >case is when the running task, which is not a sched_idle task but a
> >normal/rt/deadline one, goes to sleep and there are only sched_idle
> >tasks enqueued. In this case and only in this case, we should trigger
> >a load_balance to get a chance to pull a waiting normal task from
> >another CPU.
> >
> >This means checking this state in pick_next_task_fair() and in balance_fair()
>
> We made another change would you please give some comments?
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 04a3ce2..2357301 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7029,6 +7029,10 @@ struct task_struct *
>         struct task_struct *p;
>         int new_tasks;
>
> +       if (sched_idle_rq(rq) && prev && prev->state &&
> +           prev->policy != SCHED_IDLE)

This needs a comment to explain what it wants to achieve.

Why do you need to test prev->state?

> +               goto idle;
> +
>  again:
>         if (!sched_fair_runnable(rq))
>                 goto idle;
> @@ -10571,7 +10575,8 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
>          * We must set idle_stamp _before_ calling idle_balance(), such that we
>          * measure the duration of idle_balance() as idle time.
>          */
> -       this_rq->idle_stamp = rq_clock(this_rq);
> +       if (!this_rq->nr_running)
> +               this_rq->idle_stamp = rq_clock(this_rq);
>
>         /*
>          * Do not pull tasks towards !active CPUs...
>
> >
> >> By doing this, SCHED_NORMAL tasks waiting on other CPUs would get
> >> a chance to be pulled to this CPU and run, which helps reduce the
> >> latency of SCHED_NORMAL tasks.
> >>
> >>
> >> >>>
> >> >> >
> >> >> >>
> >> >> >> >
> >> >> >>
> >> >> >> >Also why not doing that for default case too ? i.e. balance_fair() ?
> >> >> >> You are right, if you think this scenario makes sense, we will send a
> >> >> >> refined patch soon :-)
> >> >> >>
> >> >> >> >
> >> >> >> >> +           sched_idle_cpu(rq->cpu))
> >> >> >> >> +               goto idle;
> >> >> >> >> +
> >> >> >> >>  again:
> >> >> >> >>         if (!sched_fair_runnable(rq))
> >> >> >> >>                 goto idle;
> >> >> >> >> --
> >> >> >> >> 1.8.3.1
> >> >> >> >>
> >> >> >> >>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] sched: pull tasks when CPU is about to run SCHED_IDLE tasks
  2021-02-04  3:57                     ` Jiang Biao
@ 2021-02-04  8:03                       ` Vincent Guittot
  0 siblings, 0 replies; 20+ messages in thread
From: Vincent Guittot @ 2021-02-04  8:03 UTC (permalink / raw)
  To: Jiang Biao
  Cc: chin, linux-kernel, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, heddchen,
	xiaoggchen(陈小光)

On Thu, 4 Feb 2021 at 04:57, Jiang Biao <benbjiang@gmail.com> wrote:
>
> Hi,
>
> On Wed, 3 Feb 2021 at 19:17, chin <ultrachin@163.com> wrote:
> >
> >
> >
> >
> > At 2021-02-02 23:54:15, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> > >On Tue, 2 Feb 2021 at 08:56, chin <ultrachin@163.com> wrote:
> > >>
> > >>
> > >>
> > >>
> > >> At 2021-01-13 16:30:14, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> > >> >On Wed, 13 Jan 2021 at 04:14, chin <ultrachin@163.com> wrote:
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >> At 2021-01-12 16:18:51, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> > >> >> >On Tue, 12 Jan 2021 at 07:59, chin <ultrachin@163.com> wrote:
> > >> >> >>
> > >> >> >>
> > >> >> >>
> > >> >> >>
> > >> >> >> At 2021-01-11 19:04:19, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> > >> >> >> >On Mon, 11 Jan 2021 at 09:27, chin <ultrachin@163.com> wrote:
> > >> >> >> >>
> > >> >> >> >>
> > >> >> >> >> At 2020-12-23 19:30:26, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> > >> >> >> >> >On Wed, 23 Dec 2020 at 09:32, <ultrachin@163.com> wrote:
> > >> >> >> >> >>
> > >> >> >> >> >> From: Chen Xiaoguang <xiaoggchen@tencent.com>
> > >> >> >> >> >>
> > >> >> >> >> >> Before a CPU switches from running SCHED_NORMAL task to
> > >> >> >> >> >> SCHED_IDLE task, trying to pull SCHED_NORMAL tasks from other
> > >> >> >> >> >
> > >> >> >> >> >Could you explain more in detail why you only care about this use case
> > >> >> >> >>
> > >> >> >> >> >in particular and not the general case?
> > >> >> >> >>
> > >> >> >> >>
> > >> >> >> >> >> We want to run online tasks using the SCHED_NORMAL policy and offline
> > >> >> >> >> >> tasks using the SCHED_IDLE policy. The online tasks and the offline
> > >> >> >> >> >> tasks run on the same machine in order to use it efficiently.
> > >> >> >> >> >> The online tasks sleep most of the time but should respond quickly
> > >> >> >> >> >> once they wake up. The offline tasks have low priority and run only
> > >> >> >> >> >> when there are no online tasks.
> > >> >> >> >> >>
> > >> >> >> >> >> The online tasks are more important than the offline tasks and are
> > >> >> >> >> >> latency sensitive; we should make sure the online tasks preempt the
> > >> >> >> >> >> offline tasks as soon as possible whenever online tasks are waiting
> > >> >> >> >> >> to run. So in our situation we want a SCHED_NORMAL task to run if
> > >> >> >> >> >> there is one.
> > >> >> >> >>
> > >> >> >> >> Let's assume we have 2 CPUs,
> > >> >> >> >> In CPU1 we got 2 SCHED_NORMAL tasks.
> > >> >> >> >> in CPU2 we got 1 SCHED_NORMAL task and 2 SCHED_IDLE tasks.
> > >> >> >> >>
> > >> >> >> >>              CPU1                      CPU2
> > >> >> >> >>         curr       rq1            curr          rq2
> > >> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
> > >> >> >> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
> > >> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
> > >> >> >> >>
> > >> >> >> >>                                  NORMAL exits or blocked
> > >> >> >> >>       +------+ | +------+                | +----+ +----+
> > >> >> >> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
> > >> >> >> >>       +------+ | +------+                | +----+ +----+
> > >> >> >> >>
> > >> >> >> >>                                  pick_next_task_fair
> > >> >> >> >>       +------+ | +------+         +----+ | +----+
> > >> >> >> >> t2    |NORMAL| | |NORMAL|         |IDLE| | |IDLE|
> > >> >> >> >>       +------+ | +------+         +----+ | +----+
> > >> >> >> >>
> > >> >> >> >>                                  SCHED_IDLE running
> > >> >> >> >> t3    +------+ | +------+        +----+  | +----+
> > >> >> >> >>       |NORMAL| | |NORMAL|        |IDLE|  | |IDLE|
> > >> >> >> >>       +------+ | +------+        +----+  | +----+
> > >> >> >> >>
> > >> >> >> >>                                  run_rebalance_domains
> > >> >> >> >>       +------+ |                +------+ | +----+ +----+
> > >> >> >> >> t4    |NORMAL| |                |NORMAL| | |IDLE| |IDLE|
> > >> >> >> >>       +------+ |                +------+ | +----+ +----+
> > >> >> >> >>
> > >> >> >> >> As we can see
> > >> >> >> >> >> t1: the NORMAL task on CPU2 exits or blocks.
> > >> >> >> >> >> t2: CPU2's pick_next_task_fair picks a SCHED_IDLE task to run while
> > >> >> >> >> >> another SCHED_NORMAL task in rq1 is waiting.
> > >> >> >> >> >> t3: a SCHED_IDLE task runs on CPU2 while a SCHED_NORMAL task waits on CPU1.
> > >> >> >> >> >> t4: after a short time, the periodic load_balance is triggered and pulls
> > >> >> >> >> >> the SCHED_NORMAL task from rq1 to rq2, where it likely preempts SCHED_IDLE.
> > >> >> >> >> >>
> > >> >> >> >> >> In this scenario, SCHED_IDLE is running while SCHED_NORMAL is waiting to run.
> > >> >> >> >> >> The latency of this SCHED_NORMAL task will be high, which is not acceptable.
> > >> >> >> >>
> > >> >> >> >> >> Doing a load_balance before running the SCHED_IDLE task may fix this problem.
> > >> >> >> >>
> > >> >> >> >> This patch works as below:
> > >> >> >> >>
> > >> >> >> >>              CPU1                      CPU2
> > >> >> >> >>         curr       rq1            curr          rq2
> > >> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
> > >> >> >> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
> > >> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
> > >> >> >> >>
> > >> >> >> >>                                  NORMAL exits or blocked
> > >> >> >> >>       +------+ | +------+                | +----+ +----+
> > >> >> >> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
> > >> >> >> >>       +------+ | +------+                | +----+ +----+
> > >> >> >> >>
> > >> >> >> >> t2                            pick_next_task_fair (all se are SCHED_IDLE)
> > >> >> >> >>
> > >> >> >> >>                                  newidle_balance
> > >> >> >> >>       +------+ |                 +------+ | +----+ +----+
> > >> >> >> >> t3    |NORMAL| |                 |NORMAL| | |IDLE| |IDLE|
> > >> >> >> >>       +------+ |                 +------+ | +----+ +----+
> > >> >> >> >>
> > >> >> >> >>
> > >> >> >> >> >> t1: the NORMAL task on CPU2 exits or blocks.
> > >> >> >> >> >> t2: pick_next_task_fair sees that all ses in the rbtree are SCHED_IDLE and
> > >> >> >> >> >> calls newidle_balance, which tries to pull a SCHED_NORMAL task (if any).
> > >> >> >> >> >> t3: pick_next_task_fair will then likely pick a SCHED_NORMAL task to run
> > >> >> >> >> >> instead of SCHED_IDLE.
> > >> >> >> >>
> > >> >> >> >> >
> > >> >> >> >> >> CPU by doing load_balance first.
> > >> >> >> >> >>
> > >> >> >> >> >> Signed-off-by: Chen Xiaoguang <xiaoggchen@tencent.com>
> > >> >> >> >> >> Signed-off-by: Chen He <heddchen@tencent.com>
> > >> >> >> >> >> ---
> > >> >> >> >> >>  kernel/sched/fair.c | 5 +++++
> > >> >> >> >> >>  1 file changed, 5 insertions(+)
> > >> >> >> >> >>
> > >> >> >> >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > >> >> >> >> >> index ae7ceba..0a26132 100644
> > >> >> >> >> >> --- a/kernel/sched/fair.c
> > >> >> >> >> >> +++ b/kernel/sched/fair.c
> > >> >> >> >> >> @@ -7004,6 +7004,11 @@ struct task_struct *
> > >> >> >> >> >>         struct task_struct *p;
> > >> >> >> >> >>         int new_tasks;
> > >> >> >> >> >>
> > >> >> >> >> >> +       if (prev &&
> > >> >> >> >> >> +           fair_policy(prev->policy) &&
> > >> >> >> >> >
> > >> >> >> >> >Why do you need a prev and fair task  ? You seem to target the special
> > >> >> >> >> >case of pick_next_task  but in this case why not only testing rf!=null
> > >> >> >> >> > to make sure to not return immediately after jumping to the idle
> > >> >> >> >>
> > >> >> >> >> >label?
> > >> >> >> >> >> We just want to do load_balance only when the CPU switches from
> > >> >> >> >> >> SCHED_NORMAL to SCHED_IDLE.
> > >> >> >> >> >> Without checking prev, when the running tasks are all SCHED_IDLE we
> > >> >> >> >> >> would do newidle_balance every time in pick_next_task_fair, which
> > >> >> >> >> >> makes no sense and is wasteful.
> > >> >> >> >
> > >> >> >> >I agree that calling newidle_balance every time pick_next_task_fair is
> > >> >> >> >called when there are only sched_idle tasks is useless.
> > >> >> >> >But you also have to take into account cases where there was another
> > >> >> >> >class of task running on the cpu like RT one. In your example above,
> > >> >> >> >if you replace the normal task on CPU2 by a RT task, you still want to
> > >> >> >>
> > >> >> >> >pick the normal task on CPU1 once RT task goes to sleep.
> > >> >> >> Sure, this case should be taken into account,  we should also try to
> > >> >> >> pick normal task in this case.
> > >> >> >>
> > >> >> >> >
> > >> >> >> >Another point that you will have to consider the impact on
> > >> >> >> >rq->idle_stamp because newidle_balance is assumed to be called before
> > >> >> >>
> > >> >> >> >going idle which is not the case anymore with your use case
> > >> >> >> Yes. rq->idle_stamp should not be changed in this case.
> > >> >> >>
> > >> >> >>
> > >> >> >>
> > >> >> >> >> Actually we want to pull a SCHED_NORMAL task (if possible) to run when
> > >> >> >> >> a CPU is about to run a SCHED_IDLE task. But currently newidle_balance
> > >> >> >> >> is not designed with SCHED_IDLE in mind, so SCHED_IDLE tasks can also
> > >> >> >> >> be pulled, which is useless in our situation.
> > >> >> >
> > >> >> >newidle_balance will pull a sched_idle task only if there is an
> > >> >> >imbalance which is the right thing to do IMO to ensure fairness
> > >> >> >between sched_idle tasks.  Being a sched_idle task doesn't mean that
> > >> >> >we should break the fairness
> > >> >> >
> > >> >> >>
> > >> >> >> >> So we plan to add a new function sched_idle_balance which only tries
> > >> >> >> >> to pull SCHED_NORMAL tasks from the busiest CPU. We will call
> > >> >> >> >> sched_idle_balance when the previous task is normal or RT, hoping to
> > >> >> >> >> pull a SCHED_NORMAL task to run.
> > >> >> >>
> > >> >> >> Do you think it is ok to add a new sched_idle_balance?
> > >> >> >
> > >> >> >I don't see any reason why the scheduler should not pull a sched_idle
> > >> >> >task if there is an imbalance. That will happen anyway during the next
> > >> >>
> > >> >> >periodic load balance
> > >> >> OK. We should not exclude SCHED_IDLE tasks from being pulled in load_balance.
> > >> >>
> > >> >>
> > >> >> Do you think it makes sense to do an extra load_balance when a CPU is
> > >> >> about to run a SCHED_IDLE task (switched from normal/RT)?
> > >> >
> > >> >I'm not sure I get your point here.
> > >> >Do you mean if a sched_idle task is picked to become the running task
> > >> >whereas there are runnable normal tasks ? This can happen if normal
> > >> >tasks are long running tasks. We should not in this case. The only
> > >> >case is when the running task, which is not a sched_idle task but a
> > >> >normal/rt/deadline one, goes to sleep and there are only sched_idle
> > >> >tasks enqueued. In this case and only in this case, we should trigger
> > >> >a load_balance to get a chance to pull a waiting normal task from
> > >> >another CPU.
> > >> >
> > >> >This means checking this state in pick_next_task_fair() and in balance_fair()
> > >>
> > >> We made another change would you please give some comments?
> > >>
> > >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > >> index 04a3ce2..2357301 100644
> > >> --- a/kernel/sched/fair.c
> > >> +++ b/kernel/sched/fair.c
> > >> @@ -7029,6 +7029,10 @@ struct task_struct *
> > >>         struct task_struct *p;
> > >>         int new_tasks;
> > >>
> > >> +       if (sched_idle_rq(rq) && prev && prev->state &&
> > >> +           prev->policy != SCHED_IDLE)
> > >> +               goto idle;
> > >> +
> > >>  again:
> > >>         if (!sched_fair_runnable(rq))
> > >>                 goto idle;
> > >> @@ -10571,7 +10575,8 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
> > >>          * We must set idle_stamp _before_ calling idle_balance(), such that we
> > >>          * measure the duration of idle_balance() as idle time.
> > >>          */
> > >> -       this_rq->idle_stamp = rq_clock(this_rq);
> > >> +       if (!this_rq->nr_running)
> > >> +               this_rq->idle_stamp = rq_clock(this_rq);
> > >
> > >I know that I asked you to take care of not setting idle_stamp during
> > >the last review. But I forgot that it is cleared anyway at the end of
> > >newidle_balance() if there are some tasks running on the cpu, so this
> > >is not needed and makes the code less readable.
> > Yes, the idle_stamp was cleared.
> >
> > >
> > >>
> > >>         /*
> > >>          * Do not pull tasks towards !active CPUs...
> > >>
> > >
> > >I don't see the change for balance_fair().
> > >When an rt task goes back to sleep and there are only sched_idle
> > >tasks, as an example.
> >
> >
> > Yes, we should consider this situation too.
> > How about this one ?
> >
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 04a3ce2..982b842 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6849,6 +6849,9 @@ static void task_dead_fair(struct task_struct *p)
> >  static int
> >  balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> >  {
> > +       if (sched_idle_rq(rq))
> > +               return newidle_balance(rq, rf) != 0;
> > +
> >         if (rq->nr_running)
> >                 return 1;
> >
> Maybe we'd better merge the branches? like,
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ac950ac950bc..259deda79c06 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6852,7 +6852,7 @@ static void task_dead_fair(struct task_struct *p)
>  static int
>  balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
>  {
> -       if (rq->nr_running)
> +       if (rq->nr_running && !sched_idle_rq(rq))

yes this looks better

>                 return 1;
>
>         return newidle_balance(rq, rf) != 0;
>
> Thx.
> Regards,
> Jiang

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] sched: pull tasks when CPU is about to run SCHED_IDLE tasks
  2021-02-04  8:01                 ` Vincent Guittot
@ 2021-02-04  8:52                   ` chin
  2021-02-04  9:02                     ` Vincent Guittot
  2021-02-04  9:13                   ` Jiang Biao
  1 sibling, 1 reply; 20+ messages in thread
From: chin @ 2021-02-04  8:52 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, heddchen,
	xiaoggchen(陈小光)


At 2021-02-04 16:01:57, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
>On Tue, 2 Feb 2021 at 08:56, chin <ultrachin@163.com> wrote:
>>
>>
>>
>>
>> At 2021-01-13 16:30:14, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
>> >On Wed, 13 Jan 2021 at 04:14, chin <ultrachin@163.com> wrote:
>> >>
>> >>
>> >>
>> >>
>> >> At 2021-01-12 16:18:51, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
>> >> >On Tue, 12 Jan 2021 at 07:59, chin <ultrachin@163.com> wrote:
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> At 2021-01-11 19:04:19, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
>> >> >> >On Mon, 11 Jan 2021 at 09:27, chin <ultrachin@163.com> wrote:
>> >> >> >>
>> >> >> >>
>> >> >> >> At 2020-12-23 19:30:26, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
>> >> >> >> >On Wed, 23 Dec 2020 at 09:32, <ultrachin@163.com> wrote:
>> >> >> >> >>
>> >> >> >> >> From: Chen Xiaoguang <xiaoggchen@tencent.com>
>> >> >> >> >>
>> >> >> >> >> Before a CPU switches from running SCHED_NORMAL task to
>> >> >> >> >> SCHED_IDLE task, trying to pull SCHED_NORMAL tasks from other
>> >> >> >> >
>> >> >> >> >Could you explain more in detail why you only care about this use case
>> >> >> >>
>> >> >> >> >in particular and not the general case?
>> >> >> >>
>> >> >> >>
>> >> >> >> We want to run online tasks using the SCHED_NORMAL policy and offline
>> >> >> >> tasks using the SCHED_IDLE policy. The online tasks and the offline
>> >> >> >> tasks run on the same machine in order to use it efficiently.
>> >> >> >> The online tasks sleep most of the time but should respond quickly
>> >> >> >> once they wake up. The offline tasks have low priority and run only
>> >> >> >> when there are no online tasks.
>> >> >> >>
>> >> >> >> The online tasks are more important than the offline tasks and are
>> >> >> >> latency sensitive; we should make sure the online tasks preempt the
>> >> >> >> offline tasks as soon as possible whenever online tasks are waiting
>> >> >> >> to run. So in our situation we want a SCHED_NORMAL task to run if
>> >> >> >> there is one.
>> >> >> >>
>> >> >> >> Let's assume we have 2 CPUs,
>> >> >> >> In CPU1 we got 2 SCHED_NORMAL tasks.
>> >> >> >> in CPU2 we got 1 SCHED_NORMAL task and 2 SCHED_IDLE tasks.
>> >> >> >>
>> >> >> >>              CPU1                      CPU2
>> >> >> >>         curr       rq1            curr          rq2
>> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
>> >> >> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
>> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
>> >> >> >>
>> >> >> >>                                  NORMAL exits or blocked
>> >> >> >>       +------+ | +------+                | +----+ +----+
>> >> >> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
>> >> >> >>       +------+ | +------+                | +----+ +----+
>> >> >> >>
>> >> >> >>                                  pick_next_task_fair
>> >> >> >>       +------+ | +------+         +----+ | +----+
>> >> >> >> t2    |NORMAL| | |NORMAL|         |IDLE| | |IDLE|
>> >> >> >>       +------+ | +------+         +----+ | +----+
>> >> >> >>
>> >> >> >>                                  SCHED_IDLE running
>> >> >> >> t3    +------+ | +------+        +----+  | +----+
>> >> >> >>       |NORMAL| | |NORMAL|        |IDLE|  | |IDLE|
>> >> >> >>       +------+ | +------+        +----+  | +----+
>> >> >> >>
>> >> >> >>                                  run_rebalance_domains
>> >> >> >>       +------+ |                +------+ | +----+ +----+
>> >> >> >> t4    |NORMAL| |                |NORMAL| | |IDLE| |IDLE|
>> >> >> >>       +------+ |                +------+ | +----+ +----+
>> >> >> >>
>> >> >> >> As we can see
>> >> >> >> t1: the NORMAL task on CPU2 exits or blocks.
>> >> >> >> t2: CPU2's pick_next_task_fair picks a SCHED_IDLE task to run while
>> >> >> >> another SCHED_NORMAL task in rq1 is waiting.
>> >> >> >> t3: a SCHED_IDLE task runs on CPU2 while a SCHED_NORMAL task waits on CPU1.
>> >> >> >> t4: after a short time, the periodic load_balance is triggered and pulls
>> >> >> >> the SCHED_NORMAL task from rq1 to rq2, where it likely preempts SCHED_IDLE.
>> >> >> >>
>> >> >> >> In this scenario, SCHED_IDLE is running while SCHED_NORMAL is waiting to run.
>> >> >> >> The latency of this SCHED_NORMAL task will be high, which is not acceptable.
>> >> >> >>
>> >> >> >> Doing a load_balance before running the SCHED_IDLE task may fix this problem.
>> >> >> >>
>> >> >> >> This patch works as below:
>> >> >> >>
>> >> >> >>              CPU1                      CPU2
>> >> >> >>         curr       rq1            curr          rq2
>> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
>> >> >> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
>> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
>> >> >> >>
>> >> >> >>                                  NORMAL exits or blocked
>> >> >> >>       +------+ | +------+                | +----+ +----+
>> >> >> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
>> >> >> >>       +------+ | +------+                | +----+ +----+
>> >> >> >>
>> >> >> >> t2                            pick_next_task_fair (all se are SCHED_IDLE)
>> >> >> >>
>> >> >> >>                                  newidle_balance
>> >> >> >>       +------+ |                 +------+ | +----+ +----+
>> >> >> >> t3    |NORMAL| |                 |NORMAL| | |IDLE| |IDLE|
>> >> >> >>       +------+ |                 +------+ | +----+ +----+
>> >> >> >>
>> >> >> >>
>> >> >> >> t1: the NORMAL task on CPU2 exits or blocks.
>> >> >> >> t2: pick_next_task_fair sees that all ses in the rbtree are SCHED_IDLE and
>> >> >> >> calls newidle_balance, which tries to pull a SCHED_NORMAL task (if any).
>> >> >> >> t3: pick_next_task_fair will then likely pick a SCHED_NORMAL task to run
>> >> >> >> instead of SCHED_IDLE.
>> >> >> >>
>> >> >> >> >
>> >> >> >> >> CPU by doing load_balance first.
>> >> >> >> >>
>> >> >> >> >> Signed-off-by: Chen Xiaoguang <xiaoggchen@tencent.com>
>> >> >> >> >> Signed-off-by: Chen He <heddchen@tencent.com>
>> >> >> >> >> ---
>> >> >> >> >>  kernel/sched/fair.c | 5 +++++
>> >> >> >> >>  1 file changed, 5 insertions(+)
>> >> >> >> >>
>> >> >> >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> >> >> >> >> index ae7ceba..0a26132 100644
>> >> >> >> >> --- a/kernel/sched/fair.c
>> >> >> >> >> +++ b/kernel/sched/fair.c
>> >> >> >> >> @@ -7004,6 +7004,11 @@ struct task_struct *
>> >> >> >> >>         struct task_struct *p;
>> >> >> >> >>         int new_tasks;
>> >> >> >> >>
>> >> >> >> >> +       if (prev &&
>> >> >> >> >> +           fair_policy(prev->policy) &&
>> >> >> >> >
>> >> >> >> >Why do you need a prev and fair task  ? You seem to target the special
>> >> >> >> >case of pick_next_task  but in this case why not only testing rf!=null
>> >> >> >> > to make sure to not return immediately after jumping to the idle
>> >> >> >>
>> >> >> >> >label?
>> >> >> >> We just want to do load_balance only when the CPU switches from
>> >> >> >> SCHED_NORMAL to SCHED_IDLE.
>> >> >> >> Without checking prev, when the running tasks are all SCHED_IDLE we
>> >> >> >> would do newidle_balance every time in pick_next_task_fair, which
>> >> >> >> makes no sense and is wasteful.
>> >> >> >
>> >> >> >I agree that calling newidle_balance every time pick_next_task_fair is
>> >> >> >called when there are only sched_idle tasks is useless.
>> >> >> >But you also have to take into account cases where there was another
>> >> >> >class of task running on the cpu like RT one. In your example above,
>> >> >> >if you replace the normal task on CPU2 by a RT task, you still want to
>> >> >>
>> >> >> >pick the normal task on CPU1 once RT task goes to sleep.
>> >> >> Sure, this case should be taken into account,  we should also try to
>> >> >> pick normal task in this case.
>> >> >>
>> >> >> >
>> >> >> >Another point that you will have to consider the impact on
>> >> >> >rq->idle_stamp because newidle_balance is assumed to be called before
>> >> >>
>> >> >> >going idle which is not the case anymore with your use case
>> >> >> Yes. rq->idle_stamp should not be changed in this case.
>> >> >>
>> >> >>
>> >> >>
>> >> >> Actually we want to pull a SCHED_NORMAL task (if possible) to run when
>> >> >> a CPU is about to run a SCHED_IDLE task. But currently newidle_balance
>> >> >> is not designed with SCHED_IDLE in mind, so SCHED_IDLE tasks can also
>> >> >> be pulled, which is useless in our situation.
>> >> >
>> >> >newidle_balance will pull a sched_idle task only if there is an
>> >> >imbalance which is the right thing to do IMO to ensure fairness
>> >> >between sched_idle tasks.  Being a sched_idle task doesn't mean that
>> >> >we should break the fairness
>> >> >
>> >> >>
>> >> >> So we plan to add a new function sched_idle_balance which only tries
>> >> >> to pull SCHED_NORMAL tasks from the busiest CPU. We will call
>> >> >> sched_idle_balance when the previous task is normal or RT, hoping to
>> >> >> pull a SCHED_NORMAL task to run.
>> >> >>
>> >> >> Do you think it is ok to add a new sched_idle_balance?
>> >> >
>> >> >I don't see any reason why the scheduler should not pull a sched_idle
>> >> >task if there is an imbalance. That will happen anyway during the next
>> >>
>> >> >periodic load balance
>> >> OK. We should not exclude SCHED_IDLE tasks from being pulled in load_balance.
>> >>
>> >>
>> >> Do you think it makes sense to do an extra load_balance when a CPU is
>> >> about to run a SCHED_IDLE task (switched from normal/RT)?
>> >
>> >I'm not sure I get your point here.
>> >Do you mean if a sched_idle task is picked to become the running task
>> >whereas there are runnable normal tasks ? This can happen if normal
>> >tasks are long running tasks. We should not in this case. The only
>> >case is when the running task, which is not a sched_idle task but a
>> >normal/rt/deadline one, goes to sleep and there are only sched_idle
>> >tasks enqueued. In this case and only in this case, we should trigger
>> >a load_balance to get a chance to pull a waiting normal task from
>> >another CPU.
>> >
>> >This means checking this state in pick_next_task_fair() and in balance_fair()
>>
>> We made another change would you please give some comments?
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 04a3ce2..2357301 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -7029,6 +7029,10 @@ struct task_struct *
>>         struct task_struct *p;
>>         int new_tasks;
>>
>> +       if (sched_idle_rq(rq) && prev && prev->state &&
>> +           prev->policy != SCHED_IDLE)
>

>This needs a comment to explain what it wants to achieve.
Sure, we will add a comment when we send version 2.

>

>Why do you need to test prev->state ?
We only want to do this when a normal task goes to sleep.
If a long-running normal task merely reaches the end of its time slice,
then it is time for the SCHED_IDLE task to run.

>
>> +               goto idle;
>> +
>>  again:
>>         if (!sched_fair_runnable(rq))
>>                 goto idle;
>> @@ -10571,7 +10575,8 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
>>          * We must set idle_stamp _before_ calling idle_balance(), such that we
>>          * measure the duration of idle_balance() as idle time.
>>          */
>> -       this_rq->idle_stamp = rq_clock(this_rq);
>> +       if (!this_rq->nr_running)
>> +               this_rq->idle_stamp = rq_clock(this_rq);
>>
>>         /*
>>          * Do not pull tasks towards !active CPUs...
>>
>> >
>> >> By doing this, SCHED_NORMAL tasks waiting on other CPUs would get
>> >> a chance to be pulled to this CPU and run, which helps reduce the
>> >> latency of SCHED_NORMAL tasks.
>> >>
>> >>
>> >> >>>
>> >> >> >
>> >> >> >>
>> >> >> >> >
>> >> >> >>
>> >> >> >> >Also why not doing that for default case too ? i.e. balance_fair() ?
>> >> >> >> You are right, if you think this scenario makes sense, we will send a
>> >> >> >> refined patch soon :-)
>> >> >> >>
>> >> >> >> >
>> >> >> >> >> +           sched_idle_cpu(rq->cpu))
>> >> >> >> >> +               goto idle;
>> >> >> >> >> +
>> >> >> >> >>  again:
>> >> >> >> >>         if (!sched_fair_runnable(rq))
>> >> >> >> >>                 goto idle;
>> >> >> >> >> --
>> >> >> >> >> 1.8.3.1
>> >> >> >> >>
>> >> >> >> >>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] sched: pull tasks when CPU is about to run SCHED_IDLE tasks
  2021-02-04  8:52                   ` chin
@ 2021-02-04  9:02                     ` Vincent Guittot
  0 siblings, 0 replies; 20+ messages in thread
From: Vincent Guittot @ 2021-02-04  9:02 UTC (permalink / raw)
  To: chin
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, heddchen,
	xiaoggchen(陈小光)

On Thu, 4 Feb 2021 at 09:54, chin <ultrachin@163.com> wrote:
>
>
> At 2021-02-04 16:01:57, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> >On Tue, 2 Feb 2021 at 08:56, chin <ultrachin@163.com> wrote:
> >>
> >>
> >>
> >>
> >> At 2021-01-13 16:30:14, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> >> >On Wed, 13 Jan 2021 at 04:14, chin <ultrachin@163.com> wrote:
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> At 2021-01-12 16:18:51, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> >> >> >On Tue, 12 Jan 2021 at 07:59, chin <ultrachin@163.com> wrote:
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> At 2021-01-11 19:04:19, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> >> >> >> >On Mon, 11 Jan 2021 at 09:27, chin <ultrachin@163.com> wrote:
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> At 2020-12-23 19:30:26, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> >> >> >> >> >On Wed, 23 Dec 2020 at 09:32, <ultrachin@163.com> wrote:
> >> >> >> >> >>
> >> >> >> >> >> From: Chen Xiaoguang <xiaoggchen@tencent.com>
> >> >> >> >> >>
> >> >> >> >> >> Before a CPU switches from running SCHED_NORMAL task to
> >> >> >> >> >> SCHED_IDLE task, trying to pull SCHED_NORMAL tasks from other
> >> >> >> >> >
> >> >> >> >> >Could you explain more in detail why you only care about this use case
> >> >> >> >>
> >> >> >> >> >in particular and not the general case?
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> We want to run online tasks using the SCHED_NORMAL policy and offline
> >> >> >> >> tasks using the SCHED_IDLE policy. The online tasks and the offline
> >> >> >> >> tasks run on the same machine in order to use it efficiently.
> >> >> >> >> The online tasks sleep most of the time but should respond quickly
> >> >> >> >> once they wake up. The offline tasks have low priority and run only
> >> >> >> >> when there are no online tasks.
> >> >> >> >>
> >> >> >> >> The online tasks are more important than the offline tasks and are
> >> >> >> >> latency sensitive; we should make sure the online tasks preempt the
> >> >> >> >> offline tasks as soon as possible whenever online tasks are waiting
> >> >> >> >> to run. So in our situation we want a SCHED_NORMAL task to run if
> >> >> >> >> there is one.
> >> >> >> >>
> >> >> >> >> Let's assume we have 2 CPUs,
> >> >> >> >> In CPU1 we got 2 SCHED_NORMAL tasks.
> >> >> >> >> in CPU2 we got 1 SCHED_NORMAL task and 2 SCHED_IDLE tasks.
> >> >> >> >>
> >> >> >> >>              CPU1                      CPU2
> >> >> >> >>         curr       rq1            curr          rq2
> >> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
> >> >> >> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
> >> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
> >> >> >> >>
> >> >> >> >>                                  NORMAL exits or blocked
> >> >> >> >>       +------+ | +------+                | +----+ +----+
> >> >> >> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
> >> >> >> >>       +------+ | +------+                | +----+ +----+
> >> >> >> >>
> >> >> >> >>                                  pick_next_task_fair
> >> >> >> >>       +------+ | +------+         +----+ | +----+
> >> >> >> >> t2    |NORMAL| | |NORMAL|         |IDLE| | |IDLE|
> >> >> >> >>       +------+ | +------+         +----+ | +----+
> >> >> >> >>
> >> >> >> >>                                  SCHED_IDLE running
> >> >> >> >> t3    +------+ | +------+        +----+  | +----+
> >> >> >> >>       |NORMAL| | |NORMAL|        |IDLE|  | |IDLE|
> >> >> >> >>       +------+ | +------+        +----+  | +----+
> >> >> >> >>
> >> >> >> >>                                  run_rebalance_domains
> >> >> >> >>       +------+ |                +------+ | +----+ +----+
> >> >> >> >> t4    |NORMAL| |                |NORMAL| | |IDLE| |IDLE|
> >> >> >> >>       +------+ |                +------+ | +----+ +----+
> >> >> >> >>
> >> >> >> >> As we can see:
> >> >> >> >> t1: the NORMAL task on CPU2 exits or blocks.
> >> >> >> >> t2: pick_next_task_fair on CPU2 picks a SCHED_IDLE task to run while
> >> >> >> >> another SCHED_NORMAL task is waiting in rq1.
> >> >> >> >> t3: a SCHED_IDLE task runs on CPU2 while a SCHED_NORMAL task waits on CPU1.
> >> >> >> >> t4: after a short time, the periodic load_balance is triggered and pulls
> >> >> >> >> the SCHED_NORMAL task from rq1 to rq2, where it likely preempts SCHED_IDLE.
> >> >> >> >>
> >> >> >> >> In this scenario, a SCHED_IDLE task runs while a SCHED_NORMAL task waits.
> >> >> >> >> The latency of that SCHED_NORMAL task will be high, which is not acceptable.
> >> >> >> >>
> >> >> >> >> Doing a load_balance before running the SCHED_IDLE task may fix this problem.
> >> >> >> >>
> >> >> >> >> This patch works as below:
> >> >> >> >>
> >> >> >> >>              CPU1                      CPU2
> >> >> >> >>         curr       rq1            curr          rq2
> >> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
> >> >> >> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
> >> >> >> >>       +------+ | +------+       +------+ | +----+ +----+
> >> >> >> >>
> >> >> >> >>                                  NORMAL exits or blocked
> >> >> >> >>       +------+ | +------+                | +----+ +----+
> >> >> >> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
> >> >> >> >>       +------+ | +------+                | +----+ +----+
> >> >> >> >>
> >> >> >> >> t2                            pick_next_task_fair (all se are SCHED_IDLE)
> >> >> >> >>
> >> >> >> >>                                  newidle_balance
> >> >> >> >>       +------+ |                 +------+ | +----+ +----+
> >> >> >> >> t3    |NORMAL| |                 |NORMAL| | |IDLE| |IDLE|
> >> >> >> >>       +------+ |                 +------+ | +----+ +----+
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> t1: the NORMAL task on CPU2 exits or blocks.
> >> >> >> >> t2: pick_next_task_fair sees that all sched entities in the rbtree are
> >> >> >> >> SCHED_IDLE and calls newidle_balance, which tries to pull a
> >> >> >> >> SCHED_NORMAL task (if any).
> >> >> >> >> t3: pick_next_task_fair then likely picks a SCHED_NORMAL task to run
> >> >> >> >> instead of a SCHED_IDLE one.
> >> >> >> >>
> >> >> >> >> >
> >> >> >> >> >> CPU by doing load_balance first.
> >> >> >> >> >>
> >> >> >> >> >> Signed-off-by: Chen Xiaoguang <xiaoggchen@tencent.com>
> >> >> >> >> >> Signed-off-by: Chen He <heddchen@tencent.com>
> >> >> >> >> >> ---
> >> >> >> >> >>  kernel/sched/fair.c | 5 +++++
> >> >> >> >> >>  1 file changed, 5 insertions(+)
> >> >> >> >> >>
> >> >> >> >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> >> >> >> >> index ae7ceba..0a26132 100644
> >> >> >> >> >> --- a/kernel/sched/fair.c
> >> >> >> >> >> +++ b/kernel/sched/fair.c
> >> >> >> >> >> @@ -7004,6 +7004,11 @@ struct task_struct *
> >> >> >> >> >>         struct task_struct *p;
> >> >> >> >> >>         int new_tasks;
> >> >> >> >> >>
> >> >> >> >> >> +       if (prev &&
> >> >> >> >> >> +           fair_policy(prev->policy) &&
> >> >> >> >> >
> >> >> >> >> >Why do you need a prev and fair task? You seem to target the special
> >> >> >> >> >case of pick_next_task, but in that case why not only test rf != NULL
> >> >> >> >> >to make sure we do not return immediately after jumping to the idle
> >> >> >> >> >label?
> >> >> >> >> We just want to do load_balance only when the CPU switches from
> >> >> >> >> SCHED_NORMAL to SCHED_IDLE.
> >> >> >> >> If we do not check prev, then when the running tasks are all
> >> >> >> >> SCHED_IDLE we would do newidle_balance every time in
> >> >> >> >> pick_next_task_fair, which makes no sense and is wasteful.
> >> >> >> >
> >> >> >> >I agree that calling newidle_balance every time pick_next_task_fair is
> >> >> >> >called when there are only sched_idle tasks is useless.
> >> >> >> >But you also have to take into account cases where there was another
> >> >> >> >class of task running on the cpu, like an RT one. In your example
> >> >> >> >above, if you replace the normal task on CPU2 by an RT task, you still
> >> >> >> >want to pick the normal task on CPU1 once the RT task goes to sleep.
> >> >> >> Sure, this case should be taken into account; we should also try to
> >> >> >> pick a normal task in this case.
> >> >> >>
> >> >> >> >
> >> >> >> >Another point: you will have to consider the impact on
> >> >> >> >rq->idle_stamp, because newidle_balance is assumed to be called before
> >> >> >> >going idle, which is no longer the case with your use case.
> >> >> >> Yes. rq->idle_stamp should not be changed in this case.
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> Actually we want to pull a SCHED_NORMAL task (if possible) to run when
> >> >> >> a cpu is about to run a SCHED_IDLE task. But currently newidle_balance
> >> >> >> is not designed for SCHED_IDLE, so SCHED_IDLE tasks can also be pulled,
> >> >> >> which is useless in our situation.
> >> >> >
> >> >> >newidle_balance will pull a sched_idle task only if there is an
> >> >> >imbalance, which is the right thing to do IMO to ensure fairness
> >> >> >between sched_idle tasks. Being a sched_idle task doesn't mean that
> >> >> >we should break fairness.
> >> >> >
> >> >> >>
> >> >> >> So we plan to add a new function, sched_idle_balance, which only tries
> >> >> >> to pull SCHED_NORMAL tasks from the busiest cpu. We will call
> >> >> >> sched_idle_balance when the previous task is normal or RT, hoping we
> >> >> >> can pull a SCHED_NORMAL task to run.
> >> >> >>
> >> >> >> Do you think it is ok to add a new sched_idle_balance?
> >> >> >
> >> >> >I don't see any reason why the scheduler should not pull a sched_idle
> >> >> >task if there is an imbalance. That will happen anyway during the next
> >> >> >periodic load balance.
> >> >> OK. We should not exclude SCHED_IDLE tasks from being pulled in load_balance.
> >> >>
> >> >>
> >> >> Do you think it makes sense to do an extra load_balance when a cpu is
> >> >> about to run a SCHED_IDLE task (switched from normal/RT)?
> >> >
> >> >I'm not sure I get your point here.
> >> >Do you mean if a sched_idle task is picked to become the running task
> >> >whereas there are runnable normal tasks? This can happen if the
> >> >normal tasks are long-running tasks. We should not balance in this
> >> >case. The only case is when the running task, which is not a
> >> >sched_idle task but a normal/rt/deadline one, goes to sleep and there
> >> >are only sched_idle tasks enqueued. In this case, and only in this
> >> >case, we should trigger a load_balance to get a chance to pull a
> >> >waiting normal task from another CPU.
> >> >
> >> >This means checking this state in pick_next_task_fair() and in balance_fair()
> >>
> >> We made another change; would you please give some comments?
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index 04a3ce2..2357301 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -7029,6 +7029,10 @@ struct task_struct *
> >>         struct task_struct *p;
> >>         int new_tasks;
> >>
> >> +       if (sched_idle_rq(rq) && prev && prev->state &&
> >> +           prev->policy != SCHED_IDLE)
> >
>
> >This needs a comment to explain what it wants to achieve.
> Sure, we will add a comment when we send version 2.
>
> >
>
> >Why do you need to test prev->state ?
> We only want to do this when a normal task goes to sleep.
> If a long-running normal task reaches the end of its time slice, then it
> is time for the SCHED_IDLE task to run.

But sched_idle_rq(rq) will be false in this case because the
long-running normal task is still accounted for in cfs.h_nr_running.


>
> >
> >> +               goto idle;
> >> +
> >>  again:
> >>         if (!sched_fair_runnable(rq))
> >>                 goto idle;
> >> @@ -10571,7 +10575,8 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
> >>          * We must set idle_stamp _before_ calling idle_balance(), such that we
> >>          * measure the duration of idle_balance() as idle time.
> >>          */
> >> -       this_rq->idle_stamp = rq_clock(this_rq);
> >> +       if (!this_rq->nr_running)
> >> +               this_rq->idle_stamp = rq_clock(this_rq);
> >>
> >>         /*
> >>          * Do not pull tasks towards !active CPUs...
> >>
> >> >
> >> >> By doing this, SCHED_NORMAL tasks waiting on other cpus would get a
> >> >> chance to be pulled to this cpu and run, which helps reduce the
> >> >> latency of SCHED_NORMAL tasks.
> >> >>
> >> >>
> >> >> >>>
> >> >> >> >
> >> >> >> >>
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >> >Also why not do that for the default case too, i.e. balance_fair()?
> >> >> >> >> You are right; if you think this scenario makes sense, we will send a
> >> >> >> >> refined patch soon :-)
> >> >> >> >>
> >> >> >> >> >
> >> >> >> >> >> +           sched_idle_cpu(rq->cpu))
> >> >> >> >> >> +               goto idle;
> >> >> >> >> >> +
> >> >> >> >> >>  again:
> >> >> >> >> >>         if (!sched_fair_runnable(rq))
> >> >> >> >> >>                 goto idle;
> >> >> >> >> >> --
> >> >> >> >> >> 1.8.3.1
> >> >> >> >> >>
> >> >> >> >> >>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] sched: pull tasks when CPU is about to run SCHED_IDLE tasks
  2021-02-04  8:01                 ` Vincent Guittot
  2021-02-04  8:52                   ` chin
@ 2021-02-04  9:13                   ` Jiang Biao
  1 sibling, 0 replies; 20+ messages in thread
From: Jiang Biao @ 2021-02-04  9:13 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: chin, linux-kernel, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, heddchen,
	xiaoggchen(陈小光)

Hi, Vincent

On Thu, 4 Feb 2021 at 16:04, Vincent Guittot <vincent.guittot@linaro.org> wrote:
>
> On Tue, 2 Feb 2021 at 08:56, chin <ultrachin@163.com> wrote:
> >
> >
> >
> >
> > At 2021-01-13 16:30:14, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> > >On Wed, 13 Jan 2021 at 04:14, chin <ultrachin@163.com> wrote:
> > >>
> > >>
> > >>
> > >>
> > >> At 2021-01-12 16:18:51, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> > >> >On Tue, 12 Jan 2021 at 07:59, chin <ultrachin@163.com> wrote:
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >> At 2021-01-11 19:04:19, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> > >> >> >On Mon, 11 Jan 2021 at 09:27, chin <ultrachin@163.com> wrote:
> > >> >> >>
> > >> >> >>
> > >> >> >> At 2020-12-23 19:30:26, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:
> > >> >> >> >On Wed, 23 Dec 2020 at 09:32, <ultrachin@163.com> wrote:
> > >> >> >> >>
> > >> >> >> >> From: Chen Xiaoguang <xiaoggchen@tencent.com>
> > >> >> >> >>
> > >> >> >> >> Before a CPU switches from running SCHED_NORMAL task to
> > >> >> >> >> SCHED_IDLE task, trying to pull SCHED_NORMAL tasks from other
> > >> >> >> >
> > >> >> >> >Could you explain in more detail why you only care about this use case
> > >> >> >> >in particular and not the general case?
> > >> >> >>
> > >> >> >>
> > >> >> >> We want to run online tasks using the SCHED_NORMAL policy and offline
> > >> >> >> tasks using the SCHED_IDLE policy. The online tasks and the offline
> > >> >> >> tasks run on the same machine in order to use it efficiently.
> > >> >> >> The online tasks sleep most of the time but should respond quickly
> > >> >> >> once they wake up. The offline tasks have low priority and run only
> > >> >> >> when there are no online tasks.
> > >> >> >>
> > >> >> >> The online tasks are more important than the offline tasks and are
> > >> >> >> latency sensitive, so we should make sure the online tasks preempt
> > >> >> >> the offline tasks as soon as possible whenever there are online tasks
> > >> >> >> waiting to run. So in our situation we want a SCHED_NORMAL task to
> > >> >> >> run whenever there is one.
> > >> >> >>
> > >> >> >> Let's assume we have 2 CPUs,
> > >> >> >> In CPU1 we got 2 SCHED_NORMAL tasks.
> > >> >> >> in CPU2 we got 1 SCHED_NORMAL task and 2 SCHED_IDLE tasks.
> > >> >> >>
> > >> >> >>              CPU1                      CPU2
> > >> >> >>         curr       rq1            curr          rq2
> > >> >> >>       +------+ | +------+       +------+ | +----+ +----+
> > >> >> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
> > >> >> >>       +------+ | +------+       +------+ | +----+ +----+
> > >> >> >>
> > >> >> >>                                  NORMAL exits or blocked
> > >> >> >>       +------+ | +------+                | +----+ +----+
> > >> >> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
> > >> >> >>       +------+ | +------+                | +----+ +----+
> > >> >> >>
> > >> >> >>                                  pick_next_task_fair
> > >> >> >>       +------+ | +------+         +----+ | +----+
> > >> >> >> t2    |NORMAL| | |NORMAL|         |IDLE| | |IDLE|
> > >> >> >>       +------+ | +------+         +----+ | +----+
> > >> >> >>
> > >> >> >>                                  SCHED_IDLE running
> > >> >> >> t3    +------+ | +------+        +----+  | +----+
> > >> >> >>       |NORMAL| | |NORMAL|        |IDLE|  | |IDLE|
> > >> >> >>       +------+ | +------+        +----+  | +----+
> > >> >> >>
> > >> >> >>                                  run_rebalance_domains
> > >> >> >>       +------+ |                +------+ | +----+ +----+
> > >> >> >> t4    |NORMAL| |                |NORMAL| | |IDLE| |IDLE|
> > >> >> >>       +------+ |                +------+ | +----+ +----+
> > >> >> >>
> > >> >> >> As we can see:
> > >> >> >> t1: the NORMAL task on CPU2 exits or blocks.
> > >> >> >> t2: pick_next_task_fair on CPU2 picks a SCHED_IDLE task to run while
> > >> >> >> another SCHED_NORMAL task is waiting in rq1.
> > >> >> >> t3: a SCHED_IDLE task runs on CPU2 while a SCHED_NORMAL task waits on CPU1.
> > >> >> >> t4: after a short time, the periodic load_balance is triggered and pulls
> > >> >> >> the SCHED_NORMAL task from rq1 to rq2, where it likely preempts SCHED_IDLE.
> > >> >> >>
> > >> >> >> In this scenario, a SCHED_IDLE task runs while a SCHED_NORMAL task waits.
> > >> >> >> The latency of that SCHED_NORMAL task will be high, which is not acceptable.
> > >> >> >>
> > >> >> >> Doing a load_balance before running the SCHED_IDLE task may fix this problem.
> > >> >> >>
> > >> >> >> This patch works as below:
> > >> >> >>
> > >> >> >>              CPU1                      CPU2
> > >> >> >>         curr       rq1            curr          rq2
> > >> >> >>       +------+ | +------+       +------+ | +----+ +----+
> > >> >> >> t0    |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
> > >> >> >>       +------+ | +------+       +------+ | +----+ +----+
> > >> >> >>
> > >> >> >>                                  NORMAL exits or blocked
> > >> >> >>       +------+ | +------+                | +----+ +----+
> > >> >> >> t1    |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
> > >> >> >>       +------+ | +------+                | +----+ +----+
> > >> >> >>
> > >> >> >> t2                            pick_next_task_fair (all se are SCHED_IDLE)
> > >> >> >>
> > >> >> >>                                  newidle_balance
> > >> >> >>       +------+ |                 +------+ | +----+ +----+
> > >> >> >> t3    |NORMAL| |                 |NORMAL| | |IDLE| |IDLE|
> > >> >> >>       +------+ |                 +------+ | +----+ +----+
> > >> >> >>
> > >> >> >>
> > >> >> >> t1: the NORMAL task on CPU2 exits or blocks.
> > >> >> >> t2: pick_next_task_fair sees that all sched entities in the rbtree are
> > >> >> >> SCHED_IDLE and calls newidle_balance, which tries to pull a
> > >> >> >> SCHED_NORMAL task (if any).
> > >> >> >> t3: pick_next_task_fair then likely picks a SCHED_NORMAL task to run
> > >> >> >> instead of a SCHED_IDLE one.
> > >> >> >>
> > >> >> >> >
> > >> >> >> >> CPU by doing load_balance first.
> > >> >> >> >>
> > >> >> >> >> Signed-off-by: Chen Xiaoguang <xiaoggchen@tencent.com>
> > >> >> >> >> Signed-off-by: Chen He <heddchen@tencent.com>
> > >> >> >> >> ---
> > >> >> >> >>  kernel/sched/fair.c | 5 +++++
> > >> >> >> >>  1 file changed, 5 insertions(+)
> > >> >> >> >>
> > >> >> >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > >> >> >> >> index ae7ceba..0a26132 100644
> > >> >> >> >> --- a/kernel/sched/fair.c
> > >> >> >> >> +++ b/kernel/sched/fair.c
> > >> >> >> >> @@ -7004,6 +7004,11 @@ struct task_struct *
> > >> >> >> >>         struct task_struct *p;
> > >> >> >> >>         int new_tasks;
> > >> >> >> >>
> > >> >> >> >> +       if (prev &&
> > >> >> >> >> +           fair_policy(prev->policy) &&
> > >> >> >> >
> > >> >> >> >Why do you need a prev and fair task? You seem to target the special
> > >> >> >> >case of pick_next_task, but in that case why not only test rf != NULL
> > >> >> >> >to make sure we do not return immediately after jumping to the idle
> > >> >> >> >label?
> > >> >> >> We just want to do load_balance only when the CPU switches from
> > >> >> >> SCHED_NORMAL to SCHED_IDLE.
> > >> >> >> If we do not check prev, then when the running tasks are all
> > >> >> >> SCHED_IDLE we would do newidle_balance every time in
> > >> >> >> pick_next_task_fair, which makes no sense and is wasteful.
> > >> >> >
> > >> >> >I agree that calling newidle_balance every time pick_next_task_fair is
> > >> >> >called when there are only sched_idle tasks is useless.
> > >> >> >But you also have to take into account cases where there was another
> > >> >> >class of task running on the cpu, like an RT one. In your example
> > >> >> >above, if you replace the normal task on CPU2 by an RT task, you still
> > >> >> >want to pick the normal task on CPU1 once the RT task goes to sleep.
> > >> >> Sure, this case should be taken into account; we should also try to
> > >> >> pick a normal task in this case.
> > >> >>
> > >> >> >
> > >> >> >Another point: you will have to consider the impact on
> > >> >> >rq->idle_stamp, because newidle_balance is assumed to be called before
> > >> >> >going idle, which is no longer the case with your use case.
> > >> >> Yes. rq->idle_stamp should not be changed in this case.
> > >> >>
> > >> >>
> > >> >>
> > >> >> Actually we want to pull a SCHED_NORMAL task (if possible) to run when
> > >> >> a cpu is about to run a SCHED_IDLE task. But currently newidle_balance
> > >> >> is not designed for SCHED_IDLE, so SCHED_IDLE tasks can also be pulled,
> > >> >> which is useless in our situation.
> > >> >
> > >> >newidle_balance will pull a sched_idle task only if there is an
> > >> >imbalance, which is the right thing to do IMO to ensure fairness
> > >> >between sched_idle tasks. Being a sched_idle task doesn't mean that
> > >> >we should break fairness.
> > >> >
> > >> >>
> > >> >> So we plan to add a new function, sched_idle_balance, which only tries
> > >> >> to pull SCHED_NORMAL tasks from the busiest cpu. We will call
> > >> >> sched_idle_balance when the previous task is normal or RT, hoping we
> > >> >> can pull a SCHED_NORMAL task to run.
> > >> >>
> > >> >> Do you think it is ok to add a new sched_idle_balance?
> > >> >
> > >> >I don't see any reason why the scheduler should not pull a sched_idle
> > >> >task if there is an imbalance. That will happen anyway during the next
> > >> >periodic load balance.
> > >> OK. We should not exclude SCHED_IDLE tasks from being pulled in load_balance.
> > >>
> > >>
> > >> Do you think it makes sense to do an extra load_balance when a cpu is
> > >> about to run a SCHED_IDLE task (switched from normal/RT)?
> > >
> > >I'm not sure I get your point here.
> > >Do you mean if a sched_idle task is picked to become the running task
> > >whereas there are runnable normal tasks? This can happen if the
> > >normal tasks are long-running tasks. We should not balance in this
> > >case. The only case is when the running task, which is not a
> > >sched_idle task but a normal/rt/deadline one, goes to sleep and there
> > >are only sched_idle tasks enqueued. In this case, and only in this
> > >case, we should trigger a load_balance to get a chance to pull a
> > >waiting normal task from another CPU.
> > >
> > >This means checking this state in pick_next_task_fair() and in balance_fair()
> >
> > We made another change; would you please give some comments?
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 04a3ce2..2357301 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7029,6 +7029,10 @@ struct task_struct *
> >         struct task_struct *p;
> >         int new_tasks;
> >
> > +       if (sched_idle_rq(rq) && prev && prev->state &&
> > +           prev->policy != SCHED_IDLE)
>
> This needs a comment to explain what it wants to achieve.
>
> Why do you need to test prev->state ?
Just to avoid triggering load_balance in the case where the prev normal
task is a long-running one, as you said. :)
Or would testing (prev->state & TASK_NORMAL) be better?

Thx.
Jiang,
Regards

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-02-04 17:05 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-23  8:09 [PATCH] sched: pull tasks when CPU is about to run SCHED_IDLE tasks ultrachin
2020-12-23 11:30 ` Vincent Guittot
2021-01-11  8:26   ` chin
2021-01-11 11:04     ` Vincent Guittot
2021-01-12  6:57       ` chin
2021-01-12  8:18         ` Vincent Guittot
2021-01-13  3:12           ` chin
2021-01-13  8:30             ` Vincent Guittot
2021-02-02  7:54               ` chin
2021-02-02 15:54                 ` Vincent Guittot
2021-02-03  2:53                   ` chin
2021-02-04  3:57                     ` Jiang Biao
2021-02-04  8:03                       ` Vincent Guittot
2021-02-04  8:01                 ` Vincent Guittot
2021-02-04  8:52                   ` chin
2021-02-04  9:02                     ` Vincent Guittot
2021-02-04  9:13                   ` Jiang Biao
2021-01-11  9:15   ` He Chen
2020-12-27 19:13 ` kernel test robot
2020-12-27 19:42 ` kernel test robot
