linux-kernel.vger.kernel.org archive mirror
* [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks
@ 2016-09-17  1:28 Joonwoo Park
  2016-09-17 23:28 ` Wanpeng Li
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Joonwoo Park @ 2016-09-17  1:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Srivatsa Vaddagiri, Ingo Molnar, linux-kernel, Joonwoo Park

From: Srivatsa Vaddagiri <vatsa@codeaurora.org>

SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
(just when they would have exceeded their ideal_runtime). It makes use
of a per-CPU hrtimer resource and hence arming that hrtimer should be
based on the total number of SCHED_FAIR tasks a CPU has across its
various cfs_rqs, rather than on the number of tasks in a particular
cfs_rq (as implemented currently). As a result, with the current code,
it's possible for a running task (which is the sole task in its cfs_rq)
to be preempted much after its ideal_runtime has elapsed, resulting in
increased latency for tasks in other cfs_rqs on the same CPU.

Fix this by arming the sched hrtimer based on the total number of
SCHED_FAIR tasks a CPU has across its various cfs_rqs.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
---

 joonwoop: Do we also need to update or remove the if-statement inside
 hrtick_update()?
 I guess not, because hrtick_update() doesn't want to start the hrtick when a
 cfs_rq has so many running tasks that the slice would be longer than
 sched_latency.

 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4088eed..c55c566 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4458,7 +4458,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
 
 	WARN_ON(task_rq(p) != rq);
 
-	if (cfs_rq->nr_running > 1) {
+	if (rq->cfs.h_nr_running > 1) {
 		u64 slice = sched_slice(cfs_rq, se);
 		u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
 		s64 delta = slice - ran;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks
  2016-09-17  1:28 [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks Joonwoo Park
@ 2016-09-17 23:28 ` Wanpeng Li
  2016-09-19 18:08   ` Joonwoo Park
  2016-09-19  8:21 ` Peter Zijlstra
  2016-09-22 14:01 ` [tip:sched/core] sched/fair: " tip-bot for Srivatsa Vaddagiri
  2 siblings, 1 reply; 8+ messages in thread
From: Wanpeng Li @ 2016-09-17 23:28 UTC (permalink / raw)
  To: Joonwoo Park
  Cc: Peter Zijlstra, Srivatsa Vaddagiri, Ingo Molnar, linux-kernel

2016-09-17 9:28 GMT+08:00 Joonwoo Park <joonwoop@codeaurora.org>:
> From: Srivatsa Vaddagiri <vatsa@codeaurora.org>
>
> SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
> (just when they would have exceeded their ideal_runtime). It makes use
> of a per-cpu hrtimer resource and hence alarming that hrtimer should
> be based on total SCHED_FAIR tasks a cpu has across its various cfs_rqs,
> rather than being based on number of tasks in a particular cfs_rq (as
> implemented currently). As a result, with current code, its possible for
> a running task (which is the sole task in its cfs_rq) to be preempted

not be preempted much, right?

> much after its ideal_runtime has elapsed, resulting in increased latency
> for tasks in other cfs_rq on same cpu.
>
> Fix this by alarming sched hrtimer based on total number of SCHED_FAIR
> tasks a CPU has across its various cfs_rqs.
>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
> Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
> ---
>
>  joonwoop: Do we also need to update or remove if-statement inside
>  hrtick_update()?
>  I guess not because hrtick_update() doesn't want to start hrtick when cfs_rq
>  has large number of nr_running where slice is longer than sched_latency.
>
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4088eed..c55c566 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4458,7 +4458,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
>
>         WARN_ON(task_rq(p) != rq);
>
> -       if (cfs_rq->nr_running > 1) {
> +       if (rq->cfs.h_nr_running > 1) {
>                 u64 slice = sched_slice(cfs_rq, se);
>                 u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
>                 s64 delta = slice - ran;
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> hosted by The Linux Foundation
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks
  2016-09-17  1:28 [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks Joonwoo Park
  2016-09-17 23:28 ` Wanpeng Li
@ 2016-09-19  8:21 ` Peter Zijlstra
  2016-09-19 18:04   ` Joonwoo Park
  2016-09-22 14:01 ` [tip:sched/core] sched/fair: " tip-bot for Srivatsa Vaddagiri
  2 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2016-09-19  8:21 UTC (permalink / raw)
  To: Joonwoo Park; +Cc: Srivatsa Vaddagiri, Ingo Molnar, linux-kernel

On Fri, Sep 16, 2016 at 06:28:51PM -0700, Joonwoo Park wrote:
> From: Srivatsa Vaddagiri <vatsa@codeaurora.org>
> 
> SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot

Right, but I always found the overhead of the thing too high to be
really useful.

How come you're using this?


>  joonwoop: Do we also need to update or remove if-statement inside
>  hrtick_update()?

>  I guess not because hrtick_update() doesn't want to start hrtick when cfs_rq
>  has large number of nr_running where slice is longer than sched_latency.

Right, you want that to match with whatever sched_slice() does.

> +++ b/kernel/sched/fair.c
> @@ -4458,7 +4458,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
>  
>  	WARN_ON(task_rq(p) != rq);
>  
> -	if (cfs_rq->nr_running > 1) {
> +	if (rq->cfs.h_nr_running > 1) {
>  		u64 slice = sched_slice(cfs_rq, se);
>  		u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
>  		s64 delta = slice - ran;

Yeah, that looks right. I don't think I've ever tried hrtick with
cgroups enabled...

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks
  2016-09-19  8:21 ` Peter Zijlstra
@ 2016-09-19 18:04   ` Joonwoo Park
  2016-09-19 18:17     ` Joonwoo Park
  0 siblings, 1 reply; 8+ messages in thread
From: Joonwoo Park @ 2016-09-19 18:04 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Srivatsa Vaddagiri, Ingo Molnar, linux-kernel

On Mon, Sep 19, 2016 at 10:21:58AM +0200, Peter Zijlstra wrote:
> On Fri, Sep 16, 2016 at 06:28:51PM -0700, Joonwoo Park wrote:
> > From: Srivatsa Vaddagiri <vatsa@codeaurora.org>
> > 
> > SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
> 
> Right, but I always found the overhead of the thing too high to be
> really useful.
> 
> How come you're using this?

This patch has been in our internal tree for ages, so I unfortunately
cannot find the actual use case or history.
But I guess it was about excessive latency when a number of CPU-bound
tasks are running on one CPU but on different cfs_rqs, with CONFIG_HZ=100.

Here is how I recreated it:

* run 4 cpu hogs on the same cgroup [1] :
 dd-960   [000] d..3   110.651060: sched_switch: prev_comm=dd prev_pid=960 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=959 next_prio=120
 dd-959   [000] d..3   110.652566: sched_switch: prev_comm=dd prev_pid=959 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=961 next_prio=120
 dd-961   [000] d..3   110.654072: sched_switch: prev_comm=dd prev_pid=961 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=962 next_prio=120
 dd-962   [000] d..3   110.655578: sched_switch: prev_comm=dd prev_pid=962 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=960 next_prio=120
  preempted every 1.5ms slice by the hrtick.

* run 4 CPU hogs on 4 different cgroups [2] :
 dd-964   [000] d..3    24.169873: sched_switch: prev_comm=dd prev_pid=964 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=966 next_prio=120
 dd-966   [000] d..3    24.179873: sched_switch: prev_comm=dd prev_pid=966 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=965 next_prio=120
 dd-965   [000] d..3    24.189873: sched_switch: prev_comm=dd prev_pid=965 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=967 next_prio=120
 dd-967   [000] d..3    24.199873: sched_switch: prev_comm=dd prev_pid=967 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=964 next_prio=120
  preempted every 10ms by the scheduler tick, so each task suffers up to 40ms of preemption latency.

[1] : 
 dd if=/dev/zero of=/dev/zero &
 dd if=/dev/zero of=/dev/zero &
 dd if=/dev/zero of=/dev/zero &
 dd if=/dev/zero of=/dev/zero &

[2] :
 mount -t cgroup -o cpu cpu /sys/fs/cgroup
 mkdir /sys/fs/cgroup/grp1
 mkdir /sys/fs/cgroup/grp2
 mkdir /sys/fs/cgroup/grp3
 mkdir /sys/fs/cgroup/grp4
 dd if=/dev/zero of=/dev/zero &
 echo $! > /sys/fs/cgroup/grp1/tasks 
 dd if=/dev/zero of=/dev/zero &
 echo $! > /sys/fs/cgroup/grp2/tasks 
 dd if=/dev/zero of=/dev/zero &
 echo $! > /sys/fs/cgroup/grp3/tasks 
 dd if=/dev/zero of=/dev/zero &
 echo $! > /sys/fs/cgroup/grp4/tasks 

I could confirm this patch makes the latter behave the same as the former in terms of preemption latency.

> 
> 
> >  joonwoop: Do we also need to update or remove if-statement inside
> >  hrtick_update()?
> 
> >  I guess not because hrtick_update() doesn't want to start hrtick when cfs_rq
> >  has large number of nr_running where slice is longer than sched_latency.
> 
> Right, you want that to match with whatever sched_slice() does.

Cool.  Thank you!

Thanks,
Joonwoo

> 
> > +++ b/kernel/sched/fair.c
> > @@ -4458,7 +4458,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
> >  
> >  	WARN_ON(task_rq(p) != rq);
> >  
> > -	if (cfs_rq->nr_running > 1) {
> > +	if (rq->cfs.h_nr_running > 1) {
> >  		u64 slice = sched_slice(cfs_rq, se);
> >  		u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
> >  		s64 delta = slice - ran;
> 
> Yeah, that looks right. I don't think I've ever tried hrtick with
> cgroups enabled...

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks
  2016-09-17 23:28 ` Wanpeng Li
@ 2016-09-19 18:08   ` Joonwoo Park
  2016-09-19 23:30     ` Wanpeng Li
  0 siblings, 1 reply; 8+ messages in thread
From: Joonwoo Park @ 2016-09-19 18:08 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Peter Zijlstra, Srivatsa Vaddagiri, Ingo Molnar, linux-kernel

On Sun, Sep 18, 2016 at 07:28:32AM +0800, Wanpeng Li wrote:
> 2016-09-17 9:28 GMT+08:00 Joonwoo Park <joonwoop@codeaurora.org>:
> > From: Srivatsa Vaddagiri <vatsa@codeaurora.org>
> >
> > SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
> > (just when they would have exceeded their ideal_runtime). It makes use
> > of a per-cpu hrtimer resource and hence alarming that hrtimer should
> > be based on total SCHED_FAIR tasks a cpu has across its various cfs_rqs,
> > rather than being based on number of tasks in a particular cfs_rq (as
> > implemented currently). As a result, with current code, its possible for
> > a running task (which is the sole task in its cfs_rq) to be preempted
> 
> not be preempted much, right?

I don't think so....
By saying 'to be preempted much after its ideal_runtime has elapsed' I
wanted to describe the current suboptimal behaviour.

Thanks,
Joonwoo

> 
> > much after its ideal_runtime has elapsed, resulting in increased latency
> > for tasks in other cfs_rq on same cpu.
> >
> > Fix this by alarming sched hrtimer based on total number of SCHED_FAIR
> > tasks a CPU has across its various cfs_rqs.
> >
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
> > Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
> > ---
> >
> >  joonwoop: Do we also need to update or remove if-statement inside
> >  hrtick_update()?
> >  I guess not because hrtick_update() doesn't want to start hrtick when cfs_rq
> >  has large number of nr_running where slice is longer than sched_latency.
> >
> >  kernel/sched/fair.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 4088eed..c55c566 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -4458,7 +4458,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
> >
> >         WARN_ON(task_rq(p) != rq);
> >
> > -       if (cfs_rq->nr_running > 1) {
> > +       if (rq->cfs.h_nr_running > 1) {
> >                 u64 slice = sched_slice(cfs_rq, se);
> >                 u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
> >                 s64 delta = slice - ran;
> > --
> > The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> > hosted by The Linux Foundation
> >

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks
  2016-09-19 18:04   ` Joonwoo Park
@ 2016-09-19 18:17     ` Joonwoo Park
  0 siblings, 0 replies; 8+ messages in thread
From: Joonwoo Park @ 2016-09-19 18:17 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Srivatsa Vaddagiri, Ingo Molnar, linux-kernel

On Mon, Sep 19, 2016 at 11:04:49AM -0700, Joonwoo Park wrote:
> On Mon, Sep 19, 2016 at 10:21:58AM +0200, Peter Zijlstra wrote:
> > On Fri, Sep 16, 2016 at 06:28:51PM -0700, Joonwoo Park wrote:
> > > From: Srivatsa Vaddagiri <vatsa@codeaurora.org>
> > > 
> > > SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
> > 
> > Right, but I always found the overhead of the thing too high to be
> > really useful.
> > 
> > How come you're using this?
> 
> This patch was in our internal tree for decades so I unfortunately cannot
> find actual usecase or history.
> But I guess it was about excessive latency when there are number of CPU
> bound tasks running on a CPU but on different cfs_rqs and CONFIG_HZ = 100.
> 
> See how I recreated :
> 
> * run 4 cpu hogs on the same cgroup [1] :
>  dd-960   [000] d..3   110.651060: sched_switch: prev_comm=dd prev_pid=960 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=959 next_prio=120
>  dd-959   [000] d..3   110.652566: sched_switch: prev_comm=dd prev_pid=959 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=961 next_prio=120
>  dd-961   [000] d..3   110.654072: sched_switch: prev_comm=dd prev_pid=961 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=962 next_prio=120
>  dd-962   [000] d..3   110.655578: sched_switch: prev_comm=dd prev_pid=962 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=960 next_prio=120
>   preempt every 1.5ms slice by hrtick.
> 
> * run 4 CPU hogs on 4 different cgroups [2] :
>  dd-964   [000] d..3    24.169873: sched_switch: prev_comm=dd prev_pid=964 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=966 next_prio=120
>  dd-966   [000] d..3    24.179873: sched_switch: prev_comm=dd prev_pid=966 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=965 next_prio=120
>  dd-965   [000] d..3    24.189873: sched_switch: prev_comm=dd prev_pid=965 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=967 next_prio=120
>  dd-967   [000] d..3    24.199873: sched_switch: prev_comm=dd prev_pid=967 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=964 next_prio=120
>   preempt every 10ms by scheduler tick so that all tasks suffers from 40ms preemption latency.
> 
> [1] : 
>  dd if=/dev/zero of=/dev/zero &
Ugh..  of=/dev/null instead.

>  dd if=/dev/zero of=/dev/zero &
>  dd if=/dev/zero of=/dev/zero &
>  dd if=/dev/zero of=/dev/zero &
> 
> [2] :
>  mount -t cgroup -o cpu cpu /sys/fs/cgroup
>  mkdir /sys/fs/cgroup/grp1
>  mkdir /sys/fs/cgroup/grp2
>  mkdir /sys/fs/cgroup/grp3
>  mkdir /sys/fs/cgroup/grp4
>  dd if=/dev/zero of=/dev/zero &
>  echo $! > /sys/fs/cgroup/grp1/tasks 
>  dd if=/dev/zero of=/dev/zero &
>  echo $! > /sys/fs/cgroup/grp2/tasks 
>  dd if=/dev/zero of=/dev/zero &
>  echo $! > /sys/fs/cgroup/grp3/tasks 
>  dd if=/dev/zero of=/dev/zero &
>  echo $! > /sys/fs/cgroup/grp4/tasks 
> 
> I could confirm this patch makes the latter behaves as same as the former in terms of preemption latency.
> 
> > 
> > 
> > >  joonwoop: Do we also need to update or remove if-statement inside
> > >  hrtick_update()?
> > 
> > >  I guess not because hrtick_update() doesn't want to start hrtick when cfs_rq
> > >  has large number of nr_running where slice is longer than sched_latency.
> > 
> > Right, you want that to match with whatever sched_slice() does.
> 
> Cool.  Thank you!
> 
> Thanks,
> Joonwoo
> 
> > 
> > > +++ b/kernel/sched/fair.c
> > > @@ -4458,7 +4458,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
> > >  
> > >  	WARN_ON(task_rq(p) != rq);
> > >  
> > > -	if (cfs_rq->nr_running > 1) {
> > > +	if (rq->cfs.h_nr_running > 1) {
> > >  		u64 slice = sched_slice(cfs_rq, se);
> > >  		u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
> > >  		s64 delta = slice - ran;
> > 
> > Yeah, that looks right. I don't think I've ever tried hrtick with
> > cgroups enabled...

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks
  2016-09-19 18:08   ` Joonwoo Park
@ 2016-09-19 23:30     ` Wanpeng Li
  0 siblings, 0 replies; 8+ messages in thread
From: Wanpeng Li @ 2016-09-19 23:30 UTC (permalink / raw)
  To: Joonwoo Park
  Cc: Peter Zijlstra, Srivatsa Vaddagiri, Ingo Molnar, linux-kernel

2016-09-20 2:08 GMT+08:00 Joonwoo Park <joonwoop@codeaurora.org>:
> On Sun, Sep 18, 2016 at 07:28:32AM +0800, Wanpeng Li wrote:
>> 2016-09-17 9:28 GMT+08:00 Joonwoo Park <joonwoop@codeaurora.org>:
>> > From: Srivatsa Vaddagiri <vatsa@codeaurora.org>
>> >
>> > SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
>> > (just when they would have exceeded their ideal_runtime). It makes use
>> > of a per-cpu hrtimer resource and hence alarming that hrtimer should
>> > be based on total SCHED_FAIR tasks a cpu has across its various cfs_rqs,
>> > rather than being based on number of tasks in a particular cfs_rq (as
>> > implemented currently). As a result, with current code, its possible for
>> > a running task (which is the sole task in its cfs_rq) to be preempted
>>
>> not be preempted much, right?
>
> I don't think so....
> By saying 'to be preempted much after its ideal_runtime has elapsed' I
> wanted to describe the current suboptimal behaviour.

I was also describing the current suboptimal behaviour. A running task
(which is the sole task in its cfs_rq), as in test case 2 that you
mentioned in the other reply, is preempted only after the 10ms scheduler
tick, whereas in test case 1 it is preempted after 1.5ms when the hrtick
fires. That's what I meant by "not be preempted much" for test case 2.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [tip:sched/core] sched/fair: Fix SCHED_HRTICK bug leading to late preemption of tasks
  2016-09-17  1:28 [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks Joonwoo Park
  2016-09-17 23:28 ` Wanpeng Li
  2016-09-19  8:21 ` Peter Zijlstra
@ 2016-09-22 14:01 ` tip-bot for Srivatsa Vaddagiri
  2 siblings, 0 replies; 8+ messages in thread
From: tip-bot for Srivatsa Vaddagiri @ 2016-09-22 14:01 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, hpa, linux-kernel, tglx, mingo, peterz, joonwoop, vatsa

Commit-ID:  8bf46a39be910937d4c9e8d999a7438a7ae1a75b
Gitweb:     http://git.kernel.org/tip/8bf46a39be910937d4c9e8d999a7438a7ae1a75b
Author:     Srivatsa Vaddagiri <vatsa@codeaurora.org>
AuthorDate: Fri, 16 Sep 2016 18:28:51 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 22 Sep 2016 15:20:18 +0200

sched/fair: Fix SCHED_HRTICK bug leading to late preemption of tasks

SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
(just when they would have exceeded their ideal_runtime).

It makes use of a per-CPU hrtimer resource and hence arming that
hrtimer should be based on total SCHED_FAIR tasks a CPU has across its
various cfs_rqs, rather than being based on number of tasks in a
particular cfs_rq (as implemented currently).

As a result, with the current code, it's possible for a running task
(which is the sole task in its cfs_rq) to be preempted much after its
ideal_runtime has elapsed, resulting in increased latency for tasks in
other cfs_rqs on the same CPU.

Fix this by arming sched hrtimer based on total number of SCHED_FAIR
tasks a CPU has across its various cfs_rqs.

Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1474075731-11550-1-git-send-email-joonwoop@codeaurora.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 986c10c..8fb4d19 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4469,7 +4469,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
 
 	WARN_ON(task_rq(p) != rq);
 
-	if (cfs_rq->nr_running > 1) {
+	if (rq->cfs.h_nr_running > 1) {
 		u64 slice = sched_slice(cfs_rq, se);
 		u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
 		s64 delta = slice - ran;

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-09-22 14:02 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-17  1:28 [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks Joonwoo Park
2016-09-17 23:28 ` Wanpeng Li
2016-09-19 18:08   ` Joonwoo Park
2016-09-19 23:30     ` Wanpeng Li
2016-09-19  8:21 ` Peter Zijlstra
2016-09-19 18:04   ` Joonwoo Park
2016-09-19 18:17     ` Joonwoo Park
2016-09-22 14:01 ` [tip:sched/core] sched/fair: " tip-bot for Srivatsa Vaddagiri
