* [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks
@ 2016-09-17 1:28 Joonwoo Park
2016-09-17 23:28 ` Wanpeng Li
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Joonwoo Park @ 2016-09-17 1:28 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Srivatsa Vaddagiri, Ingo Molnar, linux-kernel, Joonwoo Park
From: Srivatsa Vaddagiri <vatsa@codeaurora.org>
SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
(just when they would have exceeded their ideal_runtime). It makes use
of a per-CPU hrtimer resource and hence arming that hrtimer should
be based on total SCHED_FAIR tasks a CPU has across its various cfs_rqs,
rather than being based on the number of tasks in a particular cfs_rq (as
implemented currently). As a result, with the current code, it's possible for
a running task (which is the sole task in its cfs_rq) to be preempted
much after its ideal_runtime has elapsed, resulting in increased latency
for tasks in other cfs_rqs on the same CPU.
Fix this by arming the sched hrtimer based on the total number of SCHED_FAIR
tasks a CPU has across its various cfs_rqs.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
---
joonwoop: Do we also need to update or remove the if-statement inside
hrtick_update()?
I guess not, because hrtick_update() doesn't want to start the hrtick when a
cfs_rq has such a large nr_running that the slice is longer than sched_latency.
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4088eed..c55c566 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4458,7 +4458,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
WARN_ON(task_rq(p) != rq);
- if (cfs_rq->nr_running > 1) {
+ if (rq->cfs.h_nr_running > 1) {
u64 slice = sched_slice(cfs_rq, se);
u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
s64 delta = slice - ran;
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks
2016-09-17 1:28 [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks Joonwoo Park
@ 2016-09-17 23:28 ` Wanpeng Li
2016-09-19 18:08 ` Joonwoo Park
2016-09-19 8:21 ` Peter Zijlstra
2016-09-22 14:01 ` [tip:sched/core] sched/fair: " tip-bot for Srivatsa Vaddagiri
2 siblings, 1 reply; 8+ messages in thread
From: Wanpeng Li @ 2016-09-17 23:28 UTC (permalink / raw)
To: Joonwoo Park
Cc: Peter Zijlstra, Srivatsa Vaddagiri, Ingo Molnar, linux-kernel
2016-09-17 9:28 GMT+08:00 Joonwoo Park <joonwoop@codeaurora.org>:
> From: Srivatsa Vaddagiri <vatsa@codeaurora.org>
>
> SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
> (just when they would have exceeded their ideal_runtime). It makes use
> of a per-cpu hrtimer resource and hence alarming that hrtimer should
> be based on total SCHED_FAIR tasks a cpu has across its various cfs_rqs,
> rather than being based on number of tasks in a particular cfs_rq (as
> implemented currently). As a result, with current code, its possible for
> a running task (which is the sole task in its cfs_rq) to be preempted
not be preempted much, right?
> much after its ideal_runtime has elapsed, resulting in increased latency
> for tasks in other cfs_rq on same cpu.
>
> Fix this by alarming sched hrtimer based on total number of SCHED_FAIR
> tasks a CPU has across its various cfs_rqs.
>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
> Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
> ---
>
> joonwoop: Do we also need to update or remove if-statement inside
> hrtick_update()?
> I guess not because hrtick_update() doesn't want to start hrtick when cfs_rq
> has large number of nr_running where slice is longer than sched_latency.
>
> kernel/sched/fair.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4088eed..c55c566 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4458,7 +4458,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
>
> WARN_ON(task_rq(p) != rq);
>
> - if (cfs_rq->nr_running > 1) {
> + if (rq->cfs.h_nr_running > 1) {
> u64 slice = sched_slice(cfs_rq, se);
> u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
> s64 delta = slice - ran;
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> hosted by The Linux Foundation
>
* Re: [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks
2016-09-17 1:28 [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks Joonwoo Park
2016-09-17 23:28 ` Wanpeng Li
@ 2016-09-19 8:21 ` Peter Zijlstra
2016-09-19 18:04 ` Joonwoo Park
2016-09-22 14:01 ` [tip:sched/core] sched/fair: " tip-bot for Srivatsa Vaddagiri
2 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2016-09-19 8:21 UTC (permalink / raw)
To: Joonwoo Park; +Cc: Srivatsa Vaddagiri, Ingo Molnar, linux-kernel
On Fri, Sep 16, 2016 at 06:28:51PM -0700, Joonwoo Park wrote:
> From: Srivatsa Vaddagiri <vatsa@codeaurora.org>
>
> SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
Right, but I always found the overhead of the thing too high to be
really useful.
How come you're using this?
> joonwoop: Do we also need to update or remove if-statement inside
> hrtick_update()?
> I guess not because hrtick_update() doesn't want to start hrtick when cfs_rq
> has large number of nr_running where slice is longer than sched_latency.
Right, you want that to match with whatever sched_slice() does.
> +++ b/kernel/sched/fair.c
> @@ -4458,7 +4458,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
>
> WARN_ON(task_rq(p) != rq);
>
> - if (cfs_rq->nr_running > 1) {
> + if (rq->cfs.h_nr_running > 1) {
> u64 slice = sched_slice(cfs_rq, se);
> u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
> s64 delta = slice - ran;
Yeah, that looks right. I don't think I've ever tried hrtick with
cgroups enabled...
* Re: [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks
2016-09-19 8:21 ` Peter Zijlstra
@ 2016-09-19 18:04 ` Joonwoo Park
2016-09-19 18:17 ` Joonwoo Park
0 siblings, 1 reply; 8+ messages in thread
From: Joonwoo Park @ 2016-09-19 18:04 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Srivatsa Vaddagiri, Ingo Molnar, linux-kernel
On Mon, Sep 19, 2016 at 10:21:58AM +0200, Peter Zijlstra wrote:
> On Fri, Sep 16, 2016 at 06:28:51PM -0700, Joonwoo Park wrote:
> > From: Srivatsa Vaddagiri <vatsa@codeaurora.org>
> >
> > SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
>
> Right, but I always found the overhead of the thing too high to be
> really useful.
>
> How come you're using this?
This patch has been in our internal tree for ages, so I unfortunately cannot
find the actual usecase or history.
But I guess it was about excessive latency when there are a number of CPU-bound
tasks running on a CPU but on different cfs_rqs, with CONFIG_HZ = 100.
Here's how I recreated it:
* run 4 cpu hogs on the same cgroup [1] :
dd-960 [000] d..3 110.651060: sched_switch: prev_comm=dd prev_pid=960 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=959 next_prio=120
dd-959 [000] d..3 110.652566: sched_switch: prev_comm=dd prev_pid=959 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=961 next_prio=120
dd-961 [000] d..3 110.654072: sched_switch: prev_comm=dd prev_pid=961 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=962 next_prio=120
dd-962 [000] d..3 110.655578: sched_switch: prev_comm=dd prev_pid=962 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=960 next_prio=120
Tasks are preempted every 1.5ms slice by the hrtick.
* run 4 CPU hogs on 4 different cgroups [2] :
dd-964 [000] d..3 24.169873: sched_switch: prev_comm=dd prev_pid=964 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=966 next_prio=120
dd-966 [000] d..3 24.179873: sched_switch: prev_comm=dd prev_pid=966 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=965 next_prio=120
dd-965 [000] d..3 24.189873: sched_switch: prev_comm=dd prev_pid=965 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=967 next_prio=120
dd-967 [000] d..3 24.199873: sched_switch: prev_comm=dd prev_pid=967 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=964 next_prio=120
Tasks are preempted only every 10ms by the scheduler tick, so every task suffers up to 40ms of preemption latency.
[1] :
dd if=/dev/zero of=/dev/zero &
dd if=/dev/zero of=/dev/zero &
dd if=/dev/zero of=/dev/zero &
dd if=/dev/zero of=/dev/zero &
[2] :
mount -t cgroup -o cpu cpu /sys/fs/cgroup
mkdir /sys/fs/cgroup/grp1
mkdir /sys/fs/cgroup/grp2
mkdir /sys/fs/cgroup/grp3
mkdir /sys/fs/cgroup/grp4
dd if=/dev/zero of=/dev/zero &
echo $! > /sys/fs/cgroup/grp1/tasks
dd if=/dev/zero of=/dev/zero &
echo $! > /sys/fs/cgroup/grp2/tasks
dd if=/dev/zero of=/dev/zero &
echo $! > /sys/fs/cgroup/grp3/tasks
dd if=/dev/zero of=/dev/zero &
echo $! > /sys/fs/cgroup/grp4/tasks
I could confirm this patch makes the latter behave the same as the former in terms of preemption latency.
>
>
> > joonwoop: Do we also need to update or remove if-statement inside
> > hrtick_update()?
>
> > I guess not because hrtick_update() doesn't want to start hrtick when cfs_rq
> > has large number of nr_running where slice is longer than sched_latency.
>
> Right, you want that to match with whatever sched_slice() does.
Cool. Thank you!
Thanks,
Joonwoo
>
> > +++ b/kernel/sched/fair.c
> > @@ -4458,7 +4458,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
> >
> > WARN_ON(task_rq(p) != rq);
> >
> > - if (cfs_rq->nr_running > 1) {
> > + if (rq->cfs.h_nr_running > 1) {
> > u64 slice = sched_slice(cfs_rq, se);
> > u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
> > s64 delta = slice - ran;
>
> Yeah, that looks right. I don't think I've ever tried hrtick with
> cgroups enabled...
* Re: [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks
2016-09-17 23:28 ` Wanpeng Li
@ 2016-09-19 18:08 ` Joonwoo Park
2016-09-19 23:30 ` Wanpeng Li
0 siblings, 1 reply; 8+ messages in thread
From: Joonwoo Park @ 2016-09-19 18:08 UTC (permalink / raw)
To: Wanpeng Li; +Cc: Peter Zijlstra, Srivatsa Vaddagiri, Ingo Molnar, linux-kernel
On Sun, Sep 18, 2016 at 07:28:32AM +0800, Wanpeng Li wrote:
> 2016-09-17 9:28 GMT+08:00 Joonwoo Park <joonwoop@codeaurora.org>:
> > From: Srivatsa Vaddagiri <vatsa@codeaurora.org>
> >
> > SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
> > (just when they would have exceeded their ideal_runtime). It makes use
> > of a per-cpu hrtimer resource and hence alarming that hrtimer should
> > be based on total SCHED_FAIR tasks a cpu has across its various cfs_rqs,
> > rather than being based on number of tasks in a particular cfs_rq (as
> > implemented currently). As a result, with current code, its possible for
> > a running task (which is the sole task in its cfs_rq) to be preempted
>
> not be preempted much, right?
I don't think so....
By saying 'to be preempted much after its ideal_runtime has elapsed' I
wanted to describe the current suboptimal behaviour.
Thanks,
Joonwoo
>
> > much after its ideal_runtime has elapsed, resulting in increased latency
> > for tasks in other cfs_rq on same cpu.
> >
> > Fix this by alarming sched hrtimer based on total number of SCHED_FAIR
> > tasks a CPU has across its various cfs_rqs.
> >
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
> > Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
> > ---
> >
> > joonwoop: Do we also need to update or remove if-statement inside
> > hrtick_update()?
> > I guess not because hrtick_update() doesn't want to start hrtick when cfs_rq
> > has large number of nr_running where slice is longer than sched_latency.
> >
> > kernel/sched/fair.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 4088eed..c55c566 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -4458,7 +4458,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
> >
> > WARN_ON(task_rq(p) != rq);
> >
> > - if (cfs_rq->nr_running > 1) {
> > + if (rq->cfs.h_nr_running > 1) {
> > u64 slice = sched_slice(cfs_rq, se);
> > u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
> > s64 delta = slice - ran;
> > --
> > The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> > hosted by The Linux Foundation
> >
* Re: [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks
2016-09-19 18:04 ` Joonwoo Park
@ 2016-09-19 18:17 ` Joonwoo Park
0 siblings, 0 replies; 8+ messages in thread
From: Joonwoo Park @ 2016-09-19 18:17 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Srivatsa Vaddagiri, Ingo Molnar, linux-kernel
On Mon, Sep 19, 2016 at 11:04:49AM -0700, Joonwoo Park wrote:
> On Mon, Sep 19, 2016 at 10:21:58AM +0200, Peter Zijlstra wrote:
> > On Fri, Sep 16, 2016 at 06:28:51PM -0700, Joonwoo Park wrote:
> > > From: Srivatsa Vaddagiri <vatsa@codeaurora.org>
> > >
> > > SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
> >
> > Right, but I always found the overhead of the thing too high to be
> > really useful.
> >
> > How come you're using this?
>
> This patch was in our internal tree for decades so I unfortunately cannot
> find actual usecase or history.
> But I guess it was about excessive latency when there are number of CPU
> bound tasks running on a CPU but on different cfs_rqs and CONFIG_HZ = 100.
>
> See how I recreated :
>
> * run 4 cpu hogs on the same cgroup [1] :
> dd-960 [000] d..3 110.651060: sched_switch: prev_comm=dd prev_pid=960 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=959 next_prio=120
> dd-959 [000] d..3 110.652566: sched_switch: prev_comm=dd prev_pid=959 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=961 next_prio=120
> dd-961 [000] d..3 110.654072: sched_switch: prev_comm=dd prev_pid=961 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=962 next_prio=120
> dd-962 [000] d..3 110.655578: sched_switch: prev_comm=dd prev_pid=962 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=960 next_prio=120
> preempt every 1.5ms slice by hrtick.
>
> * run 4 CPU hogs on 4 different cgroups [2] :
> dd-964 [000] d..3 24.169873: sched_switch: prev_comm=dd prev_pid=964 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=966 next_prio=120
> dd-966 [000] d..3 24.179873: sched_switch: prev_comm=dd prev_pid=966 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=965 next_prio=120
> dd-965 [000] d..3 24.189873: sched_switch: prev_comm=dd prev_pid=965 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=967 next_prio=120
> dd-967 [000] d..3 24.199873: sched_switch: prev_comm=dd prev_pid=967 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=964 next_prio=120
> preempt every 10ms by scheduler tick so that all tasks suffers from 40ms preemption latency.
>
> [1] :
> dd if=/dev/zero of=/dev/zero &
Ugh.. of=/dev/null instead.
> dd if=/dev/zero of=/dev/zero &
> dd if=/dev/zero of=/dev/zero &
> dd if=/dev/zero of=/dev/zero &
>
> [2] :
> mount -t cgroup -o cpu cpu /sys/fs/cgroup
> mkdir /sys/fs/cgroup/grp1
> mkdir /sys/fs/cgroup/grp2
> mkdir /sys/fs/cgroup/grp3
> mkdir /sys/fs/cgroup/grp4
> dd if=/dev/zero of=/dev/zero &
> echo $! > /sys/fs/cgroup/grp1/tasks
> dd if=/dev/zero of=/dev/zero &
> echo $! > /sys/fs/cgroup/grp2/tasks
> dd if=/dev/zero of=/dev/zero &
> echo $! > /sys/fs/cgroup/grp3/tasks
> dd if=/dev/zero of=/dev/zero &
> echo $! > /sys/fs/cgroup/grp4/tasks
>
> I could confirm this patch makes the latter behaves as same as the former in terms of preemption latency.
>
> >
> >
> > > joonwoop: Do we also need to update or remove if-statement inside
> > > hrtick_update()?
> >
> > > I guess not because hrtick_update() doesn't want to start hrtick when cfs_rq
> > > has large number of nr_running where slice is longer than sched_latency.
> >
> > Right, you want that to match with whatever sched_slice() does.
>
> Cool. Thank you!
>
> Thanks,
> Joonwoo
>
> >
> > > +++ b/kernel/sched/fair.c
> > > @@ -4458,7 +4458,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
> > >
> > > WARN_ON(task_rq(p) != rq);
> > >
> > > - if (cfs_rq->nr_running > 1) {
> > > + if (rq->cfs.h_nr_running > 1) {
> > > u64 slice = sched_slice(cfs_rq, se);
> > > u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
> > > s64 delta = slice - ran;
> >
> > Yeah, that looks right. I don't think I've ever tried hrtick with
> > cgroups enabled...
* Re: [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks
2016-09-19 18:08 ` Joonwoo Park
@ 2016-09-19 23:30 ` Wanpeng Li
0 siblings, 0 replies; 8+ messages in thread
From: Wanpeng Li @ 2016-09-19 23:30 UTC (permalink / raw)
To: Joonwoo Park
Cc: Peter Zijlstra, Srivatsa Vaddagiri, Ingo Molnar, linux-kernel
2016-09-20 2:08 GMT+08:00 Joonwoo Park <joonwoop@codeaurora.org>:
> On Sun, Sep 18, 2016 at 07:28:32AM +0800, Wanpeng Li wrote:
>> 2016-09-17 9:28 GMT+08:00 Joonwoo Park <joonwoop@codeaurora.org>:
>> > From: Srivatsa Vaddagiri <vatsa@codeaurora.org>
>> >
>> > SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
>> > (just when they would have exceeded their ideal_runtime). It makes use
>> > of a per-cpu hrtimer resource and hence alarming that hrtimer should
>> > be based on total SCHED_FAIR tasks a cpu has across its various cfs_rqs,
>> > rather than being based on number of tasks in a particular cfs_rq (as
>> > implemented currently). As a result, with current code, its possible for
>> > a running task (which is the sole task in its cfs_rq) to be preempted
>>
>> not be preempted much, right?
>
> I don't think so....
> By saying 'to be preempted much after its ideal_runtime has elapsed' I
> wanted to describe the current suboptimal behaviour.
I was also describing the current suboptimal behaviour. A running task
(which is the sole task in its cfs_rq), as in test case 2 you mentioned
in another reply, is preempted only after 10ms by the scheduler tick,
whereas in test case 1 it is preempted after 1.5ms when the hrtick
fires. That's what I meant by "not be preempted much" for test case 2.
Regards,
Wanpeng Li
* [tip:sched/core] sched/fair: Fix SCHED_HRTICK bug leading to late preemption of tasks
2016-09-17 1:28 [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks Joonwoo Park
2016-09-17 23:28 ` Wanpeng Li
2016-09-19 8:21 ` Peter Zijlstra
@ 2016-09-22 14:01 ` tip-bot for Srivatsa Vaddagiri
2 siblings, 0 replies; 8+ messages in thread
From: tip-bot for Srivatsa Vaddagiri @ 2016-09-22 14:01 UTC (permalink / raw)
To: linux-tip-commits
Cc: torvalds, hpa, linux-kernel, tglx, mingo, peterz, joonwoop, vatsa
Commit-ID: 8bf46a39be910937d4c9e8d999a7438a7ae1a75b
Gitweb: http://git.kernel.org/tip/8bf46a39be910937d4c9e8d999a7438a7ae1a75b
Author: Srivatsa Vaddagiri <vatsa@codeaurora.org>
AuthorDate: Fri, 16 Sep 2016 18:28:51 -0700
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 22 Sep 2016 15:20:18 +0200
sched/fair: Fix SCHED_HRTICK bug leading to late preemption of tasks
SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
(just when they would have exceeded their ideal_runtime).
It makes use of a per-CPU hrtimer resource and hence arming that
hrtimer should be based on total SCHED_FAIR tasks a CPU has across its
various cfs_rqs, rather than being based on number of tasks in a
particular cfs_rq (as implemented currently).
As a result, with the current code, it's possible for a running task
(which is the sole task in its cfs_rq) to be preempted much after its
ideal_runtime has elapsed, resulting in increased latency for tasks in
other cfs_rqs on the same CPU.
Fix this by arming sched hrtimer based on total number of SCHED_FAIR
tasks a CPU has across its various cfs_rqs.
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1474075731-11550-1-git-send-email-joonwoop@codeaurora.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 986c10c..8fb4d19 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4469,7 +4469,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
WARN_ON(task_rq(p) != rq);
- if (cfs_rq->nr_running > 1) {
+ if (rq->cfs.h_nr_running > 1) {
u64 slice = sched_slice(cfs_rq, se);
u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
s64 delta = slice - ran;