* [PATCH] sched/fair: prevent unlimited runtime on throttled group
@ 2020-01-14 14:13 Vincent Guittot
  2020-01-14 18:29 ` bsegall
  2020-01-29 11:32 ` [tip: sched/core] sched/fair: Prevent " tip-bot2 for Vincent Guittot
  0 siblings, 2 replies; 5+ messages in thread
From: Vincent Guittot @ 2020-01-14 14:13 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, linux-kernel
  Cc: Vincent Guittot

When a running task is moved to a throttled task group and there is no
other task enqueued on the CPU, the task can keep running at 100% CPU
regardless of the bandwidth allocated to the group and even though its
cfs_rq is throttled. Furthermore, the group entity of the cfs_rq and its
parents are not enqueued but only set as curr on their respective cfs_rqs.

We have the following sequence:

sched_move_task
  -dequeue_task: dequeue the task and its group entities.
  -put_prev_task: put the task and its group entities.
  -sched_change_group: move the task to the new group.
  -enqueue_task: enqueue only the task, but not the group entities, because
    the cfs_rq is throttled.
  -set_next_task: set the task and its group entities as the current
    sched_entity of their cfs_rqs.

Another impact is that the runnable_load_avg of the root cfs_rq stays zero
because the group entities are not enqueued. This situation persists until
an "external" event triggers a reschedule. Let's trigger one immediately
instead.
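
To make that scenario concrete, a minimal, untested reproducer sketch
follows. The cgroup path, the quota values and the CPU pinning are
assumptions; it targets the cgroup-v1 cpu controller, needs root, and
expects the "test" group to already exist.

/*
 * Pin everything to CPU 0 so the throttled cfs_rq and the task moved into
 * the group meet on the same runqueue.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define GRP "/sys/fs/cgroup/cpu/test/"

static void write_file(const char *path, long val)
{
        FILE *f = fopen(path, "w");

        if (!f || fprintf(f, "%ld\n", val) < 0) {
                perror(path);
                exit(1);
        }
        fclose(f);
}

int main(void)
{
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(0, &set);
        if (sched_setaffinity(0, sizeof(set), &set)) {
                perror("sched_setaffinity");
                exit(1);
        }

        /* 10ms of runtime per 100ms period: trivially exhausted by spinning. */
        write_file(GRP "cpu.cfs_period_us", 100000);
        write_file(GRP "cpu.cfs_quota_us", 10000);

        if (fork() == 0) {
                /* Child: join the group and burn its quota on CPU 0. */
                write_file(GRP "tasks", getpid());
                for (;;)
                        ;
        }

        /* Give the child time to get the group's cfs_rq on CPU 0 throttled. */
        sleep(1);

        /*
         * Move ourselves, while running, into the now throttled group and
         * spin: without the resched added by this patch, this task can keep
         * 100% of CPU 0 instead of being limited to the group's 10% quota,
         * until some other event forces a reschedule.
         */
        write_file(GRP "tasks", getpid());
        for (;;)
                ;
}

With the fix applied, the forced resched after the move lets the scheduler
notice that the hierarchy is throttled, and the second spin loop should
average roughly the group's 10% share of CPU 0.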

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/core.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e7b08d52db93..d0acc67336c0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7062,8 +7062,15 @@ void sched_move_task(struct task_struct *tsk)
 
 	if (queued)
 		enqueue_task(rq, tsk, queue_flags);
-	if (running)
+	if (running) {
 		set_next_task(rq, tsk);
+		/*
+		 * After changing group, the running task may have joined a
+		 * throttled one but it's still the running task. Trigger a
+		 * resched to make sure that task can still run.
+		 */
+		resched_curr(rq);
+	}
 
 	task_rq_unlock(rq, tsk, &rf);
 }
-- 
2.7.4



* Re: [PATCH] sched/fair: prevent unlimited runtime on throttled group
  2020-01-14 14:13 [PATCH] sched/fair: prevent unlimited runtime on throttled group Vincent Guittot
@ 2020-01-14 18:29 ` bsegall
  2020-01-20  9:46   ` Peter Zijlstra
  2020-01-29 11:32 ` [tip: sched/core] sched/fair: Prevent " tip-bot2 for Vincent Guittot
  1 sibling, 1 reply; 5+ messages in thread
From: bsegall @ 2020-01-14 18:29 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, linux-kernel

Vincent Guittot <vincent.guittot@linaro.org> writes:

> When a running task is moved to a throttled task group and there is no
> other task enqueued on the CPU, the task can keep running at 100% CPU
> regardless of the bandwidth allocated to the group and even though its
> cfs_rq is throttled. Furthermore, the group entity of the cfs_rq and its
> parents are not enqueued but only set as curr on their respective cfs_rqs.
>
> We have the following sequence:
>
> sched_move_task
>   -dequeue_task: dequeue the task and its group entities.
>   -put_prev_task: put the task and its group entities.
>   -sched_change_group: move the task to the new group.
>   -enqueue_task: enqueue only the task, but not the group entities, because
>     the cfs_rq is throttled.
>   -set_next_task: set the task and its group entities as the current
>     sched_entity of their cfs_rqs.
>
> Another impact is that the runnable_load_avg of the root cfs_rq stays zero
> because the group entities are not enqueued. This situation persists until
> an "external" event triggers a reschedule. Let's trigger one immediately
> instead.

Sounds reasonable to me; "moved group" being an explicit resched check
doesn't sound like a problem in general.

>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/sched/core.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index e7b08d52db93..d0acc67336c0 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7062,8 +7062,15 @@ void sched_move_task(struct task_struct *tsk)
>  
>  	if (queued)
>  		enqueue_task(rq, tsk, queue_flags);
> -	if (running)
> +	if (running) {
>  		set_next_task(rq, tsk);
> +		/*
> +		 * After changing group, the running task may have joined a
> +		 * throttled one but it's still the running task. Trigger a
> +		 * resched to make sure that task can still run.
> +		 */
> +		resched_curr(rq);
> +	}
>  
>  	task_rq_unlock(rq, tsk, &rf);
>  }


* Re: [PATCH] sched/fair: prevent unlimited runtime on throttled group
  2020-01-14 18:29 ` bsegall
@ 2020-01-20  9:46   ` Peter Zijlstra
  2020-01-21 18:26     ` bsegall
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2020-01-20  9:46 UTC (permalink / raw)
  To: bsegall
  Cc: Vincent Guittot, mingo, juri.lelli, dietmar.eggemann, rostedt,
	mgorman, linux-kernel

On Tue, Jan 14, 2020 at 10:29:43AM -0800, bsegall@google.com wrote:
> Vincent Guittot <vincent.guittot@linaro.org> writes:
> 
> > When a running task is moved to a throttled task group and there is no
> > other task enqueued on the CPU, the task can keep running at 100% CPU
> > regardless of the bandwidth allocated to the group and even though its
> > cfs_rq is throttled. Furthermore, the group entity of the cfs_rq and its
> > parents are not enqueued but only set as curr on their respective cfs_rqs.
> >
> > We have the following sequence:
> >
> > sched_move_task
> >   -dequeue_task: dequeue the task and its group entities.
> >   -put_prev_task: put the task and its group entities.
> >   -sched_change_group: move the task to the new group.
> >   -enqueue_task: enqueue only the task, but not the group entities, because
> >     the cfs_rq is throttled.
> >   -set_next_task: set the task and its group entities as the current
> >     sched_entity of their cfs_rqs.
> >
> > Another impact is that the runnable_load_avg of the root cfs_rq stays zero
> > because the group entities are not enqueued. This situation persists until
> > an "external" event triggers a reschedule. Let's trigger one immediately
> > instead.
> 
> Sounds reasonable to me; "moved group" being an explicit resched check
> doesn't sound like a problem in general.

Do I read that as an Ack from you, Ben? :-)


* Re: [PATCH] sched/fair: prevent unlimited runtime on throttled group
  2020-01-20  9:46   ` Peter Zijlstra
@ 2020-01-21 18:26     ` bsegall
  0 siblings, 0 replies; 5+ messages in thread
From: bsegall @ 2020-01-21 18:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: bsegall, Vincent Guittot, mingo, juri.lelli, dietmar.eggemann,
	rostedt, mgorman, linux-kernel

Peter Zijlstra <peterz@infradead.org> writes:

> On Tue, Jan 14, 2020 at 10:29:43AM -0800, bsegall@google.com wrote:
>> Vincent Guittot <vincent.guittot@linaro.org> writes:
>> 
>> > When a running task is moved to a throttled task group and there is no
>> > other task enqueued on the CPU, the task can keep running at 100% CPU
>> > regardless of the bandwidth allocated to the group and even though its
>> > cfs_rq is throttled. Furthermore, the group entity of the cfs_rq and its
>> > parents are not enqueued but only set as curr on their respective cfs_rqs.
>> >
>> > We have the following sequence:
>> >
>> > sched_move_task
>> >   -dequeue_task: dequeue the task and its group entities.
>> >   -put_prev_task: put the task and its group entities.
>> >   -sched_change_group: move the task to the new group.
>> >   -enqueue_task: enqueue only the task, but not the group entities, because
>> >     the cfs_rq is throttled.
>> >   -set_next_task: set the task and its group entities as the current
>> >     sched_entity of their cfs_rqs.
>> >
>> > Another impact is that the runnable_load_avg of the root cfs_rq stays zero
>> > because the group entities are not enqueued. This situation persists until
>> > an "external" event triggers a reschedule. Let's trigger one immediately
>> > instead.
>> 
>> Sounds reasonable to me; "moved group" being an explicit resched check
>> doesn't sound like a problem in general.
>
> Do I read that as an Ack from you Ben? :-)

Yeah,

Acked-by: Ben Segall <bsegall@google.com>

The only question I see is whether we care about avoiding the overhead for
non-cfsb (CFS bandwidth) cases, but cgroup attach is already slow enough
that it's probably not a real problem, and it's reasonable in general to
check whether it's still right to run this task.
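
For illustration of that trade-off, a variant that skips the resched when no
throttling can be involved might look roughly like the sketch below. It is
not what was merged, and throttled_hierarchy()/task_cfs_rq() are local to
kernel/sched/fair.c, so they (or a small wrapper around them) would have to
be exposed to core.c for this to build.

        if (running) {
                set_next_task(rq, tsk);
                /*
                 * Hypothetical variant: only force the resched when a CFS
                 * task has landed in a throttled hierarchy. Assumes
                 * throttled_hierarchy()/task_cfs_rq() are made visible
                 * outside kernel/sched/fair.c.
                 */
                if (tsk->sched_class == &fair_sched_class &&
                    throttled_hierarchy(task_cfs_rq(tsk)))
                        resched_curr(rq);
        }

The merged patch keeps the unconditional resched_curr() instead:
sched_move_task() is not a hot path, and re-checking whether the current
task should keep running after a group change is cheap and harmless.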


* [tip: sched/core] sched/fair: Prevent unlimited runtime on throttled group
  2020-01-14 14:13 [PATCH] sched/fair: prevent unlimited runtime on throttled group Vincent Guittot
  2020-01-14 18:29 ` bsegall
@ 2020-01-29 11:32 ` tip-bot2 for Vincent Guittot
  1 sibling, 0 replies; 5+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2020-01-29 11:32 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Vincent Guittot, Peter Zijlstra (Intel),
	Ingo Molnar, Ben Segall, x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     2a4b03ffc69f2dedc6388e9a6438b5f4c133a40d
Gitweb:        https://git.kernel.org/tip/2a4b03ffc69f2dedc6388e9a6438b5f4c133a40d
Author:        Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate:    Tue, 14 Jan 2020 15:13:56 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 28 Jan 2020 21:36:58 +01:00

sched/fair: Prevent unlimited runtime on throttled group

When a running task is moved to a throttled task group and there is no
other task enqueued on the CPU, the task can keep running at 100% CPU
regardless of the bandwidth allocated to the group and even though its
cfs_rq is throttled. Furthermore, the group entity of the cfs_rq and its
parents are not enqueued but only set as curr on their respective cfs_rqs.

We have the following sequence:

sched_move_task
  -dequeue_task: dequeue the task and its group entities.
  -put_prev_task: put the task and its group entities.
  -sched_change_group: move the task to the new group.
  -enqueue_task: enqueue only the task, but not the group entities, because
    the cfs_rq is throttled.
  -set_next_task: set the task and its group entities as the current
    sched_entity of their cfs_rqs.

Another impact is that the runnable_load_avg of the root cfs_rq stays zero
because the group entities are not enqueued. This situation persists until
an "external" event triggers a reschedule. Let's trigger one immediately
instead.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Ben Segall <bsegall@google.com>
Link: https://lkml.kernel.org/r/1579011236-31256-1-git-send-email-vincent.guittot@linaro.org
---
 kernel/sched/core.c |  9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a8a5d5b..89e54f3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7072,8 +7072,15 @@ void sched_move_task(struct task_struct *tsk)
 
 	if (queued)
 		enqueue_task(rq, tsk, queue_flags);
-	if (running)
+	if (running) {
 		set_next_task(rq, tsk);
+		/*
+		 * After changing group, the running task may have joined a
+		 * throttled one but it's still the running task. Trigger a
+		 * resched to make sure that task can still run.
+		 */
+		resched_curr(rq);
+	}
 
 	task_rq_unlock(rq, tsk, &rf);
 }

