From: Vincent Guittot <vincent.guittot@linaro.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Yuyang Du <yuyang.du@intel.com>, Ingo Molnar <mingo@kernel.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
Mike Galbraith <umgwanakikbuti@gmail.com>,
Benjamin Segall <bsegall@google.com>,
Paul Turner <pjt@google.com>,
Morten Rasmussen <morten.rasmussen@arm.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Matt Fleming <matt@codeblueprint.co.uk>
Subject: Re: [PATCH v6 1/4] sched/fair: Fix attaching task sched avgs twice when switching to fair or changing task group
Date: Thu, 16 Jun 2016 18:30:13 +0200 [thread overview]
Message-ID: <20160616163013.GA32169@vingu-laptop> (raw)
In-Reply-To: <20160615152217.GN30921@twins.programming.kicks-ass.net>
Le Wednesday 15 Jun 2016 à 17:22:17 (+0200), Peter Zijlstra a écrit :
> On Wed, Jun 15, 2016 at 09:46:53AM +0200, Vincent Guittot wrote:
> > I still have concerned with this change of the behavior that attaches
> > the task only when it is enqueued. The load avg of the task will not
> > be decayed between the time we move it into its new group until its
> > enqueue. With this change, a task's load can stay high whereas it has
> > slept for the last couple of seconds. Then, its load and utilization
> > is no more accounted anywhere in the mean time just because we have
> > moved the task which will be enqueued on the same rq.
> > A task should always be attached to a cfs_rq and its load/utilization
> > should always be accounted on a cfs_rq and decayed for its sleep
> > period
>
> OK; so I think I agree with that. Does the below (completely untested,
> hasn't even been near a compiler) look reasonable?
>
> The general idea is to always attach to a cfs_rq -- through
> post_init_entity_util_avg(). This covers both the new task isn't
> attached yet and was never in the fair class to begin with issues.
Your patch ensures that a task will be attached to a cfs_rq and fix the issue raised by Yuyang because of se->avg.last_update_time = 0 at init. During the test the following message has raised "BUG: using smp_processor_id() in preemptible [00000000] code: systemd/1" because of cfs_rq_util_change that is called in attach_entity_load_avg
With patch [1] for the init of cfs_rq side, all use cases will be covered regarding the issue linked to a last_update_time set to 0 at init
[1] https://lkml.org/lkml/2016/5/30/508
>
> That only leaves a tiny hole in fork() where the task is hashed but
> hasn't yet passed through wake_up_new_task() in which someone can do
> cgroup move on it. That is closed with TASK_NEW and can_attach()
> refusing those tasks.
>
But a new fair task is still detached and attached from/to task_group with cgroup_post_fork()-->ss->fork(child)-->cpu_cgroup_fork()-->sched_move_task()-->task_move_group_fair().
cpu_cgroup_can_attach is not used in this path and sched_move_task do the move unconditionally for fair task.
With your patch, we still have the sequence
sched_fork()
set_task_cpu()
cgroup_post_fork()--> ... --> task_move_group_fair()
detach_task_cfs_rq()
set_task_rq()
attach_task_cfs_rq()
wake_up_new_task()
select_task_rq() can select a new cpu
set_task_cpu()
migrate_task_rq_fair if the new_cpu != task_cpu
remove_load()
__set_task_cpu
post_init_entity_util_avg
attach_task_cfs_rq()
activate_task
enqueue_task
In fact, cpu_cgroup_fork needs a small part of sched_move_task so we can just call this small part directly instead sched_move_task. And the task doesn't really migrate because it is not yet attached so we need the sequence:
sched_fork()
__set_task_cpu()
cgroup_post_fork()--> ... --> task_move_group_fair()
set_task_rq() to set task group and runqueue
wake_up_new_task()
select_task_rq() can select a new cpu
__set_task_cpu
post_init_entity_util_avg
attach_task_cfs_rq()
activate_task
enqueue_task
The patch below on top of your patch, ensures that we follow the right sequence :
---
kernel/sched/core.c | 60 +++++++++++++++++++++++++++++++++++------------------
1 file changed, 40 insertions(+), 20 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7895689a..a21e3dc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2373,7 +2373,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
* Silence PROVE_RCU.
*/
raw_spin_lock_irqsave(&p->pi_lock, flags);
- set_task_cpu(p, cpu);
+ __set_task_cpu(p, cpu);
if (p->sched_class->task_fork)
p->sched_class->task_fork(p);
raw_spin_unlock_irqrestore(&p->pi_lock, flags);
@@ -2515,7 +2515,7 @@ void wake_up_new_task(struct task_struct *p)
* - cpus_allowed can change in the fork path
* - any previously selected cpu might disappear through hotplug
*/
- set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
+ __set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
#endif
/* Post initialize new task's util average when its cfs_rq is set */
post_init_entity_util_avg(&p->se);
@@ -7715,6 +7715,35 @@ void sched_offline_group(struct task_group *tg)
spin_unlock_irqrestore(&task_group_lock, flags);
}
+/* Set task's runqueue and group
+ * In case of a move between group, we update src and dst group
+ * thanks to sched_class->task_move_group. Otherwise, we just need to set
+ * runqueue and group pointers. The task will be attached to the runqueue
+ * during its wake up.
+ */
+static void sched_set_group(struct task_struct *tsk, bool move)
+{
+ struct task_group *tg;
+
+ /*
+ * All callers are synchronized by task_rq_lock(); we do not use RCU
+ * which is pointless here. Thus, we pass "true" to task_css_check()
+ * to prevent lockdep warnings.
+ */
+ tg = container_of(task_css_check(tsk, cpu_cgrp_id, true),
+ struct task_group, css);
+ tg = autogroup_task_group(tsk, tg);
+ tsk->sched_task_group = tg;
+
+#ifdef CONFIG_FAIR_GROUP_SCHED
+ if (move && tsk->sched_class->task_move_group)
+ tsk->sched_class->task_move_group(tsk);
+ else
+#endif
+ set_task_rq(tsk, task_cpu(tsk));
+
+}
+
/* change task's runqueue when it moves between groups.
* The caller of this function should have put the task in its new group
* by now. This function just updates tsk->se.cfs_rq and tsk->se.parent to
@@ -7722,7 +7751,6 @@ void sched_offline_group(struct task_group *tg)
*/
void sched_move_task(struct task_struct *tsk)
{
- struct task_group *tg;
int queued, running;
struct rq_flags rf;
struct rq *rq;
@@ -7737,22 +7765,7 @@ void sched_move_task(struct task_struct *tsk)
if (unlikely(running))
put_prev_task(rq, tsk);
- /*
- * All callers are synchronized by task_rq_lock(); we do not use RCU
- * which is pointless here. Thus, we pass "true" to task_css_check()
- * to prevent lockdep warnings.
- */
- tg = container_of(task_css_check(tsk, cpu_cgrp_id, true),
- struct task_group, css);
- tg = autogroup_task_group(tsk, tg);
- tsk->sched_task_group = tg;
-
-#ifdef CONFIG_FAIR_GROUP_SCHED
- if (tsk->sched_class->task_move_group)
- tsk->sched_class->task_move_group(tsk);
- else
-#endif
- set_task_rq(tsk, task_cpu(tsk));
+ sched_set_group(tsk, true);
if (unlikely(running))
tsk->sched_class->set_curr_task(rq);
@@ -8182,7 +8195,14 @@ static void cpu_cgroup_css_free(struct cgroup_subsys_state *css)
static void cpu_cgroup_fork(struct task_struct *task)
{
- sched_move_task(task);
+ struct rq_flags rf;
+ struct rq *rq;
+
+ rq = task_rq_lock(task, &rf);
+
+ sched_set_group(task, false);
+
+ task_rq_unlock(rq, task, &rf);
}
static int cpu_cgroup_can_attach(struct cgroup_taskset *tset)
--
1.9.1
next prev parent reply other threads:[~2016-06-16 16:30 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-14 22:21 [PATCH v6 0/4] sched/fair: Fix attach and detach sched avgs for task group change and sched class change Yuyang Du
2016-06-14 22:21 ` [PATCH v6 1/4] sched/fair: Fix attaching task sched avgs twice when switching to fair or changing task group Yuyang Du
2016-06-15 7:46 ` Vincent Guittot
2016-06-15 0:18 ` Yuyang Du
2016-06-15 14:15 ` Vincent Guittot
2016-06-15 15:22 ` Peter Zijlstra
2016-06-16 1:00 ` Yuyang Du
2016-06-16 16:30 ` Vincent Guittot [this message]
2016-06-16 17:17 ` Peter Zijlstra
2016-06-16 18:57 ` Vincent Guittot
2016-06-16 18:51 ` Peter Zijlstra
2016-06-16 19:00 ` Vincent Guittot
2016-06-16 20:07 ` Peter Zijlstra
2016-06-16 21:21 ` Vincent Guittot
2016-06-17 2:12 ` Yuyang Du
2016-06-17 12:00 ` Vincent Guittot
2016-06-17 9:48 ` Peter Zijlstra
2016-06-17 11:31 ` Peter Zijlstra
2016-06-14 22:21 ` [PATCH v6 2/4] sched/fair: Move load and util avgs from wake_up_new_task() to sched_fork() Yuyang Du
2016-06-14 22:21 ` [PATCH v6 3/4] sched/fair: Skip detach sched avgs for new task when changing task groups Yuyang Du
2016-06-14 22:21 ` [PATCH v6 4/4] sched/fair: Add inline to detach_entity_load_evg() Yuyang Du
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160616163013.GA32169@vingu-laptop \
--to=vincent.guittot@linaro.org \
--cc=bsegall@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=matt@codeblueprint.co.uk \
--cc=mingo@kernel.org \
--cc=morten.rasmussen@arm.com \
--cc=peterz@infradead.org \
--cc=pjt@google.com \
--cc=umgwanakikbuti@gmail.com \
--cc=yuyang.du@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).