From: Vincent Guittot <vincent.guittot@linaro.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Yuyang Du <yuyang.du@intel.com>, Ingo Molnar <mingo@kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Mike Galbraith <umgwanakikbuti@gmail.com>,
	Benjamin Segall <bsegall@google.com>,
	Paul Turner <pjt@google.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Matt Fleming <matt@codeblueprint.co.uk>
Subject: Re: [PATCH v6 1/4] sched/fair: Fix attaching task sched avgs twice when switching to fair or changing task group
Date: Thu, 16 Jun 2016 18:30:13 +0200
Message-ID: <20160616163013.GA32169@vingu-laptop>
In-Reply-To: <20160615152217.GN30921@twins.programming.kicks-ass.net>

On Wednesday 15 Jun 2016 at 17:22:17 (+0200), Peter Zijlstra wrote:
> On Wed, Jun 15, 2016 at 09:46:53AM +0200, Vincent Guittot wrote:
> > I still have concerns with this change of behavior that attaches
> > the task only when it is enqueued. The load avg of the task will not
> > be decayed between the time we move it into its new group and its
> > enqueue. With this change, a task's load can stay high even though it
> > has slept for the last couple of seconds. Its load and utilization
> > are then no longer accounted anywhere in the meantime, just because
> > we have moved the task, which will be enqueued on the same rq.
> > A task should always be attached to a cfs_rq, and its load/utilization
> > should always be accounted on a cfs_rq and decayed for its sleep
> > period.
> 
> OK; so I think I agree with that. Does the below (completely untested,
> hasn't even been near a compiler) look reasonable?
> 
> The general idea is to always attach to a cfs_rq -- through
> post_init_entity_util_avg(). This covers both issues: the new task
> isn't attached yet, and it may never have been in the fair class to
> begin with.

Your patch ensures that a task is always attached to a cfs_rq, and it
fixes the issue raised by Yuyang that stems from
se->avg.last_update_time being 0 at init. During testing, however, the
following splat was triggered: "BUG: using smp_processor_id() in
preemptible [00000000] code: systemd/1", because cfs_rq_util_change()
is called in attach_entity_load_avg() from a preemptible context.
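
For reference, the check behind that splat is the generic preemption
debug check; a rough sketch of its logic (modeled on
lib/smp_processor_id.c, simplified and not the literal kernel code):

notrace unsigned int debug_smp_processor_id(void)
{
	/*
	 * Simplified sketch, not the literal kernel code: if the caller
	 * is still preemptible it can migrate to another CPU, so the id
	 * returned below may be stale by the time it is used.
	 */
	if (unlikely(preemptible()))
		printk(KERN_ERR "BUG: using smp_processor_id() in preemptible [%08x] code: %s/%d\n",
		       preempt_count(), current->comm, current->pid);

	return raw_smp_processor_id();
}

So the attach, which ends up in cfs_rq_util_change(), must only run
with preemption disabled, e.g. under the rq lock.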

With patch [1], which handles the init on the cfs_rq side, all use
cases of the issue linked to a last_update_time set to 0 at init are
covered:
[1] https://lkml.org/lkml/2016/5/30/508

> 
> That only leaves a tiny hole in fork() where the task is hashed but
> hasn't yet passed through wake_up_new_task(), in which window someone
> can do a cgroup move on it. That hole is closed with TASK_NEW and
> can_attach() refusing those tasks.
> 

But a new fair task is still detached and attached from/to its
task_group via cgroup_post_fork()-->ss->fork(child)-->
cpu_cgroup_fork()-->sched_move_task()-->task_move_group_fair().
cpu_cgroup_can_attach() is not used in this path, and sched_move_task()
does the move unconditionally for fair tasks.
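
For context, the current hook is just a thin wrapper (as also visible
in the removed hunk of the patch below):

static void cpu_cgroup_fork(struct task_struct *task)
{
	/* Unconditional full group move, with no can_attach()-style check. */
	sched_move_task(task);
}

sched_move_task() takes task_rq_lock() and goes through the full
detach/attach dance, which is more than a freshly forked, not yet
attached task needs.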

With your patch, we still have the sequence:

sched_fork()
    set_task_cpu()
cgroup_post_fork()--> ... --> task_move_group_fair()
    detach_task_cfs_rq()
    set_task_rq()
    attach_task_cfs_rq()
wake_up_new_task()
    select_task_rq() can select a new cpu
    set_task_cpu()
        migrate_task_rq_fair() if the new cpu != task_cpu()
            remove_load()
        __set_task_cpu()
    post_init_entity_util_avg()
        attach_task_cfs_rq()
    activate_task()
        enqueue_task()

So when the task stays on the same CPU, its sched avgs are attached
twice: once by the cgroup move and once by post_init_entity_util_avg().

In fact, cpu_cgroup_fork() only needs a small part of
sched_move_task(), so we can call that part directly instead of
sched_move_task(). And the task doesn't really migrate because it is
not yet attached, so we need the sequence:
sched_fork()
    __set_task_cpu()
cgroup_post_fork()--> ... --> sched_set_group()
    set_task_rq() to set the task group and runqueue
wake_up_new_task()
    select_task_rq() can select a new cpu
    __set_task_cpu()
    post_init_entity_util_avg()
        attach_task_cfs_rq()
    activate_task()
        enqueue_task()

The patch below, on top of your patch, ensures that we follow the right sequence:

---
 kernel/sched/core.c | 60 +++++++++++++++++++++++++++++++++++------------------
 1 file changed, 40 insertions(+), 20 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7895689a..a21e3dc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2373,7 +2373,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 	 * Silence PROVE_RCU.
 	 */
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
-	set_task_cpu(p, cpu);
+	__set_task_cpu(p, cpu);
 	if (p->sched_class->task_fork)
 		p->sched_class->task_fork(p);
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
@@ -2515,7 +2515,7 @@ void wake_up_new_task(struct task_struct *p)
 	 *  - cpus_allowed can change in the fork path
 	 *  - any previously selected cpu might disappear through hotplug
 	 */
-	set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
+	__set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
 #endif
 	/* Post initialize new task's util average when its cfs_rq is set */
 	post_init_entity_util_avg(&p->se);
@@ -7715,6 +7715,35 @@ void sched_offline_group(struct task_group *tg)
 	spin_unlock_irqrestore(&task_group_lock, flags);
 }
 
+/* Set task's runqueue and group.
+ *     In case of a move between groups, we update the src and dst groups
+ *     thanks to sched_class->task_move_group(). Otherwise, we just need to
+ *     set the runqueue and group pointers. The task will be attached to the
+ *     runqueue during its wake up.
+ */
+static void sched_set_group(struct task_struct *tsk, bool move)
+{
+	struct task_group *tg;
+
+	/*
+	 * All callers are synchronized by task_rq_lock(); we do not use RCU
+	 * which is pointless here. Thus, we pass "true" to task_css_check()
+	 * to prevent lockdep warnings.
+	 */
+	tg = container_of(task_css_check(tsk, cpu_cgrp_id, true),
+			  struct task_group, css);
+	tg = autogroup_task_group(tsk, tg);
+	tsk->sched_task_group = tg;
+
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	if (move && tsk->sched_class->task_move_group)
+		tsk->sched_class->task_move_group(tsk);
+	else
+#endif
+		set_task_rq(tsk, task_cpu(tsk));
+
+}
+
 /* change task's runqueue when it moves between groups.
  *	The caller of this function should have put the task in its new group
  *	by now. This function just updates tsk->se.cfs_rq and tsk->se.parent to
@@ -7722,7 +7751,6 @@ void sched_offline_group(struct task_group *tg)
  */
 void sched_move_task(struct task_struct *tsk)
 {
-	struct task_group *tg;
 	int queued, running;
 	struct rq_flags rf;
 	struct rq *rq;
@@ -7737,22 +7765,7 @@ void sched_move_task(struct task_struct *tsk)
 	if (unlikely(running))
 		put_prev_task(rq, tsk);
 
-	/*
-	 * All callers are synchronized by task_rq_lock(); we do not use RCU
-	 * which is pointless here. Thus, we pass "true" to task_css_check()
-	 * to prevent lockdep warnings.
-	 */
-	tg = container_of(task_css_check(tsk, cpu_cgrp_id, true),
-			  struct task_group, css);
-	tg = autogroup_task_group(tsk, tg);
-	tsk->sched_task_group = tg;
-
-#ifdef CONFIG_FAIR_GROUP_SCHED
-	if (tsk->sched_class->task_move_group)
-		tsk->sched_class->task_move_group(tsk);
-	else
-#endif
-		set_task_rq(tsk, task_cpu(tsk));
+	sched_set_group(tsk, true);
 
 	if (unlikely(running))
 		tsk->sched_class->set_curr_task(rq);
@@ -8182,7 +8195,14 @@ static void cpu_cgroup_css_free(struct cgroup_subsys_state *css)
 
 static void cpu_cgroup_fork(struct task_struct *task)
 {
-	sched_move_task(task);
+	struct rq_flags rf;
+	struct rq *rq;
+
+	rq = task_rq_lock(task, &rf);
+
+	sched_set_group(task, false);
+
+	task_rq_unlock(rq, task, &rf);
 }
 
 static int cpu_cgroup_can_attach(struct cgroup_taskset *tset)
-- 
1.9.1
