From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753038AbbFAON0 (ORCPT ); Mon, 1 Jun 2015 10:13:26 -0400 Received: from casper.infradead.org ([85.118.1.10]:50771 "EHLO casper.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752772AbbFAONT (ORCPT ); Mon, 1 Jun 2015 10:13:19 -0400 Message-Id: <20150601140839.852669281@infradead.org> User-Agent: quilt/0.61-1 Date: Mon, 01 Jun 2015 15:58:21 +0200 From: Peter Zijlstra To: umgwanakikbuti@gmail.com, mingo@elte.hu Cc: ktkhai@parallels.com, rostedt@goodmis.org, juri.lelli@gmail.com, pang.xunlei@linaro.org, oleg@redhat.com, linux-kernel@vger.kernel.org, "Peter Zijlstra" Subject: [RFC][PATCH 3/7] sched: Allow balance callbacks for check_class_changed() References: <20150601135818.506080835@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Disposition: inline; filename=peterz-sched-post_schedule-7.patch Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org In order to remove dropping rq->lock from the switched_{to,from}()/prio_changed() sched_class methods, run the balance callbacks after it. We need to remove dropping rq->lock because its buggy, suppose using sched_setattr()/sched_setscheduler() to change a running task from FIFO to OTHER. By the time we get to switched_from_rt() the task is already enqueued on the cfs runqueues. If switched_from_rt() does pull_rt_task() and drops rq->lock, load-balancing can come in and move our task @p to another rq. The subsequent switched_to_fair() still assumes @p is on @rq and bad things will happen. By using balance callbacks we delay the load-balancing operations {rt,dl}x{push,pull} until we've done all the important work and the task is fully set up. Furthermore, the balance callbacks do not know about @p, therefore they cannot get confused like this. Reported-by: Mike Galbraith Signed-off-by: Peter Zijlstra (Intel) --- kernel/sched/core.c | 25 ++++++++++++++++++++++--- 1 file changed, 22 insertions(+), 3 deletions(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1001,7 +1001,11 @@ inline int task_curr(const struct task_s } /* - * Can drop rq->lock because from sched_class::switched_from() methods drop it. + * switched_from, switched_to and prio_changed must _NOT_ drop rq->lock, + * use the balance_callback list if you want balancing. + * + * this means any call to check_class_changed() must be followed by a call to + * balance_callback(). */ static inline void check_class_changed(struct rq *rq, struct task_struct *p, const struct sched_class *prev_class, @@ -1010,7 +1014,7 @@ static inline void check_class_changed(s if (prev_class != p->sched_class) { if (prev_class->switched_from) prev_class->switched_from(rq, p); - /* Possble rq->lock 'hole'. */ + p->sched_class->switched_to(rq, p); } else if (oldprio != p->prio || dl_task(p)) p->sched_class->prio_changed(rq, p, oldprio); @@ -1491,8 +1495,12 @@ ttwu_do_wakeup(struct rq *rq, struct tas p->state = TASK_RUNNING; #ifdef CONFIG_SMP - if (p->sched_class->task_woken) + if (p->sched_class->task_woken) { + /* + * XXX can drop rq->lock; most likely ok. + */ p->sched_class->task_woken(rq, p); + } if (rq->idle_stamp) { u64 delta = rq_clock(rq) - rq->idle_stamp; @@ -3094,7 +3102,11 @@ void rt_mutex_setprio(struct task_struct check_class_changed(rq, p, prev_class, oldprio); out_unlock: + preempt_disable(); /* avoid rq from going away on us */ __task_rq_unlock(rq); + + balance_callback(rq); + preempt_enable(); } #endif @@ -3655,11 +3667,18 @@ static int __sched_setscheduler(struct t } check_class_changed(rq, p, prev_class, oldprio); + preempt_disable(); /* avoid rq from going away on us */ task_rq_unlock(rq, p, &flags); if (pi) rt_mutex_adjust_pi(p); + /* + * Run balance callbacks after we've adjusted the PI chain. + */ + balance_callback(rq); + preempt_enable(); + return 0; }