All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: mingo@kernel.org, tglx@linutronix.de, juri.lelli@arm.com,
	rostedt@goodmis.org, xlpang@redhat.com
Cc: linux-kernel@vger.kernel.org, mathieu.desnoyers@efficios.com,
	jdesfossez@efficios.com, bristot@redhat.com,
	peterz@infradead.org, Ingo Molnar <mingo@redhat.com>
Subject: [RFC][PATCH 1/8] rtmutex: Deboost before waking up the top waiter
Date: Tue, 07 Jun 2016 21:56:36 +0200	[thread overview]
Message-ID: <20160607200215.637804442@infradead.org> (raw)
In-Reply-To: 20160607195635.710022345@infradead.org

[-- Attachment #1: xunlei_pang-rtmutex-deboost_before_waking_up_the_top_waiter.patch --]
[-- Type: text/plain, Size: 3510 bytes --]

From: Xunlei Pang <xlpang@redhat.com>

We should deboost before waking the high-prio task, such that
we don't run two tasks with the same "state"(priority, deadline,
sched_class, etc) during the period between the end of wake_up_q()
and the end of rt_mutex_adjust_prio().

As "Peter Zijlstra" said:
Its semantically icky to have the two tasks running off the same
state and practically icky when you consider bandwidth inheritance --
where the boosted task wants to explicitly modify the state of the
booster. In that latter case you really want to unboost before you
let the booster run again.

But this however can lead to prio-inversion if current would get
preempted after the deboost but before waking our high-prio task,
hence we disable preemption before doing deboost, and enabling it
after the wake up is over.

The patch fixed the logic, and introduced rt_mutex_postunlock()
to do some code refactor.

Most importantly however; this change ensures pointer stability for
the next patch, where we have rt_mutex_setprio() cache a pointer to
the top-most waiter task. If we, as before this change, do the wakeup
first and then deboost, this pointer might point into thin air.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@arm.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
[peterz: Changelog]
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1461659449-19497-1-git-send-email-xlpang@redhat.com
---

 kernel/futex.c                  |    5 ++---
 kernel/locking/rtmutex.c        |   28 ++++++++++++++++++++++++----
 kernel/locking/rtmutex_common.h |    1 +
 3 files changed, 27 insertions(+), 7 deletions(-)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1336,9 +1336,8 @@ static int wake_futex_pi(u32 __user *uad
 	 * scheduled away before the wake up can take place.
 	 */
 	spin_unlock(&hb->lock);
-	wake_up_q(&wake_q);
-	if (deboost)
-		rt_mutex_adjust_prio(current);
+
+	rt_mutex_postunlock(&wake_q, deboost);
 
 	return 0;
 }
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1390,12 +1390,32 @@ rt_mutex_fastunlock(struct rt_mutex *loc
 	} else {
 		bool deboost = slowfn(lock, &wake_q);
 
-		wake_up_q(&wake_q);
+		rt_mutex_postunlock(&wake_q, deboost);
+	}
+}
+
 
-		/* Undo pi boosting if necessary: */
-		if (deboost)
-			rt_mutex_adjust_prio(current);
+/*
+ * Undo pi boosting (if necessary) and wake top waiter.
+ */
+void rt_mutex_postunlock(struct wake_q_head *wake_q, bool deboost)
+{
+	/*
+	 * We should deboost before waking the top waiter task such that
+	 * we don't run two tasks with the 'same' priority. This however
+	 * can lead to prio-inversion if we would get preempted after
+	 * the deboost but before waking our high-prio task, hence the
+	 * preempt_disable.
+	 */
+	if (deboost) {
+		preempt_disable();
+		rt_mutex_adjust_prio(current);
 	}
+
+	wake_up_q(wake_q);
+
+	if (deboost)
+		preempt_enable();
 }
 
 /**
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -111,6 +111,7 @@ extern int rt_mutex_finish_proxy_lock(st
 extern int rt_mutex_timed_futex_lock(struct rt_mutex *l, struct hrtimer_sleeper *to);
 extern bool rt_mutex_futex_unlock(struct rt_mutex *lock,
 				  struct wake_q_head *wqh);
+extern void rt_mutex_postunlock(struct wake_q_head *wake_q, bool deboost);
 extern void rt_mutex_adjust_prio(struct task_struct *task);
 
 #ifdef CONFIG_DEBUG_RT_MUTEXES

  reply	other threads:[~2016-06-07 20:19 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-07 19:56 [RFC][PATCH 0/8] PI and assorted failings Peter Zijlstra
2016-06-07 19:56 ` Peter Zijlstra [this message]
2016-06-14  9:09   ` [RFC][PATCH 1/8] rtmutex: Deboost before waking up the top waiter Juri Lelli
2016-06-14 12:54     ` Peter Zijlstra
2016-06-14 13:20       ` Juri Lelli
2016-06-14 13:59         ` Peter Zijlstra
2016-06-14 16:36     ` Davidlohr Bueso
2016-06-14 17:01       ` Juri Lelli
2016-06-14 18:22   ` Steven Rostedt
2016-06-07 19:56 ` [RFC][PATCH 2/8] sched/rtmutex/deadline: Fix a PI crash for deadline tasks Peter Zijlstra
2016-06-14 10:21   ` Juri Lelli
2016-06-14 12:30     ` Peter Zijlstra
2016-06-14 12:53     ` Xunlei Pang
2016-06-14 13:07       ` Juri Lelli
2016-06-14 16:39         ` Juri Lelli
2016-06-14 18:42   ` Steven Rostedt
2016-06-14 20:28     ` Peter Zijlstra
2016-06-15 16:14       ` Steven Rostedt
2016-06-07 19:56 ` [RFC][PATCH 3/8] sched/deadline/rtmutex: Dont miss the dl_runtime/dl_period update Peter Zijlstra
2016-06-14 10:43   ` Juri Lelli
2016-06-14 12:09     ` Peter Zijlstra
2016-06-15 16:30   ` Steven Rostedt
2016-06-15 17:55     ` Peter Zijlstra
2016-06-07 19:56 ` [RFC][PATCH 4/8] rtmutex: Remove rt_mutex_fastunlock() Peter Zijlstra
2016-06-15 16:43   ` Steven Rostedt
2016-06-07 19:56 ` [RFC][PATCH 5/8] rtmutex: Clean up Peter Zijlstra
2016-06-14 12:08   ` Juri Lelli
2016-06-14 12:32     ` Peter Zijlstra
2016-06-14 12:41       ` Juri Lelli
2016-06-07 19:56 ` [RFC][PATCH 6/8] sched/rtmutex: Refactor rt_mutex_setprio() Peter Zijlstra
2016-06-14 13:14   ` Juri Lelli
2016-06-14 14:08     ` Peter Zijlstra
2016-06-07 19:56 ` [RFC][PATCH 7/8] sched,tracing: Update trace_sched_pi_setprio() Peter Zijlstra
2016-06-07 19:56 ` [RFC][PATCH 8/8] rtmutex: Fix PI chain order integrity Peter Zijlstra
2016-06-14 17:39   ` Juri Lelli
2016-06-14 19:44     ` Peter Zijlstra
2016-06-15  7:25       ` Juri Lelli
2016-06-27 12:23         ` Peter Zijlstra
2016-06-27 12:40           ` Thomas Gleixner
2016-06-28  9:05           ` Juri Lelli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160607200215.637804442@infradead.org \
    --to=peterz@infradead.org \
    --cc=bristot@redhat.com \
    --cc=jdesfossez@efficios.com \
    --cc=juri.lelli@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@kernel.org \
    --cc=mingo@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=xlpang@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.