From: Peter Zijlstra <peterz@infradead.org>
To: Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Waiman Long <waiman.long@hpe.com>, Jason Low <jason.low2@hpe.com>,
Ding Tianhong <dingtianhong@huawei.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Imre Deak <imre.deak@intel.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Davidlohr Bueso <dave@stgolabs.net>,
Tim Chen <tim.c.chen@linux.intel.com>,
Terry Rudd <terry.rudd@hpe.com>,
"Paul E. McKenney" <paulmck@us.ibm.com>,
Jason Low <jason.low2@hp.com>,
Chris Wilson <chris@chris-wilson.co.uk>,
Daniel Vetter <daniel.vetter@ffwll.ch>
Subject: Re: [PATCH -v4 6/8] locking/mutex: Restructure wait loop
Date: Wed, 19 Oct 2016 19:34:03 +0200
Message-ID: <20161019173403.GB3142@twins.programming.kicks-ass.net>
In-Reply-To: <20161013151720.GB13138@arm.com>
On Thu, Oct 13, 2016 at 04:17:21PM +0100, Will Deacon wrote:
> > if (!first && __mutex_waiter_is_first(lock, &waiter)) {
> > first = true;
> > __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
> > }
> > +
> > + set_task_state(task, state);
>
> With this change, we no longer hold the lock when we set the task
> state, and it's ordered strictly *after* setting the HANDOFF flag.
> Doesn't that mean that the unlock code can see the HANDOFF flag, issue
> the wakeup, but then we come in and overwrite the task state?
>
> I'm struggling to work out whether that's an issue, but it certainly
> feels odd and is a change from the previous behaviour.
OK, so after a discussion on IRC the problem appears to have been
unfamiliarity with the basic sleep/wakeup scheme. Mutex used to be the
odd duck out for being fully serialized by wait_lock.
The below adds a few words on how the 'normal' sleep/wakeup scheme
works.
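
As a rough illustration only, the scheme can be modelled in userspace with
C11 atomics. This is a sketch under stated assumptions, not kernel code:
model_state, need_sleep, model_schedule(), model_wait() and model_wake() are
hypothetical names, the sequentially consistent store stands in for the
smp_store_mb() in set_current_state(), and the compare-and-swap is a
simplification of the wakeup side rather than how the scheduler implements it.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for task states; this is a model, not kernel code. */
enum { MODEL_RUNNING = 0, MODEL_UNINTERRUPTIBLE = 2 };

static atomic_int  model_state = MODEL_RUNNING; /* plays the role of current->state */
static atomic_bool need_sleep  = true;          /* the condition being waited on */

/* Stand-in for schedule(): do not return until a waker marked us runnable. */
static void model_schedule(void)
{
	while (atomic_load(&model_state) != MODEL_RUNNING)
		sched_yield();
}

/*
 * Waiter: publish the sleep state first, then test the condition.
 * Returns how many times it actually went to sleep.
 */
static int model_wait(void)
{
	int sleeps = 0;

	for (;;) {
		/* seq_cst store: models the store plus full barrier of set_current_state() */
		atomic_store(&model_state, MODEL_UNINTERRUPTIBLE);
		if (!atomic_load(&need_sleep))
			break;

		model_schedule();
		sleeps++;
	}
	/* Models __set_current_state(TASK_RUNNING); may collide with the waker's store. */
	atomic_store_explicit(&model_state, MODEL_RUNNING, memory_order_relaxed);
	return sleeps;
}

/*
 * Waker: change the condition first, then wake.
 * Returns true only if it flipped a sleeping state to running.
 */
static bool model_wake(void)
{
	int expected = MODEL_UNINTERRUPTIBLE;

	atomic_store(&need_sleep, false);
	return atomic_compare_exchange_strong(&model_state, &expected,
					      MODEL_RUNNING);
}
```

Run model_wait() on one thread and model_wake() on another: either the
waiter sees need_sleep == false and never sleeps, or the waker sees the
sleeping state and flips it back to running. Because both sides use full
ordering, the case where the waiter sleeps and the waker skips the wakeup
cannot happen in this model.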
---
Subject: sched: Better explain sleep/wakeup
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed Oct 19 15:45:27 CEST 2016
There were a few questions wrt how sleep-wakeup works. Try and explain
it more.
Requested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
include/linux/sched.h | 52 ++++++++++++++++++++++++++++++++++----------------
kernel/sched/core.c | 15 +++++++-------
2 files changed, 44 insertions(+), 23 deletions(-)
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -262,20 +262,9 @@ extern char ___assert_task_state[1 - 2*!
#define set_task_state(tsk, state_value) \
do { \
(tsk)->task_state_change = _THIS_IP_; \
- smp_store_mb((tsk)->state, (state_value)); \
+ smp_store_mb((tsk)->state, (state_value)); \
} while (0)
-/*
- * set_current_state() includes a barrier so that the write of current->state
- * is correctly serialised wrt the caller's subsequent test of whether to
- * actually sleep:
- *
- * set_current_state(TASK_UNINTERRUPTIBLE);
- * if (do_i_need_to_sleep())
- * schedule();
- *
- * If the caller does not need such serialisation then use __set_current_state()
- */
#define __set_current_state(state_value) \
do { \
current->task_state_change = _THIS_IP_; \
@@ -284,11 +273,19 @@ extern char ___assert_task_state[1 - 2*!
#define set_current_state(state_value) \
do { \
current->task_state_change = _THIS_IP_; \
- smp_store_mb(current->state, (state_value)); \
+ smp_store_mb(current->state, (state_value)); \
} while (0)
#else
+/*
+ * @tsk had better be current, or you get to keep the pieces.
+ *
+ * The only reason is that computing current can be more expensive than
+ * using a pointer that's already available.
+ *
+ * Therefore, see set_current_state().
+ */
#define __set_task_state(tsk, state_value) \
do { (tsk)->state = (state_value); } while (0)
#define set_task_state(tsk, state_value) \
@@ -299,11 +296,34 @@ extern char ___assert_task_state[1 - 2*!
* is correctly serialised wrt the caller's subsequent test of whether to
* actually sleep:
*
+ * for (;;) {
* set_current_state(TASK_UNINTERRUPTIBLE);
- * if (do_i_need_to_sleep())
- * schedule();
+ * if (!need_sleep)
+ * break;
+ *
+ * schedule();
+ * }
+ * __set_current_state(TASK_RUNNING);
+ *
+ * If the caller does not need such serialisation (because, for instance, the
+ * condition test and condition change and wakeup are under the same lock) then
+ * use __set_current_state().
+ *
+ * The above is typically ordered against the wakeup, which does:
+ *
+ * need_sleep = false;
+ * wake_up_state(p, TASK_UNINTERRUPTIBLE);
+ *
+ * Where wake_up_state() (and all other wakeup primitives) imply enough
+ * barriers to order the store of the variable against wakeup.
+ *
+ * Wakeup will do: if (@state & p->state) p->state = TASK_RUNNING, that is,
+ * once it observes the TASK_UNINTERRUPTIBLE store the waking CPU can issue a
+ * TASK_RUNNING store which can collide with __set_current_state(TASK_RUNNING).
+ *
+ * This is obviously fine, since they both store the exact same value.
*
- * If the caller does not need such serialisation then use __set_current_state()
+ * Also see the comments of try_to_wake_up().
*/
#define __set_current_state(state_value) \
do { current->state = (state_value); } while (0)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2000,14 +2000,15 @@ static void ttwu_queue(struct task_struc
* @state: the mask of task states that can be woken
* @wake_flags: wake modifier flags (WF_*)
*
- * Put it on the run-queue if it's not already there. The "current"
- * thread is always on the run-queue (except when the actual
- * re-schedule is in progress), and as such you're allowed to do
- * the simpler "current->state = TASK_RUNNING" to mark yourself
- * runnable without the overhead of this.
+ * If (@state & @p->state) @p->state = TASK_RUNNING.
*
- * Return: %true if @p was woken up, %false if it was already running.
- * or @state didn't match @p's state.
+ * If the task was not queued/runnable, also place it back on a runqueue.
+ *
+ * Atomic against schedule() which would dequeue a task, also see
+ * set_current_state().
+ *
+ * Return: %true if @p->state changes (an actual wakeup was done),
+ * %false otherwise.
*/
static int
try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
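
The contract described in the new try_to_wake_up() comment can be sketched
in userspace as follows. This is only a model under stated assumptions:
struct task_model, its queued field and model_try_to_wake_up() are
hypothetical, and the compare-and-swap loop is a stand-in for whatever
serialisation the real function uses against schedule().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state values for the model. */
#define MODEL_RUNNING          0u
#define MODEL_INTERRUPTIBLE    1u
#define MODEL_UNINTERRUPTIBLE  2u

/* Hypothetical, heavily reduced stand-in for a task. */
struct task_model {
	atomic_uint state;
	bool queued;            /* models "is on a runqueue" */
};

/*
 * Models: if (@mask & p->state) p->state = MODEL_RUNNING.
 * Returns true if the state was changed (an actual wakeup was done),
 * false otherwise.
 */
static bool model_try_to_wake_up(struct task_model *p, unsigned int mask)
{
	unsigned int old = atomic_load(&p->state);

	do {
		if (!(old & mask))
			return false;   /* running already, or state not in mask */
		/* on failure, old is reloaded and the mask test is repeated */
	} while (!atomic_compare_exchange_weak(&p->state, &old, MODEL_RUNNING));

	p->queued = true;               /* models placing it back on a runqueue */
	return true;
}
```

A wakeup whose mask does not match the task's state leaves the task alone,
and a second wakeup of a task that is already running reports false. This
is the behaviour a sleeper relies on when it marks itself running again.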