All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Waiman Long <waiman.long@hpe.com>, Jason Low <jason.low2@hpe.com>,
	Ding Tianhong <dingtianhong@huawei.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Imre Deak <imre.deak@intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Terry Rudd <terry.rudd@hpe.com>,
	"Paul E. McKenney" <paulmck@us.ibm.com>,
	Jason Low <jason.low2@hp.com>,
	Chris Wilson <chris@chris-wilson.co.uk>,
	Daniel Vetter <daniel.vetter@ffwll.ch>
Subject: Re: [PATCH -v4 6/8] locking/mutex: Restructure wait loop
Date: Wed, 19 Oct 2016 19:34:03 +0200	[thread overview]
Message-ID: <20161019173403.GB3142@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <20161013151720.GB13138@arm.com>

On Thu, Oct 13, 2016 at 04:17:21PM +0100, Will Deacon wrote:
> >  		if (!first && __mutex_waiter_is_first(lock, &waiter)) {
> >  			first = true;
> >  			__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
> >  		}
> > +
> > +		set_task_state(task, state);
> 
> With this change, we no longer hold the lock wit_hen we set the task
> state, and it's ordered strictly *after* setting the HANDOFF flag.
> Doesn't that mean that the unlock code can see the HANDOFF flag, issue
> the wakeup, but then we come in and overwrite the task state?
> 
> I'm struggling to work out whether that's an issue, but it certainly
> feels odd and is a change from the previous behaviour.

OK, so after a discussion on IRC the problem appears to have been
unfamiliarity with the basic sleep/wakeup scheme. Mutex used to be the
odd duck out for being fully serialized by wait_lock.

The below adds a few words on how the 'normal' sleep/wakeup scheme
works.


---
Subject: sched: Better explain sleep/wakeup
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed Oct 19 15:45:27 CEST 2016

There were a few questions wrt how sleep-wakeup works. Try and explain
it more.

Requested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/sched.h |   52 ++++++++++++++++++++++++++++++++++----------------
 kernel/sched/core.c   |   15 +++++++-------
 2 files changed, 44 insertions(+), 23 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -262,20 +262,9 @@ extern char ___assert_task_state[1 - 2*!
 #define set_task_state(tsk, state_value)			\
 	do {							\
 		(tsk)->task_state_change = _THIS_IP_;		\
-		smp_store_mb((tsk)->state, (state_value));		\
+		smp_store_mb((tsk)->state, (state_value));	\
 	} while (0)
 
-/*
- * set_current_state() includes a barrier so that the write of current->state
- * is correctly serialised wrt the caller's subsequent test of whether to
- * actually sleep:
- *
- *	set_current_state(TASK_UNINTERRUPTIBLE);
- *	if (do_i_need_to_sleep())
- *		schedule();
- *
- * If the caller does not need such serialisation then use __set_current_state()
- */
 #define __set_current_state(state_value)			\
 	do {							\
 		current->task_state_change = _THIS_IP_;		\
@@ -284,11 +273,19 @@ extern char ___assert_task_state[1 - 2*!
 #define set_current_state(state_value)				\
 	do {							\
 		current->task_state_change = _THIS_IP_;		\
-		smp_store_mb(current->state, (state_value));		\
+		smp_store_mb(current->state, (state_value));	\
 	} while (0)
 
 #else
 
+/*
+ * @tsk had better be current, or you get to keep the pieces.
+ *
+ * The only reason is that computing current can be more expensive than
+ * using a pointer that's already available.
+ *
+ * Therefore, see set_current_state().
+ */
 #define __set_task_state(tsk, state_value)		\
 	do { (tsk)->state = (state_value); } while (0)
 #define set_task_state(tsk, state_value)		\
@@ -299,11 +296,34 @@ extern char ___assert_task_state[1 - 2*!
  * is correctly serialised wrt the caller's subsequent test of whether to
  * actually sleep:
  *
+ *   for (;;) {
  *	set_current_state(TASK_UNINTERRUPTIBLE);
- *	if (do_i_need_to_sleep())
- *		schedule();
+ *	if (!need_sleep)
+ *		break;
+ *
+ *	schedule();
+ *   }
+ *   __set_current_state(TASK_RUNNING);
+ *
+ * If the caller does not need such serialisation (because, for instance, the
+ * condition test and condition change and wakeup are under the same lock) then
+ * use __set_current_state().
+ *
+ * The above is typically ordered against the wakeup, which does:
+ *
+ *	need_sleep = false;
+ *	wake_up_state(p, TASK_UNINTERRUPTIBLE);
+ *
+ * Where wake_up_state() (and all other wakeup primitives) imply enough
+ * barriers to order the store of the variable against wakeup.
+ *
+ * Wakeup will do: if (@state & p->state) p->state = TASK_RUNNING, that is,
+ * once it observes the TASK_UNINTERRUPTIBLE store the waking CPU can issue a
+ * TASK_RUNNING store which can collide with __set_current_state(TASK_RUNNING).
+ *
+ * This is obviously fine, since they both store the exact same value.
  *
- * If the caller does not need such serialisation then use __set_current_state()
+ * Also see the comments of try_to_wake_up().
  */
 #define __set_current_state(state_value)		\
 	do { current->state = (state_value); } while (0)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2000,14 +2000,15 @@ static void ttwu_queue(struct task_struc
  * @state: the mask of task states that can be woken
  * @wake_flags: wake modifier flags (WF_*)
  *
- * Put it on the run-queue if it's not already there. The "current"
- * thread is always on the run-queue (except when the actual
- * re-schedule is in progress), and as such you're allowed to do
- * the simpler "current->state = TASK_RUNNING" to mark yourself
- * runnable without the overhead of this.
+ * If (@state & @p->state) @p->state = TASK_RUNNING.
  *
- * Return: %true if @p was woken up, %false if it was already running.
- * or @state didn't match @p's state.
+ * If the task was not queued/runnable, also place it back on a runqueue.
+ *
+ * Atomic against schedule() which would dequeue a task, also see
+ * set_current_state().
+ *
+ * Return: %true if @p->state changes (an actual wakeup was done),
+ *	   %false otherwise.
  */
 static int
 try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)

  parent reply	other threads:[~2016-10-19 17:35 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-07 14:52 [PATCH -v4 0/8] locking/mutex: Rewrite basic mutex Peter Zijlstra
2016-10-07 14:52 ` [PATCH -v4 1/8] locking/drm: Kill mutex trickery Peter Zijlstra
2016-10-07 15:43   ` Peter Zijlstra
2016-10-07 15:58     ` Linus Torvalds
2016-10-07 16:13       ` Peter Zijlstra
2016-10-07 21:58         ` Waiman Long
2016-10-08 11:58     ` Thomas Gleixner
2016-10-08 14:01       ` Peter Zijlstra
2016-10-08 14:11         ` Thomas Gleixner
2016-10-08 16:42           ` Peter Zijlstra
2016-11-09 10:38     ` Peter Zijlstra
2016-10-18 12:48   ` Peter Zijlstra
2016-10-18 12:57     ` Peter Zijlstra
2016-11-11 11:22       ` Daniel Vetter
2016-11-11 11:38         ` Peter Zijlstra
2016-11-12 10:58           ` Ingo Molnar
2016-11-14 14:04             ` Peter Zijlstra
2016-11-14 14:27               ` Ingo Molnar
2016-10-18 12:57     ` Chris Wilson
2016-10-07 14:52 ` [PATCH -v4 2/8] locking/mutex: Rework mutex::owner Peter Zijlstra
2016-10-12 17:59   ` Davidlohr Bueso
2016-10-12 19:52     ` Jason Low
2016-10-13 15:18   ` Will Deacon
2016-10-07 14:52 ` [PATCH -v4 3/8] locking/mutex: Kill arch specific code Peter Zijlstra
2016-10-07 14:52 ` [PATCH -v4 4/8] locking/mutex: Allow MUTEX_SPIN_ON_OWNER when DEBUG_MUTEXES Peter Zijlstra
2016-10-07 14:52 ` [PATCH -v4 5/8] locking/mutex: Add lock handoff to avoid starvation Peter Zijlstra
2016-10-13 15:14   ` Will Deacon
2016-10-17  9:22     ` Peter Zijlstra
2016-10-17 18:45   ` Waiman Long
2016-10-17 19:07     ` Waiman Long
2016-10-18 13:02       ` Peter Zijlstra
2016-10-18 12:36     ` Peter Zijlstra
2016-12-27 13:55   ` Chris Wilson
2017-01-09 11:52   ` [PATCH] locking/mutex: Clear mutex-handoff flag on interrupt Chris Wilson
2017-01-11 16:43     ` Peter Zijlstra
2017-01-11 16:57       ` Chris Wilson
2017-01-12 20:58         ` Chris Wilson
2016-10-07 14:52 ` [PATCH -v4 6/8] locking/mutex: Restructure wait loop Peter Zijlstra
2016-10-13 15:17   ` Will Deacon
2016-10-17 10:44     ` Peter Zijlstra
2016-10-17 13:24       ` Peter Zijlstra
2016-10-17 13:45         ` Boqun Feng
2016-10-17 15:49           ` Peter Zijlstra
2016-10-19 17:34     ` Peter Zijlstra [this message]
2016-10-24  1:57       ` ciao set_task_state() (was Re: [PATCH -v4 6/8] locking/mutex: Restructure wait loop) Davidlohr Bueso
2016-10-24 13:26         ` Kent Overstreet
2016-10-24 14:27         ` Kent Overstreet
2016-10-25 16:55           ` Eric Wheeler
2016-10-25 17:45             ` Kent Overstreet
2016-10-17 23:16   ` [PATCH -v4 6/8] locking/mutex: Restructure wait loop Waiman Long
2016-10-18 13:14     ` Peter Zijlstra
2016-10-07 14:52 ` [PATCH -v4 7/8] locking/mutex: Simplify some ww_mutex code in __mutex_lock_common() Peter Zijlstra
2016-10-07 14:52 ` [PATCH -v4 8/8] locking/mutex: Enable optimistic spinning of woken waiter Peter Zijlstra
2016-10-13 15:28   ` Will Deacon
2016-10-17  9:32     ` Peter Zijlstra
2016-10-17 23:21   ` Waiman Long
2016-10-18 12:19     ` Peter Zijlstra
2016-10-07 15:20 ` [PATCH -v4 0/8] locking/mutex: Rewrite basic mutex Linus Torvalds
2016-10-11 18:42 ` Jason Low

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161019173403.GB3142@twins.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=chris@chris-wilson.co.uk \
    --cc=daniel.vetter@ffwll.ch \
    --cc=dave@stgolabs.net \
    --cc=dingtianhong@huawei.com \
    --cc=imre.deak@intel.com \
    --cc=jason.low2@hp.com \
    --cc=jason.low2@hpe.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=paulmck@us.ibm.com \
    --cc=terry.rudd@hpe.com \
    --cc=tglx@linutronix.de \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=waiman.long@hpe.com \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.