All of lore.kernel.org
 help / color / mirror / Atom feed
From: Davidlohr Bueso <dave@stgolabs.net>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Waiman Long <waiman.long@hpe.com>, Jason Low <jason.low2@hpe.com>,
	Ding Tianhong <dingtianhong@huawei.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Imre Deak <imre.deak@intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Terry Rudd <terry.rudd@hpe.com>,
	"Paul E. McKenney" <paulmck@us.ibm.com>,
	Jason Low <jason.low2@hp.com>,
	Chris Wilson <chris@chris-wilson.co.uk>,
	Daniel Vetter <daniel.vetter@ffwll.ch>,
	kent.overstreet@gmail.com, axboe@fb.com,
	linux-bcache@vger.kernel.org
Subject: ciao set_task_state() (was Re: [PATCH -v4 6/8] locking/mutex: Restructure wait loop)
Date: Sun, 23 Oct 2016 18:57:26 -0700	[thread overview]
Message-ID: <20161024015726.GA26130@linux-80c1.suse> (raw)
In-Reply-To: <20161019173403.GB3142@twins.programming.kicks-ass.net>

On Wed, 19 Oct 2016, Peter Zijlstra wrote:

>Subject: sched: Better explain sleep/wakeup
>From: Peter Zijlstra <peterz@infradead.org>
>Date: Wed Oct 19 15:45:27 CEST 2016
>
>There were a few questions wrt how sleep-wakeup works. Try and explain
>it more.
>
>Requested-by: Will Deacon <will.deacon@arm.com>
>Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>---
> include/linux/sched.h |   52 ++++++++++++++++++++++++++++++++++----------------
> kernel/sched/core.c   |   15 +++++++-------
> 2 files changed, 44 insertions(+), 23 deletions(-)
>
>--- a/include/linux/sched.h
>+++ b/include/linux/sched.h
>@@ -262,20 +262,9 @@ extern char ___assert_task_state[1 - 2*!
> #define set_task_state(tsk, state_value)			\
> 	do {							\
> 		(tsk)->task_state_change = _THIS_IP_;		\
>-		smp_store_mb((tsk)->state, (state_value));		\
>+		smp_store_mb((tsk)->state, (state_value));	\
> 	} while (0)
>
>-/*
>- * set_current_state() includes a barrier so that the write of current->state
>- * is correctly serialised wrt the caller's subsequent test of whether to
>- * actually sleep:
>- *
>- *	set_current_state(TASK_UNINTERRUPTIBLE);
>- *	if (do_i_need_to_sleep())
>- *		schedule();
>- *
>- * If the caller does not need such serialisation then use __set_current_state()
>- */
> #define __set_current_state(state_value)			\
> 	do {							\
> 		current->task_state_change = _THIS_IP_;		\
>@@ -284,11 +273,19 @@ extern char ___assert_task_state[1 - 2*!
> #define set_current_state(state_value)				\
> 	do {							\
> 		current->task_state_change = _THIS_IP_;		\
>-		smp_store_mb(current->state, (state_value));		\
>+		smp_store_mb(current->state, (state_value));	\
> 	} while (0)
>
> #else
>
>+/*
>+ * @tsk had better be current, or you get to keep the pieces.

That reminds me we were getting rid of the set_task_state() calls. Bcache was
pending, being only user in the kernel that doesn't actually use current; but
instead breaks newly (yet blocked/uninterruptible) created garbage collection
kthread. I cannot figure out why this is done (ie purposely accounting the
load avg. Furthermore gc kicks in in very specific scenarios obviously, such
as as by the allocator task, so I don't see why bcache gc should want to be
interruptible.

Kent, Jens, can we get rid of this?

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 76f7534d1dd1..6e3c358b5759 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1798,7 +1798,6 @@ int bch_gc_thread_start(struct cache_set *c)
        if (IS_ERR(c->gc_thread))
                return PTR_ERR(c->gc_thread);
 
-       set_task_state(c->gc_thread, TASK_INTERRUPTIBLE);
        return 0;
 }

Thanks,
Davidlohr

>+ *
>+ * The only reason is that computing current can be more expensive than
>+ * using a pointer that's already available.
>+ *
>+ * Therefore, see set_current_state().
>+ */
> #define __set_task_state(tsk, state_value)		\
> 	do { (tsk)->state = (state_value); } while (0)
> #define set_task_state(tsk, state_value)		\
>@@ -299,11 +296,34 @@ extern char ___assert_task_state[1 - 2*!
>  * is correctly serialised wrt the caller's subsequent test of whether to
>  * actually sleep:
>  *
>+ *   for (;;) {
>  *	set_current_state(TASK_UNINTERRUPTIBLE);
>- *	if (do_i_need_to_sleep())
>- *		schedule();
>+ *	if (!need_sleep)
>+ *		break;
>+ *
>+ *	schedule();
>+ *   }
>+ *   __set_current_state(TASK_RUNNING);
>+ *
>+ * If the caller does not need such serialisation (because, for instance, the
>+ * condition test and condition change and wakeup are under the same lock) then
>+ * use __set_current_state().
>+ *
>+ * The above is typically ordered against the wakeup, which does:
>+ *
>+ *	need_sleep = false;
>+ *	wake_up_state(p, TASK_UNINTERRUPTIBLE);
>+ *
>+ * Where wake_up_state() (and all other wakeup primitives) imply enough
>+ * barriers to order the store of the variable against wakeup.
>+ *
>+ * Wakeup will do: if (@state & p->state) p->state = TASK_RUNNING, that is,
>+ * once it observes the TASK_UNINTERRUPTIBLE store the waking CPU can issue a
>+ * TASK_RUNNING store which can collide with __set_current_state(TASK_RUNNING).
>+ *
>+ * This is obviously fine, since they both store the exact same value.
>  *
>- * If the caller does not need such serialisation then use __set_current_state()
>+ * Also see the comments of try_to_wake_up().
>  */
> #define __set_current_state(state_value)		\
> 	do { current->state = (state_value); } while (0)
>--- a/kernel/sched/core.c
>+++ b/kernel/sched/core.c
>@@ -2000,14 +2000,15 @@ static void ttwu_queue(struct task_struc
>  * @state: the mask of task states that can be woken
>  * @wake_flags: wake modifier flags (WF_*)
>  *
>- * Put it on the run-queue if it's not already there. The "current"
>- * thread is always on the run-queue (except when the actual
>- * re-schedule is in progress), and as such you're allowed to do
>- * the simpler "current->state = TASK_RUNNING" to mark yourself
>- * runnable without the overhead of this.
>+ * If (@state & @p->state) @p->state = TASK_RUNNING.
>  *
>- * Return: %true if @p was woken up, %false if it was already running.
>- * or @state didn't match @p's state.
>+ * If the task was not queued/runnable, also place it back on a runqueue.
>+ *
>+ * Atomic against schedule() which would dequeue a task, also see
>+ * set_current_state().
>+ *
>+ * Return: %true if @p->state changes (an actual wakeup was done),
>+ *	   %false otherwise.
>  */
> static int
> try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)

  reply	other threads:[~2016-10-24  1:57 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-07 14:52 [PATCH -v4 0/8] locking/mutex: Rewrite basic mutex Peter Zijlstra
2016-10-07 14:52 ` [PATCH -v4 1/8] locking/drm: Kill mutex trickery Peter Zijlstra
2016-10-07 15:43   ` Peter Zijlstra
2016-10-07 15:58     ` Linus Torvalds
2016-10-07 16:13       ` Peter Zijlstra
2016-10-07 21:58         ` Waiman Long
2016-10-08 11:58     ` Thomas Gleixner
2016-10-08 14:01       ` Peter Zijlstra
2016-10-08 14:11         ` Thomas Gleixner
2016-10-08 16:42           ` Peter Zijlstra
2016-11-09 10:38     ` Peter Zijlstra
2016-10-18 12:48   ` Peter Zijlstra
2016-10-18 12:57     ` Peter Zijlstra
2016-11-11 11:22       ` Daniel Vetter
2016-11-11 11:38         ` Peter Zijlstra
2016-11-12 10:58           ` Ingo Molnar
2016-11-14 14:04             ` Peter Zijlstra
2016-11-14 14:27               ` Ingo Molnar
2016-10-18 12:57     ` Chris Wilson
2016-10-07 14:52 ` [PATCH -v4 2/8] locking/mutex: Rework mutex::owner Peter Zijlstra
2016-10-12 17:59   ` Davidlohr Bueso
2016-10-12 19:52     ` Jason Low
2016-10-13 15:18   ` Will Deacon
2016-10-07 14:52 ` [PATCH -v4 3/8] locking/mutex: Kill arch specific code Peter Zijlstra
2016-10-07 14:52 ` [PATCH -v4 4/8] locking/mutex: Allow MUTEX_SPIN_ON_OWNER when DEBUG_MUTEXES Peter Zijlstra
2016-10-07 14:52 ` [PATCH -v4 5/8] locking/mutex: Add lock handoff to avoid starvation Peter Zijlstra
2016-10-13 15:14   ` Will Deacon
2016-10-17  9:22     ` Peter Zijlstra
2016-10-17 18:45   ` Waiman Long
2016-10-17 19:07     ` Waiman Long
2016-10-18 13:02       ` Peter Zijlstra
2016-10-18 12:36     ` Peter Zijlstra
2016-12-27 13:55   ` Chris Wilson
2017-01-09 11:52   ` [PATCH] locking/mutex: Clear mutex-handoff flag on interrupt Chris Wilson
2017-01-11 16:43     ` Peter Zijlstra
2017-01-11 16:57       ` Chris Wilson
2017-01-12 20:58         ` Chris Wilson
2016-10-07 14:52 ` [PATCH -v4 6/8] locking/mutex: Restructure wait loop Peter Zijlstra
2016-10-13 15:17   ` Will Deacon
2016-10-17 10:44     ` Peter Zijlstra
2016-10-17 13:24       ` Peter Zijlstra
2016-10-17 13:45         ` Boqun Feng
2016-10-17 15:49           ` Peter Zijlstra
2016-10-19 17:34     ` Peter Zijlstra
2016-10-24  1:57       ` Davidlohr Bueso [this message]
2016-10-24 13:26         ` ciao set_task_state() (was Re: [PATCH -v4 6/8] locking/mutex: Restructure wait loop) Kent Overstreet
2016-10-24 14:27         ` Kent Overstreet
2016-10-25 16:55           ` Eric Wheeler
2016-10-25 17:45             ` Kent Overstreet
2016-10-17 23:16   ` [PATCH -v4 6/8] locking/mutex: Restructure wait loop Waiman Long
2016-10-18 13:14     ` Peter Zijlstra
2016-10-07 14:52 ` [PATCH -v4 7/8] locking/mutex: Simplify some ww_mutex code in __mutex_lock_common() Peter Zijlstra
2016-10-07 14:52 ` [PATCH -v4 8/8] locking/mutex: Enable optimistic spinning of woken waiter Peter Zijlstra
2016-10-13 15:28   ` Will Deacon
2016-10-17  9:32     ` Peter Zijlstra
2016-10-17 23:21   ` Waiman Long
2016-10-18 12:19     ` Peter Zijlstra
2016-10-07 15:20 ` [PATCH -v4 0/8] locking/mutex: Rewrite basic mutex Linus Torvalds
2016-10-11 18:42 ` Jason Low

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161024015726.GA26130@linux-80c1.suse \
    --to=dave@stgolabs.net \
    --cc=axboe@fb.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=daniel.vetter@ffwll.ch \
    --cc=dingtianhong@huawei.com \
    --cc=imre.deak@intel.com \
    --cc=jason.low2@hp.com \
    --cc=jason.low2@hpe.com \
    --cc=kent.overstreet@gmail.com \
    --cc=linux-bcache@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=paulmck@us.ibm.com \
    --cc=peterz@infradead.org \
    --cc=terry.rudd@hpe.com \
    --cc=tglx@linutronix.de \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=waiman.long@hpe.com \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.