From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750853AbdCHMUI (ORCPT ); Wed, 8 Mar 2017 07:20:08 -0500 Received: from mail-wm0-f65.google.com ([74.125.82.65]:36220 "EHLO mail-wm0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752110AbdCHMUG (ORCPT ); Wed, 8 Mar 2017 07:20:06 -0500 Date: Wed, 8 Mar 2017 13:10:47 +0100 From: Ingo Molnar To: Linus Torvalds Cc: Linux Kernel Mailing List , Peter Zijlstra , Thomas Gleixner , Andrew Morton Subject: [RFC PATCH, -v2] sched/wait: Introduce new, more compact wait_event*() primitives Message-ID: <20170308121047.GA19796@gmail.com> References: <20170303013646.GA32094@gmail.com> <20170308083719.GA3251@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170308083719.GA3251@gmail.com> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org * Ingo Molnar wrote: > + if (___wait_is_interruptible(0) && wes->ret) { So that line is dumb and bogus, it should be: if (___wait_is_interruptible(TASK_UNINTERRUPTIBLE) && wes->ret) { ... which in the final version will be passed in parametrically within wait.c itself (but probably not passed in from the macro space, to lower the call site overhead), that is why I preserved the __wait_is_interruptible() check to begin with. (This bug could perhaps explain the boot hang Peter saw.) I also removed the bool state variables Peter disliked and added (a tiny ...) bit more documentation, the latest version is attached below. Still not signed off, due to the _v1/_v2 devel hack - and it's of course still very incomplete, as only wait_event() is converted. But this should be a reviewable demo of the new wait-event state machine logic. I'd hesitate to put much more work into this without first having consensus over whether this is a good idea to begin with. Thanks, Ingo ======================> >>From 2a918b9bb21e1e1daf047797f598cadb75bba241 Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Sun, 5 Mar 2017 14:28:09 +0100 Subject: [PATCH] sched/wait: Introduce new, more compact wait_event*() primitives Turn the wait_event() interface into a state machine. Only very lightly tested, but should demonstrate the principle. Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-kernel@vger.kernel.org Not-Signed-off-by: Ingo Molnar --- include/linux/wait.h | 34 ++++++++++++++++++++++++++++++- kernel/sched/wait.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 90 insertions(+), 1 deletion(-) diff --git a/include/linux/wait.h b/include/linux/wait.h index bdcca46ae54a..b1291cdb9b53 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -226,6 +226,36 @@ void __wake_up_sync(struct wait_queue_head *wq_head, unsigned int mode, int nr); extern void init_wait_entry(struct wait_queue_entry *wq_entry, int flags); /* + * The current state that the wait_event() loop operates + * over. This minimally initialized by the macro and then + * iterated inside the wait_event_loop() iterator: + */ +struct wait_event_state { + char queued; + char prepared; + char done; + + long ret; + struct wait_queue_entry wq_entry; +}; + +extern long wait_event_loop(struct wait_queue_head *wq_head, struct wait_event_state *wes, int condition); + +#define wait_event_v2(wq_head, condition) \ +({ \ + struct wait_event_state __wes; \ + long __ret; \ + \ + __wes.queued = 0; \ + \ + do { \ + __ret = wait_event_loop(&(wq_head), &__wes, (condition) != 0); \ + } while (!__wes.done); \ + \ + __ret; \ +}) + +/* * The below macro ___wait_event() has an explicit shadow of the __ret * variable when used from the wait_event_*() macros. * @@ -277,7 +307,7 @@ __out: __ret; \ * wake_up() has to be called after changing any variable that could * change the result of the wait condition. */ -#define wait_event(wq_head, condition) \ +#define wait_event_v1(wq_head, condition) \ do { \ might_sleep(); \ if (condition) \ @@ -285,6 +315,8 @@ do { \ __wait_event(wq_head, condition); \ } while (0) +#define wait_event wait_event_v2 + #define __io_wait_event(wq_head, condition) \ (void)___wait_event(wq_head, condition, TASK_UNINTERRUPTIBLE, 0, 0, \ io_schedule()) diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c index 48794482d9ac..6d1e1139d5d3 100644 --- a/kernel/sched/wait.c +++ b/kernel/sched/wait.c @@ -293,6 +293,63 @@ static inline bool is_kthread_should_stop(void) } /* + * The main wait_event*() event loop iteration state machine. + * + * Note that this function itself does not loop, it returns to + * the caller to evaluate the call site dependent condition in + * every iteration. + */ +long wait_event_loop(struct wait_queue_head *wq_head, struct wait_event_state *wes, int condition) +{ + if (!wes->queued) { + might_sleep(); + /* + * If we are not initialized yet and the condition is already + * met, we can return immediately: + */ + if (condition) { + wes->done = 1; + return 0; + } + + /* Set up the wait-queue entry: */ + init_wait_entry(&wes->wq_entry, 0); + + wes->done = 0; + wes->queued = 1; + wes->prepared = 0; + wes->ret = 0; + } else { + /* Here is where we notice an updated wait condition: */ + if (condition) { + finish_wait(wq_head, &wes->wq_entry); + wes->done = 1; + return 0; + } + } + + if (!wes->prepared) { +prepare_again: + wes->ret = prepare_to_wait_event(wq_head, &wes->wq_entry, 0); + wes->prepared = 1; + + return 0; + } + + if (___wait_is_interruptible(TASK_UNINTERRUPTIBLE) && wes->ret) { + /* We already got dequeued, so mark it done: */ + wes->done = 1; + + /* But return any eventual interruption code: */ + return wes->ret; + } + + schedule(); + goto prepare_again; +} +EXPORT_SYMBOL_GPL(wait_event_loop); + +/* * DEFINE_WAIT_FUNC(wait, woken_wake_func); * * add_wait_queue(&wq_head, &wait);