* [PATCH v3 0/4] Simple wait queue support
@ 2015-10-20  7:28 Daniel Wagner
  2015-10-20  7:28 ` [PATCH v3 1/4] wait.[ch]: Introduce the simple waitqueue (swait) implementation Daniel Wagner
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Daniel Wagner @ 2015-10-20  7:28 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Daniel Wagner, Marcelo Tosatti, Paolo Bonzini, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra (Intel),
	Thomas Gleixner

Hi,

Only small updates in this version, like fixing mips and reordering
two patches to avoid a lockdep warning when doing git bisect, which
was reported by Fengguang Wu's build robot. Thanks!

Also removed the unnecessary initialization in the rcu patch as Paul
pointed out.

Hopefully, I did a better job on the Cc list this time.

These patches are against

  tip/master 11f4d95e6b634d7d41e7c2b521fcec261efbf769

also available as git tree:

  git://git.kernel.org/pub/scm/linux/kernel/git/wagi/linux.git tip-swait

cheers,
daniel

changes since v2
 - rebased again on tip/master. The patches apply
   cleanly on v4.3-rc6 too.
 - fixed up mips
 - reordered the patches to avoid a lockdep warning when doing bisect.
 - removed the unnecessary initialization of rsp->rda in rcu_init_one().

changes since v1 (PATCH v0)
 - rebased and fixed some typos found by cross building
   for S390, ARM and powerpc. For some unknown reason I didn't catch
   them last time.
 - dropped the completion patches because it is not clear yet
   how to handle complete_all() calls in hard-irq/atomic contexts
   together with swake_up_all().

changes since v0 (RFC v0)
 - promoted the series to PATCH state instead of RFC
 - fixed a few fallouts with build-all and some cross compilers
   such as ARM, PowerPC and S390.
 - Added the simple waitqueue transformation for KVM from -rt
   including some numbers requested by Paolo.
 - Added a commit message to PeterZ's patch. Hope he likes it.

[I got the numbering wrong in v1, so instead of 'PATCH v1' you will
 find it as the 'PATCH v0' series]

v1: http://lwn.net/Articles/656942/
v0: http://lwn.net/Articles/653586/

Daniel Wagner (1):
  rcu: Do not call rcu_nocb_gp_cleanup() while holding rnp->lock

Marcelo Tosatti (1):
  KVM: use simple waitqueue for vcpu->wq

Paul Gortmaker (1):
  rcu: use simple wait queues where possible in rcutree

Peter Zijlstra (Intel) (1):
  wait.[ch]: Introduce the simple waitqueue (swait) implementation

 arch/arm/kvm/arm.c                  |   4 +-
 arch/arm/kvm/psci.c                 |   4 +-
 arch/mips/kvm/mips.c                |   8 +-
 arch/powerpc/include/asm/kvm_host.h |   4 +-
 arch/powerpc/kvm/book3s_hv.c        |  23 +++--
 arch/s390/include/asm/kvm_host.h    |   2 +-
 arch/s390/kvm/interrupt.c           |   8 +-
 arch/x86/kvm/lapic.c                |   6 +-
 include/linux/kvm_host.h            |   5 +-
 include/linux/swait.h               | 172 ++++++++++++++++++++++++++++++++++++
 kernel/rcu/tree.c                   |  16 ++--
 kernel/rcu/tree.h                   |  10 ++-
 kernel/rcu/tree_plugin.h            |  32 ++++---
 kernel/sched/Makefile               |   2 +-
 kernel/sched/swait.c                | 122 +++++++++++++++++++++++++
 virt/kvm/async_pf.c                 |   4 +-
 virt/kvm/kvm_main.c                 |  17 ++--
 17 files changed, 373 insertions(+), 66 deletions(-)
 create mode 100644 include/linux/swait.h
 create mode 100644 kernel/sched/swait.c

-- 
2.4.3



* [PATCH v3 1/4] wait.[ch]: Introduce the simple waitqueue (swait) implementation
  2015-10-20  7:28 [PATCH v3 0/4] Simple wait queue support Daniel Wagner
@ 2015-10-20  7:28 ` Daniel Wagner
  2015-10-26 12:04   ` Boqun Feng
  2015-11-04 10:33   ` Thomas Gleixner
  2015-10-20  7:28 ` [PATCH v3 2/4] KVM: use simple waitqueue for vcpu->wq Daniel Wagner
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 26+ messages in thread
From: Daniel Wagner @ 2015-10-20  7:28 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Peter Zijlstra (Intel),
	Daniel Wagner, Paul Gortmaker, Marcelo Tosatti, Paolo Bonzini,
	Paul E. McKenney, Thomas Gleixner

From: "Peter Zijlstra (Intel)" <peterz@infradead.org>

The existing wait queue code supports custom wake up callbacks,
wake flags, a wake key (passed to the callback) and exclusive
flags that allow waiters to be tagged as exclusive, in order to
limit the number of tasks woken up.

In a lot of cases none of these features are used, and hence we
can benefit from a slimmed down version that lowers both the
memory and the runtime overhead.

The concept originated in -rt, where waitqueues are a constant
source of trouble, as we cannot convert the head lock to a raw
spinlock due to fancy and long-lasting callbacks.

With the removal of custom callbacks, we can use a raw lock for
queue list manipulations, hence allowing the simple wait support
to be used in -rt.
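
To illustrate the API, here is a minimal usage sketch; the my_dev
structure and the function names are made up for this example and
are not part of the patch:

struct my_dev {
	struct swait_queue_head	wq;
	bool			data_ready;
};

static void my_dev_init(struct my_dev *dev)
{
	init_swait_queue_head(&dev->wq);
	dev->data_ready = false;
}

/* Waiter: sleeps until data_ready is set (or a signal arrives). */
static int my_wait_for_data(struct my_dev *dev)
{
	return swait_event_interruptible(dev->wq, dev->data_ready);
}

/*
 * Waker: the queue head lock is a raw spinlock and swake_up()
 * wakes at most one waiter under it, so this is usable even from
 * hard interrupt context.
 */
static void my_dev_data_ready(struct my_dev *dev)
{
	dev->data_ready = true;
	swake_up(&dev->wq);
}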

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Mostly-Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: linux-kernel@vger.kernel.org

[The patch is from PeterZ and is based on Thomas' version.
 The commit message was written by Paul G.
 Some compile issues were fixed by Daniel.]
---
 include/linux/swait.h | 172 ++++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/Makefile |   2 +-
 kernel/sched/swait.c  | 122 +++++++++++++++++++++++++++++++++++
 3 files changed, 295 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/swait.h
 create mode 100644 kernel/sched/swait.c

diff --git a/include/linux/swait.h b/include/linux/swait.h
new file mode 100644
index 0000000..c1f9c62
--- /dev/null
+++ b/include/linux/swait.h
@@ -0,0 +1,172 @@
+#ifndef _LINUX_SWAIT_H
+#define _LINUX_SWAIT_H
+
+#include <linux/list.h>
+#include <linux/stddef.h>
+#include <linux/spinlock.h>
+#include <asm/current.h>
+
+/*
+ * Simple wait queues
+ *
+ * While these are very similar to the other/complex wait queues (wait.h) the
+ * most important difference is that the simple waitqueue allows for
+ * deterministic behaviour -- IOW it has strictly bounded IRQ and lock hold
+ * times.
+ *
+ * In order to make this so, we had to drop a fair number of features of the
+ * other waitqueue code; notably:
+ *
+ *  - mixing INTERRUPTIBLE and UNINTERRUPTIBLE sleeps on the same waitqueue;
+ *    all wakeups are TASK_NORMAL in order to avoid O(n) lookups for the right
+ *    sleeper state.
+ *
+ *  - the exclusive mode; because this requires preserving the list order
+ *    and this is hard.
+ *
+ *  - custom wake functions; because you cannot give any guarantees about
+ *    random code.
+ *
+ * As a side effect of this; the data structures are slimmer.
+ *
+ * One would recommend using this wait queue where possible.
+ */
+
+struct task_struct;
+
+struct swait_queue_head {
+	raw_spinlock_t		lock;
+	struct list_head	task_list;
+};
+
+struct swait_queue {
+	struct task_struct	*task;
+	struct list_head	task_list;
+};
+
+#define __SWAITQUEUE_INITIALIZER(name) {				\
+	.task		= current,					\
+	.task_list	= LIST_HEAD_INIT((name).task_list),		\
+}
+
+#define DECLARE_SWAITQUEUE(name)					\
+	struct swait_queue name = __SWAITQUEUE_INITIALIZER(name)
+
+#define __SWAIT_QUEUE_HEAD_INITIALIZER(name) {				\
+	.lock		= __RAW_SPIN_LOCK_UNLOCKED(name.lock),		\
+	.task_list	= LIST_HEAD_INIT((name).task_list),		\
+}
+
+#define DECLARE_SWAIT_QUEUE_HEAD(name)					\
+	struct swait_queue_head name = __SWAIT_QUEUE_HEAD_INITIALIZER(name)
+
+extern void __init_swait_queue_head(struct swait_queue_head *q, const char *name,
+				    struct lock_class_key *key);
+
+#define init_swait_queue_head(q)				\
+	do {							\
+		static struct lock_class_key __key;		\
+		__init_swait_queue_head((q), #q, &__key);	\
+	} while (0)
+
+#ifdef CONFIG_LOCKDEP
+# define __SWAIT_QUEUE_HEAD_INIT_ONSTACK(name)			\
+	({ init_swait_queue_head(&name); name; })
+# define DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(name)			\
+	struct swait_queue_head name = __SWAIT_QUEUE_HEAD_INIT_ONSTACK(name)
+#else
+# define DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(name)			\
+	DECLARE_SWAIT_QUEUE_HEAD(name)
+#endif
+
+static inline int swait_active(struct swait_queue_head *q)
+{
+	return !list_empty(&q->task_list);
+}
+
+extern void swake_up(struct swait_queue_head *q);
+extern void swake_up_all(struct swait_queue_head *q);
+extern void swake_up_locked(struct swait_queue_head *q);
+
+extern void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait);
+extern void prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait, int state);
+extern long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait, int state);
+
+extern void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait);
+extern void finish_swait(struct swait_queue_head *q, struct swait_queue *wait);
+
+/* as per ___wait_event() but for swait, therefore "exclusive == 0" */
+#define ___swait_event(wq, condition, state, ret, cmd)			\
+({									\
+	struct swait_queue __wait;					\
+	long __ret = ret;						\
+									\
+	INIT_LIST_HEAD(&__wait.task_list);				\
+	for (;;) {							\
+		long __int = prepare_to_swait_event(&wq, &__wait, state);\
+									\
+		if (condition)						\
+			break;						\
+									\
+		if (___wait_is_interruptible(state) && __int) {		\
+			__ret = __int;					\
+			break;						\
+		}							\
+									\
+		cmd;							\
+	}								\
+	finish_swait(&wq, &__wait);					\
+	__ret;								\
+})
+
+#define __swait_event(wq, condition)					\
+	(void)___swait_event(wq, condition, TASK_UNINTERRUPTIBLE, 0,	\
+			    schedule())
+
+#define swait_event(wq, condition)					\
+do {									\
+	if (condition)							\
+		break;							\
+	__swait_event(wq, condition);					\
+} while (0)
+
+#define __swait_event_timeout(wq, condition, timeout)			\
+	___swait_event(wq, ___wait_cond_timeout(condition),		\
+		      TASK_UNINTERRUPTIBLE, timeout,			\
+		      __ret = schedule_timeout(__ret))
+
+#define swait_event_timeout(wq, condition, timeout)			\
+({									\
+	long __ret = timeout;						\
+	if (!___wait_cond_timeout(condition))				\
+		__ret = __swait_event_timeout(wq, condition, timeout);	\
+	__ret;								\
+})
+
+#define __swait_event_interruptible(wq, condition)			\
+	___swait_event(wq, condition, TASK_INTERRUPTIBLE, 0,		\
+		      schedule())
+
+#define swait_event_interruptible(wq, condition)			\
+({									\
+	int __ret = 0;							\
+	if (!(condition))						\
+		__ret = __swait_event_interruptible(wq, condition);	\
+	__ret;								\
+})
+
+#define __swait_event_interruptible_timeout(wq, condition, timeout)	\
+	___swait_event(wq, ___wait_cond_timeout(condition),		\
+		      TASK_INTERRUPTIBLE, timeout,			\
+		      __ret = schedule_timeout(__ret))
+
+#define swait_event_interruptible_timeout(wq, condition, timeout)	\
+({									\
+	long __ret = timeout;						\
+	if (!___wait_cond_timeout(condition))				\
+		__ret = __swait_event_interruptible_timeout(wq,		\
+						condition, timeout);	\
+	__ret;								\
+})
+
+#endif /* _LINUX_SWAIT_H */
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 6768797..7d4cba2 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -13,7 +13,7 @@ endif
 
 obj-y += core.o loadavg.o clock.o cputime.o
 obj-y += idle_task.o fair.o rt.o deadline.o stop_task.o
-obj-y += wait.o completion.o idle.o
+obj-y += wait.o swait.o completion.o idle.o
 obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o
 obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
 obj-$(CONFIG_SCHEDSTATS) += stats.o
diff --git a/kernel/sched/swait.c b/kernel/sched/swait.c
new file mode 100644
index 0000000..533710e
--- /dev/null
+++ b/kernel/sched/swait.c
@@ -0,0 +1,122 @@
+#include <linux/sched.h>
+#include <linux/swait.h>
+
+void __init_swait_queue_head(struct swait_queue_head *q, const char *name,
+			     struct lock_class_key *key)
+{
+	raw_spin_lock_init(&q->lock);
+	lockdep_set_class_and_name(&q->lock, key, name);
+	INIT_LIST_HEAD(&q->task_list);
+}
+EXPORT_SYMBOL(__init_swait_queue_head);
+
+/*
+ * The thing about the wake_up_state() return value; I think we can ignore it.
+ *
+ * If for some reason it would return 0, that means the previously waiting
+ * task is already running, so it will observe condition true (or has already).
+ */
+void swake_up_locked(struct swait_queue_head *q)
+{
+	struct swait_queue *curr;
+
+	list_for_each_entry(curr, &q->task_list, task_list) {
+		wake_up_process(curr->task);
+		list_del_init(&curr->task_list);
+		break;
+	}
+}
+EXPORT_SYMBOL(swake_up_locked);
+
+void swake_up(struct swait_queue_head *q)
+{
+	unsigned long flags;
+
+	if (!swait_active(q))
+		return;
+
+	raw_spin_lock_irqsave(&q->lock, flags);
+	swake_up_locked(q);
+	raw_spin_unlock_irqrestore(&q->lock, flags);
+}
+EXPORT_SYMBOL(swake_up);
+
+/*
+ * Does not allow usage from IRQ disabled, since we must be able to
+ * release IRQs to guarantee bounded hold time.
+ */
+void swake_up_all(struct swait_queue_head *q)
+{
+	struct swait_queue *curr;
+	LIST_HEAD(tmp);
+
+	if (!swait_active(q))
+		return;
+
+	raw_spin_lock_irq(&q->lock);
+	list_splice_init(&q->task_list, &tmp);
+	while (!list_empty(&tmp)) {
+		curr = list_first_entry(&tmp, typeof(*curr), task_list);
+
+		wake_up_state(curr->task, TASK_NORMAL);
+		list_del_init(&curr->task_list);
+
+		if (list_empty(&tmp))
+			break;
+
+		raw_spin_unlock_irq(&q->lock);
+		raw_spin_lock_irq(&q->lock);
+	}
+	raw_spin_unlock_irq(&q->lock);
+}
+EXPORT_SYMBOL(swake_up_all);
+
+void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait)
+{
+	wait->task = current;
+	if (list_empty(&wait->task_list))
+		list_add(&wait->task_list, &q->task_list);
+}
+
+void prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait, int state)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&q->lock, flags);
+	__prepare_to_swait(q, wait);
+	set_current_state(state);
+	raw_spin_unlock_irqrestore(&q->lock, flags);
+}
+EXPORT_SYMBOL(prepare_to_swait);
+
+long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait, int state)
+{
+	if (signal_pending_state(state, current))
+		return -ERESTARTSYS;
+
+	prepare_to_swait(q, wait, state);
+
+	return 0;
+}
+EXPORT_SYMBOL(prepare_to_swait_event);
+
+void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait)
+{
+	__set_current_state(TASK_RUNNING);
+	if (!list_empty(&wait->task_list))
+		list_del_init(&wait->task_list);
+}
+
+void finish_swait(struct swait_queue_head *q, struct swait_queue *wait)
+{
+	unsigned long flags;
+
+	__set_current_state(TASK_RUNNING);
+
+	if (!list_empty_careful(&wait->task_list)) {
+		raw_spin_lock_irqsave(&q->lock, flags);
+		list_del_init(&wait->task_list);
+		raw_spin_unlock_irqrestore(&q->lock, flags);
+	}
+}
+EXPORT_SYMBOL(finish_swait);
-- 
2.4.3



* [PATCH v3 2/4] KVM: use simple waitqueue for vcpu->wq
  2015-10-20  7:28 [PATCH v3 0/4] Simple wait queue support Daniel Wagner
  2015-10-20  7:28 ` [PATCH v3 1/4] wait.[ch]: Introduce the simple waitqueue (swait) implementation Daniel Wagner
@ 2015-10-20  7:28 ` Daniel Wagner
  2015-10-20 13:11   ` Paolo Bonzini
  2015-10-20 14:00   ` Peter Zijlstra
  2015-10-20  7:28 ` [PATCH v3 3/4] rcu: Do not call rcu_nocb_gp_cleanup() while holding rnp->lock Daniel Wagner
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 26+ messages in thread
From: Daniel Wagner @ 2015-10-20  7:28 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Marcelo Tosatti, Daniel Wagner, Paolo Bonzini, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra (Intel),
	Thomas Gleixner

From: Marcelo Tosatti <mtosatti@redhat.com>

The problem:

On -rt, an emulated LAPIC timer instance has the following path:

1) hard interrupt
2) ksoftirqd is scheduled
3) ksoftirqd wakes up vcpu thread
4) vcpu thread is scheduled

This extra context switch introduces unnecessary latency in the
LAPIC path for a KVM guest.

The solution:

Allow waking up the vcpu thread from hardirq context,
thus avoiding the need for ksoftirqd to be scheduled.

Normal waitqueues make use of spinlocks, which on -RT
are sleepable locks. Therefore, waking up a waitqueue
waiter involves locking a sleeping lock, which
is not allowed from hard interrupt context.
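
Schematically, the pattern this enables looks like this; my_timer_fn()
and my_timer_to_vcpu() are illustrative names only, not the actual
lapic.c code:

/*
 * Runs in hard interrupt context. The spinlock taken by
 * wake_up_interruptible() is a sleeping lock on -RT, so it is
 * illegal here; swake_up() only takes a raw spinlock.
 */
static enum hrtimer_restart my_timer_fn(struct hrtimer *t)
{
	struct kvm_vcpu *vcpu = my_timer_to_vcpu(t);

	if (swait_active(&vcpu->wq))
		swake_up(&vcpu->wq);

	return HRTIMER_NORESTART;
}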

Measured with cyclictest, this patch reduces the average latency in my
tests from 14us to 11us.

Daniel writes:
Paolo asked for numbers from the kvm-unit-tests/tscdeadline_latency
benchmark on mainline. The test was run 382 and 300 times, respectively:

  ./x86-run x86/tscdeadline_latency.flat -cpu host

with idle=poll.

The test does not seem to deliver really stable numbers, though most
of them are smaller, which I consider a good sign:

Before:

	min              max          mean            std
count   382.000000       382.000000    382.000000     382.000000
mean   6068.552356    269502.528796   8056.016198    3912.128273
std     707.404966    848866.474783   1062.472704    9835.891707
min    2335.000000     29828.000000   7337.426000     445.738750
25%    6004.500000     44237.500000   7471.094250    1078.834837
50%    6372.000000     64175.000000   7663.133700    1783.172446
75%    6465.500000    150384.500000   8210.771900    2759.734524
max    6886.000000  10188451.000000  15466.434000  120469.205668

After:

	min             max          mean           std
count   300.000000      300.000000    300.000000    300.000000
mean   5618.380000   217464.786667   7745.545114   3258.483272
std     824.719741   516371.888369    847.391685   5632.943904
min    3494.000000    31410.000000   7083.574800    438.445477
25%    4937.000000    45446.000000   7214.102850   1045.536261
50%    6118.000000    67023.000000   7417.330800   1699.574075
75%    6224.000000   134191.500000   7871.625600   2809.536185
max    6654.000000  4570896.000000  13528.788600  52206.226799

[Patch was originally based on the swait implementation found in the -rt
 tree. Daniel ported it to mainline's version and gathered the
 benchmark numbers for the tscdeadline_latency test.]

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: linux-kernel@vger.kernel.org
---
 arch/arm/kvm/arm.c                  |  4 ++--
 arch/arm/kvm/psci.c                 |  4 ++--
 arch/mips/kvm/mips.c                |  8 ++++----
 arch/powerpc/include/asm/kvm_host.h |  4 ++--
 arch/powerpc/kvm/book3s_hv.c        | 23 +++++++++++------------
 arch/s390/include/asm/kvm_host.h    |  2 +-
 arch/s390/kvm/interrupt.c           |  8 ++++----
 arch/x86/kvm/lapic.c                |  6 +++---
 include/linux/kvm_host.h            |  5 +++--
 virt/kvm/async_pf.c                 |  4 ++--
 virt/kvm/kvm_main.c                 | 17 ++++++++---------
 11 files changed, 42 insertions(+), 43 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index dc017ad..97e8336 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -470,9 +470,9 @@ bool kvm_arch_intc_initialized(struct kvm *kvm)
 
 static void vcpu_pause(struct kvm_vcpu *vcpu)
 {
-	wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu);
+	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
 
-	wait_event_interruptible(*wq, !vcpu->arch.pause);
+	swait_event_interruptible(*wq, !vcpu->arch.pause);
 }
 
 static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
index ad6f642..2b93577 100644
--- a/arch/arm/kvm/psci.c
+++ b/arch/arm/kvm/psci.c
@@ -70,7 +70,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 {
 	struct kvm *kvm = source_vcpu->kvm;
 	struct kvm_vcpu *vcpu = NULL;
-	wait_queue_head_t *wq;
+	struct swait_queue_head *wq;
 	unsigned long cpu_id;
 	unsigned long context_id;
 	phys_addr_t target_pc;
@@ -119,7 +119,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 	smp_mb();		/* Make sure the above is visible */
 
 	wq = kvm_arch_vcpu_wq(vcpu);
-	wake_up_interruptible(wq);
+	swake_up(wq);
 
 	return PSCI_RET_SUCCESS;
 }
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 49ff3bf..290161d 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -442,8 +442,8 @@ int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu,
 
 	dvcpu->arch.wait = 0;
 
-	if (waitqueue_active(&dvcpu->wq))
-		wake_up_interruptible(&dvcpu->wq);
+	if (swait_active(&dvcpu->wq))
+		swake_up(&dvcpu->wq);
 
 	return 0;
 }
@@ -1171,8 +1171,8 @@ static void kvm_mips_comparecount_func(unsigned long data)
 	kvm_mips_callbacks->queue_timer_int(vcpu);
 
 	vcpu->arch.wait = 0;
-	if (waitqueue_active(&vcpu->wq))
-		wake_up_interruptible(&vcpu->wq);
+	if (swait_active(&vcpu->wq))
+		swake_up(&vcpu->wq);
 }
 
 /* low level hrtimer wake routine */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 827a38d..12e9835 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -286,7 +286,7 @@ struct kvmppc_vcore {
 	struct list_head runnable_threads;
 	struct list_head preempt_list;
 	spinlock_t lock;
-	wait_queue_head_t wq;
+	struct swait_queue_head wq;
 	spinlock_t stoltb_lock;	/* protects stolen_tb and preempt_tb */
 	u64 stolen_tb;
 	u64 preempt_tb;
@@ -628,7 +628,7 @@ struct kvm_vcpu_arch {
 	u8 prodded;
 	u32 last_inst;
 
-	wait_queue_head_t *wqp;
+	struct swait_queue_head *wqp;
 	struct kvmppc_vcore *vcore;
 	int ret;
 	int trap;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2280497..f534e15 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -121,11 +121,11 @@ static bool kvmppc_ipi_thread(int cpu)
 static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 {
 	int cpu;
-	wait_queue_head_t *wqp;
+	struct swait_queue_head *wqp;
 
 	wqp = kvm_arch_vcpu_wq(vcpu);
-	if (waitqueue_active(wqp)) {
-		wake_up_interruptible(wqp);
+	if (swait_active(wqp)) {
+		swake_up(wqp);
 		++vcpu->stat.halt_wakeup;
 	}
 
@@ -708,8 +708,8 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 		tvcpu->arch.prodded = 1;
 		smp_mb();
 		if (vcpu->arch.ceded) {
-			if (waitqueue_active(&vcpu->wq)) {
-				wake_up_interruptible(&vcpu->wq);
+			if (swait_active(&vcpu->wq)) {
+				swake_up(&vcpu->wq);
 				vcpu->stat.halt_wakeup++;
 			}
 		}
@@ -1448,7 +1448,7 @@ static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
 	INIT_LIST_HEAD(&vcore->runnable_threads);
 	spin_lock_init(&vcore->lock);
 	spin_lock_init(&vcore->stoltb_lock);
-	init_waitqueue_head(&vcore->wq);
+	init_swait_queue_head(&vcore->wq);
 	vcore->preempt_tb = TB_NIL;
 	vcore->lpcr = kvm->arch.lpcr;
 	vcore->first_vcpuid = core * threads_per_subcore;
@@ -2560,10 +2560,9 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
 {
 	struct kvm_vcpu *vcpu;
 	int do_sleep = 1;
+	DECLARE_SWAITQUEUE(wait);
 
-	DEFINE_WAIT(wait);
-
-	prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE);
+	prepare_to_swait(&vc->wq, &wait, TASK_INTERRUPTIBLE);
 
 	/*
 	 * Check one last time for pending exceptions and ceded state after
@@ -2577,7 +2576,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
 	}
 
 	if (!do_sleep) {
-		finish_wait(&vc->wq, &wait);
+		finish_swait(&vc->wq, &wait);
 		return;
 	}
 
@@ -2585,7 +2584,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
 	trace_kvmppc_vcore_blocked(vc, 0);
 	spin_unlock(&vc->lock);
 	schedule();
-	finish_wait(&vc->wq, &wait);
+	finish_swait(&vc->wq, &wait);
 	spin_lock(&vc->lock);
 	vc->vcore_state = VCORE_INACTIVE;
 	trace_kvmppc_vcore_blocked(vc, 1);
@@ -2641,7 +2640,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 			kvmppc_start_thread(vcpu, vc);
 			trace_kvm_guest_enter(vcpu);
 		} else if (vc->vcore_state == VCORE_SLEEPING) {
-			wake_up(&vc->wq);
+			swake_up(&vc->wq);
 		}
 
 	}
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 8ced426..a044ddb 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -427,7 +427,7 @@ struct kvm_s390_irq_payload {
 struct kvm_s390_local_interrupt {
 	spinlock_t lock;
 	struct kvm_s390_float_interrupt *float_int;
-	wait_queue_head_t *wq;
+	struct swait_queue_head *wq;
 	atomic_t *cpuflags;
 	DECLARE_BITMAP(sigp_emerg_pending, KVM_MAX_VCPUS);
 	struct kvm_s390_irq_payload irq;
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 5c2c169..78625fa 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -884,13 +884,13 @@ no_timer:
 
 void kvm_s390_vcpu_wakeup(struct kvm_vcpu *vcpu)
 {
-	if (waitqueue_active(&vcpu->wq)) {
+	if (swait_active(&vcpu->wq)) {
 		/*
 		 * The vcpu gave up the cpu voluntarily, mark it as a good
 		 * yield-candidate.
 		 */
 		vcpu->preempted = true;
-		wake_up_interruptible(&vcpu->wq);
+		swake_up(&vcpu->wq);
 		vcpu->stat.halt_wakeup++;
 	}
 }
@@ -994,7 +994,7 @@ int kvm_s390_inject_program_int(struct kvm_vcpu *vcpu, u16 code)
 	spin_lock(&li->lock);
 	irq.u.pgm.code = code;
 	__inject_prog(vcpu, &irq);
-	BUG_ON(waitqueue_active(li->wq));
+	BUG_ON(swait_active(li->wq));
 	spin_unlock(&li->lock);
 	return 0;
 }
@@ -1009,7 +1009,7 @@ int kvm_s390_inject_prog_irq(struct kvm_vcpu *vcpu,
 	spin_lock(&li->lock);
 	irq.u.pgm = *pgm_info;
 	rc = __inject_prog(vcpu, &irq);
-	BUG_ON(waitqueue_active(li->wq));
+	BUG_ON(swait_active(li->wq));
 	spin_unlock(&li->lock);
 	return rc;
 }
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 8d9013c..a59aead 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1117,7 +1117,7 @@ static void apic_update_lvtt(struct kvm_lapic *apic)
 static void apic_timer_expired(struct kvm_lapic *apic)
 {
 	struct kvm_vcpu *vcpu = apic->vcpu;
-	wait_queue_head_t *q = &vcpu->wq;
+	struct swait_queue_head *q = &vcpu->wq;
 	struct kvm_timer *ktimer = &apic->lapic_timer;
 
 	if (atomic_read(&apic->lapic_timer.pending))
@@ -1126,8 +1126,8 @@ static void apic_timer_expired(struct kvm_lapic *apic)
 	atomic_inc(&apic->lapic_timer.pending);
 	kvm_set_pending_timer(vcpu);
 
-	if (waitqueue_active(q))
-		wake_up_interruptible(q);
+	if (swait_active(q))
+		swake_up(q);
 
 	if (apic_lvtt_tscdeadline(apic))
 		ktimer->expired_tscdeadline = ktimer->tscdeadline;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1bef9e2..7b6231e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -24,6 +24,7 @@
 #include <linux/err.h>
 #include <linux/irqflags.h>
 #include <linux/context_tracking.h>
+#include <linux/swait.h>
 #include <asm/signal.h>
 
 #include <linux/kvm.h>
@@ -237,7 +238,7 @@ struct kvm_vcpu {
 	int fpu_active;
 	int guest_fpu_loaded, guest_xcr0_loaded;
 	unsigned char fpu_counter;
-	wait_queue_head_t wq;
+	struct swait_queue_head wq;
 	struct pid *pid;
 	int sigset_active;
 	sigset_t sigset;
@@ -759,7 +760,7 @@ static inline bool kvm_arch_has_assigned_device(struct kvm *kvm)
 }
 #endif
 
-static inline wait_queue_head_t *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu)
+static inline struct swait_queue_head *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu)
 {
 #ifdef __KVM_HAVE_ARCH_WQP
 	return vcpu->arch.wqp;
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 44660ae..ff4891c 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -94,8 +94,8 @@ static void async_pf_execute(struct work_struct *work)
 
 	trace_kvm_async_pf_completed(addr, gva);
 
-	if (waitqueue_active(&vcpu->wq))
-		wake_up_interruptible(&vcpu->wq);
+	if (swait_active(&vcpu->wq))
+		swake_up(&vcpu->wq);
 
 	mmput(mm);
 	kvm_put_kvm(vcpu->kvm);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8db1d93..45ab55f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -226,8 +226,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 	vcpu->kvm = kvm;
 	vcpu->vcpu_id = id;
 	vcpu->pid = NULL;
-	vcpu->halt_poll_ns = 0;
-	init_waitqueue_head(&vcpu->wq);
+	init_swait_queue_head(&vcpu->wq);
 	kvm_async_pf_vcpu_init(vcpu);
 
 	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
@@ -1996,7 +1995,7 @@ static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
 void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 {
 	ktime_t start, cur;
-	DEFINE_WAIT(wait);
+	DECLARE_SWAITQUEUE(wait);
 	bool waited = false;
 	u64 block_ns;
 
@@ -2019,7 +2018,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 	}
 
 	for (;;) {
-		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
+		prepare_to_swait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
 
 		if (kvm_vcpu_check_block(vcpu) < 0)
 			break;
@@ -2028,7 +2027,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 		schedule();
 	}
 
-	finish_wait(&vcpu->wq, &wait);
+	finish_swait(&vcpu->wq, &wait);
 	cur = ktime_get();
 
 out:
@@ -2059,11 +2058,11 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
 {
 	int me;
 	int cpu = vcpu->cpu;
-	wait_queue_head_t *wqp;
+	struct swait_queue_head *wqp;
 
 	wqp = kvm_arch_vcpu_wq(vcpu);
-	if (waitqueue_active(wqp)) {
-		wake_up_interruptible(wqp);
+	if (swait_active(wqp)) {
+		swake_up(wqp);
 		++vcpu->stat.halt_wakeup;
 	}
 
@@ -2164,7 +2163,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 				continue;
 			if (vcpu == me)
 				continue;
-			if (waitqueue_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
+			if (swait_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
 				continue;
 			if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
 				continue;
-- 
2.4.3



* [PATCH v3 3/4] rcu: Do not call rcu_nocb_gp_cleanup() while holding rnp->lock
  2015-10-20  7:28 [PATCH v3 0/4] Simple wait queue support Daniel Wagner
  2015-10-20  7:28 ` [PATCH v3 1/4] wait.[ch]: Introduce the simple waitqueue (swait) implementation Daniel Wagner
  2015-10-20  7:28 ` [PATCH v3 2/4] KVM: use simple waitqueue for vcpu->wq Daniel Wagner
@ 2015-10-20  7:28 ` Daniel Wagner
  2015-10-20  7:28 ` [PATCH v3 4/4] rcu: use simple wait queues where possible in rcutree Daniel Wagner
  2015-10-25 20:10 ` [PATCH v3 0/4] Simple wait queue support Paul E. McKenney
  4 siblings, 0 replies; 26+ messages in thread
From: Daniel Wagner @ 2015-10-20  7:28 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Daniel Wagner, Paul E. McKenney, Peter Zijlstra, Thomas Gleixner,
	Marcelo Tosatti, Paolo Bonzini, Paul Gortmaker

rcu_nocb_gp_cleanup() is called while holding rnp->lock. Currently,
this is okay because the wake_up_all() in rcu_nocb_gp_cleanup() does
not enable IRQs, so lockdep is happy.

By switching over to swait this is no longer true. swake_up_all()
enables IRQs while processing the waiters. __do_softirq() can then
run and will eventually call rcu_process_callbacks(), which wants to
grab rnp->lock.

Let's move the rcu_nocb_gp_cleanup() call outside the lock before we
switch over to swait.

If we were to hold rnp->lock and use swait, lockdep reports the
following:

 =================================
 [ INFO: inconsistent lock state ]
 4.2.0-rc5-00025-g9a73ba0 #136 Not tainted
 ---------------------------------
 inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
 rcu_preempt/8 [HC0[0]:SC0[0]:HE1:SE1] takes:
  (rcu_node_1){+.?...}, at: [<ffffffff811387c7>] rcu_gp_kthread+0xb97/0xeb0
 {IN-SOFTIRQ-W} state was registered at:
   [<ffffffff81109b9f>] __lock_acquire+0xd5f/0x21e0
   [<ffffffff8110be0f>] lock_acquire+0xdf/0x2b0
   [<ffffffff81841cc9>] _raw_spin_lock_irqsave+0x59/0xa0
   [<ffffffff81136991>] rcu_process_callbacks+0x141/0x3c0
   [<ffffffff810b1a9d>] __do_softirq+0x14d/0x670
   [<ffffffff810b2214>] irq_exit+0x104/0x110
   [<ffffffff81844e96>] smp_apic_timer_interrupt+0x46/0x60
   [<ffffffff81842e70>] apic_timer_interrupt+0x70/0x80
   [<ffffffff810dba66>] rq_attach_root+0xa6/0x100
   [<ffffffff810dbc2d>] cpu_attach_domain+0x16d/0x650
   [<ffffffff810e4b42>] build_sched_domains+0x942/0xb00
   [<ffffffff821777c2>] sched_init_smp+0x509/0x5c1
   [<ffffffff821551e3>] kernel_init_freeable+0x172/0x28f
   [<ffffffff8182cdce>] kernel_init+0xe/0xe0
   [<ffffffff8184231f>] ret_from_fork+0x3f/0x70
 irq event stamp: 76
 hardirqs last  enabled at (75): [<ffffffff81841330>] _raw_spin_unlock_irq+0x30/0x60
 hardirqs last disabled at (76): [<ffffffff8184116f>] _raw_spin_lock_irq+0x1f/0x90
 softirqs last  enabled at (0): [<ffffffff810a8df2>] copy_process.part.26+0x602/0x1cf0
 softirqs last disabled at (0): [<          (null)>]           (null)
 other info that might help us debug this:
  Possible unsafe locking scenario:
        CPU0
        ----
   lock(rcu_node_1);
   <Interrupt>
     lock(rcu_node_1);
  *** DEADLOCK ***
 1 lock held by rcu_preempt/8:
  #0:  (rcu_node_1){+.?...}, at: [<ffffffff811387c7>] rcu_gp_kthread+0xb97/0xeb0
 stack backtrace:
 CPU: 0 PID: 8 Comm: rcu_preempt Not tainted 4.2.0-rc5-00025-g9a73ba0 #136
 Hardware name: Dell Inc. PowerEdge R820/066N7P, BIOS 2.0.20 01/16/2014
  0000000000000000 000000006d7e67d8 ffff881fb081fbd8 ffffffff818379e0
  0000000000000000 ffff881fb0812a00 ffff881fb081fc38 ffffffff8110813b
  0000000000000000 0000000000000001 ffff881f00000001 ffffffff8102fa4f
 Call Trace:
  [<ffffffff818379e0>] dump_stack+0x4f/0x7b
  [<ffffffff8110813b>] print_usage_bug+0x1db/0x1e0
  [<ffffffff8102fa4f>] ? save_stack_trace+0x2f/0x50
  [<ffffffff811087ad>] mark_lock+0x66d/0x6e0
  [<ffffffff81107790>] ? check_usage_forwards+0x150/0x150
  [<ffffffff81108898>] mark_held_locks+0x78/0xa0
  [<ffffffff81841330>] ? _raw_spin_unlock_irq+0x30/0x60
  [<ffffffff81108a28>] trace_hardirqs_on_caller+0x168/0x220
  [<ffffffff81108aed>] trace_hardirqs_on+0xd/0x10
  [<ffffffff81841330>] _raw_spin_unlock_irq+0x30/0x60
  [<ffffffff810fd1c7>] swake_up_all+0xb7/0xe0
  [<ffffffff811386e1>] rcu_gp_kthread+0xab1/0xeb0
  [<ffffffff811089bf>] ? trace_hardirqs_on_caller+0xff/0x220
  [<ffffffff81841341>] ? _raw_spin_unlock_irq+0x41/0x60
  [<ffffffff81137c30>] ? rcu_barrier+0x20/0x20
  [<ffffffff810d2014>] kthread+0x104/0x120
  [<ffffffff81841330>] ? _raw_spin_unlock_irq+0x30/0x60
  [<ffffffff810d1f10>] ? kthread_create_on_node+0x260/0x260
  [<ffffffff8184231f>] ret_from_fork+0x3f/0x70
  [<ffffffff810d1f10>] ? kthread_create_on_node+0x260/0x260
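
The fix follows a common pattern: pick up the wait queue head while
still holding the lock and issue the (IRQ-enabling) wakeup only after
the lock has been dropped. Schematically (condensed from the diff
below, not a literal hunk):

	struct swait_queue_head *sq;

	raw_spin_lock_irq(&rnp->lock);
	/* ... cleanup that must stay under rnp->lock ... */
	sq = rcu_nocb_gp_get(rnp);	/* remember the queue head */
	raw_spin_unlock_irq(&rnp->lock);
	rcu_nocb_gp_cleanup(sq);	/* may enable IRQs; now safe */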

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
---
 kernel/rcu/tree.c        |  4 +++-
 kernel/rcu/tree.h        |  3 ++-
 kernel/rcu/tree_plugin.h | 16 +++++++++++++---
 3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 775d36c..952536d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1568,7 +1568,6 @@ static int rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
 	int needmore;
 	struct rcu_data *rdp = this_cpu_ptr(rsp->rda);
 
-	rcu_nocb_gp_cleanup(rsp, rnp);
 	rnp->need_future_gp[c & 0x1] = 0;
 	needmore = rnp->need_future_gp[(c + 1) & 0x1];
 	trace_rcu_future_gp(rnp, rdp, c,
@@ -1972,6 +1971,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
 	int nocb = 0;
 	struct rcu_data *rdp;
 	struct rcu_node *rnp = rcu_get_root(rsp);
+	struct swait_queue_head *sq;
 
 	WRITE_ONCE(rsp->gp_activity, jiffies);
 	raw_spin_lock_irq(&rnp->lock);
@@ -2010,7 +2010,9 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
 			needgp = __note_gp_changes(rsp, rnp, rdp) || needgp;
 		/* smp_mb() provided by prior unlock-lock pair. */
 		nocb += rcu_future_gp_cleanup(rsp, rnp);
+		sq = rcu_nocb_gp_get(rnp);
 		raw_spin_unlock_irq(&rnp->lock);
+		rcu_nocb_gp_cleanup(sq);
 		cond_resched_rcu_qs();
 		WRITE_ONCE(rsp->gp_activity, jiffies);
 		rcu_gp_slow(rsp, gp_cleanup_delay);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 2e991f8..3dcf6368 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -608,7 +608,8 @@ static void zero_cpu_stall_ticks(struct rcu_data *rdp);
 static void increment_cpu_stall_ticks(void);
 static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu);
 static void rcu_nocb_gp_set(struct rcu_node *rnp, int nrq);
-static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp);
+static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp);
+static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
 static void rcu_init_one_nocb(struct rcu_node *rnp);
 static bool __call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *rhp,
 			    bool lazy, unsigned long flags);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index b2bf396..db4f357 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1777,9 +1777,9 @@ early_param("rcu_nocb_poll", parse_rcu_nocb_poll);
  * Wake up any no-CBs CPUs' kthreads that were waiting on the just-ended
  * grace period.
  */
-static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
+static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq)
 {
-	wake_up_all(&rnp->nocb_gp_wq[rnp->completed & 0x1]);
+	wake_up_all(sq);
 }
 
 /*
@@ -1795,6 +1795,11 @@ static void rcu_nocb_gp_set(struct rcu_node *rnp, int nrq)
 	rnp->need_future_gp[(rnp->completed + 1) & 0x1] += nrq;
 }
 
+static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp)
+{
+	return &rnp->nocb_gp_wq[rnp->completed & 0x1];
+}
+
 static void rcu_init_one_nocb(struct rcu_node *rnp)
 {
 	init_waitqueue_head(&rnp->nocb_gp_wq[0]);
@@ -2469,7 +2474,7 @@ static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
 	return false;
 }
 
-static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
+static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq)
 {
 }
 
@@ -2477,6 +2482,11 @@ static void rcu_nocb_gp_set(struct rcu_node *rnp, int nrq)
 {
 }
 
+static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp)
+{
+	return NULL;
+}
+
 static void rcu_init_one_nocb(struct rcu_node *rnp)
 {
 }
-- 
2.4.3



* [PATCH v3 4/4] rcu: use simple wait queues where possible in rcutree
  2015-10-20  7:28 [PATCH v3 0/4] Simple wait queue support Daniel Wagner
                   ` (2 preceding siblings ...)
  2015-10-20  7:28 ` [PATCH v3 3/4] rcu: Do not call rcu_nocb_gp_cleanup() while holding rnp->lock Daniel Wagner
@ 2015-10-20  7:28 ` Daniel Wagner
  2015-10-25 20:10 ` [PATCH v3 0/4] Simple wait queue support Paul E. McKenney
  4 siblings, 0 replies; 26+ messages in thread
From: Daniel Wagner @ 2015-10-20  7:28 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Paul Gortmaker, Daniel Wagner, Paul E. McKenney,
	Peter Zijlstra (Intel),
	Thomas Gleixner, Marcelo Tosatti, Paolo Bonzini

From: Paul Gortmaker <paul.gortmaker@windriver.com>

As of commit dae6e64d2bcfd4b06304ab864c7e3a4f6b5fedf4 ("rcu: Introduce
proper blocking to no-CBs kthreads GP waits") the RCU subsystem started
making use of wait queues.

Here we convert all additions of RCU wait queues to use simple wait queues,
since they don't need the extra overhead of the full wait queue features.

Originally this was done for RT kernels[1], since we would get things like...

  BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
  in_atomic(): 1, irqs_disabled(): 1, pid: 8, name: rcu_preempt
  Pid: 8, comm: rcu_preempt Not tainted
  Call Trace:
   [<ffffffff8106c8d0>] __might_sleep+0xd0/0xf0
   [<ffffffff817d77b4>] rt_spin_lock+0x24/0x50
   [<ffffffff8106fcf6>] __wake_up+0x36/0x70
   [<ffffffff810c4542>] rcu_gp_kthread+0x4d2/0x680
   [<ffffffff8105f910>] ? __init_waitqueue_head+0x50/0x50
   [<ffffffff810c4070>] ? rcu_gp_fqs+0x80/0x80
   [<ffffffff8105eabb>] kthread+0xdb/0xe0
   [<ffffffff8106b912>] ? finish_task_switch+0x52/0x100
   [<ffffffff817e0754>] kernel_thread_helper+0x4/0x10
   [<ffffffff8105e9e0>] ? __init_kthread_worker+0x60/0x60
   [<ffffffff817e0750>] ? gs_change+0xb/0xb

...and hence simple wait queues were deployed on RT out of necessity
(as simple wait uses a raw lock), but mainline might as well take
advantage of the more streamlined support as well.

[1] This is a carry forward of work from v3.10-rt; the original conversion
was by Thomas on an earlier -rt version, and Sebastian extended it to
additional post-3.10 added RCU waiters; here I've added a commit log and
unified the RCU changes into one, and uprev'd it to match mainline RCU.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
---
 kernel/rcu/tree.c        | 12 ++++++------
 kernel/rcu/tree.h        |  7 ++++---
 kernel/rcu/tree_plugin.h | 18 +++++++++---------
 3 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 952536d..a927ef78 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1588,7 +1588,7 @@ static void rcu_gp_kthread_wake(struct rcu_state *rsp)
 	    !READ_ONCE(rsp->gp_flags) ||
 	    !rsp->gp_kthread)
 		return;
-	wake_up(&rsp->gp_wq);
+	swake_up(&rsp->gp_wq);
 }
 
 /*
@@ -2059,7 +2059,7 @@ static int __noreturn rcu_gp_kthread(void *arg)
 					       READ_ONCE(rsp->gpnum),
 					       TPS("reqwait"));
 			rsp->gp_state = RCU_GP_WAIT_GPS;
-			wait_event_interruptible(rsp->gp_wq,
+			swait_event_interruptible(rsp->gp_wq,
 						 READ_ONCE(rsp->gp_flags) &
 						 RCU_GP_FLAG_INIT);
 			rsp->gp_state = RCU_GP_DONE_GPS;
@@ -2089,7 +2089,7 @@ static int __noreturn rcu_gp_kthread(void *arg)
 					       READ_ONCE(rsp->gpnum),
 					       TPS("fqswait"));
 			rsp->gp_state = RCU_GP_WAIT_FQS;
-			ret = wait_event_interruptible_timeout(rsp->gp_wq,
+			ret = swait_event_interruptible_timeout(rsp->gp_wq,
 					rcu_gp_fqs_check_wake(rsp, &gf), j);
 			rsp->gp_state = RCU_GP_DOING_FQS;
 			/* Locking provides needed memory barriers. */
@@ -2212,7 +2212,7 @@ static void rcu_report_qs_rsp(struct rcu_state *rsp, unsigned long flags)
 	WARN_ON_ONCE(!rcu_gp_in_progress(rsp));
 	WRITE_ONCE(rsp->gp_flags, READ_ONCE(rsp->gp_flags) | RCU_GP_FLAG_FQS);
 	raw_spin_unlock_irqrestore(&rcu_get_root(rsp)->lock, flags);
-	rcu_gp_kthread_wake(rsp);
+	swake_up(&rsp->gp_wq);  /* Memory barrier implied by swake_up() path. */
 }
 
 /*
@@ -2873,7 +2873,7 @@ static void force_quiescent_state(struct rcu_state *rsp)
 	}
 	WRITE_ONCE(rsp->gp_flags, READ_ONCE(rsp->gp_flags) | RCU_GP_FLAG_FQS);
 	raw_spin_unlock_irqrestore(&rnp_old->lock, flags);
-	rcu_gp_kthread_wake(rsp);
+	swake_up(&rsp->gp_wq); /* Memory barrier implied by swake_up() path. */
 }
 
 /*
@@ -4180,7 +4180,7 @@ static void __init rcu_init_one(struct rcu_state *rsp,
 		}
 	}
 
-	init_waitqueue_head(&rsp->gp_wq);
+	init_swait_queue_head(&rsp->gp_wq);
 	rnp = rsp->level[rcu_num_lvls - 1];
 	for_each_possible_cpu(i) {
 		while (i > rnp->grphi)
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 3dcf6368..30cf567 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -27,6 +27,7 @@
 #include <linux/threads.h>
 #include <linux/cpumask.h>
 #include <linux/seqlock.h>
+#include <linux/swait.h>
 #include <linux/stop_machine.h>
 
 /*
@@ -244,7 +245,7 @@ struct rcu_node {
 				/* Refused to boost: not sure why, though. */
 				/*  This can happen due to race conditions. */
 #ifdef CONFIG_RCU_NOCB_CPU
-	wait_queue_head_t nocb_gp_wq[2];
+	struct swait_queue_head nocb_gp_wq[2];
 				/* Place for rcu_nocb_kthread() to wait GP. */
 #endif /* #ifdef CONFIG_RCU_NOCB_CPU */
 	int need_future_gp[2];
@@ -388,7 +389,7 @@ struct rcu_data {
 	atomic_long_t nocb_q_count_lazy; /*  invocation (all stages). */
 	struct rcu_head *nocb_follower_head; /* CBs ready to invoke. */
 	struct rcu_head **nocb_follower_tail;
-	wait_queue_head_t nocb_wq;	/* For nocb kthreads to sleep on. */
+	struct swait_queue_head nocb_wq; /* For nocb kthreads to sleep on. */
 	struct task_struct *nocb_kthread;
 	int nocb_defer_wakeup;		/* Defer wakeup of nocb_kthread. */
 
@@ -475,7 +476,7 @@ struct rcu_state {
 	unsigned long gpnum;			/* Current gp number. */
 	unsigned long completed;		/* # of last completed gp. */
 	struct task_struct *gp_kthread;		/* Task for grace periods. */
-	wait_queue_head_t gp_wq;		/* Where GP task waits. */
+	struct swait_queue_head gp_wq;		/* Where GP task waits. */
 	short gp_flags;				/* Commands for GP task. */
 	short gp_state;				/* GP kthread sleep state. */
 
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index db4f357..0c69868 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1779,7 +1779,7 @@ early_param("rcu_nocb_poll", parse_rcu_nocb_poll);
  */
 static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq)
 {
-	wake_up_all(sq);
+	swake_up_all(sq);
 }
 
 /*
@@ -1802,8 +1802,8 @@ static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp)
 
 static void rcu_init_one_nocb(struct rcu_node *rnp)
 {
-	init_waitqueue_head(&rnp->nocb_gp_wq[0]);
-	init_waitqueue_head(&rnp->nocb_gp_wq[1]);
+	init_swait_queue_head(&rnp->nocb_gp_wq[0]);
+	init_swait_queue_head(&rnp->nocb_gp_wq[1]);
 }
 
 #ifndef CONFIG_RCU_NOCB_CPU_ALL
@@ -1828,7 +1828,7 @@ static void wake_nocb_leader(struct rcu_data *rdp, bool force)
 	if (READ_ONCE(rdp_leader->nocb_leader_sleep) || force) {
 		/* Prior smp_mb__after_atomic() orders against prior enqueue. */
 		WRITE_ONCE(rdp_leader->nocb_leader_sleep, false);
-		wake_up(&rdp_leader->nocb_wq);
+		swake_up(&rdp_leader->nocb_wq);
 	}
 }
 
@@ -2041,7 +2041,7 @@ static void rcu_nocb_wait_gp(struct rcu_data *rdp)
 	 */
 	trace_rcu_future_gp(rnp, rdp, c, TPS("StartWait"));
 	for (;;) {
-		wait_event_interruptible(
+		swait_event_interruptible(
 			rnp->nocb_gp_wq[c & 0x1],
 			(d = ULONG_CMP_GE(READ_ONCE(rnp->completed), c)));
 		if (likely(d))
@@ -2069,7 +2069,7 @@ wait_again:
 	/* Wait for callbacks to appear. */
 	if (!rcu_nocb_poll) {
 		trace_rcu_nocb_wake(my_rdp->rsp->name, my_rdp->cpu, "Sleep");
-		wait_event_interruptible(my_rdp->nocb_wq,
+		swait_event_interruptible(my_rdp->nocb_wq,
 				!READ_ONCE(my_rdp->nocb_leader_sleep));
 		/* Memory barrier handled by smp_mb() calls below and repoll. */
 	} else if (firsttime) {
@@ -2144,7 +2144,7 @@ wait_again:
 			 * List was empty, wake up the follower.
 			 * Memory barriers supplied by atomic_long_add().
 			 */
-			wake_up(&rdp->nocb_wq);
+			swake_up(&rdp->nocb_wq);
 		}
 	}
 
@@ -2165,7 +2165,7 @@ static void nocb_follower_wait(struct rcu_data *rdp)
 		if (!rcu_nocb_poll) {
 			trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu,
 					    "FollowerSleep");
-			wait_event_interruptible(rdp->nocb_wq,
+			swait_event_interruptible(rdp->nocb_wq,
 						 READ_ONCE(rdp->nocb_follower_head));
 		} else if (firsttime) {
 			/* Don't drown trace log with "Poll"! */
@@ -2324,7 +2324,7 @@ void __init rcu_init_nohz(void)
 static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
 {
 	rdp->nocb_tail = &rdp->nocb_head;
-	init_waitqueue_head(&rdp->nocb_wq);
+	init_swait_queue_head(&rdp->nocb_wq);
 	rdp->nocb_follower_tail = &rdp->nocb_follower_head;
 }
 
-- 
2.4.3



* Re: [PATCH v3 2/4] KVM: use simple waitqueue for vcpu->wq
  2015-10-20  7:28 ` [PATCH v3 2/4] KVM: use simple waitqueue for vcpu->wq Daniel Wagner
@ 2015-10-20 13:11   ` Paolo Bonzini
  2015-10-20 14:00   ` Peter Zijlstra
  1 sibling, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2015-10-20 13:11 UTC (permalink / raw)
  To: Daniel Wagner, linux-kernel, linux-rt-users
  Cc: Marcelo Tosatti, Paul E. McKenney, Paul Gortmaker,
	Peter Zijlstra (Intel),
	Thomas Gleixner



On 20/10/2015 09:28, Daniel Wagner wrote:
> 	min              max          mean            std
> count   382.000000       382.000000    382.000000     382.000000
> mean   6068.552356    269502.528796   8056.016198    3912.128273
> std     707.404966    848866.474783   1062.472704    9835.891707
> min    2335.000000     29828.000000   7337.426000     445.738750
> 25%    6004.500000     44237.500000   7471.094250    1078.834837
> 50%    6372.000000     64175.000000   7663.133700    1783.172446
> 75%    6465.500000    150384.500000   8210.771900    2759.734524
> max    6886.000000  10188451.000000  15466.434000  120469.205668
> 
> After
> 	min             max          mean           std
> count   300.000000      300.000000    300.000000    300.000000
> mean   5618.380000   217464.786667   7745.545114   3258.483272
> std     824.719741   516371.888369    847.391685   5632.943904
> min    3494.000000    31410.000000   7083.574800    438.445477
> 25%    4937.000000    45446.000000   7214.102850   1045.536261
> 50%    6118.000000    67023.000000   7417.330800   1699.574075
> 75%    6224.000000   134191.500000   7871.625600   2809.536185
> max    6654.000000  4570896.000000  13528.788600  52206.226799

Anything above ~10000 cycles means that the host went to C1 or
lower---the number means more or less nothing in that case.

The mean shows an improvement indeed.

Paolo


* Re: [PATCH v3 2/4] KVM: use simple waitqueue for vcpu->wq
  2015-10-20  7:28 ` [PATCH v3 2/4] KVM: use simple waitqueue for vcpu->wq Daniel Wagner
  2015-10-20 13:11   ` Paolo Bonzini
@ 2015-10-20 14:00   ` Peter Zijlstra
  2015-10-20 15:40     ` Paolo Bonzini
                       ` (3 more replies)
  1 sibling, 4 replies; 26+ messages in thread
From: Peter Zijlstra @ 2015-10-20 14:00 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-kernel, linux-rt-users, Marcelo Tosatti, Paolo Bonzini,
	Paul E. McKenney, Paul Gortmaker, Thomas Gleixner

On Tue, Oct 20, 2015 at 09:28:08AM +0200, Daniel Wagner wrote:
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 2280497..f534e15 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -2560,10 +2560,9 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
>  {
>  	struct kvm_vcpu *vcpu;
>  	int do_sleep = 1;
> +	DECLARE_SWAITQUEUE(wait);
>  
> -	DEFINE_WAIT(wait);
> -
> -	prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE);
> +	prepare_to_swait(&vc->wq, &wait, TASK_INTERRUPTIBLE);
>  
>  	/*
>  	 * Check one last time for pending exceptions and ceded state after
> @@ -2577,7 +2576,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
>  	}
>  
>  	if (!do_sleep) {
> -		finish_wait(&vc->wq, &wait);
> +		finish_swait(&vc->wq, &wait);
>  		return;
>  	}
>  
> @@ -2585,7 +2584,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
>  	trace_kvmppc_vcore_blocked(vc, 0);
>  	spin_unlock(&vc->lock);
>  	schedule();
> -	finish_wait(&vc->wq, &wait);
> +	finish_swait(&vc->wq, &wait);
>  	spin_lock(&vc->lock);
>  	vc->vcore_state = VCORE_INACTIVE;
>  	trace_kvmppc_vcore_blocked(vc, 1);

This one looks buggy, one should _NOT_ assume that your blocking
condition is true after schedule().
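
(For reference, the canonical shape of such a loop re-checks the
condition after every schedule(); a minimal sketch using the swait API
from patch 1:

	DECLARE_SWAITQUEUE(wait);

	for (;;) {
		prepare_to_swait(&q, &wait, TASK_INTERRUPTIBLE);
		if (condition)	/* re-checked after every wakeup */
			break;
		schedule();
	}
	finish_swait(&q, &wait);

here q is a struct swait_queue_head and condition stands for the
blocking condition.)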

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 8db1d93..45ab55f 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2019,7 +2018,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>  	}
>  
>  	for (;;) {
> -		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
> +		prepare_to_swait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
>  
>  		if (kvm_vcpu_check_block(vcpu) < 0)
>  			break;
> @@ -2028,7 +2027,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>  		schedule();
>  	}
>  
> -	finish_wait(&vcpu->wq, &wait);
> +	finish_swait(&vcpu->wq, &wait);
>  	cur = ktime_get();
>  
>  out:

Should we not take this opportunity to get rid of these open-coded wait
loops?


Does this work?

---
 arch/powerpc/kvm/book3s_hv.c | 33 +++++++++++++++++----------------
 virt/kvm/kvm_main.c          | 13 ++-----------
 2 files changed, 19 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 228049786888..b5b8bcad5105 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2552,18 +2552,10 @@ static void kvmppc_wait_for_exec(struct kvmppc_vcore *vc,
 	finish_wait(&vcpu->arch.cpu_run, &wait);
 }
 
-/*
- * All the vcpus in this vcore are idle, so wait for a decrementer
- * or external interrupt to one of the vcpus.  vc->lock is held.
- */
-static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
+static inline bool kvmppc_vcore_should_sleep(struct kvmppc_vcore *vc)
 {
 	struct kvm_vcpu *vcpu;
-	int do_sleep = 1;
-
-	DEFINE_WAIT(wait);
-
-	prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE);
+	bool sleep = true;
 
 	/*
 	 * Check one last time for pending exceptions and ceded state after
@@ -2571,26 +2563,35 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
 	 */
 	list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
 		if (vcpu->arch.pending_exceptions || !vcpu->arch.ceded) {
-			do_sleep = 0;
+			sleep = false;
 			break;
 		}
 	}
 
-	if (!do_sleep) {
-		finish_wait(&vc->wq, &wait);
-		return;
-	}
+	return sleep;
+}
 
+static inline void kvmppc_vcore_schedule(struct kvmppc_vcore *vc)
+{
 	vc->vcore_state = VCORE_SLEEPING;
 	trace_kvmppc_vcore_blocked(vc, 0);
 	spin_unlock(&vc->lock);
 	schedule();
-	finish_wait(&vc->wq, &wait);
 	spin_lock(&vc->lock);
 	vc->vcore_state = VCORE_INACTIVE;
 	trace_kvmppc_vcore_blocked(vc, 1);
 }
 
+/*
+ * All the vcpus in this vcore are idle, so wait for a decrementer
+ * or external interrupt to one of the vcpus.  vc->lock is held.
+ */
+static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
+{
+	___wait_event(vc->wq, !kvmppc_vcore_should_sleep(vc), TASK_IDLE, 0, 0,
+		      kvmppc_vcore_schedule(vc));
+}
+
 static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 {
 	int n_ceded;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8db1d9361993..488f00d79059 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1996,7 +1996,6 @@ static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
 void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 {
 	ktime_t start, cur;
-	DEFINE_WAIT(wait);
 	bool waited = false;
 	u64 block_ns;
 
@@ -2018,17 +2017,9 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 		} while (single_task_running() && ktime_before(cur, stop));
 	}
 
-	for (;;) {
-		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
+	___wait_event(vcpu->wq, kvm_vcpu_check_block(vcpu) < 0, TASK_IDLE, 0, 0,
+			waited = true; schedule());
 
-		if (kvm_vcpu_check_block(vcpu) < 0)
-			break;
-
-		waited = true;
-		schedule();
-	}
-
-	finish_wait(&vcpu->wq, &wait);
 	cur = ktime_get();
 
 out:



* Re: [PATCH v3 2/4] KVM: use simple waitqueue for vcpu->wq
  2015-10-20 14:00   ` Peter Zijlstra
@ 2015-10-20 15:40     ` Paolo Bonzini
  2015-10-20 16:06       ` Peter Zijlstra
  2015-10-21  8:55     ` Paul Mackerras
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2015-10-20 15:40 UTC (permalink / raw)
  To: Peter Zijlstra, Daniel Wagner
  Cc: linux-kernel, linux-rt-users, Marcelo Tosatti, Paul E. McKenney,
	Paul Gortmaker, Thomas Gleixner



On 20/10/2015 16:00, Peter Zijlstra wrote:
>> > -		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
>> > +		prepare_to_swait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
>> >  
>> >  		if (kvm_vcpu_check_block(vcpu) < 0)
>> >  			break;
>> > @@ -2028,7 +2027,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>> >  		schedule();
>> >  	}
>> >  
>> > -	finish_wait(&vcpu->wq, &wait);
>> > +	finish_swait(&vcpu->wq, &wait);
>> >  	cur = ktime_get();
>> >  
>> >  out:
> Should we not take this opportunity to get rid of these open-coded wait
> loops?

I find them way more readable than a 6-argument __wait_event...

I've forwarded your remark about kvmppc_vcore_blocked to the kvm-ppc
maintainers.

Thanks,

Paolo


* Re: [PATCH v3 2/4] KVM: use simple waitqueue for vcpu->wq
  2015-10-20 15:40     ` Paolo Bonzini
@ 2015-10-20 16:06       ` Peter Zijlstra
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2015-10-20 16:06 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Daniel Wagner, linux-kernel, linux-rt-users, Marcelo Tosatti,
	Paul E. McKenney, Paul Gortmaker, Thomas Gleixner

On Tue, Oct 20, 2015 at 05:40:36PM +0200, Paolo Bonzini wrote:
> 
> 
> On 20/10/2015 16:00, Peter Zijlstra wrote:
> >> > -		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
> >> > +		prepare_to_swait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
> >> >  
> >> >  		if (kvm_vcpu_check_block(vcpu) < 0)
> >> >  			break;
> >> > @@ -2028,7 +2027,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
> >> >  		schedule();
> >> >  	}
> >> >  
> >> > -	finish_wait(&vcpu->wq, &wait);
> >> > +	finish_swait(&vcpu->wq, &wait);
> >> >  	cur = ktime_get();
> >> >  
> >> >  out:
> > Should we not take this opportunity to get rid of these open-coded wait
> > loops?
> 
> I find them way more readable than a 6-argument __wait_event...

I could introduce wait_event_idle_cmd() and be at 3 arguments if you
think that helps.

#define __wait_event_idle_cmd(wq, cond, cmd) \
	___wait_event(wq, cond, TASK_IDLE, 0, 0, cmd)

etc..
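
Spelled out, the user-facing wrapper hiding behind that "etc.." would
presumably follow the standard wait_event() pattern (a sketch only;
wait_event_idle_cmd() does not exist at this point):

	#define wait_event_idle_cmd(wq, cond, cmd)			\
	do {								\
		if (!(cond))						\
			__wait_event_idle_cmd(wq, cond, cmd);		\
	} while (0)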

It's that awkward 'waited' variable that makes it hard to use the
'regular' 2-parameter form. Although you could of course do horrible
things like:

	__wait_event_idle(vcpu->wq, ({
		bool done = kvm_cpu_check_block(vcpu) < 0;
		if (!done)
			waited = true;
		done;
	}));


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/4] KVM: use simple waitqueue for vcpu->wq
  2015-10-20 14:00   ` Peter Zijlstra
  2015-10-20 15:40     ` Paolo Bonzini
@ 2015-10-21  8:55     ` Paul Mackerras
  2015-10-21  9:05       ` Peter Zijlstra
  2015-10-21  9:10     ` Paul Mackerras
  2015-10-21  9:24     ` Paul Mackerras
  3 siblings, 1 reply; 26+ messages in thread
From: Paul Mackerras @ 2015-10-21  8:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Daniel Wagner, linux-kernel, linux-rt-users, Marcelo Tosatti,
	Paolo Bonzini, Paul E. McKenney, Paul Gortmaker, Thomas Gleixner

On Tue, Oct 20, 2015 at 04:00:31PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 20, 2015 at 09:28:08AM +0200, Daniel Wagner wrote:
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index 2280497..f534e15 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -2560,10 +2560,9 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
> >  {
> >  	struct kvm_vcpu *vcpu;
> >  	int do_sleep = 1;
> > +	DECLARE_SWAITQUEUE(wait);
> >  
> > -	DEFINE_WAIT(wait);
> > -
> > -	prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE);
> > +	prepare_to_swait(&vc->wq, &wait, TASK_INTERRUPTIBLE);
> >  
> >  	/*
> >  	 * Check one last time for pending exceptions and ceded state after
> > @@ -2577,7 +2576,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
> >  	}
> >  
> >  	if (!do_sleep) {
> > -		finish_wait(&vc->wq, &wait);
> > +		finish_swait(&vc->wq, &wait);
> >  		return;
> >  	}
> >  
> > @@ -2585,7 +2584,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
> >  	trace_kvmppc_vcore_blocked(vc, 0);
> >  	spin_unlock(&vc->lock);
> >  	schedule();
> > -	finish_wait(&vc->wq, &wait);
> > +	finish_swait(&vc->wq, &wait);
> >  	spin_lock(&vc->lock);
> >  	vc->vcore_state = VCORE_INACTIVE;
> >  	trace_kvmppc_vcore_blocked(vc, 1);
> 
> This one looks buggy, one should _NOT_ assume that your blocking
> condition is true after schedule().

Do you mean it's buggy in calling finish_swait there, or it's buggy in
not immediately re-checking the condition?  If the latter, then it's
OK because the sole caller of this function calls it in a loop and
checks the condition (all runnable vcpus in this vcore are idle) each
time around the loop.
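
Schematically, the safety argument is that the blocking helper sits
inside a loop that re-evaluates the condition; the helper names below
are hypothetical stand-ins for illustration, not the actual
book3s_hv.c code:

	/* schematic of the caller-side pattern described above */
	while (vcpu_is_runnable(vcpu) && !signal_pending(current)) {
		if (all_runnable_vcpus_idle(vc))
			kvmppc_vcore_blocked(vc);	/* may wake spuriously */
		else
			kvmppc_run_core(vc);
		/* the condition is re-checked every iteration, so a
		 * spurious wakeup from the block above is harmless */
	}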

> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 8db1d93..45ab55f 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -2019,7 +2018,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
> >  	}
> >  
> >  	for (;;) {
> > -		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
> > +		prepare_to_swait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
> >  
> >  		if (kvm_vcpu_check_block(vcpu) < 0)
> >  			break;
> > @@ -2028,7 +2027,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
> >  		schedule();
> >  	}
> >  
> > -	finish_wait(&vcpu->wq, &wait);
> > +	finish_swait(&vcpu->wq, &wait);
> >  	cur = ktime_get();
> >  
> >  out:
> 
> Should we not take this opportunity to get rid of these open-coded wait
> loops?
> 
> 
> Does this work?
> 
> ---
>  arch/powerpc/kvm/book3s_hv.c | 33 +++++++++++++++++----------------
>  virt/kvm/kvm_main.c          | 13 ++-----------
>  2 files changed, 19 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 228049786888..b5b8bcad5105 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -2552,18 +2552,10 @@ static void kvmppc_wait_for_exec(struct kvmppc_vcore *vc,
>  	finish_wait(&vcpu->arch.cpu_run, &wait);
>  }
>  
> -/*
> - * All the vcpus in this vcore are idle, so wait for a decrementer
> - * or external interrupt to one of the vcpus.  vc->lock is held.
> - */
> -static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
> +static inline bool kvmppc_vcore_should_sleep(struct kvmppc_vcore *vc)

This function could also be used in kvmppc_run_vcpu().

>  {
>  	struct kvm_vcpu *vcpu;
> -	int do_sleep = 1;
> -
> -	DEFINE_WAIT(wait);
> -
> -	prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE);
> +	bool sleep = true;
>  
>  	/*
>  	 * Check one last time for pending exceptions and ceded state after
> @@ -2571,26 +2563,35 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
>  	 */
>  	list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
>  		if (vcpu->arch.pending_exceptions || !vcpu->arch.ceded) {
> -			do_sleep = 0;
> +			sleep = false;
>  			break;
>  		}
>  	}
>  
> -	if (!do_sleep) {
> -		finish_wait(&vc->wq, &wait);
> -		return;
> -	}
> +	return sleep;
> +}
>  
> +static inline void kvmppc_vcore_schedule(struct kvmppc_vcore *vc)
> +{
>  	vc->vcore_state = VCORE_SLEEPING;
>  	trace_kvmppc_vcore_blocked(vc, 0);
>  	spin_unlock(&vc->lock);
>  	schedule();
> -	finish_wait(&vc->wq, &wait);
>  	spin_lock(&vc->lock);
>  	vc->vcore_state = VCORE_INACTIVE;
>  	trace_kvmppc_vcore_blocked(vc, 1);
>  }
>  
> +/*
> + * All the vcpus in this vcore are idle, so wait for a decrementer
> + * or external interrupt to one of the vcpus.  vc->lock is held.
> + */
> +static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
> +{
> +	___wait_event(vc->wq, !kvmppc_vcore_should_sleep(vc), TASK_IDLE, 0, 0,
> +		      kvmppc_vcore_schedule(vc));

Wow, triple underscores, that must be an ultra-trendy function. :)

> +}
> +
>  static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
>  {
>  	int n_ceded;

That all looks OK at first glance, I'll give it a whirl.

Paul.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/4] KVM: use simple waitqueue for vcpu->wq
  2015-10-21  8:55     ` Paul Mackerras
@ 2015-10-21  9:05       ` Peter Zijlstra
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2015-10-21  9:05 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Daniel Wagner, linux-kernel, linux-rt-users, Marcelo Tosatti,
	Paolo Bonzini, Paul E. McKenney, Paul Gortmaker, Thomas Gleixner

On Wed, Oct 21, 2015 at 07:55:00PM +1100, Paul Mackerras wrote:
> On Tue, Oct 20, 2015 at 04:00:31PM +0200, Peter Zijlstra wrote:
> > On Tue, Oct 20, 2015 at 09:28:08AM +0200, Daniel Wagner wrote:
> > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > > index 2280497..f534e15 100644
> > > --- a/arch/powerpc/kvm/book3s_hv.c
> > > +++ b/arch/powerpc/kvm/book3s_hv.c
> > > @@ -2560,10 +2560,9 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
> > >  {
> > >  	struct kvm_vcpu *vcpu;
> > >  	int do_sleep = 1;
> > > +	DECLARE_SWAITQUEUE(wait);
> > >  
> > > -	DEFINE_WAIT(wait);
> > > -
> > > -	prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE);
> > > +	prepare_to_swait(&vc->wq, &wait, TASK_INTERRUPTIBLE);
> > >  
> > >  	/*
> > >  	 * Check one last time for pending exceptions and ceded state after
> > > @@ -2577,7 +2576,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
> > >  	}
> > >  
> > >  	if (!do_sleep) {
> > > -		finish_wait(&vc->wq, &wait);
> > > +		finish_swait(&vc->wq, &wait);
> > >  		return;
> > >  	}
> > >  
> > > @@ -2585,7 +2584,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
> > >  	trace_kvmppc_vcore_blocked(vc, 0);
> > >  	spin_unlock(&vc->lock);
> > >  	schedule();
> > > -	finish_wait(&vc->wq, &wait);
> > > +	finish_swait(&vc->wq, &wait);
> > >  	spin_lock(&vc->lock);
> > >  	vc->vcore_state = VCORE_INACTIVE;
> > >  	trace_kvmppc_vcore_blocked(vc, 1);
> > 
> > This one looks buggy, one should _NOT_ assume that your blocking
> > condition is true after schedule().
> 
> Do you mean it's buggy in calling finish_swait there, or it's buggy in
> not immediately re-checking the condition?  If the latter, then it's
> OK because the sole caller of this function calls it in a loop and
> checks the condition (all runnable vcpus in this vcore are idle) each
> time around the loop.

Ah, I missed the caller loop, yes that's fine.

I'm biased against such code from having seen a few too many broken
open-coded wait loops, I suppose...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/4] KVM: use simple waitqueue for vcpu->wq
  2015-10-20 14:00   ` Peter Zijlstra
  2015-10-20 15:40     ` Paolo Bonzini
  2015-10-21  8:55     ` Paul Mackerras
@ 2015-10-21  9:10     ` Paul Mackerras
  2015-10-21  9:24     ` Paul Mackerras
  3 siblings, 0 replies; 26+ messages in thread
From: Paul Mackerras @ 2015-10-21  9:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Daniel Wagner, linux-kernel, linux-rt-users, Marcelo Tosatti,
	Paolo Bonzini, Paul E. McKenney, Paul Gortmaker, Thomas Gleixner

On Tue, Oct 20, 2015 at 04:00:31PM +0200, Peter Zijlstra wrote:

> @@ -2018,17 +2017,9 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>  		} while (single_task_running() && ktime_before(cur, stop));
>  	}
>  
> -	for (;;) {
> -		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
> +	___wait_event(vcpu->wq, kvm_cpu_check_block(vcpu) < 0, TASK_IDLE, 0, 0,

Needs to be kvm_vcpu_check_block not kvm_cpu_check_block (note the
extra 'v').

Paul.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/4] KVM: use simple waitqueue for vcpu->wq
  2015-10-20 14:00   ` Peter Zijlstra
                       ` (2 preceding siblings ...)
  2015-10-21  9:10     ` Paul Mackerras
@ 2015-10-21  9:24     ` Paul Mackerras
  2015-10-21 11:13       ` Peter Zijlstra
  3 siblings, 1 reply; 26+ messages in thread
From: Paul Mackerras @ 2015-10-21  9:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Daniel Wagner, linux-kernel, linux-rt-users, Marcelo Tosatti,
	Paolo Bonzini, Paul E. McKenney, Paul Gortmaker, Thomas Gleixner

On Tue, Oct 20, 2015 at 04:00:31PM +0200, Peter Zijlstra wrote:
> 
> Should we not take this opportunity to get rid of these open-coded wait
> loops?
> 
> 
> Does this work?

No, on Book3S HV (POWER8) the VM hangs immediately after the kernel
brings up all the secondary vCPUs, and is then unkillable.  I'm not
sure what's wrong, although I wonder why you have TASK_IDLE rather
than TASK_INTERRUPTIBLE in the ___wait_event call.

Paul.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/4] KVM: use simple waitqueue for vcpu->wq
  2015-10-21  9:24     ` Paul Mackerras
@ 2015-10-21 11:13       ` Peter Zijlstra
  2015-10-23 11:51         ` Daniel Wagner
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2015-10-21 11:13 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Daniel Wagner, linux-kernel, linux-rt-users, Marcelo Tosatti,
	Paolo Bonzini, Paul E. McKenney, Paul Gortmaker, Thomas Gleixner

On Wed, Oct 21, 2015 at 08:24:11PM +1100, Paul Mackerras wrote:
> On Tue, Oct 20, 2015 at 04:00:31PM +0200, Peter Zijlstra wrote:
> > 
> > Should we not take this opportunity to get rid of these open-coded wait
> > loops?
> > 
> > 
> > Does this work?
> 
> No, on Book3S HV (POWER8) the VM hangs immediately after the kernel
> brings up all the secondary vCPUs, and is then unkillable.  I'm not
> sure what's wrong, although I wonder why you have TASK_IDLE rather
> than TASK_INTERRUPTIBLE in the ___wait_event call.

This was under the assumption that INTERRUPTIBLE was there because you
wanted to avoid increasing load, which was based on the lack of
signal_pending() tests near there (although there might have been some
in the outermost loop, which I overlooked).

If it does rely on signals, then this was obviously false and TASK_IDLE
is indeed wrong.
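
For reference, TASK_IDLE was added in v4.2 for exactly this kind of
wait: an uninterruptible sleep that is excluded from the load average
and thus needs no signal_pending() handling. From include/linux/sched.h:

	/* uninterruptible, but does not contribute to loadavg */
	#define TASK_IDLE	(TASK_UNINTERRUPTIBLE | TASK_NOLOAD)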



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/4] KVM: use simple waitqueue for vcpu->wq
  2015-10-21 11:13       ` Peter Zijlstra
@ 2015-10-23 11:51         ` Daniel Wagner
  0 siblings, 0 replies; 26+ messages in thread
From: Daniel Wagner @ 2015-10-23 11:51 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Peter Zijlstra, linux-kernel, linux-rt-users, Marcelo Tosatti,
	Paolo Bonzini, Paul E. McKenney, Paul Gortmaker, Thomas Gleixner

Hi Paul,

On 10/21/2015 01:13 PM, Peter Zijlstra wrote:
> On Wed, Oct 21, 2015 at 08:24:11PM +1100, Paul Mackerras wrote:
>> On Tue, Oct 20, 2015 at 04:00:31PM +0200, Peter Zijlstra wrote:
>>>
>>> Should we not take this opportunity to get rid of these open-coded wait
>>> loops?
>>>
>>>
>>> Does this work?
>>
>> No, on Book3S HV (POWER8) the VM hangs immediately after the kernel
>> brings up all the secondary vCPUs, and is then unkillable.  I'm not
>> sure what's wrong, although I wonder why you have TASK_IDLE rather
>> than TASK_INTERRUPTIBLE in the ___wait_event call.
> 
> This was under the assumption that INTERRUPTIBLE was there because you
> wanted to avoid increasing load, which was based on the lack of
> signal_pending() tests near there (although there might have been some
> in the outermost loop, which I overlooked).
> 
> If it does rely on signals, then this was obviously false and TASK_IDLE
> is indeed wrong.

If I get this right, the current patch is okay but you are going to
refactor the code? Maybe I can cherry-pick your patch then and update
this patch accordingly.

cheers,
daniel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/4] Simple wait queue support
  2015-10-20  7:28 [PATCH v3 0/4] Simple wait queue support Daniel Wagner
                   ` (3 preceding siblings ...)
  2015-10-20  7:28 ` [PATCH v3 4/4] rcu: use simple wait queues where possible in rcutree Daniel Wagner
@ 2015-10-25 20:10 ` Paul E. McKenney
  2015-10-26  6:34   ` Daniel Wagner
  4 siblings, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2015-10-25 20:10 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-kernel, linux-rt-users, Marcelo Tosatti, Paolo Bonzini,
	Paul Gortmaker, Peter Zijlstra (Intel),
	Thomas Gleixner

On Tue, Oct 20, 2015 at 09:28:06AM +0200, Daniel Wagner wrote:
> Hi,
> 
> Only small updates in this version, like fixing mips and reordering
> two patches to avoid lockdep warning when doing git bissect.  Reported
> by Fengguang Wu's build robot. Thanks!
> 
> Also removed the unnecessary initialization in the rcu patch as Paul
> pointed out.
> 
> Hopefully, I do a better job on Cc list this time.
> 
> These patches are against
> 
>   tip/master 11f4d95e6b634d7d41e7c2b521fcec261efbf769

I didn't find this commit, so I am (temporarily!) applying against
19a5ecde086a (rcu: Suppress lockdep false positive for rcp->exp_funnel_mutex)
for testing purposes.  RCU appears to be a bit of a moving target here...

							Thanx, Paul

> also available as git tree:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/wagi/linux.git tip-swait
> 
> cheers,
> daniel
> 
> changes since v2
>  - rebased again on tip/master. The patches apply
>    cleanly on v4.3-rc6 too.
>  - fixed up mips
>  - reordered patches to avoid lockdep warning when doing bissect.
>  - remove unnecessary initialization of rsp->rda in rcu_init_one().
> 
> changes since v1 (PATCH v0)
>  - rebased and fixed some typos found by cross building
>    for S390, ARM and powerpc. For some unknown reason didn't catch
>    them last time.
>  - dropped completion patches because it is not clear yet
>    how to handle complete_all() calls hard-irq/atomic contexts
>    and swake_up_all.
> 
> changes since v0 (RFC v0)
>  - promoted the series to PATCH state instead of RFC
>  - fixed a few fallouts with build all and some cross compilers
>    such ARM, PowerPC, S390.
>  - Added the simple waitqueue transformation for KVM from -rt
>    including some numbers requested by Paolo.
>  - Added a commit message to PeterZ's patch. Hope he likes it.
> 
> [I got the numbering wrong in v1, so instead 'PATCH v1' you find it
>  as 'PATCH v0' series]
> 
> v1: http://lwn.net/Articles/656942/
> v0: http://lwn.net/Articles/653586/
> 
> Daniel Wagner (1):
>   rcu: Do not call rcu_nocb_gp_cleanup() while holding rnp->lock
> 
> Marcelo Tosatti (1):
>   KVM: use simple waitqueue for vcpu->wq
> 
> Paul Gortmaker (1):
>   rcu: use simple wait queues where possible in rcutree
> 
> Peter Zijlstra (Intel) (1):
>   wait.[ch]: Introduce the simple waitqueue (swait) implementation
> 
>  arch/arm/kvm/arm.c                  |   4 +-
>  arch/arm/kvm/psci.c                 |   4 +-
>  arch/mips/kvm/mips.c                |   8 +-
>  arch/powerpc/include/asm/kvm_host.h |   4 +-
>  arch/powerpc/kvm/book3s_hv.c        |  23 +++--
>  arch/s390/include/asm/kvm_host.h    |   2 +-
>  arch/s390/kvm/interrupt.c           |   8 +-
>  arch/x86/kvm/lapic.c                |   6 +-
>  include/linux/kvm_host.h            |   5 +-
>  include/linux/swait.h               | 172 ++++++++++++++++++++++++++++++++++++
>  kernel/rcu/tree.c                   |  16 ++--
>  kernel/rcu/tree.h                   |  10 ++-
>  kernel/rcu/tree_plugin.h            |  32 ++++---
>  kernel/sched/Makefile               |   2 +-
>  kernel/sched/swait.c                | 122 +++++++++++++++++++++++++
>  virt/kvm/async_pf.c                 |   4 +-
>  virt/kvm/kvm_main.c                 |  17 ++--
>  17 files changed, 373 insertions(+), 66 deletions(-)
>  create mode 100644 include/linux/swait.h
>  create mode 100644 kernel/sched/swait.c
> 
> -- 
> 2.4.3
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/4] Simple wait queue support
  2015-10-25 20:10 ` [PATCH v3 0/4] Simple wait queue support Paul E. McKenney
@ 2015-10-26  6:34   ` Daniel Wagner
  0 siblings, 0 replies; 26+ messages in thread
From: Daniel Wagner @ 2015-10-26  6:34 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, linux-rt-users, Marcelo Tosatti, Paolo Bonzini,
	Paul Gortmaker, Peter Zijlstra (Intel),
	Thomas Gleixner

On 10/25/2015 09:10 PM, Paul E. McKenney wrote:
> On Tue, Oct 20, 2015 at 09:28:06AM +0200, Daniel Wagner wrote:
>> Only small updates in this version, like fixing mips and reordering
>> two patches to avoid lockdep warning when doing git bissect.  Reported
>> by Fengguang Wu's build robot. Thanks!
>>
>> Also removed the unnecessary initialization in the rcu patch as Paul
>> pointed out.
>>
>> Hopefully, I do a better job on Cc list this time.
>>
>> These patches are against
>>
>>   tip/master 11f4d95e6b634d7d41e7c2b521fcec261efbf769
> 
> I didn't find this commit, so I am (temporarily!) applying against
> 19a5ecde086a (rcu: Suppress lockdep false positive for rcp->exp_funnel_mutex)
> for testing purposes.  

I verified it and I can't find it in the upstream tree anymore either.
The chances that I got it wrong are quite high, considering that I only
have dangerous half-knowledge of how the tip tree is organized. I was
under the impression that tip/master is a merge-only branch. And even if
that is the case, there are plenty of possibilities for using git the
wrong way.

Please let me know which tree is the preferred target. The patches seem
to apply cleanly on most trees so far.

> RCU appears to be a bit of a moving target here...

Yeah, the maintainer of RCU seems to be busy :)

cheers,
daniel


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/4] wait.[ch]: Introduce the simple waitqueue (swait) implementation
  2015-10-20  7:28 ` [PATCH v3 1/4] wait.[ch]: Introduce the simple waitqueue (swait) implementation Daniel Wagner
@ 2015-10-26 12:04   ` Boqun Feng
  2015-10-26 12:28     ` Peter Zijlstra
  2015-10-26 12:59     ` Daniel Wagner
  2015-11-04 10:33   ` Thomas Gleixner
  1 sibling, 2 replies; 26+ messages in thread
From: Boqun Feng @ 2015-10-26 12:04 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-kernel, linux-rt-users, Peter Zijlstra (Intel),
	Paul Gortmaker, Marcelo Tosatti, Paolo Bonzini, Paul E. McKenney,
	Thomas Gleixner

Hi Daniel,

On Tue, Oct 20, 2015 at 09:28:07AM +0200, Daniel Wagner wrote:
> +
> +/*
> + * The thing about the wake_up_state() return value; I think we can ignore it.
> + *
> + * If for some reason it would return 0, that means the previously waiting
> + * task is already running, so it will observe condition true (or has already).
> + */
> +void swake_up_locked(struct swait_queue_head *q)
> +{
> +	struct swait_queue *curr;
> +
> +	list_for_each_entry(curr, &q->task_list, task_list) {
> +		wake_up_process(curr->task);
> +		list_del_init(&curr->task_list);
> +		break;

Just curious, what's this break for? Or what's this loop(?) for?

> +	}
> +}
> +EXPORT_SYMBOL(swake_up_locked);
> +

Regards,
Boqun

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/4] wait.[ch]: Introduce the simple waitqueue (swait) implementation
  2015-10-26 12:04   ` Boqun Feng
@ 2015-10-26 12:28     ` Peter Zijlstra
  2015-10-26 12:59     ` Daniel Wagner
  1 sibling, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2015-10-26 12:28 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Daniel Wagner, linux-kernel, linux-rt-users, Paul Gortmaker,
	Marcelo Tosatti, Paolo Bonzini, Paul E. McKenney,
	Thomas Gleixner

On Mon, Oct 26, 2015 at 08:04:26PM +0800, Boqun Feng wrote:
> Hi Daniel,
> 
> On Tue, Oct 20, 2015 at 09:28:07AM +0200, Daniel Wagner wrote:
> > +
> > +/*
> > + * The thing about the wake_up_state() return value; I think we can ignore it.
> > + *
> > + * If for some reason it would return 0, that means the previously waiting
> > + * task is already running, so it will observe condition true (or has already).
> > + */
> > +void swake_up_locked(struct swait_queue_head *q)
> > +{
> > +	struct swait_queue *curr;
> > +
> > +	list_for_each_entry(curr, &q->task_list, task_list) {
> > +		wake_up_process(curr->task);
> > +		list_del_init(&curr->task_list);
> > +		break;
> 
> Just curious, what's this break for? Or what's this loop(?) for?

Lazy way of writing: if (!empty) { curr = first-entry;
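
For context, the unlocked entry point in this series is just that
helper bracketed by the queue lock; roughly, from kernel/sched/swait.c
as posted:

	void swake_up(struct swait_queue_head *q)
	{
		unsigned long flags;

		if (!swait_active(q))
			return;

		raw_spin_lock_irqsave(&q->lock, flags);
		swake_up_locked(q);
		raw_spin_unlock_irqrestore(&q->lock, flags);
	}
	EXPORT_SYMBOL(swake_up);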

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/4] wait.[ch]: Introduce the simple waitqueue (swait) implementation
  2015-10-26 12:04   ` Boqun Feng
  2015-10-26 12:28     ` Peter Zijlstra
@ 2015-10-26 12:59     ` Daniel Wagner
  2015-10-26 13:26       ` Peter Zijlstra
  1 sibling, 1 reply; 26+ messages in thread
From: Daniel Wagner @ 2015-10-26 12:59 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-kernel, linux-rt-users, Peter Zijlstra (Intel),
	Paul Gortmaker, Marcelo Tosatti, Paolo Bonzini, Paul E. McKenney,
	Thomas Gleixner

Hi Boqun,

On 10/26/2015 01:04 PM, Boqun Feng wrote:
> On Tue, Oct 20, 2015 at 09:28:07AM +0200, Daniel Wagner wrote:
>> +
>> +/*
>> + * The thing about the wake_up_state() return value; I think we can ignore it.
>> + *
>> + * If for some reason it would return 0, that means the previously waiting
>> + * task is already running, so it will observe condition true (or has already).
>> + */
>> +void swake_up_locked(struct swait_queue_head *q)
>> +{
>> +	struct swait_queue *curr;
>> +
>> +	list_for_each_entry(curr, &q->task_list, task_list) {
>> +		wake_up_process(curr->task);
>> +		list_del_init(&curr->task_list);
>> +		break;
> 
> Just curious, what's this break for? Or what's this loop(?) for?

I have to guess here, since Peter wrote it. It looks like the function
is based on __wake_up_common(). Though I agree the loop is not necessary
and something like below should do the trick. Unless I'm missing
something important.

	void swake_up_locked(struct swait_queue_head *q)
	{
		struct swait_queue *curr;

		if (list_empty(&q->task_list))
			return;

		curr = list_first_entry(&q->task_list, typeof(*curr), task_list);
		wake_up_process(curr->task);
		list_del_init(&curr->task_list);
	}

If Peter is not complaining I change swake_up_locked() for the next version.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/4] wait.[ch]: Introduce the simple waitqueue (swait) implementation
  2015-10-26 12:59     ` Daniel Wagner
@ 2015-10-26 13:26       ` Peter Zijlstra
  2015-10-26 14:19         ` Boqun Feng
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2015-10-26 13:26 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Boqun Feng, linux-kernel, linux-rt-users, Paul Gortmaker,
	Marcelo Tosatti, Paolo Bonzini, Paul E. McKenney,
	Thomas Gleixner

On Mon, Oct 26, 2015 at 01:59:44PM +0100, Daniel Wagner wrote:
> Hi Boqun,
> 
> On 10/26/2015 01:04 PM, Boqun Feng wrote:
> > On Tue, Oct 20, 2015 at 09:28:07AM +0200, Daniel Wagner wrote:
> >> +
> >> +/*
> >> + * The thing about the wake_up_state() return value; I think we can ignore it.
> >> + *
> >> + * If for some reason it would return 0, that means the previously waiting
> >> + * task is already running, so it will observe condition true (or has already).
> >> + */
> >> +void swake_up_locked(struct swait_queue_head *q)
> >> +{
> >> +	struct swait_queue *curr;
> >> +
> >> +	list_for_each_entry(curr, &q->task_list, task_list) {
> >> +		wake_up_process(curr->task);
> >> +		list_del_init(&curr->task_list);
> >> +		break;
> > 
> > Just curious, what's this break for? Or what's this loop(?) for?
> 
> I have to guess here, since Peter wrote it. It looks like the function
> is based on __wake_up_common(). Though I agree the loop is not necessary
> and something like below should do the trick. Unless I'm missing
> something important.
> 
> 	void swake_up_locked(struct swait_queue_head *q)
> 	{
> 		struct swait_queue *curr;
> 
> 		if (list_empty(&q->task_list))
> 			return;
> 
> 		curr = list_first_entry(&q->task_list, typeof(*curr), task_list);
> 		wake_up_process(curr->task);
> 		list_del_init(&curr->task_list);
> 	}
> 
> If Peter is not complaining I change swake_up_locked() for the next version.

Yes, that is equivalent, just more code. As I wrote in my last email, I
was lazy :-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/4] wait.[ch]: Introduce the simple waitqueue (swait) implementation
  2015-10-26 13:26       ` Peter Zijlstra
@ 2015-10-26 14:19         ` Boqun Feng
  0 siblings, 0 replies; 26+ messages in thread
From: Boqun Feng @ 2015-10-26 14:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Daniel Wagner, linux-kernel, linux-rt-users, Paul Gortmaker,
	Marcelo Tosatti, Paolo Bonzini, Paul E. McKenney,
	Thomas Gleixner

On Mon, Oct 26, 2015 at 02:26:01PM +0100, Peter Zijlstra wrote:
> On Mon, Oct 26, 2015 at 01:59:44PM +0100, Daniel Wagner wrote:
> > Hi Boqun,
> > 
> > On 10/26/2015 01:04 PM, Boqun Feng wrote:
> > > On Tue, Oct 20, 2015 at 09:28:07AM +0200, Daniel Wagner wrote:
> > >> +
> > >> +/*
> > >> + * The thing about the wake_up_state() return value; I think we can ignore it.
> > >> + *
> > >> + * If for some reason it would return 0, that means the previously waiting
> > >> + * task is already running, so it will observe condition true (or has already).
> > >> + */
> > >> +void swake_up_locked(struct swait_queue_head *q)
> > >> +{
> > >> +	struct swait_queue *curr;
> > >> +
> > >> +	list_for_each_entry(curr, &q->task_list, task_list) {
> > >> +		wake_up_process(curr->task);
> > >> +		list_del_init(&curr->task_list);
> > >> +		break;
> > > 
> > > Just curious, what's this break for? Or what's this loop(?) for?
> > 
> > I have to guess here, since Peter wrote it. It looks like the function
> > is based on __wake_up_common(). Though I agree the loop is not necessary
> > and something like below should do the trick. Unless I'm missing
> > something important.
> > 
> > 	void swake_up_locked(struct swait_queue_head *q)
> > 	{
> > 		struct swait_queue *curr;
> > 
> > 		if (list_empty(&q->task_list))
> > 			return;
> > 
> > 		curr = list_first_entry(&q->task_list, typeof(*curr), task_list);
> > 		wake_up_process(curr->task);
> > 		list_del_init(&curr->task_list);
> > 	}
> > 
> > If Peter is not complaining I change swake_up_locked() for the next version.

This reads better, I think ;-)

> 
> Yes, that is equivalent, just more code. As I wrote in my last email, I
> was lazy :-)

;-)

Maybe introduce a list_pick_one_if_any() macro for convenience:

	#define list_pick_one_if_any(pos, list, member) 	\
	if (!list_empty(list) && (pos = list_first_entry(list, typeof(*pos), member), 1)) 

then

	void swake_up_locked(struct swait_queue_head *q)
	{
		struct swait_queue *curr;

		list_pick_one_if_any(curr, &q->task_list, task_list) {
			wake_up_process(curr->task);
			list_del_init(&curr->task_list);
		}
	}
	


Anyway, thank you both for going through this.

Regards,
Boqun

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/4] wait.[ch]: Introduce the simple waitqueue (swait) implementation
  2015-10-20  7:28 ` [PATCH v3 1/4] wait.[ch]: Introduce the simple waitqueue (swait) implementation Daniel Wagner
  2015-10-26 12:04   ` Boqun Feng
@ 2015-11-04 10:33   ` Thomas Gleixner
  2015-11-04 12:12     ` Daniel Wagner
  2015-11-18 10:33     ` Peter Zijlstra
  1 sibling, 2 replies; 26+ messages in thread
From: Thomas Gleixner @ 2015-11-04 10:33 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-kernel, linux-rt-users, Peter Zijlstra (Intel),
	Paul Gortmaker, Marcelo Tosatti, Paolo Bonzini, Paul E. McKenney

On Tue, 20 Oct 2015, Daniel Wagner wrote:
> +
> +extern void swake_up(struct swait_queue_head *q);
> +extern void swake_up_all(struct swait_queue_head *q);
> +extern void swake_up_locked(struct swait_queue_head *q);

I intentionally named these functions swait_wake* in my initial
implementation for two reasons:

  - typoing wake_up vs. swake_up only emits a compiler warning and does
    not break the build
    
  - I really prefer new infrastructure to have a consistent prefix
    which reflects the "subsystem". That's simpler to read and simpler
    to grep for.

> +extern void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait);
> +extern void prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait, int state);
> +extern long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait, int state);
> +
> +extern void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait);
> +extern void finish_swait(struct swait_queue_head *q, struct swait_queue *wait);

Can we please go with the original names?

swait_prepare()
swait_prepare_locked()
swait_finish()
swait_finish_locked()

Hmm?

> +#define swait_event(wq, condition)					\

Here we have the same swait vs. wait problem as above. So either we
come up with a slightly different name or have an explicit type check
in the __swait_event macro.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/4] wait.[ch]: Introduce the simple waitqueue (swait) implementation
  2015-11-04 10:33   ` Thomas Gleixner
@ 2015-11-04 12:12     ` Daniel Wagner
  2015-11-18 10:33     ` Peter Zijlstra
  1 sibling, 0 replies; 26+ messages in thread
From: Daniel Wagner @ 2015-11-04 12:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, linux-rt-users, Peter Zijlstra (Intel),
	Paul Gortmaker, Marcelo Tosatti, Paolo Bonzini, Paul E. McKenney

On 11/04/2015 11:33 AM, Thomas Gleixner wrote:
> On Tue, 20 Oct 2015, Daniel Wagner wrote:
>> +
>> +extern void swake_up(struct swait_queue_head *q);
>> +extern void swake_up_all(struct swait_queue_head *q);
>> +extern void swake_up_locked(struct swait_queue_head *q);
> 
> I intentionally named these functions swait_wake* in my initial
> implementation for two reasons:
> 
>   - typoing wake_up vs. swake_up only emits a compiler warning and does
>     not break the build

I played around with this a bit and came up with the patch below. The type
check results in an error.
    
>   - I really prefer new infrastructure to have a consistent prefix
>     which reflects the "subsystem". That's simpler to read and simpler
>     to grep for.
> 
>> +extern void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait);
>> +extern void prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait, int state);
>> +extern long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait, int state);
>> +
>> +extern void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait);
>> +extern void finish_swait(struct swait_queue_head *q, struct swait_queue *wait);
> 
> Can we please go with the original names?
> 
> swait_prepare()
> swait_prepare_locked()
> swait_finish()
> swait_finish_locked()
> 
> Hmm?

I defer to Peter :)

>> +#define swait_event(wq, condition)					\
> 
> Here we have the same swait vs. wait problem as above. So either we
> come up with a slightly different name or have an explicit type check
> in the __swait_event macro.

What about something like this:

diff --git a/include/linux/swait.h b/include/linux/swait.h
index c1f9c62..f59369d 100644
--- a/include/linux/swait.h
+++ b/include/linux/swait.h
@@ -6,6 +6,9 @@
 #include <linux/spinlock.h>
 #include <asm/current.h>
 
+#define compiletime_assert_same_type(a, b) \
+       compiletime_assert(__same_type(a, b), "Need to match correct type");
+
 /*
  * Simple wait queues
  *
@@ -66,6 +69,7 @@ extern void __init_swait_queue_head(struct swait_queue_head *q, const char *name
 #define init_swait_queue_head(q)                               \
        do {                                                    \
                static struct lock_class_key __key;             \
+               compiletime_assert_same_type(struct swait_queue_head *, q); \
                __init_swait_queue_head((q), #q, &__key);       \
        } while (0)
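
As a hypothetical illustration (the function and variables below are
made up, not part of the patch), the assert turns a mixed-up queue type
into a hard build failure instead of a mere warning:

	static struct swait_queue_head swq;
	static wait_queue_head_t wq;

	static void demo(void)
	{
		init_swait_queue_head(&swq);	/* fine: types match */
		init_swait_queue_head(&wq);	/* now a build error, not a warning */
	}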
 


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/4] wait.[ch]: Introduce the simple waitqueue (swait) implementation
  2015-11-04 10:33   ` Thomas Gleixner
  2015-11-04 12:12     ` Daniel Wagner
@ 2015-11-18 10:33     ` Peter Zijlstra
  2015-11-18 15:07       ` Thomas Gleixner
  1 sibling, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2015-11-18 10:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Daniel Wagner, linux-kernel, linux-rt-users, Paul Gortmaker,
	Marcelo Tosatti, Paolo Bonzini, Paul E. McKenney, Ingo Molnar

On Wed, Nov 04, 2015 at 11:33:51AM +0100, Thomas Gleixner wrote:
> On Tue, 20 Oct 2015, Daniel Wagner wrote:
> > +
> > +extern void swake_up(struct swait_queue_head *q);
> > +extern void swake_up_all(struct swait_queue_head *q);
> > +extern void swake_up_locked(struct swait_queue_head *q);
> 
> I intentionally named these functions swait_wake* in my initial
> implementation for two reasons:
> 
>   - typoing wake_up vs. swake_up only emits a compiler warning and does
>     not break the build

-Werror ftw, but yes good point.

>   - I really prefer new infrastructure to have a consistent prefix
>     which reflects the "subsystem". That's simpler to read and simpler
>     to grep for.

I generally agree, but seeing how this is really an 'extension' of
existing infrastructure, I went along with this.

> > +extern void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait);
> > +extern void prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait, int state);
> > +extern long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait, int state);
> > +
> > +extern void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait);
> > +extern void finish_swait(struct swait_queue_head *q, struct swait_queue *wait);
> 
> Can we please go with the original names?
> 
> swait_prepare()
> swait_prepare_locked()
> swait_finish()
> swait_finish_locked()
> 
> Hmm?
> 
> > +#define swait_event(wq, condition)					\
> 
> Here we have the same swait vs. wait problem as above. So either we
> come up with a slightly different name or have an explicit type check
> in the __swait_event macro.

Type check macros, otherwise you break the naming scheme you so
adamantly push for (or end up with horrid stuff like
swait_wait_event()).

I suppose we can do the rename as you propose to avoid single-letter
typos, but it does bug me to have two nearly identical bits of infra
with such dissimilar names.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/4] wait.[ch]: Introduce the simple waitqueue (swait) implementation
  2015-11-18 10:33     ` Peter Zijlstra
@ 2015-11-18 15:07       ` Thomas Gleixner
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2015-11-18 15:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Daniel Wagner, linux-kernel, linux-rt-users, Paul Gortmaker,
	Marcelo Tosatti, Paolo Bonzini, Paul E. McKenney, Ingo Molnar

On Wed, 18 Nov 2015, Peter Zijlstra wrote:
> I suppose we can do the rename as you propose to avoid single-letter
> typos, but it does bug me to have two nearly identical bits of infra
> with such dissimilar names.

I can see your point, but OTOH the existing interface is ugly and
copying it does not make it any better.

But I'm really not religious about that. Up to you :)

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 26+ messages in thread
