* [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v3) @ 2015-01-14 17:12 Marcelo Tosatti 2015-01-14 17:12 ` [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq Marcelo Tosatti ` (2 more replies) 0 siblings, 3 replies; 42+ messages in thread From: Marcelo Tosatti @ 2015-01-14 17:12 UTC (permalink / raw) To: linux-kernel, linux-rt-users Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini Against v3.14-rt branch of git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git The problem: On -RT, an emulated LAPIC timer instance has the following path: 1) hard interrupt 2) ksoftirqd is scheduled 3) ksoftirqd wakes up vcpu thread 4) vcpu thread is scheduled This extra context switch introduces unnecessary latency in the LAPIC path for a KVM guest. The solution: Allow waking up vcpu thread from hardirq context, thus avoiding the need for ksoftirqd to be scheduled. Normal waitqueues make use of spinlocks, which on -RT are sleepable locks. Therefore, waking up a waitqueue waiter involves locking a sleeping lock, which is not allowed from hard interrupt context. cyclictest command line: # cyclictest -m -n -q -p99 -l 1000000 -h60 -D 1m This patch reduces the average latency in my tests from 14us to 11us. v2: improve changelog (Rik van Riel) v3: limit (once) guest triggered printk and WARN_ON (Paolo Bonzini) ^ permalink raw reply [flat|nested] 42+ messages in thread
* [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-01-14 17:12 [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v3) Marcelo Tosatti @ 2015-01-14 17:12 ` Marcelo Tosatti 2015-01-14 18:22 ` Rik van Riel ` (2 more replies) 2015-01-14 17:12 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti 2015-01-14 17:35 ` [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v3) Paolo Bonzini 2 siblings, 3 replies; 42+ messages in thread From: Marcelo Tosatti @ 2015-01-14 17:12 UTC (permalink / raw) To: linux-kernel, linux-rt-users Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini, Marcelo Tosatti [-- Attachment #1: kvm-wq-use-simple-waitqueue --] [-- Type: text/plain, Size: 14747 bytes --] The problem: On -RT, an emulated LAPIC timer instance has the following path: 1) hard interrupt 2) ksoftirqd is scheduled 3) ksoftirqd wakes up vcpu thread 4) vcpu thread is scheduled This extra context switch introduces unnecessary latency in the LAPIC path for a KVM guest. The solution: Allow waking up vcpu thread from hardirq context, thus avoiding the need for ksoftirqd to be scheduled. Normal waitqueues make use of spinlocks, which on -RT are sleepable locks. Therefore, waking up a waitqueue waiter involves locking a sleeping lock, which is not allowed from hard interrupt context. cyclictest command line: # cyclictest -m -n -q -p99 -l 1000000 -h60 -D 1m This patch reduces the average latency in my tests from 14us to 11us.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> --- arch/arm/kvm/arm.c | 4 ++-- arch/arm/kvm/psci.c | 4 ++-- arch/mips/kvm/kvm_mips.c | 8 ++++---- arch/powerpc/include/asm/kvm_host.h | 4 ++-- arch/powerpc/kvm/book3s_hv.c | 20 ++++++++++---------- arch/s390/include/asm/kvm_host.h | 2 +- arch/s390/kvm/interrupt.c | 22 ++++++++++------------ arch/s390/kvm/sigp.c | 16 ++++++++-------- arch/x86/kvm/lapic.c | 6 +++--- include/linux/kvm_host.h | 4 ++-- virt/kvm/async_pf.c | 4 ++-- virt/kvm/kvm_main.c | 16 ++++++++-------- 12 files changed, 54 insertions(+), 56 deletions(-) Index: linux-stable-rt/arch/arm/kvm/arm.c =================================================================== --- linux-stable-rt.orig/arch/arm/kvm/arm.c 2014-11-25 14:13:39.188899952 -0200 +++ linux-stable-rt/arch/arm/kvm/arm.c 2014-11-25 14:14:38.620810092 -0200 @@ -495,9 +495,9 @@ static void vcpu_pause(struct kvm_vcpu *vcpu) { - wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu); + struct swait_head *wq = kvm_arch_vcpu_wq(vcpu); - wait_event_interruptible(*wq, !vcpu->arch.pause); + swait_event_interruptible(*wq, !vcpu->arch.pause); } static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu) Index: linux-stable-rt/arch/arm/kvm/psci.c =================================================================== --- linux-stable-rt.orig/arch/arm/kvm/psci.c 2014-11-25 14:13:39.189899951 -0200 +++ linux-stable-rt/arch/arm/kvm/psci.c 2014-11-25 14:14:38.620810092 -0200 @@ -36,7 +36,7 @@ { struct kvm *kvm = source_vcpu->kvm; struct kvm_vcpu *vcpu = NULL, *tmp; - wait_queue_head_t *wq; + struct swait_head *wq; unsigned long cpu_id; unsigned long mpidr; phys_addr_t target_pc; @@ -80,7 +80,7 @@ smp_mb(); /* Make sure the above is visible */ wq = kvm_arch_vcpu_wq(vcpu); - wake_up_interruptible(wq); + swait_wake_interruptible(wq); return KVM_PSCI_RET_SUCCESS; } Index: linux-stable-rt/arch/mips/kvm/kvm_mips.c =================================================================== --- 
linux-stable-rt.orig/arch/mips/kvm/kvm_mips.c 2014-11-25 14:13:39.191899948 -0200 +++ linux-stable-rt/arch/mips/kvm/kvm_mips.c 2014-11-25 14:14:38.621810091 -0200 @@ -464,8 +464,8 @@ dvcpu->arch.wait = 0; - if (waitqueue_active(&dvcpu->wq)) { - wake_up_interruptible(&dvcpu->wq); + if (swaitqueue_active(&dvcpu->wq)) { + swait_wake_interruptible(&dvcpu->wq); } return 0; @@ -971,8 +971,8 @@ kvm_mips_callbacks->queue_timer_int(vcpu); vcpu->arch.wait = 0; - if (waitqueue_active(&vcpu->wq)) { - wake_up_interruptible(&vcpu->wq); + if (swaitqueue_active(&vcpu->wq)) { + swait_wake_nterruptible(&vcpu->wq); } } Index: linux-stable-rt/arch/powerpc/include/asm/kvm_host.h =================================================================== --- linux-stable-rt.orig/arch/powerpc/include/asm/kvm_host.h 2014-11-25 14:13:39.193899944 -0200 +++ linux-stable-rt/arch/powerpc/include/asm/kvm_host.h 2014-11-25 14:14:38.621810091 -0200 @@ -295,7 +295,7 @@ u8 in_guest; struct list_head runnable_threads; spinlock_t lock; - wait_queue_head_t wq; + struct swait_head wq; u64 stolen_tb; u64 preempt_tb; struct kvm_vcpu *runner; @@ -612,7 +612,7 @@ u8 prodded; u32 last_inst; - wait_queue_head_t *wqp; + struct swait_head *wqp; struct kvmppc_vcore *vcore; int ret; int trap; Index: linux-stable-rt/arch/powerpc/kvm/book3s_hv.c =================================================================== --- linux-stable-rt.orig/arch/powerpc/kvm/book3s_hv.c 2014-11-25 14:13:39.195899942 -0200 +++ linux-stable-rt/arch/powerpc/kvm/book3s_hv.c 2014-11-25 14:14:38.625810085 -0200 @@ -74,11 +74,11 @@ { int me; int cpu = vcpu->cpu; - wait_queue_head_t *wqp; + struct swait_head *wqp; wqp = kvm_arch_vcpu_wq(vcpu); - if (waitqueue_active(wqp)) { - wake_up_interruptible(wqp); + if (swaitqueue_active(wqp)) { + swait_wake_interruptible(wqp); ++vcpu->stat.halt_wakeup; } @@ -583,8 +583,8 @@ tvcpu->arch.prodded = 1; smp_mb(); if (vcpu->arch.ceded) { - if (waitqueue_active(&vcpu->wq)) { - wake_up_interruptible(&vcpu->wq); + if 
(swaitqueue_active(&vcpu->wq)) { + swait_wake_interruptible(&vcpu->wq); vcpu->stat.halt_wakeup++; } } @@ -1199,7 +1199,7 @@ if (vcore) { INIT_LIST_HEAD(&vcore->runnable_threads); spin_lock_init(&vcore->lock); - init_waitqueue_head(&vcore->wq); + init_swait_head(&vcore->wq); vcore->preempt_tb = TB_NIL; vcore->lpcr = kvm->arch.lpcr; vcore->first_vcpuid = core * threads_per_core; @@ -1566,13 +1566,13 @@ */ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc) { - DEFINE_WAIT(wait); + DEFINE_SWAITER(wait); - prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE); + swait_prepare(&vc->wq, &wait, TASK_INTERRUPTIBLE); vc->vcore_state = VCORE_SLEEPING; spin_unlock(&vc->lock); schedule(); - finish_wait(&vc->wq, &wait); + swait_finish(&vc->wq, &wait); spin_lock(&vc->lock); vc->vcore_state = VCORE_INACTIVE; } @@ -1613,7 +1613,7 @@ kvmppc_create_dtl_entry(vcpu, vc); kvmppc_start_thread(vcpu); } else if (vc->vcore_state == VCORE_SLEEPING) { - wake_up(&vc->wq); + swait_wake(&vc->wq); } } Index: linux-stable-rt/arch/s390/include/asm/kvm_host.h =================================================================== --- linux-stable-rt.orig/arch/s390/include/asm/kvm_host.h 2014-11-25 14:13:39.196899940 -0200 +++ linux-stable-rt/arch/s390/include/asm/kvm_host.h 2014-11-25 14:14:38.627810082 -0200 @@ -233,7 +233,7 @@ atomic_t active; struct kvm_s390_float_interrupt *float_int; int timer_due; /* event indicator for waitqueue below */ - wait_queue_head_t *wq; + struct swait_head *wq; atomic_t *cpuflags; unsigned int action_bits; }; Index: linux-stable-rt/arch/s390/kvm/interrupt.c =================================================================== --- linux-stable-rt.orig/arch/s390/kvm/interrupt.c 2014-11-25 14:13:39.197899939 -0200 +++ linux-stable-rt/arch/s390/kvm/interrupt.c 2014-11-25 14:14:38.630810077 -0200 @@ -403,7 +403,7 @@ int kvm_s390_handle_wait(struct kvm_vcpu *vcpu) { u64 now, sltime; - DECLARE_WAITQUEUE(wait, current); + DECLARE_SWAITER(wait); vcpu->stat.exit_wait_state++; 
if (kvm_cpu_has_interrupt(vcpu)) @@ -440,12 +440,11 @@ srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx); spin_lock(&vcpu->arch.local_int.float_int->lock); spin_lock_bh(&vcpu->arch.local_int.lock); - add_wait_queue(&vcpu->wq, &wait); + swait_prepare(&vcpu->wq, &wait, TASK_INTERRUPTIBLE); while (list_empty(&vcpu->arch.local_int.list) && list_empty(&vcpu->arch.local_int.float_int->list) && (!vcpu->arch.local_int.timer_due) && !signal_pending(current)) { - set_current_state(TASK_INTERRUPTIBLE); spin_unlock_bh(&vcpu->arch.local_int.lock); spin_unlock(&vcpu->arch.local_int.float_int->lock); schedule(); @@ -453,8 +452,7 @@ spin_lock_bh(&vcpu->arch.local_int.lock); } __unset_cpu_idle(vcpu); - __set_current_state(TASK_RUNNING); - remove_wait_queue(&vcpu->wq, &wait); + swait_finish(&vcpu->wq, &wait); spin_unlock_bh(&vcpu->arch.local_int.lock); spin_unlock(&vcpu->arch.local_int.float_int->lock); vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); @@ -469,8 +467,8 @@ spin_lock(&vcpu->arch.local_int.lock); vcpu->arch.local_int.timer_due = 1; - if (waitqueue_active(&vcpu->wq)) - wake_up_interruptible(&vcpu->wq); + if (swaitqueue_active(&vcpu->wq)) + swait_wake_interruptible(&vcpu->wq); spin_unlock(&vcpu->arch.local_int.lock); } @@ -617,7 +615,7 @@ spin_lock_bh(&li->lock); list_add(&inti->list, &li->list); atomic_set(&li->active, 1); - BUG_ON(waitqueue_active(li->wq)); + BUG_ON(swaitqueue_active(li->wq)); spin_unlock_bh(&li->lock); return 0; } @@ -750,8 +748,8 @@ li = fi->local_int[sigcpu]; spin_lock_bh(&li->lock); atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags); - if (waitqueue_active(li->wq)) - wake_up_interruptible(li->wq); + if (swaitqueue_active(li->wq)) + swait_wake_interruptible(li->wq); spin_unlock_bh(&li->lock); spin_unlock(&fi->lock); mutex_unlock(&kvm->lock); @@ -836,8 +834,8 @@ if (inti->type == KVM_S390_SIGP_STOP) li->action_bits |= ACTION_STOP_ON_STOP; atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags); - if (waitqueue_active(&vcpu->wq)) - 
wake_up_interruptible(&vcpu->wq); + if (swaitqueue_active(&vcpu->wq)) + swait_wake_interruptible(&vcpu->wq); spin_unlock_bh(&li->lock); mutex_unlock(&vcpu->kvm->lock); return 0; Index: linux-stable-rt/arch/s390/kvm/sigp.c =================================================================== --- linux-stable-rt.orig/arch/s390/kvm/sigp.c 2014-11-25 14:13:39.198899937 -0200 +++ linux-stable-rt/arch/s390/kvm/sigp.c 2014-11-25 14:14:38.633810073 -0200 @@ -79,8 +79,8 @@ list_add_tail(&inti->list, &li->list); atomic_set(&li->active, 1); atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags); - if (waitqueue_active(li->wq)) - wake_up_interruptible(li->wq); + if (swaitqueue_active(li->wq)) + swait_wake_interruptible(li->wq); spin_unlock_bh(&li->lock); rc = SIGP_CC_ORDER_CODE_ACCEPTED; VCPU_EVENT(vcpu, 4, "sent sigp emerg to cpu %x", cpu_addr); @@ -148,8 +148,8 @@ list_add_tail(&inti->list, &li->list); atomic_set(&li->active, 1); atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags); - if (waitqueue_active(li->wq)) - wake_up_interruptible(li->wq); + if (swaitqueue_active(li->wq)) + swait_wake_interruptible(li->wq); spin_unlock_bh(&li->lock); rc = SIGP_CC_ORDER_CODE_ACCEPTED; VCPU_EVENT(vcpu, 4, "sent sigp ext call to cpu %x", cpu_addr); @@ -179,8 +179,8 @@ atomic_set(&li->active, 1); atomic_set_mask(CPUSTAT_STOP_INT, li->cpuflags); li->action_bits |= action; - if (waitqueue_active(li->wq)) - wake_up_interruptible(li->wq); + if (swaitqueue_active(li->wq)) + swait_wake_interruptible(li->wq); out: spin_unlock_bh(&li->lock); @@ -288,8 +288,8 @@ list_add_tail(&inti->list, &li->list); atomic_set(&li->active, 1); - if (waitqueue_active(li->wq)) - wake_up_interruptible(li->wq); + if (swaitqueue_active(li->wq)) + swait_wake_interruptible(li->wq); rc = SIGP_CC_ORDER_CODE_ACCEPTED; VCPU_EVENT(vcpu, 4, "set prefix of cpu %02x to %x", cpu_addr, address); Index: linux-stable-rt/arch/x86/kvm/lapic.c =================================================================== --- 
linux-stable-rt.orig/arch/x86/kvm/lapic.c 2014-11-25 14:13:39.199899935 -0200 +++ linux-stable-rt/arch/x86/kvm/lapic.c 2015-01-14 15:02:41.135771065 -0200 @@ -1533,7 +1533,7 @@ struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer); struct kvm_lapic *apic = container_of(ktimer, struct kvm_lapic, lapic_timer); struct kvm_vcpu *vcpu = apic->vcpu; - wait_queue_head_t *q = &vcpu->wq; + struct swait_head *q = &vcpu->wq; /* * There is a race window between reading and incrementing, but we do @@ -1547,8 +1547,8 @@ kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu); } - if (waitqueue_active(q)) - wake_up_interruptible(q); + if (swaitqueue_active(q)) + swait_wake_interruptible(q); if (lapic_is_periodic(apic)) { hrtimer_add_expires_ns(&ktimer->timer, ktimer->period); Index: linux-stable-rt/include/linux/kvm_host.h =================================================================== --- linux-stable-rt.orig/include/linux/kvm_host.h 2014-11-25 14:13:39.201899932 -0200 +++ linux-stable-rt/include/linux/kvm_host.h 2014-11-25 14:14:38.638810065 -0200 @@ -231,7 +231,7 @@ int fpu_active; int guest_fpu_loaded, guest_xcr0_loaded; - wait_queue_head_t wq; + struct swait_head wq; struct pid *pid; int sigset_active; sigset_t sigset; @@ -677,7 +677,7 @@ } #endif -static inline wait_queue_head_t *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu) +static inline struct swait_head *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu) { #ifdef __KVM_HAVE_ARCH_WQP return vcpu->arch.wqp; Index: linux-stable-rt/virt/kvm/async_pf.c =================================================================== --- linux-stable-rt.orig/virt/kvm/async_pf.c 2014-11-25 14:13:39.202899931 -0200 +++ linux-stable-rt/virt/kvm/async_pf.c 2014-11-25 14:14:38.639810063 -0200 @@ -82,8 +82,8 @@ trace_kvm_async_pf_completed(addr, gva); - if (waitqueue_active(&vcpu->wq)) - wake_up_interruptible(&vcpu->wq); + if (swaitqueue_active(&vcpu->wq)) + swait_wake_interruptible(&vcpu->wq); mmput(mm); kvm_put_kvm(vcpu->kvm); Index: 
linux-stable-rt/virt/kvm/kvm_main.c =================================================================== --- linux-stable-rt.orig/virt/kvm/kvm_main.c 2014-11-25 14:13:39.204899928 -0200 +++ linux-stable-rt/virt/kvm/kvm_main.c 2014-11-25 14:14:38.641810060 -0200 @@ -219,7 +219,7 @@ vcpu->kvm = kvm; vcpu->vcpu_id = id; vcpu->pid = NULL; - init_waitqueue_head(&vcpu->wq); + init_swait_head(&vcpu->wq); kvm_async_pf_vcpu_init(vcpu); page = alloc_page(GFP_KERNEL | __GFP_ZERO); @@ -1675,10 +1675,10 @@ */ void kvm_vcpu_block(struct kvm_vcpu *vcpu) { - DEFINE_WAIT(wait); + DEFINE_SWAITER(wait); for (;;) { - prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE); + swait_prepare(&vcpu->wq, &wait, TASK_INTERRUPTIBLE); if (kvm_arch_vcpu_runnable(vcpu)) { kvm_make_request(KVM_REQ_UNHALT, vcpu); @@ -1692,7 +1692,7 @@ schedule(); } - finish_wait(&vcpu->wq, &wait); + swait_finish(&vcpu->wq, &wait); } EXPORT_SYMBOL_GPL(kvm_vcpu_block); @@ -1704,11 +1704,11 @@ { int me; int cpu = vcpu->cpu; - wait_queue_head_t *wqp; + struct swait_head *wqp; wqp = kvm_arch_vcpu_wq(vcpu); - if (waitqueue_active(wqp)) { - wake_up_interruptible(wqp); + if (swaitqueue_active(wqp)) { + swait_wake_interruptible(wqp); ++vcpu->stat.halt_wakeup; } @@ -1814,7 +1814,7 @@ continue; if (vcpu == me) continue; - if (waitqueue_active(&vcpu->wq)) + if (swaitqueue_active(&vcpu->wq)) continue; if (!kvm_vcpu_eligible_for_directed_yield(vcpu)) continue; ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-01-14 17:12 ` [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq Marcelo Tosatti @ 2015-01-14 18:22 ` Rik van Riel 2015-01-16 16:48 ` Steven Rostedt 2015-03-06 13:54 ` Sebastian Andrzej Siewior 2 siblings, 0 replies; 42+ messages in thread From: Rik van Riel @ 2015-01-14 18:22 UTC (permalink / raw) To: Marcelo Tosatti, linux-kernel, linux-rt-users Cc: Luiz Capitulino, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini On 01/14/2015 12:12 PM, Marcelo Tosatti wrote: > The problem: > > On -RT, an emulated LAPIC timer instances has the following path: > > 1) hard interrupt > 2) ksoftirqd is scheduled > 3) ksoftirqd wakes up vcpu thread > 4) vcpu thread is scheduled > > This extra context switch introduces unnecessary latency in the > LAPIC path for a KVM guest. > This patch reduces the average latency in my tests from 14us to 11us. > > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-01-14 17:12 ` [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq Marcelo Tosatti 2015-01-14 18:22 ` Rik van Riel @ 2015-01-16 16:48 ` Steven Rostedt 2015-01-16 16:56 ` Steven Rostedt ` (2 more replies) 2015-03-06 13:54 ` Sebastian Andrzej Siewior 2 siblings, 3 replies; 42+ messages in thread From: Steven Rostedt @ 2015-01-16 16:48 UTC (permalink / raw) To: Marcelo Tosatti Cc: linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini, Peter Zijlstra On Fri, 16 Jan 2015 11:40:45 -0500 Marcelo Tosatti <mtosatti@redhat.com> wrote: > The problem: > > On -RT, an emulated LAPIC timer instances has the following path: > > 1) hard interrupt > 2) ksoftirqd is scheduled > 3) ksoftirqd wakes up vcpu thread > 4) vcpu thread is scheduled > > This extra context switch introduces unnecessary latency in the > LAPIC path for a KVM guest. > > The solution: > > Allow waking up vcpu thread from hardirq context, > thus avoiding the need for ksoftirqd to be scheduled. > > Normal waitqueues make use of spinlocks, which on -RT > are sleepable locks. Therefore, waking up a waitqueue > waiter involves locking a sleeping lock, which > is not allowed from hard interrupt context. > I have no issue in pulling this into stable-rt, but read below. > cyclictest command line: > # cyclictest -m -n -q -p99 -l 1000000 -h60 -D 1m > > This patch reduces the average latency in my tests from 14us to 11us. 
> > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> > > --- > arch/arm/kvm/arm.c | 4 ++-- > arch/arm/kvm/psci.c | 4 ++-- > arch/mips/kvm/kvm_mips.c | 8 ++++---- > arch/powerpc/include/asm/kvm_host.h | 4 ++-- > arch/powerpc/kvm/book3s_hv.c | 20 ++++++++++---------- > arch/s390/include/asm/kvm_host.h | 2 +- > arch/s390/kvm/interrupt.c | 22 ++++++++++------------ > arch/s390/kvm/sigp.c | 16 ++++++++-------- > arch/x86/kvm/lapic.c | 6 +++--- > include/linux/kvm_host.h | 4 ++-- > virt/kvm/async_pf.c | 4 ++-- > virt/kvm/kvm_main.c | 16 ++++++++-------- > 12 files changed, 54 insertions(+), 56 deletions(-) > > Index: linux-stable-rt/arch/arm/kvm/arm.c > =================================================================== > --- linux-stable-rt.orig/arch/arm/kvm/arm.c 2014-11-25 14:13:39.188899952 -0200 > +++ linux-stable-rt/arch/arm/kvm/arm.c 2014-11-25 14:14:38.620810092 -0200 > @@ -495,9 +495,9 @@ > > static void vcpu_pause(struct kvm_vcpu *vcpu) > { > - wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu); > + struct swait_head *wq = kvm_arch_vcpu_wq(vcpu); > > - wait_event_interruptible(*wq, !vcpu->arch.pause); > + swait_event_interruptible(*wq, !vcpu->arch.pause); > } > > static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu) > Index: linux-stable-rt/arch/arm/kvm/psci.c > =================================================================== > --- linux-stable-rt.orig/arch/arm/kvm/psci.c 2014-11-25 14:13:39.189899951 -0200 > +++ linux-stable-rt/arch/arm/kvm/psci.c 2014-11-25 14:14:38.620810092 -0200 > @@ -36,7 +36,7 @@ > { > struct kvm *kvm = source_vcpu->kvm; > struct kvm_vcpu *vcpu = NULL, *tmp; > - wait_queue_head_t *wq; > + struct swait_head *wq; > unsigned long cpu_id; > unsigned long mpidr; > phys_addr_t target_pc; > @@ -80,7 +80,7 @@ > smp_mb(); /* Make sure the above is visible */ > > wq = kvm_arch_vcpu_wq(vcpu); > - wake_up_interruptible(wq); > + swait_wake_interruptible(wq); > > return KVM_PSCI_RET_SUCCESS; > } > Index: linux-stable-rt/arch/mips/kvm/kvm_mips.c > 
=================================================================== > --- linux-stable-rt.orig/arch/mips/kvm/kvm_mips.c 2014-11-25 14:13:39.191899948 -0200 > +++ linux-stable-rt/arch/mips/kvm/kvm_mips.c 2014-11-25 14:14:38.621810091 -0200 > @@ -464,8 +464,8 @@ > > dvcpu->arch.wait = 0; > > - if (waitqueue_active(&dvcpu->wq)) { > - wake_up_interruptible(&dvcpu->wq); > + if (swaitqueue_active(&dvcpu->wq)) { > + swait_wake_interruptible(&dvcpu->wq); > } > > return 0; > @@ -971,8 +971,8 @@ > kvm_mips_callbacks->queue_timer_int(vcpu); > > vcpu->arch.wait = 0; > - if (waitqueue_active(&vcpu->wq)) { > - wake_up_interruptible(&vcpu->wq); > + if (swaitqueue_active(&vcpu->wq)) { > + swait_wake_nterruptible(&vcpu->wq); Sure you compiled this patch? > } > } > > Index: linux-stable-rt/arch/powerpc/include/asm/kvm_host.h > =================================================================== > --- linux-stable-rt.orig/arch/powerpc/include/asm/kvm_host.h 2014-11-25 14:13:39.193899944 -0200 > +++ linux-stable-rt/arch/powerpc/include/asm/kvm_host.h 2014-11-25 14:14:38.621810091 -0200 > @@ -295,7 +295,7 @@ > u8 in_guest; > struct list_head runnable_threads; > spinlock_t lock; > - wait_queue_head_t wq; > + struct swait_head wq; > u64 stolen_tb; > u64 preempt_tb; > struct kvm_vcpu *runner; > @@ -612,7 +612,7 @@ > u8 prodded; > u32 last_inst; > > - wait_queue_head_t *wqp; > + struct swait_head *wqp; > struct kvmppc_vcore *vcore; > int ret; > int trap; > Index: linux-stable-rt/arch/powerpc/kvm/book3s_hv.c > =================================================================== > --- linux-stable-rt.orig/arch/powerpc/kvm/book3s_hv.c 2014-11-25 14:13:39.195899942 -0200 > +++ linux-stable-rt/arch/powerpc/kvm/book3s_hv.c 2014-11-25 14:14:38.625810085 -0200 > @@ -74,11 +74,11 @@ > { > int me; > int cpu = vcpu->cpu; > - wait_queue_head_t *wqp; > + struct swait_head *wqp; > > wqp = kvm_arch_vcpu_wq(vcpu); > - if (waitqueue_active(wqp)) { > - wake_up_interruptible(wqp); > + if 
(swaitqueue_active(wqp)) { > + swait_wake_interruptible(wqp); > ++vcpu->stat.halt_wakeup; > } > > @@ -583,8 +583,8 @@ > tvcpu->arch.prodded = 1; > smp_mb(); > if (vcpu->arch.ceded) { > - if (waitqueue_active(&vcpu->wq)) { > - wake_up_interruptible(&vcpu->wq); > + if (swaitqueue_active(&vcpu->wq)) { > + swait_wake_interruptible(&vcpu->wq); > vcpu->stat.halt_wakeup++; > } > } > @@ -1199,7 +1199,7 @@ > if (vcore) { > INIT_LIST_HEAD(&vcore->runnable_threads); > spin_lock_init(&vcore->lock); > - init_waitqueue_head(&vcore->wq); > + init_swait_head(&vcore->wq); > vcore->preempt_tb = TB_NIL; > vcore->lpcr = kvm->arch.lpcr; > vcore->first_vcpuid = core * threads_per_core; > @@ -1566,13 +1566,13 @@ > */ > static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc) > { > - DEFINE_WAIT(wait); > + DEFINE_SWAITER(wait); > > - prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE); > + swait_prepare(&vc->wq, &wait, TASK_INTERRUPTIBLE); > vc->vcore_state = VCORE_SLEEPING; > spin_unlock(&vc->lock); > schedule(); > - finish_wait(&vc->wq, &wait); > + swait_finish(&vc->wq, &wait); > spin_lock(&vc->lock); > vc->vcore_state = VCORE_INACTIVE; > } > @@ -1613,7 +1613,7 @@ > kvmppc_create_dtl_entry(vcpu, vc); > kvmppc_start_thread(vcpu); > } else if (vc->vcore_state == VCORE_SLEEPING) { > - wake_up(&vc->wq); > + swait_wake(&vc->wq); I notice everywhere you have a swait_wake_interruptible() but here. Is there a reason why? IIRC, Peter wants to make swait wakeup usage homogenous. That is, you either sleep in an interruptible state, or you don't. You can't mix and match it. Peter is that what you plan? 
-- Steve > } > > } > Index: linux-stable-rt/arch/s390/include/asm/kvm_host.h > =================================================================== > --- linux-stable-rt.orig/arch/s390/include/asm/kvm_host.h 2014-11-25 14:13:39.196899940 -0200 > +++ linux-stable-rt/arch/s390/include/asm/kvm_host.h 2014-11-25 14:14:38.627810082 -0200 > @@ -233,7 +233,7 @@ > atomic_t active; > struct kvm_s390_float_interrupt *float_int; > int timer_due; /* event indicator for waitqueue below */ > - wait_queue_head_t *wq; > + struct swait_head *wq; > atomic_t *cpuflags; > unsigned int action_bits; > }; > Index: linux-stable-rt/arch/s390/kvm/interrupt.c > =================================================================== > --- linux-stable-rt.orig/arch/s390/kvm/interrupt.c 2014-11-25 14:13:39.197899939 -0200 > +++ linux-stable-rt/arch/s390/kvm/interrupt.c 2014-11-25 14:14:38.630810077 -0200 > @@ -403,7 +403,7 @@ > int kvm_s390_handle_wait(struct kvm_vcpu *vcpu) > { > u64 now, sltime; > - DECLARE_WAITQUEUE(wait, current); > + DECLARE_SWAITER(wait); > > vcpu->stat.exit_wait_state++; > if (kvm_cpu_has_interrupt(vcpu)) > @@ -440,12 +440,11 @@ > srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx); > spin_lock(&vcpu->arch.local_int.float_int->lock); > spin_lock_bh(&vcpu->arch.local_int.lock); > - add_wait_queue(&vcpu->wq, &wait); > + swait_prepare(&vcpu->wq, &wait, TASK_INTERRUPTIBLE); > while (list_empty(&vcpu->arch.local_int.list) && > list_empty(&vcpu->arch.local_int.float_int->list) && > (!vcpu->arch.local_int.timer_due) && > !signal_pending(current)) { > - set_current_state(TASK_INTERRUPTIBLE); > spin_unlock_bh(&vcpu->arch.local_int.lock); > spin_unlock(&vcpu->arch.local_int.float_int->lock); > schedule(); > @@ -453,8 +452,7 @@ > spin_lock_bh(&vcpu->arch.local_int.lock); > } > __unset_cpu_idle(vcpu); > - __set_current_state(TASK_RUNNING); > - remove_wait_queue(&vcpu->wq, &wait); > + swait_finish(&vcpu->wq, &wait); > spin_unlock_bh(&vcpu->arch.local_int.lock); > 
spin_unlock(&vcpu->arch.local_int.float_int->lock); > vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); > @@ -469,8 +467,8 @@ > > spin_lock(&vcpu->arch.local_int.lock); > vcpu->arch.local_int.timer_due = 1; > - if (waitqueue_active(&vcpu->wq)) > - wake_up_interruptible(&vcpu->wq); > + if (swaitqueue_active(&vcpu->wq)) > + swait_wake_interruptible(&vcpu->wq); > spin_unlock(&vcpu->arch.local_int.lock); > } > > @@ -617,7 +615,7 @@ > spin_lock_bh(&li->lock); > list_add(&inti->list, &li->list); > atomic_set(&li->active, 1); > - BUG_ON(waitqueue_active(li->wq)); > + BUG_ON(swaitqueue_active(li->wq)); > spin_unlock_bh(&li->lock); > return 0; > } > @@ -750,8 +748,8 @@ > li = fi->local_int[sigcpu]; > spin_lock_bh(&li->lock); > atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags); > - if (waitqueue_active(li->wq)) > - wake_up_interruptible(li->wq); > + if (swaitqueue_active(li->wq)) > + swait_wake_interruptible(li->wq); > spin_unlock_bh(&li->lock); > spin_unlock(&fi->lock); > mutex_unlock(&kvm->lock); > @@ -836,8 +834,8 @@ > if (inti->type == KVM_S390_SIGP_STOP) > li->action_bits |= ACTION_STOP_ON_STOP; > atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags); > - if (waitqueue_active(&vcpu->wq)) > - wake_up_interruptible(&vcpu->wq); > + if (swaitqueue_active(&vcpu->wq)) > + swait_wake_interruptible(&vcpu->wq); > spin_unlock_bh(&li->lock); > mutex_unlock(&vcpu->kvm->lock); > return 0; > Index: linux-stable-rt/arch/s390/kvm/sigp.c > =================================================================== > --- linux-stable-rt.orig/arch/s390/kvm/sigp.c 2014-11-25 14:13:39.198899937 -0200 > +++ linux-stable-rt/arch/s390/kvm/sigp.c 2014-11-25 14:14:38.633810073 -0200 > @@ -79,8 +79,8 @@ > list_add_tail(&inti->list, &li->list); > atomic_set(&li->active, 1); > atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags); > - if (waitqueue_active(li->wq)) > - wake_up_interruptible(li->wq); > + if (swaitqueue_active(li->wq)) > + swait_wake_interruptible(li->wq); > spin_unlock_bh(&li->lock); > rc = 
SIGP_CC_ORDER_CODE_ACCEPTED; > VCPU_EVENT(vcpu, 4, "sent sigp emerg to cpu %x", cpu_addr); > @@ -148,8 +148,8 @@ > list_add_tail(&inti->list, &li->list); > atomic_set(&li->active, 1); > atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags); > - if (waitqueue_active(li->wq)) > - wake_up_interruptible(li->wq); > + if (swaitqueue_active(li->wq)) > + swait_wake_interruptible(li->wq); > spin_unlock_bh(&li->lock); > rc = SIGP_CC_ORDER_CODE_ACCEPTED; > VCPU_EVENT(vcpu, 4, "sent sigp ext call to cpu %x", cpu_addr); > @@ -179,8 +179,8 @@ > atomic_set(&li->active, 1); > atomic_set_mask(CPUSTAT_STOP_INT, li->cpuflags); > li->action_bits |= action; > - if (waitqueue_active(li->wq)) > - wake_up_interruptible(li->wq); > + if (swaitqueue_active(li->wq)) > + swait_wake_interruptible(li->wq); > out: > spin_unlock_bh(&li->lock); > > @@ -288,8 +288,8 @@ > > list_add_tail(&inti->list, &li->list); > atomic_set(&li->active, 1); > - if (waitqueue_active(li->wq)) > - wake_up_interruptible(li->wq); > + if (swaitqueue_active(li->wq)) > + swait_wake_interruptible(li->wq); > rc = SIGP_CC_ORDER_CODE_ACCEPTED; > > VCPU_EVENT(vcpu, 4, "set prefix of cpu %02x to %x", cpu_addr, address); > Index: linux-stable-rt/arch/x86/kvm/lapic.c > =================================================================== > --- linux-stable-rt.orig/arch/x86/kvm/lapic.c 2014-11-25 14:13:39.199899935 -0200 > +++ linux-stable-rt/arch/x86/kvm/lapic.c 2015-01-14 15:02:41.135771065 -0200 > @@ -1533,7 +1533,7 @@ > struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer); > struct kvm_lapic *apic = container_of(ktimer, struct kvm_lapic, lapic_timer); > struct kvm_vcpu *vcpu = apic->vcpu; > - wait_queue_head_t *q = &vcpu->wq; > + struct swait_head *q = &vcpu->wq; > > /* > * There is a race window between reading and incrementing, but we do > @@ -1547,8 +1547,8 @@ > kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu); > } > > - if (waitqueue_active(q)) > - wake_up_interruptible(q); > + if (swaitqueue_active(q)) > + 
swait_wake_interruptible(q); > > if (lapic_is_periodic(apic)) { > hrtimer_add_expires_ns(&ktimer->timer, ktimer->period); > Index: linux-stable-rt/include/linux/kvm_host.h > =================================================================== > --- linux-stable-rt.orig/include/linux/kvm_host.h 2014-11-25 14:13:39.201899932 -0200 > +++ linux-stable-rt/include/linux/kvm_host.h 2014-11-25 14:14:38.638810065 -0200 > @@ -231,7 +231,7 @@ > > int fpu_active; > int guest_fpu_loaded, guest_xcr0_loaded; > - wait_queue_head_t wq; > + struct swait_head wq; > struct pid *pid; > int sigset_active; > sigset_t sigset; > @@ -677,7 +677,7 @@ > } > #endif > > -static inline wait_queue_head_t *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu) > +static inline struct swait_head *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu) > { > #ifdef __KVM_HAVE_ARCH_WQP > return vcpu->arch.wqp; > Index: linux-stable-rt/virt/kvm/async_pf.c > =================================================================== > --- linux-stable-rt.orig/virt/kvm/async_pf.c 2014-11-25 14:13:39.202899931 -0200 > +++ linux-stable-rt/virt/kvm/async_pf.c 2014-11-25 14:14:38.639810063 -0200 > @@ -82,8 +82,8 @@ > > trace_kvm_async_pf_completed(addr, gva); > > - if (waitqueue_active(&vcpu->wq)) > - wake_up_interruptible(&vcpu->wq); > + if (swaitqueue_active(&vcpu->wq)) > + swait_wake_interruptible(&vcpu->wq); > > mmput(mm); > kvm_put_kvm(vcpu->kvm); > Index: linux-stable-rt/virt/kvm/kvm_main.c > =================================================================== > --- linux-stable-rt.orig/virt/kvm/kvm_main.c 2014-11-25 14:13:39.204899928 -0200 > +++ linux-stable-rt/virt/kvm/kvm_main.c 2014-11-25 14:14:38.641810060 -0200 > @@ -219,7 +219,7 @@ > vcpu->kvm = kvm; > vcpu->vcpu_id = id; > vcpu->pid = NULL; > - init_waitqueue_head(&vcpu->wq); > + init_swait_head(&vcpu->wq); > kvm_async_pf_vcpu_init(vcpu); > > page = alloc_page(GFP_KERNEL | __GFP_ZERO); > @@ -1675,10 +1675,10 @@ > */ > void kvm_vcpu_block(struct kvm_vcpu *vcpu) > { > - 
DEFINE_WAIT(wait); > + DEFINE_SWAITER(wait); > > for (;;) { > - prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE); > + swait_prepare(&vcpu->wq, &wait, TASK_INTERRUPTIBLE); > > if (kvm_arch_vcpu_runnable(vcpu)) { > kvm_make_request(KVM_REQ_UNHALT, vcpu); > @@ -1692,7 +1692,7 @@ > schedule(); > } > > - finish_wait(&vcpu->wq, &wait); > + swait_finish(&vcpu->wq, &wait); > } > EXPORT_SYMBOL_GPL(kvm_vcpu_block); > > @@ -1704,11 +1704,11 @@ > { > int me; > int cpu = vcpu->cpu; > - wait_queue_head_t *wqp; > + struct swait_head *wqp; > > wqp = kvm_arch_vcpu_wq(vcpu); > - if (waitqueue_active(wqp)) { > - wake_up_interruptible(wqp); > + if (swaitqueue_active(wqp)) { > + swait_wake_interruptible(wqp); > ++vcpu->stat.halt_wakeup; > } > > @@ -1814,7 +1814,7 @@ > continue; > if (vcpu == me) > continue; > - if (waitqueue_active(&vcpu->wq)) > + if (swaitqueue_active(&vcpu->wq)) > continue; > if (!kvm_vcpu_eligible_for_directed_yield(vcpu)) > continue; > ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-01-16 16:48 ` Steven Rostedt @ 2015-01-16 16:56 ` Steven Rostedt 2015-01-17 7:57 ` Peter Zijlstra 2015-01-19 14:41 ` Marcelo Tosatti 2 siblings, 0 replies; 42+ messages in thread From: Steven Rostedt @ 2015-01-16 16:56 UTC (permalink / raw) To: Marcelo Tosatti Cc: linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini, Peter Zijlstra On Fri, 16 Jan 2015 11:48:46 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > > static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc) > > { > > - DEFINE_WAIT(wait); > > + DEFINE_SWAITER(wait); > > > > - prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE); > > + swait_prepare(&vc->wq, &wait, TASK_INTERRUPTIBLE); > > vc->vcore_state = VCORE_SLEEPING; > > spin_unlock(&vc->lock); > > schedule(); > > - finish_wait(&vc->wq, &wait); > > + swait_finish(&vc->wq, &wait); > > spin_lock(&vc->lock); > > vc->vcore_state = VCORE_INACTIVE; > > } > > @@ -1613,7 +1613,7 @@ > > kvmppc_create_dtl_entry(vcpu, vc); > > kvmppc_start_thread(vcpu); > > } else if (vc->vcore_state == VCORE_SLEEPING) { > > - wake_up(&vc->wq); > > + swait_wake(&vc->wq); > > I notice everywhere you have a swait_wake_interruptible() but here. Is > there a reason why? > Of course I could have misread this patch, and this specific wait queue will only be woken up by swait_wake() and not swait_wake_interruptible(). If that's the case, then it's fine as is (except for that typo I pointed out before). -- Steve ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-01-16 16:48 ` Steven Rostedt 2015-01-16 16:56 ` Steven Rostedt @ 2015-01-17 7:57 ` Peter Zijlstra 2015-01-19 14:41 ` Marcelo Tosatti 2 siblings, 0 replies; 42+ messages in thread From: Peter Zijlstra @ 2015-01-17 7:57 UTC (permalink / raw) To: Steven Rostedt Cc: Marcelo Tosatti, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini On Fri, Jan 16, 2015 at 11:48:46AM -0500, Steven Rostedt wrote: > I notice everywhere you have a swait_wake_interruptible() but here. Is > there a reason why? > > IIRC, Peter wants to make swait wakeup usage homogenous. That is, you > either sleep in an interruptible state, or you don't. You can't mix and > match it. > > Peter is that what you plan? Yes, this is required for the single wakeup case to be bounded. If you have both INTERRUPTIBLE and UNINTERRUPTIBLE waiters on a list, and you were allowed to do INTERRUPTIBLE/UNINTERRUPTIBLE wakeups, the wakeup would have to do O(n) iteration to find a matching waiter. Seeing that the entire purpose of simple wait queues was to retain RT properties, that's entirely retarded ;-) ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-01-16 16:48 ` Steven Rostedt 2015-01-16 16:56 ` Steven Rostedt 2015-01-17 7:57 ` Peter Zijlstra @ 2015-01-19 14:41 ` Marcelo Tosatti 2015-01-20 5:46 ` Paul Mackerras 2 siblings, 1 reply; 42+ messages in thread From: Marcelo Tosatti @ 2015-01-19 14:41 UTC (permalink / raw) To: Steven Rostedt, Paul Mackerras Cc: linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini, Peter Zijlstra On Fri, Jan 16, 2015 at 11:48:46AM -0500, Steven Rostedt wrote: > > @@ -971,8 +971,8 @@ > > kvm_mips_callbacks->queue_timer_int(vcpu); > > > > vcpu->arch.wait = 0; > > - if (waitqueue_active(&vcpu->wq)) { > > - wake_up_interruptible(&vcpu->wq); > > + if (swaitqueue_active(&vcpu->wq)) { > > + swait_wake_nterruptible(&vcpu->wq); > > Sure you compiled this patch? Only x86. > > Index: linux-stable-rt/arch/powerpc/include/asm/kvm_host.h > > =================================================================== > > --- linux-stable-rt.orig/arch/powerpc/include/asm/kvm_host.h 2014-11-25 14:13:39.193899944 -0200 > > +++ linux-stable-rt/arch/powerpc/include/asm/kvm_host.h 2014-11-25 14:14:38.621810091 -0200 > > @@ -295,7 +295,7 @@ > > u8 in_guest; > > struct list_head runnable_threads; > > spinlock_t lock; > > - wait_queue_head_t wq; > > + struct swait_head wq; > > u64 stolen_tb; > > u64 preempt_tb; > > struct kvm_vcpu *runner; > > @@ -612,7 +612,7 @@ > > u8 prodded; > > u32 last_inst; > > > > - wait_queue_head_t *wqp; > > + struct swait_head *wqp; > > struct kvmppc_vcore *vcore; > > int ret; > > int trap; > > Index: linux-stable-rt/arch/powerpc/kvm/book3s_hv.c > > =================================================================== > > --- linux-stable-rt.orig/arch/powerpc/kvm/book3s_hv.c 2014-11-25 14:13:39.195899942 -0200 > > +++ linux-stable-rt/arch/powerpc/kvm/book3s_hv.c 2014-11-25 14:14:38.625810085 -0200 > > @@ -74,11 +74,11 @@ > > { > > int me; > > int cpu = vcpu->cpu; > 
> - wait_queue_head_t *wqp; > > + struct swait_head *wqp; > > > > wqp = kvm_arch_vcpu_wq(vcpu); > > - if (waitqueue_active(wqp)) { > > - wake_up_interruptible(wqp); > > + if (swaitqueue_active(wqp)) { > > + swait_wake_interruptible(wqp); > > ++vcpu->stat.halt_wakeup; > > } > > > > @@ -583,8 +583,8 @@ > > tvcpu->arch.prodded = 1; > > smp_mb(); > > if (vcpu->arch.ceded) { > > - if (waitqueue_active(&vcpu->wq)) { > > - wake_up_interruptible(&vcpu->wq); > > + if (swaitqueue_active(&vcpu->wq)) { > > + swait_wake_interruptible(&vcpu->wq); > > vcpu->stat.halt_wakeup++; > > } > > } > > @@ -1199,7 +1199,7 @@ > > if (vcore) { > > INIT_LIST_HEAD(&vcore->runnable_threads); > > spin_lock_init(&vcore->lock); > > - init_waitqueue_head(&vcore->wq); > > + init_swait_head(&vcore->wq); > > vcore->preempt_tb = TB_NIL; > > vcore->lpcr = kvm->arch.lpcr; > > vcore->first_vcpuid = core * threads_per_core; > > @@ -1566,13 +1566,13 @@ > > */ > > static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc) > > { > > - DEFINE_WAIT(wait); > > + DEFINE_SWAITER(wait); > > > > - prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE); > > + swait_prepare(&vc->wq, &wait, TASK_INTERRUPTIBLE); > > vc->vcore_state = VCORE_SLEEPING; > > spin_unlock(&vc->lock); > > schedule(); > > - finish_wait(&vc->wq, &wait); > > + swait_finish(&vc->wq, &wait); > > spin_lock(&vc->lock); > > vc->vcore_state = VCORE_INACTIVE; > > } > > @@ -1613,7 +1613,7 @@ > > kvmppc_create_dtl_entry(vcpu, vc); > > kvmppc_start_thread(vcpu); > > } else if (vc->vcore_state == VCORE_SLEEPING) { > > - wake_up(&vc->wq); > > + swait_wake(&vc->wq); > > I notice everywhere you have a swait_wake_interruptible() but here. Is > there a reason why? > > IIRC, Peter wants to make swait wakeup usage homogenous. That is, you > either sleep in an interruptible state, or you don't. You can't mix and > match it. IIUC there is only one waiter on this waitqueue at any given time. Paul is that correct? ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-01-19 14:41 ` Marcelo Tosatti @ 2015-01-20 5:46 ` Paul Mackerras 2015-01-20 18:16 ` Steven Rostedt 0 siblings, 1 reply; 42+ messages in thread From: Paul Mackerras @ 2015-01-20 5:46 UTC (permalink / raw) To: Marcelo Tosatti Cc: Steven Rostedt, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini, Peter Zijlstra On Mon, Jan 19, 2015 at 12:41:00PM -0200, Marcelo Tosatti wrote: > On Fri, Jan 16, 2015 at 11:48:46AM -0500, Steven Rostedt wrote: > > > static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc) > > > { > > > - DEFINE_WAIT(wait); > > > + DEFINE_SWAITER(wait); > > > > > > - prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE); > > > + swait_prepare(&vc->wq, &wait, TASK_INTERRUPTIBLE); > > > vc->vcore_state = VCORE_SLEEPING; > > > spin_unlock(&vc->lock); > > > schedule(); > > > - finish_wait(&vc->wq, &wait); > > > + swait_finish(&vc->wq, &wait); > > > spin_lock(&vc->lock); > > > vc->vcore_state = VCORE_INACTIVE; > > > } > > > @@ -1613,7 +1613,7 @@ > > > kvmppc_create_dtl_entry(vcpu, vc); > > > kvmppc_start_thread(vcpu); > > > } else if (vc->vcore_state == VCORE_SLEEPING) { > > > - wake_up(&vc->wq); > > > + swait_wake(&vc->wq); > > > > I notice everywhere you have a swait_wake_interruptible() but here. Is > > there a reason why? > > > > IIRC, Peter wants to make swait wakeup usage homogenous. That is, you > > either sleep in an interruptible state, or you don't. You can't mix and > > match it. > > IIUC there is only one waiter on this waitqueue at any given time. > > Paul is that correct? Yes, that's right. It's only the task that has taken the responsibility for running the virtual core that would be waiting on that wait queue. Paul. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-01-20 5:46 ` Paul Mackerras @ 2015-01-20 18:16 ` Steven Rostedt 2015-01-21 15:07 ` Peter Zijlstra 0 siblings, 1 reply; 42+ messages in thread From: Steven Rostedt @ 2015-01-20 18:16 UTC (permalink / raw) To: Paul Mackerras Cc: Marcelo Tosatti, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini, Peter Zijlstra On Tue, 20 Jan 2015 16:46:53 +1100 Paul Mackerras <paulus@samba.org> wrote: > On Mon, Jan 19, 2015 at 12:41:00PM -0200, Marcelo Tosatti wrote: > > On Fri, Jan 16, 2015 at 11:48:46AM -0500, Steven Rostedt wrote: > > > > static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc) > > > > { > > > > - DEFINE_WAIT(wait); > > > > + DEFINE_SWAITER(wait); > > > > > > > > - prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE); > > > > + swait_prepare(&vc->wq, &wait, TASK_INTERRUPTIBLE); > > > > vc->vcore_state = VCORE_SLEEPING; > > > > spin_unlock(&vc->lock); > > > > schedule(); > > > > - finish_wait(&vc->wq, &wait); > > > > + swait_finish(&vc->wq, &wait); > > > > spin_lock(&vc->lock); > > > > vc->vcore_state = VCORE_INACTIVE; > > > > } > > > > @@ -1613,7 +1613,7 @@ > > > > kvmppc_create_dtl_entry(vcpu, vc); > > > > kvmppc_start_thread(vcpu); > > > > } else if (vc->vcore_state == VCORE_SLEEPING) { > > > > - wake_up(&vc->wq); > > > > + swait_wake(&vc->wq); > > > > > > I notice everywhere you have a swait_wake_interruptible() but here. Is > > > there a reason why? > > > > > > IIRC, Peter wants to make swait wakeup usage homogenous. That is, you > > > either sleep in an interruptible state, or you don't. You can't mix and > > > match it. > > > > IIUC there is only one waiter on this waitqueue at any given time. > > > > Paul is that correct? > > Yes, that's right. It's only the task that has taken the > responsibility for running the virtual core that would be waiting on > that wait queue. Thanks Paul, but it still makes me nervous. 
I'm actually wondering if we should just nuke the _interruptible() version of swait. As it should only be all interruptible or all not interruptible, that the swait_wake() should just do the wake up regardless. In which case, swait_wake() is good enough. No need to have different versions where people may think do something special. Peter? -- Steve ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-01-20 18:16 ` Steven Rostedt @ 2015-01-21 15:07 ` Peter Zijlstra 2015-02-17 17:44 ` Sebastian Andrzej Siewior 0 siblings, 1 reply; 42+ messages in thread From: Peter Zijlstra @ 2015-01-21 15:07 UTC (permalink / raw) To: Steven Rostedt Cc: Paul Mackerras, Marcelo Tosatti, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini On Tue, Jan 20, 2015 at 01:16:13PM -0500, Steven Rostedt wrote: > I'm actually wondering if we should just nuke the _interruptible() > version of swait. As it should only be all interruptible or all not > interruptible, that the swait_wake() should just do the wake up > regardless. In which case, swait_wake() is good enough. No need to have > different versions where people may think do something special. > > Peter? Yeah, I think the latest thing I have sitting here on my disk only has the swake_up() which does TASK_NORMAL, no choice there. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-01-21 15:07 ` Peter Zijlstra @ 2015-02-17 17:44 ` Sebastian Andrzej Siewior 2015-02-18 14:03 ` Peter Zijlstra 2015-02-27 0:23 ` Marcelo Tosatti 0 siblings, 2 replies; 42+ messages in thread From: Sebastian Andrzej Siewior @ 2015-02-17 17:44 UTC (permalink / raw) To: Peter Zijlstra Cc: Steven Rostedt, Paul Mackerras, Marcelo Tosatti, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini * Peter Zijlstra | 2015-01-21 16:07:16 [+0100]: >On Tue, Jan 20, 2015 at 01:16:13PM -0500, Steven Rostedt wrote: >> I'm actually wondering if we should just nuke the _interruptible() >> version of swait. As it should only be all interruptible or all not >> interruptible, that the swait_wake() should just do the wake up >> regardless. In which case, swait_wake() is good enough. No need to have >> different versions where people may think do something special. >> >> Peter? > >Yeah, I think the lastest thing I have sitting here on my disk only has >the swake_up() which does TASK_NORMAL, no choice there. what is the swait status in terms of mainline? This sounds like it beeing worked on. I could take the series but then I would drop it again if the mainline implementation changes… Sebastian ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-02-17 17:44 ` Sebastian Andrzej Siewior @ 2015-02-18 14:03 ` Peter Zijlstra 2015-02-25 21:02 ` Sebastian Andrzej Siewior 2015-02-27 0:23 ` Marcelo Tosatti 1 sibling, 1 reply; 42+ messages in thread From: Peter Zijlstra @ 2015-02-18 14:03 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Steven Rostedt, Paul Mackerras, Marcelo Tosatti, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini On Tue, Feb 17, 2015 at 06:44:19PM +0100, Sebastian Andrzej Siewior wrote: > * Peter Zijlstra | 2015-01-21 16:07:16 [+0100]: > > >On Tue, Jan 20, 2015 at 01:16:13PM -0500, Steven Rostedt wrote: > >> I'm actually wondering if we should just nuke the _interruptible() > >> version of swait. As it should only be all interruptible or all not > >> interruptible, that the swait_wake() should just do the wake up > >> regardless. In which case, swait_wake() is good enough. No need to have > >> different versions where people may think do something special. > >> > >> Peter? > > > >Yeah, I think the lastest thing I have sitting here on my disk only has > >the swake_up() which does TASK_NORMAL, no choice there. > > what is the swait status in terms of mainline? This sounds like it > beeing worked on. > I could take the series but then I would drop it again if the mainline > implementation changes… Well, 'worked' on might be putting too much on it, its one of the many many 'spare' time things that never get attention unless people bug me ;-) The below is my current patch, I've not actually tried it yet, I (think I) had one patch doing some conversions but I'm having trouble locating it. 
Mostly-Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- include/linux/swait.h | 172 ++++++++++++++++++++++++++++++++++++++++++++++++++ kernel/sched/swait.c | 122 +++++++++++++++++++++++++++++++++++ 2 files changed, 294 insertions(+) --- /dev/null +++ b/include/linux/swait.h @@ -0,0 +1,172 @@ +#ifndef _LINUX_SWAIT_H +#define _LINUX_SWAIT_H + +#include <linux/list.h> +#include <linux/stddef.h> +#include <linux/spinlock.h> +#include <asm/current.h> + +/* + * Simple wait queues + * + * While these are very similar to the other/complex wait queues (wait.h) the + * most important difference is that the simple waitqueue allows for + * deterministic behaviour -- IOW it has strictly bounded IRQ and lock hold + * times. + * + * In order to make this so, we had to drop a fair number of features of the + * other waitqueue code; notably: + * + * - mixing INTERRUPTIBLE and UNINTERRUPTIBLE sleeps on the same waitqueue; + * all wakeups are TASK_NORMAL in order to avoid O(n) lookups for the right + * sleeper state. + * + * - the exclusive mode; because this requires preserving the list order + * and this is hard. + * + * - custom wake functions; because you cannot give any guarantees about + * random code. + * + * As a side effect of this; the data structures are slimmer. + * + * One would recommend using this wait queue where possible. 
+ */ + +struct task_struct; + +struct swait_queue_head { + raw_spinlock_t lock; + struct list_head task_list; +}; + +struct swait_queue { + struct task_struct *task; + struct list_head task_list; +}; + +#define __SWAITQUEUE_INITIALIZER(name) { \ + .task = current, \ + .task_list = LIST_HEAD_INIT((name).task_list), \ +} + +#define DECLARE_SWAITQUEUE(name) \ + struct swait_queue name = __SWAITQUEUE_INITIALIZER(name) + +#define __SWAIT_QUEUE_HEAD_INITIALIZER(name) { \ + .lock = __RAW_SPIN_LOCK_UNLOCKED(name.lock), \ + .task_list = LIST_HEAD_INIT((name).task_list), \ +} + +#define DECLARE_SWAIT_QUEUE_HEAD(name) \ + struct swait_queue_head name = __SWAIT_QUEUE_HEAD_INITIALIZER(name) + +extern void __init_swait_queue_head(struct swait_queue_head *q, const char *name, + struct lock_class_key *key); + +#define init_swait_queue_head(q) \ + do { \ + static struct lock_class_key __key; \ + __init_swait_queue_head((q), #q, &__key); \ + } while (0) + +#ifdef CONFIG_LOCKDEP +# define __SWAIT_QUEUE_HEAD_INIT_ONSTACK(name) \ + ({ init_swait_queue_head(&name); name; }) +# define DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(name) \ + struct swait_queue_head name = __SWAIT_QUEUE_HEAD_INIT_ONSTACK(name) +#else +# define DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(name) \ + DECLARE_SWAIT_QUEUE_HEAD(name) +#endif + +static inline int swait_active(struct swait_queue_head *q) +{ + return !list_empty(&q->task_list); +} + +extern void swake_up(struct swait_queue_head *q); +extern void swake_up_all(struct swait_queue_head *q); +extern void swake_up_locked(struct swait_queue_head *q); + +extern void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait); +extern void prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait, int state); +extern long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait, int state); + +extern void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait); +extern void finish_swait(struct swait_queue_head *q, struct 
swait_queue *wait); + +/* as per ___wait_event() but for swait, therefore "exclusive == 0" */ +#define ___swait_event(wq, condition, state, ret, cmd) \ +({ \ + struct swait_queue __wait; \ + long __ret = ret; \ + \ + INIT_LIST_HEAD(&__wait.task_list); \ + for (;;) { \ + long __int = prepare_to_swait_event(&wq, &__wait, state);\ + \ + if (condition) \ + break; \ + \ + if (___wait_is_interruptible(state) && __int) { \ + __ret = __int; \ + break; \ + } \ + \ + cmd; \ + } \ + finish_swait(&wq, &__wait); \ + __ret; \ +}) + +#define __swait_event(wq, condition) \ + (void)___swait_event(wq, condition, TASK_UNINTERRUPTIBLE, 0, \ + schedule()) + +#define swait_event(wq, condition) \ +do { \ + if (condition) \ + break; \ + __swait_event(wq, condition); \ +} while (0) + +#define __swait_event_timeout(wq, condition, timeout) \ + ___swait_event(wq, ___wait_cond_timeout(condition), \ + TASK_UNINTERRUPTIBLE, timeout, \ + __ret = schedule_timeout(__ret)) + +#define swait_event_timeout(wq, condition, timeout) \ +({ \ + long __ret = timeout; \ + if (!___wait_cond_timeout(condition)) \ + __ret = __swait_event_timeout(wq, condition, timeout); \ + __ret; \ +}) + +#define __swait_event_interruptible(wq, condition) \ + ___swait_event(wq, condition, TASK_INTERRUPTIBLE, 0, \ + schedule()) + +#define swait_event_interruptible(wq, condition) \ +({ \ + int __ret = 0; \ + if (!(condition)) \ + __ret = __swait_event_interruptible(wq, condition); \ + __ret; \ +}) + +#define __swait_event_interruptible_timeout(wq, condition, timeout) \ + ___swait_event(wq, ___wait_cond_timeout(condition), \ + TASK_INTERRUPTIBLE, timeout, \ + __ret = schedule_timeout(__ret)) + +#define swait_event_interruptible_timeout(wq, condition, timeout) \ +({ \ + long __ret = timeout; \ + if (!___wait_cond_timeout(condition)) \ + __ret = __swait_event_interruptible_timeout(wq, \ + condition, timeout); \ + __ret; \ +}) + +#endif /* _LINUX_SWAIT_H */ --- /dev/null +++ b/kernel/sched/swait.c @@ -0,0 +1,122 @@ + +#include 
<linux/swait.h> + +void __init_swait_queue_head(struct swait_queue_head *q, const char *name, + struct lock_class_key *key) +{ + raw_spin_lock_init(&q->lock); + lockdep_set_class_and_name(&q->lock, key, name); + INIT_LIST_HEAD(&q->task_list); +} +EXPORT_SYMBOL(__init_swait_queue_head); + +/* + * The thing about the wake_up_state() return value; I think we can ignore it. + * + * If for some reason it would return 0, that means the previously waiting + * task is already running, so it will observe condition true (or has already). + */ +void swake_up_locked(struct swait_queue_head *q) +{ + struct swait_queue *curr; + + list_for_each_entry(curr, &q->task_list, task_list) { + wake_up_process(curr->task); + list_del_init(&curr->task_list); + break; + } +} +EXPORT_SYMBOL(swake_up_locked); + +void swake_up(struct swait_queue_head *q) +{ + unsigned long flags; + + if (!swait_active(q)) + return; + + raw_spin_lock_irqsave(&q->lock, flags); + __swake_up_locked(q); + raw_spin_unlock_irqrestore(&q->lock, flags); +} +EXPORT_SYMBOL(swake_up); + +/* + * Does not allow usage from IRQ disabled, since we must be able to + * release IRQs to guarantee bounded hold time. 
+ */ +void swake_up_all(struct swait_queue_head *q) +{ + struct swait_queue *curr, *next; + LIST_HEAD(tmp); + + if (!swait_active(q)) + return; + + raw_spin_lock_irq(&q->lock); + list_splice_init(&q->task_list, &tmp); + while (!list_empty(&tmp)) { + curr = list_first_entry(&tmp, typeof(curr), task_list); + + wake_up_state(curr->task, state); + list_del_init(&curr->task_list); + + if (list_empty(&tmp)) + break; + + raw_spin_unlock_irq(&q->lock); + raw_spin_lock_irq(&q->lock); + } + raw_spin_unlock_irq(&q->lock); +} +EXPORT_SYMBOL(swake_up_all); + +void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait) +{ + wait->task = current; + if (list_empty(&wait->node)) + list_add(&wait->task_list, &q->task_list); +} + +void prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait, int state) +{ + unsigned long flags; + + raw_spin_lock_irqsave(&q->lock, flags); + __prepare_to_swait(q, wait); + set_current_state(state); + raw_spin_unlock_irqrestore(&q->lock, flags); +} +EXPORT_SYMBOL(prepare_to_swait); + +long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait, int state) +{ + if (signal_pending_state(state, current)) + return -ERESTARTSYS; + + prepare_to_swait(q, wait, state); + + return 0; +} +EXPORT_SYMBOL(prepare_to_swait_event); + +void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait) +{ + __set_current_state(TASK_RUNNING); + if (!list_empty(&wait->task_list)) + list_del_init(&wait->task_list); +} + +void finish_swait(struct swait_queue_head *q, struct swait_queue *wait) +{ + unsigned long flags; + + __set_current_state(TASK_RUNNING); + + if (!list_empty_careful(&wait->task_list)) { + raw_spin_lock_irqsave(&q->lock, flags); + list_del_init(&wait->task_list); + raw_spin_unlock_irqrestore(&q->lock, flags); + } +} +EXPORT_SYMBOL(finish_swait); ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-02-18 14:03 ` Peter Zijlstra @ 2015-02-25 21:02 ` Sebastian Andrzej Siewior 2015-08-07 10:57 ` Peter Zijlstra 0 siblings, 1 reply; 42+ messages in thread From: Sebastian Andrzej Siewior @ 2015-02-25 21:02 UTC (permalink / raw) To: Peter Zijlstra Cc: Steven Rostedt, Paul Mackerras, Marcelo Tosatti, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini * Peter Zijlstra | 2015-02-18 15:03:20 [+0100]: >On Tue, Feb 17, 2015 at 06:44:19PM +0100, Sebastian Andrzej Siewior wrote: >> * Peter Zijlstra | 2015-01-21 16:07:16 [+0100]: >> >> >On Tue, Jan 20, 2015 at 01:16:13PM -0500, Steven Rostedt wrote: >> >> I'm actually wondering if we should just nuke the _interruptible() >> >> version of swait. As it should only be all interruptible or all not >> >> interruptible, that the swait_wake() should just do the wake up >> >> regardless. In which case, swait_wake() is good enough. No need to have >> >> different versions where people may think do something special. >> >> >> >> Peter? >> > >> >Yeah, I think the lastest thing I have sitting here on my disk only has >> >the swake_up() which does TASK_NORMAL, no choice there. >> >> what is the swait status in terms of mainline? This sounds like it >> beeing worked on. >> I could take the series but then I would drop it again if the mainline >> implementation changes… > >Well, 'worked' on might be putting too much on it, its one of the many >many 'spare' time things that never get attention unless people bug me >;-) > >The below is my current patch, I've not actually tried it yet, I (think >I) had one patch doing some conversions but I'm having trouble locating >it. 
> >Mostly-Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> >--- > include/linux/swait.h | 172 ++++++++++++++++++++++++++++++++++++++++++++++++++ > kernel/sched/swait.c | 122 +++++++++++++++++++++++++++++++++++ > 2 files changed, 294 insertions(+) > >--- /dev/null >+++ b/include/linux/swait.h >@@ -0,0 +1,172 @@ >+#ifndef _LINUX_SWAIT_H >+#define _LINUX_SWAIT_H >+ >+#include <linux/list.h> >+#include <linux/stddef.h> >+#include <linux/spinlock.h> >+#include <asm/current.h> >+ >+/* >+ * Simple wait queues >+ * >+ * While these are very similar to the other/complex wait queues (wait.h) the >+ * most important difference is that the simple waitqueue allows for >+ * deterministic behaviour -- IOW it has strictly bounded IRQ and lock hold >+ * times. >+ * >+ * In order to make this so, we had to drop a fair number of features of the >+ * other waitqueue code; notably: >+ * >+ * - mixing INTERRUPTIBLE and UNINTERRUPTIBLE sleeps on the same waitqueue; >+ * all wakeups are TASK_NORMAL in order to avoid O(n) lookups for the right >+ * sleeper state. >+ * >+ * - the exclusive mode; because this requires preserving the list order >+ * and this is hard. >+ * >+ * - custom wake functions; because you cannot give any guarantees about >+ * random code. >+ * >+ * As a side effect of this; the data structures are slimmer. >+ * >+ * One would recommend using this wait queue where possible. >+ */ >+ >+struct task_struct; >+ >+struct swait_queue_head { >+ raw_spinlock_t lock; >+ struct list_head task_list; >+}; >+ >+struct swait_queue { >+ struct task_struct *task; >+ struct list_head task_list; I would prefer something different than task_list here since this is an item. 
Scrolling down you tried to use node once so maybe that would be good here :) >+}; >+ >+#define __SWAITQUEUE_INITIALIZER(name) { \ >+ .task = current, \ >+ .task_list = LIST_HEAD_INIT((name).task_list), \ >+} >+ >+#define DECLARE_SWAITQUEUE(name) \ >+ struct swait_queue name = __SWAITQUEUE_INITIALIZER(name) >+ >+#define __SWAIT_QUEUE_HEAD_INITIALIZER(name) { \ >+ .lock = __RAW_SPIN_LOCK_UNLOCKED(name.lock), \ >+ .task_list = LIST_HEAD_INIT((name).task_list), \ >+} >+ >+#define DECLARE_SWAIT_QUEUE_HEAD(name) \ >+ struct swait_queue_head name = __SWAIT_QUEUE_HEAD_INITIALIZER(name) >+ >+extern void __init_swait_queue_head(struct swait_queue_head *q, const char *name, >+ struct lock_class_key *key); >+ >+#define init_swait_queue_head(q) \ >+ do { \ >+ static struct lock_class_key __key; \ >+ __init_swait_queue_head((q), #q, &__key); \ >+ } while (0) >+ >+#ifdef CONFIG_LOCKDEP >+# define __SWAIT_QUEUE_HEAD_INIT_ONSTACK(name) \ >+ ({ init_swait_queue_head(&name); name; }) >+# define DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(name) \ >+ struct swait_queue_head name = __SWAIT_QUEUE_HEAD_INIT_ONSTACK(name) >+#else >+# define DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(name) \ >+ DECLARE_SWAIT_QUEUE_HEAD(name) >+#endif >+ >+static inline int swait_active(struct swait_queue_head *q) >+{ >+ return !list_empty(&q->task_list); In RT there was a smp_mb() which you dropped and I assume you had reasons for it. I assumed that one can perform list_empty_careful() without a lock if the items were removed with list_del_init(). 
But since nothing in -RT blow up so far I guess this here is legal, too :) >+} >+ >+extern void swake_up(struct swait_queue_head *q); >+extern void swake_up_all(struct swait_queue_head *q); >+extern void swake_up_locked(struct swait_queue_head *q); >+ >+extern void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait); >+extern void prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait, int state); >+extern long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait, int state); >+ >+extern void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait); >+extern void finish_swait(struct swait_queue_head *q, struct swait_queue *wait); >+ >+/* as per ___wait_event() but for swait, therefore "exclusive == 0" */ >+#define ___swait_event(wq, condition, state, ret, cmd) \ >+({ \ >+ struct swait_queue __wait; \ >+ long __ret = ret; \ >+ \ >+ INIT_LIST_HEAD(&__wait.task_list); \ >+ for (;;) { \ >+ long __int = prepare_to_swait_event(&wq, &__wait, state);\ >+ \ >+ if (condition) \ >+ break; \ >+ \ >+ if (___wait_is_interruptible(state) && __int) { \ >+ __ret = __int; \ >+ break; \ >+ } \ >+ \ >+ cmd; \ >+ } \ >+ finish_swait(&wq, &__wait); \ >+ __ret; \ >+}) >+ >+#define __swait_event(wq, condition) \ >+ (void)___swait_event(wq, condition, TASK_UNINTERRUPTIBLE, 0, \ >+ schedule()) >+ >+#define swait_event(wq, condition) \ >+do { \ >+ if (condition) \ >+ break; \ >+ __swait_event(wq, condition); \ >+} while (0) >+ >+#define __swait_event_timeout(wq, condition, timeout) \ >+ ___swait_event(wq, ___wait_cond_timeout(condition), \ >+ TASK_UNINTERRUPTIBLE, timeout, \ >+ __ret = schedule_timeout(__ret)) >+ >+#define swait_event_timeout(wq, condition, timeout) \ >+({ \ >+ long __ret = timeout; \ >+ if (!___wait_cond_timeout(condition)) \ >+ __ret = __swait_event_timeout(wq, condition, timeout); \ >+ __ret; \ >+}) >+ >+#define __swait_event_interruptible(wq, condition) \ >+ ___swait_event(wq, condition, 
TASK_INTERRUPTIBLE, 0, \ >+ schedule()) >+ >+#define swait_event_interruptible(wq, condition) \ >+({ \ >+ int __ret = 0; \ >+ if (!(condition)) \ >+ __ret = __swait_event_interruptible(wq, condition); \ >+ __ret; \ >+}) >+ >+#define __swait_event_interruptible_timeout(wq, condition, timeout) \ >+ ___swait_event(wq, ___wait_cond_timeout(condition), \ >+ TASK_INTERRUPTIBLE, timeout, \ >+ __ret = schedule_timeout(__ret)) >+ >+#define swait_event_interruptible_timeout(wq, condition, timeout) \ >+({ \ >+ long __ret = timeout; \ >+ if (!___wait_cond_timeout(condition)) \ >+ __ret = __swait_event_interruptible_timeout(wq, \ >+ condition, timeout); \ >+ __ret; \ >+}) >+ >+#endif /* _LINUX_SWAIT_H */ >--- /dev/null >+++ b/kernel/sched/swait.c >@@ -0,0 +1,122 @@ >+ >+#include <linux/swait.h> >+ >+void __init_swait_queue_head(struct swait_queue_head *q, const char *name, >+ struct lock_class_key *key) >+{ >+ raw_spin_lock_init(&q->lock); >+ lockdep_set_class_and_name(&q->lock, key, name); >+ INIT_LIST_HEAD(&q->task_list); >+} >+EXPORT_SYMBOL(__init_swait_queue_head); >+ >+/* >+ * The thing about the wake_up_state() return value; I think we can ignore it. >+ * >+ * If for some reason it would return 0, that means the previously waiting >+ * task is already running, so it will observe condition true (or has already). >+ */ >+void swake_up_locked(struct swait_queue_head *q) >+{ >+ struct swait_queue *curr; >+ >+ list_for_each_entry(curr, &q->task_list, task_list) { >+ wake_up_process(curr->task); okay. So since we limit everything to TASK_NORMAL which has to sleep while on the list there is no need to check if we actually woken up someone. 
>+ list_del_init(&curr->task_list); >+ break; >+ } >+} >+EXPORT_SYMBOL(swake_up_locked); >+ >+void swake_up(struct swait_queue_head *q) >+{ >+ unsigned long flags; >+ >+ if (!swait_active(q)) >+ return; >+ >+ raw_spin_lock_irqsave(&q->lock, flags); >+ __swake_up_locked(q); I think this should have been swake_up_locked() instead, since __swake_up_locked() isn't part of this patch. Just a nitpick: later there is __prepare_to_swait() and __finish_swait() which have the __ prefix instead of a _locked suffix. Not sure what is better for a public API, but maybe one way would be good. >+ raw_spin_unlock_irqrestore(&q->lock, flags); >+} >+EXPORT_SYMBOL(swake_up); >+ >+/* >+ * Does not allow usage from IRQ disabled, since we must be able to >+ * release IRQs to guarantee bounded hold time. >+ */ >+void swake_up_all(struct swait_queue_head *q) >+{ >+ struct swait_queue *curr, *next; >+ LIST_HEAD(tmp); WARN_ON(irqs_disabled()) ? >+ if (!swait_active(q)) >+ return; >+ >+ raw_spin_lock_irq(&q->lock); >+ list_splice_init(&q->task_list, &tmp); >+ while (!list_empty(&tmp)) { >+ curr = list_first_entry(&tmp, typeof(curr), task_list); >+ >+ wake_up_state(curr->task, state); >+ list_del_init(&curr->task_list); So because the task may time out and remove itself from the list at any time, you need to hold the lock during the wakeup and the removal from the list. >+ >+ if (list_empty(&tmp)) >+ break; >+ >+ raw_spin_unlock_irq(&q->lock); and you drop the lock after each iteration in case there is an IRQ pending, or the task that has just been woken up has a higher priority than the current task and needs to get on the CPU. Not sure if this case matters: - _this_ task (wake_all) prio 120 - first task in queue prio 10, RR - second task in queue prio 9, RR the *old* behavior would put the second task before the first task on CPU. The *new* behaviour puts the first task on the CPU after dropping the lock. 
The second task (that has a higher priority but nobody knows) has to wait until the first one is done (and anything else that might have been woken up in the meantime with a higher prio than 120). >+ raw_spin_lock_irq(&q->lock); >+ } >+ raw_spin_unlock_irq(&q->lock); >+} >+EXPORT_SYMBOL(swake_up_all); >+ >+void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait) >+{ >+ wait->task = current; >+ if (list_empty(&wait->node)) >+ list_add(&wait->task_list, &q->task_list); >+} >+ >+void prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait, int state) >+{ >+ unsigned long flags; >+ >+ raw_spin_lock_irqsave(&q->lock, flags); >+ __prepare_to_swait(q, wait); >+ set_current_state(state); >+ raw_spin_unlock_irqrestore(&q->lock, flags); >+} >+EXPORT_SYMBOL(prepare_to_swait); >+ >+long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait, int state) >+{ >+ if (signal_pending_state(state, current)) >+ return -ERESTARTSYS; >+ >+ prepare_to_swait(q, wait, state); >+ >+ return 0; >+} >+EXPORT_SYMBOL(prepare_to_swait_event); >+ >+void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait) this one has no users; the __ suggests that it is the locked edition. Maybe it is for the completions… >+{ >+ __set_current_state(TASK_RUNNING); >+ if (!list_empty(&wait->task_list)) >+ list_del_init(&wait->task_list); >+} >+ >+void finish_swait(struct swait_queue_head *q, struct swait_queue *wait) >+{ >+ unsigned long flags; >+ >+ __set_current_state(TASK_RUNNING); >+ >+ if (!list_empty_careful(&wait->task_list)) { >+ raw_spin_lock_irqsave(&q->lock, flags); >+ list_del_init(&wait->task_list); >+ raw_spin_unlock_irqrestore(&q->lock, flags); >+ } >+} >+EXPORT_SYMBOL(finish_swait); Sebastian ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-02-25 21:02 ` Sebastian Andrzej Siewior @ 2015-08-07 10:57 ` Peter Zijlstra 2015-08-07 11:14 ` Peter Zijlstra 0 siblings, 1 reply; 42+ messages in thread From: Peter Zijlstra @ 2015-08-07 10:57 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Steven Rostedt, Paul Mackerras, Marcelo Tosatti, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini On Wed, Feb 25, 2015 at 10:02:50PM +0100, Sebastian Andrzej Siewior wrote: > >+static inline int swait_active(struct swait_queue_head *q) > >+{ > >+ return !list_empty(&q->task_list); > > In RT there was a smp_mb() which you dropped and I assume you had > reasons for it. Yeah, RT didn't have a reason for the smp_mb() -- any barrier without a comment is a bug :-) Also waitqueue_active(), its counterpart, does not have a barrier there either. Nor did I see any reason for that mb to be there. > I assumed that one can perform list_empty_careful() > without a lock if the items were removed with list_del_init(). But since > nothing in -RT blow up so far I guess this here is legal, too :) Nobody will come and arrest us for software bugs -- yet ;-) > >+/* > >+ * The thing about the wake_up_state() return value; I think we can ignore it. > >+ * > >+ * If for some reason it would return 0, that means the previously waiting > >+ * task is already running, so it will observe condition true (or has already). > >+ */ > >+void swake_up_locked(struct swait_queue_head *q) > >+{ > >+ struct swait_queue *curr; > >+ > >+ list_for_each_entry(curr, &q->task_list, task_list) { > >+ wake_up_process(curr->task); > > okay. So since we limit everything to TASK_NORMAL which has to sleep > while on the list there is no need to check if we actually woken up > someone. Partly that, also that I don't see how that return value is meaningful in the first place. 
If it were to return false, the task was/is already running and it will observe whatever condition we just satisfied to allow waking things up. So either way around, we'll get (at least) 1 task running. > >+ list_del_init(&curr->task_list); > >+ break; > >+ } > >+} > >+EXPORT_SYMBOL(swake_up_locked); > >+void swake_up(struct swait_queue_head *q) > >+{ > >+ unsigned long flags; > >+ > >+ if (!swait_active(q)) > >+ return; > >+ > >+ raw_spin_lock_irqsave(&q->lock, flags); > >+ __swake_up_locked(q); > > I thing this should have been swake_up_locked() instead since > __swake_up_locked() isn't part of this patch. > > Just a nitpick: later there is __prepare_to_swait() and __finish_swait() > which have the __ prefix instead a _locked suffix. Not sure what is > better for a better for a public API but maybe one way would be good. Yeah, I suppose that's true ;-) > >+ raw_spin_unlock_irqrestore(&q->lock, flags); > >+} > >+EXPORT_SYMBOL(swake_up); > >+ > >+/* > >+ * Does not allow usage from IRQ disabled, since we must be able to > >+ * release IRQs to guarantee bounded hold time. > >+ */ > >+void swake_up_all(struct swait_queue_head *q) > >+{ > >+ struct swait_queue *curr, *next; > >+ LIST_HEAD(tmp); > > WARN_ON(irqs_disabled()) ? Lockdep should already catch that by virtue of using unconditional _irq spinlock primitives. > >+ if (!swait_active(q)) > >+ return; > >+ > >+ raw_spin_lock_irq(&q->lock); > >+ list_splice_init(&q->task_list, &tmp); > >+ while (!list_empty(&tmp)) { > >+ curr = list_first_entry(&tmp, typeof(curr), task_list); > >+ > >+ wake_up_state(curr->task, state); > >+ list_del_init(&curr->task_list); > > So because the task may timeout and remove itself from the list at > anytime you need to hold the lock during wakeup and the removal from the > list Indeed. 
> >+ > >+ if (list_empty(&tmp)) > >+ break; > >+ > >+ raw_spin_unlock_irq(&q->lock); > > and you drop the lock after each iteration in case there is an IRQ > pending or the task, that has been just woken up, has a higher priority > than the current task and needs to get on the CPU. > Not sure if this case matters: > - _this_ task (wake_all) prio 120 > - first task in queue prio 10, RR > - second task in queue prio 9, RR Why complicate things? Better to not assume anything and just do the simple correct thing. > the *old* behavior would put the second task before the first task on > CPU. The *new* behaviour puts the first task on the CPU after dropping > the lock. The second task (that has a higher priority but nobody knows) > has to wait until the first one is done (and anything else that might > been woken up in the meantime with a higher prio than 120). Irrelevant, we _must_ drop the lock in order to maintain bounded behaviour. > >+ raw_spin_lock_irq(&q->lock); > >+ } > >+ raw_spin_unlock_irq(&q->lock); > >+} > >+EXPORT_SYMBOL(swake_up_all); > >+void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait) > this one has no users the __ suggests that it is locked edition. Maybe > it is for the completions… Yeah, who knows, I certainly do not anymore ;-) ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-08-07 10:57 ` Peter Zijlstra @ 2015-08-07 11:14 ` Peter Zijlstra 0 siblings, 0 replies; 42+ messages in thread From: Peter Zijlstra @ 2015-08-07 11:14 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Steven Rostedt, Paul Mackerras, Marcelo Tosatti, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini On Fri, Aug 07, 2015 at 12:57:38PM +0200, Peter Zijlstra wrote: > > >+void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait) > > this one has no users the __ suggests that it is locked edition. Maybe > > it is for the completions… > > Yeah, who knows, I certainly do not anymore ;-) On that, we cannot convert completions to swait. Because swait wake_all must not happen from IRQ context, and complete_all() typically is used from just that. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-08-07 11:14 ` Peter Zijlstra (?) @ 2015-08-07 16:41 ` Christoph Hellwig 2015-08-07 16:45 ` Peter Zijlstra -1 siblings, 1 reply; 42+ messages in thread From: Christoph Hellwig @ 2015-08-07 16:41 UTC (permalink / raw) To: Peter Zijlstra Cc: Sebastian Andrzej Siewior, Steven Rostedt, Paul Mackerras, Marcelo Tosatti, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini On Fri, Aug 07, 2015 at 01:14:15PM +0200, Peter Zijlstra wrote: > On that, we cannot convert completions to swait. Because swait wake_all > must not happen from IRQ context, and complete_all() typically is used > from just that. If swait queues aren't useable from IRQ context they will be fairly useless. What's the problem with making them irq safe? ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-08-07 16:41 ` Christoph Hellwig @ 2015-08-07 16:45 ` Peter Zijlstra 2015-08-09 6:39 ` Christoph Hellwig 0 siblings, 1 reply; 42+ messages in thread From: Peter Zijlstra @ 2015-08-07 16:45 UTC (permalink / raw) To: Christoph Hellwig Cc: Sebastian Andrzej Siewior, Steven Rostedt, Paul Mackerras, Marcelo Tosatti, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini On Fri, Aug 07, 2015 at 09:41:31AM -0700, Christoph Hellwig wrote: > On Fri, Aug 07, 2015 at 01:14:15PM +0200, Peter Zijlstra wrote: > > On that, we cannot convert completions to swait. Because swait wake_all > > must not happen from IRQ context, and complete_all() typically is used > > from just that. > > If swait queues aren't useable from IRQ context they will be fairly > useless. What's the problem with making them irq safe? It's just the swait_wake_all() that is not. The entire purpose of them was to have something that allows bounded execution (RT and all). Since you can have unbounded numbers of tasks waiting on a waitqueue (well, reality has bounds of course, like total memory available etc.) a wake_all() can end up being many, many wake_process() calls. We've had this be a problem in RT. So the proposed swait_wake_all() requires being called from task context, such that it can drop the lock (and IRQ disable) after every wakeup, and thereby guarantee that higher priority things will not experience undue latencies. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-08-07 16:45 ` Peter Zijlstra @ 2015-08-09 6:39 ` Christoph Hellwig 0 siblings, 0 replies; 42+ messages in thread From: Christoph Hellwig @ 2015-08-09 6:39 UTC (permalink / raw) To: Peter Zijlstra Cc: Christoph Hellwig, Sebastian Andrzej Siewior, Steven Rostedt, Paul Mackerras, Marcelo Tosatti, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini On Fri, Aug 07, 2015 at 06:45:26PM +0200, Peter Zijlstra wrote: > Its just the swait_wake_all() that is not. The entire purpose of them > was to have something that allows bounded execution (RT and all). Still not sure if that might be too big a burden for mainline, but at least it's not as severe… ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-08-07 11:14 ` Peter Zijlstra (?) (?) @ 2015-08-17 20:31 ` Thomas Gleixner -1 siblings, 0 replies; 42+ messages in thread From: Thomas Gleixner @ 2015-08-17 20:31 UTC (permalink / raw) To: Peter Zijlstra Cc: Sebastian Andrzej Siewior, Steven Rostedt, Paul Mackerras, Marcelo Tosatti, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, kvm, Paolo Bonzini [-- Attachment #1: Type: TEXT/PLAIN, Size: 697 bytes --] On Fri, 7 Aug 2015, Peter Zijlstra wrote: > On Fri, Aug 07, 2015 at 12:57:38PM +0200, Peter Zijlstra wrote: > > > > >+void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait) > > > > this one has no users the __ suggests that it is locked edition. Maybe > > > it is for the completions… > > > > Yeah, who knows, I certainly do not anymore ;-) > > On that, we cannot convert completions to swait. Because swait wake_all > must not happen from IRQ context, and complete_all() typically is used > from just that. Well, completions are not the ones which have a gazillion of waiters and we got quite some performance back from using swait in completions on RT. Thanks, tglx ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-02-17 17:44 ` Sebastian Andrzej Siewior 2015-02-18 14:03 ` Peter Zijlstra @ 2015-02-27 0:23 ` Marcelo Tosatti 2015-03-05 1:09 ` Marcelo Tosatti 1 sibling, 1 reply; 42+ messages in thread From: Marcelo Tosatti @ 2015-02-27 0:23 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Peter Zijlstra, Steven Rostedt, Paul Mackerras, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini On Tue, Feb 17, 2015 at 06:44:19PM +0100, Sebastian Andrzej Siewior wrote: > * Peter Zijlstra | 2015-01-21 16:07:16 [+0100]: > > >On Tue, Jan 20, 2015 at 01:16:13PM -0500, Steven Rostedt wrote: > >> I'm actually wondering if we should just nuke the _interruptible() > >> version of swait. As it should only be all interruptible or all not > >> interruptible, that the swait_wake() should just do the wake up > >> regardless. In which case, swait_wake() is good enough. No need to have > >> different versions where people may think do something special. > >> > >> Peter? > > > >Yeah, I think the lastest thing I have sitting here on my disk only has > >the swake_up() which does TASK_NORMAL, no choice there. > > what is the swait status in terms of mainline? This sounds like it > beeing worked on. > I could take the series but then I would drop it again if the mainline > implementation changes… Hi Sebastian, No, you would just adjust it to the upstream kernel interfaces, as the rest of the -rt users of the swait interfaces. Can you please include the series? Thanks ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-02-27 0:23 ` Marcelo Tosatti @ 2015-03-05 1:09 ` Marcelo Tosatti 0 siblings, 0 replies; 42+ messages in thread From: Marcelo Tosatti @ 2015-03-05 1:09 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Peter Zijlstra, Steven Rostedt, Paul Mackerras, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini On Thu, Feb 26, 2015 at 09:23:57PM -0300, Marcelo Tosatti wrote: > On Tue, Feb 17, 2015 at 06:44:19PM +0100, Sebastian Andrzej Siewior wrote: > > * Peter Zijlstra | 2015-01-21 16:07:16 [+0100]: > > > > >On Tue, Jan 20, 2015 at 01:16:13PM -0500, Steven Rostedt wrote: > > >> I'm actually wondering if we should just nuke the _interruptible() > > >> version of swait. As it should only be all interruptible or all not > > >> interruptible, that the swait_wake() should just do the wake up > > >> regardless. In which case, swait_wake() is good enough. No need to have > > >> different versions where people may think do something special. > > >> > > >> Peter? > > > > > >Yeah, I think the lastest thing I have sitting here on my disk only has > > >the swake_up() which does TASK_NORMAL, no choice there. > > > > what is the swait status in terms of mainline? This sounds like it > > beeing worked on. > > I could take the series but then I would drop it again if the mainline > > implementation changes… > > Hi Sebastian, > > No, you would just adjust it to the upstream kernel interfaces, as the rest of > the -rt users of the swait interfaces. > > Can you please include the series? > > Thanks Sebastian? ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-03-05 1:09 ` Marcelo Tosatti (?) @ 2015-03-05 7:42 ` Sebastian Andrzej Siewior -1 siblings, 0 replies; 42+ messages in thread From: Sebastian Andrzej Siewior @ 2015-03-05 7:42 UTC (permalink / raw) To: Marcelo Tosatti Cc: Peter Zijlstra, Steven Rostedt, Paul Mackerras, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini On 03/05/2015 02:09 AM, Marcelo Tosatti wrote: >> Can you please include the series? >> >> Thanks > > Sebastian? I will pick it up, don't worry. I think I'll do my -RT day tomorrow. Sebastian ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-01-14 17:12 ` [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq Marcelo Tosatti 2015-01-14 18:22 ` Rik van Riel 2015-01-16 16:48 ` Steven Rostedt @ 2015-03-06 13:54 ` Sebastian Andrzej Siewior 2 siblings, 0 replies; 42+ messages in thread From: Sebastian Andrzej Siewior @ 2015-03-06 13:54 UTC (permalink / raw) To: Marcelo Tosatti Cc: linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini * Marcelo Tosatti | 2015-01-14 15:12:52 [-0200]: Against which tree was this prepared? Could you please rebase it against v3.18.7-rt2? I see "fuzz 2", the mips file is gone, and s390 rejects almost every chunk. And there was that mips chunk Steven noticed. Patch #2 seems to apply, but since it requires #1 I'll wait for the update. Sebastian ^ permalink raw reply [flat|nested] 42+ messages in thread
* [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe 2015-01-14 17:12 [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v3) Marcelo Tosatti 2015-01-14 17:12 ` [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq Marcelo Tosatti @ 2015-01-14 17:12 ` Marcelo Tosatti 2015-01-14 18:23 ` Rik van Riel 2015-01-14 17:35 ` [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v3) Paolo Bonzini 2 siblings, 1 reply; 42+ messages in thread From: Marcelo Tosatti @ 2015-01-14 17:12 UTC (permalink / raw) To: linux-kernel, linux-rt-users Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini, Marcelo Tosatti [-- Attachment #1: kvm-lapic-irqsafe --] [-- Type: text/plain, Size: 2691 bytes --] Since lapic timer handler only wakes up a simple waitqueue, it can be executed from hardirq context. Reduces average cyclictest latency by 3us. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> --- arch/x86/kvm/lapic.c | 42 +++++++++++++++++++++++++++++++++++++++--- 1 file changed, 39 insertions(+), 3 deletions(-) Index: linux-stable-rt/arch/x86/kvm/lapic.c =================================================================== --- linux-stable-rt.orig/arch/x86/kvm/lapic.c 2014-11-25 14:14:38.636810068 -0200 +++ linux-stable-rt/arch/x86/kvm/lapic.c 2015-01-14 14:59:17.840251874 -0200 @@ -1031,8 +1031,38 @@ apic->divide_count); } + +static enum hrtimer_restart apic_timer_fn(struct hrtimer *data); + +static void apic_timer_expired(struct hrtimer *data) +{ + int ret, i = 0; + enum hrtimer_restart r; + struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer); + + r = apic_timer_fn(data); + + if (r == HRTIMER_RESTART) { + do { + ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS); + if (ret == -ETIME) + hrtimer_add_expires_ns(&ktimer->timer, + ktimer->period); + i++; + } while (ret == -ETIME && i < 10); + + if (ret == -ETIME) { + printk_once(KERN_ERR "%s: failed to reprogram timer\n", + __func__); + WARN_ON_ONCE(1); + } + 
} +} + + static void start_apic_timer(struct kvm_lapic *apic) { + int ret; ktime_t now; atomic_set(&apic->lapic_timer.pending, 0); @@ -1062,9 +1092,11 @@ } } - hrtimer_start(&apic->lapic_timer.timer, + ret = hrtimer_start(&apic->lapic_timer.timer, ktime_add_ns(now, apic->lapic_timer.period), HRTIMER_MODE_ABS); + if (ret == -ETIME) + apic_timer_expired(&apic->lapic_timer.timer); apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016" PRIx64 ", " @@ -1094,8 +1126,10 @@ ns = (tscdeadline - guest_tsc) * 1000000ULL; do_div(ns, this_tsc_khz); } - hrtimer_start(&apic->lapic_timer.timer, + ret = hrtimer_start(&apic->lapic_timer.timer, ktime_add_ns(now, ns), HRTIMER_MODE_ABS); + if (ret == -ETIME) + apic_timer_expired(&apic->lapic_timer.timer); local_irq_restore(flags); } @@ -1581,6 +1615,7 @@ hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS); apic->lapic_timer.timer.function = apic_timer_fn; + apic->lapic_timer.timer.irqsafe = 1; /* * APIC is created enabled. This will prevent kvm_lapic_set_base from @@ -1699,7 +1734,8 @@ timer = &vcpu->arch.apic->lapic_timer.timer; if (hrtimer_cancel(timer)) - hrtimer_start_expires(timer, HRTIMER_MODE_ABS); + if (hrtimer_start_expires(timer, HRTIMER_MODE_ABS) == -ETIME) + apic_timer_expired(timer); } /* ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe 2015-01-14 17:12 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti @ 2015-01-14 18:23 ` Rik van Riel 0 siblings, 0 replies; 42+ messages in thread From: Rik van Riel @ 2015-01-14 18:23 UTC (permalink / raw) To: Marcelo Tosatti, linux-kernel, linux-rt-users Cc: Luiz Capitulino, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini On 01/14/2015 12:12 PM, Marcelo Tosatti wrote: > Since lapic timer handler only wakes up a simple waitqueue, > it can be executed from hardirq context. > > Reduces average cyclictest latency by 3us. > > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v3) 2015-01-14 17:12 [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v3) Marcelo Tosatti 2015-01-14 17:12 ` [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq Marcelo Tosatti 2015-01-14 17:12 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti @ 2015-01-14 17:35 ` Paolo Bonzini 2 siblings, 0 replies; 42+ messages in thread From: Paolo Bonzini @ 2015-01-14 17:35 UTC (permalink / raw) To: Marcelo Tosatti, linux-kernel, linux-rt-users Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm On 14/01/2015 18:12, Marcelo Tosatti wrote: > Against v3.14-rt branch of > git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git > > The problem: > > On -RT, an emulated LAPIC timer instance has the following path: > > 1) hard interrupt > 2) ksoftirqd is scheduled > 3) ksoftirqd wakes up vcpu thread > 4) vcpu thread is scheduled > > This extra context switch introduces unnecessary latency in the > LAPIC path for a KVM guest. > > The solution: > > Allow waking up vcpu thread from hardirq context, > thus avoiding the need for ksoftirqd to be scheduled. > > Normal waitqueues make use of spinlocks, which on -RT > are sleepable locks. Therefore, waking up a waitqueue > waiter involves locking a sleeping lock, which > is not allowed from hard interrupt context. > > cyclictest command line: > # cyclictest -m -n -q -p99 -l 1000000 -h60 -D 1m > > This patch reduces the average latency in my tests from 14us to 11us. > > v2: improve changelog (Rik van Riel) > v3: limit (once) guest triggered printk and WARN_ON (Paolo Bonzini) > > Looks good. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> ^ permalink raw reply [flat|nested] 42+ messages in thread
* [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v5) @ 2015-04-08 23:33 Marcelo Tosatti 2015-04-08 23:33 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti 0 siblings, 1 reply; 42+ messages in thread From: Marcelo Tosatti @ 2015-04-08 23:33 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Rik van Riel, Luiz Capitulino, linux-rt-users, kvm Sebastian, rebased against v3.18.7-rt2 as requested. The problem: On -RT, an emulated LAPIC timer instance has the following path: 1) hard interrupt 2) ksoftirqd is scheduled 3) ksoftirqd wakes up vcpu thread 4) vcpu thread is scheduled This extra context switch introduces unnecessary latency in the LAPIC path for a KVM guest. The solution: Allow waking up vcpu thread from hardirq context, thus avoiding the need for ksoftirqd to be scheduled. Normal waitqueues make use of spinlocks, which on -RT are sleepable locks. Therefore, waking up a waitqueue waiter involves locking a sleeping lock, which is not allowed from hard interrupt context. cyclictest command line: # cyclictest -m -n -q -p99 -l 1000000 -h60 -D 1m This patch reduces the average latency in my tests from 14us to 11us. v2: improve changelog (Rik van Riel) v3: limit (once) guest triggered printk and WARN_ON (Paolo Bonzini) v4: fix typo (Steven Rostedt) v5: rebase against v3.18.7-rt2. ^ permalink raw reply [flat|nested] 42+ messages in thread
* [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe 2015-04-08 23:33 [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v5) Marcelo Tosatti @ 2015-04-08 23:33 ` Marcelo Tosatti 0 siblings, 0 replies; 42+ messages in thread From: Marcelo Tosatti @ 2015-04-08 23:33 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Rik van Riel, Luiz Capitulino, linux-rt-users, kvm, Marcelo Tosatti [-- Attachment #1: kvm-use-simplewaitqueue-2-lapic --] [-- Type: text/plain, Size: 2794 bytes --] Since lapic timer handler only wakes up a simple waitqueue, it can be executed from hardirq context. Also handle the case where hrtimer_start_expires fails due to -ETIME, by injecting the interrupt to the guest immediately. Reduces average cyclictest latency by 3us. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> --- arch/x86/kvm/lapic.c | 42 +++++++++++++++++++++++++++++++++++++++--- 1 file changed, 39 insertions(+), 3 deletions(-) Index: rt-linux/arch/x86/kvm/lapic.c =================================================================== --- rt-linux.orig/arch/x86/kvm/lapic.c 2015-04-08 20:20:41.000000000 -0300 +++ rt-linux/arch/x86/kvm/lapic.c 2015-04-08 20:21:16.592739674 -0300 @@ -1034,8 +1034,38 @@ apic->divide_count); } + +static enum hrtimer_restart apic_timer_fn(struct hrtimer *data); + +static void apic_timer_expired(struct hrtimer *data) +{ + int ret, i = 0; + enum hrtimer_restart r; + struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer); + + r = apic_timer_fn(data); + + if (r == HRTIMER_RESTART) { + do { + ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS); + if (ret == -ETIME) + hrtimer_add_expires_ns(&ktimer->timer, + ktimer->period); + i++; + } while (ret == -ETIME && i < 10); + + if (ret == -ETIME) { + printk_once(KERN_ERR "%s: failed to reprogram timer\n", + __func__); + WARN_ON_ONCE(1); + } + } +} + + static void start_apic_timer(struct kvm_lapic *apic) { + int ret; ktime_t now; atomic_set(&apic->lapic_timer.pending, 0); @@ 
-1065,9 +1095,11 @@ } } - hrtimer_start(&apic->lapic_timer.timer, + ret = hrtimer_start(&apic->lapic_timer.timer, ktime_add_ns(now, apic->lapic_timer.period), HRTIMER_MODE_ABS); + if (ret == -ETIME) + apic_timer_expired(&apic->lapic_timer.timer); apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016" PRIx64 ", " @@ -1097,8 +1129,10 @@ ns = (tscdeadline - guest_tsc) * 1000000ULL; do_div(ns, this_tsc_khz); } - hrtimer_start(&apic->lapic_timer.timer, + ret = hrtimer_start(&apic->lapic_timer.timer, ktime_add_ns(now, ns), HRTIMER_MODE_ABS); + if (ret == -ETIME) + apic_timer_expired(&apic->lapic_timer.timer); local_irq_restore(flags); } @@ -1587,6 +1621,7 @@ hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS); apic->lapic_timer.timer.function = apic_timer_fn; + apic->lapic_timer.timer.irqsafe = 1; /* * APIC is created enabled. This will prevent kvm_lapic_set_base from @@ -1707,7 +1742,8 @@ timer = &vcpu->arch.apic->lapic_timer.timer; if (hrtimer_cancel(timer)) - hrtimer_start_expires(timer, HRTIMER_MODE_ABS); + if (hrtimer_start_expires(timer, HRTIMER_MODE_ABS) == -ETIME) + apic_timer_expired(timer); } /* ^ permalink raw reply [flat|nested] 42+ messages in thread
* [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v4)
@ 2015-01-21 20:36 Marcelo Tosatti
  2015-01-21 20:36 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
  0 siblings, 1 reply; 42+ messages in thread
From: Marcelo Tosatti @ 2015-01-21 20:36 UTC (permalink / raw)
To: linux-kernel, linux-rt-users
Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner,
	kvm, Paolo Bonzini

Against v3.14-rt branch of
git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

The problem:

On -RT, an emulated LAPIC timer instance has the following path:

1) hard interrupt
2) ksoftirqd is scheduled
3) ksoftirqd wakes up vcpu thread
4) vcpu thread is scheduled

This extra context switch introduces unnecessary latency in the
LAPIC path for a KVM guest.

The solution:

Allow waking up vcpu thread from hardirq context, thus avoiding the
need for ksoftirqd to be scheduled.

Normal waitqueues make use of spinlocks, which on -RT are sleepable
locks. Therefore, waking up a waitqueue waiter involves locking a
sleeping lock, which is not allowed from hard interrupt context.

cyclictest command line:
# cyclictest -m -n -q -p99 -l 1000000 -h60 -D 1m

This patch reduces the average latency in my tests from 14us to 11us.

v2: improve changelog (Rik van Riel)
v3: limit (once) guest triggered printk and WARN_ON (Paolo Bonzini)
v4: fix typo (Steven Rostedt)

^ permalink raw reply	[flat|nested] 42+ messages in thread
* [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2015-01-21 20:36 [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v4) Marcelo Tosatti
@ 2015-01-21 20:36 ` Marcelo Tosatti
  0 siblings, 0 replies; 42+ messages in thread
From: Marcelo Tosatti @ 2015-01-21 20:36 UTC (permalink / raw)
To: linux-kernel, linux-rt-users
Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner,
	kvm, Paolo Bonzini, Marcelo Tosatti

[-- Attachment #1: kvm-lapic-irqsafe --]
[-- Type: text/plain, Size: 2815 bytes --]

Since the LAPIC timer handler only wakes up a simple waitqueue, it
can be executed from hardirq context.

Also handle the case where hrtimer_start_expires fails due to -ETIME,
by injecting the interrupt into the guest immediately.

Reduces average cyclictest latency by 3us.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
 arch/x86/kvm/lapic.c |   42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

Index: linux-stable-rt/arch/x86/kvm/lapic.c
===================================================================
--- linux-stable-rt.orig/arch/x86/kvm/lapic.c	2014-11-25 14:14:38.636810068 -0200
+++ linux-stable-rt/arch/x86/kvm/lapic.c	2015-01-14 14:59:17.840251874 -0200
@@ -1031,8 +1031,38 @@
 		   apic->divide_count);
 }
 
+
+static enum hrtimer_restart apic_timer_fn(struct hrtimer *data);
+
+static void apic_timer_expired(struct hrtimer *data)
+{
+	int ret, i = 0;
+	enum hrtimer_restart r;
+	struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
+
+	r = apic_timer_fn(data);
+
+	if (r == HRTIMER_RESTART) {
+		do {
+			ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS);
+			if (ret == -ETIME)
+				hrtimer_add_expires_ns(&ktimer->timer,
+						       ktimer->period);
+			i++;
+		} while (ret == -ETIME && i < 10);
+
+		if (ret == -ETIME) {
+			printk_once(KERN_ERR "%s: failed to reprogram timer\n",
+				    __func__);
+			WARN_ON_ONCE(1);
+		}
+	}
+}
+
+
 static void start_apic_timer(struct kvm_lapic *apic)
 {
+	int ret;
 	ktime_t now;
 	atomic_set(&apic->lapic_timer.pending, 0);
 
@@ -1062,9 +1092,11 @@
 		}
 	}
 
-	hrtimer_start(&apic->lapic_timer.timer,
+	ret = hrtimer_start(&apic->lapic_timer.timer,
 		      ktime_add_ns(now, apic->lapic_timer.period),
 		      HRTIMER_MODE_ABS);
+	if (ret == -ETIME)
+		apic_timer_expired(&apic->lapic_timer.timer);
 
 	apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
 		   PRIx64 ", "
@@ -1094,8 +1126,10 @@
 		ns = (tscdeadline - guest_tsc) * 1000000ULL;
 		do_div(ns, this_tsc_khz);
 	}
-	hrtimer_start(&apic->lapic_timer.timer,
+	ret = hrtimer_start(&apic->lapic_timer.timer,
 		ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
+	if (ret == -ETIME)
+		apic_timer_expired(&apic->lapic_timer.timer);
 
 	local_irq_restore(flags);
 }
@@ -1581,6 +1615,7 @@
 	hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
 		     HRTIMER_MODE_ABS);
 	apic->lapic_timer.timer.function = apic_timer_fn;
+	apic->lapic_timer.timer.irqsafe = 1;
 
 	/*
	 * APIC is created enabled. This will prevent kvm_lapic_set_base from
@@ -1699,7 +1734,8 @@
 
 	timer = &vcpu->arch.apic->lapic_timer.timer;
 	if (hrtimer_cancel(timer))
-		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
+		if (hrtimer_start_expires(timer, HRTIMER_MODE_ABS) == -ETIME)
+			apic_timer_expired(timer);
 }
 
 /*

^ permalink raw reply	[flat|nested] 42+ messages in thread
* [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v2)
@ 2014-11-25 17:21 Marcelo Tosatti
  2014-11-25 17:21 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
  0 siblings, 1 reply; 42+ messages in thread
From: Marcelo Tosatti @ 2014-11-25 17:21 UTC (permalink / raw)
To: linux-kernel
Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm

The problem:

On -RT, an emulated LAPIC timer instance has the following path:

1) hard interrupt
2) ksoftirqd is scheduled
3) ksoftirqd wakes up vcpu thread
4) vcpu thread is scheduled

This extra context switch introduces unnecessary latency in the
LAPIC path for a KVM guest.

The solution:

Allow waking up vcpu thread from hardirq context, thus avoiding the
need for ksoftirqd to be scheduled.

Normal waitqueues make use of spinlocks, which on -RT are sleepable
locks. Therefore, waking up a waitqueue waiter involves locking a
sleeping lock, which is not allowed from hard interrupt context.

cyclictest command line:
# cyclictest -m -n -q -p99 -l 1000000 -h60 -D 1m

This patch reduces the average latency in my tests from 14us to 11us.

v2: improve changelog (Rik van Riel)

^ permalink raw reply	[flat|nested] 42+ messages in thread
* [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 17:21 [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v2) Marcelo Tosatti
@ 2014-11-25 17:21 ` Marcelo Tosatti
  2014-11-25 17:38   ` Paolo Bonzini
  2014-11-25 19:11   ` Rik van Riel
  0 siblings, 2 replies; 42+ messages in thread
From: Marcelo Tosatti @ 2014-11-25 17:21 UTC (permalink / raw)
To: linux-kernel
Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner,
	kvm, Marcelo Tosatti

[-- Attachment #1: kvm-lapic-irqsafe --]
[-- Type: text/plain, Size: 2681 bytes --]

Since lapic timer handler only wakes up a simple waitqueue,
it can be executed from hardirq context.

Reduces average cyclictest latency by 3us.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
 arch/x86/kvm/lapic.c |   42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

Index: linux-stable-rt/arch/x86/kvm/lapic.c
===================================================================
--- linux-stable-rt.orig/arch/x86/kvm/lapic.c	2014-11-25 14:14:38.636810068 -0200
+++ linux-stable-rt/arch/x86/kvm/lapic.c	2014-11-25 14:17:28.850567059 -0200
@@ -1031,8 +1031,38 @@
 		   apic->divide_count);
 }
 
+
+static enum hrtimer_restart apic_timer_fn(struct hrtimer *data);
+
+static void apic_timer_expired(struct hrtimer *data)
+{
+	int ret, i = 0;
+	enum hrtimer_restart r;
+	struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
+
+	r = apic_timer_fn(data);
+
+	if (r == HRTIMER_RESTART) {
+		do {
+			ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS);
+			if (ret == -ETIME)
+				hrtimer_add_expires_ns(&ktimer->timer,
+						       ktimer->period);
+			i++;
+		} while (ret == -ETIME && i < 10);
+
+		if (ret == -ETIME) {
+			printk(KERN_ERR "%s: failed to reprogram timer\n",
+			       __func__);
+			WARN_ON(1);
+		}
+	}
+}
+
+
 static void start_apic_timer(struct kvm_lapic *apic)
 {
+	int ret;
 	ktime_t now;
 	atomic_set(&apic->lapic_timer.pending, 0);
 
@@ -1062,9 +1092,11 @@
 		}
 	}
 
-	hrtimer_start(&apic->lapic_timer.timer,
+	ret = hrtimer_start(&apic->lapic_timer.timer,
 		      ktime_add_ns(now, apic->lapic_timer.period),
 		      HRTIMER_MODE_ABS);
+	if (ret == -ETIME)
+		apic_timer_expired(&apic->lapic_timer.timer);
 
 	apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
 		   PRIx64 ", "
@@ -1094,8 +1126,10 @@
 		ns = (tscdeadline - guest_tsc) * 1000000ULL;
 		do_div(ns, this_tsc_khz);
 	}
-	hrtimer_start(&apic->lapic_timer.timer,
+	ret = hrtimer_start(&apic->lapic_timer.timer,
 		ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
+	if (ret == -ETIME)
+		apic_timer_expired(&apic->lapic_timer.timer);
 
 	local_irq_restore(flags);
 }
@@ -1581,6 +1615,7 @@
 	hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
 		     HRTIMER_MODE_ABS);
 	apic->lapic_timer.timer.function = apic_timer_fn;
+	apic->lapic_timer.timer.irqsafe = 1;
 
 	/*
	 * APIC is created enabled. This will prevent kvm_lapic_set_base from
@@ -1699,7 +1734,8 @@
 
 	timer = &vcpu->arch.apic->lapic_timer.timer;
 	if (hrtimer_cancel(timer))
-		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
+		if (hrtimer_start_expires(timer, HRTIMER_MODE_ABS) == -ETIME)
+			apic_timer_expired(timer);
 }
 
 /*

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 17:21 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
@ 2014-11-25 17:38   ` Paolo Bonzini
  2014-11-25 19:11   ` Rik van Riel
  1 sibling, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2014-11-25 17:38 UTC (permalink / raw)
To: Marcelo Tosatti, linux-kernel
Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm

On 25/11/2014 18:21, Marcelo Tosatti wrote:
> +
> +	if (r == HRTIMER_RESTART) {
> +		do {
> +			ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS);
> +			if (ret == -ETIME)
> +				hrtimer_add_expires_ns(&ktimer->timer,
> +						       ktimer->period);

Is it possible to just compute the time where the next interrupt
happens?  I suspect the printk and WARN_ON below can be easily
triggered by a guest.

Paolo

> +			i++;
> +		} while (ret == -ETIME && i < 10);
> +
> +		if (ret == -ETIME) {
> +			printk(KERN_ERR "%s: failed to reprogram timer\n",
> +			       __func__);
> +			WARN_ON(1);
> +		}
> +	}

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 17:38 ` Paolo Bonzini
  (?)
@ 2014-11-25 19:01 ` Jan Kiszka
  -1 siblings, 0 replies; 42+ messages in thread
From: Jan Kiszka @ 2014-11-25 19:01 UTC (permalink / raw)
To: Paolo Bonzini, linux-kernel; +Cc: kvm

On 2014-11-25 18:38, Paolo Bonzini wrote:
>
>
> On 25/11/2014 18:21, Marcelo Tosatti wrote:
>> +
>> +	if (r == HRTIMER_RESTART) {
>> +		do {
>> +			ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS);
>> +			if (ret == -ETIME)
>> +				hrtimer_add_expires_ns(&ktimer->timer,
>> +						       ktimer->period);
>
> Is it possible to just compute the time where the next interrupt
> happens?  I suspect the printk and WARN_ON below can be easily
> triggered by a guest.

We have a lower bound for the period that a guest can program. Unless
that value is set too low, this should practically not happen if we
avoid disturbances while handling the event and reprogramming the next
one (irqs off?).

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 17:38 ` Paolo Bonzini
  (?)
  (?)
@ 2014-12-04 13:43 ` Marcelo Tosatti
  -1 siblings, 0 replies; 42+ messages in thread
From: Marcelo Tosatti @ 2014-12-04 13:43 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel, Luiz Capitulino, Rik van Riel, Steven Rostedt,
	Thomas Gleixner, kvm

On Tue, Nov 25, 2014 at 06:38:04PM +0100, Paolo Bonzini wrote:
>
>
> On 25/11/2014 18:21, Marcelo Tosatti wrote:
> > +
> > +	if (r == HRTIMER_RESTART) {
> > +		do {
> > +			ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS);
> > +			if (ret == -ETIME)
> > +				hrtimer_add_expires_ns(&ktimer->timer,
> > +						       ktimer->period);
>
> Is it possible to just compute the time where the next interrupt
> happens?

Yes but there is no guarantee hrtimer_start_expires will not fail.

> I suspect the printk and WARN_ON below can be easily triggered
> by a guest.

I'll remove those, thanks.

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 17:21 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
  2014-11-25 17:38   ` Paolo Bonzini
@ 2014-11-25 19:11   ` Rik van Riel
  2014-11-25 20:24     ` Thomas Gleixner
  1 sibling, 1 reply; 42+ messages in thread
From: Rik van Riel @ 2014-11-25 19:11 UTC (permalink / raw)
To: Marcelo Tosatti, linux-kernel
Cc: Luiz Capitulino, Steven Rostedt, Thomas Gleixner, kvm

On 11/25/2014 12:21 PM, Marcelo Tosatti wrote:
> Since lapic timer handler only wakes up a simple waitqueue,
> it can be executed from hardirq context.
>
> Reduces average cyclictest latency by 3us.

Can this patch be merged in the KVM tree, and go
upstream via Paolo?

> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Acked-by: Rik van Riel <riel@redhat.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 19:11 ` Rik van Riel
@ 2014-11-25 20:24   ` Thomas Gleixner
  2014-12-04 13:53     ` Paolo Bonzini
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Gleixner @ 2014-11-25 20:24 UTC (permalink / raw)
To: Rik van Riel
Cc: Marcelo Tosatti, linux-kernel, Luiz Capitulino, Steven Rostedt, kvm

On Tue, 25 Nov 2014, Rik van Riel wrote:
> On 11/25/2014 12:21 PM, Marcelo Tosatti wrote:
> > Since lapic timer handler only wakes up a simple waitqueue,
> > it can be executed from hardirq context.
> >
> > Reduces average cyclictest latency by 3us.
>
> Can this patch be merged in the KVM tree, and go
> upstream via Paolo?

Not really as it has RT dependencies ....

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 20:24 ` Thomas Gleixner
@ 2014-12-04 13:53   ` Paolo Bonzini
  0 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2014-12-04 13:53 UTC (permalink / raw)
To: Thomas Gleixner, Rik van Riel
Cc: Marcelo Tosatti, linux-kernel, Luiz Capitulino, Steven Rostedt, kvm

On 25/11/2014 21:24, Thomas Gleixner wrote:
> On Tue, 25 Nov 2014, Rik van Riel wrote:
>> On 11/25/2014 12:21 PM, Marcelo Tosatti wrote:
>>> Since lapic timer handler only wakes up a simple waitqueue,
>>> it can be executed from hardirq context.
>>>
>>> Reduces average cyclictest latency by 3us.
>>
>> Can this patch be merged in the KVM tree, and go
>> upstream via Paolo?
>
> Not really as it has RT dependencies ....

Can hrtimer_start_expires even return ETIME on a non-RT kernel?  If yes,
I can take patch 2.  If not, it's better to keep both patches in the RT
tree.

Paolo

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-12-04 13:53 ` Paolo Bonzini
  (?)
@ 2014-12-05 16:17 ` Marcelo Tosatti
  -1 siblings, 0 replies; 42+ messages in thread
From: Marcelo Tosatti @ 2014-12-05 16:17 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Thomas Gleixner, Rik van Riel, linux-kernel, Luiz Capitulino,
	Steven Rostedt, kvm

On Thu, Dec 04, 2014 at 02:53:26PM +0100, Paolo Bonzini wrote:
>
>
> On 25/11/2014 21:24, Thomas Gleixner wrote:
> > On Tue, 25 Nov 2014, Rik van Riel wrote:
> >> On 11/25/2014 12:21 PM, Marcelo Tosatti wrote:
> >>> Since lapic timer handler only wakes up a simple waitqueue,
> >>> it can be executed from hardirq context.
> >>>
> >>> Reduces average cyclictest latency by 3us.
> >>
> >> Can this patch be merged in the KVM tree, and go
> >> upstream via Paolo?
> >
> > Not really as it has RT dependencies ....
>
> Can hrtimer_start_expires even return ETIME on a non-RT kernel?  If yes,
> I can take patch 2.  If not, it's better to keep both patches in the RT
> tree.
>
> Paolo

It can't. We're still evaluating the necessity of this patch, will
resubmit accordingly.

^ permalink raw reply	[flat|nested] 42+ messages in thread
* [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue
@ 2014-11-25 16:45 Marcelo Tosatti
  2014-11-25 16:45 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
  0 siblings, 1 reply; 42+ messages in thread
From: Marcelo Tosatti @ 2014-11-25 16:45 UTC (permalink / raw)
To: linux-kernel
Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner

Which allows waking up vcpu from hardirq context.

^ permalink raw reply	[flat|nested] 42+ messages in thread
* [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 16:45 [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue Marcelo Tosatti
@ 2014-11-25 16:45 ` Marcelo Tosatti
  2014-11-25 17:10   ` Paolo Bonzini
  0 siblings, 1 reply; 42+ messages in thread
From: Marcelo Tosatti @ 2014-11-25 16:45 UTC (permalink / raw)
To: linux-kernel
Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner,
	Marcelo Tosatti

[-- Attachment #1: kvm-lapic-irqsafe --]
[-- Type: text/plain, Size: 2681 bytes --]

Since lapic timer handler only wakes up a simple waitqueue,
it can be executed from hardirq context.

Reduces average cyclictest latency by 3us.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
 arch/x86/kvm/lapic.c |   42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

Index: linux-stable-rt/arch/x86/kvm/lapic.c
===================================================================
--- linux-stable-rt.orig/arch/x86/kvm/lapic.c	2014-11-25 14:14:38.636810068 -0200
+++ linux-stable-rt/arch/x86/kvm/lapic.c	2014-11-25 14:17:28.850567059 -0200
@@ -1031,8 +1031,38 @@
 		   apic->divide_count);
 }
 
+
+static enum hrtimer_restart apic_timer_fn(struct hrtimer *data);
+
+static void apic_timer_expired(struct hrtimer *data)
+{
+	int ret, i = 0;
+	enum hrtimer_restart r;
+	struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
+
+	r = apic_timer_fn(data);
+
+	if (r == HRTIMER_RESTART) {
+		do {
+			ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS);
+			if (ret == -ETIME)
+				hrtimer_add_expires_ns(&ktimer->timer,
+						       ktimer->period);
+			i++;
+		} while (ret == -ETIME && i < 10);
+
+		if (ret == -ETIME) {
+			printk(KERN_ERR "%s: failed to reprogram timer\n",
+			       __func__);
+			WARN_ON(1);
+		}
+	}
+}
+
+
 static void start_apic_timer(struct kvm_lapic *apic)
 {
+	int ret;
 	ktime_t now;
 	atomic_set(&apic->lapic_timer.pending, 0);
 
@@ -1062,9 +1092,11 @@
 		}
 	}
 
-	hrtimer_start(&apic->lapic_timer.timer,
+	ret = hrtimer_start(&apic->lapic_timer.timer,
 		      ktime_add_ns(now, apic->lapic_timer.period),
 		      HRTIMER_MODE_ABS);
+	if (ret == -ETIME)
+		apic_timer_expired(&apic->lapic_timer.timer);
 
 	apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
 		   PRIx64 ", "
@@ -1094,8 +1126,10 @@
 		ns = (tscdeadline - guest_tsc) * 1000000ULL;
 		do_div(ns, this_tsc_khz);
 	}
-	hrtimer_start(&apic->lapic_timer.timer,
+	ret = hrtimer_start(&apic->lapic_timer.timer,
 		ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
+	if (ret == -ETIME)
+		apic_timer_expired(&apic->lapic_timer.timer);
 
 	local_irq_restore(flags);
 }
@@ -1581,6 +1615,7 @@
 	hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
 		     HRTIMER_MODE_ABS);
 	apic->lapic_timer.timer.function = apic_timer_fn;
+	apic->lapic_timer.timer.irqsafe = 1;
 
 	/*
	 * APIC is created enabled. This will prevent kvm_lapic_set_base from
@@ -1699,7 +1734,8 @@
 
 	timer = &vcpu->arch.apic->lapic_timer.timer;
 	if (hrtimer_cancel(timer))
-		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
+		if (hrtimer_start_expires(timer, HRTIMER_MODE_ABS) == -ETIME)
+			apic_timer_expired(timer);
 }
 
 /*

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 16:45 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
@ 2014-11-25 17:10   ` Paolo Bonzini
  0 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2014-11-25 17:10 UTC (permalink / raw)
To: Marcelo Tosatti, linux-kernel
Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner

On 25/11/2014 17:45, Marcelo Tosatti wrote:
> Since lapic timer handler only wakes up a simple waitqueue,
> it can be executed from hardirq context.
>
> Reduces average cyclictest latency by 3us.
>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Please include kvm@vger.kernel.org too.

Paolo

> ---
>  arch/x86/kvm/lapic.c |   42 +++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 39 insertions(+), 3 deletions(-)
>
> Index: linux-stable-rt/arch/x86/kvm/lapic.c
> ===================================================================
> --- linux-stable-rt.orig/arch/x86/kvm/lapic.c	2014-11-25 14:14:38.636810068 -0200
> +++ linux-stable-rt/arch/x86/kvm/lapic.c	2014-11-25 14:17:28.850567059 -0200
> @@ -1031,8 +1031,38 @@
>  		   apic->divide_count);
>  }
>
> +
> +static enum hrtimer_restart apic_timer_fn(struct hrtimer *data);
> +
> +static void apic_timer_expired(struct hrtimer *data)
> +{
> +	int ret, i = 0;
> +	enum hrtimer_restart r;
> +	struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
> +
> +	r = apic_timer_fn(data);
> +
> +	if (r == HRTIMER_RESTART) {
> +		do {
> +			ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS);
> +			if (ret == -ETIME)
> +				hrtimer_add_expires_ns(&ktimer->timer,
> +						       ktimer->period);
> +			i++;
> +		} while (ret == -ETIME && i < 10);
> +
> +		if (ret == -ETIME) {
> +			printk(KERN_ERR "%s: failed to reprogram timer\n",
> +			       __func__);
> +			WARN_ON(1);
> +		}
> +	}
> +}
> +
> +
>  static void start_apic_timer(struct kvm_lapic *apic)
>  {
> +	int ret;
>  	ktime_t now;
>  	atomic_set(&apic->lapic_timer.pending, 0);
>
> @@ -1062,9 +1092,11 @@
>  		}
>  	}
>
> -	hrtimer_start(&apic->lapic_timer.timer,
> +	ret = hrtimer_start(&apic->lapic_timer.timer,
>  		      ktime_add_ns(now, apic->lapic_timer.period),
>  		      HRTIMER_MODE_ABS);
> +	if (ret == -ETIME)
> +		apic_timer_expired(&apic->lapic_timer.timer);
>
>  	apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
>  		   PRIx64 ", "
> @@ -1094,8 +1126,10 @@
>  		ns = (tscdeadline - guest_tsc) * 1000000ULL;
>  		do_div(ns, this_tsc_khz);
>  	}
> -	hrtimer_start(&apic->lapic_timer.timer,
> +	ret = hrtimer_start(&apic->lapic_timer.timer,
>  		ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
> +	if (ret == -ETIME)
> +		apic_timer_expired(&apic->lapic_timer.timer);
>
>  	local_irq_restore(flags);
>  }
> @@ -1581,6 +1615,7 @@
>  	hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
>  		     HRTIMER_MODE_ABS);
>  	apic->lapic_timer.timer.function = apic_timer_fn;
> +	apic->lapic_timer.timer.irqsafe = 1;
>
>  	/*
>  	 * APIC is created enabled. This will prevent kvm_lapic_set_base from
> @@ -1699,7 +1734,8 @@
>
>  	timer = &vcpu->arch.apic->lapic_timer.timer;
>  	if (hrtimer_cancel(timer))
> -		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
> +		if (hrtimer_start_expires(timer, HRTIMER_MODE_ABS) == -ETIME)
> +			apic_timer_expired(timer);
>  }
>
>  /*

^ permalink raw reply	[flat|nested] 42+ messages in thread