* [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v3) @ 2015-01-14 17:12 Marcelo Tosatti 2015-01-14 17:12 ` [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq Marcelo Tosatti ` (2 more replies) 0 siblings, 3 replies; 42+ messages in thread From: Marcelo Tosatti @ 2015-01-14 17:12 UTC (permalink / raw) To: linux-kernel, linux-rt-users Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini Against v3.14-rt branch of git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git The problem: On -RT, an emulated LAPIC timer instance has the following path: 1) hard interrupt 2) ksoftirqd is scheduled 3) ksoftirqd wakes up vcpu thread 4) vcpu thread is scheduled This extra context switch introduces unnecessary latency in the LAPIC path for a KVM guest. The solution: Allow waking up vcpu thread from hardirq context, thus avoiding the need for ksoftirqd to be scheduled. Normal waitqueues make use of spinlocks, which on -RT are sleepable locks. Therefore, waking up a waitqueue waiter involves locking a sleeping lock, which is not allowed from hard interrupt context. cyclictest command line: # cyclictest -m -n -q -p99 -l 1000000 -h60 -D 1m This patch reduces the average latency in my tests from 14us to 11us. v2: improve changelog (Rik van Riel) v3: limit (once) guest triggered printk and WARN_ON (Paolo Bonzini) ^ permalink raw reply [flat|nested] 42+ messages in thread
* [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-01-14 17:12 [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v3) Marcelo Tosatti @ 2015-01-14 17:12 ` Marcelo Tosatti 2015-01-14 18:22 ` Rik van Riel ` (2 more replies) 2015-01-14 17:12 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti 2015-01-14 17:35 ` [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v3) Paolo Bonzini 2 siblings, 3 replies; 42+ messages in thread From: Marcelo Tosatti @ 2015-01-14 17:12 UTC (permalink / raw) To: linux-kernel, linux-rt-users Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini, Marcelo Tosatti [-- Attachment #1: kvm-wq-use-simple-waitqueue --] [-- Type: text/plain, Size: 14747 bytes --] The problem: On -RT, an emulated LAPIC timer instance has the following path: 1) hard interrupt 2) ksoftirqd is scheduled 3) ksoftirqd wakes up vcpu thread 4) vcpu thread is scheduled This extra context switch introduces unnecessary latency in the LAPIC path for a KVM guest. The solution: Allow waking up vcpu thread from hardirq context, thus avoiding the need for ksoftirqd to be scheduled. Normal waitqueues make use of spinlocks, which on -RT are sleepable locks. Therefore, waking up a waitqueue waiter involves locking a sleeping lock, which is not allowed from hard interrupt context. cyclictest command line: # cyclictest -m -n -q -p99 -l 1000000 -h60 -D 1m This patch reduces the average latency in my tests from 14us to 11us.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> --- arch/arm/kvm/arm.c | 4 ++-- arch/arm/kvm/psci.c | 4 ++-- arch/mips/kvm/kvm_mips.c | 8 ++++---- arch/powerpc/include/asm/kvm_host.h | 4 ++-- arch/powerpc/kvm/book3s_hv.c | 20 ++++++++++---------- arch/s390/include/asm/kvm_host.h | 2 +- arch/s390/kvm/interrupt.c | 22 ++++++++++------------ arch/s390/kvm/sigp.c | 16 ++++++++-------- arch/x86/kvm/lapic.c | 6 +++--- include/linux/kvm_host.h | 4 ++-- virt/kvm/async_pf.c | 4 ++-- virt/kvm/kvm_main.c | 16 ++++++++-------- 12 files changed, 54 insertions(+), 56 deletions(-) Index: linux-stable-rt/arch/arm/kvm/arm.c =================================================================== --- linux-stable-rt.orig/arch/arm/kvm/arm.c 2014-11-25 14:13:39.188899952 -0200 +++ linux-stable-rt/arch/arm/kvm/arm.c 2014-11-25 14:14:38.620810092 -0200 @@ -495,9 +495,9 @@ static void vcpu_pause(struct kvm_vcpu *vcpu) { - wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu); + struct swait_head *wq = kvm_arch_vcpu_wq(vcpu); - wait_event_interruptible(*wq, !vcpu->arch.pause); + swait_event_interruptible(*wq, !vcpu->arch.pause); } static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu) Index: linux-stable-rt/arch/arm/kvm/psci.c =================================================================== --- linux-stable-rt.orig/arch/arm/kvm/psci.c 2014-11-25 14:13:39.189899951 -0200 +++ linux-stable-rt/arch/arm/kvm/psci.c 2014-11-25 14:14:38.620810092 -0200 @@ -36,7 +36,7 @@ { struct kvm *kvm = source_vcpu->kvm; struct kvm_vcpu *vcpu = NULL, *tmp; - wait_queue_head_t *wq; + struct swait_head *wq; unsigned long cpu_id; unsigned long mpidr; phys_addr_t target_pc; @@ -80,7 +80,7 @@ smp_mb(); /* Make sure the above is visible */ wq = kvm_arch_vcpu_wq(vcpu); - wake_up_interruptible(wq); + swait_wake_interruptible(wq); return KVM_PSCI_RET_SUCCESS; } Index: linux-stable-rt/arch/mips/kvm/kvm_mips.c =================================================================== --- 
linux-stable-rt.orig/arch/mips/kvm/kvm_mips.c 2014-11-25 14:13:39.191899948 -0200 +++ linux-stable-rt/arch/mips/kvm/kvm_mips.c 2014-11-25 14:14:38.621810091 -0200 @@ -464,8 +464,8 @@ dvcpu->arch.wait = 0; - if (waitqueue_active(&dvcpu->wq)) { - wake_up_interruptible(&dvcpu->wq); + if (swaitqueue_active(&dvcpu->wq)) { + swait_wake_interruptible(&dvcpu->wq); } return 0; @@ -971,8 +971,8 @@ kvm_mips_callbacks->queue_timer_int(vcpu); vcpu->arch.wait = 0; - if (waitqueue_active(&vcpu->wq)) { - wake_up_interruptible(&vcpu->wq); + if (swaitqueue_active(&vcpu->wq)) { + swait_wake_nterruptible(&vcpu->wq); } } Index: linux-stable-rt/arch/powerpc/include/asm/kvm_host.h =================================================================== --- linux-stable-rt.orig/arch/powerpc/include/asm/kvm_host.h 2014-11-25 14:13:39.193899944 -0200 +++ linux-stable-rt/arch/powerpc/include/asm/kvm_host.h 2014-11-25 14:14:38.621810091 -0200 @@ -295,7 +295,7 @@ u8 in_guest; struct list_head runnable_threads; spinlock_t lock; - wait_queue_head_t wq; + struct swait_head wq; u64 stolen_tb; u64 preempt_tb; struct kvm_vcpu *runner; @@ -612,7 +612,7 @@ u8 prodded; u32 last_inst; - wait_queue_head_t *wqp; + struct swait_head *wqp; struct kvmppc_vcore *vcore; int ret; int trap; Index: linux-stable-rt/arch/powerpc/kvm/book3s_hv.c =================================================================== --- linux-stable-rt.orig/arch/powerpc/kvm/book3s_hv.c 2014-11-25 14:13:39.195899942 -0200 +++ linux-stable-rt/arch/powerpc/kvm/book3s_hv.c 2014-11-25 14:14:38.625810085 -0200 @@ -74,11 +74,11 @@ { int me; int cpu = vcpu->cpu; - wait_queue_head_t *wqp; + struct swait_head *wqp; wqp = kvm_arch_vcpu_wq(vcpu); - if (waitqueue_active(wqp)) { - wake_up_interruptible(wqp); + if (swaitqueue_active(wqp)) { + swait_wake_interruptible(wqp); ++vcpu->stat.halt_wakeup; } @@ -583,8 +583,8 @@ tvcpu->arch.prodded = 1; smp_mb(); if (vcpu->arch.ceded) { - if (waitqueue_active(&vcpu->wq)) { - wake_up_interruptible(&vcpu->wq); + if 
(swaitqueue_active(&vcpu->wq)) { + swait_wake_interruptible(&vcpu->wq); vcpu->stat.halt_wakeup++; } } @@ -1199,7 +1199,7 @@ if (vcore) { INIT_LIST_HEAD(&vcore->runnable_threads); spin_lock_init(&vcore->lock); - init_waitqueue_head(&vcore->wq); + init_swait_head(&vcore->wq); vcore->preempt_tb = TB_NIL; vcore->lpcr = kvm->arch.lpcr; vcore->first_vcpuid = core * threads_per_core; @@ -1566,13 +1566,13 @@ */ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc) { - DEFINE_WAIT(wait); + DEFINE_SWAITER(wait); - prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE); + swait_prepare(&vc->wq, &wait, TASK_INTERRUPTIBLE); vc->vcore_state = VCORE_SLEEPING; spin_unlock(&vc->lock); schedule(); - finish_wait(&vc->wq, &wait); + swait_finish(&vc->wq, &wait); spin_lock(&vc->lock); vc->vcore_state = VCORE_INACTIVE; } @@ -1613,7 +1613,7 @@ kvmppc_create_dtl_entry(vcpu, vc); kvmppc_start_thread(vcpu); } else if (vc->vcore_state == VCORE_SLEEPING) { - wake_up(&vc->wq); + swait_wake(&vc->wq); } } Index: linux-stable-rt/arch/s390/include/asm/kvm_host.h =================================================================== --- linux-stable-rt.orig/arch/s390/include/asm/kvm_host.h 2014-11-25 14:13:39.196899940 -0200 +++ linux-stable-rt/arch/s390/include/asm/kvm_host.h 2014-11-25 14:14:38.627810082 -0200 @@ -233,7 +233,7 @@ atomic_t active; struct kvm_s390_float_interrupt *float_int; int timer_due; /* event indicator for waitqueue below */ - wait_queue_head_t *wq; + struct swait_head *wq; atomic_t *cpuflags; unsigned int action_bits; }; Index: linux-stable-rt/arch/s390/kvm/interrupt.c =================================================================== --- linux-stable-rt.orig/arch/s390/kvm/interrupt.c 2014-11-25 14:13:39.197899939 -0200 +++ linux-stable-rt/arch/s390/kvm/interrupt.c 2014-11-25 14:14:38.630810077 -0200 @@ -403,7 +403,7 @@ int kvm_s390_handle_wait(struct kvm_vcpu *vcpu) { u64 now, sltime; - DECLARE_WAITQUEUE(wait, current); + DECLARE_SWAITER(wait); vcpu->stat.exit_wait_state++; 
if (kvm_cpu_has_interrupt(vcpu)) @@ -440,12 +440,11 @@ srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx); spin_lock(&vcpu->arch.local_int.float_int->lock); spin_lock_bh(&vcpu->arch.local_int.lock); - add_wait_queue(&vcpu->wq, &wait); + swait_prepare(&vcpu->wq, &wait, TASK_INTERRUPTIBLE); while (list_empty(&vcpu->arch.local_int.list) && list_empty(&vcpu->arch.local_int.float_int->list) && (!vcpu->arch.local_int.timer_due) && !signal_pending(current)) { - set_current_state(TASK_INTERRUPTIBLE); spin_unlock_bh(&vcpu->arch.local_int.lock); spin_unlock(&vcpu->arch.local_int.float_int->lock); schedule(); @@ -453,8 +452,7 @@ spin_lock_bh(&vcpu->arch.local_int.lock); } __unset_cpu_idle(vcpu); - __set_current_state(TASK_RUNNING); - remove_wait_queue(&vcpu->wq, &wait); + swait_finish(&vcpu->wq, &wait); spin_unlock_bh(&vcpu->arch.local_int.lock); spin_unlock(&vcpu->arch.local_int.float_int->lock); vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); @@ -469,8 +467,8 @@ spin_lock(&vcpu->arch.local_int.lock); vcpu->arch.local_int.timer_due = 1; - if (waitqueue_active(&vcpu->wq)) - wake_up_interruptible(&vcpu->wq); + if (swaitqueue_active(&vcpu->wq)) + swait_wake_interruptible(&vcpu->wq); spin_unlock(&vcpu->arch.local_int.lock); } @@ -617,7 +615,7 @@ spin_lock_bh(&li->lock); list_add(&inti->list, &li->list); atomic_set(&li->active, 1); - BUG_ON(waitqueue_active(li->wq)); + BUG_ON(swaitqueue_active(li->wq)); spin_unlock_bh(&li->lock); return 0; } @@ -750,8 +748,8 @@ li = fi->local_int[sigcpu]; spin_lock_bh(&li->lock); atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags); - if (waitqueue_active(li->wq)) - wake_up_interruptible(li->wq); + if (swaitqueue_active(li->wq)) + swait_wake_interruptible(li->wq); spin_unlock_bh(&li->lock); spin_unlock(&fi->lock); mutex_unlock(&kvm->lock); @@ -836,8 +834,8 @@ if (inti->type == KVM_S390_SIGP_STOP) li->action_bits |= ACTION_STOP_ON_STOP; atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags); - if (waitqueue_active(&vcpu->wq)) - 
wake_up_interruptible(&vcpu->wq); + if (swaitqueue_active(&vcpu->wq)) + swait_wake_interruptible(&vcpu->wq); spin_unlock_bh(&li->lock); mutex_unlock(&vcpu->kvm->lock); return 0; Index: linux-stable-rt/arch/s390/kvm/sigp.c =================================================================== --- linux-stable-rt.orig/arch/s390/kvm/sigp.c 2014-11-25 14:13:39.198899937 -0200 +++ linux-stable-rt/arch/s390/kvm/sigp.c 2014-11-25 14:14:38.633810073 -0200 @@ -79,8 +79,8 @@ list_add_tail(&inti->list, &li->list); atomic_set(&li->active, 1); atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags); - if (waitqueue_active(li->wq)) - wake_up_interruptible(li->wq); + if (swaitqueue_active(li->wq)) + swait_wake_interruptible(li->wq); spin_unlock_bh(&li->lock); rc = SIGP_CC_ORDER_CODE_ACCEPTED; VCPU_EVENT(vcpu, 4, "sent sigp emerg to cpu %x", cpu_addr); @@ -148,8 +148,8 @@ list_add_tail(&inti->list, &li->list); atomic_set(&li->active, 1); atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags); - if (waitqueue_active(li->wq)) - wake_up_interruptible(li->wq); + if (swaitqueue_active(li->wq)) + swait_wake_interruptible(li->wq); spin_unlock_bh(&li->lock); rc = SIGP_CC_ORDER_CODE_ACCEPTED; VCPU_EVENT(vcpu, 4, "sent sigp ext call to cpu %x", cpu_addr); @@ -179,8 +179,8 @@ atomic_set(&li->active, 1); atomic_set_mask(CPUSTAT_STOP_INT, li->cpuflags); li->action_bits |= action; - if (waitqueue_active(li->wq)) - wake_up_interruptible(li->wq); + if (swaitqueue_active(li->wq)) + swait_wake_interruptible(li->wq); out: spin_unlock_bh(&li->lock); @@ -288,8 +288,8 @@ list_add_tail(&inti->list, &li->list); atomic_set(&li->active, 1); - if (waitqueue_active(li->wq)) - wake_up_interruptible(li->wq); + if (swaitqueue_active(li->wq)) + swait_wake_interruptible(li->wq); rc = SIGP_CC_ORDER_CODE_ACCEPTED; VCPU_EVENT(vcpu, 4, "set prefix of cpu %02x to %x", cpu_addr, address); Index: linux-stable-rt/arch/x86/kvm/lapic.c =================================================================== --- 
linux-stable-rt.orig/arch/x86/kvm/lapic.c 2014-11-25 14:13:39.199899935 -0200 +++ linux-stable-rt/arch/x86/kvm/lapic.c 2015-01-14 15:02:41.135771065 -0200 @@ -1533,7 +1533,7 @@ struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer); struct kvm_lapic *apic = container_of(ktimer, struct kvm_lapic, lapic_timer); struct kvm_vcpu *vcpu = apic->vcpu; - wait_queue_head_t *q = &vcpu->wq; + struct swait_head *q = &vcpu->wq; /* * There is a race window between reading and incrementing, but we do @@ -1547,8 +1547,8 @@ kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu); } - if (waitqueue_active(q)) - wake_up_interruptible(q); + if (swaitqueue_active(q)) + swait_wake_interruptible(q); if (lapic_is_periodic(apic)) { hrtimer_add_expires_ns(&ktimer->timer, ktimer->period); Index: linux-stable-rt/include/linux/kvm_host.h =================================================================== --- linux-stable-rt.orig/include/linux/kvm_host.h 2014-11-25 14:13:39.201899932 -0200 +++ linux-stable-rt/include/linux/kvm_host.h 2014-11-25 14:14:38.638810065 -0200 @@ -231,7 +231,7 @@ int fpu_active; int guest_fpu_loaded, guest_xcr0_loaded; - wait_queue_head_t wq; + struct swait_head wq; struct pid *pid; int sigset_active; sigset_t sigset; @@ -677,7 +677,7 @@ } #endif -static inline wait_queue_head_t *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu) +static inline struct swait_head *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu) { #ifdef __KVM_HAVE_ARCH_WQP return vcpu->arch.wqp; Index: linux-stable-rt/virt/kvm/async_pf.c =================================================================== --- linux-stable-rt.orig/virt/kvm/async_pf.c 2014-11-25 14:13:39.202899931 -0200 +++ linux-stable-rt/virt/kvm/async_pf.c 2014-11-25 14:14:38.639810063 -0200 @@ -82,8 +82,8 @@ trace_kvm_async_pf_completed(addr, gva); - if (waitqueue_active(&vcpu->wq)) - wake_up_interruptible(&vcpu->wq); + if (swaitqueue_active(&vcpu->wq)) + swait_wake_interruptible(&vcpu->wq); mmput(mm); kvm_put_kvm(vcpu->kvm); Index: 
linux-stable-rt/virt/kvm/kvm_main.c =================================================================== --- linux-stable-rt.orig/virt/kvm/kvm_main.c 2014-11-25 14:13:39.204899928 -0200 +++ linux-stable-rt/virt/kvm/kvm_main.c 2014-11-25 14:14:38.641810060 -0200 @@ -219,7 +219,7 @@ vcpu->kvm = kvm; vcpu->vcpu_id = id; vcpu->pid = NULL; - init_waitqueue_head(&vcpu->wq); + init_swait_head(&vcpu->wq); kvm_async_pf_vcpu_init(vcpu); page = alloc_page(GFP_KERNEL | __GFP_ZERO); @@ -1675,10 +1675,10 @@ */ void kvm_vcpu_block(struct kvm_vcpu *vcpu) { - DEFINE_WAIT(wait); + DEFINE_SWAITER(wait); for (;;) { - prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE); + swait_prepare(&vcpu->wq, &wait, TASK_INTERRUPTIBLE); if (kvm_arch_vcpu_runnable(vcpu)) { kvm_make_request(KVM_REQ_UNHALT, vcpu); @@ -1692,7 +1692,7 @@ schedule(); } - finish_wait(&vcpu->wq, &wait); + swait_finish(&vcpu->wq, &wait); } EXPORT_SYMBOL_GPL(kvm_vcpu_block); @@ -1704,11 +1704,11 @@ { int me; int cpu = vcpu->cpu; - wait_queue_head_t *wqp; + struct swait_head *wqp; wqp = kvm_arch_vcpu_wq(vcpu); - if (waitqueue_active(wqp)) { - wake_up_interruptible(wqp); + if (swaitqueue_active(wqp)) { + swait_wake_interruptible(wqp); ++vcpu->stat.halt_wakeup; } @@ -1814,7 +1814,7 @@ continue; if (vcpu == me) continue; - if (waitqueue_active(&vcpu->wq)) + if (swaitqueue_active(&vcpu->wq)) continue; if (!kvm_vcpu_eligible_for_directed_yield(vcpu)) continue; ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-01-14 17:12 ` [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq Marcelo Tosatti @ 2015-01-14 18:22 ` Rik van Riel 2015-01-16 16:48 ` Steven Rostedt 2015-03-06 13:54 ` Sebastian Andrzej Siewior 2 siblings, 0 replies; 42+ messages in thread From: Rik van Riel @ 2015-01-14 18:22 UTC (permalink / raw) To: Marcelo Tosatti, linux-kernel, linux-rt-users Cc: Luiz Capitulino, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini On 01/14/2015 12:12 PM, Marcelo Tosatti wrote: > The problem: > > On -RT, an emulated LAPIC timer instances has the following path: > > 1) hard interrupt > 2) ksoftirqd is scheduled > 3) ksoftirqd wakes up vcpu thread > 4) vcpu thread is scheduled > > This extra context switch introduces unnecessary latency in the > LAPIC path for a KVM guest. > This patch reduces the average latency in my tests from 14us to 11us. > > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-01-14 17:12 ` [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq Marcelo Tosatti 2015-01-14 18:22 ` Rik van Riel @ 2015-01-16 16:48 ` Steven Rostedt 2015-01-16 16:56 ` Steven Rostedt ` (2 more replies) 2015-03-06 13:54 ` Sebastian Andrzej Siewior 2 siblings, 3 replies; 42+ messages in thread From: Steven Rostedt @ 2015-01-16 16:48 UTC (permalink / raw) To: Marcelo Tosatti Cc: linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini, Peter Zijlstra On Fri, 16 Jan 2015 11:40:45 -0500 Marcelo Tosatti <mtosatti@redhat.com> wrote: > The problem: > > On -RT, an emulated LAPIC timer instances has the following path: > > 1) hard interrupt > 2) ksoftirqd is scheduled > 3) ksoftirqd wakes up vcpu thread > 4) vcpu thread is scheduled > > This extra context switch introduces unnecessary latency in the > LAPIC path for a KVM guest. > > The solution: > > Allow waking up vcpu thread from hardirq context, > thus avoiding the need for ksoftirqd to be scheduled. > > Normal waitqueues make use of spinlocks, which on -RT > are sleepable locks. Therefore, waking up a waitqueue > waiter involves locking a sleeping lock, which > is not allowed from hard interrupt context. > I have no issue in pulling this into stable-rt, but read below. > cyclictest command line: > # cyclictest -m -n -q -p99 -l 1000000 -h60 -D 1m > > This patch reduces the average latency in my tests from 14us to 11us. 
> > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> > > --- > arch/arm/kvm/arm.c | 4 ++-- > arch/arm/kvm/psci.c | 4 ++-- > arch/mips/kvm/kvm_mips.c | 8 ++++---- > arch/powerpc/include/asm/kvm_host.h | 4 ++-- > arch/powerpc/kvm/book3s_hv.c | 20 ++++++++++---------- > arch/s390/include/asm/kvm_host.h | 2 +- > arch/s390/kvm/interrupt.c | 22 ++++++++++------------ > arch/s390/kvm/sigp.c | 16 ++++++++-------- > arch/x86/kvm/lapic.c | 6 +++--- > include/linux/kvm_host.h | 4 ++-- > virt/kvm/async_pf.c | 4 ++-- > virt/kvm/kvm_main.c | 16 ++++++++-------- > 12 files changed, 54 insertions(+), 56 deletions(-) > > Index: linux-stable-rt/arch/arm/kvm/arm.c > =================================================================== > --- linux-stable-rt.orig/arch/arm/kvm/arm.c 2014-11-25 14:13:39.188899952 -0200 > +++ linux-stable-rt/arch/arm/kvm/arm.c 2014-11-25 14:14:38.620810092 -0200 > @@ -495,9 +495,9 @@ > > static void vcpu_pause(struct kvm_vcpu *vcpu) > { > - wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu); > + struct swait_head *wq = kvm_arch_vcpu_wq(vcpu); > > - wait_event_interruptible(*wq, !vcpu->arch.pause); > + swait_event_interruptible(*wq, !vcpu->arch.pause); > } > > static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu) > Index: linux-stable-rt/arch/arm/kvm/psci.c > =================================================================== > --- linux-stable-rt.orig/arch/arm/kvm/psci.c 2014-11-25 14:13:39.189899951 -0200 > +++ linux-stable-rt/arch/arm/kvm/psci.c 2014-11-25 14:14:38.620810092 -0200 > @@ -36,7 +36,7 @@ > { > struct kvm *kvm = source_vcpu->kvm; > struct kvm_vcpu *vcpu = NULL, *tmp; > - wait_queue_head_t *wq; > + struct swait_head *wq; > unsigned long cpu_id; > unsigned long mpidr; > phys_addr_t target_pc; > @@ -80,7 +80,7 @@ > smp_mb(); /* Make sure the above is visible */ > > wq = kvm_arch_vcpu_wq(vcpu); > - wake_up_interruptible(wq); > + swait_wake_interruptible(wq); > > return KVM_PSCI_RET_SUCCESS; > } > Index: linux-stable-rt/arch/mips/kvm/kvm_mips.c > 
=================================================================== > --- linux-stable-rt.orig/arch/mips/kvm/kvm_mips.c 2014-11-25 14:13:39.191899948 -0200 > +++ linux-stable-rt/arch/mips/kvm/kvm_mips.c 2014-11-25 14:14:38.621810091 -0200 > @@ -464,8 +464,8 @@ > > dvcpu->arch.wait = 0; > > - if (waitqueue_active(&dvcpu->wq)) { > - wake_up_interruptible(&dvcpu->wq); > + if (swaitqueue_active(&dvcpu->wq)) { > + swait_wake_interruptible(&dvcpu->wq); > } > > return 0; > @@ -971,8 +971,8 @@ > kvm_mips_callbacks->queue_timer_int(vcpu); > > vcpu->arch.wait = 0; > - if (waitqueue_active(&vcpu->wq)) { > - wake_up_interruptible(&vcpu->wq); > + if (swaitqueue_active(&vcpu->wq)) { > + swait_wake_nterruptible(&vcpu->wq); Sure you compiled this patch? > } > } > > Index: linux-stable-rt/arch/powerpc/include/asm/kvm_host.h > =================================================================== > --- linux-stable-rt.orig/arch/powerpc/include/asm/kvm_host.h 2014-11-25 14:13:39.193899944 -0200 > +++ linux-stable-rt/arch/powerpc/include/asm/kvm_host.h 2014-11-25 14:14:38.621810091 -0200 > @@ -295,7 +295,7 @@ > u8 in_guest; > struct list_head runnable_threads; > spinlock_t lock; > - wait_queue_head_t wq; > + struct swait_head wq; > u64 stolen_tb; > u64 preempt_tb; > struct kvm_vcpu *runner; > @@ -612,7 +612,7 @@ > u8 prodded; > u32 last_inst; > > - wait_queue_head_t *wqp; > + struct swait_head *wqp; > struct kvmppc_vcore *vcore; > int ret; > int trap; > Index: linux-stable-rt/arch/powerpc/kvm/book3s_hv.c > =================================================================== > --- linux-stable-rt.orig/arch/powerpc/kvm/book3s_hv.c 2014-11-25 14:13:39.195899942 -0200 > +++ linux-stable-rt/arch/powerpc/kvm/book3s_hv.c 2014-11-25 14:14:38.625810085 -0200 > @@ -74,11 +74,11 @@ > { > int me; > int cpu = vcpu->cpu; > - wait_queue_head_t *wqp; > + struct swait_head *wqp; > > wqp = kvm_arch_vcpu_wq(vcpu); > - if (waitqueue_active(wqp)) { > - wake_up_interruptible(wqp); > + if 
(swaitqueue_active(wqp)) { > + swait_wake_interruptible(wqp); > ++vcpu->stat.halt_wakeup; > } > > @@ -583,8 +583,8 @@ > tvcpu->arch.prodded = 1; > smp_mb(); > if (vcpu->arch.ceded) { > - if (waitqueue_active(&vcpu->wq)) { > - wake_up_interruptible(&vcpu->wq); > + if (swaitqueue_active(&vcpu->wq)) { > + swait_wake_interruptible(&vcpu->wq); > vcpu->stat.halt_wakeup++; > } > } > @@ -1199,7 +1199,7 @@ > if (vcore) { > INIT_LIST_HEAD(&vcore->runnable_threads); > spin_lock_init(&vcore->lock); > - init_waitqueue_head(&vcore->wq); > + init_swait_head(&vcore->wq); > vcore->preempt_tb = TB_NIL; > vcore->lpcr = kvm->arch.lpcr; > vcore->first_vcpuid = core * threads_per_core; > @@ -1566,13 +1566,13 @@ > */ > static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc) > { > - DEFINE_WAIT(wait); > + DEFINE_SWAITER(wait); > > - prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE); > + swait_prepare(&vc->wq, &wait, TASK_INTERRUPTIBLE); > vc->vcore_state = VCORE_SLEEPING; > spin_unlock(&vc->lock); > schedule(); > - finish_wait(&vc->wq, &wait); > + swait_finish(&vc->wq, &wait); > spin_lock(&vc->lock); > vc->vcore_state = VCORE_INACTIVE; > } > @@ -1613,7 +1613,7 @@ > kvmppc_create_dtl_entry(vcpu, vc); > kvmppc_start_thread(vcpu); > } else if (vc->vcore_state == VCORE_SLEEPING) { > - wake_up(&vc->wq); > + swait_wake(&vc->wq); I notice everywhere you have a swait_wake_interruptible() but here. Is there a reason why? IIRC, Peter wants to make swait wakeup usage homogenous. That is, you either sleep in an interruptible state, or you don't. You can't mix and match it. Peter is that what you plan? 
-- Steve > } > > } > Index: linux-stable-rt/arch/s390/include/asm/kvm_host.h > =================================================================== > --- linux-stable-rt.orig/arch/s390/include/asm/kvm_host.h 2014-11-25 14:13:39.196899940 -0200 > +++ linux-stable-rt/arch/s390/include/asm/kvm_host.h 2014-11-25 14:14:38.627810082 -0200 > @@ -233,7 +233,7 @@ > atomic_t active; > struct kvm_s390_float_interrupt *float_int; > int timer_due; /* event indicator for waitqueue below */ > - wait_queue_head_t *wq; > + struct swait_head *wq; > atomic_t *cpuflags; > unsigned int action_bits; > }; > Index: linux-stable-rt/arch/s390/kvm/interrupt.c > =================================================================== > --- linux-stable-rt.orig/arch/s390/kvm/interrupt.c 2014-11-25 14:13:39.197899939 -0200 > +++ linux-stable-rt/arch/s390/kvm/interrupt.c 2014-11-25 14:14:38.630810077 -0200 > @@ -403,7 +403,7 @@ > int kvm_s390_handle_wait(struct kvm_vcpu *vcpu) > { > u64 now, sltime; > - DECLARE_WAITQUEUE(wait, current); > + DECLARE_SWAITER(wait); > > vcpu->stat.exit_wait_state++; > if (kvm_cpu_has_interrupt(vcpu)) > @@ -440,12 +440,11 @@ > srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx); > spin_lock(&vcpu->arch.local_int.float_int->lock); > spin_lock_bh(&vcpu->arch.local_int.lock); > - add_wait_queue(&vcpu->wq, &wait); > + swait_prepare(&vcpu->wq, &wait, TASK_INTERRUPTIBLE); > while (list_empty(&vcpu->arch.local_int.list) && > list_empty(&vcpu->arch.local_int.float_int->list) && > (!vcpu->arch.local_int.timer_due) && > !signal_pending(current)) { > - set_current_state(TASK_INTERRUPTIBLE); > spin_unlock_bh(&vcpu->arch.local_int.lock); > spin_unlock(&vcpu->arch.local_int.float_int->lock); > schedule(); > @@ -453,8 +452,7 @@ > spin_lock_bh(&vcpu->arch.local_int.lock); > } > __unset_cpu_idle(vcpu); > - __set_current_state(TASK_RUNNING); > - remove_wait_queue(&vcpu->wq, &wait); > + swait_finish(&vcpu->wq, &wait); > spin_unlock_bh(&vcpu->arch.local_int.lock); > 
spin_unlock(&vcpu->arch.local_int.float_int->lock); > vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); > @@ -469,8 +467,8 @@ > > spin_lock(&vcpu->arch.local_int.lock); > vcpu->arch.local_int.timer_due = 1; > - if (waitqueue_active(&vcpu->wq)) > - wake_up_interruptible(&vcpu->wq); > + if (swaitqueue_active(&vcpu->wq)) > + swait_wake_interruptible(&vcpu->wq); > spin_unlock(&vcpu->arch.local_int.lock); > } > > @@ -617,7 +615,7 @@ > spin_lock_bh(&li->lock); > list_add(&inti->list, &li->list); > atomic_set(&li->active, 1); > - BUG_ON(waitqueue_active(li->wq)); > + BUG_ON(swaitqueue_active(li->wq)); > spin_unlock_bh(&li->lock); > return 0; > } > @@ -750,8 +748,8 @@ > li = fi->local_int[sigcpu]; > spin_lock_bh(&li->lock); > atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags); > - if (waitqueue_active(li->wq)) > - wake_up_interruptible(li->wq); > + if (swaitqueue_active(li->wq)) > + swait_wake_interruptible(li->wq); > spin_unlock_bh(&li->lock); > spin_unlock(&fi->lock); > mutex_unlock(&kvm->lock); > @@ -836,8 +834,8 @@ > if (inti->type == KVM_S390_SIGP_STOP) > li->action_bits |= ACTION_STOP_ON_STOP; > atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags); > - if (waitqueue_active(&vcpu->wq)) > - wake_up_interruptible(&vcpu->wq); > + if (swaitqueue_active(&vcpu->wq)) > + swait_wake_interruptible(&vcpu->wq); > spin_unlock_bh(&li->lock); > mutex_unlock(&vcpu->kvm->lock); > return 0; > Index: linux-stable-rt/arch/s390/kvm/sigp.c > =================================================================== > --- linux-stable-rt.orig/arch/s390/kvm/sigp.c 2014-11-25 14:13:39.198899937 -0200 > +++ linux-stable-rt/arch/s390/kvm/sigp.c 2014-11-25 14:14:38.633810073 -0200 > @@ -79,8 +79,8 @@ > list_add_tail(&inti->list, &li->list); > atomic_set(&li->active, 1); > atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags); > - if (waitqueue_active(li->wq)) > - wake_up_interruptible(li->wq); > + if (swaitqueue_active(li->wq)) > + swait_wake_interruptible(li->wq); > spin_unlock_bh(&li->lock); > rc = 
SIGP_CC_ORDER_CODE_ACCEPTED; > VCPU_EVENT(vcpu, 4, "sent sigp emerg to cpu %x", cpu_addr); > @@ -148,8 +148,8 @@ > list_add_tail(&inti->list, &li->list); > atomic_set(&li->active, 1); > atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags); > - if (waitqueue_active(li->wq)) > - wake_up_interruptible(li->wq); > + if (swaitqueue_active(li->wq)) > + swait_wake_interruptible(li->wq); > spin_unlock_bh(&li->lock); > rc = SIGP_CC_ORDER_CODE_ACCEPTED; > VCPU_EVENT(vcpu, 4, "sent sigp ext call to cpu %x", cpu_addr); > @@ -179,8 +179,8 @@ > atomic_set(&li->active, 1); > atomic_set_mask(CPUSTAT_STOP_INT, li->cpuflags); > li->action_bits |= action; > - if (waitqueue_active(li->wq)) > - wake_up_interruptible(li->wq); > + if (swaitqueue_active(li->wq)) > + swait_wake_interruptible(li->wq); > out: > spin_unlock_bh(&li->lock); > > @@ -288,8 +288,8 @@ > > list_add_tail(&inti->list, &li->list); > atomic_set(&li->active, 1); > - if (waitqueue_active(li->wq)) > - wake_up_interruptible(li->wq); > + if (swaitqueue_active(li->wq)) > + swait_wake_interruptible(li->wq); > rc = SIGP_CC_ORDER_CODE_ACCEPTED; > > VCPU_EVENT(vcpu, 4, "set prefix of cpu %02x to %x", cpu_addr, address); > Index: linux-stable-rt/arch/x86/kvm/lapic.c > =================================================================== > --- linux-stable-rt.orig/arch/x86/kvm/lapic.c 2014-11-25 14:13:39.199899935 -0200 > +++ linux-stable-rt/arch/x86/kvm/lapic.c 2015-01-14 15:02:41.135771065 -0200 > @@ -1533,7 +1533,7 @@ > struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer); > struct kvm_lapic *apic = container_of(ktimer, struct kvm_lapic, lapic_timer); > struct kvm_vcpu *vcpu = apic->vcpu; > - wait_queue_head_t *q = &vcpu->wq; > + struct swait_head *q = &vcpu->wq; > > /* > * There is a race window between reading and incrementing, but we do > @@ -1547,8 +1547,8 @@ > kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu); > } > > - if (waitqueue_active(q)) > - wake_up_interruptible(q); > + if (swaitqueue_active(q)) > + 
swait_wake_interruptible(q); > > if (lapic_is_periodic(apic)) { > hrtimer_add_expires_ns(&ktimer->timer, ktimer->period); > Index: linux-stable-rt/include/linux/kvm_host.h > =================================================================== > --- linux-stable-rt.orig/include/linux/kvm_host.h 2014-11-25 14:13:39.201899932 -0200 > +++ linux-stable-rt/include/linux/kvm_host.h 2014-11-25 14:14:38.638810065 -0200 > @@ -231,7 +231,7 @@ > > int fpu_active; > int guest_fpu_loaded, guest_xcr0_loaded; > - wait_queue_head_t wq; > + struct swait_head wq; > struct pid *pid; > int sigset_active; > sigset_t sigset; > @@ -677,7 +677,7 @@ > } > #endif > > -static inline wait_queue_head_t *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu) > +static inline struct swait_head *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu) > { > #ifdef __KVM_HAVE_ARCH_WQP > return vcpu->arch.wqp; > Index: linux-stable-rt/virt/kvm/async_pf.c > =================================================================== > --- linux-stable-rt.orig/virt/kvm/async_pf.c 2014-11-25 14:13:39.202899931 -0200 > +++ linux-stable-rt/virt/kvm/async_pf.c 2014-11-25 14:14:38.639810063 -0200 > @@ -82,8 +82,8 @@ > > trace_kvm_async_pf_completed(addr, gva); > > - if (waitqueue_active(&vcpu->wq)) > - wake_up_interruptible(&vcpu->wq); > + if (swaitqueue_active(&vcpu->wq)) > + swait_wake_interruptible(&vcpu->wq); > > mmput(mm); > kvm_put_kvm(vcpu->kvm); > Index: linux-stable-rt/virt/kvm/kvm_main.c > =================================================================== > --- linux-stable-rt.orig/virt/kvm/kvm_main.c 2014-11-25 14:13:39.204899928 -0200 > +++ linux-stable-rt/virt/kvm/kvm_main.c 2014-11-25 14:14:38.641810060 -0200 > @@ -219,7 +219,7 @@ > vcpu->kvm = kvm; > vcpu->vcpu_id = id; > vcpu->pid = NULL; > - init_waitqueue_head(&vcpu->wq); > + init_swait_head(&vcpu->wq); > kvm_async_pf_vcpu_init(vcpu); > > page = alloc_page(GFP_KERNEL | __GFP_ZERO); > @@ -1675,10 +1675,10 @@ > */ > void kvm_vcpu_block(struct kvm_vcpu *vcpu) > { > - 
DEFINE_WAIT(wait); > + DEFINE_SWAITER(wait); > > for (;;) { > - prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE); > + swait_prepare(&vcpu->wq, &wait, TASK_INTERRUPTIBLE); > > if (kvm_arch_vcpu_runnable(vcpu)) { > kvm_make_request(KVM_REQ_UNHALT, vcpu); > @@ -1692,7 +1692,7 @@ > schedule(); > } > > - finish_wait(&vcpu->wq, &wait); > + swait_finish(&vcpu->wq, &wait); > } > EXPORT_SYMBOL_GPL(kvm_vcpu_block); > > @@ -1704,11 +1704,11 @@ > { > int me; > int cpu = vcpu->cpu; > - wait_queue_head_t *wqp; > + struct swait_head *wqp; > > wqp = kvm_arch_vcpu_wq(vcpu); > - if (waitqueue_active(wqp)) { > - wake_up_interruptible(wqp); > + if (swaitqueue_active(wqp)) { > + swait_wake_interruptible(wqp); > ++vcpu->stat.halt_wakeup; > } > > @@ -1814,7 +1814,7 @@ > continue; > if (vcpu == me) > continue; > - if (waitqueue_active(&vcpu->wq)) > + if (swaitqueue_active(&vcpu->wq)) > continue; > if (!kvm_vcpu_eligible_for_directed_yield(vcpu)) > continue; > ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-01-16 16:48 ` Steven Rostedt @ 2015-01-16 16:56 ` Steven Rostedt 2015-01-17 7:57 ` Peter Zijlstra 2015-01-19 14:41 ` Marcelo Tosatti 2 siblings, 0 replies; 42+ messages in thread From: Steven Rostedt @ 2015-01-16 16:56 UTC (permalink / raw) To: Marcelo Tosatti Cc: linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini, Peter Zijlstra On Fri, 16 Jan 2015 11:48:46 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > > static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc) > > { > > - DEFINE_WAIT(wait); > > + DEFINE_SWAITER(wait); > > > > - prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE); > > + swait_prepare(&vc->wq, &wait, TASK_INTERRUPTIBLE); > > vc->vcore_state = VCORE_SLEEPING; > > spin_unlock(&vc->lock); > > schedule(); > > - finish_wait(&vc->wq, &wait); > > + swait_finish(&vc->wq, &wait); > > spin_lock(&vc->lock); > > vc->vcore_state = VCORE_INACTIVE; > > } > > @@ -1613,7 +1613,7 @@ > > kvmppc_create_dtl_entry(vcpu, vc); > > kvmppc_start_thread(vcpu); > > } else if (vc->vcore_state == VCORE_SLEEPING) { > > - wake_up(&vc->wq); > > + swait_wake(&vc->wq); > > I notice everywhere you have a swait_wake_interruptible() but here. Is > there a reason why? > Of course I could have misread this patch, and this specific wait queue will only be woken up by swait_wake() and not swait_wake_interruptible(). If that's the case, then it's fine as is (except for that typo I pointed out before). -- Steve ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-01-16 16:48 ` Steven Rostedt 2015-01-16 16:56 ` Steven Rostedt @ 2015-01-17 7:57 ` Peter Zijlstra 2015-01-19 14:41 ` Marcelo Tosatti 2 siblings, 0 replies; 42+ messages in thread From: Peter Zijlstra @ 2015-01-17 7:57 UTC (permalink / raw) To: Steven Rostedt Cc: Marcelo Tosatti, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini On Fri, Jan 16, 2015 at 11:48:46AM -0500, Steven Rostedt wrote: > I notice everywhere you have a swait_wake_interruptible() but here. Is > there a reason why? > > IIRC, Peter wants to make swait wakeup usage homogenous. That is, you > either sleep in an interruptible state, or you don't. You can't mix and > match it. > > Peter is that what you plan? Yes, this is required for the single wakeup case to be bounded. If you have both INTERRUPTIBLE and UNINTERRUPTIBLE waiters on a list, and you were allowed to do INTERRUPTIBLE/UNINTERRUPTIBLE wakeups, the wakeup would have to do O(n) iteration to find a matching waiter. Seeing that the entire purpose of simple wait queues was to retain RT properties, that's entirely retarded ;-) ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-01-16 16:48 ` Steven Rostedt 2015-01-16 16:56 ` Steven Rostedt 2015-01-17 7:57 ` Peter Zijlstra @ 2015-01-19 14:41 ` Marcelo Tosatti 2015-01-20 5:46 ` Paul Mackerras 2 siblings, 1 reply; 42+ messages in thread From: Marcelo Tosatti @ 2015-01-19 14:41 UTC (permalink / raw) To: Steven Rostedt, Paul Mackerras Cc: linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini, Peter Zijlstra On Fri, Jan 16, 2015 at 11:48:46AM -0500, Steven Rostedt wrote: > > @@ -971,8 +971,8 @@ > > kvm_mips_callbacks->queue_timer_int(vcpu); > > > > vcpu->arch.wait = 0; > > - if (waitqueue_active(&vcpu->wq)) { > > - wake_up_interruptible(&vcpu->wq); > > + if (swaitqueue_active(&vcpu->wq)) { > > + swait_wake_nterruptible(&vcpu->wq); > > Sure you compiled this patch? Only x86. > > Index: linux-stable-rt/arch/powerpc/include/asm/kvm_host.h > > =================================================================== > > --- linux-stable-rt.orig/arch/powerpc/include/asm/kvm_host.h 2014-11-25 14:13:39.193899944 -0200 > > +++ linux-stable-rt/arch/powerpc/include/asm/kvm_host.h 2014-11-25 14:14:38.621810091 -0200 > > @@ -295,7 +295,7 @@ > > u8 in_guest; > > struct list_head runnable_threads; > > spinlock_t lock; > > - wait_queue_head_t wq; > > + struct swait_head wq; > > u64 stolen_tb; > > u64 preempt_tb; > > struct kvm_vcpu *runner; > > @@ -612,7 +612,7 @@ > > u8 prodded; > > u32 last_inst; > > > > - wait_queue_head_t *wqp; > > + struct swait_head *wqp; > > struct kvmppc_vcore *vcore; > > int ret; > > int trap; > > Index: linux-stable-rt/arch/powerpc/kvm/book3s_hv.c > > =================================================================== > > --- linux-stable-rt.orig/arch/powerpc/kvm/book3s_hv.c 2014-11-25 14:13:39.195899942 -0200 > > +++ linux-stable-rt/arch/powerpc/kvm/book3s_hv.c 2014-11-25 14:14:38.625810085 -0200 > > @@ -74,11 +74,11 @@ > > { > > int me; > > int cpu = vcpu->cpu; > 
> - wait_queue_head_t *wqp; > > + struct swait_head *wqp; > > > > wqp = kvm_arch_vcpu_wq(vcpu); > > - if (waitqueue_active(wqp)) { > > - wake_up_interruptible(wqp); > > + if (swaitqueue_active(wqp)) { > > + swait_wake_interruptible(wqp); > > ++vcpu->stat.halt_wakeup; > > } > > > > @@ -583,8 +583,8 @@ > > tvcpu->arch.prodded = 1; > > smp_mb(); > > if (vcpu->arch.ceded) { > > - if (waitqueue_active(&vcpu->wq)) { > > - wake_up_interruptible(&vcpu->wq); > > + if (swaitqueue_active(&vcpu->wq)) { > > + swait_wake_interruptible(&vcpu->wq); > > vcpu->stat.halt_wakeup++; > > } > > } > > @@ -1199,7 +1199,7 @@ > > if (vcore) { > > INIT_LIST_HEAD(&vcore->runnable_threads); > > spin_lock_init(&vcore->lock); > > - init_waitqueue_head(&vcore->wq); > > + init_swait_head(&vcore->wq); > > vcore->preempt_tb = TB_NIL; > > vcore->lpcr = kvm->arch.lpcr; > > vcore->first_vcpuid = core * threads_per_core; > > @@ -1566,13 +1566,13 @@ > > */ > > static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc) > > { > > - DEFINE_WAIT(wait); > > + DEFINE_SWAITER(wait); > > > > - prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE); > > + swait_prepare(&vc->wq, &wait, TASK_INTERRUPTIBLE); > > vc->vcore_state = VCORE_SLEEPING; > > spin_unlock(&vc->lock); > > schedule(); > > - finish_wait(&vc->wq, &wait); > > + swait_finish(&vc->wq, &wait); > > spin_lock(&vc->lock); > > vc->vcore_state = VCORE_INACTIVE; > > } > > @@ -1613,7 +1613,7 @@ > > kvmppc_create_dtl_entry(vcpu, vc); > > kvmppc_start_thread(vcpu); > > } else if (vc->vcore_state == VCORE_SLEEPING) { > > - wake_up(&vc->wq); > > + swait_wake(&vc->wq); > > I notice everywhere you have a swait_wake_interruptible() but here. Is > there a reason why? > > IIRC, Peter wants to make swait wakeup usage homogenous. That is, you > either sleep in an interruptible state, or you don't. You can't mix and > match it. IIUC there is only one waiter on this waitqueue at any given time. Paul is that correct? ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-01-19 14:41 ` Marcelo Tosatti @ 2015-01-20 5:46 ` Paul Mackerras 2015-01-20 18:16 ` Steven Rostedt 0 siblings, 1 reply; 42+ messages in thread From: Paul Mackerras @ 2015-01-20 5:46 UTC (permalink / raw) To: Marcelo Tosatti Cc: Steven Rostedt, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini, Peter Zijlstra On Mon, Jan 19, 2015 at 12:41:00PM -0200, Marcelo Tosatti wrote: > On Fri, Jan 16, 2015 at 11:48:46AM -0500, Steven Rostedt wrote: > > > static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc) > > > { > > > - DEFINE_WAIT(wait); > > > + DEFINE_SWAITER(wait); > > > > > > - prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE); > > > + swait_prepare(&vc->wq, &wait, TASK_INTERRUPTIBLE); > > > vc->vcore_state = VCORE_SLEEPING; > > > spin_unlock(&vc->lock); > > > schedule(); > > > - finish_wait(&vc->wq, &wait); > > > + swait_finish(&vc->wq, &wait); > > > spin_lock(&vc->lock); > > > vc->vcore_state = VCORE_INACTIVE; > > > } > > > @@ -1613,7 +1613,7 @@ > > > kvmppc_create_dtl_entry(vcpu, vc); > > > kvmppc_start_thread(vcpu); > > > } else if (vc->vcore_state == VCORE_SLEEPING) { > > > - wake_up(&vc->wq); > > > + swait_wake(&vc->wq); > > > > I notice everywhere you have a swait_wake_interruptible() but here. Is > > there a reason why? > > > > IIRC, Peter wants to make swait wakeup usage homogenous. That is, you > > either sleep in an interruptible state, or you don't. You can't mix and > > match it. > > IIUC there is only one waiter on this waitqueue at any given time. > > Paul is that correct? Yes, that's right. It's only the task that has taken the responsibility for running the virtual core that would be waiting on that wait queue. Paul. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-01-20 5:46 ` Paul Mackerras @ 2015-01-20 18:16 ` Steven Rostedt 2015-01-21 15:07 ` Peter Zijlstra 0 siblings, 1 reply; 42+ messages in thread From: Steven Rostedt @ 2015-01-20 18:16 UTC (permalink / raw) To: Paul Mackerras Cc: Marcelo Tosatti, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini, Peter Zijlstra On Tue, 20 Jan 2015 16:46:53 +1100 Paul Mackerras <paulus@samba.org> wrote: > On Mon, Jan 19, 2015 at 12:41:00PM -0200, Marcelo Tosatti wrote: > > On Fri, Jan 16, 2015 at 11:48:46AM -0500, Steven Rostedt wrote: > > > > static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc) > > > > { > > > > - DEFINE_WAIT(wait); > > > > + DEFINE_SWAITER(wait); > > > > > > > > - prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE); > > > > + swait_prepare(&vc->wq, &wait, TASK_INTERRUPTIBLE); > > > > vc->vcore_state = VCORE_SLEEPING; > > > > spin_unlock(&vc->lock); > > > > schedule(); > > > > - finish_wait(&vc->wq, &wait); > > > > + swait_finish(&vc->wq, &wait); > > > > spin_lock(&vc->lock); > > > > vc->vcore_state = VCORE_INACTIVE; > > > > } > > > > @@ -1613,7 +1613,7 @@ > > > > kvmppc_create_dtl_entry(vcpu, vc); > > > > kvmppc_start_thread(vcpu); > > > > } else if (vc->vcore_state == VCORE_SLEEPING) { > > > > - wake_up(&vc->wq); > > > > + swait_wake(&vc->wq); > > > > > > I notice everywhere you have a swait_wake_interruptible() but here. Is > > > there a reason why? > > > > > > IIRC, Peter wants to make swait wakeup usage homogenous. That is, you > > > either sleep in an interruptible state, or you don't. You can't mix and > > > match it. > > > > IIUC there is only one waiter on this waitqueue at any given time. > > > > Paul is that correct? > > Yes, that's right. It's only the task that has taken the > responsibility for running the virtual core that would be waiting on > that wait queue. Thanks Paul, but it still makes me nervous. 
I'm actually wondering if we should just nuke the _interruptible() version of swait. As it should only be all interruptible or all not interruptible, that the swait_wake() should just do the wake up regardless. In which case, swait_wake() is good enough. No need to have different versions where people may think do something special. Peter? -- Steve ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-01-20 18:16 ` Steven Rostedt @ 2015-01-21 15:07 ` Peter Zijlstra 2015-02-17 17:44 ` Sebastian Andrzej Siewior 0 siblings, 1 reply; 42+ messages in thread From: Peter Zijlstra @ 2015-01-21 15:07 UTC (permalink / raw) To: Steven Rostedt Cc: Paul Mackerras, Marcelo Tosatti, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini On Tue, Jan 20, 2015 at 01:16:13PM -0500, Steven Rostedt wrote: > I'm actually wondering if we should just nuke the _interruptible() > version of swait. As it should only be all interruptible or all not > interruptible, that the swait_wake() should just do the wake up > regardless. In which case, swait_wake() is good enough. No need to have > different versions where people may think do something special. > > Peter? Yeah, I think the latest thing I have sitting here on my disk only has the swake_up() which does TASK_NORMAL, no choice there. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-01-21 15:07 ` Peter Zijlstra @ 2015-02-17 17:44 ` Sebastian Andrzej Siewior 2015-02-18 14:03 ` Peter Zijlstra 2015-02-27 0:23 ` Marcelo Tosatti 0 siblings, 2 replies; 42+ messages in thread From: Sebastian Andrzej Siewior @ 2015-02-17 17:44 UTC (permalink / raw) To: Peter Zijlstra Cc: Steven Rostedt, Paul Mackerras, Marcelo Tosatti, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini * Peter Zijlstra | 2015-01-21 16:07:16 [+0100]: >On Tue, Jan 20, 2015 at 01:16:13PM -0500, Steven Rostedt wrote: >> I'm actually wondering if we should just nuke the _interruptible() >> version of swait. As it should only be all interruptible or all not >> interruptible, that the swait_wake() should just do the wake up >> regardless. In which case, swait_wake() is good enough. No need to have >> different versions where people may think do something special. >> >> Peter? > >Yeah, I think the lastest thing I have sitting here on my disk only has >the swake_up() which does TASK_NORMAL, no choice there. what is the swait status in terms of mainline? This sounds like it beeing worked on. I could take the series but then I would drop it again if the mainline implementation changes… Sebastian ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-02-17 17:44 ` Sebastian Andrzej Siewior @ 2015-02-18 14:03 ` Peter Zijlstra 2015-02-25 21:02 ` Sebastian Andrzej Siewior 2015-02-27 0:23 ` Marcelo Tosatti 1 sibling, 1 reply; 42+ messages in thread From: Peter Zijlstra @ 2015-02-18 14:03 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Steven Rostedt, Paul Mackerras, Marcelo Tosatti, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini On Tue, Feb 17, 2015 at 06:44:19PM +0100, Sebastian Andrzej Siewior wrote: > * Peter Zijlstra | 2015-01-21 16:07:16 [+0100]: > > >On Tue, Jan 20, 2015 at 01:16:13PM -0500, Steven Rostedt wrote: > >> I'm actually wondering if we should just nuke the _interruptible() > >> version of swait. As it should only be all interruptible or all not > >> interruptible, that the swait_wake() should just do the wake up > >> regardless. In which case, swait_wake() is good enough. No need to have > >> different versions where people may think do something special. > >> > >> Peter? > > > >Yeah, I think the lastest thing I have sitting here on my disk only has > >the swake_up() which does TASK_NORMAL, no choice there. > > what is the swait status in terms of mainline? This sounds like it > beeing worked on. > I could take the series but then I would drop it again if the mainline > implementation changes… Well, 'worked' on might be putting too much on it, its one of the many many 'spare' time things that never get attention unless people bug me ;-) The below is my current patch, I've not actually tried it yet, I (think I) had one patch doing some conversions but I'm having trouble locating it. 
Mostly-Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- include/linux/swait.h | 172 ++++++++++++++++++++++++++++++++++++++++++++++++++ kernel/sched/swait.c | 122 +++++++++++++++++++++++++++++++++++ 2 files changed, 294 insertions(+) --- /dev/null +++ b/include/linux/swait.h @@ -0,0 +1,172 @@ +#ifndef _LINUX_SWAIT_H +#define _LINUX_SWAIT_H + +#include <linux/list.h> +#include <linux/stddef.h> +#include <linux/spinlock.h> +#include <asm/current.h> + +/* + * Simple wait queues + * + * While these are very similar to the other/complex wait queues (wait.h) the + * most important difference is that the simple waitqueue allows for + * deterministic behaviour -- IOW it has strictly bounded IRQ and lock hold + * times. + * + * In order to make this so, we had to drop a fair number of features of the + * other waitqueue code; notably: + * + * - mixing INTERRUPTIBLE and UNINTERRUPTIBLE sleeps on the same waitqueue; + * all wakeups are TASK_NORMAL in order to avoid O(n) lookups for the right + * sleeper state. + * + * - the exclusive mode; because this requires preserving the list order + * and this is hard. + * + * - custom wake functions; because you cannot give any guarantees about + * random code. + * + * As a side effect of this; the data structures are slimmer. + * + * One would recommend using this wait queue where possible. 
+ */ + +struct task_struct; + +struct swait_queue_head { + raw_spinlock_t lock; + struct list_head task_list; +}; + +struct swait_queue { + struct task_struct *task; + struct list_head task_list; +}; + +#define __SWAITQUEUE_INITIALIZER(name) { \ + .task = current, \ + .task_list = LIST_HEAD_INIT((name).task_list), \ +} + +#define DECLARE_SWAITQUEUE(name) \ + struct swait_queue name = __SWAITQUEUE_INITIALIZER(name) + +#define __SWAIT_QUEUE_HEAD_INITIALIZER(name) { \ + .lock = __RAW_SPIN_LOCK_UNLOCKED(name.lock), \ + .task_list = LIST_HEAD_INIT((name).task_list), \ +} + +#define DECLARE_SWAIT_QUEUE_HEAD(name) \ + struct swait_queue_head name = __SWAIT_QUEUE_HEAD_INITIALIZER(name) + +extern void __init_swait_queue_head(struct swait_queue_head *q, const char *name, + struct lock_class_key *key); + +#define init_swait_queue_head(q) \ + do { \ + static struct lock_class_key __key; \ + __init_swait_queue_head((q), #q, &__key); \ + } while (0) + +#ifdef CONFIG_LOCKDEP +# define __SWAIT_QUEUE_HEAD_INIT_ONSTACK(name) \ + ({ init_swait_queue_head(&name); name; }) +# define DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(name) \ + struct swait_queue_head name = __SWAIT_QUEUE_HEAD_INIT_ONSTACK(name) +#else +# define DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(name) \ + DECLARE_SWAIT_QUEUE_HEAD(name) +#endif + +static inline int swait_active(struct swait_queue_head *q) +{ + return !list_empty(&q->task_list); +} + +extern void swake_up(struct swait_queue_head *q); +extern void swake_up_all(struct swait_queue_head *q); +extern void swake_up_locked(struct swait_queue_head *q); + +extern void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait); +extern void prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait, int state); +extern long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait, int state); + +extern void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait); +extern void finish_swait(struct swait_queue_head *q, struct 
swait_queue *wait); + +/* as per ___wait_event() but for swait, therefore "exclusive == 0" */ +#define ___swait_event(wq, condition, state, ret, cmd) \ +({ \ + struct swait_queue __wait; \ + long __ret = ret; \ + \ + INIT_LIST_HEAD(&__wait.task_list); \ + for (;;) { \ + long __int = prepare_to_swait_event(&wq, &__wait, state);\ + \ + if (condition) \ + break; \ + \ + if (___wait_is_interruptible(state) && __int) { \ + __ret = __int; \ + break; \ + } \ + \ + cmd; \ + } \ + finish_swait(&wq, &__wait); \ + __ret; \ +}) + +#define __swait_event(wq, condition) \ + (void)___swait_event(wq, condition, TASK_UNINTERRUPTIBLE, 0, \ + schedule()) + +#define swait_event(wq, condition) \ +do { \ + if (condition) \ + break; \ + __swait_event(wq, condition); \ +} while (0) + +#define __swait_event_timeout(wq, condition, timeout) \ + ___swait_event(wq, ___wait_cond_timeout(condition), \ + TASK_UNINTERRUPTIBLE, timeout, \ + __ret = schedule_timeout(__ret)) + +#define swait_event_timeout(wq, condition, timeout) \ +({ \ + long __ret = timeout; \ + if (!___wait_cond_timeout(condition)) \ + __ret = __swait_event_timeout(wq, condition, timeout); \ + __ret; \ +}) + +#define __swait_event_interruptible(wq, condition) \ + ___swait_event(wq, condition, TASK_INTERRUPTIBLE, 0, \ + schedule()) + +#define swait_event_interruptible(wq, condition) \ +({ \ + int __ret = 0; \ + if (!(condition)) \ + __ret = __swait_event_interruptible(wq, condition); \ + __ret; \ +}) + +#define __swait_event_interruptible_timeout(wq, condition, timeout) \ + ___swait_event(wq, ___wait_cond_timeout(condition), \ + TASK_INTERRUPTIBLE, timeout, \ + __ret = schedule_timeout(__ret)) + +#define swait_event_interruptible_timeout(wq, condition, timeout) \ +({ \ + long __ret = timeout; \ + if (!___wait_cond_timeout(condition)) \ + __ret = __swait_event_interruptible_timeout(wq, \ + condition, timeout); \ + __ret; \ +}) + +#endif /* _LINUX_SWAIT_H */ --- /dev/null +++ b/kernel/sched/swait.c @@ -0,0 +1,122 @@ + +#include 
<linux/swait.h> + +void __init_swait_queue_head(struct swait_queue_head *q, const char *name, + struct lock_class_key *key) +{ + raw_spin_lock_init(&q->lock); + lockdep_set_class_and_name(&q->lock, key, name); + INIT_LIST_HEAD(&q->task_list); +} +EXPORT_SYMBOL(__init_swait_queue_head); + +/* + * The thing about the wake_up_state() return value; I think we can ignore it. + * + * If for some reason it would return 0, that means the previously waiting + * task is already running, so it will observe condition true (or has already). + */ +void swake_up_locked(struct swait_queue_head *q) +{ + struct swait_queue *curr; + + list_for_each_entry(curr, &q->task_list, task_list) { + wake_up_process(curr->task); + list_del_init(&curr->task_list); + break; + } +} +EXPORT_SYMBOL(swake_up_locked); + +void swake_up(struct swait_queue_head *q) +{ + unsigned long flags; + + if (!swait_active(q)) + return; + + raw_spin_lock_irqsave(&q->lock, flags); + __swake_up_locked(q); + raw_spin_unlock_irqrestore(&q->lock, flags); +} +EXPORT_SYMBOL(swake_up); + +/* + * Does not allow usage from IRQ disabled, since we must be able to + * release IRQs to guarantee bounded hold time. 
+ */ +void swake_up_all(struct swait_queue_head *q) +{ + struct swait_queue *curr, *next; + LIST_HEAD(tmp); + + if (!swait_active(q)) + return; + + raw_spin_lock_irq(&q->lock); + list_splice_init(&q->task_list, &tmp); + while (!list_empty(&tmp)) { + curr = list_first_entry(&tmp, typeof(curr), task_list); + + wake_up_state(curr->task, state); + list_del_init(&curr->task_list); + + if (list_empty(&tmp)) + break; + + raw_spin_unlock_irq(&q->lock); + raw_spin_lock_irq(&q->lock); + } + raw_spin_unlock_irq(&q->lock); +} +EXPORT_SYMBOL(swake_up_all); + +void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait) +{ + wait->task = current; + if (list_empty(&wait->node)) + list_add(&wait->task_list, &q->task_list); +} + +void prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait, int state) +{ + unsigned long flags; + + raw_spin_lock_irqsave(&q->lock, flags); + __prepare_to_swait(q, wait); + set_current_state(state); + raw_spin_unlock_irqrestore(&q->lock, flags); +} +EXPORT_SYMBOL(prepare_to_swait); + +long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait, int state) +{ + if (signal_pending_state(state, current)) + return -ERESTARTSYS; + + prepare_to_swait(q, wait, state); + + return 0; +} +EXPORT_SYMBOL(prepare_to_swait_event); + +void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait) +{ + __set_current_state(TASK_RUNNING); + if (!list_empty(&wait->task_list)) + list_del_init(&wait->task_list); +} + +void finish_swait(struct swait_queue_head *q, struct swait_queue *wait) +{ + unsigned long flags; + + __set_current_state(TASK_RUNNING); + + if (!list_empty_careful(&wait->task_list)) { + raw_spin_lock_irqsave(&q->lock, flags); + list_del_init(&wait->task_list); + raw_spin_unlock_irqrestore(&q->lock, flags); + } +} +EXPORT_SYMBOL(finish_swait); ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-02-18 14:03 ` Peter Zijlstra @ 2015-02-25 21:02 ` Sebastian Andrzej Siewior 2015-08-07 10:57 ` Peter Zijlstra 0 siblings, 1 reply; 42+ messages in thread From: Sebastian Andrzej Siewior @ 2015-02-25 21:02 UTC (permalink / raw) To: Peter Zijlstra Cc: Steven Rostedt, Paul Mackerras, Marcelo Tosatti, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini * Peter Zijlstra | 2015-02-18 15:03:20 [+0100]: >On Tue, Feb 17, 2015 at 06:44:19PM +0100, Sebastian Andrzej Siewior wrote: >> * Peter Zijlstra | 2015-01-21 16:07:16 [+0100]: >> >> >On Tue, Jan 20, 2015 at 01:16:13PM -0500, Steven Rostedt wrote: >> >> I'm actually wondering if we should just nuke the _interruptible() >> >> version of swait. As it should only be all interruptible or all not >> >> interruptible, that the swait_wake() should just do the wake up >> >> regardless. In which case, swait_wake() is good enough. No need to have >> >> different versions where people may think do something special. >> >> >> >> Peter? >> > >> >Yeah, I think the lastest thing I have sitting here on my disk only has >> >the swake_up() which does TASK_NORMAL, no choice there. >> >> what is the swait status in terms of mainline? This sounds like it >> beeing worked on. >> I could take the series but then I would drop it again if the mainline >> implementation changes… > >Well, 'worked' on might be putting too much on it, its one of the many >many 'spare' time things that never get attention unless people bug me >;-) > >The below is my current patch, I've not actually tried it yet, I (think >I) had one patch doing some conversions but I'm having trouble locating >it. 
> >Mostly-Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> >--- > include/linux/swait.h | 172 ++++++++++++++++++++++++++++++++++++++++++++++++++ > kernel/sched/swait.c | 122 +++++++++++++++++++++++++++++++++++ > 2 files changed, 294 insertions(+) > >--- /dev/null >+++ b/include/linux/swait.h >@@ -0,0 +1,172 @@ >+#ifndef _LINUX_SWAIT_H >+#define _LINUX_SWAIT_H >+ >+#include <linux/list.h> >+#include <linux/stddef.h> >+#include <linux/spinlock.h> >+#include <asm/current.h> >+ >+/* >+ * Simple wait queues >+ * >+ * While these are very similar to the other/complex wait queues (wait.h) the >+ * most important difference is that the simple waitqueue allows for >+ * deterministic behaviour -- IOW it has strictly bounded IRQ and lock hold >+ * times. >+ * >+ * In order to make this so, we had to drop a fair number of features of the >+ * other waitqueue code; notably: >+ * >+ * - mixing INTERRUPTIBLE and UNINTERRUPTIBLE sleeps on the same waitqueue; >+ * all wakeups are TASK_NORMAL in order to avoid O(n) lookups for the right >+ * sleeper state. >+ * >+ * - the exclusive mode; because this requires preserving the list order >+ * and this is hard. >+ * >+ * - custom wake functions; because you cannot give any guarantees about >+ * random code. >+ * >+ * As a side effect of this; the data structures are slimmer. >+ * >+ * One would recommend using this wait queue where possible. >+ */ >+ >+struct task_struct; >+ >+struct swait_queue_head { >+ raw_spinlock_t lock; >+ struct list_head task_list; >+}; >+ >+struct swait_queue { >+ struct task_struct *task; >+ struct list_head task_list; I would prefer something different than task_list here since this is an item. 
Scrolling down you tried to use node once so maybe that would be good here :) >+}; >+ >+#define __SWAITQUEUE_INITIALIZER(name) { \ >+ .task = current, \ >+ .task_list = LIST_HEAD_INIT((name).task_list), \ >+} >+ >+#define DECLARE_SWAITQUEUE(name) \ >+ struct swait_queue name = __SWAITQUEUE_INITIALIZER(name) >+ >+#define __SWAIT_QUEUE_HEAD_INITIALIZER(name) { \ >+ .lock = __RAW_SPIN_LOCK_UNLOCKED(name.lock), \ >+ .task_list = LIST_HEAD_INIT((name).task_list), \ >+} >+ >+#define DECLARE_SWAIT_QUEUE_HEAD(name) \ >+ struct swait_queue_head name = __SWAIT_QUEUE_HEAD_INITIALIZER(name) >+ >+extern void __init_swait_queue_head(struct swait_queue_head *q, const char *name, >+ struct lock_class_key *key); >+ >+#define init_swait_queue_head(q) \ >+ do { \ >+ static struct lock_class_key __key; \ >+ __init_swait_queue_head((q), #q, &__key); \ >+ } while (0) >+ >+#ifdef CONFIG_LOCKDEP >+# define __SWAIT_QUEUE_HEAD_INIT_ONSTACK(name) \ >+ ({ init_swait_queue_head(&name); name; }) >+# define DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(name) \ >+ struct swait_queue_head name = __SWAIT_QUEUE_HEAD_INIT_ONSTACK(name) >+#else >+# define DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(name) \ >+ DECLARE_SWAIT_QUEUE_HEAD(name) >+#endif >+ >+static inline int swait_active(struct swait_queue_head *q) >+{ >+ return !list_empty(&q->task_list); In RT there was a smp_mb() which you dropped and I assume you had reasons for it. I assumed that one can perform list_empty_careful() without a lock if the items were removed with list_del_init(). 
But since nothing in -RT blow up so far I guess this here is legal, too :) >+} >+ >+extern void swake_up(struct swait_queue_head *q); >+extern void swake_up_all(struct swait_queue_head *q); >+extern void swake_up_locked(struct swait_queue_head *q); >+ >+extern void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait); >+extern void prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait, int state); >+extern long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait, int state); >+ >+extern void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait); >+extern void finish_swait(struct swait_queue_head *q, struct swait_queue *wait); >+ >+/* as per ___wait_event() but for swait, therefore "exclusive == 0" */ >+#define ___swait_event(wq, condition, state, ret, cmd) \ >+({ \ >+ struct swait_queue __wait; \ >+ long __ret = ret; \ >+ \ >+ INIT_LIST_HEAD(&__wait.task_list); \ >+ for (;;) { \ >+ long __int = prepare_to_swait_event(&wq, &__wait, state);\ >+ \ >+ if (condition) \ >+ break; \ >+ \ >+ if (___wait_is_interruptible(state) && __int) { \ >+ __ret = __int; \ >+ break; \ >+ } \ >+ \ >+ cmd; \ >+ } \ >+ finish_swait(&wq, &__wait); \ >+ __ret; \ >+}) >+ >+#define __swait_event(wq, condition) \ >+ (void)___swait_event(wq, condition, TASK_UNINTERRUPTIBLE, 0, \ >+ schedule()) >+ >+#define swait_event(wq, condition) \ >+do { \ >+ if (condition) \ >+ break; \ >+ __swait_event(wq, condition); \ >+} while (0) >+ >+#define __swait_event_timeout(wq, condition, timeout) \ >+ ___swait_event(wq, ___wait_cond_timeout(condition), \ >+ TASK_UNINTERRUPTIBLE, timeout, \ >+ __ret = schedule_timeout(__ret)) >+ >+#define swait_event_timeout(wq, condition, timeout) \ >+({ \ >+ long __ret = timeout; \ >+ if (!___wait_cond_timeout(condition)) \ >+ __ret = __swait_event_timeout(wq, condition, timeout); \ >+ __ret; \ >+}) >+ >+#define __swait_event_interruptible(wq, condition) \ >+ ___swait_event(wq, condition, 
TASK_INTERRUPTIBLE, 0, \ >+ schedule()) >+ >+#define swait_event_interruptible(wq, condition) \ >+({ \ >+ int __ret = 0; \ >+ if (!(condition)) \ >+ __ret = __swait_event_interruptible(wq, condition); \ >+ __ret; \ >+}) >+ >+#define __swait_event_interruptible_timeout(wq, condition, timeout) \ >+ ___swait_event(wq, ___wait_cond_timeout(condition), \ >+ TASK_INTERRUPTIBLE, timeout, \ >+ __ret = schedule_timeout(__ret)) >+ >+#define swait_event_interruptible_timeout(wq, condition, timeout) \ >+({ \ >+ long __ret = timeout; \ >+ if (!___wait_cond_timeout(condition)) \ >+ __ret = __swait_event_interruptible_timeout(wq, \ >+ condition, timeout); \ >+ __ret; \ >+}) >+ >+#endif /* _LINUX_SWAIT_H */ >--- /dev/null >+++ b/kernel/sched/swait.c >@@ -0,0 +1,122 @@ >+ >+#include <linux/swait.h> >+ >+void __init_swait_queue_head(struct swait_queue_head *q, const char *name, >+ struct lock_class_key *key) >+{ >+ raw_spin_lock_init(&q->lock); >+ lockdep_set_class_and_name(&q->lock, key, name); >+ INIT_LIST_HEAD(&q->task_list); >+} >+EXPORT_SYMBOL(__init_swait_queue_head); >+ >+/* >+ * The thing about the wake_up_state() return value; I think we can ignore it. >+ * >+ * If for some reason it would return 0, that means the previously waiting >+ * task is already running, so it will observe condition true (or has already). >+ */ >+void swake_up_locked(struct swait_queue_head *q) >+{ >+ struct swait_queue *curr; >+ >+ list_for_each_entry(curr, &q->task_list, task_list) { >+ wake_up_process(curr->task); okay. So since we limit everything to TASK_NORMAL which has to sleep while on the list there is no need to check if we actually woken up someone. 
>+ list_del_init(&curr->task_list); >+ break; >+ } >+} >+EXPORT_SYMBOL(swake_up_locked); >+ >+void swake_up(struct swait_queue_head *q) >+{ >+ unsigned long flags; >+ >+ if (!swait_active(q)) >+ return; >+ >+ raw_spin_lock_irqsave(&q->lock, flags); >+ __swake_up_locked(q); I think this should have been swake_up_locked() instead, since __swake_up_locked() isn't part of this patch. Just a nitpick: later there is __prepare_to_swait() and __finish_swait() which have the __ prefix instead of a _locked suffix. Not sure what is better for a public API, but maybe one way would be good. >+ raw_spin_unlock_irqrestore(&q->lock, flags); >+} >+EXPORT_SYMBOL(swake_up); >+ >+/* >+ * Does not allow usage from IRQ disabled, since we must be able to >+ * release IRQs to guarantee bounded hold time. >+ */ >+void swake_up_all(struct swait_queue_head *q) >+{ >+ struct swait_queue *curr, *next; >+ LIST_HEAD(tmp); WARN_ON(irqs_disabled()) ? >+ if (!swait_active(q)) >+ return; >+ >+ raw_spin_lock_irq(&q->lock); >+ list_splice_init(&q->task_list, &tmp); >+ while (!list_empty(&tmp)) { >+ curr = list_first_entry(&tmp, typeof(curr), task_list); >+ >+ wake_up_state(curr->task, state); >+ list_del_init(&curr->task_list); So because the task may time out and remove itself from the list at any time, you need to hold the lock during the wakeup and the removal from the list. >+ >+ if (list_empty(&tmp)) >+ break; >+ >+ raw_spin_unlock_irq(&q->lock); and you drop the lock after each iteration in case there is an IRQ pending, or the task that has just been woken up has a higher priority than the current task and needs to get on the CPU. Not sure if this case matters: - _this_ task (wake_all) prio 120 - first task in queue prio 10, RR - second task in queue prio 9, RR the *old* behavior would put the second task before the first task on CPU. The *new* behaviour puts the first task on the CPU after dropping the lock. 
The second task (that has a higher priority but nobody knows) has to wait until the first one is done (and anything else that might have been woken up in the meantime with a higher prio than 120). >+ raw_spin_lock_irq(&q->lock); >+ } >+ raw_spin_unlock_irq(&q->lock); >+} >+EXPORT_SYMBOL(swake_up_all); >+ >+void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait) >+{ >+ wait->task = current; >+ if (list_empty(&wait->node)) >+ list_add(&wait->task_list, &q->task_list); >+} >+ >+void prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait, int state) >+{ >+ unsigned long flags; >+ >+ raw_spin_lock_irqsave(&q->lock, flags); >+ __prepare_to_swait(q, wait); >+ set_current_state(state); >+ raw_spin_unlock_irqrestore(&q->lock, flags); >+} >+EXPORT_SYMBOL(prepare_to_swait); >+ >+long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait, int state) >+{ >+ if (signal_pending_state(state, current)) >+ return -ERESTARTSYS; >+ >+ prepare_to_swait(q, wait, state); >+ >+ return 0; >+} >+EXPORT_SYMBOL(prepare_to_swait_event); >+ >+void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait) this one has no users; the __ suggests that it is the locked edition. Maybe it is for the completions… >+{ >+ __set_current_state(TASK_RUNNING); >+ if (!list_empty(&wait->task_list)) >+ list_del_init(&wait->task_list); >+} >+ >+void finish_swait(struct swait_queue_head *q, struct swait_queue *wait) >+{ >+ unsigned long flags; >+ >+ __set_current_state(TASK_RUNNING); >+ >+ if (!list_empty_careful(&wait->task_list)) { >+ raw_spin_lock_irqsave(&q->lock, flags); >+ list_del_init(&wait->task_list); >+ raw_spin_unlock_irqrestore(&q->lock, flags); >+ } >+} >+EXPORT_SYMBOL(finish_swait); Sebastian ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-02-25 21:02 ` Sebastian Andrzej Siewior @ 2015-08-07 10:57 ` Peter Zijlstra 2015-08-07 11:14 ` Peter Zijlstra 0 siblings, 1 reply; 42+ messages in thread From: Peter Zijlstra @ 2015-08-07 10:57 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Steven Rostedt, Paul Mackerras, Marcelo Tosatti, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini On Wed, Feb 25, 2015 at 10:02:50PM +0100, Sebastian Andrzej Siewior wrote: > >+static inline int swait_active(struct swait_queue_head *q) > >+{ > >+ return !list_empty(&q->task_list); > > In RT there was a smp_mb() which you dropped and I assume you had > reasons for it. Yeah, RT didn't have a reason for the smp_mb() -- any barrier without a comment is a bug :-) Also waitqueue_active(), its counterpart, does not have a barrier there either. Nor did I see any reason for that mb to be there. > I assumed that one can perform list_empty_careful() > without a lock if the items were removed with list_del_init(). But since > nothing in -RT blow up so far I guess this here is legal, too :) Nobody will come and arrest us for software bugs -- yet ;-) > >+/* > >+ * The thing about the wake_up_state() return value; I think we can ignore it. > >+ * > >+ * If for some reason it would return 0, that means the previously waiting > >+ * task is already running, so it will observe condition true (or has already). > >+ */ > >+void swake_up_locked(struct swait_queue_head *q) > >+{ > >+ struct swait_queue *curr; > >+ > >+ list_for_each_entry(curr, &q->task_list, task_list) { > >+ wake_up_process(curr->task); > > okay. So since we limit everything to TASK_NORMAL which has to sleep > while on the list there is no need to check if we actually woken up > someone. Partly that, also that I don't see how that return value is meaningful in the first place. 
If it were to return false, the task was/is already running and it will observe whatever condition we just satisfied to allow waking things up. So either way around, we'll get (at least) 1 task running. > >+ list_del_init(&curr->task_list); > >+ break; > >+ } > >+} > >+EXPORT_SYMBOL(swake_up_locked); > >+void swake_up(struct swait_queue_head *q) > >+{ > >+ unsigned long flags; > >+ > >+ if (!swait_active(q)) > >+ return; > >+ > >+ raw_spin_lock_irqsave(&q->lock, flags); > >+ __swake_up_locked(q); > > I thing this should have been swake_up_locked() instead since > __swake_up_locked() isn't part of this patch. > > Just a nitpick: later there is __prepare_to_swait() and __finish_swait() > which have the __ prefix instead a _locked suffix. Not sure what is > better for a better for a public API but maybe one way would be good. Yeah, I suppose that's true ;-) > >+ raw_spin_unlock_irqrestore(&q->lock, flags); > >+} > >+EXPORT_SYMBOL(swake_up); > >+ > >+/* > >+ * Does not allow usage from IRQ disabled, since we must be able to > >+ * release IRQs to guarantee bounded hold time. > >+ */ > >+void swake_up_all(struct swait_queue_head *q) > >+{ > >+ struct swait_queue *curr, *next; > >+ LIST_HEAD(tmp); > > WARN_ON(irqs_disabled()) ? Lockdep should already catch that by virtue of using unconditional _irq spinlock primitives. > >+ if (!swait_active(q)) > >+ return; > >+ > >+ raw_spin_lock_irq(&q->lock); > >+ list_splice_init(&q->task_list, &tmp); > >+ while (!list_empty(&tmp)) { > >+ curr = list_first_entry(&tmp, typeof(curr), task_list); > >+ > >+ wake_up_state(curr->task, state); > >+ list_del_init(&curr->task_list); > > So because the task may timeout and remove itself from the list at > anytime you need to hold the lock during wakeup and the removal from the > list Indeed. 
> >+ > >+ if (list_empty(&tmp)) > >+ break; > >+ > >+ raw_spin_unlock_irq(&q->lock); > > and you drop the lock after each iteration in case there is an IRQ > pending or the task, that has been just woken up, has a higher priority > than the current task and needs to get on the CPU. > Not sure if this case matters: > - _this_ task (wake_all) prio 120 > - first task in queue prio 10, RR > - second task in queue prio 9, RR Why complicate things? Better to not assume anything and just do the simple correct thing. > the *old* behavior would put the second task before the first task on > CPU. The *new* behaviour puts the first task on the CPU after dropping > the lock. The second task (that has a higher priority but nobody knows) > has to wait until the first one is done (and anything else that might > been woken up in the meantime with a higher prio than 120). Irrelevant, we _must_ drop the lock in order to maintain bounded behaviour. > >+ raw_spin_lock_irq(&q->lock); > >+ } > >+ raw_spin_unlock_irq(&q->lock); > >+} > >+EXPORT_SYMBOL(swake_up_all); > >+void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait) > this one has no users the __ suggests that it is locked edition. Maybe > it is for the completions… Yeah, who knows, I certainly do not anymore ;-) ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-08-07 10:57 ` Peter Zijlstra @ 2015-08-07 11:14 ` Peter Zijlstra 0 siblings, 0 replies; 42+ messages in thread From: Peter Zijlstra @ 2015-08-07 11:14 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Steven Rostedt, Paul Mackerras, Marcelo Tosatti, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini On Fri, Aug 07, 2015 at 12:57:38PM +0200, Peter Zijlstra wrote: > > >+void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait) > > this one has no users the __ suggests that it is locked edition. Maybe > > it is for the completions… > > Yeah, who knows, I certainly do not anymore ;-) On that, we cannot convert completions to swait. Because swait wake_all must not happen from IRQ context, and complete_all() typically is used from just that. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-08-07 11:14 ` Peter Zijlstra (?) @ 2015-08-07 16:41 ` Christoph Hellwig 2015-08-07 16:45 ` Peter Zijlstra -1 siblings, 1 reply; 42+ messages in thread From: Christoph Hellwig @ 2015-08-07 16:41 UTC (permalink / raw) To: Peter Zijlstra Cc: Sebastian Andrzej Siewior, Steven Rostedt, Paul Mackerras, Marcelo Tosatti, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini On Fri, Aug 07, 2015 at 01:14:15PM +0200, Peter Zijlstra wrote: > On that, we cannot convert completions to swait. Because swait wake_all > must not happen from IRQ context, and complete_all() typically is used > from just that. If swait queues aren't useable from IRQ context they will be fairly useless. What's the problem with making them irq safe? ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-08-07 16:41 ` Christoph Hellwig @ 2015-08-07 16:45 ` Peter Zijlstra 2015-08-09 6:39 ` Christoph Hellwig 0 siblings, 1 reply; 42+ messages in thread From: Peter Zijlstra @ 2015-08-07 16:45 UTC (permalink / raw) To: Christoph Hellwig Cc: Sebastian Andrzej Siewior, Steven Rostedt, Paul Mackerras, Marcelo Tosatti, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini On Fri, Aug 07, 2015 at 09:41:31AM -0700, Christoph Hellwig wrote: > On Fri, Aug 07, 2015 at 01:14:15PM +0200, Peter Zijlstra wrote: > > On that, we cannot convert completions to swait. Because swait wake_all > > must not happen from IRQ context, and complete_all() typically is used > > from just that. > > If swait queues aren't useable from IRQ context they will be fairly > useless. What's the problem with making them irq safe? It's just the swait_wake_all() that is not. The entire purpose of them was to have something that allows bounded execution (RT and all). Since you can have unbounded numbers of tasks waiting on a waitqueue (well, reality has bounds of course, like total memory available etc.) a wake_all() can end up being many, many wake_process() calls. We've had this be a problem in RT. So the proposed swait_wake_all() requires being called from task context, such that it can drop the lock (and IRQ disable) after every wakeup, and thereby guarantee that higher priority things will not experience undue latencies. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-08-07 16:45 ` Peter Zijlstra @ 2015-08-09 6:39 ` Christoph Hellwig 0 siblings, 0 replies; 42+ messages in thread From: Christoph Hellwig @ 2015-08-09 6:39 UTC (permalink / raw) To: Peter Zijlstra Cc: Christoph Hellwig, Sebastian Andrzej Siewior, Steven Rostedt, Paul Mackerras, Marcelo Tosatti, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini On Fri, Aug 07, 2015 at 06:45:26PM +0200, Peter Zijlstra wrote: > Its just the swait_wake_all() that is not. The entire purpose of them > was to have something that allows bounded execution (RT and all). Still not sure if that might be too big a burden for mainline, but at least it's not as severe… ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-08-07 11:14 ` Peter Zijlstra (?) (?) @ 2015-08-17 20:31 ` Thomas Gleixner -1 siblings, 0 replies; 42+ messages in thread From: Thomas Gleixner @ 2015-08-17 20:31 UTC (permalink / raw) To: Peter Zijlstra Cc: Sebastian Andrzej Siewior, Steven Rostedt, Paul Mackerras, Marcelo Tosatti, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, kvm, Paolo Bonzini [-- Attachment #1: Type: TEXT/PLAIN, Size: 697 bytes --] On Fri, 7 Aug 2015, Peter Zijlstra wrote: > On Fri, Aug 07, 2015 at 12:57:38PM +0200, Peter Zijlstra wrote: > > > > >+void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait) > > > > this one has no users the __ suggests that it is locked edition. Maybe > > > it is for the completions… > > > > Yeah, who knows, I certainly do not anymore ;-) > > On that, we cannot convert completions to swait. Because swait wake_all > must not happen from IRQ context, and complete_all() typically is used > from just that. Well, completions are not the ones which have a gazillion of waiters and we got quite some performance back from using swait in completions on RT. Thanks, tglx ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-02-17 17:44 ` Sebastian Andrzej Siewior 2015-02-18 14:03 ` Peter Zijlstra @ 2015-02-27 0:23 ` Marcelo Tosatti 2015-03-05 1:09 ` Marcelo Tosatti 1 sibling, 1 reply; 42+ messages in thread From: Marcelo Tosatti @ 2015-02-27 0:23 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Peter Zijlstra, Steven Rostedt, Paul Mackerras, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini On Tue, Feb 17, 2015 at 06:44:19PM +0100, Sebastian Andrzej Siewior wrote: > * Peter Zijlstra | 2015-01-21 16:07:16 [+0100]: > > >On Tue, Jan 20, 2015 at 01:16:13PM -0500, Steven Rostedt wrote: > >> I'm actually wondering if we should just nuke the _interruptible() > >> version of swait. As it should only be all interruptible or all not > >> interruptible, that the swait_wake() should just do the wake up > >> regardless. In which case, swait_wake() is good enough. No need to have > >> different versions where people may think do something special. > >> > >> Peter? > > > >Yeah, I think the lastest thing I have sitting here on my disk only has > >the swake_up() which does TASK_NORMAL, no choice there. > > what is the swait status in terms of mainline? This sounds like it > beeing worked on. > I could take the series but then I would drop it again if the mainline > implementation changes… Hi Sebastian, No, you would just adjust it to the upstream kernel interfaces, as the rest of the -rt users of the swait interfaces. Can you please include the series? Thanks ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-02-27 0:23 ` Marcelo Tosatti @ 2015-03-05 1:09 ` Marcelo Tosatti 0 siblings, 0 replies; 42+ messages in thread From: Marcelo Tosatti @ 2015-03-05 1:09 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Peter Zijlstra, Steven Rostedt, Paul Mackerras, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini On Thu, Feb 26, 2015 at 09:23:57PM -0300, Marcelo Tosatti wrote: > On Tue, Feb 17, 2015 at 06:44:19PM +0100, Sebastian Andrzej Siewior wrote: > > * Peter Zijlstra | 2015-01-21 16:07:16 [+0100]: > > > > >On Tue, Jan 20, 2015 at 01:16:13PM -0500, Steven Rostedt wrote: > > >> I'm actually wondering if we should just nuke the _interruptible() > > >> version of swait. As it should only be all interruptible or all not > > >> interruptible, that the swait_wake() should just do the wake up > > >> regardless. In which case, swait_wake() is good enough. No need to have > > >> different versions where people may think do something special. > > >> > > >> Peter? > > > > > >Yeah, I think the lastest thing I have sitting here on my disk only has > > >the swake_up() which does TASK_NORMAL, no choice there. > > > > what is the swait status in terms of mainline? This sounds like it > > beeing worked on. > > I could take the series but then I would drop it again if the mainline > > implementation changes… > > Hi Sebastian, > > No, you would just adjust it to the upstream kernel interfaces, as the rest of > the -rt users of the swait interfaces. > > Can you please include the series? > > Thanks Sebastian? ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-03-05 1:09 ` Marcelo Tosatti (?) @ 2015-03-05 7:42 ` Sebastian Andrzej Siewior -1 siblings, 0 replies; 42+ messages in thread From: Sebastian Andrzej Siewior @ 2015-03-05 7:42 UTC (permalink / raw) To: Marcelo Tosatti Cc: Peter Zijlstra, Steven Rostedt, Paul Mackerras, linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini On 03/05/2015 02:09 AM, Marcelo Tosatti wrote: >> Can you please include the series? >> >> Thanks > > Sebastian? I will pick it up, don't worry. I think I'll do my -RT day tomorrow. Sebastian ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2015-01-14 17:12 ` [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq Marcelo Tosatti 2015-01-14 18:22 ` Rik van Riel 2015-01-16 16:48 ` Steven Rostedt @ 2015-03-06 13:54 ` Sebastian Andrzej Siewior 2 siblings, 0 replies; 42+ messages in thread From: Sebastian Andrzej Siewior @ 2015-03-06 13:54 UTC (permalink / raw) To: Marcelo Tosatti Cc: linux-kernel, linux-rt-users, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini * Marcelo Tosatti | 2015-01-14 15:12:52 [-0200]: Against which tree was this prepared? Could you please rebase it against v3.18.7-rt2? I see "fuzz 2", the mips file is gone, and s390 rejects almost every chunk. And there was that mips chunk Steven noticed. Patch #2 seems to apply, but since it requires #1 I'll wait for the update. Sebastian ^ permalink raw reply [flat|nested] 42+ messages in thread
* [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe 2015-01-14 17:12 [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v3) Marcelo Tosatti 2015-01-14 17:12 ` [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq Marcelo Tosatti @ 2015-01-14 17:12 ` Marcelo Tosatti 2015-01-14 18:23 ` Rik van Riel 2015-01-14 17:35 ` [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v3) Paolo Bonzini 2 siblings, 1 reply; 42+ messages in thread From: Marcelo Tosatti @ 2015-01-14 17:12 UTC (permalink / raw) To: linux-kernel, linux-rt-users Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini, Marcelo Tosatti [-- Attachment #1: kvm-lapic-irqsafe --] [-- Type: text/plain, Size: 2691 bytes --] Since lapic timer handler only wakes up a simple waitqueue, it can be executed from hardirq context. Reduces average cyclictest latency by 3us. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> --- arch/x86/kvm/lapic.c | 42 +++++++++++++++++++++++++++++++++++++++--- 1 file changed, 39 insertions(+), 3 deletions(-) Index: linux-stable-rt/arch/x86/kvm/lapic.c =================================================================== --- linux-stable-rt.orig/arch/x86/kvm/lapic.c 2014-11-25 14:14:38.636810068 -0200 +++ linux-stable-rt/arch/x86/kvm/lapic.c 2015-01-14 14:59:17.840251874 -0200 @@ -1031,8 +1031,38 @@ apic->divide_count); } + +static enum hrtimer_restart apic_timer_fn(struct hrtimer *data); + +static void apic_timer_expired(struct hrtimer *data) +{ + int ret, i = 0; + enum hrtimer_restart r; + struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer); + + r = apic_timer_fn(data); + + if (r == HRTIMER_RESTART) { + do { + ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS); + if (ret == -ETIME) + hrtimer_add_expires_ns(&ktimer->timer, + ktimer->period); + i++; + } while (ret == -ETIME && i < 10); + + if (ret == -ETIME) { + printk_once(KERN_ERR "%s: failed to reprogram timer\n", + __func__); + WARN_ON_ONCE(1); + } + 
} +} + + static void start_apic_timer(struct kvm_lapic *apic) { + int ret; ktime_t now; atomic_set(&apic->lapic_timer.pending, 0); @@ -1062,9 +1092,11 @@ } } - hrtimer_start(&apic->lapic_timer.timer, + ret = hrtimer_start(&apic->lapic_timer.timer, ktime_add_ns(now, apic->lapic_timer.period), HRTIMER_MODE_ABS); + if (ret == -ETIME) + apic_timer_expired(&apic->lapic_timer.timer); apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016" PRIx64 ", " @@ -1094,8 +1126,10 @@ ns = (tscdeadline - guest_tsc) * 1000000ULL; do_div(ns, this_tsc_khz); } - hrtimer_start(&apic->lapic_timer.timer, + ret = hrtimer_start(&apic->lapic_timer.timer, ktime_add_ns(now, ns), HRTIMER_MODE_ABS); + if (ret == -ETIME) + apic_timer_expired(&apic->lapic_timer.timer); local_irq_restore(flags); } @@ -1581,6 +1615,7 @@ hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS); apic->lapic_timer.timer.function = apic_timer_fn; + apic->lapic_timer.timer.irqsafe = 1; /* * APIC is created enabled. This will prevent kvm_lapic_set_base from @@ -1699,7 +1734,8 @@ timer = &vcpu->arch.apic->lapic_timer.timer; if (hrtimer_cancel(timer)) - hrtimer_start_expires(timer, HRTIMER_MODE_ABS); + if (hrtimer_start_expires(timer, HRTIMER_MODE_ABS) == -ETIME) + apic_timer_expired(timer); } /* ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe 2015-01-14 17:12 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti @ 2015-01-14 18:23 ` Rik van Riel 0 siblings, 0 replies; 42+ messages in thread From: Rik van Riel @ 2015-01-14 18:23 UTC (permalink / raw) To: Marcelo Tosatti, linux-kernel, linux-rt-users Cc: Luiz Capitulino, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini On 01/14/2015 12:12 PM, Marcelo Tosatti wrote: > Since lapic timer handler only wakes up a simple waitqueue, > it can be executed from hardirq context. > > Reduces average cyclictest latency by 3us. > > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v3) 2015-01-14 17:12 [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v3) Marcelo Tosatti 2015-01-14 17:12 ` [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq Marcelo Tosatti 2015-01-14 17:12 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti @ 2015-01-14 17:35 ` Paolo Bonzini 2 siblings, 0 replies; 42+ messages in thread From: Paolo Bonzini @ 2015-01-14 17:35 UTC (permalink / raw) To: Marcelo Tosatti, linux-kernel, linux-rt-users Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm On 14/01/2015 18:12, Marcelo Tosatti wrote: > Against v3.14-rt branch of > git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git > > The problem: > > On -RT, an emulated LAPIC timer instance has the following path: > > 1) hard interrupt > 2) ksoftirqd is scheduled > 3) ksoftirqd wakes up vcpu thread > 4) vcpu thread is scheduled > > This extra context switch introduces unnecessary latency in the > LAPIC path for a KVM guest. > > The solution: > > Allow waking up vcpu thread from hardirq context, > thus avoiding the need for ksoftirqd to be scheduled. > > Normal waitqueues make use of spinlocks, which on -RT > are sleepable locks. Therefore, waking up a waitqueue > waiter involves locking a sleeping lock, which > is not allowed from hard interrupt context. > > cyclictest command line: > # cyclictest -m -n -q -p99 -l 1000000 -h60 -D 1m > > This patch reduces the average latency in my tests from 14us to 11us. > > v2: improve changelog (Rik van Riel) > v3: limit (once) guest triggered printk and WARN_ON (Paolo Bonzini) > > Looks good. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> ^ permalink raw reply [flat|nested] 42+ messages in thread
* [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v5) @ 2015-04-08 23:33 Marcelo Tosatti 2015-04-08 23:33 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti 0 siblings, 1 reply; 42+ messages in thread From: Marcelo Tosatti @ 2015-04-08 23:33 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Rik van Riel, Luiz Capitulino, linux-rt-users, kvm Sebastian, rebased against v3.18.7-rt2 as requested. The problem: On -RT, an emulated LAPIC timer instance has the following path: 1) hard interrupt 2) ksoftirqd is scheduled 3) ksoftirqd wakes up vcpu thread 4) vcpu thread is scheduled This extra context switch introduces unnecessary latency in the LAPIC path for a KVM guest. The solution: Allow waking up vcpu thread from hardirq context, thus avoiding the need for ksoftirqd to be scheduled. Normal waitqueues make use of spinlocks, which on -RT are sleepable locks. Therefore, waking up a waitqueue waiter involves locking a sleeping lock, which is not allowed from hard interrupt context. cyclictest command line: # cyclictest -m -n -q -p99 -l 1000000 -h60 -D 1m This patch reduces the average latency in my tests from 14us to 11us. v2: improve changelog (Rik van Riel) v3: limit (once) guest triggered printk and WARN_ON (Paolo Bonzini) v4: fix typo (Steven Rostedt) v5: rebase against v3.18.7-rt2. ^ permalink raw reply [flat|nested] 42+ messages in thread
* [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe 2015-04-08 23:33 [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v5) Marcelo Tosatti @ 2015-04-08 23:33 ` Marcelo Tosatti 0 siblings, 0 replies; 42+ messages in thread From: Marcelo Tosatti @ 2015-04-08 23:33 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Rik van Riel, Luiz Capitulino, linux-rt-users, kvm, Marcelo Tosatti [-- Attachment #1: kvm-use-simplewaitqueue-2-lapic --] [-- Type: text/plain, Size: 2794 bytes --] Since lapic timer handler only wakes up a simple waitqueue, it can be executed from hardirq context. Also handle the case where hrtimer_start_expires fails due to -ETIME, by injecting the interrupt to the guest immediately. Reduces average cyclictest latency by 3us. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> --- arch/x86/kvm/lapic.c | 42 +++++++++++++++++++++++++++++++++++++++--- 1 file changed, 39 insertions(+), 3 deletions(-) Index: rt-linux/arch/x86/kvm/lapic.c =================================================================== --- rt-linux.orig/arch/x86/kvm/lapic.c 2015-04-08 20:20:41.000000000 -0300 +++ rt-linux/arch/x86/kvm/lapic.c 2015-04-08 20:21:16.592739674 -0300 @@ -1034,8 +1034,38 @@ apic->divide_count); } + +static enum hrtimer_restart apic_timer_fn(struct hrtimer *data); + +static void apic_timer_expired(struct hrtimer *data) +{ + int ret, i = 0; + enum hrtimer_restart r; + struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer); + + r = apic_timer_fn(data); + + if (r == HRTIMER_RESTART) { + do { + ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS); + if (ret == -ETIME) + hrtimer_add_expires_ns(&ktimer->timer, + ktimer->period); + i++; + } while (ret == -ETIME && i < 10); + + if (ret == -ETIME) { + printk_once(KERN_ERR "%s: failed to reprogram timer\n", + __func__); + WARN_ON_ONCE(1); + } + } +} + + static void start_apic_timer(struct kvm_lapic *apic) { + int ret; ktime_t now; atomic_set(&apic->lapic_timer.pending, 0); @@ 
-1065,9 +1095,11 @@ } } - hrtimer_start(&apic->lapic_timer.timer, + ret = hrtimer_start(&apic->lapic_timer.timer, ktime_add_ns(now, apic->lapic_timer.period), HRTIMER_MODE_ABS); + if (ret == -ETIME) + apic_timer_expired(&apic->lapic_timer.timer); apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016" PRIx64 ", " @@ -1097,8 +1129,10 @@ ns = (tscdeadline - guest_tsc) * 1000000ULL; do_div(ns, this_tsc_khz); } - hrtimer_start(&apic->lapic_timer.timer, + ret = hrtimer_start(&apic->lapic_timer.timer, ktime_add_ns(now, ns), HRTIMER_MODE_ABS); + if (ret == -ETIME) + apic_timer_expired(&apic->lapic_timer.timer); local_irq_restore(flags); } @@ -1587,6 +1621,7 @@ hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS); apic->lapic_timer.timer.function = apic_timer_fn; + apic->lapic_timer.timer.irqsafe = 1; /* * APIC is created enabled. This will prevent kvm_lapic_set_base from @@ -1707,7 +1742,8 @@ timer = &vcpu->arch.apic->lapic_timer.timer; if (hrtimer_cancel(timer)) - hrtimer_start_expires(timer, HRTIMER_MODE_ABS); + if (hrtimer_start_expires(timer, HRTIMER_MODE_ABS) == -ETIME) + apic_timer_expired(timer); } /* ^ permalink raw reply [flat|nested] 42+ messages in thread
* [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v4)
@ 2015-01-21 20:36 Marcelo Tosatti
  2015-01-21 20:36 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
  0 siblings, 1 reply; 42+ messages in thread
From: Marcelo Tosatti @ 2015-01-21 20:36 UTC (permalink / raw)
To: linux-kernel, linux-rt-users
Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner,
	kvm, Paolo Bonzini

Against v3.14-rt branch of
git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

The problem:

On -RT, an emulated LAPIC timer instance has the following path:

1) hard interrupt
2) ksoftirqd is scheduled
3) ksoftirqd wakes up vcpu thread
4) vcpu thread is scheduled

This extra context switch introduces unnecessary latency in the
LAPIC path for a KVM guest.

The solution:

Allow waking up vcpu thread from hardirq context, thus avoiding the
need for ksoftirqd to be scheduled.

Normal waitqueues make use of spinlocks, which on -RT are sleepable
locks. Therefore, waking up a waitqueue waiter involves locking a
sleeping lock, which is not allowed from hard interrupt context.

cyclictest command line:
# cyclictest -m -n -q -p99 -l 1000000 -h60 -D 1m

This patch reduces the average latency in my tests from 14us to 11us.

v2: improve changelog (Rik van Riel)
v3: limit (once) guest triggered printk and WARN_ON (Paolo Bonzini)
v4: fix typo (Steven Rostedt)

^ permalink raw reply	[flat|nested] 42+ messages in thread
* [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2015-01-21 20:36 [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v4) Marcelo Tosatti
@ 2015-01-21 20:36 ` Marcelo Tosatti
  0 siblings, 0 replies; 42+ messages in thread
From: Marcelo Tosatti @ 2015-01-21 20:36 UTC (permalink / raw)
To: linux-kernel, linux-rt-users
Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner,
	kvm, Paolo Bonzini, Marcelo Tosatti

[-- Attachment #1: kvm-lapic-irqsafe --]
[-- Type: text/plain, Size: 2815 bytes --]

Since the LAPIC timer handler only wakes up a simple waitqueue, it
can be executed from hardirq context.

Also handle the case where hrtimer_start_expires fails due to -ETIME,
by injecting the interrupt into the guest immediately.

Reduces average cyclictest latency by 3us.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
 arch/x86/kvm/lapic.c |   42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

Index: linux-stable-rt/arch/x86/kvm/lapic.c
===================================================================
--- linux-stable-rt.orig/arch/x86/kvm/lapic.c	2014-11-25 14:14:38.636810068 -0200
+++ linux-stable-rt/arch/x86/kvm/lapic.c	2015-01-14 14:59:17.840251874 -0200
@@ -1031,8 +1031,38 @@
 		   apic->divide_count);
 }
 
+
+static enum hrtimer_restart apic_timer_fn(struct hrtimer *data);
+
+static void apic_timer_expired(struct hrtimer *data)
+{
+	int ret, i = 0;
+	enum hrtimer_restart r;
+	struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
+
+	r = apic_timer_fn(data);
+
+	if (r == HRTIMER_RESTART) {
+		do {
+			ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS);
+			if (ret == -ETIME)
+				hrtimer_add_expires_ns(&ktimer->timer,
+						       ktimer->period);
+			i++;
+		} while (ret == -ETIME && i < 10);
+
+		if (ret == -ETIME) {
+			printk_once(KERN_ERR "%s: failed to reprogram timer\n",
+				    __func__);
+			WARN_ON_ONCE(1);
+		}
+	}
+}
+
+
 static void start_apic_timer(struct kvm_lapic *apic)
 {
+	int ret;
 	ktime_t now;
 	atomic_set(&apic->lapic_timer.pending, 0);
 
@@ -1062,9 +1092,11 @@
 		}
 	}
 
-	hrtimer_start(&apic->lapic_timer.timer,
+	ret = hrtimer_start(&apic->lapic_timer.timer,
 		      ktime_add_ns(now, apic->lapic_timer.period),
 		      HRTIMER_MODE_ABS);
+	if (ret == -ETIME)
+		apic_timer_expired(&apic->lapic_timer.timer);
 
 	apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
 		   PRIx64 ", "
@@ -1094,8 +1126,10 @@
 		ns = (tscdeadline - guest_tsc) * 1000000ULL;
 		do_div(ns, this_tsc_khz);
 	}
-	hrtimer_start(&apic->lapic_timer.timer,
+	ret = hrtimer_start(&apic->lapic_timer.timer,
 		ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
+	if (ret == -ETIME)
+		apic_timer_expired(&apic->lapic_timer.timer);
 
 	local_irq_restore(flags);
 }
@@ -1581,6 +1615,7 @@
 	hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
 		     HRTIMER_MODE_ABS);
 	apic->lapic_timer.timer.function = apic_timer_fn;
+	apic->lapic_timer.timer.irqsafe = 1;
 
 	/*
	 * APIC is created enabled. This will prevent kvm_lapic_set_base from
@@ -1699,7 +1734,8 @@
 
 	timer = &vcpu->arch.apic->lapic_timer.timer;
 	if (hrtimer_cancel(timer))
-		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
+		if (hrtimer_start_expires(timer, HRTIMER_MODE_ABS) == -ETIME)
+			apic_timer_expired(timer);
 }
 
 /*

^ permalink raw reply	[flat|nested] 42+ messages in thread
* [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v2)
@ 2014-11-25 17:21 Marcelo Tosatti
  2014-11-25 17:21 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
  0 siblings, 1 reply; 42+ messages in thread
From: Marcelo Tosatti @ 2014-11-25 17:21 UTC (permalink / raw)
To: linux-kernel
Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm

The problem:

On -RT, an emulated LAPIC timer instance has the following path:

1) hard interrupt
2) ksoftirqd is scheduled
3) ksoftirqd wakes up vcpu thread
4) vcpu thread is scheduled

This extra context switch introduces unnecessary latency in the
LAPIC path for a KVM guest.

The solution:

Allow waking up vcpu thread from hardirq context, thus avoiding the
need for ksoftirqd to be scheduled.

Normal waitqueues make use of spinlocks, which on -RT are sleepable
locks. Therefore, waking up a waitqueue waiter involves locking a
sleeping lock, which is not allowed from hard interrupt context.

cyclictest command line:
# cyclictest -m -n -q -p99 -l 1000000 -h60 -D 1m

This patch reduces the average latency in my tests from 14us to 11us.

v2: improve changelog (Rik van Riel)

^ permalink raw reply	[flat|nested] 42+ messages in thread
* [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 17:21 [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v2) Marcelo Tosatti
@ 2014-11-25 17:21 ` Marcelo Tosatti
  2014-11-25 17:38   ` Paolo Bonzini
  2014-11-25 19:11   ` Rik van Riel
  0 siblings, 2 replies; 42+ messages in thread
From: Marcelo Tosatti @ 2014-11-25 17:21 UTC (permalink / raw)
To: linux-kernel
Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner,
	kvm, Marcelo Tosatti

[-- Attachment #1: kvm-lapic-irqsafe --]
[-- Type: text/plain, Size: 2681 bytes --]

Since lapic timer handler only wakes up a simple waitqueue,
it can be executed from hardirq context.

Reduces average cyclictest latency by 3us.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
 arch/x86/kvm/lapic.c |   42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

Index: linux-stable-rt/arch/x86/kvm/lapic.c
===================================================================
--- linux-stable-rt.orig/arch/x86/kvm/lapic.c	2014-11-25 14:14:38.636810068 -0200
+++ linux-stable-rt/arch/x86/kvm/lapic.c	2014-11-25 14:17:28.850567059 -0200
@@ -1031,8 +1031,38 @@
 		   apic->divide_count);
 }
 
+
+static enum hrtimer_restart apic_timer_fn(struct hrtimer *data);
+
+static void apic_timer_expired(struct hrtimer *data)
+{
+	int ret, i = 0;
+	enum hrtimer_restart r;
+	struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
+
+	r = apic_timer_fn(data);
+
+	if (r == HRTIMER_RESTART) {
+		do {
+			ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS);
+			if (ret == -ETIME)
+				hrtimer_add_expires_ns(&ktimer->timer,
+						       ktimer->period);
+			i++;
+		} while (ret == -ETIME && i < 10);
+
+		if (ret == -ETIME) {
+			printk(KERN_ERR "%s: failed to reprogram timer\n",
+			       __func__);
+			WARN_ON(1);
+		}
+	}
+}
+
+
 static void start_apic_timer(struct kvm_lapic *apic)
 {
+	int ret;
 	ktime_t now;
 	atomic_set(&apic->lapic_timer.pending, 0);
 
@@ -1062,9 +1092,11 @@
 		}
 	}
 
-	hrtimer_start(&apic->lapic_timer.timer,
+	ret = hrtimer_start(&apic->lapic_timer.timer,
 		      ktime_add_ns(now, apic->lapic_timer.period),
 		      HRTIMER_MODE_ABS);
+	if (ret == -ETIME)
+		apic_timer_expired(&apic->lapic_timer.timer);
 
 	apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
 		   PRIx64 ", "
@@ -1094,8 +1126,10 @@
 		ns = (tscdeadline - guest_tsc) * 1000000ULL;
 		do_div(ns, this_tsc_khz);
 	}
-	hrtimer_start(&apic->lapic_timer.timer,
+	ret = hrtimer_start(&apic->lapic_timer.timer,
 		ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
+	if (ret == -ETIME)
+		apic_timer_expired(&apic->lapic_timer.timer);
 
 	local_irq_restore(flags);
 }
@@ -1581,6 +1615,7 @@
 	hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
 		     HRTIMER_MODE_ABS);
 	apic->lapic_timer.timer.function = apic_timer_fn;
+	apic->lapic_timer.timer.irqsafe = 1;
 
 	/*
	 * APIC is created enabled. This will prevent kvm_lapic_set_base from
@@ -1699,7 +1734,8 @@
 
 	timer = &vcpu->arch.apic->lapic_timer.timer;
 	if (hrtimer_cancel(timer))
-		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
+		if (hrtimer_start_expires(timer, HRTIMER_MODE_ABS) == -ETIME)
+			apic_timer_expired(timer);
 }
 
 /*

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 17:21 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
@ 2014-11-25 17:38   ` Paolo Bonzini
  2014-11-25 19:11   ` Rik van Riel
  1 sibling, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2014-11-25 17:38 UTC (permalink / raw)
To: Marcelo Tosatti, linux-kernel
Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm

On 25/11/2014 18:21, Marcelo Tosatti wrote:
> +
> +	if (r == HRTIMER_RESTART) {
> +		do {
> +			ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS);
> +			if (ret == -ETIME)
> +				hrtimer_add_expires_ns(&ktimer->timer,
> +						       ktimer->period);

Is it possible to just compute the time where the next interrupt
happens?  I suspect the printk and WARN_ON below can be easily
triggered by a guest.

Paolo

> +			i++;
> +		} while (ret == -ETIME && i < 10);
> +
> +		if (ret == -ETIME) {
> +			printk(KERN_ERR "%s: failed to reprogram timer\n",
> +			       __func__);
> +			WARN_ON(1);
> +		}
> +	}

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 17:38 ` Paolo Bonzini
  (?)
@ 2014-11-25 19:01 ` Jan Kiszka
  -1 siblings, 0 replies; 42+ messages in thread
From: Jan Kiszka @ 2014-11-25 19:01 UTC (permalink / raw)
To: Paolo Bonzini, linux-kernel; +Cc: kvm

On 2014-11-25 18:38, Paolo Bonzini wrote:
>
>
> On 25/11/2014 18:21, Marcelo Tosatti wrote:
>> +
>> +	if (r == HRTIMER_RESTART) {
>> +		do {
>> +			ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS);
>> +			if (ret == -ETIME)
>> +				hrtimer_add_expires_ns(&ktimer->timer,
>> +						       ktimer->period);
>
> Is it possible to just compute the time where the next interrupt
> happens?  I suspect the printk and WARN_ON below can be easily
> triggered by a guest.

We have a lower bound for the period that a guest can program. Unless
that value is set too low, this should practically not happen if we
avoid disturbances while handling the event and reprogramming the next
one (irqs off?).

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 17:38 ` Paolo Bonzini
  (?)
  (?)
@ 2014-12-04 13:43 ` Marcelo Tosatti
  -1 siblings, 0 replies; 42+ messages in thread
From: Marcelo Tosatti @ 2014-12-04 13:43 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel, Luiz Capitulino, Rik van Riel, Steven Rostedt,
	Thomas Gleixner, kvm

On Tue, Nov 25, 2014 at 06:38:04PM +0100, Paolo Bonzini wrote:
>
>
> On 25/11/2014 18:21, Marcelo Tosatti wrote:
> > +
> > +	if (r == HRTIMER_RESTART) {
> > +		do {
> > +			ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS);
> > +			if (ret == -ETIME)
> > +				hrtimer_add_expires_ns(&ktimer->timer,
> > +						       ktimer->period);
>
> Is it possible to just compute the time where the next interrupt
> happens?

Yes but there is no guarantee hrtimer_start_expires will not fail.

> I suspect the printk and WARN_ON below can be easily triggered
> by a guest.

I'll remove those, thanks.

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 17:21 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
  2014-11-25 17:38   ` Paolo Bonzini
@ 2014-11-25 19:11   ` Rik van Riel
  2014-11-25 20:24     ` Thomas Gleixner
  1 sibling, 1 reply; 42+ messages in thread
From: Rik van Riel @ 2014-11-25 19:11 UTC (permalink / raw)
To: Marcelo Tosatti, linux-kernel
Cc: Luiz Capitulino, Steven Rostedt, Thomas Gleixner, kvm

On 11/25/2014 12:21 PM, Marcelo Tosatti wrote:
> Since lapic timer handler only wakes up a simple waitqueue,
> it can be executed from hardirq context.
>
> Reduces average cyclictest latency by 3us.

Can this patch be merged in the KVM tree, and go
upstream via Paolo?

> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Acked-by: Rik van Riel <riel@redhat.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 19:11 ` Rik van Riel
@ 2014-11-25 20:24   ` Thomas Gleixner
  2014-12-04 13:53     ` Paolo Bonzini
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Gleixner @ 2014-11-25 20:24 UTC (permalink / raw)
To: Rik van Riel
Cc: Marcelo Tosatti, linux-kernel, Luiz Capitulino, Steven Rostedt, kvm

On Tue, 25 Nov 2014, Rik van Riel wrote:
> On 11/25/2014 12:21 PM, Marcelo Tosatti wrote:
> > Since lapic timer handler only wakes up a simple waitqueue,
> > it can be executed from hardirq context.
> >
> > Reduces average cyclictest latency by 3us.
>
> Can this patch be merged in the KVM tree, and go
> upstream via Paolo?

Not really as it has RT dependencies ....

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 20:24 ` Thomas Gleixner
@ 2014-12-04 13:53   ` Paolo Bonzini
  0 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2014-12-04 13:53 UTC (permalink / raw)
To: Thomas Gleixner, Rik van Riel
Cc: Marcelo Tosatti, linux-kernel, Luiz Capitulino, Steven Rostedt, kvm

On 25/11/2014 21:24, Thomas Gleixner wrote:
> On Tue, 25 Nov 2014, Rik van Riel wrote:
>> On 11/25/2014 12:21 PM, Marcelo Tosatti wrote:
>>> Since lapic timer handler only wakes up a simple waitqueue,
>>> it can be executed from hardirq context.
>>>
>>> Reduces average cyclictest latency by 3us.
>>
>> Can this patch be merged in the KVM tree, and go
>> upstream via Paolo?
>
> Not really as it has RT dependencies ....

Can hrtimer_start_expires even return ETIME on a non-RT kernel?  If yes,
I can take patch 2.  If not, it's better to keep both patches in the RT
tree.

Paolo

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-12-04 13:53 ` Paolo Bonzini
  (?)
@ 2014-12-05 16:17 ` Marcelo Tosatti
  -1 siblings, 0 replies; 42+ messages in thread
From: Marcelo Tosatti @ 2014-12-05 16:17 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Thomas Gleixner, Rik van Riel, linux-kernel, Luiz Capitulino,
	Steven Rostedt, kvm

On Thu, Dec 04, 2014 at 02:53:26PM +0100, Paolo Bonzini wrote:
>
>
> On 25/11/2014 21:24, Thomas Gleixner wrote:
> > On Tue, 25 Nov 2014, Rik van Riel wrote:
> >> On 11/25/2014 12:21 PM, Marcelo Tosatti wrote:
> >>> Since lapic timer handler only wakes up a simple waitqueue,
> >>> it can be executed from hardirq context.
> >>>
> >>> Reduces average cyclictest latency by 3us.
> >>
> >> Can this patch be merged in the KVM tree, and go
> >> upstream via Paolo?
> >
> > Not really as it has RT dependencies ....
>
> Can hrtimer_start_expires even return ETIME on a non-RT kernel?  If yes,
> I can take patch 2.  If not, it's better to keep both patches in the RT
> tree.
>
> Paolo

It can't. We're still evaluating the necessity of this patch, will
resubmit accordingly.

^ permalink raw reply	[flat|nested] 42+ messages in thread
* [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue
@ 2014-11-25 16:45 Marcelo Tosatti
  2014-11-25 16:45 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
  0 siblings, 1 reply; 42+ messages in thread
From: Marcelo Tosatti @ 2014-11-25 16:45 UTC (permalink / raw)
To: linux-kernel
Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner

Which allows waking up vcpu from hardirq context.

^ permalink raw reply	[flat|nested] 42+ messages in thread
* [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 16:45 [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue Marcelo Tosatti
@ 2014-11-25 16:45 ` Marcelo Tosatti
  2014-11-25 17:10   ` Paolo Bonzini
  0 siblings, 1 reply; 42+ messages in thread
From: Marcelo Tosatti @ 2014-11-25 16:45 UTC (permalink / raw)
To: linux-kernel
Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner,
	Marcelo Tosatti

[-- Attachment #1: kvm-lapic-irqsafe --]
[-- Type: text/plain, Size: 2681 bytes --]

Since lapic timer handler only wakes up a simple waitqueue,
it can be executed from hardirq context.

Reduces average cyclictest latency by 3us.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
 arch/x86/kvm/lapic.c |   42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

Index: linux-stable-rt/arch/x86/kvm/lapic.c
===================================================================
--- linux-stable-rt.orig/arch/x86/kvm/lapic.c	2014-11-25 14:14:38.636810068 -0200
+++ linux-stable-rt/arch/x86/kvm/lapic.c	2014-11-25 14:17:28.850567059 -0200
@@ -1031,8 +1031,38 @@
 		   apic->divide_count);
 }
 
+
+static enum hrtimer_restart apic_timer_fn(struct hrtimer *data);
+
+static void apic_timer_expired(struct hrtimer *data)
+{
+	int ret, i = 0;
+	enum hrtimer_restart r;
+	struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
+
+	r = apic_timer_fn(data);
+
+	if (r == HRTIMER_RESTART) {
+		do {
+			ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS);
+			if (ret == -ETIME)
+				hrtimer_add_expires_ns(&ktimer->timer,
+						       ktimer->period);
+			i++;
+		} while (ret == -ETIME && i < 10);
+
+		if (ret == -ETIME) {
+			printk(KERN_ERR "%s: failed to reprogram timer\n",
+			       __func__);
+			WARN_ON(1);
+		}
+	}
+}
+
+
 static void start_apic_timer(struct kvm_lapic *apic)
 {
+	int ret;
 	ktime_t now;
 	atomic_set(&apic->lapic_timer.pending, 0);
 
@@ -1062,9 +1092,11 @@
 		}
 	}
 
-	hrtimer_start(&apic->lapic_timer.timer,
+	ret = hrtimer_start(&apic->lapic_timer.timer,
 		      ktime_add_ns(now, apic->lapic_timer.period),
 		      HRTIMER_MODE_ABS);
+	if (ret == -ETIME)
+		apic_timer_expired(&apic->lapic_timer.timer);
 
 	apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
 		   PRIx64 ", "
@@ -1094,8 +1126,10 @@
 		ns = (tscdeadline - guest_tsc) * 1000000ULL;
 		do_div(ns, this_tsc_khz);
 	}
-	hrtimer_start(&apic->lapic_timer.timer,
+	ret = hrtimer_start(&apic->lapic_timer.timer,
 		ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
+	if (ret == -ETIME)
+		apic_timer_expired(&apic->lapic_timer.timer);
 
 	local_irq_restore(flags);
 }
@@ -1581,6 +1615,7 @@
 	hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
 		     HRTIMER_MODE_ABS);
 	apic->lapic_timer.timer.function = apic_timer_fn;
+	apic->lapic_timer.timer.irqsafe = 1;
 
 	/*
	 * APIC is created enabled. This will prevent kvm_lapic_set_base from
@@ -1699,7 +1734,8 @@
 
 	timer = &vcpu->arch.apic->lapic_timer.timer;
 	if (hrtimer_cancel(timer))
-		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
+		if (hrtimer_start_expires(timer, HRTIMER_MODE_ABS) == -ETIME)
+			apic_timer_expired(timer);
 }
 
 /*

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 16:45 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
@ 2014-11-25 17:10   ` Paolo Bonzini
  0 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2014-11-25 17:10 UTC (permalink / raw)
To: Marcelo Tosatti, linux-kernel
Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner

On 25/11/2014 17:45, Marcelo Tosatti wrote:
> Since lapic timer handler only wakes up a simple waitqueue,
> it can be executed from hardirq context.
>
> Reduces average cyclictest latency by 3us.
>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Please include kvm@vger.kernel.org too.

Paolo

> ---
>  arch/x86/kvm/lapic.c |   42 +++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 39 insertions(+), 3 deletions(-)
>
> Index: linux-stable-rt/arch/x86/kvm/lapic.c
> ===================================================================
> --- linux-stable-rt.orig/arch/x86/kvm/lapic.c	2014-11-25 14:14:38.636810068 -0200
> +++ linux-stable-rt/arch/x86/kvm/lapic.c	2014-11-25 14:17:28.850567059 -0200
> @@ -1031,8 +1031,38 @@
>  		   apic->divide_count);
>  }
>
> +
> +static enum hrtimer_restart apic_timer_fn(struct hrtimer *data);
> +
> +static void apic_timer_expired(struct hrtimer *data)
> +{
> +	int ret, i = 0;
> +	enum hrtimer_restart r;
> +	struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
> +
> +	r = apic_timer_fn(data);
> +
> +	if (r == HRTIMER_RESTART) {
> +		do {
> +			ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS);
> +			if (ret == -ETIME)
> +				hrtimer_add_expires_ns(&ktimer->timer,
> +						       ktimer->period);
> +			i++;
> +		} while (ret == -ETIME && i < 10);
> +
> +		if (ret == -ETIME) {
> +			printk(KERN_ERR "%s: failed to reprogram timer\n",
> +			       __func__);
> +			WARN_ON(1);
> +		}
> +	}
> +}
> +
> +
>  static void start_apic_timer(struct kvm_lapic *apic)
>  {
> +	int ret;
>  	ktime_t now;
>  	atomic_set(&apic->lapic_timer.pending, 0);
>
> @@ -1062,9 +1092,11 @@
>  		}
>  	}
>
> -	hrtimer_start(&apic->lapic_timer.timer,
> +	ret = hrtimer_start(&apic->lapic_timer.timer,
>  		      ktime_add_ns(now, apic->lapic_timer.period),
>  		      HRTIMER_MODE_ABS);
> +	if (ret == -ETIME)
> +		apic_timer_expired(&apic->lapic_timer.timer);
>
>  	apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
>  		   PRIx64 ", "
> @@ -1094,8 +1126,10 @@
>  		ns = (tscdeadline - guest_tsc) * 1000000ULL;
>  		do_div(ns, this_tsc_khz);
>  	}
> -	hrtimer_start(&apic->lapic_timer.timer,
> +	ret = hrtimer_start(&apic->lapic_timer.timer,
>  		ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
> +	if (ret == -ETIME)
> +		apic_timer_expired(&apic->lapic_timer.timer);
>
>  	local_irq_restore(flags);
>  }
> @@ -1581,6 +1615,7 @@
>  	hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
>  		     HRTIMER_MODE_ABS);
>  	apic->lapic_timer.timer.function = apic_timer_fn;
> +	apic->lapic_timer.timer.irqsafe = 1;
>
>  	/*
>  	 * APIC is created enabled. This will prevent kvm_lapic_set_base from
> @@ -1699,7 +1734,8 @@
>
>  	timer = &vcpu->arch.apic->lapic_timer.timer;
>  	if (hrtimer_cancel(timer))
> -		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
> +		if (hrtimer_start_expires(timer, HRTIMER_MODE_ABS) == -ETIME)
> +			apic_timer_expired(timer);
>  }
>
>  /*

^ permalink raw reply	[flat|nested] 42+ messages in thread