* [PATCH v3 0/3]: Fixes to IRQ routing
@ 2010-06-16 21:11 Chris Lalancette
  2010-06-16 21:11 ` [PATCH v3 1/3] Introduce a workqueue to deliver PIT timer interrupts Chris Lalancette
                   ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Chris Lalancette @ 2010-06-16 21:11 UTC (permalink / raw)
  To: kvm

As we've discussed previously, here is a series of patches to
fix some of the IRQ routing issues we have in KVM.  With this series
in place I was able to successfully kdump RHEL-5 64-bit and RHEL-6
32- and 64-bit guests on CPUs other than the BSP.  RHEL-5 32-bit kdump still
does not work; it gets stuck on "Checking 'hlt' instruction".  However,
it does that both before and after this series, so there is something
else going on there that I still have to debug.

I also need to change the "kvm_migrate_pit_timer" function to migrate the
timer over to the last CPU that handled the timer interrupt, on the
theory that that particular CPU is likely to handle the timer interrupt again
in the near future.  However, this is an optimization and shouldn't delay the
inclusion of the rest of the series for correctness.

Changes since RFC:
     - Changed ps->inject_lock from raw_spinlock_t to spinlock_t
     - Fixed up some formatting issues
     - Changed to have one PIT workqueue per-guest
     - Remember to cancel_work_sync when destroying the PIT

Changes since v1:
     - Call cancel_work_sync everywhere we call hrtimer_cancel
     - Bring back the reinjection logic
     - Fix up formatting issues from checkpatch

Changes since v2:
     - Fix up the reinjection logic thanks to review from Gleb and Marcelo
       Tested with -no-kvm-pit-reinjection on a RHEL-3 guest



* [PATCH v3 1/3] Introduce a workqueue to deliver PIT timer interrupts.
  2010-06-16 21:11 [PATCH v3 0/3]: Fixes to IRQ routing Chris Lalancette
@ 2010-06-16 21:11 ` Chris Lalancette
  2012-04-16 16:33   ` Jan Kiszka
  2010-06-16 21:11 ` [PATCH v3 2/3] Allow any LAPIC to accept PIC interrupts Chris Lalancette
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 23+ messages in thread
From: Chris Lalancette @ 2010-06-16 21:11 UTC (permalink / raw)
  To: kvm; +Cc: Chris Lalancette

We really want to call "kvm_set_irq" during the hrtimer callback,
but that is risky because it runs in interrupt context.
Instead, offload the work to a workqueue, which is a bit safer
and should provide most of the same functionality.

Signed-off-by: Chris Lalancette <clalance@redhat.com>
---
 arch/x86/kvm/i8254.c |  141 ++++++++++++++++++++++++++++++--------------------
 arch/x86/kvm/i8254.h |    4 +-
 arch/x86/kvm/irq.c   |    1 -
 3 files changed, 88 insertions(+), 58 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 188d827..467cc47 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -34,6 +34,7 @@
 
 #include <linux/kvm_host.h>
 #include <linux/slab.h>
+#include <linux/workqueue.h>
 
 #include "irq.h"
 #include "i8254.h"
@@ -244,11 +245,22 @@ static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier *kian)
 {
 	struct kvm_kpit_state *ps = container_of(kian, struct kvm_kpit_state,
 						 irq_ack_notifier);
-	raw_spin_lock(&ps->inject_lock);
-	if (atomic_dec_return(&ps->pit_timer.pending) < 0)
+	int value;
+
+	spin_lock(&ps->inject_lock);
+	value = atomic_dec_return(&ps->pit_timer.pending);
+	if (value < 0)
+		/* spurious acks can be generated if, for example, the
+		 * PIC is being reset.  Handle it gracefully here
+		 */
 		atomic_inc(&ps->pit_timer.pending);
+	else if (value > 0)
+		/* in this case, we had multiple outstanding pit interrupts
+		 * that we needed to inject.  Reinject
+		 */
+		queue_work(ps->pit->wq, &ps->pit->expired);
 	ps->irq_ack = 1;
-	raw_spin_unlock(&ps->inject_lock);
+	spin_unlock(&ps->inject_lock);
 }
 
 void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu)
@@ -264,10 +276,10 @@ void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu)
 		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
 }
 
-static void destroy_pit_timer(struct kvm_timer *pt)
+static void destroy_pit_timer(struct kvm_pit *pit)
 {
-	pr_debug("execute del timer!\n");
-	hrtimer_cancel(&pt->timer);
+	hrtimer_cancel(&pit->pit_state.pit_timer.timer);
+	cancel_work_sync(&pit->expired);
 }
 
 static bool kpit_is_periodic(struct kvm_timer *ktimer)
@@ -281,6 +293,60 @@ static struct kvm_timer_ops kpit_ops = {
 	.is_periodic = kpit_is_periodic,
 };
 
+static void pit_do_work(struct work_struct *work)
+{
+	struct kvm_pit *pit = container_of(work, struct kvm_pit, expired);
+	struct kvm *kvm = pit->kvm;
+	struct kvm_vcpu *vcpu;
+	int i;
+	struct kvm_kpit_state *ps = &pit->pit_state;
+	int inject = 0;
+
+	/* Try to inject pending interrupts when
+	 * last one has been acked.
+	 */
+	spin_lock(&ps->inject_lock);
+	if (ps->irq_ack) {
+		ps->irq_ack = 0;
+		inject = 1;
+	}
+	spin_unlock(&ps->inject_lock);
+	if (inject) {
+		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
+		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
+
+		/*
+		 * Provides NMI watchdog support via Virtual Wire mode.
+		 * The route is: PIT -> PIC -> LVT0 in NMI mode.
+		 *
+		 * Note: Our Virtual Wire implementation is simplified, only
+		 * propagating PIT interrupts to all VCPUs when they have set
+		 * LVT0 to NMI delivery. Other PIC interrupts are just sent to
+		 * VCPU0, and only if its LVT0 is in EXTINT mode.
+		 */
+		if (kvm->arch.vapics_in_nmi_mode > 0)
+			kvm_for_each_vcpu(i, vcpu, kvm)
+				kvm_apic_nmi_wd_deliver(vcpu);
+	}
+}
+
+static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
+{
+	struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
+	struct kvm_pit *pt = ktimer->kvm->arch.vpit;
+
+	if (ktimer->reinject || !atomic_read(&ktimer->pending)) {
+		atomic_inc(&ktimer->pending);
+		queue_work(pt->wq, &pt->expired);
+	}
+
+	if (ktimer->t_ops->is_periodic(ktimer)) {
+		hrtimer_add_expires_ns(&ktimer->timer, ktimer->period);
+		return HRTIMER_RESTART;
+	} else
+		return HRTIMER_NORESTART;
+}
+
 static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period)
 {
 	struct kvm_timer *pt = &ps->pit_timer;
@@ -292,13 +358,13 @@ static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period)
 
 	/* TODO The new value only affected after the retriggered */
 	hrtimer_cancel(&pt->timer);
+	cancel_work_sync(&ps->pit->expired);
 	pt->period = interval;
 	ps->is_periodic = is_period;
 
-	pt->timer.function = kvm_timer_fn;
+	pt->timer.function = pit_timer_fn;
 	pt->t_ops = &kpit_ops;
 	pt->kvm = ps->pit->kvm;
-	pt->vcpu = pt->kvm->bsp_vcpu;
 
 	atomic_set(&pt->pending, 0);
 	ps->irq_ack = 1;
@@ -347,7 +413,7 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
 		}
 		break;
 	default:
-		destroy_pit_timer(&ps->pit_timer);
+		destroy_pit_timer(kvm->arch.vpit);
 	}
 }
 
@@ -626,7 +692,14 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 
 	mutex_init(&pit->pit_state.lock);
 	mutex_lock(&pit->pit_state.lock);
-	raw_spin_lock_init(&pit->pit_state.inject_lock);
+	spin_lock_init(&pit->pit_state.inject_lock);
+
+	pit->wq = create_singlethread_workqueue("kvm-pit-wq");
+	if (!pit->wq) {
+		kfree(pit);
+		return NULL;
+	}
+	INIT_WORK(&pit->expired, pit_do_work);
 
 	kvm->arch.vpit = pit;
 	pit->kvm = kvm;
@@ -685,54 +758,10 @@ void kvm_free_pit(struct kvm *kvm)
 		mutex_lock(&kvm->arch.vpit->pit_state.lock);
 		timer = &kvm->arch.vpit->pit_state.pit_timer.timer;
 		hrtimer_cancel(timer);
+		cancel_work_sync(&kvm->arch.vpit->expired);
 		kvm_free_irq_source_id(kvm, kvm->arch.vpit->irq_source_id);
 		mutex_unlock(&kvm->arch.vpit->pit_state.lock);
+		destroy_workqueue(kvm->arch.vpit->wq);
 		kfree(kvm->arch.vpit);
 	}
 }
-
-static void __inject_pit_timer_intr(struct kvm *kvm)
-{
-	struct kvm_vcpu *vcpu;
-	int i;
-
-	kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
-	kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
-
-	/*
-	 * Provides NMI watchdog support via Virtual Wire mode.
-	 * The route is: PIT -> PIC -> LVT0 in NMI mode.
-	 *
-	 * Note: Our Virtual Wire implementation is simplified, only
-	 * propagating PIT interrupts to all VCPUs when they have set
-	 * LVT0 to NMI delivery. Other PIC interrupts are just sent to
-	 * VCPU0, and only if its LVT0 is in EXTINT mode.
-	 */
-	if (kvm->arch.vapics_in_nmi_mode > 0)
-		kvm_for_each_vcpu(i, vcpu, kvm)
-			kvm_apic_nmi_wd_deliver(vcpu);
-}
-
-void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu)
-{
-	struct kvm_pit *pit = vcpu->kvm->arch.vpit;
-	struct kvm *kvm = vcpu->kvm;
-	struct kvm_kpit_state *ps;
-
-	if (pit) {
-		int inject = 0;
-		ps = &pit->pit_state;
-
-		/* Try to inject pending interrupts when
-		 * last one has been acked.
-		 */
-		raw_spin_lock(&ps->inject_lock);
-		if (atomic_read(&ps->pit_timer.pending) && ps->irq_ack) {
-			ps->irq_ack = 0;
-			inject = 1;
-		}
-		raw_spin_unlock(&ps->inject_lock);
-		if (inject)
-			__inject_pit_timer_intr(kvm);
-	}
-}
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index 900d6b0..46d08ca 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -27,7 +27,7 @@ struct kvm_kpit_state {
 	u32    speaker_data_on;
 	struct mutex lock;
 	struct kvm_pit *pit;
-	raw_spinlock_t inject_lock;
+	spinlock_t inject_lock;
 	unsigned long irq_ack;
 	struct kvm_irq_ack_notifier irq_ack_notifier;
 };
@@ -40,6 +40,8 @@ struct kvm_pit {
 	struct kvm_kpit_state pit_state;
 	int irq_source_id;
 	struct kvm_irq_mask_notifier mask_notifier;
+	struct workqueue_struct *wq;
+	struct work_struct expired;
 };
 
 #define KVM_PIT_BASE_ADDRESS	    0x40
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 0f4e488..2095a04 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -90,7 +90,6 @@ EXPORT_SYMBOL_GPL(kvm_cpu_get_interrupt);
 void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu)
 {
 	kvm_inject_apic_timer_irqs(vcpu);
-	kvm_inject_pit_timer_irqs(vcpu);
 	/* TODO: PIT, RTC etc. */
 }
 EXPORT_SYMBOL_GPL(kvm_inject_pending_timer_irqs);
-- 
1.6.6.1



* [PATCH v3 2/3] Allow any LAPIC to accept PIC interrupts.
  2010-06-16 21:11 [PATCH v3 0/3]: Fixes to IRQ routing Chris Lalancette
  2010-06-16 21:11 ` [PATCH v3 1/3] Introduce a workqueue to deliver PIT timer interrupts Chris Lalancette
@ 2010-06-16 21:11 ` Chris Lalancette
  2010-06-16 21:11 ` [PATCH v3 3/3] In DM_LOWEST, only deliver interrupts to vcpus with enabled LAPIC's Chris Lalancette
  2010-06-18 17:44 ` [PATCH v3 0/3]: Fixes to IRQ routing Marcelo Tosatti
  3 siblings, 0 replies; 23+ messages in thread
From: Chris Lalancette @ 2010-06-16 21:11 UTC (permalink / raw)
  To: kvm; +Cc: Chris Lalancette

If the guest wants to accept timer interrupts on a CPU other
than the BSP, we need to remove this gate.

Signed-off-by: Chris Lalancette <clalance@redhat.com>
---
 arch/x86/kvm/lapic.c |   12 +++++-------
 1 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index d8258a0..ee0f76c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1107,13 +1107,11 @@ int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
 	u32 lvt0 = apic_get_reg(vcpu->arch.apic, APIC_LVT0);
 	int r = 0;
 
-	if (kvm_vcpu_is_bsp(vcpu)) {
-		if (!apic_hw_enabled(vcpu->arch.apic))
-			r = 1;
-		if ((lvt0 & APIC_LVT_MASKED) == 0 &&
-		    GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
-			r = 1;
-	}
+	if (!apic_hw_enabled(vcpu->arch.apic))
+		r = 1;
+	if ((lvt0 & APIC_LVT_MASKED) == 0 &&
+	    GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
+		r = 1;
 	return r;
 }
 
-- 
1.6.6.1



* [PATCH v3 3/3] In DM_LOWEST, only deliver interrupts to vcpus with enabled LAPIC's
  2010-06-16 21:11 [PATCH v3 0/3]: Fixes to IRQ routing Chris Lalancette
  2010-06-16 21:11 ` [PATCH v3 1/3] Introduce a workqueue to deliver PIT timer interrupts Chris Lalancette
  2010-06-16 21:11 ` [PATCH v3 2/3] Allow any LAPIC to accept PIC interrupts Chris Lalancette
@ 2010-06-16 21:11 ` Chris Lalancette
  2010-06-18 17:44 ` [PATCH v3 0/3]: Fixes to IRQ routing Marcelo Tosatti
  3 siblings, 0 replies; 23+ messages in thread
From: Chris Lalancette @ 2010-06-16 21:11 UTC (permalink / raw)
  To: kvm; +Cc: Chris Lalancette

Otherwise we might try to deliver a timer interrupt to a cpu that
can't possibly handle it.

Signed-off-by: Chris Lalancette <clalance@redhat.com>
---
 virt/kvm/irq_comm.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 52f412f..06cf61e 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -100,7 +100,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 			if (r < 0)
 				r = 0;
 			r += kvm_apic_set_irq(vcpu, irq);
-		} else {
+		} else if (kvm_lapic_enabled(vcpu)) {
 			if (!lowest)
 				lowest = vcpu;
 			else if (kvm_apic_compare_prio(vcpu, lowest) < 0)
-- 
1.6.6.1



* Re: [PATCH v3 0/3]: Fixes to IRQ routing
  2010-06-16 21:11 [PATCH v3 0/3]: Fixes to IRQ routing Chris Lalancette
                   ` (2 preceding siblings ...)
  2010-06-16 21:11 ` [PATCH v3 3/3] In DM_LOWEST, only deliver interrupts to vcpus with enabled LAPIC's Chris Lalancette
@ 2010-06-18 17:44 ` Marcelo Tosatti
  3 siblings, 0 replies; 23+ messages in thread
From: Marcelo Tosatti @ 2010-06-18 17:44 UTC (permalink / raw)
  To: Chris Lalancette; +Cc: kvm

On Wed, Jun 16, 2010 at 05:11:10PM -0400, Chris Lalancette wrote:
> As we've discussed previously, here is a series of patches to
> fix some of the IRQ routing issues we have in KVM.  With this series
> in place I was able to successfully kdump RHEL-5 64-bit and RHEL-6
> 32- and 64-bit guests on CPUs other than the BSP.  RHEL-5 32-bit kdump still
> does not work; it gets stuck on "Checking 'hlt' instruction".  However,
> it does that both before and after this series, so there is something
> else going on there that I still have to debug.
> 
> I also need to change the "kvm_migrate_pit_timer" function to migrate the
> timer over to the last CPU that handled the timer interrupt, on the
> theory that that particular CPU is likely to handle the timer interrupt again
> in the near future.  However, this is an optimization and shouldn't delay the
> inclusion of the rest of the series for correctness.

Applied, thanks.



* Re: [PATCH v3 1/3] Introduce a workqueue to deliver PIT timer interrupts.
  2010-06-16 21:11 ` [PATCH v3 1/3] Introduce a workqueue to deliver PIT timer interrupts Chris Lalancette
@ 2012-04-16 16:33   ` Jan Kiszka
  2012-04-16 17:07     ` Avi Kivity
  2012-04-17  9:31     ` Gleb Natapov
  0 siblings, 2 replies; 23+ messages in thread
From: Jan Kiszka @ 2012-04-16 16:33 UTC (permalink / raw)
  To: Chris Lalancette; +Cc: kvm, Marcelo Tosatti, Avi Kivity, Gleb Natapov

On 2010-06-16 23:11, Chris Lalancette wrote:
> We really want to call "kvm_set_irq" during the hrtimer callback,
> but that is risky because it runs in interrupt context.
> Instead, offload the work to a workqueue, which is a bit safer
> and should provide most of the same functionality.

Unfortunately, workqueues do not have fixed kthread associations (and
"kvm-pit-wq" would be too unspecific when running multiple VMs). So I
just realized that this subtly breaks the ability to run KVM guests with
RT priority (boot managers with timeouts hang as the workqueue starves).

Before throwing some kthread_worker at this, could someone help me
recall what was "risky" here? That the PIT IRQ may have to be
broadcast to a large number of VCPUs? I would offer to include this
information in my changelog. ;)
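
For concreteness, here is a rough sketch of the kthread_worker variant
I have in mind (untested; it assumes new pit->worker/pit->worker_task
fields, pit->expired converted from a struct work_struct to a struct
kthread_work, and <linux/kthread.h> included):

	init_kthread_worker(&pit->worker);
	pit->worker_task = kthread_run(kthread_worker_fn, &pit->worker,
				       "kvm-pit/%d", current->pid);
	if (IS_ERR(pit->worker_task)) {
		kfree(pit);
		return NULL;
	}
	init_kthread_work(&pit->expired, pit_do_work);

	/* the hrtimer callback then queues onto the dedicated kthread */
	queue_kthread_work(&pit->worker, &pit->expired);

	/* each hrtimer_cancel() gets paired with */
	flush_kthread_work(&pit->expired);

	/* and kvm_free_pit() ends with */
	kthread_stop(pit->worker_task);

Every VM would then own a "kvm-pit/<pid>" thread that can be given RT
priority via chrt, which the anonymous workqueue threads cannot.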

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


* Re: [PATCH v3 1/3] Introduce a workqueue to deliver PIT timer interrupts.
  2012-04-16 16:33   ` Jan Kiszka
@ 2012-04-16 17:07     ` Avi Kivity
  2012-04-17  9:31     ` Gleb Natapov
  1 sibling, 0 replies; 23+ messages in thread
From: Avi Kivity @ 2012-04-16 17:07 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: kvm, Marcelo Tosatti, Gleb Natapov

On 04/16/2012 07:33 PM, Jan Kiszka wrote:
> On 2010-06-16 23:11, Chris Lalancette wrote:
> > We really want to call "kvm_set_irq" during the hrtimer callback,
> > but that is risky because it runs in interrupt context.
> > Instead, offload the work to a workqueue, which is a bit safer
> > and should provide most of the same functionality.
>
> Unfortunately, workqueues do not have fixed kthread associations (and
> "kvm-pit-wq" would be too unspecific when running multiple VMs). So I
> just realized that this subtly breaks the ability to run KVM guests with
> RT priority (boot managers with timeouts hang as the workqueue starves).
>
> Before throwing some kthread_worker at this, could someone help me
> recall what was "risky" here? That the PIT IRQ may have to be
> broadcast to a large number of VCPUs? 

(or any irq, once we make that function irq safe)

> I would offer to include this
> information in my changelog. ;)

That, plus the need to make all the .set methods safe in that context.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH v3 1/3] Introduce a workqueue to deliver PIT timer interrupts.
  2012-04-16 16:33   ` Jan Kiszka
  2012-04-16 17:07     ` Avi Kivity
@ 2012-04-17  9:31     ` Gleb Natapov
  2012-04-17 10:23       ` Avi Kivity
  1 sibling, 1 reply; 23+ messages in thread
From: Gleb Natapov @ 2012-04-17  9:31 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Chris Lalancette, kvm, Marcelo Tosatti, Avi Kivity

On Mon, Apr 16, 2012 at 06:33:26PM +0200, Jan Kiszka wrote:
> On 2010-06-16 23:11, Chris Lalancette wrote:
> > We really want to call "kvm_set_irq" during the hrtimer callback,
> > but that is risky because it runs in interrupt context.
> > Instead, offload the work to a workqueue, which is a bit safer
> > and should provide most of the same functionality.
> 
> Unfortunately, workqueues do not have fixed kthread associations (and
> "kvm-pit-wq" would be too unspecific when running multiple VMs). So I
> just realized that this subtly breaks the ability to run KVM guests with
> RT priority (boot managers with timeouts hang as the workqueue starves).
> 
> Before throwing some kthread_worker at this, could someone help me
> recall what was "risky" here? That the PIT IRQ may have to be
> broadcast to a large number of VCPUs? I would offer to include this
> information in my changelog. ;)
> 
The ioapic and pic irq_set functions use spinlocks, but not the irqsave
variants. Also, I always forget whether it is safe to kick a vcpu from
irq context.

--
			Gleb.


* Re: [PATCH v3 1/3] Introduce a workqueue to deliver PIT timer interrupts.
  2012-04-17  9:31     ` Gleb Natapov
@ 2012-04-17 10:23       ` Avi Kivity
  2012-04-17 10:26         ` Gleb Natapov
  0 siblings, 1 reply; 23+ messages in thread
From: Avi Kivity @ 2012-04-17 10:23 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Jan Kiszka, kvm, Marcelo Tosatti

On 04/17/2012 12:31 PM, Gleb Natapov wrote:
> On Mon, Apr 16, 2012 at 06:33:26PM +0200, Jan Kiszka wrote:
> > On 2010-06-16 23:11, Chris Lalancette wrote:
> > > We really want to call "kvm_set_irq" during the hrtimer callback,
> > > but that is risky because it runs in interrupt context.
> > > Instead, offload the work to a workqueue, which is a bit safer
> > > and should provide most of the same functionality.
> > 
> > Unfortunately, workqueues do not have fixed kthread associations (and
> > "kvm-pit-wq" would be too unspecific when running multiple VMs). So I
> > just realized that this subtly breaks the ability to run KVM guests with
> > RT priority (boot managers with timeouts hang as the workqueue starves).
> > 
> > Before throwing some kthread_worker at this, could someone help me
> > recall what was "risky" here? That the PIT IRQ may have to be
> > broadcast to a large number of VCPUs? I would offer to include this
> > information in my changelog. ;)
> > 
> The ioapic and pic irq_set functions use spinlocks, but not the irqsave
> variants. Also, I always forget whether it is safe to kick a vcpu from
> irq context.

It isn't, since you need to send an IPI.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH v3 1/3] Introduce a workqueue to deliver PIT timer interrupts.
  2012-04-17 10:23       ` Avi Kivity
@ 2012-04-17 10:26         ` Gleb Natapov
  2012-04-17 10:29           ` Avi Kivity
  0 siblings, 1 reply; 23+ messages in thread
From: Gleb Natapov @ 2012-04-17 10:26 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Jan Kiszka, kvm, Marcelo Tosatti

On Tue, Apr 17, 2012 at 01:23:52PM +0300, Avi Kivity wrote:
> On 04/17/2012 12:31 PM, Gleb Natapov wrote:
> > On Mon, Apr 16, 2012 at 06:33:26PM +0200, Jan Kiszka wrote:
> > > On 2010-06-16 23:11, Chris Lalancette wrote:
> > > > We really want to call "kvm_set_irq" during the hrtimer callback,
> > > > but that is risky because it runs in interrupt context.
> > > > Instead, offload the work to a workqueue, which is a bit safer
> > > > and should provide most of the same functionality.
> > > 
> > > Unfortunately, workqueues do not have fixed kthread associations (and
> > > "kvm-pit-wq" would be too unspecific when running multiple VMs). So I
> > > just realized that this subtly breaks the ability to run KVM guests with
> > > RT priority (boot managers with timeouts hang as the workqueue starves).
> > > 
> > > Before throwing some kthread_worker at this, could someone help me
> > > recall what was "risky" here? That the PIT IRQ may have to be
> > > broadcast to a large number of VCPUs? I would offer to include this
> > > information in my changelog. ;)
> > > 
> > The ioapic and pic irq_set functions use spinlocks, but not the irqsave
> > variants. Also, I always forget whether it is safe to kick a vcpu from
> > irq context.
> 
> It isn't, since you need to send an IPI.
> 
That is exactly what I forget: whether you can send an IPI from there :)
Anyway this is another reason.

--
			Gleb.


* Re: [PATCH v3 1/3] Introduce a workqueue to deliver PIT timer interrupts.
  2012-04-17 10:26         ` Gleb Natapov
@ 2012-04-17 10:29           ` Avi Kivity
  2012-04-17 10:31             ` Gleb Natapov
  0 siblings, 1 reply; 23+ messages in thread
From: Avi Kivity @ 2012-04-17 10:29 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Jan Kiszka, kvm, Marcelo Tosatti

On 04/17/2012 01:26 PM, Gleb Natapov wrote:
> > It isn't, since you need to send an IPI.
> > 
> That is exactly what I forget whether you can send IPI from there :)
> Anyway this is another reason.
>

Actually I was wrong.  You can't smp_call_function_single() from irq
context (deadlocks if two vcpus do that), but we send a reschedule
interrupt.  So it should work.
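
In code, the kick path boils down to a reschedule IPI (heavily
simplified from kvm_vcpu_kick(), omitting the waitqueue wakeup and
guest-mode checks):

	int me, cpu = vcpu->cpu;

	me = get_cpu();
	if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
		smp_send_reschedule(cpu);
	put_cpu();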

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH v3 1/3] Introduce a workqueue to deliver PIT timer interrupts.
  2012-04-17 10:29           ` Avi Kivity
@ 2012-04-17 10:31             ` Gleb Natapov
  2012-04-17 10:42               ` Avi Kivity
  0 siblings, 1 reply; 23+ messages in thread
From: Gleb Natapov @ 2012-04-17 10:31 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Jan Kiszka, kvm, Marcelo Tosatti

On Tue, Apr 17, 2012 at 01:29:04PM +0300, Avi Kivity wrote:
> On 04/17/2012 01:26 PM, Gleb Natapov wrote:
> > > It isn't, since you need to send an IPI.
> > > 
> > That is exactly what I forget: whether you can send an IPI from there :)
> > Anyway this is another reason.
> >
> 
> Actually I was wrong.  You can't smp_call_function_single() from irq
> context (deadlocks if two vcpus do that), but we send a reschedule
> interrupt.  So it should work.
> 
Ah, good point. So if we use the irqsave versions of the spinlocks, we
can drop the kthread?
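
Something like this in kvm_pit_ack_irq() and the .set paths, just to
spell out what I mean by the irqsave variants:

	unsigned long flags;

	spin_lock_irqsave(&ps->inject_lock, flags);
	/* same pending/irq_ack bookkeeping as today */
	spin_unlock_irqrestore(&ps->inject_lock, flags);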

--
			Gleb.


* Re: [PATCH v3 1/3] Introduce a workqueue to deliver PIT timer interrupts.
  2012-04-17 10:31             ` Gleb Natapov
@ 2012-04-17 10:42               ` Avi Kivity
  2012-04-17 10:43                 ` Avi Kivity
  2012-04-17 11:05                 ` Gleb Natapov
  0 siblings, 2 replies; 23+ messages in thread
From: Avi Kivity @ 2012-04-17 10:42 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Jan Kiszka, kvm, Marcelo Tosatti

On 04/17/2012 01:31 PM, Gleb Natapov wrote:
> On Tue, Apr 17, 2012 at 01:29:04PM +0300, Avi Kivity wrote:
> > On 04/17/2012 01:26 PM, Gleb Natapov wrote:
> > > > It isn't, since you need to send an IPI.
> > > > 
> > > That is exactly what I forget: whether you can send an IPI from there :)
> > > Anyway this is another reason.
> > >
> > 
> > Actually I was wrong.  You can't smp_call_function_single() from irq
> > context (deadlocks if two vcpus do that), but we send a reschedule
> > interrupt.  So it should work.
> > 
> Ah, good point. So if we use the irqsave versions of the spinlocks, we
> can drop the kthread?

Do we want 254 IPIs to be issued from irq context?  Could be slow.

We can make the unicast case run from irq context and defer the
multicast to a thread.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH v3 1/3] Introduce a workqueue to deliver PIT timer interrupts.
  2012-04-17 10:42               ` Avi Kivity
@ 2012-04-17 10:43                 ` Avi Kivity
  2012-04-17 11:05                 ` Gleb Natapov
  1 sibling, 0 replies; 23+ messages in thread
From: Avi Kivity @ 2012-04-17 10:43 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Jan Kiszka, kvm, Marcelo Tosatti

On 04/17/2012 01:42 PM, Avi Kivity wrote:
> On 04/17/2012 01:31 PM, Gleb Natapov wrote:
> > On Tue, Apr 17, 2012 at 01:29:04PM +0300, Avi Kivity wrote:
> > > On 04/17/2012 01:26 PM, Gleb Natapov wrote:
> > > > > It isn't, since you need to send an IPI.
> > > > > 
> > > > That is exactly what I forget: whether you can send an IPI from there :)
> > > > Anyway this is another reason.
> > > >
> > > 
> > > Actually I was wrong.  You can't smp_call_function_single() from irq
> > > context (deadlocks if two vcpus do that), but we send a reschedule
> > > interrupt.  So it should work.
> > > 
> > Ah, good point. So if we use the irqsave versions of the spinlocks, we
> > can drop the kthread?
>
> Do we want 254 IPIs to be issued from irq context?  Could be slow.
>
> We can make the unicast case run from irq context and defer the
> multicast to a thread.

(that should help eventfd and device assignment)

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH v3 1/3] Introduce a workqueue to deliver PIT timer interrupts.
  2012-04-17 10:42               ` Avi Kivity
  2012-04-17 10:43                 ` Avi Kivity
@ 2012-04-17 11:05                 ` Gleb Natapov
  2012-04-17 12:00                   ` Avi Kivity
  1 sibling, 1 reply; 23+ messages in thread
From: Gleb Natapov @ 2012-04-17 11:05 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Jan Kiszka, kvm, Marcelo Tosatti

On Tue, Apr 17, 2012 at 01:42:31PM +0300, Avi Kivity wrote:
> On 04/17/2012 01:31 PM, Gleb Natapov wrote:
> > On Tue, Apr 17, 2012 at 01:29:04PM +0300, Avi Kivity wrote:
> > > On 04/17/2012 01:26 PM, Gleb Natapov wrote:
> > > > > It isn't, since you need to send an IPI.
> > > > > 
> > > > That is exactly what I forget: whether you can send an IPI from there :)
> > > > Anyway this is another reason.
> > > >
> > > 
> > > Actually I was wrong.  You can't smp_call_function_single() from irq
> > > context (deadlocks if two vcpus do that), but we send a reschedule
> > > interrupt.  So it should work.
> > > 
> > Ah, good point. So if we use the irqsave versions of the spinlocks, we
> > can drop the kthread?
> 
> Do we want 254 IPIs to be issued from irq context?  Could be slow.
> 
Where is this number coming from?

> We can make the unicast case run from irq context and defer the
> multicast to a thread.
> 
We do not know whether the gsi is multicast in the pit code, though.

--
			Gleb.


* Re: [PATCH v3 1/3] Introduce a workqueue to deliver PIT timer interrupts.
  2012-04-17 11:05                 ` Gleb Natapov
@ 2012-04-17 12:00                   ` Avi Kivity
  2012-04-17 12:03                     ` Gleb Natapov
  0 siblings, 1 reply; 23+ messages in thread
From: Avi Kivity @ 2012-04-17 12:00 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Jan Kiszka, kvm, Marcelo Tosatti

On 04/17/2012 02:05 PM, Gleb Natapov wrote:
> On Tue, Apr 17, 2012 at 01:42:31PM +0300, Avi Kivity wrote:
> > On 04/17/2012 01:31 PM, Gleb Natapov wrote:
> > > On Tue, Apr 17, 2012 at 01:29:04PM +0300, Avi Kivity wrote:
> > > > On 04/17/2012 01:26 PM, Gleb Natapov wrote:
> > > > > > It isn't, since you need to send an IPI.
> > > > > > 
> > > > > That is exactly what I forget: whether you can send an IPI from there :)
> > > > > Anyway this is another reason.
> > > > >
> > > > 
> > > > Actually I was wrong.  You can't smp_call_function_single() from irq
> > > > context (deadlocks if two vcpus do that), but we send a reschedule
> > > > interrupt.  So it should work.
> > > > 
> > > Ah, good point. So if we use the irqsave versions of the spinlocks, we
> > > can drop the kthread?
> > 
> > Do we want 254 IPIs to be issued from irq context?  Could be slow.
> > 
> Where is this number coming from?

KVM_MAX_VCPUS.

> > We can make the unicast case run from irq context and defer the
> > multicast to a thread.
> > 
> We do not know whether the gsi is multicast in the pit code, though.
>

We don't have to do it in the pit code.  The .set method can decide
whether it wants a direct path or to go through a thread (which thread?
passed as a parameter?)
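
Hypothetical shape only (irq_is_unicast(), pic_deliver_inline() and
kvm_defer_irq_work() are made-up names, not existing APIs):

	static int pic_set_irq(struct kvm_kernel_irq_routing_entry *e,
			       struct kvm *kvm, int irq_source_id, int level)
	{
		/* unicast: cheap enough to deliver from irq context */
		if (irq_is_unicast(kvm, e))
			return pic_deliver_inline(kvm, e, level);
		/* multicast: defer to a thread to bound irq latency */
		return kvm_defer_irq_work(kvm, e, level);
	}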

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH v3 1/3] Introduce a workqueue to deliver PIT timer interrupts.
  2012-04-17 12:00                   ` Avi Kivity
@ 2012-04-17 12:03                     ` Gleb Natapov
  2012-04-17 12:06                       ` Avi Kivity
  0 siblings, 1 reply; 23+ messages in thread
From: Gleb Natapov @ 2012-04-17 12:03 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Jan Kiszka, kvm, Marcelo Tosatti

On Tue, Apr 17, 2012 at 03:00:10PM +0300, Avi Kivity wrote:
> On 04/17/2012 02:05 PM, Gleb Natapov wrote:
> > On Tue, Apr 17, 2012 at 01:42:31PM +0300, Avi Kivity wrote:
> > > On 04/17/2012 01:31 PM, Gleb Natapov wrote:
> > > > On Tue, Apr 17, 2012 at 01:29:04PM +0300, Avi Kivity wrote:
> > > > > On 04/17/2012 01:26 PM, Gleb Natapov wrote:
> > > > > > > It isn't, since you need to send an IPI.
> > > > > > > 
> > > > > > That is exactly what I forget: whether you can send an IPI from there :)
> > > > > > Anyway this is another reason.
> > > > > >
> > > > > 
> > > > > Actually I was wrong.  You can't smp_call_function_single() from irq
> > > > > context (deadlocks if two vcpus do that), but we send a reschedule
> > > > > interrupt.  So it should work.
> > > > > 
> > > > Ah, good point. So if we use the irqsave versions of the spinlocks, we
> > > > can drop the kthread?
> > > 
> > > Do we want 254 IPIs to be issued from irq context?  Could be slow.
> > > 
> > Where is this number coming from?
> 
> KVM_MAX_VCPUS.
> 
Ah, so you are worried about a malicious guest configuring the pit to
broadcast to all its vcpus.

> > > We can make the unicast case run from irq context and defer the
> > > multicast to a thread.
> > > 
> > We do not know whether the gsi is multicast in the pit code, though.
> >
> 
> We don't have to do it in the pit code.  The .set method can decide
> whether it wants a direct path or to go through a thread (which thread?
> passed as a parameter?)
> 
Agree.

--
			Gleb.


* Re: [PATCH v3 1/3] Introduce a workqueue to deliver PIT timer interrupts.
  2012-04-17 12:03                     ` Gleb Natapov
@ 2012-04-17 12:06                       ` Avi Kivity
  2012-04-17 16:15                         ` Jan Kiszka
  0 siblings, 1 reply; 23+ messages in thread
From: Avi Kivity @ 2012-04-17 12:06 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Jan Kiszka, kvm, Marcelo Tosatti

On 04/17/2012 03:03 PM, Gleb Natapov wrote:
> > 
> > KVM_MAX_VCPUS.
> > 
> Ah, so you are worried about a malicious guest configuring the pit to
> broadcast to all its vcpus.

Yes - it can introduce huge amounts of latency this way which is exactly
what Jan is trying to prevent.

Though I'm not sure spin_lock_irq() in the realtime tree actually
disables irqs (but it's certainly not a good idea in mainline; it's
nasty even with just the spinlock).

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH v3 1/3] Introduce a workqueue to deliver PIT timer interrupts.
  2012-04-17 12:06                       ` Avi Kivity
@ 2012-04-17 16:15                         ` Jan Kiszka
  2012-04-17 16:17                           ` Avi Kivity
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Kiszka @ 2012-04-17 16:15 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Gleb Natapov, kvm, Marcelo Tosatti

On 2012-04-17 14:06, Avi Kivity wrote:
> On 04/17/2012 03:03 PM, Gleb Natapov wrote:
>>>
>>> KVM_MAX_VCPUS.
>>>
>> Ah, so you are worried about a malicious guest configuring the pit to
>> broadcast to all its vcpus.
> 
> Yes - it can introduce huge amounts of latency this way which is exactly
> what Jan is trying to prevent.
> 
> Though I'm not sure spin_lock_irq() in the realtime tree actually
> disables irqs (but it's certainly not a good idea in mainline; it's
> nasty even with just the spinlock).

This depends on how you declare the spinlock type - raw or normal. The
former will disable irqs; the latter will not even disable preemption
(it becomes a mutex).
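
I.e., roughly:

	raw_spinlock_t	r_lock;	/* stays a true spinlock on -rt;
				   raw_spin_lock_irq() disables irqs */
	spinlock_t	s_lock;	/* becomes a sleeping lock on PREEMPT_RT;
				   spin_lock_irq() then disables neither
				   irqs nor preemption */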

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


* Re: [PATCH v3 1/3] Introduce a workqueue to deliver PIT timer interrupts.
  2012-04-17 16:15                         ` Jan Kiszka
@ 2012-04-17 16:17                           ` Avi Kivity
  2012-04-18  8:04                             ` Gleb Natapov
  0 siblings, 1 reply; 23+ messages in thread
From: Avi Kivity @ 2012-04-17 16:17 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Gleb Natapov, kvm, Marcelo Tosatti

On 04/17/2012 07:15 PM, Jan Kiszka wrote:
> On 2012-04-17 14:06, Avi Kivity wrote:
> > On 04/17/2012 03:03 PM, Gleb Natapov wrote:
> >>>
> >>> KVM_MAX_VCPUS.
> >>>
> >> Ah, so you are worried about a malicious guest configuring the pit to
> >> broadcast to all its vcpus.
> > 
> > Yes - it can introduce huge amounts of latency this way which is exactly
> > what Jan is trying to prevent.
> > 
> > Though I'm not sure spin_lock_irq() in the realtime tree actually
> > disables irqs (but it's certainly not a good idea in mainline; it's
> > nasty even with just the spinlock).
>
> This depends on how you declare the spinlock type - raw or normal. The
> former will disable irqs; the latter will not even disable preemption
> (it becomes a mutex).

Yes (and I see no reason to use raw spinlocks here).  Still, for
mainline, are we okay with 254*IPIs?  Maybe it's not so bad and I'm
overinflating the problem.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH v3 1/3] Introduce a workqueue to deliver PIT timer interrupts.
  2012-04-17 16:17                           ` Avi Kivity
@ 2012-04-18  8:04                             ` Gleb Natapov
  2012-04-18  8:25                               ` Avi Kivity
  0 siblings, 1 reply; 23+ messages in thread
From: Gleb Natapov @ 2012-04-18  8:04 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Jan Kiszka, kvm, Marcelo Tosatti

On Tue, Apr 17, 2012 at 07:17:11PM +0300, Avi Kivity wrote:
> On 04/17/2012 07:15 PM, Jan Kiszka wrote:
> > On 2012-04-17 14:06, Avi Kivity wrote:
> > > On 04/17/2012 03:03 PM, Gleb Natapov wrote:
> > >>>
> > >>> KVM_MAX_VCPUS.
> > >>>
> > >> Ah, so you are worried about a malicious guest configuring the pit to
> > >> broadcast to all its vcpus.
> > > 
> > > Yes - it can introduce huge amounts of latency this way which is exactly
> > > what Jan is trying to prevent.
> > > 
> > > Though I'm not sure spin_lock_irq() in the realtime tree actually
> > > disables irqs (but it's certainly not a good idea in mainline; it's
> > > nasty even with just the spinlock).
> >
> > This depends on how you declare the spinlock type - raw or normal. The
> > former will disable irqs; the latter will not even disable preemption
> > (it becomes a mutex).
> 
> Yes (and I see no reason to use raw spinlocks here).  Still, for
It was a raw spinlock until f4f510508741680e423524c222f615276ca6222c.

> mainline, are we okay with 254*IPIs?  Maybe it's not so bad and I'm
> overinflating the problem.
> 
Can't 254*IPIs also happen if an application changes its memory mapping?

--
			Gleb.


* Re: [PATCH v3 1/3] Introduce a workqueue to deliver PIT timer interrupts.
  2012-04-18  8:04                             ` Gleb Natapov
@ 2012-04-18  8:25                               ` Avi Kivity
  2012-04-18  8:27                                 ` Gleb Natapov
  0 siblings, 1 reply; 23+ messages in thread
From: Avi Kivity @ 2012-04-18  8:25 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Jan Kiszka, kvm, Marcelo Tosatti

On 04/18/2012 11:04 AM, Gleb Natapov wrote:
> > mainline, are we okay with 254*IPIs?  Maybe it's not so bad and I'm
> > overinflating the problem.
> > 
> Can't 254*IPIs also happen if an application changes its memory mapping?
>

It's not in irq context.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH v3 1/3] Introduce a workqueue to deliver PIT timer interrupts.
  2012-04-18  8:25                               ` Avi Kivity
@ 2012-04-18  8:27                                 ` Gleb Natapov
  0 siblings, 0 replies; 23+ messages in thread
From: Gleb Natapov @ 2012-04-18  8:27 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Jan Kiszka, kvm, Marcelo Tosatti

On Wed, Apr 18, 2012 at 11:25:45AM +0300, Avi Kivity wrote:
> On 04/18/2012 11:04 AM, Gleb Natapov wrote:
> > > mainline, are we okay with 254*IPIs?  Maybe it's not so bad and I'm
> > > overinflating the problem.
> > > 
> > Can't 254*IPIs also happen if an application changes its memory mapping?
> >
> 
> It's not in irq context.
> 
Ah, yes. Missed that small detail.

--
			Gleb.

