All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5]: Fix kdump under KVM
@ 2009-10-27 16:41 Chris Lalancette
  2009-10-27 16:41 ` [PATCH 1/5] Fix up some comments around the source tree Chris Lalancette
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: Chris Lalancette @ 2009-10-27 16:41 UTC (permalink / raw)
  To: kvm


This patch series aims to get kdump working inside a KVM guest.
The current problem with using kdump is that KVM always delivers
PIT interrupts to the BSP, and the BSP only.  While this is
technically allowed by the MPS spec, most motherboards actually
deliver timer interrupts to *any* LAPIC in virtual wire mode.
Since a crash can occur on any CPU, timer interrupts must
be able to reach any CPU in order for kdump to work properly.

Therefore, this patch series kicks all of the relevant vCPUs
when delivering a timer interrupt.  With these patches in
place, kdump in a RHEL-5 guest works properly.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/5] Fix up some comments around the source tree.
  2009-10-27 16:41 [PATCH 0/5]: Fix kdump under KVM Chris Lalancette
@ 2009-10-27 16:41 ` Chris Lalancette
  2009-10-27 16:41 ` [PATCH 2/5] Remove KVM_REQ_PENDING_TIMER Chris Lalancette
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Chris Lalancette @ 2009-10-27 16:41 UTC (permalink / raw)
  To: kvm; +Cc: Chris Lalancette

Signed-off-by: Chris Lalancette <clalance@redhat.com>
---
:100644 100644 34b700f... ba61f27... M	arch/x86/kvm/svm.c
:100644 100644 38a2d20... cd6f92b... M	virt/kvm/ioapic.c
:100644 100644 bd44fb4... c22bc17... M	virt/kvm/kvm_main.c
 arch/x86/kvm/svm.c  |    2 +-
 virt/kvm/ioapic.c   |    2 +-
 virt/kvm/kvm_main.c |    2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 34b700f..ba61f27 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2608,7 +2608,7 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
 		break;
 	case SVM_EXITINTINFO_TYPE_EXEPT:
 		/* In case of software exception do not reinject an exception
-		   vector, but re-execute and instruction instead */
+		   vector, but re-execute an instruction instead */
 		if (is_nested(svm))
 			break;
 		if (kvm_exception_is_soft(vector))
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 38a2d20..cd6f92b 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -164,7 +164,7 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
 	irqe.shorthand = 0;
 
 #ifdef CONFIG_X86
-	/* Always delivery PIT interrupt to vcpu 0 */
+	/* Always deliver PIT interrupt to vcpu 0 */
 	if (irq == 0) {
 		irqe.dest_mode = 0; /* Physical mode. */
 		/* need to read apic_id from apic regiest since
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index bd44fb4..c22bc17 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1180,7 +1180,7 @@ static int create_vcpu_fd(struct kvm_vcpu *vcpu)
 }
 
 /*
- * Creates some virtual cpus.  Good luck creating more than one.
+ * Creates a virtual cpu.
  */
 static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 {
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/5] Remove KVM_REQ_PENDING_TIMER.
  2009-10-27 16:41 [PATCH 0/5]: Fix kdump under KVM Chris Lalancette
  2009-10-27 16:41 ` [PATCH 1/5] Fix up some comments around the source tree Chris Lalancette
@ 2009-10-27 16:41 ` Chris Lalancette
  2009-10-27 16:41 ` [PATCH 3/5] Remove references to VCPU in i8254 Chris Lalancette
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Chris Lalancette @ 2009-10-27 16:41 UTC (permalink / raw)
  To: kvm; +Cc: Chris Lalancette

KVM_REQ_PENDING_TIMER is set and cleared in a couple of places,
but it never seems to be actually checked.  Remove it.

Signed-off-by: Chris Lalancette <clalance@redhat.com>
---
:100644 100644 eea4043... 72b5144... M	arch/x86/kvm/timer.c
:100644 100644 2ef39062.. 93a65b4... M	arch/x86/kvm/x86.c
:100644 100644 bd5a616... 053e49f... M	include/linux/kvm_host.h
 arch/x86/kvm/timer.c     |    5 +----
 arch/x86/kvm/x86.c       |    1 -
 include/linux/kvm_host.h |    1 -
 3 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/timer.c b/arch/x86/kvm/timer.c
index eea4043..72b5144 100644
--- a/arch/x86/kvm/timer.c
+++ b/arch/x86/kvm/timer.c
@@ -14,11 +14,8 @@ static int __kvm_timer_fn(struct kvm_vcpu *vcpu, struct kvm_timer *ktimer)
 	 * not care about potentially loosing timer events in the !reinject
 	 * case anyway.
 	 */
-	if (ktimer->reinject || !atomic_read(&ktimer->pending)) {
+	if (ktimer->reinject || !atomic_read(&ktimer->pending))
 		atomic_inc(&ktimer->pending);
-		/* FIXME: this code should not know anything about vcpus */
-		set_bit(KVM_REQ_PENDING_TIMER, &vcpu->requests);
-	}
 
 	if (waitqueue_active(q))
 		wake_up_interruptible(q);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2ef3906..93a65b4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3906,7 +3906,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
 		if (r <= 0)
 			break;
 
-		clear_bit(KVM_REQ_PENDING_TIMER, &vcpu->requests);
 		if (kvm_cpu_has_pending_timer(vcpu))
 			kvm_inject_pending_timer_irqs(vcpu);
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index bd5a616..053e49f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -33,7 +33,6 @@
 #define KVM_REQ_REPORT_TPR_ACCESS  2
 #define KVM_REQ_MMU_RELOAD         3
 #define KVM_REQ_TRIPLE_FAULT       4
-#define KVM_REQ_PENDING_TIMER      5
 #define KVM_REQ_UNHALT             6
 #define KVM_REQ_MMU_SYNC           7
 #define KVM_REQ_KVMCLOCK_UPDATE    8
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/5] Remove references to VCPU in i8254
  2009-10-27 16:41 [PATCH 0/5]: Fix kdump under KVM Chris Lalancette
  2009-10-27 16:41 ` [PATCH 1/5] Fix up some comments around the source tree Chris Lalancette
  2009-10-27 16:41 ` [PATCH 2/5] Remove KVM_REQ_PENDING_TIMER Chris Lalancette
@ 2009-10-27 16:41 ` Chris Lalancette
  2009-10-27 16:41 ` [PATCH 4/5] Remove timer.c Chris Lalancette
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Chris Lalancette @ 2009-10-27 16:41 UTC (permalink / raw)
  To: kvm; +Cc: Chris Lalancette

Conceptually, the i8254 is hooked to a PIC or IOAPIC.  Therefore,
this patch removes most references to "vcpu" in i8254.c.
There are two exceptions to this:

1)  In pit_timer_fn, we still have to kick the BSP to wake it out
of idle.  This will be changed in a later patch.

2)  In __kvm_migrate_pit_timer, we have to migrate the PIT
around with the BSP, since hrtimers work on a per-CPU basis.
I've added a comment here to clarify why this is needed.

Signed-off-by: Chris Lalancette <clalance@redhat.com>
---
:100644 100644 fab7440... d5c08fa... M	arch/x86/kvm/i8254.c
:100644 100644 d4c1c7f... d7bc40b... M	arch/x86/kvm/i8254.h
:100644 100644 96dfbb6... ab3a56e... M	arch/x86/kvm/irq.c
:100644 100644 c025a23... e16b968... M	arch/x86/kvm/irq.h
:100644 100644 55c7524... ba39e25... M	arch/x86/kvm/kvm_timer.h
 arch/x86/kvm/i8254.c     |   50 ++++++++++++++++++++++++++++++++++++++-------
 arch/x86/kvm/i8254.h     |    4 ++-
 arch/x86/kvm/irq.c       |    4 +-
 arch/x86/kvm/irq.h       |    2 -
 arch/x86/kvm/kvm_timer.h |    3 ++
 5 files changed, 50 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index fab7440..d5c08fa 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -33,6 +33,7 @@
 
 #include "irq.h"
 #include "i8254.h"
+#include "kvm_timer.h"
 
 #ifndef CONFIG_X86_64
 #define mod_64(x, y) ((x) - (y) * div64_u64(x, y))
@@ -227,12 +228,13 @@ static void pit_latch_status(struct kvm *kvm, int channel)
 	}
 }
 
-int pit_has_pending_timer(struct kvm_vcpu *vcpu)
+int pit_has_pending_timer(struct kvm *kvm)
 {
-	struct kvm_pit *pit = vcpu->kvm->arch.vpit;
+	struct kvm_pit *pit = kvm->arch.vpit;
 
-	if (pit && kvm_vcpu_is_bsp(vcpu) && pit->pit_state.irq_ack)
+	if (pit && pit->pit_state.irq_ack)
 		return atomic_read(&pit->pit_state.pit_timer.pending);
+
 	return 0;
 }
 
@@ -252,6 +254,13 @@ void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu)
 	struct kvm_pit *pit = vcpu->kvm->arch.vpit;
 	struct hrtimer *timer;
 
+	/*
+	 * technically, the PIT isn't hooked to a particular VCPU;
+	 * the logical structure is PIT -> [IOA]PIC -> CPU[s].  However,
+	 * hrtimers expire on a per-cpu basis, and since we initially
+	 * created the hrtimer during BSP creation, we move it around
+	 * with the BSP.
+	 */
 	if (!kvm_vcpu_is_bsp(vcpu) || !pit)
 		return;
 
@@ -277,6 +286,33 @@ static struct kvm_timer_ops kpit_ops = {
 	.is_periodic = kpit_is_periodic,
 };
 
+static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
+{
+	struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
+	int restart_timer = 0;
+
+	/*
+	 * There is a race window between reading and incrementing, but we do
+	 * not care about potentially losing timer events in the !reinject
+	 * case anyway.
+	 */
+	if (ktimer->reinject || !atomic_read(&ktimer->pending))
+		atomic_inc(&ktimer->pending);
+
+	if (waitqueue_active(&ktimer->kvm->bsp_vcpu->wq))
+		wake_up_interruptible(&ktimer->kvm->bsp_vcpu->wq);
+
+	if (ktimer->t_ops->is_periodic(ktimer)) {
+		hrtimer_add_expires_ns(&ktimer->timer, ktimer->period);
+		restart_timer = 1;
+	}
+
+	if (restart_timer)
+		return HRTIMER_RESTART;
+	else
+		return HRTIMER_NORESTART;
+}
+
 static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period)
 {
 	struct kvm_timer *pt = &ps->pit_timer;
@@ -291,10 +327,9 @@ static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period)
 	pt->period = interval;
 	ps->is_periodic = is_period;
 
-	pt->timer.function = kvm_timer_fn;
+	pt->timer.function = pit_timer_fn;
 	pt->t_ops = &kpit_ops;
 	pt->kvm = ps->pit->kvm;
-	pt->vcpu = pt->kvm->bsp_vcpu;
 
 	atomic_set(&pt->pending, 0);
 	ps->irq_ack = 1;
@@ -705,10 +740,9 @@ static void __inject_pit_timer_intr(struct kvm *kvm)
 			kvm_apic_nmi_wd_deliver(vcpu);
 }
 
-void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu)
+void kvm_inject_pit_timer_irqs(struct kvm *kvm)
 {
-	struct kvm_pit *pit = vcpu->kvm->arch.vpit;
-	struct kvm *kvm = vcpu->kvm;
+	struct kvm_pit *pit = kvm->arch.vpit;
 	struct kvm_kpit_state *ps;
 
 	if (pit) {
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index d4c1c7f..d7bc40b 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -49,10 +49,12 @@ struct kvm_pit {
 #define KVM_MAX_PIT_INTR_INTERVAL   HZ / 100
 #define KVM_PIT_CHANNEL_MASK	    0x3
 
-void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu);
+int pit_has_pending_timer(struct kvm *kvm);
+void kvm_inject_pit_timer_irqs(struct kvm *kvm);
 void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start);
 struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags);
 void kvm_free_pit(struct kvm *kvm);
 void kvm_pit_reset(struct kvm_pit *pit);
+void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu);
 
 #endif
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 96dfbb6..ab3a56e 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -34,7 +34,7 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
 {
 	int ret;
 
-	ret = pit_has_pending_timer(vcpu);
+	ret = pit_has_pending_timer(vcpu->kvm);
 	ret |= apic_has_pending_timer(vcpu);
 
 	return ret;
@@ -89,7 +89,7 @@ EXPORT_SYMBOL_GPL(kvm_cpu_get_interrupt);
 void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu)
 {
 	kvm_inject_apic_timer_irqs(vcpu);
-	kvm_inject_pit_timer_irqs(vcpu);
+	kvm_inject_pit_timer_irqs(vcpu->kvm);
 	/* TODO: PIT, RTC etc. */
 }
 EXPORT_SYMBOL_GPL(kvm_inject_pending_timer_irqs);
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index c025a23..e16b968 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -95,10 +95,8 @@ void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
 void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu);
 void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu);
 void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu);
-void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu);
 void __kvm_migrate_timers(struct kvm_vcpu *vcpu);
 
-int pit_has_pending_timer(struct kvm_vcpu *vcpu);
 int apic_has_pending_timer(struct kvm_vcpu *vcpu);
 
 #endif
diff --git a/arch/x86/kvm/kvm_timer.h b/arch/x86/kvm/kvm_timer.h
index 55c7524..ba39e25 100644
--- a/arch/x86/kvm/kvm_timer.h
+++ b/arch/x86/kvm/kvm_timer.h
@@ -1,3 +1,5 @@
+#ifndef __KVM_TIMER_H
+#define __KVM_TIMER_H
 
 struct kvm_timer {
 	struct hrtimer timer;
@@ -16,3 +18,4 @@ struct kvm_timer_ops {
 
 enum hrtimer_restart kvm_timer_fn(struct hrtimer *data);
 
+#endif
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/5] Remove timer.c
  2009-10-27 16:41 [PATCH 0/5]: Fix kdump under KVM Chris Lalancette
                   ` (2 preceding siblings ...)
  2009-10-27 16:41 ` [PATCH 3/5] Remove references to VCPU in i8254 Chris Lalancette
@ 2009-10-27 16:41 ` Chris Lalancette
  2009-10-27 16:41 ` [PATCH 5/5] Fix kdump under KVM Chris Lalancette
  2009-10-28  9:59 ` [PATCH 0/5]: " Avi Kivity
  5 siblings, 0 replies; 23+ messages in thread
From: Chris Lalancette @ 2009-10-27 16:41 UTC (permalink / raw)
  To: kvm; +Cc: Chris Lalancette

The code in arch/x86/kvm/timer.c is not similar enough between
the various implementations to really share it.  Move the
implementation into the LAPIC code, and then remove timer.c

Signed-off-by: Chris Lalancette <clalance@redhat.com>
---
:100644 100644 31a7035... 8d9adf6... M	arch/x86/kvm/Makefile
:100644 100644 ba39e25... 55b2dab... M	arch/x86/kvm/kvm_timer.h
:100644 100644 cd60c0b... 411110f... M	arch/x86/kvm/lapic.c
:100644 000000 72b5144... 0000000... D	arch/x86/kvm/timer.c
 arch/x86/kvm/Makefile    |    3 +-
 arch/x86/kvm/kvm_timer.h |    3 --
 arch/x86/kvm/lapic.c     |   34 ++++++++++++++++++++++++++++++++-
 arch/x86/kvm/timer.c     |   47 ----------------------------------------------
 4 files changed, 34 insertions(+), 53 deletions(-)

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 31a7035..8d9adf6 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -10,8 +10,7 @@ kvm-y			+= $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
 				assigned-dev.o)
 kvm-$(CONFIG_IOMMU_API)	+= $(addprefix ../../../virt/kvm/, iommu.o)
 
-kvm-y			+= x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
-			   i8254.o timer.o
+kvm-y			+= x86.o mmu.o emulate.o i8259.o irq.o lapic.o i8254.o
 kvm-intel-y		+= vmx.o
 kvm-amd-y		+= svm.o
 
diff --git a/arch/x86/kvm/kvm_timer.h b/arch/x86/kvm/kvm_timer.h
index ba39e25..55b2dab 100644
--- a/arch/x86/kvm/kvm_timer.h
+++ b/arch/x86/kvm/kvm_timer.h
@@ -15,7 +15,4 @@ struct kvm_timer_ops {
         bool (*is_periodic)(struct kvm_timer *);
 };
 
-
-enum hrtimer_restart kvm_timer_fn(struct hrtimer *data);
-
 #endif
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index cd60c0b..411110f 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1040,6 +1040,38 @@ static const struct kvm_io_device_ops apic_mmio_ops = {
 	.write    = apic_mmio_write,
 };
 
+static enum hrtimer_restart lapic_timer_fn(struct hrtimer *data)
+{
+	struct kvm_vcpu *vcpu;
+	struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
+	int restart_timer = 0;
+
+	vcpu = ktimer->vcpu;
+	if (!vcpu)
+		return HRTIMER_NORESTART;
+
+	/*
+	 * There is a race window between reading and incrementing, but we do
+	 * not care about potentially losing timer events in the !reinject
+	 * case anyway.
+	 */
+	if (ktimer->reinject || !atomic_read(&ktimer->pending))
+		atomic_inc(&ktimer->pending);
+
+	if (waitqueue_active(&vcpu->wq))
+		wake_up_interruptible(&vcpu->wq);
+
+	if (ktimer->t_ops->is_periodic(ktimer)) {
+		hrtimer_add_expires_ns(&ktimer->timer, ktimer->period);
+		restart_timer = 1;
+	}
+
+	if (restart_timer)
+		return HRTIMER_RESTART;
+	else
+		return HRTIMER_NORESTART;
+}
+
 int kvm_create_lapic(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic;
@@ -1065,7 +1097,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
 
 	hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
 		     HRTIMER_MODE_ABS);
-	apic->lapic_timer.timer.function = kvm_timer_fn;
+	apic->lapic_timer.timer.function = lapic_timer_fn;
 	apic->lapic_timer.t_ops = &lapic_timer_ops;
 	apic->lapic_timer.kvm = vcpu->kvm;
 	apic->lapic_timer.vcpu = vcpu;
diff --git a/arch/x86/kvm/timer.c b/arch/x86/kvm/timer.c
deleted file mode 100644
index 72b5144..0000000
--- a/arch/x86/kvm/timer.c
+++ /dev/null
@@ -1,47 +0,0 @@
-#include <linux/kvm_host.h>
-#include <linux/kvm.h>
-#include <linux/hrtimer.h>
-#include <asm/atomic.h>
-#include "kvm_timer.h"
-
-static int __kvm_timer_fn(struct kvm_vcpu *vcpu, struct kvm_timer *ktimer)
-{
-	int restart_timer = 0;
-	wait_queue_head_t *q = &vcpu->wq;
-
-	/*
-	 * There is a race window between reading and incrementing, but we do
-	 * not care about potentially loosing timer events in the !reinject
-	 * case anyway.
-	 */
-	if (ktimer->reinject || !atomic_read(&ktimer->pending))
-		atomic_inc(&ktimer->pending);
-
-	if (waitqueue_active(q))
-		wake_up_interruptible(q);
-
-	if (ktimer->t_ops->is_periodic(ktimer)) {
-		hrtimer_add_expires_ns(&ktimer->timer, ktimer->period);
-		restart_timer = 1;
-	}
-
-	return restart_timer;
-}
-
-enum hrtimer_restart kvm_timer_fn(struct hrtimer *data)
-{
-	int restart_timer;
-	struct kvm_vcpu *vcpu;
-	struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
-
-	vcpu = ktimer->vcpu;
-	if (!vcpu)
-		return HRTIMER_NORESTART;
-
-	restart_timer = __kvm_timer_fn(vcpu, ktimer);
-	if (restart_timer)
-		return HRTIMER_RESTART;
-	else
-		return HRTIMER_NORESTART;
-}
-
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 5/5] Fix kdump under KVM.
  2009-10-27 16:41 [PATCH 0/5]: Fix kdump under KVM Chris Lalancette
                   ` (3 preceding siblings ...)
  2009-10-27 16:41 ` [PATCH 4/5] Remove timer.c Chris Lalancette
@ 2009-10-27 16:41 ` Chris Lalancette
  2009-10-27 17:42   ` Marcelo Tosatti
  2009-10-28  9:59 ` [PATCH 0/5]: " Avi Kivity
  5 siblings, 1 reply; 23+ messages in thread
From: Chris Lalancette @ 2009-10-27 16:41 UTC (permalink / raw)
  To: kvm; +Cc: Chris Lalancette

This patch is the main point of the series.  In order for
kdump to properly work inside a KVM guest, we need to make
sure that all VCPUs in virtual wire APIC mode get kicked
to try and pick up the timer interrupts.  To do this,
we iterate over the CPUs and deliver interrupts to the
proper VCPUs.

I don't love the concept of doing kvm_irq_kick_vcpus() from
within pit_timer_fn().  A PIT is not connected to a CPU at all,
only to a PIC or APIC.  However, if a CPU enters idle, this is
the only way to wake it up to check for the interrupt.

Signed-off-by: Chris Lalancette <clalance@redhat.com>
---
:100644 100644 d5c08fa... fe08823... M	arch/x86/kvm/i8254.c
:100644 100644 411110f... 5b699c1... M	arch/x86/kvm/lapic.c
:100644 100644 40010b0... 9c4e52b... M	arch/x86/kvm/lapic.h
:100644 100644 053e49f... 975b0d6... M	include/linux/kvm_host.h
:100644 100644 cd6f92b... 717d265... M	virt/kvm/ioapic.c
:100644 100644 0d454d3... d24ac91... M	virt/kvm/irq_comm.c
 arch/x86/kvm/i8254.c     |    3 +--
 arch/x86/kvm/lapic.c     |   12 ++++++++++++
 arch/x86/kvm/lapic.h     |    1 +
 include/linux/kvm_host.h |    2 ++
 virt/kvm/ioapic.c        |    9 ---------
 virt/kvm/irq_comm.c      |   12 ++++++++++++
 6 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index d5c08fa..fe08823 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -299,8 +299,7 @@ static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
 	if (ktimer->reinject || !atomic_read(&ktimer->pending))
 		atomic_inc(&ktimer->pending);
 
-	if (waitqueue_active(&ktimer->kvm->bsp_vcpu->wq))
-		wake_up_interruptible(&ktimer->kvm->bsp_vcpu->wq);
+	kvm_irq_kick_vcpus(ktimer->kvm);
 
 	if (ktimer->t_ops->is_periodic(ktimer)) {
 		hrtimer_add_expires_ns(&ktimer->timer, ktimer->period);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 411110f..5b699c1 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1031,6 +1031,18 @@ void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu)
 		kvm_apic_local_deliver(apic, APIC_LVT0);
 }
 
+int kvm_apic_in_virtual_wire_mode(struct kvm_vcpu *vcpu)
+{
+	if (kvm_lapic_enabled(vcpu)) {
+		u32 lvt0 = apic_get_reg(vcpu->arch.apic, APIC_LVT0);
+		if ((lvt0 & APIC_LVT_MASKED) == 0 &&
+		    GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
+			return 1;
+	}
+
+	return 0;
+}
+
 static struct kvm_timer_ops lapic_timer_ops = {
 	.is_periodic = lapic_is_periodic,
 };
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 40010b0..9c4e52b 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -30,6 +30,7 @@ void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8);
 void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value);
 u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu);
 void kvm_apic_set_version(struct kvm_vcpu *vcpu);
+int kvm_apic_in_virtual_wire_mode(struct kvm_vcpu *vcpu);
 
 int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest);
 int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 053e49f..975b0d6 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -542,6 +542,8 @@ int kvm_set_irq_routing(struct kvm *kvm,
 			unsigned flags);
 void kvm_free_irq_routing(struct kvm *kvm);
 
+void kvm_irq_kick_vcpus(struct kvm *kvm);
+
 #else
 
 static inline void kvm_free_irq_routing(struct kvm *kvm) {}
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index cd6f92b..717d265 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -163,15 +163,6 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
 	irqe.level = 1;
 	irqe.shorthand = 0;
 
-#ifdef CONFIG_X86
-	/* Always deliver PIT interrupt to vcpu 0 */
-	if (irq == 0) {
-		irqe.dest_mode = 0; /* Physical mode. */
-		/* need to read apic_id from apic regiest since
-		 * it can be rewritten */
-		irqe.dest_id = ioapic->kvm->bsp_vcpu->vcpu_id;
-	}
-#endif
 	return kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe);
 }
 
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 0d454d3..d24ac91 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -293,6 +293,18 @@ void kvm_free_irq_routing(struct kvm *kvm)
 	kfree(kvm->irq_routing);
 }
 
+void kvm_irq_kick_vcpus(struct kvm *kvm)
+{
+	int i;
+	struct kvm_vcpu *vcpu;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (kvm_apic_in_virtual_wire_mode(vcpu))
+			if (waitqueue_active(&vcpu->wq))
+				wake_up_interruptible(&vcpu->wq);
+	}
+}
+
 static int setup_routing_entry(struct kvm_irq_routing_table *rt,
 			       struct kvm_kernel_irq_routing_entry *e,
 			       const struct kvm_irq_routing_entry *ue)
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/5] Fix kdump under KVM.
  2009-10-27 16:41 ` [PATCH 5/5] Fix kdump under KVM Chris Lalancette
@ 2009-10-27 17:42   ` Marcelo Tosatti
  2009-10-28 10:21     ` Chris Lalancette
  0 siblings, 1 reply; 23+ messages in thread
From: Marcelo Tosatti @ 2009-10-27 17:42 UTC (permalink / raw)
  To: Chris Lalancette; +Cc: kvm

On Tue, Oct 27, 2009 at 05:41:07PM +0100, Chris Lalancette wrote:
> This patch is the main point of the series.  In order for
> kdump to properly work inside a KVM guest, we need to make
> sure that all VCPUs in virtual wire APIC mode get kicked
> to try and pick up the timer interrupts.  To do this,
> we iterate over the CPUs and deliver interrupts to the
> proper VCPUs.
> 
> I don't love the concept of doing kvm_irq_kick_vcpus() from
> within pit_timer_fn().  A PIT is not connected to a CPU at all,
> only to a PIC or APIC.  However, if a CPU enters idle, this is
> the only way to wake it up to check for the interrupt.

The reason the PIT interrupt was fixed to BSP is:

http://www.mail-archive.com/kvm-devel@lists.sourceforge.net/msg13250.html

Perhaps enhancing ioapic_deliver's (irq == 0) check to bypass the       
destination overwrite in case its programmed by the guest to 
a single CPU would fix it?




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/5]: Fix kdump under KVM
  2009-10-27 16:41 [PATCH 0/5]: Fix kdump under KVM Chris Lalancette
                   ` (4 preceding siblings ...)
  2009-10-27 16:41 ` [PATCH 5/5] Fix kdump under KVM Chris Lalancette
@ 2009-10-28  9:59 ` Avi Kivity
  2009-10-28 10:13   ` Chris Lalancette
  5 siblings, 1 reply; 23+ messages in thread
From: Avi Kivity @ 2009-10-28  9:59 UTC (permalink / raw)
  To: Chris Lalancette; +Cc: kvm

On 10/27/2009 06:41 PM, Chris Lalancette wrote:
> This patch series aims to get kdump working inside a KVM guest.
> The current problem with using kdump is that KVM always delivers
> PIT interrupts to the BSP, and the BSP only.  While this is
> technically allowed by the MPS spec, most motherboards actually
> deliver timer interrupts to *any* LAPIC in virtual wire mode.
> Since a crash can occur on any CPU, timer interrupts must
> be able to reach any CPU in order for kdump to work properly.
>
> Therefore, this patch series kicks all of the relevant vCPUs
> when delivering a timer interrupt.  With these patches in
> place, kdump in a RHEL-5 guest works properly.
>    

This is pretty expensive on large guests.  However I suppose under 
normal conditions we won't be in virtual wire mode?

The kick from i8254 code is pretty bad, as you mention.  I forget why it 
is needed at all - shouldn't kvm_set_irq() end up kicking the correct 
vcpu (and note the pic still insists on the bsp:)

/*
  * callback when PIC0 irq status changed
  */
static void pic_irq_request(void *opaque, int level)
{
     struct kvm *kvm = opaque;
     struct kvm_vcpu *vcpu = kvm->bsp_vcpu;
     struct kvm_pic *s = pic_irqchip(kvm);
     int irq = pic_get_irq(&s->pics[0]);

     s->output = level;
     if (vcpu && level && (s->pics[0].isr_ack & (1 << irq))) {
         s->pics[0].isr_ack &= ~(1 << irq);
         kvm_vcpu_kick(vcpu);
     }
}

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/5]: Fix kdump under KVM
  2009-10-28  9:59 ` [PATCH 0/5]: " Avi Kivity
@ 2009-10-28 10:13   ` Chris Lalancette
  2009-10-28 10:31     ` Avi Kivity
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Lalancette @ 2009-10-28 10:13 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

Avi Kivity wrote:
> On 10/27/2009 06:41 PM, Chris Lalancette wrote:
>> This patch series aims to get kdump working inside a KVM guest.
>> The current problem with using kdump is that KVM always delivers
>> PIT interrupts to the BSP, and the BSP only.  While this is
>> technically allowed by the MPS spec, most motherboards actually
>> deliver timer interrupts to *any* LAPIC in virtual wire mode.
>> Since a crash can occur on any CPU, timer interrupts must
>> be able to reach any CPU in order for kdump to work properly.
>>
>> Therefore, this patch series kicks all of the relevant vCPUs
>> when delivering a timer interrupt.  With these patches in
>> place, kdump in a RHEL-5 guest works properly.
>>    
> 
> This is pretty expensive on large guests.  However I suppose under 
> normal conditions we won't be in virtual wire mode?

Right, exactly.  When running in normal SMP mode, your LAPIC's are in "Symmetric
I/O Mode", meaning they won't be kicked at all.  It's possible in theory for a
guest OS to program all of it's LAPIC's in virtual wire mode, but that is a
decidedly non-standard setup and not one recommended in any of the MPS
documentation that I've read.

> 
> The kick from i8254 code is pretty bad, as you mention.  I forget why it 
> is needed at all - shouldn't kvm_set_irq() end up kicking the correct 

As I understand it, that's not quite how it works.  From what I can see, what
happens is that the i8254 is programmed as an hrtimer.  When the hrtimer
expires, we get a callback in kvm_timer_fn (or pit_timer_fn, in my new code).
That code is running in interrupt context, however, so you can't directly call
"set_irq" at that point.  Instead, we update the "pending" variable and defer
work until later on.  That "later on" is when we are doing a vcpu_run, at which
point we check the "pending" variable, and if set, inject the interrupt.

The problem is that if the vcpu(s) are in idle when the hrtimer expires, and we
don't kick them, no vcpu will wake up, and hence none of them will ever run
"set_irq" to get it injected into the guest.

If you have other ideas on how we might accomplish this, I'd definitely be
interested in hearing them.

> vcpu (and note the pic still insists on the bsp:)
> 
> /*
>   * callback when PIC0 irq status changed
>   */
> static void pic_irq_request(void *opaque, int level)
> {
>      struct kvm *kvm = opaque;
>      struct kvm_vcpu *vcpu = kvm->bsp_vcpu;
>      struct kvm_pic *s = pic_irqchip(kvm);
>      int irq = pic_get_irq(&s->pics[0]);
> 
>      s->output = level;
>      if (vcpu && level && (s->pics[0].isr_ack & (1 << irq))) {
>          s->pics[0].isr_ack &= ~(1 << irq);
>          kvm_vcpu_kick(vcpu);
>      }
> }
> 

Yes, I looked at this too.  It's another one we could fix by changing
"kvm_vcpu_kick()" to "kvm_irq_kick_vcpus()".  However, it's not exactly required
by kdump since the linux kernel prefers to use the IOAPIC where possible.

-- 
Chris Lalancette

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/5] Fix kdump under KVM.
  2009-10-27 17:42   ` Marcelo Tosatti
@ 2009-10-28 10:21     ` Chris Lalancette
  2009-10-28 12:41       ` Marcelo Tosatti
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Lalancette @ 2009-10-28 10:21 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm

Marcelo Tosatti wrote:
> On Tue, Oct 27, 2009 at 05:41:07PM +0100, Chris Lalancette wrote:
>> This patch is the main point of the series.  In order for
>> kdump to properly work inside a KVM guest, we need to make
>> sure that all VCPUs in virtual wire APIC mode get kicked
>> to try and pick up the timer interrupts.  To do this,
>> we iterate over the CPUs and deliver interrupts to the
>> proper VCPUs.
>>
>> I don't love the concept of doing kvm_irq_kick_vcpus() from
>> within pit_timer_fn().  A PIT is not connected to a CPU at all,
>> only to a PIC or APIC.  However, if a CPU enters idle, this is
>> the only way to wake it up to check for the interrupt.
> 
> The reason the PIT interrupt was fixed to BSP is:
> 
> http://www.mail-archive.com/kvm-devel@lists.sourceforge.net/msg13250.html
> 
> Perhaps enhancing ioapic_deliver's (irq == 0) check to bypass the       
> destination overwrite in case its programmed by the guest to 
> a single CPU would fix it?

Ug, nasty.  I definitely don't want to re-introduce that bug.  What exactly do
you mean by "fix it"?  Do you mean fix the original RHEL-5.1 PAE issue, or fix
the kdump issue?

-- 
Chris Lalancette

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/5]: Fix kdump under KVM
  2009-10-28 10:13   ` Chris Lalancette
@ 2009-10-28 10:31     ` Avi Kivity
  2009-10-29  8:34       ` Chris Lalancette
  0 siblings, 1 reply; 23+ messages in thread
From: Avi Kivity @ 2009-10-28 10:31 UTC (permalink / raw)
  To: Chris Lalancette; +Cc: kvm

On 10/28/2009 12:13 PM, Chris Lalancette wrote:
>
>> The kick from i8254 code is pretty bad, as you mention.  I forget why it
>> is needed at all - shouldn't kvm_set_irq() end up kicking the correct
>>      
> As I understand it, that's not quite how it works.  From what I can see, what
> happens is that the i8254 is programmed as an hrtimer.  When the hrtimer
> expires, we get a callback in kvm_timer_fn (or pit_timer_fn, in my new code).
> That code is running in interrupt context, however, so you can't directly call
> "set_irq" at that point.  Instead, we update the "pending" variable and defer
> work until later on.  That "later on" is when we are doing a vcpu_run, at which
> point we check the "pending" variable, and if set, inject the interrupt.
>
> The problem is that if the vcpu(s) are in idle when the hrtimer expires, and we
> don't kick them, no vcpu will wake up, and hence none of them will ever run
> "set_irq" to get it injected into the guest.
>    

Oh yes, I remember now.

> If you have other ideas on how we might accomplish this, I'd definitely be
> interested in hearing them.
>    

We could schedule work to run in thread context.  Previous to the user 
return notifier work, this would cause a reloading of a bunch of msrs 
and would thus be quite expensive, but now switching to kernel threads 
and back is pretty cheap.

So we could try schedule_work().  My only worry is that it uses 
wake_up() instead of wake_up_sync(), so it could introduce delay.  I'd 
much prefer a variant the schedules immediately.

An alternative is to make irq injection work from irq context.  It's not 
difficult to do technically - convert mutexes to spin_lock_irqsave() - 
I'm just worried about long irqoff times with large guests (ioapic needs 
to iterate over all vcpus sometimes).  This is useful for other cases - 
device assignment interrupt injection, irqfd for vhost/vbus.  So maybe 
we should go this path, and worry about reducing irqoff times later when 
we bump KVM_MAX_VCPUS.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/5] Fix kdump under KVM.
  2009-10-28 10:21     ` Chris Lalancette
@ 2009-10-28 12:41       ` Marcelo Tosatti
  2009-10-30 12:23         ` Chris Lalancette
  0 siblings, 1 reply; 23+ messages in thread
From: Marcelo Tosatti @ 2009-10-28 12:41 UTC (permalink / raw)
  To: Chris Lalancette; +Cc: kvm

On Wed, Oct 28, 2009 at 11:21:42AM +0100, Chris Lalancette wrote:
> Marcelo Tosatti wrote:
> > On Tue, Oct 27, 2009 at 05:41:07PM +0100, Chris Lalancette wrote:
> >> This patch is the main point of the series.  In order for
> >> kdump to properly work inside a KVM guest, we need to make
> >> sure that all VCPUs in virtual wire APIC mode get kicked
> >> to try and pick up the timer interrupts.  To do this,
> >> we iterate over the CPUs and deliver interrupts to the
> >> proper VCPUs.
> >>
> >> I don't love the concept of doing kvm_irq_kick_vcpus() from
> >> within pit_timer_fn().  A PIT is not connected to a CPU at all,
> >> only to a PIC or APIC.  However, if a CPU enters idle, this is
> >> the only way to wake it up to check for the interrupt.
> > 
> > The reason the PIT interrupt was fixed to BSP is:
> > 
> > http://www.mail-archive.com/kvm-devel@lists.sourceforge.net/msg13250.html
> > 
> > Perhaps enhancing ioapic_deliver's (irq == 0) check to bypass the       
> > destination overwrite in case its programmed by the guest to 
> > a single CPU would fix it?
> 
> Ug, nasty.  I definitely don't want to re-introduce that bug.  What exactly do
> you mean by "fix it"?  Do you mean fix the original RHEL-5.1 PAE issue, or fix
> the kdump issue?

The kdump issue. Something like:

#ifdef CONFIG_X86
        /* Always delivery PIT interrupt to vcpu 0 */
        if (irq == 0 && dest_multiple_vcpus(entry)) {
                irqe.dest_mode = 0; /* Physical mode. */
                /* need to read apic_id from apic regiest since
                 * it can be rewritten */
                irqe.dest_id = ioapic->kvm->bsp_vcpu->vcpu_id;
        }
#endif

The rest of the code should be ready to deal with it.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/5]: Fix kdump under KVM
  2009-10-28 10:31     ` Avi Kivity
@ 2009-10-29  8:34       ` Chris Lalancette
  2009-10-29 10:15         ` Avi Kivity
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Lalancette @ 2009-10-29  8:34 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

Avi Kivity wrote:
> On 10/28/2009 12:13 PM, Chris Lalancette wrote:
>>> The kick from i8254 code is pretty bad, as you mention.  I forget why it
>>> is needed at all - shouldn't kvm_set_irq() end up kicking the correct
>>>      
>> As I understand it, that's not quite how it works.  From what I can see, what
>> happens is that the i8254 is programmed as an hrtimer.  When the hrtimer
>> expires, we get a callback in kvm_timer_fn (or pit_timer_fn, in my new code).
>> That code is running in interrupt context, however, so you can't directly call
>> "set_irq" at that point.  Instead, we update the "pending" variable and defer
>> work until later on.  That "later on" is when we are doing a vcpu_run, at which
>> point we check the "pending" variable, and if set, inject the interrupt.
>>
>> The problem is that if the vcpu(s) are in idle when the hrtimer expires, and we
>> don't kick them, no vcpu will wake up, and hence none of them will ever run
>> "set_irq" to get it injected into the guest.
>>    
> 
> Oh yes, I remember now.
> 
>> If you have other ideas on how we might accomplish this, I'd definitely be
>> interested in hearing them.
>>    
> 
> We could schedule work to run in thread context.  Previous to the user 
> return notifier work, this would cause a reloading of a bunch of msrs 
> and would thus be quite expensive, but now switching to kernel threads 
> and back is pretty cheap.
> 
> So we could try schedule_work().  My only worry is that it uses 
> wake_up() instead of wake_up_sync(), so it could introduce delay.  I'd 
> much prefer a variant the schedules immediately.
> 
> An alternative is to make irq injection work from irq context.  It's not 
> difficult to do technically - convert mutexes to spin_lock_irqsave() - 
> I'm just worried about long irqoff times with large guests (ioapic needs 
> to iterate over all vcpus sometimes).  This is useful for other cases - 
> device assignment interrupt injection, irqfd for vhost/vbus.  So maybe 
> we should go this path, and worry about reducing irqoff times later when 
> we bump KVM_MAX_VCPUS.
> 

I'm starting to take a look at this.  While this may be a generically useful
cleanup, it occurs to me that even if we call "set_irq" from the hrtimer
callback, we still have to do the "kick_vcpus" type thing.  Otherwise, we still
have the problem where if all vcpus are idle, none of them will be doing
vcpu_run anytime soon to actually inject the interrupt into the guest.

Or did I mis-understand what you are proposing?

-- 
Chris Lalancette

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/5]: Fix kdump under KVM
  2009-10-29  8:34       ` Chris Lalancette
@ 2009-10-29 10:15         ` Avi Kivity
  0 siblings, 0 replies; 23+ messages in thread
From: Avi Kivity @ 2009-10-29 10:15 UTC (permalink / raw)
  To: Chris Lalancette; +Cc: kvm

On 10/29/2009 10:34 AM, Chris Lalancette wrote:
>
> I'm starting to take a look at this.  While this may be a generically useful
> cleanup, it occurs to me that even if we call "set_irq" from the hrtimer
> callback, we still have to do the "kick_vcpus" type thing.  Otherwise, we still
> have the problem where if all vcpus are idle, none of them will be doing
> vcpu_run anytime soon to actually inject the interrupt into the guest.
>
> Or did I mis-understand what you are proposing?
>    

The pic and ioapic code will kick vcpus that they deliver interrupts to, 
that's the natural order of things.  It was the kick from the timer 
(which is only connected indirectly to vcpus via the pic/ioapic) which 
is hacky and special.

As a bonus, if the interrupt is masked, no vcpus will be kicked.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/5] Fix kdump under KVM.
  2009-10-28 12:41       ` Marcelo Tosatti
@ 2009-10-30 12:23         ` Chris Lalancette
  2009-10-30 13:07           ` Marcelo Tosatti
  2009-11-01 15:21           ` Avi Kivity
  0 siblings, 2 replies; 23+ messages in thread
From: Chris Lalancette @ 2009-10-30 12:23 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm

Marcelo Tosatti wrote:
> On Wed, Oct 28, 2009 at 11:21:42AM +0100, Chris Lalancette wrote:
>> Marcelo Tosatti wrote:
>>> On Tue, Oct 27, 2009 at 05:41:07PM +0100, Chris Lalancette wrote:
>>>> This patch is the main point of the series.  In order for
>>>> kdump to properly work inside a KVM guest, we need to make
>>>> sure that all VCPUs in virtual wire APIC mode get kicked
>>>> to try and pick up the timer interrupts.  To do this,
>>>> we iterate over the CPUs and deliver interrupts to the
>>>> proper VCPUs.
>>>>
>>>> I don't love the concept of doing kvm_irq_kick_vcpus() from
>>>> within pit_timer_fn().  A PIT is not connected to a CPU at all,
>>>> only to a PIC or APIC.  However, if a CPU enters idle, this is
>>>> the only way to wake it up to check for the interrupt.
>>> The reason the PIT interrupt was fixed to BSP is:
>>>
>>> http://www.mail-archive.com/kvm-devel@lists.sourceforge.net/msg13250.html
>>>
>>> Perhaps enhancing ioapic_deliver's (irq == 0) check to bypass the       
>>> destination overwrite in case its programmed by the guest to 
>>> a single CPU would fix it?
>> Ug, nasty.  I definitely don't want to re-introduce that bug.  What exactly do
>> you mean by "fix it"?  Do you mean fix the original RHEL-5.1 PAE issue, or fix
>> the kdump issue?
> 
> The kdump issue. Something like:
> 
> #ifdef CONFIG_X86
>         /* Always delivery PIT interrupt to vcpu 0 */
>         if (irq == 0 && dest_multiple_vcpus(entry)) {
>                 irqe.dest_mode = 0; /* Physical mode. */
>                 /* need to read apic_id from apic regiest since
>                  * it can be rewritten */
>                 irqe.dest_id = ioapic->kvm->bsp_vcpu->vcpu_id;
>         }
> #endif
> 
> The rest of the code should be ready to deal with it.

Do you have any more information about how to reproduce that original issue?  I
tried installing a RHEL-5.1 i386 guest with a PAE kernel, and I was able to
install and boot it just fine with or without the "always deliver irq 0 to vcpu
0" check above.  Either there's something in particular I need to do to trigger
the issue (hardware or software), or it's being solved another way.

In the meantime, I've gotten the "set_irq from IRQ context" that Avi suggested
working, and the fixing up of this IOAPIC check is the last bit to actually get
kdump working.

-- 
Chris Lalancette

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/5] Fix kdump under KVM.
  2009-10-30 12:23         ` Chris Lalancette
@ 2009-10-30 13:07           ` Marcelo Tosatti
  2009-10-30 15:28             ` Chris Lalancette
  2009-11-01 15:21           ` Avi Kivity
  1 sibling, 1 reply; 23+ messages in thread
From: Marcelo Tosatti @ 2009-10-30 13:07 UTC (permalink / raw)
  To: Chris Lalancette; +Cc: kvm

On Fri, Oct 30, 2009 at 01:23:57PM +0100, Chris Lalancette wrote:
> Marcelo Tosatti wrote:
> > On Wed, Oct 28, 2009 at 11:21:42AM +0100, Chris Lalancette wrote:
> >> Marcelo Tosatti wrote:
> >>> On Tue, Oct 27, 2009 at 05:41:07PM +0100, Chris Lalancette wrote:
> >>>> This patch is the main point of the series.  In order for
> >>>> kdump to properly work inside a KVM guest, we need to make
> >>>> sure that all VCPUs in virtual wire APIC mode get kicked
> >>>> to try and pick up the timer interrupts.  To do this,
> >>>> we iterate over the CPUs and deliver interrupts to the
> >>>> proper VCPUs.
> >>>>
> >>>> I don't love the concept of doing kvm_irq_kick_vcpus() from
> >>>> within pit_timer_fn().  A PIT is not connected to a CPU at all,
> >>>> only to a PIC or APIC.  However, if a CPU enters idle, this is
> >>>> the only way to wake it up to check for the interrupt.
> >>> The reason the PIT interrupt was fixed to BSP is:
> >>>
> >>> http://www.mail-archive.com/kvm-devel@lists.sourceforge.net/msg13250.html
> >>>
> >>> Perhaps enhancing ioapic_deliver's (irq == 0) check to bypass the       
> >>> destination overwrite in case its programmed by the guest to 
> >>> a single CPU would fix it?
> >> Ug, nasty.  I definitely don't want to re-introduce that bug.  What exactly do
> >> you mean by "fix it"?  Do you mean fix the original RHEL-5.1 PAE issue, or fix
> >> the kdump issue?
> > 
> > The kdump issue. Something like:
> > 
> > #ifdef CONFIG_X86
> >         /* Always delivery PIT interrupt to vcpu 0 */
> >         if (irq == 0 && dest_multiple_vcpus(entry)) {
> >                 irqe.dest_mode = 0; /* Physical mode. */
> >                 /* need to read apic_id from apic regiest since
> >                  * it can be rewritten */
> >                 irqe.dest_id = ioapic->kvm->bsp_vcpu->vcpu_id;
> >         }
> > #endif
> > 
> > The rest of the code should be ready to deal with it.
> 
> Do you have any more information about how to reproduce that original issue?  I
> tried installing a RHEL-5.1 i386 guest with a PAE kernel, and I was able to
> install and boot it just fine with or without the "always deliver irq 0 to vcpu
> 0" check above.  Either there's something in particular I need to do to trigger
> the issue (hardware or software), or it's being solved another way.

For VMX, the guests TSC's are now usually synchronized (which was not
the case before, when this patch was written). For AMD, they start out
of sync.

Migration also causes them to be out of sync. So i'd prefer to postpone
the removal of this limitation until there is a guarantee the TSCs 
are synchronized across vcpus.

> In the meantime, I've gotten the "set_irq from IRQ context" that Avi suggested
> working, and the fixing up of this IOAPIC check is the last bit to actually get
> kdump working.

The kdump fix does not require that, I believe (although it is
useful/required for other things, please send the patch separately).

The hrtimer will fire on vcpu0, then:

kvm_inject_pit_interrupts -> ioapic_set_irq -> apic_deliver ->
kvm_vcpu_kick(vcpuN)

As long as the dest-id = vcpu_id overwrite change is in place.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/5] Fix kdump under KVM.
  2009-10-30 13:07           ` Marcelo Tosatti
@ 2009-10-30 15:28             ` Chris Lalancette
  2009-10-30 18:03               ` David S. Ahern
  2009-10-30 22:19               ` Marcelo Tosatti
  0 siblings, 2 replies; 23+ messages in thread
From: Chris Lalancette @ 2009-10-30 15:28 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, Avi Kivity

[-- Attachment #1: Type: text/plain, Size: 4090 bytes --]

Marcelo Tosatti wrote:
> On Fri, Oct 30, 2009 at 01:23:57PM +0100, Chris Lalancette wrote:
>> Marcelo Tosatti wrote:
>>> On Wed, Oct 28, 2009 at 11:21:42AM +0100, Chris Lalancette wrote:
>>>> Marcelo Tosatti wrote:
>>>>> On Tue, Oct 27, 2009 at 05:41:07PM +0100, Chris Lalancette wrote:
>>>>>> This patch is the main point of the series.  In order for
>>>>>> kdump to properly work inside a KVM guest, we need to make
>>>>>> sure that all VCPUs in virtual wire APIC mode get kicked
>>>>>> to try and pick up the timer interrupts.  To do this,
>>>>>> we iterate over the CPUs and deliver interrupts to the
>>>>>> proper VCPUs.
>>>>>>
>>>>>> I don't love the concept of doing kvm_irq_kick_vcpus() from
>>>>>> within pit_timer_fn().  A PIT is not connected to a CPU at all,
>>>>>> only to a PIC or APIC.  However, if a CPU enters idle, this is
>>>>>> the only way to wake it up to check for the interrupt.
>>>>> The reason the PIT interrupt was fixed to BSP is:
>>>>>
>>>>> http://www.mail-archive.com/kvm-devel@lists.sourceforge.net/msg13250.html
>>>>>
>>>>> Perhaps enhancing ioapic_deliver's (irq == 0) check to bypass the       
>>>>> destination overwrite in case its programmed by the guest to 
>>>>> a single CPU would fix it?
>>>> Ug, nasty.  I definitely don't want to re-introduce that bug.  What exactly do
>>>> you mean by "fix it"?  Do you mean fix the original RHEL-5.1 PAE issue, or fix
>>>> the kdump issue?
>>> The kdump issue. Something like:
>>>
>>> #ifdef CONFIG_X86
>>>         /* Always delivery PIT interrupt to vcpu 0 */
>>>         if (irq == 0 && dest_multiple_vcpus(entry)) {
>>>                 irqe.dest_mode = 0; /* Physical mode. */
>>>                 /* need to read apic_id from apic regiest since
>>>                  * it can be rewritten */
>>>                 irqe.dest_id = ioapic->kvm->bsp_vcpu->vcpu_id;
>>>         }
>>> #endif
>>>
>>> The rest of the code should be ready to deal with it.
>> Do you have any more information about how to reproduce that original issue?  I
>> tried installing a RHEL-5.1 i386 guest with a PAE kernel, and I was able to
>> install and boot it just fine with or without the "always deliver irq 0 to vcpu
>> 0" check above.  Either there's something in particular I need to do to trigger
>> the issue (hardware or software), or it's being solved another way.
> 
> For VMX, the guests TSC's are now usually synchronized (which was not
> the case before, when this patch was written). For AMD, they start out
> of sync.
> 
> Migration also causes them to be out of sync. So i'd prefer to postpone
> the removal of this limitation until there is a guarantee the TSCs 
> are synchronized across vcpus.

OK.  I'll try to get on an unsynchronized TSC machine and see if I can reproduce
there.

> 
>> In the meantime, I've gotten the "set_irq from IRQ context" that Avi suggested
>> working, and the fixing up of this IOAPIC check is the last bit to actually get
>> kdump working.
> 
> The kdump fix does not require that, I believe (although it is
> useful/required for other things, please send the patch separately).
> 
> The hrtimer will fire on vcpu0, then:
> 
> kvm_inject_pit_interrupts -> ioapic_set_irq -> apic_deliver ->
> kvm_vcpu_kick(vcpuN)
> 
> As long as the dest-id = vcpu_id overwrite change is in place.

Just to follow up here, I am taking your suggestion about testing for dest_id to
multiple vcpus in the IOAPIC, and it seems to be working so far.  However, that
is not enough to fix kdump.  We also need to remove the BSP assumptions in the
i8254 code, otherwise we will never even attempt to deliver the interrupt to
!vcpu0.  That means that we need code for Avi's suggestion about kvm_set_irq for
this to all work.

Attached is the patch that I'm currently testing if anyone else wants to look at
it.  I obviously need to break it up into a nice series, but that will have to
wait until next week.  My "ioapic_dest_multiple_cpus()" function could do with a
"popcnt", but I couldn't find a macro after a quick look around the tree.
What's the recommended way to count bits in Linux?

-- 
Chris Lalancette

[-- Attachment #2: current.patch --]
[-- Type: application/mbox, Size: 24559 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/5] Fix kdump under KVM.
  2009-10-30 15:28             ` Chris Lalancette
@ 2009-10-30 18:03               ` David S. Ahern
  2009-10-30 22:21                 ` Marcelo Tosatti
  2009-10-30 22:19               ` Marcelo Tosatti
  1 sibling, 1 reply; 23+ messages in thread
From: David S. Ahern @ 2009-10-30 18:03 UTC (permalink / raw)
  To: Chris Lalancette; +Cc: Marcelo Tosatti, kvm, Avi Kivity


On 10/30/2009 09:28 AM, Chris Lalancette wrote:
>>
>> For VMX, the guests TSC's are now usually synchronized (which was not
>> the case before, when this patch was written). For AMD, they start out
>> of sync.
>>
>> Migration also causes them to be out of sync. So i'd prefer to postpone
>> the removal of this limitation until there is a guarantee the TSCs 
>> are synchronized across vcpus.
> 
> OK.  I'll try to get on an unsynchronized TSC machine and see if I can reproduce
> there.

Perhaps Marcelo was referring to his patchset from December 2008
targeting TSC synchronization?

http://www.mail-archive.com/kvm@vger.kernel.org/msg08071.html

David

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/5] Fix kdump under KVM.
  2009-10-30 15:28             ` Chris Lalancette
  2009-10-30 18:03               ` David S. Ahern
@ 2009-10-30 22:19               ` Marcelo Tosatti
  1 sibling, 0 replies; 23+ messages in thread
From: Marcelo Tosatti @ 2009-10-30 22:19 UTC (permalink / raw)
  To: Chris Lalancette; +Cc: kvm, Avi Kivity

On Fri, Oct 30, 2009 at 04:28:01PM +0100, Chris Lalancette wrote:
> Marcelo Tosatti wrote:
> > On Fri, Oct 30, 2009 at 01:23:57PM +0100, Chris Lalancette wrote:
> >> Marcelo Tosatti wrote:
> >>> On Wed, Oct 28, 2009 at 11:21:42AM +0100, Chris Lalancette wrote:
> >>>> Marcelo Tosatti wrote:
> >>>>> On Tue, Oct 27, 2009 at 05:41:07PM +0100, Chris Lalancette wrote:
> >>>>>> This patch is the main point of the series.  In order for
> >>>>>> kdump to properly work inside a KVM guest, we need to make
> >>>>>> sure that all VCPUs in virtual wire APIC mode get kicked
> >>>>>> to try and pick up the timer interrupts.  To do this,
> >>>>>> we iterate over the CPUs and deliver interrupts to the
> >>>>>> proper VCPUs.
> >>>>>>
> >>>>>> I don't love the concept of doing kvm_irq_kick_vcpus() from
> >>>>>> within pit_timer_fn().  A PIT is not connected to a CPU at all,
> >>>>>> only to a PIC or APIC.  However, if a CPU enters idle, this is
> >>>>>> the only way to wake it up to check for the interrupt.
> >>>>> The reason the PIT interrupt was fixed to BSP is:
> >>>>>
> >>>>> http://www.mail-archive.com/kvm-devel@lists.sourceforge.net/msg13250.html
> >>>>>
> >>>>> Perhaps enhancing ioapic_deliver's (irq == 0) check to bypass the       
> >>>>> destination overwrite in case its programmed by the guest to 
> >>>>> a single CPU would fix it?
> >>>> Ug, nasty.  I definitely don't want to re-introduce that bug.  What exactly do
> >>>> you mean by "fix it"?  Do you mean fix the original RHEL-5.1 PAE issue, or fix
> >>>> the kdump issue?
> >>> The kdump issue. Something like:
> >>>
> >>> #ifdef CONFIG_X86
> >>>         /* Always delivery PIT interrupt to vcpu 0 */
> >>>         if (irq == 0 && dest_multiple_vcpus(entry)) {
> >>>                 irqe.dest_mode = 0; /* Physical mode. */
> >>>                 /* need to read apic_id from apic regiest since
> >>>                  * it can be rewritten */
> >>>                 irqe.dest_id = ioapic->kvm->bsp_vcpu->vcpu_id;
> >>>         }
> >>> #endif
> >>>
> >>> The rest of the code should be ready to deal with it.
> >> Do you have any more information about how to reproduce that original issue?  I
> >> tried installing a RHEL-5.1 i386 guest with a PAE kernel, and I was able to
> >> install and boot it just fine with or without the "always deliver irq 0 to vcpu
> >> 0" check above.  Either there's something in particular I need to do to trigger
> >> the issue (hardware or software), or it's being solved another way.
> > 
> > For VMX, the guests TSC's are now usually synchronized (which was not
> > the case before, when this patch was written). For AMD, they start out
> > of sync.
> > 
> > Migration also causes them to be out of sync. So i'd prefer to postpone
> > the removal of this limitation until there is a guarantee the TSCs 
> > are synchronized across vcpus.
> 
> OK.  I'll try to get on an unsynchronized TSC machine and see if I can reproduce
> there.
> 
> > 
> >> In the meantime, I've gotten the "set_irq from IRQ context" that Avi suggested
> >> working, and the fixing up of this IOAPIC check is the last bit to actually get
> >> kdump working.
> > 
> > The kdump fix does not require that, I believe (although it is
> > useful/required for other things, please send the patch separately).
> > 
> > The hrtimer will fire on vcpu0, then:
> > 
> > kvm_inject_pit_interrupts -> ioapic_set_irq -> apic_deliver ->
> > kvm_vcpu_kick(vcpuN)
> > 
> > As long as the dest-id = vcpu_id overwrite change is in place.
> 
> Just to follow up here, I am taking your suggestion about testing for dest_id to
> multiple vcpus in the IOAPIC, and it seems to be working so far.  However, that
> is not enough to fix kdump.  We also need to remove the BSP assumptions in the
> i8254 code, otherwise we will never even attempt to deliver the interrupt to
> !vcpu0.

But the limitation is only in ioapic_deliver. vcpu0 will process the
hrtimer interrupt on its entry path (through __inject_pit_timer_intr),
call into the ioapic code via kvm_set_irq, and follow the normal apic
interrupt delivery path (which includes waking up the target vcpu(s)).

> That means that we need code for Avi's suggestion about kvm_set_irq for
> this to all work.

What happens now is the kvm_set_irq work is "scheduled" to execute in
vcpu0 entry context.

Regarding KVM_REQ_PENDING_TIMER, it is implicitly used in
vcpu_enter_guest here:

        if (vcpu->requests || need_resched() || signal_pending(current))

> Attached is the patch that I'm currently testing if anyone else wants to look at
> it.  I obviously need to break it up into a nice series, but that will have to
> wait until next week.  My "ioapic_dest_multiple_cpus()" function could do with a
> "popcnt", but I couldn't find a macro after a quick look around the tree.
> What's the recommended way to count bits in Linux?

There's a hweight_long helper.

Hum, suppose its safe to limit the irq0->bsp override to ioapic delivery
mode lowest priority (so you don't need to compare IOAPIC destination id
with the local apic ID's).


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/5] Fix kdump under KVM.
  2009-10-30 18:03               ` David S. Ahern
@ 2009-10-30 22:21                 ` Marcelo Tosatti
  0 siblings, 0 replies; 23+ messages in thread
From: Marcelo Tosatti @ 2009-10-30 22:21 UTC (permalink / raw)
  To: David S. Ahern; +Cc: Chris Lalancette, kvm, Avi Kivity

On Fri, Oct 30, 2009 at 12:03:20PM -0600, David S. Ahern wrote:
> 
> On 10/30/2009 09:28 AM, Chris Lalancette wrote:
> >>
> >> For VMX, the guests TSC's are now usually synchronized (which was not
> >> the case before, when this patch was written). For AMD, they start out
> >> of sync.
> >>
> >> Migration also causes them to be out of sync. So i'd prefer to postpone
> >> the removal of this limitation until there is a guarantee the TSCs 
> >> are synchronized across vcpus.
> > 
> > OK.  I'll try to get on an unsynchronized TSC machine and see if I can reproduce
> > there.
> 
> Perhaps Marcelo was referring to his patchset from December 2008
> targeting TSC synchronization?
> 
> http://www.mail-archive.com/kvm@vger.kernel.org/msg08071.html
> 
> David

Yes, unsynchronized TSCs in the guest perspective.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/5] Fix kdump under KVM.
  2009-10-30 12:23         ` Chris Lalancette
  2009-10-30 13:07           ` Marcelo Tosatti
@ 2009-11-01 15:21           ` Avi Kivity
  2009-11-02 13:22             ` Chris Lalancette
  1 sibling, 1 reply; 23+ messages in thread
From: Avi Kivity @ 2009-11-01 15:21 UTC (permalink / raw)
  To: Chris Lalancette; +Cc: Marcelo Tosatti, kvm

On 10/30/2009 02:23 PM, Chris Lalancette wrote:
> In the meantime, I've gotten the "set_irq from IRQ context" that Avi 
> suggested
> working, and the fixing up of this IOAPIC check is the last bit to actually get
> kdump working.
>    

There are two problems with that:

- kvm_set_irq() calls some notifiers (irq_mask_notifiers, at least, but 
possibly more); need to make sure those are safe from irq context
- kvm_set_irq() will loop on all vcpus, usually incurring a cache miss 
or two; with large vcpu counts this can make the irq-off time quite 
large, hurting our worst-case latency

we can defer the second problem since it's only a performance issue, but 
how did you deal with the first?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/5] Fix kdump under KVM.
  2009-11-01 15:21           ` Avi Kivity
@ 2009-11-02 13:22             ` Chris Lalancette
  2009-11-02 13:30               ` Avi Kivity
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Lalancette @ 2009-11-02 13:22 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, kvm

Avi Kivity wrote:
> On 10/30/2009 02:23 PM, Chris Lalancette wrote:
>> In the meantime, I've gotten the "set_irq from IRQ context" that Avi 
>> suggested
>> working, and the fixing up of this IOAPIC check is the last bit to actually get
>> kdump working.
>>    
> 
> There are two problems with that:
> 
> - kvm_set_irq() calls some notifiers (irq_mask_notifiers, at least, but 
> possibly more); need to make sure those are safe from irq context
> - kvm_set_irq() will loop on all vcpus, usually incurring a cache miss 
> or two; with large vcpu counts this can make the irq-off time quite 
> large, hurting our worst-case latency
> 
> we can defer the second problem since it's only a performance issue, but 
> how did you deal with the first?
> 

Actually, there is a problem here, but I don't think it's as you describe.
kvm_set_irq() doesn't directly call the mask notifiers; all it really does is
call the i8259/ioapic callback, which causes an interrupt to be raised on the
LAPIC in question, which causes the interrupt to be injected into the VCPU on
the next run.  That doesn't call the irq_mask_notifiers, and seems to be
IRQ-safe (with the conversion from mutex's to spin_locks).

The irq mask notifiers are called later on during an EOI.  In particular,
ioapic_mmio_write() -> ioapic_write_indirect(), which is protected by the
ioapic_lock.  I don't see a problem here (although please point it out if I
missed it).

The problem I do see has to do with the irq_ack notifiers.  In particular, if
you are on ia64 and ioapic_mmio_write() gets called for the IOAPIC_REG_EOI
address, we call __kvm_ioapic_update_eoi().  In that case, we drop the spin_lock
and call kvm_notify_acked_irq(), which can re-enter the ioapic code.  This could
be IRQ unsafe, after we return from kvm_notify_acked_irq() we re-acquire the
spin-lock, but in an irq-unsafe fashion.  Therefore we could take the spin-lock,
get interrupted for the timer interrupt, and try to re-acquire the lock, causing
a deadlock.  I'll have to think a bit more on how to solve that one (there's a
similar situation going on in the i8259, which is not ia64 specific).

So, either I'm missing exactly what you are talking about, or there's no
deadlock with my patch (except with ia64).  Can you explain further?

-- 
Chris Lalancette

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/5] Fix kdump under KVM.
  2009-11-02 13:22             ` Chris Lalancette
@ 2009-11-02 13:30               ` Avi Kivity
  0 siblings, 0 replies; 23+ messages in thread
From: Avi Kivity @ 2009-11-02 13:30 UTC (permalink / raw)
  To: Chris Lalancette; +Cc: Marcelo Tosatti, kvm

On 11/02/2009 03:22 PM, Chris Lalancette wrote:
> Avi Kivity wrote:
>    
>> On 10/30/2009 02:23 PM, Chris Lalancette wrote:
>>      
>>> In the meantime, I've gotten the "set_irq from IRQ context" that Avi
>>> suggested
>>> working, and the fixing up of this IOAPIC check is the last bit to actually get
>>> kdump working.
>>>
>>>        
>> There are two problems with that:
>>
>> - kvm_set_irq() calls some notifiers (irq_mask_notifiers, at least, but
>> possibly more); need to make sure those are safe from irq context
>> - kvm_set_irq() will loop on all vcpus, usually incurring a cache miss
>> or two; with large vcpu counts this can make the irq-off time quite
>> large, hurting our worst-case latency
>>
>> we can defer the second problem since it's only a performance issue, but
>> how did you deal with the first?
>>
>>      
> Actually, there is a problem here, but I don't think it's as you describe.
> kvm_set_irq() doesn't directly call the mask notifiers; all it really does is
> call the i8259/ioapic callback, which causes an interrupt to be raised on the
> LAPIC in question, which causes the interrupt to be injected into the VCPU on
> the next run.  That doesn't call the irq_mask_notifiers, and seems to be
> IRQ-safe (with the conversion from mutex's to spin_locks).
>
> The irq mask notifiers are called later on during an EOI.  In particular,
> ioapic_mmio_write() ->  ioapic_write_indirect(), which is protected by the
> ioapic_lock.  I don't see a problem here (although please point it out if I
> missed it).
>    

If we convert the ioapic lock to spin_lock_irq, then mask notifiers will 
be called under that lock.  It's true that mask notifiers aren't called 
directly from the kvm_set_irq path, but if we change the lock, all paths 
are affected.

> The problem I do see has to do with the irq_ack notifiers.  In particular, if
> you are on ia64 and ioapic_mmio_write() gets called for the IOAPIC_REG_EOI
> address, we call __kvm_ioapic_update_eoi().  In that case, we drop the spin_lock
> and call kvm_notify_acked_irq(), which can re-enter the ioapic code.  This could
> be IRQ unsafe, after we return from kvm_notify_acked_irq() we re-acquire the
> spin-lock, but in an irq-unsafe fashion.  Therefore we could take the spin-lock,
> get interrupted for the timer interrupt, and try to re-acquire the lock, causing
> a deadlock.  I'll have to think a bit more on how to solve that one (there's a
> similar situation going on in the i8259, which is not ia64 specific).
>    

We have to reaquire the lock with spin_lock_irqsave() to avoid the deadlock.

> So, either I'm missing exactly what you are talking about, or there's no
> deadlock with my patch (except with ia64).  Can you explain further

EOI is also called from x86 (though the lapic).

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2009-11-02 13:30 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-10-27 16:41 [PATCH 0/5]: Fix kdump under KVM Chris Lalancette
2009-10-27 16:41 ` [PATCH 1/5] Fix up some comments around the source tree Chris Lalancette
2009-10-27 16:41 ` [PATCH 2/5] Remove KVM_REQ_PENDING_TIMER Chris Lalancette
2009-10-27 16:41 ` [PATCH 3/5] Remove references to VCPU in i8254 Chris Lalancette
2009-10-27 16:41 ` [PATCH 4/5] Remove timer.c Chris Lalancette
2009-10-27 16:41 ` [PATCH 5/5] Fix kdump under KVM Chris Lalancette
2009-10-27 17:42   ` Marcelo Tosatti
2009-10-28 10:21     ` Chris Lalancette
2009-10-28 12:41       ` Marcelo Tosatti
2009-10-30 12:23         ` Chris Lalancette
2009-10-30 13:07           ` Marcelo Tosatti
2009-10-30 15:28             ` Chris Lalancette
2009-10-30 18:03               ` David S. Ahern
2009-10-30 22:21                 ` Marcelo Tosatti
2009-10-30 22:19               ` Marcelo Tosatti
2009-11-01 15:21           ` Avi Kivity
2009-11-02 13:22             ` Chris Lalancette
2009-11-02 13:30               ` Avi Kivity
2009-10-28  9:59 ` [PATCH 0/5]: " Avi Kivity
2009-10-28 10:13   ` Chris Lalancette
2009-10-28 10:31     ` Avi Kivity
2009-10-29  8:34       ` Chris Lalancette
2009-10-29 10:15         ` Avi Kivity

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.