* [PATCH 0/4] MSI-X Enabling
@ 2008-12-23  8:12 Sheng Yang
  2008-12-23  8:12 ` [PATCH 1/4] KVM: Using kfifo for irq recording Sheng Yang
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Sheng Yang @ 2008-12-23  8:12 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

Hi Avi

This patchset enables MSI-X support.

The main change is a new kind of fake device - an intercepted MMIO device -
added to intercept the single page that contains the MSI-X entry table.

I also considered doing this interception in userspace, but I think it would
be more complex to push it there, since we also have to prevent KVM from
creating a page table mapping for the page. In the future, we will also add
hooks to the read/write operations to handle the mask bit of each MSI-X entry,
and maybe the pending bits as well. So I decided to put it in the kernel.

Another thing is that I am not very confident about reordering the handling
of guest memory and MMIO in the emulated read/write functions. Would it cause
any side effects?

Thanks!
--
regards
Yang, Sheng


* [PATCH 1/4] KVM: Using kfifo for irq recording
  2008-12-23  8:12 [PATCH 0/4] MSI-X Enabling Sheng Yang
@ 2008-12-23  8:12 ` Sheng Yang
  2008-12-25 11:07   ` Avi Kivity
  2008-12-23  8:12 ` [PATCH 2/4] KVM: Add intercepted MMIO for KVM Sheng Yang
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Sheng Yang @ 2008-12-23  8:12 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Sheng Yang

For MSI-X, we have to deal with multiple IRQs sharing the same IRQ handler, so
it's necessary to record which IRQ triggered the handler.
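
A stripped-down sketch of the pattern used here, against the older kfifo
interface this tree provides (names are illustrative only; the fifo is assumed
to have been set up with kfifo_alloc(), as the patch does):

#include <linux/kfifo.h>
#include <linux/spinlock.h>

/*
 * Illustrative only: the hard irq handler records which vector fired,
 * and the deferred work handler replays the recorded vectors.  The fifo
 * is assumed to be allocated elsewhere with
 * kfifo_alloc(size, GFP_KERNEL, &example_lock).
 */
static struct kfifo *example_fifo;
static spinlock_t example_lock;

/* Producer side: called from the interrupt handler. */
static void example_record_irq(int irq)
{
	kfifo_put(example_fifo, (unsigned char *)&irq, sizeof(int));
}

/* Consumer side: called from the work handler until the fifo is empty. */
static int example_replay_irq(int *irq)
{
	return kfifo_get(example_fifo, (unsigned char *)irq, sizeof(int));
}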

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 include/linux/kvm_host.h |    4 ++++
 virt/kvm/kvm_main.c      |   30 +++++++++++++++++++++++++++---
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5b671b6..541ccaf 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -17,6 +17,7 @@
 #include <linux/preempt.h>
 #include <linux/marker.h>
 #include <linux/msi.h>
+#include <linux/kfifo.h>
 #include <asm/signal.h>
 
 #include <linux/kvm.h>
@@ -313,6 +314,9 @@ struct kvm_assigned_dev_kernel {
 	int host_irq;
 	bool host_irq_disabled;
 	int guest_irq;
+#define KVM_ASSIGNED_DEV_IRQ_FIFO_LEN	0x100
+	struct kfifo *irq_fifo;
+	spinlock_t irq_fifo_lock;
 #define KVM_ASSIGNED_DEV_GUEST_INTX	(1 << 0)
 #define KVM_ASSIGNED_DEV_GUEST_MSI	(1 << 1)
 #define KVM_ASSIGNED_DEV_HOST_INTX	(1 << 8)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e39c57a..3955e4d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -99,6 +99,8 @@ static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head *h
 static void kvm_assigned_dev_interrupt_work_handler(struct work_struct *work)
 {
 	struct kvm_assigned_dev_kernel *assigned_dev;
+	int irq;
+	u32 gsi;
 
 	assigned_dev = container_of(work, struct kvm_assigned_dev_kernel,
 				    interrupt_work);
@@ -109,14 +111,22 @@ static void kvm_assigned_dev_interrupt_work_handler(struct work_struct *work)
 	 */
 	mutex_lock(&assigned_dev->kvm->lock);
 
-	kvm_set_irq(assigned_dev->kvm, assigned_dev->irq_source_id,
-		    assigned_dev->guest_irq, 1);
+handle_irq:
+	kfifo_get(assigned_dev->irq_fifo,
+		  (unsigned char *)&irq, sizeof(int));
+
+	gsi = assigned_dev->guest_irq;
+
+	kvm_set_irq(assigned_dev->kvm, assigned_dev->irq_source_id, gsi, 1);
 
 	if (assigned_dev->irq_requested_type & KVM_ASSIGNED_DEV_GUEST_MSI) {
 		enable_irq(assigned_dev->host_irq);
 		assigned_dev->host_irq_disabled = false;
 	}
 
+	if (kfifo_len(assigned_dev->irq_fifo) != 0)
+		goto handle_irq;
+
 	mutex_unlock(&assigned_dev->kvm->lock);
 	kvm_put_kvm(assigned_dev->kvm);
 }
@@ -128,6 +138,9 @@ static irqreturn_t kvm_assigned_dev_intr(int irq, void *dev_id)
 
 	kvm_get_kvm(assigned_dev->kvm);
 
+	kfifo_put(assigned_dev->irq_fifo,
+		  (unsigned char *)&irq, sizeof(int));
+
 	schedule_work(&assigned_dev->interrupt_work);
 
 	disable_irq_nosync(irq);
@@ -201,6 +214,7 @@ static void kvm_free_assigned_device(struct kvm *kvm,
 	pci_dev_put(assigned_dev->dev);
 
 	list_del(&assigned_dev->list);
+	kfifo_free(assigned_dev->irq_fifo);
 	kfree(assigned_dev);
 }
 
@@ -449,15 +463,25 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
 
 	list_add(&match->list, &kvm->arch.assigned_dev_head);
 
+	spin_lock_init(&match->irq_fifo_lock);
+	match->irq_fifo = kfifo_alloc(sizeof(unsigned char) *
+				      KVM_ASSIGNED_DEV_IRQ_FIFO_LEN,
+				      GFP_KERNEL | __GFP_ZERO,
+				      &match->irq_fifo_lock);
+	if (!match->irq_fifo)
+		goto out_list_del;
+
 	if (assigned_dev->flags & KVM_DEV_ASSIGN_ENABLE_IOMMU) {
 		r = kvm_iommu_map_guest(kvm, match);
 		if (r)
-			goto out_list_del;
+			goto out_fifo_del;
 	}
 
 out:
 	mutex_unlock(&kvm->lock);
 	return r;
+out_fifo_del:
+	kfifo_free(match->irq_fifo);
 out_list_del:
 	list_del(&match->list);
 	pci_release_regions(dev);
-- 
1.5.4.5



* [PATCH 2/4] KVM: Add intercepted MMIO for KVM
  2008-12-23  8:12 [PATCH 0/4] MSI-X Enabling Sheng Yang
  2008-12-23  8:12 ` [PATCH 1/4] KVM: Using kfifo for irq recording Sheng Yang
@ 2008-12-23  8:12 ` Sheng Yang
  2008-12-23  8:12 ` [PATCH 3/4] KVM: x86: displace MMIO handling part Sheng Yang
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Sheng Yang @ 2008-12-23  8:12 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Sheng Yang

Intercepted MMIO is a fake device for intercepting a device's MMIO region. It
prevents KVM from setting up page tables for specific pages in the device's
MMIO region and emulates accesses to them using HIGHMEM-mapped kernel pages.

For MSI-X, we need to intercept guest accesses to the MMIO page that stores the
MSI-X table; what we write to the real device's MMIO page is another story.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 arch/x86/kvm/Makefile       |    2 +-
 arch/x86/kvm/mmu.c          |   11 +++
 arch/x86/kvm/paging_tmpl.h  |   11 +++
 include/linux/kvm.h         |    3 +
 include/linux/kvm_host.h    |   14 +++
 virt/kvm/intercepted_mmio.c |  201 +++++++++++++++++++++++++++++++++++++++++++
 virt/kvm/intercepted_mmio.h |   23 +++++
 virt/kvm/kvm_main.c         |    7 ++
 8 files changed, 271 insertions(+), 1 deletions(-)
 create mode 100644 virt/kvm/intercepted_mmio.c
 create mode 100644 virt/kvm/intercepted_mmio.h

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index c023435..4ce3137 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -3,7 +3,7 @@
 #
 
 common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
-                coalesced_mmio.o irq_comm.o)
+                coalesced_mmio.o irq_comm.o intercepted_mmio.o)
 ifeq ($(CONFIG_KVM_TRACE),y)
 common-objs += $(addprefix ../../../virt/kvm/, kvm_trace.o)
 endif
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9871d9d..39bb483 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2096,6 +2096,17 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
 		kvm_release_pfn_clean(pfn);
 		return 1;
 	}
+
+	/*
+	 * We map MMIO of assigned devices directly for performance, but we
+	 * still have to intercept some MMIO accesses
+	 */
+	if (kvm_is_mmio_pfn(pfn) &&
+	    kvm_intercept_mmio_pfn(vcpu->kvm, pfn)) {
+		kvm_release_pfn_clean(pfn);
+		return 1;
+	}
+
 	spin_lock(&vcpu->kvm->mmu_lock);
 	if (mmu_notifier_retry(vcpu, mmu_seq))
 		goto out_unlock;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index d206401..7788beb 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -440,6 +440,17 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
 		return 1;
 	}
 
+	/*
+	 * We map MMIO of assigned devices directly for performance, but we
+	 * still have to intercept some MMIO accesses
+	 */
+	if (kvm_is_mmio_pfn(pfn) &&
+	    kvm_intercept_mmio_pfn(vcpu->kvm, pfn)) {
+		pgprintk("gfn %lx is intercepted mmio\n", walker.gfn);
+		kvm_release_pfn_clean(pfn);
+		return 1;
+	}
+
 	spin_lock(&vcpu->kvm->mmu_lock);
 	if (mmu_notifier_retry(vcpu, mmu_seq))
 		goto out_unlock;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index b091a86..c45b08d 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -395,6 +395,9 @@ struct kvm_trace_rec {
 #endif
 #define KVM_CAP_SET_GUEST_DEBUG 23
 #define KVM_CAP_GSI_MSG 24
+#if defined(CONFIG_X86)
+#define KVM_CAP_INTERCEPTED_MMIO 25
+#endif
 
 /*
  * ioctls for VM fds
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 541ccaf..e4d6b99 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -137,6 +137,7 @@ struct kvm {
 	struct mutex gsi_msg_lock;
 #define KVM_NR_GSI_MSG	    256
 	DECLARE_BITMAP(gsi_msg_bitmap, KVM_NR_GSI_MSG);
+	struct kvm_intercepted_mmio_dev *intercepted_mmio_dev;
 };
 
 /* The guest did something we don't support. */
@@ -304,6 +305,13 @@ struct kvm_irq_ack_notifier {
 	void (*irq_acked)(struct kvm_irq_ack_notifier *kian);
 };
 
+struct kvm_intercepted_mmio {
+	pfn_t pfn;
+	struct page *page;
+	struct kvm_assigned_dev_kernel *dev;
+	struct hlist_node link;
+};
+
 struct kvm_assigned_dev_kernel {
 	struct kvm_irq_ack_notifier ack_notifier;
 	struct work_struct interrupt_work;
@@ -323,6 +331,7 @@ struct kvm_assigned_dev_kernel {
 #define KVM_ASSIGNED_DEV_HOST_MSI	(1 << 9)
 	unsigned long irq_requested_type;
 	int irq_source_id;
+	struct kvm_intercepted_mmio msix_mmio;
 	struct pci_dev *dev;
 	struct kvm *kvm;
 };
@@ -346,6 +355,11 @@ struct kvm_gsi_msg *kvm_find_gsi_msg(struct kvm *kvm, u32 gsi);
 void kvm_free_gsi_msg(struct kvm *kvm, struct kvm_gsi_msg *gsi_msg);
 void kvm_free_gsi_msg_list(struct kvm *kvm);
 
+int kvm_intercept_mmio_pfn(struct kvm *kvm, pfn_t pfn);
+int kvm_register_intercept_mmio(struct kvm *kvm,
+				struct kvm_intercepted_mmio *mmio);
+void kvm_unregister_intercept_mmio(struct kvm_intercepted_mmio *mmio);
+
 #ifdef CONFIG_DMAR
 int kvm_iommu_map_pages(struct kvm *kvm, gfn_t base_gfn,
 			unsigned long npages);
diff --git a/virt/kvm/intercepted_mmio.c b/virt/kvm/intercepted_mmio.c
new file mode 100644
index 0000000..693cdd7
--- /dev/null
+++ b/virt/kvm/intercepted_mmio.c
@@ -0,0 +1,201 @@
+/*
+ * KVM intercepted MMIO
+ *
+ * Copyright (c) 2008 Intel Corporation
+ *
+ *  A fake device that helps emulate one piece of MMIO region
+ *
+ *  Author: Sheng Yang <sheng.yang@intel.com>
+ *
+ */
+
+#include <linux/kvm_host.h>
+#include <linux/kvm.h>
+#include <linux/highmem.h>
+
+#include "iodev.h"
+#include "intercepted_mmio.h"
+
+static struct kvm_intercepted_mmio *find_intercepted_mmio(
+		struct kvm_intercepted_mmio_dev *dev, pfn_t pfn)
+{
+	struct hlist_node *n;
+	struct kvm_intercepted_mmio *mmio;
+
+	hlist_for_each_entry(mmio, n, &dev->mmio_list, link)
+		if (mmio->pfn == pfn)
+			return mmio;
+	return NULL;
+}
+
+static int intercepted_mmio_in_range(struct kvm_io_device *this,
+				   gpa_t addr, int len, int is_write)
+{
+	struct kvm_intercepted_mmio_dev *dev =
+			(struct kvm_intercepted_mmio_dev*)this->private;
+	pfn_t pfn;
+	int r = 0;
+
+	pfn = gfn_to_pfn(dev->kvm, addr >> PAGE_SHIFT);
+
+	if (is_error_pfn(pfn)) {
+		r = 0;
+		goto out;
+	}
+
+	if (!kvm_is_mmio_pfn(pfn)) {
+		r = 0;
+		goto out;
+	}
+
+	/* Note that we cannot handle accesses that cross a page boundary */
+	if (addr >> PAGE_SHIFT != (addr + len - 2) >> PAGE_SHIFT) {
+		printk(KERN_WARNING
+		       "kvm: intercepted MMIO across the page! "
+		       "gpa 0x%lx, length %d\n", (unsigned long)addr, len);
+		r = 0;
+		goto out;
+	}
+
+	if (find_intercepted_mmio(dev, pfn))
+		r = 1;
+out:
+	kvm_release_pfn_clean(pfn);
+	return r;
+}
+
+int kvm_intercept_mmio_pfn(struct kvm *kvm, pfn_t pfn)
+{
+	struct kvm_intercepted_mmio_dev *dev = kvm->intercepted_mmio_dev;
+	int r = 0;
+
+	mutex_lock(&kvm->lock);
+	if (find_intercepted_mmio(dev, pfn))
+		r = 1;
+	mutex_unlock(&kvm->lock);
+
+	return r;
+}
+
+static void intercepted_mmio_destructor(struct kvm_io_device *this)
+{
+	struct kvm_intercepted_mmio_dev *dev =
+			(struct kvm_intercepted_mmio_dev*)this->private;
+	struct kvm_intercepted_mmio *mmio;
+	struct hlist_node *n;
+
+	hlist_for_each_entry(mmio, n, &dev->mmio_list, link)
+		__free_page(mmio->page);
+	kfree(this);
+}
+
+static void intercepted_mmio_read(struct kvm_io_device *this,
+				  gpa_t addr, int len, void *val)
+{
+	struct kvm_intercepted_mmio_dev *dev =
+			(struct kvm_intercepted_mmio_dev*)this->private;
+	struct kvm_intercepted_mmio *mmio;
+	pfn_t pfn;
+	int offset = addr & ~PAGE_MASK;
+	void *va;
+
+	pfn = gfn_to_pfn(dev->kvm, addr >> PAGE_SHIFT);
+
+	/* We should already ensure that pfn is legal */
+	if (is_error_pfn(pfn)) {
+		BUG();
+		goto out;
+	}
+
+	if (!kvm_is_mmio_pfn(pfn)) {
+		BUG();
+		goto out;
+	}
+
+	mmio = find_intercepted_mmio(dev, pfn);
+
+	BUG_ON(!mmio);
+	BUG_ON(!mmio->page);
+
+	va = kmap(mmio->page);
+	memcpy(val, (void *)((char *)va + offset), len);
+	kunmap(mmio->page);
+
+out:
+	kvm_release_pfn_clean(pfn);
+}
+
+static void intercepted_mmio_write(struct kvm_io_device *this,
+				   gpa_t addr, int len, const void *val)
+{
+	struct kvm_intercepted_mmio_dev *dev =
+			(struct kvm_intercepted_mmio_dev*)this->private;
+	struct kvm_intercepted_mmio *mmio;
+	pfn_t pfn;
+	int offset = addr & ~PAGE_MASK;
+	void *va;
+
+	pfn = gfn_to_pfn(dev->kvm, addr >> PAGE_SHIFT);
+
+	/* We should already ensure that pfn is legal */
+	if (is_error_pfn(pfn)) {
+		BUG();
+		goto out;
+	}
+
+	if (!kvm_is_mmio_pfn(pfn)) {
+		BUG();
+		goto out;
+	}
+
+	mmio = find_intercepted_mmio(dev, pfn);
+
+	BUG_ON(!mmio);
+	BUG_ON(!mmio->page);
+
+	va = kmap(mmio->page);
+	memcpy((void *)((char *)va + offset), val, len);
+	kunmap(mmio->page);
+
+out:
+	kvm_release_pfn_clean(pfn);
+}
+
+int kvm_intercepted_mmio_init(struct kvm *kvm)
+{
+	struct kvm_intercepted_mmio_dev *dev;
+
+	dev = kzalloc(sizeof(struct kvm_intercepted_mmio_dev), GFP_KERNEL);
+	if (!dev)
+		return -ENOMEM;
+	dev->dev.write = intercepted_mmio_write;
+	dev->dev.in_range = intercepted_mmio_in_range;
+	dev->dev.read = intercepted_mmio_read;
+	dev->dev.destructor = intercepted_mmio_destructor;
+	dev->dev.private = dev;
+	dev->kvm = kvm;
+	kvm->intercepted_mmio_dev = dev;
+	kvm_io_bus_register_dev(&kvm->mmio_bus, &dev->dev);
+	INIT_HLIST_HEAD(&dev->mmio_list);
+
+	return 0;
+}
+
+/* Register intercepted MMIO, called with kvm->lock held */
+int kvm_register_intercept_mmio(struct kvm *kvm,
+				struct kvm_intercepted_mmio *mmio)
+{
+	mmio->page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
+	if (!mmio->page)
+		return -ENOMEM;
+
+	hlist_add_head(&mmio->link, &kvm->intercepted_mmio_dev->mmio_list);
+	return 0;
+}
+
+/* Unregister intercepted MMIO, called with kvm->lock held */
+void kvm_unregister_intercept_mmio(struct kvm_intercepted_mmio *mmio)
+{
+	__free_page(mmio->page);
+	hlist_del(&mmio->link);
+}
diff --git a/virt/kvm/intercepted_mmio.h b/virt/kvm/intercepted_mmio.h
new file mode 100644
index 0000000..18fe10f
--- /dev/null
+++ b/virt/kvm/intercepted_mmio.h
@@ -0,0 +1,23 @@
+/*
+ * KVM intercepted MMIO
+ *
+ * Copyright (c) 2008 Intel Corporation
+ *
+ *  A fake device that helps emulate one piece of MMIO region
+ *
+ *  Author: Sheng Yang <sheng.yang@intel.com>
+ *
+ */
+
+#include "iodev.h"
+
+#include <linux/kvm_host.h>
+#include <linux/kvm.h>
+
+struct kvm_intercepted_mmio_dev {
+	struct kvm_io_device dev;
+	struct kvm *kvm;
+	struct hlist_head mmio_list;
+};
+
+int kvm_intercepted_mmio_init(struct kvm *kvm);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3955e4d..a5a9763 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -57,6 +57,10 @@
 #include "irq.h"
 #endif
 
+#ifdef KVM_CAP_INTERCEPTED_MMIO
+#include "intercepted_mmio.h"
+#endif
+
 MODULE_AUTHOR("Qumranet");
 MODULE_LICENSE("GPL");
 
@@ -784,6 +788,9 @@ static struct kvm *kvm_create_vm(void)
 #endif
 	INIT_HLIST_HEAD(&kvm->gsi_msg_list);
 	mutex_init(&kvm->gsi_msg_lock);
+#ifdef KVM_CAP_INTERCEPTED_MMIO
+	kvm_intercepted_mmio_init(kvm);
+#endif
 out:
 	return kvm;
 }
-- 
1.5.4.5



* [PATCH 3/4] KVM: x86: displace MMIO handling part
  2008-12-23  8:12 [PATCH 0/4] MSI-X Enabling Sheng Yang
  2008-12-23  8:12 ` [PATCH 1/4] KVM: Using kfifo for irq recording Sheng Yang
  2008-12-23  8:12 ` [PATCH 2/4] KVM: Add intercepted MMIO for KVM Sheng Yang
@ 2008-12-23  8:12 ` Sheng Yang
  2008-12-23  8:12 ` [PATCH 4/4] KVM: Enable MSI-X for KVM assigned device Sheng Yang
  2008-12-23 20:19 ` [PATCH 0/4] MSI-X Enabling Marcelo Tosatti
  4 siblings, 0 replies; 11+ messages in thread
From: Sheng Yang @ 2008-12-23  8:12 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Sheng Yang

Currently we handle accesses to guest memory first, then MMIO. But for
intercepted MMIO the mapping to the MMIO page exists, so KVM would write to
guest memory by mistake. This patch moves MMIO handling ahead of guest memory
handling when emulating instructions.

(I am not confident about this change - would it bring any side effects?)

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 arch/x86/kvm/x86.c |   28 +++++++++-------------------
 1 files changed, 9 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index aa4575c..6554966 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2022,17 +2022,6 @@ static int emulator_read_emulated(unsigned long addr,
 
 	gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr);
 
-	/* For APIC access vmexit */
-	if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
-		goto mmio;
-
-	if (emulator_read_std(addr, val, bytes, vcpu)
-			== X86EMUL_CONTINUE)
-		return X86EMUL_CONTINUE;
-	if (gpa == UNMAPPED_GVA)
-		return X86EMUL_PROPAGATE_FAULT;
-
-mmio:
 	/*
 	 * Is this MMIO handled locally?
 	 */
@@ -2045,6 +2034,12 @@ mmio:
 	}
 	mutex_unlock(&vcpu->kvm->lock);
 
+	if (emulator_read_std(addr, val, bytes, vcpu)
+			== X86EMUL_CONTINUE)
+		return X86EMUL_CONTINUE;
+	if (gpa == UNMAPPED_GVA)
+		return X86EMUL_PROPAGATE_FAULT;
+
 	vcpu->mmio_needed = 1;
 	vcpu->mmio_phys_addr = gpa;
 	vcpu->mmio_size = bytes;
@@ -2080,14 +2075,6 @@ static int emulator_write_emulated_onepage(unsigned long addr,
 		return X86EMUL_PROPAGATE_FAULT;
 	}
 
-	/* For APIC access vmexit */
-	if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
-		goto mmio;
-
-	if (emulator_write_phys(vcpu, gpa, val, bytes))
-		return X86EMUL_CONTINUE;
-
-mmio:
 	/*
 	 * Is this MMIO handled locally?
 	 */
@@ -2100,6 +2087,9 @@ mmio:
 	}
 	mutex_unlock(&vcpu->kvm->lock);
 
+	if (emulator_write_phys(vcpu, gpa, val, bytes))
+		return X86EMUL_CONTINUE;
+
 	vcpu->mmio_needed = 1;
 	vcpu->mmio_phys_addr = gpa;
 	vcpu->mmio_size = bytes;
-- 
1.5.4.5



* [PATCH 4/4] KVM: Enable MSI-X for KVM assigned device
  2008-12-23  8:12 [PATCH 0/4] MSI-X Enabling Sheng Yang
                   ` (2 preceding siblings ...)
  2008-12-23  8:12 ` [PATCH 3/4] KVM: x86: displace MMIO handling part Sheng Yang
@ 2008-12-23  8:12 ` Sheng Yang
  2008-12-23 20:19 ` [PATCH 0/4] MSI-X Enabling Marcelo Tosatti
  4 siblings, 0 replies; 11+ messages in thread
From: Sheng Yang @ 2008-12-23  8:12 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Sheng Yang

This patch finally enables MSI-X.

What we need for MSI-X:
1. Intercept one page in the device's MMIO region, so that we can read the
MSI-X table the guest wants and set up the real one (the 16-byte table entry
layout is sketched below).

2. An IRQ fifo. A device can now have more than one interrupt, and they are all
handled by one workqueue structure, so we need to tell them apart. irq_fifo
provides a mechanism to handle more than one interrupt at a time.

3. A mapping from host IRQ to guest gsi, as well as from guest gsi to the real
MSI/MSI-X message address/data. We use the same entry number for host and guest
here, so that it is easy to find the correlated guest gsi.

What we lack for now:
1. The PCI spec says that nothing else may live in the same MMIO page as the
MSI-X table, except the pending bits. The patch ignores pending bits as a first
step (so they are always 0 - nothing pending).

2. The PCI spec allows the MSI-X table to be changed dynamically. That means the
OS can enable MSI-X, then mask one MSI-X entry, modify it, and unmask it. The
patch does not support this, and Linux does not work that way either.

3. The patch does not implement MSI-X "mask all" or masking of a single entry. I
would implement the former in drivers/pci/msi.c later, and for a single entry I
would add a hook in the intercepted MMIO read/write handlers later.
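
For reference, each entry of the MSI-X table that the code below scans is 16
bytes; the layout defined by the PCI spec is sketched here for illustration
only - the patch itself reads the fields with memcpy() at fixed offsets rather
than through a structure like this:

#include <linux/types.h>

/*
 * Illustrative sketch only: the 16-byte MSI-X table entry layout defined
 * by the PCI spec.  The patch reads these fields with memcpy() at offsets
 * 0, 4, 8 and 12 instead of declaring a structure.
 */
struct msix_table_entry_layout {
	u32 msg_addr_lo;	/* 0x0: Message Address (low 32 bits) */
	u32 msg_addr_hi;	/* 0x4: Message Upper Address */
	u32 msg_data;		/* 0x8: Message Data */
	u32 vector_ctrl;	/* 0xc: Vector Control, bit 0 is the mask bit */
};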

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 include/linux/kvm.h      |    4 +
 include/linux/kvm_host.h |   12 ++-
 virt/kvm/kvm_main.c      |  246 +++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 255 insertions(+), 7 deletions(-)

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index c45b08d..0531838 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -397,6 +397,7 @@ struct kvm_trace_rec {
 #define KVM_CAP_GSI_MSG 24
 #if defined(CONFIG_X86)
 #define KVM_CAP_INTERCEPTED_MMIO 25
+#define KVM_CAP_DEVICE_MSIX 26
 #endif
 
 /*
@@ -552,6 +553,9 @@ struct kvm_assigned_irq {
 
 #define KVM_DEV_IRQ_ASSIGN_MSI_ACTION	(1 << 0)
 #define KVM_DEV_IRQ_ASSIGN_ENABLE_MSI	(1 << 1)
+#define KVM_DEV_IRQ_ASSIGN_MSIX_ACTION  (1 << 2)
+#define KVM_DEV_IRQ_ASSIGN_ENABLE_MSIX  (1 << 3)
+#define KVM_DEV_IRQ_ASSIGN_MASK_MSIX    (1 << 4)
 
 struct kvm_assigned_gsi_msg {
 	__u32 gsi;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e4d6b99..c0d29aa 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -319,16 +319,24 @@ struct kvm_assigned_dev_kernel {
 	int assigned_dev_id;
 	int host_busnr;
 	int host_devfn;
-	int host_irq;
 	bool host_irq_disabled;
-	int guest_irq;
 #define KVM_ASSIGNED_DEV_IRQ_FIFO_LEN	0x100
 	struct kfifo *irq_fifo;
 	spinlock_t irq_fifo_lock;
+	int entries_nr;
+	union {
+		int host_irq;
+		struct msix_entry *host_msix_entries;
+	};
+	union {
+		int guest_irq;
+		struct msix_entry *guest_msix_entries;
+	};
 #define KVM_ASSIGNED_DEV_GUEST_INTX	(1 << 0)
 #define KVM_ASSIGNED_DEV_GUEST_MSI	(1 << 1)
 #define KVM_ASSIGNED_DEV_HOST_INTX	(1 << 8)
 #define KVM_ASSIGNED_DEV_HOST_MSI	(1 << 9)
+#define KVM_ASSIGNED_DEV_MSIX		((1 << 2) | (1 << 10))
 	unsigned long irq_requested_type;
 	int irq_source_id;
 	struct kvm_intercepted_mmio msix_mmio;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a5a9763..b453279 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -100,6 +100,41 @@ static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head *h
 	return NULL;
 }
 
+static u32 find_gsi_from_host_irq(struct kvm_assigned_dev_kernel *assigned_dev,
+				  int irq)
+{
+	int i;
+	int entry;
+	u32 gsi;
+	struct msix_entry *host_msix_entries, *guest_msix_entries;
+
+	host_msix_entries = assigned_dev->host_msix_entries;
+	guest_msix_entries = assigned_dev->guest_msix_entries;
+
+	entry = -1;
+	gsi = 0;
+	for (i = 0; i < assigned_dev->entries_nr; i++)
+		if (irq == (host_msix_entries + i)->vector) {
+			entry = (host_msix_entries + i)->entry;
+			break;
+		}
+	if (entry < 0) {
+		printk(KERN_WARNING "Fail to find correlated MSI-X entry!\n");
+		return 0;
+	}
+	for (i = 0; i < assigned_dev->entries_nr; i++)
+		if (entry == (guest_msix_entries + i)->entry) {
+			gsi = (guest_msix_entries + i)->vector;
+			break;
+		}
+	if (gsi == 0) {
+		printk(KERN_WARNING "Fail to find correlated MSI-X gsi!\n");
+		return 0;
+	}
+
+	return gsi;
+}
+
 static void kvm_assigned_dev_interrupt_work_handler(struct work_struct *work)
 {
 	struct kvm_assigned_dev_kernel *assigned_dev;
@@ -119,12 +154,16 @@ handle_irq:
 	kfifo_get(assigned_dev->irq_fifo,
 		  (unsigned char *)&irq, sizeof(int));
 
-	gsi = assigned_dev->guest_irq;
+	if (assigned_dev->irq_requested_type & KVM_ASSIGNED_DEV_MSIX)
+		gsi = find_gsi_from_host_irq(assigned_dev, irq);
+	else
+		gsi = assigned_dev->guest_irq;
 
 	kvm_set_irq(assigned_dev->kvm, assigned_dev->irq_source_id, gsi, 1);
 
-	if (assigned_dev->irq_requested_type & KVM_ASSIGNED_DEV_GUEST_MSI) {
-		enable_irq(assigned_dev->host_irq);
+	if ((assigned_dev->irq_requested_type & KVM_ASSIGNED_DEV_GUEST_MSI) ||
+	    (assigned_dev->irq_requested_type & KVM_ASSIGNED_DEV_MSIX)) {
+		enable_irq(irq);
 		assigned_dev->host_irq_disabled = false;
 	}
 
@@ -196,11 +235,23 @@ static void kvm_free_assigned_irq(struct kvm *kvm,
 		 */
 		kvm_put_kvm(kvm);
 
-	free_irq(assigned_dev->host_irq, (void *)assigned_dev);
+	if (assigned_dev->irq_requested_type & KVM_ASSIGNED_DEV_MSIX) {
+		int i;
+		for (i = 0; i < assigned_dev->entries_nr; i++)
+			free_irq((assigned_dev->host_msix_entries + i)->vector,
+				 (void *)assigned_dev);
+	} else
+		free_irq(assigned_dev->host_irq, (void *)assigned_dev);
 
 	if (assigned_dev->irq_requested_type & KVM_ASSIGNED_DEV_HOST_MSI)
 		pci_disable_msi(assigned_dev->dev);
 
+	if (assigned_dev->irq_requested_type & KVM_ASSIGNED_DEV_MSIX) {
+		kfree(assigned_dev->host_msix_entries);
+		kfree(assigned_dev->guest_msix_entries);
+		pci_disable_msix(assigned_dev->dev);
+	}
+
 	assigned_dev->irq_requested_type = 0;
 }
 
@@ -325,6 +376,144 @@ static int assigned_device_update_msi(struct kvm *kvm,
 	adev->irq_requested_type |= KVM_ASSIGNED_DEV_HOST_MSI;
 	return 0;
 }
+
+static int assigned_device_update_msix_mmio(struct kvm *kvm,
+			    struct kvm_assigned_dev_kernel *adev)
+{
+	struct kvm_intercepted_mmio *mmio = &adev->msix_mmio;
+	void * va;
+	u16 entries_nr = 0, entries_max_nr;
+	int pos, i, r = 0;
+	u32 msg_addr, msg_upper_addr, msg_data, msg_ctrl;
+	struct kvm_gsi_msg gsi_msg;
+
+	pos = pci_find_capability(adev->dev, PCI_CAP_ID_MSIX);
+	if (!pos)
+		return -EINVAL;
+
+	pci_read_config_word(adev->dev, pos + PCI_MSIX_FLAGS, &entries_max_nr);
+	entries_max_nr &= PCI_MSIX_FLAGS_QSIZE;
+
+	va = kmap(mmio->page);
+	/* Get the usable entry number for allocating */
+	for (i = 0; i < entries_max_nr; i++) {
+		memcpy(&msg_ctrl, va + i * 16 + 12, 4);
+		if (msg_ctrl & PCI_MSIX_FLAGS_BITMASK)
+			continue;
+		memcpy(&msg_data, va + i * 16 + 8, 4);
+		/* Ignore an unused entry even if it is unmasked */
+		if (msg_data == 0)
+			continue;
+		entries_nr ++;
+	}
+
+	adev->entries_nr = entries_nr;
+	adev->host_msix_entries = kmalloc(sizeof(struct msix_entry) * entries_nr,
+					  GFP_KERNEL);
+	if (!adev->host_msix_entries) {
+		printk(KERN_ERR "no memory for host msix entries!\n");
+		return -ENOMEM;
+	}
+	adev->guest_msix_entries = kmalloc(sizeof(struct msix_entry) * entries_nr,
+					   GFP_KERNEL);
+	if (!adev->guest_msix_entries) {
+		printk(KERN_ERR "no memory for guest msix entries!\n");
+		return -ENOMEM;
+	}
+
+	entries_nr = 0;
+	for (i = 0; i < entries_max_nr; i++) {
+		if (entries_nr >= adev->entries_nr)
+			break;
+		memcpy(&msg_ctrl, va + i * 16 + 12, 4);
+		if (msg_ctrl & PCI_MSIX_FLAGS_BITMASK)
+			continue;
+		memcpy(&msg_addr, va + i * 16, 4);
+		memcpy(&msg_upper_addr, va + i * 16 + 4, 4);
+		memcpy(&msg_data, va + i * 16 + 8, 4);
+		if (msg_data == 0)
+			continue;
+
+		gsi_msg.gsi = 0;
+		gsi_msg.msg.address_lo = msg_addr;
+		gsi_msg.msg.address_hi = msg_upper_addr;
+		gsi_msg.msg.data = msg_data;
+		r = kvm_update_gsi_msg(kvm, &gsi_msg);
+		if (r) {
+			printk(KERN_ERR "Fail to update gsi_msg for MSIX!");
+			break;
+		}
+		(adev->guest_msix_entries + entries_nr)->entry = i;
+		(adev->guest_msix_entries + entries_nr)->vector = gsi_msg.gsi;
+		(adev->host_msix_entries + entries_nr)->entry = i;
+		entries_nr ++;
+	}
+	kunmap(mmio->page);
+
+	return r;
+}
+
+static int assigned_device_update_msix(struct kvm *kvm,
+			struct kvm_assigned_dev_kernel *adev,
+			struct kvm_assigned_irq *airq)
+{
+	/* TODO Deal with KVM_DEV_IRQ_ASSIGNED_MASK_MSIX */
+	int i, r;
+
+	adev->ack_notifier.gsi = -1;
+
+	if (irqchip_in_kernel(kvm)) {
+		if (airq->flags & KVM_DEV_IRQ_ASSIGN_MASK_MSIX) {
+			printk(KERN_WARNING
+			       "kvm: unsupported mask MSI-X, flags 0x%x!\n",
+			       airq->flags);
+			return 0;
+		}
+
+		if (!(airq->flags & KVM_DEV_IRQ_ASSIGN_ENABLE_MSIX)) {
+			/* Guest disable MSI-X */
+			kvm_free_assigned_irq(kvm, adev);
+			if (msi2intx) {
+				pci_enable_msi(adev->dev);
+				if (adev->dev->msi_enabled)
+					return assigned_device_update_msi(kvm,
+							adev, airq);
+			}
+			return assigned_device_update_intx(kvm, adev, airq);
+		}
+
+		kvm_free_assigned_irq(kvm, adev);
+
+		/*
+	 * We only scan the device's (emulated) MMIO when the guest wants to
+	 * enable MSI-X; dynamically adding MSI-X entries is not supported for now
+		 */
+		r = assigned_device_update_msix_mmio(kvm, adev);
+		if (r)
+			return r;
+
+		r = pci_enable_msix(adev->dev, adev->host_msix_entries,
+				    adev->entries_nr);
+		if (r) {
+			printk(KERN_ERR "Fail to enable MSI-X feature!\n");
+			return r;
+		}
+
+		for (i = 0; i < adev->entries_nr; i++) {
+			r = request_irq((adev->host_msix_entries + i)->vector,
+					kvm_assigned_dev_intr, 0,
+					"kvm_assigned_msix_device",
+					(void *)adev);
+			if (r)
+				return r;
+		}
+	}
+
+	adev->irq_requested_type |= KVM_ASSIGNED_DEV_MSIX;
+
+	return 0;
+}
+
 #endif
 
 static int kvm_vm_ioctl_assign_irq(struct kvm *kvm,
@@ -370,6 +559,16 @@ static int kvm_vm_ioctl_assign_irq(struct kvm *kvm,
 		}
 	}
 
+#ifdef CONFIG_X86
+	if (assigned_irq->flags & KVM_DEV_IRQ_ASSIGN_MSIX_ACTION) {
+		r = assigned_device_update_msix(kvm, match, assigned_irq);
+		if (r) {
+			printk(KERN_WARNING "kvm: failed to execute "
+					"MSI-X action!\n");
+			goto out_release;
+		}
+	} else
+#endif
 	if ((!msi2intx &&
 	     (assigned_irq->flags & KVM_DEV_IRQ_ASSIGN_MSI_ACTION)) ||
 	    (msi2intx && match->dev->msi_enabled)) {
@@ -413,6 +612,33 @@ out_release:
 	return r;
 }
 
+static int assigned_dev_register_msix_mmio(struct kvm_assigned_dev_kernel *adev)
+{
+	int pos = pci_find_capability(adev->dev, PCI_CAP_ID_MSIX);
+	u32 msix_table_entry;
+	int bar_nr;
+
+	adev->msix_mmio.dev = adev;
+	INIT_HLIST_NODE(&adev->msix_mmio.link);
+
+	if (!pos)
+		return 0;
+
+	if (pci_read_config_dword(adev->dev, pos + 4,
+				   &msix_table_entry) != PCIBIOS_SUCCESSFUL)
+		return -EFAULT;
+
+	bar_nr = msix_table_entry & PCI_MSIX_FLAGS_BIRMASK;
+
+	/* Get table offset */
+	msix_table_entry &= ~PCI_MSIX_FLAGS_BIRMASK;
+	adev->msix_mmio.pfn = (pci_resource_start(adev->dev, bar_nr) +
+			       msix_table_entry) >> PAGE_SHIFT;
+
+	kvm_register_intercept_mmio(adev->kvm, &adev->msix_mmio);
+	return 0;
+}
+
 static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
 				      struct kvm_assigned_pci_dev *assigned_dev)
 {
@@ -475,15 +701,25 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
 	if (!match->irq_fifo)
 		goto out_list_del;
 
+	/*
+	 * Check for the MSI-X capability; if the device has it, we need to
+	 * intercept accesses to its MSI-X table
+	 */
+	if (assigned_dev_register_msix_mmio(match))
+		goto out_fifo_del;
+
 	if (assigned_dev->flags & KVM_DEV_ASSIGN_ENABLE_IOMMU) {
 		r = kvm_iommu_map_guest(kvm, match);
 		if (r)
-			goto out_fifo_del;
+			goto out_unregister;
 	}
 
 out:
 	mutex_unlock(&kvm->lock);
 	return r;
+out_unregister:
+	if (pci_find_capability(dev, PCI_CAP_ID_MSIX))
+		kvm_unregister_intercept_mmio(&match->msix_mmio);
 out_fifo_del:
 	kfifo_free(match->irq_fifo);
 out_list_del:
-- 
1.5.4.5



* Re: [PATCH 0/4] MSI-X Enabling
  2008-12-23  8:12 [PATCH 0/4] MSI-X Enabling Sheng Yang
                   ` (3 preceding siblings ...)
  2008-12-23  8:12 ` [PATCH 4/4] KVM: Enable MSI-X for KVM assigned device Sheng Yang
@ 2008-12-23 20:19 ` Marcelo Tosatti
  2008-12-24  3:00   ` Sheng Yang
  4 siblings, 1 reply; 11+ messages in thread
From: Marcelo Tosatti @ 2008-12-23 20:19 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Avi Kivity, kvm

On Tue, Dec 23, 2008 at 04:12:49PM +0800, Sheng Yang wrote:
> Hi Avi
> 
> This patchset enables MSI-X support.
> 
> The main change is a new kind of fake device - an intercepted MMIO device -
> added to intercept the single page that contains the MSI-X entry table.
> 
> I also considered doing this interception in userspace, but I think it would
> be more complex to push it there, since we also have to prevent KVM from
> creating a page table mapping for the page.

Unmapping the page (with munmap) from the PCI MMIO mapping in userspace will
prevent KVM from creating a page table entry for it.
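
A minimal sketch of that idea from the userspace side, assuming the BAR
containing the MSI-X table is already mmap()ed and the table offset has been
read from the MSI-X capability (bar_va and table_offset here are hypothetical,
illustration only):

#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Sketch only: punch a hole in an existing BAR mapping so that the page
 * holding the MSI-X table is no longer mapped.  With no backing page for
 * that range, KVM cannot create a page table entry for it and guest
 * accesses fall back to MMIO emulation in userspace.
 */
static int unmap_msix_table_page(void *bar_va, uint64_t table_offset)
{
	long page_size = sysconf(_SC_PAGESIZE);
	uintptr_t table_va = (uintptr_t)bar_va + table_offset;
	void *page = (void *)(table_va & ~((uintptr_t)page_size - 1));

	return munmap(page, page_size);
}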

> In the future, we will also add hooks to the read/write operations to handle
> the mask bit of each MSI-X entry, and maybe the pending bits as well.

Can you explain this in more detail? Can't it be done from userspace?

Are there performance considerations for accesses to this page?

> So I decided to put it in the kernel.
> 
> Another thing is that I am not very confident about reordering the handling
> of guest memory and MMIO in the emulated read/write functions. Would it cause
> any side effects?

Should not be necessary if done in userspace.



* Re: [PATCH 0/4] MSI-X Enabling
  2008-12-23 20:19 ` [PATCH 0/4] MSI-X Enabling Marcelo Tosatti
@ 2008-12-24  3:00   ` Sheng Yang
  0 siblings, 0 replies; 11+ messages in thread
From: Sheng Yang @ 2008-12-24  3:00 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, kvm

On Wednesday 24 December 2008 04:19:32 Marcelo Tosatti wrote:
> On Tue, Dec 23, 2008 at 04:12:49PM +0800, Sheng Yang wrote:
> > Hi Avi
> >
> > This patchset enables MSI-X support.
> >
> > The main change is a new kind of fake device - an intercepted MMIO device
> > - added to intercept the single page that contains the MSI-X entry table.
> >
> > I also considered doing this interception in userspace, but I think it
> > would be more complex to push it there, since we also have to prevent KVM
> > from creating a page table mapping for the page.
>
> Unmapping the page (with munmap) from the PCI MMIO mapping in userspace will
> prevent KVM from creating a page table entry for it.
>
> > In the future, we will also add hooks to the read/write operations to
> > handle the mask bit of each MSI-X entry, and maybe the pending bits as well.
>
> Can you explain this in more detail? Can't it be done from userspace?
>
> Are there performance considerations for accesses to this page?
>
> > So I decided to put it in the kernel.
> >
> > Another thing is that I am not very confident about reordering the
> > handling of guest memory and MMIO in the emulated read/write functions.
> > Would it cause any side effects?
>
> Should not be necessary if done in userspace.

Oops...

After reconsidering, I think I have made a design mistake here. The MMIO can
indeed be emulated in userspace...

I will update the patch.

-- 
regards
Yang, Sheng


* Re: [PATCH 1/4] KVM: Using kfifo for irq recording
  2008-12-23  8:12 ` [PATCH 1/4] KVM: Using kfifo for irq recording Sheng Yang
@ 2008-12-25 11:07   ` Avi Kivity
  2008-12-25 11:27     ` Sheng Yang
  0 siblings, 1 reply; 11+ messages in thread
From: Avi Kivity @ 2008-12-25 11:07 UTC (permalink / raw)
  To: Sheng Yang; +Cc: kvm

Sheng Yang wrote:
> For MSI-X, we have to deal with multiple IRQs sharing the same IRQ handler,
> so it's necessary to record which IRQ triggered the handler.
>
>   

Does MSI-X disallow coalescing two requests into one interrupt?  Or can we
still coalesce interrupts (perhaps by recording them as an (irq, cpu) pair)?

> @@ -313,6 +314,9 @@ struct kvm_assigned_dev_kernel {
>  	int host_irq;
>  	bool host_irq_disabled;
>  	int guest_irq;
> +#define KVM_ASSIGNED_DEV_IRQ_FIFO_LEN	0x100
> +	struct kfifo *irq_fifo;
> +	spinlock_t irq_fifo_lock;
>  #define KVM_ASSIGNED_DEV_GUEST_INTX	(1 << 0)
>   

What if it runs out?

What does real hardware do?  I'm sure it doesn't have a 100-entry queue.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 1/4] KVM: Using kfifo for irq recording
  2008-12-25 11:07   ` Avi Kivity
@ 2008-12-25 11:27     ` Sheng Yang
  2008-12-25 13:26       ` Avi Kivity
  0 siblings, 1 reply; 11+ messages in thread
From: Sheng Yang @ 2008-12-25 11:27 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

On Thursday 25 December 2008 19:07:22 Avi Kivity wrote:
> Sheng Yang wrote:
> > For MSI-X, we have to deal with multiple IRQs sharing the same IRQ
> > handler, so it's necessary to record which IRQ triggered the handler.
>
> Does MSI-X disallow coalescing two requests into one interrupt?  Or can we
> still coalesce interrupts (perhaps by recording them as an (irq, cpu) pair)?

Disallow? I don't quite understand. The PCI spec says the OS does not need to
ensure that interrupts are handled in the same order in which they happened.
This struct is used just because we lose the irq information after
schedule_work...

> > @@ -313,6 +314,9 @@ struct kvm_assigned_dev_kernel {
> >  	int host_irq;
> >  	bool host_irq_disabled;
> >  	int guest_irq;
> > +#define KVM_ASSIGNED_DEV_IRQ_FIFO_LEN	0x100
> > +	struct kfifo *irq_fifo;
> > +	spinlock_t irq_fifo_lock;
> >  #define KVM_ASSIGNED_DEV_GUEST_INTX	(1 << 0)
>
> What if it runs out?
>
> What does real hardware do?  I'm sure it doesn't have a 100-entry queue.

0x100 is just a rough guess at how many different interrupts of the same MSI-X
device could arrive in the same period (in fact the fifo holds
0x100/sizeof(int) entries). Maybe not that many. It is just used by the work
function later to find what the guest vector is, and then to inject the
correlated interrupt into the guest.

If a hardware device driver also postpones the work, I think it needs something
like this as well.

-- 
regards
Yang, Sheng



* Re: [PATCH 1/4] KVM: Using kfifo for irq recording
  2008-12-25 11:27     ` Sheng Yang
@ 2008-12-25 13:26       ` Avi Kivity
  2008-12-26  1:53         ` Sheng Yang
  0 siblings, 1 reply; 11+ messages in thread
From: Avi Kivity @ 2008-12-25 13:26 UTC (permalink / raw)
  To: Sheng Yang; +Cc: kvm

Sheng Yang wrote:
> On Thursday 25 December 2008 19:07:22 Avi Kivity wrote:
>   
>> Sheng Yang wrote:
>>     
>>> For MSI-X, we have to deal with multiple IRQs sharing the same IRQ
>>> handler, so it's necessary to record which IRQ triggered the handler.
>>>       
>> Does MSI-X disallow coalescing two requests into one interrupt?  Or can we
>> still coalesce interrupts (perhaps by recording them as an (irq, cpu) pair)?
>>     
>
> Disallow? I don't quite understand. The PCI spec says the OS does not need to
> ensure that interrupts are handled in the same order in which they happened.
> This struct is used just because we lose the irq information after
> schedule_work...
>
>   

Why can't we store this information in a bitmap?  There are a limited 
number of irqs.

The only reason I can think of for using a fifo is if we want to 
preserve the number and ordering of interrupts.  Is there another reason?
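
A rough sketch of such a bitmap variant, for illustration only (the structure
and helper names are hypothetical, not taken from the posted patches): the hard
irq handler marks the entry, and the work handler later drains every marked
entry once, coalescing repeats instead of queueing them.

#include <linux/bitops.h>
#include <linux/types.h>

#define MSIX_SKETCH_MAX_ENTRIES 256

struct msix_pending_sketch {
	DECLARE_BITMAP(pending, MSIX_SKETCH_MAX_ENTRIES);
};

/* Called from the hard irq handler: set_bit() is atomic and irq-safe. */
static void msix_sketch_mark(struct msix_pending_sketch *p, int entry)
{
	set_bit(entry, p->pending);
}

/* Called from the work handler: inject each pending entry exactly once. */
static void msix_sketch_drain(struct msix_pending_sketch *p,
			      void (*inject)(int entry))
{
	int entry;

	for (entry = 0; entry < MSIX_SKETCH_MAX_ENTRIES; entry++)
		if (test_and_clear_bit(entry, p->pending))
			inject(entry);
}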

>>> @@ -313,6 +314,9 @@ struct kvm_assigned_dev_kernel {
>>>  	int host_irq;
>>>  	bool host_irq_disabled;
>>>  	int guest_irq;
>>> +#define KVM_ASSIGNED_DEV_IRQ_FIFO_LEN	0x100
>>> +	struct kfifo *irq_fifo;
>>> +	spinlock_t irq_fifo_lock;
>>>  #define KVM_ASSIGNED_DEV_GUEST_INTX	(1 << 0)
>>>       
>> What if it runs out?
>>
>> What does real hardware do?  I'm sure it doesn't have a 100-entry queue.
>>     
>
> 0x100 is just a rough guess at how many different interrupts of the same
> MSI-X device could arrive in the same period (in fact the fifo holds
> 0x100/sizeof(int) entries). Maybe not that many. It is just used by the work
> function later to find what the guest vector is, and then to inject the
> correlated interrupt into the guest.
>   

Maybe it's better to do the conversion immediately, so we can store the 
information in a structure that's not prone to overflow.


-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 1/4] KVM: Using kfifo for irq recording
  2008-12-25 13:26       ` Avi Kivity
@ 2008-12-26  1:53         ` Sheng Yang
  0 siblings, 0 replies; 11+ messages in thread
From: Sheng Yang @ 2008-12-26  1:53 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

On Thursday 25 December 2008 21:26:29 Avi Kivity wrote:
> Sheng Yang wrote:
> > On Thursday 25 December 2008 19:07:22 Avi Kivity wrote:
> >> Sheng Yang wrote:
> >>> For MSI-X, we have to deal with multiple IRQs sharing the same IRQ
> >>> handler, so it's necessary to record which IRQ triggered the handler.
> >>
> >> Does MSI-X disallow coalescing two requests into one interrupt?  Or can
> >> we still coalesce interrupts (perhaps by recording them as an (irq, cpu)
> >> pair)?
> >
> > Disallow? I don't quite understand. The PCI spec says the OS does not need
> > to ensure that interrupts are handled in the same order in which they
> > happened. This struct is used just because we lose the irq information
> > after schedule_work...
>
> Why can't we store this information in a bitmap?  There are a limited
> number of irqs.
>
> The only reason I can think of for using a fifo is if we want to
> preserve the number and ordering of interrupts.  Is there another reason?

Well, I just think using a fifo is more generic and unifies the logic of the
three types of interrupt easily, which seems more elegant.

> >>> @@ -313,6 +314,9 @@ struct kvm_assigned_dev_kernel {
> >>>  	int host_irq;
> >>>  	bool host_irq_disabled;
> >>>  	int guest_irq;
> >>> +#define KVM_ASSIGNED_DEV_IRQ_FIFO_LEN	0x100
> >>> +	struct kfifo *irq_fifo;
> >>> +	spinlock_t irq_fifo_lock;
> >>>  #define KVM_ASSIGNED_DEV_GUEST_INTX	(1 << 0)
> >>
> >> What if it runs out?
> >>
> >> What does real hardware do?  I'm sure it doesn't have a 100-entry queue.
> >
> > 0x100 is just a rough guess at how many different interrupts of the same
> > MSI-X device could arrive in the same period (in fact the fifo holds
> > 0x100/sizeof(int) entries). Maybe not that many. It is just used by the
> > work function later to find what the guest vector is, and then to inject
> > the correlated interrupt into the guest.
>
> Maybe it's better to do the conversion immediately, so we can store the
> information in a structure that's not prone to overflow.

OK. I will add a bitmap for gsi_msg to the kvm struct, which cannot overflow.

-- 
regards
Yang, Sheng



end of thread

Thread overview: 11+ messages
2008-12-23  8:12 [PATCH 0/4] MSI-X Enabling Sheng Yang
2008-12-23  8:12 ` [PATCH 1/4] KVM: Using kfifo for irq recording Sheng Yang
2008-12-25 11:07   ` Avi Kivity
2008-12-25 11:27     ` Sheng Yang
2008-12-25 13:26       ` Avi Kivity
2008-12-26  1:53         ` Sheng Yang
2008-12-23  8:12 ` [PATCH 2/4] KVM: Add intercepted MMIO for KVM Sheng Yang
2008-12-23  8:12 ` [PATCH 3/4] KVM: x86: displace MMIO handling part Sheng Yang
2008-12-23  8:12 ` [PATCH 4/4] KVM: Enable MSI-X for KVM assigned device Sheng Yang
2008-12-23 20:19 ` [PATCH 0/4] MSI-X Enabling Marcelo Tosatti
2008-12-24  3:00   ` Sheng Yang
