All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH]Add MSI support to PV_dom0
@ 2009-05-08  9:10 Jiang, Yunhong
  2009-05-12 20:05 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 4+ messages in thread
From: Jiang, Yunhong @ 2009-05-08  9:10 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Xen-devel

[-- Attachment #1: Type: text/plain, Size: 1942 bytes --]

Jeremy, attached is basic MSI support to PV_dom0.  Please have a look on it.

Thanks
Yunhong Jiang

The pci_wrapper.patch add some hook to pci_bus_type, so that Xen will be notified when a PCI device is added to system.
 Makefile   |    2 -
 pci_wrap.c |   90 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 91 insertions(+), 1 deletion(-)

The enable_msi.patch add the MSI support to dom0, basically it just allocate irq from Xen irq name space.

 arch/x86/include/asm/xen/pci.h  |   17 +++++++
 arch/x86/kernel/apic/io_apic.c  |   22 +++++++--
 arch/x86/xen/apic.c             |    1
 drivers/xen/events.c            |   91 +++++++++++++++++++++++++++++++++++++++-
 include/xen/interface/physdev.h |   59 +++++++++++++++++++++++++
 5 files changed, 182 insertions(+), 8 deletions(-)

The reenable_msi.patch just remove original function that disable MSI in xen environment.

 arch/x86/xen/apic.c |    2 --
 drivers/pci/pci.h   |    2 ++
 include/linux/pci.h |    6 ------
 3 files changed, 2 insertions(+), 8 deletions(-)


Notice:
a) Currently the PCI save/restore is broken. The reason is because pci_restore_msi_state() in "drivers/pci/msi.c" will try to restore the MSI config depends on device's msi msg information (i.e. content of msi_desc->msg). However, that information is wrong in Xen environment because Xen HV owns MSI. I'm still trying to find a method to achieve the save/restore without touch the common msi.c code, any suggestion is welcome.
b) I notice pci frontend is in a branch. But to support pci frontend without touch common PCI function is difficult, I'm still considering it.
c) I'm not sure if xen's irq space should be same as pirq. Basically I think irq is dom0 internal structure, while PIRQ is interface between Xen HV/dom0. But seems current implementation think these two items are same. I didn't try to change it, but that may need improvement.

[-- Attachment #2: enable_msi.patch --]
[-- Type: application/octet-stream, Size: 7263 bytes --]

commit 702965d82162e07c0c2afbdddbbe9a0c9a1c599d
Author: Jiang, Yunhong <yunhong.jiang@intel.com>
Date:   Fri May 8 00:50:34 2009 +0800

    Add MSI support to Xen Dom0
    
    It add some hook to arch specific MSI function, so that it will
    a) allocate irq from Xen's irq space
    b) allocate vector through Xen's map_irq hypercall.
    
    Currently Xen's irq space function has no chip_data function, I assume this should be ok since we Xen's IRQ is different chip with native IOAPIC.
    
    Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>

diff --git a/arch/x86/include/asm/xen/pci.h b/arch/x86/include/asm/xen/pci.h
index 0563fc6..a5d9027 100644
--- a/arch/x86/include/asm/xen/pci.h
+++ b/arch/x86/include/asm/xen/pci.h
@@ -3,11 +3,28 @@
 
 #ifdef CONFIG_XEN_DOM0_PCI
 int xen_register_gsi(u32 gsi, int triggering, int polarity);
+int xen_create_msi_irq(struct pci_dev *dev,
+			unsigned int want,
+			struct msi_desc *msidesc,
+			int type);
+int xen_destroy_irq(int irq);
 #else
 static inline int xen_register_gsi(u32 gsi, int triggering, int polarity)
 {
 	return -1;
 }
+
+int xen_create_msi_irq(struct pci_dev *dev,
+			unsigned int want,
+			struct msi_desc *msidesc,
+			int type)
+{
+    return -1;
+}
+int xen_destroy_irq(int irq)
+{
+    return -1;
+}
 #endif
 
 #endif	/* _ASM_X86_XEN_PCI_H */
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index b562550..d1e3408 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -66,6 +66,7 @@
 #include <asm/xen/hypervisor.h>
 #include <asm/apic.h>
 
+#include <asm/xen/pci.h>
 
 #define __apicdebuginit(type) static type __init
 
@@ -3472,6 +3473,10 @@ static int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc, int irq)
 		return ret;
 
 	set_irq_msi(irq, msidesc);
+
+	if (xen_domain())
+		return 0;
+
 	write_msi_msg(irq, &msg);
 
 	if (irq_remapped(irq)) {
@@ -3505,9 +3510,14 @@ int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 	irq_want = nr_irqs_gsi;
 	sub_handle = 0;
 	list_for_each_entry(msidesc, &dev->msi_list, list) {
-		irq = create_irq_nr(irq_want);
-		if (irq == 0)
-			return -1;
+		if ((irq = xen_create_msi_irq(dev, irq_want, msidesc, type)) > 0)
+		{
+			dev_printk(KERN_DEBUG, "xen create irq %x for msi\n", irq);
+		} else {
+			irq = create_irq_nr(irq_want);
+			if (irq == 0)
+				return -1;
+		}
 		irq_want = irq + 1;
 		if (!intr_remapping_enabled)
 			goto no_ir;
@@ -3544,13 +3554,15 @@ no_ir:
 	return 0;
 
 error:
-	destroy_irq(irq);
+	if (xen_destroy_irq(irq))
+		destroy_irq(irq);
 	return ret;
 }
 
 void arch_teardown_msi_irq(unsigned int irq)
 {
-	destroy_irq(irq);
+	if (xen_destroy_irq(irq))
+		destroy_irq(irq);
 }
 
 #if defined (CONFIG_DMAR) || defined (CONFIG_INTR_REMAP)
diff --git a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c
index 496f07d..2f35a2d 100644
--- a/arch/x86/xen/apic.c
+++ b/arch/x86/xen/apic.c
@@ -1,7 +1,6 @@
 #include <linux/kernel.h>
 #include <linux/threads.h>
 #include <linux/bitmap.h>
-#include <linux/pci.h>
 
 #include <asm/io_apic.h>
 #include <asm/acpi.h>
diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index af2aad4..65e4c7a 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -28,6 +28,9 @@
 #include <linux/string.h>
 #include <linux/bootmem.h>
 #include <linux/irqnr.h>
+#include <linux/pci_regs.h>
+#include <linux/pci.h>
+#include <linux/msi.h>
 
 #include <asm/ptrace.h>
 #include <asm/irq.h>
@@ -560,14 +563,98 @@ int xen_allocate_pirq(unsigned gsi, char *name)
 	if (HYPERVISOR_physdev_op(PHYSDEVOP_alloc_irq_vector, &irq_op)) {
 		dynamic_irq_cleanup(irq);
 		irq = -ENOSPC;
+ 		goto out;
+ 	}
+
+	irq_info[irq] = mk_pirq_info(0, gsi, irq_op.vector);
+out:
+	spin_unlock(&irq_mapping_update_lock);
+	return irq;
+}
+
+int xen_destroy_irq(int irq)
+{
+	struct physdev_unmap_pirq unmap_irq;
+	int rc;
+
+	if (!xen_domain())
+		return -1;
+
+	unmap_irq.pirq = irq;
+	unmap_irq.domid = DOMID_SELF;
+	if ((rc = HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq, &unmap_irq)))
+	{
+		printk(KERN_WARNING "unmap irq failed %x\n", rc);
 		goto out;
 	}
 
-	irq_info[irq] = mk_pirq_info(0, gsi, irq_op.vector);
+	irq_info[irq] = mk_unbound_info();
+
+	dynamic_irq_cleanup(irq);
 
+	return 0;
 out:
-	spin_unlock(&irq_mapping_update_lock);
+	return -1;
+}
+
+int xen_create_msi_irq(struct pci_dev *dev,
+			unsigned int want,
+			struct msi_desc *msidesc,
+			int type)
+{
+	int irq = 0;
+	struct physdev_map_pirq map_irq;
+	int rc;
+	domid_t domid = DOMID_SELF;
+	int pos;
+	u32 table_offset, bir;
 
+	if (!xen_domain())
+		return -1;
+
+	memset(&map_irq, 0, sizeof(map_irq));
+	map_irq.domid = domid;
+	map_irq.type = MAP_PIRQ_TYPE_MSI;
+	map_irq.index = -1;
+	map_irq.bus = dev->bus->number;
+	map_irq.devfn = dev->devfn;
+
+	if (type == PCI_CAP_ID_MSIX)
+	{
+		pos = pci_find_capability(dev, PCI_CAP_ID_MSIX);
+
+#define msix_table_offset_reg(base)	(base + 0x04)
+		pci_read_config_dword(dev, msix_table_offset_reg(pos), &table_offset);
+		bir = (u8)(table_offset & PCI_MSIX_FLAGS_BIRMASK);
+
+		map_irq.table_base = pci_resource_start(dev, bir);
+		map_irq.entry_nr = msidesc->msi_attrib.entry_nr;
+	}
+
+	spin_lock(&irq_mapping_update_lock);
+
+	irq = find_unbound_irq();
+
+	if (irq == -1)
+		goto out;
+
+	map_irq.pirq = irq;
+
+	if ((rc = HYPERVISOR_physdev_op(PHYSDEVOP_map_pirq, &map_irq)))
+	{
+		printk(KERN_WARNING "xen map irq failed %x\n", rc);
+		dynamic_irq_cleanup(irq);
+		irq = -1;
+		goto out;
+	}
+
+	irq_info[irq] = mk_pirq_info(0, -1, map_irq.index);
+	set_irq_chip_and_handler_name(irq, &xen_pirq_chip,
+			handle_level_irq,
+			(type == PCI_CAP_ID_MSIX) ? "msi-x":"msi");
+
+out:
+	spin_unlock(&irq_mapping_update_lock);
 	return irq;
 }
 
diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h
index cd69391..47ab7f7 100644
--- a/include/xen/interface/physdev.h
+++ b/include/xen/interface/physdev.h
@@ -121,6 +121,65 @@ struct physdev_op {
 	} u;
 };
 
+
+#define MAP_PIRQ_TYPE_MSI               0x0
+#define MAP_PIRQ_TYPE_GSI               0x1
+#define MAP_PIRQ_TYPE_UNKNOWN           0x2
+
+#define PHYSDEVOP_map_pirq               13
+struct physdev_map_pirq {
+    domid_t domid;
+    /* IN */
+    int type;
+    /* IN */
+    int index;
+    /* IN or OUT */
+    int pirq;
+    /* IN */
+    int bus;
+    /* IN */
+    int devfn;
+    /* IN */
+    int entry_nr;
+    /* IN */
+    uint64_t table_base;
+};
+
+#define PHYSDEVOP_unmap_pirq             14
+struct physdev_unmap_pirq {
+    domid_t domid;
+    /* IN */
+    int pirq;
+};
+
+#define PHYSDEVOP_manage_pci_add         15
+#define PHYSDEVOP_manage_pci_remove      16
+struct physdev_manage_pci {
+    /* IN */
+    uint8_t bus;
+    uint8_t devfn;
+}; 
+
+#define PHYSDEVOP_restore_msi            19
+struct physdev_restore_msi {
+    /* IN */
+    uint8_t bus;
+    uint8_t devfn;
+};
+
+#define PHYSDEVOP_manage_pci_add_ext     20
+struct physdev_manage_pci_ext {
+    /* IN */
+    uint8_t bus;
+    uint8_t devfn;
+    unsigned is_extfn;
+    unsigned is_virtfn;
+    struct {
+        uint8_t bus;
+        uint8_t devfn;
+    } physfn;
+};
+
 /*
  * Notify that some PIRQ-bound event channels have been unmasked.
  * ** This command is obsolete since interface version 0x00030202 and is **

[-- Attachment #3: pci_wrapper.patch --]
[-- Type: application/octet-stream, Size: 3480 bytes --]

commit 134186e0b4526908b4c64c1454caff5db6ddf972
Author: Jiang, Yunhong <yunhong.jiang@intel.com>
Date:   Thu May 7 21:22:26 2009 +0800

    Add the pci wrap function, to notify Xen of all PCI information.
    
    Currently Xen depends on Dom0 to notify all PCI information. When Xen try to setup MSI for a pci device, it will depends on this information.
    
    This method need more discussion, since it add some ugly hook to pci_bus_type.
    
    Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>

diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index f0d1a89..372ad9e 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -13,4 +13,4 @@ obj-$(CONFIG_SMP)		+= smp.o spinlock.o
 obj-$(CONFIG_XEN_DEBUG_FS)	+= debugfs.o
 obj-$(CONFIG_XEN_DOM0)		+= vga.o apic.o
 obj-$(CONFIG_PCI_XEN)		+= pci-swiotlb.o
-obj-$(CONFIG_XEN_DOM0_PCI)	+= pci.o
\ No newline at end of file
+obj-$(CONFIG_XEN_DOM0_PCI)	+= pci.o pci_wrap.o
diff --git a/arch/x86/xen/pci_wrap.c b/arch/x86/xen/pci_wrap.c
new file mode 100644
index 0000000..4ab6966
--- /dev/null
+++ b/arch/x86/xen/pci_wrap.c
@@ -0,0 +1,90 @@
+/*
+ * vim:shiftwidth=8:noexpandtab
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/pci.h>
+#include <xen/interface/physdev.h>
+#include <asm/xen/hypercall.h>
+#include <asm/xen/hypervisor.h>
+
+static int (*pci_bus_probe)(struct device *dev);
+static int (*pci_bus_remove)(struct device *dev);
+
+static int pci_ari_enabled(struct pci_bus *bus)
+{
+	return bus->self && bus->self->ari_enabled;
+}
+
+static int pci_bus_probe_wrapper(struct device *dev)
+{
+	int r;
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+	struct physdev_manage_pci manage_pci;
+	struct physdev_manage_pci_ext manage_pci_ext;
+
+#ifdef CONFIG_PCI_IOV
+	if (pci_dev->is_virtfn) {
+		memset(&manage_pci_ext, 0, sizeof(manage_pci_ext));
+		manage_pci_ext.bus = pci_dev->bus->number;
+		manage_pci_ext.devfn = pci_dev->devfn;
+		manage_pci_ext.is_virtfn = 1;
+		manage_pci_ext.physfn.bus = pci_dev->physfn->bus->number;
+		manage_pci_ext.physfn.devfn = pci_dev->physfn->devfn;
+		r = HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add_ext,
+					  &manage_pci_ext);
+	} else
+#endif
+	if (pci_ari_enabled(pci_dev->bus) && PCI_SLOT(pci_dev->devfn)) {
+		memset(&manage_pci_ext, 0, sizeof(manage_pci_ext));
+		manage_pci_ext.bus = pci_dev->bus->number;
+		manage_pci_ext.devfn = pci_dev->devfn;
+		manage_pci_ext.is_extfn = 1;
+		r = HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add_ext,
+					  &manage_pci_ext);
+	} else {
+		manage_pci.bus = pci_dev->bus->number;
+		manage_pci.devfn = pci_dev->devfn;
+		r = HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add,
+					  &manage_pci);
+	}
+	if (r && r != -ENOSYS)
+		return r;
+
+	r = pci_bus_probe(dev);
+	return r;
+}
+
+static int pci_bus_remove_wrapper(struct device *dev)
+{
+	int r;
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+	struct physdev_manage_pci manage_pci;
+	manage_pci.bus = pci_dev->bus->number;
+	manage_pci.devfn = pci_dev->devfn;
+
+	r = pci_bus_remove(dev);
+	/* dev and pci_dev are no longer valid!! */
+
+	WARN_ON(HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_remove,
+		&manage_pci));
+	return r;
+}
+
+static int __init hook_pci_bus(void)
+{
+
+	if (!xen_domain())
+		return 0;
+
+	pci_bus_probe = pci_bus_type.probe;
+	pci_bus_type.probe = pci_bus_probe_wrapper;
+
+	pci_bus_remove = pci_bus_type.remove;
+	pci_bus_type.remove = pci_bus_remove_wrapper;
+
+	return 0;
+}
+
+core_initcall(hook_pci_bus);

[-- Attachment #4: reenable_msi.patch --]
[-- Type: application/octet-stream, Size: 1489 bytes --]

commit 2714571f3bc6aa9995d6a8ea14acc0e2dd2c123a
Author: Jiang, Yunhong <yunhong.jiang@intel.com>
Date:   Fri May 8 00:51:05 2009 +0800

    Remove xen's disable MSI function
    
    Since now we have basic MSI support, we can remove this.
    
    Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>

diff --git a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c
index 2f35a2d..ee0db39 100644
--- a/arch/x86/xen/apic.c
+++ b/arch/x86/xen/apic.c
@@ -47,8 +47,6 @@ void xen_init_apic(void)
 	if (!xen_initial_domain())
 		return;
 
-	pci_no_msi();
-
 #ifdef CONFIG_ACPI
 	/*
 	 * Pretend ACPI found our lapic even though we've disabled it,
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 79ada7b..d03f6b9 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -111,8 +111,10 @@ extern struct rw_semaphore pci_bus_sem;
 extern unsigned int pci_pm_d3_delay;
 
 #ifdef CONFIG_PCI_MSI
+void pci_no_msi(void);
 extern void pci_msi_init_pci_dev(struct pci_dev *dev);
 #else
+static inline void pci_no_msi(void) { }
 static inline void pci_msi_init_pci_dev(struct pci_dev *dev) { }
 #endif
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 75b0645..e831a10 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1256,11 +1256,5 @@ static inline irqreturn_t pci_sriov_migration(struct pci_dev *dev)
 }
 #endif
 
-#ifdef CONFIG_PCI_MSI
-void pci_no_msi(void);
-#else
-static inline void pci_no_msi(void) { }
-#endif
-
 #endif /* __KERNEL__ */
 #endif /* LINUX_PCI_H */

[-- Attachment #5: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH]Add MSI support to PV_dom0
  2009-05-08  9:10 [PATCH]Add MSI support to PV_dom0 Jiang, Yunhong
@ 2009-05-12 20:05 ` Jeremy Fitzhardinge
  2009-05-13  3:16   ` Jiang, Yunhong
  0 siblings, 1 reply; 4+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-12 20:05 UTC (permalink / raw)
  To: Jiang, Yunhong; +Cc: Xen-devel, Matthew Wilcox

Jiang, Yunhong wrote:
> Jeremy, attached is basic MSI support to PV_dom0.  Please have a look on it.
>
> Thanks
> Yunhong Jiang
>
> The pci_wrapper.patch add some hook to pci_bus_type, so that Xen will be notified when a PCI device is added to system.
>  Makefile   |    2 -
>  pci_wrap.c |   90 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 91 insertions(+), 1 deletion(-)
>
> The enable_msi.patch add the MSI support to dom0, basically it just allocate irq from Xen irq name space.
>
>  arch/x86/include/asm/xen/pci.h  |   17 +++++++
>  arch/x86/kernel/apic/io_apic.c  |   22 +++++++--
>  arch/x86/xen/apic.c             |    1
>  drivers/xen/events.c            |   91 +++++++++++++++++++++++++++++++++++++++-
>  include/xen/interface/physdev.h |   59 +++++++++++++++++++++++++
>  5 files changed, 182 insertions(+), 8 deletions(-)
>
> The reenable_msi.patch just remove original function that disable MSI in xen environment.
>
>  arch/x86/xen/apic.c |    2 --
>  drivers/pci/pci.h   |    2 ++
>  include/linux/pci.h |    6 ------
>  3 files changed, 2 insertions(+), 8 deletions(-)
>
>
> Notice:
> a) Currently the PCI save/restore is broken. The reason is because pci_restore_msi_state() in "drivers/pci/msi.c" will try to restore the MSI config depends on device's msi msg information (i.e. content of msi_desc->msg). However, that information is wrong in Xen environment because Xen HV owns MSI. I'm still trying to find a method to achieve the save/restore without touch the common msi.c code, any suggestion is welcome.
>   

You mean because it tries to directly write to pci config space, or that 
it writes the wrong thing there?

What will this affect?  Dom0 S3 suspend/resume?  More?

> b) I notice pci frontend is in a branch. But to support pci frontend without touch common PCI function is difficult, I'm still considering it.
>   

OK.

> c) I'm not sure if xen's irq space should be same as pirq. Basically I think irq is dom0 internal structure, while PIRQ is interface between Xen HV/dom0. But seems current implementation think these two items are same. I didn't try to change it, but that may need improvement.

What do you mean by "Xen's irq space"?  Does it have an irq space that's 
distinct from pirqs?  Or do you mean "Linux's irq space"?  At the 
moment, a Linux irq == the gsi, which is a convention I kept for Xen 
interrupts, though there's no very strong tie (identity_mapped_irq() 
determines where we bother with the identity mapping; only the legacy 
ISA interrupts are essential).  Obviously this has no meaning with MSI 
interrupts.

Or have I misunderstood you?

(It would be easier to review the patches if they were inline, one per mail)

> commit 702965d82162e07c0c2afbdddbbe9a0c9a1c599d Author: Jiang, Yunhong 
> <yunhong.jiang@intel.com> Date: Fri May 8 00:50:34 2009 +0800 Add MSI 
> support to Xen Dom0 It add some hook to arch specific MSI function, so 
> that it will a) allocate irq from Xen's irq space b) allocate vector 
> through Xen's map_irq hypercall. Currently Xen's irq space function 
> has no chip_data function, I assume this should be ok since we Xen's 
> IRQ is different chip with native IOAPIC. Signed-off-by: Jiang, 
> Yunhong <yunhong.jiang@intel.com> diff --git 
> a/arch/x86/include/asm/xen/pci.h b/arch/x86/include/asm/xen/pci.h 
> index 0563fc6..a5d9027 100644 --- a/arch/x86/include/asm/xen/pci.h +++ 
> b/arch/x86/include/asm/xen/pci.h @@ -3,11 +3,28 @@ #ifdef 
> CONFIG_XEN_DOM0_PCI int xen_register_gsi(u32 gsi, int triggering, int 
> polarity); +int xen_create_msi_irq(struct pci_dev *dev, + unsigned int 
> want, + struct msi_desc *msidesc, + int type); +int 
> xen_destroy_irq(int irq); #else static inline int xen_register_gsi(u32 
> gsi, int triggering, int polarity) { return -1; } + +int 
> xen_create_msi_irq(struct pci_dev *dev, 
static inline
> + unsigned int want, + struct msi_desc *msidesc, + int type) +{ + 
> return -1; +} +int xen_destroy_irq(int irq) 
static inline
> +{ + return -1; +} #endif #endif /* _ASM_X86_XEN_PCI_H */ diff --git 
> a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c 
> index b562550..d1e3408 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ 
> b/arch/x86/kernel/apic/io_apic.c @@ -66,6 +66,7 @@ #include 
> <asm/xen/hypervisor.h> #include <asm/apic.h> +#include <asm/xen/pci.h> 
> #define __apicdebuginit(type) static type __init @@ -3472,6 +3473,10 
> @@ static int setup_msi_irq(struct pci_dev *dev, struct msi_desc 
> *msidesc, int irq) return ret; set_irq_msi(irq, msidesc); + + if 
> (xen_domain()) + return 0; 

I think it would be cleaner to have a xen_setup_msi_irq() which just does the 
	ret = msi_compose_msg(dev, irq, &msg);
	if (ret < 0)
		return ret;

	set_irq_msi(irq, msidesc);

parts then returns, and put the if (xen) logic at the callsite (either with an explicit
test, or add an ops vec of some kind.

> + write_msi_msg(irq, &msg); if (irq_remapped(irq)) { @@ -3505,9 
> +3510,14 @@ int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int 
> type) irq_want = nr_irqs_gsi; sub_handle = 0; 
> list_for_each_entry(msidesc, &dev->msi_list, list) { - irq = 
> create_irq_nr(irq_want); - if (irq == 0) - return -1; + if ((irq = 
> xen_create_msi_irq(dev, irq_want, msidesc, type)) > 0) + { + 
> dev_printk(KERN_DEBUG, "xen create irq %x for msi\n", irq); + } else { 
> + irq = create_irq_nr(irq_want); + if (irq == 0) + return -1; + } 
Hoist this out into a create_msi_irq() which does the native vs Xen thing
appropriately.
> irq_want = irq + 1; if (!intr_remapping_enabled) goto no_ir; @@ 
> -3544,13 +3554,15 @@ no_ir: return 0; error: - destroy_irq(irq); + if 
> (xen_destroy_irq(irq)) + destroy_irq(irq); 
Hoist this out too.  (destroy_msi_irq()?)
> return ret; } 

Do we (or will we ever) support the interrupt remapping stuff in the dom0 kernel,
or is that something that might happen within Xen?  (I don't know anything about
it.)

If not, then it looks like there isn't very much common code between the native
and Xen versions of arch_setup_msi_irqs().  I think it might be overall cleaner
just to explicitly have two versions, with arch_setup_msi_irqs() being a wrapper to
choose which one to use.  Even if we do support interrupt remapping, it wouldn't
result in much duplicated code at all, as far as I can see.
  
> void arch_teardown_msi_irq(unsigned int irq) { - destroy_irq(irq); + 
> if (xen_destroy_irq(irq)) + destroy_irq(irq); 

Yes, this little pattern should be put into a single place.

> } #if defined (CONFIG_DMAR) || defined (CONFIG_INTR_REMAP) diff --git 
> a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c index 496f07d..2f35a2d 
> 100644 --- a/arch/x86/xen/apic.c +++ b/arch/x86/xen/apic.c @@ -1,7 
> +1,6 @@ #include <linux/kernel.h> #include <linux/threads.h> #include 
> <linux/bitmap.h> -#include <linux/pci.h> #include <asm/io_apic.h> 
> #include <asm/acpi.h> diff --git a/drivers/xen/events.c 
> b/drivers/xen/events.c index af2aad4..65e4c7a 100644 --- 
> a/drivers/xen/events.c +++ b/drivers/xen/events.c @@ -28,6 +28,9 @@ 
> #include <linux/string.h> #include <linux/bootmem.h> #include 
> <linux/irqnr.h> +#include <linux/pci_regs.h> +#include <linux/pci.h> 
> +#include <linux/msi.h> #include <asm/ptrace.h> #include <asm/irq.h> 
> @@ -560,14 +563,98 @@ int xen_allocate_pirq(unsigned gsi, char *name) 
> if (HYPERVISOR_physdev_op(PHYSDEVOP_alloc_irq_vector, &irq_op)) { 
> dynamic_irq_cleanup(irq); irq = -ENOSPC; + goto out; + } + + 
> irq_info[irq] = mk_pirq_info(0, gsi, irq_op.vector); +out: + 
> spin_unlock(&irq_mapping_update_lock); + return irq; +} + +int 
> xen_destroy_irq(int irq) +{ + struct physdev_unmap_pirq unmap_irq; + 
> int rc; + + if (!xen_domain()) + return -1; + + unmap_irq.pirq = irq; 
> + unmap_irq.domid = DOMID_SELF; + if ((rc = 
> HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq, &unmap_irq))) + { + 
> printk(KERN_WARNING "unmap irq failed %x\n", rc); goto out; } - 
> irq_info[irq] = mk_pirq_info(0, gsi, irq_op.vector); + irq_info[irq] = 
> mk_unbound_info(); + + dynamic_irq_cleanup(irq); + return 0; out: - 
> spin_unlock(&irq_mapping_update_lock); + return -1; +} + +int 
> xen_create_msi_irq(struct pci_dev *dev, + unsigned int want, 

Do we need this param?  Looks unused.

> + struct msi_desc *msidesc, + int type) +{ + int irq = 0; + struct 
> physdev_map_pirq map_irq; + int rc; + domid_t domid = DOMID_SELF; + 
> int pos; + u32 table_offset, bir; + if (!xen_domain()) + return -1; + 
> + memset(&map_irq, 0, sizeof(map_irq)); + map_irq.domid = domid; + 
> map_irq.type = MAP_PIRQ_TYPE_MSI; + map_irq.index = -1; + map_irq.bus 
> = dev->bus->number; + map_irq.devfn = dev->devfn; + + if (type == 
> PCI_CAP_ID_MSIX) + { + pos = pci_find_capability(dev, 
> PCI_CAP_ID_MSIX); + +#define msix_table_offset_reg(base) (base + 0x04) 
Isn't this defined somewhere else?  If not, is there a better place to define it?
> + pci_read_config_dword(dev, msix_table_offset_reg(pos), 
> &table_offset); + bir = (u8)(table_offset & PCI_MSIX_FLAGS_BIRMASK); + 
> + map_irq.table_base = pci_resource_start(dev, bir); + 
> map_irq.entry_nr = msidesc->msi_attrib.entry_nr; + } + + 
> spin_lock(&irq_mapping_update_lock); + + irq = find_unbound_irq(); + + 
> if (irq == -1) + goto out; + + map_irq.pirq = irq; + + if ((rc = 
> HYPERVISOR_physdev_op(PHYSDEVOP_map_pirq, &map_irq))) + { + 
> printk(KERN_WARNING "xen map irq failed %x\n", rc); + 
> dynamic_irq_cleanup(irq); + irq = -1; + goto out; + } + + 
> irq_info[irq] = mk_pirq_info(0, -1, map_irq.index); + 
> set_irq_chip_and_handler_name(irq, &xen_pirq_chip, + handle_level_irq, 
> + (type == PCI_CAP_ID_MSIX) ? "msi-x":"msi"); + +out: + 
> spin_unlock(&irq_mapping_update_lock); return irq; } diff --git 
> a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h 
> index cd69391..47ab7f7 100644 --- a/include/xen/interface/physdev.h 
> +++ b/include/xen/interface/physdev.h @@ -121,6 +121,65 @@ struct 
> physdev_op { } u; }; + +#define MAP_PIRQ_TYPE_MSI 0x0 +#define 
> MAP_PIRQ_TYPE_GSI 0x1 +#define MAP_PIRQ_TYPE_UNKNOWN 0x2 + +#define 
> PHYSDEVOP_map_pirq 13 +struct physdev_map_pirq { + domid_t domid; + /* 
> IN */ + int type; + /* IN */ + int index; + /* IN or OUT */ + int 
> pirq; + /* IN */ + int bus; + /* IN */ + int devfn; + /* IN */ + int 
> entry_nr; + /* IN */ + uint64_t table_base; +}; + +#define 
> PHYSDEVOP_unmap_pirq 14 +struct physdev_unmap_pirq { + domid_t domid; 
> + /* IN */ + int pirq; +}; + +#define PHYSDEVOP_manage_pci_add 15 
> +#define PHYSDEVOP_manage_pci_remove 16 +struct physdev_manage_pci { + 
> /* IN */ + uint8_t bus; + uint8_t devfn; +}; + +#define 
> PHYSDEVOP_restore_msi 19 +struct physdev_restore_msi { + /* IN */ + 
> uint8_t bus; + uint8_t devfn; +}; + +#define 
> PHYSDEVOP_manage_pci_add_ext 20 +struct physdev_manage_pci_ext { + /* 
> IN */ + uint8_t bus; + uint8_t devfn; + unsigned is_extfn; + unsigned 
> is_virtfn; + struct { + uint8_t bus; + uint8_t devfn; + } physfn; +}; 
> + /* * Notify that some PIRQ-bound event channels have been unmasked. 
> * ** This command is obsolete since interface version 0x00030202 and 
> is ** 


> commit 134186e0b4526908b4c64c1454caff5db6ddf972 Author: Jiang, Yunhong 
> <yunhong.jiang@intel.com> Date: Thu May 7 21:22:26 2009 +0800 Add the 
> pci wrap function, to notify Xen of all PCI information. Currently Xen 
> depends on Dom0 to notify all PCI information. When Xen try to setup 
> MSI for a pci device, it will depends on this information. This method 
> need more discussion, since it add some ugly hook to pci_bus_type. 

Yes, this is a bit unpleasant.  I can't see any immediately satisfying solution, partly because
I don't fully understand what needs to be done in these hooks.  Why does it need to do this at probe
time rather than when setting up the interrupts?

> Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com> diff --git 
> a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile index f0d1a89..372ad9e 
> 100644 --- a/arch/x86/xen/Makefile +++ b/arch/x86/xen/Makefile @@ 
> -13,4 +13,4 @@ obj-$(CONFIG_SMP) += smp.o spinlock.o 
> obj-$(CONFIG_XEN_DEBUG_FS) += debugfs.o obj-$(CONFIG_XEN_DOM0) += 
> vga.o apic.o obj-$(CONFIG_PCI_XEN) += pci-swiotlb.o 
> -obj-$(CONFIG_XEN_DOM0_PCI) += pci.o \ No newline at end of file 
> +obj-$(CONFIG_XEN_DOM0_PCI) += pci.o pci_wrap.o diff --git 
> a/arch/x86/xen/pci_wrap.c b/arch/x86/xen/pci_wrap.c new file mode 
> 100644 index 0000000..4ab6966 --- /dev/null +++ 
> b/arch/x86/xen/pci_wrap.c @@ -0,0 +1,90 @@ +/* + * 
> vim:shiftwidth=8:noexpandtab + */ + +#include <linux/kernel.h> 
> +#include <linux/init.h> +#include <linux/pci.h> +#include 
> <xen/interface/physdev.h> +#include <asm/xen/hypercall.h> +#include 
> <asm/xen/hypervisor.h> + +static int (*pci_bus_probe)(struct device 
> *dev); +static int (*pci_bus_remove)(struct device *dev); 

I'd give these more distinctive names: orig_pci_bus_probe, or something,
and call them with (*orig_pci_bus_probe)(...) to make it obvious they're
function callers.  I overlooked the calls the first couple of times because
they didn't stand out.

> +static int pci_ari_enabled(struct pci_bus *bus) +{ + return bus->self 
> && bus->self->ari_enabled; +} 

Is this a generally useful predicate?

> + +static int pci_bus_probe_wrapper(struct device *dev) +{ + int r; + 
> struct pci_dev *pci_dev = to_pci_dev(dev); + struct physdev_manage_pci 
> manage_pci; + struct physdev_manage_pci_ext manage_pci_ext; + +#ifdef 
> CONFIG_PCI_IOV + if (pci_dev->is_virtfn) { + memset(&manage_pci_ext, 
> 0, sizeof(manage_pci_ext)); + manage_pci_ext.bus = 
> pci_dev->bus->number; + manage_pci_ext.devfn = pci_dev->devfn; + 
> manage_pci_ext.is_virtfn = 1; + manage_pci_ext.physfn.bus = 
> pci_dev->physfn->bus->number; + manage_pci_ext.physfn.devfn = 
> pci_dev->physfn->devfn; + r = 
> HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add_ext, + 
> &manage_pci_ext); + } else +#endif + if (pci_ari_enabled(pci_dev->bus) 
> && PCI_SLOT(pci_dev->devfn)) { + memset(&manage_pci_ext, 0, 
> sizeof(manage_pci_ext)); + manage_pci_ext.bus = pci_dev->bus->number; 
> + manage_pci_ext.devfn = pci_dev->devfn; + manage_pci_ext.is_extfn = 
> 1; + r = HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add_ext, + 
> &manage_pci_ext); + } else { + manage_pci.bus = pci_dev->bus->number; 
> + manage_pci.devfn = pci_dev->devfn; + r = 
> HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add, + &manage_pci); + } + 
> if (r && r != -ENOSYS) + return r; 

Why is ENOSYS OK?  Doesn't it mean that Xen is missing some aspect of MSI support?

> + + r = pci_bus_probe(dev); + return r; +} + +static int 
> pci_bus_remove_wrapper(struct device *dev) +{ + int r; + struct 
> pci_dev *pci_dev = to_pci_dev(dev); + struct physdev_manage_pci 
> manage_pci; + manage_pci.bus = pci_dev->bus->number; + 
> manage_pci.devfn = pci_dev->devfn; + + r = pci_bus_remove(dev); + /* 
> dev and pci_dev are no longer valid!! */ + + 
> WARN_ON(HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_remove, + 
> &manage_pci)); 
I prefer:
	if (... != 0)
		WARN_ON(1);
to make it obvious that we're doing an error check, rather than some
internal consistency assertion.

> + return r; +} + +static int __init hook_pci_bus(void) +{ + + if 
> (!xen_domain()) + return 0; + + pci_bus_probe = pci_bus_type.probe; + 
> pci_bus_type.probe = pci_bus_probe_wrapper; + + pci_bus_remove = 
> pci_bus_type.remove; + pci_bus_type.remove = pci_bus_remove_wrapper; + 
> + return 0; +} + +core_initcall(hook_pci_bus); 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: [PATCH]Add MSI support to PV_dom0
  2009-05-12 20:05 ` Jeremy Fitzhardinge
@ 2009-05-13  3:16   ` Jiang, Yunhong
  2009-05-13 17:31     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 4+ messages in thread
From: Jiang, Yunhong @ 2009-05-13  3:16 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Xen-devel, Matthew Wilcox

Jeremy, thanks for your detailed comments, I will update according to your feedback. See comments embeded please.

--jyh

Jeremy Fitzhardinge wrote:
> Jiang, Yunhong wrote:
>> Jeremy, attached is basic MSI support to PV_dom0.  Please have a
>> look on it.
>>
>> Thanks
>> Yunhong Jiang
>>
>> The pci_wrapper.patch add some hook to pci_bus_type, so that
> Xen will be notified when a PCI device is added to system.
>>  Makefile   |    2 -
>>  pci_wrap.c |   90
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 91 insertions(+), 1 deletion(-)
>>
>> The enable_msi.patch add the MSI support to dom0, basically
> it just allocate irq from Xen irq name space.
>>
>>  arch/x86/include/asm/xen/pci.h  |   17 +++++++
>>  arch/x86/kernel/apic/io_apic.c  |   22 +++++++--
>>  arch/x86/xen/apic.c             |    1
>>  drivers/xen/events.c            |   91
> +++++++++++++++++++++++++++++++++++++++-
>>  include/xen/interface/physdev.h |   59 +++++++++++++++++++++++++
>>  5 files changed, 182 insertions(+), 8 deletions(-)
>>
>> The reenable_msi.patch just remove original function that
> disable MSI in xen environment.
>>
>>  arch/x86/xen/apic.c |    2 --
>>  drivers/pci/pci.h   |    2 ++
>>  include/linux/pci.h |    6 ------
>>  3 files changed, 2 insertions(+), 8 deletions(-)
>>
>>
>> Notice:
>> a) Currently the PCI save/restore is broken. The reason is
> because pci_restore_msi_state() in "drivers/pci/msi.c" will
> try to restore the MSI config depends on device's msi msg
> information (i.e. content of msi_desc->msg). However, that
> information is wrong in Xen environment because Xen HV owns
> MSI. I'm still trying to find a method to achieve the
> save/restore without touch the common msi.c code, any
> suggestion is welcome.
>>
>
> You mean because it tries to directly write to pci config
> space, or that
> it writes the wrong thing there?

Because It will try to write to the pci config space directly, and currently it will write a wrong value.
I'm considering to overwrite msi_compose_msg() in arch/x86/kernel/apic/io_apic.c, so that msi_desc->msg will hold the value Xen is using, I hope it will resolve the issue.


>
> What will this affect?  Dom0 S3 suspend/resume?  More?

Yes, suspend/resume.

>
>> b) I notice pci frontend is in a branch. But to support pci
> frontend without touch common PCI function is difficult, I'm
> still considering it.
>>
>
> OK.
>
>> c) I'm not sure if xen's irq space should be same as pirq.
> Basically I think irq is dom0 internal structure, while PIRQ
> is interface between Xen HV/dom0. But seems current
> implementation think these two items are same. I didn't try to
> change it, but that may need improvement.
>
> What do you mean by "Xen's irq space"?  Does it have an irq
> space that's
> distinct from pirqs?  Or do you mean "Linux's irq space"?  At the

Aha, yes, I mean linux's irq space.

> moment, a Linux irq == the gsi, which is a convention I kept for Xen

I suspect if this will always work, because
1) It may not work on x86_32, because gsi is compressed in x86_32, or we will not support that?
2) Even in x86_64, I suspect it may not work still. Currently Xen defined NR_IRQS as 256, unless specified in Rules.mk. So what will happen if the gsi > 256? I suspect the PHYSDEVOP_alloc_irq_vector hypercall will fail.

The reason is, the pirq is virtual interrupt line between Xen/guest, and is determined by Xen HV, while irq/gsi is defined by guest/dom0. Current Xen's dom0 has a mapping between pirq/irq, so I'm not sure if that logic is needed in pv_dom0 still.

> interrupts, though there's no very strong tie (identity_mapped_irq()
> determines where we bother with the identity mapping; only the legacy
> ISA interrupts are essential).  Obviously this has no meaning with
> MSI interrupts.
>
> Or have I misunderstood you?
>
> (It would be easier to review the patches if they were inline,
> one per mail)

Sorry, I will do it next time.

>
>> commit 702965d82162e07c0c2afbdddbbe9a0c9a1c599d Author: Jiang,
>> Yunhong <yunhong.jiang@intel.com> Date: Fri May 8 00:50:34 2009
>> +0800 Add MSI support to Xen Dom0 It add some hook to arch specific
>> MSI function, so that it will a) allocate irq from Xen's irq space
>> b) allocate vector through Xen's map_irq hypercall. Currently Xen's
>> irq space function has no chip_data function, I assume this should
>> be ok since we Xen's IRQ is different chip with native IOAPIC.
>> Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com> diff --git
>> a/arch/x86/include/asm/xen/pci.h b/arch/x86/include/asm/xen/pci.h
>> index 0563fc6..a5d9027 100644 ---
> a/arch/x86/include/asm/xen/pci.h +++
>> b/arch/x86/include/asm/xen/pci.h @@ -3,11 +3,28 @@ #ifdef
>> CONFIG_XEN_DOM0_PCI int xen_register_gsi(u32 gsi, int triggering, int
>> polarity); +int xen_create_msi_irq(struct pci_dev *dev, + unsigned
>> int want, + struct msi_desc *msidesc, + int type); +int
>> xen_destroy_irq(int irq); #else static inline int
>> xen_register_gsi(u32 gsi, int triggering, int polarity) { return -1;
>> } + +int xen_create_msi_irq(struct pci_dev *dev,
> static inline
>> + unsigned int want, + struct msi_desc *msidesc, + int type) +{ +
>> return -1; +} +int xen_destroy_irq(int irq) static inline
>> +{ + return -1; +} #endif #endif /* _ASM_X86_XEN_PCI_H */ diff --git
>> a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
>> index b562550..d1e3408 100644 ---
> a/arch/x86/kernel/apic/io_apic.c +++
>> b/arch/x86/kernel/apic/io_apic.c @@ -66,6 +66,7 @@ #include
>> <asm/xen/hypervisor.h> #include <asm/apic.h> +#include
>> <asm/xen/pci.h> #define __apicdebuginit(type) static type __init @@
>> -3472,6 +3473,10 @@ static int setup_msi_irq(struct pci_dev *dev,
>> struct msi_desc *msidesc, int irq) return ret; set_irq_msi(irq,
>> msidesc); + + if (xen_domain()) + return 0;
>
> I think it would be cleaner to have a xen_setup_msi_irq()
> which just does the
>       ret = msi_compose_msg(dev, irq, &msg);
>       if (ret < 0)
>               return ret;
>
>       set_irq_msi(irq, msidesc);
>
> parts then returns, and put the if (xen) logic at the callsite
> (either with an explicit test, or add an ops vec of some kind.

Sure, I will do like that. In fact, my draft implemented this way.
BTW, what do you mean of "an ops vec"???

>
>> + write_msi_msg(irq, &msg); if (irq_remapped(irq)) { @@ -3505,9
>> +3510,14 @@ int arch_setup_msi_irqs(struct pci_dev *dev, int nvec,
>> int type) irq_want = nr_irqs_gsi; sub_handle = 0;
>> list_for_each_entry(msidesc, &dev->msi_list, list) { - irq =
>> create_irq_nr(irq_want); - if (irq == 0) - return -1; + if ((irq =
>> xen_create_msi_irq(dev, irq_want, msidesc, type)) > 0) + { +
>> dev_printk(KERN_DEBUG, "xen create irq %x for msi\n", irq); + } else
>> { + irq = create_irq_nr(irq_want); + if (irq == 0) + return -1; + }
> Hoist this out into a create_msi_irq() which does the native
> vs Xen thing
> appropriately.

Sure.

>> irq_want = irq + 1; if (!intr_remapping_enabled) goto no_ir; @@
>> -3544,13 +3554,15 @@ no_ir: return 0; error: - destroy_irq(irq); + if
>> (xen_destroy_irq(irq)) + destroy_irq(irq);
> Hoist this out too.  (destroy_msi_irq()?)

Sure.

>> return ret; }
>
> Do we (or will we ever) support the interrupt remapping stuff
> in the dom0 kernel,
> or is that something that might happen within Xen?  (I don't
> know anything about
> it.)

Per my understanding, interrupt remapping should happen within Xen.

>
> If not, then it looks like there isn't very much common code
> between the native
> and Xen versions of arch_setup_msi_irqs().  I think it might
> be overall cleaner
> just to explicitly have two versions, with
> arch_setup_msi_irqs() being a wrapper to
> choose which one to use.  Even if we do support interrupt
> remapping, it wouldn't
> result in much duplicated code at all, as far as I can see.

So do you mean something like
arch_setup_msi_irqs(...)
{
        if (xen_domain)
                xen_setup_msi_irqs()
          ..... same as native
}

>
>> void arch_teardown_msi_irq(unsigned int irq) { - destroy_irq(irq); +
>> if (xen_destroy_irq(irq)) + destroy_irq(irq);
>
> Yes, this little pattern should be put into a single place.

What do you mean of put into a single place?

>
>> } #if defined (CONFIG_DMAR) || defined (CONFIG_INTR_REMAP) diff --git
>> a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c index 496f07d..2f35a2d
>> 100644 --- a/arch/x86/xen/apic.c +++ b/arch/x86/xen/apic.c @@ -1,7
>> +1,6 @@ #include <linux/kernel.h> #include <linux/threads.h> #include
>> <linux/bitmap.h> -#include <linux/pci.h> #include <asm/io_apic.h>
>> #include <asm/acpi.h> diff --git a/drivers/xen/events.c
>> b/drivers/xen/events.c index af2aad4..65e4c7a 100644 ---
>> a/drivers/xen/events.c +++ b/drivers/xen/events.c @@ -28,6 +28,9 @@
>> #include <linux/string.h> #include <linux/bootmem.h> #include
>> <linux/irqnr.h> +#include <linux/pci_regs.h> +#include <linux/pci.h>
>> +#include <linux/msi.h> #include <asm/ptrace.h> #include <asm/irq.h>
>> @@ -560,14 +563,98 @@ int xen_allocate_pirq(unsigned gsi, char *name)
>> if (HYPERVISOR_physdev_op(PHYSDEVOP_alloc_irq_vector, &irq_op)) {
>> dynamic_irq_cleanup(irq); irq = -ENOSPC; + goto out; + } + +
>> irq_info[irq] = mk_pirq_info(0, gsi, irq_op.vector); +out: +
>> spin_unlock(&irq_mapping_update_lock); + return irq; +} + +int
>> xen_destroy_irq(int irq) +{ + struct physdev_unmap_pirq unmap_irq; +
>> int rc; + + if (!xen_domain()) + return -1; + + unmap_irq.pirq = irq;
>> + unmap_irq.domid = DOMID_SELF; + if ((rc =
>> HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq, &unmap_irq))) + { +
>> printk(KERN_WARNING "unmap irq failed %x\n", rc); goto out; } -
>> irq_info[irq] = mk_pirq_info(0, gsi, irq_op.vector); + irq_info[irq]
>> = mk_unbound_info(); + + dynamic_irq_cleanup(irq); + return 0; out: -
>> spin_unlock(&irq_mapping_update_lock); + return -1; +} + +int
>> xen_create_msi_irq(struct pci_dev *dev, + unsigned int want,
>
> Do we need this param?  Looks unused.

I wil change it. I just wanted to keep it similar as native create_irq_nr() function.

>
>> + struct msi_desc *msidesc, + int type) +{ + int irq = 0; + struct
>> physdev_map_pirq map_irq; + int rc; + domid_t domid = DOMID_SELF; +
>> int pos; + u32 table_offset, bir; + if (!xen_domain()) + return -1; +
>> + memset(&map_irq, 0, sizeof(map_irq)); + map_irq.domid = domid; +
>> map_irq.type = MAP_PIRQ_TYPE_MSI; + map_irq.index = -1; + map_irq.bus
>> = dev->bus->number; + map_irq.devfn = dev->devfn; + + if (type ==
>> PCI_CAP_ID_MSIX) + { + pos = pci_find_capability(dev,
>> PCI_CAP_ID_MSIX); + +#define msix_table_offset_reg(base)
> (base + 0x04)
> Isn't this defined somewhere else?  If not, is there a better
> place to define it?

It is defined in drivers/pci/msi.h. As I don't want to touch anything in drivers/pci/ directory, so I put it here (sorry that I should place it under some .h file).

>> + pci_read_config_dword(dev, msix_table_offset_reg(pos),
>> &table_offset); + bir = (u8)(table_offset & PCI_MSIX_FLAGS_BIRMASK);
>> + + map_irq.table_base = pci_resource_start(dev, bir); +
>> map_irq.entry_nr = msidesc->msi_attrib.entry_nr; + } + +
>> spin_lock(&irq_mapping_update_lock); + + irq = find_unbound_irq(); +
>> + if (irq == -1) + goto out; + + map_irq.pirq = irq; + + if ((rc =
>> HYPERVISOR_physdev_op(PHYSDEVOP_map_pirq, &map_irq))) + { +
>> printk(KERN_WARNING "xen map irq failed %x\n", rc); +
>> dynamic_irq_cleanup(irq); + irq = -1; + goto out; + } + +
>> irq_info[irq] = mk_pirq_info(0, -1, map_irq.index); +
>> set_irq_chip_and_handler_name(irq, &xen_pirq_chip, +
>> handle_level_irq, + (type == PCI_CAP_ID_MSIX) ? "msi-x":"msi"); +
>> +out: + spin_unlock(&irq_mapping_update_lock); return irq; } diff
>> --git a/include/xen/interface/physdev.h
>> b/include/xen/interface/physdev.h index cd69391..47ab7f7 100644 ---
>> a/include/xen/interface/physdev.h +++
>> b/include/xen/interface/physdev.h @@ -121,6 +121,65 @@ struct
>> physdev_op { } u; }; + +#define MAP_PIRQ_TYPE_MSI 0x0 +#define
>> MAP_PIRQ_TYPE_GSI 0x1 +#define MAP_PIRQ_TYPE_UNKNOWN 0x2 + +#define
>> PHYSDEVOP_map_pirq 13 +struct physdev_map_pirq { + domid_t domid; +
>> /* IN */ + int type; + /* IN */ + int index; + /* IN or OUT */ + int
>> pirq; + /* IN */ + int bus; + /* IN */ + int devfn; + /* IN */ + int
>> entry_nr; + /* IN */ + uint64_t table_base; +}; + +#define
>> PHYSDEVOP_unmap_pirq 14 +struct physdev_unmap_pirq { + domid_t
>> domid; + /* IN */ + int pirq; +}; + +#define
>> PHYSDEVOP_manage_pci_add 15 +#define PHYSDEVOP_manage_pci_remove 16
>> +struct physdev_manage_pci { + /* IN */ + uint8_t bus; + uint8_t
>> devfn; +}; + +#define PHYSDEVOP_restore_msi 19 +struct
>> physdev_restore_msi { + /* IN */ + uint8_t bus; + uint8_t devfn; +};
>> + +#define PHYSDEVOP_manage_pci_add_ext 20 +struct
>> physdev_manage_pci_ext { + /* IN */ + uint8_t bus; + uint8_t devfn;
>> + unsigned is_extfn; + unsigned is_virtfn; + struct { + uint8_t bus;
>> + uint8_t devfn; + } physfn; +}; + /* * Notify that some PIRQ-bound
>> event channels have been unmasked. * ** This command is obsolete
>> since interface version 0x00030202 and is **
>
>
>> commit 134186e0b4526908b4c64c1454caff5db6ddf972 Author: Jiang,
>> Yunhong <yunhong.jiang@intel.com> Date: Thu May 7 21:22:26 2009
>> +0800 Add the pci wrap function, to notify Xen of all PCI
>> information. Currently Xen depends on Dom0 to notify all PCI
>> information. When Xen try to setup MSI for a pci device, it will
>> depends on this information. This method need more discussion, since
>> it add some ugly hook to pci_bus_type.
>
> Yes, this is a bit unpleasant.  I can't see any immediately
> satisfying solution, partly because
> I don't fully understand what needs to be done in these hooks.
> Why does it need to do this at probe
> time rather than when setting up the interrupts?

The main purpose of this hook is to keep Xen have the list of PCI device in the system, include those hot plug/unpluged devices, before the device begin function,  I think the main purpose is for VT-d support in Xen, so it is in probe time.

In fact, MSI should works without this list. Unfortunately, currently Xen will check if the device exists or not when enabling MSI/MSI-X, and will return -ENODEV if the device is not in the list.

But I do think this code is very ugly, and I suspect if we can push this code into upstream successfully. I know Winston is working on PCI hotplug in Xen side, I will talk with him to see if any plan to change this logic.

>
>> Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com> diff --git
>> a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile index
>> f0d1a89..372ad9e 100644 --- a/arch/x86/xen/Makefile +++
>> b/arch/x86/xen/Makefile @@ -13,4 +13,4 @@ obj-$(CONFIG_SMP) += smp.o
>> spinlock.o obj-$(CONFIG_XEN_DEBUG_FS) += debugfs.o
>> obj-$(CONFIG_XEN_DOM0) += vga.o apic.o obj-$(CONFIG_PCI_XEN) +=
>> pci-swiotlb.o -obj-$(CONFIG_XEN_DOM0_PCI) += pci.o \ No newline at
>> end of file +obj-$(CONFIG_XEN_DOM0_PCI) += pci.o pci_wrap.o diff
>> --git a/arch/x86/xen/pci_wrap.c b/arch/x86/xen/pci_wrap.c new file
>> mode 100644 index 0000000..4ab6966 --- /dev/null +++
>> b/arch/x86/xen/pci_wrap.c @@ -0,0 +1,90 @@ +/* + *
>> vim:shiftwidth=8:noexpandtab + */ + +#include <linux/kernel.h>
>> +#include <linux/init.h> +#include <linux/pci.h> +#include
>> <xen/interface/physdev.h> +#include <asm/xen/hypercall.h> +#include
>> <asm/xen/hypervisor.h> + +static int (*pci_bus_probe)(struct device
>> *dev); +static int (*pci_bus_remove)(struct device *dev);
>
> I'd give these more distinctive names: orig_pci_bus_probe, or
> something, and call them with (*orig_pci_bus_probe)(...) to make it
> obvious they're
> function callers.  I overlooked the calls the first couple of
> times because
> they didn't stand out.
>
>> +static int pci_ari_enabled(struct pci_bus *bus) +{ + return
>> bus->self && bus->self->ari_enabled; +}
>
> Is this a generally useful predicate?

I just copied from Xen's dom0 code. I will check it .

>
>> + +static int pci_bus_probe_wrapper(struct device *dev) +{ + int r; +
>> struct pci_dev *pci_dev = to_pci_dev(dev); + struct
>> physdev_manage_pci manage_pci; + struct physdev_manage_pci_ext
>> manage_pci_ext; + +#ifdef CONFIG_PCI_IOV + if (pci_dev->is_virtfn) {
>> + memset(&manage_pci_ext, 0, sizeof(manage_pci_ext)); +
>> manage_pci_ext.bus = pci_dev->bus->number; + manage_pci_ext.devfn =
>> pci_dev->devfn; + manage_pci_ext.is_virtfn = 1; +
>> manage_pci_ext.physfn.bus = pci_dev->physfn->bus->number; +
>> manage_pci_ext.physfn.devfn = pci_dev->physfn->devfn; + r =
>> HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add_ext, +
>> &manage_pci_ext); + } else +#endif + if
>> (pci_ari_enabled(pci_dev->bus) && PCI_SLOT(pci_dev->devfn)) { +
>> memset(&manage_pci_ext, 0, sizeof(manage_pci_ext)); +
>> manage_pci_ext.bus = pci_dev->bus->number; + manage_pci_ext.devfn =
>> pci_dev->devfn; + manage_pci_ext.is_extfn = 1; + r =
>> HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add_ext, +
>> &manage_pci_ext); + } else { + manage_pci.bus =
>> pci_dev->bus->number; + manage_pci.devfn = pci_dev->devfn; + r =
>> HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add, + &manage_pci); + }
>> + if (r && r != -ENOSYS) + return r;
>
> Why is ENOSYS OK?  Doesn't it mean that Xen is missing some
> aspect of MSI support?

I'm not sure if that is for backward compatibility? I just copied this from current Xen's dom0 code.

>
>> + + r = pci_bus_probe(dev); + return r; +} + +static int
>> pci_bus_remove_wrapper(struct device *dev) +{ + int r; + struct
>> pci_dev *pci_dev = to_pci_dev(dev); + struct physdev_manage_pci
>> manage_pci; + manage_pci.bus = pci_dev->bus->number; +
>> manage_pci.devfn = pci_dev->devfn; + + r = pci_bus_remove(dev); + /*
>> dev and pci_dev are no longer valid!! */ + +
>> WARN_ON(HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_remove, +
>> &manage_pci));
> I prefer:
>       if (... != 0)
>               WARN_ON(1);
> to make it obvious that we're doing an error check, rather than some
> internal consistency assertion.
>
>> + return r; +} + +static int __init hook_pci_bus(void) +{ + + if
>> (!xen_domain()) + return 0; + + pci_bus_probe = pci_bus_type.probe; +
>> pci_bus_type.probe = pci_bus_probe_wrapper; + + pci_bus_remove =
>> pci_bus_type.remove; + pci_bus_type.remove = pci_bus_remove_wrapper;
>> + + return 0; +} + +core_initcall(hook_pci_bus);

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH]Add MSI support to PV_dom0
  2009-05-13  3:16   ` Jiang, Yunhong
@ 2009-05-13 17:31     ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 4+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-13 17:31 UTC (permalink / raw)
  To: Jiang, Yunhong; +Cc: Xen-devel, Matthew Wilcox

Jiang, Yunhong wrote:
>> moment, a Linux irq == the gsi, which is a convention I kept for Xen
>>     
>
> I suspect if this will always work, because
> 1) It may not work on x86_32, because gsi is compressed in x86_32, or we will not support that?
>   

Yinghai Lu has posted patches to remove gsi compression, now that 
dynamic irq arrays mean we have no practical limit on irq numbers.

> 2) Even in x86_64, I suspect it may not work still. Currently Xen defined NR_IRQS as 256, unless specified in Rules.mk. So what will happen if the gsi > 256? I suspect the PHYSDEVOP_alloc_irq_vector hypercall will fail.
>
> The reason is, the pirq is virtual interrupt line between Xen/guest, and is determined by Xen HV, while irq/gsi is defined by guest/dom0. Current Xen's dom0 has a mapping between pirq/irq, so I'm not sure if that logic is needed in pv_dom0 still.
>   

Well, we could decouple Xen pirq from Linux irq fairly easily if 
needed.  Or just limit the range of identity mapped irqs (by adjusting 
get_nr_hw_irqs()/identity_mapped_irq()) so that the common case will 
still get identities but we can leave room in the sub-256 irq space for 
pirqs.

> Sure, I will do like that. In fact, my draft implemented this way.
> BTW, what do you mean of "an ops vec"???
>   

A struct msi_ops (or something) with function pointers to appropriate 
functions.  But I think a better approach is to just pull everything out 
into  xen_setup_msi_irqs().

>>> void arch_teardown_msi_irq(unsigned int irq) { - destroy_irq(irq); +
>>> if (xen_destroy_irq(irq)) + destroy_irq(irq);
>>>       
>> Yes, this little pattern should be put into a single place.
>>     
>
> What do you mean of put into a single place?
>   

Have a destroy_msi_irq() which contains this logic, and call it from 
whenever it is needed.

>> (base + 0x04)
>> Isn't this defined somewhere else?  If not, is there a better
>> place to define it?
>>     
>
> It is defined in drivers/pci/msi.h. As I don't want to touch anything in drivers/pci/ directory, so I put it here (sorry that I should place it under some .h file).
>   

It is always best to put generically useful stuff in the most 
appropriate common place.

>>> commit 134186e0b4526908b4c64c1454caff5db6ddf972 Author: Jiang,
>>> Yunhong <yunhong.jiang@intel.com> Date: Thu May 7 21:22:26 2009
>>> +0800 Add the pci wrap function, to notify Xen of all PCI
>>> information. Currently Xen depends on Dom0 to notify all PCI
>>> information. When Xen try to setup MSI for a pci device, it will
>>> depends on this information. This method need more discussion, since
>>> it add some ugly hook to pci_bus_type.
>>>       
>> Yes, this is a bit unpleasant.  I can't see any immediately
>> satisfying solution, partly because
>> I don't fully understand what needs to be done in these hooks.
>> Why does it need to do this at probe
>> time rather than when setting up the interrupts?
>>     
>
> The main purpose of this hook is to keep Xen have the list of PCI device in the system, include those hot plug/unpluged devices, before the device begin function,  I think the main purpose is for VT-d support in Xen, so it is in probe time.
>
> In fact, MSI should works without this list. Unfortunately, currently Xen will check if the device exists or not when enabling MSI/MSI-X, and will return -ENODEV if the device is not in the list.
>   

Well, couldn't we quietly add it to the list just before setting up the 
first interrupt for the device?  I guess that wouldn't work for VT-d 
support, unless we can defer it until we first see the device in some 
Xen-specific code.

Matthew, do you have any thoughts about a cleaner way to hook into 
probe/remove?  Would a arch_pci_device_probe() hook in 
pci_device_probe() be a reasonable way to handle it?

> But I do think this code is very ugly, and I suspect if we can push this code into upstream successfully. I know Winston is working on PCI hotplug in Xen side, I will talk with him to see if any plan to change this logic.
>   

Either way we're eventually going to need something like it for VT-d, 
won't we?

>>> +static int pci_ari_enabled(struct pci_bus *bus) +{ + return
>>> bus->self && bus->self->ari_enabled; +}
>>>       
>> Is this a generally useful predicate?
>>     
>
> I just copied from Xen's dom0 code. I will check it .
>   

It looks like something that could be in a common pci header.


    J

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2009-05-13 17:31 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-05-08  9:10 [PATCH]Add MSI support to PV_dom0 Jiang, Yunhong
2009-05-12 20:05 ` Jeremy Fitzhardinge
2009-05-13  3:16   ` Jiang, Yunhong
2009-05-13 17:31     ` Jeremy Fitzhardinge

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.