All of lore.kernel.org
 help / color / mirror / Atom feed
* [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
@ 2009-02-17  9:07 Simon Horman
  2009-02-17  9:07 ` [rfc 01/18] Make register_real_device() and unregister_real_device() static Simon Horman
                   ` (19 more replies)
  0 siblings, 20 replies; 62+ messages in thread
From: Simon Horman @ 2009-02-17  9:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

This series starts of with servaral cleanup patches.

The meat of the changes start with the patch
"ioemu: use devfn instead of slots as the unit for passthrough"

This allows multi-function cards to be appear in guets as
multi-function cards, with the restriction that function 0 must
be passed through. Otherwise each function is allocated its own
slot, as before.

e.g.

1. Function 0 and two other functions of a multi-function card are
passed through, and the function numbers are maintained in the guest.

Physical                     Guest
0:1b.0    - pass through ->  0:6.0
0:1b.1    - pass through ->  0:6.1
0:1b.2    - pass through ->  0:6.2

2. Two functions other than zero of a multi-function card are
passed through. Each function is represent as function 0 of a slot
in the guest.

Physical                     Guest
0:1b.1    - pass through ->  0:6.0
0:1b.2    - pass through ->  0:7.0

Patches are also supplied to allow the virtual slot and device
to be supplied in the domain's configuration file. Amongst
other things this allows the existing assignment behaviour
to be specified.

Currently hotplug is not working with this scheme.
I am unsure of why, but am working towards a fix.

These patches are against qemu-xen-unstable.git
3f23188224b7ce69fcf13f52cb1c7977a5372900 as subsequent
revisions do not seem to work for me.

http://lists.xensource.com/archives/html/xen-devel/2009-02/msg00580.html

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [rfc 01/18] Make register_real_device() and unregister_real_device() static
  2009-02-17  9:07 [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough Simon Horman
@ 2009-02-17  9:07 ` Simon Horman
  2009-02-17  9:07 ` [rfc 02/18] Make dpci_infos static Simon Horman
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: Simon Horman @ 2009-02-17  9:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

[-- Attachment #1: ioemu-static-register_real_device.patch --]
[-- Type: text/plain, Size: 1155 bytes --]

Make register_real_device() and unregister_real_device() static as they are
only used inside hw/pass-through.c

Signed-off-by: Simon Horman <horms@verge.net.au>

Index: ioemu-remote/hw/pass-through.c
===================================================================
--- ioemu-remote.orig/hw/pass-through.c	2009-02-17 17:27:27.000000000 +0900
+++ ioemu-remote/hw/pass-through.c	2009-02-17 17:27:29.000000000 +0900
@@ -3068,7 +3068,7 @@ static int pt_msixctrl_reg_write(struct 
     return 0;
 }
 
-struct pt_dev * register_real_device(PCIBus *e_bus,
+static struct pt_dev * register_real_device(PCIBus *e_bus,
         const char *e_dev_name, int e_devfn, uint8_t r_bus, uint8_t r_dev,
         uint8_t r_func, uint32_t machine_irq, struct pci_access *pci_access,
         char *opt)
@@ -3254,7 +3254,7 @@ out:
     return assigned_device;
 }
 
-int unregister_real_device(int php_slot)
+static int unregister_real_device(int php_slot)
 {
     struct php_dev *php_dev;
     struct pci_dev *pci_dev;

-- 

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [rfc 02/18] Make dpci_infos static
  2009-02-17  9:07 [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough Simon Horman
  2009-02-17  9:07 ` [rfc 01/18] Make register_real_device() and unregister_real_device() static Simon Horman
@ 2009-02-17  9:07 ` Simon Horman
  2009-02-17  9:07 ` [rfc 03/18] ioemu: vslots needs to be freed on error in power_off_php_slot() Simon Horman
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: Simon Horman @ 2009-02-17  9:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

[-- Attachment #1: ioemu-static-dpci_infos.patch --]
[-- Type: text/plain, Size: 699 bytes --]

Make dpci_infos static as it is only used inside hw/pass-through.c

Signed-off-by: Simon Horman <horms@verge.net.au>

Index: ioemu-remote/hw/pass-through.c
===================================================================
--- ioemu-remote.orig/hw/pass-through.c	2009-02-17 17:27:29.000000000 +0900
+++ ioemu-remote/hw/pass-through.c	2009-02-17 17:27:33.000000000 +0900
@@ -36,7 +36,7 @@ struct php_dev {
     uint8_t r_func;
     char *opt;
 };
-struct dpci_infos {
+static struct dpci_infos {
 
     struct php_dev php_devs[PHP_SLOT_LEN];
 

-- 

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [rfc 03/18] ioemu: vslots needs to be freed on error in power_off_php_slot()
  2009-02-17  9:07 [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough Simon Horman
  2009-02-17  9:07 ` [rfc 01/18] Make register_real_device() and unregister_real_device() static Simon Horman
  2009-02-17  9:07 ` [rfc 02/18] Make dpci_infos static Simon Horman
@ 2009-02-17  9:07 ` Simon Horman
  2009-02-17  9:07 ` [rfc 04/18] ioemu: Remove lsi_scsi_init()s devfn parameter as it is always passed as -1 Simon Horman
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: Simon Horman @ 2009-02-17  9:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

[-- Attachment #1: ioemu-vslots-leak.patch --]
[-- Type: text/plain, Size: 1375 bytes --]

Signed-off-by: Simon Horman <horms@verge.net.au>

Index: ioemu-remote/hw/pass-through.c
===================================================================
--- ioemu-remote.orig/hw/pass-through.c	2009-02-17 17:28:10.000000000 +0900
+++ ioemu-remote/hw/pass-through.c	2009-02-17 17:28:11.000000000 +0900
@@ -3356,7 +3356,7 @@ int power_off_php_slot(int php_slot)
 
 int pt_init(PCIBus *e_bus, const char *direct_pci)
 {
-    int seg, b, d, f, php_slot = 0;
+    int seg, b, d, f, php_slot = 0, status = -1;
     struct pt_dev *pt_dev;
     struct pci_access *pci_access;
     char *vslots;
@@ -3400,8 +3400,7 @@ int pt_init(PCIBus *e_bus, const char *d
         if ( pt_dev == NULL )
         {
             PT_LOG("Error: Registration failed (%02x:%02x.%x)\n", b, d, f);
-            free(direct_pci_head);
-            return -1;
+            goto err;
         }
 
         /* Record the virtual slot info */
@@ -3420,10 +3419,11 @@ int pt_init(PCIBus *e_bus, const char *d
     /* Write virtual slots info to xenstore for Control panel use */
     xenstore_write_vslots(vslots);
 
+    status = 0;
+err:
     qemu_free(vslots);
     free(direct_pci_head);
 
-    /* Success */
-    return 0;
+    return status;
 }
 

-- 

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [rfc 04/18] ioemu: Remove lsi_scsi_init()s devfn parameter as it is always passed as -1
  2009-02-17  9:07 [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough Simon Horman
                   ` (2 preceding siblings ...)
  2009-02-17  9:07 ` [rfc 03/18] ioemu: vslots needs to be freed on error in power_off_php_slot() Simon Horman
@ 2009-02-17  9:07 ` Simon Horman
  2009-02-17  9:07 ` [rfc 05/18] " Simon Horman
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: Simon Horman @ 2009-02-17  9:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

[-- Attachment #1: ioemu-lsi_scsi_init-no-devfn-arg.patch --]
[-- Type: text/plain, Size: 3371 bytes --]

Signed-off-by: Simon Horman <horms@verge.net.au>

Index: ioemu-remote/hw/lsi53c895a.c
===================================================================
--- ioemu-remote.orig/hw/lsi53c895a.c	2009-02-12 15:36:30.000000000 +1100
+++ ioemu-remote/hw/lsi53c895a.c	2009-02-12 15:38:34.000000000 +1100
@@ -1858,12 +1858,12 @@ void lsi_scsi_attach(void *opaque, Block
         s->scsi_dev[id] = scsi_disk_init(bd, 1, lsi_command_complete, s);
 }
 
-void *lsi_scsi_init(PCIBus *bus, int devfn)
+void *lsi_scsi_init(PCIBus *bus)
 {
     LSIState *s;
 
     s = (LSIState *)pci_register_device(bus, "LSI53C895A SCSI HBA",
-                                        sizeof(*s), devfn, NULL, NULL);
+                                        sizeof(*s), -1, NULL, NULL);
     if (s == NULL) {
         fprintf(stderr, "lsi-scsi: Failed to register PCI device\n");
         return NULL;
Index: ioemu-remote/hw/pc.c
===================================================================
--- ioemu-remote.orig/hw/pc.c	2009-02-12 15:36:30.000000000 +1100
+++ ioemu-remote/hw/pc.c	2009-02-12 15:38:34.000000000 +1100
@@ -1128,7 +1128,7 @@ static void pc_init1(ram_addr_t ram_size
         max_bus = drive_get_max_bus(IF_SCSI);
 
 	for (bus = 0; bus <= max_bus; bus++) {
-            scsi = lsi_scsi_init(pci_bus, -1);
+            scsi = lsi_scsi_init(pci_bus);
             for (unit = 0; unit < LSI_MAX_DEVS; unit++) {
 	        index = drive_get_index(IF_SCSI, bus, unit);
 		if (index == -1)
Index: ioemu-remote/hw/pci.h
===================================================================
--- ioemu-remote.orig/hw/pci.h	2009-02-12 15:36:30.000000000 +1100
+++ ioemu-remote/hw/pci.h	2009-02-12 15:38:34.000000000 +1100
@@ -125,7 +125,7 @@ void do_pci_del(char *devname);
 /* lsi53c895a.c */
 #define LSI_MAX_DEVS 7
 void lsi_scsi_attach(void *opaque, BlockDriverState *bd, int id);
-void *lsi_scsi_init(PCIBus *bus, int devfn);
+void *lsi_scsi_init(PCIBus *bus);
 
 /* vmware_vga.c */
 void pci_vmsvga_init(PCIBus *bus, DisplayState *ds, uint8_t *vga_ram_base,
Index: ioemu-remote/hw/realview.c
===================================================================
--- ioemu-remote.orig/hw/realview.c	2009-02-12 15:36:30.000000000 +1100
+++ ioemu-remote/hw/realview.c	2009-02-12 15:38:34.000000000 +1100
@@ -112,7 +112,7 @@ static void realview_init(ram_addr_t ram
         fprintf(stderr, "qemu: too many SCSI bus\n");
         exit(1);
     }
-    scsi_hba = lsi_scsi_init(pci_bus, -1);
+    scsi_hba = lsi_scsi_init(pci_bus);
     for (n = 0; n < LSI_MAX_DEVS; n++) {
         index = drive_get_index(IF_SCSI, 0, n);
         if (index == -1)
Index: ioemu-remote/hw/versatilepb.c
===================================================================
--- ioemu-remote.orig/hw/versatilepb.c	2009-02-12 15:36:30.000000000 +1100
+++ ioemu-remote/hw/versatilepb.c	2009-02-12 15:38:34.000000000 +1100
@@ -213,7 +213,7 @@ static void versatile_init(ram_addr_t ra
         fprintf(stderr, "qemu: too many SCSI bus\n");
         exit(1);
     }
-    scsi_hba = lsi_scsi_init(pci_bus, -1);
+    scsi_hba = lsi_scsi_init(pci_bus);
     for (n = 0; n < LSI_MAX_DEVS; n++) {
         index = drive_get_index(IF_SCSI, 0, n);
         if (index == -1)

-- 

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [rfc 05/18] ioemu: Remove lsi_scsi_init()s devfn parameter as it is always passed as -1
  2009-02-17  9:07 [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough Simon Horman
                   ` (3 preceding siblings ...)
  2009-02-17  9:07 ` [rfc 04/18] ioemu: Remove lsi_scsi_init()s devfn parameter as it is always passed as -1 Simon Horman
@ 2009-02-17  9:07 ` Simon Horman
  2009-02-17  9:07 ` [rfc 06/18] ioemu: Remove usb_ohci_init*()s devfn parameter as they are " Simon Horman
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: Simon Horman @ 2009-02-17  9:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

[-- Attachment #1: ioemu-piix3_init-no-devfn-arg.patch --]
[-- Type: text/plain, Size: 2027 bytes --]

Signed-off-by: Simon Horman <horms@verge.net.au>

Index: ioemu-remote/hw/pc.c
===================================================================
--- ioemu-remote.orig/hw/pc.c	2009-02-17 17:24:14.000000000 +0900
+++ ioemu-remote/hw/pc.c	2009-02-17 17:24:17.000000000 +0900
@@ -946,7 +946,7 @@ static void pc_init1(ram_addr_t ram_size
 
     if (pci_enabled) {
         pci_bus = i440fx_init(&i440fx_state, i8259);
-        piix3_devfn = piix3_init(pci_bus, -1);
+        piix3_devfn = piix3_init(pci_bus);
     } else {
         pci_bus = NULL;
     }
Index: ioemu-remote/hw/pc.h
===================================================================
--- ioemu-remote.orig/hw/pc.h	2009-02-17 17:23:25.000000000 +0900
+++ ioemu-remote/hw/pc.h	2009-02-17 17:24:17.000000000 +0900
@@ -106,7 +106,7 @@ int pcspk_audio_init(AudioState *, qemu_
 /* piix_pci.c */
 PCIBus *i440fx_init(PCIDevice **pi440fx_state, qemu_irq *pic);
 void i440fx_set_smm(PCIDevice *d, int val);
-int piix3_init(PCIBus *bus, int devfn);
+int piix3_init(PCIBus *bus);
 void i440fx_init_memory_mappings(PCIDevice *d);
 
 int piix4_init(PCIBus *bus, int devfn);
Index: ioemu-remote/hw/piix_pci.c
===================================================================
--- ioemu-remote.orig/hw/piix_pci.c	2009-02-17 17:23:25.000000000 +0900
+++ ioemu-remote/hw/piix_pci.c	2009-02-17 17:24:17.000000000 +0900
@@ -348,13 +348,13 @@ static int piix_load(QEMUFile* f, void *
     return pci_device_load(d, f);
 }
 
-int piix3_init(PCIBus *bus, int devfn)
+int piix3_init(PCIBus *bus)
 {
     PCIDevice *d;
     uint8_t *pci_conf;
 
     d = pci_register_device(bus, "PIIX3", sizeof(PCIDevice),
-                                    devfn, NULL, piix3_write_config);
+                                    -1, NULL, piix3_write_config);
     register_savevm("PIIX3", 0, 2, piix_save, piix_load, d);
 
     piix3_dev = d;

-- 

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [rfc 06/18] ioemu: Remove usb_ohci_init*()s devfn parameter as they are always passed as -1
  2009-02-17  9:07 [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough Simon Horman
                   ` (4 preceding siblings ...)
  2009-02-17  9:07 ` [rfc 05/18] " Simon Horman
@ 2009-02-17  9:07 ` Simon Horman
  2009-02-17  9:07 ` [rfc 07/18] iommu: Define PCI_DEVFN_AUTO and use it Simon Horman
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: Simon Horman @ 2009-02-17  9:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

[-- Attachment #1: ioemu-usb_ohci_init-no-devfn-arg.patch --]
[-- Type: text/plain, Size: 7092 bytes --]

Signed-off-by: Simon Horman <horms@verge.net.au>

Index: ioemu-remote/hw/pci.h
===================================================================
--- ioemu-remote.orig/hw/pci.h	2009-02-17 17:24:14.000000000 +0900
+++ ioemu-remote/hw/pci.h	2009-02-17 17:24:24.000000000 +0900
@@ -136,7 +136,7 @@ void usb_uhci_piix3_init(PCIBus *bus, in
 void usb_uhci_piix4_init(PCIBus *bus, int devfn);
 
 /* usb-ohci.c */
-void usb_ohci_init_pci(struct PCIBus *bus, int num_ports, int devfn);
+void usb_ohci_init_pci(struct PCIBus *bus, int num_ports);
 
 /* eepro100.c */
 
Index: ioemu-remote/hw/ppc_chrp.c
===================================================================
--- ioemu-remote.orig/hw/ppc_chrp.c	2009-02-17 17:23:25.000000000 +0900
+++ ioemu-remote/hw/ppc_chrp.c	2009-02-17 17:24:24.000000000 +0900
@@ -298,7 +298,7 @@ static void ppc_core99_init (ram_addr_t 
                cuda_mem_index, NULL, 2, ide_mem_index);
 
     if (usb_enabled) {
-        usb_ohci_init_pci(pci_bus, 3, -1);
+        usb_ohci_init_pci(pci_bus, 3);
     }
 
     if (graphic_depth != 15 && graphic_depth != 32 && graphic_depth != 8)
Index: ioemu-remote/hw/ppc_oldworld.c
===================================================================
--- ioemu-remote.orig/hw/ppc_oldworld.c	2009-02-17 17:23:25.000000000 +0900
+++ ioemu-remote/hw/ppc_oldworld.c	2009-02-17 17:24:24.000000000 +0900
@@ -342,7 +342,7 @@ static void ppc_heathrow_init (ram_addr_
                cuda_mem_index, nvr, 2, ide_mem_index);
 
     if (usb_enabled) {
-        usb_ohci_init_pci(pci_bus, 3, -1);
+        usb_ohci_init_pci(pci_bus, 3);
     }
 
     if (graphic_depth != 15 && graphic_depth != 32 && graphic_depth != 8)
Index: ioemu-remote/hw/ppc_prep.c
===================================================================
--- ioemu-remote.orig/hw/ppc_prep.c	2009-02-17 17:23:25.000000000 +0900
+++ ioemu-remote/hw/ppc_prep.c	2009-02-17 17:24:24.000000000 +0900
@@ -733,7 +733,7 @@ static void ppc_prep_init (ram_addr_t ra
 #endif
 
     if (usb_enabled) {
-        usb_ohci_init_pci(pci_bus, 3, -1);
+        usb_ohci_init_pci(pci_bus, 3);
     }
 
     m48t59 = m48t59_init(i8259[8], 0, 0x0074, NVRAM_SIZE, 59);
Index: ioemu-remote/hw/realview.c
===================================================================
--- ioemu-remote.orig/hw/realview.c	2009-02-17 17:24:14.000000000 +0900
+++ ioemu-remote/hw/realview.c	2009-02-17 17:24:24.000000000 +0900
@@ -106,7 +106,7 @@ static void realview_init(ram_addr_t ram
 
     pci_bus = pci_vpb_init(pic, 48, 1);
     if (usb_enabled) {
-        usb_ohci_init_pci(pci_bus, 3, -1);
+        usb_ohci_init_pci(pci_bus, 3);
     }
     if (drive_get_max_bus(IF_SCSI) > 0) {
         fprintf(stderr, "qemu: too many SCSI bus\n");
Index: ioemu-remote/hw/versatilepb.c
===================================================================
--- ioemu-remote.orig/hw/versatilepb.c	2009-02-17 17:24:14.000000000 +0900
+++ ioemu-remote/hw/versatilepb.c	2009-02-17 17:24:24.000000000 +0900
@@ -207,7 +207,7 @@ static void versatile_init(ram_addr_t ra
         }
     }
     if (usb_enabled) {
-        usb_ohci_init_pci(pci_bus, 3, -1);
+        usb_ohci_init_pci(pci_bus, 3);
     }
     if (drive_get_max_bus(IF_SCSI) > 0) {
         fprintf(stderr, "qemu: too many SCSI bus\n");
Index: ioemu-remote/hw/usb-ohci.c
===================================================================
--- ioemu-remote.orig/hw/usb-ohci.c	2009-02-17 17:23:25.000000000 +0900
+++ ioemu-remote/hw/usb-ohci.c	2009-02-17 17:24:24.000000000 +0900
@@ -1592,7 +1592,7 @@ static CPUWriteMemoryFunc *ohci_writefn[
     ohci_mem_write
 };
 
-static void usb_ohci_init(OHCIState *ohci, int num_ports, int devfn,
+static void usb_ohci_init(OHCIState *ohci, int num_ports,
             qemu_irq irq, enum ohci_type type, const char *name)
 {
     int i;
@@ -1642,14 +1642,14 @@ static void ohci_mapfunc(PCIDevice *pci_
     cpu_register_physical_memory(addr, size, ohci->state.mem);
 }
 
-void usb_ohci_init_pci(struct PCIBus *bus, int num_ports, int devfn)
+void usb_ohci_init_pci(struct PCIBus *bus, int num_ports)
 {
     OHCIPCIState *ohci;
     int vid = 0x106b;
     int did = 0x003f;
 
     ohci = (OHCIPCIState *)pci_register_device(bus, "OHCI USB", sizeof(*ohci),
-                                               devfn, NULL, NULL);
+                                               -1, NULL, NULL);
     if (ohci == NULL) {
         fprintf(stderr, "usb-ohci: Failed to register PCI device\n");
         return;
@@ -1664,20 +1664,18 @@ void usb_ohci_init_pci(struct PCIBus *bu
     ohci->pci_dev.config[0x0b] = 0xc;
     ohci->pci_dev.config[0x3d] = 0x01; /* interrupt pin 1 */
 
-    usb_ohci_init(&ohci->state, num_ports, devfn, ohci->pci_dev.irq[0],
+    usb_ohci_init(&ohci->state, num_ports, ohci->pci_dev.irq[0],
                   OHCI_TYPE_PCI, ohci->pci_dev.name);
 
     pci_register_io_region((struct PCIDevice *)ohci, 0, 256,
                            PCI_ADDRESS_SPACE_MEM, ohci_mapfunc);
 }
 
-void usb_ohci_init_pxa(target_phys_addr_t base, int num_ports, int devfn,
-                       qemu_irq irq)
+void usb_ohci_init_pxa(target_phys_addr_t base, int num_ports, qemu_irq irq)
 {
     OHCIState *ohci = (OHCIState *)qemu_mallocz(sizeof(OHCIState));
 
-    usb_ohci_init(ohci, num_ports, devfn, irq,
-                  OHCI_TYPE_PXA, "OHCI USB");
+    usb_ohci_init(ohci, num_ports, irq, OHCI_TYPE_PXA, "OHCI USB");
     ohci->mem_base = base;
 
     cpu_register_physical_memory(ohci->mem_base, 0x1000, ohci->mem);
Index: ioemu-remote/hw/pxa.h
===================================================================
--- ioemu-remote.orig/hw/pxa.h	2009-02-17 17:23:25.000000000 +0900
+++ ioemu-remote/hw/pxa.h	2009-02-17 17:24:24.000000000 +0900
@@ -221,7 +221,6 @@ struct pxa2xx_state_s *pxa270_init(unsig
 struct pxa2xx_state_s *pxa255_init(unsigned int sdram_size, DisplayState *ds);
 
 /* usb-ohci.c */
-void usb_ohci_init_pxa(target_phys_addr_t base, int num_ports, int devfn,
-                       qemu_irq irq);
+void usb_ohci_init_pxa(target_phys_addr_t base, int num_ports, qemu_irq irq);
 
 #endif	/* PXA_H */
Index: ioemu-remote/hw/pxa2xx.c
===================================================================
--- ioemu-remote.orig/hw/pxa2xx.c	2009-02-17 17:23:25.000000000 +0900
+++ ioemu-remote/hw/pxa2xx.c	2009-02-17 17:24:24.000000000 +0900
@@ -2130,7 +2130,7 @@ struct pxa2xx_state_s *pxa270_init(unsig
     }
 
     if (usb_enabled) {
-        usb_ohci_init_pxa(0x4c000000, 3, -1, s->pic[PXA2XX_PIC_USBH1]);
+        usb_ohci_init_pxa(0x4c000000, 3, s->pic[PXA2XX_PIC_USBH1]);
     }
 
     s->pcmcia[0] = pxa2xx_pcmcia_init(0x20000000);
@@ -2253,7 +2253,7 @@ struct pxa2xx_state_s *pxa255_init(unsig
     }
 
     if (usb_enabled) {
-        usb_ohci_init_pxa(0x4c000000, 3, -1, s->pic[PXA2XX_PIC_USBH1]);
+        usb_ohci_init_pxa(0x4c000000, 3, s->pic[PXA2XX_PIC_USBH1]);
     }
 
     s->pcmcia[0] = pxa2xx_pcmcia_init(0x20000000);

-- 

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [rfc 07/18] iommu: Define PCI_DEVFN_AUTO and use it
  2009-02-17  9:07 [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough Simon Horman
                   ` (5 preceding siblings ...)
  2009-02-17  9:07 ` [rfc 06/18] ioemu: Remove usb_ohci_init*()s devfn parameter as they are " Simon Horman
@ 2009-02-17  9:07 ` Simon Horman
  2009-02-17  9:07 ` [rfc 08/18] iommu: Use PCI_DEVFN to create devfn numbers Simon Horman
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: Simon Horman @ 2009-02-17  9:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

[-- Attachment #1: ioemu-PCI_DEVFN_AUTO.patch --]
[-- Type: text/plain, Size: 18760 bytes --]

Rather than using -1 to signify autodetection from non-pass-through
and PT_MACHINE_IRQ_AUTO for pass-through, always use PCI_DEVFN_AUTO.

Signed-off-by: Simon Horman <horms@verge.net.au>

Index: ioemu-remote/hw/pci.c
===================================================================
--- ioemu-remote.orig/hw/pci.c	2009-02-17 17:26:38.000000000 +0900
+++ ioemu-remote/hw/pci.c	2009-02-17 17:28:20.000000000 +0900
@@ -130,7 +130,6 @@ int pci_device_load(PCIDevice *s, QEMUFi
     return 0;
 }
 
-/* -1 for devfn means auto assign */
 PCIDevice *pci_register_device(PCIBus *bus, const char *name,
                                int instance_size, int devfn,
                                PCIConfigReadFunc *config_read,
@@ -138,7 +137,7 @@ PCIDevice *pci_register_device(PCIBus *b
 {
     PCIDevice *pci_dev;
 
-    if (devfn < 0) {
+    if (devfn == PCI_DEVFN_AUTO) {
         for(devfn = bus->devfn_min ; devfn < 256; devfn += 8) {
             if ( !bus->devices[devfn] &&
                  !( devfn >= PHP_DEVFN_START && devfn < PHP_DEVFN_END ) )
Index: ioemu-remote/hw/ac97.c
===================================================================
--- ioemu-remote.orig/hw/ac97.c	2009-02-17 17:26:38.000000000 +0900
+++ ioemu-remote/hw/ac97.c	2009-02-17 17:28:20.000000000 +0900
@@ -1326,7 +1326,7 @@ int ac97_init (PCIBus *bus, AudioState *
 
     d = (PCIAC97LinkState *) pci_register_device (bus, "AC97",
                                                   sizeof (PCIAC97LinkState),
-                                                  -1, NULL, NULL);
+                                                  PCI_DEVFN_AUTO, NULL, NULL);
 
     if (!d) {
         AUD_log ("ac97", "Failed to register PCI device\n");
Index: ioemu-remote/hw/cirrus_vga.c
===================================================================
--- ioemu-remote.orig/hw/cirrus_vga.c	2009-02-17 17:26:38.000000000 +0900
+++ ioemu-remote/hw/cirrus_vga.c	2009-02-17 17:28:20.000000000 +0900
@@ -3361,7 +3361,7 @@ void pci_cirrus_vga_init(PCIBus *bus, Di
     /* setup PCI configuration registers */
     d = (PCICirrusVGAState *)pci_register_device(bus, "Cirrus VGA",
                                                  sizeof(PCICirrusVGAState),
-                                                 -1, NULL, NULL);
+                                                 PCI_DEVFN_AUTO, NULL, NULL);
     pci_conf = d->dev.config;
     pci_conf[0x00] = (uint8_t) (PCI_VENDOR_CIRRUS & 0xff);
     pci_conf[0x01] = (uint8_t) (PCI_VENDOR_CIRRUS >> 8);
Index: ioemu-remote/hw/eepro100.c
===================================================================
--- ioemu-remote.orig/hw/eepro100.c	2009-02-17 17:26:38.000000000 +0900
+++ ioemu-remote/hw/eepro100.c	2009-02-17 17:28:20.000000000 +0900
@@ -1750,8 +1750,8 @@ static void nic_init(PCIBus * bus, NICIn
     logout("\n");
 
     d = (PCIEEPRO100State *) pci_register_device(bus, name,
-                                                 sizeof(PCIEEPRO100State), -1,
-                                                 NULL, NULL);
+                                                 sizeof(PCIEEPRO100State),
+                                                 PCI_DEVFN_AUTO, NULL, NULL);
 
     s = &d->eepro100;
     s->device = device;
Index: ioemu-remote/hw/es1370.c
===================================================================
--- ioemu-remote.orig/hw/es1370.c	2009-02-17 17:26:38.000000000 +0900
+++ ioemu-remote/hw/es1370.c	2009-02-17 17:28:20.000000000 +0900
@@ -1023,7 +1023,7 @@ int es1370_init (PCIBus *bus, AudioState
 
     d = (PCIES1370State *) pci_register_device (bus, "ES1370",
                                                 sizeof (PCIES1370State),
-                                                -1, NULL, NULL);
+                                                PCI_DEVFN_AUTO, NULL, NULL);
 
     if (!d) {
         AUD_log (NULL, "Failed to register PCI device for ES1370\n");
Index: ioemu-remote/hw/ide.c
===================================================================
--- ioemu-remote.orig/hw/ide.c	2009-02-17 17:26:38.000000000 +0900
+++ ioemu-remote/hw/ide.c	2009-02-17 17:28:20.000000000 +0900
@@ -3463,7 +3463,7 @@ void pci_cmd646_ide_init(PCIBus *bus, Bl
 
     d = (PCIIDEState *)pci_register_device(bus, "CMD646 IDE",
                                            sizeof(PCIIDEState),
-                                           -1,
+                                           PCI_DEVFN_AUTO,
                                            NULL, NULL);
     if (principal_ide_controller)
 	abort();
Index: ioemu-remote/hw/macio.c
===================================================================
--- ioemu-remote.orig/hw/macio.c	2009-02-17 17:26:38.000000000 +0900
+++ ioemu-remote/hw/macio.c	2009-02-17 17:28:20.000000000 +0900
@@ -83,7 +83,7 @@ void macio_init (PCIBus *bus, int device
 
     d = pci_register_device(bus, "macio",
                             sizeof(PCIDevice) + sizeof(macio_state_t),
-                            -1, NULL, NULL);
+                            PCI_DEVFN_AUTO, NULL, NULL);
     macio_state = (macio_state_t *)(d + 1);
     macio_state->is_oldworld = is_oldworld;
     macio_state->pic_mem_index = pic_mem_index;
Index: ioemu-remote/hw/openpic.c
===================================================================
--- ioemu-remote.orig/hw/openpic.c	2009-02-17 17:26:38.000000000 +0900
+++ ioemu-remote/hw/openpic.c	2009-02-17 17:28:20.000000000 +0900
@@ -1013,7 +1013,7 @@ qemu_irq *openpic_init (PCIBus *bus, int
         return NULL;
     if (bus) {
         opp = (openpic_t *)pci_register_device(bus, "OpenPIC", sizeof(openpic_t),
-                                               -1, NULL, NULL);
+                                               PCI_DEVFN_AUTO, NULL, NULL);
         if (opp == NULL)
             return NULL;
         pci_conf = opp->pci_dev.config;
Index: ioemu-remote/hw/pass-through.c
===================================================================
--- ioemu-remote.orig/hw/pass-through.c	2009-02-17 17:28:11.000000000 +0900
+++ ioemu-remote/hw/pass-through.c	2009-02-17 17:28:23.000000000 +0900
@@ -3101,7 +3101,7 @@ static struct pt_dev * register_real_dev
     pci_fill_info(pci_dev, PCI_FILL_IRQ | PCI_FILL_BASES | PCI_FILL_ROM_BASE | PCI_FILL_SIZES);
     pt_libpci_fixup(pci_dev);
 
-    if ( e_devfn == PT_VIRT_DEVFN_AUTO ) {
+    if ( e_devfn == PCI_DEVFN_AUTO ) {
         /*indicate a static assignment(not hotplug), so find a free PCI hot plug slot */
         free_pci_slot = __insert_to_pci_slot(r_bus, r_dev, r_func, 0, NULL);
         if ( free_pci_slot > 0 )
@@ -3395,7 +3395,7 @@ int pt_init(PCIBus *e_bus, const char *d
     while ( next_bdf(&direct_pci_p, &seg, &b, &d, &f, &opt) )
     {
         /* Register real device with the emulated bus */
-        pt_dev = register_real_device(e_bus, "DIRECT PCI", PT_VIRT_DEVFN_AUTO,
+        pt_dev = register_real_device(e_bus, "DIRECT PCI", PCI_DEVFN_AUTO,
             b, d, f, PT_MACHINE_IRQ_AUTO, pci_access, opt);
         if ( pt_dev == NULL )
         {
Index: ioemu-remote/hw/pass-through.h
===================================================================
--- ioemu-remote.orig/hw/pass-through.h	2009-02-17 17:28:06.000000000 +0900
+++ ioemu-remote/hw/pass-through.h	2009-02-17 17:28:20.000000000 +0900
@@ -38,7 +38,6 @@
 // #define PT_DEBUG_PCI_CONFIG_ACCESS
 
 #define PT_MACHINE_IRQ_AUTO (0xFFFFFFFF)
-#define PT_VIRT_DEVFN_AUTO  (-1)
 
 /* Misc PCI constants that should be moved to a separate library :) */
 #define PCI_CONFIG_SIZE         (256)
Index: ioemu-remote/hw/pci.h
===================================================================
--- ioemu-remote.orig/hw/pci.h	2009-02-17 17:28:19.000000000 +0900
+++ ioemu-remote/hw/pci.h	2009-02-17 17:28:20.000000000 +0900
@@ -29,6 +29,8 @@ typedef struct PCIIORegion {
 #define PCI_ROM_SLOT 6
 #define PCI_NUM_REGIONS 7
 
+#define PCI_DEVFN_AUTO -1
+
 #define PCI_DEVICES_MAX 64
 
 #define PCI_VENDOR_ID		0x00	/* 16 bits */
Index: ioemu-remote/hw/pci_emulation.c
===================================================================
--- ioemu-remote.orig/hw/pci_emulation.c	2009-02-17 17:26:38.000000000 +0900
+++ ioemu-remote/hw/pci_emulation.c	2009-02-17 17:28:20.000000000 +0900
@@ -90,7 +90,7 @@ void pci_emulation_init(PCIBus *bus, PCI
     d = (PCI_EMULATION_State *)pci_register_device(bus,
                                                    pci_emulation_info->name, 
                                                    sizeof(PCI_EMULATION_State),
-                                                   -1, 
+                                                   PCI_DEVFN_AUTO,
                                                     NULL, NULL);
     pci_conf = d->dev.config;
     pci_conf[0x00] = pci_emulation_info->vendorid & 0xff;
Index: ioemu-remote/hw/versatile_pci.c
===================================================================
--- ioemu-remote.orig/hw/versatile_pci.c	2009-02-17 17:26:38.000000000 +0900
+++ ioemu-remote/hw/versatile_pci.c	2009-02-17 17:28:20.000000000 +0900
@@ -117,7 +117,8 @@ PCIBus *pci_vpb_init(qemu_irq *pic, int 
     /* Normal config area.  */
     cpu_register_physical_memory(base + 0x02000000, 0x1000000, mem_config);
 
-    d = pci_register_device(s, name, sizeof(PCIDevice), -1, NULL, NULL);
+    d = pci_register_device(s, name, sizeof(PCIDevice), PCI_DEVFN_AUTO,
+                            NULL, NULL);
 
     if (realview) {
         /* IO memory area.  */
Index: ioemu-remote/hw/vga.c
===================================================================
--- ioemu-remote.orig/hw/vga.c	2009-02-17 17:26:38.000000000 +0900
+++ ioemu-remote/hw/vga.c	2009-02-17 17:28:20.000000000 +0900
@@ -2638,7 +2638,7 @@ int pci_vga_init(PCIBus *bus, DisplaySta
 
     d = (PCIVGAState *)pci_register_device(bus, "VGA",
                                            sizeof(PCIVGAState),
-                                           -1, NULL, NULL);
+                                           PCI_DEVFN_AUTO, NULL, NULL);
     if (!d)
         return -1;
     s = &d->vga_state;
Index: ioemu-remote/hw/vmware_vga.c
===================================================================
--- ioemu-remote.orig/hw/vmware_vga.c	2009-02-17 17:26:38.000000000 +0900
+++ ioemu-remote/hw/vmware_vga.c	2009-02-17 17:28:20.000000000 +0900
@@ -1229,7 +1229,7 @@ void pci_vmsvga_init(PCIBus *bus, Displa
     /* Setup PCI configuration */
     s = (struct pci_vmsvga_state_s *)
         pci_register_device(bus, "QEMUware SVGA",
-                sizeof(struct pci_vmsvga_state_s), -1, 0, 0);
+                sizeof(struct pci_vmsvga_state_s), PCI_DEVFN_AUTO, 0, 0);
     s->card.config[PCI_VENDOR_ID]	= PCI_VENDOR_ID_VMWARE & 0xff;
     s->card.config[PCI_VENDOR_ID + 1]	= PCI_VENDOR_ID_VMWARE >> 8;
     s->card.config[PCI_DEVICE_ID]	= SVGA_PCI_DEVICE_ID & 0xff;
Index: ioemu-remote/hw/xen_platform.c
===================================================================
--- ioemu-remote.orig/hw/xen_platform.c	2009-02-17 17:26:38.000000000 +0900
+++ ioemu-remote/hw/xen_platform.c	2009-02-17 17:28:20.000000000 +0900
@@ -401,7 +401,8 @@ void pci_xen_platform_init(PCIBus *bus)
 
     printf("Register xen platform.\n");
     d = (PCIXenPlatformState *)pci_register_device(
-        bus, "xen-platform", sizeof(PCIXenPlatformState), -1, NULL, NULL);
+        bus, "xen-platform", sizeof(PCIXenPlatformState),
+        PCI_DEVFN_AUTO, NULL, NULL);
     pch = (struct pci_config_header *)d->pci_dev.config;
     pch->vendor_id = 0x5853;
     pch->device_id = 0x0001;
Index: ioemu-remote/hw/lsi53c895a.c
===================================================================
--- ioemu-remote.orig/hw/lsi53c895a.c	2009-02-17 17:28:17.000000000 +0900
+++ ioemu-remote/hw/lsi53c895a.c	2009-02-17 17:28:20.000000000 +0900
@@ -1863,7 +1863,7 @@ void *lsi_scsi_init(PCIBus *bus)
     LSIState *s;
 
     s = (LSIState *)pci_register_device(bus, "LSI53C895A SCSI HBA",
-                                        sizeof(*s), -1, NULL, NULL);
+                                        sizeof(*s), PCI_DEVFN_AUTO, NULL, NULL);
     if (s == NULL) {
         fprintf(stderr, "lsi-scsi: Failed to register PCI device\n");
         return NULL;
Index: ioemu-remote/hw/mips_malta.c
===================================================================
--- ioemu-remote.orig/hw/mips_malta.c	2009-02-17 17:26:38.000000000 +0900
+++ ioemu-remote/hw/mips_malta.c	2009-02-17 17:28:23.000000000 +0900
@@ -497,7 +497,7 @@ static void network_init (PCIBus *pci_bu
             /* The malta board has a PCNet card using PCI SLOT 11 */
             pci_nic_init(pci_bus, nd, 88);
         } else {
-            pci_nic_init(pci_bus, nd, -1);
+            pci_nic_init(pci_bus, nd, PCI_DEVFN_AUTO);
         }
     }
 }
Index: ioemu-remote/hw/pc.c
===================================================================
--- ioemu-remote.orig/hw/pc.c	2009-02-17 17:28:18.000000000 +0900
+++ ioemu-remote/hw/pc.c	2009-02-17 17:28:20.000000000 +0900
@@ -1044,7 +1044,7 @@ static void pc_init1(ram_addr_t ram_size
         } else if (pci_enabled) {
             if (strcmp(nd->model, "?") == 0)
                 fprintf(stderr, "qemu: Supported ISA NICs: ne2k_isa\n");
-            pci_nic_init(pci_bus, nd, -1);
+            pci_nic_init(pci_bus, nd, PCI_DEVFN_AUTO);
         } else if (strcmp(nd->model, "?") == 0) {
             fprintf(stderr, "qemu: Supported ISA NICs: ne2k_isa\n");
             exit(1);
Index: ioemu-remote/hw/ppc_chrp.c
===================================================================
--- ioemu-remote.orig/hw/ppc_chrp.c	2009-02-17 17:28:19.000000000 +0900
+++ ioemu-remote/hw/ppc_chrp.c	2009-02-17 17:28:20.000000000 +0900
@@ -267,7 +267,7 @@ static void ppc_core99_init (ram_addr_t 
     for(i = 0; i < nb_nics; i++) {
         if (!nd_table[i].model)
             nd_table[i].model = "ne2k_pci";
-        pci_nic_init(pci_bus, &nd_table[i], -1);
+        pci_nic_init(pci_bus, &nd_table[i], PCI_DEVFN_AUTO);
     }
     if (drive_get_max_bus(IF_IDE) >= MAX_IDE_BUS) {
         fprintf(stderr, "qemu: too many IDE bus\n");
Index: ioemu-remote/hw/ppc_oldworld.c
===================================================================
--- ioemu-remote.orig/hw/ppc_oldworld.c	2009-02-17 17:28:19.000000000 +0900
+++ ioemu-remote/hw/ppc_oldworld.c	2009-02-17 17:28:20.000000000 +0900
@@ -291,7 +291,7 @@ static void ppc_heathrow_init (ram_addr_
     for(i = 0; i < nb_nics; i++) {
         if (!nd_table[i].model)
             nd_table[i].model = "ne2k_pci";
-        pci_nic_init(pci_bus, &nd_table[i], -1);
+        pci_nic_init(pci_bus, &nd_table[i], PCI_DEVFN_AUTO);
     }
 
     /* First IDE channel is a CMD646 on the PCI bus */
Index: ioemu-remote/hw/ppc_prep.c
===================================================================
--- ioemu-remote.orig/hw/ppc_prep.c	2009-02-17 17:28:19.000000000 +0900
+++ ioemu-remote/hw/ppc_prep.c	2009-02-17 17:28:20.000000000 +0900
@@ -673,7 +673,7 @@ static void ppc_prep_init (ram_addr_t ra
             || strcmp(nd_table[i].model, "ne2k_isa") == 0) {
             isa_ne2000_init(ne2000_io[i], i8259[ne2000_irq[i]], &nd_table[i]);
         } else {
-            pci_nic_init(pci_bus, &nd_table[i], -1);
+            pci_nic_init(pci_bus, &nd_table[i], PCI_DEVFN__AUTO);
         }
     }
 
Index: ioemu-remote/hw/realview.c
===================================================================
--- ioemu-remote.orig/hw/realview.c	2009-02-17 17:28:19.000000000 +0900
+++ ioemu-remote/hw/realview.c	2009-02-17 17:28:20.000000000 +0900
@@ -126,7 +126,7 @@ static void realview_init(ram_addr_t ram
         if (strcmp(nd->model, "smc91c111") == 0) {
             smc91c111_init(nd, 0x4e000000, pic[28]);
         } else {
-            pci_nic_init(pci_bus, nd, -1);
+            pci_nic_init(pci_bus, nd, PCI_DEVFN_AUTO);
         }
     }
 
Index: ioemu-remote/hw/sun4u.c
===================================================================
--- ioemu-remote.orig/hw/sun4u.c	2009-02-17 17:26:38.000000000 +0900
+++ ioemu-remote/hw/sun4u.c	2009-02-17 17:28:20.000000000 +0900
@@ -478,7 +478,7 @@ static void sun4uv_init(ram_addr_t RAM_s
     for(i = 0; i < nb_nics; i++) {
         if (!nd_table[i].model)
             nd_table[i].model = "ne2k_pci";
-        pci_nic_init(pci_bus, &nd_table[i], -1);
+        pci_nic_init(pci_bus, &nd_table[i], PCI_DEVFN_AUTO);
     }
 
     irq = qemu_allocate_irqs(cpu_set_irq, env, MAX_PILS);
@@ -496,7 +496,7 @@ static void sun4uv_init(ram_addr_t RAM_s
     }
 
     // XXX pci_cmd646_ide_init(pci_bus, hd, 1);
-    pci_piix3_ide_init(pci_bus, hd, -1, irq);
+    pci_piix3_ide_init(pci_bus, hd, PCI_DEVFN_AUTO, irq);
     /* FIXME: wire up interrupts.  */
     i8042_init(NULL/*1*/, NULL/*12*/, 0x60);
     for(i = 0; i < MAX_FD; i++) {
Index: ioemu-remote/hw/versatilepb.c
===================================================================
--- ioemu-remote.orig/hw/versatilepb.c	2009-02-17 17:28:19.000000000 +0900
+++ ioemu-remote/hw/versatilepb.c	2009-02-17 17:28:20.000000000 +0900
@@ -203,7 +203,7 @@ static void versatile_init(ram_addr_t ra
         if (strcmp(nd->model, "smc91c111") == 0) {
             smc91c111_init(nd, 0x10010000, sic[25]);
         } else {
-            pci_nic_init(pci_bus, nd, -1);
+            pci_nic_init(pci_bus, nd, PCI_DEVFN_AUTO);
         }
     }
     if (usb_enabled) {
Index: ioemu-remote/hw/piix_pci.c
===================================================================
--- ioemu-remote.orig/hw/piix_pci.c	2009-02-17 17:28:18.000000000 +0900
+++ ioemu-remote/hw/piix_pci.c	2009-02-17 17:28:20.000000000 +0900
@@ -354,7 +354,7 @@ int piix3_init(PCIBus *bus)
     uint8_t *pci_conf;
 
     d = pci_register_device(bus, "PIIX3", sizeof(PCIDevice),
-                                    -1, NULL, piix3_write_config);
+                                    PCI_DEVFN_AUTO, NULL, piix3_write_config);
     register_savevm("PIIX3", 0, 2, piix_save, piix_load, d);
 
     piix3_dev = d;
Index: ioemu-remote/hw/usb-ohci.c
===================================================================
--- ioemu-remote.orig/hw/usb-ohci.c	2009-02-17 17:28:19.000000000 +0900
+++ ioemu-remote/hw/usb-ohci.c	2009-02-17 17:28:20.000000000 +0900
@@ -1649,7 +1649,7 @@ void usb_ohci_init_pci(struct PCIBus *bu
     int did = 0x003f;
 
     ohci = (OHCIPCIState *)pci_register_device(bus, "OHCI USB", sizeof(*ohci),
-                                               -1, NULL, NULL);
+                                               PCI_DEVFN_AUTO, NULL, NULL);
     if (ohci == NULL) {
         fprintf(stderr, "usb-ohci: Failed to register PCI device\n");
         return;

-- 

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [rfc 08/18] iommu: Use PCI_DEVFN to create devfn numbers
  2009-02-17  9:07 [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough Simon Horman
                   ` (6 preceding siblings ...)
  2009-02-17  9:07 ` [rfc 07/18] iommu: Define PCI_DEVFN_AUTO and use it Simon Horman
@ 2009-02-17  9:07 ` Simon Horman
  2009-02-17  9:07 ` [rfc 09/18] ioemu: piix4acpi.c: make various variables static Simon Horman
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: Simon Horman @ 2009-02-17  9:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

[-- Attachment #1: ioemu-dm-PCI_DEVFN.patch --]
[-- Type: text/plain, Size: 4999 bytes --]

It seems to me that this is a lot clearer and less prone to error.

Signed-off-by: Simon Horman <horms@verge.net.au>

Index: ioemu-remote/hw/apb_pci.c
===================================================================
--- ioemu-remote.orig/hw/apb_pci.c	2009-02-17 17:28:23.000000000 +0900
+++ ioemu-remote/hw/apb_pci.c	2009-02-17 17:29:05.000000000 +0900
@@ -255,9 +255,10 @@ PCIBus *pci_apb_init(target_phys_addr_t 
     d->config[0x0E] = 0x00; // header_type
 
     /* APB secondary busses */
-    secondary = pci_bridge_init(s->bus, 8, 0x108e5000, pci_apb_map_irq,
+    secondary = pci_bridge_init(s->bus, PCI_DEVFN(1, 0), 0x108e5000,
+                                pci_apb_map_irq,
                                 "Advanced PCI Bus secondary bridge 1");
-    pci_bridge_init(s->bus, 9, 0x108e5000, pci_apb_map_irq,
+    pci_bridge_init(s->bus, PCI_DEVFN(1, 1), 0x108e5000, pci_apb_map_irq,
                     "Advanced PCI Bus secondary bridge 2");
     return secondary;
 }
Index: ioemu-remote/hw/mips_malta.c
===================================================================
--- ioemu-remote.orig/hw/mips_malta.c	2009-02-17 17:28:23.000000000 +0900
+++ ioemu-remote/hw/mips_malta.c	2009-02-17 17:29:05.000000000 +0900
@@ -495,7 +495,7 @@ static void network_init (PCIBus *pci_bu
         }
         if (i == 0  && strcmp(nd->model, "pcnet") == 0) {
             /* The malta board has a PCNet card using PCI SLOT 11 */
-            pci_nic_init(pci_bus, nd, 88);
+            pci_nic_init(pci_bus, nd, PCI_DEVFN(11, 0));
         } else {
             pci_nic_init(pci_bus, nd, PCI_DEVFN_AUTO);
         }
@@ -901,7 +901,7 @@ void mips_malta_init (ram_addr_t ram_siz
             hd[i] = NULL;
     }
 
-    piix4_devfn = piix4_init(pci_bus, 80);
+    piix4_devfn = piix4_init(pci_bus, PCI_DEVFN(10, 0));
     pci_piix4_ide_init(pci_bus, hd, piix4_devfn + 1, i8259);
     usb_uhci_piix4_init(pci_bus, piix4_devfn + 2);
     smbus = piix4_pm_init(pci_bus, piix4_devfn + 3, 0x1100, i8259[9]);
Index: ioemu-remote/hw/unin_pci.c
===================================================================
--- ioemu-remote.orig/hw/unin_pci.c	2009-02-17 17:28:23.000000000 +0900
+++ ioemu-remote/hw/unin_pci.c	2009-02-17 17:29:05.000000000 +0900
@@ -173,7 +173,7 @@ PCIBus *pci_pmac_init(qemu_irq *pic)
     cpu_register_physical_memory(0xf2800000, 0x1000, pci_mem_config);
     cpu_register_physical_memory(0xf2c00000, 0x1000, pci_mem_data);
     d = pci_register_device(s->bus, "Uni-north main", sizeof(PCIDevice),
-                            11 << 3, NULL, NULL);
+                           PCI_DEVFN(11, 0), NULL, NULL);
     d->config[0x00] = 0x6b; // vendor_id : Apple
     d->config[0x01] = 0x10;
     d->config[0x02] = 0x1F; // device_id
@@ -188,8 +188,8 @@ PCIBus *pci_pmac_init(qemu_irq *pic)
 
 #if 0 // XXX: not activated as PPC BIOS doesn't handle multiple buses properly
     /* pci-to-pci bridge */
-    d = pci_register_device("Uni-north bridge", sizeof(PCIDevice), 0, 13 << 3,
-                            NULL, NULL);
+    d = pci_register_device("Uni-north bridge", sizeof(PCIDevice), 0,
+                            PCI_DEVFN(13, 0), NULL, NULL);
     d->config[0x00] = 0x11; // vendor_id : TI
     d->config[0x01] = 0x10;
     d->config[0x02] = 0x26; // device_id
@@ -227,8 +227,8 @@ PCIBus *pci_pmac_init(qemu_irq *pic)
     cpu_register_physical_memory(0xf0800000, 0x1000, pci_mem_config);
     cpu_register_physical_memory(0xf0c00000, 0x1000, pci_mem_data);
 
-    d = pci_register_device("Uni-north AGP", sizeof(PCIDevice), 0, 11 << 3,
-                            NULL, NULL);
+    d = pci_register_device("Uni-north AGP", sizeof(PCIDevice), 0,
+                            PCI_DEVFN(11, 0), NULL, NULL);
     d->config[0x00] = 0x6b; // vendor_id : Apple
     d->config[0x01] = 0x10;
     d->config[0x02] = 0x20; // device_id
@@ -253,7 +253,7 @@ PCIBus *pci_pmac_init(qemu_irq *pic)
     cpu_register_physical_memory(0xf4c00000, 0x1000, pci_mem_data);
 
     d = pci_register_device("Uni-north internal", sizeof(PCIDevice),
-                            3, 11 << 3, NULL, NULL);
+                            3, PCI_DEVFN(11, 0), NULL, NULL);
     d->config[0x00] = 0x6b; // vendor_id : Apple
     d->config[0x01] = 0x10;
     d->config[0x02] = 0x1E; // device_id
Index: ioemu-remote/hw/pass-through.c
===================================================================
--- ioemu-remote.orig/hw/pass-through.c	2009-02-17 17:28:23.000000000 +0900
+++ ioemu-remote/hw/pass-through.c	2009-02-17 17:29:05.000000000 +0900
@@ -3333,7 +3333,7 @@ int power_on_php_slot(int php_slot)
     pt_dev = 
         register_real_device(dpci_infos.e_bus,
             "DIRECT PCI",
-            pci_slot << 3,
+            PCI_DEVFN(pci_slot, 0),
             php_dev->r_bus,
             php_dev->r_dev,
             php_dev->r_func,

-- 

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [rfc 09/18] ioemu: piix4acpi.c: make various variables static
  2009-02-17  9:07 [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough Simon Horman
                   ` (7 preceding siblings ...)
  2009-02-17  9:07 ` [rfc 08/18] iommu: Use PCI_DEVFN to create devfn numbers Simon Horman
@ 2009-02-17  9:07 ` Simon Horman
  2009-02-17  9:07 ` [rfc 10/18] ioemu: piix4acpi.c: Simplfy PHPSlots structure Simon Horman
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: Simon Horman @ 2009-02-17  9:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

[-- Attachment #1: ioemu-piix4acpi-static.patch --]
[-- Type: text/plain, Size: 1154 bytes --]

All of these variables appear to only be used inside piix4acpi.c

Signed-off-by: Simon Horman <horms@verge.net.au>

Index: ioemu-remote/hw/piix4acpi.c
===================================================================
--- ioemu-remote/hw/piix4acpi.c	2009-01-02 16:01:54.000000000 +1100
+++ ioemu-remote/hw/piix4acpi.c	2009-01-02 16:28:06.000000000 +1100
@@ -57,7 +57,7 @@
 #define ACPI_PHP_SLOT_NUM PHP_SLOT_LEN
 
 typedef struct AcpiDeviceState AcpiDeviceState;
-AcpiDeviceState *acpi_device_table;
+static AcpiDeviceState *acpi_device_table;
 
 typedef struct PCIAcpiState {
     PCIDevice dev;
@@ -74,7 +74,7 @@ typedef struct GPEState {
 
 } GPEState;
 
-GPEState gpe_state;
+static GPEState gpe_state;
 
 typedef struct PHPSlots {
     struct {
@@ -83,7 +83,7 @@ typedef struct PHPSlots {
     uint8_t plug_evt;      /* slot|event slot:0-no event;1-1st. event:0-remove;1-add */
 } PHPSlots;
 
-PHPSlots php_slots;
+static PHPSlots php_slots;
 int s3_shutdown_flag;
 static qemu_irq sci_irq;
 

-- 

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [rfc 10/18] ioemu: piix4acpi.c: Simplfy PHPSlots structure
  2009-02-17  9:07 [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough Simon Horman
                   ` (8 preceding siblings ...)
  2009-02-17  9:07 ` [rfc 09/18] ioemu: piix4acpi.c: make various variables static Simon Horman
@ 2009-02-17  9:07 ` Simon Horman
  2009-02-17  9:07 ` [rfc 11/18] ioemu: piix4acpi.c: Consistently dont cast opaque to PHPSlots Simon Horman
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: Simon Horman @ 2009-02-17  9:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

[-- Attachment #1: ioemu-simplify-PHPSlots.patch --]
[-- Type: text/plain, Size: 2872 bytes --]

The anonymouys structure inside PHPSlots only has one element,
so things can be simplified by moving the element into PHPSlots
and removing the anonymous structure.

Signed-off-by: Simon Horman <horms@verge.net.au>

Index: ioemu-remote/hw/piix4acpi.c
===================================================================
--- ioemu-remote/hw/piix4acpi.c	2009-01-02 16:04:48.000000000 +1100
+++ ioemu-remote/hw/piix4acpi.c	2009-01-02 16:06:28.000000000 +1100
@@ -77,9 +77,7 @@ typedef struct GPEState {
 static GPEState gpe_state;
 
 typedef struct PHPSlots {
-    struct {
-        uint8_t status;    /* Apaptor stats */
-    } slot[ACPI_PHP_SLOT_NUM];
+    uint8_t status[ACPI_PHP_SLOT_NUM]; /* Apaptor stats */
     uint8_t plug_evt;      /* slot|event slot:0-no event;1-1st. event:0-remove;1-add */
 } PHPSlots;
 
@@ -240,7 +238,7 @@ static uint32_t acpi_php_readb(void *opa
         break;
     default:
         num = addr - ACPI_PHP_IO_ADDR - 1;
-        val = hotplug_slots->slot[num].status;
+        val = hotplug_slots->status[num];
     }
 
     fprintf(logfile, "ACPI PCI hotplug: read addr=0x%x, val=0x%x.\n",
@@ -265,7 +263,7 @@ static void acpi_php_writeb(void *opaque
         php_slot = addr - ACPI_PHP_IO_ADDR - 1;
         if ( val == 0x1 ) { /* Eject command */
             /* make _STA of the slot 0 */
-            hotplug_slots->slot[php_slot].status = 0;
+            hotplug_slots->status[php_slot] = 0;
 
             /* clear the hotplug event */
             hotplug_slots->plug_evt = 0;
@@ -284,7 +282,7 @@ static void pcislots_save(QEMUFile* f, v
     PHPSlots *s = (PHPSlots*)opaque;
     int i;
     for ( i = 0; i < ACPI_PHP_SLOT_NUM; i++ ) {
-        qemu_put_8s( f, &s->slot[i].status);
+        qemu_put_8s( f, &s->status[i]);
     }
     qemu_put_8s(f, &s->plug_evt);
 }
@@ -296,7 +294,7 @@ static int pcislots_load(QEMUFile* f, vo
     if (version_id != 1)
         return -EINVAL;
     for ( i = 0; i < ACPI_PHP_SLOT_NUM; i++ ) {
-        qemu_get_8s( f, &s->slot[i].status);
+        qemu_get_8s( f, &s->status[i]);
     }
     qemu_get_8s(f, &s->plug_evt);
     return 0;
@@ -311,7 +309,7 @@ static void php_slots_init(void)
     /* update the pci slot status */
     for ( i = 0; i < PHP_SLOT_LEN; i++ ) {
         if ( test_pci_slot( PHP_TO_PCI_SLOT(i) ) == 1 )
-            slots->slot[i].status = 0xf;
+            slots->status[i] = 0xf;
     }
 
 
@@ -495,7 +493,7 @@ void acpi_php_add(int pci_slot)
     hotplug_slots->plug_evt = (((php_slot+1) << 4) | PHP_EVT_ADD);
 
     /* update the slot status as present */
-    hotplug_slots->slot[php_slot].status = 0xf;
+    hotplug_slots->status[php_slot]= 0xf;
 
     /* power on the slot */
     power_on_php_slot(php_slot);

-- 

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [rfc 11/18] ioemu: piix4acpi.c: Consistently dont cast opaque to PHPSlots
  2009-02-17  9:07 [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough Simon Horman
                   ` (9 preceding siblings ...)
  2009-02-17  9:07 ` [rfc 10/18] ioemu: piix4acpi.c: Simplfy PHPSlots structure Simon Horman
@ 2009-02-17  9:07 ` Simon Horman
  2009-02-17  9:08 ` [rfc 12/18] ioemu: piix4acpi.c: remove unnecessary assignment of pci_slots to local variables Simon Horman
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: Simon Horman @ 2009-02-17  9:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

[-- Attachment #1: ioemu-PHPSlots-no-cast.patch --]
[-- Type: text/plain, Size: 1574 bytes --]

Some instances were cast and others were not.
This patch makes all instances not cast opaque into PHPSlots.

Also consistently use hotplug_slots as the variable to which
opaque is assigned.

Signed-off-by: Simon Horman <horms@verge.net.au>

Index: ioemu-remote/hw/piix4acpi.c
===================================================================
--- xen-unstable.hg.orig/tools/ioemu-remote/hw/piix4acpi.c	2009-01-02 16:28:11.000000000 +1100
+++ ioemu-remote/hw/piix4acpi.c	2009-01-02 16:44:18.000000000 +1100
@@ -279,24 +279,24 @@ static void acpi_php_writeb(void *opaque
 
 static void pcislots_save(QEMUFile* f, void* opaque)
 {
-    PHPSlots *s = (PHPSlots*)opaque;
+    PHPSlots *hotplug_slots = opaque;
     int i;
     for ( i = 0; i < ACPI_PHP_SLOT_NUM; i++ ) {
-        qemu_put_8s( f, &s->status[i]);
+        qemu_put_8s( f, &hotplug_slots->status[i]);
     }
-    qemu_put_8s(f, &s->plug_evt);
+    qemu_put_8s(f, &hotplug_slots->plug_evt);
 }
 
 static int pcislots_load(QEMUFile* f, void* opaque, int version_id)
 {
-    PHPSlots *s = (PHPSlots*)opaque;
+    PHPSlots *hotplug_slots = opaque;
     int i;
     if (version_id != 1)
         return -EINVAL;
     for ( i = 0; i < ACPI_PHP_SLOT_NUM; i++ ) {
-        qemu_get_8s( f, &s->status[i]);
+        qemu_get_8s( f, &hotplug_slots->status[i]);
     }
-    qemu_get_8s(f, &s->plug_evt);
+    qemu_get_8s(f, &hotplug_slots->plug_evt);
     return 0;
 }
 

-- 

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [rfc 12/18] ioemu: piix4acpi.c: remove unnecessary assignment of pci_slots to local variables
  2009-02-17  9:07 [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough Simon Horman
                   ` (10 preceding siblings ...)
  2009-02-17  9:07 ` [rfc 11/18] ioemu: piix4acpi.c: Consistently dont cast opaque to PHPSlots Simon Horman
@ 2009-02-17  9:08 ` Simon Horman
  2009-02-17  9:08 ` [rfc 13/18] ioemu: piix4acpi.c: remove ACPI_PHP_SLOT_NUM Simon Horman
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: Simon Horman @ 2009-02-17  9:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

[-- Attachment #1: ioemu-no-local-php_slots.patch --]
[-- Type: text/plain, Size: 2899 bytes --]

Signed-off-by: Simon Horman <horms@verge.net.au>

Index: ioemu-remote/hw/piix4acpi.c
===================================================================
--- ioemu-remote/hw/piix4acpi.c	2009-01-02 16:13:05.000000000 +1100
+++ ioemu-remote/hw/piix4acpi.c	2009-01-02 16:15:29.000000000 +1100
@@ -302,21 +302,23 @@ static int pcislots_load(QEMUFile* f, vo
 
 static void php_slots_init(void)
 {
-    PHPSlots *slots = &php_slots;
     int i;
-    memset(slots, 0, sizeof(PHPSlots));
+    memset(&php_slots, 0, sizeof(PHPSlots));
 
     /* update the pci slot status */
     for ( i = 0; i < PHP_SLOT_LEN; i++ ) {
         if ( test_pci_slot( PHP_TO_PCI_SLOT(i) ) == 1 )
-            slots->status[i] = 0xf;
+            php_slots.status[i] = 0xf;
     }
 
 
     /* ACPI PCI hotplug controller */
-    register_ioport_read(ACPI_PHP_IO_ADDR, ACPI_PHP_SLOT_NUM + 1, 1, acpi_php_readb, slots);
-    register_ioport_write(ACPI_PHP_IO_ADDR, ACPI_PHP_SLOT_NUM + 1, 1, acpi_php_writeb, slots);
-    register_savevm("pcislots", 0, 1, pcislots_save, pcislots_load, slots);
+    register_ioport_read(ACPI_PHP_IO_ADDR, ACPI_PHP_SLOT_NUM + 1, 1,
+                         acpi_php_readb, &php_slots);
+    register_ioport_write(ACPI_PHP_IO_ADDR, ACPI_PHP_SLOT_NUM + 1, 1,
+                          acpi_php_writeb, &php_slots);
+    register_savevm("pcislots", 0, 1, pcislots_save, pcislots_load,
+                    &php_slots);
 }
 
 /* GPEx_STS occupy 1st half of the block, while GPEx_EN 2nd half */
@@ -452,7 +454,6 @@ static void acpi_sci_intr(GPEState *s)
 void acpi_php_del(int pci_slot)
 {
     GPEState *s = &gpe_state;
-    PHPSlots *hotplug_slots = &php_slots;
     int php_slot = PCI_TO_PHP_SLOT(pci_slot);
 
     if ( pci_slot < PHP_SLOT_START || pci_slot >= PHP_SLOT_END ) {
@@ -462,7 +463,7 @@ void acpi_php_del(int pci_slot)
     }
 
     /* update the php controller status */
-    hotplug_slots->plug_evt = (((php_slot+1) << 4) | PHP_EVT_REMOVE);
+    php_slots.plug_evt = (((php_slot+1) << 4) | PHP_EVT_REMOVE);
 
     /* generate a SCI interrupt */
     acpi_sci_intr(s);
@@ -471,7 +472,6 @@ void acpi_php_del(int pci_slot)
 void acpi_php_add(int pci_slot)
 {
     GPEState *s = &gpe_state;
-    PHPSlots *hotplug_slots = &php_slots;
     int php_slot = PCI_TO_PHP_SLOT(pci_slot);
     char ret_str[30];
 
@@ -490,10 +490,10 @@ void acpi_php_add(int pci_slot)
     }
 
     /* update the php controller status */
-    hotplug_slots->plug_evt = (((php_slot+1) << 4) | PHP_EVT_ADD);
+    php_slots.plug_evt = (((php_slot+1) << 4) | PHP_EVT_ADD);
 
     /* update the slot status as present */
-    hotplug_slots->status[php_slot]= 0xf;
+    php_slots.status[php_slot]= 0xf;
 
     /* power on the slot */
     power_on_php_slot(php_slot);

-- 

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [rfc 13/18] ioemu: piix4acpi.c: remove ACPI_PHP_SLOT_NUM
  2009-02-17  9:07 [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough Simon Horman
                   ` (11 preceding siblings ...)
  2009-02-17  9:08 ` [rfc 12/18] ioemu: piix4acpi.c: remove unnecessary assignment of pci_slots to local variables Simon Horman
@ 2009-02-17  9:08 ` Simon Horman
  2009-02-17  9:08 ` [rfc 14/18] ioemu: use devfn instead of slots as the unit for passthrough Simon Horman
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: Simon Horman @ 2009-02-17  9:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

[-- Attachment #1: ioemu-no-ACPI_PHP_SLOT_NUM-3f23188224b7ce69fcf13f52cb1c7977a5372900.patch --]
[-- Type: text/plain, Size: 2377 bytes --]

ACPI_PHP_SLOT_NUM is equivalent to PHP_SLOT_LEN and both
are used inside piix4acpi.c. So for the sake of consistency
remove ACPI_PHP_SLOT_NUM.

Signed-off-by: Simon Horman <horms@verge.net.au>

Index: ioemu-remote/hw/piix4acpi.c
===================================================================
--- ioemu-remote/hw/piix4acpi.c	2009-01-02 16:48:04.000000000 +1100
+++ ioemu-remote/hw/piix4acpi.c	2009-01-02 16:48:44.000000000 +1100
@@ -54,8 +54,6 @@
 /* The bit in GPE0_STS/EN to notify the pci hotplug event */
 #define ACPI_PHP_GPE_BIT 3
 
-#define ACPI_PHP_SLOT_NUM PHP_SLOT_LEN
-
 typedef struct AcpiDeviceState AcpiDeviceState;
 static AcpiDeviceState *acpi_device_table;
 
@@ -77,7 +75,7 @@ typedef struct GPEState {
 static GPEState gpe_state;
 
 typedef struct PHPSlots {
-    uint8_t status[ACPI_PHP_SLOT_NUM]; /* Apaptor stats */
+    uint8_t status[PHP_SLOT_LEN]; /* Apaptor stats */
     uint8_t plug_evt;      /* slot|event slot:0-no event;1-1st. event:0-remove;1-add */
 } PHPSlots;
 
@@ -281,7 +279,7 @@ static void pcislots_save(QEMUFile* f, v
 {
     PHPSlots *hotplug_slots = opaque;
     int i;
-    for ( i = 0; i < ACPI_PHP_SLOT_NUM; i++ ) {
+    for ( i = 0; i < PHP_SLOT_LEN; i++ ) {
         qemu_put_8s( f, &hotplug_slots->status[i]);
     }
     qemu_put_8s(f, &hotplug_slots->plug_evt);
@@ -293,7 +291,7 @@ static int pcislots_load(QEMUFile* f, vo
     int i;
     if (version_id != 1)
         return -EINVAL;
-    for ( i = 0; i < ACPI_PHP_SLOT_NUM; i++ ) {
+    for ( i = 0; i < PHP_SLOT_LEN; i++ ) {
         qemu_get_8s( f, &hotplug_slots->status[i]);
     }
     qemu_get_8s(f, &hotplug_slots->plug_evt);
@@ -313,9 +311,9 @@ static void php_slots_init(void)
 
 
     /* ACPI PCI hotplug controller */
-    register_ioport_read(ACPI_PHP_IO_ADDR, ACPI_PHP_SLOT_NUM + 1, 1,
+    register_ioport_read(ACPI_PHP_IO_ADDR, PHP_SLOT_LEN + 1, 1,
                          acpi_php_readb, &php_slots);
-    register_ioport_write(ACPI_PHP_IO_ADDR, ACPI_PHP_SLOT_NUM + 1, 1,
+    register_ioport_write(ACPI_PHP_IO_ADDR, PHP_SLOT_LEN + 1, 1,
                           acpi_php_writeb, &php_slots);
     register_savevm("pcislots", 0, 1, pcislots_save, pcislots_load,
                     &php_slots);

-- 

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [rfc 14/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-02-17  9:07 [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough Simon Horman
                   ` (12 preceding siblings ...)
  2009-02-17  9:08 ` [rfc 13/18] ioemu: piix4acpi.c: remove ACPI_PHP_SLOT_NUM Simon Horman
@ 2009-02-17  9:08 ` Simon Horman
  2009-02-17  9:08 ` [rfc 15/18] ioemu: use struct php_dev to pass around PCI pass-through assignment parameters Simon Horman
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: Simon Horman @ 2009-02-17  9:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

[-- Attachment #1: ioemu-devfn-3f23188224b7ce69fcf13f52cb1c7977a5372900.patch --]
[-- Type: text/plain, Size: 23939 bytes --]

This allows multi-function cards to be appear in guets as
multi-function cards, with the restriction that function 0 must
be passed through. Otherwise each function is allocated its own
slot, as before.

e.g.

1. Function 0 and two other functions of a multi-function card are
passed through, and the function numbers are maintained in the guest.

Physical                     Guest
0:1b.0    - pass through ->  0:6.0
0:1b.1    - pass through ->  0:6.1
0:1b.2    - pass through ->  0:6.2

2. Two functions other than zero of a multi-function card are
passed through. Each function is represent as function 0 of a slot
in the guest.

Physical                     Guest
0:1b.1    - pass through ->  0:6.0
0:1b.2    - pass through ->  0:7.0

Subsequent patches will allow the virtual device and function
to be specified which amongst other things allows the exisiting
default behaviour to be configured.

This patch seems to break hotplug.
I am unsure of why, but am working towards a fix.

Signed-off-by: Simon Horman <horms@verge.net.au>

--- 
* Thu, 12 Feb 2009 21:10:27 +1100
  Add missing PCI_TO_PHP_DEVFN conversion of non-zero devfn passed to
  insert_to_php_devfn(). This is a regression introduced by this patch.

Fri, 13 Feb 2009 15:22:10 +1100
* Rebased
  from "Restore xenfb.h and atkbd_ translation tables for * xenfbfront"
  to "fix memory/fd leak in pt_msix_init()"

Index: ioemu-remote/hw/piix4acpi.c
===================================================================
--- ioemu-remote.orig/hw/piix4acpi.c	2009-02-17 17:29:14.000000000 +0900
+++ ioemu-remote/hw/piix4acpi.c	2009-02-17 17:36:01.000000000 +0900
@@ -74,12 +74,12 @@ typedef struct GPEState {
 
 static GPEState gpe_state;
 
-typedef struct PHPSlots {
-    uint8_t status[PHP_SLOT_LEN]; /* Apaptor stats */
-    uint8_t plug_evt;      /* slot|event slot:0-no event;1-1st. event:0-remove;1-add */
-} PHPSlots;
+typedef struct PHPDevFn {
+    uint8_t status[PHP_DEVFN_LEN];    /* Apaptor stats */
+    uint16_t plug_evt;     /* devfn|event devfn:0-no event;1-1st. event:0-remove;1-add */
+} PHPDevFn;
 
-static PHPSlots php_slots;
+static PHPDevFn php_devfn_stat;
 int s3_shutdown_flag;
 static qemu_irq sci_irq;
 
@@ -218,25 +218,25 @@ static void acpi_dbg_writel(void *opaque
 /*
  * simple PCI hotplug controller IO 
  * ACPI_PHP_IO_ADDR + :
- * 0 - the hotplug description: slot(|event(remove/add); 
- * 1 - 1st php slot ctr/sts reg
- * 2 - 2nd php slot ctr/sts reg
+ * 0 - the hotplug description: devfn(|event(remove/add);
+ * 1 - 1st php devfn ctr/sts reg
+ * 2 - 2nd php devfn ctr/sts reg
  * ......
  */
 static uint32_t acpi_php_readb(void *opaque, uint32_t addr)
 {
-    PHPSlots *hotplug_slots = opaque;
+    PHPDevFn *hotplug_devfn = opaque;
     int num;
     uint32_t val; 
 
     switch (addr)
     {
     case ACPI_PHP_IO_ADDR:
-        val = hotplug_slots->plug_evt;
+        val = hotplug_devfn->plug_evt;
         break;
     default:
         num = addr - ACPI_PHP_IO_ADDR - 1;
-        val = hotplug_slots->status[num];
+        val = hotplug_devfn->status[num];
     }
 
     fprintf(logfile, "ACPI PCI hotplug: read addr=0x%x, val=0x%x.\n",
@@ -247,8 +247,8 @@ static uint32_t acpi_php_readb(void *opa
 
 static void acpi_php_writeb(void *opaque, uint32_t addr, uint32_t val)
 {
-    PHPSlots *hotplug_slots = opaque;
-    int php_slot;
+    PHPDevFn *hotplug_devfn = opaque;
+    int php_devfn;
 
     fprintf(logfile, "ACPI PCI hotplug: write addr=0x%x, val=0x%x.\n",
             addr, val);
@@ -258,16 +258,16 @@ static void acpi_php_writeb(void *opaque
     case ACPI_PHP_IO_ADDR:
         break;
     default:
-        php_slot = addr - ACPI_PHP_IO_ADDR - 1;
+        php_devfn = addr - ACPI_PHP_IO_ADDR - 1;
         if ( val == 0x1 ) { /* Eject command */
-            /* make _STA of the slot 0 */
-            hotplug_slots->status[php_slot] = 0;
+            /* make _STA of devfn 0 */
+            hotplug_devfn->status[php_devfn] = 0;
 
             /* clear the hotplug event */
-            hotplug_slots->plug_evt = 0;
+            hotplug_devfn->plug_evt = 0;
 
-            /* power off the slot */
-            power_off_php_slot(php_slot);
+            /* power off the devfn */
+            power_off_php_devfn(php_devfn);
 
             /* signal the CP ACPI hot remove done. */
             xenstore_record_dm_state("pci-removed");
@@ -275,48 +275,48 @@ static void acpi_php_writeb(void *opaque
     }
 }
 
-static void pcislots_save(QEMUFile* f, void* opaque)
+static void php_devfn_save(QEMUFile* f, void* opaque)
 {
-    PHPSlots *hotplug_slots = opaque;
+    PHPDevFn *hotplug_devfn = opaque;
     int i;
-    for ( i = 0; i < PHP_SLOT_LEN; i++ ) {
-        qemu_put_8s( f, &hotplug_slots->status[i]);
+    for ( i = 0; i < PHP_DEVFN_LEN; i++ ) {
+        qemu_put_8s( f, &hotplug_devfn->status[i]);
     }
-    qemu_put_8s(f, &hotplug_slots->plug_evt);
+    qemu_put_8s(f, &hotplug_devfn->plug_evt);
 }
 
-static int pcislots_load(QEMUFile* f, void* opaque, int version_id)
+static int php_devfn_load(QEMUFile* f, void* opaque, int version_id)
 {
-    PHPSlots *hotplug_slots = opaque;
+    PHPDevFn *hotplug_devfn = opaque;
     int i;
     if (version_id != 1)
         return -EINVAL;
-    for ( i = 0; i < PHP_SLOT_LEN; i++ ) {
-        qemu_get_8s( f, &hotplug_slots->status[i]);
+    for ( i = 0; i < PHP_DEVFN_LEN; i++ ) {
+        qemu_get_8s( f, &hotplug_devfn->status[i]);
     }
-    qemu_get_8s(f, &hotplug_slots->plug_evt);
+    qemu_get_8s(f, &hotplug_devfn->plug_evt);
     return 0;
 }
 
-static void php_slots_init(void)
+static void php_devfn_init(void)
 {
     int i;
-    memset(&php_slots, 0, sizeof(PHPSlots));
 
-    /* update the pci slot status */
-    for ( i = 0; i < PHP_SLOT_LEN; i++ ) {
-        if ( test_pci_slot( PHP_TO_PCI_SLOT(i) ) == 1 )
-            php_slots.status[i] = 0xf;
-    }
+    memset(&php_devfn_stat, 0, sizeof(PHPDevFn));
 
+    /* update the pci devfn status */
+    for ( i = 0; i < PHP_DEVFN_LEN; i++ ) {
+        if ( test_php_devfn(PHP_TO_PCI_DEVFN(i)) == 1 )
+            php_devfn_stat.status[i] = 0xf;
+    }
 
     /* ACPI PCI hotplug controller */
     register_ioport_read(ACPI_PHP_IO_ADDR, PHP_SLOT_LEN + 1, 1,
-                         acpi_php_readb, &php_slots);
+                         acpi_php_readb, &php_devfn_stat);
     register_ioport_write(ACPI_PHP_IO_ADDR, PHP_SLOT_LEN + 1, 1,
-                          acpi_php_writeb, &php_slots);
-    register_savevm("pcislots", 0, 1, pcislots_save, pcislots_load,
-                    &php_slots);
+                          acpi_php_writeb, &php_devfn_stat);
+    register_savevm("pcidevfn", 0, 1, php_devfn_save, php_devfn_load,
+                    &php_devfn_stat);
 }
 
 /* GPEx_STS occupy 1st half of the block, while GPEx_EN 2nd half */
@@ -449,37 +449,36 @@ static void acpi_sci_intr(GPEState *s)
     }
 }
 
-void acpi_php_del(int pci_slot)
+void acpi_php_del(int devfn)
 {
     GPEState *s = &gpe_state;
-    int php_slot = PCI_TO_PHP_SLOT(pci_slot);
-
-    if ( pci_slot < PHP_SLOT_START || pci_slot >= PHP_SLOT_END ) {
-        fprintf(logfile, "not find the pci slot %d when hot remove.\n", pci_slot);
+    int php_devfn = PCI_TO_PHP_DEVFN(devfn);
 
+    if ( !PCI_DEVFN_IS_PHP(devfn) ) {
+        fprintf(logfile, "not a php devfn: %d\n", devfn);
         return;
     }
 
     /* update the php controller status */
-    php_slots.plug_evt = (((php_slot+1) << 4) | PHP_EVT_REMOVE);
+    php_devfn_stat.plug_evt = (((php_devfn+1) << 4) | PHP_EVT_REMOVE);
 
     /* generate a SCI interrupt */
     acpi_sci_intr(s);
 }
 
-void acpi_php_add(int pci_slot)
+void acpi_php_add(int devfn)
 {
     GPEState *s = &gpe_state;
-    int php_slot = PCI_TO_PHP_SLOT(pci_slot);
+    int php_devfn = PCI_TO_PHP_DEVFN(devfn);
     char ret_str[30];
 
-    if ( pci_slot < PHP_SLOT_START || pci_slot >= PHP_SLOT_END ) {
-        fprintf(logfile, "hot add pci slot %d exceed.\n", pci_slot);
+    if ( !PCI_DEVFN_IS_PHP(devfn) ) {
+        fprintf(logfile, "not a php devfn: %d\n", devfn);
 
-        if ( pci_slot == 0 )
-            sprintf(ret_str, "no free hotplug slots");
-        else if ( pci_slot == -1 )
-            sprintf(ret_str, "wrong bdf or vslot");
+        if ( devfn == 0 )
+            sprintf(ret_str, "no free hotplug devfn");
+        else if ( devfn == -1 )
+            sprintf(ret_str, "wrong bdf or vdevfn");
 
         if ( strlen(ret_str) > 0 )
             xenstore_record_dm("parameter", ret_str);
@@ -488,16 +487,16 @@ void acpi_php_add(int pci_slot)
     }
 
     /* update the php controller status */
-    php_slots.plug_evt = (((php_slot+1) << 4) | PHP_EVT_ADD);
+    php_devfn_stat.plug_evt = (((php_devfn+1) << 4) | PHP_EVT_ADD);
 
-    /* update the slot status as present */
-    php_slots.status[php_slot]= 0xf;
+    /* update the devfn status as present */
+    php_devfn_stat.status[php_devfn]= 0xf;
 
-    /* power on the slot */
-    power_on_php_slot(php_slot);
+    /* power on the devfn */
+    power_on_php_devfn(php_devfn);
 
-    /* tell Control panel which slot for the new pass-throgh dev */
-    sprintf(ret_str, "0x%x", pci_slot);
+    /* tell Control panel which devfn for the new pass-throgh dev */
+    sprintf(ret_str, "0x%x", devfn);
     xenstore_record_dm("parameter", ret_str);
 
     /* signal the CP ACPI hot insert done */
@@ -555,7 +554,7 @@ i2c_bus *piix4_pm_init(PCIBus *bus, int 
 
 #ifdef CONFIG_PASSTHROUGH
     gpe_acpi_init();
-    php_slots_init();
+    php_devfn_init();
     register_ioport_write(ACPI_DBG_IO_ADDR, 4, 4, acpi_dbg_writel, d);
 #endif
 
Index: ioemu-remote/hw/pci.c
===================================================================
--- ioemu-remote.orig/hw/pci.c	2009-02-17 17:28:20.000000000 +0900
+++ ioemu-remote/hw/pci.c	2009-02-17 17:36:01.000000000 +0900
@@ -139,8 +139,7 @@ PCIDevice *pci_register_device(PCIBus *b
 
     if (devfn == PCI_DEVFN_AUTO) {
         for(devfn = bus->devfn_min ; devfn < 256; devfn += 8) {
-            if ( !bus->devices[devfn] &&
-                 !( devfn >= PHP_DEVFN_START && devfn < PHP_DEVFN_END ) )
+            if ( !bus->devices[devfn] && !PCI_DEVFN_IS_PHP(devfn) )
                 goto found;
         }
         return NULL;
Index: ioemu-remote/hw/pass-through.c
===================================================================
--- ioemu-remote.orig/hw/pass-through.c	2009-02-17 17:29:05.000000000 +0900
+++ ioemu-remote/hw/pass-through.c	2009-02-17 17:36:01.000000000 +0900
@@ -38,7 +38,7 @@ struct php_dev {
 };
 static struct dpci_infos {
 
-    struct php_dev php_devs[PHP_SLOT_LEN];
+    struct php_dev php_devs[PHP_DEVFN_LEN];
 
     PCIBus *e_bus;
     struct pci_access *pci_access;
@@ -76,6 +76,8 @@ static uint32_t pt_msgdata_reg_init(stru
     struct pt_reg_info_tbl *reg, uint32_t real_offset);
 static uint32_t pt_msixctrl_reg_init(struct pt_dev *ptdev,
     struct pt_reg_info_tbl *reg, uint32_t real_offset);
+static uint32_t pt_header_type_reg_init(struct pt_dev *ptdev,
+    struct pt_reg_info_tbl *reg, uint32_t real_offset);
 static uint8_t pt_reg_grp_size_init(struct pt_dev *ptdev,
     struct pt_reg_grp_info_tbl *grp_reg, uint32_t base_offset);
 static uint8_t pt_msi_size_init(struct pt_dev *ptdev,
@@ -243,7 +245,7 @@ static struct pt_reg_info_tbl pt_emu_reg
         .init_val   = 0x00,
         .ro_mask    = 0xFF,
         .emu_mask   = 0x80,
-        .init       = pt_common_reg_init,
+        .init       = pt_header_type_reg_init,
         .u.b.read   = pt_byte_reg_read,
         .u.b.write  = pt_byte_reg_write,
     },
@@ -794,58 +796,66 @@ static void msi_set_enable(struct pt_dev
     pci_write_word(ptdev->pci_dev, address, val);
 }
 
-/* Insert a new pass-through device into a specific pci slot.
+/* Insert a new pass-through device into function 0 of a specific pci slot.
  * input  dom:bus:dev.func@slot, chose free one if slot == 0
  * return -1: required slot not available
  *         0: no free hotplug slots, but normal slot should okay
- *        >0: the new hotplug slot
+ *        >0: the new hotplug devfn
  */
-static int __insert_to_pci_slot(int bus, int dev, int func, int slot,
+static int insert_to_php_devfn(int bus, int dev, int func, int devfn,
                                 char *opt)
 {
-    int i, php_slot;
+    int php_slot, php_func, php_devfn, php_devfn_match;
 
     /* preferred virt pci slot */
-    if ( slot >= PHP_SLOT_START && slot < PHP_SLOT_END )
+    if ( devfn >= PHP_DEVFN_START && devfn < PHP_DEVFN_END )
     {
-        php_slot = PCI_TO_PHP_SLOT(slot);
-        if ( !dpci_infos.php_devs[php_slot].valid )
-        {
+        php_devfn = PCI_TO_PHP_DEVFN(devfn);
+        if ( !dpci_infos.php_devs[php_devfn].valid )
             goto found;
-        }
-        else
-            return -1;
     }
-
-    if ( slot != 0 )
+    if ( devfn != 0 )
         return -1;
 
-    /* slot == 0, pick up a free one */
-    for ( i = 0; i < PHP_SLOT_LEN; i++ )
+    /* Co-locate functions for the same device in the same slot */
+    for ( php_slot = 0; php_slot < PHP_SLOT_LEN; php_slot++ )
     {
-        if ( !dpci_infos.php_devs[i].valid )
+        php_devfn = PCI_DEVFN(php_slot, func);
+        for ( php_func = 0; php_func < 8; php_func++ )
         {
-            php_slot = i;
-            goto found;
+            php_devfn_match = PCI_DEVFN(php_slot, php_func);
+            if ( dpci_infos.php_devs[php_devfn_match].valid &&
+                 dpci_infos.php_devs[php_devfn_match].r_bus == bus &&
+                 dpci_infos.php_devs[php_devfn_match].r_dev == dev &&
+                 !dpci_infos.php_devs[php_devfn].valid )
+                goto found;
         }
     }
 
+    /* Pick a free slot */
+    for ( php_slot = 0; php_slot < PHP_SLOT_LEN; php_slot++ )
+    {
+        php_devfn = PCI_DEVFN(php_slot, 0);
+        if ( !dpci_infos.php_devs[php_devfn].valid )
+            goto found;
+    }
+
     /* not found */
     return 0;
 
 found:
-    dpci_infos.php_devs[php_slot].valid  = 1;
-    dpci_infos.php_devs[php_slot].r_bus  = bus;
-    dpci_infos.php_devs[php_slot].r_dev  = dev;
-    dpci_infos.php_devs[php_slot].r_func = func;
-    dpci_infos.php_devs[php_slot].opt = opt;
-    return PHP_TO_PCI_SLOT(php_slot);
+    dpci_infos.php_devs[php_devfn].valid  = 1;
+    dpci_infos.php_devs[php_devfn].r_bus  = bus;
+    dpci_infos.php_devs[php_devfn].r_dev  = dev;
+    dpci_infos.php_devs[php_devfn].r_func = func;
+    dpci_infos.php_devs[php_devfn].opt = opt;
+    return PHP_TO_PCI_DEVFN(php_devfn);
 }
 
-/* Insert a new pass-through device into a specific pci slot.
+/* Insert a new pass-through device into function 0 of a specific pci slot.
  * input  dom:bus:dev.func@slot
  */
-int insert_to_pci_slot(char *bdf_slt)
+int insert_bdf_to_php_devfn(char *bdf_slt)
 {
     int seg, bus, dev, func, slot;
     char *bdf_str, *slt_str, *opt;
@@ -860,31 +870,31 @@ int insert_to_pci_slot(char *bdf_slt)
         return -1;
     }
 
-    return __insert_to_pci_slot(bus, dev, func, slot, opt);
+    return insert_to_php_devfn(bus, dev, func, PCI_DEVFN(slot, 0), opt);
 
 }
 
-/* Test if a pci slot has a device
+/* Test if a pci devfn has a device
  * 1:  present
  * 0:  not present
  * -1: invalide pci slot input
  */
-int test_pci_slot(int slot)
+int test_php_devfn(int devfn)
 {
-    int php_slot;
+    int php_devfn;
 
-    if ( slot < PHP_SLOT_START || slot >= PHP_SLOT_END )
+    if ( ! PCI_DEVFN_IS_PHP(devfn) )
         return -1;
 
-    php_slot = PCI_TO_PHP_SLOT(slot);
-    if ( dpci_infos.php_devs[php_slot].valid )
+    php_devfn = PCI_TO_PHP_DEVFN(devfn);
+    if ( dpci_infos.php_devs[php_devfn].valid )
         return 1;
     else
         return 0;
 }
 
 /* find the pci slot for pass-through dev with specified BDF */
-int bdf_to_slot(char *bdf_str)
+int bdf_to_php_devfn(char *bdf_str)
 {
     int seg, bus, dev, func, i;
     char *opt;
@@ -893,16 +903,17 @@ int bdf_to_slot(char *bdf_str)
     {
         return -1;
     }
+    PT_LOG("%s: bdf: %04x:%02x.%x\n", __func__, bus, dev, func);
 
     /* locate the virtual pci slot for this VTd device */
-    for ( i = 0; i < PHP_SLOT_LEN; i++ )
+    for ( i = 0; i < PHP_DEVFN_LEN; i++ )
     {
         if ( dpci_infos.php_devs[i].valid &&
            dpci_infos.php_devs[i].r_bus == bus &&
            dpci_infos.php_devs[i].r_dev  == dev &&
            dpci_infos.php_devs[i].r_func == func )
         {
-            return PHP_TO_PCI_SLOT(i);
+            return PHP_TO_PCI_DEVFN(i);
         }
     }
 
@@ -2313,6 +2324,19 @@ static uint8_t pt_pcie_size_init(struct 
     return pcie_size;
 }
 
+/* read PCI_HEADER_TYPE */
+static uint32_t pt_header_type_reg_init(struct pt_dev *ptdev,
+    struct pt_reg_info_tbl *reg, uint32_t real_offset)
+{
+    uint32_t r_val;
+
+    r_val = pci_read_byte(ptdev->pci_dev, PCI_HEADER_TYPE);
+
+    /* If the real device is multi-function,
+     * the passed-through device can be too */
+    return reg->init_val | (r_val & 0x80);
+}
+
 /* read byte size emulate register */
 static int pt_byte_reg_read(struct pt_dev *ptdev,
         struct pt_reg_tbl *cfg_entry,
@@ -3078,7 +3102,7 @@ static struct pt_dev * register_real_dev
     struct pci_dev *pci_dev;
     uint8_t e_device, e_intx;
     struct pci_config_cf8 machine_bdf;
-    int free_pci_slot = -1;
+    int free_devfn = -1;
     char *key, *val;
     int msi_translate;
 
@@ -3103,9 +3127,9 @@ static struct pt_dev * register_real_dev
 
     if ( e_devfn == PCI_DEVFN_AUTO ) {
         /*indicate a static assignment(not hotplug), so find a free PCI hot plug slot */
-        free_pci_slot = __insert_to_pci_slot(r_bus, r_dev, r_func, 0, NULL);
-        if ( free_pci_slot > 0 )
-            e_devfn = free_pci_slot  << 3;
+        free_devfn = insert_to_php_devfn(r_bus, r_dev, r_func, 0, NULL);
+        if ( free_devfn > 0 )
+            e_devfn = free_devfn;
         else
             PT_LOG("Error: no free virtual PCI hot plug slot, thus no live migration.\n");
     }
@@ -3148,8 +3172,9 @@ static struct pt_dev * register_real_dev
         return NULL;
     }
 
-    if ( free_pci_slot > 0 )
-        dpci_infos.php_devs[PCI_TO_PHP_SLOT(free_pci_slot)].pt_dev = assigned_device;
+    if ( free_devfn > 0 )
+        dpci_infos.php_devs[PCI_TO_PHP_DEVFN(free_devfn)].pt_dev =
+                assigned_device;
 
     assigned_device->pci_dev = pci_dev;
     assigned_device->msi_trans_cap = msi_translate;
@@ -3254,7 +3279,7 @@ out:
     return assigned_device;
 }
 
-static int unregister_real_device(int php_slot)
+static int unregister_real_device(int php_devfn)
 {
     struct php_dev *php_dev;
     struct pci_dev *pci_dev;
@@ -3264,10 +3289,10 @@ static int unregister_real_device(int ph
     uint32_t bdf = 0;
     int rc = -1;
 
-    if ( php_slot < 0 || php_slot >= PHP_SLOT_LEN )
+    if ( !PHP_DEVFN_IS_VALID(php_devfn) )
        return -1;
 
-    php_dev = &dpci_infos.php_devs[php_slot];
+    php_dev = &dpci_infos.php_devs[php_devfn];
     assigned_device = php_dev->pt_dev;
 
     if ( !assigned_device || !php_dev->valid )
@@ -3279,7 +3304,7 @@ static int unregister_real_device(int ph
     pci_hide_device((PCIDevice*)assigned_device);
 
     /* Unbind interrupt */
-    e_device = (assigned_device->dev.devfn >> 3) & 0x1f;
+    e_device = assigned_device->dev.devfn;
     /* fix virtual interrupt pin to INTA# */
     e_intx = 0;
     machine_irq = assigned_device->machine_irq;
@@ -3325,15 +3350,15 @@ static int unregister_real_device(int ph
     return 0;
 }
 
-int power_on_php_slot(int php_slot)
+int power_on_php_devfn(int php_devfn)
 {
-    struct php_dev *php_dev = &dpci_infos.php_devs[php_slot];
-    int pci_slot = php_slot + PHP_SLOT_START;
+    struct php_dev *php_dev = &dpci_infos.php_devs[php_devfn];
+    int pci_devfn = PHP_TO_PCI_DEVFN(php_devfn);
     struct pt_dev *pt_dev;
     pt_dev = 
         register_real_device(dpci_infos.e_bus,
             "DIRECT PCI",
-            PCI_DEVFN(pci_slot, 0),
+            pci_devfn,
             php_dev->r_bus,
             php_dev->r_dev,
             php_dev->r_func,
@@ -3349,14 +3374,14 @@ int power_on_php_slot(int php_slot)
 
 }
 
-int power_off_php_slot(int php_slot)
+int power_off_php_devfn(int php_devfn)
 {
-    return unregister_real_device(php_slot);
+    return unregister_real_device(php_devfn);
 }
 
 int pt_init(PCIBus *e_bus, const char *direct_pci)
 {
-    int seg, b, d, f, php_slot = 0, status = -1;
+    int seg, b, d, f, status = -1;
     struct pt_dev *pt_dev;
     struct pci_access *pci_access;
     char *vslots;
@@ -3403,17 +3428,8 @@ int pt_init(PCIBus *e_bus, const char *d
             goto err;
         }
 
-        /* Record the virtual slot info */
-        if ( php_slot < PHP_SLOT_LEN &&
-              dpci_infos.php_devs[php_slot].pt_dev == pt_dev )
-        {
-            sprintf(slot_str, "0x%x;", PHP_TO_PCI_SLOT(php_slot));
-        }
-        else
-            sprintf(slot_str, "0x%x;", 0);
-
+        sprintf(slot_str, "0x%x;", pt_dev->dev.devfn);
         strcat(vslots, slot_str);
-        php_slot++;
     }
 
     /* Write virtual slots info to xenstore for Control panel use */
Index: ioemu-remote/hw/pci.h
===================================================================
--- ioemu-remote.orig/hw/pci.h	2009-02-17 17:28:20.000000000 +0900
+++ ioemu-remote/hw/pci.h	2009-02-17 17:36:01.000000000 +0900
@@ -107,16 +107,19 @@ PCIBus *pci_bridge_init(PCIBus *bus, int
 #define PHP_SLOT_START  (6)
 #define PHP_SLOT_END    (8)
 #define PHP_SLOT_LEN    (PHP_SLOT_END - PHP_SLOT_START)
-#define PHP_TO_PCI_SLOT(x) (x + PHP_SLOT_START)
-#define PCI_TO_PHP_SLOT(x) (x - PHP_SLOT_START)
-#define PHP_DEVFN_START (PHP_SLOT_START << 3)
-#define PHP_DEVFN_END   (PHP_SLOT_END << 3)
-
-int insert_to_pci_slot(char*);
-int test_pci_slot(int);
-int bdf_to_slot(char*);
-int power_on_php_slot(int);
-int power_off_php_slot(int);
+#define PHP_DEVFN_START       (PHP_SLOT_START << 3)
+#define PHP_DEVFN_END         (PHP_SLOT_END << 3)
+#define PHP_DEVFN_LEN         (PHP_DEVFN_END - PHP_DEVFN_START)
+#define PCI_TO_PHP_DEVFN(x)   ((x) - PHP_DEVFN_START)
+#define PHP_TO_PCI_DEVFN(x)   ((x) + PHP_DEVFN_START)
+#define PCI_DEVFN_IS_PHP(x)   ((x) >= PHP_DEVFN_START && (x) < PHP_DEVFN_END)
+#define PHP_DEVFN_IS_VALID(x) (PCI_DEVFN_IS_PHP(PHP_TO_PCI_DEVFN(x)))
+
+int insert_bdf_to_php_devfn(char *bfd_slt);
+int test_php_devfn(int devfn);
+int bdf_to_php_devfn(char *bfd_str);
+int power_on_php_devfn(int php_devfn);
+int power_off_php_devfn(int php_devfn);
 
 /* pci_emulation.c */
 #include "hw/pci_emulation.h"
Index: ioemu-remote/hw/pc.h
===================================================================
--- ioemu-remote.orig/hw/pc.h	2009-02-17 17:28:18.000000000 +0900
+++ ioemu-remote/hw/pc.h	2009-02-17 17:36:01.000000000 +0900
@@ -96,8 +96,8 @@ i2c_bus *piix4_pm_init(PCIBus *bus, int 
 void piix4_smbus_register_device(SMBusDevice *dev, uint8_t addr);
 void acpi_bios_init(void);
 
-void acpi_php_add(int);
-void acpi_php_del(int);
+void acpi_php_add(int devfn);
+void acpi_php_del(int devfn);
 
 /* pcspk.c */
 void pcspk_init(PITState *);
Index: ioemu-remote/vl.c
===================================================================
--- ioemu-remote.orig/vl.c	2009-02-17 17:28:06.000000000 +0900
+++ ioemu-remote/vl.c	2009-02-17 17:36:01.000000000 +0900
@@ -3900,19 +3900,19 @@ void qemu_chr_close(CharDriverState *chr
 #ifdef CONFIG_PASSTHROUGH
 void do_pci_del(char *devname)
 {
-    int pci_slot;
-    pci_slot = bdf_to_slot(devname);
+    int devfn;
+    devfn = bdf_to_php_devfn(devname);
 
-    acpi_php_del(pci_slot);
+    acpi_php_del(devfn);
 }
 
 void do_pci_add(char *devname)
 {
-    int pci_slot;
+    int devfn;
 
-    pci_slot = insert_to_pci_slot(devname);
+    devfn = insert_bdf_to_php_devfn(devname);
 
-    acpi_php_add(pci_slot);
+    acpi_php_add(devfn);
 }
 #endif
 

-- 

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [rfc 15/18] ioemu: use struct php_dev to pass around PCI pass-through assignment parameters
  2009-02-17  9:07 [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough Simon Horman
                   ` (13 preceding siblings ...)
  2009-02-17  9:08 ` [rfc 14/18] ioemu: use devfn instead of slots as the unit for passthrough Simon Horman
@ 2009-02-17  9:08 ` Simon Horman
  2009-02-17  9:08 ` [rfc 16/18] ioemu: non-destructive parsing of PCI assignement strings Simon Horman
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: Simon Horman @ 2009-02-17  9:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

[-- Attachment #1: regex-php_dev-3f23188224b7ce69fcf13f52cb1c7977a5372900.patch --]
[-- Type: text/plain, Size: 12387 bytes --]

Signed-off-by: Simon Horman <horms@verge.ne.au>

--- 
Fri, 13 Feb 2009 13:50:13 +1100
* Account for vslot supplied for hotplug insert

Fri, 13 Feb 2009 15:22:10 +1100
* Rebased
  from "Restore xenfb.h and atkbd_ translation tables for * xenfbfront"
  to "fix memory/fd leak in pt_msix_init()"
Index: ioemu-remote/hw/pass-through.c
===================================================================
--- ioemu-remote.orig/hw/pass-through.c	2009-02-17 17:55:37.000000000 +0900
+++ ioemu-remote/hw/pass-through.c	2009-02-17 17:56:46.000000000 +0900
@@ -34,6 +34,7 @@ struct php_dev {
     uint8_t r_bus;
     uint8_t r_dev;
     uint8_t r_func;
+    uint8_t v_devfn;
     char *opt;
 };
 static struct dpci_infos {
@@ -726,37 +727,65 @@ static const struct pt_reg_grp_info_tbl 
     }, 
 };
 
+static int cmp_php_dev(const struct php_dev *a, const struct php_dev *b)
+{
+    if (a->r_bus != b->r_bus)
+    {
+        return a->r_bus - b->r_bus;
+    }
+    if (a->r_dev != b->r_dev)
+    {
+        return a->r_dev - b->r_dev;
+    }
+    if (a->r_func != b->r_func)
+    {
+        return a->r_func - b->r_func;
+    }
+
+    return 0;
+}
+
+static void cpy_php_dev(struct php_dev *dst, const struct php_dev *src)
+{
+    memcpy(dst, src, sizeof(*dst));
+}
+
 static int token_value(char *token)
 {
     return strtol(token, NULL, 16);
 }
 
-static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func, char **opt)
+static struct php_dev next_bdf(char **str)
 {
     char *token;
     const char *delim = ":.-";
+    struct php_dev bdf;
+
+    memset(&bdf, 0, sizeof(bdf));
 
     if ( !(*str) ||
           ( !strchr(*str, ':') && !strchr(*str, '.')) )
-        return 0;
+        return bdf;
 
     token  = strsep(str, delim);
-    *seg = token_value(token);
+    /* segment */
 
     token  = strsep(str, delim);
-    *bus  = token_value(token);
+    bdf.r_bus  = token_value(token);
 
     token  = strsep(str, delim);
-    *dev  = token_value(token);
+    bdf.r_dev  = token_value(token);
 
     token  = strsep(str, delim);
-    *opt = strchr(token, ',');
-    if (*opt)
-        *(*opt)++ = '\0';
+    bdf.opt = strchr(token, ',');
+    if (bdf.opt)
+        *(bdf.opt)++ = '\0';
 
-    *func  = token_value(token);
+    bdf.r_func  = token_value(token);
 
-    return 1;
+    bdf.valid = 1;
+
+    return bdf;
 }
 
 static int get_next_keyval(char **option, char **key, char **val)
@@ -802,31 +831,31 @@ static void msi_set_enable(struct pt_dev
  *         0: no free hotplug slots, but normal slot should okay
  *        >0: the new hotplug devfn
  */
-static int insert_to_php_devfn(int bus, int dev, int func, int devfn,
-                                char *opt)
+static int insert_to_php_devfn(const struct php_dev *bdf)
 {
     int php_slot, php_func, php_devfn, php_devfn_match;
 
     /* preferred virt pci slot */
-    if ( devfn >= PHP_DEVFN_START && devfn < PHP_DEVFN_END )
+    if ( bdf->v_devfn >= PHP_DEVFN_START && bdf->v_devfn < PHP_DEVFN_END )
     {
-        php_devfn = PCI_TO_PHP_DEVFN(devfn);
-        if ( !dpci_infos.php_devs[php_devfn].valid )
+        php_devfn = PCI_TO_PHP_DEVFN(bdf->v_devfn);
+        if ( !dpci_infos.php_devs[php_devfn].valid ||
+             !cmp_php_dev(&dpci_infos.php_devs[php_devfn], bdf) )
             goto found;
     }
-    if ( devfn != 0 )
+    if ( bdf->v_devfn != 0 )
         return -1;
 
     /* Co-locate functions for the same device in the same slot */
     for ( php_slot = 0; php_slot < PHP_SLOT_LEN; php_slot++ )
     {
-        php_devfn = PCI_DEVFN(php_slot, func);
+        php_devfn = PCI_DEVFN(php_slot, bdf->r_func);
         for ( php_func = 0; php_func < 8; php_func++ )
         {
             php_devfn_match = PCI_DEVFN(php_slot, php_func);
             if ( dpci_infos.php_devs[php_devfn_match].valid &&
-                 dpci_infos.php_devs[php_devfn_match].r_bus == bus &&
-                 dpci_infos.php_devs[php_devfn_match].r_dev == dev &&
+                 dpci_infos.php_devs[php_devfn_match].r_bus == bdf->r_bus &&
+                 dpci_infos.php_devs[php_devfn_match].r_dev == bdf->r_dev &&
                  !dpci_infos.php_devs[php_devfn].valid )
                 goto found;
         }
@@ -844,12 +873,9 @@ static int insert_to_php_devfn(int bus, 
     return 0;
 
 found:
-    dpci_infos.php_devs[php_devfn].valid  = 1;
-    dpci_infos.php_devs[php_devfn].r_bus  = bus;
-    dpci_infos.php_devs[php_devfn].r_dev  = dev;
-    dpci_infos.php_devs[php_devfn].r_func = func;
-    dpci_infos.php_devs[php_devfn].opt = opt;
-    return PHP_TO_PCI_DEVFN(php_devfn);
+    cpy_php_dev(&dpci_infos.php_devs[php_devfn], bdf);
+    dpci_infos.php_devs[php_devfn].v_devfn = PHP_TO_PCI_DEVFN(php_devfn);
+    return dpci_infos.php_devs[php_devfn].v_devfn;
 }
 
 /* Insert a new pass-through device into function 0 of a specific pci slot.
@@ -857,7 +883,8 @@ found:
  */
 int insert_bdf_to_php_devfn(char *bdf_slt)
 {
-    int seg, bus, dev, func, slot;
+    int slot;
+    struct php_dev bdf;
     char *bdf_str, *slt_str, *opt;
     const char *delim="@";
 
@@ -865,12 +892,15 @@ int insert_bdf_to_php_devfn(char *bdf_sl
     slt_str = bdf_slt;
     slot = token_value(slt_str);
 
-    if ( !next_bdf(&bdf_str, &seg, &bus, &dev, &func, &opt))
+    bdf = next_bdf(&bdf_str);
+    if (!bdf.valid)
     {
         return -1;
     }
 
-    return insert_to_php_devfn(bus, dev, func, PCI_DEVFN(slot, 0), opt);
+    bdf.v_devfn = PCI_DEVFN(slot, 0);
+
+    return insert_to_php_devfn(&bdf);
 
 }
 
@@ -896,22 +926,22 @@ int test_php_devfn(int devfn)
 /* find the pci slot for pass-through dev with specified BDF */
 int bdf_to_php_devfn(char *bdf_str)
 {
-    int seg, bus, dev, func, i;
-    char *opt;
+    int i;
+    struct php_dev bdf;
 
-    if ( !next_bdf(&bdf_str, &seg, &bus, &dev, &func, &opt))
+    bdf = next_bdf(&bdf_str);
+    if (!bdf.valid)
     {
         return -1;
     }
-    PT_LOG("%s: bdf: %04x:%02x.%x\n", __func__, bus, dev, func);
+    PT_LOG("%s: bdf: %04x:%02x.%x\n", __func__, bdf.r_bus, bdf.r_dev,
+           bdf.r_func);
 
     /* locate the virtual pci slot for this VTd device */
     for ( i = 0; i < PHP_DEVFN_LEN; i++ )
     {
         if ( dpci_infos.php_devs[i].valid &&
-           dpci_infos.php_devs[i].r_bus == bus &&
-           dpci_infos.php_devs[i].r_dev  == dev &&
-           dpci_infos.php_devs[i].r_func == func )
+             cmp_php_dev(&dpci_infos.php_devs[i], &bdf) )
         {
             return PHP_TO_PCI_DEVFN(i);
         }
@@ -3093,9 +3123,8 @@ static int pt_msixctrl_reg_write(struct 
 }
 
 static struct pt_dev * register_real_device(PCIBus *e_bus,
-        const char *e_dev_name, int e_devfn, uint8_t r_bus, uint8_t r_dev,
-        uint8_t r_func, uint32_t machine_irq, struct pci_access *pci_access,
-        char *opt)
+        const char *e_dev_name, struct php_dev *bdf,
+        uint32_t machine_irq, struct pci_access *pci_access)
 {
     int rc = -1, i;
     struct pt_dev *assigned_device = NULL;
@@ -3107,14 +3136,14 @@ static struct pt_dev * register_real_dev
     int msi_translate;
 
     PT_LOG("Assigning real physical device %02x:%02x.%x ...\n",
-        r_bus, r_dev, r_func);
+        bdf->r_bus, bdf->r_dev, bdf->r_func);
 
     /* Find real device structure */
     for (pci_dev = pci_access->devices; pci_dev != NULL;
          pci_dev = pci_dev->next)
     {
-        if ((r_bus == pci_dev->bus) && (r_dev == pci_dev->dev)
-            && (r_func == pci_dev->func))
+        if ((bdf->r_bus == pci_dev->bus) && (bdf->r_dev == pci_dev->dev)
+            && (bdf->r_func == pci_dev->func))
             break;
     }
     if ( pci_dev == NULL )
@@ -3125,19 +3154,16 @@ static struct pt_dev * register_real_dev
     pci_fill_info(pci_dev, PCI_FILL_IRQ | PCI_FILL_BASES | PCI_FILL_ROM_BASE | PCI_FILL_SIZES);
     pt_libpci_fixup(pci_dev);
 
-    if ( e_devfn == PCI_DEVFN_AUTO ) {
-        /*indicate a static assignment(not hotplug), so find a free PCI hot plug slot */
-        free_devfn = insert_to_php_devfn(r_bus, r_dev, r_func, 0, NULL);
-        if ( free_devfn > 0 )
-            e_devfn = free_devfn;
-        else
-            PT_LOG("Error: no free virtual PCI hot plug slot, thus no live migration.\n");
-    }
+    free_devfn = insert_to_php_devfn(bdf);
+    if ( free_devfn <= 0 )
+        PT_LOG("Error: no free virtual PCI hot plug slot, "
+               "thus no live migration.\n");
 
     msi_translate = direct_pci_msitranslate;
-    while (opt) {
-        if (get_next_keyval(&opt, &key, &val)) {
-            PT_LOG("Error: unrecognized PCI assignment option \"%s\"\n", opt);
+    while (bdf->opt) {
+        if (get_next_keyval(&bdf->opt, &key, &val)) {
+            PT_LOG("Error: unrecognized PCI assignment option \"%s\"\n",
+                   bdf->opt);
             break;
         }
 
@@ -3164,7 +3190,7 @@ static struct pt_dev * register_real_dev
 
     /* Register device */
     assigned_device = (struct pt_dev *) pci_register_device(e_bus, e_dev_name,
-                                sizeof(struct pt_dev), e_devfn,
+                                sizeof(struct pt_dev), free_devfn,
                                 pt_pci_read_config, pt_pci_write_config);
     if ( assigned_device == NULL )
     {
@@ -3181,9 +3207,9 @@ static struct pt_dev * register_real_dev
 
     /* Assign device */
     machine_bdf.reg = 0;
-    machine_bdf.bus = r_bus;
-    machine_bdf.dev = r_dev;
-    machine_bdf.func = r_func;
+    machine_bdf.bus = bdf->r_bus;
+    machine_bdf.dev = bdf->r_dev;
+    machine_bdf.func = bdf->r_func;
     rc = xc_assign_device(xc_handle, domid, machine_bdf.value);
     if ( rc < 0 )
         PT_LOG("Error: xc_assign_device error %d\n", rc);
@@ -3273,7 +3299,7 @@ static struct pt_dev * register_real_dev
 
 out:
     PT_LOG("Real physical device %02x:%02x.%x registered successfuly!\n"
-           "IRQ type = %s\n", r_bus, r_dev, r_func,
+           "IRQ type = %s\n", bdf->r_bus, bdf->r_dev, bdf->r_func,
            assigned_device->msi_trans_en? "MSI-INTx":"INTx");
 
     return assigned_device;
@@ -3353,18 +3379,11 @@ static int unregister_real_device(int ph
 int power_on_php_devfn(int php_devfn)
 {
     struct php_dev *php_dev = &dpci_infos.php_devs[php_devfn];
-    int pci_devfn = PHP_TO_PCI_DEVFN(php_devfn);
     struct pt_dev *pt_dev;
-    pt_dev = 
-        register_real_device(dpci_infos.e_bus,
-            "DIRECT PCI",
-            pci_devfn,
-            php_dev->r_bus,
-            php_dev->r_dev,
-            php_dev->r_func,
-            PT_MACHINE_IRQ_AUTO,
-            dpci_infos.pci_access,
-            php_dev->opt);
+
+    pt_dev = register_real_device(dpci_infos.e_bus, "DIRECT PCI",
+                                  php_dev, PT_MACHINE_IRQ_AUTO,
+                                  dpci_infos.pci_access);
 
     php_dev->opt = NULL;
 
@@ -3381,14 +3400,14 @@ int power_off_php_devfn(int php_devfn)
 
 int pt_init(PCIBus *e_bus, const char *direct_pci)
 {
-    int seg, b, d, f, status = -1;
+    int status = -1;
     struct pt_dev *pt_dev;
     struct pci_access *pci_access;
     char *vslots;
     char slot_str[8];
     char *direct_pci_head = NULL;
     char *direct_pci_p = NULL;
-    char *opt;
+    struct php_dev bdf;
 
     /* Initialize libpci */
     pci_access = pci_alloc();
@@ -3417,14 +3436,20 @@ int pt_init(PCIBus *e_bus, const char *d
     vslots = qemu_mallocz ( strlen(direct_pci) / 3 );
 
     /* Assign given devices to guest */
-    while ( next_bdf(&direct_pci_p, &seg, &b, &d, &f, &opt) )
+    while (1)
     {
+        bdf = next_bdf(&direct_pci_p);
+        if (!bdf.valid)
+        {
+            break;
+        }
         /* Register real device with the emulated bus */
-        pt_dev = register_real_device(e_bus, "DIRECT PCI", PCI_DEVFN_AUTO,
-            b, d, f, PT_MACHINE_IRQ_AUTO, pci_access, opt);
+        pt_dev = register_real_device(e_bus, "DIRECT PCI",
+                                      &bdf, PT_MACHINE_IRQ_AUTO, pci_access);
         if ( pt_dev == NULL )
         {
-            PT_LOG("Error: Registration failed (%02x:%02x.%x)\n", b, d, f);
+            PT_LOG("Error: Registration failed (%02x:%02x.%x)\n",
+                   bdf.r_bus, bdf.r_dev, bdf.r_func);
             goto err;
         }
 

-- 

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [rfc 16/18] ioemu: non-destructive parsing of PCI assignement strings
  2009-02-17  9:07 [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough Simon Horman
                   ` (14 preceding siblings ...)
  2009-02-17  9:08 ` [rfc 15/18] ioemu: use struct php_dev to pass around PCI pass-through assignment parameters Simon Horman
@ 2009-02-17  9:08 ` Simon Horman
  2009-02-17  9:08 ` [rfc 17/18] ioemu: sort pass-through PCI devices before inserting Simon Horman
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: Simon Horman @ 2009-02-17  9:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

[-- Attachment #1: regex-parser-3f23188224b7ce69fcf13f52cb1c7977a5372900.patch --]
[-- Type: text/plain, Size: 14856 bytes --]

Signed-off-by: Simon Horman <horms@verge.net.au>

--- 
Fri, 13 Feb 2009 13:45:22 +1100
* Fix parsing of vslot - the regex was incorrect
* Allow 0x to prefix hex values that are parsed

Fri, 13 Feb 2009 15:22:10 +1100
* Rebased
  from "Restore xenfb.h and atkbd_ translation tables for * xenfbfront"
  to "fix memory/fd leak in pt_msix_init()"
Index: ioemu-remote/hw/pass-through.c
===================================================================
--- ioemu-remote.orig/hw/pass-through.c	2009-02-17 17:56:46.000000000 +0900
+++ ioemu-remote/hw/pass-through.c	2009-02-17 17:57:25.000000000 +0900
@@ -28,6 +28,12 @@
 #include "pt-msi.h"
 #include "qemu-xen.h"
 
+#include <sys/types.h>
+#include <regex.h>
+
+#define PHP_DEV_OPT_MSITRANSLATE 0x01
+#define PHP_DEV_OPT_ERROR        0x80
+
 struct php_dev {
     struct pt_dev *pt_dev;
     uint8_t valid;
@@ -35,7 +41,7 @@ struct php_dev {
     uint8_t r_dev;
     uint8_t r_func;
     uint8_t v_devfn;
-    char *opt;
+    uint8_t opt;
 };
 static struct dpci_infos {
 
@@ -750,62 +756,199 @@ static void cpy_php_dev(struct php_dev *
     memcpy(dst, src, sizeof(*dst));
 }
 
-static int token_value(char *token)
+static char *regerror_log(int errcode, const regex_t *preg)
 {
-    return strtol(token, NULL, 16);
+    size_t len = 0, new_len;
+    char *buf = NULL;
+
+    while (1)
+    {
+        new_len = regerror(errcode, preg, buf, len);
+        if (new_len <= len)
+            break;
+        len = new_len;
+        if (!(buf = realloc(buf, len)))
+        {
+            PT_LOG("Error: can't allocate %lu bytes for rexex error message\n",
+                   len);
+            return NULL;
+        }
+    }
+
+    return buf;
 }
 
-static struct php_dev next_bdf(char **str)
+static int opt_cmp(const char *source, const char *match)
 {
-    char *token;
-    const char *delim = ":.-";
-    struct php_dev bdf;
+    return strncmp(source, match, strlen(match));
+}
 
-    memset(&bdf, 0, sizeof(bdf));
+#define RE_OPT_MSITRANSLATE "msitranslate=\\(yes\\|1\\|no\\|0\\)"
 
-    if ( !(*str) ||
-          ( !strchr(*str, ':') && !strchr(*str, '.')) )
-        return bdf;
+static uint8_t parse_opt(const char *str, size_t n, int8_t opt)
+{
+    regmatch_t pmatch[2];
+    regex_t preg;
+    int err, status = PHP_DEV_OPT_ERROR;
+    char *err_str;
 
-    token  = strsep(str, delim);
-    /* segment */
+    if ((err = regcomp(&preg, RE_OPT_MSITRANSLATE, 0)))
+    {
+        if ((err_str = regerror_log(err, &preg)))
+        {
+            PT_LOG("regcomp() failed: %s\n", err_str);
+        }
+        else
+        {
+            PT_LOG("regcomp() failed\n");
+        }
+        return PHP_DEV_OPT_ERROR;
+    }
+
+    while (n > 0)
+    {
+        if ((err = regexec(&preg, str, 2, pmatch, 0)))
+        {
+            PT_LOG("Error: unrecognized PCI assignment option at \"%s\"\n",
+                   str);
+            goto err;
+        }
 
-    token  = strsep(str, delim);
-    bdf.r_bus  = token_value(token);
+        if (!opt_cmp(str + pmatch[1].rm_so, "no") ||
+            !opt_cmp(str + pmatch[1].rm_so, "0"))
+        {
+            opt &= ~PHP_DEV_OPT_MSITRANSLATE;
+        }
+        else if (!opt_cmp(str + pmatch[1].rm_so, "yes") ||
+                 !opt_cmp(str + pmatch[1].rm_so, "1"))
+        {
+            opt |= PHP_DEV_OPT_MSITRANSLATE;
+        }
 
-    token  = strsep(str, delim);
-    bdf.r_dev  = token_value(token);
+        n -= pmatch[0].rm_eo;
+        str += pmatch[0].rm_eo;
+        if (!n)
+        {
+            break;
+        }
+        if (*str != ',')
+        {
+            PT_LOG("Error: trailing garbage in PCI assignment option at "
+                   "\"%s\"\n", str);
+            goto err;
+        }
+        n--;
+        str++;
+        if (!n)
+        {
+            break;
+        }
+    }
 
-    token  = strsep(str, delim);
-    bdf.opt = strchr(token, ',');
-    if (bdf.opt)
-        *(bdf.opt)++ = '\0';
+    status = opt;
+err:
+    regfree(&preg);
+    return status;
+}
 
-    bdf.r_func  = token_value(token);
+#define RE_SEG  "\\(0x\\)\\?[0-9a-fA-F]\\{4\\}"
+#define RE_BUS  "\\(0x\\)\\?\\([0-9a-fA-F]\\{2\\}\\)"
+#define RE_DEV  "\\(0x\\)\\?\\([01][0-9a-fA-F]\\)"
+#define RE_FUNC "\\(0x\\)\\?\\([0-7]\\)"
+#define RE_BDF  RE_SEG ":" RE_BUS ":" RE_DEV "\\." RE_FUNC
 
-    bdf.valid = 1;
+#define RE_OPT  "\\(,\\([^@-]\\+\\)\\)"
+#define RE_BDF_OPT RE_BDF RE_OPT "\\?"
 
-    return bdf;
-}
+#define RE_VDEV  "\\(0x\\)\\?\\([01]\\?[0-9a-fA-F]\\)"
+#define RE_BDF_OPT_SLOT RE_BDF_OPT "\\(@" RE_VDEV "\\)\\?"
 
-static int get_next_keyval(char **option, char **key, char **val)
+static struct php_dev *parse_bdf(const char *str)
 {
-    char *opt, *k, *v;
+    struct php_dev *list = NULL, *e;
+    regex_t preg;
+    regmatch_t pmatch[14];
+    int err, nmemb = 0;
+    char *err_str;
 
-    k = *option;
-    opt = strchr(k, ',');
-    if (opt)
-        *opt++ = '\0';
-    v = strchr(k, '=');
-    if (!v)
-        return -1;
-    *v++ = '\0';
+    if ((err = regcomp(&preg, RE_BDF_OPT_SLOT, 0)))
+    {
+        if ((err_str = regerror_log(err, &preg)))
+        {
+            PT_LOG("regcomp() failed: %s\n", err_str);
+        }
+        else
+        {
+            PT_LOG("regcomp() failed\n");
+        }
+        return NULL;
+    }
 
-    *key = k;
-    *val = v;
-    *option = opt;
+    while (1)
+    {
+        if ((err = regexec(&preg, str, 14, pmatch, 0)))
+        {
+            PT_LOG("Error: invalid PCI assignment \"%s\"\n", str);
+            goto err;
+        }
 
-    return 0;
+        list = realloc(list, (++nmemb + 1) * sizeof(*list));
+        e = list + nmemb - 1;
+        memset(e, 0, sizeof(*e) * 2);
+
+        e->r_bus = strtol(str + pmatch[3].rm_so, NULL, 16);
+        e->r_dev = strtol(str + pmatch[5].rm_so, NULL, 16);
+        e->r_func = strtol(str + pmatch[7].rm_so, NULL, 16);
+
+        if (pmatch[9].rm_so >= 0)
+        {
+           if (parse_opt(str + pmatch[9].rm_so,
+                         pmatch[9].rm_eo - pmatch[9].rm_so,
+                         direct_pci_msitranslate) & PHP_DEV_OPT_ERROR)
+           {
+               goto err;
+           }
+        }
+
+        if (pmatch[12].rm_so >= 0)
+        {
+            e->v_devfn = PCI_DEVFN(strtol(str + pmatch[12].rm_so, NULL, 16), 0);
+        }
+
+        e->valid = 1;
+
+        str += pmatch[0].rm_eo;
+        if (!*str)
+        {
+            break;
+        }
+        if (*str != '-')
+        {
+            PT_LOG("Error: trailing garbage in PCI assignment at \"%s\"\n",
+                   str);
+            goto err;
+        }
+        str++;
+        /* A trailing '-' delimiter is ok */
+        if (!*str)
+        {
+            break;
+        }
+    }
+
+    regfree(&preg);
+    return list;
+
+err:
+    regfree(&preg);
+    if (list)
+        free(list);
+    return NULL;
+}
+
+static struct php_dev *next_bdf(struct php_dev *l)
+{
+    return (++l)->valid ? l : NULL;
 }
 
 static void msi_set_enable(struct pt_dev *ptdev, int en)
@@ -881,27 +1024,21 @@ found:
 /* Insert a new pass-through device into function 0 of a specific pci slot.
  * input  dom:bus:dev.func@slot
  */
-int insert_bdf_to_php_devfn(char *bdf_slt)
+int insert_bdf_to_php_devfn(const char *bdf_slt)
 {
-    int slot;
-    struct php_dev bdf;
-    char *bdf_str, *slt_str, *opt;
-    const char *delim="@";
-
-    bdf_str = strsep(&bdf_slt, delim);
-    slt_str = bdf_slt;
-    slot = token_value(slt_str);
+    struct php_dev *bdf;
+    int status;
 
-    bdf = next_bdf(&bdf_str);
-    if (!bdf.valid)
+    bdf = parse_bdf(bdf_slt);
+    if (!bdf)
     {
         return -1;
     }
 
-    bdf.v_devfn = PCI_DEVFN(slot, 0);
-
-    return insert_to_php_devfn(&bdf);
+    status = insert_to_php_devfn(bdf);
+    free(bdf);
 
+    return status;
 }
 
 /* Test if a pci devfn has a device
@@ -924,30 +1061,32 @@ int test_php_devfn(int devfn)
 }
 
 /* find the pci slot for pass-through dev with specified BDF */
-int bdf_to_php_devfn(char *bdf_str)
+int bdf_to_php_devfn(const char *bdf_str)
 {
-    int i;
-    struct php_dev bdf;
+    int i, status = -1;
+    struct php_dev *bdf;
 
-    bdf = next_bdf(&bdf_str);
-    if (!bdf.valid)
+    bdf = parse_bdf(bdf_str);
+    if (!bdf)
     {
         return -1;
     }
-    PT_LOG("%s: bdf: %04x:%02x.%x\n", __func__, bdf.r_bus, bdf.r_dev,
-           bdf.r_func);
+    PT_LOG("%s: bdf: %04x:%02x.%x\n", __func__, bdf->r_bus, bdf->r_dev,
+           bdf->r_func);
 
     /* locate the virtual pci slot for this VTd device */
     for ( i = 0; i < PHP_DEVFN_LEN; i++ )
     {
         if ( dpci_infos.php_devs[i].valid &&
-             cmp_php_dev(&dpci_infos.php_devs[i], &bdf) )
+             cmp_php_dev(&dpci_infos.php_devs[i], bdf) )
         {
-            return PHP_TO_PCI_DEVFN(i);
+            status = PHP_TO_PCI_DEVFN(i);
+            break;
         }
     }
 
-    return -1;
+    free(bdf);
+    return status;
 }
 
 /* Being called each time a mmio region has been updated */
@@ -3132,8 +3271,6 @@ static struct pt_dev * register_real_dev
     uint8_t e_device, e_intx;
     struct pci_config_cf8 machine_bdf;
     int free_devfn = -1;
-    char *key, *val;
-    int msi_translate;
 
     PT_LOG("Assigning real physical device %02x:%02x.%x ...\n",
         bdf->r_bus, bdf->r_dev, bdf->r_func);
@@ -3159,35 +3296,6 @@ static struct pt_dev * register_real_dev
         PT_LOG("Error: no free virtual PCI hot plug slot, "
                "thus no live migration.\n");
 
-    msi_translate = direct_pci_msitranslate;
-    while (bdf->opt) {
-        if (get_next_keyval(&bdf->opt, &key, &val)) {
-            PT_LOG("Error: unrecognized PCI assignment option \"%s\"\n",
-                   bdf->opt);
-            break;
-        }
-
-        if (strcmp(key, "msitranslate") == 0)
-        {
-            if (strcmp(val, "0") == 0 || strcmp(val, "no") == 0)
-            {
-                PT_LOG("Disable MSI translation via per device option\n");
-                msi_translate = 0;
-            }
-            else if (strcmp(val, "1") == 0 || strcmp(val, "yes") == 0)
-            {
-                PT_LOG("Enable MSI translation via per device option\n");
-                msi_translate = 1;
-            }
-            else
-                PT_LOG("Error: unrecognized value for msitranslate=\n");
-        }
-        else
-            PT_LOG("Error: unrecognized PCI assignment option \"%s=%s\"\n", key, val);
-
-    }
-
-
     /* Register device */
     assigned_device = (struct pt_dev *) pci_register_device(e_bus, e_dev_name,
                                 sizeof(struct pt_dev), free_devfn,
@@ -3203,7 +3311,10 @@ static struct pt_dev * register_real_dev
                 assigned_device;
 
     assigned_device->pci_dev = pci_dev;
-    assigned_device->msi_trans_cap = msi_translate;
+    if (bdf->opt & PHP_DEV_OPT_MSITRANSLATE)
+    {
+        assigned_device->msi_trans_cap = 1;
+    }
 
     /* Assign device */
     machine_bdf.reg = 0;
@@ -3384,9 +3495,6 @@ int power_on_php_devfn(int php_devfn)
     pt_dev = register_real_device(dpci_infos.e_bus, "DIRECT PCI",
                                   php_dev, PT_MACHINE_IRQ_AUTO,
                                   dpci_infos.pci_access);
-
-    php_dev->opt = NULL;
-
     php_dev->pt_dev = pt_dev;
 
     return 0;
@@ -3405,9 +3513,7 @@ int pt_init(PCIBus *e_bus, const char *d
     struct pci_access *pci_access;
     char *vslots;
     char slot_str[8];
-    char *direct_pci_head = NULL;
-    char *direct_pci_p = NULL;
-    struct php_dev bdf;
+    struct php_dev *list, *bdf;
 
     /* Initialize libpci */
     pci_access = pci_alloc();
@@ -3427,29 +3533,21 @@ int pt_init(PCIBus *e_bus, const char *d
         return 0;
     }
 
-    if ( !(direct_pci_head = direct_pci_p = strdup(direct_pci)) )
-        return 0;
-
     /* the virtual pci slots of all pass-through devs
      * with hex format: xx;xx...;
      */
     vslots = qemu_mallocz ( strlen(direct_pci) / 3 );
 
     /* Assign given devices to guest */
-    while (1)
+    for (bdf = list = parse_bdf(direct_pci); bdf; bdf = next_bdf(bdf))
     {
-        bdf = next_bdf(&direct_pci_p);
-        if (!bdf.valid)
-        {
-            break;
-        }
         /* Register real device with the emulated bus */
         pt_dev = register_real_device(e_bus, "DIRECT PCI",
-                                      &bdf, PT_MACHINE_IRQ_AUTO, pci_access);
+                                      bdf, PT_MACHINE_IRQ_AUTO, pci_access);
         if ( pt_dev == NULL )
         {
             PT_LOG("Error: Registration failed (%02x:%02x.%x)\n",
-                   bdf.r_bus, bdf.r_dev, bdf.r_func);
+                   bdf->r_bus, bdf->r_dev, bdf->r_func);
             goto err;
         }
 
@@ -3458,13 +3556,15 @@ int pt_init(PCIBus *e_bus, const char *d
     }
 
     /* Write virtual slots info to xenstore for Control panel use */
-    xenstore_write_vslots(vslots);
+    if (*vslots)
+    {
+        xenstore_write_vslots(vslots);
+    }
 
     status = 0;
 err:
     qemu_free(vslots);
-    free(direct_pci_head);
-
+    free(list);
     return status;
 }
 
Index: ioemu-remote/hw/pci.h
===================================================================
--- ioemu-remote.orig/hw/pci.h	2009-02-17 17:55:37.000000000 +0900
+++ ioemu-remote/hw/pci.h	2009-02-17 17:57:05.000000000 +0900
@@ -115,17 +115,17 @@ PCIBus *pci_bridge_init(PCIBus *bus, int
 #define PCI_DEVFN_IS_PHP(x)   ((x) >= PHP_DEVFN_START && (x) < PHP_DEVFN_END)
 #define PHP_DEVFN_IS_VALID(x) (PCI_DEVFN_IS_PHP(PHP_TO_PCI_DEVFN(x)))
 
-int insert_bdf_to_php_devfn(char *bfd_slt);
+int insert_bdf_to_php_devfn(const char *bfd_slt);
 int test_php_devfn(int devfn);
-int bdf_to_php_devfn(char *bfd_str);
+int bdf_to_php_devfn(const char *bfd_str);
 int power_on_php_devfn(int php_devfn);
 int power_off_php_devfn(int php_devfn);
 
 /* pci_emulation.c */
 #include "hw/pci_emulation.h"
  
-void do_pci_add(char *devname);
-void do_pci_del(char *devname);
+void do_pci_add(const char *devname);
+void do_pci_del(const char *devname);
 
 /* lsi53c895a.c */
 #define LSI_MAX_DEVS 7
Index: ioemu-remote/vl.c
===================================================================
--- ioemu-remote.orig/vl.c	2009-02-17 17:55:37.000000000 +0900
+++ ioemu-remote/vl.c	2009-02-17 17:57:05.000000000 +0900
@@ -3898,7 +3898,7 @@ void qemu_chr_close(CharDriverState *chr
 }
 
 #ifdef CONFIG_PASSTHROUGH
-void do_pci_del(char *devname)
+void do_pci_del(const char *devname)
 {
     int devfn;
     devfn = bdf_to_php_devfn(devname);
@@ -3906,7 +3906,7 @@ void do_pci_del(char *devname)
     acpi_php_del(devfn);
 }
 
-void do_pci_add(char *devname)
+void do_pci_add(const char *devname)
 {
     int devfn;
 

-- 

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [rfc 17/18] ioemu: sort pass-through PCI devices before inserting
  2009-02-17  9:07 [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough Simon Horman
                   ` (15 preceding siblings ...)
  2009-02-17  9:08 ` [rfc 16/18] ioemu: non-destructive parsing of PCI assignement strings Simon Horman
@ 2009-02-17  9:08 ` Simon Horman
  2009-02-17  9:08 ` [rfc 18/18] ioemu: Allow virtual function to be speficied for PCI pass-through Simon Horman
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: Simon Horman @ 2009-02-17  9:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

[-- Attachment #1: regex-qsort.patch --]
[-- Type: text/plain, Size: 1225 bytes --]

When passing though a milti-function device it is important that function
zero is inserted first as the multifunction bit in its PCI_HEADER_TYPE
needs to be chacked to verify that the device is multi-function and enable
probing of non-zero functions in (Linux and probably other) guests.

Signed-off-by: Simon Horman <horms@verge.net.au>

Index: ioemu-remote/hw/pass-through.c
===================================================================
--- ioemu-remote.orig/hw/pass-through.c	2009-02-17 17:51:30.000000000 +0900
+++ ioemu-remote/hw/pass-through.c	2009-02-17 17:52:08.000000000 +0900
@@ -751,6 +751,11 @@ static int cmp_php_dev(const struct php_
     return 0;
 }
 
+static int cmp_php_dev_for_qsort(const void *a, const void *b)
+{
+    return cmp_php_dev(a, b);
+}
+
 static void cpy_php_dev(struct php_dev *dst, const struct php_dev *src)
 {
     memcpy(dst, src, sizeof(*dst));
@@ -937,6 +942,7 @@ static struct php_dev *parse_bdf(const c
     }
 
     regfree(&preg);
+    qsort(list, nmemb, sizeof(*list), cmp_php_dev_for_qsort);
     return list;
 
 err:

-- 

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [rfc 18/18] ioemu: Allow virtual function to be speficied for PCI pass-through
  2009-02-17  9:07 [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough Simon Horman
                   ` (16 preceding siblings ...)
  2009-02-17  9:08 ` [rfc 17/18] ioemu: sort pass-through PCI devices before inserting Simon Horman
@ 2009-02-17  9:08 ` Simon Horman
  2009-02-17 12:03 ` [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough Ian Jackson
  2009-02-18  3:12 ` Yuji Shimada
  19 siblings, 0 replies; 62+ messages in thread
From: Simon Horman @ 2009-02-17  9:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

[-- Attachment #1: regex-v_devfn.patch --]
[-- Type: text/plain, Size: 3216 bytes --]

Signed-off-by: Simon Horman <horms@verge.net.au>

--- 
Fri, 13 Feb 2009 13:45:22 +1100
* Fix parsing of vslot - the regex was incorrect
* Allow 0x to prefix hex values that are parsed

Index: ioemu-remote/hw/pass-through.c
===================================================================
--- ioemu-remote.orig/hw/pass-through.c	2009-02-17 17:52:08.000000000 +0900
+++ ioemu-remote/hw/pass-through.c	2009-02-17 17:52:13.000000000 +0900
@@ -866,14 +866,15 @@ err:
 #define RE_BDF_OPT RE_BDF RE_OPT "\\?"
 
 #define RE_VDEV  "\\(0x\\)\\?\\([01]\\?[0-9a-fA-F]\\)"
-#define RE_BDF_OPT_SLOT RE_BDF_OPT "\\(@" RE_VDEV "\\)\\?"
+#define RE_VDEVFN RE_VDEV "\\(\\." RE_FUNC "\\)\\?"
+#define RE_BDF_OPT_SLOT RE_BDF_OPT "\\(@" RE_VDEVFN "\\)\\?"
 
 static struct php_dev *parse_bdf(const char *str)
 {
     struct php_dev *list = NULL, *e;
     regex_t preg;
-    regmatch_t pmatch[14];
-    int err, nmemb = 0;
+    regmatch_t pmatch[17];
+    int err, nmemb = 0, v_func = 0;
     char *err_str;
 
     if ((err = regcomp(&preg, RE_BDF_OPT_SLOT, 0)))
@@ -891,7 +892,7 @@ static struct php_dev *parse_bdf(const c
 
     while (1)
     {
-        if ((err = regexec(&preg, str, 14, pmatch, 0)))
+        if ((err = regexec(&preg, str, 17, pmatch, 0)))
         {
             PT_LOG("Error: invalid PCI assignment \"%s\"\n", str);
             goto err;
@@ -917,7 +918,12 @@ static struct php_dev *parse_bdf(const c
 
         if (pmatch[12].rm_so >= 0)
         {
-            e->v_devfn = PCI_DEVFN(strtol(str + pmatch[12].rm_so, NULL, 16), 0);
+            if (pmatch[15].rm_so >= 0)
+            {
+                v_func = strtol(str + pmatch[15].rm_so, NULL, 16);
+            }
+            e->v_devfn = PCI_DEVFN(strtol(str + pmatch[12].rm_so, NULL, 16),
+                                   v_func);
         }
 
         e->valid = 1;
Index: ioemu-remote/xenstore.c
===================================================================
--- ioemu-remote.orig/xenstore.c	2009-02-17 17:28:06.000000000 +0900
+++ ioemu-remote/xenstore.c	2009-02-17 17:52:13.000000000 +0900
@@ -582,6 +582,24 @@ void xenstore_parse_domain_config(int hv
 
             strcat(direct_pci_str, dev);
 
+            if (pasprintf(&buf, "/local/domain/0/backend/pci/%u/%u/vdevfn-%d",
+                          hvm_domid, pci_devid, i) != -1) {
+                free(dev);
+                dev = xs_read(xsh, XBT_NULL, buf, &len);
+            }
+            if ( dev ) {
+                if (strlen(dev) + strlen(direct_pci_str) >
+                    DIRECT_PCI_STR_LEN - 2) {
+                    fprintf(stderr, "qemu: too many pci pass-through "
+                            "devices\n");
+                    memset(direct_pci_str, 0, DIRECT_PCI_STR_LEN);
+                    goto out;
+                }
+                strcat(direct_pci_str, "@");
+                strcat(direct_pci_str, dev);
+            }
+
+
             if (pasprintf(&buf, "/local/domain/0/backend/pci/%u/%u/opts-%d",
                           hvm_domid, pci_devid, i) != -1) {
                 free(dev);

-- 

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-02-17  9:07 [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough Simon Horman
                   ` (17 preceding siblings ...)
  2009-02-17  9:08 ` [rfc 18/18] ioemu: Allow virtual function to be speficied for PCI pass-through Simon Horman
@ 2009-02-17 12:03 ` Ian Jackson
  2009-02-17 22:24   ` Simon Horman
  2009-02-18  3:12 ` Yuji Shimada
  19 siblings, 1 reply; 62+ messages in thread
From: Ian Jackson @ 2009-02-17 12:03 UTC (permalink / raw)
  To: Simon Horman; +Cc: xen-devel

Simon Horman writes ("[rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough"):
> This series starts of with servaral cleanup patches.

Many of these patches are cosmetic or cleanup changes to code which is
identical in upstream qemu and in qemu-xen-unstable.  These are not
really things which I want to commit directly into the Xen tree; we
try to minimise the differences between the Xen qemu and upstream.

If you would like to clean up the upstream qemu code then by all means
submit your changes to the qemu-devel mailing list as patches against
upstream qemu.  I'd be happy to give you some pointers (feel free to
email me privately) and engage with you on the qemu-devel list.  If
your changes are accepted by upstream we'll get them in due course,
when I merge.

If you do consider submitting upstream you might consider lumping your
changes together rather more, so that a single patch covers each set
of similar changes to different files appears.

> The meat of the changes start with the patch
> "ioemu: use devfn instead of slots as the unit for passthrough"

I'd be interested to hear the opinions on this proposal of other
developers of the passthrough support.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-02-17 12:03 ` [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough Ian Jackson
@ 2009-02-17 22:24   ` Simon Horman
  0 siblings, 0 replies; 62+ messages in thread
From: Simon Horman @ 2009-02-17 22:24 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel

On Tue, Feb 17, 2009 at 12:03:28PM +0000, Ian Jackson wrote:
> Simon Horman writes ("[rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough"):
> > This series starts of with servaral cleanup patches.
> 
> Many of these patches are cosmetic or cleanup changes to code which is
> identical in upstream qemu and in qemu-xen-unstable.  These are not
> really things which I want to commit directly into the Xen tree; we
> try to minimise the differences between the Xen qemu and upstream.
> 
> If you would like to clean up the upstream qemu code then by all means
> submit your changes to the qemu-devel mailing list as patches against
> upstream qemu.  I'd be happy to give you some pointers (feel free to
> email me privately) and engage with you on the qemu-devel list.  If
> your changes are accepted by upstream we'll get them in due course,
> when I merge.
> 
> If you do consider submitting upstream you might consider lumping your
> changes together rather more, so that a single patch covers each set
> of similar changes to different files appears.

Thanks for the clarification, I will contact you off-list.

> > The meat of the changes start with the patch
> > "ioemu: use devfn instead of slots as the unit for passthrough"
> 
> I'd be interested to hear the opinions on this proposal of other
> developers of the passthrough support.

Thanks. These later patches are still somewhat work in progress,
so I am very interested on hearing people's ideas on what they
should look like.

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-02-17  9:07 [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough Simon Horman
                   ` (18 preceding siblings ...)
  2009-02-17 12:03 ` [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough Ian Jackson
@ 2009-02-18  3:12 ` Yuji Shimada
  2009-02-19  6:15   ` Simon Horman
  19 siblings, 1 reply; 62+ messages in thread
From: Yuji Shimada @ 2009-02-18  3:12 UTC (permalink / raw)
  To: Simon Horman; +Cc: xen-devel, Ian Jackson

On Tue, 17 Feb 2009 20:07:48 +1100
Simon Horman <horms@verge.net.au> wrote:

> This series starts of with servaral cleanup patches.
> 
> The meat of the changes start with the patch
> "ioemu: use devfn instead of slots as the unit for passthrough"
> 
> This allows multi-function cards to be appear in guets as
> multi-function cards, with the restriction that function 0 must
> be passed through. Otherwise each function is allocated its own
> slot, as before.

How do you guarantee that virtual gsi is not shared between
pass-throughed devices? Xen hypervisor does not support sharing
virtual gsi between pass-throughed devices.

The function 0 always uses INTA. The function 1-7 might use INTB-INTD.
INTB-INTD will route to the same virtual gsi with other device's
INTA-INTD.

Current interrupt routing in xen hypervisor is defined as follows.

xen/include/asm-x86/hvm/irq.h
#define hvm_pci_intx_gsi(dev, intx)  \
    (((((dev)<<2) + ((dev)>>3) + (intx)) & 31) + 16)

Thanks,
--
Yuji Shimada

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-02-18  3:12 ` Yuji Shimada
@ 2009-02-19  6:15   ` Simon Horman
  2009-02-19  9:21     ` Yuji Shimada
  0 siblings, 1 reply; 62+ messages in thread
From: Simon Horman @ 2009-02-19  6:15 UTC (permalink / raw)
  To: Yuji Shimada; +Cc: xen-devel, Ian Jackson

On Wed, Feb 18, 2009 at 12:12:27PM +0900, Yuji Shimada wrote:
> On Tue, 17 Feb 2009 20:07:48 +1100
> Simon Horman <horms@verge.net.au> wrote:
> 
> > This series starts of with servaral cleanup patches.
> > 
> > The meat of the changes start with the patch
> > "ioemu: use devfn instead of slots as the unit for passthrough"
> > 
> > This allows multi-function cards to be appear in guets as
> > multi-function cards, with the restriction that function 0 must
> > be passed through. Otherwise each function is allocated its own
> > slot, as before.
> 
> How do you guarantee that virtual gsi is not shared between
> pass-throughed devices? Xen hypervisor does not support sharing
> virtual gsi between pass-throughed devices.
> 
> The function 0 always uses INTA. The function 1-7 might use INTB-INTD.
> INTB-INTD will route to the same virtual gsi with other device's
> INTA-INTD.
> 
> Current interrupt routing in xen hypervisor is defined as follows.
> 
> xen/include/asm-x86/hvm/irq.h
> #define hvm_pci_intx_gsi(dev, intx)  \
>     (((((dev)<<2) + ((dev)>>3) + (intx)) & 31) + 16)

To be honest I am a little confused about what the above maping
is supposed to achive.

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-02-19  6:15   ` Simon Horman
@ 2009-02-19  9:21     ` Yuji Shimada
  2009-02-19  9:38       ` Keir Fraser
  0 siblings, 1 reply; 62+ messages in thread
From: Yuji Shimada @ 2009-02-19  9:21 UTC (permalink / raw)
  To: Simon Horman; +Cc: xen-devel, Ian Jackson

[-- Attachment #1: Type: text/plain, Size: 1530 bytes --]

On Thu, 19 Feb 2009 17:15:49 +1100
Simon Horman <horms@verge.net.au> wrote:

> On Wed, Feb 18, 2009 at 12:12:27PM +0900, Yuji Shimada wrote:
> > On Tue, 17 Feb 2009 20:07:48 +1100
> > Simon Horman <horms@verge.net.au> wrote:
> > 
> > > This series starts of with servaral cleanup patches.
> > > 
> > > The meat of the changes start with the patch
> > > "ioemu: use devfn instead of slots as the unit for passthrough"
> > > 
> > > This allows multi-function cards to be appear in guets as
> > > multi-function cards, with the restriction that function 0 must
> > > be passed through. Otherwise each function is allocated its own
> > > slot, as before.
> > 
> > How do you guarantee that virtual gsi is not shared between
> > pass-throughed devices? Xen hypervisor does not support sharing
> > virtual gsi between pass-throughed devices.
> > 
> > The function 0 always uses INTA. The function 1-7 might use INTB-INTD.
> > INTB-INTD will route to the same virtual gsi with other device's
> > INTA-INTD.
> > 
> > Current interrupt routing in xen hypervisor is defined as follows.
> > 
> > xen/include/asm-x86/hvm/irq.h
> > #define hvm_pci_intx_gsi(dev, intx)  \
> >     (((((dev)<<2) + ((dev)>>3) + (intx)) & 31) + 16)
> 
> To be honest I am a little confused about what the above maping
> is supposed to achive.

Please find the attached figure which shows the interrupt routing in
xen hypervisor.

When we use only INTA, virtual GSIs are not shared.
But when we use INTB-INTD, virtual GSIs might be shared.

Thanks,
--
Yuji Shimada

[-- Attachment #2: interrupt_routing_in_hypervisor.png --]
[-- Type: image/png, Size: 42675 bytes --]

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-02-19  9:21     ` Yuji Shimada
@ 2009-02-19  9:38       ` Keir Fraser
  2009-02-20  7:07         ` Simon Horman
  0 siblings, 1 reply; 62+ messages in thread
From: Keir Fraser @ 2009-02-19  9:38 UTC (permalink / raw)
  To: Yuji Shimada, Simon Horman; +Cc: xen-devel, Ian Jackson

On 19/02/2009 09:21, "Yuji Shimada" <shimada-yxb@necst.nec.co.jp> wrote:

>> To be honest I am a little confused about what the above maping
>> is supposed to achive.
> 
> Please find the attached figure which shows the interrupt routing in
> xen hypervisor.

The point being to deliberately permute the mapping to try to avoid
accidental GSI sharing even if there are patterns in DEV:INTX usage (e.g.,
all devs use INTA).

 -- Keir

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-02-19  9:38       ` Keir Fraser
@ 2009-02-20  7:07         ` Simon Horman
  2009-02-23  6:24           ` Yuji Shimada
  0 siblings, 1 reply; 62+ messages in thread
From: Simon Horman @ 2009-02-20  7:07 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Yuji Shimada, xen-devel, Ian Jackson

On Thu, Feb 19, 2009 at 09:38:24AM +0000, Keir Fraser wrote:
> On 19/02/2009 09:21, "Yuji Shimada" <shimada-yxb@necst.nec.co.jp> wrote:
> 
> >> To be honest I am a little confused about what the above maping
> >> is supposed to achive.
> > 
> > Please find the attached figure which shows the interrupt routing in
> > xen hypervisor.
> 
> The point being to deliberately permute the mapping to try to avoid
> accidental GSI sharing even if there are patterns in DEV:INTX usage (e.g.,
> all devs use INTA).

Thanks for the information, especially the diagram. It is very useful.

Armed with this new kowledge I have a few questions.

1. Shimada-san stated that shared GSI are not permitted for
   pass-through devices. Is it permitted for a GSI to be shared
   between a pass-through device and a non-pass-through device?
   The current scheme seems to leave scope for this as

   gsi 6 A = gsi 13 D = gsi 21 C = gsi 29 B
   gsi 7 A = gsi 14 D = gsi 22 C = gsi 30 B

2. In several places in ioemu:io/passthrough.c e_intx is set to 0,
   corresponding to INTA. Is this because it is virtual and
   using INTA is convenient? Or is it because it is assumed
   that the physical device being passed-through is a 0 function
   (and 0 functions always use INTA) ?

   The latter assumption is not valid because even without my pacthes
   it is possible to pass-through non-0 functions, its just that
   they end up as the 0th function of the virtual slot in the guest.


I am now pretty sure that my change leads to incorrect usage of
hvm_pci_intx_gsi(). Answers to the questions above will help me to
understand how trivial to fix this is.

The most difficult cases seem to be 1) sharing of gsi between
pass-through and non-pass-through devices is not permitted or 2)
intx used inside ioemu:io/passthrough.c should reflect the physical
intx. In either case I wonder if a reasonable solution would be to
just allocate allocate GSI in a non-colliding manner. Say, GSI 16 for
the first device to ask, 17 for the next one and so on. Or perhaps
the existing hash + overflow to the next GSI on collision.

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-02-20  7:07         ` Simon Horman
@ 2009-02-23  6:24           ` Yuji Shimada
  2009-02-23  6:55             ` Simon Horman
  0 siblings, 1 reply; 62+ messages in thread
From: Yuji Shimada @ 2009-02-23  6:24 UTC (permalink / raw)
  To: Simon Horman; +Cc: xen-devel, Ian Jackson, Keir Fraser

On Fri, 20 Feb 2009 18:07:00 +1100
Simon Horman <horms@verge.net.au> wrote:

> On Thu, Feb 19, 2009 at 09:38:24AM +0000, Keir Fraser wrote:
> > On 19/02/2009 09:21, "Yuji Shimada" <shimada-yxb@necst.nec.co.jp> wrote:
> > 
> > >> To be honest I am a little confused about what the above maping
> > >> is supposed to achive.
> > > 
> > > Please find the attached figure which shows the interrupt routing in
> > > xen hypervisor.
> > 
> > The point being to deliberately permute the mapping to try to avoid
> > accidental GSI sharing even if there are patterns in DEV:INTX usage (e.g.,
> > all devs use INTA).
> 
> Thanks for the information, especially the diagram. It is very useful.
> 
> Armed with this new kowledge I have a few questions.
> 
> 1. Shimada-san stated that shared GSI are not permitted for
>    pass-through devices. Is it permitted for a GSI to be shared
>    between a pass-through device and a non-pass-through device?

Yes, it is permitted. But guest software will receive spurious
interrupt. So it is not good.

>    The current scheme seems to leave scope for this as
> 
>    gsi 6 A = gsi 13 D = gsi 21 C = gsi 29 B
>    gsi 7 A = gsi 14 D = gsi 22 C = gsi 30 B

Do you mean this?

     Dev 6 INTA = Dev 13 INTD = Dev 21 INTC = Dev 29 INTB -> GSI 40
     Dev 7 INTA = Dev 14 INTD = Dev 22 INTC = Dev 30 INTB -> GSI 44

> 2. In several places in ioemu:io/passthrough.c e_intx is set to 0,
>    corresponding to INTA. Is this because it is virtual and
>    using INTA is convenient? Or is it because it is assumed
>    that the physical device being passed-through is a 0 function
>    (and 0 functions always use INTA) ?

INTx is virtualized, because the single function device normally use
INTA.

When we make multi-function cards appear in guests as multi-function
cards, it is good that virtual INTx reflects the physical INTx. The
reason is one of functions of a device may share INTx of the other
function. In my environment, UHCI(00:1d.0) and EHCI(00:1d.7) share the
same INTA. If physical functions share physical INTx, virtual
functions should share virtual INTx. To achieve this, virtual INTx
needs to reflect the physical INTx.

>    The latter assumption is not valid because even without my pacthes
>    it is possible to pass-through non-0 functions, its just that
>    they end up as the 0th function of the virtual slot in the guest.
> 
> I am now pretty sure that my change leads to incorrect usage of
> hvm_pci_intx_gsi(). Answers to the questions above will help me to
> understand how trivial to fix this is.
> 
> The most difficult cases seem to be 1) sharing of gsi between
> pass-through and non-pass-through devices is not permitted or 2)
> intx used inside ioemu:io/passthrough.c should reflect the physical
> intx. In either case I wonder if a reasonable solution would be to
> just allocate allocate GSI in a non-colliding manner. Say, GSI 16 for
> the first device to ask, 17 for the next one and so on. Or perhaps
> the existing hash + overflow to the next GSI on collision.

The another solution is expanding GSI to 127. I don't sure it is
possible, but sharing virtual GSI will not occur.

Thanks,
--
Yuji Shimada

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-02-23  6:24           ` Yuji Shimada
@ 2009-02-23  6:55             ` Simon Horman
  2009-02-23  8:39               ` Yuji Shimada
  2009-02-23 11:31               ` Keir Fraser
  0 siblings, 2 replies; 62+ messages in thread
From: Simon Horman @ 2009-02-23  6:55 UTC (permalink / raw)
  To: Yuji Shimada; +Cc: xen-devel, Ian Jackson, Keir Fraser

On Mon, Feb 23, 2009 at 03:24:41PM +0900, Yuji Shimada wrote:
> On Fri, 20 Feb 2009 18:07:00 +1100
> Simon Horman <horms@verge.net.au> wrote:
> 
> > On Thu, Feb 19, 2009 at 09:38:24AM +0000, Keir Fraser wrote:
> > > On 19/02/2009 09:21, "Yuji Shimada" <shimada-yxb@necst.nec.co.jp> wrote:
> > > 
> > > >> To be honest I am a little confused about what the above maping
> > > >> is supposed to achive.
> > > > 
> > > > Please find the attached figure which shows the interrupt routing in
> > > > xen hypervisor.
> > > 
> > > The point being to deliberately permute the mapping to try to avoid
> > > accidental GSI sharing even if there are patterns in DEV:INTX usage (e.g.,
> > > all devs use INTA).
> > 
> > Thanks for the information, especially the diagram. It is very useful.
> > 
> > Armed with this new kowledge I have a few questions.
> > 
> > 1. Shimada-san stated that shared GSI are not permitted for
> >    pass-through devices. Is it permitted for a GSI to be shared
> >    between a pass-through device and a non-pass-through device?
> 
> Yes, it is permitted. But guest software will receive spurious
> interrupt. So it is not good.

Ok, so it would be good to avoid if possible.

> >    The current scheme seems to leave scope for this as
> > 
> >    gsi 6 A = gsi 13 D = gsi 21 C = gsi 29 B
> >    gsi 7 A = gsi 14 D = gsi 22 C = gsi 30 B
> 
> Do you mean this?
> 
>      Dev 6 INTA = Dev 13 INTD = Dev 21 INTC = Dev 29 INTB -> GSI 40
>      Dev 7 INTA = Dev 14 INTD = Dev 22 INTC = Dev 30 INTB -> GSI 44

Yes, that is what I meant.

> > 2. In several places in ioemu:io/passthrough.c e_intx is set to 0,
> >    corresponding to INTA. Is this because it is virtual and
> >    using INTA is convenient? Or is it because it is assumed
> >    that the physical device being passed-through is a 0 function
> >    (and 0 functions always use INTA) ?
> 
> INTx is virtualized, because the single function device normally use
> INTA.

Suppose the case where 00:1d.0 has INTA and 00:1d.1 has INTB,
and both these functions are passed-trhough into a guest
without any of my patches applied. In the guest 00:1d.0 will
appear as 00:06.0 with INTA. And 00:1d.1 will apepar as
00:06.1 with INTA. Is this ok?

> When we make multi-function cards appear in guests as multi-function
> cards, it is good that virtual INTx reflects the physical INTx. The
> reason is one of functions of a device may share INTx of the other
> function. In my environment, UHCI(00:1d.0) and EHCI(00:1d.7) share the
> same INTA. If physical functions share physical INTx, virtual
> functions should share virtual INTx. To achieve this, virtual INTx
> needs to reflect the physical INTx.

Understood. The issue of assigning GSIs aside, this should
be fairly straightforward.

> >    The latter assumption is not valid because even without my pacthes
> >    it is possible to pass-through non-0 functions, its just that
> >    they end up as the 0th function of the virtual slot in the guest.
> > 
> > I am now pretty sure that my change leads to incorrect usage of
> > hvm_pci_intx_gsi(). Answers to the questions above will help me to
> > understand how trivial to fix this is.
> > 
> > The most difficult cases seem to be 1) sharing of gsi between
> > pass-through and non-pass-through devices is not permitted or 2)
> > intx used inside ioemu:io/passthrough.c should reflect the physical
> > intx. In either case I wonder if a reasonable solution would be to
> > just allocate allocate GSI in a non-colliding manner. Say, GSI 16 for
> > the first device to ask, 17 for the next one and so on. Or perhaps
> > the existing hash + overflow to the next GSI on collision.
> 
> The another solution is expanding GSI to 127. I don't sure it is
> possible, but sharing virtual GSI will not occur.

That thought crossed my mind too, I will investigate further.
But I think that ideally it would need to be expanded to 143
as the first 16 GSI are currently reserved for ISA.

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-02-23  6:55             ` Simon Horman
@ 2009-02-23  8:39               ` Yuji Shimada
  2009-02-23  9:33                 ` Simon Horman
  2009-02-23 11:31               ` Keir Fraser
  1 sibling, 1 reply; 62+ messages in thread
From: Yuji Shimada @ 2009-02-23  8:39 UTC (permalink / raw)
  To: Simon Horman; +Cc: xen-devel, Ian Jackson, Keir Fraser

On Mon, 23 Feb 2009 17:55:30 +1100
Simon Horman <horms@verge.net.au> wrote:

> On Mon, Feb 23, 2009 at 03:24:41PM +0900, Yuji Shimada wrote:
> > On Fri, 20 Feb 2009 18:07:00 +1100
> > Simon Horman <horms@verge.net.au> wrote:
> > 
> > > On Thu, Feb 19, 2009 at 09:38:24AM +0000, Keir Fraser wrote:
> > > > On 19/02/2009 09:21, "Yuji Shimada" <shimada-yxb@necst.nec.co.jp> wrote:
> > > > 
> > > > >> To be honest I am a little confused about what the above maping
> > > > >> is supposed to achive.
> > > > > 
> > > > > Please find the attached figure which shows the interrupt routing in
> > > > > xen hypervisor.
> > > > 
> > > > The point being to deliberately permute the mapping to try to avoid
> > > > accidental GSI sharing even if there are patterns in DEV:INTX usage (e.g.,
> > > > all devs use INTA).
> > > 
> > > Thanks for the information, especially the diagram. It is very useful.
> > > 
> > > Armed with this new kowledge I have a few questions.
> > > 
> > > 1. Shimada-san stated that shared GSI are not permitted for
> > >    pass-through devices. Is it permitted for a GSI to be shared
> > >    between a pass-through device and a non-pass-through device?
> > 
> > Yes, it is permitted. But guest software will receive spurious
> > interrupt. So it is not good.
> 
> Ok, so it would be good to avoid if possible.
> 
> > >    The current scheme seems to leave scope for this as
> > > 
> > >    gsi 6 A = gsi 13 D = gsi 21 C = gsi 29 B
> > >    gsi 7 A = gsi 14 D = gsi 22 C = gsi 30 B
> > 
> > Do you mean this?
> > 
> >      Dev 6 INTA = Dev 13 INTD = Dev 21 INTC = Dev 29 INTB -> GSI 40
> >      Dev 7 INTA = Dev 14 INTD = Dev 22 INTC = Dev 30 INTB -> GSI 44
> 
> Yes, that is what I meant.
> 
> > > 2. In several places in ioemu:io/passthrough.c e_intx is set to 0,
> > >    corresponding to INTA. Is this because it is virtual and
> > >    using INTA is convenient? Or is it because it is assumed
> > >    that the physical device being passed-through is a 0 function
> > >    (and 0 functions always use INTA) ?
> > 
> > INTx is virtualized, because the single function device normally use
> > INTA.
> 
> Suppose the case where 00:1d.0 has INTA and 00:1d.1 has INTB,
> and both these functions are passed-trhough into a guest
> without any of my patches applied. In the guest 00:1d.0 will
> appear as 00:06.0 with INTA. And 00:1d.1 will apepar as
> 00:06.1 with INTA. Is this ok?

00:1d.1 with INTB will appear as 00:07.0 with INTA, when we use
current xen.

> > When we make multi-function cards appear in guests as multi-function
> > cards, it is good that virtual INTx reflects the physical INTx. The
> > reason is one of functions of a device may share INTx of the other
> > function. In my environment, UHCI(00:1d.0) and EHCI(00:1d.7) share the
> > same INTA. If physical functions share physical INTx, virtual
> > functions should share virtual INTx. To achieve this, virtual INTx
> > needs to reflect the physical INTx.
> 
> Understood. The issue of assigning GSIs aside, this should
> be fairly straightforward.
> 
> > >    The latter assumption is not valid because even without my pacthes
> > >    it is possible to pass-through non-0 functions, its just that
> > >    they end up as the 0th function of the virtual slot in the guest.
> > > 
> > > I am now pretty sure that my change leads to incorrect usage of
> > > hvm_pci_intx_gsi(). Answers to the questions above will help me to
> > > understand how trivial to fix this is.
> > > 
> > > The most difficult cases seem to be 1) sharing of gsi between
> > > pass-through and non-pass-through devices is not permitted or 2)
> > > intx used inside ioemu:io/passthrough.c should reflect the physical
> > > intx. In either case I wonder if a reasonable solution would be to
> > > just allocate allocate GSI in a non-colliding manner. Say, GSI 16 for
> > > the first device to ask, 17 for the next one and so on. Or perhaps
> > > the existing hash + overflow to the next GSI on collision.
> > 
> > The another solution is expanding GSI to 127. I don't sure it is
> > possible, but sharing virtual GSI will not occur.
> 
> That thought crossed my mind too, I will investigate further.
> But I think that ideally it would need to be expanded to 143
> as the first 16 GSI are currently reserved for ISA.

That's exactly right.

Thanks,
--
Yuji Shimada

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-02-23  8:39               ` Yuji Shimada
@ 2009-02-23  9:33                 ` Simon Horman
  0 siblings, 0 replies; 62+ messages in thread
From: Simon Horman @ 2009-02-23  9:33 UTC (permalink / raw)
  To: Yuji Shimada; +Cc: xen-devel, Ian Jackson, Keir Fraser

On Mon, Feb 23, 2009 at 05:39:52PM +0900, Yuji Shimada wrote:
> On Mon, 23 Feb 2009 17:55:30 +1100
> Simon Horman <horms@verge.net.au> wrote:
> 
> > On Mon, Feb 23, 2009 at 03:24:41PM +0900, Yuji Shimada wrote:
> > > On Fri, 20 Feb 2009 18:07:00 +1100
> > > Simon Horman <horms@verge.net.au> wrote:
> > > 
> > > > On Thu, Feb 19, 2009 at 09:38:24AM +0000, Keir Fraser wrote:
> > > > > On 19/02/2009 09:21, "Yuji Shimada" <shimada-yxb@necst.nec.co.jp> wrote:
> > > > > 
> > > > > >> To be honest I am a little confused about what the above maping
> > > > > >> is supposed to achive.
> > > > > > 
> > > > > > Please find the attached figure which shows the interrupt routing in
> > > > > > xen hypervisor.
> > > > > 
> > > > > The point being to deliberately permute the mapping to try to avoid
> > > > > accidental GSI sharing even if there are patterns in DEV:INTX usage (e.g.,
> > > > > all devs use INTA).
> > > > 
> > > > Thanks for the information, especially the diagram. It is very useful.
> > > > 
> > > > Armed with this new kowledge I have a few questions.
> > > > 
> > > > 1. Shimada-san stated that shared GSI are not permitted for
> > > >    pass-through devices. Is it permitted for a GSI to be shared
> > > >    between a pass-through device and a non-pass-through device?
> > > 
> > > Yes, it is permitted. But guest software will receive spurious
> > > interrupt. So it is not good.
> > 
> > Ok, so it would be good to avoid if possible.
> > 
> > > >    The current scheme seems to leave scope for this as
> > > > 
> > > >    gsi 6 A = gsi 13 D = gsi 21 C = gsi 29 B
> > > >    gsi 7 A = gsi 14 D = gsi 22 C = gsi 30 B
> > > 
> > > Do you mean this?
> > > 
> > >      Dev 6 INTA = Dev 13 INTD = Dev 21 INTC = Dev 29 INTB -> GSI 40
> > >      Dev 7 INTA = Dev 14 INTD = Dev 22 INTC = Dev 30 INTB -> GSI 44
> > 
> > Yes, that is what I meant.
> > 
> > > > 2. In several places in ioemu:io/passthrough.c e_intx is set to 0,
> > > >    corresponding to INTA. Is this because it is virtual and
> > > >    using INTA is convenient? Or is it because it is assumed
> > > >    that the physical device being passed-through is a 0 function
> > > >    (and 0 functions always use INTA) ?
> > > 
> > > INTx is virtualized, because the single function device normally use
> > > INTA.
> > 
> > Suppose the case where 00:1d.0 has INTA and 00:1d.1 has INTB,
> > and both these functions are passed-trhough into a guest
> > without any of my patches applied. In the guest 00:1d.0 will
> > appear as 00:06.0 with INTA. And 00:1d.1 will apepar as
> > 00:06.1 with INTA. Is this ok?
> 
> 00:1d.1 with INTB will appear as 00:07.0 with INTA, when we use
> current xen.

Thanks, my understanding of the code seems to be correct :-)

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-02-23  6:55             ` Simon Horman
  2009-02-23  8:39               ` Yuji Shimada
@ 2009-02-23 11:31               ` Keir Fraser
  2009-02-23 22:18                 ` Simon Horman
  2009-02-24  1:29                 ` Ian Pratt
  1 sibling, 2 replies; 62+ messages in thread
From: Keir Fraser @ 2009-02-23 11:31 UTC (permalink / raw)
  To: Simon Horman, Yuji Shimada; +Cc: xen-devel, Ian Jackson

On 22/02/2009 22:55, "Simon Horman" <horms@verge.net.au> wrote:

>> The another solution is expanding GSI to 127. I don't sure it is
>> possible, but sharing virtual GSI will not occur.
> 
> That thought crossed my mind too, I will investigate further.
> But I think that ideally it would need to be expanded to 143
> as the first 16 GSI are currently reserved for ISA.

Can't xend or qemu do a search of PCI slot space to pick a virtual
devfn:intx that doesn't conflict (or that purposely does conflict, if there
are any cases where you want that)? 32 non-legacy GSIs should be enough to
avoid any aliasing if a bit of effort is put in to avoid it.

 -- Keir

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-02-23 11:31               ` Keir Fraser
@ 2009-02-23 22:18                 ` Simon Horman
  2009-03-02  4:14                   ` Simon Horman
  2009-02-24  1:29                 ` Ian Pratt
  1 sibling, 1 reply; 62+ messages in thread
From: Simon Horman @ 2009-02-23 22:18 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Yuji Shimada, xen-devel, Ian Jackson

On Mon, Feb 23, 2009 at 03:31:30AM -0800, Keir Fraser wrote:
> On 22/02/2009 22:55, "Simon Horman" <horms@verge.net.au> wrote:
> 
> >> The another solution is expanding GSI to 127. I don't sure it is
> >> possible, but sharing virtual GSI will not occur.
> > 
> > That thought crossed my mind too, I will investigate further.
> > But I think that ideally it would need to be expanded to 143
> > as the first 16 GSI are currently reserved for ISA.
> 
> Can't xend or qemu do a search of PCI slot space to pick a virtual
> devfn:intx that doesn't conflict (or that purposely does conflict, if there
> are any cases where you want that)? 32 non-legacy GSIs should be enough to
> avoid any aliasing if a bit of effort is put in to avoid it.

That was what I had in mind in my original proposal.
I'll try coding it up and see what it looks like.

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-02-23 11:31               ` Keir Fraser
  2009-02-23 22:18                 ` Simon Horman
@ 2009-02-24  1:29                 ` Ian Pratt
  2009-02-24  1:50                   ` Simon Horman
  1 sibling, 1 reply; 62+ messages in thread
From: Ian Pratt @ 2009-02-24  1:29 UTC (permalink / raw)
  To: Keir Fraser, Simon Horman, Yuji Shimada
  Cc: Ian, Ian Pratt, xen-devel, Jackson

> > That thought crossed my mind too, I will investigate further.
> > But I think that ideally it would need to be expanded to 143
> > as the first 16 GSI are currently reserved for ISA.
> 
> Can't xend or qemu do a search of PCI slot space to pick a virtual
> devfn:intx that doesn't conflict (or that purposely does conflict, if
> there are any cases where you want that)? 32 non-legacy GSIs should be enough
> to avoid any aliasing if a bit of effort is put in to avoid it.

One related issue we need to think about is how we can get some sort of consistency over how devices are assigned to slots, so it's possible to add a new device and not have all existing ones re-numbered (hence causing the OS to hiccup).

Giving devices (both emulated and PT) an optional order parameter in the config file which is used to determine the order with which they are allocated resources is probably sufficient.

Ian

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-02-24  1:29                 ` Ian Pratt
@ 2009-02-24  1:50                   ` Simon Horman
  0 siblings, 0 replies; 62+ messages in thread
From: Simon Horman @ 2009-02-24  1:50 UTC (permalink / raw)
  To: Ian Pratt; +Cc: Ian, Yuji Shimada, xen-devel, Jackson, Keir Fraser

On Tue, Feb 24, 2009 at 01:29:48AM +0000, Ian Pratt wrote:
> > > That thought crossed my mind too, I will investigate further.
> > > But I think that ideally it would need to be expanded to 143
> > > as the first 16 GSI are currently reserved for ISA.
> > 
> > Can't xend or qemu do a search of PCI slot space to pick a virtual
> > devfn:intx that doesn't conflict (or that purposely does conflict, if
> > there are any cases where you want that)? 32 non-legacy GSIs should be enough
> > to avoid any aliasing if a bit of effort is put in to avoid it.
> 
> One related issue we need to think about is how we can get some sort of
> consistency over how devices are assigned to slots, so it's possible to
> add a new device and not have all existing ones re-numbered (hence
> causing the OS to hiccup).

I'm not clear on how this renumbering occurs? Are you talking
about renumbering of vslots, gsi, or something else?

> Giving devices (both emulated and PT) an optional order parameter in the
> config file which is used to determine the order with which they are
> allocated resources is probably sufficient.

One of the patches in this series sorts pass-through devices
before they are inserted. My approach was to do this inside ioemu.

http://lists.xensource.com/archives/html/xen-devel/2009-02/msg00603.html

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-02-23 22:18                 ` Simon Horman
@ 2009-03-02  4:14                   ` Simon Horman
  2009-03-02  8:44                     ` Keir Fraser
  0 siblings, 1 reply; 62+ messages in thread
From: Simon Horman @ 2009-03-02  4:14 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Yuji Shimada, xen-devel, Ian Jackson

On Tue, Feb 24, 2009 at 09:18:46AM +1100, Simon Horman wrote:
> On Mon, Feb 23, 2009 at 03:31:30AM -0800, Keir Fraser wrote:
> > On 22/02/2009 22:55, "Simon Horman" <horms@verge.net.au> wrote:
> > 
> > >> The another solution is expanding GSI to 127. I don't sure it is
> > >> possible, but sharing virtual GSI will not occur.
> > > 
> > > That thought crossed my mind too, I will investigate further.
> > > But I think that ideally it would need to be expanded to 143
> > > as the first 16 GSI are currently reserved for ISA.
> > 
> > Can't xend or qemu do a search of PCI slot space to pick a virtual
> > devfn:intx that doesn't conflict (or that purposely does conflict, if there
> > are any cases where you want that)? 32 non-legacy GSIs should be enough to
> > avoid any aliasing if a bit of effort is put in to avoid it.
> 
> That was what I had in mind in my original proposal.
> I'll try coding it up and see what it looks like.

As per the patch below I tried coding this up.

Unfortuantely in my test cases it seems that an rtl8139 ioemu device
doesn't work unless its given the same girq that the original mapping
yeilded.  I guess this is because the patch below isn't sufficient to wire
up the girq for the ioemu case. I am investigating why.

Index: xen-unstable.hg/xen/arch/x86/hvm/irq.c
===================================================================
--- xen-unstable.hg.orig/xen/arch/x86/hvm/irq.c	2009-03-02 15:02:28.000000000 +1100
+++ xen-unstable.hg/xen/arch/x86/hvm/irq.c	2009-03-02 15:09:51.000000000 +1100
@@ -26,6 +26,127 @@
 #include <asm/hvm/domain.h>
 #include <asm/hvm/support.h>
 
+#define GSI_F_PASS_THROUGH 0x80
+#define GSI_F_MASK         0x7f
+
+static int gsi_is_pass_through(struct hvm_irq *hvm_irq, uint8_t gsi)
+{
+    int i;
+
+    for ( i = 0; i < NR_PCI_DEVINTX; i++ )
+    {
+        if ( (hvm_irq->pci_devintx_gsi[i] & GSI_F_MASK) == gsi &&
+             hvm_irq->pci_devintx_gsi[i] & GSI_F_PASS_THROUGH )
+                return 1;
+    }
+
+    return 0;
+}
+
+/* Must be protected by d->arch.hvm_domain.irq_lock
+ * as d->arch.hvm_domain.irq is accessed
+ */
+static uint8_t __hvm_pci_intx_gsi_find(struct domain *d, uint32_t device,
+                                       uint32_t intx, uint8_t flag)
+{
+    uint8_t gsi;
+    int cnt, devintx = PCI_DEVINTX(device, intx);
+    struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
+    int seed = 0; /* XXX: remove this, it is for testing only */
+
+    ASSERT((device <= NR_PCI_DEV) && (intx <= NR_PCI_INTX));
+
+    if ( (gsi = (hvm_irq->pci_devintx_gsi[devintx] & GSI_F_MASK)) )
+        goto out;
+
+    /* Find the gsi with the lowest usage count that
+     * is not used by a pass-through device
+     */
+    for ( cnt = 0; cnt < NR_PCI_DEVINTX; cnt++ )
+    {
+        for ( gsi = NR_ISAIRQS + seed; gsi < VIOAPIC_NUM_PINS; gsi++ )
+        {
+            if ( hvm_irq->pci_gsi_cnt[gsi - NR_ISAIRQS] != cnt ||
+                 gsi_is_pass_through(hvm_irq, gsi) )
+                continue;
+            goto out;
+        }
+    }
+
+    /* If this device isn't a pass-through device,
+     * then it may share a gsi with a pass-through device
+     */
+    if (flag & GSI_F_PASS_THROUGH)
+        goto err;
+
+    for ( cnt = 0; cnt < NR_PCI_DEVINTX; cnt++ )
+    {
+        for ( gsi = NR_ISAIRQS; gsi < VIOAPIC_NUM_PINS; gsi++ )
+        {
+            if ( hvm_irq->pci_gsi_cnt[gsi - NR_ISAIRQS] != cnt)
+                continue;
+            goto out;
+        }
+
+    }
+
+err:
+    gdprintk(XENLOG_ERR, "HVM: No GSI available for dev=%d intx=%d\n",
+             device, intx);
+    return NR_PCI_DEVINTX;
+
+out:
+    hvm_irq->pci_devintx_gsi[devintx] = gsi | flag;
+    hvm_irq->pci_gsi_cnt[gsi - NR_ISAIRQS]++;
+    return gsi;
+}
+
+/* Must be protected by d->arch.hvm_domain.irq_lock
+ * as d->arch.hvm_domain.irq is accessed
+ */
+uint8_t hvm_pci_intx_gsi_find(struct domain *d, uint32_t device, uint32_t intx)
+{
+    uint8_t gsi;
+
+    gsi = __hvm_pci_intx_gsi_find(d, device, intx, 0);
+
+    dprintk(XENLOG_G_INFO, "%s(%d, %u, %u) = %u\n",
+            __func__, d->domain_id, device, intx, gsi);
+    return gsi;
+}
+
+uint8_t hvm_pci_intx_gsi_bind_pt(struct domain *d, uint32_t device,
+                                 uint32_t intx)
+{
+    uint8_t gsi;
+
+    spin_lock(&d->arch.hvm_domain.irq_lock);
+    gsi = __hvm_pci_intx_gsi_find(d, device, intx, GSI_F_PASS_THROUGH);
+    spin_unlock(&d->arch.hvm_domain.irq_lock);
+
+    dprintk(XENLOG_G_INFO, "%s(%d, %u, %u) = %u\n",
+            __func__, d->domain_id, device, intx, gsi);
+    return gsi;
+}
+
+uint8_t hvm_pci_intx_gsi_unbind_pt(struct domain *d, uint32_t device,
+                                   uint32_t intx)
+{
+    uint8_t gsi;
+    int devintx = PCI_DEVINTX(device, intx);
+    struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
+
+    spin_lock(&d->arch.hvm_domain.irq_lock);
+    gsi = __hvm_pci_intx_gsi_find(d, device, intx, GSI_F_PASS_THROUGH);
+    hvm_irq->pci_devintx_gsi[devintx] = 0;
+    hvm_irq->pci_gsi_cnt[gsi - NR_ISAIRQS]--;
+    spin_unlock(&d->arch.hvm_domain.irq_lock);
+
+    dprintk(XENLOG_G_INFO, "%s(%d, %u, %u) = %u\n",
+            __func__, d->domain_id, device, intx, gsi);
+    return gsi;
+}
+
 static void __hvm_pci_intx_assert(
     struct domain *d, unsigned int device, unsigned int intx)
 {
@@ -37,7 +158,11 @@ static void __hvm_pci_intx_assert(
     if ( __test_and_set_bit(device*4 + intx, &hvm_irq->pci_intx.i) )
         return;
 
-    gsi = hvm_pci_intx_gsi(device, intx);
+    gsi = hvm_pci_intx_gsi_find(d, device, intx);
+    /* XXX: Need better error handling */
+    if ( gsi == VIOAPIC_NUM_PINS )
+        return;
+
     if ( hvm_irq->gsi_assert_count[gsi]++ == 0 )
         vioapic_irq_positive_edge(d, gsi);
 
@@ -70,7 +195,10 @@ static void __hvm_pci_intx_deassert(
     if ( !__test_and_clear_bit(device*4 + intx, &hvm_irq->pci_intx.i) )
         return;
 
-    gsi = hvm_pci_intx_gsi(device, intx);
+    gsi = hvm_pci_intx_gsi_find(d, device, intx);
+    /* XXX: Need better error handling */
+    if ( gsi == VIOAPIC_NUM_PINS )
+        return;
     --hvm_irq->gsi_assert_count[gsi];
 
     link    = hvm_pci_intx_link(device, intx);
@@ -472,7 +600,9 @@ static int irq_load_pci(struct domain *d
             if ( test_bit(dev*4 + intx, &hvm_irq->pci_intx.i) )
             {
                 /* Direct GSI assert */
-                gsi = hvm_pci_intx_gsi(dev, intx);
+                spin_lock(&d->event_lock);
+                gsi = hvm_pci_intx_gsi_find(d, dev, intx);
+                spin_unlock(&d->event_lock);
                 hvm_irq->gsi_assert_count[gsi]++;
                 /* PCI-ISA bridge assert */
                 link = hvm_pci_intx_link(dev, intx);
Index: xen-unstable.hg/xen/include/asm-x86/hvm/irq.h
===================================================================
--- xen-unstable.hg.orig/xen/include/asm-x86/hvm/irq.h	2009-03-02 15:02:28.000000000 +1100
+++ xen-unstable.hg/xen/include/asm-x86/hvm/irq.h	2009-03-02 15:02:34.000000000 +1100
@@ -27,6 +27,13 @@
 #include <asm/hvm/vpic.h>
 #include <asm/hvm/vioapic.h>
 
+#define NR_PCI_DEV     32
+#define NR_PCI_INTX    4
+#define NR_PCI_DEVINTX (NR_PCI_DEV*NR_PCI_INTX)
+#define PCI_DEVINTX(dev, intx) ((dev * 4) + intx)
+
+#define NR_PCI_GSI     (VIOAPIC_NUM_PINS - NR_ISAIRQS)
+
 struct hvm_irq {
     /*
      * Virtual interrupt wires for a single PCI bus.
@@ -88,10 +95,20 @@ struct hvm_irq {
     u8 round_robin_prev_vcpu;
 
     struct hvm_irq_dpci *dpci;
+
+    /* Map guest pci device to guest gsi */
+    uint8_t pci_devintx_gsi[NR_PCI_DEVINTX];
+    /* PCI devices using a GSI */
+    uint8_t pci_gsi_cnt[NR_PCI_GSI];
 };
 
-#define hvm_pci_intx_gsi(dev, intx)  \
-    (((((dev)<<2) + ((dev)>>3) + (intx)) & 31) + 16)
+/* Must be protected by domain's event_loc as hvm_irq_dpci is accessed */
+uint8_t hvm_pci_intx_gsi_find(struct domain *d, uint32_t device, uint32_t intx);
+uint8_t hvm_pci_intx_gsi_bind_pt(struct domain *d, uint32_t device,
+                                 uint32_t intx);
+uint8_t hvm_pci_intx_gsi_unbind_pt(struct domain *d, uint32_t device,
+                                   uint32_t intx);
+
 #define hvm_pci_intx_link(dev, intx) \
     (((dev) + (intx)) & 3)
 
Index: xen-unstable.hg/xen/drivers/passthrough/io.c
===================================================================
--- xen-unstable.hg.orig/xen/drivers/passthrough/io.c	2009-03-02 15:02:28.000000000 +1100
+++ xen-unstable.hg/xen/drivers/passthrough/io.c	2009-03-02 15:02:34.000000000 +1100
@@ -129,7 +129,6 @@ int pt_irq_create_bind_vtd(
         machine_gsi = pt_irq_bind->machine_irq;
         device = pt_irq_bind->u.pci.device;
         intx = pt_irq_bind->u.pci.intx;
-        guest_gsi = hvm_pci_intx_gsi(device, intx);
         link = hvm_pci_intx_link(device, intx);
         hvm_irq_dpci->link_cnt[link]++;
 
@@ -140,6 +139,13 @@ int pt_irq_create_bind_vtd(
             return -ENOMEM;
         }
 
+        guest_gsi = hvm_pci_intx_gsi_bind_pt(d, device, intx);
+        if (guest_gsi == VIOAPIC_NUM_PINS)
+        {
+            spin_unlock(&d->event_lock);
+            return -ENODEV;
+        }
+
         digl->device = device;
         digl->intx = intx;
         digl->gsi = guest_gsi;
@@ -189,6 +195,7 @@ int pt_irq_create_bind_vtd(
                 hvm_irq_dpci->girq[guest_gsi].intx = 0;
                 hvm_irq_dpci->girq[guest_gsi].device = 0;
                 hvm_irq_dpci->girq[guest_gsi].valid = 0;
+                hvm_pci_intx_gsi_unbind_pt(d, device, intx);
                 list_del(&digl->list);
                 hvm_irq_dpci->link_cnt[link]--;
                 spin_unlock(&d->event_lock);
@@ -217,13 +224,8 @@ int pt_irq_destroy_bind_vtd(
     machine_gsi = pt_irq_bind->machine_irq;
     device = pt_irq_bind->u.pci.device;
     intx = pt_irq_bind->u.pci.intx;
-    guest_gsi = hvm_pci_intx_gsi(device, intx);
     link = hvm_pci_intx_link(device, intx);
 
-    gdprintk(XENLOG_INFO,
-             "pt_irq_destroy_bind_vtd: machine_gsi=%d "
-             "guest_gsi=%d, device=%d, intx=%d.\n",
-             machine_gsi, guest_gsi, device, intx);
     spin_lock(&d->event_lock);
 
     hvm_irq_dpci = domain_get_irq_dpci(d);
@@ -234,6 +236,7 @@ int pt_irq_destroy_bind_vtd(
         return -EINVAL;
     }
 
+    guest_gsi = hvm_pci_intx_gsi_unbind_pt(d, device, intx);
     hvm_irq_dpci->link_cnt[link]--;
     memset(&hvm_irq_dpci->girq[guest_gsi], 0,
            sizeof(struct hvm_girq_dpci_mapping));
@@ -268,6 +271,10 @@ int pt_irq_destroy_bind_vtd(
     }
     spin_unlock(&d->event_lock);
     gdprintk(XENLOG_INFO,
+             "pt_irq_destroy_bind_vtd: machine_gsi=%d "
+             "guest_gsi=%d, device=%d, intx=%d.\n",
+             machine_gsi, guest_gsi, device, intx);
+    gdprintk(XENLOG_INFO,
              "XEN_DOMCTL_irq_unmapping: m_irq = %x device = %x intx = %x\n",
              machine_gsi, device, intx);

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-02  4:14                   ` Simon Horman
@ 2009-03-02  8:44                     ` Keir Fraser
  2009-03-02  9:53                       ` Simon Horman
  0 siblings, 1 reply; 62+ messages in thread
From: Keir Fraser @ 2009-03-02  8:44 UTC (permalink / raw)
  To: Simon Horman; +Cc: Yuji Shimada, xen-devel, Ian Jackson

On 02/03/2009 04:14, "Simon Horman" <horms@verge.net.au> wrote:

>> That was what I had in mind in my original proposal.
>> I'll try coding it up and see what it looks like.
> 
> As per the patch below I tried coding this up.
> 
> Unfortuantely in my test cases it seems that an rtl8139 ioemu device
> doesn't work unless its given the same girq that the original mapping
> yeilded.  I guess this is because the patch below isn't sufficient to wire
> up the girq for the ioemu case. I am investigating why.

Not really what I had in mind. Xend can do the GSI->slot mapping, to ensure
non-conflicting GSIs. I don't think any hypervisor changes are required, let
alone substantial ones.

 -- Keir

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-02  8:44                     ` Keir Fraser
@ 2009-03-02  9:53                       ` Simon Horman
  2009-03-02 10:08                         ` Keir Fraser
  0 siblings, 1 reply; 62+ messages in thread
From: Simon Horman @ 2009-03-02  9:53 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Yuji Shimada, xen-devel, Ian Jackson

On Mon, Mar 02, 2009 at 08:44:42AM +0000, Keir Fraser wrote:
> On 02/03/2009 04:14, "Simon Horman" <horms@verge.net.au> wrote:
> 
> >> That was what I had in mind in my original proposal.
> >> I'll try coding it up and see what it looks like.
> > 
> > As per the patch below I tried coding this up.
> > 
> > Unfortuantely in my test cases it seems that an rtl8139 ioemu device
> > doesn't work unless its given the same girq that the original mapping
> > yeilded.  I guess this is because the patch below isn't sufficient to wire
> > up the girq for the ioemu case. I am investigating why.
> 
> Not really what I had in mind. Xend can do the GSI->slot mapping, to ensure
> non-conflicting GSIs. I don't think any hypervisor changes are required, let
> alone substantial ones.

Is the idea that xend would allocate a gsi to a device and
then pass that gsi along as part of the device configuration
to the device model?

If so, I think something similar to what I wrote, but moved
into xend could work quite well. But I sense that wasn't what
you had in mind either.

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-02  9:53                       ` Simon Horman
@ 2009-03-02 10:08                         ` Keir Fraser
  2009-03-02 11:25                           ` Simon Horman
  2009-03-04 23:05                           ` Simon Horman
  0 siblings, 2 replies; 62+ messages in thread
From: Keir Fraser @ 2009-03-02 10:08 UTC (permalink / raw)
  To: Simon Horman; +Cc: Yuji Shimada, xen-devel, Ian Jackson

On 02/03/2009 09:53, "Simon Horman" <horms@verge.net.au> wrote:

>> Not really what I had in mind. Xend can do the GSI->slot mapping, to ensure
>> non-conflicting GSIs. I don't think any hypervisor changes are required, let
>> alone substantial ones.
> 
> Is the idea that xend would allocate a gsi to a device and
> then pass that gsi along as part of the device configuration
> to the device model?
> 
> If so, I think something similar to what I wrote, but moved
> into xend could work quite well. But I sense that wasn't what
> you had in mind either.

I mean that xend can pick a virtual devfn for the device that it knows has a
non-conflicting GSI. This avoids any need for dynamic mapping between devfn
and GSI (which would be more of a pain in the neck -- for example, your
patch doesn't work because certain parts of BIOS info tables need to be
dynamically generated, as currently they hardcode the devfn-GSI
relationship).

 -- Keir

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-02 10:08                         ` Keir Fraser
@ 2009-03-02 11:25                           ` Simon Horman
  2009-03-02 11:33                             ` Keir Fraser
  2009-03-04 23:05                           ` Simon Horman
  1 sibling, 1 reply; 62+ messages in thread
From: Simon Horman @ 2009-03-02 11:25 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Yuji Shimada, xen-devel, Ian Jackson

On Mon, Mar 02, 2009 at 10:08:29AM +0000, Keir Fraser wrote:
> On 02/03/2009 09:53, "Simon Horman" <horms@verge.net.au> wrote:
> 
> >> Not really what I had in mind. Xend can do the GSI->slot mapping, to ensure
> >> non-conflicting GSIs. I don't think any hypervisor changes are required, let
> >> alone substantial ones.
> > 
> > Is the idea that xend would allocate a gsi to a device and
> > then pass that gsi along as part of the device configuration
> > to the device model?
> > 
> > If so, I think something similar to what I wrote, but moved
> > into xend could work quite well. But I sense that wasn't what
> > you had in mind either.
> 
> I mean that xend can pick a virtual devfn for the device that it knows has a
> non-conflicting GSI. This avoids any need for dynamic mapping between devfn
> and GSI (which would be more of a pain in the neck -- for example, your
> patch doesn't work because certain parts of BIOS info tables need to be
> dynamically generated, as currently they hardcode the devfn-GSI
> relationship).

Thanks for the clarification. I suspect that scheme could easily run into
allocation problems when multi-function devices are passed-through as
multi-function devices.  Especially in the case of hot-plug. Buy which I
mean, it might be hard to find a device with the GSI for INTA + one or more
of INTB, C and D are free. But I'll take a look into it and see how it
goes.

In any case, could you be more specific about which areas my approach break?

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-02 11:25                           ` Simon Horman
@ 2009-03-02 11:33                             ` Keir Fraser
  2009-03-03  5:57                               ` Yuji Shimada
  0 siblings, 1 reply; 62+ messages in thread
From: Keir Fraser @ 2009-03-02 11:33 UTC (permalink / raw)
  To: Simon Horman; +Cc: Yuji Shimada, xen-devel, Ian Jackson

On 02/03/2009 11:25, "Simon Horman" <horms@verge.net.au> wrote:

>> I mean that xend can pick a virtual devfn for the device that it knows has a
>> non-conflicting GSI. This avoids any need for dynamic mapping between devfn
>> and GSI (which would be more of a pain in the neck -- for example, your
>> patch doesn't work because certain parts of BIOS info tables need to be
>> dynamically generated, as currently they hardcode the devfn-GSI
>> relationship).
> 
> Thanks for the clarification. I suspect that scheme could easily run into
> allocation problems when multi-function devices are passed-through as
> multi-function devices.  Especially in the case of hot-plug. Buy which I
> mean, it might be hard to find a device with the GSI for INTA + one or more
> of INTB, C and D are free. But I'll take a look into it and see how it
> goes.

Well, it depends how many devices you want to pass through. I bet you're
good up to at least half dozen, and likely more.

> In any case, could you be more specific about which areas my approach break?

Mainly, virtual firmware and (possibly) save/restore. The latter depends on
whether a guest with dynamically assigned devfn<->GSI relationship is ever
allowed to be saved/restored.

Your approach is also no good for PCI hotplug, since I'm pretty sure you
cannot update GSI bindings after the guest has booted. Unless there's some
ACPI magic that could be employed in this case.

 -- Keir

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-02 11:33                             ` Keir Fraser
@ 2009-03-03  5:57                               ` Yuji Shimada
  2009-03-03  8:56                                 ` Keir Fraser
  2009-03-04 22:26                                 ` Simon Horman
  0 siblings, 2 replies; 62+ messages in thread
From: Yuji Shimada @ 2009-03-03  5:57 UTC (permalink / raw)
  To: Simon Horman; +Cc: xen-devel, Ian Jackson, Keir Fraser

On Mon, 02 Mar 2009 11:33:41 +0000
Keir Fraser <keir.fraser@eu.citrix.com> wrote:

> On 02/03/2009 11:25, "Simon Horman" <horms@verge.net.au> wrote:
> 
> >> I mean that xend can pick a virtual devfn for the device that it knows has a
> >> non-conflicting GSI. This avoids any need for dynamic mapping between devfn
> >> and GSI (which would be more of a pain in the neck -- for example, your
> >> patch doesn't work because certain parts of BIOS info tables need to be
> >> dynamically generated, as currently they hardcode the devfn-GSI
> >> relationship).
> > 
> > Thanks for the clarification. I suspect that scheme could easily run into
> > allocation problems when multi-function devices are passed-through as
> > multi-function devices.  Especially in the case of hot-plug. Buy which I
> > mean, it might be hard to find a device with the GSI for INTA + one or more
> > of INTB, C and D are free. But I'll take a look into it and see how it
> > goes.
> 
> Well, it depends how many devices you want to pass through. I bet you're
> good up to at least half dozen, and likely more.
> 
> > In any case, could you be more specific about which areas my approach break?
> 
> Mainly, virtual firmware and (possibly) save/restore. The latter depends on
> whether a guest with dynamically assigned devfn<->GSI relationship is ever
> allowed to be saved/restored.
> 
> Your approach is also no good for PCI hotplug, since I'm pretty sure you
> cannot update GSI bindings after the guest has booted. Unless there's some
> ACPI magic that could be employed in this case.

If you want to assign many devices to guest, it is one of approaches
to expand GSIs statically.  You can support hot-plug easily.

    dev 0  INTA -> GSI  16
    dev 0  INTB -> GSI  17
    dev 0  INTC -> GSI  18
    dev 0  INTD -> GSI  19
    dev 1  INTA -> GSI  20
    dev 1  INTB -> GSI  21
    dev 1  INTC -> GSI  22
    dev 1  INTD -> GSI  23
    dev 2  INTA -> GSI  24
    dev 2  INTB -> GSI  25
    dev 2  INTC -> GSI  26
    dev 2  INTD -> GSI  27
    ...
    dev 31 INTA -> GSI 140
    dev 31 INTB -> GSI 141
    dev 31 INTC -> GSI 142
    dev 31 INTD -> GSI 143

Please note that _PRT method in ACPI AML should reflect GSIs. If you expand
GSIs, it will be necessary to change the _PRT method. Please see
tools/firmware/hvmloader/acpi/dsdt.asl.

Thanks,
--
Yuji Shimada

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-03  5:57                               ` Yuji Shimada
@ 2009-03-03  8:56                                 ` Keir Fraser
  2009-03-04 22:26                                 ` Simon Horman
  1 sibling, 0 replies; 62+ messages in thread
From: Keir Fraser @ 2009-03-03  8:56 UTC (permalink / raw)
  To: Yuji Shimada, Simon Horman; +Cc: xen-devel, Ian Jackson

On 03/03/2009 05:57, "Yuji Shimada" <shimada-yxb@necst.nec.co.jp> wrote:

>> Your approach is also no good for PCI hotplug, since I'm pretty sure you
>> cannot update GSI bindings after the guest has booted. Unless there's some
>> ACPI magic that could be employed in this case.
> 
> If you want to assign many devices to guest, it is one of approaches
> to expand GSIs statically.  You can support hot-plug easily.

How many devices do you want to assign? Bear in mind that most guests will
support MSI anyway, so this INTx issue is rather moot. I'd rather stretch
what we have than introduce extra INTx modes, requiring backward
compatibility, etc.

 -- Keir

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-03  5:57                               ` Yuji Shimada
  2009-03-03  8:56                                 ` Keir Fraser
@ 2009-03-04 22:26                                 ` Simon Horman
  2009-03-04 22:32                                   ` Keir Fraser
  1 sibling, 1 reply; 62+ messages in thread
From: Simon Horman @ 2009-03-04 22:26 UTC (permalink / raw)
  To: Yuji Shimada; +Cc: xen-devel, Ian Jackson, Keir Fraser

On Tue, Mar 03, 2009 at 02:57:15PM +0900, Yuji Shimada wrote:
> On Mon, 02 Mar 2009 11:33:41 +0000
> Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> 
> > On 02/03/2009 11:25, "Simon Horman" <horms@verge.net.au> wrote:
> > 
> > >> I mean that xend can pick a virtual devfn for the device that it knows has a
> > >> non-conflicting GSI. This avoids any need for dynamic mapping between devfn
> > >> and GSI (which would be more of a pain in the neck -- for example, your
> > >> patch doesn't work because certain parts of BIOS info tables need to be
> > >> dynamically generated, as currently they hardcode the devfn-GSI
> > >> relationship).
> > > 
> > > Thanks for the clarification. I suspect that scheme could easily run into
> > > allocation problems when multi-function devices are passed-through as
> > > multi-function devices.  Especially in the case of hot-plug. Buy which I
> > > mean, it might be hard to find a device with the GSI for INTA + one or more
> > > of INTB, C and D are free. But I'll take a look into it and see how it
> > > goes.
> > 
> > Well, it depends how many devices you want to pass through. I bet you're
> > good up to at least half dozen, and likely more.
> > 
> > > In any case, could you be more specific about which areas my approach break?
> > 
> > Mainly, virtual firmware and (possibly) save/restore. The latter depends on
> > whether a guest with dynamically assigned devfn<->GSI relationship is ever
> > allowed to be saved/restored.
> > 
> > Your approach is also no good for PCI hotplug, since I'm pretty sure you
> > cannot update GSI bindings after the guest has booted. Unless there's some
> > ACPI magic that could be employed in this case.
> 
> If you want to assign many devices to guest, it is one of approaches
> to expand GSIs statically.  You can support hot-plug easily.
> 
>     dev 0  INTA -> GSI  16
>     dev 0  INTB -> GSI  17
>     dev 0  INTC -> GSI  18
>     dev 0  INTD -> GSI  19
>     dev 1  INTA -> GSI  20
>     dev 1  INTB -> GSI  21
>     dev 1  INTC -> GSI  22
>     dev 1  INTD -> GSI  23
>     dev 2  INTA -> GSI  24
>     dev 2  INTB -> GSI  25
>     dev 2  INTC -> GSI  26
>     dev 2  INTD -> GSI  27
>     ...
>     dev 31 INTA -> GSI 140
>     dev 31 INTB -> GSI 141
>     dev 31 INTC -> GSI 142
>     dev 31 INTD -> GSI 143
> 
> Please note that _PRT method in ACPI AML should reflect GSIs. If you expand
> GSIs, it will be necessary to change the _PRT method. Please see
> tools/firmware/hvmloader/acpi/dsdt.asl.

Could someone explain why the current code uses 32 GSIs rather than using 128?

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-04 22:26                                 ` Simon Horman
@ 2009-03-04 22:32                                   ` Keir Fraser
  2009-03-04 22:53                                     ` Simon Horman
  0 siblings, 1 reply; 62+ messages in thread
From: Keir Fraser @ 2009-03-04 22:32 UTC (permalink / raw)
  To: Simon Horman, Yuji Shimada; +Cc: xen-devel, Ian Jackson

On 04/03/2009 22:26, "Simon Horman" <horms@verge.net.au> wrote:

>> Please note that _PRT method in ACPI AML should reflect GSIs. If you expand
>> GSIs, it will be necessary to change the _PRT method. Please see
>> tools/firmware/hvmloader/acpi/dsdt.asl.
> 
> Could someone explain why the current code uses 32 GSIs rather than using 128?

32 should be plenty, and I'd want to emulate multiple IO-APICs beyond that
just out of fear that some OS will choke on an IO-APIC with 128 pins (since
no real single IO-APIC has so many pins). It just seemed a bit unnecessary
when this support was originally implemented. At this point I haven't heard
any concrete evidence so far that 32 GSIs is insufficient in any real
scenario. So that's what we'll be sticking with for the time being, I think.

 -- Keir

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-04 22:32                                   ` Keir Fraser
@ 2009-03-04 22:53                                     ` Simon Horman
  0 siblings, 0 replies; 62+ messages in thread
From: Simon Horman @ 2009-03-04 22:53 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Yuji Shimada, xen-devel, Ian Jackson

On Wed, Mar 04, 2009 at 10:32:11PM +0000, Keir Fraser wrote:
> On 04/03/2009 22:26, "Simon Horman" <horms@verge.net.au> wrote:
> 
> >> Please note that _PRT method in ACPI AML should reflect GSIs. If you expand
> >> GSIs, it will be necessary to change the _PRT method. Please see
> >> tools/firmware/hvmloader/acpi/dsdt.asl.
> > 
> > Could someone explain why the current code uses 32 GSIs rather than using 128?
> 
> 32 should be plenty, and I'd want to emulate multiple IO-APICs beyond that
> just out of fear that some OS will choke on an IO-APIC with 128 pins (since
> no real single IO-APIC has so many pins). It just seemed a bit unnecessary
> when this support was originally implemented. At this point I haven't heard
> any concrete evidence so far that 32 GSIs is insufficient in any real
> scenario. So that's what we'll be sticking with for the time being, I think.

Thanks, I will see if I can dig out any real-world reasons.

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-02 10:08                         ` Keir Fraser
  2009-03-02 11:25                           ` Simon Horman
@ 2009-03-04 23:05                           ` Simon Horman
  2009-03-05  8:34                             ` Keir Fraser
  1 sibling, 1 reply; 62+ messages in thread
From: Simon Horman @ 2009-03-04 23:05 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Yuji Shimada, xen-devel, Ian Jackson

On Mon, Mar 02, 2009 at 10:08:29AM +0000, Keir Fraser wrote:
> On 02/03/2009 09:53, "Simon Horman" <horms@verge.net.au> wrote:
> 
> >> Not really what I had in mind. Xend can do the GSI->slot mapping, to ensure
> >> non-conflicting GSIs. I don't think any hypervisor changes are required, let
> >> alone substantial ones.
> > 
> > Is the idea that xend would allocate a gsi to a device and
> > then pass that gsi along as part of the device configuration
> > to the device model?
> > 
> > If so, I think something similar to what I wrote, but moved
> > into xend could work quite well. But I sense that wasn't what
> > you had in mind either.
> 
> I mean that xend can pick a virtual devfn for the device that it knows has a
> non-conflicting GSI. This avoids any need for dynamic mapping between devfn
> and GSI (which would be more of a pain in the neck -- for example, your
> patch doesn't work because certain parts of BIOS info tables need to be
> dynamically generated, as currently they hardcode the devfn-GSI
> relationship).

Hi Keir,

I tried coding up this allocation idea in python with a view to
plugging it into xend. A description of that code and the code itself
is below. However, I think that I must still be misunderstanding what
you have in mind.

In order to allocate devfn for pass-through devices in xend it is
necessary to know about all devfn that are in use (or going to be used)
by qemu-dm. This includes ioemu devices (unless you don't care about
them sharing GSI with pass-through devices). But as far as I can see
xend doesn't know anything about some ioemu devices, beyond say
requesting a nic.

Adding some simple debugging code to pci_register_device(), I see
that it allocates the following devfn on boot.

pci_register_device: name=i440FX devfn=0
pci_register_device: name=PIIX3 devfn=16
pci_register_device: name=Cirrus VGA devfn=8
pci_register_device: name=xen-platform devfn=24
pci_register_device: name=RTL8139 devfn=32
pci_register_device: name=PIIX3 IDE devfn=17
pci_register_device: name=PIIX4 ACPI devfn=19

I am guessing that your idea was not for xend to allocate all
of those devfn and pass them to qemu-dm on the command line.

My instinct is that it would really be easier to allocate devn -
using something like the algorithm I describe below - in qemu-dm
rather than xend.


Code decription
---------------

It tries to handle alocation
of (single-function) ioemu, single-function pass-through and
multi-function pass-through devices.

It tries to allocate devices such that there are as few
GSI clashes as possible (no classhes when there are few devices).
After that is allos ioemu devices to share a GSI with other devices.
But it doesn't allow two pass-through devices to share a GSI.

It handles multi-function by reserving 4 GSI when function 0 of
a multi-function device is assigned.

Unassignment is supported only for pass-through devices
as ioemu devices don't support hot-plug.

The code I have so far is below.

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en


#============================================================================
# This library is free software; you can redistribute it and/or
# modify it under the terms of version 2.1 of the GNU Lesser General Public
# License as published by the Free Software Foundation.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
#============================================================================
# Copyright (C) 2009 Simon Horman <horms@verge.net.au>
#============================================================================

#from xen.xend.XendError import VmError
#from xen.xend.XendLogging import log

import logging as log
log.basicConfig(level=log.DEBUG)


NR_PCI_DEV = 32
NR_PCI_INTX = 4
NR_PCI_DEVINTX = NR_PCI_DEV * NR_PCI_INTX

VIOAPIC_NUM_PINS = 48
NR_ISAIRQS = 16
NR_PCIGSI = VIOAPIC_NUM_PINS - NR_ISAIRQS

# No flags: IOEMU device
# DEVITNX_F_PT only: Passed-through as a single-function device
# DEVITNX_F_PT and DEVITNX_F_MULTI: Passed-through as a multi-function device
# DEVITNX_F_MULTI only: Place holder.
#                       Function 0 of a multi-function device
#                       has been passed through as a multi-function device
#                       and the subsequent 3 gsi are reserved for
#                       use by functions of the same physical device.
#                       If other functions are assigned they
#                       can use this place-holder if it hasn't
#                       already been used by another function of
#                       the same device.

DEVITNX_F_PT    = 0x1
DEVITNX_F_MULTI = 0x2
DEVITNX_F_HP    = 0x4

class gsi:
    """@assign devfn as per available gsi"""

    def __init__(self):
        self.pcigsi_devintx = []
        for pcigsi in range(NR_PCIGSI):
            self.pcigsi_devintx += [ [] ]

    def entry_to_phys(self, entry):
        return entry[0]

    def entry_to_virt(self, entry):
        return entry[1]

    def entry_to_intx(self, entry):
        return entry[2]

    def entry_to_flag(self, entry):
        return entry[3]

    def entry_update(self, entry, phys, v_devfn, intx, flag):
        entry[0] = phys
        entry[1] = v_devfn
        entry[2] = intx
        entry[3] = flag
        return v_devfn

    def entry(self, phys, virt, intx, flag):
        return [phys, virt, intx, flag]

    def devfn_to_dev(self, devfn):
        return devfn >> 3

    def devfn_to_fn(self, devfn):
        return devfn & 0x7

    def devfn(self, dev, fn):
        return (dev << 3) | fn

    def pcigsi_intx_to_dev(self, pcigsi, intx):
        # http://lists.xensource.com/archives/html/xen-devel/2009-02/pngmIO1Sm0VEX.png
        row = (pcigsi + NR_PCI_INTX - intx) % NR_PCI_INTX
        column = ((pcigsi + NR_PCIGSI - intx) % NR_PCIGSI)/NR_PCI_INTX
        return column + (8 * row)

    def dev_intx_to_pcigsi(self, dev, intx):
        return (((dev)<<2) + ((dev)>>3) + (intx)) & (NR_PCIGSI - 1)

    def pcigsi_wrap(self, pcigsi):
        return (pcigsi) % NR_PCIGSI

    def density(self, pcigsi, width):
        density = 0
        for i in range(width):
            idx = self.pcigsi_wrap(pcigsi + i)
            density += len(self.pcigsi_devintx[idx])
        return density

    def collision(self, pcigsi, width, phys_devfn, flag):
        # ioemu devices don't care about collisions
	if flag & DEVITNX_F_PT|DEVITNX_F_MULTI == 0:
	    return False
        phys_dev = self.devfn_to_dev(phys_devfn)
        for i in range(width):
            idx = self.pcigsi_wrap(pcigsi + i)
            for j in range(len(self.pcigsi_devintx[idx])):
                entry = self.pcigsi_devintx[idx][j]
                e_flag = self.entry_to_flag(entry)
                e_dev = self.devfn_to_dev(self.entry_to_phys(entry))
                if e_flag & DEVITNX_F_PT or                             \
                   (not flag & DEVITNX_F_MULTI and e_flag & DEVITNX_F_MULTI):
                    return True
        return False

    def find_empty(self, intx, width, phys_devfn, flag):
        min_density = NR_PCI_DEVINTX
        min_pcigsi = 0
	# Don't do this, GSIs need to be used densely to maximise the
	# posibility of space being available for multi-function devices,
	# which need 4 contigous GSIs
	# # Loop over devices and then convert them to pcigsi rather than
	# # looping over pcigsi directly to prefer lower numbered and
	# # contigious virtual device assignment
        # for v_devfn in range(NR_PCI_DEV):
        #     pcigsi = self.dev_intx_to_pcigsi(v_devfn, intx)
	for pcigsi in range(NR_PCIGSI):
            if self.collision(pcigsi, width, phys_devfn, flag):
                continue
            density = self.density(pcigsi, width)
            if min_density > density:
                min_density = density
                min_pcigsi = pcigsi
        if min_density == NR_PCI_DEVINTX:
            return -1
        return min_pcigsi

    def assign(self, phys, intx, pcigsi, v_fn, flag):
        v_dev = self.pcigsi_intx_to_dev(pcigsi, intx)
        v_devfn = self.devfn(v_dev, v_fn)
        entry = self.entry(phys, v_devfn, intx, flag)
        self.pcigsi_devintx[pcigsi].append(entry)
        return v_devfn

    def pv_multi_fn0_pcigsi(self, phys_dev):
        for pcigsi in range(NR_PCIGSI):
            for i in range(len(self.pcigsi_devintx[pcigsi])):
                entry = self.pcigsi_devintx[pcigsi][i]
                flag = self.entry_to_flag(entry)
                devfn = self.entry_to_phys(entry)
                dev = self.devfn_to_dev(devfn)
                fn = self.devfn_to_fn(devfn)
                if flag & DEVITNX_F_PT and flag & DEVITNX_F_MULTI and  \
                   dev == phys_dev and fn == 0:
                    return pcigsi
        return -1

    def assign_pt_multi_fn0(self, phys_dev):
        # Function 0 of a multi-function device reserves 4 gsi
        # As the device may use INTB, C or D as well as INTA,
        # which is used by function 0
        phys_devfn = self.devfn(phys_dev, 0)
        pcigsi = self.find_empty(0, 4, phys_devfn,                      \
                                 DEVITNX_F_PT|DEVITNX_F_MULTI)
        if pcigsi < 0:
	    log.debug("Can't find 4 contigous free slots\n")
            return -1
        for i in range(1, 4):
            self.assign(phys_devfn, i, self.pcigsi_wrap(pcigsi + i), 0, \
                                  DEVITNX_F_MULTI)
        return self.assign(phys_devfn, 0, pcigsi, 0,                    \
                           DEVITNX_F_PT|DEVITNX_F_MULTI)

    def assign_pt_multi_fnx(self, phys_dev, phys_fn, phys_intx, fn0_pcigsi):
        # For the non-zeorth function of a multi-function device
        # when the fucntion 0 has already been inserted
        pcigsi = self.pcigsi_wrap(fn0_pcigsi + phys_intx)
        phys_devfn = self.devfn(phys_dev, phys_fn)
        for i in range(len(self.pcigsi_devintx[pcigsi])):
             entry = self.pcigsi_devintx[pcigsi][i]
             e_flag = self.entry_to_flag(entry)
             e_virt = self.entry_to_virt(entry)
             e_devfn = self.entry_to_phys(entry)
             e_dev = self.devfn_to_dev(e_devfn)
             if not e_flag & DEVITNX_F_PT and                           \
	        e_flag & DEVITNX_F_MULTI and e_dev == phys_dev:
                 self.entry_update(entry, phys_devfn, e_virt,           \
                                   phys_intx, DEVITNX_F_PT|DEVITNX_F_MULTI)
                 return phys_devfn
        return self.assign(phys_devfn, phys_intx, pcigsi, phys_fn,      \
                               DEVITNX_F_PT|DEVITNX_F_MULTI)

    def unassign(self, pcigsi, idx):
        self.pcigsi_devintx[pcigsi].pop(idx)

    def pv_multi_dev_in_use_by_non_zero(self, phys_dev):
        ret = []
        for pcigsi in range(NR_PCIGSI):
            for i in range(len(self.pcigsi_devintx[pcigsi])):
                entry = self.pcigsi_devintx[pcigsi][i]
                e_flag = self.entry_to_flag(entry)
                e_devfn = self.entry_to_phys(entry)
                e_dev = self.devfn_to_dev(e_devfn)
                e_fn = self.devfn_to_fn(e_devfn)
                if e_flag & DEVITNX_F_PT and e_dev == phys_dev and e_fn != 0:
                    ret.append(entry)
        return ret

    def unassign_pt_multi_fn0(self, phys_dev, phys_fn):
        # Function 0 of a devices that has been passed through
        # as multi-function can't be removed as long as other
        # functions are passed through as part of the same
        # virtual device
        error = self.pv_multi_dev_in_use_by_non_zero(phys_dev)
        if len(error):
            s = "Can't unassign dev=0x%02x,func=0x%x because "          \
                "it is still in use by:" % (phys_dev, phys_fn)
            for i in range(len(error)):
                entry = error[i]
                e_devfn = self.entry_to_phys(entry)
                e_dev = self.devfn_to_dev(e_devfn)
                e_fn = self.devfn_to_fn(e_devfn)
                s += " dev=0x%02x,func=0x%x" % (e_dev, e_fn)
            log.error(s)
            return -1
        # Remove the device and any place holders
        for pcigsi in range(NR_PCIGSI):
	    remove = []
            for i in range(len(self.pcigsi_devintx[pcigsi])):
                entry = self.pcigsi_devintx[pcigsi][i]
                e_devfn = self.entry_to_phys(entry)
                e_dev = self.devfn_to_dev(e_devfn)
                e_flag = self.entry_to_flag(entry)
                if e_flag & DEVITNX_F_PT|DEVITNX_F_MULTI and            \
                   e_dev == phys_dev:
		    remove.append(i)
	    removed = 0
            for i in remove:
                self.unassign(pcigsi, i - removed)
		removed += 1
        return 0

    def unassign_pt_multi_fnx(self, phys_dev, phys_fn, pcigsi, idx):
        # If this is the last function for this device
        # attached to this gsi then convert it to a place holder.
        # Otherwise remove it.
        density = 0
        for i in range(len(self.pcigsi_devintx[pcigsi])):
            entry = self.pcigsi_devintx[pcigsi][i]
            e_devfn = self.entry_to_phys(entry)
            e_dev = self.devfn_to_dev(e_devfn)
            e_flag = self.entry_to_flag(entry)
            if e_flag & DEVITNX_F_PT|DEVITNX_F_MULTI and               \
               e_dev == phys_dev:
                density += 1
        if density > 1:
            self.unassign(pcigsi, idx)
	    return
        entry = self.pcigsi_devintx[pcigsi][idx]
        e_devfn = self.entry_to_phys(entry)
        e_dev = self.devfn_to_dev(e_devfn)
        self.entry_update(entry, self.devfn(e_dev, 0), 0, 0, DEVITNX_F_MULTI)

    # Functions below this line are intended to be public

    # idx is just a key for debuging
    # It could be removed. Or it could be used to search for devices
    # that have already been insergted.
    # Removal doesn't occur for ioemu devices, but if it did,
    # then idx could be used for that too.
    def assign_ioemu(self, idx):
        # IOEMU devices always use function 0 and thus INTA
        pcigsi = self.find_empty(0, 1, -1, 0)
        if pcigsi < 0:
            return -1
        return self.assign(idx, 0, pcigsi, 0, 0)

    # assign_pt_single and assign_pt_multi should probably
    # be a single function.

    def assign_pt_single(self, phys_dev, phys_fn):
        phys_devfn = self.devfn(phys_dev, phys_fn)
        pcigsi = self.find_empty(0, 1, phys_devfn, DEVITNX_F_PT)
        if pcigsi < 0:
            return -1
        return self.assign(phys_devfn, 0, pcigsi, 0, DEVITNX_F_PT)

    def assign_pt_multi(self, phys_dev, phys_fn, phys_intx):
        if phys_fn == 0:
            return self.assign_pt_multi_fn0(phys_dev)
        pcigsi = self.pv_multi_fn0_pcigsi(phys_dev)
        if pcigsi >= 0:
            return self.assign_pt_multi_fnx(phys_dev, phys_fn,          \
                                            phys_intx, pcigsi)
        # This does not get the DEVITNX_F_MULTI flag set because
        # it is not assigned to the same virtual device as function 0
        # and thus is passed-through as a single-function
        return self.assign_pt_single(phys_dev, phys_fn)

    def unassign_pt(self, phys_dev, phys_fn):
        phys_devfn = self.devfn(phys_dev, phys_fn)
        for pcigsi in range(NR_PCIGSI):
            for i in range(len(self.pcigsi_devintx[pcigsi])):
                entry = self.pcigsi_devintx[pcigsi][i]
                e_devfn = self.entry_to_phys(entry)
                e_flag = self.entry_to_flag(entry)
                if phys_devfn != e_devfn or not e_flag & DEVITNX_F_PT:
                    continue
                # If it isn't marked multi-function
                # just remove it as it doesn't affect any other entries
                e_fn = self.devfn_to_fn(e_devfn)
                if not e_flag & DEVITNX_F_MULTI:
                    self.unassign(pcigsi, i)
                    return 0
                # If it is a non-zero function, it can be removed as it
                # doesn't affect any other entries, but be sure to leave a
                # place holder as neccessary
                e_fn = self.devfn_to_fn(e_devfn)
                if e_fn != 0:
                    self.unassign_pt_multi_fnx(phys_dev, phys_fn, pcigsi, i)
                    return 0
                # Function 0 of a devices that has been passed through
                # as multi-function can't be removed as long as other
                # functions are passed through as part of the same
                # virtual device
                return self.unassign_pt_multi_fn0(phys_dev, phys_fn)
        log.error("Can\'t unassign device dev=0x%02x,func=0x%x, "      \
                  "it hasn't been assigned" % (dev, fn))
        return -1

    def dump(self):
        log.debug("-------")
        for pcigsi in range(NR_PCIGSI):
            if not self.density(pcigsi, 1):
                continue
            log.debug("pcigsi=%02x:" % pcigsi)
            for i in range(len(self.pcigsi_devintx[pcigsi])):
                s = "\t"
                entry = self.pcigsi_devintx[pcigsi][i]
                if self.entry_to_flag(entry) & DEVITNX_F_PT|DEVITNX_F_MULTI:
                    s += "phys=0x%02x,0x%x " %                          \
                        (self.devfn_to_dev(self.entry_to_phys(entry)),  \
                         self.devfn_to_fn(self.entry_to_phys(entry)))
                else:
                    s += "phys=0x%02x     " % (self.entry_to_phys(entry))
                s += "virt=0x%02x,0x%x intx=0x%x flag=0x%x" %           \
                         (self.devfn_to_dev(self.entry_to_virt(entry)), \
                          self.devfn_to_fn(self.entry_to_virt(entry)),  \
                          self.entry_to_intx(entry),
                          self.entry_to_flag(entry))
                if self.entry_to_flag(entry) == DEVITNX_F_PT:
                    s += "(pv)"
                elif self.entry_to_flag(entry) ==                       \
		         DEVITNX_F_PT|DEVITNX_F_MULTI:
                    s += "(pv,multi)"
                elif self.entry_to_flag(entry) == DEVITNX_F_MULTI:
                    s += "(multi)"
                else:
                    s += "(ioemu)"
                log.debug("%s", s)

gsi = gsi()
#for i in range(128):

for dev in range(12):
    if dev % 4 == 0:
        v_dev = gsi.assign_ioemu(dev)
        if v_dev < 0:
            log.error("Can\'t assign ioemu device idx=0x%02x" % dev)
            gsi.dump()
        #gsi.dump()
    if dev % 4 == 1:
        for fn in range(8):
            v_dev = gsi.assign_pt_single(dev, fn)
            if v_dev < 0:
                log.error("Can\'t assign pt device dev=0x%02x,func=0x%x" %\
                              (dev, fn))
                gsi.dump()
        #gsi.dump()
        #for fn in range(8):
        #    v_dev = gsi.unassign_pt(dev, fn)
        #    if v_dev < 0:
        #        log.error("Can\'t unassign pv device dev=0x%02x,func=0x%x" %\
        #                      (dev, fn))
        #        gsi.dump()
        #gsi.dump()
    if dev % 4 == 2:
        for fn in range(8):
            intx = fn % 4
            v_dev = gsi.assign_pt_multi(dev, fn, intx)
            if v_dev < 0:
                log.error("Can\'t assign pv multi-function device "     \
                          "dev=0x%02x,func=0x%x,intx=0x%x" % (dev, fn, intx))
                gsi.dump()
        #gsi.dump()
        #for fn in range(8):
        #    intx = fn % 4
        #    v_dev = gsi.unassign_pt(dev, fn)
        #    if v_dev < 0:
        #        log.error("Can\'t unassign pv multi-function device "     \
        #                  "dev=0x%02x,func=0x%x" % (dev, fn))
        #        gsi.dump()
        #gsi.dump()
    if dev % 4 == 3:
        for fn in range(8):
            intx = fn % 4
            v_dev = gsi.assign_pt_multi(dev, fn, intx)
            if v_dev < 0:
                log.error("Can\'t assign pv multi-function device "     \
                          "dev=0x%02x,func=0x%x,intx=0x%x" % (dev, fn, intx))
                gsi.dump()
            v_dev = gsi.unassign_pt(dev, fn)
            if v_dev < 0:
                log.error("Can\'t unassign pv multi-function device "     \
                          "dev=0x%02x,func=0x%x" % (dev, fn))
                gsi.dump()
            v_dev = gsi.assign_pt_multi(dev, fn, intx)
            if v_dev < 0:
                log.error("Can\'t assign pv multi-function device "     \
                          "dev=0x%02x,func=0x%x,intx=0x%x" % (dev, fn, intx))
                gsi.dump()
        #gsi.dump()
        #for fn in [7, 6, 5, 4, 3, 2, 1, 0]:
        #    intx = fn % 4
        #    v_dev = gsi.unassign_pt(dev, fn)
        #    if v_dev < 0:
        #        log.error("Can\'t unassign pv multi-function device "     \
        #                  "dev=0x%02x,func=0x%x" % (dev, fn))
        #        gsi.dump()
        #    v_dev = gsi.assign_pt_multi(dev, fn, intx)
        #    if v_dev < 0:
        #        log.error("Can\'t assign pv multi-function device "     \
        #                  "dev=0x%02x,func=0x%x,intx=0x%x" % (dev, fn, intx))
        #        gsi.dump()
        #    v_dev = gsi.unassign_pt(dev, fn)
        #    if v_dev < 0:
        #        log.error("Can\'t unassign pv multi-function device "     \
        #                  "dev=0x%02x,func=0x%x" % (dev, fn))
        #        gsi.dump()
        #    gsi.dump()

gsi.dump()

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-04 23:05                           ` Simon Horman
@ 2009-03-05  8:34                             ` Keir Fraser
  2009-03-05  9:05                               ` Simon Horman
  0 siblings, 1 reply; 62+ messages in thread
From: Keir Fraser @ 2009-03-05  8:34 UTC (permalink / raw)
  To: Simon Horman; +Cc: Yuji Shimada, xen-devel, Ian Jackson

On 04/03/2009 23:05, "Simon Horman" <horms@verge.net.au> wrote:

> I am guessing that your idea was not for xend to allocate all
> of those devfn and pass them to qemu-dm on the command line.

That would be reasonable. Does xend need to know or care about the virtual
devfn for any reason?

> My instinct is that it would really be easier to allocate devn -
> using something like the algorithm I describe below - in qemu-dm
> rather than xend.

The python version looks unfeasibly complicated. Perhaps this kind of thing
is easier implemented in C? :-)

 -- Keir

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-05  8:34                             ` Keir Fraser
@ 2009-03-05  9:05                               ` Simon Horman
  2009-03-05  9:22                                 ` Keir Fraser
  2009-03-05  9:26                                 ` Keir Fraser
  0 siblings, 2 replies; 62+ messages in thread
From: Simon Horman @ 2009-03-05  9:05 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Yuji Shimada, xen-devel, Ian Jackson

On Thu, Mar 05, 2009 at 08:34:57AM +0000, Keir Fraser wrote:
> On 04/03/2009 23:05, "Simon Horman" <horms@verge.net.au> wrote:
> 
> > I am guessing that your idea was not for xend to allocate all
> > of those devfn and pass them to qemu-dm on the command line.
> 
> That would be reasonable. Does xend need to know or care about the virtual
> devfn for any reason?

I don't think that xend cares about devfn at this time.

> > My instinct is that it would really be easier to allocate devn -
> > using something like the algorithm I describe below - in qemu-dm
> > rather than xend.
> 
> The python version looks unfeasibly complicated. Perhaps this kind of thing
> is easier implemented in C? :-)

I should note that was the first python code I have written
(other than the occasional bug fix here and there). I expect
that I could write a more reasonable implementation in C without
too much trouble. (I also expect I could write a more reasonable
python version if needed.)

Another reason that the code is complex is that the mapping
rules - expercially regarding sharing of GSI - seem complex to me.
And to be honest, I'm not sure that I have them correct yet.

Currently my assumptions are:
* ioemu devices may share GSI, though its best to limit sharing as
  much as possible
* pass-through devices may not share a GSI with another pass-through
  device, and its better if they don't share a GSI with an ioemu
  device either
* for multi-function devices, the device needs 4 consecutive GSI,
  so that INTA, B, C and D can be used by functions of the device.
  - this can probably be relaxed in many cases as its probably
    true that many multi-function devices don't use all of
    INTA, B, C and D. But my feeling is that accounting
    for this will add complexity rather than remove it.

* I have not accounted for MSI devices/functions at this time,
  but my understanding is that they don't need a GSI at all.
  So it should be just a matter of giving them a device
  and making sure nothing else uses it.


A question related to GSI that I would like your opinion on is
if devices need to be allocated sequentially. Currently the
code that allocates devfn (which is in qemu-dm to answer your
question from another email) tries to allocate them sequentially,
save a hole left for hotplug.

After studying the diagram that Shimada-san sent, I have concluded that in
order to reduce fragmentation of the GSI space (so that 4 continous GSI can
be allocated to multi-function devices as much as possible, assuming that
the number of pass-through devices is expanded from 2, which I would like
to do) it would be better to allocate every 8th device, which corresponds
to using intA for each GSI in turn.

GSI 16, INTA -> Dev 0
GSI 17, INTA -> Dev 8
GSI 18, INTA -> Dev 16
GSI 19, INTA -> Dev 24
GSI 20, INTA -> Dev 1
...

So instead of allocating by scanning the rows in the diagram,
we work down the columns.

http://lists.xensource.com/archives/html/xen-devel/2009-02/pngmIO1Sm0VEX.png

Or to give a more conrete example:
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 Class ff80: XenSource, Inc. Xen Platform Device (rev 01)
00: 4.0 Ethernet controller: Realtek Semiconductor Co., Ltd.  RTL-8139/8139C/8139C+ (rev 20)

Would become:
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 Ethernet controller: Realtek Semiconductor Co., Ltd.  RTL-8139/8139C/8139C+ (rev 20)
00:08.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:08.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:08.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
00:16.0 VGA compatible controller: Cirrus Logic GD 5446
00:24.0 Class ff80: XenSource, Inc. Xen Platform Device (rev 01)

I can't actually get a system with this mapping to boot
(which prehaps means its flawed for the first few devices).
But a fundamental problem (e.g. for users) with using
such an allocation stratergy?

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-05  9:05                               ` Simon Horman
@ 2009-03-05  9:22                                 ` Keir Fraser
  2009-03-05  9:42                                   ` Simon Horman
  2009-03-05  9:26                                 ` Keir Fraser
  1 sibling, 1 reply; 62+ messages in thread
From: Keir Fraser @ 2009-03-05  9:22 UTC (permalink / raw)
  To: Simon Horman; +Cc: Yuji Shimada, xen-devel, Ian Jackson

On 05/03/2009 09:05, "Simon Horman" <horms@verge.net.au> wrote:

> * pass-through devices may not share a GSI with another pass-through
>   device, and its better if they don't share a GSI with an ioemu
>   device either

Why is such sharing disallowed? Is this a problem with mapping multiple MSI
sources to a single level-triggered GSI? If we can reliably map one MSI to
an emulated GSI, I would have thought that wire-ORing them would be easy,
but perhaps the emulation is dodgy to begin with, even without sharing?

> I can't actually get a system with this mapping to boot
> (which prehaps means its flawed for the first few devices).
> But a fundamental problem (e.g. for users) with using
> such an allocation stratergy?

Yes, a sparse mapping into the devfn space is absolutely fine. Indeed you'll
be needing to consider all that space to make efficient use of the currently
available 32 GSIs.

 -- Keir

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-05  9:05                               ` Simon Horman
  2009-03-05  9:22                                 ` Keir Fraser
@ 2009-03-05  9:26                                 ` Keir Fraser
  2009-03-05  9:31                                   ` Jiang, Yunhong
  2009-03-05  9:41                                   ` Simon Horman
  1 sibling, 2 replies; 62+ messages in thread
From: Keir Fraser @ 2009-03-05  9:26 UTC (permalink / raw)
  To: Simon Horman; +Cc: Yuji Shimada, xen-devel, Ian Jackson

On 05/03/2009 09:05, "Simon Horman" <horms@verge.net.au> wrote:

> * I have not accounted for MSI devices/functions at this time,
>   but my understanding is that they don't need a GSI at all.
>   So it should be just a matter of giving them a device
>   and making sure nothing else uses it.

Depending on how critical your no-sharing-between-passthru-devices is, you
might need to be careful with this. Just because a device is MSI-capable
doesn't mean the guest OS will use it (I don't know how common that would be
in practice though!) -- generally there will be a legacy INTx operation mode
which you might want to consider allocating GSIs for. Just a point of
information.

 -- Keir

^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-05  9:26                                 ` Keir Fraser
@ 2009-03-05  9:31                                   ` Jiang, Yunhong
  2009-03-05  9:57                                     ` Simon Horman
  2009-03-05  9:41                                   ` Simon Horman
  1 sibling, 1 reply; 62+ messages in thread
From: Jiang, Yunhong @ 2009-03-05  9:31 UTC (permalink / raw)
  To: Keir Fraser, Simon Horman; +Cc: Yuji Shimada, xen-devel, Ian Jackson

xen-devel-bounces@lists.xensource.com <> wrote:
> On 05/03/2009 09:05, "Simon Horman" <horms@verge.net.au> wrote:
> 
>> * I have not accounted for MSI devices/functions at this time,
>>   but my understanding is that they don't need a GSI at all.
>>   So it should be just a matter of giving them a device
>>   and making sure nothing else uses it.
> 
> Depending on how critical your
> no-sharing-between-passthru-devices is, you
> might need to be careful with this. Just because a device is MSI-capable
> doesn't mean the guest OS will use it (I don't know how common
> that would be

I think most OS will support a option to disable MSI, like Vista or Linux.

Thanks
Yunhong Jiang

> in practice though!) -- generally there will be a legacy INTx operation mode
> which you might want to consider allocating GSIs for. Just a point of
> information. 
> 
> -- Keir
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-05  9:26                                 ` Keir Fraser
  2009-03-05  9:31                                   ` Jiang, Yunhong
@ 2009-03-05  9:41                                   ` Simon Horman
  1 sibling, 0 replies; 62+ messages in thread
From: Simon Horman @ 2009-03-05  9:41 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Yuji Shimada, xen-devel, Ian Jackson

On Thu, Mar 05, 2009 at 09:26:18AM +0000, Keir Fraser wrote:
> On 05/03/2009 09:05, "Simon Horman" <horms@verge.net.au> wrote:
> 
> > * I have not accounted for MSI devices/functions at this time,
> >   but my understanding is that they don't need a GSI at all.
> >   So it should be just a matter of giving them a device
> >   and making sure nothing else uses it.
> 
> Depending on how critical your no-sharing-between-passthru-devices is, you
> might need to be careful with this. Just because a device is MSI-capable
> doesn't mean the guest OS will use it (I don't know how common that would be
> in practice though!) -- generally there will be a legacy INTx operation mode
> which you might want to consider allocating GSIs for. Just a point of
> information.

That is a very good point. I suspect that will mean that we need
to assume that INTx will be used and thus MSI won't be a special case.

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-05  9:22                                 ` Keir Fraser
@ 2009-03-05  9:42                                   ` Simon Horman
  2009-03-06  1:29                                     ` Yuji Shimada
  0 siblings, 1 reply; 62+ messages in thread
From: Simon Horman @ 2009-03-05  9:42 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Yuji Shimada, xen-devel, Ian Jackson

On Thu, Mar 05, 2009 at 09:22:28AM +0000, Keir Fraser wrote:
> On 05/03/2009 09:05, "Simon Horman" <horms@verge.net.au> wrote:
> 
> > * pass-through devices may not share a GSI with another pass-through
> >   device, and its better if they don't share a GSI with an ioemu
> >   device either
> 
> Why is such sharing disallowed? Is this a problem with mapping multiple MSI
> sources to a single level-triggered GSI? If we can reliably map one MSI to
> an emulated GSI, I would have thought that wire-ORing them would be easy,
> but perhaps the emulation is dodgy to begin with, even without sharing?

Shimada-san mentioned it in a previous post, hopefully he can explain further.

> > I can't actually get a system with this mapping to boot
> > (which prehaps means its flawed for the first few devices).
> > But a fundamental problem (e.g. for users) with using
> > such an allocation stratergy?
> 
> Yes, a sparse mapping into the devfn space is absolutely fine. Indeed you'll
> be needing to consider all that space to make efficient use of the currently
> available 32 GSIs.

Agreed

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-05  9:31                                   ` Jiang, Yunhong
@ 2009-03-05  9:57                                     ` Simon Horman
  2009-03-05 10:13                                       ` Simon Horman
  2009-03-05 14:47                                       ` Jiang, Yunhong
  0 siblings, 2 replies; 62+ messages in thread
From: Simon Horman @ 2009-03-05  9:57 UTC (permalink / raw)
  To: Jiang, Yunhong; +Cc: Yuji Shimada, xen-devel, Ian Jackson, Keir Fraser

On Thu, Mar 05, 2009 at 05:31:15PM +0800, Jiang, Yunhong wrote:
> xen-devel-bounces@lists.xensource.com <> wrote:
> > On 05/03/2009 09:05, "Simon Horman" <horms@verge.net.au> wrote:
> > 
> >> * I have not accounted for MSI devices/functions at this time,
> >>   but my understanding is that they don't need a GSI at all.
> >>   So it should be just a matter of giving them a device
> >>   and making sure nothing else uses it.
> > 
> > Depending on how critical your
> > no-sharing-between-passthru-devices is, you
> > might need to be careful with this. Just because a device is MSI-capable
> > doesn't mean the guest OS will use it (I don't know how common
> > that would be
> 
> I think most OS will support a option to disable MSI, like Vista or Linux.

So it is entirely possible that MSI won't be used on an MSI-capable
function?  Is this true of SR-IOV (virtual) functions too?

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-05  9:57                                     ` Simon Horman
@ 2009-03-05 10:13                                       ` Simon Horman
  2009-03-05 12:44                                         ` Keir Fraser
  2009-03-05 14:47                                       ` Jiang, Yunhong
  1 sibling, 1 reply; 62+ messages in thread
From: Simon Horman @ 2009-03-05 10:13 UTC (permalink / raw)
  To: Jiang, Yunhong; +Cc: Yuji Shimada, xen-devel, Ian Jackson, Keir Fraser

On Thu, Mar 05, 2009 at 08:57:20PM +1100, Simon Horman wrote:
> On Thu, Mar 05, 2009 at 05:31:15PM +0800, Jiang, Yunhong wrote:
> > xen-devel-bounces@lists.xensource.com <> wrote:
> > > On 05/03/2009 09:05, "Simon Horman" <horms@verge.net.au> wrote:
> > > 
> > >> * I have not accounted for MSI devices/functions at this time,
> > >>   but my understanding is that they don't need a GSI at all.
> > >>   So it should be just a matter of giving them a device
> > >>   and making sure nothing else uses it.
> > > 
> > > Depending on how critical your
> > > no-sharing-between-passthru-devices is, you
> > > might need to be careful with this. Just because a device is MSI-capable
> > > doesn't mean the guest OS will use it (I don't know how common
> > > that would be
> > 
> > I think most OS will support a option to disable MSI, like Vista or Linux.
> 
> So it is entirely possible that MSI won't be used on an MSI-capable
> function?  Is this true of SR-IOV (virtual) functions too?

Thinking more. One of the arguments surounding why 32 GSI is enough,
is that systems with more functions than that can reasonably be
expected to be using MSI. And thus will not actually use more than 32 GSI.

I agree with that argument. But in terms of reserving GSI, it seems to
break down if we can never know if a function will actually use MSI, and
thus need to reserve a GSI just in case.

Perhaps this is a case of find a real-world example and then
we will worry about it?

Or perhaps hints are required from xend and in turn the user - its ok, MSI
will be used for this function, so there is no need to reserve a GSI for it.

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-05 10:13                                       ` Simon Horman
@ 2009-03-05 12:44                                         ` Keir Fraser
  0 siblings, 0 replies; 62+ messages in thread
From: Keir Fraser @ 2009-03-05 12:44 UTC (permalink / raw)
  To: Simon Horman, Jiang, Yunhong; +Cc: Yuji Shimada, xen-devel, Ian Jackson

On 05/03/2009 10:13, "Simon Horman" <horms@verge.net.au> wrote:

>> So it is entirely possible that MSI won't be used on an MSI-capable
>> function?  Is this true of SR-IOV (virtual) functions too?
> 
> Thinking more. One of the arguments surounding why 32 GSI is enough,
> is that systems with more functions than that can reasonably be
> expected to be using MSI. And thus will not actually use more than 32 GSI.

I think that is a reasonable assumption. But the fallback to GSI does need
to work, and I wonder why sharing of GSIs among passed-thru devices needs to
be absolutely disallowed? We'd have an easier time if it worked. ;-)

 -- Keir

^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-05  9:57                                     ` Simon Horman
  2009-03-05 10:13                                       ` Simon Horman
@ 2009-03-05 14:47                                       ` Jiang, Yunhong
  1 sibling, 0 replies; 62+ messages in thread
From: Jiang, Yunhong @ 2009-03-05 14:47 UTC (permalink / raw)
  To: Simon Horman; +Cc: Yuji Shimada, xen-devel, Ian Jackson, Keir Fraser

Simon Horman <mailto:horms@verge.net.au> wrote:
> On Thu, Mar 05, 2009 at 05:31:15PM +0800, Jiang, Yunhong wrote:
>> xen-devel-bounces@lists.xensource.com <> wrote:
>>> On 05/03/2009 09:05, "Simon Horman" <horms@verge.net.au> wrote:
>>> 
>>>> * I have not accounted for MSI devices/functions at this time,
>>>>   but my understanding is that they don't need a GSI at all.
>>>>   So it should be just a matter of giving them a device
>>>>   and making sure nothing else uses it.
>>> 
>>> Depending on how critical your
>>> no-sharing-between-passthru-devices is, you
>>> might need to be careful with this. Just because a device is MSI-capable
>>> doesn't mean the guest OS will use it (I don't know how common
>>> that would be
>> 
>> I think most OS will support a option to disable MSI, like Vista or Linux.
> 
> So it is entirely possible that MSI won't be used on an MSI-capable
> function?  Is this true of SR-IOV (virtual) functions too?

I rememeber SR-IOV VF requires  MSI support. Maybe that means we can't disable MSI if we need SR-IOV.

Thanks
Yunhong Jiang

> 
> --
> Simon Horman
>  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
>  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-05  9:42                                   ` Simon Horman
@ 2009-03-06  1:29                                     ` Yuji Shimada
  2009-03-06  2:46                                       ` Simon Horman
  2009-03-06  8:15                                       ` Keir Fraser
  0 siblings, 2 replies; 62+ messages in thread
From: Yuji Shimada @ 2009-03-06  1:29 UTC (permalink / raw)
  To: Simon Horman; +Cc: xen-devel, Ian Jackson, Keir Fraser

On Thu, 5 Mar 2009 20:42:32 +1100
Simon Horman <horms@verge.net.au> wrote:

> On Thu, Mar 05, 2009 at 09:22:28AM +0000, Keir Fraser wrote:
> > On 05/03/2009 09:05, "Simon Horman" <horms@verge.net.au> wrote:
> > 
> > > * pass-through devices may not share a GSI with another pass-through
> > >   device, and its better if they don't share a GSI with an ioemu
> > >   device either
> > 
> > Why is such sharing disallowed? Is this a problem with mapping multiple MSI
> > sources to a single level-triggered GSI? If we can reliably map one MSI to
> > an emulated GSI, I would have thought that wire-ORing them would be easy,
> > but perhaps the emulation is dodgy to begin with, even without sharing?
> 
> Shimada-san mentioned it in a previous post, hopefully he can explain further.

Please read hvm_dpci_eoi in xen/drivers/passthrough/io.c.

void hvm_dpci_eoi(struct domain *d, unsigned int guest_gsi,
                  union vioapic_redir_entry *ent)
{
    ....
    device = hvm_irq_dpci->girq[guest_gsi].device;
    intx = hvm_irq_dpci->girq[guest_gsi].intx;
    hvm_pci_intx_deassert(d, device, intx);

    machine_gsi = hvm_irq_dpci->girq[guest_gsi].machine_gsi;

Current hypervisor assumes only one pass-throughed device is connected
to a guest gsi.

It is possible to enhance to connect more than one pass-throughed
device to a guest gsi, I think. If expanding guest GSI to 143 is not
acceptable, sharing guest GSI between pass-throughed devices is one
approach.


In my understanding, Windows 2003 does not support MSI/MSI-X. So
supporting INTx is still important, I think.

Thanks,
--
Yuji Shimada

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-06  1:29                                     ` Yuji Shimada
@ 2009-03-06  2:46                                       ` Simon Horman
  2009-03-06  6:55                                         ` Yuji Shimada
  2009-03-06  8:15                                       ` Keir Fraser
  1 sibling, 1 reply; 62+ messages in thread
From: Simon Horman @ 2009-03-06  2:46 UTC (permalink / raw)
  To: Yuji Shimada; +Cc: xen-devel, Ian Jackson, Keir Fraser

On Fri, Mar 06, 2009 at 10:29:59AM +0900, Yuji Shimada wrote:
> On Thu, 5 Mar 2009 20:42:32 +1100
> Simon Horman <horms@verge.net.au> wrote:
> 
> > On Thu, Mar 05, 2009 at 09:22:28AM +0000, Keir Fraser wrote:
> > > On 05/03/2009 09:05, "Simon Horman" <horms@verge.net.au> wrote:
> > > 
> > > > * pass-through devices may not share a GSI with another pass-through
> > > >   device, and its better if they don't share a GSI with an ioemu
> > > >   device either
> > > 
> > > Why is such sharing disallowed? Is this a problem with mapping multiple MSI
> > > sources to a single level-triggered GSI? If we can reliably map one MSI to
> > > an emulated GSI, I would have thought that wire-ORing them would be easy,
> > > but perhaps the emulation is dodgy to begin with, even without sharing?
> > 
> > Shimada-san mentioned it in a previous post, hopefully he can explain further.
> 
> Please read hvm_dpci_eoi in xen/drivers/passthrough/io.c.
> 
> void hvm_dpci_eoi(struct domain *d, unsigned int guest_gsi,
>                   union vioapic_redir_entry *ent)
> {
>     ....
>     device = hvm_irq_dpci->girq[guest_gsi].device;
>     intx = hvm_irq_dpci->girq[guest_gsi].intx;
>     hvm_pci_intx_deassert(d, device, intx);
> 
>     machine_gsi = hvm_irq_dpci->girq[guest_gsi].machine_gsi;
> 
> Current hypervisor assumes only one pass-throughed device is connected
> to a guest gsi.

Hi Shimada-san,

Thanks, that is very helpful.

> It is possible to enhance to connect more than one pass-throughed
> device to a guest gsi, I think. If expanding guest GSI to 143 is not
> acceptable, sharing guest GSI between pass-throughed devices is one
> approach.

If we were to take the approach of allowing shared GSI,
then I guess a naieve method would be to:

1. Expand  hvm_irq_dpci->girq[guest_gsi] to allow it to
   have a list of devices rather than a single device

2. Have hvm_dpci_eoi() loop through each of these devices
   - A possible optimisation (suggested to me by Carsten Haitzler)
     + Have a mechanism so that we know which device
       actually handled the interrupt, and move it to the top
       of the list so that it is tried first next time.
       Assuming busy devices are likely to stay busy.

On the other hand, looking at expanding the number of guest GSI.
Are these actually mapped to hardware GSI, other than through
machine_gsi = hvm_irq_dpci->girq[guest_gsi].machine_gsi ?
I ask because Keir mentioned that one reason for not extending
the range is a worry that some HW won't be able to cope with that many GSI.

> In my understanding, Windows 2003 does not support MSI/MSI-X. So
> supporting INTx is still important, I think.

Understood.

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-06  2:46                                       ` Simon Horman
@ 2009-03-06  6:55                                         ` Yuji Shimada
  2009-03-06  8:19                                           ` Keir Fraser
  0 siblings, 1 reply; 62+ messages in thread
From: Yuji Shimada @ 2009-03-06  6:55 UTC (permalink / raw)
  To: Simon Horman; +Cc: xen-devel, Ian Jackson, Keir Fraser

Hi Simon,

On Fri, 6 Mar 2009 13:46:40 +1100
Simon Horman <horms@verge.net.au> wrote:
> If we were to take the approach of allowing shared GSI,
> then I guess a naieve method would be to:
> 
> 1. Expand  hvm_irq_dpci->girq[guest_gsi] to allow it to
>    have a list of devices rather than a single device
> 
> 2. Have hvm_dpci_eoi() loop through each of these devices

I thought the same thing.

>    - A possible optimisation (suggested to me by Carsten Haitzler)
>      + Have a mechanism so that we know which device
>        actually handled the interrupt, and move it to the top
>        of the list so that it is tried first next time.
>        Assuming busy devices are likely to stay busy.

This does not seems to the optimization of sharing guest gsi for me.
This is optimization of sharing machine gsi, isn't it?

> On the other hand, looking at expanding the number of guest GSI.
> Are these actually mapped to hardware GSI, other than through
> machine_gsi = hvm_irq_dpci->girq[guest_gsi].machine_gsi ?

No, they are.

> I ask because Keir mentioned that one reason for not extending
> the range is a worry that some HW won't be able to cope with that many GSI.

"version register" of APIC in ICH9 has "maximum redirection entries"
field.  The field is equal to the number of interrupt input pins minus
one and is in the range 0 through 239. So I think virtual GSI can be
extended.

ICH9 datasheet: http://www.intel.com/assets/pdf/datasheet/316972.pdf


But I have not seen such big I/O APIC. So it is possible that guest OS
can't handle big virtual I/O APIC.

Thanks,
--
Yuji Shimada

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-06  1:29                                     ` Yuji Shimada
  2009-03-06  2:46                                       ` Simon Horman
@ 2009-03-06  8:15                                       ` Keir Fraser
  1 sibling, 0 replies; 62+ messages in thread
From: Keir Fraser @ 2009-03-06  8:15 UTC (permalink / raw)
  To: Yuji Shimada, Simon Horman; +Cc: xen-devel, Ian Jackson

On 06/03/2009 01:29, "Yuji Shimada" <shimada-yxb@necst.nec.co.jp> wrote:

> 
> Current hypervisor assumes only one pass-throughed device is connected
> to a guest gsi.
> 
> It is possible to enhance to connect more than one pass-throughed
> device to a guest gsi, I think. If expanding guest GSI to 143 is not
> acceptable, sharing guest GSI between pass-throughed devices is one
> approach.

I think we should support sharing. How hard is that? Linked lists in place
of singleton values and a few while loops. Conceptually speaking. :-)

 -- Keir

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough
  2009-03-06  6:55                                         ` Yuji Shimada
@ 2009-03-06  8:19                                           ` Keir Fraser
  0 siblings, 0 replies; 62+ messages in thread
From: Keir Fraser @ 2009-03-06  8:19 UTC (permalink / raw)
  To: Yuji Shimada, Simon Horman; +Cc: xen-devel, Ian Jackson

On 06/03/2009 06:55, "Yuji Shimada" <shimada-yxb@necst.nec.co.jp> wrote:

>> I ask because Keir mentioned that one reason for not extending
>> the range is a worry that some HW won't be able to cope with that many GSI.
> 
> "version register" of APIC in ICH9 has "maximum redirection entries"
> field.  The field is equal to the number of interrupt input pins minus
> one and is in the range 0 through 239. So I think virtual GSI can be
> extended.
> 
> ICH9 datasheet: http://www.intel.com/assets/pdf/datasheet/316972.pdf
> 
> But I have not seen such big I/O APIC. So it is possible that guest OS
> can't handle big virtual I/O APIC.

I really think we should support sharing, and distributing devices
efficiently across devfn space. Then think about changes like this with
compatibility and versioning issues. But you know my position already!

 -- Keir

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2009-03-06  8:19 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-02-17  9:07 [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough Simon Horman
2009-02-17  9:07 ` [rfc 01/18] Make register_real_device() and unregister_real_device() static Simon Horman
2009-02-17  9:07 ` [rfc 02/18] Make dpci_infos static Simon Horman
2009-02-17  9:07 ` [rfc 03/18] ioemu: vslots needs to be freed on error in power_off_php_slot() Simon Horman
2009-02-17  9:07 ` [rfc 04/18] ioemu: Remove lsi_scsi_init()s devfn parameter as it is always passed as -1 Simon Horman
2009-02-17  9:07 ` [rfc 05/18] " Simon Horman
2009-02-17  9:07 ` [rfc 06/18] ioemu: Remove usb_ohci_init*()s devfn parameter as they are " Simon Horman
2009-02-17  9:07 ` [rfc 07/18] iommu: Define PCI_DEVFN_AUTO and use it Simon Horman
2009-02-17  9:07 ` [rfc 08/18] iommu: Use PCI_DEVFN to create devfn numbers Simon Horman
2009-02-17  9:07 ` [rfc 09/18] ioemu: piix4acpi.c: make various variables static Simon Horman
2009-02-17  9:07 ` [rfc 10/18] ioemu: piix4acpi.c: Simplfy PHPSlots structure Simon Horman
2009-02-17  9:07 ` [rfc 11/18] ioemu: piix4acpi.c: Consistently dont cast opaque to PHPSlots Simon Horman
2009-02-17  9:08 ` [rfc 12/18] ioemu: piix4acpi.c: remove unnecessary assignment of pci_slots to local variables Simon Horman
2009-02-17  9:08 ` [rfc 13/18] ioemu: piix4acpi.c: remove ACPI_PHP_SLOT_NUM Simon Horman
2009-02-17  9:08 ` [rfc 14/18] ioemu: use devfn instead of slots as the unit for passthrough Simon Horman
2009-02-17  9:08 ` [rfc 15/18] ioemu: use struct php_dev to pass around PCI pass-through assignment parameters Simon Horman
2009-02-17  9:08 ` [rfc 16/18] ioemu: non-destructive parsing of PCI assignement strings Simon Horman
2009-02-17  9:08 ` [rfc 17/18] ioemu: sort pass-through PCI devices before inserting Simon Horman
2009-02-17  9:08 ` [rfc 18/18] ioemu: Allow virtual function to be speficied for PCI pass-through Simon Horman
2009-02-17 12:03 ` [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough Ian Jackson
2009-02-17 22:24   ` Simon Horman
2009-02-18  3:12 ` Yuji Shimada
2009-02-19  6:15   ` Simon Horman
2009-02-19  9:21     ` Yuji Shimada
2009-02-19  9:38       ` Keir Fraser
2009-02-20  7:07         ` Simon Horman
2009-02-23  6:24           ` Yuji Shimada
2009-02-23  6:55             ` Simon Horman
2009-02-23  8:39               ` Yuji Shimada
2009-02-23  9:33                 ` Simon Horman
2009-02-23 11:31               ` Keir Fraser
2009-02-23 22:18                 ` Simon Horman
2009-03-02  4:14                   ` Simon Horman
2009-03-02  8:44                     ` Keir Fraser
2009-03-02  9:53                       ` Simon Horman
2009-03-02 10:08                         ` Keir Fraser
2009-03-02 11:25                           ` Simon Horman
2009-03-02 11:33                             ` Keir Fraser
2009-03-03  5:57                               ` Yuji Shimada
2009-03-03  8:56                                 ` Keir Fraser
2009-03-04 22:26                                 ` Simon Horman
2009-03-04 22:32                                   ` Keir Fraser
2009-03-04 22:53                                     ` Simon Horman
2009-03-04 23:05                           ` Simon Horman
2009-03-05  8:34                             ` Keir Fraser
2009-03-05  9:05                               ` Simon Horman
2009-03-05  9:22                                 ` Keir Fraser
2009-03-05  9:42                                   ` Simon Horman
2009-03-06  1:29                                     ` Yuji Shimada
2009-03-06  2:46                                       ` Simon Horman
2009-03-06  6:55                                         ` Yuji Shimada
2009-03-06  8:19                                           ` Keir Fraser
2009-03-06  8:15                                       ` Keir Fraser
2009-03-05  9:26                                 ` Keir Fraser
2009-03-05  9:31                                   ` Jiang, Yunhong
2009-03-05  9:57                                     ` Simon Horman
2009-03-05 10:13                                       ` Simon Horman
2009-03-05 12:44                                         ` Keir Fraser
2009-03-05 14:47                                       ` Jiang, Yunhong
2009-03-05  9:41                                   ` Simon Horman
2009-02-24  1:29                 ` Ian Pratt
2009-02-24  1:50                   ` Simon Horman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.