All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support
@ 2012-01-28 14:21 Alex Williamson
  2012-01-28 14:21 ` [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest Alex Williamson
                   ` (10 more replies)
  0 siblings, 11 replies; 54+ messages in thread
From: Alex Williamson @ 2012-01-28 14:21 UTC (permalink / raw)
  To: kvm; +Cc: alex.williamson, jan.kiszka, shashidhar.patil

Patch 1 & 2 here are independent of the rest, but I include them
here to avoid conflicts.  The first patch enables exposing MMIO
BARs as their native width to the guest.  I added a config option
for this with the default to use the existing behavior as I
suspect we may have some latent issues there.  Patch 2 is just
some trivial debug build warning fixes.

The rest of the patches work on improving MSI-X table support.
Particularly, vectors can now be updated by the guest after
MSI-X is enabled to support things like irqbalance for SMP
affinity tuning.  We also now update MSI-X configuration as
new vectors are unmasked, which enables assignment of MSI-X
devices on FreeBSD.  I was able to assign and use an 82576
(PF & VF) on a FreeBSD 9.0 guest with this series.  Hopefully
Shashidhar can report whether this improves the behavior he
as seeing with an 82599.

I wasn't able to get masking to work reliably, so I left that
as is for now.  Perhaps someone has suggestions on getting that
to work.  Thanks,

Alex

---

Alex Williamson (9):
      pci-assign: Update MSI-X config based on table writes
      pci-assign: Use MSIX_PAGE_SIZE
      pci-assign: Allocate entries for all MSI-X vectors
      pci-assign: Proper initialization for MSI-X table
      pci-assign: Only calculate maximum MSI-X vector entries once
      pci-assign: Use struct for MSI-X table
      pci-assign: Update MSI-X MMIO to Memory API
      pci-assign: Fix warnings with DEBUG enabled
      pci-assign: Optionally enable 64bit BARs in guest


 hw/device-assignment.c |  272 ++++++++++++++++++++++++++++++------------------
 hw/device-assignment.h |   12 ++
 2 files changed, 179 insertions(+), 105 deletions(-)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest
  2012-01-28 14:21 [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support Alex Williamson
@ 2012-01-28 14:21 ` Alex Williamson
  2012-01-31 12:40   ` Avi Kivity
  2012-01-28 14:21 ` [PATCH 2/9] pci-assign: Fix warnings with DEBUG enabled Alex Williamson
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 54+ messages in thread
From: Alex Williamson @ 2012-01-28 14:21 UTC (permalink / raw)
  To: kvm; +Cc: alex.williamson, jan.kiszka, shashidhar.patil

To date we've only exposed BARs as 32bit even if the device
physically supports 64bit BARs.  Enable 64bit BARs to be
exposed as such in the guest, which may free up MMIO below
4G should the guest choose to use it.

This adds a new mem64= option to pci-assign, with the
default being off for testing and enablement.  The goal
is to eventually make this enabled by default.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 hw/device-assignment.c |   13 +++++++++++--
 hw/device-assignment.h |    2 ++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 7f4a5ec..07ae0a0 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -49,6 +49,7 @@
 #define IORESOURCE_IRQ      0x00000400
 #define IORESOURCE_DMA      0x00000800
 #define IORESOURCE_PREFETCH 0x00002000  /* No side effects */
+#define IORESOURCE_MEM_64   0x00100000
 
 /* #define DEVICE_ASSIGNMENT_DEBUG 1 */
 
@@ -377,6 +378,11 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
                 ? PCI_BASE_ADDRESS_MEM_PREFETCH
                 : PCI_BASE_ADDRESS_SPACE_MEMORY;
 
+            if ((pci_dev->features & ASSIGNED_DEVICE_USE_MEM64_MASK) &&
+                (cur_region->type & IORESOURCE_MEM_64)) {
+                t |= PCI_BASE_ADDRESS_MEM_TYPE_64;
+            }
+
             /* map physical memory */
             pci_dev->v_addrs[i].u.r_virtbase = mmap(NULL, cur_region->size,
                                                     PROT_WRITE | PROT_READ,
@@ -569,8 +575,9 @@ again:
         rp->valid = 0;
         rp->resource_fd = -1;
         size = end - start + 1;
-        flags &= IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH;
-        if (size == 0 || (flags & ~IORESOURCE_PREFETCH) == 0)
+        flags &= IORESOURCE_IO | IORESOURCE_MEM |
+                 IORESOURCE_PREFETCH | IORESOURCE_MEM_64;
+        if (size == 0 || !(flags & (IORESOURCE_IO | IORESOURCE_MEM)))
             continue;
         if (flags & IORESOURCE_MEM) {
             flags &= ~IORESOURCE_IO;
@@ -1701,6 +1708,8 @@ static PCIDeviceInfo assign_info = {
                         ASSIGNED_DEVICE_USE_IOMMU_BIT, true),
         DEFINE_PROP_BIT("prefer_msi", AssignedDevice, features,
                         ASSIGNED_DEVICE_PREFER_MSI_BIT, true),
+        DEFINE_PROP_BIT("mem64", AssignedDevice, features,
+                        ASSIGNED_DEVICE_USE_MEM64_BIT, false),
         DEFINE_PROP_INT32("bootindex", AssignedDevice, bootindex, -1),
         DEFINE_PROP_STRING("configfd", AssignedDevice, configfd_name),
         DEFINE_PROP_END_OF_LIST(),
diff --git a/hw/device-assignment.h b/hw/device-assignment.h
index 1b4aecc..8ee2274 100644
--- a/hw/device-assignment.h
+++ b/hw/device-assignment.h
@@ -76,9 +76,11 @@ typedef struct {
 
 #define ASSIGNED_DEVICE_USE_IOMMU_BIT	0
 #define ASSIGNED_DEVICE_PREFER_MSI_BIT	1
+#define ASSIGNED_DEVICE_USE_MEM64_BIT	2
 
 #define ASSIGNED_DEVICE_USE_IOMMU_MASK	(1 << ASSIGNED_DEVICE_USE_IOMMU_BIT)
 #define ASSIGNED_DEVICE_PREFER_MSI_MASK	(1 << ASSIGNED_DEVICE_PREFER_MSI_BIT)
+#define ASSIGNED_DEVICE_USE_MEM64_MASK	(1 << ASSIGNED_DEVICE_USE_MEM64_BIT)
 
 typedef struct AssignedDevice {
     PCIDevice dev;


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 2/9] pci-assign: Fix warnings with DEBUG enabled
  2012-01-28 14:21 [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support Alex Williamson
  2012-01-28 14:21 ` [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest Alex Williamson
@ 2012-01-28 14:21 ` Alex Williamson
  2012-01-28 14:21 ` [PATCH 3/9] pci-assign: Update MSI-X MMIO to Memory API Alex Williamson
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 54+ messages in thread
From: Alex Williamson @ 2012-01-28 14:21 UTC (permalink / raw)
  To: kvm; +Cc: alex.williamson, jan.kiszka, shashidhar.patil

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 hw/device-assignment.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 07ae0a0..b176375 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -75,7 +75,7 @@ static uint64_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
 
     if (fd >= 0) {
         if (data) {
-            DEBUG("pwrite data=%x, size=%d, e_phys=%x, addr=%x\n",
+            DEBUG("pwrite data=%lx, size=%d, e_phys=%lx, addr=%lx\n",
                   *data, size, addr, addr);
             if (pwrite(fd, data, size, addr) != size) {
                 fprintf(stderr, "%s - pwrite failed %s\n",
@@ -87,14 +87,14 @@ static uint64_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
                         __func__, strerror(errno));
                 val = (1UL << (size * 8)) - 1;
             }
-            DEBUG("pread val=%x, size=%d, e_phys=%x, addr=%x\n",
+            DEBUG("pread val=%lx, size=%d, e_phys=%lx, addr=%lx\n",
                   val, size, addr, addr);
         }
     } else {
         uint32_t port = addr + dev_region->u.r_baseport;
 
         if (data) {
-            DEBUG("out data=%x, size=%d, e_phys=%x, host=%x\n",
+            DEBUG("out data=%lx, size=%d, e_phys=%lx, host=%x\n",
                   *data, size, addr, port);
             switch (size) {
                 case 1:
@@ -119,7 +119,7 @@ static uint64_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
                     val = inl(port);
                     break;
             }
-            DEBUG("in data=%x, size=%d, e_phys=%x, host=%x\n",
+            DEBUG("in data=%lx, size=%d, e_phys=%lx, host=%x\n",
                   val, size, addr, port);
         }
     }


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 3/9] pci-assign: Update MSI-X MMIO to Memory API
  2012-01-28 14:21 [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support Alex Williamson
  2012-01-28 14:21 ` [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest Alex Williamson
  2012-01-28 14:21 ` [PATCH 2/9] pci-assign: Fix warnings with DEBUG enabled Alex Williamson
@ 2012-01-28 14:21 ` Alex Williamson
  2012-01-31 12:45   ` Avi Kivity
  2012-01-28 14:21 ` [PATCH 4/9] pci-assign: Use struct for MSI-X table Alex Williamson
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 54+ messages in thread
From: Alex Williamson @ 2012-01-28 14:21 UTC (permalink / raw)
  To: kvm; +Cc: alex.williamson, jan.kiszka, shashidhar.patil

Stop using compatibility mode and at the same time fix available
access sizes.  The PCI spec indicates that the MSI-X table may
only be accessed as DWORD or QWORD.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 hw/device-assignment.c |   56 +++++++++++++-----------------------------------
 1 files changed, 15 insertions(+), 41 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index b176375..1fc7ffa 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1437,62 +1437,36 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
     return 0;
 }
 
-static uint32_t msix_mmio_readl(void *opaque, target_phys_addr_t addr)
+static uint64_t msix_mmio_read(void *opaque, target_phys_addr_t addr,
+                               unsigned size)
 {
     AssignedDevice *adev = opaque;
-    unsigned int offset = addr & 0xfff;
-    void *page = adev->msix_table_page;
-    uint32_t val = 0;
+    uint64_t val;
 
-    memcpy(&val, (void *)((char *)page + offset), 4);
+    memcpy(&val, (void *)((uint8_t *)adev->msix_table_page + addr), size);
 
     return val;
 }
 
-static uint32_t msix_mmio_readb(void *opaque, target_phys_addr_t addr)
-{
-    return ((msix_mmio_readl(opaque, addr & ~3)) >>
-            (8 * (addr & 3))) & 0xff;
-}
-
-static uint32_t msix_mmio_readw(void *opaque, target_phys_addr_t addr)
-{
-    return ((msix_mmio_readl(opaque, addr & ~3)) >>
-            (8 * (addr & 3))) & 0xffff;
-}
-
-static void msix_mmio_writel(void *opaque,
-                             target_phys_addr_t addr, uint32_t val)
+static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
+                            uint64_t val, unsigned size)
 {
     AssignedDevice *adev = opaque;
-    unsigned int offset = addr & 0xfff;
-    void *page = adev->msix_table_page;
-
-    DEBUG("write to MSI-X entry table mmio offset 0x%lx, val 0x%x\n",
-		    addr, val);
-    memcpy((void *)((char *)page + offset), &val, 4);
-}
 
-static void msix_mmio_writew(void *opaque,
-                             target_phys_addr_t addr, uint32_t val)
-{
-    msix_mmio_writel(opaque, addr & ~3,
-                     (val & 0xffff) << (8*(addr & 3)));
-}
+    DEBUG("write to MSI-X entry table mmio offset 0x%lx, val 0x%lx\n",
+          addr, val);
 
-static void msix_mmio_writeb(void *opaque,
-                             target_phys_addr_t addr, uint32_t val)
-{
-    msix_mmio_writel(opaque, addr & ~3,
-                     (val & 0xff) << (8*(addr & 3)));
+    memcpy((void *)((uint8_t *)adev->msix_table_page + addr), &val, size);
 }
 
 static const MemoryRegionOps msix_mmio_ops = {
-    .old_mmio = {
-        .read = { msix_mmio_readb, msix_mmio_readw, msix_mmio_readl, },
-        .write = { msix_mmio_writeb, msix_mmio_writew, msix_mmio_writel, },
-    },
+    .read = msix_mmio_read,
+    .write = msix_mmio_write,
     .endianness = DEVICE_NATIVE_ENDIAN,
+    .impl = {
+        .min_access_size = 4,
+        .max_access_size = 8,
+    },
 };
 
 static int assigned_dev_register_msix_mmio(AssignedDevice *dev)


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 4/9] pci-assign: Use struct for MSI-X table
  2012-01-28 14:21 [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support Alex Williamson
                   ` (2 preceding siblings ...)
  2012-01-28 14:21 ` [PATCH 3/9] pci-assign: Update MSI-X MMIO to Memory API Alex Williamson
@ 2012-01-28 14:21 ` Alex Williamson
  2012-01-31 17:40   ` Michael S. Tsirkin
  2012-01-28 14:22 ` [PATCH 5/9] pci-assign: Only calculate maximum MSI-X vector entries once Alex Williamson
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 54+ messages in thread
From: Alex Williamson @ 2012-01-28 14:21 UTC (permalink / raw)
  To: kvm; +Cc: alex.williamson, jan.kiszka, shashidhar.patil

This makes access much easier.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 hw/device-assignment.c |   55 ++++++++++++++++++++++--------------------------
 hw/device-assignment.h |    9 +++++++-
 2 files changed, 33 insertions(+), 31 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 1fc7ffa..422ee00 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -973,10 +973,9 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
     AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
     uint16_t entries_nr = 0, entries_max_nr;
     int pos = 0, i, r = 0;
-    uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
     struct kvm_assigned_msix_nr msix_nr;
     struct kvm_assigned_msix_entry msix_entry;
-    void *va = adev->msix_table_page;
+    MSIXTableEntry *entry = adev->msix_table;
 
     pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
 
@@ -985,12 +984,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
     entries_max_nr += 1;
 
     /* Get the usable entry number for allocating */
-    for (i = 0; i < entries_max_nr; i++) {
-        memcpy(&msg_ctrl, va + i * 16 + 12, 4);
-        memcpy(&msg_data, va + i * 16 + 8, 4);
+    for (i = 0; i < entries_max_nr; i++, entry++) {
         /* Ignore unused entry even it's unmasked */
-        if (msg_data == 0)
+        if (entry->data == 0) {
             continue;
+        }
         entries_nr ++;
     }
 
@@ -1013,16 +1011,13 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
 
     msix_entry.assigned_dev_id = msix_nr.assigned_dev_id;
     entries_nr = 0;
-    for (i = 0; i < entries_max_nr; i++) {
+    entry = adev->msix_table;
+    for (i = 0; i < entries_max_nr; i++, entry++) {
         if (entries_nr >= msix_nr.entry_nr)
             break;
-        memcpy(&msg_ctrl, va + i * 16 + 12, 4);
-        memcpy(&msg_data, va + i * 16 + 8, 4);
-        if (msg_data == 0)
+        if (entry->data == 0) {
             continue;
-
-        memcpy(&msg_addr, va + i * 16, 4);
-        memcpy(&msg_upper_addr, va + i * 16 + 4, 4);
+        }
 
         r = kvm_get_irq_route_gsi();
         if (r < 0)
@@ -1031,10 +1026,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
         adev->entry[entries_nr].gsi = r;
         adev->entry[entries_nr].type = KVM_IRQ_ROUTING_MSI;
         adev->entry[entries_nr].flags = 0;
-        adev->entry[entries_nr].u.msi.address_lo = msg_addr;
-        adev->entry[entries_nr].u.msi.address_hi = msg_upper_addr;
-        adev->entry[entries_nr].u.msi.data = msg_data;
-        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!", msg_data, msg_addr);
+        adev->entry[entries_nr].u.msi.address_lo = entry->addr_lo;
+        adev->entry[entries_nr].u.msi.address_hi = entry->addr_hi;
+        adev->entry[entries_nr].u.msi.data = entry->data;
+        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!",
+              entry->data, entry->addr_lo);
 	kvm_add_routing_entry(&adev->entry[entries_nr]);
 
         msix_entry.gsi = adev->entry[entries_nr].gsi;
@@ -1443,7 +1439,7 @@ static uint64_t msix_mmio_read(void *opaque, target_phys_addr_t addr,
     AssignedDevice *adev = opaque;
     uint64_t val;
 
-    memcpy(&val, (void *)((uint8_t *)adev->msix_table_page + addr), size);
+    memcpy(&val, (void *)((uint8_t *)adev->msix_table + addr), size);
 
     return val;
 }
@@ -1456,7 +1452,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
     DEBUG("write to MSI-X entry table mmio offset 0x%lx, val 0x%lx\n",
           addr, val);
 
-    memcpy((void *)((uint8_t *)adev->msix_table_page + addr), &val, size);
+    memcpy((void *)((uint8_t *)adev->msix_table + addr), &val, size);
 }
 
 static const MemoryRegionOps msix_mmio_ops = {
@@ -1471,15 +1467,13 @@ static const MemoryRegionOps msix_mmio_ops = {
 
 static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
 {
-    dev->msix_table_page = mmap(NULL, 0x1000,
-                                PROT_READ|PROT_WRITE,
-                                MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
-    if (dev->msix_table_page == MAP_FAILED) {
-        fprintf(stderr, "fail allocate msix_table_page! %s\n",
-                strerror(errno));
+    dev->msix_table = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE,
+                           MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
+    if (dev->msix_table == MAP_FAILED) {
+        fprintf(stderr, "fail allocate msix_table! %s\n", strerror(errno));
         return -EFAULT;
     }
-    memset(dev->msix_table_page, 0, 0x1000);
+    memset(dev->msix_table, 0, 0x1000);
     memory_region_init_io(&dev->mmio, &msix_mmio_ops, dev,
                           "assigned-dev-msix", MSIX_PAGE_SIZE);
     return 0;
@@ -1487,16 +1481,17 @@ static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
 
 static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
 {
-    if (!dev->msix_table_page)
+    if (!dev->msix_table) {
         return;
+    }
 
     memory_region_destroy(&dev->mmio);
 
-    if (munmap(dev->msix_table_page, 0x1000) == -1) {
-        fprintf(stderr, "error unmapping msix_table_page! %s\n",
+    if (munmap(dev->msix_table, 0x1000) == -1) {
+        fprintf(stderr, "error unmapping msix_table! %s\n",
                 strerror(errno));
     }
-    dev->msix_table_page = NULL;
+    dev->msix_table = NULL;
 }
 
 static const VMStateDescription vmstate_assigned_device = {
diff --git a/hw/device-assignment.h b/hw/device-assignment.h
index 8ee2274..e229303 100644
--- a/hw/device-assignment.h
+++ b/hw/device-assignment.h
@@ -82,6 +82,13 @@ typedef struct {
 #define ASSIGNED_DEVICE_PREFER_MSI_MASK	(1 << ASSIGNED_DEVICE_PREFER_MSI_BIT)
 #define ASSIGNED_DEVICE_USE_MEM64_MASK	(1 << ASSIGNED_DEVICE_USE_MEM64_BIT)
 
+typedef struct {
+    uint32_t addr_lo;
+    uint32_t addr_hi;
+    uint32_t data;
+    uint32_t ctrl;
+} MSIXTableEntry;
+
 typedef struct AssignedDevice {
     PCIDevice dev;
     PCIHostDevice host;
@@ -110,7 +117,7 @@ typedef struct AssignedDevice {
     uint8_t emulate_config_write[PCI_CONFIG_SPACE_SIZE];
     int irq_entries_nr;
     struct kvm_irq_routing_entry *entry;
-    void *msix_table_page;
+    MSIXTableEntry *msix_table;
     target_phys_addr_t msix_table_addr;
     MemoryRegion mmio;
     char *configfd_name;


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 5/9] pci-assign: Only calculate maximum MSI-X vector entries once
  2012-01-28 14:21 [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support Alex Williamson
                   ` (3 preceding siblings ...)
  2012-01-28 14:21 ` [PATCH 4/9] pci-assign: Use struct for MSI-X table Alex Williamson
@ 2012-01-28 14:22 ` Alex Williamson
  2012-01-31 20:18   ` Michael S. Tsirkin
  2012-01-28 14:22 ` [PATCH 6/9] pci-assign: Proper initialization for MSI-X table Alex Williamson
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 54+ messages in thread
From: Alex Williamson @ 2012-01-28 14:22 UTC (permalink / raw)
  To: kvm; +Cc: alex.williamson, jan.kiszka, shashidhar.patil

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 hw/device-assignment.c |   17 +++++++----------
 hw/device-assignment.h |    1 +
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 422ee00..af614d3 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -971,20 +971,14 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
 static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
 {
     AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
-    uint16_t entries_nr = 0, entries_max_nr;
-    int pos = 0, i, r = 0;
+    uint16_t entries_nr = 0;
+    int i, r = 0;
     struct kvm_assigned_msix_nr msix_nr;
     struct kvm_assigned_msix_entry msix_entry;
     MSIXTableEntry *entry = adev->msix_table;
 
-    pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
-
-    entries_max_nr = *(uint16_t *)(pci_dev->config + pos + 2);
-    entries_max_nr &= PCI_MSIX_FLAGS_QSIZE;
-    entries_max_nr += 1;
-
     /* Get the usable entry number for allocating */
-    for (i = 0; i < entries_max_nr; i++, entry++) {
+    for (i = 0; i < adev->msix_max; i++, entry++) {
         /* Ignore unused entry even it's unmasked */
         if (entry->data == 0) {
             continue;
@@ -1012,7 +1006,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
     msix_entry.assigned_dev_id = msix_nr.assigned_dev_id;
     entries_nr = 0;
     entry = adev->msix_table;
-    for (i = 0; i < entries_max_nr; i++, entry++) {
+    for (i = 0; i < adev->msix_max; i++, entry++) {
         if (entries_nr >= msix_nr.entry_nr)
             break;
         if (entry->data == 0) {
@@ -1225,6 +1219,9 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
         bar_nr = msix_table_entry & PCI_MSIX_FLAGS_BIRMASK;
         msix_table_entry &= ~PCI_MSIX_FLAGS_BIRMASK;
         dev->msix_table_addr = pci_region[bar_nr].base_addr + msix_table_entry;
+        dev->msix_max = pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS);
+        dev->msix_max &= PCI_MSIX_FLAGS_QSIZE;
+        dev->msix_max += 1;
     }
 
     /* Minimal PM support, nothing writable, device appears to NAK changes */
diff --git a/hw/device-assignment.h b/hw/device-assignment.h
index e229303..a909fb2 100644
--- a/hw/device-assignment.h
+++ b/hw/device-assignment.h
@@ -119,6 +119,7 @@ typedef struct AssignedDevice {
     struct kvm_irq_routing_entry *entry;
     MSIXTableEntry *msix_table;
     target_phys_addr_t msix_table_addr;
+    uint16_t msix_max;
     MemoryRegion mmio;
     char *configfd_name;
     int32_t bootindex;


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 6/9] pci-assign: Proper initialization for MSI-X table
  2012-01-28 14:21 [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support Alex Williamson
                   ` (4 preceding siblings ...)
  2012-01-28 14:22 ` [PATCH 5/9] pci-assign: Only calculate maximum MSI-X vector entries once Alex Williamson
@ 2012-01-28 14:22 ` Alex Williamson
  2012-01-31 17:40   ` Michael S. Tsirkin
  2012-01-28 14:22 ` [PATCH 7/9] pci-assign: Allocate entries for all MSI-X vectors Alex Williamson
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 54+ messages in thread
From: Alex Williamson @ 2012-01-28 14:22 UTC (permalink / raw)
  To: kvm; +Cc: alex.williamson, jan.kiszka, shashidhar.patil

Per the PCI spec, all vectors should be masked at handoff.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 hw/device-assignment.c |   20 +++++++++++++++++++-
 1 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index af614d3..6efa219 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1462,6 +1462,22 @@ static const MemoryRegionOps msix_mmio_ops = {
     },
 };
 
+static void msix_reset(AssignedDevice *dev)
+{
+    MSIXTableEntry *entry;
+    int i;
+
+    if (!dev->msix_table) {
+        return;
+    }
+
+    memset(dev->msix_table, 0, 0x1000);
+
+    for (i = 0, entry = dev->msix_table; i < dev->msix_max; i++, entry++) {
+        entry->ctrl = 0x1; /* Masked */
+    }
+}
+
 static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
 {
     dev->msix_table = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE,
@@ -1470,7 +1486,9 @@ static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
         fprintf(stderr, "fail allocate msix_table! %s\n", strerror(errno));
         return -EFAULT;
     }
-    memset(dev->msix_table, 0, 0x1000);
+
+    msix_reset(dev);
+
     memory_region_init_io(&dev->mmio, &msix_mmio_ops, dev,
                           "assigned-dev-msix", MSIX_PAGE_SIZE);
     return 0;


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 7/9] pci-assign: Allocate entries for all MSI-X vectors
  2012-01-28 14:21 [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support Alex Williamson
                   ` (5 preceding siblings ...)
  2012-01-28 14:22 ` [PATCH 6/9] pci-assign: Proper initialization for MSI-X table Alex Williamson
@ 2012-01-28 14:22 ` Alex Williamson
  2012-01-28 14:22 ` [PATCH 8/9] pci-assign: Use MSIX_PAGE_SIZE Alex Williamson
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 54+ messages in thread
From: Alex Williamson @ 2012-01-28 14:22 UTC (permalink / raw)
  To: kvm; +Cc: alex.williamson, jan.kiszka, shashidhar.patil

We still only initialize the number used.  This lets us avoid
address space translations if the guest were to sparsely
initialize MSI-X vectors on the device.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 hw/device-assignment.c |   44 ++++++++++++++++++++++----------------------
 1 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 6efa219..a6d3a5b 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -634,8 +634,11 @@ static void free_dev_irq_entries(AssignedDevice *dev)
 {
     int i;
 
-    for (i = 0; i < dev->irq_entries_nr; i++)
-        kvm_del_routing_entry(&dev->entry[i]);
+    for (i = 0; i < dev->irq_entries_nr; i++) {
+        if (dev->entry[i].type) {
+            kvm_del_routing_entry(&dev->entry[i]);
+        }
+    }
     g_free(dev->entry);
     dev->entry = NULL;
     dev->irq_entries_nr = 0;
@@ -983,7 +986,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
         if (entry->data == 0) {
             continue;
         }
-        entries_nr ++;
+        entries_nr++;
     }
 
     if (entries_nr == 0) {
@@ -1000,15 +1003,13 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
     }
 
     free_dev_irq_entries(adev);
-    adev->irq_entries_nr = entries_nr;
-    adev->entry = g_malloc0(entries_nr * sizeof(*(adev->entry)));
+
+    adev->irq_entries_nr = adev->msix_max;
+    adev->entry = g_malloc0(adev->msix_max * sizeof(*(adev->entry)));
 
     msix_entry.assigned_dev_id = msix_nr.assigned_dev_id;
-    entries_nr = 0;
     entry = adev->msix_table;
     for (i = 0; i < adev->msix_max; i++, entry++) {
-        if (entries_nr >= msix_nr.entry_nr)
-            break;
         if (entry->data == 0) {
             continue;
         }
@@ -1017,26 +1018,25 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
         if (r < 0)
             return r;
 
-        adev->entry[entries_nr].gsi = r;
-        adev->entry[entries_nr].type = KVM_IRQ_ROUTING_MSI;
-        adev->entry[entries_nr].flags = 0;
-        adev->entry[entries_nr].u.msi.address_lo = entry->addr_lo;
-        adev->entry[entries_nr].u.msi.address_hi = entry->addr_hi;
-        adev->entry[entries_nr].u.msi.data = entry->data;
-        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!",
-              entry->data, entry->addr_lo);
-	kvm_add_routing_entry(&adev->entry[entries_nr]);
-
-        msix_entry.gsi = adev->entry[entries_nr].gsi;
+        adev->entry[i].gsi = r;
+        adev->entry[i].type = KVM_IRQ_ROUTING_MSI;
+        adev->entry[i].flags = 0;
+        adev->entry[i].u.msi.address_lo = entry->addr_lo;
+        adev->entry[i].u.msi.address_hi = entry->addr_hi;
+        adev->entry[i].u.msi.data = entry->data;
+
+        DEBUG("MSI-X vector %d, gsi %d, addr %08x_%08x, data %08x\n", i,
+              r, entry->addr_hi, entry->addr_lo, entry->data);
+
+        kvm_add_routing_entry(&adev->entry[i]);
+
+        msix_entry.gsi = adev->entry[i].gsi;
         msix_entry.entry = i;
         r = kvm_assign_set_msix_entry(kvm_state, &msix_entry);
         if (r) {
             fprintf(stderr, "fail to set MSI-X entry! %s\n", strerror(-r));
             break;
         }
-        DEBUG("MSI-X entry gsi 0x%x, entry %d\n!",
-                msix_entry.gsi, msix_entry.entry);
-        entries_nr ++;
     }
 
     if (r == 0 && kvm_commit_irq_routes() < 0) {


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 8/9] pci-assign: Use MSIX_PAGE_SIZE
  2012-01-28 14:21 [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support Alex Williamson
                   ` (6 preceding siblings ...)
  2012-01-28 14:22 ` [PATCH 7/9] pci-assign: Allocate entries for all MSI-X vectors Alex Williamson
@ 2012-01-28 14:22 ` Alex Williamson
  2012-01-28 14:22 ` [PATCH 9/9] pci-assign: Update MSI-X config based on table writes Alex Williamson
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 54+ messages in thread
From: Alex Williamson @ 2012-01-28 14:22 UTC (permalink / raw)
  To: kvm; +Cc: alex.williamson, jan.kiszka, shashidhar.patil

We've already got it defined, use it.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 hw/device-assignment.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index a6d3a5b..7e52615 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1471,7 +1471,7 @@ static void msix_reset(AssignedDevice *dev)
         return;
     }
 
-    memset(dev->msix_table, 0, 0x1000);
+    memset(dev->msix_table, 0, MSIX_PAGE_SIZE);
 
     for (i = 0, entry = dev->msix_table; i < dev->msix_max; i++, entry++) {
         entry->ctrl = 0x1; /* Masked */
@@ -1480,7 +1480,7 @@ static void msix_reset(AssignedDevice *dev)
 
 static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
 {
-    dev->msix_table = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE,
+    dev->msix_table = mmap(NULL, MSIX_PAGE_SIZE, PROT_READ|PROT_WRITE,
                            MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
     if (dev->msix_table == MAP_FAILED) {
         fprintf(stderr, "fail allocate msix_table! %s\n", strerror(errno));
@@ -1502,7 +1502,7 @@ static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
 
     memory_region_destroy(&dev->mmio);
 
-    if (munmap(dev->msix_table, 0x1000) == -1) {
+    if (munmap(dev->msix_table, MSIX_PAGE_SIZE) == -1) {
         fprintf(stderr, "error unmapping msix_table! %s\n",
                 strerror(errno));
     }


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 9/9] pci-assign: Update MSI-X config based on table writes
  2012-01-28 14:21 [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support Alex Williamson
                   ` (7 preceding siblings ...)
  2012-01-28 14:22 ` [PATCH 8/9] pci-assign: Use MSIX_PAGE_SIZE Alex Williamson
@ 2012-01-28 14:22 ` Alex Williamson
  2012-01-31 12:50   ` Avi Kivity
  2012-01-30 10:11 ` [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support Jan Kiszka
  2012-02-06 15:55 ` Shashidhar Patil
  10 siblings, 1 reply; 54+ messages in thread
From: Alex Williamson @ 2012-01-28 14:22 UTC (permalink / raw)
  To: kvm; +Cc: alex.williamson, jan.kiszka, shashidhar.patil

We currently only update MSI-X configuration with the enable bit
in PCI config space is toggled.  This is pretty sketchy and part
of the reason for the odd checks for vector data is to guess
whether the guest is going to use the vector so we can pre-enable
it.

Two key things missed by doing this is that we don't catch vector
changes after enabling (ex. setting smp_affinity on an irq) and
we don't support guests that don't touch the vector table prior
to enabling the MSI-X capability (ex. freebsd).  This patch
fixes both of these problems.

I'm not able to get good behavior attempting to disable masked
vectors, so we don't actually do anything on mask yet.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 hw/device-assignment.c |   93 ++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 82 insertions(+), 11 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 7e52615..8b05fa3 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -971,6 +971,11 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
     }
 }
 
+static inline bool msix_masked(MSIXTableEntry *entry)
+{
+    return !!(entry->ctrl & 0x1);
+}
+
 static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
 {
     AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
@@ -982,17 +987,19 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
 
     /* Get the usable entry number for allocating */
     for (i = 0; i < adev->msix_max; i++, entry++) {
-        /* Ignore unused entry even it's unmasked */
-        if (entry->data == 0) {
+        if (msix_masked(entry)) {
             continue;
         }
         entries_nr++;
     }
 
-    if (entries_nr == 0) {
-        fprintf(stderr, "MSI-X entry number is zero!\n");
-        return -EINVAL;
+    DEBUG("MSI-X entries: %d\n", entries_nr);
+
+    /* It's valid to enable MSI-X with all entries masked */
+    if (!entries_nr) {
+        return 0;
     }
+
     msix_nr.assigned_dev_id = calc_assigned_dev_id(adev);
     msix_nr.entry_nr = entries_nr;
     r = kvm_assign_set_msix_nr(kvm_state, &msix_nr);
@@ -1010,7 +1017,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
     msix_entry.assigned_dev_id = msix_nr.assigned_dev_id;
     entry = adev->msix_table;
     for (i = 0; i < adev->msix_max; i++, entry++) {
-        if (entry->data == 0) {
+        if (msix_masked(entry)) {
             continue;
         }
 
@@ -1082,9 +1089,12 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev)
             perror("assigned_dev_update_msix_mmio");
             return;
         }
-        if (kvm_assign_irq(kvm_state, &assigned_irq_data) < 0) {
-            perror("assigned_dev_enable_msix: assign irq");
-            return;
+
+        if (assigned_dev->irq_entries_nr) {
+            if (kvm_assign_irq(kvm_state, &assigned_irq_data) < 0) {
+                perror("assigned_dev_enable_msix: assign irq");
+                return;
+            }
         }
         assigned_dev->girq = -1;
         assigned_dev->irq_requested_type = assigned_irq_data.flags;
@@ -1445,11 +1455,72 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
                             uint64_t val, unsigned size)
 {
     AssignedDevice *adev = opaque;
+    PCIDevice *pdev = &adev->dev;
+    uint16_t ctrl;
+    MSIXTableEntry orig;
+    int i = addr >> 4;
+
+    if (i >= adev->msix_max) {
+        return; /* Drop write */
+    }
 
-    DEBUG("write to MSI-X entry table mmio offset 0x%lx, val 0x%lx\n",
-          addr, val);
+    ctrl = pci_get_word(pdev->config + pdev->msix_cap + PCI_MSIX_FLAGS);
+
+    DEBUG("write to MSI-X table offset 0x%lx, val 0x%lx\n", addr, val);
+
+    if (ctrl & PCI_MSIX_FLAGS_ENABLE) {
+        orig = adev->msix_table[i];
+    }
 
     memcpy((void *)((uint8_t *)adev->msix_table + addr), &val, size);
+
+    if (ctrl & PCI_MSIX_FLAGS_ENABLE) {
+        MSIXTableEntry *entry = &adev->msix_table[i];
+
+        if (!msix_masked(&orig) && msix_masked(entry)) {
+            /*
+             * Vector masked, disable it
+             *
+             * XXX theoretically we'd call kvm_assign_set_msix_entry
+             * with a gsi of 0 here as the API indicates that should
+             * disable the interrupt.  However, when we do that with
+             * devices with lots of vectors and irqbalance running,
+             * we seem to make kvm confused and get an ENOSPC from
+             * kvm_assign_set_msix_entry when we try to restore it.
+             * So for now we don't actually disable, but we'll update
+             * the entry when it's re-enabled below.
+             */
+        } else if (msix_masked(&orig) && !msix_masked(entry)) {
+            /* Vector unmasked */
+            if (i >= adev->irq_entries_nr || !adev->entry[i].type) {
+                /* Previously unassigned vector, start from scratch */
+                assigned_dev_update_msix(pdev);
+                return;
+            } else {
+                /* Update an existing, previously masked vector */
+                struct kvm_irq_routing_entry orig = adev->entry[i];
+                int ret;
+
+                adev->entry[i].u.msi.address_lo = entry->addr_lo;
+                adev->entry[i].u.msi.address_hi = entry->addr_hi;
+                adev->entry[i].u.msi.data = entry->data;
+
+                ret = kvm_update_routing_entry(&orig, &adev->entry[i]);
+                if (ret) {
+                    fprintf(stderr,
+                            "Error updating irq routing entry (%d)\n", ret);
+                    return;
+                }
+
+                ret = kvm_commit_irq_routes();
+                if (ret) {
+                    fprintf(stderr,
+                            "Error committing irq routes (%d)\n", ret);
+                    return;
+                }
+            }
+        }
+    }
 }
 
 static const MemoryRegionOps msix_mmio_ops = {


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support
  2012-01-28 14:21 [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support Alex Williamson
                   ` (8 preceding siblings ...)
  2012-01-28 14:22 ` [PATCH 9/9] pci-assign: Update MSI-X config based on table writes Alex Williamson
@ 2012-01-30 10:11 ` Jan Kiszka
  2012-01-30 13:44   ` Alex Williamson
  2012-02-06 15:55 ` Shashidhar Patil
  10 siblings, 1 reply; 54+ messages in thread
From: Jan Kiszka @ 2012-01-30 10:11 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, shashidhar.patil

On 2012-01-28 15:21, Alex Williamson wrote:
> Patch 1 & 2 here are independent of the rest, but I include them
> here to avoid conflicts.  The first patch enables exposing MMIO
> BARs as their native width to the guest.  I added a config option
> for this with the default to use the existing behavior as I
> suspect we may have some latent issues there.  Patch 2 is just
> some trivial debug build warning fixes.
> 
> The rest of the patches work on improving MSI-X table support.
> Particularly, vectors can now be updated by the guest after
> MSI-X is enabled to support things like irqbalance for SMP
> affinity tuning.  We also now update MSI-X configuration as
> new vectors are unmasked, which enables assignment of MSI-X
> devices on FreeBSD.  I was able to assign and use an 82576
> (PF & VF) on a FreeBSD 9.0 guest with this series.  Hopefully
> Shashidhar can report whether this improves the behavior he
> as seeing with an 82599.
> 
> I wasn't able to get masking to work reliably, so I left that
> as is for now.  Perhaps someone has suggestions on getting that
> to work.  Thanks,

Unless it's urging, let's focus on getting this implemented via the
MSI/MSI-X core, not widely duplicated in device-assignment.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support
  2012-01-30 10:11 ` [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support Jan Kiszka
@ 2012-01-30 13:44   ` Alex Williamson
  2012-01-31 12:52     ` Avi Kivity
  0 siblings, 1 reply; 54+ messages in thread
From: Alex Williamson @ 2012-01-30 13:44 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: kvm, shashidhar.patil

On Mon, 2012-01-30 at 11:11 +0100, Jan Kiszka wrote:
> On 2012-01-28 15:21, Alex Williamson wrote:
> > Patch 1 & 2 here are independent of the rest, but I include them
> > here to avoid conflicts.  The first patch enables exposing MMIO
> > BARs as their native width to the guest.  I added a config option
> > for this with the default to use the existing behavior as I
> > suspect we may have some latent issues there.  Patch 2 is just
> > some trivial debug build warning fixes.
> > 
> > The rest of the patches work on improving MSI-X table support.
> > Particularly, vectors can now be updated by the guest after
> > MSI-X is enabled to support things like irqbalance for SMP
> > affinity tuning.  We also now update MSI-X configuration as
> > new vectors are unmasked, which enables assignment of MSI-X
> > devices on FreeBSD.  I was able to assign and use an 82576
> > (PF & VF) on a FreeBSD 9.0 guest with this series.  Hopefully
> > Shashidhar can report whether this improves the behavior he
> > as seeing with an 82599.
> > 
> > I wasn't able to get masking to work reliably, so I left that
> > as is for now.  Perhaps someone has suggestions on getting that
> > to work.  Thanks,
> 
> Unless it's urging, let's focus on getting this implemented via the
> MSI/MSI-X core, not widely duplicated in device-assignment.

I disagree.  This isn't making the code duplication worse and it solves
at least two use cases that are currently broken.  This won't make it
any more difficult to eventually move to msix.c, if it does, the core
needs more work.  Thanks,

Alex



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest
  2012-01-28 14:21 ` [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest Alex Williamson
@ 2012-01-31 12:40   ` Avi Kivity
  2012-01-31 12:45     ` Jan Kiszka
  0 siblings, 1 reply; 54+ messages in thread
From: Avi Kivity @ 2012-01-31 12:40 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, jan.kiszka, shashidhar.patil

On 01/28/2012 04:21 PM, Alex Williamson wrote:
> To date we've only exposed BARs as 32bit even if the device
> physically supports 64bit BARs.  Enable 64bit BARs to be
> exposed as such in the guest, which may free up MMIO below
> 4G should the guest choose to use it.
>
> This adds a new mem64= option to pci-assign, with the
> default being off for testing and enablement.  The goal
> is to eventually make this enabled by default.
>

Seems fine, but do we really need the option?  If it doesn't work we
should treat it as an ordinary but and fix it.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest
  2012-01-31 12:40   ` Avi Kivity
@ 2012-01-31 12:45     ` Jan Kiszka
  2012-01-31 12:51       ` Avi Kivity
  0 siblings, 1 reply; 54+ messages in thread
From: Jan Kiszka @ 2012-01-31 12:45 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alex Williamson, kvm, shashidhar.patil

On 2012-01-31 13:40, Avi Kivity wrote:
> On 01/28/2012 04:21 PM, Alex Williamson wrote:
>> To date we've only exposed BARs as 32bit even if the device
>> physically supports 64bit BARs.  Enable 64bit BARs to be
>> exposed as such in the guest, which may free up MMIO below
>> 4G should the guest choose to use it.
>>
>> This adds a new mem64= option to pci-assign, with the
>> default being off for testing and enablement.  The goal
>> is to eventually make this enabled by default.
>>
> 
> Seems fine, but do we really need the option?  If it doesn't work we
> should treat it as an ordinary but and fix it.

So far it's against the architecture of the emulated system: our current
chipset predates 64 bit PCI.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 3/9] pci-assign: Update MSI-X MMIO to Memory API
  2012-01-28 14:21 ` [PATCH 3/9] pci-assign: Update MSI-X MMIO to Memory API Alex Williamson
@ 2012-01-31 12:45   ` Avi Kivity
  2012-01-31 21:13     ` Alex Williamson
  0 siblings, 1 reply; 54+ messages in thread
From: Avi Kivity @ 2012-01-31 12:45 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, jan.kiszka, shashidhar.patil

On 01/28/2012 04:21 PM, Alex Williamson wrote:
> Stop using compatibility mode and at the same time fix available
> access sizes.  The PCI spec indicates that the MSI-X table may
> only be accessed as DWORD or QWORD.
>
>  
>  static const MemoryRegionOps msix_mmio_ops = {
> -    .old_mmio = {
> -        .read = { msix_mmio_readb, msix_mmio_readw, msix_mmio_readl, },
> -        .write = { msix_mmio_writeb, msix_mmio_writew, msix_mmio_writel, },
> -    },
> +    .read = msix_mmio_read,
> +    .write = msix_mmio_write,
>      .endianness = DEVICE_NATIVE_ENDIAN,
> +    .impl = {
> +        .min_access_size = 4,
> +        .max_access_size = 8,
> +    },
>  };
>  

.impl.min_access_size = 4 means the core will convert 1-byte I/O to
4-byte I/O (using rmw if needed).  That's not what we want, I think you
can leave it at 1 and explicitly ignore small accesses in the callbacks.

Have you tested 8-byte I/O?  This is the first user.  Don't you need to
set .valid.max_access_size?


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 9/9] pci-assign: Update MSI-X config based on table writes
  2012-01-28 14:22 ` [PATCH 9/9] pci-assign: Update MSI-X config based on table writes Alex Williamson
@ 2012-01-31 12:50   ` Avi Kivity
  0 siblings, 0 replies; 54+ messages in thread
From: Avi Kivity @ 2012-01-31 12:50 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, jan.kiszka, shashidhar.patil

On 01/28/2012 04:22 PM, Alex Williamson wrote:
> We currently only update MSI-X configuration with the enable bit
> in PCI config space is toggled.  This is pretty sketchy and part
> of the reason for the odd checks for vector data is to guess
> whether the guest is going to use the vector so we can pre-enable
> it.
>
> Two key things missed by doing this is that we don't catch vector
> changes after enabling (ex. setting smp_affinity on an irq) and
> we don't support guests that don't touch the vector table prior
> to enabling the MSI-X capability (ex. freebsd).  This patch
> fixes both of these problems.
>
> I'm not able to get good behavior attempting to disable masked
> vectors, so we don't actually do anything on mask yet.
>
>  
> +static inline bool msix_masked(MSIXTableEntry *entry)
> +{
> +    return !!(entry->ctrl & 0x1);
> +}

!! !needed (nor 'inline').

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest
  2012-01-31 12:45     ` Jan Kiszka
@ 2012-01-31 12:51       ` Avi Kivity
  2012-01-31 12:57         ` Jan Kiszka
  0 siblings, 1 reply; 54+ messages in thread
From: Avi Kivity @ 2012-01-31 12:51 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Alex Williamson, kvm, shashidhar.patil

On 01/31/2012 02:45 PM, Jan Kiszka wrote:
> On 2012-01-31 13:40, Avi Kivity wrote:
> > On 01/28/2012 04:21 PM, Alex Williamson wrote:
> >> To date we've only exposed BARs as 32bit even if the device
> >> physically supports 64bit BARs.  Enable 64bit BARs to be
> >> exposed as such in the guest, which may free up MMIO below
> >> 4G should the guest choose to use it.
> >>
> >> This adds a new mem64= option to pci-assign, with the
> >> default being off for testing and enablement.  The goal
> >> is to eventually make this enabled by default.
> >>
> > 
> > Seems fine, but do we really need the option?  If it doesn't work we
> > should treat it as an ordinary but and fix it.
>
> So far it's against the architecture of the emulated system: our current
> chipset predates 64 bit PCI.
>

Then it should be enabled/disabled at the chipset level.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support
  2012-01-30 13:44   ` Alex Williamson
@ 2012-01-31 12:52     ` Avi Kivity
  2012-01-31 12:56       ` Jan Kiszka
  0 siblings, 1 reply; 54+ messages in thread
From: Avi Kivity @ 2012-01-31 12:52 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Jan Kiszka, kvm, shashidhar.patil

On 01/30/2012 03:44 PM, Alex Williamson wrote:
> On Mon, 2012-01-30 at 11:11 +0100, Jan Kiszka wrote:
> > On 2012-01-28 15:21, Alex Williamson wrote:
> > > Patch 1 & 2 here are independent of the rest, but I include them
> > > here to avoid conflicts.  The first patch enables exposing MMIO
> > > BARs as their native width to the guest.  I added a config option
> > > for this with the default to use the existing behavior as I
> > > suspect we may have some latent issues there.  Patch 2 is just
> > > some trivial debug build warning fixes.
> > > 
> > > The rest of the patches work on improving MSI-X table support.
> > > Particularly, vectors can now be updated by the guest after
> > > MSI-X is enabled to support things like irqbalance for SMP
> > > affinity tuning.  We also now update MSI-X configuration as
> > > new vectors are unmasked, which enables assignment of MSI-X
> > > devices on FreeBSD.  I was able to assign and use an 82576
> > > (PF & VF) on a FreeBSD 9.0 guest with this series.  Hopefully
> > > Shashidhar can report whether this improves the behavior he
> > > as seeing with an 82599.
> > > 
> > > I wasn't able to get masking to work reliably, so I left that
> > > as is for now.  Perhaps someone has suggestions on getting that
> > > to work.  Thanks,
> > 
> > Unless it's urging, let's focus on getting this implemented via the
> > MSI/MSI-X core, not widely duplicated in device-assignment.
>
> I disagree.  This isn't making the code duplication worse and it solves
> at least two use cases that are currently broken.  This won't make it
> any more difficult to eventually move to msix.c, if it does, the core
> needs more work.  Thanks,
>

I agree (with Alex), but maybe I missed something?  Patch 9 does call
kvm directly instead of going through msi services, but I don't think
this should hold the patches.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support
  2012-01-31 12:52     ` Avi Kivity
@ 2012-01-31 12:56       ` Jan Kiszka
  0 siblings, 0 replies; 54+ messages in thread
From: Jan Kiszka @ 2012-01-31 12:56 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alex Williamson, kvm, shashidhar.patil

On 2012-01-31 13:52, Avi Kivity wrote:
> On 01/30/2012 03:44 PM, Alex Williamson wrote:
>> On Mon, 2012-01-30 at 11:11 +0100, Jan Kiszka wrote:
>>> On 2012-01-28 15:21, Alex Williamson wrote:
>>>> Patch 1 & 2 here are independent of the rest, but I include them
>>>> here to avoid conflicts.  The first patch enables exposing MMIO
>>>> BARs as their native width to the guest.  I added a config option
>>>> for this with the default to use the existing behavior as I
>>>> suspect we may have some latent issues there.  Patch 2 is just
>>>> some trivial debug build warning fixes.
>>>>
>>>> The rest of the patches work on improving MSI-X table support.
>>>> Particularly, vectors can now be updated by the guest after
>>>> MSI-X is enabled to support things like irqbalance for SMP
>>>> affinity tuning.  We also now update MSI-X configuration as
>>>> new vectors are unmasked, which enables assignment of MSI-X
>>>> devices on FreeBSD.  I was able to assign and use an 82576
>>>> (PF & VF) on a FreeBSD 9.0 guest with this series.  Hopefully
>>>> Shashidhar can report whether this improves the behavior he
>>>> as seeing with an 82599.
>>>>
>>>> I wasn't able to get masking to work reliably, so I left that
>>>> as is for now.  Perhaps someone has suggestions on getting that
>>>> to work.  Thanks,
>>>
>>> Unless it's urging, let's focus on getting this implemented via the
>>> MSI/MSI-X core, not widely duplicated in device-assignment.
>>
>> I disagree.  This isn't making the code duplication worse and it solves
>> at least two use cases that are currently broken.  This won't make it
>> any more difficult to eventually move to msix.c, if it does, the core
>> needs more work.  Thanks,
>>
> 
> I agree (with Alex), but maybe I missed something?  Patch 9 does call
> kvm directly instead of going through msi services, but I don't think
> this should hold the patches.

If this solves the issues, ok. That all needs quite some refactoring
anyway (including the kvm core services).

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest
  2012-01-31 12:51       ` Avi Kivity
@ 2012-01-31 12:57         ` Jan Kiszka
  2012-01-31 13:10           ` Avi Kivity
  0 siblings, 1 reply; 54+ messages in thread
From: Jan Kiszka @ 2012-01-31 12:57 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alex Williamson, kvm, shashidhar.patil, Michael S. Tsirkin

On 2012-01-31 13:51, Avi Kivity wrote:
> On 01/31/2012 02:45 PM, Jan Kiszka wrote:
>> On 2012-01-31 13:40, Avi Kivity wrote:
>>> On 01/28/2012 04:21 PM, Alex Williamson wrote:
>>>> To date we've only exposed BARs as 32bit even if the device
>>>> physically supports 64bit BARs.  Enable 64bit BARs to be
>>>> exposed as such in the guest, which may free up MMIO below
>>>> 4G should the guest choose to use it.
>>>>
>>>> This adds a new mem64= option to pci-assign, with the
>>>> default being off for testing and enablement.  The goal
>>>> is to eventually make this enabled by default.
>>>>
>>>
>>> Seems fine, but do we really need the option?  If it doesn't work we
>>> should treat it as an ordinary but and fix it.
>>
>> So far it's against the architecture of the emulated system: our current
>> chipset predates 64 bit PCI.
>>
> 
> Then it should be enabled/disabled at the chipset level.

Makes me wonder if we already do some filtering if the device supports
64 bit but the next bridge does not.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest
  2012-01-31 12:57         ` Jan Kiszka
@ 2012-01-31 13:10           ` Avi Kivity
  2012-01-31 13:21             ` Jan Kiszka
  0 siblings, 1 reply; 54+ messages in thread
From: Avi Kivity @ 2012-01-31 13:10 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Alex Williamson, kvm, shashidhar.patil, Michael S. Tsirkin

On 01/31/2012 02:57 PM, Jan Kiszka wrote:
> >>> Seems fine, but do we really need the option?  If it doesn't work we
> >>> should treat it as an ordinary but and fix it.
> >>
> >> So far it's against the architecture of the emulated system: our current
> >> chipset predates 64 bit PCI.
> >>
> > 
> > Then it should be enabled/disabled at the chipset level.
>
> Makes me wonder if we already do some filtering if the device supports
> 64 bit but the next bridge does not.
>

Our 440fx does support 64-bit bars, so the question doesn't arise for
x86.  Instead we violate the spec.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest
  2012-01-31 13:10           ` Avi Kivity
@ 2012-01-31 13:21             ` Jan Kiszka
  2012-01-31 13:33               ` Avi Kivity
  0 siblings, 1 reply; 54+ messages in thread
From: Jan Kiszka @ 2012-01-31 13:21 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alex Williamson, kvm, shashidhar.patil, Michael S. Tsirkin

On 2012-01-31 14:10, Avi Kivity wrote:
> On 01/31/2012 02:57 PM, Jan Kiszka wrote:
>>>>> Seems fine, but do we really need the option?  If it doesn't work we
>>>>> should treat it as an ordinary but and fix it.
>>>>
>>>> So far it's against the architecture of the emulated system: our current
>>>> chipset predates 64 bit PCI.
>>>>
>>>
>>> Then it should be enabled/disabled at the chipset level.
>>
>> Makes me wonder if we already do some filtering if the device supports
>> 64 bit but the next bridge does not.
>>
> 
> Our 440fx does support 64-bit bars, so the question doesn't arise for
> x86.  Instead we violate the spec.

If you mean by "our" the 440fx-qemu, not the real 440fx. That one does
not even support >1GB RAM.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest
  2012-01-31 13:21             ` Jan Kiszka
@ 2012-01-31 13:33               ` Avi Kivity
  2012-01-31 21:08                 ` Alex Williamson
  0 siblings, 1 reply; 54+ messages in thread
From: Avi Kivity @ 2012-01-31 13:33 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Alex Williamson, kvm, shashidhar.patil, Michael S. Tsirkin

On 01/31/2012 03:21 PM, Jan Kiszka wrote:
> On 2012-01-31 14:10, Avi Kivity wrote:
> > On 01/31/2012 02:57 PM, Jan Kiszka wrote:
> >>>>> Seems fine, but do we really need the option?  If it doesn't work we
> >>>>> should treat it as an ordinary but and fix it.
> >>>>
> >>>> So far it's against the architecture of the emulated system: our current
> >>>> chipset predates 64 bit PCI.
> >>>>
> >>>
> >>> Then it should be enabled/disabled at the chipset level.
> >>
> >> Makes me wonder if we already do some filtering if the device supports
> >> 64 bit but the next bridge does not.
> >>
> > 
> > Our 440fx does support 64-bit bars, so the question doesn't arise for
> > x86.  Instead we violate the spec.
>
> If you mean by "our" the 440fx-qemu, not the real 440fx. That one does
> not even support >1GB RAM.

Yes, that's what I meant.  It also supports pci hotplug, more slots, cpu
hotplug, etc.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 4/9] pci-assign: Use struct for MSI-X table
  2012-01-28 14:21 ` [PATCH 4/9] pci-assign: Use struct for MSI-X table Alex Williamson
@ 2012-01-31 17:40   ` Michael S. Tsirkin
  2012-01-31 19:05     ` Alex Williamson
  0 siblings, 1 reply; 54+ messages in thread
From: Michael S. Tsirkin @ 2012-01-31 17:40 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, jan.kiszka, shashidhar.patil

On Sat, Jan 28, 2012 at 07:21:58AM -0700, Alex Williamson wrote:
> This makes access much easier.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

Yes but this also makes it easier to forget
to handle endian-ness.
How about using pci_get/pci_set instead?

> ---
> 
>  hw/device-assignment.c |   55 ++++++++++++++++++++++--------------------------
>  hw/device-assignment.h |    9 +++++++-
>  2 files changed, 33 insertions(+), 31 deletions(-)
> 
> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> index 1fc7ffa..422ee00 100644
> --- a/hw/device-assignment.c
> +++ b/hw/device-assignment.c
> @@ -973,10 +973,9 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
>      AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
>      uint16_t entries_nr = 0, entries_max_nr;
>      int pos = 0, i, r = 0;
> -    uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
>      struct kvm_assigned_msix_nr msix_nr;
>      struct kvm_assigned_msix_entry msix_entry;
> -    void *va = adev->msix_table_page;
> +    MSIXTableEntry *entry = adev->msix_table;
>  
>      pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
>  
> @@ -985,12 +984,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
>      entries_max_nr += 1;
>  
>      /* Get the usable entry number for allocating */
> -    for (i = 0; i < entries_max_nr; i++) {
> -        memcpy(&msg_ctrl, va + i * 16 + 12, 4);
> -        memcpy(&msg_data, va + i * 16 + 8, 4);
> +    for (i = 0; i < entries_max_nr; i++, entry++) {
>          /* Ignore unused entry even it's unmasked */
> -        if (msg_data == 0)
> +        if (entry->data == 0) {
>              continue;
> +        }
>          entries_nr ++;
>      }
>  
> @@ -1013,16 +1011,13 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
>  
>      msix_entry.assigned_dev_id = msix_nr.assigned_dev_id;
>      entries_nr = 0;
> -    for (i = 0; i < entries_max_nr; i++) {
> +    entry = adev->msix_table;
> +    for (i = 0; i < entries_max_nr; i++, entry++) {
>          if (entries_nr >= msix_nr.entry_nr)
>              break;
> -        memcpy(&msg_ctrl, va + i * 16 + 12, 4);
> -        memcpy(&msg_data, va + i * 16 + 8, 4);
> -        if (msg_data == 0)
> +        if (entry->data == 0) {
>              continue;
> -
> -        memcpy(&msg_addr, va + i * 16, 4);
> -        memcpy(&msg_upper_addr, va + i * 16 + 4, 4);
> +        }
>  
>          r = kvm_get_irq_route_gsi();
>          if (r < 0)
> @@ -1031,10 +1026,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
>          adev->entry[entries_nr].gsi = r;
>          adev->entry[entries_nr].type = KVM_IRQ_ROUTING_MSI;
>          adev->entry[entries_nr].flags = 0;
> -        adev->entry[entries_nr].u.msi.address_lo = msg_addr;
> -        adev->entry[entries_nr].u.msi.address_hi = msg_upper_addr;
> -        adev->entry[entries_nr].u.msi.data = msg_data;
> -        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!", msg_data, msg_addr);
> +        adev->entry[entries_nr].u.msi.address_lo = entry->addr_lo;
> +        adev->entry[entries_nr].u.msi.address_hi = entry->addr_hi;
> +        adev->entry[entries_nr].u.msi.data = entry->data;
> +        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!",
> +              entry->data, entry->addr_lo);
>  	kvm_add_routing_entry(&adev->entry[entries_nr]);
>  
>          msix_entry.gsi = adev->entry[entries_nr].gsi;
> @@ -1443,7 +1439,7 @@ static uint64_t msix_mmio_read(void *opaque, target_phys_addr_t addr,
>      AssignedDevice *adev = opaque;
>      uint64_t val;
>  
> -    memcpy(&val, (void *)((uint8_t *)adev->msix_table_page + addr), size);
> +    memcpy(&val, (void *)((uint8_t *)adev->msix_table + addr), size);
>  
>      return val;
>  }
> @@ -1456,7 +1452,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
>      DEBUG("write to MSI-X entry table mmio offset 0x%lx, val 0x%lx\n",
>            addr, val);
>  
> -    memcpy((void *)((uint8_t *)adev->msix_table_page + addr), &val, size);
> +    memcpy((void *)((uint8_t *)adev->msix_table + addr), &val, size);
>  }
>  
>  static const MemoryRegionOps msix_mmio_ops = {
> @@ -1471,15 +1467,13 @@ static const MemoryRegionOps msix_mmio_ops = {
>  
>  static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
>  {
> -    dev->msix_table_page = mmap(NULL, 0x1000,
> -                                PROT_READ|PROT_WRITE,
> -                                MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
> -    if (dev->msix_table_page == MAP_FAILED) {
> -        fprintf(stderr, "fail allocate msix_table_page! %s\n",
> -                strerror(errno));
> +    dev->msix_table = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE,
> +                           MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
> +    if (dev->msix_table == MAP_FAILED) {
> +        fprintf(stderr, "fail allocate msix_table! %s\n", strerror(errno));
>          return -EFAULT;
>      }
> -    memset(dev->msix_table_page, 0, 0x1000);
> +    memset(dev->msix_table, 0, 0x1000);
>      memory_region_init_io(&dev->mmio, &msix_mmio_ops, dev,
>                            "assigned-dev-msix", MSIX_PAGE_SIZE);
>      return 0;
> @@ -1487,16 +1481,17 @@ static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
>  
>  static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
>  {
> -    if (!dev->msix_table_page)
> +    if (!dev->msix_table) {
>          return;
> +    }
>  
>      memory_region_destroy(&dev->mmio);
>  
> -    if (munmap(dev->msix_table_page, 0x1000) == -1) {
> -        fprintf(stderr, "error unmapping msix_table_page! %s\n",
> +    if (munmap(dev->msix_table, 0x1000) == -1) {
> +        fprintf(stderr, "error unmapping msix_table! %s\n",
>                  strerror(errno));
>      }
> -    dev->msix_table_page = NULL;
> +    dev->msix_table = NULL;
>  }
>  
>  static const VMStateDescription vmstate_assigned_device = {
> diff --git a/hw/device-assignment.h b/hw/device-assignment.h
> index 8ee2274..e229303 100644
> --- a/hw/device-assignment.h
> +++ b/hw/device-assignment.h
> @@ -82,6 +82,13 @@ typedef struct {
>  #define ASSIGNED_DEVICE_PREFER_MSI_MASK	(1 << ASSIGNED_DEVICE_PREFER_MSI_BIT)
>  #define ASSIGNED_DEVICE_USE_MEM64_MASK	(1 << ASSIGNED_DEVICE_USE_MEM64_BIT)
>  
> +typedef struct {
> +    uint32_t addr_lo;
> +    uint32_t addr_hi;
> +    uint32_t data;
> +    uint32_t ctrl;
> +} MSIXTableEntry;
> +
>  typedef struct AssignedDevice {
>      PCIDevice dev;
>      PCIHostDevice host;
> @@ -110,7 +117,7 @@ typedef struct AssignedDevice {
>      uint8_t emulate_config_write[PCI_CONFIG_SPACE_SIZE];
>      int irq_entries_nr;
>      struct kvm_irq_routing_entry *entry;
> -    void *msix_table_page;
> +    MSIXTableEntry *msix_table;
>      target_phys_addr_t msix_table_addr;
>      MemoryRegion mmio;
>      char *configfd_name;
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 6/9] pci-assign: Proper initialization for MSI-X table
  2012-01-28 14:22 ` [PATCH 6/9] pci-assign: Proper initialization for MSI-X table Alex Williamson
@ 2012-01-31 17:40   ` Michael S. Tsirkin
  2012-01-31 19:07     ` Alex Williamson
  0 siblings, 1 reply; 54+ messages in thread
From: Michael S. Tsirkin @ 2012-01-31 17:40 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, jan.kiszka, shashidhar.patil

On Sat, Jan 28, 2012 at 07:22:09AM -0700, Alex Williamson wrote:
> Per the PCI spec, all vectors should be masked at handoff.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> 
>  hw/device-assignment.c |   20 +++++++++++++++++++-
>  1 files changed, 19 insertions(+), 1 deletions(-)
> 
> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> index af614d3..6efa219 100644
> --- a/hw/device-assignment.c
> +++ b/hw/device-assignment.c
> @@ -1462,6 +1462,22 @@ static const MemoryRegionOps msix_mmio_ops = {
>      },
>  };
>  
> +static void msix_reset(AssignedDevice *dev)
> +{
> +    MSIXTableEntry *entry;
> +    int i;
> +
> +    if (!dev->msix_table) {
> +        return;
> +    }
> +
> +    memset(dev->msix_table, 0, 0x1000);
> +
> +    for (i = 0, entry = dev->msix_table; i < dev->msix_max; i++, entry++) {
> +        entry->ctrl = 0x1; /* Masked */

This is broken for BE hosts.

> +    }
> +}
> +
>  static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
>  {
>      dev->msix_table = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE,
> @@ -1470,7 +1486,9 @@ static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
>          fprintf(stderr, "fail allocate msix_table! %s\n", strerror(errno));
>          return -EFAULT;
>      }
> -    memset(dev->msix_table, 0, 0x1000);
> +
> +    msix_reset(dev);
> +
>      memory_region_init_io(&dev->mmio, &msix_mmio_ops, dev,
>                            "assigned-dev-msix", MSIX_PAGE_SIZE);
>      return 0;
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 4/9] pci-assign: Use struct for MSI-X table
  2012-01-31 17:40   ` Michael S. Tsirkin
@ 2012-01-31 19:05     ` Alex Williamson
  2012-01-31 20:00       ` Michael S. Tsirkin
  0 siblings, 1 reply; 54+ messages in thread
From: Alex Williamson @ 2012-01-31 19:05 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, jan.kiszka, shashidhar.patil

On Tue, 2012-01-31 at 19:40 +0200, Michael S. Tsirkin wrote:
> On Sat, Jan 28, 2012 at 07:21:58AM -0700, Alex Williamson wrote:
> > This makes access much easier.
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> 
> Yes but this also makes it easier to forget
> to handle endian-ness.
> How about using pci_get/pci_set instead?

Do we really think existing device assignment is ever going to work on
anything but x86/x86?

Alex

> > ---
> > 
> >  hw/device-assignment.c |   55 ++++++++++++++++++++++--------------------------
> >  hw/device-assignment.h |    9 +++++++-
> >  2 files changed, 33 insertions(+), 31 deletions(-)
> > 
> > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > index 1fc7ffa..422ee00 100644
> > --- a/hw/device-assignment.c
> > +++ b/hw/device-assignment.c
> > @@ -973,10 +973,9 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> >      AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
> >      uint16_t entries_nr = 0, entries_max_nr;
> >      int pos = 0, i, r = 0;
> > -    uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
> >      struct kvm_assigned_msix_nr msix_nr;
> >      struct kvm_assigned_msix_entry msix_entry;
> > -    void *va = adev->msix_table_page;
> > +    MSIXTableEntry *entry = adev->msix_table;
> >  
> >      pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
> >  
> > @@ -985,12 +984,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> >      entries_max_nr += 1;
> >  
> >      /* Get the usable entry number for allocating */
> > -    for (i = 0; i < entries_max_nr; i++) {
> > -        memcpy(&msg_ctrl, va + i * 16 + 12, 4);
> > -        memcpy(&msg_data, va + i * 16 + 8, 4);
> > +    for (i = 0; i < entries_max_nr; i++, entry++) {
> >          /* Ignore unused entry even it's unmasked */
> > -        if (msg_data == 0)
> > +        if (entry->data == 0) {
> >              continue;
> > +        }
> >          entries_nr ++;
> >      }
> >  
> > @@ -1013,16 +1011,13 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> >  
> >      msix_entry.assigned_dev_id = msix_nr.assigned_dev_id;
> >      entries_nr = 0;
> > -    for (i = 0; i < entries_max_nr; i++) {
> > +    entry = adev->msix_table;
> > +    for (i = 0; i < entries_max_nr; i++, entry++) {
> >          if (entries_nr >= msix_nr.entry_nr)
> >              break;
> > -        memcpy(&msg_ctrl, va + i * 16 + 12, 4);
> > -        memcpy(&msg_data, va + i * 16 + 8, 4);
> > -        if (msg_data == 0)
> > +        if (entry->data == 0) {
> >              continue;
> > -
> > -        memcpy(&msg_addr, va + i * 16, 4);
> > -        memcpy(&msg_upper_addr, va + i * 16 + 4, 4);
> > +        }
> >  
> >          r = kvm_get_irq_route_gsi();
> >          if (r < 0)
> > @@ -1031,10 +1026,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> >          adev->entry[entries_nr].gsi = r;
> >          adev->entry[entries_nr].type = KVM_IRQ_ROUTING_MSI;
> >          adev->entry[entries_nr].flags = 0;
> > -        adev->entry[entries_nr].u.msi.address_lo = msg_addr;
> > -        adev->entry[entries_nr].u.msi.address_hi = msg_upper_addr;
> > -        adev->entry[entries_nr].u.msi.data = msg_data;
> > -        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!", msg_data, msg_addr);
> > +        adev->entry[entries_nr].u.msi.address_lo = entry->addr_lo;
> > +        adev->entry[entries_nr].u.msi.address_hi = entry->addr_hi;
> > +        adev->entry[entries_nr].u.msi.data = entry->data;
> > +        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!",
> > +              entry->data, entry->addr_lo);
> >  	kvm_add_routing_entry(&adev->entry[entries_nr]);
> >  
> >          msix_entry.gsi = adev->entry[entries_nr].gsi;
> > @@ -1443,7 +1439,7 @@ static uint64_t msix_mmio_read(void *opaque, target_phys_addr_t addr,
> >      AssignedDevice *adev = opaque;
> >      uint64_t val;
> >  
> > -    memcpy(&val, (void *)((uint8_t *)adev->msix_table_page + addr), size);
> > +    memcpy(&val, (void *)((uint8_t *)adev->msix_table + addr), size);
> >  
> >      return val;
> >  }
> > @@ -1456,7 +1452,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
> >      DEBUG("write to MSI-X entry table mmio offset 0x%lx, val 0x%lx\n",
> >            addr, val);
> >  
> > -    memcpy((void *)((uint8_t *)adev->msix_table_page + addr), &val, size);
> > +    memcpy((void *)((uint8_t *)adev->msix_table + addr), &val, size);
> >  }
> >  
> >  static const MemoryRegionOps msix_mmio_ops = {
> > @@ -1471,15 +1467,13 @@ static const MemoryRegionOps msix_mmio_ops = {
> >  
> >  static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
> >  {
> > -    dev->msix_table_page = mmap(NULL, 0x1000,
> > -                                PROT_READ|PROT_WRITE,
> > -                                MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
> > -    if (dev->msix_table_page == MAP_FAILED) {
> > -        fprintf(stderr, "fail allocate msix_table_page! %s\n",
> > -                strerror(errno));
> > +    dev->msix_table = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE,
> > +                           MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
> > +    if (dev->msix_table == MAP_FAILED) {
> > +        fprintf(stderr, "fail allocate msix_table! %s\n", strerror(errno));
> >          return -EFAULT;
> >      }
> > -    memset(dev->msix_table_page, 0, 0x1000);
> > +    memset(dev->msix_table, 0, 0x1000);
> >      memory_region_init_io(&dev->mmio, &msix_mmio_ops, dev,
> >                            "assigned-dev-msix", MSIX_PAGE_SIZE);
> >      return 0;
> > @@ -1487,16 +1481,17 @@ static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
> >  
> >  static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
> >  {
> > -    if (!dev->msix_table_page)
> > +    if (!dev->msix_table) {
> >          return;
> > +    }
> >  
> >      memory_region_destroy(&dev->mmio);
> >  
> > -    if (munmap(dev->msix_table_page, 0x1000) == -1) {
> > -        fprintf(stderr, "error unmapping msix_table_page! %s\n",
> > +    if (munmap(dev->msix_table, 0x1000) == -1) {
> > +        fprintf(stderr, "error unmapping msix_table! %s\n",
> >                  strerror(errno));
> >      }
> > -    dev->msix_table_page = NULL;
> > +    dev->msix_table = NULL;
> >  }
> >  
> >  static const VMStateDescription vmstate_assigned_device = {
> > diff --git a/hw/device-assignment.h b/hw/device-assignment.h
> > index 8ee2274..e229303 100644
> > --- a/hw/device-assignment.h
> > +++ b/hw/device-assignment.h
> > @@ -82,6 +82,13 @@ typedef struct {
> >  #define ASSIGNED_DEVICE_PREFER_MSI_MASK	(1 << ASSIGNED_DEVICE_PREFER_MSI_BIT)
> >  #define ASSIGNED_DEVICE_USE_MEM64_MASK	(1 << ASSIGNED_DEVICE_USE_MEM64_BIT)
> >  
> > +typedef struct {
> > +    uint32_t addr_lo;
> > +    uint32_t addr_hi;
> > +    uint32_t data;
> > +    uint32_t ctrl;
> > +} MSIXTableEntry;
> > +
> >  typedef struct AssignedDevice {
> >      PCIDevice dev;
> >      PCIHostDevice host;
> > @@ -110,7 +117,7 @@ typedef struct AssignedDevice {
> >      uint8_t emulate_config_write[PCI_CONFIG_SPACE_SIZE];
> >      int irq_entries_nr;
> >      struct kvm_irq_routing_entry *entry;
> > -    void *msix_table_page;
> > +    MSIXTableEntry *msix_table;
> >      target_phys_addr_t msix_table_addr;
> >      MemoryRegion mmio;
> >      char *configfd_name;
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html




^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 6/9] pci-assign: Proper initialization for MSI-X table
  2012-01-31 17:40   ` Michael S. Tsirkin
@ 2012-01-31 19:07     ` Alex Williamson
  2012-01-31 19:12       ` Michael S. Tsirkin
  0 siblings, 1 reply; 54+ messages in thread
From: Alex Williamson @ 2012-01-31 19:07 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, jan.kiszka, shashidhar.patil

On Tue, 2012-01-31 at 19:40 +0200, Michael S. Tsirkin wrote:
> On Sat, Jan 28, 2012 at 07:22:09AM -0700, Alex Williamson wrote:
> > Per the PCI spec, all vectors should be masked at handoff.
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> > 
> >  hw/device-assignment.c |   20 +++++++++++++++++++-
> >  1 files changed, 19 insertions(+), 1 deletions(-)
> > 
> > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > index af614d3..6efa219 100644
> > --- a/hw/device-assignment.c
> > +++ b/hw/device-assignment.c
> > @@ -1462,6 +1462,22 @@ static const MemoryRegionOps msix_mmio_ops = {
> >      },
> >  };
> >  
> > +static void msix_reset(AssignedDevice *dev)
> > +{
> > +    MSIXTableEntry *entry;
> > +    int i;
> > +
> > +    if (!dev->msix_table) {
> > +        return;
> > +    }
> > +
> > +    memset(dev->msix_table, 0, 0x1000);
> > +
> > +    for (i = 0, entry = dev->msix_table; i < dev->msix_max; i++, entry++) {
> > +        entry->ctrl = 0x1; /* Masked */
> 
> This is broken for BE hosts.

Show me a BE host that even remotely works with this device assignment
implementation.  Thanks,

Alex

> > +    }
> > +}
> > +
> >  static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
> >  {
> >      dev->msix_table = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE,
> > @@ -1470,7 +1486,9 @@ static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
> >          fprintf(stderr, "fail allocate msix_table! %s\n", strerror(errno));
> >          return -EFAULT;
> >      }
> > -    memset(dev->msix_table, 0, 0x1000);
> > +
> > +    msix_reset(dev);
> > +
> >      memory_region_init_io(&dev->mmio, &msix_mmio_ops, dev,
> >                            "assigned-dev-msix", MSIX_PAGE_SIZE);
> >      return 0;
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html




^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 6/9] pci-assign: Proper initialization for MSI-X table
  2012-01-31 19:07     ` Alex Williamson
@ 2012-01-31 19:12       ` Michael S. Tsirkin
  2012-01-31 19:16         ` Jan Kiszka
  0 siblings, 1 reply; 54+ messages in thread
From: Michael S. Tsirkin @ 2012-01-31 19:12 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, jan.kiszka, shashidhar.patil

On Tue, Jan 31, 2012 at 12:07:39PM -0700, Alex Williamson wrote:
> On Tue, 2012-01-31 at 19:40 +0200, Michael S. Tsirkin wrote:
> > On Sat, Jan 28, 2012 at 07:22:09AM -0700, Alex Williamson wrote:
> > > Per the PCI spec, all vectors should be masked at handoff.
> > > 
> > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > ---
> > > 
> > >  hw/device-assignment.c |   20 +++++++++++++++++++-
> > >  1 files changed, 19 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > > index af614d3..6efa219 100644
> > > --- a/hw/device-assignment.c
> > > +++ b/hw/device-assignment.c
> > > @@ -1462,6 +1462,22 @@ static const MemoryRegionOps msix_mmio_ops = {
> > >      },
> > >  };
> > >  
> > > +static void msix_reset(AssignedDevice *dev)
> > > +{
> > > +    MSIXTableEntry *entry;
> > > +    int i;
> > > +
> > > +    if (!dev->msix_table) {
> > > +        return;
> > > +    }
> > > +
> > > +    memset(dev->msix_table, 0, 0x1000);
> > > +
> > > +    for (i = 0, entry = dev->msix_table; i < dev->msix_max; i++, entry++) {
> > > +        entry->ctrl = 0x1; /* Masked */
> > 
> > This is broken for BE hosts.
> 
> Show me a BE host that even remotely works with this device assignment
> implementation.  Thanks,
> 
> Alex

I don't get it. Yes lots of cleanup is needed but why add more broken
code?

> > > +    }
> > > +}
> > > +
> > >  static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
> > >  {
> > >      dev->msix_table = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE,
> > > @@ -1470,7 +1486,9 @@ static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
> > >          fprintf(stderr, "fail allocate msix_table! %s\n", strerror(errno));
> > >          return -EFAULT;
> > >      }
> > > -    memset(dev->msix_table, 0, 0x1000);
> > > +
> > > +    msix_reset(dev);
> > > +
> > >      memory_region_init_io(&dev->mmio, &msix_mmio_ops, dev,
> > >                            "assigned-dev-msix", MSIX_PAGE_SIZE);
> > >      return 0;
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 6/9] pci-assign: Proper initialization for MSI-X table
  2012-01-31 19:12       ` Michael S. Tsirkin
@ 2012-01-31 19:16         ` Jan Kiszka
  2012-01-31 20:19           ` Michael S. Tsirkin
  0 siblings, 1 reply; 54+ messages in thread
From: Jan Kiszka @ 2012-01-31 19:16 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Alex Williamson, kvm, shashidhar.patil

On 2012-01-31 20:12, Michael S. Tsirkin wrote:
> On Tue, Jan 31, 2012 at 12:07:39PM -0700, Alex Williamson wrote:
>> On Tue, 2012-01-31 at 19:40 +0200, Michael S. Tsirkin wrote:
>>> On Sat, Jan 28, 2012 at 07:22:09AM -0700, Alex Williamson wrote:
>>>> Per the PCI spec, all vectors should be masked at handoff.
>>>>
>>>> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
>>>> ---
>>>>
>>>>  hw/device-assignment.c |   20 +++++++++++++++++++-
>>>>  1 files changed, 19 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
>>>> index af614d3..6efa219 100644
>>>> --- a/hw/device-assignment.c
>>>> +++ b/hw/device-assignment.c
>>>> @@ -1462,6 +1462,22 @@ static const MemoryRegionOps msix_mmio_ops = {
>>>>      },
>>>>  };
>>>>  
>>>> +static void msix_reset(AssignedDevice *dev)
>>>> +{
>>>> +    MSIXTableEntry *entry;
>>>> +    int i;
>>>> +
>>>> +    if (!dev->msix_table) {
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    memset(dev->msix_table, 0, 0x1000);
>>>> +
>>>> +    for (i = 0, entry = dev->msix_table; i < dev->msix_max; i++, entry++) {
>>>> +        entry->ctrl = 0x1; /* Masked */
>>>
>>> This is broken for BE hosts.
>>
>> Show me a BE host that even remotely works with this device assignment
>> implementation.  Thanks,
>>
>> Alex
> 
> I don't get it. Yes lots of cleanup is needed but why add more broken
> code?

At some point, we will just do this via the PCI core, and that will have
to do it correctly anyway. This code here is supposed to be removed again.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 4/9] pci-assign: Use struct for MSI-X table
  2012-01-31 19:05     ` Alex Williamson
@ 2012-01-31 20:00       ` Michael S. Tsirkin
  2012-01-31 21:17         ` Alex Williamson
  0 siblings, 1 reply; 54+ messages in thread
From: Michael S. Tsirkin @ 2012-01-31 20:00 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, jan.kiszka, shashidhar.patil

On Tue, Jan 31, 2012 at 12:05:49PM -0700, Alex Williamson wrote:
> On Tue, 2012-01-31 at 19:40 +0200, Michael S. Tsirkin wrote:
> > On Sat, Jan 28, 2012 at 07:21:58AM -0700, Alex Williamson wrote:
> > > This makes access much easier.
> > > 
> > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > 
> > Yes but this also makes it easier to forget
> > to handle endian-ness.
> > How about using pci_get/pci_set instead?
> 
> Do we really think existing device assignment is ever going to work on
> anything but x86/x86?
> 
> Alex

It's a question of whether anyone bothers to
fix all portability bugs. Besides endian-ness,
and msix vector decoding, there's not much that is
x86 specific there.

> > > ---
> > > 
> > >  hw/device-assignment.c |   55 ++++++++++++++++++++++--------------------------
> > >  hw/device-assignment.h |    9 +++++++-
> > >  2 files changed, 33 insertions(+), 31 deletions(-)
> > > 
> > > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > > index 1fc7ffa..422ee00 100644
> > > --- a/hw/device-assignment.c
> > > +++ b/hw/device-assignment.c
> > > @@ -973,10 +973,9 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> > >      AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
> > >      uint16_t entries_nr = 0, entries_max_nr;
> > >      int pos = 0, i, r = 0;
> > > -    uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
> > >      struct kvm_assigned_msix_nr msix_nr;
> > >      struct kvm_assigned_msix_entry msix_entry;
> > > -    void *va = adev->msix_table_page;
> > > +    MSIXTableEntry *entry = adev->msix_table;
> > >  
> > >      pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
> > >  
> > > @@ -985,12 +984,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> > >      entries_max_nr += 1;
> > >  
> > >      /* Get the usable entry number for allocating */
> > > -    for (i = 0; i < entries_max_nr; i++) {
> > > -        memcpy(&msg_ctrl, va + i * 16 + 12, 4);
> > > -        memcpy(&msg_data, va + i * 16 + 8, 4);
> > > +    for (i = 0; i < entries_max_nr; i++, entry++) {
> > >          /* Ignore unused entry even it's unmasked */
> > > -        if (msg_data == 0)
> > > +        if (entry->data == 0) {
> > >              continue;
> > > +        }
> > >          entries_nr ++;
> > >      }
> > >  
> > > @@ -1013,16 +1011,13 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> > >  
> > >      msix_entry.assigned_dev_id = msix_nr.assigned_dev_id;
> > >      entries_nr = 0;
> > > -    for (i = 0; i < entries_max_nr; i++) {
> > > +    entry = adev->msix_table;
> > > +    for (i = 0; i < entries_max_nr; i++, entry++) {
> > >          if (entries_nr >= msix_nr.entry_nr)
> > >              break;
> > > -        memcpy(&msg_ctrl, va + i * 16 + 12, 4);
> > > -        memcpy(&msg_data, va + i * 16 + 8, 4);
> > > -        if (msg_data == 0)
> > > +        if (entry->data == 0) {
> > >              continue;
> > > -
> > > -        memcpy(&msg_addr, va + i * 16, 4);
> > > -        memcpy(&msg_upper_addr, va + i * 16 + 4, 4);
> > > +        }
> > >  
> > >          r = kvm_get_irq_route_gsi();
> > >          if (r < 0)
> > > @@ -1031,10 +1026,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> > >          adev->entry[entries_nr].gsi = r;
> > >          adev->entry[entries_nr].type = KVM_IRQ_ROUTING_MSI;
> > >          adev->entry[entries_nr].flags = 0;
> > > -        adev->entry[entries_nr].u.msi.address_lo = msg_addr;
> > > -        adev->entry[entries_nr].u.msi.address_hi = msg_upper_addr;
> > > -        adev->entry[entries_nr].u.msi.data = msg_data;
> > > -        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!", msg_data, msg_addr);
> > > +        adev->entry[entries_nr].u.msi.address_lo = entry->addr_lo;
> > > +        adev->entry[entries_nr].u.msi.address_hi = entry->addr_hi;
> > > +        adev->entry[entries_nr].u.msi.data = entry->data;
> > > +        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!",
> > > +              entry->data, entry->addr_lo);
> > >  	kvm_add_routing_entry(&adev->entry[entries_nr]);
> > >  
> > >          msix_entry.gsi = adev->entry[entries_nr].gsi;
> > > @@ -1443,7 +1439,7 @@ static uint64_t msix_mmio_read(void *opaque, target_phys_addr_t addr,
> > >      AssignedDevice *adev = opaque;
> > >      uint64_t val;
> > >  
> > > -    memcpy(&val, (void *)((uint8_t *)adev->msix_table_page + addr), size);
> > > +    memcpy(&val, (void *)((uint8_t *)adev->msix_table + addr), size);
> > >  
> > >      return val;
> > >  }
> > > @@ -1456,7 +1452,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
> > >      DEBUG("write to MSI-X entry table mmio offset 0x%lx, val 0x%lx\n",
> > >            addr, val);
> > >  
> > > -    memcpy((void *)((uint8_t *)adev->msix_table_page + addr), &val, size);
> > > +    memcpy((void *)((uint8_t *)adev->msix_table + addr), &val, size);
> > >  }
> > >  
> > >  static const MemoryRegionOps msix_mmio_ops = {
> > > @@ -1471,15 +1467,13 @@ static const MemoryRegionOps msix_mmio_ops = {
> > >  
> > >  static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
> > >  {
> > > -    dev->msix_table_page = mmap(NULL, 0x1000,
> > > -                                PROT_READ|PROT_WRITE,
> > > -                                MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
> > > -    if (dev->msix_table_page == MAP_FAILED) {
> > > -        fprintf(stderr, "fail allocate msix_table_page! %s\n",
> > > -                strerror(errno));
> > > +    dev->msix_table = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE,
> > > +                           MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
> > > +    if (dev->msix_table == MAP_FAILED) {
> > > +        fprintf(stderr, "fail allocate msix_table! %s\n", strerror(errno));
> > >          return -EFAULT;
> > >      }
> > > -    memset(dev->msix_table_page, 0, 0x1000);
> > > +    memset(dev->msix_table, 0, 0x1000);
> > >      memory_region_init_io(&dev->mmio, &msix_mmio_ops, dev,
> > >                            "assigned-dev-msix", MSIX_PAGE_SIZE);
> > >      return 0;
> > > @@ -1487,16 +1481,17 @@ static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
> > >  
> > >  static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
> > >  {
> > > -    if (!dev->msix_table_page)
> > > +    if (!dev->msix_table) {
> > >          return;
> > > +    }
> > >  
> > >      memory_region_destroy(&dev->mmio);
> > >  
> > > -    if (munmap(dev->msix_table_page, 0x1000) == -1) {
> > > -        fprintf(stderr, "error unmapping msix_table_page! %s\n",
> > > +    if (munmap(dev->msix_table, 0x1000) == -1) {
> > > +        fprintf(stderr, "error unmapping msix_table! %s\n",
> > >                  strerror(errno));
> > >      }
> > > -    dev->msix_table_page = NULL;
> > > +    dev->msix_table = NULL;
> > >  }
> > >  
> > >  static const VMStateDescription vmstate_assigned_device = {
> > > diff --git a/hw/device-assignment.h b/hw/device-assignment.h
> > > index 8ee2274..e229303 100644
> > > --- a/hw/device-assignment.h
> > > +++ b/hw/device-assignment.h
> > > @@ -82,6 +82,13 @@ typedef struct {
> > >  #define ASSIGNED_DEVICE_PREFER_MSI_MASK	(1 << ASSIGNED_DEVICE_PREFER_MSI_BIT)
> > >  #define ASSIGNED_DEVICE_USE_MEM64_MASK	(1 << ASSIGNED_DEVICE_USE_MEM64_BIT)
> > >  
> > > +typedef struct {
> > > +    uint32_t addr_lo;
> > > +    uint32_t addr_hi;
> > > +    uint32_t data;
> > > +    uint32_t ctrl;
> > > +} MSIXTableEntry;
> > > +
> > >  typedef struct AssignedDevice {
> > >      PCIDevice dev;
> > >      PCIHostDevice host;
> > > @@ -110,7 +117,7 @@ typedef struct AssignedDevice {
> > >      uint8_t emulate_config_write[PCI_CONFIG_SPACE_SIZE];
> > >      int irq_entries_nr;
> > >      struct kvm_irq_routing_entry *entry;
> > > -    void *msix_table_page;
> > > +    MSIXTableEntry *msix_table;
> > >      target_phys_addr_t msix_table_addr;
> > >      MemoryRegion mmio;
> > >      char *configfd_name;
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 5/9] pci-assign: Only calculate maximum MSI-X vector entries once
  2012-01-28 14:22 ` [PATCH 5/9] pci-assign: Only calculate maximum MSI-X vector entries once Alex Williamson
@ 2012-01-31 20:18   ` Michael S. Tsirkin
  2012-01-31 20:31     ` Alex Williamson
  0 siblings, 1 reply; 54+ messages in thread
From: Michael S. Tsirkin @ 2012-01-31 20:18 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, jan.kiszka, shashidhar.patil

On Sat, Jan 28, 2012 at 07:22:04AM -0700, Alex Williamson wrote:
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---

Why? Optimization?

>  hw/device-assignment.c |   17 +++++++----------
>  hw/device-assignment.h |    1 +
>  2 files changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> index 422ee00..af614d3 100644
> --- a/hw/device-assignment.c
> +++ b/hw/device-assignment.c
> @@ -971,20 +971,14 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
>  static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
>  {
>      AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
> -    uint16_t entries_nr = 0, entries_max_nr;
> -    int pos = 0, i, r = 0;
> +    uint16_t entries_nr = 0;
> +    int i, r = 0;
>      struct kvm_assigned_msix_nr msix_nr;
>      struct kvm_assigned_msix_entry msix_entry;
>      MSIXTableEntry *entry = adev->msix_table;
>  
> -    pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
> -
> -    entries_max_nr = *(uint16_t *)(pci_dev->config + pos + 2);
> -    entries_max_nr &= PCI_MSIX_FLAGS_QSIZE;
> -    entries_max_nr += 1;
> -
>      /* Get the usable entry number for allocating */
> -    for (i = 0; i < entries_max_nr; i++, entry++) {
> +    for (i = 0; i < adev->msix_max; i++, entry++) {
>          /* Ignore unused entry even it's unmasked */
>          if (entry->data == 0) {
>              continue;
> @@ -1012,7 +1006,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
>      msix_entry.assigned_dev_id = msix_nr.assigned_dev_id;
>      entries_nr = 0;
>      entry = adev->msix_table;
> -    for (i = 0; i < entries_max_nr; i++, entry++) {
> +    for (i = 0; i < adev->msix_max; i++, entry++) {
>          if (entries_nr >= msix_nr.entry_nr)
>              break;
>          if (entry->data == 0) {
> @@ -1225,6 +1219,9 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>          bar_nr = msix_table_entry & PCI_MSIX_FLAGS_BIRMASK;
>          msix_table_entry &= ~PCI_MSIX_FLAGS_BIRMASK;
>          dev->msix_table_addr = pci_region[bar_nr].base_addr + msix_table_entry;
> +        dev->msix_max = pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS);
> +        dev->msix_max &= PCI_MSIX_FLAGS_QSIZE;
> +        dev->msix_max += 1;
>      }
>  
>      /* Minimal PM support, nothing writable, device appears to NAK changes */
> diff --git a/hw/device-assignment.h b/hw/device-assignment.h
> index e229303..a909fb2 100644
> --- a/hw/device-assignment.h
> +++ b/hw/device-assignment.h
> @@ -119,6 +119,7 @@ typedef struct AssignedDevice {
>      struct kvm_irq_routing_entry *entry;
>      MSIXTableEntry *msix_table;
>      target_phys_addr_t msix_table_addr;
> +    uint16_t msix_max;
>      MemoryRegion mmio;
>      char *configfd_name;
>      int32_t bootindex;
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 6/9] pci-assign: Proper initialization for MSI-X table
  2012-01-31 19:16         ` Jan Kiszka
@ 2012-01-31 20:19           ` Michael S. Tsirkin
  2012-01-31 21:06             ` Alex Williamson
  0 siblings, 1 reply; 54+ messages in thread
From: Michael S. Tsirkin @ 2012-01-31 20:19 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Alex Williamson, kvm, shashidhar.patil

On Tue, Jan 31, 2012 at 08:16:31PM +0100, Jan Kiszka wrote:
> On 2012-01-31 20:12, Michael S. Tsirkin wrote:
> > On Tue, Jan 31, 2012 at 12:07:39PM -0700, Alex Williamson wrote:
> >> On Tue, 2012-01-31 at 19:40 +0200, Michael S. Tsirkin wrote:
> >>> On Sat, Jan 28, 2012 at 07:22:09AM -0700, Alex Williamson wrote:
> >>>> Per the PCI spec, all vectors should be masked at handoff.
> >>>>
> >>>> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> >>>> ---
> >>>>
> >>>>  hw/device-assignment.c |   20 +++++++++++++++++++-
> >>>>  1 files changed, 19 insertions(+), 1 deletions(-)
> >>>>
> >>>> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> >>>> index af614d3..6efa219 100644
> >>>> --- a/hw/device-assignment.c
> >>>> +++ b/hw/device-assignment.c
> >>>> @@ -1462,6 +1462,22 @@ static const MemoryRegionOps msix_mmio_ops = {
> >>>>      },
> >>>>  };
> >>>>  
> >>>> +static void msix_reset(AssignedDevice *dev)
> >>>> +{
> >>>> +    MSIXTableEntry *entry;
> >>>> +    int i;
> >>>> +
> >>>> +    if (!dev->msix_table) {
> >>>> +        return;
> >>>> +    }
> >>>> +
> >>>> +    memset(dev->msix_table, 0, 0x1000);
> >>>> +
> >>>> +    for (i = 0, entry = dev->msix_table; i < dev->msix_max; i++, entry++) {
> >>>> +        entry->ctrl = 0x1; /* Masked */
> >>>
> >>> This is broken for BE hosts.
> >>
> >> Show me a BE host that even remotely works with this device assignment
> >> implementation.  Thanks,
> >>
> >> Alex
> > 
> > I don't get it. Yes lots of cleanup is needed but why add more broken
> > code?
> 
> At some point, we will just do this via the PCI core, and that will have
> to do it correctly anyway. This code here is supposed to be removed again.
> 
> Jan

Either we say this is all dead code, then
let's just fix bugs until it is rewritten.
To fix the bug, all we need is patch 9, maybe 6 and 7.

Or there is intent to maintain it going forward
then, methinks, it needs to be brought up to
some minimal standards.


> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 5/9] pci-assign: Only calculate maximum MSI-X vector entries once
  2012-01-31 20:18   ` Michael S. Tsirkin
@ 2012-01-31 20:31     ` Alex Williamson
  2012-01-31 20:56       ` Michael S. Tsirkin
  0 siblings, 1 reply; 54+ messages in thread
From: Alex Williamson @ 2012-01-31 20:31 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, jan.kiszka, shashidhar.patil

On Tue, 2012-01-31 at 22:18 +0200, Michael S. Tsirkin wrote:
> On Sat, Jan 28, 2012 at 07:22:04AM -0700, Alex Williamson wrote:
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> 
> Why? Optimization?

Because in 6/9 we'd have to calculate it again for resetting the msix
table and in 9/9 we use it to drop writes outside of the valid vector
range.  I suspect we might actually want to call msix_reset on device
reset, so that means we'd be recalculating it every time we reset the
device, write to the msix table, and again if we're actually updating
msix entries.  Redundancy exceeded my cost of two bytes metric.  Thanks,

Alex

> >  hw/device-assignment.c |   17 +++++++----------
> >  hw/device-assignment.h |    1 +
> >  2 files changed, 8 insertions(+), 10 deletions(-)
> > 
> > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > index 422ee00..af614d3 100644
> > --- a/hw/device-assignment.c
> > +++ b/hw/device-assignment.c
> > @@ -971,20 +971,14 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
> >  static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> >  {
> >      AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
> > -    uint16_t entries_nr = 0, entries_max_nr;
> > -    int pos = 0, i, r = 0;
> > +    uint16_t entries_nr = 0;
> > +    int i, r = 0;
> >      struct kvm_assigned_msix_nr msix_nr;
> >      struct kvm_assigned_msix_entry msix_entry;
> >      MSIXTableEntry *entry = adev->msix_table;
> >  
> > -    pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
> > -
> > -    entries_max_nr = *(uint16_t *)(pci_dev->config + pos + 2);
> > -    entries_max_nr &= PCI_MSIX_FLAGS_QSIZE;
> > -    entries_max_nr += 1;
> > -
> >      /* Get the usable entry number for allocating */
> > -    for (i = 0; i < entries_max_nr; i++, entry++) {
> > +    for (i = 0; i < adev->msix_max; i++, entry++) {
> >          /* Ignore unused entry even it's unmasked */
> >          if (entry->data == 0) {
> >              continue;
> > @@ -1012,7 +1006,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> >      msix_entry.assigned_dev_id = msix_nr.assigned_dev_id;
> >      entries_nr = 0;
> >      entry = adev->msix_table;
> > -    for (i = 0; i < entries_max_nr; i++, entry++) {
> > +    for (i = 0; i < adev->msix_max; i++, entry++) {
> >          if (entries_nr >= msix_nr.entry_nr)
> >              break;
> >          if (entry->data == 0) {
> > @@ -1225,6 +1219,9 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
> >          bar_nr = msix_table_entry & PCI_MSIX_FLAGS_BIRMASK;
> >          msix_table_entry &= ~PCI_MSIX_FLAGS_BIRMASK;
> >          dev->msix_table_addr = pci_region[bar_nr].base_addr + msix_table_entry;
> > +        dev->msix_max = pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS);
> > +        dev->msix_max &= PCI_MSIX_FLAGS_QSIZE;
> > +        dev->msix_max += 1;
> >      }
> >  
> >      /* Minimal PM support, nothing writable, device appears to NAK changes */
> > diff --git a/hw/device-assignment.h b/hw/device-assignment.h
> > index e229303..a909fb2 100644
> > --- a/hw/device-assignment.h
> > +++ b/hw/device-assignment.h
> > @@ -119,6 +119,7 @@ typedef struct AssignedDevice {
> >      struct kvm_irq_routing_entry *entry;
> >      MSIXTableEntry *msix_table;
> >      target_phys_addr_t msix_table_addr;
> > +    uint16_t msix_max;
> >      MemoryRegion mmio;
> >      char *configfd_name;
> >      int32_t bootindex;
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html




^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 5/9] pci-assign: Only calculate maximum MSI-X vector entries once
  2012-01-31 20:31     ` Alex Williamson
@ 2012-01-31 20:56       ` Michael S. Tsirkin
  0 siblings, 0 replies; 54+ messages in thread
From: Michael S. Tsirkin @ 2012-01-31 20:56 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, jan.kiszka, shashidhar.patil

On Tue, Jan 31, 2012 at 01:31:59PM -0700, Alex Williamson wrote:
> On Tue, 2012-01-31 at 22:18 +0200, Michael S. Tsirkin wrote:
> > On Sat, Jan 28, 2012 at 07:22:04AM -0700, Alex Williamson wrote:
> > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > ---
> > 
> > Why? Optimization?
> 
> Because in 6/9 we'd have to calculate it again for resetting the msix
> table and in 9/9 we use it to drop writes outside of the valid vector
> range.  I suspect we might actually want to call msix_reset on device
> reset, so that means we'd be recalculating it every time we reset the
> device, write to the msix table, and again if we're actually updating
> msix entries.  Redundancy exceeded my cost of two bytes metric.  Thanks,
> 
> Alex

No problem but motivation should probably go into commit log.

> > >  hw/device-assignment.c |   17 +++++++----------
> > >  hw/device-assignment.h |    1 +
> > >  2 files changed, 8 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > > index 422ee00..af614d3 100644
> > > --- a/hw/device-assignment.c
> > > +++ b/hw/device-assignment.c
> > > @@ -971,20 +971,14 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
> > >  static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> > >  {
> > >      AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
> > > -    uint16_t entries_nr = 0, entries_max_nr;
> > > -    int pos = 0, i, r = 0;
> > > +    uint16_t entries_nr = 0;
> > > +    int i, r = 0;
> > >      struct kvm_assigned_msix_nr msix_nr;
> > >      struct kvm_assigned_msix_entry msix_entry;
> > >      MSIXTableEntry *entry = adev->msix_table;
> > >  
> > > -    pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
> > > -
> > > -    entries_max_nr = *(uint16_t *)(pci_dev->config + pos + 2);
> > > -    entries_max_nr &= PCI_MSIX_FLAGS_QSIZE;
> > > -    entries_max_nr += 1;
> > > -
> > >      /* Get the usable entry number for allocating */
> > > -    for (i = 0; i < entries_max_nr; i++, entry++) {
> > > +    for (i = 0; i < adev->msix_max; i++, entry++) {
> > >          /* Ignore unused entry even it's unmasked */
> > >          if (entry->data == 0) {
> > >              continue;
> > > @@ -1012,7 +1006,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> > >      msix_entry.assigned_dev_id = msix_nr.assigned_dev_id;
> > >      entries_nr = 0;
> > >      entry = adev->msix_table;
> > > -    for (i = 0; i < entries_max_nr; i++, entry++) {
> > > +    for (i = 0; i < adev->msix_max; i++, entry++) {
> > >          if (entries_nr >= msix_nr.entry_nr)
> > >              break;
> > >          if (entry->data == 0) {
> > > @@ -1225,6 +1219,9 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
> > >          bar_nr = msix_table_entry & PCI_MSIX_FLAGS_BIRMASK;
> > >          msix_table_entry &= ~PCI_MSIX_FLAGS_BIRMASK;
> > >          dev->msix_table_addr = pci_region[bar_nr].base_addr + msix_table_entry;
> > > +        dev->msix_max = pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS);
> > > +        dev->msix_max &= PCI_MSIX_FLAGS_QSIZE;
> > > +        dev->msix_max += 1;
> > >      }
> > >  
> > >      /* Minimal PM support, nothing writable, device appears to NAK changes */
> > > diff --git a/hw/device-assignment.h b/hw/device-assignment.h
> > > index e229303..a909fb2 100644
> > > --- a/hw/device-assignment.h
> > > +++ b/hw/device-assignment.h
> > > @@ -119,6 +119,7 @@ typedef struct AssignedDevice {
> > >      struct kvm_irq_routing_entry *entry;
> > >      MSIXTableEntry *msix_table;
> > >      target_phys_addr_t msix_table_addr;
> > > +    uint16_t msix_max;
> > >      MemoryRegion mmio;
> > >      char *configfd_name;
> > >      int32_t bootindex;
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 6/9] pci-assign: Proper initialization for MSI-X table
  2012-01-31 20:19           ` Michael S. Tsirkin
@ 2012-01-31 21:06             ` Alex Williamson
  0 siblings, 0 replies; 54+ messages in thread
From: Alex Williamson @ 2012-01-31 21:06 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jan Kiszka, kvm, shashidhar.patil

On Tue, 2012-01-31 at 22:19 +0200, Michael S. Tsirkin wrote:
> On Tue, Jan 31, 2012 at 08:16:31PM +0100, Jan Kiszka wrote:
> > On 2012-01-31 20:12, Michael S. Tsirkin wrote:
> > > On Tue, Jan 31, 2012 at 12:07:39PM -0700, Alex Williamson wrote:
> > >> On Tue, 2012-01-31 at 19:40 +0200, Michael S. Tsirkin wrote:
> > >>> On Sat, Jan 28, 2012 at 07:22:09AM -0700, Alex Williamson wrote:
> > >>>> Per the PCI spec, all vectors should be masked at handoff.
> > >>>>
> > >>>> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > >>>> ---
> > >>>>
> > >>>>  hw/device-assignment.c |   20 +++++++++++++++++++-
> > >>>>  1 files changed, 19 insertions(+), 1 deletions(-)
> > >>>>
> > >>>> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > >>>> index af614d3..6efa219 100644
> > >>>> --- a/hw/device-assignment.c
> > >>>> +++ b/hw/device-assignment.c
> > >>>> @@ -1462,6 +1462,22 @@ static const MemoryRegionOps msix_mmio_ops = {
> > >>>>      },
> > >>>>  };
> > >>>>  
> > >>>> +static void msix_reset(AssignedDevice *dev)
> > >>>> +{
> > >>>> +    MSIXTableEntry *entry;
> > >>>> +    int i;
> > >>>> +
> > >>>> +    if (!dev->msix_table) {
> > >>>> +        return;
> > >>>> +    }
> > >>>> +
> > >>>> +    memset(dev->msix_table, 0, 0x1000);
> > >>>> +
> > >>>> +    for (i = 0, entry = dev->msix_table; i < dev->msix_max; i++, entry++) {
> > >>>> +        entry->ctrl = 0x1; /* Masked */
> > >>>
> > >>> This is broken for BE hosts.
> > >>
> > >> Show me a BE host that even remotely works with this device assignment
> > >> implementation.  Thanks,
> > >>
> > >> Alex
> > > 
> > > I don't get it. Yes lots of cleanup is needed but why add more broken
> > > code?
> > 
> > At some point, we will just do this via the PCI core, and that will have
> > to do it correctly anyway. This code here is supposed to be removed again.
> > 
> > Jan
> 
> Either we say this is all dead code, then
> let's just fix bugs until it is rewritten.
> To fix the bug, all we need is patch 9, maybe 6 and 7.
> 
> Or there is intent to maintain it going forward
> then, methinks, it needs to be brought up to
> some minimal standards.

There's minimal standards and then there's endian neutrality and non-x86
support for a piece of code that, I think, is probably never going to
see qemu.git.  There's a big gap between those.  I'd love for us to move
to vfio, but for now we're stuck with this and I'll try to update it and
make it a little better each time.  Looking out for big endian hosts
isn't high on my priority list for this code though.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest
  2012-01-31 13:33               ` Avi Kivity
@ 2012-01-31 21:08                 ` Alex Williamson
  2012-01-31 21:14                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 54+ messages in thread
From: Alex Williamson @ 2012-01-31 21:08 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Jan Kiszka, kvm, shashidhar.patil, Michael S. Tsirkin

On Tue, 2012-01-31 at 15:33 +0200, Avi Kivity wrote:
> On 01/31/2012 03:21 PM, Jan Kiszka wrote:
> > On 2012-01-31 14:10, Avi Kivity wrote:
> > > On 01/31/2012 02:57 PM, Jan Kiszka wrote:
> > >>>>> Seems fine, but do we really need the option?  If it doesn't work we
> > >>>>> should treat it as an ordinary but and fix it.
> > >>>>
> > >>>> So far it's against the architecture of the emulated system: our current
> > >>>> chipset predates 64 bit PCI.
> > >>>>
> > >>>
> > >>> Then it should be enabled/disabled at the chipset level.
> > >>
> > >> Makes me wonder if we already do some filtering if the device supports
> > >> 64 bit but the next bridge does not.
> > >>
> > > 
> > > Our 440fx does support 64-bit bars, so the question doesn't arise for
> > > x86.  Instead we violate the spec.
> >
> > If you mean by "our" the 440fx-qemu, not the real 440fx. That one does
> > not even support >1GB RAM.
> 
> Yes, that's what I meant.  It also supports pci hotplug, more slots, cpu
> hotplug, etc.

I'll drop this patch for now, it was just something I enabled based on a
query from MST and didn't want to lose it.  Maybe we need the option in
PCI core, but I'm just turn it on and hope for the best w/o giving users
a way to disable it.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 3/9] pci-assign: Update MSI-X MMIO to Memory API
  2012-01-31 12:45   ` Avi Kivity
@ 2012-01-31 21:13     ` Alex Williamson
  2012-02-01  4:22       ` Alex Williamson
  0 siblings, 1 reply; 54+ messages in thread
From: Alex Williamson @ 2012-01-31 21:13 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, jan.kiszka, shashidhar.patil

On Tue, 2012-01-31 at 14:45 +0200, Avi Kivity wrote:
> On 01/28/2012 04:21 PM, Alex Williamson wrote:
> > Stop using compatibility mode and at the same time fix available
> > access sizes.  The PCI spec indicates that the MSI-X table may
> > only be accessed as DWORD or QWORD.
> >
> >  
> >  static const MemoryRegionOps msix_mmio_ops = {
> > -    .old_mmio = {
> > -        .read = { msix_mmio_readb, msix_mmio_readw, msix_mmio_readl, },
> > -        .write = { msix_mmio_writeb, msix_mmio_writew, msix_mmio_writel, },
> > -    },
> > +    .read = msix_mmio_read,
> > +    .write = msix_mmio_write,
> >      .endianness = DEVICE_NATIVE_ENDIAN,
> > +    .impl = {
> > +        .min_access_size = 4,
> > +        .max_access_size = 8,
> > +    },
> >  };
> >  
> 
> .impl.min_access_size = 4 means the core will convert 1-byte I/O to
> 4-byte I/O (using rmw if needed).  That's not what we want, I think you
> can leave it at 1 and explicitly ignore small accesses in the callbacks.
> 
> Have you tested 8-byte I/O?  This is the first user.  Don't you need to
> set .valid.max_access_size?

I have not explicitly tested 8-byte I/O, figured it might just work.
Hmm, I wonder if we really need to be strict enough to reject byte and
word access.  It doesn't follow the spec, but I don't know that it buys
us anything to be strict about it.  Anyway, I'll look at .valid and
respin.  Thanks,

Alex




^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest
  2012-01-31 21:08                 ` Alex Williamson
@ 2012-01-31 21:14                   ` Michael S. Tsirkin
  2012-02-01  9:03                     ` Avi Kivity
  0 siblings, 1 reply; 54+ messages in thread
From: Michael S. Tsirkin @ 2012-01-31 21:14 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Avi Kivity, Jan Kiszka, kvm, shashidhar.patil

On Tue, Jan 31, 2012 at 02:08:38PM -0700, Alex Williamson wrote:
> On Tue, 2012-01-31 at 15:33 +0200, Avi Kivity wrote:
> > On 01/31/2012 03:21 PM, Jan Kiszka wrote:
> > > On 2012-01-31 14:10, Avi Kivity wrote:
> > > > On 01/31/2012 02:57 PM, Jan Kiszka wrote:
> > > >>>>> Seems fine, but do we really need the option?  If it doesn't work we
> > > >>>>> should treat it as an ordinary but and fix it.
> > > >>>>
> > > >>>> So far it's against the architecture of the emulated system: our current
> > > >>>> chipset predates 64 bit PCI.
> > > >>>>
> > > >>>
> > > >>> Then it should be enabled/disabled at the chipset level.
> > > >>
> > > >> Makes me wonder if we already do some filtering if the device supports
> > > >> 64 bit but the next bridge does not.
> > > >>
> > > > 
> > > > Our 440fx does support 64-bit bars, so the question doesn't arise for
> > > > x86.  Instead we violate the spec.
> > >
> > > If you mean by "our" the 440fx-qemu, not the real 440fx. That one does
> > > not even support >1GB RAM.
> > 
> > Yes, that's what I meant.  It also supports pci hotplug, more slots, cpu
> > hotplug, etc.
> 
> I'll drop this patch for now, it was just something I enabled based on a
> query from MST and didn't want to lose it.  Maybe we need the option in
> PCI core, but I'm just turn it on and hope for the best w/o giving users
> a way to disable it.  Thanks,
> 
> Alex

The patch itself makes sense to me.
Avi found some bugs in 64 bit BAR handling, let's wait till that is
fixed then merge this patch.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 4/9] pci-assign: Use struct for MSI-X table
  2012-01-31 20:00       ` Michael S. Tsirkin
@ 2012-01-31 21:17         ` Alex Williamson
  2012-01-31 21:24           ` Michael S. Tsirkin
  0 siblings, 1 reply; 54+ messages in thread
From: Alex Williamson @ 2012-01-31 21:17 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, jan.kiszka, shashidhar.patil

On Tue, 2012-01-31 at 22:00 +0200, Michael S. Tsirkin wrote:
> On Tue, Jan 31, 2012 at 12:05:49PM -0700, Alex Williamson wrote:
> > On Tue, 2012-01-31 at 19:40 +0200, Michael S. Tsirkin wrote:
> > > On Sat, Jan 28, 2012 at 07:21:58AM -0700, Alex Williamson wrote:
> > > > This makes access much easier.
> > > > 
> > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > 
> > > Yes but this also makes it easier to forget
> > > to handle endian-ness.
> > > How about using pci_get/pci_set instead?
> > 
> > Do we really think existing device assignment is ever going to work on
> > anything but x86/x86?
> > 
> > Alex
> 
> It's a question of whether anyone bothers to
> fix all portability bugs. Besides endian-ness,
> and msix vector decoding, there's not much that is
> x86 specific there.

The way interrupt routing is updated from the chipset is disgustingly
x86 specific.  The entire kernel side implementation is only enabled for
x86 (well, not counting ia64).  I expect the kernel side interrupt
injection is very x86, or at least ioapic, specific.  Beyond that, yeah,
it may be trivial ;)  Thanks,

Alex

> > > > ---
> > > > 
> > > >  hw/device-assignment.c |   55 ++++++++++++++++++++++--------------------------
> > > >  hw/device-assignment.h |    9 +++++++-
> > > >  2 files changed, 33 insertions(+), 31 deletions(-)
> > > > 
> > > > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > > > index 1fc7ffa..422ee00 100644
> > > > --- a/hw/device-assignment.c
> > > > +++ b/hw/device-assignment.c
> > > > @@ -973,10 +973,9 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> > > >      AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
> > > >      uint16_t entries_nr = 0, entries_max_nr;
> > > >      int pos = 0, i, r = 0;
> > > > -    uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
> > > >      struct kvm_assigned_msix_nr msix_nr;
> > > >      struct kvm_assigned_msix_entry msix_entry;
> > > > -    void *va = adev->msix_table_page;
> > > > +    MSIXTableEntry *entry = adev->msix_table;
> > > >  
> > > >      pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
> > > >  
> > > > @@ -985,12 +984,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> > > >      entries_max_nr += 1;
> > > >  
> > > >      /* Get the usable entry number for allocating */
> > > > -    for (i = 0; i < entries_max_nr; i++) {
> > > > -        memcpy(&msg_ctrl, va + i * 16 + 12, 4);
> > > > -        memcpy(&msg_data, va + i * 16 + 8, 4);
> > > > +    for (i = 0; i < entries_max_nr; i++, entry++) {
> > > >          /* Ignore unused entry even it's unmasked */
> > > > -        if (msg_data == 0)
> > > > +        if (entry->data == 0) {
> > > >              continue;
> > > > +        }
> > > >          entries_nr ++;
> > > >      }
> > > >  
> > > > @@ -1013,16 +1011,13 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> > > >  
> > > >      msix_entry.assigned_dev_id = msix_nr.assigned_dev_id;
> > > >      entries_nr = 0;
> > > > -    for (i = 0; i < entries_max_nr; i++) {
> > > > +    entry = adev->msix_table;
> > > > +    for (i = 0; i < entries_max_nr; i++, entry++) {
> > > >          if (entries_nr >= msix_nr.entry_nr)
> > > >              break;
> > > > -        memcpy(&msg_ctrl, va + i * 16 + 12, 4);
> > > > -        memcpy(&msg_data, va + i * 16 + 8, 4);
> > > > -        if (msg_data == 0)
> > > > +        if (entry->data == 0) {
> > > >              continue;
> > > > -
> > > > -        memcpy(&msg_addr, va + i * 16, 4);
> > > > -        memcpy(&msg_upper_addr, va + i * 16 + 4, 4);
> > > > +        }
> > > >  
> > > >          r = kvm_get_irq_route_gsi();
> > > >          if (r < 0)
> > > > @@ -1031,10 +1026,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> > > >          adev->entry[entries_nr].gsi = r;
> > > >          adev->entry[entries_nr].type = KVM_IRQ_ROUTING_MSI;
> > > >          adev->entry[entries_nr].flags = 0;
> > > > -        adev->entry[entries_nr].u.msi.address_lo = msg_addr;
> > > > -        adev->entry[entries_nr].u.msi.address_hi = msg_upper_addr;
> > > > -        adev->entry[entries_nr].u.msi.data = msg_data;
> > > > -        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!", msg_data, msg_addr);
> > > > +        adev->entry[entries_nr].u.msi.address_lo = entry->addr_lo;
> > > > +        adev->entry[entries_nr].u.msi.address_hi = entry->addr_hi;
> > > > +        adev->entry[entries_nr].u.msi.data = entry->data;
> > > > +        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!",
> > > > +              entry->data, entry->addr_lo);
> > > >  	kvm_add_routing_entry(&adev->entry[entries_nr]);
> > > >  
> > > >          msix_entry.gsi = adev->entry[entries_nr].gsi;
> > > > @@ -1443,7 +1439,7 @@ static uint64_t msix_mmio_read(void *opaque, target_phys_addr_t addr,
> > > >      AssignedDevice *adev = opaque;
> > > >      uint64_t val;
> > > >  
> > > > -    memcpy(&val, (void *)((uint8_t *)adev->msix_table_page + addr), size);
> > > > +    memcpy(&val, (void *)((uint8_t *)adev->msix_table + addr), size);
> > > >  
> > > >      return val;
> > > >  }
> > > > @@ -1456,7 +1452,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
> > > >      DEBUG("write to MSI-X entry table mmio offset 0x%lx, val 0x%lx\n",
> > > >            addr, val);
> > > >  
> > > > -    memcpy((void *)((uint8_t *)adev->msix_table_page + addr), &val, size);
> > > > +    memcpy((void *)((uint8_t *)adev->msix_table + addr), &val, size);
> > > >  }
> > > >  
> > > >  static const MemoryRegionOps msix_mmio_ops = {
> > > > @@ -1471,15 +1467,13 @@ static const MemoryRegionOps msix_mmio_ops = {
> > > >  
> > > >  static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
> > > >  {
> > > > -    dev->msix_table_page = mmap(NULL, 0x1000,
> > > > -                                PROT_READ|PROT_WRITE,
> > > > -                                MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
> > > > -    if (dev->msix_table_page == MAP_FAILED) {
> > > > -        fprintf(stderr, "fail allocate msix_table_page! %s\n",
> > > > -                strerror(errno));
> > > > +    dev->msix_table = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE,
> > > > +                           MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
> > > > +    if (dev->msix_table == MAP_FAILED) {
> > > > +        fprintf(stderr, "fail allocate msix_table! %s\n", strerror(errno));
> > > >          return -EFAULT;
> > > >      }
> > > > -    memset(dev->msix_table_page, 0, 0x1000);
> > > > +    memset(dev->msix_table, 0, 0x1000);
> > > >      memory_region_init_io(&dev->mmio, &msix_mmio_ops, dev,
> > > >                            "assigned-dev-msix", MSIX_PAGE_SIZE);
> > > >      return 0;
> > > > @@ -1487,16 +1481,17 @@ static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
> > > >  
> > > >  static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
> > > >  {
> > > > -    if (!dev->msix_table_page)
> > > > +    if (!dev->msix_table) {
> > > >          return;
> > > > +    }
> > > >  
> > > >      memory_region_destroy(&dev->mmio);
> > > >  
> > > > -    if (munmap(dev->msix_table_page, 0x1000) == -1) {
> > > > -        fprintf(stderr, "error unmapping msix_table_page! %s\n",
> > > > +    if (munmap(dev->msix_table, 0x1000) == -1) {
> > > > +        fprintf(stderr, "error unmapping msix_table! %s\n",
> > > >                  strerror(errno));
> > > >      }
> > > > -    dev->msix_table_page = NULL;
> > > > +    dev->msix_table = NULL;
> > > >  }
> > > >  
> > > >  static const VMStateDescription vmstate_assigned_device = {
> > > > diff --git a/hw/device-assignment.h b/hw/device-assignment.h
> > > > index 8ee2274..e229303 100644
> > > > --- a/hw/device-assignment.h
> > > > +++ b/hw/device-assignment.h
> > > > @@ -82,6 +82,13 @@ typedef struct {
> > > >  #define ASSIGNED_DEVICE_PREFER_MSI_MASK	(1 << ASSIGNED_DEVICE_PREFER_MSI_BIT)
> > > >  #define ASSIGNED_DEVICE_USE_MEM64_MASK	(1 << ASSIGNED_DEVICE_USE_MEM64_BIT)
> > > >  
> > > > +typedef struct {
> > > > +    uint32_t addr_lo;
> > > > +    uint32_t addr_hi;
> > > > +    uint32_t data;
> > > > +    uint32_t ctrl;
> > > > +} MSIXTableEntry;
> > > > +
> > > >  typedef struct AssignedDevice {
> > > >      PCIDevice dev;
> > > >      PCIHostDevice host;
> > > > @@ -110,7 +117,7 @@ typedef struct AssignedDevice {
> > > >      uint8_t emulate_config_write[PCI_CONFIG_SPACE_SIZE];
> > > >      int irq_entries_nr;
> > > >      struct kvm_irq_routing_entry *entry;
> > > > -    void *msix_table_page;
> > > > +    MSIXTableEntry *msix_table;
> > > >      target_phys_addr_t msix_table_addr;
> > > >      MemoryRegion mmio;
> > > >      char *configfd_name;
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html




^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 4/9] pci-assign: Use struct for MSI-X table
  2012-01-31 21:17         ` Alex Williamson
@ 2012-01-31 21:24           ` Michael S. Tsirkin
  2012-01-31 21:30             ` Alex Williamson
  0 siblings, 1 reply; 54+ messages in thread
From: Michael S. Tsirkin @ 2012-01-31 21:24 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, jan.kiszka, shashidhar.patil

On Tue, Jan 31, 2012 at 02:17:30PM -0700, Alex Williamson wrote:
> On Tue, 2012-01-31 at 22:00 +0200, Michael S. Tsirkin wrote:
> > On Tue, Jan 31, 2012 at 12:05:49PM -0700, Alex Williamson wrote:
> > > On Tue, 2012-01-31 at 19:40 +0200, Michael S. Tsirkin wrote:
> > > > On Sat, Jan 28, 2012 at 07:21:58AM -0700, Alex Williamson wrote:
> > > > > This makes access much easier.
> > > > > 
> > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > > 
> > > > Yes but this also makes it easier to forget
> > > > to handle endian-ness.
> > > > How about using pci_get/pci_set instead?
> > > 
> > > Do we really think existing device assignment is ever going to work on
> > > anything but x86/x86?
> > > 
> > > Alex
> > 
> > It's a question of whether anyone bothers to
> > fix all portability bugs. Besides endian-ness,
> > and msix vector decoding, there's not much that is
> > x86 specific there.
> 
> The way interrupt routing is updated from the chipset is disgustingly
> x86 specific.

Right, that's what I mean - the decoding of msix vector to destination.

> The entire kernel side implementation is only enabled for
> x86 (well, not counting ia64).  I expect the kernel side interrupt
> injection is very x86, or at least ioapic, specific.  Beyond that, yeah,
> it may be trivial ;)  Thanks,
> 
> Alex
> 
> > > > > ---
> > > > > 
> > > > >  hw/device-assignment.c |   55 ++++++++++++++++++++++--------------------------
> > > > >  hw/device-assignment.h |    9 +++++++-
> > > > >  2 files changed, 33 insertions(+), 31 deletions(-)
> > > > > 
> > > > > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > > > > index 1fc7ffa..422ee00 100644
> > > > > --- a/hw/device-assignment.c
> > > > > +++ b/hw/device-assignment.c
> > > > > @@ -973,10 +973,9 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> > > > >      AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
> > > > >      uint16_t entries_nr = 0, entries_max_nr;
> > > > >      int pos = 0, i, r = 0;
> > > > > -    uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
> > > > >      struct kvm_assigned_msix_nr msix_nr;
> > > > >      struct kvm_assigned_msix_entry msix_entry;
> > > > > -    void *va = adev->msix_table_page;
> > > > > +    MSIXTableEntry *entry = adev->msix_table;
> > > > >  
> > > > >      pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
> > > > >  
> > > > > @@ -985,12 +984,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> > > > >      entries_max_nr += 1;
> > > > >  
> > > > >      /* Get the usable entry number for allocating */
> > > > > -    for (i = 0; i < entries_max_nr; i++) {
> > > > > -        memcpy(&msg_ctrl, va + i * 16 + 12, 4);
> > > > > -        memcpy(&msg_data, va + i * 16 + 8, 4);
> > > > > +    for (i = 0; i < entries_max_nr; i++, entry++) {
> > > > >          /* Ignore unused entry even it's unmasked */
> > > > > -        if (msg_data == 0)
> > > > > +        if (entry->data == 0) {
> > > > >              continue;
> > > > > +        }
> > > > >          entries_nr ++;
> > > > >      }
> > > > >  
> > > > > @@ -1013,16 +1011,13 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> > > > >  
> > > > >      msix_entry.assigned_dev_id = msix_nr.assigned_dev_id;
> > > > >      entries_nr = 0;
> > > > > -    for (i = 0; i < entries_max_nr; i++) {
> > > > > +    entry = adev->msix_table;
> > > > > +    for (i = 0; i < entries_max_nr; i++, entry++) {
> > > > >          if (entries_nr >= msix_nr.entry_nr)
> > > > >              break;
> > > > > -        memcpy(&msg_ctrl, va + i * 16 + 12, 4);
> > > > > -        memcpy(&msg_data, va + i * 16 + 8, 4);
> > > > > -        if (msg_data == 0)
> > > > > +        if (entry->data == 0) {
> > > > >              continue;
> > > > > -
> > > > > -        memcpy(&msg_addr, va + i * 16, 4);
> > > > > -        memcpy(&msg_upper_addr, va + i * 16 + 4, 4);
> > > > > +        }
> > > > >  
> > > > >          r = kvm_get_irq_route_gsi();
> > > > >          if (r < 0)
> > > > > @@ -1031,10 +1026,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> > > > >          adev->entry[entries_nr].gsi = r;
> > > > >          adev->entry[entries_nr].type = KVM_IRQ_ROUTING_MSI;
> > > > >          adev->entry[entries_nr].flags = 0;
> > > > > -        adev->entry[entries_nr].u.msi.address_lo = msg_addr;
> > > > > -        adev->entry[entries_nr].u.msi.address_hi = msg_upper_addr;
> > > > > -        adev->entry[entries_nr].u.msi.data = msg_data;
> > > > > -        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!", msg_data, msg_addr);
> > > > > +        adev->entry[entries_nr].u.msi.address_lo = entry->addr_lo;
> > > > > +        adev->entry[entries_nr].u.msi.address_hi = entry->addr_hi;
> > > > > +        adev->entry[entries_nr].u.msi.data = entry->data;
> > > > > +        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!",
> > > > > +              entry->data, entry->addr_lo);
> > > > >  	kvm_add_routing_entry(&adev->entry[entries_nr]);
> > > > >  
> > > > >          msix_entry.gsi = adev->entry[entries_nr].gsi;
> > > > > @@ -1443,7 +1439,7 @@ static uint64_t msix_mmio_read(void *opaque, target_phys_addr_t addr,
> > > > >      AssignedDevice *adev = opaque;
> > > > >      uint64_t val;
> > > > >  
> > > > > -    memcpy(&val, (void *)((uint8_t *)adev->msix_table_page + addr), size);
> > > > > +    memcpy(&val, (void *)((uint8_t *)adev->msix_table + addr), size);
> > > > >  
> > > > >      return val;
> > > > >  }
> > > > > @@ -1456,7 +1452,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
> > > > >      DEBUG("write to MSI-X entry table mmio offset 0x%lx, val 0x%lx\n",
> > > > >            addr, val);
> > > > >  
> > > > > -    memcpy((void *)((uint8_t *)adev->msix_table_page + addr), &val, size);
> > > > > +    memcpy((void *)((uint8_t *)adev->msix_table + addr), &val, size);
> > > > >  }
> > > > >  
> > > > >  static const MemoryRegionOps msix_mmio_ops = {
> > > > > @@ -1471,15 +1467,13 @@ static const MemoryRegionOps msix_mmio_ops = {
> > > > >  
> > > > >  static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
> > > > >  {
> > > > > -    dev->msix_table_page = mmap(NULL, 0x1000,
> > > > > -                                PROT_READ|PROT_WRITE,
> > > > > -                                MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
> > > > > -    if (dev->msix_table_page == MAP_FAILED) {
> > > > > -        fprintf(stderr, "fail allocate msix_table_page! %s\n",
> > > > > -                strerror(errno));
> > > > > +    dev->msix_table = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE,
> > > > > +                           MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
> > > > > +    if (dev->msix_table == MAP_FAILED) {
> > > > > +        fprintf(stderr, "fail allocate msix_table! %s\n", strerror(errno));
> > > > >          return -EFAULT;
> > > > >      }
> > > > > -    memset(dev->msix_table_page, 0, 0x1000);
> > > > > +    memset(dev->msix_table, 0, 0x1000);
> > > > >      memory_region_init_io(&dev->mmio, &msix_mmio_ops, dev,
> > > > >                            "assigned-dev-msix", MSIX_PAGE_SIZE);
> > > > >      return 0;
> > > > > @@ -1487,16 +1481,17 @@ static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
> > > > >  
> > > > >  static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
> > > > >  {
> > > > > -    if (!dev->msix_table_page)
> > > > > +    if (!dev->msix_table) {
> > > > >          return;
> > > > > +    }
> > > > >  
> > > > >      memory_region_destroy(&dev->mmio);
> > > > >  
> > > > > -    if (munmap(dev->msix_table_page, 0x1000) == -1) {
> > > > > -        fprintf(stderr, "error unmapping msix_table_page! %s\n",
> > > > > +    if (munmap(dev->msix_table, 0x1000) == -1) {
> > > > > +        fprintf(stderr, "error unmapping msix_table! %s\n",
> > > > >                  strerror(errno));
> > > > >      }
> > > > > -    dev->msix_table_page = NULL;
> > > > > +    dev->msix_table = NULL;
> > > > >  }
> > > > >  
> > > > >  static const VMStateDescription vmstate_assigned_device = {
> > > > > diff --git a/hw/device-assignment.h b/hw/device-assignment.h
> > > > > index 8ee2274..e229303 100644
> > > > > --- a/hw/device-assignment.h
> > > > > +++ b/hw/device-assignment.h
> > > > > @@ -82,6 +82,13 @@ typedef struct {
> > > > >  #define ASSIGNED_DEVICE_PREFER_MSI_MASK	(1 << ASSIGNED_DEVICE_PREFER_MSI_BIT)
> > > > >  #define ASSIGNED_DEVICE_USE_MEM64_MASK	(1 << ASSIGNED_DEVICE_USE_MEM64_BIT)
> > > > >  
> > > > > +typedef struct {
> > > > > +    uint32_t addr_lo;
> > > > > +    uint32_t addr_hi;
> > > > > +    uint32_t data;
> > > > > +    uint32_t ctrl;
> > > > > +} MSIXTableEntry;
> > > > > +
> > > > >  typedef struct AssignedDevice {
> > > > >      PCIDevice dev;
> > > > >      PCIHostDevice host;
> > > > > @@ -110,7 +117,7 @@ typedef struct AssignedDevice {
> > > > >      uint8_t emulate_config_write[PCI_CONFIG_SPACE_SIZE];
> > > > >      int irq_entries_nr;
> > > > >      struct kvm_irq_routing_entry *entry;
> > > > > -    void *msix_table_page;
> > > > > +    MSIXTableEntry *msix_table;
> > > > >      target_phys_addr_t msix_table_addr;
> > > > >      MemoryRegion mmio;
> > > > >      char *configfd_name;
> > > > > 
> > > > > --
> > > > > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > > > > the body of a message to majordomo@vger.kernel.org
> > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > 
> > > 
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 4/9] pci-assign: Use struct for MSI-X table
  2012-01-31 21:24           ` Michael S. Tsirkin
@ 2012-01-31 21:30             ` Alex Williamson
  0 siblings, 0 replies; 54+ messages in thread
From: Alex Williamson @ 2012-01-31 21:30 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, jan.kiszka, shashidhar.patil

On Tue, 2012-01-31 at 23:24 +0200, Michael S. Tsirkin wrote:
> On Tue, Jan 31, 2012 at 02:17:30PM -0700, Alex Williamson wrote:
> > On Tue, 2012-01-31 at 22:00 +0200, Michael S. Tsirkin wrote:
> > > On Tue, Jan 31, 2012 at 12:05:49PM -0700, Alex Williamson wrote:
> > > > On Tue, 2012-01-31 at 19:40 +0200, Michael S. Tsirkin wrote:
> > > > > On Sat, Jan 28, 2012 at 07:21:58AM -0700, Alex Williamson wrote:
> > > > > > This makes access much easier.
> > > > > > 
> > > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > > > 
> > > > > Yes but this also makes it easier to forget
> > > > > to handle endian-ness.
> > > > > How about using pci_get/pci_set instead?
> > > > 
> > > > Do we really think existing device assignment is ever going to work on
> > > > anything but x86/x86?
> > > > 
> > > > Alex
> > > 
> > > It's a question of whether anyone bothers to
> > > fix all portability bugs. Besides endian-ness,
> > > and msix vector decoding, there's not much that is
> > > x86 specific there.
> > 
> > The way interrupt routing is updated from the chipset is disgustingly
> > x86 specific.
> 
> Right, that's what I mean - the decoding of msix vector to destination.

Yes, that, but I was referring more to INTx and the ugly hard coded hook
to get notified when the 440fx changes PCI interrupt link setup.
Thanks,

Alex

> 
> > The entire kernel side implementation is only enabled for
> > x86 (well, not counting ia64).  I expect the kernel side interrupt
> > injection is very x86, or at least ioapic, specific.  Beyond that, yeah,
> > it may be trivial ;)  Thanks,
> > 
> > Alex
> > 
> > > > > > ---
> > > > > > 
> > > > > >  hw/device-assignment.c |   55 ++++++++++++++++++++++--------------------------
> > > > > >  hw/device-assignment.h |    9 +++++++-
> > > > > >  2 files changed, 33 insertions(+), 31 deletions(-)
> > > > > > 
> > > > > > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > > > > > index 1fc7ffa..422ee00 100644
> > > > > > --- a/hw/device-assignment.c
> > > > > > +++ b/hw/device-assignment.c
> > > > > > @@ -973,10 +973,9 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> > > > > >      AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
> > > > > >      uint16_t entries_nr = 0, entries_max_nr;
> > > > > >      int pos = 0, i, r = 0;
> > > > > > -    uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
> > > > > >      struct kvm_assigned_msix_nr msix_nr;
> > > > > >      struct kvm_assigned_msix_entry msix_entry;
> > > > > > -    void *va = adev->msix_table_page;
> > > > > > +    MSIXTableEntry *entry = adev->msix_table;
> > > > > >  
> > > > > >      pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
> > > > > >  
> > > > > > @@ -985,12 +984,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> > > > > >      entries_max_nr += 1;
> > > > > >  
> > > > > >      /* Get the usable entry number for allocating */
> > > > > > -    for (i = 0; i < entries_max_nr; i++) {
> > > > > > -        memcpy(&msg_ctrl, va + i * 16 + 12, 4);
> > > > > > -        memcpy(&msg_data, va + i * 16 + 8, 4);
> > > > > > +    for (i = 0; i < entries_max_nr; i++, entry++) {
> > > > > >          /* Ignore unused entry even it's unmasked */
> > > > > > -        if (msg_data == 0)
> > > > > > +        if (entry->data == 0) {
> > > > > >              continue;
> > > > > > +        }
> > > > > >          entries_nr ++;
> > > > > >      }
> > > > > >  
> > > > > > @@ -1013,16 +1011,13 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> > > > > >  
> > > > > >      msix_entry.assigned_dev_id = msix_nr.assigned_dev_id;
> > > > > >      entries_nr = 0;
> > > > > > -    for (i = 0; i < entries_max_nr; i++) {
> > > > > > +    entry = adev->msix_table;
> > > > > > +    for (i = 0; i < entries_max_nr; i++, entry++) {
> > > > > >          if (entries_nr >= msix_nr.entry_nr)
> > > > > >              break;
> > > > > > -        memcpy(&msg_ctrl, va + i * 16 + 12, 4);
> > > > > > -        memcpy(&msg_data, va + i * 16 + 8, 4);
> > > > > > -        if (msg_data == 0)
> > > > > > +        if (entry->data == 0) {
> > > > > >              continue;
> > > > > > -
> > > > > > -        memcpy(&msg_addr, va + i * 16, 4);
> > > > > > -        memcpy(&msg_upper_addr, va + i * 16 + 4, 4);
> > > > > > +        }
> > > > > >  
> > > > > >          r = kvm_get_irq_route_gsi();
> > > > > >          if (r < 0)
> > > > > > @@ -1031,10 +1026,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> > > > > >          adev->entry[entries_nr].gsi = r;
> > > > > >          adev->entry[entries_nr].type = KVM_IRQ_ROUTING_MSI;
> > > > > >          adev->entry[entries_nr].flags = 0;
> > > > > > -        adev->entry[entries_nr].u.msi.address_lo = msg_addr;
> > > > > > -        adev->entry[entries_nr].u.msi.address_hi = msg_upper_addr;
> > > > > > -        adev->entry[entries_nr].u.msi.data = msg_data;
> > > > > > -        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!", msg_data, msg_addr);
> > > > > > +        adev->entry[entries_nr].u.msi.address_lo = entry->addr_lo;
> > > > > > +        adev->entry[entries_nr].u.msi.address_hi = entry->addr_hi;
> > > > > > +        adev->entry[entries_nr].u.msi.data = entry->data;
> > > > > > +        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!",
> > > > > > +              entry->data, entry->addr_lo);
> > > > > >  	kvm_add_routing_entry(&adev->entry[entries_nr]);
> > > > > >  
> > > > > >          msix_entry.gsi = adev->entry[entries_nr].gsi;
> > > > > > @@ -1443,7 +1439,7 @@ static uint64_t msix_mmio_read(void *opaque, target_phys_addr_t addr,
> > > > > >      AssignedDevice *adev = opaque;
> > > > > >      uint64_t val;
> > > > > >  
> > > > > > -    memcpy(&val, (void *)((uint8_t *)adev->msix_table_page + addr), size);
> > > > > > +    memcpy(&val, (void *)((uint8_t *)adev->msix_table + addr), size);
> > > > > >  
> > > > > >      return val;
> > > > > >  }
> > > > > > @@ -1456,7 +1452,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
> > > > > >      DEBUG("write to MSI-X entry table mmio offset 0x%lx, val 0x%lx\n",
> > > > > >            addr, val);
> > > > > >  
> > > > > > -    memcpy((void *)((uint8_t *)adev->msix_table_page + addr), &val, size);
> > > > > > +    memcpy((void *)((uint8_t *)adev->msix_table + addr), &val, size);
> > > > > >  }
> > > > > >  
> > > > > >  static const MemoryRegionOps msix_mmio_ops = {
> > > > > > @@ -1471,15 +1467,13 @@ static const MemoryRegionOps msix_mmio_ops = {
> > > > > >  
> > > > > >  static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
> > > > > >  {
> > > > > > -    dev->msix_table_page = mmap(NULL, 0x1000,
> > > > > > -                                PROT_READ|PROT_WRITE,
> > > > > > -                                MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
> > > > > > -    if (dev->msix_table_page == MAP_FAILED) {
> > > > > > -        fprintf(stderr, "fail allocate msix_table_page! %s\n",
> > > > > > -                strerror(errno));
> > > > > > +    dev->msix_table = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE,
> > > > > > +                           MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
> > > > > > +    if (dev->msix_table == MAP_FAILED) {
> > > > > > +        fprintf(stderr, "fail allocate msix_table! %s\n", strerror(errno));
> > > > > >          return -EFAULT;
> > > > > >      }
> > > > > > -    memset(dev->msix_table_page, 0, 0x1000);
> > > > > > +    memset(dev->msix_table, 0, 0x1000);
> > > > > >      memory_region_init_io(&dev->mmio, &msix_mmio_ops, dev,
> > > > > >                            "assigned-dev-msix", MSIX_PAGE_SIZE);
> > > > > >      return 0;
> > > > > > @@ -1487,16 +1481,17 @@ static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
> > > > > >  
> > > > > >  static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
> > > > > >  {
> > > > > > -    if (!dev->msix_table_page)
> > > > > > +    if (!dev->msix_table) {
> > > > > >          return;
> > > > > > +    }
> > > > > >  
> > > > > >      memory_region_destroy(&dev->mmio);
> > > > > >  
> > > > > > -    if (munmap(dev->msix_table_page, 0x1000) == -1) {
> > > > > > -        fprintf(stderr, "error unmapping msix_table_page! %s\n",
> > > > > > +    if (munmap(dev->msix_table, 0x1000) == -1) {
> > > > > > +        fprintf(stderr, "error unmapping msix_table! %s\n",
> > > > > >                  strerror(errno));
> > > > > >      }
> > > > > > -    dev->msix_table_page = NULL;
> > > > > > +    dev->msix_table = NULL;
> > > > > >  }
> > > > > >  
> > > > > >  static const VMStateDescription vmstate_assigned_device = {
> > > > > > diff --git a/hw/device-assignment.h b/hw/device-assignment.h
> > > > > > index 8ee2274..e229303 100644
> > > > > > --- a/hw/device-assignment.h
> > > > > > +++ b/hw/device-assignment.h
> > > > > > @@ -82,6 +82,13 @@ typedef struct {
> > > > > >  #define ASSIGNED_DEVICE_PREFER_MSI_MASK	(1 << ASSIGNED_DEVICE_PREFER_MSI_BIT)
> > > > > >  #define ASSIGNED_DEVICE_USE_MEM64_MASK	(1 << ASSIGNED_DEVICE_USE_MEM64_BIT)
> > > > > >  
> > > > > > +typedef struct {
> > > > > > +    uint32_t addr_lo;
> > > > > > +    uint32_t addr_hi;
> > > > > > +    uint32_t data;
> > > > > > +    uint32_t ctrl;
> > > > > > +} MSIXTableEntry;
> > > > > > +
> > > > > >  typedef struct AssignedDevice {
> > > > > >      PCIDevice dev;
> > > > > >      PCIHostDevice host;
> > > > > > @@ -110,7 +117,7 @@ typedef struct AssignedDevice {
> > > > > >      uint8_t emulate_config_write[PCI_CONFIG_SPACE_SIZE];
> > > > > >      int irq_entries_nr;
> > > > > >      struct kvm_irq_routing_entry *entry;
> > > > > > -    void *msix_table_page;
> > > > > > +    MSIXTableEntry *msix_table;
> > > > > >      target_phys_addr_t msix_table_addr;
> > > > > >      MemoryRegion mmio;
> > > > > >      char *configfd_name;
> > > > > > 
> > > > > > --
> > > > > > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > > > > > the body of a message to majordomo@vger.kernel.org
> > > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > 
> > > > 
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > 




^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 3/9] pci-assign: Update MSI-X MMIO to Memory API
  2012-01-31 21:13     ` Alex Williamson
@ 2012-02-01  4:22       ` Alex Williamson
  2012-02-01  9:04         ` Avi Kivity
  0 siblings, 1 reply; 54+ messages in thread
From: Alex Williamson @ 2012-02-01  4:22 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, jan.kiszka, shashidhar.patil

On Tue, 2012-01-31 at 14:13 -0700, Alex Williamson wrote:
> On Tue, 2012-01-31 at 14:45 +0200, Avi Kivity wrote:
> > On 01/28/2012 04:21 PM, Alex Williamson wrote:
> > > Stop using compatibility mode and at the same time fix available
> > > access sizes.  The PCI spec indicates that the MSI-X table may
> > > only be accessed as DWORD or QWORD.
> > >
> > >  
> > >  static const MemoryRegionOps msix_mmio_ops = {
> > > -    .old_mmio = {
> > > -        .read = { msix_mmio_readb, msix_mmio_readw, msix_mmio_readl, },
> > > -        .write = { msix_mmio_writeb, msix_mmio_writew, msix_mmio_writel, },
> > > -    },
> > > +    .read = msix_mmio_read,
> > > +    .write = msix_mmio_write,
> > >      .endianness = DEVICE_NATIVE_ENDIAN,
> > > +    .impl = {
> > > +        .min_access_size = 4,
> > > +        .max_access_size = 8,
> > > +    },
> > >  };
> > >  
> > 
> > .impl.min_access_size = 4 means the core will convert 1-byte I/O to
> > 4-byte I/O (using rmw if needed).  That's not what we want, I think you
> > can leave it at 1 and explicitly ignore small accesses in the callbacks.
> > 
> > Have you tested 8-byte I/O?  This is the first user.  Don't you need to
> > set .valid.max_access_size?
> 
> I have not explicitly tested 8-byte I/O, figured it might just work.

Tested, it gets split into 4-byte accesses by cpu_physical_memory_rw().
It's trivial to add 8-byte access there and they get all the way through
the memory API intact.  I expect there's a lot more than that to make
exec.c quad word access clean though.  I'll leave the 8-byte access in
place in case it's a good test case.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest
  2012-01-31 21:14                   ` Michael S. Tsirkin
@ 2012-02-01  9:03                     ` Avi Kivity
  2012-02-01 10:03                       ` Michael S. Tsirkin
  2012-02-01 13:55                       ` Alex Williamson
  0 siblings, 2 replies; 54+ messages in thread
From: Avi Kivity @ 2012-02-01  9:03 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Alex Williamson, Jan Kiszka, kvm, shashidhar.patil

On 01/31/2012 11:14 PM, Michael S. Tsirkin wrote:
> On Tue, Jan 31, 2012 at 02:08:38PM -0700, Alex Williamson wrote:
> > On Tue, 2012-01-31 at 15:33 +0200, Avi Kivity wrote:
> > > On 01/31/2012 03:21 PM, Jan Kiszka wrote:
> > > > On 2012-01-31 14:10, Avi Kivity wrote:
> > > > > On 01/31/2012 02:57 PM, Jan Kiszka wrote:
> > > > >>>>> Seems fine, but do we really need the option?  If it doesn't work we
> > > > >>>>> should treat it as an ordinary but and fix it.
> > > > >>>>
> > > > >>>> So far it's against the architecture of the emulated system: our current
> > > > >>>> chipset predates 64 bit PCI.
> > > > >>>>
> > > > >>>
> > > > >>> Then it should be enabled/disabled at the chipset level.
> > > > >>
> > > > >> Makes me wonder if we already do some filtering if the device supports
> > > > >> 64 bit but the next bridge does not.
> > > > >>
> > > > > 
> > > > > Our 440fx does support 64-bit bars, so the question doesn't arise for
> > > > > x86.  Instead we violate the spec.
> > > >
> > > > If you mean by "our" the 440fx-qemu, not the real 440fx. That one does
> > > > not even support >1GB RAM.
> > > 
> > > Yes, that's what I meant.  It also supports pci hotplug, more slots, cpu
> > > hotplug, etc.
> > 
> > I'll drop this patch for now, it was just something I enabled based on a
> > query from MST and didn't want to lose it.  Maybe we need the option in
> > PCI core, but I'm just turn it on and hope for the best w/o giving users
> > a way to disable it.  Thanks,
> > 
> > Alex
>
> The patch itself makes sense to me.
> Avi found some bugs in 64 bit BAR handling, let's wait till that is
> fixed then merge this patch.

The bugs are unrelated to 64-bit BARs.

Alex, why drop the patch?  We're far from a release, if there are indeed
bugs there, we'll find out.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 3/9] pci-assign: Update MSI-X MMIO to Memory API
  2012-02-01  4:22       ` Alex Williamson
@ 2012-02-01  9:04         ` Avi Kivity
  2012-02-01 13:56           ` Alex Williamson
  0 siblings, 1 reply; 54+ messages in thread
From: Avi Kivity @ 2012-02-01  9:04 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, jan.kiszka, shashidhar.patil

On 02/01/2012 06:22 AM, Alex Williamson wrote:
> On Tue, 2012-01-31 at 14:13 -0700, Alex Williamson wrote:
> > On Tue, 2012-01-31 at 14:45 +0200, Avi Kivity wrote:
> > > On 01/28/2012 04:21 PM, Alex Williamson wrote:
> > > > Stop using compatibility mode and at the same time fix available
> > > > access sizes.  The PCI spec indicates that the MSI-X table may
> > > > only be accessed as DWORD or QWORD.
> > > >
> > > >  
> > > >  static const MemoryRegionOps msix_mmio_ops = {
> > > > -    .old_mmio = {
> > > > -        .read = { msix_mmio_readb, msix_mmio_readw, msix_mmio_readl, },
> > > > -        .write = { msix_mmio_writeb, msix_mmio_writew, msix_mmio_writel, },
> > > > -    },
> > > > +    .read = msix_mmio_read,
> > > > +    .write = msix_mmio_write,
> > > >      .endianness = DEVICE_NATIVE_ENDIAN,
> > > > +    .impl = {
> > > > +        .min_access_size = 4,
> > > > +        .max_access_size = 8,
> > > > +    },
> > > >  };
> > > >  
> > > 
> > > .impl.min_access_size = 4 means the core will convert 1-byte I/O to
> > > 4-byte I/O (using rmw if needed).  That's not what we want, I think you
> > > can leave it at 1 and explicitly ignore small accesses in the callbacks.
> > > 
> > > Have you tested 8-byte I/O?  This is the first user.  Don't you need to
> > > set .valid.max_access_size?
> > 
> > I have not explicitly tested 8-byte I/O, figured it might just work.
>
> Tested, it gets split into 4-byte accesses by cpu_physical_memory_rw().
> It's trivial to add 8-byte access there and they get all the way through
> the memory API intact.  I expect there's a lot more than that to make
> exec.c quad word access clean though.  I'll leave the 8-byte access in
> place in case it's a good test case.  Thanks,
>

It doesn't make sense without .valid.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest
  2012-02-01  9:03                     ` Avi Kivity
@ 2012-02-01 10:03                       ` Michael S. Tsirkin
  2012-02-01 13:55                       ` Alex Williamson
  1 sibling, 0 replies; 54+ messages in thread
From: Michael S. Tsirkin @ 2012-02-01 10:03 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alex Williamson, Jan Kiszka, kvm, shashidhar.patil

On Wed, Feb 01, 2012 at 11:03:25AM +0200, Avi Kivity wrote:
> On 01/31/2012 11:14 PM, Michael S. Tsirkin wrote:
> > On Tue, Jan 31, 2012 at 02:08:38PM -0700, Alex Williamson wrote:
> > > On Tue, 2012-01-31 at 15:33 +0200, Avi Kivity wrote:
> > > > On 01/31/2012 03:21 PM, Jan Kiszka wrote:
> > > > > On 2012-01-31 14:10, Avi Kivity wrote:
> > > > > > On 01/31/2012 02:57 PM, Jan Kiszka wrote:
> > > > > >>>>> Seems fine, but do we really need the option?  If it doesn't work we
> > > > > >>>>> should treat it as an ordinary but and fix it.
> > > > > >>>>
> > > > > >>>> So far it's against the architecture of the emulated system: our current
> > > > > >>>> chipset predates 64 bit PCI.
> > > > > >>>>
> > > > > >>>
> > > > > >>> Then it should be enabled/disabled at the chipset level.
> > > > > >>
> > > > > >> Makes me wonder if we already do some filtering if the device supports
> > > > > >> 64 bit but the next bridge does not.
> > > > > >>
> > > > > > 
> > > > > > Our 440fx does support 64-bit bars, so the question doesn't arise for
> > > > > > x86.  Instead we violate the spec.
> > > > >
> > > > > If you mean by "our" the 440fx-qemu, not the real 440fx. That one does
> > > > > not even support >1GB RAM.
> > > > 
> > > > Yes, that's what I meant.  It also supports pci hotplug, more slots, cpu
> > > > hotplug, etc.
> > > 
> > > I'll drop this patch for now, it was just something I enabled based on a
> > > query from MST and didn't want to lose it.  Maybe we need the option in
> > > PCI core, but I'm just turn it on and hope for the best w/o giving users
> > > a way to disable it.  Thanks,
> > > 
> > > Alex
> >
> > The patch itself makes sense to me.
> > Avi found some bugs in 64 bit BAR handling, let's wait till that is
> > fixed then merge this patch.
> 
> The bugs are unrelated to 64-bit BARs.

This is my understanding as well.
However, qemu pci has this work around for 32 bit BARs:

    /* Now pcibus_t is 64bit.
     * Check if 32 bit BAR wraps around explicitly.
     * Without this, PC ide doesn't work well.
     * TODO: remove this work around.
     */
    if  (!(type & PCI_BASE_ADDRESS_MEM_TYPE_64) && last_addr >= UINT32_MAX) {
        return PCI_BAR_UNMAPPED;
    }

this helps work around the bugs on condition that
guests write UINT32_MAX into BARs to size them which
happens to be true.

And for this reason, the bugs trigger with 64 bit BARs
but not 32 bit ones.

> Alex, why drop the patch?  We're far from a release, if there are indeed
> bugs there, we'll find out.

I think it will expose the 64 bit handling bug so I suggested
applying this patch after you fix the bugs you found in memory -
it's better for bisectability.

> -- 
> error compiling committee.c: too many arguments to function
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest
  2012-02-01  9:03                     ` Avi Kivity
  2012-02-01 10:03                       ` Michael S. Tsirkin
@ 2012-02-01 13:55                       ` Alex Williamson
  2012-02-01 15:18                         ` Michael S. Tsirkin
  1 sibling, 1 reply; 54+ messages in thread
From: Alex Williamson @ 2012-02-01 13:55 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Michael S. Tsirkin, Jan Kiszka, kvm, shashidhar.patil

On Wed, 2012-02-01 at 11:03 +0200, Avi Kivity wrote:
> On 01/31/2012 11:14 PM, Michael S. Tsirkin wrote:
> > On Tue, Jan 31, 2012 at 02:08:38PM -0700, Alex Williamson wrote:
> > > On Tue, 2012-01-31 at 15:33 +0200, Avi Kivity wrote:
> > > > On 01/31/2012 03:21 PM, Jan Kiszka wrote:
> > > > > On 2012-01-31 14:10, Avi Kivity wrote:
> > > > > > On 01/31/2012 02:57 PM, Jan Kiszka wrote:
> > > > > >>>>> Seems fine, but do we really need the option?  If it doesn't work we
> > > > > >>>>> should treat it as an ordinary but and fix it.
> > > > > >>>>
> > > > > >>>> So far it's against the architecture of the emulated system: our current
> > > > > >>>> chipset predates 64 bit PCI.
> > > > > >>>>
> > > > > >>>
> > > > > >>> Then it should be enabled/disabled at the chipset level.
> > > > > >>
> > > > > >> Makes me wonder if we already do some filtering if the device supports
> > > > > >> 64 bit but the next bridge does not.
> > > > > >>
> > > > > > 
> > > > > > Our 440fx does support 64-bit bars, so the question doesn't arise for
> > > > > > x86.  Instead we violate the spec.
> > > > >
> > > > > If you mean by "our" the 440fx-qemu, not the real 440fx. That one does
> > > > > not even support >1GB RAM.
> > > > 
> > > > Yes, that's what I meant.  It also supports pci hotplug, more slots, cpu
> > > > hotplug, etc.
> > > 
> > > I'll drop this patch for now, it was just something I enabled based on a
> > > query from MST and didn't want to lose it.  Maybe we need the option in
> > > PCI core, but I'm just turn it on and hope for the best w/o giving users
> > > a way to disable it.  Thanks,
> > > 
> > > Alex
> >
> > The patch itself makes sense to me.
> > Avi found some bugs in 64 bit BAR handling, let's wait till that is
> > fixed then merge this patch.
> 
> The bugs are unrelated to 64-bit BARs.
> 
> Alex, why drop the patch?  We're far from a release, if there are indeed
> bugs there, we'll find out.

I think maybe this should be a core pci device option so we don't have
pci-assign.mem64 vs virtio-net-pci.64bit.  I'd like to always set the
MEM64 bit to match hardware when registering a BAR and let PCI core
decide whether to expose it that way.  Besides, it was unrelated to the
bulk of the series.  Thanks,

Alex




^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 3/9] pci-assign: Update MSI-X MMIO to Memory API
  2012-02-01  9:04         ` Avi Kivity
@ 2012-02-01 13:56           ` Alex Williamson
  0 siblings, 0 replies; 54+ messages in thread
From: Alex Williamson @ 2012-02-01 13:56 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, jan.kiszka, shashidhar.patil

On Wed, 2012-02-01 at 11:04 +0200, Avi Kivity wrote:
> On 02/01/2012 06:22 AM, Alex Williamson wrote:
> > On Tue, 2012-01-31 at 14:13 -0700, Alex Williamson wrote:
> > > On Tue, 2012-01-31 at 14:45 +0200, Avi Kivity wrote:
> > > > On 01/28/2012 04:21 PM, Alex Williamson wrote:
> > > > > Stop using compatibility mode and at the same time fix available
> > > > > access sizes.  The PCI spec indicates that the MSI-X table may
> > > > > only be accessed as DWORD or QWORD.
> > > > >
> > > > >  
> > > > >  static const MemoryRegionOps msix_mmio_ops = {
> > > > > -    .old_mmio = {
> > > > > -        .read = { msix_mmio_readb, msix_mmio_readw, msix_mmio_readl, },
> > > > > -        .write = { msix_mmio_writeb, msix_mmio_writew, msix_mmio_writel, },
> > > > > -    },
> > > > > +    .read = msix_mmio_read,
> > > > > +    .write = msix_mmio_write,
> > > > >      .endianness = DEVICE_NATIVE_ENDIAN,
> > > > > +    .impl = {
> > > > > +        .min_access_size = 4,
> > > > > +        .max_access_size = 8,
> > > > > +    },
> > > > >  };
> > > > >  
> > > > 
> > > > .impl.min_access_size = 4 means the core will convert 1-byte I/O to
> > > > 4-byte I/O (using rmw if needed).  That's not what we want, I think you
> > > > can leave it at 1 and explicitly ignore small accesses in the callbacks.
> > > > 
> > > > Have you tested 8-byte I/O?  This is the first user.  Don't you need to
> > > > set .valid.max_access_size?
> > > 
> > > I have not explicitly tested 8-byte I/O, figured it might just work.
> >
> > Tested, it gets split into 4-byte accesses by cpu_physical_memory_rw().
> > It's trivial to add 8-byte access there and they get all the way through
> > the memory API intact.  I expect there's a lot more than that to make
> > exec.c quad word access clean though.  I'll leave the 8-byte access in
> > place in case it's a good test case.  Thanks,
> >
> 
> It doesn't make sense without .valid.

Yep, v2 sets both .valid & .impl, which is what I used to test.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest
  2012-02-01 13:55                       ` Alex Williamson
@ 2012-02-01 15:18                         ` Michael S. Tsirkin
  2012-02-01 15:24                           ` Alex Williamson
  0 siblings, 1 reply; 54+ messages in thread
From: Michael S. Tsirkin @ 2012-02-01 15:18 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Avi Kivity, Jan Kiszka, kvm, shashidhar.patil

On Wed, Feb 01, 2012 at 06:55:40AM -0700, Alex Williamson wrote:
> On Wed, 2012-02-01 at 11:03 +0200, Avi Kivity wrote:
> > On 01/31/2012 11:14 PM, Michael S. Tsirkin wrote:
> > > On Tue, Jan 31, 2012 at 02:08:38PM -0700, Alex Williamson wrote:
> > > > On Tue, 2012-01-31 at 15:33 +0200, Avi Kivity wrote:
> > > > > On 01/31/2012 03:21 PM, Jan Kiszka wrote:
> > > > > > On 2012-01-31 14:10, Avi Kivity wrote:
> > > > > > > On 01/31/2012 02:57 PM, Jan Kiszka wrote:
> > > > > > >>>>> Seems fine, but do we really need the option?  If it doesn't work we
> > > > > > >>>>> should treat it as an ordinary but and fix it.
> > > > > > >>>>
> > > > > > >>>> So far it's against the architecture of the emulated system: our current
> > > > > > >>>> chipset predates 64 bit PCI.
> > > > > > >>>>
> > > > > > >>>
> > > > > > >>> Then it should be enabled/disabled at the chipset level.
> > > > > > >>
> > > > > > >> Makes me wonder if we already do some filtering if the device supports
> > > > > > >> 64 bit but the next bridge does not.
> > > > > > >>
> > > > > > > 
> > > > > > > Our 440fx does support 64-bit bars, so the question doesn't arise for
> > > > > > > x86.  Instead we violate the spec.
> > > > > >
> > > > > > If you mean by "our" the 440fx-qemu, not the real 440fx. That one does
> > > > > > not even support >1GB RAM.
> > > > > 
> > > > > Yes, that's what I meant.  It also supports pci hotplug, more slots, cpu
> > > > > hotplug, etc.
> > > > 
> > > > I'll drop this patch for now, it was just something I enabled based on a
> > > > query from MST and didn't want to lose it.  Maybe we need the option in
> > > > PCI core, but I'm just turn it on and hope for the best w/o giving users
> > > > a way to disable it.  Thanks,
> > > > 
> > > > Alex
> > >
> > > The patch itself makes sense to me.
> > > Avi found some bugs in 64 bit BAR handling, let's wait till that is
> > > fixed then merge this patch.
> > 
> > The bugs are unrelated to 64-bit BARs.
> > 
> > Alex, why drop the patch?  We're far from a release, if there are indeed
> > bugs there, we'll find out.
> 
> I think maybe this should be a core pci device option so we don't have
> pci-assign.mem64 vs virtio-net-pci.64bit.  I'd like to always set the
> MEM64 bit to match hardware when registering a BAR and let PCI core
> decide whether to expose it that way.

Besides the bugs, there aren't any reasons not to expose it
that way, right?

>  Besides, it was unrelated to the
> bulk of the series.  Thanks,
> 
> Alex
> 
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest
  2012-02-01 15:18                         ` Michael S. Tsirkin
@ 2012-02-01 15:24                           ` Alex Williamson
  0 siblings, 0 replies; 54+ messages in thread
From: Alex Williamson @ 2012-02-01 15:24 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, Jan Kiszka, kvm, shashidhar.patil

On Wed, 2012-02-01 at 17:18 +0200, Michael S. Tsirkin wrote:
> On Wed, Feb 01, 2012 at 06:55:40AM -0700, Alex Williamson wrote:
> > On Wed, 2012-02-01 at 11:03 +0200, Avi Kivity wrote:
> > > On 01/31/2012 11:14 PM, Michael S. Tsirkin wrote:
> > > > On Tue, Jan 31, 2012 at 02:08:38PM -0700, Alex Williamson wrote:
> > > > > On Tue, 2012-01-31 at 15:33 +0200, Avi Kivity wrote:
> > > > > > On 01/31/2012 03:21 PM, Jan Kiszka wrote:
> > > > > > > On 2012-01-31 14:10, Avi Kivity wrote:
> > > > > > > > On 01/31/2012 02:57 PM, Jan Kiszka wrote:
> > > > > > > >>>>> Seems fine, but do we really need the option?  If it doesn't work we
> > > > > > > >>>>> should treat it as an ordinary but and fix it.
> > > > > > > >>>>
> > > > > > > >>>> So far it's against the architecture of the emulated system: our current
> > > > > > > >>>> chipset predates 64 bit PCI.
> > > > > > > >>>>
> > > > > > > >>>
> > > > > > > >>> Then it should be enabled/disabled at the chipset level.
> > > > > > > >>
> > > > > > > >> Makes me wonder if we already do some filtering if the device supports
> > > > > > > >> 64 bit but the next bridge does not.
> > > > > > > >>
> > > > > > > > 
> > > > > > > > Our 440fx does support 64-bit bars, so the question doesn't arise for
> > > > > > > > x86.  Instead we violate the spec.
> > > > > > >
> > > > > > > If you mean by "our" the 440fx-qemu, not the real 440fx. That one does
> > > > > > > not even support >1GB RAM.
> > > > > > 
> > > > > > Yes, that's what I meant.  It also supports pci hotplug, more slots, cpu
> > > > > > hotplug, etc.
> > > > > 
> > > > > I'll drop this patch for now, it was just something I enabled based on a
> > > > > query from MST and didn't want to lose it.  Maybe we need the option in
> > > > > PCI core, but I'm just turn it on and hope for the best w/o giving users
> > > > > a way to disable it.  Thanks,
> > > > > 
> > > > > Alex
> > > >
> > > > The patch itself makes sense to me.
> > > > Avi found some bugs in 64 bit BAR handling, let's wait till that is
> > > > fixed then merge this patch.
> > > 
> > > The bugs are unrelated to 64-bit BARs.
> > > 
> > > Alex, why drop the patch?  We're far from a release, if there are indeed
> > > bugs there, we'll find out.
> > 
> > I think maybe this should be a core pci device option so we don't have
> > pci-assign.mem64 vs virtio-net-pci.64bit.  I'd like to always set the
> > MEM64 bit to match hardware when registering a BAR and let PCI core
> > decide whether to expose it that way.
> 
> Besides the bugs, there aren't any reasons not to expose it
> that way, right?

Yep, mainly just a compatibility/debugging thing.

Alex



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support
  2012-01-28 14:21 [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support Alex Williamson
                   ` (9 preceding siblings ...)
  2012-01-30 10:11 ` [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support Jan Kiszka
@ 2012-02-06 15:55 ` Shashidhar Patil
  2012-02-06 17:29   ` Alex Williamson
  10 siblings, 1 reply; 54+ messages in thread
From: Shashidhar Patil @ 2012-02-06 15:55 UTC (permalink / raw)
  To: Alex Williamson, kvm, Jan Kiszka

HI Alex,
     I can give it a try. Please send me all patches as one file if possible.
I was about to report yet another problem with Guest MSI-X smp affinity
not being honoured by KVM while injecting interrupts. Its again in the context
of 82599 device (tried with Linux as guest). I had a hunch that the MSI-X vector
data which gets modified when SMP affinity is not tracked by the qemu_kvm.
So the interrupt injection was done in load balancing fashion. Because of this
interrupt LB in guest VCPUs I saw that packets for all the queues were processed
by one VCPU instead 2 or 4 available.
Anyway I will test the the MSI-X affinity fix also and get back with results.
And I hope 82599 send/receive works. Thanks for your help.

-Shashidhar

On Sat, Jan 28, 2012 at 7:51 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> Patch 1 & 2 here are independent of the rest, but I include them
> here to avoid conflicts.  The first patch enables exposing MMIO
> BARs as their native width to the guest.  I added a config option
> for this with the default to use the existing behavior as I
> suspect we may have some latent issues there.  Patch 2 is just
> some trivial debug build warning fixes.
>
> The rest of the patches work on improving MSI-X table support.
> Particularly, vectors can now be updated by the guest after
> MSI-X is enabled to support things like irqbalance for SMP
> affinity tuning.  We also now update MSI-X configuration as
> new vectors are unmasked, which enables assignment of MSI-X
> devices on FreeBSD.  I was able to assign and use an 82576
> (PF & VF) on a FreeBSD 9.0 guest with this series.  Hopefully
> Shashidhar can report whether this improves the behavior he
> as seeing with an 82599.
>
> I wasn't able to get masking to work reliably, so I left that
> as is for now.  Perhaps someone has suggestions on getting that
> to work.  Thanks,
>
> Alex
>
> ---
>
> Alex Williamson (9):
>      pci-assign: Update MSI-X config based on table writes
>      pci-assign: Use MSIX_PAGE_SIZE
>      pci-assign: Allocate entries for all MSI-X vectors
>      pci-assign: Proper initialization for MSI-X table
>      pci-assign: Only calculate maximum MSI-X vector entries once
>      pci-assign: Use struct for MSI-X table
>      pci-assign: Update MSI-X MMIO to Memory API
>      pci-assign: Fix warnings with DEBUG enabled
>      pci-assign: Optionally enable 64bit BARs in guest
>
>
>  hw/device-assignment.c |  272 ++++++++++++++++++++++++++++++------------------
>  hw/device-assignment.h |   12 ++
>  2 files changed, 179 insertions(+), 105 deletions(-)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support
  2012-02-06 15:55 ` Shashidhar Patil
@ 2012-02-06 17:29   ` Alex Williamson
  2012-02-09 16:23     ` Shashidhar Patil
  0 siblings, 1 reply; 54+ messages in thread
From: Alex Williamson @ 2012-02-06 17:29 UTC (permalink / raw)
  To: Shashidhar Patil; +Cc: kvm, Jan Kiszka

On Mon, 2012-02-06 at 21:25 +0530, Shashidhar Patil wrote:
> HI Alex,
>      I can give it a try. Please send me all patches as one file if possible.
> I was about to report yet another problem with Guest MSI-X smp affinity
> not being honoured by KVM while injecting interrupts. Its again in the context
> of 82599 device (tried with Linux as guest). I had a hunch that the MSI-X vector
> data which gets modified when SMP affinity is not tracked by the qemu_kvm.
> So the interrupt injection was done in load balancing fashion. Because of this
> interrupt LB in guest VCPUs I saw that packets for all the queues were processed
> by one VCPU instead 2 or 4 available.
> Anyway I will test the the MSI-X affinity fix also and get back with results.
> And I hope 82599 send/receive works. Thanks for your help.

I pushed the v2 version to github, so you can grab it from here for
testing:

git://github.com/awilliam/qemu-kvm.git

Branch pci-assign-msix.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support
  2012-02-06 17:29   ` Alex Williamson
@ 2012-02-09 16:23     ` Shashidhar Patil
  2012-02-09 17:23       ` Alex Williamson
  0 siblings, 1 reply; 54+ messages in thread
From: Shashidhar Patil @ 2012-02-09 16:23 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, Jan Kiszka

Hi Alex,
    I tested your code changes for two problems
1. MSIX vectors allocation for FreeBSD Guest
    Tested the allocation of MSI-X allocation happening for both
FreeBSD-9.0 (also tested 8.2) and Linux guest.
     Also tried multiple restart of the guests to see if the hosts
still maintains those allocations. With Linux
     as guest I could pass traffic. Though interrupt allocation worked
the FreeBSD could to pass traffic. The FreeBSD
     driver seems to be broken(atleast when running in guest).

2. Guest MSI-X smp_affinity fix.
     The smp_affinity configured in guest is honoured. The traffic
distribution is uniform. When smp affinity is configured
     in host and guest properly this fix provides multi queue and
hence multi core traffic distribution in both host and
     guest. The result is good packet throughput.

Thanks for the the much needed fixes.

-Shashidhar

On Mon, Feb 6, 2012 at 10:59 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Mon, 2012-02-06 at 21:25 +0530, Shashidhar Patil wrote:
>> HI Alex,
>>      I can give it a try. Please send me all patches as one file if possible.
>> I was about to report yet another problem with Guest MSI-X smp affinity
>> not being honoured by KVM while injecting interrupts. Its again in the context
>> of 82599 device (tried with Linux as guest). I had a hunch that the MSI-X vector
>> data which gets modified when SMP affinity is not tracked by the qemu_kvm.
>> So the interrupt injection was done in load balancing fashion. Because of this
>> interrupt LB in guest VCPUs I saw that packets for all the queues were processed
>> by one VCPU instead 2 or 4 available.
>> Anyway I will test the the MSI-X affinity fix also and get back with results.
>> And I hope 82599 send/receive works. Thanks for your help.
>
> I pushed the v2 version to github, so you can grab it from here for
> testing:
>
> git://github.com/awilliam/qemu-kvm.git
>
> Branch pci-assign-msix.  Thanks,
>
> Alex
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support
  2012-02-09 16:23     ` Shashidhar Patil
@ 2012-02-09 17:23       ` Alex Williamson
  2012-02-12 16:30         ` Shashidhar Patil
  0 siblings, 1 reply; 54+ messages in thread
From: Alex Williamson @ 2012-02-09 17:23 UTC (permalink / raw)
  To: Shashidhar Patil; +Cc: kvm, Jan Kiszka

On Thu, 2012-02-09 at 21:53 +0530, Shashidhar Patil wrote:
> Hi Alex,
>     I tested your code changes for two problems
> 1. MSIX vectors allocation for FreeBSD Guest
>     Tested the allocation of MSI-X allocation happening for both
> FreeBSD-9.0 (also tested 8.2) and Linux guest.
>      Also tried multiple restart of the guests to see if the hosts
> still maintains those allocations. With Linux
>      as guest I could pass traffic. Though interrupt allocation worked
> the FreeBSD could to pass traffic. The FreeBSD
>      driver seems to be broken(atleast when running in guest).

Which device/driver?  I was able to use an 82576, both physical device
and sr-iov virtual function) with the freebsd 9.0 igb driver.

> 2. Guest MSI-X smp_affinity fix.
>      The smp_affinity configured in guest is honoured. The traffic
> distribution is uniform. When smp affinity is configured
>      in host and guest properly this fix provides multi queue and
> hence multi core traffic distribution in both host and
>      guest. The result is good packet throughput.
> 
> Thanks for the the much needed fixes.

Thanks for testing!

Alex

> -Shashidhar
> 
> On Mon, Feb 6, 2012 at 10:59 PM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > On Mon, 2012-02-06 at 21:25 +0530, Shashidhar Patil wrote:
> >> HI Alex,
> >>      I can give it a try. Please send me all patches as one file if possible.
> >> I was about to report yet another problem with Guest MSI-X smp affinity
> >> not being honoured by KVM while injecting interrupts. Its again in the context
> >> of 82599 device (tried with Linux as guest). I had a hunch that the MSI-X vector
> >> data which gets modified when SMP affinity is not tracked by the qemu_kvm.
> >> So the interrupt injection was done in load balancing fashion. Because of this
> >> interrupt LB in guest VCPUs I saw that packets for all the queues were processed
> >> by one VCPU instead 2 or 4 available.
> >> Anyway I will test the the MSI-X affinity fix also and get back with results.
> >> And I hope 82599 send/receive works. Thanks for your help.
> >
> > I pushed the v2 version to github, so you can grab it from here for
> > testing:
> >
> > git://github.com/awilliam/qemu-kvm.git
> >
> > Branch pci-assign-msix.  Thanks,
> >
> > Alex
> >




^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support
  2012-02-09 17:23       ` Alex Williamson
@ 2012-02-12 16:30         ` Shashidhar Patil
  0 siblings, 0 replies; 54+ messages in thread
From: Shashidhar Patil @ 2012-02-12 16:30 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, Jan Kiszka

HI Alex,
   I tried both FreeBSD stock and Intel official freebsd driver for 82599.
I tried both FreeBSD 9.0 and FreeBSD8.2.
No luck with traffic.

-Shashidhar

On Thu, Feb 9, 2012 at 10:53 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Thu, 2012-02-09 at 21:53 +0530, Shashidhar Patil wrote:
>> Hi Alex,
>>     I tested your code changes for two problems
>> 1. MSIX vectors allocation for FreeBSD Guest
>>     Tested the allocation of MSI-X allocation happening for both
>> FreeBSD-9.0 (also tested 8.2) and Linux guest.
>>      Also tried multiple restart of the guests to see if the hosts
>> still maintains those allocations. With Linux
>>      as guest I could pass traffic. Though interrupt allocation worked
>> the FreeBSD could to pass traffic. The FreeBSD
>>      driver seems to be broken(atleast when running in guest).
>
> Which device/driver?  I was able to use an 82576, both physical device
> and sr-iov virtual function) with the freebsd 9.0 igb driver.
>
>> 2. Guest MSI-X smp_affinity fix.
>>      The smp_affinity configured in guest is honoured. The traffic
>> distribution is uniform. When smp affinity is configured
>>      in host and guest properly this fix provides multi queue and
>> hence multi core traffic distribution in both host and
>>      guest. The result is good packet throughput.
>>
>> Thanks for the the much needed fixes.
>
> Thanks for testing!
>
> Alex
>
>> -Shashidhar
>>
>> On Mon, Feb 6, 2012 at 10:59 PM, Alex Williamson
>> <alex.williamson@redhat.com> wrote:
>> > On Mon, 2012-02-06 at 21:25 +0530, Shashidhar Patil wrote:
>> >> HI Alex,
>> >>      I can give it a try. Please send me all patches as one file if possible.
>> >> I was about to report yet another problem with Guest MSI-X smp affinity
>> >> not being honoured by KVM while injecting interrupts. Its again in the context
>> >> of 82599 device (tried with Linux as guest). I had a hunch that the MSI-X vector
>> >> data which gets modified when SMP affinity is not tracked by the qemu_kvm.
>> >> So the interrupt injection was done in load balancing fashion. Because of this
>> >> interrupt LB in guest VCPUs I saw that packets for all the queues were processed
>> >> by one VCPU instead 2 or 4 available.
>> >> Anyway I will test the the MSI-X affinity fix also and get back with results.
>> >> And I hope 82599 send/receive works. Thanks for your help.
>> >
>> > I pushed the v2 version to github, so you can grab it from here for
>> > testing:
>> >
>> > git://github.com/awilliam/qemu-kvm.git
>> >
>> > Branch pci-assign-msix.  Thanks,
>> >
>> > Alex
>> >
>
>
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, other threads:[~2012-02-12 16:30 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-28 14:21 [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support Alex Williamson
2012-01-28 14:21 ` [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest Alex Williamson
2012-01-31 12:40   ` Avi Kivity
2012-01-31 12:45     ` Jan Kiszka
2012-01-31 12:51       ` Avi Kivity
2012-01-31 12:57         ` Jan Kiszka
2012-01-31 13:10           ` Avi Kivity
2012-01-31 13:21             ` Jan Kiszka
2012-01-31 13:33               ` Avi Kivity
2012-01-31 21:08                 ` Alex Williamson
2012-01-31 21:14                   ` Michael S. Tsirkin
2012-02-01  9:03                     ` Avi Kivity
2012-02-01 10:03                       ` Michael S. Tsirkin
2012-02-01 13:55                       ` Alex Williamson
2012-02-01 15:18                         ` Michael S. Tsirkin
2012-02-01 15:24                           ` Alex Williamson
2012-01-28 14:21 ` [PATCH 2/9] pci-assign: Fix warnings with DEBUG enabled Alex Williamson
2012-01-28 14:21 ` [PATCH 3/9] pci-assign: Update MSI-X MMIO to Memory API Alex Williamson
2012-01-31 12:45   ` Avi Kivity
2012-01-31 21:13     ` Alex Williamson
2012-02-01  4:22       ` Alex Williamson
2012-02-01  9:04         ` Avi Kivity
2012-02-01 13:56           ` Alex Williamson
2012-01-28 14:21 ` [PATCH 4/9] pci-assign: Use struct for MSI-X table Alex Williamson
2012-01-31 17:40   ` Michael S. Tsirkin
2012-01-31 19:05     ` Alex Williamson
2012-01-31 20:00       ` Michael S. Tsirkin
2012-01-31 21:17         ` Alex Williamson
2012-01-31 21:24           ` Michael S. Tsirkin
2012-01-31 21:30             ` Alex Williamson
2012-01-28 14:22 ` [PATCH 5/9] pci-assign: Only calculate maximum MSI-X vector entries once Alex Williamson
2012-01-31 20:18   ` Michael S. Tsirkin
2012-01-31 20:31     ` Alex Williamson
2012-01-31 20:56       ` Michael S. Tsirkin
2012-01-28 14:22 ` [PATCH 6/9] pci-assign: Proper initialization for MSI-X table Alex Williamson
2012-01-31 17:40   ` Michael S. Tsirkin
2012-01-31 19:07     ` Alex Williamson
2012-01-31 19:12       ` Michael S. Tsirkin
2012-01-31 19:16         ` Jan Kiszka
2012-01-31 20:19           ` Michael S. Tsirkin
2012-01-31 21:06             ` Alex Williamson
2012-01-28 14:22 ` [PATCH 7/9] pci-assign: Allocate entries for all MSI-X vectors Alex Williamson
2012-01-28 14:22 ` [PATCH 8/9] pci-assign: Use MSIX_PAGE_SIZE Alex Williamson
2012-01-28 14:22 ` [PATCH 9/9] pci-assign: Update MSI-X config based on table writes Alex Williamson
2012-01-31 12:50   ` Avi Kivity
2012-01-30 10:11 ` [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support Jan Kiszka
2012-01-30 13:44   ` Alex Williamson
2012-01-31 12:52     ` Avi Kivity
2012-01-31 12:56       ` Jan Kiszka
2012-02-06 15:55 ` Shashidhar Patil
2012-02-06 17:29   ` Alex Williamson
2012-02-09 16:23     ` Shashidhar Patil
2012-02-09 17:23       ` Alex Williamson
2012-02-12 16:30         ` Shashidhar Patil

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.