All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/13] qemu-kvm: device assignment cleanups and upstream diff reductions
@ 2011-06-27 18:19 Jan Kiszka
  2011-06-27 18:19 ` [PATCH 01/13] qemu-kvm: Reduce configure and Makefile.target diff to upstream Jan Kiszka
                   ` (13 more replies)
  0 siblings, 14 replies; 48+ messages in thread
From: Jan Kiszka @ 2011-06-27 18:19 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Michael S. Tsirkin, Alex Williamson, Joerg Roedel

This series basically consists of two halves. The first one applies a
few smaller cleanups to qemu-kvm to improve similarity with upstream.
This includes the recently discussed removal of -enable-nesting.

The second half starts with two device assignment fixes and then applies
some refactorings, specifically dropping the libpci dependency and
enabling the revert of some qemu-kvm private PCI core changes. The
latter is achieved by simplifying the access control management to the
passed-through device's config space. That also saves 100 LOC.

Note that this series was tested with upstream commit af2be20777 ("Fix
fallouts from Linux header inclusion") applied and probably depends on
it also mechanically.

Please review/merge.

CC: Joerg Roedel <Joerg.Roedel@amd.com>

Jan Kiszka (13):
  qemu-kvm: Reduce configure and Makefile.target diff to upstream
  qemu-kvm: Drop some no longer needed #ifdefs
  qemu-kvm: Drop -enable-nesting command line switch
  qemu-kvm: Remove eventfd compat header
  qemu-kvm: Remove qemu_ram_unmap
  qemu-kvm: Drop or replace useless device-assignment.h inclusions
  pci-assign: Fix kvm_deassign_irq handling in assign_irq
  pci-assign: Update legacy interrupts only if used
  pci-assign: Drop libpci header dependency
  pci-assign: Refactor calc_assigned_dev_id
  pci-assign: Track MSI/MSI-X capability position, clean up related
    code
  pci-assign: Generic config space access management
  qemu-kvm: Resolve PCI upstream diffs

 Makefile.target        |    2 -
 compat/sys/eventfd.h   |   13 --
 configure              |   29 +---
 cpu-common.h           |    1 -
 exec.c                 |   13 --
 hw/device-assignment.c |  450 +++++++++++++++++-------------------------------
 hw/device-assignment.h |    9 +-
 hw/pc.c                |    1 -
 hw/pci-hotplug.c       |    1 -
 hw/pci.c               |   29 ++--
 hw/pci.h               |    8 +-
 hw/pci_regs.h          |    7 -
 kvm-all.c              |    4 -
 qemu-kvm-x86.c         |   20 --
 qemu-kvm.h             |   30 ----
 qemu-options.hx        |    2 -
 target-i386/cpuid.c    |    3 -
 target-i386/kvm.c      |    2 -
 vl.c                   |    6 +-
 19 files changed, 182 insertions(+), 448 deletions(-)
 delete mode 100644 compat/sys/eventfd.h


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 01/13] qemu-kvm: Reduce configure and Makefile.target diff to upstream
  2011-06-27 18:19 [PATCH 00/13] qemu-kvm: device assignment cleanups and upstream diff reductions Jan Kiszka
@ 2011-06-27 18:19 ` Jan Kiszka
  2011-06-27 18:19 ` [PATCH 02/13] qemu-kvm: Drop some no longer needed #ifdefs Jan Kiszka
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 48+ messages in thread
From: Jan Kiszka @ 2011-06-27 18:19 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Michael S. Tsirkin, Alex Williamson

From: Jan Kiszka <jan.kiszka@siemens.com>

None of those bits are still required.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 Makefile.target |    2 --
 configure       |    8 +-------
 2 files changed, 1 insertions(+), 9 deletions(-)

diff --git a/Makefile.target b/Makefile.target
index 4989345..43bab8f 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -40,8 +40,6 @@ ifndef CONFIG_HAIKU
 LIBS+=-lm
 endif
 
-CFLAGS += $(KVM_CFLAGS)
-
 config-target.h: config-target.h-timestamp
 config-target.h-timestamp: config-target.mak
 
diff --git a/configure b/configure
index 1d0641f..15db2a3 100755
--- a/configure
+++ b/configure
@@ -177,7 +177,6 @@ io_thread="yes"
 mixemu="no"
 kvm_cap_pit="yes"
 kvm_cap_device_assignment="yes"
-kerneldir=""
 aix="no"
 blobs="yes"
 pkgversion=" ($(kvm_version))"
@@ -3320,11 +3319,6 @@ case "$target_arch2" in
       \( "$target_arch2" = "x86_64" -a "$cpu" = "i386"   \) -o \
       \( "$target_arch2" = "i386"   -a "$cpu" = "x86_64" \) \) ; then
       echo "CONFIG_KVM=y" >> $config_target_mak
-      echo "CONFIG_THREAD=y" >> $config_host_mak
-      echo "KVM_CFLAGS=$kvm_cflags" >> $config_target_mak
-      if test "$kvm_para" = "yes"; then
-        echo "CONFIG_KVM_PARA=y" >> $config_target_mak
-      fi
       if test $vhost_net = "yes" ; then
         echo "CONFIG_VHOST_NET=y" >> $config_target_mak
       fi
@@ -3566,7 +3560,7 @@ for f in $FILES ; do
 done
 
 # temporary config to build submodules
-for rom in seabios vgabios; do
+for rom in seabios vgabios ; do
     config_mak=roms/$rom/config.mak
     echo "# Automatically generated by configure - do not modify" > $config_mak
     echo "SRC_PATH=$source_path/roms/$rom" >> $config_mak
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 02/13] qemu-kvm: Drop some no longer needed #ifdefs
  2011-06-27 18:19 [PATCH 00/13] qemu-kvm: device assignment cleanups and upstream diff reductions Jan Kiszka
  2011-06-27 18:19 ` [PATCH 01/13] qemu-kvm: Reduce configure and Makefile.target diff to upstream Jan Kiszka
@ 2011-06-27 18:19 ` Jan Kiszka
  2011-06-27 18:19 ` [PATCH 03/13] qemu-kvm: Drop -enable-nesting command line switch Jan Kiszka
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 48+ messages in thread
From: Jan Kiszka @ 2011-06-27 18:19 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Michael S. Tsirkin, Alex Williamson

From: Jan Kiszka <jan.kiszka@siemens.com>

Not a complete cleanup, other code has more subtle dependencies. Will be
resolved during upstream merge.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/device-assignment.c |   52 ++++++-----------------------------------------
 kvm-all.c              |    4 ---
 qemu-kvm-x86.c         |   20 ------------------
 qemu-kvm.h             |   23 ---------------------
 target-i386/kvm.c      |    2 -
 5 files changed, 7 insertions(+), 94 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 57d8dc0..6a2a8c9 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -800,7 +800,6 @@ again:
 
 static QLIST_HEAD(, AssignedDevice) devs = QLIST_HEAD_INITIALIZER(devs);
 
-#ifdef KVM_CAP_IRQ_ROUTING
 static void free_dev_irq_entries(AssignedDevice *dev)
 {
     int i;
@@ -811,7 +810,6 @@ static void free_dev_irq_entries(AssignedDevice *dev)
     dev->entry = NULL;
     dev->irq_entries_nr = 0;
 }
-#endif
 
 static void free_assigned_device(AssignedDevice *dev)
 {
@@ -860,9 +858,7 @@ static void free_assigned_device(AssignedDevice *dev)
         close(dev->real_device.config_fd);
     }
 
-#ifdef KVM_CAP_IRQ_ROUTING
     free_dev_irq_entries(dev);
-#endif
 }
 
 static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus, uint8_t devfn)
@@ -922,7 +918,6 @@ static int assign_device(AssignedDevice *dev)
     struct kvm_assigned_pci_dev assigned_dev_data;
     int r;
 
-#ifdef KVM_CAP_PCI_SEGMENT
     /* Only pass non-zero PCI segment to capable module */
     if (!kvm_check_extension(kvm_state, KVM_CAP_PCI_SEGMENT) &&
         dev->h_segnr) {
@@ -930,18 +925,14 @@ static int assign_device(AssignedDevice *dev)
                 "as this KVM module doesn't support it.\n");
         return -ENODEV;
     }
-#endif
 
     memset(&assigned_dev_data, 0, sizeof(assigned_dev_data));
     assigned_dev_data.assigned_dev_id  =
 	calc_assigned_dev_id(dev->h_segnr, dev->h_busnr, dev->h_devfn);
-#ifdef KVM_CAP_PCI_SEGMENT
     assigned_dev_data.segnr = dev->h_segnr;
-#endif
     assigned_dev_data.busnr = dev->h_busnr;
     assigned_dev_data.devfn = dev->h_devfn;
 
-#ifdef KVM_CAP_IOMMU
     /* We always enable the IOMMU unless disabled on the command line */
     if (dev->features & ASSIGNED_DEVICE_USE_IOMMU_MASK) {
         if (!kvm_check_extension(kvm_state, KVM_CAP_IOMMU)) {
@@ -951,9 +942,6 @@ static int assign_device(AssignedDevice *dev)
         }
         assigned_dev_data.flags |= KVM_DEV_ASSIGN_ENABLE_IOMMU;
     }
-#else
-    dev->features &= ~ASSIGNED_DEVICE_USE_IOMMU_MASK;
-#endif
     if (!(dev->features & ASSIGNED_DEVICE_USE_IOMMU_MASK)) {
         fprintf(stderr,
                 "WARNING: Assigning a device without IOMMU protection can "
@@ -1001,7 +989,6 @@ static int assign_irq(AssignedDevice *dev)
         calc_assigned_dev_id(dev->h_segnr, dev->h_busnr, dev->h_devfn);
     assigned_irq_data.guest_irq = irq;
     assigned_irq_data.host_irq = dev->real_device.irq;
-#ifdef KVM_CAP_ASSIGN_DEV_IRQ
     if (dev->irq_requested_type) {
         assigned_irq_data.flags = dev->irq_requested_type;
         r = kvm_deassign_irq(kvm_state, &assigned_irq_data);
@@ -1016,7 +1003,6 @@ static int assign_irq(AssignedDevice *dev)
         assigned_irq_data.flags |= KVM_DEV_IRQ_HOST_MSI;
     else
         assigned_irq_data.flags |= KVM_DEV_IRQ_HOST_INTX;
-#endif
 
     r = kvm_assign_irq(kvm_state, &assigned_irq_data);
     if (r < 0) {
@@ -1034,7 +1020,6 @@ static int assign_irq(AssignedDevice *dev)
 
 static void deassign_device(AssignedDevice *dev)
 {
-#ifdef KVM_CAP_DEVICE_DEASSIGNMENT
     struct kvm_assigned_pci_dev assigned_dev_data;
     int r;
 
@@ -1046,7 +1031,6 @@ static void deassign_device(AssignedDevice *dev)
     if (r < 0)
 	fprintf(stderr, "Failed to deassign device \"%s\" : %s\n",
                 dev->dev.qdev.id, strerror(-r));
-#endif
 }
 
 #if 0
@@ -1084,9 +1068,6 @@ void assigned_dev_update_irqs(void)
     }
 }
 
-#ifdef KVM_CAP_IRQ_ROUTING
-
-#ifdef KVM_CAP_DEVICE_MSI
 static void assigned_dev_update_msi(PCIDevice *pci_dev, unsigned int ctrl_pos)
 {
     struct kvm_assigned_irq assigned_irq_data;
@@ -1151,9 +1132,7 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev, unsigned int ctrl_pos)
         assign_irq(assigned_dev);
     }
 }
-#endif
 
-#ifdef KVM_CAP_DEVICE_MSIX
 static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
 {
     AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
@@ -1290,8 +1269,6 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
         assign_irq(assigned_dev);
     }
 }
-#endif
-#endif
 
 /* There can be multiple VNDR capabilities per device, we need to find the
  * one that starts closet to the given address without going over. */
@@ -1338,32 +1315,23 @@ static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
                                                  uint32_t val, int len)
 {
     uint8_t cap_id = pci_dev->config_map[address];
+    uint8_t cap;
 
     pci_default_write_config(pci_dev, address, val, len);
     switch (cap_id) {
-#ifdef KVM_CAP_IRQ_ROUTING
     case PCI_CAP_ID_MSI:
-#ifdef KVM_CAP_DEVICE_MSI
-        {
-            uint8_t cap = pci_find_capability(pci_dev, cap_id);
-            if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
-                assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
-            }
+        cap = pci_find_capability(pci_dev, cap_id);
+        if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
+            assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
         }
-#endif
         break;
 
     case PCI_CAP_ID_MSIX:
-#ifdef KVM_CAP_DEVICE_MSIX
-        {
-            uint8_t cap = pci_find_capability(pci_dev, cap_id);
-            if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) {
-                assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS);
-            }
+        cap = pci_find_capability(pci_dev, cap_id);
+        if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) {
+            assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS);
         }
-#endif
         break;
-#endif
 
     case PCI_CAP_ID_VPD:
     case PCI_CAP_ID_VNDR:
@@ -1384,8 +1352,6 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
                  pci_get_word(pci_dev->config + PCI_STATUS) &
                  ~PCI_STATUS_CAP_LIST);
 
-#ifdef KVM_CAP_IRQ_ROUTING
-#ifdef KVM_CAP_DEVICE_MSI
     /* Expose MSI capability
      * MSI capability is the 1st capability in capability config */
     if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI, 0))) {
@@ -1407,8 +1373,6 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
         pci_set_long(pci_dev->wmask + pos + PCI_MSI_ADDRESS_LO, 0xfffffffc);
         pci_set_word(pci_dev->wmask + pos + PCI_MSI_DATA_32, 0xffff);
     }
-#endif
-#ifdef KVM_CAP_DEVICE_MSIX
     /* Expose MSI-X capability */
     if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX, 0))) {
         int bar_nr;
@@ -1432,8 +1396,6 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
         msix_table_entry &= ~PCI_MSIX_BIR;
         dev->msix_table_addr = pci_region[bar_nr].base_addr + msix_table_entry;
     }
-#endif
-#endif
 
     /* Minimal PM support, nothing writable, device appears to NAK changes */
     if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_PM, 0))) {
diff --git a/kvm-all.c b/kvm-all.c
index 34453dc..3ad2459 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1418,7 +1418,6 @@ int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, uint16_t val, bool assign)
 
 int kvm_set_irqfd(int gsi, int fd, bool assigned)
 {
-#if defined(KVM_IRQFD)
     struct kvm_irqfd irqfd = {
         .fd = fd,
         .gsi = gsi,
@@ -1432,9 +1431,6 @@ int kvm_set_irqfd(int gsi, int fd, bool assigned)
     if (r < 0)
         return r;
     return 0;
-#else
-    return -ENOSYS;
-#endif
 }
 
 int kvm_on_sigbus_vcpu(CPUState *env, int code, void *addr)
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 2a01ccc..a7981b1 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -25,7 +25,6 @@
 
 static int kvm_create_pit(KVMState *s)
 {
-#ifdef KVM_CAP_PIT
     int r;
 
     if (kvm_pit) {
@@ -43,12 +42,9 @@ static int kvm_create_pit(KVMState *s)
             }
         }
     }
-#endif
     return 0;
 }
 
-#ifdef KVM_EXIT_TPR_ACCESS
-
 int kvm_handle_tpr_access(CPUState *env)
 {
     struct kvm_run *run = env->kvm_run;
@@ -68,10 +64,6 @@ int kvm_enable_vapic(CPUState *env, uint64_t vapic)
     return kvm_vcpu_ioctl(env, KVM_SET_VAPIC_ADDR, &va);
 }
 
-#endif
-
-#ifdef KVM_CAP_IRQCHIP
-
 int kvm_get_lapic(CPUState *env, struct kvm_lapic_state *s)
 {
     int r = 0;
@@ -103,10 +95,6 @@ int kvm_set_lapic(CPUState *env, struct kvm_lapic_state *s)
     return r;
 }
 
-#endif
-
-#ifdef KVM_CAP_PIT
-
 int kvm_get_pit(KVMState *s, struct kvm_pit_state *pit_state)
 {
     if (!kvm_pit_in_kernel()) {
@@ -123,7 +111,6 @@ int kvm_set_pit(KVMState *s, struct kvm_pit_state *pit_state)
     return kvm_vm_ioctl(s, KVM_SET_PIT, pit_state);
 }
 
-#ifdef KVM_CAP_PIT_STATE2
 int kvm_get_pit2(KVMState *s, struct kvm_pit_state2 *ps2)
 {
     if (!kvm_pit_in_kernel()) {
@@ -140,10 +127,6 @@ int kvm_set_pit2(KVMState *s, struct kvm_pit_state2 *ps2)
     return kvm_vm_ioctl(s, KVM_SET_PIT2, ps2);
 }
 
-#endif
-#endif
-
-#ifdef KVM_CAP_VAPIC
 static int kvm_enable_tpr_access_reporting(CPUState *env)
 {
     int r;
@@ -155,15 +138,12 @@ static int kvm_enable_tpr_access_reporting(CPUState *env)
     }
     return kvm_vcpu_ioctl(env, KVM_TPR_ACCESS_REPORTING, &tac);
 }
-#endif
 
 static int _kvm_arch_init_vcpu(CPUState *env)
 {
     kvm_arch_reset_vcpu(env);
 
-#ifdef KVM_EXIT_TPR_ACCESS
     kvm_enable_tpr_access_reporting(env);
-#endif
 
     return kvm_update_ioport_access(env);
 }
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 09d07d0..f3f775c 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -39,7 +39,6 @@
 
 int kvm_create_irqchip(KVMState *s);
 
-#ifdef KVM_CAP_IRQCHIP
 /*!
  * \brief Dump in kernel IRQCHIP contents
  *
@@ -86,13 +85,6 @@ int kvm_get_lapic(CPUState *env, struct kvm_lapic_state *s);
  */
 int kvm_set_lapic(CPUState *env, struct kvm_lapic_state *s);
 
-#endif
-
-#endif
-
-#ifdef KVM_CAP_PIT
-
-#if defined(__i386__) || defined(__x86_64__)
 /*!
  * \brief Get in kernel PIT of the virtual domain
  *
@@ -116,7 +108,6 @@ int kvm_set_pit(KVMState *s, struct kvm_pit_state *pit_state);
 
 int kvm_reinject_control(KVMState *s, int pit_reinject);
 
-#ifdef KVM_CAP_PIT_STATE2
 /*!
  * \brief Set in kernel PIT state2 of the virtual domain
  *
@@ -138,16 +129,9 @@ int kvm_set_pit2(KVMState *s, struct kvm_pit_state2 *ps2);
 int kvm_get_pit2(KVMState *s, struct kvm_pit_state2 *ps2);
 
 #endif
-#endif
-#endif
-
-#ifdef KVM_CAP_VAPIC
 
 int kvm_enable_vapic(CPUState *env, uint64_t vapic);
 
-#endif
-
-#ifdef KVM_CAP_DEVICE_ASSIGNMENT
 /*!
  * \brief Notifies host kernel about a PCI device to be assigned to a guest
  *
@@ -171,7 +155,6 @@ int kvm_assign_pci_device(KVMState *s,
  */
 int kvm_assign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq);
 
-#ifdef KVM_CAP_ASSIGN_DEV_IRQ
 /*!
  * \brief Deassign IRQ for an assigned device
  *
@@ -182,10 +165,7 @@ int kvm_assign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq);
  * \param assigned_irq Parameters, like dev id, host irq, guest irq, etc
  */
 int kvm_deassign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq);
-#endif
-#endif
 
-#ifdef KVM_CAP_DEVICE_DEASSIGNMENT
 /*!
  * \brief Notifies host kernel about a PCI device to be deassigned from a guest
  *
@@ -197,7 +177,6 @@ int kvm_deassign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq);
  */
 int kvm_deassign_pci_device(KVMState *s,
                             struct kvm_assigned_pci_dev *assigned_dev);
-#endif
 
 /*!
  * \brief Clears the temporary irq routing table
@@ -252,11 +231,9 @@ int kvm_update_routing_entry(struct kvm_irq_routing_entry *entry,
                              struct kvm_irq_routing_entry *newentry);
 
 
-#ifdef KVM_CAP_DEVICE_MSIX
 int kvm_assign_set_msix_nr(KVMState *s, struct kvm_assigned_msix_nr *msix_nr);
 int kvm_assign_set_msix_entry(KVMState *s,
                               struct kvm_assigned_msix_entry *entry);
-#endif
 
 #else                           /* !CONFIG_KVM */
 
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index cfc6c78..aa843f0 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -512,13 +512,11 @@ int kvm_arch_init_vcpu(CPUState *env)
 
 static void kvm_clear_vapic(CPUState *env)
 {
-#ifdef KVM_SET_VAPIC_ADDR
     struct kvm_vapic_addr va = {
         .vapic_addr = 0,
     };
 
     kvm_vcpu_ioctl(env, KVM_SET_VAPIC_ADDR, &va);
-#endif
 }
 
 void kvm_arch_reset_vcpu(CPUState *env)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 03/13] qemu-kvm: Drop -enable-nesting command line switch
  2011-06-27 18:19 [PATCH 00/13] qemu-kvm: device assignment cleanups and upstream diff reductions Jan Kiszka
  2011-06-27 18:19 ` [PATCH 01/13] qemu-kvm: Reduce configure and Makefile.target diff to upstream Jan Kiszka
  2011-06-27 18:19 ` [PATCH 02/13] qemu-kvm: Drop some no longer needed #ifdefs Jan Kiszka
@ 2011-06-27 18:19 ` Jan Kiszka
  2011-06-28 10:48   ` Roedel, Joerg
  2011-06-27 18:19 ` [PATCH 04/13] qemu-kvm: Remove eventfd compat header Jan Kiszka
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: Jan Kiszka @ 2011-06-27 18:19 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Michael S. Tsirkin, Alex Williamson, Joerg Roedel

From: Jan Kiszka <jan.kiszka@siemens.com>

Link it is already handling in upstream, this feature can be controlled
via -cpu ...+/-svm.

CC: Joerg Roedel <Joerg.Roedel@amd.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 qemu-kvm.h          |    7 -------
 qemu-options.hx     |    2 --
 target-i386/cpuid.c |    3 ---
 vl.c                |    4 ----
 4 files changed, 0 insertions(+), 16 deletions(-)

diff --git a/qemu-kvm.h b/qemu-kvm.h
index f3f775c..845880e 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -259,19 +259,12 @@ int kvm_update_ioport_access(CPUState *env);
 int kvm_arch_set_ioport_access(unsigned long start, unsigned long size,
                                bool enable);
 
-#ifdef CONFIG_KVM
 extern int kvm_irqchip;
 extern int kvm_pit;
 extern int kvm_pit_reinject;
-extern int kvm_nested;
 extern unsigned int kvm_shadow_memory;
 
 int kvm_handle_tpr_access(CPUState *env);
-
-#else
-#define kvm_nested 0
-#endif
-
 int kvm_tpr_enable_vapic(CPUState *env);
 
 #endif
diff --git a/qemu-options.hx b/qemu-options.hx
index 9972dcf..6a6dc89 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2420,8 +2420,6 @@ DEF("no-kvm-pit-reinjection", 0, QEMU_OPTION_no_kvm_pit_reinjection,
     "-no-kvm-pit-reinjection\n"
     "                disable KVM kernel mode PIT interrupt reinjection\n",
     QEMU_ARCH_I386)
-DEF("enable-nesting", 0, QEMU_OPTION_enable_nesting,
-    "-enable-nesting enable support for running a VM inside the VM (AMD only)\n", QEMU_ARCH_I386)
 DEF("nvram", HAS_ARG, QEMU_OPTION_nvram,
     "-nvram FILE     provide ia64 nvram contents\n", QEMU_ARCH_ALL)
 DEF("tdf", 0, QEMU_OPTION_tdf,
diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
index b5b563c..d1f3b43 100644
--- a/target-i386/cpuid.c
+++ b/target-i386/cpuid.c
@@ -1238,9 +1238,6 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 
             /* disable CPU features that KVM cannot support */
 
-            /* svm */
-            if (!kvm_nested)
-                *ecx &= ~CPUID_EXT3_SVM;
             /* 3dnow */
             *edx &= ~0xc0000000;
         }
diff --git a/vl.c b/vl.c
index 611dbd2..dd870e6 100644
--- a/vl.c
+++ b/vl.c
@@ -2717,10 +2717,6 @@ int main(int argc, char **argv, char **envp)
                 kvm_pit_reinject = 0;
                 break;
             }
-	    case QEMU_OPTION_enable_nesting: {
-		kvm_nested = 1;
-		break;
-	    }
 #endif
             case QEMU_OPTION_usb:
                 usb_enabled = 1;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 04/13] qemu-kvm: Remove eventfd compat header
  2011-06-27 18:19 [PATCH 00/13] qemu-kvm: device assignment cleanups and upstream diff reductions Jan Kiszka
                   ` (2 preceding siblings ...)
  2011-06-27 18:19 ` [PATCH 03/13] qemu-kvm: Drop -enable-nesting command line switch Jan Kiszka
@ 2011-06-27 18:19 ` Jan Kiszka
  2011-06-28 11:09   ` Michael S. Tsirkin
  2011-06-27 18:19 ` [PATCH 05/13] qemu-kvm: Remove qemu_ram_unmap Jan Kiszka
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: Jan Kiszka @ 2011-06-27 18:19 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Michael S. Tsirkin, Alex Williamson

From: Jan Kiszka <jan.kiszka@siemens.com>

No longer used.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 compat/sys/eventfd.h |   13 -------------
 1 files changed, 0 insertions(+), 13 deletions(-)
 delete mode 100644 compat/sys/eventfd.h

diff --git a/compat/sys/eventfd.h b/compat/sys/eventfd.h
deleted file mode 100644
index f55d96a..0000000
--- a/compat/sys/eventfd.h
+++ /dev/null
@@ -1,13 +0,0 @@
-#ifndef _COMPAT_SYS_EVENTFD
-#define _COMPAT_SYS_EVENTFD
-
-#include <unistd.h>
-#include <syscall.h>
-
-
-static inline int eventfd (int count, int flags)
-{
-    return syscall(SYS_eventfd, count, flags);
-}
-
-#endif
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 05/13] qemu-kvm: Remove qemu_ram_unmap
  2011-06-27 18:19 [PATCH 00/13] qemu-kvm: device assignment cleanups and upstream diff reductions Jan Kiszka
                   ` (3 preceding siblings ...)
  2011-06-27 18:19 ` [PATCH 04/13] qemu-kvm: Remove eventfd compat header Jan Kiszka
@ 2011-06-27 18:19 ` Jan Kiszka
  2011-06-27 18:19 ` [PATCH 06/13] qemu-kvm: Drop or replace useless device-assignment.h inclusions Jan Kiszka
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 48+ messages in thread
From: Jan Kiszka @ 2011-06-27 18:19 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Michael S. Tsirkin, Alex Williamson

From: Jan Kiszka <jan.kiszka@siemens.com>

Upstream gained identical qemu_ram_free_from_ptr.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 cpu-common.h           |    1 -
 exec.c                 |   13 -------------
 hw/device-assignment.c |    2 +-
 3 files changed, 1 insertions(+), 15 deletions(-)

diff --git a/cpu-common.h b/cpu-common.h
index 1ccf2f4..b027e43 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -60,7 +60,6 @@ ram_addr_t cpu_get_physical_page_desc(target_phys_addr_t addr);
 ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
                         ram_addr_t size, void *host);
 ram_addr_t qemu_ram_alloc(DeviceState *dev, const char *name, ram_addr_t size);
-void qemu_ram_unmap(ram_addr_t addr);
 void qemu_ram_free(ram_addr_t addr);
 void qemu_ram_free_from_ptr(ram_addr_t addr);
 void qemu_ram_remap(ram_addr_t addr, ram_addr_t length);
diff --git a/exec.c b/exec.c
index c728290..68f5b33 100644
--- a/exec.c
+++ b/exec.c
@@ -2982,19 +2982,6 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
     return new_block->offset;
 }
 
-void qemu_ram_unmap(ram_addr_t addr)
-{
-    RAMBlock *block;
-
-    QLIST_FOREACH(block, &ram_list.blocks, next) {
-        if (addr == block->offset) {
-            QLIST_REMOVE(block, next);
-            qemu_free(block);
-            return;
-        }
-    }
-}
-
 ram_addr_t qemu_ram_alloc(DeviceState *dev, const char *name, ram_addr_t size)
 {
     return qemu_ram_alloc_from_ptr(dev, name, size, NULL);
diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 6a2a8c9..36ad6b0 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -836,7 +836,7 @@ static void free_assigned_device(AssignedDevice *dev)
                 if (region->r_size & 0xFFF) {
                     cpu_unregister_io_memory(region->memory_index);
                 } else {
-                    qemu_ram_unmap(region->memory_index);
+                    qemu_ram_free_from_ptr(region->memory_index);
                 }
                 if (munmap(region->u.r_virtbase,
                            (pci_region->size + 0xFFF) & 0xFFFFF000)) {
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 06/13] qemu-kvm: Drop or replace useless device-assignment.h inclusions
  2011-06-27 18:19 [PATCH 00/13] qemu-kvm: device assignment cleanups and upstream diff reductions Jan Kiszka
                   ` (4 preceding siblings ...)
  2011-06-27 18:19 ` [PATCH 05/13] qemu-kvm: Remove qemu_ram_unmap Jan Kiszka
@ 2011-06-27 18:19 ` Jan Kiszka
  2011-06-27 18:19 ` [PATCH 07/13] pci-assign: Fix kvm_deassign_irq handling in assign_irq Jan Kiszka
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 48+ messages in thread
From: Jan Kiszka @ 2011-06-27 18:19 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Michael S. Tsirkin, Alex Williamson

From: Jan Kiszka <jan.kiszka@siemens.com>

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/pc.c          |    1 -
 hw/pci-hotplug.c |    1 -
 vl.c             |    2 +-
 3 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index c4e3f26..c0a88e1 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -39,7 +39,6 @@
 #include "msix.h"
 #include "sysbus.h"
 #include "sysemu.h"
-#include "device-assignment.h"
 #include "kvm.h"
 #include "blockdev.h"
 #include "ui/qemu-spice.h"
diff --git a/hw/pci-hotplug.c b/hw/pci-hotplug.c
index 76ce539..b59be2a 100644
--- a/hw/pci-hotplug.c
+++ b/hw/pci-hotplug.c
@@ -31,7 +31,6 @@
 #include "scsi.h"
 #include "virtio-blk.h"
 #include "qemu-config.h"
-#include "device-assignment.h"
 #include "blockdev.h"
 
 #if defined(TARGET_I386)
diff --git a/vl.c b/vl.c
index dd870e6..7708254 100644
--- a/vl.c
+++ b/vl.c
@@ -145,7 +145,7 @@ int main(int argc, char **argv)
 #include "qemu-config.h"
 #include "qemu-objects.h"
 #include "qemu-options.h"
-#include "hw/device-assignment.h"
+#include "kvm.h"
 #ifdef CONFIG_VIRTFS
 #include "fsdev/qemu-fsdev.h"
 #endif
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 07/13] pci-assign: Fix kvm_deassign_irq handling in assign_irq
  2011-06-27 18:19 [PATCH 00/13] qemu-kvm: device assignment cleanups and upstream diff reductions Jan Kiszka
                   ` (5 preceding siblings ...)
  2011-06-27 18:19 ` [PATCH 06/13] qemu-kvm: Drop or replace useless device-assignment.h inclusions Jan Kiszka
@ 2011-06-27 18:19 ` Jan Kiszka
  2011-06-27 18:19 ` [PATCH 08/13] pci-assign: Update legacy interrupts only if used Jan Kiszka
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 48+ messages in thread
From: Jan Kiszka @ 2011-06-27 18:19 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Michael S. Tsirkin, Alex Williamson

From: Jan Kiszka <jan.kiszka@siemens.com>

Always clear AssignedDevice::irq_requested_type after calling
kvm_deassign_irq. Moreover, drop the obviously incorrect exclusion when
reporting related errors - if irq_requested_type is non-zero, deassign
must not fail.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/device-assignment.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 36ad6b0..36c0f5f 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -992,9 +992,10 @@ static int assign_irq(AssignedDevice *dev)
     if (dev->irq_requested_type) {
         assigned_irq_data.flags = dev->irq_requested_type;
         r = kvm_deassign_irq(kvm_state, &assigned_irq_data);
-        /* -ENXIO means no assigned irq */
-        if (r && r != -ENXIO)
+        if (r) {
             perror("assign_irq: deassign");
+        }
+        dev->irq_requested_type = 0;
     }
 
     assigned_irq_data.flags = KVM_DEV_IRQ_GUEST_INTX;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 08/13] pci-assign: Update legacy interrupts only if used
  2011-06-27 18:19 [PATCH 00/13] qemu-kvm: device assignment cleanups and upstream diff reductions Jan Kiszka
                   ` (6 preceding siblings ...)
  2011-06-27 18:19 ` [PATCH 07/13] pci-assign: Fix kvm_deassign_irq handling in assign_irq Jan Kiszka
@ 2011-06-27 18:19 ` Jan Kiszka
  2011-06-27 18:19 ` [PATCH 09/13] pci-assign: Drop libpci header dependency Jan Kiszka
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 48+ messages in thread
From: Jan Kiszka @ 2011-06-27 18:19 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Michael S. Tsirkin, Alex Williamson

From: Jan Kiszka <jan.kiszka@siemens.com>

Don't mess with assign_intx on devices that are in MSI or MSI-X mode, it
would corrupt their interrupt routing.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/device-assignment.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 36c0f5f..a88d4fc 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1062,9 +1062,12 @@ void assigned_dev_update_irqs(void)
     dev = QLIST_FIRST(&devs);
     while (dev) {
         next = QLIST_NEXT(dev, next);
-        r = assign_irq(dev);
-        if (r < 0)
-            qdev_unplug(&dev->dev.qdev);
+        if (dev->irq_requested_type & KVM_DEV_IRQ_HOST_INTX) {
+            r = assign_irq(dev);
+            if (r < 0) {
+                qdev_unplug(&dev->dev.qdev);
+            }
+        }
         dev = next;
     }
 }
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 09/13] pci-assign: Drop libpci header dependency
  2011-06-27 18:19 [PATCH 00/13] qemu-kvm: device assignment cleanups and upstream diff reductions Jan Kiszka
                   ` (7 preceding siblings ...)
  2011-06-27 18:19 ` [PATCH 08/13] pci-assign: Update legacy interrupts only if used Jan Kiszka
@ 2011-06-27 18:19 ` Jan Kiszka
  2011-06-28  8:54   ` Michael S. Tsirkin
  2011-06-27 18:19 ` [PATCH 10/13] pci-assign: Refactor calc_assigned_dev_id Jan Kiszka
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: Jan Kiszka @ 2011-06-27 18:19 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Michael S. Tsirkin, Alex Williamson

From: Jan Kiszka <jan.kiszka@siemens.com>

All constants are now available through QEMU. Also drop the upstream
diff of pci_regs.h, it cannot clash with libpci anymore.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 configure              |   21 ---------------------
 hw/device-assignment.c |   13 ++++++-------
 hw/pci_regs.h          |    7 -------
 3 files changed, 6 insertions(+), 35 deletions(-)

diff --git a/configure b/configure
index 15db2a3..3b4b894 100755
--- a/configure
+++ b/configure
@@ -1838,27 +1838,6 @@ EOF
 fi
 
 ##########################################
-##########################################
-# libpci header probe for kvm_cap_device_assignment
-if test $kvm_cap_device_assignment = "yes" ; then
-  cat > $TMPC << EOF
-#include <pci/header.h>
-#ifndef PCI_VENDOR_ID
-#error NO LIBPCI HEADER
-#endif
-int main(void) { return 0; }
-EOF
-  if compile_prog "" "" ; then
-    kvm_cap_device_assignment=yes
-  else
-    echo
-    echo "Error: libpci header check failed"
-    echo "Disable KVM Device Assignment capability."
-    echo
-    kvm_cap_device_assignment=no
-  fi
-fi
-
 # pthread probe
 PTHREADLIBS_LIST="-lpthread -lpthreadGC2"
 
diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index a88d4fc..b341fd9 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -39,7 +39,6 @@
 #include "loader.h"
 #include "monitor.h"
 #include "range.h"
-#include <pci/header.h>
 #include "sysemu.h"
 
 /* From linux/ioport.h */
@@ -1150,7 +1149,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
     pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
 
     entries_max_nr = *(uint16_t *)(pci_dev->config + pos + 2);
-    entries_max_nr &= PCI_MSIX_TABSIZE;
+    entries_max_nr &= PCI_MSIX_FLAGS_QSIZE;
     entries_max_nr += 1;
 
     /* Get the usable entry number for allocating */
@@ -1243,7 +1242,7 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
      * try to catch this by only deassigning irqs if the guest is using
      * MSIX or intends to start. */
     if ((assigned_dev->irq_requested_type & KVM_DEV_IRQ_GUEST_MSIX) ||
-        (*ctrl_word & PCI_MSIX_ENABLE)) {
+        (*ctrl_word & PCI_MSIX_FLAGS_ENABLE)) {
 
         assigned_irq_data.flags = assigned_dev->irq_requested_type;
         free_dev_irq_entries(assigned_dev);
@@ -1255,7 +1254,7 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
         assigned_dev->irq_requested_type = 0;
     }
 
-    if (*ctrl_word & PCI_MSIX_ENABLE) {
+    if (*ctrl_word & PCI_MSIX_FLAGS_ENABLE) {
         assigned_irq_data.flags = KVM_DEV_IRQ_HOST_MSIX |
                                   KVM_DEV_IRQ_GUEST_MSIX;
 
@@ -1389,15 +1388,15 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
 
         pci_set_word(pci_dev->config + pos + PCI_MSIX_FLAGS,
                      pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS) &
-                     PCI_MSIX_TABSIZE);
+                     PCI_MSIX_FLAGS_QSIZE);
 
         /* Only enable and function mask bits are writable */
         pci_set_word(pci_dev->wmask + pos + PCI_MSIX_FLAGS,
                      PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
 
         msix_table_entry = pci_get_long(pci_dev->config + pos + PCI_MSIX_TABLE);
-        bar_nr = msix_table_entry & PCI_MSIX_BIR;
-        msix_table_entry &= ~PCI_MSIX_BIR;
+        bar_nr = msix_table_entry & PCI_MSIX_FLAGS_BIRMASK;
+        msix_table_entry &= ~PCI_MSIX_FLAGS_BIRMASK;
         dev->msix_table_addr = pci_region[bar_nr].base_addr + msix_table_entry;
     }
 
diff --git a/hw/pci_regs.h b/hw/pci_regs.h
index 9836d35..e884096 100644
--- a/hw/pci_regs.h
+++ b/hw/pci_regs.h
@@ -44,16 +44,9 @@
 #define PCI_STATUS		0x06	/* 16 bits */
 #define  PCI_STATUS_INTERRUPT	0x08	/* Interrupt status */
 #define  PCI_STATUS_CAP_LIST	0x10	/* Support Capability List */
-
-#ifndef PCI_STATUS_66MHZ
 #define  PCI_STATUS_66MHZ	0x20	/* Support 66 Mhz PCI 2.1 bus */
-#endif
-
 #define  PCI_STATUS_UDF		0x40	/* Support User Definable Features [obsolete] */
-#ifndef PCI_STATUS_FAST_BACK
 #define  PCI_STATUS_FAST_BACK	0x80	/* Accept fast-back to back */
-#endif
-
 #define  PCI_STATUS_PARITY	0x100	/* Detected parity error */
 #define  PCI_STATUS_DEVSEL_MASK	0x600	/* DEVSEL timing */
 #define  PCI_STATUS_DEVSEL_FAST		0x000
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 10/13] pci-assign: Refactor calc_assigned_dev_id
  2011-06-27 18:19 [PATCH 00/13] qemu-kvm: device assignment cleanups and upstream diff reductions Jan Kiszka
                   ` (8 preceding siblings ...)
  2011-06-27 18:19 ` [PATCH 09/13] pci-assign: Drop libpci header dependency Jan Kiszka
@ 2011-06-27 18:19 ` Jan Kiszka
  2011-06-27 18:19 ` [PATCH 11/13] pci-assign: Track MSI/MSI-X capability position, clean up related code Jan Kiszka
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 48+ messages in thread
From: Jan Kiszka @ 2011-06-27 18:19 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Michael S. Tsirkin, Alex Williamson

From: Jan Kiszka <jan.kiszka@siemens.com>

Make calc_assigned_dev_id pick up all required bits from the device
passed to it.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/device-assignment.c |   25 +++++++++----------------
 hw/device-assignment.h |    6 +++---
 2 files changed, 12 insertions(+), 19 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index b341fd9..43a048f 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -860,9 +860,10 @@ static void free_assigned_device(AssignedDevice *dev)
     free_dev_irq_entries(dev);
 }
 
-static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus, uint8_t devfn)
+static uint32_t calc_assigned_dev_id(AssignedDevice *dev)
 {
-    return (uint32_t)seg << 16 | (uint32_t)bus << 8 | (uint32_t)devfn;
+    return (uint32_t)dev->h_segnr << 16 | (uint32_t)dev->h_busnr << 8 |
+           (uint32_t)dev->h_devfn;
 }
 
 static void assign_failed_examine(AssignedDevice *dev)
@@ -926,8 +927,7 @@ static int assign_device(AssignedDevice *dev)
     }
 
     memset(&assigned_dev_data, 0, sizeof(assigned_dev_data));
-    assigned_dev_data.assigned_dev_id  =
-	calc_assigned_dev_id(dev->h_segnr, dev->h_busnr, dev->h_devfn);
+    assigned_dev_data.assigned_dev_id = calc_assigned_dev_id(dev);
     assigned_dev_data.segnr = dev->h_segnr;
     assigned_dev_data.busnr = dev->h_busnr;
     assigned_dev_data.devfn = dev->h_devfn;
@@ -984,8 +984,7 @@ static int assign_irq(AssignedDevice *dev)
         return r;
 
     memset(&assigned_irq_data, 0, sizeof(assigned_irq_data));
-    assigned_irq_data.assigned_dev_id =
-        calc_assigned_dev_id(dev->h_segnr, dev->h_busnr, dev->h_devfn);
+    assigned_irq_data.assigned_dev_id = calc_assigned_dev_id(dev);
     assigned_irq_data.guest_irq = irq;
     assigned_irq_data.host_irq = dev->real_device.irq;
     if (dev->irq_requested_type) {
@@ -1024,8 +1023,7 @@ static void deassign_device(AssignedDevice *dev)
     int r;
 
     memset(&assigned_dev_data, 0, sizeof(assigned_dev_data));
-    assigned_dev_data.assigned_dev_id  =
-	calc_assigned_dev_id(dev->h_segnr, dev->h_busnr, dev->h_devfn);
+    assigned_dev_data.assigned_dev_id = calc_assigned_dev_id(dev);
 
     r = kvm_deassign_pci_device(kvm_state, &assigned_dev_data);
     if (r < 0)
@@ -1079,9 +1077,7 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev, unsigned int ctrl_pos)
     int r;
 
     memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
-    assigned_irq_data.assigned_dev_id  =
-        calc_assigned_dev_id(assigned_dev->h_segnr, assigned_dev->h_busnr,
-                (uint8_t)assigned_dev->h_devfn);
+    assigned_irq_data.assigned_dev_id = calc_assigned_dev_id(assigned_dev);
 
     /* Some guests gratuitously disable MSI even if they're not using it,
      * try to catch this by only deassigning irqs if the guest is using
@@ -1166,8 +1162,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
         fprintf(stderr, "MSI-X entry number is zero!\n");
         return -EINVAL;
     }
-    msix_nr.assigned_dev_id = calc_assigned_dev_id(adev->h_segnr, adev->h_busnr,
-                                          (uint8_t)adev->h_devfn);
+    msix_nr.assigned_dev_id = calc_assigned_dev_id(adev);
     msix_nr.entry_nr = entries_nr;
     r = kvm_assign_set_msix_nr(kvm_state, &msix_nr);
     if (r != 0) {
@@ -1234,9 +1229,7 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
     int r;
 
     memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
-    assigned_irq_data.assigned_dev_id  =
-            calc_assigned_dev_id(assigned_dev->h_segnr, assigned_dev->h_busnr,
-                    (uint8_t)assigned_dev->h_devfn);
+    assigned_irq_data.assigned_dev_id = calc_assigned_dev_id(assigned_dev);
 
     /* Some guests gratuitously disable MSIX even if they're not using it,
      * try to catch this by only deassigning irqs if the guest is using
diff --git a/hw/device-assignment.h b/hw/device-assignment.h
index ae1bc58..3d20207 100644
--- a/hw/device-assignment.h
+++ b/hw/device-assignment.h
@@ -90,9 +90,9 @@ typedef struct AssignedDevice {
     PCIDevRegions real_device;
     int run;
     int girq;
-    unsigned int h_segnr;
-    unsigned char h_busnr;
-    unsigned int h_devfn;
+    uint16_t h_segnr;
+    uint8_t h_busnr;
+    uint8_t h_devfn;
     int irq_requested_type;
     int bound;
     struct {
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 11/13] pci-assign: Track MSI/MSI-X capability position, clean up related code
  2011-06-27 18:19 [PATCH 00/13] qemu-kvm: device assignment cleanups and upstream diff reductions Jan Kiszka
                   ` (9 preceding siblings ...)
  2011-06-27 18:19 ` [PATCH 10/13] pci-assign: Refactor calc_assigned_dev_id Jan Kiszka
@ 2011-06-27 18:19 ` Jan Kiszka
  2011-06-27 18:19 ` [PATCH 12/13] pci-assign: Generic config space access management Jan Kiszka
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 48+ messages in thread
From: Jan Kiszka @ 2011-06-27 18:19 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Michael S. Tsirkin, Alex Williamson

From: Jan Kiszka <jan.kiszka@siemens.com>

Store the MSI and MSI-X capability position in the same fields the QEMU
core uses as well. Although we still open-code MSI support, this does
not cause conflicts. Instead, it allow us to drop config space searches
from assigned_device_pci_cap_write_config. Moreover, we no longer need
to pass the cap position around and can clean up
assigned_dev_update_msi/msix a bit as well.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/device-assignment.c |   37 ++++++++++++++++++++-----------------
 1 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 43a048f..6a2ca8c 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1069,11 +1069,12 @@ void assigned_dev_update_irqs(void)
     }
 }
 
-static void assigned_dev_update_msi(PCIDevice *pci_dev, unsigned int ctrl_pos)
+static void assigned_dev_update_msi(PCIDevice *pci_dev)
 {
     struct kvm_assigned_irq assigned_irq_data;
     AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
-    uint8_t ctrl_byte = pci_dev->config[ctrl_pos];
+    uint8_t ctrl_byte = pci_get_byte(pci_dev->config + pci_dev->msi_cap +
+                                     PCI_MSI_FLAGS);
     int r;
 
     memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
@@ -1096,13 +1097,13 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev, unsigned int ctrl_pos)
     }
 
     if (ctrl_byte & PCI_MSI_FLAGS_ENABLE) {
-        int pos = ctrl_pos - PCI_MSI_FLAGS;
+        uint8_t *pos = pci_dev->config + pci_dev->msi_cap;
+
         assigned_dev->entry = qemu_mallocz(sizeof(*(assigned_dev->entry)));
         assigned_dev->entry->u.msi.address_lo =
-            pci_get_long(pci_dev->config + pos + PCI_MSI_ADDRESS_LO);
+            pci_get_long(pos + PCI_MSI_ADDRESS_LO);
         assigned_dev->entry->u.msi.address_hi = 0;
-        assigned_dev->entry->u.msi.data =
-            pci_get_word(pci_dev->config + pos + PCI_MSI_DATA_32);
+        assigned_dev->entry->u.msi.data = pci_get_word(pos + PCI_MSI_DATA_32);
         assigned_dev->entry->type = KVM_IRQ_ROUTING_MSI;
         r = kvm_get_irq_route_gsi();
         if (r < 0) {
@@ -1221,11 +1222,12 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
     return r;
 }
 
-static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
+static void assigned_dev_update_msix(PCIDevice *pci_dev)
 {
     struct kvm_assigned_irq assigned_irq_data;
     AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
-    uint16_t *ctrl_word = (uint16_t *)(pci_dev->config + ctrl_pos);
+    uint16_t ctrl_word = pci_get_word(pci_dev->config + pci_dev->msix_cap +
+                                      PCI_MSIX_FLAGS);
     int r;
 
     memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
@@ -1235,7 +1237,7 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
      * try to catch this by only deassigning irqs if the guest is using
      * MSIX or intends to start. */
     if ((assigned_dev->irq_requested_type & KVM_DEV_IRQ_GUEST_MSIX) ||
-        (*ctrl_word & PCI_MSIX_FLAGS_ENABLE)) {
+        (ctrl_word & PCI_MSIX_FLAGS_ENABLE)) {
 
         assigned_irq_data.flags = assigned_dev->irq_requested_type;
         free_dev_irq_entries(assigned_dev);
@@ -1247,7 +1249,7 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
         assigned_dev->irq_requested_type = 0;
     }
 
-    if (*ctrl_word & PCI_MSIX_FLAGS_ENABLE) {
+    if (ctrl_word & PCI_MSIX_FLAGS_ENABLE) {
         assigned_irq_data.flags = KVM_DEV_IRQ_HOST_MSIX |
                                   KVM_DEV_IRQ_GUEST_MSIX;
 
@@ -1311,21 +1313,20 @@ static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
                                                  uint32_t val, int len)
 {
     uint8_t cap_id = pci_dev->config_map[address];
-    uint8_t cap;
 
     pci_default_write_config(pci_dev, address, val, len);
     switch (cap_id) {
     case PCI_CAP_ID_MSI:
-        cap = pci_find_capability(pci_dev, cap_id);
-        if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
-            assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
+        if (range_covers_byte(address, len,
+                              pci_dev->msi_cap + PCI_MSI_FLAGS)) {
+            assigned_dev_update_msi(pci_dev);
         }
         break;
 
     case PCI_CAP_ID_MSIX:
-        cap = pci_find_capability(pci_dev, cap_id);
-        if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) {
-            assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS);
+        if (range_covers_byte(address, len,
+                              pci_dev->msix_cap + PCI_MSIX_FLAGS + 1)) {
+            assigned_dev_update_msix(pci_dev);
         }
         break;
 
@@ -1356,6 +1357,7 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
         if ((ret = pci_add_capability(pci_dev, PCI_CAP_ID_MSI, pos, 10)) < 0) {
             return ret;
         }
+        pci_dev->msi_cap = pos;
 
         pci_set_word(pci_dev->config + pos + PCI_MSI_FLAGS,
                      pci_get_word(pci_dev->config + pos + PCI_MSI_FLAGS) &
@@ -1378,6 +1380,7 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
         if ((ret = pci_add_capability(pci_dev, PCI_CAP_ID_MSIX, pos, 12)) < 0) {
             return ret;
         }
+        pci_dev->msix_cap = pos;
 
         pci_set_word(pci_dev->config + pos + PCI_MSIX_FLAGS,
                      pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS) &
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 12/13] pci-assign: Generic config space access management
  2011-06-27 18:19 [PATCH 00/13] qemu-kvm: device assignment cleanups and upstream diff reductions Jan Kiszka
                   ` (10 preceding siblings ...)
  2011-06-27 18:19 ` [PATCH 11/13] pci-assign: Track MSI/MSI-X capability position, clean up related code Jan Kiszka
@ 2011-06-27 18:19 ` Jan Kiszka
  2011-06-27 20:54   ` Michael S. Tsirkin
                     ` (3 more replies)
  2011-06-27 18:19 ` [PATCH 13/13] qemu-kvm: Resolve PCI upstream diffs Jan Kiszka
  2011-06-28  8:10 ` [PATCH 00/13] qemu-kvm: device assignment cleanups and upstream diff reductions Avi Kivity
  13 siblings, 4 replies; 48+ messages in thread
From: Jan Kiszka @ 2011-06-27 18:19 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Michael S. Tsirkin, Alex Williamson

From: Jan Kiszka <jan.kiszka@siemens.com>

This drastically simplifies config space access management: Instead of
coding various range checks and merging bits, set up two access control
bitmaps. One defines, which bits can be directly read from the device,
the other allows direct write to the device, also with bit-granularity.

The setup of those bitmaps, specifically the corner cases like the
capability list status bit, is way more readable than the existing code.
And the generic config access functions just need one additional logic
to catch writes to the MSI/MSI-X control flags and call the related
update handlers.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/device-assignment.c |  325 ++++++++++++++++-------------------------------
 hw/device-assignment.h |    3 +-
 2 files changed, 113 insertions(+), 215 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 6a2ca8c..be199d2 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -63,35 +63,6 @@ static void assigned_dev_load_option_rom(AssignedDevice *dev);
 
 static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev);
 
-static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
-                                                 uint32_t address,
-                                                 uint32_t val, int len);
-
-static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
-                                                    uint32_t address, int len);
-
-/* Merge the bits set in mask from mval into val.  Both val and mval are
- * at the same addr offset, pos is the starting offset of the mask. */
-static uint32_t merge_bits(uint32_t val, uint32_t mval, uint8_t addr,
-                           int len, uint8_t pos, uint32_t mask)
-{
-    if (!ranges_overlap(addr, len, pos, 4)) {
-        return val;
-    }
-
-    if (addr >= pos) {
-        mask >>= (addr - pos) * 8;
-    } else {
-        mask <<= (pos - addr) * 8;
-    }
-    mask &= 0xffffffffU >> (4 - len) * 8;
-
-    val &= ~mask;
-    val |= (mval & mask);
-
-    return val;
-}
-
 static uint32_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
                                        uint32_t addr, int len, uint32_t *val)
 {
@@ -404,143 +375,6 @@ static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap, uint8_t start)
     return 0;
 }
 
-static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address,
-                                          uint32_t val, int len)
-{
-    int fd;
-    ssize_t ret;
-    AssignedDevice *pci_dev = DO_UPCAST(AssignedDevice, dev, d);
-
-    DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
-          ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
-          (uint16_t) address, val, len);
-
-    if (address >= PCI_CONFIG_HEADER_SIZE && d->config_map[address]) {
-        return assigned_device_pci_cap_write_config(d, address, val, len);
-    }
-
-    if (ranges_overlap(address, len, PCI_COMMAND, 2)) {
-        pci_default_write_config(d, address, val, len);
-        /* Continue to program the card */
-    }
-
-    /*
-     * Catch access to
-     *  - base address registers
-     *  - ROM base address & capability pointer
-     *  - interrupt line & pin
-     */
-    if (ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) ||
-        ranges_overlap(address, len, PCI_ROM_ADDRESS, 4)) {
-        pci_default_write_config(d, address, val, len);
-        return;
-    } else if (ranges_overlap(address, len, PCI_CAPABILITY_LIST, 1) ||
-               ranges_overlap(address, len, PCI_INTERRUPT_LINE, 2)) {
-        uint32_t real_val;
-
-        pci_default_write_config(d, address, val, len);
-
-        /* Ensure that writes to overlapping areas we don't virtualize still
-         * hit the device. */
-        real_val = assigned_dev_pci_read(d, address, len);
-        val = merge_bits(val, real_val, address, len,
-                         PCI_CAPABILITY_LIST, 0xff);
-        val = merge_bits(val, real_val, address, len,
-                         PCI_INTERRUPT_LINE, 0xffff);
-    }
-
-    DEBUG("NON BAR (%x.%x): address=%04x val=0x%08x len=%d\n",
-          ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
-          (uint16_t) address, val, len);
-
-    fd = pci_dev->real_device.config_fd;
-
-again:
-    ret = pwrite(fd, &val, len, address);
-    if (ret != len) {
-	if ((ret < 0) && (errno == EINTR || errno == EAGAIN))
-	    goto again;
-
-	fprintf(stderr, "%s: pwrite failed, ret = %zd errno = %d\n",
-		__func__, ret, errno);
-
-	exit(1);
-    }
-}
-
-static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address,
-                                             int len)
-{
-    uint32_t val = 0, virt_val;
-    int fd;
-    ssize_t ret;
-    AssignedDevice *pci_dev = DO_UPCAST(AssignedDevice, dev, d);
-
-    if (address >= PCI_CONFIG_HEADER_SIZE && d->config_map[address]) {
-        val = assigned_device_pci_cap_read_config(d, address, len);
-        DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
-              (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
-        return val;
-    }
-
-    /*
-     * Catch access to
-     *  - vendor & device ID
-     *  - base address registers
-     *  - ROM base address
-     */
-    if (ranges_overlap(address, len, PCI_VENDOR_ID, 4) ||
-        ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) ||
-        ranges_overlap(address, len, PCI_ROM_ADDRESS, 4)) {
-        val = pci_default_read_config(d, address, len);
-        DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
-              (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
-        return val;
-    }
-
-    fd = pci_dev->real_device.config_fd;
-
-again:
-    ret = pread(fd, &val, len, address);
-    if (ret != len) {
-	if ((ret < 0) && (errno == EINTR || errno == EAGAIN))
-	    goto again;
-
-	fprintf(stderr, "%s: pread failed, ret = %zd errno = %d\n",
-		__func__, ret, errno);
-
-	exit(1);
-    }
-
-    DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
-          (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
-
-    if (pci_dev->emulate_cmd_mask) {
-        val = merge_bits(val, pci_default_read_config(d, address, len),
-                         address, len, PCI_COMMAND, pci_dev->emulate_cmd_mask);
-    }
-
-    /*
-     * Merge bits from virtualized
-     *  - capability pointer
-     *  - interrupt line & pin
-     */
-    virt_val = pci_default_read_config(d, address, len);
-    val = merge_bits(val, virt_val, address, len, PCI_CAPABILITY_LIST, 0xff);
-    val = merge_bits(val, virt_val, address, len, PCI_INTERRUPT_LINE, 0xffff);
-
-    if (!pci_dev->cap.available) {
-        /* kill the special capabilities */
-        if (address == PCI_COMMAND && len == 4) {
-            val &= ~(PCI_STATUS_CAP_LIST << 16);
-        } else if (address == PCI_STATUS) {
-            val &= ~PCI_STATUS_CAP_LIST;
-        }
-    }
-
-    return val;
-}
-
 static int assigned_dev_register_regions(PCIRegion *io_regions,
                                          unsigned long regions_num,
                                          AssignedDevice *pci_dev)
@@ -790,7 +624,8 @@ again:
     /* dealing with virtual function device */
     snprintf(name, sizeof(name), "%sphysfn/", dir);
     if (!stat(name, &statbuf)) {
-        pci_dev->emulate_cmd_mask = 0xffff;
+        /* always provide the written value on readout */
+        memset(pci_dev->direct_config_read + PCI_COMMAND, 0xff, 2);
     }
 
     dev->region_number = r;
@@ -1268,73 +1103,81 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev)
     }
 }
 
-/* There can be multiple VNDR capabilities per device, we need to find the
- * one that starts closet to the given address without going over. */
-static uint8_t find_vndr_start(PCIDevice *pci_dev, uint32_t address)
+static uint32_t assigned_dev_pci_read_config(PCIDevice *pci_dev,
+                                             uint32_t address, int len)
 {
-    uint8_t cap, pos;
+    AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
+    uint32_t virt_val = pci_default_read_config(pci_dev, address, len);
+    uint32_t real_val, direct_mask;
 
-    for (cap = pos = 0;
-         (pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_VNDR, pos));
-         pos += PCI_CAP_LIST_NEXT) {
-        if (pos <= address) {
-            cap = MAX(pos, cap);
-        }
+    switch (len) {
+    case 4:
+        direct_mask =
+            pci_get_long(assigned_dev->direct_config_read + address);
+        break;
+    case 2:
+        direct_mask =
+            pci_get_word(assigned_dev->direct_config_read + address);
+        break;
+    case 1:
+        direct_mask =
+            pci_get_byte(assigned_dev->direct_config_read + address);
+        break;
+    default:
+        abort();
     }
-    return cap;
-}
-
-static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
-                                                    uint32_t address, int len)
-{
-    uint8_t cap, cap_id = pci_dev->config_map[address];
-    uint32_t val;
-
-    switch (cap_id) {
-
-    case PCI_CAP_ID_VPD:
-        cap = pci_find_capability(pci_dev, cap_id);
-        val = assigned_dev_pci_read(pci_dev, address, len);
-        return merge_bits(val, pci_get_long(pci_dev->config + address),
-                          address, len, cap + PCI_CAP_LIST_NEXT, 0xff);
-
-    case PCI_CAP_ID_VNDR:
-        cap = find_vndr_start(pci_dev, address);
-        val = assigned_dev_pci_read(pci_dev, address, len);
-        return merge_bits(val, pci_get_long(pci_dev->config + address),
-                          address, len, cap + PCI_CAP_LIST_NEXT, 0xff);
+    if (direct_mask) {
+        real_val = assigned_dev_pci_read(pci_dev, address, len);
+        return (virt_val & ~direct_mask) | (real_val & direct_mask);
+    } else {
+        return virt_val;
     }
-
-    return pci_default_read_config(pci_dev, address, len);
 }
 
-static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
-                                                 uint32_t address,
-                                                 uint32_t val, int len)
+static void assigned_dev_pci_write_config(PCIDevice *pci_dev, uint32_t address,
+                                          uint32_t val, int len)
 {
-    uint8_t cap_id = pci_dev->config_map[address];
+    AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
+    uint32_t direct_mask, full_access;
 
     pci_default_write_config(pci_dev, address, val, len);
-    switch (cap_id) {
-    case PCI_CAP_ID_MSI:
+
+    if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) {
         if (range_covers_byte(address, len,
                               pci_dev->msi_cap + PCI_MSI_FLAGS)) {
             assigned_dev_update_msi(pci_dev);
         }
-        break;
-
-    case PCI_CAP_ID_MSIX:
+    } else if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX) {
         if (range_covers_byte(address, len,
                               pci_dev->msix_cap + PCI_MSIX_FLAGS + 1)) {
             assigned_dev_update_msix(pci_dev);
         }
-        break;
+    }
 
-    case PCI_CAP_ID_VPD:
-    case PCI_CAP_ID_VNDR:
-        assigned_dev_pci_write(pci_dev, address, val, len);
+    switch (len) {
+    case 4:
+        direct_mask =
+            pci_get_long(assigned_dev->direct_config_write + address);
+        full_access = 0xffffffff;
+        break;
+    case 2:
+        direct_mask =
+            pci_get_word(assigned_dev->direct_config_write + address);
+        full_access = 0xffff;
         break;
+    case 1:
+        direct_mask =
+            pci_get_byte(assigned_dev->direct_config_write + address);
+        full_access = 0xff;
+        break;
+    default:
+        abort();
+    }
+    if (direct_mask != full_access) {
+        val &= direct_mask;
+        val |= assigned_dev_pci_read(pci_dev, address, len) & ~direct_mask;
     }
+    assigned_dev_pci_write(pci_dev, address, val, len);
 }
 
 static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
@@ -1404,6 +1247,10 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
             return ret;
         }
 
+        /* direct read except for next cap pointer */
+        memset(dev->direct_config_read + pos, 0xff, PCI_PM_SIZEOF);
+        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
+
         pmc = pci_get_word(pci_dev->config + pos + PCI_CAP_FLAGS);
         pmc &= (PCI_PM_CAP_VER_MASK | PCI_PM_CAP_DSI);
         pci_set_word(pci_dev->config + pos + PCI_CAP_FLAGS, pmc);
@@ -1438,6 +1285,10 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
             return ret;
         }
 
+        /* direct read except for next cap pointer */
+        memset(dev->direct_config_read + pos, 0xff, size);
+        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
+
         type = pci_get_word(pci_dev->config + pos + PCI_EXP_FLAGS);
         type = (type & PCI_EXP_FLAGS_TYPE) >> 8;
         if (type != PCI_EXP_TYPE_ENDPOINT &&
@@ -1513,6 +1364,10 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
             return ret;
         }
 
+        /* direct read except for next cap pointer */
+        memset(dev->direct_config_read + pos, 0xff, 8);
+        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
+
         /* Command register, clear upper bits, including extended modes */
         cmd = pci_get_word(pci_dev->config + pos + PCI_X_CMD);
         cmd &= (PCI_X_CMD_DPERR_E | PCI_X_CMD_ERO | PCI_X_CMD_MAX_READ |
@@ -1534,6 +1389,13 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
         if ((ret = pci_add_capability(pci_dev, PCI_CAP_ID_VPD, pos, 8)) < 0) {
             return ret;
         }
+
+        /* direct read except for next cap pointer */
+        memset(dev->direct_config_read + pos, 0xff, 8);
+        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
+
+        /* direct write for cap content */
+        memset(dev->direct_config_write + pos + 2, 0xff, 6);
     }
 
     /* Devices can have multiple vendor capabilities, get them all */
@@ -1545,6 +1407,21 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
                                       pos, len)) < 0) {
             return ret;
         }
+
+        /* direct read except for next cap pointer */
+        memset(dev->direct_config_read + pos, 0xff, len);
+        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
+
+        /* direct write for cap content */
+        memset(dev->direct_config_write + pos + 2, 0xff, len - 2);
+    }
+
+    /* If real and virtual capability list status bits differ, virtualize the
+     * access. */
+    if ((pci_get_word(pci_dev->config + PCI_STATUS) & PCI_STATUS_CAP_LIST) !=
+        (assigned_dev_pci_read_byte(pci_dev, PCI_STATUS) &
+         PCI_STATUS_CAP_LIST)) {
+        dev->direct_config_read[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST;
     }
 
     return 0;
@@ -1694,6 +1571,26 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
         return -1;
     }
 
+    /*
+     * Set up basic config space access control. Will be further refined during
+     * device initialization.
+     *
+     * Direct read/write access is grangted for:
+     *  - status & command registers, revision ID & class code, cacheline size,
+     *    latency timer, header type, BIST
+     *  - cardbus CIS pointer, subsystem vendor & ID
+     *  - reserved fields
+     *  - Min_Gnt & Max_Lat
+     */
+    memset(dev->direct_config_read + PCI_COMMAND, 0xff, 12);
+    memset(dev->direct_config_read + PCI_CARDBUS_CIS, 0xff, 8);
+    memset(dev->direct_config_read + PCI_CAPABILITY_LIST + 1, 0xff, 7);
+    memset(dev->direct_config_read + PCI_MIN_GNT, 0xff, 2);
+    memset(dev->direct_config_write + PCI_COMMAND, 0xff, 12);
+    memset(dev->direct_config_write + PCI_CARDBUS_CIS, 0xff, 8);
+    memset(dev->direct_config_write + PCI_CAPABILITY_LIST + 1, 0xff, 7);
+    memset(dev->direct_config_write + PCI_MIN_GNT, 0xff, 2);
+
     if (get_real_device(dev, dev->host.seg, dev->host.bus,
                         dev->host.dev, dev->host.func)) {
         error_report("pci-assign: Error: Couldn't get real device (%s)!",
diff --git a/hw/device-assignment.h b/hw/device-assignment.h
index 3d20207..ed5defb 100644
--- a/hw/device-assignment.h
+++ b/hw/device-assignment.h
@@ -104,12 +104,13 @@ typedef struct AssignedDevice {
 #define ASSIGNED_DEVICE_MSIX_MASKED (1 << 2)
         uint32_t state;
     } cap;
+    uint8_t direct_config_read[PCI_CONFIG_SPACE_SIZE];
+    uint8_t direct_config_write[PCI_CONFIG_SPACE_SIZE];
     int irq_entries_nr;
     struct kvm_irq_routing_entry *entry;
     void *msix_table_page;
     target_phys_addr_t msix_table_addr;
     int mmio_index;
-    uint32_t emulate_cmd_mask;
     char *configfd_name;
     int32_t bootindex;
     QLIST_ENTRY(AssignedDevice) next;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 13/13] qemu-kvm: Resolve PCI upstream diffs
  2011-06-27 18:19 [PATCH 00/13] qemu-kvm: device assignment cleanups and upstream diff reductions Jan Kiszka
                   ` (11 preceding siblings ...)
  2011-06-27 18:19 ` [PATCH 12/13] pci-assign: Generic config space access management Jan Kiszka
@ 2011-06-27 18:19 ` Jan Kiszka
  2011-06-28  8:58   ` Michael S. Tsirkin
  2011-06-28  8:10 ` [PATCH 00/13] qemu-kvm: device assignment cleanups and upstream diff reductions Avi Kivity
  13 siblings, 1 reply; 48+ messages in thread
From: Jan Kiszka @ 2011-06-27 18:19 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Michael S. Tsirkin, Alex Williamson

From: Jan Kiszka <jan.kiszka@siemens.com>

Device assignment no longer peeks into config_map, so we can drop all
the related changes and sync the PCI core with upstream.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/pci.c |   29 +++++++++++++++--------------
 hw/pci.h |    8 ++++++--
 2 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 5deaa04..a637506 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -793,7 +793,7 @@ static void pci_config_alloc(PCIDevice *pci_dev)
     pci_dev->cmask = qemu_mallocz(config_size);
     pci_dev->wmask = qemu_mallocz(config_size);
     pci_dev->w1cmask = qemu_mallocz(config_size);
-    pci_dev->config_map = qemu_mallocz(config_size);
+    pci_dev->used = qemu_mallocz(config_size);
 }
 
 static void pci_config_free(PCIDevice *pci_dev)
@@ -802,7 +802,7 @@ static void pci_config_free(PCIDevice *pci_dev)
     qemu_free(pci_dev->cmask);
     qemu_free(pci_dev->wmask);
     qemu_free(pci_dev->w1cmask);
-    qemu_free(pci_dev->config_map);
+    qemu_free(pci_dev->used);
 }
 
 /* -1 for devfn means auto assign */
@@ -833,8 +833,6 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
     pci_dev->irq_state = 0;
     pci_config_alloc(pci_dev);
 
-    memset(pci_dev->config_map, 0xff, PCI_CONFIG_HEADER_SIZE);
-
     pci_config_set_vendor_id(pci_dev->config, info->vendor_id);
     pci_config_set_device_id(pci_dev->config, info->device_id);
     pci_config_set_revision(pci_dev->config, info->revision);
@@ -1899,7 +1897,7 @@ static int pci_find_space(PCIDevice *pdev, uint8_t size)
     int offset = PCI_CONFIG_HEADER_SIZE;
     int i;
     for (i = PCI_CONFIG_HEADER_SIZE; i < config_size; ++i)
-        if (pdev->config_map[i])
+        if (pdev->used[i])
             offset = i + 1;
         else if (i - offset + 1 == size)
             return offset;
@@ -2080,13 +2078,13 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
         int i;
 
         for (i = offset; i < offset + size; i++) {
-            if (pdev->config_map[i]) {
+            if (pdev->used[i]) {
                 fprintf(stderr, "ERROR: %04x:%02x:%02x.%x "
                         "Attempt to add PCI capability %x at offset "
                         "%x overlaps existing capability %x at offset %x\n",
                         pci_find_domain(pdev->bus), pci_bus_num(pdev->bus),
                         PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn),
-                        cap_id, offset, pdev->config_map[i], i);
+                        cap_id, offset, pdev->used[i], i);
                 return -EINVAL;
             }
         }
@@ -2096,14 +2094,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
     config[PCI_CAP_LIST_ID] = cap_id;
     config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
     pdev->config[PCI_CAPABILITY_LIST] = offset;
-    memset(pdev->config_map + offset, cap_id, size);
+    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
+    memset(pdev->used + offset, 0xFF, size);
     /* Make capability read-only by default */
     memset(pdev->wmask + offset, 0, size);
     /* Check capability by default */
     memset(pdev->cmask + offset, 0xFF, size);
-
-    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
-
     return offset;
 }
 
@@ -2119,11 +2115,16 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size)
     memset(pdev->w1cmask + offset, 0, size);
     /* Clear cmask as device-specific registers can't be checked */
     memset(pdev->cmask + offset, 0, size);
-    memset(pdev->config_map + offset, 0, size);
+    memset(pdev->used + offset, 0, size);
 
-    if (!pdev->config[PCI_CAPABILITY_LIST]) {
+    if (!pdev->config[PCI_CAPABILITY_LIST])
         pdev->config[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST;
-    }
+}
+
+/* Reserve space for capability at a known offset (to call after load). */
+void pci_reserve_capability(PCIDevice *pdev, uint8_t offset, uint8_t size)
+{
+    memset(pdev->used + offset, 0xff, size);
 }
 
 uint8_t pci_find_capability(PCIDevice *pdev, uint8_t cap_id)
diff --git a/hw/pci.h b/hw/pci.h
index c7081bc..ed734b3 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -146,8 +146,8 @@ struct PCIDevice {
     /* Used to implement RW1C(Write 1 to Clear) bytes */
     uint8_t *w1cmask;
 
-    /* Used to allocate config space and track capabilities. */
-    uint8_t *config_map;
+    /* Used to allocate config space capabilities. */
+    uint8_t *used;
 
     /* the following fields are read only */
     PCIBus *bus;
@@ -233,14 +233,18 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
 
 void pci_del_capability(PCIDevice *pci_dev, uint8_t cap_id, uint8_t cap_size);
 
+void pci_reserve_capability(PCIDevice *pci_dev, uint8_t offset, uint8_t size);
+
 uint8_t pci_find_capability(PCIDevice *pci_dev, uint8_t cap_id);
 
+
 uint32_t pci_default_read_config(PCIDevice *d,
                                  uint32_t address, int len);
 void pci_default_write_config(PCIDevice *d,
                               uint32_t address, uint32_t val, int len);
 void pci_device_save(PCIDevice *s, QEMUFile *f);
 int pci_device_load(PCIDevice *s, QEMUFile *f);
+
 typedef void (*pci_set_irq_fn)(void *opaque, int irq_num, int level);
 typedef int (*pci_map_irq_fn)(PCIDevice *pci_dev, int irq_num);
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 12/13] pci-assign: Generic config space access management
  2011-06-27 18:19 ` [PATCH 12/13] pci-assign: Generic config space access management Jan Kiszka
@ 2011-06-27 20:54   ` Michael S. Tsirkin
  2011-06-27 22:48   ` Alex Williamson
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 48+ messages in thread
From: Michael S. Tsirkin @ 2011-06-27 20:54 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson

On Mon, Jun 27, 2011 at 08:19:55PM +0200, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
> 
> This drastically simplifies config space access management: Instead of
> coding various range checks and merging bits, set up two access control
> bitmaps. One defines, which bits can be directly read from the device,
> the other allows direct write to the device, also with bit-granularity.
> 
> The setup of those bitmaps, specifically the corner cases like the
> capability list status bit, is way more readable than the existing code.
> And the generic config access functions just need one additional logic
> to catch writes to the MSI/MSI-X control flags and call the related
> update handlers.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

Just started the review but I already like the approach very much.

> ---
>  hw/device-assignment.c |  325 ++++++++++++++++-------------------------------
>  hw/device-assignment.h |    3 +-
>  2 files changed, 113 insertions(+), 215 deletions(-)
> 
> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> index 6a2ca8c..be199d2 100644
> --- a/hw/device-assignment.c
> +++ b/hw/device-assignment.c
> @@ -63,35 +63,6 @@ static void assigned_dev_load_option_rom(AssignedDevice *dev);
>  
>  static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev);
>  
> -static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
> -                                                 uint32_t address,
> -                                                 uint32_t val, int len);
> -
> -static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
> -                                                    uint32_t address, int len);
> -
> -/* Merge the bits set in mask from mval into val.  Both val and mval are
> - * at the same addr offset, pos is the starting offset of the mask. */
> -static uint32_t merge_bits(uint32_t val, uint32_t mval, uint8_t addr,
> -                           int len, uint8_t pos, uint32_t mask)
> -{
> -    if (!ranges_overlap(addr, len, pos, 4)) {
> -        return val;
> -    }
> -
> -    if (addr >= pos) {
> -        mask >>= (addr - pos) * 8;
> -    } else {
> -        mask <<= (pos - addr) * 8;
> -    }
> -    mask &= 0xffffffffU >> (4 - len) * 8;
> -
> -    val &= ~mask;
> -    val |= (mval & mask);
> -
> -    return val;
> -}
> -
>  static uint32_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
>                                         uint32_t addr, int len, uint32_t *val)
>  {
> @@ -404,143 +375,6 @@ static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap, uint8_t start)
>      return 0;
>  }
>  
> -static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address,
> -                                          uint32_t val, int len)
> -{
> -    int fd;
> -    ssize_t ret;
> -    AssignedDevice *pci_dev = DO_UPCAST(AssignedDevice, dev, d);
> -
> -    DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
> -          ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
> -          (uint16_t) address, val, len);
> -
> -    if (address >= PCI_CONFIG_HEADER_SIZE && d->config_map[address]) {
> -        return assigned_device_pci_cap_write_config(d, address, val, len);
> -    }
> -
> -    if (ranges_overlap(address, len, PCI_COMMAND, 2)) {
> -        pci_default_write_config(d, address, val, len);
> -        /* Continue to program the card */
> -    }
> -
> -    /*
> -     * Catch access to
> -     *  - base address registers
> -     *  - ROM base address & capability pointer
> -     *  - interrupt line & pin
> -     */
> -    if (ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) ||
> -        ranges_overlap(address, len, PCI_ROM_ADDRESS, 4)) {
> -        pci_default_write_config(d, address, val, len);
> -        return;
> -    } else if (ranges_overlap(address, len, PCI_CAPABILITY_LIST, 1) ||
> -               ranges_overlap(address, len, PCI_INTERRUPT_LINE, 2)) {
> -        uint32_t real_val;
> -
> -        pci_default_write_config(d, address, val, len);
> -
> -        /* Ensure that writes to overlapping areas we don't virtualize still
> -         * hit the device. */
> -        real_val = assigned_dev_pci_read(d, address, len);
> -        val = merge_bits(val, real_val, address, len,
> -                         PCI_CAPABILITY_LIST, 0xff);
> -        val = merge_bits(val, real_val, address, len,
> -                         PCI_INTERRUPT_LINE, 0xffff);
> -    }
> -
> -    DEBUG("NON BAR (%x.%x): address=%04x val=0x%08x len=%d\n",
> -          ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
> -          (uint16_t) address, val, len);
> -
> -    fd = pci_dev->real_device.config_fd;
> -
> -again:
> -    ret = pwrite(fd, &val, len, address);
> -    if (ret != len) {
> -	if ((ret < 0) && (errno == EINTR || errno == EAGAIN))
> -	    goto again;
> -
> -	fprintf(stderr, "%s: pwrite failed, ret = %zd errno = %d\n",
> -		__func__, ret, errno);
> -
> -	exit(1);
> -    }
> -}
> -
> -static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address,
> -                                             int len)
> -{
> -    uint32_t val = 0, virt_val;
> -    int fd;
> -    ssize_t ret;
> -    AssignedDevice *pci_dev = DO_UPCAST(AssignedDevice, dev, d);
> -
> -    if (address >= PCI_CONFIG_HEADER_SIZE && d->config_map[address]) {
> -        val = assigned_device_pci_cap_read_config(d, address, len);
> -        DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
> -              (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
> -        return val;
> -    }
> -
> -    /*
> -     * Catch access to
> -     *  - vendor & device ID
> -     *  - base address registers
> -     *  - ROM base address
> -     */
> -    if (ranges_overlap(address, len, PCI_VENDOR_ID, 4) ||
> -        ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) ||
> -        ranges_overlap(address, len, PCI_ROM_ADDRESS, 4)) {
> -        val = pci_default_read_config(d, address, len);
> -        DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
> -              (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
> -        return val;
> -    }
> -
> -    fd = pci_dev->real_device.config_fd;
> -
> -again:
> -    ret = pread(fd, &val, len, address);
> -    if (ret != len) {
> -	if ((ret < 0) && (errno == EINTR || errno == EAGAIN))
> -	    goto again;
> -
> -	fprintf(stderr, "%s: pread failed, ret = %zd errno = %d\n",
> -		__func__, ret, errno);
> -
> -	exit(1);
> -    }
> -
> -    DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
> -          (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
> -
> -    if (pci_dev->emulate_cmd_mask) {
> -        val = merge_bits(val, pci_default_read_config(d, address, len),
> -                         address, len, PCI_COMMAND, pci_dev->emulate_cmd_mask);
> -    }
> -
> -    /*
> -     * Merge bits from virtualized
> -     *  - capability pointer
> -     *  - interrupt line & pin
> -     */
> -    virt_val = pci_default_read_config(d, address, len);
> -    val = merge_bits(val, virt_val, address, len, PCI_CAPABILITY_LIST, 0xff);
> -    val = merge_bits(val, virt_val, address, len, PCI_INTERRUPT_LINE, 0xffff);
> -
> -    if (!pci_dev->cap.available) {
> -        /* kill the special capabilities */
> -        if (address == PCI_COMMAND && len == 4) {
> -            val &= ~(PCI_STATUS_CAP_LIST << 16);
> -        } else if (address == PCI_STATUS) {
> -            val &= ~PCI_STATUS_CAP_LIST;
> -        }
> -    }
> -
> -    return val;
> -}
> -
>  static int assigned_dev_register_regions(PCIRegion *io_regions,
>                                           unsigned long regions_num,
>                                           AssignedDevice *pci_dev)
> @@ -790,7 +624,8 @@ again:
>      /* dealing with virtual function device */
>      snprintf(name, sizeof(name), "%sphysfn/", dir);
>      if (!stat(name, &statbuf)) {
> -        pci_dev->emulate_cmd_mask = 0xffff;
> +        /* always provide the written value on readout */
> +        memset(pci_dev->direct_config_read + PCI_COMMAND, 0xff, 2);
>      }
>  
>      dev->region_number = r;
> @@ -1268,73 +1103,81 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev)
>      }
>  }
>  
> -/* There can be multiple VNDR capabilities per device, we need to find the
> - * one that starts closet to the given address without going over. */
> -static uint8_t find_vndr_start(PCIDevice *pci_dev, uint32_t address)
> +static uint32_t assigned_dev_pci_read_config(PCIDevice *pci_dev,
> +                                             uint32_t address, int len)
>  {
> -    uint8_t cap, pos;
> +    AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
> +    uint32_t virt_val = pci_default_read_config(pci_dev, address, len);
> +    uint32_t real_val, direct_mask;
>  
> -    for (cap = pos = 0;
> -         (pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_VNDR, pos));
> -         pos += PCI_CAP_LIST_NEXT) {
> -        if (pos <= address) {
> -            cap = MAX(pos, cap);
> -        }
> +    switch (len) {
> +    case 4:
> +        direct_mask =
> +            pci_get_long(assigned_dev->direct_config_read + address);
> +        break;
> +    case 2:
> +        direct_mask =
> +            pci_get_word(assigned_dev->direct_config_read + address);
> +        break;
> +    case 1:
> +        direct_mask =
> +            pci_get_byte(assigned_dev->direct_config_read + address);
> +        break;
> +    default:
> +        abort();
>      }
> -    return cap;
> -}
> -
> -static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
> -                                                    uint32_t address, int len)
> -{
> -    uint8_t cap, cap_id = pci_dev->config_map[address];
> -    uint32_t val;
> -
> -    switch (cap_id) {
> -
> -    case PCI_CAP_ID_VPD:
> -        cap = pci_find_capability(pci_dev, cap_id);
> -        val = assigned_dev_pci_read(pci_dev, address, len);
> -        return merge_bits(val, pci_get_long(pci_dev->config + address),
> -                          address, len, cap + PCI_CAP_LIST_NEXT, 0xff);
> -
> -    case PCI_CAP_ID_VNDR:
> -        cap = find_vndr_start(pci_dev, address);
> -        val = assigned_dev_pci_read(pci_dev, address, len);
> -        return merge_bits(val, pci_get_long(pci_dev->config + address),
> -                          address, len, cap + PCI_CAP_LIST_NEXT, 0xff);
> +    if (direct_mask) {
> +        real_val = assigned_dev_pci_read(pci_dev, address, len);
> +        return (virt_val & ~direct_mask) | (real_val & direct_mask);
> +    } else {
> +        return virt_val;
>      }
> -
> -    return pci_default_read_config(pci_dev, address, len);
>  }
>  
> -static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
> -                                                 uint32_t address,
> -                                                 uint32_t val, int len)
> +static void assigned_dev_pci_write_config(PCIDevice *pci_dev, uint32_t address,
> +                                          uint32_t val, int len)
>  {
> -    uint8_t cap_id = pci_dev->config_map[address];
> +    AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
> +    uint32_t direct_mask, full_access;
>  
>      pci_default_write_config(pci_dev, address, val, len);
> -    switch (cap_id) {
> -    case PCI_CAP_ID_MSI:
> +
> +    if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) {
>          if (range_covers_byte(address, len,
>                                pci_dev->msi_cap + PCI_MSI_FLAGS)) {
>              assigned_dev_update_msi(pci_dev);
>          }
> -        break;
> -
> -    case PCI_CAP_ID_MSIX:
> +    } else if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX) {
>          if (range_covers_byte(address, len,
>                                pci_dev->msix_cap + PCI_MSIX_FLAGS + 1)) {
>              assigned_dev_update_msix(pci_dev);
>          }
> -        break;
> +    }
>  
> -    case PCI_CAP_ID_VPD:
> -    case PCI_CAP_ID_VNDR:
> -        assigned_dev_pci_write(pci_dev, address, val, len);
> +    switch (len) {
> +    case 4:
> +        direct_mask =
> +            pci_get_long(assigned_dev->direct_config_write + address);
> +        full_access = 0xffffffff;
> +        break;
> +    case 2:
> +        direct_mask =
> +            pci_get_word(assigned_dev->direct_config_write + address);
> +        full_access = 0xffff;
>          break;
> +    case 1:
> +        direct_mask =
> +            pci_get_byte(assigned_dev->direct_config_write + address);
> +        full_access = 0xff;
> +        break;
> +    default:
> +        abort();
> +    }
> +    if (direct_mask != full_access) {
> +        val &= direct_mask;
> +        val |= assigned_dev_pci_read(pci_dev, address, len) & ~direct_mask;
>      }
> +    assigned_dev_pci_write(pci_dev, address, val, len);
>  }
>  
>  static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
> @@ -1404,6 +1247,10 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>              return ret;
>          }
>  
> +        /* direct read except for next cap pointer */
> +        memset(dev->direct_config_read + pos, 0xff, PCI_PM_SIZEOF);
> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
> +
>          pmc = pci_get_word(pci_dev->config + pos + PCI_CAP_FLAGS);
>          pmc &= (PCI_PM_CAP_VER_MASK | PCI_PM_CAP_DSI);
>          pci_set_word(pci_dev->config + pos + PCI_CAP_FLAGS, pmc);
> @@ -1438,6 +1285,10 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>              return ret;
>          }
>  
> +        /* direct read except for next cap pointer */
> +        memset(dev->direct_config_read + pos, 0xff, size);
> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
> +
>          type = pci_get_word(pci_dev->config + pos + PCI_EXP_FLAGS);
>          type = (type & PCI_EXP_FLAGS_TYPE) >> 8;
>          if (type != PCI_EXP_TYPE_ENDPOINT &&
> @@ -1513,6 +1364,10 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>              return ret;
>          }
>  
> +        /* direct read except for next cap pointer */
> +        memset(dev->direct_config_read + pos, 0xff, 8);
> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
> +
>          /* Command register, clear upper bits, including extended modes */
>          cmd = pci_get_word(pci_dev->config + pos + PCI_X_CMD);
>          cmd &= (PCI_X_CMD_DPERR_E | PCI_X_CMD_ERO | PCI_X_CMD_MAX_READ |
> @@ -1534,6 +1389,13 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>          if ((ret = pci_add_capability(pci_dev, PCI_CAP_ID_VPD, pos, 8)) < 0) {
>              return ret;
>          }
> +
> +        /* direct read except for next cap pointer */
> +        memset(dev->direct_config_read + pos, 0xff, 8);
> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
> +
> +        /* direct write for cap content */
> +        memset(dev->direct_config_write + pos + 2, 0xff, 6);
>      }
>  
>      /* Devices can have multiple vendor capabilities, get them all */
> @@ -1545,6 +1407,21 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>                                        pos, len)) < 0) {
>              return ret;
>          }
> +
> +        /* direct read except for next cap pointer */
> +        memset(dev->direct_config_read + pos, 0xff, len);
> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
> +
> +        /* direct write for cap content */
> +        memset(dev->direct_config_write + pos + 2, 0xff, len - 2);
> +    }
> +
> +    /* If real and virtual capability list status bits differ, virtualize the
> +     * access. */
> +    if ((pci_get_word(pci_dev->config + PCI_STATUS) & PCI_STATUS_CAP_LIST) !=
> +        (assigned_dev_pci_read_byte(pci_dev, PCI_STATUS) &
> +         PCI_STATUS_CAP_LIST)) {
> +        dev->direct_config_read[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST;
>      }
>  
>      return 0;
> @@ -1694,6 +1571,26 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
>          return -1;
>      }
>  
> +    /*
> +     * Set up basic config space access control. Will be further refined during
> +     * device initialization.
> +     *
> +     * Direct read/write access is grangted for:
> +     *  - status & command registers, revision ID & class code, cacheline size,
> +     *    latency timer, header type, BIST
> +     *  - cardbus CIS pointer, subsystem vendor & ID
> +     *  - reserved fields
> +     *  - Min_Gnt & Max_Lat
> +     */
> +    memset(dev->direct_config_read + PCI_COMMAND, 0xff, 12);
> +    memset(dev->direct_config_read + PCI_CARDBUS_CIS, 0xff, 8);
> +    memset(dev->direct_config_read + PCI_CAPABILITY_LIST + 1, 0xff, 7);
> +    memset(dev->direct_config_read + PCI_MIN_GNT, 0xff, 2);
> +    memset(dev->direct_config_write + PCI_COMMAND, 0xff, 12);
> +    memset(dev->direct_config_write + PCI_CARDBUS_CIS, 0xff, 8);
> +    memset(dev->direct_config_write + PCI_CAPABILITY_LIST + 1, 0xff, 7);
> +    memset(dev->direct_config_write + PCI_MIN_GNT, 0xff, 2);
> +
>      if (get_real_device(dev, dev->host.seg, dev->host.bus,
>                          dev->host.dev, dev->host.func)) {
>          error_report("pci-assign: Error: Couldn't get real device (%s)!",
> diff --git a/hw/device-assignment.h b/hw/device-assignment.h
> index 3d20207..ed5defb 100644
> --- a/hw/device-assignment.h
> +++ b/hw/device-assignment.h
> @@ -104,12 +104,13 @@ typedef struct AssignedDevice {
>  #define ASSIGNED_DEVICE_MSIX_MASKED (1 << 2)
>          uint32_t state;
>      } cap;
> +    uint8_t direct_config_read[PCI_CONFIG_SPACE_SIZE];
> +    uint8_t direct_config_write[PCI_CONFIG_SPACE_SIZE];
>      int irq_entries_nr;
>      struct kvm_irq_routing_entry *entry;
>      void *msix_table_page;
>      target_phys_addr_t msix_table_addr;
>      int mmio_index;
> -    uint32_t emulate_cmd_mask;
>      char *configfd_name;
>      int32_t bootindex;
>      QLIST_ENTRY(AssignedDevice) next;
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 12/13] pci-assign: Generic config space access management
  2011-06-27 18:19 ` [PATCH 12/13] pci-assign: Generic config space access management Jan Kiszka
  2011-06-27 20:54   ` Michael S. Tsirkin
@ 2011-06-27 22:48   ` Alex Williamson
  2011-06-28  7:08     ` Jan Kiszka
  2011-06-28  8:07     ` Avi Kivity
  2011-06-28  8:10   ` Michael S. Tsirkin
  2011-06-28  8:51   ` Michael S. Tsirkin
  3 siblings, 2 replies; 48+ messages in thread
From: Alex Williamson @ 2011-06-27 22:48 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Michael S. Tsirkin

On Mon, 2011-06-27 at 20:19 +0200, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
> 
> This drastically simplifies config space access management: Instead of
> coding various range checks and merging bits, set up two access control
> bitmaps. One defines, which bits can be directly read from the device,
> the other allows direct write to the device, also with bit-granularity.
> 
> The setup of those bitmaps, specifically the corner cases like the
> capability list status bit, is way more readable than the existing code.
> And the generic config access functions just need one additional logic
> to catch writes to the MSI/MSI-X control flags and call the related
> update handlers.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  hw/device-assignment.c |  325 ++++++++++++++++-------------------------------
>  hw/device-assignment.h |    3 +-
>  2 files changed, 113 insertions(+), 215 deletions(-)
> 
> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> index 6a2ca8c..be199d2 100644
> --- a/hw/device-assignment.c
> +++ b/hw/device-assignment.c
> @@ -63,35 +63,6 @@ static void assigned_dev_load_option_rom(AssignedDevice *dev);
>  
>  static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev);
>  
> -static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
> -                                                 uint32_t address,
> -                                                 uint32_t val, int len);
> -
> -static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
> -                                                    uint32_t address, int len);
> -
> -/* Merge the bits set in mask from mval into val.  Both val and mval are
> - * at the same addr offset, pos is the starting offset of the mask. */
> -static uint32_t merge_bits(uint32_t val, uint32_t mval, uint8_t addr,
> -                           int len, uint8_t pos, uint32_t mask)
> -{
> -    if (!ranges_overlap(addr, len, pos, 4)) {
> -        return val;
> -    }
> -
> -    if (addr >= pos) {
> -        mask >>= (addr - pos) * 8;
> -    } else {
> -        mask <<= (pos - addr) * 8;
> -    }
> -    mask &= 0xffffffffU >> (4 - len) * 8;
> -
> -    val &= ~mask;
> -    val |= (mval & mask);
> -
> -    return val;
> -}
> -
>  static uint32_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
>                                         uint32_t addr, int len, uint32_t *val)
>  {
> @@ -404,143 +375,6 @@ static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap, uint8_t start)
>      return 0;
>  }
>  
> -static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address,
> -                                          uint32_t val, int len)
> -{
> -    int fd;
> -    ssize_t ret;
> -    AssignedDevice *pci_dev = DO_UPCAST(AssignedDevice, dev, d);
> -
> -    DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
> -          ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
> -          (uint16_t) address, val, len);
> -
> -    if (address >= PCI_CONFIG_HEADER_SIZE && d->config_map[address]) {
> -        return assigned_device_pci_cap_write_config(d, address, val, len);
> -    }
> -
> -    if (ranges_overlap(address, len, PCI_COMMAND, 2)) {
> -        pci_default_write_config(d, address, val, len);
> -        /* Continue to program the card */
> -    }
> -
> -    /*
> -     * Catch access to
> -     *  - base address registers
> -     *  - ROM base address & capability pointer
> -     *  - interrupt line & pin
> -     */
> -    if (ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) ||
> -        ranges_overlap(address, len, PCI_ROM_ADDRESS, 4)) {
> -        pci_default_write_config(d, address, val, len);
> -        return;
> -    } else if (ranges_overlap(address, len, PCI_CAPABILITY_LIST, 1) ||
> -               ranges_overlap(address, len, PCI_INTERRUPT_LINE, 2)) {
> -        uint32_t real_val;
> -
> -        pci_default_write_config(d, address, val, len);
> -
> -        /* Ensure that writes to overlapping areas we don't virtualize still
> -         * hit the device. */
> -        real_val = assigned_dev_pci_read(d, address, len);
> -        val = merge_bits(val, real_val, address, len,
> -                         PCI_CAPABILITY_LIST, 0xff);
> -        val = merge_bits(val, real_val, address, len,
> -                         PCI_INTERRUPT_LINE, 0xffff);
> -    }
> -
> -    DEBUG("NON BAR (%x.%x): address=%04x val=0x%08x len=%d\n",
> -          ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
> -          (uint16_t) address, val, len);
> -
> -    fd = pci_dev->real_device.config_fd;
> -
> -again:
> -    ret = pwrite(fd, &val, len, address);
> -    if (ret != len) {
> -	if ((ret < 0) && (errno == EINTR || errno == EAGAIN))
> -	    goto again;
> -
> -	fprintf(stderr, "%s: pwrite failed, ret = %zd errno = %d\n",
> -		__func__, ret, errno);
> -
> -	exit(1);
> -    }
> -}
> -
> -static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address,
> -                                             int len)
> -{
> -    uint32_t val = 0, virt_val;
> -    int fd;
> -    ssize_t ret;
> -    AssignedDevice *pci_dev = DO_UPCAST(AssignedDevice, dev, d);
> -
> -    if (address >= PCI_CONFIG_HEADER_SIZE && d->config_map[address]) {
> -        val = assigned_device_pci_cap_read_config(d, address, len);
> -        DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
> -              (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
> -        return val;
> -    }
> -
> -    /*
> -     * Catch access to
> -     *  - vendor & device ID
> -     *  - base address registers
> -     *  - ROM base address
> -     */
> -    if (ranges_overlap(address, len, PCI_VENDOR_ID, 4) ||
> -        ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) ||
> -        ranges_overlap(address, len, PCI_ROM_ADDRESS, 4)) {
> -        val = pci_default_read_config(d, address, len);
> -        DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
> -              (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
> -        return val;
> -    }
> -
> -    fd = pci_dev->real_device.config_fd;
> -
> -again:
> -    ret = pread(fd, &val, len, address);
> -    if (ret != len) {
> -	if ((ret < 0) && (errno == EINTR || errno == EAGAIN))
> -	    goto again;
> -
> -	fprintf(stderr, "%s: pread failed, ret = %zd errno = %d\n",
> -		__func__, ret, errno);
> -
> -	exit(1);
> -    }
> -
> -    DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
> -          (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
> -
> -    if (pci_dev->emulate_cmd_mask) {
> -        val = merge_bits(val, pci_default_read_config(d, address, len),
> -                         address, len, PCI_COMMAND, pci_dev->emulate_cmd_mask);
> -    }
> -
> -    /*
> -     * Merge bits from virtualized
> -     *  - capability pointer
> -     *  - interrupt line & pin
> -     */
> -    virt_val = pci_default_read_config(d, address, len);
> -    val = merge_bits(val, virt_val, address, len, PCI_CAPABILITY_LIST, 0xff);
> -    val = merge_bits(val, virt_val, address, len, PCI_INTERRUPT_LINE, 0xffff);
> -
> -    if (!pci_dev->cap.available) {
> -        /* kill the special capabilities */
> -        if (address == PCI_COMMAND && len == 4) {
> -            val &= ~(PCI_STATUS_CAP_LIST << 16);
> -        } else if (address == PCI_STATUS) {
> -            val &= ~PCI_STATUS_CAP_LIST;
> -        }
> -    }
> -
> -    return val;
> -}
> -
>  static int assigned_dev_register_regions(PCIRegion *io_regions,
>                                           unsigned long regions_num,
>                                           AssignedDevice *pci_dev)
> @@ -790,7 +624,8 @@ again:
>      /* dealing with virtual function device */
>      snprintf(name, sizeof(name), "%sphysfn/", dir);
>      if (!stat(name, &statbuf)) {
> -        pci_dev->emulate_cmd_mask = 0xffff;
> +        /* always provide the written value on readout */
> +        memset(pci_dev->direct_config_read + PCI_COMMAND, 0xff, 2);
>      }
>  
>      dev->region_number = r;
> @@ -1268,73 +1103,81 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev)
>      }
>  }
>  
> -/* There can be multiple VNDR capabilities per device, we need to find the
> - * one that starts closet to the given address without going over. */
> -static uint8_t find_vndr_start(PCIDevice *pci_dev, uint32_t address)
> +static uint32_t assigned_dev_pci_read_config(PCIDevice *pci_dev,
> +                                             uint32_t address, int len)
>  {
> -    uint8_t cap, pos;
> +    AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
> +    uint32_t virt_val = pci_default_read_config(pci_dev, address, len);
> +    uint32_t real_val, direct_mask;
>  
> -    for (cap = pos = 0;
> -         (pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_VNDR, pos));
> -         pos += PCI_CAP_LIST_NEXT) {
> -        if (pos <= address) {
> -            cap = MAX(pos, cap);
> -        }
> +    switch (len) {
> +    case 4:
> +        direct_mask =
> +            pci_get_long(assigned_dev->direct_config_read + address);
> +        break;
> +    case 2:
> +        direct_mask =
> +            pci_get_word(assigned_dev->direct_config_read + address);
> +        break;
> +    case 1:
> +        direct_mask =
> +            pci_get_byte(assigned_dev->direct_config_read + address);
> +        break;
> +    default:
> +        abort();
>      }
> -    return cap;
> -}
> -
> -static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
> -                                                    uint32_t address, int len)
> -{
> -    uint8_t cap, cap_id = pci_dev->config_map[address];
> -    uint32_t val;
> -
> -    switch (cap_id) {
> -
> -    case PCI_CAP_ID_VPD:
> -        cap = pci_find_capability(pci_dev, cap_id);
> -        val = assigned_dev_pci_read(pci_dev, address, len);
> -        return merge_bits(val, pci_get_long(pci_dev->config + address),
> -                          address, len, cap + PCI_CAP_LIST_NEXT, 0xff);
> -
> -    case PCI_CAP_ID_VNDR:
> -        cap = find_vndr_start(pci_dev, address);
> -        val = assigned_dev_pci_read(pci_dev, address, len);
> -        return merge_bits(val, pci_get_long(pci_dev->config + address),
> -                          address, len, cap + PCI_CAP_LIST_NEXT, 0xff);
> +    if (direct_mask) {
> +        real_val = assigned_dev_pci_read(pci_dev, address, len);
> +        return (virt_val & ~direct_mask) | (real_val & direct_mask);
> +    } else {
> +        return virt_val;
>      }
> -
> -    return pci_default_read_config(pci_dev, address, len);
>  }
>  
> -static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
> -                                                 uint32_t address,
> -                                                 uint32_t val, int len)
> +static void assigned_dev_pci_write_config(PCIDevice *pci_dev, uint32_t address,
> +                                          uint32_t val, int len)
>  {
> -    uint8_t cap_id = pci_dev->config_map[address];
> +    AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
> +    uint32_t direct_mask, full_access;
>  
>      pci_default_write_config(pci_dev, address, val, len);
> -    switch (cap_id) {
> -    case PCI_CAP_ID_MSI:
> +
> +    if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) {
>          if (range_covers_byte(address, len,
>                                pci_dev->msi_cap + PCI_MSI_FLAGS)) {
>              assigned_dev_update_msi(pci_dev);
>          }
> -        break;
> -
> -    case PCI_CAP_ID_MSIX:
> +    } else if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX) {
>          if (range_covers_byte(address, len,
>                                pci_dev->msix_cap + PCI_MSIX_FLAGS + 1)) {
>              assigned_dev_update_msix(pci_dev);
>          }
> -        break;
> +    }
>  
> -    case PCI_CAP_ID_VPD:
> -    case PCI_CAP_ID_VNDR:
> -        assigned_dev_pci_write(pci_dev, address, val, len);
> +    switch (len) {
> +    case 4:
> +        direct_mask =
> +            pci_get_long(assigned_dev->direct_config_write + address);
> +        full_access = 0xffffffff;
> +        break;
> +    case 2:
> +        direct_mask =
> +            pci_get_word(assigned_dev->direct_config_write + address);
> +        full_access = 0xffff;
>          break;
> +    case 1:
> +        direct_mask =
> +            pci_get_byte(assigned_dev->direct_config_write + address);
> +        full_access = 0xff;
> +        break;
> +    default:
> +        abort();
> +    }

Shouldn't there be a if (!direct_mask) return; here?

> +    if (direct_mask != full_access) {
> +        val &= direct_mask;
> +        val |= assigned_dev_pci_read(pci_dev, address, len) & ~direct_mask;
>      }
> +    assigned_dev_pci_write(pci_dev, address, val, len);
>  }
>  
>  static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
> @@ -1404,6 +1247,10 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>              return ret;
>          }
>  
> +        /* direct read except for next cap pointer */
> +        memset(dev->direct_config_read + pos, 0xff, PCI_PM_SIZEOF);
> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
> +
>          pmc = pci_get_word(pci_dev->config + pos + PCI_CAP_FLAGS);
>          pmc &= (PCI_PM_CAP_VER_MASK | PCI_PM_CAP_DSI);
>          pci_set_word(pci_dev->config + pos + PCI_CAP_FLAGS, pmc);
> @@ -1438,6 +1285,10 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>              return ret;
>          }
>  
> +        /* direct read except for next cap pointer */
> +        memset(dev->direct_config_read + pos, 0xff, size);
> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
> +
>          type = pci_get_word(pci_dev->config + pos + PCI_EXP_FLAGS);
>          type = (type & PCI_EXP_FLAGS_TYPE) >> 8;
>          if (type != PCI_EXP_TYPE_ENDPOINT &&
> @@ -1513,6 +1364,10 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>              return ret;
>          }
>  
> +        /* direct read except for next cap pointer */
> +        memset(dev->direct_config_read + pos, 0xff, 8);
> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
> +
>          /* Command register, clear upper bits, including extended modes */
>          cmd = pci_get_word(pci_dev->config + pos + PCI_X_CMD);
>          cmd &= (PCI_X_CMD_DPERR_E | PCI_X_CMD_ERO | PCI_X_CMD_MAX_READ |
> @@ -1534,6 +1389,13 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>          if ((ret = pci_add_capability(pci_dev, PCI_CAP_ID_VPD, pos, 8)) < 0) {
>              return ret;
>          }
> +
> +        /* direct read except for next cap pointer */
> +        memset(dev->direct_config_read + pos, 0xff, 8);
> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
> +
> +        /* direct write for cap content */
> +        memset(dev->direct_config_write + pos + 2, 0xff, 6);
>      }
>  
>      /* Devices can have multiple vendor capabilities, get them all */
> @@ -1545,6 +1407,21 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>                                        pos, len)) < 0) {
>              return ret;
>          }
> +
> +        /* direct read except for next cap pointer */
> +        memset(dev->direct_config_read + pos, 0xff, len);
> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
> +
> +        /* direct write for cap content */
> +        memset(dev->direct_config_write + pos + 2, 0xff, len - 2);
> +    }
> +
> +    /* If real and virtual capability list status bits differ, virtualize the
> +     * access. */
> +    if ((pci_get_word(pci_dev->config + PCI_STATUS) & PCI_STATUS_CAP_LIST) !=
> +        (assigned_dev_pci_read_byte(pci_dev, PCI_STATUS) &
> +         PCI_STATUS_CAP_LIST)) {
> +        dev->direct_config_read[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST;
>      }
>  
>      return 0;
> @@ -1694,6 +1571,26 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
>          return -1;
>      }
>  
> +    /*
> +     * Set up basic config space access control. Will be further refined during
> +     * device initialization.
> +     *
> +     * Direct read/write access is grangted for:
> +     *  - status & command registers, revision ID & class code, cacheline size,
> +     *    latency timer, header type, BIST
> +     *  - cardbus CIS pointer, subsystem vendor & ID
> +     *  - reserved fields
> +     *  - Min_Gnt & Max_Lat
> +     */
> +    memset(dev->direct_config_read + PCI_COMMAND, 0xff, 12);

This is more efficient, but maybe we should memset each field for
grep-ability.

Nice change overall.  Thanks,

Alex

> +    memset(dev->direct_config_read + PCI_CARDBUS_CIS, 0xff, 8);
> +    memset(dev->direct_config_read + PCI_CAPABILITY_LIST + 1, 0xff, 7);
> +    memset(dev->direct_config_read + PCI_MIN_GNT, 0xff, 2);
> +    memset(dev->direct_config_write + PCI_COMMAND, 0xff, 12);
> +    memset(dev->direct_config_write + PCI_CARDBUS_CIS, 0xff, 8);
> +    memset(dev->direct_config_write + PCI_CAPABILITY_LIST + 1, 0xff, 7);
> +    memset(dev->direct_config_write + PCI_MIN_GNT, 0xff, 2);
> +
>      if (get_real_device(dev, dev->host.seg, dev->host.bus,
>                          dev->host.dev, dev->host.func)) {
>          error_report("pci-assign: Error: Couldn't get real device (%s)!",
> diff --git a/hw/device-assignment.h b/hw/device-assignment.h
> index 3d20207..ed5defb 100644
> --- a/hw/device-assignment.h
> +++ b/hw/device-assignment.h
> @@ -104,12 +104,13 @@ typedef struct AssignedDevice {
>  #define ASSIGNED_DEVICE_MSIX_MASKED (1 << 2)
>          uint32_t state;
>      } cap;
> +    uint8_t direct_config_read[PCI_CONFIG_SPACE_SIZE];
> +    uint8_t direct_config_write[PCI_CONFIG_SPACE_SIZE];
>      int irq_entries_nr;
>      struct kvm_irq_routing_entry *entry;
>      void *msix_table_page;
>      target_phys_addr_t msix_table_addr;
>      int mmio_index;
> -    uint32_t emulate_cmd_mask;
>      char *configfd_name;
>      int32_t bootindex;
>      QLIST_ENTRY(AssignedDevice) next;




^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 12/13] pci-assign: Generic config space access management
  2011-06-27 22:48   ` Alex Williamson
@ 2011-06-28  7:08     ` Jan Kiszka
  2011-06-28  8:07     ` Avi Kivity
  1 sibling, 0 replies; 48+ messages in thread
From: Jan Kiszka @ 2011-06-28  7:08 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Michael S. Tsirkin

[-- Attachment #1: Type: text/plain, Size: 5165 bytes --]

On 2011-06-28 00:48, Alex Williamson wrote:
>> -    case PCI_CAP_ID_VPD:
>> -    case PCI_CAP_ID_VNDR:
>> -        assigned_dev_pci_write(pci_dev, address, val, len);
>> +    switch (len) {
>> +    case 4:
>> +        direct_mask =
>> +            pci_get_long(assigned_dev->direct_config_write + address);
>> +        full_access = 0xffffffff;
>> +        break;
>> +    case 2:
>> +        direct_mask =
>> +            pci_get_word(assigned_dev->direct_config_write + address);
>> +        full_access = 0xffff;
>>          break;
>> +    case 1:
>> +        direct_mask =
>> +            pci_get_byte(assigned_dev->direct_config_write + address);
>> +        full_access = 0xff;
>> +        break;
>> +    default:
>> +        abort();
>> +    }
> 
> Shouldn't there be a if (!direct_mask) return; here?

Yep.

> 
>> +    if (direct_mask != full_access) {
>> +        val &= direct_mask;
>> +        val |= assigned_dev_pci_read(pci_dev, address, len) & ~direct_mask;
>>      }
>> +    assigned_dev_pci_write(pci_dev, address, val, len);
>>  }
>>  
>>  static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>> @@ -1404,6 +1247,10 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>>              return ret;
>>          }
>>  
>> +        /* direct read except for next cap pointer */
>> +        memset(dev->direct_config_read + pos, 0xff, PCI_PM_SIZEOF);
>> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
>> +
>>          pmc = pci_get_word(pci_dev->config + pos + PCI_CAP_FLAGS);
>>          pmc &= (PCI_PM_CAP_VER_MASK | PCI_PM_CAP_DSI);
>>          pci_set_word(pci_dev->config + pos + PCI_CAP_FLAGS, pmc);
>> @@ -1438,6 +1285,10 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>>              return ret;
>>          }
>>  
>> +        /* direct read except for next cap pointer */
>> +        memset(dev->direct_config_read + pos, 0xff, size);
>> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
>> +
>>          type = pci_get_word(pci_dev->config + pos + PCI_EXP_FLAGS);
>>          type = (type & PCI_EXP_FLAGS_TYPE) >> 8;
>>          if (type != PCI_EXP_TYPE_ENDPOINT &&
>> @@ -1513,6 +1364,10 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>>              return ret;
>>          }
>>  
>> +        /* direct read except for next cap pointer */
>> +        memset(dev->direct_config_read + pos, 0xff, 8);
>> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
>> +
>>          /* Command register, clear upper bits, including extended modes */
>>          cmd = pci_get_word(pci_dev->config + pos + PCI_X_CMD);
>>          cmd &= (PCI_X_CMD_DPERR_E | PCI_X_CMD_ERO | PCI_X_CMD_MAX_READ |
>> @@ -1534,6 +1389,13 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>>          if ((ret = pci_add_capability(pci_dev, PCI_CAP_ID_VPD, pos, 8)) < 0) {
>>              return ret;
>>          }
>> +
>> +        /* direct read except for next cap pointer */
>> +        memset(dev->direct_config_read + pos, 0xff, 8);
>> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
>> +
>> +        /* direct write for cap content */
>> +        memset(dev->direct_config_write + pos + 2, 0xff, 6);
>>      }
>>  
>>      /* Devices can have multiple vendor capabilities, get them all */
>> @@ -1545,6 +1407,21 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>>                                        pos, len)) < 0) {
>>              return ret;
>>          }
>> +
>> +        /* direct read except for next cap pointer */
>> +        memset(dev->direct_config_read + pos, 0xff, len);
>> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
>> +
>> +        /* direct write for cap content */
>> +        memset(dev->direct_config_write + pos + 2, 0xff, len - 2);
>> +    }
>> +
>> +    /* If real and virtual capability list status bits differ, virtualize the
>> +     * access. */
>> +    if ((pci_get_word(pci_dev->config + PCI_STATUS) & PCI_STATUS_CAP_LIST) !=
>> +        (assigned_dev_pci_read_byte(pci_dev, PCI_STATUS) &
>> +         PCI_STATUS_CAP_LIST)) {
>> +        dev->direct_config_read[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST;
>>      }
>>  
>>      return 0;
>> @@ -1694,6 +1571,26 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
>>          return -1;
>>      }
>>  
>> +    /*
>> +     * Set up basic config space access control. Will be further refined during
>> +     * device initialization.
>> +     *
>> +     * Direct read/write access is grangted for:
>> +     *  - status & command registers, revision ID & class code, cacheline size,
>> +     *    latency timer, header type, BIST
>> +     *  - cardbus CIS pointer, subsystem vendor & ID
>> +     *  - reserved fields
>> +     *  - Min_Gnt & Max_Lat
>> +     */
>> +    memset(dev->direct_config_read + PCI_COMMAND, 0xff, 12);
> 
> This is more efficient, but maybe we should memset each field for
> grep-ability.

Will play with that. May save some comments.

> 
> Nice change overall.  Thanks,

Thanks,
Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 12/13] pci-assign: Generic config space access management
  2011-06-27 22:48   ` Alex Williamson
  2011-06-28  7:08     ` Jan Kiszka
@ 2011-06-28  8:07     ` Avi Kivity
  2011-06-28  8:19       ` Jan Kiszka
  1 sibling, 1 reply; 48+ messages in thread
From: Avi Kivity @ 2011-06-28  8:07 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Jan Kiszka, Marcelo Tosatti, kvm, Michael S. Tsirkin

On 06/28/2011 01:48 AM, Alex Williamson wrote:
>> +    memset(dev->direct_config_read + PCI_COMMAND, 0xff, 12);
> This is more efficient, but maybe we should memset each field for
> grep-ability.

Agree.  Or even

     pci_assign_set_direct_config(dev, register, start_bit, nr_bits);


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 00/13] qemu-kvm: device assignment cleanups and upstream diff reductions
  2011-06-27 18:19 [PATCH 00/13] qemu-kvm: device assignment cleanups and upstream diff reductions Jan Kiszka
                   ` (12 preceding siblings ...)
  2011-06-27 18:19 ` [PATCH 13/13] qemu-kvm: Resolve PCI upstream diffs Jan Kiszka
@ 2011-06-28  8:10 ` Avi Kivity
  2011-06-28  8:57   ` Michael S. Tsirkin
  13 siblings, 1 reply; 48+ messages in thread
From: Avi Kivity @ 2011-06-28  8:10 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Marcelo Tosatti, kvm, Michael S. Tsirkin, Alex Williamson, Joerg Roedel

On 06/27/2011 09:19 PM, Jan Kiszka wrote:
> This series basically consists of two halves. The first one applies a
> few smaller cleanups to qemu-kvm to improve similarity with upstream.
> This includes the recently discussed removal of -enable-nesting.
>
> The second half starts with two device assignment fixes and then applies
> some refactorings, specifically dropping the libpci dependency and
> enabling the revert of some qemu-kvm private PCI core changes. The
> latter is achieved by simplifying the access control management to the
> passed-through device's config space. That also saves 100 LOC.
>
> Note that this series was tested with upstream commit af2be20777 ("Fix
> fallouts from Linux header inclusion") applied and probably depends on
> it also mechanically.
>
> Please review/merge.
>

Thanks, applied 1-6.  The rest looks good as well, just need the few 
comments attended to.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 12/13] pci-assign: Generic config space access management
  2011-06-27 18:19 ` [PATCH 12/13] pci-assign: Generic config space access management Jan Kiszka
  2011-06-27 20:54   ` Michael S. Tsirkin
  2011-06-27 22:48   ` Alex Williamson
@ 2011-06-28  8:10   ` Michael S. Tsirkin
  2011-06-28  8:18     ` Jan Kiszka
  2011-06-28  8:51   ` Michael S. Tsirkin
  3 siblings, 1 reply; 48+ messages in thread
From: Michael S. Tsirkin @ 2011-06-28  8:10 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson

On Mon, Jun 27, 2011 at 08:19:55PM +0200, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
> 
> This drastically simplifies config space access management: Instead of
> coding various range checks and merging bits, set up two access control
> bitmaps. One defines, which bits can be directly read from the device,
> the other allows direct write to the device, also with bit-granularity.
> 
> The setup of those bitmaps, specifically the corner cases like the
> capability list status bit, is way more readable than the existing code.
> And the generic config access functions just need one additional logic
> to catch writes to the MSI/MSI-X control flags and call the related
> update handlers.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  hw/device-assignment.c |  325 ++++++++++++++++-------------------------------
>  hw/device-assignment.h |    3 +-
>  2 files changed, 113 insertions(+), 215 deletions(-)
> 
> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> index 6a2ca8c..be199d2 100644
> --- a/hw/device-assignment.c
> +++ b/hw/device-assignment.c
> @@ -63,35 +63,6 @@ static void assigned_dev_load_option_rom(AssignedDevice *dev);
>  
>  static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev);
>  
> -static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
> -                                                 uint32_t address,
> -                                                 uint32_t val, int len);
> -
> -static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
> -                                                    uint32_t address, int len);
> -
> -/* Merge the bits set in mask from mval into val.  Both val and mval are
> - * at the same addr offset, pos is the starting offset of the mask. */
> -static uint32_t merge_bits(uint32_t val, uint32_t mval, uint8_t addr,
> -                           int len, uint8_t pos, uint32_t mask)
> -{
> -    if (!ranges_overlap(addr, len, pos, 4)) {
> -        return val;
> -    }
> -
> -    if (addr >= pos) {
> -        mask >>= (addr - pos) * 8;
> -    } else {
> -        mask <<= (pos - addr) * 8;
> -    }
> -    mask &= 0xffffffffU >> (4 - len) * 8;
> -
> -    val &= ~mask;
> -    val |= (mval & mask);
> -
> -    return val;
> -}
> -
>  static uint32_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
>                                         uint32_t addr, int len, uint32_t *val)
>  {
> @@ -404,143 +375,6 @@ static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap, uint8_t start)
>      return 0;
>  }
>  
> -static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address,
> -                                          uint32_t val, int len)
> -{
> -    int fd;
> -    ssize_t ret;
> -    AssignedDevice *pci_dev = DO_UPCAST(AssignedDevice, dev, d);
> -
> -    DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
> -          ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
> -          (uint16_t) address, val, len);
> -
> -    if (address >= PCI_CONFIG_HEADER_SIZE && d->config_map[address]) {
> -        return assigned_device_pci_cap_write_config(d, address, val, len);
> -    }
> -
> -    if (ranges_overlap(address, len, PCI_COMMAND, 2)) {
> -        pci_default_write_config(d, address, val, len);
> -        /* Continue to program the card */
> -    }
> -
> -    /*
> -     * Catch access to
> -     *  - base address registers
> -     *  - ROM base address & capability pointer
> -     *  - interrupt line & pin
> -     */
> -    if (ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) ||
> -        ranges_overlap(address, len, PCI_ROM_ADDRESS, 4)) {
> -        pci_default_write_config(d, address, val, len);
> -        return;
> -    } else if (ranges_overlap(address, len, PCI_CAPABILITY_LIST, 1) ||
> -               ranges_overlap(address, len, PCI_INTERRUPT_LINE, 2)) {
> -        uint32_t real_val;
> -
> -        pci_default_write_config(d, address, val, len);
> -
> -        /* Ensure that writes to overlapping areas we don't virtualize still
> -         * hit the device. */
> -        real_val = assigned_dev_pci_read(d, address, len);
> -        val = merge_bits(val, real_val, address, len,
> -                         PCI_CAPABILITY_LIST, 0xff);
> -        val = merge_bits(val, real_val, address, len,
> -                         PCI_INTERRUPT_LINE, 0xffff);
> -    }
> -
> -    DEBUG("NON BAR (%x.%x): address=%04x val=0x%08x len=%d\n",
> -          ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
> -          (uint16_t) address, val, len);
> -
> -    fd = pci_dev->real_device.config_fd;
> -
> -again:
> -    ret = pwrite(fd, &val, len, address);
> -    if (ret != len) {
> -	if ((ret < 0) && (errno == EINTR || errno == EAGAIN))
> -	    goto again;
> -
> -	fprintf(stderr, "%s: pwrite failed, ret = %zd errno = %d\n",
> -		__func__, ret, errno);
> -
> -	exit(1);
> -    }
> -}
> -
> -static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address,
> -                                             int len)
> -{
> -    uint32_t val = 0, virt_val;
> -    int fd;
> -    ssize_t ret;
> -    AssignedDevice *pci_dev = DO_UPCAST(AssignedDevice, dev, d);
> -
> -    if (address >= PCI_CONFIG_HEADER_SIZE && d->config_map[address]) {
> -        val = assigned_device_pci_cap_read_config(d, address, len);
> -        DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
> -              (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
> -        return val;
> -    }
> -
> -    /*
> -     * Catch access to
> -     *  - vendor & device ID
> -     *  - base address registers
> -     *  - ROM base address
> -     */
> -    if (ranges_overlap(address, len, PCI_VENDOR_ID, 4) ||
> -        ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) ||
> -        ranges_overlap(address, len, PCI_ROM_ADDRESS, 4)) {
> -        val = pci_default_read_config(d, address, len);
> -        DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
> -              (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
> -        return val;
> -    }
> -
> -    fd = pci_dev->real_device.config_fd;
> -
> -again:
> -    ret = pread(fd, &val, len, address);
> -    if (ret != len) {
> -	if ((ret < 0) && (errno == EINTR || errno == EAGAIN))
> -	    goto again;
> -
> -	fprintf(stderr, "%s: pread failed, ret = %zd errno = %d\n",
> -		__func__, ret, errno);
> -
> -	exit(1);
> -    }
> -
> -    DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
> -          (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
> -
> -    if (pci_dev->emulate_cmd_mask) {
> -        val = merge_bits(val, pci_default_read_config(d, address, len),
> -                         address, len, PCI_COMMAND, pci_dev->emulate_cmd_mask);
> -    }
> -
> -    /*
> -     * Merge bits from virtualized
> -     *  - capability pointer
> -     *  - interrupt line & pin
> -     */
> -    virt_val = pci_default_read_config(d, address, len);
> -    val = merge_bits(val, virt_val, address, len, PCI_CAPABILITY_LIST, 0xff);
> -    val = merge_bits(val, virt_val, address, len, PCI_INTERRUPT_LINE, 0xffff);
> -
> -    if (!pci_dev->cap.available) {
> -        /* kill the special capabilities */
> -        if (address == PCI_COMMAND && len == 4) {
> -            val &= ~(PCI_STATUS_CAP_LIST << 16);
> -        } else if (address == PCI_STATUS) {
> -            val &= ~PCI_STATUS_CAP_LIST;
> -        }
> -    }
> -
> -    return val;
> -}
> -
>  static int assigned_dev_register_regions(PCIRegion *io_regions,
>                                           unsigned long regions_num,
>                                           AssignedDevice *pci_dev)
> @@ -790,7 +624,8 @@ again:
>      /* dealing with virtual function device */
>      snprintf(name, sizeof(name), "%sphysfn/", dir);
>      if (!stat(name, &statbuf)) {
> -        pci_dev->emulate_cmd_mask = 0xffff;
> +        /* always provide the written value on readout */
> +        memset(pci_dev->direct_config_read + PCI_COMMAND, 0xff, 2);
>      }
>  
>      dev->region_number = r;
> @@ -1268,73 +1103,81 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev)
>      }
>  }
>  
> -/* There can be multiple VNDR capabilities per device, we need to find the
> - * one that starts closet to the given address without going over. */
> -static uint8_t find_vndr_start(PCIDevice *pci_dev, uint32_t address)
> +static uint32_t assigned_dev_pci_read_config(PCIDevice *pci_dev,
> +                                             uint32_t address, int len)
>  {
> -    uint8_t cap, pos;
> +    AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
> +    uint32_t virt_val = pci_default_read_config(pci_dev, address, len);
> +    uint32_t real_val, direct_mask;
>  
> -    for (cap = pos = 0;
> -         (pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_VNDR, pos));
> -         pos += PCI_CAP_LIST_NEXT) {
> -        if (pos <= address) {
> -            cap = MAX(pos, cap);
> -        }
> +    switch (len) {
> +    case 4:
> +        direct_mask =
> +            pci_get_long(assigned_dev->direct_config_read + address);
> +        break;
> +    case 2:
> +        direct_mask =
> +            pci_get_word(assigned_dev->direct_config_read + address);
> +        break;
> +    case 1:
> +        direct_mask =
> +            pci_get_byte(assigned_dev->direct_config_read + address);
> +        break;
> +    default:
> +        abort();
>      }
> -    return cap;
> -}
> -
> -static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
> -                                                    uint32_t address, int len)
> -{
> -    uint8_t cap, cap_id = pci_dev->config_map[address];
> -    uint32_t val;
> -
> -    switch (cap_id) {
> -
> -    case PCI_CAP_ID_VPD:
> -        cap = pci_find_capability(pci_dev, cap_id);
> -        val = assigned_dev_pci_read(pci_dev, address, len);
> -        return merge_bits(val, pci_get_long(pci_dev->config + address),
> -                          address, len, cap + PCI_CAP_LIST_NEXT, 0xff);
> -
> -    case PCI_CAP_ID_VNDR:
> -        cap = find_vndr_start(pci_dev, address);
> -        val = assigned_dev_pci_read(pci_dev, address, len);
> -        return merge_bits(val, pci_get_long(pci_dev->config + address),
> -                          address, len, cap + PCI_CAP_LIST_NEXT, 0xff);
> +    if (direct_mask) {
> +        real_val = assigned_dev_pci_read(pci_dev, address, len);
> +        return (virt_val & ~direct_mask) | (real_val & direct_mask);
> +    } else {
> +        return virt_val;
>      }
> -
> -    return pci_default_read_config(pci_dev, address, len);
>  }
>  
> -static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
> -                                                 uint32_t address,
> -                                                 uint32_t val, int len)
> +static void assigned_dev_pci_write_config(PCIDevice *pci_dev, uint32_t address,
> +                                          uint32_t val, int len)
>  {
> -    uint8_t cap_id = pci_dev->config_map[address];
> +    AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
> +    uint32_t direct_mask, full_access;
>  
>      pci_default_write_config(pci_dev, address, val, len);
> -    switch (cap_id) {
> -    case PCI_CAP_ID_MSI:
> +
> +    if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) {
>          if (range_covers_byte(address, len,
>                                pci_dev->msi_cap + PCI_MSI_FLAGS)) {
>              assigned_dev_update_msi(pci_dev);
>          }
> -        break;
> -
> -    case PCI_CAP_ID_MSIX:
> +    } else if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX) {
>          if (range_covers_byte(address, len,
>                                pci_dev->msix_cap + PCI_MSIX_FLAGS + 1)) {
>              assigned_dev_update_msix(pci_dev);
>          }
> -        break;
> +    }
>  
> -    case PCI_CAP_ID_VPD:
> -    case PCI_CAP_ID_VNDR:
> -        assigned_dev_pci_write(pci_dev, address, val, len);
> +    switch (len) {
> +    case 4:
> +        direct_mask =
> +            pci_get_long(assigned_dev->direct_config_write + address);
> +        full_access = 0xffffffff;
> +        break;
> +    case 2:
> +        direct_mask =
> +            pci_get_word(assigned_dev->direct_config_write + address);
> +        full_access = 0xffff;
>          break;
> +    case 1:
> +        direct_mask =
> +            pci_get_byte(assigned_dev->direct_config_write + address);
> +        full_access = 0xff;
> +        break;
> +    default:
> +        abort();
> +    }
> +    if (direct_mask != full_access) {
> +        val &= direct_mask;
> +        val |= assigned_dev_pci_read(pci_dev, address, len) & ~direct_mask;
>      }
> +    assigned_dev_pci_write(pci_dev, address, val, len);

I think we should not write at all for fully virtualized fields
(direct_mask == 0). E.g. PCI BARs.

>  }
>  
>  static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
> @@ -1404,6 +1247,10 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>              return ret;
>          }
>  
> +        /* direct read except for next cap pointer */
> +        memset(dev->direct_config_read + pos, 0xff, PCI_PM_SIZEOF);
> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
> +
>          pmc = pci_get_word(pci_dev->config + pos + PCI_CAP_FLAGS);
>          pmc &= (PCI_PM_CAP_VER_MASK | PCI_PM_CAP_DSI);
>          pci_set_word(pci_dev->config + pos + PCI_CAP_FLAGS, pmc);
> @@ -1438,6 +1285,10 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>              return ret;
>          }
>  
> +        /* direct read except for next cap pointer */
> +        memset(dev->direct_config_read + pos, 0xff, size);
> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
> +
>          type = pci_get_word(pci_dev->config + pos + PCI_EXP_FLAGS);
>          type = (type & PCI_EXP_FLAGS_TYPE) >> 8;
>          if (type != PCI_EXP_TYPE_ENDPOINT &&
> @@ -1513,6 +1364,10 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>              return ret;
>          }
>  
> +        /* direct read except for next cap pointer */
> +        memset(dev->direct_config_read + pos, 0xff, 8);
> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
> +
>          /* Command register, clear upper bits, including extended modes */
>          cmd = pci_get_word(pci_dev->config + pos + PCI_X_CMD);
>          cmd &= (PCI_X_CMD_DPERR_E | PCI_X_CMD_ERO | PCI_X_CMD_MAX_READ |
> @@ -1534,6 +1389,13 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>          if ((ret = pci_add_capability(pci_dev, PCI_CAP_ID_VPD, pos, 8)) < 0) {
>              return ret;
>          }
> +
> +        /* direct read except for next cap pointer */
> +        memset(dev->direct_config_read + pos, 0xff, 8);
> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
> +
> +        /* direct write for cap content */
> +        memset(dev->direct_config_write + pos + 2, 0xff, 6);
>      }
>  
>      /* Devices can have multiple vendor capabilities, get them all */
> @@ -1545,6 +1407,21 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>                                        pos, len)) < 0) {
>              return ret;
>          }
> +
> +        /* direct read except for next cap pointer */
> +        memset(dev->direct_config_read + pos, 0xff, len);
> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;

Add a function for the above 3 lines?

> +
> +        /* direct write for cap content */
> +        memset(dev->direct_config_write + pos + 2, 0xff, len - 2);
> +    }
> +
> +    /* If real and virtual capability list status bits differ, virtualize the
> +     * access. */
> +    if ((pci_get_word(pci_dev->config + PCI_STATUS) & PCI_STATUS_CAP_LIST) !=
> +        (assigned_dev_pci_read_byte(pci_dev, PCI_STATUS) &
> +         PCI_STATUS_CAP_LIST)) {
> +        dev->direct_config_read[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST;
>      }

Is this an optimization? What harm does it do to virtualize always?

>  
>      return 0;
> @@ -1694,6 +1571,26 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
>          return -1;
>      }
>  
> +    /*
> +     * Set up basic config space access control. Will be further refined during
> +     * device initialization.
> +     *
> +     * Direct read/write access is grangted for:

typo

> +     *  - status & command registers, revision ID & class code, cacheline size,
> +     *    latency timer, header type, BIST
> +     *  - cardbus CIS pointer, subsystem vendor & ID
> +     *  - reserved fields
> +     *  - Min_Gnt & Max_Lat
> +     */
> +    memset(dev->direct_config_read + PCI_COMMAND, 0xff, 12);
> +    memset(dev->direct_config_read + PCI_CARDBUS_CIS, 0xff, 8);
> +    memset(dev->direct_config_read + PCI_CAPABILITY_LIST + 1, 0xff, 7);
> +    memset(dev->direct_config_read + PCI_MIN_GNT, 0xff, 2);
> +    memset(dev->direct_config_write + PCI_COMMAND, 0xff, 12);
> +    memset(dev->direct_config_write + PCI_CARDBUS_CIS, 0xff, 8);
> +    memset(dev->direct_config_write + PCI_CAPABILITY_LIST + 1, 0xff, 7);
> +    memset(dev->direct_config_write + PCI_MIN_GNT, 0xff, 2);
> +
>      if (get_real_device(dev, dev->host.seg, dev->host.bus,
>                          dev->host.dev, dev->host.func)) {
>          error_report("pci-assign: Error: Couldn't get real device (%s)!",
> diff --git a/hw/device-assignment.h b/hw/device-assignment.h
> index 3d20207..ed5defb 100644
> --- a/hw/device-assignment.h
> +++ b/hw/device-assignment.h
> @@ -104,12 +104,13 @@ typedef struct AssignedDevice {
>  #define ASSIGNED_DEVICE_MSIX_MASKED (1 << 2)
>          uint32_t state;
>      } cap;
> +    uint8_t direct_config_read[PCI_CONFIG_SPACE_SIZE];
> +    uint8_t direct_config_write[PCI_CONFIG_SPACE_SIZE];

What direct means isn't obvious. Add a comment?

We could revert the logic, have bits for
emulate_config_read/emulate_config_write.  I think that most of the
registers can safely be read/written directly, so protecting relevant
bits will be less work. Is that true?

>      int irq_entries_nr;
>      struct kvm_irq_routing_entry *entry;
>      void *msix_table_page;
>      target_phys_addr_t msix_table_addr;
>      int mmio_index;
> -    uint32_t emulate_cmd_mask;
>      char *configfd_name;
>      int32_t bootindex;
>      QLIST_ENTRY(AssignedDevice) next;
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 12/13] pci-assign: Generic config space access management
  2011-06-28  8:10   ` Michael S. Tsirkin
@ 2011-06-28  8:18     ` Jan Kiszka
  2011-06-28  8:30       ` Michael S. Tsirkin
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Kiszka @ 2011-06-28  8:18 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson

[-- Attachment #1: Type: text/plain, Size: 19372 bytes --]

On 2011-06-28 10:10, Michael S. Tsirkin wrote:
> On Mon, Jun 27, 2011 at 08:19:55PM +0200, Jan Kiszka wrote:
>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>
>> This drastically simplifies config space access management: Instead of
>> coding various range checks and merging bits, set up two access control
>> bitmaps. One defines, which bits can be directly read from the device,
>> the other allows direct write to the device, also with bit-granularity.
>>
>> The setup of those bitmaps, specifically the corner cases like the
>> capability list status bit, is way more readable than the existing code.
>> And the generic config access functions just need one additional logic
>> to catch writes to the MSI/MSI-X control flags and call the related
>> update handlers.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>  hw/device-assignment.c |  325 ++++++++++++++++-------------------------------
>>  hw/device-assignment.h |    3 +-
>>  2 files changed, 113 insertions(+), 215 deletions(-)
>>
>> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
>> index 6a2ca8c..be199d2 100644
>> --- a/hw/device-assignment.c
>> +++ b/hw/device-assignment.c
>> @@ -63,35 +63,6 @@ static void assigned_dev_load_option_rom(AssignedDevice *dev);
>>  
>>  static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev);
>>  
>> -static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
>> -                                                 uint32_t address,
>> -                                                 uint32_t val, int len);
>> -
>> -static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
>> -                                                    uint32_t address, int len);
>> -
>> -/* Merge the bits set in mask from mval into val.  Both val and mval are
>> - * at the same addr offset, pos is the starting offset of the mask. */
>> -static uint32_t merge_bits(uint32_t val, uint32_t mval, uint8_t addr,
>> -                           int len, uint8_t pos, uint32_t mask)
>> -{
>> -    if (!ranges_overlap(addr, len, pos, 4)) {
>> -        return val;
>> -    }
>> -
>> -    if (addr >= pos) {
>> -        mask >>= (addr - pos) * 8;
>> -    } else {
>> -        mask <<= (pos - addr) * 8;
>> -    }
>> -    mask &= 0xffffffffU >> (4 - len) * 8;
>> -
>> -    val &= ~mask;
>> -    val |= (mval & mask);
>> -
>> -    return val;
>> -}
>> -
>>  static uint32_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
>>                                         uint32_t addr, int len, uint32_t *val)
>>  {
>> @@ -404,143 +375,6 @@ static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap, uint8_t start)
>>      return 0;
>>  }
>>  
>> -static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address,
>> -                                          uint32_t val, int len)
>> -{
>> -    int fd;
>> -    ssize_t ret;
>> -    AssignedDevice *pci_dev = DO_UPCAST(AssignedDevice, dev, d);
>> -
>> -    DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
>> -          ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
>> -          (uint16_t) address, val, len);
>> -
>> -    if (address >= PCI_CONFIG_HEADER_SIZE && d->config_map[address]) {
>> -        return assigned_device_pci_cap_write_config(d, address, val, len);
>> -    }
>> -
>> -    if (ranges_overlap(address, len, PCI_COMMAND, 2)) {
>> -        pci_default_write_config(d, address, val, len);
>> -        /* Continue to program the card */
>> -    }
>> -
>> -    /*
>> -     * Catch access to
>> -     *  - base address registers
>> -     *  - ROM base address & capability pointer
>> -     *  - interrupt line & pin
>> -     */
>> -    if (ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) ||
>> -        ranges_overlap(address, len, PCI_ROM_ADDRESS, 4)) {
>> -        pci_default_write_config(d, address, val, len);
>> -        return;
>> -    } else if (ranges_overlap(address, len, PCI_CAPABILITY_LIST, 1) ||
>> -               ranges_overlap(address, len, PCI_INTERRUPT_LINE, 2)) {
>> -        uint32_t real_val;
>> -
>> -        pci_default_write_config(d, address, val, len);
>> -
>> -        /* Ensure that writes to overlapping areas we don't virtualize still
>> -         * hit the device. */
>> -        real_val = assigned_dev_pci_read(d, address, len);
>> -        val = merge_bits(val, real_val, address, len,
>> -                         PCI_CAPABILITY_LIST, 0xff);
>> -        val = merge_bits(val, real_val, address, len,
>> -                         PCI_INTERRUPT_LINE, 0xffff);
>> -    }
>> -
>> -    DEBUG("NON BAR (%x.%x): address=%04x val=0x%08x len=%d\n",
>> -          ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
>> -          (uint16_t) address, val, len);
>> -
>> -    fd = pci_dev->real_device.config_fd;
>> -
>> -again:
>> -    ret = pwrite(fd, &val, len, address);
>> -    if (ret != len) {
>> -	if ((ret < 0) && (errno == EINTR || errno == EAGAIN))
>> -	    goto again;
>> -
>> -	fprintf(stderr, "%s: pwrite failed, ret = %zd errno = %d\n",
>> -		__func__, ret, errno);
>> -
>> -	exit(1);
>> -    }
>> -}
>> -
>> -static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address,
>> -                                             int len)
>> -{
>> -    uint32_t val = 0, virt_val;
>> -    int fd;
>> -    ssize_t ret;
>> -    AssignedDevice *pci_dev = DO_UPCAST(AssignedDevice, dev, d);
>> -
>> -    if (address >= PCI_CONFIG_HEADER_SIZE && d->config_map[address]) {
>> -        val = assigned_device_pci_cap_read_config(d, address, len);
>> -        DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
>> -              (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
>> -        return val;
>> -    }
>> -
>> -    /*
>> -     * Catch access to
>> -     *  - vendor & device ID
>> -     *  - base address registers
>> -     *  - ROM base address
>> -     */
>> -    if (ranges_overlap(address, len, PCI_VENDOR_ID, 4) ||
>> -        ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) ||
>> -        ranges_overlap(address, len, PCI_ROM_ADDRESS, 4)) {
>> -        val = pci_default_read_config(d, address, len);
>> -        DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
>> -              (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
>> -        return val;
>> -    }
>> -
>> -    fd = pci_dev->real_device.config_fd;
>> -
>> -again:
>> -    ret = pread(fd, &val, len, address);
>> -    if (ret != len) {
>> -	if ((ret < 0) && (errno == EINTR || errno == EAGAIN))
>> -	    goto again;
>> -
>> -	fprintf(stderr, "%s: pread failed, ret = %zd errno = %d\n",
>> -		__func__, ret, errno);
>> -
>> -	exit(1);
>> -    }
>> -
>> -    DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
>> -          (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
>> -
>> -    if (pci_dev->emulate_cmd_mask) {
>> -        val = merge_bits(val, pci_default_read_config(d, address, len),
>> -                         address, len, PCI_COMMAND, pci_dev->emulate_cmd_mask);
>> -    }
>> -
>> -    /*
>> -     * Merge bits from virtualized
>> -     *  - capability pointer
>> -     *  - interrupt line & pin
>> -     */
>> -    virt_val = pci_default_read_config(d, address, len);
>> -    val = merge_bits(val, virt_val, address, len, PCI_CAPABILITY_LIST, 0xff);
>> -    val = merge_bits(val, virt_val, address, len, PCI_INTERRUPT_LINE, 0xffff);
>> -
>> -    if (!pci_dev->cap.available) {
>> -        /* kill the special capabilities */
>> -        if (address == PCI_COMMAND && len == 4) {
>> -            val &= ~(PCI_STATUS_CAP_LIST << 16);
>> -        } else if (address == PCI_STATUS) {
>> -            val &= ~PCI_STATUS_CAP_LIST;
>> -        }
>> -    }
>> -
>> -    return val;
>> -}
>> -
>>  static int assigned_dev_register_regions(PCIRegion *io_regions,
>>                                           unsigned long regions_num,
>>                                           AssignedDevice *pci_dev)
>> @@ -790,7 +624,8 @@ again:
>>      /* dealing with virtual function device */
>>      snprintf(name, sizeof(name), "%sphysfn/", dir);
>>      if (!stat(name, &statbuf)) {
>> -        pci_dev->emulate_cmd_mask = 0xffff;
>> +        /* always provide the written value on readout */
>> +        memset(pci_dev->direct_config_read + PCI_COMMAND, 0xff, 2);
>>      }
>>  
>>      dev->region_number = r;
>> @@ -1268,73 +1103,81 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev)
>>      }
>>  }
>>  
>> -/* There can be multiple VNDR capabilities per device, we need to find the
>> - * one that starts closet to the given address without going over. */
>> -static uint8_t find_vndr_start(PCIDevice *pci_dev, uint32_t address)
>> +static uint32_t assigned_dev_pci_read_config(PCIDevice *pci_dev,
>> +                                             uint32_t address, int len)
>>  {
>> -    uint8_t cap, pos;
>> +    AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
>> +    uint32_t virt_val = pci_default_read_config(pci_dev, address, len);
>> +    uint32_t real_val, direct_mask;
>>  
>> -    for (cap = pos = 0;
>> -         (pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_VNDR, pos));
>> -         pos += PCI_CAP_LIST_NEXT) {
>> -        if (pos <= address) {
>> -            cap = MAX(pos, cap);
>> -        }
>> +    switch (len) {
>> +    case 4:
>> +        direct_mask =
>> +            pci_get_long(assigned_dev->direct_config_read + address);
>> +        break;
>> +    case 2:
>> +        direct_mask =
>> +            pci_get_word(assigned_dev->direct_config_read + address);
>> +        break;
>> +    case 1:
>> +        direct_mask =
>> +            pci_get_byte(assigned_dev->direct_config_read + address);
>> +        break;
>> +    default:
>> +        abort();
>>      }
>> -    return cap;
>> -}
>> -
>> -static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
>> -                                                    uint32_t address, int len)
>> -{
>> -    uint8_t cap, cap_id = pci_dev->config_map[address];
>> -    uint32_t val;
>> -
>> -    switch (cap_id) {
>> -
>> -    case PCI_CAP_ID_VPD:
>> -        cap = pci_find_capability(pci_dev, cap_id);
>> -        val = assigned_dev_pci_read(pci_dev, address, len);
>> -        return merge_bits(val, pci_get_long(pci_dev->config + address),
>> -                          address, len, cap + PCI_CAP_LIST_NEXT, 0xff);
>> -
>> -    case PCI_CAP_ID_VNDR:
>> -        cap = find_vndr_start(pci_dev, address);
>> -        val = assigned_dev_pci_read(pci_dev, address, len);
>> -        return merge_bits(val, pci_get_long(pci_dev->config + address),
>> -                          address, len, cap + PCI_CAP_LIST_NEXT, 0xff);
>> +    if (direct_mask) {
>> +        real_val = assigned_dev_pci_read(pci_dev, address, len);
>> +        return (virt_val & ~direct_mask) | (real_val & direct_mask);
>> +    } else {
>> +        return virt_val;
>>      }
>> -
>> -    return pci_default_read_config(pci_dev, address, len);
>>  }
>>  
>> -static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
>> -                                                 uint32_t address,
>> -                                                 uint32_t val, int len)
>> +static void assigned_dev_pci_write_config(PCIDevice *pci_dev, uint32_t address,
>> +                                          uint32_t val, int len)
>>  {
>> -    uint8_t cap_id = pci_dev->config_map[address];
>> +    AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
>> +    uint32_t direct_mask, full_access;
>>  
>>      pci_default_write_config(pci_dev, address, val, len);
>> -    switch (cap_id) {
>> -    case PCI_CAP_ID_MSI:
>> +
>> +    if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) {
>>          if (range_covers_byte(address, len,
>>                                pci_dev->msi_cap + PCI_MSI_FLAGS)) {
>>              assigned_dev_update_msi(pci_dev);
>>          }
>> -        break;
>> -
>> -    case PCI_CAP_ID_MSIX:
>> +    } else if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX) {
>>          if (range_covers_byte(address, len,
>>                                pci_dev->msix_cap + PCI_MSIX_FLAGS + 1)) {
>>              assigned_dev_update_msix(pci_dev);
>>          }
>> -        break;
>> +    }
>>  
>> -    case PCI_CAP_ID_VPD:
>> -    case PCI_CAP_ID_VNDR:
>> -        assigned_dev_pci_write(pci_dev, address, val, len);
>> +    switch (len) {
>> +    case 4:
>> +        direct_mask =
>> +            pci_get_long(assigned_dev->direct_config_write + address);
>> +        full_access = 0xffffffff;
>> +        break;
>> +    case 2:
>> +        direct_mask =
>> +            pci_get_word(assigned_dev->direct_config_write + address);
>> +        full_access = 0xffff;
>>          break;
>> +    case 1:
>> +        direct_mask =
>> +            pci_get_byte(assigned_dev->direct_config_write + address);
>> +        full_access = 0xff;
>> +        break;
>> +    default:
>> +        abort();
>> +    }
>> +    if (direct_mask != full_access) {
>> +        val &= direct_mask;
>> +        val |= assigned_dev_pci_read(pci_dev, address, len) & ~direct_mask;
>>      }
>> +    assigned_dev_pci_write(pci_dev, address, val, len);
> 
> I think we should not write at all for fully virtualized fields
> (direct_mask == 0). E.g. PCI BARs.


Yes, Alex caught that as well. Fixed already.

> 
>>  }
>>  
>>  static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>> @@ -1404,6 +1247,10 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>>              return ret;
>>          }
>>  
>> +        /* direct read except for next cap pointer */
>> +        memset(dev->direct_config_read + pos, 0xff, PCI_PM_SIZEOF);
>> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
>> +
>>          pmc = pci_get_word(pci_dev->config + pos + PCI_CAP_FLAGS);
>>          pmc &= (PCI_PM_CAP_VER_MASK | PCI_PM_CAP_DSI);
>>          pci_set_word(pci_dev->config + pos + PCI_CAP_FLAGS, pmc);
>> @@ -1438,6 +1285,10 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>>              return ret;
>>          }
>>  
>> +        /* direct read except for next cap pointer */
>> +        memset(dev->direct_config_read + pos, 0xff, size);
>> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
>> +
>>          type = pci_get_word(pci_dev->config + pos + PCI_EXP_FLAGS);
>>          type = (type & PCI_EXP_FLAGS_TYPE) >> 8;
>>          if (type != PCI_EXP_TYPE_ENDPOINT &&
>> @@ -1513,6 +1364,10 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>>              return ret;
>>          }
>>  
>> +        /* direct read except for next cap pointer */
>> +        memset(dev->direct_config_read + pos, 0xff, 8);
>> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
>> +
>>          /* Command register, clear upper bits, including extended modes */
>>          cmd = pci_get_word(pci_dev->config + pos + PCI_X_CMD);
>>          cmd &= (PCI_X_CMD_DPERR_E | PCI_X_CMD_ERO | PCI_X_CMD_MAX_READ |
>> @@ -1534,6 +1389,13 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>>          if ((ret = pci_add_capability(pci_dev, PCI_CAP_ID_VPD, pos, 8)) < 0) {
>>              return ret;
>>          }
>> +
>> +        /* direct read except for next cap pointer */
>> +        memset(dev->direct_config_read + pos, 0xff, 8);
>> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
>> +
>> +        /* direct write for cap content */
>> +        memset(dev->direct_config_write + pos + 2, 0xff, 6);
>>      }
>>  
>>      /* Devices can have multiple vendor capabilities, get them all */
>> @@ -1545,6 +1407,21 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>>                                        pos, len)) < 0) {
>>              return ret;
>>          }
>> +
>> +        /* direct read except for next cap pointer */
>> +        memset(dev->direct_config_read + pos, 0xff, len);
>> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
> 
> Add a function for the above 3 lines?
> 
>> +
>> +        /* direct write for cap content */
>> +        memset(dev->direct_config_write + pos + 2, 0xff, len - 2);
>> +    }
>> +
>> +    /* If real and virtual capability list status bits differ, virtualize the
>> +     * access. */
>> +    if ((pci_get_word(pci_dev->config + PCI_STATUS) & PCI_STATUS_CAP_LIST) !=
>> +        (assigned_dev_pci_read_byte(pci_dev, PCI_STATUS) &
>> +         PCI_STATUS_CAP_LIST)) {
>> +        dev->direct_config_read[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST;
>>      }
> 
> Is this an optimization? What harm does it do to virtualize always?

Just replicating the old behavior, but I can virtualize it unconditionally.

> 
>>  
>>      return 0;
>> @@ -1694,6 +1571,26 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
>>          return -1;
>>      }
>>  
>> +    /*
>> +     * Set up basic config space access control. Will be further refined during
>> +     * device initialization.
>> +     *
>> +     * Direct read/write access is grangted for:
> 
> typo
> 
>> +     *  - status & command registers, revision ID & class code, cacheline size,
>> +     *    latency timer, header type, BIST
>> +     *  - cardbus CIS pointer, subsystem vendor & ID
>> +     *  - reserved fields
>> +     *  - Min_Gnt & Max_Lat
>> +     */
>> +    memset(dev->direct_config_read + PCI_COMMAND, 0xff, 12);
>> +    memset(dev->direct_config_read + PCI_CARDBUS_CIS, 0xff, 8);
>> +    memset(dev->direct_config_read + PCI_CAPABILITY_LIST + 1, 0xff, 7);
>> +    memset(dev->direct_config_read + PCI_MIN_GNT, 0xff, 2);
>> +    memset(dev->direct_config_write + PCI_COMMAND, 0xff, 12);
>> +    memset(dev->direct_config_write + PCI_CARDBUS_CIS, 0xff, 8);
>> +    memset(dev->direct_config_write + PCI_CAPABILITY_LIST + 1, 0xff, 7);
>> +    memset(dev->direct_config_write + PCI_MIN_GNT, 0xff, 2);
>> +
>>      if (get_real_device(dev, dev->host.seg, dev->host.bus,
>>                          dev->host.dev, dev->host.func)) {
>>          error_report("pci-assign: Error: Couldn't get real device (%s)!",
>> diff --git a/hw/device-assignment.h b/hw/device-assignment.h
>> index 3d20207..ed5defb 100644
>> --- a/hw/device-assignment.h
>> +++ b/hw/device-assignment.h
>> @@ -104,12 +104,13 @@ typedef struct AssignedDevice {
>>  #define ASSIGNED_DEVICE_MSIX_MASKED (1 << 2)
>>          uint32_t state;
>>      } cap;
>> +    uint8_t direct_config_read[PCI_CONFIG_SPACE_SIZE];
>> +    uint8_t direct_config_write[PCI_CONFIG_SPACE_SIZE];
> 
> What direct means isn't obvious. Add a comment?

Ok.

> 
> We could revert the logic, have bits for
> emulate_config_read/emulate_config_write.  I think that most of the
> registers can safely be read/written directly, so protecting relevant
> bits will be less work. Is that true?

I intentionally chose whitelisting to avoid accidentally granting access
to forgotten fields. I don't think it's a good idea to switch to
blacklisting.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 12/13] pci-assign: Generic config space access management
  2011-06-28  8:07     ` Avi Kivity
@ 2011-06-28  8:19       ` Jan Kiszka
  2011-06-28  8:21         ` Avi Kivity
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Kiszka @ 2011-06-28  8:19 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alex Williamson, Marcelo Tosatti, kvm, Michael S. Tsirkin

[-- Attachment #1: Type: text/plain, Size: 445 bytes --]

On 2011-06-28 10:07, Avi Kivity wrote:
> On 06/28/2011 01:48 AM, Alex Williamson wrote:
>>> +    memset(dev->direct_config_read + PCI_COMMAND, 0xff, 12);
>> This is more efficient, but maybe we should memset each field for
>> grep-ability.
> 
> Agree.  Or even
> 
>     pci_assign_set_direct_config(dev, register, start_bit, nr_bits);

Given that pci_regs.h defines masks and not bits, working with |= and &=
is more handy.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 12/13] pci-assign: Generic config space access management
  2011-06-28  8:19       ` Jan Kiszka
@ 2011-06-28  8:21         ` Avi Kivity
  0 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2011-06-28  8:21 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Alex Williamson, Marcelo Tosatti, kvm, Michael S. Tsirkin

On 06/28/2011 11:19 AM, Jan Kiszka wrote:
> On 2011-06-28 10:07, Avi Kivity wrote:
> >  On 06/28/2011 01:48 AM, Alex Williamson wrote:
> >>>  +    memset(dev->direct_config_read + PCI_COMMAND, 0xff, 12);
> >>  This is more efficient, but maybe we should memset each field for
> >>  grep-ability.
> >
> >  Agree.  Or even
> >
> >      pci_assign_set_direct_config(dev, register, start_bit, nr_bits);
>
> Given that pci_regs.h defines masks and not bits, working with |= and&=
> is more handy.

Ok.  But a function is clearer than a memset.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 12/13] pci-assign: Generic config space access management
  2011-06-28  8:18     ` Jan Kiszka
@ 2011-06-28  8:30       ` Michael S. Tsirkin
  2011-06-28  9:20         ` Jan Kiszka
  0 siblings, 1 reply; 48+ messages in thread
From: Michael S. Tsirkin @ 2011-06-28  8:30 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson

On Tue, Jun 28, 2011 at 10:18:20AM +0200, Jan Kiszka wrote:
> > Is this an optimization? What harm does it do to virtualize always?
> 
> Just replicating the old behavior,

Oh, I absolutely agree with that.
It's better to do more cleanups in separate patches,
this one is huge so it's better if bisect can lead us to
a specific change.

> but I can virtualize it unconditionally.


Just pointing out follow-up
cleanups that become possible.

> > 
> >>  
> >>      return 0;
> >> @@ -1694,6 +1571,26 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
> >>          return -1;
> >>      }
> >>  
> >> +    /*
> >> +     * Set up basic config space access control. Will be further refined during
> >> +     * device initialization.
> >> +     *
> >> +     * Direct read/write access is grangted for:
> > 
> > typo
> > 
> >> +     *  - status & command registers, revision ID & class code, cacheline size,
> >> +     *    latency timer, header type, BIST
> >> +     *  - cardbus CIS pointer, subsystem vendor & ID
> >> +     *  - reserved fields
> >> +     *  - Min_Gnt & Max_Lat
> >> +     */
> >> +    memset(dev->direct_config_read + PCI_COMMAND, 0xff, 12);
> >> +    memset(dev->direct_config_read + PCI_CARDBUS_CIS, 0xff, 8);
> >> +    memset(dev->direct_config_read + PCI_CAPABILITY_LIST + 1, 0xff, 7);
> >> +    memset(dev->direct_config_read + PCI_MIN_GNT, 0xff, 2);
> >> +    memset(dev->direct_config_write + PCI_COMMAND, 0xff, 12);
> >> +    memset(dev->direct_config_write + PCI_CARDBUS_CIS, 0xff, 8);
> >> +    memset(dev->direct_config_write + PCI_CAPABILITY_LIST + 1, 0xff, 7);
> >> +    memset(dev->direct_config_write + PCI_MIN_GNT, 0xff, 2);
> >> +
> >>      if (get_real_device(dev, dev->host.seg, dev->host.bus,
> >>                          dev->host.dev, dev->host.func)) {
> >>          error_report("pci-assign: Error: Couldn't get real device (%s)!",
> >> diff --git a/hw/device-assignment.h b/hw/device-assignment.h
> >> index 3d20207..ed5defb 100644
> >> --- a/hw/device-assignment.h
> >> +++ b/hw/device-assignment.h
> >> @@ -104,12 +104,13 @@ typedef struct AssignedDevice {
> >>  #define ASSIGNED_DEVICE_MSIX_MASKED (1 << 2)
> >>          uint32_t state;
> >>      } cap;
> >> +    uint8_t direct_config_read[PCI_CONFIG_SPACE_SIZE];
> >> +    uint8_t direct_config_write[PCI_CONFIG_SPACE_SIZE];
> > 
> > What direct means isn't obvious. Add a comment?
> 
> Ok.
> 
> > 
> > We could revert the logic, have bits for
> > emulate_config_read/emulate_config_write.  I think that most of the
> > registers can safely be read/written directly, so protecting relevant
> > bits will be less work. Is that true?
> 
> I intentionally chose whitelisting to avoid accidentally granting access
> to forgotten fields. I don't think it's a good idea to switch to
> blacklisting.
> 
> Jan
> 

At least in theory iommu protects us from most harm device can do
(besides downstream transactions, which is why we must protect the BAR).
Even if we change things, it probably should be a separate patch since
whitelisting is what existing code had.
However, I think emulate_xx is just a better name for the field.
You can preset them to all ones if you think it's safer.

-- 
MST

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 12/13] pci-assign: Generic config space access management
  2011-06-27 18:19 ` [PATCH 12/13] pci-assign: Generic config space access management Jan Kiszka
                     ` (2 preceding siblings ...)
  2011-06-28  8:10   ` Michael S. Tsirkin
@ 2011-06-28  8:51   ` Michael S. Tsirkin
  2011-06-28  9:10     ` Avi Kivity
  3 siblings, 1 reply; 48+ messages in thread
From: Michael S. Tsirkin @ 2011-06-28  8:51 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson

On Mon, Jun 27, 2011 at 08:19:55PM +0200, Jan Kiszka wrote:
> -/* There can be multiple VNDR capabilities per device, we need to find the
> - * one that starts closet to the given address without going over. */
> -static uint8_t find_vndr_start(PCIDevice *pci_dev, uint32_t address)
> +static uint32_t assigned_dev_pci_read_config(PCIDevice *pci_dev,
> +                                             uint32_t address, int len)
>  {
> -    uint8_t cap, pos;
> +    AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
> +    uint32_t virt_val = pci_default_read_config(pci_dev, address, len);
> +    uint32_t real_val, direct_mask;
>  
> -    for (cap = pos = 0;
> -         (pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_VNDR, pos));
> -         pos += PCI_CAP_LIST_NEXT) {
> -        if (pos <= address) {
> -            cap = MAX(pos, cap);
> -        }
> +    switch (len) {
> +    case 4:
> +        direct_mask =
> +            pci_get_long(assigned_dev->direct_config_read + address);
> +        break;
> +    case 2:
> +        direct_mask =
> +            pci_get_word(assigned_dev->direct_config_read + address);
> +        break;
> +    case 1:
> +        direct_mask =
> +            pci_get_byte(assigned_dev->direct_config_read + address);
> +        break;
> +    default:
> +        abort();
>      }

One small issue here is that I think read config is not guaranteed to
get a length aligned address, while pci_get_XXX assume length
alignment. Further, because of alignment you can get beyond
the end of array.

I don't think it ever happens in practice on the PC, so we can
add an assert, but it's might be easier to look at how
pci_default_read_config handles all these issues:

    uint32_t val = 0;
    assert(len == 1 || len == 2 || len == 4);
    len = MIN(len, pci_config_size(d) - address);
    memcpy(&val, d->config + address, len);
    return le32_to_cpu(val);

You can replace config with direct access, and check full access
with:

	allones = ~0x0;
	memcmp(&allones, &val, len)
and full emulation:
	zero = 0x0;
	memcmp(&zero, &val, len)

This is not done correctly by current code btw, I think.

> -    return cap;
> -}
> -
> -static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
> -                                                    uint32_t address, int len)
> -{
> -    uint8_t cap, cap_id = pci_dev->config_map[address];
> -    uint32_t val;
> -
> -    switch (cap_id) {
> -
> -    case PCI_CAP_ID_VPD:
> -        cap = pci_find_capability(pci_dev, cap_id);
> -        val = assigned_dev_pci_read(pci_dev, address, len);
> -        return merge_bits(val, pci_get_long(pci_dev->config + address),
> -                          address, len, cap + PCI_CAP_LIST_NEXT, 0xff);
> -
> -    case PCI_CAP_ID_VNDR:
> -        cap = find_vndr_start(pci_dev, address);
> -        val = assigned_dev_pci_read(pci_dev, address, len);
> -        return merge_bits(val, pci_get_long(pci_dev->config + address),
> -                          address, len, cap + PCI_CAP_LIST_NEXT, 0xff);
> +    if (direct_mask) {
> +        real_val = assigned_dev_pci_read(pci_dev, address, len);
> +        return (virt_val & ~direct_mask) | (real_val & direct_mask);
> +    } else {
> +        return virt_val;
>      }
> -
> -    return pci_default_read_config(pci_dev, address, len);
>  }
>  
> -static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
> -                                                 uint32_t address,
> -                                                 uint32_t val, int len)
> +static void assigned_dev_pci_write_config(PCIDevice *pci_dev, uint32_t address,
> +                                          uint32_t val, int len)
>  {
> -    uint8_t cap_id = pci_dev->config_map[address];
> +    AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
> +    uint32_t direct_mask, full_access;
>  
>      pci_default_write_config(pci_dev, address, val, len);
> -    switch (cap_id) {
> -    case PCI_CAP_ID_MSI:
> +
> +    if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) {
>          if (range_covers_byte(address, len,
>                                pci_dev->msi_cap + PCI_MSI_FLAGS)) {
>              assigned_dev_update_msi(pci_dev);
>          }
> -        break;
> -
> -    case PCI_CAP_ID_MSIX:
> +    } else if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX) {
>          if (range_covers_byte(address, len,
>                                pci_dev->msix_cap + PCI_MSIX_FLAGS + 1)) {
>              assigned_dev_update_msix(pci_dev);
>          }
> -        break;
> +    }
>  
> -    case PCI_CAP_ID_VPD:
> -    case PCI_CAP_ID_VNDR:
> -        assigned_dev_pci_write(pci_dev, address, val, len);
> +    switch (len) {
> +    case 4:
> +        direct_mask =
> +            pci_get_long(assigned_dev->direct_config_write + address);
> +        full_access = 0xffffffff;
> +        break;
> +    case 2:
> +        direct_mask =
> +            pci_get_word(assigned_dev->direct_config_write + address);
> +        full_access = 0xffff;
>          break;
> +    case 1:
> +        direct_mask =
> +            pci_get_byte(assigned_dev->direct_config_write + address);
> +        full_access = 0xff;
> +        break;
> +    default:
> +        abort();
> +    }
> +    if (direct_mask != full_access) {
> +        val &= direct_mask;
> +        val |= assigned_dev_pci_read(pci_dev, address, len) & ~direct_mask;
>      }
> +    assigned_dev_pci_write(pci_dev, address, val, len);
>  }
>  

I need to think about it a bit more, but did you consider
that with VPD reads have side-effects? I'm not
saying I see a bug but it makes me a bit nervious.

Specifically, if direct mask disables access
to a specific byte, you will still happily
go ahead and read it from the device if there's a word access that
covers it.

>  static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
> @@ -1404,6 +1247,10 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>              return ret;
>          }
>  
> +        /* direct read except for next cap pointer */
> +        memset(dev->direct_config_read + pos, 0xff, PCI_PM_SIZEOF);
> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
> +
>          pmc = pci_get_word(pci_dev->config + pos + PCI_CAP_FLAGS);
>          pmc &= (PCI_PM_CAP_VER_MASK | PCI_PM_CAP_DSI);
>          pci_set_word(pci_dev->config + pos + PCI_CAP_FLAGS, pmc);
> @@ -1438,6 +1285,10 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>              return ret;
>          }
>  
> +        /* direct read except for next cap pointer */
> +        memset(dev->direct_config_read + pos, 0xff, size);
> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
> +
>          type = pci_get_word(pci_dev->config + pos + PCI_EXP_FLAGS);
>          type = (type & PCI_EXP_FLAGS_TYPE) >> 8;
>          if (type != PCI_EXP_TYPE_ENDPOINT &&
> @@ -1513,6 +1364,10 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>              return ret;
>          }
>  
> +        /* direct read except for next cap pointer */
> +        memset(dev->direct_config_read + pos, 0xff, 8);
> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
> +
>          /* Command register, clear upper bits, including extended modes */
>          cmd = pci_get_word(pci_dev->config + pos + PCI_X_CMD);
>          cmd &= (PCI_X_CMD_DPERR_E | PCI_X_CMD_ERO | PCI_X_CMD_MAX_READ |
> @@ -1534,6 +1389,13 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>          if ((ret = pci_add_capability(pci_dev, PCI_CAP_ID_VPD, pos, 8)) < 0) {
>              return ret;
>          }
> +
> +        /* direct read except for next cap pointer */
> +        memset(dev->direct_config_read + pos, 0xff, 8);
> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
> +
> +        /* direct write for cap content */
> +        memset(dev->direct_config_write + pos + 2, 0xff, 6);
>      }
>  
>      /* Devices can have multiple vendor capabilities, get them all */
> @@ -1545,6 +1407,21 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>                                        pos, len)) < 0) {
>              return ret;
>          }
> +
> +        /* direct read except for next cap pointer */
> +        memset(dev->direct_config_read + pos, 0xff, len);
> +        dev->direct_config_read[pos + PCI_CAP_LIST_NEXT] = 0;
> +
> +        /* direct write for cap content */
> +        memset(dev->direct_config_write + pos + 2, 0xff, len - 2);
> +    }
> +
> +    /* If real and virtual capability list status bits differ, virtualize the
> +     * access. */
> +    if ((pci_get_word(pci_dev->config + PCI_STATUS) & PCI_STATUS_CAP_LIST) !=
> +        (assigned_dev_pci_read_byte(pci_dev, PCI_STATUS) &
> +         PCI_STATUS_CAP_LIST)) {
> +        dev->direct_config_read[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST;
>      }
>  
>      return 0;
> @@ -1694,6 +1571,26 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
>          return -1;
>      }
>  
> +    /*
> +     * Set up basic config space access control. Will be further refined during
> +     * device initialization.
> +     *
> +     * Direct read/write access is grangted for:
> +     *  - status & command registers, revision ID & class code, cacheline size,
> +     *    latency timer, header type, BIST
> +     *  - cardbus CIS pointer, subsystem vendor & ID
> +     *  - reserved fields
> +     *  - Min_Gnt & Max_Lat
> +     */
> +    memset(dev->direct_config_read + PCI_COMMAND, 0xff, 12);
> +    memset(dev->direct_config_read + PCI_CARDBUS_CIS, 0xff, 8);
> +    memset(dev->direct_config_read + PCI_CAPABILITY_LIST + 1, 0xff, 7);
> +    memset(dev->direct_config_read + PCI_MIN_GNT, 0xff, 2);
> +    memset(dev->direct_config_write + PCI_COMMAND, 0xff, 12);
> +    memset(dev->direct_config_write + PCI_CARDBUS_CIS, 0xff, 8);
> +    memset(dev->direct_config_write + PCI_CAPABILITY_LIST + 1, 0xff, 7);
> +    memset(dev->direct_config_write + PCI_MIN_GNT, 0xff, 2);
> +
>      if (get_real_device(dev, dev->host.seg, dev->host.bus,
>                          dev->host.dev, dev->host.func)) {
>          error_report("pci-assign: Error: Couldn't get real device (%s)!",
> diff --git a/hw/device-assignment.h b/hw/device-assignment.h
> index 3d20207..ed5defb 100644
> --- a/hw/device-assignment.h
> +++ b/hw/device-assignment.h
> @@ -104,12 +104,13 @@ typedef struct AssignedDevice {
>  #define ASSIGNED_DEVICE_MSIX_MASKED (1 << 2)
>          uint32_t state;
>      } cap;
> +    uint8_t direct_config_read[PCI_CONFIG_SPACE_SIZE];
> +    uint8_t direct_config_write[PCI_CONFIG_SPACE_SIZE];
>      int irq_entries_nr;
>      struct kvm_irq_routing_entry *entry;
>      void *msix_table_page;
>      target_phys_addr_t msix_table_addr;
>      int mmio_index;
> -    uint32_t emulate_cmd_mask;
>      char *configfd_name;
>      int32_t bootindex;
>      QLIST_ENTRY(AssignedDevice) next;
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 09/13] pci-assign: Drop libpci header dependency
  2011-06-27 18:19 ` [PATCH 09/13] pci-assign: Drop libpci header dependency Jan Kiszka
@ 2011-06-28  8:54   ` Michael S. Tsirkin
  0 siblings, 0 replies; 48+ messages in thread
From: Michael S. Tsirkin @ 2011-06-28  8:54 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson

On Mon, Jun 27, 2011 at 08:19:52PM +0200, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
> 
> All constants are now available through QEMU. Also drop the upstream
> diff of pci_regs.h, it cannot clash with libpci anymore.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

Yay.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 00/13] qemu-kvm: device assignment cleanups and upstream diff reductions
  2011-06-28  8:10 ` [PATCH 00/13] qemu-kvm: device assignment cleanups and upstream diff reductions Avi Kivity
@ 2011-06-28  8:57   ` Michael S. Tsirkin
  0 siblings, 0 replies; 48+ messages in thread
From: Michael S. Tsirkin @ 2011-06-28  8:57 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Jan Kiszka, Marcelo Tosatti, kvm, Alex Williamson, Joerg Roedel

On Tue, Jun 28, 2011 at 11:10:04AM +0300, Avi Kivity wrote:
> On 06/27/2011 09:19 PM, Jan Kiszka wrote:
> >This series basically consists of two halves. The first one applies a
> >few smaller cleanups to qemu-kvm to improve similarity with upstream.
> >This includes the recently discussed removal of -enable-nesting.
> >
> >The second half starts with two device assignment fixes and then applies
> >some refactorings, specifically dropping the libpci dependency and
> >enabling the revert of some qemu-kvm private PCI core changes. The
> >latter is achieved by simplifying the access control management to the
> >passed-through device's config space. That also saves 100 LOC.
> >
> >Note that this series was tested with upstream commit af2be20777 ("Fix
> >fallouts from Linux header inclusion") applied and probably depends on
> >it also mechanically.
> >
> >Please review/merge.
> >
> 
> Thanks, applied 1-6.  The rest looks good as well, just need the few
> comments attended to.

7-11 look good to me too.

> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 13/13] qemu-kvm: Resolve PCI upstream diffs
  2011-06-27 18:19 ` [PATCH 13/13] qemu-kvm: Resolve PCI upstream diffs Jan Kiszka
@ 2011-06-28  8:58   ` Michael S. Tsirkin
  2011-06-28  9:12     ` Jan Kiszka
  0 siblings, 1 reply; 48+ messages in thread
From: Michael S. Tsirkin @ 2011-06-28  8:58 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson

On Mon, Jun 27, 2011 at 08:19:56PM +0200, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
> 
> Device assignment no longer peeks into config_map, so we can drop all
> the related changes and sync the PCI core with upstream.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

That's great because I was looking at refactoring that code.
Will make merges much less painful.

> ---
>  hw/pci.c |   29 +++++++++++++++--------------
>  hw/pci.h |    8 ++++++--
>  2 files changed, 21 insertions(+), 16 deletions(-)
> 
> diff --git a/hw/pci.c b/hw/pci.c
> index 5deaa04..a637506 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -793,7 +793,7 @@ static void pci_config_alloc(PCIDevice *pci_dev)
>      pci_dev->cmask = qemu_mallocz(config_size);
>      pci_dev->wmask = qemu_mallocz(config_size);
>      pci_dev->w1cmask = qemu_mallocz(config_size);
> -    pci_dev->config_map = qemu_mallocz(config_size);
> +    pci_dev->used = qemu_mallocz(config_size);
>  }
>  
>  static void pci_config_free(PCIDevice *pci_dev)
> @@ -802,7 +802,7 @@ static void pci_config_free(PCIDevice *pci_dev)
>      qemu_free(pci_dev->cmask);
>      qemu_free(pci_dev->wmask);
>      qemu_free(pci_dev->w1cmask);
> -    qemu_free(pci_dev->config_map);
> +    qemu_free(pci_dev->used);
>  }
>  
>  /* -1 for devfn means auto assign */
> @@ -833,8 +833,6 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
>      pci_dev->irq_state = 0;
>      pci_config_alloc(pci_dev);
>  
> -    memset(pci_dev->config_map, 0xff, PCI_CONFIG_HEADER_SIZE);
> -
>      pci_config_set_vendor_id(pci_dev->config, info->vendor_id);
>      pci_config_set_device_id(pci_dev->config, info->device_id);
>      pci_config_set_revision(pci_dev->config, info->revision);
> @@ -1899,7 +1897,7 @@ static int pci_find_space(PCIDevice *pdev, uint8_t size)
>      int offset = PCI_CONFIG_HEADER_SIZE;
>      int i;
>      for (i = PCI_CONFIG_HEADER_SIZE; i < config_size; ++i)
> -        if (pdev->config_map[i])
> +        if (pdev->used[i])
>              offset = i + 1;
>          else if (i - offset + 1 == size)
>              return offset;
> @@ -2080,13 +2078,13 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>          int i;
>  
>          for (i = offset; i < offset + size; i++) {
> -            if (pdev->config_map[i]) {
> +            if (pdev->used[i]) {
>                  fprintf(stderr, "ERROR: %04x:%02x:%02x.%x "
>                          "Attempt to add PCI capability %x at offset "
>                          "%x overlaps existing capability %x at offset %x\n",
>                          pci_find_domain(pdev->bus), pci_bus_num(pdev->bus),
>                          PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn),
> -                        cap_id, offset, pdev->config_map[i], i);
> +                        cap_id, offset, pdev->used[i], i);
>                  return -EINVAL;
>              }
>          }
> @@ -2096,14 +2094,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>      config[PCI_CAP_LIST_ID] = cap_id;
>      config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>      pdev->config[PCI_CAPABILITY_LIST] = offset;
> -    memset(pdev->config_map + offset, cap_id, size);
> +    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
> +    memset(pdev->used + offset, 0xFF, size);
>      /* Make capability read-only by default */
>      memset(pdev->wmask + offset, 0, size);
>      /* Check capability by default */
>      memset(pdev->cmask + offset, 0xFF, size);
> -
> -    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
> -
>      return offset;
>  }
>  
> @@ -2119,11 +2115,16 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size)
>      memset(pdev->w1cmask + offset, 0, size);
>      /* Clear cmask as device-specific registers can't be checked */
>      memset(pdev->cmask + offset, 0, size);
> -    memset(pdev->config_map + offset, 0, size);
> +    memset(pdev->used + offset, 0, size);
>  
> -    if (!pdev->config[PCI_CAPABILITY_LIST]) {
> +    if (!pdev->config[PCI_CAPABILITY_LIST])
>          pdev->config[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST;
> -    }
> +}
> +
> +/* Reserve space for capability at a known offset (to call after load). */
> +void pci_reserve_capability(PCIDevice *pdev, uint8_t offset, uint8_t size)
> +{
> +    memset(pdev->used + offset, 0xff, size);
>  }
>  
>  uint8_t pci_find_capability(PCIDevice *pdev, uint8_t cap_id)
> diff --git a/hw/pci.h b/hw/pci.h
> index c7081bc..ed734b3 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -146,8 +146,8 @@ struct PCIDevice {
>      /* Used to implement RW1C(Write 1 to Clear) bytes */
>      uint8_t *w1cmask;
>  
> -    /* Used to allocate config space and track capabilities. */
> -    uint8_t *config_map;
> +    /* Used to allocate config space capabilities. */
> +    uint8_t *used;
>  
>      /* the following fields are read only */
>      PCIBus *bus;
> @@ -233,14 +233,18 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>  
>  void pci_del_capability(PCIDevice *pci_dev, uint8_t cap_id, uint8_t cap_size);
>  
> +void pci_reserve_capability(PCIDevice *pci_dev, uint8_t offset, uint8_t size);
> +
>  uint8_t pci_find_capability(PCIDevice *pci_dev, uint8_t cap_id);
>  
> +
>  uint32_t pci_default_read_config(PCIDevice *d,
>                                   uint32_t address, int len);
>  void pci_default_write_config(PCIDevice *d,
>                                uint32_t address, uint32_t val, int len);
>  void pci_device_save(PCIDevice *s, QEMUFile *f);
>  int pci_device_load(PCIDevice *s, QEMUFile *f);
> +
>  typedef void (*pci_set_irq_fn)(void *opaque, int irq_num, int level);
>  typedef int (*pci_map_irq_fn)(PCIDevice *pci_dev, int irq_num);
>  
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 12/13] pci-assign: Generic config space access management
  2011-06-28  8:51   ` Michael S. Tsirkin
@ 2011-06-28  9:10     ` Avi Kivity
  0 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2011-06-28  9:10 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jan Kiszka, Marcelo Tosatti, kvm, Alex Williamson

On 06/28/2011 11:51 AM, Michael S. Tsirkin wrote:
> On Mon, Jun 27, 2011 at 08:19:55PM +0200, Jan Kiszka wrote:
> >  -/* There can be multiple VNDR capabilities per device, we need to find the
> >  - * one that starts closet to the given address without going over. */
> >  -static uint8_t find_vndr_start(PCIDevice *pci_dev, uint32_t address)
> >  +static uint32_t assigned_dev_pci_read_config(PCIDevice *pci_dev,
> >  +                                             uint32_t address, int len)
> >   {
> >  -    uint8_t cap, pos;
> >  +    AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
> >  +    uint32_t virt_val = pci_default_read_config(pci_dev, address, len);
> >  +    uint32_t real_val, direct_mask;
> >
> >  -    for (cap = pos = 0;
> >  -         (pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_VNDR, pos));
> >  -         pos += PCI_CAP_LIST_NEXT) {
> >  -        if (pos<= address) {
> >  -            cap = MAX(pos, cap);
> >  -        }
> >  +    switch (len) {
> >  +    case 4:
> >  +        direct_mask =
> >  +            pci_get_long(assigned_dev->direct_config_read + address);
> >  +        break;
> >  +    case 2:
> >  +        direct_mask =
> >  +            pci_get_word(assigned_dev->direct_config_read + address);
> >  +        break;
> >  +    case 1:
> >  +        direct_mask =
> >  +            pci_get_byte(assigned_dev->direct_config_read + address);
> >  +        break;
> >  +    default:
> >  +        abort();
> >       }
>
> One small issue here is that I think read config is not guaranteed to
> get a length aligned address, while pci_get_XXX assume length
> alignment. Further, because of alignment you can get beyond
> the end of array.
>
> I don't think it ever happens in practice on the PC,

It cannot happen with CF8/CFC configuration (bits 1:0 of CF8 are 
reserved), but it can happen with mmconfig.

> so we can
> add an assert, but it's might be easier to look at how
> pci_default_read_config handles all these issues:
>
>      uint32_t val = 0;
>      assert(len == 1 || len == 2 || len == 4);
>      len = MIN(len, pci_config_size(d) - address);
>      memcpy(&val, d->config + address, len);
>      return le32_to_cpu(val);
>
> You can replace config with direct access, and check full access
> with:
>
> 	allones = ~0x0;
> 	memcmp(&allones,&val, len)
> and full emulation:
> 	zero = 0x0;
> 	memcmp(&zero,&val, len)
>
> This is not done correctly by current code btw, I think.


I think this should be done by the bus code.

On the original Pentium, a 4-byte write to address 7 would be issued as 
a write to address 0 with #BE7 (byte enable 7) asserted, then a write to 
address 0 with #BE0, #BE1, and #B2 asserted.  That is, a one-byte write 
followed by a 3-byte write.

The memory API has support for this (only at the device level at 
present, but shouldn't be to difficult to implement at the bus level as 
well).

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 13/13] qemu-kvm: Resolve PCI upstream diffs
  2011-06-28  8:58   ` Michael S. Tsirkin
@ 2011-06-28  9:12     ` Jan Kiszka
  2011-06-28  9:22       ` Michael S. Tsirkin
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Kiszka @ 2011-06-28  9:12 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson

[-- Attachment #1: Type: text/plain, Size: 667 bytes --]

On 2011-06-28 10:58, Michael S. Tsirkin wrote:
> On Mon, Jun 27, 2011 at 08:19:56PM +0200, Jan Kiszka wrote:
>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>
>> Device assignment no longer peeks into config_map, so we can drop all
>> the related changes and sync the PCI core with upstream.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> 
> That's great because I was looking at refactoring that code.
> Will make merges much less painful.

I have more refactorings pending, though at best half-done. MSI/MSI-X
will be generalized, also to make it usable for pci assignment. Are we
working on orthogonal or overlapping topics here?

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 12/13] pci-assign: Generic config space access management
  2011-06-28  8:30       ` Michael S. Tsirkin
@ 2011-06-28  9:20         ` Jan Kiszka
  0 siblings, 0 replies; 48+ messages in thread
From: Jan Kiszka @ 2011-06-28  9:20 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson

[-- Attachment #1: Type: text/plain, Size: 3391 bytes --]

On 2011-06-28 10:30, Michael S. Tsirkin wrote:
> On Tue, Jun 28, 2011 at 10:18:20AM +0200, Jan Kiszka wrote:
>>> Is this an optimization? What harm does it do to virtualize always?
>>
>> Just replicating the old behavior,
> 
> Oh, I absolutely agree with that.
> It's better to do more cleanups in separate patches,
> this one is huge so it's better if bisect can lead us to
> a specific change.
> 
>> but I can virtualize it unconditionally.
> 
> 
> Just pointing out follow-up
> cleanups that become possible.
> 
>>>
>>>>  
>>>>      return 0;
>>>> @@ -1694,6 +1571,26 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
>>>>          return -1;
>>>>      }
>>>>  
>>>> +    /*
>>>> +     * Set up basic config space access control. Will be further refined during
>>>> +     * device initialization.
>>>> +     *
>>>> +     * Direct read/write access is grangted for:
>>>
>>> typo
>>>
>>>> +     *  - status & command registers, revision ID & class code, cacheline size,
>>>> +     *    latency timer, header type, BIST
>>>> +     *  - cardbus CIS pointer, subsystem vendor & ID
>>>> +     *  - reserved fields
>>>> +     *  - Min_Gnt & Max_Lat
>>>> +     */
>>>> +    memset(dev->direct_config_read + PCI_COMMAND, 0xff, 12);
>>>> +    memset(dev->direct_config_read + PCI_CARDBUS_CIS, 0xff, 8);
>>>> +    memset(dev->direct_config_read + PCI_CAPABILITY_LIST + 1, 0xff, 7);
>>>> +    memset(dev->direct_config_read + PCI_MIN_GNT, 0xff, 2);
>>>> +    memset(dev->direct_config_write + PCI_COMMAND, 0xff, 12);
>>>> +    memset(dev->direct_config_write + PCI_CARDBUS_CIS, 0xff, 8);
>>>> +    memset(dev->direct_config_write + PCI_CAPABILITY_LIST + 1, 0xff, 7);
>>>> +    memset(dev->direct_config_write + PCI_MIN_GNT, 0xff, 2);
>>>> +
>>>>      if (get_real_device(dev, dev->host.seg, dev->host.bus,
>>>>                          dev->host.dev, dev->host.func)) {
>>>>          error_report("pci-assign: Error: Couldn't get real device (%s)!",
>>>> diff --git a/hw/device-assignment.h b/hw/device-assignment.h
>>>> index 3d20207..ed5defb 100644
>>>> --- a/hw/device-assignment.h
>>>> +++ b/hw/device-assignment.h
>>>> @@ -104,12 +104,13 @@ typedef struct AssignedDevice {
>>>>  #define ASSIGNED_DEVICE_MSIX_MASKED (1 << 2)
>>>>          uint32_t state;
>>>>      } cap;
>>>> +    uint8_t direct_config_read[PCI_CONFIG_SPACE_SIZE];
>>>> +    uint8_t direct_config_write[PCI_CONFIG_SPACE_SIZE];
>>>
>>> What direct means isn't obvious. Add a comment?
>>
>> Ok.
>>
>>>
>>> We could revert the logic, have bits for
>>> emulate_config_read/emulate_config_write.  I think that most of the
>>> registers can safely be read/written directly, so protecting relevant
>>> bits will be less work. Is that true?
>>
>> I intentionally chose whitelisting to avoid accidentally granting access
>> to forgotten fields. I don't think it's a good idea to switch to
>> blacklisting.
>>
>> Jan
>>
> 
> At least in theory iommu protects us from most harm device can do
> (besides downstream transactions, which is why we must protect the BAR).
> Even if we change things, it probably should be a separate patch since
> whitelisting is what existing code had.
> However, I think emulate_xx is just a better name for the field.
> You can preset them to all ones if you think it's safer.

OK, that sounds reasonable.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 13/13] qemu-kvm: Resolve PCI upstream diffs
  2011-06-28  9:12     ` Jan Kiszka
@ 2011-06-28  9:22       ` Michael S. Tsirkin
  0 siblings, 0 replies; 48+ messages in thread
From: Michael S. Tsirkin @ 2011-06-28  9:22 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson

On Tue, Jun 28, 2011 at 11:12:54AM +0200, Jan Kiszka wrote:
> On 2011-06-28 10:58, Michael S. Tsirkin wrote:
> > On Mon, Jun 27, 2011 at 08:19:56PM +0200, Jan Kiszka wrote:
> >> From: Jan Kiszka <jan.kiszka@siemens.com>
> >>
> >> Device assignment no longer peeks into config_map, so we can drop all
> >> the related changes and sync the PCI core with upstream.
> >>
> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> > 
> > That's great because I was looking at refactoring that code.
> > Will make merges much less painful.
> 
> I have more refactorings pending, though at best half-done. MSI/MSI-X
> will be generalized, also to make it usable for pci assignment. Are we
> working on orthogonal or overlapping topics here?
> 
> Jan
> 

Mostly orthogonal I think.
I just wanted to get rid of the used array and
have devices always specify offsets when adding capabilities.

-- 
MST

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 03/13] qemu-kvm: Drop -enable-nesting command line switch
  2011-06-27 18:19 ` [PATCH 03/13] qemu-kvm: Drop -enable-nesting command line switch Jan Kiszka
@ 2011-06-28 10:48   ` Roedel, Joerg
  2011-06-28 10:52     ` Jan Kiszka
  0 siblings, 1 reply; 48+ messages in thread
From: Roedel, Joerg @ 2011-06-28 10:48 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Michael S. Tsirkin, Alex Williamson

On Mon, Jun 27, 2011 at 02:19:46PM -0400, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
> 
> Link it is already handling in upstream, this feature can be controlled
> via -cpu ...+/-svm.
> 
> CC: Joerg Roedel <Joerg.Roedel@amd.com>
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  qemu-kvm.h          |    7 -------
>  qemu-options.hx     |    2 --
>  target-i386/cpuid.c |    3 ---
>  vl.c                |    4 ----
>  4 files changed, 0 insertions(+), 16 deletions(-)
> 
> diff --git a/qemu-kvm.h b/qemu-kvm.h
> index f3f775c..845880e 100644
> --- a/qemu-kvm.h
> +++ b/qemu-kvm.h
> @@ -259,19 +259,12 @@ int kvm_update_ioport_access(CPUState *env);
>  int kvm_arch_set_ioport_access(unsigned long start, unsigned long size,
>                                 bool enable);
>  
> -#ifdef CONFIG_KVM
>  extern int kvm_irqchip;
>  extern int kvm_pit;
>  extern int kvm_pit_reinject;
> -extern int kvm_nested;
>  extern unsigned int kvm_shadow_memory;
>  
>  int kvm_handle_tpr_access(CPUState *env);
> -
> -#else
> -#define kvm_nested 0
> -#endif
> -
>  int kvm_tpr_enable_vapic(CPUState *env);
>  
>  #endif
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 9972dcf..6a6dc89 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -2420,8 +2420,6 @@ DEF("no-kvm-pit-reinjection", 0, QEMU_OPTION_no_kvm_pit_reinjection,
>      "-no-kvm-pit-reinjection\n"
>      "                disable KVM kernel mode PIT interrupt reinjection\n",
>      QEMU_ARCH_I386)
> -DEF("enable-nesting", 0, QEMU_OPTION_enable_nesting,
> -    "-enable-nesting enable support for running a VM inside the VM (AMD only)\n", QEMU_ARCH_I386)
>  DEF("nvram", HAS_ARG, QEMU_OPTION_nvram,
>      "-nvram FILE     provide ia64 nvram contents\n", QEMU_ARCH_ALL)
>  DEF("tdf", 0, QEMU_OPTION_tdf,
> diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
> index b5b563c..d1f3b43 100644
> --- a/target-i386/cpuid.c
> +++ b/target-i386/cpuid.c
> @@ -1238,9 +1238,6 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>  
>              /* disable CPU features that KVM cannot support */
>  
> -            /* svm */
> -            if (!kvm_nested)
> -                *ecx &= ~CPUID_EXT3_SVM;
>              /* 3dnow */
>              *edx &= ~0xc0000000;
>          }
> diff --git a/vl.c b/vl.c
> index 611dbd2..dd870e6 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -2717,10 +2717,6 @@ int main(int argc, char **argv, char **envp)
>                  kvm_pit_reinject = 0;
>                  break;
>              }
> -	    case QEMU_OPTION_enable_nesting: {
> -		kvm_nested = 1;
> -		break;
> -	    }
>  #endif
>              case QEMU_OPTION_usb:
>                  usb_enabled = 1;

Looks good to me too.


-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 03/13] qemu-kvm: Drop -enable-nesting command line switch
  2011-06-28 10:48   ` Roedel, Joerg
@ 2011-06-28 10:52     ` Jan Kiszka
  2011-06-28 11:45       ` Avi Kivity
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Kiszka @ 2011-06-28 10:52 UTC (permalink / raw)
  To: Roedel, Joerg, Avi Kivity
  Cc: Marcelo Tosatti, kvm, Michael S. Tsirkin, Alex Williamson

[-- Attachment #1: Type: text/plain, Size: 383 bytes --]

On 2011-06-28 12:48, Roedel, Joerg wrote:
> On Mon, Jun 27, 2011 at 02:19:46PM -0400, Jan Kiszka wrote:
>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>
>> Link it is already handling in upstream, this feature can be controlled

Ugh, I must have been drunken.

"Like it is already handled in upstream, ..."

Avi, already merged? Otherwise, please fix up.

Thanks,
Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 04/13] qemu-kvm: Remove eventfd compat header
  2011-06-27 18:19 ` [PATCH 04/13] qemu-kvm: Remove eventfd compat header Jan Kiszka
@ 2011-06-28 11:09   ` Michael S. Tsirkin
  2011-06-28 11:11     ` Jan Kiszka
  2011-07-03  9:46     ` Bernhard Held
  0 siblings, 2 replies; 48+ messages in thread
From: Michael S. Tsirkin @ 2011-06-28 11:09 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson

On Mon, Jun 27, 2011 at 08:19:47PM +0200, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
> 
> No longer used.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

I think it's actually handy to build on old systems which have
a recent enough kernel but not the header.
Does it hurt much to keep it around?

> ---
>  compat/sys/eventfd.h |   13 -------------
>  1 files changed, 0 insertions(+), 13 deletions(-)
>  delete mode 100644 compat/sys/eventfd.h
> 
> diff --git a/compat/sys/eventfd.h b/compat/sys/eventfd.h
> deleted file mode 100644
> index f55d96a..0000000
> --- a/compat/sys/eventfd.h
> +++ /dev/null
> @@ -1,13 +0,0 @@
> -#ifndef _COMPAT_SYS_EVENTFD
> -#define _COMPAT_SYS_EVENTFD
> -
> -#include <unistd.h>
> -#include <syscall.h>
> -
> -
> -static inline int eventfd (int count, int flags)
> -{
> -    return syscall(SYS_eventfd, count, flags);
> -}
> -
> -#endif
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 04/13] qemu-kvm: Remove eventfd compat header
  2011-06-28 11:09   ` Michael S. Tsirkin
@ 2011-06-28 11:11     ` Jan Kiszka
  2011-06-28 12:07       ` Michael S. Tsirkin
  2011-07-03  9:46     ` Bernhard Held
  1 sibling, 1 reply; 48+ messages in thread
From: Jan Kiszka @ 2011-06-28 11:11 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson

[-- Attachment #1: Type: text/plain, Size: 565 bytes --]

On 2011-06-28 13:09, Michael S. Tsirkin wrote:
> On Mon, Jun 27, 2011 at 08:19:47PM +0200, Jan Kiszka wrote:
>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>
>> No longer used.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> 
> I think it's actually handy to build on old systems which have
> a recent enough kernel but not the header.
> Does it hurt much to keep it around?

Unless I missed something, it's not part of any include search path. So
how can it help?

But if you think such a header is useful, then get it upstream.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 03/13] qemu-kvm: Drop -enable-nesting command line switch
  2011-06-28 10:52     ` Jan Kiszka
@ 2011-06-28 11:45       ` Avi Kivity
  0 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2011-06-28 11:45 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Roedel, Joerg, Marcelo Tosatti, kvm, Michael S. Tsirkin, Alex Williamson

On 06/28/2011 01:52 PM, Jan Kiszka wrote:
> On 2011-06-28 12:48, Roedel, Joerg wrote:
> >  On Mon, Jun 27, 2011 at 02:19:46PM -0400, Jan Kiszka wrote:
> >>  From: Jan Kiszka<jan.kiszka@siemens.com>
> >>
> >>  Link it is already handling in upstream, this feature can be controlled
>
> Ugh, I must have been drunken.
>
> "Like it is already handled in upstream, ..."
>
> Avi, already merged? Otherwise, please fix up.
>

Already merged, but also already fixed.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 04/13] qemu-kvm: Remove eventfd compat header
  2011-06-28 11:11     ` Jan Kiszka
@ 2011-06-28 12:07       ` Michael S. Tsirkin
  2011-06-28 12:11         ` Jan Kiszka
  0 siblings, 1 reply; 48+ messages in thread
From: Michael S. Tsirkin @ 2011-06-28 12:07 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson

On Tue, Jun 28, 2011 at 01:11:20PM +0200, Jan Kiszka wrote:
> On 2011-06-28 13:09, Michael S. Tsirkin wrote:
> > On Mon, Jun 27, 2011 at 08:19:47PM +0200, Jan Kiszka wrote:
> >> From: Jan Kiszka <jan.kiszka@siemens.com>
> >>
> >> No longer used.
> >>
> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> > 
> > I think it's actually handy to build on old systems which have
> > a recent enough kernel but not the header.
> > Does it hurt much to keep it around?
> 
> Unless I missed something, it's not part of any include search path. So
> how can it help?
> 

We used to have compat last on search path. Is that no longer so?
I can try and dig up an old host that has this configuration.

> But if you think such a header is useful, then get it upstream.
> 
> Jan

It's not upstream because we didn't used to have system
headers upstream. Now that we do it makes sense I think ...

-- 
MST

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 04/13] qemu-kvm: Remove eventfd compat header
  2011-06-28 12:07       ` Michael S. Tsirkin
@ 2011-06-28 12:11         ` Jan Kiszka
  2011-06-28 12:17           ` Michael S. Tsirkin
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Kiszka @ 2011-06-28 12:11 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson

[-- Attachment #1: Type: text/plain, Size: 848 bytes --]

On 2011-06-28 14:07, Michael S. Tsirkin wrote:
> On Tue, Jun 28, 2011 at 01:11:20PM +0200, Jan Kiszka wrote:
>> On 2011-06-28 13:09, Michael S. Tsirkin wrote:
>>> On Mon, Jun 27, 2011 at 08:19:47PM +0200, Jan Kiszka wrote:
>>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>>
>>>> No longer used.
>>>>
>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>
>>> I think it's actually handy to build on old systems which have
>>> a recent enough kernel but not the header.
>>> Does it hurt much to keep it around?
>>
>> Unless I missed something, it's not part of any include search path. So
>> how can it help?
>>
> 
> We used to have compat last on search path. Is that no longer so?

I did a grep for "compat" and found nothing suspicious in configure or
some makefile. So this file looked like dead bits to me.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 04/13] qemu-kvm: Remove eventfd compat header
  2011-06-28 12:11         ` Jan Kiszka
@ 2011-06-28 12:17           ` Michael S. Tsirkin
  2011-06-28 12:40             ` Jan Kiszka
  0 siblings, 1 reply; 48+ messages in thread
From: Michael S. Tsirkin @ 2011-06-28 12:17 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson

On Tue, Jun 28, 2011 at 02:11:23PM +0200, Jan Kiszka wrote:
> On 2011-06-28 14:07, Michael S. Tsirkin wrote:
> > On Tue, Jun 28, 2011 at 01:11:20PM +0200, Jan Kiszka wrote:
> >> On 2011-06-28 13:09, Michael S. Tsirkin wrote:
> >>> On Mon, Jun 27, 2011 at 08:19:47PM +0200, Jan Kiszka wrote:
> >>>> From: Jan Kiszka <jan.kiszka@siemens.com>
> >>>>
> >>>> No longer used.
> >>>>
> >>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >>>
> >>> I think it's actually handy to build on old systems which have
> >>> a recent enough kernel but not the header.
> >>> Does it hurt much to keep it around?
> >>
> >> Unless I missed something, it's not part of any include search path. So
> >> how can it help?
> >>
> > 
> > We used to have compat last on search path. Is that no longer so?
> 
> I did a grep for "compat" and found nothing suspicious in configure or
> some makefile. So this file looked like dead bits to me.
> 
> Jan
> 

  kvm_cflags="$kvm_cflags -idirafter $source_path/compat"


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 04/13] qemu-kvm: Remove eventfd compat header
  2011-06-28 12:17           ` Michael S. Tsirkin
@ 2011-06-28 12:40             ` Jan Kiszka
  0 siblings, 0 replies; 48+ messages in thread
From: Jan Kiszka @ 2011-06-28 12:40 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson

[-- Attachment #1: Type: text/plain, Size: 1269 bytes --]

On 2011-06-28 14:17, Michael S. Tsirkin wrote:
> On Tue, Jun 28, 2011 at 02:11:23PM +0200, Jan Kiszka wrote:
>> On 2011-06-28 14:07, Michael S. Tsirkin wrote:
>>> On Tue, Jun 28, 2011 at 01:11:20PM +0200, Jan Kiszka wrote:
>>>> On 2011-06-28 13:09, Michael S. Tsirkin wrote:
>>>>> On Mon, Jun 27, 2011 at 08:19:47PM +0200, Jan Kiszka wrote:
>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>>
>>>>>> No longer used.
>>>>>>
>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>
>>>>> I think it's actually handy to build on old systems which have
>>>>> a recent enough kernel but not the header.
>>>>> Does it hurt much to keep it around?
>>>>
>>>> Unless I missed something, it's not part of any include search path. So
>>>> how can it help?
>>>>
>>>
>>> We used to have compat last on search path. Is that no longer so?
>>
>> I did a grep for "compat" and found nothing suspicious in configure or
>> some makefile. So this file looked like dead bits to me.
>>
>> Jan
>>
> 
>   kvm_cflags="$kvm_cflags -idirafter $source_path/compat"
> 

I see. That was dropped with the recent upstream merge (c13a890960).

Please check what setup benefited from it, and then discuss if and how
to reintroduce it via upstream.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 04/13] qemu-kvm: Remove eventfd compat header
  2011-06-28 11:09   ` Michael S. Tsirkin
  2011-06-28 11:11     ` Jan Kiszka
@ 2011-07-03  9:46     ` Bernhard Held
  2011-07-03  9:54       ` Michael S. Tsirkin
  1 sibling, 1 reply; 48+ messages in thread
From: Bernhard Held @ 2011-07-03  9:46 UTC (permalink / raw)
  To: kvm; +Cc: Jan Kiszka, Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson

[-- Attachment #1: Type: text/plain, Size: 1484 bytes --]

Am 28.06.2011 13:09, schrieb Michael S. Tsirkin:
> On Mon, Jun 27, 2011 at 08:19:47PM +0200, Jan Kiszka wrote:
>> From: Jan Kiszka<jan.kiszka@siemens.com>
>>
>> No longer used.
>>
>> Signed-off-by: Jan Kiszka<jan.kiszka@siemens.com>
>
> I think it's actually handy to build on old systems which have
> a recent enough kernel but not the header.
Yep, I'm running recent qemu-kvm and kernels on CentOS 5.6.

The current eventfd.h was handy in the beginning, but some day I had big 
troubles with crashing qemu-kvm. It turned out that I had to switch from 
eventfd to eventfd2. IIRC the old eventfd doesn't care about the flags.

My version of eventfd.h is attached.

Bernhard

>> ---
>>   compat/sys/eventfd.h |   13 -------------
>>   1 files changed, 0 insertions(+), 13 deletions(-)
>>   delete mode 100644 compat/sys/eventfd.h
>>
>> diff --git a/compat/sys/eventfd.h b/compat/sys/eventfd.h
>> deleted file mode 100644
>> index f55d96a..0000000
>> --- a/compat/sys/eventfd.h
>> +++ /dev/null
>> @@ -1,13 +0,0 @@
>> -#ifndef _COMPAT_SYS_EVENTFD
>> -#define _COMPAT_SYS_EVENTFD
>> -
>> -#include<unistd.h>
>> -#include<syscall.h>
>> -
>> -
>> -static inline int eventfd (int count, int flags)
>> -{
>> -    return syscall(SYS_eventfd, count, flags);
>> -}
>> -
>> -#endif
>> --
>> 1.7.1
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


[-- Attachment #2: eventfd.h --]
[-- Type: text/x-chdr, Size: 599 bytes --]

#ifndef _COMPAT_SYS_EVENTFD
#define _COMPAT_SYS_EVENTFD

#include <unistd.h>
#include <syscall.h>

/* Flags for signalfd.  */
enum
  {
	EFD_SEMAPHORE = 1,
#define EFD_SEMAPHORE EFD_SEMAPHORE
	EFD_CLOEXEC = 02000000,
#define EFD_CLOEXEC EFD_CLOEXEC
	EFD_NONBLOCK = 04000
#define EFD_NONBLOCK EFD_NONBLOCK
};

#ifndef __NR_eventfd2
#if defined(__x86_64__)
#define __NR_eventfd2 290
#elif defined(__i386__)
#define __NR_eventfd2 328
#else
#error Cannot detect your architecture!
#endif
#endif

static inline int eventfd (int count, int flags)
{
	return syscall(__NR_eventfd2, count, flags);
}

#endif


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 04/13] qemu-kvm: Remove eventfd compat header
  2011-07-03  9:46     ` Bernhard Held
@ 2011-07-03  9:54       ` Michael S. Tsirkin
  2011-07-03  9:57         ` Michael S. Tsirkin
  0 siblings, 1 reply; 48+ messages in thread
From: Michael S. Tsirkin @ 2011-07-03  9:54 UTC (permalink / raw)
  To: Bernhard Held
  Cc: Jan Kiszka, Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson

On Sun, Jul 03, 2011 at 11:46:51AM +0200, Bernhard Held wrote:
> #ifndef _COMPAT_SYS_EVENTFD
> #define _COMPAT_SYS_EVENTFD
> 
> #include <unistd.h>
> #include <syscall.h>
> 
> /* Flags for signalfd.  */
> enum
>   {
> 	EFD_SEMAPHORE = 1,
> #define EFD_SEMAPHORE EFD_SEMAPHORE
> 	EFD_CLOEXEC = 02000000,
> #define EFD_CLOEXEC EFD_CLOEXEC
> 	EFD_NONBLOCK = 04000
> #define EFD_NONBLOCK EFD_NONBLOCK
> };
> 
> #ifndef __NR_eventfd2
> #if defined(__x86_64__)
> #define __NR_eventfd2 290
> #elif defined(__i386__)
> #define __NR_eventfd2 328
> #else
> #error Cannot detect your architecture!
> #endif
> #endif

Can't we use SYS_eventfd2 instead?

> static inline int eventfd (int count, int flags)
> {
> 	return syscall(__NR_eventfd2, count, flags);
> }
> 
> #endif
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 04/13] qemu-kvm: Remove eventfd compat header
  2011-07-03  9:54       ` Michael S. Tsirkin
@ 2011-07-03  9:57         ` Michael S. Tsirkin
  2011-07-03 18:31           ` Bernhard Held
  0 siblings, 1 reply; 48+ messages in thread
From: Michael S. Tsirkin @ 2011-07-03  9:57 UTC (permalink / raw)
  To: Bernhard Held
  Cc: Jan Kiszka, Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson

On Sun, Jul 03, 2011 at 12:54:06PM +0300, Michael S. Tsirkin wrote:
> On Sun, Jul 03, 2011 at 11:46:51AM +0200, Bernhard Held wrote:
> > #ifndef _COMPAT_SYS_EVENTFD
> > #define _COMPAT_SYS_EVENTFD
> > 
> > #include <unistd.h>
> > #include <syscall.h>
> > 
> > /* Flags for signalfd.  */
> > enum
> >   {
> > 	EFD_SEMAPHORE = 1,
> > #define EFD_SEMAPHORE EFD_SEMAPHORE
> > 	EFD_CLOEXEC = 02000000,
> > #define EFD_CLOEXEC EFD_CLOEXEC
> > 	EFD_NONBLOCK = 04000
> > #define EFD_NONBLOCK EFD_NONBLOCK
> > };
> > 
> > #ifndef __NR_eventfd2
> > #if defined(__x86_64__)
> > #define __NR_eventfd2 290
> > #elif defined(__i386__)
> > #define __NR_eventfd2 328
> > #else
> > #error Cannot detect your architecture!
> > #endif
> > #endif
> 
> Can't we use SYS_eventfd2 instead?

Looks like we can't, but __NR_eventfd2
does seem to exit?

> > static inline int eventfd (int count, int flags)
> > {
> > 	return syscall(__NR_eventfd2, count, flags);
> > }
> > 
> > #endif
> > 
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 04/13] qemu-kvm: Remove eventfd compat header
  2011-07-03  9:57         ` Michael S. Tsirkin
@ 2011-07-03 18:31           ` Bernhard Held
  2011-07-04 10:37             ` Michael S. Tsirkin
  0 siblings, 1 reply; 48+ messages in thread
From: Bernhard Held @ 2011-07-03 18:31 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jan Kiszka, Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson

Am 03.07.2011 11:57, schrieb Michael S. Tsirkin:
> On Sun, Jul 03, 2011 at 12:54:06PM +0300, Michael S. Tsirkin wrote:
>> On Sun, Jul 03, 2011 at 11:46:51AM +0200, Bernhard Held wrote:
>>> #ifndef _COMPAT_SYS_EVENTFD
>>> #define _COMPAT_SYS_EVENTFD
>>>
>>> #include<unistd.h>
>>> #include<syscall.h>
>>>
>>> /* Flags for signalfd.  */
>>> enum
>>>    {
>>> 	EFD_SEMAPHORE = 1,
>>> #define EFD_SEMAPHORE EFD_SEMAPHORE
>>> 	EFD_CLOEXEC = 02000000,
>>> #define EFD_CLOEXEC EFD_CLOEXEC
>>> 	EFD_NONBLOCK = 04000
>>> #define EFD_NONBLOCK EFD_NONBLOCK
>>> };
>>>
>>> #ifndef __NR_eventfd2
>>> #if defined(__x86_64__)
>>> #define __NR_eventfd2 290
>>> #elif defined(__i386__)
>>> #define __NR_eventfd2 328
>>> #else
>>> #error Cannot detect your architecture!
>>> #endif
>>> #endif
>>
>> Can't we use SYS_eventfd2 instead?
>
> Looks like we can't, but __NR_eventfd2
> does seem to exit?
SYS_eventfd2 is not defined on systems with old glibc. These numbers are 
working for me, for 32 and 64 bit linux. What means "__NR_eventfd2 does 
seem to exit"?

The eventfd() man page nicely documents the evolution:
http://www.kernel.org/doc/man-pages/online/pages/man2/eventfd.2.html
"eventfd() is available on Linux since kernel 2.6.22.  Working support 
is provided in glibc since version 2.8.  The eventfd2() system call (see 
NOTES) is available on Linux since kernel 2.6.27.  Since version 2.9, 
the glibc eventfd() wrapper will employ the eventfd2() system call, if 
it is supported by the kernel."

"There are two underlying Linux system calls: eventfd() and the more 
recent eventfd2(). The former system call does not implement a flags 
argument. The latter system call implements the flags values described 
above. The glibc wrapper function will use eventfd2() where it is 
available."

This means that the current compat header is buggy; qemu-kvm sets the 
flags argument, but the wrapper uses the old eventfd() system call.

Bernhard

>>> static inline int eventfd (int count, int flags)
>>> {
>>> 	return syscall(__NR_eventfd2, count, flags);
>>> }
>>>
>>> #endif
>>>
>>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 04/13] qemu-kvm: Remove eventfd compat header
  2011-07-03 18:31           ` Bernhard Held
@ 2011-07-04 10:37             ` Michael S. Tsirkin
  2011-07-04 12:13               ` Bernhard Held
  0 siblings, 1 reply; 48+ messages in thread
From: Michael S. Tsirkin @ 2011-07-04 10:37 UTC (permalink / raw)
  To: Bernhard Held
  Cc: Jan Kiszka, Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson

On Sun, Jul 03, 2011 at 08:31:11PM +0200, Bernhard Held wrote:
> SYS_eventfd2 is not defined on systems with old glibc. These numbers
> are working for me, for 32 and 64 bit linux. What means
> "__NR_eventfd2 does seem to exit"?

My system seems to have a definition for the 64 bit
_NR_eventfd2 in unistd.h. 
# grep __NR_eventfd2 /usr/include/*/*h
/usr/include/asm-x86_64/unistd.h:#define __NR_eventfd2          290
So I think we should use another name to avoid conflicts.


-- 
MST

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 04/13] qemu-kvm: Remove eventfd compat header
  2011-07-04 10:37             ` Michael S. Tsirkin
@ 2011-07-04 12:13               ` Bernhard Held
  2011-07-04 13:34                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 48+ messages in thread
From: Bernhard Held @ 2011-07-04 12:13 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jan Kiszka, Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson

Am 04.07.2011 12:37, schrieb Michael S. Tsirkin:
> On Sun, Jul 03, 2011 at 08:31:11PM +0200, Bernhard Held wrote:
>> SYS_eventfd2 is not defined on systems with old glibc. These numbers
>> are working for me, for 32 and 64 bit linux. What means
>> "__NR_eventfd2 does seem to exit"?
>
> My system seems to have a definition for the 64 bit
> _NR_eventfd2 in unistd.h.
> # grep __NR_eventfd2 /usr/include/*/*h
> /usr/include/asm-x86_64/unistd.h:#define __NR_eventfd2          290
> So I think we should use another name to avoid conflicts.

The definition of __NR_eventfd2 is embraced by

#ifndef __NR_eventfd2
   ...
#endif

which effectively avoids conflicts.

However the definition of the flags is unconditional, this is indeed buggy.

Best regards,
Bernhard

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 04/13] qemu-kvm: Remove eventfd compat header
  2011-07-04 12:13               ` Bernhard Held
@ 2011-07-04 13:34                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 48+ messages in thread
From: Michael S. Tsirkin @ 2011-07-04 13:34 UTC (permalink / raw)
  To: Bernhard Held
  Cc: Jan Kiszka, Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson

On Mon, Jul 04, 2011 at 02:13:36PM +0200, Bernhard Held wrote:
> Am 04.07.2011 12:37, schrieb Michael S. Tsirkin:
> >On Sun, Jul 03, 2011 at 08:31:11PM +0200, Bernhard Held wrote:
> >>SYS_eventfd2 is not defined on systems with old glibc. These numbers
> >>are working for me, for 32 and 64 bit linux. What means
> >>"__NR_eventfd2 does seem to exit"?
> >
> >My system seems to have a definition for the 64 bit
> >_NR_eventfd2 in unistd.h.
> ># grep __NR_eventfd2 /usr/include/*/*h
> >/usr/include/asm-x86_64/unistd.h:#define __NR_eventfd2          290
> >So I think we should use another name to avoid conflicts.
> 
> The definition of __NR_eventfd2 is embraced by
> 
> #ifndef __NR_eventfd2
>   ...
> #endif
> 
> which effectively avoids conflicts.

Not if the system header gets included afterwards.

> However the definition of the flags is unconditional, this is indeed buggy.
> 
> Best regards,
> Bernhard

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2011-07-04 13:34 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-27 18:19 [PATCH 00/13] qemu-kvm: device assignment cleanups and upstream diff reductions Jan Kiszka
2011-06-27 18:19 ` [PATCH 01/13] qemu-kvm: Reduce configure and Makefile.target diff to upstream Jan Kiszka
2011-06-27 18:19 ` [PATCH 02/13] qemu-kvm: Drop some no longer needed #ifdefs Jan Kiszka
2011-06-27 18:19 ` [PATCH 03/13] qemu-kvm: Drop -enable-nesting command line switch Jan Kiszka
2011-06-28 10:48   ` Roedel, Joerg
2011-06-28 10:52     ` Jan Kiszka
2011-06-28 11:45       ` Avi Kivity
2011-06-27 18:19 ` [PATCH 04/13] qemu-kvm: Remove eventfd compat header Jan Kiszka
2011-06-28 11:09   ` Michael S. Tsirkin
2011-06-28 11:11     ` Jan Kiszka
2011-06-28 12:07       ` Michael S. Tsirkin
2011-06-28 12:11         ` Jan Kiszka
2011-06-28 12:17           ` Michael S. Tsirkin
2011-06-28 12:40             ` Jan Kiszka
2011-07-03  9:46     ` Bernhard Held
2011-07-03  9:54       ` Michael S. Tsirkin
2011-07-03  9:57         ` Michael S. Tsirkin
2011-07-03 18:31           ` Bernhard Held
2011-07-04 10:37             ` Michael S. Tsirkin
2011-07-04 12:13               ` Bernhard Held
2011-07-04 13:34                 ` Michael S. Tsirkin
2011-06-27 18:19 ` [PATCH 05/13] qemu-kvm: Remove qemu_ram_unmap Jan Kiszka
2011-06-27 18:19 ` [PATCH 06/13] qemu-kvm: Drop or replace useless device-assignment.h inclusions Jan Kiszka
2011-06-27 18:19 ` [PATCH 07/13] pci-assign: Fix kvm_deassign_irq handling in assign_irq Jan Kiszka
2011-06-27 18:19 ` [PATCH 08/13] pci-assign: Update legacy interrupts only if used Jan Kiszka
2011-06-27 18:19 ` [PATCH 09/13] pci-assign: Drop libpci header dependency Jan Kiszka
2011-06-28  8:54   ` Michael S. Tsirkin
2011-06-27 18:19 ` [PATCH 10/13] pci-assign: Refactor calc_assigned_dev_id Jan Kiszka
2011-06-27 18:19 ` [PATCH 11/13] pci-assign: Track MSI/MSI-X capability position, clean up related code Jan Kiszka
2011-06-27 18:19 ` [PATCH 12/13] pci-assign: Generic config space access management Jan Kiszka
2011-06-27 20:54   ` Michael S. Tsirkin
2011-06-27 22:48   ` Alex Williamson
2011-06-28  7:08     ` Jan Kiszka
2011-06-28  8:07     ` Avi Kivity
2011-06-28  8:19       ` Jan Kiszka
2011-06-28  8:21         ` Avi Kivity
2011-06-28  8:10   ` Michael S. Tsirkin
2011-06-28  8:18     ` Jan Kiszka
2011-06-28  8:30       ` Michael S. Tsirkin
2011-06-28  9:20         ` Jan Kiszka
2011-06-28  8:51   ` Michael S. Tsirkin
2011-06-28  9:10     ` Avi Kivity
2011-06-27 18:19 ` [PATCH 13/13] qemu-kvm: Resolve PCI upstream diffs Jan Kiszka
2011-06-28  8:58   ` Michael S. Tsirkin
2011-06-28  9:12     ` Jan Kiszka
2011-06-28  9:22       ` Michael S. Tsirkin
2011-06-28  8:10 ` [PATCH 00/13] qemu-kvm: device assignment cleanups and upstream diff reductions Avi Kivity
2011-06-28  8:57   ` Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.