All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [v4 0/6] AMD IOMMU Interrupt remapping
@ 2016-09-12 10:08 David Kiarie
  2016-09-12 10:08 ` [Qemu-devel] [v4 1/6] hw/msi: Allow platform devices to use explicit SID David Kiarie
                   ` (5 more replies)
  0 siblings, 6 replies; 21+ messages in thread
From: David Kiarie @ 2016-09-12 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: jan.kiszka, mst, rkrcmar, peterx, ehabkost, pbonzini,
	alex.williamson, David Kiarie

Hello,

This patchset adds AMD IOMMU interrupt remapping logic to Qemu.

I have made some changes to the way X86 Qemu manages MSI routes. The 
current code assumes that MSI routes are affiliated to a PCI device which 
will not always be the case with split irqchip. With split irqchip platform
devices, in our case IOAPIC should be able to make interrupt requests via KVM.

That being said, we not have IOMMU interrupt remapping and these platform devices
will be expected to make interrupt requests using an explicit SID. I have therefore
made some changes so that MSI routes are affiliated to a Requester ID and a PCI device
if one is present.

This patchset builds ontop of the AMD IOMMU patchset but is available here[1] for quick
testing.

https://github.com/aslaq/qemu IR

David Kiarie (6):
  hw/msi: Allow platform devices to use explicit SID
  hw/i386: enforce SID verification
  hw/iommu: Prepare for AMD IOMMU interrupt remapping
  hw/iommu: AMD IOMMU interrupt remapping
  hw/acpi: report IOAPIC on IVRS
  hw/iommu: share common code between IOMMUs

 hw/i386/acpi-build.c              |   2 +
 hw/i386/amd_iommu.c               | 240 +++++++++++++++++++++++++++++++++++++-
 hw/i386/amd_iommu.h               |  82 +++++++++++++
 hw/i386/intel_iommu.c             |  89 +++++++-------
 hw/i386/kvm/pci-assign.c          |  12 +-
 hw/i386/trace-events              |   7 ++
 hw/i386/x86-iommu.c               |   8 ++
 hw/intc/ioapic.c                  |  30 ++++-
 hw/misc/ivshmem.c                 |   6 +-
 hw/vfio/pci.c                     |   6 +-
 hw/virtio/virtio-pci.c            |   7 +-
 include/hw/i386/ioapic_internal.h |   1 +
 include/hw/i386/x86-iommu.h       |   1 +
 include/sysemu/kvm.h              |  25 ++--
 kvm-all.c                         |  10 +-
 kvm-stub.c                        |   5 +-
 qemu-version.h                    |   1 +
 target-i386/kvm.c                 |  15 ++-
 18 files changed, 461 insertions(+), 86 deletions(-)
 create mode 100644 qemu-version.h

-- 
2.1.4

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] [v4 1/6] hw/msi: Allow platform devices to use explicit SID
  2016-09-12 10:08 [Qemu-devel] [v4 0/6] AMD IOMMU Interrupt remapping David Kiarie
@ 2016-09-12 10:08 ` David Kiarie
  2016-09-12 11:02   ` Peter Xu
  2016-09-12 10:08 ` [Qemu-devel] [v4 2/6] hw/i386: enforce SID verification David Kiarie
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: David Kiarie @ 2016-09-12 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: jan.kiszka, mst, rkrcmar, peterx, ehabkost, pbonzini,
	alex.williamson, David Kiarie

When using IOMMU platform devices like IOAPIC are required to make
interrupt remapping requests using explicit SID.We affiliate an MSI
route with a requester ID and a PCI device if present which ensures
that platform devices can call IOMMU interrupt remapping code with
explicit SID while maintaining compatility with the original code
which mainly dealt with PCI devices.

Signed-off-by: David Kiarie <davidkiarie4@gmail.com>
---
 hw/i386/intel_iommu.c             |  3 +++
 hw/i386/kvm/pci-assign.c          | 12 ++++++++----
 hw/intc/ioapic.c                  | 31 ++++++++++++++++++++++++++-----
 hw/misc/ivshmem.c                 |  6 ++++--
 hw/vfio/pci.c                     |  6 ++++--
 hw/virtio/virtio-pci.c            |  7 +++++--
 include/hw/i386/ioapic_internal.h |  1 +
 include/hw/i386/x86-iommu.h       |  1 +
 include/sysemu/kvm.h              | 25 ++++++++++++++-----------
 kvm-all.c                         | 10 ++++++----
 kvm-stub.c                        |  5 +++--
 qemu-version.h                    |  1 +
 target-i386/kvm.c                 | 15 +++++++++------
 13 files changed, 85 insertions(+), 38 deletions(-)
 create mode 100644 qemu-version.h

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index d6e02c8..496d836 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2466,6 +2466,9 @@ static void vtd_realize(DeviceState *dev, Error **errp)
     vtd_init(s);
     sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, Q35_HOST_BRIDGE_IOMMU_ADDR);
     pci_setup_iommu(bus, vtd_host_dma_iommu, dev);
+    /* IOMMU expected IOAPIC SID */
+    x86_iommu->ioapic_bdf = PCI_BUILD_BDF(Q35_PSEUDO_DEVFN_IOAPIC,
+                            Q35_PSEUDO_DEVFN_IOAPIC);
     /* Pseudo address space under root PCI bus. */
     pcms->ioapic_as = vtd_host_dma_iommu(bus, s, Q35_PSEUDO_DEVFN_IOAPIC);
 
diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
index 8238fbc..3f26be1 100644
--- a/hw/i386/kvm/pci-assign.c
+++ b/hw/i386/kvm/pci-assign.c
@@ -976,7 +976,8 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
     if (ctrl_byte & PCI_MSI_FLAGS_ENABLE) {
         int virq;
 
-        virq = kvm_irqchip_add_msi_route(kvm_state, 0, pci_dev);
+        virq = kvm_irqchip_add_msi_route(kvm_state, 0, pci_dev,
+                                         pci_requester_id(pci_dev));
         if (virq < 0) {
             perror("assigned_dev_update_msi: kvm_irqchip_add_msi_route");
             return;
@@ -1014,7 +1015,8 @@ static void assigned_dev_update_msi_msg(PCIDevice *pci_dev)
     }
 
     kvm_irqchip_update_msi_route(kvm_state, assigned_dev->msi_virq[0],
-                                 msi_get_message(pci_dev, 0), pci_dev);
+                                 msi_get_message(pci_dev, 0), pci_dev,
+                                 pci_requester_id(pci_dev));
     kvm_irqchip_commit_routes(kvm_state);
 }
 
@@ -1078,7 +1080,8 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
             continue;
         }
 
-        r = kvm_irqchip_add_msi_route(kvm_state, i, pci_dev);
+        r = kvm_irqchip_add_msi_route(kvm_state, i, pci_dev,
+                                      pci_requester_id(pci_dev));
         if (r < 0) {
             return r;
         }
@@ -1599,7 +1602,8 @@ static void assigned_dev_msix_mmio_write(void *opaque, hwaddr addr,
 
                 ret = kvm_irqchip_update_msi_route(kvm_state,
                                                    adev->msi_virq[i], msg,
-                                                   pdev);
+                                                   pdev,
+                                                   pci_requester_id(pdev));
                 if (ret) {
                     error_report("Error updating irq routing entry (%d)", ret);
                 }
diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 31791b0..f2d4c15 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -95,9 +95,17 @@ static void ioapic_entry_parse(uint64_t entry, struct ioapic_entry_info *info)
         (info->delivery_mode << MSI_DATA_DELIVERY_MODE_SHIFT);
 }
 
-static void ioapic_service(IOAPICCommonState *s)
+static void ioapic_as_write(IOAPICCommonState *s, uint32_t data, uint64_t addr)
 {
     AddressSpace *ioapic_as = PC_MACHINE(qdev_get_machine())->ioapic_as;
+    MemTxAttrs attrs;
+
+    attrs.requester_id = s->devid;
+    address_space_stl_le(ioapic_as, addr, data, attrs, NULL);
+}
+
+static void ioapic_service(IOAPICCommonState *s)
+{
     struct ioapic_entry_info info;
     uint8_t i;
     uint32_t mask;
@@ -141,7 +149,7 @@ static void ioapic_service(IOAPICCommonState *s)
                  * the IOAPIC message into a MSI one, and its
                  * address space will decide whether we need a
                  * translation. */
-                stl_le_phys(ioapic_as, info.addr, info.data);
+                ioapic_as_write(s, info.data, info.addr);
             }
         }
     }
@@ -197,7 +205,7 @@ static void ioapic_update_kvm_routes(IOAPICCommonState *s)
             ioapic_entry_parse(s->ioredtbl[i], &info);
             msg.address = info.addr;
             msg.data = info.data;
-            kvm_irqchip_update_msi_route(kvm_state, i, msg, NULL);
+            kvm_irqchip_update_msi_route(kvm_state, i, msg, NULL, s->devid);
         }
         kvm_irqchip_commit_routes(kvm_state);
     }
@@ -383,14 +391,28 @@ static void ioapic_machine_done_notify(Notifier *notifier, void *data)
     IOAPICCommonState *s = container_of(notifier, IOAPICCommonState,
                                         machine_done);
 
+    X86IOMMUState *iommu = x86_iommu_get_default();
+    /* kernel_irqchip=off */
+    if (iommu) {
+        s->devid = iommu->ioapic_bdf;
+    }
+
     if (kvm_irqchip_is_split()) {
-        X86IOMMUState *iommu = x86_iommu_get_default();
+        MSIMessage msg = {0, 0};
+        int i;
+
         if (iommu) {
             /* Register this IOAPIC with IOMMU IEC notifier, so that
              * when there are IR invalidates, we can be notified to
              * update kernel IR cache. */
             x86_iommu_iec_register_notifier(iommu, ioapic_iec_notifier, s);
+            /* update IOAPIC routes to the right SID */
+            for (i = 0; i < IOAPIC_NUM_PINS; i++) {
+                kvm_irqchip_update_msi_route(kvm_state, i, msg, NULL, s->devid);
+            }
+            kvm_irqchip_commit_routes(kvm_state);
         }
+
     }
 #endif
 }
@@ -407,7 +429,6 @@ static void ioapic_realize(DeviceState *dev, Error **errp)
 
     memory_region_init_io(&s->io_memory, OBJECT(s), &ioapic_io_ops, s,
                           "ioapic", 0x1000);
-
     qdev_init_gpio_in(dev, ioapic_set_irq, IOAPIC_NUM_PINS);
 
     ioapics[ioapic_no] = s;
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 40a2ebc..9d5133f 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -318,7 +318,8 @@ static int ivshmem_vector_unmask(PCIDevice *dev, unsigned vector,
 
     IVSHMEM_DPRINTF("vector unmask %p %d\n", dev, vector);
 
-    ret = kvm_irqchip_update_msi_route(kvm_state, v->virq, msg, dev);
+    ret = kvm_irqchip_update_msi_route(kvm_state, v->virq, msg, dev,
+                                       pci_requester_id(dev));
     if (ret < 0) {
         return ret;
     }
@@ -447,7 +448,8 @@ static void ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector,
     IVSHMEM_DPRINTF("ivshmem_add_kvm_msi_virq vector:%d\n", vector);
     assert(!s->msi_vectors[vector].pdev);
 
-    ret = kvm_irqchip_add_msi_route(kvm_state, vector, pdev);
+    ret = kvm_irqchip_add_msi_route(kvm_state, vector, pdev,
+                                    pci_requester_id(pdev));
     if (ret < 0) {
         error_setg(errp, "kvm_irqchip_add_msi_route failed");
         return;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 7bfa17c..a745e07 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -429,7 +429,8 @@ static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
         return;
     }
 
-    virq = kvm_irqchip_add_msi_route(kvm_state, vector_n, &vdev->pdev);
+    virq = kvm_irqchip_add_msi_route(kvm_state, vector_n, &vdev->pdev,
+                                     pci_requester_id(&vdev->pdev));
     if (virq < 0) {
         event_notifier_cleanup(&vector->kvm_interrupt);
         return;
@@ -457,7 +458,8 @@ static void vfio_remove_kvm_msi_virq(VFIOMSIVector *vector)
 static void vfio_update_kvm_msi_virq(VFIOMSIVector *vector, MSIMessage msg,
                                      PCIDevice *pdev)
 {
-    kvm_irqchip_update_msi_route(kvm_state, vector->virq, msg, pdev);
+    kvm_irqchip_update_msi_route(kvm_state, vector->virq, msg, pdev,
+                                 pci_requester_id(pdev));
     kvm_irqchip_commit_routes(kvm_state);
 }
 
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 755f921..658d904 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -705,7 +705,8 @@ static int kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy *proxy,
     int ret;
 
     if (irqfd->users == 0) {
-        ret = kvm_irqchip_add_msi_route(kvm_state, vector, &proxy->pci_dev);
+        ret = kvm_irqchip_add_msi_route(kvm_state, vector, &proxy->pci_dev,
+                                        pci_requester_id(&proxy->pci_dev));
         if (ret < 0) {
             return ret;
         }
@@ -833,12 +834,14 @@ static int virtio_pci_vq_vector_unmask(VirtIOPCIProxy *proxy,
     EventNotifier *n = virtio_queue_get_guest_notifier(vq);
     VirtIOIRQFD *irqfd;
     int ret = 0;
+    uint16_t requester_id = pci_requester_id(&proxy->pci_dev);
 
     if (proxy->vector_irqfd) {
         irqfd = &proxy->vector_irqfd[vector];
         if (irqfd->msg.data != msg.data || irqfd->msg.address != msg.address) {
             ret = kvm_irqchip_update_msi_route(kvm_state, irqfd->virq, msg,
-                                               &proxy->pci_dev);
+                                               &proxy->pci_dev,
+                                               requester_id);
             if (ret < 0) {
                 return ret;
             }
diff --git a/include/hw/i386/ioapic_internal.h b/include/hw/i386/ioapic_internal.h
index a11d86d..d68a24f 100644
--- a/include/hw/i386/ioapic_internal.h
+++ b/include/hw/i386/ioapic_internal.h
@@ -103,6 +103,7 @@ typedef struct IOAPICCommonClass {
 struct IOAPICCommonState {
     SysBusDevice busdev;
     MemoryRegion io_memory;
+    uint16_t devid;
     uint8_t id;
     uint8_t ioregsel;
     uint32_t irr;
diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
index 0c89d98..5d05865 100644
--- a/include/hw/i386/x86-iommu.h
+++ b/include/hw/i386/x86-iommu.h
@@ -72,6 +72,7 @@ typedef struct IEC_Notifier IEC_Notifier;
 
 struct X86IOMMUState {
     SysBusDevice busdev;
+    uint16_t ioapic_bdf;        /* expected IOAPIC SID        */
     bool intr_supported;        /* Whether vIOMMU supports IR */
     IommuType type;             /* IOMMU type - AMD/Intel     */
     QLIST_HEAD(, IEC_Notifier) iec_notifiers; /* IEC notify list */
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index c9c2436..ae81f92 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -357,7 +357,8 @@ int kvm_arch_on_sigbus(int code, void *addr);
 void kvm_arch_init_irq_routing(KVMState *s);
 
 int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
-                             uint64_t address, uint32_t data, PCIDevice *dev);
+                             uint64_t address, uint32_t data,
+                             uint16_t requester_id);
 
 /* Notify arch about newly added MSI routes */
 int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
@@ -481,18 +482,20 @@ static inline void cpu_synchronize_post_init(CPUState *cpu)
 
 /**
  * kvm_irqchip_add_msi_route - Add MSI route for specific vector
- * @s:      KVM state
- * @vector: which vector to add. This can be either MSI/MSIX
- *          vector. The function will automatically detect whether
- *          MSI/MSIX is enabled, and fetch corresponding MSI
- *          message.
- * @dev:    Owner PCI device to add the route. If @dev is specified
- *          as @NULL, an empty MSI message will be inited.
- * @return: virq (>=0) when success, errno (<0) when failed.
+ * @s:            KVM state
+ * @vector:       which vector to add. This can be either MSI/MSIX
+ *                vector. The function will automatically detect whether
+ *                MSI/MSIX is enabled, and fetch corresponding MSI
+ *                message.
+ * @dev:          Owner PCI device to add the route. If @dev is specified
+ *                as @NULL, an empty MSI message will be inited.
+ * @requester_id: SID when calling IOMMU code
+ * @return:       virq (>=0) when success, errno (<0) when failed.
  */
-int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev);
+int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev,
+                              uint16_t requester_id);
 int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg,
-                                 PCIDevice *dev);
+                                 PCIDevice *dev, uint16_t requester_id);
 void kvm_irqchip_commit_routes(KVMState *s);
 void kvm_irqchip_release_virq(KVMState *s, int virq);
 
diff --git a/kvm-all.c b/kvm-all.c
index ebf35b0..9e888ab 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1246,7 +1246,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg)
     return kvm_set_irq(s, route->kroute.gsi, 1);
 }
 
-int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
+int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev, uint16_t requester_id)
 {
     struct kvm_irq_routing_entry kroute = {};
     int virq;
@@ -1275,7 +1275,8 @@ int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
     kroute.u.msi.address_lo = (uint32_t)msg.address;
     kroute.u.msi.address_hi = msg.address >> 32;
     kroute.u.msi.data = le32_to_cpu(msg.data);
-    if (kvm_arch_fixup_msi_route(&kroute, msg.address, msg.data, dev)) {
+    if (kvm_arch_fixup_msi_route(&kroute, msg.address, msg.data,
+                                 requester_id)) {
         kvm_irqchip_release_virq(s, virq);
         return -EINVAL;
     }
@@ -1290,7 +1291,7 @@ int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
 }
 
 int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg,
-                                 PCIDevice *dev)
+                                 PCIDevice *dev, uint16_t requester_id)
 {
     struct kvm_irq_routing_entry kroute = {};
 
@@ -1308,7 +1309,8 @@ int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg,
     kroute.u.msi.address_lo = (uint32_t)msg.address;
     kroute.u.msi.address_hi = msg.address >> 32;
     kroute.u.msi.data = le32_to_cpu(msg.data);
-    if (kvm_arch_fixup_msi_route(&kroute, msg.address, msg.data, dev)) {
+    if (kvm_arch_fixup_msi_route(&kroute, msg.address, msg.data,
+                                 requester_id)) {
         return -EINVAL;
     }
 
diff --git a/kvm-stub.c b/kvm-stub.c
index 64e23f6..9a49ce0 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -116,7 +116,8 @@ int kvm_on_sigbus(int code, void *addr)
 }
 
 #ifndef CONFIG_USER_ONLY
-int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
+int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev,
+                              uint16_t requester_id)
 {
     return -ENOSYS;
 }
@@ -130,7 +131,7 @@ void kvm_irqchip_release_virq(KVMState *s, int virq)
 }
 
 int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg,
-                                 PCIDevice *dev)
+                                 PCIDevice *dev, uint16_t requester_id)
 {
     return -ENOSYS;
 }
diff --git a/qemu-version.h b/qemu-version.h
new file mode 100644
index 0000000..0e06b3e
--- /dev/null
+++ b/qemu-version.h
@@ -0,0 +1 @@
+#define QEMU_PKGVERSION " (v2.3.0-9603-g33203eb-dirty)"
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index d1a25c5..723bd67 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -3243,7 +3243,7 @@ void kvm_arch_init_irq_routing(KVMState *s)
         /* If the ioapic is in QEMU and the lapics are in KVM, reserve
            MSI routes for signaling interrupts to the local apics. */
         for (i = 0; i < IOAPIC_NUM_PINS; i++) {
-            if (kvm_irqchip_add_msi_route(s, 0, NULL) < 0) {
+            if (kvm_irqchip_add_msi_route(s, 0, NULL, 0) < 0) {
                 error_report("Could not enable split IRQ mode.");
                 exit(1);
             }
@@ -3411,7 +3411,8 @@ int kvm_device_msix_deassign(KVMState *s, uint32_t dev_id)
 }
 
 int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
-                             uint64_t address, uint32_t data, PCIDevice *dev)
+                             uint64_t address, uint32_t data,
+                             uint16_t requester_id)
 {
     X86IOMMUState *iommu = x86_iommu_get_default();
 
@@ -3425,9 +3426,8 @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
         src.address |= route->u.msi.address_lo;
         src.data = route->u.msi.data;
 
-        ret = class->int_remap(iommu, &src, &dst, dev ? \
-                               pci_requester_id(dev) : \
-                               X86_IOMMU_SID_INVALID);
+        ret = class->int_remap(iommu, &src, &dst, requester_id);
+
         if (ret) {
             trace_kvm_x86_fixup_msi_error(route->gsi);
             return 1;
@@ -3445,6 +3445,7 @@ typedef struct MSIRouteEntry MSIRouteEntry;
 
 struct MSIRouteEntry {
     PCIDevice *dev;             /* Device pointer */
+    uint16_t requester_id;      /* Requesting SID */
     int vector;                 /* MSI/MSIX vector index */
     int virq;                   /* Virtual IRQ index */
     QLIST_ENTRY(MSIRouteEntry) list;
@@ -3465,7 +3466,8 @@ static void kvm_update_msi_routes_all(void *private, bool global,
         cnt++;
         msg = pci_get_msi_message(entry->dev, entry->vector);
         kvm_irqchip_update_msi_route(kvm_state, entry->virq,
-                                     msg, entry->dev);
+                                     msg, entry->dev,
+                                     entry->requester_id);
     }
     kvm_irqchip_commit_routes(kvm_state);
     trace_kvm_x86_update_msi_routes(cnt);
@@ -3486,6 +3488,7 @@ int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
 
     entry = g_new0(MSIRouteEntry, 1);
     entry->dev = dev;
+    entry->requester_id = pci_requester_id(dev);
     entry->vector = vector;
     entry->virq = route->gsi;
     QLIST_INSERT_HEAD(&msi_route_list, entry, list);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [v4 2/6] hw/i386: enforce SID verification
  2016-09-12 10:08 [Qemu-devel] [v4 0/6] AMD IOMMU Interrupt remapping David Kiarie
  2016-09-12 10:08 ` [Qemu-devel] [v4 1/6] hw/msi: Allow platform devices to use explicit SID David Kiarie
@ 2016-09-12 10:08 ` David Kiarie
  2016-09-12 11:09   ` Peter Xu
  2016-09-12 11:13   ` Peter Xu
  2016-09-12 10:08 ` [Qemu-devel] [v4 3/6] hw/iommu: Prepare for AMD IOMMU interrupt remapping David Kiarie
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 21+ messages in thread
From: David Kiarie @ 2016-09-12 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: jan.kiszka, mst, rkrcmar, peterx, ehabkost, pbonzini,
	alex.williamson, David Kiarie

Platform devices are now able to make interrupt request with
explicit SIDs hence we can safely expect triggered AddressSpace ID
to match the requesting ID

Signed-off-by: David Kiarie <davidkiarie4@gmail.com>
---
 hw/i386/intel_iommu.c | 77 ++++++++++++++++++++++++++-------------------------
 1 file changed, 39 insertions(+), 38 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 496d836..e4bad6a 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2043,43 +2043,41 @@ static int vtd_irte_get(IntelIOMMUState *iommu, uint16_t index,
         return -VTD_FR_IR_IRTE_RSVD;
     }
 
-    if (sid != X86_IOMMU_SID_INVALID) {
-        /* Validate IRTE SID */
-        source_id = le32_to_cpu(entry->irte.source_id);
-        switch (entry->irte.sid_vtype) {
-        case VTD_SVT_NONE:
-            VTD_DPRINTF(IR, "No SID validation for IRTE index %d", index);
-            break;
-
-        case VTD_SVT_ALL:
-            mask = vtd_svt_mask[entry->irte.sid_q];
-            if ((source_id & mask) != (sid & mask)) {
-                VTD_DPRINTF(GENERAL, "SID validation for IRTE index "
-                            "%d failed (reqid 0x%04x sid 0x%04x)", index,
-                            sid, source_id);
-                return -VTD_FR_IR_SID_ERR;
-            }
-            break;
+    /* Validate IRTE SID */
+    source_id = le32_to_cpu(entry->irte.source_id);
+    switch (entry->irte.sid_vtype) {
+    case VTD_SVT_NONE:
+        VTD_DPRINTF(IR, "No SID validation for IRTE index %d", index);
+        break;
 
-        case VTD_SVT_BUS:
-            bus_max = source_id >> 8;
-            bus_min = source_id & 0xff;
-            bus = sid >> 8;
-            if (bus > bus_max || bus < bus_min) {
-                VTD_DPRINTF(GENERAL, "SID validation for IRTE index %d "
-                            "failed (bus %d outside %d-%d)", index, bus,
-                            bus_min, bus_max);
-                return -VTD_FR_IR_SID_ERR;
-            }
-            break;
+    case VTD_SVT_ALL:
+        mask = vtd_svt_mask[entry->irte.sid_q];
+        if ((source_id & mask) != (sid & mask)) {
+            VTD_DPRINTF(GENERAL, "SID validation for IRTE index "
+                    "%d failed (reqid 0x%04x sid 0x%04x)", index,
+                    sid, source_id);
+            return -VTD_FR_IR_SID_ERR;
+        }
+        break;
 
-        default:
-            VTD_DPRINTF(GENERAL, "Invalid SVT bits (0x%x) in IRTE index "
-                        "%d", entry->irte.sid_vtype, index);
-            /* Take this as verification failure. */
+    case VTD_SVT_BUS:
+        bus_max = source_id >> 8;
+        bus_min = source_id & 0xff;
+        bus = sid >> 8;
+        if (bus > bus_max || bus < bus_min) {
+            VTD_DPRINTF(GENERAL, "SID validation for IRTE index %d "
+                    "failed (bus %d outside %d-%d)", index, bus,
+                    bus_min, bus_max);
             return -VTD_FR_IR_SID_ERR;
-            break;
         }
+        break;
+
+    default:
+        VTD_DPRINTF(GENERAL, "Invalid SVT bits (0x%x) in IRTE index "
+                "%d", entry->irte.sid_vtype, index);
+        /* Take this as verification failure. */
+        return -VTD_FR_IR_SID_ERR;
+        break;
     }
 
     return 0;
@@ -2252,14 +2250,17 @@ static MemTxResult vtd_mem_ir_write(void *opaque, hwaddr addr,
 {
     int ret = 0;
     MSIMessage from = {}, to = {};
-    uint16_t sid = X86_IOMMU_SID_INVALID;
+    VTDAddressSpace *as = opaque;
+    uint16_t sid = PCI_BUILD_BDF(pci_bus_num(as->bus), as->devfn);
 
     from.address = (uint64_t) addr + VTD_INTERRUPT_ADDR_FIRST;
     from.data = (uint32_t) value;
 
-    if (!attrs.unspecified) {
-        /* We have explicit Source ID */
-        sid = attrs.requester_id;
+    if (attrs.requester_id != sid) {
+        VTD_DPRINTF(GENERAL, "int remap request for sid 0x%04x"
+                    " requester_id 0x%04x couldn't be verified",
+                    sid, attrs.requester_id);
+        return MEMTX_ERROR;
     }
 
     ret = vtd_interrupt_remap_msi(opaque, &from, &to, sid);
@@ -2325,7 +2326,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
         memory_region_init_iommu(&vtd_dev_as->iommu, OBJECT(s),
                                  &s->iommu_ops, "intel_iommu", UINT64_MAX);
         memory_region_init_io(&vtd_dev_as->iommu_ir, OBJECT(s),
-                              &vtd_mem_ir_ops, s, "intel_iommu_ir",
+                              &vtd_mem_ir_ops, vtd_dev_as, "intel_iommu_ir",
                               VTD_INTERRUPT_ADDR_SIZE);
         memory_region_add_subregion(&vtd_dev_as->iommu, VTD_INTERRUPT_ADDR_FIRST,
                                     &vtd_dev_as->iommu_ir);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [v4 3/6] hw/iommu: Prepare for AMD IOMMU interrupt remapping
  2016-09-12 10:08 [Qemu-devel] [v4 0/6] AMD IOMMU Interrupt remapping David Kiarie
  2016-09-12 10:08 ` [Qemu-devel] [v4 1/6] hw/msi: Allow platform devices to use explicit SID David Kiarie
  2016-09-12 10:08 ` [Qemu-devel] [v4 2/6] hw/i386: enforce SID verification David Kiarie
@ 2016-09-12 10:08 ` David Kiarie
  2016-09-12 10:08 ` [Qemu-devel] [v4 4/6] hw/iommu: " David Kiarie
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: David Kiarie @ 2016-09-12 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: jan.kiszka, mst, rkrcmar, peterx, ehabkost, pbonzini,
	alex.williamson, David Kiarie

Introduce macros and trace events for use in AMD IOMMU
interrupt remapping

Signed-off-by: David Kiarie <davidkiarie4@gmail.com>
---
 hw/i386/amd_iommu.h  | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/i386/trace-events |  7 +++++
 2 files changed, 87 insertions(+)

diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
index 884926e..5c4a13b 100644
--- a/hw/i386/amd_iommu.h
+++ b/hw/i386/amd_iommu.h
@@ -177,6 +177,68 @@
 #define AMDVI_IOTLB_MAX_SIZE 1024
 #define AMDVI_DEVID_SHIFT    36
 
+/* interrupt types */
+#define AMDVI_MT_FIXED  0x0
+#define AMDVI_MT_ARBIT  0x1
+#define AMDVI_MT_SMI    0x2
+#define AMDVI_MT_NMI    0x3
+#define AMDVI_MT_INIT   0x4
+#define AMDVI_MT_EXTINT 0x6
+#define AMDVI_MT_LINT1  0xb
+#define AMDVI_MT_LINT0  0xe
+
+/* MSI interrupt type mask */
+#define AMDVI_IR_TYPE_MASK 0x300
+
+/* interrupt destination mode */
+#define AMDVI_IRDEST_MODE_MASK 0x2
+
+/* select MSI data 10:0 bits */
+#define AMDVI_IRTE_INDEX_MASK 0x7ff
+
+/* bits determining whether specific interrupts should be passed
+ * split DTE into 64-bit chunks
+ */
+#define AMDVI_DTE_INTPASS_LSHIFT       56
+#define AMDVI_DTE_EINTPASS_LSHIFT      57
+#define AMDVI_DTE_NMIPASS_LSHIFT       58
+#define AMDVI_DTE_INTCTL_RSHIFT        60
+#define AMDVI_DTE_LINT0PASS_LSHIFT     62
+#define AMDVI_DTE_LINT1PASS_LSHIFT     63
+
+/* INTCTL expected values */
+#define AMDVI_INTCTL_ABORT      0x0
+#define AMDVI_INTCTL_PASS       0x1
+#define AMDVI_INTCTL_REMAP      0x2
+#define AMDVI_INTCTL_RSVD       0x3
+
+/* interrupt data valid */
+#define AMDVI_IR_VALID          (1UL << 0)
+
+/* interrupt root table mask */
+#define AMDVI_IRTEROOT_MASK     0xffffffffffffc0
+
+/* default IRTE size */
+#define AMDVI_DEFAULT_IRTE_SIZE 0x4
+
+#define AMDVI_IR_TABLE_SIZE_MASK 0xfe
+
+/* offsets into MSI data */
+#define AMDVI_MSI_DATA_DM_RSHIFT       0x8
+#define AMDVI_MSI_DATA_LEVEL_RSHIFT    0xe
+#define AMDVI_MSI_DATA_TRM_RSHIFT      0xf
+
+/* offsets into MSI address */
+#define AMDVI_MSI_ADDR_DM_RSHIFT       0x2
+#define AMDVI_MSI_ADDR_RH_RSHIFT       0x3
+#define AMDVI_MSI_ADDR_DEST_RSHIFT     0xc
+
+#define AMDVI_BUS_NUM                  0x0
+/* AMD-Vi specific IOAPIC Device function */
+#define AMDVI_DEVFN_IOAPIC             0xa0
+
+#define AMDVI_LOCAL_APIC_ADDR     0xfee00000
+
 /* extended feature support */
 #define AMDVI_EXT_FEATURES (AMDVI_FEATURE_PREFETCH | AMDVI_FEATURE_PPR | \
         AMDVI_FEATURE_IA | AMDVI_FEATURE_GT | AMDVI_FEATURE_HE | \
@@ -214,6 +276,24 @@
 #define AMDVI_INT_ADDR_FIRST 0xfee00000
 #define AMDVI_INT_ADDR_LAST  0xfeefffff
 
+#define AMDVI_INT_ADDR_SIZE ((AMDVI_INT_ADDR_LAST - \
+        AMDVI_INT_ADDR_FIRST) + 1)
+/* AMD IOMMU errors */
+#define AMDVI_ILLEG_DEV_TAB  0x1
+#define AMDVI_IOPF_          0x2
+#define AMDVI_DEV_TAB_HW     0x3
+#define AMDVI_PAGE_TAB_HW    0x4
+#define AMDVI_ILLEG_COM      0x5
+#define AMDVI_COM_HW         0x6
+#define AMDVI_IOTLB_TIMEOUT  0x7
+#define AMDVI_INVAL_DEV_REQ  0x8
+#define AMDVI_INVAL_PPR_REQ  0x9
+#define AMDVI_EVT_COUNT_ZERO 0xa
+
+/* represent target and master aborts error state */
+#define AMDVI_TARGET_ABORT     0xb
+#define AMDVI_MASTER_ABORT     0xc
+
 #define TYPE_AMD_IOMMU_DEVICE "amd-iommu"
 #define AMD_IOMMU_DEVICE(obj)\
     OBJECT_CHECK(AMDVIState, (obj), TYPE_AMD_IOMMU_DEVICE)
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index 60bdf6a..344c2f6 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -42,3 +42,10 @@ amdvi_mode_invalid(unsigned level, uint64_t addr)"error: translation level 0x%"P
 amdvi_page_fault(uint64_t addr) "error: page fault accessing guest physical address 0x%"PRIx64
 amdvi_iotlb_hit(uint16_t bus, uint16_t slot, uint16_t func, uint64_t addr, uint64_t txaddr) "hit iotlb devid %02x:%02x.%x gpa 0x%"PRIx64 " hpa 0x%"PRIx64
 amdvi_translation_result(uint16_t bus, uint16_t slot, uint16_t func, uint64_t addr, uint64_t txaddr) "devid: %02x:%02x.%x gpa 0x%"PRIx64 " hpa 0x%"PRIx64
+amdvi_irte_get_fail(uint64_t addr, uint64_t offset) "couldn't access device table entry 0x%"PRIx64" + offset 0x%"PRIx64
+amdvi_invalid_irte_entry(uint16_t devid, uint64_t offset) "devid %x requested IRTE offset 0x%"PRIx64" Outside IR table range"
+amdvi_ir_request(uint32_t data, uint64_t addr, uint16_t sid) "IR request data 0x%"PRIx32" address 0x%"PRIx64" SID %x"
+amdvi_ir_remap(uint32_t data, uint64_t addr, uint16_t sid) "IR remap data 0x%"PRIx32" address 0x%"PRIx64" SID %x"
+amdvi_ir_target_abort(uint32_t data, uint64_t addr, uint16_t sid) "IR target abort data 0x%"PRIx32" address 0x%"PRIx64" SID %x"
+amdvi_ir_write_fail(uint64_t addr, uint32_t data) "fail to write to addr 0x%"PRIx64 " value 0x%"PRIx32
+amdvi_ir_read_fail(uint64_t addr) " fail to read from addr 0x%"PRIx64
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [v4 4/6] hw/iommu: AMD IOMMU interrupt remapping
  2016-09-12 10:08 [Qemu-devel] [v4 0/6] AMD IOMMU Interrupt remapping David Kiarie
                   ` (2 preceding siblings ...)
  2016-09-12 10:08 ` [Qemu-devel] [v4 3/6] hw/iommu: Prepare for AMD IOMMU interrupt remapping David Kiarie
@ 2016-09-12 10:08 ` David Kiarie
  2016-09-12 11:34   ` Peter Xu
  2016-09-12 10:08 ` [Qemu-devel] [v4 5/6] hw/acpi: report IOAPIC on IVRS David Kiarie
  2016-09-12 10:08 ` [Qemu-devel] [v4 6/6] hw/iommu: share common code between IOMMUs David Kiarie
  5 siblings, 1 reply; 21+ messages in thread
From: David Kiarie @ 2016-09-12 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: jan.kiszka, mst, rkrcmar, peterx, ehabkost, pbonzini,
	alex.williamson, David Kiarie

Introduce AMD IOMMU interrupt remapping and hook it onto
the existing interrupt remapping infrastructure

Signed-off-by: David Kiarie <davidkiarie4@gmail.com>
---
 hw/i386/amd_iommu.c | 240 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 hw/i386/amd_iommu.h |   4 +-
 hw/intc/ioapic.c    |   1 -
 3 files changed, 242 insertions(+), 3 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 226fea5..76d3816 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -20,6 +20,7 @@
  * Cache implementation inspired by hw/i386/intel_iommu.c
  */
 #include "qemu/osdep.h"
+#include "qemu/error-report.h"
 #include "hw/i386/amd_iommu.h"
 #include "trace.h"
 
@@ -255,6 +256,31 @@ typedef struct QEMU_PACKED {
     uint32_t reserved_5:16;
 } CMDCompletePPR;
 
+typedef union IRTE {
+    struct {
+#ifdef HOST_WORDS_BIGENDIAN
+        uint32_t destination:8;
+        uint32_t rsvd_1:1;
+        uint32_t dm:1;
+        uint32_t rq_eoi:1;
+        uint32_t int_type:3;
+        uint32_t no_fault:1;
+        uint32_t valid:1;
+#else
+        uint32_t valid:1;
+        uint32_t no_fault:1;
+        uint32_t int_type:3;
+        uint32_t rq_eoi:1;
+        uint32_t dm:1;
+        uint32_t rsvd_1:1;
+        uint32_t destination:8;
+#endif
+        uint32_t vector:8;
+        uint32_t rsvd_2:8;
+    } bits;
+    uint32_t data;
+} IRTE;
+
 /* configure MMIO registers at startup/reset */
 static void amdvi_set_quad(AMDVIState *s, hwaddr addr, uint64_t val,
                            uint64_t romask, uint64_t w1cmask)
@@ -641,6 +667,11 @@ static void amdvi_inval_inttable(AMDVIState *s, CMDInvalIntrTable *inval)
         amdvi_log_illegalcom_error(s, inval->type, s->cmdbuf + s->cmdbuf_head);
         return;
     }
+
+    if (s->ir_cache) {
+        x86_iommu_iec_notify_all(X86_IOMMU_DEVICE(s), true, 0, 0);
+    }
+
     trace_amdvi_intr_inval();
 }
 
@@ -1203,6 +1234,197 @@ static IOMMUTLBEntry amdvi_translate(MemoryRegion *iommu, hwaddr addr,
     return ret;
 }
 
+static inline int amdvi_ir_handle_non_vectored(MSIMessage *src,
+                                               MSIMessage *dst, uint8_t bitpos,
+                                               uint64_t dte)
+{
+    if ((dte & (1UL << bitpos))) {
+        /* passing interrupt enabled */
+        memcpy(dst, src, sizeof(*dst));
+    } else {
+        /* should be target aborted */
+        return -AMDVI_TARGET_ABORT;
+    }
+    return 0;
+}
+
+static int amdvi_remap_ir_intctl(uint64_t dte, IRTE irte,
+                                 MSIMessage *src, MSIMessage *dst)
+{
+    int ret = 0;
+
+    switch ((dte >> AMDVI_DTE_INTCTL_RSHIFT) & 3UL) {
+    case AMDVI_INTCTL_PASS:
+        /* pass */
+        memcpy(dst, src, sizeof(*dst));
+        break;
+    case AMDVI_INTCTL_REMAP:
+        /* remap */
+        if (irte.bits.valid) {
+            /* LOCAL APIC address */
+            dst->address = AMDVI_LOCAL_APIC_ADDR;
+            /* destination mode */
+            dst->address |= ((uint64_t)irte.bits.dm) <<
+                            AMDVI_MSI_ADDR_DM_RSHIFT;
+            /* RH */
+            dst->address |= ((uint64_t)irte.bits.rq_eoi) <<
+                            AMDVI_MSI_ADDR_RH_RSHIFT;
+            /* Destination ID */
+            dst->address |= ((uint64_t)irte.bits.destination) <<
+                            AMDVI_MSI_ADDR_DEST_RSHIFT;
+            /* construct data - vector */
+            dst->data |= irte.bits.vector;
+            /* Interrupt type */
+            dst->data |= ((uint64_t)irte.bits.int_type) <<
+                         AMDVI_MSI_DATA_DM_RSHIFT;
+        } else  {
+            ret = -AMDVI_TARGET_ABORT;
+        }
+        break;
+    case AMDVI_INTCTL_ABORT:
+    case AMDVI_INTCTL_RSVD:
+        ret = -AMDVI_TARGET_ABORT;
+    }
+    return ret;
+}
+
+static int amdvi_irte_get(AMDVIState *s, MSIMessage *src, IRTE *irte,
+                          uint64_t *dte, uint16_t devid)
+{
+    uint64_t irte_root, offset = devid * AMDVI_DEVTAB_ENTRY_SIZE,
+             ir_table_size;
+
+    irte_root = dte[2] & AMDVI_IRTEROOT_MASK;
+    offset = (src->data & AMDVI_IRTE_INDEX_MASK) << 2;
+    ir_table_size = 1UL << (dte[2] & AMDVI_IR_TABLE_SIZE_MASK);
+    /* enforce IR table size */
+    if (offset > (ir_table_size * AMDVI_DEFAULT_IRTE_SIZE)) {
+        trace_amdvi_invalid_irte_entry(offset, ir_table_size);
+        return -AMDVI_TARGET_ABORT;
+    }
+    /* read IRTE */
+    if (dma_memory_read(&address_space_memory, irte_root + offset,
+        irte, sizeof(*irte))) {
+        trace_amdvi_irte_get_fail(irte_root, offset);
+        return -AMDVI_DEV_TAB_HW;
+    }
+    return 0;
+}
+
+static int amdvi_int_remap(X86IOMMUState *iommu, MSIMessage *src,
+                           MSIMessage *dst, uint16_t sid)
+{
+    trace_amdvi_ir_request(src->data, src->address, sid);
+
+    AMDVIState *s = AMD_IOMMU_DEVICE(iommu);
+    int ret = 0;
+    uint64_t dte[4];
+    uint32_t bitpos;
+    IRTE irte;
+
+    amdvi_get_dte(s, sid, dte);
+
+    /* interrupt remapping disabled */
+    if (!(dte[2] & AMDVI_IR_VALID)) {
+        memcpy(dst, src, sizeof(*src));
+        return ret;
+    }
+
+    ret = amdvi_irte_get(s, src, &irte, dte, sid);
+    if (ret < 0) {
+        goto no_remap;
+    }
+    switch (src->data & AMDVI_IR_TYPE_MASK) {
+    case AMDVI_MT_FIXED:
+    case AMDVI_MT_ARBIT:
+        ret = amdvi_remap_ir_intctl(dte[2], irte, src, dst);
+        if (ret < 0) {
+            goto no_remap;
+        } else {
+            s->ir_cache = true;
+            trace_amdvi_ir_remap(dst->data, dst->address, sid);
+            return ret;
+        }
+    /* not handling SMI currently */
+    case AMDVI_MT_SMI:
+        error_report("SMI interrupts not currently handled");
+        goto no_remap;
+    case AMDVI_MT_NMI:
+        bitpos = AMDVI_DTE_NMIPASS_LSHIFT;
+        break;
+    case AMDVI_MT_INIT:
+        bitpos = AMDVI_DTE_INTPASS_LSHIFT;
+        break;
+    case AMDVI_MT_EXTINT:
+        bitpos = AMDVI_DTE_EINTPASS_LSHIFT;
+        break;
+    case AMDVI_MT_LINT1:
+        bitpos = AMDVI_DTE_LINT1PASS_LSHIFT;
+        break;
+    case AMDVI_MT_LINT0:
+        bitpos = AMDVI_DTE_LINT0PASS_LSHIFT;
+    default:
+        goto no_remap;
+    }
+
+    ret = amdvi_ir_handle_non_vectored(src, dst, bitpos, dte[2]);
+    if (ret < 0){
+        goto no_remap;
+    }
+    s->ir_cache = true;
+    trace_amdvi_ir_remap(dst->data, dst->address, sid);
+    return ret;
+no_remap:
+    memcpy(dst, src, sizeof(*src));
+    trace_amdvi_ir_target_abort(dst->data, dst->address, sid);
+    return ret;
+}
+
+static MemTxResult amdvi_ir_read(void *opaque, hwaddr addr,
+                                 uint64_t *data, unsigned size,
+                                 MemTxAttrs attrs)
+{
+    return MEMTX_OK;
+}
+
+static MemTxResult amdvi_ir_write(void *opaque, hwaddr addr, uint64_t val,
+                                  unsigned size, MemTxAttrs attrs)
+{
+    AMDVIAddressSpace *as = opaque;
+    MSIMessage from = { addr + AMDVI_INT_ADDR_FIRST, val }, to = { 0, 0};
+    int ret = 0;
+
+    ret  = amdvi_int_remap(X86_IOMMU_DEVICE(as->iommu_state), &from, &to,
+                           attrs.requester_id);
+
+    if (ret < 0) {
+        trace_amdvi_ir_target_abort(from.data, from.address,
+                                    attrs.requester_id);
+        return MEMTX_ERROR;
+    }
+
+    if(dma_memory_write(&address_space_memory, to.address, &to.data, size)) {
+        trace_amdvi_ir_write_fail(to.address, to.data);
+        return MEMTX_ERROR;
+    }
+
+    return MEMTX_OK;
+}
+
+static const MemoryRegionOps amdvi_ir_ops = {
+    .read_with_attrs = amdvi_ir_read,
+    .write_with_attrs = amdvi_ir_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .impl = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    },
+    .valid = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    }
+};
+
 static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
 {
     AMDVIState *s = opaque;
@@ -1226,6 +1448,12 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
 
         memory_region_init_iommu(&iommu_as[devfn]->iommu, OBJECT(s),
                                  &s->iommu_ops, "amd-iommu", UINT64_MAX);
+        memory_region_init_io(&iommu_as[devfn]->iommu_ir, OBJECT(s),
+                              &amdvi_ir_ops, iommu_as[devfn], "amd-iommu-ir",
+                              AMDVI_INT_ADDR_SIZE);
+        memory_region_add_subregion(&iommu_as[devfn]->iommu,
+                                    AMDVI_INT_ADDR_FIRST,
+                                    &iommu_as[devfn]->iommu_ir);
         address_space_init(&iommu_as[devfn]->as, &iommu_as[devfn]->iommu,
                            "amd-iommu");
     }
@@ -1274,6 +1502,7 @@ static void amdvi_init(AMDVIState *s)
     s->enabled = false;
     s->ats_enabled = false;
     s->cmdbuf_enabled = false;
+    s->ir_cache = false;
 
     /* reset MMIO */
     memset(s->mmior, 0, AMDVI_MMIO_SIZE);
@@ -1313,11 +1542,15 @@ static void amdvi_realize(DeviceState *dev, Error **err)
     AMDVIState *s = AMD_IOMMU_DEVICE(dev);
     X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
     PCIBus *bus = PC_MACHINE(qdev_get_machine())->bus;
+    PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
     s->iotlb = g_hash_table_new_full(amdvi_uint64_hash,
                                      amdvi_uint64_equal, g_free, g_free);
 
-    /* This device should take care of IOMMU PCI properties */
+    /* AMD IOMMU has Interrupt Remapping on by default */
+    x86_iommu->intr_supported = true;
     x86_iommu->type = TYPE_AMD;
+
+    /* This device should take care of IOMMU PCI properties */
     qdev_set_parent_bus(DEVICE(&s->pci), &bus->qbus);
     object_property_set_bool(OBJECT(&s->pci), true, "realized", err);
     s->capab_offset = pci_add_capability(&s->pci.dev, AMDVI_CAPAB_ID_SEC, 0,
@@ -1329,9 +1562,13 @@ static void amdvi_realize(DeviceState *dev, Error **err)
     memory_region_init_io(&s->mmio, OBJECT(s), &mmio_mem_ops, s, "amdvi-mmio",
                           AMDVI_MMIO_SIZE);
 
+    x86_iommu->ioapic_bdf = PCI_BUILD_BDF(AMDVI_BUS_NUM,
+             AMDVI_SB_IOAPIC_ID);
+
     sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->mmio);
     sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, AMDVI_BASE_ADDR);
     pci_setup_iommu(bus, amdvi_host_dma_iommu, s);
+    pcms->ioapic_as = amdvi_host_dma_iommu(bus, s, AMDVI_SB_IOAPIC_ID);
     s->devid = object_property_get_int(OBJECT(&s->pci), "addr", err);
     msi_init(&s->pci.dev, 0, 1, true, false, err);
     amdvi_init(s);
@@ -1358,6 +1595,7 @@ static void amdvi_class_init(ObjectClass *klass, void* data)
     dc->vmsd = &vmstate_amdvi;
     dc->hotpluggable = false;
     dc_class->realize = amdvi_realize;
+    dc_class->int_remap = amdvi_int_remap;
 }
 
 static const TypeInfo amdvi = {
diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
index 5c4a13b..28b9603 100644
--- a/hw/i386/amd_iommu.h
+++ b/hw/i386/amd_iommu.h
@@ -235,7 +235,7 @@
 
 #define AMDVI_BUS_NUM                  0x0
 /* AMD-Vi specific IOAPIC Device function */
-#define AMDVI_DEVFN_IOAPIC             0xa0
+#define AMDVI_SB_IOAPIC_ID            0xa0
 
 #define AMDVI_LOCAL_APIC_ADDR     0xfee00000
 
@@ -338,6 +338,8 @@ typedef struct AMDVIState {
     uint32_t evtlog_len;         /* event log length             */
     uint32_t evtlog_head;        /* current IOMMU write position */
     uint32_t evtlog_tail;        /* current Software read position */
+    /* whether we have remapped any interrupts and hence IR cache */
+    bool ir_cache;
 
     /* unused for now */
     hwaddr excl_base;            /* base DVA - IOMMU exclusion range */
diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index f2d4c15..3fefe4a 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -412,7 +412,6 @@ static void ioapic_machine_done_notify(Notifier *notifier, void *data)
             }
             kvm_irqchip_commit_routes(kvm_state);
         }
-
     }
 #endif
 }
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [v4 5/6] hw/acpi: report IOAPIC on IVRS
  2016-09-12 10:08 [Qemu-devel] [v4 0/6] AMD IOMMU Interrupt remapping David Kiarie
                   ` (3 preceding siblings ...)
  2016-09-12 10:08 ` [Qemu-devel] [v4 4/6] hw/iommu: " David Kiarie
@ 2016-09-12 10:08 ` David Kiarie
  2016-09-12 10:08 ` [Qemu-devel] [v4 6/6] hw/iommu: share common code between IOMMUs David Kiarie
  5 siblings, 0 replies; 21+ messages in thread
From: David Kiarie @ 2016-09-12 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: jan.kiszka, mst, rkrcmar, peterx, ehabkost, pbonzini,
	alex.williamson, David Kiarie

Report IOAPIC via IVRS which effectively allows linux AMD-Vi
driver to enable interrupt remapping

Signed-off-by: David Kiarie <davidkiarie4@gmail.com>
---
 hw/i386/acpi-build.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index c20bc71..c9bee8f 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2615,6 +2615,8 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
      *   Refer to Spec - Table 95:IVHD Device Entry Type Codes(4-byte)
      */
     build_append_int_noprefix(table_data, 0x0000001, 4);
+    /* IOAPIC represented as an 8-byte entry. Spec v2.62 Tables 97 */
+    build_append_int_noprefix(table_data, 0x0100a000cf000048, 8);
 
     build_header(linker, table_data, (void *)(table_data->data + iommu_start),
                  "IVRS", table_data->len - iommu_start, 1, NULL, NULL);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [v4 6/6] hw/iommu: share common code between IOMMUs
  2016-09-12 10:08 [Qemu-devel] [v4 0/6] AMD IOMMU Interrupt remapping David Kiarie
                   ` (4 preceding siblings ...)
  2016-09-12 10:08 ` [Qemu-devel] [v4 5/6] hw/acpi: report IOAPIC on IVRS David Kiarie
@ 2016-09-12 10:08 ` David Kiarie
  5 siblings, 0 replies; 21+ messages in thread
From: David Kiarie @ 2016-09-12 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: jan.kiszka, mst, rkrcmar, peterx, ehabkost, pbonzini,
	alex.williamson, David Kiarie

Enabling interrupt remapping with kernel_irqchip=on should result
in an error for both VT-d and AMD-Vi

Signed-off-by: David Kiarie <davidkiarie4@gmail.com>
---
 hw/i386/intel_iommu.c | 9 ---------
 hw/i386/x86-iommu.c   | 8 ++++++++
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index e4bad6a..bf86dcc 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -30,7 +30,6 @@
 #include "hw/boards.h"
 #include "hw/i386/x86-iommu.h"
 #include "hw/pci-host/q35.h"
-#include "sysemu/kvm.h"
 
 /*#define DEBUG_INTEL_IOMMU*/
 #ifdef DEBUG_INTEL_IOMMU
@@ -2472,14 +2471,6 @@ static void vtd_realize(DeviceState *dev, Error **errp)
                             Q35_PSEUDO_DEVFN_IOAPIC);
     /* Pseudo address space under root PCI bus. */
     pcms->ioapic_as = vtd_host_dma_iommu(bus, s, Q35_PSEUDO_DEVFN_IOAPIC);
-
-    /* Currently Intel IOMMU IR only support "kernel-irqchip={off|split}" */
-    if (x86_iommu->intr_supported && kvm_irqchip_in_kernel() &&
-        !kvm_irqchip_is_split()) {
-        error_report("Intel Interrupt Remapping cannot work with "
-                     "kernel-irqchip=on, please use 'split|off'.");
-        exit(1);
-    }
 }
 
 static void vtd_class_init(ObjectClass *klass, void *data)
diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
index 2278af7..66510f7 100644
--- a/hw/i386/x86-iommu.c
+++ b/hw/i386/x86-iommu.c
@@ -21,6 +21,7 @@
 #include "hw/sysbus.h"
 #include "hw/boards.h"
 #include "hw/i386/x86-iommu.h"
+#include "sysemu/kvm.h"
 #include "qemu/error-report.h"
 #include "trace.h"
 
@@ -84,6 +85,13 @@ static void x86_iommu_realize(DeviceState *dev, Error **errp)
     if (x86_class->realize) {
         x86_class->realize(dev, errp);
     }
+    /* Currently IOMMU IR only support "kernel-irqchip={off|split}" */
+    if (x86_iommu->intr_supported && kvm_irqchip_in_kernel() &&
+        !kvm_irqchip_is_split()) {
+        error_report("Interrupt Remapping cannot work with "
+                     "kernel-irqchip=on, please use 'split|off'.");
+        exit(1);
+    }
 
     x86_iommu_set_default(X86_IOMMU_DEVICE(dev));
 }
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [v4 1/6] hw/msi: Allow platform devices to use explicit SID
  2016-09-12 10:08 ` [Qemu-devel] [v4 1/6] hw/msi: Allow platform devices to use explicit SID David Kiarie
@ 2016-09-12 11:02   ` Peter Xu
  2016-09-12 11:26     ` David Kiarie
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Xu @ 2016-09-12 11:02 UTC (permalink / raw)
  To: David Kiarie
  Cc: qemu-devel, jan.kiszka, mst, rkrcmar, ehabkost, pbonzini,
	alex.williamson

On Mon, Sep 12, 2016 at 01:08:04PM +0300, David Kiarie wrote:
> When using IOMMU platform devices like IOAPIC are required to make
> interrupt remapping requests using explicit SID.We affiliate an MSI
> route with a requester ID and a PCI device if present which ensures
> that platform devices can call IOMMU interrupt remapping code with
> explicit SID while maintaining compatility with the original code
> which mainly dealt with PCI devices.

Nit: IMHO it'll be cooler and easier to understand if we can avoid
using very long sentences like above...

[...]

> diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
> index 31791b0..f2d4c15 100644
> --- a/hw/intc/ioapic.c
> +++ b/hw/intc/ioapic.c
> @@ -95,9 +95,17 @@ static void ioapic_entry_parse(uint64_t entry, struct ioapic_entry_info *info)
>          (info->delivery_mode << MSI_DATA_DELIVERY_MODE_SHIFT);
>  }
>  
> -static void ioapic_service(IOAPICCommonState *s)
> +static void ioapic_as_write(IOAPICCommonState *s, uint32_t data, uint64_t addr)
>  {
>      AddressSpace *ioapic_as = PC_MACHINE(qdev_get_machine())->ioapic_as;
> +    MemTxAttrs attrs;

Need to zero it first?

[...]

> @@ -383,14 +391,28 @@ static void ioapic_machine_done_notify(Notifier *notifier, void *data)
>      IOAPICCommonState *s = container_of(notifier, IOAPICCommonState,
>                                          machine_done);
>  
> +    X86IOMMUState *iommu = x86_iommu_get_default();
> +    /* kernel_irqchip=off */
> +    if (iommu) {
> +        s->devid = iommu->ioapic_bdf;
> +    }

Move out of "#ifdef CONFIG_KVM"?

[...]

> @@ -407,7 +429,6 @@ static void ioapic_realize(DeviceState *dev, Error **errp)
>  
>      memory_region_init_io(&s->io_memory, OBJECT(s), &ioapic_io_ops, s,
>                            "ioapic", 0x1000);
> -

Useless change.

[...]

> -int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
> +int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev, uint16_t requester_id)
>  {
>      struct kvm_irq_routing_entry kroute = {};
>      int virq;
> @@ -1275,7 +1275,8 @@ int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
>      kroute.u.msi.address_lo = (uint32_t)msg.address;
>      kroute.u.msi.address_hi = msg.address >> 32;
>      kroute.u.msi.data = le32_to_cpu(msg.data);
> -    if (kvm_arch_fixup_msi_route(&kroute, msg.address, msg.data, dev)) {
> +    if (kvm_arch_fixup_msi_route(&kroute, msg.address, msg.data,
> +                                 requester_id)) {

Can we just remove the PCIDevice parameter directly? I didn't go
deeper, but it seems to be explicitly introduced here:

    commit dc9f06ca81e6e16d062ec382701142a3a2ab3f7d
    Author: Pavel Fedin <p.fedin@samsung.com>
    Date:   Thu Oct 15 16:44:52 2015 +0300

        kvm: Pass PCI device pointer to MSI routing functions

And we'd better make sure we really wanted to remove it.

Also, I think we need to modify all target-*/kvm.c for this interface
change?

[...]

Btw, could you explain a bit about what kind of tests have you carried
out for this series (maybe also the AMD IOMMU general one)?

Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [v4 2/6] hw/i386: enforce SID verification
  2016-09-12 10:08 ` [Qemu-devel] [v4 2/6] hw/i386: enforce SID verification David Kiarie
@ 2016-09-12 11:09   ` Peter Xu
  2016-09-12 11:12     ` David Kiarie
  2016-09-12 11:13   ` Peter Xu
  1 sibling, 1 reply; 21+ messages in thread
From: Peter Xu @ 2016-09-12 11:09 UTC (permalink / raw)
  To: David Kiarie
  Cc: qemu-devel, jan.kiszka, mst, rkrcmar, ehabkost, pbonzini,
	alex.williamson

On Mon, Sep 12, 2016 at 01:08:05PM +0300, David Kiarie wrote:

[...]

> @@ -2252,14 +2250,17 @@ static MemTxResult vtd_mem_ir_write(void *opaque, hwaddr addr,
>  {
>      int ret = 0;
>      MSIMessage from = {}, to = {};
> -    uint16_t sid = X86_IOMMU_SID_INVALID;
> +    VTDAddressSpace *as = opaque;
> +    uint16_t sid = PCI_BUILD_BDF(pci_bus_num(as->bus), as->devfn);

I remembered to have commented on this... PCI_BUILD_BDF() should be
problematic. SID may not be built that way when with PCI bridges (or
say, I think current code won't work with PCI bridges). Please see
commit:

    commit 4a94b3aa6d97dfa67a20c7a0315c9773352f0e8e
    Author: Peter Xu <peterx@redhat.com>
    Date:   Tue May 17 19:26:10 2016 +0800

        pci: fix pci_requester_id()

That's why we explicitly differenciate BDF and SID.

I would suggest to make it simpler: we just do not do this extra
check, and pass attrs.requester_id to vtd_interrupt_remap_msi()
directly.

Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [v4 2/6] hw/i386: enforce SID verification
  2016-09-12 11:09   ` Peter Xu
@ 2016-09-12 11:12     ` David Kiarie
  0 siblings, 0 replies; 21+ messages in thread
From: David Kiarie @ 2016-09-12 11:12 UTC (permalink / raw)
  To: Peter Xu
  Cc: QEMU Developers, Jan Kiszka, Michael S. Tsirkin, rkrcmar,
	Eduardo Habkost, Paolo Bonzini, Alex Williamson

On Mon, Sep 12, 2016 at 2:09 PM, Peter Xu <peterx@redhat.com> wrote:

> On Mon, Sep 12, 2016 at 01:08:05PM +0300, David Kiarie wrote:
>
> [...]
>
> > @@ -2252,14 +2250,17 @@ static MemTxResult vtd_mem_ir_write(void
> *opaque, hwaddr addr,
> >  {
> >      int ret = 0;
> >      MSIMessage from = {}, to = {};
> > -    uint16_t sid = X86_IOMMU_SID_INVALID;
> > +    VTDAddressSpace *as = opaque;
> > +    uint16_t sid = PCI_BUILD_BDF(pci_bus_num(as->bus), as->devfn);
>
> I remembered to have commented on this... PCI_BUILD_BDF() should be
> problematic. SID may not be built that way when with PCI bridges (or
> say, I think current code won't work with PCI bridges). Please see
> commit:
>
>     commit 4a94b3aa6d97dfa67a20c7a0315c9773352f0e8e
>     Author: Peter Xu <peterx@redhat.com>
>     Date:   Tue May 17 19:26:10 2016 +0800
>
>         pci: fix pci_requester_id()
>
> That's why we explicitly differenciate BDF and SID.
>
> I would suggest to make it simpler: we just do not do this extra
> check, and pass attrs.requester_id to vtd_interrupt_remap_msi()

directly.
>

Yes, we did discuss this but didn't come up with a conclusion so I stuck
with the check.

I will remove the check in the next version.


> Thanks,
>
> -- peterx
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [v4 2/6] hw/i386: enforce SID verification
  2016-09-12 10:08 ` [Qemu-devel] [v4 2/6] hw/i386: enforce SID verification David Kiarie
  2016-09-12 11:09   ` Peter Xu
@ 2016-09-12 11:13   ` Peter Xu
  1 sibling, 0 replies; 21+ messages in thread
From: Peter Xu @ 2016-09-12 11:13 UTC (permalink / raw)
  To: David Kiarie
  Cc: qemu-devel, jan.kiszka, mst, rkrcmar, ehabkost, pbonzini,
	alex.williamson

On Mon, Sep 12, 2016 at 01:08:05PM +0300, David Kiarie wrote:
> Platform devices are now able to make interrupt request with
> explicit SIDs hence we can safely expect triggered AddressSpace ID
> to match the requesting ID
> 
> Signed-off-by: David Kiarie <davidkiarie4@gmail.com>
> ---
>  hw/i386/intel_iommu.c | 77 ++++++++++++++++++++++++++-------------------------
>  1 file changed, 39 insertions(+), 38 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 496d836..e4bad6a 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -2043,43 +2043,41 @@ static int vtd_irte_get(IntelIOMMUState *iommu, uint16_t index,
>          return -VTD_FR_IR_IRTE_RSVD;
>      }
>  
> -    if (sid != X86_IOMMU_SID_INVALID) {

Btw, if we removed all references of X86_IOMMU_SID_INVALID, we can
just remove the defination as well in this patch.

Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [v4 1/6] hw/msi: Allow platform devices to use explicit SID
  2016-09-12 11:02   ` Peter Xu
@ 2016-09-12 11:26     ` David Kiarie
  2016-09-12 11:54       ` Peter Xu
  0 siblings, 1 reply; 21+ messages in thread
From: David Kiarie @ 2016-09-12 11:26 UTC (permalink / raw)
  To: Peter Xu
  Cc: QEMU Developers, Jan Kiszka, Michael S. Tsirkin, rkrcmar,
	Eduardo Habkost, Paolo Bonzini, Alex Williamson

On Mon, Sep 12, 2016 at 2:02 PM, Peter Xu <peterx@redhat.com> wrote:

> On Mon, Sep 12, 2016 at 01:08:04PM +0300, David Kiarie wrote:
> > When using IOMMU platform devices like IOAPIC are required to make
> > interrupt remapping requests using explicit SID.We affiliate an MSI
> > route with a requester ID and a PCI device if present which ensures
> > that platform devices can call IOMMU interrupt remapping code with
> > explicit SID while maintaining compatility with the original code
> > which mainly dealt with PCI devices.
>
> Nit: IMHO it'll be cooler and easier to understand if we can avoid
> using very long sentences like above...
>
> [...]
>
> > diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
> > index 31791b0..f2d4c15 100644
> > --- a/hw/intc/ioapic.c
> > +++ b/hw/intc/ioapic.c
> > @@ -95,9 +95,17 @@ static void ioapic_entry_parse(uint64_t entry, struct
> ioapic_entry_info *info)
> >          (info->delivery_mode << MSI_DATA_DELIVERY_MODE_SHIFT);
> >  }
> >
> > -static void ioapic_service(IOAPICCommonState *s)
> > +static void ioapic_as_write(IOAPICCommonState *s, uint32_t data,
> uint64_t addr)
> >  {
> >      AddressSpace *ioapic_as = PC_MACHINE(qdev_get_machine())
> ->ioapic_as;
> > +    MemTxAttrs attrs;
>
> Need to zero it first?
>
> [...]
>
> > @@ -383,14 +391,28 @@ static void ioapic_machine_done_notify(Notifier
> *notifier, void *data)
> >      IOAPICCommonState *s = container_of(notifier, IOAPICCommonState,
> >                                          machine_done);
> >
> > +    X86IOMMUState *iommu = x86_iommu_get_default();
> > +    /* kernel_irqchip=off */
> > +    if (iommu) {
> > +        s->devid = iommu->ioapic_bdf;
> > +    }
>
> Move out of "#ifdef CONFIG_KVM"?
>

Right, will move this out.


>
> [...]
>
> > @@ -407,7 +429,6 @@ static void ioapic_realize(DeviceState *dev, Error
> **errp)
> >
> >      memory_region_init_io(&s->io_memory, OBJECT(s), &ioapic_io_ops, s,
> >                            "ioapic", 0x1000);
> > -
>
> Useless change.
>
> [...]
>
> > -int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
> > +int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev,
> uint16_t requester_id)
> >  {
> >      struct kvm_irq_routing_entry kroute = {};
> >      int virq;
> > @@ -1275,7 +1275,8 @@ int kvm_irqchip_add_msi_route(KVMState *s, int
> vector, PCIDevice *dev)
> >      kroute.u.msi.address_lo = (uint32_t)msg.address;
> >      kroute.u.msi.address_hi = msg.address >> 32;
> >      kroute.u.msi.data = le32_to_cpu(msg.data);
> > -    if (kvm_arch_fixup_msi_route(&kroute, msg.address, msg.data, dev))
> {
> > +    if (kvm_arch_fixup_msi_route(&kroute, msg.address, msg.data,
> > +                                 requester_id)) {
>
> Can we just remove the PCIDevice parameter directly? I didn't go
> deeper, but it seems to be explicitly introduced here:
>
>     commit dc9f06ca81e6e16d062ec382701142a3a2ab3f7d
>     Author: Pavel Fedin <p.fedin@samsung.com>
>     Date:   Thu Oct 15 16:44:52 2015 +0300
>
>         kvm: Pass PCI device pointer to MSI routing functions
>
> And we'd better make sure we really wanted to remove it.
>
> Also, I think we need to modify all target-*/kvm.c for this interface
> change?
>

The current code keeps track of PCI devices for purposes of easily
constructing MSI messages

e.g in kvm_update_msi_routes_all which means to get rid of PCI device
pointer means we need another way to keep track of PCI devices.


>
> [...]
>
> Btw, could you explain a bit about what kind of tests have you carried
> out for this series (maybe also the AMD IOMMU general one)?
>

The tests done are pretty basic and manual. For DMA the related device
addresses are pretty static which means I can spot incorrect translations
in the logs. IR has also mainly been tested by watching the logs.


> Thanks,
>
> -- peterx
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [v4 4/6] hw/iommu: AMD IOMMU interrupt remapping
  2016-09-12 10:08 ` [Qemu-devel] [v4 4/6] hw/iommu: " David Kiarie
@ 2016-09-12 11:34   ` Peter Xu
  2016-09-12 11:51     ` David Kiarie
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Xu @ 2016-09-12 11:34 UTC (permalink / raw)
  To: David Kiarie
  Cc: qemu-devel, jan.kiszka, mst, rkrcmar, ehabkost, pbonzini,
	alex.williamson

On Mon, Sep 12, 2016 at 01:08:07PM +0300, David Kiarie wrote:

[...]

>  /* configure MMIO registers at startup/reset */
>  static void amdvi_set_quad(AMDVIState *s, hwaddr addr, uint64_t val,
>                             uint64_t romask, uint64_t w1cmask)
> @@ -641,6 +667,11 @@ static void amdvi_inval_inttable(AMDVIState *s, CMDInvalIntrTable *inval)
>          amdvi_log_illegalcom_error(s, inval->type, s->cmdbuf + s->cmdbuf_head);
>          return;
>      }
> +
> +    if (s->ir_cache) {

Here, we notify IEC only if ir_cache == true, while...

[...]

> +static int amdvi_int_remap(X86IOMMUState *iommu, MSIMessage *src,
> +                           MSIMessage *dst, uint16_t sid)
> +{
> +    trace_amdvi_ir_request(src->data, src->address, sid);
> +
> +    AMDVIState *s = AMD_IOMMU_DEVICE(iommu);
> +    int ret = 0;
> +    uint64_t dte[4];
> +    uint32_t bitpos;
> +    IRTE irte;
> +
> +    amdvi_get_dte(s, sid, dte);
> +
> +    /* interrupt remapping disabled */
> +    if (!(dte[2] & AMDVI_IR_VALID)) {
> +        memcpy(dst, src, sizeof(*src));
> +        return ret;
> +    }
> +
> +    ret = amdvi_irte_get(s, src, &irte, dte, sid);
> +    if (ret < 0) {
> +        goto no_remap;
> +    }
> +    switch (src->data & AMDVI_IR_TYPE_MASK) {
> +    case AMDVI_MT_FIXED:
> +    case AMDVI_MT_ARBIT:
> +        ret = amdvi_remap_ir_intctl(dte[2], irte, src, dst);
> +        if (ret < 0) {
> +            goto no_remap;
> +        } else {
> +            s->ir_cache = true;

Here we set it only if the interrupts are triggered.

Shouldn't we notify IEC in all cases? Since the caches are setup
during configuration, not the first time the interrupt is triggered,
no?

> +            trace_amdvi_ir_remap(dst->data, dst->address, sid);
> +            return ret;
> +        }
> +    /* not handling SMI currently */
> +    case AMDVI_MT_SMI:
> +        error_report("SMI interrupts not currently handled");
> +        goto no_remap;
> +    case AMDVI_MT_NMI:
> +        bitpos = AMDVI_DTE_NMIPASS_LSHIFT;
> +        break;
> +    case AMDVI_MT_INIT:
> +        bitpos = AMDVI_DTE_INTPASS_LSHIFT;
> +        break;
> +    case AMDVI_MT_EXTINT:
> +        bitpos = AMDVI_DTE_EINTPASS_LSHIFT;
> +        break;
> +    case AMDVI_MT_LINT1:
> +        bitpos = AMDVI_DTE_LINT1PASS_LSHIFT;
> +        break;
> +    case AMDVI_MT_LINT0:
> +        bitpos = AMDVI_DTE_LINT0PASS_LSHIFT;
> +    default:
> +        goto no_remap;
> +    }
> +
> +    ret = amdvi_ir_handle_non_vectored(src, dst, bitpos, dte[2]);
> +    if (ret < 0){
> +        goto no_remap;
> +    }
> +    s->ir_cache = true;
> +    trace_amdvi_ir_remap(dst->data, dst->address, sid);
> +    return ret;
> +no_remap:
> +    memcpy(dst, src, sizeof(*src));

Shall we drop it and report when the remapping failed in some way?

I'm totally okay that we just drop it here, and we can do the
reporting in the future. But using the old is not a good one, since if
someone injects fault interrupts, it will be delivered just like
without IOMMU. So we have no protection at all.

Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [v4 4/6] hw/iommu: AMD IOMMU interrupt remapping
  2016-09-12 11:34   ` Peter Xu
@ 2016-09-12 11:51     ` David Kiarie
  2016-09-12 12:11       ` Peter Xu
  0 siblings, 1 reply; 21+ messages in thread
From: David Kiarie @ 2016-09-12 11:51 UTC (permalink / raw)
  To: Peter Xu
  Cc: QEMU Developers, Jan Kiszka, Michael S. Tsirkin, rkrcmar,
	Eduardo Habkost, Paolo Bonzini, Alex Williamson

On Mon, Sep 12, 2016 at 2:34 PM, Peter Xu <peterx@redhat.com> wrote:

> On Mon, Sep 12, 2016 at 01:08:07PM +0300, David Kiarie wrote:
>
> [...]
>
> >  /* configure MMIO registers at startup/reset */
> >  static void amdvi_set_quad(AMDVIState *s, hwaddr addr, uint64_t val,
> >                             uint64_t romask, uint64_t w1cmask)
> > @@ -641,6 +667,11 @@ static void amdvi_inval_inttable(AMDVIState *s,
> CMDInvalIntrTable *inval)
> >          amdvi_log_illegalcom_error(s, inval->type, s->cmdbuf +
> s->cmdbuf_head);
> >          return;
> >      }
> > +
> > +    if (s->ir_cache) {
>
> Here, we notify IEC only if ir_cache == true, while...
>
> [...]
>
> > +static int amdvi_int_remap(X86IOMMUState *iommu, MSIMessage *src,
> > +                           MSIMessage *dst, uint16_t sid)
> > +{
> > +    trace_amdvi_ir_request(src->data, src->address, sid);
> > +
> > +    AMDVIState *s = AMD_IOMMU_DEVICE(iommu);
> > +    int ret = 0;
> > +    uint64_t dte[4];
> > +    uint32_t bitpos;
> > +    IRTE irte;
> > +
> > +    amdvi_get_dte(s, sid, dte);
> > +
> > +    /* interrupt remapping disabled */
> > +    if (!(dte[2] & AMDVI_IR_VALID)) {
> > +        memcpy(dst, src, sizeof(*src));
> > +        return ret;
> > +    }
> > +
> > +    ret = amdvi_irte_get(s, src, &irte, dte, sid);
> > +    if (ret < 0) {
> > +        goto no_remap;
> > +    }
> > +    switch (src->data & AMDVI_IR_TYPE_MASK) {
> > +    case AMDVI_MT_FIXED:
> > +    case AMDVI_MT_ARBIT:
> > +        ret = amdvi_remap_ir_intctl(dte[2], irte, src, dst);
> > +        if (ret < 0) {
> > +            goto no_remap;
> > +        } else {
> > +            s->ir_cache = true;
>
> Here we set it only if the interrupts are triggered.
>
> Shouldn't we notify IEC in all cases? Since the caches are setup
> during configuration, not the first time the interrupt is triggered,
> no?


I did have a problem with this. I don't know whether Intel IOMMU behaves
the same way but AMD IOMMU invalidates interrupt cache for each and every
device at boot(every possible device). Having the cache invalidation
trigger this many times bugs the guest at boot. I was of the opinion that
caches will not contain anything until translations actually happen.


>


> > +            trace_amdvi_ir_remap(dst->data, dst->address, sid);
> > +            return ret;
> > +        }
> > +    /* not handling SMI currently */
> > +    case AMDVI_MT_SMI:
> > +        error_report("SMI interrupts not currently handled");
> > +        goto no_remap;
> > +    case AMDVI_MT_NMI:
> > +        bitpos = AMDVI_DTE_NMIPASS_LSHIFT;
> > +        break;
> > +    case AMDVI_MT_INIT:
> > +        bitpos = AMDVI_DTE_INTPASS_LSHIFT;
> > +        break;
> > +    case AMDVI_MT_EXTINT:
> > +        bitpos = AMDVI_DTE_EINTPASS_LSHIFT;
> > +        break;
> > +    case AMDVI_MT_LINT1:
> > +        bitpos = AMDVI_DTE_LINT1PASS_LSHIFT;
> > +        break;
> > +    case AMDVI_MT_LINT0:
> > +        bitpos = AMDVI_DTE_LINT0PASS_LSHIFT;
> > +    default:
> > +        goto no_remap;
> > +    }
> > +
> > +    ret = amdvi_ir_handle_non_vectored(src, dst, bitpos, dte[2]);
> > +    if (ret < 0){
> > +        goto no_remap;
> > +    }
> > +    s->ir_cache = true;
> > +    trace_amdvi_ir_remap(dst->data, dst->address, sid);
> > +    return ret;
> > +no_remap:
> > +    memcpy(dst, src, sizeof(*src));
>
> Shall we drop it and report when the remapping failed in some way?
>
> I'm totally okay that we just drop it here, and we can do the
> reporting in the future. But using the old is not a good one, since if
> someone injects fault interrupts, it will be delivered just like
> without IOMMU. So we have no protection at all.
>

Wont this get dropped based on the return value ? I think the 'memcpy' is
not necessary but  my understanding is kvm will drop the translation based
on the return value, no ?


> Thanks,
>
> -- peterx
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [v4 1/6] hw/msi: Allow platform devices to use explicit SID
  2016-09-12 11:26     ` David Kiarie
@ 2016-09-12 11:54       ` Peter Xu
  0 siblings, 0 replies; 21+ messages in thread
From: Peter Xu @ 2016-09-12 11:54 UTC (permalink / raw)
  To: David Kiarie
  Cc: QEMU Developers, Jan Kiszka, Michael S. Tsirkin, rkrcmar,
	Eduardo Habkost, Paolo Bonzini, Alex Williamson

On Mon, Sep 12, 2016 at 02:26:14PM +0300, David Kiarie wrote:

[...]

> > > -int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
> > > +int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev,
> > uint16_t requester_id)
> > >  {
> > >      struct kvm_irq_routing_entry kroute = {};
> > >      int virq;
> > > @@ -1275,7 +1275,8 @@ int kvm_irqchip_add_msi_route(KVMState *s, int
> > vector, PCIDevice *dev)
> > >      kroute.u.msi.address_lo = (uint32_t)msg.address;
> > >      kroute.u.msi.address_hi = msg.address >> 32;
> > >      kroute.u.msi.data = le32_to_cpu(msg.data);
> > > -    if (kvm_arch_fixup_msi_route(&kroute, msg.address, msg.data, dev))
> > {
> > > +    if (kvm_arch_fixup_msi_route(&kroute, msg.address, msg.data,
> > > +                                 requester_id)) {
> >
> > Can we just remove the PCIDevice parameter directly? I didn't go
> > deeper, but it seems to be explicitly introduced here:
> >
> >     commit dc9f06ca81e6e16d062ec382701142a3a2ab3f7d
> >     Author: Pavel Fedin <p.fedin@samsung.com>
> >     Date:   Thu Oct 15 16:44:52 2015 +0300
> >
> >         kvm: Pass PCI device pointer to MSI routing functions
> >
> > And we'd better make sure we really wanted to remove it.
> >
> > Also, I think we need to modify all target-*/kvm.c for this interface
> > change?
> >
> 
> The current code keeps track of PCI devices for purposes of easily
> constructing MSI messages
> 
> e.g in kvm_update_msi_routes_all which means to get rid of PCI device
> pointer means we need another way to keep track of PCI devices.

Actually I mean kvm_arch_fixup_msi_route(), not
kvm_{add|update}_msi_routes_all().

It looks like above commit (dc9f06ca81e) was preparing to pass in SID
in the future for ARM (though still haven't implemented on QEMU side).
So maybe removing PCIDevice is okay in this one since you are passing
in the SID directly. However I think possibly Paolo is the best one to
know the truth...

No matter what, you may still need to change the interface for other
platforms. Just try to grep kvm_arch_fixup_msi_route() in target-*/
directories. 

Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [v4 4/6] hw/iommu: AMD IOMMU interrupt remapping
  2016-09-12 11:51     ` David Kiarie
@ 2016-09-12 12:11       ` Peter Xu
  2016-09-12 12:45         ` David Kiarie
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Xu @ 2016-09-12 12:11 UTC (permalink / raw)
  To: David Kiarie
  Cc: QEMU Developers, Jan Kiszka, Michael S. Tsirkin, rkrcmar,
	Eduardo Habkost, Paolo Bonzini, Alex Williamson

On Mon, Sep 12, 2016 at 02:51:27PM +0300, David Kiarie wrote:
> On Mon, Sep 12, 2016 at 2:34 PM, Peter Xu <peterx@redhat.com> wrote:
> 
> > On Mon, Sep 12, 2016 at 01:08:07PM +0300, David Kiarie wrote:
> >
> > [...]
> >
> > >  /* configure MMIO registers at startup/reset */
> > >  static void amdvi_set_quad(AMDVIState *s, hwaddr addr, uint64_t val,
> > >                             uint64_t romask, uint64_t w1cmask)
> > > @@ -641,6 +667,11 @@ static void amdvi_inval_inttable(AMDVIState *s,
> > CMDInvalIntrTable *inval)
> > >          amdvi_log_illegalcom_error(s, inval->type, s->cmdbuf +
> > s->cmdbuf_head);
> > >          return;
> > >      }
> > > +
> > > +    if (s->ir_cache) {
> >
> > Here, we notify IEC only if ir_cache == true, while...
> >
> > [...]
> >
> > > +static int amdvi_int_remap(X86IOMMUState *iommu, MSIMessage *src,
> > > +                           MSIMessage *dst, uint16_t sid)
> > > +{
> > > +    trace_amdvi_ir_request(src->data, src->address, sid);
> > > +
> > > +    AMDVIState *s = AMD_IOMMU_DEVICE(iommu);
> > > +    int ret = 0;
> > > +    uint64_t dte[4];
> > > +    uint32_t bitpos;
> > > +    IRTE irte;
> > > +
> > > +    amdvi_get_dte(s, sid, dte);
> > > +
> > > +    /* interrupt remapping disabled */
> > > +    if (!(dte[2] & AMDVI_IR_VALID)) {
> > > +        memcpy(dst, src, sizeof(*src));
> > > +        return ret;
> > > +    }
> > > +
> > > +    ret = amdvi_irte_get(s, src, &irte, dte, sid);
> > > +    if (ret < 0) {
> > > +        goto no_remap;
> > > +    }
> > > +    switch (src->data & AMDVI_IR_TYPE_MASK) {
> > > +    case AMDVI_MT_FIXED:
> > > +    case AMDVI_MT_ARBIT:
> > > +        ret = amdvi_remap_ir_intctl(dte[2], irte, src, dst);
> > > +        if (ret < 0) {
> > > +            goto no_remap;
> > > +        } else {
> > > +            s->ir_cache = true;
> >
> > Here we set it only if the interrupts are triggered.
> >
> > Shouldn't we notify IEC in all cases? Since the caches are setup
> > during configuration, not the first time the interrupt is triggered,
> > no?
> 
> 
> I did have a problem with this. I don't know whether Intel IOMMU behaves
> the same way but AMD IOMMU invalidates interrupt cache for each and every
> device at boot(every possible device). Having the cache invalidation
> trigger this many times bugs the guest at boot. I was of the opinion that
> caches will not contain anything until translations actually happen.

When we say cache here, we are mostly talking about GSI routes in
kernel, right? Since we still don't have other kind of interrupt
caches AFAIK. If so, GSI routes should already been setup even if the
interrupts are not triggered for a single time. So we need to
invalidate them even ir_cache == false.

I think the problem is why cache invalidations during boot will bug
the system. Any clue?

> 
> 
> >
> 
> 
> > > +            trace_amdvi_ir_remap(dst->data, dst->address, sid);
> > > +            return ret;
> > > +        }
> > > +    /* not handling SMI currently */
> > > +    case AMDVI_MT_SMI:
> > > +        error_report("SMI interrupts not currently handled");
> > > +        goto no_remap;
> > > +    case AMDVI_MT_NMI:
> > > +        bitpos = AMDVI_DTE_NMIPASS_LSHIFT;
> > > +        break;
> > > +    case AMDVI_MT_INIT:
> > > +        bitpos = AMDVI_DTE_INTPASS_LSHIFT;
> > > +        break;
> > > +    case AMDVI_MT_EXTINT:
> > > +        bitpos = AMDVI_DTE_EINTPASS_LSHIFT;
> > > +        break;
> > > +    case AMDVI_MT_LINT1:
> > > +        bitpos = AMDVI_DTE_LINT1PASS_LSHIFT;
> > > +        break;
> > > +    case AMDVI_MT_LINT0:
> > > +        bitpos = AMDVI_DTE_LINT0PASS_LSHIFT;
> > > +    default:
> > > +        goto no_remap;
> > > +    }
> > > +
> > > +    ret = amdvi_ir_handle_non_vectored(src, dst, bitpos, dte[2]);
> > > +    if (ret < 0){
> > > +        goto no_remap;
> > > +    }
> > > +    s->ir_cache = true;
> > > +    trace_amdvi_ir_remap(dst->data, dst->address, sid);
> > > +    return ret;
> > > +no_remap:
> > > +    memcpy(dst, src, sizeof(*src));
> >
> > Shall we drop it and report when the remapping failed in some way?
> >
> > I'm totally okay that we just drop it here, and we can do the
> > reporting in the future. But using the old is not a good one, since if
> > someone injects fault interrupts, it will be delivered just like
> > without IOMMU. So we have no protection at all.
> >
> 
> Wont this get dropped based on the return value ? I think the 'memcpy' is
> not necessary but  my understanding is kvm will drop the translation based
> on the return value, no ?

Yeah you are right. Then I'll suggest something like:

    no_remap:
        memcpy(...);
    remap_fail:
        trace_...();
        return ret;

And we goto remap_fail when error happens. That'll be cleaner to me,
and after all we don't need to memcpy() if something failed.

Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [v4 4/6] hw/iommu: AMD IOMMU interrupt remapping
  2016-09-12 12:11       ` Peter Xu
@ 2016-09-12 12:45         ` David Kiarie
  2016-09-13  7:38           ` Peter Xu
  0 siblings, 1 reply; 21+ messages in thread
From: David Kiarie @ 2016-09-12 12:45 UTC (permalink / raw)
  To: Peter Xu
  Cc: QEMU Developers, Jan Kiszka, Michael S. Tsirkin, rkrcmar,
	Eduardo Habkost, Paolo Bonzini, Alex Williamson

On Mon, Sep 12, 2016 at 3:11 PM, Peter Xu <peterx@redhat.com> wrote:

> On Mon, Sep 12, 2016 at 02:51:27PM +0300, David Kiarie wrote:
> > On Mon, Sep 12, 2016 at 2:34 PM, Peter Xu <peterx@redhat.com> wrote:
> >
> > > On Mon, Sep 12, 2016 at 01:08:07PM +0300, David Kiarie wrote:
> > >
> > > [...]
> > >
> > > >  /* configure MMIO registers at startup/reset */
> > > >  static void amdvi_set_quad(AMDVIState *s, hwaddr addr, uint64_t val,
> > > >                             uint64_t romask, uint64_t w1cmask)
> > > > @@ -641,6 +667,11 @@ static void amdvi_inval_inttable(AMDVIState *s,
> > > CMDInvalIntrTable *inval)
> > > >          amdvi_log_illegalcom_error(s, inval->type, s->cmdbuf +
> > > s->cmdbuf_head);
> > > >          return;
> > > >      }
> > > > +
> > > > +    if (s->ir_cache) {
> > >
> > > Here, we notify IEC only if ir_cache == true, while...
> > >
> > > [...]
> > >
> > > > +static int amdvi_int_remap(X86IOMMUState *iommu, MSIMessage *src,
> > > > +                           MSIMessage *dst, uint16_t sid)
> > > > +{
> > > > +    trace_amdvi_ir_request(src->data, src->address, sid);
> > > > +
> > > > +    AMDVIState *s = AMD_IOMMU_DEVICE(iommu);
> > > > +    int ret = 0;
> > > > +    uint64_t dte[4];
> > > > +    uint32_t bitpos;
> > > > +    IRTE irte;
> > > > +
> > > > +    amdvi_get_dte(s, sid, dte);
> > > > +
> > > > +    /* interrupt remapping disabled */
> > > > +    if (!(dte[2] & AMDVI_IR_VALID)) {
> > > > +        memcpy(dst, src, sizeof(*src));
> > > > +        return ret;
> > > > +    }
> > > > +
> > > > +    ret = amdvi_irte_get(s, src, &irte, dte, sid);
> > > > +    if (ret < 0) {
> > > > +        goto no_remap;
> > > > +    }
> > > > +    switch (src->data & AMDVI_IR_TYPE_MASK) {
> > > > +    case AMDVI_MT_FIXED:
> > > > +    case AMDVI_MT_ARBIT:
> > > > +        ret = amdvi_remap_ir_intctl(dte[2], irte, src, dst);
> > > > +        if (ret < 0) {
> > > > +            goto no_remap;
> > > > +        } else {
> > > > +            s->ir_cache = true;
> > >
> > > Here we set it only if the interrupts are triggered.
> > >
> > > Shouldn't we notify IEC in all cases? Since the caches are setup
> > > during configuration, not the first time the interrupt is triggered,
> > > no?
> >
> >
> > I did have a problem with this. I don't know whether Intel IOMMU behaves
> > the same way but AMD IOMMU invalidates interrupt cache for each and every
> > device at boot(every possible device). Having the cache invalidation
> > trigger this many times bugs the guest at boot. I was of the opinion that
> > caches will not contain anything until translations actually happen.
>
> When we say cache here, we are mostly talking about GSI routes in
> kernel, right? Since we still don't have other kind of interrupt
> caches AFAIK. If so, GSI routes should already been setup even if the
> interrupts are not triggered for a single time. So we need to
> invalidate them even ir_cache == false.
>

You're right but I'm not sure how to implement that while avoiding
triggering the notifier numerous pointless times during boot.


> I think the problem is why cache invalidations during boot will bug
> the system. Any clue?
>

The issue is not too many invalidations. I don't have a very clear idea of
how notifiers work but I would assume it spawns a thread or they somehow
use a multithreaded approach which would mean triggering the notifier too
many times within a very short period many trigger a bunch of issues.


> >
> >
> > >
> >
> >
> > > > +            trace_amdvi_ir_remap(dst->data, dst->address, sid);
> > > > +            return ret;
> > > > +        }
> > > > +    /* not handling SMI currently */
> > > > +    case AMDVI_MT_SMI:
> > > > +        error_report("SMI interrupts not currently handled");
> > > > +        goto no_remap;
> > > > +    case AMDVI_MT_NMI:
> > > > +        bitpos = AMDVI_DTE_NMIPASS_LSHIFT;
> > > > +        break;
> > > > +    case AMDVI_MT_INIT:
> > > > +        bitpos = AMDVI_DTE_INTPASS_LSHIFT;
> > > > +        break;
> > > > +    case AMDVI_MT_EXTINT:
> > > > +        bitpos = AMDVI_DTE_EINTPASS_LSHIFT;
> > > > +        break;
> > > > +    case AMDVI_MT_LINT1:
> > > > +        bitpos = AMDVI_DTE_LINT1PASS_LSHIFT;
> > > > +        break;
> > > > +    case AMDVI_MT_LINT0:
> > > > +        bitpos = AMDVI_DTE_LINT0PASS_LSHIFT;
> > > > +    default:
> > > > +        goto no_remap;
> > > > +    }
> > > > +
> > > > +    ret = amdvi_ir_handle_non_vectored(src, dst, bitpos, dte[2]);
> > > > +    if (ret < 0){
> > > > +        goto no_remap;
> > > > +    }
> > > > +    s->ir_cache = true;
> > > > +    trace_amdvi_ir_remap(dst->data, dst->address, sid);
> > > > +    return ret;
> > > > +no_remap:
> > > > +    memcpy(dst, src, sizeof(*src));
> > >
> > > Shall we drop it and report when the remapping failed in some way?
> > >
> > > I'm totally okay that we just drop it here, and we can do the
> > > reporting in the future. But using the old is not a good one, since if
> > > someone injects fault interrupts, it will be delivered just like
> > > without IOMMU. So we have no protection at all.
> > >
> >
> > Wont this get dropped based on the return value ? I think the 'memcpy' is
> > not necessary but  my understanding is kvm will drop the translation
> based
> > on the return value, no ?
>
> Yeah you are right. Then I'll suggest something like:
>
>     no_remap:
>         memcpy(...);
>     remap_fail:
>         trace_...();
>         return ret;
>
> And we goto remap_fail when error happens. That'll be cleaner to me,
> and after all we don't need to memcpy() if something failed.
>
> Thanks,
>
> -- peterx
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [v4 4/6] hw/iommu: AMD IOMMU interrupt remapping
  2016-09-12 12:45         ` David Kiarie
@ 2016-09-13  7:38           ` Peter Xu
  2016-09-13 14:11             ` Michael S. Tsirkin
  2016-09-14 10:12             ` David Kiarie
  0 siblings, 2 replies; 21+ messages in thread
From: Peter Xu @ 2016-09-13  7:38 UTC (permalink / raw)
  To: David Kiarie
  Cc: QEMU Developers, Jan Kiszka, Michael S. Tsirkin, rkrcmar,
	Eduardo Habkost, Paolo Bonzini, Alex Williamson

On Mon, Sep 12, 2016 at 03:45:48PM +0300, David Kiarie wrote:
> > When we say cache here, we are mostly talking about GSI routes in
> > kernel, right? Since we still don't have other kind of interrupt
> > caches AFAIK. If so, GSI routes should already been setup even if the
> > interrupts are not triggered for a single time. So we need to
> > invalidate them even ir_cache == false.
> >
> 
> You're right but I'm not sure how to implement that while avoiding
> triggering the notifier numerous pointless times during boot.
> 
> 
> > I think the problem is why cache invalidations during boot will bug
> > the system. Any clue?
> >
> 
> The issue is not too many invalidations. I don't have a very clear idea of
> how notifiers work but I would assume it spawns a thread or they somehow
> use a multithreaded approach which would mean triggering the notifier too
> many times within a very short period many trigger a bunch of issues.

No thread is spawn, we just call the notifier callbacks.

For me it's fairly acceptable that guest sends lots of invalidations
during boot. That should not lead to any functional issues. If there
is, then something might be wrong.

I don't know whether mst will like to merge this series even without
fixing this. Anyway I would still prefer to root cause the issue, or
at least comment this out in the commit message (or codes somewhere)
so that we know there is something TBD and might cause misterious
troubles...

Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [v4 4/6] hw/iommu: AMD IOMMU interrupt remapping
  2016-09-13  7:38           ` Peter Xu
@ 2016-09-13 14:11             ` Michael S. Tsirkin
  2016-09-13 14:13               ` Michael S. Tsirkin
  2016-09-14 10:12             ` David Kiarie
  1 sibling, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2016-09-13 14:11 UTC (permalink / raw)
  To: Peter Xu
  Cc: David Kiarie, QEMU Developers, Jan Kiszka, rkrcmar,
	Eduardo Habkost, Paolo Bonzini, Alex Williamson

On Tue, Sep 13, 2016 at 03:38:38PM +0800, Peter Xu wrote:
> On Mon, Sep 12, 2016 at 03:45:48PM +0300, David Kiarie wrote:
> > > When we say cache here, we are mostly talking about GSI routes in
> > > kernel, right? Since we still don't have other kind of interrupt
> > > caches AFAIK. If so, GSI routes should already been setup even if the
> > > interrupts are not triggered for a single time. So we need to
> > > invalidate them even ir_cache == false.
> > >
> > 
> > You're right but I'm not sure how to implement that while avoiding
> > triggering the notifier numerous pointless times during boot.
> > 
> > 
> > > I think the problem is why cache invalidations during boot will bug
> > > the system. Any clue?
> > >
> > 
> > The issue is not too many invalidations. I don't have a very clear idea of
> > how notifiers work but I would assume it spawns a thread or they somehow
> > use a multithreaded approach which would mean triggering the notifier too
> > many times within a very short period many trigger a bunch of issues.
> 
> No thread is spawn, we just call the notifier callbacks.
> 
> For me it's fairly acceptable that guest sends lots of invalidations
> during boot. That should not lead to any functional issues. If there
> is, then something might be wrong.
> 
> I don't know whether mst will like to merge this series even without
> fixing this. Anyway I would still prefer to root cause the issue, or
> at least comment this out in the commit message (or codes somewhere)
> so that we know there is something TBD and might cause misterious
> troubles...
> 
> Thanks,
> 
> -- peterx

By now it's minimally intrusive, so yes, I think I'll merge
and we can apply fixes on top incrementally.
E.g. would you like to post the suggested comment as a patch?

-- 
MST

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [v4 4/6] hw/iommu: AMD IOMMU interrupt remapping
  2016-09-13 14:11             ` Michael S. Tsirkin
@ 2016-09-13 14:13               ` Michael S. Tsirkin
  0 siblings, 0 replies; 21+ messages in thread
From: Michael S. Tsirkin @ 2016-09-13 14:13 UTC (permalink / raw)
  To: Peter Xu
  Cc: David Kiarie, QEMU Developers, Jan Kiszka, rkrcmar,
	Eduardo Habkost, Paolo Bonzini, Alex Williamson

On Tue, Sep 13, 2016 at 05:11:00PM +0300, Michael S. Tsirkin wrote:
> On Tue, Sep 13, 2016 at 03:38:38PM +0800, Peter Xu wrote:
> > On Mon, Sep 12, 2016 at 03:45:48PM +0300, David Kiarie wrote:
> > > > When we say cache here, we are mostly talking about GSI routes in
> > > > kernel, right? Since we still don't have other kind of interrupt
> > > > caches AFAIK. If so, GSI routes should already been setup even if the
> > > > interrupts are not triggered for a single time. So we need to
> > > > invalidate them even ir_cache == false.
> > > >
> > > 
> > > You're right but I'm not sure how to implement that while avoiding
> > > triggering the notifier numerous pointless times during boot.
> > > 
> > > 
> > > > I think the problem is why cache invalidations during boot will bug
> > > > the system. Any clue?
> > > >
> > > 
> > > The issue is not too many invalidations. I don't have a very clear idea of
> > > how notifiers work but I would assume it spawns a thread or they somehow
> > > use a multithreaded approach which would mean triggering the notifier too
> > > many times within a very short period many trigger a bunch of issues.
> > 
> > No thread is spawn, we just call the notifier callbacks.
> > 
> > For me it's fairly acceptable that guest sends lots of invalidations
> > during boot. That should not lead to any functional issues. If there
> > is, then something might be wrong.
> > 
> > I don't know whether mst will like to merge this series even without
> > fixing this. Anyway I would still prefer to root cause the issue, or
> > at least comment this out in the commit message (or codes somewhere)
> > so that we know there is something TBD and might cause misterious
> > troubles...
> > 
> > Thanks,
> > 
> > -- peterx
> 
> By now it's minimally intrusive, so yes, I think I'll merge
> and we can apply fixes on top incrementally.
> E.g. would you like to post the suggested comment as a patch?
> 
> -- 
> MST

Sorry, in fact I was referring to amd iommu itself, while posting
in the int remapping thread.
There will be more versions of this one
so including a comment there directly is better.

-- 
MST

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [v4 4/6] hw/iommu: AMD IOMMU interrupt remapping
  2016-09-13  7:38           ` Peter Xu
  2016-09-13 14:11             ` Michael S. Tsirkin
@ 2016-09-14 10:12             ` David Kiarie
  1 sibling, 0 replies; 21+ messages in thread
From: David Kiarie @ 2016-09-14 10:12 UTC (permalink / raw)
  To: Peter Xu
  Cc: QEMU Developers, Jan Kiszka, Michael S. Tsirkin, rkrcmar,
	Eduardo Habkost, Paolo Bonzini, Alex Williamson

On Tue, Sep 13, 2016 at 10:38 AM, Peter Xu <peterx@redhat.com> wrote:

> On Mon, Sep 12, 2016 at 03:45:48PM +0300, David Kiarie wrote:
> > > When we say cache here, we are mostly talking about GSI routes in
> > > kernel, right? Since we still don't have other kind of interrupt
> > > caches AFAIK. If so, GSI routes should already been setup even if the
> > > interrupts are not triggered for a single time. So we need to
> > > invalidate them even ir_cache == false.
> > >
> >
> > You're right but I'm not sure how to implement that while avoiding
> > triggering the notifier numerous pointless times during boot.
> >
> >
> > > I think the problem is why cache invalidations during boot will bug
> > > the system. Any clue?
> > >
> >
> > The issue is not too many invalidations. I don't have a very clear idea
> of
> > how notifiers work but I would assume it spawns a thread or they somehow
> > use a multithreaded approach which would mean triggering the notifier too
> > many times within a very short period many trigger a bunch of issues.
>
> No thread is spawn, we just call the notifier callbacks.
>
> For me it's fairly acceptable that guest sends lots of invalidations
> during boot. That should not lead to any functional issues. If there
> is, then something might be wrong.
>
> I don't know whether mst will like to merge this series even without
> fixing this. Anyway I would still prefer to root cause the issue, or
> at least comment this out in the commit message (or codes somewhere)
> so that we know there is something TBD and might cause misterious
> troubles...
>

Unfortunately, I, since recently can't reproduce this issue. I am however
going to change the current code so I invalidate kernel interrupt cache
each time AMD IOMMU issues an invalidate command(hence will get rid of the
ir_cache variable and it's dependency).

Thanks for taking time to review this patchset a new version coming up soon!


> Thanks,
>
> -- peterx
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2016-09-14 10:13 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-12 10:08 [Qemu-devel] [v4 0/6] AMD IOMMU Interrupt remapping David Kiarie
2016-09-12 10:08 ` [Qemu-devel] [v4 1/6] hw/msi: Allow platform devices to use explicit SID David Kiarie
2016-09-12 11:02   ` Peter Xu
2016-09-12 11:26     ` David Kiarie
2016-09-12 11:54       ` Peter Xu
2016-09-12 10:08 ` [Qemu-devel] [v4 2/6] hw/i386: enforce SID verification David Kiarie
2016-09-12 11:09   ` Peter Xu
2016-09-12 11:12     ` David Kiarie
2016-09-12 11:13   ` Peter Xu
2016-09-12 10:08 ` [Qemu-devel] [v4 3/6] hw/iommu: Prepare for AMD IOMMU interrupt remapping David Kiarie
2016-09-12 10:08 ` [Qemu-devel] [v4 4/6] hw/iommu: " David Kiarie
2016-09-12 11:34   ` Peter Xu
2016-09-12 11:51     ` David Kiarie
2016-09-12 12:11       ` Peter Xu
2016-09-12 12:45         ` David Kiarie
2016-09-13  7:38           ` Peter Xu
2016-09-13 14:11             ` Michael S. Tsirkin
2016-09-13 14:13               ` Michael S. Tsirkin
2016-09-14 10:12             ` David Kiarie
2016-09-12 10:08 ` [Qemu-devel] [v4 5/6] hw/acpi: report IOAPIC on IVRS David Kiarie
2016-09-12 10:08 ` [Qemu-devel] [v4 6/6] hw/iommu: share common code between IOMMUs David Kiarie

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.