* [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU
@ 2016-06-21  7:47 Peter Xu
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 01/26] x86-iommu: introduce parent class Peter Xu
                   ` (27 more replies)
  0 siblings, 28 replies; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

This is v10 of Intel IOMMU IR support, based on patches:

- [PATCH v2 0/3] enable iommu with -device
  https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg00554.html

V9 introduced one bug when split irqchip is used with multiple
vCPUs.  V10 mainly fixes this issue, with several other trivial
enhancements.

Online branch:

  https://github.com/xzpeter/qemu vtd-intr-v10

Please review.  Thanks.

v10 changes:
- Fix an issue when more than one vCPU is specified.  This was
  introduced in v9 after rebasing onto Marcel's patches.  Before
  Marcel's patch, the IOMMU was created first and then the IOAPIC; the
  order is switched after Marcel's changes.  This affects patch 18
  ("register IOMMU IEC notifier for ioapic"): the registration has to
  be done after IOAPIC realization.
- Display a readable error message if the user specifies more than one
  x86 vIOMMU, rather than failing an assertion. (patch 2)
- Correct vtd iec notifier "global" parameter: if granularity bit is
  clear (not set), then it's a global invalidation (patch 17,
  inverted meaning for granularity).
- added one more patch (patch 26) to add some trace events for irqchip
  msi routes operations.
- rebase to latest master

v9 changes:
- addressed several possible ACPI issues on BE machines, and a comment
  fix [Igor]
- removed patch 16 in v8 since it's useless after rebasing to Marcel's
  patches
- move vtd_svt_mask into vtd_irte_get() and declare it as constant.
- rebase to latest master, with Marcel's "-device intel-iommu" patch v2
  - re-arrange patch order, moving x86-iommu to the beginning (so that
    I can add "intremap" property for it, which can be further shared
    by future AMD IOMMUs)
  - add device property "intremap" for X86 IOMMU device (new patch 4
    in v9)
  - replace all existing references of MachineState.iommu_intr to
    device property X86IOMMUState.intr_supported, removing
    MachineState.iommu_intr
  - some other minor changes due to the rebase

v8 changes:
- rebase to latest master
- patch 7
  - remove VTD_IR_IOAPICEntry, which is useless now
  - fix possible issue on big endian machines for VTD_IRTE,
    VTD_IR_MSIAddress
- patch 12
  - fix endianness issue with bit-field defines: fix BE issue with
    VTD_MSIMessage, do cpu_to_*() or reverse when necessary on
    bit-field uses.
- patch 19
  - used le32_to_cpu() for dest_id, and added my s-o-b line beneath
    Jan's.
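
The endianness items above can be illustrated with a small sketch. It is
not code from the series: load_le32() and extract_bits() are
hypothetical helper names, standing in for what le32_to_cpu() plus
shift-and-mask field access achieve, so that the decoded value does not
depend on host byte order or on compiler bit-field layout.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a little-endian 32-bit value from raw bytes.  The result is the
 * same on little- and big-endian hosts, which is the role le32_to_cpu()
 * plays for guest-visible structures.  Hypothetical helper name. */
uint32_t load_le32(const uint8_t *p)
{
    assert(p);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Extract bits [lo, lo + len) with shifts and masks instead of C
 * bit-fields, whose layout is implementation-defined. */
uint32_t extract_bits(uint32_t v, unsigned lo, unsigned len)
{
    assert(len > 0 && len < 32 && lo + len <= 32);
    return (v >> lo) & ((1u << len) - 1u);
}
```

A field such as dest_id would then be read by first converting the
whole word and only afterwards masking out the field.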

v7 changes (using v6 patch index):
- patch 10: trivial change in debug string (remove one more "\n")
- patch 17-18: ioapic remote irr patches, sent separately
  already. So removed from this series.
- patch 24:
  - fix commit message: only irqfd msi routes are maintained, not
    all msi routes.
  - skip all IOAPIC msi entries (dev == NULL). We only need to
    housekeep irqfd users.
- added patches
  - pick up Radim's patch on adding MHMV ecap bits [Radim]
- remove all vtd_* patches, instead, use x86-iommu ones at the first
  place. This introduced lots of patch order changes and content
  changes, which affected from original patch 8 to the end. Sorry!
  [Jan]

v6 changes:
- patch 10: use write_with_attrs() rather than write(), preparing
  for SID verification [Jan]
- patch 17-18: add r-b line from Radim [Radim]
- new patch 19: put together Jan's EIM patch [Jan]
- new patch 20: add SID validation process
- new patch 21-22: introduce X86IOMMU class, which is the parent of
  IntelIOMMU class. Patch 21 only introduce the class and did
  nothing, patch 22 cleaned up all the vtd_*() hooks into x86
  ones. This is only a start. In the future, we can abstract more
  things into X86IOMMU class, like iotlb, address spaces mgmt,
  etc. [Jan]
- new patch 23-25: this is to do IEC notify to all irqfd consumers
  like vhost/vfio. patch 23 changed interface for
  kvm_irqchip_add_msi_route(), provide vector info rather than a raw
  MSI message. Patch 24 added new hooks to do arch-specific
  notification on addition/deletion of msi routes. Patch 25 is x86
  specific, which added one more IEC notifier for msi routes. [Jan]
- new patch 26: this is to partially solve the issue that Jan has
  encountered (1 sec delay when invalidating IR cache).

v5 changes:
- patch 10: add vector checking for IOAPIC interrupts (this may help
  debug in the future, will only generate warning if specify
  IOMMU_DEBUG)
- patch 13: replace error_report() with a trace. [Jan]
- patch 14: rename parameter "intr" to "intremap", to be aligned
  with kernel parameter [Jan]
- patch 15: fix comments for vtd_iec_notify_fn
- patch 17 & 18 (added): fix issue when IR enabled with devices
  using level-triggered interrupts, like e1000. Adding it to the end
  of series, since this issue never happen without IR.

  Patch 17 adds read-only check for IOAPIC entries.
  Patch 18 clears remote IRR bit when entry configured as
  edge-triggered.

v4 changes (all patch numbers correspond to v3):
- add one patch at the start of v3 series: I forgot to send the
  first patch in v3, so adding it in. [Jan]
- patch 9: add support for compatible mode (no reason not to support
  it, if not, we will get some warnings when using split irqchip)
- patch 11: further simplify ioapic_update_kvm_routes() using the
  helper function.
- patch 12: tweak on kvm_arch_fixup_msi_route() rather than
  ioapic_update_kvm_routes() only. [Radim]
- add patch 15: introduce IEC (Interrupt Entry Cache) invalidation
  notifier list. We can register to this list if we want to be
  notified when we got IR invalidation requests [Radim]
- add patch 16: let IOAPIC the first consumer for the above IEC
  notifier list. [Radim]
- several other trivial fixes (like moving some defines from .c to
  .h, moving several lines of changes from one patch to another to
  make it make more sense, etc.)

v3 changes (all patch numbers correspond to v2):
- patch 1 (-> v3 patch 13)
  - move to the end of series [Alex]
- patch 10 (dropped)
  - drop this one, since re-worked on IOAPIC support, so we do not
    need this any more.
- patch 12 (-> v3 patch 10)
  - leverage MSI path for IOAPIC IR [Jan]
- patch 13 (-> v3 patch 9)
  - remove vtd_interrupt_remap_msi() declaration by reordering the
    functions [mst]
  - vtd_generate_msi_message(): init msg using {}, remove FIXME
    [mst]
- new patches
  - v3 patch 11: introduce ioapic_entry_parse() helper function
  - v3 patch 12: add support for kernel-irqchip=split. This needs
    more reviews, logically this should enable lots of things:
    split irqchip, irqfd, vhost, and irqfd support for
    passthrough devices (not tested). Please refer to the patch for
    more information.

v2 changes:
- patch 1
  - rename "int_remap" to "intr" in several places [Marcel]
  - remove "Intel" specific words in desc or commit message, prepare
    itself with further AMD support [Marcel]
  - avoid using object_property_get_bool() [Marcel]
- patch 5
  - use PCI bus number 0xff rather than 0xf0 for the IOAPIC scope
    definition. (please let me know if anyone knows how I can prevent
    users from using PCI bus number 0xff... TIA)
- patch 11
  - fix comments [Marcel]
- all
  - remove intr_supported variable [Marcel]

This patchset provides interrupt remapping (IR) support for the emulated
Intel IOMMU device.

By default, IR is disabled for better compatibility with current
QEMU. To enable IR, we can use the following command to boot an
IR-capable VM with a virtio-net device using vhost (kvm-ioapic is not
supported, so we need to specify kernel-irqchip={split|off} here):

$ qemu-system-x86_64 -M q35,kernel-irqchip=split \
     -device intel-iommu,intremap=on \
     -enable-kvm -m 1024 \
     -netdev tap,id=net0,vhost=on \
     -device virtio-net-pci,netdev=net0 \
     -monitor telnet::3333,server,nowait \
     /var/lib/libvirt/images/vm1.qcow2

When the guest boots, we can verify whether IR is enabled by grepping
dmesg for lines like:

Feb 19 11:21:23 localhost.localdomain kernel: DMAR-IR: IOAPIC id 0 under DRHD base  0xfed90000 IOMMU 0
Feb 19 11:21:23 localhost.localdomain kernel: DMAR-IR: Enabled IRQ remapping in x2apic mode

Testing so far covers only basic smoke tests for the following matrix:

- IR enabled/disabled
- kernel irqchip off/split
- network device: tap with/without vhost, e1000
- vCPU count: 1/2

Currently supported:

- Emulated/split irqchip
- Generic PCI Devices
- vhost devices
- pass-through device support (not tested, but it is expected to work)
- IEC (Interrupt Entry Cache) cache invalidation notification
- EIM (from Jan)
- IRTE Source-id validation

TODO List:

- explicit IEC invalidation (currently, we do update without
  checking. Also, we can process QI invalidation in bulk, as Jan
  suggested)
- IR fault reporting
- migration support (for IOMMU as general?)
- more?

Jan Kiszka (1):
  intel_iommu: Add support for Extended Interrupt Mode

Peter Xu (24):
  x86-iommu: introduce parent class
  x86-iommu: provide x86_iommu_get_default
  x86-iommu: q35: generalize find_add_as()
  x86-iommu: introduce "intremap" property
  acpi: enable INTR for DMAR report structure
  intel_iommu: allow queued invalidation for IR
  intel_iommu: set IR bit for ECAP register
  acpi: add DMAR scope definition for root IOAPIC
  intel_iommu: define interrupt remap table addr register
  intel_iommu: handle interrupt remap enable
  intel_iommu: define several structs for IOMMU IR
  intel_iommu: add IR translation faults defines
  intel_iommu: Add support for PCI MSI remap
  q35: ioapic: add support for emulated IOAPIC IR
  ioapic: introduce ioapic_entry_parse() helper
  intel_iommu: add support for split irqchip
  x86-iommu: introduce IEC notifiers
  ioapic: register IOMMU IEC notifier for ioapic
  intel_iommu: add SID validation for IR
  kvm-irqchip: simplify kvm_irqchip_add_msi_route
  kvm-irqchip: i386: add hook for add/remove virq
  kvm-irqchip: x86: add msi route notify fn
  kvm-irqchip: do explicit commit when update irq
  kvm-all: add trace events for kvm irqchip ops

Radim Krčmář (1):
  intel_iommu: support all masks in interrupt entry cache invalidation

 hw/i386/Makefile.objs             |   2 +-
 hw/i386/acpi-build.c              |  39 +++-
 hw/i386/intel_iommu.c             | 445 ++++++++++++++++++++++++++++++++++++--
 hw/i386/intel_iommu_internal.h    |  50 ++++-
 hw/i386/kvm/pci-assign.c          |  10 +-
 hw/i386/pc.c                      |   3 +
 hw/i386/x86-iommu.c               | 128 +++++++++++
 hw/intc/ioapic.c                  | 133 ++++++++----
 hw/misc/ivshmem.c                 |   4 +-
 hw/pci/pci.c                      |  15 ++
 hw/vfio/pci.c                     |  12 +-
 hw/virtio/virtio-pci.c            |  10 +-
 include/hw/acpi/acpi-defs.h       |  15 ++
 include/hw/i386/apic-msidef.h     |   1 +
 include/hw/i386/intel_iommu.h     | 175 ++++++++++++++-
 include/hw/i386/ioapic_internal.h |   3 +
 include/hw/i386/pc.h              |   4 +
 include/hw/i386/x86-iommu.h       | 103 +++++++++
 include/hw/pci-host/q35.h         |   8 +
 include/hw/pci/pci.h              |   2 +
 include/sysemu/kvm.h              |  21 +-
 kvm-all.c                         |  19 +-
 kvm-stub.c                        |   6 +-
 target-arm/kvm.c                  |  11 +
 target-i386/kvm.c                 | 109 +++++++++-
 target-mips/kvm.c                 |  11 +
 target-ppc/kvm.c                  |  11 +
 target-s390x/kvm.c                |  11 +
 trace-events                      |  12 +
 29 files changed, 1263 insertions(+), 110 deletions(-)
 create mode 100644 hw/i386/x86-iommu.c
 create mode 100644 include/hw/i386/x86-iommu.h

-- 
2.4.11


* [Qemu-devel] [PATCH v10 01/26] x86-iommu: introduce parent class
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
@ 2016-06-21  7:47 ` Peter Xu
  2016-06-24  7:10   ` [Qemu-devel] [PATCH v10 27/26] intel_iommu: disallow kernel-irqchip=on with IR Peter Xu
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 02/26] x86-iommu: provide x86_iommu_get_default Peter Xu
                   ` (26 subsequent siblings)
  27 siblings, 1 reply; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

Introduce a parent class named "x86-iommu" for intel-iommu devices. This
is preparation work to abstract shared functionality out of the Intel
and AMD IOMMUs. Currently, only the parent class is introduced. It does
nothing yet.
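
The delegation pattern used here can be sketched outside of QOM. The
following is a minimal illustration, not the patch's code: Device,
IommuClass, parent_realize() and vendor_realize() are hypothetical
stand-ins for the QOM types and for x86_iommu_realize()/vtd_realize().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for QOM types; not the QEMU definitions. */
typedef struct Device Device;
typedef void (*RealizeFn)(Device *dev, int *err);

typedef struct {
    RealizeFn realize;        /* vendor-specific hook, may be NULL */
} IommuClass;

struct Device {
    const IommuClass *klass;
    int realized_by_vendor;   /* set by the vendor hook, for illustration */
};

/* Parent realize: forwards to the vendor hook when one is installed. */
void parent_realize(Device *dev, int *err)
{
    assert(dev && dev->klass);
    if (dev->klass->realize) {
        dev->klass->realize(dev, err);
    }
}

/* Example vendor hook, playing the role that vtd_realize() has here. */
void vendor_realize(Device *dev, int *err)
{
    (void)err;
    dev->realized_by_vendor = 1;
}
```

The generic device layer only ever calls the parent hook; a vendor
class that installs nothing is realized as a no-op.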

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/Makefile.objs         |  2 +-
 hw/i386/intel_iommu.c         |  5 ++--
 hw/i386/x86-iommu.c           | 53 +++++++++++++++++++++++++++++++++++++++++++
 include/hw/i386/intel_iommu.h |  3 ++-
 include/hw/i386/x86-iommu.h   | 46 +++++++++++++++++++++++++++++++++++++
 5 files changed, 105 insertions(+), 4 deletions(-)
 create mode 100644 hw/i386/x86-iommu.c
 create mode 100644 include/hw/i386/x86-iommu.h

diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index b52d5b8..90e94ff 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -2,7 +2,7 @@ obj-$(CONFIG_KVM) += kvm/
 obj-y += multiboot.o
 obj-y += pc.o pc_piix.o pc_q35.o
 obj-y += pc_sysfw.o
-obj-y += intel_iommu.o
+obj-y += x86-iommu.o intel_iommu.o
 obj-$(CONFIG_XEN) += ../xenpv/ xen/
 
 obj-y += kvmvapic.o
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 9af5d6b..2734f6b 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2051,16 +2051,17 @@ static void vtd_realize(DeviceState *dev, Error **errp)
 static void vtd_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
+    X86IOMMUClass *x86_class = X86_IOMMU_CLASS(klass);
 
     dc->reset = vtd_reset;
-    dc->realize = vtd_realize;
     dc->vmsd = &vtd_vmstate;
     dc->props = vtd_properties;
+    x86_class->realize = vtd_realize;
 }
 
 static const TypeInfo vtd_info = {
     .name          = TYPE_INTEL_IOMMU_DEVICE,
-    .parent        = TYPE_SYS_BUS_DEVICE,
+    .parent        = TYPE_X86_IOMMU_DEVICE,
     .instance_size = sizeof(IntelIOMMUState),
     .class_init    = vtd_class_init,
 };
diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
new file mode 100644
index 0000000..d739afb
--- /dev/null
+++ b/hw/i386/x86-iommu.c
@@ -0,0 +1,53 @@
+/*
+ * QEMU emulation of common X86 IOMMU
+ *
+ * Copyright (C) 2016 Peter Xu, Red Hat <peterx@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/sysbus.h"
+#include "hw/boards.h"
+#include "hw/i386/x86-iommu.h"
+
+static void x86_iommu_realize(DeviceState *dev, Error **errp)
+{
+    X86IOMMUClass *x86_class = X86_IOMMU_GET_CLASS(dev);
+    if (x86_class->realize) {
+        x86_class->realize(dev, errp);
+    }
+}
+
+static void x86_iommu_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    dc->realize = x86_iommu_realize;
+}
+
+static const TypeInfo x86_iommu_info = {
+    .name          = TYPE_X86_IOMMU_DEVICE,
+    .parent        = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(X86IOMMUState),
+    .class_init    = x86_iommu_class_init,
+    .class_size    = sizeof(X86IOMMUClass),
+    .abstract      = true,
+};
+
+static void x86_iommu_register_types(void)
+{
+    type_register_static(&x86_iommu_info);
+}
+
+type_init(x86_iommu_register_types)
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index b024ffa..680a0c4 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -23,6 +23,7 @@
 #define INTEL_IOMMU_H
 #include "hw/qdev.h"
 #include "sysemu/dma.h"
+#include "hw/i386/x86-iommu.h"
 
 #define TYPE_INTEL_IOMMU_DEVICE "intel-iommu"
 #define INTEL_IOMMU_DEVICE(obj) \
@@ -90,7 +91,7 @@ struct VTDIOTLBEntry {
 
 /* The iommu (DMAR) device state struct */
 struct IntelIOMMUState {
-    SysBusDevice busdev;
+    X86IOMMUState x86_iommu;
     MemoryRegion csrmem;
     uint8_t csr[DMAR_REG_SIZE];     /* register values */
     uint8_t wmask[DMAR_REG_SIZE];   /* R/W bytes */
diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
new file mode 100644
index 0000000..924f39a
--- /dev/null
+++ b/include/hw/i386/x86-iommu.h
@@ -0,0 +1,46 @@
+/*
+ * Common IOMMU interface for X86 platform
+ *
+ * Copyright (C) 2016 Peter Xu, Red Hat <peterx@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IOMMU_COMMON_H
+#define IOMMU_COMMON_H
+
+#include "hw/sysbus.h"
+
+#define  TYPE_X86_IOMMU_DEVICE  ("x86-iommu")
+#define  X86_IOMMU_DEVICE(obj) \
+    OBJECT_CHECK(X86IOMMUState, (obj), TYPE_X86_IOMMU_DEVICE)
+#define  X86_IOMMU_CLASS(klass) \
+    OBJECT_CLASS_CHECK(X86IOMMUClass, (klass), TYPE_X86_IOMMU_DEVICE)
+#define  X86_IOMMU_GET_CLASS(obj) \
+    OBJECT_GET_CLASS(X86IOMMUClass, obj, TYPE_X86_IOMMU_DEVICE)
+
+typedef struct X86IOMMUState X86IOMMUState;
+typedef struct X86IOMMUClass X86IOMMUClass;
+
+struct X86IOMMUClass {
+    SysBusDeviceClass parent;
+    /* Intel/AMD specific realize() hook */
+    DeviceRealize realize;
+};
+
+struct X86IOMMUState {
+    SysBusDevice busdev;
+};
+
+#endif
-- 
2.4.11


* [Qemu-devel] [PATCH v10 02/26] x86-iommu: provide x86_iommu_get_default
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 01/26] x86-iommu: introduce parent class Peter Xu
@ 2016-06-21  7:47 ` Peter Xu
  2016-07-04 15:16   ` Michael S. Tsirkin
  2016-07-04 15:17   ` Michael S. Tsirkin
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 03/26] x86-iommu: q35: generalize find_add_as() Peter Xu
                   ` (25 subsequent siblings)
  27 siblings, 2 replies; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

Instead of searching the device tree every time, one static variable is
declared for the default system x86 IOMMU device.  Also, some VT-d
macros are replaced by x86 ones.
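
As a rough sketch of the idea (not the patch itself): a file-scope
pointer is set once at realize time and read back by callers such as
acpi_has_iommu(). The names Iommu, iommu_set_default(),
iommu_get_default() and has_iommu() are hypothetical. One deliberate
difference: the sketch returns false for a second device so the
behaviour can be checked, whereas the patch reports an error and exits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int id; } Iommu;   /* hypothetical placeholder type */

static Iommu *default_iommu;        /* NULL until the first registration */

/* Record the default device.  A second registration is refused; the
 * patch calls error_report() and exit(1) at this point instead. */
bool iommu_set_default(Iommu *iommu)
{
    assert(iommu != NULL);
    if (default_iommu != NULL) {
        return false;
    }
    default_iommu = iommu;
    return true;
}

/* O(1) lookup, replacing a walk over the object tree. */
Iommu *iommu_get_default(void)
{
    return default_iommu;
}

/* Presence check in the style of acpi_has_iommu(). */
bool has_iommu(void)
{
    return iommu_get_default() != NULL;
}
```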

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/acpi-build.c          |  9 ++-------
 hw/i386/intel_iommu.c         |  9 ++++++---
 hw/i386/x86-iommu.c           | 23 +++++++++++++++++++++++
 include/hw/i386/intel_iommu.h |  1 -
 include/hw/i386/x86-iommu.h   |  9 +++++++++
 5 files changed, 40 insertions(+), 11 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 8ca2032..161f089 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -50,7 +50,7 @@
 #include "hw/i386/ich9.h"
 #include "hw/pci/pci_bus.h"
 #include "hw/pci-host/q35.h"
-#include "hw/i386/intel_iommu.h"
+#include "hw/i386/x86-iommu.h"
 #include "hw/timer/hpet.h"
 
 #include "hw/acpi/aml-build.h"
@@ -2500,12 +2500,7 @@ static bool acpi_get_mcfg(AcpiMcfgInfo *mcfg)
 
 static bool acpi_has_iommu(void)
 {
-    bool ambiguous;
-    Object *intel_iommu;
-
-    intel_iommu = object_resolve_path_type("", TYPE_INTEL_IOMMU_DEVICE,
-                                           &ambiguous);
-    return intel_iommu && !ambiguous;
+    return !!x86_iommu_get_default();
 }
 
 static
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 2734f6b..1936c41 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -26,6 +26,8 @@
 #include "hw/pci/pci.h"
 #include "hw/pci/pci_bus.h"
 #include "hw/i386/pc.h"
+#include "hw/boards.h"
+#include "hw/i386/x86-iommu.h"
 
 /*#define DEBUG_INTEL_IOMMU*/
 #ifdef DEBUG_INTEL_IOMMU
@@ -192,7 +194,7 @@ static void vtd_reset_context_cache(IntelIOMMUState *s)
 
     VTD_DPRINTF(CACHE, "global context_cache_gen=1");
     while (g_hash_table_iter_next (&bus_it, NULL, (void**)&vtd_bus)) {
-        for (devfn_it = 0; devfn_it < VTD_PCI_DEVFN_MAX; ++devfn_it) {
+        for (devfn_it = 0; devfn_it < X86_IOMMU_PCI_DEVFN_MAX; ++devfn_it) {
             vtd_as = vtd_bus->dev_as[devfn_it];
             if (!vtd_as) {
                 continue;
@@ -964,7 +966,7 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
     vtd_bus = vtd_find_as_from_bus_num(s, VTD_SID_TO_BUS(source_id));
     if (vtd_bus) {
         devfn = VTD_SID_TO_DEVFN(source_id);
-        for (devfn_it = 0; devfn_it < VTD_PCI_DEVFN_MAX; ++devfn_it) {
+        for (devfn_it = 0; devfn_it < X86_IOMMU_PCI_DEVFN_MAX; ++devfn_it) {
             vtd_as = vtd_bus->dev_as[devfn_it];
             if (vtd_as && ((devfn_it & mask) == (devfn & mask))) {
                 VTD_DPRINTF(INV, "invalidate context-cahce of devfn 0x%"PRIx16,
@@ -1906,7 +1908,8 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
 
     if (!vtd_bus) {
         /* No corresponding free() */
-        vtd_bus = g_malloc0(sizeof(VTDBus) + sizeof(VTDAddressSpace *) * VTD_PCI_DEVFN_MAX);
+        vtd_bus = g_malloc0(sizeof(VTDBus) + sizeof(VTDAddressSpace *) * \
+                            X86_IOMMU_PCI_DEVFN_MAX);
         vtd_bus->bus = bus;
         key = (uintptr_t)bus;
         g_hash_table_insert(s->vtd_as_by_busptr, &key, vtd_bus);
diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
index d739afb..f395139 100644
--- a/hw/i386/x86-iommu.c
+++ b/hw/i386/x86-iommu.c
@@ -21,6 +21,28 @@
 #include "hw/sysbus.h"
 #include "hw/boards.h"
 #include "hw/i386/x86-iommu.h"
+#include "qemu/error-report.h"
+
+/* Default X86 IOMMU device */
+static X86IOMMUState *x86_iommu_default = NULL;
+
+static void x86_iommu_set_default(X86IOMMUState *x86_iommu)
+{
+    assert(x86_iommu);
+
+    if (x86_iommu_default) {
+        error_report("QEMU does not support multiple vIOMMUs "
+                     "for x86 yet.");
+        exit(1);
+    }
+
+    x86_iommu_default = x86_iommu;
+}
+
+X86IOMMUState *x86_iommu_get_default(void)
+{
+    return x86_iommu_default;
+}
 
 static void x86_iommu_realize(DeviceState *dev, Error **errp)
 {
@@ -28,6 +50,7 @@ static void x86_iommu_realize(DeviceState *dev, Error **errp)
     if (x86_class->realize) {
         x86_class->realize(dev, errp);
     }
+    x86_iommu_set_default(X86_IOMMU_DEVICE(dev));
 }
 
 static void x86_iommu_class_init(ObjectClass *klass, void *data)
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 680a0c4..0794309 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -35,7 +35,6 @@
 #define VTD_PCI_BUS_MAX             256
 #define VTD_PCI_SLOT_MAX            32
 #define VTD_PCI_FUNC_MAX            8
-#define VTD_PCI_DEVFN_MAX           256
 #define VTD_PCI_SLOT(devfn)         (((devfn) >> 3) & 0x1f)
 #define VTD_PCI_FUNC(devfn)         ((devfn) & 0x07)
 #define VTD_SID_TO_BUS(sid)         (((sid) >> 8) & 0xff)
diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
index 924f39a..d6991cb 100644
--- a/include/hw/i386/x86-iommu.h
+++ b/include/hw/i386/x86-iommu.h
@@ -30,6 +30,9 @@
 #define  X86_IOMMU_GET_CLASS(obj) \
     OBJECT_GET_CLASS(X86IOMMUClass, obj, TYPE_X86_IOMMU_DEVICE)
 
+#define X86_IOMMU_PCI_DEVFN_MAX           256
+#define X86_IOMMU_SID_INVALID             (0xffff)
+
 typedef struct X86IOMMUState X86IOMMUState;
 typedef struct X86IOMMUClass X86IOMMUClass;
 
@@ -43,4 +46,10 @@ struct X86IOMMUState {
     SysBusDevice busdev;
 };
 
+/**
+ * x86_iommu_get_default - get default IOMMU device
+ * @return: pointer to default IOMMU device
+ */
+X86IOMMUState *x86_iommu_get_default(void);
+
 #endif
-- 
2.4.11


* [Qemu-devel] [PATCH v10 03/26] x86-iommu: q35: generalize find_add_as()
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 01/26] x86-iommu: introduce parent class Peter Xu
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 02/26] x86-iommu: provide x86_iommu_get_default Peter Xu
@ 2016-06-21  7:47 ` Peter Xu
  2016-07-04 15:16   ` Michael S. Tsirkin
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 04/26] x86-iommu: introduce "intremap" property Peter Xu
                   ` (24 subsequent siblings)
  27 siblings, 1 reply; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

Remove VT-d calls from common q35 code. Instead, we provide a general
find_add_as() for x86-iommu type.
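
A simplified sketch of the dispatch, under stated assumptions: AddrSpace
and Iommu are hypothetical reductions of AddressSpace and X86IOMMUState,
the PCIBus argument is dropped, and the per-bus hash table is replaced
by a single array. The sketch also bounds devfn strictly below the
table size, since the value is used as an array index.

```c
#include <assert.h>
#include <stdlib.h>

#define DEVFN_MAX 256               /* mirrors X86_IOMMU_PCI_DEVFN_MAX */

typedef struct { int devfn; } AddrSpace;    /* hypothetical */
typedef struct Iommu Iommu;

typedef struct {
    /* Find/add the address space for a device; set by the vendor class. */
    AddrSpace *(*find_add_as)(Iommu *s, int devfn);
} IommuClass;

struct Iommu {
    const IommuClass *klass;
    AddrSpace *as[DEVFN_MAX];       /* filled lazily on first lookup */
};

/* Vendor implementation: return the existing entry or create one. */
AddrSpace *vendor_find_add_as(Iommu *s, int devfn)
{
    if (!s->as[devfn]) {
        s->as[devfn] = calloc(1, sizeof(AddrSpace));
        if (s->as[devfn]) {
            s->as[devfn]->devfn = devfn;
        }
    }
    return s->as[devfn];
}

/* Generic caller: knows only the class hook, not the vendor. */
AddrSpace *host_dma_iommu(Iommu *s, int devfn)
{
    assert(devfn >= 0 && devfn < DEVFN_MAX);
    return s->klass->find_add_as(s, devfn);
}
```

With this shape the q35 side needs no vendor header; only the class
hook signature is shared.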

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c         | 17 +++++++++--------
 include/hw/i386/intel_iommu.h |  5 -----
 include/hw/i386/x86-iommu.h   |  3 +++
 3 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 1936c41..b487224 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1900,8 +1900,10 @@ static Property vtd_properties[] = {
 };
 
 
-VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
+static AddressSpace *vtd_find_add_as(X86IOMMUState *x86_iommu, PCIBus *bus,
+                                     int devfn)
 {
+    IntelIOMMUState *s = (IntelIOMMUState *)x86_iommu;
     uintptr_t key = (uintptr_t)bus;
     VTDBus *vtd_bus = g_hash_table_lookup(s->vtd_as_by_busptr, &key);
     VTDAddressSpace *vtd_dev_as;
@@ -1929,7 +1931,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
         address_space_init(&vtd_dev_as->as,
                            &vtd_dev_as->iommu, "intel_iommu");
     }
-    return vtd_dev_as;
+    return &vtd_dev_as->as;
 }
 
 /* Do the initialization. It will also be called when reset, so pay
@@ -2021,13 +2023,11 @@ static void vtd_reset(DeviceState *dev)
 
 static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
 {
-    IntelIOMMUState *s = opaque;
-    VTDAddressSpace *vtd_as;
-
-    assert(0 <= devfn && devfn <= VTD_PCI_DEVFN_MAX);
+    X86IOMMUState *x86_iommu = opaque;
+    X86IOMMUClass *x86_class = X86_IOMMU_GET_CLASS(x86_iommu);
 
-    vtd_as = vtd_find_add_as(s, bus, devfn);
-    return &vtd_as->as;
+    assert(0 <= devfn && devfn <= X86_IOMMU_PCI_DEVFN_MAX);
+    return x86_class->find_add_as(x86_iommu, bus, devfn);
 }
 
 static void vtd_realize(DeviceState *dev, Error **errp)
@@ -2060,6 +2060,7 @@ static void vtd_class_init(ObjectClass *klass, void *data)
     dc->vmsd = &vtd_vmstate;
     dc->props = vtd_properties;
     x86_class->realize = vtd_realize;
+    x86_class->find_add_as = vtd_find_add_as;
 }
 
 static const TypeInfo vtd_info = {
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 0794309..e36b896 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -125,9 +125,4 @@ struct IntelIOMMUState {
     VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */
 };
 
-/* Find the VTD Address space associated with the given bus pointer,
- * create a new one if none exists
- */
-VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn);
-
 #endif
diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
index d6991cb..2070cd1 100644
--- a/include/hw/i386/x86-iommu.h
+++ b/include/hw/i386/x86-iommu.h
@@ -21,6 +21,7 @@
 #define IOMMU_COMMON_H
 
 #include "hw/sysbus.h"
+#include "exec/memory.h"
 
 #define  TYPE_X86_IOMMU_DEVICE  ("x86-iommu")
 #define  X86_IOMMU_DEVICE(obj) \
@@ -40,6 +41,8 @@ struct X86IOMMUClass {
     SysBusDeviceClass parent;
     /* Intel/AMD specific realize() hook */
     DeviceRealize realize;
+    /* Find/Add IOMMU address space for specific PCI device */
+    AddressSpace *(*find_add_as)(X86IOMMUState *s, PCIBus *bus, int devfn);
 };
 
 struct X86IOMMUState {
-- 
2.4.11


* [Qemu-devel] [PATCH v10 04/26] x86-iommu: introduce "intremap" property
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
                   ` (2 preceding siblings ...)
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 03/26] x86-iommu: q35: generalize find_add_as() Peter Xu
@ 2016-06-21  7:47 ` Peter Xu
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 05/26] acpi: enable INTR for DMAR report structure Peter Xu
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

Add one property for intel-iommu devices to specify whether we should
support interrupt remapping. By default, IR is disabled. To enable it,
we should use (taking Intel IOMMU as an example):

  -device intel-iommu,intremap=on

This property can be shared by Intel and future AMD IOMMUs.
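
The property boils down to a getter/setter pair over one boolean that
defaults to off. A minimal sketch, assuming plain C instead of QOM:
only the field name intr_supported comes from the patch, while
iommu_init(), iommu_intremap_get(), iommu_intremap_set() and
iommu_intremap_parse() are hypothetical names, and the "on"/"off"
parser only imitates what the command line accepts.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct {
    bool intr_supported;     /* same field name as in the patch */
} Iommu;

/* Instance init: IR is not supported unless asked for. */
void iommu_init(Iommu *s)
{
    s->intr_supported = false;
}

bool iommu_intremap_get(const Iommu *s)
{
    return s->intr_supported;
}

void iommu_intremap_set(Iommu *s, bool value)
{
    s->intr_supported = value;
}

/* Hypothetical parser for "on"/"off"; any other text is rejected and
 * leaves the current value untouched. */
bool iommu_intremap_parse(Iommu *s, const char *text)
{
    assert(s && text);
    if (strcmp(text, "on") == 0) {
        iommu_intremap_set(s, true);
        return true;
    }
    if (strcmp(text, "off") == 0) {
        iommu_intremap_set(s, false);
        return true;
    }
    return false;
}
```

Keeping the flag in the parent state is what lets a future AMD IOMMU
reuse the same property without redefining it.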

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/x86-iommu.c         | 23 +++++++++++++++++++++++
 include/hw/i386/x86-iommu.h |  1 +
 2 files changed, 24 insertions(+)

diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
index f395139..4280839 100644
--- a/hw/i386/x86-iommu.c
+++ b/hw/i386/x86-iommu.c
@@ -59,9 +59,32 @@ static void x86_iommu_class_init(ObjectClass *klass, void *data)
     dc->realize = x86_iommu_realize;
 }
 
+static bool x86_iommu_intremap_prop_get(Object *o, Error **errp)
+{
+    X86IOMMUState *s = X86_IOMMU_DEVICE(o);
+    return s->intr_supported;
+}
+
+static void x86_iommu_intremap_prop_set(Object *o, bool value, Error **errp)
+{
+    X86IOMMUState *s = X86_IOMMU_DEVICE(o);
+    s->intr_supported = value;
+}
+
+static void x86_iommu_instance_init(Object *o)
+{
+    X86IOMMUState *s = X86_IOMMU_DEVICE(o);
+
+    /* By default, do not support IR */
+    s->intr_supported = false;
+    object_property_add_bool(o, "intremap", x86_iommu_intremap_prop_get,
+                             x86_iommu_intremap_prop_set, NULL);
+}
+
 static const TypeInfo x86_iommu_info = {
     .name          = TYPE_X86_IOMMU_DEVICE,
     .parent        = TYPE_SYS_BUS_DEVICE,
+    .instance_init = x86_iommu_instance_init,
     .instance_size = sizeof(X86IOMMUState),
     .class_init    = x86_iommu_class_init,
     .class_size    = sizeof(X86IOMMUClass),
diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
index 2070cd1..07199be 100644
--- a/include/hw/i386/x86-iommu.h
+++ b/include/hw/i386/x86-iommu.h
@@ -47,6 +47,7 @@ struct X86IOMMUClass {
 
 struct X86IOMMUState {
     SysBusDevice busdev;
+    bool intr_supported;        /* Whether vIOMMU supports IR */
 };
 
 /**
-- 
2.4.11

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v10 05/26] acpi: enable INTR for DMAR report structure
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
                   ` (3 preceding siblings ...)
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 04/26] x86-iommu: introduce "intremap" property Peter Xu
@ 2016-06-21  7:47 ` Peter Xu
  2016-07-04 15:14   ` Michael S. Tsirkin
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 06/26] intel_iommu: allow queued invalidation for IR Peter Xu
                   ` (22 subsequent siblings)
  27 siblings, 1 reply; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

Set the INTR flag in the ACPI DMA remapping report structure (DMAR) when
interrupt remapping is enabled on the IOMMU device.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/acpi-build.c          | 11 ++++++++++-
 include/hw/i386/intel_iommu.h |  2 ++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 161f089..961ccd6a 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -57,6 +57,7 @@
 
 #include "qapi/qmp/qint.h"
 #include "qom/qom-qobject.h"
+#include "hw/i386/x86-iommu.h"
 
 /* These are used to size the ACPI tables for -M pc-i440fx-1.7 and
  * -M pc-i440fx-2.0.  Even if the actual amount of AML generated grows
@@ -2422,10 +2423,18 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
 
     AcpiTableDmar *dmar;
     AcpiDmarHardwareUnit *drhd;
+    uint8_t dmar_flags = 0;
+    X86IOMMUState *iommu = x86_iommu_get_default();
+
+    assert(iommu);
+    if (iommu->intr_supported) {
+        /* enable INTR for the IOMMU device */
+        dmar_flags |= DMAR_REPORT_F_INTR;
+    }
 
     dmar = acpi_data_push(table_data, sizeof(*dmar));
     dmar->host_address_width = VTD_HOST_ADDRESS_WIDTH - 1;
-    dmar->flags = 0;    /* No intr_remap for now */
+    dmar->flags = dmar_flags;
 
     /* DMAR Remapping Hardware Unit Definition structure */
     drhd = acpi_data_push(table_data, sizeof(*drhd));
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index e36b896..638d77f 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -44,6 +44,8 @@
 #define VTD_HOST_ADDRESS_WIDTH      39
 #define VTD_HAW_MASK                ((1ULL << VTD_HOST_ADDRESS_WIDTH) - 1)
 
+#define DMAR_REPORT_F_INTR          (1)
+
 typedef struct VTDContextEntry VTDContextEntry;
 typedef struct VTDContextCacheEntry VTDContextCacheEntry;
 typedef struct IntelIOMMUState IntelIOMMUState;
-- 
2.4.11

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v10 06/26] intel_iommu: allow queued invalidation for IR
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
                   ` (4 preceding siblings ...)
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 05/26] acpi: enable INTR for DMAR report structure Peter Xu
@ 2016-06-21  7:47 ` Peter Xu
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 07/26] intel_iommu: set IR bit for ECAP register Peter Xu
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

Queued invalidation is required for IR. This patch adds basic support
for interrupt cache invalidation requests. Since no IR cache is
implemented yet, all interrupt cache invalidation requests can simply be
skipped for now.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c          | 9 +++++++++
 hw/i386/intel_iommu_internal.h | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index b487224..b170f97 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1404,6 +1404,15 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
         }
         break;
 
+    case VTD_INV_DESC_IEC:
+        VTD_DPRINTF(INV, "Interrupt Entry Cache Invalidation "
+                    "not implemented yet");
+        /*
+         * Since currently we do not cache interrupt entries, we can
+         * just mark this descriptor as "good" and move on.
+         */
+        break;
+
     default:
         VTD_DPRINTF(GENERAL, "error: unkonw Invalidation Descriptor type "
                     "hi 0x%"PRIx64 " lo 0x%"PRIx64 " type %"PRIu8,
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index e5f514c..b648e69 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -286,6 +286,8 @@ typedef struct VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_TYPE               0xf
 #define VTD_INV_DESC_CC                 0x1 /* Context-cache Invalidate Desc */
 #define VTD_INV_DESC_IOTLB              0x2
+#define VTD_INV_DESC_IEC                0x4 /* Interrupt Entry Cache
+                                               Invalidate Descriptor */
 #define VTD_INV_DESC_WAIT               0x5 /* Invalidation Wait Descriptor */
 #define VTD_INV_DESC_NONE               0   /* Not an Invalidate Descriptor */
 
-- 
2.4.11

^ permalink raw reply related	[flat|nested] 63+ messages in thread
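
For illustration only (not part of the patch): a minimal standalone
sketch of the dispatch idea above, assuming the descriptor type is taken
from the low four bits of the first 64-bit word, as the
VTD_INV_DESC_TYPE mask suggests. The struct and function names are
hypothetical; only the numeric type values come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Type values mirror the defines in the patch. */
#define INV_DESC_TYPE_MASK 0xfULL
#define INV_DESC_CC        0x1
#define INV_DESC_IOTLB     0x2
#define INV_DESC_IEC       0x4
#define INV_DESC_WAIT      0x5

/* Hypothetical stand-in for VTDInvDesc: two 64-bit words. */
struct inv_desc {
    uint64_t lo;
    uint64_t hi;
};

/* Return true if the descriptor type is accepted, false if unknown. */
static bool inv_desc_accept(const struct inv_desc *d)
{
    assert(d != NULL);

    switch (d->lo & INV_DESC_TYPE_MASK) {
    case INV_DESC_CC:
    case INV_DESC_IOTLB:
    case INV_DESC_WAIT:
        /* A real device model would do actual work for these. */
        return true;
    case INV_DESC_IEC:
        /* No interrupt entries are cached, so there is nothing to flush. */
        return true;
    default:
        return false;
    }
}
```

Accepting the IEC descriptor without doing any work is only safe while
no interrupt entry cache exists; once one is added, this case has to
invalidate it.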

* [Qemu-devel] [PATCH v10 07/26] intel_iommu: set IR bit for ECAP register
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
                   ` (5 preceding siblings ...)
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 06/26] intel_iommu: allow queued invalidation for IR Peter Xu
@ 2016-06-21  7:47 ` Peter Xu
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 08/26] acpi: add DMAR scope definition for root IOAPIC Peter Xu
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

Set the IR bit in the IOMMU Extended Capability register when interrupt
remapping is supported.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c          | 6 ++++++
 hw/i386/intel_iommu_internal.h | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index b170f97..e216fd3 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1948,6 +1948,8 @@ static AddressSpace *vtd_find_add_as(X86IOMMUState *x86_iommu, PCIBus *bus,
  */
 static void vtd_init(IntelIOMMUState *s)
 {
+    X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
+
     memset(s->csr, 0, DMAR_REG_SIZE);
     memset(s->wmask, 0, DMAR_REG_SIZE);
     memset(s->w1cmask, 0, DMAR_REG_SIZE);
@@ -1968,6 +1970,10 @@ static void vtd_init(IntelIOMMUState *s)
              VTD_CAP_SAGAW | VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS;
     s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
 
+    if (x86_iommu->intr_supported) {
+        s->ecap |= VTD_ECAP_IR;
+    }
+
     vtd_reset_context_cache(s);
     vtd_reset_iotlb(s);
 
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index b648e69..5b98a11 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -176,6 +176,8 @@
 /* (offset >> 4) << 8 */
 #define VTD_ECAP_IRO                (DMAR_IOTLB_REG_OFFSET << 4)
 #define VTD_ECAP_QI                 (1ULL << 1)
+/* Interrupt Remapping support */
+#define VTD_ECAP_IR                 (1ULL << 3)
 
 /* CAP_REG */
 /* (offset >> 4) << 24 */
-- 
2.4.11

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v10 08/26] acpi: add DMAR scope definition for root IOAPIC
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
                   ` (6 preceding siblings ...)
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 07/26] intel_iommu: set IR bit for ECAP register Peter Xu
@ 2016-06-21  7:47 ` Peter Xu
  2016-07-04 15:22   ` Michael S. Tsirkin
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 09/26] intel_iommu: define interrupt remap table addr register Peter Xu
                   ` (19 subsequent siblings)
  27 siblings, 1 reply; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

To enable interrupt remapping for the Intel IOMMU device, each IOAPIC
device in the system that is reported via the ACPI MADT must be
explicitly enumerated under one specific remapping hardware unit. This
patch adds the root-complex IOAPIC to the default DMAR device.

Please refer to VT-d spec 8.3.1.1 for more information.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/acpi-build.c        | 19 ++++++++++++++++---
 include/hw/acpi/acpi-defs.h | 15 +++++++++++++++
 include/hw/pci-host/q35.h   |  8 ++++++++
 3 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 961ccd6a..eec022e 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -77,6 +77,9 @@
 #define ACPI_BUILD_DPRINTF(fmt, ...)
 #endif
 
+/* Default IOAPIC ID */
+#define ACPI_BUILD_IOAPIC_ID 0x0
+
 typedef struct AcpiMcfgInfo {
     uint64_t mcfg_base;
     uint32_t mcfg_size;
@@ -370,7 +373,6 @@ build_madt(GArray *table_data, BIOSLinker *linker, PCMachineState *pcms)
     io_apic = acpi_data_push(table_data, sizeof *io_apic);
     io_apic->type = ACPI_APIC_IO;
     io_apic->length = sizeof(*io_apic);
-#define ACPI_BUILD_IOAPIC_ID 0x0
     io_apic->io_apic_id = ACPI_BUILD_IOAPIC_ID;
     io_apic->address = cpu_to_le32(IO_APIC_DEFAULT_ADDRESS);
     io_apic->interrupt = cpu_to_le32(0);
@@ -2425,6 +2427,9 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
     AcpiDmarHardwareUnit *drhd;
     uint8_t dmar_flags = 0;
     X86IOMMUState *iommu = x86_iommu_get_default();
+    AcpiDmarDeviceScope *scope = NULL;
+    /* Root complex IOAPIC use one path[0] only */
+    uint8_t ioapic_scope_size = sizeof(*scope) + sizeof(scope->path[0]);
 
     assert(iommu);
     if (iommu->intr_supported) {
@@ -2437,13 +2442,21 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
     dmar->flags = dmar_flags;
 
     /* DMAR Remapping Hardware Unit Definition structure */
-    drhd = acpi_data_push(table_data, sizeof(*drhd));
+    drhd = acpi_data_push(table_data, sizeof(*drhd) + ioapic_scope_size);
     drhd->type = cpu_to_le16(ACPI_DMAR_TYPE_HARDWARE_UNIT);
-    drhd->length = cpu_to_le16(sizeof(*drhd));   /* No device scope now */
+    drhd->length = cpu_to_le16(sizeof(*drhd) + ioapic_scope_size);
     drhd->flags = ACPI_DMAR_INCLUDE_PCI_ALL;
     drhd->pci_segment = cpu_to_le16(0);
     drhd->address = cpu_to_le64(Q35_HOST_BRIDGE_IOMMU_ADDR);
 
+    /* Scope definition for the root-complex IOAPIC */
+    scope = &drhd->scope[0];
+    scope->entry_type = ACPI_DMAR_DEV_SCOPE_TYPE_IOAPIC;
+    scope->length = ioapic_scope_size;
+    scope->enumeration_id = ACPI_BUILD_IOAPIC_ID;
+    scope->bus = Q35_PSEUDO_BUS_PLATFORM;
+    scope->path[0] = cpu_to_le16(Q35_PSEUDO_DEVFN_IOAPIC);
+
     build_header(linker, table_data, (void *)(table_data->data + dmar_start),
                  "DMAR", table_data->len - dmar_start, 1, NULL, NULL);
 }
diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
index ea9be0b..0dbdde3 100644
--- a/include/hw/acpi/acpi-defs.h
+++ b/include/hw/acpi/acpi-defs.h
@@ -571,6 +571,20 @@ enum {
 /*
  * Sub-structures for DMAR
  */
+
+#define ACPI_DMAR_DEV_SCOPE_TYPE_IOAPIC     (0x03)
+
+/* Device scope structure for DRHD. */
+struct AcpiDmarDeviceScope {
+    uint8_t entry_type;
+    uint8_t length;
+    uint16_t reserved;
+    uint8_t enumeration_id;
+    uint8_t bus;
+    uint16_t path[0];           /* list of dev:func pairs */
+} QEMU_PACKED;
+typedef struct AcpiDmarDeviceScope AcpiDmarDeviceScope;
+
 /* Type 0: Hardware Unit Definition */
 struct AcpiDmarHardwareUnit {
     uint16_t type;
@@ -579,6 +593,7 @@ struct AcpiDmarHardwareUnit {
     uint8_t reserved;
     uint16_t pci_segment;   /* The PCI Segment associated with this unit */
     uint64_t address;   /* Base address of remapping hardware register-set */
+    AcpiDmarDeviceScope scope[0];
 } QEMU_PACKED;
 typedef struct AcpiDmarHardwareUnit AcpiDmarHardwareUnit;
 
diff --git a/include/hw/pci-host/q35.h b/include/hw/pci-host/q35.h
index c5c073d..312b47f 100644
--- a/include/hw/pci-host/q35.h
+++ b/include/hw/pci-host/q35.h
@@ -175,4 +175,12 @@ typedef struct Q35PCIHost {
 
 uint64_t mch_mcfg_base(void);
 
+/*
+ * Arbitrary but unique BDF number for IOAPIC device.
+ *
+ * TODO: make sure there is no conflict with a real PCI bus
+ */
+#define Q35_PSEUDO_BUS_PLATFORM         (0xff)
+#define Q35_PSEUDO_DEVFN_IOAPIC         (0x00)
+
 #endif /* HW_Q35_H */
-- 
2.4.11

^ permalink raw reply related	[flat|nested] 63+ messages in thread
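
For illustration only (not part of the patch): a byte-level sketch of
the device scope entry built above. It assumes the layout in the
AcpiDmarDeviceScope struct (6-byte header plus one 2-byte path entry, 8
bytes total) and, per my reading of the VT-d spec, that a path entry is
a device byte followed by a function byte. The function name and the
macros are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCOPE_TYPE_IOAPIC 0x03
#define SCOPE_HEADER_LEN  6   /* type, length, reserved[2], enum id, bus */
#define SCOPE_PATH_LEN    2   /* one device/function pair */

/*
 * Write an IOAPIC device scope with a single path entry into buf.
 * Returns the number of bytes written (always 8).
 */
static size_t build_ioapic_scope(uint8_t *buf, size_t buflen,
                                 uint8_t ioapic_id, uint8_t bus,
                                 uint8_t dev, uint8_t fn)
{
    const size_t len = SCOPE_HEADER_LEN + SCOPE_PATH_LEN;

    assert(buf != NULL && buflen >= len);
    memset(buf, 0, len);          /* also clears the reserved bytes 2..3 */
    buf[0] = SCOPE_TYPE_IOAPIC;   /* entry_type */
    buf[1] = (uint8_t)len;        /* length covers header and path */
    buf[4] = ioapic_id;           /* enumeration_id */
    buf[5] = bus;                 /* start bus number */
    buf[6] = dev;                 /* path[0]: device */
    buf[7] = fn;                  /* path[0]: function */
    return len;
}
```

With the values from the patch (IOAPIC ID 0x0, bus 0xff, devfn 0x00) the
entry length is 8, which is the amount added to the DRHD length.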

* [Qemu-devel] [PATCH v10 09/26] intel_iommu: define interrupt remap table addr register
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
                   ` (7 preceding siblings ...)
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 08/26] acpi: add DMAR scope definition for root IOAPIC Peter Xu
@ 2016-06-21  7:47 ` Peter Xu
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 10/26] intel_iommu: handle interrupt remap enable Peter Xu
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

Define the Interrupt Remap Table Address register to store the IR table
pointer. Also handle writes to the global command register properly, so
that the table pointer and its size are stored.

One more debug flag, "DEBUG_IR", is added for interrupt remapping.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c          | 52 +++++++++++++++++++++++++++++++++++++++++-
 hw/i386/intel_iommu_internal.h |  4 ++++
 include/hw/i386/intel_iommu.h  |  5 ++++
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index e216fd3..26ef17a3 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -33,7 +33,7 @@
 #ifdef DEBUG_INTEL_IOMMU
 enum {
     DEBUG_GENERAL, DEBUG_CSR, DEBUG_INV, DEBUG_MMU, DEBUG_FLOG,
-    DEBUG_CACHE,
+    DEBUG_CACHE, DEBUG_IR,
 };
 #define VTD_DBGBIT(x)   (1 << DEBUG_##x)
 static int vtd_dbgflags = VTD_DBGBIT(GENERAL) | VTD_DBGBIT(CSR);
@@ -903,6 +903,19 @@ static void vtd_root_table_setup(IntelIOMMUState *s)
                 (s->root_extended ? "(extended)" : ""));
 }
 
+static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
+{
+    uint64_t value = 0;
+    value = vtd_get_quad_raw(s, DMAR_IRTA_REG);
+    s->intr_size = 1UL << ((value & VTD_IRTA_SIZE_MASK) + 1);
+    s->intr_root = value & VTD_IRTA_ADDR_MASK;
+
+    /* TODO: invalidate interrupt entry cache */
+
+    VTD_DPRINTF(CSR, "int remap table addr 0x%"PRIx64 " size %"PRIu32,
+                s->intr_root, s->intr_size);
+}
+
 static void vtd_context_global_invalidate(IntelIOMMUState *s)
 {
     s->context_cache_gen++;
@@ -1141,6 +1154,16 @@ static void vtd_handle_gcmd_srtp(IntelIOMMUState *s)
     vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_RTPS);
 }
 
+/* Set Interrupt Remap Table Pointer */
+static void vtd_handle_gcmd_sirtp(IntelIOMMUState *s)
+{
+    VTD_DPRINTF(CSR, "set Interrupt Remap Table Pointer");
+
+    vtd_interrupt_remap_table_setup(s);
+    /* Ok - report back to driver */
+    vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_IRTPS);
+}
+
 /* Handle Translation Enable/Disable */
 static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
 {
@@ -1180,6 +1203,10 @@ static void vtd_handle_gcmd_write(IntelIOMMUState *s)
         /* Queued Invalidation Enable */
         vtd_handle_gcmd_qie(s, val & VTD_GCMD_QIE);
     }
+    if (val & VTD_GCMD_SIRTP) {
+        /* Set/update the interrupt remapping root-table pointer */
+        vtd_handle_gcmd_sirtp(s);
+    }
 }
 
 /* Handle write to Context Command Register */
@@ -1841,6 +1868,23 @@ static void vtd_mem_write(void *opaque, hwaddr addr,
         vtd_update_fsts_ppf(s);
         break;
 
+    case DMAR_IRTA_REG:
+        VTD_DPRINTF(IR, "DMAR_IRTA_REG write addr 0x%"PRIx64
+                    ", size %d, val 0x%"PRIx64, addr, size, val);
+        if (size == 4) {
+            vtd_set_long(s, addr, val);
+        } else {
+            vtd_set_quad(s, addr, val);
+        }
+        break;
+
+    case DMAR_IRTA_REG_HI:
+        VTD_DPRINTF(IR, "DMAR_IRTA_REG_HI write addr 0x%"PRIx64
+                    ", size %d, val 0x%"PRIx64, addr, size, val);
+        assert(size == 4);
+        vtd_set_long(s, addr, val);
+        break;
+
     default:
         VTD_DPRINTF(GENERAL, "error: unhandled reg write addr 0x%"PRIx64
                     ", size %d, val 0x%"PRIx64, addr, size, val);
@@ -2023,6 +2067,12 @@ static void vtd_init(IntelIOMMUState *s)
     /* Fault Recording Registers, 128-bit */
     vtd_define_quad(s, DMAR_FRCD_REG_0_0, 0, 0, 0);
     vtd_define_quad(s, DMAR_FRCD_REG_0_2, 0, 0, 0x8000000000000000ULL);
+
+    /*
+     * Interrupt remapping registers. Extended interrupt mode is not
+     * supported for now.
+     */
+    vtd_define_quad(s, DMAR_IRTA_REG, 0, 0xfffffffffffff00fULL, 0);
 }
 
 /* Should not reset address_spaces when reset because devices will still use
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 5b98a11..309833f 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -172,6 +172,10 @@
 #define VTD_RTADDR_RTT              (1ULL << 11)
 #define VTD_RTADDR_ADDR_MASK        (VTD_HAW_MASK ^ 0xfffULL)
 
+/* IRTA_REG */
+#define VTD_IRTA_ADDR_MASK          (VTD_HAW_MASK ^ 0xfffULL)
+#define VTD_IRTA_SIZE_MASK          (0xfULL)
+
 /* ECAP_REG */
 /* (offset >> 4) << 8 */
 #define VTD_ECAP_IRO                (DMAR_IOTLB_REG_OFFSET << 4)
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 638d77f..83d1905 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -125,6 +125,11 @@ struct IntelIOMMUState {
     MemoryRegionIOMMUOps iommu_ops;
     GHashTable *vtd_as_by_busptr;   /* VTDBus objects indexed by PCIBus* reference */
     VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */
+
+    /* interrupt remapping */
+    bool intr_enabled;              /* Whether guest enabled IR */
+    dma_addr_t intr_root;           /* Interrupt remapping table pointer */
+    uint32_t intr_size;             /* Number of IR table entries */
 };
 
 #endif
-- 
2.4.11

^ permalink raw reply related	[flat|nested] 63+ messages in thread
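
For illustration only (not part of the patch): a standalone sketch of
how the IRTA register value is split into a table pointer and an entry
count, using the same masks as the patch. The low four bits hold S and
the table has 2^(S+1) entries; the address is the register value with
the low 12 bits and everything above the 39-bit host address width
masked off. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define HOST_ADDRESS_WIDTH 39
#define HAW_MASK           ((1ULL << HOST_ADDRESS_WIDTH) - 1)
#define IRTA_ADDR_MASK     (HAW_MASK ^ 0xfffULL)
#define IRTA_SIZE_MASK     0xfULL

/* Hypothetical container for the decoded register. */
struct irta {
    uint64_t root;     /* guest-physical address of the IR table */
    uint32_t entries;  /* number of IRTEs in the table */
};

static struct irta irta_decode(uint64_t reg)
{
    struct irta t;

    /* S is 0..15, so the entry count ranges from 2 to 65536. */
    t.entries = 1U << ((reg & IRTA_SIZE_MASK) + 1);
    t.root = reg & IRTA_ADDR_MASK;
    assert(t.entries >= 2 && t.entries <= 65536);
    return t;
}
```

For example, a register value of 0x12345007 decodes to a table at
0x12345000 with 256 entries.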

* [Qemu-devel] [PATCH v10 10/26] intel_iommu: handle interrupt remap enable
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
                   ` (8 preceding siblings ...)
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 09/26] intel_iommu: define interrupt remap table addr register Peter Xu
@ 2016-06-21  7:47 ` Peter Xu
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 11/26] intel_iommu: define several structs for IOMMU IR Peter Xu
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

Handle writes to the IRE bit in the global command register.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 26ef17a3..d061e2a 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1183,6 +1183,22 @@ static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
     }
 }
 
+/* Handle Interrupt Remap Enable/Disable */
+static void vtd_handle_gcmd_ire(IntelIOMMUState *s, bool en)
+{
+    VTD_DPRINTF(CSR, "Interrupt Remap Enable %s", (en ? "on" : "off"));
+
+    if (en) {
+        s->intr_enabled = true;
+        /* Ok - report back to driver */
+        vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_IRES);
+    } else {
+        s->intr_enabled = false;
+        /* Ok - report back to driver */
+        vtd_set_clear_mask_long(s, DMAR_GSTS_REG, VTD_GSTS_IRES, 0);
+    }
+}
+
 /* Handle write to Global Command Register */
 static void vtd_handle_gcmd_write(IntelIOMMUState *s)
 {
@@ -1207,6 +1223,10 @@ static void vtd_handle_gcmd_write(IntelIOMMUState *s)
         /* Set/update the interrupt remapping root-table pointer */
         vtd_handle_gcmd_sirtp(s);
     }
+    if (changed & VTD_GCMD_IRE) {
+        /* Interrupt remap enable/disable */
+        vtd_handle_gcmd_ire(s, val & VTD_GCMD_IRE);
+    }
 }
 
 /* Handle write to Context Command Register */
-- 
2.4.11

^ permalink raw reply related	[flat|nested] 63+ messages in thread
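
For illustration only (not part of the patch): a reduced sketch of the
enable/disable handshake above. It assumes that "changed" is the XOR of
the current status register and the value written to the command
register, and that IRE and IRES both sit at bit 25 (my reading of the
VT-d register layout). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions; IRE and IRES mirror each other. */
#define GCMD_IRE  (1U << 25)
#define GSTS_IRES (1U << 25)

/* Hypothetical reduced device state. */
struct ir_state {
    uint32_t gsts;         /* global status register */
    bool     intr_enabled; /* whether the guest enabled IR */
};

/* Handle a write of val to the global command register. */
static void gcmd_write(struct ir_state *s, uint32_t val)
{
    assert(s != NULL);

    /* Only act on bits whose requested state differs from the status. */
    uint32_t changed = s->gsts ^ val;

    if (changed & GCMD_IRE) {
        s->intr_enabled = (val & GCMD_IRE) != 0;
        if (s->intr_enabled) {
            s->gsts |= GSTS_IRES;   /* report "enabled" back to the driver */
        } else {
            s->gsts &= ~GSTS_IRES;  /* report "disabled" back to the driver */
        }
    }
}
```

Writing the same IRE value twice is a no-op in this sketch, because the
second write no longer differs from the status bit.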

* [Qemu-devel] [PATCH v10 11/26] intel_iommu: define several structs for IOMMU IR
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
                   ` (9 preceding siblings ...)
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 10/26] intel_iommu: handle interrupt remap enable Peter Xu
@ 2016-06-21  7:47 ` Peter Xu
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 12/26] intel_iommu: add IR translation faults defines Peter Xu
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

Define several data structures needed by the rest of the series: the
IRTE, used to parse remapping table entries, and the IOAPIC/MSI related
bit layouts, used to parse the interrupt entries that the guest kernel
fills in.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/hw/i386/intel_iommu.h | 74 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 83d1905..9a898c1 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -52,6 +52,8 @@ typedef struct IntelIOMMUState IntelIOMMUState;
 typedef struct VTDAddressSpace VTDAddressSpace;
 typedef struct VTDIOTLBEntry VTDIOTLBEntry;
 typedef struct VTDBus VTDBus;
+typedef union VTD_IRTE VTD_IRTE;
+typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
 
 /* Context-Entry */
 struct VTDContextEntry {
@@ -90,6 +92,78 @@ struct VTDIOTLBEntry {
     bool write_flags;
 };
 
+/* Interrupt Remapping Table Entry Definition */
+union VTD_IRTE {
+    struct {
+#ifdef HOST_WORDS_BIGENDIAN
+        uint32_t dest_id:32;         /* Destination ID */
+        uint32_t __reserved_1:8;     /* Reserved 1 */
+        uint32_t vector:8;           /* Interrupt Vector */
+        uint32_t irte_mode:1;        /* IRTE Mode */
+        uint32_t __reserved_0:3;     /* Reserved 0 */
+        uint32_t __avail:4;          /* Available spaces for software */
+        uint32_t delivery_mode:3;    /* Delivery Mode */
+        uint32_t trigger_mode:1;     /* Trigger Mode */
+        uint32_t redir_hint:1;       /* Redirection Hint */
+        uint32_t dest_mode:1;        /* Destination Mode */
+        uint32_t fault_disable:1;    /* Fault Processing Disable */
+        uint32_t present:1;          /* Whether entry present/available */
+#else
+        uint32_t present:1;          /* Whether entry present/available */
+        uint32_t fault_disable:1;    /* Fault Processing Disable */
+        uint32_t dest_mode:1;        /* Destination Mode */
+        uint32_t redir_hint:1;       /* Redirection Hint */
+        uint32_t trigger_mode:1;     /* Trigger Mode */
+        uint32_t delivery_mode:3;    /* Delivery Mode */
+        uint32_t __avail:4;          /* Available spaces for software */
+        uint32_t __reserved_0:3;     /* Reserved 0 */
+        uint32_t irte_mode:1;        /* IRTE Mode */
+        uint32_t vector:8;           /* Interrupt Vector */
+        uint32_t __reserved_1:8;     /* Reserved 1 */
+        uint32_t dest_id:32;         /* Destination ID */
+#endif
+        uint16_t source_id:16;       /* Source-ID */
+#ifdef HOST_WORDS_BIGENDIAN
+        uint64_t __reserved_2:44;    /* Reserved 2 */
+        uint64_t sid_vtype:2;        /* Source-ID Validation Type */
+        uint64_t sid_q:2;            /* Source-ID Qualifier */
+#else
+        uint64_t sid_q:2;            /* Source-ID Qualifier */
+        uint64_t sid_vtype:2;        /* Source-ID Validation Type */
+        uint64_t __reserved_2:44;    /* Reserved 2 */
+#endif
+    } QEMU_PACKED;
+    uint64_t data[2];
+};
+
+#define VTD_IR_INT_FORMAT_COMPAT     (0) /* Compatible Interrupt */
+#define VTD_IR_INT_FORMAT_REMAP      (1) /* Remappable Interrupt */
+
+/* Programming format for MSI/MSI-X addresses */
+union VTD_IR_MSIAddress {
+    struct {
+#ifdef HOST_WORDS_BIGENDIAN
+        uint32_t __head:12;          /* Should always be: 0x0fee */
+        uint32_t index_l:15;         /* Interrupt index bit 14-0 */
+        uint32_t int_mode:1;         /* Interrupt format */
+        uint32_t sub_valid:1;        /* SHV: Sub-Handle Valid bit */
+        uint32_t index_h:1;          /* Interrupt index bit 15 */
+        uint32_t __not_care:2;
+#else
+        uint32_t __not_care:2;
+        uint32_t index_h:1;          /* Interrupt index bit 15 */
+        uint32_t sub_valid:1;        /* SHV: Sub-Handle Valid bit */
+        uint32_t int_mode:1;         /* Interrupt format */
+        uint32_t index_l:15;         /* Interrupt index bit 14-0 */
+        uint32_t __head:12;          /* Should always be: 0x0fee */
+#endif
+    } QEMU_PACKED;
+    uint32_t data;
+};
+
+/* When IR is enabled, all MSI/MSI-X data bits should be zero */
+#define VTD_IR_MSI_DATA          (0)
+
 /* The iommu (DMAR) device state struct */
 struct IntelIOMMUState {
     X86IOMMUState x86_iommu;
-- 
2.4.11

^ permalink raw reply related	[flat|nested] 63+ messages in thread
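
For illustration only (not part of the patch): the same field layouts
expressed with shifts and masks instead of bitfields, which avoids any
dependence on host bitfield ordering. The bit positions are read off the
little-endian branches of the unions above; the struct and function
names are hypothetical, and the inputs are assumed to be already
converted to host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Subset of the IRTE fields held in the low 64-bit word. */
struct irte_fields {
    bool     present;
    uint8_t  dest_mode;
    uint8_t  redir_hint;
    uint8_t  trigger_mode;
    uint8_t  delivery_mode;
    uint8_t  vector;
    uint32_t dest_id;
};

/* Decode the low 64 bits of an IRTE (host byte order). */
static void irte_decode_lo(uint64_t lo, struct irte_fields *f)
{
    assert(f != NULL);
    f->present       = (lo & 1) != 0;               /* bit 0 */
    f->dest_mode     = (uint8_t)((lo >> 2) & 0x1);  /* bit 2 */
    f->redir_hint    = (uint8_t)((lo >> 3) & 0x1);  /* bit 3 */
    f->trigger_mode  = (uint8_t)((lo >> 4) & 0x1);  /* bit 4 */
    f->delivery_mode = (uint8_t)((lo >> 5) & 0x7);  /* bits 7:5 */
    f->vector        = (uint8_t)((lo >> 16) & 0xff);/* bits 23:16 */
    f->dest_id       = (uint32_t)(lo >> 32);        /* bits 63:32 */
}

/* True if the MSI address has the 0xfee head and remappable format. */
static bool msi_addr_is_remappable(uint32_t addr)
{
    return (addr >> 20) == 0xfee && ((addr >> 4) & 0x1) != 0;
}

/* Interrupt index: bits 19:5 are index[14:0], bit 2 is index[15]. */
static uint16_t msi_addr_index(uint32_t addr)
{
    uint16_t lo = (uint16_t)((addr >> 5) & 0x7fff);
    uint16_t hi = (uint16_t)((addr >> 2) & 0x1);

    return (uint16_t)(lo | (hi << 15));
}
```

The SHV bit (bit 3 of the address) is not interpreted in this sketch.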

* [Qemu-devel] [PATCH v10 12/26] intel_iommu: add IR translation faults defines
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
                   ` (10 preceding siblings ...)
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 11/26] intel_iommu: define several structs for IOMMU IR Peter Xu
@ 2016-06-21  7:47 ` Peter Xu
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 13/26] intel_iommu: Add support for PCI MSI remap Peter Xu
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

Add translation fault definitions for interrupt remapping. Please
refer to VT-d spec section 7.1.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu_internal.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 309833f..2a9987f 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -271,6 +271,19 @@ typedef enum VTDFaultReason {
      * context-entry.
      */
     VTD_FR_CONTEXT_ENTRY_TT,
+
+    /* Interrupt remapping translation faults */
+    VTD_FR_IR_REQ_RSVD = 0x20, /* One or more IR request reserved
+                                * fields set */
+    VTD_FR_IR_INDEX_OVER = 0x21, /* Index value greater than max */
+    VTD_FR_IR_ENTRY_P = 0x22,    /* Present (P) not set in IRTE */
+    VTD_FR_IR_ROOT_INVAL = 0x23, /* IR Root table invalid */
+    VTD_FR_IR_IRTE_RSVD = 0x24,  /* IRTE Rsvd field non-zero with
+                                  * Present flag set */
+    VTD_FR_IR_REQ_COMPAT = 0x25, /* Encountered compatible IR
+                                  * request while disabled */
+    VTD_FR_IR_SID_ERR = 0x26,   /* Invalid Source-ID */
+
     /* This is not a normal fault reason. We use this to indicate some faults
      * that are not referenced by the VT-d specification.
      * Fault event with such reason should not be recorded.
-- 
2.4.11

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v10 13/26] intel_iommu: Add support for PCI MSI remap
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
                   ` (11 preceding siblings ...)
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 12/26] intel_iommu: add IR translation faults defines Peter Xu
@ 2016-06-21  7:47 ` Peter Xu
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 14/26] q35: ioapic: add support for emulated IOAPIC IR Peter Xu
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

This patch enables interrupt remapping for PCI devices.

To do this, a memory region "iommu_ir" is added as a child region of the
original IOMMU memory region, covering the range 0xfeeXXXXX (the address
range of the APIC). Every write to this range is treated as an MSI, and
translation is carried out only when IR is enabled.

Idea suggested by Paolo Bonzini.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c          | 243 +++++++++++++++++++++++++++++++++++++++++
 hw/i386/intel_iommu_internal.h |   2 +
 include/hw/i386/intel_iommu.h  |  66 +++++++++++
 3 files changed, 311 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index d061e2a..cab3d8c 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1972,6 +1972,244 @@ static Property vtd_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+/* Read IRTE entry with specific index */
+static int vtd_irte_get(IntelIOMMUState *iommu, uint16_t index,
+                        VTD_IRTE *entry)
+{
+    dma_addr_t addr = 0x00;
+
+    addr = iommu->intr_root + index * sizeof(*entry);
+    if (dma_memory_read(&address_space_memory, addr, entry,
+                        sizeof(*entry))) {
+        VTD_DPRINTF(GENERAL, "error: fail to access IR root at 0x%"PRIx64
+                    " + %"PRIu16, iommu->intr_root, index);
+        return -VTD_FR_IR_ROOT_INVAL;
+    }
+
+    if (!entry->present) {
+        VTD_DPRINTF(GENERAL, "error: present flag not set in IRTE"
+                    " entry index %u value 0x%"PRIx64 " 0x%"PRIx64,
+                    index, le64_to_cpu(entry->data[1]),
+                    le64_to_cpu(entry->data[0]));
+        return -VTD_FR_IR_ENTRY_P;
+    }
+
+    if (entry->__reserved_0 || entry->__reserved_1 || \
+        entry->__reserved_2) {
+        VTD_DPRINTF(GENERAL, "error: IRTE entry index %"PRIu16
+                    " reserved fields non-zero: 0x%"PRIx64 " 0x%"PRIx64,
+                    index, le64_to_cpu(entry->data[1]),
+                    le64_to_cpu(entry->data[0]));
+        return -VTD_FR_IR_IRTE_RSVD;
+    }
+
+    /*
+     * TODO: Check Source-ID corresponds to SVT (Source Validation
+     * Type) bits
+     */
+
+    return 0;
+}
+
+/* Fetch IRQ information of specific IR index */
+static int vtd_remap_irq_get(IntelIOMMUState *iommu, uint16_t index, VTDIrq *irq)
+{
+    VTD_IRTE irte;
+    int ret = 0;
+
+    bzero(&irte, sizeof(irte));
+
+    ret = vtd_irte_get(iommu, index, &irte);
+    if (ret) {
+        return ret;
+    }
+
+    irq->trigger_mode = irte.trigger_mode;
+    irq->vector = irte.vector;
+    irq->delivery_mode = irte.delivery_mode;
+    /* EIM is not supported yet: please refer to vt-d 9.10 DST bits */
+#define  VTD_IR_APIC_DEST_MASK         (0xff00ULL)
+#define  VTD_IR_APIC_DEST_SHIFT        (8)
+    irq->dest = (le32_to_cpu(irte.dest_id) & VTD_IR_APIC_DEST_MASK) >> \
+        VTD_IR_APIC_DEST_SHIFT;
+    irq->dest_mode = irte.dest_mode;
+    irq->redir_hint = irte.redir_hint;
+
+    VTD_DPRINTF(IR, "remapping interrupt index %d: trig:%u,vec:%u,"
+                "deliver:%u,dest:%u,dest_mode:%u", index,
+                irq->trigger_mode, irq->vector, irq->delivery_mode,
+                irq->dest, irq->dest_mode);
+
+    return 0;
+}
+
+/* Generate one MSI message from VTDIrq info */
+static void vtd_generate_msi_message(VTDIrq *irq, MSIMessage *msg_out)
+{
+    VTD_MSIMessage msg = {};
+
+    /* Generate address bits */
+    msg.dest_mode = irq->dest_mode;
+    msg.redir_hint = irq->redir_hint;
+    msg.dest = irq->dest;
+    msg.__addr_head = cpu_to_le32(0xfee);
+    /* Keep this from original MSI address bits */
+    msg.__not_used = irq->msi_addr_last_bits;
+
+    /* Generate data bits */
+    msg.vector = irq->vector;
+    msg.delivery_mode = irq->delivery_mode;
+    msg.level = 1;
+    msg.trigger_mode = irq->trigger_mode;
+
+    msg_out->address = msg.msi_addr;
+    msg_out->data = msg.msi_data;
+}
+
+/* Interrupt remapping for MSI/MSI-X entry */
+static int vtd_interrupt_remap_msi(IntelIOMMUState *iommu,
+                                   MSIMessage *origin,
+                                   MSIMessage *translated)
+{
+    int ret = 0;
+    VTD_IR_MSIAddress addr;
+    uint16_t index;
+    VTDIrq irq = {0};
+
+    assert(origin && translated);
+
+    if (!iommu || !iommu->intr_enabled) {
+        goto do_not_translate;
+    }
+
+    if (origin->address & VTD_MSI_ADDR_HI_MASK) {
+        VTD_DPRINTF(GENERAL, "error: MSI addr high 32 bits nonzero"
+                    " during interrupt remapping: 0x%"PRIx32,
+                    (uint32_t)((origin->address & VTD_MSI_ADDR_HI_MASK) >> \
+                    VTD_MSI_ADDR_HI_SHIFT));
+        return -VTD_FR_IR_REQ_RSVD;
+    }
+
+    addr.data = origin->address & VTD_MSI_ADDR_LO_MASK;
+    if (le16_to_cpu(addr.__head) != 0xfee) {
+        VTD_DPRINTF(GENERAL, "error: MSI addr low 32 bits invalid: "
+                    "0x%"PRIx32, addr.data);
+        return -VTD_FR_IR_REQ_RSVD;
+    }
+
+    /* This is compatibility mode. */
+    if (addr.int_mode != VTD_IR_INT_FORMAT_REMAP) {
+        goto do_not_translate;
+    }
+
+    index = addr.index_h << 15 | le16_to_cpu(addr.index_l);
+
+    ret = vtd_remap_irq_get(iommu, index, &irq);
+    if (ret) {
+        return ret;
+    }
+
+    if (addr.sub_valid == 1) {
+        VTD_DPRINTF(IR, "received MSI interrupt");
+        if (origin->data) {
+            VTD_DPRINTF(GENERAL, "error: MSI data bits non-zero for "
+                        "interrupt remappable entry: 0x%"PRIx32,
+                        origin->data);
+            return -VTD_FR_IR_REQ_RSVD;
+        }
+    } else {
+        uint8_t vector = origin->data & 0xff;
+        VTD_DPRINTF(IR, "received IOAPIC interrupt");
+        /* IOAPIC entry vector should be aligned with IRTE vector
+         * (see vt-d spec 5.1.5.1). */
+        if (vector != irq.vector) {
+            VTD_DPRINTF(GENERAL, "IOAPIC vector inconsistent: "
+                        "entry: %d, IRTE: %d, index: %d",
+                        vector, irq.vector, index);
+        }
+    }
+
+    /*
+     * We'd better keep the last two bits, assuming that the guest OS
+     * might modify them. Keeping them does not hurt after all.
+     */
+    irq.msi_addr_last_bits = addr.__not_care;
+
+    /* Translate VTDIrq to MSI message */
+    vtd_generate_msi_message(&irq, translated);
+
+    VTD_DPRINTF(IR, "mapping MSI 0x%"PRIx64":0x%"PRIx32 " -> "
+                "0x%"PRIx64":0x%"PRIx32, origin->address, origin->data,
+                translated->address, translated->data);
+    return 0;
+
+do_not_translate:
+    memcpy(translated, origin, sizeof(*origin));
+    return 0;
+}
+
+static MemTxResult vtd_mem_ir_read(void *opaque, hwaddr addr,
+                                   uint64_t *data, unsigned size,
+                                   MemTxAttrs attrs)
+{
+    addr += VTD_INTERRUPT_ADDR_FIRST;
+
+    VTD_DPRINTF(IR, "read mem_ir addr 0x%"PRIx64 " size %u",
+                addr, size);
+
+    if (dma_memory_read(&address_space_memory, addr, data, size)) {
+        VTD_DPRINTF(GENERAL, "error: fail to access 0x%"PRIx64, addr);
+        return MEMTX_ERROR;
+    }
+
+    return MEMTX_OK;
+}
+
+static MemTxResult vtd_mem_ir_write(void *opaque, hwaddr addr,
+                                    uint64_t value, unsigned size,
+                                    MemTxAttrs attrs)
+{
+    int ret = 0;
+    MSIMessage from = {0}, to = {0};
+
+    from.address = (uint64_t) addr + VTD_INTERRUPT_ADDR_FIRST;
+    from.data = (uint32_t) value;
+
+    ret = vtd_interrupt_remap_msi(opaque, &from, &to);
+    if (ret) {
+        /* TODO: report error */
+        VTD_DPRINTF(GENERAL, "int remap fail for addr 0x%"PRIx64
+                    " data 0x%"PRIx32, from.address, from.data);
+        /* Drop this interrupt */
+        return MEMTX_ERROR;
+    }
+
+    VTD_DPRINTF(IR, "delivering MSI 0x%"PRIx64":0x%"PRIx32
+                " for device sid 0x%04x",
+                to.address, to.data, sid);
+
+    if (dma_memory_write(&address_space_memory, to.address,
+                         &to.data, size)) {
+        VTD_DPRINTF(GENERAL, "error: fail to write 0x%"PRIx64
+                    " value 0x%"PRIx32, to.address, to.data);
+    }
+
+    return MEMTX_OK;
+}
+
+static const MemoryRegionOps vtd_mem_ir_ops = {
+    .read_with_attrs = vtd_mem_ir_read,
+    .write_with_attrs = vtd_mem_ir_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .impl = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    },
+    .valid = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    },
+};
 
 static AddressSpace *vtd_find_add_as(X86IOMMUState *x86_iommu, PCIBus *bus,
                                      int devfn)
@@ -2001,6 +2239,11 @@ static AddressSpace *vtd_find_add_as(X86IOMMUState *x86_iommu, PCIBus *bus,
         vtd_dev_as->context_cache_entry.context_cache_gen = 0;
         memory_region_init_iommu(&vtd_dev_as->iommu, OBJECT(s),
                                  &s->iommu_ops, "intel_iommu", UINT64_MAX);
+        memory_region_init_io(&vtd_dev_as->iommu_ir, OBJECT(s),
+                              &vtd_mem_ir_ops, s, "intel_iommu_ir",
+                              VTD_INTERRUPT_ADDR_SIZE);
+        memory_region_add_subregion(&vtd_dev_as->iommu, VTD_INTERRUPT_ADDR_FIRST,
+                                    &vtd_dev_as->iommu_ir);
         address_space_init(&vtd_dev_as->as,
                            &vtd_dev_as->iommu, "intel_iommu");
     }
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 2a9987f..e1a08cb 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -110,6 +110,8 @@
 /* Interrupt Address Range */
 #define VTD_INTERRUPT_ADDR_FIRST    0xfee00000ULL
 #define VTD_INTERRUPT_ADDR_LAST     0xfeefffffULL
+#define VTD_INTERRUPT_ADDR_SIZE     (VTD_INTERRUPT_ADDR_LAST - \
+                                     VTD_INTERRUPT_ADDR_FIRST + 1)
 
 /* The shift of source_id in the key of IOTLB hash table */
 #define VTD_IOTLB_SID_SHIFT         36
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 9a898c1..b3f17d7 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -24,6 +24,8 @@
 #include "hw/qdev.h"
 #include "sysemu/dma.h"
 #include "hw/i386/x86-iommu.h"
+#include "hw/i386/ioapic.h"
+#include "hw/pci/msi.h"
 
 #define TYPE_INTEL_IOMMU_DEVICE "intel-iommu"
 #define INTEL_IOMMU_DEVICE(obj) \
@@ -46,6 +48,10 @@
 
 #define DMAR_REPORT_F_INTR          (1)
 
+#define  VTD_MSI_ADDR_HI_MASK        (0xffffffff00000000ULL)
+#define  VTD_MSI_ADDR_HI_SHIFT       (32)
+#define  VTD_MSI_ADDR_LO_MASK        (0x00000000ffffffffULL)
+
 typedef struct VTDContextEntry VTDContextEntry;
 typedef struct VTDContextCacheEntry VTDContextCacheEntry;
 typedef struct IntelIOMMUState IntelIOMMUState;
@@ -54,6 +60,8 @@ typedef struct VTDIOTLBEntry VTDIOTLBEntry;
 typedef struct VTDBus VTDBus;
 typedef union VTD_IRTE VTD_IRTE;
 typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
+typedef struct VTDIrq VTDIrq;
+typedef struct VTD_MSIMessage VTD_MSIMessage;
 
 /* Context-Entry */
 struct VTDContextEntry {
@@ -74,6 +82,7 @@ struct VTDAddressSpace {
     uint8_t devfn;
     AddressSpace as;
     MemoryRegion iommu;
+    MemoryRegion iommu_ir;      /* Interrupt region: 0xfeeXXXXX */
     IntelIOMMUState *iommu_state;
     VTDContextCacheEntry context_cache_entry;
 };
@@ -161,6 +170,63 @@ union VTD_IR_MSIAddress {
     uint32_t data;
 };
 
+/* Generic IRQ entry information */
+struct VTDIrq {
+    /* Used by both IOAPIC/MSI interrupt remapping */
+    uint8_t trigger_mode;
+    uint8_t vector;
+    uint8_t delivery_mode;
+    uint32_t dest;
+    uint8_t dest_mode;
+
+    /* only used by MSI interrupt remapping */
+    uint8_t redir_hint;
+    uint8_t msi_addr_last_bits;
+};
+
+struct VTD_MSIMessage {
+    union {
+        struct {
+#ifdef HOST_WORDS_BIGENDIAN
+            uint32_t __addr_head:12; /* 0xfee */
+            uint32_t dest:8;
+            uint32_t __reserved:8;
+            uint32_t redir_hint:1;
+            uint32_t dest_mode:1;
+            uint32_t __not_used:2;
+#else
+            uint32_t __not_used:2;
+            uint32_t dest_mode:1;
+            uint32_t redir_hint:1;
+            uint32_t __reserved:8;
+            uint32_t dest:8;
+            uint32_t __addr_head:12; /* 0xfee */
+#endif
+            uint32_t __addr_hi:32;
+        } QEMU_PACKED;
+        uint64_t msi_addr;
+    };
+    union {
+        struct {
+#ifdef HOST_WORDS_BIGENDIAN
+            uint16_t trigger_mode:1;
+            uint16_t level:1;
+            uint16_t __resved:3;
+            uint16_t delivery_mode:3;
+            uint16_t vector:8;
+#else
+            uint16_t vector:8;
+            uint16_t delivery_mode:3;
+            uint16_t __resved:3;
+            uint16_t level:1;
+            uint16_t trigger_mode:1;
+#endif
+            uint16_t __resved1:16;
+        } QEMU_PACKED;
+        uint32_t msi_data;
+    };
+};
+
 /* When IR is enabled, all MSI/MSI-X data bits should be zero */
 #define VTD_IR_MSI_DATA          (0)
 
-- 
2.4.11


* [Qemu-devel] [PATCH v10 14/26] q35: ioapic: add support for emulated IOAPIC IR
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
                   ` (12 preceding siblings ...)
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 13/26] intel_iommu: Add support for PCI MSI remap Peter Xu
@ 2016-06-21  7:47 ` Peter Xu
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 15/26] ioapic: introduce ioapic_entry_parse() helper Peter Xu
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

This patch translates all IOAPIC interrupts into MSIs. A pseudo IOAPIC
address space is added to carry the MSI message. By default, it is the
system memory address space. When IR is enabled, it is the IOMMU
address space.

Currently, only the emulated IOAPIC is supported.

Idea suggested by Jan Kiszka and Rita Sinha in the following patch:

https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg01933.html

Signed-off-by: Peter Xu <peterx@redhat.com>
---
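
A rough sketch of the IOAPIC entry to MSI conversion described above,
written as a standalone function. The name rte_to_msi and the local
constants are hypothetical. The shift values (destination mode at
address bit 2, delivery mode at data bit 8, trigger mode at data bit
15) are assumed to match the MSI_* definitions in apic-msidef.h, and
the ExtINT vector lookup done by the real code is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define APIC_BASE_ADDR        0xfee00000u
#define RTE_VECTOR_MASK       0xffu
#define RTE_DELIV_MODE_SHIFT  8     /* entry bits 10:8             */
#define RTE_DELIV_MODE_MASK   0x7u
#define RTE_DEST_MODE_SHIFT   11    /* entry bit 11                */
#define RTE_TRIG_MODE_SHIFT   15    /* entry bit 15                */
#define RTE_DEST_IDX_SHIFT    48    /* entry bits 63:48            */
#define MSG_ADDR_DEST_IDX     4     /* assumed MSI address shifts  */
#define MSG_ADDR_DEST_MODE    2
#define MSG_DATA_DELIV_MODE   8     /* assumed MSI data shifts     */
#define MSG_DATA_TRIGGER      15

/* Build an MSI address/data pair from a 64-bit redirection entry. */
void rte_to_msi(uint64_t entry, uint32_t *addr, uint32_t *data)
{
    uint32_t vector = (uint32_t)(entry & RTE_VECTOR_MASK);
    uint32_t deliv = (uint32_t)(entry >> RTE_DELIV_MODE_SHIFT) &
                     RTE_DELIV_MODE_MASK;
    uint32_t dest_mode = (uint32_t)(entry >> RTE_DEST_MODE_SHIFT) & 1u;
    uint32_t trig_mode = (uint32_t)(entry >> RTE_TRIG_MODE_SHIFT) & 1u;
    /* dest_id[8] + reserved[8], or index[15] + format[1] under IR */
    uint32_t dest_idx = (uint32_t)(entry >> RTE_DEST_IDX_SHIFT) & 0xffffu;

    assert(addr != NULL && data != NULL);

    *addr = APIC_BASE_ADDR |
            (dest_idx << MSG_ADDR_DEST_IDX) |
            (dest_mode << MSG_ADDR_DEST_MODE);
    *data = vector |
            (deliv << MSG_DATA_DELIV_MODE) |
            (trig_mode << MSG_DATA_TRIGGER);
}
```

With these assumptions, an entry with destination ID 3, logical
destination mode and vector 0x31 yields address 0xfee03004 and data
0x31. Shifting the whole 16-bit field by 4 places the destination ID
at address bits 19:12, which is where a compatibility-format MSI
expects it.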
 hw/i386/intel_iommu.c             |  6 +++++-
 hw/i386/pc.c                      |  3 +++
 hw/intc/ioapic.c                  | 28 ++++++++++++++++++++++++----
 include/hw/i386/apic-msidef.h     |  1 +
 include/hw/i386/ioapic_internal.h |  1 +
 include/hw/i386/pc.h              |  4 ++++
 6 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index cab3d8c..d874596 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -28,6 +28,7 @@
 #include "hw/i386/pc.h"
 #include "hw/boards.h"
 #include "hw/i386/x86-iommu.h"
+#include "hw/pci-host/q35.h"
 
 /*#define DEBUG_INTEL_IOMMU*/
 #ifdef DEBUG_INTEL_IOMMU
@@ -2360,7 +2361,8 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
 
 static void vtd_realize(DeviceState *dev, Error **errp)
 {
-    PCIBus *bus = PC_MACHINE(qdev_get_machine())->bus;
+    PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
+    PCIBus *bus = pcms->bus;
     IntelIOMMUState *s = INTEL_IOMMU_DEVICE(dev);
 
     VTD_DPRINTF(GENERAL, "");
@@ -2377,6 +2379,8 @@ static void vtd_realize(DeviceState *dev, Error **errp)
     sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, Q35_HOST_BRIDGE_IOMMU_ADDR);
     bus->iommu_fn = vtd_host_dma_iommu;
     bus->iommu_opaque = dev;
+    /* Pseudo address space under root PCI bus. */
+    pcms->ioapic_as = vtd_host_dma_iommu(bus, s, Q35_PSEUDO_DEVFN_IOAPIC);
 }
 
 static void vtd_class_init(ObjectClass *klass, void *data)
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 7198ed5..5420545 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1389,6 +1389,9 @@ void pc_memory_init(PCMachineState *pcms,
         rom_add_option(option_rom[i].name, option_rom[i].bootindex);
     }
     pcms->fw_cfg = fw_cfg;
+
+    /* Init default IOAPIC address space */
+    pcms->ioapic_as = &address_space_memory;
 }
 
 qemu_irq pc_allocate_cpu_irq(void)
diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 273bb08..36dd42a 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -29,6 +29,8 @@
 #include "hw/i386/ioapic_internal.h"
 #include "include/hw/pci/msi.h"
 #include "sysemu/kvm.h"
+#include "target-i386/cpu.h"
+#include "hw/i386/apic-msidef.h"
 
 //#define DEBUG_IOAPIC
 
@@ -50,13 +52,15 @@ extern int ioapic_no;
 
 static void ioapic_service(IOAPICCommonState *s)
 {
+    AddressSpace *ioapic_as = PC_MACHINE(qdev_get_machine())->ioapic_as;
+    uint32_t addr, data;
     uint8_t i;
     uint8_t trig_mode;
     uint8_t vector;
     uint8_t delivery_mode;
     uint32_t mask;
     uint64_t entry;
-    uint8_t dest;
+    uint16_t dest_idx;
     uint8_t dest_mode;
 
     for (i = 0; i < IOAPIC_NUM_PINS; i++) {
@@ -67,7 +71,14 @@ static void ioapic_service(IOAPICCommonState *s)
             entry = s->ioredtbl[i];
             if (!(entry & IOAPIC_LVT_MASKED)) {
                 trig_mode = ((entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1);
-                dest = entry >> IOAPIC_LVT_DEST_SHIFT;
+                /*
+                 * By default, this would be dest_id[8] +
+                 * reserved[8]. When IR is enabled, this would be
+                 * interrupt_index[15] + interrupt_format[1]. This
+                 * field never means anything, but only used to
+                 * generate corresponding MSI.
+                 */
+                dest_idx = entry >> IOAPIC_LVT_DEST_IDX_SHIFT;
                 dest_mode = (entry >> IOAPIC_LVT_DEST_MODE_SHIFT) & 1;
                 delivery_mode =
                     (entry >> IOAPIC_LVT_DELIV_MODE_SHIFT) & IOAPIC_DM_MASK;
@@ -97,8 +108,17 @@ static void ioapic_service(IOAPICCommonState *s)
 #else
                 (void)coalesce;
 #endif
-                apic_deliver_irq(dest, dest_mode, delivery_mode, vector,
-                                 trig_mode);
+                /* No matter whether IR is enabled, we translate
+                 * the IOAPIC message into a MSI one, and its
+                 * address space will decide whether we need a
+                 * translation. */
+                addr = APIC_DEFAULT_ADDRESS | \
+                    (dest_idx << MSI_ADDR_DEST_IDX_SHIFT) |
+                    (dest_mode << MSI_ADDR_DEST_MODE_SHIFT);
+                data = (vector << MSI_DATA_VECTOR_SHIFT) |
+                    (trig_mode << MSI_DATA_TRIGGER_SHIFT) |
+                    (delivery_mode << MSI_DATA_DELIVERY_MODE_SHIFT);
+                stl_le_phys(ioapic_as, addr, data);
             }
         }
     }
diff --git a/include/hw/i386/apic-msidef.h b/include/hw/i386/apic-msidef.h
index 6e2eb71..8b4d4cc 100644
--- a/include/hw/i386/apic-msidef.h
+++ b/include/hw/i386/apic-msidef.h
@@ -25,6 +25,7 @@
 #define MSI_ADDR_REDIRECTION_SHIFT      3
 
 #define MSI_ADDR_DEST_ID_SHIFT          12
+#define MSI_ADDR_DEST_IDX_SHIFT         4
 #define  MSI_ADDR_DEST_ID_MASK          0x00ffff0
 
 #endif /* HW_APIC_MSIDEF_H */
diff --git a/include/hw/i386/ioapic_internal.h b/include/hw/i386/ioapic_internal.h
index cab9e67..31dafb3 100644
--- a/include/hw/i386/ioapic_internal.h
+++ b/include/hw/i386/ioapic_internal.h
@@ -31,6 +31,7 @@
 #define IOAPIC_VERSION                  0x11
 
 #define IOAPIC_LVT_DEST_SHIFT           56
+#define IOAPIC_LVT_DEST_IDX_SHIFT       48
 #define IOAPIC_LVT_MASKED_SHIFT         16
 #define IOAPIC_LVT_TRIGGER_MODE_SHIFT   15
 #define IOAPIC_LVT_REMOTE_IRR_SHIFT     14
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 49566c8..49807f5 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -72,6 +72,10 @@ struct PCMachineState {
     uint64_t numa_nodes;
     uint64_t *node_mem;
     uint64_t *node_cpu;
+
+    /* Address space used by IOAPIC device. All IOAPIC interrupts
+     * will be translated to MSI messages in the address space. */
+    AddressSpace *ioapic_as;
 };
 
 #define PC_MACHINE_ACPI_DEVICE_PROP "acpi-device"
-- 
2.4.11


* [Qemu-devel] [PATCH v10 15/26] ioapic: introduce ioapic_entry_parse() helper
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
                   ` (13 preceding siblings ...)
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 14/26] q35: ioapic: add support for emulated IOAPIC IR Peter Xu
@ 2016-06-21  7:47 ` Peter Xu
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 16/26] intel_iommu: add support for split irqchip Peter Xu
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

Abstract IOAPIC entry parsing logic into a helper function.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/intc/ioapic.c | 110 +++++++++++++++++++++++++++----------------------------
 1 file changed, 54 insertions(+), 56 deletions(-)

diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 36dd42a..c4469e4 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -50,18 +50,56 @@ static IOAPICCommonState *ioapics[MAX_IOAPICS];
 /* global variable from ioapic_common.c */
 extern int ioapic_no;
 
+struct ioapic_entry_info {
+    /* fields parsed from IOAPIC entries */
+    uint8_t masked;
+    uint8_t trig_mode;
+    uint16_t dest_idx;
+    uint8_t dest_mode;
+    uint8_t delivery_mode;
+    uint8_t vector;
+
+    /* MSI message generated from above parsed fields */
+    uint32_t addr;
+    uint32_t data;
+};
+
+static void ioapic_entry_parse(uint64_t entry, struct ioapic_entry_info *info)
+{
+    bzero(info, sizeof(*info));
+    info->masked = (entry >> IOAPIC_LVT_MASKED_SHIFT) & 1;
+    info->trig_mode = (entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1;
+    /*
+     * By default, this would be dest_id[8] + reserved[8]. When IR
+     * is enabled, this would be interrupt_index[15] +
+     * interrupt_format[1]. This field never means anything, but
+     * only used to generate corresponding MSI.
+     */
+    info->dest_idx = (entry >> IOAPIC_LVT_DEST_IDX_SHIFT) & 0xffff;
+    info->dest_mode = (entry >> IOAPIC_LVT_DEST_MODE_SHIFT) & 1;
+    info->delivery_mode = (entry >> IOAPIC_LVT_DELIV_MODE_SHIFT) \
+        & IOAPIC_DM_MASK;
+    if (info->delivery_mode == IOAPIC_DM_EXTINT) {
+        info->vector = pic_read_irq(isa_pic);
+    } else {
+        info->vector = entry & IOAPIC_VECTOR_MASK;
+    }
+
+    info->addr = APIC_DEFAULT_ADDRESS | \
+        (info->dest_idx << MSI_ADDR_DEST_IDX_SHIFT) | \
+        (info->dest_mode << MSI_ADDR_DEST_MODE_SHIFT);
+    info->data = (info->vector << MSI_DATA_VECTOR_SHIFT) | \
+        (info->trig_mode << MSI_DATA_TRIGGER_SHIFT) | \
+        (info->delivery_mode << MSI_DATA_DELIVERY_MODE_SHIFT);
+}
+
 static void ioapic_service(IOAPICCommonState *s)
 {
     AddressSpace *ioapic_as = PC_MACHINE(qdev_get_machine())->ioapic_as;
-    uint32_t addr, data;
+    struct ioapic_entry_info info;
     uint8_t i;
-    uint8_t trig_mode;
-    uint8_t vector;
-    uint8_t delivery_mode;
     uint32_t mask;
     uint64_t entry;
-    uint16_t dest_idx;
-    uint8_t dest_mode;
 
     for (i = 0; i < IOAPIC_NUM_PINS; i++) {
         mask = 1 << i;
@@ -69,33 +107,18 @@ static void ioapic_service(IOAPICCommonState *s)
             int coalesce = 0;
 
             entry = s->ioredtbl[i];
-            if (!(entry & IOAPIC_LVT_MASKED)) {
-                trig_mode = ((entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1);
-                /*
-                 * By default, this would be dest_id[8] +
-                 * reserved[8]. When IR is enabled, this would be
-                 * interrupt_index[15] + interrupt_format[1]. This
-                 * field never means anything, but only used to
-                 * generate corresponding MSI.
-                 */
-                dest_idx = entry >> IOAPIC_LVT_DEST_IDX_SHIFT;
-                dest_mode = (entry >> IOAPIC_LVT_DEST_MODE_SHIFT) & 1;
-                delivery_mode =
-                    (entry >> IOAPIC_LVT_DELIV_MODE_SHIFT) & IOAPIC_DM_MASK;
-                if (trig_mode == IOAPIC_TRIGGER_EDGE) {
+            ioapic_entry_parse(entry, &info);
+            if (!info.masked) {
+                if (info.trig_mode == IOAPIC_TRIGGER_EDGE) {
                     s->irr &= ~mask;
                 } else {
                     coalesce = s->ioredtbl[i] & IOAPIC_LVT_REMOTE_IRR;
                     s->ioredtbl[i] |= IOAPIC_LVT_REMOTE_IRR;
                 }
-                if (delivery_mode == IOAPIC_DM_EXTINT) {
-                    vector = pic_read_irq(isa_pic);
-                } else {
-                    vector = entry & IOAPIC_VECTOR_MASK;
-                }
+
 #ifdef CONFIG_KVM
                 if (kvm_irqchip_is_split()) {
-                    if (trig_mode == IOAPIC_TRIGGER_EDGE) {
+                    if (info.trig_mode == IOAPIC_TRIGGER_EDGE) {
                         kvm_set_irq(kvm_state, i, 1);
                         kvm_set_irq(kvm_state, i, 0);
                     } else {
@@ -112,13 +135,7 @@ static void ioapic_service(IOAPICCommonState *s)
                  * the IOAPIC message into a MSI one, and its
                  * address space will decide whether we need a
                  * translation. */
-                addr = APIC_DEFAULT_ADDRESS | \
-                    (dest_idx << MSI_ADDR_DEST_IDX_SHIFT) |
-                    (dest_mode << MSI_ADDR_DEST_MODE_SHIFT);
-                data = (vector << MSI_DATA_VECTOR_SHIFT) |
-                    (trig_mode << MSI_DATA_TRIGGER_SHIFT) |
-                    (delivery_mode << MSI_DATA_DELIVERY_MODE_SHIFT);
-                stl_le_phys(ioapic_as, addr, data);
+                stl_le_phys(ioapic_as, info.addr, info.data);
             }
         }
     }
@@ -169,30 +186,11 @@ static void ioapic_update_kvm_routes(IOAPICCommonState *s)
 
     if (kvm_irqchip_is_split()) {
         for (i = 0; i < IOAPIC_NUM_PINS; i++) {
-            uint64_t entry = s->ioredtbl[i];
-            uint8_t trig_mode;
-            uint8_t delivery_mode;
-            uint8_t dest;
-            uint8_t dest_mode;
-            uint64_t pin_polarity;
             MSIMessage msg;
-
-            trig_mode = ((entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1);
-            dest = entry >> IOAPIC_LVT_DEST_SHIFT;
-            dest_mode = (entry >> IOAPIC_LVT_DEST_MODE_SHIFT) & 1;
-            pin_polarity = (entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1;
-            delivery_mode =
-                (entry >> IOAPIC_LVT_DELIV_MODE_SHIFT) & IOAPIC_DM_MASK;
-
-            msg.address = APIC_DEFAULT_ADDRESS;
-            msg.address |= dest_mode << 2;
-            msg.address |= dest << 12;
-
-            msg.data = entry & IOAPIC_VECTOR_MASK;
-            msg.data |= delivery_mode << APIC_DELIVERY_MODE_SHIFT;
-            msg.data |= pin_polarity << APIC_POLARITY_SHIFT;
-            msg.data |= trig_mode << APIC_TRIG_MODE_SHIFT;
-
+            struct ioapic_entry_info info;
+            ioapic_entry_parse(s->ioredtbl[i], &info);
+            msg.address = info.addr;
+            msg.data = info.data;
             kvm_irqchip_update_msi_route(kvm_state, i, msg, NULL);
         }
         kvm_irqchip_commit_routes(kvm_state);
-- 
2.4.11


* [Qemu-devel] [PATCH v10 16/26] intel_iommu: add support for split irqchip
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
                   ` (14 preceding siblings ...)
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 15/26] ioapic: introduce ioapic_entry_parse() helper Peter Xu
@ 2016-06-21  7:47 ` Peter Xu
  2016-06-25  8:08   ` Jan Kiszka
  2016-07-04 14:32   ` Paolo Bonzini
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 17/26] x86-iommu: introduce IEC notifiers Peter Xu
                   ` (11 subsequent siblings)
  27 siblings, 2 replies; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

In split irqchip mode, the IOAPIC runs in user space and only updates
the kernel irq routes when an entry changes. When IR is enabled, we
update the kernel directly with the translated messages, so the kernel
routes work just like a cache for the remapping entries.

Since KVM irqfd uses kernel gsi routes to deliver interrupts, as long
as we support split irqchip, we support irqfd as well. Also, since the
kernel gsi routes cache the translated interrupts, irqfd delivery does
not suffer any performance impact from IR.

And, since irqfd is supported, vhost devices are now able to work
seamlessly with IR. Logically this should cover both the vhost-net and
vhost-user cases.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
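
A small sketch of the route fixup flow described above: rebuild the
64-bit MSI address from its two 32-bit halves, pass the message
through a remapping callback, and write the translated message back
only on success. All names here (msi_msg, msi_route, remap_fn,
fixup_msi_route and the two demo callbacks) are hypothetical stand-ins
and not the QEMU or KVM types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for MSIMessage and the KVM MSI route. */
typedef struct { uint64_t address; uint32_t data; } msi_msg;
typedef struct { uint32_t address_lo, address_hi, data; } msi_route;

/* Remapping callback: returns 0 on success, non-zero on failure. */
typedef int (*remap_fn)(const msi_msg *src, msi_msg *dst);

/* Returns 0 if the route is usable, 1 if remapping failed. */
int fixup_msi_route(msi_route *route, remap_fn remap)
{
    msi_msg src, dst;

    assert(route != NULL);

    if (remap == NULL) {
        return 0;               /* no IOMMU: leave the route as-is */
    }

    src.address = ((uint64_t)route->address_hi << 32) | route->address_lo;
    src.data = route->data;

    if (remap(&src, &dst) != 0) {
        return 1;               /* route is left untouched */
    }

    route->address_hi = (uint32_t)(dst.address >> 32);
    route->address_lo = (uint32_t)(dst.address & 0xffffffffu);
    route->data = dst.data;
    return 0;
}

/* Demo callback: maps every message to vector 0x55 on APIC ID 1. */
int demo_remap_ok(const msi_msg *src, msi_msg *dst)
{
    (void)src;
    dst->address = 0xfee01000u;
    dst->data = 0x55;
    return 0;
}

/* Demo callback: always reports a remapping fault. */
int demo_remap_fail(const msi_msg *src, msi_msg *dst)
{
    (void)src;
    (void)dst;
    return -1;
}
```

Because the translated message is what ends up in the kernel route,
later deliveries through that route need no further lookup, which is
the caching effect the commit message refers to.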
 hw/i386/intel_iommu.c         |  7 +++++++
 include/hw/i386/intel_iommu.h |  1 +
 include/hw/i386/x86-iommu.h   |  4 ++++
 target-i386/kvm.c             | 27 +++++++++++++++++++++++++++
 trace-events                  |  3 +++
 5 files changed, 42 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index d874596..0eaffc6 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2149,6 +2149,12 @@ do_not_translate:
     return 0;
 }
 
+static int vtd_int_remap(X86IOMMUState *iommu, MSIMessage *src,
+                         MSIMessage *dst, uint16_t sid)
+{
+    return vtd_interrupt_remap_msi(INTEL_IOMMU_DEVICE(iommu), src, dst);
+}
+
 static MemTxResult vtd_mem_ir_read(void *opaque, hwaddr addr,
                                    uint64_t *data, unsigned size,
                                    MemTxAttrs attrs)
@@ -2393,6 +2399,7 @@ static void vtd_class_init(ObjectClass *klass, void *data)
     dc->props = vtd_properties;
     x86_class->realize = vtd_realize;
     x86_class->find_add_as = vtd_find_add_as;
+    x86_class->int_remap = vtd_int_remap;
 }
 
 static const TypeInfo vtd_info = {
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index b3f17d7..3bca390 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -26,6 +26,7 @@
 #include "hw/i386/x86-iommu.h"
 #include "hw/i386/ioapic.h"
 #include "hw/pci/msi.h"
+#include "hw/sysbus.h"
 
 #define TYPE_INTEL_IOMMU_DEVICE "intel-iommu"
 #define INTEL_IOMMU_DEVICE(obj) \
diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
index 07199be..b419ae5 100644
--- a/include/hw/i386/x86-iommu.h
+++ b/include/hw/i386/x86-iommu.h
@@ -22,6 +22,7 @@
 
 #include "hw/sysbus.h"
 #include "exec/memory.h"
+#include "hw/pci/pci.h"
 
 #define  TYPE_X86_IOMMU_DEVICE  ("x86-iommu")
 #define  X86_IOMMU_DEVICE(obj) \
@@ -43,6 +44,9 @@ struct X86IOMMUClass {
     DeviceRealize realize;
     /* Find/Add IOMMU address space for specific PCI device */
     AddressSpace *(*find_add_as)(X86IOMMUState *s, PCIBus *bus, int devfn);
+    /* MSI-based interrupt remapping */
+    int (*int_remap)(X86IOMMUState *iommu, MSIMessage *src,
+                     MSIMessage *dst, uint16_t sid);
 };
 
 struct X86IOMMUState {
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index f3698f1..bfa40b2 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -35,6 +35,7 @@
 #include "hw/i386/apic.h"
 #include "hw/i386/apic_internal.h"
 #include "hw/i386/apic-msidef.h"
+#include "hw/i386/intel_iommu.h"
 
 #include "exec/ioport.h"
 #include "standard-headers/asm-x86/hyperv.h"
@@ -42,6 +43,7 @@
 #include "hw/pci/msi.h"
 #include "migration/migration.h"
 #include "exec/memattrs.h"
+#include "trace.h"
 
 //#define DEBUG_KVM
 
@@ -3323,6 +3325,31 @@ int kvm_device_msix_deassign(KVMState *s, uint32_t dev_id)
 int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
                              uint64_t address, uint32_t data, PCIDevice *dev)
 {
+    X86IOMMUState *iommu = x86_iommu_get_default();
+
+    if (iommu) {
+        int ret;
+        MSIMessage src, dst;
+        X86IOMMUClass *class = X86_IOMMU_GET_CLASS(iommu);
+
+        src.address = route->u.msi.address_hi;
+        src.address <<= VTD_MSI_ADDR_HI_SHIFT;
+        src.address |= route->u.msi.address_lo;
+        src.data = route->u.msi.data;
+
+        ret = class->int_remap(iommu, &src, &dst, dev ? \
+                               pci_requester_id(dev) : \
+                               X86_IOMMU_SID_INVALID);
+        if (ret) {
+            trace_kvm_x86_fixup_msi_error(route->gsi);
+            return 1;
+        }
+
+        route->u.msi.address_hi = dst.address >> VTD_MSI_ADDR_HI_SHIFT;
+        route->u.msi.address_lo = dst.address & VTD_MSI_ADDR_LO_MASK;
+        route->u.msi.data = dst.data;
+    }
+
     return 0;
 }
 
diff --git a/trace-events b/trace-events
index da0d060..2982f64 100644
--- a/trace-events
+++ b/trace-events
@@ -2206,3 +2206,6 @@ gicv3_redist_write(uint32_t cpu, uint64_t offset, uint64_t data, unsigned size,
 gicv3_redist_badwrite(uint32_t cpu, uint64_t offset, uint64_t data, unsigned size, bool secure) "GICv3 redistributor %x write: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u secure %d: error"
 gicv3_redist_set_irq(uint32_t cpu, int irq, int level) "GICv3 redistributor %x interrupt %d level changed to %d"
 gicv3_redist_send_sgi(uint32_t cpu, int irq) "GICv3 redistributor %x pending SGI %d"
+
+# target-i386/kvm.c
+kvm_x86_fixup_msi_error(uint32_t gsi) "VT-d failed to remap interrupt for GSI %" PRIu32
-- 
2.4.11


* [Qemu-devel] [PATCH v10 17/26] x86-iommu: introduce IEC notifiers
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
                   ` (15 preceding siblings ...)
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 16/26] intel_iommu: add support for split irqchip Peter Xu
@ 2016-06-21  7:47 ` Peter Xu
  2016-07-04 14:22   ` Paolo Bonzini
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 18/26] ioapic: register IOMMU IEC notifier for ioapic Peter Xu
                   ` (10 subsequent siblings)
  27 siblings, 1 reply; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

This patch introduces an x86 IOMMU IEC (Interrupt Entry Cache)
invalidation notifier list. When the vIOMMU receives an IEC invalidation
request, every registered unit is notified with the details of that
request.
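
The register/notify pattern described above can be sketched in plain C.
This is only an illustration under stated assumptions: the patch itself
uses QEMU's QLIST macros and g_new0(), while the sketch uses a
hand-rolled singly linked list, and the names IECNode, iec_register,
iec_notify_all and count_invalidation are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Callback signature, modelled on iec_notify_fn in this patch. */
typedef void (*iec_notify_fn)(void *opaque, bool global,
                              uint32_t index, uint32_t mask);

/* Hypothetical list node; the patch itself uses QLIST instead. */
typedef struct IECNode {
    iec_notify_fn fn;
    void *opaque;
    struct IECNode *next;
} IECNode;

/* Prepend a notifier to the list; returns false if allocation fails. */
bool iec_register(IECNode **head, iec_notify_fn fn, void *opaque)
{
    IECNode *node = calloc(1, sizeof(*node));

    if (!node) {
        return false;
    }
    node->fn = fn;
    node->opaque = opaque;
    node->next = *head;
    *head = node;
    return true;
}

/* Deliver one invalidation request to every registered consumer. */
void iec_notify_all(IECNode *head, bool global,
                    uint32_t index, uint32_t mask)
{
    for (IECNode *n = head; n; n = n->next) {
        if (n->fn) {
            n->fn(n->opaque, global, index, mask);
        }
    }
}

/* Example consumer: counts the invalidations it has been told about. */
void count_invalidation(void *opaque, bool global,
                        uint32_t index, uint32_t mask)
{
    (void)global;
    (void)index;
    (void)mask;
    ++*(unsigned *)opaque;
}
```

A consumer registers once with its own private pointer and is then
called for every later invalidation, whether global or index-based.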

Intel IOMMU is the first provider that generates such an event.
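
As a rough sketch of how the Intel side turns a queued IEC descriptor
into the (global, index, mask) triple passed to the notifiers: the bit
positions below are read off the VTDInvDescIEC bit-field layout in this
patch, assuming the compiler allocates bit-fields starting from the
least significant bit. IECRequest and iec_decode are hypothetical names,
and the shifts are an alternative to the bit-field access the patch
uses. As noted in the v10 changelog, a clear granularity bit means a
global invalidation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded fields of an IEC invalidation descriptor (low 64 bits). */
typedef struct {
    bool global;        /* true when the granularity bit is clear */
    uint32_t index;     /* first interrupt entry to invalidate */
    uint32_t mask;      /* index mask taken from the descriptor */
} IECRequest;

IECRequest iec_decode(uint64_t lo)
{
    IECRequest req;

    req.global = ((lo >> 4) & 0x1) == 0;    /* bit 4: granularity */
    req.mask = (lo >> 27) & 0x1f;           /* bits 31:27: index mask */
    req.index = (lo >> 32) & 0xffff;        /* bits 47:32: start index */
    return req;
}
```

Using explicit shifts avoids any dependence on how a given ABI lays out
bit-fields, at the cost of repeating the bit positions in code.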

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c          | 36 +++++++++++++++++++++++++++++-------
 hw/i386/intel_iommu_internal.h | 24 ++++++++++++++++++++----
 hw/i386/x86-iommu.c            | 29 +++++++++++++++++++++++++++++
 include/hw/i386/x86-iommu.h    | 40 ++++++++++++++++++++++++++++++++++++++++
 trace-events                   |  3 +++
 5 files changed, 121 insertions(+), 11 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 0eaffc6..11cb495 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -904,6 +904,12 @@ static void vtd_root_table_setup(IntelIOMMUState *s)
                 (s->root_extended ? "(extended)" : ""));
 }
 
+static void vtd_iec_notify_all(IntelIOMMUState *s, bool global,
+                               uint32_t index, uint32_t mask)
+{
+    x86_iommu_iec_notify_all(X86_IOMMU_DEVICE(s), global, index, mask);
+}
+
 static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
 {
     uint64_t value = 0;
@@ -911,7 +917,8 @@ static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
     s->intr_size = 1UL << ((value & VTD_IRTA_SIZE_MASK) + 1);
     s->intr_root = value & VTD_IRTA_ADDR_MASK;
 
-    /* TODO: invalidate interrupt entry cache */
+    /* Notify global invalidation */
+    vtd_iec_notify_all(s, true, 0, 0);
 
     VTD_DPRINTF(CSR, "int remap table addr 0x%"PRIx64 " size %"PRIu32,
                 s->intr_root, s->intr_size);
@@ -1413,6 +1420,21 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
     return true;
 }
 
+static bool vtd_process_inv_iec_desc(IntelIOMMUState *s,
+                                     VTDInvDesc *inv_desc)
+{
+    VTD_DPRINTF(INV, "inv ir glob %d index %d mask %d",
+                inv_desc->iec.granularity,
+                inv_desc->iec.index,
+                inv_desc->iec.index_mask);
+
+    vtd_iec_notify_all(s, !inv_desc->iec.granularity,
+                       inv_desc->iec.index,
+                       inv_desc->iec.index_mask);
+
+    return true;
+}
+
 static bool vtd_process_inv_desc(IntelIOMMUState *s)
 {
     VTDInvDesc inv_desc;
@@ -1453,12 +1475,12 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
         break;
 
     case VTD_INV_DESC_IEC:
-        VTD_DPRINTF(INV, "Interrupt Entry Cache Invalidation "
-                    "not implemented yet");
-        /*
-         * Since currently we do not cache interrupt entries, we can
-         * just mark this descriptor as "good" and move on.
-         */
+        VTD_DPRINTF(INV, "Invalidation Interrupt Entry Cache "
+                    "Descriptor hi 0x%"PRIx64 " lo 0x%"PRIx64,
+                    inv_desc.hi, inv_desc.lo);
+        if (!vtd_process_inv_iec_desc(s, &inv_desc)) {
+            return false;
+        }
         break;
 
     default:
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index e1a08cb..10c20fe 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -296,12 +296,28 @@ typedef enum VTDFaultReason {
 
 #define VTD_CONTEXT_CACHE_GEN_MAX       0xffffffffUL
 
+/* Interrupt Entry Cache Invalidation Descriptor: VT-d 6.5.2.7. */
+struct VTDInvDescIEC {
+    uint32_t type:4;            /* Should always be 0x4 */
+    uint32_t granularity:1;     /* If clear, it's global IR invalidation */
+    uint32_t resved_1:22;
+    uint32_t index_mask:5;      /* 2^N for contiguous int invalidation */
+    uint32_t index:16;          /* Start index to invalidate */
+    uint32_t reserved_2:16;
+};
+typedef struct VTDInvDescIEC VTDInvDescIEC;
+
 /* Queued Invalidation Descriptor */
-struct VTDInvDesc {
-    uint64_t lo;
-    uint64_t hi;
+union VTDInvDesc {
+    struct {
+        uint64_t lo;
+        uint64_t hi;
+    };
+    union {
+        VTDInvDescIEC iec;
+    };
 };
-typedef struct VTDInvDesc VTDInvDesc;
+typedef union VTDInvDesc VTDInvDesc;
 
 /* Masks for struct VTDInvDesc */
 #define VTD_INV_DESC_TYPE               0xf
diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
index 4280839..ce26b2a 100644
--- a/hw/i386/x86-iommu.c
+++ b/hw/i386/x86-iommu.c
@@ -22,6 +22,33 @@
 #include "hw/boards.h"
 #include "hw/i386/x86-iommu.h"
 #include "qemu/error-report.h"
+#include "trace.h"
+
+void x86_iommu_iec_register_notifier(X86IOMMUState *iommu,
+                                     iec_notify_fn fn, void *data)
+{
+    IEC_Notifier *notifier = g_new0(IEC_Notifier, 1);
+
+    notifier->iec_notify = fn;
+    notifier->private = data;
+
+    QLIST_INSERT_HEAD(&iommu->iec_notifiers, notifier, list);
+}
+
+void x86_iommu_iec_notify_all(X86IOMMUState *iommu, bool global,
+                              uint32_t index, uint32_t mask)
+{
+    IEC_Notifier *notifier;
+
+    trace_x86_iommu_iec_notify(global, index, mask);
+
+    QLIST_FOREACH(notifier, &iommu->iec_notifiers, list) {
+        if (notifier->iec_notify) {
+            notifier->iec_notify(notifier->private, global,
+                                 index, mask);
+        }
+    }
+}
 
 /* Default X86 IOMMU device */
 static X86IOMMUState *x86_iommu_default = NULL;
@@ -46,7 +73,9 @@ X86IOMMUState *x86_iommu_get_default(void)
 
 static void x86_iommu_realize(DeviceState *dev, Error **errp)
 {
+    X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
     X86IOMMUClass *x86_class = X86_IOMMU_GET_CLASS(dev);
+    QLIST_INIT(&x86_iommu->iec_notifiers);
     if (x86_class->realize) {
         x86_class->realize(dev, errp);
     }
diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
index b419ae5..af80d15 100644
--- a/include/hw/i386/x86-iommu.h
+++ b/include/hw/i386/x86-iommu.h
@@ -49,9 +49,28 @@ struct X86IOMMUClass {
                      MSIMessage *dst, uint16_t sid);
 };
 
+/**
+ * iec_notify_fn - IEC (Interrupt Entry Cache) notifier hook,
+ *                 triggered when IR invalidation happens.
+ * @private: private data
+ * @global: whether this is a global IEC invalidation
+ * @index: IRTE index to invalidate (start from)
+ * @mask: invalidation mask
+ */
+typedef void (*iec_notify_fn)(void *private, bool global,
+                              uint32_t index, uint32_t mask);
+
+struct IEC_Notifier {
+    iec_notify_fn iec_notify;
+    void *private;
+    QLIST_ENTRY(IEC_Notifier) list;
+};
+typedef struct IEC_Notifier IEC_Notifier;
+
 struct X86IOMMUState {
     SysBusDevice busdev;
     bool intr_supported;        /* Whether vIOMMU supports IR */
+    QLIST_HEAD(, IEC_Notifier) iec_notifiers; /* IEC notify list */
 };
 
 /**
@@ -60,4 +79,25 @@ struct X86IOMMUState {
  */
 X86IOMMUState *x86_iommu_get_default(void);
 
+/**
+ * x86_iommu_iec_register_notifier - register IEC (Interrupt Entry
+ *                                   Cache) notifiers
+ * @iommu: IOMMU device to register
+ * @fn: IEC notifier hook function
+ * @data: notifier private data
+ */
+void x86_iommu_iec_register_notifier(X86IOMMUState *iommu,
+                                     iec_notify_fn fn, void *data);
+
+/**
+ * x86_iommu_iec_notify_all - Notify IEC invalidations
+ * @iommu: IOMMU device that sends the notification
+ * @global: whether this is a global invalidation. If true, @index
+ *          and @mask are undefined.
+ * @index: starting index of interrupt entry to invalidate
+ * @mask: index mask for the invalidation
+ */
+void x86_iommu_iec_notify_all(X86IOMMUState *iommu, bool global,
+                              uint32_t index, uint32_t mask);
+
 #endif
diff --git a/trace-events b/trace-events
index 2982f64..20df932 100644
--- a/trace-events
+++ b/trace-events
@@ -2209,3 +2209,6 @@ gicv3_redist_send_sgi(uint32_t cpu, int irq) "GICv3 redistributor %x pending SGI
 
 # target-i386/kvm.c
 kvm_x86_fixup_msi_error(uint32_t gsi) "VT-d failed to remap interrupt for GSI %" PRIu32
+
+# hw/i386/x86-iommu.c
+x86_iommu_iec_notify(bool global, uint32_t index, uint32_t mask) "Notify IEC invalidation: global=%d index=%" PRIu32 " mask=%" PRIu32
-- 
2.4.11


* [Qemu-devel] [PATCH v10 18/26] ioapic: register IOMMU IEC notifier for ioapic
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
                   ` (16 preceding siblings ...)
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 17/26] x86-iommu: introduce IEC notifiers Peter Xu
@ 2016-06-21  7:47 ` Peter Xu
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 19/26] intel_iommu: Add support for Extended Interrupt Mode Peter Xu
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

Make the IOAPIC the first consumer of the x86 IOMMU IEC invalidation
notifiers. This is only used in the split irqchip case: when the vIOMMU
receives IR invalidation requests, the IOAPIC is notified so that it can
update the kernel irq routes. For simplicity, we update all IOAPIC
routes, even if the invalidated entries are not IOAPIC ones.

Since IOMMUs are now created with the "-device" parameter, the IOMMU
device is created after the IOAPIC. The registration therefore has to
be deferred until machine init is done, using a machine_done notifier.
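
The ordering problem and its fix can be sketched as follows. This is a
toy model, not QEMU code: add_machine_done_hook, run_machine_done and
the two globals are hypothetical stand-ins for
qemu_add_machine_init_done_notifier() and x86_iommu_get_default().

```c
#include <stddef.h>

#define MAX_HOOKS 8

/* Hypothetical stand-in for the machine-init-done notifier list. */
typedef void (*done_hook)(void *opaque);
struct { done_hook fn; void *opaque; } hooks[MAX_HOOKS];
size_t nr_hooks;

/* Stand-in for the default IOMMU pointer; NULL until it is realized. */
void *default_iommu;
int registered_with_iommu;

int add_machine_done_hook(done_hook fn, void *opaque)
{
    if (nr_hooks == MAX_HOOKS) {
        return -1;
    }
    hooks[nr_hooks].fn = fn;
    hooks[nr_hooks].opaque = opaque;
    nr_hooks++;
    return 0;
}

/* Called once, after every device has been realized. */
void run_machine_done(void)
{
    for (size_t i = 0; i < nr_hooks; i++) {
        hooks[i].fn(hooks[i].opaque);
    }
}

/* Deferred part: by now the IOMMU exists, if one was configured. */
void ioapic_machine_done(void *opaque)
{
    (void)opaque;
    if (default_iommu) {
        registered_with_iommu++;
    }
}

/* Realize time: the IOMMU may not exist yet, so only queue the hook. */
void ioapic_realize(void *ioapic)
{
    add_machine_done_hook(ioapic_machine_done, ioapic);
}

void iommu_realize(void *iommu)
{
    default_iommu = iommu;
}
```

Registering directly in ioapic_realize() would silently find no IOMMU
whenever the IOAPIC is realized first, which is the case the deferral
is meant to cover.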

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/intc/ioapic.c                  | 29 +++++++++++++++++++++++++++++
 include/hw/i386/ioapic_internal.h |  2 ++
 2 files changed, 31 insertions(+)

diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index c4469e4..0c34e3e 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -31,6 +31,7 @@
 #include "sysemu/kvm.h"
 #include "target-i386/cpu.h"
 #include "hw/i386/apic-msidef.h"
+#include "hw/i386/x86-iommu.h"
 
 //#define DEBUG_IOAPIC
 
@@ -198,6 +199,14 @@ static void ioapic_update_kvm_routes(IOAPICCommonState *s)
 #endif
 }
 
+static void ioapic_iec_notifier(void *private, bool global,
+                                uint32_t index, uint32_t mask)
+{
+    IOAPICCommonState *s = (IOAPICCommonState *)private;
+    /* For simplicity, we just update all the routes */
+    ioapic_update_kvm_routes(s);
+}
+
 void ioapic_eoi_broadcast(int vector)
 {
     IOAPICCommonState *s;
@@ -354,6 +363,24 @@ static const MemoryRegionOps ioapic_io_ops = {
     .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
+static void ioapic_machine_done_notify(Notifier *notifier, void *data)
+{
+    IOAPICCommonState *s = container_of(notifier, IOAPICCommonState,
+                                        machine_done);
+
+#ifdef CONFIG_KVM
+    if (kvm_irqchip_is_split()) {
+        X86IOMMUState *iommu = x86_iommu_get_default();
+        if (iommu) {
+            /* Register this IOAPIC with IOMMU IEC notifier, so that
+             * when there are IR invalidates, we can be notified to
+             * update kernel IR cache. */
+            x86_iommu_iec_register_notifier(iommu, ioapic_iec_notifier, s);
+        }
+    }
+#endif
+}
+
 static void ioapic_realize(DeviceState *dev, Error **errp)
 {
     IOAPICCommonState *s = IOAPIC_COMMON(dev);
@@ -364,6 +391,8 @@ static void ioapic_realize(DeviceState *dev, Error **errp)
     qdev_init_gpio_in(dev, ioapic_set_irq, IOAPIC_NUM_PINS);
 
     ioapics[ioapic_no] = s;
+    s->machine_done.notify = ioapic_machine_done_notify;
+    qemu_add_machine_init_done_notifier(&s->machine_done);
 }
 
 static void ioapic_class_init(ObjectClass *klass, void *data)
diff --git a/include/hw/i386/ioapic_internal.h b/include/hw/i386/ioapic_internal.h
index 31dafb3..84e3deb 100644
--- a/include/hw/i386/ioapic_internal.h
+++ b/include/hw/i386/ioapic_internal.h
@@ -25,6 +25,7 @@
 #include "hw/hw.h"
 #include "exec/memory.h"
 #include "hw/sysbus.h"
+#include "qemu/notify.h"
 
 #define MAX_IOAPICS                     1
 
@@ -107,6 +108,7 @@ struct IOAPICCommonState {
     uint8_t ioregsel;
     uint32_t irr;
     uint64_t ioredtbl[IOAPIC_NUM_PINS];
+    Notifier machine_done;
 };
 
 void ioapic_reset_common(DeviceState *dev);
-- 
2.4.11


* [Qemu-devel] [PATCH v10 19/26] intel_iommu: Add support for Extended Interrupt Mode
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
                   ` (17 preceding siblings ...)
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 18/26] ioapic: register IOMMU IEC notifier for ioapic Peter Xu
@ 2016-06-21  7:47 ` Peter Xu
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 20/26] intel_iommu: add SID validation for IR Peter Xu
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx,
	Jan Kiszka

From: Jan Kiszka <jan.kiszka@siemens.com>

As neither QEMU nor KVM supports more than 255 CPUs so far, this is
simple: we only need to switch the destination ID translation in
vtd_remap_irq_get if EIME is set.
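
A minimal sketch of that translation, assuming the IRTE destination
field has already been converted to host byte order. The mask and shift
values are the ones used in this patch; irte_dest is a hypothetical
helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Without EIM, the xAPIC destination lives in bits 15:8 of the field. */
#define IR_APIC_DEST_MASK   0xff00u
#define IR_APIC_DEST_SHIFT  8

uint32_t irte_dest(uint32_t dest_id, bool eime)
{
    if (eime) {
        /* Extended mode: the whole 32-bit field is the destination. */
        return dest_id;
    }
    return (dest_id & IR_APIC_DEST_MASK) >> IR_APIC_DEST_SHIFT;
}
```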

Once CFI support is there, it will have to take EIM into account as
well. For now, nothing needs to be done for it.

This patch allows x2APIC to be used in split irqchip mode of KVM.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
[use le32_to_cpu() to retrieve dest_id]
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c          | 16 +++++++++-------
 hw/i386/intel_iommu_internal.h |  2 ++
 include/hw/i386/intel_iommu.h  |  1 +
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 11cb495..7bfaa39 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -916,6 +916,7 @@ static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
     value = vtd_get_quad_raw(s, DMAR_IRTA_REG);
     s->intr_size = 1UL << ((value & VTD_IRTA_SIZE_MASK) + 1);
     s->intr_root = value & VTD_IRTA_ADDR_MASK;
+    s->intr_eime = value & VTD_IRTA_EIME;
 
     /* Notify global invalidation */
     vtd_iec_notify_all(s, true, 0, 0);
@@ -2050,11 +2051,13 @@ static int vtd_remap_irq_get(IntelIOMMUState *iommu, uint16_t index, VTDIrq *irq
     irq->trigger_mode = irte.trigger_mode;
     irq->vector = irte.vector;
     irq->delivery_mode = irte.delivery_mode;
-    /* Not support EIM yet: please refer to vt-d 9.10 DST bits */
+    irq->dest = le32_to_cpu(irte.dest_id);
+    if (!iommu->intr_eime) {
 #define  VTD_IR_APIC_DEST_MASK         (0xff00ULL)
 #define  VTD_IR_APIC_DEST_SHIFT        (8)
-    irq->dest = (le32_to_cpu(irte.dest_id) & VTD_IR_APIC_DEST_MASK) >> \
-        VTD_IR_APIC_DEST_SHIFT;
+        irq->dest = (irq->dest & VTD_IR_APIC_DEST_MASK) >>
+            VTD_IR_APIC_DEST_SHIFT;
+    }
     irq->dest_mode = irte.dest_mode;
     irq->redir_hint = irte.redir_hint;
 
@@ -2307,7 +2310,7 @@ static void vtd_init(IntelIOMMUState *s)
     s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
 
     if (x86_iommu->intr_supported) {
-        s->ecap |= VTD_ECAP_IR;
+        s->ecap |= VTD_ECAP_IR | VTD_ECAP_EIM;
     }
 
     vtd_reset_context_cache(s);
@@ -2361,10 +2364,9 @@ static void vtd_init(IntelIOMMUState *s)
     vtd_define_quad(s, DMAR_FRCD_REG_0_2, 0, 0, 0x8000000000000000ULL);
 
     /*
-     * Interrupt remapping registers, not support extended interrupt
-     * mode for now.
+     * Interrupt remapping registers.
      */
-    vtd_define_quad(s, DMAR_IRTA_REG, 0, 0xfffffffffffff00fULL, 0);
+    vtd_define_quad(s, DMAR_IRTA_REG, 0, 0xfffffffffffff80fULL, 0);
 }
 
 /* Should not reset address_spaces when reset because devices will still use
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 10c20fe..72b0114 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -176,6 +176,7 @@
 
 /* IRTA_REG */
 #define VTD_IRTA_ADDR_MASK          (VTD_HAW_MASK ^ 0xfffULL)
+#define VTD_IRTA_EIME               (1ULL << 11)
 #define VTD_IRTA_SIZE_MASK          (0xfULL)
 
 /* ECAP_REG */
@@ -184,6 +185,7 @@
 #define VTD_ECAP_QI                 (1ULL << 1)
 /* Interrupt Remapping support */
 #define VTD_ECAP_IR                 (1ULL << 3)
+#define VTD_ECAP_EIM                (1ULL << 4)
 
 /* CAP_REG */
 /* (offset >> 4) << 24 */
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 3bca390..2fdca5b 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -271,6 +271,7 @@ struct IntelIOMMUState {
     bool intr_enabled;              /* Whether guest enabled IR */
     dma_addr_t intr_root;           /* Interrupt remapping table pointer */
     uint32_t intr_size;             /* Number of IR table entries */
+    bool intr_eime;                 /* Extended interrupt mode enabled */
 };
 
 #endif
-- 
2.4.11


* [Qemu-devel] [PATCH v10 20/26] intel_iommu: add SID validation for IR
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
                   ` (18 preceding siblings ...)
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 19/26] intel_iommu: Add support for Extended Interrupt Mode Peter Xu
@ 2016-06-21  7:47 ` Peter Xu
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 21/26] kvm-irqchip: simplify kvm_irqchip_add_msi_route Peter Xu
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

This patch enables SID (Source ID) validation for interrupt remapping.
Interrupts that fail validation are dropped.
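
The validation rules can be condensed into one predicate. The sketch
mirrors the checks made in the diff of this patch (the same qualifier
masks and the same bus-range comparison); sid_ok and the SVT_* names are
hypothetical, and byte-order conversion of the IRTE fields is assumed to
have happened already.

```c
#include <stdbool.h>
#include <stdint.h>

/* Source validation types, as in the VTD_SVT_* enum of this patch. */
enum { SVT_NONE = 0, SVT_ALL = 1, SVT_BUS = 2 };

/*
 * svt/sq/source_id come from the IRTE, sid is the requester ID of the
 * device that raised the interrupt. Returns true if it may pass.
 */
bool sid_ok(unsigned svt, unsigned sq, uint16_t source_id, uint16_t sid)
{
    /* Which SID bits are compared for each source-ID qualifier. */
    static const uint16_t sq_mask[4] = { 0xffff, 0xfffb, 0xfff9, 0xfff8 };
    uint8_t bus, bus_min, bus_max;

    switch (svt) {
    case SVT_NONE:
        return true;
    case SVT_ALL:
        return ((source_id ^ sid) & sq_mask[sq & 0x3]) == 0;
    case SVT_BUS:
        /* Same field split as in the patch. */
        bus_max = source_id >> 8;
        bus_min = source_id & 0xff;
        bus = sid >> 8;
        return bus >= bus_min && bus <= bus_max;
    default:
        /* Reserved SVT encoding: treat as a verification failure. */
        return false;
    }
}
```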

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c         | 69 ++++++++++++++++++++++++++++++++++++-------
 include/hw/i386/intel_iommu.h | 17 +++++++++++
 2 files changed, 75 insertions(+), 11 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 7bfaa39..789ee25 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1998,9 +1998,13 @@ static Property vtd_properties[] = {
 
 /* Read IRTE entry with specific index */
 static int vtd_irte_get(IntelIOMMUState *iommu, uint16_t index,
-                        VTD_IRTE *entry)
+                        VTD_IRTE *entry, uint16_t sid)
 {
+    static const uint16_t vtd_svt_mask[VTD_SQ_MAX] =
+        {0xffff, 0xfffb, 0xfff9, 0xfff8};
     dma_addr_t addr = 0x00;
+    uint16_t mask, source_id;
+    uint8_t bus, bus_max, bus_min;
 
     addr = iommu->intr_root + index * sizeof(*entry);
     if (dma_memory_read(&address_space_memory, addr, entry,
@@ -2027,23 +2031,58 @@ static int vtd_irte_get(IntelIOMMUState *iommu, uint16_t index,
         return -VTD_FR_IR_IRTE_RSVD;
     }
 
-    /*
-     * TODO: Check Source-ID corresponds to SVT (Source Validation
-     * Type) bits
-     */
+    if (sid != X86_IOMMU_SID_INVALID) {
+        /* Validate IRTE SID */
+        source_id = le32_to_cpu(entry->source_id);
+        switch (entry->sid_vtype) {
+        case VTD_SVT_NONE:
+            VTD_DPRINTF(IR, "No SID validation for IRTE index %d", index);
+            break;
+
+        case VTD_SVT_ALL:
+            mask = vtd_svt_mask[entry->sid_q];
+            if ((source_id & mask) != (sid & mask)) {
+                VTD_DPRINTF(GENERAL, "SID validation for IRTE index "
+                            "%d failed (reqid 0x%04x sid 0x%04x)", index,
+                            sid, source_id);
+                return -VTD_FR_IR_SID_ERR;
+            }
+            break;
+
+        case VTD_SVT_BUS:
+            bus_max = source_id >> 8;
+            bus_min = source_id & 0xff;
+            bus = sid >> 8;
+            if (bus > bus_max || bus < bus_min) {
+                VTD_DPRINTF(GENERAL, "SID validation for IRTE index %d "
+                            "failed (bus %d outside %d-%d)", index, bus,
+                            bus_min, bus_max);
+                return -VTD_FR_IR_SID_ERR;
+            }
+            break;
+
+        default:
+            VTD_DPRINTF(GENERAL, "Invalid SVT bits (0x%x) in IRTE index "
+                        "%d", entry->sid_vtype, index);
+            /* Take this as verification failure. */
+            return -VTD_FR_IR_SID_ERR;
+            break;
+        }
+    }
 
     return 0;
 }
 
 /* Fetch IRQ information of specific IR index */
-static int vtd_remap_irq_get(IntelIOMMUState *iommu, uint16_t index, VTDIrq *irq)
+static int vtd_remap_irq_get(IntelIOMMUState *iommu, uint16_t index,
+                             VTDIrq *irq, uint16_t sid)
 {
     VTD_IRTE irte;
     int ret = 0;
 
     bzero(&irte, sizeof(irte));
 
-    ret = vtd_irte_get(iommu, index, &irte);
+    ret = vtd_irte_get(iommu, index, &irte, sid);
     if (ret) {
         return ret;
     }
@@ -2095,7 +2134,8 @@ static void vtd_generate_msi_message(VTDIrq *irq, MSIMessage *msg_out)
 /* Interrupt remapping for MSI/MSI-X entry */
 static int vtd_interrupt_remap_msi(IntelIOMMUState *iommu,
                                    MSIMessage *origin,
-                                   MSIMessage *translated)
+                                   MSIMessage *translated,
+                                   uint16_t sid)
 {
     int ret = 0;
     VTD_IR_MSIAddress addr;
@@ -2130,7 +2170,7 @@ static int vtd_interrupt_remap_msi(IntelIOMMUState *iommu,
 
     index = addr.index_h << 15 | le16_to_cpu(addr.index_l);
 
-    ret = vtd_remap_irq_get(iommu, index, &irq);
+    ret = vtd_remap_irq_get(iommu, index, &irq, sid);
     if (ret) {
         return ret;
     }
@@ -2177,7 +2217,8 @@ do_not_translate:
 static int vtd_int_remap(X86IOMMUState *iommu, MSIMessage *src,
                          MSIMessage *dst, uint16_t sid)
 {
-    return vtd_interrupt_remap_msi(INTEL_IOMMU_DEVICE(iommu), src, dst);
+    return vtd_interrupt_remap_msi(INTEL_IOMMU_DEVICE(iommu),
+                                   src, dst, sid);
 }
 
 static MemTxResult vtd_mem_ir_read(void *opaque, hwaddr addr,
@@ -2203,11 +2244,17 @@ static MemTxResult vtd_mem_ir_write(void *opaque, hwaddr addr,
 {
     int ret = 0;
     MSIMessage from = {0}, to = {0};
+    uint16_t sid = X86_IOMMU_SID_INVALID;
 
     from.address = (uint64_t) addr + VTD_INTERRUPT_ADDR_FIRST;
     from.data = (uint32_t) value;
 
-    ret = vtd_interrupt_remap_msi(opaque, &from, &to);
+    if (!attrs.unspecified) {
+        /* We have explicit Source ID */
+        sid = attrs.requester_id;
+    }
+
+    ret = vtd_interrupt_remap_msi(opaque, &from, &to, sid);
     if (ret) {
         /* TODO: report error */
         VTD_DPRINTF(GENERAL, "int remap fail for addr 0x%"PRIx64
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 2fdca5b..e1b6dec 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -102,6 +102,23 @@ struct VTDIOTLBEntry {
     bool write_flags;
 };
 
+/* VT-d Source-ID Qualifier types */
+enum {
+    VTD_SQ_FULL = 0x00,     /* Full SID verification */
+    VTD_SQ_IGN_3 = 0x01,    /* Ignore bit 3 */
+    VTD_SQ_IGN_2_3 = 0x02,  /* Ignore bits 2 & 3 */
+    VTD_SQ_IGN_1_3 = 0x03,  /* Ignore bits 1-3 */
+    VTD_SQ_MAX,
+};
+
+/* VT-d Source Validation Types */
+enum {
+    VTD_SVT_NONE = 0x00,    /* No validation */
+    VTD_SVT_ALL = 0x01,     /* Do full validation */
+    VTD_SVT_BUS = 0x02,     /* Validate bus range */
+    VTD_SVT_MAX,
+};
+
 /* Interrupt Remapping Table Entry Definition */
 union VTD_IRTE {
     struct {
-- 
2.4.11


* [Qemu-devel] [PATCH v10 21/26] kvm-irqchip: simplify kvm_irqchip_add_msi_route
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
                   ` (19 preceding siblings ...)
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 20/26] intel_iommu: add SID validation for IR Peter Xu
@ 2016-06-21  7:47 ` Peter Xu
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 22/26] kvm-irqchip: i386: add hook for add/remove virq Peter Xu
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

Change the MSIMessage parameter of kvm_irqchip_add_msi_route into the
vector number. The vector index provides more information than the
MSIMessage, since the MSIMessage can easily be retrieved from the
vector. This avoids fetching the MSIMessage every time before adding an
MSI route.
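
The lookup that moves from the callers into the function can be
sketched like this. Msg, Dev and msg_for_vector are hypothetical
stand-ins for MSIMessage, PCIDevice and the code at the top of
kvm_irqchip_add_msi_route. Unlike the patch, which aborts when neither
MSI nor MSI-X is enabled, the sketch returns an error so that it can be
exercised in isolation.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_VECTORS 4

typedef struct {
    uint64_t address;
    uint32_t data;
} Msg;

/* Hypothetical device model: one message table per interrupt type. */
typedef struct {
    bool msix_on;
    bool msi_on;
    Msg msix_table[NR_VECTORS];
    Msg msi_table[NR_VECTORS];
} Dev;

/* Returns 0 and fills *out, or -1 if no message can be looked up. */
int msg_for_vector(const Dev *dev, int vector, Msg *out)
{
    Msg empty = { 0, 0 };

    if (!dev) {
        /* No owner device: start from an empty message. */
        *out = empty;
        return 0;
    }
    if (vector < 0 || vector >= NR_VECTORS) {
        return -1;
    }
    if (dev->msix_on) {
        *out = dev->msix_table[vector];     /* MSI-X checked first */
    } else if (dev->msi_on) {
        *out = dev->msi_table[vector];
    } else {
        return -1;
    }
    return 0;
}
```

Callers then pass only the vector index and the device, which is the
shape of the call sites after this patch.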

Meanwhile, the vector info will be used in the coming patches to further
enable gsi route update notifications.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/kvm/pci-assign.c |  8 ++------
 hw/misc/ivshmem.c        |  3 +--
 hw/vfio/pci.c            | 11 +++++------
 hw/virtio/virtio-pci.c   |  9 +++------
 include/sysemu/kvm.h     | 13 ++++++++++++-
 kvm-all.c                | 18 ++++++++++++++++--
 kvm-stub.c               |  2 +-
 target-i386/kvm.c        |  3 +--
 8 files changed, 41 insertions(+), 26 deletions(-)

diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
index f9c9014..62dec5f 100644
--- a/hw/i386/kvm/pci-assign.c
+++ b/hw/i386/kvm/pci-assign.c
@@ -974,10 +974,9 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
     }
 
     if (ctrl_byte & PCI_MSI_FLAGS_ENABLE) {
-        MSIMessage msg = msi_get_message(pci_dev, 0);
         int virq;
 
-        virq = kvm_irqchip_add_msi_route(kvm_state, msg, pci_dev);
+        virq = kvm_irqchip_add_msi_route(kvm_state, 0, pci_dev);
         if (virq < 0) {
             perror("assigned_dev_update_msi: kvm_irqchip_add_msi_route");
             return;
@@ -1042,7 +1041,6 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
     uint16_t entries_nr = 0;
     int i, r = 0;
     MSIXTableEntry *entry = adev->msix_table;
-    MSIMessage msg;
 
     /* Get the usable entry number for allocating */
     for (i = 0; i < adev->msix_max; i++, entry++) {
@@ -1079,9 +1077,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
             continue;
         }
 
-        msg.address = entry->addr_lo | ((uint64_t)entry->addr_hi << 32);
-        msg.data = entry->data;
-        r = kvm_irqchip_add_msi_route(kvm_state, msg, pci_dev);
+        r = kvm_irqchip_add_msi_route(kvm_state, i, pci_dev);
         if (r < 0) {
             return r;
         }
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index c4dde3a..8512523 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -441,13 +441,12 @@ static void ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector,
                                      Error **errp)
 {
     PCIDevice *pdev = PCI_DEVICE(s);
-    MSIMessage msg = msix_get_message(pdev, vector);
     int ret;
 
     IVSHMEM_DPRINTF("ivshmem_add_kvm_msi_virq vector:%d\n", vector);
     assert(!s->msi_vectors[vector].pdev);
 
-    ret = kvm_irqchip_add_msi_route(kvm_state, msg, pdev);
+    ret = kvm_irqchip_add_msi_route(kvm_state, vector, pdev);
     if (ret < 0) {
         error_setg(errp, "kvm_irqchip_add_msi_route failed");
         return;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 53b87b7..cc4e60c 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -416,11 +416,11 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
 }
 
 static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
-                                  MSIMessage *msg, bool msix)
+                                  int vector_n, bool msix)
 {
     int virq;
 
-    if ((msix && vdev->no_kvm_msix) || (!msix && vdev->no_kvm_msi) || !msg) {
+    if ((msix && vdev->no_kvm_msix) || (!msix && vdev->no_kvm_msi)) {
         return;
     }
 
@@ -428,7 +428,7 @@ static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
         return;
     }
 
-    virq = kvm_irqchip_add_msi_route(kvm_state, *msg, &vdev->pdev);
+    virq = kvm_irqchip_add_msi_route(kvm_state, vector_n, &vdev->pdev);
     if (virq < 0) {
         event_notifier_cleanup(&vector->kvm_interrupt);
         return;
@@ -494,7 +494,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
             vfio_update_kvm_msi_virq(vector, *msg, pdev);
         }
     } else {
-        vfio_add_kvm_msi_virq(vdev, vector, msg, true);
+        vfio_add_kvm_msi_virq(vdev, vector, nr, true);
     }
 
     /*
@@ -638,7 +638,6 @@ retry:
 
     for (i = 0; i < vdev->nr_vectors; i++) {
         VFIOMSIVector *vector = &vdev->msi_vectors[i];
-        MSIMessage msg = msi_get_message(&vdev->pdev, i);
 
         vector->vdev = vdev;
         vector->virq = -1;
@@ -655,7 +654,7 @@ retry:
          * Attempt to enable route through KVM irqchip,
          * default to userspace handling if unavailable.
          */
-        vfio_add_kvm_msi_virq(vdev, vector, &msg, false);
+        vfio_add_kvm_msi_virq(vdev, vector, i, false);
     }
 
     /* Set interrupt type prior to possible interrupts */
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 1a02783..184570d 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -727,14 +727,13 @@ static uint32_t virtio_read_config(PCIDevice *pci_dev,
 
 static int kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy *proxy,
                                         unsigned int queue_no,
-                                        unsigned int vector,
-                                        MSIMessage msg)
+                                        unsigned int vector)
 {
     VirtIOIRQFD *irqfd = &proxy->vector_irqfd[vector];
     int ret;
 
     if (irqfd->users == 0) {
-        ret = kvm_irqchip_add_msi_route(kvm_state, msg, &proxy->pci_dev);
+        ret = kvm_irqchip_add_msi_route(kvm_state, vector, &proxy->pci_dev);
         if (ret < 0) {
             return ret;
         }
@@ -785,7 +784,6 @@ static int kvm_virtio_pci_vector_use(VirtIOPCIProxy *proxy, int nvqs)
     VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
     unsigned int vector;
     int ret, queue_no;
-    MSIMessage msg;
 
     for (queue_no = 0; queue_no < nvqs; queue_no++) {
         if (!virtio_queue_get_num(vdev, queue_no)) {
@@ -795,8 +793,7 @@ static int kvm_virtio_pci_vector_use(VirtIOPCIProxy *proxy, int nvqs)
         if (vector >= msix_nr_vectors_allocated(dev)) {
             continue;
         }
-        msg = msix_get_message(dev, vector);
-        ret = kvm_virtio_pci_vq_vector_use(proxy, queue_no, vector, msg);
+        ret = kvm_virtio_pci_vq_vector_use(proxy, queue_no, vector);
         if (ret < 0) {
             goto undo;
         }
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index ad6f837..e5d90bd 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -474,7 +474,18 @@ static inline void cpu_synchronize_post_init(CPUState *cpu)
     }
 }
 
-int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg, PCIDevice *dev);
+/**
+ * kvm_irqchip_add_msi_route - Add MSI route for specific vector
+ * @s:      KVM state
+ * @vector: which vector to add. This can be either MSI/MSIX
+ *          vector. The function will automatically detect whether
+ *          MSI/MSIX is enabled, and fetch corresponding MSI
+ *          message.
+ * @dev:    Owner PCI device to add the route. If @dev is specified
+ *          as @NULL, an empty MSI message will be inited.
+ * @return: virq (>=0) when success, errno (<0) when failed.
+ */
+int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev);
 int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg,
                                  PCIDevice *dev);
 void kvm_irqchip_release_virq(KVMState *s, int virq);
diff --git a/kvm-all.c b/kvm-all.c
index a88f917..d94c0e4 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -25,6 +25,7 @@
 #include "qemu/error-report.h"
 #include "hw/hw.h"
 #include "hw/pci/msi.h"
+#include "hw/pci/msix.h"
 #include "hw/s390x/adapter.h"
 #include "exec/gdbstub.h"
 #include "sysemu/kvm_int.h"
@@ -1237,10 +1238,23 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg)
     return kvm_set_irq(s, route->kroute.gsi, 1);
 }
 
-int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg, PCIDevice *dev)
+int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
 {
     struct kvm_irq_routing_entry kroute = {};
     int virq;
+    MSIMessage msg = {0, 0};
+
+    if (dev) {
+        if (msix_enabled(dev)) {
+            msg = msix_get_message(dev, vector);
+        } else if (msi_enabled(dev)) {
+            msg = msi_get_message(dev, vector);
+        } else {
+            /* Should never happen */
+            error_report("%s: unknown interrupt type", __func__);
+            abort();
+        }
+    }
 
     if (kvm_gsi_direct_mapping()) {
         return kvm_arch_msi_data_to_gsi(msg.data);
@@ -1390,7 +1404,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg)
     abort();
 }
 
-int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
+int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
 {
     return -ENOSYS;
 }
diff --git a/kvm-stub.c b/kvm-stub.c
index 07c09d1..982e590 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -116,7 +116,7 @@ int kvm_on_sigbus(int code, void *addr)
 }
 
 #ifndef CONFIG_USER_ONLY
-int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg, PCIDevice *dev)
+int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
 {
     return -ENOSYS;
 }
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index bfa40b2..17cd24b 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -3154,8 +3154,7 @@ void kvm_arch_init_irq_routing(KVMState *s)
         /* If the ioapic is in QEMU and the lapics are in KVM, reserve
            MSI routes for signaling interrupts to the local apics. */
         for (i = 0; i < IOAPIC_NUM_PINS; i++) {
-            struct MSIMessage msg = { 0x0, 0x0 };
-            if (kvm_irqchip_add_msi_route(s, msg, NULL) < 0) {
+            if (kvm_irqchip_add_msi_route(s, 0, NULL) < 0) {
                 error_report("Could not enable split IRQ mode.");
                 exit(1);
             }
-- 
2.4.11


* [Qemu-devel] [PATCH v10 22/26] kvm-irqchip: i386: add hook for add/remove virq
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
                   ` (20 preceding siblings ...)
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 21/26] kvm-irqchip: simplify kvm_irqchip_add_msi_route Peter Xu
@ 2016-06-21  7:47 ` Peter Xu
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 23/26] kvm-irqchip: x86: add msi route notify fn Peter Xu
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

Add two hooks that are notified when MSI routes are added or removed.
There are two kinds of MSI routes:

- in kvm_irqchip_add_msi_route(): set up before assigning IRQFD. Used
  by vhost, vfio, etc.

- in kvm_irqchip_send_msi(): when sending a direct MSI message, if
  direct MSI is not allowed, we first create one MSI route entry in
  the kernel, then trigger it.

This patch only hooks the first one (the irqfd case). We do not need
to take care of the second one, since it is only used by QEMU
userspace (kvm-apic) and its messages are always translated at the
time they are triggered. For the first one we need to note the routes
down, so that we can notify the kernel when cache invalidation
happens.

Also, we do not hook IOAPIC MSI routes (we have an explicit notifier
for the IOAPIC to keep its cache updated). We only need to care about
irqfd users.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
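
The bookkeeping described above can be sketched outside QEMU as
follows. This is only an illustration under stated assumptions: it
uses a plain singly linked list instead of QLIST, and the names
route_entry, route_add(), route_release() and route_count() are
hypothetical, not part of this series. Unlike the hunk below, the
sketch also frees the entry when it is released.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the MSIRouteEntry used in the patch. */
struct route_entry {
    const void *dev;            /* owning device, opaque in this sketch */
    int vector;                 /* MSI/MSI-X vector index */
    int virq;                   /* virtual IRQ (GSI) number */
    struct route_entry *next;
};

static struct route_entry *route_list;

/*
 * Record a newly added route. Routes without a device (the IOAPIC
 * case) are not tracked. Returns 0 on success, -1 if out of memory.
 */
static int route_add(const void *dev, int vector, int virq)
{
    struct route_entry *e;

    if (!dev) {
        return 0;
    }
    e = calloc(1, sizeof(*e));
    if (!e) {
        return -1;
    }
    e->dev = dev;
    e->vector = vector;
    e->virq = virq;
    e->next = route_list;
    route_list = e;
    return 0;
}

/* Forget the route for @virq. Returns 1 if one was removed, else 0. */
static int route_release(int virq)
{
    struct route_entry **pp;

    for (pp = &route_list; *pp; pp = &(*pp)->next) {
        if ((*pp)->virq == virq) {
            struct route_entry *e = *pp;

            *pp = e->next;
            free(e);
            return 1;
        }
    }
    return 0;
}

/* Number of routes currently tracked. */
static int route_count(void)
{
    const struct route_entry *e;
    int n = 0;

    for (e = route_list; e; e = e->next) {
        n++;
    }
    return n;
}
```

Releasing a virq that was never tracked (for example an IOAPIC route)
is a harmless no-op in this sketch, which mirrors how the release
hook is called for every virq.
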
 include/sysemu/kvm.h |  6 ++++++
 kvm-all.c            |  2 ++
 target-arm/kvm.c     | 11 +++++++++++
 target-i386/kvm.c    | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 target-mips/kvm.c    | 11 +++++++++++
 target-ppc/kvm.c     | 11 +++++++++++
 target-s390x/kvm.c   | 11 +++++++++++
 trace-events         |  2 ++
 8 files changed, 102 insertions(+)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index e5d90bd..0a16e0e 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -359,6 +359,12 @@ void kvm_arch_init_irq_routing(KVMState *s);
 int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
                              uint64_t address, uint32_t data, PCIDevice *dev);
 
+/* Notify arch about newly added MSI routes */
+int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
+                                int vector, PCIDevice *dev);
+/* Notify arch about released MSI routes */
+int kvm_arch_release_virq_post(int virq);
+
 int kvm_arch_msi_data_to_gsi(uint32_t data);
 
 int kvm_set_irq(KVMState *s, int irq, int level);
diff --git a/kvm-all.c b/kvm-all.c
index d94c0e4..69ff658 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1133,6 +1133,7 @@ void kvm_irqchip_release_virq(KVMState *s, int virq)
         }
     }
     clear_gsi(s, virq);
+    kvm_arch_release_virq_post(virq);
 }
 
 static unsigned int kvm_hash_msi(uint32_t data)
@@ -1281,6 +1282,7 @@ int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
     }
 
     kvm_add_routing_entry(s, &kroute);
+    kvm_arch_add_msi_route_post(&kroute, vector, dev);
     kvm_irqchip_commit_routes(s);
 
     return virq;
diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index 5c2bd7a..dbe393c 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -622,6 +622,17 @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
     return 0;
 }
 
+int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
+                                int vector, PCIDevice *dev)
+{
+    return 0;
+}
+
+int kvm_arch_release_virq_post(int virq)
+{
+    return 0;
+}
+
 int kvm_arch_msi_data_to_gsi(uint32_t data)
 {
     return (data - 32) & 0xffff;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 17cd24b..5d7a7a7 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -3352,6 +3352,54 @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
     return 0;
 }
 
+typedef struct MSIRouteEntry MSIRouteEntry;
+
+struct MSIRouteEntry {
+    PCIDevice *dev;             /* Device pointer */
+    int vector;                 /* MSI/MSIX vector index */
+    int virq;                   /* Virtual IRQ index */
+    QLIST_ENTRY(MSIRouteEntry) list;
+};
+
+/* List of used GSI routes */
+static QLIST_HEAD(, MSIRouteEntry) msi_route_list = \
+    QLIST_HEAD_INITIALIZER(msi_route_list);
+
+int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
+                                int vector, PCIDevice *dev)
+{
+    MSIRouteEntry *entry;
+
+    if (!dev) {
+        /* These are (possibly) IOAPIC routes only used for split
+         * kernel irqchip mode, while what we are housekeeping are
+         * PCI devices only. */
+        return 0;
+    }
+
+    entry = g_new0(MSIRouteEntry, 1);
+    entry->dev = dev;
+    entry->vector = vector;
+    entry->virq = route->gsi;
+    QLIST_INSERT_HEAD(&msi_route_list, entry, list);
+
+    trace_kvm_x86_add_msi_route(route->gsi);
+    return 0;
+}
+
+int kvm_arch_release_virq_post(int virq)
+{
+    MSIRouteEntry *entry, *next;
+    QLIST_FOREACH_SAFE(entry, &msi_route_list, list, next) {
+        if (entry->virq == virq) {
+            trace_kvm_x86_remove_msi_route(virq);
+            QLIST_REMOVE(entry, list);
+            break;
+        }
+    }
+    return 0;
+}
+
 int kvm_arch_msi_data_to_gsi(uint32_t data)
 {
     abort();
diff --git a/target-mips/kvm.c b/target-mips/kvm.c
index f3f832d..dcf5fbb 100644
--- a/target-mips/kvm.c
+++ b/target-mips/kvm.c
@@ -1043,6 +1043,17 @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
     return 0;
 }
 
+int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
+                                int vector, PCIDevice *dev)
+{
+    return 0;
+}
+
+int kvm_arch_release_virq_post(int virq)
+{
+    return 0;
+}
+
 int kvm_arch_msi_data_to_gsi(uint32_t data)
 {
     abort();
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index e14da60..d09c982 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -2602,6 +2602,17 @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
     return 0;
 }
 
+int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
+                                int vector, PCIDevice *dev)
+{
+    return 0;
+}
+
+int kvm_arch_release_virq_post(int virq)
+{
+    return 0;
+}
+
 int kvm_arch_msi_data_to_gsi(uint32_t data)
 {
     return data & 0xffff;
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index 45e94ca..08aaf61 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -2267,6 +2267,17 @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
     return 0;
 }
 
+int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
+                                int vector, PCIDevice *dev)
+{
+    return 0;
+}
+
+int kvm_arch_release_virq_post(int virq)
+{
+    return 0;
+}
+
 int kvm_arch_msi_data_to_gsi(uint32_t data)
 {
     abort();
diff --git a/trace-events b/trace-events
index 20df932..1ca0842 100644
--- a/trace-events
+++ b/trace-events
@@ -2209,6 +2209,8 @@ gicv3_redist_send_sgi(uint32_t cpu, int irq) "GICv3 redistributor %x pending SGI
 
 # target-i386/kvm.c
 kvm_x86_fixup_msi_error(uint32_t gsi) "VT-d failed to remap interrupt for GSI %" PRIu32
+kvm_x86_add_msi_route(int virq) "Adding route entry for virq %d"
+kvm_x86_remove_msi_route(int virq) "Removing route entry for virq %d"
 
 # hw/i386/x86-iommu.c
 x86_iommu_iec_notify(bool global, uint32_t index, uint32_t mask) "Notify IEC invalidation: global=%d index=%" PRIu32 " mask=%" PRIu32
-- 
2.4.11


* [Qemu-devel] [PATCH v10 23/26] kvm-irqchip: x86: add msi route notify fn
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
                   ` (21 preceding siblings ...)
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 22/26] kvm-irqchip: i386: add hook for add/remove virq Peter Xu
@ 2016-06-21  7:47 ` Peter Xu
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 24/26] kvm-irqchip: do explicit commit when update irq Peter Xu
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

Add one more IEC notifier so that MSI routes learn about IEC changes.
When an interrupt entry cache invalidation happens, all registered MSI
routes are updated, for all PCI devices.

Since both vfio and vhost are possible GSI route consumers, this patch
goes one step further to keep them safe in split irqchip mode and
when irqfd is enabled.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
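
The notifier flow can be sketched in isolation as below. This is a
simplified model, not the QEMU implementation: fixed-size arrays
replace the real lists, and iec_register_notifier(), iec_notify_all(),
route_track() and update_routes_all() are hypothetical names. As in
this patch, the callback ignores index and mask and refreshes every
tracked route, and it is registered lazily when the first route is
added.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NOTIFIERS 4
#define MAX_ROUTES    8

typedef void (*iec_notify_fn)(void *opaque, bool global,
                              uint32_t index, uint32_t mask);

struct iec_notifier {
    iec_notify_fn fn;
    void *opaque;
};

static struct iec_notifier notifiers[MAX_NOTIFIERS];
static size_t nr_notifiers;

/* Add a callback to the IEC notify list. Returns 0, or -1 if full. */
static int iec_register_notifier(iec_notify_fn fn, void *opaque)
{
    if (nr_notifiers == MAX_NOTIFIERS) {
        return -1;
    }
    notifiers[nr_notifiers].fn = fn;
    notifiers[nr_notifiers].opaque = opaque;
    nr_notifiers++;
    return 0;
}

/* Called when the guest invalidates interrupt entry cache entries. */
static void iec_notify_all(bool global, uint32_t index, uint32_t mask)
{
    size_t i;

    for (i = 0; i < nr_notifiers; i++) {
        notifiers[i].fn(notifiers[i].opaque, global, index, mask);
    }
}

/* A tracked route: what the device programs vs. what was last pushed. */
struct route {
    uint32_t device_data;
    uint32_t routed_data;
};

static struct route routes[MAX_ROUTES];
static size_t nr_routes;
static int refreshed;           /* total per-route refreshes so far */

/* Refresh every tracked route, whatever index/mask were requested. */
static void update_routes_all(void *opaque, bool global,
                              uint32_t index, uint32_t mask)
{
    size_t i;

    (void)opaque;
    (void)global;
    (void)index;
    (void)mask;
    for (i = 0; i < nr_routes; i++) {
        routes[i].routed_data = routes[i].device_data;
        refreshed++;
    }
}

/* Track a new route; hook into the notify list on first use only. */
static int route_track(uint32_t data)
{
    static bool notifier_registered;

    if (nr_routes == MAX_ROUTES) {
        return -1;
    }
    if (!notifier_registered) {
        if (iec_register_notifier(update_routes_all, NULL) < 0) {
            return -1;
        }
        notifier_registered = true;
    }
    routes[nr_routes].device_data = data;
    routes[nr_routes].routed_data = data;
    nr_routes++;
    return 0;
}
```

Refreshing everything on each notification is coarse but simple; a
finer-grained version would use index and mask to pick the affected
routes.
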
 hw/pci/pci.c         | 15 +++++++++++++++
 include/hw/pci/pci.h |  2 ++
 kvm-all.c            | 10 +---------
 target-i386/kvm.c    | 30 ++++++++++++++++++++++++++++++
 trace-events         |  1 +
 5 files changed, 49 insertions(+), 9 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 3b02888..4ed119e 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2598,6 +2598,21 @@ PCIDevice *pci_get_function_0(PCIDevice *pci_dev)
     }
 }
 
+MSIMessage pci_get_msi_message(PCIDevice *dev, int vector)
+{
+    MSIMessage msg;
+    if (msix_enabled(dev)) {
+        msg = msix_get_message(dev, vector);
+    } else if (msi_enabled(dev)) {
+        msg = msi_get_message(dev, vector);
+    } else {
+        /* Should never happen */
+        error_report("%s: unknown interrupt type", __func__);
+        abort();
+    }
+    return msg;
+}
+
 static const TypeInfo pci_device_type_info = {
     .name = TYPE_PCI_DEVICE,
     .parent = TYPE_DEVICE,
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 9ed1624..74d797d 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -805,4 +805,6 @@ extern const VMStateDescription vmstate_pci_device;
     .offset     = vmstate_offset_pointer(_state, _field, PCIDevice), \
 }
 
+MSIMessage pci_get_msi_message(PCIDevice *dev, int vector);
+
 #endif
diff --git a/kvm-all.c b/kvm-all.c
index 69ff658..ca30a58 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1246,15 +1246,7 @@ int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
     MSIMessage msg = {0, 0};
 
     if (dev) {
-        if (msix_enabled(dev)) {
-            msg = msix_get_message(dev, vector);
-        } else if (msi_enabled(dev)) {
-            msg = msi_get_message(dev, vector);
-        } else {
-            /* Should never happen */
-            error_report("%s: unknown interrupt type", __func__);
-            abort();
-        }
+        msg = pci_get_msi_message(dev, vector);
     }
 
     if (kvm_gsi_direct_mapping()) {
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5d7a7a7..f02ba0a 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -36,6 +36,7 @@
 #include "hw/i386/apic_internal.h"
 #include "hw/i386/apic-msidef.h"
 #include "hw/i386/intel_iommu.h"
+#include "hw/i386/x86-iommu.h"
 
 #include "exec/ioport.h"
 #include "standard-headers/asm-x86/hyperv.h"
@@ -3365,9 +3366,26 @@ struct MSIRouteEntry {
 static QLIST_HEAD(, MSIRouteEntry) msi_route_list = \
     QLIST_HEAD_INITIALIZER(msi_route_list);
 
+static void kvm_update_msi_routes_all(void *private, bool global,
+                                      uint32_t index, uint32_t mask)
+{
+    int cnt = 0;
+    MSIRouteEntry *entry;
+    MSIMessage msg;
+    /* TODO: explicit route update */
+    QLIST_FOREACH(entry, &msi_route_list, list) {
+        cnt++;
+        msg = pci_get_msi_message(entry->dev, entry->vector);
+        kvm_irqchip_update_msi_route(kvm_state, entry->virq,
+                                     msg, entry->dev);
+    }
+    trace_kvm_x86_update_msi_routes(cnt);
+}
+
 int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
                                 int vector, PCIDevice *dev)
 {
+    static bool notify_list_inited = false;
     MSIRouteEntry *entry;
 
     if (!dev) {
@@ -3384,6 +3402,18 @@ int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
     QLIST_INSERT_HEAD(&msi_route_list, entry, list);
 
     trace_kvm_x86_add_msi_route(route->gsi);
+
+    if (!notify_list_inited) {
+        /* For the first time we do add route, add ourselves into
+         * IOMMU's IEC notify list if needed. */
+        X86IOMMUState *iommu = x86_iommu_get_default();
+        if (iommu) {
+            x86_iommu_iec_register_notifier(iommu,
+                                            kvm_update_msi_routes_all,
+                                            NULL);
+        }
+        notify_list_inited = true;
+    }
     return 0;
 }
 
diff --git a/trace-events b/trace-events
index 1ca0842..9ce3514 100644
--- a/trace-events
+++ b/trace-events
@@ -2211,6 +2211,7 @@ gicv3_redist_send_sgi(uint32_t cpu, int irq) "GICv3 redistributor %x pending SGI
 kvm_x86_fixup_msi_error(uint32_t gsi) "VT-d failed to remap interrupt for GSI %" PRIu32
 kvm_x86_add_msi_route(int virq) "Adding route entry for virq %d"
 kvm_x86_remove_msi_route(int virq) "Removing route entry for virq %d"
+kvm_x86_update_msi_routes(int num) "Updated %d MSI routes"
 
 # hw/i386/x86-iommu.c
 x86_iommu_iec_notify(bool global, uint32_t index, uint32_t mask) "Notify IEC invalidation: global=%d index=%" PRIu32 " mask=%" PRIu32
-- 
2.4.11


* [Qemu-devel] [PATCH v10 24/26] kvm-irqchip: do explicit commit when update irq
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
                   ` (22 preceding siblings ...)
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 23/26] kvm-irqchip: x86: add msi route notify fn Peter Xu
@ 2016-06-21  7:47 ` Peter Xu
  2016-06-22  3:42   ` [Qemu-devel] [PATCH v10.2 24/26] kvm-irqchip: introduce kvm_irqchip_update_msi_route_no_commit Peter Xu
  2016-07-04 14:23   ` [Qemu-devel] [PATCH v10 24/26] kvm-irqchip: do explicit commit when update irq Paolo Bonzini
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 25/26] intel_iommu: support all masks in interrupt entry cache invalidation Peter Xu
                   ` (3 subsequent siblings)
  27 siblings, 2 replies; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

Previously, we committed the GSI routes on every irqchip route
update. This is inefficient when updating many routes at the same
time. This patch removes the commit phase from
kvm_irqchip_update_msi_route(). Instead, callers do an explicit commit
after all routes have been updated.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/kvm/pci-assign.c | 2 ++
 hw/misc/ivshmem.c        | 1 +
 hw/vfio/pci.c            | 1 +
 hw/virtio/virtio-pci.c   | 1 +
 include/sysemu/kvm.h     | 2 +-
 kvm-all.c                | 2 --
 kvm-stub.c               | 4 ++++
 target-i386/kvm.c        | 1 +
 8 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
index 62dec5f..a79557f 100644
--- a/hw/i386/kvm/pci-assign.c
+++ b/hw/i386/kvm/pci-assign.c
@@ -1015,6 +1015,7 @@ static void assigned_dev_update_msi_msg(PCIDevice *pci_dev)
 
     kvm_irqchip_update_msi_route(kvm_state, assigned_dev->msi_virq[0],
                                  msi_get_message(pci_dev, 0), pci_dev);
+    kvm_irqchip_commit_routes(kvm_state);
 }
 
 static bool assigned_dev_msix_masked(MSIXTableEntry *entry)
@@ -1601,6 +1602,7 @@ static void assigned_dev_msix_mmio_write(void *opaque, hwaddr addr,
                 if (ret) {
                     error_report("Error updating irq routing entry (%d)", ret);
                 }
+                kvm_irqchip_commit_routes(kvm_state);
             }
         }
     }
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 8512523..241a70c 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -322,6 +322,7 @@ static int ivshmem_vector_unmask(PCIDevice *dev, unsigned vector,
     if (ret < 0) {
         return ret;
     }
+    kvm_irqchip_commit_routes(kvm_state);
 
     return kvm_irqchip_add_irqfd_notifier_gsi(kvm_state, n, NULL, v->virq);
 }
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index cc4e60c..56b13f9 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -457,6 +457,7 @@ static void vfio_update_kvm_msi_virq(VFIOMSIVector *vector, MSIMessage msg,
                                      PCIDevice *pdev)
 {
     kvm_irqchip_update_msi_route(kvm_state, vector->virq, msg, pdev);
+    kvm_irqchip_commit_routes(kvm_state);
 }
 
 static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 184570d..aad0f3d 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -870,6 +870,7 @@ static int virtio_pci_vq_vector_unmask(VirtIOPCIProxy *proxy,
             if (ret < 0) {
                 return ret;
             }
+            kvm_irqchip_commit_routes(kvm_state);
         }
     }
 
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 0a16e0e..c9c2436 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -371,7 +371,6 @@ int kvm_set_irq(KVMState *s, int irq, int level);
 int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg);
 
 void kvm_irqchip_add_irq_route(KVMState *s, int gsi, int irqchip, int pin);
-void kvm_irqchip_commit_routes(KVMState *s);
 
 void kvm_put_apic_state(DeviceState *d, struct kvm_lapic_state *kapic);
 void kvm_get_apic_state(DeviceState *d, struct kvm_lapic_state *kapic);
@@ -494,6 +493,7 @@ static inline void cpu_synchronize_post_init(CPUState *cpu)
 int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev);
 int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg,
                                  PCIDevice *dev);
+void kvm_irqchip_commit_routes(KVMState *s);
 void kvm_irqchip_release_virq(KVMState *s, int virq);
 
 int kvm_irqchip_add_adapter_route(KVMState *s, AdapterInfo *adapter);
diff --git a/kvm-all.c b/kvm-all.c
index ca30a58..3764ba9 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1094,8 +1094,6 @@ static int kvm_update_routing_entry(KVMState *s,
 
         *entry = *new_entry;
 
-        kvm_irqchip_commit_routes(s);
-
         return 0;
     }
 
diff --git a/kvm-stub.c b/kvm-stub.c
index 982e590..64e23f6 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -135,6 +135,10 @@ int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg,
     return -ENOSYS;
 }
 
+void kvm_irqchip_commit_routes(KVMState *s)
+{
+}
+
 int kvm_irqchip_add_adapter_route(KVMState *s, AdapterInfo *adapter)
 {
     return -ENOSYS;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index f02ba0a..0e26862 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -3379,6 +3379,7 @@ static void kvm_update_msi_routes_all(void *private, bool global,
         kvm_irqchip_update_msi_route(kvm_state, entry->virq,
                                      msg, entry->dev);
     }
+    kvm_irqchip_commit_routes(kvm_state);
     trace_kvm_x86_update_msi_routes(cnt);
 }
 
-- 
2.4.11


* [Qemu-devel] [PATCH v10 25/26] intel_iommu: support all masks in interrupt entry cache invalidation
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
                   ` (23 preceding siblings ...)
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 24/26] kvm-irqchip: do explicit commit when update irq Peter Xu
@ 2016-06-21  7:47 ` Peter Xu
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 26/26] kvm-all: add trace events for kvm irqchip ops Peter Xu
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

From: Radim Krčmář <rkrcmar@redhat.com>

Linux guests do not gracefully handle cases when the invalidation mask
they wanted is not supported, probably because real hardware always
allowed all.

We can just say that all 16 masks are supported, because both
ioapic_iec_notifier and kvm_update_msi_routes_all invalidate all caches.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
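
For reference, the new field can be decoded as in the sketch below.
It assumes, per my reading of the VT-d specification, that MHMV
occupies bits 23:20 of the extended capability register and that an
invalidation with mask value m covers 2^m contiguous interrupt
indexes. The helper names are hypothetical and are not part of this
patch.

```c
#include <stdint.h>

/* Same encoding as the define added by this patch. */
#define VTD_ECAP_MHMV (15ULL << 20)

/* Largest invalidation mask value advertised in @ecap (0..15). */
static uint32_t ecap_max_handle_mask(uint64_t ecap)
{
    return (uint32_t)((ecap & VTD_ECAP_MHMV) >> 20);
}

/* Nonzero if the guest may use @mask with this @ecap value. */
static int iec_mask_supported(uint64_t ecap, uint32_t mask)
{
    return mask <= ecap_max_handle_mask(ecap);
}

/* Number of contiguous entries covered by @mask (assumed 2^mask). */
static uint32_t iec_mask_entries(uint32_t mask)
{
    return 1u << mask;
}
```

With all four bits set, every mask value from 0 to 15 is accepted,
which is the "all 16 masks" case described above.
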
 hw/i386/intel_iommu.c          | 2 +-
 hw/i386/intel_iommu_internal.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 789ee25..4ff9a24 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2357,7 +2357,7 @@ static void vtd_init(IntelIOMMUState *s)
     s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
 
     if (x86_iommu->intr_supported) {
-        s->ecap |= VTD_ECAP_IR | VTD_ECAP_EIM;
+        s->ecap |= VTD_ECAP_IR | VTD_ECAP_EIM | VTD_ECAP_MHMV;
     }
 
     vtd_reset_context_cache(s);
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 72b0114..0829a50 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -186,6 +186,7 @@
 /* Interrupt Remapping support */
 #define VTD_ECAP_IR                 (1ULL << 3)
 #define VTD_ECAP_EIM                (1ULL << 4)
+#define VTD_ECAP_MHMV               (15ULL << 20)
 
 /* CAP_REG */
 /* (offset >> 4) << 24 */
-- 
2.4.11


* [Qemu-devel] [PATCH v10 26/26] kvm-all: add trace events for kvm irqchip ops
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
                   ` (24 preceding siblings ...)
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 25/26] intel_iommu: support all masks in interrupt entry cache invalidation Peter Xu
@ 2016-06-21  7:47 ` Peter Xu
  2016-07-04 14:33 ` [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Paolo Bonzini
  2016-07-04 16:39 ` Michael S. Tsirkin
  27 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-06-21  7:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

These will help us monitor irqchip route activity more easily.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 kvm-all.c    | 5 +++++
 trace-events | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/kvm-all.c b/kvm-all.c
index 3764ba9..ef81ca5 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1048,6 +1048,7 @@ void kvm_irqchip_commit_routes(KVMState *s)
     int ret;
 
     s->irq_routes->flags = 0;
+    trace_kvm_irqchip_commit_routes();
     ret = kvm_vm_ioctl(s, KVM_SET_GSI_ROUTING, s->irq_routes);
     assert(ret == 0);
 }
@@ -1271,6 +1272,8 @@ int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
         return -EINVAL;
     }
 
+    trace_kvm_irqchip_add_msi_route(virq);
+
     kvm_add_routing_entry(s, &kroute);
     kvm_arch_add_msi_route_post(&kroute, vector, dev);
     kvm_irqchip_commit_routes(s);
@@ -1301,6 +1304,8 @@ int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg,
         return -EINVAL;
     }
 
+    trace_kvm_irqchip_update_msi_route(virq);
+
     return kvm_update_routing_entry(s, &kroute);
 }
 
diff --git a/trace-events b/trace-events
index 9ce3514..68fcb44 100644
--- a/trace-events
+++ b/trace-events
@@ -1633,6 +1633,9 @@ kvm_run_exit(int cpu_index, uint32_t reason) "cpu_index %d, reason %d"
 kvm_device_ioctl(int fd, int type, void *arg) "dev fd %d, type 0x%x, arg %p"
 kvm_failed_reg_get(uint64_t id, const char *msg) "Warning: Unable to retrieve ONEREG %" PRIu64 " from KVM: %s"
 kvm_failed_reg_set(uint64_t id, const char *msg) "Warning: Unable to set ONEREG %" PRIu64 " to KVM: %s"
+kvm_irqchip_commit_routes(void) ""
+kvm_irqchip_add_msi_route(int virq) "Adding MSI route virq=%d"
+kvm_irqchip_update_msi_route(int virq) "Updating MSI route virq=%d"
 
 # target-ppc/kvm.c
 kvm_failed_spr_set(int str, const char *msg) "Warning: Unable to set SPR %d to KVM: %s"
-- 
2.4.11


* [Qemu-devel] [PATCH v10.2 24/26] kvm-irqchip: introduce kvm_irqchip_update_msi_route_no_commit
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 24/26] kvm-irqchip: do explicit commit when update irq Peter Xu
@ 2016-06-22  3:42   ` Peter Xu
  2016-07-04 14:23   ` [Qemu-devel] [PATCH v10 24/26] kvm-irqchip: do explicit commit when update irq Paolo Bonzini
  1 sibling, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-06-22  3:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

Previously, we committed the GSI routes on every irqchip route
update. This is inefficient when updating many routes at the same
time. This patch introduces a new "no_commit" version of the update
function, which can be used when updating multiple route entries in
sequence.

This change also requires exporting kvm_irqchip_commit_routes() as a
public function.

Signed-off-by: Peter Xu <peterx@redhat.com>
---

On second thought, a better way to do this is to introduce a new
function rather than modifying the existing
kvm_irqchip_update_msi_route(). This way I can avoid touching other
parts of the code, and I can keep add_msi_route() and
update_msi_route() aligned, since both of them will contain one commit
phase.

Please review this v10.2 instead of v10 for this patch. I will use
this one in future versions if there are no objections.
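
The intended split between the two update variants can be modelled
as below. This is a toy sketch, not the kvm-all.c code: the
route_table type and the table_*() helpers are hypothetical, and
table_commit() merely counts calls where the real code issues the
KVM_SET_GSI_ROUTING ioctl.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_ROUTES 16

struct route_table {
    uint32_t data[NR_ROUTES];   /* staged routing entries */
    unsigned commits;           /* times the table was pushed out */
};

/* Stand-in for pushing the whole table to the kernel. */
static void table_commit(struct route_table *t)
{
    t->commits++;
}

/* Stage one entry only; the caller must commit afterwards. */
static int table_update_no_commit(struct route_table *t, size_t virq,
                                  uint32_t data)
{
    if (virq >= NR_ROUTES) {
        return -1;
    }
    t->data[virq] = data;
    return 0;
}

/* Convenience wrapper: stage one entry, commit only on success. */
static int table_update(struct route_table *t, size_t virq,
                        uint32_t data)
{
    int ret = table_update_no_commit(t, virq, data);

    if (ret >= 0) {
        table_commit(t);
    }
    return ret;
}

/* Batch case: stage every entry, then commit exactly once. */
static void table_update_all(struct route_table *t, uint32_t data)
{
    size_t i;

    for (i = 0; i < NR_ROUTES; i++) {
        table_update_no_commit(t, i, data);
    }
    table_commit(t);
}
```

Updating all NR_ROUTES entries through table_update() would commit
NR_ROUTES times, while table_update_all() commits once; callers that
touch a single route keep the simpler one-call interface.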

 hw/intc/ioapic.c     |  3 ++-
 include/sysemu/kvm.h | 10 +++++++++-
 kvm-all.c            | 19 +++++++++++++++----
 kvm-stub.c           |  4 ++++
 target-i386/kvm.c    |  5 +++--
 5 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 0c34e3e..931aeaf 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -192,7 +192,8 @@ static void ioapic_update_kvm_routes(IOAPICCommonState *s)
             ioapic_entry_parse(s->ioredtbl[i], &info);
             msg.address = info.addr;
             msg.data = info.data;
-            kvm_irqchip_update_msi_route(kvm_state, i, msg, NULL);
+            kvm_irqchip_update_msi_route_no_commit(kvm_state, i,
+                                                   msg, NULL);
         }
         kvm_irqchip_commit_routes(kvm_state);
     }
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 0a16e0e..ba7e7f0 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -371,7 +371,6 @@ int kvm_set_irq(KVMState *s, int irq, int level);
 int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg);
 
 void kvm_irqchip_add_irq_route(KVMState *s, int gsi, int irqchip, int pin);
-void kvm_irqchip_commit_routes(KVMState *s);
 
 void kvm_put_apic_state(DeviceState *d, struct kvm_lapic_state *kapic);
 void kvm_get_apic_state(DeviceState *d, struct kvm_lapic_state *kapic);
@@ -494,6 +493,15 @@ static inline void cpu_synchronize_post_init(CPUState *cpu)
 int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev);
 int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg,
                                  PCIDevice *dev);
+/*
+ * Same as kvm_irqchip_update_msi_route(), but need explicit
+ * kvm_irqchip_commit_routes() afterward. This is efficient when we
+ * need to update multiple MSI routes at the same time, to avoid
+ * unnecessary commits between updates.
+ */
+int kvm_irqchip_update_msi_route_no_commit(KVMState *s, int virq,
+                                           MSIMessage msg, PCIDevice *dev);
+void kvm_irqchip_commit_routes(KVMState *s);
 void kvm_irqchip_release_virq(KVMState *s, int virq);
 
 int kvm_irqchip_add_adapter_route(KVMState *s, AdapterInfo *adapter);
diff --git a/kvm-all.c b/kvm-all.c
index ca30a58..b896184 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1094,8 +1094,6 @@ static int kvm_update_routing_entry(KVMState *s,
 
         *entry = *new_entry;
 
-        kvm_irqchip_commit_routes(s);
-
         return 0;
     }
 
@@ -1280,8 +1278,8 @@ int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
     return virq;
 }
 
-int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg,
-                                 PCIDevice *dev)
+int kvm_irqchip_update_msi_route_no_commit(KVMState *s, int virq,
+                                           MSIMessage msg, PCIDevice *dev)
 {
     struct kvm_irq_routing_entry kroute = {};
 
@@ -1306,6 +1304,19 @@ int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg,
     return kvm_update_routing_entry(s, &kroute);
 }
 
+int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg,
+                                 PCIDevice *dev)
+{
+    int ret;
+
+    ret = kvm_irqchip_update_msi_route_no_commit(s, virq, msg, dev);
+    if (ret >= 0) {
+        kvm_irqchip_commit_routes(s);
+    }
+
+    return ret;
+}
+
 static int kvm_irqchip_assign_irqfd(KVMState *s, int fd, int rfd, int virq,
                                     bool assign)
 {
diff --git a/kvm-stub.c b/kvm-stub.c
index 982e590..64e23f6 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -135,6 +135,10 @@ int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg,
     return -ENOSYS;
 }
 
+void kvm_irqchip_commit_routes(KVMState *s)
+{
+}
+
 int kvm_irqchip_add_adapter_route(KVMState *s, AdapterInfo *adapter)
 {
     return -ENOSYS;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index f02ba0a..62aec24 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -3376,9 +3376,10 @@ static void kvm_update_msi_routes_all(void *private, bool global,
     QLIST_FOREACH(entry, &msi_route_list, list) {
         cnt++;
         msg = pci_get_msi_message(entry->dev, entry->vector);
-        kvm_irqchip_update_msi_route(kvm_state, entry->virq,
-                                     msg, entry->dev);
+        kvm_irqchip_update_msi_route_no_commit(kvm_state, entry->virq,
+                                               msg, entry->dev);
     }
+    kvm_irqchip_commit_routes(kvm_state);
     trace_kvm_x86_update_msi_routes(cnt);
 }
 
-- 
2.4.11

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v10 27/26] intel_iommu: disallow kernel-irqchip=on with IR
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 01/26] x86-iommu: introduce parent class Peter Xu
@ 2016-06-24  7:10   ` Peter Xu
  2016-06-24  9:20     ` Peter Xu
  2016-07-11 10:17     ` David Kiarie
  0 siblings, 2 replies; 63+ messages in thread
From: Peter Xu @ 2016-06-24  7:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4, peterx

When the user specifies "kernel-irqchip=on", report an error and quit.

Signed-off-by: Peter Xu <peterx@redhat.com>
---

One more patch for this series. Without it, the guest kernel may
hang, which is not user friendly.
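
For readers unfamiliar with the irqchip modes, the condition added to
vtd_realize() in the diff below can be read as a truth table. The
following is a minimal standalone sketch, not QEMU code: the IrqchipMode
enum and the three helpers are hypothetical stand-ins for
kvm_irqchip_in_kernel() and kvm_irqchip_is_split(), under the assumption
that "split" counts as an in-kernel irqchip with the split flag set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the three "kernel-irqchip" settings. */
typedef enum { IRQCHIP_OFF, IRQCHIP_SPLIT, IRQCHIP_ON } IrqchipMode;

/* Stand-in for kvm_irqchip_in_kernel(): assumed true for "on" and "split". */
static bool irqchip_in_kernel(IrqchipMode m)
{
    return m != IRQCHIP_OFF;
}

/* Stand-in for kvm_irqchip_is_split(): true only for "split". */
static bool irqchip_is_split(IrqchipMode m)
{
    return m == IRQCHIP_SPLIT;
}

/* Mirrors the check in the patch: refuse only a full in-kernel irqchip. */
static bool ir_supported(IrqchipMode m)
{
    assert(m <= IRQCHIP_ON);
    return !(irqchip_in_kernel(m) && !irqchip_is_split(m));
}
```

Under these assumptions only IRQCHIP_ON is rejected, which matches the
"please use 'split|off'" wording of the error message.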

 hw/i386/intel_iommu.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 4ff9a24..618b0f9 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -20,6 +20,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/error-report.h"
 #include "hw/sysbus.h"
 #include "exec/address-spaces.h"
 #include "intel_iommu_internal.h"
@@ -29,6 +30,7 @@
 #include "hw/boards.h"
 #include "hw/i386/x86-iommu.h"
 #include "hw/pci-host/q35.h"
+#include "sysemu/kvm.h"
 
 /*#define DEBUG_INTEL_IOMMU*/
 #ifdef DEBUG_INTEL_IOMMU
@@ -2458,6 +2460,13 @@ static void vtd_realize(DeviceState *dev, Error **errp)
     bus->iommu_opaque = dev;
     /* Pseudo address space under root PCI bus. */
     pcms->ioapic_as = vtd_host_dma_iommu(bus, s, Q35_PSEUDO_DEVFN_IOAPIC);
+
+    /* Currently Intel IOMMU IR only support "kernel-irqchip={off|split}" */
+    if (kvm_irqchip_in_kernel() && !kvm_irqchip_is_split()) {
+        error_report("Intel Interrupt Remapping cannot work with "
+                     "kernel-irqchip=on, please use 'split|off'.");
+        exit(1);
+    }
 }
 
 static void vtd_class_init(ObjectClass *klass, void *data)
-- 
2.4.11

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v10 27/26] intel_iommu: disallow kernel-irqchip=on with IR
  2016-06-24  7:10   ` [Qemu-devel] [PATCH v10 27/26] intel_iommu: disallow kernel-irqchip=on with IR Peter Xu
@ 2016-06-24  9:20     ` Peter Xu
  2016-07-04 15:39       ` Michael S. Tsirkin
  2016-07-11 10:17     ` David Kiarie
  1 sibling, 1 reply; 63+ messages in thread
From: Peter Xu @ 2016-06-24  9:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4

On Fri, Jun 24, 2016 at 03:10:21PM +0800, Peter Xu wrote:
> When user specify "kernel-irqchip=on", throw error and then quit.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> 
> One more patch for this series. Without this one, guest kernel will
> possibly hang. This is not user friendly.

This patch should not be here; it should have been sent in reply to the
cover letter. My fault: I mistakenly pasted the wrong message ID. :(

-- peterx

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v10 16/26] intel_iommu: add support for split irqchip
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 16/26] intel_iommu: add support for split irqchip Peter Xu
@ 2016-06-25  8:08   ` Jan Kiszka
  2016-06-25 13:18     ` Peter Xu
  2016-07-04 14:32   ` Paolo Bonzini
  1 sibling, 1 reply; 63+ messages in thread
From: Jan Kiszka @ 2016-06-25  8:08 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: imammedo, rth, ehabkost, jasowang, marcel, mst, pbonzini,
	rkrcmar, alex.williamson, wexu, davidkiarie4, Valentine Sinitsyn

On 2016-06-21 09:47, Peter Xu wrote:
> In split irqchip mode, IOAPIC is working in user space, only update
> kernel irq routes when entry changed. When IR is enabled, we directly
> update the kernel with translated messages. It works just like a kernel
> cache for the remapping entries.
> 
> Since KVM irqfd is using kernel gsi routes to deliver interrupts, as
> long as we can support split irqchip, we will support irqfd as
> well. Also, since kernel gsi routes will cache translated interrupts,
> irqfd delivery will not suffer from any performance impact due to IR.
> 
> And, since we supported irqfd, vhost devices will be able to work
> seamlessly with IR now. Logically this should contain both vhost-net and
> vhost-user case.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/i386/intel_iommu.c         |  7 +++++++
>  include/hw/i386/intel_iommu.h |  1 +
>  include/hw/i386/x86-iommu.h   |  4 ++++
>  target-i386/kvm.c             | 27 +++++++++++++++++++++++++++
>  trace-events                  |  3 +++
>  5 files changed, 42 insertions(+)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index d874596..0eaffc6 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -2149,6 +2149,12 @@ do_not_translate:
>      return 0;
>  }
>  
> +static int vtd_int_remap(X86IOMMUState *iommu, MSIMessage *src,
> +                         MSIMessage *dst, uint16_t sid)
> +{
> +    return vtd_interrupt_remap_msi(INTEL_IOMMU_DEVICE(iommu), src, dst);
> +}
> +
>  static MemTxResult vtd_mem_ir_read(void *opaque, hwaddr addr,
>                                     uint64_t *data, unsigned size,
>                                     MemTxAttrs attrs)
> @@ -2393,6 +2399,7 @@ static void vtd_class_init(ObjectClass *klass, void *data)
>      dc->props = vtd_properties;
>      x86_class->realize = vtd_realize;
>      x86_class->find_add_as = vtd_find_add_as;
> +    x86_class->int_remap = vtd_int_remap;
>  }
>  
>  static const TypeInfo vtd_info = {
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index b3f17d7..3bca390 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -26,6 +26,7 @@
>  #include "hw/i386/x86-iommu.h"
>  #include "hw/i386/ioapic.h"
>  #include "hw/pci/msi.h"
> +#include "hw/sysbus.h"
>  
>  #define TYPE_INTEL_IOMMU_DEVICE "intel-iommu"
>  #define INTEL_IOMMU_DEVICE(obj) \
> diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
> index 07199be..b419ae5 100644
> --- a/include/hw/i386/x86-iommu.h
> +++ b/include/hw/i386/x86-iommu.h
> @@ -22,6 +22,7 @@
>  
>  #include "hw/sysbus.h"
>  #include "exec/memory.h"
> +#include "hw/pci/pci.h"
>  
>  #define  TYPE_X86_IOMMU_DEVICE  ("x86-iommu")
>  #define  X86_IOMMU_DEVICE(obj) \
> @@ -43,6 +44,9 @@ struct X86IOMMUClass {
>      DeviceRealize realize;
>      /* Find/Add IOMMU address space for specific PCI device */
>      AddressSpace *(*find_add_as)(X86IOMMUState *s, PCIBus *bus, int devfn);
> +    /* MSI-based interrupt remapping */
> +    int (*int_remap)(X86IOMMUState *iommu, MSIMessage *src,
> +                     MSIMessage *dst, uint16_t sid);
>  };
>  
>  struct X86IOMMUState {
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index f3698f1..bfa40b2 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -35,6 +35,7 @@
>  #include "hw/i386/apic.h"
>  #include "hw/i386/apic_internal.h"
>  #include "hw/i386/apic-msidef.h"
> +#include "hw/i386/intel_iommu.h"
>  
>  #include "exec/ioport.h"
>  #include "standard-headers/asm-x86/hyperv.h"
> @@ -42,6 +43,7 @@
>  #include "hw/pci/msi.h"
>  #include "migration/migration.h"
>  #include "exec/memattrs.h"
> +#include "trace.h"
>  
>  //#define DEBUG_KVM
>  
> @@ -3323,6 +3325,31 @@ int kvm_device_msix_deassign(KVMState *s, uint32_t dev_id)
>  int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
>                               uint64_t address, uint32_t data, PCIDevice *dev)
>  {
> +    X86IOMMUState *iommu = x86_iommu_get_default();
> +
> +    if (iommu) {
> +        int ret;
> +        MSIMessage src, dst;
> +        X86IOMMUClass *class = X86_IOMMU_GET_CLASS(iommu);
> +
> +        src.address = route->u.msi.address_hi;
> +        src.address <<= VTD_MSI_ADDR_HI_SHIFT;
> +        src.address |= route->u.msi.address_lo;
> +        src.data = route->u.msi.data;
> +
> +        ret = class->int_remap(iommu, &src, &dst, dev ? \
> +                               pci_requester_id(dev) : \
> +                               X86_IOMMU_SID_INVALID);
> +        if (ret) {
> +            trace_kvm_x86_fixup_msi_error(route->gsi);
> +            return 1;
> +        }
> +
> +        route->u.msi.address_hi = dst.address >> VTD_MSI_ADDR_HI_SHIFT;
> +        route->u.msi.address_lo = dst.address & VTD_MSI_ADDR_LO_MASK;
> +        route->u.msi.data = dst.data;
> +    }
> +
>      return 0;
>  }
>  
> diff --git a/trace-events b/trace-events
> index da0d060..2982f64 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -2206,3 +2206,6 @@ gicv3_redist_write(uint32_t cpu, uint64_t offset, uint64_t data, unsigned size,
>  gicv3_redist_badwrite(uint32_t cpu, uint64_t offset, uint64_t data, unsigned size, bool secure) "GICv3 redistributor %x write: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u secure %d: error"
>  gicv3_redist_set_irq(uint32_t cpu, int irq, int level) "GICv3 redistributor %x interrupt %d level changed to %d"
>  gicv3_redist_send_sgi(uint32_t cpu, int irq) "GICv3 redistributor %x pending SGI %d"
> +
> +# target-i386/kvm.c
> +kvm_x86_fixup_msi_error(uint32_t gsi) "VT-d failed to remap interrupt for GSI %" PRIu32
> 

For successful remappings, this is fine - it just caches the result in
an interrupt route. But what will happen with invalid interrupts?

My current understanding is that, because the translation happens on
activation of that interrupt source, not on actual signalling, the IOMMU
will report an error too early and none when the interrupt is actually
sent. That will lead to unwanted results, in the worst case
false-positive IR error reports to the guest, no?

I think we need to do this:
- silently remap broken sources to an error sink
- hook up the error sink with the actual IOMMU model (Intel or AMD)
- when that source actually fires, let the sink report an IR
  translation error to the guest

Am I right?

Jan


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v10 16/26] intel_iommu: add support for split irqchip
  2016-06-25  8:08   ` Jan Kiszka
@ 2016-06-25 13:18     ` Peter Xu
  2016-06-25 15:18       ` Jan Kiszka
  0 siblings, 1 reply; 63+ messages in thread
From: Peter Xu @ 2016-06-25 13:18 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: qemu-devel, imammedo, rth, ehabkost, jasowang, marcel, mst,
	pbonzini, rkrcmar, alex.williamson, wexu, davidkiarie4,
	Valentine Sinitsyn

On Sat, Jun 25, 2016 at 10:08:10AM +0200, Jan Kiszka wrote:

[...]

> For successful remappings, this is fine - it just caches the result in
> an interrupt route. But what will happen with invalid interrupts?
> 
> My current understanding is that, because the translation happens on
> activation of that interrupt source, not on actual signalling, the IOMMU
> will report an error too early and none when the interrupt is actually
> sent. That will lead to unwanted results, in the worst case
> false-positive IR error reports to the guest, no?
> 
> I think we need to do this:
> - silently remap broken sources to an error sink
> - hook up the error sink with the actual IOMMU model (Intel or AMD)
> - when that source actually fires, let the sink report an IR
>   translation error to the guest
> 
> Am I right?

Right. I totally missed this one. :(

Currently, when split irqchip is specified, IOAPIC interrupts are
cached in the kernel with type KVM_IRQ_ROUTING_MSI (the same as
irqfds). When the guest programs a faulty interrupt entry, we may
silently fail the update, so all further interrupts are still
delivered through the old, valid route.

I agree with your solution on this. First of all, we should update the
interrupt route even if the entry is faulty, but mark it as such. After
that, the kernel side should notify QEMU when the faulty interrupt is
triggered, so that the QEMU IOMMU can still generate the corresponding
fault report interrupt to the guest (for Intel IOMMU IR we do not
handle any fault reports yet, but we should be prepared for that).

So it seems that we cannot avoid touching KVM this time after all.

I have a thought on how to implement the "sink" you have mentioned:

First of all, in KVM, we provide a new KVM_IRQ_ROUTING_* type, maybe
called:

  KVM_IRQ_ROUTING_EVENTFD

When KVM gets this kind of interrupt, it does not inject any real
interrupt into the guest. Instead, it just calls eventfd_signal() on a
pre-defined fd (maybe with some data along with the notification, so
that we can carry the error as well?), which is set up during the
KVM_SET_GSI_ROUTING ioctl().

After that, QEMU registers all faulty interrupts using this new
KVM_IRQ_ROUTING_EVENTFD type (rather than the original
KVM_IRQ_ROUTING_MSI), assigns a specific handler for the events
from these interrupts, and triggers the IOMMU fault report path in that
handler.

(Here I used KVM_IRQ_ROUTING_EVENTFD rather than something like
 KVM_IRQ_ROUTING_FAULT_MSI to make the API a more general one, in case
 we can leverage it in other cases in the future)

Do you think the above is workable?

Whichever solution we choose, I would still suggest adding
this as an "enhancement" after this series, since:

- other work depends on this series (Radim's x2apic, David's AMD
  IOMMU), so I would appreciate it if this series could be merged
  first to give that work a solid base. Though this is based on the
  assumption that the basic design of this series is workable...

- this problem will only exist for guest driver developers and should
  not happen for generic users (right?), so only a small subset of
  users might be affected.

Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v10 16/26] intel_iommu: add support for split irqchip
  2016-06-25 13:18     ` Peter Xu
@ 2016-06-25 15:18       ` Jan Kiszka
  2016-06-26  1:48         ` Peter Xu
  0 siblings, 1 reply; 63+ messages in thread
From: Jan Kiszka @ 2016-06-25 15:18 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, imammedo, rth, ehabkost, jasowang, marcel, mst,
	pbonzini, rkrcmar, alex.williamson, wexu, davidkiarie4,
	Valentine Sinitsyn

On 2016-06-25 15:18, Peter Xu wrote:
> On Sat, Jun 25, 2016 at 10:08:10AM +0200, Jan Kiszka wrote:
> 
> [...]
> 
>> For successful remappings, this is fine - it just caches the result in
>> an interrupt route. But what will happen with invalid interrupts?
>>
>> My current understanding is that, because the translation happens on
>> activation of that interrupt source, not on actual signalling, the IOMMU
>> will report an error too early and none when the interrupt is actually
>> sent. That will lead to unwanted results, in the worst case
>> false-positive IR error reports to the guest, no?
>>
>> I think we need to do this:
>> - silently remap broken sources to an error sink
>> - hook up the error sink with the actual IOMMU model (Intel or AMD)
>> - when that source actually fires, let the sink report an IR
>>   translation error to the guest
>>
>> Am I right?
> 
> Right. I totally missed this one. :(
> 
> Currently when split irqchip is specified, IOAPIC interrupts are
> cached in kernel with type KVM_IRQ_ROUTING_MSI (which is the same as
> irqfds). When guest specify a fault interrupt entry, it is possible
> that we silently fail the update, and all further interrupts are still
> the old and correct one.
> 
> I agree with your solution on this. First of all we update the
> interrupt even if it's faulty, but we should mark it out. After that,
> we should fire QEMU from kernel side when the fault interrupt is
> triggered, so that QEMU IOMMU can still generate corresponding fault
> report interrupt to guest (though for Intel IOMMU IR, we still haven't
> handled any fault report yet, but we should be prepared for it).
> 
> So it seems that finally we cannot avoid touching KVM this time.
> 
> I have a thought on how to implement the "sink" you have mentioned:
> 
> First of all, in KVM, we provide a new KVM_IRQ_ROUTING_* type, maybe
> called:
> 
>   KVM_IRQ_ROUTING_EVENTFD

Not really, because all sources are either using eventfds, which you can
also terminate in user space (already done for vhost and vfio in certain
scenarios - IIRC) or originate there anyway (IOAPIC).

> 
> When KVM got this kind of interrupt, KVM does not trigger any real
> interrupt to guest. Instead, it just do eventfd_signal() to a
> pre-defined fd (maybe also with some data along with the notification,
> so that we can put the error inside?), which is set during
> KVM_SET_GSI_ROUTING ioctl().
> 
> After that, QEMU register all fault interrupts using this new
> KVM_IRQ_ROUTING_EVENTFD type (rather than original
> KVM_IRQ_ROUTING_MSI), assign a specific handler to handle the events
> from these interrupts, and trigger IOMMU fault report path in that
> handler.
> 
> (Here I used KVM_IRQ_ROUTING_EVENTFD rather than something like
>  KVM_IRQ_ROUTING_FAULT_MSI to make the API a more general one, in case
>  we can leverage it in other cases in the future)
> 
> Do you think the above workable?
> 
> No matter which solution we will have, I would still suggest we add
> this as an "enhancement" after this series, since:
> 
> - there are works that depend on this series, so I would appreciate if
>   this series can be merged first, so that other people can have a
>   good basement (Radim's x2apic, David's AMD IOMMU). Though this is
>   based on the assumption that the basic design of this series is
>   workable...

I understand, and it is probably safe...

> 
> - this problem will only exist for guest driver developers and should
>   not happen for generic users (right?), so only a small subset of
>   users might be affected.

...provided there is only little risk that some guest programs some
half-baked or stale message that would be rejected prematurely. But
that risk is most likely low.

Jan



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v10 16/26] intel_iommu: add support for split irqchip
  2016-06-25 15:18       ` Jan Kiszka
@ 2016-06-26  1:48         ` Peter Xu
  2016-06-26 13:27           ` Jan Kiszka
  0 siblings, 1 reply; 63+ messages in thread
From: Peter Xu @ 2016-06-26  1:48 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: qemu-devel, imammedo, rth, ehabkost, jasowang, marcel, mst,
	pbonzini, rkrcmar, alex.williamson, wexu, davidkiarie4,
	Valentine Sinitsyn

On Sat, Jun 25, 2016 at 05:18:40PM +0200, Jan Kiszka wrote:
> On 2016-06-25 15:18, Peter Xu wrote:
> > On Sat, Jun 25, 2016 at 10:08:10AM +0200, Jan Kiszka wrote:

[...]

> > I have a thought on how to implement the "sink" you have mentioned:
> > 
> > First of all, in KVM, we provide a new KVM_IRQ_ROUTING_* type, maybe
> > called:
> > 
> >   KVM_IRQ_ROUTING_EVENTFD
> 
> Not really, because all sources are either using eventfds, which you can
> also terminate in user space (already done for vhost and vfio in certain
> scenarios - IIRC) or originate there anyway (IOAPIC).

But how should we handle the cases where the interrupt path is entirely
in the kernel?

For vhost, data should be transferred entirely inside the kernel when
split irqchip and irqfd are used: when vhost gets data, it triggers the
irqfd to deliver the interrupt to KVM. The whole path stays in the
kernel.

For vfio, we have vfio_msihandler(), which handles the hardware IRQ and
then triggers the irqfd to KVM as well. Again, it all seems to happen in
kernel space, so there is no chance to intercept it there either.

Please correct me if I am wrong.

[...]

> > - there are works that depend on this series, so I would appreciate if
> >   this series can be merged first, so that other people can have a
> >   good basement (Radim's x2apic, David's AMD IOMMU). Though this is
> >   based on the assumption that the basic design of this series is
> >   workable...
> 
> I understand, and it is probably safe...
> 
> > 
> > - this problem will only exist for guest driver developers and should
> >   not happen for generic users (right?), so only a small subset of
> >   users might be affected.
> 
> ...provided there is only little risk that some guest programs some
> half-baked or stale message that would be rejected prematurely. But
> that risk is most likely low.

Yes, thanks!

-- peterx

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v10 16/26] intel_iommu: add support for split irqchip
  2016-06-26  1:48         ` Peter Xu
@ 2016-06-26 13:27           ` Jan Kiszka
  2016-06-28  6:10             ` Michael S. Tsirkin
                               ` (2 more replies)
  0 siblings, 3 replies; 63+ messages in thread
From: Jan Kiszka @ 2016-06-26 13:27 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, imammedo, rth, ehabkost, jasowang, marcel, mst,
	pbonzini, rkrcmar, alex.williamson, wexu, davidkiarie4,
	Valentine Sinitsyn

On 2016-06-26 03:48, Peter Xu wrote:
> On Sat, Jun 25, 2016 at 05:18:40PM +0200, Jan Kiszka wrote:
>> On 2016-06-25 15:18, Peter Xu wrote:
>>> On Sat, Jun 25, 2016 at 10:08:10AM +0200, Jan Kiszka wrote:
> 
> [...]
> 
>>> I have a thought on how to implement the "sink" you have mentioned:
>>>
>>> First of all, in KVM, we provide a new KVM_IRQ_ROUTING_* type, maybe
>>> called:
>>>
>>>   KVM_IRQ_ROUTING_EVENTFD
>>
>> Not really, because all sources are either using eventfds, which you can
>> also terminate in user space (already done for vhost and vfio in certain
>> scenarios - IIRC) or originate there anyway (IOAPIC).
> 
> But how should we handle the cases when the interrupt path are all in
> kernel?

There are none which we can't redirect (only full in-kernel irqchip
would have, but that's unsupported anyway).

> 
> For vhost, data should be transferred all inside kernel when split
> irqchip and irqfd are used: when vhost got data, it triggers irqfd to
> deliver the interrupt to KVM. Along the way, we should all in kernel.
> 
> For vfio, we have vfio_msihandler() who handles the hardware IRQ and
> then triggers irqfd as well to KVM. Again, it seems all in kernel
> space, no chance to stop that as well.
> 
> Please correct me if I was wrong.

Look at what vhost is doing e.g.: when a virtqueue is masked, it
installs an event notifier that records incoming events in a pending
state field. When it's unmasked, the corresponding KVM irqfd is installed.
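
The vhost behaviour described above can be sketched with a plain Linux
eventfd. This is a simplified illustration, not the vhost code:
MaskedSource and its helpers are hypothetical names, and "installing the
KVM irqfd" is reduced to clearing the masked flag. While masked, user
space drains the eventfd and remembers that something arrived; on
unmask, the caller learns whether an interrupt must be injected right
away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

typedef struct {
    int  fd;        /* eventfd signalled by the interrupt source */
    bool masked;    /* true: user space handles the fd itself */
    bool pending;   /* an event arrived while the source was masked */
} MaskedSource;

/* Create the eventfd; the source starts out masked. */
static int source_init(MaskedSource *s)
{
    s->fd = eventfd(0, EFD_NONBLOCK);
    s->masked = true;
    s->pending = false;
    return s->fd < 0 ? -1 : 0;
}

/* What the device side does to raise the interrupt. */
static int source_signal(MaskedSource *s)
{
    uint64_t one = 1;

    return write(s->fd, &one, sizeof(one)) == sizeof(one) ? 0 : -1;
}

/* User-space notifier: while masked, drain the fd and record the event. */
static void source_poll_masked(MaskedSource *s)
{
    uint64_t count;

    assert(s->fd >= 0);
    if (s->masked && read(s->fd, &count, sizeof(count)) == sizeof(count)) {
        s->pending = true;
    }
}

/* Unmask: returns true if a recorded event has to be injected now. */
static bool source_unmask(MaskedSource *s)
{
    bool inject = s->pending;

    s->masked = false;
    s->pending = false;
    return inject;
}
```

The same pattern would let a faulty interrupt source stay "terminated"
in user space, so that the IOMMU model sees the event when it fires.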

Jan



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v10 16/26] intel_iommu: add support for split irqchip
  2016-06-26 13:27           ` Jan Kiszka
@ 2016-06-28  6:10             ` Michael S. Tsirkin
  2016-06-28  7:25             ` Peter Xu
  2017-01-03  6:15             ` Peter Xu
  2 siblings, 0 replies; 63+ messages in thread
From: Michael S. Tsirkin @ 2016-06-28  6:10 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Peter Xu, qemu-devel, imammedo, rth, ehabkost, jasowang, marcel,
	pbonzini, rkrcmar, alex.williamson, wexu, davidkiarie4,
	Valentine Sinitsyn

On Sun, Jun 26, 2016 at 03:27:50PM +0200, Jan Kiszka wrote:
> On 2016-06-26 03:48, Peter Xu wrote:
> > On Sat, Jun 25, 2016 at 05:18:40PM +0200, Jan Kiszka wrote:
> >> On 2016-06-25 15:18, Peter Xu wrote:
> >>> On Sat, Jun 25, 2016 at 10:08:10AM +0200, Jan Kiszka wrote:
> > 
> > [...]
> > 
> >>> I have a thought on how to implement the "sink" you have mentioned:
> >>>
> >>> First of all, in KVM, we provide a new KVM_IRQ_ROUTING_* type, maybe
> >>> called:
> >>>
> >>>   KVM_IRQ_ROUTING_EVENTFD
> >>
> >> Not really, because all sources are either using eventfds, which you can
> >> also terminate in user space (already done for vhost and vfio in certain
> >> scenarios - IIRC) or originate there anyway (IOAPIC).
> > 
> > But how should we handle the cases when the interrupt path are all in
> > kernel?
> 
> There are none which we can't redirect (only full in-kernel irqchip
> would have, but that's unsupported anyway).

I agree, but I kind of feel it's OK to work on this
as a patch on top.
Additionally, some kind of test would have to be written
for these error cases, which is a non-negligible amount of work.
So I'm inclined to merge this patchset - I feel it'll
help things make progress.

Thoughts? Jan - if you agree it's a good idea, acks would be appreciated.

> > 
> > For vhost, data should be transferred all inside kernel when split
> > irqchip and irqfd are used: when vhost got data, it triggers irqfd to
> > deliver the interrupt to KVM. Along the way, we should all in kernel.
> > 
> > For vfio, we have vfio_msihandler() who handles the hardware IRQ and
> > then triggers irqfd as well to KVM. Again, it seems all in kernel
> > space, no chance to stop that as well.
> > 
> > Please correct me if I was wrong.
> 
> Look at what vhost is doing e.g.: when a virtqueue is masked, it
> installs an event notifier that records incoming events in a pending
> state field. When it's unmasked, the corresponding KVM irqfd is installed.
> 
> Jan
> 
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v10 16/26] intel_iommu: add support for split irqchip
  2016-06-26 13:27           ` Jan Kiszka
  2016-06-28  6:10             ` Michael S. Tsirkin
@ 2016-06-28  7:25             ` Peter Xu
  2017-01-03  6:15             ` Peter Xu
  2 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-06-28  7:25 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: qemu-devel, imammedo, rth, ehabkost, jasowang, marcel, mst,
	pbonzini, rkrcmar, alex.williamson, wexu, davidkiarie4,
	Valentine Sinitsyn

On Sun, Jun 26, 2016 at 03:27:50PM +0200, Jan Kiszka wrote:
> On 2016-06-26 03:48, Peter Xu wrote:
> > On Sat, Jun 25, 2016 at 05:18:40PM +0200, Jan Kiszka wrote:
> >> On 2016-06-25 15:18, Peter Xu wrote:
> >>> On Sat, Jun 25, 2016 at 10:08:10AM +0200, Jan Kiszka wrote:
> > 
> > [...]
> > 
> >>> I have a thought on how to implement the "sink" you have mentioned:
> >>>
> >>> First of all, in KVM, we provide a new KVM_IRQ_ROUTING_* type, maybe
> >>> called:
> >>>
> >>>   KVM_IRQ_ROUTING_EVENTFD
> >>
> >> Not really, because all sources are either using eventfds, which you can
> >> also terminate in user space (already done for vhost and vfio in certain
> >> scenarios - IIRC) or originate there anyway (IOAPIC).
> > 
> > But how should we handle the cases when the interrupt path are all in
> > kernel?
> 
> There are none which we can't redirect (only full in-kernel irqchip
> would have, but that's unsupported anyway).
> 
> > 
> > For vhost, data should be transferred all inside kernel when split
> > irqchip and irqfd are used: when vhost got data, it triggers irqfd to
> > deliver the interrupt to KVM. Along the way, we should all in kernel.
> > 
> > For vfio, we have vfio_msihandler() who handles the hardware IRQ and
> > then triggers irqfd as well to KVM. Again, it seems all in kernel
> > space, no chance to stop that as well.
> > 
> > Please correct me if I was wrong.
> 
> Look at what vhost is doing e.g.: when a virtqueue is masked, it
> installs an event notifier that records incoming events in a pending
> state field. When it's unmasked, the corresponding KVM irqfd is installed.

You are right. Thanks for the explanation.

-- peterx

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v10 17/26] x86-iommu: introduce IEC notifiers
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 17/26] x86-iommu: introduce IEC notifiers Peter Xu
@ 2016-07-04 14:22   ` Paolo Bonzini
  2016-07-05  7:32     ` Peter Xu
  0 siblings, 1 reply; 63+ messages in thread
From: Paolo Bonzini @ 2016-07-04 14:22 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: ehabkost, mst, jasowang, rkrcmar, alex.williamson, jan.kiszka,
	wexu, marcel, imammedo, davidkiarie4, rth



On 21/06/2016 09:47, Peter Xu wrote:
> This patch introduces x86 IOMMU IEC (Interrupt Entry Cache)
> invalidation notifier list. When vIOMMU receives IEC invalidate
> request, all the registered units will be notified with specific
> invalidation requests.
> 
> Intel IOMMU is the first provider that generates such a event.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Please consider switching this to a NotifierList.

Paolo

> ---
>  hw/i386/intel_iommu.c          | 36 +++++++++++++++++++++++++++++-------
>  hw/i386/intel_iommu_internal.h | 24 ++++++++++++++++++++----
>  hw/i386/x86-iommu.c            | 29 +++++++++++++++++++++++++++++
>  include/hw/i386/x86-iommu.h    | 40 ++++++++++++++++++++++++++++++++++++++++
>  trace-events                   |  3 +++
>  5 files changed, 121 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 0eaffc6..11cb495 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -904,6 +904,12 @@ static void vtd_root_table_setup(IntelIOMMUState *s)
>                  (s->root_extended ? "(extended)" : ""));
>  }
>  
> +static void vtd_iec_notify_all(IntelIOMMUState *s, bool global,
> +                               uint32_t index, uint32_t mask)
> +{
> +    x86_iommu_iec_notify_all(X86_IOMMU_DEVICE(s), global, index, mask);
> +}
> +
>  static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
>  {
>      uint64_t value = 0;
> @@ -911,7 +917,8 @@ static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
>      s->intr_size = 1UL << ((value & VTD_IRTA_SIZE_MASK) + 1);
>      s->intr_root = value & VTD_IRTA_ADDR_MASK;
>  
> -    /* TODO: invalidate interrupt entry cache */
> +    /* Notify global invalidation */
> +    vtd_iec_notify_all(s, true, 0, 0);
>  
>      VTD_DPRINTF(CSR, "int remap table addr 0x%"PRIx64 " size %"PRIu32,
>                  s->intr_root, s->intr_size);
> @@ -1413,6 +1420,21 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
>      return true;
>  }
>  
> +static bool vtd_process_inv_iec_desc(IntelIOMMUState *s,
> +                                     VTDInvDesc *inv_desc)
> +{
> +    VTD_DPRINTF(INV, "inv ir glob %d index %d mask %d",
> +                inv_desc->iec.granularity,
> +                inv_desc->iec.index,
> +                inv_desc->iec.index_mask);
> +
> +    vtd_iec_notify_all(s, !inv_desc->iec.granularity,
> +                       inv_desc->iec.index,
> +                       inv_desc->iec.index_mask);
> +
> +    return true;
> +}
> +
>  static bool vtd_process_inv_desc(IntelIOMMUState *s)
>  {
>      VTDInvDesc inv_desc;
> @@ -1453,12 +1475,12 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
>          break;
>  
>      case VTD_INV_DESC_IEC:
> -        VTD_DPRINTF(INV, "Interrupt Entry Cache Invalidation "
> -                    "not implemented yet");
> -        /*
> -         * Since currently we do not cache interrupt entries, we can
> -         * just mark this descriptor as "good" and move on.
> -         */
> +        VTD_DPRINTF(INV, "Invalidation Interrupt Entry Cache "
> +                    "Descriptor hi 0x%"PRIx64 " lo 0x%"PRIx64,
> +                    inv_desc.hi, inv_desc.lo);
> +        if (!vtd_process_inv_iec_desc(s, &inv_desc)) {
> +            return false;
> +        }
>          break;
>  
>      default:
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index e1a08cb..10c20fe 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -296,12 +296,28 @@ typedef enum VTDFaultReason {
>  
>  #define VTD_CONTEXT_CACHE_GEN_MAX       0xffffffffUL
>  
> +/* Interrupt Entry Cache Invalidation Descriptor: VT-d 6.5.2.7. */
> +struct VTDInvDescIEC {
> +    uint32_t type:4;            /* Should always be 0x4 */
> +    uint32_t granularity:1;     /* If set, it's global IR invalidation */
> +    uint32_t resved_1:22;
> +    uint32_t index_mask:5;      /* 2^N for continuous int invalidation */
> +    uint32_t index:16;          /* Start index to invalidate */
> +    uint32_t reserved_2:16;
> +};
> +typedef struct VTDInvDescIEC VTDInvDescIEC;
> +
>  /* Queued Invalidation Descriptor */
> -struct VTDInvDesc {
> -    uint64_t lo;
> -    uint64_t hi;
> +union VTDInvDesc {
> +    struct {
> +        uint64_t lo;
> +        uint64_t hi;
> +    };
> +    union {
> +        VTDInvDescIEC iec;
> +    };
>  };
> -typedef struct VTDInvDesc VTDInvDesc;
> +typedef union VTDInvDesc VTDInvDesc;
>  
>  /* Masks for struct VTDInvDesc */
>  #define VTD_INV_DESC_TYPE               0xf
> diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
> index 4280839..ce26b2a 100644
> --- a/hw/i386/x86-iommu.c
> +++ b/hw/i386/x86-iommu.c
> @@ -22,6 +22,33 @@
>  #include "hw/boards.h"
>  #include "hw/i386/x86-iommu.h"
>  #include "qemu/error-report.h"
> +#include "trace.h"
> +
> +void x86_iommu_iec_register_notifier(X86IOMMUState *iommu,
> +                                     iec_notify_fn fn, void *data)
> +{
> +    IEC_Notifier *notifier = g_new0(IEC_Notifier, 1);
> +
> +    notifier->iec_notify = fn;
> +    notifier->private = data;
> +
> +    QLIST_INSERT_HEAD(&iommu->iec_notifiers, notifier, list);
> +}
> +
> +void x86_iommu_iec_notify_all(X86IOMMUState *iommu, bool global,
> +                              uint32_t index, uint32_t mask)
> +{
> +    IEC_Notifier *notifier;
> +
> +    trace_x86_iommu_iec_notify(global, index, mask);
> +
> +    QLIST_FOREACH(notifier, &iommu->iec_notifiers, list) {
> +        if (notifier->iec_notify) {
> +            notifier->iec_notify(notifier->private, global,
> +                                 index, mask);
> +        }
> +    }
> +}
>  
>  /* Default X86 IOMMU device */
>  static X86IOMMUState *x86_iommu_default = NULL;
> @@ -46,7 +73,9 @@ X86IOMMUState *x86_iommu_get_default(void)
>  
>  static void x86_iommu_realize(DeviceState *dev, Error **errp)
>  {
> +    X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
>      X86IOMMUClass *x86_class = X86_IOMMU_GET_CLASS(dev);
> +    QLIST_INIT(&x86_iommu->iec_notifiers);
>      if (x86_class->realize) {
>          x86_class->realize(dev, errp);
>      }
> diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
> index b419ae5..af80d15 100644
> --- a/include/hw/i386/x86-iommu.h
> +++ b/include/hw/i386/x86-iommu.h
> @@ -49,9 +49,28 @@ struct X86IOMMUClass {
>                       MSIMessage *dst, uint16_t sid);
>  };
>  
> +/**
> + * iec_notify_fn - IEC (Interrupt Entry Cache) notifier hook,
> + *                 triggered when IR invalidation happens.
> + * @private: private data
> + * @global: whether this is a global IEC invalidation
> + * @index: IRTE index to invalidate (start from)
> + * @mask: invalidation mask
> + */
> +typedef void (*iec_notify_fn)(void *private, bool global,
> +                              uint32_t index, uint32_t mask);
> +
> +struct IEC_Notifier {
> +    iec_notify_fn iec_notify;
> +    void *private;
> +    QLIST_ENTRY(IEC_Notifier) list;
> +};
> +typedef struct IEC_Notifier IEC_Notifier;
> +
>  struct X86IOMMUState {
>      SysBusDevice busdev;
>      bool intr_supported;        /* Whether vIOMMU supports IR */
> +    QLIST_HEAD(, IEC_Notifier) iec_notifiers; /* IEC notify list */
>  };
>  
>  /**
> @@ -60,4 +79,25 @@ struct X86IOMMUState {
>   */
>  X86IOMMUState *x86_iommu_get_default(void);
>  
> +/**
> + * x86_iommu_iec_register_notifier - register IEC (Interrupt Entry
> + *                                   Cache) notifiers
> + * @iommu: IOMMU device to register
> + * @fn: IEC notifier hook function
> + * @data: notifier private data
> + */
> +void x86_iommu_iec_register_notifier(X86IOMMUState *iommu,
> +                                     iec_notify_fn fn, void *data);
> +
> +/**
> + * x86_iommu_iec_notify_all - Notify IEC invalidations
> + * @iommu: IOMMU device that sends the notification
> + * @global: whether this is a global invalidation. If true, @index
> + *          and @mask are undefined.
> + * @index: starting index of interrupt entry to invalidate
> + * @mask: index mask for the invalidation
> + */
> +void x86_iommu_iec_notify_all(X86IOMMUState *iommu, bool global,
> +                              uint32_t index, uint32_t mask);
> +
>  #endif
> diff --git a/trace-events b/trace-events
> index 2982f64..20df932 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -2209,3 +2209,6 @@ gicv3_redist_send_sgi(uint32_t cpu, int irq) "GICv3 redistributor %x pending SGI
>  
>  # target-i386/kvm.c
>  kvm_x86_fixup_msi_error(uint32_t gsi) "VT-d failed to remap interrupt for GSI %" PRIu32
> +
> +# hw/i386/x86-iommu.c
> +x86_iommu_iec_notify(bool global, uint32_t index, uint32_t mask) "Notify IEC invalidation: global=%d index=%" PRIu32 " mask=%" PRIu32
> 


* Re: [Qemu-devel] [PATCH v10 24/26] kvm-irqchip: do explicit commit when update irq
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 24/26] kvm-irqchip: do explicit commit when update irq Peter Xu
  2016-06-22  3:42   ` [Qemu-devel] [PATCH v10.2 24/26] kvm-irqchip: introduce kvm_irqchip_update_msi_route_no_commit Peter Xu
@ 2016-07-04 14:23   ` Paolo Bonzini
  2016-07-05  7:35     ` Peter Xu
  1 sibling, 1 reply; 63+ messages in thread
From: Paolo Bonzini @ 2016-07-04 14:23 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: ehabkost, mst, jasowang, rkrcmar, alex.williamson, jan.kiszka,
	wexu, marcel, imammedo, davidkiarie4, rth



On 21/06/2016 09:47, Peter Xu wrote:
> Previously, we committed the GSI routes on every irqchip route
> update. This is inefficient when many routes are updated at the
> same time. This patch removes the commit from
> kvm_irqchip_update_msi_route(). Instead, we commit explicitly after all
> routes have been updated.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/i386/kvm/pci-assign.c | 2 ++
>  hw/misc/ivshmem.c        | 1 +
>  hw/vfio/pci.c            | 1 +
>  hw/virtio/virtio-pci.c   | 1 +
>  include/sysemu/kvm.h     | 2 +-
>  kvm-all.c                | 2 --
>  kvm-stub.c               | 4 ++++
>  target-i386/kvm.c        | 1 +
>  8 files changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
> index 62dec5f..a79557f 100644
> --- a/hw/i386/kvm/pci-assign.c
> +++ b/hw/i386/kvm/pci-assign.c
> @@ -1015,6 +1015,7 @@ static void assigned_dev_update_msi_msg(PCIDevice *pci_dev)
>  
>      kvm_irqchip_update_msi_route(kvm_state, assigned_dev->msi_virq[0],
>                                   msi_get_message(pci_dev, 0), pci_dev);
> +    kvm_irqchip_commit_routes(kvm_state);
>  }
>  
>  static bool assigned_dev_msix_masked(MSIXTableEntry *entry)
> @@ -1601,6 +1602,7 @@ static void assigned_dev_msix_mmio_write(void *opaque, hwaddr addr,
>                  if (ret) {
>                      error_report("Error updating irq routing entry (%d)", ret);
>                  }
> +                kvm_irqchip_commit_routes(kvm_state);
>              }
>          }
>      }
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index 8512523..241a70c 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -322,6 +322,7 @@ static int ivshmem_vector_unmask(PCIDevice *dev, unsigned vector,
>      if (ret < 0) {
>          return ret;
>      }
> +    kvm_irqchip_commit_routes(kvm_state);
>  
>      return kvm_irqchip_add_irqfd_notifier_gsi(kvm_state, n, NULL, v->virq);
>  }
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index cc4e60c..56b13f9 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -457,6 +457,7 @@ static void vfio_update_kvm_msi_virq(VFIOMSIVector *vector, MSIMessage msg,
>                                       PCIDevice *pdev)
>  {
>      kvm_irqchip_update_msi_route(kvm_state, vector->virq, msg, pdev);
> +    kvm_irqchip_commit_routes(kvm_state);
>  }
>  
>  static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 184570d..aad0f3d 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -870,6 +870,7 @@ static int virtio_pci_vq_vector_unmask(VirtIOPCIProxy *proxy,
>              if (ret < 0) {
>                  return ret;
>              }
> +            kvm_irqchip_commit_routes(kvm_state);
>          }
>      }
>  
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index 0a16e0e..c9c2436 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -371,7 +371,6 @@ int kvm_set_irq(KVMState *s, int irq, int level);
>  int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg);
>  
>  void kvm_irqchip_add_irq_route(KVMState *s, int gsi, int irqchip, int pin);
> -void kvm_irqchip_commit_routes(KVMState *s);
>  
>  void kvm_put_apic_state(DeviceState *d, struct kvm_lapic_state *kapic);
>  void kvm_get_apic_state(DeviceState *d, struct kvm_lapic_state *kapic);
> @@ -494,6 +493,7 @@ static inline void cpu_synchronize_post_init(CPUState *cpu)
>  int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev);
>  int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg,
>                                   PCIDevice *dev);
> +void kvm_irqchip_commit_routes(KVMState *s);
>  void kvm_irqchip_release_virq(KVMState *s, int virq);
>  
>  int kvm_irqchip_add_adapter_route(KVMState *s, AdapterInfo *adapter);
> diff --git a/kvm-all.c b/kvm-all.c
> index ca30a58..3764ba9 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -1094,8 +1094,6 @@ static int kvm_update_routing_entry(KVMState *s,
>  
>          *entry = *new_entry;
>  
> -        kvm_irqchip_commit_routes(s);
> -
>          return 0;
>      }
>  
> diff --git a/kvm-stub.c b/kvm-stub.c
> index 982e590..64e23f6 100644
> --- a/kvm-stub.c
> +++ b/kvm-stub.c
> @@ -135,6 +135,10 @@ int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg,
>      return -ENOSYS;
>  }
>  
> +void kvm_irqchip_commit_routes(KVMState *s)
> +{
> +}
> +
>  int kvm_irqchip_add_adapter_route(KVMState *s, AdapterInfo *adapter)
>  {
>      return -ENOSYS;
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index f02ba0a..0e26862 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -3379,6 +3379,7 @@ static void kvm_update_msi_routes_all(void *private, bool global,
>          kvm_irqchip_update_msi_route(kvm_state, entry->virq,
>                                       msg, entry->dev);
>      }
> +    kvm_irqchip_commit_routes(kvm_state);
>      trace_kvm_x86_update_msi_routes(cnt);
>  }
>  
> 

FWIW I prefer this to the "v10.2".
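
The difference is easiest to see in a toy model.  The following is a
standalone sketch, not QEMU code: SketchRouteTable and the sketch_*
functions are hypothetical stand-ins for the KVM routing table,
kvm_irqchip_update_msi_route() and kvm_irqchip_commit_routes().  The
update only stages a change, and the caller pays for one commit per
batch.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_ROUTES 8

/* Hypothetical stand-in for the in-memory routing table. */
typedef struct SketchRouteTable {
    unsigned data[SKETCH_MAX_ROUTES]; /* staged route payloads */
    unsigned commits;                 /* times the table was pushed */
} SketchRouteTable;

/* Stage a change only; returns 0 on success, -1 on a bad index. */
static int sketch_update_route(SketchRouteTable *t, size_t virq,
                               unsigned data)
{
    assert(t);
    if (virq >= SKETCH_MAX_ROUTES) {
        return -1;
    }
    t->data[virq] = data;
    return 0;
}

/* Stand-in for the expensive step of pushing the whole table out. */
static void sketch_commit_routes(SketchRouteTable *t)
{
    assert(t);
    t->commits++;
}

/* Update every route, then commit once for the whole batch. */
static unsigned sketch_update_all(SketchRouteTable *t, unsigned base)
{
    for (size_t virq = 0; virq < SKETCH_MAX_ROUTES; virq++) {
        sketch_update_route(t, virq, base + (unsigned)virq);
    }
    sketch_commit_routes(t);
    return t->commits;
}
```

The trade-off is that every caller of the update function now has to
remember the explicit commit, which is why the patch touches each call
site.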

Paolo


* Re: [Qemu-devel] [PATCH v10 16/26] intel_iommu: add support for split irqchip
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 16/26] intel_iommu: add support for split irqchip Peter Xu
  2016-06-25  8:08   ` Jan Kiszka
@ 2016-07-04 14:32   ` Paolo Bonzini
  1 sibling, 0 replies; 63+ messages in thread
From: Paolo Bonzini @ 2016-07-04 14:32 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: ehabkost, mst, jasowang, rkrcmar, alex.williamson, jan.kiszka,
	wexu, marcel, imammedo, davidkiarie4, rth



On 21/06/2016 09:47, Peter Xu wrote:
> @@ -3323,6 +3325,31 @@ int kvm_device_msix_deassign(KVMState *s, uint32_t dev_id)
>  int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
>                               uint64_t address, uint32_t data, PCIDevice *dev)
>  {
> +    X86IOMMUState *iommu = x86_iommu_get_default();
> +
> +    if (iommu) {
> +        int ret;
> +        MSIMessage src, dst;
> +        X86IOMMUClass *class = X86_IOMMU_GET_CLASS(iommu);
> +
> +        src.address = route->u.msi.address_hi;
> +        src.address <<= VTD_MSI_ADDR_HI_SHIFT;
> +        src.address |= route->u.msi.address_lo;
> +        src.data = route->u.msi.data;
> +
> +        ret = class->int_remap(iommu, &src, &dst, dev ? \
> +                               pci_requester_id(dev) : \
> +                               X86_IOMMU_SID_INVALID);
> +        if (ret) {
> +            trace_kvm_x86_fixup_msi_error(route->gsi);
> +            return 1;
> +        }
> +
> +        route->u.msi.address_hi = dst.address >> VTD_MSI_ADDR_HI_SHIFT;
> +        route->u.msi.address_lo = dst.address & VTD_MSI_ADDR_LO_MASK;
> +        route->u.msi.data = dst.data;
> +    }
> +
>      return 0;
>  }

I don't like this particularly.  Instead, I think the X86 IOMMU class
should implement a new interface "MSIRemapper", and PCIBus should have a
pointer to MSIRemapper*.  Then this can become:

    if (dev) {
        PCIBus *bus = PCI_BUS(qdev_get_parent_bus(DEVICE(dev)));
        if (bus->remapper) {
            msi_remapper_fixup_route(bus->remapper, &src, &dst,
                                     pci_requester_id(dev));
        }
    }

That said, I'm okay with the patch as is because the issue is not with
the x86 implementation but with kvm_arch_fixup_msi_route.  S390 should
be able to do the same, by implementing MSIRemapper in S390pciState (I
think).
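
A rough standalone sketch of what such an interface could look like
follows.  Nothing here exists in QEMU as written: MSIRemapper,
msi_remapper_fixup_route() and the remapper pointer are the names
proposed above, while SketchPCIBus, ToyRemapper and the MSIMessage
stand-in are invented so the example compiles by itself.  A real
version would presumably be a QOM interface rather than a plain struct
of function pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for QEMU's MSIMessage. */
typedef struct MSIMessage {
    uint64_t address;
    uint32_t data;
} MSIMessage;

/* Proposed interface: anything that can translate an MSI message. */
typedef struct MSIRemapper MSIRemapper;
struct MSIRemapper {
    int (*fixup_route)(MSIRemapper *r, const MSIMessage *src,
                       MSIMessage *dst, uint16_t sid);
};

/* Hypothetical bus type carrying a pointer to its remapper, if any. */
typedef struct SketchPCIBus {
    MSIRemapper *remapper;
} SketchPCIBus;

static int msi_remapper_fixup_route(MSIRemapper *r, const MSIMessage *src,
                                    MSIMessage *dst, uint16_t sid)
{
    assert(r && r->fixup_route);
    return r->fixup_route(r, src, dst, sid);
}

/* Toy implementation: keeps the address, rewrites the data field. */
typedef struct ToyRemapper {
    MSIRemapper parent;     /* must stay the first member */
    uint32_t vector;
} ToyRemapper;

static int toy_fixup(MSIRemapper *r, const MSIMessage *src,
                     MSIMessage *dst, uint16_t sid)
{
    ToyRemapper *t = (ToyRemapper *)r;

    (void)sid;
    dst->address = src->address;
    dst->data = t->vector;
    return 0;
}

/* Arch-neutral fixup: only buses that have a remapper are affected. */
static int sketch_fixup_msi_route(SketchPCIBus *bus, MSIMessage *msg,
                                  uint16_t sid)
{
    MSIMessage dst;

    if (!bus || !bus->remapper) {
        return 0;           /* nothing to remap, leave the route alone */
    }
    if (msi_remapper_fixup_route(bus->remapper, msg, &dst, sid)) {
        return 1;
    }
    *msg = dst;
    return 0;
}
```

The point of the indirection is that the fixup path no longer needs to
know which IOMMU, or which architecture, sits behind the bus.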

Paolo


* Re: [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
                   ` (25 preceding siblings ...)
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 26/26] kvm-all: add trace events for kvm irqchip ops Peter Xu
@ 2016-07-04 14:33 ` Paolo Bonzini
  2016-07-04 16:39 ` Michael S. Tsirkin
  27 siblings, 0 replies; 63+ messages in thread
From: Paolo Bonzini @ 2016-07-04 14:33 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: ehabkost, mst, jasowang, rkrcmar, alex.williamson, jan.kiszka,
	wexu, marcel, imammedo, davidkiarie4, rth



On 21/06/2016 09:47, Peter Xu wrote:
> This is v10 of Intel IOMMU IR support, based on patches:
> 
> - [PATCH v2 0/3] enable iommu with -device
>   https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg00554.html
> 
> V9 introduced one bug when split irqchip is used with multiple
> vCPUs.  V10 mainly fixes this issue, with several other trivial
> enhancements.
> 
> Online branch:
> 
>   https://github.com/xzpeter/qemu vtd-intr-v10
> 
> Please review.  Thanks.
> 
> v10 changes:
> - Fix issue when specify more than 1 vcpus.  This is introduced in v9
>   after rebased to Marcel's patches.  The problem is that, before
>   Marcel's patch, we will first create IOMMU then IOAPIC, while the
>   order is switched after Marcel's changes.  This affects patch 18
>   ("register IOMMU IEC notifier for ioapic") and I need to do the
>   registration after IOAPIC realization.
> - Display readable error message if user specify more than one x86
>   vIOMMU, rather than an assertion fail. (patch 2)
> - Correct vtd iec notifier "global" parameter: if granularity bit is
>   clear (not set), then it's a global invalidation (patch 17,
>   inverted meaning for granularity).
> - added one more patch (patch 26) to add some trace events for irqchip
>   msi routes operations.
> - rebase to latest master

For patches 16, 21-24 and 26-27:

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo


* Re: [Qemu-devel] [PATCH v10 05/26] acpi: enable INTR for DMAR report structure
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 05/26] acpi: enable INTR for DMAR report structure Peter Xu
@ 2016-07-04 15:14   ` Michael S. Tsirkin
  2016-07-05  6:39     ` Peter Xu
  0 siblings, 1 reply; 63+ messages in thread
From: Michael S. Tsirkin @ 2016-07-04 15:14 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, imammedo, rth, ehabkost, jasowang, marcel, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4

On Tue, Jun 21, 2016 at 03:47:33PM +0800, Peter Xu wrote:
> In ACPI DMA remapping report structure, enable INTR flag when specified.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/i386/acpi-build.c          | 11 ++++++++++-
>  include/hw/i386/intel_iommu.h |  2 ++
>  2 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 161f089..961ccd6a 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -57,6 +57,7 @@
>  
>  #include "qapi/qmp/qint.h"
>  #include "qom/qom-qobject.h"
> +#include "hw/i386/x86-iommu.h"
>  
>  /* These are used to size the ACPI tables for -M pc-i440fx-1.7 and
>   * -M pc-i440fx-2.0.  Even if the actual amount of AML generated grows
> @@ -2422,10 +2423,18 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
>  
>      AcpiTableDmar *dmar;
>      AcpiDmarHardwareUnit *drhd;
> +    uint8_t dmar_flags = 0;
> +    X86IOMMUState *iommu = x86_iommu_get_default();
> +
> +    assert(iommu);
> +    if (iommu->intr_supported) {
> +        /* enable INTR for the IOMMU device */
> +        dmar_flags |= DMAR_REPORT_F_INTR;

Please rewrite this: drop the DMAR_REPORT_F_INTR macro and replace it
with a literal plus a comment documenting the earliest spec version
that has the flag and the exact text to look for in the spec.
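
As a standalone sketch of the requested style (sketch_dmar_flags() is a
hypothetical helper, not the function in acpi-build.c): the value comes
from the quoted macro, and the spec revision is deliberately left
unfilled here because it has to be checked against the spec itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper computing the DMAR table "Flags" byte. */
static uint8_t sketch_dmar_flags(bool intr_supported)
{
    uint8_t flags = 0;

    if (intr_supported) {
        /*
         * DMAR Flags, bit 0: INTR_REMAP (platform supports interrupt
         * remapping).  The earliest VT-d spec revision defining this
         * bit and the exact wording are left for the patch author to
         * fill in.
         */
        flags |= 0x1;
    }
    return flags;
}
```

The literal keeps the value next to the comment that justifies it, so a
reader can search the spec for the field name without chasing a macro
through a header.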


> +    }
>  
>      dmar = acpi_data_push(table_data, sizeof(*dmar));
>      dmar->host_address_width = VTD_HOST_ADDRESS_WIDTH - 1;
> -    dmar->flags = 0;    /* No intr_remap for now */
> +    dmar->flags = dmar_flags;
>  
>      /* DMAR Remapping Hardware Unit Definition structure */
>      drhd = acpi_data_push(table_data, sizeof(*drhd));
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index e36b896..638d77f 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -44,6 +44,8 @@
>  #define VTD_HOST_ADDRESS_WIDTH      39
>  #define VTD_HAW_MASK                ((1ULL << VTD_HOST_ADDRESS_WIDTH) - 1)
>  
> +#define DMAR_REPORT_F_INTR          (1)
> +
>  typedef struct VTDContextEntry VTDContextEntry;
>  typedef struct VTDContextCacheEntry VTDContextCacheEntry;
>  typedef struct IntelIOMMUState IntelIOMMUState;
> -- 
> 2.4.11


* Re: [Qemu-devel] [PATCH v10 02/26] x86-iommu: provide x86_iommu_get_default
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 02/26] x86-iommu: provide x86_iommu_get_default Peter Xu
@ 2016-07-04 15:16   ` Michael S. Tsirkin
  2016-07-05  5:11     ` Peter Xu
  2016-07-04 15:17   ` Michael S. Tsirkin
  1 sibling, 1 reply; 63+ messages in thread
From: Michael S. Tsirkin @ 2016-07-04 15:16 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, imammedo, rth, ehabkost, jasowang, marcel, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4

On Tue, Jun 21, 2016 at 03:47:30PM +0800, Peter Xu wrote:
> Instead of searching the device tree every time, one static variable is
> declared for the default system x86 IOMMU device.  Also, some VT-d
> macros are replaced by x86 ones.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

I think it's cleaner to just use object_resolve_path_type
with the X86 type. Error handling by exit is rather ugly, too:
if we need a singleton type, let's add one and have
generic code detect such errors.
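
For illustration, a standalone sketch of the lookup semantics this
relies on.  It is not QOM code: SketchObject and
sketch_resolve_by_type() are hypothetical, and the real
object_resolve_path_type() matches on the type hierarchy rather than
comparing strings.  What the sketch keeps is the contract visible in
the code the patch removes: a single match is returned, and more than
one match yields NULL with the ambiguous flag set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object with nothing but a type name. */
typedef struct SketchObject {
    const char *type;
} SketchObject;

/*
 * Return the only object of the given type.  If none exists, return
 * NULL.  If several exist, return NULL and set *ambiguous, so the
 * caller can report the error instead of exiting from device code.
 */
static SketchObject *sketch_resolve_by_type(SketchObject *objs, size_t n,
                                            const char *type,
                                            bool *ambiguous)
{
    SketchObject *found = NULL;

    assert(ambiguous);
    *ambiguous = false;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(objs[i].type, type) != 0) {
            continue;
        }
        if (found) {
            *ambiguous = true;
            return NULL;
        }
        found = &objs[i];
    }
    return found;
}
```

With a lookup like this the "more than one vIOMMU" case is detected by
the caller at the point of use, and no static default pointer has to be
kept in sync.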

> ---
>  hw/i386/acpi-build.c          |  9 ++-------
>  hw/i386/intel_iommu.c         |  9 ++++++---
>  hw/i386/x86-iommu.c           | 23 +++++++++++++++++++++++
>  include/hw/i386/intel_iommu.h |  1 -
>  include/hw/i386/x86-iommu.h   |  9 +++++++++
>  5 files changed, 40 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 8ca2032..161f089 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -50,7 +50,7 @@
>  #include "hw/i386/ich9.h"
>  #include "hw/pci/pci_bus.h"
>  #include "hw/pci-host/q35.h"
> -#include "hw/i386/intel_iommu.h"
> +#include "hw/i386/x86-iommu.h"
>  #include "hw/timer/hpet.h"
>  
>  #include "hw/acpi/aml-build.h"
> @@ -2500,12 +2500,7 @@ static bool acpi_get_mcfg(AcpiMcfgInfo *mcfg)
>  
>  static bool acpi_has_iommu(void)
>  {
> -    bool ambiguous;
> -    Object *intel_iommu;
> -
> -    intel_iommu = object_resolve_path_type("", TYPE_INTEL_IOMMU_DEVICE,
> -                                           &ambiguous);
> -    return intel_iommu && !ambiguous;
> +    return !!x86_iommu_get_default();
>  }
>  
>  static
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 2734f6b..1936c41 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -26,6 +26,8 @@
>  #include "hw/pci/pci.h"
>  #include "hw/pci/pci_bus.h"
>  #include "hw/i386/pc.h"
> +#include "hw/boards.h"
> +#include "hw/i386/x86-iommu.h"
>  
>  /*#define DEBUG_INTEL_IOMMU*/
>  #ifdef DEBUG_INTEL_IOMMU
> @@ -192,7 +194,7 @@ static void vtd_reset_context_cache(IntelIOMMUState *s)
>  
>      VTD_DPRINTF(CACHE, "global context_cache_gen=1");
>      while (g_hash_table_iter_next (&bus_it, NULL, (void**)&vtd_bus)) {
> -        for (devfn_it = 0; devfn_it < VTD_PCI_DEVFN_MAX; ++devfn_it) {
> +        for (devfn_it = 0; devfn_it < X86_IOMMU_PCI_DEVFN_MAX; ++devfn_it) {
>              vtd_as = vtd_bus->dev_as[devfn_it];
>              if (!vtd_as) {
>                  continue;
> @@ -964,7 +966,7 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
>      vtd_bus = vtd_find_as_from_bus_num(s, VTD_SID_TO_BUS(source_id));
>      if (vtd_bus) {
>          devfn = VTD_SID_TO_DEVFN(source_id);
> -        for (devfn_it = 0; devfn_it < VTD_PCI_DEVFN_MAX; ++devfn_it) {
> +        for (devfn_it = 0; devfn_it < X86_IOMMU_PCI_DEVFN_MAX; ++devfn_it) {
>              vtd_as = vtd_bus->dev_as[devfn_it];
>              if (vtd_as && ((devfn_it & mask) == (devfn & mask))) {
>                  VTD_DPRINTF(INV, "invalidate context-cahce of devfn 0x%"PRIx16,
> @@ -1906,7 +1908,8 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
>  
>      if (!vtd_bus) {
>          /* No corresponding free() */
> -        vtd_bus = g_malloc0(sizeof(VTDBus) + sizeof(VTDAddressSpace *) * VTD_PCI_DEVFN_MAX);
> +        vtd_bus = g_malloc0(sizeof(VTDBus) + sizeof(VTDAddressSpace *) * \
> +                            X86_IOMMU_PCI_DEVFN_MAX);
>          vtd_bus->bus = bus;
>          key = (uintptr_t)bus;
>          g_hash_table_insert(s->vtd_as_by_busptr, &key, vtd_bus);
> diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
> index d739afb..f395139 100644
> --- a/hw/i386/x86-iommu.c
> +++ b/hw/i386/x86-iommu.c
> @@ -21,6 +21,28 @@
>  #include "hw/sysbus.h"
>  #include "hw/boards.h"
>  #include "hw/i386/x86-iommu.h"
> +#include "qemu/error-report.h"
> +
> +/* Default X86 IOMMU device */
> +static X86IOMMUState *x86_iommu_default = NULL;
> +
> +static void x86_iommu_set_default(X86IOMMUState *x86_iommu)
> +{
> +    assert(x86_iommu);
> +
> +    if (x86_iommu_default) {
> +        error_report("QEMU does not support multiple vIOMMUs "
> +                     "for x86 yet.");
> +        exit(1);
> +    }
> +
> +    x86_iommu_default = x86_iommu;
> +}
> +
> +X86IOMMUState *x86_iommu_get_default(void)
> +{
> +    return x86_iommu_default;
> +}
>  
>  static void x86_iommu_realize(DeviceState *dev, Error **errp)
>  {
> @@ -28,6 +50,7 @@ static void x86_iommu_realize(DeviceState *dev, Error **errp)
>      if (x86_class->realize) {
>          x86_class->realize(dev, errp);
>      }
> +    x86_iommu_set_default(X86_IOMMU_DEVICE(dev));
>  }
>  
>  static void x86_iommu_class_init(ObjectClass *klass, void *data)
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index 680a0c4..0794309 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -35,7 +35,6 @@
>  #define VTD_PCI_BUS_MAX             256
>  #define VTD_PCI_SLOT_MAX            32
>  #define VTD_PCI_FUNC_MAX            8
> -#define VTD_PCI_DEVFN_MAX           256
>  #define VTD_PCI_SLOT(devfn)         (((devfn) >> 3) & 0x1f)
>  #define VTD_PCI_FUNC(devfn)         ((devfn) & 0x07)
>  #define VTD_SID_TO_BUS(sid)         (((sid) >> 8) & 0xff)
> diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
> index 924f39a..d6991cb 100644
> --- a/include/hw/i386/x86-iommu.h
> +++ b/include/hw/i386/x86-iommu.h
> @@ -30,6 +30,9 @@
>  #define  X86_IOMMU_GET_CLASS(obj) \
>      OBJECT_GET_CLASS(X86IOMMUClass, obj, TYPE_X86_IOMMU_DEVICE)
>  
> +#define X86_IOMMU_PCI_DEVFN_MAX           256
> +#define X86_IOMMU_SID_INVALID             (0xffff)
> +
>  typedef struct X86IOMMUState X86IOMMUState;
>  typedef struct X86IOMMUClass X86IOMMUClass;
>  
> @@ -43,4 +46,10 @@ struct X86IOMMUState {
>      SysBusDevice busdev;
>  };
>  
> +/**
> + * x86_iommu_get_default - get default IOMMU device
> + * @return: pointer to default IOMMU device
> + */
> +X86IOMMUState *x86_iommu_get_default(void);
> +
>  #endif
> -- 
> 2.4.11


* Re: [Qemu-devel] [PATCH v10 03/26] x86-iommu: q35: generalize find_add_as()
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 03/26] x86-iommu: q35: generalize find_add_as() Peter Xu
@ 2016-07-04 15:16   ` Michael S. Tsirkin
  2016-07-04 16:08     ` Paolo Bonzini
  0 siblings, 1 reply; 63+ messages in thread
From: Michael S. Tsirkin @ 2016-07-04 15:16 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, imammedo, rth, ehabkost, jasowang, marcel, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4

On Tue, Jun 21, 2016 at 03:47:31PM +0800, Peter Xu wrote:
> Remove VT-d calls in common q35 codes. Instead, we provide a general
> find_add_as() for x86-iommu type.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

I think it would be cleaner in the device,
not in the type. In theory you could then mix
multiple different iommu types in the same machine.

> ---
>  hw/i386/intel_iommu.c         | 17 +++++++++--------
>  include/hw/i386/intel_iommu.h |  5 -----
>  include/hw/i386/x86-iommu.h   |  3 +++
>  3 files changed, 12 insertions(+), 13 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 1936c41..b487224 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -1900,8 +1900,10 @@ static Property vtd_properties[] = {
>  };
>  
>  
> -VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
> +static AddressSpace *vtd_find_add_as(X86IOMMUState *x86_iommu, PCIBus *bus,
> +                                     int devfn)
>  {
> +    IntelIOMMUState *s = (IntelIOMMUState *)x86_iommu;
>      uintptr_t key = (uintptr_t)bus;
>      VTDBus *vtd_bus = g_hash_table_lookup(s->vtd_as_by_busptr, &key);
>      VTDAddressSpace *vtd_dev_as;
> @@ -1929,7 +1931,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
>          address_space_init(&vtd_dev_as->as,
>                             &vtd_dev_as->iommu, "intel_iommu");
>      }
> -    return vtd_dev_as;
> +    return &vtd_dev_as->as;
>  }
>  
>  /* Do the initialization. It will also be called when reset, so pay
> @@ -2021,13 +2023,11 @@ static void vtd_reset(DeviceState *dev)
>  
>  static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>  {
> -    IntelIOMMUState *s = opaque;
> -    VTDAddressSpace *vtd_as;
> -
> -    assert(0 <= devfn && devfn <= VTD_PCI_DEVFN_MAX);
> +    X86IOMMUState *x86_iommu = opaque;
> +    X86IOMMUClass *x86_class = X86_IOMMU_GET_CLASS(x86_iommu);
>  
> -    vtd_as = vtd_find_add_as(s, bus, devfn);
> -    return &vtd_as->as;
> +    assert(0 <= devfn && devfn <= X86_IOMMU_PCI_DEVFN_MAX);
> +    return x86_class->find_add_as(x86_iommu, bus, devfn);
>  }
>  
>  static void vtd_realize(DeviceState *dev, Error **errp)
> @@ -2060,6 +2060,7 @@ static void vtd_class_init(ObjectClass *klass, void *data)
>      dc->vmsd = &vtd_vmstate;
>      dc->props = vtd_properties;
>      x86_class->realize = vtd_realize;
> +    x86_class->find_add_as = vtd_find_add_as;
>  }
>  
>  static const TypeInfo vtd_info = {
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index 0794309..e36b896 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -125,9 +125,4 @@ struct IntelIOMMUState {
>      VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */
>  };
>  
> -/* Find the VTD Address space associated with the given bus pointer,
> - * create a new one if none exists
> - */
> -VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn);
> -
>  #endif
> diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
> index d6991cb..2070cd1 100644
> --- a/include/hw/i386/x86-iommu.h
> +++ b/include/hw/i386/x86-iommu.h
> @@ -21,6 +21,7 @@
>  #define IOMMU_COMMON_H
>  
>  #include "hw/sysbus.h"
> +#include "exec/memory.h"
>  
>  #define  TYPE_X86_IOMMU_DEVICE  ("x86-iommu")
>  #define  X86_IOMMU_DEVICE(obj) \
> @@ -40,6 +41,8 @@ struct X86IOMMUClass {
>      SysBusDeviceClass parent;
>      /* Intel/AMD specific realize() hook */
>      DeviceRealize realize;
> +    /* Find/Add IOMMU address space for specific PCI device */
> +    AddressSpace *(*find_add_as)(X86IOMMUState *s, PCIBus *bus, int devfn);
>  };
>  
>  struct X86IOMMUState {
> -- 
> 2.4.11


* Re: [Qemu-devel] [PATCH v10 02/26] x86-iommu: provide x86_iommu_get_default
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 02/26] x86-iommu: provide x86_iommu_get_default Peter Xu
  2016-07-04 15:16   ` Michael S. Tsirkin
@ 2016-07-04 15:17   ` Michael S. Tsirkin
  2016-07-05  5:12     ` Peter Xu
  1 sibling, 1 reply; 63+ messages in thread
From: Michael S. Tsirkin @ 2016-07-04 15:17 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, imammedo, rth, ehabkost, jasowang, marcel, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4

On Tue, Jun 21, 2016 at 03:47:30PM +0800, Peter Xu wrote:
> Instead of searching the device tree every time, one static variable is
> declared for the default system x86 IOMMU device.  Also, some VT-d
> macros are replaced by x86 ones.

In the future, please don't mix unrelated changes in the same patch like this.

> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/i386/acpi-build.c          |  9 ++-------
>  hw/i386/intel_iommu.c         |  9 ++++++---
>  hw/i386/x86-iommu.c           | 23 +++++++++++++++++++++++
>  include/hw/i386/intel_iommu.h |  1 -
>  include/hw/i386/x86-iommu.h   |  9 +++++++++
>  5 files changed, 40 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 8ca2032..161f089 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -50,7 +50,7 @@
>  #include "hw/i386/ich9.h"
>  #include "hw/pci/pci_bus.h"
>  #include "hw/pci-host/q35.h"
> -#include "hw/i386/intel_iommu.h"
> +#include "hw/i386/x86-iommu.h"
>  #include "hw/timer/hpet.h"
>  
>  #include "hw/acpi/aml-build.h"
> @@ -2500,12 +2500,7 @@ static bool acpi_get_mcfg(AcpiMcfgInfo *mcfg)
>  
>  static bool acpi_has_iommu(void)
>  {
> -    bool ambiguous;
> -    Object *intel_iommu;
> -
> -    intel_iommu = object_resolve_path_type("", TYPE_INTEL_IOMMU_DEVICE,
> -                                           &ambiguous);
> -    return intel_iommu && !ambiguous;
> +    return !!x86_iommu_get_default();
>  }
>  
>  static
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 2734f6b..1936c41 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -26,6 +26,8 @@
>  #include "hw/pci/pci.h"
>  #include "hw/pci/pci_bus.h"
>  #include "hw/i386/pc.h"
> +#include "hw/boards.h"
> +#include "hw/i386/x86-iommu.h"
>  
>  /*#define DEBUG_INTEL_IOMMU*/
>  #ifdef DEBUG_INTEL_IOMMU
> @@ -192,7 +194,7 @@ static void vtd_reset_context_cache(IntelIOMMUState *s)
>  
>      VTD_DPRINTF(CACHE, "global context_cache_gen=1");
>      while (g_hash_table_iter_next (&bus_it, NULL, (void**)&vtd_bus)) {
> -        for (devfn_it = 0; devfn_it < VTD_PCI_DEVFN_MAX; ++devfn_it) {
> +        for (devfn_it = 0; devfn_it < X86_IOMMU_PCI_DEVFN_MAX; ++devfn_it) {
>              vtd_as = vtd_bus->dev_as[devfn_it];
>              if (!vtd_as) {
>                  continue;
> @@ -964,7 +966,7 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
>      vtd_bus = vtd_find_as_from_bus_num(s, VTD_SID_TO_BUS(source_id));
>      if (vtd_bus) {
>          devfn = VTD_SID_TO_DEVFN(source_id);
> -        for (devfn_it = 0; devfn_it < VTD_PCI_DEVFN_MAX; ++devfn_it) {
> +        for (devfn_it = 0; devfn_it < X86_IOMMU_PCI_DEVFN_MAX; ++devfn_it) {
>              vtd_as = vtd_bus->dev_as[devfn_it];
>              if (vtd_as && ((devfn_it & mask) == (devfn & mask))) {
>                  VTD_DPRINTF(INV, "invalidate context-cahce of devfn 0x%"PRIx16,
> @@ -1906,7 +1908,8 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
>  
>      if (!vtd_bus) {
>          /* No corresponding free() */
> -        vtd_bus = g_malloc0(sizeof(VTDBus) + sizeof(VTDAddressSpace *) * VTD_PCI_DEVFN_MAX);
> +        vtd_bus = g_malloc0(sizeof(VTDBus) + sizeof(VTDAddressSpace *) * \
> +                            X86_IOMMU_PCI_DEVFN_MAX);
>          vtd_bus->bus = bus;
>          key = (uintptr_t)bus;
>          g_hash_table_insert(s->vtd_as_by_busptr, &key, vtd_bus);
> diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
> index d739afb..f395139 100644
> --- a/hw/i386/x86-iommu.c
> +++ b/hw/i386/x86-iommu.c
> @@ -21,6 +21,28 @@
>  #include "hw/sysbus.h"
>  #include "hw/boards.h"
>  #include "hw/i386/x86-iommu.h"
> +#include "qemu/error-report.h"
> +
> +/* Default X86 IOMMU device */
> +static X86IOMMUState *x86_iommu_default = NULL;
> +
> +static void x86_iommu_set_default(X86IOMMUState *x86_iommu)
> +{
> +    assert(x86_iommu);
> +
> +    if (x86_iommu_default) {
> +        error_report("QEMU does not support multiple vIOMMUs "
> +                     "for x86 yet.");
> +        exit(1);
> +    }
> +
> +    x86_iommu_default = x86_iommu;
> +}
> +
> +X86IOMMUState *x86_iommu_get_default(void)
> +{
> +    return x86_iommu_default;
> +}
>  
>  static void x86_iommu_realize(DeviceState *dev, Error **errp)
>  {
> @@ -28,6 +50,7 @@ static void x86_iommu_realize(DeviceState *dev, Error **errp)
>      if (x86_class->realize) {
>          x86_class->realize(dev, errp);
>      }
> +    x86_iommu_set_default(X86_IOMMU_DEVICE(dev));
>  }
>  
>  static void x86_iommu_class_init(ObjectClass *klass, void *data)
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index 680a0c4..0794309 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -35,7 +35,6 @@
>  #define VTD_PCI_BUS_MAX             256
>  #define VTD_PCI_SLOT_MAX            32
>  #define VTD_PCI_FUNC_MAX            8
> -#define VTD_PCI_DEVFN_MAX           256
>  #define VTD_PCI_SLOT(devfn)         (((devfn) >> 3) & 0x1f)
>  #define VTD_PCI_FUNC(devfn)         ((devfn) & 0x07)
>  #define VTD_SID_TO_BUS(sid)         (((sid) >> 8) & 0xff)
> diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
> index 924f39a..d6991cb 100644
> --- a/include/hw/i386/x86-iommu.h
> +++ b/include/hw/i386/x86-iommu.h
> @@ -30,6 +30,9 @@
>  #define  X86_IOMMU_GET_CLASS(obj) \
>      OBJECT_GET_CLASS(X86IOMMUClass, obj, TYPE_X86_IOMMU_DEVICE)
>  
> +#define X86_IOMMU_PCI_DEVFN_MAX           256
> +#define X86_IOMMU_SID_INVALID             (0xffff)
> +
>  typedef struct X86IOMMUState X86IOMMUState;
>  typedef struct X86IOMMUClass X86IOMMUClass;
>  
> @@ -43,4 +46,10 @@ struct X86IOMMUState {
>      SysBusDevice busdev;
>  };
>  
> +/**
> + * x86_iommu_get_default - get default IOMMU device
> + * @return: pointer to default IOMMU device
> + */
> +X86IOMMUState *x86_iommu_get_default(void);
> +
>  #endif
> -- 
> 2.4.11
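
The default-device bookkeeping in the quoted patch can be sketched in isolation as follows. This is a simplified illustration, not QEMU code: IOMMUState and the iommu_* names are hypothetical stand-ins, and the setter returns false on a second registration instead of calling error_report() and exit(1) as the patch does, purely so the behaviour is easy to check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for X86IOMMUState. */
typedef struct IOMMUState {
    int id;
} IOMMUState;

/* The single default device, NULL until one is registered. */
static IOMMUState *iommu_default;

/*
 * Register the default device.  Returns false if one is already set.
 * (Assumption for this sketch only: the quoted patch reports an error
 * and exits here instead of returning.)
 */
static bool iommu_set_default(IOMMUState *s)
{
    assert(s);
    if (iommu_default) {
        return false;
    }
    iommu_default = s;
    return true;
}

/* Constant-time lookup, replacing a search by type at every call. */
static IOMMUState *iommu_get_default(void)
{
    return iommu_default;
}

/* Mirrors the shape of acpi_has_iommu() after the patch. */
static bool has_iommu(void)
{
    return iommu_get_default() != NULL;
}
```

The first registration wins and every later lookup is a plain pointer read; a second registration is refused rather than silently replacing the first.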


* Re: [Qemu-devel] [PATCH v10 08/26] acpi: add DMAR scope definition for root IOAPIC
  2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 08/26] acpi: add DMAR scope definition for root IOAPIC Peter Xu
@ 2016-07-04 15:22   ` Michael S. Tsirkin
  2016-07-05  7:30     ` Peter Xu
  0 siblings, 1 reply; 63+ messages in thread
From: Michael S. Tsirkin @ 2016-07-04 15:22 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, imammedo, rth, ehabkost, jasowang, marcel, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4

On Tue, Jun 21, 2016 at 03:47:36PM +0800, Peter Xu wrote:
> To enable interrupt remapping for intel IOMMU device, each IOAPIC device
> in the system reported via ACPI MADT must be explicitly enumerated under
> one specific remapping hardware unit. This patch adds the root-complex
> IOAPIC into the default DMAR device.
> 
> Please refer to VT-d spec 8.3.1.1 for more information.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/i386/acpi-build.c        | 19 ++++++++++++++++---
>  include/hw/acpi/acpi-defs.h | 15 +++++++++++++++
>  include/hw/pci-host/q35.h   |  8 ++++++++
>  3 files changed, 39 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 961ccd6a..eec022e 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -77,6 +77,9 @@
>  #define ACPI_BUILD_DPRINTF(fmt, ...)
>  #endif
>  
> +/* Default IOAPIC ID */
> +#define ACPI_BUILD_IOAPIC_ID 0x0
> +
>  typedef struct AcpiMcfgInfo {
>      uint64_t mcfg_base;
>      uint32_t mcfg_size;
> @@ -370,7 +373,6 @@ build_madt(GArray *table_data, BIOSLinker *linker, PCMachineState *pcms)
>      io_apic = acpi_data_push(table_data, sizeof *io_apic);
>      io_apic->type = ACPI_APIC_IO;
>      io_apic->length = sizeof(*io_apic);
> -#define ACPI_BUILD_IOAPIC_ID 0x0
>      io_apic->io_apic_id = ACPI_BUILD_IOAPIC_ID;
>      io_apic->address = cpu_to_le32(IO_APIC_DEFAULT_ADDRESS);
>      io_apic->interrupt = cpu_to_le32(0);
> @@ -2425,6 +2427,9 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
>      AcpiDmarHardwareUnit *drhd;
>      uint8_t dmar_flags = 0;
>      X86IOMMUState *iommu = x86_iommu_get_default();
> +    AcpiDmarDeviceScope *scope = NULL;
> +    /* Root complex IOAPIC use one path[0] only */
> +    uint8_t ioapic_scope_size = sizeof(*scope) + sizeof(scope->path[0]);

Just use int, unsigned, or size_t for types like this.

>  
>      assert(iommu);
>      if (iommu->intr_supported) {
> @@ -2437,13 +2442,21 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
>      dmar->flags = dmar_flags;
>  
>      /* DMAR Remapping Hardware Unit Definition structure */
> -    drhd = acpi_data_push(table_data, sizeof(*drhd));
> +    drhd = acpi_data_push(table_data, sizeof(*drhd) + ioapic_scope_size);
>      drhd->type = cpu_to_le16(ACPI_DMAR_TYPE_HARDWARE_UNIT);
> -    drhd->length = cpu_to_le16(sizeof(*drhd));   /* No device scope now */
> +    drhd->length = cpu_to_le16(sizeof(*drhd) + ioapic_scope_size);
>      drhd->flags = ACPI_DMAR_INCLUDE_PCI_ALL;
>      drhd->pci_segment = cpu_to_le16(0);
>      drhd->address = cpu_to_le64(Q35_HOST_BRIDGE_IOMMU_ADDR);
>  
> +    /* Scope definition for the root-complex IOAPIC */
> +    scope = &drhd->scope[0];
> +    scope->entry_type = ACPI_DMAR_DEV_SCOPE_TYPE_IOAPIC;
> +    scope->length = ioapic_scope_size;
> +    scope->enumeration_id = ACPI_BUILD_IOAPIC_ID;
> +    scope->bus = Q35_PSEUDO_BUS_PLATFORM;
> +    scope->path[0] = cpu_to_le16(Q35_PSEUDO_DEVFN_IOAPIC);
> +
>      build_header(linker, table_data, (void *)(table_data->data + dmar_start),
>                   "DMAR", table_data->len - dmar_start, 1, NULL, NULL);
>  }
> diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
> index ea9be0b..0dbdde3 100644
> --- a/include/hw/acpi/acpi-defs.h
> +++ b/include/hw/acpi/acpi-defs.h
> @@ -571,6 +571,20 @@ enum {
>  /*
>   * Sub-structures for DMAR
>   */
> +
> +#define ACPI_DMAR_DEV_SCOPE_TYPE_IOAPIC     (0x03)

Again, please use a literal with a comment in the code.

> +
> +/* Device scope structure for DRHD. */
> +struct AcpiDmarDeviceScope {
> +    uint8_t entry_type;
> +    uint8_t length;
> +    uint16_t reserved;
> +    uint8_t enumeration_id;
> +    uint8_t bus;
> +    uint16_t path[0];           /* list of dev:func pairs */
> +} QEMU_PACKED;
> +typedef struct AcpiDmarDeviceScope AcpiDmarDeviceScope;
> +
>  /* Type 0: Hardware Unit Definition */
>  struct AcpiDmarHardwareUnit {
>      uint16_t type;
> @@ -579,6 +593,7 @@ struct AcpiDmarHardwareUnit {
>      uint8_t reserved;
>      uint16_t pci_segment;   /* The PCI Segment associated with this unit */
>      uint64_t address;   /* Base address of remapping hardware register-set */
> +    AcpiDmarDeviceScope scope[0];
>  } QEMU_PACKED;
>  typedef struct AcpiDmarHardwareUnit AcpiDmarHardwareUnit;
>  




> diff --git a/include/hw/pci-host/q35.h b/include/hw/pci-host/q35.h
> index c5c073d..312b47f 100644
> --- a/include/hw/pci-host/q35.h
> +++ b/include/hw/pci-host/q35.h
> @@ -175,4 +175,12 @@ typedef struct Q35PCIHost {
>  
>  uint64_t mch_mcfg_base(void);
>  
> +/*
> + * Arbitary but unique BNF number for IOAPIC device.
> + *
> + * TODO: make sure there would have no conflict with real PCI bus

How are you going to do this?

> + */
> +#define Q35_PSEUDO_BUS_PLATFORM         (0xff)
> +#define Q35_PSEUDO_DEVFN_IOAPIC         (0x00)
> +
>  #endif /* HW_Q35_H */
> -- 
> 2.4.11
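
The size_t suggestion above can be sketched as follows. This is a minimal illustration, not the patch itself: the struct copies the field layout quoted above, but the names DmarDeviceScope, dmar_scope_size and dmar_scope_length_field are hypothetical, a C99 flexible array member replaces path[0], and __attribute__((packed)) (GCC/Clang syntax) is assumed in place of QEMU_PACKED.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field layout as the AcpiDmarDeviceScope quoted above. */
struct DmarDeviceScope {
    uint8_t entry_type;
    uint8_t length;
    uint16_t reserved;
    uint8_t enumeration_id;
    uint8_t bus;
    uint16_t path[];            /* one element per dev:func pair */
} __attribute__((packed));

/* Size in bytes of a scope entry carrying n_paths path elements. */
static size_t dmar_scope_size(size_t n_paths)
{
    return sizeof(struct DmarDeviceScope) + n_paths * sizeof(uint16_t);
}

/*
 * Narrow to the one-byte on-table length field only at the point of
 * the store, after checking that the value fits.
 */
static uint8_t dmar_scope_length_field(size_t n_paths)
{
    size_t size = dmar_scope_size(n_paths);

    assert(size <= UINT8_MAX);
    return (uint8_t)size;
}
```

Keeping the computed size in a size_t means the arithmetic cannot silently truncate; only the final store into the packed table field uses the narrow type.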


* Re: [Qemu-devel] [PATCH v10 27/26] intel_iommu: disallow kernel-irqchip=on with IR
  2016-06-24  9:20     ` Peter Xu
@ 2016-07-04 15:39       ` Michael S. Tsirkin
  2016-07-05  3:51         ` Peter Xu
  0 siblings, 1 reply; 63+ messages in thread
From: Michael S. Tsirkin @ 2016-07-04 15:39 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, imammedo, rth, ehabkost, jasowang, marcel, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4

On Fri, Jun 24, 2016 at 05:20:22PM +0800, Peter Xu wrote:
> On Fri, Jun 24, 2016 at 03:10:21PM +0800, Peter Xu wrote:
> > When user specify "kernel-irqchip=on", throw error and then quit.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> > 
> > One more patch for this series. Without this one, guest kernel will
> > possibly hang. This is not user friendly.
> 
> This patch should not be here. It should in-reply-to the cover letter.
> My fault to erroneously pasted a wrong message ID. :(((((
> 
> -- peterx


It doesn't apply either.  Please repost it properly, including
Paolo's ack.


* Re: [Qemu-devel] [PATCH v10 03/26] x86-iommu: q35: generalize find_add_as()
  2016-07-04 15:16   ` Michael S. Tsirkin
@ 2016-07-04 16:08     ` Paolo Bonzini
  2016-07-04 16:35       ` Michael S. Tsirkin
  0 siblings, 1 reply; 63+ messages in thread
From: Paolo Bonzini @ 2016-07-04 16:08 UTC (permalink / raw)
  To: Michael S. Tsirkin, Peter Xu
  Cc: ehabkost, rkrcmar, jasowang, qemu-devel, alex.williamson,
	jan.kiszka, wexu, marcel, imammedo, davidkiarie4, rth



On 04/07/2016 17:16, Michael S. Tsirkin wrote:
> On Tue, Jun 21, 2016 at 03:47:31PM +0800, Peter Xu wrote:
>> Remove VT-d calls in common q35 codes. Instead, we provide a general
>> find_add_as() for x86-iommu type.
>>
>> Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> I think it would be cleaner in the device,
> not in the type. In theory you could then mix
> multiple different iommu types in the same machine.

Not sure what you mean.  Each IOMMU subclass can define its own
find_add_as implementation, what's wrong with that?

Paolo

> 
>> ---
>>  hw/i386/intel_iommu.c         | 17 +++++++++--------
>>  include/hw/i386/intel_iommu.h |  5 -----
>>  include/hw/i386/x86-iommu.h   |  3 +++
>>  3 files changed, 12 insertions(+), 13 deletions(-)
>>
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index 1936c41..b487224 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -1900,8 +1900,10 @@ static Property vtd_properties[] = {
>>  };
>>  
>>  
>> -VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
>> +static AddressSpace *vtd_find_add_as(X86IOMMUState *x86_iommu, PCIBus *bus,
>> +                                     int devfn)
>>  {
>> +    IntelIOMMUState *s = (IntelIOMMUState *)x86_iommu;
>>      uintptr_t key = (uintptr_t)bus;
>>      VTDBus *vtd_bus = g_hash_table_lookup(s->vtd_as_by_busptr, &key);
>>      VTDAddressSpace *vtd_dev_as;
>> @@ -1929,7 +1931,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
>>          address_space_init(&vtd_dev_as->as,
>>                             &vtd_dev_as->iommu, "intel_iommu");
>>      }
>> -    return vtd_dev_as;
>> +    return &vtd_dev_as->as;
>>  }
>>  
>>  /* Do the initialization. It will also be called when reset, so pay
>> @@ -2021,13 +2023,11 @@ static void vtd_reset(DeviceState *dev)
>>  
>>  static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>>  {
>> -    IntelIOMMUState *s = opaque;
>> -    VTDAddressSpace *vtd_as;
>> -
>> -    assert(0 <= devfn && devfn <= VTD_PCI_DEVFN_MAX);
>> +    X86IOMMUState *x86_iommu = opaque;
>> +    X86IOMMUClass *x86_class = X86_IOMMU_GET_CLASS(x86_iommu);
>>  
>> -    vtd_as = vtd_find_add_as(s, bus, devfn);
>> -    return &vtd_as->as;
>> +    assert(0 <= devfn && devfn <= X86_IOMMU_PCI_DEVFN_MAX);
>> +    return x86_class->find_add_as(x86_iommu, bus, devfn);
>>  }
>>  
>>  static void vtd_realize(DeviceState *dev, Error **errp)
>> @@ -2060,6 +2060,7 @@ static void vtd_class_init(ObjectClass *klass, void *data)
>>      dc->vmsd = &vtd_vmstate;
>>      dc->props = vtd_properties;
>>      x86_class->realize = vtd_realize;
>> +    x86_class->find_add_as = vtd_find_add_as;
>>  }
>>  
>>  static const TypeInfo vtd_info = {
>> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
>> index 0794309..e36b896 100644
>> --- a/include/hw/i386/intel_iommu.h
>> +++ b/include/hw/i386/intel_iommu.h
>> @@ -125,9 +125,4 @@ struct IntelIOMMUState {
>>      VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */
>>  };
>>  
>> -/* Find the VTD Address space associated with the given bus pointer,
>> - * create a new one if none exists
>> - */
>> -VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn);
>> -
>>  #endif
>> diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
>> index d6991cb..2070cd1 100644
>> --- a/include/hw/i386/x86-iommu.h
>> +++ b/include/hw/i386/x86-iommu.h
>> @@ -21,6 +21,7 @@
>>  #define IOMMU_COMMON_H
>>  
>>  #include "hw/sysbus.h"
>> +#include "exec/memory.h"
>>  
>>  #define  TYPE_X86_IOMMU_DEVICE  ("x86-iommu")
>>  #define  X86_IOMMU_DEVICE(obj) \
>> @@ -40,6 +41,8 @@ struct X86IOMMUClass {
>>      SysBusDeviceClass parent;
>>      /* Intel/AMD specific realize() hook */
>>      DeviceRealize realize;
>> +    /* Find/Add IOMMU address space for specific PCI device */
>> +    AddressSpace *(*find_add_as)(X86IOMMUState *s, PCIBus *bus, int devfn);
>>  };
>>  
>>  struct X86IOMMUState {
>> -- 
>> 2.4.11
> 
> 


* Re: [Qemu-devel] [PATCH v10 03/26] x86-iommu: q35: generalize find_add_as()
  2016-07-04 16:08     ` Paolo Bonzini
@ 2016-07-04 16:35       ` Michael S. Tsirkin
  2016-07-04 16:40         ` Paolo Bonzini
  0 siblings, 1 reply; 63+ messages in thread
From: Michael S. Tsirkin @ 2016-07-04 16:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Xu, ehabkost, rkrcmar, jasowang, qemu-devel,
	alex.williamson, jan.kiszka, wexu, marcel, imammedo,
	davidkiarie4, rth

On Mon, Jul 04, 2016 at 06:08:28PM +0200, Paolo Bonzini wrote:
> 
> 
> On 04/07/2016 17:16, Michael S. Tsirkin wrote:
> > On Tue, Jun 21, 2016 at 03:47:31PM +0800, Peter Xu wrote:
> >> Remove VT-d calls in common q35 codes. Instead, we provide a general
> >> find_add_as() for x86-iommu type.
> >>
> >> Signed-off-by: Peter Xu <peterx@redhat.com>
> > 
> > I think it would be cleaner in the device,
> > not in the type. In theory you could then mix
> > multiple different iommu types in the same machine.
> 
> Not sure what you mean.  Each IOMMU subclass can define its own
> find_add_as implementation, what's wrong with that?
> 
> Paolo


this:

 @@ -2060,6 +2060,7 @@ static void vtd_class_init(ObjectClass *klass, void *data)
      dc->vmsd = &vtd_vmstate;
      dc->props = vtd_properties;
      x86_class->realize = vtd_realize;
 +    x86_class->find_add_as = vtd_find_add_as;
  }

I think this means that if two classes inherit from x86_class, they
will conflict: each one overwrites find_add_as in the parent.

What did I miss?
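
For reference, a minimal sketch of the question in plain C, not QEMU code. It assumes that QOM allocates a separate class structure for every concrete type, seeds it from the parent's, and only then runs class_init on that copy; under that assumption the assignment in vtd_class_init() touches only the intel-iommu copy. All names below (IOMMUClass, class_init, dispatch, the "amd" type) are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the QOM types in the patch. */
typedef const char *(*FindAddAsFn)(int devfn);

typedef struct IOMMUClass {
    const char *type_name;
    FindAddAsFn find_add_as;    /* per-class hook, like X86IOMMUClass */
} IOMMUClass;

typedef struct IOMMUDevice {
    IOMMUClass *klass;          /* instance points at its own type's class */
} IOMMUDevice;

static const char *intel_find_add_as(int devfn)
{
    (void)devfn;
    return "intel";
}

static const char *amd_find_add_as(int devfn)
{
    (void)devfn;
    return "amd";
}

/* Abstract parent leaves the hook unset. */
static IOMMUClass parent_class = { "x86-iommu", NULL };

/* One class structure per concrete type (the assumption being tested). */
static IOMMUClass intel_class;
static IOMMUClass amd_class;

/* Mimics type initialization: copy the parent class, then override. */
static void class_init(IOMMUClass *klass, const char *name, FindAddAsFn fn)
{
    *klass = parent_class;
    klass->type_name = name;
    klass->find_add_as = fn;
}

static void init_types(void)
{
    class_init(&intel_class, "intel-iommu", intel_find_add_as);
    class_init(&amd_class, "amd-iommu", amd_find_add_as);
}

/* Same shape as the call through x86_class->find_add_as in the patch. */
static const char *dispatch(IOMMUDevice *dev, int devfn)
{
    assert(dev->klass->find_add_as);
    return dev->klass->find_add_as(devfn);
}
```

If the assumption holds, two IOMMU types can coexist without clobbering each other's hook, and the parent's hook stays unset. If class_init instead wrote into a structure shared with the parent, the second assignment would win, which is the conflict described above.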

> > 
> >> ---
> >>  hw/i386/intel_iommu.c         | 17 +++++++++--------
> >>  include/hw/i386/intel_iommu.h |  5 -----
> >>  include/hw/i386/x86-iommu.h   |  3 +++
> >>  3 files changed, 12 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> >> index 1936c41..b487224 100644
> >> --- a/hw/i386/intel_iommu.c
> >> +++ b/hw/i386/intel_iommu.c
> >> @@ -1900,8 +1900,10 @@ static Property vtd_properties[] = {
> >>  };
> >>  
> >>  
> >> -VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
> >> +static AddressSpace *vtd_find_add_as(X86IOMMUState *x86_iommu, PCIBus *bus,
> >> +                                     int devfn)
> >>  {
> >> +    IntelIOMMUState *s = (IntelIOMMUState *)x86_iommu;
> >>      uintptr_t key = (uintptr_t)bus;
> >>      VTDBus *vtd_bus = g_hash_table_lookup(s->vtd_as_by_busptr, &key);
> >>      VTDAddressSpace *vtd_dev_as;
> >> @@ -1929,7 +1931,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
> >>          address_space_init(&vtd_dev_as->as,
> >>                             &vtd_dev_as->iommu, "intel_iommu");
> >>      }
> >> -    return vtd_dev_as;
> >> +    return &vtd_dev_as->as;
> >>  }
> >>  
> >>  /* Do the initialization. It will also be called when reset, so pay
> >> @@ -2021,13 +2023,11 @@ static void vtd_reset(DeviceState *dev)
> >>  
> >>  static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
> >>  {
> >> -    IntelIOMMUState *s = opaque;
> >> -    VTDAddressSpace *vtd_as;
> >> -
> >> -    assert(0 <= devfn && devfn <= VTD_PCI_DEVFN_MAX);
> >> +    X86IOMMUState *x86_iommu = opaque;
> >> +    X86IOMMUClass *x86_class = X86_IOMMU_GET_CLASS(x86_iommu);
> >>  
> >> -    vtd_as = vtd_find_add_as(s, bus, devfn);
> >> -    return &vtd_as->as;
> >> +    assert(0 <= devfn && devfn <= X86_IOMMU_PCI_DEVFN_MAX);
> >> +    return x86_class->find_add_as(x86_iommu, bus, devfn);
> >>  }
> >>  
> >>  static void vtd_realize(DeviceState *dev, Error **errp)
> >> @@ -2060,6 +2060,7 @@ static void vtd_class_init(ObjectClass *klass, void *data)
> >>      dc->vmsd = &vtd_vmstate;
> >>      dc->props = vtd_properties;
> >>      x86_class->realize = vtd_realize;
> >> +    x86_class->find_add_as = vtd_find_add_as;
> >>  }
> >>  
> >>  static const TypeInfo vtd_info = {
> >> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> >> index 0794309..e36b896 100644
> >> --- a/include/hw/i386/intel_iommu.h
> >> +++ b/include/hw/i386/intel_iommu.h
> >> @@ -125,9 +125,4 @@ struct IntelIOMMUState {
> >>      VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */
> >>  };
> >>  
> >> -/* Find the VTD Address space associated with the given bus pointer,
> >> - * create a new one if none exists
> >> - */
> >> -VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn);
> >> -
> >>  #endif
> >> diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
> >> index d6991cb..2070cd1 100644
> >> --- a/include/hw/i386/x86-iommu.h
> >> +++ b/include/hw/i386/x86-iommu.h
> >> @@ -21,6 +21,7 @@
> >>  #define IOMMU_COMMON_H
> >>  
> >>  #include "hw/sysbus.h"
> >> +#include "exec/memory.h"
> >>  
> >>  #define  TYPE_X86_IOMMU_DEVICE  ("x86-iommu")
> >>  #define  X86_IOMMU_DEVICE(obj) \
> >> @@ -40,6 +41,8 @@ struct X86IOMMUClass {
> >>      SysBusDeviceClass parent;
> >>      /* Intel/AMD specific realize() hook */
> >>      DeviceRealize realize;
> >> +    /* Find/Add IOMMU address space for specific PCI device */
> >> +    AddressSpace *(*find_add_as)(X86IOMMUState *s, PCIBus *bus, int devfn);
> >>  };
> >>  
> >>  struct X86IOMMUState {
> >> -- 
> >> 2.4.11
> > 
> > 


* Re: [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU
  2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
                   ` (26 preceding siblings ...)
  2016-07-04 14:33 ` [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Paolo Bonzini
@ 2016-07-04 16:39 ` Michael S. Tsirkin
  27 siblings, 0 replies; 63+ messages in thread
From: Michael S. Tsirkin @ 2016-07-04 16:39 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, imammedo, rth, ehabkost, jasowang, marcel, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4

On Tue, Jun 21, 2016 at 03:47:28PM +0800, Peter Xu wrote:
> This is v10 of Intel IOMMU IR support, based on patches:
> 
> - [PATCH v2 0/3] enable iommu with -device
>   https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg00554.html
> 
> V9 introduced one bug when split irqchip is used with multiple
> vCPUs.  V10 mainly fixes this issue, with several other trivial
> enhancements.

I am sending a pull request today; please rebase on top of that.
There were too many conflicts for me to resolve with
confidence.

Preferably address at least some stylistic comments while you are
at it, but they can be fixed by patches on top, too.

> Online branch:
> 
>   https://github.com/xzpeter/qemu vtd-intr-v10
> 
> Please review.  Thanks.
> 
> v10 changes:
> - Fix issue when specify more than 1 vcpus.  This is introduced in v9
>   after rebased to Marcel's patches.  The problem is that, before
>   Marcel's patch, we will first create IOMMU then IOAPIC, while the
>   order is switched after Marcel's changes.  This affects patch 18
>   ("register IOMMU IEC notifier for ioapic") and I need to do the
>   registration after IOAPIC realization.
> - Display readable error message if user specify more than one x86
>   vIOMMU, rather than an assertion fail. (patch 2)
> - Correct vtd iec notifier "global" parameter: if granularity bit is
>   clear (not set), then it's a global invalidation (patch 17,
>   inverted meaning for granularity).
> - added one more patch (patch 26) to add some trace events for irqchip
>   msi routes operations.
> - rebase to latest master
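
The corrected "global" semantics from the v10 changelog can be sketched as below. This is an illustration only: the bit position IEC_DESC_GRANULARITY and all function names are assumptions made for the example and should be checked against the VT-d specification and the actual patch 17.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed position of the granularity bit in the low quadword of an
 * interrupt entry cache invalidate descriptor (hypothetical constant).
 */
#define IEC_DESC_GRANULARITY (1ULL << 4)

/* Hypothetical notifier signature: global == true means "flush all". */
typedef void (*iec_notify_fn)(void *opaque, bool global);

/* Granularity bit clear means global; set means index-selective. */
static bool iec_invalidation_is_global(uint64_t desc_lo)
{
    return (desc_lo & IEC_DESC_GRANULARITY) == 0;
}

/* Pass the decoded flag on to a registered notifier. */
static void iec_dispatch(uint64_t desc_lo, iec_notify_fn fn, void *opaque)
{
    assert(fn);
    fn(opaque, iec_invalidation_is_global(desc_lo));
}
```

The inverted test is the whole point: treating a set bit as "global" would turn every selective invalidation into a full flush and every global one into a selective one.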
> 
> v9 changes:
> - addressed several possible acpi issue with BE machines, and comment
>   fix [Igor]
> - removed patch 16 in v8 since it's useless after rebasing to Marcel's
>   patches
> - move vtd_svt_mask into vtd_irte_get() and declare it as constant.
> - rebase to latest master, with Marcel's "-device intel-iommu" patch v2
>   - re-arrange patch order, moving x86-iommu to the beginning (so that
>     I can add "intremap" property for it, which can be further shared
>     by future AMD IOMMUs)
>   - add device property "intremap" for X86 IOMMU device (new patch 4
>     in v9)
>   - replace all existing references of MachineState.iommu_intr to
>     device property X86IOMMUState.intr_supported, removing
>     MachineState.iommu_intr
>   - some other minor changes due to the rebase
> 
> v8 changes:
> - rebase to latest master
> - patch 7
>   - remove VTD_IR_IOAPICEntry, which is useless now
>   - fix possible issue on big endian machines for VTD_IRTE,
>     VTD_IR_MSIAddress
> - patch 12
>   - fix endianess issue with bit-field defines: fix BE issue with
>     VTD_MSIMessage, do cpu_to_*() or reverse when necessary on
>     bit-field uses.
> - patch 19
>   - used le32_to_cpu() for dest_id, and added my s-o-b line beneath
>     Jan's.
> 
> v7 changes (using v6 patch index):
> - patch 10: trivial change in debug string (remove one more "\n")
> - patch 17-18: ioapic remote irr patches, sent seperately
>   already. So removed from this series.
> - patch 24: 
>   - fix commit message: only irqfd msi routes are maintained, not
>     all msi routes.
>   - skip all IOAPIC msi entries (dev == NULL). We only need to
>     housekeep irqfd users.
> - added patches
>   - pick up Radim's patch on adding MHMV ecap bits [Radim]
> - remove all vtd_* patches, instead, use x86-iommu ones at the first
>   place. This introduced lots of patch order changes and content
>   changes, which affected from original patch 8 to the end. Sorry!
>   [Jan]
> 
> v6 changes:
> - patch 10: use write_with_attrs() rather than write(), preparing
>   for SID verification [Jan]
> - patch 17-18: add r-b line from Radim [Radim]
> - new patch 19: put together Jan's EIM patch [Jan]
> - new patch 20: add SID validation process
> - new patch 21-22: introduce X86IOMMU class, which is the parent of
>   IntelIOMMU class. Patch 21 only introduce the class and did
>   nothing, patch 22 cleaned up all the vtd_*() hooks into x86
>   ones. This is only a start. In the future, we can abstract more
>   things into X86IOMMU class, like iotlb, address spaces mgmt,
>   etc. [Jan]
> - new patch 23-25: this is to do IEC notify to all irqfd consumers
>   like vhost/vfio. patch 23 changed interface for
>   kvm_irqchip_add_msi_route(), provide vector info rather than a raw
>   MSI message. Patch 24 added new hooks to do arch-specific
>   notification on addition/deletion of msi routes. Patch 25 is x86
>   specific, which added one more IEC notifier for msi routes. [Jan]
> - new patch 26: this is to partially solve the issue that Jan has
>   encountered (1 sec delay when invalidating IR cache).
> 
> v5 changes:
> - patch 10: add vector checking for IOAPIC interrupts (this may help
>   debug in the future, will only generate warning if specify
>   IOMMU_DEBUG)
> - patch 13: replace error_report() with a trace. [Jan]
> - patch 14: rename parameter "intr" to "intremap", to be aligned
>   with kernel parameter [Jan]
> - patch 15: fix comments for vtd_iec_notify_fn
> - patch 17 & 18 (added): fix issue when IR enabled with devices
>   using level-triggered interrupts, like e1000. Adding it to the end
>   of series, since this issue never happen without IR.
> 
>   Patch 17 adds read-only check for IOAPIC entries.
>   Patch 18 clears remote IRR bit when entry configured as
>   edge-triggered.
> 
> v4 changes (all patch number corresponds to v3):
> - add one patch at the start of v3 series: I missed to send the
>   first patch in v3. adding it in. [Jan]
> - patch 9: add support for compatible mode (no reason not to support
>   it, if not, we will get some warnings when using split irqchip)
> - patch 11: further simplify ioapic_update_kvm_routes() using the
>   helper function.
> - patch 12: tweak on kvm_arch_fixup_msi_route() rather than
>   ioapic_update_kvm_routes() only. [Radim]
> - add patch 15: introduce IEC (Interrupt Entry Cache) invalidation
>   notifier list. We can register to this list if we want to be
>   notified when we got IR invalidation requests [Radim]
> - add patch 16: let IOAPIC the first consumer for the above IEC
>   notifier list. [Radim]
> - several other trivial fixes (like moving some defines from .c to
>   .h, moving several lines of changes from one patch to another to
>   make it make more sense, etc.)
> 
> v3 changes (all patch numbers corresponds to v2):
> - patch 1 (-> v3 patch 13)
>   - move to the end of series [Alex]
> - patch 10 (dropped)
>   - drop this one, since re-worked on IOAPIC support, so we do not
>     need this any more.
> - patch 12 (-> v3 patch 10)
>   - leverage MSI path for IOAPIC IR [Jan]
> - patch 13 (v3 -> patch 9)
>   - remove vtd_interrupt_remap_msi() declaration by reordering the
>     functions [mst]
>   - vtd_generate_msi_message(): init msg using {}, remove FIXME
>     [mst]
> - new patches
>   - v3 patch 11: introduce ioapic_entry_parse() helper function
>   - v3 patch 12: add support for kernel-irqchip=split. This needs
>     more reviews, logically this should enable lots of things:
>     splitted irqchip, irqfd, vhost, and irqfd support for
>     passthrough devices (not tested). Please refer to the patch for
>     more information.
> 
> v2 changes:
> - patch 1
>   - rename "int_remap" to "intr" in several places [Marcel]
>   - remove "Intel" specific words in desc or commit message, prepare
>     itself with further AMD support [Marcel]
>   - avoid using object_property_get_bool() [Marcel]
> - patch 5
>   - use PCI bus number 0xff rather than 0xf0 for the IOAPIC scope
>     definition. (please let me know if anyone knows how I can avoid
>     user using PCI bus number 0xff... TIA)
> - patch 11
>   - fix comments [Marcel]
> - all
>   - remove intr_supported variable [Marcel]
> 
> This patchset provide interrupt remapping (IR) support of the emulated
> Intel IOMMU device.
> 
> By default, IR is disabled to be better compatible with current
> QEMU. To enable IR, we can use the following command to boot a
> IR-supported VM with virtio-net device with vhost (do not support
> kvm-ioapic, so we need to specify kernel-irqchip={split|off} here):
> 
> $ qemu-system-x86_64 -M q35,kernel-irqchip=split \
>      -device intel-iommu,intremap=on \
>      -enable-kvm -m 1024 \
>      -netdev tap,id=net0,vhost=on \
>      -device virtio-net-pci,netdev=net0 \
>      -monitor telnet::3333,server,nowait \
>      /var/lib/libvirt/images/vm1.qcow2
> 
> When guest boots, we can verify whether IR enabled by grepping the
> dmesg like:
> 
> Feb 19 11:21:23 localhost.localdomain kernel: DMAR-IR: IOAPIC id 0 under DRHD base  0xfed90000 IOMMU 0
> Feb 19 11:21:23 localhost.localdomain kernel: DMAR-IR: Enabled IRQ remapping in x2apic mode
> 
> Testing is only covering basic smoke test for the following matrix:
> 
> - IR enabled/disable
> - kernel irqchip off/split
> - network device: tap with/without vhost, e1000
> - vCPU count: 1/2
> 
> Currently supported:
> 
> - Emulated/Splitted irqchip
> - Generic PCI Devices
> - vhost devices
> - pass through device support? Not tested, but suppose it should work.
> - IEC (Interrupt Entry Cache) cache invalidation notification
> - EIM (from Jan)
> - IRTE Source-id validation
> 
> TODO List:
> 
> - explicit IEC invalidation (currently, we do update without
>   checking. Also, we can process QI invalidation in bulk, as Jan
>   suggested)
> - IR fault reporting
> - migration support (for IOMMU as general?)
> - more?
> 
> Jan Kiszka (1):
>   intel_iommu: Add support for Extended Interrupt Mode
> 
> Peter Xu (24):
>   x86-iommu: introduce parent class
>   x86-iommu: provide x86_iommu_get_default
>   x86-iommu: q35: generalize find_add_as()
>   x86-iommu: introduce "intremap" property
>   acpi: enable INTR for DMAR report structure
>   intel_iommu: allow queued invalidation for IR
>   intel_iommu: set IR bit for ECAP register
>   acpi: add DMAR scope definition for root IOAPIC
>   intel_iommu: define interrupt remap table addr register
>   intel_iommu: handle interrupt remap enable
>   intel_iommu: define several structs for IOMMU IR
>   intel_iommu: add IR translation faults defines
>   intel_iommu: Add support for PCI MSI remap
>   q35: ioapic: add support for emulated IOAPIC IR
>   ioapic: introduce ioapic_entry_parse() helper
>   intel_iommu: add support for split irqchip
>   x86-iommu: introduce IEC notifiers
>   ioapic: register IOMMU IEC notifier for ioapic
>   intel_iommu: add SID validation for IR
>   kvm-irqchip: simplify kvm_irqchip_add_msi_route
>   kvm-irqchip: i386: add hook for add/remove virq
>   kvm-irqchip: x86: add msi route notify fn
>   kvm-irqchip: do explicit commit when update irq
>   kvm-all: add trace events for kvm irqchip ops
> 
> Radim Krčmář (1):
>   intel_iommu: support all masks in interrupt entry cache invalidation
> 
>  hw/i386/Makefile.objs             |   2 +-
>  hw/i386/acpi-build.c              |  39 +++-
>  hw/i386/intel_iommu.c             | 445 ++++++++++++++++++++++++++++++++++++--
>  hw/i386/intel_iommu_internal.h    |  50 ++++-
>  hw/i386/kvm/pci-assign.c          |  10 +-
>  hw/i386/pc.c                      |   3 +
>  hw/i386/x86-iommu.c               | 128 +++++++++++
>  hw/intc/ioapic.c                  | 133 ++++++++----
>  hw/misc/ivshmem.c                 |   4 +-
>  hw/pci/pci.c                      |  15 ++
>  hw/vfio/pci.c                     |  12 +-
>  hw/virtio/virtio-pci.c            |  10 +-
>  include/hw/acpi/acpi-defs.h       |  15 ++
>  include/hw/i386/apic-msidef.h     |   1 +
>  include/hw/i386/intel_iommu.h     | 175 ++++++++++++++-
>  include/hw/i386/ioapic_internal.h |   3 +
>  include/hw/i386/pc.h              |   4 +
>  include/hw/i386/x86-iommu.h       | 103 +++++++++
>  include/hw/pci-host/q35.h         |   8 +
>  include/hw/pci/pci.h              |   2 +
>  include/sysemu/kvm.h              |  21 +-
>  kvm-all.c                         |  19 +-
>  kvm-stub.c                        |   6 +-
>  target-arm/kvm.c                  |  11 +
>  target-i386/kvm.c                 | 109 +++++++++-
>  target-mips/kvm.c                 |  11 +
>  target-ppc/kvm.c                  |  11 +
>  target-s390x/kvm.c                |  11 +
>  trace-events                      |  12 +
>  29 files changed, 1263 insertions(+), 110 deletions(-)
>  create mode 100644 hw/i386/x86-iommu.c
>  create mode 100644 include/hw/i386/x86-iommu.h
> 
> -- 
> 2.4.11

* Re: [Qemu-devel] [PATCH v10 03/26] x86-iommu: q35: generalize find_add_as()
  2016-07-04 16:35       ` Michael S. Tsirkin
@ 2016-07-04 16:40         ` Paolo Bonzini
  0 siblings, 0 replies; 63+ messages in thread
From: Paolo Bonzini @ 2016-07-04 16:40 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Xu, ehabkost, rkrcmar, jasowang, qemu-devel,
	alex.williamson, jan.kiszka, wexu, marcel, imammedo,
	davidkiarie4, rth



On 04/07/2016 18:35, Michael S. Tsirkin wrote:
>> > 
>> > Not sure what you mean.  Each IOMMU subclass can define its own
>> > find_add_as implementation, what's wrong with that?
>> > 
>> > Paolo
> 
> this:
> 
>  @@ -2060,6 +2060,7 @@ static void vtd_class_init(ObjectClass *klass, void *data)
>       dc->vmsd = &vtd_vmstate;
>       dc->props = vtd_properties;
>       x86_class->realize = vtd_realize;
>  +    x86_class->find_add_as = vtd_find_add_as;
>   }
> 
> I think this
> means that if there are two classes inheriting x86_class,
> they will conflict over-writing vtd_find_add_as in the
> parent.

No, x86_class is really just (X86IOMMUClass *)klass, and klass is local.

Paolo
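
Paolo's point can be illustrated with a minimal, self-contained sketch. It does not use the real QOM API: the struct and function names below (X86IOMMUClassSketch, vtd_class_init_sketch, amd_class_init_sketch) are hypothetical stand-ins, and the assumption is only that each concrete type gets its own class object. Under that assumption, assigning a hook through the casted pointer touches that type's copy and nothing else.

```c
/* Hypothetical stand-in for a QOM class struct; not the real QEMU type. */
typedef struct X86IOMMUClassSketch {
    const char *(*find_add_as)(void);
} X86IOMMUClassSketch;

static const char *vtd_find_add_as_sketch(void) { return "vtd"; }
static const char *amd_find_add_as_sketch(void) { return "amd"; }

/* Each class_init receives a pointer to that type's own class object;
 * x86_class is just a typed, local view of klass. */
static void vtd_class_init_sketch(void *klass)
{
    X86IOMMUClassSketch *x86_class = (X86IOMMUClassSketch *)klass;
    x86_class->find_add_as = vtd_find_add_as_sketch;
}

static void amd_class_init_sketch(void *klass)
{
    X86IOMMUClassSketch *x86_class = (X86IOMMUClassSketch *)klass;
    x86_class->find_add_as = amd_find_add_as_sketch;
}

/* One class object per concrete type (assumed to mirror what the type
 * system allocates for each registered type). */
static X86IOMMUClassSketch vtd_class, amd_class;

static void init_both_classes(void)
{
    vtd_class_init_sketch(&vtd_class);
    amd_class_init_sketch(&amd_class);
}
```

After init_both_classes() runs, the two hooks remain distinct, which is why two subclasses setting find_add_as do not overwrite each other.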

* Re: [Qemu-devel] [PATCH v10 27/26] intel_iommu: disallow kernel-irqchip=on with IR
  2016-07-04 15:39       ` Michael S. Tsirkin
@ 2016-07-05  3:51         ` Peter Xu
  0 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-07-05  3:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, imammedo, rth, ehabkost, jasowang, marcel, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4

On Mon, Jul 04, 2016 at 06:39:00PM +0300, Michael S. Tsirkin wrote:
> On Fri, Jun 24, 2016 at 05:20:22PM +0800, Peter Xu wrote:
> > On Fri, Jun 24, 2016 at 03:10:21PM +0800, Peter Xu wrote:
> > > When user specify "kernel-irqchip=on", throw error and then quit.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > > 
> > > One more patch for this series. Without this one, guest kernel will
> > > possibly hang. This is not user friendly.
> > 
> > This patch should not be here. It should in-reply-to the cover letter.
> > My fault to erroneously pasted a wrong message ID. :(((((
> > 
> > -- peterx
> 
> 
> It doesn't apply either.  Please repost it properly, including
> Paolo's ack.

Sure. Will be included in v11. Thanks,

-- peterx

* Re: [Qemu-devel] [PATCH v10 02/26] x86-iommu: provide x86_iommu_get_default
  2016-07-04 15:16   ` Michael S. Tsirkin
@ 2016-07-05  5:11     ` Peter Xu
  0 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-07-05  5:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, imammedo, rth, ehabkost, jasowang, marcel, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4

On Mon, Jul 04, 2016 at 06:16:08PM +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 21, 2016 at 03:47:30PM +0800, Peter Xu wrote:
> > Instead of searching the device tree every time, one static variable is
> > declared for the default system x86 IOMMU device.  Also, some VT-d
> > macros are replaced by x86 ones.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> I think it's cleaner to just use object_resolve_path_type
> with the X86 type. Error handling by exit is rather ugly, too:
> if we need a singleton type, let's add one and have
> generic code detect such errors.

I did a quick measurement of the old path-resolving method: it takes
more than 60us every time just to fetch the default IOMMU object (on
my laptop, i7-4810MQ CPU @ 2.80GHz). Do you think it would be better
to avoid that? Currently no critical path uses this get_default();
only the IEC notifiers do. However, it would still cost some extra
time during boot, or whenever the notifiers are triggered.

I agree that we should provide a more general interface for singleton
semantics. But would it be okay if I send another patch for that after
this series is merged? I may need some more time to read the code, and
IIUC it will likely be a standalone patch related to QOM, plus another
patch to make the X86 IOMMU its first user.

Thanks,
-- peterx

* Re: [Qemu-devel] [PATCH v10 02/26] x86-iommu: provide x86_iommu_get_default
  2016-07-04 15:17   ` Michael S. Tsirkin
@ 2016-07-05  5:12     ` Peter Xu
  0 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-07-05  5:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, imammedo, rth, ehabkost, jasowang, marcel, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4

On Mon, Jul 04, 2016 at 06:17:47PM +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 21, 2016 at 03:47:30PM +0800, Peter Xu wrote:
> > Instead of searching the device tree every time, one static variable is
> > declared for the default system x86 IOMMU device.  Also, some VT-d
> > macros are replaced by x86 ones.
> 
> In the future pls don't mix unrelated changes in same patch like this.

Yes, sorry. I'll split it into two in v11 if you don't mind.

-- peterx

* Re: [Qemu-devel] [PATCH v10 05/26] acpi: enable INTR for DMAR report structure
  2016-07-04 15:14   ` Michael S. Tsirkin
@ 2016-07-05  6:39     ` Peter Xu
  0 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-07-05  6:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, imammedo, rth, ehabkost, jasowang, marcel, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4

On Mon, Jul 04, 2016 at 06:14:41PM +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 21, 2016 at 03:47:33PM +0800, Peter Xu wrote:
> > In ACPI DMA remapping report structure, enable INTR flag when specified.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  hw/i386/acpi-build.c          | 11 ++++++++++-
> >  include/hw/i386/intel_iommu.h |  2 ++
> >  2 files changed, 12 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > index 161f089..961ccd6a 100644
> > --- a/hw/i386/acpi-build.c
> > +++ b/hw/i386/acpi-build.c
> > @@ -57,6 +57,7 @@
> >  
> >  #include "qapi/qmp/qint.h"
> >  #include "qom/qom-qobject.h"
> > +#include "hw/i386/x86-iommu.h"
> >  
> >  /* These are used to size the ACPI tables for -M pc-i440fx-1.7 and
> >   * -M pc-i440fx-2.0.  Even if the actual amount of AML generated grows
> > @@ -2422,10 +2423,18 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
> >  
> >      AcpiTableDmar *dmar;
> >      AcpiDmarHardwareUnit *drhd;
> > +    uint8_t dmar_flags = 0;
> > +    X86IOMMUState *iommu = x86_iommu_get_default();
> > +
> > +    assert(iommu);
> > +    if (iommu->intr_supported) {
> > +        /* enable INTR for the IOMMU device */
> > +        dmar_flags |= DMAR_REPORT_F_INTR;
> 
> Please rewrite it: drop DMAR_REPORT_F_INTR macro,
> and replace with literal + comment documenting
> earliest spec version has it and the exact text
> to look for in the spec.

For "literal", do you mean this?

   dmar_flags |= 0x1;

Could I ask why we need to drop the macro? There are two possible
flags here: bit 0 is for IR, and bit 1 (not used in QEMU yet) is for
X2APIC_OPT_OUT. Macros seem clearer. Did I miss anything?

Regarding the comment, maybe:

"enable INTR for the IOMMU device. See VT-d spec 8.1 (any version
 newer than Oct. 2014)"

Thanks,

-- peterx
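
For readers comparing the two styles under discussion, here is a small sketch of the macro variant. The bit positions come from the message above (bit 0 for IR, bit 1 for X2APIC_OPT_OUT); the macro and function names are illustrative assumptions, not the names used in QEMU.

```c
#include <stdint.h>

/* Flag bits of the DMAR report structure as described in the message above.
 * Names are hypothetical; only the bit positions are taken from the thread. */
#define DMAR_FLAG_INTR_REMAP      (1u << 0)  /* bit 0: interrupt remapping */
#define DMAR_FLAG_X2APIC_OPT_OUT  (1u << 1)  /* bit 1: unused by QEMU at the time */

/* Build the flags byte; intr_supported mirrors the field tested in the patch. */
static uint8_t dmar_build_flags(int intr_supported)
{
    uint8_t flags = 0;

    if (intr_supported) {
        flags |= DMAR_FLAG_INTR_REMAP;
    }
    return flags;
}
```

The literal style requested in review would replace DMAR_FLAG_INTR_REMAP with 0x1 plus a comment citing the spec text; the resulting byte is the same either way.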

* Re: [Qemu-devel] [PATCH v10 08/26] acpi: add DMAR scope definition for root IOAPIC
  2016-07-04 15:22   ` Michael S. Tsirkin
@ 2016-07-05  7:30     ` Peter Xu
  0 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-07-05  7:30 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, imammedo, rth, ehabkost, jasowang, marcel, pbonzini,
	jan.kiszka, rkrcmar, alex.williamson, wexu, davidkiarie4

On Mon, Jul 04, 2016 at 06:22:56PM +0300, Michael S. Tsirkin wrote:

[...]

> > @@ -2425,6 +2427,9 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
> >      AcpiDmarHardwareUnit *drhd;
> >      uint8_t dmar_flags = 0;
> >      X86IOMMUState *iommu = x86_iommu_get_default();
> > +    AcpiDmarDeviceScope *scope = NULL;
> > +    /* Root complex IOAPIC use one path[0] only */
> > +    uint8_t ioapic_scope_size = sizeof(*scope) + sizeof(scope->path[0]);
> 
> just use int or unsigned or size_t for types like this.

Will fix.

[...]

> > diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
> > index ea9be0b..0dbdde3 100644
> > --- a/include/hw/acpi/acpi-defs.h
> > +++ b/include/hw/acpi/acpi-defs.h
> > @@ -571,6 +571,20 @@ enum {
> >  /*
> >   * Sub-structures for DMAR
> >   */
> > +
> > +#define ACPI_DMAR_DEV_SCOPE_TYPE_IOAPIC     (0x03)
> 
> Again, pls use literal with comment in code.

Will fix.

[...]

> > diff --git a/include/hw/pci-host/q35.h b/include/hw/pci-host/q35.h
> > index c5c073d..312b47f 100644
> > --- a/include/hw/pci-host/q35.h
> > +++ b/include/hw/pci-host/q35.h
> > @@ -175,4 +175,12 @@ typedef struct Q35PCIHost {
> >  
> >  uint64_t mch_mcfg_base(void);
> >  
> > +/*
> > + * Arbitary but unique BNF number for IOAPIC device.
> > + *
> > + * TODO: make sure there would have no conflict with real PCI bus
> 
> How are you going to do this?

I haven't thought about it yet (it's on my todo list). Suggestions are
welcome.

Thanks,

-- peterx

* Re: [Qemu-devel] [PATCH v10 17/26] x86-iommu: introduce IEC notifiers
  2016-07-04 14:22   ` Paolo Bonzini
@ 2016-07-05  7:32     ` Peter Xu
  0 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-07-05  7:32 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, ehabkost, mst, jasowang, rkrcmar, alex.williamson,
	jan.kiszka, wexu, marcel, imammedo, davidkiarie4, rth

On Mon, Jul 04, 2016 at 04:22:49PM +0200, Paolo Bonzini wrote:
> 
> 
> On 21/06/2016 09:47, Peter Xu wrote:
> > This patch introduces x86 IOMMU IEC (Interrupt Entry Cache)
> > invalidation notifier list. When vIOMMU receives IEC invalidate
> > request, all the registered units will be notified with specific
> > invalidation requests.
> > 
> > Intel IOMMU is the first provider that generates such a event.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> Please consider switching this to a NotifierList.

Noted in my todo. Thanks,

-- peterx

* Re: [Qemu-devel] [PATCH v10 24/26] kvm-irqchip: do explicit commit when update irq
  2016-07-04 14:23   ` [Qemu-devel] [PATCH v10 24/26] kvm-irqchip: do explicit commit when update irq Paolo Bonzini
@ 2016-07-05  7:35     ` Peter Xu
  0 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-07-05  7:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, ehabkost, mst, jasowang, rkrcmar, alex.williamson,
	jan.kiszka, wexu, marcel, imammedo, davidkiarie4, rth

On Mon, Jul 04, 2016 at 04:23:32PM +0200, Paolo Bonzini wrote:
> FWIW I prefer this to the "v10.2".

Let me drop v10.2 then. Thanks,

-- peterx

* Re: [Qemu-devel] [PATCH v10 27/26] intel_iommu: disallow kernel-irqchip=on with IR
  2016-06-24  7:10   ` [Qemu-devel] [PATCH v10 27/26] intel_iommu: disallow kernel-irqchip=on with IR Peter Xu
  2016-06-24  9:20     ` Peter Xu
@ 2016-07-11 10:17     ` David Kiarie
  2016-07-11 12:08       ` Peter Xu
  1 sibling, 1 reply; 63+ messages in thread
From: David Kiarie @ 2016-07-11 10:17 UTC (permalink / raw)
  To: Peter Xu
  Cc: QEMU Developers, imammedo, rth, Eduardo Habkost, jasowang,
	Marcel Apfelbaum, Michael S. Tsirkin, pbonzini, Jan Kiszka,
	rkrcmar, Alex Williamson, wexu

On Fri, Jun 24, 2016 at 10:10 AM, Peter Xu <peterx@redhat.com> wrote:
> When user specify "kernel-irqchip=on", throw error and then quit.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>
> One more patch for this series. Without this one, guest kernel will
> possibly hang. This is not user friendly.
>
>  hw/i386/intel_iommu.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 4ff9a24..618b0f9 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -20,6 +20,7 @@
>   */
>
>  #include "qemu/osdep.h"
> +#include "qemu/error-report.h"
>  #include "hw/sysbus.h"
>  #include "exec/address-spaces.h"
>  #include "intel_iommu_internal.h"
> @@ -29,6 +30,7 @@
>  #include "hw/boards.h"
>  #include "hw/i386/x86-iommu.h"
>  #include "hw/pci-host/q35.h"
> +#include "sysemu/kvm.h"
>
>  /*#define DEBUG_INTEL_IOMMU*/
>  #ifdef DEBUG_INTEL_IOMMU
> @@ -2458,6 +2460,13 @@ static void vtd_realize(DeviceState *dev, Error **errp)
>      bus->iommu_opaque = dev;
>      /* Pseudo address space under root PCI bus. */
>      pcms->ioapic_as = vtd_host_dma_iommu(bus, s, Q35_PSEUDO_DEVFN_IOAPIC);
> +
> +    /* Currently Intel IOMMU IR only support "kernel-irqchip={off|split}" */
> +    if (kvm_irqchip_in_kernel() && !kvm_irqchip_is_split()) {
> +        error_report("Intel Interrupt Remapping cannot work with "
> +                     "kernel-irqchip=on, please use 'split|off'.");
> +        exit(1);
> +    }
>  }

Shouldn't you check whether VT-d interrupt remapping is enabled (I'm
assuming it's off by default) before enforcing
kernel-irqchip=off|split? Doesn't the above imply that one can't use
VT-d with kernel-irqchip=on, regardless of whether IR is enabled?

>
>  static void vtd_class_init(ObjectClass *klass, void *data)
> --
> 2.4.11
>

* Re: [Qemu-devel] [PATCH v10 27/26] intel_iommu: disallow kernel-irqchip=on with IR
  2016-07-11 10:17     ` David Kiarie
@ 2016-07-11 12:08       ` Peter Xu
  0 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2016-07-11 12:08 UTC (permalink / raw)
  To: David Kiarie
  Cc: QEMU Developers, imammedo, rth, Eduardo Habkost, jasowang,
	Marcel Apfelbaum, Michael S. Tsirkin, pbonzini, Jan Kiszka,
	rkrcmar, Alex Williamson, wexu

On Mon, Jul 11, 2016 at 01:17:40PM +0300, David Kiarie wrote:
> On Fri, Jun 24, 2016 at 10:10 AM, Peter Xu <peterx@redhat.com> wrote:
> > When user specify "kernel-irqchip=on", throw error and then quit.
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >
> > One more patch for this series. Without this one, guest kernel will
> > possibly hang. This is not user friendly.
> >
> >  hw/i386/intel_iommu.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 4ff9a24..618b0f9 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -20,6 +20,7 @@
> >   */
> >
> >  #include "qemu/osdep.h"
> > +#include "qemu/error-report.h"
> >  #include "hw/sysbus.h"
> >  #include "exec/address-spaces.h"
> >  #include "intel_iommu_internal.h"
> > @@ -29,6 +30,7 @@
> >  #include "hw/boards.h"
> >  #include "hw/i386/x86-iommu.h"
> >  #include "hw/pci-host/q35.h"
> > +#include "sysemu/kvm.h"
> >
> >  /*#define DEBUG_INTEL_IOMMU*/
> >  #ifdef DEBUG_INTEL_IOMMU
> > @@ -2458,6 +2460,13 @@ static void vtd_realize(DeviceState *dev, Error **errp)
> >      bus->iommu_opaque = dev;
> >      /* Pseudo address space under root PCI bus. */
> >      pcms->ioapic_as = vtd_host_dma_iommu(bus, s, Q35_PSEUDO_DEVFN_IOAPIC);
> > +
> > +    /* Currently Intel IOMMU IR only support "kernel-irqchip={off|split}" */
> > +    if (kvm_irqchip_in_kernel() && !kvm_irqchip_is_split()) {
> > +        error_report("Intel Interrupt Remapping cannot work with "
> > +                     "kernel-irqchip=on, please use 'split|off'.");
> > +        exit(1);
> > +    }
> >  }
> 
> Shouldn't you be checking whether VT-d interrupt remapping is
> enabled(I'm assuming it's off by default) before you ensure
> kernel-irqchip=off|split ? Doesn't the above imply that one can't use
> VT-d with kernel_irqchip=on (regardless of whether IR is enabled) ?

Yes, we should allow ir=off together with kernel-irqchip=on. Will fix
in v12, thanks!

-- peterx
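
One possible shape of the corrected condition is sketched below. This is not the actual follow-up patch: the struct and function names are hypothetical, and the three booleans stand in for what the quoted code obtains from the IOMMU's intr_supported field, kvm_irqchip_in_kernel() and kvm_irqchip_is_split().

```c
#include <stdbool.h>

/* Hypothetical snapshot of the relevant configuration. */
typedef struct IrqchipConfigSketch {
    bool intr_supported;     /* intremap=on was requested */
    bool irqchip_in_kernel;  /* stands in for kvm_irqchip_in_kernel() */
    bool irqchip_split;      /* stands in for kvm_irqchip_is_split() */
} IrqchipConfigSketch;

/* Returns true only for the combination that should be rejected:
 * IR enabled together with a full in-kernel irqchip. */
static bool ir_config_is_invalid(const IrqchipConfigSketch *c)
{
    return c->intr_supported && c->irqchip_in_kernel && !c->irqchip_split;
}
```

With the extra intr_supported term, VT-d without IR stays usable with kernel-irqchip=on, which is the case David raised.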

* Re: [Qemu-devel] [PATCH v10 16/26] intel_iommu: add support for split irqchip
  2016-06-26 13:27           ` Jan Kiszka
  2016-06-28  6:10             ` Michael S. Tsirkin
  2016-06-28  7:25             ` Peter Xu
@ 2017-01-03  6:15             ` Peter Xu
  2017-01-04 10:33               ` Jan Kiszka
  2 siblings, 1 reply; 63+ messages in thread
From: Peter Xu @ 2017-01-03  6:15 UTC (permalink / raw)
  To: Jan Kiszka, Paolo Bonzini
  Cc: qemu-devel, imammedo, rth, ehabkost, jasowang, marcel, mst,
	rkrcmar, alex.williamson, wexu, davidkiarie4, Valentine Sinitsyn

On Sun, Jun 26, 2016 at 03:27:50PM +0200, Jan Kiszka wrote:
> On 2016-06-26 03:48, Peter Xu wrote:
> > On Sat, Jun 25, 2016 at 05:18:40PM +0200, Jan Kiszka wrote:
> >> On 2016-06-25 15:18, Peter Xu wrote:
> >>> On Sat, Jun 25, 2016 at 10:08:10AM +0200, Jan Kiszka wrote:
> > 
> > [...]
> > 
> >>> I have a thought on how to implement the "sink" you have mentioned:
> >>>
> >>> First of all, in KVM, we provide a new KVM_IRQ_ROUTING_* type, maybe
> >>> called:
> >>>
> >>>   KVM_IRQ_ROUTING_EVENTFD
> >>
> >> Not really, because all sources are either using eventfds, which you can
> >> also terminate in user space (already done for vhost and vfio in certain
> >> scenarios - IIRC) or originate there anyway (IOAPIC).
> > 
> > But how should we handle the cases when the interrupt path are all in
> > kernel?
> 
> There are none which we can't redirect (only full in-kernel irqchip
> would have, but that's unsupported anyway).
> 
> > 
> > For vhost, data should be transfered all inside kernel when split
> > irqchip and irqfd are used: when vhost got data, it triggers irqfd to
> > deliver the interrupt to KVM. Along the way, we should all in kernel.
> > 
> > For vfio, we have vfio_msihandler() who handles the hardware IRQ and
> > then triggers irqfd as well to KVM. Again, it seems all in kernel
> > space, no chance to stop that as well.
> > 
> > Please correct me if I was wrong.
> 
> Look at what vhost is doing e.g.: when a virtqueue is masked, it
> installs an event notifier that records incoming events in a pending
> state field. When it's unmasked, the corresponding KVM irqfd is installed.

Hmm, I think it's time I picked this topic up again... :)

Since it's been half a year since the last post in this thread (I
believe this thread is so-called "cold data" and should be stored on
tape already... and sorry for the long delay), I'd like to give a
quick summary: interrupt remapping still does not work well when we
install faulting interrupts - when that happens, we should inject a
VT-d fault rather than stay silent.

Jan's suggestion above should be a good solution that only needs to
touch the QEMU part - that's its main benefit AFAIU. However, OTOH IMO
we would need to modify all the kvm irqfd users with this fix
(pci-assign, ioapic, ivshmem, vfio-pci, virtio) - all these devices
would need to init with a "fault sink" eventfd, and when we detect a
specific irqfd install error, we install the "fault sink". What's
worse, if we add new devices with irqfd support, we need to implement
the same error handling logic there as well. Am I understanding it
correctly? If so, isn't that awkward?

Now I am reconsidering my KVM_IRQ_ROUTING_EVENTFD proposal - in that
case, we would not need to worry about the users of kvm irqfd, and the
error handling is done automatically even as new irqfd users come in.
The disadvantage is of course that we need to touch both QEMU and KVM,
and we need to touch the KVM API for it (though I think it will only
need a very small change in KVM). I'm not sure whether that would be
worth it.

Or, is there any better way to do it?

Hope I didn't miss anything. Comments are welcome!

Regards,

-- peterx
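
The masking pattern Jan describes for vhost can be modelled in a few lines. This is a toy model under stated assumptions, not QEMU code: IrqSourceSketch and its functions are hypothetical, the "irqfd path" is reduced to a counter, and replaying a pending event on unmask is an assumption about how such a notifier would typically behave.

```c
#include <stdbool.h>

/* Hypothetical model of one interrupt source; not a QEMU structure. */
typedef struct IrqSourceSketch {
    bool masked;         /* true: events terminate in user space */
    bool pending;        /* an event arrived while masked */
    unsigned delivered;  /* events handed to the simulated irqfd path */
} IrqSourceSketch;

/* An event arrives: record it if masked, otherwise deliver it. */
static void source_signal(IrqSourceSketch *s)
{
    if (s->masked) {
        s->pending = true;
    } else {
        s->delivered++;
    }
}

/* Masking swaps in the user-space "sink" that only records events. */
static void source_mask(IrqSourceSketch *s)
{
    s->masked = true;
}

/* Unmasking restores direct delivery and replays one recorded event. */
static void source_unmask(IrqSourceSketch *s)
{
    s->masked = false;
    if (s->pending) {
        s->pending = false;
        s->delivered++;
    }
}
```

A "fault sink" as discussed above would follow the same shape, except that the recorded event would lead to a VT-d fault being reported instead of a later delivery.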

* Re: [Qemu-devel] [PATCH v10 16/26] intel_iommu: add support for split irqchip
  2017-01-03  6:15             ` Peter Xu
@ 2017-01-04 10:33               ` Jan Kiszka
  2017-01-05  2:21                 ` Peter Xu
  0 siblings, 1 reply; 63+ messages in thread
From: Jan Kiszka @ 2017-01-04 10:33 UTC (permalink / raw)
  To: Peter Xu, Paolo Bonzini
  Cc: qemu-devel, imammedo, rth, ehabkost, jasowang, marcel, mst,
	rkrcmar, alex.williamson, wexu, davidkiarie4, Valentine Sinitsyn

On 2017-01-03 07:15, Peter Xu wrote:
> On Sun, Jun 26, 2016 at 03:27:50PM +0200, Jan Kiszka wrote:
>> On 2016-06-26 03:48, Peter Xu wrote:
>>> On Sat, Jun 25, 2016 at 05:18:40PM +0200, Jan Kiszka wrote:
>>>> On 2016-06-25 15:18, Peter Xu wrote:
>>>>> On Sat, Jun 25, 2016 at 10:08:10AM +0200, Jan Kiszka wrote:
>>>
>>> [...]
>>>
>>>>> I have a thought on how to implement the "sink" you have mentioned:
>>>>>
>>>>> First of all, in KVM, we provide a new KVM_IRQ_ROUTING_* type, maybe
>>>>> called:
>>>>>
>>>>>   KVM_IRQ_ROUTING_EVENTFD
>>>>
>>>> Not really, because all sources are either using eventfds, which you can
>>>> also terminate in user space (already done for vhost and vfio in certain
>>>> scenarios - IIRC) or originate there anyway (IOAPIC).
>>>
>>> But how should we handle the cases when the interrupt path are all in
>>> kernel?
>>
>> There are none which we can't redirect (only full in-kernel irqchip
>> would have, but that's unsupported anyway).
>>
>>>
>>> For vhost, data should be transfered all inside kernel when split
>>> irqchip and irqfd are used: when vhost got data, it triggers irqfd to
>>> deliver the interrupt to KVM. Along the way, we should all in kernel.
>>>
>>> For vfio, we have vfio_msihandler() who handles the hardware IRQ and
>>> then triggers irqfd as well to KVM. Again, it seems all in kernel
>>> space, no chance to stop that as well.
>>>
>>> Please correct me if I was wrong.
>>
>> Look at what vhost is doing e.g.: when a virtqueue is masked, it
>> installs an event notifier that records incoming events in a pending
>> state field. When it's unmasked, the corresponding KVM irqfd is installed.
> 
> Hmm I think it's time I pick up this topic up again... :)
> 
> Since it's been half a year from the last post of this thread (I
> believe this thread is the so-called "cold data" and should be stored
> on tapes already... and sorry fot the long delay), I'd like to do a
> quick summary on this: interrupt remap still cannot work well when we
> install fault interrupts - when that happens, we should inject VT-d
> fault, rather than keeping silence.
> 
> The suggestion from Jan above should be a good solution that only need
> to touch qemu part - that's the most benefit AFAIU. However, OTOH IMO
> we need to modify all the kvm irqfd users with this fix (pci-assign,
> ioapic, ivshmem, vfio-pci, virtio) - we need to have all these devices
> init with an "fault sink" eventfd, then when we detected specific
> irqfd install error, we install the "fault sink". What's worse, if we
> add new devices with irqfd support, we need to implement the same
> error handling logic as well. Am I understanding it correctly? If so,
> isn't that awkward?
> 
> Now I am re-thinking about my KVM_IRQ_ROUTING_EVENTFD proposal to do
> it - in that case, we should not need to worry about the users of kvm
> irqfd, and the error handling is done automatically even with new
> irqfd users coming in. The disadvantage is of course we need to touch
> both qemu and kvm, also we need to touch KVM API for it (though I
> think it'll only need very small change in KVM). And not sure whether
> that would worth it.
> 
> Or, any better way to do it?
> 
> Hope I didn't miss anything. Comments are welcomed!
> 

I don't have the details in mind anymore, but I suppose the only
alternative to fixing a QEMU boilerplate code issue with a new KVM kernel
interface is abstracting the common patterns in QEMU that all the irqfd
users share and solving that topic once. Might turn out, though,
that the existing kernel interface prevents this...

Jan



* Re: [Qemu-devel] [PATCH v10 16/26] intel_iommu: add support for split irqchip
  2017-01-04 10:33               ` Jan Kiszka
@ 2017-01-05  2:21                 ` Peter Xu
  0 siblings, 0 replies; 63+ messages in thread
From: Peter Xu @ 2017-01-05  2:21 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Paolo Bonzini, qemu-devel, imammedo, rth, ehabkost, jasowang,
	marcel, mst, rkrcmar, alex.williamson, wexu, davidkiarie4,
	Valentine Sinitsyn

On Wed, Jan 04, 2017 at 11:33:36AM +0100, Jan Kiszka wrote:
> On 2017-01-03 07:15, Peter Xu wrote:
> > On Sun, Jun 26, 2016 at 03:27:50PM +0200, Jan Kiszka wrote:
> >> On 2016-06-26 03:48, Peter Xu wrote:
> >>> On Sat, Jun 25, 2016 at 05:18:40PM +0200, Jan Kiszka wrote:
> >>>> On 2016-06-25 15:18, Peter Xu wrote:
> >>>>> On Sat, Jun 25, 2016 at 10:08:10AM +0200, Jan Kiszka wrote:
> >>>
> >>> [...]
> >>>
> >>>>> I have a thought on how to implement the "sink" you have mentioned:
> >>>>>
> >>>>> First of all, in KVM, we provide a new KVM_IRQ_ROUTING_* type, maybe
> >>>>> called:
> >>>>>
> >>>>>   KVM_IRQ_ROUTING_EVENTFD
> >>>>
> >>>> Not really, because all sources are either using eventfds, which you can
> >>>> also terminate in user space (already done for vhost and vfio in certain
> >>>> scenarios - IIRC) or originate there anyway (IOAPIC).
> >>>
> >>> But how should we handle the cases when the interrupt path are all in
> >>> kernel?
> >>
> >> There are none which we can't redirect (only full in-kernel irqchip
> >> would have, but that's unsupported anyway).
> >>
> >>>
> >>> For vhost, data should be transfered all inside kernel when split
> >>> irqchip and irqfd are used: when vhost got data, it triggers irqfd to
> >>> deliver the interrupt to KVM. Along the way, we should all in kernel.
> >>>
> >>> For vfio, we have vfio_msihandler() who handles the hardware IRQ and
> >>> then triggers irqfd as well to KVM. Again, it seems all in kernel
> >>> space, no chance to stop that as well.
> >>>
> >>> Please correct me if I was wrong.
> >>
> >> Look at what vhost is doing e.g.: when a virtqueue is masked, it
> >> installs an event notifier that records incoming events in a pending
> >> state field. When it's unmasked, the corresponding KVM irqfd is installed.
> > 
> > Hmm I think it's time I pick up this topic up again... :)
> > 
> > Since it's been half a year from the last post of this thread (I
> > believe this thread is the so-called "cold data" and should be stored
> > on tapes already... and sorry fot the long delay), I'd like to do a
> > quick summary on this: interrupt remap still cannot work well when we
> > install fault interrupts - when that happens, we should inject VT-d
> > fault, rather than keeping silence.
> > 
> > The suggestion from Jan above should be a good solution that only need
> > to touch qemu part - that's the most benefit AFAIU. However, OTOH IMO
> > we need to modify all the kvm irqfd users with this fix (pci-assign,
> > ioapic, ivshmem, vfio-pci, virtio) - we need to have all these devices
> > init with an "fault sink" eventfd, then when we detected specific
> > irqfd install error, we install the "fault sink". What's worse, if we
> > add new devices with irqfd support, we need to implement the same
> > error handling logic as well. Am I understanding it correctly? If so,
> > isn't that awkward?
> > 
> > Now I am re-thinking about my KVM_IRQ_ROUTING_EVENTFD proposal to do
> > it - in that case, we should not need to worry about the users of kvm
> > irqfd, and the error handling is done automatically even with new
> > irqfd users coming in. The disadvantage is of course we need to touch
> > both qemu and kvm, also we need to touch KVM API for it (though I
> > think it'll only need very small change in KVM). And not sure whether
> > that would worth it.
> > 
> > Or, any better way to do it?
> > 
> > Hope I didn't miss anything. Comments are welcomed!
> > 
> 
> I don't have the details in mind again, but I suppose the only
> alternative to fixing a QEMU boilerplate code issue with new KVM kernel
> interface is abstracting the common patterns in QEMU that all the irqfd
> users share and solve solve that topic once. Might turn out, though,
> that the exiting kernel interface prevents this...

Hmm, (after a quick glance) I was just afraid that I might need to
touch a lot of code in QEMU even to provide such a common layer for
this single fault tolerance feature.

Then let me think it over again... Thanks Jan!

-- peterx

end of thread, other threads:[~2017-01-05  2:21 UTC | newest]

Thread overview: 63+ messages
2016-06-21  7:47 [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Peter Xu
2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 01/26] x86-iommu: introduce parent class Peter Xu
2016-06-24  7:10   ` [Qemu-devel] [PATCH v10 27/26] intel_iommu: disallow kernel-irqchip=on with IR Peter Xu
2016-06-24  9:20     ` Peter Xu
2016-07-04 15:39       ` Michael S. Tsirkin
2016-07-05  3:51         ` Peter Xu
2016-07-11 10:17     ` David Kiarie
2016-07-11 12:08       ` Peter Xu
2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 02/26] x86-iommu: provide x86_iommu_get_default Peter Xu
2016-07-04 15:16   ` Michael S. Tsirkin
2016-07-05  5:11     ` Peter Xu
2016-07-04 15:17   ` Michael S. Tsirkin
2016-07-05  5:12     ` Peter Xu
2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 03/26] x86-iommu: q35: generalize find_add_as() Peter Xu
2016-07-04 15:16   ` Michael S. Tsirkin
2016-07-04 16:08     ` Paolo Bonzini
2016-07-04 16:35       ` Michael S. Tsirkin
2016-07-04 16:40         ` Paolo Bonzini
2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 04/26] x86-iommu: introduce "intremap" property Peter Xu
2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 05/26] acpi: enable INTR for DMAR report structure Peter Xu
2016-07-04 15:14   ` Michael S. Tsirkin
2016-07-05  6:39     ` Peter Xu
2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 06/26] intel_iommu: allow queued invalidation for IR Peter Xu
2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 07/26] intel_iommu: set IR bit for ECAP register Peter Xu
2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 08/26] acpi: add DMAR scope definition for root IOAPIC Peter Xu
2016-07-04 15:22   ` Michael S. Tsirkin
2016-07-05  7:30     ` Peter Xu
2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 09/26] intel_iommu: define interrupt remap table addr register Peter Xu
2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 10/26] intel_iommu: handle interrupt remap enable Peter Xu
2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 11/26] intel_iommu: define several structs for IOMMU IR Peter Xu
2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 12/26] intel_iommu: add IR translation faults defines Peter Xu
2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 13/26] intel_iommu: Add support for PCI MSI remap Peter Xu
2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 14/26] q35: ioapic: add support for emulated IOAPIC IR Peter Xu
2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 15/26] ioapic: introduce ioapic_entry_parse() helper Peter Xu
2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 16/26] intel_iommu: add support for split irqchip Peter Xu
2016-06-25  8:08   ` Jan Kiszka
2016-06-25 13:18     ` Peter Xu
2016-06-25 15:18       ` Jan Kiszka
2016-06-26  1:48         ` Peter Xu
2016-06-26 13:27           ` Jan Kiszka
2016-06-28  6:10             ` Michael S. Tsirkin
2016-06-28  7:25             ` Peter Xu
2017-01-03  6:15             ` Peter Xu
2017-01-04 10:33               ` Jan Kiszka
2017-01-05  2:21                 ` Peter Xu
2016-07-04 14:32   ` Paolo Bonzini
2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 17/26] x86-iommu: introduce IEC notifiers Peter Xu
2016-07-04 14:22   ` Paolo Bonzini
2016-07-05  7:32     ` Peter Xu
2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 18/26] ioapic: register IOMMU IEC notifier for ioapic Peter Xu
2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 19/26] intel_iommu: Add support for Extended Interrupt Mode Peter Xu
2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 20/26] intel_iommu: add SID validation for IR Peter Xu
2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 21/26] kvm-irqchip: simplify kvm_irqchip_add_msi_route Peter Xu
2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 22/26] kvm-irqchip: i386: add hook for add/remove virq Peter Xu
2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 23/26] kvm-irqchip: x86: add msi route notify fn Peter Xu
2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 24/26] kvm-irqchip: do explicit commit when update irq Peter Xu
2016-06-22  3:42   ` [Qemu-devel] [PATCH v10.2 24/26] kvm-irqchip: introduce kvm_irqchip_update_msi_route_no_commit Peter Xu
2016-07-04 14:23   ` [Qemu-devel] [PATCH v10 24/26] kvm-irqchip: do explicit commit when update irq Paolo Bonzini
2016-07-05  7:35     ` Peter Xu
2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 25/26] intel_iommu: support all masks in interrupt entry cache invalidation Peter Xu
2016-06-21  7:47 ` [Qemu-devel] [PATCH v10 26/26] kvm-all: add trace events for kvm irqchip ops Peter Xu
2016-07-04 14:33 ` [Qemu-devel] [PATCH v10 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU Paolo Bonzini
2016-07-04 16:39 ` Michael S. Tsirkin
