All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/6] x86_iommu/amd: add interrupt remap support
@ 2018-09-11 16:49 Brijesh Singh
  2018-09-11 16:49 ` [Qemu-devel] [PATCH 1/6] x86_iommu: move the kernel-irqchip check in common code Brijesh Singh
                   ` (5 more replies)
  0 siblings, 6 replies; 31+ messages in thread
From: Brijesh Singh @ 2018-09-11 16:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Brijesh Singh, Michael S. Tsirkin, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Marcel Apfelbaum,
	Tom Lendacky, Suravee Suthikulpanit

This series adds the interrupt remapping support for amd-iommu device.

IOMMU spec is available at: https://support.amd.com/TechDocs/48882_IOMMU.pdf

To enable the interrupt remap use below qemu cli
# $QEMU \
  -device amd-iommu,intremap=on

I have tested FC-28 and Ubuntu 18.04 guest. 

Linux guest bootup log shows the interrupt remap supports:

[root@localhost ~]# dmesg | grep -i AMD-Vi
[    0.001761] AMD-Vi: Using IVHD type 0x10
[    0.003051] AMD-Vi: device: 00:03.0 cap: 0040 seg: 0 flags: d1 info 0000
[    0.004007] AMD-Vi:        mmio-addr: 00000000fed80000
[    0.004874] AMD-Vi:   DEV_ALL                        flags: 00
[    0.006236] AMD-Vi:   DEV_SPECIAL(IOAPIC[0])         devid: 00:14.0
[    0.667943] AMD-Vi: Found IOMMU at 0000:00:03.0 cap 0x40
[    0.668727] AMD-Vi: Extended features (0x29d3):
[    0.669874] AMD-Vi: Interrupt remapping enabled
[    0.671074] AMD-Vi: Lazy IO/TLB flushing enabled

cat /proc/interrupts confirms that its using IR

[root@localhost ~]# cat /proc/interrupts 
CPU0       
 0:         40  IR-IO-APIC    2-edge      timer
 1:          9  IR-IO-APIC    1-edge      i8042
 4:       1770  IR-IO-APIC    4-edge      ttyS0
 7:          0  IR-IO-APIC    7-edge      parport0
 8:          1  IR-IO-APIC    8-edge      rtc0
 9:          0  IR-IO-APIC    9-fasteoi   acpi
12:         15  IR-IO-APIC   12-edge      i8042
16:          0  IR-IO-APIC   16-fasteoi   i801_smbus
24:          0   PCI-MSI 49152-edge      AMD-Vi
25:      13070  IR-PCI-MSI 512000-edge      ahci[0000:00:1f.2]
26:         86  IR-PCI-MSI 32768-edge      enp0s2-rx-0
27:        139  IR-PCI-MSI 32769-edge      enp0s2-tx-0
28:          1  IR-PCI-MSI 32770-edge      enp0s2
NMI:          0   Non-maskable interrupts
LOC:      26686   Local timer interrupts
SPU:          0   Spurious interrupts
...
...

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>

Brijesh Singh (6):
  x86_iommu: move the kernel-irqchip check in common code
  x86_iommu/amd: Prepare for interrupt remap support
  x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled
  i386: acpi: add IVHD device entry for IOAPIC
  x86_iommu/amd: Add interrupt remap support when VAPIC is enabled
  x86_iommu/amd: Enable Guest virtual APIC support

 hw/i386/acpi-build.c  |  23 +++-
 hw/i386/amd_iommu.c   | 360 ++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/i386/amd_iommu.h   | 115 +++++++++++++++-
 hw/i386/intel_iommu.c |   7 -
 hw/i386/trace-events  |  14 ++
 hw/i386/x86-iommu.c   |   9 ++
 6 files changed, 516 insertions(+), 12 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 1/6] x86_iommu: move the kernel-irqchip check in common code
  2018-09-11 16:49 [Qemu-devel] [PATCH 0/6] x86_iommu/amd: add interrupt remap support Brijesh Singh
@ 2018-09-11 16:49 ` Brijesh Singh
  2018-09-12  3:45   ` Peter Xu
  2018-09-11 16:49 ` [Qemu-devel] [PATCH 2/6] x86_iommu/amd: Prepare for interrupt remap support Brijesh Singh
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 31+ messages in thread
From: Brijesh Singh @ 2018-09-11 16:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Brijesh Singh, Michael S. Tsirkin, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Marcel Apfelbaum,
	Tom Lendacky, Suravee Suthikulpanit

Interrupt remapping needs kernel-irqchip={off|split} on both Intel and AMD
platforms. Move the check in common place.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 hw/i386/intel_iommu.c | 7 -------
 hw/i386/x86-iommu.c   | 9 +++++++++
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 3dfada1..84dbc20 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3248,13 +3248,6 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
 {
     X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
 
-    /* Currently Intel IOMMU IR only support "kernel-irqchip={off|split}" */
-    if (x86_iommu->intr_supported && kvm_irqchip_in_kernel() &&
-        !kvm_irqchip_is_split()) {
-        error_setg(errp, "Intel Interrupt Remapping cannot work with "
-                         "kernel-irqchip=on, please use 'split|off'.");
-        return false;
-    }
     if (s->intr_eim == ON_OFF_AUTO_ON && !x86_iommu->intr_supported) {
         error_setg(errp, "eim=on cannot be selected without intremap=on");
         return false;
diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
index 8a01a2d..7440cb8 100644
--- a/hw/i386/x86-iommu.c
+++ b/hw/i386/x86-iommu.c
@@ -25,6 +25,7 @@
 #include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "trace.h"
+#include "sysemu/kvm.h"
 
 void x86_iommu_iec_register_notifier(X86IOMMUState *iommu,
                                      iec_notify_fn fn, void *data)
@@ -94,6 +95,14 @@ static void x86_iommu_realize(DeviceState *dev, Error **errp)
         return;
     }
 
+    /* Both Intel and AMD IOMMU IR only support "kernel-irqchip={off|split}" */
+    if (x86_iommu->intr_supported && kvm_irqchip_in_kernel() &&
+        !kvm_irqchip_is_split()) {
+        error_setg(errp, "Interrupt Remapping cannot work with "
+                         "kernel-irqchip=on, please use 'split|off'.");
+        return;
+    }
+
     if (x86_class->realize) {
         x86_class->realize(dev, errp);
     }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 2/6] x86_iommu/amd: Prepare for interrupt remap support
  2018-09-11 16:49 [Qemu-devel] [PATCH 0/6] x86_iommu/amd: add interrupt remap support Brijesh Singh
  2018-09-11 16:49 ` [Qemu-devel] [PATCH 1/6] x86_iommu: move the kernel-irqchip check in common code Brijesh Singh
@ 2018-09-11 16:49 ` Brijesh Singh
  2018-09-12  3:52   ` Peter Xu
  2018-09-11 16:49 ` [Qemu-devel] [PATCH 3/6] x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled Brijesh Singh
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 31+ messages in thread
From: Brijesh Singh @ 2018-09-11 16:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Brijesh Singh, Michael S. Tsirkin, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Marcel Apfelbaum,
	Tom Lendacky, Suravee Suthikulpanit

Register the interrupt remapping callback and read/write ops for the
amd-iommu-ir memory region.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 hw/i386/amd_iommu.c  | 107 +++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/i386/amd_iommu.h  |  17 +++++++-
 hw/i386/trace-events |   5 +++
 3 files changed, 127 insertions(+), 2 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 1fd669f..572ba0a 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -26,6 +26,7 @@
 #include "amd_iommu.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
+#include "hw/i386/apic_internal.h"
 #include "trace.h"
 
 /* used AMD-Vi MMIO registers */
@@ -1026,6 +1027,101 @@ static IOMMUTLBEntry amdvi_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
     return ret;
 }
 
+/* Interrupt remapping for MSI/MSI-X entry */
+static int amdvi_int_remap_msi(AMDVIState *iommu,
+                               MSIMessage *origin,
+                               MSIMessage *translated,
+                               uint16_t sid)
+{
+    int ret;
+
+    assert(origin && translated);
+
+    trace_amdvi_ir_remap_msi_req(origin->address, origin->data, sid);
+
+    if (!iommu || !iommu->intr_enabled) {
+        memcpy(translated, origin, sizeof(*origin));
+        goto out;
+    }
+
+    if (origin->address & AMDVI_MSI_ADDR_HI_MASK) {
+        trace_amdvi_err("MSI address high 32 bits non-zero when "
+                        "Interrupt Remapping enabled.");
+        return -AMDVI_IR_ERR;
+    }
+
+    if ((origin->address & AMDVI_MSI_ADDR_LO_MASK) != APIC_DEFAULT_ADDRESS) {
+        trace_amdvi_err("MSI is not from IOAPIC.");
+        return -AMDVI_IR_ERR;
+    }
+
+out:
+    trace_amdvi_ir_remap_msi(origin->address, origin->data,
+                             translated->address, translated->data);
+    return 0;
+}
+
+static int amdvi_int_remap(X86IOMMUState *iommu,
+                           MSIMessage *origin,
+                           MSIMessage *translated,
+                           uint16_t sid)
+{
+    return amdvi_int_remap_msi(AMD_IOMMU_DEVICE(iommu), origin,
+                               translated, sid);
+}
+
+static MemTxResult amdvi_mem_ir_write(void *opaque, hwaddr addr,
+                                      uint64_t value, unsigned size,
+                                      MemTxAttrs attrs)
+{
+    int ret;
+    MSIMessage from = { 0, 0 }, to = { 0, 0 };
+    uint16_t sid = AMDVI_SB_IOAPIC_ID;
+
+    from.address = (uint64_t) addr + AMDVI_INT_ADDR_FIRST;
+    from.data = (uint32_t) value;
+
+    trace_amdvi_mem_ir_write_req(addr, value, size);
+
+    if (!attrs.unspecified) {
+        /* We have explicit Source ID */
+        sid = attrs.requester_id;
+    }
+
+    ret = amdvi_int_remap_msi(opaque, &from, &to, sid);
+    if (ret < 0) {
+        /* TODO: report error */
+        /* Drop the interrupt */
+        return MEMTX_ERROR;
+    }
+
+    apic_get_class()->send_msi(&to);
+
+    trace_amdvi_mem_ir_write(to.address, to.data);
+    return MEMTX_OK;
+}
+
+static MemTxResult amdvi_mem_ir_read(void *opaque, hwaddr addr,
+                                     uint64_t *data, unsigned size,
+                                     MemTxAttrs attrs)
+{
+    return MEMTX_OK;
+}
+
+static const MemoryRegionOps amdvi_ir_ops = {
+    .read_with_attrs = amdvi_mem_ir_read,
+    .write_with_attrs = amdvi_mem_ir_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .impl = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    },
+    .valid = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    }
+};
+
 static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
 {
     AMDVIState *s = opaque;
@@ -1055,6 +1151,12 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
         address_space_init(&iommu_as[devfn]->as,
                            MEMORY_REGION(&iommu_as[devfn]->iommu),
                            "amd-iommu");
+        memory_region_init_io(&iommu_as[devfn]->iommu_ir, OBJECT(s),
+                              &amdvi_ir_ops, s, "amd-iommu-ir",
+                              AMDVI_INT_ADDR_SIZE);
+        memory_region_add_subregion(MEMORY_REGION(&iommu_as[devfn]->iommu),
+                              AMDVI_INT_ADDR_FIRST,
+                              &iommu_as[devfn]->iommu_ir);
     }
     return &iommu_as[devfn]->as;
 }
@@ -1172,6 +1274,10 @@ static void amdvi_realize(DeviceState *dev, Error **err)
         return;
     }
 
+    /* Pseudo address space under root PCI bus. */
+    pcms->ioapic_as = amdvi_host_dma_iommu(bus, s, AMDVI_SB_IOAPIC_ID);
+    s->intr_enabled = x86_iommu->intr_supported;
+
     /* set up MMIO */
     memory_region_init_io(&s->mmio, OBJECT(s), &mmio_mem_ops, s, "amdvi-mmio",
                           AMDVI_MMIO_SIZE);
@@ -1205,6 +1311,7 @@ static void amdvi_class_init(ObjectClass *klass, void* data)
     dc->vmsd = &vmstate_amdvi;
     dc->hotpluggable = false;
     dc_class->realize = amdvi_realize;
+    dc_class->int_remap = amdvi_int_remap;
     /* Supported by the pc-q35-* machine types */
     dc->user_creatable = true;
 }
diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
index 8740305..74e568b 100644
--- a/hw/i386/amd_iommu.h
+++ b/hw/i386/amd_iommu.h
@@ -206,8 +206,18 @@
 
 #define AMDVI_COMMAND_SIZE   16
 
-#define AMDVI_INT_ADDR_FIRST 0xfee00000
-#define AMDVI_INT_ADDR_LAST  0xfeefffff
+#define AMDVI_INT_ADDR_FIRST    0xfee00000
+#define AMDVI_INT_ADDR_LAST     0xfeefffff
+#define AMDVI_INT_ADDR_SIZE     (AMDVI_INT_ADDR_LAST - AMDVI_INT_ADDR_FIRST + 1)
+#define AMDVI_MSI_ADDR_HI_MASK  (0xffffffff00000000ULL)
+#define AMDVI_MSI_ADDR_LO_MASK  (0x00000000ffffffffULL)
+
+/* Southbridge IOAPIC ID */
+#define AMDVI_SB_IOAPIC_ID      0xa0
+
+/* Interrupt remapping errors */
+#define AMDVI_IR_ERR            0x1
+
 
 #define TYPE_AMD_IOMMU_DEVICE "amd-iommu"
 #define AMD_IOMMU_DEVICE(obj)\
@@ -278,6 +288,9 @@ typedef struct AMDVIState {
 
     /* IOTLB */
     GHashTable *iotlb;
+
+    /* Interrupt remapping */
+    bool intr_enabled;
 } AMDVIState;
 
 #endif
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index 9e6fc4d..41d533c 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -101,6 +101,11 @@ amdvi_mode_invalid(uint8_t level, uint64_t addr)"error: translation level 0x%"PR
 amdvi_page_fault(uint64_t addr) "error: page fault accessing guest physical address 0x%"PRIx64
 amdvi_iotlb_hit(uint8_t bus, uint8_t slot, uint8_t func, uint64_t addr, uint64_t txaddr) "hit iotlb devid %02x:%02x.%x gpa 0x%"PRIx64" hpa 0x%"PRIx64
 amdvi_translation_result(uint8_t bus, uint8_t slot, uint8_t func, uint64_t addr, uint64_t txaddr) "devid: %02x:%02x.%x gpa 0x%"PRIx64" hpa 0x%"PRIx64
+amdvi_mem_ir_write_req(uint64_t addr, uint64_t val, uint32_t size) "addr 0x%"PRIx64" data 0x%"PRIx64" size 0x%"PRIx32
+amdvi_mem_ir_write(uint64_t addr, uint64_t val) "addr 0x%"PRIx64" data 0x%"PRIx64
+amdvi_ir_remap_msi_req(uint64_t addr, uint64_t data, uint8_t devid) "addr 0x%"PRIx64" data 0x%"PRIx64" devid 0x%"PRIx8
+amdvi_ir_remap_msi(uint64_t addr, uint64_t data, uint64_t addr2, uint64_t data2) "(addr 0x%"PRIx64", data 0x%"PRIx64") -> (addr 0x%"PRIx64", data 0x%"PRIx64")"
+amdvi_err(const char *str) "%s"
 
 # hw/i386/vmport.c
 vmport_register(unsigned char command, void *func, void *opaque) "command: 0x%02x func: %p opaque: %p"
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 3/6] x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled
  2018-09-11 16:49 [Qemu-devel] [PATCH 0/6] x86_iommu/amd: add interrupt remap support Brijesh Singh
  2018-09-11 16:49 ` [Qemu-devel] [PATCH 1/6] x86_iommu: move the kernel-irqchip check in common code Brijesh Singh
  2018-09-11 16:49 ` [Qemu-devel] [PATCH 2/6] x86_iommu/amd: Prepare for interrupt remap support Brijesh Singh
@ 2018-09-11 16:49 ` Brijesh Singh
  2018-09-12  3:37   ` Peter Xu
  2018-09-11 16:49 ` [Qemu-devel] [PATCH 4/6] i386: acpi: add IVHD device entry for IOAPIC Brijesh Singh
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 31+ messages in thread
From: Brijesh Singh @ 2018-09-11 16:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Brijesh Singh, Michael S. Tsirkin, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Marcel Apfelbaum,
	Tom Lendacky, Suravee Suthikulpanit

Emulate the interrupt remapping support when guest virtual APIC is
not enabled.

See IOMMU spec: https://support.amd.com/TechDocs/48882_IOMMU.pdf
(section 2.2.5.1) for details information.

When VAPIC is not enabled, it uses interrupt remapping as defined in
Table 20 and Figure 15 from IOMMU spec.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 hw/i386/amd_iommu.c  | 187 +++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/i386/amd_iommu.h  |  60 ++++++++++++++++-
 hw/i386/trace-events |   7 ++
 3 files changed, 253 insertions(+), 1 deletion(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 572ba0a..5ac19df 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -28,6 +28,8 @@
 #include "qemu/error-report.h"
 #include "hw/i386/apic_internal.h"
 #include "trace.h"
+#include "cpu.h"
+#include "hw/i386/apic-msidef.h"
 
 /* used AMD-Vi MMIO registers */
 const char *amdvi_mmio_low[] = {
@@ -1027,6 +1029,119 @@ static IOMMUTLBEntry amdvi_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
     return ret;
 }
 
+static int amdvi_get_irte(AMDVIState *s, MSIMessage *origin, uint64_t *dte,
+                          union irte *irte, uint16_t devid)
+{
+    uint64_t irte_root, offset;
+
+    irte_root = dte[2] & AMDVI_IR_PHYS_ADDR_MASK;
+    offset = (origin->data & AMDVI_IRTE_OFFSET) << 2;
+
+    trace_amdvi_ir_irte(irte_root, offset);
+
+    if (dma_memory_read(&address_space_memory, irte_root + offset,
+                        irte, sizeof(*irte))) {
+        trace_amdvi_ir_err("failed to get irte");
+        return -AMDVI_IR_GET_IRTE;
+    }
+
+    trace_amdvi_ir_irte_val(irte->val);
+
+    return 0;
+}
+
+static void amdvi_generate_msi_message(struct AMDVIIrq *irq, MSIMessage *out)
+{
+    out->address = APIC_DEFAULT_ADDRESS | \
+        (irq->dest_mode << MSI_ADDR_DEST_MODE_SHIFT) | \
+        (irq->redir_hint << MSI_ADDR_REDIRECTION_SHIFT) | \
+        (irq->dest << MSI_ADDR_DEST_ID_SHIFT);
+
+    out->data = (irq->vector << MSI_DATA_VECTOR_SHIFT) | \
+        (irq->delivery_mode << MSI_DATA_DELIVERY_MODE_SHIFT);
+
+    trace_amdvi_ir_generate_msi_message(irq->vector, irq->delivery_mode,
+            irq->dest_mode, irq->dest, irq->redir_hint);
+}
+
+static int amdvi_int_remap_legacy(AMDVIState *iommu,
+                                  MSIMessage *origin,
+                                  MSIMessage *translated,
+                                  uint64_t *dte,
+                                  struct AMDVIIrq *irq,
+                                  uint16_t sid)
+{
+    int ret;
+    union irte irte;
+
+    /* get interrupt remapping table */
+    ret = amdvi_get_irte(iommu, origin, dte, &irte, sid);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (!irte.fields.valid) {
+        trace_amdvi_ir_target_abort("RemapEn is disabled");
+        return -AMDVI_IR_TARGET_ABORT;
+    }
+
+    if (irte.fields.guest_mode) {
+        trace_amdvi_ir_target_abort("guest mode is not zero");
+        return -AMDVI_IR_TARGET_ABORT;
+    }
+
+    if (irte.fields.int_type > AMDVI_IOAPIC_INT_TYPE_ARBITRATED) {
+        trace_amdvi_ir_target_abort("reserved int_type");
+        return -AMDVI_IR_TARGET_ABORT;
+    }
+
+    irq->delivery_mode = irte.fields.int_type;
+    irq->vector = irte.fields.vector;
+    irq->dest_mode = irte.fields.dm;
+    irq->redir_hint = irte.fields.rq_eoi;
+    irq->dest = irte.fields.destination;
+
+    return 0;
+}
+
+static int __amdvi_int_remap_msi(AMDVIState *iommu,
+                                 MSIMessage *origin,
+                                 MSIMessage *translated,
+                                 uint64_t *dte,
+                                 uint16_t sid)
+{
+    int ret;
+    uint8_t int_ctl;
+    struct AMDVIIrq irq = { 0 };
+
+    int_ctl = (dte[2] >> AMDVI_IR_INTCTL_SHIFT) & 3;
+    trace_amdvi_ir_intctl(int_ctl);
+
+    switch (int_ctl) {
+    case AMDVI_IR_INTCTL_PASS:
+        memcpy(translated, origin, sizeof(*origin));
+        return 0;
+    case AMDVI_IR_INTCTL_REMAP:
+        break;
+    case AMDVI_IR_INTCTL_ABORT:
+        trace_amdvi_ir_target_abort("int_ctl abort");
+        return -AMDVI_IR_TARGET_ABORT;
+    default:
+        trace_amdvi_ir_target_abort("int_ctl reserved");
+        return -AMDVI_IR_TARGET_ABORT;
+    }
+
+    ret = amdvi_int_remap_legacy(iommu, origin, translated, dte, &irq, sid);
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* Translate AMDVIIrq to MSI message */
+    amdvi_generate_msi_message(&irq, translated);
+
+    return 0;
+}
+
 /* Interrupt remapping for MSI/MSI-X entry */
 static int amdvi_int_remap_msi(AMDVIState *iommu,
                                MSIMessage *origin,
@@ -1034,6 +1149,9 @@ static int amdvi_int_remap_msi(AMDVIState *iommu,
                                uint16_t sid)
 {
     int ret;
+    uint64_t pass = 0;
+    uint64_t dte[4] = { 0 };
+    uint8_t dest_mode, delivery_mode;
 
     assert(origin && translated);
 
@@ -1055,10 +1173,79 @@ static int amdvi_int_remap_msi(AMDVIState *iommu,
         return -AMDVI_IR_ERR;
     }
 
+    /*
+     * When IOMMU is enabled, interrupt remap request will come either from
+     * IO-APIC or PCI device. If interrupt is from PCI device then it will
+     * have a valid requester id but if the interrupt it from IO-APIC
+     * then requester is will be invalid.
+     */
+    if (sid == X86_IOMMU_SID_INVALID) {
+        sid = AMDVI_SB_IOAPIC_ID;
+    }
+
+    amdvi_get_dte(iommu, sid, dte);
+
+    /* interrupt remapping is disabled */
+    if (!(dte[2] & AMDVI_IR_REMAP_ENABLE)) {
+        memcpy(translated, origin, sizeof(*origin));
+        goto out;
+    }
+
+    /* deliver_mode and dest_mode from MSI data */
+    dest_mode = (origin->data >> 11) & 1;       /* Bit 11 */
+    delivery_mode = (origin->data >> 7) & 7;    /* Bits 10:8 */
+
+    switch (delivery_mode) {
+    case AMDVI_IOAPIC_INT_TYPE_FIXED:
+    case AMDVI_IOAPIC_INT_TYPE_ARBITRATED:
+        trace_amdvi_ir_delivery_mode("fixed/arbitrated");
+        ret = __amdvi_int_remap_msi(iommu, origin, translated, dte, sid);
+        if (ret < 0) {
+            goto remap_fail;
+        } else {
+            goto out;
+        }
+    case AMDVI_IOAPIC_INT_TYPE_SMI:
+        error_report("SMI is not supported!");
+        goto remap_fail;
+    case AMDVI_IOAPIC_INT_TYPE_NMI:
+        pass = dte[3] & AMDVI_DEV_NMI_PASS_MASK;
+        trace_amdvi_ir_delivery_mode("nmi");
+        break;
+    case AMDVI_IOAPIC_INT_TYPE_INIT:
+        pass = dte[3] & AMDVI_DEV_INT_PASS_MASK;
+        trace_amdvi_ir_delivery_mode("init");
+        break;
+    case AMDVI_IOAPIC_INT_TYPE_EINT:
+        pass = dte[3] & AMDVI_DEV_EINT_PASS_MASK;
+        trace_amdvi_ir_delivery_mode("eint");
+        break;
+    default:
+        trace_amdvi_ir_delivery_mode("unsupported delivery_mode");
+        goto remap_fail;
+    }
+
+    /* dest_mode 1 is valid for fixed and arbitrated interrupts only */
+    if (dest_mode) {
+        trace_amdvi_ir_err("invalid dest_mode");
+        goto remap_fail;
+    }
+
+    if (pass) {
+        memcpy(translated, origin, sizeof(*origin));
+    } else {
+        /* pass through is not enabled */
+        trace_amdvi_ir_err("passthrough is not enabled");
+        goto remap_fail;
+    }
+
 out:
     trace_amdvi_ir_remap_msi(origin->address, origin->data,
                              translated->address, translated->data);
     return 0;
+
+remap_fail:
+    return -AMDVI_IR_TARGET_ABORT;
 }
 
 static int amdvi_int_remap(X86IOMMUState *iommu,
diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
index 74e568b..58ef4db 100644
--- a/hw/i386/amd_iommu.h
+++ b/hw/i386/amd_iommu.h
@@ -217,7 +217,65 @@
 
 /* Interrupt remapping errors */
 #define AMDVI_IR_ERR            0x1
-
+#define AMDVI_IR_GET_IRTE       0x2
+#define AMDVI_IR_TARGET_ABORT   0x3
+
+/* Interrupt remapping */
+#define AMDVI_IR_REMAP_ENABLE           1ULL
+#define AMDVI_IR_INTCTL_SHIFT           60
+#define AMDVI_IR_INTCTL_ABORT           0
+#define AMDVI_IR_INTCTL_PASS            1
+#define AMDVI_IR_INTCTL_REMAP           2
+
+#define AMDVI_IR_PHYS_ADDR_MASK         (((1ULL << 45) - 1) << 6)
+
+/* MSI data 10:0 bits (section 2.2.5.1 Fig 14) */
+#define AMDVI_IRTE_OFFSET               0x7ff
+
+/* Delivery mode of MSI data (same as IOAPIC deilver mode encoding) */
+#define AMDVI_IOAPIC_INT_TYPE_FIXED          0x0
+#define AMDVI_IOAPIC_INT_TYPE_ARBITRATED     0x1
+#define AMDVI_IOAPIC_INT_TYPE_SMI            0x2
+#define AMDVI_IOAPIC_INT_TYPE_NMI            0x4
+#define AMDVI_IOAPIC_INT_TYPE_INIT           0x5
+#define AMDVI_IOAPIC_INT_TYPE_EINT           0x7
+
+/* Pass through interrupt */
+#define AMDVI_DEV_INT_PASS_MASK         (1UL << 56)
+#define AMDVI_DEV_EINT_PASS_MASK        (1UL << 57)
+#define AMDVI_DEV_NMI_PASS_MASK         (1UL << 58)
+#define AMDVI_DEV_LINT0_PASS_MASK       (1UL << 62)
+#define AMDVI_DEV_LINT1_PASS_MASK       (1UL << 63)
+
+/* Generic IRQ entry information */
+struct AMDVIIrq {
+    /* Used by both IOAPIC/MSI interrupt remapping */
+    uint8_t trigger_mode;
+    uint8_t vector;
+    uint8_t delivery_mode;
+    uint32_t dest;
+    uint8_t dest_mode;
+
+    /* only used by MSI interrupt remapping */
+    uint8_t redir_hint;
+    uint8_t msi_addr_last_bits;
+};
+
+/* Interrupt remapping table fields (Guest VAPIC not enabled) */
+union irte {
+    uint32_t val;
+    struct {
+        uint32_t valid:1,
+                 no_fault:1,
+                 int_type:3,
+                 rq_eoi:1,
+                 dm:1,
+                 guest_mode:1,
+                 destination:8,
+                 vector:8,
+                 rsvd:8;
+    } fields;
+};
 
 #define TYPE_AMD_IOMMU_DEVICE "amd-iommu"
 #define AMD_IOMMU_DEVICE(obj)\
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index 41d533c..98150c9 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -106,6 +106,13 @@ amdvi_mem_ir_write(uint64_t addr, uint64_t val) "addr 0x%"PRIx64" data 0x%"PRIx6
 amdvi_ir_remap_msi_req(uint64_t addr, uint64_t data, uint8_t devid) "addr 0x%"PRIx64" data 0x%"PRIx64" devid 0x%"PRIx8
 amdvi_ir_remap_msi(uint64_t addr, uint64_t data, uint64_t addr2, uint64_t data2) "(addr 0x%"PRIx64", data 0x%"PRIx64") -> (addr 0x%"PRIx64", data 0x%"PRIx64")"
 amdvi_err(const char *str) "%s"
+amdvi_ir_irte(uint64_t addr, uint64_t data) "addr 0x%"PRIx64" offset 0x%"PRIx64
+amdvi_ir_irte_val(uint32_t data) "data 0x%"PRIx32
+amdvi_ir_err(const char *str) "%s"
+amdvi_ir_intctl(uint8_t val) "int_ctl 0x%"PRIx8
+amdvi_ir_target_abort(const char *str) "%s"
+amdvi_ir_delivery_mode(const char *str) "%s"
+amdvi_ir_generate_msi_message(uint8_t vector, uint8_t delivery_mode, uint8_t dest_mode, uint8_t dest, uint8_t rh) "vector %d delivery-mode %d dest-mode %d dest-id %d rh %d"
 
 # hw/i386/vmport.c
 vmport_register(unsigned char command, void *func, void *opaque) "command: 0x%02x func: %p opaque: %p"
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 4/6] i386: acpi: add IVHD device entry for IOAPIC
  2018-09-11 16:49 [Qemu-devel] [PATCH 0/6] x86_iommu/amd: add interrupt remap support Brijesh Singh
                   ` (2 preceding siblings ...)
  2018-09-11 16:49 ` [Qemu-devel] [PATCH 3/6] x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled Brijesh Singh
@ 2018-09-11 16:49 ` Brijesh Singh
  2018-09-12  4:35   ` Peter Xu
  2018-09-12 16:35   ` Igor Mammedov
  2018-09-11 16:49 ` [Qemu-devel] [PATCH 6/6] x86_iommu/amd: Enable Guest virtual APIC support Brijesh Singh
       [not found] ` <1536684589-11718-6-git-send-email-brijesh.singh@amd.com>
  5 siblings, 2 replies; 31+ messages in thread
From: Brijesh Singh @ 2018-09-11 16:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Brijesh Singh, Michael S. Tsirkin, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Marcel Apfelbaum,
	Tom Lendacky, Suravee Suthikulpanit

When interrupt remapping is enabled, add a special IVHD device
(type IOAPIC) -- which is typically PCI device 14:0.0. Linux IOMMU driver
checks for this special device.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 hw/i386/acpi-build.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index e1ee8ae..5c2c638 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2519,6 +2519,7 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
 static void
 build_amd_iommu(GArray *table_data, BIOSLinker *linker)
 {
+    int ivhd_table_len = 28;
     int iommu_start = table_data->len;
     AMDVIState *s = AMD_IOMMU_DEVICE(x86_iommu_get_default());
 
@@ -2540,8 +2541,16 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
                              (1UL << 6) | /* PrefSup      */
                              (1UL << 7),  /* PPRSup       */
                              1);
+
+    /*
+     * When interrupt remapping is enabled, we add a special IVHD device
+     * for type IO-APIC.
+     */
+    if (s->intr_enabled) {
+        ivhd_table_len += 8;
+    }
     /* IVHD length */
-    build_append_int_noprefix(table_data, 28, 2);
+    build_append_int_noprefix(table_data, ivhd_table_len, 2);
     /* DeviceID */
     build_append_int_noprefix(table_data, s->devid, 2);
     /* Capability offset */
@@ -2565,6 +2574,15 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
      */
     build_append_int_noprefix(table_data, 0x0000001, 4);
 
+    /*
+     * When interrupt remapping is enabled, Linux IOMMU driver also checks
+     * for special IVHD device (type IO-APIC), which is typically presented
+     * as PCI device 14:00.0.
+     */
+    if (s->intr_enabled) {
+        build_append_int_noprefix(table_data, 0x0100a00000000048, 8);
+    }
+
     build_header(linker, table_data, (void *)(table_data->data + iommu_start),
                  "IVRS", table_data->len - iommu_start, 1, NULL, NULL);
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 6/6] x86_iommu/amd: Enable Guest virtual APIC support
  2018-09-11 16:49 [Qemu-devel] [PATCH 0/6] x86_iommu/amd: add interrupt remap support Brijesh Singh
                   ` (3 preceding siblings ...)
  2018-09-11 16:49 ` [Qemu-devel] [PATCH 4/6] i386: acpi: add IVHD device entry for IOAPIC Brijesh Singh
@ 2018-09-11 16:49 ` Brijesh Singh
  2018-09-12  4:52   ` Peter Xu
  2018-09-12 16:38   ` Igor Mammedov
       [not found] ` <1536684589-11718-6-git-send-email-brijesh.singh@amd.com>
  5 siblings, 2 replies; 31+ messages in thread
From: Brijesh Singh @ 2018-09-11 16:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Brijesh Singh, Michael S. Tsirkin, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Marcel Apfelbaum,
	Tom Lendacky, Suravee Suthikulpanit

Now that amd-iommu support interrupt remapping, enable the GASup in IVRS
table and GASup in extended feature register to indicate that IOMMU
support guest virtual APIC mode.

Note that the GAMSup is set to zero to indicate that  Guest Virtual
APIC does not support advanced interrupt features (i.e virtualized
interrupts using the guest virtual APIC).

See Table 21 from IOMMU spec for interrupt virtualization controls

IOMMU spec: https://support.amd.com/TechDocs/48882_IOMMU.pdf

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 hw/i386/acpi-build.c | 3 ++-
 hw/i386/amd_iommu.h  | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 5c2c638..1cbc8ba 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2565,7 +2565,8 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
     build_append_int_noprefix(table_data,
                              (48UL << 30) | /* HATS   */
                              (48UL << 28) | /* GATS   */
-                             (1UL << 2),    /* GTSup  */
+                             (1UL << 2)   | /* GTSup  */
+                             (1UL << 6),    /* GASup  */
                              4);
     /*
      *   Type 1 device entry reporting all devices
diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
index 1dab974..5defaac 100644
--- a/hw/i386/amd_iommu.h
+++ b/hw/i386/amd_iommu.h
@@ -177,7 +177,7 @@
 /* extended feature support */
 #define AMDVI_EXT_FEATURES (AMDVI_FEATURE_PREFETCH | AMDVI_FEATURE_PPR | \
         AMDVI_FEATURE_IA | AMDVI_FEATURE_GT | AMDVI_FEATURE_HE | \
-        AMDVI_GATS_MODE | AMDVI_HATS_MODE)
+        AMDVI_GATS_MODE | AMDVI_HATS_MODE | AMDVI_FEATURE_GA)
 
 /* capabilities header */
 #define AMDVI_CAPAB_FEATURES (AMDVI_CAPAB_FLAT_EXT | \
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 3/6] x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled
  2018-09-11 16:49 ` [Qemu-devel] [PATCH 3/6] x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled Brijesh Singh
@ 2018-09-12  3:37   ` Peter Xu
  2018-09-12 18:50     ` Brijesh Singh
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Xu @ 2018-09-12  3:37 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: qemu-devel, Tom Lendacky, Eduardo Habkost, Michael S. Tsirkin,
	Paolo Bonzini, Suravee Suthikulpanit, Richard Henderson

On Tue, Sep 11, 2018 at 11:49:46AM -0500, Brijesh Singh wrote:
> Emulate the interrupt remapping support when guest virtual APIC is
> not enabled.
> 
> See IOMMU spec: https://support.amd.com/TechDocs/48882_IOMMU.pdf
> (section 2.2.5.1) for details information.
> 
> When VAPIC is not enabled, it uses interrupt remapping as defined in
> Table 20 and Figure 15 from IOMMU spec.
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  hw/i386/amd_iommu.c  | 187 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/i386/amd_iommu.h  |  60 ++++++++++++++++-
>  hw/i386/trace-events |   7 ++
>  3 files changed, 253 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index 572ba0a..5ac19df 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -28,6 +28,8 @@
>  #include "qemu/error-report.h"
>  #include "hw/i386/apic_internal.h"
>  #include "trace.h"
> +#include "cpu.h"
> +#include "hw/i386/apic-msidef.h"
>  
>  /* used AMD-Vi MMIO registers */
>  const char *amdvi_mmio_low[] = {
> @@ -1027,6 +1029,119 @@ static IOMMUTLBEntry amdvi_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
>      return ret;
>  }
>  
> +static int amdvi_get_irte(AMDVIState *s, MSIMessage *origin, uint64_t *dte,
> +                          union irte *irte, uint16_t devid)
> +{
> +    uint64_t irte_root, offset;
> +
> +    irte_root = dte[2] & AMDVI_IR_PHYS_ADDR_MASK;
> +    offset = (origin->data & AMDVI_IRTE_OFFSET) << 2;
> +
> +    trace_amdvi_ir_irte(irte_root, offset);
> +
> +    if (dma_memory_read(&address_space_memory, irte_root + offset,
> +                        irte, sizeof(*irte))) {
> +        trace_amdvi_ir_err("failed to get irte");
> +        return -AMDVI_IR_GET_IRTE;
> +    }
> +
> +    trace_amdvi_ir_irte_val(irte->val);
> +
> +    return 0;
> +}
> +
> +static void amdvi_generate_msi_message(struct AMDVIIrq *irq, MSIMessage *out)
> +{
> +    out->address = APIC_DEFAULT_ADDRESS | \
> +        (irq->dest_mode << MSI_ADDR_DEST_MODE_SHIFT) | \
> +        (irq->redir_hint << MSI_ADDR_REDIRECTION_SHIFT) | \
> +        (irq->dest << MSI_ADDR_DEST_ID_SHIFT);
> +
> +    out->data = (irq->vector << MSI_DATA_VECTOR_SHIFT) | \
> +        (irq->delivery_mode << MSI_DATA_DELIVERY_MODE_SHIFT);
> +
> +    trace_amdvi_ir_generate_msi_message(irq->vector, irq->delivery_mode,
> +            irq->dest_mode, irq->dest, irq->redir_hint);
> +}
> +
> +static int amdvi_int_remap_legacy(AMDVIState *iommu,
> +                                  MSIMessage *origin,
> +                                  MSIMessage *translated,
> +                                  uint64_t *dte,
> +                                  struct AMDVIIrq *irq,
> +                                  uint16_t sid)
> +{
> +    int ret;
> +    union irte irte;
> +
> +    /* get interrupt remapping table */

... get interrupt remapping table "entry"? :)

I see similar wordings in your spec, e.g., Table 20 is named as
"Interrupt Remapping Table Fields - Basic Format", but actually AFAICT
it's for the entry fields.  I'm confused a bit with them.

> +    ret = amdvi_get_irte(iommu, origin, dte, &irte, sid);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    if (!irte.fields.valid) {
> +        trace_amdvi_ir_target_abort("RemapEn is disabled");
> +        return -AMDVI_IR_TARGET_ABORT;
> +    }
> +
> +    if (irte.fields.guest_mode) {
> +        trace_amdvi_ir_target_abort("guest mode is not zero");
> +        return -AMDVI_IR_TARGET_ABORT;
> +    }
> +
> +    if (irte.fields.int_type > AMDVI_IOAPIC_INT_TYPE_ARBITRATED) {
> +        trace_amdvi_ir_target_abort("reserved int_type");
> +        return -AMDVI_IR_TARGET_ABORT;
> +    }
> +
> +    irq->delivery_mode = irte.fields.int_type;
> +    irq->vector = irte.fields.vector;
> +    irq->dest_mode = irte.fields.dm;
> +    irq->redir_hint = irte.fields.rq_eoi;
> +    irq->dest = irte.fields.destination;
> +
> +    return 0;
> +}
> +
> +static int __amdvi_int_remap_msi(AMDVIState *iommu,
> +                                 MSIMessage *origin,
> +                                 MSIMessage *translated,
> +                                 uint64_t *dte,
> +                                 uint16_t sid)
> +{
> +    int ret;
> +    uint8_t int_ctl;
> +    struct AMDVIIrq irq = { 0 };
> +
> +    int_ctl = (dte[2] >> AMDVI_IR_INTCTL_SHIFT) & 3;
> +    trace_amdvi_ir_intctl(int_ctl);
> +
> +    switch (int_ctl) {
> +    case AMDVI_IR_INTCTL_PASS:
> +        memcpy(translated, origin, sizeof(*origin));
> +        return 0;
> +    case AMDVI_IR_INTCTL_REMAP:
> +        break;
> +    case AMDVI_IR_INTCTL_ABORT:
> +        trace_amdvi_ir_target_abort("int_ctl abort");
> +        return -AMDVI_IR_TARGET_ABORT;
> +    default:
> +        trace_amdvi_ir_target_abort("int_ctl reserved");
> +        return -AMDVI_IR_TARGET_ABORT;
> +    }
> +
> +    ret = amdvi_int_remap_legacy(iommu, origin, translated, dte, &irq, sid);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    /* Translate AMDVIIrq to MSI message */
> +    amdvi_generate_msi_message(&irq, translated);
> +
> +    return 0;
> +}
> +
>  /* Interrupt remapping for MSI/MSI-X entry */
>  static int amdvi_int_remap_msi(AMDVIState *iommu,
>                                 MSIMessage *origin,
> @@ -1034,6 +1149,9 @@ static int amdvi_int_remap_msi(AMDVIState *iommu,
>                                 uint16_t sid)
>  {
>      int ret;
> +    uint64_t pass = 0;
> +    uint64_t dte[4] = { 0 };
> +    uint8_t dest_mode, delivery_mode;
>  
>      assert(origin && translated);
>  
> @@ -1055,10 +1173,79 @@ static int amdvi_int_remap_msi(AMDVIState *iommu,
>          return -AMDVI_IR_ERR;
>      }
>  
> +    /*
> +     * When IOMMU is enabled, interrupt remap request will come either from
> +     * IO-APIC or PCI device. If interrupt is from PCI device then it will
> +     * have a valid requester id but if the interrupt it from IO-APIC
> +     * then requester is will be invalid.

s/is/id/

> +     */
> +    if (sid == X86_IOMMU_SID_INVALID) {
> +        sid = AMDVI_SB_IOAPIC_ID;
> +    }
> +
> +    amdvi_get_dte(iommu, sid, dte);

Mind to check the return value?

After all it's provided, and we have the fault path to handle it in
this function.

> +
> +    /* interrupt remapping is disabled */
> +    if (!(dte[2] & AMDVI_IR_REMAP_ENABLE)) {
> +        memcpy(translated, origin, sizeof(*origin));
> +        goto out;
> +    }
> +
> +    /* deliver_mode and dest_mode from MSI data */
> +    dest_mode = (origin->data >> 11) & 1;       /* Bit 11 */
> +    delivery_mode = (origin->data >> 7) & 7;    /* Bits 10:8 */

Nit: The comments are already nice, though IMHO some mask macros would
be nicer, like AMDVI_IR_PHYS_ADDR_MASK.

Also, could you hint me where are these things defined in the spec?
I'm looking at Figure 14, but there MSI data bits 15:11 are defined as
"XXXXXb", and bits 10:8 seems to be part of the interrupt table offset.

> +
> +    switch (delivery_mode) {
> +    case AMDVI_IOAPIC_INT_TYPE_FIXED:
> +    case AMDVI_IOAPIC_INT_TYPE_ARBITRATED:
> +        trace_amdvi_ir_delivery_mode("fixed/arbitrated");
> +        ret = __amdvi_int_remap_msi(iommu, origin, translated, dte, sid);
> +        if (ret < 0) {
> +            goto remap_fail;
> +        } else {
> +            goto out;
> +        }
> +    case AMDVI_IOAPIC_INT_TYPE_SMI:
> +        error_report("SMI is not supported!");
> +        goto remap_fail;

(I'm not sure whether some compiler will try to find the "break" and
 spit things if it failed to do so, so normally I'll keep them
 there... but I might be wrong)

> +    case AMDVI_IOAPIC_INT_TYPE_NMI:
> +        pass = dte[3] & AMDVI_DEV_NMI_PASS_MASK;
> +        trace_amdvi_ir_delivery_mode("nmi");
> +        break;
> +    case AMDVI_IOAPIC_INT_TYPE_INIT:
> +        pass = dte[3] & AMDVI_DEV_INT_PASS_MASK;
> +        trace_amdvi_ir_delivery_mode("init");
> +        break;
> +    case AMDVI_IOAPIC_INT_TYPE_EINT:
> +        pass = dte[3] & AMDVI_DEV_EINT_PASS_MASK;
> +        trace_amdvi_ir_delivery_mode("eint");
> +        break;
> +    default:
> +        trace_amdvi_ir_delivery_mode("unsupported delivery_mode");
> +        goto remap_fail;
> +    }
> +
> +    /* dest_mode 1 is valid for fixed and arbitrated interrupts only */
> +    if (dest_mode) {
> +        trace_amdvi_ir_err("invalid dest_mode");
> +        goto remap_fail;
> +    }

I think this check can be moved even before the switch?

> +
> +    if (pass) {
> +        memcpy(translated, origin, sizeof(*origin));
> +    } else {
> +        /* pass through is not enabled */
> +        trace_amdvi_ir_err("passthrough is not enabled");
> +        goto remap_fail;
> +    }
> +
>  out:
>      trace_amdvi_ir_remap_msi(origin->address, origin->data,
>                               translated->address, translated->data);
>      return 0;
> +
> +remap_fail:
> +    return -AMDVI_IR_TARGET_ABORT;
>  }
>  
>  static int amdvi_int_remap(X86IOMMUState *iommu,
> diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
> index 74e568b..58ef4db 100644
> --- a/hw/i386/amd_iommu.h
> +++ b/hw/i386/amd_iommu.h
> @@ -217,7 +217,65 @@
>  
>  /* Interrupt remapping errors */
>  #define AMDVI_IR_ERR            0x1
> -
> +#define AMDVI_IR_GET_IRTE       0x2
> +#define AMDVI_IR_TARGET_ABORT   0x3
> +
> +/* Interrupt remapping */
> +#define AMDVI_IR_REMAP_ENABLE           1ULL
> +#define AMDVI_IR_INTCTL_SHIFT           60
> +#define AMDVI_IR_INTCTL_ABORT           0
> +#define AMDVI_IR_INTCTL_PASS            1
> +#define AMDVI_IR_INTCTL_REMAP           2
> +
> +#define AMDVI_IR_PHYS_ADDR_MASK         (((1ULL << 45) - 1) << 6)
> +
> +/* MSI data 10:0 bits (section 2.2.5.1 Fig 14) */
> +#define AMDVI_IRTE_OFFSET               0x7ff
> +
> +/* Delivery mode of MSI data (same as IOAPIC deilver mode encoding) */
> +#define AMDVI_IOAPIC_INT_TYPE_FIXED          0x0
> +#define AMDVI_IOAPIC_INT_TYPE_ARBITRATED     0x1
> +#define AMDVI_IOAPIC_INT_TYPE_SMI            0x2
> +#define AMDVI_IOAPIC_INT_TYPE_NMI            0x4
> +#define AMDVI_IOAPIC_INT_TYPE_INIT           0x5
> +#define AMDVI_IOAPIC_INT_TYPE_EINT           0x7
> +
> +/* Pass through interrupt */
> +#define AMDVI_DEV_INT_PASS_MASK         (1UL << 56)
> +#define AMDVI_DEV_EINT_PASS_MASK        (1UL << 57)
> +#define AMDVI_DEV_NMI_PASS_MASK         (1UL << 58)
> +#define AMDVI_DEV_LINT0_PASS_MASK       (1UL << 62)
> +#define AMDVI_DEV_LINT1_PASS_MASK       (1UL << 63)
> +
> +/* Generic IRQ entry information */
> +struct AMDVIIrq {
> +    /* Used by both IOAPIC/MSI interrupt remapping */
> +    uint8_t trigger_mode;
> +    uint8_t vector;
> +    uint8_t delivery_mode;
> +    uint32_t dest;
> +    uint8_t dest_mode;
> +
> +    /* only used by MSI interrupt remapping */
> +    uint8_t redir_hint;
> +    uint8_t msi_addr_last_bits;
> +};

This is VTDIrq.

We're having some similar codes between the vt-d and amd-vi ir
implementations.  I'm thinking to what extend we could share the code.
I don't think it would worth it if we need to spend too much extra
time, but things like this one might still be good candidates that we
can share them at the very beginning since it won't be that hard (like
what you did in patch 1).

(maybe also things like amdvi_generate_msi_message but I'm not sure)

> +
> +/* Interrupt remapping table fields (Guest VAPIC not enabled) */
> +union irte {
> +    uint32_t val;
> +    struct {
> +        uint32_t valid:1,
> +                 no_fault:1,
> +                 int_type:3,
> +                 rq_eoi:1,
> +                 dm:1,
> +                 guest_mode:1,
> +                 destination:8,
> +                 vector:8,
> +                 rsvd:8;
> +    } fields;
> +};
>  
>  #define TYPE_AMD_IOMMU_DEVICE "amd-iommu"
>  #define AMD_IOMMU_DEVICE(obj)\
> diff --git a/hw/i386/trace-events b/hw/i386/trace-events
> index 41d533c..98150c9 100644
> --- a/hw/i386/trace-events
> +++ b/hw/i386/trace-events
> @@ -106,6 +106,13 @@ amdvi_mem_ir_write(uint64_t addr, uint64_t val) "addr 0x%"PRIx64" data 0x%"PRIx6
>  amdvi_ir_remap_msi_req(uint64_t addr, uint64_t data, uint8_t devid) "addr 0x%"PRIx64" data 0x%"PRIx64" devid 0x%"PRIx8
>  amdvi_ir_remap_msi(uint64_t addr, uint64_t data, uint64_t addr2, uint64_t data2) "(addr 0x%"PRIx64", data 0x%"PRIx64") -> (addr 0x%"PRIx64", data 0x%"PRIx64")"
>  amdvi_err(const char *str) "%s"
> +amdvi_ir_irte(uint64_t addr, uint64_t data) "addr 0x%"PRIx64" offset 0x%"PRIx64
> +amdvi_ir_irte_val(uint32_t data) "data 0x%"PRIx32
> +amdvi_ir_err(const char *str) "%s"
> +amdvi_ir_intctl(uint8_t val) "int_ctl 0x%"PRIx8
> +amdvi_ir_target_abort(const char *str) "%s"
> +amdvi_ir_delivery_mode(const char *str) "%s"
> +amdvi_ir_generate_msi_message(uint8_t vector, uint8_t delivery_mode, uint8_t dest_mode, uint8_t dest, uint8_t rh) "vector %d delivery-mode %d dest-mode %d dest-id %d rh %d"
>  
>  # hw/i386/vmport.c
>  vmport_register(unsigned char command, void *func, void *opaque) "command: 0x%02x func: %p opaque: %p"
> -- 
> 2.7.4
> 
> 

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 1/6] x86_iommu: move the kernel-irqchip check in common code
  2018-09-11 16:49 ` [Qemu-devel] [PATCH 1/6] x86_iommu: move the kernel-irqchip check in common code Brijesh Singh
@ 2018-09-12  3:45   ` Peter Xu
  0 siblings, 0 replies; 31+ messages in thread
From: Peter Xu @ 2018-09-12  3:45 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: qemu-devel, Tom Lendacky, Eduardo Habkost, Michael S. Tsirkin,
	Paolo Bonzini, Suravee Suthikulpanit, Richard Henderson

On Tue, Sep 11, 2018 at 11:49:44AM -0500, Brijesh Singh wrote:
> Interrupt remapping needs kernel-irqchip={off|split} on both Intel and AMD
> platforms. Move the check in common place.
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 2/6] x86_iommu/amd: Prepare for interrupt remap support
  2018-09-11 16:49 ` [Qemu-devel] [PATCH 2/6] x86_iommu/amd: Prepare for interrupt remap support Brijesh Singh
@ 2018-09-12  3:52   ` Peter Xu
  2018-09-12 18:59     ` Brijesh Singh
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Xu @ 2018-09-12  3:52 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: qemu-devel, Tom Lendacky, Eduardo Habkost, Michael S. Tsirkin,
	Paolo Bonzini, Suravee Suthikulpanit, Richard Henderson

On Tue, Sep 11, 2018 at 11:49:45AM -0500, Brijesh Singh wrote:
>  static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>  {
>      AMDVIState *s = opaque;
> @@ -1055,6 +1151,12 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>          address_space_init(&iommu_as[devfn]->as,
>                             MEMORY_REGION(&iommu_as[devfn]->iommu),
>                             "amd-iommu");
> +        memory_region_init_io(&iommu_as[devfn]->iommu_ir, OBJECT(s),
> +                              &amdvi_ir_ops, s, "amd-iommu-ir",
> +                              AMDVI_INT_ADDR_SIZE);
> +        memory_region_add_subregion(MEMORY_REGION(&iommu_as[devfn]->iommu),
> +                              AMDVI_INT_ADDR_FIRST,
> +                              &iommu_as[devfn]->iommu_ir);

A pure question: just to make sure this IR region won't be masked out
by other memory regions.  Asked since VT-d is explicitly setting a
higher priority of the memory region for interrupts with
memory_region_add_subregion_overlap().

>      }
>      return &iommu_as[devfn]->as;
>  }
> @@ -1172,6 +1274,10 @@ static void amdvi_realize(DeviceState *dev, Error **err)
>          return;
>      }
>  
> +    /* Pseudo address space under root PCI bus. */
> +    pcms->ioapic_as = amdvi_host_dma_iommu(bus, s, AMDVI_SB_IOAPIC_ID);
> +    s->intr_enabled = x86_iommu->intr_supported;

So does this mean that AMD IR cannot be disabled if declared support?
For VT-d, IR needs to be explicitly enabled otherwise disabled (even
supported).

Otherwise the patch looks good to me.  Thanks,

> +
>      /* set up MMIO */
>      memory_region_init_io(&s->mmio, OBJECT(s), &mmio_mem_ops, s, "amdvi-mmio",
>                            AMDVI_MMIO_SIZE);
> @@ -1205,6 +1311,7 @@ static void amdvi_class_init(ObjectClass *klass, void* data)
>      dc->vmsd = &vmstate_amdvi;
>      dc->hotpluggable = false;
>      dc_class->realize = amdvi_realize;
> +    dc_class->int_remap = amdvi_int_remap;
>      /* Supported by the pc-q35-* machine types */
>      dc->user_creatable = true;
>  }

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 4/6] i386: acpi: add IVHD device entry for IOAPIC
  2018-09-11 16:49 ` [Qemu-devel] [PATCH 4/6] i386: acpi: add IVHD device entry for IOAPIC Brijesh Singh
@ 2018-09-12  4:35   ` Peter Xu
  2018-09-12 19:11     ` Brijesh Singh
  2018-09-12 16:35   ` Igor Mammedov
  1 sibling, 1 reply; 31+ messages in thread
From: Peter Xu @ 2018-09-12  4:35 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: qemu-devel, Tom Lendacky, Eduardo Habkost, Michael S. Tsirkin,
	Paolo Bonzini, Suravee Suthikulpanit, Richard Henderson

On Tue, Sep 11, 2018 at 11:49:47AM -0500, Brijesh Singh wrote:
> When interrupt remapping is enabled, add a special IVHD device
> (type IOAPIC) -- which is typically PCI device 14:0.0. Linux IOMMU driver
> checks for this special device.
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  hw/i386/acpi-build.c | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index e1ee8ae..5c2c638 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2519,6 +2519,7 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
>  static void
>  build_amd_iommu(GArray *table_data, BIOSLinker *linker)
>  {
> +    int ivhd_table_len = 28;
>      int iommu_start = table_data->len;
>      AMDVIState *s = AMD_IOMMU_DEVICE(x86_iommu_get_default());
>  
> @@ -2540,8 +2541,16 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
>                               (1UL << 6) | /* PrefSup      */
>                               (1UL << 7),  /* PPRSup       */
>                               1);
> +
> +    /*
> +     * When interrupt remapping is enabled, we add a special IVHD device
> +     * for type IO-APIC.
> +     */
> +    if (s->intr_enabled) {
> +        ivhd_table_len += 8;
> +    }
>      /* IVHD length */
> -    build_append_int_noprefix(table_data, 28, 2);
> +    build_append_int_noprefix(table_data, ivhd_table_len, 2);
>      /* DeviceID */
>      build_append_int_noprefix(table_data, s->devid, 2);
>      /* Capability offset */
> @@ -2565,6 +2574,15 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
>       */
>      build_append_int_noprefix(table_data, 0x0000001, 4);
>  
> +    /*
> +     * When interrupt remapping is enabled, Linux IOMMU driver also checks
> +     * for special IVHD device (type IO-APIC), which is typically presented
> +     * as PCI device 14:00.0.
> +     */
> +    if (s->intr_enabled) {
> +        build_append_int_noprefix(table_data, 0x0100a00000000048, 8);

Some comments on the bit definition would be nicer, or "please refer
to Table 95 of AMD-Vi spec".

Could I ask how come the 14:00.0?  Is that in the spec somewhere?

And since you explicitly mentioned Linux, then... would it work for
Windows too?

Thanks,

> +    }
> +
>      build_header(linker, table_data, (void *)(table_data->data + iommu_start),
>                   "IVRS", table_data->len - iommu_start, 1, NULL, NULL);
>  }
> -- 
> 2.7.4
> 
> 

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 6/6] x86_iommu/amd: Enable Guest virtual APIC support
  2018-09-11 16:49 ` [Qemu-devel] [PATCH 6/6] x86_iommu/amd: Enable Guest virtual APIC support Brijesh Singh
@ 2018-09-12  4:52   ` Peter Xu
  2018-09-12 21:14     ` Brijesh Singh
  2018-09-13  7:13     ` Suravee Suthikulpanit
  2018-09-12 16:38   ` Igor Mammedov
  1 sibling, 2 replies; 31+ messages in thread
From: Peter Xu @ 2018-09-12  4:52 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: qemu-devel, Tom Lendacky, Eduardo Habkost, Michael S. Tsirkin,
	Paolo Bonzini, Suravee Suthikulpanit, Richard Henderson

On Tue, Sep 11, 2018 at 11:49:49AM -0500, Brijesh Singh wrote:
> Now that amd-iommu support interrupt remapping, enable the GASup in IVRS
> table and GASup in extended feature register to indicate that IOMMU
> support guest virtual APIC mode.
> 
> Note that the GAMSup is set to zero to indicate that  Guest Virtual
> APIC does not support advanced interrupt features (i.e virtualized
> interrupts using the guest virtual APIC).
> 
> See Table 21 from IOMMU spec for interrupt virtualization controls
> 
> IOMMU spec: https://support.amd.com/TechDocs/48882_IOMMU.pdf
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  hw/i386/acpi-build.c | 3 ++-
>  hw/i386/amd_iommu.h  | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 5c2c638..1cbc8ba 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2565,7 +2565,8 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
>      build_append_int_noprefix(table_data,
>                               (48UL << 30) | /* HATS   */
>                               (48UL << 28) | /* GATS   */
> -                             (1UL << 2),    /* GTSup  */
> +                             (1UL << 2)   | /* GTSup  */
> +                             (1UL << 6),    /* GASup  */

Sorry if I misunderstood - is this for nested?

I'm a bit confused here... IIUC in your previous patches you didn't
really implement guest_mode==1 case in IRTEs.  So if you have this set
then the guest should be able to setup IRTEs with guest_mode==1?  How
did it work?

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 4/6] i386: acpi: add IVHD device entry for IOAPIC
  2018-09-11 16:49 ` [Qemu-devel] [PATCH 4/6] i386: acpi: add IVHD device entry for IOAPIC Brijesh Singh
  2018-09-12  4:35   ` Peter Xu
@ 2018-09-12 16:35   ` Igor Mammedov
  2018-09-12 19:24     ` Brijesh Singh
  1 sibling, 1 reply; 31+ messages in thread
From: Igor Mammedov @ 2018-09-12 16:35 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: qemu-devel, Tom Lendacky, Eduardo Habkost, Michael S. Tsirkin,
	Paolo Bonzini, Suravee Suthikulpanit, Richard Henderson

On Tue, 11 Sep 2018 11:49:47 -0500
Brijesh Singh <brijesh.singh@amd.com> wrote:

> When interrupt remapping is enabled, add a special IVHD device
> (type IOAPIC) -- which is typically PCI device 14:0.0. Linux IOMMU driver
> checks for this special device.
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  hw/i386/acpi-build.c | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index e1ee8ae..5c2c638 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2519,6 +2519,7 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
>  static void
>  build_amd_iommu(GArray *table_data, BIOSLinker *linker)
>  {
> +    int ivhd_table_len = 28;
>      int iommu_start = table_data->len;
>      AMDVIState *s = AMD_IOMMU_DEVICE(x86_iommu_get_default());
>  
> @@ -2540,8 +2541,16 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
>                               (1UL << 6) | /* PrefSup      */
>                               (1UL << 7),  /* PPRSup       */
>                               1);
> +
> +    /*
> +     * When interrupt remapping is enabled, we add a special IVHD device
> +     * for type IO-APIC.
> +     */
> +    if (s->intr_enabled) {
> +        ivhd_table_len += 8;
> +    }
>      /* IVHD length */
> -    build_append_int_noprefix(table_data, 28, 2);
> +    build_append_int_noprefix(table_data, ivhd_table_len, 2);
>      /* DeviceID */
>      build_append_int_noprefix(table_data, s->devid, 2);
>      /* Capability offset */
> @@ -2565,6 +2574,15 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
>       */
>      build_append_int_noprefix(table_data, 0x0000001, 4);
>  
> +    /*
> +     * When interrupt remapping is enabled, Linux IOMMU driver also checks
> +     * for special IVHD device (type IO-APIC), which is typically presented
> +     * as PCI device 14:00.0.
Probably it shouldn't be a 'typically' device from somewhere but rather address
fetched from corresponding device model QEMU implements.


> +     */
> +    if (s->intr_enabled) {
> +        build_append_int_noprefix(table_data, 0x0100a00000000048, 8);
                                                 ^^ this is incomprehensible,
where does this magic number comes from and how was it calculated?

> +    }
> +
>      build_header(linker, table_data, (void *)(table_data->data + iommu_start),
>                   "IVRS", table_data->len - iommu_start, 1, NULL, NULL);
>  }

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 6/6] x86_iommu/amd: Enable Guest virtual APIC support
  2018-09-11 16:49 ` [Qemu-devel] [PATCH 6/6] x86_iommu/amd: Enable Guest virtual APIC support Brijesh Singh
  2018-09-12  4:52   ` Peter Xu
@ 2018-09-12 16:38   ` Igor Mammedov
  1 sibling, 0 replies; 31+ messages in thread
From: Igor Mammedov @ 2018-09-12 16:38 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: qemu-devel, Tom Lendacky, Eduardo Habkost, Michael S. Tsirkin,
	Paolo Bonzini, Suravee Suthikulpanit, Richard Henderson

On Tue, 11 Sep 2018 11:49:49 -0500
Brijesh Singh <brijesh.singh@amd.com> wrote:

> Now that amd-iommu support interrupt remapping, enable the GASup in IVRS
> table and GASup in extended feature register to indicate that IOMMU
> support guest virtual APIC mode.
> 
> Note that the GAMSup is set to zero to indicate that  Guest Virtual
> APIC does not support advanced interrupt features (i.e virtualized
> interrupts using the guest virtual APIC).
> 
> See Table 21 from IOMMU spec for interrupt virtualization controls
> 
> IOMMU spec: https://support.amd.com/TechDocs/48882_IOMMU.pdf
Table numbers and URLs tend to change over long time span,
pls exact spec title and revision to above info, so one could
easily google it.

> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  hw/i386/acpi-build.c | 3 ++-
>  hw/i386/amd_iommu.h  | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 5c2c638..1cbc8ba 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2565,7 +2565,8 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
>      build_append_int_noprefix(table_data,
>                               (48UL << 30) | /* HATS   */
>                               (48UL << 28) | /* GATS   */
> -                             (1UL << 2),    /* GTSup  */
> +                             (1UL << 2)   | /* GTSup  */
> +                             (1UL << 6),    /* GASup  */
>                               4);
>      /*
>       *   Type 1 device entry reporting all devices
> diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
> index 1dab974..5defaac 100644
> --- a/hw/i386/amd_iommu.h
> +++ b/hw/i386/amd_iommu.h
> @@ -177,7 +177,7 @@
>  /* extended feature support */
>  #define AMDVI_EXT_FEATURES (AMDVI_FEATURE_PREFETCH | AMDVI_FEATURE_PPR | \
>          AMDVI_FEATURE_IA | AMDVI_FEATURE_GT | AMDVI_FEATURE_HE | \
> -        AMDVI_GATS_MODE | AMDVI_HATS_MODE)
> +        AMDVI_GATS_MODE | AMDVI_HATS_MODE | AMDVI_FEATURE_GA)
>  
>  /* capabilities header */
>  #define AMDVI_CAPAB_FEATURES (AMDVI_CAPAB_FLAT_EXT | \

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 3/6] x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled
  2018-09-12  3:37   ` Peter Xu
@ 2018-09-12 18:50     ` Brijesh Singh
  2018-09-13  2:59       ` Peter Xu
  0 siblings, 1 reply; 31+ messages in thread
From: Brijesh Singh @ 2018-09-12 18:50 UTC (permalink / raw)
  To: Peter Xu
  Cc: brijesh.singh, qemu-devel, Tom Lendacky, Eduardo Habkost,
	Michael S. Tsirkin, Paolo Bonzini, Suravee Suthikulpanit,
	Richard Henderson


Thanks for the quick review feedback.

On 09/11/2018 10:37 PM, Peter Xu wrote:
> On Tue, Sep 11, 2018 at 11:49:46AM -0500, Brijesh Singh wrote:
>> Emulate the interrupt remapping support when guest virtual APIC is
>> not enabled.
>>
>> See IOMMU spec: https://support.amd.com/TechDocs/48882_IOMMU.pdf
>> (section 2.2.5.1) for details information.
>>
>> When VAPIC is not enabled, it uses interrupt remapping as defined in
>> Table 20 and Figure 15 from IOMMU spec.
>>
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Richard Henderson <rth@twiddle.net>
>> Cc: Eduardo Habkost <ehabkost@redhat.com>
>> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
>> Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
>> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> ---
>>   hw/i386/amd_iommu.c  | 187 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>   hw/i386/amd_iommu.h  |  60 ++++++++++++++++-
>>   hw/i386/trace-events |   7 ++
>>   3 files changed, 253 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
>> index 572ba0a..5ac19df 100644
>> --- a/hw/i386/amd_iommu.c
>> +++ b/hw/i386/amd_iommu.c
>> @@ -28,6 +28,8 @@
>>   #include "qemu/error-report.h"
>>   #include "hw/i386/apic_internal.h"
>>   #include "trace.h"
>> +#include "cpu.h"
>> +#include "hw/i386/apic-msidef.h"
>>   
>>   /* used AMD-Vi MMIO registers */
>>   const char *amdvi_mmio_low[] = {
>> @@ -1027,6 +1029,119 @@ static IOMMUTLBEntry amdvi_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
>>       return ret;
>>   }
>>   
>> +static int amdvi_get_irte(AMDVIState *s, MSIMessage *origin, uint64_t *dte,
>> +                          union irte *irte, uint16_t devid)
>> +{
>> +    uint64_t irte_root, offset;
>> +
>> +    irte_root = dte[2] & AMDVI_IR_PHYS_ADDR_MASK;
>> +    offset = (origin->data & AMDVI_IRTE_OFFSET) << 2;
>> +
>> +    trace_amdvi_ir_irte(irte_root, offset);
>> +
>> +    if (dma_memory_read(&address_space_memory, irte_root + offset,
>> +                        irte, sizeof(*irte))) {
>> +        trace_amdvi_ir_err("failed to get irte");
>> +        return -AMDVI_IR_GET_IRTE;
>> +    }
>> +
>> +    trace_amdvi_ir_irte_val(irte->val);
>> +
>> +    return 0;
>> +}
>> +
>> +static void amdvi_generate_msi_message(struct AMDVIIrq *irq, MSIMessage *out)
>> +{
>> +    out->address = APIC_DEFAULT_ADDRESS | \
>> +        (irq->dest_mode << MSI_ADDR_DEST_MODE_SHIFT) | \
>> +        (irq->redir_hint << MSI_ADDR_REDIRECTION_SHIFT) | \
>> +        (irq->dest << MSI_ADDR_DEST_ID_SHIFT);
>> +
>> +    out->data = (irq->vector << MSI_DATA_VECTOR_SHIFT) | \
>> +        (irq->delivery_mode << MSI_DATA_DELIVERY_MODE_SHIFT);
>> +
>> +    trace_amdvi_ir_generate_msi_message(irq->vector, irq->delivery_mode,
>> +            irq->dest_mode, irq->dest, irq->redir_hint);
>> +}
>> +
>> +static int amdvi_int_remap_legacy(AMDVIState *iommu,
>> +                                  MSIMessage *origin,
>> +                                  MSIMessage *translated,
>> +                                  uint64_t *dte,
>> +                                  struct AMDVIIrq *irq,
>> +                                  uint16_t sid)
>> +{
>> +    int ret;
>> +    union irte irte;
>> +
>> +    /* get interrupt remapping table */
> 
> ... get interrupt remapping table "entry"? :)
> 
> I see similar wordings in your spec, e.g., Table 20 is named as
> "Interrupt Remapping Table Fields - Basic Format", but actually AFAICT
> it's for the entry fields.  I'm confused a bit with them.
> 

I was too much in spec hence used the same wording as spec. But, I agree
with you that we should use "... interrupt remapping table entry".


>> +    ret = amdvi_get_irte(iommu, origin, dte, &irte, sid);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    if (!irte.fields.valid) {
>> +        trace_amdvi_ir_target_abort("RemapEn is disabled");
>> +        return -AMDVI_IR_TARGET_ABORT;
>> +    }
>> +
>> +    if (irte.fields.guest_mode) {
>> +        trace_amdvi_ir_target_abort("guest mode is not zero");
>> +        return -AMDVI_IR_TARGET_ABORT;
>> +    }
>> +
>> +    if (irte.fields.int_type > AMDVI_IOAPIC_INT_TYPE_ARBITRATED) {
>> +        trace_amdvi_ir_target_abort("reserved int_type");
>> +        return -AMDVI_IR_TARGET_ABORT;
>> +    }
>> +
>> +    irq->delivery_mode = irte.fields.int_type;
>> +    irq->vector = irte.fields.vector;
>> +    irq->dest_mode = irte.fields.dm;
>> +    irq->redir_hint = irte.fields.rq_eoi;
>> +    irq->dest = irte.fields.destination;
>> +
>> +    return 0;
>> +}
>> +
>> +static int __amdvi_int_remap_msi(AMDVIState *iommu,
>> +                                 MSIMessage *origin,
>> +                                 MSIMessage *translated,
>> +                                 uint64_t *dte,
>> +                                 uint16_t sid)
>> +{
>> +    int ret;
>> +    uint8_t int_ctl;
>> +    struct AMDVIIrq irq = { 0 };
>> +
>> +    int_ctl = (dte[2] >> AMDVI_IR_INTCTL_SHIFT) & 3;
>> +    trace_amdvi_ir_intctl(int_ctl);
>> +
>> +    switch (int_ctl) {
>> +    case AMDVI_IR_INTCTL_PASS:
>> +        memcpy(translated, origin, sizeof(*origin));
>> +        return 0;
>> +    case AMDVI_IR_INTCTL_REMAP:
>> +        break;
>> +    case AMDVI_IR_INTCTL_ABORT:
>> +        trace_amdvi_ir_target_abort("int_ctl abort");
>> +        return -AMDVI_IR_TARGET_ABORT;
>> +    default:
>> +        trace_amdvi_ir_target_abort("int_ctl reserved");
>> +        return -AMDVI_IR_TARGET_ABORT;
>> +    }
>> +
>> +    ret = amdvi_int_remap_legacy(iommu, origin, translated, dte, &irq, sid);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    /* Translate AMDVIIrq to MSI message */
>> +    amdvi_generate_msi_message(&irq, translated);
>> +
>> +    return 0;
>> +}
>> +
>>   /* Interrupt remapping for MSI/MSI-X entry */
>>   static int amdvi_int_remap_msi(AMDVIState *iommu,
>>                                  MSIMessage *origin,
>> @@ -1034,6 +1149,9 @@ static int amdvi_int_remap_msi(AMDVIState *iommu,
>>                                  uint16_t sid)
>>   {
>>       int ret;
>> +    uint64_t pass = 0;
>> +    uint64_t dte[4] = { 0 };
>> +    uint8_t dest_mode, delivery_mode;
>>   
>>       assert(origin && translated);
>>   
>> @@ -1055,10 +1173,79 @@ static int amdvi_int_remap_msi(AMDVIState *iommu,
>>           return -AMDVI_IR_ERR;
>>       }
>>   
>> +    /*
>> +     * When IOMMU is enabled, interrupt remap request will come either from
>> +     * IO-APIC or PCI device. If interrupt is from PCI device then it will
>> +     * have a valid requester id but if the interrupt it from IO-APIC
>> +     * then requester is will be invalid.
> 
> s/is/id/
> 

I will fix the comment thanks

>> +     */
>> +    if (sid == X86_IOMMU_SID_INVALID) {
>> +        sid = AMDVI_SB_IOAPIC_ID;
>> +    }
>> +
>> +    amdvi_get_dte(iommu, sid, dte);
> 
> Mind to check the return value?
> 
> After all it's provided, and we have the fault path to handle it in
> this function.
> 

The amdvi_get_dte(..) does not check the IR bit. It only checks for the
address-translation enabled bit. I can extend the function to check for
IR flag so that we can check the error status of this function.


>> +
>> +    /* interrupt remapping is disabled */
>> +    if (!(dte[2] & AMDVI_IR_REMAP_ENABLE)) {
>> +        memcpy(translated, origin, sizeof(*origin));
>> +        goto out;
>> +    }
>> +
>> +    /* deliver_mode and dest_mode from MSI data */
>> +    dest_mode = (origin->data >> 11) & 1;       /* Bit 11 */
>> +    delivery_mode = (origin->data >> 7) & 7;    /* Bits 10:8 */
> 
> Nit: The comments are already nice, though IMHO some mask macros would
> be nicer, like AMDVI_IR_PHYS_ADDR_MASK.
> 
> Also, could you hint me where are these things defined in the spec?
> I'm looking at Figure 14, but there MSI data bits 15:11 are defined as
> "XXXXXb", and bits 10:8 seems to be part of the interrupt table offset.
> 

Since we are not emulating the Hyper Transport hence we need to look at
the encoding of the delivery mode of MSI data, which  is the same as
APIC/IOAPIC delivery mode encoding. See

* For MSI Data, 
https://pdfs.semanticscholar.org/presentation/9420/c279e942eca568157711ef5c92b800c40a79.pdf
   (page 5)
* For IOAPIC, https://wiki.osdev.org/APIC

They are similar to Hyper Transport MT encoding, but not quite the same.

enum ioapic_irq_destination_types {
         dest_Fixed              = 0,
         dest_LowestPrio         = 1,
         dest_SMI                = 2,
         dest__reserved_1        = 3,
         dest_NMI                = 4,
         dest_INIT               = 5,
         dest__reserved_2        = 6,
         dest_ExtINT             = 7
};

I will add the comments in the code and also provide the link


>> +
>> +    switch (delivery_mode) {
>> +    case AMDVI_IOAPIC_INT_TYPE_FIXED:
>> +    case AMDVI_IOAPIC_INT_TYPE_ARBITRATED:
>> +        trace_amdvi_ir_delivery_mode("fixed/arbitrated");
>> +        ret = __amdvi_int_remap_msi(iommu, origin, translated, dte, sid);
>> +        if (ret < 0) {
>> +            goto remap_fail;
>> +        } else {
>> +            goto out;
>> +        }
>> +    case AMDVI_IOAPIC_INT_TYPE_SMI:
>> +        error_report("SMI is not supported!");
>> +        goto remap_fail;
> 
> (I'm not sure whether some compiler will try to find the "break" and
>   spit things if it failed to do so, so normally I'll keep them
>   there... but I might be wrong)
> 
>> +    case AMDVI_IOAPIC_INT_TYPE_NMI:
>> +        pass = dte[3] & AMDVI_DEV_NMI_PASS_MASK;
>> +        trace_amdvi_ir_delivery_mode("nmi");
>> +        break;
>> +    case AMDVI_IOAPIC_INT_TYPE_INIT:
>> +        pass = dte[3] & AMDVI_DEV_INT_PASS_MASK;
>> +        trace_amdvi_ir_delivery_mode("init");
>> +        break;
>> +    case AMDVI_IOAPIC_INT_TYPE_EINT:
>> +        pass = dte[3] & AMDVI_DEV_EINT_PASS_MASK;
>> +        trace_amdvi_ir_delivery_mode("eint");
>> +        break;
>> +    default:
>> +        trace_amdvi_ir_delivery_mode("unsupported delivery_mode");
>> +        goto remap_fail;
>> +    }
>> +
>> +    /* dest_mode 1 is valid for fixed and arbitrated interrupts only */
>> +    if (dest_mode) {
>> +        trace_amdvi_ir_err("invalid dest_mode");
>> +        goto remap_fail;
>> +    }
> 
> I think this check can be moved even before the switch?
> 

Theoretically yes but it will require adding me additional checks
before the switch statement. The dest_mode check  need me done for
non fixed or arbitrated interrupt type only.


>> +
>> +    if (pass) {
>> +        memcpy(translated, origin, sizeof(*origin));
>> +    } else {
>> +        /* pass through is not enabled */
>> +        trace_amdvi_ir_err("passthrough is not enabled");
>> +        goto remap_fail;
>> +    }
>> +
>>   out:
>>       trace_amdvi_ir_remap_msi(origin->address, origin->data,
>>                                translated->address, translated->data);
>>       return 0;
>> +
>> +remap_fail:
>> +    return -AMDVI_IR_TARGET_ABORT;
>>   }
>>   
>>   static int amdvi_int_remap(X86IOMMUState *iommu,
>> diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
>> index 74e568b..58ef4db 100644
>> --- a/hw/i386/amd_iommu.h
>> +++ b/hw/i386/amd_iommu.h
>> @@ -217,7 +217,65 @@
>>   
>>   /* Interrupt remapping errors */
>>   #define AMDVI_IR_ERR            0x1
>> -
>> +#define AMDVI_IR_GET_IRTE       0x2
>> +#define AMDVI_IR_TARGET_ABORT   0x3
>> +
>> +/* Interrupt remapping */
>> +#define AMDVI_IR_REMAP_ENABLE           1ULL
>> +#define AMDVI_IR_INTCTL_SHIFT           60
>> +#define AMDVI_IR_INTCTL_ABORT           0
>> +#define AMDVI_IR_INTCTL_PASS            1
>> +#define AMDVI_IR_INTCTL_REMAP           2
>> +
>> +#define AMDVI_IR_PHYS_ADDR_MASK         (((1ULL << 45) - 1) << 6)
>> +
>> +/* MSI data 10:0 bits (section 2.2.5.1 Fig 14) */
>> +#define AMDVI_IRTE_OFFSET               0x7ff
>> +
>> +/* Delivery mode of MSI data (same as IOAPIC deilver mode encoding) */
>> +#define AMDVI_IOAPIC_INT_TYPE_FIXED          0x0
>> +#define AMDVI_IOAPIC_INT_TYPE_ARBITRATED     0x1
>> +#define AMDVI_IOAPIC_INT_TYPE_SMI            0x2
>> +#define AMDVI_IOAPIC_INT_TYPE_NMI            0x4
>> +#define AMDVI_IOAPIC_INT_TYPE_INIT           0x5
>> +#define AMDVI_IOAPIC_INT_TYPE_EINT           0x7
>> +
>> +/* Pass through interrupt */
>> +#define AMDVI_DEV_INT_PASS_MASK         (1UL << 56)
>> +#define AMDVI_DEV_EINT_PASS_MASK        (1UL << 57)
>> +#define AMDVI_DEV_NMI_PASS_MASK         (1UL << 58)
>> +#define AMDVI_DEV_LINT0_PASS_MASK       (1UL << 62)
>> +#define AMDVI_DEV_LINT1_PASS_MASK       (1UL << 63)
>> +
>> +/* Generic IRQ entry information */
>> +struct AMDVIIrq {
>> +    /* Used by both IOAPIC/MSI interrupt remapping */
>> +    uint8_t trigger_mode;
>> +    uint8_t vector;
>> +    uint8_t delivery_mode;
>> +    uint32_t dest;
>> +    uint8_t dest_mode;
>> +
>> +    /* only used by MSI interrupt remapping */
>> +    uint8_t redir_hint;
>> +    uint8_t msi_addr_last_bits;
>> +};
> 
> This is VTDIrq.
> 
> We're having some similar codes between the vt-d and amd-vi ir
> implementations.  I'm thinking to what extend we could share the code.
> I don't think it would worth it if we need to spend too much extra
> time, but things like this one might still be good candidates that we
> can share them at the very beginning since it won't be that hard (like
> what you did in patch 1).
> 
> (maybe also things like amdvi_generate_msi_message but I'm not sure)
> 

I will see what can be done to move the VTDIrq struct and msi message
generation in common file.

thanks

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 2/6] x86_iommu/amd: Prepare for interrupt remap support
  2018-09-12  3:52   ` Peter Xu
@ 2018-09-12 18:59     ` Brijesh Singh
  2018-09-13  3:15       ` Peter Xu
  0 siblings, 1 reply; 31+ messages in thread
From: Brijesh Singh @ 2018-09-12 18:59 UTC (permalink / raw)
  To: Peter Xu
  Cc: brijesh.singh, qemu-devel, Tom Lendacky, Eduardo Habkost,
	Michael S. Tsirkin, Paolo Bonzini, Suravee Suthikulpanit,
	Richard Henderson



On 09/11/2018 10:52 PM, Peter Xu wrote:
> On Tue, Sep 11, 2018 at 11:49:45AM -0500, Brijesh Singh wrote:
>>   static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>>   {
>>       AMDVIState *s = opaque;
>> @@ -1055,6 +1151,12 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>>           address_space_init(&iommu_as[devfn]->as,
>>                              MEMORY_REGION(&iommu_as[devfn]->iommu),
>>                              "amd-iommu");
>> +        memory_region_init_io(&iommu_as[devfn]->iommu_ir, OBJECT(s),
>> +                              &amdvi_ir_ops, s, "amd-iommu-ir",
>> +                              AMDVI_INT_ADDR_SIZE);
>> +        memory_region_add_subregion(MEMORY_REGION(&iommu_as[devfn]->iommu),
>> +                              AMDVI_INT_ADDR_FIRST,
>> +                              &iommu_as[devfn]->iommu_ir);
> 
> A pure question: just to make sure this IR region won't be masked out
> by other memory regions.  Asked since VT-d is explicitly setting a
> higher priority of the memory region for interrupts with
> memory_region_add_subregion_overlap().
> 

Hmm, I was hoping that this IR region will not be masked out by other
regions but if it does then we will have trouble. thanks for pointing
this out, I think we can do similar thing as VT-d and make the region
as high priority so that we get memops invoked.


>>       }
>>       return &iommu_as[devfn]->as;
>>   }
>> @@ -1172,6 +1274,10 @@ static void amdvi_realize(DeviceState *dev, Error **err)
>>           return;
>>       }
>>   
>> +    /* Pseudo address space under root PCI bus. */
>> +    pcms->ioapic_as = amdvi_host_dma_iommu(bus, s, AMDVI_SB_IOAPIC_ID);
>> +    s->intr_enabled = x86_iommu->intr_supported;
> 
> So does this mean that AMD IR cannot be disabled if declared support?
> For VT-d, IR needs to be explicitly enabled otherwise disabled (even
> supported).
> 


Yes, once its declared as supported then it can not disabled. Its
upto the guest OS to decide whether it want to use the intr remapping
feature by parsing the IVRS. This also brings question, should we
just enable it by default because its guest OS decision whether it
wants to use it or not.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 4/6] i386: acpi: add IVHD device entry for IOAPIC
  2018-09-12  4:35   ` Peter Xu
@ 2018-09-12 19:11     ` Brijesh Singh
  2018-09-13  3:20       ` Peter Xu
  0 siblings, 1 reply; 31+ messages in thread
From: Brijesh Singh @ 2018-09-12 19:11 UTC (permalink / raw)
  To: Peter Xu
  Cc: brijesh.singh, qemu-devel, Tom Lendacky, Eduardo Habkost,
	Michael S. Tsirkin, Paolo Bonzini, Suravee Suthikulpanit,
	Richard Henderson



On 09/11/2018 11:35 PM, Peter Xu wrote:
> On Tue, Sep 11, 2018 at 11:49:47AM -0500, Brijesh Singh wrote:
>> When interrupt remapping is enabled, add a special IVHD device
>> (type IOAPIC) -- which is typically PCI device 14:0.0. Linux IOMMU driver
>> checks for this special device.
>>
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Richard Henderson <rth@twiddle.net>
>> Cc: Eduardo Habkost <ehabkost@redhat.com>
>> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
>> Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
>> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> ---
>>   hw/i386/acpi-build.c | 20 +++++++++++++++++++-
>>   1 file changed, 19 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index e1ee8ae..5c2c638 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -2519,6 +2519,7 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
>>   static void
>>   build_amd_iommu(GArray *table_data, BIOSLinker *linker)
>>   {
>> +    int ivhd_table_len = 28;
>>       int iommu_start = table_data->len;
>>       AMDVIState *s = AMD_IOMMU_DEVICE(x86_iommu_get_default());
>>   
>> @@ -2540,8 +2541,16 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
>>                                (1UL << 6) | /* PrefSup      */
>>                                (1UL << 7),  /* PPRSup       */
>>                                1);
>> +
>> +    /*
>> +     * When interrupt remapping is enabled, we add a special IVHD device
>> +     * for type IO-APIC.
>> +     */
>> +    if (s->intr_enabled) {
>> +        ivhd_table_len += 8;
>> +    }
>>       /* IVHD length */
>> -    build_append_int_noprefix(table_data, 28, 2);
>> +    build_append_int_noprefix(table_data, ivhd_table_len, 2);
>>       /* DeviceID */
>>       build_append_int_noprefix(table_data, s->devid, 2);
>>       /* Capability offset */
>> @@ -2565,6 +2574,15 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
>>        */
>>       build_append_int_noprefix(table_data, 0x0000001, 4);
>>   
>> +    /*
>> +     * When interrupt remapping is enabled, Linux IOMMU driver also checks
>> +     * for special IVHD device (type IO-APIC), which is typically presented
>> +     * as PCI device 14:00.0.
>> +     */
>> +    if (s->intr_enabled) {
>> +        build_append_int_noprefix(table_data, 0x0100a00000000048, 8);
> 
> Some comments on the bit definition would be nicer, or "please refer
> to Table 95 of AMD-Vi spec".
> 
> Could I ask how come the 14:00.0?  Is that in the spec somewhere?
> 
> And since you explicitly mentioned Linux, then... would it work for
> Windows too?
> 

The PCI 14:00.0 is SouthBridge IOAPIC device. On bare metal the timer
subsystem is connected to the SB IOAPIC. The IVRS table must contains
the entry of SB IOAPIC otherwise Linux will not enable the IR mapping
while parsing the IVRS.

On bare meta system, IVRS will always have entry for SB IOAPIC. As per
Windows is concerned, I am not sure if Windows support interrupt remap.
If it does, adding the SB IOAPIC devid should not cause any problem
to it because its always available on bare metal system.

Here is linux commit

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c2ff5cf5294bcbd7fa50f7d860e90a66db7e5059

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 4/6] i386: acpi: add IVHD device entry for IOAPIC
  2018-09-12 16:35   ` Igor Mammedov
@ 2018-09-12 19:24     ` Brijesh Singh
  2018-09-13 18:18       ` Michael S. Tsirkin
  0 siblings, 1 reply; 31+ messages in thread
From: Brijesh Singh @ 2018-09-12 19:24 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: brijesh.singh, qemu-devel, Tom Lendacky, Eduardo Habkost,
	Michael S. Tsirkin, Paolo Bonzini, Suravee Suthikulpanit,
	Richard Henderson



On 09/12/2018 11:35 AM, Igor Mammedov wrote:
...

>>   
>> +    /*
>> +     * When interrupt remapping is enabled, Linux IOMMU driver also checks
>> +     * for special IVHD device (type IO-APIC), which is typically presented
>> +     * as PCI device 14:00.0.
> Probably it shouldn't be a 'typically' device from somewhere but rather address
> fetched from corresponding device model QEMU implements.
> 

IOAPIC is not presented as a true PCI device to guest OS. When IOMMU is
enabled a pseudo address space to added under root PCI bus. PCI 14:0.0
presents to this pseudo device.

> 
>> +     */
>> +    if (s->intr_enabled) {
>> +        build_append_int_noprefix(table_data, 0x0100a00000000048, 8);
>                                                   ^^ this is incomprehensible,
> where does this magic number comes from and how was it calculated?
> 

In order to provide interrupt remap support, a special IVHD device need
to be added,  the magic number uses the format defined in Table 95 (IVHD
device entry type codes).

0x01 00a0 00 00 0000 48

Byte 0: 0x48 (special device)
Byte 1 & 2: must be zero
Byte 3: 0 (dte setting)
Byte 4: 0 (handle)
Byte 5 & 6: IOAPIC devfn (14:0.0)
Byte 7: 0x1 (IOAPIC) - See Table 97 in spec


>> +    }
>> +
>>       build_header(linker, table_data, (void *)(table_data->data + iommu_start),
>>                    "IVRS", table_data->len - iommu_start, 1, NULL, NULL);
>>   }
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 6/6] x86_iommu/amd: Enable Guest virtual APIC support
  2018-09-12  4:52   ` Peter Xu
@ 2018-09-12 21:14     ` Brijesh Singh
  2018-09-13  8:36       ` Suravee Suthikulpanit
  2018-09-13  7:13     ` Suravee Suthikulpanit
  1 sibling, 1 reply; 31+ messages in thread
From: Brijesh Singh @ 2018-09-12 21:14 UTC (permalink / raw)
  To: Peter Xu
  Cc: brijesh.singh, qemu-devel, Tom Lendacky, Eduardo Habkost,
	Michael S. Tsirkin, Paolo Bonzini, Suravee Suthikulpanit,
	Richard Henderson



On 09/11/2018 11:52 PM, Peter Xu wrote:
...

>>
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index 5c2c638..1cbc8ba 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -2565,7 +2565,8 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
>>       build_append_int_noprefix(table_data,
>>                                (48UL << 30) | /* HATS   */
>>                                (48UL << 28) | /* GATS   */
>> -                             (1UL << 2),    /* GTSup  */
>> +                             (1UL << 2)   | /* GTSup  */
>> +                             (1UL << 6),    /* GASup  */
> 
> Sorry if I misunderstood - is this for nested?
> 
> I'm a bit confused here... IIUC in your previous patches you didn't
> really implement guest_mode==1 case in IRTEs.  So if you have this set
> then the guest should be able to setup IRTEs with guest_mode==1?  How
> did it work?
> 

Yes, sometime spec can be confusing ;) let me see if I can explain it.

Suravee, please correct me if I misunderstood something

- A legacy interrupt remap support is available on all IOMMU versions.

   Guest OS makes the decision whether to use the feature. Guest OS
   sets the IV bit in DTE to indicate when there is a valid interrupt
   map (See Table 7 DTE definition).

   In this mode, IRTE is 32-bit. The field details are available
   in Table 20 - Section 2.2.5.2. The third patch in this series
   implements the support for this case.

   (NOTE: As I explained [1], the Linux guest enables intr remap only
    when it sees a special IVHD IOAPIC type device is present in IVRS)

   [1] https://marc.info/?l=qemu-devel&m=153677956506196&w=2

- When AVIC is used in the guest, the intr remap logic is different.

   Based on the guest_mode, IOMMU and AVIC may need to work together
   to remap the interrupts (IOMMU spec refers this as Guest Virtual APIC
   Enabled -- Section 2.2.5.3).

   The GASup bit in extended feature register and IVHD IOMMU feature
   reporting field in IVRS are used to tell whether the IOMMU supports
   intr remap when AVIC is enabled in the guest.

   In this mode, the IRTE is 128-bit.

   To make things interesting, there is a GAMSup bit in extended
   feature register. It is used by IOMMU to tell the AVIC supported
   modes:

   a) 0 = intr_remap only (i.e IOMMU can remap the interrupt on its own)
   b) 1 = Both guest AVIC and IOMMU will work together to remap the intr

   The patch 5 in the series implements the intr_remap (#a)

   I have not implemented the mode=1. I am not sure if its worth
   implementing this mode for the emulated IOMMU case.

   See Table 21 for control knobs from IOMMU. This patch series
   implements the first three rows.

I hope my explanation does not add more confusions ;)

-Brijesh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 3/6] x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled
  2018-09-12 18:50     ` Brijesh Singh
@ 2018-09-13  2:59       ` Peter Xu
  0 siblings, 0 replies; 31+ messages in thread
From: Peter Xu @ 2018-09-13  2:59 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: qemu-devel, Tom Lendacky, Eduardo Habkost, Michael S. Tsirkin,
	Paolo Bonzini, Suravee Suthikulpanit, Richard Henderson

On Wed, Sep 12, 2018 at 01:50:34PM -0500, Brijesh Singh wrote:

[...]

> > > +     */
> > > +    if (sid == X86_IOMMU_SID_INVALID) {
> > > +        sid = AMDVI_SB_IOAPIC_ID;
> > > +    }
> > > +
> > > +    amdvi_get_dte(iommu, sid, dte);
> > 
> > Mind to check the return value?
> > 
> > After all it's provided, and we have the fault path to handle it in
> > this function.
> > 
> 
> The amdvi_get_dte(..) does not check the IR bit. It only checks for the
> address-translation enabled bit. I can extend the function to check for
> IR flag so that we can check the error status of this function.

It seems to me that amdvi_get_dte() is checking against things like:
general read errors, reserved bits error, and proper set of
AMDVI_DEV_VALID.  These seem still make sense to IR.

> 
> 
> > > +
> > > +    /* interrupt remapping is disabled */
> > > +    if (!(dte[2] & AMDVI_IR_REMAP_ENABLE)) {
> > > +        memcpy(translated, origin, sizeof(*origin));
> > > +        goto out;
> > > +    }
> > > +
> > > +    /* deliver_mode and dest_mode from MSI data */
> > > +    dest_mode = (origin->data >> 11) & 1;       /* Bit 11 */
> > > +    delivery_mode = (origin->data >> 7) & 7;    /* Bits 10:8 */
> > 
> > Nit: The comments are already nice, though IMHO some mask macros would
> > be nicer, like AMDVI_IR_PHYS_ADDR_MASK.
> > 
> > Also, could you hint me where are these things defined in the spec?
> > I'm looking at Figure 14, but there MSI data bits 15:11 are defined as
> > "XXXXXb", and bits 10:8 seems to be part of the interrupt table offset.
> > 
> 
> Since we are not emulating the Hyper Transport hence we need to look at
> the encoding of the delivery mode of MSI data, which  is the same as
> APIC/IOAPIC delivery mode encoding. See
> 
> * For MSI Data, https://pdfs.semanticscholar.org/presentation/9420/c279e942eca568157711ef5c92b800c40a79.pdf
>   (page 5)

[1]

> * For IOAPIC, https://wiki.osdev.org/APIC
> 
> They are similar to Hyper Transport MT encoding, but not quite the same.
> 
> enum ioapic_irq_destination_types {
>         dest_Fixed              = 0,
>         dest_LowestPrio         = 1,
>         dest_SMI                = 2,
>         dest__reserved_1        = 3,
>         dest_NMI                = 4,
>         dest_INIT               = 5,
>         dest__reserved_2        = 6,
>         dest_ExtINT             = 7
> };
> 
> I will add the comments in the code and also provide the link

I just misunderstood since you did this before the switch.  Maybe you
could consider move this after the switch then since these dest_mode
things are not valid for all the cases.

Meanwhile, you are refering to general MSI/IOAPIC definitions, then I
don't see why dest_mode is bit 11 of MSI data; what I see is that it
should be bit 2 of MSI address.  That's exactly on the slides you
provided [1].  Would you help double confirm?

[...]

> > > +/* Generic IRQ entry information */
> > > +struct AMDVIIrq {
> > > +    /* Used by both IOAPIC/MSI interrupt remapping */
> > > +    uint8_t trigger_mode;
> > > +    uint8_t vector;
> > > +    uint8_t delivery_mode;
> > > +    uint32_t dest;
> > > +    uint8_t dest_mode;
> > > +
> > > +    /* only used by MSI interrupt remapping */
> > > +    uint8_t redir_hint;
> > > +    uint8_t msi_addr_last_bits;
> > > +};
> > 
> > This is VTDIrq.
> > 
> > We're having some similar codes between the vt-d and amd-vi ir
> > implementations.  I'm thinking to what extend we could share the code.
> > I don't think it would worth it if we need to spend too much extra
> > time, but things like this one might still be good candidates that we
> > can share them at the very beginning since it won't be that hard (like
> > what you did in patch 1).
> > 
> > (maybe also things like amdvi_generate_msi_message but I'm not sure)
> > 
> 
> I will see what can be done to move the VTDIrq struct and msi message
> generation in common file.

That'll be nice.  Thank you.

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 2/6] x86_iommu/amd: Prepare for interrupt remap support
  2018-09-12 18:59     ` Brijesh Singh
@ 2018-09-13  3:15       ` Peter Xu
  2018-09-13  8:15         ` Suravee Suthikulpanit
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Xu @ 2018-09-13  3:15 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: qemu-devel, Tom Lendacky, Eduardo Habkost, Michael S. Tsirkin,
	Paolo Bonzini, Suravee Suthikulpanit, Richard Henderson

On Wed, Sep 12, 2018 at 01:59:06PM -0500, Brijesh Singh wrote:

[...]

> > >       }
> > >       return &iommu_as[devfn]->as;
> > >   }
> > > @@ -1172,6 +1274,10 @@ static void amdvi_realize(DeviceState *dev, Error **err)
> > >           return;
> > >       }
> > > +    /* Pseudo address space under root PCI bus. */
> > > +    pcms->ioapic_as = amdvi_host_dma_iommu(bus, s, AMDVI_SB_IOAPIC_ID);
> > > +    s->intr_enabled = x86_iommu->intr_supported;
> > 
> > So does this mean that AMD IR cannot be disabled if declared support?
> > For VT-d, IR needs to be explicitly enabled otherwise disabled (even
> > supported).
> > 
> 
> 
> Yes, once its declared as supported then it can not disabled. Its
> upto the guest OS to decide whether it want to use the intr remapping
> feature by parsing the IVRS. This also brings question, should we
> just enable it by default because its guest OS decision whether it
> wants to use it or not.

It's by default off?  I mean:

    DEFINE_PROP_BOOL("intremap", X86IOMMUState, intr_supported, false),

Then it's good to me, thanks.  You could add a comment there
explicitly mentioning that "IR will be enabled as long as declared
support for AMD" since it might confuse some of the people like me...
but it's optional.

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 4/6] i386: acpi: add IVHD device entry for IOAPIC
  2018-09-12 19:11     ` Brijesh Singh
@ 2018-09-13  3:20       ` Peter Xu
  0 siblings, 0 replies; 31+ messages in thread
From: Peter Xu @ 2018-09-13  3:20 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: qemu-devel, Tom Lendacky, Eduardo Habkost, Michael S. Tsirkin,
	Paolo Bonzini, Suravee Suthikulpanit, Richard Henderson

On Wed, Sep 12, 2018 at 02:11:10PM -0500, Brijesh Singh wrote:
> 
> 
> On 09/11/2018 11:35 PM, Peter Xu wrote:
> > On Tue, Sep 11, 2018 at 11:49:47AM -0500, Brijesh Singh wrote:
> > > When interrupt remapping is enabled, add a special IVHD device
> > > (type IOAPIC) -- which is typically PCI device 14:0.0. Linux IOMMU driver
> > > checks for this special device.
> > > 
> > > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > Cc: Richard Henderson <rth@twiddle.net>
> > > Cc: Eduardo Habkost <ehabkost@redhat.com>
> > > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> > > Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
> > > Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
> > > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > > ---
> > >   hw/i386/acpi-build.c | 20 +++++++++++++++++++-
> > >   1 file changed, 19 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > index e1ee8ae..5c2c638 100644
> > > --- a/hw/i386/acpi-build.c
> > > +++ b/hw/i386/acpi-build.c
> > > @@ -2519,6 +2519,7 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
> > >   static void
> > >   build_amd_iommu(GArray *table_data, BIOSLinker *linker)
> > >   {
> > > +    int ivhd_table_len = 28;
> > >       int iommu_start = table_data->len;
> > >       AMDVIState *s = AMD_IOMMU_DEVICE(x86_iommu_get_default());
> > > @@ -2540,8 +2541,16 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
> > >                                (1UL << 6) | /* PrefSup      */
> > >                                (1UL << 7),  /* PPRSup       */
> > >                                1);
> > > +
> > > +    /*
> > > +     * When interrupt remapping is enabled, we add a special IVHD device
> > > +     * for type IO-APIC.
> > > +     */
> > > +    if (s->intr_enabled) {
> > > +        ivhd_table_len += 8;
> > > +    }
> > >       /* IVHD length */
> > > -    build_append_int_noprefix(table_data, 28, 2);
> > > +    build_append_int_noprefix(table_data, ivhd_table_len, 2);
> > >       /* DeviceID */
> > >       build_append_int_noprefix(table_data, s->devid, 2);
> > >       /* Capability offset */
> > > @@ -2565,6 +2574,15 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
> > >        */
> > >       build_append_int_noprefix(table_data, 0x0000001, 4);
> > > +    /*
> > > +     * When interrupt remapping is enabled, Linux IOMMU driver also checks
> > > +     * for special IVHD device (type IO-APIC), which is typically presented
> > > +     * as PCI device 14:00.0.
> > > +     */
> > > +    if (s->intr_enabled) {
> > > +        build_append_int_noprefix(table_data, 0x0100a00000000048, 8);
> > 
> > Some comments on the bit definition would be nicer, or "please refer
> > to Table 95 of AMD-Vi spec".
> > 
> > Could I ask how come the 14:00.0?  Is that in the spec somewhere?
> > 
> > And since you explicitly mentioned Linux, then... would it work for
> > Windows too?
> > 
> 
> The PCI 14:00.0 is SouthBridge IOAPIC device. On bare metal the timer
> subsystem is connected to the SB IOAPIC. The IVRS table must contains
> the entry of SB IOAPIC otherwise Linux will not enable the IR mapping
> while parsing the IVRS.
> 
> On bare meta system, IVRS will always have entry for SB IOAPIC. As per
> Windows is concerned, I am not sure if Windows support interrupt remap.
> If it does, adding the SB IOAPIC devid should not cause any problem
> to it because its always available on bare metal system.
> 
> Here is linux commit
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c2ff5cf5294bcbd7fa50f7d860e90a66db7e5059

Thanks for these information.  Please feel free to add some of the
information into the comment, that might explain itself better.  When
referring to commits, I would suggest just use "Please refer to Linux
commit xxx (xxx)" rather than the link though.

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 6/6] x86_iommu/amd: Enable Guest virtual APIC support
  2018-09-12  4:52   ` Peter Xu
  2018-09-12 21:14     ` Brijesh Singh
@ 2018-09-13  7:13     ` Suravee Suthikulpanit
  1 sibling, 0 replies; 31+ messages in thread
From: Suravee Suthikulpanit @ 2018-09-13  7:13 UTC (permalink / raw)
  To: Peter Xu, Brijesh Singh
  Cc: qemu-devel, Tom Lendacky, Eduardo Habkost, Michael S. Tsirkin,
	Paolo Bonzini, Richard Henderson

Peter,

On 9/12/18 11:52 AM, Peter Xu wrote:
> On Tue, Sep 11, 2018 at 11:49:49AM -0500, Brijesh Singh wrote:
>> Now that amd-iommu support interrupt remapping, enable the GASup in IVRS
>> table and GASup in extended feature register to indicate that IOMMU
>> support guest virtual APIC mode.
>>
>> Note that the GAMSup is set to zero to indicate that  Guest Virtual
>> APIC does not support advanced interrupt features (i.e virtualized
>> interrupts using the guest virtual APIC).
>>
>> See Table 21 from IOMMU spec for interrupt virtualization controls
>>
>> IOMMU spec: https://support.amd.com/TechDocs/48882_IOMMU.pdf
>>
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Richard Henderson <rth@twiddle.net>
>> Cc: Eduardo Habkost <ehabkost@redhat.com>
>> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
>> Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
>> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> ---
>>   hw/i386/acpi-build.c | 3 ++-
>>   hw/i386/amd_iommu.h  | 2 +-
>>   2 files changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index 5c2c638..1cbc8ba 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -2565,7 +2565,8 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
>>       build_append_int_noprefix(table_data,
>>                                (48UL << 30) | /* HATS   */
>>                                (48UL << 28) | /* GATS   */
>> -                             (1UL << 2),    /* GTSup  */
>> +                             (1UL << 2)   | /* GTSup  */
>> +                             (1UL << 6),    /* GASup  */
> 
> Sorry if I misunderstood - is this for nested?
> 
> I'm a bit confused here... IIUC in your previous patches you didn't
> really implement guest_mode==1 case in IRTEs.  So if you have this set
> then the guest should be able to setup IRTEs with guest_mode==1?  How
> did it work?
> 
> Thanks,
> 

The naming of these bits are confusing. Please allow me to help explain.
There are two capability bits:

* GASup  : This is to allow 128-bit IRTE when GAEn is set.

* GAMSup : This is for Guest Virtual APIC mode support,
             which is not currently supported in vIOMMU.
            The commit message in patch 5 is incorrect.

Here, we set GASup in order to allow 128-bit IRTE support,
which is needed for some of future AMD IOMMU features.

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 5/6] x86_iommu/amd: Add interrupt remap support when VAPIC is enabled
       [not found] ` <1536684589-11718-6-git-send-email-brijesh.singh@amd.com>
@ 2018-09-13  7:16   ` Suravee Suthikulpanit
  0 siblings, 0 replies; 31+ messages in thread
From: Suravee Suthikulpanit @ 2018-09-13  7:16 UTC (permalink / raw)
  To: Brijesh Singh, qemu-devel
  Cc: Michael S. Tsirkin, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost, Marcel Apfelbaum, Tom Lendacky

Brijesh,

On 9/11/18 11:49 PM, Brijesh Singh wrote:
> Emulate the interrupt remapping support when guest virtual APIC is
> enabled.
> 
> See IOMMU spec:https://support.amd.com/TechDocs/48882_IOMMU.pdf
> (section 2.2.5.2) for details information.
> 
> When VAPIC is enabled, it uses interrupt remapping as defined in
> Table 22 and Figure 17 from IOMMU spec.
> 
> Cc: "Michael S. Tsirkin"<mst@redhat.com>
> Cc: Paolo Bonzini<pbonzini@redhat.com>
> Cc: Richard Henderson<rth@twiddle.net>
> Cc: Eduardo Habkost<ehabkost@redhat.com>
> Cc: Marcel Apfelbaum<marcel.apfelbaum@gmail.com>
> Cc: Tom Lendacky<Thomas.Lendacky@amd.com>
> Cc: Suravee Suthikulpanit<Suravee.Suthikulpanit@amd.com>
> Signed-off-by: Brijesh Singh<brijesh.singh@amd.com>
> ---
>   hw/i386/amd_iommu.c  | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>   hw/i386/amd_iommu.h  | 38 +++++++++++++++++++++++++++++
>   hw/i386/trace-events |  2 ++
>   3 files changed, 107 insertions(+), 1 deletion(-)

The commit message here is incorrect and/or misleading.
The GASup bit is essentially allow 128-bit IRTE format.
The guest virtual APIC support is really GAMSup bit.

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 2/6] x86_iommu/amd: Prepare for interrupt remap support
  2018-09-13  3:15       ` Peter Xu
@ 2018-09-13  8:15         ` Suravee Suthikulpanit
  2018-09-13  8:48           ` Peter Xu
  2018-09-13 14:47           ` Paolo Bonzini
  0 siblings, 2 replies; 31+ messages in thread
From: Suravee Suthikulpanit @ 2018-09-13  8:15 UTC (permalink / raw)
  To: Peter Xu, Brijesh Singh
  Cc: qemu-devel, Tom Lendacky, Eduardo Habkost, Michael S. Tsirkin,
	Paolo Bonzini, Richard Henderson

Brijesh / Peter,

On 9/13/18 10:15 AM, Peter Xu wrote:
> On Wed, Sep 12, 2018 at 01:59:06PM -0500, Brijesh Singh wrote:
> 
> [...]
> 
>>>>        }
>>>>        return &iommu_as[devfn]->as;
>>>>    }
>>>> @@ -1172,6 +1274,10 @@ static void amdvi_realize(DeviceState *dev, Error **err)
>>>>            return;
>>>>        }
>>>> +    /* Pseudo address space under root PCI bus. */
>>>> +    pcms->ioapic_as = amdvi_host_dma_iommu(bus, s, AMDVI_SB_IOAPIC_ID);
>>>> +    s->intr_enabled = x86_iommu->intr_supported;
>>>
>>> So does this mean that AMD IR cannot be disabled if declared support?
>>> For VT-d, IR needs to be explicitly enabled otherwise disabled (even
>>> supported).
>>>
>>
>>
>> Yes, once its declared as supported then it can not disabled. Its
>> upto the guest OS to decide whether it want to use the intr remapping
>> feature by parsing the IVRS. This also brings question, should we
>> just enable it by default because its guest OS decision whether it
>> wants to use it or not.
> 
> It's by default off?  I mean:
> 
>      DEFINE_PROP_BOOL("intremap", X86IOMMUState, intr_supported, false),
> 
> Then it's good to me, thanks.  You could add a comment there
> explicitly mentioning that "IR will be enabled as long as declared
> support for AMD" since it might confuse some of the people like me...
> but it's optional.
> 
> Regards,
> 

Actually, there are two separate knobs here:

* This option basically means the interrupt remapping support is available/unavailable,
   which should default to available unless specified to unavailable at the QEMU command line.
   For example, the AMD IOMMU v1 does not have interrupt remapping support, which means
   the interrupt remapping bit fields in DTE (e.g. IV, IntCtl, EIntPass, INITPass, NMIPass,
   Lint0Pass, Lint1Pass) are ignored.

* If interrupt remapping support is available, the guest OS can decide to enable or disable the feature
   using DTE bit fileds (e.g. Linux option intremap=off would disable interrupt remapping
   by not setting DTE[IV] bit). For Linux AMD IOMMU driver, the default is to enable the interrupt remapping.

In facts, we should not need this option. However, if you prefer to keep this option,
we probably should rename this to "intremap_sup", in which if the default value should be 1.

Regards,
Suravee

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 6/6] x86_iommu/amd: Enable Guest virtual APIC support
  2018-09-12 21:14     ` Brijesh Singh
@ 2018-09-13  8:36       ` Suravee Suthikulpanit
  2018-09-13 11:44         ` Peter Xu
  0 siblings, 1 reply; 31+ messages in thread
From: Suravee Suthikulpanit @ 2018-09-13  8:36 UTC (permalink / raw)
  To: Brijesh Singh, Peter Xu
  Cc: qemu-devel, Tom Lendacky, Eduardo Habkost, Michael S. Tsirkin,
	Paolo Bonzini, Richard Henderson

Brijesh/Peter,

On 9/13/18 4:14 AM, Brijesh Singh wrote:
> 
> 
> On 09/11/2018 11:52 PM, Peter Xu wrote:
> ...
> 
>>>
>>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>>> index 5c2c638..1cbc8ba 100644
>>> --- a/hw/i386/acpi-build.c
>>> +++ b/hw/i386/acpi-build.c
>>> @@ -2565,7 +2565,8 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
>>>       build_append_int_noprefix(table_data,
>>>                                (48UL << 30) | /* HATS   */
>>>                                (48UL << 28) | /* GATS   */
>>> -                             (1UL << 2),    /* GTSup  */
>>> +                             (1UL << 2)   | /* GTSup  */
>>> +                             (1UL << 6),    /* GASup  */
>>
>> Sorry if I misunderstood - is this for nested?
>>
>> I'm a bit confused here... IIUC in your previous patches you didn't
>> really implement guest_mode==1 case in IRTEs.  So if you have this set
>> then the guest should be able to setup IRTEs with guest_mode==1?  How
>> did it work?
>>
> 
> Yes, sometime spec can be confusing ;) let me see if I can explain it.
> 
> Suravee, please correct me if I misunderstood something
> 
> - A legacy interrupt remap support is available on all IOMMU versions.
> 
>    Guest OS makes the decision whether to use the feature. Guest OS
>    sets the IV bit in DTE to indicate when there is a valid interrupt
>    map (See Table 7 DTE definition).
> 
>    In this mode, IRTE is 32-bit. The field details are available
>    in Table 20 - Section 2.2.5.2. The third patch in this series
>    implements the support for this case.

This is correct.

> 
> - When AVIC is used in the guest, the intr remap logic is different.
> 
>    Based on the guest_mode, IOMMU and AVIC may need to work together
>    to remap the interrupts (IOMMU spec refers this as Guest Virtual APIC
>    Enabled -- Section 2.2.5.3).
> 
>    The GASup bit in extended feature register and IVHD IOMMU feature
>    reporting field in IVRS are used to tell whether the IOMMU supports
>    intr remap when AVIC is enabled in the guest.
> 
>    In this mode, the IRTE is 128-bit.

GASup is a prereq for GAMSup and several other features in the future.

> 
>    To make things interesting, there is a GAMSup bit in extended
>    feature register. It is used by IOMMU to tell the AVIC supported
>    modes:
> 
>    a) 0 = intr_remap only (i.e IOMMU can remap the interrupt on its own)
>    b) 1 = Both guest AVIC and IOMMU will work together to remap the intr
> 
>    The patch 5 in the series implements the intr_remap (#a)
> 
>    I have not implemented the mode=1. I am not sure if its worth
>    implementing this mode for the emulated IOMMU case.
> 
>    See Table 21 for control knobs from IOMMU. This patch series
>    implements the first three rows.

GAMSup is guest virtual APIC mode (aka AVIC), and for vIOMMU case,
would be used for nested VM. This is not currently supported for vIOMMU,

Regards,
Suravee

> 
> -Brijesh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 2/6] x86_iommu/amd: Prepare for interrupt remap support
  2018-09-13  8:15         ` Suravee Suthikulpanit
@ 2018-09-13  8:48           ` Peter Xu
  2018-09-13 14:47           ` Paolo Bonzini
  1 sibling, 0 replies; 31+ messages in thread
From: Peter Xu @ 2018-09-13  8:48 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: Brijesh Singh, qemu-devel, Tom Lendacky, Eduardo Habkost,
	Michael S. Tsirkin, Paolo Bonzini, Richard Henderson

On Thu, Sep 13, 2018 at 03:15:27PM +0700, Suravee Suthikulpanit wrote:
> Brijesh / Peter,
> 
> On 9/13/18 10:15 AM, Peter Xu wrote:
> > On Wed, Sep 12, 2018 at 01:59:06PM -0500, Brijesh Singh wrote:
> > 
> > [...]
> > 
> > > > >        }
> > > > >        return &iommu_as[devfn]->as;
> > > > >    }
> > > > > @@ -1172,6 +1274,10 @@ static void amdvi_realize(DeviceState *dev, Error **err)
> > > > >            return;
> > > > >        }
> > > > > +    /* Pseudo address space under root PCI bus. */
> > > > > +    pcms->ioapic_as = amdvi_host_dma_iommu(bus, s, AMDVI_SB_IOAPIC_ID);
> > > > > +    s->intr_enabled = x86_iommu->intr_supported;
> > > > 
> > > > So does this mean that AMD IR cannot be disabled if declared support?
> > > > For VT-d, IR needs to be explicitly enabled otherwise disabled (even
> > > > supported).
> > > > 
> > > 
> > > 
> > > Yes, once its declared as supported then it can not disabled. Its
> > > upto the guest OS to decide whether it want to use the intr remapping
> > > feature by parsing the IVRS. This also brings question, should we
> > > just enable it by default because its guest OS decision whether it
> > > wants to use it or not.
> > 
> > It's by default off?  I mean:
> > 
> >      DEFINE_PROP_BOOL("intremap", X86IOMMUState, intr_supported, false),
> > 
> > Then it's good to me, thanks.  You could add a comment there
> > explicitly mentioning that "IR will be enabled as long as declared
> > support for AMD" since it might confuse some of the people like me...
> > but it's optional.
> > 
> > Regards,
> > 
> 
> Actually, there are two separate knobs here:
> 
> * This option basically means the interrupt remapping support is available/unavailable,
>   which should default to available unless specified to unavailable at the QEMU command line.
>   For example, the AMD IOMMU v1 does not have interrupt remapping support, which means
>   the interrupt remapping bit fields in DTE (e.g. IV, IntCtl, EIntPass, INITPass, NMIPass,
>   Lint0Pass, Lint1Pass) are ignored.
> 
> * If interrupt remapping support is available, the guest OS can decide to enable or disable the feature
>   using DTE bit fileds (e.g. Linux option intremap=off would disable interrupt remapping
>   by not setting DTE[IV] bit). For Linux AMD IOMMU driver, the default is to enable the interrupt remapping.
> 
> In facts, we should not need this option. However, if you prefer to keep this option,
> we probably should rename this to "intremap_sup", in which if the default value should be 1.

I see, thanks.  For me I don't yet see a reason for that intremap_sup
since AFAIU that's exactly what current QEMU's intremap parameter do.

I think Intel is having the similar knobs:

* The "intremap" in QEMU decides whether the hardware we emulate
  supports interrupt remapping, and

* The "intremap" in the guest decides whether the guest kernel will
  use the interrupt remapping feature

Though IMHO the only difference is that Intel has another global knob
to turn that on/off in the GCMD register (VT-d spec 10.4.4, GCMD bit
25) depending on whether the 2nd knob is on, while for AMD it's just
per-device rather than per-iommu.

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 6/6] x86_iommu/amd: Enable Guest virtual APIC support
  2018-09-13  8:36       ` Suravee Suthikulpanit
@ 2018-09-13 11:44         ` Peter Xu
  0 siblings, 0 replies; 31+ messages in thread
From: Peter Xu @ 2018-09-13 11:44 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: Brijesh Singh, qemu-devel, Tom Lendacky, Eduardo Habkost,
	Michael S. Tsirkin, Paolo Bonzini, Richard Henderson

On Thu, Sep 13, 2018 at 03:36:28PM +0700, Suravee Suthikulpanit wrote:
> Brijesh/Peter,
> 
> On 9/13/18 4:14 AM, Brijesh Singh wrote:
> > 
> > 
> > On 09/11/2018 11:52 PM, Peter Xu wrote:
> > ...
> > 
> > > > 
> > > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > > index 5c2c638..1cbc8ba 100644
> > > > --- a/hw/i386/acpi-build.c
> > > > +++ b/hw/i386/acpi-build.c
> > > > @@ -2565,7 +2565,8 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
> > > >       build_append_int_noprefix(table_data,
> > > >                                (48UL << 30) | /* HATS   */
> > > >                                (48UL << 28) | /* GATS   */
> > > > -                             (1UL << 2),    /* GTSup  */
> > > > +                             (1UL << 2)   | /* GTSup  */
> > > > +                             (1UL << 6),    /* GASup  */
> > > 
> > > Sorry if I misunderstood - is this for nested?
> > > 
> > > I'm a bit confused here... IIUC in your previous patches you didn't
> > > really implement guest_mode==1 case in IRTEs.  So if you have this set
> > > then the guest should be able to setup IRTEs with guest_mode==1?  How
> > > did it work?
> > > 
> > 
> > Yes, sometime spec can be confusing ;) let me see if I can explain it.
> > 
> > Suravee, please correct me if I misunderstood something
> > 
> > - A legacy interrupt remap support is available on all IOMMU versions.
> > 
> >    Guest OS makes the decision whether to use the feature. Guest OS
> >    sets the IV bit in DTE to indicate when there is a valid interrupt
> >    map (See Table 7 DTE definition).
> > 
> >    In this mode, IRTE is 32-bit. The field details are available
> >    in Table 20 - Section 2.2.5.2. The third patch in this series
> >    implements the support for this case.
> 
> This is correct.
> 
> > 
> > - When AVIC is used in the guest, the intr remap logic is different.
> > 
> >    Based on the guest_mode, IOMMU and AVIC may need to work together
> >    to remap the interrupts (IOMMU spec refers this as Guest Virtual APIC
> >    Enabled -- Section 2.2.5.3).
> > 
> >    The GASup bit in extended feature register and IVHD IOMMU feature
> >    reporting field in IVRS are used to tell whether the IOMMU supports
> >    intr remap when AVIC is enabled in the guest.
> > 
> >    In this mode, the IRTE is 128-bit.
> 
> GASup is a prereq for GAMSup and several other features in the future.
> 
> > 
> >    To make things interesting, there is a GAMSup bit in extended
> >    feature register. It is used by IOMMU to tell the AVIC supported
> >    modes:
> > 
> >    a) 0 = intr_remap only (i.e IOMMU can remap the interrupt on its own)
> >    b) 1 = Both guest AVIC and IOMMU will work together to remap the intr
> > 
> >    The patch 5 in the series implements the intr_remap (#a)
> > 
> >    I have not implemented the mode=1. I am not sure if its worth
> >    implementing this mode for the emulated IOMMU case.
> > 
> >    See Table 21 for control knobs from IOMMU. This patch series
> >    implements the first three rows.
> 
> GAMSup is guest virtual APIC mode (aka AVIC), and for vIOMMU case,
> would be used for nested VM. This is not currently supported for vIOMMU,

Hi, Brijesh, Suravee,

Thank you for the explanations.  Now with that I understand the states
and, yes, I agree that the patchset has implemented 128bits int remap
so you should have GASup set on both places (and keep GAMSup==0).
Then please take mine:

Reviewed-by: Peter Xu <peterx@redhat.com>

Actually I don't quite understand why these information needs to be
declared in two places (ACPI and MMIO region).  More interestingly, I
noticed that there is a note in the ACPI chapter 5:

        Software Implementation Note: Information conveyed in the IVRS
        overrides the corresponding information available through the
        IOMMU hardware registers. System software is required to honor
        the ACPI settings.

Does it mean that there could be chance where both data do not match?
I would be curious why not only declare the capabilities at one place?
AFAIU that's what Intel is doing, e.g., Intel ACPI DRHD structures do
not have the feature bit declarations but they are all in the
capability registers.

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 2/6] x86_iommu/amd: Prepare for interrupt remap support
  2018-09-13  8:15         ` Suravee Suthikulpanit
  2018-09-13  8:48           ` Peter Xu
@ 2018-09-13 14:47           ` Paolo Bonzini
  1 sibling, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2018-09-13 14:47 UTC (permalink / raw)
  To: Suravee Suthikulpanit, Peter Xu, Brijesh Singh
  Cc: qemu-devel, Tom Lendacky, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson

On 13/09/2018 10:15, Suravee Suthikulpanit wrote:
> However, if you prefer to keep this option,
> we probably should rename this to "intremap_sup", in which if the
> default value should be 1.

The main reason to have the property and to leave it off by default is
that it is incompatible with kernel_irqchip=split.  A second reason is
migration support, where you would be able to migrate from QEMU 3.1 to
3.0 and interrupt remapping would disappear.

It's okay to call the property just "intremap" even if it's ultimately
the guest's call to use it.  It's the same for e.g. a PCI device's MSI
support, properties for that are called just "msi" or "msix".

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 4/6] i386: acpi: add IVHD device entry for IOAPIC
  2018-09-12 19:24     ` Brijesh Singh
@ 2018-09-13 18:18       ` Michael S. Tsirkin
  2018-09-13 22:20         ` Brijesh Singh
  0 siblings, 1 reply; 31+ messages in thread
From: Michael S. Tsirkin @ 2018-09-13 18:18 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: Igor Mammedov, qemu-devel, Tom Lendacky, Eduardo Habkost,
	Paolo Bonzini, Suravee Suthikulpanit, Richard Henderson

On Wed, Sep 12, 2018 at 02:24:52PM -0500, Brijesh Singh wrote:
> 
> 
> On 09/12/2018 11:35 AM, Igor Mammedov wrote:
> ...
> 
> > > +    /*
> > > +     * When interrupt remapping is enabled, Linux IOMMU driver also checks
> > > +     * for special IVHD device (type IO-APIC), which is typically presented
> > > +     * as PCI device 14:00.0.
> > Probably it shouldn't be a 'typically' device from somewhere but rather address
> > fetched from corresponding device model QEMU implements.
> > 
> 
> IOAPIC is not presented as a true PCI device to guest OS. When IOMMU is
> enabled a pseudo address space to added under root PCI bus. PCI 14:0.0
> presents to this pseudo device.
> 
> > 
> > > +     */
> > > +    if (s->intr_enabled) {
> > > +        build_append_int_noprefix(table_data, 0x0100a00000000048, 8);
> >                                                   ^^ this is incomprehensible,
> > where does this magic number comes from and how was it calculated?
> > 
> 
> In order to provide interrupt remap support, a special IVHD device need
> to be added,  the magic number uses the format defined in Table 95 (IVHD
> device entry type codes).
> 
> 0x01 00a0 00 00 0000 48
> 
> Byte 0: 0x48 (special device)
> Byte 1 & 2: must be zero
> Byte 3: 0 (dte setting)
> Byte 4: 0 (handle)
> Byte 5 & 6: IOAPIC devfn (14:0.0)

Do you mean *bus* devfn? devfn is 0.0.

> Byte 7: 0x1 (IOAPIC) - See Table 97 in spec


Above should go into code comment, along with
first (oldest) version of spec that has this table.
Additionally the number is IMHO more readable as:
	(0x1ull << 56) | (PCI_BUILD_BDF(14, 0) << 40) | 0x48

(assuming I got what it should be).

> 
> > > +    }
> > > +
> > >       build_header(linker, table_data, (void *)(table_data->data + iommu_start),
> > >                    "IVRS", table_data->len - iommu_start, 1, NULL, NULL);
> > >   }
> > 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 4/6] i386: acpi: add IVHD device entry for IOAPIC
  2018-09-13 18:18       ` Michael S. Tsirkin
@ 2018-09-13 22:20         ` Brijesh Singh
  2018-09-13 22:29           ` Michael S. Tsirkin
  0 siblings, 1 reply; 31+ messages in thread
From: Brijesh Singh @ 2018-09-13 22:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: brijesh.singh, Igor Mammedov, qemu-devel, Tom Lendacky,
	Eduardo Habkost, Paolo Bonzini, Suravee Suthikulpanit,
	Richard Henderson



On 09/13/2018 01:18 PM, Michael S. Tsirkin wrote:
...>>
>> 0x01 00a0 00 00 0000 48
>>
>> Byte 0: 0x48 (special device)
>> Byte 1 & 2: must be zero
>> Byte 3: 0 (dte setting)
>> Byte 4: 0 (handle)
>> Byte 5 & 6: IOAPIC devfn (14:0.0)
> 
> Do you mean *bus* devfn? devfn is 0.0.
> 

Sorry my bad, I was meaning to write devid and not devfn.

See, 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/iommu/amd_iommu_init.c#n2343

/* SB IOAPIC is always on this device in AMD systems */
#define IOAPIC_SB_DEVID		((0x00 << 8) | PCI_DEVFN(0x14, 0))


>> Byte 7: 0x1 (IOAPIC) - See Table 97 in spec
> 
> 
> Above should go into code comment, along with
> first (oldest) version of spec that has this table.
> Additionally the number is IMHO more readable as:
> 	(0x1ull << 56) | (PCI_BUILD_BDF(14, 0) << 40) | 0x48
> 
> (assuming I got what it should be).
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 4/6] i386: acpi: add IVHD device entry for IOAPIC
  2018-09-13 22:20         ` Brijesh Singh
@ 2018-09-13 22:29           ` Michael S. Tsirkin
  0 siblings, 0 replies; 31+ messages in thread
From: Michael S. Tsirkin @ 2018-09-13 22:29 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: Igor Mammedov, qemu-devel, Tom Lendacky, Eduardo Habkost,
	Paolo Bonzini, Suravee Suthikulpanit, Richard Henderson

On Thu, Sep 13, 2018 at 05:20:34PM -0500, Brijesh Singh wrote:
> 
> 
> On 09/13/2018 01:18 PM, Michael S. Tsirkin wrote:
> ...>>
> > > 0x01 00a0 00 00 0000 48
> > > 
> > > Byte 0: 0x48 (special device)
> > > Byte 1 & 2: must be zero
> > > Byte 3: 0 (dte setting)
> > > Byte 4: 0 (handle)
> > > Byte 5 & 6: IOAPIC devfn (14:0.0)
> > 
> > Do you mean *bus* devfn? devfn is 0.0.
> > 
> 
> Sorry my bad, I was meaning to write devid and not devfn.
> 
> See, https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/iommu/amd_iommu_init.c#n2343
> 
> /* SB IOAPIC is always on this device in AMD systems */
> #define IOAPIC_SB_DEVID		((0x00 << 8) | PCI_DEVFN(0x14, 0))

And devid is bus device function.

So in fact it is 0:20.0

So use

	PCI_BUILD_BDF(0, PCI_DEVFN(0x14, 0))

below

> 
> > > Byte 7: 0x1 (IOAPIC) - See Table 97 in spec
> > 
> > 
> > Above should go into code comment, along with
> > first (oldest) version of spec that has this table.
> > Additionally the number is IMHO more readable as:
> > 	(0x1ull << 56) | (PCI_BUILD_BDF(14, 0) << 40) | 0x48
> > 
> > (assuming I got what it should be).
> > 

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2018-09-13 22:29 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-11 16:49 [Qemu-devel] [PATCH 0/6] x86_iommu/amd: add interrupt remap support Brijesh Singh
2018-09-11 16:49 ` [Qemu-devel] [PATCH 1/6] x86_iommu: move the kernel-irqchip check in common code Brijesh Singh
2018-09-12  3:45   ` Peter Xu
2018-09-11 16:49 ` [Qemu-devel] [PATCH 2/6] x86_iommu/amd: Prepare for interrupt remap support Brijesh Singh
2018-09-12  3:52   ` Peter Xu
2018-09-12 18:59     ` Brijesh Singh
2018-09-13  3:15       ` Peter Xu
2018-09-13  8:15         ` Suravee Suthikulpanit
2018-09-13  8:48           ` Peter Xu
2018-09-13 14:47           ` Paolo Bonzini
2018-09-11 16:49 ` [Qemu-devel] [PATCH 3/6] x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled Brijesh Singh
2018-09-12  3:37   ` Peter Xu
2018-09-12 18:50     ` Brijesh Singh
2018-09-13  2:59       ` Peter Xu
2018-09-11 16:49 ` [Qemu-devel] [PATCH 4/6] i386: acpi: add IVHD device entry for IOAPIC Brijesh Singh
2018-09-12  4:35   ` Peter Xu
2018-09-12 19:11     ` Brijesh Singh
2018-09-13  3:20       ` Peter Xu
2018-09-12 16:35   ` Igor Mammedov
2018-09-12 19:24     ` Brijesh Singh
2018-09-13 18:18       ` Michael S. Tsirkin
2018-09-13 22:20         ` Brijesh Singh
2018-09-13 22:29           ` Michael S. Tsirkin
2018-09-11 16:49 ` [Qemu-devel] [PATCH 6/6] x86_iommu/amd: Enable Guest virtual APIC support Brijesh Singh
2018-09-12  4:52   ` Peter Xu
2018-09-12 21:14     ` Brijesh Singh
2018-09-13  8:36       ` Suravee Suthikulpanit
2018-09-13 11:44         ` Peter Xu
2018-09-13  7:13     ` Suravee Suthikulpanit
2018-09-12 16:38   ` Igor Mammedov
     [not found] ` <1536684589-11718-6-git-send-email-brijesh.singh@amd.com>
2018-09-13  7:16   ` [Qemu-devel] [PATCH 5/6] x86_iommu/amd: Add interrupt remap support when VAPIC is enabled Suravee Suthikulpanit

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.