* [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d
@ 2017-11-17  6:22 Chao Gao
  2017-11-17  6:22 ` [PATCH v4 01/28] Xen/doc: Add Xen virtual IOMMU doc Chao Gao
                   ` (28 more replies)
  0 siblings, 29 replies; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Ian Jackson, Tim Deegan, Jan Beulich,
	Andrew Cooper, Chao Gao, Roger Pau Monné

This patchset introduces a vIOMMU framework and adds the virtual VT-d
interrupt remapping support according to the "Xen virtual IOMMU high level
design doc V3" (https://lists.xenproject.org/archives/html/xen-devel/
2016-11/msg01391.html).

- vIOMMU framework
The new framework provides viommu_ops and helper functions to abstract
vIOMMU operations (e.g. create, destroy, handle irq remapping requests
and so on). Vendors (Intel, ARM, AMD and so on) can implement their own
vIOMMU callbacks.
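
The callback pattern described above can be sketched stand-alone as follows.
This is only an illustrative sketch: the ops field names follow patch 02 of
this series, but the stub VT-d implementation, the use of void pointers in
place of the real Xen types, and the simplified one-entry lookup are
assumptions for the sake of a self-contained example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Abstract vIOMMU operations, mirroring the framework's viommu_ops. */
struct viommu_ops {
    uint8_t type;                       /* e.g. VIOMMU_TYPE_INTEL_VTD == 0 */
    int (*create)(void *d, void *viommu);   /* void* stands in for Xen types */
    int (*destroy)(void *viommu);
};

/* Illustrative vendor implementation: a do-nothing Intel VT-d model. */
static int vvtd_create(void *d, void *viommu) { (void)d; (void)viommu; return 0; }
static int vvtd_destroy(void *viommu) { (void)viommu; return 0; }

static const struct viommu_ops vvtd_ops = {
    .type = 0,                          /* VIOMMU_TYPE_INTEL_VTD */
    .create = vvtd_create,
    .destroy = vvtd_destroy,
};

/* The framework dispatches through the ops table it looks up by type. */
static const struct viommu_ops *viommu_get_ops(uint8_t type)
{
    return type == vvtd_ops.type ? &vvtd_ops : NULL;
}
```

A caller would fetch the ops for the requested type and invoke the create
callback, exactly as viommu_create() does in the framework patch.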

- Virtual VTD
We enable the irq remapping function, covering both MSI and IOAPIC
interrupts. Posted interrupt mode emulation, and enabling posted interrupt
mode on the host together with the virtual VT-d, are not supported yet;
they will be added later.

In case of conflicts, this series can also be found in my personal github:
Xen: https://github.com/gc1008/viommu_xen.git vIOMMU4
Qemu: https://github.com/gc1008/viommu_qemu.git vIOMMU3

Any comments would be highly appreciated. Below is the change history.

Changes since v3:
 - add logic to build the DMAR table for PVH guests. Only very limited tests
 have been performed on PVH guests so far.
 - use one interface to bind both guest remappable and non-remappable
 interrupts to physical interrupts. To achieve this, the current binding
 interface has to be changed. The advantage is that it simplifies the code
 supporting the new-format guest interrupts; the disadvantage is that it is
 clearly incompatible with old QEMU.
 - VT-d posted interrupt feature can be used to deliver guest remappable
 interrupt. The guest interrupt attributes (vector, affinity) are decoded from
 guest IRTE and then accordingly written to host IRTE. In this version, when
 guest invalidates an IRTE, the host IRTE will be updated according to the
 new guest IRTE.
 - add support for draining in-flight interrupts. When the guest invalidates
 an IRTE, the in-flight interrupts related to that IRTE should be drained.
 This version provides a very simple solution: process QI only when no
 interrupt is being delivered, which definitely implies there is no in-flight
 interrupt.
 - use locks in the QI and fault handling sub-features. These locks guarantee
 that the registers/status won't be changed by the guest while vvtd is
 dealing with faults or invalidation requests.
 - move the viommu structure under the hvm domain rather than making it a
 field of struct domain.
 - remove the unneeded domctl interface for destroying a viommu. Currently,
 dynamic destruction isn't needed.
 - reorder the patches per Roger's suggestion: the vIOMMU abstraction goes
 first, then the implementation of the emulated VT-d, several hooks for
 configuring/delivering guest interrupts and EOI, and the related changes to
 the toolstack.
 - fix many coding style issues pointed out by Roger.

Changes since v2:
       1) Remove the vIOMMU hypercall for querying capabilities; introduce it
when necessary.
       2) Remove the length field from the vIOMMU create parameter of the
vIOMMU hypercall.
       3) Introduce an irq remapping mode callback to the vIOMMU framework;
vIOMMU device models can check the irq remapping mode in vendor-specific ways.
       4) Update the vIOMMU docs.
       5) For other changes please see the patches' change logs.

Changes since v1:
       1) Fix coding style issues.
       2) Add definitions for vIOMMU type and capabilities.
       3) Change the vIOMMU Kconfig and select vIOMMU by default on x86.
       4) Put vIOMMU creation in libxl__arch_domain_create().
       5) Make the vIOMMU structure of the tool stack more general for both
PV and HVM.

Changes since RFC v2:
       1) Move vvtd.c to the drivers/passthrough/vtd directory.
       2) Make vIOMMU always built in on x86.
       3) Add a new boot command-line option "viommu" to enable the viommu
function.
       4) Fix some code style issues.

Changes since RFC v1:
       1) Add the Xen virtual IOMMU doc docs/misc/viommu.txt.
       2) Move the vIOMMU hypercalls for creating/destroying a vIOMMU and
querying capabilities from dmop to domctl, as suggested by Paul Durrant,
because these hypercalls can be issued by the tool stack and more VM modes
(e.g. PVH or other modes that don't use Qemu) can benefit.
       3) Add checks of the input MMIO address and length.
       4) Add iommu_type to the vIOMMU hypercall parameters to specify the
vendor vIOMMU device model (e.g. Intel VT-d, AMD or ARM IOMMU; so far
only Intel VT-d is supported).
       5) Add save and restore support for vvtd.

Chao Gao (23):
  vtd: clean-up and preparation for vvtd
  x86/hvm: Introduce a emulated VTD for HVM
  x86/vvtd: Add MMIO handler for VVTD
  x86/vvtd: Set Interrupt Remapping Table Pointer through GCMD
  x86/vvtd: Enable Interrupt Remapping through GCMD
  x86/vvtd: Process interrupt remapping request
  x86/vvtd: decode interrupt attribute from IRTE
  x86/vvtd: add a helper function to decide the interrupt format
  x86/vvtd: Handle interrupt translation faults
  x86/vvtd: Enable Queued Invalidation through GCMD
  x86/vvtd: Add queued invalidation (QI) support
  x86/vvtd: save and restore emulated VT-d
  x86/vioapic: Hook interrupt delivery of vIOAPIC
  x86/vioapic: extend vioapic_get_vector() to support remapping format
    RTE
  xen/pt: when binding guest msi, accept the whole msi message
  vvtd: update hvm_gmsi_info when binding guest msi with pirq or
  x86/vmsi: Hook delivering remapping format msi to guest and handling
    eoi
  tools/libacpi: Add DMA remapping reporting (DMAR) ACPI table
    structures
  tools/libacpi: Add new fields in acpi_config for DMAR table
  tools/libxl: Add an user configurable parameter to control vIOMMU
    attributes
  tools/libxl: build DMAR table for a guest with one virtual VTD
  tools/libxl: create vIOMMU during domain construction
  tools/libxc: Add viommu operations in libxc

Lan Tianyu (5):
  Xen/doc: Add Xen virtual IOMMU doc
  VIOMMU: Add vIOMMU framework and vIOMMU domctl
  VIOMMU: Add irq request callback to deal with irq remapping
  VIOMMU: Add get irq info callback to convert irq remapping request
  VIOMMU: Introduce callback of checking irq remapping mode

 docs/man/xl.cfg.pod.5.in               |   27 +
 docs/misc/viommu.txt                   |  120 +++
 docs/misc/xen-command-line.markdown    |    7 +
 tools/firmware/hvmloader/ovmf.c        |    2 +-
 tools/libacpi/acpi2_0.h                |   61 ++
 tools/libacpi/build.c                  |   61 ++
 tools/libacpi/libacpi.h                |   10 +
 tools/libxc/Makefile                   |    1 +
 tools/libxc/include/xenctrl.h          |   10 +-
 tools/libxc/xc_domain.c                |   14 +-
 tools/libxc/xc_viommu.c                |   51 ++
 tools/libxl/libxl_arch.h               |    1 +
 tools/libxl/libxl_create.c             |   47 ++
 tools/libxl/libxl_types.idl            |   12 +
 tools/libxl/libxl_x86.c                |   21 +-
 tools/libxl/libxl_x86_acpi.c           |   98 ++-
 tools/xl/xl_parse.c                    |   50 +-
 xen/arch/x86/Kconfig                   |    1 +
 xen/arch/x86/hvm/hvm.c                 |    5 +-
 xen/arch/x86/hvm/irq.c                 |    6 +
 xen/arch/x86/hvm/vioapic.c             |   23 +-
 xen/arch/x86/hvm/vmsi.c                |   33 +-
 xen/arch/x86/xen.lds.S                 |    3 +
 xen/common/Kconfig                     |    3 +
 xen/common/Makefile                    |    1 +
 xen/common/domctl.c                    |    7 +
 xen/common/viommu.c                    |  171 +++++
 xen/drivers/passthrough/io.c           |  144 ++--
 xen/drivers/passthrough/vtd/Makefile   |    7 +-
 xen/drivers/passthrough/vtd/iommu.h    |  197 +++--
 xen/drivers/passthrough/vtd/vvtd.c     | 1301 ++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/domain.h       |    3 +
 xen/include/asm-x86/hvm/hvm.h          |    2 +
 xen/include/asm-x86/hvm/irq.h          |    6 +-
 xen/include/asm-x86/viommu.h           |   82 ++
 xen/include/public/arch-x86/hvm/save.h |   18 +-
 xen/include/public/domctl.h            |   39 +-
 xen/include/xen/viommu.h               |   85 +++
 38 files changed, 2603 insertions(+), 127 deletions(-)
 create mode 100644 docs/misc/viommu.txt
 create mode 100644 tools/libxc/xc_viommu.c
 create mode 100644 xen/common/viommu.c
 create mode 100644 xen/drivers/passthrough/vtd/vvtd.c
 create mode 100644 xen/include/asm-x86/viommu.h
 create mode 100644 xen/include/xen/viommu.h

-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* [PATCH v4 01/28] Xen/doc: Add Xen virtual IOMMU doc
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2018-02-09 12:54   ` Roger Pau Monné
  2017-11-17  6:22 ` [PATCH v4 02/28] VIOMMU: Add vIOMMU framework and vIOMMU domctl Chao Gao
                   ` (27 subsequent siblings)
  28 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Tim Deegan,
	Jan Beulich, Andrew Cooper, Chao Gao, Roger Pau Monné

From: Lan Tianyu <tianyu.lan@intel.com>

This patch adds the Xen virtual IOMMU doc, introducing the motivation,
framework, vIOMMU hypercall and xl configuration.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 docs/misc/viommu.txt | 120 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 120 insertions(+)
 create mode 100644 docs/misc/viommu.txt

diff --git a/docs/misc/viommu.txt b/docs/misc/viommu.txt
new file mode 100644
index 0000000..472d2b5
--- /dev/null
+++ b/docs/misc/viommu.txt
@@ -0,0 +1,120 @@
+Xen virtual IOMMU
+
+Motivation
+==========
+Enable more than 128 vcpu support
+
+HPC cloud services currently require VMs with a high
+number of vCPUs in order to achieve high performance in parallel
+computing.
+
+To support >128 vcpus, x2APIC mode in the guest is necessary because the
+legacy APIC (xAPIC) only supports 8-bit APIC IDs. The APIC ID used by Xen is
+CPU ID * 2 (i.e. CPU 127 has APIC ID 254, which is the last one available
+in xAPIC mode) and so it can only support 128 vcpus at most. x2APIC mode
+supports 32-bit APIC IDs and requires the interrupt remapping functionality
+of a vIOMMU if the guest wishes to route interrupts to all available vCPUs.
+
+PCI MSI/IOAPIC can only send interrupt messages containing an 8-bit APIC ID,
+which cannot address CPUs with APIC IDs above 254. Interrupt remapping
+supports 32-bit APIC IDs and so it's necessary for >128 vcpu support.
+
+vIOMMU Architecture
+===================
+The vIOMMU device model is inside the Xen hypervisor for the following reasons:
+    1) Avoid round trips between Qemu and the Xen hypervisor
+    2) Ease of integration with the rest of the hypervisor
+    3) PVH doesn't use Qemu
+
+* Interrupt remapping overview.
+Interrupts from virtual devices and physical devices are delivered
+to the vLAPIC from the vIOAPIC and vMSI. The vIOMMU needs to remap
+interrupts during this procedure.
+
++---------------------------------------------------+
+|Qemu                       |VM                     |
+|                           | +----------------+    |
+|                           | |  Device driver |    |
+|                           | +--------+-------+    |
+|                           |          ^            |
+|       +----------------+  | +--------+-------+    |
+|       | Virtual device |  | |  IRQ subsystem |    |
+|       +-------+--------+  | +--------+-------+    |
+|               |           |          ^            |
+|               |           |          |            |
++---------------------------+-----------------------+
+|hypervisor     |                      | VIRQ       |
+|               |            +---------+--------+   |
+|               |            |      vLAPIC      |   |
+|               |VIRQ        +---------+--------+   |
+|               |                      ^            |
+|               |                      |            |
+|               |            +---------+--------+   |
+|               |            |      vIOMMU      |   |
+|               |            +---------+--------+   |
+|               |                      ^            |
+|               |                      |            |
+|               |            +---------+--------+   |
+|               |            |   vIOAPIC/vMSI   |   |
+|               |            +----+----+--------+   |
+|               |                 ^    ^            |
+|               +-----------------+    |            |
+|                                      |            |
++---------------------------------------------------+
+HW                                     |IRQ
+                                +-------------------+
+                                |   PCI Device      |
+                                +-------------------+
+
+
+vIOMMU hypercall
+================
+Introduce a new domctl hypercall "xen_domctl_viommu_op" to create a
+vIOMMU instance in the hypervisor. The vIOMMU instance will be destroyed
+when the domain is destroyed.
+
+* vIOMMU hypercall parameter structure
+
+/* vIOMMU type - specify vendor vIOMMU device model */
+#define VIOMMU_TYPE_INTEL_VTD	       0
+
+/* vIOMMU capabilities */
+#define VIOMMU_CAP_IRQ_REMAPPING  (1u << 0)
+
+struct xen_domctl_viommu_op {
+    uint32_t cmd;
+#define XEN_DOMCTL_viommu_create          0
+    union {
+        struct {
+            /* IN - vIOMMU type  */
+            uint8_t type;
+            /* IN - MMIO base address of vIOMMU. */
+            uint64_t base_address;
+            /* IN - Capabilities with which we want to create */
+            uint64_t capabilities;
+            /* OUT - vIOMMU identity */
+            uint32_t id;
+        } create;
+    } u;
+};
+
+- XEN_DOMCTL_viommu_create
+    Create a vIOMMU device with the given type, capabilities and MMIO base
+address. The hypervisor allocates a viommu_id for the new vIOMMU instance
+and returns it. The vIOMMU device model in the hypervisor should check
+whether it can support the input capabilities and return an error if not.
+
+The vIOMMU domctl and the vIOMMU option in the configuration file allow for
+multi-vIOMMU support for a single VM (e.g. the create-vIOMMU parameters
+include a vIOMMU id), but the implementation only supports one vIOMMU per VM.
+
+xl x86 vIOMMU configuration
+============================
+viommu = [
+    'type=intel_vtd,intremap=1',
+    ...
+]
+
+"type" - Specify the vIOMMU device model type. Currently only the Intel VT-d
+device model is supported.
+"intremap" - Enable the vIOMMU interrupt remapping function.
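
To make the hypercall parameter layout described in this document concrete,
here is a hedged sketch of how a caller might populate the create sub-op.
The structure and constants are re-declared locally from the definitions in
this doc, the helper name viommu_op_init and the MMIO base value 0xfed90000
are purely illustrative, and no hypercall is actually issued:

```c
#include <assert.h>
#include <stdint.h>

#define VIOMMU_TYPE_INTEL_VTD     0
#define VIOMMU_CAP_IRQ_REMAPPING  (1u << 0)
#define XEN_DOMCTL_viommu_create  0

/* Local re-declaration of the parameter structure shown above. */
struct xen_domctl_viommu_op {
    uint32_t cmd;
    union {
        struct {
            uint8_t  type;           /* IN - vIOMMU type */
            uint64_t base_address;   /* IN - MMIO base address of vIOMMU */
            uint64_t capabilities;   /* IN - requested capabilities */
            uint32_t id;             /* OUT - vIOMMU identity */
        } create;
    } u;
};

/* Fill the create sub-op the way a toolstack caller would. */
static void viommu_op_init(struct xen_domctl_viommu_op *op, uint64_t base)
{
    op->cmd = XEN_DOMCTL_viommu_create;
    op->u.create.type = VIOMMU_TYPE_INTEL_VTD;
    op->u.create.base_address = base;
    op->u.create.capabilities = VIOMMU_CAP_IRQ_REMAPPING;
    op->u.create.id = 0;    /* OUT field, written back by the hypervisor */
}
```

On success the hypervisor fills in `u.create.id` (always 0 while only one
vIOMMU per VM is supported) and the domctl is copied back to the caller.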
-- 
1.8.3.1




* [PATCH v4 02/28] VIOMMU: Add vIOMMU framework and vIOMMU domctl
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
  2017-11-17  6:22 ` [PATCH v4 01/28] Xen/doc: Add Xen virtual IOMMU doc Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2018-02-09 14:33   ` Roger Pau Monné
  2017-11-17  6:22 ` [PATCH v4 03/28] VIOMMU: Add irq request callback to deal with irq remapping Chao Gao
                   ` (26 subsequent siblings)
  28 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Tim Deegan,
	Jan Beulich, Andrew Cooper, Chao Gao, Roger Pau Monné

From: Lan Tianyu <tianyu.lan@intel.com>

This patch introduces an abstraction layer for arch vIOMMU implementations
and a vIOMMU domctl to deal with requests from the tool stack. Arch vIOMMU
code needs to provide callbacks. The vIOMMU domctl supports creating a vIOMMU
instance in the hypervisor; the instance will be destroyed along with the
domain.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v4:
 - introduce REGISTER_VIOMMU() to register viommu types and ops.
 - remove unneeded domctl interface to destroy viommu.
---
 docs/misc/xen-command-line.markdown |   7 ++
 xen/arch/x86/Kconfig                |   1 +
 xen/arch/x86/hvm/hvm.c              |   3 +
 xen/arch/x86/xen.lds.S              |   3 +
 xen/common/Kconfig                  |   3 +
 xen/common/Makefile                 |   1 +
 xen/common/domctl.c                 |   7 ++
 xen/common/viommu.c                 | 125 ++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/domain.h    |   3 +
 xen/include/public/domctl.h         |  31 +++++++++
 xen/include/xen/viommu.h            |  69 ++++++++++++++++++++
 11 files changed, 253 insertions(+)
 create mode 100644 xen/common/viommu.c
 create mode 100644 xen/include/xen/viommu.h

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index eb4995e..d097382 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1836,3 +1836,10 @@ mode.
 > Default: `true`
 
 Permit use of the `xsave/xrstor` instructions.
+
+### viommu
+> `= <boolean>`
+
+> Default: `false`
+
+Permit use of the viommu interface to create and destroy viommu device models.
diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 64955dc..df254e4 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -25,6 +25,7 @@ config X86
 	select HAS_UBSAN
 	select NUMA
 	select VGA
+	select VIOMMU
 
 config ARCH_DEFCONFIG
 	string
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 205b4cb..964418a 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -36,6 +36,7 @@
 #include <xen/rangeset.h>
 #include <xen/monitor.h>
 #include <xen/warning.h>
+#include <xen/viommu.h>
 #include <asm/shadow.h>
 #include <asm/hap.h>
 #include <asm/current.h>
@@ -693,6 +694,8 @@ void hvm_domain_relinquish_resources(struct domain *d)
         pmtimer_deinit(d);
         hpet_deinit(d);
     }
+
+    viommu_destroy_domain(d);
 }
 
 void hvm_domain_destroy(struct domain *d)
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index d5e8821..7f8d2b8 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -231,6 +231,9 @@ SECTIONS
        __start_schedulers_array = .;
        *(.data.schedulers)
        __end_schedulers_array = .;
+       __start_viommus_array = .;
+       *(.data.viommus)
+       __end_viommus_array = .;
   } :text
 
   .data : {                    /* Data */
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 103ef44..62aaa76 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -52,6 +52,9 @@ config HAS_CHECKPOLICY
 	string
 	option env="XEN_HAS_CHECKPOLICY"
 
+config VIOMMU
+	bool
+
 config KEXEC
 	bool "kexec support"
 	default y
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 66cc2c8..182b3ac 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -56,6 +56,7 @@ obj-y += time.o
 obj-y += timer.o
 obj-y += trace.o
 obj-y += version.o
+obj-$(CONFIG_VIOMMU) += viommu.o
 obj-y += virtual_region.o
 obj-y += vm_event.o
 obj-y += vmap.o
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 3c6fa4e..9c5651d 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -25,6 +25,7 @@
 #include <xen/paging.h>
 #include <xen/hypercall.h>
 #include <xen/vm_event.h>
+#include <xen/viommu.h>
 #include <xen/monitor.h>
 #include <asm/current.h>
 #include <asm/irq.h>
@@ -1155,6 +1156,12 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
                                      op->u.set_gnttab_limits.maptrack_frames);
         break;
 
+    case XEN_DOMCTL_viommu_op:
+        ret = viommu_domctl(d, &op->u.viommu_op);
+        if ( !ret )
+            copyback = 1;
+        break;
+
     default:
         ret = arch_do_domctl(op, d, u_domctl);
         break;
diff --git a/xen/common/viommu.c b/xen/common/viommu.c
new file mode 100644
index 0000000..fd8b7fd
--- /dev/null
+++ b/xen/common/viommu.c
@@ -0,0 +1,125 @@
+/*
+ * common/viommu.c
+ *
+ * Copyright (c) 2017 Intel Corporation
+ * Author: Lan Tianyu <tianyu.lan@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/sched.h>
+#include <xen/spinlock.h>
+#include <xen/types.h>
+#include <xen/viommu.h>
+
+extern const struct viommu_ops *__start_viommus_array[], *__end_viommus_array[];
+#define NUM_VIOMMU_TYPE (__end_viommus_array - __start_viommus_array)
+#define viommu_type_array __start_viommus_array
+
+int viommu_destroy_domain(struct domain *d)
+{
+    struct viommu *viommu = d->arch.hvm_domain.viommu;
+    int ret;
+
+    if ( !viommu )
+        return -ENODEV;
+
+    ret = viommu->ops->destroy(viommu);
+    if ( ret < 0 )
+        return ret;
+
+    xfree(viommu);
+    d->arch.hvm_domain.viommu = NULL;
+
+    return 0;
+}
+
+static const struct viommu_ops *viommu_get_ops(uint8_t type)
+{
+    int i;
+
+    for ( i = 0; i < NUM_VIOMMU_TYPE; i++ )
+    {
+        if ( viommu_type_array[i]->type == type )
+            return viommu_type_array[i];
+    }
+
+    return NULL;
+}
+
+static int viommu_create(struct domain *d, uint8_t type,
+                         uint64_t base_address, uint64_t caps,
+                         uint32_t *viommu_id)
+{
+    struct viommu *viommu;
+    const struct viommu_ops *viommu_ops = NULL;
+    int rc;
+
+    /* Only support one vIOMMU per domain. */
+    if ( d->arch.hvm_domain.viommu )
+        return -E2BIG;
+
+    viommu_ops = viommu_get_ops(type);
+    if ( !viommu_ops )
+        return -EINVAL;
+
+    ASSERT(viommu_ops->create);
+
+    viommu = xzalloc(struct viommu);
+    if ( !viommu )
+        return -ENOMEM;
+
+    viommu->base_address = base_address;
+    viommu->caps = caps;
+    viommu->ops = viommu_ops;
+
+    rc = viommu_ops->create(d, viommu);
+    if ( rc < 0 )
+    {
+        xfree(viommu);
+        return rc;
+    }
+
+    d->arch.hvm_domain.viommu = viommu;
+
+    /* Only support one vIOMMU per domain. */
+    *viommu_id = 0;
+    return 0;
+}
+
+int viommu_domctl(struct domain *d, struct xen_domctl_viommu_op *op)
+{
+    int rc;
+
+    switch ( op->cmd )
+    {
+    case XEN_DOMCTL_viommu_create:
+        rc = viommu_create(d, op->u.create.type, op->u.create.base_address,
+                           op->u.create.capabilities, &op->u.create.id);
+        break;
+    default:
+        return -ENOSYS;
+    }
+
+    return rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 7f128c0..fcd3482 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -21,6 +21,7 @@
 #define __ASM_X86_HVM_DOMAIN_H__
 
 #include <xen/iommu.h>
+#include <xen/viommu.h>
 #include <asm/hvm/irq.h>
 #include <asm/hvm/vpt.h>
 #include <asm/hvm/vlapic.h>
@@ -196,6 +197,8 @@ struct hvm_domain {
         struct vmx_domain vmx;
         struct svm_domain svm;
     };
+
+    struct viommu *viommu;
 };
 
 #define hap_enabled(d)  ((d)->arch.hvm_domain.hap_enabled)
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 70027ab..9f6f0aa 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -1105,6 +1105,35 @@ struct xen_domctl_vuart_op {
                                  */
 };
 
+/* vIOMMU helper
+ *
+ * The vIOMMU interface can be used to create a vIOMMU.
+ */
+
+struct xen_domctl_viommu_op {
+    uint32_t cmd;
+#define XEN_DOMCTL_viommu_create        0
+    union {
+        struct {
+            /* IN - vIOMMU type */
+            uint8_t type;
+#define VIOMMU_TYPE_INTEL_VTD           0
+            /*
+             * IN - MMIO base address of vIOMMU. vIOMMU device models
+             * are in charge of checking base_address.
+             */
+            uint64_t base_address;
+            /* IN - Capabilities with which we want to create */
+            uint64_t capabilities;
+#define VIOMMU_CAP_IRQ_REMAPPING        (1u << 0)
+            /* OUT - vIOMMU identity */
+            uint32_t id;
+        } create;
+    } u;
+};
+typedef struct xen_domctl_viommu_op xen_domctl_viommu_op;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_viommu_op);
+
 struct xen_domctl {
     uint32_t cmd;
 #define XEN_DOMCTL_createdomain                   1
@@ -1184,6 +1213,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_soft_reset                    79
 #define XEN_DOMCTL_set_gnttab_limits             80
 #define XEN_DOMCTL_vuart_op                      81
+#define XEN_DOMCTL_viommu_op                     82
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -1248,6 +1278,7 @@ struct xen_domctl {
         struct xen_domctl_psr_cat_op        psr_cat_op;
         struct xen_domctl_set_gnttab_limits set_gnttab_limits;
         struct xen_domctl_vuart_op          vuart_op;
+        struct xen_domctl_viommu_op         viommu_op;
         uint8_t                             pad[128];
     } u;
 };
diff --git a/xen/include/xen/viommu.h b/xen/include/xen/viommu.h
new file mode 100644
index 0000000..a859d80
--- /dev/null
+++ b/xen/include/xen/viommu.h
@@ -0,0 +1,69 @@
+/*
+ * include/xen/viommu.h
+ *
+ * Copyright (c) 2017, Intel Corporation
+ * Author: Lan Tianyu <tianyu.lan@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+#ifndef __XEN_VIOMMU_H__
+#define __XEN_VIOMMU_H__
+
+#ifdef CONFIG_VIOMMU
+
+struct viommu;
+
+struct viommu_ops {
+    uint8_t type;
+    int (*create)(struct domain *d, struct viommu *viommu);
+    int (*destroy)(struct viommu *viommu);
+};
+
+struct viommu {
+    uint64_t base_address;
+    uint64_t caps;
+    const struct viommu_ops *ops;
+    void *priv;
+};
+
+#define REGISTER_VIOMMU(x) static const struct viommu_ops *x##_entry \
+  __used_section(".data.viommus") = &x;
+
+
+int viommu_register_type(uint8_t type, struct viommu_ops *ops);
+int viommu_destroy_domain(struct domain *d);
+int viommu_domctl(struct domain *d, struct xen_domctl_viommu_op *op);
+#else
+static inline int viommu_destroy_domain(struct domain *d)
+{
+    return -EINVAL;
+}
+static inline
+int viommu_domctl(struct domain *d, struct xen_domctl_viommu_op *op)
+{
+    return -ENODEV;
+}
+#endif
+
+#endif /* __XEN_VIOMMU_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
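
The REGISTER_VIOMMU() machinery above collects `struct viommu_ops` pointers
into a dedicated linker section bounded by `__start_viommus_array` and
`__end_viommus_array`, which `viommu_get_ops()` then scans. A portable
stand-in for that pattern (using an explicit array instead of a custom
section, since section placement is linker-specific; the second "other" entry
is invented for illustration) looks like:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct viommu_ops {
    uint8_t type;
    const char *name;   /* illustrative field, not part of the Xen struct */
};

static const struct viommu_ops vvtd_ops  = { .type = 0, .name = "intel_vtd" };
static const struct viommu_ops other_ops = { .type = 7, .name = "other" };

/* Stand-in for the contents of the .data.viommus linker section. */
static const struct viommu_ops *viommu_type_array[] = { &vvtd_ops, &other_ops };
#define NUM_VIOMMU_TYPE \
    (sizeof(viommu_type_array) / sizeof(viommu_type_array[0]))

/* Mirrors viommu_get_ops() in xen/common/viommu.c: linear scan by type. */
static const struct viommu_ops *viommu_get_ops(uint8_t type)
{
    size_t i;

    for ( i = 0; i < NUM_VIOMMU_TYPE; i++ )
        if ( viommu_type_array[i]->type == type )
            return viommu_type_array[i];

    return NULL;
}
```

The linker-section variant has the advantage that each vendor file registers
itself with one REGISTER_VIOMMU() line, with no central array to edit.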
-- 
1.8.3.1




* [PATCH v4 03/28] VIOMMU: Add irq request callback to deal with irq remapping
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
  2017-11-17  6:22 ` [PATCH v4 01/28] Xen/doc: Add Xen virtual IOMMU doc Chao Gao
  2017-11-17  6:22 ` [PATCH v4 02/28] VIOMMU: Add vIOMMU framework and vIOMMU domctl Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2018-02-09 15:02   ` Roger Pau Monné
  2017-11-17  6:22 ` [PATCH v4 04/28] VIOMMU: Add get irq info callback to convert irq remapping request Chao Gao
                   ` (25 subsequent siblings)
  28 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Tim Deegan,
	Jan Beulich, Andrew Cooper, Chao Gao, Roger Pau Monné

From: Lan Tianyu <tianyu.lan@intel.com>

This patch adds an irq request callback for platform implementations
to deal with irq remapping requests.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 xen/common/viommu.c          | 15 ++++++++++++
 xen/include/asm-x86/viommu.h | 54 ++++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/viommu.h     |  6 +++++
 3 files changed, 75 insertions(+)
 create mode 100644 xen/include/asm-x86/viommu.h

diff --git a/xen/common/viommu.c b/xen/common/viommu.c
index fd8b7fd..53d4b70 100644
--- a/xen/common/viommu.c
+++ b/xen/common/viommu.c
@@ -114,6 +114,21 @@ int viommu_domctl(struct domain *d, struct xen_domctl_viommu_op *op)
     return rc;
 }
 
+int viommu_handle_irq_request(const struct domain *d,
+                              const struct arch_irq_remapping_request *request)
+{
+    struct viommu *viommu = d->arch.hvm_domain.viommu;
+
+    if ( !viommu )
+        return -ENODEV;
+
+    ASSERT(viommu->ops);
+    if ( !viommu->ops->handle_irq_request )
+        return -EINVAL;
+
+    return viommu->ops->handle_irq_request(d, request);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/viommu.h b/xen/include/asm-x86/viommu.h
new file mode 100644
index 0000000..01ec80e
--- /dev/null
+++ b/xen/include/asm-x86/viommu.h
@@ -0,0 +1,54 @@
+/*
+ * include/asm-x86/viommu.h
+ *
+ * Copyright (c) 2017 Intel Corporation.
+ * Author: Lan Tianyu <tianyu.lan@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+#ifndef __ARCH_X86_VIOMMU_H__
+#define __ARCH_X86_VIOMMU_H__
+
+/* IRQ request type */
+enum viommu_irq_request_type {
+    VIOMMU_REQUEST_IRQ_MSI = 0,
+    VIOMMU_REQUEST_IRQ_APIC = 1
+};
+
+struct arch_irq_remapping_request
+{
+    union {
+        /* MSI */
+        struct {
+            uint64_t addr;
+            uint32_t data;
+        } msi;
+        /* Redirection Entry in IOAPIC */
+        uint64_t rte;
+    } msg;
+    uint16_t source_id;
+    enum viommu_irq_request_type type;
+};
+
+#endif /* __ARCH_X86_VIOMMU_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/viommu.h b/xen/include/xen/viommu.h
index a859d80..67e25d5 100644
--- a/xen/include/xen/viommu.h
+++ b/xen/include/xen/viommu.h
@@ -22,12 +22,16 @@
 
 #ifdef CONFIG_VIOMMU
 
+#include <asm/viommu.h>
+
 struct viommu;
 
 struct viommu_ops {
     uint8_t type;
     int (*create)(struct domain *d, struct viommu *viommu);
     int (*destroy)(struct viommu *viommu);
+    int (*handle_irq_request)(const struct domain *d,
+                              const struct arch_irq_remapping_request *request);
 };
 
 struct viommu {
@@ -44,6 +48,8 @@ struct viommu {
 int viommu_register_type(uint8_t type, struct viommu_ops *ops);
 int viommu_destroy_domain(struct domain *d);
 int viommu_domctl(struct domain *d, struct xen_domctl_viommu_op *op);
+int viommu_handle_irq_request(const struct domain *d,
+                              const struct arch_irq_remapping_request *request);
 #else
 static inline int viommu_destroy_domain(struct domain *d)
 {
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 04/28] VIOMMU: Add get irq info callback to convert irq remapping request
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
                   ` (2 preceding siblings ...)
  2017-11-17  6:22 ` [PATCH v4 03/28] VIOMMU: Add irq request callback to deal with irq remapping Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2018-02-09 15:06   ` Roger Pau Monné
  2017-11-17  6:22 ` [PATCH v4 05/28] VIOMMU: Introduce callback of checking irq remapping mode Chao Gao
                   ` (24 subsequent siblings)
  28 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Tim Deegan,
	Jan Beulich, Andrew Cooper, Chao Gao, Roger Pau Monné

From: Lan Tianyu <tianyu.lan@intel.com>

This patch adds a get_irq_info callback so that platform implementations
can convert an irq remapping request into irq info (e.g. vector, dest,
dest_mode and so on).
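As an illustration of the conversion such a callback performs, the sketch below decodes a non-remapped x86 MSI message into the fields that arch_irq_remapping_info carries. The field offsets follow the architectural MSI address/data layout; the struct and helper names are made up for this example and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative decode of a compatibility-format x86 MSI message into the
 * kind of info arch_irq_remapping_info holds.  Bit positions follow the
 * architectural MSI format: address bits 19:12 = destination ID, address
 * bit 2 = destination mode; data bits 7:0 = vector, bits 10:8 = delivery
 * mode.  Names here are hypothetical, not taken from the patch. */
struct msi_irq_info {
    uint8_t  dest_mode;     /* addr bit 2: 0 = physical, 1 = logical */
    uint8_t  delivery_mode; /* data bits 10:8 */
    uint8_t  vector;        /* data bits 7:0 */
    uint32_t dest;          /* addr bits 19:12 (xAPIC destination ID) */
};

static struct msi_irq_info decode_msi(uint64_t addr, uint32_t data)
{
    struct msi_irq_info info = {
        .dest_mode     = (addr >> 2) & 0x1,
        .delivery_mode = (data >> 8) & 0x7,
        .vector        = data & 0xff,
        .dest          = (addr >> 12) & 0xff,
    };

    return info;
}
```

A remapping-enabled implementation would instead look the request up in the guest's interrupt remapping table, but the output structure is the same.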

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 xen/common/viommu.c          | 16 ++++++++++++++++
 xen/include/asm-x86/viommu.h |  8 ++++++++
 xen/include/xen/viommu.h     |  6 ++++++
 3 files changed, 30 insertions(+)

diff --git a/xen/common/viommu.c b/xen/common/viommu.c
index 53d4b70..9eafdef 100644
--- a/xen/common/viommu.c
+++ b/xen/common/viommu.c
@@ -129,6 +129,22 @@ int viommu_handle_irq_request(const struct domain *d,
     return viommu->ops->handle_irq_request(d, request);
 }
 
+int viommu_get_irq_info(const struct domain *d,
+                        const struct arch_irq_remapping_request *request,
+                        struct arch_irq_remapping_info *irq_info)
+{
+    const struct viommu *viommu = d->arch.hvm_domain.viommu;
+
+    if ( !viommu )
+        return -EINVAL;
+
+    ASSERT(viommu->ops);
+    if ( !viommu->ops->get_irq_info )
+        return -EINVAL;
+
+    return viommu->ops->get_irq_info(d, request, irq_info);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/viommu.h b/xen/include/asm-x86/viommu.h
index 01ec80e..3d995ba 100644
--- a/xen/include/asm-x86/viommu.h
+++ b/xen/include/asm-x86/viommu.h
@@ -26,6 +26,14 @@ enum viommu_irq_request_type {
     VIOMMU_REQUEST_IRQ_APIC = 1
 };
 
+struct arch_irq_remapping_info
+{
+    uint8_t dest_mode:1;
+    uint8_t delivery_mode:3;
+    uint8_t  vector;
+    uint32_t dest;
+};
+
 struct arch_irq_remapping_request
 {
     union {
diff --git a/xen/include/xen/viommu.h b/xen/include/xen/viommu.h
index 67e25d5..73b853f 100644
--- a/xen/include/xen/viommu.h
+++ b/xen/include/xen/viommu.h
@@ -32,6 +32,9 @@ struct viommu_ops {
     int (*destroy)(struct viommu *viommu);
     int (*handle_irq_request)(const struct domain *d,
                               const struct arch_irq_remapping_request *request);
+    int (*get_irq_info)(const struct domain *d,
+                        const struct arch_irq_remapping_request *request,
+                        struct arch_irq_remapping_info *info);
 };
 
 struct viommu {
@@ -50,6 +53,9 @@ int viommu_destroy_domain(struct domain *d);
 int viommu_domctl(struct domain *d, struct xen_domctl_viommu_op *op);
 int viommu_handle_irq_request(const struct domain *d,
                               const struct arch_irq_remapping_request *request);
+int viommu_get_irq_info(const struct domain *d,
+                        const struct arch_irq_remapping_request *request,
+                        struct arch_irq_remapping_info *irq_info);
 #else
 static inline int viommu_destroy_domain(struct domain *d)
 {
-- 
1.8.3.1




* [PATCH v4 05/28] VIOMMU: Introduce callback of checking irq remapping mode
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
                   ` (3 preceding siblings ...)
  2017-11-17  6:22 ` [PATCH v4 04/28] VIOMMU: Add get irq info callback to convert irq remapping request Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2018-02-09 15:11   ` Roger Pau Monné
  2017-11-17  6:22 ` [PATCH v4 06/28] vtd: clean-up and preparation for vvtd Chao Gao
                   ` (23 subsequent siblings)
  28 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Tim Deegan,
	Jan Beulich, Andrew Cooper, Chao Gao, Roger Pau Monné

From: Lan Tianyu <tianyu.lan@intel.com>

This patch adds a callback for vIOAPIC and vMSI to check whether interrupt
remapping is enabled for a given request.
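For the MSI case, a VT-d check_irq_remapping implementation would likely test whether the message is in remappable format: the address must target the 0xFEE interrupt address range and have the Interrupt Format bit (address bit 4) set, per the VT-d spec's remappable format interrupt request layout. A minimal sketch, with macro names invented for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical remappable-format test for an MSI address, following the
 * VT-d "Remappable Format Interrupt Request" encoding: address bits
 * 31:20 must be 0xFEE and bit 4 (Interrupt Format) must be 1. */
#define MSI_ADDR_BASE_MASK  0xfff00000u
#define MSI_ADDR_BASE       0xfee00000u
#define MSI_ADDR_INT_FMT    (1u << 4)       /* 1 = remappable format */

static bool msi_is_remappable(uint64_t addr)
{
    return (addr & MSI_ADDR_BASE_MASK) == MSI_ADDR_BASE &&
           (addr & MSI_ADDR_INT_FMT);
}
```

The IOAPIC case would instead inspect the RTE, but the idea is the same: the callback classifies the request so callers know whether to go through the remapping table.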

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 xen/common/viommu.c      | 15 +++++++++++++++
 xen/include/xen/viommu.h |  4 ++++
 2 files changed, 19 insertions(+)

diff --git a/xen/common/viommu.c b/xen/common/viommu.c
index 9eafdef..72173c3 100644
--- a/xen/common/viommu.c
+++ b/xen/common/viommu.c
@@ -145,6 +145,21 @@ int viommu_get_irq_info(const struct domain *d,
     return viommu->ops->get_irq_info(d, request, irq_info);
 }
 
+bool viommu_check_irq_remapping(const struct domain *d,
+                                const struct arch_irq_remapping_request *request)
+{
+    const struct viommu *viommu = d->arch.hvm_domain.viommu;
+
+    if ( !viommu )
+        return false;
+
+    ASSERT(viommu->ops);
+    if ( !viommu->ops->check_irq_remapping )
+        return false;
+
+    return viommu->ops->check_irq_remapping(d, request);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/xen/viommu.h b/xen/include/xen/viommu.h
index 73b853f..c1dfaec 100644
--- a/xen/include/xen/viommu.h
+++ b/xen/include/xen/viommu.h
@@ -29,6 +29,8 @@ struct viommu;
 struct viommu_ops {
     uint8_t type;
     int (*create)(struct domain *d, struct viommu *viommu);
+    bool (*check_irq_remapping)(const struct domain *d,
+                                const struct arch_irq_remapping_request *request);
     int (*destroy)(struct viommu *viommu);
     int (*handle_irq_request)(const struct domain *d,
                               const struct arch_irq_remapping_request *request);
@@ -56,6 +58,8 @@ int viommu_handle_irq_request(const struct domain *d,
 int viommu_get_irq_info(const struct domain *d,
                         const struct arch_irq_remapping_request *request,
                         struct arch_irq_remapping_info *irq_info);
+bool viommu_check_irq_remapping(const struct domain *d,
+                                const struct arch_irq_remapping_request *request);
 #else
 static inline int viommu_destroy_domain(struct domain *d)
 {
-- 
1.8.3.1




* [PATCH v4 06/28] vtd: clean-up and preparation for vvtd
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
                   ` (4 preceding siblings ...)
  2017-11-17  6:22 ` [PATCH v4 05/28] VIOMMU: Introduce callback of checking irq remapping mode Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2018-02-09 15:17   ` Roger Pau Monné
  2017-11-17  6:22 ` [PATCH v4 07/28] x86/hvm: Introduce a emulated VTD for HVM Chao Gao
                   ` (22 subsequent siblings)
  28 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Tim Deegan,
	Jan Beulich, Andrew Cooper, Chao Gao, Roger Pau Monné

This patch contains the following changes:
- align register definitions
- use MASK_EXTR to define some macros for extended capabilities
rather than open-coding the masks
- define fields of FECTL and FSTS as uint32_t rather than u64 since
FECTL and FSTS are 32-bit registers.

No functional changes.
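To see why this is purely mechanical: MASK_EXTR masks off a field and divides by the mask's lowest set bit, which is equivalent to the open-coded shift-and-mask it replaces. The sketch below mirrors Xen's MASK_EXTR definition alongside two of the ecap accessors; the sample register values are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors Xen's MASK_EXTR: (v & m) shifted down by the position of the
 * mask's lowest set bit ((m) & -(m) isolates that bit, and dividing by a
 * power of two is a right shift). */
#define MASK_EXTR(v, m) (((v) & (m)) / ((m) & -(m)))

#define DMA_ECAP_EIM        ((uint64_t)1 << 4)
#define DMA_ECAP_INTR_REMAP ((uint64_t)1 << 3)

/* Equivalent to the old ((e >> 4) & 0x1) and ((e >> 3) & 0x1). */
#define ecap_eim(e)         MASK_EXTR(e, DMA_ECAP_EIM)
#define ecap_intr_remap(e)  MASK_EXTR(e, DMA_ECAP_INTR_REMAP)
```

The advantage is that each bit's position is stated exactly once, in the mask definition.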

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>

---
v4:
 - Only fix the alignment and defer introducing new definition to when
 they are needed
 (Suggested-by Roger Pau Monné)
 - remove parts of open-coded masks
v3:
 - new
---
 xen/drivers/passthrough/vtd/iommu.h | 86 +++++++++++++++++++++----------------
 1 file changed, 48 insertions(+), 38 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index 72c1a2e..db80b31 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -26,28 +26,28 @@
  * Intel IOMMU register specification per version 1.0 public spec.
  */
 
-#define    DMAR_VER_REG    0x0    /* Arch version supported by this IOMMU */
-#define    DMAR_CAP_REG    0x8    /* Hardware supported capabilities */
-#define    DMAR_ECAP_REG    0x10    /* Extended capabilities supported */
-#define    DMAR_GCMD_REG    0x18    /* Global command register */
-#define    DMAR_GSTS_REG    0x1c    /* Global status register */
-#define    DMAR_RTADDR_REG    0x20    /* Root entry table */
-#define    DMAR_CCMD_REG    0x28    /* Context command reg */
-#define    DMAR_FSTS_REG    0x34    /* Fault Status register */
-#define    DMAR_FECTL_REG    0x38    /* Fault control register */
-#define    DMAR_FEDATA_REG    0x3c    /* Fault event interrupt data register */
-#define    DMAR_FEADDR_REG    0x40    /* Fault event interrupt addr register */
-#define    DMAR_FEUADDR_REG 0x44    /* Upper address register */
-#define    DMAR_AFLOG_REG    0x58    /* Advanced Fault control */
-#define    DMAR_PMEN_REG    0x64    /* Enable Protected Memory Region */
-#define    DMAR_PLMBASE_REG 0x68    /* PMRR Low addr */
-#define    DMAR_PLMLIMIT_REG 0x6c    /* PMRR low limit */
-#define    DMAR_PHMBASE_REG 0x70    /* pmrr high base addr */
-#define    DMAR_PHMLIMIT_REG 0x78    /* pmrr high limit */
-#define    DMAR_IQH_REG    0x80    /* invalidation queue head */
-#define    DMAR_IQT_REG    0x88    /* invalidation queue tail */
-#define    DMAR_IQA_REG    0x90    /* invalidation queue addr */
-#define    DMAR_IRTA_REG   0xB8    /* intr remap */
+#define DMAR_VER_REG            0x0  /* Arch version supported by this IOMMU */
+#define DMAR_CAP_REG            0x8  /* Hardware supported capabilities */
+#define DMAR_ECAP_REG           0x10 /* Extended capabilities supported */
+#define DMAR_GCMD_REG           0x18 /* Global command register */
+#define DMAR_GSTS_REG           0x1c /* Global status register */
+#define DMAR_RTADDR_REG         0x20 /* Root entry table */
+#define DMAR_CCMD_REG           0x28 /* Context command reg */
+#define DMAR_FSTS_REG           0x34 /* Fault Status register */
+#define DMAR_FECTL_REG          0x38 /* Fault control register */
+#define DMAR_FEDATA_REG         0x3c /* Fault event interrupt data register */
+#define DMAR_FEADDR_REG         0x40 /* Fault event interrupt addr register */
+#define DMAR_FEUADDR_REG        0x44 /* Upper address register */
+#define DMAR_AFLOG_REG          0x58 /* Advanced Fault control */
+#define DMAR_PMEN_REG           0x64 /* Enable Protected Memory Region */
+#define DMAR_PLMBASE_REG        0x68 /* PMRR Low addr */
+#define DMAR_PLMLIMIT_REG       0x6c /* PMRR low limit */
+#define DMAR_PHMBASE_REG        0x70 /* pmrr high base addr */
+#define DMAR_PHMLIMIT_REG       0x78 /* pmrr high limit */
+#define DMAR_IQH_REG            0x80 /* invalidation queue head */
+#define DMAR_IQT_REG            0x88 /* invalidation queue tail */
+#define DMAR_IQA_REG            0x90 /* invalidation queue addr */
+#define DMAR_IRTA_REG           0xb8 /* intr remap */
 
 #define OFFSET_STRIDE        (9)
 #define dmar_readl(dmar, reg) readl((dmar) + (reg))
@@ -93,16 +93,26 @@
  * Extended Capability Register
  */
 
+#define DMA_ECAP_SNP_CTL        ((uint64_t)1 << 7)
+#define DMA_ECAP_PASS_THRU      ((uint64_t)1 << 6)
+#define DMA_ECAP_CACHE_HINTS    ((uint64_t)1 << 5)
+#define DMA_ECAP_EIM            ((uint64_t)1 << 4)
+#define DMA_ECAP_INTR_REMAP     ((uint64_t)1 << 3)
+#define DMA_ECAP_DEV_IOTLB      ((uint64_t)1 << 2)
+#define DMA_ECAP_QUEUED_INVAL   ((uint64_t)1 << 1)
+#define DMA_ECAP_COHERENT       ((uint64_t)1 << 0)
+
+#define ecap_snp_ctl(e)         MASK_EXTR(e, DMA_ECAP_SNP_CTL)
+#define ecap_pass_thru(e)       MASK_EXTR(e, DMA_ECAP_PASS_THRU)
+#define ecap_cache_hints(e)     MASK_EXTR(e, DMA_ECAP_CACHE_HINTS)
+#define ecap_eim(e)             MASK_EXTR(e, DMA_ECAP_EIM)
+#define ecap_intr_remap(e)      MASK_EXTR(e, DMA_ECAP_INTR_REMAP)
+#define ecap_dev_iotlb(e)       MASK_EXTR(e, DMA_ECAP_DEV_IOTLB)
+#define ecap_queued_inval(e)    MASK_EXTR(e, DMA_ECAP_QUEUED_INVAL)
+#define ecap_coherent(e)        MASK_EXTR(e, DMA_ECAP_COHERENT)
+
 #define ecap_niotlb_iunits(e)    ((((e) >> 24) & 0xff) + 1)
 #define ecap_iotlb_offset(e)     ((((e) >> 8) & 0x3ff) * 16)
-#define ecap_coherent(e)         ((e >> 0) & 0x1)
-#define ecap_queued_inval(e)     ((e >> 1) & 0x1)
-#define ecap_dev_iotlb(e)        ((e >> 2) & 0x1)
-#define ecap_intr_remap(e)       ((e >> 3) & 0x1)
-#define ecap_eim(e)              ((e >> 4) & 0x1)
-#define ecap_cache_hints(e)      ((e >> 5) & 0x1)
-#define ecap_pass_thru(e)        ((e >> 6) & 0x1)
-#define ecap_snp_ctl(e)          ((e >> 7) & 0x1)
 
 /* IOTLB_REG */
 #define DMA_TLB_FLUSH_GRANU_OFFSET  60
@@ -164,16 +174,16 @@
 #define DMA_CCMD_CAIG_MASK(x) (((u64)x) & ((u64) 0x3 << 59))
 
 /* FECTL_REG */
-#define DMA_FECTL_IM (((u64)1) << 31)
+#define DMA_FECTL_IM        ((uint32_t)1 << 31)
 
 /* FSTS_REG */
-#define DMA_FSTS_PFO ((u64)1 << 0)
-#define DMA_FSTS_PPF ((u64)1 << 1)
-#define DMA_FSTS_AFO ((u64)1 << 2)
-#define DMA_FSTS_APF ((u64)1 << 3)
-#define DMA_FSTS_IQE ((u64)1 << 4)
-#define DMA_FSTS_ICE ((u64)1 << 5)
-#define DMA_FSTS_ITE ((u64)1 << 6)
+#define DMA_FSTS_PFO        ((uint32_t)1 << 0)
+#define DMA_FSTS_PPF        ((uint32_t)1 << 1)
+#define DMA_FSTS_AFO        ((uint32_t)1 << 2)
+#define DMA_FSTS_APF        ((uint32_t)1 << 3)
+#define DMA_FSTS_IQE        ((uint32_t)1 << 4)
+#define DMA_FSTS_ICE        ((uint32_t)1 << 5)
+#define DMA_FSTS_ITE        ((uint32_t)1 << 6)
 #define DMA_FSTS_FAULTS    DMA_FSTS_PFO | DMA_FSTS_PPF | DMA_FSTS_AFO | DMA_FSTS_APF | DMA_FSTS_IQE | DMA_FSTS_ICE | DMA_FSTS_ITE
 #define dma_fsts_fault_record_index(s) (((s) >> 8) & 0xff)
 
-- 
1.8.3.1




* [PATCH v4 07/28] x86/hvm: Introduce a emulated VTD for HVM
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
                   ` (5 preceding siblings ...)
  2017-11-17  6:22 ` [PATCH v4 06/28] vtd: clean-up and preparation for vvtd Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2018-02-09 16:27   ` Roger Pau Monné
  2017-11-17  6:22 ` [PATCH v4 08/28] x86/vvtd: Add MMIO handler for VVTD Chao Gao
                   ` (21 subsequent siblings)
  28 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Tim Deegan,
	Jan Beulich, Andrew Cooper, Chao Gao, Roger Pau Monné

This patch adds create/destroy functions for the emulated VT-d
and adapts them to the common vIOMMU abstraction.

As the Makefile is changed here anyway, also put its entries in
alphabetical order.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>

---
v4:
- use REGISTER_VIOMMU
- shrink the size of hvm_hw_vvtd_regs
- make hvm_hw_vvtd_regs a field inside struct vvtd
---
 xen/drivers/passthrough/vtd/Makefile |   7 +-
 xen/drivers/passthrough/vtd/iommu.h  |   9 +++
 xen/drivers/passthrough/vtd/vvtd.c   | 150 +++++++++++++++++++++++++++++++++++
 3 files changed, 163 insertions(+), 3 deletions(-)
 create mode 100644 xen/drivers/passthrough/vtd/vvtd.c

diff --git a/xen/drivers/passthrough/vtd/Makefile b/xen/drivers/passthrough/vtd/Makefile
index f302653..163c7fe 100644
--- a/xen/drivers/passthrough/vtd/Makefile
+++ b/xen/drivers/passthrough/vtd/Makefile
@@ -1,8 +1,9 @@
 subdir-$(CONFIG_X86) += x86
 
-obj-y += iommu.o
 obj-y += dmar.o
-obj-y += utils.o
-obj-y += qinval.o
 obj-y += intremap.o
+obj-y += iommu.o
+obj-y += qinval.o
 obj-y += quirks.o
+obj-y += utils.o
+obj-$(CONFIG_VIOMMU) += vvtd.o
diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index db80b31..f2ef3dd 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -47,6 +47,7 @@
 #define DMAR_IQH_REG            0x80 /* invalidation queue head */
 #define DMAR_IQT_REG            0x88 /* invalidation queue tail */
 #define DMAR_IQA_REG            0x90 /* invalidation queue addr */
+#define DMAR_IECTL_REG          0xa0 /* invalidation event control register */
 #define DMAR_IRTA_REG           0xb8 /* intr remap */
 
 #define OFFSET_STRIDE        (9)
@@ -89,6 +90,12 @@
 #define cap_afl(c)        (((c) >> 3) & 1)
 #define cap_ndoms(c)        (1 << (4 + 2 * ((c) & 0x7)))
 
+#define cap_set_num_fault_regs(c)   ((((c) - 1) & 0xff) << 40)
+#define cap_set_fault_reg_offset(c) ((((c) / 16) & 0x3ff) << 24)
+#define cap_set_mgaw(c)             ((((c) - 1) & 0x3f) << 16)
+#define cap_set_sagaw(c)            (((c) & 0x1f) << 8)
+#define cap_set_ndoms(c)            ((c) & 0x7)
+
 /*
  * Extended Capability Register
  */
@@ -114,6 +121,8 @@
 #define ecap_niotlb_iunits(e)    ((((e) >> 24) & 0xff) + 1)
 #define ecap_iotlb_offset(e)     ((((e) >> 8) & 0x3ff) * 16)
 
+#define ecap_set_mhmv(e)         (((e) & 0xf) << 20)
+
 /* IOTLB_REG */
 #define DMA_TLB_FLUSH_GRANU_OFFSET  60
 #define DMA_TLB_GLOBAL_FLUSH (((u64)1) << 60)
diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
new file mode 100644
index 0000000..9f76ccf
--- /dev/null
+++ b/xen/drivers/passthrough/vtd/vvtd.c
@@ -0,0 +1,150 @@
+/*
+ * vvtd.c
+ *
+ * virtualize VTD for HVM.
+ *
+ * Copyright (C) 2017 Chao Gao, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/sched.h>
+#include <xen/types.h>
+#include <xen/viommu.h>
+#include <xen/xmalloc.h>
+#include <asm/current.h>
+#include <asm/hvm/domain.h>
+
+#include "iommu.h"
+
+/* Supported capabilities by vvtd */
+#define VVTD_MAX_CAPS VIOMMU_CAP_IRQ_REMAPPING
+
+#define VVTD_FRCD_NUM   1ULL
+#define VVTD_FRCD_START (DMAR_IRTA_REG + 8)
+#define VVTD_FRCD_END   (VVTD_FRCD_START + VVTD_FRCD_NUM * 16)
+#define VVTD_MAX_OFFSET VVTD_FRCD_END
+
+struct hvm_hw_vvtd {
+    uint32_t regs[VVTD_MAX_OFFSET/sizeof(uint32_t)];
+};
+
+struct vvtd {
+    /* Base address of remapping hardware register-set */
+    uint64_t base_addr;
+    /* Point back to the owner domain */
+    struct domain *domain;
+
+    struct hvm_hw_vvtd hw;
+};
+
+/* Setting viommu_verbose enables debugging messages of vIOMMU */
+bool __read_mostly viommu_verbose;
+boolean_runtime_param("viommu_verbose", viommu_verbose);
+
+#ifndef NDEBUG
+#define vvtd_info(fmt...) do {                    \
+    if ( viommu_verbose )                         \
+        gprintk(XENLOG_INFO, ## fmt);             \
+} while(0)
+/*
+ * Use printk and '_G_' prefix because vvtd_debug() may be called
+ * in the context of another domain's vCPU. Don't output 'current'
+ * information to avoid confusion.
+ */
+#define vvtd_debug(fmt...) do {                   \
+    if ( viommu_verbose && printk_ratelimit())    \
+        printk(XENLOG_G_DEBUG fmt);               \
+} while(0)
+#else
+#define vvtd_info(...) do {} while(0)
+#define vvtd_debug(...) do {} while(0)
+#endif
+
+#define VVTD_REG_POS(vvtd, offset) &(vvtd->hw.regs[offset/sizeof(uint32_t)])
+
+static inline void vvtd_set_reg(struct vvtd *vvtd, uint32_t reg, uint32_t value)
+{
+    *VVTD_REG_POS(vvtd, reg) = value;
+}
+
+static inline uint32_t vvtd_get_reg(const struct vvtd *vvtd, uint32_t reg)
+{
+    return *VVTD_REG_POS(vvtd, reg);
+}
+
+static inline void vvtd_set_reg_quad(struct vvtd *vvtd, uint32_t reg,
+                                     uint64_t value)
+{
+    *(uint64_t*)VVTD_REG_POS(vvtd, reg) = value;
+}
+
+static inline uint64_t vvtd_get_reg_quad(const struct vvtd *vvtd, uint32_t reg)
+{
+    return *(uint64_t*)VVTD_REG_POS(vvtd, reg);
+}
+
+static void vvtd_reset(struct vvtd *vvtd)
+{
+    uint64_t cap = cap_set_num_fault_regs(VVTD_FRCD_NUM)
+                   | cap_set_fault_reg_offset(VVTD_FRCD_START)
+                   | cap_set_mgaw(39) /* maximum guest address width */
+                   | cap_set_sagaw(2) /* support 3-level page_table */
+                   | cap_set_ndoms(6); /* support 64K domains */
+    uint64_t ecap = DMA_ECAP_QUEUED_INVAL | DMA_ECAP_INTR_REMAP | DMA_ECAP_EIM |
+                    ecap_set_mhmv(0xf);
+
+    vvtd_set_reg(vvtd, DMAR_VER_REG, 0x10UL);
+    vvtd_set_reg_quad(vvtd, DMAR_CAP_REG, cap);
+    vvtd_set_reg_quad(vvtd, DMAR_ECAP_REG, ecap);
+    vvtd_set_reg(vvtd, DMAR_FECTL_REG, 0x80000000UL);
+    vvtd_set_reg(vvtd, DMAR_IECTL_REG, 0x80000000UL);
+}
+
+static int vvtd_create(struct domain *d, struct viommu *viommu)
+{
+    struct vvtd *vvtd;
+
+    if ( !is_hvm_domain(d) || (viommu->base_address & (PAGE_SIZE - 1)) ||
+         (~VVTD_MAX_CAPS & viommu->caps) )
+        return -EINVAL;
+
+    vvtd = xzalloc_bytes(sizeof(struct vvtd));
+    if ( !vvtd )
+        return -ENOMEM;
+
+    vvtd_reset(vvtd);
+    vvtd->base_addr = viommu->base_address;
+    vvtd->domain = d;
+
+    viommu->priv = vvtd;
+
+    return 0;
+}
+
+static int vvtd_destroy(struct viommu *viommu)
+{
+    struct vvtd *vvtd = viommu->priv;
+
+    if ( vvtd )
+        xfree(vvtd);
+
+    return 0;
+}
+
+static const struct viommu_ops vvtd_hvm_vmx_ops = {
+    .create = vvtd_create,
+    .destroy = vvtd_destroy,
+};
+
+REGISTER_VIOMMU(vvtd_hvm_vmx_ops);
-- 
1.8.3.1




* [PATCH v4 08/28] x86/vvtd: Add MMIO handler for VVTD
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
                   ` (6 preceding siblings ...)
  2017-11-17  6:22 ` [PATCH v4 07/28] x86/hvm: Introduce a emulated VTD for HVM Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2018-02-09 16:39   ` Roger Pau Monné
  2017-11-17  6:22 ` [PATCH v4 09/28] x86/vvtd: Set Interrupt Remapping Table Pointer through GCMD Chao Gao
                   ` (20 subsequent siblings)
  28 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Tim Deegan,
	Jan Beulich, Andrew Cooper, Chao Gao, Roger Pau Monné

This patch adds a VVTD MMIO handler to deal with MMIO accesses.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
---
v4:
 - only trap the register emulated in vvtd_in_range().
   i.e. replace PAGE_SIZE with the VVTD_MAX_OFFSET
---
 xen/drivers/passthrough/vtd/vvtd.c | 55 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
index 9f76ccf..d78d878 100644
--- a/xen/drivers/passthrough/vtd/vvtd.c
+++ b/xen/drivers/passthrough/vtd/vvtd.c
@@ -94,6 +94,60 @@ static inline uint64_t vvtd_get_reg_quad(const struct vvtd *vvtd, uint32_t reg)
     return *(uint64_t*)VVTD_REG_POS(vvtd, reg);
 }
 
+static void *domain_vvtd(const struct domain *d)
+{
+    if ( is_hvm_domain(d) && d->arch.hvm_domain.viommu )
+        return d->arch.hvm_domain.viommu->priv;
+    else
+        return NULL;
+}
+
+static int vvtd_in_range(struct vcpu *v, unsigned long addr)
+{
+    struct vvtd *vvtd = domain_vvtd(v->domain);
+
+    if ( vvtd )
+        return (addr >= vvtd->base_addr) &&
+               (addr < vvtd->base_addr + VVTD_MAX_OFFSET);
+    return 0;
+}
+
+static int vvtd_read(struct vcpu *v, unsigned long addr,
+                     unsigned int len, unsigned long *pval)
+{
+    struct vvtd *vvtd = domain_vvtd(v->domain);
+    unsigned int offset = addr - vvtd->base_addr;
+
+    vvtd_info("Read offset %x len %d\n", offset, len);
+
+    if ( (len != 4 && len != 8) || (offset & (len - 1)) )
+        return X86EMUL_OKAY;
+
+    if ( len == 4 )
+        *pval = vvtd_get_reg(vvtd, offset);
+    else
+        *pval = vvtd_get_reg_quad(vvtd, offset);
+
+    return X86EMUL_OKAY;
+}
+
+static int vvtd_write(struct vcpu *v, unsigned long addr,
+                      unsigned int len, unsigned long val)
+{
+    struct vvtd *vvtd = domain_vvtd(v->domain);
+    unsigned int offset = addr - vvtd->base_addr;
+
+    vvtd_info("Write offset %x len %d val %lx\n", offset, len, val);
+
+    return X86EMUL_OKAY;
+}
+
+static const struct hvm_mmio_ops vvtd_mmio_ops = {
+    .check = vvtd_in_range,
+    .read = vvtd_read,
+    .write = vvtd_write
+};
+
 static void vvtd_reset(struct vvtd *vvtd)
 {
     uint64_t cap = cap_set_num_fault_regs(VVTD_FRCD_NUM)
@@ -126,6 +180,7 @@ static int vvtd_create(struct domain *d, struct viommu *viommu)
     vvtd_reset(vvtd);
     vvtd->base_addr = viommu->base_address;
     vvtd->domain = d;
+    register_mmio_handler(d, &vvtd_mmio_ops);
 
     viommu->priv = vvtd;
 
-- 
1.8.3.1




* [PATCH v4 09/28] x86/vvtd: Set Interrupt Remapping Table Pointer through GCMD
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
                   ` (7 preceding siblings ...)
  2017-11-17  6:22 ` [PATCH v4 08/28] x86/vvtd: Add MMIO handler for VVTD Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2018-02-09 16:59   ` Roger Pau Monné
  2017-11-17  6:22 ` [PATCH v4 10/28] x86/vvtd: Enable Interrupt Remapping " Chao Gao
                   ` (19 subsequent siblings)
  28 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Tim Deegan,
	Jan Beulich, Andrew Cooper, Chao Gao, Roger Pau Monné

Software sets the SIRTP field of GCMD to set/update the interrupt remapping
table pointer used by hardware. The interrupt remapping table pointer is
specified through the Interrupt Remapping Table Address (IRTA_REG)
register.

This patch emulates this operation and adds some new fields to VVTD to track
information about the interrupt remapping table (e.g. the table's gfn and the
maximum number of supported entries).
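The IRTA_REG encoding being emulated packs both pieces of tracked state into one register: the low 4 bits give the table size as 2^(S+1) entries, and the 4KB-aligned table address sits in the upper bits. A small sketch of the arithmetic (the macros match the ones this patch introduces; the helper name is invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* IRTA_REG field extraction, as introduced by this patch: a 4KB-aligned
 * table base plus a size field S encoding 2^(S+1) entries. */
#define DMA_IRTA_ADDR(val)  ((val) & ~0xfffULL)
#define DMA_IRTA_S(val)     ((val) & 0xf)
#define DMA_IRTA_SIZE(val)  (1UL << (DMA_IRTA_S(val) + 1))

/* Hypothetical helper mirroring what write_gcmd_sirtp() stores in
 * vvtd->hw.irt: the gfn of the table base (PFN_DOWN of the address). */
static uint64_t irta_to_gfn(uint64_t irta)
{
    return DMA_IRTA_ADDR(irta) >> 12;
}
```

So an IRTA value with S = 7 describes a 256-entry table, and the stray low bits of the address field are ignored when computing the gfn.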

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>

---
v4:
 - declare eim_enabled as bool and irt as gfn_t
 - rename vvtd_handle_gcmd_sirtp() to write_gcmd_sirtp()

v3:
 - ignore unaligned r/w of vt-d hardware registers and return X86EMUL_OK
---
 xen/drivers/passthrough/vtd/iommu.h | 16 ++++++-
 xen/drivers/passthrough/vtd/vvtd.c  | 86 +++++++++++++++++++++++++++++++++++++
 2 files changed, 100 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index f2ef3dd..8579843 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -48,7 +48,8 @@
 #define DMAR_IQT_REG            0x88 /* invalidation queue tail */
 #define DMAR_IQA_REG            0x90 /* invalidation queue addr */
 #define DMAR_IECTL_REG          0xa0 /* invalidation event control register */
-#define DMAR_IRTA_REG           0xb8 /* intr remap */
+#define DMAR_IRTA_REG           0xb8 /* base address of intr remap table */
+#define DMAR_IRTUA_REG          0xbc /* upper address of intr remap table */
 
 #define OFFSET_STRIDE        (9)
 #define dmar_readl(dmar, reg) readl((dmar) + (reg))
@@ -150,6 +151,9 @@
 #define DMA_GCMD_SIRTP  (((u64)1) << 24)
 #define DMA_GCMD_CFI    (((u64)1) << 23)
 
+/* mask of one-shot bits */
+#define DMA_GCMD_ONE_SHOT_MASK 0x96ffffff
+
 /* GSTS_REG */
 #define DMA_GSTS_TES    (((u64)1) << 31)
 #define DMA_GSTS_RTPS   (((u64)1) << 30)
@@ -157,10 +161,18 @@
 #define DMA_GSTS_AFLS   (((u64)1) << 28)
 #define DMA_GSTS_WBFS   (((u64)1) << 27)
 #define DMA_GSTS_QIES   (((u64)1) <<26)
+#define DMA_GSTS_SIRTPS_SHIFT   24
+#define DMA_GSTS_SIRTPS (((u64)1) << DMA_GSTS_SIRTPS_SHIFT)
 #define DMA_GSTS_IRES   (((u64)1) <<25)
-#define DMA_GSTS_SIRTPS (((u64)1) << 24)
 #define DMA_GSTS_CFIS   (((u64)1) <<23)
 
+/* IRTA_REG */
+/* The base of 4KB aligned interrupt remapping table */
+#define DMA_IRTA_ADDR(val)      ((val) & ~0xfffULL)
+/* The size of remapping table is 2^(x+1), where x is the size field in IRTA */
+#define DMA_IRTA_S(val)         (val & 0xf)
+#define DMA_IRTA_SIZE(val)      (1UL << (DMA_IRTA_S(val) + 1))
+
 /* PMEN_REG */
 #define DMA_PMEN_EPM    (((u32)1) << 31)
 #define DMA_PMEN_PRS    (((u32)1) << 0)
diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
index d78d878..f0476fe 100644
--- a/xen/drivers/passthrough/vtd/vvtd.c
+++ b/xen/drivers/passthrough/vtd/vvtd.c
@@ -36,6 +36,12 @@
 #define VVTD_MAX_OFFSET VVTD_FRCD_END
 
 struct hvm_hw_vvtd {
+    bool eim_enabled;
+
+    /* Interrupt remapping table base gfn and the max of entries */
+    uint16_t irt_max_entry;
+    gfn_t irt;
+
     uint32_t regs[VVTD_MAX_OFFSET/sizeof(uint32_t)];
 };
 
@@ -73,6 +79,16 @@ boolean_runtime_param("viommu_verbose", viommu_verbose);
 
 #define VVTD_REG_POS(vvtd, offset) &(vvtd->hw.regs[offset/sizeof(uint32_t)])
 
+static inline void vvtd_set_bit(struct vvtd *vvtd, uint32_t reg, int nr)
+{
+    __set_bit(nr, VVTD_REG_POS(vvtd, reg));
+}
+
+static inline void vvtd_clear_bit(struct vvtd *vvtd, uint32_t reg, int nr)
+{
+    __clear_bit(nr, VVTD_REG_POS(vvtd, reg));
+}
+
 static inline void vvtd_set_reg(struct vvtd *vvtd, uint32_t reg, uint32_t value)
 {
     *VVTD_REG_POS(vvtd, reg) = value;
@@ -102,6 +118,52 @@ static void *domain_vvtd(const struct domain *d)
         return NULL;
 }
 
+static void write_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
+{
+    uint64_t irta = vvtd_get_reg_quad(vvtd, DMAR_IRTA_REG);
+
+    if ( !(val & DMA_GCMD_SIRTP) )
+        return;
+
+    /*
+     * Hardware clears this bit when software sets the SIRTP field in
+     * the Global Command register and sets it when hardware completes
+     * the 'Set Interrupt Remap Table Pointer' operation.
+     */
+    vvtd_clear_bit(vvtd, DMAR_GSTS_REG, DMA_GSTS_SIRTPS_SHIFT);
+
+    if ( gfn_x(vvtd->hw.irt) != PFN_DOWN(DMA_IRTA_ADDR(irta)) ||
+         vvtd->hw.irt_max_entry != DMA_IRTA_SIZE(irta) )
+    {
+        vvtd->hw.irt = _gfn(PFN_DOWN(DMA_IRTA_ADDR(irta)));
+        vvtd->hw.irt_max_entry = DMA_IRTA_SIZE(irta);
+        vvtd->hw.eim_enabled = !!(irta & IRTA_EIME);
+        vvtd_info("Update IR info (addr=%lx eim=%d size=%d)\n",
+                  gfn_x(vvtd->hw.irt), vvtd->hw.eim_enabled,
+                  vvtd->hw.irt_max_entry);
+    }
+    vvtd_set_bit(vvtd, DMAR_GSTS_REG, DMA_GSTS_SIRTPS_SHIFT);
+}
+
+static void vvtd_write_gcmd(struct vvtd *vvtd, uint32_t val)
+{
+    uint32_t orig = vvtd_get_reg(vvtd, DMAR_GSTS_REG);
+    uint32_t changed;
+
+    orig = orig & DMA_GCMD_ONE_SHOT_MASK;   /* reset the one-shot bits */
+    changed = orig ^ val;
+
+    if ( !changed )
+        return;
+
+    if ( changed & (changed - 1) )
+        vvtd_info("Write %x to GCMD (current %x), updating multiple fields",
+                  val, orig);
+
+    if ( changed & DMA_GCMD_SIRTP )
+        write_gcmd_sirtp(vvtd, val);
+}
+
 static int vvtd_in_range(struct vcpu *v, unsigned long addr)
 {
     struct vvtd *vvtd = domain_vvtd(v->domain);
@@ -139,6 +201,30 @@ static int vvtd_write(struct vcpu *v, unsigned long addr,
 
     vvtd_info("Write offset %x len %d val %lx\n", offset, len, val);
 
+    if ( (len != 4 && len != 8) || (offset & (len - 1)) )
+        return X86EMUL_OKAY;
+
+    switch ( offset )
+    {
+    case DMAR_GCMD_REG:
+        vvtd_write_gcmd(vvtd, val);
+        break;
+
+    case DMAR_IRTA_REG:
+        vvtd_set_reg(vvtd, offset, val);
+        if ( len == 4 )
+            break;
+        val = val >> 32;
+        offset += 4;
+        /* Fall through */
+    case DMAR_IRTUA_REG:
+        vvtd_set_reg(vvtd, offset, val);
+        break;
+
+    default:
+        break;
+    }
+
     return X86EMUL_OKAY;
 }
 
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 10/28] x86/vvtd: Enable Interrupt Remapping through GCMD
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
                   ` (8 preceding siblings ...)
  2017-11-17  6:22 ` [PATCH v4 09/28] x86/vvtd: Set Interrupt Remapping Table Pointer through GCMD Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2018-02-09 17:15   ` Roger Pau Monné
  2017-11-17  6:22 ` [PATCH v4 11/28] x86/vvtd: Process interrupt remapping request Chao Gao
                   ` (18 subsequent siblings)
  28 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Tim Deegan,
	Jan Beulich, Andrew Cooper, Chao Gao, Roger Pau Monné

Software writes this field to enable/disable interrupt remapping. This
patch emulates the IRES field of GCMD. Currently, the guest's whole IRT is
mapped into Xen permanently, to reduce interrupt delivery latency. The old
mapping, if present, is undone when setting up a new one.
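
As a standalone sketch of the sizing arithmetic behind the permanent
mapping (how many guest pages back an IRT whose entry count the IRTA size
field encodes as 2^(s+1)), in plain C. PFN_UP and IRTE_SIZE here are
simplified stand-ins for illustration, not Xen's actual definitions:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE  4096u
/* Stand-in for Xen's PFN_UP(): pages needed to cover 'bytes' bytes. */
#define PFN_UP(bytes) (((bytes) + PAGE_SIZE - 1) / PAGE_SIZE)

/* A VT-d interrupt remapping table entry is 128 bits (16 bytes). */
#define IRTE_SIZE  16u

/* IRTA size field s encodes 2^(s+1) entries. */
static uint32_t irta_entries(uint64_t irta)
{
    return 1u << ((irta & 0xf) + 1);
}

/* Number of guest pages the vIOMMU must map to cover the whole IRT. */
static uint32_t irt_pages(uint64_t irta)
{
    return PFN_UP(irta_entries(irta) * IRTE_SIZE);
}
```

With the maximum size field (s = 0xf) this gives 65536 entries, i.e. a
1MB table spanning 256 pages, which is what the PFN_UP(irt_max_entry *
sizeof(struct iremap_entry)) expressions in the patch compute.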

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
---
v4:
 - map guest's interrupt remapping table to Xen permanently rather than
 mapping one specific page on demand.
---
 xen/drivers/passthrough/vtd/iommu.h |  3 +-
 xen/drivers/passthrough/vtd/vvtd.c  | 98 +++++++++++++++++++++++++++++++++++++
 2 files changed, 100 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index 8579843..9c59aeb 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -161,9 +161,10 @@
 #define DMA_GSTS_AFLS   (((u64)1) << 28)
 #define DMA_GSTS_WBFS   (((u64)1) << 27)
 #define DMA_GSTS_QIES   (((u64)1) <<26)
+#define DMA_GSTS_IRES_SHIFT     25
+#define DMA_GSTS_IRES   (((u64)1) << DMA_GSTS_IRES_SHIFT)
 #define DMA_GSTS_SIRTPS_SHIFT   24
 #define DMA_GSTS_SIRTPS (((u64)1) << DMA_GSTS_SIRTPS_SHIFT)
-#define DMA_GSTS_IRES   (((u64)1) <<25)
 #define DMA_GSTS_CFIS   (((u64)1) <<23)
 
 /* IRTA_REG */
diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
index f0476fe..06e522a 100644
--- a/xen/drivers/passthrough/vtd/vvtd.c
+++ b/xen/drivers/passthrough/vtd/vvtd.c
@@ -24,6 +24,7 @@
 #include <xen/xmalloc.h>
 #include <asm/current.h>
 #include <asm/hvm/domain.h>
+#include <asm/p2m.h>
 
 #include "iommu.h"
 
@@ -37,6 +38,7 @@
 
 struct hvm_hw_vvtd {
     bool eim_enabled;
+    bool intremap_enabled;
 
     /* Interrupt remapping table base gfn and the number of entries */
     uint16_t irt_max_entry;
@@ -52,6 +54,7 @@ struct vvtd {
     struct domain *domain;
 
     struct hvm_hw_vvtd hw;
+    void *irt_base;
 };
 
 /* Setting viommu_verbose enables debugging messages of vIOMMU */
@@ -118,6 +121,77 @@ static void *domain_vvtd(const struct domain *d)
         return NULL;
 }
 
+static void *map_guest_pages(struct domain *d, uint64_t gfn, uint32_t nr)
+{
+    mfn_t *mfn = xmalloc_array(mfn_t, nr);
+    void* ret;
+    int i;
+
+    if ( !mfn )
+        return NULL;
+
+    for ( i = 0; i < nr; i++)
+    {
+        struct page_info *p = get_page_from_gfn(d, gfn + i, NULL, P2M_ALLOC);
+
+        if ( !p || !get_page_type(p, PGT_writable_page) )
+        {
+            if ( p )
+                put_page(p);
+            goto undo;
+        }
+
+        mfn[i] = _mfn(page_to_mfn(p));
+    }
+
+    ret = vmap(mfn, nr);
+    if ( ret == NULL )
+        goto undo;
+    xfree(mfn);
+
+    return ret;
+
+ undo:
+    for ( ; --i >= 0; )
+        put_page_and_type(mfn_to_page(mfn_x(mfn[i])));
+    xfree(mfn);
+    gprintk(XENLOG_ERR, "Failed to map guest pages %lx nr %x\n", gfn, nr);
+
+    return NULL;
+}
+
+static void unmap_guest_pages(void *va, uint32_t nr)
+{
+    unsigned long *mfn = xmalloc_array(unsigned long, nr);
+    int i;
+    void *va_copy = va;
+
+    if ( !mfn )
+    {
+        printk("%s %d: No free memory\n", __FILE__, __LINE__);
+        return;
+    }
+
+    for ( i = 0; i < nr; i++, va += PAGE_SIZE)
+        mfn[i] = domain_page_map_to_mfn(va);
+
+    vunmap(va_copy);
+
+    for ( i = 0; i < nr; i++)
+        put_page_and_type(mfn_to_page(mfn[i]));
+}
+
+static void write_gcmd_ire(struct vvtd *vvtd, uint32_t val)
+{
+    bool set = val & DMA_GCMD_IRE;
+
+    vvtd_info("%sable Interrupt Remapping\n", set ? "En" : "Dis");
+
+    vvtd->hw.intremap_enabled = set;
+    (set ? vvtd_set_bit : vvtd_clear_bit)
+        (vvtd, DMAR_GSTS_REG, DMA_GSTS_IRES_SHIFT);
+}
+
 static void write_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
 {
     uint64_t irta = vvtd_get_reg_quad(vvtd, DMAR_IRTA_REG);
@@ -131,16 +205,29 @@ static void write_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
      * the 'Set Interrupt Remap Table Pointer' operation.
      */
     vvtd_clear_bit(vvtd, DMAR_GSTS_REG, DMA_GSTS_SIRTPS_SHIFT);
+    if ( vvtd->hw.intremap_enabled )
+        vvtd_info("Update Interrupt Remapping Table when active\n");
 
     if ( gfn_x(vvtd->hw.irt) != PFN_DOWN(DMA_IRTA_ADDR(irta)) ||
          vvtd->hw.irt_max_entry != DMA_IRTA_SIZE(irta) )
     {
+        if ( vvtd->irt_base )
+        {
+            unmap_guest_pages(vvtd->irt_base,
+                              PFN_UP(vvtd->hw.irt_max_entry *
+                                     sizeof(struct iremap_entry)));
+            vvtd->irt_base = NULL;
+        }
         vvtd->hw.irt = _gfn(PFN_DOWN(DMA_IRTA_ADDR(irta)));
         vvtd->hw.irt_max_entry = DMA_IRTA_SIZE(irta);
         vvtd->hw.eim_enabled = !!(irta & IRTA_EIME);
         vvtd_info("Update IR info (addr=%lx eim=%d size=%d)\n",
                   gfn_x(vvtd->hw.irt), vvtd->hw.eim_enabled,
                   vvtd->hw.irt_max_entry);
+
+        vvtd->irt_base = map_guest_pages(vvtd->domain, gfn_x(vvtd->hw.irt),
+                                         PFN_UP(vvtd->hw.irt_max_entry *
+                                                sizeof(struct iremap_entry)));
     }
     vvtd_set_bit(vvtd, DMAR_GSTS_REG, DMA_GSTS_SIRTPS_SHIFT);
 }
@@ -162,6 +249,8 @@ static void vvtd_write_gcmd(struct vvtd *vvtd, uint32_t val)
 
     if ( changed & DMA_GCMD_SIRTP )
         write_gcmd_sirtp(vvtd, val);
+    if ( changed & DMA_GCMD_IRE )
+        write_gcmd_ire(vvtd, val);
 }
 
 static int vvtd_in_range(struct vcpu *v, unsigned long addr)
@@ -278,7 +367,16 @@ static int vvtd_destroy(struct viommu *viommu)
     struct vvtd *vvtd = viommu->priv;
 
     if ( vvtd )
+    {
+        if ( vvtd->irt_base )
+        {
+            unmap_guest_pages(vvtd->irt_base,
+                              PFN_UP(vvtd->hw.irt_max_entry *
+                                     sizeof(struct iremap_entry)));
+            vvtd->irt_base = NULL;
+        }
         xfree(vvtd);
+    }
 
     return 0;
 }
-- 
1.8.3.1




* [PATCH v4 11/28] x86/vvtd: Process interrupt remapping request
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
                   ` (9 preceding siblings ...)
  2017-11-17  6:22 ` [PATCH v4 10/28] x86/vvtd: Enable Interrupt Remapping " Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2018-02-09 17:44   ` Roger Pau Monné
  2017-11-17  6:22 ` [PATCH v4 12/28] x86/vvtd: decode interrupt attribute from IRTE Chao Gao
                   ` (17 subsequent siblings)
  28 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Tim Deegan,
	Jan Beulich, Andrew Cooper, Chao Gao, Roger Pau Monné

When a remapping interrupt request arrives, remapping hardware computes the
interrupt_index per the algorithm described in the VT-d spec section
"Interrupt Remapping Table", interprets the IRTE and generates a remapped
interrupt request.

This patch introduces viommu_handle_irq_request() to emulate the process by
which remapping hardware handles a remapping interrupt request. This patch
also introduces a counter, inflight_intr, which counts the number of
interrupts currently being handled. The reason we need this counter is that
VT-d hardware must drain in-flight interrupts before setting flags to
indicate that certain operations have completed. These operations include
enabling interrupt remapping and performing certain invalidation requests.
In vvtd, we likewise drain in-flight interrupts by waiting until
inflight_intr drops to 0.
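
The index computation described above can be illustrated with a hedged,
self-contained C sketch of how the interrupt_index is assembled from a
remappable-format MSI address and data word, following the layout in the
VT-d spec ("Interrupt Requests in Remappable Format"). The names below are
hypothetical stand-ins, not the patch's actual structures:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Remappable-format MSI address layout (VT-d spec):
 *   [19:5] handle[14:0], [4] SHV, [3] format (1 = remappable),
 *   [2] handle[15]
 */
#define MSI_ADDR_FORMAT     (1u << 3)
#define MSI_ADDR_SHV        (1u << 4)

/*
 * Compute the interrupt_index for an MSI request.  Returns 0 and fills
 * *index on success, or -1 when the request is not in remapping format.
 */
static int msi_interrupt_index(uint32_t addr, uint32_t data, uint32_t *index)
{
    uint32_t handle;

    if ( !(addr & MSI_ADDR_FORMAT) )
        return -1;                      /* compatibility format */

    handle = ((addr >> 5) & 0x7fff) | (((addr >> 2) & 1) << 15);
    if ( addr & MSI_ADDR_SHV )
        handle += (uint16_t)data;       /* subhandle from data[15:0] */

    *index = handle;
    return 0;
}
```

This mirrors the (index_15 << 15) + index_0_14 (+ data when SHV is set)
expression in irq_remapping_request_index() below.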

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>

---
v4:
 - use "#define" to define interrupt remapping transition faults
 rather than using an enum
 - use switch-case rather than if-else in irq_remapping_request_index()
 and vvtd_irq_request_sanity_check()
 - introduce a counter inflight_intr

v3:
 - Encode map_guest_page()'s error into void* to avoid using another parameter
---
 xen/drivers/passthrough/vtd/iommu.h |  15 +++
 xen/drivers/passthrough/vtd/vvtd.c  | 219 ++++++++++++++++++++++++++++++++++++
 2 files changed, 234 insertions(+)

diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index 9c59aeb..82edd2a 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -216,6 +216,15 @@
 #define dma_frcd_source_id(c) (c & 0xffff)
 #define dma_frcd_page_addr(d) (d & (((u64)-1) << 12)) /* low 64 bit */
 
+/* Interrupt remapping transition faults */
+#define VTD_FR_IR_REQ_RSVD      0x20
+#define VTD_FR_IR_INDEX_OVER    0x21
+#define VTD_FR_IR_ENTRY_P       0x22
+#define VTD_FR_IR_ROOT_INVAL    0x23
+#define VTD_FR_IR_IRTE_RSVD     0x24
+#define VTD_FR_IR_REQ_COMPAT    0x25
+#define VTD_FR_IR_SID_ERR       0x26
+
 /*
  * 0: Present
  * 1-11: Reserved
@@ -356,6 +365,12 @@ struct iremap_entry {
 };
 
 /*
+ * When VT-d doesn't enable extended interrupt mode, hardware interprets
+ * only 8 bits ([15:8]) of the Destination-ID field in the IRTEs.
+ */
+#define IRTE_xAPIC_DEST_MASK 0xff00
+
+/*
  * Posted-interrupt descriptor address is 64 bits with 64-byte aligned, only
  * the upper 26 bits of lest significiant 32 bits is available.
  */
diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
index 06e522a..927e715 100644
--- a/xen/drivers/passthrough/vtd/vvtd.c
+++ b/xen/drivers/passthrough/vtd/vvtd.c
@@ -22,11 +22,15 @@
 #include <xen/types.h>
 #include <xen/viommu.h>
 #include <xen/xmalloc.h>
+#include <asm/apic.h>
 #include <asm/current.h>
+#include <asm/event.h>
+#include <asm/io_apic.h>
 #include <asm/hvm/domain.h>
 #include <asm/p2m.h>
 
 #include "iommu.h"
+#include "vtd.h"
 
 /* Supported capabilities by vvtd */
 #define VVTD_MAX_CAPS VIOMMU_CAP_IRQ_REMAPPING
@@ -52,6 +56,8 @@ struct vvtd {
     uint64_t base_addr;
     /* Point back to the owner domain */
     struct domain *domain;
+    /* # of in-flight interrupts */
+    atomic_t inflight_intr;
 
     struct hvm_hw_vvtd hw;
     void *irt_base;
@@ -181,6 +187,109 @@ static void unmap_guest_pages(void *va, uint32_t nr)
         put_page_and_type(mfn_to_page(mfn[i]));
 }
 
+static int vvtd_delivery(struct domain *d, uint8_t vector,
+                         uint32_t dest, bool dest_mode,
+                         uint8_t delivery_mode, uint8_t trig_mode)
+{
+    struct vlapic *target;
+    struct vcpu *v;
+
+    switch ( delivery_mode )
+    {
+    case dest_LowestPrio:
+        target = vlapic_lowest_prio(d, NULL, 0, dest, dest_mode);
+        if ( target != NULL )
+        {
+            vvtd_debug("d%d: dest=v%d dlm=%x vector=%d trig_mode=%d\n",
+                       vlapic_domain(target)->domain_id,
+                       vlapic_vcpu(target)->vcpu_id,
+                       delivery_mode, vector, trig_mode);
+            vlapic_set_irq(target, vector, trig_mode);
+            break;
+        }
+        vvtd_debug("d%d: null round robin: vector=%02x\n",
+                   d->domain_id, vector);
+        break;
+
+    case dest_Fixed:
+        for_each_vcpu ( d, v )
+            if ( vlapic_match_dest(vcpu_vlapic(v), NULL, 0, dest, dest_mode) )
+            {
+                vvtd_debug("d%d: dest=v%d dlm=%x vector=%d trig_mode=%d\n",
+                           v->domain->domain_id, v->vcpu_id,
+                           delivery_mode, vector, trig_mode);
+                vlapic_set_irq(vcpu_vlapic(v), vector, trig_mode);
+            }
+        break;
+
+    case dest_NMI:
+        for_each_vcpu ( d, v )
+            if ( vlapic_match_dest(vcpu_vlapic(v), NULL, 0, dest, dest_mode) &&
+                 !test_and_set_bool(v->nmi_pending) )
+                vcpu_kick(v);
+        break;
+
+    default:
+        gdprintk(XENLOG_WARNING, "Unsupported VTD delivery mode %d\n",
+                 delivery_mode);
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+/* Compute the IRTE index for a given interrupt request. On success, return
+ * 0 and set index to reference the corresponding IRTE. Otherwise, return < 0,
+ * i.e. -1 when the irq request isn't in remapping format.
+ */
+static int irq_remapping_request_index(
+    const struct arch_irq_remapping_request *irq, uint32_t *index)
+{
+    switch ( irq->type )
+    {
+    case VIOMMU_REQUEST_IRQ_MSI:
+    {
+        struct msi_msg_remap_entry msi_msg =
+        {
+            .address_lo = { .val = irq->msg.msi.addr },
+            .data = irq->msg.msi.data,
+        };
+
+        if ( !msi_msg.address_lo.format )
+            return -1;
+
+        *index = (msi_msg.address_lo.index_15 << 15) +
+                msi_msg.address_lo.index_0_14;
+        if ( msi_msg.address_lo.SHV )
+            *index += (uint16_t)msi_msg.data;
+        break;
+    }
+
+    case VIOMMU_REQUEST_IRQ_APIC:
+    {
+        struct IO_APIC_route_remap_entry remap_rte = { .val = irq->msg.rte };
+
+        if ( !remap_rte.format )
+            return -1;
+
+        *index = (remap_rte.index_15 << 15) + remap_rte.index_0_14;
+        break;
+    }
+
+    default:
+        ASSERT_UNREACHABLE();
+    }
+
+    return 0;
+}
+
+static inline uint32_t irte_dest(struct vvtd *vvtd, uint32_t dest)
+{
+    /* In xAPIC mode, only 8-bits([15:8]) are valid */
+    return vvtd->hw.eim_enabled ? dest
+                                : MASK_EXTR(dest, IRTE_xAPIC_DEST_MASK);
+}
+
 static void write_gcmd_ire(struct vvtd *vvtd, uint32_t val)
 {
     bool set = val & DMA_GCMD_IRE;
@@ -323,6 +432,115 @@ static const struct hvm_mmio_ops vvtd_mmio_ops = {
     .write = vvtd_write
 };
 
+static void vvtd_handle_fault(struct vvtd *vvtd,
+                              const struct arch_irq_remapping_request *irq,
+                              struct iremap_entry *irte,
+                              unsigned int fault)
+{
+    switch ( fault )
+    {
+    case VTD_FR_IR_SID_ERR:
+    case VTD_FR_IR_IRTE_RSVD:
+    case VTD_FR_IR_ENTRY_P:
+        if ( qinval_fault_disable(*irte) )
+            break;
+    /* fall through */
+    case VTD_FR_IR_REQ_RSVD:
+    case VTD_FR_IR_INDEX_OVER:
+    case VTD_FR_IR_ROOT_INVAL:
+        /* TODO: handle fault (e.g. record and report this fault to VM) */
+        break;
+
+    default:
+        vvtd_debug("d%d can't handle VT-d fault %x\n", vvtd->domain->domain_id,
+                   fault);
+    }
+    return;
+}
+
+static bool vvtd_irq_request_sanity_check(const struct vvtd *vvtd,
+                                   const struct arch_irq_remapping_request *irq)
+{
+    switch ( irq->type )
+    {
+    case VIOMMU_REQUEST_IRQ_APIC:
+    {
+        struct IO_APIC_route_remap_entry rte = { .val = irq->msg.rte };
+
+        return !rte.reserved;
+    }
+
+    case VIOMMU_REQUEST_IRQ_MSI:
+        return true;
+    }
+
+    ASSERT_UNREACHABLE();
+    return false;
+}
+
+static int vvtd_get_entry(struct vvtd *vvtd,
+                          const struct arch_irq_remapping_request *irq,
+                          struct iremap_entry *dest)
+{
+    uint32_t entry;
+    struct iremap_entry irte;
+    int ret = irq_remapping_request_index(irq, &entry);
+
+    ASSERT(!ret);
+
+    vvtd_debug("d%d: interpret a request with index %x\n",
+               vvtd->domain->domain_id, entry);
+
+    if ( !vvtd_irq_request_sanity_check(vvtd, irq) )
+        return VTD_FR_IR_REQ_RSVD;
+    else if ( entry >= vvtd->hw.irt_max_entry )
+        return VTD_FR_IR_INDEX_OVER;
+    else if ( !vvtd->irt_base )
+        return VTD_FR_IR_ROOT_INVAL;
+
+    irte = ((struct iremap_entry*)vvtd->irt_base)[entry];
+
+    if ( !qinval_present(irte) )
+        ret = VTD_FR_IR_ENTRY_P;
+    else if ( (irte.remap.res_1 || irte.remap.res_2 || irte.remap.res_3 ||
+               irte.remap.res_4) )
+        ret = VTD_FR_IR_IRTE_RSVD;
+
+    /* FIXME: We don't check against the source ID */
+
+    dest->val = irte.val;
+
+    return ret;
+}
+
+static int vvtd_handle_irq_request(const struct domain *d,
+                                   const struct arch_irq_remapping_request *irq)
+{
+    struct iremap_entry irte;
+    int ret;
+    struct vvtd *vvtd = domain_vvtd(d);
+
+    if ( !vvtd || !vvtd->hw.intremap_enabled )
+        return -ENODEV;
+
+    atomic_inc(&vvtd->inflight_intr);
+    ret = vvtd_get_entry(vvtd, irq, &irte);
+    if ( ret )
+    {
+        vvtd_handle_fault(vvtd, irq, &irte, ret);
+        goto out;
+    }
+
+    ret = vvtd_delivery(vvtd->domain, irte.remap.vector,
+                        irte_dest(vvtd, irte.remap.dst),
+                        irte.remap.dm, irte.remap.dlm,
+                        irte.remap.tm);
+
+ out:
+    atomic_dec(&vvtd->inflight_intr);
+    return ret;
+}
+
 static void vvtd_reset(struct vvtd *vvtd)
 {
     uint64_t cap = cap_set_num_fault_regs(VVTD_FRCD_NUM)
@@ -384,6 +602,7 @@ static int vvtd_destroy(struct viommu *viommu)
 static const struct viommu_ops vvtd_hvm_vmx_ops = {
     .create = vvtd_create,
     .destroy = vvtd_destroy,
+    .handle_irq_request = vvtd_handle_irq_request,
 };
 
 REGISTER_VIOMMU(vvtd_hvm_vmx_ops);
-- 
1.8.3.1




* [PATCH v4 12/28] x86/vvtd: decode interrupt attribute from IRTE
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
                   ` (10 preceding siblings ...)
  2017-11-17  6:22 ` [PATCH v4 11/28] x86/vvtd: Process interrupt remapping request Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2018-02-12 11:55   ` Roger Pau Monné
  2017-11-17  6:22 ` [PATCH v4 13/28] x86/vvtd: add a helper function to decide the interrupt format Chao Gao
                   ` (16 subsequent siblings)
  28 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Tim Deegan,
	Jan Beulich, Andrew Cooper, Chao Gao, Roger Pau Monné

Without interrupt remapping, interrupt attributes can be extracted from the
MSI message or the IOAPIC RTE. However, with interrupt remapping enabled,
the attributes are enclosed in the associated IRTE. This callback is
for cases in which the caller wants to acquire the interrupt attributes, for
example:
1. vioapic_get_vector(). With vIOMMU, the RTE may not contain the vector.
2. performing EOI, which is always based on the interrupt vector.
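
The decoding this callback performs can be sketched with plain C helpers
(hypothetical names; the actual patch uses struct iremap_entry bitfields
and MASK_EXTR). The vector sits in bits [23:16] of the IRTE's low qword,
and the destination field is narrowed to bits [15:8] unless EIM is on:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* In xAPIC mode only bits [15:8] of the IRTE Destination-ID are valid. */
#define IRTE_xAPIC_DEST_MASK 0xff00u

/* VT-d remap-format IRTE, low qword: vector lives in bits [23:16]. */
static uint8_t irte_vector(uint64_t lo)
{
    return (lo >> 16) & 0xff;
}

/* Fold the 32-bit destination field down to an APIC ID, honouring EIM. */
static uint32_t irte_dest(bool eim_enabled, uint32_t dst)
{
    return eim_enabled ? dst : (dst & IRTE_xAPIC_DEST_MASK) >> 8;
}
```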

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
---
v3:
 - add example cases in which we will use this function.
---
 xen/drivers/passthrough/vtd/vvtd.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
index 927e715..9890cc2 100644
--- a/xen/drivers/passthrough/vtd/vvtd.c
+++ b/xen/drivers/passthrough/vtd/vvtd.c
@@ -541,6 +541,30 @@ static int vvtd_handle_irq_request(const struct domain *d,
     return ret;
 }
 
+static int vvtd_get_irq_info(const struct domain *d,
+                             const struct arch_irq_remapping_request *irq,
+                             struct arch_irq_remapping_info *info)
+{
+    int ret;
+    struct iremap_entry irte;
+    struct vvtd *vvtd = domain_vvtd(d);
+
+    if ( !vvtd )
+        return -ENODEV;
+
+    ret = vvtd_get_entry(vvtd, irq, &irte);
+    /* Not in interrupt delivery context; don't report faults to the guest */
+    if ( ret )
+        return ret;
+
+    info->vector = irte.remap.vector;
+    info->dest = irte_dest(vvtd, irte.remap.dst);
+    info->dest_mode = irte.remap.dm;
+    info->delivery_mode = irte.remap.dlm;
+
+    return 0;
+}
+
 static void vvtd_reset(struct vvtd *vvtd)
 {
     uint64_t cap = cap_set_num_fault_regs(VVTD_FRCD_NUM)
@@ -603,6 +627,7 @@ static const struct viommu_ops vvtd_hvm_vmx_ops = {
     .create = vvtd_create,
     .destroy = vvtd_destroy,
     .handle_irq_request = vvtd_handle_irq_request,
+    .get_irq_info = vvtd_get_irq_info,
 };
 
 REGISTER_VIOMMU(vvtd_hvm_vmx_ops);
-- 
1.8.3.1




* [PATCH v4 13/28] x86/vvtd: add a helper function to decide the interrupt format
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
                   ` (11 preceding siblings ...)
  2017-11-17  6:22 ` [PATCH v4 12/28] x86/vvtd: decode interrupt attribute from IRTE Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2018-02-12 12:14   ` Roger Pau Monné
  2017-11-17  6:22 ` [PATCH v4 14/28] x86/vvtd: Handle interrupt translation faults Chao Gao
                   ` (15 subsequent siblings)
  28 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Tim Deegan,
	Jan Beulich, Andrew Cooper, Chao Gao, Roger Pau Monné

Different platforms may use different methods to distinguish
remapping-format interrupts from normal-format interrupts.

Intel uses one bit in the IOAPIC RTE or the MSI address register to
indicate that the interrupt is in remapping format. vvtd should handle
all interrupts for which .check_irq_remapping() returns true.
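
For the IOAPIC side, the format check described above amounts to testing a
single bit of the 64-bit RTE. A minimal sketch, assuming the
remappable-format RTE layout from the VT-d spec (bit 48 = format,
bits [63:49] = index[14:0], bit 11 = index[15]); helper names are
illustrative only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 48 of a remappable-format IOAPIC RTE is the format indicator. */
static bool ioapic_rte_is_remapped(uint64_t rte)
{
    return (rte >> 48) & 1;
}

/* Reassemble the 16-bit interrupt_index from its two RTE fields. */
static uint32_t ioapic_rte_index(uint64_t rte)
{
    return ((rte >> 49) & 0x7fff) | (((rte >> 11) & 1) << 15);
}
```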

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>

---
v3:
 - new
---
 xen/drivers/passthrough/vtd/vvtd.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
index 9890cc2..d3dec01 100644
--- a/xen/drivers/passthrough/vtd/vvtd.c
+++ b/xen/drivers/passthrough/vtd/vvtd.c
@@ -565,6 +565,15 @@ static int vvtd_get_irq_info(const struct domain *d,
     return 0;
 }
 
+/* check whether the interrupt request is remappable */
+static bool vvtd_is_remapping(const struct domain *d,
+                              const struct arch_irq_remapping_request *irq)
+{
+    uint32_t idx;
+
+    return !irq_remapping_request_index(irq, &idx);
+}
+
 static void vvtd_reset(struct vvtd *vvtd)
 {
     uint64_t cap = cap_set_num_fault_regs(VVTD_FRCD_NUM)
@@ -628,6 +637,7 @@ static const struct viommu_ops vvtd_hvm_vmx_ops = {
     .destroy = vvtd_destroy,
     .handle_irq_request = vvtd_handle_irq_request,
     .get_irq_info = vvtd_get_irq_info,
+    .check_irq_remapping = vvtd_is_remapping,
 };
 
 REGISTER_VIOMMU(vvtd_hvm_vmx_ops);
-- 
1.8.3.1




* [PATCH v4 14/28] x86/vvtd: Handle interrupt translation faults
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
                   ` (12 preceding siblings ...)
  2017-11-17  6:22 ` [PATCH v4 13/28] x86/vvtd: add a helper function to decide the interrupt format Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2018-02-12 12:55   ` Roger Pau Monné
  2017-11-17  6:22 ` [PATCH v4 15/28] x86/vvtd: Enable Queued Invalidation through GCMD Chao Gao
                   ` (14 subsequent siblings)
  28 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Tim Deegan,
	Jan Beulich, Andrew Cooper, Chao Gao, Roger Pau Monné

Interrupt translation faults are non-recoverable faults. When a fault
is triggered, the fault information needs to be populated into the Fault
Recording Registers and an MSI interrupt injected to notify the guest
IOMMU driver to deal with the fault.

This patch emulates the hardware's handling of interrupt translation
faults (more information about the process can be found in the VT-d spec,
chapter "Translation Faults", sections "Non-Recoverable Fault
Reporting" and "Non-Recoverable Logging").
Specifically, viommu_record_fault() records the fault information and
viommu_report_non_recoverable_fault() reports faults to software.
Currently, only Primary Fault Logging is supported and the Number of
Fault-recording Registers is 1.
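
A minimal model of the recording step described above, with a single fault
recording register and overflow signalled when it is already in use. The
field offsets follow the 128-bit FRCD layout in the spec (F = bit 127,
fault reason = bits [103:96], source-id = bits [79:64]), but the struct
and helper below are illustrative only, not the patch's implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal model of one fault recording register plus two FSTS bits. */
struct frcd_model {
    uint64_t lo, hi;        /* FRCD low/high qwords */
    bool ppf;               /* FSTS.PPF: primary pending fault */
    bool pfo;               /* FSTS.PFO: primary fault overflow */
};

#define FRCD_F (1ull << 63)  /* Fault bit (bit 127 of the 128-bit FRCD). */

/*
 * Record a fault if the slot is free; otherwise flag overflow, as the
 * spec requires when all fault recording registers are in use.
 */
static void record_fault(struct frcd_model *f, uint16_t sid, uint8_t reason)
{
    if ( f->hi & FRCD_F )
    {
        f->pfo = true;      /* register busy: existing record is kept */
        return;
    }
    f->hi = FRCD_F | ((uint64_t)reason << 32) | sid;
    f->ppf = true;          /* at least one fault is pending */
}
```

Software then clears F by writing 1 to it (RW1C), after which the slot can
record the next fault.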

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>

---
v4:
 - introduce a lock to protect fault-event related regs
---
 xen/drivers/passthrough/vtd/iommu.h |  51 ++++++-
 xen/drivers/passthrough/vtd/vvtd.c  | 288 +++++++++++++++++++++++++++++++++++-
 2 files changed, 333 insertions(+), 6 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index 82edd2a..dc2df75 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -196,26 +196,67 @@
 #define DMA_CCMD_CAIG_MASK(x) (((u64)x) & ((u64) 0x3 << 59))
 
 /* FECTL_REG */
-#define DMA_FECTL_IM        ((uint32_t)1 << 31)
+#define DMA_FECTL_IM_SHIFT  31
+#define DMA_FECTL_IP_SHIFT  30
+#define DMA_FECTL_IM        ((uint32_t)1 << DMA_FECTL_IM_SHIFT)
+#define DMA_FECTL_IP        ((uint32_t)1 << DMA_FECTL_IP_SHIFT)
 
 /* FSTS_REG */
-#define DMA_FSTS_PFO        ((uint32_t)1 << 0)
-#define DMA_FSTS_PPF        ((uint32_t)1 << 1)
+#define DMA_FSTS_PFO_SHIFT  0
+#define DMA_FSTS_PPF_SHIFT  1
+#define DMA_FSTS_PRO_SHIFT  7
+
+#define DMA_FSTS_PFO        ((uint32_t)1 << DMA_FSTS_PFO_SHIFT)
+#define DMA_FSTS_PPF        ((uint32_t)1 << DMA_FSTS_PPF_SHIFT)
 #define DMA_FSTS_AFO        ((uint32_t)1 << 2)
 #define DMA_FSTS_APF        ((uint32_t)1 << 3)
 #define DMA_FSTS_IQE        ((uint32_t)1 << 4)
 #define DMA_FSTS_ICE        ((uint32_t)1 << 5)
 #define DMA_FSTS_ITE        ((uint32_t)1 << 6)
-#define DMA_FSTS_FAULTS    DMA_FSTS_PFO | DMA_FSTS_PPF | DMA_FSTS_AFO | DMA_FSTS_APF | DMA_FSTS_IQE | DMA_FSTS_ICE | DMA_FSTS_ITE
+#define DMA_FSTS_PRO        ((uint32_t)1 << DMA_FSTS_PRO_SHIFT)
+#define DMA_FSTS_FAULTS     (DMA_FSTS_PFO | DMA_FSTS_PPF | DMA_FSTS_AFO | \
+                             DMA_FSTS_APF | DMA_FSTS_IQE | DMA_FSTS_ICE | \
+                             DMA_FSTS_ITE | DMA_FSTS_PRO)
+#define DMA_FSTS_RW1CS      (DMA_FSTS_PFO | DMA_FSTS_AFO | DMA_FSTS_APF | \
+                             DMA_FSTS_IQE | DMA_FSTS_ICE | DMA_FSTS_ITE | \
+                             DMA_FSTS_PRO)
 #define dma_fsts_fault_record_index(s) (((s) >> 8) & 0xff)
 
 /* FRCD_REG, 32 bits access */
-#define DMA_FRCD_F (((u64)1) << 31)
+#define DMA_FRCD_LEN            0x10
+#define DMA_FRCD2_OFFSET        0x8
+#define DMA_FRCD3_OFFSET        0xc
+#define DMA_FRCD_F_SHIFT        31
+#define DMA_FRCD_F ((u64)1 << DMA_FRCD_F_SHIFT)
 #define dma_frcd_type(d) ((d >> 30) & 1)
 #define dma_frcd_fault_reason(c) (c & 0xff)
 #define dma_frcd_source_id(c) (c & 0xffff)
 #define dma_frcd_page_addr(d) (d & (((u64)-1) << 12)) /* low 64 bit */
 
+struct vtd_fault_record_register
+{
+    union {
+        struct {
+            uint64_t lo;
+            uint64_t hi;
+        } bits;
+        struct {
+            uint64_t rsvd0          :12,
+                     fault_info     :52;
+            uint64_t source_id      :16,
+                     rsvd1          :9,
+                     pmr            :1,  /* Privilege Mode Requested */
+                     exe            :1,  /* Execute Permission Requested */
+                     pasid_p        :1,  /* PASID Present */
+                     fault_reason   :8,  /* Fault Reason */
+                     pasid_val      :20, /* PASID Value */
+                     addr_type      :2,  /* Address Type */
+                     type           :1,  /* Type. (0) Write (1) Read/AtomicOp */
+                     fault          :1;  /* Fault */
+        } fields;
+    };
+};
+
 /* Interrupt remapping transition faults */
 #define VTD_FR_IR_REQ_RSVD      0x20
 #define VTD_FR_IR_INDEX_OVER    0x21
diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
index d3dec01..83805d1 100644
--- a/xen/drivers/passthrough/vtd/vvtd.c
+++ b/xen/drivers/passthrough/vtd/vvtd.c
@@ -43,6 +43,7 @@
 struct hvm_hw_vvtd {
     bool eim_enabled;
     bool intremap_enabled;
+    uint32_t fault_index;
 
     /* Interrupt remapping table base gfn and the max of entries */
     uint16_t irt_max_entry;
@@ -58,6 +59,12 @@ struct vvtd {
     struct domain *domain;
     /* # of in-flight interrupts */
     atomic_t inflight_intr;
+    /*
+     * This lock protects fault-event related registers (DMAR_FEXXX_REG).
+     * It's used for draining in-flight fault events before responding to
+     * the guest's programming of those registers.
+     */
+    spinlock_t fe_lock;
 
     struct hvm_hw_vvtd hw;
     void *irt_base;
@@ -87,6 +94,21 @@ boolean_runtime_param("viommu_verbose", viommu_verbose);
 #endif
 
 #define VVTD_REG_POS(vvtd, offset) &(vvtd->hw.regs[offset/sizeof(uint32_t)])
+static inline int vvtd_test_and_set_bit(struct vvtd *vvtd, uint32_t reg, int nr)
+{
+    return test_and_set_bit(nr, VVTD_REG_POS(vvtd, reg));
+}
+
+static inline int vvtd_test_and_clear_bit(struct vvtd *vvtd, uint32_t reg,
+                                          int nr)
+{
+    return test_and_clear_bit(nr, VVTD_REG_POS(vvtd, reg));
+}
+
+static inline int vvtd_test_bit(struct vvtd *vvtd, uint32_t reg, int nr)
+{
+    return test_bit(nr, VVTD_REG_POS(vvtd, reg));
+}
 
 static inline void vvtd_set_bit(struct vvtd *vvtd, uint32_t reg, int nr)
 {
@@ -238,6 +260,30 @@ static int vvtd_delivery(struct domain *d, uint8_t vector,
     return 0;
 }
 
+static void vvtd_generate_interrupt(const struct vvtd *vvtd, uint64_t addr,
+                                    uint32_t data)
+{
+    bool dm = addr & MSI_ADDR_DESTMODE_MASK;
+    uint32_t dest = MASK_EXTR(addr, MSI_ADDR_DEST_ID_MASK);
+    uint8_t dlm = MASK_EXTR(data, MSI_DATA_DELIVERY_MODE_MASK);
+    uint8_t tm = MASK_EXTR(data, MSI_DATA_TRIGGER_MASK);
+    uint8_t vector = data & MSI_DATA_VECTOR_MASK;
+
+    vvtd_debug("d%d: generating msi %lx %x\n", vvtd->domain->domain_id, addr,
+               data);
+
+    if ( vvtd->hw.eim_enabled )
+        dest |= (addr >> 40) << 8;
+
+    vvtd_delivery(vvtd->domain, vector, dest, dm, dlm, tm);
+}
+
+static void vvtd_notify_fault(const struct vvtd *vvtd)
+{
+    vvtd_generate_interrupt(vvtd, vvtd_get_reg_quad(vvtd, DMAR_FEADDR_REG),
+                            vvtd_get_reg(vvtd, DMAR_FEDATA_REG));
+}
+
 /* Computing the IRTE index for a given interrupt request. On success, return
  * 0 and set index to reference the corresponding IRTE. Otherwise, return < 0,
  * i.e. -1 when the irq request isn't in remappable format.
@@ -290,6 +336,198 @@ static inline uint32_t irte_dest(struct vvtd *vvtd, uint32_t dest)
                                 : MASK_EXTR(dest, IRTE_xAPIC_DEST_MASK);
 }
 
+static void vvtd_report_non_recoverable_fault(struct vvtd *vvtd, int reason)
+{
+    uint32_t fsts = vvtd_get_reg(vvtd, DMAR_FSTS_REG);
+
+    vvtd_set_bit(vvtd, DMAR_FSTS_REG, reason);
+
+    /*
+     * According to the VT-d spec "Non-Recoverable Fault Event" chapter, if
+     * there are any previously reported interrupt conditions that are yet to
+     * be serviced by software, the Fault Event interrupt is not generated.
+     */
+    if ( fsts & DMA_FSTS_FAULTS )
+        return;
+
+    vvtd_set_bit(vvtd, DMAR_FECTL_REG, DMA_FECTL_IP_SHIFT);
+    if ( !vvtd_test_bit(vvtd, DMAR_FECTL_REG, DMA_FECTL_IM_SHIFT) )
+    {
+        vvtd_notify_fault(vvtd);
+        vvtd_clear_bit(vvtd, DMAR_FECTL_REG, DMA_FECTL_IP_SHIFT);
+    }
+}
+
+static void vvtd_update_ppf(struct vvtd *vvtd)
+{
+    int i;
+    uint64_t cap = vvtd_get_reg_quad(vvtd, DMAR_CAP_REG);
+    unsigned int base = cap_fault_reg_offset(cap);
+
+    for ( i = 0; i < cap_num_fault_regs(cap); i++ )
+    {
+        if ( vvtd_test_bit(vvtd, base + i * DMA_FRCD_LEN + DMA_FRCD3_OFFSET,
+                           DMA_FRCD_F_SHIFT) )
+        {
+            vvtd_report_non_recoverable_fault(vvtd, DMA_FSTS_PPF_SHIFT);
+            return;
+        }
+    }
+    /*
+     * No Primary Fault is in Fault Record Registers, thus clear PPF bit in
+     * FSTS.
+     */
+    vvtd_clear_bit(vvtd, DMAR_FSTS_REG, DMA_FSTS_PPF_SHIFT);
+
+    /* If no fault is in FSTS, clear pending bit in FECTL. */
+    if ( !(vvtd_get_reg(vvtd, DMAR_FSTS_REG) & DMA_FSTS_FAULTS) )
+        vvtd_clear_bit(vvtd, DMAR_FECTL_REG, DMA_FECTL_IP_SHIFT);
+}
+
+/*
+ * Commit a fault to emulated Fault Record Registers.
+ */
+static void vvtd_commit_frcd(struct vvtd *vvtd, int idx,
+                             const struct vtd_fault_record_register *frcd)
+{
+    unsigned int base = cap_fault_reg_offset(
+                            vvtd_get_reg_quad(vvtd, DMAR_CAP_REG));
+
+    vvtd_set_reg_quad(vvtd, base + idx * DMA_FRCD_LEN, frcd->bits.lo);
+    vvtd_set_reg_quad(vvtd, base + idx * DMA_FRCD_LEN + 8, frcd->bits.hi);
+    vvtd_update_ppf(vvtd);
+}
+
+/*
+ * Allocate a FRCD for the caller. On success, return the FRI of the
+ * allocated FRCD. Otherwise, return a negative errno.
+ */
+static int vvtd_alloc_frcd(struct vvtd *vvtd)
+{
+    int prev;
+    uint64_t cap = vvtd_get_reg_quad(vvtd, DMAR_CAP_REG);
+    unsigned int base = cap_fault_reg_offset(cap);
+
+    /* Set the F bit to indicate the FRCD is in use. */
+    if ( !vvtd_test_and_set_bit(vvtd,
+                                base + vvtd->hw.fault_index * DMA_FRCD_LEN +
+                                DMA_FRCD3_OFFSET, DMA_FRCD_F_SHIFT) )
+    {
+        prev = vvtd->hw.fault_index;
+        vvtd->hw.fault_index = (prev + 1) % cap_num_fault_regs(cap);
+        return prev;
+    }
+    return -ENOMEM;
+}
+
+static void vvtd_free_frcd(struct vvtd *vvtd, int i)
+{
+    unsigned int base = cap_fault_reg_offset(
+                            vvtd_get_reg_quad(vvtd, DMAR_CAP_REG));
+
+    vvtd_clear_bit(vvtd, base + i * DMA_FRCD_LEN + DMA_FRCD3_OFFSET,
+                   DMA_FRCD_F_SHIFT);
+}
+
+static int vvtd_record_fault(struct vvtd *vvtd,
+                             const struct arch_irq_remapping_request *request,
+                             int reason)
+{
+    struct vtd_fault_record_register frcd;
+    int fault_index;
+    uint32_t irt_index;
+
+    spin_lock(&vvtd->fe_lock);
+    switch ( reason )
+    {
+    case VTD_FR_IR_REQ_RSVD:
+    case VTD_FR_IR_INDEX_OVER:
+    case VTD_FR_IR_ENTRY_P:
+    case VTD_FR_IR_ROOT_INVAL:
+    case VTD_FR_IR_IRTE_RSVD:
+    case VTD_FR_IR_REQ_COMPAT:
+    case VTD_FR_IR_SID_ERR:
+        if ( vvtd_test_bit(vvtd, DMAR_FSTS_REG, DMA_FSTS_PFO_SHIFT) )
+            goto out;
+
+        /* No available Fault Record means Fault overflowed */
+        fault_index = vvtd_alloc_frcd(vvtd);
+        if ( fault_index < 0 )
+        {
+            vvtd_report_non_recoverable_fault(vvtd, DMA_FSTS_PFO_SHIFT);
+            goto out;
+        }
+        memset(&frcd, 0, sizeof(frcd));
+        frcd.fields.fault_reason = reason;
+        if ( irq_remapping_request_index(request, &irt_index) )
+            goto out;
+        frcd.fields.fault_info = irt_index;
+        frcd.fields.source_id = request->source_id;
+        frcd.fields.fault = 1;
+        vvtd_commit_frcd(vvtd, fault_index, &frcd);
+        break;
+
+    default:
+        vvtd_debug("d%d: can't handle vvtd fault (reason 0x%x)\n",
+                   vvtd->domain->domain_id, reason);
+        break;
+    }
+
+ out:
+    spin_unlock(&vvtd->fe_lock);
+    return X86EMUL_OKAY;
+}
+
+static int vvtd_write_frcd3(struct vvtd *vvtd, uint32_t val)
+{
+    /* Writing 1 clears the fault (RW1C) */
+    if ( val & DMA_FRCD_F )
+    {
+        vvtd_free_frcd(vvtd, 0);
+        vvtd_update_ppf(vvtd);
+    }
+    return X86EMUL_OKAY;
+}
+
+static void vvtd_write_fectl(struct vvtd *vvtd, uint32_t val)
+{
+    /*
+     * Only DMA_FECTL_IM bit is writable. Generate pending event on unmask.
+     */
+    if ( !(val & DMA_FECTL_IM) )
+    {
+        /* Clear IM */
+        vvtd_clear_bit(vvtd, DMAR_FECTL_REG, DMA_FECTL_IM_SHIFT);
+        if ( vvtd_test_and_clear_bit(vvtd, DMAR_FECTL_REG, DMA_FECTL_IP_SHIFT) )
+            vvtd_notify_fault(vvtd);
+    }
+    else
+        vvtd_set_bit(vvtd, DMAR_FECTL_REG, DMA_FECTL_IM_SHIFT);
+}
+
+static void vvtd_write_fsts(struct vvtd *vvtd, uint32_t val)
+{
+    int i, max_fault_index = DMA_FSTS_PRO_SHIFT;
+    uint64_t bits_to_clear = val & DMA_FSTS_RW1CS;
+
+    if ( bits_to_clear )
+    {
+        i = find_first_bit(&bits_to_clear, max_fault_index + 1);
+        while ( i <= max_fault_index )
+        {
+            vvtd_clear_bit(vvtd, DMAR_FSTS_REG, i);
+            i = find_next_bit(&bits_to_clear, max_fault_index + 1, i + 1);
+        }
+    }
+
+    /*
+     * Clear the IP field when all status fields in the Fault Status
+     * Register are clear.
+     */
+    if ( !(vvtd_get_reg(vvtd, DMAR_FSTS_REG) & DMA_FSTS_FAULTS) )
+        vvtd_clear_bit(vvtd, DMAR_FECTL_REG, DMA_FECTL_IP_SHIFT);
+}
+
 static void write_gcmd_ire(struct vvtd *vvtd, uint32_t val)
 {
     bool set = val & DMA_GCMD_IRE;
@@ -391,11 +629,47 @@ static int vvtd_read(struct vcpu *v, unsigned long addr,
     return X86EMUL_OKAY;
 }
 
+static void vvtd_write_fault_regs(struct vvtd *vvtd, unsigned long val,
+                                  unsigned int offset, unsigned int len)
+{
+    unsigned int fault_offset = cap_fault_reg_offset(
+                                    vvtd_get_reg_quad(vvtd, DMAR_CAP_REG));
+
+    spin_lock(&vvtd->fe_lock);
+    for ( ; len ; len -= 4, offset += 4, val = val >> 32)
+    {
+        switch ( offset )
+        {
+        case DMAR_FSTS_REG:
+            vvtd_write_fsts(vvtd, val);
+            break;
+
+        case DMAR_FECTL_REG:
+            vvtd_write_fectl(vvtd, val);
+            break;
+
+        case DMAR_FEDATA_REG:
+        case DMAR_FEADDR_REG:
+        case DMAR_FEUADDR_REG:
+            vvtd_set_reg(vvtd, offset, val);
+            break;
+
+        default:
+            if ( offset == (fault_offset + DMA_FRCD3_OFFSET) )
+                 vvtd_write_frcd3(vvtd, val);
+            break;
+        }
+    }
+    spin_unlock(&vvtd->fe_lock);
+}
+
 static int vvtd_write(struct vcpu *v, unsigned long addr,
                       unsigned int len, unsigned long val)
 {
     struct vvtd *vvtd = domain_vvtd(v->domain);
     unsigned int offset = addr - vvtd->base_addr;
+    unsigned int fault_offset = cap_fault_reg_offset(
+                                    vvtd_get_reg_quad(vvtd, DMAR_CAP_REG));
 
     vvtd_info("Write offset %x len %d val %lx\n", offset, len, val);
 
@@ -419,7 +693,18 @@ static int vvtd_write(struct vcpu *v, unsigned long addr,
         vvtd_set_reg(vvtd, offset, val);
         break;
 
+    case DMAR_FSTS_REG:
+    case DMAR_FECTL_REG:
+    case DMAR_FEDATA_REG:
+    case DMAR_FEADDR_REG:
+    case DMAR_FEUADDR_REG:
+        vvtd_write_fault_regs(vvtd, val, offset, len);
+        break;
+
     default:
+        if ( (offset == (fault_offset + DMA_FRCD2_OFFSET)) ||
+             (offset == (fault_offset + DMA_FRCD3_OFFSET)) )
+            vvtd_write_fault_regs(vvtd, val, offset, len);
         break;
     }
 
@@ -448,7 +733,7 @@ static void vvtd_handle_fault(struct vvtd *vvtd,
     case VTD_FR_IR_REQ_RSVD:
     case VTD_FR_IR_INDEX_OVER:
     case VTD_FR_IR_ROOT_INVAL:
-        /* TODO: handle fault (e.g. record and report this fault to VM */
+        vvtd_record_fault(vvtd, irq, fault);
         break;
 
     default:
@@ -607,6 +892,7 @@ static int vvtd_create(struct domain *d, struct viommu *viommu)
     vvtd->base_addr = viommu->base_address;
     vvtd->domain = d;
     register_mmio_handler(d, &vvtd_mmio_ops);
+    spin_lock_init(&vvtd->fe_lock);
 
     viommu->priv = vvtd;
 
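A note for reviewers unfamiliar with the register semantics emulated by
vvtd_write_fsts()/vvtd_write_frcd3() above: the fault bits are RW1C ("write 1
to clear"), i.e. the guest clears a bit by writing 1 to it, while writes of 0
leave it untouched. A minimal, self-contained sketch of that pattern
(illustrative names, not Xen code):

```c
#include <assert.h>
#include <stdint.h>

/*
 * RW1C semantics: a write clears exactly the bits that are 1 in both the
 * written value and the RW1C mask; writing 0 to a bit leaves it as-is.
 */
static uint32_t rw1c_write(uint32_t reg, uint32_t val, uint32_t rw1c_mask)
{
    return reg & ~(val & rw1c_mask);
}
```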
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 15/28] x86/vvtd: Enable Queued Invalidation through GCMD
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
                   ` (13 preceding siblings ...)
  2017-11-17  6:22 ` [PATCH v4 14/28] x86/vvtd: Handle interrupt translation faults Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2018-02-12 14:04   ` Roger Pau Monné
  2017-11-17  6:22 ` [PATCH v4 16/28] x86/vvtd: Add queued invalidation (QI) support Chao Gao
                   ` (13 subsequent siblings)
  28 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Tim Deegan,
	Jan Beulich, Andrew Cooper, Chao Gao, Roger Pau Monné

Software writes to the QIE field of GCMD to enable or disable queued
invalidation. This patch emulates the QIE field of GCMD.
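The command/status handshake being emulated can be pictured in isolation:
software sets a command bit in GCMD and hardware reflects completion in the
matching GSTS bit. A tiny sketch of that reflection (illustrative helper, not
the vvtd code, which additionally resets IQH on enable):

```c
#include <assert.h>
#include <stdint.h>

#define GCMD_QIE  (1u << 26)  /* Queued Invalidation Enable (command) */
#define GSTS_QIES (1u << 26)  /* Queued Invalidation Enable Status */

/*
 * Mirror the QIE command bit into the status register, as hardware does
 * once the enable/disable operation completes.
 */
static uint32_t reflect_qie(uint32_t gsts, uint32_t gcmd_val)
{
    if (gcmd_val & GCMD_QIE)
        gsts |= GSTS_QIES;
    else
        gsts &= ~GSTS_QIES;
    return gsts;
}
```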

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
---
 xen/drivers/passthrough/vtd/iommu.h |  3 ++-
 xen/drivers/passthrough/vtd/vvtd.c  | 18 ++++++++++++++++++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index dc2df75..b71dab8 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -160,7 +160,8 @@
 #define DMA_GSTS_FLS    (((u64)1) << 29)
 #define DMA_GSTS_AFLS   (((u64)1) << 28)
 #define DMA_GSTS_WBFS   (((u64)1) << 27)
-#define DMA_GSTS_QIES   (((u64)1) <<26)
+#define DMA_GSTS_QIES_SHIFT     26
+#define DMA_GSTS_QIES   (((u64)1) << DMA_GSTS_QIES_SHIFT)
 #define DMA_GSTS_IRES_SHIFT     25
 #define DMA_GSTS_IRES   (((u64)1) << DMA_GSTS_IRES_SHIFT)
 #define DMA_GSTS_SIRTPS_SHIFT   24
diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
index 83805d1..a2fa64a 100644
--- a/xen/drivers/passthrough/vtd/vvtd.c
+++ b/xen/drivers/passthrough/vtd/vvtd.c
@@ -539,6 +539,20 @@ static void write_gcmd_ire(struct vvtd *vvtd, uint32_t val)
         (vvtd, DMAR_GSTS_REG, DMA_GSTS_IRES_SHIFT);
 }
 
+static void write_gcmd_qie(struct vvtd *vvtd, uint32_t val)
+{
+    bool set = val & DMA_GCMD_QIE;
+
+    vvtd_info("%sable Queued Invalidation\n", set ? "En" : "Dis");
+
+    if ( set )
+        vvtd_set_reg_quad(vvtd, DMAR_IQH_REG, 0);
+
+    (set ? vvtd_set_bit : vvtd_clear_bit)
+        (vvtd, DMAR_GSTS_REG, DMA_GSTS_QIES_SHIFT);
+
+}
+
 static void write_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
 {
     uint64_t irta = vvtd_get_reg_quad(vvtd, DMAR_IRTA_REG);
@@ -598,6 +612,10 @@ static void vvtd_write_gcmd(struct vvtd *vvtd, uint32_t val)
         write_gcmd_sirtp(vvtd, val);
     if ( changed & DMA_GCMD_IRE )
         write_gcmd_ire(vvtd, val);
+    if ( changed & DMA_GCMD_QIE )
+        write_gcmd_qie(vvtd, val);
+    if ( changed & ~(DMA_GCMD_SIRTP | DMA_GCMD_IRE | DMA_GCMD_QIE) )
+        vvtd_info("Only SIRTP, IRE, QIE in GCMD are handled\n");
 }
 
 static int vvtd_in_range(struct vcpu *v, unsigned long addr)
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 16/28] x86/vvtd: Add queued invalidation (QI) support
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
                   ` (14 preceding siblings ...)
  2017-11-17  6:22 ` [PATCH v4 15/28] x86/vvtd: Enable Queued Invalidation through GCMD Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2018-02-12 14:36   ` Roger Pau Monné
  2017-11-17  6:22 ` [PATCH v4 17/28] x86/vvtd: save and restore emulated VT-d Chao Gao
                   ` (12 subsequent siblings)
  28 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Tim Deegan,
	Jan Beulich, Andrew Cooper, Chao Gao, Roger Pau Monné

Queued Invalidation Interface is an expanded invalidation interface with
extended capabilities. Hardware implementations report support for queued
invalidation interface through the Extended Capability Register. The queued
invalidation interface uses an Invalidation Queue (IQ), which is a circular
buffer in system memory. Software submits commands by writing Invalidation
Descriptors to the IQ.

In this patch, a new function vvtd_process_iq() emulates how hardware
handles invalidation requests submitted through QI.
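The circular-buffer walk this patch implements, consuming descriptors from
head up to tail, wrapping modulo the queue size and stopping at the first
failing entry, can be sketched on its own (illustrative names; the real code
additionally raises IQE in FSTS on error):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Drain a circular queue: process entries from head up to (but not
 * including) tail, wrapping modulo size. On error, stop so that the
 * returned head references the failing entry.
 */
static uint32_t drain_queue(uint32_t head, uint32_t tail, uint32_t size,
                            int (*process)(uint32_t idx))
{
    while (head != tail)
    {
        if (process(head))
            break;
        head = (head + 1) % size;
    }
    return head;
}

/* Trivial callback: pretend every descriptor is processed successfully. */
static int process_ok(uint32_t idx)
{
    (void)idx;
    return 0;
}
```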

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>

---
v4:
 - Introduce a lock to protect invalidation related registers.
---
 xen/drivers/passthrough/vtd/iommu.h |  24 +++-
 xen/drivers/passthrough/vtd/vvtd.c  | 271 +++++++++++++++++++++++++++++++++++-
 2 files changed, 293 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index b71dab8..de9188b 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -47,7 +47,12 @@
 #define DMAR_IQH_REG            0x80 /* invalidation queue head */
 #define DMAR_IQT_REG            0x88 /* invalidation queue tail */
 #define DMAR_IQA_REG            0x90 /* invalidation queue addr */
+#define DMAR_IQUA_REG           0x94 /* invalidation queue upper addr */
+#define DMAR_ICS_REG            0x9c /* invalidation completion status */
 #define DMAR_IECTL_REG          0xa0 /* invalidation event control register */
+#define DMAR_IEDATA_REG         0xa4 /* invalidation event data register */
+#define DMAR_IEADDR_REG         0xa8 /* invalidation event address register */
+#define DMAR_IEUADDR_REG        0xac /* upper address register */
 #define DMAR_IRTA_REG           0xb8 /* base address of intr remap table */
 #define DMAR_IRTUA_REG          0xbc /* upper address of intr remap table */
 
@@ -175,6 +180,21 @@
 #define DMA_IRTA_S(val)         (val & 0xf)
 #define DMA_IRTA_SIZE(val)      (1UL << (DMA_IRTA_S(val) + 1))
 
+/* IQA_REG */
+#define DMA_IQA_ADDR(val)       (val & ~0xfffULL)
+#define DMA_IQA_QS(val)         (val & 0x7)
+#define DMA_IQA_RSVD            0xff8ULL
+
+/* IECTL_REG */
+#define DMA_IECTL_IM_SHIFT 31
+#define DMA_IECTL_IM            (1U << DMA_IECTL_IM_SHIFT)
+#define DMA_IECTL_IP_SHIFT 30
+#define DMA_IECTL_IP            (1U << DMA_IECTL_IP_SHIFT)
+
+/* ICS_REG */
+#define DMA_ICS_IWC_SHIFT       0
+#define DMA_ICS_IWC             (1U << DMA_ICS_IWC_SHIFT)
+
 /* PMEN_REG */
 #define DMA_PMEN_EPM    (((u32)1) << 31)
 #define DMA_PMEN_PRS    (((u32)1) << 0)
@@ -205,13 +225,14 @@
 /* FSTS_REG */
 #define DMA_FSTS_PFO_SHIFT  0
 #define DMA_FSTS_PPF_SHIFT  1
+#define DMA_FSTS_IQE_SHIFT  4
 #define DMA_FSTS_PRO_SHIFT  7
 
 #define DMA_FSTS_PFO        ((uint32_t)1 << DMA_FSTS_PFO_SHIFT)
 #define DMA_FSTS_PPF        ((uint32_t)1 << DMA_FSTS_PPF_SHIFT)
 #define DMA_FSTS_AFO        ((uint32_t)1 << 2)
 #define DMA_FSTS_APF        ((uint32_t)1 << 3)
-#define DMA_FSTS_IQE        ((uint32_t)1 << 4)
+#define DMA_FSTS_IQE        ((uint32_t)1 << DMA_FSTS_IQE_SHIFT)
 #define DMA_FSTS_ICE        ((uint32_t)1 << 5)
 #define DMA_FSTS_ITE        ((uint32_t)1 << 6)
 #define DMA_FSTS_PRO        ((uint32_t)1 << DMA_FSTS_PRO_SHIFT)
@@ -555,6 +576,7 @@ struct qinval_entry {
 
 /* Queue invalidation head/tail shift */
 #define QINVAL_INDEX_SHIFT 4
+#define QINVAL_INDEX_MASK  0x7fff0ULL
 
 #define qinval_present(v) ((v).lo & 1)
 #define qinval_fault_disable(v) (((v).lo >> 1) & 1)
diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
index a2fa64a..81170ec 100644
--- a/xen/drivers/passthrough/vtd/vvtd.c
+++ b/xen/drivers/passthrough/vtd/vvtd.c
@@ -27,6 +27,7 @@
 #include <asm/event.h>
 #include <asm/io_apic.h>
 #include <asm/hvm/domain.h>
+#include <asm/hvm/support.h>
 #include <asm/p2m.h>
 
 #include "iommu.h"
@@ -68,6 +69,9 @@ struct vvtd {
 
     struct hvm_hw_vvtd hw;
     void *irt_base;
+    void *inv_queue_base;
+    /* This lock protects invalidation related registers */
+    spinlock_t ie_lock;
 };
 
 /* Setting viommu_verbose enables debugging messages of vIOMMU */
@@ -284,6 +288,12 @@ static void vvtd_notify_fault(const struct vvtd *vvtd)
                             vvtd_get_reg(vvtd, DMAR_FEDATA_REG));
 }
 
+static void vvtd_notify_inv_completion(const struct vvtd *vvtd)
+{
+    vvtd_generate_interrupt(vvtd, vvtd_get_reg_quad(vvtd, DMAR_IEADDR_REG),
+                            vvtd_get_reg(vvtd, DMAR_IEDATA_REG));
+}
+
 /* Computing the IRTE index for a given interrupt request. On success, return
  * 0 and set index to reference the corresponding IRTE. Otherwise, return < 0,
  * i.e. -1 when the irq request isn't in remappable format.
@@ -478,6 +488,189 @@ static int vvtd_record_fault(struct vvtd *vvtd,
     return X86EMUL_OKAY;
 }
 
+/*
+ * Process an invalidation descriptor. Currently, only two descriptor types,
+ * Interrupt Entry Cache Invalidation Descriptor and Invalidation Wait
+ * Descriptor, are handled.
+ * @vvtd: the virtual vtd instance
+ * @i: the index of the invalidation descriptor to be processed
+ *
+ * Return 0 on success, or non-zero on failure.
+ */
+static int process_iqe(struct vvtd *vvtd, uint32_t i)
+{
+    struct qinval_entry qinval;
+    int ret = 0;
+
+    if ( !vvtd->inv_queue_base )
+    {
+        gdprintk(XENLOG_ERR, "Invalidation queue base isn't set\n");
+        return -1;
+    }
+    qinval = ((struct qinval_entry *)vvtd->inv_queue_base)[i];
+
+    switch ( qinval.q.inv_wait_dsc.lo.type )
+    {
+    case TYPE_INVAL_WAIT:
+        if ( qinval.q.inv_wait_dsc.lo.sw )
+        {
+            uint32_t data = qinval.q.inv_wait_dsc.lo.sdata;
+            uint64_t addr = qinval.q.inv_wait_dsc.hi.saddr << 2;
+
+            ret = hvm_copy_to_guest_phys(addr, &data, sizeof(data), current);
+            if ( ret )
+                vvtd_info("Failed to write status address\n");
+        }
+
+        /*
+         * The following code generates an invalidation completion event
+         * indicating the invalidation wait descriptor completion. Note that
+         * the following code fragment is not tested properly.
+         */
+        if ( qinval.q.inv_wait_dsc.lo.iflag )
+        {
+            if ( !vvtd_test_and_set_bit(vvtd, DMAR_ICS_REG, DMA_ICS_IWC_SHIFT) )
+            {
+                vvtd_set_bit(vvtd, DMAR_IECTL_REG, DMA_IECTL_IP_SHIFT);
+                if ( !vvtd_test_bit(vvtd, DMAR_IECTL_REG, DMA_IECTL_IM_SHIFT) )
+                {
+                    vvtd_notify_inv_completion(vvtd);
+                    vvtd_clear_bit(vvtd, DMAR_IECTL_REG, DMA_IECTL_IP_SHIFT);
+                }
+            }
+        }
+        break;
+
+    case TYPE_INVAL_IEC:
+        /* No cache is maintained in vvtd, so nothing needs to be flushed */
+        break;
+
+    default:
+        vvtd_debug("d%d: Invalidation type (%x) isn't supported\n",
+                   vvtd->domain->domain_id, qinval.q.inv_wait_dsc.lo.type);
+        return -1;
+    }
+
+    return ret;
+}
+
+/*
+ * Process all the descriptors in the Invalidation Queue.
+ */
+static void vvtd_process_iq(struct vvtd *vvtd)
+{
+    uint32_t max_entry, i, iqh, iqt;
+    int err = 0;
+
+    /* Trylock so that at most one caller processes invalidation requests */
+    if ( !spin_trylock(&vvtd->ie_lock) )
+        return;
+
+    iqh = MASK_EXTR(vvtd_get_reg_quad(vvtd, DMAR_IQH_REG), QINVAL_INDEX_MASK);
+    iqt = MASK_EXTR(vvtd_get_reg_quad(vvtd, DMAR_IQT_REG), QINVAL_INDEX_MASK);
+    /*
+     * No new descriptor is fetched from the Invalidation Queue until
+     * software clears the IQE field in the Fault Status Register
+     */
+    if ( vvtd_test_bit(vvtd, DMAR_FSTS_REG, DMA_FSTS_IQE_SHIFT) )
+    {
+        spin_unlock(&vvtd->ie_lock);
+        return;
+    }
+
+    max_entry = 1 << (QINVAL_ENTRY_ORDER +
+                      DMA_IQA_QS(vvtd_get_reg_quad(vvtd, DMAR_IQA_REG)));
+
+    ASSERT(iqt < max_entry);
+    if ( iqh == iqt )
+    {
+        spin_unlock(&vvtd->ie_lock);
+        return;
+    }
+
+    for ( i = iqh; i != iqt; i = (i + 1) % max_entry )
+    {
+        err = process_iqe(vvtd, i);
+        if ( err )
+            break;
+    }
+
+    /*
+     * Set IQH before checking the error, because IQH should reference
+     * the descriptor associated with the error when an error is seen
+     * by the guest.
+     */
+    vvtd_set_reg_quad(vvtd, DMAR_IQH_REG, i << QINVAL_INDEX_SHIFT);
+
+    spin_unlock(&vvtd->ie_lock);
+    if ( err )
+    {
+        spin_lock(&vvtd->fe_lock);
+        vvtd_report_non_recoverable_fault(vvtd, DMA_FSTS_IQE_SHIFT);
+        spin_unlock(&vvtd->fe_lock);
+    }
+}
+
+static void vvtd_write_iqt(struct vvtd *vvtd, uint32_t val)
+{
+    uint32_t max_entry;
+
+    if ( val & ~QINVAL_INDEX_MASK )
+    {
+        vvtd_info("attempts to set reserved bits in IQT\n");
+        return;
+    }
+
+    max_entry = 1U << (QINVAL_ENTRY_ORDER +
+                       DMA_IQA_QS(vvtd_get_reg_quad(vvtd, DMAR_IQA_REG)));
+    if ( MASK_EXTR(val, QINVAL_INDEX_MASK) >= max_entry )
+    {
+        vvtd_info("IQT: value %x exceeds the supported max index\n", val);
+        return;
+    }
+
+    vvtd_set_reg(vvtd, DMAR_IQT_REG, val);
+}
+
+static void vvtd_write_iqa(struct vvtd *vvtd, uint32_t val, bool high)
+{
+    uint64_t cap = vvtd_get_reg_quad(vvtd, DMAR_CAP_REG);
+    uint64_t old = vvtd_get_reg_quad(vvtd, DMAR_IQA_REG);
+    uint64_t new;
+
+    if ( high )
+        new = ((uint64_t)val << 32) | (old & 0xffffffff);
+    else
+        new = ((old >> 32) << 32) | val;
+
+    if ( new & (~((1ULL << cap_mgaw(cap)) - 1) | DMA_IQA_RSVD) )
+    {
+        vvtd_info("Attempt to set reserved bits in IQA\n");
+        return;
+    }
+
+    vvtd_set_reg_quad(vvtd, DMAR_IQA_REG, new);
+    if ( high && !vvtd->inv_queue_base )
+        vvtd->inv_queue_base = map_guest_pages(vvtd->domain,
+                                               PFN_DOWN(DMA_IQA_ADDR(new)),
+                                               1 << DMA_IQA_QS(new));
+    else if ( !high && vvtd->inv_queue_base )
+    {
+        unmap_guest_pages(vvtd->inv_queue_base, 1 << DMA_IQA_QS(old));
+        vvtd->inv_queue_base = NULL;
+    }
+}
+
+static void vvtd_write_ics(struct vvtd *vvtd, uint32_t val)
+{
+    if ( val & DMA_ICS_IWC )
+    {
+        vvtd_clear_bit(vvtd, DMAR_ICS_REG, DMA_ICS_IWC_SHIFT);
+        /* When IWC field is cleared, the IP field needs to be cleared */
+        vvtd_clear_bit(vvtd, DMAR_IECTL_REG, DMA_IECTL_IP_SHIFT);
+    }
+}
+
 static int vvtd_write_frcd3(struct vvtd *vvtd, uint32_t val)
 {
     /* Writing a 1 means clear fault */
@@ -489,6 +682,20 @@ static int vvtd_write_frcd3(struct vvtd *vvtd, uint32_t val)
     return X86EMUL_OKAY;
 }
 
+static void vvtd_write_iectl(struct vvtd *vvtd, uint32_t val)
+{
+    /* Only DMA_IECTL_IM bit is writable. Generate pending event on unmask */
+    if ( !(val & DMA_IECTL_IM) )
+    {
+        /* Clear IM and clear IP */
+        vvtd_clear_bit(vvtd, DMAR_IECTL_REG, DMA_IECTL_IM_SHIFT);
+        if ( vvtd_test_and_clear_bit(vvtd, DMAR_IECTL_REG, DMA_IECTL_IP_SHIFT) )
+            vvtd_notify_inv_completion(vvtd);
+    }
+    else
+        vvtd_set_bit(vvtd, DMAR_IECTL_REG, DMA_IECTL_IM_SHIFT);
+}
+
 static void vvtd_write_fectl(struct vvtd *vvtd, uint32_t val)
 {
     /*
@@ -681,6 +888,48 @@ static void vvtd_write_fault_regs(struct vvtd *vvtd, unsigned long val,
     spin_unlock(&vvtd->fe_lock);
 }
 
+static void vvtd_write_invalidation_regs(struct vvtd *vvtd, unsigned long val,
+                                         unsigned int offset, unsigned int len)
+{
+    spin_lock(&vvtd->ie_lock);
+    for ( ; len ; len -= 4, offset += 4, val = val >> 32)
+    {
+        switch ( offset )
+        {
+        case DMAR_IECTL_REG:
+            vvtd_write_iectl(vvtd, val);
+            break;
+
+        case DMAR_ICS_REG:
+            vvtd_write_ics(vvtd, val);
+            break;
+
+        case DMAR_IQT_REG:
+            vvtd_write_iqt(vvtd, val);
+            break;
+
+        case DMAR_IQA_REG:
+            vvtd_write_iqa(vvtd, val, 0);
+            break;
+
+        case DMAR_IQUA_REG:
+            vvtd_write_iqa(vvtd, val, 1);
+            break;
+
+        case DMAR_IEDATA_REG:
+        case DMAR_IEADDR_REG:
+        case DMAR_IEUADDR_REG:
+            vvtd_set_reg(vvtd, offset, val);
+            break;
+
+        default:
+            break;
+        }
+    }
+    spin_unlock(&vvtd->ie_lock);
+
+}
+
 static int vvtd_write(struct vcpu *v, unsigned long addr,
                       unsigned int len, unsigned long val)
 {
@@ -719,6 +968,17 @@ static int vvtd_write(struct vcpu *v, unsigned long addr,
         vvtd_write_fault_regs(vvtd, val, offset, len);
         break;
 
+    case DMAR_IECTL_REG:
+    case DMAR_ICS_REG:
+    case DMAR_IQT_REG:
+    case DMAR_IQA_REG:
+    case DMAR_IQUA_REG:
+    case DMAR_IEDATA_REG:
+    case DMAR_IEADDR_REG:
+    case DMAR_IEUADDR_REG:
+        vvtd_write_invalidation_regs(vvtd, val, offset, len);
+        break;
+
     default:
         if ( (offset == (fault_offset + DMA_FRCD2_OFFSET)) ||
              (offset == (fault_offset + DMA_FRCD3_OFFSET)) )
@@ -840,7 +1100,8 @@ static int vvtd_handle_irq_request(const struct domain *d,
                         irte.remap.tm);
 
  out:
-    atomic_dec(&vvtd->inflight_intr);
+    if ( atomic_dec_and_test(&vvtd->inflight_intr) )
+        vvtd_process_iq(vvtd);
     return ret;
 }
 
@@ -911,6 +1172,7 @@ static int vvtd_create(struct domain *d, struct viommu *viommu)
     vvtd->domain = d;
     register_mmio_handler(d, &vvtd_mmio_ops);
     spin_lock_init(&vvtd->fe_lock);
+    spin_lock_init(&vvtd->ie_lock);
 
     viommu->priv = vvtd;
 
@@ -930,6 +1192,13 @@ static int vvtd_destroy(struct viommu *viommu)
                                      sizeof(struct iremap_entry)));
             vvtd->irt_base = NULL;
         }
+        if ( vvtd->inv_queue_base )
+        {
+            uint64_t old = vvtd_get_reg_quad(vvtd, DMAR_IQA_REG);
+
+            unmap_guest_pages(vvtd->inv_queue_base, 1 << DMA_IQA_QS(old));
+            vvtd->inv_queue_base = NULL;
+        }
         xfree(vvtd);
     }
 
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 17/28] x86/vvtd: save and restore emulated VT-d
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
                   ` (15 preceding siblings ...)
  2017-11-17  6:22 ` [PATCH v4 16/28] x86/vvtd: Add queued invalidation (QI) support Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2018-02-12 14:49   ` Roger Pau Monné
  2017-11-17  6:22 ` [PATCH v4 18/28] x86/vioapic: Hook interrupt delivery of vIOAPIC Chao Gao
                   ` (11 subsequent siblings)
  28 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Tim Deegan, Ian Jackson,
	Jan Beulich, Andrew Cooper, Chao Gao, Roger Pau Monné

Provide a save/restore pair to preserve both the register set and the
non-register state of the emulated VT-d.
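Conceptually, the save/restore contract amounts to flattening the state into
a byte buffer on save and reading it back verbatim on load. A minimal
round-trip sketch (hypothetical `demo_*` names and struct; the actual patch
uses Xen's HVM save-record machinery, not raw memcpy):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for the saved state; not the real hvm_hw_vvtd. */
struct demo_hw {
    uint8_t  eim_enabled;
    uint32_t fault_index;
    uint32_t regs[4];
};

/* Serialize the state into a flat buffer; return the number of bytes. */
static size_t demo_save(const struct demo_hw *hw, uint8_t *buf)
{
    memcpy(buf, hw, sizeof(*hw));
    return sizeof(*hw);
}

/* Restore the state from a flat buffer; return the number of bytes. */
static size_t demo_load(struct demo_hw *hw, const uint8_t *buf)
{
    memcpy(hw, buf, sizeof(*hw));
    return sizeof(*hw);
}
```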

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
---
v3:
 - use one entry to save both vvtd registers and other intermediate
 state
---
 xen/drivers/passthrough/vtd/vvtd.c     | 57 +++++++++++++++++++++++-----------
 xen/include/public/arch-x86/hvm/save.h | 18 ++++++++++-
 2 files changed, 56 insertions(+), 19 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
index 81170ec..f6bde69 100644
--- a/xen/drivers/passthrough/vtd/vvtd.c
+++ b/xen/drivers/passthrough/vtd/vvtd.c
@@ -27,8 +27,10 @@
 #include <asm/event.h>
 #include <asm/io_apic.h>
 #include <asm/hvm/domain.h>
+#include <asm/hvm/save.h>
 #include <asm/hvm/support.h>
 #include <asm/p2m.h>
+#include <public/hvm/save.h>
 
 #include "iommu.h"
 #include "vtd.h"
@@ -38,20 +40,6 @@
 
 #define VVTD_FRCD_NUM   1ULL
 #define VVTD_FRCD_START (DMAR_IRTA_REG + 8)
-#define VVTD_FRCD_END   (VVTD_FRCD_START + VVTD_FRCD_NUM * 16)
-#define VVTD_MAX_OFFSET VVTD_FRCD_END
-
-struct hvm_hw_vvtd {
-    bool eim_enabled;
-    bool intremap_enabled;
-    uint32_t fault_index;
-
-    /* Interrupt remapping table base gfn and the max of entries */
-    uint16_t irt_max_entry;
-    gfn_t irt;
-
-    uint32_t regs[VVTD_MAX_OFFSET/sizeof(uint32_t)];
-};
 
 struct vvtd {
     /* Base address of remapping hardware register-set */
@@ -776,7 +764,7 @@ static void write_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
     if ( vvtd->hw.intremap_enabled )
         vvtd_info("Update Interrupt Remapping Table when active\n");
 
-    if ( gfn_x(vvtd->hw.irt) != PFN_DOWN(DMA_IRTA_ADDR(irta)) ||
+    if ( vvtd->hw.irt != PFN_DOWN(DMA_IRTA_ADDR(irta)) ||
          vvtd->hw.irt_max_entry != DMA_IRTA_SIZE(irta) )
     {
         if ( vvtd->irt_base )
@@ -786,14 +774,14 @@ static void write_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
                                      sizeof(struct iremap_entry)));
             vvtd->irt_base = NULL;
         }
-        vvtd->hw.irt = _gfn(PFN_DOWN(DMA_IRTA_ADDR(irta)));
+        vvtd->hw.irt = PFN_DOWN(DMA_IRTA_ADDR(irta));
         vvtd->hw.irt_max_entry = DMA_IRTA_SIZE(irta);
         vvtd->hw.eim_enabled = !!(irta & IRTA_EIME);
         vvtd_info("Update IR info (addr=%lx eim=%d size=%d)\n",
-                  gfn_x(vvtd->hw.irt), vvtd->hw.eim_enabled,
+                  vvtd->hw.irt, vvtd->hw.eim_enabled,
                   vvtd->hw.irt_max_entry);
 
-        vvtd->irt_base = map_guest_pages(vvtd->domain, gfn_x(vvtd->hw.irt),
+        vvtd->irt_base = map_guest_pages(vvtd->domain, vvtd->hw.irt,
                                          PFN_UP(vvtd->hw.irt_max_entry *
                                                 sizeof(struct iremap_entry)));
     }
@@ -1138,6 +1126,39 @@ static bool vvtd_is_remapping(const struct domain *d,
     return !irq_remapping_request_index(irq, &idx);
 }
 
+static int vvtd_load(struct domain *d, hvm_domain_context_t *h)
+{
+    struct vvtd *vvtd = domain_vvtd(d);
+    uint64_t iqa;
+
+    if ( !vvtd )
+        return -ENODEV;
+
+    if ( hvm_load_entry(VVTD, h, &vvtd->hw) )
+        return -EINVAL;
+
+    iqa = vvtd_get_reg_quad(vvtd, DMAR_IQA_REG);
+    vvtd->irt_base = map_guest_pages(vvtd->domain, vvtd->hw.irt,
+                                     PFN_UP(vvtd->hw.irt_max_entry *
+                                            sizeof(struct iremap_entry)));
+    vvtd->inv_queue_base = map_guest_pages(vvtd->domain,
+                                           PFN_DOWN(DMA_IQA_ADDR(iqa)),
+                                           1 << DMA_IQA_QS(iqa));
+    return 0;
+}
+
+static int vvtd_save(struct domain *d, hvm_domain_context_t *h)
+{
+    struct vvtd *vvtd = domain_vvtd(d);
+
+    if ( !vvtd )
+        return 0;
+
+    return hvm_save_entry(VVTD, 0, h, &vvtd->hw);
+}
+
+HVM_REGISTER_SAVE_RESTORE(VVTD, vvtd_save, vvtd_load, 1, HVMSR_PER_DOM);
+
 static void vvtd_reset(struct vvtd *vvtd)
 {
     uint64_t cap = cap_set_num_fault_regs(VVTD_FRCD_NUM)
diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
index fd7bf3f..24a513b 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -639,10 +639,26 @@ struct hvm_msr {
 
 #define CPU_MSR_CODE  20
 
+#define VVTD_MAX_OFFSET 0xd0
+struct hvm_hw_vvtd
+{
+    uint32_t eim_enabled : 1,
+             intremap_enabled : 1;
+    uint32_t fault_index;
+
+    /* Interrupt remapping table base gfn and the max of entries */
+    uint32_t irt_max_entry;
+    uint64_t irt;
+
+    uint32_t regs[VVTD_MAX_OFFSET/sizeof(uint32_t)];
+};
+
+DECLARE_HVM_SAVE_TYPE(VVTD, 21, struct hvm_hw_vvtd);
+
 /* 
  * Largest type-code in use
  */
-#define HVM_SAVE_CODE_MAX 20
+#define HVM_SAVE_CODE_MAX 21
 
 #endif /* __XEN_PUBLIC_HVM_SAVE_X86_H__ */
 
-- 
1.8.3.1




* [PATCH v4 18/28] x86/vioapic: Hook interrupt delivery of vIOAPIC
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
                   ` (16 preceding siblings ...)
  2017-11-17  6:22 ` [PATCH v4 17/28] x86/vvtd: save and restore emulated VT-d Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2018-02-12 14:54   ` Roger Pau Monné
  2017-11-17  6:22 ` [PATCH v4 19/28] x86/vioapic: extend vioapic_get_vector() to support remapping format RTE Chao Gao
                   ` (10 subsequent siblings)
  28 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Tim Deegan,
	Jan Beulich, Andrew Cooper, Chao Gao, Roger Pau Monné

When irq remapping is enabled, an IOAPIC Redirection Entry may be in remapping
format. If so, generate an irq_remapping_request and call the common
vIOMMU abstraction's callback to handle the interrupt request. The device
model is responsible for checking the request's validity.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>

---
v3:
 - use the new interface to check remapping format.
---
 xen/arch/x86/hvm/vioapic.c   | 9 +++++++++
 xen/include/asm-x86/viommu.h | 9 +++++++++
 2 files changed, 18 insertions(+)

diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 97b419f..0f20e3f 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -30,6 +30,7 @@
 #include <xen/lib.h>
 #include <xen/errno.h>
 #include <xen/sched.h>
+#include <xen/viommu.h>
 #include <public/hvm/ioreq.h>
 #include <asm/hvm/io.h>
 #include <asm/hvm/vpic.h>
@@ -387,9 +388,17 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
     struct vlapic *target;
     struct vcpu *v;
     unsigned int irq = vioapic->base_gsi + pin;
+    struct arch_irq_remapping_request request;
 
     ASSERT(spin_is_locked(&d->arch.hvm_domain.irq_lock));
 
+    irq_request_ioapic_fill(&request, vioapic->id, vioapic->redirtbl[pin].bits);
+    if ( viommu_check_irq_remapping(d, &request) )
+    {
+        viommu_handle_irq_request(d, &request);
+        return;
+    }
+
     HVM_DBG_LOG(DBG_LEVEL_IOAPIC,
                 "dest=%x dest_mode=%x delivery_mode=%x "
                 "vector=%x trig_mode=%x",
diff --git a/xen/include/asm-x86/viommu.h b/xen/include/asm-x86/viommu.h
index 3d995ba..e526e38 100644
--- a/xen/include/asm-x86/viommu.h
+++ b/xen/include/asm-x86/viommu.h
@@ -49,6 +49,15 @@ struct arch_irq_remapping_request
     enum viommu_irq_request_type type;
 };
 
+static inline void irq_request_ioapic_fill(
+    struct arch_irq_remapping_request *req, uint32_t ioapic_id, uint64_t rte)
+{
+    ASSERT(req);
+    req->type = VIOMMU_REQUEST_IRQ_APIC;
+    req->source_id = ioapic_id;
+    req->msg.rte = rte;
+}
+
 #endif /* __ARCH_X86_VIOMMU_H__ */
 
 /*
-- 
1.8.3.1




* [PATCH v4 19/28] x86/vioapic: extend vioapic_get_vector() to support remapping format RTE
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
                   ` (17 preceding siblings ...)
  2017-11-17  6:22 ` [PATCH v4 18/28] x86/vioapic: Hook interrupt delivery of vIOAPIC Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2018-02-12 15:01   ` Roger Pau Monné
  2017-11-17  6:22 ` [PATCH v4 20/28] xen/pt: when binding guest msi, accept the whole msi message Chao Gao
                   ` (9 subsequent siblings)
  28 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Tim Deegan,
	Jan Beulich, Andrew Cooper, Chao Gao, Roger Pau Monné

When an IOAPIC RTE is in remapping format, it doesn't contain the interrupt's
vector. Instead, the RTE contains an index into the interrupt remapping
table, where the vector is stored. This patch gets the vector
through a vIOMMU interface.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
---
 xen/arch/x86/hvm/vioapic.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 0f20e3f..8b34b21 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -560,11 +560,23 @@ int vioapic_get_vector(const struct domain *d, unsigned int gsi)
 {
     unsigned int pin;
     const struct hvm_vioapic *vioapic = gsi_vioapic(d, gsi, &pin);
+    struct arch_irq_remapping_request request;
 
     if ( !vioapic )
         return -EINVAL;
 
-    return vioapic->redirtbl[pin].fields.vector;
+    irq_request_ioapic_fill(&request, vioapic->id, vioapic->redirtbl[pin].bits);
+    if ( viommu_check_irq_remapping(vioapic->domain, &request) )
+    {
+        struct arch_irq_remapping_info info;
+
+        return unlikely(viommu_get_irq_info(vioapic->domain, &request, &info))
+                   ? : info.vector;
+    }
+    else
+    {
+        return vioapic->redirtbl[pin].fields.vector;
+    }
 }
 
 int vioapic_get_trigger_mode(const struct domain *d, unsigned int gsi)
-- 
1.8.3.1




* [PATCH v4 20/28] xen/pt: when binding guest msi, accept the whole msi message
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
                   ` (18 preceding siblings ...)
  2017-11-17  6:22 ` [PATCH v4 19/28] x86/vioapic: extend vioapic_get_vector() to support remapping format RTE Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2018-02-12 15:16   ` Roger Pau Monné
  2017-11-17  6:22 ` [PATCH v4 21/28] vvtd: update hvm_gmsi_info when binding guest msi with pirq or Chao Gao
                   ` (8 subsequent siblings)
  28 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Tim Deegan,
	Jan Beulich, Chao Gao, Ian Jackson, Roger Pau Monné

... rather than a filtered one. Previously, some fields (reserved or
unalterable) were filtered out by QEMU. These fields are useless for the
legacy (i.e. non-remappable) interrupt format; however, they are
meaningful for the remappable format. Accepting the whole msi
message significantly reduces the effort needed to support binding
remappable format msi.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
---
v4:
 - new
---
 tools/libxc/include/xenctrl.h |  7 ++++---
 tools/libxc/xc_domain.c       | 14 ++++++++------
 xen/arch/x86/hvm/vmsi.c       | 12 ++++++------
 xen/drivers/passthrough/io.c  | 36 +++++++++++++++++-------------------
 xen/include/asm-x86/hvm/irq.h |  5 +++--
 xen/include/public/domctl.h   |  8 ++------
 6 files changed, 40 insertions(+), 42 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 666db0b..8ade90c 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1756,16 +1756,17 @@ int xc_domain_ioport_mapping(xc_interface *xch,
 int xc_domain_update_msi_irq(
     xc_interface *xch,
     uint32_t domid,
-    uint32_t gvec,
     uint32_t pirq,
+    uint64_t addr,
+    uint32_t data,
     uint32_t gflags,
     uint64_t gtable);
 
 int xc_domain_unbind_msi_irq(xc_interface *xch,
                              uint32_t domid,
-                             uint32_t gvec,
                              uint32_t pirq,
-                             uint32_t gflags);
+                             uint64_t addr,
+                             uint32_t data);
 
 int xc_domain_bind_pt_irq(xc_interface *xch,
                           uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 3ccd27f..f7baf11 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1735,8 +1735,9 @@ int xc_deassign_dt_device(
 int xc_domain_update_msi_irq(
     xc_interface *xch,
     uint32_t domid,
-    uint32_t gvec,
     uint32_t pirq,
+    uint64_t addr,
+    uint32_t data,
     uint32_t gflags,
     uint64_t gtable)
 {
@@ -1750,7 +1751,8 @@ int xc_domain_update_msi_irq(
     bind = &(domctl.u.bind_pt_irq);
     bind->irq_type = PT_IRQ_TYPE_MSI;
     bind->machine_irq = pirq;
-    bind->u.msi.gvec = gvec;
+    bind->u.msi.addr = addr;
+    bind->u.msi.data = data;
     bind->u.msi.gflags = gflags;
     bind->u.msi.gtable = gtable;
 
@@ -1761,9 +1763,9 @@ int xc_domain_update_msi_irq(
 int xc_domain_unbind_msi_irq(
     xc_interface *xch,
     uint32_t domid,
-    uint32_t gvec,
     uint32_t pirq,
-    uint32_t gflags)
+    uint64_t addr,
+    uint32_t data)
 {
     int rc;
     struct xen_domctl_bind_pt_irq *bind;
@@ -1775,8 +1777,8 @@ int xc_domain_unbind_msi_irq(
     bind = &(domctl.u.bind_pt_irq);
     bind->irq_type = PT_IRQ_TYPE_MSI;
     bind->machine_irq = pirq;
-    bind->u.msi.gvec = gvec;
-    bind->u.msi.gflags = gflags;
+    bind->u.msi.addr = addr;
+    bind->u.msi.data = data;
 
     rc = do_domctl(xch, &domctl);
     return rc;
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 7126de7..5edb0e7 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -101,12 +101,12 @@ int vmsi_deliver(
 
 void vmsi_deliver_pirq(struct domain *d, const struct hvm_pirq_dpci *pirq_dpci)
 {
-    uint32_t flags = pirq_dpci->gmsi.gflags;
-    int vector = pirq_dpci->gmsi.gvec;
-    uint8_t dest = (uint8_t)flags;
-    bool dest_mode = flags & XEN_DOMCTL_VMSI_X86_DM_MASK;
-    uint8_t delivery_mode = MASK_EXTR(flags, XEN_DOMCTL_VMSI_X86_DELIV_MASK);
-    bool trig_mode = flags & XEN_DOMCTL_VMSI_X86_TRIG_MASK;
+    uint8_t vector = pirq_dpci->gmsi.data & MSI_DATA_VECTOR_MASK;
+    uint8_t dest = MASK_EXTR(pirq_dpci->gmsi.addr, MSI_ADDR_DEST_ID_MASK);
+    bool dest_mode = pirq_dpci->gmsi.addr & MSI_ADDR_DESTMODE_MASK;
+    uint8_t delivery_mode = MASK_EXTR(pirq_dpci->gmsi.data,
+                                      MSI_DATA_DELIVERY_MODE_MASK);
+    bool trig_mode = pirq_dpci->gmsi.data & MSI_DATA_TRIGGER_MASK;
 
     HVM_DBG_LOG(DBG_LEVEL_IOAPIC,
                 "msi: dest=%x dest_mode=%x delivery_mode=%x "
diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 8f16e6c..d8c66bf 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -339,19 +339,17 @@ int pt_irq_create_bind(
     {
     case PT_IRQ_TYPE_MSI:
     {
-        uint8_t dest, delivery_mode;
+        uint8_t dest, delivery_mode, gvec;
         bool dest_mode;
         int dest_vcpu_id;
         const struct vcpu *vcpu;
-        uint32_t gflags = pt_irq_bind->u.msi.gflags &
-                          ~XEN_DOMCTL_VMSI_X86_UNMASKED;
 
         if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
         {
             pirq_dpci->flags = HVM_IRQ_DPCI_MAPPED | HVM_IRQ_DPCI_MACH_MSI |
                                HVM_IRQ_DPCI_GUEST_MSI;
-            pirq_dpci->gmsi.gvec = pt_irq_bind->u.msi.gvec;
-            pirq_dpci->gmsi.gflags = gflags;
+            pirq_dpci->gmsi.data = pt_irq_bind->u.msi.data;
+            pirq_dpci->gmsi.addr = pt_irq_bind->u.msi.addr;
             /*
              * 'pt_irq_create_bind' can be called after 'pt_irq_destroy_bind'.
              * The 'pirq_cleanup_check' which would free the structure is only
@@ -383,8 +381,8 @@ int pt_irq_create_bind(
             }
             if ( unlikely(rc) )
             {
-                pirq_dpci->gmsi.gflags = 0;
-                pirq_dpci->gmsi.gvec = 0;
+                pirq_dpci->gmsi.addr = 0;
+                pirq_dpci->gmsi.data = 0;
                 pirq_dpci->dom = NULL;
                 pirq_dpci->flags = 0;
                 pirq_cleanup_check(info, d);
@@ -403,22 +401,23 @@ int pt_irq_create_bind(
             }
 
             /* If pirq is already mapped as vmsi, update guest data/addr. */
-            if ( pirq_dpci->gmsi.gvec != pt_irq_bind->u.msi.gvec ||
-                 pirq_dpci->gmsi.gflags != gflags )
+            if ( pirq_dpci->gmsi.data != pt_irq_bind->u.msi.data ||
+                 pirq_dpci->gmsi.addr != pt_irq_bind->u.msi.addr )
             {
                 /* Directly clear pending EOIs before enabling new MSI info. */
                 pirq_guest_eoi(info);
 
-                pirq_dpci->gmsi.gvec = pt_irq_bind->u.msi.gvec;
-                pirq_dpci->gmsi.gflags = gflags;
+                pirq_dpci->gmsi.data = pt_irq_bind->u.msi.data;
+                pirq_dpci->gmsi.addr = pt_irq_bind->u.msi.addr;
             }
         }
         /* Calculate dest_vcpu_id for MSI-type pirq migration. */
-        dest = MASK_EXTR(pirq_dpci->gmsi.gflags,
-                         XEN_DOMCTL_VMSI_X86_DEST_ID_MASK);
-        dest_mode = pirq_dpci->gmsi.gflags & XEN_DOMCTL_VMSI_X86_DM_MASK;
-        delivery_mode = MASK_EXTR(pirq_dpci->gmsi.gflags,
-                                  XEN_DOMCTL_VMSI_X86_DELIV_MASK);
+        dest = MASK_EXTR(pirq_dpci->gmsi.addr, MSI_ADDR_DEST_ID_MASK);
+        dest_mode = pirq_dpci->gmsi.addr & MSI_ADDR_DESTMODE_MASK;
+        delivery_mode = MASK_EXTR(pirq_dpci->gmsi.data,
+                                  MSI_DATA_DELIVERY_MODE_MASK);
+        gvec = pirq_dpci->gmsi.data & MSI_DATA_VECTOR_MASK;
+        pirq_dpci->gmsi.gvec = gvec;
 
         dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
         pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
@@ -837,9 +836,8 @@ static int _hvm_dpci_msi_eoi(struct domain *d,
     if ( (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
          (pirq_dpci->gmsi.gvec == vector) )
     {
-        unsigned int dest = MASK_EXTR(pirq_dpci->gmsi.gflags,
-                                      XEN_DOMCTL_VMSI_X86_DEST_ID_MASK);
-        bool dest_mode = pirq_dpci->gmsi.gflags & XEN_DOMCTL_VMSI_X86_DM_MASK;
+        uint32_t dest = MASK_EXTR(pirq_dpci->gmsi.addr, MSI_ADDR_DEST_ID_MASK);
+        bool dest_mode = pirq_dpci->gmsi.addr & MSI_ADDR_DESTMODE_MASK;
 
         if ( vlapic_match_dest(vcpu_vlapic(current), NULL, 0, dest,
                                dest_mode) )
diff --git a/xen/include/asm-x86/hvm/irq.h b/xen/include/asm-x86/hvm/irq.h
index 3b6b4bd..3a8832c 100644
--- a/xen/include/asm-x86/hvm/irq.h
+++ b/xen/include/asm-x86/hvm/irq.h
@@ -132,9 +132,10 @@ struct dev_intx_gsi_link {
 #define HVM_IRQ_DPCI_TRANSLATE       (1u << _HVM_IRQ_DPCI_TRANSLATE_SHIFT)
 
 struct hvm_gmsi_info {
-    uint32_t gvec;
-    uint32_t gflags;
+    uint32_t data;
     int dest_vcpu_id; /* -1 :multi-dest, non-negative: dest_vcpu_id */
+    uint64_t addr;
+    uint8_t gvec;
     bool posted; /* directly deliver to guest via VT-d PI? */
 };
 
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 9f6f0aa..2717c68 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -536,15 +536,11 @@ struct xen_domctl_bind_pt_irq {
             uint8_t intx;
         } pci;
         struct {
-            uint8_t gvec;
             uint32_t gflags;
-#define XEN_DOMCTL_VMSI_X86_DEST_ID_MASK 0x0000ff
-#define XEN_DOMCTL_VMSI_X86_RH_MASK      0x000100
-#define XEN_DOMCTL_VMSI_X86_DM_MASK      0x000200
-#define XEN_DOMCTL_VMSI_X86_DELIV_MASK   0x007000
-#define XEN_DOMCTL_VMSI_X86_TRIG_MASK    0x008000
 #define XEN_DOMCTL_VMSI_X86_UNMASKED     0x010000
 
+            uint32_t data;
+            uint64_t addr;
             uint64_aligned_t gtable;
         } msi;
         struct {
-- 
1.8.3.1




* [PATCH v4 21/28] vvtd: update hvm_gmsi_info when binding guest msi with pirq or
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
                   ` (19 preceding siblings ...)
  2017-11-17  6:22 ` [PATCH v4 20/28] xen/pt: when binding guest msi, accept the whole msi message Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2018-02-12 15:38   ` Roger Pau Monné
  2017-11-17  6:22 ` [PATCH v4 22/28] x86/vmsi: Hook delivering remapping format msi to guest and handling eoi Chao Gao
                   ` (7 subsequent siblings)
  28 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Tim Deegan, Ian Jackson,
	Jan Beulich, Andrew Cooper, Chao Gao, Roger Pau Monné

... handling guest's invalidation requests.

To support the pirq migration optimization and the use of VT-d posted
interrupts to inject msi from assigned devices, the struct hvm_gmsi_info
should be updated each time the guest programs msi information (affinity,
vector). But after introducing vvtd, the guest only needs to update an IRTE,
which resides in guest memory, to program msi information. vvtd doesn't trap
reads/writes to that memory range. Instead, it traps queued invalidation,
the method used to notify VT-d hardware that an IRTE has changed.

This patch updates the hvm_gmsi_info structure and programs physical IRTEs to
use VT-d posted interrupts if possible, both when binding guest msi with a pirq
and when handling the guest's invalidation request. For the latter, all
physical interrupts bound to the domain are iterated over to find the ones
matching the invalidated IRTE.

Notes: vvtd_process_iq() is called from vvtd_read() rather than from
vvtd_handle_irq_request() to avoid an ABBA deadlock on d->event_lock and
vvtd->ie_lock.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
---
v4:
 - new
---
 xen/arch/x86/hvm/hvm.c             |  2 +-
 xen/drivers/passthrough/io.c       | 89 ++++++++++++++++++++++++++++----------
 xen/drivers/passthrough/vtd/vvtd.c | 70 ++++++++++++++++++++++++++++--
 xen/include/asm-x86/hvm/hvm.h      |  2 +
 xen/include/asm-x86/hvm/irq.h      |  1 +
 xen/include/asm-x86/viommu.h       | 11 +++++
 6 files changed, 147 insertions(+), 28 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 964418a..d2c1372 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -462,7 +462,7 @@ void hvm_migrate_timers(struct vcpu *v)
     pt_migrate(v);
 }
 
-static int hvm_migrate_pirq(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
+int hvm_migrate_pirq(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
                             void *arg)
 {
     struct vcpu *v = arg;
diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index d8c66bf..9198ef5 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -21,6 +21,7 @@
 #include <xen/iommu.h>
 #include <xen/cpu.h>
 #include <xen/irq.h>
+#include <xen/viommu.h>
 #include <asm/hvm/irq.h>
 #include <asm/hvm/support.h>
 #include <asm/io_apic.h>
@@ -275,6 +276,61 @@ static struct vcpu *vector_hashing_dest(const struct domain *d,
     return dest;
 }
 
+void pt_update_gmsi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
+{
+    uint8_t dest, delivery_mode;
+    bool dest_mode;
+    int dest_vcpu_id;
+    const struct vcpu *vcpu;
+    struct arch_irq_remapping_request request;
+    struct arch_irq_remapping_info remap_info;
+
+    ASSERT(spin_is_locked(&d->event_lock));
+
+    /* Calculate dest_vcpu_id for MSI-type pirq migration. */
+    irq_request_msi_fill(&request, pirq_dpci->gmsi.addr, pirq_dpci->gmsi.data);
+    if ( viommu_check_irq_remapping(d, &request) )
+    {
+        /* An error in IRTE, don't perform the optimization */
+        if ( viommu_get_irq_info(d, &request, &remap_info) )
+        {
+            pirq_dpci->gmsi.posted = false;
+            pirq_dpci->gmsi.dest_vcpu_id = -1;
+            pirq_dpci->gmsi.gvec = 0;
+            return;
+        }
+
+        dest = remap_info.dest;
+        dest_mode = remap_info.dest_mode;
+        delivery_mode = remap_info.delivery_mode;
+        pirq_dpci->gmsi.gvec = remap_info.vector;
+    }
+    else
+    {
+        dest = MASK_EXTR(pirq_dpci->gmsi.addr, MSI_ADDR_DEST_ID_MASK);
+        dest_mode = pirq_dpci->gmsi.addr & MSI_ADDR_DESTMODE_MASK;
+        delivery_mode = MASK_EXTR(pirq_dpci->gmsi.data,
+                                  MSI_DATA_DELIVERY_MODE_MASK);
+        pirq_dpci->gmsi.gvec = pirq_dpci->gmsi.data & MSI_DATA_VECTOR_MASK;
+    }
+
+    dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
+    pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
+
+    pirq_dpci->gmsi.posted = false;
+    vcpu = (dest_vcpu_id >= 0) ? d->vcpu[dest_vcpu_id] : NULL;
+    if ( iommu_intpost )
+    {
+        if ( delivery_mode == dest_LowestPrio )
+            vcpu = vector_hashing_dest(d, dest, dest_mode, pirq_dpci->gmsi.gvec);
+        if ( vcpu )
+        {
+            pirq_dpci->gmsi.posted = true;
+            pirq_dpci->gmsi.dest_vcpu_id = vcpu->vcpu_id;
+        }
+    }
+}
+
 int pt_irq_create_bind(
     struct domain *d, const struct xen_domctl_bind_pt_irq *pt_irq_bind)
 {
@@ -339,9 +395,6 @@ int pt_irq_create_bind(
     {
     case PT_IRQ_TYPE_MSI:
     {
-        uint8_t dest, delivery_mode, gvec;
-        bool dest_mode;
-        int dest_vcpu_id;
         const struct vcpu *vcpu;
 
         if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
@@ -411,35 +464,23 @@ int pt_irq_create_bind(
                 pirq_dpci->gmsi.addr = pt_irq_bind->u.msi.addr;
             }
         }
-        /* Calculate dest_vcpu_id for MSI-type pirq migration. */
-        dest = MASK_EXTR(pirq_dpci->gmsi.addr, MSI_ADDR_DEST_ID_MASK);
-        dest_mode = pirq_dpci->gmsi.addr & MSI_ADDR_DESTMODE_MASK;
-        delivery_mode = MASK_EXTR(pirq_dpci->gmsi.data,
-                                  MSI_DATA_DELIVERY_MODE_MASK);
-        gvec = pirq_dpci->gmsi.data & MSI_DATA_VECTOR_MASK;
-        pirq_dpci->gmsi.gvec = gvec;
 
-        dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
-        pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
+        pt_update_gmsi(d, pirq_dpci);
         spin_unlock(&d->event_lock);
 
-        pirq_dpci->gmsi.posted = false;
-        vcpu = (dest_vcpu_id >= 0) ? d->vcpu[dest_vcpu_id] : NULL;
-        if ( iommu_intpost )
-        {
-            if ( delivery_mode == dest_LowestPrio )
-                vcpu = vector_hashing_dest(d, dest, dest_mode,
-                                           pirq_dpci->gmsi.gvec);
-            if ( vcpu )
-                pirq_dpci->gmsi.posted = true;
-        }
-        if ( dest_vcpu_id >= 0 )
-            hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
+        if ( pirq_dpci->gmsi.dest_vcpu_id >= 0 )
+            hvm_migrate_pirqs(d->vcpu[pirq_dpci->gmsi.dest_vcpu_id]);
 
         /* Use interrupt posting if it is supported. */
         if ( iommu_intpost )
+        {
+            if ( pirq_dpci->gmsi.posted )
+                vcpu = d->vcpu[pirq_dpci->gmsi.dest_vcpu_id];
+            else
+                vcpu = NULL;
             pi_update_irte(vcpu ? &vcpu->arch.hvm_vmx.pi_desc : NULL,
                            info, pirq_dpci->gmsi.gvec);
+        }
 
         if ( pt_irq_bind->u.msi.gflags & XEN_DOMCTL_VMSI_X86_UNMASKED )
         {
diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
index f6bde69..d12ad1d 100644
--- a/xen/drivers/passthrough/vtd/vvtd.c
+++ b/xen/drivers/passthrough/vtd/vvtd.c
@@ -477,6 +477,50 @@ static int vvtd_record_fault(struct vvtd *vvtd,
 }
 
 /*
+ * 'arg' is an index into the interrupt remapping table. It is used to
+ * search for physical irqs whose mapped gmsi is translated by the IRTE
+ * referred to by that index. The struct hvm_gmsi_info contains some fields
+ * that are inferred from a virtual IRTE; these fields should be updated
+ * when the guest invalidates an IRTE. Furthermore, the physical IRTE is
+ * updated accordingly to reduce IPIs or to utilize VT-d posted interrupts.
+ *
+ * If 'arg' is -1, perform a global invalidation.
+ */
+static int invalidate_gmsi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
+                         void *arg)
+{
+    if ( pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI )
+    {
+        uint32_t index, target = (long)arg;
+        struct arch_irq_remapping_request req;
+        const struct vcpu *vcpu;
+
+        irq_request_msi_fill(&req, pirq_dpci->gmsi.addr, pirq_dpci->gmsi.data);
+        if ( !irq_remapping_request_index(&req, &index) &&
+             ((target == -1) || (target == index)) )
+        {
+            pt_update_gmsi(d, pirq_dpci);
+            if ( pirq_dpci->gmsi.dest_vcpu_id >= 0 )
+                hvm_migrate_pirq(d, pirq_dpci,
+                                 d->vcpu[pirq_dpci->gmsi.dest_vcpu_id]);
+
+            /* Use interrupt posting if it is supported. */
+            if ( iommu_intpost )
+            {
+                if ( pirq_dpci->gmsi.posted )
+                    vcpu = d->vcpu[pirq_dpci->gmsi.dest_vcpu_id];
+                else
+                    vcpu = NULL;
+                pi_update_irte(vcpu ? &vcpu->arch.hvm_vmx.pi_desc : NULL,
+                               dpci_pirq(pirq_dpci), pirq_dpci->gmsi.gvec);
+            }
+        }
+    }
+
+    return 0;
+}
+
+/*
  * Process an invalidation descriptor. Currently, only two types descriptors,
  * Interrupt Entry Cache Invalidation Descritor and Invalidation Wait
  * Descriptor are handled.
@@ -530,7 +574,26 @@ static int process_iqe(struct vvtd *vvtd, uint32_t i)
         break;
 
     case TYPE_INVAL_IEC:
-        /* No cache is preserved in vvtd, nothing is needed to be flushed */
+        /*
+         * If VT-d pi is enabled, pi_update_irte() may be called. It assumes
+         * pcidevs_locked().
+         */
+        pcidevs_lock();
+        spin_lock(&vvtd->domain->event_lock);
+        /* A global invalidation of the cache is requested */
+        if ( !qinval.q.iec_inv_dsc.lo.granu )
+            pt_pirq_iterate(vvtd->domain, invalidate_gmsi, (void *)(long)-1);
+        else
+        {
+            uint32_t iidx = qinval.q.iec_inv_dsc.lo.iidx;
+            uint32_t nr = 1 << qinval.q.iec_inv_dsc.lo.im;
+
+            for ( ; nr; nr--, iidx++)
+                pt_pirq_iterate(vvtd->domain, invalidate_gmsi,
+                                (void *)(long)iidx);
+        }
+        spin_unlock(&vvtd->domain->event_lock);
+        pcidevs_unlock();
         break;
 
     default:
@@ -839,6 +902,8 @@ static int vvtd_read(struct vcpu *v, unsigned long addr,
     else
         *pval = vvtd_get_reg_quad(vvtd, offset);
 
+    if ( !atomic_read(&vvtd->inflight_intr) )
+        vvtd_process_iq(vvtd);
     return X86EMUL_OKAY;
 }
 
@@ -1088,8 +1153,7 @@ static int vvtd_handle_irq_request(const struct domain *d,
                         irte.remap.tm);
 
  out:
-    if ( !atomic_dec_and_test(&vvtd->inflight_intr) )
-        vvtd_process_iq(vvtd);
+    atomic_dec(&vvtd->inflight_intr);
     return ret;
 }
 
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index b687e03..f276ab6 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -394,6 +394,8 @@ bool hvm_set_guest_bndcfgs(struct vcpu *v, u64 val);
 bool hvm_check_cpuid_faulting(struct vcpu *v);
 void hvm_migrate_timers(struct vcpu *v);
 void hvm_do_resume(struct vcpu *v);
+int hvm_migrate_pirq(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
+                            void *arg);
 void hvm_migrate_pirqs(struct vcpu *v);
 
 void hvm_inject_event(const struct x86_event *event);
diff --git a/xen/include/asm-x86/hvm/irq.h b/xen/include/asm-x86/hvm/irq.h
index 3a8832c..3279371 100644
--- a/xen/include/asm-x86/hvm/irq.h
+++ b/xen/include/asm-x86/hvm/irq.h
@@ -176,6 +176,7 @@ struct hvm_pirq_dpci {
 
 void pt_pirq_init(struct domain *, struct hvm_pirq_dpci *);
 bool pt_pirq_cleanup_check(struct hvm_pirq_dpci *);
+void pt_update_gmsi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci);
 int pt_pirq_iterate(struct domain *d,
                     int (*cb)(struct domain *,
                               struct hvm_pirq_dpci *, void *arg),
diff --git a/xen/include/asm-x86/viommu.h b/xen/include/asm-x86/viommu.h
index e526e38..91ebc64 100644
--- a/xen/include/asm-x86/viommu.h
+++ b/xen/include/asm-x86/viommu.h
@@ -58,6 +58,17 @@ static inline void irq_request_ioapic_fill(
     req->msg.rte = rte;
 }
 
+static inline void irq_request_msi_fill(
+    struct arch_irq_remapping_request *req, uint64_t addr, uint32_t data)
+{
+    ASSERT(req);
+    req->type = VIOMMU_REQUEST_IRQ_MSI;
+    /* Source ID isn't in use */
+    req->source_id = 0;
+    req->msg.msi.addr = addr;
+    req->msg.msi.data = data;
+}
+
 #endif /* __ARCH_X86_VIOMMU_H__ */
 
 /*
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 22/28] x86/vmsi: Hook delivering remapping format msi to guest and handling eoi
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
                   ` (20 preceding siblings ...)
  2017-11-17  6:22 ` [PATCH v4 21/28] vvtd: update hvm_gmsi_info when binding guest msi with pirq or Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2017-11-17  6:22 ` [PATCH v4 23/28] tools/libacpi: Add DMA remapping reporting (DMAR) ACPI table structures Chao Gao
                   ` (6 subsequent siblings)
  28 siblings, 0 replies; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Tim Deegan,
	Jan Beulich, Andrew Cooper, Chao Gao, Roger Pau Monné

When delivering a guest MSI, the format of the MSI is first determined
by the viommu's 'check_irq_remapping' method. An MSI in non-remapping
format is then delivered as before, while a remapping-format MSI is
handled by the viommu. When handling an EOI, the interrupt attributes
(vector, affinity) are used to search for the physical irq. Clearly, for
a remapping-format MSI, those interrupt attributes must be decoded from
the IRTE.
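
As a sketch of the non-remapped fallback path described above, the
compatibility-format MSI address/data fields decode as follows. The mask
names mirror Xen's asm/msi.h, but the helper and struct are illustrative,
not the series' code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural MSI layout (Intel SDM); mask names mirror Xen's msi.h. */
#define MSI_ADDR_DEST_ID_MASK       0x000ff000u
#define MSI_ADDR_DESTMODE_MASK      0x00000004u
#define MSI_DATA_VECTOR_MASK        0x000000ffu
#define MSI_DATA_DELIVERY_MODE_MASK 0x00000700u
#define MSI_DATA_TRIGGER_MASK       0x00008000u

struct msi_attrs {
    uint8_t vector, dest, delivery_mode;
    bool dest_mode, trig_mode;
};

/*
 * Decode a compatibility-format (non-remapped) MSI, as the fallback
 * branch of vmsi_deliver_pirq() does.
 */
static struct msi_attrs decode_compat_msi(uint64_t addr, uint32_t data)
{
    struct msi_attrs a = {
        .vector = data & MSI_DATA_VECTOR_MASK,
        .dest = (addr & MSI_ADDR_DEST_ID_MASK) >> 12,
        .dest_mode = addr & MSI_ADDR_DESTMODE_MASK,
        .delivery_mode = (data & MSI_DATA_DELIVERY_MODE_MASK) >> 8,
        .trig_mode = data & MSI_DATA_TRIGGER_MASK,
    };

    return a;
}
```

For a remapping-format MSI these fields instead index an IRTE, which is why
the EOI path above must consult the viommu first.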

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
---
 xen/arch/x86/hvm/irq.c       |  6 ++++++
 xen/arch/x86/hvm/vmsi.c      | 33 +++++++++++++++++++++------------
 xen/drivers/passthrough/io.c | 35 +++++++++++++++++++++++++++--------
 3 files changed, 54 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
index e425df9..b561480 100644
--- a/xen/arch/x86/hvm/irq.c
+++ b/xen/arch/x86/hvm/irq.c
@@ -23,6 +23,7 @@
 #include <xen/sched.h>
 #include <xen/irq.h>
 #include <xen/keyhandler.h>
+#include <xen/viommu.h>
 #include <asm/hvm/domain.h>
 #include <asm/hvm/support.h>
 #include <asm/msi.h>
@@ -339,6 +340,11 @@ int hvm_inject_msi(struct domain *d, uint64_t addr, uint32_t data)
     uint8_t trig_mode = (data & MSI_DATA_TRIGGER_MASK)
         >> MSI_DATA_TRIGGER_SHIFT;
     uint8_t vector = data & MSI_DATA_VECTOR_MASK;
+    struct arch_irq_remapping_request request;
+
+    irq_request_msi_fill(&request, addr, data);
+    if ( viommu_check_irq_remapping(d, &request) )
+        return viommu_handle_irq_request(d, &request);
 
     if ( !vector )
     {
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 5edb0e7..9dc5631 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -31,6 +31,7 @@
 #include <xen/errno.h>
 #include <xen/sched.h>
 #include <xen/irq.h>
+#include <xen/viommu.h>
 #include <public/hvm/ioreq.h>
 #include <asm/hvm/io.h>
 #include <asm/hvm/vpic.h>
@@ -101,21 +102,29 @@ int vmsi_deliver(
 
 void vmsi_deliver_pirq(struct domain *d, const struct hvm_pirq_dpci *pirq_dpci)
 {
-    uint8_t vector = pirq_dpci->gmsi.data & MSI_DATA_VECTOR_MASK;
-    uint8_t dest = MASK_EXTR(pirq_dpci->gmsi.addr, MSI_ADDR_DEST_ID_MASK);
-    bool dest_mode = pirq_dpci->gmsi.addr & MSI_ADDR_DESTMODE_MASK;
-    uint8_t delivery_mode = MASK_EXTR(pirq_dpci->gmsi.data,
-                                      MSI_DATA_DELIVERY_MODE_MASK);
-    bool trig_mode = pirq_dpci->gmsi.data & MSI_DATA_TRIGGER_MASK;
-
-    HVM_DBG_LOG(DBG_LEVEL_IOAPIC,
-                "msi: dest=%x dest_mode=%x delivery_mode=%x "
-                "vector=%x trig_mode=%x\n",
-                dest, dest_mode, delivery_mode, vector, trig_mode);
+    struct arch_irq_remapping_request request;
 
     ASSERT(pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI);
 
-    vmsi_deliver(d, vector, dest, dest_mode, delivery_mode, trig_mode);
+    irq_request_msi_fill(&request, pirq_dpci->gmsi.addr, pirq_dpci->gmsi.data);
+    if ( viommu_check_irq_remapping(d, &request) )
+        viommu_handle_irq_request(d, &request);
+    else
+    {
+        uint8_t vector = pirq_dpci->gmsi.data & MSI_DATA_VECTOR_MASK;
+        uint8_t dest = MASK_EXTR(pirq_dpci->gmsi.addr, MSI_ADDR_DEST_ID_MASK);
+        bool dest_mode = pirq_dpci->gmsi.addr & MSI_ADDR_DESTMODE_MASK;
+        uint8_t delivery_mode = MASK_EXTR(pirq_dpci->gmsi.data,
+                                          MSI_DATA_DELIVERY_MODE_MASK);
+        bool trig_mode = pirq_dpci->gmsi.data & MSI_DATA_TRIGGER_MASK;
+
+        HVM_DBG_LOG(DBG_LEVEL_IOAPIC,
+                    "msi: dest=%x dest_mode=%x delivery_mode=%x "
+                    "vector=%x trig_mode=%x\n",
+                    dest, dest_mode, delivery_mode, vector, trig_mode);
+
+        vmsi_deliver(d, vector, dest, dest_mode, delivery_mode, trig_mode);
+    }
 }
 
 /* Return value, -1 : multi-dests, non-negative value: dest_vcpu_id */
diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 9198ef5..34a3cf1 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -872,16 +872,35 @@ static void __msi_pirq_eoi(struct hvm_pirq_dpci *pirq_dpci)
 static int _hvm_dpci_msi_eoi(struct domain *d,
                              struct hvm_pirq_dpci *pirq_dpci, void *arg)
 {
-    int vector = (long)arg;
-
-    if ( (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
-         (pirq_dpci->gmsi.gvec == vector) )
+    if ( pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI )
     {
-        uint32_t dest = MASK_EXTR(pirq_dpci->gmsi.addr, MSI_ADDR_DEST_ID_MASK);
-        bool dest_mode = pirq_dpci->gmsi.addr & MSI_ADDR_DESTMODE_MASK;
+        uint8_t vector, vector_target = (long)arg;
+        uint32_t dest;
+        bool dm;
+        struct arch_irq_remapping_request request;
+
+        irq_request_msi_fill(&request, pirq_dpci->gmsi.addr,
+                             pirq_dpci->gmsi.data);
+        if ( viommu_check_irq_remapping(d, &request) )
+        {
+            struct arch_irq_remapping_info info;
+
+            if ( viommu_get_irq_info(d, &request, &info) )
+                return 0;
+
+            vector = info.vector;
+            dest = info.dest;
+            dm = info.dest_mode;
+        }
+        else
+        {
+            vector = pirq_dpci->gmsi.data & MSI_DATA_VECTOR_MASK;
+            dest = MASK_EXTR(pirq_dpci->gmsi.addr, MSI_ADDR_DEST_ID_MASK);
+            dm = pirq_dpci->gmsi.addr & MSI_ADDR_DESTMODE_MASK;
+        }
 
-        if ( vlapic_match_dest(vcpu_vlapic(current), NULL, 0, dest,
-                               dest_mode) )
+        if ( vector == vector_target &&
+             vlapic_match_dest(vcpu_vlapic(current), NULL, 0, dest, dm) )
         {
             __msi_pirq_eoi(pirq_dpci);
             return 1;
-- 
1.8.3.1



* [PATCH v4 23/28] tools/libacpi: Add DMA remapping reporting (DMAR) ACPI table structures
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
                   ` (21 preceding siblings ...)
  2017-11-17  6:22 ` [PATCH v4 22/28] x86/vmsi: Hook delivering remapping format msi to guest and handling eoi Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2017-11-17  6:22 ` [PATCH v4 24/28] tools/libacpi: Add new fields in acpi_config for DMAR table Chao Gao
                   ` (5 subsequent siblings)
  28 siblings, 0 replies; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Tim Deegan,
	Jan Beulich, Chao Gao, Ian Jackson, Roger Pau Monné

Add the DMAR table structures according to Chapter 8 "BIOS Considerations"
of the VT-d spec Rev. 2.4.

VT-d spec: http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/vt-directed-io-spec.pdf
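
As a quick sanity check against the spec's fixed layouts, the structure
sizes can be verified standalone. These are trimmed, illustrative copies of
the structures added below; the 36-byte generic ACPI header is folded into
a raw byte array, and the flexible path[] member is dropped:

```c
#include <assert.h>
#include <stdint.h>

#pragma pack(1)
/* DMAR header: 36-byte ACPI header + width + flags + 10 reserved = 48. */
struct acpi_dmar {
    uint8_t header[36];
    uint8_t host_address_width;
    uint8_t flags;
    uint8_t reserved[10];
};

/* Device Scope Structure: 6 bytes before the variable-length path. */
struct dmar_device_scope {
    uint8_t type;
    uint8_t length;
    uint8_t reserved[2];
    uint8_t enumeration_id;
    uint8_t bus;
};

/* DRHD structure: 16 bytes before the variable-length device scopes. */
struct acpi_dmar_hardware_unit {
    uint16_t type;
    uint16_t length;
    uint8_t flags;
    uint8_t reserved;
    uint16_t pci_segment;
    uint64_t base_address;
};
#pragma pack()
```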

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
---
 tools/libacpi/acpi2_0.h | 61 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h
index 2619ba3..6081417 100644
--- a/tools/libacpi/acpi2_0.h
+++ b/tools/libacpi/acpi2_0.h
@@ -422,6 +422,65 @@ struct acpi_20_slit {
 };
 
 /*
+ * DMA Remapping Table header definition (DMAR)
+ */
+
+/*
+ * DMAR Flags.
+ */
+#define ACPI_DMAR_INTR_REMAP        (1 << 0)
+#define ACPI_DMAR_X2APIC_OPT_OUT    (1 << 1)
+
+struct acpi_dmar {
+    struct acpi_header header;
+    uint8_t host_address_width;
+    uint8_t flags;
+    uint8_t reserved[10];
+};
+
+/*
+ * Device Scope Types
+ */
+#define ACPI_DMAR_DEV_SCOPE_PCI_ENDPOINT        0x01
+#define ACPI_DMAR_DEV_SCOPE_PCI_SUB_HRCHY       0x02
+#define ACPI_DMAR_DEV_SCOPE_IOAPIC              0x03
+#define ACPI_DMAR_DEV_SCOPE_HPET                0x04
+#define ACPI_DMAR_DEV_SCOPE_ACPI_NS_DEV         0x05
+
+struct dmar_device_scope {
+    uint8_t type;
+    uint8_t length;
+    uint8_t reserved[2];
+    uint8_t enumeration_id;
+    uint8_t bus;
+    uint16_t path[0];
+};
+
+/*
+ * DMA Remapping Hardware Unit Types
+ */
+#define ACPI_DMAR_TYPE_DRHD     0x00
+#define ACPI_DMAR_TYPE_RMRR     0x01
+#define ACPI_DMAR_TYPE_ATSR     0x02
+#define ACPI_DMAR_TYPE_RHSA     0x03
+#define ACPI_DMAR_TYPE_ANDD     0x04
+
+/*
+ * DMA Remapping Hardware Unit Flags. All other bits are reserved and must be 0.
+ */
+#define ACPI_DMAR_INCLUDE_PCI_ALL   (1 << 0)
+
+struct acpi_dmar_hardware_unit {
+    uint16_t type;
+    uint16_t length;
+    uint8_t flags;
+    uint8_t reserved;
+    uint16_t pci_segment;
+    uint64_t base_address;
+    struct dmar_device_scope scope[0];
+};
+
+/*
  * Table Signatures.
  */
 #define ACPI_2_0_RSDP_SIGNATURE ASCII64('R','S','D',' ','P','T','R',' ')
@@ -435,6 +494,7 @@ struct acpi_20_slit {
 #define ACPI_2_0_WAET_SIGNATURE ASCII32('W','A','E','T')
 #define ACPI_2_0_SRAT_SIGNATURE ASCII32('S','R','A','T')
 #define ACPI_2_0_SLIT_SIGNATURE ASCII32('S','L','I','T')
+#define ACPI_2_0_DMAR_SIGNATURE ASCII32('D','M','A','R')
 
 /*
  * Table revision numbers.
@@ -449,6 +509,7 @@ struct acpi_20_slit {
 #define ACPI_1_0_FADT_REVISION 0x01
 #define ACPI_2_0_SRAT_REVISION 0x01
 #define ACPI_2_0_SLIT_REVISION 0x01
+#define ACPI_2_0_DMAR_REVISION 0x01
 
 #pragma pack ()
 
-- 
1.8.3.1



* [PATCH v4 24/28] tools/libacpi: Add new fields in acpi_config for DMAR table
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
                   ` (22 preceding siblings ...)
  2017-11-17  6:22 ` [PATCH v4 23/28] tools/libacpi: Add DMA remapping reporting (DMAR) ACPI table structures Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2017-11-17  6:22 ` [PATCH v4 25/28] tools/libxl: Add an user configurable parameter to control vIOMMU attributes Chao Gao
                   ` (4 subsequent siblings)
  28 siblings, 0 replies; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Tim Deegan,
	Jan Beulich, Andrew Cooper, Chao Gao, Roger Pau Monné

The BIOS reports the remapping hardware units in a platform to system software
through the DMA Remapping Reporting (DMAR) ACPI table.
New fields are introduced for the DMAR table; they are set by the
toolstack by parsing the guest's config file. construct_dmar() is added to
build the DMAR table from these new fields.

The header files in ovmf.c are re-ordered to avoid including <stdbool.h> in
tools/libacpi/libacpi.h.
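
construct_dmar() finishes by calling set_checksum(); the ACPI rule it
enforces is that all bytes of a table, including the checksum byte, sum to
zero modulo 256. A minimal sketch of such a helper (the real libacpi
signature may differ slightly):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Store at checksum_offset the byte that makes the whole table sum to
 * zero (mod 256), as required for ACPI tables such as the DMAR.
 */
static void set_checksum(void *table, size_t checksum_offset, size_t len)
{
    uint8_t *p = table, sum = 0;
    size_t i;

    p[checksum_offset] = 0;
    for ( i = 0; i < len; i++ )
        sum += p[i];
    p[checksum_offset] = -sum;
}
```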

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
---
v4:
 - initialize variables during declaration if possible
 - reorder the sequence of header files to avoid including <stdbool.h>
 in tools/libacpi/libacpi.h

v3:
 - Remove chip-set specific IOAPIC BDF. Instead, let IOAPIC-related
 info be passed by struct acpi_config.
---
 tools/firmware/hvmloader/ovmf.c |  2 +-
 tools/libacpi/build.c           | 49 +++++++++++++++++++++++++++++++++++++++++
 tools/libacpi/libacpi.h         |  9 ++++++++
 3 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/tools/firmware/hvmloader/ovmf.c b/tools/firmware/hvmloader/ovmf.c
index a17a11c..606ab4d 100644
--- a/tools/firmware/hvmloader/ovmf.c
+++ b/tools/firmware/hvmloader/ovmf.c
@@ -23,10 +23,10 @@
 
 #include "config.h"
 #include "smbios_types.h"
+#include "util.h"
 #include "libacpi.h"
 #include "apic_regs.h"
 #include "../rombios/config.h"
-#include "util.h"
 #include "pci_regs.h"
 #include "hypercall.h"
 
diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
index f9881c9..bd759da 100644
--- a/tools/libacpi/build.c
+++ b/tools/libacpi/build.c
@@ -303,6 +303,55 @@ static struct acpi_20_slit *construct_slit(struct acpi_ctxt *ctxt,
     return slit;
 }
 
+/*
+ * Only one DMA remapping hardware unit is exposed and all devices
+ * are under the remapping hardware unit. I/O APIC should be explicitly
+ * enumerated.
+ */
+struct acpi_dmar *construct_dmar(struct acpi_ctxt *ctxt,
+                                 const struct acpi_config *config)
+{
+    struct acpi_dmar_hardware_unit *drhd;
+    struct dmar_device_scope *scope;
+    unsigned int ioapic_scope_size = sizeof(*scope) + sizeof(scope->path[0]);
+    unsigned int size = sizeof(struct acpi_dmar) + sizeof(*drhd) +
+                        ioapic_scope_size;
+    struct acpi_dmar *dmar = ctxt->mem_ops.alloc(ctxt, size, 16);
+
+    if ( !dmar )
+        return NULL;
+
+    memset(dmar, 0, size);
+    dmar->header.signature = ACPI_2_0_DMAR_SIGNATURE;
+    dmar->header.revision = ACPI_2_0_DMAR_REVISION;
+    dmar->header.length = size;
+    fixed_strcpy(dmar->header.oem_id, ACPI_OEM_ID);
+    fixed_strcpy(dmar->header.oem_table_id, ACPI_OEM_TABLE_ID);
+    dmar->header.oem_revision = ACPI_OEM_REVISION;
+    dmar->header.creator_id   = ACPI_CREATOR_ID;
+    dmar->header.creator_revision = ACPI_CREATOR_REVISION;
+    dmar->host_address_width = config->host_addr_width - 1;
+    if ( config->iommu_intremap_supported )
+        dmar->flags |= ACPI_DMAR_INTR_REMAP;
+
+    drhd = (struct acpi_dmar_hardware_unit *)((void*)dmar + sizeof(*dmar));
+    drhd->type = ACPI_DMAR_TYPE_DRHD;
+    drhd->length = sizeof(*drhd) + ioapic_scope_size;
+    drhd->flags = ACPI_DMAR_INCLUDE_PCI_ALL;
+    drhd->pci_segment = 0;
+    drhd->base_address = config->iommu_base_addr;
+
+    scope = &drhd->scope[0];
+    scope->type = ACPI_DMAR_DEV_SCOPE_IOAPIC;
+    scope->length = ioapic_scope_size;
+    scope->enumeration_id = config->ioapic_id;
+    scope->bus = config->ioapic_bus;
+    scope->path[0] = config->ioapic_devfn;
+
+    set_checksum(dmar, offsetof(struct acpi_header, checksum), size);
+    return dmar;
+}
+
 static int construct_passthrough_tables(struct acpi_ctxt *ctxt,
                                         unsigned long *table_ptrs,
                                         int nr_tables,
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index a2efd23..c09afdc 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -96,8 +96,17 @@ struct acpi_config {
     uint32_t ioapic_base_address;
     uint16_t pci_isa_irq_mask;
     uint8_t ioapic_id;
+
+    /* Emulated IOMMU features, location and IOAPIC under the scope of IOMMU */
+    bool iommu_intremap_supported;
+    uint8_t host_addr_width;
+    uint8_t ioapic_bus;
+    uint16_t ioapic_devfn;
+    uint64_t iommu_base_addr;
 };
 
+struct acpi_dmar *construct_dmar(struct acpi_ctxt *ctxt,
+                                 const struct acpi_config *config);
 int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config);
 
 #endif /* __LIBACPI_H__ */
-- 
1.8.3.1



* [PATCH v4 25/28] tools/libxl: Add an user configurable parameter to control vIOMMU attributes
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
                   ` (23 preceding siblings ...)
  2017-11-17  6:22 ` [PATCH v4 24/28] tools/libacpi: Add new fields in acpi_config for DMAR table Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2017-11-17  6:22 ` [PATCH v4 26/28] tools/libxl: build DMAR table for a guest with one virtual VTD Chao Gao
                   ` (3 subsequent siblings)
  28 siblings, 0 replies; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Tim Deegan,
	Jan Beulich, Chao Gao, Ian Jackson, Roger Pau Monné

A new field, viommu, is added to struct libxl_domain_build_info. Several
attributes of the virtual IOMMU can be specified in the guest config file.
These attributes are used for DMAR construction and vIOMMU creation.
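
Assuming the syntax documented below in xl.cfg, a guest config would enable
a single emulated VT-d like this (illustrative fragment):

```
# xl guest config fragment: one emulated Intel VT-d with interrupt remapping
viommu = [ "type=intel_vtd,intremap=1" ]
```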

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>

---
v4:
 - Move VTD_BASE_ADDRESS's definition to libxl_arch.h

v3:
 - allow an array of viommu rather than only one viommu to present to guest.
 During domain building, an error would be raised for
 multiple viommus case since we haven't implemented this yet.
 - provide a libxl__viommu_set_default() for viommu
---
 docs/man/xl.cfg.pod.5.in    | 27 ++++++++++++++++++++++++
 tools/libxl/libxl_arch.h    |  1 +
 tools/libxl/libxl_create.c  | 47 ++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_types.idl | 12 +++++++++++
 tools/xl/xl_parse.c         | 50 ++++++++++++++++++++++++++++++++++++++++++++-
 5 files changed, 136 insertions(+), 1 deletion(-)

diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
index b7b91d8..2a48cb8 100644
--- a/docs/man/xl.cfg.pod.5.in
+++ b/docs/man/xl.cfg.pod.5.in
@@ -1803,6 +1803,33 @@ L<http://www.microsoft.com/en-us/download/details.aspx?id=30707>
 
 =back 
 
+=item B<viommu=[ "VIOMMU_STRING", "VIOMMU_STRING", ...]>
+
+Specifies the vIOMMUs which are to be provided to the guest.
+
+B<VIOMMU_STRING> has the form C<KEY=VALUE,KEY=VALUE,...> where:
+
+=over 4
+
+=item B<KEY=VALUE>
+
+Possible B<KEY>s are:
+
+=over 4
+
+=item B<type="STRING">
+
+Currently there is only one valid type:
+
+(x86 only) "intel_vtd" means providing an emulated Intel VT-d to the guest.
+
+=item B<intremap=BOOLEAN>
+
+Specifies whether the vIOMMU should support interrupt remapping;
+defaults to 'true'.
+
+=back
+
 =head3 Guest Virtual Time Controls
 
 =over 4
diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h
index 784ec7f..9a74d96 100644
--- a/tools/libxl/libxl_arch.h
+++ b/tools/libxl/libxl_arch.h
@@ -81,6 +81,7 @@ int libxl__arch_extra_memory(libxl__gc *gc,
 #if defined(__i386__) || defined(__x86_64__)
 
 #define LAPIC_BASE_ADDRESS  0xfee00000
+#define VTD_BASE_ADDRESS    0xfed90000
 
 int libxl__dom_load_acpi(libxl__gc *gc,
                          const libxl_domain_build_info *b_info,
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index f15fb21..b486751 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -59,6 +59,50 @@ void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
                             LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT;
 }
 
+static int libxl__viommu_set_default(libxl__gc *gc,
+                                     libxl_domain_build_info *b_info)
+{
+    int i;
+
+    for (i = 0; i < b_info->num_viommus; i++) {
+        libxl_viommu_info *viommu = &b_info->viommu[i];
+
+        if (libxl_defbool_is_default(viommu->intremap))
+            libxl_defbool_set(&viommu->intremap, true);
+
+        if (!libxl_defbool_val(viommu->intremap)) {
+            LOG(ERROR, "Cannot create one virtual VTD without intremap");
+            return ERROR_INVAL;
+        }
+
+        switch (viommu->type) {
+        case LIBXL_VIOMMU_TYPE_INTEL_VTD:
+            /*
+             * If there are multiple vIOMMUs, we need arrange all vIOMMUs to
+             * avoid overlap. Put a check here in case we get here for multiple
+             * vIOMMUs case.
+             */
+            if (b_info->num_viommus > 1) {
+                LOG(ERROR, "Multiple vIOMMUs support is under implementation");
+                return ERROR_INVAL;
+            }
+
+            /* Set default values to unexposed fields */
+            viommu->base_addr = VTD_BASE_ADDRESS;
+
+            /* Set desired capbilities */
+            viommu->cap = VIOMMU_CAP_IRQ_REMAPPING;
+
+            break;
+
+        default:
+            return ERROR_INVAL;
+        }
+    }
+
+    return 0;
+}
+
 int libxl__domain_build_info_setdefault(libxl__gc *gc,
                                         libxl_domain_build_info *b_info)
 {
@@ -218,6 +262,9 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
     libxl__arch_domain_build_info_acpi_setdefault(b_info);
     libxl_defbool_setdefault(&b_info->dm_restrict, false);
 
+    if (libxl__viommu_set_default(gc, b_info))
+        return ERROR_FAIL;
+
     switch (b_info->type) {
     case LIBXL_DOMAIN_TYPE_HVM:
         if (b_info->shadow_memkb == LIBXL_MEMKB_DEFAULT)
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index a239324..b6869eb 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -457,6 +457,17 @@ libxl_altp2m_mode = Enumeration("altp2m_mode", [
     (3, "limited"),
     ], init_val = "LIBXL_ALTP2M_MODE_DISABLED")
 
+libxl_viommu_type = Enumeration("viommu_type", [
+    (1, "intel_vtd"),
+    ])
+
+libxl_viommu_info = Struct("viommu_info", [
+    ("type",            libxl_viommu_type),
+    ("intremap",        libxl_defbool),
+    ("cap",             uint64),
+    ("base_addr",       uint64),
+    ])
+
 libxl_domain_build_info = Struct("domain_build_info",[
     ("max_vcpus",       integer),
     ("avail_vcpus",     libxl_bitmap),
@@ -522,6 +533,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
     ("nested_hvm",       libxl_defbool),
     ("apic",             libxl_defbool),
     ("dm_restrict",      libxl_defbool),
+    ("viommu",           Array(libxl_viommu_info, "num_viommus")),
     ("u", KeyedUnion(None, libxl_domain_type, "type",
                 [("hvm", Struct(None, [("firmware",         string),
                                        ("bios",             libxl_bios_type),
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 9a692d5..aa2eeac 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -851,6 +851,38 @@ out:
     return rc;
 }
 
+/* Parses viommu data and adds info into viommu
+ * Returns 1 if the input doesn't form a valid viommu
+ * or parsed values are not correct. Successful parse returns 0 */
+static int parse_viommu_config(libxl_viommu_info *viommu, const char *info)
+{
+    char *ptr, *oparg, *saveptr = NULL, *buf = xstrdup(info);
+
+    ptr = strtok_r(buf, ",", &saveptr);
+    if (MATCH_OPTION("type", ptr, oparg)) {
+        if (!strcmp(oparg, "intel_vtd")) {
+            viommu->type = LIBXL_VIOMMU_TYPE_INTEL_VTD;
+        } else {
+            fprintf(stderr, "Invalid viommu type: %s\n", oparg);
+            return 1;
+        }
+    } else {
+        fprintf(stderr, "viommu type should be set first: %s\n", oparg);
+        return 1;
+    }
+
+    for (ptr = strtok_r(NULL, ",", &saveptr); ptr;
+         ptr = strtok_r(NULL, ",", &saveptr)) {
+        if (MATCH_OPTION("intremap", ptr, oparg)) {
+            libxl_defbool_set(&viommu->intremap, strtoul(oparg, NULL, 0));
+        } else {
+            fprintf(stderr, "Unknown string `%s' in viommu spec\n", ptr);
+            return 1;
+        }
+    }
+    return 0;
+}
+
 void parse_config_data(const char *config_source,
                        const char *config_data,
                        int config_len,
@@ -860,7 +892,7 @@ void parse_config_data(const char *config_source,
     long l, vcpus = 0;
     XLU_Config *config;
     XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids, *vtpms,
-                   *usbctrls, *usbdevs, *p9devs, *vdispls;
+                   *usbctrls, *usbdevs, *p9devs, *vdispls, *iommus;
     XLU_ConfigList *channels, *ioports, *irqs, *iomem, *viridian, *dtdevs,
                    *mca_caps;
     int num_ioports, num_irqs, num_iomem, num_cpus, num_viridian, num_mca_caps;
@@ -1199,6 +1231,22 @@ void parse_config_data(const char *config_source,
     xlu_cfg_get_defbool(config, "nestedhvm", &b_info->nested_hvm, 0);
     xlu_cfg_get_defbool(config, "apic", &b_info->apic, 0);
 
+    if (!xlu_cfg_get_list (config, "viommu", &iommus, 0, 0)) {
+        while ((buf = xlu_cfg_get_listitem (iommus, b_info->num_viommus))
+                != NULL) {
+            libxl_viommu_info *viommu;
+
+            viommu = ARRAY_EXTEND_INIT_NODEVID(b_info->viommu,
+                                               b_info->num_viommus,
+                                               libxl_viommu_info_init);
+
+            if (parse_viommu_config(viommu, buf)) {
+                fprintf(stderr, "ERROR: invalid viommu setting\n");
+                exit (1);
+            }
+        }
+    }
+
     switch(b_info->type) {
     case LIBXL_DOMAIN_TYPE_HVM:
         kernel_basename = libxl_basename(b_info->kernel);
-- 
1.8.3.1



* [PATCH v4 26/28] tools/libxl: build DMAR table for a guest with one virtual VTD
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
                   ` (24 preceding siblings ...)
  2017-11-17  6:22 ` [PATCH v4 25/28] tools/libxl: Add an user configurable parameter to control vIOMMU attributes Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2017-11-17  6:22 ` [PATCH v4 27/28] tools/libxl: create vIOMMU during domain construction Chao Gao
                   ` (2 subsequent siblings)
  28 siblings, 0 replies; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Tim Deegan,
	Jan Beulich, Chao Gao, Ian Jackson, Roger Pau Monné

New logic is added to init_acpi_config() to initialize the fields
introduced for the DMAR table. For a PVH guest, the DMAR table is built
like the other ACPI tables. For an HVM guest, only the DMAR table is built
in the toolstack and passed through to the guest via the existing firmware
pass-through mechanism.
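
The HVM "join" step amounts to plain buffer concatenation of the freshly
built DMAR with whatever blob is already queued in acpi_modules[0], so the
firmware pass-through path hands the guest one contiguous module. A sketch
with illustrative names (not libxl's actual helpers):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct acpi_module {
    void *data;
    size_t length;
};

/* Append one ACPI table blob to a module, adopting it if the module
 * is still empty; returns 0 on success, -1 on allocation failure. */
static int append_table(struct acpi_module *mod, const void *table,
                        size_t len)
{
    void *joined = malloc(mod->length + len);

    if (!joined)
        return -1;
    if (mod->data)
        memcpy(joined, mod->data, mod->length);
    memcpy((char *)joined + mod->length, table, len);
    free(mod->data);
    mod->data = joined;
    mod->length += len;
    return 0;
}
```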

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
---
v4:
 - build DMAR table for PVH guest.
 - remove LIBXL_DEVICE_MODEL_VERSION_NONE

v3:
 - build dmar and initialize related acpi_modules struct in
 libxl_x86_acpi.c, keeping in accordance with pvh.
---
 tools/libacpi/build.c        | 12 ++++++
 tools/libacpi/libacpi.h      |  1 +
 tools/libxl/libxl_x86.c      |  4 +-
 tools/libxl/libxl_x86_acpi.c | 98 ++++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 111 insertions(+), 4 deletions(-)

diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
index bd759da..df0a67c 100644
--- a/tools/libacpi/build.c
+++ b/tools/libacpi/build.c
@@ -517,6 +517,18 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
             printf("Failed to build SLIT, skipping...\n");
     }
 
+    /* DMAR */
+    if ( config->table_flags & ACPI_HAS_DMAR )
+    {
+        struct acpi_dmar *dmar = construct_dmar(ctxt, config);
+
+        if ( dmar )
+            table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, dmar);
+        else
+            printf("Failed to build DMAR, skipping...\n");
+    }
+
+
     /* Load any additional tables passed through. */
     nr_tables += construct_passthrough_tables(ctxt, table_ptrs,
                                               nr_tables, config);
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index c09afdc..bdeeccc 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -36,6 +36,7 @@
 #define ACPI_HAS_8042              (1<<13)
 #define ACPI_HAS_CMOS_RTC          (1<<14)
 #define ACPI_HAS_SSDT_LAPTOP_SLATE (1<<15)
+#define ACPI_HAS_DMAR              (1<<16)
 
 struct xen_vmemrange;
 struct acpi_numa {
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index 5f91fe4..cb2f494 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -383,7 +383,9 @@ int libxl__arch_domain_finalise_hw_description(libxl__gc *gc,
 {
     int rc = 0;
 
-    if (info->type == LIBXL_DOMAIN_TYPE_PVH) {
+
+    if (info->type == LIBXL_DOMAIN_TYPE_HVM
+            || info->type == LIBXL_DOMAIN_TYPE_PVH) {
         rc = libxl__dom_load_acpi(gc, info, dom);
         if (rc != 0)
             LOGE(ERROR, "libxl_dom_load_acpi failed");
diff --git a/tools/libxl/libxl_x86_acpi.c b/tools/libxl/libxl_x86_acpi.c
index 9a7c904..bbe9219 100644
--- a/tools/libxl/libxl_x86_acpi.c
+++ b/tools/libxl/libxl_x86_acpi.c
@@ -16,6 +16,7 @@
 #include "libxl_arch.h"
 #include <xen/hvm/hvm_info_table.h>
 #include <xen/hvm/e820.h>
+#include "libacpi/acpi2_0.h"
 #include "libacpi/libacpi.h"
 
 #include <xc_dom.h>
@@ -100,6 +101,25 @@ static int init_acpi_config(libxl__gc *gc,
     struct hvm_info_table *hvminfo;
     int i, r, rc;
 
+    if ((b_info->num_viommus == 1) &&
+        (b_info->viommu[0].type == LIBXL_VIOMMU_TYPE_INTEL_VTD)) {
+        if (libxl_defbool_val(b_info->viommu[0].intremap))
+            config->iommu_intremap_supported = true;
+        config->iommu_base_addr = b_info->viommu[0].base_addr;
+
+        /* IOAPIC id and PSEUDO BDF */
+        config->ioapic_id = 1;
+        config->ioapic_bus = 0xff;
+        config->ioapic_devfn = 0x0;
+
+        config->host_addr_width = 39;
+        config->table_flags |= ACPI_HAS_DMAR;
+    }
+
+    if (b_info->type == LIBXL_DOMAIN_TYPE_HVM) {
+        return 0;
+    }
+
     config->dsdt_anycpu = config->dsdt_15cpu = dsdt_pvh;
     config->dsdt_anycpu_len = config->dsdt_15cpu_len = dsdt_pvh_len;
 
@@ -161,9 +181,9 @@ out:
     return rc;
 }
 
-int libxl__dom_load_acpi(libxl__gc *gc,
-                         const libxl_domain_build_info *b_info,
-                         struct xc_dom_image *dom)
+static int libxl__dom_load_acpi_pvh(libxl__gc *gc,
+                                    const libxl_domain_build_info *b_info,
+                                    struct xc_dom_image *dom)
 {
     struct acpi_config config = {0};
     struct libxl_acpi_ctxt libxl_ctxt;
@@ -235,6 +255,78 @@ out:
     return rc;
 }
 
+static void *acpi_memalign(struct acpi_ctxt *ctxt, uint32_t size,
+                           uint32_t align)
+{
+    int ret;
+    void *ptr;
+
+    ret = posix_memalign(&ptr, align, size);
+    if (ret != 0 || !ptr)
+        return NULL;
+
+    return ptr;
+}
+
+/*
+ * For HVM, ACPI tables don't need to be built in libxl; hvmloader builds them.
+ * But if an HVM guest has virtual VTD(s), we build the DMAR table for it and
+ * join it with the existing content in acpi_modules, employing the HVM
+ * firmware pass-through mechanism to hand the DMAR table to the guest.
+ */
+static int libxl__dom_load_acpi_hvm(libxl__gc *gc,
+                                    const libxl_domain_build_info *b_info,
+                                    struct xc_dom_image *dom)
+{
+    struct acpi_config config = { 0 };
+    struct acpi_ctxt ctxt;
+    struct acpi_dmar *dmar;
+    uint32_t len;
+
+    ctxt.mem_ops.alloc = acpi_memalign;
+    ctxt.mem_ops.v2p = virt_to_phys;
+    ctxt.mem_ops.free = acpi_mem_free;
+
+    init_acpi_config(gc, dom, b_info, &config);
+    dmar = construct_dmar(&ctxt, &config);
+    if ( !dmar )
+        return ERROR_NOMEM;
+    len = dmar->header.length;
+
+    if (len) {
+        libxl__ptr_add(gc, dmar);
+        if (!dom->acpi_modules[0].data) {
+            dom->acpi_modules[0].data = (void *)dmar;
+            dom->acpi_modules[0].length = len;
+        } else {
+            /* join tables */
+            void *newdata;
+
+            newdata = libxl__malloc(gc, len + dom->acpi_modules[0].length);
+            memcpy(newdata, dom->acpi_modules[0].data,
+                   dom->acpi_modules[0].length);
+            memcpy(newdata + dom->acpi_modules[0].length, dmar, len);
+
+            free(dom->acpi_modules[0].data);
+            dom->acpi_modules[0].data = newdata;
+            dom->acpi_modules[0].length += len;
+        }
+    }
+    return 0;
+}
+
+int libxl__dom_load_acpi(libxl__gc *gc,
+                         const libxl_domain_build_info *b_info,
+                         struct xc_dom_image *dom)
+{
+
+    if (b_info->type == LIBXL_DOMAIN_TYPE_PVH)
+        return libxl__dom_load_acpi_pvh(gc, b_info, dom);
+    else if (b_info->type == LIBXL_DOMAIN_TYPE_HVM)
+        return libxl__dom_load_acpi_hvm(gc, b_info, dom);
+
+    return -EINVAL;
+}
 /*
  * Local variables:
  * mode: C
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 27/28] tools/libxl: create vIOMMU during domain construction
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
                   ` (25 preceding siblings ...)
  2017-11-17  6:22 ` [PATCH v4 26/28] tools/libxl: build DMAR table for a guest with one virtual VTD Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2017-11-17  6:22 ` [PATCH v4 28/28] tools/libxc: Add viommu operations in libxc Chao Gao
  2018-10-04 15:51 ` [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Jan Beulich
  28 siblings, 0 replies; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Tim Deegan,
	Jan Beulich, Chao Gao, Ian Jackson, Roger Pau Monné

If the guest is configured with a vIOMMU, create it during domain construction.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>

---
v4:
 - s/LOGED/LOGD
v3:
 - Remove the process of querying capabilities.
---
 tools/libxl/libxl_x86.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index cb2f494..394c70f 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -343,8 +343,25 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
     if (d_config->b_info.type != LIBXL_DOMAIN_TYPE_PV) {
         unsigned long shadow = DIV_ROUNDUP(d_config->b_info.shadow_memkb,
                                            1024);
+        unsigned int i;
+
         xc_shadow_control(ctx->xch, domid, XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION,
                           NULL, 0, &shadow, 0, NULL);
+
+        for (i = 0; i < d_config->b_info.num_viommus; i++) {
+            uint32_t id;
+            libxl_viommu_info *viommu = &d_config->b_info.viommu[i];
+
+            if (viommu->type == LIBXL_VIOMMU_TYPE_INTEL_VTD) {
+                ret = xc_viommu_create(ctx->xch, domid, VIOMMU_TYPE_INTEL_VTD,
+                                       viommu->base_addr, viommu->cap, &id);
+                if (ret) {
+                    LOGD(ERROR, domid, "create vIOMMU fail (%d)", ret);
+                    ret = ERROR_FAIL;
+                    goto out;
+                }
+            }
+        }
     }
 
     if (d_config->c_info.type == LIBXL_DOMAIN_TYPE_PV &&
-- 
1.8.3.1




* [PATCH v4 28/28] tools/libxc: Add viommu operations in libxc
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
                   ` (26 preceding siblings ...)
  2017-11-17  6:22 ` [PATCH v4 27/28] tools/libxl: create vIOMMU during domain construction Chao Gao
@ 2017-11-17  6:22 ` Chao Gao
  2018-10-04 15:51 ` [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Jan Beulich
  28 siblings, 0 replies; 83+ messages in thread
From: Chao Gao @ 2017-11-17  6:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Tim Deegan,
	Jan Beulich, Andrew Cooper, Chao Gao, Roger Pau Monné

Add a libxc helper for XEN_DOMCTL_viommu_op. For now it has one sub-command:
- create(): create a vIOMMU in Xen, given the vIOMMU type, register-set
            location and capabilities

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
---
v4:
 - remove destroy() sub-command
v3:
 - Remove API for querying viommu capabilities
 - Remove pointless cast
 - Polish commit message
 - Coding style
---
 tools/libxc/Makefile          |  1 +
 tools/libxc/include/xenctrl.h |  3 +++
 tools/libxc/xc_viommu.c       | 51 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 55 insertions(+)
 create mode 100644 tools/libxc/xc_viommu.c

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index 9a019e8..7d8c4b4 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -51,6 +51,7 @@ CTRL_SRCS-$(CONFIG_MiniOS) += xc_minios.c
 CTRL_SRCS-y       += xc_evtchn_compat.c
 CTRL_SRCS-y       += xc_gnttab_compat.c
 CTRL_SRCS-y       += xc_devicemodel_compat.c
+CTRL_SRCS-y       += xc_viommu.c
 
 GUEST_SRCS-y :=
 GUEST_SRCS-y += xg_private.c xc_suspend.c
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 8ade90c..69cf03f 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2537,6 +2537,9 @@ enum xc_static_cpu_featuremask {
 const uint32_t *xc_get_static_cpu_featuremask(enum xc_static_cpu_featuremask);
 const uint32_t *xc_get_feature_deep_deps(uint32_t feature);
 
+int xc_viommu_create(xc_interface *xch, domid_t dom, uint64_t type,
+                     uint64_t base_addr, uint64_t cap, uint32_t *viommu_id);
+
 #endif
 
 int xc_livepatch_upload(xc_interface *xch,
diff --git a/tools/libxc/xc_viommu.c b/tools/libxc/xc_viommu.c
new file mode 100644
index 0000000..a72b2f4
--- /dev/null
+++ b/tools/libxc/xc_viommu.c
@@ -0,0 +1,51 @@
+/*
+ * xc_viommu.c
+ *
+ * viommu related API functions.
+ *
+ * Copyright (C) 2017 Intel Corporation
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License, version 2.1, as published by the Free Software Foundation.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "xc_private.h"
+
+int xc_viommu_create(xc_interface *xch, domid_t dom, uint64_t type,
+                     uint64_t base_addr, uint64_t cap, uint32_t *viommu_id)
+{
+    int rc;
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_viommu_op;
+    domctl.domain = dom;
+    domctl.u.viommu_op.cmd = XEN_DOMCTL_viommu_create;
+    domctl.u.viommu_op.u.create.type = type;
+    domctl.u.viommu_op.u.create.base_address = base_addr;
+    domctl.u.viommu_op.u.create.capabilities = cap;
+
+    rc = do_domctl(xch, &domctl);
+    if ( !rc )
+        *viommu_id = domctl.u.viommu_op.u.create.id;
+
+    return rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.8.3.1




* Re: [PATCH v4 01/28] Xen/doc: Add Xen virtual IOMMU doc
  2017-11-17  6:22 ` [PATCH v4 01/28] Xen/doc: Add Xen virtual IOMMU doc Chao Gao
@ 2018-02-09 12:54   ` Roger Pau Monné
  2018-02-09 15:53     ` Chao Gao
  0 siblings, 1 reply; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-09 12:54 UTC (permalink / raw)
  To: Chao Gao
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Nov 17, 2017 at 02:22:08PM +0800, Chao Gao wrote:
> From: Lan Tianyu <tianyu.lan@intel.com>
> 
> This patch is to add Xen virtual IOMMU doc to introduce motivation,
> framework, vIOMMU hypercall and xl configuration.
> 
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
>  docs/misc/viommu.txt | 120 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 120 insertions(+)
>  create mode 100644 docs/misc/viommu.txt
> 
> diff --git a/docs/misc/viommu.txt b/docs/misc/viommu.txt
> new file mode 100644
> index 0000000..472d2b5
> --- /dev/null
> +++ b/docs/misc/viommu.txt
> @@ -0,0 +1,120 @@
> +Xen virtual IOMMU
> +
> +Motivation
> +==========
> +Enable more than 128 vcpu support
> +
> +The current requirements of HPC cloud service requires VM with a high
> +number of CPUs in order to achieve high performance in parallel
> +computing.
> +
> +To support >128 vcpus, X2APIC mode in guest is necessary because legacy
> +APIC(XAPIC) just supports 8-bit APIC ID. The APIC ID used by Xen is
> +CPU ID * 2 (ie: CPU 127 has APIC ID 254, which is the last one available
> +in xAPIC mode) and so it only can support 128 vcpus at most. x2APIC mode
> +supports 32-bit APIC ID and it requires the interrupt remapping functionality
> +of a vIOMMU if the guest wishes to route interrupts to all available vCPUs
> +
> +PCI MSI/IOAPIC can only send interrupt message containing 8-bit APIC ID,
> +which cannot address cpus with >254 APIC ID. Interrupt remapping supports
> +32-bit APIC ID and so it's necessary for >128 vcpus support.
> +
> +vIOMMU Architecture
> +===================
> +vIOMMU device model is inside Xen hypervisor for following factors
> +    1) Avoid round trips between Qemu and Xen hypervisor
> +    2) Ease of integration with the rest of hypervisor
> +    3) PVH doesn't use Qemu
> +
> +* Interrupt remapping overview.
> +Interrupts from virtual devices and physical devices are delivered
> +to vLAPIC from vIOAPIC and vMSI. vIOMMU needs to remap interrupt during
> +this procedure.
> +
> ++---------------------------------------------------+
> +|Qemu                       |VM                     |
> +|                           | +----------------+    |
> +|                           | |  Device driver |    |
> +|                           | +--------+-------+    |
> +|                           |          ^            |
> +|       +----------------+  | +--------+-------+    |
> +|       | Virtual device |  | |  IRQ subsystem |    |
> +|       +-------+--------+  | +--------+-------+    |
> +|               |           |          ^            |
> +|               |           |          |            |
> ++---------------------------+-----------------------+
> +|hypervisor     |                      | VIRQ       |
> +|               |            +---------+--------+   |
> +|               |            |      vLAPIC      |   |
> +|               |VIRQ        +---------+--------+   |
> +|               |                      ^            |
> +|               |                      |            |
> +|               |            +---------+--------+   |
> +|               |            |      vIOMMU      |   |
> +|               |            +---------+--------+   |
> +|               |                      ^            |
> +|               |                      |            |
> +|               |            +---------+--------+   |
> +|               |            |   vIOAPIC/vMSI   |   |
> +|               |            +----+----+--------+   |
> +|               |                 ^    ^            |
> +|               +-----------------+    |            |
> +|                                      |            |
> ++---------------------------------------------------+
> +HW                                     |IRQ
> +                                +-------------------+
> +                                |   PCI Device      |
> +                                +-------------------+
> +
> +
> +vIOMMU hypercall
> +================
> +Introduce a new domctl hypercall "xen_domctl_viommu_op" to create
> +vIOMMUs instance in hypervisor. vIOMMU instance will be destroyed
> +during destroying domain.
> +
> +* vIOMMU hypercall parameter structure
> +
> +/* vIOMMU type - specify vendor vIOMMU device model */
> +#define VIOMMU_TYPE_INTEL_VTD	       0
> +
> +/* vIOMMU capabilities */
> +#define VIOMMU_CAP_IRQ_REMAPPING  (1u << 0)
> +
> +struct xen_domctl_viommu_op {
> +    uint32_t cmd;
> +#define XEN_DOMCTL_viommu_create          0
> +    union {
> +        struct {
> +            /* IN - vIOMMU type  */
> +            uint8_t type;
> +            /* IN - MMIO base address of vIOMMU. */
> +            uint64_t base_address;
> +            /* IN - Capabilities with which we want to create */
> +            uint64_t capabilities;
> +            /* OUT - vIOMMU identity */
> +            uint32_t id;
> +        } create;
> +    } u;
> +};
> +
> +- XEN_DOMCTL_create_viommu
> +    Create vIOMMU device with type, capabilities and MMIO base address.
> +Hypervisor allocates viommu_id for new vIOMMU instance and return back.
> +The vIOMMU device model in hypervisor should check whether it can
> +support the input capabilities and return error if not.
> +
> +The vIOMMU domctl and the vIOMMU option in the configuration file allow for
> +multi-vIOMMU support in a single VM (e.g. the create operation returns a
> +vIOMMU id), but the implementation only supports one vIOMMU per VM so far.
> +
> +xl x86 vIOMMU configuration
> +============================
> +viommu = [
> +    'type=intel_vtd,intremap=1',
> +    ...
> +]
> +
> +"type" - Specify vIOMMU device model type. Currently only supports Intel vtd
> +device model.

Although I see the point in being able to specify the vIOMMU type, is
this really helpful from an admin PoV?

What would happen for example if you try to add an Intel vIOMMU to a
guest running on an AMD CPU? I guess the guest OSes would be quite
surprised about that...

I think the most common way to use this option would be:

viommu = [
    'intremap=1',
    ...
]

And vIOMMUs should automatically be added to guests with > 128 vCPUs?
IIRC Linux requires a vIOMMU in order to run with > 128 vCPUs (which
is quite arbitrary, but anyway...).

Thanks, Roger.



* Re: [PATCH v4 02/28] VIOMMU: Add vIOMMU framework and vIOMMU domctl
  2017-11-17  6:22 ` [PATCH v4 02/28] VIOMMU: Add vIOMMU framework and vIOMMU domctl Chao Gao
@ 2018-02-09 14:33   ` Roger Pau Monné
  2018-02-09 16:13     ` Chao Gao
  0 siblings, 1 reply; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-09 14:33 UTC (permalink / raw)
  To: Chao Gao
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Nov 17, 2017 at 02:22:09PM +0800, Chao Gao wrote:
> From: Lan Tianyu <tianyu.lan@intel.com>
> 
> This patch is to introduce an abstract layer for arch vIOMMU implementation
> and vIOMMU domctl to deal with requests from tool stack. Arch vIOMMU code needs to
> provide callback. vIOMMU domctl supports to create vIOMMU instance in hypervisor
> and it will be destroyed during destroying domain.
> 
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> v4:
>  - introduce REGISTER_VIOMMU() to register viommu types and ops.
>  - remove unneeded domctl interface to destroy viommu.
> ---
>  docs/misc/xen-command-line.markdown |   7 ++
>  xen/arch/x86/Kconfig                |   1 +
>  xen/arch/x86/hvm/hvm.c              |   3 +
>  xen/arch/x86/xen.lds.S              |   3 +
>  xen/common/Kconfig                  |   3 +
>  xen/common/Makefile                 |   1 +
>  xen/common/domctl.c                 |   7 ++
>  xen/common/viommu.c                 | 125 ++++++++++++++++++++++++++++++++++++
>  xen/include/asm-x86/hvm/domain.h    |   3 +
>  xen/include/public/domctl.h         |  31 +++++++++
>  xen/include/xen/viommu.h            |  69 ++++++++++++++++++++
>  11 files changed, 253 insertions(+)
>  create mode 100644 xen/common/viommu.c
>  create mode 100644 xen/include/xen/viommu.h
> 
> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> index eb4995e..d097382 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1836,3 +1836,10 @@ mode.
>  > Default: `true`
>  
>  Permit use of the `xsave/xrstor` instructions.
> +
> +### viommu
> +> `= <boolean>`
> +
> +> Default: `false`
> +
> +Permit use of viommu interface to create and destroy viommu device model.

I'm not sure about the point of having this command line option, this
is a guest feature and just setting it from the config file should be
enough IMHO.

> diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
> index 64955dc..df254e4 100644
> --- a/xen/arch/x86/Kconfig
> +++ b/xen/arch/x86/Kconfig
> @@ -25,6 +25,7 @@ config X86
>  	select HAS_UBSAN
>  	select NUMA
>  	select VGA
> +	select VIOMMU
>  
>  config ARCH_DEFCONFIG
>  	string
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 205b4cb..964418a 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -36,6 +36,7 @@
>  #include <xen/rangeset.h>
>  #include <xen/monitor.h>
>  #include <xen/warning.h>
> +#include <xen/viommu.h>
>  #include <asm/shadow.h>
>  #include <asm/hap.h>
>  #include <asm/current.h>
> @@ -693,6 +694,8 @@ void hvm_domain_relinquish_resources(struct domain *d)
>          pmtimer_deinit(d);
>          hpet_deinit(d);
>      }
> +
> +    viommu_destroy_domain(d);

This returns a value, but you ignore it (read below for how I think
this should be solved).

>  }
>  
>  void hvm_domain_destroy(struct domain *d)
> diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
> index d5e8821..7f8d2b8 100644
> --- a/xen/arch/x86/xen.lds.S
> +++ b/xen/arch/x86/xen.lds.S
> @@ -231,6 +231,9 @@ SECTIONS
>         __start_schedulers_array = .;
>         *(.data.schedulers)
>         __end_schedulers_array = .;
> +       __start_viommus_array = .;
> +       *(.data.viommus)
> +       __end_viommus_array = .;

This should be protected with #ifdef CONFIG_VIOMMU. And please place
it at the end of the rodata section (AFAICT there's no need for it to
be in the data.read_mostly section).

>    } :text
>  
>    .data : {                    /* Data */
> diff --git a/xen/common/Kconfig b/xen/common/Kconfig
> index 103ef44..62aaa76 100644
> --- a/xen/common/Kconfig
> +++ b/xen/common/Kconfig
> @@ -52,6 +52,9 @@ config HAS_CHECKPOLICY
>  	string
>  	option env="XEN_HAS_CHECKPOLICY"
>  
> +config VIOMMU
> +	bool
> +
>  config KEXEC
>  	bool "kexec support"
>  	default y
> diff --git a/xen/common/Makefile b/xen/common/Makefile
> index 66cc2c8..182b3ac 100644
> --- a/xen/common/Makefile
> +++ b/xen/common/Makefile
> @@ -56,6 +56,7 @@ obj-y += time.o
>  obj-y += timer.o
>  obj-y += trace.o
>  obj-y += version.o
> +obj-$(CONFIG_VIOMMU) += viommu.o
>  obj-y += virtual_region.o
>  obj-y += vm_event.o
>  obj-y += vmap.o
> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> index 3c6fa4e..9c5651d 100644
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -25,6 +25,7 @@
>  #include <xen/paging.h>
>  #include <xen/hypercall.h>
>  #include <xen/vm_event.h>
> +#include <xen/viommu.h>
>  #include <xen/monitor.h>
>  #include <asm/current.h>
>  #include <asm/irq.h>
> @@ -1155,6 +1156,12 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>                                       op->u.set_gnttab_limits.maptrack_frames);
>          break;
>  
> +    case XEN_DOMCTL_viommu_op:
> +        ret = viommu_domctl(d, &op->u.viommu_op);
> +        if ( !ret )
> +            copyback = 1;
> +        break;
> +
>      default:
>          ret = arch_do_domctl(op, d, u_domctl);
>          break;
> diff --git a/xen/common/viommu.c b/xen/common/viommu.c
> new file mode 100644
> index 0000000..fd8b7fd
> --- /dev/null
> +++ b/xen/common/viommu.c
> @@ -0,0 +1,125 @@
> +/*
> + * common/viommu.c
> + *
> + * Copyright (c) 2017 Intel Corporation
> + * Author: Lan Tianyu <tianyu.lan@intel.com>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/sched.h>
> +#include <xen/spinlock.h>
> +#include <xen/types.h>
> +#include <xen/viommu.h>
> +
> +extern const struct viommu_ops *__start_viommus_array[], *__end_viommus_array[];
> +#define NUM_VIOMMU_TYPE (__end_viommus_array - __start_viommus_array)
> +#define viommu_type_array __start_viommus_array
> +
> +int viommu_destroy_domain(struct domain *d)

IMHO I would rather prefer the destroy operation to not be allowed to
fail, that would allow switching it return value to void.

The more that you simply ignore the return value in
hvm_domain_relinquish_resources.

> +{
> +    struct viommu *viommu = d->arch.hvm_domain.viommu;
> +    int ret;
> +
> +    if ( !viommu )
> +        return -ENODEV;
> +
> +    ret = viommu->ops->destroy(viommu);
> +    if ( ret < 0 )
> +        return ret;
> +
> +    xfree(viommu);
> +    d->arch.hvm_domain.viommu = NULL;
> +
> +    return 0;
> +}
> +
> +static const struct viommu_ops *viommu_get_ops(uint8_t type)
> +{
> +    int i;

unsigned int.

> +
> +    for ( i = 0; i < NUM_VIOMMU_TYPE; i++)
> +    {
> +        if ( viommu_type_array[i]->type == type )
> +            return viommu_type_array[i];
> +    }
> +
> +    return NULL;
> +}
> +
> +static int viommu_create(struct domain *d, uint8_t type,
> +                         uint64_t base_address, uint64_t caps,
> +                         uint32_t *viommu_id)
> +{
> +    struct viommu *viommu;
> +    const struct viommu_ops *viommu_ops = NULL;

You can initialize viommu_ops here directly:

const struct viommu_ops *viommu_ops = viommu_get_ops(type);

> +    int rc;
> +
> +    /* Only support one vIOMMU per domain. */
> +    if ( d->arch.hvm_domain.viommu )
> +        return -E2BIG;

EEXIST is maybe better?

> +
> +    viommu_ops = viommu_get_ops(type);
> +    if ( !viommu_ops )
> +        return -EINVAL;
> +
> +    ASSERT(viommu_ops->create);
> +
> +    viommu = xzalloc(struct viommu);
> +    if ( !viommu )
> +        return -ENOMEM;
> +
> +    viommu->base_address = base_address;
> +    viommu->caps = caps;
> +    viommu->ops = viommu_ops;
> +
> +    rc = viommu_ops->create(d, viommu);
> +    if ( rc < 0 )
> +    {
> +        xfree(viommu);
> +        return rc;
> +    }
> +
> +    d->arch.hvm_domain.viommu = viommu;
> +
> +    /* Only support one vIOMMU per domain. */
> +    *viommu_id = 0;
> +    return 0;
> +}
> +
> +int viommu_domctl(struct domain *d, struct xen_domctl_viommu_op *op)
> +{
> +    int rc;
> +
> +    switch ( op->cmd )
> +    {
> +    case XEN_DOMCTL_viommu_create:
> +        rc = viommu_create(d, op->u.create.type, op->u.create.base_address,
> +                           op->u.create.capabilities, &op->u.create.id);
> +        break;

Newline.

> +    default:
> +        return -ENOSYS;

-EOPNOTSUPP

> +    }
> +
> +    return rc;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
> index 7f128c0..fcd3482 100644
> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
> @@ -21,6 +21,7 @@
>  #define __ASM_X86_HVM_DOMAIN_H__
>  
>  #include <xen/iommu.h>
> +#include <xen/viommu.h>
>  #include <asm/hvm/irq.h>
>  #include <asm/hvm/vpt.h>
>  #include <asm/hvm/vlapic.h>
> @@ -196,6 +197,8 @@ struct hvm_domain {
>          struct vmx_domain vmx;
>          struct svm_domain svm;
>      };
> +
> +    struct viommu *viommu;

Are you sure this will compile if you don't select CONFIG_VIOMMU?

AFAICT struct viommu is only defined if VIOMMU is selected, so you
should add something like...

> diff --git a/xen/include/xen/viommu.h b/xen/include/xen/viommu.h
> new file mode 100644
> index 0000000..a859d80
> --- /dev/null
> +++ b/xen/include/xen/viommu.h
> @@ -0,0 +1,69 @@
> +/*
> + * include/xen/viommu.h
> + *
> + * Copyright (c) 2017, Intel Corporation
> + * Author: Lan Tianyu <tianyu.lan@intel.com>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; If not, see <http://www.gnu.org/licenses/>.
> + *
> + */
> +#ifndef __XEN_VIOMMU_H__
> +#define __XEN_VIOMMU_H__
> +
> +#ifdef CONFIG_VIOMMU
> +
> +struct viommu;
> +
> +struct viommu_ops {
> +    uint8_t type;
> +    int (*create)(struct domain *d, struct viommu *viommu);
> +    int (*destroy)(struct viommu *viommu);
> +};
> +
> +struct viommu {
> +    uint64_t base_address;
> +    uint64_t caps;
> +    const struct viommu_ops *ops;
> +    void *priv;
> +};
> +
> +#define REGISTER_VIOMMU(x) static const struct viommu_ops *x##_entry \
> +  __used_section(".data.viommus") = &x;
> +
> +
> +int viommu_register_type(uint8_t type, struct viommu_ops *ops);
> +int viommu_destroy_domain(struct domain *d);
> +int viommu_domctl(struct domain *d, struct xen_domctl_viommu_op *op);
> +#else

...

struct viommu {
};

here.

> +static inline int viommu_destroy_domain(struct domain *d)
> +{
> +    return -EINVAL;

ENODEV if you really have to return an error here, note that I think
destroy should not return anything.

> +}
> +static inline
> +int viommu_domctl(struct domain *d, struct xen_domctl_viommu_op *op)

The style should be:

static inline int viommu_domctl(struct domain *d,
                                struct xen_domctl_viommu_op *op)
{
    ...

> +{
> +    return false;

Urg, no please. This should be -EOPNOTSUP.

Thanks, Roger.



* Re: [PATCH v4 03/28] VIOMMU: Add irq request callback to deal with irq remapping
  2017-11-17  6:22 ` [PATCH v4 03/28] VIOMMU: Add irq request callback to deal with irq remapping Chao Gao
@ 2018-02-09 15:02   ` Roger Pau Monné
  2018-02-09 16:21     ` Chao Gao
  0 siblings, 1 reply; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-09 15:02 UTC (permalink / raw)
  To: Chao Gao
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Nov 17, 2017 at 02:22:10PM +0800, Chao Gao wrote:
> From: Lan Tianyu <tianyu.lan@intel.com>
> 
> This patch is to add irq request callback for platform implementation
> to deal with irq remapping request.
> 
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
>  xen/common/viommu.c          | 15 ++++++++++++
>  xen/include/asm-x86/viommu.h | 54 ++++++++++++++++++++++++++++++++++++++++++++
>  xen/include/xen/viommu.h     |  6 +++++
>  3 files changed, 75 insertions(+)
>  create mode 100644 xen/include/asm-x86/viommu.h
> 
> diff --git a/xen/common/viommu.c b/xen/common/viommu.c
> index fd8b7fd..53d4b70 100644
> --- a/xen/common/viommu.c
> +++ b/xen/common/viommu.c
> @@ -114,6 +114,21 @@ int viommu_domctl(struct domain *d, struct xen_domctl_viommu_op *op)
>      return rc;
>  }
>  
> +int viommu_handle_irq_request(const struct domain *d,
> +                              const struct arch_irq_remapping_request *request)
> +{
> +    struct viommu *viommu = d->arch.hvm_domain.viommu;
> +
> +    if ( !viommu )
> +        return -ENODEV;
> +
> +    ASSERT(viommu->ops);
> +    if ( !viommu->ops->handle_irq_request )
> +        return -EINVAL;

EOPNOTSUPP? EINVAL seems like something the handler itself should
return when processing the inputs.

> +
> +    return viommu->ops->handle_irq_request(d, request);
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/include/asm-x86/viommu.h b/xen/include/asm-x86/viommu.h
> new file mode 100644
> index 0000000..01ec80e
> --- /dev/null
> +++ b/xen/include/asm-x86/viommu.h
> @@ -0,0 +1,54 @@
> +/*
> + * include/asm-x86/viommu.h
> + *
> + * Copyright (c) 2017 Intel Corporation.
> + * Author: Lan Tianyu <tianyu.lan@intel.com>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; If not, see <http://www.gnu.org/licenses/>.
> + *
> + */
> +#ifndef __ARCH_X86_VIOMMU_H__
> +#define __ARCH_X86_VIOMMU_H__
> +
> +/* IRQ request type */
> +enum viommu_irq_request_type {
> +    VIOMMU_REQUEST_IRQ_MSI = 0,
> +    VIOMMU_REQUEST_IRQ_APIC = 1
> +};
> +
> +struct arch_irq_remapping_request
> +{
> +    union {
> +        /* MSI */
> +        struct {
> +            uint64_t addr;
> +            uint32_t data;
> +        } msi;
> +        /* Redirection Entry in IOAPIC */
> +        uint64_t rte;
> +    } msg;

Do you really need the msg name here? IIRC we support anonymous
unions for non-public structures.
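
For reference, the anonymous-union form Roger suggests would read roughly as below (a sketch assuming the C11/gcc anonymous-union support used elsewhere in Xen; members are then reached as req.msi.addr / req.rte rather than req.msg.msi.addr).

```c
#include <assert.h>
#include <stdint.h>

struct arch_irq_remapping_request {
    union {
        /* MSI */
        struct {
            uint64_t addr;
            uint32_t data;
        } msi;
        /* Redirection Entry in IOAPIC */
        uint64_t rte;
    };                      /* anonymous: no "msg" tag needed */
    uint16_t source_id;
};
```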

> +    uint16_t source_id;
> +    enum viommu_irq_request_type type;
> +};

This structure looks fine, but it would be more helpful to introduce
the struct together with the device specific handle_irq_request
function, or else the fields inside of this struct are not relevant.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 04/28] VIOMMU: Add get irq info callback to convert irq remapping request
  2017-11-17  6:22 ` [PATCH v4 04/28] VIOMMU: Add get irq info callback to convert irq remapping request Chao Gao
@ 2018-02-09 15:06   ` Roger Pau Monné
  2018-02-09 16:34     ` Chao Gao
  0 siblings, 1 reply; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-09 15:06 UTC (permalink / raw)
  To: Chao Gao
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Nov 17, 2017 at 02:22:11PM +0800, Chao Gao wrote:
> From: Lan Tianyu <tianyu.lan@intel.com>
> 
> This patch is to add get_irq_info callback for platform implementation
> to convert irq remapping request to irq info (e.g. vector, dest, dest_mode
> and so on).
> 
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
>  xen/common/viommu.c          | 16 ++++++++++++++++
>  xen/include/asm-x86/viommu.h |  8 ++++++++
>  xen/include/xen/viommu.h     |  6 ++++++
>  3 files changed, 30 insertions(+)
> 
> diff --git a/xen/common/viommu.c b/xen/common/viommu.c
> index 53d4b70..9eafdef 100644
> --- a/xen/common/viommu.c
> +++ b/xen/common/viommu.c
> @@ -129,6 +129,22 @@ int viommu_handle_irq_request(const struct domain *d,
>      return viommu->ops->handle_irq_request(d, request);
>  }
>  
> +int viommu_get_irq_info(const struct domain *d,
> +                        const struct arch_irq_remapping_request *request,
> +                        struct arch_irq_remapping_info *irq_info)
> +{
> +    const struct viommu *viommu = d->arch.hvm_domain.viommu;
> +
> +    if ( !viommu )
> +        return -EINVAL;
> +
> +    ASSERT(viommu->ops);
> +    if ( !viommu->ops->get_irq_info )
> +        return -EINVAL;

EOPNOTSUPP.

> +
> +    return viommu->ops->get_irq_info(d, request, irq_info);
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/include/asm-x86/viommu.h b/xen/include/asm-x86/viommu.h
> index 01ec80e..3d995ba 100644
> --- a/xen/include/asm-x86/viommu.h
> +++ b/xen/include/asm-x86/viommu.h
> @@ -26,6 +26,14 @@ enum viommu_irq_request_type {
>      VIOMMU_REQUEST_IRQ_APIC = 1
>  };
>  
> +struct arch_irq_remapping_info
> +{
> +    uint8_t dest_mode:1;
> +    uint8_t delivery_mode:3;
> +    uint8_t  vector;
              ^ double space.

> +    uint32_t dest;
> +};

The same issue again, introducing this structure without the code in
get_irq_info makes it impossible to review IMHO.

Also this should be introduced below the arch_irq_remapping_request
struct.

> +
>  struct arch_irq_remapping_request
>  {
>      union {
> diff --git a/xen/include/xen/viommu.h b/xen/include/xen/viommu.h
> index 67e25d5..73b853f 100644
> --- a/xen/include/xen/viommu.h
> +++ b/xen/include/xen/viommu.h
> @@ -32,6 +32,9 @@ struct viommu_ops {
>      int (*destroy)(struct viommu *viommu);
>      int (*handle_irq_request)(const struct domain *d,
>                                const struct arch_irq_remapping_request *request);
> +    int (*get_irq_info)(const struct domain *d,
> +                        const struct arch_irq_remapping_request *request,
> +                        struct arch_irq_remapping_info *info);
>  };
>  
>  struct viommu {
> @@ -50,6 +53,9 @@ int viommu_destroy_domain(struct domain *d);
>  int viommu_domctl(struct domain *d, struct xen_domctl_viommu_op *op);
>  int viommu_handle_irq_request(const struct domain *d,
>                                const struct arch_irq_remapping_request *request);
> +int viommu_get_irq_info(const struct domain *d,
> +                        const struct arch_irq_remapping_request *request,

Why do you need 'request' here?

> +                        struct arch_irq_remapping_info *irq_info);
>  #else
>  static inline int viommu_destroy_domain(struct domain *d)
>  {
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 05/28] VIOMMU: Introduce callback of checking irq remapping mode
  2017-11-17  6:22 ` [PATCH v4 05/28] VIOMMU: Introduce callback of checking irq remapping mode Chao Gao
@ 2018-02-09 15:11   ` Roger Pau Monné
  2018-02-09 16:47     ` Chao Gao
  0 siblings, 1 reply; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-09 15:11 UTC (permalink / raw)
  To: Chao Gao
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Nov 17, 2017 at 02:22:12PM +0800, Chao Gao wrote:
> From: Lan Tianyu <tianyu.lan@intel.com>
> 
> This patch is to add callback for vIOAPIC and vMSI to check whether interrupt
> remapping is enabled.

Same as with the previous patches, not adding the actual code in
check_irq_remapping makes reviewing this impossible.

> 
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
>  xen/common/viommu.c      | 15 +++++++++++++++
>  xen/include/xen/viommu.h |  4 ++++
>  2 files changed, 19 insertions(+)
> 
> diff --git a/xen/common/viommu.c b/xen/common/viommu.c
> index 9eafdef..72173c3 100644
> --- a/xen/common/viommu.c
> +++ b/xen/common/viommu.c
> @@ -145,6 +145,21 @@ int viommu_get_irq_info(const struct domain *d,
>      return viommu->ops->get_irq_info(d, request, irq_info);
>  }
>  
> +bool viommu_check_irq_remapping(const struct domain *d,
> +                                const struct arch_irq_remapping_request *request)
> +{
> +    const struct viommu *viommu = d->arch.hvm_domain.viommu;
> +
> +    if ( !viommu )
> +        return false;
> +
> +    ASSERT(viommu->ops);
> +    if ( !viommu->ops->check_irq_remapping )
> +        return false;
> +
> +    return viommu->ops->check_irq_remapping(d, request);
> +}

Having a helper for each functionality you want to support seems
extremely cumbersome; I would imagine this growing so that you will also
have viommu_check_mem_mapping and others.

Isn't it better to just have something like viommu_check_feature, or
even just expose a features field in the viommu struct itself?
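
A features-field variant of that idea might look like the sketch below (purely illustrative; the flag names and field layout are made up for the example, not taken from the series).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIOMMU_FEAT_IRQ_REMAPPING   (1u << 0)
#define VIOMMU_FEAT_MEM_MAPPING     (1u << 1)   /* hypothetical future flag */

struct viommu {
    uint32_t features;  /* filled in by the vendor implementation at create */
};

/* One generic query instead of a per-capability helper. */
static bool viommu_has_feature(const struct viommu *viommu, uint32_t feat)
{
    return viommu && (viommu->features & feat);
}
```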

> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/include/xen/viommu.h b/xen/include/xen/viommu.h
> index 73b853f..c1dfaec 100644
> --- a/xen/include/xen/viommu.h
> +++ b/xen/include/xen/viommu.h
> @@ -29,6 +29,8 @@ struct viommu;
>  struct viommu_ops {
>      uint8_t type;
>      int (*create)(struct domain *d, struct viommu *viommu);
> +    bool (*check_irq_remapping)(const struct domain *d,
> +                                const struct arch_irq_remapping_request *request);

Why add it here, instead of at the end of the struct?

>      int (*destroy)(struct viommu *viommu);
>      int (*handle_irq_request)(const struct domain *d,
>                                const struct arch_irq_remapping_request *request);
> @@ -56,6 +58,8 @@ int viommu_handle_irq_request(const struct domain *d,
>  int viommu_get_irq_info(const struct domain *d,
>                          const struct arch_irq_remapping_request *request,
>                          struct arch_irq_remapping_info *irq_info);
> +bool viommu_check_irq_remapping(const struct domain *d,
> +                                const struct arch_irq_remapping_request *request);
>  #else
>  static inline int viommu_destroy_domain(struct domain *d)
>  {
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 06/28] vtd: clean-up and preparation for vvtd
  2017-11-17  6:22 ` [PATCH v4 06/28] vtd: clean-up and preparation for vvtd Chao Gao
@ 2018-02-09 15:17   ` Roger Pau Monné
  2018-02-09 16:51     ` Chao Gao
  0 siblings, 1 reply; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-09 15:17 UTC (permalink / raw)
  To: Chao Gao
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Nov 17, 2017 at 02:22:13PM +0800, Chao Gao wrote:
> This patch contains following changes:
> - align register definitions
> - use MASK_EXTR to define some macros for extended capabilities
> rather than open-coding the masks
> - define fields of FECTL and FESTS as uint32_t rather than u64 since
> FECTL and FESTS are 32 bit registers.
> 
> No functional changes.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Just one nit...

> 
> ---
> v4:
>  - Only fix the alignment and defer introducing new definition to when
>  they are needed
>  (Suggested-by Roger Pau Monné)
>  - remove parts of open-coded masks
> v3:
>  - new
> ---
>  xen/drivers/passthrough/vtd/iommu.h | 86 +++++++++++++++++++++----------------
>  1 file changed, 48 insertions(+), 38 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
> index 72c1a2e..db80b31 100644
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> +#define DMA_ECAP_SNP_CTL        ((uint64_t)1 << 7)
> +#define DMA_ECAP_PASS_THRU      ((uint64_t)1 << 6)
> +#define DMA_ECAP_CACHE_HINTS    ((uint64_t)1 << 5)
> +#define DMA_ECAP_EIM            ((uint64_t)1 << 4)
> +#define DMA_ECAP_INTR_REMAP     ((uint64_t)1 << 3)
> +#define DMA_ECAP_DEV_IOTLB      ((uint64_t)1 << 2)
> +#define DMA_ECAP_QUEUED_INVAL   ((uint64_t)1 << 1)
> +#define DMA_ECAP_COHERENT       ((uint64_t)1 << 0)

I think the general practice is to use 1UL (because it's shorter), or
1U for 32bits.
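
Concretely, the shorter spelling would be (a sketch; 1UL is 64 bits on the LP64 targets Xen's x86-64 build uses, which is why it can replace the (uint64_t)1 cast here):

```c
#include <assert.h>
#include <stdint.h>

/* Same bit definitions, shorter spelling (long is 64-bit on x86-64 Xen). */
#define DMA_ECAP_SNP_CTL        (1UL << 7)
#define DMA_ECAP_PASS_THRU      (1UL << 6)
#define DMA_ECAP_INTR_REMAP     (1UL << 3)
#define DMA_ECAP_COHERENT       (1UL << 0)
```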

Thanks, Roger.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 01/28] Xen/doc: Add Xen virtual IOMMU doc
  2018-02-09 12:54   ` Roger Pau Monné
@ 2018-02-09 15:53     ` Chao Gao
  0 siblings, 0 replies; 83+ messages in thread
From: Chao Gao @ 2018-02-09 15:53 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Feb 09, 2018 at 12:54:11PM +0000, Roger Pau Monné wrote:
>On Fri, Nov 17, 2017 at 02:22:08PM +0800, Chao Gao wrote:
>> From: Lan Tianyu <tianyu.lan@intel.com>
>> 
>> This patch is to add Xen virtual IOMMU doc to introduce motivation,
>> framework, vIOMMU hypercall and xl configuration.
>> 
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> ---
>>  docs/misc/viommu.txt | 120 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 120 insertions(+)
>>  create mode 100644 docs/misc/viommu.txt
>> 
>> diff --git a/docs/misc/viommu.txt b/docs/misc/viommu.txt
>> new file mode 100644
>> index 0000000..472d2b5
>> --- /dev/null
>> +++ b/docs/misc/viommu.txt
>> @@ -0,0 +1,120 @@
>> +Xen virtual IOMMU
>> +
>> +Motivation
>> +==========
>> +Enable more than 128 vcpu support
>> +
>> +The current requirements of HPC cloud service requires VM with a high
>> +number of CPUs in order to achieve high performance in parallel
>> +computing.
>> +
>> +To support >128 vcpus, X2APIC mode in guest is necessary because legacy
>> +APIC(XAPIC) just supports 8-bit APIC ID. The APIC ID used by Xen is
>> +CPU ID * 2 (ie: CPU 127 has APIC ID 254, which is the last one available
>> +in xAPIC mode) and so it only can support 128 vcpus at most. x2APIC mode
>> +supports 32-bit APIC ID and it requires the interrupt remapping functionality
>> +of a vIOMMU if the guest wishes to route interrupts to all available vCPUs
>> +
>> +PCI MSI/IOAPIC can only send interrupt message containing 8-bit APIC ID,
>> +which cannot address cpus with >254 APIC ID. Interrupt remapping supports
>> +32-bit APIC ID and so it's necessary for >128 vcpus support.
>> +
>> +vIOMMU Architecture
>> +===================
>> +vIOMMU device model is inside Xen hypervisor for following factors
>> +    1) Avoid round trips between Qemu and Xen hypervisor
>> +    2) Ease of integration with the rest of hypervisor
>> +    3) PVH doesn't use Qemu
>> +
>> +* Interrupt remapping overview.
>> +Interrupts from virtual devices and physical devices are delivered
>> +to vLAPIC from vIOAPIC and vMSI. vIOMMU needs to remap interrupt during
>> +this procedure.
>> +
>> ++---------------------------------------------------+
>> +|Qemu                       |VM                     |
>> +|                           | +----------------+    |
>> +|                           | |  Device driver |    |
>> +|                           | +--------+-------+    |
>> +|                           |          ^            |
>> +|       +----------------+  | +--------+-------+    |
>> +|       | Virtual device |  | |  IRQ subsystem |    |
>> +|       +-------+--------+  | +--------+-------+    |
>> +|               |           |          ^            |
>> +|               |           |          |            |
>> ++---------------------------+-----------------------+
>> +|hypervisor     |                      | VIRQ       |
>> +|               |            +---------+--------+   |
>> +|               |            |      vLAPIC      |   |
>> +|               |VIRQ        +---------+--------+   |
>> +|               |                      ^            |
>> +|               |                      |            |
>> +|               |            +---------+--------+   |
>> +|               |            |      vIOMMU      |   |
>> +|               |            +---------+--------+   |
>> +|               |                      ^            |
>> +|               |                      |            |
>> +|               |            +---------+--------+   |
>> +|               |            |   vIOAPIC/vMSI   |   |
>> +|               |            +----+----+--------+   |
>> +|               |                 ^    ^            |
>> +|               +-----------------+    |            |
>> +|                                      |            |
>> ++---------------------------------------------------+
>> +HW                                     |IRQ
>> +                                +-------------------+
>> +                                |   PCI Device      |
>> +                                +-------------------+
>> +
>> +
>> +vIOMMU hypercall
>> +================
>> +Introduce a new domctl hypercall "xen_domctl_viommu_op" to create
>> +vIOMMUs instance in hypervisor. vIOMMU instance will be destroyed
>> +during destroying domain.
>> +
>> +* vIOMMU hypercall parameter structure
>> +
>> +/* vIOMMU type - specify vendor vIOMMU device model */
>> +#define VIOMMU_TYPE_INTEL_VTD	       0
>> +
>> +/* vIOMMU capabilities */
>> +#define VIOMMU_CAP_IRQ_REMAPPING  (1u << 0)
>> +
>> +struct xen_domctl_viommu_op {
>> +    uint32_t cmd;
>> +#define XEN_DOMCTL_viommu_create          0
>> +    union {
>> +        struct {
>> +            /* IN - vIOMMU type  */
>> +            uint8_t type;
>> +            /* IN - MMIO base address of vIOMMU. */
>> +            uint64_t base_address;
>> +            /* IN - Capabilities with which we want to create */
>> +            uint64_t capabilities;
>> +            /* OUT - vIOMMU identity */
>> +            uint32_t id;
>> +        } create;
>> +    } u;
>> +};
>> +
>> +- XEN_DOMCTL_create_viommu
>> +    Create vIOMMU device with type, capabilities and MMIO base address.
>> +Hypervisor allocates viommu_id for new vIOMMU instance and return back.
>> +The vIOMMU device model in hypervisor should check whether it can
>> +support the input capabilities and return error if not.
>> +
>> +vIOMMU domctl and vIOMMU option in configure file consider multi-vIOMMU
>> +support for single VM.(e.g, parameters of create vIOMMU includes vIOMMU id).
>> +But function implementation only supports one vIOMMU per VM so far.
>> +
>> +xl x86 vIOMMU configuration"
>> +============================
>> +viommu = [
>> +    'type=intel_vtd,intremap=1',
>> +    ...
>> +]
>> +
>> +"type" - Specify vIOMMU device model type. Currently only supports Intel vtd
>> +device model.
>
>Although I see the point in being able to specify the vIOMMU type, is
>this really helpful from an admin PoV?
>
>What would happen for example if you try to add an Intel vIOMMU to a
>guest running on an AMD CPU? I guess the guest OSes would be quite
>surprised about that...
>
>I think the most common way to use this option would be:
>
>viommu = [
>    'intremap=1',
>    ...
>]

Agreed.

>
>And vIOMMUs should automatically be added to guests with > 128 vCPUs?
>IIRC Linux requires a vIOMMU in order to run with > 128 vCPUs (which
>is quite arbitrary, but anyway...).

I think Linux will only use 128 CPUs in this case on bare metal.
Considering a well-behaved VM shouldn't have such a weird configuration
-- more than 128 vcpus but no viommu -- adding vIOMMUs automatically
when needed is fine with me.

Thanks
Chao

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 02/28] VIOMMU: Add vIOMMU framework and vIOMMU domctl
  2018-02-09 14:33   ` Roger Pau Monné
@ 2018-02-09 16:13     ` Chao Gao
  0 siblings, 0 replies; 83+ messages in thread
From: Chao Gao @ 2018-02-09 16:13 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Feb 09, 2018 at 02:33:57PM +0000, Roger Pau Monné wrote:
>On Fri, Nov 17, 2017 at 02:22:09PM +0800, Chao Gao wrote:
>> From: Lan Tianyu <tianyu.lan@intel.com>
>> 
>> This patch is to introduce an abstract layer for arch vIOMMU implementation
>> and vIOMMU domctl to deal with requests from tool stack. Arch vIOMMU code needs to
>> provide callback. vIOMMU domctl supports to create vIOMMU instance in hypervisor
>> and it will be destroyed during destroying domain.
>> 
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> ---
>> v4:
>>  - introduce REGISTER_VIOMMU() to register viommu types and ops.
>>  - remove unneeded domctl interface to destroy viommu.
>> ---
>>  docs/misc/xen-command-line.markdown |   7 ++
>>  xen/arch/x86/Kconfig                |   1 +
>>  xen/arch/x86/hvm/hvm.c              |   3 +
>>  xen/arch/x86/xen.lds.S              |   3 +
>>  xen/common/Kconfig                  |   3 +
>>  xen/common/Makefile                 |   1 +
>>  xen/common/domctl.c                 |   7 ++
>>  xen/common/viommu.c                 | 125 ++++++++++++++++++++++++++++++++++++
>>  xen/include/asm-x86/hvm/domain.h    |   3 +
>>  xen/include/public/domctl.h         |  31 +++++++++
>>  xen/include/xen/viommu.h            |  69 ++++++++++++++++++++
>>  11 files changed, 253 insertions(+)
>>  create mode 100644 xen/common/viommu.c
>>  create mode 100644 xen/include/xen/viommu.h
>> 
>> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
>> index eb4995e..d097382 100644
>> --- a/docs/misc/xen-command-line.markdown
>> +++ b/docs/misc/xen-command-line.markdown
>> @@ -1836,3 +1836,10 @@ mode.
>>  > Default: `true`
>>  
>>  Permit use of the `xsave/xrstor` instructions.
>> +
>> +### viommu
>> +> `= <boolean>`
>> +
>> +> Default: `false`
>> +
>> +Permit use of viommu interface to create and destroy viommu device model.
>
>I'm not sure about the point of having this command line option, this
>is a guest feature and just setting it from the config file should be
>enough IMHO.

Sorry about this. You gave the same remark on our v3, and we promised to
remove it but forgot this one. I will remove it.

>
>> diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
>> index 64955dc..df254e4 100644
>> --- a/xen/arch/x86/Kconfig
>> +++ b/xen/arch/x86/Kconfig
>> @@ -25,6 +25,7 @@ config X86
>>  	select HAS_UBSAN
>>  	select NUMA
>>  	select VGA
>> +	select VIOMMU
>>  
>>  config ARCH_DEFCONFIG
>>  	string
>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>> index 205b4cb..964418a 100644
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -36,6 +36,7 @@
>>  #include <xen/rangeset.h>
>>  #include <xen/monitor.h>
>>  #include <xen/warning.h>
>> +#include <xen/viommu.h>
>>  #include <asm/shadow.h>
>>  #include <asm/hap.h>
>>  #include <asm/current.h>
>> @@ -693,6 +694,8 @@ void hvm_domain_relinquish_resources(struct domain *d)
>>          pmtimer_deinit(d);
>>          hpet_deinit(d);
>>      }
>> +
>> +    viommu_destroy_domain(d);
>
>This returns a value, but you ignore it (read below for how I think
>this should be solved).
>
>>  }
>>  
>>  void hvm_domain_destroy(struct domain *d)
>> diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
>> index d5e8821..7f8d2b8 100644
>> --- a/xen/arch/x86/xen.lds.S
>> +++ b/xen/arch/x86/xen.lds.S
>> @@ -231,6 +231,9 @@ SECTIONS
>>         __start_schedulers_array = .;
>>         *(.data.schedulers)
>>         __end_schedulers_array = .;
>> +       __start_viommus_array = .;
>> +       *(.data.viommus)
>> +       __end_viommus_array = .;
>
>This should be protected with #ifdef CONFIG_VIOMMU. And please place
>it at the end of the rodata section (AFAICT there's no need for it to
>be in the data.read_mostly section).
>

Will do.

>>    } :text
>>  
>>    .data : {                    /* Data */
>> diff --git a/xen/common/Kconfig b/xen/common/Kconfig
>> index 103ef44..62aaa76 100644
>> --- a/xen/common/Kconfig
>> +++ b/xen/common/Kconfig
>> @@ -52,6 +52,9 @@ config HAS_CHECKPOLICY
>>  	string
>>  	option env="XEN_HAS_CHECKPOLICY"
>>  
>> +config VIOMMU
>> +	bool
>> +
>>  config KEXEC
>>  	bool "kexec support"
>>  	default y
>> diff --git a/xen/common/Makefile b/xen/common/Makefile
>> index 66cc2c8..182b3ac 100644
>> --- a/xen/common/Makefile
>> +++ b/xen/common/Makefile
>> @@ -56,6 +56,7 @@ obj-y += time.o
>>  obj-y += timer.o
>>  obj-y += trace.o
>>  obj-y += version.o
>> +obj-$(CONFIG_VIOMMU) += viommu.o
>>  obj-y += virtual_region.o
>>  obj-y += vm_event.o
>>  obj-y += vmap.o
>> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
>> index 3c6fa4e..9c5651d 100644
>> --- a/xen/common/domctl.c
>> +++ b/xen/common/domctl.c
>> @@ -25,6 +25,7 @@
>>  #include <xen/paging.h>
>>  #include <xen/hypercall.h>
>>  #include <xen/vm_event.h>
>> +#include <xen/viommu.h>
>>  #include <xen/monitor.h>
>>  #include <asm/current.h>
>>  #include <asm/irq.h>
>> @@ -1155,6 +1156,12 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>>                                       op->u.set_gnttab_limits.maptrack_frames);
>>          break;
>>  
>> +    case XEN_DOMCTL_viommu_op:
>> +        ret = viommu_domctl(d, &op->u.viommu_op);
>> +        if ( !ret )
>> +            copyback = 1;
>> +        break;
>> +
>>      default:
>>          ret = arch_do_domctl(op, d, u_domctl);
>>          break;
>> diff --git a/xen/common/viommu.c b/xen/common/viommu.c
>> new file mode 100644
>> index 0000000..fd8b7fd
>> --- /dev/null
>> +++ b/xen/common/viommu.c
>> @@ -0,0 +1,125 @@
>> +/*
>> + * common/viommu.c
>> + *
>> + * Copyright (c) 2017 Intel Corporation
>> + * Author: Lan Tianyu <tianyu.lan@intel.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along with
>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <xen/sched.h>
>> +#include <xen/spinlock.h>
>> +#include <xen/types.h>
>> +#include <xen/viommu.h>
>> +
>> +extern const struct viommu_ops *__start_viommus_array[], *__end_viommus_array[];
>> +#define NUM_VIOMMU_TYPE (__end_viommus_array - __start_viommus_array)
>> +#define viommu_type_array __start_viommus_array
>> +
>> +int viommu_destroy_domain(struct domain *d)
>
>IMHO I would rather prefer the destroy operation to not be allowed to
>fail, that would allow switching it return value to void.
>
>The more that you simply ignore the return value in
>hvm_domain_relinquish_resources.
>

Got it.

>> +{
>> +    struct viommu *viommu = d->arch.hvm_domain.viommu;
>> +    int ret;
>> +
>> +    if ( !viommu )
>> +        return -ENODEV;
>> +
>> +    ret = viommu->ops->destroy(viommu);
>> +    if ( ret < 0 )
>> +        return ret;
>> +
>> +    xfree(viommu);
>> +    d->arch.hvm_domain.viommu = NULL;
>> +
>> +    return 0;
>> +}
>> +
>> +static const struct viommu_ops *viommu_get_ops(uint8_t type)
>> +{
>> +    int i;
>
>unsigned int.
>
>> +
>> +    for ( i = 0; i < NUM_VIOMMU_TYPE; i++)
>> +    {
>> +        if ( viommu_type_array[i]->type == type )
>> +            return viommu_type_array[i];
>> +    }
>> +
>> +    return NULL;
>> +}
>> +
>> +static int viommu_create(struct domain *d, uint8_t type,
>> +                         uint64_t base_address, uint64_t caps,
>> +                         uint32_t *viommu_id)
>> +{
>> +    struct viommu *viommu;
>> +    const struct viommu_ops *viommu_ops = NULL;
>
>You can initialize viommu_ops here directly:
>
>const struct viommu_ops *viommu_ops = viommu_get_ops(type);
>
>> +    int rc;
>> +
>> +    /* Only support one vIOMMU per domain. */
>> +    if ( d->arch.hvm_domain.viommu )
>> +        return -E2BIG;
>
>EEXIST is maybe better?
>
>> +
>> +    viommu_ops = viommu_get_ops(type);
>> +    if ( !viommu_ops )
>> +        return -EINVAL;
>> +
>> +    ASSERT(viommu_ops->create);
>> +
>> +    viommu = xzalloc(struct viommu);
>> +    if ( !viommu )
>> +        return -ENOMEM;
>> +
>> +    viommu->base_address = base_address;
>> +    viommu->caps = caps;
>> +    viommu->ops = viommu_ops;
>> +
>> +    rc = viommu_ops->create(d, viommu);
>> +    if ( rc < 0 )
>> +    {
>> +        xfree(viommu);
>> +        return rc;
>> +    }
>> +
>> +    d->arch.hvm_domain.viommu = viommu;
>> +
>> +    /* Only support one vIOMMU per domain. */
>> +    *viommu_id = 0;
>> +    return 0;
>> +}
>> +
>> +int viommu_domctl(struct domain *d, struct xen_domctl_viommu_op *op)
>> +{
>> +    int rc;
>> +
>> +    switch ( op->cmd )
>> +    {
>> +    case XEN_DOMCTL_viommu_create:
>> +        rc = viommu_create(d, op->u.create.type, op->u.create.base_address,
>> +                           op->u.create.capabilities, &op->u.create.id);
>> +        break;
>
>Newline.
>
>> +    default:
>> +        return -ENOSYS;
>
>-EOPNOTSUPP
>
>> +    }
>> +
>> +    return rc;
>> +}
>> +
>> +/*
>> + * Local variables:
>> + * mode: C
>> + * c-file-style: "BSD"
>> + * c-basic-offset: 4
>> + * tab-width: 4
>> + * indent-tabs-mode: nil
>> + * End:
>> + */
>> diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
>> index 7f128c0..fcd3482 100644
>> --- a/xen/include/asm-x86/hvm/domain.h
>> +++ b/xen/include/asm-x86/hvm/domain.h
>> @@ -21,6 +21,7 @@
>>  #define __ASM_X86_HVM_DOMAIN_H__
>>  
>>  #include <xen/iommu.h>
>> +#include <xen/viommu.h>
>>  #include <asm/hvm/irq.h>
>>  #include <asm/hvm/vpt.h>
>>  #include <asm/hvm/vlapic.h>
>> @@ -196,6 +197,8 @@ struct hvm_domain {
>>          struct vmx_domain vmx;
>>          struct svm_domain svm;
>>      };
>> +
>> +    struct viommu *viommu;
>
>Are you sure this will compile if you don't select CONFIG_VIOMMU?
>
>AFAICT struct viommu is only defined if VIOMMU is selected, so you
>should add something like...
>
>> diff --git a/xen/include/xen/viommu.h b/xen/include/xen/viommu.h
>> new file mode 100644
>> index 0000000..a859d80
>> --- /dev/null
>> +++ b/xen/include/xen/viommu.h
>> @@ -0,0 +1,69 @@
>> +/*
>> + * include/xen/viommu.h
>> + *
>> + * Copyright (c) 2017, Intel Corporation
>> + * Author: Lan Tianyu <tianyu.lan@intel.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along with
>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>> + *
>> + */
>> +#ifndef __XEN_VIOMMU_H__
>> +#define __XEN_VIOMMU_H__
>> +
>> +#ifdef CONFIG_VIOMMU
>> +
>> +struct viommu;
>> +
>> +struct viommu_ops {
>> +    uint8_t type;
>> +    int (*create)(struct domain *d, struct viommu *viommu);
>> +    int (*destroy)(struct viommu *viommu);
>> +};
>> +
>> +struct viommu {
>> +    uint64_t base_address;
>> +    uint64_t caps;
>> +    const struct viommu_ops *ops;
>> +    void *priv;
>> +};
>> +
>> +#define REGISTER_VIOMMU(x) static const struct viommu_ops *x##_entry \
>> +  __used_section(".data.viommus") = &x;
>> +
>> +
>> +int viommu_register_type(uint8_t type, struct viommu_ops *ops);
>> +int viommu_destroy_domain(struct domain *d);
>> +int viommu_domctl(struct domain *d, struct xen_domctl_viommu_op *op);
>> +#else
>
>...
>
>struct viommu {
>};
>
>here.
>

Got it.

>> +static inline int viommu_destroy_domain(struct domain *d)
>> +{
>> +    return -EINVAL;
>
>ENODEV if you really have to return an error here, note that I think
>destroy should not return anything.
>
>> +}
>> +static inline
>> +int viommu_domctl(struct domain *d, struct xen_domctl_viommu_op *op)
>
>The style should be:
>
>static inline int viommu_domctl(struct domain *d,
>                                struct xen_domctl_viommu_op *op)
>{
>    ...
>
>> +{
>> +    return false;
>
>Urg, no please. This should be -EOPNOTSUPP.

yes.

Thanks
Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 03/28] VIOMMU: Add irq request callback to deal with irq remapping
  2018-02-09 15:02   ` Roger Pau Monné
@ 2018-02-09 16:21     ` Chao Gao
  0 siblings, 0 replies; 83+ messages in thread
From: Chao Gao @ 2018-02-09 16:21 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Feb 09, 2018 at 03:02:03PM +0000, Roger Pau Monné wrote:
>On Fri, Nov 17, 2017 at 02:22:10PM +0800, Chao Gao wrote:
>> From: Lan Tianyu <tianyu.lan@intel.com>
>> 
>> This patch is to add irq request callback for platform implementation
>> to deal with irq remapping request.
>> 
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> ---
>>  xen/common/viommu.c          | 15 ++++++++++++
>>  xen/include/asm-x86/viommu.h | 54 ++++++++++++++++++++++++++++++++++++++++++++
>>  xen/include/xen/viommu.h     |  6 +++++
>>  3 files changed, 75 insertions(+)
>>  create mode 100644 xen/include/asm-x86/viommu.h
>> 
>> diff --git a/xen/common/viommu.c b/xen/common/viommu.c
>> index fd8b7fd..53d4b70 100644
>> --- a/xen/common/viommu.c
>> +++ b/xen/common/viommu.c
>> @@ -114,6 +114,21 @@ int viommu_domctl(struct domain *d, struct xen_domctl_viommu_op *op)
>>      return rc;
>>  }
>>  
>> +int viommu_handle_irq_request(const struct domain *d,
>> +                              const struct arch_irq_remapping_request *request)
>> +{
>> +    struct viommu *viommu = d->arch.hvm_domain.viommu;
>> +
>> +    if ( !viommu )
>> +        return -ENODEV;
>> +
>> +    ASSERT(viommu->ops);
>> +    if ( !viommu->ops->handle_irq_request )
>> +        return -EINVAL;
>
>EOPNOTSUPP? EINVAL seems like something the handler itself should
>return when processing the inputs.

I also prefer EOPNOTSUPP.

>
>> +
>> +    return viommu->ops->handle_irq_request(d, request);
>> +}
>> +
>>  /*
>>   * Local variables:
>>   * mode: C
>> diff --git a/xen/include/asm-x86/viommu.h b/xen/include/asm-x86/viommu.h
>> new file mode 100644
>> index 0000000..01ec80e
>> --- /dev/null
>> +++ b/xen/include/asm-x86/viommu.h
>> @@ -0,0 +1,54 @@
>> +/*
>> + * include/asm-x86/viommu.h
>> + *
>> + * Copyright (c) 2017 Intel Corporation.
>> + * Author: Lan Tianyu <tianyu.lan@intel.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along with
>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>> + *
>> + */
>> +#ifndef __ARCH_X86_VIOMMU_H__
>> +#define __ARCH_X86_VIOMMU_H__
>> +
>> +/* IRQ request type */
>> +enum viommu_irq_request_type {
>> +    VIOMMU_REQUEST_IRQ_MSI = 0,
>> +    VIOMMU_REQUEST_IRQ_APIC = 1
>> +};
>> +
>> +struct arch_irq_remapping_request
>> +{
>> +    union {
>> +        /* MSI */
>> +        struct {
>> +            uint64_t addr;
>> +            uint32_t data;
>> +        } msi;
>> +        /* Redirection Entry in IOAPIC */
>> +        uint64_t rte;
>> +    } msg;
>
>Do you really need the msg name here? IIRC we support anonymous
>unions for non public structures.

will use anonymous unions.

>
>> +    uint16_t source_id;
>> +    enum viommu_irq_request_type type;
>> +};
>
>This structure looks fine, but it would be more helpful to introduce
>the struct together with the device specific handle_irq_request
>function, or else the fields inside of this struct are not relevant.

Ok, will merge related changes into this one.

Thanks
Chao


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 07/28] x86/hvm: Introduce a emulated VTD for HVM
  2017-11-17  6:22 ` [PATCH v4 07/28] x86/hvm: Introduce a emulated VTD for HVM Chao Gao
@ 2018-02-09 16:27   ` Roger Pau Monné
  2018-02-09 17:12     ` Chao Gao
  0 siblings, 1 reply; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-09 16:27 UTC (permalink / raw)
  To: Chao Gao
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Nov 17, 2017 at 02:22:14PM +0800, Chao Gao wrote:
> This patch adds create/destroy function for the emulated VTD
> and adapts it to the common VIOMMU abstraction.
> 
> As the Makefile is changed here, put all files in alphabetic order
> by this chance.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> 
> ---
> v4:
> - use REGISTER_VIOMMU
> - shrink the size of hvm_hw_vvtd_regs
> - make hvm_hw_vvtd_regs a field inside struct vvtd
> ---
>  xen/drivers/passthrough/vtd/Makefile |   7 +-
>  xen/drivers/passthrough/vtd/iommu.h  |   9 +++
>  xen/drivers/passthrough/vtd/vvtd.c   | 150 +++++++++++++++++++++++++++++++++++
>  3 files changed, 163 insertions(+), 3 deletions(-)
>  create mode 100644 xen/drivers/passthrough/vtd/vvtd.c
> 
> diff --git a/xen/drivers/passthrough/vtd/Makefile b/xen/drivers/passthrough/vtd/Makefile
> index f302653..163c7fe 100644
> --- a/xen/drivers/passthrough/vtd/Makefile
> +++ b/xen/drivers/passthrough/vtd/Makefile
> @@ -1,8 +1,9 @@
>  subdir-$(CONFIG_X86) += x86
>  
> -obj-y += iommu.o
>  obj-y += dmar.o
> -obj-y += utils.o
> -obj-y += qinval.o
>  obj-y += intremap.o
> +obj-y += iommu.o
> +obj-y += qinval.o
>  obj-y += quirks.o
> +obj-y += utils.o
> +obj-$(CONFIG_VIOMMU) += vvtd.o
> diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
> index db80b31..f2ef3dd 100644
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -47,6 +47,7 @@
>  #define DMAR_IQH_REG            0x80 /* invalidation queue head */
>  #define DMAR_IQT_REG            0x88 /* invalidation queue tail */
>  #define DMAR_IQA_REG            0x90 /* invalidation queue addr */
> +#define DMAR_IECTL_REG          0xa0 /* invalidation event control register */
>  #define DMAR_IRTA_REG           0xb8 /* intr remap */
>  
>  #define OFFSET_STRIDE        (9)
> @@ -89,6 +90,12 @@
>  #define cap_afl(c)        (((c) >> 3) & 1)
>  #define cap_ndoms(c)        (1 << (4 + 2 * ((c) & 0x7)))
>  
> +#define cap_set_num_fault_regs(c)   ((((c) - 1) & 0xff) << 40)
> +#define cap_set_fault_reg_offset(c) ((((c) / 16) & 0x3ff) << 24)
> +#define cap_set_mgaw(c)             ((((c) - 1) & 0x3f) << 16)
> +#define cap_set_sagaw(c)            (((c) & 0x1f) << 8)
> +#define cap_set_ndoms(c)            ((c) & 0x7)
> +
>  /*
>   * Extended Capability Register
>   */
> @@ -114,6 +121,8 @@
>  #define ecap_niotlb_iunits(e)    ((((e) >> 24) & 0xff) + 1)
>  #define ecap_iotlb_offset(e)     ((((e) >> 8) & 0x3ff) * 16)
>  
> +#define ecap_set_mhmv(e)         (((e) & 0xf) << 20)
> +
>  /* IOTLB_REG */
>  #define DMA_TLB_FLUSH_GRANU_OFFSET  60
>  #define DMA_TLB_GLOBAL_FLUSH (((u64)1) << 60)
> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
> new file mode 100644
> index 0000000..9f76ccf
> --- /dev/null
> +++ b/xen/drivers/passthrough/vtd/vvtd.c
> @@ -0,0 +1,150 @@
> +/*
> + * vvtd.c
> + *
> + * virtualize VTD for HVM.
> + *
> + * Copyright (C) 2017 Chao Gao, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms and conditions of the GNU General Public
> + * License, version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/sched.h>
> +#include <xen/types.h>
> +#include <xen/viommu.h>
> +#include <xen/xmalloc.h>
> +#include <asm/current.h>
> +#include <asm/hvm/domain.h>
> +
> +#include "iommu.h"
> +
> +/* Supported capabilities by vvtd */
> +#define VVTD_MAX_CAPS VIOMMU_CAP_IRQ_REMAPPING
> +
> +#define VVTD_FRCD_NUM   1ULL
> +#define VVTD_FRCD_START (DMAR_IRTA_REG + 8)
> +#define VVTD_FRCD_END   (VVTD_FRCD_START + VVTD_FRCD_NUM * 16)
> +#define VVTD_MAX_OFFSET VVTD_FRCD_END
> +
> +struct hvm_hw_vvtd {
> +    uint32_t regs[VVTD_MAX_OFFSET/sizeof(uint32_t)];

Unless I'm mistaken this is 208bytes in size, yet you only seem to use
28bytes (from the registers used in vvtd_reset). I guess this is going
to change over the series so all this space is really needed.

Also I think this would be better as:

union hw_vvtd {
    uint32_t regs32[VVTD_MAX_OFFSET/sizeof(uint32_t)];
    uint64_t regs64[VVTD_MAX_OFFSET/sizeof(uint64_t)];
};

> +};
> +
> +struct vvtd {
> +    /* Base address of remapping hardware register-set */
> +    uint64_t base_addr;
> +    /* Point back to the owner domain */
> +    struct domain *domain;
> +
> +    struct hvm_hw_vvtd hw;
> +};
> +
> +/* Setting viommu_verbose enables debugging messages of vIOMMU */
> +bool __read_mostly viommu_verbose;

static?

> +boolean_runtime_param("viommu_verbose", viommu_verbose);
> +
> +#ifndef NDEBUG
> +#define vvtd_info(fmt...) do {                    \
> +    if ( viommu_verbose )                         \
> +        gprintk(XENLOG_INFO, ## fmt);             \
> +} while(0)
> +/*
> + * Use printk and '_G_' prefix because vvtd_debug() may be called
> + * in the context of another domain's vCPU. Don't output 'current'
> + * information to avoid confusion.
> + */
> +#define vvtd_debug(fmt...) do {                   \
> +    if ( viommu_verbose && printk_ratelimit())    \
> +        printk(XENLOG_G_DEBUG fmt);               \

I think printk is already rate-limited if you use _G_, so no need for
the ratelimit call.

> +} while(0)
> +#else
> +#define vvtd_info(...) do {} while(0)
> +#define vvtd_debug(...) do {} while(0)
> +#endif
> +
> +#define VVTD_REG_POS(vvtd, offset) &(vvtd->hw.regs[offset/sizeof(uint32_t)])
> +
> +static inline void vvtd_set_reg(struct vvtd *vvtd, uint32_t reg, uint32_t value)

I don't think you need the vvtd prefix here, and I would leave adding
inline to the compiler discretion:

static void set_reg32(struct vvtd *vvtd, unsigned long offset, uint32_t val)
{
    vvtd->hw.regs32[offset / 4] = val
}

But I think you can even get rid of the helper functions and just use
macros directly, ie:

#define GET_REG(vvtd, offset, size) \
    ((vvtd)->hw.regs ## size [(offset) / size / 8 ])
#define SET_REG(vvtd, offset, val, size) \
    (GET_REG(vvtd, offset, val) = val)

This is better IMHO, and I'm not really sure the SET_REG macro is
really needed, you can just open code GET_REG(...) = val;

> +{
> +    *VVTD_REG_POS(vvtd, reg) = value;
> +}
> +
> +static inline uint32_t vvtd_get_reg(const struct vvtd *vvtd, uint32_t reg)
> +{
> +    return *VVTD_REG_POS(vvtd, reg);
> +}
> +
> +static inline void vvtd_set_reg_quad(struct vvtd *vvtd, uint32_t reg,
> +                                     uint64_t value)
> +{
> +    *(uint64_t*)VVTD_REG_POS(vvtd, reg) = value;
> +}
> +
> +static inline uint64_t vvtd_get_reg_quad(const struct vvtd *vvtd, uint32_t reg)
> +{
> +    return *(uint64_t*)VVTD_REG_POS(vvtd, reg);
> +}
> +
> +static void vvtd_reset(struct vvtd *vvtd)
> +{
> +    uint64_t cap = cap_set_num_fault_regs(VVTD_FRCD_NUM)
> +                   | cap_set_fault_reg_offset(VVTD_FRCD_START)
> +                   | cap_set_mgaw(39) /* maximum guest address width */
> +                   | cap_set_sagaw(2) /* support 3-level page_table */
> +                   | cap_set_ndoms(6); /* support 64K domains */
> +    uint64_t ecap = DMA_ECAP_QUEUED_INVAL | DMA_ECAP_INTR_REMAP | DMA_ECAP_EIM |
> +                    ecap_set_mhmv(0xf);
> +
> +    vvtd_set_reg(vvtd, DMAR_VER_REG, 0x10UL);
> +    vvtd_set_reg_quad(vvtd, DMAR_CAP_REG, cap);
> +    vvtd_set_reg_quad(vvtd, DMAR_ECAP_REG, ecap);
> +    vvtd_set_reg(vvtd, DMAR_FECTL_REG, 0x80000000UL);
> +    vvtd_set_reg(vvtd, DMAR_IECTL_REG, 0x80000000UL);
> +}
> +
> +static int vvtd_create(struct domain *d, struct viommu *viommu)
> +{
> +    struct vvtd *vvtd;
> +
> +    if ( !is_hvm_domain(d) || (viommu->base_address & (PAGE_SIZE - 1)) ||
> +         (~VVTD_MAX_CAPS & viommu->caps) )
> +        return -EINVAL;
> +
> +    vvtd = xzalloc_bytes(sizeof(struct vvtd));

vvtd = xzalloc(struct vvtd);

> +    if ( !vvtd )
> +        return ENOMEM;
> +
> +    vvtd_reset(vvtd);
> +    vvtd->base_addr = viommu->base_address;

I think it would be good to have some check here, so that the vIOMMU
is not for example positioned on top of a RAM region. Ideally you
should check that the gfns [base_address, base_address + size) are
unpopulated.

> +    vvtd->domain = d;
> +
> +    viommu->priv = vvtd;
> +
> +    return 0;
> +}
> +
> +static int vvtd_destroy(struct viommu *viommu)
> +{
> +    struct vvtd *vvtd = viommu->priv;
> +
> +    if ( vvtd )
> +        xfree(vvtd);
> +
> +    return 0;
> +}
> +
> +static const struct viommu_ops vvtd_hvm_vmx_ops = {

Is the vmx needed? vvtd is already Intel specific AFAICT. You could
probably omit the hvm also, so vvtd_ops.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 04/28] VIOMMU: Add get irq info callback to convert irq remapping request
  2018-02-09 15:06   ` Roger Pau Monné
@ 2018-02-09 16:34     ` Chao Gao
  0 siblings, 0 replies; 83+ messages in thread
From: Chao Gao @ 2018-02-09 16:34 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Feb 09, 2018 at 03:06:07PM +0000, Roger Pau Monné wrote:
>On Fri, Nov 17, 2017 at 02:22:11PM +0800, Chao Gao wrote:
>> From: Lan Tianyu <tianyu.lan@intel.com>
>> 
>> This patch is to add get_irq_info callback for platform implementation
>> to convert irq remapping request to irq info (E,G vector, dest, dest_mode
>> and so on).
>> 
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> ---
>>  xen/common/viommu.c          | 16 ++++++++++++++++
>>  xen/include/asm-x86/viommu.h |  8 ++++++++
>>  xen/include/xen/viommu.h     |  6 ++++++
>>  3 files changed, 30 insertions(+)
>> 
>> diff --git a/xen/common/viommu.c b/xen/common/viommu.c
>> index 53d4b70..9eafdef 100644
>> --- a/xen/common/viommu.c
>> +++ b/xen/common/viommu.c
>> @@ -129,6 +129,22 @@ int viommu_handle_irq_request(const struct domain *d,
>>      return viommu->ops->handle_irq_request(d, request);
>>  }
>>  
>> +int viommu_get_irq_info(const struct domain *d,
>> +                        const struct arch_irq_remapping_request *request,
>> +                        struct arch_irq_remapping_info *irq_info)
>> +{
>> +    const struct viommu *viommu = d->arch.hvm_domain.viommu;
>> +
>> +    if ( !viommu )
>> +        return -EINVAL;
>> +
>> +    ASSERT(viommu->ops);
>> +    if ( !viommu->ops->get_irq_info )
>> +        return -EINVAL;
>
>EOPNOTSUPP.
>
>> +
>> +    return viommu->ops->get_irq_info(d, request, irq_info);
>> +}
>> +
>>  /*
>>   * Local variables:
>>   * mode: C
>> diff --git a/xen/include/asm-x86/viommu.h b/xen/include/asm-x86/viommu.h
>> index 01ec80e..3d995ba 100644
>> --- a/xen/include/asm-x86/viommu.h
>> +++ b/xen/include/asm-x86/viommu.h
>> @@ -26,6 +26,14 @@ enum viommu_irq_request_type {
>>      VIOMMU_REQUEST_IRQ_APIC = 1
>>  };
>>  
>> +struct arch_irq_remapping_info
>> +{
>> +    uint8_t dest_mode:1;
>> +    uint8_t delivery_mode:3;
>> +    uint8_t  vector;
>              ^ double space.
>
>> +    uint32_t dest;
>> +};
>
>The same issue again, introducing this structure without the code in
>get_irq_info makes it impossible to review IMHO.
>
>Also this should be introduced below the arch_irq_remapping_request
>struct.

Will pay attention to this kind of issue.
>
>> +
>>  struct arch_irq_remapping_request
>>  {
>>      union {
>> diff --git a/xen/include/xen/viommu.h b/xen/include/xen/viommu.h
>> index 67e25d5..73b853f 100644
>> --- a/xen/include/xen/viommu.h
>> +++ b/xen/include/xen/viommu.h
>> @@ -32,6 +32,9 @@ struct viommu_ops {
>>      int (*destroy)(struct viommu *viommu);
>>      int (*handle_irq_request)(const struct domain *d,
>>                                const struct arch_irq_remapping_request *request);
>> +    int (*get_irq_info)(const struct domain *d,
>> +                        const struct arch_irq_remapping_request *request,
>> +                        struct arch_irq_remapping_info *info);
>>  };
>>  
>>  struct viommu {
>> @@ -50,6 +53,9 @@ int viommu_destroy_domain(struct domain *d);
>>  int viommu_domctl(struct domain *d, struct xen_domctl_viommu_op *op);
>>  int viommu_handle_irq_request(const struct domain *d,
>>                                const struct arch_irq_remapping_request *request);
>> +int viommu_get_irq_info(const struct domain *d,
>> +                        const struct arch_irq_remapping_request *request,
>
>Why do you need 'request' here?
>

A request is an abstraction of a legacy or remappable interrupt.
The vIOMMU can deliver a request (i.e. an interrupt) to the vlapic. It
can also translate the request into interrupt attributes (vector,
destination...) with the help of an interrupt remapping table.
This function gets the corresponding interrupt attributes of a request.
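For illustration only (this is not code from the series; the field
layout below is the standard x86 MSI address/data format, and the
struct name is made up for the example), decoding a legacy-format MSI
request into such attributes looks roughly like:

```c
#include <stdint.h>

/* Mirrors the kind of info arch_irq_remapping_info carries. */
struct irq_info {
    uint8_t dest_mode;     /* 0 = physical, 1 = logical */
    uint8_t delivery_mode; /* 0 = fixed, 1 = lowest priority, ... */
    uint8_t vector;
    uint32_t dest;
};

/* Pull the interrupt attributes out of a legacy (non-remappable) MSI. */
static void decode_legacy_msi(uint64_t addr, uint32_t data,
                              struct irq_info *info)
{
    info->dest_mode = (addr >> 2) & 1;
    info->dest = (addr >> 12) & 0xff;   /* destination APIC ID */
    info->vector = data & 0xff;
    info->delivery_mode = (data >> 8) & 0x7;
}
```

A remappable-format request would instead carry a handle that indexes
the interrupt remapping table, and the attributes come from the table
entry rather than from the message itself.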

Thanks
Chao

>> +                        struct arch_irq_remapping_info *irq_info);
>>  #else
>>  static inline int viommu_destroy_domain(struct domain *d)
>>  {
>> -- 
>> 1.8.3.1
>> 


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 08/28] x86/vvtd: Add MMIO handler for VVTD
  2017-11-17  6:22 ` [PATCH v4 08/28] x86/vvtd: Add MMIO handler for VVTD Chao Gao
@ 2018-02-09 16:39   ` Roger Pau Monné
  2018-02-09 17:21     ` Chao Gao
  0 siblings, 1 reply; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-09 16:39 UTC (permalink / raw)
  To: Chao Gao
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Nov 17, 2017 at 02:22:15PM +0800, Chao Gao wrote:
> This patch adds VVTD MMIO handler to deal with MMIO access.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> ---
> v4:
>  - only trap the register emulated in vvtd_in_range().
>    i.e. replace PAGE_SIZE with the VVTD_MAX_OFFSET
> ---
>  xen/drivers/passthrough/vtd/vvtd.c | 55 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 55 insertions(+)
> 
> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
> index 9f76ccf..d78d878 100644
> --- a/xen/drivers/passthrough/vtd/vvtd.c
> +++ b/xen/drivers/passthrough/vtd/vvtd.c

Now that I look at this, this is the wrong folder. This should be in
xen/arch/x86/hvm with the rest of the emulated devices.

> @@ -94,6 +94,60 @@ static inline uint64_t vvtd_get_reg_quad(const struct vvtd *vvtd, uint32_t reg)
>      return *(uint64_t*)VVTD_REG_POS(vvtd, reg);
>  }
>  
> +static void *domain_vvtd(const struct domain *d)
> +{
> +    if ( is_hvm_domain(d) && d->arch.hvm_domain.viommu )

hvm_mmio_ops is only used by HVM guests, so the is_hvm_domain check
here is redundant. At which point the helper can be simplified as:

static struct vvtd *domain_vvtd(const struct domain *d)
{
    return d->arch.hvm_domain.viommu ? d->arch.hvm_domain.viommu->priv : NULL;
}

> +        return d->arch.hvm_domain.viommu->priv;
> +    else
> +        return NULL;
> +}
> +
> +static int vvtd_in_range(struct vcpu *v, unsigned long addr)
> +{
> +    struct vvtd *vvtd = domain_vvtd(v->domain);
const

> +
> +    if ( vvtd )
> +        return (addr >= vvtd->base_addr) &&
> +               (addr < vvtd->base_addr + VVTD_MAX_OFFSET);
> +    return 0;
> +}
> +
> +static int vvtd_read(struct vcpu *v, unsigned long addr,
> +                     unsigned int len, unsigned long *pval)
> +{
> +    struct vvtd *vvtd = domain_vvtd(v->domain);
const

Thanks, Roger.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 05/28] VIOMMU: Introduce callback of checking irq remapping mode
  2018-02-09 15:11   ` Roger Pau Monné
@ 2018-02-09 16:47     ` Chao Gao
  2018-02-12 10:21       ` Roger Pau Monné
  0 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2018-02-09 16:47 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Feb 09, 2018 at 03:11:25PM +0000, Roger Pau Monné wrote:
>On Fri, Nov 17, 2017 at 02:22:12PM +0800, Chao Gao wrote:
>> From: Lan Tianyu <tianyu.lan@intel.com>
>> 
>> This patch is to add callback for vIOAPIC and vMSI to check whether interrupt
>> remapping is enabled.
>
>Same as with the previous patches, not adding the actual code in
>check_irq_remapping makes reviewing this impossible.
>
>> 
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> ---
>>  xen/common/viommu.c      | 15 +++++++++++++++
>>  xen/include/xen/viommu.h |  4 ++++
>>  2 files changed, 19 insertions(+)
>> 
>> diff --git a/xen/common/viommu.c b/xen/common/viommu.c
>> index 9eafdef..72173c3 100644
>> --- a/xen/common/viommu.c
>> +++ b/xen/common/viommu.c
>> @@ -145,6 +145,21 @@ int viommu_get_irq_info(const struct domain *d,
>>      return viommu->ops->get_irq_info(d, request, irq_info);
>>  }
>>  
>> +bool viommu_check_irq_remapping(const struct domain *d,
>> +                                const struct arch_irq_remapping_request *request)
>> +{
>> +    const struct viommu *viommu = d->arch.hvm_domain.viommu;
>> +
>> +    if ( !viommu )
>> +        return false;
>> +
>> +    ASSERT(viommu->ops);
>> +    if ( !viommu->ops->check_irq_remapping )
>> +        return false;
>> +
>> +    return viommu->ops->check_irq_remapping(d, request);
>> +}
>
>Having a helper for each functionality you want to support seems
>extremely cumbersome, I would image this to grow so that you will also
>have viommu_check_mem_mapping and others.
>
>Isn't it better to just have something like viommu_check_feature, or
>even just expose a features field in the viommu struct itself?

Maybe it is caused by our poor function name and the lack of a comment
explaining what the function does.  As you know, interrupts have two
formats: legacy format and remappable format.  The format is indicated by
one bit of the MSI message or IOAPIC RTE.  Roughly, only the remappable
format should be translated by the IOMMU.  So every time we want to handle
an interrupt, we need to know its format, and we think the encoding of the
remappable format varies between vendors.  This is why we introduce a new
callback here, to abstract the check of the remapping format.

If you could come up with a better name, that would be very helpful.
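For context, here is a sketch (based on my reading of the VT-d spec,
not code from this series) of where that format bit lives on Intel
hardware: bit 4 of the MSI address is the Interrupt Format bit, and
bit 48 of an IOAPIC RTE plays the same role:

```c
#include <stdbool.h>
#include <stdint.h>

/* VT-d: MSI address bit 4 set => remappable-format interrupt. */
static bool msi_is_remappable(uint64_t addr)
{
    return addr & (1ULL << 4);
}

/* VT-d: IOAPIC RTE bit 48 set => remappable-format interrupt. */
static bool rte_is_remappable(uint64_t rte)
{
    return rte & (1ULL << 48);
}
```

Other vendors encode this differently, which is what the per-vendor
callback abstracts.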

Thanks
Chao
>
>> +
>>  /*
>>   * Local variables:
>>   * mode: C
>> diff --git a/xen/include/xen/viommu.h b/xen/include/xen/viommu.h
>> index 73b853f..c1dfaec 100644
>> --- a/xen/include/xen/viommu.h
>> +++ b/xen/include/xen/viommu.h
>> @@ -29,6 +29,8 @@ struct viommu;
>>  struct viommu_ops {
>>      uint8_t type;
>>      int (*create)(struct domain *d, struct viommu *viommu);
>> +    bool (*check_irq_remapping)(const struct domain *d,
>> +                                const struct arch_irq_remapping_request *request);
>
>Why add it here, instead of at the end of the struct?
>
>>      int (*destroy)(struct viommu *viommu);
>>      int (*handle_irq_request)(const struct domain *d,
>>                                const struct arch_irq_remapping_request *request);
>> @@ -56,6 +58,8 @@ int viommu_handle_irq_request(const struct domain *d,
>>  int viommu_get_irq_info(const struct domain *d,
>>                          const struct arch_irq_remapping_request *request,
>>                          struct arch_irq_remapping_info *irq_info);
>> +bool viommu_check_irq_remapping(const struct domain *d,
>> +                                const struct arch_irq_remapping_request *request);
>>  #else
>>  static inline int viommu_destroy_domain(struct domain *d)
>>  {
>> -- 
>> 1.8.3.1
>> 


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 06/28] vtd: clean-up and preparation for vvtd
  2018-02-09 15:17   ` Roger Pau Monné
@ 2018-02-09 16:51     ` Chao Gao
  0 siblings, 0 replies; 83+ messages in thread
From: Chao Gao @ 2018-02-09 16:51 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Feb 09, 2018 at 03:17:59PM +0000, Roger Pau Monné wrote:
>On Fri, Nov 17, 2017 at 02:22:13PM +0800, Chao Gao wrote:
>> This patch contains following changes:
>> - align register definitions
>> - use MASK_EXTR to define some macros about extended capabilies
>> rather than open-coding the masks
>> - define fields of FECTL and FESTS as uint32_t rather than u64 since
>> FECTL and FESTS are 32 bit registers.
>> 
>> No functional changes.
>> 
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>
>Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks. It is the first Reviewed-by we have got. :)
>
>Just one nit...
>
>> 
>> ---
>> v4:
>>  - Only fix the alignment and defer introducing new definition to when
>>  they are needed
>>  (Suggested-by Roger Pau Monné)
>>  - remove parts of open-coded masks
>> v3:
>>  - new
>> ---
>>  xen/drivers/passthrough/vtd/iommu.h | 86 +++++++++++++++++++++----------------
>>  1 file changed, 48 insertions(+), 38 deletions(-)
>> 
>> diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
>> index 72c1a2e..db80b31 100644
>> --- a/xen/drivers/passthrough/vtd/iommu.h
>> +++ b/xen/drivers/passthrough/vtd/iommu.h
>> +#define DMA_ECAP_SNP_CTL        ((uint64_t)1 << 7)
>> +#define DMA_ECAP_PASS_THRU      ((uint64_t)1 << 6)
>> +#define DMA_ECAP_CACHE_HINTS    ((uint64_t)1 << 5)
>> +#define DMA_ECAP_EIM            ((uint64_t)1 << 4)
>> +#define DMA_ECAP_INTR_REMAP     ((uint64_t)1 << 3)
>> +#define DMA_ECAP_DEV_IOTLB      ((uint64_t)1 << 2)
>> +#define DMA_ECAP_QUEUED_INVAL   ((uint64_t)1 << 1)
>> +#define DMA_ECAP_COHERENT       ((uint64_t)1 << 0)
>
>I think the general practice is to use 1UL (because it's shorter), or
>1U for 32bits.

Will do.

Thanks
chao


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 09/28] x86/vvtd: Set Interrupt Remapping Table Pointer through GCMD
  2017-11-17  6:22 ` [PATCH v4 09/28] x86/vvtd: Set Interrupt Remapping Table Pointer through GCMD Chao Gao
@ 2018-02-09 16:59   ` Roger Pau Monné
  2018-02-11  4:34     ` Chao Gao
  0 siblings, 1 reply; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-09 16:59 UTC (permalink / raw)
  To: Chao Gao
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Nov 17, 2017 at 02:22:16PM +0800, Chao Gao wrote:
> Software sets SIRTP field of GCMD to set/update the interrupt remapping
> table pointer used by hardware. The interrupt remapping table pointer is
> specified through the Interrupt Remapping Table Address (IRTA_REG)
> register.
> 
> This patch emulates this operation and adds some new fields in VVTD to track
> info (e.g. the table's gfn and max supported entries) of interrupt remapping
> table.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> 
> ---
> v4:
>  - declare eim_enabled as bool and irt as gfn_t
>  - rename vvtd_handle_gcmd_sirtp() to write_gcmd_sirtp()
> 
> v3:
>  - ignore unaligned r/w of vt-d hardware registers and return X86EMUL_OK
> ---
>  xen/drivers/passthrough/vtd/iommu.h | 16 ++++++-
>  xen/drivers/passthrough/vtd/vvtd.c  | 86 +++++++++++++++++++++++++++++++++++++
>  2 files changed, 100 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
> index f2ef3dd..8579843 100644
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -48,7 +48,8 @@
>  #define DMAR_IQT_REG            0x88 /* invalidation queue tail */
>  #define DMAR_IQA_REG            0x90 /* invalidation queue addr */
>  #define DMAR_IECTL_REG          0xa0 /* invalidation event control register */
> -#define DMAR_IRTA_REG           0xb8 /* intr remap */
> +#define DMAR_IRTA_REG           0xb8 /* base address of intr remap table */
> +#define DMAR_IRTUA_REG          0xbc /* upper address of intr remap table */
>  
>  #define OFFSET_STRIDE        (9)
>  #define dmar_readl(dmar, reg) readl((dmar) + (reg))
> @@ -150,6 +151,9 @@
>  #define DMA_GCMD_SIRTP  (((u64)1) << 24)
>  #define DMA_GCMD_CFI    (((u64)1) << 23)
>  
> +/* mask of one-shot bits */
> +#define DMA_GCMD_ONE_SHOT_MASK 0x96ffffff
> +
>  /* GSTS_REG */
>  #define DMA_GSTS_TES    (((u64)1) << 31)
>  #define DMA_GSTS_RTPS   (((u64)1) << 30)
> @@ -157,10 +161,18 @@
>  #define DMA_GSTS_AFLS   (((u64)1) << 28)
>  #define DMA_GSTS_WBFS   (((u64)1) << 27)
>  #define DMA_GSTS_QIES   (((u64)1) <<26)
> +#define DMA_GSTS_SIRTPS_SHIFT   24
> +#define DMA_GSTS_SIRTPS (((u64)1) << DMA_GSTS_SIRTPS_SHIFT)
>  #define DMA_GSTS_IRES   (((u64)1) <<25)
> -#define DMA_GSTS_SIRTPS (((u64)1) << 24)
>  #define DMA_GSTS_CFIS   (((u64)1) <<23)
>  
> +/* IRTA_REG */
> +/* The base of 4KB aligned interrupt remapping table */
> +#define DMA_IRTA_ADDR(val)      ((val) & ~0xfffULL)
> +/* The size of remapping table is 2^(x+1), where x is the size field in IRTA */
> +#define DMA_IRTA_S(val)         (val & 0xf)
> +#define DMA_IRTA_SIZE(val)      (1UL << (DMA_IRTA_S(val) + 1))
> +
>  /* PMEN_REG */
>  #define DMA_PMEN_EPM    (((u32)1) << 31)
>  #define DMA_PMEN_PRS    (((u32)1) << 0)
> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
> index d78d878..f0476fe 100644
> --- a/xen/drivers/passthrough/vtd/vvtd.c
> +++ b/xen/drivers/passthrough/vtd/vvtd.c
> @@ -36,6 +36,12 @@
>  #define VVTD_MAX_OFFSET VVTD_FRCD_END
>  
>  struct hvm_hw_vvtd {
> +    bool eim_enabled;
> +
> +    /* Interrupt remapping table base gfn and the max of entries */
> +    uint16_t irt_max_entry;
> +    gfn_t irt;
> +
>      uint32_t regs[VVTD_MAX_OFFSET/sizeof(uint32_t)];
>  };
>  
> @@ -73,6 +79,16 @@ boolean_runtime_param("viommu_verbose", viommu_verbose);
>  
>  #define VVTD_REG_POS(vvtd, offset) &(vvtd->hw.regs[offset/sizeof(uint32_t)])
>  
> +static inline void vvtd_set_bit(struct vvtd *vvtd, uint32_t reg, int nr)
> +{
> +    __set_bit(nr, VVTD_REG_POS(vvtd, reg));
> +}
> +
> +static inline void vvtd_clear_bit(struct vvtd *vvtd, uint32_t reg, int nr)
> +{
> +    __clear_bit(nr, VVTD_REG_POS(vvtd, reg));
> +}
> +
>  static inline void vvtd_set_reg(struct vvtd *vvtd, uint32_t reg, uint32_t value)
>  {
>      *VVTD_REG_POS(vvtd, reg) = value;
> @@ -102,6 +118,52 @@ static void *domain_vvtd(const struct domain *d)
>          return NULL;
>  }
>  
> +static void write_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
> +{
> +    uint64_t irta = vvtd_get_reg_quad(vvtd, DMAR_IRTA_REG);
> +
> +    if ( !(val & DMA_GCMD_SIRTP) )

I think you likely want to do put_gfn here (see my comment below).

> +        return;
> +
> +    /*
> +     * Hardware clears this bit when software sets the SIRTPS field in
> +     * the Global Command register and sets it when hardware completes
> +     * the 'Set Interrupt Remap Table Pointer' operation.
> +     */
> +    vvtd_clear_bit(vvtd, DMAR_GSTS_REG, DMA_GSTS_SIRTPS_SHIFT);
> +
> +    if ( gfn_x(vvtd->hw.irt) != PFN_DOWN(DMA_IRTA_ADDR(irta)) ||
> +         vvtd->hw.irt_max_entry != DMA_IRTA_SIZE(irta) )
> +    {
> +        vvtd->hw.irt = _gfn(PFN_DOWN(DMA_IRTA_ADDR(irta)));

I'm not sure about the usage of this gfn (I guess I will figure out in
further patches), but I think you should probably use get_gfn so that
you take a reference to it. Using PFN_DOWN and _gfn is clearly
defeating the purpose of the whole gfn infrastructure.

Note that you then need to use put_gfn when releasing it.

> +        vvtd->hw.irt_max_entry = DMA_IRTA_SIZE(irta);
> +        vvtd->hw.eim_enabled = !!(irta & IRTA_EIME);
> +        vvtd_info("Update IR info (addr=%lx eim=%d size=%d)\n",
> +                  gfn_x(vvtd->hw.irt), vvtd->hw.eim_enabled,
> +                  vvtd->hw.irt_max_entry);
> +    }
> +    vvtd_set_bit(vvtd, DMAR_GSTS_REG, DMA_GSTS_SIRTPS_SHIFT);
> +}
> +
> +static void vvtd_write_gcmd(struct vvtd *vvtd, uint32_t val)
> +{
> +    uint32_t orig = vvtd_get_reg(vvtd, DMAR_GSTS_REG);
> +    uint32_t changed;
> +
> +    orig = orig & DMA_GCMD_ONE_SHOT_MASK;   /* reset the one-shot bits */
> +    changed = orig ^ val;
> +
> +    if ( !changed )
> +        return;
> +
> +    if ( changed & (changed - 1) )
> +        vvtd_info("Write %x to GCMD (current %x), updating multiple fields",
> +                  val, orig);

I'm not sure I see the purpose of the above message.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 07/28] x86/hvm: Introduce an emulated VTD for HVM
  2018-02-09 16:27   ` Roger Pau Monné
@ 2018-02-09 17:12     ` Chao Gao
  2018-02-12 10:35       ` Roger Pau Monné
  0 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2018-02-09 17:12 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Feb 09, 2018 at 04:27:54PM +0000, Roger Pau Monné wrote:
>On Fri, Nov 17, 2017 at 02:22:14PM +0800, Chao Gao wrote:
>> This patch adds create/destroy functions for the emulated VT-d
>> and adapts it to the common VIOMMU abstraction.
>> 
>> As the Makefile is changed here, take the chance to also sort all of
>> its entries alphabetically.
>> 
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>> 
>> ---
>> v4:
>> - use REGISTER_VIOMMU
>> - shrink the size of hvm_hw_vvtd_regs
>> - make hvm_hw_vvtd_regs a field inside struct vvtd
>> ---
>>  xen/drivers/passthrough/vtd/Makefile |   7 +-
>>  xen/drivers/passthrough/vtd/iommu.h  |   9 +++
>>  xen/drivers/passthrough/vtd/vvtd.c   | 150 +++++++++++++++++++++++++++++++++++
>>  3 files changed, 163 insertions(+), 3 deletions(-)
>>  create mode 100644 xen/drivers/passthrough/vtd/vvtd.c
>> 
>> diff --git a/xen/drivers/passthrough/vtd/Makefile b/xen/drivers/passthrough/vtd/Makefile
>> index f302653..163c7fe 100644
>> --- a/xen/drivers/passthrough/vtd/Makefile
>> +++ b/xen/drivers/passthrough/vtd/Makefile
>> @@ -1,8 +1,9 @@
>>  subdir-$(CONFIG_X86) += x86
>>  
>> -obj-y += iommu.o
>>  obj-y += dmar.o
>> -obj-y += utils.o
>> -obj-y += qinval.o
>>  obj-y += intremap.o
>> +obj-y += iommu.o
>> +obj-y += qinval.o
>>  obj-y += quirks.o
>> +obj-y += utils.o
>> +obj-$(CONFIG_VIOMMU) += vvtd.o
>> diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
>> index db80b31..f2ef3dd 100644
>> --- a/xen/drivers/passthrough/vtd/iommu.h
>> +++ b/xen/drivers/passthrough/vtd/iommu.h
>> @@ -47,6 +47,7 @@
>>  #define DMAR_IQH_REG            0x80 /* invalidation queue head */
>>  #define DMAR_IQT_REG            0x88 /* invalidation queue tail */
>>  #define DMAR_IQA_REG            0x90 /* invalidation queue addr */
>> +#define DMAR_IECTL_REG          0xa0 /* invalidation event control register */
>>  #define DMAR_IRTA_REG           0xb8 /* intr remap */
>>  
>>  #define OFFSET_STRIDE        (9)
>> @@ -89,6 +90,12 @@
>>  #define cap_afl(c)        (((c) >> 3) & 1)
>>  #define cap_ndoms(c)        (1 << (4 + 2 * ((c) & 0x7)))
>>  
>> +#define cap_set_num_fault_regs(c)   ((((uint64_t)(c) - 1) & 0xff) << 40)
>> +#define cap_set_fault_reg_offset(c) ((((uint64_t)(c) / 16) & 0x3ff) << 24)
>> +#define cap_set_mgaw(c)             ((((uint64_t)(c) - 1) & 0x3f) << 16)
>> +#define cap_set_sagaw(c)            (((uint64_t)(c) & 0x1f) << 8)
>> +#define cap_set_ndoms(c)            ((uint64_t)(c) & 0x7)
>> +
>>  /*
>>   * Extended Capability Register
>>   */
>> @@ -114,6 +121,8 @@
>>  #define ecap_niotlb_iunits(e)    ((((e) >> 24) & 0xff) + 1)
>>  #define ecap_iotlb_offset(e)     ((((e) >> 8) & 0x3ff) * 16)
>>  
>> +#define ecap_set_mhmv(e)         (((e) & 0xf) << 20)
>> +
>>  /* IOTLB_REG */
>>  #define DMA_TLB_FLUSH_GRANU_OFFSET  60
>>  #define DMA_TLB_GLOBAL_FLUSH (((u64)1) << 60)
>> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
>> new file mode 100644
>> index 0000000..9f76ccf
>> --- /dev/null
>> +++ b/xen/drivers/passthrough/vtd/vvtd.c
>> @@ -0,0 +1,150 @@
>> +/*
>> + * vvtd.c
>> + *
>> + * virtualize VTD for HVM.
>> + *
>> + * Copyright (C) 2017 Chao Gao, Intel Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms and conditions of the GNU General Public
>> + * License, version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + * General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public
>> + * License along with this program; If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <xen/sched.h>
>> +#include <xen/types.h>
>> +#include <xen/viommu.h>
>> +#include <xen/xmalloc.h>
>> +#include <asm/current.h>
>> +#include <asm/hvm/domain.h>
>> +
>> +#include "iommu.h"
>> +
>> +/* Supported capabilities by vvtd */
>> +#define VVTD_MAX_CAPS VIOMMU_CAP_IRQ_REMAPPING
>> +
>> +#define VVTD_FRCD_NUM   1ULL
>> +#define VVTD_FRCD_START (DMAR_IRTA_REG + 8)
>> +#define VVTD_FRCD_END   (VVTD_FRCD_START + VVTD_FRCD_NUM * 16)
>> +#define VVTD_MAX_OFFSET VVTD_FRCD_END
>> +
>> +struct hvm_hw_vvtd {
>> +    uint32_t regs[VVTD_MAX_OFFSET/sizeof(uint32_t)];
>
>Unless I'm mistaken this is 208 bytes in size, yet you only seem to use
>28 bytes (from the registers used in vvtd_reset). I guess this is going
>to change over the series so that all this space is really needed.

Yes, except that the offset gets bumped up when introducing "queued
invalidation" and "fault reporting", which are two VT-d features.

>
>Also I think this would be better as:
>
>union hw_vvtd {
>    uint32_t regs32[VVTD_MAX_OFFSET/sizeof(uint32_t)];
>    uint64_t regs64[VVTD_MAX_OFFSET/sizeof(uint64_t)];
>};
>

Will do.

>> +};
>> +
>> +struct vvtd {
>> +    /* Base address of remapping hardware register-set */
>> +    uint64_t base_addr;
>> +    /* Point back to the owner domain */
>> +    struct domain *domain;
>> +
>> +    struct hvm_hw_vvtd hw;
>> +};
>> +
>> +/* Setting viommu_verbose enables debugging messages of vIOMMU */
>> +bool __read_mostly viommu_verbose;
>
>static?
>
>> +boolean_runtime_param("viommu_verbose", viommu_verbose);
>> +
>> +#ifndef NDEBUG
>> +#define vvtd_info(fmt...) do {                    \
>> +    if ( viommu_verbose )                         \
>> +        gprintk(XENLOG_INFO, ## fmt);             \
>> +} while(0)
>> +/*
>> + * Use printk and '_G_' prefix because vvtd_debug() may be called
>> + * in the context of another domain's vCPU. Don't output 'current'
>> + * information to avoid confusion.
>> + */
>> +#define vvtd_debug(fmt...) do {                   \
>> +    if ( viommu_verbose && printk_ratelimit())    \
>> +        printk(XENLOG_G_DEBUG fmt);               \
>
>I think printk is already rate-limited if you use _G_, so no need for
>the ratelimit call.
>
>> +} while(0)
>> +#else
>> +#define vvtd_info(...) do {} while(0)
>> +#define vvtd_debug(...) do {} while(0)
>> +#endif
>> +
>> +#define VVTD_REG_POS(vvtd, offset) &(vvtd->hw.regs[offset/sizeof(uint32_t)])
>> +
>> +static inline void vvtd_set_reg(struct vvtd *vvtd, uint32_t reg, uint32_t value)
>
>I don't think you need the vvtd prefix here, and I would leave adding
>inline to the compiler's discretion:
>
>static void set_reg32(struct vvtd *vvtd, unsigned long offset, uint32_t val)
>{
>    vvtd->hw.regs32[offset / 4] = val
>}
>
>But I think you can even get rid of the helper functions and just use
>macros directly, ie:
>
>#define GET_REG(vvtd, offset, size) \
>    ((vvtd)->hw.regs ## size [(offset) / ((size) / 8)])
>#define SET_REG(vvtd, offset, val, size) \
>    (GET_REG(vvtd, offset, size) = (val))
>
>This is better IMHO, and I'm not really sure the SET_REG macro is
>really needed, you can just open code GET_REG(...) = val;

Got it.

>
>> +{
>> +    *VVTD_REG_POS(vvtd, reg) = value;
>> +}
>> +
>> +static inline uint32_t vvtd_get_reg(const struct vvtd *vvtd, uint32_t reg)
>> +{
>> +    return *VVTD_REG_POS(vvtd, reg);
>> +}
>> +
>> +static inline void vvtd_set_reg_quad(struct vvtd *vvtd, uint32_t reg,
>> +                                     uint64_t value)
>> +{
>> +    *(uint64_t*)VVTD_REG_POS(vvtd, reg) = value;
>> +}
>> +
>> +static inline uint64_t vvtd_get_reg_quad(const struct vvtd *vvtd, uint32_t reg)
>> +{
>> +    return *(uint64_t*)VVTD_REG_POS(vvtd, reg);
>> +}
>> +
>> +static void vvtd_reset(struct vvtd *vvtd)
>> +{
>> +    uint64_t cap = cap_set_num_fault_regs(VVTD_FRCD_NUM)
>> +                   | cap_set_fault_reg_offset(VVTD_FRCD_START)
>> +                   | cap_set_mgaw(39) /* maximum guest address width */
>> +                   | cap_set_sagaw(2) /* support 3-level page_table */
>> +                   | cap_set_ndoms(6); /* support 64K domains */
>> +    uint64_t ecap = DMA_ECAP_QUEUED_INVAL | DMA_ECAP_INTR_REMAP | DMA_ECAP_EIM |
>> +                    ecap_set_mhmv(0xf);
>> +
>> +    vvtd_set_reg(vvtd, DMAR_VER_REG, 0x10UL);
>> +    vvtd_set_reg_quad(vvtd, DMAR_CAP_REG, cap);
>> +    vvtd_set_reg_quad(vvtd, DMAR_ECAP_REG, ecap);
>> +    vvtd_set_reg(vvtd, DMAR_FECTL_REG, 0x80000000UL);
>> +    vvtd_set_reg(vvtd, DMAR_IECTL_REG, 0x80000000UL);
>> +}
>> +
>> +static int vvtd_create(struct domain *d, struct viommu *viommu)
>> +{
>> +    struct vvtd *vvtd;
>> +
>> +    if ( !is_hvm_domain(d) || (viommu->base_address & (PAGE_SIZE - 1)) ||
>> +         (~VVTD_MAX_CAPS & viommu->caps) )
>> +        return -EINVAL;
>> +
>> +    vvtd = xzalloc_bytes(sizeof(struct vvtd));
>
>vvtd = xzalloc(struct vvtd);
>
>> +    if ( !vvtd )
>> +        return -ENOMEM;
>> +
>> +    vvtd_reset(vvtd);
>> +    vvtd->base_addr = viommu->base_address;
>
>I think it would be good to have some check here, so that the vIOMMU
>is not for example positioned on top of a RAM region. Ideally you
>should check that the gfns [base_address, base_address + size) are
>unpopulated.

Yes. Besides the checks here, this page should also be reserved in the
guest e820, which implies some work in QEMU or the toolstack.

>
>> +    vvtd->domain = d;
>> +
>> +    viommu->priv = vvtd;
>> +
>> +    return 0;
>> +}
>> +
>> +static int vvtd_destroy(struct viommu *viommu)
>> +{
>> +    struct vvtd *vvtd = viommu->priv;
>> +
>> +    if ( vvtd )
>> +        xfree(vvtd);
>> +
>> +    return 0;
>> +}
>> +
>> +static const struct viommu_ops vvtd_hvm_vmx_ops = {
>
>Is the vmx needed? vvtd is already Intel specific AFAICT. You could
>probably omit the hvm also, so vvtd_ops.

Will do.

Thanks
Chao


* Re: [PATCH v4 10/28] x86/vvtd: Enable Interrupt Remapping through GCMD
  2017-11-17  6:22 ` [PATCH v4 10/28] x86/vvtd: Enable Interrupt Remapping " Chao Gao
@ 2018-02-09 17:15   ` Roger Pau Monné
  2018-02-11  5:05     ` Chao Gao
  0 siblings, 1 reply; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-09 17:15 UTC (permalink / raw)
  To: Chao Gao
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Nov 17, 2017 at 02:22:17PM +0800, Chao Gao wrote:
> Software writes this field to enable/disable interrupt remapping. This
> patch emulates the IRES field of GCMD. Currently, the guest's whole IRT
> is mapped into Xen permanently to reduce interrupt-delivery latency, and
> any old mapping is undone when a new one is set up.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> ---
> v4:
>  - map guest's interrupt remapping table to Xen permanently rather than
>  mapping one specific page on demand.
> ---
>  xen/drivers/passthrough/vtd/iommu.h |  3 +-
>  xen/drivers/passthrough/vtd/vvtd.c  | 98 +++++++++++++++++++++++++++++++++++++
>  2 files changed, 100 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
> index 8579843..9c59aeb 100644
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -161,9 +161,10 @@
>  #define DMA_GSTS_AFLS   (((u64)1) << 28)
>  #define DMA_GSTS_WBFS   (((u64)1) << 27)
>  #define DMA_GSTS_QIES   (((u64)1) <<26)
> +#define DMA_GSTS_IRES_SHIFT     25
> +#define DMA_GSTS_IRES   (((u64)1) << DMA_GSTS_IRES_SHIFT)

We are trying to avoid more use-cases of u64. Also, didn't you clean
that file in a previous patch? Why was this not properly adjusted to
use UL or uint64_t there?

>  #define DMA_GSTS_SIRTPS_SHIFT   24
>  #define DMA_GSTS_SIRTPS (((u64)1) << DMA_GSTS_SIRTPS_SHIFT)
> -#define DMA_GSTS_IRES   (((u64)1) <<25)
>  #define DMA_GSTS_CFIS   (((u64)1) <<23)
>  
>  /* IRTA_REG */
> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
> index f0476fe..06e522a 100644
> --- a/xen/drivers/passthrough/vtd/vvtd.c
> +++ b/xen/drivers/passthrough/vtd/vvtd.c
> @@ -24,6 +24,7 @@
>  #include <xen/xmalloc.h>
>  #include <asm/current.h>
>  #include <asm/hvm/domain.h>
> +#include <asm/p2m.h>
>  
>  #include "iommu.h"
>  
> @@ -37,6 +38,7 @@
>  
>  struct hvm_hw_vvtd {
>      bool eim_enabled;
> +    bool intremap_enabled;
>  
>      /* Interrupt remapping table base gfn and the maximum number of entries */
>      uint16_t irt_max_entry;
> @@ -52,6 +54,7 @@ struct vvtd {
>      struct domain *domain;
>  
>      struct hvm_hw_vvtd hw;
> +    void *irt_base;
>  };
>  
>  /* Setting viommu_verbose enables debugging messages of vIOMMU */
> @@ -118,6 +121,77 @@ static void *domain_vvtd(const struct domain *d)
>          return NULL;
>  }
>  
> +static void *map_guest_pages(struct domain *d, uint64_t gfn, uint32_t nr)
                                                  ^ gfn_t

Also, this function and unmap_guest_pages look generic enough to be
placed somewhere else, like p2m.c maybe?

> +{
> +    mfn_t *mfn = xmalloc_array(mfn_t, nr);
> +    void* ret;
> +    int i;
> +
> +    if ( !mfn )
> +        return NULL;
> +
> +    for ( i = 0; i < nr; i++)
> +    {
> +        struct page_info *p = get_page_from_gfn(d, gfn + i, NULL, P2M_ALLOC);
> +
> +        if ( !p || !get_page_type(p, PGT_writable_page) )
> +        {
> +            if ( p )
> +                put_page(p);
> +            goto undo;
> +        }
> +
> +        mfn[i] = _mfn(page_to_mfn(p));

Please use the type-safe version of page_to_mfn, by adding the
following at the top of the file:

/* Override macros from asm/mm.h to make them work with mfn_t */
#undef mfn_to_page
#define mfn_to_page(mfn) __mfn_to_page(mfn_x(mfn))
#undef page_to_mfn
#define page_to_mfn(pg) _mfn(__page_to_mfn(pg))

> +    }
> +
> +    ret = vmap(mfn, nr);
> +    if ( ret == NULL )
> +        goto undo;
> +    xfree(mfn);

You can move the xfree(mfn) out of the undo label (but keep it after the
put_page loop, which still reads mfn[]).

And since the undo label is just used once, what about doing

    ret = vmap(mfn, nr);
    if ( !ret )
        while ( i-- )
            put_page_and_type(mfn_to_page(mfn_x(mfn[i])));
    xfree(mfn);
    ....

> +
> +    return ret;
> +
> + undo:
> +    for ( ; --i >= 0; )
> +        put_page_and_type(mfn_to_page(mfn_x(mfn[i])));
> +    xfree(mfn);
> +    gprintk(XENLOG_ERR, "Failed to map guest pages %lx nr %x\n", gfn, nr);
> +
> +    return NULL;
> +}
> +
> +static void unmap_guest_pages(void *va, uint32_t nr)
unsigned long please.

> +{
> +    unsigned long *mfn = xmalloc_array(unsigned long, nr);
> +    int i;
> +    void *va_copy = va;
> +
> +    if ( !mfn )
> +    {
> +        printk("%s %d: No free memory\n", __FILE__, __LINE__);
> +        return;
> +    }
> +
> +    for ( i = 0; i < nr; i++, va += PAGE_SIZE)
> +        mfn[i] = domain_page_map_to_mfn(va);
> +
> +    vunmap(va_copy);
> +
> +    for ( i = 0; i < nr; i++)
> +        put_page_and_type(mfn_to_page(mfn[i]));
> +}
> +
> +static void write_gcmd_ire(struct vvtd *vvtd, uint32_t val)
> +{
> +    bool set = val & DMA_GCMD_IRE;
> +
> +    vvtd_info("%sable Interrupt Remapping\n", set ? "En" : "Dis");
> +
> +    vvtd->hw.intremap_enabled = set;
> +    (set ? vvtd_set_bit : vvtd_clear_bit)
> +        (vvtd, DMAR_GSTS_REG, DMA_GSTS_IRES_SHIFT);
> +}
> +
>  static void write_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
>  {
>      uint64_t irta = vvtd_get_reg_quad(vvtd, DMAR_IRTA_REG);
> @@ -131,16 +205,29 @@ static void write_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
>       * the 'Set Interrupt Remap Table Pointer' operation.
>       */
>      vvtd_clear_bit(vvtd, DMAR_GSTS_REG, DMA_GSTS_SIRTPS_SHIFT);
> +    if ( vvtd->hw.intremap_enabled )
> +        vvtd_info("Update Interrupt Remapping Table when active\n");
>  
>      if ( gfn_x(vvtd->hw.irt) != PFN_DOWN(DMA_IRTA_ADDR(irta)) ||
>           vvtd->hw.irt_max_entry != DMA_IRTA_SIZE(irta) )
>      {
> +        if ( vvtd->irt_base )
> +        {
> +            unmap_guest_pages(vvtd->irt_base,
> +                              PFN_UP(vvtd->hw.irt_max_entry *
> +                                     sizeof(struct iremap_entry)));
> +            vvtd->irt_base = NULL;
> +        }

Shouldn't this be done when sirtp is switched off, instead of when
it's updated?

What happens in the following scenario:

- Guest writes gfn to irta.
- Guest enables sirtps.
- Guest disables sirtps.
- Guest tries to balloon out the page used in irta.

Thanks, Roger.


* Re: [PATCH v4 08/28] x86/vvtd: Add MMIO handler for VVTD
  2018-02-09 16:39   ` Roger Pau Monné
@ 2018-02-09 17:21     ` Chao Gao
  2018-02-09 17:51       ` Roger Pau Monné
  0 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2018-02-09 17:21 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Feb 09, 2018 at 04:39:15PM +0000, Roger Pau Monné wrote:
>On Fri, Nov 17, 2017 at 02:22:15PM +0800, Chao Gao wrote:
>> This patch adds VVTD MMIO handler to deal with MMIO access.
>> 
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>> ---
>> v4:
>>  - only trap the register emulated in vvtd_in_range().
>>    i.e. replace PAGE_SIZE with the VVTD_MAX_OFFSET
>> ---
>>  xen/drivers/passthrough/vtd/vvtd.c | 55 ++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 55 insertions(+)
>> 
>> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
>> index 9f76ccf..d78d878 100644
>> --- a/xen/drivers/passthrough/vtd/vvtd.c
>> +++ b/xen/drivers/passthrough/vtd/vvtd.c
>
>Now that I look at this, this is the wrong folder. This should be in
>xen/arch/x86/hvm with the rest of the emulated devices.

This is something we discussed in previous versions. AMD puts its vIOMMU
(iommu_guest.c) in xen/drivers/passthrough/amd/, and we followed what
they did. I don't have a strong preference on this. If no one objects to
your suggestion, I will move it to xen/arch/x86/hvm/, and maybe create a
new intel directory since it's Intel-specific and won't be used by AMD.

>
>> @@ -94,6 +94,60 @@ static inline uint64_t vvtd_get_reg_quad(const struct vvtd *vvtd, uint32_t reg)
>>      return *(uint64_t*)VVTD_REG_POS(vvtd, reg);
>>  }
>>  
>> +static void *domain_vvtd(const struct domain *d)
>> +{
>> +    if ( is_hvm_domain(d) && d->arch.hvm_domain.viommu )
>
>hvm_mmio_ops is only used by HVM guests, so the is_hvm_domain check
>here is redundant. At which point the helper can be simplified as:
>
>static struct vvtd *domain_vvtd(const struct domain *d)
>{
>    return d->arch.hvm_domain.viommu ? d->arch.hvm_domain.viommu->priv : NULL;
>}
>

Got it.

Thanks
Chao

>> +        return d->arch.hvm_domain.viommu->priv;
>> +    else
>> +        return NULL;
>> +}
>> +
>> +static int vvtd_in_range(struct vcpu *v, unsigned long addr)
>> +{
>> +    struct vvtd *vvtd = domain_vvtd(v->domain);
>const
>
>> +
>> +    if ( vvtd )
>> +        return (addr >= vvtd->base_addr) &&
>> +               (addr < vvtd->base_addr + VVTD_MAX_OFFSET);
>> +    return 0;
>> +}
>> +
>> +static int vvtd_read(struct vcpu *v, unsigned long addr,
>> +                     unsigned int len, unsigned long *pval)
>> +{
>> +    struct vvtd *vvtd = domain_vvtd(v->domain);
>const
>
>Thanks, Roger.


* Re: [PATCH v4 11/28] x86/vvtd: Process interrupt remapping request
  2017-11-17  6:22 ` [PATCH v4 11/28] x86/vvtd: Process interrupt remapping request Chao Gao
@ 2018-02-09 17:44   ` Roger Pau Monné
  2018-02-11  5:31     ` Chao Gao
  0 siblings, 1 reply; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-09 17:44 UTC (permalink / raw)
  To: Chao Gao
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Nov 17, 2017 at 02:22:18PM +0800, Chao Gao wrote:
> When a remappable interrupt request arrives, remapping hardware computes
> the interrupt_index per the algorithm described in the VT-d spec section
> "Interrupt Remapping Table", interprets the IRTE and generates a remapped
> interrupt request.
> 
> This patch introduces viommu_handle_irq_request() to emulate how
> remapping hardware handles a remapping interrupt request. It also
> introduces a counter, inflight_intr, which counts the number of
> interrupts currently being handled. This counter is needed because VT-d
> hardware must drain in-flight interrupts before setting flags that
> indicate certain operations have completed; these operations include
> enabling interrupt remapping and some kinds of invalidation requests.
> In vvtd, we likewise drain in-flight interrupts by waiting until
> inflight_intr drops to 0.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> 
> ---
> v4:
>  - use "#define" to define interrupt remapping transition faults
>  rather than using an enum
>  - use switch-case rather than if-else in irq_remapping_request_index()
>  and vvtd_irq_request_sanity_check()
>  - introduce a counter inflight_intr
> 
> v3:
>  - Encode map_guest_page()'s error into void* to avoid using another parameter
> ---
>  xen/drivers/passthrough/vtd/iommu.h |  15 +++
>  xen/drivers/passthrough/vtd/vvtd.c  | 219 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 234 insertions(+)
> 
> diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
> index 9c59aeb..82edd2a 100644
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -216,6 +216,15 @@
>  #define dma_frcd_source_id(c) (c & 0xffff)
>  #define dma_frcd_page_addr(d) (d & (((u64)-1) << 12)) /* low 64 bit */
>  
> +/* Interrupt remapping transition faults */
> +#define VTD_FR_IR_REQ_RSVD      0x20
> +#define VTD_FR_IR_INDEX_OVER    0x21
> +#define VTD_FR_IR_ENTRY_P       0x22
> +#define VTD_FR_IR_ROOT_INVAL    0x23
> +#define VTD_FR_IR_IRTE_RSVD     0x24
> +#define VTD_FR_IR_REQ_COMPAT    0x25
> +#define VTD_FR_IR_SID_ERR       0x26
> +
>  /*
>   * 0: Present
>   * 1-11: Reserved
> @@ -356,6 +365,12 @@ struct iremap_entry {
>  };
>  
>  /*
> + * When VT-d doesn't enable extended interrupt mode, hardware interprets
> + * only bits 15:8 of the Destination-ID field in the IRTEs.
> + */
> +#define IRTE_xAPIC_DEST_MASK 0xff00
> +
> +/*
>   * Posted-interrupt descriptor address is 64 bits wide and 64-byte aligned;
>   * only the upper 26 bits of the least significant 32 bits are available.
>   */
> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
> index 06e522a..927e715 100644
> --- a/xen/drivers/passthrough/vtd/vvtd.c
> +++ b/xen/drivers/passthrough/vtd/vvtd.c
> @@ -22,11 +22,15 @@
>  #include <xen/types.h>
>  #include <xen/viommu.h>
>  #include <xen/xmalloc.h>
> +#include <asm/apic.h>
>  #include <asm/current.h>
> +#include <asm/event.h>
> +#include <asm/io_apic.h>
>  #include <asm/hvm/domain.h>
>  #include <asm/p2m.h>
>  
>  #include "iommu.h"
> +#include "vtd.h"
>  
>  /* Supported capabilities by vvtd */
>  #define VVTD_MAX_CAPS VIOMMU_CAP_IRQ_REMAPPING
> @@ -52,6 +56,8 @@ struct vvtd {
>      uint64_t base_addr;
>      /* Point back to the owner domain */
>      struct domain *domain;
> +    /* # of in-flight interrupts */
> +    atomic_t inflight_intr;
>  
>      struct hvm_hw_vvtd hw;
>      void *irt_base;
> @@ -181,6 +187,109 @@ static void unmap_guest_pages(void *va, uint32_t nr)
>          put_page_and_type(mfn_to_page(mfn[i]));
>  }
>  
> +static int vvtd_delivery(struct domain *d, uint8_t vector,
> +                         uint32_t dest, bool dest_mode,
> +                         uint8_t delivery_mode, uint8_t trig_mode)
> +{
> +    struct vlapic *target;
> +    struct vcpu *v;
> +
> +    switch ( delivery_mode )
> +    {
> +    case dest_LowestPrio:
> +        target = vlapic_lowest_prio(d, NULL, 0, dest, dest_mode);
> +        if ( target != NULL )
> +        {
> +            vvtd_debug("d%d: dest=v%d dlm=%x vector=%d trig_mode=%d\n",
> +                       vlapic_domain(target)->domain_id,
> +                       vlapic_vcpu(target)->vcpu_id,
> +                       delivery_mode, vector, trig_mode);
> +            vlapic_set_irq(target, vector, trig_mode);
> +            break;
> +        }
> +        vvtd_debug("d%d: null round robin: vector=%02x\n",
> +                   d->domain_id, vector);
> +        break;
> +
> +    case dest_Fixed:
> +        for_each_vcpu ( d, v )
> +            if ( vlapic_match_dest(vcpu_vlapic(v), NULL, 0, dest, dest_mode) )
> +            {
> +                vvtd_debug("d%d: dest=v%d dlm=%x vector=%d trig_mode=%d\n",
> +                           v->domain->domain_id, v->vcpu_id,
> +                           delivery_mode, vector, trig_mode);
> +                vlapic_set_irq(vcpu_vlapic(v), vector, trig_mode);
> +            }
> +        break;
> +
> +    case dest_NMI:
> +        for_each_vcpu ( d, v )
> +            if ( vlapic_match_dest(vcpu_vlapic(v), NULL, 0, dest, dest_mode) &&
> +                 !test_and_set_bool(v->nmi_pending) )
> +                vcpu_kick(v);

Doing these loops here seems quite bad from a performance PoV,
especially taking into account that this code is going to be used with
> 128 vCPUs.

> +        break;
> +
> +    default:
> +        gdprintk(XENLOG_WARNING, "Unsupported VTD delivery mode %d\n",
> +                 delivery_mode);
> +        return -EINVAL;
> +    }
> +
> +    return 0;
> +}
> +
> +/*
> + * Compute the IRTE index for a given interrupt request. On success, return
> + * 0 and set index to reference the corresponding IRTE. Otherwise, return < 0,
> + * i.e. -1 when the irq request isn't in remappable format.
> + */
> +static int irq_remapping_request_index(
> +    const struct arch_irq_remapping_request *irq, uint32_t *index)
> +{
> +    switch ( irq->type )
> +    {
> +    case VIOMMU_REQUEST_IRQ_MSI:
> +    {
> +        struct msi_msg_remap_entry msi_msg =
> +        {
> +            .address_lo = { .val = irq->msg.msi.addr },

Can't you just use .address_lo.val = irq->...

> +            .data = irq->msg.msi.data,
> +        };
> +
> +        if ( !msi_msg.address_lo.format )
> +            return -1;

In all the other functions you already return some kind of meaningful
error code, please do so here also.

> +
> +        *index = (msi_msg.address_lo.index_15 << 15) +
> +                msi_msg.address_lo.index_0_14;
> +        if ( msi_msg.address_lo.SHV )
> +            *index += (uint16_t)msi_msg.data;
> +        break;
> +    }
> +
> +    case VIOMMU_REQUEST_IRQ_APIC:
> +    {
> +        struct IO_APIC_route_remap_entry remap_rte = { .val = irq->msg.rte };
> +
> +        if ( !remap_rte.format )
> +            return -1;
> +
> +        *index = (remap_rte.index_15 << 15) + remap_rte.index_0_14;
> +        break;
> +    }
> +
> +    default:
> +        ASSERT_UNREACHABLE();
> +    }
> +
> +    return 0;
> +}
> +
> +static inline uint32_t irte_dest(struct vvtd *vvtd, uint32_t dest)
> +{
> +    /* In xAPIC mode, only bits 15:8 are valid */
> +    return vvtd->hw.eim_enabled ? dest
> +                                : MASK_EXTR(dest, IRTE_xAPIC_DEST_MASK);
> +}
> +
>  static void write_gcmd_ire(struct vvtd *vvtd, uint32_t val)
>  {
>      bool set = val & DMA_GCMD_IRE;
> @@ -323,6 +432,115 @@ static const struct hvm_mmio_ops vvtd_mmio_ops = {
>      .write = vvtd_write
>  };
>  
> +static void vvtd_handle_fault(struct vvtd *vvtd,
> +                              const struct arch_irq_remapping_request *irq,
> +                              struct iremap_entry *irte,
> +                              unsigned int fault)
> +{
> +    switch ( fault )
> +    {
> +    case VTD_FR_IR_SID_ERR:
> +    case VTD_FR_IR_IRTE_RSVD:
> +    case VTD_FR_IR_ENTRY_P:
> +        if ( qinval_fault_disable(*irte) )
> +            break;
> +    /* fall through */
> +    case VTD_FR_IR_REQ_RSVD:
> +    case VTD_FR_IR_INDEX_OVER:
> +    case VTD_FR_IR_ROOT_INVAL:
> +        /* TODO: handle fault (e.g. record and report this fault to VM */
> +        break;
> +
> +    default:
> +        vvtd_debug("d%d can't handle VT-d fault %x\n", vvtd->domain->domain_id,
> +                   fault);
> +    }
> +    return;
> +}
> +
> +static bool vvtd_irq_request_sanity_check(const struct vvtd *vvtd,
> +                                   const struct arch_irq_remapping_request *irq)
> +{
> +    switch ( irq->type )
> +    {
> +    case VIOMMU_REQUEST_IRQ_APIC:
> +    {
> +        struct IO_APIC_route_remap_entry rte = { .val = irq->msg.rte };
> +
> +        return !rte.reserved;
> +    }
> +
> +    case VIOMMU_REQUEST_IRQ_MSI:
> +        return true;
> +    }
> +
> +    ASSERT_UNREACHABLE();
> +    return false;
> +}
> +
> +static int vvtd_get_entry(struct vvtd *vvtd,
> +                          const struct arch_irq_remapping_request *irq,
> +                          struct iremap_entry *dest)
const for both vvtd and dest?

> +{
> +    uint32_t entry;
> +    struct iremap_entry irte;
> +    int ret = irq_remapping_request_index(irq, &entry);
> +
> +    ASSERT(!ret);
> +
> +    vvtd_debug("d%d: interpret a request with index %x\n",
> +               vvtd->domain->domain_id, entry);
> +
> +    if ( !vvtd_irq_request_sanity_check(vvtd, irq) )
> +        return VTD_FR_IR_REQ_RSVD;
> +    else if ( entry > vvtd->hw.irt_max_entry )
> +        return VTD_FR_IR_INDEX_OVER;
> +    else if ( !vvtd->irt_base )

No need for the 'else', since you are already using return.

> +        return VTD_FR_IR_ROOT_INVAL;
> +
> +    irte = ((struct iremap_entry*)vvtd->irt_base)[entry];
> +
> +    if ( !qinval_present(irte) )
> +        ret = VTD_FR_IR_ENTRY_P;
> +    else if ( (irte.remap.res_1 || irte.remap.res_2 || irte.remap.res_3 ||
> +               irte.remap.res_4) )
> +        ret = VTD_FR_IR_IRTE_RSVD;
> +
> +    /* FIXME: We don't check against the source ID */
> +
> +    dest->val = irte.val;
> +
> +    return ret;
> +}
> +
> +static int vvtd_handle_irq_request(const struct domain *d,

constifying domain here is not the best practice IMHO. In the function
you are actually modifying vvtd, which is fine because it's a pointer,
but vvtd is conceptually part of the domain.

> +                                   const struct arch_irq_remapping_request *irq)
> +{
> +    struct iremap_entry irte;
> +    int ret;
> +    struct vvtd *vvtd = domain_vvtd(d);
> +
> +    if ( !vvtd || !vvtd->hw.intremap_enabled )
> +        return -ENODEV;
> +
> +    atomic_inc(&vvtd->inflight_intr);
> +    ret = vvtd_get_entry(vvtd, irq, &irte);
> +    if ( ret )
> +    {
> +        vvtd_handle_fault(vvtd, irq, &irte, ret);
> +        goto out;
> +    }
> +
> +    ret = vvtd_delivery(vvtd->domain, irte.remap.vector,
> +                        irte_dest(vvtd, irte.remap.dst),
> +                        irte.remap.dm, irte.remap.dlm,
> +                        irte.remap.tm);
> +
> + out:
> +    atomic_dec(&vvtd->inflight_intr);

So inflight_intr seems to be quite pointless: you only use it in this
function and it's never read AFAICT.
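For reference, the intended use of such a counter (per the commit message) is to let an invalidation or enable path wait for in-flight interrupts to drain before reporting completion. A minimal standalone sketch with illustrative names (not the actual vvtd API):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative sketch only: the interrupt handler brackets delivery
 * with an atomic counter, and a drain helper checks that no interrupt
 * is currently being handled. */
static atomic_int inflight_intr;

static void handle_irq_request(void)
{
    atomic_fetch_add(&inflight_intr, 1);
    /* ... interpret the IRTE and deliver the interrupt ... */
    atomic_fetch_sub(&inflight_intr, 1);
}

/* An invalidation or IRE-enable path would wait on this before
 * setting the completion flag visible to the guest. */
static bool interrupts_drained(void)
{
    return atomic_load(&inflight_intr) == 0;
}
```

The point of the idiom is that the counter is only useful if some path actually reads it, which is what the review comment above is getting at.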

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 08/28] x86/vvtd: Add MMIO handler for VVTD
  2018-02-09 17:21     ` Chao Gao
@ 2018-02-09 17:51       ` Roger Pau Monné
  2018-02-22  6:20         ` Chao Gao
  0 siblings, 1 reply; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-09 17:51 UTC (permalink / raw)
  To: xen-devel, Tim Deegan, Stefano Stabellini, Jan Beulich, Wei Liu,
	Ian Jackson, George Dunlap, Konrad Rzeszutek Wilk, Andrew Cooper,
	Kevin Tian, Lan Tianyu

On Sat, Feb 10, 2018 at 01:21:09AM +0800, Chao Gao wrote:
> On Fri, Feb 09, 2018 at 04:39:15PM +0000, Roger Pau Monné wrote:
> >On Fri, Nov 17, 2017 at 02:22:15PM +0800, Chao Gao wrote:
> >> This patch adds VVTD MMIO handler to deal with MMIO access.
> >> 
> >> Signed-off-by: Chao Gao <chao.gao@intel.com>
> >> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> >> ---
> >> v4:
> >>  - only trap the register emulated in vvtd_in_range().
> >>    i.e. replace PAGE_SIZE with the VVTD_MAX_OFFSET
> >> ---
> >>  xen/drivers/passthrough/vtd/vvtd.c | 55 ++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 55 insertions(+)
> >> 
> >> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
> >> index 9f76ccf..d78d878 100644
> >> --- a/xen/drivers/passthrough/vtd/vvtd.c
> >> +++ b/xen/drivers/passthrough/vtd/vvtd.c
> >
> >Now that I look at this, this is the wrong folder. This should be in
> >xen/arch/x86/hvm with the rest of the emulated devices.
> 
> It is a problem we discussed in previous versions. AMD puts its vIOMMU
> (iommu_guest.c) in xen/drivers/passthrough/amd/, and we followed what
> they did. I have no strong preference here. If no one objects to your
> suggestion, I will move it to xen/arch/x86/hvm/, maybe creating a new
> intel directory since it's Intel-specific and won't be used by AMD.

Oh, it's been quite some time since I've reviewed that, so TBH I
didn't remember that discussion.

If the AMD viommu thing is already there I guess it doesn't hurt...
Also, have you checked whether it can be converted to use the
infrastructure that you add here?

Both should really be using the same interface.

Thanks, Roger.


* Re: [PATCH v4 09/28] x86/vvtd: Set Interrupt Remapping Table Pointer through GCMD
  2018-02-09 16:59   ` Roger Pau Monné
@ 2018-02-11  4:34     ` Chao Gao
  2018-02-11  5:09       ` Chao Gao
  2018-02-12 11:25       ` Roger Pau Monné
  0 siblings, 2 replies; 83+ messages in thread
From: Chao Gao @ 2018-02-11  4:34 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Feb 09, 2018 at 04:59:11PM +0000, Roger Pau Monné wrote:
>On Fri, Nov 17, 2017 at 02:22:16PM +0800, Chao Gao wrote:
>> Software sets SIRTP field of GCMD to set/update the interrupt remapping
>> table pointer used by hardware. The interrupt remapping table pointer is
>> specified through the Interrupt Remapping Table Address (IRTA_REG)
>> register.
>> 
>> This patch emulates this operation and adds some new fields in VVTD to track
>> info (e.g. the table's gfn and max supported entries) of interrupt remapping
>> table.
>> 
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>> 
>> ---
>> v4:
>>  - declare eim_enabled as bool and irt as gfn_t
>>  - rename vvtd_handle_gcmd_sirtp() to write_gcmd_sirtp()
>> 
>> v3:
>>  - ignore unaligned r/w of vt-d hardware registers and return X86EMUL_OK
>> ---
>>  xen/drivers/passthrough/vtd/iommu.h | 16 ++++++-
>>  xen/drivers/passthrough/vtd/vvtd.c  | 86 +++++++++++++++++++++++++++++++++++++
>>  2 files changed, 100 insertions(+), 2 deletions(-)
>> 
>> diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
>> index f2ef3dd..8579843 100644
>> --- a/xen/drivers/passthrough/vtd/iommu.h
>> +++ b/xen/drivers/passthrough/vtd/iommu.h
>> @@ -48,7 +48,8 @@
>>  #define DMAR_IQT_REG            0x88 /* invalidation queue tail */
>>  #define DMAR_IQA_REG            0x90 /* invalidation queue addr */
>>  #define DMAR_IECTL_REG          0xa0 /* invalidation event control register */
>> -#define DMAR_IRTA_REG           0xb8 /* intr remap */
>> +#define DMAR_IRTA_REG           0xb8 /* base address of intr remap table */
>> +#define DMAR_IRTUA_REG          0xbc /* upper address of intr remap table */
>>  
>>  #define OFFSET_STRIDE        (9)
>>  #define dmar_readl(dmar, reg) readl((dmar) + (reg))
>> @@ -150,6 +151,9 @@
>>  #define DMA_GCMD_SIRTP  (((u64)1) << 24)
>>  #define DMA_GCMD_CFI    (((u64)1) << 23)
>>  
>> +/* mask of one-shot bits */
>> +#define DMA_GCMD_ONE_SHOT_MASK 0x96ffffff
>> +
>>  /* GSTS_REG */
>>  #define DMA_GSTS_TES    (((u64)1) << 31)
>>  #define DMA_GSTS_RTPS   (((u64)1) << 30)
>> @@ -157,10 +161,18 @@
>>  #define DMA_GSTS_AFLS   (((u64)1) << 28)
>>  #define DMA_GSTS_WBFS   (((u64)1) << 27)
>>  #define DMA_GSTS_QIES   (((u64)1) <<26)
>> +#define DMA_GSTS_SIRTPS_SHIFT   24
>> +#define DMA_GSTS_SIRTPS (((u64)1) << DMA_GSTS_SIRTPS_SHIFT)
>>  #define DMA_GSTS_IRES   (((u64)1) <<25)
>> -#define DMA_GSTS_SIRTPS (((u64)1) << 24)
>>  #define DMA_GSTS_CFIS   (((u64)1) <<23)
>>  
>> +/* IRTA_REG */
>> +/* The base of 4KB aligned interrupt remapping table */
>> +#define DMA_IRTA_ADDR(val)      ((val) & ~0xfffULL)
>> +/* The size of remapping table is 2^(x+1), where x is the size field in IRTA */
>> +#define DMA_IRTA_S(val)         (val & 0xf)
>> +#define DMA_IRTA_SIZE(val)      (1UL << (DMA_IRTA_S(val) + 1))
>> +
>>  /* PMEN_REG */
>>  #define DMA_PMEN_EPM    (((u32)1) << 31)
>>  #define DMA_PMEN_PRS    (((u32)1) << 0)
>> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
>> index d78d878..f0476fe 100644
>> --- a/xen/drivers/passthrough/vtd/vvtd.c
>> +++ b/xen/drivers/passthrough/vtd/vvtd.c
>> @@ -36,6 +36,12 @@
>>  #define VVTD_MAX_OFFSET VVTD_FRCD_END
>>  
>>  struct hvm_hw_vvtd {
>> +    bool eim_enabled;
>> +
>> +    /* Interrupt remapping table base gfn and the max of entries */
>> +    uint16_t irt_max_entry;
>> +    gfn_t irt;
>> +
>>      uint32_t regs[VVTD_MAX_OFFSET/sizeof(uint32_t)];
>>  };
>>  
>> @@ -73,6 +79,16 @@ boolean_runtime_param("viommu_verbose", viommu_verbose);
>>  
>>  #define VVTD_REG_POS(vvtd, offset) &(vvtd->hw.regs[offset/sizeof(uint32_t)])
>>  
>> +static inline void vvtd_set_bit(struct vvtd *vvtd, uint32_t reg, int nr)
>> +{
>> +    __set_bit(nr, VVTD_REG_POS(vvtd, reg));
>> +}
>> +
>> +static inline void vvtd_clear_bit(struct vvtd *vvtd, uint32_t reg, int nr)
>> +{
>> +    __clear_bit(nr, VVTD_REG_POS(vvtd, reg));
>> +}
>> +
>>  static inline void vvtd_set_reg(struct vvtd *vvtd, uint32_t reg, uint32_t value)
>>  {
>>      *VVTD_REG_POS(vvtd, reg) = value;
>> @@ -102,6 +118,52 @@ static void *domain_vvtd(const struct domain *d)
>>          return NULL;
>>  }
>>  
>> +static void write_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
>> +{
>> +    uint64_t irta = vvtd_get_reg_quad(vvtd, DMAR_IRTA_REG);
>> +
>> +    if ( !(val & DMA_GCMD_SIRTP) )
>
>I think you likely want to do put_gfn here (see my comment below).
>
>> +        return;
>> +
>> +    /*
>> +     * Hardware clears this bit when software sets the SIRTPS field in
>> +     * the Global Command register and sets it when hardware completes
>> +     * the 'Set Interrupt Remap Table Pointer' operation.
>> +     */
>> +    vvtd_clear_bit(vvtd, DMAR_GSTS_REG, DMA_GSTS_SIRTPS_SHIFT);
>> +
>> +    if ( gfn_x(vvtd->hw.irt) != PFN_DOWN(DMA_IRTA_ADDR(irta)) ||
>> +         vvtd->hw.irt_max_entry != DMA_IRTA_SIZE(irta) )
>> +    {
>> +        vvtd->hw.irt = _gfn(PFN_DOWN(DMA_IRTA_ADDR(irta)));
>
>I'm not sure about the usage of this gfn (I guess I will figure out in
>further patches), but I think you should probably use get_gfn so that
>you take a reference to it. Using PFN_DOWN and _gfn is clearly
>defeating the purpose of the whole gfn infrastructure.
>
>Note that you then need to use put_gfn when releasing it.

The steps to enable interrupt remapping are:
1. write to IRTA. Software should write the physical address of the
interrupt remapping table to this register.
2. write GCMD with SIRTP set. According to VT-d spec 10.4.4, software
sets SIRTP to set/update the interrupt remapping table pointer used by
hardware.
3. write GCMD with IRE set.

In this version, we get a reference in step 3 (in the next patch, through
map/unmap of the guest IRT) rather than in step 2. The benefit is that
when the guest tries to write SIRTP many times before enabling interrupt
remapping, vvtd doesn't need to map/unmap the guest IRT each time.
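The sequence above can be sketched as a tiny state machine. This is purely illustrative (hypothetical structure and helper names, simplified register semantics), not the vvtd code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions match the GCMD layout quoted in the patch. */
#define DMA_GCMD_SIRTP  (1u << 24)
#define DMA_GCMD_IRE    (1u << 25)

struct vvtd_sketch {
    uint64_t irta;          /* step 1: guest-written table address */
    uint64_t irt_mapped;    /* pointer latched (and mapped) on SIRTP */
    bool ir_enabled;        /* set on IRE */
};

/* Guest writes IRTA_REG (step 1). */
static void write_irta(struct vvtd_sketch *v, uint64_t val)
{
    v->irta = val;
}

/* Guest writes GCMD (steps 2 and 3). */
static void write_gcmd(struct vvtd_sketch *v, uint32_t val)
{
    if ( val & DMA_GCMD_SIRTP )
        v->irt_mapped = v->irta & ~0xfffULL;  /* latch/map the 4KB-aligned table */
    if ( val & DMA_GCMD_IRE )
        v->ir_enabled = true;                 /* remapping live from now on */
}
```

A guest driver following VT-d 10.4.4 would thus do `write_irta()`, then `write_gcmd(DMA_GCMD_SIRTP)`, then `write_gcmd(DMA_GCMD_IRE)` as three separate register writes.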

>
>> +        vvtd->hw.irt_max_entry = DMA_IRTA_SIZE(irta);
>> +        vvtd->hw.eim_enabled = !!(irta & IRTA_EIME);
>> +        vvtd_info("Update IR info (addr=%lx eim=%d size=%d)\n",
>> +                  gfn_x(vvtd->hw.irt), vvtd->hw.eim_enabled,
>> +                  vvtd->hw.irt_max_entry);
>> +    }
>> +    vvtd_set_bit(vvtd, DMAR_GSTS_REG, DMA_GSTS_SIRTPS_SHIFT);
>> +}
>> +
>> +static void vvtd_write_gcmd(struct vvtd *vvtd, uint32_t val)
>> +{
>> +    uint32_t orig = vvtd_get_reg(vvtd, DMAR_GSTS_REG);
>> +    uint32_t changed;
>> +
>> +    orig = orig & DMA_GCMD_ONE_SHOT_MASK;   /* reset the one-shot bits */
>> +    changed = orig ^ val;
>> +
>> +    if ( !changed )
>> +        return;
>> +
>> +    if ( changed & (changed - 1) )
>> +        vvtd_info("Write %x to GCMD (current %x), updating multiple fields",
>> +                  val, orig);
>
>I'm not sure I see the purpose of the above message.

I will remove this. My original thought was that we could warn when the
guest driver doesn't completely follow VT-d spec 10.4.4:
if multiple control fields in this register need to be modified,
software must serialize the modifications through multiple writes to this
register.
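As an aside, the `changed & (changed - 1)` test used in the patch is the standard idiom for detecting that more than one bit differs; a minimal standalone sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when more than one control field (bit) differs between
 * the current status and the value being written, i.e. when the guest
 * tries to update multiple GCMD fields in a single write. */
static bool multiple_fields_changed(uint32_t status, uint32_t val)
{
    uint32_t changed = status ^ val;

    /* Clearing the lowest set bit leaves a non-zero value iff at
     * least two bits were set in 'changed'. */
    return changed & (changed - 1);
}
```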

Thanks
Chao


* Re: [PATCH v4 10/28] x86/vvtd: Enable Interrupt Remapping through GCMD
  2018-02-09 17:15   ` Roger Pau Monné
@ 2018-02-11  5:05     ` Chao Gao
  2018-02-12 11:30       ` Roger Pau Monné
  0 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2018-02-11  5:05 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Feb 09, 2018 at 05:15:17PM +0000, Roger Pau Monné wrote:
>On Fri, Nov 17, 2017 at 02:22:17PM +0800, Chao Gao wrote:
>> Software writes this field to enable/disable interrupt remapping. This
>> patch emulates the IRES field of GCMD. Currently, the guest's whole IRT is
>> mapped into Xen permanently to reduce interrupt delivery latency, and the
>> old mapping, if present, is undone when setting up a new one.
>> 
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>> ---
>> v4:
>>  - map guest's interrupt remapping table to Xen permanently rather than
>>  mapping one specific page on demand.
>> ---
>>  xen/drivers/passthrough/vtd/iommu.h |  3 +-
>>  xen/drivers/passthrough/vtd/vvtd.c  | 98 +++++++++++++++++++++++++++++++++++++
>>  2 files changed, 100 insertions(+), 1 deletion(-)
>> 
>> diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
>> index 8579843..9c59aeb 100644
>> --- a/xen/drivers/passthrough/vtd/iommu.h
>> +++ b/xen/drivers/passthrough/vtd/iommu.h
>> @@ -161,9 +161,10 @@
>>  #define DMA_GSTS_AFLS   (((u64)1) << 28)
>>  #define DMA_GSTS_WBFS   (((u64)1) << 27)
>>  #define DMA_GSTS_QIES   (((u64)1) <<26)
>> +#define DMA_GSTS_IRES_SHIFT     25
>> +#define DMA_GSTS_IRES   (((u64)1) << DMA_GSTS_IRES_SHIFT)
>
>We are trying to avoid more use-cases of u64. Also, didn't you clean
>that file in a previous patch? Why was this not properly adjusted to
>use UL or uint64_t there?

Yes. I did. I will do some cleanup and put all cleanup in one patch.

>
>>  #define DMA_GSTS_SIRTPS_SHIFT   24
>>  #define DMA_GSTS_SIRTPS (((u64)1) << DMA_GSTS_SIRTPS_SHIFT)
>> -#define DMA_GSTS_IRES   (((u64)1) <<25)
>>  #define DMA_GSTS_CFIS   (((u64)1) <<23)
>>  
>>  /* IRTA_REG */
>> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
>> index f0476fe..06e522a 100644
>> --- a/xen/drivers/passthrough/vtd/vvtd.c
>> +++ b/xen/drivers/passthrough/vtd/vvtd.c
>> @@ -24,6 +24,7 @@
>>  #include <xen/xmalloc.h>
>>  #include <asm/current.h>
>>  #include <asm/hvm/domain.h>
>> +#include <asm/p2m.h>
>>  
>>  #include "iommu.h"
>>  
>> @@ -37,6 +38,7 @@
>>  
>>  struct hvm_hw_vvtd {
>>      bool eim_enabled;
>> +    bool intremap_enabled;
>>  
>>      /* Interrupt remapping table base gfn and the max of entries */
>>      uint16_t irt_max_entry;
>> @@ -52,6 +54,7 @@ struct vvtd {
>>      struct domain *domain;
>>  
>>      struct hvm_hw_vvtd hw;
>> +    void *irt_base;
>>  };
>>  
>>  /* Setting viommu_verbose enables debugging messages of vIOMMU */
>> @@ -118,6 +121,77 @@ static void *domain_vvtd(const struct domain *d)
>>          return NULL;
>>  }
>>  
>> +static void *map_guest_pages(struct domain *d, uint64_t gfn, uint32_t nr)
>                                                  ^ gfn_t
>
>Also, this function and unmap_guest_pages look generic enough to be
>placed somewhere else, like p2m.c maybe?

Ok. will do.

>
>> +{
>> +    mfn_t *mfn = xmalloc_array(mfn_t, nr);
>> +    void* ret;
>> +    int i;
>> +
>> +    if ( !mfn )
>> +        return NULL;
>> +
>> +    for ( i = 0; i < nr; i++)
>> +    {
>> +        struct page_info *p = get_page_from_gfn(d, gfn + i, NULL, P2M_ALLOC);
>> +
>> +        if ( !p || !get_page_type(p, PGT_writable_page) )
>> +        {
>> +            if ( p )
>> +                put_page(p);
>> +            goto undo;
>> +        }
>> +
>> +        mfn[i] = _mfn(page_to_mfn(p));
>
>Please use the type-safe version of page_to_mfn, by adding the
>following at the top of the file:
>
>/* Override macros from asm/mm.h to make them work with mfn_t */
>#undef mfn_to_page
>#define mfn_to_page(mfn) __mfn_to_page(mfn_x(mfn))
>#undef page_to_mfn
>#define page_to_mfn(pg) _mfn(__page_to_mfn(pg))
>
>> +    }
>> +
>> +    ret = vmap(mfn, nr);
>> +    if ( ret == NULL )
>> +        goto undo;
>> +    xfree(mfn);
>
>You can move the xfree(mfn) before the check, and then you can remove
>it from the undo label.
>
>And since the undo label is just used once, what about doing
>
>    ret = vmap(mfn, nr);
>    xfree(mfn);
>    if ( !ret )
>    {
>        while ( i-- )
>            put_page_and_type(mfn_to_page(mfn_x(mfn[i])));
>        ....

Good suggestion.

>
>> +
>> +    return ret;
>> +
>> + undo:
>> +    for ( ; --i >= 0; )
>> +        put_page_and_type(mfn_to_page(mfn_x(mfn[i])));
>> +    xfree(mfn);
>> +    gprintk(XENLOG_ERR, "Failed to map guest pages %lx nr %x\n", gfn, nr);
>> +
>> +    return NULL;
>> +}
>> +
>> +static void unmap_guest_pages(void *va, uint32_t nr)
>unsigned long please.
>
>> +{
>> +    unsigned long *mfn = xmalloc_array(unsigned long, nr);
>> +    int i;
>> +    void *va_copy = va;
>> +
>> +    if ( !mfn )
>> +    {
>> +        printk("%s %d: No free memory\n", __FILE__, __LINE__);
>> +        return;
>> +    }
>> +
>> +    for ( i = 0; i < nr; i++, va += PAGE_SIZE)
>> +        mfn[i] = domain_page_map_to_mfn(va);
>> +
>> +    vunmap(va_copy);
>> +
>> +    for ( i = 0; i < nr; i++)
>> +        put_page_and_type(mfn_to_page(mfn[i]));
>> +}
>> +
>> +static void write_gcmd_ire(struct vvtd *vvtd, uint32_t val)
>> +{
>> +    bool set = val & DMA_GCMD_IRE;
>> +
>> +    vvtd_info("%sable Interrupt Remapping\n", set ? "En" : "Dis");
>> +
>> +    vvtd->hw.intremap_enabled = set;
>> +    (set ? vvtd_set_bit : vvtd_clear_bit)
>> +        (vvtd, DMAR_GSTS_REG, DMA_GSTS_IRES_SHIFT);
>> +}
>> +
>>  static void write_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
>>  {
>>      uint64_t irta = vvtd_get_reg_quad(vvtd, DMAR_IRTA_REG);
>> @@ -131,16 +205,29 @@ static void write_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
>>       * the 'Set Interrupt Remap Table Pointer' operation.
>>       */
>>      vvtd_clear_bit(vvtd, DMAR_GSTS_REG, DMA_GSTS_SIRTPS_SHIFT);
>> +    if ( vvtd->hw.intremap_enabled )
>> +        vvtd_info("Update Interrupt Remapping Table when active\n");
>>  
>>      if ( gfn_x(vvtd->hw.irt) != PFN_DOWN(DMA_IRTA_ADDR(irta)) ||
>>           vvtd->hw.irt_max_entry != DMA_IRTA_SIZE(irta) )
>>      {
>> +        if ( vvtd->irt_base )
>> +        {
>> +            unmap_guest_pages(vvtd->irt_base,
>> +                              PFN_UP(vvtd->hw.irt_max_entry *
>> +                                     sizeof(struct iremap_entry)));
>> +            vvtd->irt_base = NULL;
>> +        }
>
>Shouldn't this be done when sirtp is switched off, instead of when
>it's updated?
>
>What happens in the following scenario:
>
>- Guest writes gfn to irta.
>- Guest enables sirtps.
>- Guest disables sirtps.

Disabling SIRTP isn't clear to me. Maybe you mean writing to GCMD with
SIRTP cleared. I think hardware ignores writes of 0 to SIRTP because
SIRTP is a one-shot bit; please refer to the example in VT-d spec 10.4.4.
Each time the IRTP is updated, the old mapping should be destroyed and a
new mapping created.
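That update rule could be sketched as follows (hypothetical names, reduced to the pointer bookkeeping only):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch of the SIRTP update rule: writes with SIRTP clear
 * are ignored (one-shot bit), and only a changed table address tears
 * down the old mapping and builds a new one. */
struct irt_state {
    uint64_t mapped_gfn;    /* currently mapped table gfn (0 = none) */
    unsigned int remaps;    /* times the mapping was (re)built */
};

static void sirtp_write(struct irt_state *s, uint64_t new_gfn, bool sirtp_set)
{
    if ( !sirtp_set )
        return;             /* hardware ignores writing 0 to SIRTP */

    if ( s->mapped_gfn != new_gfn )
    {
        /* unmap_guest_pages(old table) ... map_guest_pages(new table) */
        s->mapped_gfn = new_gfn;
        s->remaps++;
    }
}
```

Note that with this rule, repeated SIRTP writes with an unchanged IRTA cost nothing, which is the optimization argued for earlier in the thread.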

BTW, it seems better to move setting up the mapping into the previous patch.

Thanks
chao

>- Guest tries to balloon out the page used in irta.
>
>Thanks, Roger.


* Re: [PATCH v4 09/28] x86/vvtd: Set Interrupt Remapping Table Pointer through GCMD
  2018-02-11  4:34     ` Chao Gao
@ 2018-02-11  5:09       ` Chao Gao
  2018-02-12 11:25       ` Roger Pau Monné
  1 sibling, 0 replies; 83+ messages in thread
From: Chao Gao @ 2018-02-11  5:09 UTC (permalink / raw)
  To: Roger Pau Monné,
	xen-devel, Tim Deegan, Stefano Stabellini, Jan Beulich, Wei Liu,
	Ian Jackson, George Dunlap, Konrad Rzeszutek Wilk, Andrew Cooper,
	Kevin Tian, Lan Tianyu

On Sun, Feb 11, 2018 at 12:34:11PM +0800, Chao Gao wrote:
>On Fri, Feb 09, 2018 at 04:59:11PM +0000, Roger Pau Monné wrote:
>>On Fri, Nov 17, 2017 at 02:22:16PM +0800, Chao Gao wrote:
>>> Software sets SIRTP field of GCMD to set/update the interrupt remapping
>>> table pointer used by hardware. The interrupt remapping table pointer is
>>> specified through the Interrupt Remapping Table Address (IRTA_REG)
>>> register.
>>> 
>>> This patch emulates this operation and adds some new fields in VVTD to track
>>> info (e.g. the table's gfn and max supported entries) of interrupt remapping
>>> table.
>>> 
>>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>>> 
>>> ---
>>> v4:
>>>  - declare eim_enabled as bool and irt as gfn_t
>>>  - rename vvtd_handle_gcmd_sirtp() to write_gcmd_sirtp()
>>> 
>>> v3:
>>>  - ignore unaligned r/w of vt-d hardware registers and return X86EMUL_OK
>>> ---
>>>  xen/drivers/passthrough/vtd/iommu.h | 16 ++++++-
>>>  xen/drivers/passthrough/vtd/vvtd.c  | 86 +++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 100 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
>>> index f2ef3dd..8579843 100644
>>> --- a/xen/drivers/passthrough/vtd/iommu.h
>>> +++ b/xen/drivers/passthrough/vtd/iommu.h
>>> @@ -48,7 +48,8 @@
>>>  #define DMAR_IQT_REG            0x88 /* invalidation queue tail */
>>>  #define DMAR_IQA_REG            0x90 /* invalidation queue addr */
>>>  #define DMAR_IECTL_REG          0xa0 /* invalidation event control register */
>>> -#define DMAR_IRTA_REG           0xb8 /* intr remap */
>>> +#define DMAR_IRTA_REG           0xb8 /* base address of intr remap table */
>>> +#define DMAR_IRTUA_REG          0xbc /* upper address of intr remap table */
>>>  
>>>  #define OFFSET_STRIDE        (9)
>>>  #define dmar_readl(dmar, reg) readl((dmar) + (reg))
>>> @@ -150,6 +151,9 @@
>>>  #define DMA_GCMD_SIRTP  (((u64)1) << 24)
>>>  #define DMA_GCMD_CFI    (((u64)1) << 23)
>>>  
>>> +/* mask of one-shot bits */
>>> +#define DMA_GCMD_ONE_SHOT_MASK 0x96ffffff
>>> +
>>>  /* GSTS_REG */
>>>  #define DMA_GSTS_TES    (((u64)1) << 31)
>>>  #define DMA_GSTS_RTPS   (((u64)1) << 30)
>>> @@ -157,10 +161,18 @@
>>>  #define DMA_GSTS_AFLS   (((u64)1) << 28)
>>>  #define DMA_GSTS_WBFS   (((u64)1) << 27)
>>>  #define DMA_GSTS_QIES   (((u64)1) <<26)
>>> +#define DMA_GSTS_SIRTPS_SHIFT   24
>>> +#define DMA_GSTS_SIRTPS (((u64)1) << DMA_GSTS_SIRTPS_SHIFT)
>>>  #define DMA_GSTS_IRES   (((u64)1) <<25)
>>> -#define DMA_GSTS_SIRTPS (((u64)1) << 24)
>>>  #define DMA_GSTS_CFIS   (((u64)1) <<23)
>>>  
>>> +/* IRTA_REG */
>>> +/* The base of 4KB aligned interrupt remapping table */
>>> +#define DMA_IRTA_ADDR(val)      ((val) & ~0xfffULL)
>>> +/* The size of remapping table is 2^(x+1), where x is the size field in IRTA */
>>> +#define DMA_IRTA_S(val)         (val & 0xf)
>>> +#define DMA_IRTA_SIZE(val)      (1UL << (DMA_IRTA_S(val) + 1))
>>> +
>>>  /* PMEN_REG */
>>>  #define DMA_PMEN_EPM    (((u32)1) << 31)
>>>  #define DMA_PMEN_PRS    (((u32)1) << 0)
>>> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
>>> index d78d878..f0476fe 100644
>>> --- a/xen/drivers/passthrough/vtd/vvtd.c
>>> +++ b/xen/drivers/passthrough/vtd/vvtd.c
>>> @@ -36,6 +36,12 @@
>>>  #define VVTD_MAX_OFFSET VVTD_FRCD_END
>>>  
>>>  struct hvm_hw_vvtd {
>>> +    bool eim_enabled;
>>> +
>>> +    /* Interrupt remapping table base gfn and the max of entries */
>>> +    uint16_t irt_max_entry;
>>> +    gfn_t irt;
>>> +
>>>      uint32_t regs[VVTD_MAX_OFFSET/sizeof(uint32_t)];
>>>  };
>>>  
>>> @@ -73,6 +79,16 @@ boolean_runtime_param("viommu_verbose", viommu_verbose);
>>>  
>>>  #define VVTD_REG_POS(vvtd, offset) &(vvtd->hw.regs[offset/sizeof(uint32_t)])
>>>  
>>> +static inline void vvtd_set_bit(struct vvtd *vvtd, uint32_t reg, int nr)
>>> +{
>>> +    __set_bit(nr, VVTD_REG_POS(vvtd, reg));
>>> +}
>>> +
>>> +static inline void vvtd_clear_bit(struct vvtd *vvtd, uint32_t reg, int nr)
>>> +{
>>> +    __clear_bit(nr, VVTD_REG_POS(vvtd, reg));
>>> +}
>>> +
>>>  static inline void vvtd_set_reg(struct vvtd *vvtd, uint32_t reg, uint32_t value)
>>>  {
>>>      *VVTD_REG_POS(vvtd, reg) = value;
>>> @@ -102,6 +118,52 @@ static void *domain_vvtd(const struct domain *d)
>>>          return NULL;
>>>  }
>>>  
>>> +static void write_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
>>> +{
>>> +    uint64_t irta = vvtd_get_reg_quad(vvtd, DMAR_IRTA_REG);
>>> +
>>> +    if ( !(val & DMA_GCMD_SIRTP) )
>>
>>I think you likely want to do put_gfn here (see my comment below).
>>
>>> +        return;
>>> +
>>> +    /*
>>> +     * Hardware clears this bit when software sets the SIRTPS field in
>>> +     * the Global Command register and sets it when hardware completes
>>> +     * the 'Set Interrupt Remap Table Pointer' operation.
>>> +     */
>>> +    vvtd_clear_bit(vvtd, DMAR_GSTS_REG, DMA_GSTS_SIRTPS_SHIFT);
>>> +
>>> +    if ( gfn_x(vvtd->hw.irt) != PFN_DOWN(DMA_IRTA_ADDR(irta)) ||
>>> +         vvtd->hw.irt_max_entry != DMA_IRTA_SIZE(irta) )
>>> +    {
>>> +        vvtd->hw.irt = _gfn(PFN_DOWN(DMA_IRTA_ADDR(irta)));
>>
>>I'm not sure about the usage of this gfn (I guess I will figure out in
>>further patches), but I think you should probably use get_gfn so that
>>you take a reference to it. Using PFN_DOWN and _gfn is clearly
>>defeating the purpose of the whole gfn infrastructure.
>>
>>Note that you then need to use put_gfn when releasing it.
>
>The steps to enable interrupt remapping are:
>1. write to IRTA. Software should write the physical address of the
>interrupt remapping table to this register.
>2. write GCMD with SIRTP set. According to VT-d spec 10.4.4, software
>sets SIRTP to set/update the interrupt remapping table pointer used by
>hardware.
>3. write GCMD with IRE set.
>
>In this version, we get a reference in step 3 (in the next patch, through
>map/unmap of the guest IRT) rather than in step 2. The benefit is that
>when the guest tries to write SIRTP many times before enabling interrupt
>remapping, vvtd doesn't need to map/unmap the guest IRT each time.

Oops, I should correct myself here. In this version, we get a reference in
step 2 when mapping the guest interrupt remapping table (in the next patch;
I will move it into this patch). There's no need to use get_gfn() to take
another reference here.

Thanks
Chao

>
>>
>>> +        vvtd->hw.irt_max_entry = DMA_IRTA_SIZE(irta);
>>> +        vvtd->hw.eim_enabled = !!(irta & IRTA_EIME);
>>> +        vvtd_info("Update IR info (addr=%lx eim=%d size=%d)\n",
>>> +                  gfn_x(vvtd->hw.irt), vvtd->hw.eim_enabled,
>>> +                  vvtd->hw.irt_max_entry);
>>> +    }
>>> +    vvtd_set_bit(vvtd, DMAR_GSTS_REG, DMA_GSTS_SIRTPS_SHIFT);
>>> +}
>>> +
>>> +static void vvtd_write_gcmd(struct vvtd *vvtd, uint32_t val)
>>> +{
>>> +    uint32_t orig = vvtd_get_reg(vvtd, DMAR_GSTS_REG);
>>> +    uint32_t changed;
>>> +
>>> +    orig = orig & DMA_GCMD_ONE_SHOT_MASK;   /* reset the one-shot bits */
>>> +    changed = orig ^ val;
>>> +
>>> +    if ( !changed )
>>> +        return;
>>> +
>>> +    if ( changed & (changed - 1) )
>>> +        vvtd_info("Write %x to GCMD (current %x), updating multiple fields",
>>> +                  val, orig);
>>
>>I'm not sure I see the purpose of the above message.
>
>I will remove this. My original throught is when we could get a warning
>that guest driver doesn't completely follow VT-d spec 10.4.4:
>If multiple control fields in this register need to be modified,
>software much serialize the modification through multiple writes to this
>register.
>
>Thanks
>Chao


* Re: [PATCH v4 11/28] x86/vvtd: Process interrupt remapping request
  2018-02-09 17:44   ` Roger Pau Monné
@ 2018-02-11  5:31     ` Chao Gao
  2018-02-23 17:04       ` Roger Pau Monné
  0 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2018-02-11  5:31 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Feb 09, 2018 at 05:44:17PM +0000, Roger Pau Monné wrote:
>On Fri, Nov 17, 2017 at 02:22:18PM +0800, Chao Gao wrote:
>> When a remapping interrupt request arrives, remapping hardware computes the
>> interrupt_index per the algorithm described in VTD spec
>> "Interrupt Remapping Table", interprets the IRTE and generates a remapped
>> interrupt request.
>> 
>> This patch introduces viommu_handle_irq_request() to emulate how
>> remapping hardware handles a remapping interrupt request. This patch
>> also introduces a counter, inflight_intr, which counts the number of
>> interrupts being handled. The reason we need this counter is that VT-d
>> hardware should drain in-flight interrupts before setting flags to show
>> that certain operations are completed. These operations include enabling
>> interrupt remapping and performing certain kinds of invalidation
>> requests. In vvtd, we also try to drain in-flight interrupts by waiting
>> until inflight_intr drops to 0.
>> 
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>> 
>> ---
>> v4:
>>  - use "#define" to define interrupt remapping transition faults
>>  rather than using an enum
>>  - use switch-case rather than if-else in irq_remapping_request_index()
>>  and vvtd_irq_request_sanity_check()
>>  - introduce a counter inflight_intr
>> 
>> v3:
>>  - Encode map_guest_page()'s error into void* to avoid using another parameter
>> ---
>>  xen/drivers/passthrough/vtd/iommu.h |  15 +++
>>  xen/drivers/passthrough/vtd/vvtd.c  | 219 ++++++++++++++++++++++++++++++++++++
>>  2 files changed, 234 insertions(+)
>> 
>> diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
>> index 9c59aeb..82edd2a 100644
>> --- a/xen/drivers/passthrough/vtd/iommu.h
>> +++ b/xen/drivers/passthrough/vtd/iommu.h
>> @@ -216,6 +216,15 @@
>>  #define dma_frcd_source_id(c) (c & 0xffff)
>>  #define dma_frcd_page_addr(d) (d & (((u64)-1) << 12)) /* low 64 bit */
>>  
>> +/* Interrupt remapping transition faults */
>> +#define VTD_FR_IR_REQ_RSVD      0x20
>> +#define VTD_FR_IR_INDEX_OVER    0x21
>> +#define VTD_FR_IR_ENTRY_P       0x22
>> +#define VTD_FR_IR_ROOT_INVAL    0x23
>> +#define VTD_FR_IR_IRTE_RSVD     0x24
>> +#define VTD_FR_IR_REQ_COMPAT    0x25
>> +#define VTD_FR_IR_SID_ERR       0x26
>> +
>>  /*
>>   * 0: Present
>>   * 1-11: Reserved
>> @@ -356,6 +365,12 @@ struct iremap_entry {
>>  };
>>  
>>  /*
>> + * When VT-d doesn't enable extended interrupt mode, hardware interprets
>> + * only bits [15:8] of the Destination-ID field in the IRTEs.
>> + */
>> +#define IRTE_xAPIC_DEST_MASK 0xff00
>> +
>> +/*
>>   * Posted-interrupt descriptor address is 64 bits and 64-byte aligned; only
>>   * the upper 26 bits of the least significant 32 bits are available.
>>   */
>> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
>> index 06e522a..927e715 100644
>> --- a/xen/drivers/passthrough/vtd/vvtd.c
>> +++ b/xen/drivers/passthrough/vtd/vvtd.c
>> @@ -22,11 +22,15 @@
>>  #include <xen/types.h>
>>  #include <xen/viommu.h>
>>  #include <xen/xmalloc.h>
>> +#include <asm/apic.h>
>>  #include <asm/current.h>
>> +#include <asm/event.h>
>> +#include <asm/io_apic.h>
>>  #include <asm/hvm/domain.h>
>>  #include <asm/p2m.h>
>>  
>>  #include "iommu.h"
>> +#include "vtd.h"
>>  
>>  /* Supported capabilities by vvtd */
>>  #define VVTD_MAX_CAPS VIOMMU_CAP_IRQ_REMAPPING
>> @@ -52,6 +56,8 @@ struct vvtd {
>>      uint64_t base_addr;
>>      /* Point back to the owner domain */
>>      struct domain *domain;
>> +    /* # of in-flight interrupts */
>> +    atomic_t inflight_intr;
>>  
>>      struct hvm_hw_vvtd hw;
>>      void *irt_base;
>> @@ -181,6 +187,109 @@ static void unmap_guest_pages(void *va, uint32_t nr)
>>          put_page_and_type(mfn_to_page(mfn[i]));
>>  }
>>  
>> +static int vvtd_delivery(struct domain *d, uint8_t vector,
>> +                         uint32_t dest, bool dest_mode,
>> +                         uint8_t delivery_mode, uint8_t trig_mode)
>> +{
>> +    struct vlapic *target;
>> +    struct vcpu *v;
>> +
>> +    switch ( delivery_mode )
>> +    {
>> +    case dest_LowestPrio:
>> +        target = vlapic_lowest_prio(d, NULL, 0, dest, dest_mode);
>> +        if ( target != NULL )
>> +        {
>> +            vvtd_debug("d%d: dest=v%d dlm=%x vector=%d trig_mode=%d\n",
>> +                       vlapic_domain(target)->domain_id,
>> +                       vlapic_vcpu(target)->vcpu_id,
>> +                       delivery_mode, vector, trig_mode);
>> +            vlapic_set_irq(target, vector, trig_mode);
>> +            break;
>> +        }
>> +        vvtd_debug("d%d: null round robin: vector=%02x\n",
>> +                   d->domain_id, vector);
>> +        break;
>> +
>> +    case dest_Fixed:
>> +        for_each_vcpu ( d, v )
>> +            if ( vlapic_match_dest(vcpu_vlapic(v), NULL, 0, dest, dest_mode) )
>> +            {
>> +                vvtd_debug("d%d: dest=v%d dlm=%x vector=%d trig_mode=%d\n",
>> +                           v->domain->domain_id, v->vcpu_id,
>> +                           delivery_mode, vector, trig_mode);
>> +                vlapic_set_irq(vcpu_vlapic(v), vector, trig_mode);
>> +            }
>> +        break;
>> +
>> +    case dest_NMI:
>> +        for_each_vcpu ( d, v )
>> +            if ( vlapic_match_dest(vcpu_vlapic(v), NULL, 0, dest, dest_mode) &&
>> +                 !test_and_set_bool(v->nmi_pending) )
>> +                vcpu_kick(v);
>
>Doing these loops here seems quite bad from a performance PoV,
>especially taking into account that this code is going to be used with
>> 128 vCPUs.

Maybe. But I prefer not to optimize at this early stage.

>
>> +        break;
>> +
>> +    default:
>> +        gdprintk(XENLOG_WARNING, "Unsupported VTD delivery mode %d\n",
>> +                 delivery_mode);
>> +        return -EINVAL;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/* Compute the IRTE index for a given interrupt request. On success, return
>> + * 0 and set index to reference the corresponding IRTE. Otherwise return < 0,
>> + * i.e. -1 when the irq request isn't in remappable format.
>> + */
>> +static int irq_remapping_request_index(
>> +    const struct arch_irq_remapping_request *irq, uint32_t *index)
>> +{
>> +    switch ( irq->type )
>> +    {
>> +    case VIOMMU_REQUEST_IRQ_MSI:
>> +    {
>> +        struct msi_msg_remap_entry msi_msg =
>> +        {
>> +            .address_lo = { .val = irq->msg.msi.addr },
>
>Can't you just use .address_lo.val = irq->...

Will do.

>
>> +            .data = irq->msg.msi.data,
>> +        };
>> +
>> +        if ( !msi_msg.address_lo.format )
>> +            return -1;
>
>In all the other functions you already return some kind of meaningful
>error code, please do so here also.

Ok.

>
>> +
>> +        *index = (msi_msg.address_lo.index_15 << 15) +
>> +                msi_msg.address_lo.index_0_14;
>> +        if ( msi_msg.address_lo.SHV )
>> +            *index += (uint16_t)msi_msg.data;
>> +        break;
>> +    }
>> +
>> +    case VIOMMU_REQUEST_IRQ_APIC:
>> +    {
>> +        struct IO_APIC_route_remap_entry remap_rte = { .val = irq->msg.rte };
>> +
>> +        if ( !remap_rte.format )
>> +            return -1;
>> +
>> +        *index = (remap_rte.index_15 << 15) + remap_rte.index_0_14;
>> +        break;
>> +    }
>> +
>> +    default:
>> +        ASSERT_UNREACHABLE();
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static inline uint32_t irte_dest(struct vvtd *vvtd, uint32_t dest)
>> +{
>> +    /* In xAPIC mode, only bits [15:8] of the destination are valid */
>> +    return vvtd->hw.eim_enabled ? dest
>> +                                : MASK_EXTR(dest, IRTE_xAPIC_DEST_MASK);
>> +}
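
The masking above is simple enough to model standalone; this sketch
(mine, not part of the patch) shows the xAPIC case extracting only
bits [15:8] of the IRTE destination when EIM is off:

```c
#include <assert.h>
#include <stdint.h>

#define IRTE_xAPIC_DEST_MASK 0xff00u

/* Without extended interrupt mode (EIM), only bits [15:8] of the
 * IRTE Destination-ID field are interpreted by hardware; with EIM
 * the full 32-bit destination is used. */
static uint32_t irte_dest_model(int eim_enabled, uint32_t dest)
{
    return eim_enabled ? dest
                       : (dest & IRTE_xAPIC_DEST_MASK) >> 8;
}
```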
>> +
>>  static void write_gcmd_ire(struct vvtd *vvtd, uint32_t val)
>>  {
>>      bool set = val & DMA_GCMD_IRE;
>> @@ -323,6 +432,115 @@ static const struct hvm_mmio_ops vvtd_mmio_ops = {
>>      .write = vvtd_write
>>  };
>>  
>> +static void vvtd_handle_fault(struct vvtd *vvtd,
>> +                              const struct arch_irq_remapping_request *irq,
>> +                              struct iremap_entry *irte,
>> +                              unsigned int fault)
>> +{
>> +    switch ( fault )
>> +    {
>> +    case VTD_FR_IR_SID_ERR:
>> +    case VTD_FR_IR_IRTE_RSVD:
>> +    case VTD_FR_IR_ENTRY_P:
>> +        if ( qinval_fault_disable(*irte) )
>> +            break;
>> +    /* fall through */
>> +    case VTD_FR_IR_REQ_RSVD:
>> +    case VTD_FR_IR_INDEX_OVER:
>> +    case VTD_FR_IR_ROOT_INVAL:
>> +        /* TODO: handle fault (e.g. record and report this fault to VM */
>> +        break;
>> +
>> +    default:
>> +        vvtd_debug("d%d can't handle VT-d fault %x\n", vvtd->domain->domain_id,
>> +                   fault);
>> +    }
>> +    return;
>> +}
>> +
>> +static bool vvtd_irq_request_sanity_check(const struct vvtd *vvtd,
>> +                                   const struct arch_irq_remapping_request *irq)
>> +{
>> +    switch ( irq->type )
>> +    {
>> +    case VIOMMU_REQUEST_IRQ_APIC:
>> +    {
>> +        struct IO_APIC_route_remap_entry rte = { .val = irq->msg.rte };
>> +
>> +        return !rte.reserved;
>> +    }
>> +
>> +    case VIOMMU_REQUEST_IRQ_MSI:
>> +        return true;
>> +    }
>> +
>> +    ASSERT_UNREACHABLE();
>> +    return false;
>> +}
>> +
>> +static int vvtd_get_entry(struct vvtd *vvtd,
>> +                          const struct arch_irq_remapping_request *irq,
>> +                          struct iremap_entry *dest)
>const for both vvtd and dest?

Constifying vvtd is ok. 'dest' is used to store the entry corresponding to
the request, so 'dest' cannot be const.

>
>> +{
>> +    uint32_t entry;
>> +    struct iremap_entry irte;
>> +    int ret = irq_remapping_request_index(irq, &entry);
>> +
>> +    ASSERT(!ret);
>> +
>> +    vvtd_debug("d%d: interpret a request with index %x\n",
>> +               vvtd->domain->domain_id, entry);
>> +
>> +    if ( !vvtd_irq_request_sanity_check(vvtd, irq) )
>> +        return VTD_FR_IR_REQ_RSVD;
>> +    else if ( entry > vvtd->hw.irt_max_entry )
>> +        return VTD_FR_IR_INDEX_OVER;
>> +    else if ( !vvtd->irt_base )
>
>No need for the 'else', since you are already using return.
>
>> +        return VTD_FR_IR_ROOT_INVAL;
>> +
>> +    irte = ((struct iremap_entry*)vvtd->irt_base)[entry];
>> +
>> +    if ( !qinval_present(irte) )
>> +        ret = VTD_FR_IR_ENTRY_P;
>> +    else if ( (irte.remap.res_1 || irte.remap.res_2 || irte.remap.res_3 ||
>> +               irte.remap.res_4) )
>> +        ret = VTD_FR_IR_IRTE_RSVD;
>> +
>> +    /* FIXME: We don't check against the source ID */
>> +
>> +    dest->val = irte.val;
>> +
>> +    return ret;
>> +}
>> +
>> +static int vvtd_handle_irq_request(const struct domain *d,
>
>constifying domain here is not the best practice IMHO. In the function
>you are actually modifying vvtd, which is fine because it's a pointer
>but it's conceptually inside of domain.

Ok.

>
>> +                                   const struct arch_irq_remapping_request *irq)
>> +{
>> +    struct iremap_entry irte;
>> +    int ret;
>> +    struct vvtd *vvtd = domain_vvtd(d);
>> +
>> +    if ( !vvtd || !vvtd->hw.intremap_enabled )
>> +        return -ENODEV;
>> +
>> +    atomic_inc(&vvtd->inflight_intr);
>> +    ret = vvtd_get_entry(vvtd, irq, &irte);
>> +    if ( ret )
>> +    {
>> +        vvtd_handle_fault(vvtd, irq, &irte, ret);
>> +        goto out;
>> +    }
>> +
>> +    ret = vvtd_delivery(vvtd->domain, irte.remap.vector,
>> +                        irte_dest(vvtd, irte.remap.dst),
>> +                        irte.remap.dm, irte.remap.dlm,
>> +                        irte.remap.tm);
>> +
>> + out:
>> +    atomic_dec(&vvtd->inflight_intr);
>
>So inflight_intr seem to be quite pointless, you only use it in this
>function and it's never read AFAICT.

I will introduce this field when it is first used.

Thanks
Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 05/28] VIOMMU: Introduce callback of checking irq remapping mode
  2018-02-09 16:47     ` Chao Gao
@ 2018-02-12 10:21       ` Roger Pau Monné
  0 siblings, 0 replies; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-12 10:21 UTC (permalink / raw)
  To: xen-devel, Lan Tianyu, Kevin Tian, George Dunlap, Wei Liu,
	Tim Deegan, Stefano Stabellini, Konrad Rzeszutek Wilk,
	Jan Beulich, Ian Jackson, Andrew Cooper

On Sat, Feb 10, 2018 at 12:47:07AM +0800, Chao Gao wrote:
> On Fri, Feb 09, 2018 at 03:11:25PM +0000, Roger Pau Monné wrote:
> >On Fri, Nov 17, 2017 at 02:22:12PM +0800, Chao Gao wrote:
> >> From: Lan Tianyu <tianyu.lan@intel.com>
> >> 
> >> This patch adds a callback for vIOAPIC and vMSI to check whether
> >> interrupt remapping is enabled.
> >
> >Same as with the previous patches, not adding the actual code in
> >check_irq_remapping makes reviewing this impossible.
> >
> >> 
> >> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> >> Signed-off-by: Chao Gao <chao.gao@intel.com>
> >> ---
> >>  xen/common/viommu.c      | 15 +++++++++++++++
> >>  xen/include/xen/viommu.h |  4 ++++
> >>  2 files changed, 19 insertions(+)
> >> 
> >> diff --git a/xen/common/viommu.c b/xen/common/viommu.c
> >> index 9eafdef..72173c3 100644
> >> --- a/xen/common/viommu.c
> >> +++ b/xen/common/viommu.c
> >> @@ -145,6 +145,21 @@ int viommu_get_irq_info(const struct domain *d,
> >>      return viommu->ops->get_irq_info(d, request, irq_info);
> >>  }
> >>  
> >> +bool viommu_check_irq_remapping(const struct domain *d,
> >> +                                const struct arch_irq_remapping_request *request)
> >> +{
> >> +    const struct viommu *viommu = d->arch.hvm_domain.viommu;
> >> +
> >> +    if ( !viommu )
> >> +        return false;
> >> +
> >> +    ASSERT(viommu->ops);
> >> +    if ( !viommu->ops->check_irq_remapping )
> >> +        return false;
> >> +
> >> +    return viommu->ops->check_irq_remapping(d, request);
> >> +}
> >
> >Having a helper for each functionality you want to support seems
> >extremely cumbersome, I would image this to grow so that you will also
> >have viommu_check_mem_mapping and others.
> >
> >Isn't it better to just have something like viommu_check_feature, or
> >even just expose a features field in the viommu struct itself?
> 
> Maybe it is caused by our poor function name and the lack of comments
> pointing out what the function does.  As you know, interrupts have two
> formats: legacy format and remappable format.  The format is indicated
> by one bit of the MSI message or IOAPIC RTE. Roughly, only remappable
> format interrupts should be translated by the IOMMU. So every time we
> want to handle an interrupt, we need to know its format, and we think
> the remappable format varies between vendors. This is why we introduce
> a new field here, to abstract the check for the remappable format.
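
The format bit and index layout Chao describes can be shown as standalone
bit arithmetic. The field positions below follow Xen's
struct msi_msg_remap_entry (bit 2 = index[15], bit 3 = SHV, bit 4 =
format, bits 19:5 = index[14:0]); this is a sketch, not the patch's code:

```c
#include <assert.h>
#include <stdint.h>

/* Field extractors for the VT-d remappable-format MSI address. */
#define MSI_ADDR_IR_INDEX15(a)   (((a) >> 2) & 0x1u)
#define MSI_ADDR_IR_SHV(a)       (((a) >> 3) & 0x1u)
#define MSI_ADDR_IR_FORMAT(a)    (((a) >> 4) & 0x1u)
#define MSI_ADDR_IR_INDEX0_14(a) (((a) >> 5) & 0x7fffu)

/* Returns 0 and sets *index for a remappable request, -1 otherwise. */
static int msi_irte_index(uint32_t addr_lo, uint32_t data, uint32_t *index)
{
    if ( !MSI_ADDR_IR_FORMAT(addr_lo) )
        return -1; /* legacy (non-remappable) format */

    *index = (MSI_ADDR_IR_INDEX15(addr_lo) << 15) |
             MSI_ADDR_IR_INDEX0_14(addr_lo);
    if ( MSI_ADDR_IR_SHV(addr_lo) )
        *index += (uint16_t)data; /* sub-handle adds to the index */

    return 0;
}
```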

Oh, I see. So this is used to check whether each interrupt needs
remapping or not, it's not used to check whether the arch specific
vIOMMU implementation supports interrupt remapping.

I would maybe rename this to 'check_irq_remapped' or
'check_intr_remapped'.

Thanks, Roger.


* Re: [PATCH v4 07/28] x86/hvm: Introduce a emulated VTD for HVM
  2018-02-09 17:12     ` Chao Gao
@ 2018-02-12 10:35       ` Roger Pau Monné
  0 siblings, 0 replies; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-12 10:35 UTC (permalink / raw)
  To: xen-devel, George Dunlap, Kevin Tian, Wei Liu, Tim Deegan,
	Stefano Stabellini, Konrad Rzeszutek Wilk, Jan Beulich,
	Ian Jackson, Andrew Cooper, Lan Tianyu

On Sat, Feb 10, 2018 at 01:12:28AM +0800, Chao Gao wrote:
> On Fri, Feb 09, 2018 at 04:27:54PM +0000, Roger Pau Monné wrote:
> >On Fri, Nov 17, 2017 at 02:22:14PM +0800, Chao Gao wrote:
> >> +    if ( !vvtd )
> >> +        return ENOMEM;
> >> +
> >> +    vvtd_reset(vvtd);
> >> +    vvtd->base_addr = viommu->base_address;
> >
> >I think it would be good to have some check here, so that the vIOMMU
> >is not for example positioned on top of a RAM region. Ideally you
> >should check that the gfns [base_address, base_address + size) are
> >unpopulated.
> 
> Yes. Besides the checks here, this page should be reserved in the guest
> e820, which implies some work in QEMU or the toolstack.

Right... I guess since the toolstack is the one that actually
populates memory &c it should be the one to position the vIOMMU, so
just leave this as-is for the time being, let's see other people's
opinions.

Out of curiosity, is the IOMMU on real hardware always at the same
memory address?

Thanks, Roger.


* Re: [PATCH v4 09/28] x86/vvtd: Set Interrupt Remapping Table Pointer through GCMD
  2018-02-11  4:34     ` Chao Gao
  2018-02-11  5:09       ` Chao Gao
@ 2018-02-12 11:25       ` Roger Pau Monné
  1 sibling, 0 replies; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-12 11:25 UTC (permalink / raw)
  To: xen-devel, Tim Deegan, Stefano Stabellini, Jan Beulich, Wei Liu,
	Ian Jackson, George Dunlap, Konrad Rzeszutek Wilk, Andrew Cooper,
	Kevin Tian, Lan Tianyu

On Sun, Feb 11, 2018 at 12:34:11PM +0800, Chao Gao wrote:
> On Fri, Feb 09, 2018 at 04:59:11PM +0000, Roger Pau Monné wrote:
> >On Fri, Nov 17, 2017 at 02:22:16PM +0800, Chao Gao wrote:
> >> +        return;
> >> +
> >> +    /*
> >> +     * Hardware clears this bit when software sets the SIRTPS field in
> >> +     * the Global Command register and sets it when hardware completes
> >> +     * the 'Set Interrupt Remap Table Pointer' operation.
> >> +     */
> >> +    vvtd_clear_bit(vvtd, DMAR_GSTS_REG, DMA_GSTS_SIRTPS_SHIFT);
> >> +
> >> +    if ( gfn_x(vvtd->hw.irt) != PFN_DOWN(DMA_IRTA_ADDR(irta)) ||
> >> +         vvtd->hw.irt_max_entry != DMA_IRTA_SIZE(irta) )
> >> +    {
> >> +        vvtd->hw.irt = _gfn(PFN_DOWN(DMA_IRTA_ADDR(irta)));
> >
> >I'm not sure about the usage of this gfn (I guess I will figure out in
> >further patches), but I think you should probably use get_gfn so that
> >you take a reference to it. Using PFN_DOWN and _gfn is clearly
> >defeating the purpose of the whole gfn infrastructure.
> >
> >Note that you then need to use put_gfn when releasing it.
> 
> The steps to enable interrupt remapping are:
> 1. write to IRTA. Software should write the physical address of the
> interrupt remapping table to this register.
> 2. write GCMD with SIRTP set. According to VT-d spec 10.4.4, software
> sets SIRTP to set/update the interrupt remapping table pointer used by
> hardware.
> 3. write GCMD with IRE set.
> 
> In this version, we take a reference in step 3 (in the next patch, through
> mapping/unmapping the guest IRT) rather than in step 2. The benefit is that
> when the guest writes SIRTP many times before enabling interrupt remapping,
> vvtd doesn't need to map/unmap the guest IRT each time.
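
The ordering Chao describes — latch the pointer on SIRTP, but only map
the table on the IRE 0 -> 1 transition — can be captured by a tiny state
machine. This is a model, not vvtd's implementation; only the GCMD bit
positions (SIRTP = bit 24, IRE = bit 25) come from the VT-d spec:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GCMD_SIRTP (1u << 24)
#define GCMD_IRE   (1u << 25)

struct viommu_model {
    uint64_t irta;        /* last value written to IRTA */
    uint64_t latched_irt; /* pointer latched by SIRTP */
    bool ire;             /* interrupt remapping enabled */
    unsigned int maps;    /* how many times the guest IRT was mapped */
};

static void write_gcmd(struct viommu_model *v, uint32_t val)
{
    if ( val & GCMD_SIRTP )       /* one-shot: latch the table pointer */
        v->latched_irt = v->irta;

    if ( (val & GCMD_IRE) && !v->ire )
    {
        v->ire = true;
        v->maps++;                /* map guest IRT only on 0 -> 1 */
    }
    else if ( !(val & GCMD_IRE) && v->ire )
        v->ire = false;           /* unmap would happen here */
}
```

Repeated SIRTP writes before IRE is set only re-latch the pointer; no
map/unmap churn occurs, which is exactly the benefit claimed above.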

Oh, I see, so the reference should be dropped when IRE is cleared,
since IRTA can be set multiple times without IRE set, which shouldn't
result in the page tables being mapped.

One thing that I don't really quite like about all this implementation
is that you allocate space for all the registers in 'regs', and yet
you keep adding more fields to the struct, like eim_enabled or
irt_max_entry which can all be obtained from 'regs' itself.

IMO, I think you should keep data in a single place to avoid it
getting out of sync. So either you use 'regs' for everything, or you
drop 'regs' completely and simply use per-register custom fields that
you add to hvm_hw_vvtd when they are needed.

I think the latter would be clearer, but I haven't reviewed the whole
series yet.

> >> +        vvtd->hw.irt_max_entry = DMA_IRTA_SIZE(irta);
> >> +        vvtd->hw.eim_enabled = !!(irta & IRTA_EIME);
> >> +        vvtd_info("Update IR info (addr=%lx eim=%d size=%d)\n",
> >> +                  gfn_x(vvtd->hw.irt), vvtd->hw.eim_enabled,
> >> +                  vvtd->hw.irt_max_entry);
> >> +    }
> >> +    vvtd_set_bit(vvtd, DMAR_GSTS_REG, DMA_GSTS_SIRTPS_SHIFT);
> >> +}
> >> +
> >> +static void vvtd_write_gcmd(struct vvtd *vvtd, uint32_t val)
> >> +{
> >> +    uint32_t orig = vvtd_get_reg(vvtd, DMAR_GSTS_REG);
> >> +    uint32_t changed;
> >> +
> >> +    orig = orig & DMA_GCMD_ONE_SHOT_MASK;   /* reset the one-shot bits */
> >> +    changed = orig ^ val;
> >> +
> >> +    if ( !changed )
> >> +        return;
> >> +
> >> +    if ( changed & (changed - 1) )
> >> +        vvtd_info("Write %x to GCMD (current %x), updating multiple fields",
> >> +                  val, orig);
> >
> >I'm not sure I see the purpose of the above message.
> 
> I will remove this. My original thought was that we could get a warning
> when the guest driver doesn't completely follow VT-d spec 10.4.4:
> If multiple control fields in this register need to be modified,
> software must serialize the modifications through multiple writes to
> this register.
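
The detection in the patch hinges on a classic power-of-two test:
x & (x - 1) is zero iff at most one bit of x is set, so a nonzero
result on the XOR of old and new values means the guest changed
several GCMD fields in a single write. A standalone sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if 'val' differs from 'old' in more than one bit position,
 * i.e. the write modifies multiple GCMD control fields at once,
 * which the VT-d spec asks software to serialize. */
static bool gcmd_multi_field_write(uint32_t old, uint32_t val)
{
    uint32_t changed = old ^ val;

    return changed & (changed - 1);
}
```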

Oh, I see, I didn't know the spec only allows changing one bit at a
time. What does real hardware do when multiple bits are changed at the
same write?

Is some kind of error triggered?

I think this is likely helpful, but should be a WARN or ERROR log
message, not an info one.

Thanks. Roger.


* Re: [PATCH v4 10/28] x86/vvtd: Enable Interrupt Remapping through GCMD
  2018-02-11  5:05     ` Chao Gao
@ 2018-02-12 11:30       ` Roger Pau Monné
  2018-02-22  6:25         ` Chao Gao
  0 siblings, 1 reply; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-12 11:30 UTC (permalink / raw)
  To: Chao Gao
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Sun, Feb 11, 2018 at 01:05:01PM +0800, Chao Gao wrote:
> On Fri, Feb 09, 2018 at 05:15:17PM +0000, Roger Pau Monné wrote:
> >On Fri, Nov 17, 2017 at 02:22:17PM +0800, Chao Gao wrote:
> >> +static void write_gcmd_ire(struct vvtd *vvtd, uint32_t val)
> >> +{
> >> +    bool set = val & DMA_GCMD_IRE;
> >> +
> >> +    vvtd_info("%sable Interrupt Remapping\n", set ? "En" : "Dis");
> >> +
> >> +    vvtd->hw.intremap_enabled = set;
> >> +    (set ? vvtd_set_bit : vvtd_clear_bit)
> >> +        (vvtd, DMAR_GSTS_REG, DMA_GSTS_IRES_SHIFT);
> >> +}
> >> +
> >>  static void write_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
> >>  {
> >>      uint64_t irta = vvtd_get_reg_quad(vvtd, DMAR_IRTA_REG);
> >> @@ -131,16 +205,29 @@ static void write_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
> >>       * the 'Set Interrupt Remap Table Pointer' operation.
> >>       */
> >>      vvtd_clear_bit(vvtd, DMAR_GSTS_REG, DMA_GSTS_SIRTPS_SHIFT);
> >> +    if ( vvtd->hw.intremap_enabled )
> >> +        vvtd_info("Update Interrupt Remapping Table when active\n");
> >>  
> >>      if ( gfn_x(vvtd->hw.irt) != PFN_DOWN(DMA_IRTA_ADDR(irta)) ||
> >>           vvtd->hw.irt_max_entry != DMA_IRTA_SIZE(irta) )
> >>      {
> >> +        if ( vvtd->irt_base )
> >> +        {
> >> +            unmap_guest_pages(vvtd->irt_base,
> >> +                              PFN_UP(vvtd->hw.irt_max_entry *
> >> +                                     sizeof(struct iremap_entry)));
> >> +            vvtd->irt_base = NULL;
> >> +        }
> >
> >Shouldn't this be done when sirtp is switched off, instead of when
> >it's updated?
> >
> >What happens in the following scenario:
> >
> >- Guest writes gfn to irta.
> >- Guest enables sirtps.
> >- Guest disables sirtps.
> 
> Disabling SIRTP isn't clear to me. Maybe you mean writing to GCMD with
> SIRTP cleared. Hardware ignores writes of 0 to SIRTP, I think, because
> SIRTP is a one-shot bit. Please refer to the example in VT-d spec 10.4.4.
> Each time the IRTP is updated, the old mapping should be destroyed and
> the new mapping created.

After reading the spec I agree, there's no such thing as clearing
SIRTP.

You should however unmap the IRTA address when IRE is cleared
(interrupt remapping disabled), which AFAICT you don't do now.

Roger.


* Re: [PATCH v4 12/28] x86/vvtd: decode interrupt attribute from IRTE
  2017-11-17  6:22 ` [PATCH v4 12/28] x86/vvtd: decode interrupt attribute from IRTE Chao Gao
@ 2018-02-12 11:55   ` Roger Pau Monné
  2018-02-22  6:33     ` Chao Gao
  0 siblings, 1 reply; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-12 11:55 UTC (permalink / raw)
  To: Chao Gao
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Nov 17, 2017 at 02:22:19PM +0800, Chao Gao wrote:
> Without interrupt remapping, interrupt attributes can be extracted from
> msi message or IOAPIC RTE. However, with interrupt remapping enabled,
> the attributes are enclosed in the associated IRTE. This callback is
> for cases in which the caller wants to acquire interrupt attributes, for
> example:
> 1. vioapic_get_vector(). With vIOMMU, the RTE may don't contain vector.
                                                ^ doesn't contain the vector.
> 2. perform EOI which is always based on the interrupt vector.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> ---
> v3:
>  - add example cases in which we will use this function.

I'm still missing the actual usage of vvtd_get_irq_info. This handler
is introduced without any user.

> ---
>  xen/drivers/passthrough/vtd/vvtd.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
> index 927e715..9890cc2 100644
> --- a/xen/drivers/passthrough/vtd/vvtd.c
> +++ b/xen/drivers/passthrough/vtd/vvtd.c
> @@ -541,6 +541,30 @@ static int vvtd_handle_irq_request(const struct domain *d,
>      return ret;
>  }
>  
> +static int vvtd_get_irq_info(const struct domain *d,

IMO for internal (static) functions you can drop the vvtd_ prefix.

> +                             const struct arch_irq_remapping_request *irq,
> +                             struct arch_irq_remapping_info *info)
> +{
> +    int ret;
> +    struct iremap_entry irte;
> +    struct vvtd *vvtd = domain_vvtd(d);
> +
> +    if ( !vvtd )
> +        return -ENODEV;
> +
> +    ret = vvtd_get_entry(vvtd, irq, &irte);
> +    /* not in an interrupt delivery, don't report faults to guest */
> +    if ( ret )
> +        return ret;
> +
> +    info->vector = irte.remap.vector;
> +    info->dest = irte_dest(vvtd, irte.remap.dst);
> +    info->dest_mode = irte.remap.dm;
> +    info->delivery_mode = irte.remap.dlm;
> +
> +    return 0;
> +}
> +
>  static void vvtd_reset(struct vvtd *vvtd)
>  {
>      uint64_t cap = cap_set_num_fault_regs(VVTD_FRCD_NUM)
> @@ -603,6 +627,7 @@ static const struct viommu_ops vvtd_hvm_vmx_ops = {
>      .create = vvtd_create,
>      .destroy = vvtd_destroy,
>      .handle_irq_request = vvtd_handle_irq_request,
> +    .get_irq_info = vvtd_get_irq_info,

So the public helper to this arch specific hook is added in 4/28, yet
the arch specific code is added here, and I still have to figure out
where this will actually be hooked into the vIOAPIC or vMSI code.

Would it be possible to have a single patch, which contains 4/28, the
code in this patch and the glue that hooks this into the vIOAPIC and
vMSI code?

The above likely applies to quite a lot of patches in this series.
It's fine to try to reduce the size of patches as much as possible,
but at least in this series this is actually harming (at least my)
capability to review them.

Thanks, Roger.


* Re: [PATCH v4 13/28] x86/vvtd: add a helper function to decide the interrupt format
  2017-11-17  6:22 ` [PATCH v4 13/28] x86/vvtd: add a helper function to decide the interrupt format Chao Gao
@ 2018-02-12 12:14   ` Roger Pau Monné
  0 siblings, 0 replies; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-12 12:14 UTC (permalink / raw)
  To: Chao Gao
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Nov 17, 2017 at 02:22:20PM +0800, Chao Gao wrote:
> Different platforms may use different methods to distinguish
> remappable format interrupts from normal format interrupts.
> 
> Intel uses one bit in the IOAPIC RTE or MSI address register to
> indicate that the interrupt is in remappable format. vvtd should
> handle all interrupts for which .check_irq_remapping() returns true.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> 
> ---
> v3:
>  - new
> ---
>  xen/drivers/passthrough/vtd/vvtd.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
> index 9890cc2..d3dec01 100644
> --- a/xen/drivers/passthrough/vtd/vvtd.c
> +++ b/xen/drivers/passthrough/vtd/vvtd.c
> @@ -565,6 +565,15 @@ static int vvtd_get_irq_info(const struct domain *d,
>      return 0;
>  }
>  
> +/* check whether the interrupt request is remappable */
> +static bool vvtd_is_remapping(const struct domain *d,

irq_remapped or intr_remapped would be clearer.

And likewise the comment in the previous patch, it would be much
better to introduce this together with check_irq_remapping and an
actual user.

Thanks, Roger.


* Re: [PATCH v4 14/28] x86/vvtd: Handle interrupt translation faults
  2017-11-17  6:22 ` [PATCH v4 14/28] x86/vvtd: Handle interrupt translation faults Chao Gao
@ 2018-02-12 12:55   ` Roger Pau Monné
  2018-02-22  8:23     ` Chao Gao
  0 siblings, 1 reply; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-12 12:55 UTC (permalink / raw)
  To: Chao Gao
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Nov 17, 2017 at 02:22:21PM +0800, Chao Gao wrote:
> Interrupt translation faults are non-recoverable faults. When a fault
> is triggered, the fault info needs to be recorded in the Fault Recording
> Registers and an MSI interrupt injected to notify the guest IOMMU driver
> to deal with the fault.
> 
> This patch emulates the hardware's handling of interrupt translation
> faults (more information about the process can be found in the VT-d spec,
> chapter "Translation Faults", sections "Non-Recoverable Fault
> Reporting" and "Non-Recoverable Logging").
> Specifically, viommu_record_fault() records the fault information and
> viommu_report_non_recoverable_fault() reports faults to software.
> Currently, only Primary Fault Logging is supported and the Number of
> Fault-recording Registers is 1.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> 
> ---
> v4:
>  - introduce a lock to protect fault-event related regs
> ---
>  xen/drivers/passthrough/vtd/iommu.h |  51 ++++++-
>  xen/drivers/passthrough/vtd/vvtd.c  | 288 +++++++++++++++++++++++++++++++++++-
>  2 files changed, 333 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
> index 82edd2a..dc2df75 100644
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -196,26 +196,67 @@
>  #define DMA_CCMD_CAIG_MASK(x) (((u64)x) & ((u64) 0x3 << 59))
>  
>  /* FECTL_REG */
> -#define DMA_FECTL_IM        ((uint32_t)1 << 31)
> +#define DMA_FECTL_IM_SHIFT  31
> +#define DMA_FECTL_IP_SHIFT  30
> +#define DMA_FECTL_IM        ((uint32_t)1 << DMA_FECTL_IM_SHIFT)
> +#define DMA_FECTL_IP        ((uint32_t)1 << DMA_FECTL_IP_SHIFT)
>  
>  /* FSTS_REG */
> -#define DMA_FSTS_PFO        ((uint32_t)1 << 0)
> -#define DMA_FSTS_PPF        ((uint32_t)1 << 1)
> +#define DMA_FSTS_PFO_SHIFT  0
> +#define DMA_FSTS_PPF_SHIFT  1
> +#define DMA_FSTS_PRO_SHIFT  7
> +
> +#define DMA_FSTS_PFO        ((uint32_t)1 << DMA_FSTS_PFO_SHIFT)
> +#define DMA_FSTS_PPF        ((uint32_t)1 << DMA_FSTS_PPF_SHIFT)
>  #define DMA_FSTS_AFO        ((uint32_t)1 << 2)
>  #define DMA_FSTS_APF        ((uint32_t)1 << 3)
>  #define DMA_FSTS_IQE        ((uint32_t)1 << 4)
>  #define DMA_FSTS_ICE        ((uint32_t)1 << 5)
>  #define DMA_FSTS_ITE        ((uint32_t)1 << 6)
> -#define DMA_FSTS_FAULTS    DMA_FSTS_PFO | DMA_FSTS_PPF | DMA_FSTS_AFO | DMA_FSTS_APF | DMA_FSTS_IQE | DMA_FSTS_ICE | DMA_FSTS_ITE
> +#define DMA_FSTS_PRO        ((uint32_t)1 << DMA_FSTS_PRO_SHIFT)
> +#define DMA_FSTS_FAULTS     (DMA_FSTS_PFO | DMA_FSTS_PPF | DMA_FSTS_AFO | \
> +                             DMA_FSTS_APF | DMA_FSTS_IQE | DMA_FSTS_ICE | \
> +                             DMA_FSTS_ITE | DMA_FSTS_PRO)
> +#define DMA_FSTS_RW1CS      (DMA_FSTS_PFO | DMA_FSTS_AFO | DMA_FSTS_APF | \
> +                             DMA_FSTS_IQE | DMA_FSTS_ICE | DMA_FSTS_ITE | \
> +                             DMA_FSTS_PRO)
>  #define dma_fsts_fault_record_index(s) (((s) >> 8) & 0xff)
>  
>  /* FRCD_REG, 32 bits access */
> -#define DMA_FRCD_F (((u64)1) << 31)
> +#define DMA_FRCD_LEN            0x10
> +#define DMA_FRCD2_OFFSET        0x8
> +#define DMA_FRCD3_OFFSET        0xc
> +#define DMA_FRCD_F_SHIFT        31
> +#define DMA_FRCD_F ((u64)1 << DMA_FRCD_F_SHIFT)
>  #define dma_frcd_type(d) ((d >> 30) & 1)
>  #define dma_frcd_fault_reason(c) (c & 0xff)
>  #define dma_frcd_source_id(c) (c & 0xffff)
>  #define dma_frcd_page_addr(d) (d & (((u64)-1) << 12)) /* low 64 bit */
>  
> +struct vtd_fault_record_register
> +{
> +    union {
> +        struct {
> +            uint64_t lo;
> +            uint64_t hi;
> +        } bits;
> +        struct {
> +            uint64_t rsvd0          :12,
> +                     fault_info     :52;
> +            uint64_t source_id      :16,
> +                     rsvd1          :9,
> +                     pmr            :1,  /* Privilege Mode Requested */
> +                     exe            :1,  /* Execute Permission Requested */
> +                     pasid_p        :1,  /* PASID Present */
> +                     fault_reason   :8,  /* Fault Reason */
> +                     pasid_val      :20, /* PASID Value */
> +                     addr_type      :2,  /* Address Type */
> +                     type           :1,  /* Type. (0) Write (1) Read/AtomicOp */
> +                     fault          :1;  /* Fault */
> +        } fields;
> +    };
> +};
> +
>  /* Interrupt remapping transition faults */
>  #define VTD_FR_IR_REQ_RSVD      0x20
>  #define VTD_FR_IR_INDEX_OVER    0x21
> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
> index d3dec01..83805d1 100644
> --- a/xen/drivers/passthrough/vtd/vvtd.c
> +++ b/xen/drivers/passthrough/vtd/vvtd.c
> @@ -43,6 +43,7 @@
>  struct hvm_hw_vvtd {
>      bool eim_enabled;
>      bool intremap_enabled;
> +    uint32_t fault_index;
>  
>      /* Interrupt remapping table base gfn and the max of entries */
>      uint16_t irt_max_entry;
> @@ -58,6 +59,12 @@ struct vvtd {
>      struct domain *domain;
>      /* # of in-flight interrupts */
>      atomic_t inflight_intr;
> +    /*
> +     * This lock protects fault-event related registers (DMAR_FEXXX_REG).
> +     * It's used to drain in-flight fault events before responding to the
> +     * guest's programming of those registers.
> +     */
> +    spinlock_t fe_lock;

I still think most if not all of the vvtd helper functions should be
mutually exclusive (ie: locked), not only the fault-event related
registers. I guess Linux or other OSes already serialize access to the
vIOMMU somehow, so you're not seeing any errors. But I'm quite sure
things will fail in weird ways if a malicious guest starts to
concurrently write to different vIOMMU registers.

>  
>      struct hvm_hw_vvtd hw;
>      void *irt_base;
> @@ -87,6 +94,21 @@ boolean_runtime_param("viommu_verbose", viommu_verbose);
>  #endif
>  
>  #define VVTD_REG_POS(vvtd, offset) &(vvtd->hw.regs[offset/sizeof(uint32_t)])
> +static inline int vvtd_test_and_set_bit(struct vvtd *vvtd, uint32_t reg, int nr)
> +{
> +    return test_and_set_bit(nr, VVTD_REG_POS(vvtd, reg));
> +}
> +
> +static inline int vvtd_test_and_clear_bit(struct vvtd *vvtd, uint32_t reg,
> +                                          int nr)
> +{
> +    return test_and_clear_bit(nr, VVTD_REG_POS(vvtd, reg));
> +}

So for set and clear bit you use the non-locked variants (prefixed by
__), and here you use the locked variants of test and set/clear. Is
there any reason for this? I would expect locked/unlocked bitops to be
used consistently for dealing with the registers unless there's a
specific reason not to do so.

> +
> +static inline int vvtd_test_bit(struct vvtd *vvtd, uint32_t reg, int nr)
> +{
> +    return test_bit(nr, VVTD_REG_POS(vvtd, reg));
> +}
>  
>  static inline void vvtd_set_bit(struct vvtd *vvtd, uint32_t reg, int nr)
>  {
> @@ -238,6 +260,30 @@ static int vvtd_delivery(struct domain *d, uint8_t vector,
>      return 0;
>  }
>  
> +static void vvtd_generate_interrupt(const struct vvtd *vvtd, uint64_t addr,
> +                                    uint32_t data)
> +{
> +    bool dm = addr & MSI_ADDR_DESTMODE_MASK;

Please use MASK_EXTR here. Also destmode is usually treated as a
uint8_t in the rest of the Xen code (see vmsi_deliver). I would
probably keep using uint8_t just for consistency with the rest of the
code.

> +    uint32_t dest = MASK_EXTR(addr, MSI_ADDR_DEST_ID_MASK);
> +    uint8_t dlm = MASK_EXTR(data, MSI_DATA_DELIVERY_MODE_MASK);
> +    uint8_t tm = MASK_EXTR(data, MSI_DATA_TRIGGER_MASK);
> +    uint8_t vector = data & MSI_DATA_VECTOR_MASK;

MASK_EXTR please.

> +
> +    vvtd_debug("d%d: generating msi %lx %x\n", vvtd->domain->domain_id, addr,
> +               data);
> +
> +    if ( vvtd->hw.eim_enabled )
> +        dest |= (addr >> 40) << 8;

These 40 and 8 look like magic numbers to me, but it's likely I'm
missing something. Any reason not to use addr >> 32 directly? In any
case I would really appreciate it if you could add defines for those
and/or comments.

> +
> +    vvtd_delivery(vvtd->domain, vector, dest, dm, dlm, tm);
> +}
> +
> +static void vvtd_notify_fault(const struct vvtd *vvtd)
> +{
> +    vvtd_generate_interrupt(vvtd, vvtd_get_reg_quad(vvtd, DMAR_FEADDR_REG),
> +                            vvtd_get_reg(vvtd, DMAR_FEDATA_REG));
> +}
> +
>  /* Computing the IRTE index for a given interrupt request. On success, return
>   * 0 and set index to reference the corresponding IRTE. Otherwise, return < 0,
>   * i.e. -1 when the irq request isn't in remappable format.
> @@ -290,6 +336,198 @@ static inline uint32_t irte_dest(struct vvtd *vvtd, uint32_t dest)
>                                  : MASK_EXTR(dest, IRTE_xAPIC_DEST_MASK);
>  }
>  
> +static void vvtd_report_non_recoverable_fault(struct vvtd *vvtd, int reason)
> +{
> +    uint32_t fsts = vvtd_get_reg(vvtd, DMAR_FSTS_REG);
> +
> +    vvtd_set_bit(vvtd, DMAR_FSTS_REG, reason);

test_and_set?

> +
> +    /*
> +     * According to the VT-d spec, "Non-Recoverable Fault Event" chapter, if
> +     * there are any previously reported interrupt conditions that are yet to
> +     * be serviced by software, the Fault Event interrupt is not generated.
> +     */
> +    if ( fsts & DMA_FSTS_FAULTS )
> +        return;
> +
> +    vvtd_set_bit(vvtd, DMAR_FECTL_REG, DMA_FECTL_IP_SHIFT);
> +    if ( !vvtd_test_bit(vvtd, DMAR_FECTL_REG, DMA_FECTL_IM_SHIFT) )
> +    {
> +        vvtd_notify_fault(vvtd);
> +        vvtd_clear_bit(vvtd, DMAR_FECTL_REG, DMA_FECTL_IP_SHIFT);
> +    }
> +}
> +
> +static void vvtd_update_ppf(struct vvtd *vvtd)
> +{
> +    int i;

unsigned int.

> +    uint64_t cap = vvtd_get_reg_quad(vvtd, DMAR_CAP_REG);
> +    unsigned int base = cap_fault_reg_offset(cap);
> +
> +    for ( i = 0; i < cap_num_fault_regs(cap); i++ )
> +    {
> +        if ( vvtd_test_bit(vvtd, base + i * DMA_FRCD_LEN + DMA_FRCD3_OFFSET,
> +                           DMA_FRCD_F_SHIFT) )
> +        {
> +            vvtd_report_non_recoverable_fault(vvtd, DMA_FSTS_PPF_SHIFT);
> +            return;
> +        }
> +    }
> +    /*
> +     * No Primary Fault is in Fault Record Registers, thus clear PPF bit in
> +     * FSTS.
> +     */
> +    vvtd_clear_bit(vvtd, DMAR_FSTS_REG, DMA_FSTS_PPF_SHIFT);
> +
> +    /* If no fault is in FSTS, clear pending bit in FECTL. */
> +    if ( !(vvtd_get_reg(vvtd, DMAR_FSTS_REG) & DMA_FSTS_FAULTS) )
> +        vvtd_clear_bit(vvtd, DMAR_FECTL_REG, DMA_FECTL_IP_SHIFT);
> +}
> +
> +/*
> + * Commit a fault to emulated Fault Record Registers.
> + */
> +static void vvtd_commit_frcd(struct vvtd *vvtd, int idx,
> +                             const struct vtd_fault_record_register *frcd)
> +{
> +    unsigned int base = cap_fault_reg_offset(
> +                            vvtd_get_reg_quad(vvtd, DMAR_CAP_REG));
> +
> +    vvtd_set_reg_quad(vvtd, base + idx * DMA_FRCD_LEN, frcd->bits.lo);
> +    vvtd_set_reg_quad(vvtd, base + idx * DMA_FRCD_LEN + 8, frcd->bits.hi);
> +    vvtd_update_ppf(vvtd);
> +}
> +
> +/*
> + * Allocate a FRCD for the caller. On success, return the FRI; otherwise
> + * return a negative errno.
> + */
> +static int vvtd_alloc_frcd(struct vvtd *vvtd)

What's the maximum value of FRCD according to the spec? Will it fit in
an int?

> +{
> +    int prev;
> +    uint64_t cap = vvtd_get_reg_quad(vvtd, DMAR_CAP_REG);
> +    unsigned int base = cap_fault_reg_offset(cap);
> +
> +    /* Set the F bit to indicate the FRCD is in use. */
> +    if ( !vvtd_test_and_set_bit(vvtd,
> +                                base + vvtd->hw.fault_index * DMA_FRCD_LEN +
> +                                DMA_FRCD3_OFFSET, DMA_FRCD_F_SHIFT) )
> +    {
> +        prev = vvtd->hw.fault_index;

prev can be declared inside the if:

    unsigned int prev = vvtd->hw.fault_index;

Also prev is used only once, so I think you can just get rid of it.

> +        vvtd->hw.fault_index = (prev + 1) % cap_num_fault_regs(cap);
> +        return vvtd->hw.fault_index;
> +    }

Newline.

> +    return -ENOMEM;
> +}
> +
> +static void vvtd_free_frcd(struct vvtd *vvtd, int i)
> +{
> +    unsigned int base = cap_fault_reg_offset(
> +                            vvtd_get_reg_quad(vvtd, DMAR_CAP_REG));
> +
> +    vvtd_clear_bit(vvtd, base + i * DMA_FRCD_LEN + DMA_FRCD3_OFFSET,
> +                   DMA_FRCD_F_SHIFT);
> +}
> +
> +static int vvtd_record_fault(struct vvtd *vvtd,
> +                             const struct arch_irq_remapping_request *request,
> +                             int reason)
> +{
> +    struct vtd_fault_record_register frcd;
> +    int fault_index;

unsigned int maybe, see comments above.

> +    uint32_t irt_index;
> +
> +    spin_lock(&vvtd->fe_lock);
> +    switch(reason)
> +    {
> +    case VTD_FR_IR_REQ_RSVD:
> +    case VTD_FR_IR_INDEX_OVER:
> +    case VTD_FR_IR_ENTRY_P:
> +    case VTD_FR_IR_ROOT_INVAL:
> +    case VTD_FR_IR_IRTE_RSVD:
> +    case VTD_FR_IR_REQ_COMPAT:
> +    case VTD_FR_IR_SID_ERR:
> +        if ( vvtd_test_bit(vvtd, DMAR_FSTS_REG, DMA_FSTS_PFO_SHIFT) )
> +            goto out;
> +
> +        /* No available Fault Record means Fault overflowed */
> +        fault_index = vvtd_alloc_frcd(vvtd);
> +        if ( fault_index < 0 )
> +        {
> +            vvtd_report_non_recoverable_fault(vvtd, DMA_FSTS_PFO_SHIFT);
> +            goto out;
> +        }
> +        memset(&frcd, 0, sizeof(frcd));

Given that frcd has no padding you can initialize it at
declaration using:

struct vtd_fault_record_register frcd = { };

> +        frcd.fields.fault_reason = reason;
> +        if ( irq_remapping_request_index(request, &irt_index) )
> +            goto out;
> +        frcd.fields.fault_info = irt_index;
> +        frcd.fields.source_id = request->source_id;
> +        frcd.fields.fault = 1;
> +        vvtd_commit_frcd(vvtd, fault_index, &frcd);
> +        break;
> +
> +    default:
> +        vvtd_debug("d%d: can't handle vvtd fault (reason 0x%x)",
> +                   vvtd->domain->domain_id, reason);
> +        break;
> +    }
> +
> + out:
> +    spin_unlock(&vvtd->fe_lock);
> +    return X86EMUL_OKAY;

I'm not sure why this function needs to return any value given its
current usage, and in any case since it's not an emulation handler it
shouldn't use X86EMUL_* values at all.

> +}
> +
> +static int vvtd_write_frcd3(struct vvtd *vvtd, uint32_t val)
> +{
> +    /* Writing a 1 means clear fault */
> +    if ( val & DMA_FRCD_F )
> +    {
> +        vvtd_free_frcd(vvtd, 0);
> +        vvtd_update_ppf(vvtd);
> +    }
> +    return X86EMUL_OKAY;

Same here, I don't see the point in returning a value, and certainly
it shouldn't be X86EMUL_* in any case.

> +}
> +
> +static void vvtd_write_fectl(struct vvtd *vvtd, uint32_t val)
> +{
> +    /*
> +     * Only DMA_FECTL_IM bit is writable. Generate pending event when unmask.
> +     */
> +    if ( !(val & DMA_FECTL_IM) )
> +    {
> +        /* Clear IM */
> +        vvtd_clear_bit(vvtd, DMAR_FECTL_REG, DMA_FECTL_IM_SHIFT);
> +        if ( vvtd_test_and_clear_bit(vvtd, DMAR_FECTL_REG, DMA_FECTL_IP_SHIFT) )
> +            vvtd_notify_fault(vvtd);
> +    }
> +    else
> +        vvtd_set_bit(vvtd, DMAR_FECTL_REG, DMA_FECTL_IM_SHIFT);
> +}
> +
> +static void vvtd_write_fsts(struct vvtd *vvtd, uint32_t val)
> +{
> +    int i, max_fault_index = DMA_FSTS_PRO_SHIFT;
> +    uint64_t bits_to_clear = val & DMA_FSTS_RW1CS;
> +
> +    if ( bits_to_clear )
> +    {
> +        i = find_first_bit(&bits_to_clear, max_fault_index / 8 + 1);
> +        while ( i <= max_fault_index )
> +        {
> +            vvtd_clear_bit(vvtd, DMAR_FSTS_REG, i);
> +            i = find_next_bit(&bits_to_clear, max_fault_index / 8 + 1, i + 1);
> +        }
> +    }
> +
> +    /*
> +     * Clear IP field when all status fields in the Fault Status Register
> +     * being clear.
> +     */
> +    if ( !((vvtd_get_reg(vvtd, DMAR_FSTS_REG) & DMA_FSTS_FAULTS)) )
> +        vvtd_clear_bit(vvtd, DMAR_FECTL_REG, DMA_FECTL_IP_SHIFT);
> +}
> +
>  static void write_gcmd_ire(struct vvtd *vvtd, uint32_t val)
>  {
>      bool set = val & DMA_GCMD_IRE;
> @@ -391,11 +629,47 @@ static int vvtd_read(struct vcpu *v, unsigned long addr,
>      return X86EMUL_OKAY;
>  }
>  
> +static void vvtd_write_fault_regs(struct vvtd *vvtd, unsigned long val,
> +                                  unsigned int offset, unsigned int len)
> +{
> +    unsigned int fault_offset = cap_fault_reg_offset(
> +                                    vvtd_get_reg_quad(vvtd, DMAR_CAP_REG));
> +
> +    spin_lock(&vvtd->fe_lock);
> +    for ( ; len ; len -= 4, offset += 4, val = val >> 32)

It seems overkill to use a for loop here when len can only be 4 or 8
AFAICT (maybe I'm wrong). You seem to treat all of these as 32bit
registers, which makes me wonder whether 64bit accesses are really
allowed to them in the first place.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 15/28] x86/vvtd: Enable Queued Invalidation through GCMD
  2017-11-17  6:22 ` [PATCH v4 15/28] x86/vvtd: Enable Queued Invalidation through GCMD Chao Gao
@ 2018-02-12 14:04   ` Roger Pau Monné
  2018-02-22 10:33     ` Chao Gao
  0 siblings, 1 reply; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-12 14:04 UTC (permalink / raw)
  To: Chao Gao
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Nov 17, 2017 at 02:22:22PM +0800, Chao Gao wrote:
> Software writes to QIE field of GCMD to enable or disable queued
> invalidations. This patch emulates QIE field of GCMD.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> ---
>  xen/drivers/passthrough/vtd/iommu.h |  3 ++-
>  xen/drivers/passthrough/vtd/vvtd.c  | 18 ++++++++++++++++++
>  2 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
> index dc2df75..b71dab8 100644
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -160,7 +160,8 @@
>  #define DMA_GSTS_FLS    (((u64)1) << 29)
>  #define DMA_GSTS_AFLS   (((u64)1) << 28)
>  #define DMA_GSTS_WBFS   (((u64)1) << 27)
> -#define DMA_GSTS_QIES   (((u64)1) <<26)
> +#define DMA_GSTS_QIES_SHIFT     26
> +#define DMA_GSTS_QIES   (((u64)1) << DMA_GSTS_QIES_SHIFT)
>  #define DMA_GSTS_IRES_SHIFT     25
>  #define DMA_GSTS_IRES   (((u64)1) << DMA_GSTS_IRES_SHIFT)
>  #define DMA_GSTS_SIRTPS_SHIFT   24
> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
> index 83805d1..a2fa64a 100644
> --- a/xen/drivers/passthrough/vtd/vvtd.c
> +++ b/xen/drivers/passthrough/vtd/vvtd.c
> @@ -539,6 +539,20 @@ static void write_gcmd_ire(struct vvtd *vvtd, uint32_t val)
>          (vvtd, DMAR_GSTS_REG, DMA_GSTS_IRES_SHIFT);
>  }
>  
> +static void write_gcmd_qie(struct vvtd *vvtd, uint32_t val)
> +{
> +    bool set = val & DMA_GCMD_QIE;
> +
> +    vvtd_info("%sable Queue Invalidation\n", set ? "En" : "Dis");
> +
> +    if ( set )
> +        vvtd_set_reg_quad(vvtd, DMAR_IQH_REG, 0);

If QIE is already enabled and the user writes to GCMD with the QIE bit
set, won't this wrongly clear the invalidation queue?

> +
> +    (set ? vvtd_set_bit : vvtd_clear_bit)
> +        (vvtd, DMAR_GSTS_REG, DMA_GSTS_QIES_SHIFT);
> +
> +}
> +
>  static void write_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
>  {
>      uint64_t irta = vvtd_get_reg_quad(vvtd, DMAR_IRTA_REG);
> @@ -598,6 +612,10 @@ static void vvtd_write_gcmd(struct vvtd *vvtd, uint32_t val)
>          write_gcmd_sirtp(vvtd, val);
>      if ( changed & DMA_GCMD_IRE )
>          write_gcmd_ire(vvtd, val);
> +    if ( changed & DMA_GCMD_QIE )
> +        write_gcmd_qie(vvtd, val);
> +    if ( changed & ~(DMA_GCMD_SIRTP | DMA_GCMD_IRE | DMA_GCMD_QIE) )
> +        vvtd_info("Only SIRTP, IRE, QIE in GCMD are handled");

This seems quite likely to go out of sync. I would rather do:

if ( changed & DMA_GCMD_QIE )
{
    write_gcmd_qie(vvtd, val);
    changed &= ~DMA_GCMD_QIE;
}
...
if ( changed )
    vvtd_info("Unhandled bit detected: %...");

It seems also quite likely this can be simplified with a macro:

#define HANDLE_GCMD_BIT(bit)        \
if ( changed & DMA_GCMD_ ## bit )   \
{                                   \
    write_gcmd_ ## bit (vvtd, val); \
    changed &= ~DMA_GCMD_ ## bit;   \
}

So that you can write:

HANDLE_GCMD_BIT(IRE);
HANDLE_GCMD_BIT(QIE);
...


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 16/28] x86/vvtd: Add queued invalidation (QI) support
  2017-11-17  6:22 ` [PATCH v4 16/28] x86/vvtd: Add queued invalidation (QI) support Chao Gao
@ 2018-02-12 14:36   ` Roger Pau Monné
  2018-02-23  4:38     ` Chao Gao
  0 siblings, 1 reply; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-12 14:36 UTC (permalink / raw)
  To: Chao Gao
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Nov 17, 2017 at 02:22:23PM +0800, Chao Gao wrote:
> Queued Invalidation Interface is an expanded invalidation interface with
> extended capabilities. Hardware implementations report support for queued
> invalidation interface through the Extended Capability Register. The queued
> invalidation interface uses an Invalidation Queue (IQ), which is a circular
> buffer in system memory. Software submits commands by writing Invalidation
> Descriptors to the IQ.
> 
> In this patch, a new function viommu_process_iq() is used for emulating how
> hardware handles invalidation requests through QI.

You should mention that QI is mandatory in order to support interrupt
remapping.

I was about to ask whether QI could be deferred to a later stage, but
AFAICT this is not an option.

> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> 
> ---
> v4:
>  - Introduce a lock to protect invalidation related registers.
> ---
>  xen/drivers/passthrough/vtd/iommu.h |  24 +++-
>  xen/drivers/passthrough/vtd/vvtd.c  | 271 +++++++++++++++++++++++++++++++++++-
>  2 files changed, 293 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
> index b71dab8..de9188b 100644
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -47,7 +47,12 @@
>  #define DMAR_IQH_REG            0x80 /* invalidation queue head */
>  #define DMAR_IQT_REG            0x88 /* invalidation queue tail */
>  #define DMAR_IQA_REG            0x90 /* invalidation queue addr */
> +#define DMAR_IQUA_REG           0x94 /* invalidation queue upper addr */
> +#define DMAR_ICS_REG            0x9c /* invalidation completion status */
>  #define DMAR_IECTL_REG          0xa0 /* invalidation event control register */
> +#define DMAR_IEDATA_REG         0xa4 /* invalidation event data register */
> +#define DMAR_IEADDR_REG         0xa8 /* invalidation event address register */
> +#define DMAR_IEUADDR_REG        0xac /* upper address register */
>  #define DMAR_IRTA_REG           0xb8 /* base address of intr remap table */
>  #define DMAR_IRTUA_REG          0xbc /* upper address of intr remap table */
>  
> @@ -175,6 +180,21 @@
>  #define DMA_IRTA_S(val)         (val & 0xf)
>  #define DMA_IRTA_SIZE(val)      (1UL << (DMA_IRTA_S(val) + 1))
>  
> +/* IQA_REG */
> +#define DMA_IQA_ADDR(val)       (val & ~0xfffULL)
> +#define DMA_IQA_QS(val)         (val & 0x7)
> +#define DMA_IQA_RSVD            0xff8ULL
> +
> +/* IECTL_REG */
> +#define DMA_IECTL_IM_SHIFT 31
> +#define DMA_IECTL_IM            (1U << DMA_IECTL_IM_SHIFT)
> +#define DMA_IECTL_IP_SHIFT 30
> +#define DMA_IECTL_IP            (1U << DMA_IECTL_IP_SHIFT)
> +
> +/* ICS_REG */
> +#define DMA_ICS_IWC_SHIFT       0
> +#define DMA_ICS_IWC             (1U << DMA_ICS_IWC_SHIFT)
> +
>  /* PMEN_REG */
>  #define DMA_PMEN_EPM    (((u32)1) << 31)
>  #define DMA_PMEN_PRS    (((u32)1) << 0)
> @@ -205,13 +225,14 @@
>  /* FSTS_REG */
>  #define DMA_FSTS_PFO_SHIFT  0
>  #define DMA_FSTS_PPF_SHIFT  1
> +#define DMA_FSTS_IQE_SHIFT  4
>  #define DMA_FSTS_PRO_SHIFT  7
>  
>  #define DMA_FSTS_PFO        ((uint32_t)1 << DMA_FSTS_PFO_SHIFT)
>  #define DMA_FSTS_PPF        ((uint32_t)1 << DMA_FSTS_PPF_SHIFT)
>  #define DMA_FSTS_AFO        ((uint32_t)1 << 2)
>  #define DMA_FSTS_APF        ((uint32_t)1 << 3)
> -#define DMA_FSTS_IQE        ((uint32_t)1 << 4)
> +#define DMA_FSTS_IQE        ((uint32_t)1 << DMA_FSTS_IQE_SHIFT)
>  #define DMA_FSTS_ICE        ((uint32_t)1 << 5)
>  #define DMA_FSTS_ITE        ((uint32_t)1 << 6)
>  #define DMA_FSTS_PRO        ((uint32_t)1 << DMA_FSTS_PRO_SHIFT)
> @@ -555,6 +576,7 @@ struct qinval_entry {
>  
>  /* Queue invalidation head/tail shift */
>  #define QINVAL_INDEX_SHIFT 4
> +#define QINVAL_INDEX_MASK  0x7fff0ULL
>  
>  #define qinval_present(v) ((v).lo & 1)
>  #define qinval_fault_disable(v) (((v).lo >> 1) & 1)
> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
> index a2fa64a..81170ec 100644
> --- a/xen/drivers/passthrough/vtd/vvtd.c
> +++ b/xen/drivers/passthrough/vtd/vvtd.c
> @@ -27,6 +27,7 @@
>  #include <asm/event.h>
>  #include <asm/io_apic.h>
>  #include <asm/hvm/domain.h>
> +#include <asm/hvm/support.h>
>  #include <asm/p2m.h>
>  
>  #include "iommu.h"
> @@ -68,6 +69,9 @@ struct vvtd {
>  
>      struct hvm_hw_vvtd hw;
>      void *irt_base;
> +    void *inv_queue_base;

Why not declare this as:

struct qinval_entry *

> +    /* This lock protects invalidation related registers */
> +    spinlock_t ie_lock;

As noted in another patch, I think the first approach should be to use
a single lock that serializes access to the whole vIOMMU register
space. Later we can see about more fine grained locking.

>  };
>  
>  /* Setting viommu_verbose enables debugging messages of vIOMMU */
> @@ -284,6 +288,12 @@ static void vvtd_notify_fault(const struct vvtd *vvtd)
>                              vvtd_get_reg(vvtd, DMAR_FEDATA_REG));
>  }
>  
> +static void vvtd_notify_inv_completion(const struct vvtd *vvtd)
> +{
> +    vvtd_generate_interrupt(vvtd, vvtd_get_reg_quad(vvtd, DMAR_IEADDR_REG),
> +                            vvtd_get_reg(vvtd, DMAR_IEDATA_REG));
> +}
> +
>  /* Computing the IRTE index for a given interrupt request. On success, return
>   * 0 and set index to reference the corresponding IRTE. Otherwise, return < 0,
>   * i.e. -1 when the irq request isn't in remappable format.
> @@ -478,6 +488,189 @@ static int vvtd_record_fault(struct vvtd *vvtd,
>      return X86EMUL_OKAY;
>  }
>  
> +/*
> + * Process an invalidation descriptor. Currently, only two types of
> + * descriptors, the Interrupt Entry Cache Invalidation Descriptor and the
> + * Invalidation Wait Descriptor, are handled.
> + * @vvtd: the virtual vtd instance
> + * @i: the index of the invalidation descriptor to be processed
> + *
> + * Return 0 on success, or non-zero on failure.
> + */
> +static int process_iqe(struct vvtd *vvtd, uint32_t i)
> +{
> +    struct qinval_entry qinval;
> +    int ret = 0;
> +
> +    if ( !vvtd->inv_queue_base )
> +    {
> +        gdprintk(XENLOG_ERR, "Invalidation queue base isn't set\n");
> +        return -1;

If you just return -1 or 0 please use bool instead. Or return proper
error codes.

> +    }
> +    qinval = ((struct qinval_entry *)vvtd->inv_queue_base)[i];

See my comment above regarding how inv_queue_base is declared; I'm not
sure why the copy is needed here.

> +
> +    switch ( qinval.q.inv_wait_dsc.lo.type )
> +    {
> +    case TYPE_INVAL_WAIT:
> +        if ( qinval.q.inv_wait_dsc.lo.sw )
> +        {
> +            uint32_t data = qinval.q.inv_wait_dsc.lo.sdata;
> +            uint64_t addr = qinval.q.inv_wait_dsc.hi.saddr << 2;
> +
> +            ret = hvm_copy_to_guest_phys(addr, &data, sizeof(data), current);
> +            if ( ret )
> +                vvtd_info("Failed to write status address\n");
> +        }
> +
> +        /*
> +         * The following code generates an invalidation completion event
> +         * indicating the invalidation wait descriptor completion. Note that
> +         * the following code fragment is not tested properly.
> +         */
> +        if ( qinval.q.inv_wait_dsc.lo.iflag )
> +        {
> +            if ( !vvtd_test_and_set_bit(vvtd, DMAR_ICS_REG, DMA_ICS_IWC_SHIFT) )
> +            {
> +                vvtd_set_bit(vvtd, DMAR_IECTL_REG, DMA_IECTL_IP_SHIFT);
> +                if ( !vvtd_test_bit(vvtd, DMAR_IECTL_REG, DMA_IECTL_IM_SHIFT) )
> +                {
> +                    vvtd_notify_inv_completion(vvtd);
> +                    vvtd_clear_bit(vvtd, DMAR_IECTL_REG, DMA_IECTL_IP_SHIFT);
> +                }
> +            }
> +        }
> +        break;
> +
> +    case TYPE_INVAL_IEC:
> +        /* No cache is preserved in vvtd, so nothing needs to be flushed */
> +        break;
> +
> +    default:
> +        vvtd_debug("d%d: Invalidation type (%x) isn't supported\n",
> +                   vvtd->domain->domain_id, qinval.q.inv_wait_dsc.lo.type);
> +        return -1;
> +    }
> +
> +    return ret;
> +}
> +
> +/*
> + * Invalidate all the descriptors in Invalidation Queue.
> + */
> +static void vvtd_process_iq(struct vvtd *vvtd)
> +{
> +    uint32_t max_entry, i, iqh, iqt;
> +    int err = 0;
> +
> +    /* Trylock avoids more than 1 caller dealing with invalidation requests */
> +    if ( !spin_trylock(&vvtd->ie_lock) )

Uh, is this correct? You are returning without the queue being
invalidated AFAICT.

> +        return;
> +
> +    iqh = MASK_EXTR(vvtd_get_reg_quad(vvtd, DMAR_IQH_REG), QINVAL_INDEX_MASK);
> +    iqt = MASK_EXTR(vvtd_get_reg_quad(vvtd, DMAR_IQT_REG), QINVAL_INDEX_MASK);
> +    /*
> +     * No new descriptor is fetched from the Invalidation Queue until
> +     * software clears the IQE field in the Fault Status Register
> +     */
> +    if ( vvtd_test_bit(vvtd, DMAR_FSTS_REG, DMA_FSTS_IQE_SHIFT) )
> +    {
> +        spin_unlock(&vvtd->ie_lock);
> +        return;
> +    }
> +
> +    max_entry = 1 << (QINVAL_ENTRY_ORDER +
> +                      DMA_IQA_QS(vvtd_get_reg_quad(vvtd, DMAR_IQA_REG)));
> +
> +    ASSERT(iqt < max_entry);

Is it possible for the user to write a valid value to DMAR_IQT_REG and
then change DMAR_IQA_REG in order to make the above ASSERT trigger?

> +    if ( iqh == iqt )
> +    {
> +        spin_unlock(&vvtd->ie_lock);
> +        return;
> +    }
> +
> +    for ( i = iqh; i != iqt; i = (i + 1) % max_entry )
> +    {
> +        err = process_iqe(vvtd, i);
> +        if ( err )
> +            break;
> +    }
> +
> +    /*
> +     * Set IQH before checking the error, because IQH should reference
> +     * the descriptor associated with the error when an error is seen
> +     * by the guest.
> +     */
> +    vvtd_set_reg_quad(vvtd, DMAR_IQH_REG, i << QINVAL_INDEX_SHIFT);
> +
> +    spin_unlock(&vvtd->ie_lock);
> +    if ( err )
> +    {
> +        spin_lock(&vvtd->fe_lock);
> +        vvtd_report_non_recoverable_fault(vvtd, DMA_FSTS_IQE_SHIFT);
> +        spin_unlock(&vvtd->fe_lock);
> +    }
> +}
> +
> +static void vvtd_write_iqt(struct vvtd *vvtd, uint32_t val)
> +{
> +    uint32_t max_entry;
> +
> +    if ( val & ~QINVAL_INDEX_MASK )
> +    {
> +        vvtd_info("attempts to set reserved bits in IQT\n");
> +        return;
> +    }
> +
> +    max_entry = 1U << (QINVAL_ENTRY_ORDER +
> +                       DMA_IQA_QS(vvtd_get_reg_quad(vvtd, DMAR_IQA_REG)));
> +    if ( MASK_EXTR(val, QINVAL_INDEX_MASK) >= max_entry )
> +    {
> +        vvtd_info("IQT: Value %x exceeded supported max index.", val);
> +        return;
> +    }
> +
> +    vvtd_set_reg(vvtd, DMAR_IQT_REG, val);
> +}
> +
> +static void vvtd_write_iqa(struct vvtd *vvtd, uint32_t val, bool high)
> +{
> +    uint64_t cap = vvtd_get_reg_quad(vvtd, DMAR_CAP_REG);
> +    uint64_t old = vvtd_get_reg_quad(vvtd, DMAR_IQA_REG);
> +    uint64_t new;
> +
> +    if ( high )
> +        new = ((uint64_t)val << 32) | (old & 0xffffffff);
> +    else
> +        new = ((old >> 32) << 32) | val;

You can also use old & ~0xffffffffUL

> +
> +    if ( new & (~((1ULL << cap_mgaw(cap)) - 1) | DMA_IQA_RSVD) )
> +    {
> +        vvtd_info("Attempt to set reserved bits in IQA\n");
> +        return;
> +    }
> +
> +    vvtd_set_reg_quad(vvtd, DMAR_IQA_REG, new);
> +    if ( high && !vvtd->inv_queue_base )
> +        vvtd->inv_queue_base = map_guest_pages(vvtd->domain,
> +                                               PFN_DOWN(DMA_IQA_ADDR(new)),
> +                                               1 << DMA_IQA_QS(new));

Don't you need to take a reference to these page(s)?

> +    else if ( !high && vvtd->inv_queue_base )

I'm not sure I follow the logic with high here.

> +    {
> +        unmap_guest_pages(vvtd->inv_queue_base, 1 << DMA_IQA_QS(old));
> +        vvtd->inv_queue_base = NULL;
> +    }
> +}
> +
> +static void vvtd_write_ics(struct vvtd *vvtd, uint32_t val)
> +{
> +    if ( val & DMA_ICS_IWC )
> +    {
> +        vvtd_clear_bit(vvtd, DMAR_ICS_REG, DMA_ICS_IWC_SHIFT);
> +        /* When IWC field is cleared, the IP field needs to be cleared */
> +        vvtd_clear_bit(vvtd, DMAR_IECTL_REG, DMA_IECTL_IP_SHIFT);
> +    }
> +}
> +
>  static int vvtd_write_frcd3(struct vvtd *vvtd, uint32_t val)
>  {
>      /* Writing a 1 means clear fault */
> @@ -489,6 +682,20 @@ static int vvtd_write_frcd3(struct vvtd *vvtd, uint32_t val)
>      return X86EMUL_OKAY;
>  }
>  
> +static void vvtd_write_iectl(struct vvtd *vvtd, uint32_t val)
> +{
> +    /* Only DMA_IECTL_IM bit is writable. Generate pending event when unmask */
> +    if ( !(val & DMA_IECTL_IM) )
> +    {
> +        /* Clear IM and clear IP */
> +        vvtd_clear_bit(vvtd, DMAR_IECTL_REG, DMA_IECTL_IM_SHIFT);
> +        if ( vvtd_test_and_clear_bit(vvtd, DMAR_IECTL_REG, DMA_IECTL_IP_SHIFT) )
> +            vvtd_notify_inv_completion(vvtd);
> +    }
> +    else
> +        vvtd_set_bit(vvtd, DMAR_IECTL_REG, DMA_IECTL_IM_SHIFT);
> +}
> +
>  static void vvtd_write_fectl(struct vvtd *vvtd, uint32_t val)
>  {
>      /*
> @@ -681,6 +888,48 @@ static void vvtd_write_fault_regs(struct vvtd *vvtd, unsigned long val,
>      spin_unlock(&vvtd->fe_lock);
>  }
>  
> +static void vvtd_write_invalidation_regs(struct vvtd *vvtd, unsigned long val,
> +                                         unsigned int offset, unsigned int len)
> +{
> +    spin_lock(&vvtd->ie_lock);
> +    for ( ; len ; len -= 4, offset += 4, val = val >> 32)

Same comment as in the previous patch: I don't really like the for
loop, but I guess 64bit accesses must be allowed to this group of
registers?

> +    {
> +        switch ( offset )
> +        {
> +        case DMAR_IECTL_REG:
> +            vvtd_write_iectl(vvtd, val);
> +            break;
> +
> +        case DMAR_ICS_REG:
> +            vvtd_write_ics(vvtd, val);
> +            break;
> +
> +        case DMAR_IQT_REG:
> +            vvtd_write_iqt(vvtd, val);
> +            break;
> +
> +        case DMAR_IQA_REG:
> +            vvtd_write_iqa(vvtd, val, 0);
> +            break;
> +
> +        case DMAR_IQUA_REG:
> +            vvtd_write_iqa(vvtd, val, 1);
> +            break;
> +
> +        case DMAR_IEDATA_REG:
> +        case DMAR_IEADDR_REG:
> +        case DMAR_IEUADDR_REG:
> +            vvtd_set_reg(vvtd, offset, val);
> +            break;
> +
> +        default:
> +            break;
> +        }
> +    }
> +    spin_unlock(&vvtd->ie_lock);
> +
> +}
> +
>  static int vvtd_write(struct vcpu *v, unsigned long addr,
>                        unsigned int len, unsigned long val)
>  {
> @@ -719,6 +968,17 @@ static int vvtd_write(struct vcpu *v, unsigned long addr,
>          vvtd_write_fault_regs(vvtd, val, offset, len);
>          break;
>  
> +    case DMAR_IECTL_REG:
> +    case DMAR_ICS_REG:
> +    case DMAR_IQT_REG:
> +    case DMAR_IQA_REG:
> +    case DMAR_IQUA_REG:
> +    case DMAR_IEDATA_REG:
> +    case DMAR_IEADDR_REG:
> +    case DMAR_IEUADDR_REG:
> +        vvtd_write_invalidation_regs(vvtd, val, offset, len);
> +        break;
> +
>      default:
>          if ( (offset == (fault_offset + DMA_FRCD2_OFFSET)) ||
>               (offset == (fault_offset + DMA_FRCD3_OFFSET)) )
> @@ -840,7 +1100,8 @@ static int vvtd_handle_irq_request(const struct domain *d,
>                          irte.remap.tm);
>  
>   out:
> -    atomic_dec(&vvtd->inflight_intr);
> +    if ( !atomic_dec_and_test(&vvtd->inflight_intr) )
> +        vvtd_process_iq(vvtd);
>      return ret;
>  }
>  
> @@ -911,6 +1172,7 @@ static int vvtd_create(struct domain *d, struct viommu *viommu)
>      vvtd->domain = d;
>      register_mmio_handler(d, &vvtd_mmio_ops);
>      spin_lock_init(&vvtd->fe_lock);
> +    spin_lock_init(&vvtd->ie_lock);
>  
>      viommu->priv = vvtd;
>  
> @@ -930,6 +1192,13 @@ static int vvtd_destroy(struct viommu *viommu)
>                                       sizeof(struct iremap_entry)));
>              vvtd->irt_base = NULL;
>          }
> +        if ( vvtd->inv_queue_base )
> +        {
> +            uint64_t old = vvtd_get_reg_quad(vvtd, DMAR_IQA_REG);
> +
> +            unmap_guest_pages(vvtd->inv_queue_base, 1 << DMA_IQA_QS(old));

Don't you also need to unmap the page(s) when QIE is disabled?

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 17/28] x86/vvtd: save and restore emulated VT-d
  2017-11-17  6:22 ` [PATCH v4 17/28] x86/vvtd: save and restore emulated VT-d Chao Gao
@ 2018-02-12 14:49   ` Roger Pau Monné
  2018-02-23  5:22     ` Chao Gao
  0 siblings, 1 reply; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-12 14:49 UTC (permalink / raw)
  To: Chao Gao
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Nov 17, 2017 at 02:22:24PM +0800, Chao Gao wrote:
> Provide a save-restore pair to save/restore registers and non-register
> status.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> ---
> v3:
>  - use one entry to save both vvtd registers and other intermediate
>  state
> ---
>  xen/drivers/passthrough/vtd/vvtd.c     | 57 +++++++++++++++++++++++-----------
>  xen/include/public/arch-x86/hvm/save.h | 18 ++++++++++-
>  2 files changed, 56 insertions(+), 19 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
> index 81170ec..f6bde69 100644
> --- a/xen/drivers/passthrough/vtd/vvtd.c
> +++ b/xen/drivers/passthrough/vtd/vvtd.c
> @@ -27,8 +27,10 @@
>  #include <asm/event.h>
>  #include <asm/io_apic.h>
>  #include <asm/hvm/domain.h>
> +#include <asm/hvm/save.h>
>  #include <asm/hvm/support.h>
>  #include <asm/p2m.h>
> +#include <public/hvm/save.h>
>  
>  #include "iommu.h"
>  #include "vtd.h"
> @@ -38,20 +40,6 @@
>  
>  #define VVTD_FRCD_NUM   1ULL
>  #define VVTD_FRCD_START (DMAR_IRTA_REG + 8)
> -#define VVTD_FRCD_END   (VVTD_FRCD_START + VVTD_FRCD_NUM * 16)
> -#define VVTD_MAX_OFFSET VVTD_FRCD_END
> -
> -struct hvm_hw_vvtd {
> -    bool eim_enabled;
> -    bool intremap_enabled;
> -    uint32_t fault_index;
> -
> -    /* Interrupt remapping table base gfn and the max of entries */
> -    uint16_t irt_max_entry;
> -    gfn_t irt;

You are changing gfn_t to uint64_t. Is gfn_t not working with the
migration stream?

Also I think this duplication of fields (having all registers in
'regs' and some cached in miscellaneous top-level fields) is not a good
approach.

> -
> -    uint32_t regs[VVTD_MAX_OFFSET/sizeof(uint32_t)];
> -};
>  
>  struct vvtd {
>      /* Base address of remapping hardware register-set */
> @@ -776,7 +764,7 @@ static void write_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
>      if ( vvtd->hw.intremap_enabled )
>          vvtd_info("Update Interrupt Remapping Table when active\n");
>  
> -    if ( gfn_x(vvtd->hw.irt) != PFN_DOWN(DMA_IRTA_ADDR(irta)) ||
> +    if ( vvtd->hw.irt != PFN_DOWN(DMA_IRTA_ADDR(irta)) ||
>           vvtd->hw.irt_max_entry != DMA_IRTA_SIZE(irta) )
>      {
>          if ( vvtd->irt_base )
> @@ -786,14 +774,14 @@ static void write_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
>                                       sizeof(struct iremap_entry)));
>              vvtd->irt_base = NULL;
>          }
> -        vvtd->hw.irt = _gfn(PFN_DOWN(DMA_IRTA_ADDR(irta)));
> +        vvtd->hw.irt = PFN_DOWN(DMA_IRTA_ADDR(irta));
>          vvtd->hw.irt_max_entry = DMA_IRTA_SIZE(irta);
>          vvtd->hw.eim_enabled = !!(irta & IRTA_EIME);
>          vvtd_info("Update IR info (addr=%lx eim=%d size=%d)\n",
> -                  gfn_x(vvtd->hw.irt), vvtd->hw.eim_enabled,
> +                  vvtd->hw.irt, vvtd->hw.eim_enabled,
>                    vvtd->hw.irt_max_entry);
>  
> -        vvtd->irt_base = map_guest_pages(vvtd->domain, gfn_x(vvtd->hw.irt),
> +        vvtd->irt_base = map_guest_pages(vvtd->domain, vvtd->hw.irt,
>                                           PFN_UP(vvtd->hw.irt_max_entry *
>                                                  sizeof(struct iremap_entry)));
>      }
> @@ -1138,6 +1126,39 @@ static bool vvtd_is_remapping(const struct domain *d,
>      return !irq_remapping_request_index(irq, &idx);
>  }
>  
> +static int vvtd_load(struct domain *d, hvm_domain_context_t *h)
> +{
> +    struct vvtd *vvtd = domain_vvtd(d);
> +    uint64_t iqa;
> +
> +    if ( !vvtd )
> +        return -ENODEV;
> +
> +    if ( hvm_load_entry(VVTD, h, &vvtd->hw) )
> +        return -EINVAL;
> +
> +    iqa = vvtd_get_reg_quad(vvtd, DMAR_IQA_REG);
> +    vvtd->irt_base = map_guest_pages(vvtd->domain, vvtd->hw.irt,
> +                                     PFN_UP(vvtd->hw.irt_max_entry *
> +                                            sizeof(struct iremap_entry)));
> +    vvtd->inv_queue_base = map_guest_pages(vvtd->domain,
> +                                           PFN_DOWN(DMA_IQA_ADDR(iqa)),
> +                                           1 << DMA_IQA_QS(iqa));

Why are you unconditionally mapping those pages? Shouldn't you check
that the relevant features are enabled?

Both could be 0 or simply point to garbage.

> +    return 0;
> +}
> +
> +static int vvtd_save(struct domain *d, hvm_domain_context_t *h)
> +{
> +    struct vvtd *vvtd = domain_vvtd(d);
> +
> +    if ( !vvtd )
> +        return 0;
> +
> +    return hvm_save_entry(VVTD, 0, h, &vvtd->hw);
> +}
> +
> +HVM_REGISTER_SAVE_RESTORE(VVTD, vvtd_save, vvtd_load, 1, HVMSR_PER_DOM);
> +
>  static void vvtd_reset(struct vvtd *vvtd)
>  {
>      uint64_t cap = cap_set_num_fault_regs(VVTD_FRCD_NUM)
> diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
> index fd7bf3f..24a513b 100644
> --- a/xen/include/public/arch-x86/hvm/save.h
> +++ b/xen/include/public/arch-x86/hvm/save.h
> @@ -639,10 +639,26 @@ struct hvm_msr {
>  
>  #define CPU_MSR_CODE  20
>  
> +#define VVTD_MAX_OFFSET 0xd0

You used to have some kind of formula to calculate VVTD_MAX_OFFSET,
yet here the value is just hardcoded. Any reason for this?

> +struct hvm_hw_vvtd
> +{
> +    uint32_t eim_enabled : 1,
> +             intremap_enabled : 1;
> +    uint32_t fault_index;
> +
> +    /* Interrupt remapping table base gfn and the max of entries */
> +    uint32_t irt_max_entry;
> +    uint64_t irt;
> +
> +    uint32_t regs[VVTD_MAX_OFFSET/sizeof(uint32_t)];
> +};
> +
> +DECLARE_HVM_SAVE_TYPE(VVTD, 21, struct hvm_hw_vvtd);

Adding new fields to this struct in a migration compatible way is
going to be a PITA, but there's no easy solution to this I'm afraid...

Thanks, Roger.


* Re: [PATCH v4 18/28] x86/vioapic: Hook interrupt delivery of vIOAPIC
  2017-11-17  6:22 ` [PATCH v4 18/28] x86/vioapic: Hook interrupt delivery of vIOAPIC Chao Gao
@ 2018-02-12 14:54   ` Roger Pau Monné
  2018-02-24  1:51     ` Chao Gao
  0 siblings, 1 reply; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-12 14:54 UTC (permalink / raw)
  To: Chao Gao
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Nov 17, 2017 at 02:22:25PM +0800, Chao Gao wrote:
> When irq remapping is enabled, IOAPIC Redirection Entry may be in remapping
> format. If that, generate an irq_remapping_request and call the common

"If that's the case, ..."

> VIOMMU abstraction's callback to handle this interrupt request. Device
> model is responsible for checking the request's validity.

What does this exactly mean? The device model is not involved in what
the guest writes to the vIOAPIC RTE, so it's impossible for the device
model to validate this in any way.

> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> 
> ---
> v3:
>  - use the new interface to check remapping format.
> ---
>  xen/arch/x86/hvm/vioapic.c   | 9 +++++++++
>  xen/include/asm-x86/viommu.h | 9 +++++++++
>  2 files changed, 18 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
> index 97b419f..0f20e3f 100644
> --- a/xen/arch/x86/hvm/vioapic.c
> +++ b/xen/arch/x86/hvm/vioapic.c
> @@ -30,6 +30,7 @@
>  #include <xen/lib.h>
>  #include <xen/errno.h>
>  #include <xen/sched.h>
> +#include <xen/viommu.h>
>  #include <public/hvm/ioreq.h>
>  #include <asm/hvm/io.h>
>  #include <asm/hvm/vpic.h>
> @@ -387,9 +388,17 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
>      struct vlapic *target;
>      struct vcpu *v;
>      unsigned int irq = vioapic->base_gsi + pin;
> +    struct arch_irq_remapping_request request;
>  
>      ASSERT(spin_is_locked(&d->arch.hvm_domain.irq_lock));
>  
> +    irq_request_ioapic_fill(&request, vioapic->id, vioapic->redirtbl[pin].bits);
> +    if ( viommu_check_irq_remapping(d, &request) )
> +    {
> +        viommu_handle_irq_request(d, &request);
> +        return;
> +    }

Will this compile if you disable vIOMMU in Kconfig?

Roger.


* Re: [PATCH v4 19/28] x86/vioapic: extend vioapic_get_vector() to support remapping format RTE
  2017-11-17  6:22 ` [PATCH v4 19/28] x86/vioapic: extend vioapic_get_vector() to support remapping format RTE Chao Gao
@ 2018-02-12 15:01   ` Roger Pau Monné
  0 siblings, 0 replies; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-12 15:01 UTC (permalink / raw)
  To: Chao Gao
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Nov 17, 2017 at 02:22:26PM +0800, Chao Gao wrote:
> When IOAPIC RTE is in remapping format, it doesn't contain the vector of
> interrupt. For this case, the RTE contains an index of interrupt remapping
> table where the vector of interrupt is stored. This patch gets the vector
> through a vIOMMU interface.

I think this should be merged with the previous patch.

> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> ---
>  xen/arch/x86/hvm/vioapic.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
> index 0f20e3f..8b34b21 100644
> --- a/xen/arch/x86/hvm/vioapic.c
> +++ b/xen/arch/x86/hvm/vioapic.c
> @@ -560,11 +560,23 @@ int vioapic_get_vector(const struct domain *d, unsigned int gsi)
>  {
>      unsigned int pin;
>      const struct hvm_vioapic *vioapic = gsi_vioapic(d, gsi, &pin);
> +    struct arch_irq_remapping_request request;
>  
>      if ( !vioapic )
>          return -EINVAL;
>  
> -    return vioapic->redirtbl[pin].fields.vector;
> +    irq_request_ioapic_fill(&request, vioapic->id, vioapic->redirtbl[pin].bits);
> +    if ( viommu_check_irq_remapping(vioapic->domain, &request) )
> +    {
> +        struct arch_irq_remapping_info info;
> +
> +        return unlikely(viommu_get_irq_info(vioapic->domain, &request, &info))
> +                   ? : info.vector;
> +    }
> +    else
> +    {
> +        return vioapic->redirtbl[pin].fields.vector;
> +    }

Unneeded braces.

Thanks, Roger.


* Re: [PATCH v4 20/28] xen/pt: when binding guest msi, accept the whole msi message
  2017-11-17  6:22 ` [PATCH v4 20/28] xen/pt: when binding guest msi, accept the whole msi message Chao Gao
@ 2018-02-12 15:16   ` Roger Pau Monné
  2018-02-24  2:20     ` Chao Gao
  0 siblings, 1 reply; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-12 15:16 UTC (permalink / raw)
  To: Chao Gao
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Jan Beulich,
	Ian Jackson

On Fri, Nov 17, 2017 at 02:22:27PM +0800, Chao Gao wrote:
> ... rather than a filtered one. Previously, some fields (reserved or
> unalterable) were filtered out by QEMU. These fields are useless for the
> legacy interrupt format (i.e. non remappable format). However, these
> fields are meaningful to remappable format. Accepting the whole msi
> message will significantly reduce the efforts to support binding
> remappable format msi.

This should be sent as a separate patch series, together with the
required QEMU change. Batching it in this series is going to make it
harder to commit IMO.

Also note that the QEMU side needs to be committed and backported to
the qemu-xen tree before applying the Xen side.

> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> ---
> v4:
>  - new
> ---
>  tools/libxc/include/xenctrl.h |  7 ++++---
>  tools/libxc/xc_domain.c       | 14 ++++++++------
>  xen/arch/x86/hvm/vmsi.c       | 12 ++++++------
>  xen/drivers/passthrough/io.c  | 36 +++++++++++++++++-------------------
>  xen/include/asm-x86/hvm/irq.h |  5 +++--
>  xen/include/public/domctl.h   |  8 ++------
>  6 files changed, 40 insertions(+), 42 deletions(-)
> 
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index 666db0b..8ade90c 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -1756,16 +1756,17 @@ int xc_domain_ioport_mapping(xc_interface *xch,
>  int xc_domain_update_msi_irq(
>      xc_interface *xch,
>      uint32_t domid,
> -    uint32_t gvec,
>      uint32_t pirq,
> +    uint64_t addr,
> +    uint32_t data,
>      uint32_t gflags,

If you pass addr and data, do you really need to also pass gflags?

> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> index 7126de7..5edb0e7 100644
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -101,12 +101,12 @@ int vmsi_deliver(
>  
>  void vmsi_deliver_pirq(struct domain *d, const struct hvm_pirq_dpci *pirq_dpci)
>  {
> -    uint32_t flags = pirq_dpci->gmsi.gflags;
> -    int vector = pirq_dpci->gmsi.gvec;
> -    uint8_t dest = (uint8_t)flags;
> -    bool dest_mode = flags & XEN_DOMCTL_VMSI_X86_DM_MASK;
> -    uint8_t delivery_mode = MASK_EXTR(flags, XEN_DOMCTL_VMSI_X86_DELIV_MASK);
> -    bool trig_mode = flags & XEN_DOMCTL_VMSI_X86_TRIG_MASK;
> +    uint8_t vector = pirq_dpci->gmsi.data & MSI_DATA_VECTOR_MASK;

MASK_EXTR please (here and elsewhere).

> +    uint8_t dest = MASK_EXTR(pirq_dpci->gmsi.addr, MSI_ADDR_DEST_ID_MASK);
> +    bool dest_mode = pirq_dpci->gmsi.addr & MSI_ADDR_DESTMODE_MASK;
> +    uint8_t delivery_mode = MASK_EXTR(pirq_dpci->gmsi.data,
> +                                      MSI_DATA_DELIVERY_MODE_MASK);
> +    bool trig_mode = pirq_dpci->gmsi.data & MSI_DATA_TRIGGER_MASK;
>  
>      HVM_DBG_LOG(DBG_LEVEL_IOAPIC,
>                  "msi: dest=%x dest_mode=%x delivery_mode=%x "
> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> index 8f16e6c..d8c66bf 100644
> --- a/xen/drivers/passthrough/io.c
> +++ b/xen/drivers/passthrough/io.c
> @@ -339,19 +339,17 @@ int pt_irq_create_bind(
>      {
>      case PT_IRQ_TYPE_MSI:
>      {
> -        uint8_t dest, delivery_mode;
> +        uint8_t dest, delivery_mode, gvec;

I'm not sure you really need the gvec local variable, AFAICT it's used
only once.

> diff --git a/xen/include/asm-x86/hvm/irq.h b/xen/include/asm-x86/hvm/irq.h
> index 3b6b4bd..3a8832c 100644
> --- a/xen/include/asm-x86/hvm/irq.h
> +++ b/xen/include/asm-x86/hvm/irq.h
> @@ -132,9 +132,10 @@ struct dev_intx_gsi_link {
>  #define HVM_IRQ_DPCI_TRANSLATE       (1u << _HVM_IRQ_DPCI_TRANSLATE_SHIFT)
>  
>  struct hvm_gmsi_info {
> -    uint32_t gvec;
> -    uint32_t gflags;
> +    uint32_t data;
>      int dest_vcpu_id; /* -1 :multi-dest, non-negative: dest_vcpu_id */
> +    uint64_t addr;
> +    uint8_t gvec;

Can't you just obtain the guest vector from addr and flags?

>      bool posted; /* directly deliver to guest via VT-d PI? */
>  };
>  
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index 9f6f0aa..2717c68 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -536,15 +536,11 @@ struct xen_domctl_bind_pt_irq {
>              uint8_t intx;
>          } pci;
>          struct {
> -            uint8_t gvec;
>              uint32_t gflags;
> -#define XEN_DOMCTL_VMSI_X86_DEST_ID_MASK 0x0000ff
> -#define XEN_DOMCTL_VMSI_X86_RH_MASK      0x000100
> -#define XEN_DOMCTL_VMSI_X86_DM_MASK      0x000200
> -#define XEN_DOMCTL_VMSI_X86_DELIV_MASK   0x007000
> -#define XEN_DOMCTL_VMSI_X86_TRIG_MASK    0x008000
>  #define XEN_DOMCTL_VMSI_X86_UNMASKED     0x010000

Oh, I see, you need gflags for the unmask thing only.

>  
> +            uint32_t data;
> +            uint64_t addr;
>              uint64_aligned_t gtable;
>          } msi;
>          struct {
> -- 
> 1.8.3.1
> 


* Re: [PATCH v4 21/28] vvtd: update hvm_gmsi_info when binding guest msi with pirq or
  2017-11-17  6:22 ` [PATCH v4 21/28] vvtd: update hvm_gmsi_info when binding guest msi with pirq or Chao Gao
@ 2018-02-12 15:38   ` Roger Pau Monné
  2018-02-24  5:05     ` Chao Gao
  0 siblings, 1 reply; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-12 15:38 UTC (permalink / raw)
  To: Chao Gao
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Nov 17, 2017 at 02:22:28PM +0800, Chao Gao wrote:
> ... handling guest's invalidation request.
> 
> To support pirq migration optimization and using VT-d posted interrupt to
> inject msi from assigned devices, each time guest programs msi information
> (affinity, vector), the struct hvm_gmsi_info should be updated accordingly.
> But after introducing vvtd, guest only needs to update an IRTE, which is in
> guest memory, to program msi information.  vvtd doesn't trap r/w to the memory
> range. Instead, it traps the queue invalidation, which is a method used to
> notify VT-d hardware that an IRTE has changed.
> 
> This patch updates hvm_gmsi_info structure and programs physical IRTEs to use
> VT-d posted interrupt if possible when binding guest msi with pirq or handling
> guest's invalidation request. For the latter, all physical interrupts bound
> with the domain are gone through to find the ones matching with the IRTE.
> 
> Notes: calling vvtd_process_iq() in vvtd_read() rather than in
> vvtd_handle_irq_request() is to avoid ABBA deadlock of d->event_lock and
> vvtd->ie_lock.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> ---
> v4:
>  - new
> ---
>  xen/arch/x86/hvm/hvm.c             |  2 +-
>  xen/drivers/passthrough/io.c       | 89 ++++++++++++++++++++++++++++----------
>  xen/drivers/passthrough/vtd/vvtd.c | 70 ++++++++++++++++++++++++++++--
>  xen/include/asm-x86/hvm/hvm.h      |  2 +
>  xen/include/asm-x86/hvm/irq.h      |  1 +
>  xen/include/asm-x86/viommu.h       | 11 +++++
>  6 files changed, 147 insertions(+), 28 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 964418a..d2c1372 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -462,7 +462,7 @@ void hvm_migrate_timers(struct vcpu *v)
>      pt_migrate(v);
>  }
>  
> -static int hvm_migrate_pirq(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
> +int hvm_migrate_pirq(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
>                              void *arg)
>  {
>      struct vcpu *v = arg;
> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> index d8c66bf..9198ef5 100644
> --- a/xen/drivers/passthrough/io.c
> +++ b/xen/drivers/passthrough/io.c
> @@ -21,6 +21,7 @@
>  #include <xen/iommu.h>
>  #include <xen/cpu.h>
>  #include <xen/irq.h>
> +#include <xen/viommu.h>
>  #include <asm/hvm/irq.h>
>  #include <asm/hvm/support.h>
>  #include <asm/io_apic.h>
> @@ -275,6 +276,61 @@ static struct vcpu *vector_hashing_dest(const struct domain *d,
>      return dest;
>  }
>  
> +void pt_update_gmsi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
> +{
> +    uint8_t dest, delivery_mode;
> +    bool dest_mode;
> +    int dest_vcpu_id;
> +    const struct vcpu *vcpu;
> +    struct arch_irq_remapping_request request;
> +    struct arch_irq_remapping_info remap_info;
> +
> +    ASSERT(spin_is_locked(&d->event_lock));
> +
> +    /* Calculate dest_vcpu_id for MSI-type pirq migration. */
> +    irq_request_msi_fill(&request, pirq_dpci->gmsi.addr, pirq_dpci->gmsi.data);
> +    if ( viommu_check_irq_remapping(d, &request) )
> +    {
> +        /* An error in IRTE, don't perform the optimization */
> +        if ( viommu_get_irq_info(d, &request, &remap_info) )
> +        {
> +            pirq_dpci->gmsi.posted = false;
> +            pirq_dpci->gmsi.dest_vcpu_id = -1;
> +            pirq_dpci->gmsi.gvec = 0;
> +            return;
> +        }
> +
> +        dest = remap_info.dest;
> +        dest_mode = remap_info.dest_mode;
> +        delivery_mode = remap_info.delivery_mode;
> +        pirq_dpci->gmsi.gvec = remap_info.vector;
> +    }
> +    else
> +    {
> +        dest = MASK_EXTR(pirq_dpci->gmsi.addr, MSI_ADDR_DEST_ID_MASK);
> +        dest_mode = pirq_dpci->gmsi.addr & MSI_ADDR_DESTMODE_MASK;
> +        delivery_mode = MASK_EXTR(pirq_dpci->gmsi.data,
> +                                  MSI_DATA_DELIVERY_MODE_MASK);
> +        pirq_dpci->gmsi.gvec = pirq_dpci->gmsi.data & MSI_DATA_VECTOR_MASK;
> +    }
> +
> +    dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
> +    pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
> +
> +    pirq_dpci->gmsi.posted = false;
> +    vcpu = (dest_vcpu_id >= 0) ? d->vcpu[dest_vcpu_id] : NULL;

So you use dest_vcpu_id to get the vcpu here...

> +    if ( iommu_intpost )
> +    {
> +        if ( delivery_mode == dest_LowestPrio )
> +            vcpu = vector_hashing_dest(d, dest, dest_mode, pirq_dpci->gmsi.gvec);
> +        if ( vcpu )
> +        {
> +            pirq_dpci->gmsi.posted = true;
> +            pirq_dpci->gmsi.dest_vcpu_id = vcpu->vcpu_id;

... which is only used here in order to get the dest_vcpu_id back. Is
this really needed? Can't you just use dest_vcpu_id?

I would rather do:

if ( iommu_intpost && delivery_mode == dest_LowestPrio )
{
    const struct vcpu *vcpu = vector_hashing_dest(d, dest, dest_mode,
                                                  pirq_dpci->gmsi.gvec);

    if ( vcpu )
    {
        ....
    }
}

> +        }
> +    }
> +}
> +
>  int pt_irq_create_bind(
>      struct domain *d, const struct xen_domctl_bind_pt_irq *pt_irq_bind)
>  {
> @@ -339,9 +395,6 @@ int pt_irq_create_bind(
>      {
>      case PT_IRQ_TYPE_MSI:
>      {
> -        uint8_t dest, delivery_mode, gvec;
> -        bool dest_mode;
> -        int dest_vcpu_id;
>          const struct vcpu *vcpu;
>  
>          if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
> @@ -411,35 +464,23 @@ int pt_irq_create_bind(
>                  pirq_dpci->gmsi.addr = pt_irq_bind->u.msi.addr;
>              }
>          }
> -        /* Calculate dest_vcpu_id for MSI-type pirq migration. */
> -        dest = MASK_EXTR(pirq_dpci->gmsi.addr, MSI_ADDR_DEST_ID_MASK);
> -        dest_mode = pirq_dpci->gmsi.addr & MSI_ADDR_DESTMODE_MASK;
> -        delivery_mode = MASK_EXTR(pirq_dpci->gmsi.data,
> -                                  MSI_DATA_DELIVERY_MODE_MASK);
> -        gvec = pirq_dpci->gmsi.data & MSI_DATA_VECTOR_MASK;
> -        pirq_dpci->gmsi.gvec = gvec;
>  
> -        dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
> -        pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
> +        pt_update_gmsi(d, pirq_dpci);
>          spin_unlock(&d->event_lock);
>  
> -        pirq_dpci->gmsi.posted = false;
> -        vcpu = (dest_vcpu_id >= 0) ? d->vcpu[dest_vcpu_id] : NULL;
> -        if ( iommu_intpost )
> -        {
> -            if ( delivery_mode == dest_LowestPrio )
> -                vcpu = vector_hashing_dest(d, dest, dest_mode,
> -                                           pirq_dpci->gmsi.gvec);
> -            if ( vcpu )
> -                pirq_dpci->gmsi.posted = true;
> -        }
> -        if ( dest_vcpu_id >= 0 )
> -            hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
> +        if ( pirq_dpci->gmsi.dest_vcpu_id >= 0 )
> +            hvm_migrate_pirqs(d->vcpu[pirq_dpci->gmsi.dest_vcpu_id]);
>  
>          /* Use interrupt posting if it is supported. */
>          if ( iommu_intpost )
> +        {
> +            if ( pirq_dpci->gmsi.posted )
> +                vcpu = d->vcpu[pirq_dpci->gmsi.dest_vcpu_id];
> +            else
> +                vcpu = NULL;
>              pi_update_irte(vcpu ? &vcpu->arch.hvm_vmx.pi_desc : NULL,
>                             info, pirq_dpci->gmsi.gvec);

If vcpu is now only used inside of this if condition please move its
declaration here to reduce the scope.

> +        }
>  
>          if ( pt_irq_bind->u.msi.gflags & XEN_DOMCTL_VMSI_X86_UNMASKED )
>          {
> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
> index f6bde69..d12ad1d 100644
> --- a/xen/drivers/passthrough/vtd/vvtd.c
> +++ b/xen/drivers/passthrough/vtd/vvtd.c
> @@ -477,6 +477,50 @@ static int vvtd_record_fault(struct vvtd *vvtd,
>  }
>  
>  /*
> + * 'arg' is the index of the interrupt remapping table. This index is used to
> + * search physical irqs which satisfy that the gmsi mapped with the physical irq
> + * is translated by the IRTE referred to by the index. The struct hvm_gmsi_info
> + * contains some fields that are inferred from a virtual IRTE. These fields should
> + * be updated when the guest invalidates an IRTE. Furthermore, the physical IRTE
> + * is updated accordingly to reduce IPIs or utilize VT-d posted interrupts.
> + *
> + * if 'arg' is -1, perform a global invalidation.
> + */
> +static int invalidate_gmsi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
> +                         void *arg)
> +{
> +    if ( pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI )
> +    {
> +        uint32_t index, target = (long)arg;
> +        struct arch_irq_remapping_request req;
> +        const struct vcpu *vcpu;
> +
> +        irq_request_msi_fill(&req, pirq_dpci->gmsi.addr, pirq_dpci->gmsi.data);
> +        if ( !irq_remapping_request_index(&req, &index) &&
> +             ((target == -1) || (target == index)) )

Shouldn't this -1 be some kind of define, like GMSI_ALL or similar?
Also isn't it possible to use -1 as a valid target?

> +        {
> +            pt_update_gmsi(d, pirq_dpci);
> +            if ( pirq_dpci->gmsi.dest_vcpu_id >= 0 )
> +                hvm_migrate_pirq(d, pirq_dpci,
> +                                 d->vcpu[pirq_dpci->gmsi.dest_vcpu_id]);
> +
> +            /* Use interrupt posting if it is supported. */
> +            if ( iommu_intpost )
> +            {
> +                if ( pirq_dpci->gmsi.posted )
> +                    vcpu = d->vcpu[pirq_dpci->gmsi.dest_vcpu_id];
> +                else
> +                    vcpu = NULL;
> +                pi_update_irte(vcpu ? &vcpu->arch.hvm_vmx.pi_desc : NULL,
> +                               dpci_pirq(pirq_dpci), pirq_dpci->gmsi.gvec);
> +            }
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +/*
>   * Process an invalidation descriptor. Currently, only two types descriptors,
>   * Interrupt Entry Cache Invalidation Descritor and Invalidation Wait
>   * Descriptor are handled.
> @@ -530,7 +574,26 @@ static int process_iqe(struct vvtd *vvtd, uint32_t i)
>          break;
>  
>      case TYPE_INVAL_IEC:
> -        /* No cache is preserved in vvtd, nothing is needed to be flushed */
> +        /*
> +         * If VT-d pi is enabled, pi_update_irte() may be called. It assumes
> +         * pcidevs_locked().
> +         */
> +        pcidevs_lock();
> +        spin_lock(&vvtd->domain->event_lock);
> +        /* A global invalidation of the cache is requested */
> +        if ( !qinval.q.iec_inv_dsc.lo.granu )
> +            pt_pirq_iterate(vvtd->domain, invalidate_gmsi, (void *)(long)-1);
> +        else
> +        {
> +            uint32_t iidx = qinval.q.iec_inv_dsc.lo.iidx;
> +            uint32_t nr = 1 << qinval.q.iec_inv_dsc.lo.im;
> +
> +            for ( ; nr; nr--, iidx++)

You can initialize nr in the for loop.

> +                pt_pirq_iterate(vvtd->domain, invalidate_gmsi,
> +                                (void *)(long)iidx);
> +        }
> +        spin_unlock(&vvtd->domain->event_lock);
> +        pcidevs_unlock();
>          break;
>  
>      default:
> @@ -839,6 +902,8 @@ static int vvtd_read(struct vcpu *v, unsigned long addr,
>      else
>          *pval = vvtd_get_reg_quad(vvtd, offset);
>  
> +    if ( !atomic_read(&vvtd->inflight_intr) )
> +        vvtd_process_iq(vvtd);
>      return X86EMUL_OKAY;
>  }
>  
> @@ -1088,8 +1153,7 @@ static int vvtd_handle_irq_request(const struct domain *d,
>                          irte.remap.tm);
>  
>   out:
> -    if ( !atomic_dec_and_test(&vvtd->inflight_intr) )
> -        vvtd_process_iq(vvtd);
> +    atomic_dec(&vvtd->inflight_intr);

Why is this removed? It was changed like 4 patches before, and
reverted here.

>      return ret;
>  }
>  
> diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
> index b687e03..f276ab6 100644
> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -394,6 +394,8 @@ bool hvm_set_guest_bndcfgs(struct vcpu *v, u64 val);
>  bool hvm_check_cpuid_faulting(struct vcpu *v);
>  void hvm_migrate_timers(struct vcpu *v);
>  void hvm_do_resume(struct vcpu *v);
> +int hvm_migrate_pirq(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
> +                            void *arg);

Please align.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 08/28] x86/vvtd: Add MMIO handler for VVTD
  2018-02-09 17:51       ` Roger Pau Monné
@ 2018-02-22  6:20         ` Chao Gao
  2018-02-23 17:07           ` Roger Pau Monné
  0 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2018-02-22  6:20 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Feb 09, 2018 at 05:51:29PM +0000, Roger Pau Monné wrote:
>On Sat, Feb 10, 2018 at 01:21:09AM +0800, Chao Gao wrote:
>> On Fri, Feb 09, 2018 at 04:39:15PM +0000, Roger Pau Monné wrote:
>> >On Fri, Nov 17, 2017 at 02:22:15PM +0800, Chao Gao wrote:
>> >> This patch adds VVTD MMIO handler to deal with MMIO access.
>> >> 
>> >> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> >> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>> >> ---
>> >> v4:
>> >>  - only trap the register emulated in vvtd_in_range().
>> >>    i.e. replace PAGE_SIZE with the VVTD_MAX_OFFSET
>> >> ---
>> >>  xen/drivers/passthrough/vtd/vvtd.c | 55 ++++++++++++++++++++++++++++++++++++++
>> >>  1 file changed, 55 insertions(+)
>> >> 
>> >> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
>> >> index 9f76ccf..d78d878 100644
>> >> --- a/xen/drivers/passthrough/vtd/vvtd.c
>> >> +++ b/xen/drivers/passthrough/vtd/vvtd.c
>> >
>> >Now that I look at this, this is the wrong folder. This should be in
>> >xen/arch/x86/hvm with the rest of the emulated devices.
>> 
>> It is a problem we discussed in previous versions. AMD puts its vIOMMU
>> (iommu_guest.c) in xen/drivers/passthrough/amd/. We are following what
>> they did. I don't have special taste on this. If no one objects to your
>> suggestion, I will move it to xen/arch/x86/hvm/. Maybe create a new
>> intel directory since it's intel-specific and won't be used by AMD.
>
>Oh, it's been quite some time since I've reviewed that, so TBH I
>didn't remember that discussion.
>
>If the AMD viommu thing is already there I guess it doesn't hurt...
>Also, have you checked whether it can be converted to use the
>infrastructure that you add here?

Not yet. It seems there is currently no way to use the AMD vIOMMU, and
I notice that Wei plans to remove it.

I can convert the AMD vIOMMU implementation to use this infrastructure
if we finally decide to keep it.

Thanks
Chao


* Re: [PATCH v4 10/28] x86/vvtd: Enable Interrupt Remapping through GCMD
  2018-02-12 11:30       ` Roger Pau Monné
@ 2018-02-22  6:25         ` Chao Gao
  0 siblings, 0 replies; 83+ messages in thread
From: Chao Gao @ 2018-02-22  6:25 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, George Dunlap,
	Ian Jackson, Tim Deegan, xen-devel, Jan Beulich, Andrew Cooper

On Mon, Feb 12, 2018 at 11:30:18AM +0000, Roger Pau Monné wrote:
>On Sun, Feb 11, 2018 at 01:05:01PM +0800, Chao Gao wrote:
>> On Fri, Feb 09, 2018 at 05:15:17PM +0000, Roger Pau Monné wrote:
>> >On Fri, Nov 17, 2017 at 02:22:17PM +0800, Chao Gao wrote:
>> >> +static void write_gcmd_ire(struct vvtd *vvtd, uint32_t val)
>> >> +{
>> >> +    bool set = val & DMA_GCMD_IRE;
>> >> +
>> >> +    vvtd_info("%sable Interrupt Remapping\n", set ? "En" : "Dis");
>> >> +
>> >> +    vvtd->hw.intremap_enabled = set;
>> >> +    (set ? vvtd_set_bit : vvtd_clear_bit)
>> >> +        (vvtd, DMAR_GSTS_REG, DMA_GSTS_IRES_SHIFT);
>> >> +}
>> >> +
>> >>  static void write_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
>> >>  {
>> >>      uint64_t irta = vvtd_get_reg_quad(vvtd, DMAR_IRTA_REG);
>> >> @@ -131,16 +205,29 @@ static void write_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
>> >>       * the 'Set Interrupt Remap Table Pointer' operation.
>> >>       */
>> >>      vvtd_clear_bit(vvtd, DMAR_GSTS_REG, DMA_GSTS_SIRTPS_SHIFT);
>> >> +    if ( vvtd->hw.intremap_enabled )
>> >> +        vvtd_info("Update Interrupt Remapping Table when active\n");
>> >>  
>> >>      if ( gfn_x(vvtd->hw.irt) != PFN_DOWN(DMA_IRTA_ADDR(irta)) ||
>> >>           vvtd->hw.irt_max_entry != DMA_IRTA_SIZE(irta) )
>> >>      {
>> >> +        if ( vvtd->irt_base )
>> >> +        {
>> >> +            unmap_guest_pages(vvtd->irt_base,
>> >> +                              PFN_UP(vvtd->hw.irt_max_entry *
>> >> +                                     sizeof(struct iremap_entry)));
>> >> +            vvtd->irt_base = NULL;
>> >> +        }
>> >
>> >Shouldn't this be done when sirtp is switched off, instead of when
>> >it's updated?
>> >
>> >What happens in the following scenario:
>> >
>> >- Guest writes gfn to irta.
>> >- Guest enables sirtps.
>> >- Guest disables sirtps.
>> 
>> Disabling SIRTP isn't clear to me. Maybe you mean writing to GCMD with
>> SIRTP cleared. Hardware ignores a write of 0 to SIRTP, I think, because
>> SIRTP is a one-shot bit. Please refer to the example in VT-d spec 10.4.4.
>> Each time IRTP is updated, the old mapping should be destroyed and the
>> new mapping should be created.
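A standalone sketch of the one-shot behaviour being described (the bit position matches the spec, but variable names and the latch are illustrative, not Xen's actual code):

```c
#include <assert.h>
#include <stdint.h>

#define DMA_GCMD_SIRTP (1u << 24)

static uint64_t irta_reg;       /* guest-programmed DMAR_IRTA_REG */
static uint64_t latched_irta;   /* pointer actually used by the (v)IOMMU */

/* One-shot semantics: a GCMD write with SIRTP set re-latches IRTA_REG;
 * a write with SIRTP clear is simply ignored. */
static void write_gcmd_sirtp_cmd(uint32_t val)
{
    if ( !(val & DMA_GCMD_SIRTP) )
        return;
    /* Real code would unmap the old table and map the new one here. */
    latched_irta = irta_reg;
}
```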
>
>After reading the spec I agree, there's no such thing as clearing
>SIRTP.
>
>You should however unmap the IRTA address when IRE is cleared
>(interrupt remapping disabled), which AFAICT you don't to do now.

Yes. I agree.

Thanks
Chao


* Re: [PATCH v4 12/28] x86/vvtd: decode interrupt attribute from IRTE
  2018-02-12 11:55   ` Roger Pau Monné
@ 2018-02-22  6:33     ` Chao Gao
  0 siblings, 0 replies; 83+ messages in thread
From: Chao Gao @ 2018-02-22  6:33 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Mon, Feb 12, 2018 at 11:55:42AM +0000, Roger Pau Monné wrote:
>On Fri, Nov 17, 2017 at 02:22:19PM +0800, Chao Gao wrote:
>> Without interrupt remapping, interrupt attributes can be extracted from
>> msi message or IOAPIC RTE. However, with interrupt remapping enabled,
>> the attributes are enclosed in the associated IRTE. This callback is
>> for cases in which the caller wants to acquire interrupt attributes, for
>> example:
>> 1. vioapic_get_vector(). With vIOMMU, the RTE may don't contain vector.
>                                                ^ doesn't contain the vector.
>> 2. perform EOI which is always based on the interrupt vector.
>> 
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>> ---
>> v3:
>>  - add example cases in which we will use this function.
>
>I'm still missing the actual usage of vvtd_get_irq_info. This handler
>is introduced without any user.
>
>> ---
>>  xen/drivers/passthrough/vtd/vvtd.c | 25 +++++++++++++++++++++++++
>>  1 file changed, 25 insertions(+)
>> 
>> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
>> index 927e715..9890cc2 100644
>> --- a/xen/drivers/passthrough/vtd/vvtd.c
>> +++ b/xen/drivers/passthrough/vtd/vvtd.c
>> @@ -541,6 +541,30 @@ static int vvtd_handle_irq_request(const struct domain *d,
>>      return ret;
>>  }
>>  
>> +static int vvtd_get_irq_info(const struct domain *d,
>
>IMO for internal (static) functions you can drop the vvtd_ prefix.
>
>> +                             const struct arch_irq_remapping_request *irq,
>> +                             struct arch_irq_remapping_info *info)
>> +{
>> +    int ret;
>> +    struct iremap_entry irte;
>> +    struct vvtd *vvtd = domain_vvtd(d);
>> +
>> +    if ( !vvtd )
>> +        return -ENODEV;
>> +
>> +    ret = vvtd_get_entry(vvtd, irq, &irte);
>> +    /* not in an interrupt delivery, don't report faults to guest */
>> +    if ( ret )
>> +        return ret;
>> +
>> +    info->vector = irte.remap.vector;
>> +    info->dest = irte_dest(vvtd, irte.remap.dst);
>> +    info->dest_mode = irte.remap.dm;
>> +    info->delivery_mode = irte.remap.dlm;
>> +
>> +    return 0;
>> +}
>> +
>>  static void vvtd_reset(struct vvtd *vvtd)
>>  {
>>      uint64_t cap = cap_set_num_fault_regs(VVTD_FRCD_NUM)
>> @@ -603,6 +627,7 @@ static const struct viommu_ops vvtd_hvm_vmx_ops = {
>>      .create = vvtd_create,
>>      .destroy = vvtd_destroy,
>>      .handle_irq_request = vvtd_handle_irq_request,
>> +    .get_irq_info = vvtd_get_irq_info,
>
>So the public helper to this arch specific hook is added in 4/28, yet
>the arch specific code is added here, and I still have to figure out
>where this will actually be hooked into the vIOAPIC or vMSI code.
>
>Would it be possible to have a single patch, which contains 4/28, the
>code in this patch and the glue that hooks this into the vIOAPIC and
>vMSI code?
>
>The above likely applies to quite a lot of patches in this series.
>It's fine to try to reduce the size of patches as much as possible,
>but at least in this series this is actually harming (at least my)
>capability to review them.

Yes. I will put related changes in one patch.

Thanks
Chao


* Re: [PATCH v4 14/28] x86/vvtd: Handle interrupt translation faults
  2018-02-12 12:55   ` Roger Pau Monné
@ 2018-02-22  8:23     ` Chao Gao
  0 siblings, 0 replies; 83+ messages in thread
From: Chao Gao @ 2018-02-22  8:23 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Mon, Feb 12, 2018 at 12:55:06PM +0000, Roger Pau Monné wrote:
>On Fri, Nov 17, 2017 at 02:22:21PM +0800, Chao Gao wrote:
>> Interrupt translation faults are non-recoverable faults. When a fault
>> is triggered, the fault information needs to be populated into the
>> Fault Recording Registers and an MSI interrupt injected to notify the
>> guest IOMMU driver to deal with it.
>> 
>> This patch emulates hardware's handling of interrupt translation
>> faults (more information about the process can be found in VT-d spec,
>> chapter "Translation Faults", section "Non-Recoverable Fault
>> Reporting" and section "Non-Recoverable Logging").
>> Specifically, viommu_record_fault() records the fault information and
>> viommu_report_non_recoverable_fault() reports faults to software.
>> Currently, only Primary Fault Logging is supported and the Number of
>> Fault-recording Registers is 1.
>> 
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>> 
>> ---
>> v4:
>>  - introduce a lock to protect fault-event related regs
>> ---
>>  xen/drivers/passthrough/vtd/iommu.h |  51 ++++++-
>>  xen/drivers/passthrough/vtd/vvtd.c  | 288 +++++++++++++++++++++++++++++++++++-
>>  2 files changed, 333 insertions(+), 6 deletions(-)
>> 
>> diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
>> index 82edd2a..dc2df75 100644
>> --- a/xen/drivers/passthrough/vtd/iommu.h
>> +++ b/xen/drivers/passthrough/vtd/iommu.h
>> @@ -196,26 +196,67 @@
>>  #define DMA_CCMD_CAIG_MASK(x) (((u64)x) & ((u64) 0x3 << 59))
>>  
>>  /* FECTL_REG */
>> -#define DMA_FECTL_IM        ((uint32_t)1 << 31)
>> +#define DMA_FECTL_IM_SHIFT  31
>> +#define DMA_FECTL_IP_SHIFT  30
>> +#define DMA_FECTL_IM        ((uint32_t)1 << DMA_FECTL_IM_SHIFT)
>> +#define DMA_FECTL_IP        ((uint32_t)1 << DMA_FECTL_IP_SHIFT)
>>  
>>  /* FSTS_REG */
>> -#define DMA_FSTS_PFO        ((uint32_t)1 << 0)
>> -#define DMA_FSTS_PPF        ((uint32_t)1 << 1)
>> +#define DMA_FSTS_PFO_SHIFT  0
>> +#define DMA_FSTS_PPF_SHIFT  1
>> +#define DMA_FSTS_PRO_SHIFT  7
>> +
>> +#define DMA_FSTS_PFO        ((uint32_t)1 << DMA_FSTS_PFO_SHIFT)
>> +#define DMA_FSTS_PPF        ((uint32_t)1 << DMA_FSTS_PPF_SHIFT)
>>  #define DMA_FSTS_AFO        ((uint32_t)1 << 2)
>>  #define DMA_FSTS_APF        ((uint32_t)1 << 3)
>>  #define DMA_FSTS_IQE        ((uint32_t)1 << 4)
>>  #define DMA_FSTS_ICE        ((uint32_t)1 << 5)
>>  #define DMA_FSTS_ITE        ((uint32_t)1 << 6)
>> -#define DMA_FSTS_FAULTS    DMA_FSTS_PFO | DMA_FSTS_PPF | DMA_FSTS_AFO | DMA_FSTS_APF | DMA_FSTS_IQE | DMA_FSTS_ICE | DMA_FSTS_ITE
>> +#define DMA_FSTS_PRO        ((uint32_t)1 << DMA_FSTS_PRO_SHIFT)
>> +#define DMA_FSTS_FAULTS     (DMA_FSTS_PFO | DMA_FSTS_PPF | DMA_FSTS_AFO | \
>> +                             DMA_FSTS_APF | DMA_FSTS_IQE | DMA_FSTS_ICE | \
>> +                             DMA_FSTS_ITE | DMA_FSTS_PRO)
>> +#define DMA_FSTS_RW1CS      (DMA_FSTS_PFO | DMA_FSTS_AFO | DMA_FSTS_APF | \
>> +                             DMA_FSTS_IQE | DMA_FSTS_ICE | DMA_FSTS_ITE | \
>> +                             DMA_FSTS_PRO)
>>  #define dma_fsts_fault_record_index(s) (((s) >> 8) & 0xff)
>>  
>>  /* FRCD_REG, 32 bits access */
>> -#define DMA_FRCD_F (((u64)1) << 31)
>> +#define DMA_FRCD_LEN            0x10
>> +#define DMA_FRCD2_OFFSET        0x8
>> +#define DMA_FRCD3_OFFSET        0xc
>> +#define DMA_FRCD_F_SHIFT        31
>> +#define DMA_FRCD_F ((u64)1 << DMA_FRCD_F_SHIFT)
>>  #define dma_frcd_type(d) ((d >> 30) & 1)
>>  #define dma_frcd_fault_reason(c) (c & 0xff)
>>  #define dma_frcd_source_id(c) (c & 0xffff)
>>  #define dma_frcd_page_addr(d) (d & (((u64)-1) << 12)) /* low 64 bit */
>>  
>> +struct vtd_fault_record_register
>> +{
>> +    union {
>> +        struct {
>> +            uint64_t lo;
>> +            uint64_t hi;
>> +        } bits;
>> +        struct {
>> +            uint64_t rsvd0          :12,
>> +                     fault_info     :52;
>> +            uint64_t source_id      :16,
>> +                     rsvd1          :9,
>> +                     pmr            :1,  /* Privilege Mode Requested */
>> +                     exe            :1,  /* Execute Permission Requested */
>> +                     pasid_p        :1,  /* PASID Present */
>> +                     fault_reason   :8,  /* Fault Reason */
>> +                     pasid_val      :20, /* PASID Value */
>> +                     addr_type      :2,  /* Address Type */
>> +                     type           :1,  /* Type. (0) Write (1) Read/AtomicOp */
>> +                     fault          :1;  /* Fault */
>> +        } fields;
>> +    };
>> +};
>> +
>>  /* Interrupt remapping transition faults */
>>  #define VTD_FR_IR_REQ_RSVD      0x20
>>  #define VTD_FR_IR_INDEX_OVER    0x21
>> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
>> index d3dec01..83805d1 100644
>> --- a/xen/drivers/passthrough/vtd/vvtd.c
>> +++ b/xen/drivers/passthrough/vtd/vvtd.c
>> @@ -43,6 +43,7 @@
>>  struct hvm_hw_vvtd {
>>      bool eim_enabled;
>>      bool intremap_enabled;
>> +    uint32_t fault_index;
>>  
>>      /* Interrupt remapping table base gfn and the max of entries */
>>      uint16_t irt_max_entry;
>> @@ -58,6 +59,12 @@ struct vvtd {
>>      struct domain *domain;
>>      /* # of in-flight interrupts */
>>      atomic_t inflight_intr;
>> +    /*
>> +     * This lock protects fault-event related registers (DMAR_FEXXX_REG).
>> +     * It's used for draining in-flight fault events before responding
>> +     * guest's programming to those registers.
>> +     */
>> +    spinlock_t fe_lock;
>
>I still think almost if not all of the vvtd helper functions should be
>mutually exclusive (ie: locked), not only the fault-event related
>registers. I guess Linux or other OSes already serialize access to the
>vIOMMU somehow, so your not seeing any errors. But I'm quite sure
>things will fail in weird ways if a malicious guests starts to
>concurrently write to different vIOMMU registers.

The VT-d spec doesn't describe what happens if software accesses the
registers concurrently. Adding a lock to force serialization in case the
guest doesn't do that is fine with me. As to the fe_lock here, this lock
isn't used to serialize accesses from the guest. It serializes (virtual)
hardware writes against the guest's writes. Fault events can arise
during interrupt delivery, so they may happen at any time and overlap
with a guest write. When delivering a fault event, we need to check the
guest's fault-event configuration (i.e. masked or not, the vector
number, ...) and then inject an interrupt into the guest. The guest
shouldn't be allowed to change that configuration while hardware is
processing a fault event (i.e. while there are in-flight fault events).
Otherwise, an interrupt may be injected while the guest has fault events
masked, or an interrupt with a stale vector may be injected.
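A minimal sketch of that serialization, with a toy spinlock standing in for Xen's spin_lock(&vvtd->fe_lock) and plain counters standing in for the real delivery machinery:

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy spinlock standing in for Xen's spin_lock(&vvtd->fe_lock). */
static atomic_flag fe_lock = ATOMIC_FLAG_INIT;
static void fe_lock_acquire(void) { while ( atomic_flag_test_and_set(&fe_lock) ) ; }
static void fe_lock_release(void) { atomic_flag_clear(&fe_lock); }

static int fectl_masked;        /* guest-programmed FECTL.IM */
static unsigned int injected;   /* interrupts delivered to the guest */

/* Hardware path: check the guest's fault-event configuration and inject,
 * all under fe_lock, so the guest cannot change the config mid-delivery. */
static void deliver_fault(void)
{
    fe_lock_acquire();
    if ( !fectl_masked )
        injected++;             /* vvtd_notify_fault() */
    fe_lock_release();
}

/* Guest path: updates to FECTL take the same lock. */
static void guest_write_fectl(int mask)
{
    fe_lock_acquire();
    fectl_masked = mask;
    fe_lock_release();
}
```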

>
>>  
>>      struct hvm_hw_vvtd hw;
>>      void *irt_base;
>> @@ -87,6 +94,21 @@ boolean_runtime_param("viommu_verbose", viommu_verbose);
>>  #endif
>>  
>>  #define VVTD_REG_POS(vvtd, offset) &(vvtd->hw.regs[offset/sizeof(uint32_t)])
>> +static inline int vvtd_test_and_set_bit(struct vvtd *vvtd, uint32_t reg, int nr)
>> +{
>> +    return test_and_set_bit(nr, VVTD_REG_POS(vvtd, reg));
>> +}
>> +
>> +static inline int vvtd_test_and_clear_bit(struct vvtd *vvtd, uint32_t reg,
>> +                                          int nr)
>> +{
>> +    return test_and_clear_bit(nr, VVTD_REG_POS(vvtd, reg));
>> +}
>
>So for set and clear bit you use the non locked variants (prefixed by
>__), and here you use the locked variants of test and set/clear. Is
>there any reason for this? I would expect locked/unlocked bitops to be
>used consistently for dealing with the registers unless there's a
>specific reason not to do so.

The non-locked variants would be fine, since 'fe_lock' is introduced in
this version. Only the non-locked variants will be used in the next version.
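For reference, the non-locked variant amounts to a plain read-modify-write; this sketch (not Xen's actual implementation) shows why it is only safe when an outer lock such as fe_lock already serializes register access:

```c
#include <assert.h>
#include <stdint.h>

/* Plain (non-locked) variant in the spirit of Xen's __test_and_set_bit():
 * a read-modify-write with no atomicity, safe only under an outer lock. */
static int test_and_set_bit_plain(unsigned int nr, uint32_t *word)
{
    int old = (*word >> nr) & 1;

    *word |= 1u << nr;
    return old;
}
```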

>
>> +
>> +static inline int vvtd_test_bit(struct vvtd *vvtd, uint32_t reg, int nr)
>> +{
>> +    return test_bit(nr, VVTD_REG_POS(vvtd, reg));
>> +}
>>  
>>  static inline void vvtd_set_bit(struct vvtd *vvtd, uint32_t reg, int nr)
>>  {
>> @@ -238,6 +260,30 @@ static int vvtd_delivery(struct domain *d, uint8_t vector,
>>      return 0;
>>  }
>>  
>> +static void vvtd_generate_interrupt(const struct vvtd *vvtd, uint64_t addr,
>> +                                    uint32_t data)
>> +{
>> +    bool dm = addr & MSI_ADDR_DESTMODE_MASK;
>
>Please use MASK_EXTR here. Also destmode is usually treated as an
>uint8_t in the rest of the Xen code (see vmsi_deliver). I would
>probably keep using uint8_t just for consistency with the rest of the
>code.

Will do.

>
>> +    uint32_t dest = MASK_EXTR(addr, MSI_ADDR_DEST_ID_MASK);
>> +    uint8_t dlm = MASK_EXTR(data, MSI_DATA_DELIVERY_MODE_MASK);
>> +    uint8_t tm = MASK_EXTR(data, MSI_DATA_TRIGGER_MASK);
>> +    uint8_t vector = data & MSI_DATA_VECTOR_MASK;
>
>MASK_EXTR please.
>
>> +
>> +    vvtd_debug("d%d: generating msi %lx %x\n", vvtd->domain->domain_id, addr,
>> +               data);
>> +
>> +    if ( vvtd->hw.eim_enabled )
>> +        dest |= (addr >> 40) << 8;
>
>This 40 and 8 look like magic numbers to me, but it's liekly me
>missing something. Any reason not to use addr >> 32 directly? In any
>case I would really appreciate if you could add defines for those
>and/or comments.

If eim_enabled is 1, destination ID bits [31:8] come from addr[63:40].
I will add definitions for these.
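A sketch of the destination-ID assembly described above (the mask value and shifts reflect my reading of the layout, not Xen's definitions):

```c
#include <assert.h>
#include <stdint.h>

#define MSI_ADDR_DEST_ID_MASK 0x000ff000u   /* addr bits 19:12 (xAPIC ID) */

/* With EIM, destination ID bits [31:8] come from addr[63:40]; without it
 * only addr[19:12] is used. */
static uint32_t msi_dest_id(uint64_t addr, int eim_enabled)
{
    uint32_t dest = (uint32_t)((addr & MSI_ADDR_DEST_ID_MASK) >> 12);

    if ( eim_enabled )
        dest |= (uint32_t)(addr >> 40) << 8;
    return dest;
}
```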

>
>> +
>> +    vvtd_delivery(vvtd->domain, vector, dest, dm, dlm, tm);
>> +}
>> +
>> +static void vvtd_notify_fault(const struct vvtd *vvtd)
>> +{
>> +    vvtd_generate_interrupt(vvtd, vvtd_get_reg_quad(vvtd, DMAR_FEADDR_REG),
>> +                            vvtd_get_reg(vvtd, DMAR_FEDATA_REG));
>> +}
>> +
>>  /* Computing the IRTE index for a given interrupt request. On success, return
>>   * 0 and set index to reference the corresponding IRTE. Otherwise, return < 0,
>>   * i.e. -1 when the irq request isn't in remapping format.
>> @@ -290,6 +336,198 @@ static inline uint32_t irte_dest(struct vvtd *vvtd, uint32_t dest)
>>                                  : MASK_EXTR(dest, IRTE_xAPIC_DEST_MASK);
>>  }
>>  
>> +static void vvtd_report_non_recoverable_fault(struct vvtd *vvtd, int reason)
>> +{
>> +    uint32_t fsts = vvtd_get_reg(vvtd, DMAR_FSTS_REG);
>> +
>> +    vvtd_set_bit(vvtd, DMAR_FSTS_REG, reason);
>
>test_and_set?

No. There are many fault reasons, for example primary pending fault,
primary fault overflow and invalidation queue error. Here we want to
read the whole FSTS rather than just the bit we set: if some faults
already reported to the guest are yet to be serviced, there is no need
to inject fault events again.
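The rule being described can be sketched as follows (register layout and bit masks are illustrative only):

```c
#include <assert.h>
#include <stdint.h>

#define DMA_FSTS_FAULTS 0xffu   /* illustrative: all fault-status bits */

static uint32_t fsts_reg;
static unsigned int notifications;

/* Per the "Non-Recoverable Fault Event" rule: record the new reason, but
 * only raise an event if no earlier fault is still pending service. */
static void report_fault(unsigned int reason_bit)
{
    uint32_t prev = fsts_reg;           /* read the whole FSTS first */

    fsts_reg |= 1u << reason_bit;
    if ( prev & DMA_FSTS_FAULTS )
        return;                         /* earlier faults not yet serviced */
    notifications++;                    /* vvtd_notify_fault() */
}
```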

>
>> +
>> +    /*
>> +     * According to VT-d spec "Non-Recoverable Fault Event" chapter, if
>> +     * there are any previously reported interrupt conditions that are yet to
>> +     * be serviced by software, the Fault Event interrupt is not generated.
>> +     */
>> +    if ( fsts & DMA_FSTS_FAULTS )
>> +        return;
>> +
>> +    vvtd_set_bit(vvtd, DMAR_FECTL_REG, DMA_FECTL_IP_SHIFT);
>> +    if ( !vvtd_test_bit(vvtd, DMAR_FECTL_REG, DMA_FECTL_IM_SHIFT) )
>> +    {
>> +        vvtd_notify_fault(vvtd);
>> +        vvtd_clear_bit(vvtd, DMAR_FECTL_REG, DMA_FECTL_IP_SHIFT);
>> +    }
>> +}
>> +
>> +static void vvtd_update_ppf(struct vvtd *vvtd)
>> +{
>> +    int i;
>
>unsigned int.
>
>> +    uint64_t cap = vvtd_get_reg_quad(vvtd, DMAR_CAP_REG);
>> +    unsigned int base = cap_fault_reg_offset(cap);
>> +
>> +    for ( i = 0; i < cap_num_fault_regs(cap); i++ )
>> +    {
>> +        if ( vvtd_test_bit(vvtd, base + i * DMA_FRCD_LEN + DMA_FRCD3_OFFSET,
>> +                           DMA_FRCD_F_SHIFT) )
>> +        {
>> +            vvtd_report_non_recoverable_fault(vvtd, DMA_FSTS_PPF_SHIFT);
>> +            return;
>> +        }
>> +    }
>> +    /*
>> +     * No Primary Fault is in Fault Record Registers, thus clear PPF bit in
>> +     * FSTS.
>> +     */
>> +    vvtd_clear_bit(vvtd, DMAR_FSTS_REG, DMA_FSTS_PPF_SHIFT);
>> +
>> +    /* If no fault is in FSTS, clear pending bit in FECTL. */
>> +    if ( !(vvtd_get_reg(vvtd, DMAR_FSTS_REG) & DMA_FSTS_FAULTS) )
>> +        vvtd_clear_bit(vvtd, DMAR_FECTL_REG, DMA_FECTL_IP_SHIFT);
>> +}
>> +
>> +/*
>> + * Commit a fault to emulated Fault Record Registers.
>> + */
>> +static void vvtd_commit_frcd(struct vvtd *vvtd, int idx,
>> +                             const struct vtd_fault_record_register *frcd)
>> +{
>> +    unsigned int base = cap_fault_reg_offset(
>> +                            vvtd_get_reg_quad(vvtd, DMAR_CAP_REG));
>> +
>> +    vvtd_set_reg_quad(vvtd, base + idx * DMA_FRCD_LEN, frcd->bits.lo);
>> +    vvtd_set_reg_quad(vvtd, base + idx * DMA_FRCD_LEN + 8, frcd->bits.hi);
>> +    vvtd_update_ppf(vvtd);
>> +}
>> +
>> +/*
>> + * Allocate a FRCD for the caller. If success, return the FRI. Or, return -1
>> + * when failure.
>> + */
>> +static int vvtd_alloc_frcd(struct vvtd *vvtd)
>
>What's the maximum value of FRCD according to the spec? Will it fit in
>an int?

64. The number of FRCDs is exposed to software via DMAR_CAP_REG[47:40],
so I think 'int' is fine. Currently, the vIOMMU has only 1 FRCD.
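Assuming the usual count-minus-one encoding of the NFR field, the accessor would look roughly like this (a sketch, not Xen's actual macro):

```c
#include <assert.h>
#include <stdint.h>

/* NFR (Number of Fault-recording Registers) lives in CAP_REG[47:40];
 * the register count is NFR + 1, so an 8-bit field covers up to 256. */
static unsigned int cap_num_fault_regs(uint64_t cap)
{
    return (unsigned int)((cap >> 40) & 0xff) + 1;
}
```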

>
>> +{
>> +    int prev;
>> +    uint64_t cap = vvtd_get_reg_quad(vvtd, DMAR_CAP_REG);
>> +    unsigned int base = cap_fault_reg_offset(cap);
>> +
>> +    /* Set the F bit to indicate the FRCD is in use. */
>> +    if ( !vvtd_test_and_set_bit(vvtd,
>> +                                base + vvtd->hw.fault_index * DMA_FRCD_LEN +
>> +                                DMA_FRCD3_OFFSET, DMA_FRCD_F_SHIFT) )
>> +    {
>> +        prev = vvtd->hw.fault_index;
>
>prev can be declared inside the if:
>
>    unsigned int prev = vvtd->hw.fault_index;
>
>Also prev is used only once, so I think you can just get rid of it.

Will do.

>
>> +        vvtd->hw.fault_index = (prev + 1) % cap_num_fault_regs(cap);
>> +        return vvtd->hw.fault_index;
>> +    }
>
>Newline.
>
>> +    return -ENOMEM;
>> +}
>> +
>> +static void vvtd_free_frcd(struct vvtd *vvtd, int i)
>> +{
>> +    unsigned int base = cap_fault_reg_offset(
>> +                            vvtd_get_reg_quad(vvtd, DMAR_CAP_REG));
>> +
>> +    vvtd_clear_bit(vvtd, base + i * DMA_FRCD_LEN + DMA_FRCD3_OFFSET,
>> +                   DMA_FRCD_F_SHIFT);
>> +}
>> +
>> +static int vvtd_record_fault(struct vvtd *vvtd,
>> +                             const struct arch_irq_remapping_request *request,
>> +                             int reason)
>> +{
>> +    struct vtd_fault_record_register frcd;
>> +    int fault_index;
>
>unsigned int maybe, see comments above.
>
>> +    uint32_t irt_index;
>> +
>> +    spin_lock(&vvtd->fe_lock);
>> +    switch(reason)
>> +    {
>> +    case VTD_FR_IR_REQ_RSVD:
>> +    case VTD_FR_IR_INDEX_OVER:
>> +    case VTD_FR_IR_ENTRY_P:
>> +    case VTD_FR_IR_ROOT_INVAL:
>> +    case VTD_FR_IR_IRTE_RSVD:
>> +    case VTD_FR_IR_REQ_COMPAT:
>> +    case VTD_FR_IR_SID_ERR:
>> +        if ( vvtd_test_bit(vvtd, DMAR_FSTS_REG, DMA_FSTS_PFO_SHIFT) )
>> +            goto out;
>> +
>> +        /* No available Fault Record means Fault overflowed */
>> +        fault_index = vvtd_alloc_frcd(vvtd);
>> +        if ( fault_index < 0 )
>> +        {
>> +            vvtd_report_non_recoverable_fault(vvtd, DMA_FSTS_PFO_SHIFT);
>> +            goto out;
>> +        }
>> +        memset(&frcd, 0, sizeof(frcd));
>
>Given the fact that frcd has not padding you can initialize it at
>declaration using:
>
>struct vtd_fault_record_register frcd = { };

Will do.

>
>> +        frcd.fields.fault_reason = reason;
>> +        if ( irq_remapping_request_index(request, &irt_index) )
>> +            goto out;
>> +        frcd.fields.fault_info = irt_index;
>> +        frcd.fields.source_id = request->source_id;
>> +        frcd.fields.fault = 1;
>> +        vvtd_commit_frcd(vvtd, fault_index, &frcd);
>> +        break;
>> +
>> +    default:
>> +        vvtd_debug("d%d: can't handle vvtd fault (reason 0x%x)",
>> +                   vvtd->domain->domain_id, reason);
>> +        break;
>> +    }
>> +
>> + out:
>> +    spin_unlock(&vvtd->fe_lock);
>> +    return X86EMUL_OKAY;
>
>I'm not sure why this function needs to return any value given it's
>current usage, and in any case since it's not an emulation handler it
>shouldn't use X86EMUL_* values at all.

will eliminate return value.

>
>> +}
>> +
>> +static int vvtd_write_frcd3(struct vvtd *vvtd, uint32_t val)
>> +{
>> +    /* Writing a 1 means clear fault */
>> +    if ( val & DMA_FRCD_F )
>> +    {
>> +        vvtd_free_frcd(vvtd, 0);
>> +        vvtd_update_ppf(vvtd);
>> +    }
>> +    return X86EMUL_OKAY;
>
>Same here, I don't see the point in returning a value, and certainly
>it shouldn't be X86EMUL_* in any case.
>
>> +}
>> +
>> +static void vvtd_write_fectl(struct vvtd *vvtd, uint32_t val)
>> +{
>> +    /*
>> +     * Only DMA_FECTL_IM bit is writable. Generate pending event when unmask.
>> +     */
>> +    if ( !(val & DMA_FECTL_IM) )
>> +    {
>> +        /* Clear IM */
>> +        vvtd_clear_bit(vvtd, DMAR_FECTL_REG, DMA_FECTL_IM_SHIFT);
>> +        if ( vvtd_test_and_clear_bit(vvtd, DMAR_FECTL_REG, DMA_FECTL_IP_SHIFT) )
>> +            vvtd_notify_fault(vvtd);
>> +    }
>> +    else
>> +        vvtd_set_bit(vvtd, DMAR_FECTL_REG, DMA_FECTL_IM_SHIFT);
>> +}
>> +
>> +static void vvtd_write_fsts(struct vvtd *vvtd, uint32_t val)
>> +{
>> +    int i, max_fault_index = DMA_FSTS_PRO_SHIFT;
>> +    uint64_t bits_to_clear = val & DMA_FSTS_RW1CS;
>> +
>> +    if ( bits_to_clear )
>> +    {
>> +        i = find_first_bit(&bits_to_clear, max_fault_index / 8 + 1);
>> +        while ( i <= max_fault_index )
>> +        {
>> +            vvtd_clear_bit(vvtd, DMAR_FSTS_REG, i);
>> +            i = find_next_bit(&bits_to_clear, max_fault_index / 8 + 1, i + 1);
>> +        }
>> +    }
>> +
>> +    /*
>> +     * Clear IP field when all status fields in the Fault Status Register
>> +     * being clear.
>> +     */
>> +    if ( !((vvtd_get_reg(vvtd, DMAR_FSTS_REG) & DMA_FSTS_FAULTS)) )
>> +        vvtd_clear_bit(vvtd, DMAR_FECTL_REG, DMA_FECTL_IP_SHIFT);
>> +}
>> +
>>  static void write_gcmd_ire(struct vvtd *vvtd, uint32_t val)
>>  {
>>      bool set = val & DMA_GCMD_IRE;
>> @@ -391,11 +629,47 @@ static int vvtd_read(struct vcpu *v, unsigned long addr,
>>      return X86EMUL_OKAY;
>>  }
>>  
>> +static void vvtd_write_fault_regs(struct vvtd *vvtd, unsigned long val,
>> +                                  unsigned int offset, unsigned int len)
>> +{
>> +    unsigned int fault_offset = cap_fault_reg_offset(
>> +                                    vvtd_get_reg_quad(vvtd, DMAR_CAP_REG));
>> +
>> +    spin_lock(&vvtd->fe_lock);
>> +    for ( ; len ; len -= 4, offset += 4, val = val >> 32)
>
>It seems overkill to use a for loop here when len can only be 4 or 8
>AFAICT (maybe I'm wrong). Is 64bit access really allowed to those
>registers? You seem to treat all of them as 32bit registers which
>makes me wonder if 64bit accesses are really allowed.

64-bit accesses are allowed. The VT-d spec, section 10.2, says:
"Software is expected to access 32-bit registers as aligned doublewords.
...

Software must access 64-bit and 128-bit registers as either aligned
quadwords or aligned doublewords. Hardware may disassemble a quadword
register access as two double-word accesses"

Using a for loop here is allowed by the VT-d spec. Furthermore, it lets
me get rid of
'''
if (len == 8)
    vvtd_set_reg_quad(...)
else
    vvtd_set_reg(...)
'''
And this is also the reason why the struct hvm_hw_vvtd in patch 07/28 is
defined as 
struct hvm_hw_vvtd {
    uint32_t regs[...];
}.
About this definition, you commented that it would be better as:
union hw_vvtd {
    uint32_t regs32[...];
    uint64_t regs64[...];
};

Actually, no 64-bit registers are needed, because the vIOMMU disassembles
64-bit writes into two 32-bit writes.
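Extracted into a standalone form, the disassembling loop looks like this (a sketch; the real code writes into the vvtd register page):

```c
#include <assert.h>
#include <stdint.h>

static uint32_t regs[4];

/* A quadword write (len == 8) is disassembled into two doubleword
 * stores, which VT-d spec 10.2 explicitly permits. */
static void reg_write(unsigned int offset, uint64_t val, unsigned int len)
{
    for ( ; len; len -= 4, offset += 4, val >>= 32 )
        regs[offset / 4] = (uint32_t)val;
}
```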

Thanks
Chao


* Re: [PATCH v4 15/28] x86/vvtd: Enable Queued Invalidation through GCMD
  2018-02-12 14:04   ` Roger Pau Monné
@ 2018-02-22 10:33     ` Chao Gao
  0 siblings, 0 replies; 83+ messages in thread
From: Chao Gao @ 2018-02-22 10:33 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Mon, Feb 12, 2018 at 02:04:46PM +0000, Roger Pau Monné wrote:
>On Fri, Nov 17, 2017 at 02:22:22PM +0800, Chao Gao wrote:
>> Software writes to QIE field of GCMD to enable or disable queued
>> invalidations. This patch emulates QIE field of GCMD.
>> 
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>> ---
>>  xen/drivers/passthrough/vtd/iommu.h |  3 ++-
>>  xen/drivers/passthrough/vtd/vvtd.c  | 18 ++++++++++++++++++
>>  2 files changed, 20 insertions(+), 1 deletion(-)
>> 
>> diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
>> index dc2df75..b71dab8 100644
>> --- a/xen/drivers/passthrough/vtd/iommu.h
>> +++ b/xen/drivers/passthrough/vtd/iommu.h
>> @@ -160,7 +160,8 @@
>>  #define DMA_GSTS_FLS    (((u64)1) << 29)
>>  #define DMA_GSTS_AFLS   (((u64)1) << 28)
>>  #define DMA_GSTS_WBFS   (((u64)1) << 27)
>> -#define DMA_GSTS_QIES   (((u64)1) <<26)
>> +#define DMA_GSTS_QIES_SHIFT     26
>> +#define DMA_GSTS_QIES   (((u64)1) << DMA_GSTS_QIES_SHIFT)
>>  #define DMA_GSTS_IRES_SHIFT     25
>>  #define DMA_GSTS_IRES   (((u64)1) << DMA_GSTS_IRES_SHIFT)
>>  #define DMA_GSTS_SIRTPS_SHIFT   24
>> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
>> index 83805d1..a2fa64a 100644
>> --- a/xen/drivers/passthrough/vtd/vvtd.c
>> +++ b/xen/drivers/passthrough/vtd/vvtd.c
>> @@ -539,6 +539,20 @@ static void write_gcmd_ire(struct vvtd *vvtd, uint32_t val)
>>          (vvtd, DMAR_GSTS_REG, DMA_GSTS_IRES_SHIFT);
>>  }
>>  
>> +static void write_gcmd_qie(struct vvtd *vvtd, uint32_t val)
>> +{
>> +    bool set = val & DMA_GCMD_QIE;
>> +
>> +    vvtd_info("%sable Queue Invalidation\n", set ? "En" : "Dis");
>> +
>> +    if ( set )
>> +        vvtd_set_reg_quad(vvtd, DMAR_IQH_REG, 0);
>
>If QIE is already enabled and the user writes to GCMD with the QIE bit
>set won't this wrongly clear the invalidation queue?

No. If QIE is already enabled, writing to GCMD with QIE set is
ignored: write_gcmd_qie() is called only when QIE changes in
vvtd_write_gcmd(). In fact, if we want to enable other features without
disabling QI, we must write to GCMD with the QIE bit set.

>
>> +
>> +    (set ? vvtd_set_bit : vvtd_clear_bit)
>> +        (vvtd, DMAR_GSTS_REG, DMA_GSTS_QIES_SHIFT);
>> +
>> +}
>> +
>>  static void write_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
>>  {
>>      uint64_t irta = vvtd_get_reg_quad(vvtd, DMAR_IRTA_REG);
>> @@ -598,6 +612,10 @@ static void vvtd_write_gcmd(struct vvtd *vvtd, uint32_t val)
>>          write_gcmd_sirtp(vvtd, val);
>>      if ( changed & DMA_GCMD_IRE )
>>          write_gcmd_ire(vvtd, val);
>> +    if ( changed & DMA_GCMD_QIE )
>> +        write_gcmd_qie(vvtd, val);
>> +    if ( changed & ~(DMA_GCMD_SIRTP | DMA_GCMD_IRE | DMA_GCMD_QIE) )
>> +        vvtd_info("Only SIRTP, IRE, QIE in GCMD are handled");
>
>This seems quite likely to go out of sync. I would rather do:
>
>if ( changed & DMA_GCMD_QIE )
>{
>    write_gcmd_qie(vvtd, val);
>    changed &= ~DMA_GCMD_QIE;
>}
>...
>if ( changed )
>    vvtd_info("Unhandled bit detected: %...");
>
>It seems also quite likely this can be simplified with a macro:
>
>#define HANDLE_GCMD_BIT(bit)        \
>if ( changed & DMA_GCMD_ ## bit )   \
>{                                   \
>    write_gcmd_ ## bit (vvtd, val); \
>    changed &= ~DMA_GCMD_ ## bit;   \
>}
>
>So that you can write:
>
>HANDLE_GCMD_BIT(IRE);
>HANDLE_GCMD_BIT(QIE);
>...

Will use this macro.

Thanks
Chao


* Re: [PATCH v4 16/28] x86/vvtd: Add queued invalidation (QI) support
  2018-02-12 14:36   ` Roger Pau Monné
@ 2018-02-23  4:38     ` Chao Gao
  0 siblings, 0 replies; 83+ messages in thread
From: Chao Gao @ 2018-02-23  4:38 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, George Dunlap,
	Ian Jackson, Tim Deegan, xen-devel, Jan Beulich, Andrew Cooper

On Mon, Feb 12, 2018 at 02:36:10PM +0000, Roger Pau Monné wrote:
>On Fri, Nov 17, 2017 at 02:22:23PM +0800, Chao Gao wrote:
>> Queued Invalidation Interface is an expanded invalidation interface with
>> extended capabilities. Hardware implementations report support for queued
>> invalidation interface through the Extended Capability Register. The queued
>> invalidation interface uses an Invalidation Queue (IQ), which is a circular
>> buffer in system memory. Software submits commands by writing Invalidation
>> Descriptors to the IQ.
>> 
>> In this patch, a new function viommu_process_iq() is used for emulating how
>> hardware handles invalidation requests through QI.
>
>You should mention that QI is mandatory in order to support interrupt
>remapping.

Will do.

>
>I was about to ask whether QI could be deferred to a later stage, but
>AFAICT this is not an option.
>
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>> 
>> ---
>> v4:
>>  - Introduce a lock to protect invalidation related registers.
>> ---
>>  xen/drivers/passthrough/vtd/iommu.h |  24 +++-
>>  xen/drivers/passthrough/vtd/vvtd.c  | 271 +++++++++++++++++++++++++++++++++++-
>>  2 files changed, 293 insertions(+), 2 deletions(-)
>> 
>> diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
>> index b71dab8..de9188b 100644
>> --- a/xen/drivers/passthrough/vtd/iommu.h
>> +++ b/xen/drivers/passthrough/vtd/iommu.h
>> @@ -47,7 +47,12 @@
>>  #define DMAR_IQH_REG            0x80 /* invalidation queue head */
>>  #define DMAR_IQT_REG            0x88 /* invalidation queue tail */
>>  #define DMAR_IQA_REG            0x90 /* invalidation queue addr */
>> +#define DMAR_IQUA_REG           0x94 /* invalidation queue upper addr */
>> +#define DMAR_ICS_REG            0x9c /* invalidation completion status */
>>  #define DMAR_IECTL_REG          0xa0 /* invalidation event control register */
>> +#define DMAR_IEDATA_REG         0xa4 /* invalidation event data register */
>> +#define DMAR_IEADDR_REG         0xa8 /* invalidation event address register */
>> +#define DMAR_IEUADDR_REG        0xac /* upper address register */
>>  #define DMAR_IRTA_REG           0xb8 /* base address of intr remap table */
>>  #define DMAR_IRTUA_REG          0xbc /* upper address of intr remap table */
>>  
>> @@ -175,6 +180,21 @@
>>  #define DMA_IRTA_S(val)         (val & 0xf)
>>  #define DMA_IRTA_SIZE(val)      (1UL << (DMA_IRTA_S(val) + 1))
>>  
>> +/* IQA_REG */
>> +#define DMA_IQA_ADDR(val)       (val & ~0xfffULL)
>> +#define DMA_IQA_QS(val)         (val & 0x7)
>> +#define DMA_IQA_RSVD            0xff8ULL
>> +
>> +/* IECTL_REG */
>> +#define DMA_IECTL_IM_SHIFT 31
>> +#define DMA_IECTL_IM            (1U << DMA_IECTL_IM_SHIFT)
>> +#define DMA_IECTL_IP_SHIFT 30
>> +#define DMA_IECTL_IP            (1U << DMA_IECTL_IP_SHIFT)
>> +
>> +/* ICS_REG */
>> +#define DMA_ICS_IWC_SHIFT       0
>> +#define DMA_ICS_IWC             (1U << DMA_ICS_IWC_SHIFT)
>> +
>>  /* PMEN_REG */
>>  #define DMA_PMEN_EPM    (((u32)1) << 31)
>>  #define DMA_PMEN_PRS    (((u32)1) << 0)
>> @@ -205,13 +225,14 @@
>>  /* FSTS_REG */
>>  #define DMA_FSTS_PFO_SHIFT  0
>>  #define DMA_FSTS_PPF_SHIFT  1
>> +#define DMA_FSTS_IQE_SHIFT  4
>>  #define DMA_FSTS_PRO_SHIFT  7
>>  
>>  #define DMA_FSTS_PFO        ((uint32_t)1 << DMA_FSTS_PFO_SHIFT)
>>  #define DMA_FSTS_PPF        ((uint32_t)1 << DMA_FSTS_PPF_SHIFT)
>>  #define DMA_FSTS_AFO        ((uint32_t)1 << 2)
>>  #define DMA_FSTS_APF        ((uint32_t)1 << 3)
>> -#define DMA_FSTS_IQE        ((uint32_t)1 << 4)
>> +#define DMA_FSTS_IQE        ((uint32_t)1 << DMA_FSTS_IQE_SHIFT)
>>  #define DMA_FSTS_ICE        ((uint32_t)1 << 5)
>>  #define DMA_FSTS_ITE        ((uint32_t)1 << 6)
>>  #define DMA_FSTS_PRO        ((uint32_t)1 << DMA_FSTS_PRO_SHIFT)
>> @@ -555,6 +576,7 @@ struct qinval_entry {
>>  
>>  /* Queue invalidation head/tail shift */
>>  #define QINVAL_INDEX_SHIFT 4
>> +#define QINVAL_INDEX_MASK  0x7fff0ULL
>>  
>>  #define qinval_present(v) ((v).lo & 1)
>>  #define qinval_fault_disable(v) (((v).lo >> 1) & 1)
>> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
>> index a2fa64a..81170ec 100644
>> --- a/xen/drivers/passthrough/vtd/vvtd.c
>> +++ b/xen/drivers/passthrough/vtd/vvtd.c
>> @@ -27,6 +27,7 @@
>>  #include <asm/event.h>
>>  #include <asm/io_apic.h>
>>  #include <asm/hvm/domain.h>
>> +#include <asm/hvm/support.h>
>>  #include <asm/p2m.h>
>>  
>>  #include "iommu.h"
>> @@ -68,6 +69,9 @@ struct vvtd {
>>  
>>      struct hvm_hw_vvtd hw;
>>      void *irt_base;
>> +    void *inv_queue_base;
>
>Why not declare this as:
>
>struct qinval_entry *

will do.

>
>> +    /* This lock protects invalidation related registers */
>> +    spinlock_t ie_lock;
>
>As noted in another patch, I think the first approach should be to use
>a single lock that serializes access to the whole vIOMMU register
>space. Later we can see about more fine grained locking.

It seems you mean a coarse-grained lock should be taken at the beginning of
operations which read and write vIOMMU registers. It should be easy to add
such a lock.

>
>>  };
>>  
>>  /* Setting viommu_verbose enables debugging messages of vIOMMU */
>> @@ -284,6 +288,12 @@ static void vvtd_notify_fault(const struct vvtd *vvtd)
>>                              vvtd_get_reg(vvtd, DMAR_FEDATA_REG));
>>  }
>>  
>> +static void vvtd_notify_inv_completion(const struct vvtd *vvtd)
>> +{
>> +    vvtd_generate_interrupt(vvtd, vvtd_get_reg_quad(vvtd, DMAR_IEADDR_REG),
>> +                            vvtd_get_reg(vvtd, DMAR_IEDATA_REG));
>> +}
>> +
>>  /* Computing the IRTE index for a given interrupt request. When success, return
>>   * 0 and set index to reference the corresponding IRTE. Otherwise, return < 0,
>>   * i.e. -1 when the irq request isn't an remapping format.
>> @@ -478,6 +488,189 @@ static int vvtd_record_fault(struct vvtd *vvtd,
>>      return X86EMUL_OKAY;
>>  }
>>  
>> +/*
>> + * Process an invalidation descriptor. Currently, only two types descriptors,
>> + * Interrupt Entry Cache Invalidation Descritor and Invalidation Wait
>> + * Descriptor are handled.
>> + * @vvtd: the virtual vtd instance
>> + * @i: the index of the invalidation descriptor to be processed
>> + *
>> + * If success return 0, or return non-zero when failure.
>> + */
>> +static int process_iqe(struct vvtd *vvtd, uint32_t i)
>> +{
>> +    struct qinval_entry qinval;
>> +    int ret = 0;
>> +
>> +    if ( !vvtd->inv_queue_base )
>> +    {
>> +        gdprintk(XENLOG_ERR, "Invalidation queue base isn't set\n");
>> +        return -1;
>
>If you just return -1 or 0 please use bool instead. Or return proper
>error codes.

Will return meaningful error codes.

>
>> +    }
>> +    qinval = ((struct qinval_entry *)vvtd->inv_queue_base)[i];
>
>See my comment above regarding how inv_queue_base is declared, I'm not
>sure why the copy is needed here.

Don't need copy here. Will fix.

>
>> +
>> +    switch ( qinval.q.inv_wait_dsc.lo.type )
>> +    {
>> +    case TYPE_INVAL_WAIT:
>> +        if ( qinval.q.inv_wait_dsc.lo.sw )
>> +        {
>> +            uint32_t data = qinval.q.inv_wait_dsc.lo.sdata;
>> +            uint64_t addr = qinval.q.inv_wait_dsc.hi.saddr << 2;
>> +
>> +            ret = hvm_copy_to_guest_phys(addr, &data, sizeof(data), current);
>> +            if ( ret )
>> +                vvtd_info("Failed to write status address\n");
>> +        }
>> +
>> +        /*
>> +         * The following code generates an invalidation completion event
>> +         * indicating the invalidation wait descriptor completion. Note that
>> +         * the following code fragment is not tested properly.
>> +         */
>> +        if ( qinval.q.inv_wait_dsc.lo.iflag )
>> +        {
>> +            if ( !vvtd_test_and_set_bit(vvtd, DMAR_ICS_REG, DMA_ICS_IWC_SHIFT) )
>> +            {
>> +                vvtd_set_bit(vvtd, DMAR_IECTL_REG, DMA_IECTL_IP_SHIFT);
>> +                if ( !vvtd_test_bit(vvtd, DMAR_IECTL_REG, DMA_IECTL_IM_SHIFT) )
>> +                {
>> +                    vvtd_notify_inv_completion(vvtd);
>> +                    vvtd_clear_bit(vvtd, DMAR_IECTL_REG, DMA_IECTL_IP_SHIFT);
>> +                }
>> +            }
>> +        }
>> +        break;
>> +
>> +    case TYPE_INVAL_IEC:
>> +        /* No cache is preserved in vvtd, nothing is needed to be flushed */
>> +        break;
>> +
>> +    default:
>> +        vvtd_debug("d%d: Invalidation type (%x) isn't supported\n",
>> +                   vvtd->domain->domain_id, qinval.q.inv_wait_dsc.lo.type);
>> +        return -1;
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>> +/*
>> + * Invalidate all the descriptors in Invalidation Queue.
>> + */
>> +static void vvtd_process_iq(struct vvtd *vvtd)
>> +{
>> +    uint32_t max_entry, i, iqh, iqt;
>> +    int err = 0;
>> +
>> +    /* Trylock avoids more than 1 caller dealing with invalidation requests */
>> +    if ( !spin_trylock(&vvtd->ie_lock) )
>
>Uh, is this correct? You are returning without the queue being
>invalidated AFAICT.

QI is an asynchronous operation. Software can queue a special
invalidation request in the invalidation queue to get a notification from
hardware when all requests before that special request have finished.
Returning without processing the queue is therefore acceptable: the real
invalidation is deferred to a suitable time when no interrupt is being
processed, no one else is doing invalidation, and the guest isn't writing
QI-related registers.

Anyway, if we use a coarse-grained lock, 'ie_lock' will be removed and
invalidation will be done synchronously, because there will be no in-flight
requests.

>
>> +        return;
>> +
>> +    iqh = MASK_EXTR(vvtd_get_reg_quad(vvtd, DMAR_IQH_REG), QINVAL_INDEX_MASK);
>> +    iqt = MASK_EXTR(vvtd_get_reg_quad(vvtd, DMAR_IQT_REG), QINVAL_INDEX_MASK);
>> +    /*
>> +     * No new descriptor is fetched from the Invalidation Queue until
>> +     * software clears the IQE field in the Fault Status Register
>> +     */
>> +    if ( vvtd_test_bit(vvtd, DMAR_FSTS_REG, DMA_FSTS_IQE_SHIFT) )
>> +    {
>> +        spin_unlock(&vvtd->ie_lock);
>> +        return;
>> +    }
>> +
>> +    max_entry = 1 << (QINVAL_ENTRY_ORDER +
>> +                      DMA_IQA_QS(vvtd_get_reg_quad(vvtd, DMAR_IQA_REG)));
>> +
>> +    ASSERT(iqt < max_entry);
>
>Is it possible for the user to write a valid value to DMAR_IQT_REG and
>then change DMAR_IQA_REG in order to make the above ASSERT trigger?

No. It isn't.

>
>> +    if ( iqh == iqt )
>> +    {
>> +        spin_unlock(&vvtd->ie_lock);
>> +        return;
>> +    }
>> +
>> +    for ( i = iqh; i != iqt; i = (i + 1) % max_entry )
>> +    {
>> +        err = process_iqe(vvtd, i);
>> +        if ( err )
>> +            break;
>> +    }
>> +
>> +    /*
>> +     * set IQH before checking error, because IQH should reference
>> +     * the desriptor associated with the error when an error is seen
>> +     * by guest
>> +     */
>> +    vvtd_set_reg_quad(vvtd, DMAR_IQH_REG, i << QINVAL_INDEX_SHIFT);
>> +
>> +    spin_unlock(&vvtd->ie_lock);
>> +    if ( err )
>> +    {
>> +        spin_lock(&vvtd->fe_lock);
>> +        vvtd_report_non_recoverable_fault(vvtd, DMA_FSTS_IQE_SHIFT);
>> +        spin_unlock(&vvtd->fe_lock);
>> +    }
>> +}
>> +
>> +static void vvtd_write_iqt(struct vvtd *vvtd, uint32_t val)
>> +{
>> +    uint32_t max_entry;
>> +
>> +    if ( val & ~QINVAL_INDEX_MASK )
>> +    {
>> +        vvtd_info("attempts to set reserved bits in IQT\n");
>> +        return;
>> +    }
>> +
>> +    max_entry = 1U << (QINVAL_ENTRY_ORDER +
>> +                       DMA_IQA_QS(vvtd_get_reg_quad(vvtd, DMAR_IQA_REG)));
>> +    if ( MASK_EXTR(val, QINVAL_INDEX_MASK) >= max_entry )
>> +    {
>> +        vvtd_info("IQT: Value %x exceeded supported max index.", val);
>> +        return;
>> +    }
>> +
>> +    vvtd_set_reg(vvtd, DMAR_IQT_REG, val);
>> +}
>> +
>> +static void vvtd_write_iqa(struct vvtd *vvtd, uint32_t val, bool high)
>> +{
>> +    uint64_t cap = vvtd_get_reg_quad(vvtd, DMAR_CAP_REG);
>> +    uint64_t old = vvtd_get_reg_quad(vvtd, DMAR_IQA_REG);
>> +    uint64_t new;
>> +
>> +    if ( high )
>> +        new = ((uint64_t)val << 32) | (old & 0xffffffff);
>> +    else
>> +        new = ((old >> 32) << 32) | val;
>
>You can also use old & ~0xffffffffUL
>
>> +
>> +    if ( new & (~((1ULL << cap_mgaw(cap)) - 1) | DMA_IQA_RSVD) )
>> +    {
>> +        vvtd_info("Attempt to set reserved bits in IQA\n");
>> +        return;
>> +    }
>> +
>> +    vvtd_set_reg_quad(vvtd, DMAR_IQA_REG, new);
>> +    if ( high && !vvtd->inv_queue_base )
>> +        vvtd->inv_queue_base = map_guest_pages(vvtd->domain,
>> +                                               PFN_DOWN(DMA_IQA_ADDR(new)),
>> +                                               1 << DMA_IQA_QS(new));
>
>Don't you need to pick a reference to this page(s)?

Mapping guest pages already takes a reference; we needn't take another
one here.

>
>> +    else if ( !high && vvtd->inv_queue_base )
>
>I'm not sure I follow the logic with high here.

Software can access a 64-bit register as either an aligned quadword or two
aligned doublewords. Here we set up the mapping when the guest writes the
upper doubleword and destroy it when the guest writes the lower doubleword.

>
>> +    {
>> +        unmap_guest_pages(vvtd->inv_queue_base, 1 << DMA_IQA_QS(old));
>> +        vvtd->inv_queue_base = NULL;
>> +    }
>> +}
>> +
>> +static void vvtd_write_ics(struct vvtd *vvtd, uint32_t val)
>> +{
>> +    if ( val & DMA_ICS_IWC )
>> +    {
>> +        vvtd_clear_bit(vvtd, DMAR_ICS_REG, DMA_ICS_IWC_SHIFT);
>> +        /* When IWC field is cleared, the IP field needs to be cleared */
>> +        vvtd_clear_bit(vvtd, DMAR_IECTL_REG, DMA_IECTL_IP_SHIFT);
>> +    }
>> +}
>> +
>>  static int vvtd_write_frcd3(struct vvtd *vvtd, uint32_t val)
>>  {
>>      /* Writing a 1 means clear fault */
>> @@ -489,6 +682,20 @@ static int vvtd_write_frcd3(struct vvtd *vvtd, uint32_t val)
>>      return X86EMUL_OKAY;
>>  }
>>  
>> +static void vvtd_write_iectl(struct vvtd *vvtd, uint32_t val)
>> +{
>> +    /* Only DMA_IECTL_IM bit is writable. Generate pending event when unmask */
>> +    if ( !(val & DMA_IECTL_IM) )
>> +    {
>> +        /* Clear IM and clear IP */
>> +        vvtd_clear_bit(vvtd, DMAR_IECTL_REG, DMA_IECTL_IM_SHIFT);
>> +        if ( vvtd_test_and_clear_bit(vvtd, DMAR_IECTL_REG, DMA_IECTL_IP_SHIFT) )
>> +            vvtd_notify_inv_completion(vvtd);
>> +    }
>> +    else
>> +        vvtd_set_bit(vvtd, DMAR_IECTL_REG, DMA_IECTL_IM_SHIFT);
>> +}
>> +
>>  static void vvtd_write_fectl(struct vvtd *vvtd, uint32_t val)
>>  {
>>      /*
>> @@ -681,6 +888,48 @@ static void vvtd_write_fault_regs(struct vvtd *vvtd, unsigned long val,
>>      spin_unlock(&vvtd->fe_lock);
>>  }
>>  
>> +static void vvtd_write_invalidation_regs(struct vvtd *vvtd, unsigned long val,
>> +                                         unsigned int offset, unsigned int len)
>> +{
>> +    spin_lock(&vvtd->ie_lock);
>> +    for ( ; len ; len -= 4, offset += 4, val = val >> 32)
>
>Same comment as in the previous patch, I don't really like the for
>loop, but I guess 64bit access must be allowed to these grup of
>registers?

Yes, it is allowed. But the VT-d spec implies that hardware may disassemble
a 64-bit write into two 32-bit writes.

>
>> +    {
>> +        switch ( offset )
>> +        {
>> +        case DMAR_IECTL_REG:
>> +            vvtd_write_iectl(vvtd, val);
>> +            break;
>> +
>> +        case DMAR_ICS_REG:
>> +            vvtd_write_ics(vvtd, val);
>> +            break;
>> +
>> +        case DMAR_IQT_REG:
>> +            vvtd_write_iqt(vvtd, val);
>> +            break;
>> +
>> +        case DMAR_IQA_REG:
>> +            vvtd_write_iqa(vvtd, val, 0);
>> +            break;
>> +
>> +        case DMAR_IQUA_REG:
>> +            vvtd_write_iqa(vvtd, val, 1);
>> +            break;
>> +
>> +        case DMAR_IEDATA_REG:
>> +        case DMAR_IEADDR_REG:
>> +        case DMAR_IEUADDR_REG:
>> +            vvtd_set_reg(vvtd, offset, val);
>> +            break;
>> +
>> +        default:
>> +            break;
>> +        }
>> +    }
>> +    spin_unlock(&vvtd->ie_lock);
>> +
>> +}
>> +
>>  static int vvtd_write(struct vcpu *v, unsigned long addr,
>>                        unsigned int len, unsigned long val)
>>  {
>> @@ -719,6 +968,17 @@ static int vvtd_write(struct vcpu *v, unsigned long addr,
>>          vvtd_write_fault_regs(vvtd, val, offset, len);
>>          break;
>>  
>> +    case DMAR_IECTL_REG:
>> +    case DMAR_ICS_REG:
>> +    case DMAR_IQT_REG:
>> +    case DMAR_IQA_REG:
>> +    case DMAR_IQUA_REG:
>> +    case DMAR_IEDATA_REG:
>> +    case DMAR_IEADDR_REG:
>> +    case DMAR_IEUADDR_REG:
>> +        vvtd_write_invalidation_regs(vvtd, val, offset, len);
>> +        break;
>> +
>>      default:
>>          if ( (offset == (fault_offset + DMA_FRCD2_OFFSET)) ||
>>               (offset == (fault_offset + DMA_FRCD3_OFFSET)) )
>> @@ -840,7 +1100,8 @@ static int vvtd_handle_irq_request(const struct domain *d,
>>                          irte.remap.tm);
>>  
>>   out:
>> -    atomic_dec(&vvtd->inflight_intr);
>> +    if ( !atomic_dec_and_test(&vvtd->inflight_intr) )
>> +        vvtd_process_iq(vvtd);
>>      return ret;
>>  }
>>  
>> @@ -911,6 +1172,7 @@ static int vvtd_create(struct domain *d, struct viommu *viommu)
>>      vvtd->domain = d;
>>      register_mmio_handler(d, &vvtd_mmio_ops);
>>      spin_lock_init(&vvtd->fe_lock);
>> +    spin_lock_init(&vvtd->ie_lock);
>>  
>>      viommu->priv = vvtd;
>>  
>> @@ -930,6 +1192,13 @@ static int vvtd_destroy(struct viommu *viommu)
>>                                       sizeof(struct iremap_entry)));
>>              vvtd->irt_base = NULL;
>>          }
>> +        if ( vvtd->inv_queue_base )
>> +        {
>> +            uint64_t old = vvtd_get_reg_quad(vvtd, DMAR_IQA_REG);
>> +
>> +            unmap_guest_pages(vvtd->inv_queue_base, 1 << DMA_IQA_QS(old));
>
>Don't you also need to unmap this page(s) when QIE is disabled?

Will do.

Thanks
Chao


* Re: [PATCH v4 17/28] x86/vvtd: save and restore emulated VT-d
  2018-02-12 14:49   ` Roger Pau Monné
@ 2018-02-23  5:22     ` Chao Gao
  2018-02-23 17:19       ` Roger Pau Monné
  0 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2018-02-23  5:22 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Mon, Feb 12, 2018 at 02:49:12PM +0000, Roger Pau Monné wrote:
>On Fri, Nov 17, 2017 at 02:22:24PM +0800, Chao Gao wrote:
>> Provide a save-restore pair to save/restore registers and non-register
>> status.
>> 
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>> ---
>> v3:
>>  - use one entry to save both vvtd registers and other intermediate
>>  state
>> ---
>>  xen/drivers/passthrough/vtd/vvtd.c     | 57 +++++++++++++++++++++++-----------
>>  xen/include/public/arch-x86/hvm/save.h | 18 ++++++++++-
>>  2 files changed, 56 insertions(+), 19 deletions(-)
>> 
>> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
>> index 81170ec..f6bde69 100644
>> --- a/xen/drivers/passthrough/vtd/vvtd.c
>> +++ b/xen/drivers/passthrough/vtd/vvtd.c
>> @@ -27,8 +27,10 @@
>>  #include <asm/event.h>
>>  #include <asm/io_apic.h>
>>  #include <asm/hvm/domain.h>
>> +#include <asm/hvm/save.h>
>>  #include <asm/hvm/support.h>
>>  #include <asm/p2m.h>
>> +#include <public/hvm/save.h>
>>  
>>  #include "iommu.h"
>>  #include "vtd.h"
>> @@ -38,20 +40,6 @@
>>  
>>  #define VVTD_FRCD_NUM   1ULL
>>  #define VVTD_FRCD_START (DMAR_IRTA_REG + 8)
>> -#define VVTD_FRCD_END   (VVTD_FRCD_START + VVTD_FRCD_NUM * 16)
>> -#define VVTD_MAX_OFFSET VVTD_FRCD_END
>> -
>> -struct hvm_hw_vvtd {
>> -    bool eim_enabled;
>> -    bool intremap_enabled;
>> -    uint32_t fault_index;
>> -
>> -    /* Interrupt remapping table base gfn and the max of entries */
>> -    uint16_t irt_max_entry;
>> -    gfn_t irt;
>
>You are changing gfn_t to uint64_t, is gfn_t not working with the
>migration stream?

In xen/include/public/save.h, there is a comment around line 32:
 * Structures in this header *must* have the same layout in 32bit 
 * and 64bit environments: this means that all fields must be explicitly 
 * sized types and aligned to their sizes, and the structs must be 
 * a multiple of eight bytes long.

That's why I change bool to uint32_t and gfn_t to uint64_t.

>
>Also I think this duplication of fields (having all registers in
>'regs' and some cached in miscellaneous top level fields is not a good
>approach.

Yes. I think intremap_enabled can be removed, but the others (i.e.
eim_enabled, irt_max_entry, irt) cannot, because the guest may update IRTA
while interrupt remapping is enabled. That means the guest wants to replace
the current table A with a new table B. To finish the replacement, the
guest programs table B's base address, number of entries, and EIM setting
into IRTA, and then writes GCMD with SIRTP set. If a save and restore
happens between those two steps, we can still recover table A's base
address and the other information from the cached fields.

>
>> -
>> -    uint32_t regs[VVTD_MAX_OFFSET/sizeof(uint32_t)];
>> -};
>>  
>>  struct vvtd {
>>      /* Base address of remapping hardware register-set */
>> @@ -776,7 +764,7 @@ static void write_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
>>      if ( vvtd->hw.intremap_enabled )
>>          vvtd_info("Update Interrupt Remapping Table when active\n");
>>  
>> -    if ( gfn_x(vvtd->hw.irt) != PFN_DOWN(DMA_IRTA_ADDR(irta)) ||
>> +    if ( vvtd->hw.irt != PFN_DOWN(DMA_IRTA_ADDR(irta)) ||
>>           vvtd->hw.irt_max_entry != DMA_IRTA_SIZE(irta) )
>>      {
>>          if ( vvtd->irt_base )
>> @@ -786,14 +774,14 @@ static void write_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
>>                                       sizeof(struct iremap_entry)));
>>              vvtd->irt_base = NULL;
>>          }
>> -        vvtd->hw.irt = _gfn(PFN_DOWN(DMA_IRTA_ADDR(irta)));
>> +        vvtd->hw.irt = PFN_DOWN(DMA_IRTA_ADDR(irta));
>>          vvtd->hw.irt_max_entry = DMA_IRTA_SIZE(irta);
>>          vvtd->hw.eim_enabled = !!(irta & IRTA_EIME);
>>          vvtd_info("Update IR info (addr=%lx eim=%d size=%d)\n",
>> -                  gfn_x(vvtd->hw.irt), vvtd->hw.eim_enabled,
>> +                  vvtd->hw.irt, vvtd->hw.eim_enabled,
>>                    vvtd->hw.irt_max_entry);
>>  
>> -        vvtd->irt_base = map_guest_pages(vvtd->domain, gfn_x(vvtd->hw.irt),
>> +        vvtd->irt_base = map_guest_pages(vvtd->domain, vvtd->hw.irt,
>>                                           PFN_UP(vvtd->hw.irt_max_entry *
>>                                                  sizeof(struct iremap_entry)));
>>      }
>> @@ -1138,6 +1126,39 @@ static bool vvtd_is_remapping(const struct domain *d,
>>      return !irq_remapping_request_index(irq, &idx);
>>  }
>>  
>> +static int vvtd_load(struct domain *d, hvm_domain_context_t *h)
>> +{
>> +    struct vvtd *vvtd = domain_vvtd(d);
>> +    uint64_t iqa;
>> +
>> +    if ( !vvtd )
>> +        return -ENODEV;
>> +
>> +    if ( hvm_load_entry(VVTD, h, &vvtd->hw) )
>> +        return -EINVAL;
>> +
>> +    iqa = vvtd_get_reg_quad(vvtd, DMAR_IQA_REG);
>> +    vvtd->irt_base = map_guest_pages(vvtd->domain, vvtd->hw.irt,
>> +                                     PFN_UP(vvtd->hw.irt_max_entry *
>> +                                            sizeof(struct iremap_entry)));
>> +    vvtd->inv_queue_base = map_guest_pages(vvtd->domain,
>> +                                           PFN_DOWN(DMA_IQA_ADDR(iqa)),
>> +                                           1 << DMA_IQA_QS(iqa));
>
>Why are you unconditionally mapping those pages? Shouldn't you check
>that the relevant features are enabled?
>
>Both could be 0 or simply point to garbage.

Will do some checks.

>
>> +    return 0;
>> +}
>> +
>> +static int vvtd_save(struct domain *d, hvm_domain_context_t *h)
>> +{
>> +    struct vvtd *vvtd = domain_vvtd(d);
>> +
>> +    if ( !vvtd )
>> +        return 0;
>> +
>> +    return hvm_save_entry(VVTD, 0, h, &vvtd->hw);
>> +}
>> +
>> +HVM_REGISTER_SAVE_RESTORE(VVTD, vvtd_save, vvtd_load, 1, HVMSR_PER_DOM);
>> +
>>  static void vvtd_reset(struct vvtd *vvtd)
>>  {
>>      uint64_t cap = cap_set_num_fault_regs(VVTD_FRCD_NUM)
>> diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
>> index fd7bf3f..24a513b 100644
>> --- a/xen/include/public/arch-x86/hvm/save.h
>> +++ b/xen/include/public/arch-x86/hvm/save.h
>> @@ -639,10 +639,26 @@ struct hvm_msr {
>>  
>>  #define CPU_MSR_CODE  20
>>  
>> +#define VVTD_MAX_OFFSET 0xd0
>
>You used to have some kind of formula to calculate VVTD_MAX_OFFSET,
>yet here the value is just hardcoded. Any reason for this?

The formula uses DMAR_IRTA_REG, which is defined in
xen/drivers/passthrough/vtd/iommu.h; that header cannot be included by this
public header.

>
>> +struct hvm_hw_vvtd
>> +{
>> +    uint32_t eim_enabled : 1,
>> +             intremap_enabled : 1;
>> +    uint32_t fault_index;
>> +
>> +    /* Interrupt remapping table base gfn and the max of entries */
>> +    uint32_t irt_max_entry;
>> +    uint64_t irt;
>> +
>> +    uint32_t regs[VVTD_MAX_OFFSET/sizeof(uint32_t)];
>> +};
>> +
>> +DECLARE_HVM_SAVE_TYPE(VVTD, 21, struct hvm_hw_vvtd);
>
>Adding new fields to this struct in a migration compatible way is
>going to be a PITA, but there's no easy solution to this I'm afraid...

What do you mean by "migration compatible"? Do you mean migrating an HVM
guest with a vIOMMU between different Xen versions? Could it be solved by
leaving some padding fields here?

Thanks
Chao


* Re: [PATCH v4 11/28] x86/vvtd: Process interrupt remapping request
  2018-02-11  5:31     ` Chao Gao
@ 2018-02-23 17:04       ` Roger Pau Monné
  0 siblings, 0 replies; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-23 17:04 UTC (permalink / raw)
  To: xen-devel, Tim Deegan, Stefano Stabellini, Jan Beulich, Wei Liu,
	Ian Jackson, George Dunlap, Konrad Rzeszutek Wilk, Andrew Cooper,
	Kevin Tian, Lan Tianyu

On Sun, Feb 11, 2018 at 01:31:41PM +0800, Chao Gao wrote:
> On Fri, Feb 09, 2018 at 05:44:17PM +0000, Roger Pau Monné wrote:
> >On Fri, Nov 17, 2017 at 02:22:18PM +0800, Chao Gao wrote:
> >> +static int vvtd_delivery(struct domain *d, uint8_t vector,
> >> +                         uint32_t dest, bool dest_mode,
> >> +                         uint8_t delivery_mode, uint8_t trig_mode)
> >> +{
> >> +    struct vlapic *target;
> >> +    struct vcpu *v;
> >> +
> >> +    switch ( delivery_mode )
> >> +    {
> >> +    case dest_LowestPrio:
> >> +        target = vlapic_lowest_prio(d, NULL, 0, dest, dest_mode);
> >> +        if ( target != NULL )
> >> +        {
> >> +            vvtd_debug("d%d: dest=v%d dlm=%x vector=%d trig_mode=%d\n",
> >> +                       vlapic_domain(target)->domain_id,
> >> +                       vlapic_vcpu(target)->vcpu_id,
> >> +                       delivery_mode, vector, trig_mode);
> >> +            vlapic_set_irq(target, vector, trig_mode);
> >> +            break;
> >> +        }
> >> +        vvtd_debug("d%d: null round robin: vector=%02x\n",
> >> +                   d->domain_id, vector);
> >> +        break;
> >> +
> >> +    case dest_Fixed:
> >> +        for_each_vcpu ( d, v )
> >> +            if ( vlapic_match_dest(vcpu_vlapic(v), NULL, 0, dest, dest_mode) )
> >> +            {
> >> +                vvtd_debug("d%d: dest=v%d dlm=%x vector=%d trig_mode=%d\n",
> >> +                           v->domain->domain_id, v->vcpu_id,
> >> +                           delivery_mode, vector, trig_mode);
> >> +                vlapic_set_irq(vcpu_vlapic(v), vector, trig_mode);
> >> +            }
> >> +        break;
> >> +
> >> +    case dest_NMI:
> >> +        for_each_vcpu ( d, v )
> >> +            if ( vlapic_match_dest(vcpu_vlapic(v), NULL, 0, dest, dest_mode) &&
> >> +                 !test_and_set_bool(v->nmi_pending) )
> >> +                vcpu_kick(v);
> >
> >Doing this loops here seems quite bad from a preformance PoV,
> >specially taking into account that this code is going to be used with
> >> 128 vCPUs.
> 
> Maybe. But I prefer not to do optimizations at this early stage.

I agree with not doing optimizations for first pass implementations,
but given this series is focused on increasing the number of vCPUs in
order to get better performance adding loops bounded to the number of
vCPUs seems quite incoherent.

There are several of those in the vlapic code for example, so I'm
wondering whether a preparatory patch should deal with those, or at
least have a plan.

I would like to at least see a 'TODO' tag here describing how to deal
with this in the future, so that the maximum allowed number of vCPUs
for HVM domains is not bumped until those TODOs are taken care of.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 08/28] x86/vvtd: Add MMIO handler for VVTD
  2018-02-22  6:20         ` Chao Gao
@ 2018-02-23 17:07           ` Roger Pau Monné
  2018-02-23 17:37             ` Wei Liu
  0 siblings, 1 reply; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-23 17:07 UTC (permalink / raw)
  To: Chao Gao
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Thu, Feb 22, 2018 at 02:20:12PM +0800, Chao Gao wrote:
> On Fri, Feb 09, 2018 at 05:51:29PM +0000, Roger Pau Monné wrote:
> >On Sat, Feb 10, 2018 at 01:21:09AM +0800, Chao Gao wrote:
> >> On Fri, Feb 09, 2018 at 04:39:15PM +0000, Roger Pau Monné wrote:
> >> >On Fri, Nov 17, 2017 at 02:22:15PM +0800, Chao Gao wrote:
> >> >> This patch adds VVTD MMIO handler to deal with MMIO access.
> >> >> 
> >> >> Signed-off-by: Chao Gao <chao.gao@intel.com>
> >> >> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> >> >> ---
> >> >> v4:
> >> >>  - only trap the register emulated in vvtd_in_range().
> >> >>    i.e. replace PAGE_SIZE with the VVTD_MAX_OFFSET
> >> >> ---
> >> >>  xen/drivers/passthrough/vtd/vvtd.c | 55 ++++++++++++++++++++++++++++++++++++++
> >> >>  1 file changed, 55 insertions(+)
> >> >> 
> >> >> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
> >> >> index 9f76ccf..d78d878 100644
> >> >> --- a/xen/drivers/passthrough/vtd/vvtd.c
> >> >> +++ b/xen/drivers/passthrough/vtd/vvtd.c
> >> >
> >> >Now that I look at this, this is the wrong folder. This should be in
> >> >xen/arch/x86/hvm with the rest of the emulated devices.
> >> 
> >> It is a problem we discussed in previous versions. AMD puts its vIOMMU
> >> (iommu_guest.c) in xen/drivers/passthrough/amd/. We are following what
> >> they did. I don't have special taste on this. If no one objects to your
> >> suggestion, I will move it to xen/arch/x86/hvm/. Maybe create a new
> >> intel directory since it's intel-specific and won't be used by AMD.
> >
> >Oh, it's been quite some time since I've reviewed that, so TBH I
> >didn't remember that discussion.
> >
> >If the AMD viommu thing is already there I guess it doesn't hurt...
> >Also, have you checked whether it can be converted to use the
> >infrastructure that you add here?
> 
> Not yet. It seems that we have no method to use AMD vIOMMU now.
> And I notice that Wei plans to remove AMD vIOMMU.
> 
> I can convert AMD vIOMMU implementation to use this infrastructure if we
> finally decide to preserve AMD vIOMMU.

Oh, OK, I had no idea we were planning to remove the AMD vIOMMU
stuff.

Thanks, Roger.


* Re: [PATCH v4 17/28] x86/vvtd: save and restore emulated VT-d
  2018-02-23  5:22     ` Chao Gao
@ 2018-02-23 17:19       ` Roger Pau Monné
  0 siblings, 0 replies; 83+ messages in thread
From: Roger Pau Monné @ 2018-02-23 17:19 UTC (permalink / raw)
  To: Chao Gao
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Fri, Feb 23, 2018 at 01:22:23PM +0800, Chao Gao wrote:
> On Mon, Feb 12, 2018 at 02:49:12PM +0000, Roger Pau Monné wrote:
> >On Fri, Nov 17, 2017 at 02:22:24PM +0800, Chao Gao wrote:
> >
> >> +struct hvm_hw_vvtd
> >> +{
> >> +    uint32_t eim_enabled : 1,
> >> +             intremap_enabled : 1;
> >> +    uint32_t fault_index;
> >> +
> >> +    /* Interrupt remapping table base gfn and the max of entries */
> >> +    uint32_t irt_max_entry;
> >> +    uint64_t irt;
> >> +
> >> +    uint32_t regs[VVTD_MAX_OFFSET/sizeof(uint32_t)];
> >> +};
> >> +
> >> +DECLARE_HVM_SAVE_TYPE(VVTD, 21, struct hvm_hw_vvtd);
> >
> >Adding new fields to this struct in a migration compatible way is
> >going to be a PITA, but there's no easy solution to this I'm afraid...
> 
> What do you mean by "migration compatible"? Do you mean migrating an
> HVM guest with a viommu between different Xen versions? Could it be
> solved by leaving some padding fields here?

It's inevitable, but when new features are added to the vvtd
implementation the struct will likely change and then we will need
some conversion helper.

Roger.


* Re: [PATCH v4 08/28] x86/vvtd: Add MMIO handler for VVTD
  2018-02-23 17:07           ` Roger Pau Monné
@ 2018-02-23 17:37             ` Wei Liu
  0 siblings, 0 replies; 83+ messages in thread
From: Wei Liu @ 2018-02-23 17:37 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper, Chao Gao

On Fri, Feb 23, 2018 at 05:07:09PM +0000, Roger Pau Monné wrote:
> On Thu, Feb 22, 2018 at 02:20:12PM +0800, Chao Gao wrote:
> > On Fri, Feb 09, 2018 at 05:51:29PM +0000, Roger Pau Monné wrote:
> > >On Sat, Feb 10, 2018 at 01:21:09AM +0800, Chao Gao wrote:
> > >> On Fri, Feb 09, 2018 at 04:39:15PM +0000, Roger Pau Monné wrote:
> > >> >On Fri, Nov 17, 2017 at 02:22:15PM +0800, Chao Gao wrote:
> > >> >> This patch adds VVTD MMIO handler to deal with MMIO access.
> > >> >> 
> > >> >> Signed-off-by: Chao Gao <chao.gao@intel.com>
> > >> >> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> > >> >> ---
> > >> >> v4:
> > >> >>  - only trap the register emulated in vvtd_in_range().
> > >> >>    i.e. replace PAGE_SIZE with the VVTD_MAX_OFFSET
> > >> >> ---
> > >> >>  xen/drivers/passthrough/vtd/vvtd.c | 55 ++++++++++++++++++++++++++++++++++++++
> > >> >>  1 file changed, 55 insertions(+)
> > >> >> 
> > >> >> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
> > >> >> index 9f76ccf..d78d878 100644
> > >> >> --- a/xen/drivers/passthrough/vtd/vvtd.c
> > >> >> +++ b/xen/drivers/passthrough/vtd/vvtd.c
> > >> >
> > >> >Now that I look at this, this is the wrong folder. This should be in
> > >> >xen/arch/x86/hvm with the rest of the emulated devices.
> > >> 
> > >> It is a problem we discussed in previous versions. AMD puts its vIOMMU
> > >> (iommu_guest.c) in xen/drivers/passthrough/amd/. We are following what
> > >> they did. I don't have special taste on this. If no one objects to your
> > >> suggestion, I will move it to xen/arch/x86/hvm/. Maybe create a new
> > >> intel directory since it's intel-specific and won't be used by AMD.
> > >
> > >Oh, it's been quite some time since I've reviewed that, so TBH I
> > >didn't remember that discussion.
> > >
> > >If the AMD viommu thing is already there I guess it doesn't hurt...
> > >Also, have you checked whether it can be converted to use the
> > >infrastructure that you add here?
> > 
> > Not yet. It seems that we have no method to use AMD vIOMMU now.
> > And I notice that Wei plans to remove AMD vIOMMU.
> > 
> > I can convert AMD vIOMMU implementation to use this infrastructure if we
> > finally decide to preserve AMD vIOMMU.
> 
> Oh, OK, I had no idea we were planning to remove the AMD vIOMMU
> stuff.

That code was never properly hooked up in the first place. It has been
dead code since 2012-ish, so I assumed no one cared.

I don't know if the AMD maintainers will object to the removal though.

Wei.


* Re: [PATCH v4 18/28] x86/vioapic: Hook interrupt delivery of vIOAPIC
  2018-02-12 14:54   ` Roger Pau Monné
@ 2018-02-24  1:51     ` Chao Gao
  2018-02-24  3:17       ` Tian, Kevin
  0 siblings, 1 reply; 83+ messages in thread
From: Chao Gao @ 2018-02-24  1:51 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Mon, Feb 12, 2018 at 02:54:02PM +0000, Roger Pau Monné wrote:
>On Fri, Nov 17, 2017 at 02:22:25PM +0800, Chao Gao wrote:
>> When irq remapping is enabled, IOAPIC Redirection Entry may be in remapping
>> format. If that, generate an irq_remapping_request and call the common
>
>"If that's the case, ..."
>
>> VIOMMU abstraction's callback to handle this interrupt request. Device
>> model is responsible for checking the request's validity.
>
>What does this exactly mean? Device model is not involved in what the
>guest writes to the vIOAPIC RTE, so it's impossible for the device
>model to validate this in any way.

How about this description:
When irq remapping is enabled, an IOAPIC Redirection Entry may be in
remapping format. If that's the case, an irq_remapping_request is
generated and an IOMMU-specific handler deals with the request. The
handler checks whether the request is valid, reports an error via an
IOMMU-specific mechanism if it is invalid, and otherwise translates the
request into interrupt information (destination, vector, trigger mode,
etc.) according to the IRT.

>
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>> 
>> ---
>> v3:
>>  - use the new interface to check remapping format.
>> ---
>>  xen/arch/x86/hvm/vioapic.c   | 9 +++++++++
>>  xen/include/asm-x86/viommu.h | 9 +++++++++
>>  2 files changed, 18 insertions(+)
>> 
>> diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
>> index 97b419f..0f20e3f 100644
>> --- a/xen/arch/x86/hvm/vioapic.c
>> +++ b/xen/arch/x86/hvm/vioapic.c
>> @@ -30,6 +30,7 @@
>>  #include <xen/lib.h>
>>  #include <xen/errno.h>
>>  #include <xen/sched.h>
>> +#include <xen/viommu.h>
>>  #include <public/hvm/ioreq.h>
>>  #include <asm/hvm/io.h>
>>  #include <asm/hvm/vpic.h>
>> @@ -387,9 +388,17 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
>>      struct vlapic *target;
>>      struct vcpu *v;
>>      unsigned int irq = vioapic->base_gsi + pin;
>> +    struct arch_irq_remapping_request request;
>>  
>>      ASSERT(spin_is_locked(&d->arch.hvm_domain.irq_lock));
>>  
>> +    irq_request_ioapic_fill(&request, vioapic->id, vioapic->redirtbl[pin].bits);
>> +    if ( viommu_check_irq_remapping(d, &request) )
>> +    {
>> +        viommu_handle_irq_request(d, &request);
>> +        return;
>> +    }
>
>Will this compile if you disable vIOMMU in Kconfig?

Yes. Will fix this by wrapping this fragment with #ifdef and #endif.

Thanks
Chao


* Re: [PATCH v4 20/28] xen/pt: when binding guest msi, accept the whole msi message
  2018-02-12 15:16   ` Roger Pau Monné
@ 2018-02-24  2:20     ` Chao Gao
  0 siblings, 0 replies; 83+ messages in thread
From: Chao Gao @ 2018-02-24  2:20 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Jan Beulich,
	Ian Jackson

On Mon, Feb 12, 2018 at 03:16:25PM +0000, Roger Pau Monné wrote:
>On Fri, Nov 17, 2017 at 02:22:27PM +0800, Chao Gao wrote:
>> ... rather than a filtered one. Previously, some fields (reserved or
>> unalterable) are filtered by QEMU. These fields are useless for the
>> legacy interrupt format (i.e. non remappable format). However, these
>> fields are meaningful to remappable format. Accepting the whole msi
>> message will significantly reduce the efforts to support binding
>> remappable format msi.
>
>This should be sent as a separate patch series, together with the
>required QEMU change. Batching it in this series it's going to make it
>harder to commit IMO.

Will do.

>
>Also note that the QEMU side needs to be committed and backported to
>the qemu-xen tree before applying the Xen side.

As for compatibility, how about introducing a new API for binding
interrupts using an unfiltered message? The QEMU maintainer thinks
changing an existing API between QEMU and Xen is not good.

>
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>> ---
>> v4:
>>  - new
>> ---
>>  tools/libxc/include/xenctrl.h |  7 ++++---
>>  tools/libxc/xc_domain.c       | 14 ++++++++------
>>  xen/arch/x86/hvm/vmsi.c       | 12 ++++++------
>>  xen/drivers/passthrough/io.c  | 36 +++++++++++++++++-------------------
>>  xen/include/asm-x86/hvm/irq.h |  5 +++--
>>  xen/include/public/domctl.h   |  8 ++------
>>  6 files changed, 40 insertions(+), 42 deletions(-)
>> 
>> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
>> index 666db0b..8ade90c 100644
>> --- a/tools/libxc/include/xenctrl.h
>> +++ b/tools/libxc/include/xenctrl.h
>> @@ -1756,16 +1756,17 @@ int xc_domain_ioport_mapping(xc_interface *xch,
>>  int xc_domain_update_msi_irq(
>>      xc_interface *xch,
>>      uint32_t domid,
>> -    uint32_t gvec,
>>      uint32_t pirq,
>> +    uint64_t addr,
>> +    uint32_t data,
>>      uint32_t gflags,
>
>If you pass addr and data, do you really need to also pass gflags?
>
>> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
>> index 7126de7..5edb0e7 100644
>> --- a/xen/arch/x86/hvm/vmsi.c
>> +++ b/xen/arch/x86/hvm/vmsi.c
>> @@ -101,12 +101,12 @@ int vmsi_deliver(
>>  
>>  void vmsi_deliver_pirq(struct domain *d, const struct hvm_pirq_dpci *pirq_dpci)
>>  {
>> -    uint32_t flags = pirq_dpci->gmsi.gflags;
>> -    int vector = pirq_dpci->gmsi.gvec;
>> -    uint8_t dest = (uint8_t)flags;
>> -    bool dest_mode = flags & XEN_DOMCTL_VMSI_X86_DM_MASK;
>> -    uint8_t delivery_mode = MASK_EXTR(flags, XEN_DOMCTL_VMSI_X86_DELIV_MASK);
>> -    bool trig_mode = flags & XEN_DOMCTL_VMSI_X86_TRIG_MASK;
>> +    uint8_t vector = pirq_dpci->gmsi.data & MSI_DATA_VECTOR_MASK;
>
>MASK_EXTR please (here and elsewhere).
>
>> +    uint8_t dest = MASK_EXTR(pirq_dpci->gmsi.addr, MSI_ADDR_DEST_ID_MASK);
>> +    bool dest_mode = pirq_dpci->gmsi.addr & MSI_ADDR_DESTMODE_MASK;
>> +    uint8_t delivery_mode = MASK_EXTR(pirq_dpci->gmsi.data,
>> +                                      MSI_DATA_DELIVERY_MODE_MASK);
>> +    bool trig_mode = pirq_dpci->gmsi.data & MSI_DATA_TRIGGER_MASK;
>>  
>>      HVM_DBG_LOG(DBG_LEVEL_IOAPIC,
>>                  "msi: dest=%x dest_mode=%x delivery_mode=%x "
>> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
>> index 8f16e6c..d8c66bf 100644
>> --- a/xen/drivers/passthrough/io.c
>> +++ b/xen/drivers/passthrough/io.c
>> @@ -339,19 +339,17 @@ int pt_irq_create_bind(
>>      {
>>      case PT_IRQ_TYPE_MSI:
>>      {
>> -        uint8_t dest, delivery_mode;
>> +        uint8_t dest, delivery_mode, gvec;
>
>I'm not sure you really need the gvec local variable, AFAICT it's used
>only once.
>
>> diff --git a/xen/include/asm-x86/hvm/irq.h b/xen/include/asm-x86/hvm/irq.h
>> index 3b6b4bd..3a8832c 100644
>> --- a/xen/include/asm-x86/hvm/irq.h
>> +++ b/xen/include/asm-x86/hvm/irq.h
>> @@ -132,9 +132,10 @@ struct dev_intx_gsi_link {
>>  #define HVM_IRQ_DPCI_TRANSLATE       (1u << _HVM_IRQ_DPCI_TRANSLATE_SHIFT)
>>  
>>  struct hvm_gmsi_info {
>> -    uint32_t gvec;
>> -    uint32_t gflags;
>> +    uint32_t data;
>>      int dest_vcpu_id; /* -1 :multi-dest, non-negative: dest_vcpu_id */
>> +    uint64_t addr;
>> +    uint8_t gvec;
>
>Can't you just obtain the guest vector from addr and flags?

It seems yes. Will try to remove 'gvec' field.

>
>>      bool posted; /* directly deliver to guest via VT-d PI? */
>>  };
>>  
>> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
>> index 9f6f0aa..2717c68 100644
>> --- a/xen/include/public/domctl.h
>> +++ b/xen/include/public/domctl.h
>> @@ -536,15 +536,11 @@ struct xen_domctl_bind_pt_irq {
>>              uint8_t intx;
>>          } pci;
>>          struct {
>> -            uint8_t gvec;
>>              uint32_t gflags;
>> -#define XEN_DOMCTL_VMSI_X86_DEST_ID_MASK 0x0000ff
>> -#define XEN_DOMCTL_VMSI_X86_RH_MASK      0x000100
>> -#define XEN_DOMCTL_VMSI_X86_DM_MASK      0x000200
>> -#define XEN_DOMCTL_VMSI_X86_DELIV_MASK   0x007000
>> -#define XEN_DOMCTL_VMSI_X86_TRIG_MASK    0x008000
>>  #define XEN_DOMCTL_VMSI_X86_UNMASKED     0x010000
>
>Oh, I see, you need gflags for the unmask thing only.

Yes. When we were rebasing, we found a conflict here, and after some
study of the new flag, it is not easy to remove it.

Thanks
Chao


* Re: [PATCH v4 18/28] x86/vioapic: Hook interrupt delivery of vIOAPIC
  2018-02-24  1:51     ` Chao Gao
@ 2018-02-24  3:17       ` Tian, Kevin
  0 siblings, 0 replies; 83+ messages in thread
From: Tian, Kevin @ 2018-02-24  3:17 UTC (permalink / raw)
  To: Gao, Chao, Roger Pau Monné
  Cc: Lan Tianyu, Stefano Stabellini, Wei Liu, George Dunlap,
	Ian Jackson, Tim Deegan, xen-devel, Jan Beulich, Andrew Cooper

> From: Gao, Chao
> Sent: Saturday, February 24, 2018 9:51 AM
> 
> On Mon, Feb 12, 2018 at 02:54:02PM +0000, Roger Pau Monné wrote:
> >On Fri, Nov 17, 2017 at 02:22:25PM +0800, Chao Gao wrote:
> >> When irq remapping is enabled, IOAPIC Redirection Entry may be in
> >> remapping format. If that, generate an irq_remapping_request and
> >> call the common
> >
> >"If that's the case, ..."
> >
> >> VIOMMU abstraction's callback to handle this interrupt request. Device
> >> model is responsible for checking the request's validity.
> >
> >What does this exactly mean? Device model is not involved in what the
> >guest writes to the vIOAPIC RTE, so it's impossible for the device
> >model to validate this in any way.
> 
> How about this description:
> When irq remapping is enabled, an IOAPIC Redirection Entry may be in
> remapping format. If that's the case, an irq_remapping_request is
> generated and an IOMMU-specific handler deals with the request. The
> handler checks whether the request is valid, reports an error via an
> IOMMU-specific mechanism if it is invalid, and otherwise translates
> the request into interrupt information (destination, vector, trigger
> mode, etc.) according to the IRT.
> 

The description should match what this patch actually does. Details
about how the caller works should be left to the patch where the
caller is introduced. Here, IMO, the subject line is already clear
enough... isn't it?

Thanks
Kevin


* Re: [PATCH v4 21/28] vvtd: update hvm_gmsi_info when binding guest msi with pirq or
  2018-02-12 15:38   ` Roger Pau Monné
@ 2018-02-24  5:05     ` Chao Gao
  0 siblings, 0 replies; 83+ messages in thread
From: Chao Gao @ 2018-02-24  5:05 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Lan Tianyu, Kevin Tian, Stefano Stabellini, Wei Liu,
	George Dunlap, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper

On Mon, Feb 12, 2018 at 03:38:07PM +0000, Roger Pau Monné wrote:
>On Fri, Nov 17, 2017 at 02:22:28PM +0800, Chao Gao wrote:
> >> ... handling guest's invalidation request.
>> 
>> To support pirq migration optimization and using VT-d posted interrupt to
>> inject msi from assigned devices, each time guest programs msi information
>> (affinity, vector), the struct hvm_gmsi_info should be updated accordingly.
>> But after introducing vvtd, guest only needs to update an IRTE, which is in
>> guest memory, to program msi information.  vvtd doesn't trap r/w to the memory
>> range. Instead, it traps the queue invalidation, which is a method used to
>> notify VT-d hardware that an IRTE has changed.
>> 
>> This patch updates hvm_gmsi_info structure and programs physical IRTEs to use
>> VT-d posted interrupt if possible when binding guest msi with pirq or handling
>> guest's invalidation request. For the latter, all physical interrupts bound
>> with the domain are gone through to find the ones matching with the IRTE.
>> 
>> Notes: calling vvtd_process_iq() in vvtd_read() rather than in
>> vvtd_handle_irq_request() is to avoid ABBA deadlock of d->event_lock and
>> vvtd->ie_lock.
>> 
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>> ---
>> v4:
>>  - new
>> ---
>>  xen/arch/x86/hvm/hvm.c             |  2 +-
>>  xen/drivers/passthrough/io.c       | 89 ++++++++++++++++++++++++++++----------
>>  xen/drivers/passthrough/vtd/vvtd.c | 70 ++++++++++++++++++++++++++++--
>>  xen/include/asm-x86/hvm/hvm.h      |  2 +
>>  xen/include/asm-x86/hvm/irq.h      |  1 +
>>  xen/include/asm-x86/viommu.h       | 11 +++++
>>  6 files changed, 147 insertions(+), 28 deletions(-)
>> 
>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>> index 964418a..d2c1372 100644
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -462,7 +462,7 @@ void hvm_migrate_timers(struct vcpu *v)
>>      pt_migrate(v);
>>  }
>>  
>> -static int hvm_migrate_pirq(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
>> +int hvm_migrate_pirq(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
>>                              void *arg)
>>  {
>>      struct vcpu *v = arg;
>> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
>> index d8c66bf..9198ef5 100644
>> --- a/xen/drivers/passthrough/io.c
>> +++ b/xen/drivers/passthrough/io.c
>> @@ -21,6 +21,7 @@
>>  #include <xen/iommu.h>
>>  #include <xen/cpu.h>
>>  #include <xen/irq.h>
>> +#include <xen/viommu.h>
>>  #include <asm/hvm/irq.h>
>>  #include <asm/hvm/support.h>
>>  #include <asm/io_apic.h>
>> @@ -275,6 +276,61 @@ static struct vcpu *vector_hashing_dest(const struct domain *d,
>>      return dest;
>>  }
>>  
>> +void pt_update_gmsi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
>> +{
>> +    uint8_t dest, delivery_mode;
>> +    bool dest_mode;
>> +    int dest_vcpu_id;
>> +    const struct vcpu *vcpu;
>> +    struct arch_irq_remapping_request request;
>> +    struct arch_irq_remapping_info remap_info;
>> +
>> +    ASSERT(spin_is_locked(&d->event_lock));
>> +
>> +    /* Calculate dest_vcpu_id for MSI-type pirq migration. */
>> +    irq_request_msi_fill(&request, pirq_dpci->gmsi.addr, pirq_dpci->gmsi.data);
>> +    if ( viommu_check_irq_remapping(d, &request) )
>> +    {
>> +        /* An error in IRTE, don't perform the optimization */
>> +        if ( viommu_get_irq_info(d, &request, &remap_info) )
>> +        {
>> +            pirq_dpci->gmsi.posted = false;
>> +            pirq_dpci->gmsi.dest_vcpu_id = -1;
>> +            pirq_dpci->gmsi.gvec = 0;
>> +            return;
>> +        }
>> +
>> +        dest = remap_info.dest;
>> +        dest_mode = remap_info.dest_mode;
>> +        delivery_mode = remap_info.delivery_mode;
>> +        pirq_dpci->gmsi.gvec = remap_info.vector;
>> +    }
>> +    else
>> +    {
>> +        dest = MASK_EXTR(pirq_dpci->gmsi.addr, MSI_ADDR_DEST_ID_MASK);
>> +        dest_mode = pirq_dpci->gmsi.addr & MSI_ADDR_DESTMODE_MASK;
>> +        delivery_mode = MASK_EXTR(pirq_dpci->gmsi.data,
>> +                                  MSI_DATA_DELIVERY_MODE_MASK);
>> +        pirq_dpci->gmsi.gvec = pirq_dpci->gmsi.data & MSI_DATA_VECTOR_MASK;
>> +    }
>> +
>> +    dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
>> +    pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
>> +
>> +    pirq_dpci->gmsi.posted = false;
>> +    vcpu = (dest_vcpu_id >= 0) ? d->vcpu[dest_vcpu_id] : NULL;
>
>So you use dest_vcpu_id to get the vcpu here...
>
>> +    if ( iommu_intpost )
>> +    {
>> +        if ( delivery_mode == dest_LowestPrio )
>> +            vcpu = vector_hashing_dest(d, dest, dest_mode, pirq_dpci->gmsi.gvec);
>> +        if ( vcpu )
>> +        {
>> +            pirq_dpci->gmsi.posted = true;
>> +            pirq_dpci->gmsi.dest_vcpu_id = vcpu->vcpu_id;
>
>... which is only used here in order to get the dest_vcpu_id back. Is
>this really needed? Can't you just use dest_vcpu_id?
>
>I would rather do:
>
>if ( iommu_intpost && delivery_mode == dest_LowestPrio )
>{

These two 'if's cannot be combined, because in the
"iommu_intpost && !dest_LowestPrio" case, 'pirq_dpci->gmsi.posted' may
also need to be set.
>    const struct vcpu *vcpu = vector_hashing_dest(d, dest, dest_mode,
>                                                  pirq_dpci->gmsi.gvec);
>
>    if ( vcpu )
>    {
>        ....
>    }
>}

How about:
    if ( iommu_intpost )
    {
        const struct vcpu *vcpu = NULL;

        if ( delivery_mode == dest_LowestPrio )
            vcpu = vector_hashing_dest(d, dest, dest_mode,
                                       pirq_dpci->gmsi.gvec);
        if ( vcpu )
            dest_vcpu_id = vcpu->vcpu_id;
        if ( dest_vcpu_id >= 0 )
        {
            pirq_dpci->gmsi.posted = true;
            pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
        }
    }

>
>> +        }
>> +    }
>> +}
>> +
>>  int pt_irq_create_bind(
>>      struct domain *d, const struct xen_domctl_bind_pt_irq *pt_irq_bind)
>>  {
>> @@ -339,9 +395,6 @@ int pt_irq_create_bind(
>>      {
>>      case PT_IRQ_TYPE_MSI:
>>      {
>> -        uint8_t dest, delivery_mode, gvec;
>> -        bool dest_mode;
>> -        int dest_vcpu_id;
>>          const struct vcpu *vcpu;
>>  
>>          if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
>> @@ -411,35 +464,23 @@ int pt_irq_create_bind(
>>                  pirq_dpci->gmsi.addr = pt_irq_bind->u.msi.addr;
>>              }
>>          }
>> -        /* Calculate dest_vcpu_id for MSI-type pirq migration. */
>> -        dest = MASK_EXTR(pirq_dpci->gmsi.addr, MSI_ADDR_DEST_ID_MASK);
>> -        dest_mode = pirq_dpci->gmsi.addr & MSI_ADDR_DESTMODE_MASK;
>> -        delivery_mode = MASK_EXTR(pirq_dpci->gmsi.data,
>> -                                  MSI_DATA_DELIVERY_MODE_MASK);
>> -        gvec = pirq_dpci->gmsi.data & MSI_DATA_VECTOR_MASK;
>> -        pirq_dpci->gmsi.gvec = gvec;
>>  
>> -        dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
>> -        pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
>> +        pt_update_gmsi(d, pirq_dpci);
>>          spin_unlock(&d->event_lock);
>>  
>> -        pirq_dpci->gmsi.posted = false;
>> -        vcpu = (dest_vcpu_id >= 0) ? d->vcpu[dest_vcpu_id] : NULL;
>> -        if ( iommu_intpost )
>> -        {
>> -            if ( delivery_mode == dest_LowestPrio )
>> -                vcpu = vector_hashing_dest(d, dest, dest_mode,
>> -                                           pirq_dpci->gmsi.gvec);
>> -            if ( vcpu )
>> -                pirq_dpci->gmsi.posted = true;
>> -        }
>> -        if ( dest_vcpu_id >= 0 )
>> -            hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
>> +        if ( pirq_dpci->gmsi.dest_vcpu_id >= 0 )
>> +            hvm_migrate_pirqs(d->vcpu[pirq_dpci->gmsi.dest_vcpu_id]);
>>  
>>          /* Use interrupt posting if it is supported. */
>>          if ( iommu_intpost )
>> +        {
>> +            if ( pirq_dpci->gmsi.posted )
>> +                vcpu = d->vcpu[pirq_dpci->gmsi.dest_vcpu_id];
>> +            else
>> +                vcpu = NULL;
>>              pi_update_irte(vcpu ? &vcpu->arch.hvm_vmx.pi_desc : NULL,
>>                             info, pirq_dpci->gmsi.gvec);
>
>If vcpu is now only used inside of this if condition please move it's
>declaration here to reduce the scope.

Will do.

>
>> +        }
>>  
>>          if ( pt_irq_bind->u.msi.gflags & XEN_DOMCTL_VMSI_X86_UNMASKED )
>>          {
>> diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
>> index f6bde69..d12ad1d 100644
>> --- a/xen/drivers/passthrough/vtd/vvtd.c
>> +++ b/xen/drivers/passthrough/vtd/vvtd.c
>> @@ -477,6 +477,50 @@ static int vvtd_record_fault(struct vvtd *vvtd,
>>  }
>>  
>>  /*
>> + * 'arg' is the index of interrupt remapping table. This index is used to
>> + * search physical irqs which satify that the gmsi mapped with the physical irq
>> + * is tranlated by the IRTE refered to by the index. The struct hvm_gmsi_info
>> + * contains some fields are infered from an virtual IRTE. These fields should
>> + * be updated when guest invalidates an IRTE. Furthermore, the physical IRTE
>> + * is updated accordingly to reduce IPIs or utilize VT-d posted interrupt.
>> + *
>> + * if 'arg' is -1, perform a global invalidation.
>> + */
>> +static int invalidate_gmsi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
>> +                         void *arg)
>> +{
>> +    if ( pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI )
>> +    {
>> +        uint32_t index, target = (long)arg;
>> +        struct arch_irq_remapping_request req;
>> +        const struct vcpu *vcpu;
>> +
>> +        irq_request_msi_fill(&req, pirq_dpci->gmsi.addr, pirq_dpci->gmsi.data);
>> +        if ( !irq_remapping_request_index(&req, &index) &&
>> +             ((target == -1) || (target == index)) )
>
>Shouldn't this -1 be some kind of define, like GMSI_ALL or similar?
>Also isn't it possible to use -1 as a valid target?

Ok. Will do.

>
>> +        {
>> +            pt_update_gmsi(d, pirq_dpci);
>> +            if ( pirq_dpci->gmsi.dest_vcpu_id >= 0 )
>> +                hvm_migrate_pirq(d, pirq_dpci,
>> +                                 d->vcpu[pirq_dpci->gmsi.dest_vcpu_id]);
>> +
>> +            /* Use interrupt posting if it is supported. */
>> +            if ( iommu_intpost )
>> +            {
>> +                if ( pirq_dpci->gmsi.posted )
>> +                    vcpu = d->vcpu[pirq_dpci->gmsi.dest_vcpu_id];
>> +                else
>> +                    vcpu = NULL;
>> +                pi_update_irte(vcpu ? &vcpu->arch.hvm_vmx.pi_desc : NULL,
>> +                               dpci_pirq(pirq_dpci), pirq_dpci->gmsi.gvec);
>> +            }
>> +        }
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/*
>>   * Process an invalidation descriptor. Currently, only two types descriptors,
>>   * Interrupt Entry Cache Invalidation Descritor and Invalidation Wait
>>   * Descriptor are handled.
>> @@ -530,7 +574,26 @@ static int process_iqe(struct vvtd *vvtd, uint32_t i)
>>          break;
>>  
>>      case TYPE_INVAL_IEC:
>> -        /* No cache is preserved in vvtd, nothing is needed to be flushed */
>> +        /*
>> +         * If VT-d pi is enabled, pi_update_irte() may be called. It assumes
>> +         * pcidevs_locked().
>> +         */
>> +        pcidevs_lock();
>> +        spin_lock(&vvtd->domain->event_lock);
>> +        /* A global invalidation of the cache is requested */
>> +        if ( !qinval.q.iec_inv_dsc.lo.granu )
>> +            pt_pirq_iterate(vvtd->domain, invalidate_gmsi, (void *)(long)-1);
>> +        else
>> +        {
>> +            uint32_t iidx = qinval.q.iec_inv_dsc.lo.iidx;
>> +            uint32_t nr = 1 << qinval.q.iec_inv_dsc.lo.im;
>> +
>> +            for ( ; nr; nr--, iidx++)
>
>You can initialize nr in the for loop.
>
>> +                pt_pirq_iterate(vvtd->domain, invalidate_gmsi,
>> +                                (void *)(long)iidx);
>> +        }
>> +        spin_unlock(&vvtd->domain->event_lock);
>> +        pcidevs_unlock();
>>          break;
>>  
>>      default:
>> @@ -839,6 +902,8 @@ static int vvtd_read(struct vcpu *v, unsigned long addr,
>>      else
>>          *pval = vvtd_get_reg_quad(vvtd, offset);
>>  
>> +    if ( !atomic_read(&vvtd->inflight_intr) )
>> +        vvtd_process_iq(vvtd);
>>      return X86EMUL_OKAY;
>>  }
>>  
>> @@ -1088,8 +1153,7 @@ static int vvtd_handle_irq_request(const struct domain *d,
>>                          irte.remap.tm);
>>  
>>   out:
>> -    if ( !atomic_dec_and_test(&vvtd->inflight_intr) )
>> -        vvtd_process_iq(vvtd);
>> +    atomic_dec(&vvtd->inflight_intr);
>
>Why is this removed? It was changed like 4 patches before, and
>reverted here.

Here it is removed to avoid a deadlock. In this patch, d->event_lock is
acquired when handling an invalidation request of type TYPE_INVAL_IEC;
holding this lock is required because pt_pirq_iterate() is called. But
in some cases, when we are in vvtd_handle_irq_request(), d->event_lock
is already held. So in the following call trace:
hvm_dirq_assist() -> vmsi_deliver_pirq() -> viommu_handle_irq_request()
-> vvtd_process_iq() -> process_iqe()

d->event_lock would be acquired twice: once in hvm_dirq_assist() and
again in process_iqe(). Moving vvtd_process_iq() out of
viommu_handle_irq_request() avoids this deadlock.

Thanks
Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* Re: [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d
  2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
                   ` (27 preceding siblings ...)
  2017-11-17  6:22 ` [PATCH v4 28/28] tools/libxc: Add viommu operations in libxc Chao Gao
@ 2018-10-04 15:51 ` Jan Beulich
  28 siblings, 0 replies; 83+ messages in thread
From: Jan Beulich @ 2018-10-04 15:51 UTC (permalink / raw)
  To: Chao Gao
  Cc: Tim Deegan, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Roger Pau Monne

>>> On 17.11.17 at 07:22, <chao.gao@intel.com> wrote:
> This patchset is to introduce vIOMMU framework and add virtual VTD's
> interrupt remapping support according "Xen virtual IOMMU high level
> design doc V3"(https://lists.xenproject.org/archives/html/xen-devel/ 
> 2016-11/msg01391.html).
> 
> - vIOMMU framework
> New framework provides viommu_ops and help functions to abstract
> vIOMMU operations(E,G create, destroy, handle irq remapping request
> and so on). Vendors(Intel, ARM, AMD and son) can implement their
> vIOMMU callbacks.
> 
> - Virtual VTD
> We enable irq remapping function and covers both
> MSI and IOAPIC interrupts. Don't support post interrupt mode emulation
> and post interrupt mode enabled on host with virtual VTD. will add
> later.
> 
> In case of conflicts, this series also can be found in my personal github:
> Xen: https://github.com/gc1008/viommu_xen.git vIOMMU4
> Qemu: https://github.com/gc1008/viommu_qemu.git vIOMMU3
> 
> Any comments would be highly appreciated. And below is change history.

So I still had this in my to-be-looked-at folder, but given how old
it is and given how much other dependencies it has as per earlier
discussion (plus quite certainly a fair amount of re-basing), I've
decided to drop this version now, before it reaches its 1 year
submission anniversary.

In the hope for your understanding,
Jan






Thread overview: 83+ messages (download: mbox.gz / follow: Atom feed)
2017-11-17  6:22 [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Chao Gao
2017-11-17  6:22 ` [PATCH v4 01/28] Xen/doc: Add Xen virtual IOMMU doc Chao Gao
2018-02-09 12:54   ` Roger Pau Monné
2018-02-09 15:53     ` Chao Gao
2017-11-17  6:22 ` [PATCH v4 02/28] VIOMMU: Add vIOMMU framework and vIOMMU domctl Chao Gao
2018-02-09 14:33   ` Roger Pau Monné
2018-02-09 16:13     ` Chao Gao
2017-11-17  6:22 ` [PATCH v4 03/28] VIOMMU: Add irq request callback to deal with irq remapping Chao Gao
2018-02-09 15:02   ` Roger Pau Monné
2018-02-09 16:21     ` Chao Gao
2017-11-17  6:22 ` [PATCH v4 04/28] VIOMMU: Add get irq info callback to convert irq remapping request Chao Gao
2018-02-09 15:06   ` Roger Pau Monné
2018-02-09 16:34     ` Chao Gao
2017-11-17  6:22 ` [PATCH v4 05/28] VIOMMU: Introduce callback of checking irq remapping mode Chao Gao
2018-02-09 15:11   ` Roger Pau Monné
2018-02-09 16:47     ` Chao Gao
2018-02-12 10:21       ` Roger Pau Monné
2017-11-17  6:22 ` [PATCH v4 06/28] vtd: clean-up and preparation for vvtd Chao Gao
2018-02-09 15:17   ` Roger Pau Monné
2018-02-09 16:51     ` Chao Gao
2017-11-17  6:22 ` [PATCH v4 07/28] x86/hvm: Introduce a emulated VTD for HVM Chao Gao
2018-02-09 16:27   ` Roger Pau Monné
2018-02-09 17:12     ` Chao Gao
2018-02-12 10:35       ` Roger Pau Monné
2017-11-17  6:22 ` [PATCH v4 08/28] x86/vvtd: Add MMIO handler for VVTD Chao Gao
2018-02-09 16:39   ` Roger Pau Monné
2018-02-09 17:21     ` Chao Gao
2018-02-09 17:51       ` Roger Pau Monné
2018-02-22  6:20         ` Chao Gao
2018-02-23 17:07           ` Roger Pau Monné
2018-02-23 17:37             ` Wei Liu
2017-11-17  6:22 ` [PATCH v4 09/28] x86/vvtd: Set Interrupt Remapping Table Pointer through GCMD Chao Gao
2018-02-09 16:59   ` Roger Pau Monné
2018-02-11  4:34     ` Chao Gao
2018-02-11  5:09       ` Chao Gao
2018-02-12 11:25       ` Roger Pau Monné
2017-11-17  6:22 ` [PATCH v4 10/28] x86/vvtd: Enable Interrupt Remapping " Chao Gao
2018-02-09 17:15   ` Roger Pau Monné
2018-02-11  5:05     ` Chao Gao
2018-02-12 11:30       ` Roger Pau Monné
2018-02-22  6:25         ` Chao Gao
2017-11-17  6:22 ` [PATCH v4 11/28] x86/vvtd: Process interrupt remapping request Chao Gao
2018-02-09 17:44   ` Roger Pau Monné
2018-02-11  5:31     ` Chao Gao
2018-02-23 17:04       ` Roger Pau Monné
2017-11-17  6:22 ` [PATCH v4 12/28] x86/vvtd: decode interrupt attribute from IRTE Chao Gao
2018-02-12 11:55   ` Roger Pau Monné
2018-02-22  6:33     ` Chao Gao
2017-11-17  6:22 ` [PATCH v4 13/28] x86/vvtd: add a helper function to decide the interrupt format Chao Gao
2018-02-12 12:14   ` Roger Pau Monné
2017-11-17  6:22 ` [PATCH v4 14/28] x86/vvtd: Handle interrupt translation faults Chao Gao
2018-02-12 12:55   ` Roger Pau Monné
2018-02-22  8:23     ` Chao Gao
2017-11-17  6:22 ` [PATCH v4 15/28] x86/vvtd: Enable Queued Invalidation through GCMD Chao Gao
2018-02-12 14:04   ` Roger Pau Monné
2018-02-22 10:33     ` Chao Gao
2017-11-17  6:22 ` [PATCH v4 16/28] x86/vvtd: Add queued invalidation (QI) support Chao Gao
2018-02-12 14:36   ` Roger Pau Monné
2018-02-23  4:38     ` Chao Gao
2017-11-17  6:22 ` [PATCH v4 17/28] x86/vvtd: save and restore emulated VT-d Chao Gao
2018-02-12 14:49   ` Roger Pau Monné
2018-02-23  5:22     ` Chao Gao
2018-02-23 17:19       ` Roger Pau Monné
2017-11-17  6:22 ` [PATCH v4 18/28] x86/vioapic: Hook interrupt delivery of vIOAPIC Chao Gao
2018-02-12 14:54   ` Roger Pau Monné
2018-02-24  1:51     ` Chao Gao
2018-02-24  3:17       ` Tian, Kevin
2017-11-17  6:22 ` [PATCH v4 19/28] x86/vioapic: extend vioapic_get_vector() to support remapping format RTE Chao Gao
2018-02-12 15:01   ` Roger Pau Monné
2017-11-17  6:22 ` [PATCH v4 20/28] xen/pt: when binding guest msi, accept the whole msi message Chao Gao
2018-02-12 15:16   ` Roger Pau Monné
2018-02-24  2:20     ` Chao Gao
2017-11-17  6:22 ` [PATCH v4 21/28] vvtd: update hvm_gmsi_info when binding guest msi with pirq or Chao Gao
2018-02-12 15:38   ` Roger Pau Monné
2018-02-24  5:05     ` Chao Gao
2017-11-17  6:22 ` [PATCH v4 22/28] x86/vmsi: Hook delivering remapping format msi to guest and handling eoi Chao Gao
2017-11-17  6:22 ` [PATCH v4 23/28] tools/libacpi: Add DMA remapping reporting (DMAR) ACPI table structures Chao Gao
2017-11-17  6:22 ` [PATCH v4 24/28] tools/libacpi: Add new fields in acpi_config for DMAR table Chao Gao
2017-11-17  6:22 ` [PATCH v4 25/28] tools/libxl: Add an user configurable parameter to control vIOMMU attributes Chao Gao
2017-11-17  6:22 ` [PATCH v4 26/28] tools/libxl: build DMAR table for a guest with one virtual VTD Chao Gao
2017-11-17  6:22 ` [PATCH v4 27/28] tools/libxl: create vIOMMU during domain construction Chao Gao
2017-11-17  6:22 ` [PATCH v4 28/28] tools/libxc: Add viommu operations in libxc Chao Gao
2018-10-04 15:51 ` [PATCH v4 00/28] add vIOMMU support with irq remapping function of virtual VT-d Jan Beulich
