qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM
@ 2019-10-24 12:34 Liu Yi L
  2019-10-24 12:34 ` [RFC v2 01/22] update-linux-headers: Import iommu.h Liu Yi L
                   ` (25 more replies)
  0 siblings, 26 replies; 79+ messages in thread
From: Liu Yi L @ 2019-10-24 12:34 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: tianyu.lan, kevin.tian, yi.l.liu, kvm, jun.j.tian, eric.auger,
	yi.y.sun, jacob.jun.pan, david

Shared virtual address (SVA), a.k.a, Shared virtual memory (SVM) on Intel
platforms allow address space sharing between device DMA and applications.
SVA can reduce programming complexity and enhance security.
This series is intended to expose SVA capability to VMs. i.e. shared guest
application address space with passthru devices. The whole SVA virtualization
requires QEMU/VFIO/IOMMU changes. This series includes the QEMU changes, for
VFIO and IOMMU changes, they are in separate series (listed in the "Related
series").

The high-level architecture for SVA virtualization is as below:

    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest process CR3, FL only|
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush -
    '-------------'                       |
    |             |                       V
    |             |                CR3 in GPA
    '-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
    .-------------.  .----------------------.
    |   pIOMMU    |  | Bind FL for GVA-GPA  |
    |             |  '----------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.------------------------------.
    |             |   |SL for GPA-HPA, default domain|
    |             |   '------------------------------'
    '-------------'
Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables

The complete vSVA upstream patches are divided into three phases:
    1. Common APIs and PCI device direct assignment
    2. Page Request Services (PRS) support
    3. Mediated device assignment

This RFC patchset is aiming for the phase 1. Works together with the VT-d
driver[1] changes and VFIO changes[2].

Related series:
[1] [PATCH v6 00/10] Nested Shared Virtual Address (SVA) VT-d support:
https://lkml.org/lkml/2019/10/22/953
<This series is based on this kernel series from Jacob Pan>

[2] [RFC v2 0/3] vfio: support Shared Virtual Addressing from Yi Liu

There are roughly four parts:
 1. Introduce IOMMUContext as abstract layer between vIOMMU emulator and
    VFIO to avoid direct calling between the two
 2. Passdown PASID allocation and free to host
 3. Passdown guest PASID binding to host
 4. Passdown guest IOMMU cache invalidation to host

The full set can be found in below link:
https://github.com/luxis1999/qemu.git: sva_vtd_v6_qemu_rfc_v2

Changelog:
	- RFC v1 -> v2:
	  Introduce IOMMUContext to abstract the connection between VFIO
	  and vIOMMU emulator, which is a replacement of the PCIPASIDOps
	  in RFC v1. Modify x-scalable-mode to be string option instead of
	  adding a new option as RFC v1 did. Refined the pasid cache management
	  and addressed the TODOs mentioned in RFC v1. 
	  RFC v1: https://patchwork.kernel.org/cover/11033657/

Eric Auger (1):
  update-linux-headers: Import iommu.h

Liu Yi L (20):
  header update VFIO/IOMMU vSVA APIs against 5.4.0-rc3+
  intel_iommu: modify x-scalable-mode to be string option
  vfio/common: add iommu_ctx_notifier in container
  hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps
  hw/pci: introduce pci_device_iommu_context()
  intel_iommu: provide get_iommu_context() callback
  vfio/pci: add iommu_context notifier for pasid alloc/free
  intel_iommu: add virtual command capability support
  intel_iommu: process pasid cache invalidation
  intel_iommu: add present bit check for pasid table entries
  intel_iommu: add PASID cache management infrastructure
  vfio/pci: add iommu_context notifier for pasid bind/unbind
  intel_iommu: bind/unbind guest page table to host
  intel_iommu: replay guest pasid bindings to host
  intel_iommu: replay pasid binds after context cache invalidation
  intel_iommu: do not passdown pasid bind for PASID #0
  vfio/pci: add iommu_context notifier for PASID-based iotlb flush
  intel_iommu: process PASID-based iotlb invalidation
  intel_iommu: propagate PASID-based iotlb invalidation to host
  intel_iommu: process PASID-based Device-TLB invalidation

Peter Xu (1):
  hw/iommu: introduce IOMMUContext

 hw/Makefile.objs                |    1 +
 hw/alpha/typhoon.c              |    6 +-
 hw/arm/smmu-common.c            |    6 +-
 hw/hppa/dino.c                  |    6 +-
 hw/i386/amd_iommu.c             |    6 +-
 hw/i386/intel_iommu.c           | 1249 +++++++++++++++++++++++++++++++++++++--
 hw/i386/intel_iommu_internal.h  |  109 ++++
 hw/i386/trace-events            |    6 +
 hw/iommu/Makefile.objs          |    1 +
 hw/iommu/iommu.c                |   66 +++
 hw/pci-host/designware.c        |    6 +-
 hw/pci-host/ppce500.c           |    6 +-
 hw/pci-host/prep.c              |    6 +-
 hw/pci-host/sabre.c             |    6 +-
 hw/pci/pci.c                    |   27 +-
 hw/ppc/ppc440_pcix.c            |    6 +-
 hw/ppc/spapr_pci.c              |    6 +-
 hw/s390x/s390-pci-bus.c         |    8 +-
 hw/vfio/common.c                |   10 +
 hw/vfio/pci.c                   |  149 +++++
 include/hw/i386/intel_iommu.h   |   58 +-
 include/hw/iommu/iommu.h        |  113 ++++
 include/hw/pci/pci.h            |   13 +-
 include/hw/pci/pci_bus.h        |    2 +-
 include/hw/vfio/vfio-common.h   |    9 +
 linux-headers/linux/iommu.h     |  324 ++++++++++
 linux-headers/linux/vfio.h      |   83 +++
 scripts/update-linux-headers.sh |    2 +-
 28 files changed, 2232 insertions(+), 58 deletions(-)
 create mode 100644 hw/iommu/Makefile.objs
 create mode 100644 hw/iommu/iommu.c
 create mode 100644 include/hw/iommu/iommu.h
 create mode 100644 linux-headers/linux/iommu.h

-- 
2.7.4



^ permalink raw reply	[flat|nested] 79+ messages in thread

* [RFC v2 01/22] update-linux-headers: Import iommu.h
  2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
@ 2019-10-24 12:34 ` Liu Yi L
  2019-10-24 12:34 ` [RFC v2 02/22] header update VFIO/IOMMU vSVA APIs against 5.4.0-rc3+ Liu Yi L
                   ` (24 subsequent siblings)
  25 siblings, 0 replies; 79+ messages in thread
From: Liu Yi L @ 2019-10-24 12:34 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: tianyu.lan, kevin.tian, yi.l.liu, Yi Sun, kvm, jun.j.tian,
	eric.auger, yi.y.sun, jacob.jun.pan, david

From: Eric Auger <eric.auger@redhat.com>

Update the script to import the new iommu.h uapi header.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 scripts/update-linux-headers.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index f76d773..dfdfdfd 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -141,7 +141,7 @@ done
 
 rm -rf "$output/linux-headers/linux"
 mkdir -p "$output/linux-headers/linux"
-for header in kvm.h vfio.h vfio_ccw.h vhost.h \
+for header in kvm.h vfio.h vfio_ccw.h vhost.h iommu.h \
               psci.h psp-sev.h userfaultfd.h mman.h; do
     cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
 done
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC v2 02/22] header update VFIO/IOMMU vSVA APIs against 5.4.0-rc3+
  2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
  2019-10-24 12:34 ` [RFC v2 01/22] update-linux-headers: Import iommu.h Liu Yi L
@ 2019-10-24 12:34 ` Liu Yi L
  2019-10-24 12:34 ` [RFC v2 03/22] intel_iommu: modify x-scalable-mode to be string option Liu Yi L
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 79+ messages in thread
From: Liu Yi L @ 2019-10-24 12:34 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: tianyu.lan, kevin.tian, yi.l.liu, Yi Sun, kvm, jun.j.tian,
	eric.auger, yi.y.sun, jacob.jun.pan, david

The kernel iommu.h header file includes the extensions for vSVA support.
e.g. guest PASID bind, iommu fault report related user structures etc.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 linux-headers/linux/iommu.h | 324 ++++++++++++++++++++++++++++++++++++++++++++
 linux-headers/linux/vfio.h  |  83 ++++++++++++
 2 files changed, 407 insertions(+)
 create mode 100644 linux-headers/linux/iommu.h

diff --git a/linux-headers/linux/iommu.h b/linux-headers/linux/iommu.h
new file mode 100644
index 0000000..872f786
--- /dev/null
+++ b/linux-headers/linux/iommu.h
@@ -0,0 +1,324 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * IOMMU user API definitions
+ */
+
+#ifndef _IOMMU_H
+#define _IOMMU_H
+
+#include <linux/types.h>
+
+#define IOMMU_FAULT_PERM_READ	(1 << 0) /* read */
+#define IOMMU_FAULT_PERM_WRITE	(1 << 1) /* write */
+#define IOMMU_FAULT_PERM_EXEC	(1 << 2) /* exec */
+#define IOMMU_FAULT_PERM_PRIV	(1 << 3) /* privileged */
+
+/* Generic fault types, can be expanded IRQ remapping fault */
+enum iommu_fault_type {
+	IOMMU_FAULT_DMA_UNRECOV = 1,	/* unrecoverable fault */
+	IOMMU_FAULT_PAGE_REQ,		/* page request fault */
+};
+
+enum iommu_fault_reason {
+	IOMMU_FAULT_REASON_UNKNOWN = 0,
+
+	/* Could not access the PASID table (fetch caused external abort) */
+	IOMMU_FAULT_REASON_PASID_FETCH,
+
+	/* PASID entry is invalid or has configuration errors */
+	IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
+
+	/*
+	 * PASID is out of range (e.g. exceeds the maximum PASID
+	 * supported by the IOMMU) or disabled.
+	 */
+	IOMMU_FAULT_REASON_PASID_INVALID,
+
+	/*
+	 * An external abort occurred fetching (or updating) a translation
+	 * table descriptor
+	 */
+	IOMMU_FAULT_REASON_WALK_EABT,
+
+	/*
+	 * Could not access the page table entry (Bad address),
+	 * actual translation fault
+	 */
+	IOMMU_FAULT_REASON_PTE_FETCH,
+
+	/* Protection flag check failed */
+	IOMMU_FAULT_REASON_PERMISSION,
+
+	/* access flag check failed */
+	IOMMU_FAULT_REASON_ACCESS,
+
+	/* Output address of a translation stage caused Address Size fault */
+	IOMMU_FAULT_REASON_OOR_ADDRESS,
+};
+
+/**
+ * struct iommu_fault_unrecoverable - Unrecoverable fault data
+ * @reason: reason of the fault, from &enum iommu_fault_reason
+ * @flags: parameters of this fault (IOMMU_FAULT_UNRECOV_* values)
+ * @pasid: Process Address Space ID
+ * @perm: requested permission access using by the incoming transaction
+ *        (IOMMU_FAULT_PERM_* values)
+ * @addr: offending page address
+ * @fetch_addr: address that caused a fetch abort, if any
+ */
+struct iommu_fault_unrecoverable {
+	__u32	reason;
+#define IOMMU_FAULT_UNRECOV_PASID_VALID		(1 << 0)
+#define IOMMU_FAULT_UNRECOV_ADDR_VALID		(1 << 1)
+#define IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID	(1 << 2)
+	__u32	flags;
+	__u32	pasid;
+	__u32	perm;
+	__u64	addr;
+	__u64	fetch_addr;
+};
+
+/**
+ * struct iommu_fault_page_request - Page Request data
+ * @flags: encodes whether the corresponding fields are valid and whether this
+ *         is the last page in group (IOMMU_FAULT_PAGE_REQUEST_* values)
+ * @pasid: Process Address Space ID
+ * @grpid: Page Request Group Index
+ * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
+ * @addr: page address
+ * @private_data: device-specific private information
+ */
+struct iommu_fault_page_request {
+#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID	(1 << 0)
+#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE	(1 << 1)
+#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA	(1 << 2)
+	__u32	flags;
+	__u32	pasid;
+	__u32	grpid;
+	__u32	perm;
+	__u64	addr;
+	__u64	private_data[2];
+};
+
+/**
+ * struct iommu_fault - Generic fault data
+ * @type: fault type from &enum iommu_fault_type
+ * @padding: reserved for future use (should be zero)
+ * @event: fault event, when @type is %IOMMU_FAULT_DMA_UNRECOV
+ * @prm: Page Request message, when @type is %IOMMU_FAULT_PAGE_REQ
+ * @padding2: sets the fault size to allow for future extensions
+ */
+struct iommu_fault {
+	__u32	type;
+	__u32	padding;
+	union {
+		struct iommu_fault_unrecoverable event;
+		struct iommu_fault_page_request prm;
+		__u8 padding2[56];
+	};
+};
+
+/**
+ * enum iommu_page_response_code - Return status of fault handlers
+ * @IOMMU_PAGE_RESP_SUCCESS: Fault has been handled and the page tables
+ *	populated, retry the access. This is "Success" in PCI PRI.
+ * @IOMMU_PAGE_RESP_FAILURE: General error. Drop all subsequent faults from
+ *	this device if possible. This is "Response Failure" in PCI PRI.
+ * @IOMMU_PAGE_RESP_INVALID: Could not handle this fault, don't retry the
+ *	access. This is "Invalid Request" in PCI PRI.
+ */
+enum iommu_page_response_code {
+	IOMMU_PAGE_RESP_SUCCESS = 0,
+	IOMMU_PAGE_RESP_INVALID,
+	IOMMU_PAGE_RESP_FAILURE,
+};
+
+/**
+ * struct iommu_page_response - Generic page response information
+ * @version: API version of this structure
+ * @flags: encodes whether the corresponding fields are valid
+ *         (IOMMU_FAULT_PAGE_RESPONSE_* values)
+ * @pasid: Process Address Space ID
+ * @grpid: Page Request Group Index
+ * @code: response code from &enum iommu_page_response_code
+ */
+struct iommu_page_response {
+#define IOMMU_PAGE_RESP_VERSION_1	1
+	__u32	version;
+#define IOMMU_PAGE_RESP_PASID_VALID	(1 << 0)
+	__u32	flags;
+	__u32	pasid;
+	__u32	grpid;
+	__u32	code;
+};
+
+/* defines the granularity of the invalidation */
+enum iommu_inv_granularity {
+	IOMMU_INV_GRANU_DOMAIN,	/* domain-selective invalidation */
+	IOMMU_INV_GRANU_PASID,	/* PASID-selective invalidation */
+	IOMMU_INV_GRANU_ADDR,	/* page-selective invalidation */
+	IOMMU_INV_GRANU_NR,	/* number of invalidation granularities */
+};
+
+/**
+ * struct iommu_inv_addr_info - Address Selective Invalidation Structure
+ *
+ * @flags: indicates the granularity of the address-selective invalidation
+ * - If the PASID bit is set, the @pasid field is populated and the invalidation
+ *   relates to cache entries tagged with this PASID and matching the address
+ *   range.
+ * - If ARCHID bit is set, @archid is populated and the invalidation relates
+ *   to cache entries tagged with this architecture specific ID and matching
+ *   the address range.
+ * - Both PASID and ARCHID can be set as they may tag different caches.
+ * - If neither PASID or ARCHID is set, global addr invalidation applies.
+ * - The LEAF flag indicates whether only the leaf PTE caching needs to be
+ *   invalidated and other paging structure caches can be preserved.
+ * @pasid: process address space ID
+ * @archid: architecture-specific ID
+ * @addr: first stage/level input address
+ * @granule_size: page/block size of the mapping in bytes
+ * @nb_granules: number of contiguous granules to be invalidated
+ */
+struct iommu_inv_addr_info {
+#define IOMMU_INV_ADDR_FLAGS_PASID	(1 << 0)
+#define IOMMU_INV_ADDR_FLAGS_ARCHID	(1 << 1)
+#define IOMMU_INV_ADDR_FLAGS_LEAF	(1 << 2)
+	__u32	flags;
+	__u32	archid;
+	__u64	pasid;
+	__u64	addr;
+	__u64	granule_size;
+	__u64	nb_granules;
+};
+
+/**
+ * struct iommu_inv_pasid_info - PASID Selective Invalidation Structure
+ *
+ * @flags: indicates the granularity of the PASID-selective invalidation
+ * - If the PASID bit is set, the @pasid field is populated and the invalidation
+ *   relates to cache entries tagged with this PASID and matching the address
+ *   range.
+ * - If the ARCHID bit is set, the @archid is populated and the invalidation
+ *   relates to cache entries tagged with this architecture specific ID and
+ *   matching the address range.
+ * - Both PASID and ARCHID can be set as they may tag different caches.
+ * - At least one of PASID or ARCHID must be set.
+ * @pasid: process address space ID
+ * @archid: architecture-specific ID
+ */
+struct iommu_inv_pasid_info {
+#define IOMMU_INV_PASID_FLAGS_PASID	(1 << 0)
+#define IOMMU_INV_PASID_FLAGS_ARCHID	(1 << 1)
+	__u32	flags;
+	__u32	archid;
+	__u64	pasid;
+};
+
+/**
+ * struct iommu_cache_invalidate_info - First level/stage invalidation
+ *     information
+ * @version: API version of this structure
+ * @cache: bitfield that allows to select which caches to invalidate
+ * @granularity: defines the lowest granularity used for the invalidation:
+ *     domain > PASID > addr
+ * @padding: reserved for future use (should be zero)
+ * @pasid_info: invalidation data when @granularity is %IOMMU_INV_GRANU_PASID
+ * @addr_info: invalidation data when @granularity is %IOMMU_INV_GRANU_ADDR
+ *
+ * Not all the combinations of cache/granularity are valid:
+ *
+ * +--------------+---------------+---------------+---------------+
+ * | type /       |   DEV_IOTLB   |     IOTLB     |      PASID    |
+ * | granularity  |               |               |      cache    |
+ * +==============+===============+===============+===============+
+ * | DOMAIN       |       N/A     |       Y       |       Y       |
+ * +--------------+---------------+---------------+---------------+
+ * | PASID        |       Y       |       Y       |       Y       |
+ * +--------------+---------------+---------------+---------------+
+ * | ADDR         |       Y       |       Y       |       N/A     |
+ * +--------------+---------------+---------------+---------------+
+ *
+ * Invalidations by %IOMMU_INV_GRANU_DOMAIN don't take any argument other than
+ * @version and @cache.
+ *
+ * If multiple cache types are invalidated simultaneously, they all
+ * must support the used granularity.
+ */
+struct iommu_cache_invalidate_info {
+#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
+	__u32	version;
+/* IOMMU paging structure cache */
+#define IOMMU_CACHE_INV_TYPE_IOTLB	(1 << 0) /* IOMMU IOTLB */
+#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB	(1 << 1) /* Device IOTLB */
+#define IOMMU_CACHE_INV_TYPE_PASID	(1 << 2) /* PASID cache */
+#define IOMMU_CACHE_INV_TYPE_NR		(3)
+	__u8	cache;
+	__u8	granularity;
+	__u8	padding[2];
+	union {
+		struct iommu_inv_pasid_info pasid_info;
+		struct iommu_inv_addr_info addr_info;
+	};
+};
+
+/**
+ * struct iommu_gpasid_bind_data_vtd - Intel VT-d specific data on device and guest
+ * SVA binding.
+ *
+ * @flags:	VT-d PASID table entry attributes
+ * @pat:	Page attribute table data to compute effective memory type
+ * @emt:	Extended memory type
+ *
+ * Only guest vIOMMU selectable and effective options are passed down to
+ * the host IOMMU.
+ */
+struct iommu_gpasid_bind_data_vtd {
+#define IOMMU_SVA_VTD_GPASID_SRE	(1 << 0) /* supervisor request */
+#define IOMMU_SVA_VTD_GPASID_EAFE	(1 << 1) /* extended access enable */
+#define IOMMU_SVA_VTD_GPASID_PCD	(1 << 2) /* page-level cache disable */
+#define IOMMU_SVA_VTD_GPASID_PWT	(1 << 3) /* page-level write through */
+#define IOMMU_SVA_VTD_GPASID_EMTE	(1 << 4) /* extended mem type enable */
+#define IOMMU_SVA_VTD_GPASID_CD		(1 << 5) /* PASID-level cache disable */
+	__u64 flags;
+	__u32 pat;
+	__u32 emt;
+};
+
+/**
+ * struct iommu_gpasid_bind_data - Information about device and guest PASID binding
+ * @version:	Version of this data structure
+ * @format:	PASID table entry format
+ * @flags:	Additional information on guest bind request
+ * @gpgd:	Guest page directory base of the guest mm to bind
+ * @hpasid:	Process address space ID used for the guest mm in host IOMMU
+ * @gpasid:	Process address space ID used for the guest mm in guest IOMMU
+ * @addr_width:	Guest virtual address width
+ * @padding:	Reserved for future use (should be zero)
+ * @vtd:	Intel VT-d specific data
+ *
+ * Guest to host PASID mapping can be an identity or non-identity, where guest
+ * has its own PASID space. For non-identify mapping, guest to host PASID lookup
+ * is needed when VM programs guest PASID into an assigned device. VMM may
+ * trap such PASID programming then request host IOMMU driver to convert guest
+ * PASID to host PASID based on this bind data.
+ */
+struct iommu_gpasid_bind_data {
+#define IOMMU_GPASID_BIND_VERSION_1	1
+	__u32 version;
+#define IOMMU_PASID_FORMAT_INTEL_VTD	1
+	__u32 format;
+#define IOMMU_SVA_GPASID_VAL	(1 << 0) /* guest PASID valid */
+	__u64 flags;
+	__u64 gpgd;
+	__u64 hpasid;
+	__u64 gpasid;
+	__u32 addr_width;
+	__u8  padding[12];
+	/* Vendor specific data */
+	union {
+		struct iommu_gpasid_bind_data_vtd vtd;
+	};
+};
+
+#endif /* _IOMMU_H */
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index fb10370..e5173b6 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -14,6 +14,7 @@
 
 #include <linux/types.h>
 #include <linux/ioctl.h>
+#include <linux/iommu.h>
 
 #define VFIO_API_VERSION	0
 
@@ -794,6 +795,88 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/**
+ * VFIO_IOMMU_CACHE_INVALIDATE - _IOWR(VFIO_TYPE, VFIO_BASE + 24,
+ *			struct vfio_iommu_type1_cache_invalidate)
+ *
+ * Propagate guest IOMMU cache invalidation to the host.
+ */
+struct vfio_iommu_type1_cache_invalidate {
+	__u32   argsz;
+	__u32   flags;
+	struct iommu_cache_invalidate_info info;
+};
+#define VFIO_IOMMU_CACHE_INVALIDATE      _IO(VFIO_TYPE, VFIO_BASE + 24)
+
+/*
+ * @flag=VFIO_IOMMU_PASID_ALLOC, refer to the @min_pasid and @max_pasid fields
+ * @flag=VFIO_IOMMU_PASID_FREE, refer to @pasid field
+ */
+struct vfio_iommu_type1_pasid_request {
+	__u32	argsz;
+#define VFIO_IOMMU_PASID_ALLOC	(1 << 0)
+#define VFIO_IOMMU_PASID_FREE	(1 << 1)
+	__u32	flag;
+	union {
+		struct {
+			int min_pasid;
+			int max_pasid;
+		};
+		int pasid;
+	};
+};
+
+/**
+ * VFIO_IOMMU_PASID_REQUEST - _IOWR(VFIO_TYPE, VFIO_BASE + 27,
+ *				struct vfio_iommu_type1_pasid_request)
+ *
+ */
+#define VFIO_IOMMU_PASID_REQUEST	_IO(VFIO_TYPE, VFIO_BASE + 27)
+
+enum vfio_iommu_bind_type {
+	VFIO_IOMMU_BIND_PROCESS,
+	VFIO_IOMMU_BIND_GUEST_PASID,
+};
+
+/*
+ * Supported types:
+ *	- VFIO_IOMMU_BIND_GUEST_PASID: bind guest pasid, which invoked
+ *			by guest, it takes iommu_gpasid_bind_data in data.
+ */
+struct vfio_iommu_type1_bind {
+	__u32				argsz;
+	enum vfio_iommu_bind_type	bind_type;
+	__u8				data[];
+};
+
+/*
+ * VFIO_IOMMU_BIND - _IOWR(VFIO_TYPE, VFIO_BASE + 28, struct vfio_iommu_bind)
+ *
+ * Manage address spaces of devices in this container. Initially a TYPE1
+ * container can only have one address space, managed with
+ * VFIO_IOMMU_MAP/UNMAP_DMA.
+ *
+ * An IOMMU of type VFIO_TYPE1_NESTING_IOMMU can be managed by both MAP/UNMAP
+ * and BIND ioctls at the same time. MAP/UNMAP acts on the stage-2 (host) page
+ * tables, and BIND manages the stage-1 (guest) page tables. Other types of
+ * IOMMU may allow MAP/UNMAP and BIND to coexist, where MAP/UNMAP controls
+ * non-PASID traffic and BIND controls PASID traffic. But this depends on the
+ * underlying IOMMU architecture and isn't guaranteed.
+ *
+ * Availability of this feature depends on the device, its bus, the underlying
+ * IOMMU and the CPU architecture.
+ *
+ * returns: 0 on success, -errno on failure.
+ */
+#define VFIO_IOMMU_BIND		_IO(VFIO_TYPE, VFIO_BASE + 28)
+
+/*
+ * VFIO_IOMMU_UNBIND - _IOWR(VFIO_TYPE, VFIO_BASE + 29, struct vfio_iommu_bind)
+ *
+ * Undo what was done by the corresponding VFIO_IOMMU_BIND ioctl.
+ */
+#define VFIO_IOMMU_UNBIND	_IO(VFIO_TYPE, VFIO_BASE + 29)
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC v2 03/22] intel_iommu: modify x-scalable-mode to be string option
  2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
  2019-10-24 12:34 ` [RFC v2 01/22] update-linux-headers: Import iommu.h Liu Yi L
  2019-10-24 12:34 ` [RFC v2 02/22] header update VFIO/IOMMU vSVA APIs against 5.4.0-rc3+ Liu Yi L
@ 2019-10-24 12:34 ` Liu Yi L
  2019-11-01 14:57   ` Peter Xu
  2019-10-24 12:34 ` [RFC v2 04/22] hw/iommu: introduce IOMMUContext Liu Yi L
                   ` (22 subsequent siblings)
  25 siblings, 1 reply; 79+ messages in thread
From: Liu Yi L @ 2019-10-24 12:34 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: tianyu.lan, kevin.tian, yi.l.liu, Yi Sun, kvm, jun.j.tian,
	eric.auger, yi.y.sun, jacob.jun.pan, david

Intel VT-d 3.0 introduces scalable mode, and it has a bunch of capabilities
related to scalable mode translation, thus there are multiple combinations.
While this vIOMMU implementation wants simplify it for user by providing
typical combinations. User could config it by "x-scalable-mode" option. The
usage is as below:

"-device intel-iommu,x-scalable-mode=["legacy"|"modern"]"

 - "legacy": gives support for SL page table
 - "modern": gives support for FL page table, pasid, virtual command
 -  if not configured, means no scalable mode support, if not proper
    configured, will throw error

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
---
 hw/i386/intel_iommu.c          | 15 +++++++++++++--
 hw/i386/intel_iommu_internal.h |  3 +++
 include/hw/i386/intel_iommu.h  |  2 +-
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 771bed2..4a1a07a 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3019,7 +3019,7 @@ static Property vtd_properties[] = {
     DEFINE_PROP_UINT8("aw-bits", IntelIOMMUState, aw_bits,
                       VTD_HOST_ADDRESS_WIDTH),
     DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
-    DEFINE_PROP_BOOL("x-scalable-mode", IntelIOMMUState, scalable_mode, FALSE),
+    DEFINE_PROP_STRING("x-scalable-mode", IntelIOMMUState, scalable_mode),
     DEFINE_PROP_BOOL("dma-drain", IntelIOMMUState, dma_drain, true),
     DEFINE_PROP_END_OF_LIST(),
 };
@@ -3581,7 +3581,12 @@ static void vtd_init(IntelIOMMUState *s)
 
     /* TODO: read cap/ecap from host to decide which cap to be exposed. */
     if (s->scalable_mode) {
-        s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
+        if (!strcmp(s->scalable_mode, "legacy")) {
+            s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
+        } else if (!strcmp(s->scalable_mode, "modern")) {
+            s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_PASID
+                       | VTD_ECAP_FLTS | VTD_ECAP_PSS;
+        }
     }
 
     vtd_reset_caches(s);
@@ -3700,6 +3705,12 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
         return false;
     }
 
+    if (s->scalable_mode &&
+        (strcmp(s->scalable_mode, "modern") &&
+         strcmp(s->scalable_mode, "legacy"))) {
+            error_setg(errp, "Invalid x-scalable-mode config");
+    }
+
     return true;
 }
 
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index c1235a7..be7b30a 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -190,8 +190,11 @@
 #define VTD_ECAP_PT                 (1ULL << 6)
 #define VTD_ECAP_MHMV               (15ULL << 20)
 #define VTD_ECAP_SRS                (1ULL << 31)
+#define VTD_ECAP_PSS                (19ULL << 35)
+#define VTD_ECAP_PASID              (1ULL << 40)
 #define VTD_ECAP_SMTS               (1ULL << 43)
 #define VTD_ECAP_SLTS               (1ULL << 46)
+#define VTD_ECAP_FLTS               (1ULL << 47)
 
 /* CAP_REG */
 /* (offset >> 4) << 24 */
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 66b931e..6062588 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -231,7 +231,7 @@ struct IntelIOMMUState {
     uint32_t version;
 
     bool caching_mode;              /* RO - is cap CM enabled? */
-    bool scalable_mode;             /* RO - is Scalable Mode supported? */
+    char *scalable_mode;            /* RO - Scalable Mode model */
 
     dma_addr_t root;                /* Current root table pointer */
     bool root_scalable;             /* Type of root table (scalable or not) */
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC v2 04/22] hw/iommu: introduce IOMMUContext
  2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (2 preceding siblings ...)
  2019-10-24 12:34 ` [RFC v2 03/22] intel_iommu: modify x-scalable-mode to be string option Liu Yi L
@ 2019-10-24 12:34 ` Liu Yi L
  2019-10-27 17:39   ` David Gibson
  2019-10-24 12:34 ` [RFC v2 05/22] vfio/common: add iommu_ctx_notifier in container Liu Yi L
                   ` (21 subsequent siblings)
  25 siblings, 1 reply; 79+ messages in thread
From: Liu Yi L @ 2019-10-24 12:34 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: tianyu.lan, kevin.tian, yi.l.liu, Yi Sun, kvm, jun.j.tian,
	eric.auger, yi.y.sun, jacob.jun.pan, david

From: Peter Xu <peterx@redhat.com>

This patch adds IOMMUContext as an abstract layer of IOMMU related
operations. The current usage of this abstract layer is setup dual-
stage IOMMU translation (vSVA) for vIOMMU.

To setup dual-stage IOMMU translation, vIOMMU needs to propagate
guest changes to host via passthru channels (e.g. VFIO). To have
a better abstraction, it is better to avoid direct calling between
vIOMMU and VFIO. So we have this new structure to act as abstract
layer between VFIO and vIOMMU. So far, it is proposed to provide a
notifier mechanism, which registered by VFIO and fired by vIOMMU.

For more background, may refer to the discussion below:

https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg05022.html

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/Makefile.objs         |  1 +
 hw/iommu/Makefile.objs   |  1 +
 hw/iommu/iommu.c         | 66 ++++++++++++++++++++++++++++++++++++++++
 include/hw/iommu/iommu.h | 79 ++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 147 insertions(+)
 create mode 100644 hw/iommu/Makefile.objs
 create mode 100644 hw/iommu/iommu.c
 create mode 100644 include/hw/iommu/iommu.h

diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index ece6cc3..ac19f9c 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -39,6 +39,7 @@ devices-dirs-y += xen/
 devices-dirs-$(CONFIG_MEM_DEVICE) += mem/
 devices-dirs-y += semihosting/
 devices-dirs-y += smbios/
+devices-dirs-y += iommu/
 endif
 
 common-obj-y += $(devices-dirs-y)
diff --git a/hw/iommu/Makefile.objs b/hw/iommu/Makefile.objs
new file mode 100644
index 0000000..0484b79
--- /dev/null
+++ b/hw/iommu/Makefile.objs
@@ -0,0 +1 @@
+obj-y += iommu.o
diff --git a/hw/iommu/iommu.c b/hw/iommu/iommu.c
new file mode 100644
index 0000000..2391b0d
--- /dev/null
+++ b/hw/iommu/iommu.c
@@ -0,0 +1,66 @@
+/*
+ * QEMU abstract of IOMMU context
+ *
+ * Copyright (C) 2019 Red Hat Inc.
+ *
+ * Authors: Peter Xu <peterx@redhat.com>,
+ *          Liu Yi L <yi.l.liu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/iommu/iommu.h"
+
+void iommu_ctx_notifier_register(IOMMUContext *iommu_ctx,
+                                 IOMMUCTXNotifier *n,
+                                 IOMMUCTXNotifyFn fn,
+                                 IOMMUCTXEvent event)
+{
+    n->event = event;
+    n->iommu_ctx_event_notify = fn;
+    QLIST_INSERT_HEAD(&iommu_ctx->iommu_ctx_notifiers, n, node);
+    return;
+}
+
+void iommu_ctx_notifier_unregister(IOMMUContext *iommu_ctx,
+                                   IOMMUCTXNotifier *notifier)
+{
+    IOMMUCTXNotifier *cur, *next;
+
+    QLIST_FOREACH_SAFE(cur, &iommu_ctx->iommu_ctx_notifiers, node, next) {
+        if (cur == notifier) {
+            QLIST_REMOVE(cur, node);
+            break;
+        }
+    }
+}
+
+void iommu_ctx_event_notify(IOMMUContext *iommu_ctx,
+                            IOMMUCTXEventData *event_data)
+{
+    IOMMUCTXNotifier *cur;
+
+    QLIST_FOREACH(cur, &iommu_ctx->iommu_ctx_notifiers, node) {
+        if ((cur->event == event_data->event) &&
+                                 cur->iommu_ctx_event_notify) {
+            cur->iommu_ctx_event_notify(cur, event_data);
+        }
+    }
+}
+
+void iommu_context_init(IOMMUContext *iommu_ctx)
+{
+    QLIST_INIT(&iommu_ctx->iommu_ctx_notifiers);
+}
diff --git a/include/hw/iommu/iommu.h b/include/hw/iommu/iommu.h
new file mode 100644
index 0000000..c22c442
--- /dev/null
+++ b/include/hw/iommu/iommu.h
@@ -0,0 +1,79 @@
+/*
+ * QEMU abstraction of IOMMU Context
+ *
+ * Copyright (C) 2019 Red Hat Inc.
+ *
+ * Authors: Peter Xu <peterx@redhat.com>,
+ *          Liu, Yi L <yi.l.liu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_PCI_PASID_H
+#define HW_PCI_PASID_H
+
+#include "qemu/queue.h"
+#ifndef CONFIG_USER_ONLY
+#include "exec/hwaddr.h"
+#endif
+
+typedef struct IOMMUContext IOMMUContext;
+
+enum IOMMUCTXEvent {
+    IOMMU_CTX_EVENT_NUM,
+};
+typedef enum IOMMUCTXEvent IOMMUCTXEvent;
+
+struct IOMMUCTXEventData {
+    IOMMUCTXEvent event;
+    uint64_t length;
+    void *data;
+};
+typedef struct IOMMUCTXEventData IOMMUCTXEventData;
+
+typedef struct IOMMUCTXNotifier IOMMUCTXNotifier;
+
+typedef void (*IOMMUCTXNotifyFn)(IOMMUCTXNotifier *notifier,
+                                 IOMMUCTXEventData *event_data);
+
+struct IOMMUCTXNotifier {
+    IOMMUCTXNotifyFn iommu_ctx_event_notify;
+    /*
+     * What events we are listening to. Let's allow multiple event
+     * registrations from beginning.
+     */
+    IOMMUCTXEvent event;
+    QLIST_ENTRY(IOMMUCTXNotifier) node;
+};
+
+/*
+ * This is an abstraction of IOMMU context.
+ */
+struct IOMMUContext {
+    uint32_t pasid;
+    QLIST_HEAD(, IOMMUCTXNotifier) iommu_ctx_notifiers;
+};
+
+void iommu_ctx_notifier_register(IOMMUContext *iommu_ctx,
+                                 IOMMUCTXNotifier *n,
+                                 IOMMUCTXNotifyFn fn,
+                                 IOMMUCTXEvent event);
+void iommu_ctx_notifier_unregister(IOMMUContext *iommu_ctx,
+                                   IOMMUCTXNotifier *notifier);
+void iommu_ctx_event_notify(IOMMUContext *iommu_ctx,
+                            IOMMUCTXEventData *event_data);
+
+void iommu_context_init(IOMMUContext *iommu_ctx);
+
+#endif
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC v2 05/22] vfio/common: add iommu_ctx_notifier in container
  2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (3 preceding siblings ...)
  2019-10-24 12:34 ` [RFC v2 04/22] hw/iommu: introduce IOMMUContext Liu Yi L
@ 2019-10-24 12:34 ` Liu Yi L
  2019-11-01 14:58   ` Peter Xu
  2019-10-24 12:34 ` [RFC v2 06/22] hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps Liu Yi L
                   ` (20 subsequent siblings)
  25 siblings, 1 reply; 79+ messages in thread
From: Liu Yi L @ 2019-10-24 12:34 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: tianyu.lan, kevin.tian, yi.l.liu, Yi Sun, kvm, jun.j.tian,
	eric.auger, yi.y.sun, jacob.jun.pan, david

This patch adds a list in VFIOContainer to store iommu_ctx_notifier
which is based on IOMMUContext. As a preparation of registering
iommu_ctx_notifiers.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/vfio/common.c              | 1 +
 include/hw/vfio/vfio-common.h | 9 +++++++++
 2 files changed, 10 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 5ca1148..d418527 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1271,6 +1271,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     container->error = NULL;
     QLIST_INIT(&container->giommu_list);
     QLIST_INIT(&container->hostwin_list);
+    QLIST_INIT(&container->iommu_ctx_list);
 
     ret = vfio_init_container(container, group->fd, errp);
     if (ret) {
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index fd56420..975d12b 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -29,6 +29,7 @@
 #ifdef CONFIG_LINUX
 #include <linux/vfio.h>
 #endif
+#include "hw/iommu/iommu.h"
 
 #define VFIO_MSG_PREFIX "vfio %s: "
 
@@ -75,6 +76,7 @@ typedef struct VFIOContainer {
     bool initialized;
     unsigned long pgsizes;
     QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
+    QLIST_HEAD(, VFIOIOMMUContext) iommu_ctx_list;
     QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
     QLIST_HEAD(, VFIOGroup) group_list;
     QLIST_ENTRY(VFIOContainer) next;
@@ -88,6 +90,13 @@ typedef struct VFIOGuestIOMMU {
     QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
 } VFIOGuestIOMMU;
 
+typedef struct VFIOIOMMUContext {
+    VFIOContainer *container;
+    IOMMUContext *iommu_ctx;
+    IOMMUCTXNotifier n;
+    QLIST_ENTRY(VFIOIOMMUContext) iommu_ctx_next;
+} VFIOIOMMUContext;
+
 typedef struct VFIOHostDMAWindow {
     hwaddr min_iova;
     hwaddr max_iova;
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC v2 06/22] hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps
  2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (4 preceding siblings ...)
  2019-10-24 12:34 ` [RFC v2 05/22] vfio/common: add iommu_ctx_notifier in container Liu Yi L
@ 2019-10-24 12:34 ` Liu Yi L
  2019-10-27 17:43   ` David Gibson
  2019-11-01 18:09   ` Peter Xu
  2019-10-24 12:34 ` [RFC v2 07/22] hw/pci: introduce pci_device_iommu_context() Liu Yi L
                   ` (19 subsequent siblings)
  25 siblings, 2 replies; 79+ messages in thread
From: Liu Yi L @ 2019-10-24 12:34 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: tianyu.lan, kevin.tian, yi.l.liu, Yi Sun, kvm, jun.j.tian,
	eric.auger, yi.y.sun, jacob.jun.pan, david

This patch modifies pci_setup_iommu() to set PCIIOMMUOps instead of only
setting PCIIOMMUFunc. PCIIOMMUFunc is previously used to get an address
space for a device in vendor specific way. The PCIIOMMUOps still offers
this functionality. Use PCIIOMMUOps leaves space to add more iommu related
vendor specific operations.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/alpha/typhoon.c       |  6 +++++-
 hw/arm/smmu-common.c     |  6 +++++-
 hw/hppa/dino.c           |  6 +++++-
 hw/i386/amd_iommu.c      |  6 +++++-
 hw/i386/intel_iommu.c    |  6 +++++-
 hw/pci-host/designware.c |  6 +++++-
 hw/pci-host/ppce500.c    |  6 +++++-
 hw/pci-host/prep.c       |  6 +++++-
 hw/pci-host/sabre.c      |  6 +++++-
 hw/pci/pci.c             | 11 ++++++-----
 hw/ppc/ppc440_pcix.c     |  6 +++++-
 hw/ppc/spapr_pci.c       |  6 +++++-
 hw/s390x/s390-pci-bus.c  |  8 ++++++--
 include/hw/pci/pci.h     |  8 ++++++--
 include/hw/pci/pci_bus.h |  2 +-
 15 files changed, 74 insertions(+), 21 deletions(-)

diff --git a/hw/alpha/typhoon.c b/hw/alpha/typhoon.c
index 179e1f7..b890771 100644
--- a/hw/alpha/typhoon.c
+++ b/hw/alpha/typhoon.c
@@ -741,6 +741,10 @@ static AddressSpace *typhoon_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
     return &s->pchip.iommu_as;
 }
 
+static const PCIIOMMUOps typhoon_iommu_ops = {
+    .get_address_space = typhoon_pci_dma_iommu,
+};
+
 static void typhoon_set_irq(void *opaque, int irq, int level)
 {
     TyphoonState *s = opaque;
@@ -901,7 +905,7 @@ PCIBus *typhoon_init(ram_addr_t ram_size, ISABus **isa_bus,
                              "iommu-typhoon", UINT64_MAX);
     address_space_init(&s->pchip.iommu_as, MEMORY_REGION(&s->pchip.iommu),
                        "pchip0-pci");
-    pci_setup_iommu(b, typhoon_pci_dma_iommu, s);
+    pci_setup_iommu(b, &typhoon_iommu_ops, s);
 
     /* Pchip0 PCI special/interrupt acknowledge, 0x801.F800.0000, 64MB.  */
     memory_region_init_io(&s->pchip.reg_iack, OBJECT(s), &alpha_pci_iack_ops,
diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 245817d..d668514 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -342,6 +342,10 @@ static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
     return &sdev->as;
 }
 
+static const PCIIOMMUOps smmu_ops = {
+    .get_address_space = smmu_find_add_as,
+};
+
 IOMMUMemoryRegion *smmu_iommu_mr(SMMUState *s, uint32_t sid)
 {
     uint8_t bus_n, devfn;
@@ -436,7 +440,7 @@ static void smmu_base_realize(DeviceState *dev, Error **errp)
     s->smmu_pcibus_by_busptr = g_hash_table_new(NULL, NULL);
 
     if (s->primary_bus) {
-        pci_setup_iommu(s->primary_bus, smmu_find_add_as, s);
+        pci_setup_iommu(s->primary_bus, &smmu_ops, s);
     } else {
         error_setg(errp, "SMMU is not attached to any PCI bus!");
     }
diff --git a/hw/hppa/dino.c b/hw/hppa/dino.c
index ab6969b..dbcff03 100644
--- a/hw/hppa/dino.c
+++ b/hw/hppa/dino.c
@@ -389,6 +389,10 @@ static AddressSpace *dino_pcihost_set_iommu(PCIBus *bus, void *opaque,
     return &s->bm_as;
 }
 
+static const PCIIOMMUOps dino_iommu_ops = {
+    .get_address_space = dino_pcihost_set_iommu,
+};
+
 /*
  * Dino interrupts are connected as shown on Page 78, Table 23
  * (Little-endian bit numbers)
@@ -508,7 +512,7 @@ PCIBus *dino_init(MemoryRegion *addr_space,
     memory_region_add_subregion(&s->bm, 0xfff00000,
                                 &s->bm_cpu_alias);
     address_space_init(&s->bm_as, &s->bm, "pci-bm");
-    pci_setup_iommu(b, dino_pcihost_set_iommu, s);
+    pci_setup_iommu(b, &dino_iommu_ops, s);
 
     *p_rtc_irq = qemu_allocate_irq(dino_set_timer_irq, s, 0);
     *p_ser_irq = qemu_allocate_irq(dino_set_serial_irq, s, 0);
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index d372636..ba6904c 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -1451,6 +1451,10 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
     return &iommu_as[devfn]->as;
 }
 
+static const PCIIOMMUOps amdvi_iommu_ops = {
+    .get_address_space = amdvi_host_dma_iommu,
+};
+
 static const MemoryRegionOps mmio_mem_ops = {
     .read = amdvi_mmio_read,
     .write = amdvi_mmio_write,
@@ -1576,7 +1580,7 @@ static void amdvi_realize(DeviceState *dev, Error **err)
 
     sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->mmio);
     sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, AMDVI_BASE_ADDR);
-    pci_setup_iommu(bus, amdvi_host_dma_iommu, s);
+    pci_setup_iommu(bus, &amdvi_iommu_ops, s);
     s->devid = object_property_get_int(OBJECT(&s->pci), "addr", err);
     msi_init(&s->pci.dev, 0, 1, true, false, err);
     amdvi_init(s);
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 4a1a07a..67a7836 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3666,6 +3666,10 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
     return &vtd_as->as;
 }
 
+static PCIIOMMUOps vtd_iommu_ops = {
+    .get_address_space = vtd_host_dma_iommu,
+};
+
 static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
 {
     X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
@@ -3782,7 +3786,7 @@ static void vtd_realize(DeviceState *dev, Error **errp)
                                               g_free, g_free);
     vtd_init(s);
     sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, Q35_HOST_BRIDGE_IOMMU_ADDR);
-    pci_setup_iommu(bus, vtd_host_dma_iommu, dev);
+    pci_setup_iommu(bus, &vtd_iommu_ops, dev);
     /* Pseudo address space under root PCI bus. */
     pcms->ioapic_as = vtd_host_dma_iommu(bus, s, Q35_PSEUDO_DEVFN_IOAPIC);
     qemu_add_machine_init_done_notifier(&vtd_machine_done_notify);
diff --git a/hw/pci-host/designware.c b/hw/pci-host/designware.c
index 71e9b0d..235d6af 100644
--- a/hw/pci-host/designware.c
+++ b/hw/pci-host/designware.c
@@ -645,6 +645,10 @@ static AddressSpace *designware_pcie_host_set_iommu(PCIBus *bus, void *opaque,
     return &s->pci.address_space;
 }
 
+static const PCIIOMMUOps designware_iommu_ops = {
+    .get_address_space = designware_pcie_host_set_iommu,
+};
+
 static void designware_pcie_host_realize(DeviceState *dev, Error **errp)
 {
     PCIHostState *pci = PCI_HOST_BRIDGE(dev);
@@ -686,7 +690,7 @@ static void designware_pcie_host_realize(DeviceState *dev, Error **errp)
     address_space_init(&s->pci.address_space,
                        &s->pci.address_space_root,
                        "pcie-bus-address-space");
-    pci_setup_iommu(pci->bus, designware_pcie_host_set_iommu, s);
+    pci_setup_iommu(pci->bus, designware_iommu_ops, s);
 
     qdev_set_parent_bus(DEVICE(&s->root), BUS(pci->bus));
     qdev_init_nofail(DEVICE(&s->root));
diff --git a/hw/pci-host/ppce500.c b/hw/pci-host/ppce500.c
index 8bed8e8..0f907b0 100644
--- a/hw/pci-host/ppce500.c
+++ b/hw/pci-host/ppce500.c
@@ -439,6 +439,10 @@ static AddressSpace *e500_pcihost_set_iommu(PCIBus *bus, void *opaque,
     return &s->bm_as;
 }
 
+static const PCIIOMMUOps ppce500_iommu_ops = {
+    .get_address_space = e500_pcihost_set_iommu,
+};
+
 static void e500_pcihost_realize(DeviceState *dev, Error **errp)
 {
     SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
@@ -473,7 +477,7 @@ static void e500_pcihost_realize(DeviceState *dev, Error **errp)
     memory_region_init(&s->bm, OBJECT(s), "bm-e500", UINT64_MAX);
     memory_region_add_subregion(&s->bm, 0x0, &s->busmem);
     address_space_init(&s->bm_as, &s->bm, "pci-bm");
-    pci_setup_iommu(b, e500_pcihost_set_iommu, s);
+    pci_setup_iommu(b, &ppce500_iommu_ops, s);
 
     pci_create_simple(b, 0, "e500-host-bridge");
 
diff --git a/hw/pci-host/prep.c b/hw/pci-host/prep.c
index 85d7ba9..f372524 100644
--- a/hw/pci-host/prep.c
+++ b/hw/pci-host/prep.c
@@ -213,6 +213,10 @@ static AddressSpace *raven_pcihost_set_iommu(PCIBus *bus, void *opaque,
     return &s->bm_as;
 }
 
+static const PCIIOMMU raven_iommu_ops = {
+    .get_address_space = raven_pcihost_set_iommu,
+};
+
 static void raven_change_gpio(void *opaque, int n, int level)
 {
     PREPPCIState *s = opaque;
@@ -303,7 +307,7 @@ static void raven_pcihost_initfn(Object *obj)
     memory_region_add_subregion(&s->bm, 0         , &s->bm_pci_memory_alias);
     memory_region_add_subregion(&s->bm, 0x80000000, &s->bm_ram_alias);
     address_space_init(&s->bm_as, &s->bm, "raven-bm");
-    pci_setup_iommu(&s->pci_bus, raven_pcihost_set_iommu, s);
+    pci_setup_iommu(&s->pci_bus, &raven_iommu_ops, s);
 
     h->bus = &s->pci_bus;
 
diff --git a/hw/pci-host/sabre.c b/hw/pci-host/sabre.c
index fae20ee..79b7565 100644
--- a/hw/pci-host/sabre.c
+++ b/hw/pci-host/sabre.c
@@ -112,6 +112,10 @@ static AddressSpace *sabre_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
     return &is->iommu_as;
 }
 
+static const PCIIOMMUOps sabre_iommu_ops = {
+    .get_address_space = sabre_pci_dma_iommu,
+};
+
 static void sabre_config_write(void *opaque, hwaddr addr,
                                uint64_t val, unsigned size)
 {
@@ -402,7 +406,7 @@ static void sabre_realize(DeviceState *dev, Error **errp)
     /* IOMMU */
     memory_region_add_subregion_overlap(&s->sabre_config, 0x200,
                     sysbus_mmio_get_region(SYS_BUS_DEVICE(s->iommu), 0), 1);
-    pci_setup_iommu(phb->bus, sabre_pci_dma_iommu, s->iommu);
+    pci_setup_iommu(phb->bus, &sabre_iommu_ops, s->iommu);
 
     /* APB secondary busses */
     pci_dev = pci_create_multifunction(phb->bus, PCI_DEVFN(1, 0), true,
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index aa05c2b..b5ce9ca 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2615,18 +2615,19 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
     PCIBus *bus = pci_get_bus(dev);
     PCIBus *iommu_bus = bus;
 
-    while(iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) {
+    while (iommu_bus && !iommu_bus->iommu_ops && iommu_bus->parent_dev) {
         iommu_bus = pci_get_bus(iommu_bus->parent_dev);
     }
-    if (iommu_bus && iommu_bus->iommu_fn) {
-        return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, dev->devfn);
+    if (iommu_bus && iommu_bus->iommu_ops) {
+        return iommu_bus->iommu_ops->get_address_space(bus,
+                           iommu_bus->iommu_opaque, dev->devfn);
     }
     return &address_space_memory;
 }
 
-void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque)
+void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *ops, void *opaque)
 {
-    bus->iommu_fn = fn;
+    bus->iommu_ops = ops;
     bus->iommu_opaque = opaque;
 }
 
diff --git a/hw/ppc/ppc440_pcix.c b/hw/ppc/ppc440_pcix.c
index 2ee2d4f..2c8579c 100644
--- a/hw/ppc/ppc440_pcix.c
+++ b/hw/ppc/ppc440_pcix.c
@@ -442,6 +442,10 @@ static AddressSpace *ppc440_pcix_set_iommu(PCIBus *b, void *opaque, int devfn)
     return &s->bm_as;
 }
 
+static const PCIIOMMUOps ppc440_iommu_ops = {
+    .get_adress_space = ppc440_pcix_set_iommu,
+};
+
 /* The default pci_host_data_{read,write} functions in pci/pci_host.c
  * deny access to registers without bit 31 set but our clients want
  * this to work so we have to override these here */
@@ -487,7 +491,7 @@ static void ppc440_pcix_realize(DeviceState *dev, Error **errp)
     memory_region_init(&s->bm, OBJECT(s), "bm-ppc440-pcix", UINT64_MAX);
     memory_region_add_subregion(&s->bm, 0x0, &s->busmem);
     address_space_init(&s->bm_as, &s->bm, "pci-bm");
-    pci_setup_iommu(h->bus, ppc440_pcix_set_iommu, s);
+    pci_setup_iommu(h->bus, &ppc440_iommu_ops, s);
 
     memory_region_init(&s->container, OBJECT(s), "pci-container", PCI_ALL_SIZE);
     memory_region_init_io(&h->conf_mem, OBJECT(s), &pci_host_conf_le_ops,
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 01ff41d..83cd857 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -771,6 +771,10 @@ static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
     return &phb->iommu_as;
 }
 
+static const PCIIOMMUOps spapr_iommu_ops = {
+    .get_address_space = spapr_pci_dma_iommu,
+};
+
 static char *spapr_phb_vfio_get_loc_code(SpaprPhbState *sphb,  PCIDevice *pdev)
 {
     char *path = NULL, *buf = NULL, *host = NULL;
@@ -1950,7 +1954,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
     memory_region_add_subregion(&sphb->iommu_root, SPAPR_PCI_MSI_WINDOW,
                                 &sphb->msiwindow);
 
-    pci_setup_iommu(bus, spapr_pci_dma_iommu, sphb);
+    pci_setup_iommu(bus, &spapr_iommu_ops, sphb);
 
     pci_bus_set_route_irq_fn(bus, spapr_route_intx_pin_to_irq);
 
diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 2d2f4a7..14684a0 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -635,6 +635,10 @@ static AddressSpace *s390_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
     return &iommu->as;
 }
 
+static const PCIIOMMUOps s390_iommu_ops = {
+    .get_address_space = s390_pci_dma_iommu,
+};
+
 static uint8_t set_ind_atomic(uint64_t ind_loc, uint8_t to_be_set)
 {
     uint8_t ind_old, ind_new;
@@ -748,7 +752,7 @@ static void s390_pcihost_realize(DeviceState *dev, Error **errp)
     b = pci_register_root_bus(dev, NULL, s390_pci_set_irq, s390_pci_map_irq,
                               NULL, get_system_memory(), get_system_io(), 0,
                               64, TYPE_PCI_BUS);
-    pci_setup_iommu(b, s390_pci_dma_iommu, s);
+    pci_setup_iommu(b, &s390_iommu_ops, s);
 
     bus = BUS(b);
     qbus_set_hotplug_handler(bus, OBJECT(dev), &local_err);
@@ -919,7 +923,7 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
 
         pdev = PCI_DEVICE(dev);
         pci_bridge_map_irq(pb, dev->id, s390_pci_map_irq);
-        pci_setup_iommu(&pb->sec_bus, s390_pci_dma_iommu, s);
+        pci_setup_iommu(&pb->sec_bus, &s390_iommu_ops, s);
 
         qbus_set_hotplug_handler(BUS(&pb->sec_bus), OBJECT(s), errp);
 
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index f3f0ffd..d9fed8d 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -480,10 +480,14 @@ void pci_bus_get_w64_range(PCIBus *bus, Range *range);
 
 void pci_device_deassert_intx(PCIDevice *dev);
 
-typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, int);
+typedef struct PCIIOMMUOps PCIIOMMUOps;
+struct PCIIOMMUOps {
+    AddressSpace * (*get_address_space)(PCIBus *bus,
+                                void *opaque, int32_t devfn);
+};
 
 AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
-void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
+void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *iommu_ops, void *opaque);
 
 static inline void
 pci_set_byte(uint8_t *config, uint8_t val)
diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
index 0714f57..c281057 100644
--- a/include/hw/pci/pci_bus.h
+++ b/include/hw/pci/pci_bus.h
@@ -29,7 +29,7 @@ enum PCIBusFlags {
 struct PCIBus {
     BusState qbus;
     enum PCIBusFlags flags;
-    PCIIOMMUFunc iommu_fn;
+    const PCIIOMMUOps *iommu_ops;
     void *iommu_opaque;
     uint8_t devfn_min;
     uint32_t slot_reserved_mask;
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC v2 07/22] hw/pci: introduce pci_device_iommu_context()
  2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (5 preceding siblings ...)
  2019-10-24 12:34 ` [RFC v2 06/22] hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps Liu Yi L
@ 2019-10-24 12:34 ` Liu Yi L
  2019-10-29 11:50   ` David Gibson
  2019-11-01 18:09   ` Peter Xu
  2019-10-24 12:34 ` [RFC v2 08/22] intel_iommu: provide get_iommu_context() callback Liu Yi L
                   ` (18 subsequent siblings)
  25 siblings, 2 replies; 79+ messages in thread
From: Liu Yi L @ 2019-10-24 12:34 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: tianyu.lan, kevin.tian, yi.l.liu, Yi Sun, kvm, jun.j.tian,
	eric.auger, yi.y.sun, jacob.jun.pan, david

This patch adds pci_device_iommu_context() to get an iommu_context
for a given device. A new callback is added in PCIIOMMUOps. Users
who wants to listen to events issued by vIOMMU could use this new
interface to get an iommu_context and register their own notifiers,
then wait for notifications from vIOMMU. e.g. VFIO is the first user
of it to listen to the PASID_ALLOC/PASID_BIND/CACHE_INV events and
propagate the events to host.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/pci/pci.c         | 16 ++++++++++++++++
 include/hw/pci/pci.h |  5 +++++
 2 files changed, 21 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index b5ce9ca..4e6af06 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2625,6 +2625,22 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
     return &address_space_memory;
 }
 
+IOMMUContext *pci_device_iommu_context(PCIDevice *dev)
+{
+    PCIBus *bus = pci_get_bus(dev);
+    PCIBus *iommu_bus = bus;
+
+    while (iommu_bus && !iommu_bus->iommu_ops && iommu_bus->parent_dev) {
+        iommu_bus = pci_get_bus(iommu_bus->parent_dev);
+    }
+    if (iommu_bus && iommu_bus->iommu_ops &&
+        iommu_bus->iommu_ops->get_iommu_context) {
+        return iommu_bus->iommu_ops->get_iommu_context(bus,
+                           iommu_bus->iommu_opaque, dev->devfn);
+    }
+    return NULL;
+}
+
 void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *ops, void *opaque)
 {
     bus->iommu_ops = ops;
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index d9fed8d..ccada47 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -9,6 +9,8 @@
 
 #include "hw/pci/pcie.h"
 
+#include "hw/iommu/iommu.h"
+
 extern bool pci_available;
 
 /* PCI bus */
@@ -484,9 +486,12 @@ typedef struct PCIIOMMUOps PCIIOMMUOps;
 struct PCIIOMMUOps {
     AddressSpace * (*get_address_space)(PCIBus *bus,
                                 void *opaque, int32_t devfn);
+    IOMMUContext * (*get_iommu_context)(PCIBus *bus,
+                                void *opaque, int32_t devfn);
 };
 
 AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
+IOMMUContext *pci_device_iommu_context(PCIDevice *dev);
 void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *iommu_ops, void *opaque);
 
 static inline void
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC v2 08/22] intel_iommu: provide get_iommu_context() callback
  2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (6 preceding siblings ...)
  2019-10-24 12:34 ` [RFC v2 07/22] hw/pci: introduce pci_device_iommu_context() Liu Yi L
@ 2019-10-24 12:34 ` Liu Yi L
  2019-11-01 14:55   ` Peter Xu
  2019-10-24 12:34 ` [RFC v2 09/22] vfio/pci: add iommu_context notifier for pasid alloc/free Liu Yi L
                   ` (17 subsequent siblings)
  25 siblings, 1 reply; 79+ messages in thread
From: Liu Yi L @ 2019-10-24 12:34 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: tianyu.lan, kevin.tian, yi.l.liu, Yi Sun, kvm, jun.j.tian,
	eric.auger, yi.y.sun, jacob.jun.pan, david

This patch adds get_iommu_context() callback to return an iommu_context
Intel VT-d platform.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/i386/intel_iommu.c         | 57 ++++++++++++++++++++++++++++++++++++++-----
 include/hw/i386/intel_iommu.h | 14 ++++++++++-
 2 files changed, 64 insertions(+), 7 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 67a7836..e9f8692 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3288,22 +3288,33 @@ static const MemoryRegionOps vtd_mem_ir_ops = {
     },
 };
 
-VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
+static VTDBus *vtd_find_add_bus(IntelIOMMUState *s, PCIBus *bus)
 {
     uintptr_t key = (uintptr_t)bus;
-    VTDBus *vtd_bus = g_hash_table_lookup(s->vtd_as_by_busptr, &key);
-    VTDAddressSpace *vtd_dev_as;
-    char name[128];
+    VTDBus *vtd_bus;
 
+    vtd_iommu_lock(s);
+    vtd_bus = g_hash_table_lookup(s->vtd_as_by_busptr, &key);
     if (!vtd_bus) {
         uintptr_t *new_key = g_malloc(sizeof(*new_key));
         *new_key = (uintptr_t)bus;
         /* No corresponding free() */
-        vtd_bus = g_malloc0(sizeof(VTDBus) + sizeof(VTDAddressSpace *) * \
-                            PCI_DEVFN_MAX);
+        vtd_bus = g_malloc0(sizeof(VTDBus) + PCI_DEVFN_MAX * \
+                    (sizeof(VTDAddressSpace *) + sizeof(VTDIOMMUContext *)));
         vtd_bus->bus = bus;
         g_hash_table_insert(s->vtd_as_by_busptr, new_key, vtd_bus);
     }
+    vtd_iommu_unlock(s);
+    return vtd_bus;
+}
+
+VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
+{
+    VTDBus *vtd_bus;
+    VTDAddressSpace *vtd_dev_as;
+    char name[128];
+
+    vtd_bus = vtd_find_add_bus(s, bus);
 
     vtd_dev_as = vtd_bus->dev_as[devfn];
 
@@ -3370,6 +3381,27 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
     return vtd_dev_as;
 }
 
+VTDIOMMUContext *vtd_find_add_ic(IntelIOMMUState *s,
+                                 PCIBus *bus, int devfn)
+{
+    VTDBus *vtd_bus;
+    VTDIOMMUContext *vtd_dev_ic;
+
+    vtd_bus = vtd_find_add_bus(s, bus);
+
+    vtd_dev_ic = vtd_bus->dev_ic[devfn];
+
+    if (!vtd_dev_ic) {
+        vtd_bus->dev_ic[devfn] = vtd_dev_ic =
+                    g_malloc0(sizeof(VTDIOMMUContext));
+        vtd_dev_ic->vtd_bus = vtd_bus;
+        vtd_dev_ic->devfn = (uint8_t)devfn;
+        vtd_dev_ic->iommu_state = s;
+        iommu_context_init(&vtd_dev_ic->iommu_context);
+    }
+    return vtd_dev_ic;
+}
+
 static uint64_t get_naturally_aligned_size(uint64_t start,
                                            uint64_t size, int gaw)
 {
@@ -3666,8 +3698,21 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
     return &vtd_as->as;
 }
 
+static IOMMUContext *vtd_dev_iommu_context(PCIBus *bus,
+                                           void *opaque, int devfn)
+{
+    IntelIOMMUState *s = opaque;
+    VTDIOMMUContext *vtd_ic;
+
+    assert(0 <= devfn && devfn < PCI_DEVFN_MAX);
+
+    vtd_ic = vtd_find_add_ic(s, bus, devfn);
+    return &vtd_ic->iommu_context;
+}
+
 static PCIIOMMUOps vtd_iommu_ops = {
     .get_address_space = vtd_host_dma_iommu,
+    .get_iommu_context = vtd_dev_iommu_context,
 };
 
 static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 6062588..1c580c1 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -68,6 +68,7 @@ typedef union VTD_IR_TableEntry VTD_IR_TableEntry;
 typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
 typedef struct VTDPASIDDirEntry VTDPASIDDirEntry;
 typedef struct VTDPASIDEntry VTDPASIDEntry;
+typedef struct VTDIOMMUContext VTDIOMMUContext;
 
 /* Context-Entry */
 struct VTDContextEntry {
@@ -116,9 +117,19 @@ struct VTDAddressSpace {
     IOVATree *iova_tree;          /* Traces mapped IOVA ranges */
 };
 
+struct VTDIOMMUContext {
+    VTDBus *vtd_bus;
+    uint8_t devfn;
+    IOMMUContext iommu_context;
+    IntelIOMMUState *iommu_state;
+};
+
 struct VTDBus {
     PCIBus* bus;		/* A reference to the bus to provide translation for */
-    VTDAddressSpace *dev_as[0];	/* A table of VTDAddressSpace objects indexed by devfn */
+    /* A table of VTDAddressSpace objects indexed by devfn */
+    VTDAddressSpace *dev_as[PCI_DEVFN_MAX];
+    /* A table of VTDIOMMUContext objects indexed by devfn */
+    VTDIOMMUContext *dev_ic[PCI_DEVFN_MAX];
 };
 
 struct VTDIOTLBEntry {
@@ -282,5 +293,6 @@ struct IntelIOMMUState {
  * create a new one if none exists
  */
 VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn);
+VTDIOMMUContext *vtd_find_add_ic(IntelIOMMUState *s, PCIBus *bus, int devfn);
 
 #endif
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC v2 09/22] vfio/pci: add iommu_context notifier for pasid alloc/free
  2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (7 preceding siblings ...)
  2019-10-24 12:34 ` [RFC v2 08/22] intel_iommu: provide get_iommu_context() callback Liu Yi L
@ 2019-10-24 12:34 ` Liu Yi L
  2019-10-29 12:15   ` David Gibson
  2019-10-24 12:34 ` [RFC v2 10/22] intel_iommu: add virtual command capability support Liu Yi L
                   ` (16 subsequent siblings)
  25 siblings, 1 reply; 79+ messages in thread
From: Liu Yi L @ 2019-10-24 12:34 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: tianyu.lan, kevin.tian, yi.l.liu, Yi Sun, kvm, jun.j.tian,
	eric.auger, yi.y.sun, jacob.jun.pan, david

This patch adds pasid alloc/free notifiers for vfio-pci. It is
supposed to be fired by vIOMMU. VFIO then sends PASID allocation
or free request to host.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/vfio/common.c         |  9 ++++++
 hw/vfio/pci.c            | 81 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/iommu/iommu.h | 15 +++++++++
 3 files changed, 105 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index d418527..e6ad21c 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1436,6 +1436,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
     if (QLIST_EMPTY(&container->group_list)) {
         VFIOAddressSpace *space = container->space;
         VFIOGuestIOMMU *giommu, *tmp;
+        VFIOIOMMUContext *giommu_ctx, *ctx;
 
         QLIST_REMOVE(container, next);
 
@@ -1446,6 +1447,14 @@ static void vfio_disconnect_container(VFIOGroup *group)
             g_free(giommu);
         }
 
+        QLIST_FOREACH_SAFE(giommu_ctx, &container->iommu_ctx_list,
+                                                   iommu_ctx_next, ctx) {
+            iommu_ctx_notifier_unregister(giommu_ctx->iommu_ctx,
+                                                      &giommu_ctx->n);
+            QLIST_REMOVE(giommu_ctx, iommu_ctx_next);
+            g_free(giommu_ctx);
+        }
+
         trace_vfio_disconnect_container(container->fd);
         close(container->fd);
         g_free(container);
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 12fac39..8721ff6 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2699,11 +2699,80 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
     vdev->req_enabled = false;
 }
 
+static void vfio_register_iommu_ctx_notifier(VFIOPCIDevice *vdev,
+                                             IOMMUContext *iommu_ctx,
+                                             IOMMUCTXNotifyFn fn,
+                                             IOMMUCTXEvent event)
+{
+    VFIOContainer *container = vdev->vbasedev.group->container;
+    VFIOIOMMUContext *giommu_ctx;
+
+    giommu_ctx = g_malloc0(sizeof(*giommu_ctx));
+    giommu_ctx->container = container;
+    giommu_ctx->iommu_ctx = iommu_ctx;
+    QLIST_INSERT_HEAD(&container->iommu_ctx_list,
+                      giommu_ctx,
+                      iommu_ctx_next);
+    iommu_ctx_notifier_register(iommu_ctx,
+                                &giommu_ctx->n,
+                                fn,
+                                event);
+}
+
+static void vfio_iommu_pasid_alloc_notify(IOMMUCTXNotifier *n,
+                                          IOMMUCTXEventData *event_data)
+{
+    VFIOIOMMUContext *giommu_ctx = container_of(n, VFIOIOMMUContext, n);
+    VFIOContainer *container = giommu_ctx->container;
+    IOMMUCTXPASIDReqDesc *pasid_req =
+                              (IOMMUCTXPASIDReqDesc *) event_data->data;
+    struct vfio_iommu_type1_pasid_request req;
+    unsigned long argsz;
+    int pasid;
+
+    argsz = sizeof(req);
+    req.argsz = argsz;
+    req.flag = VFIO_IOMMU_PASID_ALLOC;
+    req.min_pasid = pasid_req->min_pasid;
+    req.max_pasid = pasid_req->max_pasid;
+
+    pasid = ioctl(container->fd, VFIO_IOMMU_PASID_REQUEST, &req);
+    if (pasid < 0) {
+        error_report("%s: %d, alloc failed", __func__, -errno);
+    }
+    pasid_req->alloc_result = pasid;
+}
+
+static void vfio_iommu_pasid_free_notify(IOMMUCTXNotifier *n,
+                                          IOMMUCTXEventData *event_data)
+{
+    VFIOIOMMUContext *giommu_ctx = container_of(n, VFIOIOMMUContext, n);
+    VFIOContainer *container = giommu_ctx->container;
+    IOMMUCTXPASIDReqDesc *pasid_req =
+                              (IOMMUCTXPASIDReqDesc *) event_data->data;
+    struct vfio_iommu_type1_pasid_request req;
+    unsigned long argsz;
+    int ret = 0;
+
+    argsz = sizeof(req);
+    req.argsz = argsz;
+    req.flag = VFIO_IOMMU_PASID_FREE;
+    req.pasid = pasid_req->pasid;
+
+    ret = ioctl(container->fd, VFIO_IOMMU_PASID_REQUEST, &req);
+    if (ret != 0) {
+        error_report("%s: %d, pasid %u free failed",
+                   __func__, -errno, (unsigned) pasid_req->pasid);
+    }
+    pasid_req->free_result = ret;
+}
+
 static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
     VFIOPCIDevice *vdev = PCI_VFIO(pdev);
     VFIODevice *vbasedev_iter;
     VFIOGroup *group;
+    IOMMUContext *iommu_context;
     char *tmp, *subsys, group_path[PATH_MAX], *group_name;
     Error *err = NULL;
     ssize_t len;
@@ -3000,6 +3069,18 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
     vfio_register_req_notifier(vdev);
     vfio_setup_resetfn_quirk(vdev);
 
+    iommu_context = pci_device_iommu_context(pdev);
+    if (iommu_context) {
+        vfio_register_iommu_ctx_notifier(vdev,
+                                         iommu_context,
+                                         vfio_iommu_pasid_alloc_notify,
+                                         IOMMU_CTX_EVENT_PASID_ALLOC);
+        vfio_register_iommu_ctx_notifier(vdev,
+                                         iommu_context,
+                                         vfio_iommu_pasid_free_notify,
+                                         IOMMU_CTX_EVENT_PASID_FREE);
+    }
+
     return;
 
 out_teardown:
diff --git a/include/hw/iommu/iommu.h b/include/hw/iommu/iommu.h
index c22c442..4352afd 100644
--- a/include/hw/iommu/iommu.h
+++ b/include/hw/iommu/iommu.h
@@ -31,10 +31,25 @@
 typedef struct IOMMUContext IOMMUContext;
 
 enum IOMMUCTXEvent {
+    IOMMU_CTX_EVENT_PASID_ALLOC,
+    IOMMU_CTX_EVENT_PASID_FREE,
     IOMMU_CTX_EVENT_NUM,
 };
 typedef enum IOMMUCTXEvent IOMMUCTXEvent;
 
+union IOMMUCTXPASIDReqDesc {
+    struct {
+        uint32_t min_pasid;
+        uint32_t max_pasid;
+        int32_t alloc_result; /* pasid allocated for the alloc request */
+    };
+    struct {
+        uint32_t pasid; /* pasid to be free */
+        int free_result;
+    };
+};
+typedef union IOMMUCTXPASIDReqDesc IOMMUCTXPASIDReqDesc;
+
 struct IOMMUCTXEventData {
     IOMMUCTXEvent event;
     uint64_t length;
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC v2 10/22] intel_iommu: add virtual command capability support
  2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (8 preceding siblings ...)
  2019-10-24 12:34 ` [RFC v2 09/22] vfio/pci: add iommu_context notifier for pasid alloc/free Liu Yi L
@ 2019-10-24 12:34 ` Liu Yi L
  2019-11-01 18:05   ` Peter Xu
  2019-10-24 12:34 ` [RFC v2 11/22] intel_iommu: process pasid cache invalidation Liu Yi L
                   ` (15 subsequent siblings)
  25 siblings, 1 reply; 79+ messages in thread
From: Liu Yi L @ 2019-10-24 12:34 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: tianyu.lan, kevin.tian, yi.l.liu, Yi Sun, kvm, jun.j.tian,
	eric.auger, yi.y.sun, jacob.jun.pan, david

This patch adds virtual command support to Intel vIOMMU per
Intel VT-d 3.1 spec. And adds two virtual commands: alloc_pasid
and free_pasid.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
---
 hw/i386/intel_iommu.c          | 162 ++++++++++++++++++++++++++++++++++++++++-
 hw/i386/intel_iommu_internal.h |  38 ++++++++++
 hw/i386/trace-events           |   1 +
 include/hw/i386/intel_iommu.h  |   6 +-
 4 files changed, 205 insertions(+), 2 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index e9f8692..88b843f 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -944,6 +944,7 @@ static VTDBus *vtd_find_as_from_bus_num(IntelIOMMUState *s, uint8_t bus_num)
                 return vtd_bus;
             }
         }
+        vtd_bus = NULL;
     }
     return vtd_bus;
 }
@@ -2590,6 +2591,140 @@ static void vtd_handle_iectl_write(IntelIOMMUState *s)
     }
 }
 
+static int vtd_request_pasid_alloc(IntelIOMMUState *s)
+{
+    VTDBus *vtd_bus;
+    int bus_n, devfn;
+    IOMMUCTXEventData event_data;
+    IOMMUCTXPASIDReqDesc req;
+    VTDIOMMUContext *vtd_ic;
+
+    event_data.event = IOMMU_CTX_EVENT_PASID_ALLOC;
+    event_data.data = &req;
+    req.min_pasid = VTD_MIN_HPASID;
+    req.max_pasid = VTD_MAX_HPASID;
+    req.alloc_result = 0;
+    event_data.length = sizeof(req);
+    for (bus_n = 0; bus_n < PCI_BUS_MAX; bus_n++) {
+        vtd_bus = vtd_find_as_from_bus_num(s, bus_n);
+        if (!vtd_bus) {
+            continue;
+        }
+        for (devfn = 0; devfn < PCI_DEVFN_MAX; devfn++) {
+            vtd_ic = vtd_bus->dev_ic[devfn];
+            if (!vtd_ic) {
+                continue;
+            }
+            iommu_ctx_event_notify(&vtd_ic->iommu_context, &event_data);
+            if (req.alloc_result > 0) {
+                return req.alloc_result;
+            }
+        }
+    }
+    return -1;
+}
+
+static int vtd_request_pasid_free(IntelIOMMUState *s, uint32_t pasid)
+{
+    VTDBus *vtd_bus;
+    int bus_n, devfn;
+    IOMMUCTXEventData event_data;
+    IOMMUCTXPASIDReqDesc req;
+    VTDIOMMUContext *vtd_ic;
+
+    event_data.event = IOMMU_CTX_EVENT_PASID_FREE;
+    event_data.data = &req;
+    req.pasid = pasid;
+    req.free_result = 0;
+    event_data.length = sizeof(req);
+    for (bus_n = 0; bus_n < PCI_BUS_MAX; bus_n++) {
+        vtd_bus = vtd_find_as_from_bus_num(s, bus_n);
+        if (!vtd_bus) {
+            continue;
+        }
+        for (devfn = 0; devfn < PCI_DEVFN_MAX; devfn++) {
+            vtd_ic = vtd_bus->dev_ic[devfn];
+            if (!vtd_ic) {
+                continue;
+            }
+            iommu_ctx_event_notify(&vtd_ic->iommu_context, &event_data);
+            if (req.free_result == 0) {
+                return 0;
+            }
+        }
+    }
+    return -1;
+}
+
+/*
+ * If IP is not set, set it and return 0
+ * If IP is already set, return -1
+ */
+static int vtd_vcmd_rsp_ip_check(IntelIOMMUState *s)
+{
+    if (!(s->vccap & VTD_VCCAP_PAS) ||
+         (s->vcrsp & 1)) {
+        return -1;
+    }
+    s->vcrsp = 1;
+    vtd_set_quad_raw(s, DMAR_VCRSP_REG,
+                     ((uint64_t) s->vcrsp));
+    return 0;
+}
+
+static void vtd_vcmd_clear_ip(IntelIOMMUState *s)
+{
+    s->vcrsp &= (~((uint64_t)(0x1)));
+    vtd_set_quad_raw(s, DMAR_VCRSP_REG,
+                     ((uint64_t) s->vcrsp));
+}
+
+/* Handle write to Virtual Command Register */
+static int vtd_handle_vcmd_write(IntelIOMMUState *s, uint64_t val)
+{
+    uint32_t pasid;
+    int ret = -1;
+
+    trace_vtd_reg_write_vcmd(s->vcrsp, val);
+
+    /*
+     * Since vCPU should be blocked when the guest VMCD
+     * write was trapped to here. Should be no other vCPUs
+     * try to access VCMD if guest software is well written.
+     * However, we still emulate the IP bit here in case of
+     * bad guest software. Also align with the spec.
+     */
+    ret = vtd_vcmd_rsp_ip_check(s);
+    if (ret) {
+        return ret;
+    }
+    switch (val & VTD_VCMD_CMD_MASK) {
+    case VTD_VCMD_ALLOC_PASID:
+        ret = vtd_request_pasid_alloc(s);
+        if (ret < 0) {
+            s->vcrsp |= VTD_VCRSP_SC(VTD_VCMD_NO_AVAILABLE_PASID);
+        } else {
+            s->vcrsp |= VTD_VCRSP_RSLT(ret);
+        }
+        break;
+
+    case VTD_VCMD_FREE_PASID:
+        pasid = VTD_VCMD_PASID_VALUE(val);
+        ret = vtd_request_pasid_free(s, pasid);
+        if (ret < 0) {
+            s->vcrsp |= VTD_VCRSP_SC(VTD_VCMD_FREE_INVALID_PASID);
+        }
+        break;
+
+    default:
+        s->vcrsp |= VTD_VCRSP_SC(VTD_VCMD_UNDEFINED_CMD);
+        printf("Virtual Command: unsupported command!!!\n");
+        break;
+    }
+    vtd_vcmd_clear_ip(s);
+    return 0;
+}
+
 static uint64_t vtd_mem_read(void *opaque, hwaddr addr, unsigned size)
 {
     IntelIOMMUState *s = opaque;
@@ -2879,6 +3014,23 @@ static void vtd_mem_write(void *opaque, hwaddr addr,
         vtd_set_long(s, addr, val);
         break;
 
+    case DMAR_VCMD_REG:
+        if (!vtd_handle_vcmd_write(s, val)) {
+            if (size == 4) {
+                vtd_set_long(s, addr, val);
+            } else {
+                vtd_set_quad(s, addr, val);
+            }
+        }
+        break;
+
+    case DMAR_VCMD_REG_HI:
+        assert(size == 4);
+        if (!vtd_handle_vcmd_write(s, val)) {
+            vtd_set_long(s, addr, val);
+        }
+        break;
+
     default:
         if (size == 4) {
             vtd_set_long(s, addr, val);
@@ -3617,7 +3769,8 @@ static void vtd_init(IntelIOMMUState *s)
             s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
         } else if (!strcmp(s->scalable_mode, "modern")) {
             s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_PASID
-                       | VTD_ECAP_FLTS | VTD_ECAP_PSS;
+                       | VTD_ECAP_FLTS | VTD_ECAP_PSS | VTD_ECAP_VCS;
+            s->vccap |= VTD_VCCAP_PAS;
         }
     }
 
@@ -3674,6 +3827,13 @@ static void vtd_init(IntelIOMMUState *s)
      * Interrupt remapping registers.
      */
     vtd_define_quad(s, DMAR_IRTA_REG, 0, 0xfffffffffffff80fULL, 0);
+
+    /*
+     * Virtual Command Definitions
+     */
+    vtd_define_quad(s, DMAR_VCCAP_REG, s->vccap, 0, 0);
+    vtd_define_quad(s, DMAR_VCMD_REG, 0, 0xffffffffffffffffULL, 0);
+    vtd_define_quad(s, DMAR_VCRSP_REG, 0, 0, 0);
 }
 
 /* Should not reset address_spaces when reset because devices will still use
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index be7b30a..8668771 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -85,6 +85,12 @@
 #define DMAR_MTRRCAP_REG_HI     0x104
 #define DMAR_MTRRDEF_REG        0x108 /* MTRR default type */
 #define DMAR_MTRRDEF_REG_HI     0x10c
+#define DMAR_VCCAP_REG          0xE00 /* Virtual Command Capability Register */
+#define DMAR_VCCAP_REG_HI       0xE04
+#define DMAR_VCMD_REG           0xE10 /* Virtual Command Register */
+#define DMAR_VCMD_REG_HI        0xE14
+#define DMAR_VCRSP_REG          0xE20 /* Virtual Command Reponse Register */
+#define DMAR_VCRSP_REG_HI       0xE24
 
 /* IOTLB registers */
 #define DMAR_IOTLB_REG_OFFSET   0xf0 /* Offset to the IOTLB registers */
@@ -193,6 +199,7 @@
 #define VTD_ECAP_PSS                (19ULL << 35)
 #define VTD_ECAP_PASID              (1ULL << 40)
 #define VTD_ECAP_SMTS               (1ULL << 43)
+#define VTD_ECAP_VCS                (1ULL << 44)
 #define VTD_ECAP_SLTS               (1ULL << 46)
 #define VTD_ECAP_FLTS               (1ULL << 47)
 
@@ -315,6 +322,37 @@ typedef enum VTDFaultReason {
 
 #define VTD_CONTEXT_CACHE_GEN_MAX       0xffffffffUL
 
+/* VCCAP_REG */
+#define VTD_VCCAP_PAS               (1UL << 0)
+
+/*
+ * The basic idea is to let hypervisor to set a range for available
+ * PASIDs for VMs. One of the reasons is PASID #0 is reserved by
+ * RID_PASID usage. We have no idea how many reserved PASIDs in future,
+ * so here just an evaluated value. Honestly, set it as "1" is enough
+ * at current stage.
+ */
+#define VTD_MIN_HPASID              1
+#define VTD_MAX_HPASID              0xFFFFF
+
+/* Virtual Command Register */
+enum {
+     VTD_VCMD_NULL_CMD = 0,
+     VTD_VCMD_ALLOC_PASID = 1,
+     VTD_VCMD_FREE_PASID = 2,
+     VTD_VCMD_CMD_NUM,
+};
+
+#define VTD_VCMD_CMD_MASK           0xffUL
+#define VTD_VCMD_PASID_VALUE(val)   (((val) >> 8) & 0xfffff)
+
+#define VTD_VCRSP_RSLT(val)         ((val) << 8)
+#define VTD_VCRSP_SC(val)           (((val) & 0x3) << 1)
+
+#define VTD_VCMD_UNDEFINED_CMD         1ULL
+#define VTD_VCMD_NO_AVAILABLE_PASID    2ULL
+#define VTD_VCMD_FREE_INVALID_PASID    2ULL
+
 /* Interrupt Entry Cache Invalidation Descriptor: VT-d 6.5.2.7. */
 struct VTDInvDescIEC {
     uint32_t type:4;            /* Should always be 0x4 */
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index c8bc464..43c0314 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -51,6 +51,7 @@ vtd_reg_write_gcmd(uint32_t status, uint32_t val) "status 0x%"PRIx32" value 0x%"
 vtd_reg_write_fectl(uint32_t value) "value 0x%"PRIx32
 vtd_reg_write_iectl(uint32_t value) "value 0x%"PRIx32
 vtd_reg_ics_clear_ip(void) ""
+vtd_reg_write_vcmd(uint32_t status, uint32_t val) "status 0x%"PRIx32" value 0x%"PRIx32
 vtd_dmar_translate(uint8_t bus, uint8_t slot, uint8_t func, uint64_t iova, uint64_t gpa, uint64_t mask) "dev %02x:%02x.%02x iova 0x%"PRIx64" -> gpa 0x%"PRIx64" mask 0x%"PRIx64
 vtd_dmar_enable(bool en) "enable %d"
 vtd_dmar_fault(uint16_t sid, int fault, uint64_t addr, bool is_write) "sid 0x%"PRIx16" fault %d addr 0x%"PRIx64" write %d"
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 1c580c1..0d49480 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -46,7 +46,7 @@
 #define VTD_SID_TO_BUS(sid)         (((sid) >> 8) & 0xff)
 #define VTD_SID_TO_DEVFN(sid)       ((sid) & 0xff)
 
-#define DMAR_REG_SIZE               0x230
+#define DMAR_REG_SIZE               0xF00
 #define VTD_HOST_AW_39BIT           39
 #define VTD_HOST_AW_48BIT           48
 #define VTD_HOST_ADDRESS_WIDTH      VTD_HOST_AW_39BIT
@@ -282,6 +282,10 @@ struct IntelIOMMUState {
     uint8_t aw_bits;                /* Host/IOVA address width (in bits) */
     bool dma_drain;                 /* Whether DMA r/w draining enabled */
 
+    /* Virtual Command Register */
+    uint64_t vccap;                 /* The value of vcmd capability reg */
+    uint64_t vcrsp;                 /* Current value of VCMD RSP REG */
+
     /*
      * Protects IOMMU states in general.  Currently it protects the
      * per-IOMMU IOTLB cache, and context entry cache in VTDAddressSpace.
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC v2 11/22] intel_iommu: process pasid cache invalidation
  2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (9 preceding siblings ...)
  2019-10-24 12:34 ` [RFC v2 10/22] intel_iommu: add virtual command capability support Liu Yi L
@ 2019-10-24 12:34 ` Liu Yi L
  2019-11-02 16:05   ` Peter Xu
  2019-10-24 12:34 ` [RFC v2 12/22] intel_iommu: add present bit check for pasid table entries Liu Yi L
                   ` (14 subsequent siblings)
  25 siblings, 1 reply; 79+ messages in thread
From: Liu Yi L @ 2019-10-24 12:34 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: tianyu.lan, kevin.tian, yi.l.liu, Yi Sun, kvm, jun.j.tian,
	eric.auger, yi.y.sun, jacob.jun.pan, david

This patch adds PASID cache invalidation handling. When guest enabled
PASID usages (e.g. SVA), guest software should issue a proper PASID
cache invalidation when caching-mode is exposed. This patch only adds
the draft handling of pasid cache invalidation. Detailed handling will
be added in subsequent patches.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/i386/intel_iommu.c          | 66 ++++++++++++++++++++++++++++++++++++++----
 hw/i386/intel_iommu_internal.h | 12 ++++++++
 hw/i386/trace-events           |  3 ++
 3 files changed, 76 insertions(+), 5 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 88b843f..84ff6f0 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2335,6 +2335,63 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
     return true;
 }
 
+static int vtd_pasid_cache_dsi(IntelIOMMUState *s, uint16_t domain_id)
+{
+    return 0;
+}
+
+static int vtd_pasid_cache_psi(IntelIOMMUState *s,
+                               uint16_t domain_id, uint32_t pasid)
+{
+    return 0;
+}
+
+static int vtd_pasid_cache_gsi(IntelIOMMUState *s)
+{
+    return 0;
+}
+
+static bool vtd_process_pasid_desc(IntelIOMMUState *s,
+                                   VTDInvDesc *inv_desc)
+{
+    uint16_t domain_id;
+    uint32_t pasid;
+    int ret = 0;
+
+    if ((inv_desc->val[0] & VTD_INV_DESC_PASIDC_RSVD_VAL0) ||
+        (inv_desc->val[1] & VTD_INV_DESC_PASIDC_RSVD_VAL1) ||
+        (inv_desc->val[2] & VTD_INV_DESC_PASIDC_RSVD_VAL2) ||
+        (inv_desc->val[3] & VTD_INV_DESC_PASIDC_RSVD_VAL3)) {
+        error_report_once("non-zero-field-in-pc_inv_desc hi: 0x%" PRIx64
+                  " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]);
+        return false;
+    }
+
+    domain_id = VTD_INV_DESC_PASIDC_DID(inv_desc->val[0]);
+    pasid = VTD_INV_DESC_PASIDC_PASID(inv_desc->val[0]);
+
+    switch (inv_desc->val[0] & VTD_INV_DESC_PASIDC_G) {
+    case VTD_INV_DESC_PASIDC_DSI:
+        ret = vtd_pasid_cache_dsi(s, domain_id);
+        break;
+
+    case VTD_INV_DESC_PASIDC_PASID_SI:
+        ret = vtd_pasid_cache_psi(s, domain_id, pasid);
+        break;
+
+    case VTD_INV_DESC_PASIDC_GLOBAL:
+        ret = vtd_pasid_cache_gsi(s);
+        break;
+
+    default:
+        error_report_once("invalid-inv-granu-in-pc_inv_desc hi: 0x%" PRIx64
+                  " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]);
+        return false;
+    }
+
+    return (ret == 0) ? true : false;
+}
+
 static bool vtd_process_inv_iec_desc(IntelIOMMUState *s,
                                      VTDInvDesc *inv_desc)
 {
@@ -2441,12 +2498,11 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
         }
         break;
 
-    /*
-     * TODO: the entity of below two cases will be implemented in future series.
-     * To make guest (which integrates scalable mode support patch set in
-     * iommu driver) work, just return true is enough so far.
-     */
     case VTD_INV_DESC_PC:
+        trace_vtd_inv_desc("pasid-cache", inv_desc.val[1], inv_desc.val[0]);
+        if (!vtd_process_pasid_desc(s, &inv_desc)) {
+            return false;
+        }
         break;
 
     case VTD_INV_DESC_PIOTLB:
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 8668771..c6cb28b 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -445,6 +445,18 @@ typedef union VTDInvDesc VTDInvDesc;
 #define VTD_SPTE_LPAGE_L4_RSVD_MASK(aw) \
         (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM))
 
+#define VTD_INV_DESC_PASIDC_G          (3ULL << 4)
+#define VTD_INV_DESC_PASIDC_PASID(val) (((val) >> 32) & 0xfffffULL)
+#define VTD_INV_DESC_PASIDC_DID(val)   (((val) >> 16) & VTD_DOMAIN_ID_MASK)
+#define VTD_INV_DESC_PASIDC_RSVD_VAL0  0xfff000000000ffc0ULL
+#define VTD_INV_DESC_PASIDC_RSVD_VAL1  0xffffffffffffffffULL
+#define VTD_INV_DESC_PASIDC_RSVD_VAL2  0xffffffffffffffffULL
+#define VTD_INV_DESC_PASIDC_RSVD_VAL3  0xffffffffffffffffULL
+
+#define VTD_INV_DESC_PASIDC_DSI        (0ULL << 4)
+#define VTD_INV_DESC_PASIDC_PASID_SI   (1ULL << 4)
+#define VTD_INV_DESC_PASIDC_GLOBAL     (3ULL << 4)
+
 /* Information about page-selective IOTLB invalidate */
 struct VTDIOTLBPageInvInfo {
     uint16_t domain_id;
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index 43c0314..6da8bd2 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -22,6 +22,9 @@ vtd_inv_qi_head(uint16_t head) "read head %d"
 vtd_inv_qi_tail(uint16_t head) "write tail %d"
 vtd_inv_qi_fetch(void) ""
 vtd_context_cache_reset(void) ""
+vtd_pasid_cache_gsi(void) ""
+vtd_pasid_cache_dsi(uint16_t domain) "Domian slective PC invalidation domain 0x%"PRIx16
+vtd_pasid_cache_psi(uint16_t domain, uint32_t pasid) "PASID slective PC invalidation domain 0x%"PRIx16" pasid 0x%"PRIx32
 vtd_re_not_present(uint8_t bus) "Root entry bus %"PRIu8" not present"
 vtd_ce_not_present(uint8_t bus, uint8_t devfn) "Context entry bus %"PRIu8" devfn %"PRIu8" not present"
 vtd_iotlb_page_hit(uint16_t sid, uint64_t addr, uint64_t slpte, uint16_t domain) "IOTLB page hit sid 0x%"PRIx16" iova 0x%"PRIx64" slpte 0x%"PRIx64" domain 0x%"PRIx16
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC v2 12/22] intel_iommu: add present bit check for pasid table entries
  2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (10 preceding siblings ...)
  2019-10-24 12:34 ` [RFC v2 11/22] intel_iommu: process pasid cache invalidation Liu Yi L
@ 2019-10-24 12:34 ` Liu Yi L
  2019-11-02 16:20   ` Peter Xu
  2019-10-24 12:34 ` [RFC v2 13/22] intel_iommu: add PASID cache management infrastructure Liu Yi L
                   ` (13 subsequent siblings)
  25 siblings, 1 reply; 79+ messages in thread
From: Liu Yi L @ 2019-10-24 12:34 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: tianyu.lan, kevin.tian, yi.l.liu, Yi Sun, kvm, jun.j.tian,
	eric.auger, yi.y.sun, jacob.jun.pan, david

The present bit check for pasid entry (pe) and pasid directory
entry (pdire) were missed in previous commits as fpd bit check
doesn't require present bit as "Set". This patch adds the present
bit check for callers which wants to get a valid pe/pdire.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/i386/intel_iommu.c          | 92 +++++++++++++++++++++++++++++++++---------
 hw/i386/intel_iommu_internal.h |  1 +
 2 files changed, 74 insertions(+), 19 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 84ff6f0..90b8f6c 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -686,9 +686,18 @@ static inline bool vtd_pe_type_check(X86IOMMUState *x86_iommu,
     return true;
 }
 
-static int vtd_get_pasid_dire(dma_addr_t pasid_dir_base,
-                              uint32_t pasid,
-                              VTDPASIDDirEntry *pdire)
+static inline bool vtd_pdire_present(VTDPASIDDirEntry *pdire)
+{
+    return pdire->val & 1;
+}
+
+/**
+ * Caller of this function should check present bit if wants
+ * to use pdir entry for futher usage except for fpd bit check.
+ */
+static int vtd_get_pdire_from_pdir_table(dma_addr_t pasid_dir_base,
+                                         uint32_t pasid,
+                                         VTDPASIDDirEntry *pdire)
 {
     uint32_t index;
     dma_addr_t addr, entry_size;
@@ -703,18 +712,22 @@ static int vtd_get_pasid_dire(dma_addr_t pasid_dir_base,
     return 0;
 }
 
-static int vtd_get_pasid_entry(IntelIOMMUState *s,
-                               uint32_t pasid,
-                               VTDPASIDDirEntry *pdire,
-                               VTDPASIDEntry *pe)
+static inline bool vtd_pe_present(VTDPASIDEntry *pe)
+{
+    return pe->val[0] & VTD_PASID_ENTRY_P;
+}
+
+static int vtd_get_pe_in_pasid_leaf_table(IntelIOMMUState *s,
+                                          uint32_t pasid,
+                                          dma_addr_t addr,
+                                          VTDPASIDEntry *pe)
 {
     uint32_t index;
-    dma_addr_t addr, entry_size;
+    dma_addr_t entry_size;
     X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
 
     index = VTD_PASID_TABLE_INDEX(pasid);
     entry_size = VTD_PASID_ENTRY_SIZE;
-    addr = pdire->val & VTD_PASID_TABLE_BASE_ADDR_MASK;
     addr = addr + index * entry_size;
     if (dma_memory_read(&address_space_memory, addr, pe, entry_size)) {
         return -VTD_FR_PASID_TABLE_INV;
@@ -732,25 +745,54 @@ static int vtd_get_pasid_entry(IntelIOMMUState *s,
     return 0;
 }
 
-static int vtd_get_pasid_entry_from_pasid(IntelIOMMUState *s,
-                                          dma_addr_t pasid_dir_base,
-                                          uint32_t pasid,
-                                          VTDPASIDEntry *pe)
+/**
+ * Caller of this function should check present bit if wants
+ * to use pasid entry for futher usage except for fpd bit check.
+ */
+static int vtd_get_pe_from_pdire(IntelIOMMUState *s,
+                                 uint32_t pasid,
+                                 VTDPASIDDirEntry *pdire,
+                                 VTDPASIDEntry *pe)
+{
+    dma_addr_t addr = pdire->val & VTD_PASID_TABLE_BASE_ADDR_MASK;
+
+    return vtd_get_pe_in_pasid_leaf_table(s, pasid, addr, pe);
+}
+
+/**
+ * This function gets a pasid entry from a specified pasid
+ * table (includes dir and leaf table) with a specified pasid.
+ * Sanity check should be done to ensure return a present
+ * pasid entry to caller.
+ */
+static int vtd_get_pe_from_pasid_table(IntelIOMMUState *s,
+                                       dma_addr_t pasid_dir_base,
+                                       uint32_t pasid,
+                                       VTDPASIDEntry *pe)
 {
     int ret;
     VTDPASIDDirEntry pdire;
 
-    ret = vtd_get_pasid_dire(pasid_dir_base, pasid, &pdire);
+    ret = vtd_get_pdire_from_pdir_table(pasid_dir_base,
+                                        pasid, &pdire);
     if (ret) {
         return ret;
     }
 
-    ret = vtd_get_pasid_entry(s, pasid, &pdire, pe);
+    if (!vtd_pdire_present(&pdire)) {
+        return -VTD_FR_PASID_TABLE_INV;
+    }
+
+    ret = vtd_get_pe_from_pdire(s, pasid, &pdire, pe);
     if (ret) {
         return ret;
     }
 
-    return ret;
+    if (!vtd_pe_present(pe)) {
+        return -VTD_FR_PASID_TABLE_INV;
+    }
+
+    return 0;
 }
 
 static int vtd_ce_get_rid2pasid_entry(IntelIOMMUState *s,
@@ -763,7 +805,7 @@ static int vtd_ce_get_rid2pasid_entry(IntelIOMMUState *s,
 
     pasid = VTD_CE_GET_RID2PASID(ce);
     pasid_dir_base = VTD_CE_GET_PASID_DIR_TABLE(ce);
-    ret = vtd_get_pasid_entry_from_pasid(s, pasid_dir_base, pasid, pe);
+    ret = vtd_get_pe_from_pasid_table(s, pasid_dir_base, pasid, pe);
 
     return ret;
 }
@@ -781,7 +823,11 @@ static int vtd_ce_get_pasid_fpd(IntelIOMMUState *s,
     pasid = VTD_CE_GET_RID2PASID(ce);
     pasid_dir_base = VTD_CE_GET_PASID_DIR_TABLE(ce);
 
-    ret = vtd_get_pasid_dire(pasid_dir_base, pasid, &pdire);
+    /*
+     * No present bit check since fpd is meaningful even
+     * if the present bit is clear.
+     */
+    ret = vtd_get_pdire_from_pdir_table(pasid_dir_base, pasid, &pdire);
     if (ret) {
         return ret;
     }
@@ -791,7 +837,15 @@ static int vtd_ce_get_pasid_fpd(IntelIOMMUState *s,
         return 0;
     }
 
-    ret = vtd_get_pasid_entry(s, pasid, &pdire, &pe);
+    if (!vtd_pdire_present(&pdire)) {
+        return -VTD_FR_PASID_TABLE_INV;
+    }
+
+    /*
+     * No present bit check since fpd is meaningful even
+     * if the present bit is clear.
+     */
+    ret = vtd_get_pe_from_pdire(s, pasid, &pdire, &pe);
     if (ret) {
         return ret;
     }
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index c6cb28b..879211e 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -529,6 +529,7 @@ typedef struct VTDRootEntry VTDRootEntry;
 #define VTD_PASID_ENTRY_FPD           (1ULL << 1) /* Fault Processing Disable */
 
 /* PASID Granular Translation Type Mask */
+#define VTD_PASID_ENTRY_P              1ULL
 #define VTD_SM_PASID_ENTRY_PGTT        (7ULL << 6)
 #define VTD_SM_PASID_ENTRY_FLT         (1ULL << 6)
 #define VTD_SM_PASID_ENTRY_SLT         (2ULL << 6)
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC v2 13/22] intel_iommu: add PASID cache management infrastructure
  2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (11 preceding siblings ...)
  2019-10-24 12:34 ` [RFC v2 12/22] intel_iommu: add present bit check for pasid table entries Liu Yi L
@ 2019-10-24 12:34 ` Liu Yi L
  2019-11-04 17:08   ` Peter Xu
  2019-11-04 20:06   ` Peter Xu
  2019-10-24 12:34 ` [RFC v2 14/22] vfio/pci: add iommu_context notifier for pasid bind/unbind Liu Yi L
                   ` (12 subsequent siblings)
  25 siblings, 2 replies; 79+ messages in thread
From: Liu Yi L @ 2019-10-24 12:34 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: tianyu.lan, kevin.tian, yi.l.liu, Yi Sun, kvm, jun.j.tian,
	eric.auger, yi.y.sun, jacob.jun.pan, david

This patch adds a PASID cache management infrastructure based on
new added structure VTDPASIDAddressSpace, which is used to track
the PASID usage and future PASID tagged DMA address translation
support in vIOMMU.

    struct VTDPASIDAddressSpace {
        VTDBus *vtd_bus;
        uint8_t devfn;
        AddressSpace as;
        uint32_t pasid;
        IntelIOMMUState *iommu_state;
        VTDContextCacheEntry context_cache_entry;
        QLIST_ENTRY(VTDPASIDAddressSpace) next;
        VTDPASIDCacheEntry pasid_cache_entry;
    };

Ideally, a VTDPASIDAddressSpace instance is created when a PASID
is bound with a DMA AddressSpace. Intel VT-d spec requires guest
software to issue pasid cache invalidation when bind or unbind a
pasid with an address space under caching-mode. However, as
VTDPASIDAddressSpace instances also act as pasid cache in this
implementation, its creation also happens during vIOMMU PASID
tagged DMA translation. The creation in this path will not be
added in this patch since no PASID-capable emulated devices for
now.

The implementation in this patch manages VTDPASIDAddressSpace
instances per PASID+BDF (lookup and insert will use PASID and
BDF) since Intel VT-d spec allows per-BDF PASID Table. When a
guest bind a PASID with an AddressSpace, QEMU will capture the
guest pasid selective pasid cache invalidation, and allocate
remove a VTDPASIDAddressSpace instance per the invalidation
reasons:

    *) a present pasid entry moved to non-present
    *) a present pasid entry to be a present entry
    *) a non-present pasid entry moved to present

vIOMMU emulator could figure out the reason by fetching latest
guest pasid entry.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/i386/intel_iommu.c          | 356 +++++++++++++++++++++++++++++++++++++++++
 hw/i386/intel_iommu_internal.h |  10 ++
 hw/i386/trace-events           |   1 +
 include/hw/i386/intel_iommu.h  |  36 ++++-
 4 files changed, 402 insertions(+), 1 deletion(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 90b8f6c..d8827c9 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -40,6 +40,7 @@
 #include "kvm_i386.h"
 #include "migration/vmstate.h"
 #include "trace.h"
+#include "qemu/jhash.h"
 
 /* context entry operations */
 #define VTD_CE_GET_RID2PASID(ce) \
@@ -65,6 +66,8 @@
 static void vtd_address_space_refresh_all(IntelIOMMUState *s);
 static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n);
 
+static void vtd_pasid_cache_reset(IntelIOMMUState *s);
+
 static void vtd_panic_require_caching_mode(void)
 {
     error_report("We need to set caching-mode=on for intel-iommu to enable "
@@ -276,6 +279,7 @@ static void vtd_reset_caches(IntelIOMMUState *s)
     vtd_iommu_lock(s);
     vtd_reset_iotlb_locked(s);
     vtd_reset_context_cache_locked(s);
+    vtd_pasid_cache_reset(s);
     vtd_iommu_unlock(s);
 }
 
@@ -686,6 +690,11 @@ static inline bool vtd_pe_type_check(X86IOMMUState *x86_iommu,
     return true;
 }
 
+static inline uint16_t vtd_pe_get_domain_id(VTDPASIDEntry *pe)
+{
+    return VTD_SM_PASID_ENTRY_DID((pe)->val[1]);
+}
+
 static inline bool vtd_pdire_present(VTDPASIDDirEntry *pdire)
 {
     return pdire->val & 1;
@@ -2389,19 +2398,361 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
     return true;
 }
 
+static inline struct pasid_key *vtd_get_pasid_key(uint32_t pasid,
+                                                  uint16_t sid)
+{
+    struct pasid_key *key = g_malloc0(sizeof(*key));
+    key->pasid = pasid;
+    key->sid = sid;
+    return key;
+}
+
+static guint vtd_pasid_as_key_hash(gconstpointer v)
+{
+    struct pasid_key *key = (struct pasid_key *)v;
+    uint32_t a, b, c;
+
+    /* Jenkins hash */
+    a = b = c = JHASH_INITVAL + sizeof(*key);
+    a += key->sid;
+    b += extract32(key->pasid, 0, 16);
+    c += extract32(key->pasid, 16, 16);
+
+    __jhash_mix(a, b, c);
+    __jhash_final(a, b, c);
+
+    return c;
+}
+
+static gboolean vtd_pasid_as_key_equal(gconstpointer v1, gconstpointer v2)
+{
+    const struct pasid_key *k1 = v1;
+    const struct pasid_key *k2 = v2;
+
+    return (k1->pasid == k2->pasid) && (k1->sid == k2->sid);
+}
+
+static inline bool vtd_pc_is_dom_si(struct VTDPASIDCacheInfo *pc_info)
+{
+    return pc_info->flags & VTD_PASID_CACHE_DOMSI;
+}
+
+static inline bool vtd_pc_is_pasid_si(struct VTDPASIDCacheInfo *pc_info)
+{
+    return pc_info->flags & VTD_PASID_CACHE_PASIDSI;
+}
+
+static inline int vtd_dev_get_pe_from_pasid(IntelIOMMUState *s,
+                                            uint8_t bus_num,
+                                            uint8_t devfn,
+                                            uint32_t pasid,
+                                            VTDPASIDEntry *pe)
+{
+    VTDContextEntry ce;
+    int ret;
+    dma_addr_t pasid_dir_base;
+
+    if (!s->root_scalable) {
+        return -VTD_FR_PASID_TABLE_INV;
+    }
+
+    ret = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
+    if (ret) {
+        return ret;
+    }
+
+    pasid_dir_base = VTD_CE_GET_PASID_DIR_TABLE(&ce);
+    ret = vtd_get_pe_from_pasid_table(s,
+                                  pasid_dir_base, pasid, pe);
+
+    return ret;
+}
+
+static bool vtd_pasid_entry_compare(VTDPASIDEntry *p1, VTDPASIDEntry *p2)
+{
+    int i = 0;
+    while (i < sizeof(*p1) / sizeof(p1->val)) {
+        if (p1->val[i] != p2->val[i]) {
+            return false;
+        }
+        i++;
+    }
+    return true;
+}
+
+/**
+ * This function is used to clear pasid_cache_gen of cached pasid
+ * entry in vtd_pasid_as instances. Caller of this function should
+ * hold iommu_lock.
+ */
+static gboolean vtd_flush_pasid(gpointer key, gpointer value,
+                                gpointer user_data)
+{
+    VTDPASIDCacheInfo *pc_info = user_data;
+    VTDPASIDAddressSpace *vtd_pasid_as = value;
+    IntelIOMMUState *s = vtd_pasid_as->iommu_state;
+    VTDPASIDCacheEntry *pc_entry = &vtd_pasid_as->pasid_cache_entry;
+    VTDBus *vtd_bus = vtd_pasid_as->vtd_bus;
+    VTDPASIDEntry pe;
+    uint16_t did;
+    uint32_t pasid;
+    uint16_t devfn;
+    gboolean remove = false;
+
+    did = vtd_pe_get_domain_id(&pc_entry->pasid_entry);
+    pasid = vtd_pasid_as->pasid;
+    devfn = vtd_pasid_as->devfn;
+
+    if (pc_entry->pasid_cache_gen &&
+        (vtd_pc_is_dom_si(pc_info) ? (pc_info->domain_id == did) : 1) &&
+        (vtd_pc_is_pasid_si(pc_info) ? (pc_info->pasid == pasid) : 1)) {
+        /*
+         * Modify pasid_cache_gen to be 0, the cached pasid entry in
+         * vtd_pasid_as instance is invalid. And vtd_pasid_as instance
+         * would be treated as invalid in QEMU scope until the pasid
+         * cache gen is updated in a new pasid binding or updated in
+         * below logic if found guest pasid entry exists.
+         */
+        remove = true;
+        pc_entry->pasid_cache_gen = 0;
+        if (vtd_bus->dev_ic[devfn]) {
+            if (!vtd_dev_get_pe_from_pasid(s,
+                      pci_bus_num(vtd_bus->bus), devfn, pasid, &pe)) {
+                /*
+                 * pasid entry exists, so keep the vtd_pasid_as, and needs
+                 * update the pasid entry cached in vtd_pasid_as. Also, if
+                 * the guest pasid entry doesn't equal to cached pasid entry
+                 * needs to issue a pasid bind to host for passthru devies.
+                 */
+                remove = false;
+                pc_entry->pasid_cache_gen = s->pasid_cache_gen;
+                if (!vtd_pasid_entry_compare(&pe, &pc_entry->pasid_entry)) {
+                    pc_entry->pasid_entry = pe;
+                    /*
+                     * TODO: when pasid-base-iotlb(piotlb) infrastructure is
+                     * ready, should invalidate QEMU piotlb togehter with this
+                     * change.
+                     */
+                }
+            }
+        }
+    }
+
+    return remove;
+}
+
 static int vtd_pasid_cache_dsi(IntelIOMMUState *s, uint16_t domain_id)
 {
+    VTDPASIDCacheInfo pc_info;
+
+    trace_vtd_pasid_cache_dsi(domain_id);
+
+    pc_info.flags = VTD_PASID_CACHE_DOMSI;
+    pc_info.domain_id = domain_id;
+
+    /*
+     * Loop all existing pasid caches and update them.
+     */
+    vtd_iommu_lock(s);
+    g_hash_table_foreach_remove(s->vtd_pasid_as, vtd_flush_pasid, &pc_info);
+    vtd_iommu_unlock(s);
+
+    /*
+     * TODO: Domain selective PASID cache invalidation
+     * may be issued wrongly by programmer, to be safe,
+     * after invalidating the pasid caches, emulator
+     * needs to replay the pasid bindings by walking guest
+     * pasid dir and pasid table.
+     */
     return 0;
 }
 
+/**
+ * This function finds or adds a VTDPASIDAddressSpace for a device
+ * when it is bound to a pasid. Caller of this function should hold
+ * iommu_lock.
+ */
+static VTDPASIDAddressSpace *vtd_add_find_pasid_as(IntelIOMMUState *s,
+                                                   VTDBus *vtd_bus,
+                                                   int devfn,
+                                                   uint32_t pasid,
+                                                   bool allocate)
+{
+    struct pasid_key *key;
+    struct pasid_key *new_key;
+    VTDPASIDAddressSpace *vtd_pasid_as;
+    uint16_t sid;
+
+    sid = vtd_make_source_id(pci_bus_num(vtd_bus->bus), devfn);
+    key = vtd_get_pasid_key(pasid, sid);
+    vtd_pasid_as = g_hash_table_lookup(s->vtd_pasid_as, key);
+
+    if (!vtd_pasid_as && allocate) {
+        new_key = vtd_get_pasid_key(pasid, sid);
+        /*
+         * Initiate the vtd_pasid_as structure.
+         *
+         * This structure here is used to track the guest pasid
+         * binding and also serves as pasid-cache mangement entry.
+         *
+         * TODO: in future, if wants to support the SVA-aware DMA
+         *       emulation, the vtd_pasid_as should be fully initialized.
+         *       e.g. the address_space and memory region fields.
+         */
+        vtd_pasid_as = g_malloc0(sizeof(VTDPASIDAddressSpace));
+        vtd_pasid_as->iommu_state = s;
+        vtd_pasid_as->vtd_bus = vtd_bus;
+        vtd_pasid_as->devfn = devfn;
+        vtd_pasid_as->context_cache_entry.context_cache_gen = 0;
+        vtd_pasid_as->pasid = pasid;
+        vtd_pasid_as->pasid_cache_entry.pasid_cache_gen = 0;
+        g_hash_table_insert(s->vtd_pasid_as, new_key, vtd_pasid_as);
+    }
+    return vtd_pasid_as;
+}
+
+ /**
+  * This function updates the pasid entry cached in &vtd_pasid_as.
+  * Caller of this function should hold iommu_lock.
+  */
+static inline void vtd_fill_in_pe_cache(
+              VTDPASIDAddressSpace *vtd_pasid_as, VTDPASIDEntry *pe)
+{
+    IntelIOMMUState *s = vtd_pasid_as->iommu_state;
+    VTDPASIDCacheEntry *pc_entry = &vtd_pasid_as->pasid_cache_entry;
+
+    pc_entry->pasid_entry = *pe;
+    pc_entry->pasid_cache_gen = s->pasid_cache_gen;
+}
+
 static int vtd_pasid_cache_psi(IntelIOMMUState *s,
                                uint16_t domain_id, uint32_t pasid)
 {
+    VTDPASIDCacheInfo pc_info;
+    VTDPASIDEntry pe;
+    VTDBus *vtd_bus;
+    int bus_n, devfn;
+    VTDPASIDAddressSpace *vtd_pasid_as;
+    VTDIOMMUContext *vtd_ic;
+
+    pc_info.flags = VTD_PASID_CACHE_DOMSI;
+    pc_info.domain_id = domain_id;
+    pc_info.flags |= VTD_PASID_CACHE_PASIDSI;
+    pc_info.pasid = pasid;
+
+    /*
+     * Regards to a pasid selective pasid cache invalidation (PSI), it
+     * could be either cases of below:
+     * a) a present pasid entry moved to non-present
+     * b) a present pasid entry to be a present entry
+     * c) a non-present pasid entry moved to present
+     *
+     * Here the handling of a PSI is:
+     * 1) loop all the exisitng vtd_pasid_as instances to update them
+     *    according to the latest guest pasid entry in pasid table.
+     *    this will make sure affected existing vtd_pasid_as instances
+     *    cached the latest pasid entries. Also, during the loop, the
+     *    host should be notified if needed. e.g. pasid unbind or pasid
+     *    update. Should be able to cover case a) and case b).
+     *
+     * 2) loop all devices to cover case c)
+     *    However, it is not good to always loop all devices. In this
+     *    implementation. We do it in this ways:
+     *    - For devices which have VTDIOMMUContext instances, we loop
+     *      them and check if guest pasid entry exists. If yes, it is
+     *      case c), we update the pasid cache and also notify host.
+     *    - For devices which have no VTDIOMMUContext instances, it is
+     *      not necessary to create pasid cache at this phase since it
+     *      could be created when vIOMMU do DMA address translation.
+     *      This is not implemented yet since no PASID-capable emulated
+     *      devices today. If we have it in future, the pasid cache shall
+     *      be created there.
+     */
+
+    vtd_iommu_lock(s);
+    g_hash_table_foreach_remove(s->vtd_pasid_as, vtd_flush_pasid, &pc_info);
+    vtd_iommu_unlock(s);
+
+    vtd_iommu_lock(s);
+    QLIST_FOREACH(vtd_ic, &s->vtd_dev_ic_list, next) {
+        vtd_bus = vtd_ic->vtd_bus;
+        devfn = vtd_ic->devfn;
+        bus_n = pci_bus_num(vtd_bus->bus);
+
+        /* Step 1: fetch vtd_pasid_as and check if it is valid */
+        vtd_pasid_as = vtd_add_find_pasid_as(s, vtd_bus,
+                                        devfn, pasid, true);
+        if (vtd_pasid_as &&
+            (s->pasid_cache_gen ==
+             vtd_pasid_as->pasid_cache_entry.pasid_cache_gen)) {
+            /*
+             * pasid_cache_gen equals to s->pasid_cache_gen means
+             * vtd_pasid_as is valid after the above s->vtd_pasid_as
+             * updates. Thus no need for the below steps.
+             */
+            continue;
+        }
+
+        /*
+         * Step 2: vtd_pasid_as is not valid, it's potentailly a
+         * new pasid bind. Fetch guest pasid entry.
+         */
+        if (vtd_dev_get_pe_from_pasid(s, bus_n, devfn, pasid, &pe)) {
+            continue;
+        }
+
+        /*
+         * Step 3: pasid entry exists, update pasid cache
+         *
+         * Here need to check domain ID since guest pasid entry
+         * exists. What needs to do are:
+         *   - update the pc_entry in the vtd_pasid_as
+         *   - set proper pc_entry.pasid_cache_gen
+         *   - passdown the latest guest pasid entry config to host
+         *     (will be added in later patch)
+         */
+        if (domain_id == vtd_pe_get_domain_id(&pe)) {
+            vtd_fill_in_pe_cache(vtd_pasid_as, &pe);
+        }
+    }
+    vtd_iommu_unlock(s);
     return 0;
 }
 
+/**
+ * Caller of this function should hold iommu_lock
+ */
+static void vtd_pasid_cache_reset(IntelIOMMUState *s)
+{
+    VTDPASIDCacheInfo pc_info;
+
+    trace_vtd_pasid_cache_reset();
+
+    pc_info.flags = 0;
+
+    /*
+     * Reset pasid cache is a big hammer, so use g_hash_table_foreach_remove
+     * which will free the vtd_pasid_as instances.
+     */
+    g_hash_table_foreach_remove(s->vtd_pasid_as, vtd_flush_pasid, &pc_info);
+    s->pasid_cache_gen = 1;
+}
+
 static int vtd_pasid_cache_gsi(IntelIOMMUState *s)
 {
+    trace_vtd_pasid_cache_gsi();
+
+    vtd_iommu_lock(s);
+    vtd_pasid_cache_reset(s);
+    vtd_iommu_unlock(s);
+
+    /*
+     * TODO: Global PASID cache invalidation may be
+     * issued wrongly by programmer, to be safe, after
+     * invalidating the pasid caches, emulator needs
+     * to replay the pasid bindings by walking guest
+     * pasid dir and pasid table.
+     */
     return 0;
 }
 
@@ -3660,7 +4011,9 @@ VTDIOMMUContext *vtd_find_add_ic(IntelIOMMUState *s,
         vtd_dev_ic->devfn = (uint8_t)devfn;
         vtd_dev_ic->iommu_state = s;
         iommu_context_init(&vtd_dev_ic->iommu_context);
+        QLIST_INSERT_HEAD(&s->vtd_dev_ic_list, vtd_dev_ic, next);
     }
+
     return vtd_dev_ic;
 }
 
@@ -4074,6 +4427,7 @@ static void vtd_realize(DeviceState *dev, Error **errp)
     }
 
     QLIST_INIT(&s->vtd_as_with_notifiers);
+    QLIST_INIT(&s->vtd_dev_ic_list);
     qemu_mutex_init(&s->iommu_lock);
     memset(s->vtd_as_by_bus_num, 0, sizeof(s->vtd_as_by_bus_num));
     memory_region_init_io(&s->csrmem, OBJECT(s), &vtd_mem_ops, s,
@@ -4099,6 +4453,8 @@ static void vtd_realize(DeviceState *dev, Error **errp)
                                      g_free, g_free);
     s->vtd_as_by_busptr = g_hash_table_new_full(vtd_uint64_hash, vtd_uint64_equal,
                                               g_free, g_free);
+    s->vtd_pasid_as = g_hash_table_new_full(vtd_pasid_as_key_hash,
+                                   vtd_pasid_as_key_equal, g_free, g_free);
     vtd_init(s);
     sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, Q35_HOST_BRIDGE_IOMMU_ADDR);
     pci_setup_iommu(bus, &vtd_iommu_ops, dev);
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 879211e..12873e1 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -311,6 +311,7 @@ typedef enum VTDFaultReason {
     VTD_FR_IR_SID_ERR = 0x26,   /* Invalid Source-ID */
 
     VTD_FR_PASID_TABLE_INV = 0x58,  /*Invalid PASID table entry */
+    VTD_FR_PASID_ENTRY_P = 0x59, /* The Present(P) field of pasidt-entry is 0 */
 
     /* This is not a normal fault reason. We use this to indicate some faults
      * that are not referenced by the VT-d specification.
@@ -482,6 +483,15 @@ struct VTDRootEntry {
 };
 typedef struct VTDRootEntry VTDRootEntry;
 
+struct VTDPASIDCacheInfo {
+#define VTD_PASID_CACHE_DOMSI   (1ULL << 0);
+#define VTD_PASID_CACHE_PASIDSI (1ULL << 1);
+    uint32_t flags;
+    uint16_t domain_id;
+    uint32_t pasid;
+};
+typedef struct VTDPASIDCacheInfo VTDPASIDCacheInfo;
+
 /* Masks for struct VTDRootEntry */
 #define VTD_ROOT_ENTRY_P            1ULL
 #define VTD_ROOT_ENTRY_CTP          (~0xfffULL)
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index 6da8bd2..7912ae1 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -22,6 +22,7 @@ vtd_inv_qi_head(uint16_t head) "read head %d"
 vtd_inv_qi_tail(uint16_t head) "write tail %d"
 vtd_inv_qi_fetch(void) ""
 vtd_context_cache_reset(void) ""
+vtd_pasid_cache_reset(void) ""
 vtd_pasid_cache_gsi(void) ""
 vtd_pasid_cache_dsi(uint16_t domain) "Domian slective PC invalidation domain 0x%"PRIx16
 vtd_pasid_cache_psi(uint16_t domain, uint32_t pasid) "PASID slective PC invalidation domain 0x%"PRIx16" pasid 0x%"PRIx32
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 0d49480..d693f71 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -69,6 +69,8 @@ typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
 typedef struct VTDPASIDDirEntry VTDPASIDDirEntry;
 typedef struct VTDPASIDEntry VTDPASIDEntry;
 typedef struct VTDIOMMUContext VTDIOMMUContext;
+typedef struct VTDPASIDCacheEntry VTDPASIDCacheEntry;
+typedef struct VTDPASIDAddressSpace VTDPASIDAddressSpace;
 
 /* Context-Entry */
 struct VTDContextEntry {
@@ -101,6 +103,31 @@ struct VTDPASIDEntry {
     uint64_t val[8];
 };
 
+struct pasid_key {
+    uint32_t pasid;
+    uint16_t sid;
+};
+
+struct VTDPASIDCacheEntry {
+    /*
+     * The cache entry is obsolete if
+     * pasid_cache_gen!=IntelIOMMUState.pasid_cache_gen
+     */
+    uint32_t pasid_cache_gen;
+    struct VTDPASIDEntry pasid_entry;
+};
+
+struct VTDPASIDAddressSpace {
+    VTDBus *vtd_bus;
+    uint8_t devfn;
+    AddressSpace as;
+    uint32_t pasid;
+    IntelIOMMUState *iommu_state;
+    VTDContextCacheEntry context_cache_entry;
+    QLIST_ENTRY(VTDPASIDAddressSpace) next;
+    VTDPASIDCacheEntry pasid_cache_entry;
+};
+
 struct VTDAddressSpace {
     PCIBus *bus;
     uint8_t devfn;
@@ -121,6 +148,7 @@ struct VTDIOMMUContext {
     VTDBus *vtd_bus;
     uint8_t devfn;
     IOMMUContext iommu_context;
+    QLIST_ENTRY(VTDIOMMUContext) next;
     IntelIOMMUState *iommu_state;
 };
 
@@ -269,9 +297,14 @@ struct IntelIOMMUState {
 
     GHashTable *vtd_as_by_busptr;   /* VTDBus objects indexed by PCIBus* reference */
     VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */
+    GHashTable *vtd_pasid_as;   /* VTDPASIDAddressSpace objects */
+    uint32_t pasid_cache_gen;   /* Should be in [1,MAX] */
     /* list of registered notifiers */
     QLIST_HEAD(, VTDAddressSpace) vtd_as_with_notifiers;
 
+    /* list of registered notifiers */
+    QLIST_HEAD(, VTDIOMMUContext) vtd_dev_ic_list;
+
     /* interrupt remapping */
     bool intr_enabled;              /* Whether guest enabled IR */
     dma_addr_t intr_root;           /* Interrupt remapping table pointer */
@@ -288,7 +321,8 @@ struct IntelIOMMUState {
 
     /*
      * Protects IOMMU states in general.  Currently it protects the
-     * per-IOMMU IOTLB cache, and context entry cache in VTDAddressSpace.
+     * per-IOMMU IOTLB cache, and context entry cache in VTDAddressSpace,
+     * and pasid cache in VTDPASIDAddressSpace.
      */
     QemuMutex iommu_lock;
 };
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC v2 14/22] vfio/pci: add iommu_context notifier for pasid bind/unbind
  2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (12 preceding siblings ...)
  2019-10-24 12:34 ` [RFC v2 13/22] intel_iommu: add PASID cache management infrastructure Liu Yi L
@ 2019-10-24 12:34 ` Liu Yi L
  2019-11-04 16:02   ` David Gibson
  2019-10-24 12:34 ` [RFC v2 15/22] intel_iommu: bind/unbind guest page table to host Liu Yi L
                   ` (11 subsequent siblings)
  25 siblings, 1 reply; 79+ messages in thread
From: Liu Yi L @ 2019-10-24 12:34 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: tianyu.lan, kevin.tian, yi.l.liu, Yi Sun, kvm, jun.j.tian,
	eric.auger, yi.y.sun, jacob.jun.pan, david

This patch adds notifier for pasid bind/unbind. VFIO registers this
notifier to listen to the dual-stage translation (a.k.a. nested
translation) configuration changes and propagate to host. Thus vIOMMU
is able to set its translation structures to host.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/vfio/pci.c            | 39 +++++++++++++++++++++++++++++++++++++++
 include/hw/iommu/iommu.h | 11 +++++++++++
 2 files changed, 50 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 8721ff6..012b8ed 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2767,6 +2767,41 @@ static void vfio_iommu_pasid_free_notify(IOMMUCTXNotifier *n,
     pasid_req->free_result = ret;
 }
 
+static void vfio_iommu_pasid_bind_notify(IOMMUCTXNotifier *n,
+                                         IOMMUCTXEventData *event_data)
+{
+#ifdef __linux__
+    VFIOIOMMUContext *giommu_ctx = container_of(n, VFIOIOMMUContext, n);
+    VFIOContainer *container = giommu_ctx->container;
+    IOMMUCTXPASIDBindData *pasid_bind =
+                              (IOMMUCTXPASIDBindData *) event_data->data;
+    struct vfio_iommu_type1_bind *bind;
+    struct iommu_gpasid_bind_data *bind_data;
+    unsigned long argsz;
+
+    argsz = sizeof(*bind) + sizeof(*bind_data);
+    bind = g_malloc0(argsz);
+    bind->argsz = argsz;
+    bind->bind_type = VFIO_IOMMU_BIND_GUEST_PASID;
+    bind_data = (struct iommu_gpasid_bind_data *) &bind->data;
+    *bind_data = *pasid_bind->data;
+
+    if (pasid_bind->flag & IOMMU_CTX_BIND_PASID) {
+        if (ioctl(container->fd, VFIO_IOMMU_BIND, bind) != 0) {
+            error_report("%s: pasid (%llu:%llu) bind failed: %d", __func__,
+                         bind_data->gpasid, bind_data->hpasid, -errno);
+        }
+    } else if (pasid_bind->flag & IOMMU_CTX_UNBIND_PASID) {
+        if (ioctl(container->fd, VFIO_IOMMU_UNBIND, bind) != 0) {
+            error_report("%s: pasid (%llu:%llu) unbind failed: %d", __func__,
+                         bind_data->gpasid, bind_data->hpasid, -errno);
+        }
+    }
+
+    g_free(bind);
+#endif
+}
+
 static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
     VFIOPCIDevice *vdev = PCI_VFIO(pdev);
@@ -3079,6 +3114,10 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
                                          iommu_context,
                                          vfio_iommu_pasid_free_notify,
                                          IOMMU_CTX_EVENT_PASID_FREE);
+        vfio_register_iommu_ctx_notifier(vdev,
+                                         iommu_context,
+                                         vfio_iommu_pasid_bind_notify,
+                                         IOMMU_CTX_EVENT_PASID_BIND);
     }
 
     return;
diff --git a/include/hw/iommu/iommu.h b/include/hw/iommu/iommu.h
index 4352afd..4f21aa1 100644
--- a/include/hw/iommu/iommu.h
+++ b/include/hw/iommu/iommu.h
@@ -33,6 +33,7 @@ typedef struct IOMMUContext IOMMUContext;
 enum IOMMUCTXEvent {
     IOMMU_CTX_EVENT_PASID_ALLOC,
     IOMMU_CTX_EVENT_PASID_FREE,
+    IOMMU_CTX_EVENT_PASID_BIND,
     IOMMU_CTX_EVENT_NUM,
 };
 typedef enum IOMMUCTXEvent IOMMUCTXEvent;
@@ -50,6 +51,16 @@ union IOMMUCTXPASIDReqDesc {
 };
 typedef union IOMMUCTXPASIDReqDesc IOMMUCTXPASIDReqDesc;
 
+struct IOMMUCTXPASIDBindData {
+#define IOMMU_CTX_BIND_PASID   (1 << 0)
+#define IOMMU_CTX_UNBIND_PASID (1 << 1)
+    uint32_t flag;
+#ifdef __linux__
+    struct iommu_gpasid_bind_data *data;
+#endif
+};
+typedef struct IOMMUCTXPASIDBindData IOMMUCTXPASIDBindData;
+
 struct IOMMUCTXEventData {
     IOMMUCTXEvent event;
     uint64_t length;
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC v2 15/22] intel_iommu: bind/unbind guest page table to host
  2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (13 preceding siblings ...)
  2019-10-24 12:34 ` [RFC v2 14/22] vfio/pci: add iommu_context notifier for pasid bind/unbind Liu Yi L
@ 2019-10-24 12:34 ` Liu Yi L
  2019-11-04 20:25   ` Peter Xu
  2019-10-24 12:34 ` [RFC v2 16/22] intel_iommu: replay guest pasid bindings " Liu Yi L
                   ` (10 subsequent siblings)
  25 siblings, 1 reply; 79+ messages in thread
From: Liu Yi L @ 2019-10-24 12:34 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: tianyu.lan, kevin.tian, yi.l.liu, Yi Sun, kvm, jun.j.tian,
	eric.auger, yi.y.sun, jacob.jun.pan, david

This patch captures the guest PASID table entry modifications and
propagates the changes to host to setup nested translation. The
guest page table is configured as 1st level page table (GVA->GPA)
whose translation result would further go through host VT-d 2nd
level page table(GPA->HPA) under nested translation mode. This is
a key part of vSVA support.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/i386/intel_iommu.c          | 81 ++++++++++++++++++++++++++++++++++++++++++
 hw/i386/intel_iommu_internal.h | 20 +++++++++++
 2 files changed, 101 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index d8827c9..793b0de 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -41,6 +41,7 @@
 #include "migration/vmstate.h"
 #include "trace.h"
 #include "qemu/jhash.h"
+#include <linux/iommu.h>
 
 /* context entry operations */
 #define VTD_CE_GET_RID2PASID(ce) \
@@ -695,6 +696,16 @@ static inline uint16_t vtd_pe_get_domain_id(VTDPASIDEntry *pe)
     return VTD_SM_PASID_ENTRY_DID((pe)->val[1]);
 }
 
+static inline uint32_t vtd_pe_get_fl_aw(VTDPASIDEntry *pe)
+{
+    return 48 + ((pe->val[2] >> 2) & VTD_SM_PASID_ENTRY_FLPM) * 9;
+}
+
+static inline dma_addr_t vtd_pe_get_flpt_base(VTDPASIDEntry *pe)
+{
+    return pe->val[2] & VTD_SM_PASID_ENTRY_FLPTPTR;
+}
+
 static inline bool vtd_pdire_present(VTDPASIDDirEntry *pdire)
 {
     return pdire->val & 1;
@@ -1850,6 +1861,67 @@ static void vtd_context_global_invalidate(IntelIOMMUState *s)
     vtd_iommu_replay_all(s);
 }
 
+static void vtd_bind_guest_pasid(IntelIOMMUState *s, VTDBus *vtd_bus,
+            int devfn, int pasid, VTDPASIDEntry *pe, VTDPASIDOp op)
+{
+#ifdef __linux__
+    VTDIOMMUContext *vtd_ic;
+    IOMMUCTXEventData event_data;
+    IOMMUCTXPASIDBindData bind;
+    struct iommu_gpasid_bind_data *g_bind_data;
+
+    vtd_ic = vtd_bus->dev_ic[devfn];
+    if (!vtd_ic) {
+        return;
+    }
+
+    g_bind_data = g_malloc0(sizeof(*g_bind_data));
+    bind.flag = 0;
+    g_bind_data->flags = 0;
+    g_bind_data->vtd.flags = 0;
+    switch (op) {
+    case VTD_PASID_BIND:
+    case VTD_PASID_UPDATE:
+        g_bind_data->version = IOMMU_GPASID_BIND_VERSION_1;
+        g_bind_data->format = IOMMU_PASID_FORMAT_INTEL_VTD;
+        g_bind_data->gpgd = vtd_pe_get_flpt_base(pe);
+        g_bind_data->addr_width = vtd_pe_get_fl_aw(pe);
+        g_bind_data->hpasid = pasid;
+        g_bind_data->gpasid = pasid;
+        g_bind_data->flags |= IOMMU_SVA_GPASID_VAL;
+        g_bind_data->vtd.flags =
+                             (VTD_SM_PASID_ENTRY_SRE_BIT(pe->val[2]) ? 1 : 0)
+                           | (VTD_SM_PASID_ENTRY_EAFE_BIT(pe->val[2]) ? 1 : 0)
+                           | (VTD_SM_PASID_ENTRY_PCD_BIT(pe->val[1]) ? 1 : 0)
+                           | (VTD_SM_PASID_ENTRY_PWT_BIT(pe->val[1]) ? 1 : 0)
+                           | (VTD_SM_PASID_ENTRY_EMTE_BIT(pe->val[1]) ? 1 : 0)
+                           | (VTD_SM_PASID_ENTRY_CD_BIT(pe->val[1]) ? 1 : 0);
+        g_bind_data->vtd.pat = VTD_SM_PASID_ENTRY_PAT(pe->val[1]);
+        g_bind_data->vtd.emt = VTD_SM_PASID_ENTRY_EMT(pe->val[1]);
+        bind.flag |= IOMMU_CTX_BIND_PASID;
+        break;
+
+    case VTD_PASID_UNBIND:
+        g_bind_data->gpgd = 0;
+        g_bind_data->addr_width = 0;
+        g_bind_data->hpasid = pasid;
+        bind.flag |= IOMMU_CTX_UNBIND_PASID;
+        break;
+
+    default:
+        printf("Unknown VTDPASIDOp!!\n");
+        break;
+    }
+    if (bind.flag) {
+        event_data.event = IOMMU_CTX_EVENT_PASID_BIND;
+        bind.data = g_bind_data;
+        event_data.data = &bind;
+        iommu_ctx_event_notify(&vtd_ic->iommu_context, &event_data);
+    }
+    g_free(g_bind_data);
+#endif
+}
+
 /* Do a context-cache device-selective invalidation.
  * @func_mask: FM field after shifting
  */
@@ -2528,12 +2600,17 @@ static gboolean vtd_flush_pasid(gpointer key, gpointer value,
                 pc_entry->pasid_cache_gen = s->pasid_cache_gen;
                 if (!vtd_pasid_entry_compare(&pe, &pc_entry->pasid_entry)) {
                     pc_entry->pasid_entry = pe;
+                    vtd_bind_guest_pasid(s, vtd_bus, devfn,
+                                     pasid, &pe, VTD_PASID_UPDATE);
                     /*
                      * TODO: when pasid-base-iotlb(piotlb) infrastructure is
                      * ready, should invalidate QEMU piotlb togehter with this
                      * change.
                      */
                 }
+            } else {
+                vtd_bind_guest_pasid(s, vtd_bus, devfn,
+                                  pasid, NULL, VTD_PASID_UNBIND);
             }
         }
     }
@@ -2623,6 +2700,10 @@ static inline void vtd_fill_in_pe_cache(
 
     pc_entry->pasid_entry = *pe;
     pc_entry->pasid_cache_gen = s->pasid_cache_gen;
+    vtd_bind_guest_pasid(s, vtd_pasid_as->vtd_bus,
+                         vtd_pasid_as->devfn,
+                         vtd_pasid_as->pasid,
+                         pe, VTD_PASID_UPDATE);
 }
 
 static int vtd_pasid_cache_psi(IntelIOMMUState *s,
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 12873e1..13e02e8 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -483,6 +483,14 @@ struct VTDRootEntry {
 };
 typedef struct VTDRootEntry VTDRootEntry;
 
+enum VTDPASIDOp {
+    VTD_PASID_BIND,
+    VTD_PASID_UNBIND,
+    VTD_PASID_UPDATE,
+    VTD_OP_NUM
+};
+typedef enum VTDPASIDOp VTDPASIDOp;
+
 struct VTDPASIDCacheInfo {
 #define VTD_PASID_CACHE_DOMSI   (1ULL << 0);
 #define VTD_PASID_CACHE_PASIDSI (1ULL << 1);
@@ -549,6 +557,18 @@ typedef struct VTDPASIDCacheInfo VTDPASIDCacheInfo;
 #define VTD_SM_PASID_ENTRY_AW          7ULL /* Adjusted guest-address-width */
 #define VTD_SM_PASID_ENTRY_DID(val)    ((val) & VTD_DOMAIN_ID_MASK)
 
+/* Adjusted guest-address-width */
+#define VTD_SM_PASID_ENTRY_FLPM          3ULL
+#define VTD_SM_PASID_ENTRY_FLPTPTR       (~0xfffULL)
+#define VTD_SM_PASID_ENTRY_SRE_BIT(val)  (!!((val) & 1ULL))
+#define VTD_SM_PASID_ENTRY_EAFE_BIT(val) (!!(((val) >> 7) & 1ULL))
+#define VTD_SM_PASID_ENTRY_PCD_BIT(val)  (!!(((val) >> 31) & 1ULL))
+#define VTD_SM_PASID_ENTRY_PWT_BIT(val)  (!!(((val) >> 30) & 1ULL))
+#define VTD_SM_PASID_ENTRY_EMTE_BIT(val) (!!(((val) >> 26) & 1ULL))
+#define VTD_SM_PASID_ENTRY_CD_BIT(val)   (!!(((val) >> 25) & 1ULL))
+#define VTD_SM_PASID_ENTRY_PAT(val)      (((val) >> 32) & 0xFFFFFFFFULL)
+#define VTD_SM_PASID_ENTRY_EMT(val)      (((val) >> 27) & 0x7ULL)
+
 /* Second Level Page Translation Pointer*/
 #define VTD_SM_PASID_ENTRY_SLPTPTR     (~0xfffULL)
 
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC v2 16/22] intel_iommu: replay guest pasid bindings to host
  2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (14 preceding siblings ...)
  2019-10-24 12:34 ` [RFC v2 15/22] intel_iommu: bind/unbind guest page table to host Liu Yi L
@ 2019-10-24 12:34 ` Liu Yi L
  2019-10-24 12:34 ` [RFC v2 17/22] intel_iommu: replay pasid binds after context cache invalidation Liu Yi L
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 79+ messages in thread
From: Liu Yi L @ 2019-10-24 12:34 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: tianyu.lan, kevin.tian, yi.l.liu, Yi Sun, kvm, jun.j.tian,
	eric.auger, yi.y.sun, jacob.jun.pan, david

This patch adds guest pasid bindings replay for domain-selective(dsi)
pasid cache invalidation and global pasid cache invalidation by walking
guest pasid table.

Reason:
Guest OS may flush the pasid cache with wrong granularity. e.g. guest
does a svm_bind() but flush the pasid cache with global or domain
selective pasid cache invalidation instead of pasid selective(psi)
pasid cache invalidation. Regards to such case, it works in host.
Per spec, a global or domain selective pasid cache invalidation should
be able to cover what a pasid selective invalidation does. The only
concern is performance deduction since dsi and global cache invalidation
will flush more than psi. To align with native, vIOMMU needs emulator
needs to do replay for the two invalidation granularity to reflect the
latest pasid bindings in guest pasid table.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/i386/intel_iommu.c          | 160 +++++++++++++++++++++++++++++++++++++----
 hw/i386/intel_iommu_internal.h |   1 +
 2 files changed, 149 insertions(+), 12 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 793b0de..a9e660c 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -68,6 +68,8 @@ static void vtd_address_space_refresh_all(IntelIOMMUState *s);
 static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n);
 
 static void vtd_pasid_cache_reset(IntelIOMMUState *s);
+static int vtd_update_pe_cache_for_dev(IntelIOMMUState *s,
+              VTDBus *vtd_bus, int devfn, int pasid, VTDPASIDEntry *pe);
 
 static void vtd_panic_require_caching_mode(void)
 {
@@ -2618,6 +2620,111 @@ static gboolean vtd_flush_pasid(gpointer key, gpointer value,
     return remove;
 }
 
+/**
+ * Constant information used during pasid table walk
+ * @ic: iommu_context
+ * @flags: indicates if it is domain selective walk
+ * @did: domain ID of the pasid table walk
+ */
+typedef struct {
+    VTDIOMMUContext *ic;
+#define VTD_PASID_TABLE_DID_SEL_WALK   (1ULL << 0);
+    uint32_t flags;
+    uint16_t did;
+} vtd_pt_walk_info;
+
+static bool vtd_sm_pasid_table_walk_one(IntelIOMMUState *s,
+                                       dma_addr_t pt_base,
+                                       int start,
+                                       int end,
+                                       vtd_pt_walk_info *info)
+{
+    VTDPASIDEntry pe;
+    int pasid = start;
+    int pasid_next;
+    VTDIOMMUContext *ic = info->ic;
+
+    while (pasid < end) {
+        pasid_next = pasid + 1;
+
+        if (!vtd_get_pe_in_pasid_leaf_table(s, pasid, pt_base, &pe)
+            && vtd_pe_present(&pe)) {
+            if (vtd_update_pe_cache_for_dev(s,
+                                      ic->vtd_bus, ic->devfn, pasid, &pe)) {
+                error_report_once("%s, bus: %d, devfn: %d, pasid: %d",
+                  __func__, pci_bus_num(ic->vtd_bus->bus), ic->devfn, pasid);
+                return false;
+            }
+        }
+        pasid = pasid_next;
+    }
+    return true;
+}
+
+/*
+ * Currently, VT-d scalable mode pasid table is a two level table, this
+ * function aims to loop a range of PASIDs in a given pasid table to
+ * identify the pasid config in guest.
+ */
+static void vtd_sm_pasid_table_walk(IntelIOMMUState *s, dma_addr_t pdt_base,
+                                  int start, int end, vtd_pt_walk_info *info)
+{
+    VTDPASIDDirEntry pdire;
+    int pasid = start;
+    int pasid_next;
+    dma_addr_t pt_base;
+
+    while (pasid < end) {
+        pasid_next = pasid + VTD_PASID_TBL_ENTRY_NUM;
+        if (!vtd_get_pdire_from_pdir_table(pdt_base, pasid, &pdire)
+            && vtd_pdire_present(&pdire)) {
+            pt_base = pdire.val & VTD_PASID_TABLE_BASE_ADDR_MASK;
+            if (!vtd_sm_pasid_table_walk_one(s,
+                              pt_base, pasid, pasid_next, info)) {
+                break;
+            }
+        }
+        pasid = pasid_next;
+    }
+}
+
+/**
+ * This function replay the guest pasid bindings to hots by
+ * walking the guest PASID table. This ensures host will have
+ * latest guest pasid bindings.
+ */
+static void vtd_replay_guest_pasid_bindings(IntelIOMMUState *s,
+                                           uint16_t *did, bool is_dsi)
+{
+    VTDContextEntry ce;
+    vtd_pt_walk_info info;
+    VTDIOMMUContext *vtd_ic;
+
+    if (is_dsi) {
+        info.flags = VTD_PASID_TABLE_DID_SEL_WALK;
+        info.did = *did;
+    }
+
+    /*
+     * In this replay, only needs to care about the devices which
+     * has iommu_context created. For the one not have iommu_context,
+     * it is not necessary to replay the bindings since their cache
+     * could be re-created in the next DMA address transaltion.
+     */
+    QLIST_FOREACH(vtd_ic, &s->vtd_dev_ic_list, next) {
+        if (!vtd_dev_to_context_entry(s,
+                                      pci_bus_num(vtd_ic->vtd_bus->bus),
+                                      vtd_ic->devfn, &ce)) {
+            info.ic = vtd_ic;
+            vtd_sm_pasid_table_walk(s,
+                                    VTD_CE_GET_PASID_DIR_TABLE(&ce),
+                                    0,
+                                    VTD_MAX_HPASID,
+                                    &info);
+        }
+    }
+}
+
 static int vtd_pasid_cache_dsi(IntelIOMMUState *s, uint16_t domain_id)
 {
     VTDPASIDCacheInfo pc_info;
@@ -2632,15 +2739,17 @@ static int vtd_pasid_cache_dsi(IntelIOMMUState *s, uint16_t domain_id)
      */
     vtd_iommu_lock(s);
     g_hash_table_foreach_remove(s->vtd_pasid_as, vtd_flush_pasid, &pc_info);
-    vtd_iommu_unlock(s);
 
     /*
-     * TODO: Domain selective PASID cache invalidation
-     * may be issued wrongly by programmer, to be safe,
-     * after invalidating the pasid caches, emulator
-     * needs to replay the pasid bindings by walking guest
-     * pasid dir and pasid table.
+     * Domain selective PASID cache invalidation may be
+     * issued wrongly by programmer, to be safe, after
+     * invalidating the pasid caches, emulator needs to
+     * replay the pasid bindings by walking guest pasid
+     * dir and pasid table.
      */
+    vtd_replay_guest_pasid_bindings(s, &domain_id, true);
+
+    vtd_iommu_unlock(s);
     return 0;
 }
 
@@ -2706,6 +2815,31 @@ static inline void vtd_fill_in_pe_cache(
                          pe, VTD_PASID_UPDATE);
 }
 
+/**
+ * This function updates the pasid entry cached in &vtd_pasid_as.
+ * Caller of this function should hold iommu_lock.
+ */
+static int vtd_update_pe_cache_for_dev(IntelIOMMUState *s, VTDBus *vtd_bus,
+                               int devfn, int pasid, VTDPASIDEntry *pe)
+{
+    VTDPASIDAddressSpace *vtd_pasid_as;
+
+    vtd_pasid_as = vtd_add_find_pasid_as(s, vtd_bus,
+                                        devfn, pasid, true);
+    if (!vtd_pasid_as) {
+        error_report_once("%s, fatal error happened!\n", __func__);
+        return -1;
+    }
+
+    if (vtd_pasid_as->pasid_cache_entry.pasid_cache_gen ==
+                                               s->pasid_cache_gen) {
+        return 0;
+    }
+
+    vtd_fill_in_pe_cache(vtd_pasid_as, pe);
+    return 0;
+}
+
 static int vtd_pasid_cache_psi(IntelIOMMUState *s,
                                uint16_t domain_id, uint32_t pasid)
 {
@@ -2825,15 +2959,17 @@ static int vtd_pasid_cache_gsi(IntelIOMMUState *s)
 
     vtd_iommu_lock(s);
     vtd_pasid_cache_reset(s);
-    vtd_iommu_unlock(s);
 
     /*
-     * TODO: Global PASID cache invalidation may be
-     * issued wrongly by programmer, to be safe, after
-     * invalidating the pasid caches, emulator needs
-     * to replay the pasid bindings by walking guest
-     * pasid dir and pasid table.
+     * Global PASID cache invalidation may be issued
+     * wrongly by programmer, to be safe, after invalidating
+     * the pasid caches, emulator needs to replay the
+     * pasid bindings by walking guest pasid dir and
+     * pasid table.
      */
+    vtd_replay_guest_pasid_bindings(s, NULL, false);
+
+    vtd_iommu_unlock(s);
     return 0;
 }
 
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 13e02e8..eab65ef 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -545,6 +545,7 @@ typedef struct VTDPASIDCacheInfo VTDPASIDCacheInfo;
 #define VTD_PASID_TABLE_BITS_MASK     (0x3fULL)
 #define VTD_PASID_TABLE_INDEX(pasid)  ((pasid) & VTD_PASID_TABLE_BITS_MASK)
 #define VTD_PASID_ENTRY_FPD           (1ULL << 1) /* Fault Processing Disable */
+#define VTD_PASID_TBL_ENTRY_NUM       (1ULL << 6)
 
 /* PASID Granular Translation Type Mask */
 #define VTD_PASID_ENTRY_P              1ULL
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC v2 17/22] intel_iommu: replay pasid binds after context cache invalidation
  2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (15 preceding siblings ...)
  2019-10-24 12:34 ` [RFC v2 16/22] intel_iommu: replay guest pasid bindings " Liu Yi L
@ 2019-10-24 12:34 ` Liu Yi L
  2019-10-24 12:34 ` [RFC v2 18/22] intel_iommu: do not passdown pasid bind for PASID #0 Liu Yi L
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 79+ messages in thread
From: Liu Yi L @ 2019-10-24 12:34 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: tianyu.lan, kevin.tian, yi.l.liu, Yi Sun, kvm, jun.j.tian,
	eric.auger, yi.y.sun, jacob.jun.pan, david

This patch replays guest pasid bindings after context cache invalidation.
This is a behavior to ensure safety. Actually, programmer should issue
pasid cache invalidation with proper granularity after issuing a context
cache invalidation.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/i386/intel_iommu.c          | 68 ++++++++++++++++++++++++++++++++++++++++++
 hw/i386/intel_iommu_internal.h |  3 ++
 hw/i386/trace-events           |  1 +
 3 files changed, 72 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index a9e660c..6bceb7f 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -70,6 +70,10 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n);
 static void vtd_pasid_cache_reset(IntelIOMMUState *s);
 static int vtd_update_pe_cache_for_dev(IntelIOMMUState *s,
               VTDBus *vtd_bus, int devfn, int pasid, VTDPASIDEntry *pe);
+static void vtd_replay_guest_pasid_bindings(IntelIOMMUState *s,
+                                           uint16_t *did, bool is_dsi);
+static void vtd_pasid_cache_devsi(IntelIOMMUState *s,
+                                  VTDBus *vtd_bus, uint16_t devfn);
 
 static void vtd_panic_require_caching_mode(void)
 {
@@ -1861,6 +1865,10 @@ static void vtd_context_global_invalidate(IntelIOMMUState *s)
      * VT-d emulation codes.
      */
     vtd_iommu_replay_all(s);
+
+    vtd_iommu_lock(s);
+    vtd_replay_guest_pasid_bindings(s, NULL, false);
+    vtd_iommu_unlock(s);
 }
 
 static void vtd_bind_guest_pasid(IntelIOMMUState *s, VTDBus *vtd_bus,
@@ -1981,6 +1989,22 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
                  * happened.
                  */
                 vtd_sync_shadow_page_table(vtd_as);
+                /*
+                 * Per spec, context flush should also followed with PASID
+                 * cache and iotlb flush. Regards to a device selective
+                 * context cache invalidation:
+                 * if (emaulted_device)
+                 *    modify the pasid cache gen and pasid-based iotlb gen
+                 *    value (will be added in following patches)
+                 * else if (assigned_device)
+                 *    check if the device has been bound to any pasid
+                 *    invoke pasid_unbind regards to each bound pasid
+                 * Here, we have vtd_pasid_cache_devsi() to invalidate pasid
+                 * caches, while for piotlb in QEMU, we don't have it yet, so
+                 * no handling. For assigned device, host iommu driver would
+                 * flush piotlb when a pasid unbind is passdown to it.
+                 */
+                 vtd_pasid_cache_devsi(s, vtd_bus, devfn_it);
             }
         }
     }
@@ -2516,6 +2540,11 @@ static inline bool vtd_pc_is_pasid_si(struct VTDPASIDCacheInfo *pc_info)
     return pc_info->flags & VTD_PASID_CACHE_PASIDSI;
 }
 
+static inline bool vtd_pc_is_dev_si(struct VTDPASIDCacheInfo *pc_info)
+{
+    return pc_info->flags & VTD_PASID_CACHE_DEVSI;
+}
+
 static inline int vtd_dev_get_pe_from_pasid(IntelIOMMUState *s,
                                             uint8_t bus_num,
                                             uint8_t devfn,
@@ -2578,6 +2607,8 @@ static gboolean vtd_flush_pasid(gpointer key, gpointer value,
     devfn = vtd_pasid_as->devfn;
 
     if (pc_entry->pasid_cache_gen &&
+        (vtd_pc_is_dev_si(pc_info) ? (((pc_info->devfn == devfn)) &&
+         (pc_info->vtd_bus == vtd_bus)) : 1) &&
         (vtd_pc_is_dom_si(pc_info) ? (pc_info->domain_id == did) : 1) &&
         (vtd_pc_is_pasid_si(pc_info) ? (pc_info->pasid == pasid) : 1)) {
         /*
@@ -2934,6 +2965,43 @@ static int vtd_pasid_cache_psi(IntelIOMMUState *s,
     return 0;
 }
 
+static void vtd_pasid_cache_devsi(IntelIOMMUState *s,
+                                  VTDBus *vtd_bus, uint16_t devfn)
+{
+    VTDPASIDCacheInfo pc_info;
+    VTDContextEntry ce;
+    vtd_pt_walk_info info;
+
+    trace_vtd_pasid_cache_devsi(devfn);
+
+    pc_info.flags = VTD_PASID_CACHE_DEVSI;
+    pc_info.vtd_bus = vtd_bus;
+    pc_info.devfn = devfn;
+
+    vtd_iommu_lock(s);
+    g_hash_table_foreach_remove(s->vtd_pasid_as, vtd_flush_pasid, &pc_info);
+
+    /*
+     * To be safe, after invalidating the pasid caches,
+     * emulator needs to replay the pasid bindings by
+     * walking guest pasid dir and pasid table.
+     */
+    if (vtd_bus->dev_ic[devfn] &&
+        !vtd_dev_to_context_entry(s,
+                                  pci_bus_num(vtd_bus->bus),
+                                  devfn, &ce)) {
+        info.flags = 0x0;
+        info.did = 0;
+        info.ic = vtd_bus->dev_ic[devfn];
+        vtd_sm_pasid_table_walk(s,
+                                VTD_CE_GET_PASID_DIR_TABLE(&ce),
+                                0,
+                                VTD_MAX_HPASID,
+                                &info);
+    }
+    vtd_iommu_unlock(s);
+}
+
 /**
  * Caller of this function should hold iommu_lock
  */
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index eab65ef..908536c 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -494,9 +494,12 @@ typedef enum VTDPASIDOp VTDPASIDOp;
 struct VTDPASIDCacheInfo {
 #define VTD_PASID_CACHE_DOMSI   (1ULL << 0);
 #define VTD_PASID_CACHE_PASIDSI (1ULL << 1);
+#define VTD_PASID_CACHE_DEVSI   (1ULL << 2);
     uint32_t flags;
     uint16_t domain_id;
     uint32_t pasid;
+    VTDBus *vtd_bus;
+    uint16_t devfn;
 };
 typedef struct VTDPASIDCacheInfo VTDPASIDCacheInfo;
 
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index 7912ae1..25bd6a4 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -26,6 +26,7 @@ vtd_pasid_cache_reset(void) ""
 vtd_pasid_cache_gsi(void) ""
 vtd_pasid_cache_dsi(uint16_t domain) "Domian slective PC invalidation domain 0x%"PRIx16
 vtd_pasid_cache_psi(uint16_t domain, uint32_t pasid) "PASID slective PC invalidation domain 0x%"PRIx16" pasid 0x%"PRIx32
+vtd_pasid_cache_devsi(uint16_t devfn) "Dev slective PC invalidation dev: 0x%"PRIx16
 vtd_re_not_present(uint8_t bus) "Root entry bus %"PRIu8" not present"
 vtd_ce_not_present(uint8_t bus, uint8_t devfn) "Context entry bus %"PRIu8" devfn %"PRIu8" not present"
 vtd_iotlb_page_hit(uint16_t sid, uint64_t addr, uint64_t slpte, uint16_t domain) "IOTLB page hit sid 0x%"PRIx16" iova 0x%"PRIx64" slpte 0x%"PRIx64" domain 0x%"PRIx16
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC v2 18/22] intel_iommu: do not passdown pasid bind for PASID #0
  2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (16 preceding siblings ...)
  2019-10-24 12:34 ` [RFC v2 17/22] intel_iommu: replay pasid binds after context cache invalidation Liu Yi L
@ 2019-10-24 12:34 ` Liu Yi L
  2019-10-24 12:34 ` [RFC v2 19/22] vfio/pci: add iommu_context notifier for PASID-based iotlb flush Liu Yi L
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 79+ messages in thread
From: Liu Yi L @ 2019-10-24 12:34 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: tianyu.lan, kevin.tian, yi.l.liu, Yi Sun, kvm, jun.j.tian,
	eric.auger, yi.y.sun, jacob.jun.pan, david

RID_PASID field was introduced in VT-d 3.0 spec, it is used for DMA
requests w/o PASID in scalable mode VT-d. It is also known as IOVA.
And in VT-d 3.1 spec, there is further definition on it:

"Implementations not supporting RID_PASID capability (ECAP_REG.RPS is
0b), use a PASID value of 0 to perform address translation for requests
without PASID."

This patch adds a check on the PASIDs which are going to be bound to
device. For PASID #0, no need to passdown pasid binding since PASID #0
is used as RID_PASID for requests without pasid. Reason is current Intel
vIOMMU supports guest IOVA by shadowing guest 2nd level page table.
However, in future, if guest OS uses 1st level page table to store IOVA
mappings, thus guest IOVA support will also be done via nested translation.
Then vIOMMU could passdown the pasid binding for PASID #0 to host with a
special PASID value to indicate host to bind the guest page table to a
proper PASID. e.g PASID value from RID_PASID field for PF/VF if ECAP_REG.RPS
is clear or default PASID for ADI (Assignable Device Interface in Scalable
IOV solution).

IOVA over FLPT support on Intel VT-d: https://lkml.org/lkml/2019/9/23/297

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/i386/intel_iommu.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 6bceb7f..d621455 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1880,6 +1880,16 @@ static void vtd_bind_guest_pasid(IntelIOMMUState *s, VTDBus *vtd_bus,
     IOMMUCTXPASIDBindData bind;
     struct iommu_gpasid_bind_data *g_bind_data;
 
+    if (pasid < VTD_MIN_HPASID) {
+        /*
+         * If pasid < VTD_HPASID_MIN, this pasid is not allocated
+         * from host. No need to passdown the changes on it to host.
+         * TODO: when IOVA over FLPT is ready, this switch should be
+         * refined.
+         */
+        return;
+    }
+
     vtd_ic = vtd_bus->dev_ic[devfn];
     if (!vtd_ic) {
         return;
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC v2 19/22] vfio/pci: add iommu_context notifier for PASID-based iotlb flush
  2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (17 preceding siblings ...)
  2019-10-24 12:34 ` [RFC v2 18/22] intel_iommu: do not passdown pasid bind for PASID #0 Liu Yi L
@ 2019-10-24 12:34 ` Liu Yi L
  2019-10-24 12:34 ` [RFC v2 20/22] intel_iommu: process PASID-based iotlb invalidation Liu Yi L
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 79+ messages in thread
From: Liu Yi L @ 2019-10-24 12:34 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: tianyu.lan, kevin.tian, yi.l.liu, Yi Sun, kvm, jun.j.tian,
	eric.auger, yi.y.sun, jacob.jun.pan, david

This patch adds notifier for propagating guest PASID-based iotlb
invalidation to host.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/vfio/pci.c            | 29 +++++++++++++++++++++++++++++
 include/hw/iommu/iommu.h |  8 ++++++++
 2 files changed, 37 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 012b8ed..52fe3ed 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2802,6 +2802,31 @@ static void vfio_iommu_pasid_bind_notify(IOMMUCTXNotifier *n,
 #endif
 }
 
+static void vfio_iommu_cache_inv_notify(IOMMUCTXNotifier *n,
+                                        IOMMUCTXEventData *event_data)
+{
+#ifdef __linux__
+    VFIOIOMMUContext *giommu_ctx = container_of(n, VFIOIOMMUContext, n);
+    VFIOContainer *container = giommu_ctx->container;
+    IOMMUCTXCacheInvInfo *inv_info =
+                              (IOMMUCTXCacheInvInfo *) event_data->data;
+    struct vfio_iommu_type1_cache_invalidate *cache_inv;
+    unsigned long argsz;
+
+    argsz = sizeof(*cache_inv);
+    cache_inv = g_malloc0(argsz);
+    cache_inv->argsz = argsz;
+    cache_inv->info = *inv_info->info;
+    cache_inv->flags = 0;
+
+    if (ioctl(container->fd, VFIO_IOMMU_CACHE_INVALIDATE, cache_inv) != 0) {
+        error_report("%s: cache invalidation failed: %d", __func__, -errno);
+    }
+
+    g_free(cache_inv);
+#endif
+}
+
 static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
     VFIOPCIDevice *vdev = PCI_VFIO(pdev);
@@ -3118,6 +3143,10 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
                                          iommu_context,
                                          vfio_iommu_pasid_bind_notify,
                                          IOMMU_CTX_EVENT_PASID_BIND);
+        vfio_register_iommu_ctx_notifier(vdev,
+                                         iommu_context,
+                                         vfio_iommu_cache_inv_notify,
+                                         IOMMU_CTX_EVENT_CACHE_INV);
     }
 
     return;
diff --git a/include/hw/iommu/iommu.h b/include/hw/iommu/iommu.h
index 4f21aa1..452f609 100644
--- a/include/hw/iommu/iommu.h
+++ b/include/hw/iommu/iommu.h
@@ -34,6 +34,7 @@ enum IOMMUCTXEvent {
     IOMMU_CTX_EVENT_PASID_ALLOC,
     IOMMU_CTX_EVENT_PASID_FREE,
     IOMMU_CTX_EVENT_PASID_BIND,
+    IOMMU_CTX_EVENT_CACHE_INV,
     IOMMU_CTX_EVENT_NUM,
 };
 typedef enum IOMMUCTXEvent IOMMUCTXEvent;
@@ -61,6 +62,13 @@ struct IOMMUCTXPASIDBindData {
 };
 typedef struct IOMMUCTXPASIDBindData IOMMUCTXPASIDBindData;
 
+struct IOMMUCTXCacheInvInfo {
+#ifdef __linux__
+    struct iommu_cache_invalidate_info *info;
+#endif
+};
+typedef struct IOMMUCTXCacheInvInfo IOMMUCTXCacheInvInfo;
+
 struct IOMMUCTXEventData {
     IOMMUCTXEvent event;
     uint64_t length;
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC v2 20/22] intel_iommu: process PASID-based iotlb invalidation
  2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (18 preceding siblings ...)
  2019-10-24 12:34 ` [RFC v2 19/22] vfio/pci: add iommu_context notifier for PASID-based iotlb flush Liu Yi L
@ 2019-10-24 12:34 ` Liu Yi L
  2019-10-24 12:34 ` [RFC v2 21/22] intel_iommu: propagate PASID-based iotlb invalidation to host Liu Yi L
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 79+ messages in thread
From: Liu Yi L @ 2019-10-24 12:34 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: tianyu.lan, kevin.tian, yi.l.liu, Yi Sun, kvm, jun.j.tian,
	eric.auger, yi.y.sun, jacob.jun.pan, david

This patch adds the basic PASID-based iotlb (piotlb) invalidation support.
piotlb is used during walking Intel VT-d first-level page table. This patch
only adds the basic processing. Detailed process will be added in next patch.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/i386/intel_iommu.c          | 57 ++++++++++++++++++++++++++++++++++++++++++
 hw/i386/intel_iommu_internal.h | 13 ++++++++++
 2 files changed, 70 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index d621455..6cd922f 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3092,6 +3092,59 @@ static bool vtd_process_pasid_desc(IntelIOMMUState *s,
     return (ret == 0) ? true : false;
 }
 
+static void vtd_piotlb_pasid_invalidate(IntelIOMMUState *s,
+                                        uint16_t domain_id,
+                                        uint32_t pasid)
+{
+}
+
+static void vtd_piotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
+                             uint32_t pasid, hwaddr addr, uint8_t am, bool ih)
+{
+}
+
+static bool vtd_process_piotlb_desc(IntelIOMMUState *s,
+                                    VTDInvDesc *inv_desc)
+{
+    uint16_t domain_id;
+    uint32_t pasid;
+    uint8_t am;
+    hwaddr addr;
+
+    if ((inv_desc->val[0] & VTD_INV_DESC_PIOTLB_RSVD_VAL0) ||
+        (inv_desc->val[1] & VTD_INV_DESC_PIOTLB_RSVD_VAL1)) {
+        error_report_once("non-zero-field-in-piotlb_inv_desc hi: 0x%" PRIx64
+                  " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]);
+        return false;
+    }
+
+    domain_id = VTD_INV_DESC_PIOTLB_DID(inv_desc->val[0]);
+    pasid = VTD_INV_DESC_PIOTLB_PASID(inv_desc->val[0]);
+    switch (inv_desc->val[0] & VTD_INV_DESC_IOTLB_G) {
+    case VTD_INV_DESC_PIOTLB_ALL_IN_PASID:
+        vtd_piotlb_pasid_invalidate(s, domain_id, pasid);
+        break;
+
+    case VTD_INV_DESC_PIOTLB_PSI_IN_PASID:
+        am = VTD_INV_DESC_PIOTLB_AM(inv_desc->val[1]);
+        addr = (hwaddr) VTD_INV_DESC_PIOTLB_ADDR(inv_desc->val[1]);
+        if (am > VTD_MAMV) {
+            error_report_once("Invalid am, > max am value, hi: 0x%" PRIx64
+                    " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]);
+            return false;
+        }
+        vtd_piotlb_page_invalidate(s, domain_id, pasid,
+             addr, am, VTD_INV_DESC_PIOTLB_IH(inv_desc->val[1]));
+        break;
+
+    default:
+        error_report_once("Invalid granularity in P-IOTLB desc hi: 0x%" PRIx64
+                  " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]);
+        return false;
+    }
+    return true;
+}
+
 static bool vtd_process_inv_iec_desc(IntelIOMMUState *s,
                                      VTDInvDesc *inv_desc)
 {
@@ -3206,6 +3259,10 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
         break;
 
     case VTD_INV_DESC_PIOTLB:
+        trace_vtd_inv_desc("p-iotlb", inv_desc.val[1], inv_desc.val[0]);
+        if (!vtd_process_piotlb_desc(s, &inv_desc)) {
+            return false;
+        }
         break;
 
     case VTD_INV_DESC_WAIT:
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 908536c..eddfe54 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -458,6 +458,19 @@ typedef union VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_PASIDC_PASID_SI   (1ULL << 4)
 #define VTD_INV_DESC_PASIDC_GLOBAL     (3ULL << 4)
 
+#define VTD_INV_DESC_PIOTLB_ALL_IN_PASID  (2ULL << 4)
+#define VTD_INV_DESC_PIOTLB_PSI_IN_PASID  (3ULL << 4)
+
+#define VTD_INV_DESC_PIOTLB_RSVD_VAL0     0xfff000000000ffc0ULL
+#define VTD_INV_DESC_PIOTLB_RSVD_VAL1     0xf80ULL
+
+#define VTD_INV_DESC_PIOTLB_PASID(val)    (((val) >> 32) & 0xfffffULL)
+#define VTD_INV_DESC_PIOTLB_DID(val)      (((val) >> 16) & \
+                                             VTD_DOMAIN_ID_MASK)
+#define VTD_INV_DESC_PIOTLB_ADDR(val)     ((val) & ~0xfffULL)
+#define VTD_INV_DESC_PIOTLB_AM(val)       ((val) & 0x3fULL)
+#define VTD_INV_DESC_PIOTLB_IH(val)       (((val) >> 6) & 0x1)
+
 /* Information about page-selective IOTLB invalidate */
 struct VTDIOTLBPageInvInfo {
     uint16_t domain_id;
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC v2 21/22] intel_iommu: propagate PASID-based iotlb invalidation to host
  2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (19 preceding siblings ...)
  2019-10-24 12:34 ` [RFC v2 20/22] intel_iommu: process PASID-based iotlb invalidation Liu Yi L
@ 2019-10-24 12:34 ` Liu Yi L
  2019-10-24 12:34 ` [RFC v2 22/22] intel_iommu: process PASID-based Device-TLB invalidation Liu Yi L
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 79+ messages in thread
From: Liu Yi L @ 2019-10-24 12:34 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: tianyu.lan, kevin.tian, yi.l.liu, Yi Sun, kvm, jun.j.tian,
	eric.auger, yi.y.sun, jacob.jun.pan, david

This patch propagates PASID-based iotlb invalidation to host.

Intel VT-d 3.0 supports nested translation in PASID granularity. For guest
SVA support, nested translation is enabled for specific PASID. This is also
known as dual stage translation which gives better virtualization support.

Under such configuration, guest owns the GVA->GPA translation which is
configured as first level page table in host side for a specific pasid, and
host owns GPA->HPA translation. As guest owns first level translation table,
piotlb invalidation should be propagated to host since host IOMMU will cache
first level page table related mappings during DMA address translation.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/i386/intel_iommu.c          | 127 +++++++++++++++++++++++++++++++++++++++++
 hw/i386/intel_iommu_internal.h |   7 +++
 2 files changed, 134 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 6cd922f..5ca9ee1 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3092,15 +3092,142 @@ static bool vtd_process_pasid_desc(IntelIOMMUState *s,
     return (ret == 0) ? true : false;
 }
 
+static void vtd_invalidate_piotlb(IntelIOMMUState *s, VTDBus *vtd_bus,
+                                  int devfn, IOMMUCTXCacheInvInfo *inv_info)
+{
+#ifdef __linux__
+    VTDIOMMUContext *vtd_ic;
+    IOMMUCTXEventData event_data;
+    vtd_ic = vtd_bus->dev_ic[devfn];
+    if (!vtd_ic) {
+        return;
+    }
+    event_data.event = IOMMU_CTX_EVENT_CACHE_INV;
+    event_data.data = inv_info;
+    iommu_ctx_event_notify(&vtd_ic->iommu_context, &event_data);
+#endif
+}
+
+static inline bool vtd_pasid_cache_valid(
+                          VTDPASIDAddressSpace *vtd_pasid_as)
+{
+    return (vtd_pasid_as->iommu_state->pasid_cache_gen &&
+            (vtd_pasid_as->iommu_state->pasid_cache_gen
+             == vtd_pasid_as->pasid_cache_entry.pasid_cache_gen));
+}
+
+/**
+ * This function is a loop function for the s->vtd_pasid_as
+ * list with VTDPIOTLBInvInfo as execution filter. It propagates
+ * the piotlb invalidation to host. Caller of this function
+ * should hold iommu_lock.
+ */
+static void vtd_flush_pasid_iotlb(gpointer key, gpointer value,
+                                  gpointer user_data)
+{
+    VTDPIOTLBInvInfo *piotlb_info = user_data;
+    VTDPASIDAddressSpace *vtd_pasid_as = value;
+    uint16_t did;
+
+    /*
+     * Needs to check whether the pasid entry cache stored in
+     * vtd_pasid_as is valid or not. "invalid" means the pasid
+     * cache has been flushed, thus host should have done piotlb
+     * invalidation together with a pasid cache invalidation, so
+     * no need to pass down piotlb invalidation to host for better
+     * performance. Only when pasid entry cache is "valid", should
+     * a piotlb invalidation be propagated to host since it means
+     * guest just modified a mapping in its page table.
+     */
+    if (!vtd_pasid_cache_valid(vtd_pasid_as)) {
+        return;
+    }
+
+    did = vtd_pe_get_domain_id(
+                &(vtd_pasid_as->pasid_cache_entry.pasid_entry));
+
+    if ((piotlb_info->domain_id == did) &&
+        (piotlb_info->pasid == vtd_pasid_as->pasid)) {
+        vtd_invalidate_piotlb(vtd_pasid_as->iommu_state,
+                              vtd_pasid_as->vtd_bus,
+                              vtd_pasid_as->devfn,
+                              &piotlb_info->inv_info);
+    }
+
+    /*
+     * TODO: needs to add QEMU piotlb flush when QEMU piotlb
+     * infrastructure is ready. For now, it is enough for passthru
+     * devices.
+     */
+}
+
 static void vtd_piotlb_pasid_invalidate(IntelIOMMUState *s,
                                         uint16_t domain_id,
                                         uint32_t pasid)
 {
+#ifdef __linux__
+    VTDPIOTLBInvInfo piotlb_info;
+    struct iommu_cache_invalidate_info *cache_info;
+    IOMMUCTXCacheInvInfo *inv_info = &piotlb_info.inv_info;
+
+    cache_info = g_malloc0(sizeof(*cache_info));
+    cache_info->version = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1;
+    cache_info->cache = IOMMU_CACHE_INV_TYPE_IOTLB;
+    cache_info->granularity = IOMMU_INV_GRANU_PASID;
+    cache_info->pasid_info.pasid = pasid;
+    cache_info->pasid_info.flags = IOMMU_INV_PASID_FLAGS_PASID;
+    inv_info->info = cache_info;
+    piotlb_info.domain_id = domain_id;
+    piotlb_info.pasid = pasid;
+
+    vtd_iommu_lock(s);
+    /*
+     * Here loops all the vtd_pasid_as instances in s->vtd_pasid_as
+     * to find out the affected devices since piotlb invalidation
+     * should check pasid cache per architecture point of view.
+     */
+    g_hash_table_foreach(s->vtd_pasid_as,
+                         vtd_flush_pasid_iotlb, &piotlb_info);
+    vtd_iommu_unlock(s);
+
+    g_free(cache_info);
+#endif
 }
 
 static void vtd_piotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
                              uint32_t pasid, hwaddr addr, uint8_t am, bool ih)
 {
+#ifdef __linux__
+    VTDPIOTLBInvInfo piotlb_info;
+    struct iommu_cache_invalidate_info *cache_info;
+    IOMMUCTXCacheInvInfo *inv_info = &piotlb_info.inv_info;
+
+    cache_info = g_malloc0(sizeof(*cache_info));
+    cache_info->version = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1;
+    cache_info->cache = IOMMU_CACHE_INV_TYPE_IOTLB;
+    cache_info->granularity = IOMMU_INV_GRANU_ADDR;
+    cache_info->addr_info.flags = IOMMU_INV_ADDR_FLAGS_PASID;
+    cache_info->addr_info.flags |= ih ? IOMMU_INV_ADDR_FLAGS_LEAF : 0;
+    cache_info->addr_info.pasid = pasid;
+    cache_info->addr_info.addr = addr;
+    cache_info->addr_info.granule_size = 1 << (12 + am);
+    cache_info->addr_info.nb_granules = 1;
+    inv_info->info = cache_info;
+    piotlb_info.domain_id = domain_id;
+    piotlb_info.pasid = pasid;
+
+    vtd_iommu_lock(s);
+    /*
+     * Here loops all the vtd_pasid_as instances in s->vtd_pasid_as
+     * to find out the affected devices since piotlb invalidation
+     * should check pasid cache per architecture point of view.
+     */
+    g_hash_table_foreach(s->vtd_pasid_as,
+                         vtd_flush_pasid_iotlb, &piotlb_info);
+    vtd_iommu_unlock(s);
+
+    g_free(cache_info);
+#endif
 }
 
 static bool vtd_process_piotlb_desc(IntelIOMMUState *s,
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index eddfe54..6a83f6c 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -516,6 +516,13 @@ struct VTDPASIDCacheInfo {
 };
 typedef struct VTDPASIDCacheInfo VTDPASIDCacheInfo;
 
+struct VTDPIOTLBInvInfo {
+    uint16_t domain_id;
+    uint32_t pasid;
+    IOMMUCTXCacheInvInfo inv_info;
+};
+typedef struct VTDPIOTLBInvInfo VTDPIOTLBInvInfo;
+
 /* Masks for struct VTDRootEntry */
 #define VTD_ROOT_ENTRY_P            1ULL
 #define VTD_ROOT_ENTRY_CTP          (~0xfffULL)
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC v2 22/22] intel_iommu: process PASID-based Device-TLB invalidation
  2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (20 preceding siblings ...)
  2019-10-24 12:34 ` [RFC v2 21/22] intel_iommu: propagate PASID-based iotlb invalidation to host Liu Yi L
@ 2019-10-24 12:34 ` Liu Yi L
  2019-10-25  6:21 ` [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM no-reply
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 79+ messages in thread
From: Liu Yi L @ 2019-10-24 12:34 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: tianyu.lan, kevin.tian, yi.l.liu, Yi Sun, kvm, jun.j.tian,
	eric.auger, yi.y.sun, jacob.jun.pan, david

This patch adds an empty handling for PASID-based Device-TLB invalidation.
For now it is enough as it is not necessary to propagate it to host for
passthru device and also there is no emulated device has device tlb.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/i386/intel_iommu.c          | 18 ++++++++++++++++++
 hw/i386/intel_iommu_internal.h |  1 +
 2 files changed, 19 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 5ca9ee1..1c00e9c 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3285,6 +3285,17 @@ static bool vtd_process_inv_iec_desc(IntelIOMMUState *s,
     return true;
 }
 
+static bool vtd_process_device_piotlb_desc(IntelIOMMUState *s,
+                                           VTDInvDesc *inv_desc)
+{
+    /*
+     * no need to handle it for passthru device, for emulated
+     * devices with device tlb, it may be required, but for now,
+     * return is enough
+     */
+    return true;
+}
+
 static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s,
                                           VTDInvDesc *inv_desc)
 {
@@ -3406,6 +3417,13 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
         }
         break;
 
+    case VTD_INV_DESC_DEV_PIOTLB:
+        trace_vtd_inv_desc("device-piotlb", inv_desc.hi, inv_desc.lo);
+        if (!vtd_process_device_piotlb_desc(s, &inv_desc)) {
+            return false;
+        }
+        break;
+
     case VTD_INV_DESC_DEVICE:
         trace_vtd_inv_desc("device", inv_desc.hi, inv_desc.lo);
         if (!vtd_process_device_iotlb_desc(s, &inv_desc)) {
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 6a83f6c..714dc09 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -390,6 +390,7 @@ typedef union VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_WAIT               0x5 /* Invalidation Wait Descriptor */
 #define VTD_INV_DESC_PIOTLB             0x6 /* PASID-IOTLB Invalidate Desc */
 #define VTD_INV_DESC_PC                 0x7 /* PASID-cache Invalidate Desc */
+#define VTD_INV_DESC_DEV_PIOTLB         0x8 /* PASID-based-DIOTLB inv_desc*/
 #define VTD_INV_DESC_NONE               0   /* Not an Invalidate Descriptor */
 
 /* Masks for Invalidation Wait Descriptor*/
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM
  2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (21 preceding siblings ...)
  2019-10-24 12:34 ` [RFC v2 22/22] intel_iommu: process PASID-based Device-TLB invalidation Liu Yi L
@ 2019-10-25  6:21 ` no-reply
  2019-10-25  6:30 ` no-reply
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 79+ messages in thread
From: no-reply @ 2019-10-25  6:21 UTC (permalink / raw)
  To: yi.l.liu
  Cc: tianyu.lan, kevin.tian, yi.l.liu, kvm, mst, jun.j.tian,
	qemu-devel, peterx, eric.auger, alex.williamson, jacob.jun.pan,
	pbonzini, yi.y.sun, david

Patchew URL: https://patchew.org/QEMU/1571920483-3382-1-git-send-email-yi.l.liu@intel.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC      hw/pci/pci_host.o
  CC      hw/pci/pcie.o
/tmp/qemu-test/src/hw/pci-host/designware.c: In function 'designware_pcie_host_realize':
/tmp/qemu-test/src/hw/pci-host/designware.c:693:5: error: incompatible type for argument 2 of 'pci_setup_iommu'
     pci_setup_iommu(pci->bus, designware_iommu_ops, s);
     ^
In file included from /tmp/qemu-test/src/include/hw/pci/msi.h:24:0,
---
/tmp/qemu-test/src/include/hw/pci/pci.h:495:6: note: expected 'const struct PCIIOMMUOps *' but argument is of type 'PCIIOMMUOps'
 void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *iommu_ops, void *opaque);
      ^
make: *** [hw/pci-host/designware.o] Error 1
make: *** Waiting for unfinished jobs....
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 662, in <module>
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=092c0f9750e6454780dcead436e6bc2c', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-ih3zhzs3/src/docker-src.2019-10-25-02.18.26.32058:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=092c0f9750e6454780dcead436e6bc2c
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-ih3zhzs3/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    3m8.783s
user    0m8.093s


The full log is available at
http://patchew.org/logs/1571920483-3382-1-git-send-email-yi.l.liu@intel.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM
  2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (22 preceding siblings ...)
  2019-10-25  6:21 ` [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM no-reply
@ 2019-10-25  6:30 ` no-reply
  2019-10-25  9:49 ` Jason Wang
  2019-11-04 17:22 ` Peter Xu
  25 siblings, 0 replies; 79+ messages in thread
From: no-reply @ 2019-10-25  6:30 UTC (permalink / raw)
  To: yi.l.liu
  Cc: tianyu.lan, kevin.tian, yi.l.liu, kvm, mst, jun.j.tian,
	qemu-devel, peterx, eric.auger, alex.williamson, jacob.jun.pan,
	pbonzini, yi.y.sun, david

Patchew URL: https://patchew.org/QEMU/1571920483-3382-1-git-send-email-yi.l.liu@intel.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC      hw/pci/pci_host.o
  CC      hw/pci/pcie.o
/tmp/qemu-test/src/hw/pci-host/designware.c: In function 'designware_pcie_host_realize':
/tmp/qemu-test/src/hw/pci-host/designware.c:693:31: error: incompatible type for argument 2 of 'pci_setup_iommu'
     pci_setup_iommu(pci->bus, designware_iommu_ops, s);
                               ^~~~~~~~~~~~~~~~~~~~
In file included from /tmp/qemu-test/src/include/hw/pci/msi.h:24,
---
/tmp/qemu-test/src/include/hw/pci/pci.h:495:54: note: expected 'const PCIIOMMUOps *' {aka 'const struct PCIIOMMUOps *'} but argument is of type 'PCIIOMMUOps' {aka 'const struct PCIIOMMUOps'}
 void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *iommu_ops, void *opaque);
                                   ~~~~~~~~~~~~~~~~~~~^~~~~~~~~
make: *** [/tmp/qemu-test/src/rules.mak:69: hw/pci-host/designware.o] Error 1
make: *** Waiting for unfinished jobs....
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 662, in <module>
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=c26679928a9c432d9832978acd80e20b', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-c5ij2tri/src/docker-src.2019-10-25-02.27.27.16595:/var/tmp/qemu:z,ro', 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=c26679928a9c432d9832978acd80e20b
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-c5ij2tri/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real    2m45.686s
user    0m7.841s


The full log is available at
http://patchew.org/logs/1571920483-3382-1-git-send-email-yi.l.liu@intel.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM
  2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (23 preceding siblings ...)
  2019-10-25  6:30 ` no-reply
@ 2019-10-25  9:49 ` Jason Wang
  2019-10-25 10:12   ` Tian, Kevin
  2019-11-04 17:22 ` Peter Xu
  25 siblings, 1 reply; 79+ messages in thread
From: Jason Wang @ 2019-10-25  9:49 UTC (permalink / raw)
  To: Liu Yi L, qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: tianyu.lan, kevin.tian, jacob.jun.pan, kvm, jun.j.tian,
	eric.auger, yi.y.sun, david


On 2019/10/24 下午8:34, Liu Yi L wrote:
> Shared virtual address (SVA), a.k.a, Shared virtual memory (SVM) on Intel
> platforms allow address space sharing between device DMA and applications.


Interesting, so the below figure demonstrates the case of VM. I wonder 
how much differences if we compare it with doing SVM between device and 
an ordinary process (e.g dpdk)?

Thanks


> SVA can reduce programming complexity and enhance security.
> This series is intended to expose SVA capability to VMs. i.e. shared guest
> application address space with passthru devices. The whole SVA virtualization
> requires QEMU/VFIO/IOMMU changes. This series includes the QEMU changes, for
> VFIO and IOMMU changes, they are in separate series (listed in the "Related
> series").
>
> The high-level architecture for SVA virtualization is as below:
>
>      .-------------.  .---------------------------.
>      |   vIOMMU    |  | Guest process CR3, FL only|
>      |             |  '---------------------------'
>      .----------------/
>      | PASID Entry |--- PASID cache flush -
>      '-------------'                       |
>      |             |                       V
>      |             |                CR3 in GPA
>      '-------------'
> Guest
> ------| Shadow |--------------------------|--------
>        v        v                          v
> Host
>      .-------------.  .----------------------.
>      |   pIOMMU    |  | Bind FL for GVA-GPA  |
>      |             |  '----------------------'
>      .----------------/  |
>      | PASID Entry |     V (Nested xlate)
>      '----------------\.------------------------------.
>      |             |   |SL for GPA-HPA, default domain|
>      |             |   '------------------------------'
>      '-------------'
> Where:
>   - FL = First level/stage one page tables
>   - SL = Second level/stage two page tables
>
> The complete vSVA upstream patches are divided into three phases:
>      1. Common APIs and PCI device direct assignment
>      2. Page Request Services (PRS) support
>      3. Mediated device assignment
>
> This RFC patchset is aiming for the phase 1. Works together with the VT-d
> driver[1] changes and VFIO changes[2].
>
> Related series:
> [1] [PATCH v6 00/10] Nested Shared Virtual Address (SVA) VT-d support:
> https://lkml.org/lkml/2019/10/22/953
> <This series is based on this kernel series from Jacob Pan>
>
> [2] [RFC v2 0/3] vfio: support Shared Virtual Addressing from Yi Liu
>
> There are roughly four parts:
>   1. Introduce IOMMUContext as abstract layer between vIOMMU emulator and
>      VFIO to avoid direct calling between the two
>   2. Passdown PASID allocation and free to host
>   3. Passdown guest PASID binding to host
>   4. Passdown guest IOMMU cache invalidation to host
>
> The full set can be found in below link:
> https://github.com/luxis1999/qemu.git: sva_vtd_v6_qemu_rfc_v2
>
> Changelog:
> 	- RFC v1 -> v2:
> 	  Introduce IOMMUContext to abstract the connection between VFIO
> 	  and vIOMMU emulator, which is a replacement of the PCIPASIDOps
> 	  in RFC v1. Modify x-scalable-mode to be string option instead of
> 	  adding a new option as RFC v1 did. Refined the pasid cache management
> 	  and addressed the TODOs mentioned in RFC v1.
> 	  RFC v1: https://patchwork.kernel.org/cover/11033657/
>
> Eric Auger (1):
>    update-linux-headers: Import iommu.h
>
> Liu Yi L (20):
>    header update VFIO/IOMMU vSVA APIs against 5.4.0-rc3+
>    intel_iommu: modify x-scalable-mode to be string option
>    vfio/common: add iommu_ctx_notifier in container
>    hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps
>    hw/pci: introduce pci_device_iommu_context()
>    intel_iommu: provide get_iommu_context() callback
>    vfio/pci: add iommu_context notifier for pasid alloc/free
>    intel_iommu: add virtual command capability support
>    intel_iommu: process pasid cache invalidation
>    intel_iommu: add present bit check for pasid table entries
>    intel_iommu: add PASID cache management infrastructure
>    vfio/pci: add iommu_context notifier for pasid bind/unbind
>    intel_iommu: bind/unbind guest page table to host
>    intel_iommu: replay guest pasid bindings to host
>    intel_iommu: replay pasid binds after context cache invalidation
>    intel_iommu: do not passdown pasid bind for PASID #0
>    vfio/pci: add iommu_context notifier for PASID-based iotlb flush
>    intel_iommu: process PASID-based iotlb invalidation
>    intel_iommu: propagate PASID-based iotlb invalidation to host
>    intel_iommu: process PASID-based Device-TLB invalidation
>
> Peter Xu (1):
>    hw/iommu: introduce IOMMUContext
>
>   hw/Makefile.objs                |    1 +
>   hw/alpha/typhoon.c              |    6 +-
>   hw/arm/smmu-common.c            |    6 +-
>   hw/hppa/dino.c                  |    6 +-
>   hw/i386/amd_iommu.c             |    6 +-
>   hw/i386/intel_iommu.c           | 1249 +++++++++++++++++++++++++++++++++++++--
>   hw/i386/intel_iommu_internal.h  |  109 ++++
>   hw/i386/trace-events            |    6 +
>   hw/iommu/Makefile.objs          |    1 +
>   hw/iommu/iommu.c                |   66 +++
>   hw/pci-host/designware.c        |    6 +-
>   hw/pci-host/ppce500.c           |    6 +-
>   hw/pci-host/prep.c              |    6 +-
>   hw/pci-host/sabre.c             |    6 +-
>   hw/pci/pci.c                    |   27 +-
>   hw/ppc/ppc440_pcix.c            |    6 +-
>   hw/ppc/spapr_pci.c              |    6 +-
>   hw/s390x/s390-pci-bus.c         |    8 +-
>   hw/vfio/common.c                |   10 +
>   hw/vfio/pci.c                   |  149 +++++
>   include/hw/i386/intel_iommu.h   |   58 +-
>   include/hw/iommu/iommu.h        |  113 ++++
>   include/hw/pci/pci.h            |   13 +-
>   include/hw/pci/pci_bus.h        |    2 +-
>   include/hw/vfio/vfio-common.h   |    9 +
>   linux-headers/linux/iommu.h     |  324 ++++++++++
>   linux-headers/linux/vfio.h      |   83 +++
>   scripts/update-linux-headers.sh |    2 +-
>   28 files changed, 2232 insertions(+), 58 deletions(-)
>   create mode 100644 hw/iommu/Makefile.objs
>   create mode 100644 hw/iommu/iommu.c
>   create mode 100644 include/hw/iommu/iommu.h
>   create mode 100644 linux-headers/linux/iommu.h
>



^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM
  2019-10-25  9:49 ` Jason Wang
@ 2019-10-25 10:12   ` Tian, Kevin
  2019-10-31  4:33     ` Jason Wang
  0 siblings, 1 reply; 79+ messages in thread
From: Tian, Kevin @ 2019-10-25 10:12 UTC (permalink / raw)
  To: Jason Wang, Liu, Yi L, qemu-devel, mst, pbonzini,
	alex.williamson, peterx
  Cc: tianyu.lan, jacob.jun.pan, kvm, Tian, Jun J, eric.auger, Sun,
	Yi Y, david

> From: Jason Wang [mailto:jasowang@redhat.com]
> Sent: Friday, October 25, 2019 5:49 PM
> 
> 
> On 2019/10/24 下午8:34, Liu Yi L wrote:
> > Shared virtual address (SVA), a.k.a, Shared virtual memory (SVM) on Intel
> > platforms allow address space sharing between device DMA and
> applications.
> 
> 
> Interesting, so the below figure demonstrates the case of VM. I wonder
> how much differences if we compare it with doing SVM between device
> and
> an ordinary process (e.g dpdk)?
> 
> Thanks

One difference is that ordinary process requires only stage-1 translation,
while VM requires nested translation.

> 
> 
> > SVA can reduce programming complexity and enhance security.
> > This series is intended to expose SVA capability to VMs. i.e. shared guest
> > application address space with passthru devices. The whole SVA
> virtualization
> > requires QEMU/VFIO/IOMMU changes. This series includes the QEMU
> changes, for
> > VFIO and IOMMU changes, they are in separate series (listed in the
> "Related
> > series").
> >
> > The high-level architecture for SVA virtualization is as below:
> >
> >      .-------------.  .---------------------------.
> >      |   vIOMMU    |  | Guest process CR3, FL only|
> >      |             |  '---------------------------'
> >      .----------------/
> >      | PASID Entry |--- PASID cache flush -
> >      '-------------'                       |
> >      |             |                       V
> >      |             |                CR3 in GPA
> >      '-------------'
> > Guest
> > ------| Shadow |--------------------------|--------
> >        v        v                          v
> > Host
> >      .-------------.  .----------------------.
> >      |   pIOMMU    |  | Bind FL for GVA-GPA  |
> >      |             |  '----------------------'
> >      .----------------/  |
> >      | PASID Entry |     V (Nested xlate)
> >      '----------------\.------------------------------.
> >      |             |   |SL for GPA-HPA, default domain|
> >      |             |   '------------------------------'
> >      '-------------'
> > Where:
> >   - FL = First level/stage one page tables
> >   - SL = Second level/stage two page tables
> >
> > The complete vSVA upstream patches are divided into three phases:
> >      1. Common APIs and PCI device direct assignment
> >      2. Page Request Services (PRS) support
> >      3. Mediated device assignment
> >
> > This RFC patchset is aiming for the phase 1. Works together with the VT-d
> > driver[1] changes and VFIO changes[2].
> >
> > Related series:
> > [1] [PATCH v6 00/10] Nested Shared Virtual Address (SVA) VT-d support:
> > https://lkml.org/lkml/2019/10/22/953
> > <This series is based on this kernel series from Jacob Pan>
> >
> > [2] [RFC v2 0/3] vfio: support Shared Virtual Addressing from Yi Liu
> >
> > There are roughly four parts:
> >   1. Introduce IOMMUContext as abstract layer between vIOMMU
> emulator and
> >      VFIO to avoid direct calling between the two
> >   2. Passdown PASID allocation and free to host
> >   3. Passdown guest PASID binding to host
> >   4. Passdown guest IOMMU cache invalidation to host
> >
> > The full set can be found in below link:
> > https://github.com/luxis1999/qemu.git: sva_vtd_v6_qemu_rfc_v2
> >
> > Changelog:
> > 	- RFC v1 -> v2:
> > 	  Introduce IOMMUContext to abstract the connection between
> VFIO
> > 	  and vIOMMU emulator, which is a replacement of the
> PCIPASIDOps
> > 	  in RFC v1. Modify x-scalable-mode to be string option instead of
> > 	  adding a new option as RFC v1 did. Refined the pasid cache
> management
> > 	  and addressed the TODOs mentioned in RFC v1.
> > 	  RFC v1: https://patchwork.kernel.org/cover/11033657/
> >
> > Eric Auger (1):
> >    update-linux-headers: Import iommu.h
> >
> > Liu Yi L (20):
> >    header update VFIO/IOMMU vSVA APIs against 5.4.0-rc3+
> >    intel_iommu: modify x-scalable-mode to be string option
> >    vfio/common: add iommu_ctx_notifier in container
> >    hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps
> >    hw/pci: introduce pci_device_iommu_context()
> >    intel_iommu: provide get_iommu_context() callback
> >    vfio/pci: add iommu_context notifier for pasid alloc/free
> >    intel_iommu: add virtual command capability support
> >    intel_iommu: process pasid cache invalidation
> >    intel_iommu: add present bit check for pasid table entries
> >    intel_iommu: add PASID cache management infrastructure
> >    vfio/pci: add iommu_context notifier for pasid bind/unbind
> >    intel_iommu: bind/unbind guest page table to host
> >    intel_iommu: replay guest pasid bindings to host
> >    intel_iommu: replay pasid binds after context cache invalidation
> >    intel_iommu: do not passdown pasid bind for PASID #0
> >    vfio/pci: add iommu_context notifier for PASID-based iotlb flush
> >    intel_iommu: process PASID-based iotlb invalidation
> >    intel_iommu: propagate PASID-based iotlb invalidation to host
> >    intel_iommu: process PASID-based Device-TLB invalidation
> >
> > Peter Xu (1):
> >    hw/iommu: introduce IOMMUContext
> >
> >   hw/Makefile.objs                |    1 +
> >   hw/alpha/typhoon.c              |    6 +-
> >   hw/arm/smmu-common.c            |    6 +-
> >   hw/hppa/dino.c                  |    6 +-
> >   hw/i386/amd_iommu.c             |    6 +-
> >   hw/i386/intel_iommu.c           | 1249
> +++++++++++++++++++++++++++++++++++++--
> >   hw/i386/intel_iommu_internal.h  |  109 ++++
> >   hw/i386/trace-events            |    6 +
> >   hw/iommu/Makefile.objs          |    1 +
> >   hw/iommu/iommu.c                |   66 +++
> >   hw/pci-host/designware.c        |    6 +-
> >   hw/pci-host/ppce500.c           |    6 +-
> >   hw/pci-host/prep.c              |    6 +-
> >   hw/pci-host/sabre.c             |    6 +-
> >   hw/pci/pci.c                    |   27 +-
> >   hw/ppc/ppc440_pcix.c            |    6 +-
> >   hw/ppc/spapr_pci.c              |    6 +-
> >   hw/s390x/s390-pci-bus.c         |    8 +-
> >   hw/vfio/common.c                |   10 +
> >   hw/vfio/pci.c                   |  149 +++++
> >   include/hw/i386/intel_iommu.h   |   58 +-
> >   include/hw/iommu/iommu.h        |  113 ++++
> >   include/hw/pci/pci.h            |   13 +-
> >   include/hw/pci/pci_bus.h        |    2 +-
> >   include/hw/vfio/vfio-common.h   |    9 +
> >   linux-headers/linux/iommu.h     |  324 ++++++++++
> >   linux-headers/linux/vfio.h      |   83 +++
> >   scripts/update-linux-headers.sh |    2 +-
> >   28 files changed, 2232 insertions(+), 58 deletions(-)
> >   create mode 100644 hw/iommu/Makefile.objs
> >   create mode 100644 hw/iommu/iommu.c
> >   create mode 100644 include/hw/iommu/iommu.h
> >   create mode 100644 linux-headers/linux/iommu.h
> >


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 04/22] hw/iommu: introduce IOMMUContext
  2019-10-24 12:34 ` [RFC v2 04/22] hw/iommu: introduce IOMMUContext Liu Yi L
@ 2019-10-27 17:39   ` David Gibson
  2019-11-06 11:18     ` Liu, Yi L
  0 siblings, 1 reply; 79+ messages in thread
From: David Gibson @ 2019-10-27 17:39 UTC (permalink / raw)
  To: Liu Yi L
  Cc: tianyu.lan, kevin.tian, jacob.jun.pan, Yi Sun, kvm, mst,
	jun.j.tian, qemu-devel, peterx, eric.auger, alex.williamson,
	pbonzini, yi.y.sun

[-- Attachment #1: Type: text/plain, Size: 8514 bytes --]

On Thu, Oct 24, 2019 at 08:34:25AM -0400, Liu Yi L wrote:
> From: Peter Xu <peterx@redhat.com>
> 
> This patch adds IOMMUContext as an abstract layer of IOMMU related
> operations. The current usage of this abstract layer is setup dual-
> stage IOMMU translation (vSVA) for vIOMMU.
> 
> To setup dual-stage IOMMU translation, vIOMMU needs to propagate
> guest changes to host via passthru channels (e.g. VFIO). To have
> a better abstraction, it is better to avoid direct calling between
> vIOMMU and VFIO. So we have this new structure to act as abstract
> layer between VFIO and vIOMMU. So far, it is proposed to provide a
> notifier mechanism, which registered by VFIO and fired by vIOMMU.
> 
> For more background, may refer to the discussion below:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg05022.html
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Yi Sun <yi.y.sun@linux.intel.com>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> ---
>  hw/Makefile.objs         |  1 +
>  hw/iommu/Makefile.objs   |  1 +
>  hw/iommu/iommu.c         | 66 ++++++++++++++++++++++++++++++++++++++++
>  include/hw/iommu/iommu.h | 79 ++++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 147 insertions(+)
>  create mode 100644 hw/iommu/Makefile.objs
>  create mode 100644 hw/iommu/iommu.c
>  create mode 100644 include/hw/iommu/iommu.h
> 
> diff --git a/hw/Makefile.objs b/hw/Makefile.objs
> index ece6cc3..ac19f9c 100644
> --- a/hw/Makefile.objs
> +++ b/hw/Makefile.objs
> @@ -39,6 +39,7 @@ devices-dirs-y += xen/
>  devices-dirs-$(CONFIG_MEM_DEVICE) += mem/
>  devices-dirs-y += semihosting/
>  devices-dirs-y += smbios/
> +devices-dirs-y += iommu/
>  endif
>  
>  common-obj-y += $(devices-dirs-y)
> diff --git a/hw/iommu/Makefile.objs b/hw/iommu/Makefile.objs
> new file mode 100644
> index 0000000..0484b79
> --- /dev/null
> +++ b/hw/iommu/Makefile.objs
> @@ -0,0 +1 @@
> +obj-y += iommu.o
> diff --git a/hw/iommu/iommu.c b/hw/iommu/iommu.c
> new file mode 100644
> index 0000000..2391b0d
> --- /dev/null
> +++ b/hw/iommu/iommu.c
> @@ -0,0 +1,66 @@
> +/*
> + * QEMU abstract of IOMMU context
> + *
> + * Copyright (C) 2019 Red Hat Inc.
> + *
> + * Authors: Peter Xu <peterx@redhat.com>,
> + *          Liu Yi L <yi.l.liu@intel.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/iommu/iommu.h"
> +
> +void iommu_ctx_notifier_register(IOMMUContext *iommu_ctx,
> +                                 IOMMUCTXNotifier *n,
> +                                 IOMMUCTXNotifyFn fn,
> +                                 IOMMUCTXEvent event)
> +{
> +    n->event = event;
> +    n->iommu_ctx_event_notify = fn;
> +    QLIST_INSERT_HEAD(&iommu_ctx->iommu_ctx_notifiers, n, node);

Having this both modify the IOMMUCTXNotifier structure and insert it
in the list seems confusing to me - and gratuitously different from
the interface for both IOMMUNotifier and Notifier.

Separating out a iommu_ctx_notifier_init() as a helper and having
register take a fully initialized structure seems better to me.

> +    return;

Using an explicit return at the end of a function returning void is an
odd style.

> +}
> +
> +void iommu_ctx_notifier_unregister(IOMMUContext *iommu_ctx,
> +                                   IOMMUCTXNotifier *notifier)
> +{
> +    IOMMUCTXNotifier *cur, *next;
> +
> +    QLIST_FOREACH_SAFE(cur, &iommu_ctx->iommu_ctx_notifiers, node, next) {
> +        if (cur == notifier) {
> +            QLIST_REMOVE(cur, node);
> +            break;
> +        }
> +    }
> +}
> +
> +void iommu_ctx_event_notify(IOMMUContext *iommu_ctx,
> +                            IOMMUCTXEventData *event_data)
> +{
> +    IOMMUCTXNotifier *cur;
> +
> +    QLIST_FOREACH(cur, &iommu_ctx->iommu_ctx_notifiers, node) {
> +        if ((cur->event == event_data->event) &&
> +                                 cur->iommu_ctx_event_notify) {

Do you actually need the test on iommu_ctx_event_notify?  I can't see
any reason to register a notifier with a NULL function pointer.

> +            cur->iommu_ctx_event_notify(cur, event_data);
> +        }
> +    }
> +}
> +
> +void iommu_context_init(IOMMUContext *iommu_ctx)
> +{
> +    QLIST_INIT(&iommu_ctx->iommu_ctx_notifiers);
> +}
> diff --git a/include/hw/iommu/iommu.h b/include/hw/iommu/iommu.h
> new file mode 100644
> index 0000000..c22c442
> --- /dev/null
> +++ b/include/hw/iommu/iommu.h
> @@ -0,0 +1,79 @@
> +/*
> + * QEMU abstraction of IOMMU Context
> + *
> + * Copyright (C) 2019 Red Hat Inc.
> + *
> + * Authors: Peter Xu <peterx@redhat.com>,
> + *          Liu, Yi L <yi.l.liu@intel.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef HW_PCI_PASID_H
> +#define HW_PCI_PASID_H

These guards need to be updated for the new header name.

> +
> +#include "qemu/queue.h"
> +#ifndef CONFIG_USER_ONLY
> +#include "exec/hwaddr.h"
> +#endif
> +
> +typedef struct IOMMUContext IOMMUContext;
> +
> +enum IOMMUCTXEvent {
> +    IOMMU_CTX_EVENT_NUM,
> +};
> +typedef enum IOMMUCTXEvent IOMMUCTXEvent;
> +
> +struct IOMMUCTXEventData {
> +    IOMMUCTXEvent event;
> +    uint64_t length;
> +    void *data;
> +};
> +typedef struct IOMMUCTXEventData IOMMUCTXEventData;
> +
> +typedef struct IOMMUCTXNotifier IOMMUCTXNotifier;
> +
> +typedef void (*IOMMUCTXNotifyFn)(IOMMUCTXNotifier *notifier,
> +                                 IOMMUCTXEventData *event_data);
> +
> +struct IOMMUCTXNotifier {
> +    IOMMUCTXNotifyFn iommu_ctx_event_notify;
> +    /*
> +     * What events we are listening to. Let's allow multiple event
> +     * registrations from beginning.
> +     */
> +    IOMMUCTXEvent event;
> +    QLIST_ENTRY(IOMMUCTXNotifier) node;
> +};
> +
> +/*
> + * This is an abstraction of IOMMU context.
> + */
> +struct IOMMUContext {
> +    uint32_t pasid;

This confuses me a bit.  I thought the idea was that IOMMUContext with
SVM would represent all the PASIDs in use, but here we have a specific
pasid stored in the structure.

> +    QLIST_HEAD(, IOMMUCTXNotifier) iommu_ctx_notifiers;
> +};
> +
> +void iommu_ctx_notifier_register(IOMMUContext *iommu_ctx,
> +                                 IOMMUCTXNotifier *n,
> +                                 IOMMUCTXNotifyFn fn,
> +                                 IOMMUCTXEvent event);
> +void iommu_ctx_notifier_unregister(IOMMUContext *iommu_ctx,
> +                                   IOMMUCTXNotifier *notifier);
> +void iommu_ctx_event_notify(IOMMUContext *iommu_ctx,
> +                            IOMMUCTXEventData *event_data);
> +
> +void iommu_context_init(IOMMUContext *iommu_ctx);
> +
> +#endif

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 06/22] hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps
  2019-10-24 12:34 ` [RFC v2 06/22] hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps Liu Yi L
@ 2019-10-27 17:43   ` David Gibson
  2019-11-06  8:18     ` Liu, Yi L
  2019-11-01 18:09   ` Peter Xu
  1 sibling, 1 reply; 79+ messages in thread
From: David Gibson @ 2019-10-27 17:43 UTC (permalink / raw)
  To: Liu Yi L
  Cc: tianyu.lan, kevin.tian, jacob.jun.pan, Yi Sun, kvm, mst,
	jun.j.tian, qemu-devel, peterx, eric.auger, alex.williamson,
	pbonzini, yi.y.sun

[-- Attachment #1: Type: text/plain, Size: 16685 bytes --]

On Thu, Oct 24, 2019 at 08:34:27AM -0400, Liu Yi L wrote:
> This patch modifies pci_setup_iommu() to set PCIIOMMUOps instead of only
> setting PCIIOMMUFunc. PCIIOMMUFunc is previously used to get an address
> space for a device in vendor specific way. The PCIIOMMUOps still offers
> this functionality. Use PCIIOMMUOps leaves space to add more iommu related
> vendor specific operations.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Yi Sun <yi.y.sun@linux.intel.com>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  hw/alpha/typhoon.c       |  6 +++++-
>  hw/arm/smmu-common.c     |  6 +++++-
>  hw/hppa/dino.c           |  6 +++++-
>  hw/i386/amd_iommu.c      |  6 +++++-
>  hw/i386/intel_iommu.c    |  6 +++++-
>  hw/pci-host/designware.c |  6 +++++-
>  hw/pci-host/ppce500.c    |  6 +++++-
>  hw/pci-host/prep.c       |  6 +++++-
>  hw/pci-host/sabre.c      |  6 +++++-
>  hw/pci/pci.c             | 11 ++++++-----
>  hw/ppc/ppc440_pcix.c     |  6 +++++-
>  hw/ppc/spapr_pci.c       |  6 +++++-
>  hw/s390x/s390-pci-bus.c  |  8 ++++++--
>  include/hw/pci/pci.h     |  8 ++++++--
>  include/hw/pci/pci_bus.h |  2 +-
>  15 files changed, 74 insertions(+), 21 deletions(-)
> 
> diff --git a/hw/alpha/typhoon.c b/hw/alpha/typhoon.c
> index 179e1f7..b890771 100644
> --- a/hw/alpha/typhoon.c
> +++ b/hw/alpha/typhoon.c
> @@ -741,6 +741,10 @@ static AddressSpace *typhoon_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>      return &s->pchip.iommu_as;
>  }
>  
> +static const PCIIOMMUOps typhoon_iommu_ops = {
> +    .get_address_space = typhoon_pci_dma_iommu,
> +};
> +
>  static void typhoon_set_irq(void *opaque, int irq, int level)
>  {
>      TyphoonState *s = opaque;
> @@ -901,7 +905,7 @@ PCIBus *typhoon_init(ram_addr_t ram_size, ISABus **isa_bus,
>                               "iommu-typhoon", UINT64_MAX);
>      address_space_init(&s->pchip.iommu_as, MEMORY_REGION(&s->pchip.iommu),
>                         "pchip0-pci");
> -    pci_setup_iommu(b, typhoon_pci_dma_iommu, s);
> +    pci_setup_iommu(b, &typhoon_iommu_ops, s);
>  
>      /* Pchip0 PCI special/interrupt acknowledge, 0x801.F800.0000, 64MB.  */
>      memory_region_init_io(&s->pchip.reg_iack, OBJECT(s), &alpha_pci_iack_ops,
> diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
> index 245817d..d668514 100644
> --- a/hw/arm/smmu-common.c
> +++ b/hw/arm/smmu-common.c
> @@ -342,6 +342,10 @@ static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
>      return &sdev->as;
>  }
>  
> +static const PCIIOMMUOps smmu_ops = {
> +    .get_address_space = smmu_find_add_as,
> +};
> +
>  IOMMUMemoryRegion *smmu_iommu_mr(SMMUState *s, uint32_t sid)
>  {
>      uint8_t bus_n, devfn;
> @@ -436,7 +440,7 @@ static void smmu_base_realize(DeviceState *dev, Error **errp)
>      s->smmu_pcibus_by_busptr = g_hash_table_new(NULL, NULL);
>  
>      if (s->primary_bus) {
> -        pci_setup_iommu(s->primary_bus, smmu_find_add_as, s);
> +        pci_setup_iommu(s->primary_bus, &smmu_ops, s);
>      } else {
>          error_setg(errp, "SMMU is not attached to any PCI bus!");
>      }
> diff --git a/hw/hppa/dino.c b/hw/hppa/dino.c
> index ab6969b..dbcff03 100644
> --- a/hw/hppa/dino.c
> +++ b/hw/hppa/dino.c
> @@ -389,6 +389,10 @@ static AddressSpace *dino_pcihost_set_iommu(PCIBus *bus, void *opaque,
>      return &s->bm_as;
>  }
>  
> +static const PCIIOMMUOps dino_iommu_ops = {
> +    .get_address_space = dino_pcihost_set_iommu,
> +};
> +
>  /*
>   * Dino interrupts are connected as shown on Page 78, Table 23
>   * (Little-endian bit numbers)
> @@ -508,7 +512,7 @@ PCIBus *dino_init(MemoryRegion *addr_space,
>      memory_region_add_subregion(&s->bm, 0xfff00000,
>                                  &s->bm_cpu_alias);
>      address_space_init(&s->bm_as, &s->bm, "pci-bm");
> -    pci_setup_iommu(b, dino_pcihost_set_iommu, s);
> +    pci_setup_iommu(b, &dino_iommu_ops, s);
>  
>      *p_rtc_irq = qemu_allocate_irq(dino_set_timer_irq, s, 0);
>      *p_ser_irq = qemu_allocate_irq(dino_set_serial_irq, s, 0);
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index d372636..ba6904c 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -1451,6 +1451,10 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>      return &iommu_as[devfn]->as;
>  }
>  
> +static const PCIIOMMUOps amdvi_iommu_ops = {
> +    .get_address_space = amdvi_host_dma_iommu,
> +};
> +
>  static const MemoryRegionOps mmio_mem_ops = {
>      .read = amdvi_mmio_read,
>      .write = amdvi_mmio_write,
> @@ -1576,7 +1580,7 @@ static void amdvi_realize(DeviceState *dev, Error **err)
>  
>      sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->mmio);
>      sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, AMDVI_BASE_ADDR);
> -    pci_setup_iommu(bus, amdvi_host_dma_iommu, s);
> +    pci_setup_iommu(bus, &amdvi_iommu_ops, s);
>      s->devid = object_property_get_int(OBJECT(&s->pci), "addr", err);
>      msi_init(&s->pci.dev, 0, 1, true, false, err);
>      amdvi_init(s);
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 4a1a07a..67a7836 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -3666,6 +3666,10 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>      return &vtd_as->as;
>  }
>  
> +static PCIIOMMUOps vtd_iommu_ops = {
> +    .get_address_space = vtd_host_dma_iommu,
> +};
> +
>  static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
>  {
>      X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
> @@ -3782,7 +3786,7 @@ static void vtd_realize(DeviceState *dev, Error **errp)
>                                                g_free, g_free);
>      vtd_init(s);
>      sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, Q35_HOST_BRIDGE_IOMMU_ADDR);
> -    pci_setup_iommu(bus, vtd_host_dma_iommu, dev);
> +    pci_setup_iommu(bus, &vtd_iommu_ops, dev);
>      /* Pseudo address space under root PCI bus. */
>      pcms->ioapic_as = vtd_host_dma_iommu(bus, s, Q35_PSEUDO_DEVFN_IOAPIC);
>      qemu_add_machine_init_done_notifier(&vtd_machine_done_notify);
> diff --git a/hw/pci-host/designware.c b/hw/pci-host/designware.c
> index 71e9b0d..235d6af 100644
> --- a/hw/pci-host/designware.c
> +++ b/hw/pci-host/designware.c
> @@ -645,6 +645,10 @@ static AddressSpace *designware_pcie_host_set_iommu(PCIBus *bus, void *opaque,
>      return &s->pci.address_space;
>  }
>  
> +static const PCIIOMMUOps designware_iommu_ops = {
> +    .get_address_space = designware_pcie_host_set_iommu,
> +};
> +
>  static void designware_pcie_host_realize(DeviceState *dev, Error **errp)
>  {
>      PCIHostState *pci = PCI_HOST_BRIDGE(dev);
> @@ -686,7 +690,7 @@ static void designware_pcie_host_realize(DeviceState *dev, Error **errp)
>      address_space_init(&s->pci.address_space,
>                         &s->pci.address_space_root,
>                         "pcie-bus-address-space");
> -    pci_setup_iommu(pci->bus, designware_pcie_host_set_iommu, s);
> +    pci_setup_iommu(pci->bus, designware_iommu_ops, s);
>  
>      qdev_set_parent_bus(DEVICE(&s->root), BUS(pci->bus));
>      qdev_init_nofail(DEVICE(&s->root));
> diff --git a/hw/pci-host/ppce500.c b/hw/pci-host/ppce500.c
> index 8bed8e8..0f907b0 100644
> --- a/hw/pci-host/ppce500.c
> +++ b/hw/pci-host/ppce500.c
> @@ -439,6 +439,10 @@ static AddressSpace *e500_pcihost_set_iommu(PCIBus *bus, void *opaque,
>      return &s->bm_as;
>  }
>  
> +static const PCIIOMMUOps ppce500_iommu_ops = {
> +    .get_address_space = e500_pcihost_set_iommu,
> +};
> +
>  static void e500_pcihost_realize(DeviceState *dev, Error **errp)
>  {
>      SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
> @@ -473,7 +477,7 @@ static void e500_pcihost_realize(DeviceState *dev, Error **errp)
>      memory_region_init(&s->bm, OBJECT(s), "bm-e500", UINT64_MAX);
>      memory_region_add_subregion(&s->bm, 0x0, &s->busmem);
>      address_space_init(&s->bm_as, &s->bm, "pci-bm");
> -    pci_setup_iommu(b, e500_pcihost_set_iommu, s);
> +    pci_setup_iommu(b, &ppce500_iommu_ops, s);
>  
>      pci_create_simple(b, 0, "e500-host-bridge");
>  
> diff --git a/hw/pci-host/prep.c b/hw/pci-host/prep.c
> index 85d7ba9..f372524 100644
> --- a/hw/pci-host/prep.c
> +++ b/hw/pci-host/prep.c
> @@ -213,6 +213,10 @@ static AddressSpace *raven_pcihost_set_iommu(PCIBus *bus, void *opaque,
>      return &s->bm_as;
>  }
>  
> +static const PCIIOMMU raven_iommu_ops = {
> +    .get_address_space = raven_pcihost_set_iommu,
> +};
> +
>  static void raven_change_gpio(void *opaque, int n, int level)
>  {
>      PREPPCIState *s = opaque;
> @@ -303,7 +307,7 @@ static void raven_pcihost_initfn(Object *obj)
>      memory_region_add_subregion(&s->bm, 0         , &s->bm_pci_memory_alias);
>      memory_region_add_subregion(&s->bm, 0x80000000, &s->bm_ram_alias);
>      address_space_init(&s->bm_as, &s->bm, "raven-bm");
> -    pci_setup_iommu(&s->pci_bus, raven_pcihost_set_iommu, s);
> +    pci_setup_iommu(&s->pci_bus, &raven_iommu_ops, s);
>  
>      h->bus = &s->pci_bus;
>  
> diff --git a/hw/pci-host/sabre.c b/hw/pci-host/sabre.c
> index fae20ee..79b7565 100644
> --- a/hw/pci-host/sabre.c
> +++ b/hw/pci-host/sabre.c
> @@ -112,6 +112,10 @@ static AddressSpace *sabre_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>      return &is->iommu_as;
>  }
>  
> +static const PCIIOMMUOps sabre_iommu_ops = {
> +    .get_address_space = sabre_pci_dma_iommu,
> +};
> +
>  static void sabre_config_write(void *opaque, hwaddr addr,
>                                 uint64_t val, unsigned size)
>  {
> @@ -402,7 +406,7 @@ static void sabre_realize(DeviceState *dev, Error **errp)
>      /* IOMMU */
>      memory_region_add_subregion_overlap(&s->sabre_config, 0x200,
>                      sysbus_mmio_get_region(SYS_BUS_DEVICE(s->iommu), 0), 1);
> -    pci_setup_iommu(phb->bus, sabre_pci_dma_iommu, s->iommu);
> +    pci_setup_iommu(phb->bus, &sabre_iommu_ops, s->iommu);
>  
>      /* APB secondary busses */
>      pci_dev = pci_create_multifunction(phb->bus, PCI_DEVFN(1, 0), true,
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index aa05c2b..b5ce9ca 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -2615,18 +2615,19 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
>      PCIBus *bus = pci_get_bus(dev);
>      PCIBus *iommu_bus = bus;
>  
> -    while(iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) {
> +    while (iommu_bus && !iommu_bus->iommu_ops && iommu_bus->parent_dev) {
>          iommu_bus = pci_get_bus(iommu_bus->parent_dev);
>      }
> -    if (iommu_bus && iommu_bus->iommu_fn) {
> -        return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, dev->devfn);
> +    if (iommu_bus && iommu_bus->iommu_ops) {
> +        return iommu_bus->iommu_ops->get_address_space(bus,
> +                           iommu_bus->iommu_opaque, dev->devfn);
>      }
>      return &address_space_memory;
>  }
>  
> -void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque)
> +void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *ops, void *opaque)
>  {
> -    bus->iommu_fn = fn;
> +    bus->iommu_ops = ops;
>      bus->iommu_opaque = opaque;
>  }
>  
> diff --git a/hw/ppc/ppc440_pcix.c b/hw/ppc/ppc440_pcix.c
> index 2ee2d4f..2c8579c 100644
> --- a/hw/ppc/ppc440_pcix.c
> +++ b/hw/ppc/ppc440_pcix.c
> @@ -442,6 +442,10 @@ static AddressSpace *ppc440_pcix_set_iommu(PCIBus *b, void *opaque, int devfn)
>      return &s->bm_as;
>  }
>  
> +static const PCIIOMMUOps ppc440_iommu_ops = {
> +    .get_adress_space = ppc440_pcix_set_iommu,
> +};
> +
>  /* The default pci_host_data_{read,write} functions in pci/pci_host.c
>   * deny access to registers without bit 31 set but our clients want
>   * this to work so we have to override these here */
> @@ -487,7 +491,7 @@ static void ppc440_pcix_realize(DeviceState *dev, Error **errp)
>      memory_region_init(&s->bm, OBJECT(s), "bm-ppc440-pcix", UINT64_MAX);
>      memory_region_add_subregion(&s->bm, 0x0, &s->busmem);
>      address_space_init(&s->bm_as, &s->bm, "pci-bm");
> -    pci_setup_iommu(h->bus, ppc440_pcix_set_iommu, s);
> +    pci_setup_iommu(h->bus, &ppc440_iommu_ops, s);
>  
>      memory_region_init(&s->container, OBJECT(s), "pci-container", PCI_ALL_SIZE);
>      memory_region_init_io(&h->conf_mem, OBJECT(s), &pci_host_conf_le_ops,
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 01ff41d..83cd857 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -771,6 +771,10 @@ static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>      return &phb->iommu_as;
>  }
>  
> +static const PCIIOMMUOps spapr_iommu_ops = {
> +    .get_address_space = spapr_pci_dma_iommu,
> +};
> +
>  static char *spapr_phb_vfio_get_loc_code(SpaprPhbState *sphb,  PCIDevice *pdev)
>  {
>      char *path = NULL, *buf = NULL, *host = NULL;
> @@ -1950,7 +1954,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>      memory_region_add_subregion(&sphb->iommu_root, SPAPR_PCI_MSI_WINDOW,
>                                  &sphb->msiwindow);
>  
> -    pci_setup_iommu(bus, spapr_pci_dma_iommu, sphb);
> +    pci_setup_iommu(bus, &spapr_iommu_ops, sphb);
>  
>      pci_bus_set_route_irq_fn(bus, spapr_route_intx_pin_to_irq);
>  
> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> index 2d2f4a7..14684a0 100644
> --- a/hw/s390x/s390-pci-bus.c
> +++ b/hw/s390x/s390-pci-bus.c
> @@ -635,6 +635,10 @@ static AddressSpace *s390_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>      return &iommu->as;
>  }
>  
> +static const PCIIOMMUOps s390_iommu_ops = {
> +    .get_address_space = s390_pci_dma_iommu,
> +};
> +
>  static uint8_t set_ind_atomic(uint64_t ind_loc, uint8_t to_be_set)
>  {
>      uint8_t ind_old, ind_new;
> @@ -748,7 +752,7 @@ static void s390_pcihost_realize(DeviceState *dev, Error **errp)
>      b = pci_register_root_bus(dev, NULL, s390_pci_set_irq, s390_pci_map_irq,
>                                NULL, get_system_memory(), get_system_io(), 0,
>                                64, TYPE_PCI_BUS);
> -    pci_setup_iommu(b, s390_pci_dma_iommu, s);
> +    pci_setup_iommu(b, &s390_iommu_ops, s);
>  
>      bus = BUS(b);
>      qbus_set_hotplug_handler(bus, OBJECT(dev), &local_err);
> @@ -919,7 +923,7 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>  
>          pdev = PCI_DEVICE(dev);
>          pci_bridge_map_irq(pb, dev->id, s390_pci_map_irq);
> -        pci_setup_iommu(&pb->sec_bus, s390_pci_dma_iommu, s);
> +        pci_setup_iommu(&pb->sec_bus, &s390_iommu_ops, s);
>  
>          qbus_set_hotplug_handler(BUS(&pb->sec_bus), OBJECT(s), errp);
>  
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index f3f0ffd..d9fed8d 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -480,10 +480,14 @@ void pci_bus_get_w64_range(PCIBus *bus, Range *range);
>  
>  void pci_device_deassert_intx(PCIDevice *dev);
>  
> -typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, int);
> +typedef struct PCIIOMMUOps PCIIOMMUOps;
> +struct PCIIOMMUOps {
> +    AddressSpace * (*get_address_space)(PCIBus *bus,
> +                                void *opaque, int32_t devfn);
> +};
>  
>  AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
> -void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
> +void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *iommu_ops, void *opaque);
>  
>  static inline void
>  pci_set_byte(uint8_t *config, uint8_t val)
> diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
> index 0714f57..c281057 100644
> --- a/include/hw/pci/pci_bus.h
> +++ b/include/hw/pci/pci_bus.h
> @@ -29,7 +29,7 @@ enum PCIBusFlags {
>  struct PCIBus {
>      BusState qbus;
>      enum PCIBusFlags flags;
> -    PCIIOMMUFunc iommu_fn;
> +    const PCIIOMMUOps *iommu_ops;
>      void *iommu_opaque;
>      uint8_t devfn_min;
>      uint32_t slot_reserved_mask;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 07/22] hw/pci: introduce pci_device_iommu_context()
  2019-10-24 12:34 ` [RFC v2 07/22] hw/pci: introduce pci_device_iommu_context() Liu Yi L
@ 2019-10-29 11:50   ` David Gibson
  2019-11-06  8:20     ` Liu, Yi L
  2019-11-01 18:09   ` Peter Xu
  1 sibling, 1 reply; 79+ messages in thread
From: David Gibson @ 2019-10-29 11:50 UTC (permalink / raw)
  To: Liu Yi L
  Cc: tianyu.lan, kevin.tian, jacob.jun.pan, Yi Sun, kvm, mst,
	jun.j.tian, qemu-devel, peterx, eric.auger, alex.williamson,
	pbonzini, yi.y.sun

[-- Attachment #1: Type: text/plain, Size: 3038 bytes --]

On Thu, Oct 24, 2019 at 08:34:28AM -0400, Liu Yi L wrote:
> This patch adds pci_device_iommu_context() to get an iommu_context
> for a given device. A new callback is added in PCIIOMMUOps. Users
> who wants to listen to events issued by vIOMMU could use this new
> interface to get an iommu_context and register their own notifiers,
> then wait for notifications from vIOMMU. e.g. VFIO is the first user
> of it to listen to the PASID_ALLOC/PASID_BIND/CACHE_INV events and
> propagate the events to host.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Yi Sun <yi.y.sun@linux.intel.com>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  hw/pci/pci.c         | 16 ++++++++++++++++
>  include/hw/pci/pci.h |  5 +++++
>  2 files changed, 21 insertions(+)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index b5ce9ca..4e6af06 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -2625,6 +2625,22 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
>      return &address_space_memory;
>  }
>  
> +IOMMUContext *pci_device_iommu_context(PCIDevice *dev)
> +{
> +    PCIBus *bus = pci_get_bus(dev);
> +    PCIBus *iommu_bus = bus;
> +
> +    while (iommu_bus && !iommu_bus->iommu_ops && iommu_bus->parent_dev) {
> +        iommu_bus = pci_get_bus(iommu_bus->parent_dev);
> +    }
> +    if (iommu_bus && iommu_bus->iommu_ops &&
> +        iommu_bus->iommu_ops->get_iommu_context) {
> +        return iommu_bus->iommu_ops->get_iommu_context(bus,
> +                           iommu_bus->iommu_opaque, dev->devfn);
> +    }
> +    return NULL;
> +}
> +
>  void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *ops, void *opaque)
>  {
>      bus->iommu_ops = ops;
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index d9fed8d..ccada47 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -9,6 +9,8 @@
>  
>  #include "hw/pci/pcie.h"
>  
> +#include "hw/iommu/iommu.h"
> +
>  extern bool pci_available;
>  
>  /* PCI bus */
> @@ -484,9 +486,12 @@ typedef struct PCIIOMMUOps PCIIOMMUOps;
>  struct PCIIOMMUOps {
>      AddressSpace * (*get_address_space)(PCIBus *bus,
>                                  void *opaque, int32_t devfn);
> +    IOMMUContext * (*get_iommu_context)(PCIBus *bus,
> +                                void *opaque, int32_t devfn);
>  };
>  
>  AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
> +IOMMUContext *pci_device_iommu_context(PCIDevice *dev);
>  void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *iommu_ops, void *opaque);
>  
>  static inline void

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 09/22] vfio/pci: add iommu_context notifier for pasid alloc/free
  2019-10-24 12:34 ` [RFC v2 09/22] vfio/pci: add iommu_context notifier for pasid alloc/free Liu Yi L
@ 2019-10-29 12:15   ` David Gibson
  2019-11-01 17:26     ` Peter Xu
  2019-11-06 12:14     ` Liu, Yi L
  0 siblings, 2 replies; 79+ messages in thread
From: David Gibson @ 2019-10-29 12:15 UTC (permalink / raw)
  To: Liu Yi L
  Cc: tianyu.lan, kevin.tian, jacob.jun.pan, Yi Sun, kvm, mst,
	jun.j.tian, qemu-devel, peterx, eric.auger, alex.williamson,
	pbonzini, yi.y.sun

[-- Attachment #1: Type: text/plain, Size: 7667 bytes --]

On Thu, Oct 24, 2019 at 08:34:30AM -0400, Liu Yi L wrote:
> This patch adds pasid alloc/free notifiers for vfio-pci. It is
> supposed to be fired by vIOMMU. VFIO then sends PASID allocation
> or free request to host.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Yi Sun <yi.y.sun@linux.intel.com>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> ---
>  hw/vfio/common.c         |  9 ++++++
>  hw/vfio/pci.c            | 81 ++++++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/iommu/iommu.h | 15 +++++++++
>  3 files changed, 105 insertions(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index d418527..e6ad21c 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1436,6 +1436,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
>      if (QLIST_EMPTY(&container->group_list)) {
>          VFIOAddressSpace *space = container->space;
>          VFIOGuestIOMMU *giommu, *tmp;
> +        VFIOIOMMUContext *giommu_ctx, *ctx;
>  
>          QLIST_REMOVE(container, next);
>  
> @@ -1446,6 +1447,14 @@ static void vfio_disconnect_container(VFIOGroup *group)
>              g_free(giommu);
>          }
>  
> +        QLIST_FOREACH_SAFE(giommu_ctx, &container->iommu_ctx_list,
> +                                                   iommu_ctx_next, ctx) {
> +            iommu_ctx_notifier_unregister(giommu_ctx->iommu_ctx,
> +                                                      &giommu_ctx->n);
> +            QLIST_REMOVE(giommu_ctx, iommu_ctx_next);
> +            g_free(giommu_ctx);
> +        }
> +
>          trace_vfio_disconnect_container(container->fd);
>          close(container->fd);
>          g_free(container);
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 12fac39..8721ff6 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2699,11 +2699,80 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
>      vdev->req_enabled = false;
>  }
>  
> +static void vfio_register_iommu_ctx_notifier(VFIOPCIDevice *vdev,
> +                                             IOMMUContext *iommu_ctx,
> +                                             IOMMUCTXNotifyFn fn,
> +                                             IOMMUCTXEvent event)
> +{
> +    VFIOContainer *container = vdev->vbasedev.group->container;
> +    VFIOIOMMUContext *giommu_ctx;
> +
> +    giommu_ctx = g_malloc0(sizeof(*giommu_ctx));
> +    giommu_ctx->container = container;
> +    giommu_ctx->iommu_ctx = iommu_ctx;
> +    QLIST_INSERT_HEAD(&container->iommu_ctx_list,
> +                      giommu_ctx,
> +                      iommu_ctx_next);
> +    iommu_ctx_notifier_register(iommu_ctx,
> +                                &giommu_ctx->n,
> +                                fn,
> +                                event);
> +}
> +
> +static void vfio_iommu_pasid_alloc_notify(IOMMUCTXNotifier *n,
> +                                          IOMMUCTXEventData *event_data)
> +{
> +    VFIOIOMMUContext *giommu_ctx = container_of(n, VFIOIOMMUContext, n);
> +    VFIOContainer *container = giommu_ctx->container;
> +    IOMMUCTXPASIDReqDesc *pasid_req =
> +                              (IOMMUCTXPASIDReqDesc *) event_data->data;
> +    struct vfio_iommu_type1_pasid_request req;
> +    unsigned long argsz;
> +    int pasid;
> +
> +    argsz = sizeof(req);
> +    req.argsz = argsz;
> +    req.flag = VFIO_IOMMU_PASID_ALLOC;
> +    req.min_pasid = pasid_req->min_pasid;
> +    req.max_pasid = pasid_req->max_pasid;
> +
> +    pasid = ioctl(container->fd, VFIO_IOMMU_PASID_REQUEST, &req);
> +    if (pasid < 0) {
> +        error_report("%s: %d, alloc failed", __func__, -errno);
> +    }
> +    pasid_req->alloc_result = pasid;

Altering the event data from the notifier doesn't make sense.  By
definition there can be multiple notifiers on the chain, so in that
case which one is responsible for updating the writable field?

> +}
> +
> +static void vfio_iommu_pasid_free_notify(IOMMUCTXNotifier *n,
> +                                          IOMMUCTXEventData *event_data)
> +{
> +    VFIOIOMMUContext *giommu_ctx = container_of(n, VFIOIOMMUContext, n);
> +    VFIOContainer *container = giommu_ctx->container;
> +    IOMMUCTXPASIDReqDesc *pasid_req =
> +                              (IOMMUCTXPASIDReqDesc *) event_data->data;
> +    struct vfio_iommu_type1_pasid_request req;
> +    unsigned long argsz;
> +    int ret = 0;
> +
> +    argsz = sizeof(req);
> +    req.argsz = argsz;
> +    req.flag = VFIO_IOMMU_PASID_FREE;
> +    req.pasid = pasid_req->pasid;
> +
> +    ret = ioctl(container->fd, VFIO_IOMMU_PASID_REQUEST, &req);
> +    if (ret != 0) {
> +        error_report("%s: %d, pasid %u free failed",
> +                   __func__, -errno, (unsigned) pasid_req->pasid);
> +    }
> +    pasid_req->free_result = ret;

Same problem here.

> +}
> +
>  static void vfio_realize(PCIDevice *pdev, Error **errp)
>  {
>      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
>      VFIODevice *vbasedev_iter;
>      VFIOGroup *group;
> +    IOMMUContext *iommu_context;
>      char *tmp, *subsys, group_path[PATH_MAX], *group_name;
>      Error *err = NULL;
>      ssize_t len;
> @@ -3000,6 +3069,18 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>      vfio_register_req_notifier(vdev);
>      vfio_setup_resetfn_quirk(vdev);
>  
> +    iommu_context = pci_device_iommu_context(pdev);
> +    if (iommu_context) {
> +        vfio_register_iommu_ctx_notifier(vdev,
> +                                         iommu_context,
> +                                         vfio_iommu_pasid_alloc_notify,
> +                                         IOMMU_CTX_EVENT_PASID_ALLOC);
> +        vfio_register_iommu_ctx_notifier(vdev,
> +                                         iommu_context,
> +                                         vfio_iommu_pasid_free_notify,
> +                                         IOMMU_CTX_EVENT_PASID_FREE);
> +    }
> +
>      return;
>  
>  out_teardown:
> diff --git a/include/hw/iommu/iommu.h b/include/hw/iommu/iommu.h
> index c22c442..4352afd 100644
> --- a/include/hw/iommu/iommu.h
> +++ b/include/hw/iommu/iommu.h
> @@ -31,10 +31,25 @@
>  typedef struct IOMMUContext IOMMUContext;
>  
>  enum IOMMUCTXEvent {
> +    IOMMU_CTX_EVENT_PASID_ALLOC,
> +    IOMMU_CTX_EVENT_PASID_FREE,
>      IOMMU_CTX_EVENT_NUM,
>  };
>  typedef enum IOMMUCTXEvent IOMMUCTXEvent;
>  
> +union IOMMUCTXPASIDReqDesc {
> +    struct {
> +        uint32_t min_pasid;
> +        uint32_t max_pasid;
> +        int32_t alloc_result; /* pasid allocated for the alloc request */
> +    };
> +    struct {
> +        uint32_t pasid; /* pasid to be free */
> +        int free_result;
> +    };
> +};

Apart from theproblem with writable fields, using a big union for
event data is pretty ugly.  If you need this different information for
the different events, it might make more sense to have a separate
notifier chain with a separate call interface for each event type,
rather than trying to multiplex them together.

> +typedef union IOMMUCTXPASIDReqDesc IOMMUCTXPASIDReqDesc;
> +
>  struct IOMMUCTXEventData {
>      IOMMUCTXEvent event;
>      uint64_t length;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM
  2019-10-25 10:12   ` Tian, Kevin
@ 2019-10-31  4:33     ` Jason Wang
  2019-10-31  5:39       ` Tian, Kevin
  2019-10-31 14:07       ` Liu, Yi L
  0 siblings, 2 replies; 79+ messages in thread
From: Jason Wang @ 2019-10-31  4:33 UTC (permalink / raw)
  To: Tian, Kevin, Liu, Yi L, qemu-devel, mst, pbonzini,
	alex.williamson, peterx
  Cc: tianyu.lan, jacob.jun.pan, kvm, Tian, Jun J, eric.auger, Sun,
	Yi Y, david


On 2019/10/25 下午6:12, Tian, Kevin wrote:
>> From: Jason Wang [mailto:jasowang@redhat.com]
>> Sent: Friday, October 25, 2019 5:49 PM
>>
>>
>> On 2019/10/24 下午8:34, Liu Yi L wrote:
>>> Shared virtual address (SVA), a.k.a, Shared virtual memory (SVM) on Intel
>>> platforms allow address space sharing between device DMA and
>> applications.
>>
>>
>> Interesting, so the below figure demonstrates the case of VM. I wonder
>> how much differences if we compare it with doing SVM between device
>> and
>> an ordinary process (e.g dpdk)?
>>
>> Thanks
> One difference is that ordinary process requires only stage-1 translation,
> while VM requires nested translation.


A silly question, then I believe there's no need for VFIO DMA API in 
this case consider the page table is shared between MMU and IOMMU?

Thanks

>



^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM
  2019-10-31  4:33     ` Jason Wang
@ 2019-10-31  5:39       ` Tian, Kevin
  2019-10-31 14:07       ` Liu, Yi L
  1 sibling, 0 replies; 79+ messages in thread
From: Tian, Kevin @ 2019-10-31  5:39 UTC (permalink / raw)
  To: Jason Wang, Liu, Yi L, qemu-devel, mst, pbonzini,
	alex.williamson, peterx
  Cc: tianyu.lan, jacob.jun.pan, kvm, Tian, Jun J, eric.auger, Sun,
	Yi Y, david

> From: Jason Wang [mailto:jasowang@redhat.com]
> Sent: Thursday, October 31, 2019 12:33 PM
> 
> 
> On 2019/10/25 下午6:12, Tian, Kevin wrote:
> >> From: Jason Wang [mailto:jasowang@redhat.com]
> >> Sent: Friday, October 25, 2019 5:49 PM
> >>
> >>
> >> On 2019/10/24 下午8:34, Liu Yi L wrote:
> >>> Shared virtual address (SVA), a.k.a, Shared virtual memory (SVM) on
> Intel
> >>> platforms allow address space sharing between device DMA and
> >> applications.
> >>
> >>
> >> Interesting, so the below figure demonstrates the case of VM. I wonder
> >> how much differences if we compare it with doing SVM between device
> >> and
> >> an ordinary process (e.g dpdk)?
> >>
> >> Thanks
> > One difference is that ordinary process requires only stage-1 translation,
> > while VM requires nested translation.
> 
> 
> A silly question, then I believe there's no need for VFIO DMA API in
> this case consider the page table is shared between MMU and IOMMU?
> 

yes, only need to intercept guest iotlb invalidation request on stage-1 
translation and then forward to IOMMU through new VFIO API. Existing
VFIO DMA API applies to only the stage-2 translation (GPA->HPA) here.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM
  2019-10-31  4:33     ` Jason Wang
  2019-10-31  5:39       ` Tian, Kevin
@ 2019-10-31 14:07       ` Liu, Yi L
  2019-11-01  7:29         ` Jason Wang
  1 sibling, 1 reply; 79+ messages in thread
From: Liu, Yi L @ 2019-10-31 14:07 UTC (permalink / raw)
  To: Jason Wang, Tian, Kevin, qemu-devel, mst, pbonzini,
	alex.williamson, peterx
  Cc: tianyu.lan, jacob.jun.pan, kvm, Tian, Jun J, eric.auger, Sun,
	Yi Y, david

> From: Jason Wang [mailto:jasowang@redhat.com]
> Sent: Thursday, October 31, 2019 5:33 AM
> Subject: Re: [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM
> 
> 
> On 2019/10/25 下午6:12, Tian, Kevin wrote:
> >> From: Jason Wang [mailto:jasowang@redhat.com]
> >> Sent: Friday, October 25, 2019 5:49 PM
> >>
> >>
> >> On 2019/10/24 下午8:34, Liu Yi L wrote:
> >>> Shared virtual address (SVA), a.k.a, Shared virtual memory (SVM) on
> >>> Intel platforms allow address space sharing between device DMA and
> >> applications.
> >>
> >>
> >> Interesting, so the below figure demonstrates the case of VM. I
> >> wonder how much differences if we compare it with doing SVM between
> >> device and an ordinary process (e.g dpdk)?
> >>
> >> Thanks
> > One difference is that ordinary process requires only stage-1
> > translation, while VM requires nested translation.
> 
> 
> A silly question, then I believe there's no need for VFIO DMA API in this case consider
> the page table is shared between MMU and IOMMU?

Echo Kevin's reply. We use nested translation here. For stage-1, yes, no need to use
VFIO DMA API. For stage-2, we still use VFIO DMA API to program the GPA->HPA
mapping to host. :-)

Regards,
Yi Liu
> 
> Thanks
> 
> >


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM
  2019-10-31 14:07       ` Liu, Yi L
@ 2019-11-01  7:29         ` Jason Wang
  2019-11-01  7:46           ` Tian, Kevin
  0 siblings, 1 reply; 79+ messages in thread
From: Jason Wang @ 2019-11-01  7:29 UTC (permalink / raw)
  To: Liu, Yi L, Tian, Kevin, qemu-devel, mst, pbonzini,
	alex.williamson, peterx
  Cc: tianyu.lan, jacob.jun.pan, kvm, Tian, Jun J, eric.auger, Sun,
	Yi Y, david


On 2019/10/31 下午10:07, Liu, Yi L wrote:
>> From: Jason Wang [mailto:jasowang@redhat.com]
>> Sent: Thursday, October 31, 2019 5:33 AM
>> Subject: Re: [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM
>>
>>
>> On 2019/10/25 下午6:12, Tian, Kevin wrote:
>>>> From: Jason Wang [mailto:jasowang@redhat.com]
>>>> Sent: Friday, October 25, 2019 5:49 PM
>>>>
>>>>
>>>> On 2019/10/24 下午8:34, Liu Yi L wrote:
>>>>> Shared virtual address (SVA), a.k.a, Shared virtual memory (SVM) on
>>>>> Intel platforms allow address space sharing between device DMA and
>>>> applications.
>>>>
>>>>
>>>> Interesting, so the below figure demonstrates the case of VM. I
>>>> wonder how much differences if we compare it with doing SVM between
>>>> device and an ordinary process (e.g dpdk)?
>>>>
>>>> Thanks
>>> One difference is that ordinary process requires only stage-1
>>> translation, while VM requires nested translation.
>>
>> A silly question, then I believe there's no need for VFIO DMA API in this case consider
>> the page table is shared between MMU and IOMMU?
> Echo Kevin's reply. We use nested translation here. For stage-1, yes, no need to use
> VFIO DMA API. For stage-2, we still use VFIO DMA API to program the GPA->HPA
> mapping to host. :-)


Cool, two more questions:

- Can EPT shares its page table with IOMMU L2?

- Similar to EPT, when GPA->HPA (actually HVA->HPA) is modified by mm, 
VFIO need to use MMU notifier do modify L2 accordingly besides DMA API?

Thanks


>
> Regards,
> Yi Liu
>> Thanks
>>



^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM
  2019-11-01  7:29         ` Jason Wang
@ 2019-11-01  7:46           ` Tian, Kevin
  2019-11-01  8:04             ` Jason Wang
  0 siblings, 1 reply; 79+ messages in thread
From: Tian, Kevin @ 2019-11-01  7:46 UTC (permalink / raw)
  To: Jason Wang, Liu, Yi L, qemu-devel, mst, pbonzini,
	alex.williamson, peterx
  Cc: tianyu.lan, jacob.jun.pan, kvm, Tian, Jun J, eric.auger, Sun,
	Yi Y, david

> From: Jason Wang [mailto:jasowang@redhat.com]
> Sent: Friday, November 1, 2019 3:30 PM
> 
> 
> On 2019/10/31 下午10:07, Liu, Yi L wrote:
> >> From: Jason Wang [mailto:jasowang@redhat.com]
> >> Sent: Thursday, October 31, 2019 5:33 AM
> >> Subject: Re: [RFC v2 00/22] intel_iommu: expose Shared Virtual
> Addressing to VM
> >>
> >>
> >> On 2019/10/25 下午6:12, Tian, Kevin wrote:
> >>>> From: Jason Wang [mailto:jasowang@redhat.com]
> >>>> Sent: Friday, October 25, 2019 5:49 PM
> >>>>
> >>>>
> >>>> On 2019/10/24 下午8:34, Liu Yi L wrote:
> >>>>> Shared virtual address (SVA), a.k.a, Shared virtual memory (SVM) on
> >>>>> Intel platforms allow address space sharing between device DMA
> and
> >>>> applications.
> >>>>
> >>>>
> >>>> Interesting, so the below figure demonstrates the case of VM. I
> >>>> wonder how much differences if we compare it with doing SVM
> between
> >>>> device and an ordinary process (e.g dpdk)?
> >>>>
> >>>> Thanks
> >>> One difference is that ordinary process requires only stage-1
> >>> translation, while VM requires nested translation.
> >>
> >> A silly question, then I believe there's no need for VFIO DMA API in this
> case consider
> >> the page table is shared between MMU and IOMMU?
> > Echo Kevin's reply. We use nested translation here. For stage-1, yes, no
> need to use
> > VFIO DMA API. For stage-2, we still use VFIO DMA API to program the
> GPA->HPA
> > mapping to host. :-)
> 
> 
> Cool, two more questions:
> 
> - Can EPT shares its page table with IOMMU L2?

yes, their formats are compatible.

> 
> - Similar to EPT, when GPA->HPA (actually HVA->HPA) is modified by mm,
> VFIO need to use MMU notifier do modify L2 accordingly besides DMA API?
> 

VFIO devices need to pin-down guest memory pages that are mapped
in IOMMU. So notifier is not required since mm won't change the mapping
for those pages.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM
  2019-11-01  7:46           ` Tian, Kevin
@ 2019-11-01  8:04             ` Jason Wang
  2019-11-01  8:09               ` Jason Wang
  0 siblings, 1 reply; 79+ messages in thread
From: Jason Wang @ 2019-11-01  8:04 UTC (permalink / raw)
  To: Tian, Kevin, Liu, Yi L, qemu-devel, mst, pbonzini,
	alex.williamson, peterx
  Cc: tianyu.lan, jacob.jun.pan, kvm, Michael S. Tsirkin, Tian, Jun J,
	eric.auger, Sun, Yi Y, david


On 2019/11/1 下午3:46, Tian, Kevin wrote:
>> From: Jason Wang [mailto:jasowang@redhat.com]
>> Sent: Friday, November 1, 2019 3:30 PM
>>
>>
>> On 2019/10/31 下午10:07, Liu, Yi L wrote:
>>>> From: Jason Wang [mailto:jasowang@redhat.com]
>>>> Sent: Thursday, October 31, 2019 5:33 AM
>>>> Subject: Re: [RFC v2 00/22] intel_iommu: expose Shared Virtual
>> Addressing to VM
>>>>
>>>> On 2019/10/25 下午6:12, Tian, Kevin wrote:
>>>>>> From: Jason Wang [mailto:jasowang@redhat.com]
>>>>>> Sent: Friday, October 25, 2019 5:49 PM
>>>>>>
>>>>>>
>>>>>> On 2019/10/24 下午8:34, Liu Yi L wrote:
>>>>>>> Shared virtual address (SVA), a.k.a, Shared virtual memory (SVM) on
>>>>>>> Intel platforms allow address space sharing between device DMA
>> and
>>>>>> applications.
>>>>>>
>>>>>>
>>>>>> Interesting, so the below figure demonstrates the case of VM. I
>>>>>> wonder how much differences if we compare it with doing SVM
>> between
>>>>>> device and an ordinary process (e.g dpdk)?
>>>>>>
>>>>>> Thanks
>>>>> One difference is that ordinary process requires only stage-1
>>>>> translation, while VM requires nested translation.
>>>> A silly question, then I believe there's no need for VFIO DMA API in this
>> case consider
>>>> the page table is shared between MMU and IOMMU?
>>> Echo Kevin's reply. We use nested translation here. For stage-1, yes, no
>> need to use
>>> VFIO DMA API. For stage-2, we still use VFIO DMA API to program the
>> GPA->HPA
>>> mapping to host. :-)
>>
>> Cool, two more questions:
>>
>> - Can EPT shares its page table with IOMMU L2?
> yes, their formats are compatible.
>
>> - Similar to EPT, when GPA->HPA (actually HVA->HPA) is modified by mm,
>> VFIO need to use MMU notifier do modify L2 accordingly besides DMA API?
>>
> VFIO devices need to pin-down guest memory pages that are mapped
> in IOMMU. So notifier is not required since mm won't change the mapping
> for those pages.


The GUP tends to lead a lot of issues, we may consider to allow 
userspace to choose to not pin them in the future.

Thanks


>
> Thanks
> Kevin



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM
  2019-11-01  8:04             ` Jason Wang
@ 2019-11-01  8:09               ` Jason Wang
  2019-11-02  7:35                 ` Tian, Kevin
  0 siblings, 1 reply; 79+ messages in thread
From: Jason Wang @ 2019-11-01  8:09 UTC (permalink / raw)
  To: Tian, Kevin, Liu, Yi L, qemu-devel, mst, pbonzini,
	alex.williamson, peterx
  Cc: tianyu.lan, jacob.jun.pan, kvm, Tian, Jun J, eric.auger, Sun,
	Yi Y, david


On 2019/11/1 下午4:04, Jason Wang wrote:
>
> On 2019/11/1 下午3:46, Tian, Kevin wrote:
>>> From: Jason Wang [mailto:jasowang@redhat.com]
>>> Sent: Friday, November 1, 2019 3:30 PM
>>>
>>>
>>> On 2019/10/31 下午10:07, Liu, Yi L wrote:
>>>>> From: Jason Wang [mailto:jasowang@redhat.com]
>>>>> Sent: Thursday, October 31, 2019 5:33 AM
>>>>> Subject: Re: [RFC v2 00/22] intel_iommu: expose Shared Virtual
>>> Addressing to VM
>>>>>
>>>>> On 2019/10/25 下午6:12, Tian, Kevin wrote:
>>>>>>> From: Jason Wang [mailto:jasowang@redhat.com]
>>>>>>> Sent: Friday, October 25, 2019 5:49 PM
>>>>>>>
>>>>>>>
>>>>>>> On 2019/10/24 下午8:34, Liu Yi L wrote:
>>>>>>>> Shared virtual address (SVA), a.k.a, Shared virtual memory 
>>>>>>>> (SVM) on
>>>>>>>> Intel platforms allow address space sharing between device DMA
>>> and
>>>>>>> applications.
>>>>>>>
>>>>>>>
>>>>>>> Interesting, so the below figure demonstrates the case of VM. I
>>>>>>> wonder how much differences if we compare it with doing SVM
>>> between
>>>>>>> device and an ordinary process (e.g dpdk)?
>>>>>>>
>>>>>>> Thanks
>>>>>> One difference is that ordinary process requires only stage-1
>>>>>> translation, while VM requires nested translation.
>>>>> A silly question, then I believe there's no need for VFIO DMA API 
>>>>> in this
>>> case consider
>>>>> the page table is shared between MMU and IOMMU?
>>>> Echo Kevin's reply. We use nested translation here. For stage-1, 
>>>> yes, no
>>> need to use
>>>> VFIO DMA API. For stage-2, we still use VFIO DMA API to program the
>>> GPA->HPA
>>>> mapping to host. :-)
>>>
>>> Cool, two more questions:
>>>
>>> - Can EPT shares its page table with IOMMU L2?
>> yes, their formats are compatible.
>>
>>> - Similar to EPT, when GPA->HPA (actually HVA->HPA) is modified by mm,
>>> VFIO need to use MMU notifier do modify L2 accordingly besides DMA API?
>>>
>> VFIO devices need to pin-down guest memory pages that are mapped
>> in IOMMU. So notifier is not required since mm won't change the mapping
>> for those pages.
>
>
> The GUP tends to lead a lot of issues, we may consider to allow 
> userspace to choose to not pin them in the future.


Btw, I'm asking since I see MMU notifier is used by intel-svm.c to flush 
IOTLB. (I don't see any users in kernel source that use that API though 
e.g intel_svm_bind_mm()).

Thanks


>
> Thanks
>
>
>>
>> Thanks
>> Kevin



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 08/22] intel_iommu: provide get_iommu_context() callback
  2019-10-24 12:34 ` [RFC v2 08/22] intel_iommu: provide get_iommu_context() callback Liu Yi L
@ 2019-11-01 14:55   ` Peter Xu
  2019-11-06 11:07     ` Liu, Yi L
  0 siblings, 1 reply; 79+ messages in thread
From: Peter Xu @ 2019-11-01 14:55 UTC (permalink / raw)
  To: Liu Yi L
  Cc: tianyu.lan, kevin.tian, jacob.jun.pan, Yi Sun, kvm, mst,
	jun.j.tian, qemu-devel, eric.auger, alex.williamson, pbonzini,
	yi.y.sun, david

On Thu, Oct 24, 2019 at 08:34:29AM -0400, Liu Yi L wrote:
> This patch adds get_iommu_context() callback to return an iommu_context
> Intel VT-d platform.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Yi Sun <yi.y.sun@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> ---
>  hw/i386/intel_iommu.c         | 57 ++++++++++++++++++++++++++++++++++++++-----
>  include/hw/i386/intel_iommu.h | 14 ++++++++++-
>  2 files changed, 64 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 67a7836..e9f8692 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -3288,22 +3288,33 @@ static const MemoryRegionOps vtd_mem_ir_ops = {
>      },
>  };
>  
> -VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
> +static VTDBus *vtd_find_add_bus(IntelIOMMUState *s, PCIBus *bus)
>  {
>      uintptr_t key = (uintptr_t)bus;
> -    VTDBus *vtd_bus = g_hash_table_lookup(s->vtd_as_by_busptr, &key);
> -    VTDAddressSpace *vtd_dev_as;
> -    char name[128];
> +    VTDBus *vtd_bus;
>  
> +    vtd_iommu_lock(s);

Why explicitly take the IOMMU lock here?  I mean, it's fine to take
it, but if so why not take it to cover the whole vtd_find_add_as()?

For now it'll be fine in either way because I believe iommu_lock is
not really functioning when we're still with BQL here, however if you
add that explicitly then I don't see why it's not covering that.

> +    vtd_bus = g_hash_table_lookup(s->vtd_as_by_busptr, &key);
>      if (!vtd_bus) {
>          uintptr_t *new_key = g_malloc(sizeof(*new_key));
>          *new_key = (uintptr_t)bus;
>          /* No corresponding free() */
> -        vtd_bus = g_malloc0(sizeof(VTDBus) + sizeof(VTDAddressSpace *) * \
> -                            PCI_DEVFN_MAX);
> +        vtd_bus = g_malloc0(sizeof(VTDBus) + PCI_DEVFN_MAX * \
> +                    (sizeof(VTDAddressSpace *) + sizeof(VTDIOMMUContext *)));

Should this be as simple as g_malloc0(sizeof(VTDBus) since [1]?

Otherwise the patch looks sane to me.

>          vtd_bus->bus = bus;
>          g_hash_table_insert(s->vtd_as_by_busptr, new_key, vtd_bus);
>      }
> +    vtd_iommu_unlock(s);
> +    return vtd_bus;
> +}

[...]

>  struct VTDBus {
>      PCIBus* bus;		/* A reference to the bus to provide translation for */
> -    VTDAddressSpace *dev_as[0];	/* A table of VTDAddressSpace objects indexed by devfn */
> +    /* A table of VTDAddressSpace objects indexed by devfn */
> +    VTDAddressSpace *dev_as[PCI_DEVFN_MAX];
> +    /* A table of VTDIOMMUContext objects indexed by devfn */
> +    VTDIOMMUContext *dev_ic[PCI_DEVFN_MAX];

[1]

>  };
>  
>  struct VTDIOTLBEntry {
> @@ -282,5 +293,6 @@ struct IntelIOMMUState {
>   * create a new one if none exists
>   */
>  VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn);
> +VTDIOMMUContext *vtd_find_add_ic(IntelIOMMUState *s, PCIBus *bus, int devfn);
>  
>  #endif
> -- 
> 2.7.4
> 

Regards,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 03/22] intel_iommu: modify x-scalable-mode to be string option
  2019-10-24 12:34 ` [RFC v2 03/22] intel_iommu: modify x-scalable-mode to be string option Liu Yi L
@ 2019-11-01 14:57   ` Peter Xu
  2019-11-05  9:14     ` Liu, Yi L
  0 siblings, 1 reply; 79+ messages in thread
From: Peter Xu @ 2019-11-01 14:57 UTC (permalink / raw)
  To: Liu Yi L
  Cc: tianyu.lan, kevin.tian, jacob.jun.pan, Yi Sun, kvm, mst,
	jun.j.tian, qemu-devel, eric.auger, alex.williamson, pbonzini,
	yi.y.sun, david

On Thu, Oct 24, 2019 at 08:34:24AM -0400, Liu Yi L wrote:
> Intel VT-d 3.0 introduces scalable mode, and it has a bunch of capabilities
> related to scalable mode translation, thus there are multiple combinations.
> While this vIOMMU implementation wants simplify it for user by providing
> typical combinations. User could config it by "x-scalable-mode" option. The
> usage is as below:
> 
> "-device intel-iommu,x-scalable-mode=["legacy"|"modern"]"
> 
>  - "legacy": gives support for SL page table
>  - "modern": gives support for FL page table, pasid, virtual command
>  -  if not configured, means no scalable mode support, if not proper
>     configured, will throw error
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Yi Sun <yi.y.sun@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> ---
>  hw/i386/intel_iommu.c          | 15 +++++++++++++--
>  hw/i386/intel_iommu_internal.h |  3 +++
>  include/hw/i386/intel_iommu.h  |  2 +-
>  3 files changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 771bed2..4a1a07a 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -3019,7 +3019,7 @@ static Property vtd_properties[] = {
>      DEFINE_PROP_UINT8("aw-bits", IntelIOMMUState, aw_bits,
>                        VTD_HOST_ADDRESS_WIDTH),
>      DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
> -    DEFINE_PROP_BOOL("x-scalable-mode", IntelIOMMUState, scalable_mode, FALSE),
> +    DEFINE_PROP_STRING("x-scalable-mode", IntelIOMMUState, scalable_mode),
>      DEFINE_PROP_BOOL("dma-drain", IntelIOMMUState, dma_drain, true),
>      DEFINE_PROP_END_OF_LIST(),
>  };
> @@ -3581,7 +3581,12 @@ static void vtd_init(IntelIOMMUState *s)
>  
>      /* TODO: read cap/ecap from host to decide which cap to be exposed. */
>      if (s->scalable_mode) {
> -        s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
> +        if (!strcmp(s->scalable_mode, "legacy")) {
> +            s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
> +        } else if (!strcmp(s->scalable_mode, "modern")) {
> +            s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_PASID
> +                       | VTD_ECAP_FLTS | VTD_ECAP_PSS;
> +        }

Shall we do this string op only once in vtd_decide_config() then keep
it somewhere?

Something like:

  - s->scalable_mode_str to keep the string
  - s->scalable_mode still as a bool to cache the global enablement
  - s->scalable_modern as a bool to keep the mode

?

These could be used in some MMIO path (I think) and parsing strings
always could be a bit overkill.

>      }
>  
>      vtd_reset_caches(s);
> @@ -3700,6 +3705,12 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
>          return false;
>      }
>  
> +    if (s->scalable_mode &&
> +        (strcmp(s->scalable_mode, "modern") &&
> +         strcmp(s->scalable_mode, "legacy"))) {
> +            error_setg(errp, "Invalid x-scalable-mode config");
> +    }
> +
>      return true;
>  }
>  
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index c1235a7..be7b30a 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -190,8 +190,11 @@
>  #define VTD_ECAP_PT                 (1ULL << 6)
>  #define VTD_ECAP_MHMV               (15ULL << 20)
>  #define VTD_ECAP_SRS                (1ULL << 31)
> +#define VTD_ECAP_PSS                (19ULL << 35)
> +#define VTD_ECAP_PASID              (1ULL << 40)
>  #define VTD_ECAP_SMTS               (1ULL << 43)
>  #define VTD_ECAP_SLTS               (1ULL << 46)
> +#define VTD_ECAP_FLTS               (1ULL << 47)
>  
>  /* CAP_REG */
>  /* (offset >> 4) << 24 */
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index 66b931e..6062588 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -231,7 +231,7 @@ struct IntelIOMMUState {
>      uint32_t version;
>  
>      bool caching_mode;              /* RO - is cap CM enabled? */
> -    bool scalable_mode;             /* RO - is Scalable Mode supported? */
> +    char *scalable_mode;            /* RO - Scalable Mode model */
>  
>      dma_addr_t root;                /* Current root table pointer */
>      bool root_scalable;             /* Type of root table (scalable or not) */
> -- 
> 2.7.4
> 

Regards,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 05/22] vfio/common: add iommu_ctx_notifier in container
  2019-10-24 12:34 ` [RFC v2 05/22] vfio/common: add iommu_ctx_notifier in container Liu Yi L
@ 2019-11-01 14:58   ` Peter Xu
  2019-11-06 11:08     ` Liu, Yi L
  0 siblings, 1 reply; 79+ messages in thread
From: Peter Xu @ 2019-11-01 14:58 UTC (permalink / raw)
  To: Liu Yi L
  Cc: tianyu.lan, kevin.tian, jacob.jun.pan, Yi Sun, kvm, mst,
	jun.j.tian, qemu-devel, eric.auger, alex.williamson, pbonzini,
	yi.y.sun, david

On Thu, Oct 24, 2019 at 08:34:26AM -0400, Liu Yi L wrote:

[...]

> +typedef struct VFIOIOMMUContext {
> +    VFIOContainer *container;
> +    IOMMUContext *iommu_ctx;
> +    IOMMUCTXNotifier n;
> +    QLIST_ENTRY(VFIOIOMMUContext) iommu_ctx_next;
> +} VFIOIOMMUContext;
> +

No strong opinion on this - but for me it would be more meaningful to
squash this patch into where this struct is firstly used.

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 09/22] vfio/pci: add iommu_context notifier for pasid alloc/free
  2019-10-29 12:15   ` David Gibson
@ 2019-11-01 17:26     ` Peter Xu
  2019-11-06 12:46       ` Liu, Yi L
  2019-11-06 12:14     ` Liu, Yi L
  1 sibling, 1 reply; 79+ messages in thread
From: Peter Xu @ 2019-11-01 17:26 UTC (permalink / raw)
  To: David Gibson
  Cc: tianyu.lan, kevin.tian, Liu Yi L, Yi Sun, kvm, mst, jun.j.tian,
	qemu-devel, eric.auger, alex.williamson, jacob.jun.pan, pbonzini,
	yi.y.sun

On Tue, Oct 29, 2019 at 01:15:44PM +0100, David Gibson wrote:
> > +union IOMMUCTXPASIDReqDesc {
> > +    struct {
> > +        uint32_t min_pasid;
> > +        uint32_t max_pasid;
> > +        int32_t alloc_result; /* pasid allocated for the alloc request */
> > +    };
> > +    struct {
> > +        uint32_t pasid; /* pasid to be free */
> > +        int free_result;
> > +    };
> > +};
> 
> Apart from theproblem with writable fields, using a big union for
> event data is pretty ugly.  If you need this different information for
> the different events, it might make more sense to have a separate
> notifier chain with a separate call interface for each event type,
> rather than trying to multiplex them together.

I have no issue on the union definiion, however I do agree that it's a
bit awkward to register one notifier for each event.

Instead of introducing even more notifier chains, I'm thinking whether
we can simply provide a single notifier hook for all the four events.
After all I don't see in what case we'll only register some of the
events, like we can't register alloc_pasid() without registering to
free_pasid() because otherwise it does not make sense..  And also you
have the wrapper struct ("IOMMUCTXEventData") which contains the event
type, so the notify() hook will know which message is this.

A side note is that I think you don't need the
IOMMUCTXEventData.length.  If you see the code, vtd_bind_guest_pasid()
does not even initialize length right now, and I think it could still
work only because none of the vfio notify() hook
(e.g. vfio_iommu_pasid_bind_notify) checks that length...

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 10/22] intel_iommu: add virtual command capability support
  2019-10-24 12:34 ` [RFC v2 10/22] intel_iommu: add virtual command capability support Liu Yi L
@ 2019-11-01 18:05   ` Peter Xu
  2019-11-06 12:40     ` Liu, Yi L
  0 siblings, 1 reply; 79+ messages in thread
From: Peter Xu @ 2019-11-01 18:05 UTC (permalink / raw)
  To: Liu Yi L
  Cc: tianyu.lan, kevin.tian, jacob.jun.pan, Yi Sun, kvm, mst,
	jun.j.tian, qemu-devel, eric.auger, alex.williamson, pbonzini,
	yi.y.sun, david

On Thu, Oct 24, 2019 at 08:34:31AM -0400, Liu Yi L wrote:
> This patch adds virtual command support to Intel vIOMMU per
> Intel VT-d 3.1 spec. And adds two virtual commands: alloc_pasid
> and free_pasid.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Yi Sun <yi.y.sun@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> ---
>  hw/i386/intel_iommu.c          | 162 ++++++++++++++++++++++++++++++++++++++++-
>  hw/i386/intel_iommu_internal.h |  38 ++++++++++
>  hw/i386/trace-events           |   1 +
>  include/hw/i386/intel_iommu.h  |   6 +-
>  4 files changed, 205 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index e9f8692..88b843f 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -944,6 +944,7 @@ static VTDBus *vtd_find_as_from_bus_num(IntelIOMMUState *s, uint8_t bus_num)
>                  return vtd_bus;
>              }
>          }
> +        vtd_bus = NULL;

I feel like I've commented on this..

Should this be a standalone patch?

>      }
>      return vtd_bus;
>  }
> @@ -2590,6 +2591,140 @@ static void vtd_handle_iectl_write(IntelIOMMUState *s)
>      }
>  }
>  
> +static int vtd_request_pasid_alloc(IntelIOMMUState *s)
> +{
> +    VTDBus *vtd_bus;
> +    int bus_n, devfn;
> +    IOMMUCTXEventData event_data;
> +    IOMMUCTXPASIDReqDesc req;
> +    VTDIOMMUContext *vtd_ic;
> +
> +    event_data.event = IOMMU_CTX_EVENT_PASID_ALLOC;
> +    event_data.data = &req;
> +    req.min_pasid = VTD_MIN_HPASID;
> +    req.max_pasid = VTD_MAX_HPASID;
> +    req.alloc_result = 0;
> +    event_data.length = sizeof(req);

As mentioned in the other thread, do you think we can drop this length
field?

> +    for (bus_n = 0; bus_n < PCI_BUS_MAX; bus_n++) {
> +        vtd_bus = vtd_find_as_from_bus_num(s, bus_n);
> +        if (!vtd_bus) {
> +            continue;
> +        }
> +        for (devfn = 0; devfn < PCI_DEVFN_MAX; devfn++) {
> +            vtd_ic = vtd_bus->dev_ic[devfn];
> +            if (!vtd_ic) {
> +                continue;
> +            }
> +            iommu_ctx_event_notify(&vtd_ic->iommu_context, &event_data);

Considering that we'll fill in the result into event_data, it could be
a bit misleading to still call it "notify" here because normally it
should only get data from the notifier caller rather than returning a
meaningful value..  Things like SUCCESS/FAIL would be fine, but here
we're returning a pasid from the notifier which seems a bit odd.

Maybe rename it to iommu_ctx_event_deliver()?  Then we just rename all
the references of "notify" thingys into "hook" or something clearer?

> +            if (req.alloc_result > 0) {

I'd suggest we comment on this:

    We'll return the first valid result we got.  It's a bit hackish in
    that we don't have a good global interface yet to talk to modules
    like vfio to deliver this allocation request, so we're leveraging
    this per-device context to do the same thing just to make sure the
    allocation happens only once.

Same to the pasid_free() below, though you can reference the comment
here from there to be simple.

> +                return req.alloc_result;
> +            }
> +        }
> +    }
> +    return -1;
> +}
> +
> +static int vtd_request_pasid_free(IntelIOMMUState *s, uint32_t pasid)
> +{
> +    VTDBus *vtd_bus;
> +    int bus_n, devfn;
> +    IOMMUCTXEventData event_data;
> +    IOMMUCTXPASIDReqDesc req;
> +    VTDIOMMUContext *vtd_ic;
> +
> +    event_data.event = IOMMU_CTX_EVENT_PASID_FREE;
> +    event_data.data = &req;
> +    req.pasid = pasid;
> +    req.free_result = 0;
> +    event_data.length = sizeof(req);
> +    for (bus_n = 0; bus_n < PCI_BUS_MAX; bus_n++) {
> +        vtd_bus = vtd_find_as_from_bus_num(s, bus_n);
> +        if (!vtd_bus) {
> +            continue;
> +        }
> +        for (devfn = 0; devfn < PCI_DEVFN_MAX; devfn++) {
> +            vtd_ic = vtd_bus->dev_ic[devfn];
> +            if (!vtd_ic) {
> +                continue;
> +            }
> +            iommu_ctx_event_notify(&vtd_ic->iommu_context, &event_data);
> +            if (req.free_result == 0) {
> +                return 0;
> +            }
> +        }
> +    }
> +    return -1;
> +}
> +
> +/*
> + * If IP is not set, set it and return 0
> + * If IP is already set, return -1
> + */
> +static int vtd_vcmd_rsp_ip_check(IntelIOMMUState *s)
> +{
> +    if (!(s->vccap & VTD_VCCAP_PAS) ||
> +         (s->vcrsp & 1)) {
> +        return -1;
> +    }

VTD_VCCAP_PAS is not a IP check, so maybe simply move these chunk out
to vtd_handle_vcmd_write?  Then we can rename this function to
"void vtd_vcmd_ip_set(...)".

> +    s->vcrsp = 1;
> +    vtd_set_quad_raw(s, DMAR_VCRSP_REG,
> +                     ((uint64_t) s->vcrsp));
> +    return 0;
> +}
> +
> +static void vtd_vcmd_clear_ip(IntelIOMMUState *s)
> +{
> +    s->vcrsp &= (~((uint64_t)(0x1)));
> +    vtd_set_quad_raw(s, DMAR_VCRSP_REG,
> +                     ((uint64_t) s->vcrsp));
> +}
> +
> +/* Handle write to Virtual Command Register */
> +static int vtd_handle_vcmd_write(IntelIOMMUState *s, uint64_t val)
> +{
> +    uint32_t pasid;
> +    int ret = -1;
> +
> +    trace_vtd_reg_write_vcmd(s->vcrsp, val);
> +
> +    /*
> +     * Since vCPU should be blocked when the guest VMCD
> +     * write was trapped to here. Should be no other vCPUs
> +     * try to access VCMD if guest software is well written.
> +     * However, we still emulate the IP bit here in case of
> +     * bad guest software. Also align with the spec.
> +     */
> +    ret = vtd_vcmd_rsp_ip_check(s);
> +    if (ret) {
> +        return ret;
> +    }
> +    switch (val & VTD_VCMD_CMD_MASK) {
> +    case VTD_VCMD_ALLOC_PASID:
> +        ret = vtd_request_pasid_alloc(s);
> +        if (ret < 0) {
> +            s->vcrsp |= VTD_VCRSP_SC(VTD_VCMD_NO_AVAILABLE_PASID);
> +        } else {
> +            s->vcrsp |= VTD_VCRSP_RSLT(ret);
> +        }
> +        break;
> +
> +    case VTD_VCMD_FREE_PASID:
> +        pasid = VTD_VCMD_PASID_VALUE(val);
> +        ret = vtd_request_pasid_free(s, pasid);
> +        if (ret < 0) {
> +            s->vcrsp |= VTD_VCRSP_SC(VTD_VCMD_FREE_INVALID_PASID);
> +        }
> +        break;
> +
> +    default:
> +        s->vcrsp |= VTD_VCRSP_SC(VTD_VCMD_UNDEFINED_CMD);
> +        printf("Virtual Command: unsupported command!!!\n");

Perhaps error_report_once()?

> +        break;
> +    }
> +    vtd_vcmd_clear_ip(s);
> +    return 0;
> +}
> +
>  static uint64_t vtd_mem_read(void *opaque, hwaddr addr, unsigned size)
>  {
>      IntelIOMMUState *s = opaque;
> @@ -2879,6 +3014,23 @@ static void vtd_mem_write(void *opaque, hwaddr addr,
>          vtd_set_long(s, addr, val);
>          break;
>  
> +    case DMAR_VCMD_REG:
> +        if (!vtd_handle_vcmd_write(s, val)) {
> +            if (size == 4) {
> +                vtd_set_long(s, addr, val);
> +            } else {
> +                vtd_set_quad(s, addr, val);
> +            }
> +        }
> +        break;
> +
> +    case DMAR_VCMD_REG_HI:
> +        assert(size == 4);

This assert() seems scary, but of course not a problem of this patch
because plenty of that are there in vtd_mem_write..  So we can fix
that later.

Do you know what should happen on bare-metal from spec-wise that when
the guest e.g. writes 2 bytes to these mmio regions?

> +        if (!vtd_handle_vcmd_write(s, val)) {
> +            vtd_set_long(s, addr, val);
> +        }
> +        break;
> +
>      default:
>          if (size == 4) {
>              vtd_set_long(s, addr, val);
> @@ -3617,7 +3769,8 @@ static void vtd_init(IntelIOMMUState *s)
>              s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
>          } else if (!strcmp(s->scalable_mode, "modern")) {
>              s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_PASID
> -                       | VTD_ECAP_FLTS | VTD_ECAP_PSS;
> +                       | VTD_ECAP_FLTS | VTD_ECAP_PSS | VTD_ECAP_VCS;
> +            s->vccap |= VTD_VCCAP_PAS;
>          }
>      }
>  

[...]

> +#define VTD_VCMD_CMD_MASK           0xffUL
> +#define VTD_VCMD_PASID_VALUE(val)   (((val) >> 8) & 0xfffff)
> +
> +#define VTD_VCRSP_RSLT(val)         ((val) << 8)
> +#define VTD_VCRSP_SC(val)           (((val) & 0x3) << 1)
> +
> +#define VTD_VCMD_UNDEFINED_CMD         1ULL
> +#define VTD_VCMD_NO_AVAILABLE_PASID    2ULL

According to 10.4.44 - should this be 1?

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 06/22] hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps
  2019-10-24 12:34 ` [RFC v2 06/22] hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps Liu Yi L
  2019-10-27 17:43   ` David Gibson
@ 2019-11-01 18:09   ` Peter Xu
  2019-11-06  8:15     ` Liu, Yi L
  1 sibling, 1 reply; 79+ messages in thread
From: Peter Xu @ 2019-11-01 18:09 UTC (permalink / raw)
  To: Liu Yi L
  Cc: tianyu.lan, kevin.tian, jacob.jun.pan, Yi Sun, kvm, mst,
	jun.j.tian, qemu-devel, eric.auger, alex.williamson, pbonzini,
	yi.y.sun, david

On Thu, Oct 24, 2019 at 08:34:27AM -0400, Liu Yi L wrote:
> This patch modifies pci_setup_iommu() to set PCIIOMMUOps instead of only
> setting PCIIOMMUFunc. PCIIOMMUFunc is previously used to get an address
> space for a device in vendor specific way. The PCIIOMMUOps still offers
> this functionality. Use PCIIOMMUOps leaves space to add more iommu related
> vendor specific operations.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Yi Sun <yi.y.sun@linux.intel.com>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 07/22] hw/pci: introduce pci_device_iommu_context()
  2019-10-24 12:34 ` [RFC v2 07/22] hw/pci: introduce pci_device_iommu_context() Liu Yi L
  2019-10-29 11:50   ` David Gibson
@ 2019-11-01 18:09   ` Peter Xu
  2019-11-06  8:14     ` Liu, Yi L
  1 sibling, 1 reply; 79+ messages in thread
From: Peter Xu @ 2019-11-01 18:09 UTC (permalink / raw)
  To: Liu Yi L
  Cc: tianyu.lan, kevin.tian, jacob.jun.pan, Yi Sun, kvm, mst,
	jun.j.tian, qemu-devel, eric.auger, alex.williamson, pbonzini,
	yi.y.sun, david

On Thu, Oct 24, 2019 at 08:34:28AM -0400, Liu Yi L wrote:
> This patch adds pci_device_iommu_context() to get an iommu_context
> for a given device. A new callback is added in PCIIOMMUOps. Users
> who wants to listen to events issued by vIOMMU could use this new
> interface to get an iommu_context and register their own notifiers,
> then wait for notifications from vIOMMU. e.g. VFIO is the first user
> of it to listen to the PASID_ALLOC/PASID_BIND/CACHE_INV events and
> propagate the events to host.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Yi Sun <yi.y.sun@linux.intel.com>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM
  2019-11-01  8:09               ` Jason Wang
@ 2019-11-02  7:35                 ` Tian, Kevin
  0 siblings, 0 replies; 79+ messages in thread
From: Tian, Kevin @ 2019-11-02  7:35 UTC (permalink / raw)
  To: Jason Wang, Liu, Yi L, qemu-devel, mst, pbonzini,
	alex.williamson, peterx
  Cc: tianyu.lan, jacob.jun.pan, kvm, Tian, Jun J, eric.auger, Sun,
	Yi Y, david

> From: Jason Wang [mailto:jasowang@redhat.com]
> Sent: Friday, November 1, 2019 4:10 PM
> 
> 
> On 2019/11/1 下午4:04, Jason Wang wrote:
> >
> > On 2019/11/1 下午3:46, Tian, Kevin wrote:
> >>> From: Jason Wang [mailto:jasowang@redhat.com]
> >>> Sent: Friday, November 1, 2019 3:30 PM
> >>>
> >>>
> >>> On 2019/10/31 下午10:07, Liu, Yi L wrote:
> >>>>> From: Jason Wang [mailto:jasowang@redhat.com]
> >>>>> Sent: Thursday, October 31, 2019 5:33 AM
> >>>>> Subject: Re: [RFC v2 00/22] intel_iommu: expose Shared Virtual
> >>> Addressing to VM
> >>>>>
> >>>>> On 2019/10/25 下午6:12, Tian, Kevin wrote:
> >>>>>>> From: Jason Wang [mailto:jasowang@redhat.com]
> >>>>>>> Sent: Friday, October 25, 2019 5:49 PM
> >>>>>>>
> >>>>>>>
> >>>>>>> On 2019/10/24 下午8:34, Liu Yi L wrote:
> >>>>>>>> Shared virtual address (SVA), a.k.a, Shared virtual memory
> >>>>>>>> (SVM) on
> >>>>>>>> Intel platforms allow address space sharing between device DMA
> >>> and
> >>>>>>> applications.
> >>>>>>>
> >>>>>>>
> >>>>>>> Interesting, so the below figure demonstrates the case of VM. I
> >>>>>>> wonder how much differences if we compare it with doing SVM
> >>> between
> >>>>>>> device and an ordinary process (e.g dpdk)?
> >>>>>>>
> >>>>>>> Thanks
> >>>>>> One difference is that ordinary process requires only stage-1
> >>>>>> translation, while VM requires nested translation.
> >>>>> A silly question, then I believe there's no need for VFIO DMA API
> >>>>> in this
> >>> case consider
> >>>>> the page table is shared between MMU and IOMMU?
> >>>> Echo Kevin's reply. We use nested translation here. For stage-1,
> >>>> yes, no
> >>> need to use
> >>>> VFIO DMA API. For stage-2, we still use VFIO DMA API to program the
> >>> GPA->HPA
> >>>> mapping to host. :-)
> >>>
> >>> Cool, two more questions:
> >>>
> >>> - Can EPT shares its page table with IOMMU L2?
> >> yes, their formats are compatible.
> >>
> >>> - Similar to EPT, when GPA->HPA (actually HVA->HPA) is modified by
> mm,
> >>> VFIO need to use MMU notifier do modify L2 accordingly besides DMA
> API?
> >>>
> >> VFIO devices need to pin-down guest memory pages that are mapped
> >> in IOMMU. So notifier is not required since mm won't change the
> mapping
> >> for those pages.
> >
> >
> > The GUP tends to lead a lot of issues, we may consider to allow
> > userspace to choose to not pin them in the future.
> 
> 
> Btw, I'm asking since I see MMU notifier is used by intel-svm.c to flush
> IOTLB. (I don't see any users in kernel source that use that API though
> e.g intel_svm_bind_mm()).
> 

intel-svm.c requires MMU notifier to invalidate IOTLB upon any change
on the CPU page table, when the latter is shared with device in SVA
case. But for VFIO usage, which is based on stage2, the map/unmap 
requests explicitly come from userspace. there is no need to sync with
mm.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 11/22] intel_iommu: process pasid cache invalidation
  2019-10-24 12:34 ` [RFC v2 11/22] intel_iommu: process pasid cache invalidation Liu Yi L
@ 2019-11-02 16:05   ` Peter Xu
  2019-11-06  5:55     ` Liu, Yi L
  0 siblings, 1 reply; 79+ messages in thread
From: Peter Xu @ 2019-11-02 16:05 UTC (permalink / raw)
  To: Liu Yi L
  Cc: tianyu.lan, kevin.tian, jacob.jun.pan, Yi Sun, kvm, mst,
	jun.j.tian, qemu-devel, eric.auger, alex.williamson, pbonzini,
	yi.y.sun, david

On Thu, Oct 24, 2019 at 08:34:32AM -0400, Liu Yi L wrote:
> This patch adds PASID cache invalidation handling. When guest enabled
> PASID usages (e.g. SVA), guest software should issue a proper PASID
> cache invalidation when caching-mode is exposed. This patch only adds
> the draft handling of pasid cache invalidation. Detailed handling will
> be added in subsequent patches.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Yi Sun <yi.y.sun@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> ---
>  hw/i386/intel_iommu.c          | 66 ++++++++++++++++++++++++++++++++++++++----
>  hw/i386/intel_iommu_internal.h | 12 ++++++++
>  hw/i386/trace-events           |  3 ++
>  3 files changed, 76 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 88b843f..84ff6f0 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -2335,6 +2335,63 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
>      return true;
>  }
>  
> +static int vtd_pasid_cache_dsi(IntelIOMMUState *s, uint16_t domain_id)
> +{
> +    return 0;
> +}
> +
> +static int vtd_pasid_cache_psi(IntelIOMMUState *s,
> +                               uint16_t domain_id, uint32_t pasid)
> +{
> +    return 0;
> +}
> +
> +static int vtd_pasid_cache_gsi(IntelIOMMUState *s)
> +{
> +    return 0;
> +}
> +
> +static bool vtd_process_pasid_desc(IntelIOMMUState *s,
> +                                   VTDInvDesc *inv_desc)
> +{
> +    uint16_t domain_id;
> +    uint32_t pasid;
> +    int ret = 0;
> +
> +    if ((inv_desc->val[0] & VTD_INV_DESC_PASIDC_RSVD_VAL0) ||
> +        (inv_desc->val[1] & VTD_INV_DESC_PASIDC_RSVD_VAL1) ||
> +        (inv_desc->val[2] & VTD_INV_DESC_PASIDC_RSVD_VAL2) ||
> +        (inv_desc->val[3] & VTD_INV_DESC_PASIDC_RSVD_VAL3)) {
> +        error_report_once("non-zero-field-in-pc_inv_desc hi: 0x%" PRIx64
> +                  " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]);
> +        return false;
> +    }
> +
> +    domain_id = VTD_INV_DESC_PASIDC_DID(inv_desc->val[0]);
> +    pasid = VTD_INV_DESC_PASIDC_PASID(inv_desc->val[0]);
> +
> +    switch (inv_desc->val[0] & VTD_INV_DESC_PASIDC_G) {
> +    case VTD_INV_DESC_PASIDC_DSI:
> +        ret = vtd_pasid_cache_dsi(s, domain_id);
> +        break;
> +
> +    case VTD_INV_DESC_PASIDC_PASID_SI:
> +        ret = vtd_pasid_cache_psi(s, domain_id, pasid);
> +        break;
> +
> +    case VTD_INV_DESC_PASIDC_GLOBAL:
> +        ret = vtd_pasid_cache_gsi(s);
> +        break;
> +
> +    default:
> +        error_report_once("invalid-inv-granu-in-pc_inv_desc hi: 0x%" PRIx64
> +                  " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]);
> +        return false;
> +    }
> +
> +    return (ret == 0) ? true : false;
> +}
> +
>  static bool vtd_process_inv_iec_desc(IntelIOMMUState *s,
>                                       VTDInvDesc *inv_desc)
>  {
> @@ -2441,12 +2498,11 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
>          }
>          break;
>  
> -    /*
> -     * TODO: the entity of below two cases will be implemented in future series.
> -     * To make guest (which integrates scalable mode support patch set in
> -     * iommu driver) work, just return true is enough so far.
> -     */
>      case VTD_INV_DESC_PC:
> +        trace_vtd_inv_desc("pasid-cache", inv_desc.val[1], inv_desc.val[0]);

Could be helpful if you dump [2|3] together here...

> +        if (!vtd_process_pasid_desc(s, &inv_desc)) {
> +            return false;
> +        }
>          break;
>  
>      case VTD_INV_DESC_PIOTLB:
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index 8668771..c6cb28b 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -445,6 +445,18 @@ typedef union VTDInvDesc VTDInvDesc;
>  #define VTD_SPTE_LPAGE_L4_RSVD_MASK(aw) \
>          (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM))
>  
> +#define VTD_INV_DESC_PASIDC_G          (3ULL << 4)
> +#define VTD_INV_DESC_PASIDC_PASID(val) (((val) >> 32) & 0xfffffULL)
> +#define VTD_INV_DESC_PASIDC_DID(val)   (((val) >> 16) & VTD_DOMAIN_ID_MASK)
> +#define VTD_INV_DESC_PASIDC_RSVD_VAL0  0xfff000000000ffc0ULL

Nit: Mind to comment here that bit 9-11 is marked as zero rather than
reserved?  This seems to work but if bit 9-11 can be non-zero in some
other descriptors then it would be clearer to define it as
0xfff000000000f1c0ULL then explicitly check bits 9-11.

Otherwise looks good to me.

> +#define VTD_INV_DESC_PASIDC_RSVD_VAL1  0xffffffffffffffffULL
> +#define VTD_INV_DESC_PASIDC_RSVD_VAL2  0xffffffffffffffffULL
> +#define VTD_INV_DESC_PASIDC_RSVD_VAL3  0xffffffffffffffffULL
> +
> +#define VTD_INV_DESC_PASIDC_DSI        (0ULL << 4)
> +#define VTD_INV_DESC_PASIDC_PASID_SI   (1ULL << 4)
> +#define VTD_INV_DESC_PASIDC_GLOBAL     (3ULL << 4)
> +
>  /* Information about page-selective IOTLB invalidate */
>  struct VTDIOTLBPageInvInfo {
>      uint16_t domain_id;
> diff --git a/hw/i386/trace-events b/hw/i386/trace-events
> index 43c0314..6da8bd2 100644
> --- a/hw/i386/trace-events
> +++ b/hw/i386/trace-events
> @@ -22,6 +22,9 @@ vtd_inv_qi_head(uint16_t head) "read head %d"
>  vtd_inv_qi_tail(uint16_t head) "write tail %d"
>  vtd_inv_qi_fetch(void) ""
>  vtd_context_cache_reset(void) ""
> +vtd_pasid_cache_gsi(void) ""
> +vtd_pasid_cache_dsi(uint16_t domain) "Domian slective PC invalidation domain 0x%"PRIx16
> +vtd_pasid_cache_psi(uint16_t domain, uint32_t pasid) "PASID slective PC invalidation domain 0x%"PRIx16" pasid 0x%"PRIx32
>  vtd_re_not_present(uint8_t bus) "Root entry bus %"PRIu8" not present"
>  vtd_ce_not_present(uint8_t bus, uint8_t devfn) "Context entry bus %"PRIu8" devfn %"PRIu8" not present"
>  vtd_iotlb_page_hit(uint16_t sid, uint64_t addr, uint64_t slpte, uint16_t domain) "IOTLB page hit sid 0x%"PRIx16" iova 0x%"PRIx64" slpte 0x%"PRIx64" domain 0x%"PRIx16
> -- 
> 2.7.4
> 

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 12/22] intel_iommu: add present bit check for pasid table entries
  2019-10-24 12:34 ` [RFC v2 12/22] intel_iommu: add present bit check for pasid table entries Liu Yi L
@ 2019-11-02 16:20   ` Peter Xu
  2019-11-06  8:14     ` Liu, Yi L
  0 siblings, 1 reply; 79+ messages in thread
From: Peter Xu @ 2019-11-02 16:20 UTC (permalink / raw)
  To: Liu Yi L
  Cc: tianyu.lan, kevin.tian, jacob.jun.pan, Yi Sun, kvm, mst,
	jun.j.tian, qemu-devel, eric.auger, alex.williamson, pbonzini,
	yi.y.sun, david

On Thu, Oct 24, 2019 at 08:34:33AM -0400, Liu Yi L wrote:
> The present bit check for pasid entry (pe) and pasid directory
> entry (pdire) were missed in previous commits as fpd bit check
> doesn't require present bit as "Set". This patch adds the present
> bit check for callers which wants to get a valid pe/pdire.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Yi Sun <yi.y.sun@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 14/22] vfio/pci: add iommu_context notifier for pasid bind/unbind
  2019-10-24 12:34 ` [RFC v2 14/22] vfio/pci: add iommu_context notifier for pasid bind/unbind Liu Yi L
@ 2019-11-04 16:02   ` David Gibson
  2019-11-06 12:22     ` Liu, Yi L
  0 siblings, 1 reply; 79+ messages in thread
From: David Gibson @ 2019-11-04 16:02 UTC (permalink / raw)
  To: Liu Yi L
  Cc: tianyu.lan, kevin.tian, jacob.jun.pan, Yi Sun, kvm, mst,
	jun.j.tian, qemu-devel, peterx, eric.auger, alex.williamson,
	pbonzini, yi.y.sun

[-- Attachment #1: Type: text/plain, Size: 4550 bytes --]

On Thu, Oct 24, 2019 at 08:34:35AM -0400, Liu Yi L wrote:
> This patch adds notifier for pasid bind/unbind. VFIO registers this
> notifier to listen to the dual-stage translation (a.k.a. nested
> translation) configuration changes and propagate to host. Thus vIOMMU
> is able to set its translation structures to host.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Yi Sun <yi.y.sun@linux.intel.com>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> ---
>  hw/vfio/pci.c            | 39 +++++++++++++++++++++++++++++++++++++++
>  include/hw/iommu/iommu.h | 11 +++++++++++
>  2 files changed, 50 insertions(+)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 8721ff6..012b8ed 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2767,6 +2767,41 @@ static void vfio_iommu_pasid_free_notify(IOMMUCTXNotifier *n,
>      pasid_req->free_result = ret;
>  }
>  
> +static void vfio_iommu_pasid_bind_notify(IOMMUCTXNotifier *n,
> +                                         IOMMUCTXEventData *event_data)
> +{
> +#ifdef __linux__

Is hw/vfio/pci.c even built on non-linux hosts?

> +    VFIOIOMMUContext *giommu_ctx = container_of(n, VFIOIOMMUContext, n);
> +    VFIOContainer *container = giommu_ctx->container;
> +    IOMMUCTXPASIDBindData *pasid_bind =
> +                              (IOMMUCTXPASIDBindData *) event_data->data;
> +    struct vfio_iommu_type1_bind *bind;
> +    struct iommu_gpasid_bind_data *bind_data;
> +    unsigned long argsz;
> +
> +    argsz = sizeof(*bind) + sizeof(*bind_data);
> +    bind = g_malloc0(argsz);
> +    bind->argsz = argsz;
> +    bind->bind_type = VFIO_IOMMU_BIND_GUEST_PASID;
> +    bind_data = (struct iommu_gpasid_bind_data *) &bind->data;
> +    *bind_data = *pasid_bind->data;
> +
> +    if (pasid_bind->flag & IOMMU_CTX_BIND_PASID) {
> +        if (ioctl(container->fd, VFIO_IOMMU_BIND, bind) != 0) {
> +            error_report("%s: pasid (%llu:%llu) bind failed: %d", __func__,
> +                         bind_data->gpasid, bind_data->hpasid, -errno);
> +        }
> +    } else if (pasid_bind->flag & IOMMU_CTX_UNBIND_PASID) {
> +        if (ioctl(container->fd, VFIO_IOMMU_UNBIND, bind) != 0) {
> +            error_report("%s: pasid (%llu:%llu) unbind failed: %d", __func__,
> +                         bind_data->gpasid, bind_data->hpasid, -errno);
> +        }
> +    }
> +
> +    g_free(bind);
> +#endif
> +}
> +
>  static void vfio_realize(PCIDevice *pdev, Error **errp)
>  {
>      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> @@ -3079,6 +3114,10 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>                                           iommu_context,
>                                           vfio_iommu_pasid_free_notify,
>                                           IOMMU_CTX_EVENT_PASID_FREE);
> +        vfio_register_iommu_ctx_notifier(vdev,
> +                                         iommu_context,
> +                                         vfio_iommu_pasid_bind_notify,
> +                                         IOMMU_CTX_EVENT_PASID_BIND);
>      }
>  
>      return;
> diff --git a/include/hw/iommu/iommu.h b/include/hw/iommu/iommu.h
> index 4352afd..4f21aa1 100644
> --- a/include/hw/iommu/iommu.h
> +++ b/include/hw/iommu/iommu.h
> @@ -33,6 +33,7 @@ typedef struct IOMMUContext IOMMUContext;
>  enum IOMMUCTXEvent {
>      IOMMU_CTX_EVENT_PASID_ALLOC,
>      IOMMU_CTX_EVENT_PASID_FREE,
> +    IOMMU_CTX_EVENT_PASID_BIND,
>      IOMMU_CTX_EVENT_NUM,
>  };
>  typedef enum IOMMUCTXEvent IOMMUCTXEvent;
> @@ -50,6 +51,16 @@ union IOMMUCTXPASIDReqDesc {
>  };
>  typedef union IOMMUCTXPASIDReqDesc IOMMUCTXPASIDReqDesc;
>  
> +struct IOMMUCTXPASIDBindData {
> +#define IOMMU_CTX_BIND_PASID   (1 << 0)
> +#define IOMMU_CTX_UNBIND_PASID (1 << 1)
> +    uint32_t flag;
> +#ifdef __linux__
> +    struct iommu_gpasid_bind_data *data;

Embedding a linux specific structure in the notification message seems
dubious to me.

> +#endif
> +};
> +typedef struct IOMMUCTXPASIDBindData IOMMUCTXPASIDBindData;
> +
>  struct IOMMUCTXEventData {
>      IOMMUCTXEvent event;
>      uint64_t length;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 13/22] intel_iommu: add PASID cache management infrastructure
  2019-10-24 12:34 ` [RFC v2 13/22] intel_iommu: add PASID cache management infrastructure Liu Yi L
@ 2019-11-04 17:08   ` Peter Xu
  2019-11-04 20:06   ` Peter Xu
  1 sibling, 0 replies; 79+ messages in thread
From: Peter Xu @ 2019-11-04 17:08 UTC (permalink / raw)
  To: Liu Yi L
  Cc: tianyu.lan, kevin.tian, jacob.jun.pan, Yi Sun, kvm, mst,
	jun.j.tian, qemu-devel, eric.auger, alex.williamson, pbonzini,
	yi.y.sun, david

On Thu, Oct 24, 2019 at 08:34:34AM -0400, Liu Yi L wrote:
> This patch adds a PASID cache management infrastructure based on
> new added structure VTDPASIDAddressSpace, which is used to track
> the PASID usage and future PASID tagged DMA address translation
> support in vIOMMU.
> 
>     struct VTDPASIDAddressSpace {
>         VTDBus *vtd_bus;
>         uint8_t devfn;
>         AddressSpace as;
>         uint32_t pasid;
>         IntelIOMMUState *iommu_state;
>         VTDContextCacheEntry context_cache_entry;
>         QLIST_ENTRY(VTDPASIDAddressSpace) next;
>         VTDPASIDCacheEntry pasid_cache_entry;
>     };
> 
> Ideally, a VTDPASIDAddressSpace instance is created when a PASID
> is bound with a DMA AddressSpace. Intel VT-d spec requires guest
> software to issue pasid cache invalidation when bind or unbind a
> pasid with an address space under caching-mode. However, as
> VTDPASIDAddressSpace instances also act as pasid cache in this
> implementation, its creation also happens during vIOMMU PASID
> tagged DMA translation. The creation in this path will not be
> added in this patch since no PASID-capable emulated devices for
> now.

So is this patch an incomplete version even for the pasid caching
layer for emulated device?

IMHO it would be considered as acceptable to merge something that is
even not ready from hardware pov but at least from software pov it is
complete (so when the hardware is ready we should logically run the
binary directly on them, bugs can happen but that's another story).
However if for this case:

  - it's not even complete as is (in translation functions it seems
    that we don't ever use this cache layer at all),

  - we don't have emulated device supported for pasid yet at all, so
    even further to have this code start to make any sense, and,

  - this is a 400 line patch as standalone :)  Which means that we
    need to start maintain these 400 LOC starting from the day when it
    gets merged, while it's far from even being tested.  Then I don't
    see how to maintain...

With above, I would suggest you put this patch into the future
patchset where you would like to have the first emulated device for
pasid and then you can even test this patch with those ones.  What do
you think?

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM
  2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (24 preceding siblings ...)
  2019-10-25  9:49 ` Jason Wang
@ 2019-11-04 17:22 ` Peter Xu
  2019-11-05  9:09   ` Liu, Yi L
  25 siblings, 1 reply; 79+ messages in thread
From: Peter Xu @ 2019-11-04 17:22 UTC (permalink / raw)
  To: Liu Yi L
  Cc: tianyu.lan, kevin.tian, jacob.jun.pan, kvm, mst, jun.j.tian,
	qemu-devel, eric.auger, alex.williamson, pbonzini, yi.y.sun,
	david

On Thu, Oct 24, 2019 at 08:34:21AM -0400, Liu Yi L wrote:
> Shared virtual address (SVA), a.k.a, Shared virtual memory (SVM) on Intel
> platforms allow address space sharing between device DMA and applications.
> SVA can reduce programming complexity and enhance security.
> This series is intended to expose SVA capability to VMs. i.e. shared guest
> application address space with passthru devices. The whole SVA virtualization
> requires QEMU/VFIO/IOMMU changes. This series includes the QEMU changes, for
> VFIO and IOMMU changes, they are in separate series (listed in the "Related
> series").
> 
> The high-level architecture for SVA virtualization is as below:
> 
>     .-------------.  .---------------------------.
>     |   vIOMMU    |  | Guest process CR3, FL only|
>     |             |  '---------------------------'
>     .----------------/
>     | PASID Entry |--- PASID cache flush -
>     '-------------'                       |
>     |             |                       V
>     |             |                CR3 in GPA
>     '-------------'
> Guest
> ------| Shadow |--------------------------|--------
>       v        v                          v
> Host
>     .-------------.  .----------------------.
>     |   pIOMMU    |  | Bind FL for GVA-GPA  |
>     |             |  '----------------------'
>     .----------------/  |
>     | PASID Entry |     V (Nested xlate)
>     '----------------\.------------------------------.
>     |             |   |SL for GPA-HPA, default domain|
>     |             |   '------------------------------'
>     '-------------'
> Where:
>  - FL = First level/stage one page tables
>  - SL = Second level/stage two page tables

Yi,

Would you mind to always mention what tests you have been done with
the patchset in the cover letter?  It'll be fine to say that you're
running this against FPGAs so no one could really retest it, but still
it would be good to know that as well.  It'll even be better to
mention that which part of the series is totally untested if you are
aware of.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 13/22] intel_iommu: add PASID cache management infrastructure
  2019-10-24 12:34 ` [RFC v2 13/22] intel_iommu: add PASID cache management infrastructure Liu Yi L
  2019-11-04 17:08   ` Peter Xu
@ 2019-11-04 20:06   ` Peter Xu
  2019-11-06  7:56     ` Liu, Yi L
  1 sibling, 1 reply; 79+ messages in thread
From: Peter Xu @ 2019-11-04 20:06 UTC (permalink / raw)
  To: Liu Yi L
  Cc: tianyu.lan, kevin.tian, jacob.jun.pan, Yi Sun, kvm, mst,
	jun.j.tian, qemu-devel, eric.auger, alex.williamson, pbonzini,
	yi.y.sun, david

On Thu, Oct 24, 2019 at 08:34:34AM -0400, Liu Yi L wrote:
> This patch adds a PASID cache management infrastructure based on
> new added structure VTDPASIDAddressSpace, which is used to track
> the PASID usage and future PASID tagged DMA address translation
> support in vIOMMU.
> 
>     struct VTDPASIDAddressSpace {
>         VTDBus *vtd_bus;
>         uint8_t devfn;
>         AddressSpace as;
>         uint32_t pasid;
>         IntelIOMMUState *iommu_state;
>         VTDContextCacheEntry context_cache_entry;
>         QLIST_ENTRY(VTDPASIDAddressSpace) next;
>         VTDPASIDCacheEntry pasid_cache_entry;
>     };
> 
> Ideally, a VTDPASIDAddressSpace instance is created when a PASID
> is bound with a DMA AddressSpace. Intel VT-d spec requires guest
> software to issue pasid cache invalidation when bind or unbind a
> pasid with an address space under caching-mode. However, as
> VTDPASIDAddressSpace instances also act as pasid cache in this
> implementation, its creation also happens during vIOMMU PASID
> tagged DMA translation. The creation in this path will not be
> added in this patch since no PASID-capable emulated devices for
> now.
> 
> The implementation in this patch manages VTDPASIDAddressSpace
> instances per PASID+BDF (lookup and insert will use PASID and
> BDF) since Intel VT-d spec allows per-BDF PASID Table. When a
> guest bind a PASID with an AddressSpace, QEMU will capture the
> guest pasid selective pasid cache invalidation, and allocate
> remove a VTDPASIDAddressSpace instance per the invalidation
> reasons:
> 
>     *) a present pasid entry moved to non-present
>     *) a present pasid entry to be a present entry
>     *) a non-present pasid entry moved to present
> 
> vIOMMU emulator could figure out the reason by fetching latest
> guest pasid entry.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Yi Sun <yi.y.sun@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>

Ok feel free to ignore my previous reply... I didn't notice it's
actually the pasid entry cache layer rather than the whole pasid
layer (including piotlb).  Comments below.

> ---
>  hw/i386/intel_iommu.c          | 356 +++++++++++++++++++++++++++++++++++++++++
>  hw/i386/intel_iommu_internal.h |  10 ++
>  hw/i386/trace-events           |   1 +
>  include/hw/i386/intel_iommu.h  |  36 ++++-
>  4 files changed, 402 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 90b8f6c..d8827c9 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -40,6 +40,7 @@
>  #include "kvm_i386.h"
>  #include "migration/vmstate.h"
>  #include "trace.h"
> +#include "qemu/jhash.h"
>  
>  /* context entry operations */
>  #define VTD_CE_GET_RID2PASID(ce) \
> @@ -65,6 +66,8 @@
>  static void vtd_address_space_refresh_all(IntelIOMMUState *s);
>  static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n);
>  
> +static void vtd_pasid_cache_reset(IntelIOMMUState *s);
> +
>  static void vtd_panic_require_caching_mode(void)
>  {
>      error_report("We need to set caching-mode=on for intel-iommu to enable "
> @@ -276,6 +279,7 @@ static void vtd_reset_caches(IntelIOMMUState *s)
>      vtd_iommu_lock(s);
>      vtd_reset_iotlb_locked(s);
>      vtd_reset_context_cache_locked(s);
> +    vtd_pasid_cache_reset(s);
>      vtd_iommu_unlock(s);
>  }
>  
> @@ -686,6 +690,11 @@ static inline bool vtd_pe_type_check(X86IOMMUState *x86_iommu,
>      return true;
>  }
>  
> +static inline uint16_t vtd_pe_get_domain_id(VTDPASIDEntry *pe)
> +{
> +    return VTD_SM_PASID_ENTRY_DID((pe)->val[1]);
> +}
> +
>  static inline bool vtd_pdire_present(VTDPASIDDirEntry *pdire)
>  {
>      return pdire->val & 1;
> @@ -2389,19 +2398,361 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
>      return true;
>  }
>  
> +static inline struct pasid_key *vtd_get_pasid_key(uint32_t pasid,
> +                                                  uint16_t sid)
> +{
> +    struct pasid_key *key = g_malloc0(sizeof(*key));

I think you can simply return the pasid_key directly maybe otherwise
should be careful on mem leak.  Actually I think it's leaked below...

> +    key->pasid = pasid;
> +    key->sid = sid;
> +    return key;
> +}
> +
> +static guint vtd_pasid_as_key_hash(gconstpointer v)
> +{
> +    struct pasid_key *key = (struct pasid_key *)v;
> +    uint32_t a, b, c;
> +
> +    /* Jenkins hash */
> +    a = b = c = JHASH_INITVAL + sizeof(*key);
> +    a += key->sid;
> +    b += extract32(key->pasid, 0, 16);
> +    c += extract32(key->pasid, 16, 16);
> +
> +    __jhash_mix(a, b, c);
> +    __jhash_final(a, b, c);

I'm totally not good at hash, but I'm curious why no one wants to
introduce at least a jhash() so we don't need to call these internals
(I believe that's how kernel did it).  At the meantime I don't see how
it would be better than things like g_str_hash() too so I'd be glad if
anyone could help explain a bit...

> +
> +    return c;
> +}
> +
> +static gboolean vtd_pasid_as_key_equal(gconstpointer v1, gconstpointer v2)
> +{
> +    const struct pasid_key *k1 = v1;
> +    const struct pasid_key *k2 = v2;
> +
> +    return (k1->pasid == k2->pasid) && (k1->sid == k2->sid);
> +}
> +
> +static inline bool vtd_pc_is_dom_si(struct VTDPASIDCacheInfo *pc_info)
> +{
> +    return pc_info->flags & VTD_PASID_CACHE_DOMSI;
> +}
> +
> +static inline bool vtd_pc_is_pasid_si(struct VTDPASIDCacheInfo *pc_info)
> +{
> +    return pc_info->flags & VTD_PASID_CACHE_PASIDSI;

AFAIS these only used once.  How about removing these helpers?  I
don't see much on helping readability or anything...  please see below
at [1].

> +}
> +
> +static inline int vtd_dev_get_pe_from_pasid(IntelIOMMUState *s,
> +                                            uint8_t bus_num,
> +                                            uint8_t devfn,
> +                                            uint32_t pasid,
> +                                            VTDPASIDEntry *pe)
> +{
> +    VTDContextEntry ce;
> +    int ret;
> +    dma_addr_t pasid_dir_base;
> +
> +    if (!s->root_scalable) {
> +        return -VTD_FR_PASID_TABLE_INV;
> +    }
> +
> +    ret = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    pasid_dir_base = VTD_CE_GET_PASID_DIR_TABLE(&ce);
> +    ret = vtd_get_pe_from_pasid_table(s,
> +                                  pasid_dir_base, pasid, pe);
> +
> +    return ret;
> +}
> +
> +static bool vtd_pasid_entry_compare(VTDPASIDEntry *p1, VTDPASIDEntry *p2)
> +{
> +    int i = 0;
> +    while (i < sizeof(*p1) / sizeof(p1->val)) {
> +        if (p1->val[i] != p2->val[i]) {
> +            return false;
> +        }
> +        i++;
> +    }
> +    return true;

Will this work?

  return !memcmp(p1, p2, sizeof(*p1));

> +}
> +
> +/**
> + * This function is used to clear pasid_cache_gen of cached pasid
> + * entry in vtd_pasid_as instances. Caller of this function should
> + * hold iommu_lock.
> + */
> +static gboolean vtd_flush_pasid(gpointer key, gpointer value,
> +                                gpointer user_data)
> +{
> +    VTDPASIDCacheInfo *pc_info = user_data;
> +    VTDPASIDAddressSpace *vtd_pasid_as = value;
> +    IntelIOMMUState *s = vtd_pasid_as->iommu_state;
> +    VTDPASIDCacheEntry *pc_entry = &vtd_pasid_as->pasid_cache_entry;
> +    VTDBus *vtd_bus = vtd_pasid_as->vtd_bus;
> +    VTDPASIDEntry pe;
> +    uint16_t did;
> +    uint32_t pasid;
> +    uint16_t devfn;
> +    gboolean remove = false;
> +
> +    did = vtd_pe_get_domain_id(&pc_entry->pasid_entry);
> +    pasid = vtd_pasid_as->pasid;
> +    devfn = vtd_pasid_as->devfn;
> +
> +    if (pc_entry->pasid_cache_gen &&
> +        (vtd_pc_is_dom_si(pc_info) ? (pc_info->domain_id == did) : 1) &&
> +        (vtd_pc_is_pasid_si(pc_info) ? (pc_info->pasid == pasid) : 1)) {

This chunk is a bit odd to me.  How about something like this?

  ...

  if (!pc_entry->pasid_cache_gen)
    return false;

  switch (pc_info->flags) {
    case DOMAIN:
      if (pc_info->domain_id != did) {
        return false;
      }
      break;
    case PASID:
      if (pc_info->pasid != pasid) {
        return false;
      }
      break;
    ... (I think you'll add more in the follow up patches)
  }

> +        /*
> +         * Modify pasid_cache_gen to be 0, the cached pasid entry in
> +         * vtd_pasid_as instance is invalid. And vtd_pasid_as instance
> +         * would be treated as invalid in QEMU scope until the pasid
> +         * cache gen is updated in a new pasid binding or updated in
> +         * below logic if found guest pasid entry exists.
> +         */
> +        remove = true;

Why set remove here?  Should we set it only if we found that the entry
is cleared?

> +        pc_entry->pasid_cache_gen = 0;
> +        if (vtd_bus->dev_ic[devfn]) {
> +            if (!vtd_dev_get_pe_from_pasid(s,
> +                      pci_bus_num(vtd_bus->bus), devfn, pasid, &pe)) {
> +                /*
> +                 * pasid entry exists, so keep the vtd_pasid_as, and needs
> +                 * update the pasid entry cached in vtd_pasid_as. Also, if
> +                 * the guest pasid entry doesn't equal to cached pasid entry
> +                 * needs to issue a pasid bind to host for passthru devies.
> +                 */
> +                remove = false;
> +                pc_entry->pasid_cache_gen = s->pasid_cache_gen;
> +                if (!vtd_pasid_entry_compare(&pe, &pc_entry->pasid_entry)) {
> +                    pc_entry->pasid_entry = pe;

What if the pasid entry changed from valid to all zeros?  Should we
unbind/remove it as well?

> +                    /*
> +                     * TODO: when pasid-base-iotlb(piotlb) infrastructure is
> +                     * ready, should invalidate QEMU piotlb togehter with this
> +                     * change.
> +                     */
> +                }
> +            }
> +        }
> +    }
> +
> +    return remove;

In summary, IMHO this chunk could be clearer if like this:

  ... (continues with above pesudo code)
  
  ret = vtd_dev_get_pe_from_pasid(..., &pe);
  if (ret) {
    goto remove;
  }
  // detected correct pasid entry
  if (!vtd_pasid_entry_compare(&pe, ...)) {
     // pasid entry changed
     if (vtd_pasid_cleared(&pe)) {
       // the pasid is cleared to all zero, drop
       goto remove;
     }
     // a new pasid is setup

     // Send UNBIND if cache valid
     ...
     // Send BIND
     ...
     // Update cache
     pc_entry->pasid_entry = pe;
     pc_entry->pasid_cache_gen = s->pasid_cache_gen;
  }

remove:
  // Send UNBIND if cache valid
  ...
  return true;

I feel like you shouldn't bother checking against
vtd_bus->dev_ic[devfn] at all here because if that was set then it
means we need to pass these information down to host, and it'll be
checked automatically because when we send BIND/UNBIND event we'll
definitely check that too otherwise those events will be noops.

> +}
> +
>  static int vtd_pasid_cache_dsi(IntelIOMMUState *s, uint16_t domain_id)
>  {
> +    VTDPASIDCacheInfo pc_info;
> +
> +    trace_vtd_pasid_cache_dsi(domain_id);
> +
> +    pc_info.flags = VTD_PASID_CACHE_DOMSI;
> +    pc_info.domain_id = domain_id;
> +
> +    /*
> +     * Loop all existing pasid caches and update them.
> +     */
> +    vtd_iommu_lock(s);
> +    g_hash_table_foreach_remove(s->vtd_pasid_as, vtd_flush_pasid, &pc_info);
> +    vtd_iommu_unlock(s);
> +
> +    /*
> +     * TODO: Domain selective PASID cache invalidation
> +     * may be issued wrongly by programmer, to be safe,

IMHO it's not wrong even if the guest sends that, because logically
the guest can send invalidation as it wishes, and we should have
similar issue before on the 2nd level page table invalidations... and
that's why we need to keep the iova mapping inside qemu I suppose...

> +     * after invalidating the pasid caches, emulator
> +     * needs to replay the pasid bindings by walking guest
> +     * pasid dir and pasid table.

This is true...

> +     */
>      return 0;
>  }
>  
> +/**
> + * This function finds or adds a VTDPASIDAddressSpace for a device
> + * when it is bound to a pasid. Caller of this function should hold
> + * iommu_lock.
> + */
> +static VTDPASIDAddressSpace *vtd_add_find_pasid_as(IntelIOMMUState *s,
> +                                                   VTDBus *vtd_bus,
> +                                                   int devfn,
> +                                                   uint32_t pasid,
> +                                                   bool allocate)
> +{
> +    struct pasid_key *key;
> +    struct pasid_key *new_key;
> +    VTDPASIDAddressSpace *vtd_pasid_as;
> +    uint16_t sid;
> +
> +    sid = vtd_make_source_id(pci_bus_num(vtd_bus->bus), devfn);
> +    key = vtd_get_pasid_key(pasid, sid);
> +    vtd_pasid_as = g_hash_table_lookup(s->vtd_pasid_as, key);
> +
> +    if (!vtd_pasid_as && allocate) {
> +        new_key = vtd_get_pasid_key(pasid, sid);

Is this the same as key no matter what?

> +        /*
> +         * Initiate the vtd_pasid_as structure.
> +         *
> +         * This structure here is used to track the guest pasid
> +         * binding and also serves as pasid-cache mangement entry.
> +         *
> +         * TODO: in future, if wants to support the SVA-aware DMA
> +         *       emulation, the vtd_pasid_as should be fully initialized.
> +         *       e.g. the address_space and memory region fields.
> +         */
> +        vtd_pasid_as = g_malloc0(sizeof(VTDPASIDAddressSpace));
> +        vtd_pasid_as->iommu_state = s;
> +        vtd_pasid_as->vtd_bus = vtd_bus;
> +        vtd_pasid_as->devfn = devfn;
> +        vtd_pasid_as->context_cache_entry.context_cache_gen = 0;
> +        vtd_pasid_as->pasid = pasid;
> +        vtd_pasid_as->pasid_cache_entry.pasid_cache_gen = 0;
> +        g_hash_table_insert(s->vtd_pasid_as, new_key, vtd_pasid_as);
> +    }
> +    return vtd_pasid_as;
> +}
> +
> + /**
> +  * This function updates the pasid entry cached in &vtd_pasid_as.
> +  * Caller of this function should hold iommu_lock.
> +  */
> +static inline void vtd_fill_in_pe_cache(
> +              VTDPASIDAddressSpace *vtd_pasid_as, VTDPASIDEntry *pe)
> +{
> +    IntelIOMMUState *s = vtd_pasid_as->iommu_state;
> +    VTDPASIDCacheEntry *pc_entry = &vtd_pasid_as->pasid_cache_entry;
> +
> +    pc_entry->pasid_entry = *pe;
> +    pc_entry->pasid_cache_gen = s->pasid_cache_gen;
> +}
> +
>  static int vtd_pasid_cache_psi(IntelIOMMUState *s,
>                                 uint16_t domain_id, uint32_t pasid)
>  {
> +    VTDPASIDCacheInfo pc_info;
> +    VTDPASIDEntry pe;
> +    VTDBus *vtd_bus;
> +    int bus_n, devfn;
> +    VTDPASIDAddressSpace *vtd_pasid_as;
> +    VTDIOMMUContext *vtd_ic;
> +
> +    pc_info.flags = VTD_PASID_CACHE_DOMSI;
> +    pc_info.domain_id = domain_id;
> +    pc_info.flags |= VTD_PASID_CACHE_PASIDSI;
> +    pc_info.pasid = pasid;
> +
> +    /*
> +     * Regards to a pasid selective pasid cache invalidation (PSI), it
> +     * could be either cases of below:
> +     * a) a present pasid entry moved to non-present
> +     * b) a present pasid entry to be a present entry
> +     * c) a non-present pasid entry moved to present
> +     *
> +     * Here the handling of a PSI is:
> +     * 1) loop all the exisitng vtd_pasid_as instances to update them
> +     *    according to the latest guest pasid entry in pasid table.
> +     *    this will make sure affected existing vtd_pasid_as instances
> +     *    cached the latest pasid entries. Also, during the loop, the
> +     *    host should be notified if needed. e.g. pasid unbind or pasid
> +     *    update. Should be able to cover case a) and case b).
> +     *
> +     * 2) loop all devices to cover case c)
> +     *    However, it is not good to always loop all devices. In this
> +     *    implementation. We do it in this ways:
> +     *    - For devices which have VTDIOMMUContext instances, we loop
> +     *      them and check if guest pasid entry exists. If yes, it is
> +     *      case c), we update the pasid cache and also notify host.
> +     *    - For devices which have no VTDIOMMUContext instances, it is
> +     *      not necessary to create pasid cache at this phase since it
> +     *      could be created when vIOMMU do DMA address translation.
> +     *      This is not implemented yet since no PASID-capable emulated
> +     *      devices today. If we have it in future, the pasid cache shall
> +     *      be created there.
> +     */
> +
> +    vtd_iommu_lock(s);
> +    g_hash_table_foreach_remove(s->vtd_pasid_as, vtd_flush_pasid, &pc_info);
> +    vtd_iommu_unlock(s);

[2]

> +
> +    vtd_iommu_lock(s);

Do you want to explicitly release the lock for other thread?
Otherwise I don't see a point to unlock/lock in sequence..

> +    QLIST_FOREACH(vtd_ic, &s->vtd_dev_ic_list, next) {
> +        vtd_bus = vtd_ic->vtd_bus;
> +        devfn = vtd_ic->devfn;
> +        bus_n = pci_bus_num(vtd_bus->bus);
> +
> +        /* Step 1: fetch vtd_pasid_as and check if it is valid */
> +        vtd_pasid_as = vtd_add_find_pasid_as(s, vtd_bus,
> +                                        devfn, pasid, true);
> +        if (vtd_pasid_as &&
> +            (s->pasid_cache_gen ==
> +             vtd_pasid_as->pasid_cache_entry.pasid_cache_gen)) {
> +            /*
> +             * pasid_cache_gen equals to s->pasid_cache_gen means
> +             * vtd_pasid_as is valid after the above s->vtd_pasid_as
> +             * updates. Thus no need for the below steps.
> +             */
> +            continue;
> +        }
> +
> +        /*
> +         * Step 2: vtd_pasid_as is not valid, it's potentailly a
> +         * new pasid bind. Fetch guest pasid entry.
> +         */
> +        if (vtd_dev_get_pe_from_pasid(s, bus_n, devfn, pasid, &pe)) {
> +            continue;
> +        }
> +
> +        /*
> +         * Step 3: pasid entry exists, update pasid cache
> +         *
> +         * Here need to check domain ID since guest pasid entry
> +         * exists. What needs to do are:
> +         *   - update the pc_entry in the vtd_pasid_as
> +         *   - set proper pc_entry.pasid_cache_gen
> +         *   - passdown the latest guest pasid entry config to host
> +         *     (will be added in later patch)
> +         */
> +        if (domain_id == vtd_pe_get_domain_id(&pe)) {
> +            vtd_fill_in_pe_cache(vtd_pasid_as, &pe);
> +        }
> +    }

Could you explain why do we need this whole chunk if with [2] above?
I feel like that'll do all the things we need already (send
BIND/UNBIND, update pasid entry cache).

> +    vtd_iommu_unlock(s);
>      return 0;
>  }
>  
> +/**
> + * Caller of this function should hold iommu_lock
> + */
> +static void vtd_pasid_cache_reset(IntelIOMMUState *s)
> +{
> +    VTDPASIDCacheInfo pc_info;
> +
> +    trace_vtd_pasid_cache_reset();
> +
> +    pc_info.flags = 0;

Maybe also introduce a flag for GLOBAL flush to be clear?

> +
> +    /*
> +     * Reset pasid cache is a big hammer, so use g_hash_table_foreach_remove
> +     * which will free the vtd_pasid_as instances.
> +     */
> +    g_hash_table_foreach_remove(s->vtd_pasid_as, vtd_flush_pasid, &pc_info);
> +    s->pasid_cache_gen = 1;
> +}
> +
>  static int vtd_pasid_cache_gsi(IntelIOMMUState *s)
>  {
> +    trace_vtd_pasid_cache_gsi();
> +
> +    vtd_iommu_lock(s);
> +    vtd_pasid_cache_reset(s);
> +    vtd_iommu_unlock(s);
> +
> +    /*
> +     * TODO: Global PASID cache invalidation may be
> +     * issued wrongly by programmer, to be safe, after
> +     * invalidating the pasid caches, emulator needs
> +     * to replay the pasid bindings by walking guest
> +     * pasid dir and pasid table.
> +     */
>      return 0;
>  }
>  
> @@ -3660,7 +4011,9 @@ VTDIOMMUContext *vtd_find_add_ic(IntelIOMMUState *s,
>          vtd_dev_ic->devfn = (uint8_t)devfn;
>          vtd_dev_ic->iommu_state = s;
>          iommu_context_init(&vtd_dev_ic->iommu_context);
> +        QLIST_INSERT_HEAD(&s->vtd_dev_ic_list, vtd_dev_ic, next);
>      }
> +
>      return vtd_dev_ic;
>  }
>  
> @@ -4074,6 +4427,7 @@ static void vtd_realize(DeviceState *dev, Error **errp)
>      }
>  
>      QLIST_INIT(&s->vtd_as_with_notifiers);
> +    QLIST_INIT(&s->vtd_dev_ic_list);
>      qemu_mutex_init(&s->iommu_lock);
>      memset(s->vtd_as_by_bus_num, 0, sizeof(s->vtd_as_by_bus_num));
>      memory_region_init_io(&s->csrmem, OBJECT(s), &vtd_mem_ops, s,
> @@ -4099,6 +4453,8 @@ static void vtd_realize(DeviceState *dev, Error **errp)
>                                       g_free, g_free);
>      s->vtd_as_by_busptr = g_hash_table_new_full(vtd_uint64_hash, vtd_uint64_equal,
>                                                g_free, g_free);
> +    s->vtd_pasid_as = g_hash_table_new_full(vtd_pasid_as_key_hash,
> +                                   vtd_pasid_as_key_equal, g_free, g_free);
>      vtd_init(s);
>      sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, Q35_HOST_BRIDGE_IOMMU_ADDR);
>      pci_setup_iommu(bus, &vtd_iommu_ops, dev);
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index 879211e..12873e1 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -311,6 +311,7 @@ typedef enum VTDFaultReason {
>      VTD_FR_IR_SID_ERR = 0x26,   /* Invalid Source-ID */
>  
>      VTD_FR_PASID_TABLE_INV = 0x58,  /*Invalid PASID table entry */
> +    VTD_FR_PASID_ENTRY_P = 0x59, /* The Present(P) field of pasidt-entry is 0 */
>  
>      /* This is not a normal fault reason. We use this to indicate some faults
>       * that are not referenced by the VT-d specification.
> @@ -482,6 +483,15 @@ struct VTDRootEntry {
>  };
>  typedef struct VTDRootEntry VTDRootEntry;
>  
> +struct VTDPASIDCacheInfo {
> +#define VTD_PASID_CACHE_DOMSI   (1ULL << 0);
> +#define VTD_PASID_CACHE_PASIDSI (1ULL << 1);
> +    uint32_t flags;
> +    uint16_t domain_id;
> +    uint32_t pasid;
> +};
> +typedef struct VTDPASIDCacheInfo VTDPASIDCacheInfo;
> +
>  /* Masks for struct VTDRootEntry */
>  #define VTD_ROOT_ENTRY_P            1ULL
>  #define VTD_ROOT_ENTRY_CTP          (~0xfffULL)
> diff --git a/hw/i386/trace-events b/hw/i386/trace-events
> index 6da8bd2..7912ae1 100644
> --- a/hw/i386/trace-events
> +++ b/hw/i386/trace-events
> @@ -22,6 +22,7 @@ vtd_inv_qi_head(uint16_t head) "read head %d"
>  vtd_inv_qi_tail(uint16_t head) "write tail %d"
>  vtd_inv_qi_fetch(void) ""
>  vtd_context_cache_reset(void) ""
> +vtd_pasid_cache_reset(void) ""
>  vtd_pasid_cache_gsi(void) ""
>  vtd_pasid_cache_dsi(uint16_t domain) "Domian slective PC invalidation domain 0x%"PRIx16
>  vtd_pasid_cache_psi(uint16_t domain, uint32_t pasid) "PASID slective PC invalidation domain 0x%"PRIx16" pasid 0x%"PRIx32
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index 0d49480..d693f71 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -69,6 +69,8 @@ typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
>  typedef struct VTDPASIDDirEntry VTDPASIDDirEntry;
>  typedef struct VTDPASIDEntry VTDPASIDEntry;
>  typedef struct VTDIOMMUContext VTDIOMMUContext;
> +typedef struct VTDPASIDCacheEntry VTDPASIDCacheEntry;
> +typedef struct VTDPASIDAddressSpace VTDPASIDAddressSpace;
>  
>  /* Context-Entry */
>  struct VTDContextEntry {
> @@ -101,6 +103,31 @@ struct VTDPASIDEntry {
>      uint64_t val[8];
>  };
>  
> +struct pasid_key {
> +    uint32_t pasid;
> +    uint16_t sid;
> +};
> +
> +struct VTDPASIDCacheEntry {
> +    /*
> +     * The cache entry is obsolete if
> +     * pasid_cache_gen!=IntelIOMMUState.pasid_cache_gen
> +     */
> +    uint32_t pasid_cache_gen;
> +    struct VTDPASIDEntry pasid_entry;
> +};
> +
> +struct VTDPASIDAddressSpace {
> +    VTDBus *vtd_bus;
> +    uint8_t devfn;
> +    AddressSpace as;
> +    uint32_t pasid;
> +    IntelIOMMUState *iommu_state;
> +    VTDContextCacheEntry context_cache_entry;
> +    QLIST_ENTRY(VTDPASIDAddressSpace) next;
> +    VTDPASIDCacheEntry pasid_cache_entry;
> +};
> +
>  struct VTDAddressSpace {
>      PCIBus *bus;
>      uint8_t devfn;
> @@ -121,6 +148,7 @@ struct VTDIOMMUContext {
>      VTDBus *vtd_bus;
>      uint8_t devfn;
>      IOMMUContext iommu_context;
> +    QLIST_ENTRY(VTDIOMMUContext) next;
>      IntelIOMMUState *iommu_state;
>  };
>  
> @@ -269,9 +297,14 @@ struct IntelIOMMUState {
>  
>      GHashTable *vtd_as_by_busptr;   /* VTDBus objects indexed by PCIBus* reference */
>      VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */
> +    GHashTable *vtd_pasid_as;   /* VTDPASIDAddressSpace objects */
> +    uint32_t pasid_cache_gen;   /* Should be in [1,MAX] */
>      /* list of registered notifiers */
>      QLIST_HEAD(, VTDAddressSpace) vtd_as_with_notifiers;
>  
> +    /* list of registered notifiers */
> +    QLIST_HEAD(, VTDIOMMUContext) vtd_dev_ic_list;
> +
>      /* interrupt remapping */
>      bool intr_enabled;              /* Whether guest enabled IR */
>      dma_addr_t intr_root;           /* Interrupt remapping table pointer */
> @@ -288,7 +321,8 @@ struct IntelIOMMUState {
>  
>      /*
>       * Protects IOMMU states in general.  Currently it protects the
> -     * per-IOMMU IOTLB cache, and context entry cache in VTDAddressSpace.
> +     * per-IOMMU IOTLB cache, and context entry cache in VTDAddressSpace,
> +     * and pasid cache in VTDPASIDAddressSpace.
>       */
>      QemuMutex iommu_lock;
>  };
> -- 
> 2.7.4
> 

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 15/22] intel_iommu: bind/unbind guest page table to host
  2019-10-24 12:34 ` [RFC v2 15/22] intel_iommu: bind/unbind guest page table to host Liu Yi L
@ 2019-11-04 20:25   ` Peter Xu
  2019-11-06  8:10     ` Liu, Yi L
  0 siblings, 1 reply; 79+ messages in thread
From: Peter Xu @ 2019-11-04 20:25 UTC (permalink / raw)
  To: Liu Yi L
  Cc: tianyu.lan, kevin.tian, jacob.jun.pan, Yi Sun, kvm, mst,
	jun.j.tian, qemu-devel, eric.auger, alex.williamson, pbonzini,
	yi.y.sun, david

On Thu, Oct 24, 2019 at 08:34:36AM -0400, Liu Yi L wrote:
> This patch captures the guest PASID table entry modifications and
> propagates the changes to host to setup nested translation. The
> guest page table is configured as 1st level page table (GVA->GPA)
> whose translation result would further go through host VT-d 2nd
> level page table(GPA->HPA) under nested translation mode. This is
> a key part of vSVA support.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Yi Sun <yi.y.sun@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> ---
>  hw/i386/intel_iommu.c          | 81 ++++++++++++++++++++++++++++++++++++++++++
>  hw/i386/intel_iommu_internal.h | 20 +++++++++++
>  2 files changed, 101 insertions(+)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index d8827c9..793b0de 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -41,6 +41,7 @@
>  #include "migration/vmstate.h"
>  #include "trace.h"
>  #include "qemu/jhash.h"
> +#include <linux/iommu.h>
>  
>  /* context entry operations */
>  #define VTD_CE_GET_RID2PASID(ce) \
> @@ -695,6 +696,16 @@ static inline uint16_t vtd_pe_get_domain_id(VTDPASIDEntry *pe)
>      return VTD_SM_PASID_ENTRY_DID((pe)->val[1]);
>  }
>  
> +static inline uint32_t vtd_pe_get_fl_aw(VTDPASIDEntry *pe)
> +{
> +    return 48 + ((pe->val[2] >> 2) & VTD_SM_PASID_ENTRY_FLPM) * 9;
> +}
> +
> +static inline dma_addr_t vtd_pe_get_flpt_base(VTDPASIDEntry *pe)
> +{
> +    return pe->val[2] & VTD_SM_PASID_ENTRY_FLPTPTR;
> +}
> +
>  static inline bool vtd_pdire_present(VTDPASIDDirEntry *pdire)
>  {
>      return pdire->val & 1;
> @@ -1850,6 +1861,67 @@ static void vtd_context_global_invalidate(IntelIOMMUState *s)
>      vtd_iommu_replay_all(s);
>  }
>  
> +static void vtd_bind_guest_pasid(IntelIOMMUState *s, VTDBus *vtd_bus,
> +            int devfn, int pasid, VTDPASIDEntry *pe, VTDPASIDOp op)
> +{
> +#ifdef __linux__
> +    VTDIOMMUContext *vtd_ic;
> +    IOMMUCTXEventData event_data;
> +    IOMMUCTXPASIDBindData bind;
> +    struct iommu_gpasid_bind_data *g_bind_data;
> +
> +    vtd_ic = vtd_bus->dev_ic[devfn];
> +    if (!vtd_ic) {
> +        return;
> +    }
> +
> +    g_bind_data = g_malloc0(sizeof(*g_bind_data));
> +    bind.flag = 0;
> +    g_bind_data->flags = 0;
> +    g_bind_data->vtd.flags = 0;
> +    switch (op) {
> +    case VTD_PASID_BIND:
> +    case VTD_PASID_UPDATE:
> +        g_bind_data->version = IOMMU_GPASID_BIND_VERSION_1;
> +        g_bind_data->format = IOMMU_PASID_FORMAT_INTEL_VTD;
> +        g_bind_data->gpgd = vtd_pe_get_flpt_base(pe);
> +        g_bind_data->addr_width = vtd_pe_get_fl_aw(pe);
> +        g_bind_data->hpasid = pasid;
> +        g_bind_data->gpasid = pasid;
> +        g_bind_data->flags |= IOMMU_SVA_GPASID_VAL;
> +        g_bind_data->vtd.flags =
> +                             (VTD_SM_PASID_ENTRY_SRE_BIT(pe->val[2]) ? 1 : 0)
> +                           | (VTD_SM_PASID_ENTRY_EAFE_BIT(pe->val[2]) ? 1 : 0)
> +                           | (VTD_SM_PASID_ENTRY_PCD_BIT(pe->val[1]) ? 1 : 0)
> +                           | (VTD_SM_PASID_ENTRY_PWT_BIT(pe->val[1]) ? 1 : 0)
> +                           | (VTD_SM_PASID_ENTRY_EMTE_BIT(pe->val[1]) ? 1 : 0)
> +                           | (VTD_SM_PASID_ENTRY_CD_BIT(pe->val[1]) ? 1 : 0);
> +        g_bind_data->vtd.pat = VTD_SM_PASID_ENTRY_PAT(pe->val[1]);
> +        g_bind_data->vtd.emt = VTD_SM_PASID_ENTRY_EMT(pe->val[1]);
> +        bind.flag |= IOMMU_CTX_BIND_PASID;
> +        break;
> +
> +    case VTD_PASID_UNBIND:
> +        g_bind_data->gpgd = 0;
> +        g_bind_data->addr_width = 0;
> +        g_bind_data->hpasid = pasid;
> +        bind.flag |= IOMMU_CTX_UNBIND_PASID;
> +        break;
> +
> +    default:
> +        printf("Unknown VTDPASIDOp!!\n");

Please don't use printf()..  Here assert() suits.

> +        break;
> +    }
> +    if (bind.flag) {

Will this be untrue?  If not, assert() works too.

> +        event_data.event = IOMMU_CTX_EVENT_PASID_BIND;
> +        bind.data = g_bind_data;
> +        event_data.data = &bind;
> +        iommu_ctx_event_notify(&vtd_ic->iommu_context, &event_data);
> +    }
> +    g_free(g_bind_data);
> +#endif
> +}
> +
>  /* Do a context-cache device-selective invalidation.
>   * @func_mask: FM field after shifting
>   */
> @@ -2528,12 +2600,17 @@ static gboolean vtd_flush_pasid(gpointer key, gpointer value,
>                  pc_entry->pasid_cache_gen = s->pasid_cache_gen;
>                  if (!vtd_pasid_entry_compare(&pe, &pc_entry->pasid_entry)) {
>                      pc_entry->pasid_entry = pe;
> +                    vtd_bind_guest_pasid(s, vtd_bus, devfn,
> +                                     pasid, &pe, VTD_PASID_UPDATE);
>                      /*
>                       * TODO: when pasid-base-iotlb(piotlb) infrastructure is
>                       * ready, should invalidate QEMU piotlb togehter with this
>                       * change.
>                       */
>                  }
> +            } else {
> +                vtd_bind_guest_pasid(s, vtd_bus, devfn,
> +                                  pasid, NULL, VTD_PASID_UNBIND);

Please see the reply in the other thread on vtd_flush_pasid().  I've
filled in where I feel like this UNBIND should exist, I feel like your
current code could miss some places where you should unbind but didn't.

>              }
>          }
>      }
> @@ -2623,6 +2700,10 @@ static inline void vtd_fill_in_pe_cache(
>  
>      pc_entry->pasid_entry = *pe;
>      pc_entry->pasid_cache_gen = s->pasid_cache_gen;
> +    vtd_bind_guest_pasid(s, vtd_pasid_as->vtd_bus,
> +                         vtd_pasid_as->devfn,
> +                         vtd_pasid_as->pasid,
> +                         pe, VTD_PASID_UPDATE);
>  }
>  
>  static int vtd_pasid_cache_psi(IntelIOMMUState *s,
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index 12873e1..13e02e8 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -483,6 +483,14 @@ struct VTDRootEntry {
>  };
>  typedef struct VTDRootEntry VTDRootEntry;
>  
> +enum VTDPASIDOp {
> +    VTD_PASID_BIND,
> +    VTD_PASID_UNBIND,
> +    VTD_PASID_UPDATE,
> +    VTD_OP_NUM
> +};
> +typedef enum VTDPASIDOp VTDPASIDOp;
> +
>  struct VTDPASIDCacheInfo {
>  #define VTD_PASID_CACHE_DOMSI   (1ULL << 0);
>  #define VTD_PASID_CACHE_PASIDSI (1ULL << 1);
> @@ -549,6 +557,18 @@ typedef struct VTDPASIDCacheInfo VTDPASIDCacheInfo;
>  #define VTD_SM_PASID_ENTRY_AW          7ULL /* Adjusted guest-address-width */
>  #define VTD_SM_PASID_ENTRY_DID(val)    ((val) & VTD_DOMAIN_ID_MASK)
>  
> +/* Adjusted guest-address-width */
> +#define VTD_SM_PASID_ENTRY_FLPM          3ULL
> +#define VTD_SM_PASID_ENTRY_FLPTPTR       (~0xfffULL)
> +#define VTD_SM_PASID_ENTRY_SRE_BIT(val)  (!!((val) & 1ULL))
> +#define VTD_SM_PASID_ENTRY_EAFE_BIT(val) (!!(((val) >> 7) & 1ULL))
> +#define VTD_SM_PASID_ENTRY_PCD_BIT(val)  (!!(((val) >> 31) & 1ULL))
> +#define VTD_SM_PASID_ENTRY_PWT_BIT(val)  (!!(((val) >> 30) & 1ULL))
> +#define VTD_SM_PASID_ENTRY_EMTE_BIT(val) (!!(((val) >> 26) & 1ULL))
> +#define VTD_SM_PASID_ENTRY_CD_BIT(val)   (!!(((val) >> 25) & 1ULL))
> +#define VTD_SM_PASID_ENTRY_PAT(val)      (((val) >> 32) & 0xFFFFFFFFULL)
> +#define VTD_SM_PASID_ENTRY_EMT(val)      (((val) >> 27) & 0x7ULL)
> +
>  /* Second Level Page Translation Pointer*/
>  #define VTD_SM_PASID_ENTRY_SLPTPTR     (~0xfffULL)
>  
> -- 
> 2.7.4
> 

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM
  2019-11-04 17:22 ` Peter Xu
@ 2019-11-05  9:09   ` Liu, Yi L
  0 siblings, 0 replies; 79+ messages in thread
From: Liu, Yi L @ 2019-11-05  9:09 UTC (permalink / raw)
  To: Peter Xu
  Cc: tianyu.lan, Tian, Kevin, jacob.jun.pan, kvm, mst, Tian, Jun J,
	qemu-devel, eric.auger, alex.williamson, pbonzini, Sun, Yi Y,
	david

> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Tuesday, November 5, 2019 1:23 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM
> 
> On Thu, Oct 24, 2019 at 08:34:21AM -0400, Liu Yi L wrote:
> > Shared virtual address (SVA), a.k.a, Shared virtual memory (SVM) on
> > Intel platforms allow address space sharing between device DMA and applications.
> > SVA can reduce programming complexity and enhance security.
> > This series is intended to expose SVA capability to VMs. i.e. shared
> > guest application address space with passthru devices. The whole SVA
> > virtualization requires QEMU/VFIO/IOMMU changes. This series includes
> > the QEMU changes, for VFIO and IOMMU changes, they are in separate
> > series (listed in the "Related series").
> >
[...]
>
> Yi,
> 
> Would you mind to always mention what tests you have been done with the
> patchset in the cover letter?  It'll be fine to say that you're running this against FPGAs
> so no one could really retest it, but still it would be good to know that as well.  It'll
> even be better to mention that which part of the series is totally untested if you are
> aware of.

Sure, I should have included the test parts. Will do in next version.

Thanks,
Yi Liu

^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC v2 03/22] intel_iommu: modify x-scalable-mode to be string option
  2019-11-01 14:57   ` Peter Xu
@ 2019-11-05  9:14     ` Liu, Yi L
  2019-11-05 12:50       ` Peter Xu
  0 siblings, 1 reply; 79+ messages in thread
From: Liu, Yi L @ 2019-11-05  9:14 UTC (permalink / raw)
  To: Peter Xu
  Cc: Tian, Kevin, jacob.jun.pan, Yi Sun, kvm, mst, Tian, Jun J,
	qemu-devel, eric.auger, alex.williamson, pbonzini, Sun, Yi Y,
	david

> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Friday, November 1, 2019 10:58 PM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v2 03/22] intel_iommu: modify x-scalable-mode to be string option
> 
> On Thu, Oct 24, 2019 at 08:34:24AM -0400, Liu Yi L wrote:
> > Intel VT-d 3.0 introduces scalable mode, and it has a bunch of
> > capabilities related to scalable mode translation, thus there are multiple
> combinations.
> > While this vIOMMU implementation wants simplify it for user by
> > providing typical combinations. User could config it by
> > "x-scalable-mode" option. The usage is as below:
> >
> > "-device intel-iommu,x-scalable-mode=["legacy"|"modern"]"
> >
> >  - "legacy": gives support for SL page table
> >  - "modern": gives support for FL page table, pasid, virtual command
> >  -  if not configured, means no scalable mode support, if not proper
> >     configured, will throw error
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> > ---
> >  hw/i386/intel_iommu.c          | 15 +++++++++++++--
> >  hw/i386/intel_iommu_internal.h |  3 +++
> > include/hw/i386/intel_iommu.h  |  2 +-
> >  3 files changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index
> > 771bed2..4a1a07a 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -3019,7 +3019,7 @@ static Property vtd_properties[] = {
> >      DEFINE_PROP_UINT8("aw-bits", IntelIOMMUState, aw_bits,
> >                        VTD_HOST_ADDRESS_WIDTH),
> >      DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode,
> FALSE),
> > -    DEFINE_PROP_BOOL("x-scalable-mode", IntelIOMMUState, scalable_mode,
> FALSE),
> > +    DEFINE_PROP_STRING("x-scalable-mode", IntelIOMMUState,
> > + scalable_mode),
> >      DEFINE_PROP_BOOL("dma-drain", IntelIOMMUState, dma_drain, true),
> >      DEFINE_PROP_END_OF_LIST(),
> >  };
> > @@ -3581,7 +3581,12 @@ static void vtd_init(IntelIOMMUState *s)
> >
> >      /* TODO: read cap/ecap from host to decide which cap to be exposed. */
> >      if (s->scalable_mode) {
> > -        s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
> > +        if (!strcmp(s->scalable_mode, "legacy")) {
> > +            s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
> > +        } else if (!strcmp(s->scalable_mode, "modern")) {
> > +            s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_PASID
> > +                       | VTD_ECAP_FLTS | VTD_ECAP_PSS;
> > +        }
> 
> Shall we do this string op only once in vtd_decide_config() then keep it somewhere?

Agreed. I'll move it to vtd_decide_config().

> Something like:
> 
>   - s->scalable_mode_str to keep the string
>   - s->scalable_mode still as a bool to cache the global enablement
>   - s->scalable_modern as a bool to keep the mode
> 
> ?

So x-scalable-mode is still a string option, just to have a new field to store it?

> 
> These could be used in some MMIO path (I think) and parsing strings always could be
> a bit overkill.

I think so. Let's just align on the direction above.

Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 03/22] intel_iommu: modify x-scalable-mode to be string option
  2019-11-05  9:14     ` Liu, Yi L
@ 2019-11-05 12:50       ` Peter Xu
  2019-11-06  9:50         ` Liu, Yi L
  0 siblings, 1 reply; 79+ messages in thread
From: Peter Xu @ 2019-11-05 12:50 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Tian, Kevin, jacob.jun.pan, Yi Sun, kvm, mst, Tian, Jun J,
	qemu-devel, eric.auger, alex.williamson, pbonzini, Sun, Yi Y,
	david

On Tue, Nov 05, 2019 at 09:14:08AM +0000, Liu, Yi L wrote:
> > Something like:
> > 
> >   - s->scalable_mode_str to keep the string
> >   - s->scalable_mode still as a bool to cache the global enablement
> >   - s->scalable_modern as a bool to keep the mode
> > 
> > ?
> 
> So x-scalable-mode is still a string option, just to have a new field to store it?

Yep.  I'd say maybe we should start to allow to define some union-ish
properties, but for now I think string is ok.

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC v2 11/22] intel_iommu: process pasid cache invalidation
  2019-11-02 16:05   ` Peter Xu
@ 2019-11-06  5:55     ` Liu, Yi L
  0 siblings, 0 replies; 79+ messages in thread
From: Liu, Yi L @ 2019-11-06  5:55 UTC (permalink / raw)
  To: Peter Xu
  Cc: Tian, Kevin, jacob.jun.pan, Yi Sun, kvm, mst, Tian, Jun J,
	qemu-devel, eric.auger, alex.williamson, pbonzini, Sun, Yi Y,
	david

> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Sunday, November 3, 2019 12:06 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v2 11/22] intel_iommu: process pasid cache invalidation
> 
> On Thu, Oct 24, 2019 at 08:34:32AM -0400, Liu Yi L wrote:
> > This patch adds PASID cache invalidation handling. When guest enabled
> > PASID usages (e.g. SVA), guest software should issue a proper PASID
> > cache invalidation when caching-mode is exposed. This patch only adds
> > the draft handling of pasid cache invalidation. Detailed handling will
> > be added in subsequent patches.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > ---
> >  hw/i386/intel_iommu.c          | 66 ++++++++++++++++++++++++++++++++++++++--
> --
> >  hw/i386/intel_iommu_internal.h | 12 ++++++++
> >  hw/i386/trace-events           |  3 ++
> >  3 files changed, 76 insertions(+), 5 deletions(-)
> >
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 88b843f..84ff6f0 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -2335,6 +2335,63 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState
> *s, VTDInvDesc *inv_desc)
> >      return true;
> >  }
> >
> > +static int vtd_pasid_cache_dsi(IntelIOMMUState *s, uint16_t domain_id)
> > +{
> > +    return 0;
> > +}
> > +
> > +static int vtd_pasid_cache_psi(IntelIOMMUState *s,
> > +                               uint16_t domain_id, uint32_t pasid)
> > +{
> > +    return 0;
> > +}
> > +
> > +static int vtd_pasid_cache_gsi(IntelIOMMUState *s)
> > +{
> > +    return 0;
> > +}
> > +
> > +static bool vtd_process_pasid_desc(IntelIOMMUState *s,
> > +                                   VTDInvDesc *inv_desc)
> > +{
> > +    uint16_t domain_id;
> > +    uint32_t pasid;
> > +    int ret = 0;
> > +
> > +    if ((inv_desc->val[0] & VTD_INV_DESC_PASIDC_RSVD_VAL0) ||
> > +        (inv_desc->val[1] & VTD_INV_DESC_PASIDC_RSVD_VAL1) ||
> > +        (inv_desc->val[2] & VTD_INV_DESC_PASIDC_RSVD_VAL2) ||
> > +        (inv_desc->val[3] & VTD_INV_DESC_PASIDC_RSVD_VAL3)) {
> > +        error_report_once("non-zero-field-in-pc_inv_desc hi: 0x%" PRIx64
> > +                  " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]);
> > +        return false;
> > +    }
> > +
> > +    domain_id = VTD_INV_DESC_PASIDC_DID(inv_desc->val[0]);
> > +    pasid = VTD_INV_DESC_PASIDC_PASID(inv_desc->val[0]);
> > +
> > +    switch (inv_desc->val[0] & VTD_INV_DESC_PASIDC_G) {
> > +    case VTD_INV_DESC_PASIDC_DSI:
> > +        ret = vtd_pasid_cache_dsi(s, domain_id);
> > +        break;
> > +
> > +    case VTD_INV_DESC_PASIDC_PASID_SI:
> > +        ret = vtd_pasid_cache_psi(s, domain_id, pasid);
> > +        break;
> > +
> > +    case VTD_INV_DESC_PASIDC_GLOBAL:
> > +        ret = vtd_pasid_cache_gsi(s);
> > +        break;
> > +
> > +    default:
> > +        error_report_once("invalid-inv-granu-in-pc_inv_desc hi: 0x%" PRIx64
> > +                  " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]);
> > +        return false;
> > +    }
> > +
> > +    return (ret == 0) ? true : false;
> > +}
> > +
> >  static bool vtd_process_inv_iec_desc(IntelIOMMUState *s,
> >                                       VTDInvDesc *inv_desc)
> >  {
> > @@ -2441,12 +2498,11 @@ static bool vtd_process_inv_desc(IntelIOMMUState
> *s)
> >          }
> >          break;
> >
> > -    /*
> > -     * TODO: the entity of below two cases will be implemented in future series.
> > -     * To make guest (which integrates scalable mode support patch set in
> > -     * iommu driver) work, just return true is enough so far.
> > -     */
> >      case VTD_INV_DESC_PC:
> > +        trace_vtd_inv_desc("pasid-cache", inv_desc.val[1], inv_desc.val[0]);
> 
> Could be helpful if you dump [2|3] together here...

sure. Let me add it in next version.

> > +        if (!vtd_process_pasid_desc(s, &inv_desc)) {
> > +            return false;
> > +        }
> >          break;
> >
> >      case VTD_INV_DESC_PIOTLB:
> > diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> > index 8668771..c6cb28b 100644
> > --- a/hw/i386/intel_iommu_internal.h
> > +++ b/hw/i386/intel_iommu_internal.h
> > @@ -445,6 +445,18 @@ typedef union VTDInvDesc VTDInvDesc;
> >  #define VTD_SPTE_LPAGE_L4_RSVD_MASK(aw) \
> >          (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM))
> >
> > +#define VTD_INV_DESC_PASIDC_G          (3ULL << 4)
> > +#define VTD_INV_DESC_PASIDC_PASID(val) (((val) >> 32) & 0xfffffULL)
> > +#define VTD_INV_DESC_PASIDC_DID(val)   (((val) >> 16) &
> VTD_DOMAIN_ID_MASK)
> > +#define VTD_INV_DESC_PASIDC_RSVD_VAL0  0xfff000000000ffc0ULL
> 
> Nit: Mind to comment here that bit 9-11 is marked as zero rather than
> reserved?  This seems to work but if bit 9-11 can be non-zero in some
> other descriptors then it would be clearer to define it as
> 0xfff000000000f1c0ULL then explicitly check bits 9-11.
> 
> Otherwise looks good to me.

You are right. This is not reserved. It's parts of the descriptor type now. Will
fix it in next version.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC v2 13/22] intel_iommu: add PASID cache management infrastructure
  2019-11-04 20:06   ` Peter Xu
@ 2019-11-06  7:56     ` Liu, Yi L
  2019-11-07 15:46       ` Peter Xu
  0 siblings, 1 reply; 79+ messages in thread
From: Liu, Yi L @ 2019-11-06  7:56 UTC (permalink / raw)
  To: Peter Xu
  Cc: Tian, Kevin, jacob.jun.pan, Yi Sun, kvm, mst, Tian, Jun J,
	qemu-devel, eric.auger, alex.williamson, pbonzini, Sun, Yi Y,
	david

> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Tuesday, November 5, 2019 4:07 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v2 13/22] intel_iommu: add PASID cache management
> infrastructure
> 
> On Thu, Oct 24, 2019 at 08:34:34AM -0400, Liu Yi L wrote:
> > This patch adds a PASID cache management infrastructure based on
> > new added structure VTDPASIDAddressSpace, which is used to track
> > the PASID usage and future PASID tagged DMA address translation
> > support in vIOMMU.
> >
> >     struct VTDPASIDAddressSpace {
> >         VTDBus *vtd_bus;
> >         uint8_t devfn;
> >         AddressSpace as;
> >         uint32_t pasid;
> >         IntelIOMMUState *iommu_state;
> >         VTDContextCacheEntry context_cache_entry;
> >         QLIST_ENTRY(VTDPASIDAddressSpace) next;
> >         VTDPASIDCacheEntry pasid_cache_entry;
> >     };
> >
> > Ideally, a VTDPASIDAddressSpace instance is created when a PASID
> > is bound with a DMA AddressSpace. Intel VT-d spec requires guest
> > software to issue pasid cache invalidation when bind or unbind a
> > pasid with an address space under caching-mode. However, as
> > VTDPASIDAddressSpace instances also act as pasid cache in this
> > implementation, its creation also happens during vIOMMU PASID
> > tagged DMA translation. The creation in this path will not be
> > added in this patch since no PASID-capable emulated devices for
> > now.
> >
> > The implementation in this patch manages VTDPASIDAddressSpace
> > instances per PASID+BDF (lookup and insert will use PASID and
> > BDF) since Intel VT-d spec allows per-BDF PASID Table. When a
> > guest bind a PASID with an AddressSpace, QEMU will capture the
> > guest pasid selective pasid cache invalidation, and allocate
> > remove a VTDPASIDAddressSpace instance per the invalidation
> > reasons:
> >
> >     *) a present pasid entry moved to non-present
> >     *) a present pasid entry to be a present entry
> >     *) a non-present pasid entry moved to present
> >
> > vIOMMU emulator could figure out the reason by fetching latest
> > guest pasid entry.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> 
> Ok feel free to ignore my previous reply... I didn't notice it's
> actually the pasid entry cache layer rather than the whole pasid
> layer (including piotlb).  Comments below.

yep. It is in another patch and this patch set won't implement piotlb
cache infrastructure as no emulated sva-capable device so far.

> > ---
> >  hw/i386/intel_iommu.c          | 356
> +++++++++++++++++++++++++++++++++++++++++
> >  hw/i386/intel_iommu_internal.h |  10 ++
> >  hw/i386/trace-events           |   1 +
> >  include/hw/i386/intel_iommu.h  |  36 ++++-
> >  4 files changed, 402 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 90b8f6c..d8827c9 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -40,6 +40,7 @@
> >  #include "kvm_i386.h"
> >  #include "migration/vmstate.h"
> >  #include "trace.h"
> > +#include "qemu/jhash.h"
> >
> >  /* context entry operations */
> >  #define VTD_CE_GET_RID2PASID(ce) \
> > @@ -65,6 +66,8 @@
> >  static void vtd_address_space_refresh_all(IntelIOMMUState *s);
> >  static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n);
> >
> > +static void vtd_pasid_cache_reset(IntelIOMMUState *s);
> > +
> >  static void vtd_panic_require_caching_mode(void)
> >  {
> >      error_report("We need to set caching-mode=on for intel-iommu to enable "
> > @@ -276,6 +279,7 @@ static void vtd_reset_caches(IntelIOMMUState *s)
> >      vtd_iommu_lock(s);
> >      vtd_reset_iotlb_locked(s);
> >      vtd_reset_context_cache_locked(s);
> > +    vtd_pasid_cache_reset(s);
> >      vtd_iommu_unlock(s);
> >  }
> >
> > @@ -686,6 +690,11 @@ static inline bool vtd_pe_type_check(X86IOMMUState
> *x86_iommu,
> >      return true;
> >  }
> >
> > +static inline uint16_t vtd_pe_get_domain_id(VTDPASIDEntry *pe)
> > +{
> > +    return VTD_SM_PASID_ENTRY_DID((pe)->val[1]);
> > +}
> > +
> >  static inline bool vtd_pdire_present(VTDPASIDDirEntry *pdire)
> >  {
> >      return pdire->val & 1;
> > @@ -2389,19 +2398,361 @@ static bool
> vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
> >      return true;
> >  }
> >
> > +static inline struct pasid_key *vtd_get_pasid_key(uint32_t pasid,
> > +                                                  uint16_t sid)
> > +{
> > +    struct pasid_key *key = g_malloc0(sizeof(*key));
> 
> I think you can simply return the pasid_key directly maybe otherwise
> should be careful on mem leak.  Actually I think it's leaked below...

sure, I can do it. For the leak, it is a known issue as below comment
indicates. Not sure why it was left as it is. Perhaps, the key point is
used in the hash table. Per my understanding, hash table should have
its own field to store the key content. Do you have any idea?

    if (!vtd_bus) {
        uintptr_t *new_key = g_malloc(sizeof(*new_key));
        *new_key = (uintptr_t)bus;
        /* No corresponding free() */

> 
> > +    key->pasid = pasid;
> > +    key->sid = sid;
> > +    return key;
> > +}
> > +
> > +static guint vtd_pasid_as_key_hash(gconstpointer v)
> > +{
> > +    struct pasid_key *key = (struct pasid_key *)v;
> > +    uint32_t a, b, c;
> > +
> > +    /* Jenkins hash */
> > +    a = b = c = JHASH_INITVAL + sizeof(*key);
> > +    a += key->sid;
> > +    b += extract32(key->pasid, 0, 16);
> > +    c += extract32(key->pasid, 16, 16);
> > +
> > +    __jhash_mix(a, b, c);
> > +    __jhash_final(a, b, c);
> 
> I'm totally not good at hash, but I'm curious why no one wants to
> introduce at least a jhash() so we don't need to call these internals
> (I believe that's how kernel did it). 

well, I'm also curious about it.

> At the meantime I don't see how
> it would be better than things like g_str_hash() too so I'd be glad if
> anyone could help explain a bit...

I used to use g_str_hash(), and used string as key.
https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg02128.html

Do you want me to keep the pasid_key structure here and switch to
use g_str_hash()? Then the pasid key content would be compared as
strings. I think it should work. But, I may be wrong all the same.

> > +
> > +    return c;
> > +}
> > +
> > +static gboolean vtd_pasid_as_key_equal(gconstpointer v1, gconstpointer v2)
> > +{
> > +    const struct pasid_key *k1 = v1;
> > +    const struct pasid_key *k2 = v2;
> > +
> > +    return (k1->pasid == k2->pasid) && (k1->sid == k2->sid);
> > +}
> > +
> > +static inline bool vtd_pc_is_dom_si(struct VTDPASIDCacheInfo *pc_info)
> > +{
> > +    return pc_info->flags & VTD_PASID_CACHE_DOMSI;
> > +}
> > +
> > +static inline bool vtd_pc_is_pasid_si(struct VTDPASIDCacheInfo *pc_info)
> > +{
> > +    return pc_info->flags & VTD_PASID_CACHE_PASIDSI;
> 
> AFAIS these only used once.  How about removing these helpers?  I
> don't see much on helping readability or anything...  please see below
> at [1].

Agreed. Will do it. BTW. I failed to locate [1]. May you point it out. Surely
I don’t want to miss any comments.

> > +}
> > +
> > +static inline int vtd_dev_get_pe_from_pasid(IntelIOMMUState *s,
> > +                                            uint8_t bus_num,
> > +                                            uint8_t devfn,
> > +                                            uint32_t pasid,
> > +                                            VTDPASIDEntry *pe)
> > +{
> > +    VTDContextEntry ce;
> > +    int ret;
> > +    dma_addr_t pasid_dir_base;
> > +
> > +    if (!s->root_scalable) {
> > +        return -VTD_FR_PASID_TABLE_INV;
> > +    }
> > +
> > +    ret = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
> > +    if (ret) {
> > +        return ret;
> > +    }
> > +
> > +    pasid_dir_base = VTD_CE_GET_PASID_DIR_TABLE(&ce);
> > +    ret = vtd_get_pe_from_pasid_table(s,
> > +                                  pasid_dir_base, pasid, pe);
> > +
> > +    return ret;
> > +}
> > +
> > +static bool vtd_pasid_entry_compare(VTDPASIDEntry *p1, VTDPASIDEntry *p2)
> > +{
> > +    int i = 0;
> > +    while (i < sizeof(*p1) / sizeof(p1->val)) {
> > +        if (p1->val[i] != p2->val[i]) {
> > +            return false;
> > +        }
> > +        i++;
> > +    }
> > +    return true;
> 
> Will this work?
> 
>   return !memcmp(p1, p2, sizeof(*p1));

oh, yes. Will replace with it.

> > +}
> > +
> > +/**
> > + * This function is used to clear pasid_cache_gen of cached pasid
> > + * entry in vtd_pasid_as instances. Caller of this function should
> > + * hold iommu_lock.
> > + */
> > +static gboolean vtd_flush_pasid(gpointer key, gpointer value,
> > +                                gpointer user_data)
> > +{
> > +    VTDPASIDCacheInfo *pc_info = user_data;
> > +    VTDPASIDAddressSpace *vtd_pasid_as = value;
> > +    IntelIOMMUState *s = vtd_pasid_as->iommu_state;
> > +    VTDPASIDCacheEntry *pc_entry = &vtd_pasid_as->pasid_cache_entry;
> > +    VTDBus *vtd_bus = vtd_pasid_as->vtd_bus;
> > +    VTDPASIDEntry pe;
> > +    uint16_t did;
> > +    uint32_t pasid;
> > +    uint16_t devfn;
> > +    gboolean remove = false;
> > +
> > +    did = vtd_pe_get_domain_id(&pc_entry->pasid_entry);
> > +    pasid = vtd_pasid_as->pasid;
> > +    devfn = vtd_pasid_as->devfn;
> > +
> > +    if (pc_entry->pasid_cache_gen &&
> > +        (vtd_pc_is_dom_si(pc_info) ? (pc_info->domain_id == did) : 1) &&
> > +        (vtd_pc_is_pasid_si(pc_info) ? (pc_info->pasid == pasid) : 1)) {
> 
> This chunk is a bit odd to me.  How about something like this?
> 
>   ...
> 
>   if (!pc_entry->pasid_cache_gen)
>     return false;
> 
>   switch (pc_info->flags) {
>     case DOMAIN:
>       if (pc_info->domain_id != did) {
>         return false;
>       }
>       break;
>     case PASID:
>       if (pc_info->pasid != pasid) {
>         return false;
>       }
>       break;
>     ... (I think you'll add more in the follow up patches)
>   }

yep, I can do it.

> > +        /*
> > +         * Modify pasid_cache_gen to be 0, the cached pasid entry in
> > +         * vtd_pasid_as instance is invalid. And vtd_pasid_as instance
> > +         * would be treated as invalid in QEMU scope until the pasid
> > +         * cache gen is updated in a new pasid binding or updated in
> > +         * below logic if found guest pasid entry exists.
> > +         */
> > +        remove = true;
> 
> Why set remove here?  Should we set it only if we found that the entry
> is cleared?

Yes, you are right. But it is only for passthru devices. For emulated
sva-capable device, I think it would simple to always remove cached
pasid-entry if guest issues pasid cache invalidation. This is because
caching-mode is not necessary for emulated devices, which means pasid
cache invalidation for emulated devices only means cache flush. This
is enough as the pasid entry can be re-cached during PASID tagged DMA
translation in do_translate(). Although it is not yet added as I
mentioned in the patch commit message. While for passthru devices,
pasid cache invalidation does not only mean cache flush. Instead, it
relies on the latest guest pasid entry presence status.

Based on the above idea, I make the remove=true at the beginning, and
if the subsequent logic finds out it is for passthru devices, it will
check guest pasid entry and then decide how to handle the pasid cache
invalidation request. "remove" will be set to be false when guest pasid
entry exists.

> 
> > +        pc_entry->pasid_cache_gen = 0;
> > +        if (vtd_bus->dev_ic[devfn]) {
> > +            if (!vtd_dev_get_pe_from_pasid(s,
> > +                      pci_bus_num(vtd_bus->bus), devfn, pasid, &pe)) {
> > +                /*
> > +                 * pasid entry exists, so keep the vtd_pasid_as, and needs
> > +                 * update the pasid entry cached in vtd_pasid_as. Also, if
> > +                 * the guest pasid entry doesn't equal to cached pasid entry
> > +                 * needs to issue a pasid bind to host for passthru devies.
> > +                 */
> > +                remove = false;
> > +                pc_entry->pasid_cache_gen = s->pasid_cache_gen;
> > +                if (!vtd_pasid_entry_compare(&pe, &pc_entry->pasid_entry)) {
> > +                    pc_entry->pasid_entry = pe;
> 
> What if the pasid entry changed from valid to all zeros?  Should we
> unbind/remove it as well?

If it is from valid to all zero, vtd_dev_get_pe_from_pasid() should
return non-zero. Then it would keep "remove"=true and pasid_cache_gen=0.
Unbind will be added with bind in patch 15. Here just handle the cached
pasid entry within vIOMMU.

0015-intel_iommu-bind-unbind-guest-page-table-to-host.patch

> 
> > +                    /*
> > +                     * TODO: when pasid-base-iotlb(piotlb) infrastructure is
> > +                     * ready, should invalidate QEMU piotlb togehter with this
> > +                     * change.
> > +                     */
> > +                }
> > +            }
> > +        }
> > +    }
> > +
> > +    return remove;
> 
> In summary, IMHO this chunk could be clearer if like this:
> 
>   ... (continues with above pesudo code)
> 
>   ret = vtd_dev_get_pe_from_pasid(..., &pe);
>   if (ret) {
>     goto remove;
>   }
>   // detected correct pasid entry
>   if (!vtd_pasid_entry_compare(&pe, ...)) {
>      // pasid entry changed
>      if (vtd_pasid_cleared(&pe)) {
>        // the pasid is cleared to all zero, drop
>        goto remove;
>      }
>      // a new pasid is setup
> 
>      // Send UNBIND if cache valid
>      ...
>      // Send BIND
>      ...
>      // Update cache
>      pc_entry->pasid_entry = pe;
>      pc_entry->pasid_cache_gen = s->pasid_cache_gen;
>   }
> 
> remove:
>   // Send UNBIND if cache valid
>   ...
>   return true;

yep, I can do it. nice idea. :-)

> I feel like you shouldn't bother checking against
> vtd_bus->dev_ic[devfn] at all here because if that was set then it
> means we need to pass these information down to host, and it'll be
> checked automatically because when we send BIND/UNBIND event we'll
> definitely check that too otherwise those events will be noops.

I need it. Because I want to differ passthru devices and emulated
devices. Ideally, emulated devices won't have vtd_bus->dev_ic[devfn].

> > +}
> > +
> >  static int vtd_pasid_cache_dsi(IntelIOMMUState *s, uint16_t domain_id)
> >  {
> > +    VTDPASIDCacheInfo pc_info;
> > +
> > +    trace_vtd_pasid_cache_dsi(domain_id);
> > +
> > +    pc_info.flags = VTD_PASID_CACHE_DOMSI;
> > +    pc_info.domain_id = domain_id;
> > +
> > +    /*
> > +     * Loop all existing pasid caches and update them.
> > +     */
> > +    vtd_iommu_lock(s);
> > +    g_hash_table_foreach_remove(s->vtd_pasid_as, vtd_flush_pasid, &pc_info);
> > +    vtd_iommu_unlock(s);
> > +
> > +    /*
> > +     * TODO: Domain selective PASID cache invalidation
> > +     * may be issued wrongly by programmer, to be safe,
> 
> IMHO it's not wrong even if the guest sends that, because logically
> the guest can send invalidation as it wishes, and we should have
> similar issue before on the 2nd level page table invalidations... and
> that's why we need to keep the iova mapping inside qemu I suppose...

yes, we are aligned on this point. I can update the description above.

> 
> > +     * after invalidating the pasid caches, emulator
> > +     * needs to replay the pasid bindings by walking guest
> > +     * pasid dir and pasid table.
> 
> This is true...

handshake here.

> 
> > +     */
> >      return 0;
> >  }
> >
> > +/**
> > + * This function finds or adds a VTDPASIDAddressSpace for a device
> > + * when it is bound to a pasid. Caller of this function should hold
> > + * iommu_lock.
> > + */
> > +static VTDPASIDAddressSpace *vtd_add_find_pasid_as(IntelIOMMUState *s,
> > +                                                   VTDBus *vtd_bus,
> > +                                                   int devfn,
> > +                                                   uint32_t pasid,
> > +                                                   bool allocate)
> > +{
> > +    struct pasid_key *key;
> > +    struct pasid_key *new_key;
> > +    VTDPASIDAddressSpace *vtd_pasid_as;
> > +    uint16_t sid;
> > +
> > +    sid = vtd_make_source_id(pci_bus_num(vtd_bus->bus), devfn);
> > +    key = vtd_get_pasid_key(pasid, sid);
> > +    vtd_pasid_as = g_hash_table_lookup(s->vtd_pasid_as, key);
> > +
> > +    if (!vtd_pasid_as && allocate) {
> > +        new_key = vtd_get_pasid_key(pasid, sid);
> 
> Is this the same as key no matter what?

It is the key content matters. I'd need to refine the key alloc/free in
next version.

> 
> > +        /*
> > +         * Initiate the vtd_pasid_as structure.
> > +         *
> > +         * This structure here is used to track the guest pasid
> > +         * binding and also serves as pasid-cache mangement entry.
> > +         *
> > +         * TODO: in future, if wants to support the SVA-aware DMA
> > +         *       emulation, the vtd_pasid_as should be fully initialized.
> > +         *       e.g. the address_space and memory region fields.
> > +         */
> > +        vtd_pasid_as = g_malloc0(sizeof(VTDPASIDAddressSpace));
> > +        vtd_pasid_as->iommu_state = s;
> > +        vtd_pasid_as->vtd_bus = vtd_bus;
> > +        vtd_pasid_as->devfn = devfn;
> > +        vtd_pasid_as->context_cache_entry.context_cache_gen = 0;
> > +        vtd_pasid_as->pasid = pasid;
> > +        vtd_pasid_as->pasid_cache_entry.pasid_cache_gen = 0;
> > +        g_hash_table_insert(s->vtd_pasid_as, new_key, vtd_pasid_as);
> > +    }
> > +    return vtd_pasid_as;
> > +}
> > +
> > + /**
> > +  * This function updates the pasid entry cached in &vtd_pasid_as.
> > +  * Caller of this function should hold iommu_lock.
> > +  */
> > +static inline void vtd_fill_in_pe_cache(
> > +              VTDPASIDAddressSpace *vtd_pasid_as, VTDPASIDEntry *pe)
> > +{
> > +    IntelIOMMUState *s = vtd_pasid_as->iommu_state;
> > +    VTDPASIDCacheEntry *pc_entry = &vtd_pasid_as->pasid_cache_entry;
> > +
> > +    pc_entry->pasid_entry = *pe;
> > +    pc_entry->pasid_cache_gen = s->pasid_cache_gen;
> > +}
> > +
> >  static int vtd_pasid_cache_psi(IntelIOMMUState *s,
> >                                 uint16_t domain_id, uint32_t pasid)
> >  {
> > +    VTDPASIDCacheInfo pc_info;
> > +    VTDPASIDEntry pe;
> > +    VTDBus *vtd_bus;
> > +    int bus_n, devfn;
> > +    VTDPASIDAddressSpace *vtd_pasid_as;
> > +    VTDIOMMUContext *vtd_ic;
> > +
> > +    pc_info.flags = VTD_PASID_CACHE_DOMSI;
> > +    pc_info.domain_id = domain_id;
> > +    pc_info.flags |= VTD_PASID_CACHE_PASIDSI;
> > +    pc_info.pasid = pasid;
> > +
> > +    /*
> > +     * Regards to a pasid selective pasid cache invalidation (PSI), it
> > +     * could be either cases of below:
> > +     * a) a present pasid entry moved to non-present
> > +     * b) a present pasid entry to be a present entry
> > +     * c) a non-present pasid entry moved to present
> > +     *
> > +     * Here the handling of a PSI is:
> > +     * 1) loop all the exisitng vtd_pasid_as instances to update them
> > +     *    according to the latest guest pasid entry in pasid table.
> > +     *    this will make sure affected existing vtd_pasid_as instances
> > +     *    cached the latest pasid entries. Also, during the loop, the
> > +     *    host should be notified if needed. e.g. pasid unbind or pasid
> > +     *    update. Should be able to cover case a) and case b).
> > +     *
> > +     * 2) loop all devices to cover case c)
> > +     *    However, it is not good to always loop all devices. In this
> > +     *    implementation. We do it in this ways:
> > +     *    - For devices which have VTDIOMMUContext instances, we loop
> > +     *      them and check if guest pasid entry exists. If yes, it is
> > +     *      case c), we update the pasid cache and also notify host.
> > +     *    - For devices which have no VTDIOMMUContext instances, it is
> > +     *      not necessary to create pasid cache at this phase since it
> > +     *      could be created when vIOMMU do DMA address translation.
> > +     *      This is not implemented yet since no PASID-capable emulated
> > +     *      devices today. If we have it in future, the pasid cache shall
> > +     *      be created there.
> > +     */
> > +
> > +    vtd_iommu_lock(s);
> > +    g_hash_table_foreach_remove(s->vtd_pasid_as, vtd_flush_pasid, &pc_info);
> > +    vtd_iommu_unlock(s);
> 
> [2]
> 
> > +
> > +    vtd_iommu_lock(s);
> 
> Do you want to explicitly release the lock for other thread?
> Otherwise I don't see a point to unlock/lock in sequence..

I felt like to have shorter protected snippets. But, I don’t have strong
reason either after reconsidering it. I'll remove it anyhow.

> > +    QLIST_FOREACH(vtd_ic, &s->vtd_dev_ic_list, next) {
> > +        vtd_bus = vtd_ic->vtd_bus;
> > +        devfn = vtd_ic->devfn;
> > +        bus_n = pci_bus_num(vtd_bus->bus);
> > +
> > +        /* Step 1: fetch vtd_pasid_as and check if it is valid */
> > +        vtd_pasid_as = vtd_add_find_pasid_as(s, vtd_bus,
> > +                                        devfn, pasid, true);
> > +        if (vtd_pasid_as &&
> > +            (s->pasid_cache_gen ==
> > +             vtd_pasid_as->pasid_cache_entry.pasid_cache_gen)) {
> > +            /*
> > +             * pasid_cache_gen equals to s->pasid_cache_gen means
> > +             * vtd_pasid_as is valid after the above s->vtd_pasid_as
> > +             * updates. Thus no need for the below steps.
> > +             */
> > +            continue;
> > +        }
> > +
> > +        /*
> > +         * Step 2: vtd_pasid_as is not valid, it's potentailly a
> > +         * new pasid bind. Fetch guest pasid entry.
> > +         */
> > +        if (vtd_dev_get_pe_from_pasid(s, bus_n, devfn, pasid, &pe)) {
> > +            continue;
> > +        }
> > +
> > +        /*
> > +         * Step 3: pasid entry exists, update pasid cache
> > +         *
> > +         * Here need to check domain ID since guest pasid entry
> > +         * exists. What needs to do are:
> > +         *   - update the pc_entry in the vtd_pasid_as
> > +         *   - set proper pc_entry.pasid_cache_gen
> > +         *   - passdown the latest guest pasid entry config to host
> > +         *     (will be added in later patch)
> > +         */
> > +        if (domain_id == vtd_pe_get_domain_id(&pe)) {
> > +            vtd_fill_in_pe_cache(vtd_pasid_as, &pe);
> > +        }
> > +    }
> 
> Could you explain why do we need this whole chunk if with [2] above?
> I feel like that'll do all the things we need already (send
> BIND/UNBIND, update pasid entry cache).

You may refer to the comments added for this function in the patch
itself. Also, I'd like to talk more to assist your review. The basic
idea is that the above chunk [2] only handles the already cached
pasid-entries. right? It covers the modifications from present pasid
entry to be either non-present or present modifications. While for a
non-present pasid entry to present modification, chunk [2] has no idea.
To cover such possibilities, needs to loop all devices and check the
corresponding pasid entries. This is what I proposed in RFC v1. But I
don’t like it. To be more efficient, I think we can just loop all
passthru devices since only passthru devices care about the
non-present to present changes. For emulated devices, its pasid cache
can be created in do_translate() for emulated PASID tagged DMAs.

> 
> > +    vtd_iommu_unlock(s);
> >      return 0;
> >  }
> >
> > +/**
> > + * Caller of this function should hold iommu_lock
> > + */
> > +static void vtd_pasid_cache_reset(IntelIOMMUState *s)
> > +{
> > +    VTDPASIDCacheInfo pc_info;
> > +
> > +    trace_vtd_pasid_cache_reset();
> > +
> > +    pc_info.flags = 0;
> 
> Maybe also introduce a flag for GLOBAL flush to be clear?

Will do it. Thanks.

> > +
> > +    /*
> > +     * Reset pasid cache is a big hammer, so use g_hash_table_foreach_remove
> > +     * which will free the vtd_pasid_as instances.
> > +     */
> > +    g_hash_table_foreach_remove(s->vtd_pasid_as, vtd_flush_pasid, &pc_info);
> > +    s->pasid_cache_gen = 1;
> > +}
> > +
> >  static int vtd_pasid_cache_gsi(IntelIOMMUState *s)
> >  {
> > +    trace_vtd_pasid_cache_gsi();
> > +
> > +    vtd_iommu_lock(s);
> > +    vtd_pasid_cache_reset(s);
> > +    vtd_iommu_unlock(s);
> > +
> > +    /*
> > +     * TODO: Global PASID cache invalidation may be
> > +     * issued wrongly by programmer, to be safe, after
> > +     * invalidating the pasid caches, emulator needs
> > +     * to replay the pasid bindings by walking guest
> > +     * pasid dir and pasid table.
> > +     */
> >      return 0;
> >  }
> >
> > @@ -3660,7 +4011,9 @@ VTDIOMMUContext
> *vtd_find_add_ic(IntelIOMMUState *s,
> >          vtd_dev_ic->devfn = (uint8_t)devfn;
> >          vtd_dev_ic->iommu_state = s;
> >          iommu_context_init(&vtd_dev_ic->iommu_context);
> > +        QLIST_INSERT_HEAD(&s->vtd_dev_ic_list, vtd_dev_ic, next);
> >      }
> > +
> >      return vtd_dev_ic;
> >  }
> >
> > @@ -4074,6 +4427,7 @@ static void vtd_realize(DeviceState *dev, Error **errp)
> >      }
> >
> >      QLIST_INIT(&s->vtd_as_with_notifiers);
> > +    QLIST_INIT(&s->vtd_dev_ic_list);
> >      qemu_mutex_init(&s->iommu_lock);
> >      memset(s->vtd_as_by_bus_num, 0, sizeof(s->vtd_as_by_bus_num));
> >      memory_region_init_io(&s->csrmem, OBJECT(s), &vtd_mem_ops, s,
> > @@ -4099,6 +4453,8 @@ static void vtd_realize(DeviceState *dev, Error **errp)
> >                                       g_free, g_free);
> >      s->vtd_as_by_busptr = g_hash_table_new_full(vtd_uint64_hash,
> vtd_uint64_equal,
> >                                                g_free, g_free);
> > +    s->vtd_pasid_as = g_hash_table_new_full(vtd_pasid_as_key_hash,
> > +                                   vtd_pasid_as_key_equal, g_free, g_free);
> >      vtd_init(s);
> >      sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, Q35_HOST_BRIDGE_IOMMU_ADDR);
> >      pci_setup_iommu(bus, &vtd_iommu_ops, dev);
> > diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> > index 879211e..12873e1 100644
> > --- a/hw/i386/intel_iommu_internal.h
> > +++ b/hw/i386/intel_iommu_internal.h
> > @@ -311,6 +311,7 @@ typedef enum VTDFaultReason {
> >      VTD_FR_IR_SID_ERR = 0x26,   /* Invalid Source-ID */
> >
> >      VTD_FR_PASID_TABLE_INV = 0x58,  /*Invalid PASID table entry */
> > +    VTD_FR_PASID_ENTRY_P = 0x59, /* The Present(P) field of pasidt-entry is 0 */
> >
> >      /* This is not a normal fault reason. We use this to indicate some faults
> >       * that are not referenced by the VT-d specification.
> > @@ -482,6 +483,15 @@ struct VTDRootEntry {
> >  };
> >  typedef struct VTDRootEntry VTDRootEntry;
> >
> > +struct VTDPASIDCacheInfo {
> > +#define VTD_PASID_CACHE_DOMSI   (1ULL << 0);
> > +#define VTD_PASID_CACHE_PASIDSI (1ULL << 1);
> > +    uint32_t flags;
> > +    uint16_t domain_id;
> > +    uint32_t pasid;
> > +};
> > +typedef struct VTDPASIDCacheInfo VTDPASIDCacheInfo;
> > +
> >  /* Masks for struct VTDRootEntry */
> >  #define VTD_ROOT_ENTRY_P            1ULL
> >  #define VTD_ROOT_ENTRY_CTP          (~0xfffULL)
> > diff --git a/hw/i386/trace-events b/hw/i386/trace-events
> > index 6da8bd2..7912ae1 100644
> > --- a/hw/i386/trace-events
> > +++ b/hw/i386/trace-events
> > @@ -22,6 +22,7 @@ vtd_inv_qi_head(uint16_t head) "read head %d"
> >  vtd_inv_qi_tail(uint16_t head) "write tail %d"
> >  vtd_inv_qi_fetch(void) ""
> >  vtd_context_cache_reset(void) ""
> > +vtd_pasid_cache_reset(void) ""
> >  vtd_pasid_cache_gsi(void) ""
> >  vtd_pasid_cache_dsi(uint16_t domain) "Domian slective PC invalidation domain
> 0x%"PRIx16
> >  vtd_pasid_cache_psi(uint16_t domain, uint32_t pasid) "PASID slective PC
> invalidation domain 0x%"PRIx16" pasid 0x%"PRIx32
> > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> > index 0d49480..d693f71 100644
> > --- a/include/hw/i386/intel_iommu.h
> > +++ b/include/hw/i386/intel_iommu.h
> > @@ -69,6 +69,8 @@ typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
> >  typedef struct VTDPASIDDirEntry VTDPASIDDirEntry;
> >  typedef struct VTDPASIDEntry VTDPASIDEntry;
> >  typedef struct VTDIOMMUContext VTDIOMMUContext;
> > +typedef struct VTDPASIDCacheEntry VTDPASIDCacheEntry;
> > +typedef struct VTDPASIDAddressSpace VTDPASIDAddressSpace;
> >
> >  /* Context-Entry */
> >  struct VTDContextEntry {
> > @@ -101,6 +103,31 @@ struct VTDPASIDEntry {
> >      uint64_t val[8];
> >  };
> >
> > +struct pasid_key {
> > +    uint32_t pasid;
> > +    uint16_t sid;
> > +};
> > +
> > +struct VTDPASIDCacheEntry {
> > +    /*
> > +     * The cache entry is obsolete if
> > +     * pasid_cache_gen!=IntelIOMMUState.pasid_cache_gen
> > +     */
> > +    uint32_t pasid_cache_gen;
> > +    struct VTDPASIDEntry pasid_entry;
> > +};
> > +
> > +struct VTDPASIDAddressSpace {
> > +    VTDBus *vtd_bus;
> > +    uint8_t devfn;
> > +    AddressSpace as;
> > +    uint32_t pasid;
> > +    IntelIOMMUState *iommu_state;
> > +    VTDContextCacheEntry context_cache_entry;
> > +    QLIST_ENTRY(VTDPASIDAddressSpace) next;
> > +    VTDPASIDCacheEntry pasid_cache_entry;
> > +};
> > +
> >  struct VTDAddressSpace {
> >      PCIBus *bus;
> >      uint8_t devfn;
> > @@ -121,6 +148,7 @@ struct VTDIOMMUContext {
> >      VTDBus *vtd_bus;
> >      uint8_t devfn;
> >      IOMMUContext iommu_context;
> > +    QLIST_ENTRY(VTDIOMMUContext) next;
> >      IntelIOMMUState *iommu_state;
> >  };
> >
> > @@ -269,9 +297,14 @@ struct IntelIOMMUState {
> >
> >      GHashTable *vtd_as_by_busptr;   /* VTDBus objects indexed by PCIBus*
> reference */
> >      VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed
> by bus number */
> > +    GHashTable *vtd_pasid_as;   /* VTDPASIDAddressSpace objects */
> > +    uint32_t pasid_cache_gen;   /* Should be in [1,MAX] */
> >      /* list of registered notifiers */
> >      QLIST_HEAD(, VTDAddressSpace) vtd_as_with_notifiers;
> >
> > +    /* list of registered notifiers */
> > +    QLIST_HEAD(, VTDIOMMUContext) vtd_dev_ic_list;
> > +
> >      /* interrupt remapping */
> >      bool intr_enabled;              /* Whether guest enabled IR */
> >      dma_addr_t intr_root;           /* Interrupt remapping table pointer */
> > @@ -288,7 +321,8 @@ struct IntelIOMMUState {
> >
> >      /*
> >       * Protects IOMMU states in general.  Currently it protects the
> > -     * per-IOMMU IOTLB cache, and context entry cache in VTDAddressSpace.
> > +     * per-IOMMU IOTLB cache, and context entry cache in VTDAddressSpace,
> > +     * and pasid cache in VTDPASIDAddressSpace.
> >       */
> >      QemuMutex iommu_lock;
> >  };
> > --
> > 2.7.4
> >
> 
> --
> Peter Xu

Thanks,
Yi Liu

^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC v2 15/22] intel_iommu: bind/unbind guest page table to host
  2019-11-04 20:25   ` Peter Xu
@ 2019-11-06  8:10     ` Liu, Yi L
  2019-11-06 14:27       ` Peter Xu
  0 siblings, 1 reply; 79+ messages in thread
From: Liu, Yi L @ 2019-11-06  8:10 UTC (permalink / raw)
  To: Peter Xu
  Cc: Tian, Kevin, jacob.jun.pan, Yi Sun, kvm, mst, Tian, Jun J,
	qemu-devel, eric.auger, alex.williamson, pbonzini, Sun, Yi Y,
	david

> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Tuesday, November 5, 2019 4:26 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v2 15/22] intel_iommu: bind/unbind guest page table to host
> 
> On Thu, Oct 24, 2019 at 08:34:36AM -0400, Liu Yi L wrote:
> > This patch captures the guest PASID table entry modifications and
> > propagates the changes to host to setup nested translation. The
> > guest page table is configured as 1st level page table (GVA->GPA)
> > whose translation result would further go through host VT-d 2nd
> > level page table(GPA->HPA) under nested translation mode. This is
> > a key part of vSVA support.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > ---
> >  hw/i386/intel_iommu.c          | 81
> ++++++++++++++++++++++++++++++++++++++++++
> >  hw/i386/intel_iommu_internal.h | 20 +++++++++++
> >  2 files changed, 101 insertions(+)
> >
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index d8827c9..793b0de 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -41,6 +41,7 @@
> >  #include "migration/vmstate.h"
> >  #include "trace.h"
> >  #include "qemu/jhash.h"
> > +#include <linux/iommu.h>
> >
> >  /* context entry operations */
> >  #define VTD_CE_GET_RID2PASID(ce) \
> > @@ -695,6 +696,16 @@ static inline uint16_t
> vtd_pe_get_domain_id(VTDPASIDEntry *pe)
> >      return VTD_SM_PASID_ENTRY_DID((pe)->val[1]);
> >  }
> >
> > +static inline uint32_t vtd_pe_get_fl_aw(VTDPASIDEntry *pe)
> > +{
> > +    return 48 + ((pe->val[2] >> 2) & VTD_SM_PASID_ENTRY_FLPM) * 9;
> > +}
> > +
> > +static inline dma_addr_t vtd_pe_get_flpt_base(VTDPASIDEntry *pe)
> > +{
> > +    return pe->val[2] & VTD_SM_PASID_ENTRY_FLPTPTR;
> > +}
> > +
> >  static inline bool vtd_pdire_present(VTDPASIDDirEntry *pdire)
> >  {
> >      return pdire->val & 1;
> > @@ -1850,6 +1861,67 @@ static void
> vtd_context_global_invalidate(IntelIOMMUState *s)
> >      vtd_iommu_replay_all(s);
> >  }
> >
> > +static void vtd_bind_guest_pasid(IntelIOMMUState *s, VTDBus *vtd_bus,
> > +            int devfn, int pasid, VTDPASIDEntry *pe, VTDPASIDOp op)
> > +{
> > +#ifdef __linux__
> > +    VTDIOMMUContext *vtd_ic;
> > +    IOMMUCTXEventData event_data;
> > +    IOMMUCTXPASIDBindData bind;
> > +    struct iommu_gpasid_bind_data *g_bind_data;
> > +
> > +    vtd_ic = vtd_bus->dev_ic[devfn];
> > +    if (!vtd_ic) {
> > +        return;
> > +    }
> > +
> > +    g_bind_data = g_malloc0(sizeof(*g_bind_data));
> > +    bind.flag = 0;
> > +    g_bind_data->flags = 0;
> > +    g_bind_data->vtd.flags = 0;
> > +    switch (op) {
> > +    case VTD_PASID_BIND:
> > +    case VTD_PASID_UPDATE:
> > +        g_bind_data->version = IOMMU_GPASID_BIND_VERSION_1;
> > +        g_bind_data->format = IOMMU_PASID_FORMAT_INTEL_VTD;
> > +        g_bind_data->gpgd = vtd_pe_get_flpt_base(pe);
> > +        g_bind_data->addr_width = vtd_pe_get_fl_aw(pe);
> > +        g_bind_data->hpasid = pasid;
> > +        g_bind_data->gpasid = pasid;
> > +        g_bind_data->flags |= IOMMU_SVA_GPASID_VAL;
> > +        g_bind_data->vtd.flags =
> > +                             (VTD_SM_PASID_ENTRY_SRE_BIT(pe->val[2]) ? 1 : 0)
> > +                           | (VTD_SM_PASID_ENTRY_EAFE_BIT(pe->val[2]) ? 1 : 0)
> > +                           | (VTD_SM_PASID_ENTRY_PCD_BIT(pe->val[1]) ? 1 : 0)
> > +                           | (VTD_SM_PASID_ENTRY_PWT_BIT(pe->val[1]) ? 1 : 0)
> > +                           | (VTD_SM_PASID_ENTRY_EMTE_BIT(pe->val[1]) ? 1 : 0)
> > +                           | (VTD_SM_PASID_ENTRY_CD_BIT(pe->val[1]) ? 1 : 0);
> > +        g_bind_data->vtd.pat = VTD_SM_PASID_ENTRY_PAT(pe->val[1]);
> > +        g_bind_data->vtd.emt = VTD_SM_PASID_ENTRY_EMT(pe->val[1]);
> > +        bind.flag |= IOMMU_CTX_BIND_PASID;
> > +        break;
> > +
> > +    case VTD_PASID_UNBIND:
> > +        g_bind_data->gpgd = 0;
> > +        g_bind_data->addr_width = 0;
> > +        g_bind_data->hpasid = pasid;
> > +        bind.flag |= IOMMU_CTX_UNBIND_PASID;
> > +        break;
> > +
> > +    default:
> > +        printf("Unknown VTDPASIDOp!!\n");
> 
> Please don't use printf()..  Here assert() suits.

Will correct it. Thanks.

> 
> > +        break;
> > +    }
> > +    if (bind.flag) {
> 
> Will this be untrue?  If not, assert() works too.

yes, it is possible. If an unknown VTDPASIDOp, then no switch case
will initiate bind.flag.

> > +        event_data.event = IOMMU_CTX_EVENT_PASID_BIND;
> > +        bind.data = g_bind_data;
> > +        event_data.data = &bind;
> > +        iommu_ctx_event_notify(&vtd_ic->iommu_context, &event_data);
> > +    }
> > +    g_free(g_bind_data);
> > +#endif
> > +}
> > +
> >  /* Do a context-cache device-selective invalidation.
> >   * @func_mask: FM field after shifting
> >   */
> > @@ -2528,12 +2600,17 @@ static gboolean vtd_flush_pasid(gpointer key,
> gpointer value,
> >                  pc_entry->pasid_cache_gen = s->pasid_cache_gen;
> >                  if (!vtd_pasid_entry_compare(&pe, &pc_entry->pasid_entry)) {
> >                      pc_entry->pasid_entry = pe;
> > +                    vtd_bind_guest_pasid(s, vtd_bus, devfn,
> > +                                     pasid, &pe, VTD_PASID_UPDATE);
> >                      /*
> >                       * TODO: when pasid-base-iotlb(piotlb) infrastructure is
> >                       * ready, should invalidate QEMU piotlb togehter with this
> >                       * change.
> >                       */
> >                  }
> > +            } else {
> > +                vtd_bind_guest_pasid(s, vtd_bus, devfn,
> > +                                  pasid, NULL, VTD_PASID_UNBIND);
> 
> Please see the reply in the other thread on vtd_flush_pasid().  I've
> filled in where I feel like this UNBIND should exist, I feel like your
> current code could miss some places where you should unbind but didn't.

I've replied in that thread regards to your comments. May you
reconsider it here. Hope, it suits what you thought. If still
something missed, pls feel free to point out.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC v2 12/22] intel_iommu: add present bit check for pasid table entries
  2019-11-02 16:20   ` Peter Xu
@ 2019-11-06  8:14     ` Liu, Yi L
  0 siblings, 0 replies; 79+ messages in thread
From: Liu, Yi L @ 2019-11-06  8:14 UTC (permalink / raw)
  To: Peter Xu
  Cc: Tian, Kevin, jacob.jun.pan, Yi Sun, kvm, mst, Tian, Jun J,
	qemu-devel, eric.auger, alex.williamson, pbonzini, Sun, Yi Y,
	david

> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Sunday, November 3, 2019 12:21 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v2 12/22] intel_iommu: add present bit check for pasid table
> entries
> 
> On Thu, Oct 24, 2019 at 08:34:33AM -0400, Liu Yi L wrote:
> > The present bit check for pasid entry (pe) and pasid directory entry
> > (pdire) were missed in previous commits as fpd bit check doesn't
> > require present bit as "Set". This patch adds the present bit check
> > for callers which wants to get a valid pe/pdire.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks for the review.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC v2 07/22] hw/pci: introduce pci_device_iommu_context()
  2019-11-01 18:09   ` Peter Xu
@ 2019-11-06  8:14     ` Liu, Yi L
  0 siblings, 0 replies; 79+ messages in thread
From: Liu, Yi L @ 2019-11-06  8:14 UTC (permalink / raw)
  To: Peter Xu
  Cc: tianyu.lan, Tian, Kevin, jacob.jun.pan, Yi Sun, kvm, mst, Tian,
	Jun J, qemu-devel, eric.auger, alex.williamson, pbonzini, Sun,
	Yi Y, david

> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Saturday, November 2, 2019 2:10 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v2 07/22] hw/pci: introduce pci_device_iommu_context()
> 
> On Thu, Oct 24, 2019 at 08:34:28AM -0400, Liu Yi L wrote:
> > This patch adds pci_device_iommu_context() to get an iommu_context for
> > a given device. A new callback is added in PCIIOMMUOps. Users who
> > wants to listen to events issued by vIOMMU could use this new
> > interface to get an iommu_context and register their own notifiers,
> > then wait for notifications from vIOMMU. e.g. VFIO is the first user
> > of it to listen to the PASID_ALLOC/PASID_BIND/CACHE_INV events and
> > propagate the events to host.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > Cc: David Gibson <david@gibson.dropbear.id.au>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks for the review.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC v2 06/22] hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps
  2019-11-01 18:09   ` Peter Xu
@ 2019-11-06  8:15     ` Liu, Yi L
  0 siblings, 0 replies; 79+ messages in thread
From: Liu, Yi L @ 2019-11-06  8:15 UTC (permalink / raw)
  To: Peter Xu
  Cc: Tian, Kevin, jacob.jun.pan, Yi Sun, kvm, mst, Tian, Jun J,
	qemu-devel, eric.auger, alex.williamson, pbonzini, Sun, Yi Y,
	david

> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Saturday, November 2, 2019 2:09 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v2 06/22] hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps
> 
> On Thu, Oct 24, 2019 at 08:34:27AM -0400, Liu Yi L wrote:
> > This patch modifies pci_setup_iommu() to set PCIIOMMUOps instead of
> > only setting PCIIOMMUFunc. PCIIOMMUFunc is previously used to get an
> > address space for a device in vendor specific way. The PCIIOMMUOps
> > still offers this functionality. Use PCIIOMMUOps leaves space to add
> > more iommu related vendor specific operations.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > Cc: David Gibson <david@gibson.dropbear.id.au>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks for the review.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC v2 06/22] hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps
  2019-10-27 17:43   ` David Gibson
@ 2019-11-06  8:18     ` Liu, Yi L
  0 siblings, 0 replies; 79+ messages in thread
From: Liu, Yi L @ 2019-11-06  8:18 UTC (permalink / raw)
  To: David Gibson
  Cc: Tian, Kevin, jacob.jun.pan, Yi Sun, kvm, mst, Tian, Jun J,
	qemu-devel, peterx, eric.auger, alex.williamson, pbonzini, Sun,
	Yi Y

> From: David Gibson
> Sent: Monday, October 28, 2019 1:44 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v2 06/22] hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps
> 
> On Thu, Oct 24, 2019 at 08:34:27AM -0400, Liu Yi L wrote:
> > This patch modifies pci_setup_iommu() to set PCIIOMMUOps instead of
> > only setting PCIIOMMUFunc. PCIIOMMUFunc is previously used to get an
> > address space for a device in vendor specific way. The PCIIOMMUOps
> > still offers this functionality. Use PCIIOMMUOps leaves space to add
> > more iommu related vendor specific operations.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > Cc: David Gibson <david@gibson.dropbear.id.au>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Thanks for the review.

Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC v2 07/22] hw/pci: introduce pci_device_iommu_context()
  2019-10-29 11:50   ` David Gibson
@ 2019-11-06  8:20     ` Liu, Yi L
  0 siblings, 0 replies; 79+ messages in thread
From: Liu, Yi L @ 2019-11-06  8:20 UTC (permalink / raw)
  To: David Gibson
  Cc: Tian, Kevin, jacob.jun.pan, Yi Sun, kvm, mst, Tian, Jun J,
	qemu-devel, peterx, eric.auger, alex.williamson, pbonzini, Sun,
	Yi Y

> From: David Gibson
> Sent: Tuesday, October 29, 2019 7:51 PM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v2 07/22] hw/pci: introduce pci_device_iommu_context()
> 
> On Thu, Oct 24, 2019 at 08:34:28AM -0400, Liu Yi L wrote:
> > This patch adds pci_device_iommu_context() to get an iommu_context for
> > a given device. A new callback is added in PCIIOMMUOps. Users who
> > wants to listen to events issued by vIOMMU could use this new
> > interface to get an iommu_context and register their own notifiers,
> > then wait for notifications from vIOMMU. e.g. VFIO is the first user
> > of it to listen to the PASID_ALLOC/PASID_BIND/CACHE_INV events and
> > propagate the events to host.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > Cc: David Gibson <david@gibson.dropbear.id.au>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Thanks for the review.

Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC v2 03/22] intel_iommu: modify x-scalable-mode to be string option
  2019-11-05 12:50       ` Peter Xu
@ 2019-11-06  9:50         ` Liu, Yi L
  0 siblings, 0 replies; 79+ messages in thread
From: Liu, Yi L @ 2019-11-06  9:50 UTC (permalink / raw)
  To: Peter Xu
  Cc: Tian, Kevin, jacob.jun.pan, Yi Sun, kvm, mst, Tian, Jun J,
	qemu-devel, eric.auger, alex.williamson, pbonzini, Sun, Yi Y,
	david

> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Tuesday, November 5, 2019 8:50 PM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v2 03/22] intel_iommu: modify x-scalable-mode to be string option
> 
> On Tue, Nov 05, 2019 at 09:14:08AM +0000, Liu, Yi L wrote:
> > > Something like:
> > >
> > >   - s->scalable_mode_str to keep the string
> > >   - s->scalable_mode still as a bool to cache the global enablement
> > >   - s->scalable_modern as a bool to keep the mode
> > >
> > > ?
> >
> > So x-scalable-mode is still a string option, just to have a new field to store it?
> 
> Yep.  I'd say maybe we should start to allow to define some union-ish properties, but
> for now I think string is ok.

ok, let me make it in next version.

Thanks,
Yi Liu

^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC v2 08/22] intel_iommu: provide get_iommu_context() callback
  2019-11-01 14:55   ` Peter Xu
@ 2019-11-06 11:07     ` Liu, Yi L
  0 siblings, 0 replies; 79+ messages in thread
From: Liu, Yi L @ 2019-11-06 11:07 UTC (permalink / raw)
  To: Peter Xu
  Cc: tianyu.lan, Tian, Kevin, jacob.jun.pan, Yi Sun, kvm, mst, Tian,
	Jun J, qemu-devel, eric.auger, alex.williamson, pbonzini, Sun,
	Yi Y, david

> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Friday, November 1, 2019 10:55 PM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v2 08/22] intel_iommu: provide get_iommu_context() callback
> 
> On Thu, Oct 24, 2019 at 08:34:29AM -0400, Liu Yi L wrote:
> > This patch adds get_iommu_context() callback to return an iommu_context
> > Intel VT-d platform.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > ---
> >  hw/i386/intel_iommu.c         | 57 ++++++++++++++++++++++++++++++++++++++---
> --
> >  include/hw/i386/intel_iommu.h | 14 ++++++++++-
> >  2 files changed, 64 insertions(+), 7 deletions(-)
> >
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 67a7836..e9f8692 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -3288,22 +3288,33 @@ static const MemoryRegionOps vtd_mem_ir_ops = {
> >      },
> >  };
> >
> > -VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
> > +static VTDBus *vtd_find_add_bus(IntelIOMMUState *s, PCIBus *bus)
> >  {
> >      uintptr_t key = (uintptr_t)bus;
> > -    VTDBus *vtd_bus = g_hash_table_lookup(s->vtd_as_by_busptr, &key);
> > -    VTDAddressSpace *vtd_dev_as;
> > -    char name[128];
> > +    VTDBus *vtd_bus;
> >
> > +    vtd_iommu_lock(s);
> 
> Why explicitly take the IOMMU lock here?  I mean, it's fine to take
> it, but if so why not take it to cover the whole vtd_find_add_as()?

Just wanted to make the protected snippet smaller. But I'm fine to move it
to vtd_find_add_as() if there is no much value for putting it here.

> For now it'll be fine in either way because I believe iommu_lock is
> not really functioning when we're still with BQL here, however if you
> add that explicitly then I don't see why it's not covering that.

Got it. It functions if you missed to put a mirrored unlock after a lock. (joke)

> 
> > +    vtd_bus = g_hash_table_lookup(s->vtd_as_by_busptr, &key);
> >      if (!vtd_bus) {
> >          uintptr_t *new_key = g_malloc(sizeof(*new_key));
> >          *new_key = (uintptr_t)bus;
> >          /* No corresponding free() */
> > -        vtd_bus = g_malloc0(sizeof(VTDBus) + sizeof(VTDAddressSpace *) * \
> > -                            PCI_DEVFN_MAX);
> > +        vtd_bus = g_malloc0(sizeof(VTDBus) + PCI_DEVFN_MAX * \
> > +                    (sizeof(VTDAddressSpace *) + sizeof(VTDIOMMUContext *)));
> 
> Should this be as simple as g_malloc0(sizeof(VTDBus) since [1]?

yes, it's old writing. Will modify it.

> Otherwise the patch looks sane to me.
> 
> >          vtd_bus->bus = bus;
> >          g_hash_table_insert(s->vtd_as_by_busptr, new_key, vtd_bus);
> >      }
> > +    vtd_iommu_unlock(s);
> > +    return vtd_bus;
> > +}
> 
> [...]
> 
> >  struct VTDBus {
> >      PCIBus* bus;		/* A reference to the bus to provide translation for
> */
> > -    VTDAddressSpace *dev_as[0];	/* A table of VTDAddressSpace objects
> indexed by devfn */
> > +    /* A table of VTDAddressSpace objects indexed by devfn */
> > +    VTDAddressSpace *dev_as[PCI_DEVFN_MAX];
> > +    /* A table of VTDIOMMUContext objects indexed by devfn */
> > +    VTDIOMMUContext *dev_ic[PCI_DEVFN_MAX];
> 
> [1]

exactly.

> 
> >  };
> >
> >  struct VTDIOTLBEntry {
> > @@ -282,5 +293,6 @@ struct IntelIOMMUState {
> >   * create a new one if none exists
> >   */
> >  VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn);
> > +VTDIOMMUContext *vtd_find_add_ic(IntelIOMMUState *s, PCIBus *bus, int
> devfn);
> >
> >  #endif
> > --
> > 2.7.4
> >

Thanks,
Yi Liu

^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC v2 05/22] vfio/common: add iommu_ctx_notifier in container
  2019-11-01 14:58   ` Peter Xu
@ 2019-11-06 11:08     ` Liu, Yi L
  0 siblings, 0 replies; 79+ messages in thread
From: Liu, Yi L @ 2019-11-06 11:08 UTC (permalink / raw)
  To: Peter Xu
  Cc: Tian, Kevin, jacob.jun.pan, Yi Sun, kvm, mst, Tian, Jun J,
	qemu-devel, eric.auger, alex.williamson, pbonzini, Sun, Yi Y,
	david

> From: Peter Xu
> Sent: Friday, November 1, 2019 10:59 PM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v2 05/22] vfio/common: add iommu_ctx_notifier in container
> 
> On Thu, Oct 24, 2019 at 08:34:26AM -0400, Liu Yi L wrote:
> 
> [...]
> 
> > +typedef struct VFIOIOMMUContext {
> > +    VFIOContainer *container;
> > +    IOMMUContext *iommu_ctx;
> > +    IOMMUCTXNotifier n;
> > +    QLIST_ENTRY(VFIOIOMMUContext) iommu_ctx_next; } VFIOIOMMUContext;
> > +
> 
> No strong opinion on this - but for me it would be more meaningful to squash this
> patch into where this struct is firstly used.

got it. will make it in next version.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC v2 04/22] hw/iommu: introduce IOMMUContext
  2019-10-27 17:39   ` David Gibson
@ 2019-11-06 11:18     ` Liu, Yi L
  0 siblings, 0 replies; 79+ messages in thread
From: Liu, Yi L @ 2019-11-06 11:18 UTC (permalink / raw)
  To: David Gibson
  Cc: Tian, Kevin, jacob.jun.pan, Yi Sun, kvm, mst, Tian, Jun J,
	qemu-devel, peterx, eric.auger, alex.williamson, pbonzini, Sun,
	Yi Y

> From: David Gibson [mailto:david@gibson.dropbear.id.au]
> Sent: Monday, October 28, 2019 1:39 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v2 04/22] hw/iommu: introduce IOMMUContext
> 
> On Thu, Oct 24, 2019 at 08:34:25AM -0400, Liu Yi L wrote:
> > From: Peter Xu <peterx@redhat.com>
> >
> > This patch adds IOMMUContext as an abstract layer of IOMMU related
> > operations. The current usage of this abstract layer is setup dual-
> > stage IOMMU translation (vSVA) for vIOMMU.
> >
> > To setup dual-stage IOMMU translation, vIOMMU needs to propagate
> > guest changes to host via passthru channels (e.g. VFIO). To have
> > a better abstraction, it is better to avoid direct calling between
> > vIOMMU and VFIO. So we have this new structure to act as abstract
> > layer between VFIO and vIOMMU. So far, it is proposed to provide a
> > notifier mechanism, which registered by VFIO and fired by vIOMMU.
> >
> > For more background, may refer to the discussion below:
> >
> > https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg05022.html
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > Cc: David Gibson <david@gibson.dropbear.id.au>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > ---
> >  hw/Makefile.objs         |  1 +
> >  hw/iommu/Makefile.objs   |  1 +
> >  hw/iommu/iommu.c         | 66 ++++++++++++++++++++++++++++++++++++++++
> >  include/hw/iommu/iommu.h | 79
> ++++++++++++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 147 insertions(+)
> >  create mode 100644 hw/iommu/Makefile.objs
> >  create mode 100644 hw/iommu/iommu.c
> >  create mode 100644 include/hw/iommu/iommu.h
> >
> > diff --git a/hw/Makefile.objs b/hw/Makefile.objs
> > index ece6cc3..ac19f9c 100644
> > --- a/hw/Makefile.objs
> > +++ b/hw/Makefile.objs
> > @@ -39,6 +39,7 @@ devices-dirs-y += xen/
> >  devices-dirs-$(CONFIG_MEM_DEVICE) += mem/
> >  devices-dirs-y += semihosting/
> >  devices-dirs-y += smbios/
> > +devices-dirs-y += iommu/
> >  endif
> >
> >  common-obj-y += $(devices-dirs-y)
> > diff --git a/hw/iommu/Makefile.objs b/hw/iommu/Makefile.objs
> > new file mode 100644
> > index 0000000..0484b79
> > --- /dev/null
> > +++ b/hw/iommu/Makefile.objs
> > @@ -0,0 +1 @@
> > +obj-y += iommu.o
> > diff --git a/hw/iommu/iommu.c b/hw/iommu/iommu.c
> > new file mode 100644
> > index 0000000..2391b0d
> > --- /dev/null
> > +++ b/hw/iommu/iommu.c
> > @@ -0,0 +1,66 @@
> > +/*
> > + * QEMU abstract of IOMMU context
> > + *
> > + * Copyright (C) 2019 Red Hat Inc.
> > + *
> > + * Authors: Peter Xu <peterx@redhat.com>,
> > + *          Liu Yi L <yi.l.liu@intel.com>
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > +
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > +
> > + * You should have received a copy of the GNU General Public License along
> > + * with this program; if not, see <http://www.gnu.org/licenses/>.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "hw/iommu/iommu.h"
> > +
> > +void iommu_ctx_notifier_register(IOMMUContext *iommu_ctx,
> > +                                 IOMMUCTXNotifier *n,
> > +                                 IOMMUCTXNotifyFn fn,
> > +                                 IOMMUCTXEvent event)
> > +{
> > +    n->event = event;
> > +    n->iommu_ctx_event_notify = fn;
> > +    QLIST_INSERT_HEAD(&iommu_ctx->iommu_ctx_notifiers, n, node);
> 
> Having this both modify the IOMMUCTXNotifier structure and insert it
> in the list seems confusing to me - and gratuitously different from
> the interface for both IOMMUNotifier and Notifier.
> 
> Separating out a iommu_ctx_notifier_init() as a helper and having
> register take a fully initialized structure seems better to me.

Thanks, will do it in next version.

> > +    return;
> 
> Using an explicit return at the end of a function returning void is an
> odd style.

got it, will fix it in next version.

> 
> > +}
> > +
> > +void iommu_ctx_notifier_unregister(IOMMUContext *iommu_ctx,
> > +                                   IOMMUCTXNotifier *notifier)
> > +{
> > +    IOMMUCTXNotifier *cur, *next;
> > +
> > +    QLIST_FOREACH_SAFE(cur, &iommu_ctx->iommu_ctx_notifiers, node, next) {
> > +        if (cur == notifier) {
> > +            QLIST_REMOVE(cur, node);
> > +            break;
> > +        }
> > +    }
> > +}
> > +
> > +void iommu_ctx_event_notify(IOMMUContext *iommu_ctx,
> > +                            IOMMUCTXEventData *event_data)
> > +{
> > +    IOMMUCTXNotifier *cur;
> > +
> > +    QLIST_FOREACH(cur, &iommu_ctx->iommu_ctx_notifiers, node) {
> > +        if ((cur->event == event_data->event) &&
> > +                                 cur->iommu_ctx_event_notify) {
> 
> Do you actually need the test on iommu_ctx_event_notify?  I can't see
> any reason to register a notifier with a NULL function pointer.

sure, let me remove the check. I may have been too careful here. :-)

> > +            cur->iommu_ctx_event_notify(cur, event_data);
> > +        }
> > +    }
> > +}
> > +
> > +void iommu_context_init(IOMMUContext *iommu_ctx)
> > +{
> > +    QLIST_INIT(&iommu_ctx->iommu_ctx_notifiers);
> > +}
> > diff --git a/include/hw/iommu/iommu.h b/include/hw/iommu/iommu.h
> > new file mode 100644
> > index 0000000..c22c442
> > --- /dev/null
> > +++ b/include/hw/iommu/iommu.h
> > @@ -0,0 +1,79 @@
> > +/*
> > + * QEMU abstraction of IOMMU Context
> > + *
> > + * Copyright (C) 2019 Red Hat Inc.
> > + *
> > + * Authors: Peter Xu <peterx@redhat.com>,
> > + *          Liu, Yi L <yi.l.liu@intel.com>
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > +
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > +
> > + * You should have received a copy of the GNU General Public License along
> > + * with this program; if not, see <http://www.gnu.org/licenses/>.
> > + */
> > +
> > +#ifndef HW_PCI_PASID_H
> > +#define HW_PCI_PASID_H
> 
> These guards need to be updated for the new header name.

Oops, thanks for spotting it out.

> > +
> > +#include "qemu/queue.h"
> > +#ifndef CONFIG_USER_ONLY
> > +#include "exec/hwaddr.h"
> > +#endif
> > +
> > +typedef struct IOMMUContext IOMMUContext;
> > +
> > +enum IOMMUCTXEvent {
> > +    IOMMU_CTX_EVENT_NUM,
> > +};
> > +typedef enum IOMMUCTXEvent IOMMUCTXEvent;
> > +
> > +struct IOMMUCTXEventData {
> > +    IOMMUCTXEvent event;
> > +    uint64_t length;
> > +    void *data;
> > +};
> > +typedef struct IOMMUCTXEventData IOMMUCTXEventData;
> > +
> > +typedef struct IOMMUCTXNotifier IOMMUCTXNotifier;
> > +
> > +typedef void (*IOMMUCTXNotifyFn)(IOMMUCTXNotifier *notifier,
> > +                                 IOMMUCTXEventData *event_data);
> > +
> > +struct IOMMUCTXNotifier {
> > +    IOMMUCTXNotifyFn iommu_ctx_event_notify;
> > +    /*
> > +     * What events we are listening to. Let's allow multiple event
> > +     * registrations from beginning.
> > +     */
> > +    IOMMUCTXEvent event;
> > +    QLIST_ENTRY(IOMMUCTXNotifier) node;
> > +};
> > +
> > +/*
> > + * This is an abstraction of IOMMU context.
> > + */
> > +struct IOMMUContext {
> > +    uint32_t pasid;
> 
> This confuses me a bit.  I thought the idea was that IOMMUContext with
> SVM would represent all the PASIDs in use, but here we have a specific
> pasid stored in the structure.

It's added by mistake. Should not be included. No patch will use this field.
Will remove it. Thanks for the careful review.

Thanks,
Yi Liu



^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC v2 09/22] vfio/pci: add iommu_context notifier for pasid alloc/free
  2019-10-29 12:15   ` David Gibson
  2019-11-01 17:26     ` Peter Xu
@ 2019-11-06 12:14     ` Liu, Yi L
  2019-11-20  4:27       ` David Gibson
  1 sibling, 1 reply; 79+ messages in thread
From: Liu, Yi L @ 2019-11-06 12:14 UTC (permalink / raw)
  To: David Gibson
  Cc: Tian, Kevin, jacob.jun.pan, Yi Sun, kvm, mst, Tian, Jun J,
	qemu-devel, peterx, eric.auger, alex.williamson, pbonzini, Sun,
	Yi Y

> From: David Gibson [mailto:david@gibson.dropbear.id.au]
> Sent: Tuesday, October 29, 2019 8:16 PM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v2 09/22] vfio/pci: add iommu_context notifier for pasid alloc/free
> 
> On Thu, Oct 24, 2019 at 08:34:30AM -0400, Liu Yi L wrote:
> > This patch adds pasid alloc/free notifiers for vfio-pci. It is
> > supposed to be fired by vIOMMU. VFIO then sends PASID allocation
> > or free request to host.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > Cc: David Gibson <david@gibson.dropbear.id.au>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > ---
> >  hw/vfio/common.c         |  9 ++++++
> >  hw/vfio/pci.c            | 81
> ++++++++++++++++++++++++++++++++++++++++++++++++
> >  include/hw/iommu/iommu.h | 15 +++++++++
> >  3 files changed, 105 insertions(+)
> >
> > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > index d418527..e6ad21c 100644
> > --- a/hw/vfio/common.c
> > +++ b/hw/vfio/common.c
> > @@ -1436,6 +1436,7 @@ static void vfio_disconnect_container(VFIOGroup
> *group)
> >      if (QLIST_EMPTY(&container->group_list)) {
> >          VFIOAddressSpace *space = container->space;
> >          VFIOGuestIOMMU *giommu, *tmp;
> > +        VFIOIOMMUContext *giommu_ctx, *ctx;
> >
> >          QLIST_REMOVE(container, next);
> >
> > @@ -1446,6 +1447,14 @@ static void vfio_disconnect_container(VFIOGroup
> *group)
> >              g_free(giommu);
> >          }
> >
> > +        QLIST_FOREACH_SAFE(giommu_ctx, &container->iommu_ctx_list,
> > +                                                   iommu_ctx_next, ctx) {
> > +            iommu_ctx_notifier_unregister(giommu_ctx->iommu_ctx,
> > +                                                      &giommu_ctx->n);
> > +            QLIST_REMOVE(giommu_ctx, iommu_ctx_next);
> > +            g_free(giommu_ctx);
> > +        }
> > +
> >          trace_vfio_disconnect_container(container->fd);
> >          close(container->fd);
> >          g_free(container);
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index 12fac39..8721ff6 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -2699,11 +2699,80 @@ static void
> vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
> >      vdev->req_enabled = false;
> >  }
> >
> > +static void vfio_register_iommu_ctx_notifier(VFIOPCIDevice *vdev,
> > +                                             IOMMUContext *iommu_ctx,
> > +                                             IOMMUCTXNotifyFn fn,
> > +                                             IOMMUCTXEvent event)
> > +{
> > +    VFIOContainer *container = vdev->vbasedev.group->container;
> > +    VFIOIOMMUContext *giommu_ctx;
> > +
> > +    giommu_ctx = g_malloc0(sizeof(*giommu_ctx));
> > +    giommu_ctx->container = container;
> > +    giommu_ctx->iommu_ctx = iommu_ctx;
> > +    QLIST_INSERT_HEAD(&container->iommu_ctx_list,
> > +                      giommu_ctx,
> > +                      iommu_ctx_next);
> > +    iommu_ctx_notifier_register(iommu_ctx,
> > +                                &giommu_ctx->n,
> > +                                fn,
> > +                                event);
> > +}
> > +
> > +static void vfio_iommu_pasid_alloc_notify(IOMMUCTXNotifier *n,
> > +                                          IOMMUCTXEventData *event_data)
> > +{
> > +    VFIOIOMMUContext *giommu_ctx = container_of(n, VFIOIOMMUContext, n);
> > +    VFIOContainer *container = giommu_ctx->container;
> > +    IOMMUCTXPASIDReqDesc *pasid_req =
> > +                              (IOMMUCTXPASIDReqDesc *) event_data->data;
> > +    struct vfio_iommu_type1_pasid_request req;
> > +    unsigned long argsz;
> > +    int pasid;
> > +
> > +    argsz = sizeof(req);
> > +    req.argsz = argsz;
> > +    req.flag = VFIO_IOMMU_PASID_ALLOC;
> > +    req.min_pasid = pasid_req->min_pasid;
> > +    req.max_pasid = pasid_req->max_pasid;
> > +
> > +    pasid = ioctl(container->fd, VFIO_IOMMU_PASID_REQUEST, &req);
> > +    if (pasid < 0) {
> > +        error_report("%s: %d, alloc failed", __func__, -errno);
> > +    }
> > +    pasid_req->alloc_result = pasid;
> 
> Altering the event data from the notifier doesn't make sense.  By
> definition there can be multiple notifiers on the chain, so in that
> case which one is responsible for updating the writable field?

I guess you mean multiple pasid_alloc nofitiers. right?

It works for VT-d now, as Intel vIOMMU maintains the IOMMUContext
per-bdf. And there will be only 1 pasid_alloc notifier in the chain. But, I
agree it is not good if other module just share an IOMMUConext across
devices. Definitely, it would have multiple pasid_alloc notifiers.

How about enforcing IOMMUContext layer to only invoke one successful
pasid_alloc/free notifier if PASID_ALLOC/FREE event comes? pasid
alloc/free are really special as it requires feedback. And a potential
benefit is that the pasid_alloc/free will not be affected by hot plug
scenario. There will be always a notifier to work for pasid_alloc/free
work unless all passthru devices are hot plugged. How do you think? Or
if any other idea?

> > +}
> > +
> > +static void vfio_iommu_pasid_free_notify(IOMMUCTXNotifier *n,
> > +                                          IOMMUCTXEventData *event_data)
> > +{
> > +    VFIOIOMMUContext *giommu_ctx = container_of(n, VFIOIOMMUContext, n);
> > +    VFIOContainer *container = giommu_ctx->container;
> > +    IOMMUCTXPASIDReqDesc *pasid_req =
> > +                              (IOMMUCTXPASIDReqDesc *) event_data->data;
> > +    struct vfio_iommu_type1_pasid_request req;
> > +    unsigned long argsz;
> > +    int ret = 0;
> > +
> > +    argsz = sizeof(req);
> > +    req.argsz = argsz;
> > +    req.flag = VFIO_IOMMU_PASID_FREE;
> > +    req.pasid = pasid_req->pasid;
> > +
> > +    ret = ioctl(container->fd, VFIO_IOMMU_PASID_REQUEST, &req);
> > +    if (ret != 0) {
> > +        error_report("%s: %d, pasid %u free failed",
> > +                   __func__, -errno, (unsigned) pasid_req->pasid);
> > +    }
> > +    pasid_req->free_result = ret;
> 
> Same problem here.

yep, as above proposal.

> > +}
> > +
> >  static void vfio_realize(PCIDevice *pdev, Error **errp)
> >  {
> >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> >      VFIODevice *vbasedev_iter;
> >      VFIOGroup *group;
> > +    IOMMUContext *iommu_context;
> >      char *tmp, *subsys, group_path[PATH_MAX], *group_name;
> >      Error *err = NULL;
> >      ssize_t len;
> > @@ -3000,6 +3069,18 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
> >      vfio_register_req_notifier(vdev);
> >      vfio_setup_resetfn_quirk(vdev);
> >
> > +    iommu_context = pci_device_iommu_context(pdev);
> > +    if (iommu_context) {
> > +        vfio_register_iommu_ctx_notifier(vdev,
> > +                                         iommu_context,
> > +                                         vfio_iommu_pasid_alloc_notify,
> > +                                         IOMMU_CTX_EVENT_PASID_ALLOC);
> > +        vfio_register_iommu_ctx_notifier(vdev,
> > +                                         iommu_context,
> > +                                         vfio_iommu_pasid_free_notify,
> > +                                         IOMMU_CTX_EVENT_PASID_FREE);
> > +    }
> > +
> >      return;
> >
> >  out_teardown:
> > diff --git a/include/hw/iommu/iommu.h b/include/hw/iommu/iommu.h
> > index c22c442..4352afd 100644
> > --- a/include/hw/iommu/iommu.h
> > +++ b/include/hw/iommu/iommu.h
> > @@ -31,10 +31,25 @@
> >  typedef struct IOMMUContext IOMMUContext;
> >
> >  enum IOMMUCTXEvent {
> > +    IOMMU_CTX_EVENT_PASID_ALLOC,
> > +    IOMMU_CTX_EVENT_PASID_FREE,
> >      IOMMU_CTX_EVENT_NUM,
> >  };
> >  typedef enum IOMMUCTXEvent IOMMUCTXEvent;
> >
> > +union IOMMUCTXPASIDReqDesc {
> > +    struct {
> > +        uint32_t min_pasid;
> > +        uint32_t max_pasid;
> > +        int32_t alloc_result; /* pasid allocated for the alloc request */
> > +    };
> > +    struct {
> > +        uint32_t pasid; /* pasid to be free */
> > +        int free_result;
> > +    };
> > +};
> 
> Apart from theproblem with writable fields, using a big union for
> event data is pretty ugly.  If you need this different information for
> the different events, it might make more sense to have a separate
> notifier chain with a separate call interface for each event type,
> rather than trying to multiplex them together.

sure, I'll de-couple them. Nice catch.

Thanks,
Yi Liu



^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC v2 14/22] vfio/pci: add iommu_context notifier for pasid bind/unbind
  2019-11-04 16:02   ` David Gibson
@ 2019-11-06 12:22     ` Liu, Yi L
  2019-11-06 14:25       ` Peter Xu
  0 siblings, 1 reply; 79+ messages in thread
From: Liu, Yi L @ 2019-11-06 12:22 UTC (permalink / raw)
  To: David Gibson, eric.auger
  Cc: Tian, Kevin, jacob.jun.pan, Yi Sun, kvm, mst, Tian, Jun J,
	qemu-devel, peterx, alex.williamson, pbonzini, Sun, Yi Y

> From: David Gibson
> Sent: Tuesday, November 5, 2019 12:02 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v2 14/22] vfio/pci: add iommu_context notifier for pasid
> bind/unbind
> 
> On Thu, Oct 24, 2019 at 08:34:35AM -0400, Liu Yi L wrote:
> > This patch adds notifier for pasid bind/unbind. VFIO registers this
> > notifier to listen to the dual-stage translation (a.k.a. nested
> > translation) configuration changes and propagate to host. Thus vIOMMU
> > is able to set its translation structures to host.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > Cc: David Gibson <david@gibson.dropbear.id.au>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > ---
> >  hw/vfio/pci.c            | 39 +++++++++++++++++++++++++++++++++++++++
> >  include/hw/iommu/iommu.h | 11 +++++++++++
> >  2 files changed, 50 insertions(+)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index 8721ff6..012b8ed 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -2767,6 +2767,41 @@ static void
> vfio_iommu_pasid_free_notify(IOMMUCTXNotifier *n,
> >      pasid_req->free_result = ret;
> >  }
> >
> > +static void vfio_iommu_pasid_bind_notify(IOMMUCTXNotifier *n,
> > +                                         IOMMUCTXEventData *event_data)
> > +{
> > +#ifdef __linux__
> 
> Is hw/vfio/pci.c even built on non-linux hosts?

I'm not quite sure. It's based a comment from RFC v1. I think it could somehow
prevent compiling issue when doing code porting. So I added it. If it's impossible
to build on non-linux hosts per your experience, I can remove it to make things
simple.

> > +    VFIOIOMMUContext *giommu_ctx = container_of(n, VFIOIOMMUContext, n);
> > +    VFIOContainer *container = giommu_ctx->container;
> > +    IOMMUCTXPASIDBindData *pasid_bind =
> > +                              (IOMMUCTXPASIDBindData *) event_data->data;
> > +    struct vfio_iommu_type1_bind *bind;
> > +    struct iommu_gpasid_bind_data *bind_data;
> > +    unsigned long argsz;
> > +
> > +    argsz = sizeof(*bind) + sizeof(*bind_data);
> > +    bind = g_malloc0(argsz);
> > +    bind->argsz = argsz;
> > +    bind->bind_type = VFIO_IOMMU_BIND_GUEST_PASID;
> > +    bind_data = (struct iommu_gpasid_bind_data *) &bind->data;
> > +    *bind_data = *pasid_bind->data;
> > +
> > +    if (pasid_bind->flag & IOMMU_CTX_BIND_PASID) {
> > +        if (ioctl(container->fd, VFIO_IOMMU_BIND, bind) != 0) {
> > +            error_report("%s: pasid (%llu:%llu) bind failed: %d", __func__,
> > +                         bind_data->gpasid, bind_data->hpasid, -errno);
> > +        }
> > +    } else if (pasid_bind->flag & IOMMU_CTX_UNBIND_PASID) {
> > +        if (ioctl(container->fd, VFIO_IOMMU_UNBIND, bind) != 0) {
> > +            error_report("%s: pasid (%llu:%llu) unbind failed: %d", __func__,
> > +                         bind_data->gpasid, bind_data->hpasid, -errno);
> > +        }
> > +    }
> > +
> > +    g_free(bind);
> > +#endif
> > +}
> > +
> >  static void vfio_realize(PCIDevice *pdev, Error **errp)
> >  {
> >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > @@ -3079,6 +3114,10 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
> >                                           iommu_context,
> >                                           vfio_iommu_pasid_free_notify,
> >                                           IOMMU_CTX_EVENT_PASID_FREE);
> > +        vfio_register_iommu_ctx_notifier(vdev,
> > +                                         iommu_context,
> > +                                         vfio_iommu_pasid_bind_notify,
> > +                                         IOMMU_CTX_EVENT_PASID_BIND);
> >      }
> >
> >      return;
> > diff --git a/include/hw/iommu/iommu.h b/include/hw/iommu/iommu.h
> > index 4352afd..4f21aa1 100644
> > --- a/include/hw/iommu/iommu.h
> > +++ b/include/hw/iommu/iommu.h
> > @@ -33,6 +33,7 @@ typedef struct IOMMUContext IOMMUContext;
> >  enum IOMMUCTXEvent {
> >      IOMMU_CTX_EVENT_PASID_ALLOC,
> >      IOMMU_CTX_EVENT_PASID_FREE,
> > +    IOMMU_CTX_EVENT_PASID_BIND,
> >      IOMMU_CTX_EVENT_NUM,
> >  };
> >  typedef enum IOMMUCTXEvent IOMMUCTXEvent;
> > @@ -50,6 +51,16 @@ union IOMMUCTXPASIDReqDesc {
> >  };
> >  typedef union IOMMUCTXPASIDReqDesc IOMMUCTXPASIDReqDesc;
> >
> > +struct IOMMUCTXPASIDBindData {
> > +#define IOMMU_CTX_BIND_PASID   (1 << 0)
> > +#define IOMMU_CTX_UNBIND_PASID (1 << 1)
> > +    uint32_t flag;
> > +#ifdef __linux__
> > +    struct iommu_gpasid_bind_data *data;
> 
> Embedding a linux specific structure in the notification message seems
> dubious to me.

Just similar as your above comment in this thread. If we don't want to add
it there, then here it is also unnecessary.

@Eric, do you think it is still necessary to add the __linux__ marco here?

Thanks,
Yi Liu


^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC v2 10/22] intel_iommu: add virtual command capability support
  2019-11-01 18:05   ` Peter Xu
@ 2019-11-06 12:40     ` Liu, Yi L
  2019-11-06 14:00       ` Peter Xu
  0 siblings, 1 reply; 79+ messages in thread
From: Liu, Yi L @ 2019-11-06 12:40 UTC (permalink / raw)
  To: Peter Xu
  Cc: Tian, Kevin, jacob.jun.pan, Yi Sun, kvm, mst, Tian, Jun J,
	qemu-devel, eric.auger, alex.williamson, pbonzini, Sun, Yi Y,
	david

> From: Peter Xu
> Sent: Saturday, November 2, 2019 2:06 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v2 10/22] intel_iommu: add virtual command capability support
> 
> On Thu, Oct 24, 2019 at 08:34:31AM -0400, Liu Yi L wrote:
> > This patch adds virtual command support to Intel vIOMMU per
> > Intel VT-d 3.1 spec. And adds two virtual commands: alloc_pasid
> > and free_pasid.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> > ---
> >  hw/i386/intel_iommu.c          | 162
> ++++++++++++++++++++++++++++++++++++++++-
> >  hw/i386/intel_iommu_internal.h |  38 ++++++++++
> >  hw/i386/trace-events           |   1 +
> >  include/hw/i386/intel_iommu.h  |   6 +-
> >  4 files changed, 205 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index e9f8692..88b843f 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -944,6 +944,7 @@ static VTDBus
> *vtd_find_as_from_bus_num(IntelIOMMUState *s, uint8_t bus_num)
> >                  return vtd_bus;
> >              }
> >          }
> > +        vtd_bus = NULL;
> 
> I feel like I've commented on this..
> 
> Should this be a standalone patch?

Oops, I should have made it in a separate patch. will do it in next version.

> >      }
> >      return vtd_bus;
> >  }
> > @@ -2590,6 +2591,140 @@ static void vtd_handle_iectl_write(IntelIOMMUState
> *s)
> >      }
> >  }
> >
> > +static int vtd_request_pasid_alloc(IntelIOMMUState *s)
> > +{
> > +    VTDBus *vtd_bus;
> > +    int bus_n, devfn;
> > +    IOMMUCTXEventData event_data;
> > +    IOMMUCTXPASIDReqDesc req;
> > +    VTDIOMMUContext *vtd_ic;
> > +
> > +    event_data.event = IOMMU_CTX_EVENT_PASID_ALLOC;
> > +    event_data.data = &req;
> > +    req.min_pasid = VTD_MIN_HPASID;
> > +    req.max_pasid = VTD_MAX_HPASID;
> > +    req.alloc_result = 0;
> > +    event_data.length = sizeof(req);
> 
> As mentioned in the other thread, do you think we can drop this length
> field?

yep, will do it.

> 
> > +    for (bus_n = 0; bus_n < PCI_BUS_MAX; bus_n++) {
> > +        vtd_bus = vtd_find_as_from_bus_num(s, bus_n);
> > +        if (!vtd_bus) {
> > +            continue;
> > +        }
> > +        for (devfn = 0; devfn < PCI_DEVFN_MAX; devfn++) {
> > +            vtd_ic = vtd_bus->dev_ic[devfn];
> > +            if (!vtd_ic) {
> > +                continue;
> > +            }
> > +            iommu_ctx_event_notify(&vtd_ic->iommu_context, &event_data);
> 
> Considering that we'll fill in the result into event_data, it could be
> a bit misleading to still call it "notify" here because normally it
> should only get data from the notifier caller rather than returning a
> meaningful value..  Things like SUCCESS/FAIL would be fine, but here
> we're returning a pasid from the notifier which seems a bit odd.
> 
> Maybe rename it to iommu_ctx_event_deliver()?  Then we just rename all
> the references of "notify" thingys into "hook" or something clearer?

got it. Will do it when we got agreement on the comments regards to
[RFC v2 09/22] vfio/pci: add iommu_context notifier for pasid alloc/free
of this series.

> > +            if (req.alloc_result > 0) {
> 
> I'd suggest we comment on this:
> 
>     We'll return the first valid result we got.  It's a bit hackish in
>     that we don't have a good global interface yet to talk to modules
>     like vfio to deliver this allocation request, so we're leveraging
>     this per-device context to do the same thing just to make sure the
>     allocation happens only once.
> 
> Same to the pasid_free() below, though you can reference the comment
> here from there to be simple.

Got it. Will add it in both place.

> 
> > +                return req.alloc_result;
> > +            }
> > +        }
> > +    }
> > +    return -1;
> > +}
> > +
> > +static int vtd_request_pasid_free(IntelIOMMUState *s, uint32_t pasid)
> > +{
> > +    VTDBus *vtd_bus;
> > +    int bus_n, devfn;
> > +    IOMMUCTXEventData event_data;
> > +    IOMMUCTXPASIDReqDesc req;
> > +    VTDIOMMUContext *vtd_ic;
> > +
> > +    event_data.event = IOMMU_CTX_EVENT_PASID_FREE;
> > +    event_data.data = &req;
> > +    req.pasid = pasid;
> > +    req.free_result = 0;
> > +    event_data.length = sizeof(req);
> > +    for (bus_n = 0; bus_n < PCI_BUS_MAX; bus_n++) {
> > +        vtd_bus = vtd_find_as_from_bus_num(s, bus_n);
> > +        if (!vtd_bus) {
> > +            continue;
> > +        }
> > +        for (devfn = 0; devfn < PCI_DEVFN_MAX; devfn++) {
> > +            vtd_ic = vtd_bus->dev_ic[devfn];
> > +            if (!vtd_ic) {
> > +                continue;
> > +            }
> > +            iommu_ctx_event_notify(&vtd_ic->iommu_context, &event_data);
> > +            if (req.free_result == 0) {
> > +                return 0;
> > +            }
> > +        }
> > +    }
> > +    return -1;
> > +}
> > +
> > +/*
> > + * If IP is not set, set it and return 0
> > + * If IP is already set, return -1
> > + */
> > +static int vtd_vcmd_rsp_ip_check(IntelIOMMUState *s)
> > +{
> > +    if (!(s->vccap & VTD_VCCAP_PAS) ||
> > +         (s->vcrsp & 1)) {
> > +        return -1;
> > +    }
> 
> VTD_VCCAP_PAS is not a IP check, so maybe simply move these chunk out
> to vtd_handle_vcmd_write?  Then we can rename this function to
> "void vtd_vcmd_ip_set(...)".

yes, it is. will do it in next version.

> 
> > +    s->vcrsp = 1;
> > +    vtd_set_quad_raw(s, DMAR_VCRSP_REG,
> > +                     ((uint64_t) s->vcrsp));
> > +    return 0;
> > +}
> > +
> > +static void vtd_vcmd_clear_ip(IntelIOMMUState *s)
> > +{
> > +    s->vcrsp &= (~((uint64_t)(0x1)));
> > +    vtd_set_quad_raw(s, DMAR_VCRSP_REG,
> > +                     ((uint64_t) s->vcrsp));
> > +}
> > +
> > +/* Handle write to Virtual Command Register */
> > +static int vtd_handle_vcmd_write(IntelIOMMUState *s, uint64_t val)
> > +{
> > +    uint32_t pasid;
> > +    int ret = -1;
> > +
> > +    trace_vtd_reg_write_vcmd(s->vcrsp, val);
> > +
> > +    /*
> > +     * Since vCPU should be blocked when the guest VMCD
> > +     * write was trapped to here. Should be no other vCPUs
> > +     * try to access VCMD if guest software is well written.
> > +     * However, we still emulate the IP bit here in case of
> > +     * bad guest software. Also align with the spec.
> > +     */
> > +    ret = vtd_vcmd_rsp_ip_check(s);
> > +    if (ret) {
> > +        return ret;
> > +    }
> > +    switch (val & VTD_VCMD_CMD_MASK) {
> > +    case VTD_VCMD_ALLOC_PASID:
> > +        ret = vtd_request_pasid_alloc(s);
> > +        if (ret < 0) {
> > +            s->vcrsp |= VTD_VCRSP_SC(VTD_VCMD_NO_AVAILABLE_PASID);
> > +        } else {
> > +            s->vcrsp |= VTD_VCRSP_RSLT(ret);
> > +        }
> > +        break;
> > +
> > +    case VTD_VCMD_FREE_PASID:
> > +        pasid = VTD_VCMD_PASID_VALUE(val);
> > +        ret = vtd_request_pasid_free(s, pasid);
> > +        if (ret < 0) {
> > +            s->vcrsp |= VTD_VCRSP_SC(VTD_VCMD_FREE_INVALID_PASID);
> > +        }
> > +        break;
> > +
> > +    default:
> > +        s->vcrsp |= VTD_VCRSP_SC(VTD_VCMD_UNDEFINED_CMD);
> > +        printf("Virtual Command: unsupported command!!!\n");
> 
> Perhaps error_report_once()?

will fix it in next version. thx~

> > +        break;
> > +    }
> > +    vtd_vcmd_clear_ip(s);
> > +    return 0;
> > +}
> > +
> >  static uint64_t vtd_mem_read(void *opaque, hwaddr addr, unsigned size)
> >  {
> >      IntelIOMMUState *s = opaque;
> > @@ -2879,6 +3014,23 @@ static void vtd_mem_write(void *opaque, hwaddr addr,
> >          vtd_set_long(s, addr, val);
> >          break;
> >
> > +    case DMAR_VCMD_REG:
> > +        if (!vtd_handle_vcmd_write(s, val)) {
> > +            if (size == 4) {
> > +                vtd_set_long(s, addr, val);
> > +            } else {
> > +                vtd_set_quad(s, addr, val);
> > +            }
> > +        }
> > +        break;
> > +
> > +    case DMAR_VCMD_REG_HI:
> > +        assert(size == 4);
> 
> This assert() seems scary, but of course not a problem of this patch
> because plenty of that are there in vtd_mem_write..  So we can fix
> that later.

got it.

> 
> Do you know what should happen on bare-metal from spec-wise that when
> the guest e.g. writes 2 bytes to these mmio regions?

I've no idea to your question. It is not a bare-metal capability. Personally, I
prefer to have a toggle bit to mark the full written of a cmd to VMCD_REG.
Reason is that we have no control on guest software. It may write new cmd
to VCMD_REG in a bad manner. e.g. write high 32 bits first and then write the
low 32 bits. Then it will have two traps. Apparently, for the first trap, it fills
in the VCMD_REG and no need to handle it since it is not a full written. I'm
checking it and evaluating it. How do you think on it?

> 
> > +        if (!vtd_handle_vcmd_write(s, val)) {
> > +            vtd_set_long(s, addr, val);
> > +        }
> > +        break;
> > +
> >      default:
> >          if (size == 4) {
> >              vtd_set_long(s, addr, val);
> > @@ -3617,7 +3769,8 @@ static void vtd_init(IntelIOMMUState *s)
> >              s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
> >          } else if (!strcmp(s->scalable_mode, "modern")) {
> >              s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_PASID
> > -                       | VTD_ECAP_FLTS | VTD_ECAP_PSS;
> > +                       | VTD_ECAP_FLTS | VTD_ECAP_PSS | VTD_ECAP_VCS;
> > +            s->vccap |= VTD_VCCAP_PAS;
> >          }
> >      }
> >
> 
> [...]
> 
> > +#define VTD_VCMD_CMD_MASK           0xffUL
> > +#define VTD_VCMD_PASID_VALUE(val)   (((val) >> 8) & 0xfffff)
> > +
> > +#define VTD_VCRSP_RSLT(val)         ((val) << 8)
> > +#define VTD_VCRSP_SC(val)           (((val) & 0x3) << 1)
> > +
> > +#define VTD_VCMD_UNDEFINED_CMD         1ULL
> > +#define VTD_VCMD_NO_AVAILABLE_PASID    2ULL
> 
> According to 10.4.44 - should this be 1?

It's 2 now per VT-d spec 3.1 (2019 June). I should have mentioned it in the cover
letter...

Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC v2 09/22] vfio/pci: add iommu_context notifier for pasid alloc/free
  2019-11-01 17:26     ` Peter Xu
@ 2019-11-06 12:46       ` Liu, Yi L
  0 siblings, 0 replies; 79+ messages in thread
From: Liu, Yi L @ 2019-11-06 12:46 UTC (permalink / raw)
  To: Peter Xu, David Gibson
  Cc: Tian, Kevin, jacob.jun.pan, Yi Sun, kvm, mst, Tian, Jun J,
	qemu-devel, eric.auger, alex.williamson, pbonzini, Sun, Yi Y

> From: Peter Xu
> Sent: Saturday, November 2, 2019 1:26 AM
> To: David Gibson <david@gibson.dropbear.id.au>
> Subject: Re: [RFC v2 09/22] vfio/pci: add iommu_context notifier for pasid alloc/free
> 
> On Tue, Oct 29, 2019 at 01:15:44PM +0100, David Gibson wrote:
> > > +union IOMMUCTXPASIDReqDesc {
> > > +    struct {
> > > +        uint32_t min_pasid;
> > > +        uint32_t max_pasid;
> > > +        int32_t alloc_result; /* pasid allocated for the alloc request */
> > > +    };
> > > +    struct {
> > > +        uint32_t pasid; /* pasid to be free */
> > > +        int free_result;
> > > +    };
> > > +};
> >
> > Apart from theproblem with writable fields, using a big union for
> > event data is pretty ugly.  If you need this different information for
> > the different events, it might make more sense to have a separate
> > notifier chain with a separate call interface for each event type,
> > rather than trying to multiplex them together.
> 
> I have no issue on the union definiion, however I do agree that it's a
> bit awkward to register one notifier for each event.

Got it. Would fix it in next version.

> Instead of introducing even more notifier chains, I'm thinking whether
> we can simply provide a single notifier hook for all the four events.
> After all I don't see in what case we'll only register some of the
> events, like we can't register alloc_pasid() without registering to
> free_pasid() because otherwise it does not make sense..  And also you
> have the wrapper struct ("IOMMUCTXEventData") which contains the event
> type, so the notify() hook will know which message is this.

I'm in with this proposal. This makes the notifier chain smaller.

> A side note is that I think you don't need the
> IOMMUCTXEventData.length.  If you see the code, vtd_bind_guest_pasid()
> does not even initialize length right now, and I think it could still
> work only because none of the vfio notify() hook
> (e.g. vfio_iommu_pasid_bind_notify) checks that length...

yes, will fix it.

> --
> Peter Xu

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 10/22] intel_iommu: add virtual command capability support
  2019-11-06 12:40     ` Liu, Yi L
@ 2019-11-06 14:00       ` Peter Xu
  2019-11-12  6:27         ` Liu, Yi L
  0 siblings, 1 reply; 79+ messages in thread
From: Peter Xu @ 2019-11-06 14:00 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Tian, Kevin, jacob.jun.pan, Yi Sun, kvm, mst, Tian, Jun J,
	qemu-devel, eric.auger, alex.williamson, pbonzini, Sun, Yi Y,
	david

On Wed, Nov 06, 2019 at 12:40:41PM +0000, Liu, Yi L wrote:
> > 
> > Do you know what should happen on bare-metal from spec-wise that when
> > the guest e.g. writes 2 bytes to these mmio regions?
> 
> I've no idea to your question. It is not a bare-metal capability. Personally, I
> prefer to have a toggle bit to mark the full written of a cmd to VMCD_REG.
> Reason is that we have no control on guest software. It may write new cmd
> to VCMD_REG in a bad manner. e.g. write high 32 bits first and then write the
> low 32 bits. Then it will have two traps. Apparently, for the first trap, it fills
> in the VCMD_REG and no need to handle it since it is not a full written. I'm
> checking it and evaluating it. How do you think on it?

Oh I just noticed that vtd_mem_ops.min_access_size==4 now so writting
2B should never happen at least.  Then we'll bail out at
memory_region_access_valid().  Seems fine.

> 
> > 
> > > +        if (!vtd_handle_vcmd_write(s, val)) {
> > > +            vtd_set_long(s, addr, val);
> > > +        }
> > > +        break;
> > > +
> > >      default:
> > >          if (size == 4) {
> > >              vtd_set_long(s, addr, val);
> > > @@ -3617,7 +3769,8 @@ static void vtd_init(IntelIOMMUState *s)
> > >              s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
> > >          } else if (!strcmp(s->scalable_mode, "modern")) {
> > >              s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_PASID
> > > -                       | VTD_ECAP_FLTS | VTD_ECAP_PSS;
> > > +                       | VTD_ECAP_FLTS | VTD_ECAP_PSS | VTD_ECAP_VCS;
> > > +            s->vccap |= VTD_VCCAP_PAS;
> > >          }
> > >      }
> > >
> > 
> > [...]
> > 
> > > +#define VTD_VCMD_CMD_MASK           0xffUL
> > > +#define VTD_VCMD_PASID_VALUE(val)   (((val) >> 8) & 0xfffff)
> > > +
> > > +#define VTD_VCRSP_RSLT(val)         ((val) << 8)
> > > +#define VTD_VCRSP_SC(val)           (((val) & 0x3) << 1)
> > > +
> > > +#define VTD_VCMD_UNDEFINED_CMD         1ULL
> > > +#define VTD_VCMD_NO_AVAILABLE_PASID    2ULL
> > 
> > According to 10.4.44 - should this be 1?
> 
> It's 2 now per VT-d spec 3.1 (2019 June). I should have mentioned it in the cover
> letter...

Well you're right... I hope there won't be other "major" things get
changed otherwise it'll be really a pain of working on all of these
before things settle...

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 14/22] vfio/pci: add iommu_context notifier for pasid bind/unbind
  2019-11-06 12:22     ` Liu, Yi L
@ 2019-11-06 14:25       ` Peter Xu
  0 siblings, 0 replies; 79+ messages in thread
From: Peter Xu @ 2019-11-06 14:25 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Tian, Kevin, jacob.jun.pan, Yi Sun, kvm, mst, Tian, Jun J,
	qemu-devel, eric.auger, alex.williamson, pbonzini, Sun, Yi Y,
	David Gibson

On Wed, Nov 06, 2019 at 12:22:46PM +0000, Liu, Yi L wrote:
> > From: David Gibson
> > Sent: Tuesday, November 5, 2019 12:02 AM
> > To: Liu, Yi L <yi.l.liu@intel.com>
> > Subject: Re: [RFC v2 14/22] vfio/pci: add iommu_context notifier for pasid
> > bind/unbind
> > 
> > On Thu, Oct 24, 2019 at 08:34:35AM -0400, Liu Yi L wrote:
> > > This patch adds notifier for pasid bind/unbind. VFIO registers this
> > > notifier to listen to the dual-stage translation (a.k.a. nested
> > > translation) configuration changes and propagate to host. Thus vIOMMU
> > > is able to set its translation structures to host.
> > >
> > > Cc: Kevin Tian <kevin.tian@intel.com>
> > > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > Cc: Peter Xu <peterx@redhat.com>
> > > Cc: Eric Auger <eric.auger@redhat.com>
> > > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > > Cc: David Gibson <david@gibson.dropbear.id.au>
> > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > > ---
> > >  hw/vfio/pci.c            | 39 +++++++++++++++++++++++++++++++++++++++
> > >  include/hw/iommu/iommu.h | 11 +++++++++++
> > >  2 files changed, 50 insertions(+)
> > >
> > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > index 8721ff6..012b8ed 100644
> > > --- a/hw/vfio/pci.c
> > > +++ b/hw/vfio/pci.c
> > > @@ -2767,6 +2767,41 @@ static void
> > vfio_iommu_pasid_free_notify(IOMMUCTXNotifier *n,
> > >      pasid_req->free_result = ret;
> > >  }
> > >
> > > +static void vfio_iommu_pasid_bind_notify(IOMMUCTXNotifier *n,
> > > +                                         IOMMUCTXEventData *event_data)
> > > +{
> > > +#ifdef __linux__
> > 
> > Is hw/vfio/pci.c even built on non-linux hosts?
> 
> I'm not quite sure. It's based a comment from RFC v1. I think it could somehow
> prevent compiling issue when doing code porting. So I added it. If it's impossible
> to build on non-linux hosts per your experience, I can remove it to make things
> simple.

To my understanding this should not be needed because VFIO doesn't
work with non-linux after all (as said)... while...

> 
> > > +    VFIOIOMMUContext *giommu_ctx = container_of(n, VFIOIOMMUContext, n);
> > > +    VFIOContainer *container = giommu_ctx->container;
> > > +    IOMMUCTXPASIDBindData *pasid_bind =
> > > +                              (IOMMUCTXPASIDBindData *) event_data->data;
> > > +    struct vfio_iommu_type1_bind *bind;
> > > +    struct iommu_gpasid_bind_data *bind_data;
> > > +    unsigned long argsz;
> > > +
> > > +    argsz = sizeof(*bind) + sizeof(*bind_data);
> > > +    bind = g_malloc0(argsz);
> > > +    bind->argsz = argsz;
> > > +    bind->bind_type = VFIO_IOMMU_BIND_GUEST_PASID;
> > > +    bind_data = (struct iommu_gpasid_bind_data *) &bind->data;
> > > +    *bind_data = *pasid_bind->data;
> > > +
> > > +    if (pasid_bind->flag & IOMMU_CTX_BIND_PASID) {
> > > +        if (ioctl(container->fd, VFIO_IOMMU_BIND, bind) != 0) {
> > > +            error_report("%s: pasid (%llu:%llu) bind failed: %d", __func__,
> > > +                         bind_data->gpasid, bind_data->hpasid, -errno);
> > > +        }
> > > +    } else if (pasid_bind->flag & IOMMU_CTX_UNBIND_PASID) {
> > > +        if (ioctl(container->fd, VFIO_IOMMU_UNBIND, bind) != 0) {
> > > +            error_report("%s: pasid (%llu:%llu) unbind failed: %d", __func__,
> > > +                         bind_data->gpasid, bind_data->hpasid, -errno);
> > > +        }
> > > +    }
> > > +
> > > +    g_free(bind);
> > > +#endif
> > > +}
> > > +
> > >  static void vfio_realize(PCIDevice *pdev, Error **errp)
> > >  {
> > >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > > @@ -3079,6 +3114,10 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
> > >                                           iommu_context,
> > >                                           vfio_iommu_pasid_free_notify,
> > >                                           IOMMU_CTX_EVENT_PASID_FREE);
> > > +        vfio_register_iommu_ctx_notifier(vdev,
> > > +                                         iommu_context,
> > > +                                         vfio_iommu_pasid_bind_notify,
> > > +                                         IOMMU_CTX_EVENT_PASID_BIND);
> > >      }
> > >
> > >      return;
> > > diff --git a/include/hw/iommu/iommu.h b/include/hw/iommu/iommu.h
> > > index 4352afd..4f21aa1 100644
> > > --- a/include/hw/iommu/iommu.h
> > > +++ b/include/hw/iommu/iommu.h
> > > @@ -33,6 +33,7 @@ typedef struct IOMMUContext IOMMUContext;
> > >  enum IOMMUCTXEvent {
> > >      IOMMU_CTX_EVENT_PASID_ALLOC,
> > >      IOMMU_CTX_EVENT_PASID_FREE,
> > > +    IOMMU_CTX_EVENT_PASID_BIND,
> > >      IOMMU_CTX_EVENT_NUM,
> > >  };
> > >  typedef enum IOMMUCTXEvent IOMMUCTXEvent;
> > > @@ -50,6 +51,16 @@ union IOMMUCTXPASIDReqDesc {
> > >  };
> > >  typedef union IOMMUCTXPASIDReqDesc IOMMUCTXPASIDReqDesc;
> > >
> > > +struct IOMMUCTXPASIDBindData {
> > > +#define IOMMU_CTX_BIND_PASID   (1 << 0)
> > > +#define IOMMU_CTX_UNBIND_PASID (1 << 1)
> > > +    uint32_t flag;
> > > +#ifdef __linux__
> > > +    struct iommu_gpasid_bind_data *data;
> > 
> > Embedding a linux specific structure in the notification message seems
> > dubious to me.
> 
> Just similar as your above comment in this thread. If we don't want to add
> it there, then here it is also unnecessary.

... I'm not sure but maybe we need this here because I _think_ vt-d
should even work on Windows.  However instead of __linux__ over *data,
should you cover the whole IOMMUCTXPASIDBindData?

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 15/22] intel_iommu: bind/unbind guest page table to host
  2019-11-06  8:10     ` Liu, Yi L
@ 2019-11-06 14:27       ` Peter Xu
  0 siblings, 0 replies; 79+ messages in thread
From: Peter Xu @ 2019-11-06 14:27 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Tian, Kevin, jacob.jun.pan, Yi Sun, kvm, mst, Tian, Jun J,
	qemu-devel, eric.auger, alex.williamson, pbonzini, Sun, Yi Y,
	david

On Wed, Nov 06, 2019 at 08:10:59AM +0000, Liu, Yi L wrote:
> > From: Peter Xu [mailto:peterx@redhat.com]
> > Sent: Tuesday, November 5, 2019 4:26 AM
> > To: Liu, Yi L <yi.l.liu@intel.com>
> > Subject: Re: [RFC v2 15/22] intel_iommu: bind/unbind guest page table to host
> > 
> > On Thu, Oct 24, 2019 at 08:34:36AM -0400, Liu Yi L wrote:
> > > This patch captures the guest PASID table entry modifications and
> > > propagates the changes to host to setup nested translation. The
> > > guest page table is configured as 1st level page table (GVA->GPA)
> > > whose translation result would further go through host VT-d 2nd
> > > level page table(GPA->HPA) under nested translation mode. This is
> > > a key part of vSVA support.
> > >
> > > Cc: Kevin Tian <kevin.tian@intel.com>
> > > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > Cc: Peter Xu <peterx@redhat.com>
> > > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > > ---
> > >  hw/i386/intel_iommu.c          | 81
> > ++++++++++++++++++++++++++++++++++++++++++
> > >  hw/i386/intel_iommu_internal.h | 20 +++++++++++
> > >  2 files changed, 101 insertions(+)
> > >
> > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > index d8827c9..793b0de 100644
> > > --- a/hw/i386/intel_iommu.c
> > > +++ b/hw/i386/intel_iommu.c
> > > @@ -41,6 +41,7 @@
> > >  #include "migration/vmstate.h"
> > >  #include "trace.h"
> > >  #include "qemu/jhash.h"
> > > +#include <linux/iommu.h>
> > >
> > >  /* context entry operations */
> > >  #define VTD_CE_GET_RID2PASID(ce) \
> > > @@ -695,6 +696,16 @@ static inline uint16_t
> > vtd_pe_get_domain_id(VTDPASIDEntry *pe)
> > >      return VTD_SM_PASID_ENTRY_DID((pe)->val[1]);
> > >  }
> > >
> > > +static inline uint32_t vtd_pe_get_fl_aw(VTDPASIDEntry *pe)
> > > +{
> > > +    return 48 + ((pe->val[2] >> 2) & VTD_SM_PASID_ENTRY_FLPM) * 9;
> > > +}
> > > +
> > > +static inline dma_addr_t vtd_pe_get_flpt_base(VTDPASIDEntry *pe)
> > > +{
> > > +    return pe->val[2] & VTD_SM_PASID_ENTRY_FLPTPTR;
> > > +}
> > > +
> > >  static inline bool vtd_pdire_present(VTDPASIDDirEntry *pdire)
> > >  {
> > >      return pdire->val & 1;
> > > @@ -1850,6 +1861,67 @@ static void
> > vtd_context_global_invalidate(IntelIOMMUState *s)
> > >      vtd_iommu_replay_all(s);
> > >  }
> > >
> > > +static void vtd_bind_guest_pasid(IntelIOMMUState *s, VTDBus *vtd_bus,
> > > +            int devfn, int pasid, VTDPASIDEntry *pe, VTDPASIDOp op)
> > > +{
> > > +#ifdef __linux__
> > > +    VTDIOMMUContext *vtd_ic;
> > > +    IOMMUCTXEventData event_data;
> > > +    IOMMUCTXPASIDBindData bind;
> > > +    struct iommu_gpasid_bind_data *g_bind_data;
> > > +
> > > +    vtd_ic = vtd_bus->dev_ic[devfn];
> > > +    if (!vtd_ic) {
> > > +        return;
> > > +    }
> > > +
> > > +    g_bind_data = g_malloc0(sizeof(*g_bind_data));
> > > +    bind.flag = 0;
> > > +    g_bind_data->flags = 0;
> > > +    g_bind_data->vtd.flags = 0;
> > > +    switch (op) {
> > > +    case VTD_PASID_BIND:
> > > +    case VTD_PASID_UPDATE:
> > > +        g_bind_data->version = IOMMU_GPASID_BIND_VERSION_1;
> > > +        g_bind_data->format = IOMMU_PASID_FORMAT_INTEL_VTD;
> > > +        g_bind_data->gpgd = vtd_pe_get_flpt_base(pe);
> > > +        g_bind_data->addr_width = vtd_pe_get_fl_aw(pe);
> > > +        g_bind_data->hpasid = pasid;
> > > +        g_bind_data->gpasid = pasid;
> > > +        g_bind_data->flags |= IOMMU_SVA_GPASID_VAL;
> > > +        g_bind_data->vtd.flags =
> > > +                             (VTD_SM_PASID_ENTRY_SRE_BIT(pe->val[2]) ? 1 : 0)
> > > +                           | (VTD_SM_PASID_ENTRY_EAFE_BIT(pe->val[2]) ? 1 : 0)
> > > +                           | (VTD_SM_PASID_ENTRY_PCD_BIT(pe->val[1]) ? 1 : 0)
> > > +                           | (VTD_SM_PASID_ENTRY_PWT_BIT(pe->val[1]) ? 1 : 0)
> > > +                           | (VTD_SM_PASID_ENTRY_EMTE_BIT(pe->val[1]) ? 1 : 0)
> > > +                           | (VTD_SM_PASID_ENTRY_CD_BIT(pe->val[1]) ? 1 : 0);
> > > +        g_bind_data->vtd.pat = VTD_SM_PASID_ENTRY_PAT(pe->val[1]);
> > > +        g_bind_data->vtd.emt = VTD_SM_PASID_ENTRY_EMT(pe->val[1]);
> > > +        bind.flag |= IOMMU_CTX_BIND_PASID;
> > > +        break;
> > > +
> > > +    case VTD_PASID_UNBIND:
> > > +        g_bind_data->gpgd = 0;
> > > +        g_bind_data->addr_width = 0;
> > > +        g_bind_data->hpasid = pasid;
> > > +        bind.flag |= IOMMU_CTX_UNBIND_PASID;
> > > +        break;
> > > +
> > > +    default:
> > > +        printf("Unknown VTDPASIDOp!!\n");
> > 
> > Please don't use printf()..  Here assert() suits.
> 
> Will correct it. Thanks.
> 
> > 
> > > +        break;
> > > +    }
> > > +    if (bind.flag) {
> > 
> > Will this be untrue?  If not, assert() works too.
> 
> yes, it is possible. If an unknown VTDPASIDOp, then no switch case
> will initiate bind.flag.

Then should it be a programming error?  If so we should still use
assert(), I think...

> 
> > > +        event_data.event = IOMMU_CTX_EVENT_PASID_BIND;
> > > +        bind.data = g_bind_data;
> > > +        event_data.data = &bind;
> > > +        iommu_ctx_event_notify(&vtd_ic->iommu_context, &event_data);
> > > +    }
> > > +    g_free(g_bind_data);
> > > +#endif
> > > +}
> > > +
> > >  /* Do a context-cache device-selective invalidation.
> > >   * @func_mask: FM field after shifting
> > >   */
> > > @@ -2528,12 +2600,17 @@ static gboolean vtd_flush_pasid(gpointer key,
> > gpointer value,
> > >                  pc_entry->pasid_cache_gen = s->pasid_cache_gen;
> > >                  if (!vtd_pasid_entry_compare(&pe, &pc_entry->pasid_entry)) {
> > >                      pc_entry->pasid_entry = pe;
> > > +                    vtd_bind_guest_pasid(s, vtd_bus, devfn,
> > > +                                     pasid, &pe, VTD_PASID_UPDATE);
> > >                      /*
> > >                       * TODO: when pasid-base-iotlb(piotlb) infrastructure is
> > >                       * ready, should invalidate QEMU piotlb togehter with this
> > >                       * change.
> > >                       */
> > >                  }
> > > +            } else {
> > > +                vtd_bind_guest_pasid(s, vtd_bus, devfn,
> > > +                                  pasid, NULL, VTD_PASID_UNBIND);
> > 
> > Please see the reply in the other thread on vtd_flush_pasid().  I've
> > filled in where I feel like this UNBIND should exist, I feel like your
> > current code could miss some places where you should unbind but didn't.
> 
> I've replied in that thread regards to your comments. May you
> reconsider it here. Hope, it suits what you thought. If still
> something missed, pls feel free to point out.

Ok let's wait to see your next version.  Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 13/22] intel_iommu: add PASID cache management infrastructure
  2019-11-06  7:56     ` Liu, Yi L
@ 2019-11-07 15:46       ` Peter Xu
  0 siblings, 0 replies; 79+ messages in thread
From: Peter Xu @ 2019-11-07 15:46 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Tian, Kevin, jacob.jun.pan, Yi Sun, kvm, mst, Tian, Jun J,
	qemu-devel, eric.auger, alex.williamson, pbonzini, Sun, Yi Y,
	david

On Wed, Nov 06, 2019 at 07:56:21AM +0000, Liu, Yi L wrote:
> > > +static inline struct pasid_key *vtd_get_pasid_key(uint32_t pasid,
> > > +                                                  uint16_t sid)
> > > +{
> > > +    struct pasid_key *key = g_malloc0(sizeof(*key));
> > 
> > I think you can simply return the pasid_key directly maybe otherwise
> > should be careful on mem leak.  Actually I think it's leaked below...
> 
> sure, I can do it. For the leak, it is a known issue as below comment
> indicates. Not sure why it was left as it is. Perhaps, the key point is
> used in the hash table. Per my understanding, hash table should have
> its own field to store the key content. Do you have any idea?
> 
>     if (!vtd_bus) {
>         uintptr_t *new_key = g_malloc(sizeof(*new_key));

This declares new_key as a "uintptr_t *" type.  Note, it's "new_key"
that gets the malloced region not "*new_key".

>         *new_key = (uintptr_t)bus;

This assigns the value of the bus to *newkey.

>         /* No corresponding free() */

So this does not need to be freed because the key will be inserted
very soon.  In your case below, you need to free it when it's only
used for lookup, or you can simply declare the key as stack variable
like what it did in vtd_find_add_as:

VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
{
    uintptr_t key = (uintptr_t)bus;
    ...
}

> 
> > 
> > > +    key->pasid = pasid;
> > > +    key->sid = sid;
> > > +    return key;
> > > +}
> > > +
> > > +static guint vtd_pasid_as_key_hash(gconstpointer v)
> > > +{
> > > +    struct pasid_key *key = (struct pasid_key *)v;
> > > +    uint32_t a, b, c;
> > > +
> > > +    /* Jenkins hash */
> > > +    a = b = c = JHASH_INITVAL + sizeof(*key);
> > > +    a += key->sid;
> > > +    b += extract32(key->pasid, 0, 16);
> > > +    c += extract32(key->pasid, 16, 16);
> > > +
> > > +    __jhash_mix(a, b, c);
> > > +    __jhash_final(a, b, c);
> > 
> > I'm totally not good at hash, but I'm curious why no one wants to
> > introduce at least a jhash() so we don't need to call these internals
> > (I believe that's how kernel did it). 
> 
> well, I'm also curious about it.
> 
> > At the meantime I don't see how
> > it would be better than things like g_str_hash() too so I'd be glad if
> > anyone could help explain a bit...
> 
> I used to use g_str_hash(), and used string as key.
> https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg02128.html
> 
> Do you want me to keep the pasid_key structure here and switch to
> use g_str_hash()? Then the pasid key content would be compared as
> strings. I think it should work. But, I may be wrong all the same.

Oh ok so no length can be specified with that.  Do you like to
introduce the jhash() with your series and use it?

> 
> > > +
> > > +    return c;
> > > +}
> > > +
> > > +static gboolean vtd_pasid_as_key_equal(gconstpointer v1, gconstpointer v2)
> > > +{
> > > +    const struct pasid_key *k1 = v1;
> > > +    const struct pasid_key *k2 = v2;
> > > +
> > > +    return (k1->pasid == k2->pasid) && (k1->sid == k2->sid);
> > > +}
> > > +
> > > +static inline bool vtd_pc_is_dom_si(struct VTDPASIDCacheInfo *pc_info)
> > > +{
> > > +    return pc_info->flags & VTD_PASID_CACHE_DOMSI;
> > > +}
> > > +
> > > +static inline bool vtd_pc_is_pasid_si(struct VTDPASIDCacheInfo *pc_info)
> > > +{
> > > +    return pc_info->flags & VTD_PASID_CACHE_PASIDSI;
> > 
> > AFAIS these only used once.  How about removing these helpers?  I
> > don't see much on helping readability or anything...  please see below
> > at [1].
> 
> Agreed. Will do it. BTW. I failed to locate [1]. May you point it out. Surely
> I don’t want to miss any comments.

No worry, it's a comment left below and you didn't miss anything. I
just forgot to mark it out in my previous reply.

> 
> > > +}
> > > +
> > > +static inline int vtd_dev_get_pe_from_pasid(IntelIOMMUState *s,
> > > +                                            uint8_t bus_num,
> > > +                                            uint8_t devfn,
> > > +                                            uint32_t pasid,
> > > +                                            VTDPASIDEntry *pe)
> > > +{
> > > +    VTDContextEntry ce;
> > > +    int ret;
> > > +    dma_addr_t pasid_dir_base;
> > > +
> > > +    if (!s->root_scalable) {
> > > +        return -VTD_FR_PASID_TABLE_INV;
> > > +    }
> > > +
> > > +    ret = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
> > > +    if (ret) {
> > > +        return ret;
> > > +    }
> > > +
> > > +    pasid_dir_base = VTD_CE_GET_PASID_DIR_TABLE(&ce);
> > > +    ret = vtd_get_pe_from_pasid_table(s,
> > > +                                  pasid_dir_base, pasid, pe);
> > > +
> > > +    return ret;
> > > +}
> > > +
> > > +static bool vtd_pasid_entry_compare(VTDPASIDEntry *p1, VTDPASIDEntry *p2)
> > > +{
> > > +    int i = 0;
> > > +    while (i < sizeof(*p1) / sizeof(p1->val)) {
> > > +        if (p1->val[i] != p2->val[i]) {
> > > +            return false;
> > > +        }
> > > +        i++;
> > > +    }
> > > +    return true;
> > 
> > Will this work?
> > 
> >   return !memcmp(p1, p2, sizeof(*p1));
> 
> oh, yes. Will replace with it.
> 
> > > +}
> > > +
> > > +/**
> > > + * This function is used to clear pasid_cache_gen of cached pasid
> > > + * entry in vtd_pasid_as instances. Caller of this function should
> > > + * hold iommu_lock.
> > > + */
> > > +static gboolean vtd_flush_pasid(gpointer key, gpointer value,
> > > +                                gpointer user_data)
> > > +{
> > > +    VTDPASIDCacheInfo *pc_info = user_data;
> > > +    VTDPASIDAddressSpace *vtd_pasid_as = value;
> > > +    IntelIOMMUState *s = vtd_pasid_as->iommu_state;
> > > +    VTDPASIDCacheEntry *pc_entry = &vtd_pasid_as->pasid_cache_entry;
> > > +    VTDBus *vtd_bus = vtd_pasid_as->vtd_bus;
> > > +    VTDPASIDEntry pe;
> > > +    uint16_t did;
> > > +    uint32_t pasid;
> > > +    uint16_t devfn;
> > > +    gboolean remove = false;
> > > +
> > > +    did = vtd_pe_get_domain_id(&pc_entry->pasid_entry);
> > > +    pasid = vtd_pasid_as->pasid;
> > > +    devfn = vtd_pasid_as->devfn;
> > > +
> > > +    if (pc_entry->pasid_cache_gen &&
> > > +        (vtd_pc_is_dom_si(pc_info) ? (pc_info->domain_id == did) : 1) &&
> > > +        (vtd_pc_is_pasid_si(pc_info) ? (pc_info->pasid == pasid) : 1)) {
> > 
> > This chunk is a bit odd to me.  How about something like this?
> > 
> >   ...
> > 
> >   if (!pc_entry->pasid_cache_gen)
> >     return false;
> > 
> >   switch (pc_info->flags) {
> >     case DOMAIN:
> >       if (pc_info->domain_id != did) {
> >         return false;
> >       }
> >       break;
> >     case PASID:
> >       if (pc_info->pasid != pasid) {
> >         return false;
> >       }
> >       break;
> >     ... (I think you'll add more in the follow up patches)
> >   }
> 
> yep, I can do it.
> 
> > > +        /*
> > > +         * Modify pasid_cache_gen to be 0, the cached pasid entry in
> > > +         * vtd_pasid_as instance is invalid. And vtd_pasid_as instance
> > > +         * would be treated as invalid in QEMU scope until the pasid
> > > +         * cache gen is updated in a new pasid binding or updated in
> > > +         * below logic if found guest pasid entry exists.
> > > +         */
> > > +        remove = true;
> > 
> > Why set remove here?  Should we set it only if we found that the entry
> > is cleared?
> 
> Yes, you are right. But it is only for passthru devices. For emulated
> sva-capable device, I think it would simple to always remove cached
> pasid-entry if guest issues pasid cache invalidation. This is because
> caching-mode is not necessary for emulated devices, which means pasid
> cache invalidation for emulated devices only means cache flush. This
> is enough as the pasid entry can be re-cached during PASID tagged DMA
> translation in do_translate(). Although it is not yet added as I
> mentioned in the patch commit message. While for passthru devices,
> pasid cache invalidation does not only mean cache flush. Instead, it
> relies on the latest guest pasid entry presence status.
> 
> Based on the above idea, I make the remove=true at the beginning, and
> if the subsequent logic finds out it is for passthru devices, it will
> check guest pasid entry and then decide how to handle the pasid cache
> invalidation request. "remove" will be set to be false when guest pasid
> entry exists.

So if you use the logic as [1] below imho it'll both work for emulated
and assigned devices.  It's a bit odd to add "fast path" for emulated
device to me because emulated device is destined to be slow already
and mostly for debugging purpose, rather than any real use in
production, afaiu...

> 
> > 
> > > +        pc_entry->pasid_cache_gen = 0;
> > > +        if (vtd_bus->dev_ic[devfn]) {
> > > +            if (!vtd_dev_get_pe_from_pasid(s,
> > > +                      pci_bus_num(vtd_bus->bus), devfn, pasid, &pe)) {
> > > +                /*
> > > +                 * pasid entry exists, so keep the vtd_pasid_as, and needs
> > > +                 * update the pasid entry cached in vtd_pasid_as. Also, if
> > > +                 * the guest pasid entry doesn't equal to cached pasid entry
> > > +                 * needs to issue a pasid bind to host for passthru devies.
> > > +                 */
> > > +                remove = false;
> > > +                pc_entry->pasid_cache_gen = s->pasid_cache_gen;
> > > +                if (!vtd_pasid_entry_compare(&pe, &pc_entry->pasid_entry)) {
> > > +                    pc_entry->pasid_entry = pe;
> > 
> > What if the pasid entry changed from valid to all zeros?  Should we
> > unbind/remove it as well?
> 
> If it is from valid to all zero, vtd_dev_get_pe_from_pasid() should
> return non-zero.

Why?  Shouldn't it returns zero but with a &pe that contains all zero instead?

Feel free to skip this question if you're going to refactor this piece
of code, then we can review that.

> Then it would keep "remove"=true and pasid_cache_gen=0.
> Unbind will be added with bind in patch 15. Here just handle the cached
> pasid entry within vIOMMU.
> 
> 0015-intel_iommu-bind-unbind-guest-page-table-to-host.patch
> 
> > 
> > > +                    /*
> > > +                     * TODO: when pasid-base-iotlb(piotlb) infrastructure is
> > > +                     * ready, should invalidate QEMU piotlb togehter with this
> > > +                     * change.
> > > +                     */
> > > +                }
> > > +            }
> > > +        }
> > > +    }
> > > +
> > > +    return remove;
> > 
> > In summary, IMHO this chunk could be clearer if like this:
> > 
> >   ... (continues with above pesudo code)
> > 
> >   ret = vtd_dev_get_pe_from_pasid(..., &pe);
> >   if (ret) {
> >     goto remove;
> >   }
> >   // detected correct pasid entry
> >   if (!vtd_pasid_entry_compare(&pe, ...)) {
> >      // pasid entry changed
> >      if (vtd_pasid_cleared(&pe)) {
> >        // the pasid is cleared to all zero, drop
> >        goto remove;
> >      }
> >      // a new pasid is setup
> > 
> >      // Send UNBIND if cache valid
> >      ...
> >      // Send BIND
> >      ...
> >      // Update cache
> >      pc_entry->pasid_entry = pe;
> >      pc_entry->pasid_cache_gen = s->pasid_cache_gen;

(I think I missed a "return false" here...)

> >   }
> > 
> > remove:
> >   // Send UNBIND if cache valid
> >   ...
> >   return true;
> 
> yep, I can do it. nice idea. :-)

[1]

> 
> > I feel like you shouldn't bother checking against
> > vtd_bus->dev_ic[devfn] at all here because if that was set then it
> > means we need to pass these information down to host, and it'll be
> > checked automatically because when we send BIND/UNBIND event we'll
> > definitely check that too otherwise those events will be noops.
> 
> I need it. Because I want to differ passthru devices and emulated
> devices. Ideally, emulated devices won't have vtd_bus->dev_ic[devfn].

I've commented above with the same topic - IMHO we don't need fast
path for emulated devices.  Readability could matter more at least to me.
Better readability could also mean less chance of bugs.

> 
> > > +}
> > > +
> > >  static int vtd_pasid_cache_dsi(IntelIOMMUState *s, uint16_t domain_id)
> > >  {
> > > +    VTDPASIDCacheInfo pc_info;
> > > +
> > > +    trace_vtd_pasid_cache_dsi(domain_id);
> > > +
> > > +    pc_info.flags = VTD_PASID_CACHE_DOMSI;
> > > +    pc_info.domain_id = domain_id;
> > > +
> > > +    /*
> > > +     * Loop all existing pasid caches and update them.
> > > +     */
> > > +    vtd_iommu_lock(s);
> > > +    g_hash_table_foreach_remove(s->vtd_pasid_as, vtd_flush_pasid, &pc_info);
> > > +    vtd_iommu_unlock(s);
> > > +
> > > +    /*
> > > +     * TODO: Domain selective PASID cache invalidation
> > > +     * may be issued wrongly by programmer, to be safe,
> > 
> > IMHO it's not wrong even if the guest sends that, because logically
> > the guest can send invalidation as it wishes, and we should have
> > similar issue before on the 2nd level page table invalidations... and
> > that's why we need to keep the iova mapping inside qemu I suppose...
> 
> yes, we are aligned on this point. I can update the description above.
> 
> > 
> > > +     * after invalidating the pasid caches, emulator
> > > +     * needs to replay the pasid bindings by walking guest
> > > +     * pasid dir and pasid table.
> > 
> > This is true...
> 
> handshake here.
> 
> > 
> > > +     */
> > >      return 0;
> > >  }
> > >
> > > +/**
> > > + * This function finds or adds a VTDPASIDAddressSpace for a device
> > > + * when it is bound to a pasid. Caller of this function should hold
> > > + * iommu_lock.
> > > + */
> > > +static VTDPASIDAddressSpace *vtd_add_find_pasid_as(IntelIOMMUState *s,
> > > +                                                   VTDBus *vtd_bus,
> > > +                                                   int devfn,
> > > +                                                   uint32_t pasid,
> > > +                                                   bool allocate)
> > > +{
> > > +    struct pasid_key *key;
> > > +    struct pasid_key *new_key;
> > > +    VTDPASIDAddressSpace *vtd_pasid_as;
> > > +    uint16_t sid;
> > > +
> > > +    sid = vtd_make_source_id(pci_bus_num(vtd_bus->bus), devfn);
> > > +    key = vtd_get_pasid_key(pasid, sid);
> > > +    vtd_pasid_as = g_hash_table_lookup(s->vtd_pasid_as, key);
> > > +
> > > +    if (!vtd_pasid_as && allocate) {
> > > +        new_key = vtd_get_pasid_key(pasid, sid);
> > 
> > Is this the same as key no matter what?
> 
> It is the key content matters. I'd need to refine the key alloc/free in
> next version.
> 
> > 
> > > +        /*
> > > +         * Initiate the vtd_pasid_as structure.
> > > +         *
> > > +         * This structure here is used to track the guest pasid
> > > +         * binding and also serves as pasid-cache mangement entry.
> > > +         *
> > > +         * TODO: in future, if wants to support the SVA-aware DMA
> > > +         *       emulation, the vtd_pasid_as should be fully initialized.
> > > +         *       e.g. the address_space and memory region fields.
> > > +         */
> > > +        vtd_pasid_as = g_malloc0(sizeof(VTDPASIDAddressSpace));
> > > +        vtd_pasid_as->iommu_state = s;
> > > +        vtd_pasid_as->vtd_bus = vtd_bus;
> > > +        vtd_pasid_as->devfn = devfn;
> > > +        vtd_pasid_as->context_cache_entry.context_cache_gen = 0;
> > > +        vtd_pasid_as->pasid = pasid;
> > > +        vtd_pasid_as->pasid_cache_entry.pasid_cache_gen = 0;
> > > +        g_hash_table_insert(s->vtd_pasid_as, new_key, vtd_pasid_as);
> > > +    }
> > > +    return vtd_pasid_as;
> > > +}
> > > +
> > > + /**
> > > +  * This function updates the pasid entry cached in &vtd_pasid_as.
> > > +  * Caller of this function should hold iommu_lock.
> > > +  */
> > > +static inline void vtd_fill_in_pe_cache(
> > > +              VTDPASIDAddressSpace *vtd_pasid_as, VTDPASIDEntry *pe)
> > > +{
> > > +    IntelIOMMUState *s = vtd_pasid_as->iommu_state;
> > > +    VTDPASIDCacheEntry *pc_entry = &vtd_pasid_as->pasid_cache_entry;
> > > +
> > > +    pc_entry->pasid_entry = *pe;
> > > +    pc_entry->pasid_cache_gen = s->pasid_cache_gen;
> > > +}
> > > +
> > >  static int vtd_pasid_cache_psi(IntelIOMMUState *s,
> > >                                 uint16_t domain_id, uint32_t pasid)
> > >  {
> > > +    VTDPASIDCacheInfo pc_info;
> > > +    VTDPASIDEntry pe;
> > > +    VTDBus *vtd_bus;
> > > +    int bus_n, devfn;
> > > +    VTDPASIDAddressSpace *vtd_pasid_as;
> > > +    VTDIOMMUContext *vtd_ic;
> > > +
> > > +    pc_info.flags = VTD_PASID_CACHE_DOMSI;
> > > +    pc_info.domain_id = domain_id;
> > > +    pc_info.flags |= VTD_PASID_CACHE_PASIDSI;
> > > +    pc_info.pasid = pasid;
> > > +
> > > +    /*
> > > +     * Regards to a pasid selective pasid cache invalidation (PSI), it
> > > +     * could be either cases of below:
> > > +     * a) a present pasid entry moved to non-present
> > > +     * b) a present pasid entry to be a present entry
> > > +     * c) a non-present pasid entry moved to present
> > > +     *
> > > +     * Here the handling of a PSI is:
> > > +     * 1) loop all the exisitng vtd_pasid_as instances to update them
> > > +     *    according to the latest guest pasid entry in pasid table.
> > > +     *    this will make sure affected existing vtd_pasid_as instances
> > > +     *    cached the latest pasid entries. Also, during the loop, the
> > > +     *    host should be notified if needed. e.g. pasid unbind or pasid
> > > +     *    update. Should be able to cover case a) and case b).
> > > +     *
> > > +     * 2) loop all devices to cover case c)
> > > +     *    However, it is not good to always loop all devices. In this
> > > +     *    implementation. We do it in this ways:
> > > +     *    - For devices which have VTDIOMMUContext instances, we loop
> > > +     *      them and check if guest pasid entry exists. If yes, it is
> > > +     *      case c), we update the pasid cache and also notify host.
> > > +     *    - For devices which have no VTDIOMMUContext instances, it is
> > > +     *      not necessary to create pasid cache at this phase since it
> > > +     *      could be created when vIOMMU do DMA address translation.
> > > +     *      This is not implemented yet since no PASID-capable emulated
> > > +     *      devices today. If we have it in future, the pasid cache shall
> > > +     *      be created there.
> > > +     */
> > > +
> > > +    vtd_iommu_lock(s);
> > > +    g_hash_table_foreach_remove(s->vtd_pasid_as, vtd_flush_pasid, &pc_info);
> > > +    vtd_iommu_unlock(s);
> > 
> > [2]
> > 
> > > +
> > > +    vtd_iommu_lock(s);
> > 
> > Do you want to explicitly release the lock for other thread?
> > Otherwise I don't see a point to unlock/lock in sequence..
> 
> I felt like to have shorter protected snippets. But, I don’t have strong
> reason either after reconsidering it. I'll remove it anyhow.
> 
> > > +    QLIST_FOREACH(vtd_ic, &s->vtd_dev_ic_list, next) {
> > > +        vtd_bus = vtd_ic->vtd_bus;
> > > +        devfn = vtd_ic->devfn;
> > > +        bus_n = pci_bus_num(vtd_bus->bus);
> > > +
> > > +        /* Step 1: fetch vtd_pasid_as and check if it is valid */
> > > +        vtd_pasid_as = vtd_add_find_pasid_as(s, vtd_bus,
> > > +                                        devfn, pasid, true);
> > > +        if (vtd_pasid_as &&
> > > +            (s->pasid_cache_gen ==
> > > +             vtd_pasid_as->pasid_cache_entry.pasid_cache_gen)) {
> > > +            /*
> > > +             * pasid_cache_gen equals to s->pasid_cache_gen means
> > > +             * vtd_pasid_as is valid after the above s->vtd_pasid_as
> > > +             * updates. Thus no need for the below steps.
> > > +             */
> > > +            continue;
> > > +        }
> > > +
> > > +        /*
> > > +         * Step 2: vtd_pasid_as is not valid, it's potentailly a
> > > +         * new pasid bind. Fetch guest pasid entry.
> > > +         */
> > > +        if (vtd_dev_get_pe_from_pasid(s, bus_n, devfn, pasid, &pe)) {
> > > +            continue;
> > > +        }
> > > +
> > > +        /*
> > > +         * Step 3: pasid entry exists, update pasid cache
> > > +         *
> > > +         * Here need to check domain ID since guest pasid entry
> > > +         * exists. What needs to do are:
> > > +         *   - update the pc_entry in the vtd_pasid_as
> > > +         *   - set proper pc_entry.pasid_cache_gen
> > > +         *   - passdown the latest guest pasid entry config to host
> > > +         *     (will be added in later patch)
> > > +         */
> > > +        if (domain_id == vtd_pe_get_domain_id(&pe)) {
> > > +            vtd_fill_in_pe_cache(vtd_pasid_as, &pe);
> > > +        }
> > > +    }
> > 
> > Could you explain why do we need this whole chunk if with [2] above?
> > I feel like that'll do all the things we need already (send
> > BIND/UNBIND, update pasid entry cache).
> 
> You may refer to the comments added for this function in the patch
> itself. Also, I'd like to talk more to assist your review. The basic
> idea is that the above chunk [2] only handles the already cached
> pasid-entries. right? It covers the modifications from present pasid
> entry to be either non-present or present modifications. While for a
> non-present pasid entry to present modification, chunk [2] has no idea.
> To cover such possibilities, needs to loop all devices and check the
> corresponding pasid entries. This is what I proposed in RFC v1. But I
> don’t like it. To be more efficient, I think we can just loop all
> passthru devices since only passthru devices care about the
> non-present to present changes. For emulated devices, its pasid cache
> can be created in do_translate() for emulated PASID tagged DMAs.

Ok I see your point, thanks for explaining!

Two loops are still a bit messy though.  How about we do this:

  - add another patch to keep a list of all the devices that are under
    the IOMMU (e.g. link the VTDAddressSpace in vtd_find_add_as when
    created), then

  - in this patch instead of looping over two, we just loop over all
    the devices (emulated + assigned), we do vtd_flush_pasid() or
    similar thing to update the pasid cache in a common way

Then I think we don't need two loops, but only once to traverse the
list that contains all devices.  Do you think it could be a bit cleaner?

Thanks,

> 
> > 
> > > +    vtd_iommu_unlock(s);
> > >      return 0;
> > >  }

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC v2 10/22] intel_iommu: add virtual command capability support
  2019-11-06 14:00       ` Peter Xu
@ 2019-11-12  6:27         ` Liu, Yi L
  0 siblings, 0 replies; 79+ messages in thread
From: Liu, Yi L @ 2019-11-12  6:27 UTC (permalink / raw)
  To: Peter Xu
  Cc: Tian, Kevin, jacob.jun.pan, Yi Sun, kvm, mst, Tian, Jun J,
	qemu-devel, eric.auger, alex.williamson, pbonzini, Sun, Yi Y,
	david

> From: Peter Xu <peterx@redhat.com>
> Sent: Wednesday, November 6, 2019 10:01 PM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v2 10/22] intel_iommu: add virtual command capability support
> 
> On Wed, Nov 06, 2019 at 12:40:41PM +0000, Liu, Yi L wrote:
> > >
> > > Do you know what should happen on bare-metal from spec-wise that when
> > > the guest e.g. writes 2 bytes to these mmio regions?
> >
> > I've no idea to your question. It is not a bare-metal capability. Personally, I
> > prefer to have a toggle bit to mark the full written of a cmd to VMCD_REG.
> > Reason is that we have no control on guest software. It may write new cmd
> > to VCMD_REG in a bad manner. e.g. write high 32 bits first and then write the
> > low 32 bits. Then it will have two traps. Apparently, for the first trap, it fills
> > in the VCMD_REG and no need to handle it since it is not a full written. I'm
> > checking it and evaluating it. How do you think on it?
> 
> Oh I just noticed that vtd_mem_ops.min_access_size==4 now so writting
> 2B should never happen at least.  Then we'll bail out at
> memory_region_access_valid().  Seems fine.

got it.

> 
> >
> > >
> > > > +        if (!vtd_handle_vcmd_write(s, val)) {
> > > > +            vtd_set_long(s, addr, val);
> > > > +        }
> > > > +        break;
> > > > +
> > > >      default:
> > > >          if (size == 4) {
> > > >              vtd_set_long(s, addr, val);
> > > > @@ -3617,7 +3769,8 @@ static void vtd_init(IntelIOMMUState *s)
> > > >              s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
> > > >          } else if (!strcmp(s->scalable_mode, "modern")) {
> > > >              s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_PASID
> > > > -                       | VTD_ECAP_FLTS | VTD_ECAP_PSS;
> > > > +                       | VTD_ECAP_FLTS | VTD_ECAP_PSS | VTD_ECAP_VCS;
> > > > +            s->vccap |= VTD_VCCAP_PAS;
> > > >          }
> > > >      }
> > > >
> > >
> > > [...]
> > >
> > > > +#define VTD_VCMD_CMD_MASK           0xffUL
> > > > +#define VTD_VCMD_PASID_VALUE(val)   (((val) >> 8) & 0xfffff)
> > > > +
> > > > +#define VTD_VCRSP_RSLT(val)         ((val) << 8)
> > > > +#define VTD_VCRSP_SC(val)           (((val) & 0x3) << 1)
> > > > +
> > > > +#define VTD_VCMD_UNDEFINED_CMD         1ULL
> > > > +#define VTD_VCMD_NO_AVAILABLE_PASID    2ULL
> > >
> > > According to 10.4.44 - should this be 1?
> >
> > It's 2 now per VT-d spec 3.1 (2019 June). I should have mentioned it in the cover
> > letter...
> 
> Well you're right... I hope there won't be other "major" things get
> changed otherwise it'll be really a pain of working on all of these
> before things settle...

As far as I know, only this part has significant change. Other parts are consistent.
I'll mention spec version next time in the cover letter.

Thanks,
Yi Liu


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC v2 09/22] vfio/pci: add iommu_context notifier for pasid alloc/free
  2019-11-06 12:14     ` Liu, Yi L
@ 2019-11-20  4:27       ` David Gibson
  2019-11-26  7:07         ` Liu, Yi L
  0 siblings, 1 reply; 79+ messages in thread
From: David Gibson @ 2019-11-20  4:27 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Tian, Kevin, jacob.jun.pan, Yi Sun, kvm, mst, Tian, Jun J,
	qemu-devel, peterx, eric.auger, alex.williamson, pbonzini, Sun,
	Yi Y

[-- Attachment #1: Type: text/plain, Size: 6815 bytes --]

On Wed, Nov 06, 2019 at 12:14:50PM +0000, Liu, Yi L wrote:
> > From: David Gibson [mailto:david@gibson.dropbear.id.au]
> > Sent: Tuesday, October 29, 2019 8:16 PM
> > To: Liu, Yi L <yi.l.liu@intel.com>
> > Subject: Re: [RFC v2 09/22] vfio/pci: add iommu_context notifier for pasid alloc/free
> > 
> > On Thu, Oct 24, 2019 at 08:34:30AM -0400, Liu Yi L wrote:
> > > This patch adds pasid alloc/free notifiers for vfio-pci. It is
> > > supposed to be fired by vIOMMU. VFIO then sends PASID allocation
> > > or free request to host.
> > >
> > > Cc: Kevin Tian <kevin.tian@intel.com>
> > > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > Cc: Peter Xu <peterx@redhat.com>
> > > Cc: Eric Auger <eric.auger@redhat.com>
> > > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > > Cc: David Gibson <david@gibson.dropbear.id.au>
> > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > > ---
> > >  hw/vfio/common.c         |  9 ++++++
> > >  hw/vfio/pci.c            | 81
> > ++++++++++++++++++++++++++++++++++++++++++++++++
> > >  include/hw/iommu/iommu.h | 15 +++++++++
> > >  3 files changed, 105 insertions(+)
> > >
> > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > > index d418527..e6ad21c 100644
> > > --- a/hw/vfio/common.c
> > > +++ b/hw/vfio/common.c
> > > @@ -1436,6 +1436,7 @@ static void vfio_disconnect_container(VFIOGroup
> > *group)
> > >      if (QLIST_EMPTY(&container->group_list)) {
> > >          VFIOAddressSpace *space = container->space;
> > >          VFIOGuestIOMMU *giommu, *tmp;
> > > +        VFIOIOMMUContext *giommu_ctx, *ctx;
> > >
> > >          QLIST_REMOVE(container, next);
> > >
> > > @@ -1446,6 +1447,14 @@ static void vfio_disconnect_container(VFIOGroup
> > *group)
> > >              g_free(giommu);
> > >          }
> > >
> > > +        QLIST_FOREACH_SAFE(giommu_ctx, &container->iommu_ctx_list,
> > > +                                                   iommu_ctx_next, ctx) {
> > > +            iommu_ctx_notifier_unregister(giommu_ctx->iommu_ctx,
> > > +                                                      &giommu_ctx->n);
> > > +            QLIST_REMOVE(giommu_ctx, iommu_ctx_next);
> > > +            g_free(giommu_ctx);
> > > +        }
> > > +
> > >          trace_vfio_disconnect_container(container->fd);
> > >          close(container->fd);
> > >          g_free(container);
> > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > index 12fac39..8721ff6 100644
> > > --- a/hw/vfio/pci.c
> > > +++ b/hw/vfio/pci.c
> > > @@ -2699,11 +2699,80 @@ static void
> > vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
> > >      vdev->req_enabled = false;
> > >  }
> > >
> > > +static void vfio_register_iommu_ctx_notifier(VFIOPCIDevice *vdev,
> > > +                                             IOMMUContext *iommu_ctx,
> > > +                                             IOMMUCTXNotifyFn fn,
> > > +                                             IOMMUCTXEvent event)
> > > +{
> > > +    VFIOContainer *container = vdev->vbasedev.group->container;
> > > +    VFIOIOMMUContext *giommu_ctx;
> > > +
> > > +    giommu_ctx = g_malloc0(sizeof(*giommu_ctx));
> > > +    giommu_ctx->container = container;
> > > +    giommu_ctx->iommu_ctx = iommu_ctx;
> > > +    QLIST_INSERT_HEAD(&container->iommu_ctx_list,
> > > +                      giommu_ctx,
> > > +                      iommu_ctx_next);
> > > +    iommu_ctx_notifier_register(iommu_ctx,
> > > +                                &giommu_ctx->n,
> > > +                                fn,
> > > +                                event);
> > > +}
> > > +
> > > +static void vfio_iommu_pasid_alloc_notify(IOMMUCTXNotifier *n,
> > > +                                          IOMMUCTXEventData *event_data)
> > > +{
> > > +    VFIOIOMMUContext *giommu_ctx = container_of(n, VFIOIOMMUContext, n);
> > > +    VFIOContainer *container = giommu_ctx->container;
> > > +    IOMMUCTXPASIDReqDesc *pasid_req =
> > > +                              (IOMMUCTXPASIDReqDesc *) event_data->data;
> > > +    struct vfio_iommu_type1_pasid_request req;
> > > +    unsigned long argsz;
> > > +    int pasid;
> > > +
> > > +    argsz = sizeof(req);
> > > +    req.argsz = argsz;
> > > +    req.flag = VFIO_IOMMU_PASID_ALLOC;
> > > +    req.min_pasid = pasid_req->min_pasid;
> > > +    req.max_pasid = pasid_req->max_pasid;
> > > +
> > > +    pasid = ioctl(container->fd, VFIO_IOMMU_PASID_REQUEST, &req);
> > > +    if (pasid < 0) {
> > > +        error_report("%s: %d, alloc failed", __func__, -errno);
> > > +    }
> > > +    pasid_req->alloc_result = pasid;
> > 
> > Altering the event data from the notifier doesn't make sense.  By
> > definition there can be multiple notifiers on the chain, so in that
> > case which one is responsible for updating the writable field?
> 
> I guess you mean multiple pasid_alloc nofitiers. right?
> 
> It works for VT-d now, as Intel vIOMMU maintains the IOMMUContext
> per-bdf. And there will be only 1 pasid_alloc notifier in the chain. But, I
> agree it is not good if other module just share an IOMMUConext across
> devices. Definitely, it would have multiple pasid_alloc notifiers.

Right.

> How about enforcing IOMMUContext layer to only invoke one successful
> pasid_alloc/free notifier if PASID_ALLOC/FREE event comes? pasid
> alloc/free are really special as it requires feedback. And a potential
> benefit is that the pasid_alloc/free will not be affected by hot plug
> scenario. There will be always a notifier to work for pasid_alloc/free
> work unless all passthru devices are hot plugged. How do you think? Or
> if any other idea?

Hrm, that still doesn't seem right to me.  I don't think a notifier is
really the right mechanism for something that needs to return values.
This seems like something where you need to find a _single_
responsible object and call a method / callback on that specifically.

But it seems to me there's a more fundamental problem here.  AIUI the
idea is that a single IOMMUContext could hold multiple devices.  But
if the devices are responsible for assigning their own pasid values
(by passing that decisionon to the host through vfio) then that really
can't work.

I'm assuming it's impossible from the hardware side to virtualize the
pasids (so that we could assign them from qemu without host
intervention).

If so, then the pasid allocation really has to be a Context level, not
device level operation.  We'd have to wire the VFIO backend up to the
context itself, not a device... I'm not immediately sure how to do
that, though.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC v2 09/22] vfio/pci: add iommu_context notifier for pasid alloc/free
  2019-11-20  4:27       ` David Gibson
@ 2019-11-26  7:07         ` Liu, Yi L
  0 siblings, 0 replies; 79+ messages in thread
From: Liu, Yi L @ 2019-11-26  7:07 UTC (permalink / raw)
  To: David Gibson
  Cc: Tian, Kevin, jacob.jun.pan, Yi Sun, kvm, mst, Tian, Jun J,
	qemu-devel, peterx, eric.auger, alex.williamson, pbonzini, Sun,
	Yi Y

Hi David,

> From: David Gibson < david@gibson.dropbear.id.au>
> Sent: Wednesday, November 20, 2019 12:28 PM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v2 09/22] vfio/pci: add iommu_context notifier for pasid alloc/free
> 
> On Wed, Nov 06, 2019 at 12:14:50PM +0000, Liu, Yi L wrote:
> > > From: David Gibson [mailto:david@gibson.dropbear.id.au]
> > > Sent: Tuesday, October 29, 2019 8:16 PM
> > > To: Liu, Yi L <yi.l.liu@intel.com>
> > > Subject: Re: [RFC v2 09/22] vfio/pci: add iommu_context notifier for pasid
> alloc/free
> > >
> > > On Thu, Oct 24, 2019 at 08:34:30AM -0400, Liu Yi L wrote:
> > > > This patch adds pasid alloc/free notifiers for vfio-pci. It is
> > > > supposed to be fired by vIOMMU. VFIO then sends PASID allocation
> > > > or free request to host.
> > > >
> > > > Cc: Kevin Tian <kevin.tian@intel.com>
> > > > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > > Cc: Peter Xu <peterx@redhat.com>
> > > > Cc: Eric Auger <eric.auger@redhat.com>
> > > > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > > > Cc: David Gibson <david@gibson.dropbear.id.au>
> > > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > > > ---
> > > >  hw/vfio/common.c         |  9 ++++++
> > > >  hw/vfio/pci.c            | 81
[...]
> > > > +
> > > > +static void vfio_iommu_pasid_alloc_notify(IOMMUCTXNotifier *n,
> > > > +                                          IOMMUCTXEventData *event_data)
> > > > +{
> > > > +    VFIOIOMMUContext *giommu_ctx = container_of(n, VFIOIOMMUContext,
> n);
> > > > +    VFIOContainer *container = giommu_ctx->container;
> > > > +    IOMMUCTXPASIDReqDesc *pasid_req =
> > > > +                              (IOMMUCTXPASIDReqDesc *) event_data->data;
> > > > +    struct vfio_iommu_type1_pasid_request req;
> > > > +    unsigned long argsz;
> > > > +    int pasid;
> > > > +
> > > > +    argsz = sizeof(req);
> > > > +    req.argsz = argsz;
> > > > +    req.flag = VFIO_IOMMU_PASID_ALLOC;
> > > > +    req.min_pasid = pasid_req->min_pasid;
> > > > +    req.max_pasid = pasid_req->max_pasid;
> > > > +
> > > > +    pasid = ioctl(container->fd, VFIO_IOMMU_PASID_REQUEST, &req);
> > > > +    if (pasid < 0) {
> > > > +        error_report("%s: %d, alloc failed", __func__, -errno);
> > > > +    }
> > > > +    pasid_req->alloc_result = pasid;
> > >
> > > Altering the event data from the notifier doesn't make sense.  By
> > > definition there can be multiple notifiers on the chain, so in that
> > > case which one is responsible for updating the writable field?
> >
> > I guess you mean multiple pasid_alloc nofitiers. right?
> >
> > It works for VT-d now, as Intel vIOMMU maintains the IOMMUContext
> > per-bdf. And there will be only 1 pasid_alloc notifier in the chain. But, I
> > agree it is not good if other module just share an IOMMUConext across
> > devices. Definitely, it would have multiple pasid_alloc notifiers.
> 
> Right.
> 
> > How about enforcing IOMMUContext layer to only invoke one successful
> > pasid_alloc/free notifier if PASID_ALLOC/FREE event comes? pasid
> > alloc/free are really special as it requires feedback. And a potential
> > benefit is that the pasid_alloc/free will not be affected by hot plug
> > scenario. There will be always a notifier to work for pasid_alloc/free
> > work unless all passthru devices are hot plugged. How do you think? Or
> > if any other idea?
> 
> Hrm, that still doesn't seem right to me.  I don't think a notifier is
> really the right mechanism for something that needs to return values.
> This seems like something where you need to find a _single_
> responsible object and call a method / callback on that specifically.

Agreed. For alloc/free operations, we need an explicit calling instead
of notifier which is usally to be a chain notification.

> But it seems to me there's a more fundamental problem here.  AIUI the
> idea is that a single IOMMUContext could hold multiple devices.  But
> if the devices are responsible for assigning their own pasid values
> (by passing that decisionon to the host through vfio) then that really
> can't work.
>
> I'm assuming it's impossible from the hardware side to virtualize the
> pasids (so that we could assign them from qemu without host
> intervention).

Actually, this is possible. On Intel platform, we've introduced ENQCMD
to do PASID translation which essentially supports PASID virtualization.
You may get more details in section 3.3. This is also why we want to have
host's intervention in PASID alloc/free.

https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf

> If so, then the pasid allocation really has to be a Context level, not
> device level operation.  We'd have to wire the VFIO backend up to the
> context itself, not a device... I'm not immediately sure how to do
> that, though.

I think for the pasid alloc/free, we want it to be a vfio container
operation. right? However, we cannot expose vfio container out of vfio
or we don't want to do such thing. Then I'm wondering if we can have
a PASIDObject which is allocated per container creation, and registered
to vIOMMU. The PASIDObject can provide pasid alloc/free ops. vIOMMU can
consume the ops to get host pasid or free a host pasid.

While for the current IOMMUContext in this patchset, I think we may keep
it to support bind_gpasid and iommu_cache_invalidate. Also, as far as I
can see, we may want to extend it to support host IOMMU translation fault
injection to vIOMMU. This is also an important operation after config
nested translation for vIOMMU (a.k.a. dual stage translation).

> --
> David Gibson                  | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au        | minimalist, thank you.  NOT _the_ _other_
>                               | _way_ _around_!
> http://www.ozlabs.org/~dgibson

Thanks,
Yi Liu


^ permalink raw reply	[flat|nested] 79+ messages in thread

end of thread, other threads:[~2019-11-26  7:26 UTC | newest]

Thread overview: 79+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-24 12:34 [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
2019-10-24 12:34 ` [RFC v2 01/22] update-linux-headers: Import iommu.h Liu Yi L
2019-10-24 12:34 ` [RFC v2 02/22] header update VFIO/IOMMU vSVA APIs against 5.4.0-rc3+ Liu Yi L
2019-10-24 12:34 ` [RFC v2 03/22] intel_iommu: modify x-scalable-mode to be string option Liu Yi L
2019-11-01 14:57   ` Peter Xu
2019-11-05  9:14     ` Liu, Yi L
2019-11-05 12:50       ` Peter Xu
2019-11-06  9:50         ` Liu, Yi L
2019-10-24 12:34 ` [RFC v2 04/22] hw/iommu: introduce IOMMUContext Liu Yi L
2019-10-27 17:39   ` David Gibson
2019-11-06 11:18     ` Liu, Yi L
2019-10-24 12:34 ` [RFC v2 05/22] vfio/common: add iommu_ctx_notifier in container Liu Yi L
2019-11-01 14:58   ` Peter Xu
2019-11-06 11:08     ` Liu, Yi L
2019-10-24 12:34 ` [RFC v2 06/22] hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps Liu Yi L
2019-10-27 17:43   ` David Gibson
2019-11-06  8:18     ` Liu, Yi L
2019-11-01 18:09   ` Peter Xu
2019-11-06  8:15     ` Liu, Yi L
2019-10-24 12:34 ` [RFC v2 07/22] hw/pci: introduce pci_device_iommu_context() Liu Yi L
2019-10-29 11:50   ` David Gibson
2019-11-06  8:20     ` Liu, Yi L
2019-11-01 18:09   ` Peter Xu
2019-11-06  8:14     ` Liu, Yi L
2019-10-24 12:34 ` [RFC v2 08/22] intel_iommu: provide get_iommu_context() callback Liu Yi L
2019-11-01 14:55   ` Peter Xu
2019-11-06 11:07     ` Liu, Yi L
2019-10-24 12:34 ` [RFC v2 09/22] vfio/pci: add iommu_context notifier for pasid alloc/free Liu Yi L
2019-10-29 12:15   ` David Gibson
2019-11-01 17:26     ` Peter Xu
2019-11-06 12:46       ` Liu, Yi L
2019-11-06 12:14     ` Liu, Yi L
2019-11-20  4:27       ` David Gibson
2019-11-26  7:07         ` Liu, Yi L
2019-10-24 12:34 ` [RFC v2 10/22] intel_iommu: add virtual command capability support Liu Yi L
2019-11-01 18:05   ` Peter Xu
2019-11-06 12:40     ` Liu, Yi L
2019-11-06 14:00       ` Peter Xu
2019-11-12  6:27         ` Liu, Yi L
2019-10-24 12:34 ` [RFC v2 11/22] intel_iommu: process pasid cache invalidation Liu Yi L
2019-11-02 16:05   ` Peter Xu
2019-11-06  5:55     ` Liu, Yi L
2019-10-24 12:34 ` [RFC v2 12/22] intel_iommu: add present bit check for pasid table entries Liu Yi L
2019-11-02 16:20   ` Peter Xu
2019-11-06  8:14     ` Liu, Yi L
2019-10-24 12:34 ` [RFC v2 13/22] intel_iommu: add PASID cache management infrastructure Liu Yi L
2019-11-04 17:08   ` Peter Xu
2019-11-04 20:06   ` Peter Xu
2019-11-06  7:56     ` Liu, Yi L
2019-11-07 15:46       ` Peter Xu
2019-10-24 12:34 ` [RFC v2 14/22] vfio/pci: add iommu_context notifier for pasid bind/unbind Liu Yi L
2019-11-04 16:02   ` David Gibson
2019-11-06 12:22     ` Liu, Yi L
2019-11-06 14:25       ` Peter Xu
2019-10-24 12:34 ` [RFC v2 15/22] intel_iommu: bind/unbind guest page table to host Liu Yi L
2019-11-04 20:25   ` Peter Xu
2019-11-06  8:10     ` Liu, Yi L
2019-11-06 14:27       ` Peter Xu
2019-10-24 12:34 ` [RFC v2 16/22] intel_iommu: replay guest pasid bindings " Liu Yi L
2019-10-24 12:34 ` [RFC v2 17/22] intel_iommu: replay pasid binds after context cache invalidation Liu Yi L
2019-10-24 12:34 ` [RFC v2 18/22] intel_iommu: do not passdown pasid bind for PASID #0 Liu Yi L
2019-10-24 12:34 ` [RFC v2 19/22] vfio/pci: add iommu_context notifier for PASID-based iotlb flush Liu Yi L
2019-10-24 12:34 ` [RFC v2 20/22] intel_iommu: process PASID-based iotlb invalidation Liu Yi L
2019-10-24 12:34 ` [RFC v2 21/22] intel_iommu: propagate PASID-based iotlb invalidation to host Liu Yi L
2019-10-24 12:34 ` [RFC v2 22/22] intel_iommu: process PASID-based Device-TLB invalidation Liu Yi L
2019-10-25  6:21 ` [RFC v2 00/22] intel_iommu: expose Shared Virtual Addressing to VM no-reply
2019-10-25  6:30 ` no-reply
2019-10-25  9:49 ` Jason Wang
2019-10-25 10:12   ` Tian, Kevin
2019-10-31  4:33     ` Jason Wang
2019-10-31  5:39       ` Tian, Kevin
2019-10-31 14:07       ` Liu, Yi L
2019-11-01  7:29         ` Jason Wang
2019-11-01  7:46           ` Tian, Kevin
2019-11-01  8:04             ` Jason Wang
2019-11-01  8:09               ` Jason Wang
2019-11-02  7:35                 ` Tian, Kevin
2019-11-04 17:22 ` Peter Xu
2019-11-05  9:09   ` Liu, Yi L

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).