KVM Archive on lore.kernel.org
* [RFC v1 00/18] intel_iommu: expose Shared Virtual Addressing to VM
@ 2019-07-05 11:01 Liu Yi L
  2019-07-05 11:01 ` [RFC v1 01/18] linux-headers: import iommu.h from kernel Liu Yi L
                   ` (17 more replies)
  0 siblings, 18 replies; 58+ messages in thread
From: Liu Yi L @ 2019-07-05 11:01 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: eric.auger, david, tianyu.lan, kevin.tian, yi.l.liu, jun.j.tian,
	yi.y.sun, kvm

Shared Virtual Addressing (SVA), also known as Shared Virtual Memory (SVM) on
Intel platforms, allows device DMA and applications to share an address space.
SVA can reduce programming complexity and enhance security.
This series is intended to expose the SVA capability to VMs, i.e. to share a
guest application's address space with passthrough devices. The whole SVA
virtualization requires QEMU/VFIO/IOMMU changes. This series contains the QEMU
changes; the VFIO and IOMMU changes are in separate series (listed under
"Related series").

The high-level architecture for SVA virtualization is as below:

    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest process CR3, FL only|
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush -
    '-------------'                       |
    |             |                       V
    |             |                CR3 in GPA
    '-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
    .-------------.  .----------------------.
    |   pIOMMU    |  | Bind FL for GVA-GPA  |
    |             |  '----------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.------------------------------.
    |             |   |SL for GPA-HPA, default domain|
    |             |   '------------------------------'
    '-------------'
Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables
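
As a rough illustration of the nested translation in the diagram above
(GVA to GPA through FL, then GPA to HPA through SL), here is a toy sketch.
Everything in it (ToyTable, toy_walk, toy_nested_translate, the 8-page
one-level tables) is hypothetical and only meant to show the two-step
lookup. Real VT-d page tables are multi-level, and the FL table pointers
themselves are GPAs that also go through SL; neither is modelled here.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_MASK  ((1ULL << TOY_PAGE_SHIFT) - 1)
#define TOY_NPAGES     8
#define TOY_INVALID    UINT64_MAX

/* One-level toy table: index is the input frame number, value is the
 * output frame number, or TOY_INVALID if the page is not mapped. */
typedef struct ToyTable {
    uint64_t pfn[TOY_NPAGES];
} ToyTable;

/* Translate one address through a single toy table. */
uint64_t toy_walk(const ToyTable *t, uint64_t addr)
{
    uint64_t idx = addr >> TOY_PAGE_SHIFT;

    if (idx >= TOY_NPAGES || t->pfn[idx] == TOY_INVALID) {
        return TOY_INVALID;
    }
    return (t->pfn[idx] << TOY_PAGE_SHIFT) | (addr & TOY_PAGE_MASK);
}

/* Nested translation: FL maps GVA to GPA, SL maps GPA to HPA. */
uint64_t toy_nested_translate(const ToyTable *fl, const ToyTable *sl,
                              uint64_t gva)
{
    uint64_t gpa = toy_walk(fl, gva);

    if (gpa == TOY_INVALID) {
        return TOY_INVALID;
    }
    return toy_walk(sl, gpa);
}
```

A fault in either stage makes the whole translation fail, which is why the
host has to know about both the guest-owned FL and its own SL.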

There are roughly four parts:
1. Introduce PCIPASIDOps to PCIDevice to support PASID-related operations
2. Pass PASID allocation and free down to the host
3. Pass guest PASID binding down to the host
4. Pass guest IOMMU cache invalidation down to the host

Related series:
[1] [PATCH v4 00/22]  Shared virtual address IOMMU and VT-d support:
https://lwn.net/Articles/790820/
<My series is based on this kernel series from Jacob Pan>

[2] [RFC PATCH 0/4] vfio: support Shared Virtual Addressing from Yi Liu

This work is based on collaboration with other developers on the IOMMU
mailing list. Notably,
[1] [RFC PATCH 00/20] Qemu: Extend intel_iommu emulator to support
Shared Virtual Memory from Yi Liu
https://www.spinics.net/lists/kvm/msg148798.html

[2] [RFC PATCH 0/8] Shared Virtual Memory virtualization for VT-d from Yi Liu
https://lists.linuxfoundation.org/pipermail/iommu/2017-April/021475.html

[3] [PATCH v3 00/12] Introduce new iommu notifier framework for virt-SVA
by Yi Liu
https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg00078.html

[4] [PATCH v6 00/22] SMMUv3 Nested Stage Setup by Eric Auger
https://lkml.org/lkml/2019/3/17/124

[5] [RFC v4 00/27] vSMMUv3/pSMMUv3 2 stage VFIO integration by Eric Auger
https://lists.sr.ht/~philmd/qemu/%3C20190527114203.2762-1-eric.auger%40redhat.com%3E

[6] [RFC PATCH 2/6] drivers core: Add I/O ASID allocator by Jean-Philippe
Brucker
https://www.spinics.net/lists/iommu/msg30639.html

Liu Yi L (18):
  linux-headers: import iommu.h from kernel
  linux-headers: import vfio.h from kernel
  hw/pci: introduce PCIPASIDOps to PCIDevice
  intel_iommu: add "sm_model" option
  vfio/pci: add pasid alloc/free implementation
  intel_iommu: support virtual command emulation and pasid request
  hw/pci: add pci_device_bind/unbind_gpasid
  vfio/pci: add vfio bind/unbind_gpasid implementation
  intel_iommu: process pasid cache invalidation
  intel_iommu: tag VTDAddressSpace instance with PASID
  intel_iommu: create VTDAddressSpace per BDF+PASID
  intel_iommu: bind/unbind guest page table to host
  intel_iommu: flush pasid cache after a DSI context cache flush
  hw/pci: add flush_pasid_iotlb() in PCIPASIDOps
  vfio/pci: adds support for PASID-based iotlb flush
  intel_iommu: add PASID-based iotlb invalidation support
  intel_iommu: propagate PASID-based iotlb flush to host
  intel_iommu: do not passdown pasid bind for PASID #0

 hw/i386/intel_iommu.c          | 811 ++++++++++++++++++++++++++++++++++++++++-
 hw/i386/intel_iommu_internal.h |  97 +++++
 hw/i386/trace-events           |   7 +
 hw/pci/pci.c                   |  95 +++++
 hw/vfio/pci.c                  | 138 +++++++
 include/hw/i386/intel_iommu.h  |  22 +-
 include/hw/pci/pci.h           |  27 ++
 linux-headers/linux/iommu.h    | 338 +++++++++++++++++
 linux-headers/linux/vfio.h     | 116 ++++++
 9 files changed, 1644 insertions(+), 7 deletions(-)
 create mode 100644 linux-headers/linux/iommu.h

-- 
2.7.4



* [RFC v1 01/18] linux-headers: import iommu.h from kernel
  2019-07-05 11:01 [RFC v1 00/18] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
@ 2019-07-05 11:01 ` Liu Yi L
  2019-07-05 11:01 ` [RFC v1 02/18] linux-headers: import vfio.h " Liu Yi L
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 58+ messages in thread
From: Liu Yi L @ 2019-07-05 11:01 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: eric.auger, david, tianyu.lan, kevin.tian, yi.l.liu, jun.j.tian,
	yi.y.sun, kvm, Jacob Pan, Yi Sun

This patch imports the vIOMMU-related definitions from the kernel's
uapi/iommu.h, e.g. IOMMU fault reporting, PASID allocation, guest
PASID bind and guest IOMMU cache flush.
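
The imported header documents which cache/granularity combinations are
valid for struct iommu_cache_invalidate_info. A minimal sketch of how a
caller might check a request against that table follows. The INV_* names
and inv_request_is_valid() are local stand-ins invented for this example
(they mirror the IOMMU_INV_GRANU_* and IOMMU_CACHE_INV_TYPE_* values from
the header) and are not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirrors of the granularities defined in the imported header. */
enum inv_granu {
    INV_GRANU_DOMAIN,
    INV_GRANU_PASID,
    INV_GRANU_ADDR,
    INV_GRANU_NR,
};

/* Local mirrors of the cache type bits defined in the imported header. */
#define INV_TYPE_IOTLB      (1u << 0)
#define INV_TYPE_DEV_IOTLB  (1u << 1)
#define INV_TYPE_PASID      (1u << 2)

/* For each granularity, the cache types marked "Y" in the header's table. */
const uint8_t inv_valid_caches[INV_GRANU_NR] = {
    [INV_GRANU_DOMAIN] = INV_TYPE_IOTLB | INV_TYPE_PASID,
    [INV_GRANU_PASID]  = INV_TYPE_IOTLB | INV_TYPE_DEV_IOTLB | INV_TYPE_PASID,
    [INV_GRANU_ADDR]   = INV_TYPE_IOTLB | INV_TYPE_DEV_IOTLB,
};

/* A request is valid only if every selected cache type supports the
 * requested granularity and at least one cache type is selected. */
bool inv_request_is_valid(uint8_t cache, uint8_t granularity)
{
    if (granularity >= INV_GRANU_NR || cache == 0) {
        return false;
    }
    return (cache & ~inv_valid_caches[granularity]) == 0;
}
```

Encoding the table as one bitmask per granularity makes the "all selected
caches must support the granularity" rule a single mask test.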

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
---
 linux-headers/linux/iommu.h | 338 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 338 insertions(+)
 create mode 100644 linux-headers/linux/iommu.h

diff --git a/linux-headers/linux/iommu.h b/linux-headers/linux/iommu.h
new file mode 100644
index 0000000..a9cdc63
--- /dev/null
+++ b/linux-headers/linux/iommu.h
@@ -0,0 +1,338 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * IOMMU user API definitions
+ */
+
+#ifndef _UAPI_IOMMU_H
+#define _UAPI_IOMMU_H
+
+#include <linux/types.h>
+
+#define IOMMU_FAULT_PERM_READ	(1 << 0) /* read */
+#define IOMMU_FAULT_PERM_WRITE	(1 << 1) /* write */
+#define IOMMU_FAULT_PERM_EXEC	(1 << 2) /* exec */
+#define IOMMU_FAULT_PERM_PRIV	(1 << 3) /* privileged */
+
+/* Generic fault types, can be expanded IRQ remapping fault */
+enum iommu_fault_type {
+	IOMMU_FAULT_DMA_UNRECOV = 1,	/* unrecoverable fault */
+	IOMMU_FAULT_PAGE_REQ,		/* page request fault */
+};
+
+enum iommu_fault_reason {
+	IOMMU_FAULT_REASON_UNKNOWN = 0,
+
+	/* Could not access the PASID table (fetch caused external abort) */
+	IOMMU_FAULT_REASON_PASID_FETCH,
+
+	/* PASID entry is invalid or has configuration errors */
+	IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
+
+	/*
+	 * PASID is out of range (e.g. exceeds the maximum PASID
+	 * supported by the IOMMU) or disabled.
+	 */
+	IOMMU_FAULT_REASON_PASID_INVALID,
+
+	/*
+	 * An external abort occurred fetching (or updating) a translation
+	 * table descriptor
+	 */
+	IOMMU_FAULT_REASON_WALK_EABT,
+
+	/*
+	 * Could not access the page table entry (Bad address),
+	 * actual translation fault
+	 */
+	IOMMU_FAULT_REASON_PTE_FETCH,
+
+	/* Protection flag check failed */
+	IOMMU_FAULT_REASON_PERMISSION,
+
+	/* access flag check failed */
+	IOMMU_FAULT_REASON_ACCESS,
+
+	/* Output address of a translation stage caused Address Size fault */
+	IOMMU_FAULT_REASON_OOR_ADDRESS,
+};
+
+/**
+ * struct iommu_fault_unrecoverable - Unrecoverable fault data
+ * @reason: reason of the fault, from &enum iommu_fault_reason
+ * @flags: parameters of this fault (IOMMU_FAULT_UNRECOV_* values)
+ * @pasid: Process Address Space ID
+ * @perm: Requested access permission used by the incoming transaction
+ *        (IOMMU_FAULT_PERM_* values)
+ * @addr: offending page address
+ * @fetch_addr: address that caused a fetch abort, if any
+ */
+struct iommu_fault_unrecoverable {
+	__u32	reason;
+#define IOMMU_FAULT_UNRECOV_PASID_VALID		(1 << 0)
+#define IOMMU_FAULT_UNRECOV_ADDR_VALID		(1 << 1)
+#define IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID	(1 << 2)
+	__u32	flags;
+	__u32	pasid;
+	__u32	perm;
+	__u64	addr;
+	__u64	fetch_addr;
+};
+
+/**
+ * struct iommu_fault_page_request - Page Request data
+ * @flags: encodes whether the corresponding fields are valid and whether this
+ *         is the last page in group (IOMMU_FAULT_PAGE_REQUEST_* values)
+ * @pasid: Process Address Space ID
+ * @grpid: Page Request Group Index
+ * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
+ * @addr: page address
+ * @private_data: device-specific private information
+ */
+struct iommu_fault_page_request {
+#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID	(1 << 0)
+#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE	(1 << 1)
+#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA	(1 << 2)
+	__u32	flags;
+	__u32	pasid;
+	__u32	grpid;
+	__u32	perm;
+	__u64	addr;
+	__u64	private_data[2];
+};
+
+/**
+ * struct iommu_fault - Generic fault data
+ * @type: fault type from &enum iommu_fault_type
+ * @padding: reserved for future use (should be zero)
+ * @event: Fault event, when @type is %IOMMU_FAULT_DMA_UNRECOV
+ * @prm: Page Request message, when @type is %IOMMU_FAULT_PAGE_REQ
+ */
+struct iommu_fault {
+	__u32	type;
+	__u32	padding;
+	union {
+		struct iommu_fault_unrecoverable event;
+		struct iommu_fault_page_request prm;
+	};
+};
+
+/**
+ * struct iommu_pasid_smmuv3 - ARM SMMUv3 Stream Table Entry stage 1 related
+ *     information
+ * @version: API version of this structure
+ * @s1fmt: STE s1fmt (format of the CD table: single CD, linear table
+ *         or 2-level table)
+ * @s1dss: STE s1dss (specifies the behavior when @pasid_bits != 0
+ *         and no PASID is passed along with the incoming transaction)
+ * @padding: reserved for future use (should be zero)
+ *
+ * The PASID table is referred to as the Context Descriptor (CD) table on ARM
+ * SMMUv3. Please refer to the ARM SMMU 3.x spec (ARM IHI 0070A) for full
+ * details.
+ */
+struct iommu_pasid_smmuv3 {
+#define PASID_TABLE_SMMUV3_CFG_VERSION_1 1
+	__u32	version;
+	__u8	s1fmt;
+	__u8	s1dss;
+	__u8	padding[2];
+};
+
+/**
+ * struct iommu_pasid_table_config - PASID table data used to bind guest PASID
+ *     table to the host IOMMU
+ * @version: API version to prepare for future extensions
+ * @format: format of the PASID table
+ * @base_ptr: guest physical address of the PASID table
+ * @pasid_bits: number of PASID bits used in the PASID table
+ * @config: indicates whether the guest translation stage must
+ *          be translated, bypassed or aborted.
+ * @padding: reserved for future use (should be zero)
+ * @smmuv3: table information when @format is %IOMMU_PASID_FORMAT_SMMUV3
+ */
+struct iommu_pasid_table_config {
+#define PASID_TABLE_CFG_VERSION_1 1
+	__u32	version;
+#define IOMMU_PASID_FORMAT_SMMUV3	1
+	__u32	format;
+	__u64	base_ptr;
+	__u8	pasid_bits;
+#define IOMMU_PASID_CONFIG_TRANSLATE	1
+#define IOMMU_PASID_CONFIG_BYPASS	2
+#define IOMMU_PASID_CONFIG_ABORT	3
+	__u8	config;
+	__u8    padding[6];
+	union {
+		struct iommu_pasid_smmuv3 smmuv3;
+	};
+};
+
+/* defines the granularity of the invalidation */
+enum iommu_inv_granularity {
+	IOMMU_INV_GRANU_DOMAIN,	/* domain-selective invalidation */
+	IOMMU_INV_GRANU_PASID,	/* PASID-selective invalidation */
+	IOMMU_INV_GRANU_ADDR,	/* page-selective invalidation */
+	IOMMU_INV_GRANU_NR,	/* number of invalidation granularities */
+};
+
+/**
+ * struct iommu_inv_addr_info - Address Selective Invalidation Structure
+ *
+ * @flags: indicates the granularity of the address-selective invalidation
+ * - If the PASID bit is set, the @pasid field is populated and the invalidation
+ *   relates to cache entries tagged with this PASID and matching the address
+ *   range.
+ * - If ARCHID bit is set, @archid is populated and the invalidation relates
+ *   to cache entries tagged with this architecture specific ID and matching
+ *   the address range.
+ * - Both PASID and ARCHID can be set as they may tag different caches.
+ * - If neither PASID or ARCHID is set, global addr invalidation applies.
+ * - The LEAF flag indicates whether only the leaf PTE caching needs to be
+ *   invalidated and other paging structure caches can be preserved.
+ * @pasid: process address space ID
+ * @archid: architecture-specific ID
+ * @addr: first stage/level input address
+ * @granule_size: page/block size of the mapping in bytes
+ * @nb_granules: number of contiguous granules to be invalidated
+ */
+struct iommu_inv_addr_info {
+#define IOMMU_INV_ADDR_FLAGS_PASID	(1 << 0)
+#define IOMMU_INV_ADDR_FLAGS_ARCHID	(1 << 1)
+#define IOMMU_INV_ADDR_FLAGS_LEAF	(1 << 2)
+	__u32	flags;
+	__u32	archid;
+	__u64	pasid;
+	__u64	addr;
+	__u64	granule_size;
+	__u64	nb_granules;
+};
+
+/**
+ * struct iommu_inv_pasid_info - PASID Selective Invalidation Structure
+ *
+ * @flags: indicates the granularity of the PASID-selective invalidation
+ * - If the PASID bit is set, the @pasid field is populated and the invalidation
+ *   relates to cache entries tagged with this PASID and matching the address
+ *   range.
+ * - If the ARCHID bit is set, the @archid is populated and the invalidation
+ *   relates to cache entries tagged with this architecture specific ID and
+ *   matching the address range.
+ * - Both PASID and ARCHID can be set as they may tag different caches.
+ * - At least one of PASID or ARCHID must be set.
+ * @pasid: process address space ID
+ * @archid: architecture-specific ID
+ */
+struct iommu_inv_pasid_info {
+#define IOMMU_INV_PASID_FLAGS_PASID	(1 << 0)
+#define IOMMU_INV_PASID_FLAGS_ARCHID	(1 << 1)
+	__u32	flags;
+	__u32	archid;
+	__u64	pasid;
+};
+
+/**
+ * struct iommu_cache_invalidate_info - First level/stage invalidation
+ *     information
+ * @version: API version of this structure
+ * @cache: bitfield that allows to select which caches to invalidate
+ * @granularity: defines the lowest granularity used for the invalidation:
+ *     domain > PASID > addr
+ * @padding: reserved for future use (should be zero)
+ * @pasid_info: invalidation data when @granularity is %IOMMU_INV_GRANU_PASID
+ * @addr_info: invalidation data when @granularity is %IOMMU_INV_GRANU_ADDR
+ *
+ * Not all the combinations of cache/granularity are valid:
+ *
+ * +--------------+---------------+---------------+---------------+
+ * | type /       |   DEV_IOTLB   |     IOTLB     |      PASID    |
+ * | granularity  |               |               |      cache    |
+ * +==============+===============+===============+===============+
+ * | DOMAIN       |       N/A     |       Y       |       Y       |
+ * +--------------+---------------+---------------+---------------+
+ * | PASID        |       Y       |       Y       |       Y       |
+ * +--------------+---------------+---------------+---------------+
+ * | ADDR         |       Y       |       Y       |       N/A     |
+ * +--------------+---------------+---------------+---------------+
+ *
+ * Invalidations by %IOMMU_INV_GRANU_DOMAIN don't take any argument other than
+ * @version and @cache.
+ *
+ * If multiple cache types are invalidated simultaneously, they all
+ * must support the used granularity.
+ */
+struct iommu_cache_invalidate_info {
+#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
+	__u32	version;
+/* IOMMU paging structure cache */
+#define IOMMU_CACHE_INV_TYPE_IOTLB	(1 << 0) /* IOMMU IOTLB */
+#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB	(1 << 1) /* Device IOTLB */
+#define IOMMU_CACHE_INV_TYPE_PASID	(1 << 2) /* PASID cache */
+#define IOMMU_CACHE_INV_TYPE_NR		(3)
+	__u8	cache;
+	__u8	granularity;
+	__u8	padding[2];
+	union {
+		struct iommu_inv_pasid_info pasid_info;
+		struct iommu_inv_addr_info addr_info;
+	};
+};
+
+/**
+ * struct gpasid_bind_data_vtd - Intel VT-d specific data on device and guest
+ * SVA binding.
+ *
+ * @flags:	VT-d PASID table entry attributes
+ * @pat:	Page attribute table data to compute effective memory type
+ * @emt:	Extended memory type
+ *
+ * Only guest vIOMMU selectable and effective options are passed down to
+ * the host IOMMU.
+ */
+struct gpasid_bind_data_vtd {
+#define IOMMU_SVA_VTD_GPASID_SRE	(1 << 0) /* supervisor request */
+#define IOMMU_SVA_VTD_GPASID_EAFE	(1 << 1) /* extended access enable */
+#define IOMMU_SVA_VTD_GPASID_PCD	(1 << 2) /* page-level cache disable */
+#define IOMMU_SVA_VTD_GPASID_PWT	(1 << 3) /* page-level write through */
+#define IOMMU_SVA_VTD_GPASID_EMTE	(1 << 4) /* extended mem type enable */
+#define IOMMU_SVA_VTD_GPASID_CD		(1 << 5) /* PASID-level cache disable */
+	__u64 flags;
+	__u32 pat;
+	__u32 emt;
+};
+
+/**
+ * struct gpasid_bind_data - Information about device and guest PASID binding
+ * @version:	Version of this data structure
+ * @format:	PASID table entry format
+ * @flags:	Additional information on guest bind request
+ * @gpgd:	Guest page directory base of the guest mm to bind
+ * @hpasid:	Process address space ID used for the guest mm in host IOMMU
+ * @gpasid:	Process address space ID used for the guest mm in guest IOMMU
+ * @addr_width:	Guest virtual address width
+ * @vtd:	Intel VT-d specific data
+ *
+ * Guest to host PASID mapping can be an identity or non-identity, where guest
+ * has its own PASID space. For non-identity mapping, guest to host PASID lookup
+ * is needed when VM programs guest PASID into an assigned device. VMM may
+ * trap such PASID programming then request host IOMMU driver to convert guest
+ * PASID to host PASID based on this bind data.
+ */
+struct gpasid_bind_data {
+#define IOMMU_GPASID_BIND_VERSION_1	1
+	__u32 version;
+#define IOMMU_PASID_FORMAT_INTEL_VTD	1
+	__u32 format;
+#define IOMMU_SVA_GPASID_VAL	(1 << 0) /* guest PASID valid */
+	__u64 flags;
+	__u64 gpgd;
+	__u64 hpasid;
+	__u64 gpasid;
+	__u32 addr_width;
+	__u8  padding[4];
+	/* Vendor specific data */
+	union {
+		struct gpasid_bind_data_vtd vtd;
+	};
+};
+
+#endif /* _UAPI_IOMMU_H */
-- 
2.7.4



* [RFC v1 02/18] linux-headers: import vfio.h from kernel
  2019-07-05 11:01 [RFC v1 00/18] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
  2019-07-05 11:01 ` [RFC v1 01/18] linux-headers: import iommu.h from kernel Liu Yi L
@ 2019-07-05 11:01 ` " Liu Yi L
  2019-07-09  1:58   ` Peter Xu
  2019-07-05 11:01 ` [RFC v1 03/18] hw/pci: introduce PCIPASIDOps to PCIDevice Liu Yi L
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 58+ messages in thread
From: Liu Yi L @ 2019-07-05 11:01 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: eric.auger, david, tianyu.lan, kevin.tian, yi.l.liu, jun.j.tian,
	yi.y.sun, kvm, Jacob Pan, Yi Sun

This patch imports the vIOMMU-related definitions from the kernel's
uapi/vfio.h, e.g. PASID allocation, guest PASID bind, guest PASID
table bind and guest IOMMU cache invalidation.
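
For illustration, a sketch of how userspace might fill the PASID request
before handing it to the VFIO_IOMMU_PASID_REQUEST ioctl. The struct below
is a local mirror of vfio_iommu_type1_pasid_request from this patch, and
the two helper functions are hypothetical. The ioctl call itself is left
out, since it needs a VFIO container backed by the related kernel series.

```c
#include <stdint.h>
#include <string.h>

/* Local mirrors of VFIO_IOMMU_PASID_ALLOC / VFIO_IOMMU_PASID_FREE. */
#define PASID_REQ_ALLOC (1u << 0)
#define PASID_REQ_FREE  (1u << 1)

/* Local mirror of struct vfio_iommu_type1_pasid_request. */
struct pasid_request {
    uint32_t argsz;
    uint32_t flag;
    union {
        struct {
            int min_pasid;
            int max_pasid;
        };
        int pasid;
    };
};

/* Prepare an allocation request for a PASID in [min, max]. */
int pasid_request_init_alloc(struct pasid_request *req, int min, int max)
{
    if (!req || min < 0 || max < min) {
        return -1;
    }
    memset(req, 0, sizeof(*req));
    req->argsz = sizeof(*req);
    req->flag = PASID_REQ_ALLOC;
    req->min_pasid = min;
    req->max_pasid = max;
    return 0;
}

/* Prepare a request that frees a previously allocated PASID. */
int pasid_request_init_free(struct pasid_request *req, int pasid)
{
    if (!req || pasid < 0) {
        return -1;
    }
    memset(req, 0, sizeof(*req));
    req->argsz = sizeof(*req);
    req->flag = PASID_REQ_FREE;
    req->pasid = pasid;
    return 0;
}
```

The flag selects which member of the union is meaningful, so zeroing the
structure first avoids passing stale range values along with a free request.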

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
---
 linux-headers/linux/vfio.h | 116 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 116 insertions(+)

diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index 24f5051..551648e 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -14,6 +14,7 @@
 
 #include <linux/types.h>
 #include <linux/ioctl.h>
+#include <linux/iommu.h>
 
 #define VFIO_API_VERSION	0
 
@@ -763,6 +764,121 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/**
+ * VFIO_IOMMU_ATTACH_PASID_TABLE - _IOWR(VFIO_TYPE, VFIO_BASE + 22,
+ *			struct vfio_iommu_type1_attach_pasid_table)
+ *
+ * Passes the PASID table to the host. Calling ATTACH_PASID_TABLE
+ * while a table is already installed is allowed: it replaces the old
+ * table. DETACH does a comprehensive tear down of the nested mode.
+ */
+struct vfio_iommu_type1_attach_pasid_table {
+	__u32	argsz;
+	__u32	flags;
+	struct iommu_pasid_table_config config;
+};
+#define VFIO_IOMMU_ATTACH_PASID_TABLE	_IO(VFIO_TYPE, VFIO_BASE + 22)
+
+/**
+ * VFIO_IOMMU_DETACH_PASID_TABLE - _IOWR(VFIO_TYPE, VFIO_BASE + 23)
+ * Detaches the PASID table
+ */
+#define VFIO_IOMMU_DETACH_PASID_TABLE	_IO(VFIO_TYPE, VFIO_BASE + 23)
+
+/**
+ * VFIO_IOMMU_CACHE_INVALIDATE - _IOWR(VFIO_TYPE, VFIO_BASE + 24,
+ *			struct vfio_iommu_type1_cache_invalidate)
+ *
+ * Propagate guest IOMMU cache invalidation to the host.
+ */
+struct vfio_iommu_type1_cache_invalidate {
+	__u32   argsz;
+	__u32   flags;
+	struct iommu_cache_invalidate_info info;
+};
+#define VFIO_IOMMU_CACHE_INVALIDATE      _IO(VFIO_TYPE, VFIO_BASE + 24)
+
+/*
+ * @flag=VFIO_IOMMU_PASID_ALLOC, refer to the @min_pasid and @max_pasid fields
+ * @flag=VFIO_IOMMU_PASID_FREE, refer to @pasid field
+ */
+struct vfio_iommu_type1_pasid_request {
+	__u32	argsz;
+#define VFIO_IOMMU_PASID_ALLOC	(1 << 0)
+#define VFIO_IOMMU_PASID_FREE	(1 << 1)
+	__u32	flag;
+	union {
+		struct {
+			int min_pasid;
+			int max_pasid;
+		};
+		int pasid;
+	};
+};
+
+/**
+ * VFIO_IOMMU_PASID_REQUEST - _IOWR(VFIO_TYPE, VFIO_BASE + 27,
+ *				struct vfio_iommu_type1_pasid_request)
+ *
+ */
+#define VFIO_IOMMU_PASID_REQUEST	_IO(VFIO_TYPE, VFIO_BASE + 27)
+
+/*
+ * In guest use of SVA, the first level page tables are managed by the guest.
+ * We can either bind the guest PASID table or explicitly bind a PASID with a
+ * guest page table.
+ */
+struct vfio_iommu_type1_bind_guest_pasid {
+	struct gpasid_bind_data bind_data;
+};
+
+enum vfio_iommu_bind_type {
+	VFIO_IOMMU_BIND_PROCESS,
+	VFIO_IOMMU_BIND_GUEST_PASID,
+};
+
+/*
+ * Supported types:
+ *     - VFIO_IOMMU_BIND_PROCESS: bind native process, which takes
+ *                      vfio_iommu_type1_bind_process in data.
+ *     - VFIO_IOMMU_BIND_GUEST_PASID: bind guest pasid, which invoked
+ *                      by guest process binding, it takes
+ *                      vfio_iommu_type1_bind_guest_pasid in data.
+ */
+struct vfio_iommu_type1_bind {
+	__u32				argsz;
+	enum vfio_iommu_bind_type	bind_type;
+	__u8				data[];
+};
+
+/*
+ * VFIO_IOMMU_BIND - _IOWR(VFIO_TYPE, VFIO_BASE + 28, struct vfio_iommu_bind)
+ *
+ * Manage address spaces of devices in this container. Initially a TYPE1
+ * container can only have one address space, managed with
+ * VFIO_IOMMU_MAP/UNMAP_DMA.
+ *
+ * An IOMMU of type VFIO_TYPE1_NESTING_IOMMU can be managed by both MAP/UNMAP
+ * and BIND ioctls at the same time. MAP/UNMAP acts on the stage-2 (host) page
+ * tables, and BIND manages the stage-1 (guest) page tables. Other types of
+ * IOMMU may allow MAP/UNMAP and BIND to coexist, where MAP/UNMAP controls
+ * non-PASID traffic and BIND controls PASID traffic. But this depends on the
+ * underlying IOMMU architecture and isn't guaranteed.
+ *
+ * Availability of this feature depends on the device, its bus, the underlying
+ * IOMMU and the CPU architecture.
+ *
+ * returns: 0 on success, -errno on failure.
+ */
+#define VFIO_IOMMU_BIND		_IO(VFIO_TYPE, VFIO_BASE + 28)
+
+/*
+ * VFIO_IOMMU_UNBIND - _IOWR(VFIO_TYPE, VFIO_BASE + 29, struct vfio_iommu_bind)
+ *
+ * Undo what was done by the corresponding VFIO_IOMMU_BIND ioctl.
+ */
+#define VFIO_IOMMU_UNBIND	_IO(VFIO_TYPE, VFIO_BASE + 29)
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*
-- 
2.7.4



* [RFC v1 03/18] hw/pci: introduce PCIPASIDOps to PCIDevice
  2019-07-05 11:01 [RFC v1 00/18] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
  2019-07-05 11:01 ` [RFC v1 01/18] linux-headers: import iommu.h from kernel Liu Yi L
  2019-07-05 11:01 ` [RFC v1 02/18] linux-headers: import vfio.h " Liu Yi L
@ 2019-07-05 11:01 ` Liu Yi L
  2019-07-09  2:12   ` Peter Xu
  2019-07-05 11:01 ` [RFC v1 04/18] intel_iommu: add "sm_model" option Liu Yi L
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 58+ messages in thread
From: Liu Yi L @ 2019-07-05 11:01 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: eric.auger, david, tianyu.lan, kevin.tian, yi.l.liu, jun.j.tian,
	yi.y.sun, kvm, Jacob Pan, Yi Sun

This patch introduces PCIPASIDOps for PASID-related operations, for
future use by features such as virt-SVA. Related discussions can be
found at the links below.

https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg00078.html
https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg00940.html

To set up virt-SVA for an assigned SVA-capable device, the host
translation structures must be configured for a specific PASID (e.g.
bind the guest page table to the host and enable nested translation in
the host). In addition, the vIOMMU emulator needs to forward the guest's
cache invalidations to the host, since host nested translation is
enabled. E.g. on VT-d, the guest owns the 1st level translation table,
so cache invalidations for the 1st level should be propagated to the host.

This patch adds two callbacks, alloc_pasid and free_pasid, to support
guest PASID allocation and free. The callbacks are to be implemented
by device passthrough modules such as vfio.
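
To show the intended calling pattern outside of QEMU, here is a
self-contained sketch of the same dispatch. ToyBus, ToyDev, ToyPasidOps,
toy_request_pasid_alloc() and toy_alloc_lowest() are simplified stand-ins
invented for this example. They are not the QEMU types, and the backend
is hypothetical (a real one would ask the host through vfio). The devfn
bounds check is an addition of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_DEVFN 256

typedef struct ToyBus ToyBus;

/* Stand-in for PCIPASIDOps: callbacks provided by a passthrough module. */
typedef struct ToyPasidOps {
    int (*alloc_pasid)(ToyBus *bus, int32_t devfn,
                       uint32_t min_pasid, uint32_t max_pasid);
    int (*free_pasid)(ToyBus *bus, int32_t devfn, uint32_t pasid);
} ToyPasidOps;

/* Stand-in for PCIDevice: only the field this sketch needs. */
typedef struct ToyDev {
    ToyPasidOps *pasid_ops;
} ToyDev;

/* Stand-in for PCIBus: devices indexed by devfn. */
struct ToyBus {
    ToyDev *devices[TOY_MAX_DEVFN];
};

/* Forward an allocation request to the device's callback, or fail with -1
 * if the bus, device, ops table or callback is missing. */
int toy_request_pasid_alloc(ToyBus *bus, int32_t devfn,
                            uint32_t min_pasid, uint32_t max_pasid)
{
    ToyDev *dev;

    if (!bus || devfn < 0 || devfn >= TOY_MAX_DEVFN) {
        return -1;
    }

    dev = bus->devices[devfn];
    if (dev && dev->pasid_ops && dev->pasid_ops->alloc_pasid) {
        return dev->pasid_ops->alloc_pasid(bus, devfn, min_pasid, max_pasid);
    }
    return -1;
}

/* Hypothetical backend: always hands out the lowest PASID in the range. */
int toy_alloc_lowest(ToyBus *bus, int32_t devfn,
                     uint32_t min_pasid, uint32_t max_pasid)
{
    (void)bus;
    (void)devfn;
    return min_pasid <= max_pasid ? (int)min_pasid : -1;
}
```

Keeping the ops table on the device lets the vIOMMU emulator stay unaware
of which passthrough module, if any, backs a given device.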

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
---
 hw/pci/pci.c         | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/pci/pci.h | 14 ++++++++++++++
 2 files changed, 64 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 8076a80..710f9e9 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2626,6 +2626,56 @@ void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque)
     bus->iommu_opaque = opaque;
 }
 
+void pci_setup_pasid_ops(PCIDevice *dev, PCIPASIDOps *ops)
+{
+    assert(ops && !dev->pasid_ops);
+    dev->pasid_ops = ops;
+}
+
+bool pci_device_is_ops_set(PCIBus *bus, int32_t devfn)
+{
+    PCIDevice *dev;
+
+    if (!bus) {
+        return false;
+    }
+
+    dev = bus->devices[devfn];
+    return !!(dev && dev->pasid_ops);
+}
+
+int pci_device_request_pasid_alloc(PCIBus *bus, int32_t devfn,
+                                   uint32_t min_pasid, uint32_t max_pasid)
+{
+    PCIDevice *dev;
+
+    if (!bus) {
+        return -1;
+    }
+
+    dev = bus->devices[devfn];
+    if (dev && dev->pasid_ops && dev->pasid_ops->alloc_pasid) {
+        return dev->pasid_ops->alloc_pasid(bus, devfn, min_pasid, max_pasid);
+    }
+    return -1;
+}
+
+int pci_device_request_pasid_free(PCIBus *bus, int32_t devfn,
+                                  uint32_t pasid)
+{
+    PCIDevice *dev;
+
+    if (!bus) {
+        return -1;
+    }
+
+    dev = bus->devices[devfn];
+    if (dev && dev->pasid_ops && dev->pasid_ops->free_pasid) {
+        return dev->pasid_ops->free_pasid(bus, devfn, pasid);
+    }
+    return -1;
+}
+
 static void pci_dev_get_w64(PCIBus *b, PCIDevice *dev, void *opaque)
 {
     Range *range = opaque;
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index d082707..16e5b8e 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -262,6 +262,13 @@ struct PCIReqIDCache {
 };
 typedef struct PCIReqIDCache PCIReqIDCache;
 
+typedef struct PCIPASIDOps PCIPASIDOps;
+struct PCIPASIDOps {
+    int (*alloc_pasid)(PCIBus *bus, int32_t devfn,
+                         uint32_t min_pasid, uint32_t max_pasid);
+    int (*free_pasid)(PCIBus *bus, int32_t devfn, uint32_t pasid);
+};
+
 struct PCIDevice {
     DeviceState qdev;
 
@@ -351,6 +358,7 @@ struct PCIDevice {
     MSIVectorUseNotifier msix_vector_use_notifier;
     MSIVectorReleaseNotifier msix_vector_release_notifier;
     MSIVectorPollNotifier msix_vector_poll_notifier;
+    PCIPASIDOps *pasid_ops;
 };
 
 void pci_register_bar(PCIDevice *pci_dev, int region_num,
@@ -484,6 +492,12 @@ typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, int);
 AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
 void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
 
+void pci_setup_pasid_ops(PCIDevice *dev, PCIPASIDOps *ops);
+bool pci_device_is_ops_set(PCIBus *bus, int32_t devfn);
+int pci_device_request_pasid_alloc(PCIBus *bus, int32_t devfn,
+                                   uint32_t min_pasid, uint32_t max_pasid);
+int pci_device_request_pasid_free(PCIBus *bus, int32_t devfn, uint32_t pasid);
+
 static inline void
 pci_set_byte(uint8_t *config, uint8_t val)
 {
-- 
2.7.4



* [RFC v1 04/18] intel_iommu: add "sm_model" option
  2019-07-05 11:01 [RFC v1 00/18] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (2 preceding siblings ...)
  2019-07-05 11:01 ` [RFC v1 03/18] hw/pci: introduce PCIPASIDOps to PCIDevice Liu Yi L
@ 2019-07-05 11:01 ` Liu Yi L
  2019-07-09  2:15   ` Peter Xu
  2019-07-05 11:01 ` [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation Liu Yi L
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 58+ messages in thread
From: Liu Yi L @ 2019-07-05 11:01 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: eric.auger, david, tianyu.lan, kevin.tian, yi.l.liu, jun.j.tian,
	yi.y.sun, kvm, Jacob Pan, Yi Sun

Intel VT-d 3.0 introduces scalable mode, which comes with a number of
capabilities related to scalable mode translation, so multiple
combinations are possible. This vIOMMU implementation simplifies
things for the user by providing typical combinations, which can be
selected with the "sm_model" option. The usage is as below:

"-device intel-iommu,x-scalable-mode=on,sm_model=["legacy"|"scalable"]"

 - "legacy": gives support for SL page table
 - "scalable": gives support for FL page table, pasid, virtual command
 - defaults to "legacy" if "x-scalable-mode=on" is given while no
   sm_model is configured
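
The option handling described above can be sketched as a pure function.
toy_sm_model_to_ecap() and the TOY_ECAP_* names are hypothetical stand-ins
for the VTD_ECAP_* bits touched by this patch. Unlike vtd_init(), the
sketch returns an error instead of printing a message and exiting.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the VTD_ECAP_* bits used by this patch. */
#define TOY_ECAP_SRS    (1ULL << 31)
#define TOY_ECAP_PASID  (1ULL << 40)
#define TOY_ECAP_SMTS   (1ULL << 43)
#define TOY_ECAP_SLTS   (1ULL << 46)
#define TOY_ECAP_FLTS   (1ULL << 47)

/* Compute the scalable-mode ecap bits for a given configuration.
 * Returns 0 on success and -1 if the configuration is invalid. */
int toy_sm_model_to_ecap(bool scalable_mode, const char *sm_model,
                         uint64_t *ecap)
{
    *ecap = 0;

    if (!scalable_mode) {
        /* sm_model depends on x-scalable-mode */
        return sm_model ? -1 : 0;
    }

    if (!sm_model || strcmp(sm_model, "legacy") == 0) {
        *ecap = TOY_ECAP_SMTS | TOY_ECAP_SRS | TOY_ECAP_SLTS;
        return 0;
    }
    if (strcmp(sm_model, "scalable") == 0) {
        *ecap = TOY_ECAP_SMTS | TOY_ECAP_SRS | TOY_ECAP_PASID |
                TOY_ECAP_FLTS;
        return 0;
    }
    return -1;
}
```

Note that "scalable" does not set the SLTS bit, matching the diff below.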

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
---
 hw/i386/intel_iommu.c          | 28 +++++++++++++++++++++++++++-
 hw/i386/intel_iommu_internal.h |  2 ++
 include/hw/i386/intel_iommu.h  |  1 +
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 44b1231..3160a05 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3014,6 +3014,7 @@ static Property vtd_properties[] = {
     DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
     DEFINE_PROP_BOOL("x-scalable-mode", IntelIOMMUState, scalable_mode, FALSE),
     DEFINE_PROP_BOOL("dma-drain", IntelIOMMUState, dma_drain, true),
+    DEFINE_PROP_STRING("sm_model", IntelIOMMUState, sm_model),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -3489,6 +3490,14 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
     return;
 }
 
+const char sm_model_manual[] =
+        "\"-device intel-iommu,x-scalable-mode=on,"
+        "sm_model=[\"legacy\"|\"scalable\"]\"\n"
+        " - \"legacy\" gives support for SL page table based IOVA\n"
+        " - \"scalable\" gives support for FL page table based IOVA and SVA\n"
+        " - default to be \"legacy\" if \"x-scalable-mode=on\""
+        " while no sm_model is configured\n";
+
 /* Do the initialization. It will also be called when reset, so pay
  * attention when adding new initialization stuff.
  */
@@ -3557,9 +3566,26 @@ static void vtd_init(IntelIOMMUState *s)
         s->cap |= VTD_CAP_CM;
     }
 
+    if (s->sm_model && !s->scalable_mode) {
+        printf("\n\"sm_model\" depends on \"x-scalable-mode\"\n"
+               "please check if \"x-scalable-mode\" is expected\n"
+               "\"sm_model\" manual:\n%s", sm_model_manual);
+        exit(1);
+    }
+
     /* TODO: read cap/ecap from host to decide which cap to be exposed. */
     if (s->scalable_mode) {
-        s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
+        if (!s->sm_model || !strcmp(s->sm_model, "legacy")) {
+            s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
+        } else if (!strcmp(s->sm_model, "scalable")) {
+            s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_PASID
+                       | VTD_ECAP_FLTS;
+        } else {
+            printf("\n!!!!! Invalid sm_model config !!!!!\n"
+                "Please config sm_model=[\"legacy\"|\"scalable\"]\n"
+                "\"sm_model\" manual:\n%s", sm_model_manual);
+            exit(1);
+        }
     }
 
     vtd_reset_caches(s);
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index c1235a7..adae198 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -190,8 +190,10 @@
 #define VTD_ECAP_PT                 (1ULL << 6)
 #define VTD_ECAP_MHMV               (15ULL << 20)
 #define VTD_ECAP_SRS                (1ULL << 31)
+#define VTD_ECAP_PASID              (1ULL << 40)
 #define VTD_ECAP_SMTS               (1ULL << 43)
 #define VTD_ECAP_SLTS               (1ULL << 46)
+#define VTD_ECAP_FLTS               (1ULL << 47)
 
 /* CAP_REG */
 /* (offset >> 4) << 24 */
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 12f3d26..b51cc9f 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -270,6 +270,7 @@ struct IntelIOMMUState {
     bool buggy_eim;                 /* Force buggy EIM unless eim=off */
     uint8_t aw_bits;                /* Host/IOVA address width (in bits) */
     bool dma_drain;                 /* Whether DMA r/w draining enabled */
+    char *sm_model;          /* identify actual scalable mode iommu model*/
 
     /*
      * Protects IOMMU states in general.  Currently it protects the
-- 
2.7.4



* [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation
  2019-07-05 11:01 [RFC v1 00/18] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (3 preceding siblings ...)
  2019-07-05 11:01 ` [RFC v1 04/18] intel_iommu: add "sm_model" option Liu Yi L
@ 2019-07-05 11:01 ` Liu Yi L
  2019-07-09  2:23   ` Peter Xu
  2019-07-15  2:55   ` David Gibson
  2019-07-05 11:01 ` [RFC v1 06/18] intel_iommu: support virtual command emulation and pasid request Liu Yi L
                   ` (12 subsequent siblings)
  17 siblings, 2 replies; 58+ messages in thread
From: Liu Yi L @ 2019-07-05 11:01 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: eric.auger, david, tianyu.lan, kevin.tian, yi.l.liu, jun.j.tian,
	yi.y.sun, kvm, Jacob Pan, Yi Sun

This patch adds the vfio implementation of
PCIPASIDOps.alloc_pasid/free_pasid(). These two functions are used to
propagate guest pasid allocation and free requests to the host via a vfio
container ioctl.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
---
 hw/vfio/pci.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index ce3fe96..ab184ad 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2690,6 +2690,65 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
     vdev->req_enabled = false;
 }
 
+static int vfio_pci_device_request_pasid_alloc(PCIBus *bus,
+                                               int32_t devfn,
+                                               uint32_t min_pasid,
+                                               uint32_t max_pasid)
+{
+    PCIDevice *pdev = bus->devices[devfn];
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+    VFIOContainer *container = vdev->vbasedev.group->container;
+    struct vfio_iommu_type1_pasid_request req;
+    unsigned long argsz;
+    int pasid;
+
+    argsz = sizeof(req);
+    req.argsz = argsz;
+    req.flag = VFIO_IOMMU_PASID_ALLOC;
+    req.min_pasid = min_pasid;
+    req.max_pasid = max_pasid;
+
+    rcu_read_lock();
+    pasid = ioctl(container->fd, VFIO_IOMMU_PASID_REQUEST, &req);
+    if (pasid < 0) {
+        error_report("vfio_pci_device_request_pasid_alloc:"
+                     " request failed, container: %p", container);
+    }
+    rcu_read_unlock();
+    return pasid;
+}
+
+static int vfio_pci_device_request_pasid_free(PCIBus *bus,
+                                              int32_t devfn,
+                                              uint32_t pasid)
+{
+    PCIDevice *pdev = bus->devices[devfn];
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+    VFIOContainer *container = vdev->vbasedev.group->container;
+    struct vfio_iommu_type1_pasid_request req;
+    unsigned long argsz;
+    int ret = 0;
+
+    argsz = sizeof(req);
+    req.argsz = argsz;
+    req.flag = VFIO_IOMMU_PASID_FREE;
+    req.pasid = pasid;
+
+    rcu_read_lock();
+    ret = ioctl(container->fd, VFIO_IOMMU_PASID_REQUEST, &req);
+    if (ret != 0) {
+        error_report("vfio_pci_device_request_pasid_free:"
+                     " request failed, container: %p", container);
+    }
+    rcu_read_unlock();
+    return ret;
+}
+
+static PCIPASIDOps vfio_pci_pasid_ops = {
+    .alloc_pasid = vfio_pci_device_request_pasid_alloc,
+    .free_pasid = vfio_pci_device_request_pasid_free,
+};
+
 static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
     VFIOPCIDevice *vdev = PCI_VFIO(pdev);
@@ -2991,6 +3050,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
     vfio_register_req_notifier(vdev);
     vfio_setup_resetfn_quirk(vdev);
 
+    pci_setup_pasid_ops(pdev, &vfio_pci_pasid_ops);
+
     return;
 
 out_teardown:
-- 
2.7.4



* [RFC v1 06/18] intel_iommu: support virtual command emulation and pasid request
  2019-07-05 11:01 [RFC v1 00/18] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (4 preceding siblings ...)
  2019-07-05 11:01 ` [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation Liu Yi L
@ 2019-07-05 11:01 ` Liu Yi L
  2019-07-09  3:19   ` Peter Xu
  2019-07-05 11:01 ` [RFC v1 07/18] hw/pci: add pci_device_bind/unbind_gpasid Liu Yi L
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 58+ messages in thread
From: Liu Yi L @ 2019-07-05 11:01 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: eric.auger, david, tianyu.lan, kevin.tian, yi.l.liu, jun.j.tian,
	yi.y.sun, kvm, Jacob Pan, Yi Sun

This patch adds virtual command support to Intel vIOMMU per the Intel
VT-d 3.1 spec. It adds two virtual commands: alloc_pasid and free_pasid.
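
To make the register layout used by this patch easier to follow, here is a
small standalone sketch of how a command is packed into VCMD and how a
response is packed into VCRSP. The macros mirror the ones this patch adds
to intel_iommu_internal.h (with an added cast so the result field is
shifted as 64-bit); the helper names and the VCMD_*/VCRSP_SC_* constants
are made up for illustration and are not part of the patch.

```c
#include <stdint.h>

/* Field layout mirroring the macros added by this patch. */
#define VTD_VCMD_CMD_MASK          0xffULL
#define VTD_VCMD_PASID_VALUE(val)  (((val) >> 8) & 0xfffff)
#define VTD_VCRSP_RSLT(val)        ((uint64_t)(val) << 8) /* cast added here */
#define VTD_VCRSP_SC(val)          (((val) & 0x3) << 1)

/* Command opcodes, same values as the enum in this patch. */
enum { VCMD_NULL = 0, VCMD_ALLOC_PASID = 1, VCMD_FREE_PASID = 2 };

/* Status codes, same values as in this patch. */
#define VCRSP_SC_UNDEFINED_CMD       1ULL
#define VCRSP_SC_NO_AVAILABLE_PASID  2ULL

/* Hypothetical guest-side helper: build a free_pasid command word. */
static uint64_t vcmd_encode_free(uint32_t pasid)
{
    return VCMD_FREE_PASID | ((uint64_t)(pasid & 0xfffff) << 8);
}

/* Extract the opcode from a value written to the VCMD register. */
static unsigned vcmd_opcode(uint64_t vcmd)
{
    return (unsigned)(vcmd & VTD_VCMD_CMD_MASK);
}

/* Response for a successful alloc: bit 0 (in progress) clear, SC = 0. */
static uint64_t vcrsp_alloc_ok(uint32_t pasid)
{
    return VTD_VCRSP_RSLT(pasid);
}

/* Response for a failed command: only the status code field is set. */
static uint64_t vcrsp_fail(uint64_t status_code)
{
    return VTD_VCRSP_SC(status_code);
}
```

In this layout bit 0 of VCRSP is the in-progress flag, bits 2:1 carry the
status code, and the allocated PASID starts at bit 8.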

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
---
 hw/i386/intel_iommu.c          | 139 ++++++++++++++++++++++++++++++++++++++++-
 hw/i386/intel_iommu_internal.h |  30 +++++++++
 hw/i386/trace-events           |   1 +
 include/hw/i386/intel_iommu.h  |   6 +-
 4 files changed, 174 insertions(+), 2 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 3160a05..3cf250d 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -932,11 +932,19 @@ static VTDBus *vtd_find_as_from_bus_num(IntelIOMMUState *s, uint8_t bus_num)
                 s->vtd_as_by_bus_num[bus_num] = vtd_bus;
                 return vtd_bus;
             }
+            vtd_bus = NULL;
         }
     }
     return vtd_bus;
 }
 
+static PCIBus *vtd_find_pci_bus_from_bus_num(IntelIOMMUState *s,
+                                             uint8_t bus_num)
+{
+    VTDBus *vtd_bus = vtd_find_as_from_bus_num(s, bus_num);
+    return vtd_bus ? vtd_bus->bus : NULL;
+}
+
 /* Given the @iova, get relevant @slptep. @slpte_level will be the last level
  * of the translation, can be used for deciding the size of large page.
  */
@@ -2579,6 +2587,103 @@ static void vtd_handle_iectl_write(IntelIOMMUState *s)
     }
 }
 
+static int vtd_request_pasid_alloc(IntelIOMMUState *s)
+{
+    PCIBus *bus;
+    int bus_n, devfn;
+
+    for (bus_n = 0; bus_n < PCI_BUS_MAX; bus_n++) {
+        bus = vtd_find_pci_bus_from_bus_num(s, bus_n);
+        if (!bus) {
+            continue;
+        }
+        for (devfn = 0; devfn < PCI_DEVFN_MAX; devfn++) {
+            if (pci_device_is_ops_set(bus, devfn)) {
+                return pci_device_request_pasid_alloc(bus, devfn,
+                                                      VTD_MIN_HPASID,
+                                                      VTD_MAX_HPASID);
+            }
+        }
+    }
+    return -1;
+}
+
+static int vtd_request_pasid_free(IntelIOMMUState *s, uint32_t pasid)
+{
+    PCIBus *bus;
+    int bus_n, devfn;
+
+    for (bus_n = 0; bus_n < PCI_BUS_MAX; bus_n++) {
+        bus = vtd_find_pci_bus_from_bus_num(s, bus_n);
+        if (!bus) {
+            continue;
+        }
+        for (devfn = 0; devfn < PCI_DEVFN_MAX; devfn++) {
+            if (pci_device_is_ops_set(bus, devfn)) {
+                return pci_device_request_pasid_free(bus, devfn, pasid);
+            }
+        }
+    }
+    return -1;
+}
+
+/* Handle write to Virtual Command Register */
+static void vtd_handle_vcmd_write(IntelIOMMUState *s)
+{
+    uint32_t status = vtd_get_long_raw(s, DMAR_VCRSP_REG);
+    uint32_t val = vtd_get_long_raw(s, DMAR_VCMD_REG);
+    uint32_t pasid;
+    int ret = -1;
+
+    trace_vtd_reg_write_vcmd(status, val);
+
+    switch (val & VTD_VCMD_CMD_MASK) {
+    case VTD_VCMD_ALLOC_PASID:
+        if (!(s->vccap & VTD_VCCAP_PAS) ||
+             (s->vcrsp & 1)) {
+            break;
+        }
+        s->vcrsp = 1;
+        vtd_set_quad_raw(s, DMAR_VCRSP_REG,
+                         ((uint64_t) s->vcrsp));
+        ret = vtd_request_pasid_alloc(s);
+        if (ret < 0) {
+            s->vcrsp |= VTD_VCRSP_SC(VTD_VCMD_NO_AVAILABLE_PASID);
+        } else {
+            s->vcrsp |= VTD_VCRSP_RSLT(ret);
+        }
+        s->vcrsp &= (~((uint64_t)(0x1)));
+        vtd_set_quad_raw(s, DMAR_VCRSP_REG,
+                         ((uint64_t) s->vcrsp));
+        break;
+
+    case VTD_VCMD_FREE_PASID:
+        if (!(s->vccap & VTD_VCCAP_PAS) ||
+             (s->vcrsp & 1)) {
+            break;
+        }
+        s->vcrsp &= 1;
+        vtd_set_quad_raw(s, DMAR_VCRSP_REG,
+                         ((uint64_t) s->vcrsp));
+        pasid = VTD_VCMD_PASID_VALUE(val);
+        ret = vtd_request_pasid_free(s, pasid);
+        if (ret < 0) {
+            s->vcrsp |= VTD_VCRSP_SC(VTD_VCMD_FREE_INVALID_PASID);
+        }
+        s->vcrsp &= (~((uint64_t)(0x1)));
+        vtd_set_quad_raw(s, DMAR_VCRSP_REG,
+                         ((uint64_t) s->vcrsp));
+        break;
+
+    default:
+        s->vcrsp |= VTD_VCRSP_SC(VTD_VCMD_UNDEFINED_CMD);
+        vtd_set_quad_raw(s, DMAR_VCRSP_REG,
+                         ((uint64_t) s->vcrsp));
+        printf("Virtual Command: unsupported command!!!\n");
+        break;
+    }
+}
+
 static uint64_t vtd_mem_read(void *opaque, hwaddr addr, unsigned size)
 {
     IntelIOMMUState *s = opaque;
@@ -2620,6 +2725,15 @@ static uint64_t vtd_mem_read(void *opaque, hwaddr addr, unsigned size)
         val = s->iq >> 32;
         break;
 
+    case DMAR_VCRSP_REG:
+        val = s->vcrsp;
+        break;
+
+    case DMAR_VCRSP_REG_HI:
+        assert(size == 4);
+        val = s->vcrsp >> 32;
+        break;
+
     default:
         if (size == 4) {
             val = vtd_get_long(s, addr);
@@ -2868,6 +2982,21 @@ static void vtd_mem_write(void *opaque, hwaddr addr,
         vtd_set_long(s, addr, val);
         break;
 
+    case DMAR_VCMD_REG:
+        if (size == 4) {
+            vtd_set_long(s, addr, val);
+        } else {
+            vtd_set_quad(s, addr, val);
+        }
+        vtd_handle_vcmd_write(s);
+        break;
+
+    case DMAR_VCMD_REG_HI:
+        assert(size == 4);
+        vtd_set_long(s, addr, val);
+        vtd_handle_vcmd_write(s);
+        break;
+
     default:
         if (size == 4) {
             vtd_set_long(s, addr, val);
@@ -3579,7 +3708,8 @@ static void vtd_init(IntelIOMMUState *s)
             s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
         } else if (!strcmp(s->sm_model, "scalable")) {
             s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_PASID
-                       | VTD_ECAP_FLTS;
+                       | VTD_ECAP_FLTS | VTD_ECAP_VCS;
+            s->vccap |= VTD_VCCAP_PAS;
         } else {
             printf("\n!!!!! Invalid sm_model config !!!!!\n"
                 "Please config sm_model=[\"legacy\"|\"scalable\"]\n"
@@ -3641,6 +3771,13 @@ static void vtd_init(IntelIOMMUState *s)
      * Interrupt remapping registers.
      */
     vtd_define_quad(s, DMAR_IRTA_REG, 0, 0xfffffffffffff80fULL, 0);
+
+    /*
+     * Virtual Command Definitions
+     */
+    vtd_define_quad(s, DMAR_VCCAP_REG, s->vccap, 0, 0);
+    vtd_define_quad(s, DMAR_VCMD_REG, 0, 0xffffffffffffffffULL, 0);
+    vtd_define_quad(s, DMAR_VCRSP_REG, 0, 0, 0);
 }
 
 /* Should not reset address_spaces when reset because devices will still use
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index adae198..f5a2f0d 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -85,6 +85,12 @@
 #define DMAR_MTRRCAP_REG_HI     0x104
 #define DMAR_MTRRDEF_REG        0x108 /* MTRR default type */
 #define DMAR_MTRRDEF_REG_HI     0x10c
+#define DMAR_VCCAP_REG          0xE00 /* Virtual Command Capability Register */
+#define DMAR_VCCAP_REG_HI       0xE04
+#define DMAR_VCMD_REG           0xE10 /* Virtual Command Register */
+#define DMAR_VCMD_REG_HI        0xE14
+#define DMAR_VCRSP_REG          0xE20 /* Virtual Command Response Register */
+#define DMAR_VCRSP_REG_HI       0xE24
 
 /* IOTLB registers */
 #define DMAR_IOTLB_REG_OFFSET   0xf0 /* Offset to the IOTLB registers */
@@ -192,6 +198,7 @@
 #define VTD_ECAP_SRS                (1ULL << 31)
 #define VTD_ECAP_PASID              (1ULL << 40)
 #define VTD_ECAP_SMTS               (1ULL << 43)
+#define VTD_ECAP_VCS                (1ULL << 44)
 #define VTD_ECAP_SLTS               (1ULL << 46)
 #define VTD_ECAP_FLTS               (1ULL << 47)
 
@@ -314,6 +321,29 @@ typedef enum VTDFaultReason {
 
 #define VTD_CONTEXT_CACHE_GEN_MAX       0xffffffffUL
 
+/* VCCAP_REG */
+#define VTD_VCCAP_PAS               (1UL << 0)
+#define VTD_MIN_HPASID              200
+#define VTD_MAX_HPASID              0xFFFFF
+
+/* Virtual Command Register */
+enum {
+     VTD_VCMD_NULL_CMD = 0,
+     VTD_VCMD_ALLOC_PASID,
+     VTD_VCMD_FREE_PASID,
+     VTD_VCMD_CMD_NUM,
+};
+
+#define VTD_VCMD_CMD_MASK           0xffUL
+#define VTD_VCMD_PASID_VALUE(val)   (((val) >> 8) & 0xfffff)
+
+#define VTD_VCRSP_RSLT(val)         ((val) << 8)
+#define VTD_VCRSP_SC(val)           (((val) & 0x3) << 1)
+
+#define VTD_VCMD_UNDEFINED_CMD         1ULL
+#define VTD_VCMD_NO_AVAILABLE_PASID    2ULL
+#define VTD_VCMD_FREE_INVALID_PASID    2ULL
+
 /* Interrupt Entry Cache Invalidation Descriptor: VT-d 6.5.2.7. */
 struct VTDInvDescIEC {
     uint32_t type:4;            /* Should always be 0x4 */
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index c8bc464..43c0314 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -51,6 +51,7 @@ vtd_reg_write_gcmd(uint32_t status, uint32_t val) "status 0x%"PRIx32" value 0x%"
 vtd_reg_write_fectl(uint32_t value) "value 0x%"PRIx32
 vtd_reg_write_iectl(uint32_t value) "value 0x%"PRIx32
 vtd_reg_ics_clear_ip(void) ""
+vtd_reg_write_vcmd(uint32_t status, uint32_t val) "status 0x%"PRIx32" value 0x%"PRIx32
 vtd_dmar_translate(uint8_t bus, uint8_t slot, uint8_t func, uint64_t iova, uint64_t gpa, uint64_t mask) "dev %02x:%02x.%02x iova 0x%"PRIx64" -> gpa 0x%"PRIx64" mask 0x%"PRIx64
 vtd_dmar_enable(bool en) "enable %d"
 vtd_dmar_fault(uint16_t sid, int fault, uint64_t addr, bool is_write) "sid 0x%"PRIx16" fault %d addr 0x%"PRIx64" write %d"
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index b51cc9f..4b74f3d 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -46,7 +46,7 @@
 #define VTD_SID_TO_BUS(sid)         (((sid) >> 8) & 0xff)
 #define VTD_SID_TO_DEVFN(sid)       ((sid) & 0xff)
 
-#define DMAR_REG_SIZE               0x230
+#define DMAR_REG_SIZE               0xF00
 #define VTD_HOST_AW_39BIT           39
 #define VTD_HOST_AW_48BIT           48
 #define VTD_HOST_ADDRESS_WIDTH      VTD_HOST_AW_39BIT
@@ -272,6 +272,10 @@ struct IntelIOMMUState {
     bool dma_drain;                 /* Whether DMA r/w draining enabled */
     char *sm_model;          /* identify actual scalable mode iommu model*/
 
+    /* Virtual Command Register */
+    uint64_t vccap;                 /* The value of vcmd capability reg */
+    uint64_t vcrsp;                 /* Current value of VCMD RSP REG */
+
     /*
      * Protects IOMMU states in general.  Currently it protects the
      * per-IOMMU IOTLB cache, and context entry cache in VTDAddressSpace.
-- 
2.7.4



* [RFC v1 07/18] hw/pci: add pci_device_bind/unbind_gpasid
  2019-07-05 11:01 [RFC v1 00/18] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (5 preceding siblings ...)
  2019-07-05 11:01 ` [RFC v1 06/18] intel_iommu: support virtual command emulation and pasid request Liu Yi L
@ 2019-07-05 11:01 ` Liu Yi L
  2019-07-09  8:37   ` Auger Eric
  2019-07-05 11:01 ` [RFC v1 08/18] vfio/pci: add vfio bind/unbind_gpasid implementation Liu Yi L
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 58+ messages in thread
From: Liu Yi L @ 2019-07-05 11:01 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: eric.auger, david, tianyu.lan, kevin.tian, yi.l.liu, jun.j.tian,
	yi.y.sun, kvm, Jacob Pan, Yi Sun

This patch adds two callbacks, bind_gpasid() and unbind_gpasid(), to
PCIPASIDOps, along with the pci_device_bind/unbind_gpasid() wrappers that
invoke them. They are used to propagate guest pasid bind/unbind to the
host. The callbacks are expected to be implemented by device passthru
modules like vfio.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/pci/pci.c         | 30 ++++++++++++++++++++++++++++++
 include/hw/pci/pci.h |  9 +++++++++
 2 files changed, 39 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 710f9e9..2229229 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2676,6 +2676,36 @@ int pci_device_request_pasid_free(PCIBus *bus, int32_t devfn,
     return -1;
 }
 
+void pci_device_bind_gpasid(PCIBus *bus, int32_t devfn,
+                                struct gpasid_bind_data *g_bind_data)
+{
+    PCIDevice *dev;
+
+    if (!bus) {
+        return;
+    }
+
+    dev = bus->devices[devfn];
+    if (dev && dev->pasid_ops) {
+        dev->pasid_ops->bind_gpasid(bus, devfn, g_bind_data);
+    }
+}
+
+void pci_device_unbind_gpasid(PCIBus *bus, int32_t devfn,
+                                struct gpasid_bind_data *g_bind_data)
+{
+    PCIDevice *dev;
+
+    if (!bus) {
+        return;
+    }
+
+    dev = bus->devices[devfn];
+    if (dev && dev->pasid_ops) {
+        dev->pasid_ops->unbind_gpasid(bus, devfn, g_bind_data);
+    }
+}
+
 static void pci_dev_get_w64(PCIBus *b, PCIDevice *dev, void *opaque)
 {
     Range *range = opaque;
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 16e5b8e..8d849e6 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -9,6 +9,7 @@
 #include "hw/isa/isa.h"
 
 #include "hw/pci/pcie.h"
+#include <linux/iommu.h>
 
 extern bool pci_available;
 
@@ -267,6 +268,10 @@ struct PCIPASIDOps {
     int (*alloc_pasid)(PCIBus *bus, int32_t devfn,
                          uint32_t min_pasid, uint32_t max_pasid);
     int (*free_pasid)(PCIBus *bus, int32_t devfn, uint32_t pasid);
+    void (*bind_gpasid)(PCIBus *bus, int32_t devfn,
+                            struct gpasid_bind_data *g_bind_data);
+    void (*unbind_gpasid)(PCIBus *bus, int32_t devfn,
+                            struct gpasid_bind_data *g_bind_data);
 };
 
 struct PCIDevice {
@@ -497,6 +502,10 @@ bool pci_device_is_ops_set(PCIBus *bus, int32_t devfn);
 int pci_device_request_pasid_alloc(PCIBus *bus, int32_t devfn,
                                    uint32_t min_pasid, uint32_t max_pasid);
 int pci_device_request_pasid_free(PCIBus *bus, int32_t devfn, uint32_t pasid);
+void pci_device_bind_gpasid(PCIBus *bus, int32_t devfn,
+                            struct gpasid_bind_data *g_bind_data);
+void pci_device_unbind_gpasid(PCIBus *bus, int32_t devfn,
+                            struct gpasid_bind_data *g_bind_data);
 
 static inline void
 pci_set_byte(uint8_t *config, uint8_t val)
-- 
2.7.4



* [RFC v1 08/18] vfio/pci: add vfio bind/unbind_gpasid implementation
  2019-07-05 11:01 [RFC v1 00/18] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (6 preceding siblings ...)
  2019-07-05 11:01 ` [RFC v1 07/18] hw/pci: add pci_device_bind/unbind_gpasid Liu Yi L
@ 2019-07-05 11:01 ` Liu Yi L
  2019-07-09  8:37   ` Auger Eric
  2019-07-05 11:01 ` [RFC v1 09/18] intel_iommu: process pasid cache invalidation Liu Yi L
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 58+ messages in thread
From: Liu Yi L @ 2019-07-05 11:01 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: eric.auger, david, tianyu.lan, kevin.tian, yi.l.liu, jun.j.tian,
	yi.y.sun, kvm, Jacob Pan, Yi Sun

This patch adds the vfio implementation of
PCIPASIDOps.bind_gpasid/unbind_gpasid(). These two functions are used to
propagate guest pasid bind and unbind requests to the host via a vfio
container ioctl.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/vfio/pci.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index ab184ad..892b46c 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2744,9 +2744,63 @@ static int vfio_pci_device_request_pasid_free(PCIBus *bus,
     return ret;
 }
 
+static void vfio_pci_device_bind_gpasid(PCIBus *bus, int32_t devfn,
+                                     struct gpasid_bind_data *g_bind_data)
+{
+    PCIDevice *pdev = bus->devices[devfn];
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+    VFIOContainer *container = vdev->vbasedev.group->container;
+    struct vfio_iommu_type1_bind *bind;
+    struct vfio_iommu_type1_bind_guest_pasid *bind_guest_pasid;
+    unsigned long argsz;
+
+    argsz = sizeof(*bind) + sizeof(*bind_guest_pasid);
+    bind = g_malloc0(argsz);
+    bind->argsz = argsz;
+    bind->bind_type = VFIO_IOMMU_BIND_GUEST_PASID;
+    bind_guest_pasid = (struct vfio_iommu_type1_bind_guest_pasid *) &bind->data;
+    bind_guest_pasid->bind_data = *g_bind_data;
+
+    rcu_read_lock();
+    if (ioctl(container->fd, VFIO_IOMMU_BIND, bind) != 0) {
+        error_report("vfio_pci_device_bind_gpasid:"
+                     " bind failed, container: %p", container);
+    }
+    rcu_read_unlock();
+    g_free(bind);
+}
+
+static void vfio_pci_device_unbind_gpasid(PCIBus *bus, int32_t devfn,
+                                     struct gpasid_bind_data *g_bind_data)
+{
+    PCIDevice *pdev = bus->devices[devfn];
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+    VFIOContainer *container = vdev->vbasedev.group->container;
+    struct vfio_iommu_type1_bind *bind;
+    struct vfio_iommu_type1_bind_guest_pasid *bind_guest_pasid;
+    unsigned long argsz;
+
+    argsz = sizeof(*bind) + sizeof(*bind_guest_pasid);
+    bind = g_malloc0(argsz);
+    bind->argsz = argsz;
+    bind->bind_type = VFIO_IOMMU_BIND_GUEST_PASID;
+    bind_guest_pasid = (struct vfio_iommu_type1_bind_guest_pasid *) &bind->data;
+    bind_guest_pasid->bind_data = *g_bind_data;
+
+    rcu_read_lock();
+    if (ioctl(container->fd, VFIO_IOMMU_UNBIND, bind) != 0) {
+        error_report("vfio_pci_device_unbind_gpasid:"
+                     " unbind failed, container: %p", container);
+    }
+    rcu_read_unlock();
+    g_free(bind);
+}
+
 static PCIPASIDOps vfio_pci_pasid_ops = {
     .alloc_pasid = vfio_pci_device_request_pasid_alloc,
     .free_pasid = vfio_pci_device_request_pasid_free,
+    .bind_gpasid = vfio_pci_device_bind_gpasid,
+    .unbind_gpasid = vfio_pci_device_unbind_gpasid,
 };
 
 static void vfio_realize(PCIDevice *pdev, Error **errp)
-- 
2.7.4



* [RFC v1 09/18] intel_iommu: process pasid cache invalidation
  2019-07-05 11:01 [RFC v1 00/18] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (7 preceding siblings ...)
  2019-07-05 11:01 ` [RFC v1 08/18] vfio/pci: add vfio bind/unbind_gpasid implementation Liu Yi L
@ 2019-07-05 11:01 ` Liu Yi L
  2019-07-09  4:47   ` Peter Xu
  2019-07-05 11:01 ` [RFC v1 10/18] intel_iommu: tag VTDAddressSpace instance with PASID Liu Yi L
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 58+ messages in thread
From: Liu Yi L @ 2019-07-05 11:01 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: eric.auger, david, tianyu.lan, kevin.tian, yi.l.liu, jun.j.tian,
	yi.y.sun, kvm, Jacob Pan, Yi Sun

This patch adds a PASID cache flush emulation framework. Per the Intel
VT-d 3.0 spec, PASID cache invalidation under caching-mode provides a
mechanism for software implementations of Intel VT-d (vIOMMU) to track
guest PASID bind/unbind operations. This is a key part of vIOMMU support
for guest SVA. This patch only adds the framework. The detailed
implementation relies on PASID record management in the vIOMMU, which
will be covered in a later patch of this series.
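
For reference, the fields this patch reads from the first quadword of a
PASID cache invalidation descriptor can be decoded as in the standalone
sketch below. The granularity, PASID and DID macros follow the ones added
by this patch; the 16-bit domain id mask is written out as 0xffff here on
the assumption that it matches VTD_DOMAIN_ID_MASK, and the enum and helper
names are hypothetical.

```c
#include <stdint.h>

/* Field layout following the macros added by this patch. */
#define VTD_INV_DESC_PASIDC_G          (3ULL << 4)
#define VTD_INV_DESC_PASIDC_DSI        (0ULL << 4)
#define VTD_INV_DESC_PASIDC_PASID_SI   (1ULL << 4)
#define VTD_INV_DESC_PASIDC_GLOBAL     (3ULL << 4)
#define VTD_INV_DESC_PASIDC_PASID(val) (((val) >> 32) & 0xfffffULL)
/* Assumption: VTD_DOMAIN_ID_MASK covers 16 bits. */
#define VTD_INV_DESC_PASIDC_DID(val)   (((val) >> 16) & 0xffffULL)

/* Hypothetical names for the three granularities plus an error value. */
enum pc_inv_scope {
    PC_INV_DOMAIN,   /* all PASIDs of one domain */
    PC_INV_PASID,    /* one PASID within one domain */
    PC_INV_GLOBAL,   /* every cached PASID entry */
    PC_INV_INVALID,  /* reserved granularity encoding */
};

/* Classify the granularity field of descriptor quadword 0. */
static enum pc_inv_scope pc_inv_get_scope(uint64_t val0)
{
    switch (val0 & VTD_INV_DESC_PASIDC_G) {
    case VTD_INV_DESC_PASIDC_DSI:
        return PC_INV_DOMAIN;
    case VTD_INV_DESC_PASIDC_PASID_SI:
        return PC_INV_PASID;
    case VTD_INV_DESC_PASIDC_GLOBAL:
        return PC_INV_GLOBAL;
    default:
        return PC_INV_INVALID;
    }
}
```

The encoding 2 in the granularity field has no case in the patch either, so
it ends up in the error path in both the patch and this sketch.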

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/i386/intel_iommu.c          | 40 +++++++++++++++++++++++++++++++++++-----
 hw/i386/intel_iommu_internal.h | 12 ++++++++++++
 2 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 3cf250d..ef13662 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2331,6 +2331,37 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
     return true;
 }
 
+static bool vtd_process_pasid_desc(IntelIOMMUState *s,
+                                   VTDInvDesc *inv_desc)
+{
+    if ((inv_desc->val[0] & VTD_INV_DESC_PASIDC_RSVD_VAL0) ||
+        (inv_desc->val[1] & VTD_INV_DESC_PASIDC_RSVD_VAL1) ||
+        (inv_desc->val[2] & VTD_INV_DESC_PASIDC_RSVD_VAL2) ||
+        (inv_desc->val[3] & VTD_INV_DESC_PASIDC_RSVD_VAL3)) {
+        trace_vtd_inv_desc("non-zero-field-in-pc_inv_desc",
+                            inv_desc->val[1], inv_desc->val[0]);
+        return false;
+    }
+
+    switch (inv_desc->val[0] & VTD_INV_DESC_PASIDC_G) {
+    case VTD_INV_DESC_PASIDC_DSI:
+        break;
+
+    case VTD_INV_DESC_PASIDC_PASID_SI:
+        break;
+
+    case VTD_INV_DESC_PASIDC_GLOBAL:
+        break;
+
+    default:
+        trace_vtd_inv_desc("invalid-inv-granu-in-pc_inv_desc",
+                            inv_desc->val[1], inv_desc->val[0]);
+        return false;
+    }
+
+    return true;
+}
+
 static bool vtd_process_inv_iec_desc(IntelIOMMUState *s,
                                      VTDInvDesc *inv_desc)
 {
@@ -2437,12 +2468,11 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
         }
         break;
 
-    /*
-     * TODO: the entity of below two cases will be implemented in future series.
-     * To make guest (which integrates scalable mode support patch set in
-     * iommu driver) work, just return true is enough so far.
-     */
     case VTD_INV_DESC_PC:
+        trace_vtd_inv_desc("pasid-cache", inv_desc.val[1], inv_desc.val[0]);
+        if (!vtd_process_pasid_desc(s, &inv_desc)) {
+            return false;
+        }
         break;
 
     case VTD_INV_DESC_PIOTLB:
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index f5a2f0d..e335800 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -436,6 +436,18 @@ typedef union VTDInvDesc VTDInvDesc;
 #define VTD_SPTE_LPAGE_L4_RSVD_MASK(aw) \
         (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM))
 
+#define VTD_INV_DESC_PASIDC_G          (3ULL << 4)
+#define VTD_INV_DESC_PASIDC_PASID(val) (((val) >> 32) & 0xfffffULL)
+#define VTD_INV_DESC_PASIDC_DID(val)   (((val) >> 16) & VTD_DOMAIN_ID_MASK)
+#define VTD_INV_DESC_PASIDC_RSVD_VAL0  0xfff000000000ffc0ULL
+#define VTD_INV_DESC_PASIDC_RSVD_VAL1  0xffffffffffffffffULL
+#define VTD_INV_DESC_PASIDC_RSVD_VAL2  0xffffffffffffffffULL
+#define VTD_INV_DESC_PASIDC_RSVD_VAL3  0xffffffffffffffffULL
+
+#define VTD_INV_DESC_PASIDC_DSI        (0ULL << 4)
+#define VTD_INV_DESC_PASIDC_PASID_SI   (1ULL << 4)
+#define VTD_INV_DESC_PASIDC_GLOBAL     (3ULL << 4)
+
 /* Information about page-selective IOTLB invalidate */
 struct VTDIOTLBPageInvInfo {
     uint16_t domain_id;
-- 
2.7.4



* [RFC v1 10/18] intel_iommu: tag VTDAddressSpace instance with PASID
  2019-07-05 11:01 [RFC v1 00/18] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (8 preceding siblings ...)
  2019-07-05 11:01 ` [RFC v1 09/18] intel_iommu: process pasid cache invalidation Liu Yi L
@ 2019-07-05 11:01 ` Liu Yi L
  2019-07-09  6:12   ` Peter Xu
  2019-07-05 11:01 ` [RFC v1 11/18] intel_iommu: create VTDAddressSpace per BDF+PASID Liu Yi L
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 58+ messages in thread
From: Liu Yi L @ 2019-07-05 11:01 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: eric.auger, david, tianyu.lan, kevin.tian, yi.l.liu, jun.j.tian,
	yi.y.sun, kvm, Jacob Pan, Yi Sun

This patch introduces new fields in VTDAddressSpace for further PASID support
in Intel vIOMMU. Previously, each device had a single VTDAddressSpace
instance to represent its guest IOVA address space when vIOMMU is enabled.
However, when PASID is exposed to the guest, a device will have multiple
address spaces, each tagged with a PASID. To suit this change,
VTDAddressSpace should be tagged with PASIDs in Intel vIOMMU.

To record PASID-tagged VTDAddressSpaces, a hash table is introduced. The
data in the hash table can be used for future sanity checks, for retrieving
previous PASID configs of the guest, and for future emulated SVA DMA support
for emulated SVA-capable devices. The lookup key is a string and its format
is as below:

"rsv%04dpasid%010dsid%06d" -- 32 bytes in total, including the
terminating NUL

Example: device 00:02.0 is bound to pasid 5, so its source id is 16
(devfn 0x10 on bus 0) and its key into the hash table is:
"rsv0000pasid0000000005sid000016"

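The key layout can be checked with a small standalone sketch. It is not
the code from this patch: format_pasid_key() and make_source_id() are
hypothetical names, the source ID is assumed to be (bus << 8) | devfn,
and unsigned conversions are used in place of the patch's "%d".

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define PASID_KEY_SIZE 32 /* 31 characters plus the trailing NUL */

/* Hypothetical helper: assumes source ID = (bus number << 8) | devfn. */
static uint16_t make_source_id(uint8_t bus_num, uint8_t devfn)
{
    return (uint16_t)((bus_num << 8) | devfn);
}

/*
 * Format the lookup key: 4 reserved digits, 10 PASID digits and
 * 6 source-ID digits. Returns the number of characters that
 * snprintf() produced, excluding the trailing NUL.
 */
static int format_pasid_key(char *key, size_t key_size,
                            uint32_t pasid, uint16_t sid)
{
    return snprintf(key, key_size,
                    "rsv%04dpasid%010" PRIu32 "sid%06u",
                    0, pasid, (unsigned)sid);
}
```

For device 00:02.0 the devfn is (2 << 3) | 0 = 16 on bus 0, so the source
ID is 16, which is where the "sid000016" suffix in the example comes from.
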
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/i386/intel_iommu.c          | 166 ++++++++++++++++++++++++++++++++++++++++-
 hw/i386/intel_iommu_internal.h |   9 +++
 hw/i386/trace-events           |   4 +
 include/hw/i386/intel_iommu.h  |  15 ++++
 4 files changed, 193 insertions(+), 1 deletion(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index ef13662..3b8e614 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -61,6 +61,16 @@
 static void vtd_address_space_refresh_all(IntelIOMMUState *s);
 static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n);
 
+static void hash_pasid_as_free(void *ptr);
+static inline int vtd_get_pasid_key(char *key, int key_size,
+                                   uint32_t pasid, uint16_t sid);
+static int vtd_pasid_cache_dsi(IntelIOMMUState *s, uint16_t domain_id);
+static int vtd_pasid_cache_gsi(IntelIOMMUState *s);
+static void vtd_pasid_cache_reset(IntelIOMMUState *s);
+static int vtd_pasid_cache_psi(IntelIOMMUState *s,
+                               uint16_t domain_id,
+                               uint32_t pasid);
+
 static void vtd_define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
                             uint64_t wmask, uint64_t w1cmask)
 {
@@ -265,6 +275,7 @@ static void vtd_reset_caches(IntelIOMMUState *s)
     vtd_iommu_lock(s);
     vtd_reset_iotlb_locked(s);
     vtd_reset_context_cache_locked(s);
+    vtd_pasid_cache_reset(s);
     vtd_iommu_unlock(s);
 }
 
@@ -675,6 +686,11 @@ static inline bool vtd_pe_type_check(X86IOMMUState *x86_iommu,
     return true;
 }
 
+static inline uint16_t vtd_pe_get_domain_id(VTDPASIDEntry *pe)
+{
+    return VTD_SM_PASID_ENTRY_DID((pe)->val[1]);
+}
+
 static int vtd_get_pasid_dire(dma_addr_t pasid_dir_base,
                               uint32_t pasid,
                               VTDPASIDDirEntry *pdire)
@@ -2334,6 +2350,10 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
 static bool vtd_process_pasid_desc(IntelIOMMUState *s,
                                    VTDInvDesc *inv_desc)
 {
+    uint16_t domain_id;
+    uint32_t pasid;
+    int ret = 0;
+
     if ((inv_desc->val[0] & VTD_INV_DESC_PASIDC_RSVD_VAL0) ||
         (inv_desc->val[1] & VTD_INV_DESC_PASIDC_RSVD_VAL1) ||
         (inv_desc->val[2] & VTD_INV_DESC_PASIDC_RSVD_VAL2) ||
@@ -2343,14 +2363,20 @@ static bool vtd_process_pasid_desc(IntelIOMMUState *s,
         return false;
     }
 
+    domain_id = VTD_INV_DESC_PASIDC_DID(inv_desc->val[0]);
+    pasid = VTD_INV_DESC_PASIDC_PASID(inv_desc->val[0]);
+
     switch (inv_desc->val[0] & VTD_INV_DESC_PASIDC_G) {
     case VTD_INV_DESC_PASIDC_DSI:
+        ret = vtd_pasid_cache_dsi(s, domain_id);
         break;
 
     case VTD_INV_DESC_PASIDC_PASID_SI:
+        ret = vtd_pasid_cache_psi(s, domain_id, pasid);
         break;
 
     case VTD_INV_DESC_PASIDC_GLOBAL:
+        ret = vtd_pasid_cache_gsi(s);
         break;
 
     default:
@@ -2359,7 +2385,7 @@ static bool vtd_process_pasid_desc(IntelIOMMUState *s,
         return false;
     }
 
-    return true;
+    return (ret == 0) ? true : false;
 }
 
 static bool vtd_process_inv_iec_desc(IntelIOMMUState *s,
@@ -3523,6 +3549,142 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
     return vtd_dev_as;
 }
 
+#define VTD_PASID_CACHE_GEN_MAX       0xffffffffUL
+
+static inline bool vtd_pc_is_dom_si(struct VTDPASIDCacheInfo *pc_info)
+{
+    return pc_info->flags & VTD_PASID_CACHE_DOMSI;
+}
+
+static inline bool vtd_pc_is_pasid_si(struct VTDPASIDCacheInfo *pc_info)
+{
+    return pc_info->flags & VTD_PASID_CACHE_PASIDSI;
+}
+
+/**
+ * This function is used to clear pasid_cache_gen of cached pasid
+ * entry in vtd_pasid_as instance. Caller of this function should
+ * do explicit vtd_pasid_as instance release. e.g. call this function
+ * with g_hash_table_foreach_remove() or follow this function with
+ * a function to release vtd_pasid_as instance.
+ */
+static gboolean vtd_flush_pasid(gpointer key, gpointer value,
+                                gpointer user_data)
+{
+    VTDPASIDCacheInfo *pc_info = user_data;
+    VTDAddressSpace *vtd_pasid_as = value;
+    uint16_t did;
+    uint32_t pasid;
+
+    if (!vtd_pasid_as || !vtd_pasid_as->pasid_allocated) {
+        return false;
+    }
+
+    did = vtd_pe_get_domain_id(&(vtd_pasid_as->pasid_cache_entry.pasid_entry));
+    pasid = vtd_pasid_as->pasid;
+    if (vtd_pasid_as->pasid_cache_entry.pasid_cache_gen &&
+        (vtd_pc_is_dom_si(pc_info) ? (pc_info->domain_id == did) : 1) &&
+        (vtd_pc_is_pasid_si(pc_info) ? (pc_info->pasid == pasid) : 1)) {
+        /*
+         * Modify pasid_cache_gen to be 0, the cached pasid entry in
+         * vtd_pasid_as instance is invalid. And vtd_pasid_as instance
+         * would be treated as invalid in QEMU scope until the pasid
+         * cache gen is updated in a new pasid binding.
+         */
+        vtd_pasid_as->pasid_cache_entry.pasid_cache_gen = 0;
+        return true;
+    }
+
+    return false;
+}
+
+static int vtd_pasid_cache_dsi(IntelIOMMUState *s, uint16_t domain_id)
+{
+    VTDPASIDCacheInfo pc_info;
+
+    trace_vtd_pasid_cache_dsi(domain_id);
+
+    pc_info.flags = VTD_PASID_CACHE_DOMSI;
+    pc_info.domain_id = domain_id;
+
+    /*
+     * use g_hash_table_foreach_remove(), which will free the
+     * vtd_pasid_as instances.
+     */
+    g_hash_table_foreach_remove(s->vtd_pasid_as, vtd_flush_pasid, &pc_info);
+    /*
+     * TODO: Domain selective PASID cache invalidation
+     * may be issued wrongly by programmer, to be safe,
+     * after invalidating the pasid caches, emulator
+     * needs to replay the pasid bindings by walking guest
+     * pasid dir and pasid table.
+     */
+    return 0;
+}
+
+static int vtd_pasid_cache_psi(IntelIOMMUState *s,
+                               uint16_t domain_id,
+                               uint32_t pasid)
+{
+    /*
+     * Empty in this patch, will add in next patch
+     * vtd_pasid_as instance will be created in this
+     * function
+     */
+    return 0;
+}
+
+/**
+ * Caller of this function should hold iommu_lock
+ */
+static void vtd_pasid_cache_reset(IntelIOMMUState *s)
+{
+    VTDPASIDCacheInfo pc_info;
+
+    trace_vtd_pasid_cache_reset();
+
+    pc_info.flags = 0;
+
+    /*
+     * Reset pasid cache is a big hammer, so use g_hash_table_foreach_remove
+     * which will free the vtd_pasid_as instances.
+     */
+    g_hash_table_foreach_remove(s->vtd_pasid_as, vtd_flush_pasid, &pc_info);
+    s->pasid_cache_gen = 1;
+}
+
+static int vtd_pasid_cache_gsi(IntelIOMMUState *s)
+{
+    trace_vtd_pasid_cache_gsi();
+    vtd_iommu_lock(s);
+    s->pasid_cache_gen++;
+    if (s->pasid_cache_gen == VTD_PASID_CACHE_GEN_MAX) {
+        vtd_pasid_cache_reset(s);
+    }
+    vtd_iommu_unlock(s);
+    /*
+     * TODO: Global PASID cache invalidation may be
+     * issued wrongly by programmer, to be safe, after
+     * invalidating the pasid caches, emulator needs
+     * to replay the pasid bindings by walking guest
+     * pasid dir and pasid table.
+     */
+    return 0;
+}
+
+static inline int vtd_get_pasid_key(char *key, int key_size,
+                                    uint32_t pasid, uint16_t sid)
+{
+    return snprintf(key, key_size, "rsv%04dpasid%010dsid%06d", 0, pasid, sid);
+}
+
+static void hash_pasid_as_free(void *ptr)
+{
+    VTDAddressSpace *vtd_pasid_as = (VTDAddressSpace *) ptr;
+
+    g_free(vtd_pasid_as);
+}
+
 /* Unmap the whole range in the notifier's scope. */
 static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
 {
@@ -3914,6 +4076,8 @@ static void vtd_realize(DeviceState *dev, Error **errp)
                                      g_free, g_free);
     s->vtd_as_by_busptr = g_hash_table_new_full(vtd_uint64_hash, vtd_uint64_equal,
                                               g_free, g_free);
+    s->vtd_pasid_as = g_hash_table_new_full(&g_str_hash, &g_str_equal,
+                                     g_free, hash_pasid_as_free);
     vtd_init(s);
     sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, Q35_HOST_BRIDGE_IOMMU_ADDR);
     pci_setup_iommu(bus, vtd_host_dma_iommu, dev);
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index e335800..bbe176f 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -473,6 +473,15 @@ struct VTDRootEntry {
 };
 typedef struct VTDRootEntry VTDRootEntry;
 
+struct VTDPASIDCacheInfo {
+#define VTD_PASID_CACHE_DOMSI   (1ULL << 0);
+#define VTD_PASID_CACHE_PASIDSI (1ULL << 1);
+    uint32_t flags;
+    uint16_t domain_id;
+    uint32_t pasid;
+};
+typedef struct VTDPASIDCacheInfo VTDPASIDCacheInfo;
+
 /* Masks for struct VTDRootEntry */
 #define VTD_ROOT_ENTRY_P            1ULL
 #define VTD_ROOT_ENTRY_CTP          (~0xfffULL)
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index 43c0314..7912ae1 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -22,6 +22,10 @@ vtd_inv_qi_head(uint16_t head) "read head %d"
 vtd_inv_qi_tail(uint16_t head) "write tail %d"
 vtd_inv_qi_fetch(void) ""
 vtd_context_cache_reset(void) ""
+vtd_pasid_cache_reset(void) ""
+vtd_pasid_cache_gsi(void) ""
+vtd_pasid_cache_dsi(uint16_t domain) "Domain selective PC invalidation domain 0x%"PRIx16
+vtd_pasid_cache_psi(uint16_t domain, uint32_t pasid) "PASID selective PC invalidation domain 0x%"PRIx16" pasid 0x%"PRIx32
 vtd_re_not_present(uint8_t bus) "Root entry bus %"PRIu8" not present"
 vtd_ce_not_present(uint8_t bus, uint8_t devfn) "Context entry bus %"PRIu8" devfn %"PRIu8" not present"
 vtd_iotlb_page_hit(uint16_t sid, uint64_t addr, uint64_t slpte, uint16_t domain) "IOTLB page hit sid 0x%"PRIx16" iova 0x%"PRIx64" slpte 0x%"PRIx64" domain 0x%"PRIx16
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 4b74f3d..24c8678 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -68,6 +68,7 @@ typedef union VTD_IR_TableEntry VTD_IR_TableEntry;
 typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
 typedef struct VTDPASIDDirEntry VTDPASIDDirEntry;
 typedef struct VTDPASIDEntry VTDPASIDEntry;
+typedef struct VTDPASIDCacheEntry VTDPASIDCacheEntry;
 
 /* Context-Entry */
 struct VTDContextEntry {
@@ -100,7 +101,18 @@ struct VTDPASIDEntry {
     uint64_t val[8];
 };
 
+struct VTDPASIDCacheEntry {
+    /*
+     * The cache entry is obsolete if
+     * pasid_cache_gen!=IntelIOMMUState.pasid_cache_gen
+     */
+    uint32_t pasid_cache_gen;
+    struct VTDPASIDEntry pasid_entry;
+};
+
 struct VTDAddressSpace {
+    bool pasid_allocated;
+    uint32_t pasid;
     PCIBus *bus;
     uint8_t devfn;
     AddressSpace as;
@@ -114,6 +126,7 @@ struct VTDAddressSpace {
     /* Superset of notifier flags that this address space has */
     IOMMUNotifierFlag notifier_flags;
     IOVATree *iova_tree;          /* Traces mapped IOVA ranges */
+    VTDPASIDCacheEntry pasid_cache_entry;
 };
 
 struct VTDBus {
@@ -258,6 +271,8 @@ struct IntelIOMMUState {
 
     GHashTable *vtd_as_by_busptr;   /* VTDBus objects indexed by PCIBus* reference */
     VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */
+    GHashTable *vtd_pasid_as;   /* VTDAddressSpace objects indexed by pasid */
+    uint32_t pasid_cache_gen;   /* Should be in [1,MAX] */
     /* list of registered notifiers */
     QLIST_HEAD(, VTDAddressSpace) vtd_as_with_notifiers;
 
-- 
2.7.4



* [RFC v1 11/18] intel_iommu: create VTDAddressSpace per BDF+PASID
  2019-07-05 11:01 [RFC v1 00/18] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (9 preceding siblings ...)
  2019-07-05 11:01 ` [RFC v1 10/18] intel_iommu: tag VTDAddressSpace instance with PASID Liu Yi L
@ 2019-07-05 11:01 ` Liu Yi L
  2019-07-09  6:39   ` Peter Xu
  2019-07-05 11:01 ` [RFC v1 12/18] intel_iommu: bind/unbind guest page table to host Liu Yi L
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 58+ messages in thread
From: Liu Yi L @ 2019-07-05 11:01 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: eric.auger, david, tianyu.lan, kevin.tian, yi.l.liu, jun.j.tian,
	yi.y.sun, kvm, Jacob Pan, Yi Sun

This patch allocates PASID-tagged VTDAddressSpace instances per
BDF+PASID. A new PASID-tagged VTDAddressSpace instance is allocated
when a guest PASID-selective PASID cache invalidation is captured.

Under Intel VT-d caching mode, a PASID-selective PASID cache
invalidation from the guest indicates one of the following cases:

*) a non-present pasid entry moved to present
*) a present pasid entry moved to non-present
*) permission modifications, downgrade or upgrade

To tell these cases apart, the vIOMMU needs to fetch the latest guest
pasid entry and compare it with the entry previously stored in the
PASID-tagged VTDAddressSpace instance.
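
The comparison can be sketched roughly as follows. This is an
illustration only, not the logic implemented in this patch: PasidEntry,
PasidChange and classify_pasid_change() are hypothetical names, and the
Present bit is assumed to be bit 0 of val[0], as VTD_PASID_ENTRY_P
suggests.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for a 512-bit PASID entry. */
typedef struct {
    uint64_t val[8];
} PasidEntry;

typedef enum {
    PASID_NO_CHANGE,  /* nothing to pass down */
    PASID_BIND,       /* non-present -> present */
    PASID_UNBIND,     /* present -> non-present */
    PASID_UPDATE      /* still present, contents changed */
} PasidChange;

static bool pe_present(const PasidEntry *pe)
{
    return pe->val[0] & 1ULL;
}

/*
 * Compare the cached entry (if the cache slot is valid) with the
 * entry just fetched from the guest PASID table.
 */
static PasidChange classify_pasid_change(const PasidEntry *cached,
                                         bool cached_valid,
                                         const PasidEntry *guest)
{
    bool was_present = cached_valid && pe_present(cached);
    bool now_present = pe_present(guest);

    if (!was_present && now_present) {
        return PASID_BIND;
    }
    if (was_present && !now_present) {
        return PASID_UNBIND;
    }
    if (was_present && memcmp(cached, guest, sizeof(*guest)) != 0) {
        return PASID_UPDATE;
    }
    return PASID_NO_CHANGE;
}
```

The three non-trivial results map onto the three cases listed above. A
byte-wise compare is the simplest way to detect a permission change, at
the cost of also reporting changes in fields that do not matter.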

TODO: the vIOMMU needs to replay the pasid bindings by walking the
guest pasid table for global and domain-selective pasid cache
invalidations, since the guest OS may flush the pasid cache with
the wrong granularity, e.g. it does an svm_bind() but then flushes
the pasid cache with global or domain-selective granularity instead
of pasid-selective. Per spec, a global or domain-selective pasid
cache invalidation covers everything a pasid-selective flush does.
On bare metal, the only concern with such a "wider" cache flush is
reduced performance. Under virtualization, however, it would be a
disaster without proper handling. So, to be safe, the vIOMMU
emulator needs to replay for these two invalidation granularities
to reflect the latest pasid bindings in the guest pasid table.
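
Why a wider flush covers a narrower one can be seen from a matching
predicate in the spirit of vtd_flush_pasid(). The sketch below is a
simplified illustration; InvRequest, CachedBinding and inv_matches() are
hypothetical names, not part of this series.

```c
#include <stdbool.h>
#include <stdint.h>

#define INV_DOMAIN_SELECTIVE (1u << 0)
#define INV_PASID_SELECTIVE  (1u << 1)

/* What the guest asked to invalidate. flags == 0 means global. */
typedef struct {
    uint32_t flags;
    uint16_t domain_id;
    uint32_t pasid;
} InvRequest;

/* The identifying fields of one cached PASID binding. */
typedef struct {
    uint16_t domain_id;
    uint32_t pasid;
} CachedBinding;

/* Each selector that is set narrows the match; an unset one matches all. */
static bool inv_matches(const InvRequest *req, const CachedBinding *b)
{
    if ((req->flags & INV_DOMAIN_SELECTIVE) &&
        req->domain_id != b->domain_id) {
        return false;
    }
    if ((req->flags & INV_PASID_SELECTIVE) && req->pasid != b->pasid) {
        return false;
    }
    return true;
}
```

Any binding matched by a pasid-selective request is also matched by a
domain-selective request for the same domain and by a global request, so
after a wide flush the emulator cannot tell which bindings the guest
really changed. That is the reason for the replay.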

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/i386/intel_iommu.c          | 210 ++++++++++++++++++++++++++++++++++++++++-
 hw/i386/intel_iommu_internal.h |   2 +
 2 files changed, 207 insertions(+), 5 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 3b8e614..cfe5dbf 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -70,6 +70,18 @@ static void vtd_pasid_cache_reset(IntelIOMMUState *s);
 static int vtd_pasid_cache_psi(IntelIOMMUState *s,
                                uint16_t domain_id,
                                uint32_t pasid);
+static VTDContextCacheEntry *vtd_find_context_cache(IntelIOMMUState *s,
+                                                    PCIBus *bus, int devfn);
+static void vtd_invalidate_pe_cache(IntelIOMMUState *s,
+                                    PCIBus *bus,
+                                    int devfn,
+                                    uint32_t pasid,
+                                    uint16_t domain_id);
+static VTDAddressSpace *vtd_add_find_pasid_as(IntelIOMMUState *s,
+                                              PCIBus *bus,
+                                              int devfn,
+                                              uint32_t pasid,
+                                              bool allocate);
 
 static void vtd_define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
                             uint64_t wmask, uint64_t w1cmask)
@@ -691,6 +703,11 @@ static inline uint16_t vtd_pe_get_domain_id(VTDPASIDEntry *pe)
     return VTD_SM_PASID_ENTRY_DID((pe)->val[1]);
 }
 
+static inline bool vtd_pe_present(VTDPASIDEntry *pe)
+{
+    return pe->val[0] & VTD_PASID_ENTRY_P;
+}
+
 static int vtd_get_pasid_dire(dma_addr_t pasid_dir_base,
                               uint32_t pasid,
                               VTDPASIDDirEntry *pdire)
@@ -758,6 +775,26 @@ static int vtd_get_pasid_entry_from_pasid(IntelIOMMUState *s,
     return ret;
 }
 
+static inline int vtd_ce_get_pe_from_pasid(IntelIOMMUState *s,
+                                         VTDContextEntry *ce,
+                                         uint32_t pasid,
+                                         VTDPASIDEntry *pe)
+{
+    dma_addr_t pasid_dir_base;
+    int ret;
+
+    assert(s->root_scalable);
+
+    pasid_dir_base = VTD_CE_GET_PASID_DIR_TABLE(ce);
+    ret = vtd_get_pasid_entry_from_pasid(s,
+                                  pasid_dir_base, pasid, pe);
+    if (!vtd_pe_present(pe)) {
+        return -VTD_FR_PASID_ENTRY_P;
+    }
+
+    return ret;
+}
+
 static int vtd_ce_get_rid2pasid_entry(IntelIOMMUState *s,
                                       VTDContextEntry *ce,
                                       VTDPASIDEntry *pe)
@@ -2347,6 +2384,32 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
     return true;
 }
 
+VTDContextCacheEntry *vtd_find_context_cache(IntelIOMMUState *s,
+                                          PCIBus *bus, int devfn)
+{
+    uintptr_t key = (uintptr_t)bus;
+    VTDBus *vtd_bus = g_hash_table_lookup(s->vtd_as_by_busptr, &key);
+    VTDAddressSpace *vtd_dev_as;
+    VTDContextCacheEntry *cc_entry;
+
+    if (!vtd_bus) {
+        return NULL;
+    }
+
+    vtd_dev_as = vtd_bus->dev_as[devfn];
+    if (!vtd_dev_as) {
+        return NULL;
+    }
+
+    cc_entry = &vtd_dev_as->context_cache_entry;
+    if (s->context_cache_gen &&
+        cc_entry->context_cache_gen == s->context_cache_gen) {
+        return cc_entry;
+    } else {
+        return NULL;
+    }
+}
+
 static bool vtd_process_pasid_desc(IntelIOMMUState *s,
                                    VTDInvDesc *inv_desc)
 {
@@ -3622,15 +3685,152 @@ static int vtd_pasid_cache_dsi(IntelIOMMUState *s, uint16_t domain_id)
     return 0;
 }
 
+static void vtd_release_pasid_as(IntelIOMMUState *s,
+                                 PCIBus *bus,
+                                 int devfn,
+                                 uint32_t pasid)
+{
+    char key[32];
+    uint16_t sid;
+
+    sid = vtd_make_source_id(pci_bus_num(bus), devfn);
+    vtd_get_pasid_key(&key[0], 32, pasid, sid);
+    g_hash_table_remove(s->vtd_pasid_as, &key[0]);
+}
+
+static void vtd_invalidate_pe_cache(IntelIOMMUState *s,
+                                    PCIBus *bus,
+                                    int devfn,
+                                    uint32_t pasid,
+                                    uint16_t domain_id)
+{
+    VTDAddressSpace *vtd_pasid_as = NULL;
+    VTDPASIDCacheInfo pc_info;
+
+    pc_info.flags = VTD_PASID_CACHE_DOMSI;
+    pc_info.domain_id = domain_id;
+    pc_info.flags |= VTD_PASID_CACHE_PASIDSI;
+    pc_info.pasid = pasid;
+
+    vtd_pasid_as = vtd_add_find_pasid_as(s, bus, devfn, pasid, false);
+
+    vtd_flush_pasid(NULL, vtd_pasid_as, &pc_info);
+    vtd_release_pasid_as(s, bus, devfn, pasid);
+}
+
+/**
+ * This function finds or adds a VTDAddressSpace for a device when
+ * it is bound to a pasid
+ */
+static VTDAddressSpace *vtd_add_find_pasid_as(IntelIOMMUState *s,
+                                              PCIBus *bus,
+                                              int devfn,
+                                              uint32_t pasid,
+                                              bool allocate)
+{
+    char key[32];
+    char *new_key;
+    VTDAddressSpace *vtd_pasid_as;
+    uint16_t sid;
+
+    sid = vtd_make_source_id(pci_bus_num(bus), devfn);
+    vtd_get_pasid_key(&key[0], 32, pasid, sid);
+    vtd_pasid_as = g_hash_table_lookup(s->vtd_pasid_as, &key[0]);
+
+    if (!vtd_pasid_as && allocate) {
+        new_key = g_malloc(32);
+        vtd_get_pasid_key(&new_key[0], 32, pasid, sid);
+        /*
+         * Initialize the vtd_pasid_as structure.
+         *
+         * This structure tracks the guest pasid binding and also
+         * serves as the pasid-cache management entry.
+         *
+         * TODO: to support SVA-aware DMA emulation in the future,
+         *       vtd_pasid_as should be fully initialized, e.g. the
+         *       address_space and memory region fields.
+         */
+        vtd_pasid_as = g_malloc0(sizeof(VTDAddressSpace));
+        vtd_pasid_as->iommu_state = s;
+        vtd_pasid_as->bus = bus;
+        vtd_pasid_as->devfn = devfn;
+        vtd_pasid_as->context_cache_entry.context_cache_gen = 0;
+        vtd_pasid_as->pasid = pasid;
+        vtd_pasid_as->pasid_allocated = true;
+        vtd_pasid_as->pasid_cache_entry.pasid_cache_gen = 0;
+        g_hash_table_insert(s->vtd_pasid_as, new_key, vtd_pasid_as);
+    }
+    return vtd_pasid_as;
+}
+
 static int vtd_pasid_cache_psi(IntelIOMMUState *s,
                                uint16_t domain_id,
                                uint32_t pasid)
 {
-    /*
-     * Empty in this patch, will add in next patch
-     * vtd_pasid_as instance will be created in this
-     * function
-     */
+    VTDAddressSpace *vtd_pasid_as;
+    VTDContextEntry ce;
+    VTDPASIDEntry pe;
+    PCIBus *bus;
+    int bus_n, devfn;
+    VTDContextCacheEntry *cc_entry = NULL;
+    VTDPASIDCacheEntry *pc_entry = NULL;
+
+    for (bus_n = 0; bus_n < PCI_BUS_MAX; bus_n++) {
+        bus = vtd_find_pci_bus_from_bus_num(s, bus_n);
+        if (!bus) {
+            continue;
+        }
+        for (devfn = 0; devfn < PCI_DEVFN_MAX; devfn++) {
+            /* Step 1: fetch guest context entry */
+            if (vtd_dev_to_context_entry(s, bus_n, devfn, &ce)) {
+                /* guest context entry does not exist, flush cache */
+                cc_entry = vtd_find_context_cache(s, bus, devfn);
+                if (cc_entry) {
+                    cc_entry->context_cache_gen = 0;
+                    vtd_invalidate_pe_cache(s, bus,
+                                          devfn, pasid, domain_id);
+                }
+                /*
+                 * neither guest context entry exists nor context cache
+                 * exists, this pasid flush has nothing to do with this
+                 * devfn in this loop, just go to next devfn
+                 */
+                continue;
+            }
+
+            /* Step 2: fetch guest pasid entry */
+            if (vtd_ce_get_pe_from_pasid(s, &ce, pasid, &pe)) {
+                /* guest PASID entry does not exist, flush cache */
+                vtd_invalidate_pe_cache(s, bus,
+                                       devfn, pasid, domain_id);
+                continue;
+            }
+
+            /*
+             * Step 3: guest pasid entry exists, update the pasid cache
+             *
+             * There is no need to check the domain ID since the guest
+             * pasid entry exists. What needs to be done is:
+             *   - create a new vtd_pasid_as or fetch an existing one
+             *   - update the pc_entry in the vtd_pasid_as
+             *   - set the proper pc_entry.pasid_cache_gen
+             *   - pass the latest guest pasid entry config down to host
+             * With the above operations, the pasid cache in the vIOMMU
+             * device model reflects the latest guest pasid entry
+             * config, and the host also uses the latest guest pasid
+             * entry config.
+             */
+            vtd_pasid_as = vtd_add_find_pasid_as(s, bus,
+                                             devfn, pasid, true);
+            if (!vtd_pasid_as) {
+                printf("%s, fatal error happened!\n", __func__);
+                continue;
+            }
+            pc_entry = &vtd_pasid_as->pasid_cache_entry;
+            pc_entry->pasid_entry = pe; /* update pasid cache */
+            pc_entry->pasid_cache_gen = s->pasid_cache_gen;
+        }
+    }
     return 0;
 }
 
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index bbe176f..afeb6aa 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -310,6 +310,7 @@ typedef enum VTDFaultReason {
     VTD_FR_IR_SID_ERR = 0x26,   /* Invalid Source-ID */
 
     VTD_FR_PASID_TABLE_INV = 0x58,  /*Invalid PASID table entry */
+    VTD_FR_PASID_ENTRY_P = 0x59, /* The Present(P) field of pasidt-entry is 0 */
 
     /* This is not a normal fault reason. We use this to indicate some faults
      * that are not referenced by the VT-d specification.
@@ -529,6 +530,7 @@ typedef struct VTDPASIDCacheInfo VTDPASIDCacheInfo;
 #define VTD_PASID_ENTRY_FPD           (1ULL << 1) /* Fault Processing Disable */
 
 /* PASID Granular Translation Type Mask */
+#define VTD_PASID_ENTRY_P              1ULL
 #define VTD_SM_PASID_ENTRY_PGTT        (7ULL << 6)
 #define VTD_SM_PASID_ENTRY_FLT         (1ULL << 6)
 #define VTD_SM_PASID_ENTRY_SLT         (2ULL << 6)
-- 
2.7.4



* [RFC v1 12/18] intel_iommu: bind/unbind guest page table to host
  2019-07-05 11:01 [RFC v1 00/18] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (10 preceding siblings ...)
  2019-07-05 11:01 ` [RFC v1 11/18] intel_iommu: create VTDAddressSpace per BDF+PASID Liu Yi L
@ 2019-07-05 11:01 ` Liu Yi L
  2019-07-05 11:01 ` [RFC v1 13/18] intel_iommu: flush pasid cache after a DSI context cache flush Liu Yi L
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 58+ messages in thread
From: Liu Yi L @ 2019-07-05 11:01 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: eric.auger, david, tianyu.lan, kevin.tian, yi.l.liu, jun.j.tian,
	yi.y.sun, kvm, Jacob Pan, Yi Sun

This patch captures guest PASID table entry modifications and passes
the changes down to the host. The guest page table is thereby bound to
the host IOMMU and configured as the first-level page table (GVA->GPA),
whose translation result then goes through the host VT-d second-level
page table (GPA->HPA) under nested translation mode. This is a key part
of vSVA support in KVM.
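
As a toy model of the two-stage composition only (real nested
translation walks multi-level tables and also translates the first-level
table pointers through the second level), the data flow looks roughly
like the sketch below. ToyTable, toy_translate() and nested_translate()
are hypothetical names used for illustration.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_MASK  ((1ULL << TOY_PAGE_SHIFT) - 1)
#define TOY_NPAGES     4

/* Toy single-level table: index = page number, value = target page. */
typedef struct {
    uint64_t pfn[TOY_NPAGES];
} ToyTable;

/* Replace the page number, keep the offset within the page. */
static uint64_t toy_translate(const ToyTable *t, uint64_t addr)
{
    uint64_t page = (addr >> TOY_PAGE_SHIFT) % TOY_NPAGES;

    return (t->pfn[page] << TOY_PAGE_SHIFT) | (addr & TOY_PAGE_MASK);
}

/* First level maps GVA->GPA (guest owned), second maps GPA->HPA (host). */
static uint64_t nested_translate(const ToyTable *fl, const ToyTable *sl,
                                 uint64_t gva)
{
    uint64_t gpa = toy_translate(fl, gva);

    return toy_translate(sl, gpa);
}
```

The point of the bind operation is that the first stage is the guest's
own page table, so the host only has to learn where it lives; the second
stage stays under host control.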

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/i386/intel_iommu.c          | 85 +++++++++++++++++++++++++++++++++++++++++-
 hw/i386/intel_iommu_internal.h | 20 ++++++++++
 2 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index cfe5dbf..d897a52 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -703,6 +703,16 @@ static inline uint16_t vtd_pe_get_domain_id(VTDPASIDEntry *pe)
     return VTD_SM_PASID_ENTRY_DID((pe)->val[1]);
 }
 
+static inline uint32_t vtd_pe_get_fl_aw(VTDPASIDEntry *pe)
+{
+    return 48 + ((pe->val[2] >> 2) & VTD_SM_PASID_ENTRY_FLPM) * 9;
+}
+
+static inline dma_addr_t vtd_pe_get_flpt_base(VTDPASIDEntry *pe)
+{
+    return pe->val[2] & VTD_SM_PASID_ENTRY_FLPTPTR;
+}
+
 static inline bool vtd_pe_present(VTDPASIDEntry *pe)
 {
     return pe->val[0] & VTD_PASID_ENTRY_P;
@@ -1836,6 +1846,47 @@ static void vtd_context_global_invalidate(IntelIOMMUState *s)
     vtd_iommu_replay_all(s);
 }
 
+static void vtd_bind_guest_pasid(IntelIOMMUState *s, int bus_n,
+            int devfn, int pasid, VTDPASIDEntry *pe, VTDPASIDOp op)
+{
+    PCIBus *bus;
+    struct gpasid_bind_data *g_bind_data;
+    bus = vtd_find_pci_bus_from_bus_num(s, bus_n);
+    g_bind_data = g_malloc0(sizeof(*g_bind_data));
+
+    switch (op) {
+    case VTD_PASID_BIND:
+    case VTD_PASID_UPDATE:
+        g_bind_data->version = IOMMU_GPASID_BIND_VERSION_1;
+        g_bind_data->format = IOMMU_PASID_FORMAT_INTEL_VTD;
+        g_bind_data->gpgd = vtd_pe_get_flpt_base(pe);
+        g_bind_data->addr_width = vtd_pe_get_fl_aw(pe);
+        g_bind_data->hpasid = pasid;
+        g_bind_data->vtd.flags =
+                             (VTD_SM_PASID_ENTRY_SRE_BIT(pe->val[2]) ? 1 : 0)
+                           | (VTD_SM_PASID_ENTRY_EAFE_BIT(pe->val[2]) ? 1 : 0)
+                           | (VTD_SM_PASID_ENTRY_PCD_BIT(pe->val[1]) ? 1 : 0)
+                           | (VTD_SM_PASID_ENTRY_PWT_BIT(pe->val[1]) ? 1 : 0)
+                           | (VTD_SM_PASID_ENTRY_EMTE_BIT(pe->val[1]) ? 1 : 0)
+                           | (VTD_SM_PASID_ENTRY_CD_BIT(pe->val[1]) ? 1 : 0);
+        g_bind_data->vtd.pat = VTD_SM_PASID_ENTRY_PAT(pe->val[1]);
+        g_bind_data->vtd.emt = VTD_SM_PASID_ENTRY_EMT(pe->val[1]);
+        pci_device_bind_gpasid(bus, devfn, g_bind_data);
+        break;
+
+    case VTD_PASID_UNBIND:
+        g_bind_data->gpgd = 0;
+        g_bind_data->addr_width = 0;
+        g_bind_data->hpasid = pasid;
+        pci_device_unbind_gpasid(bus, devfn, g_bind_data);
+        break;
+
+    default:
+        printf("Unknown VTDPASIDOp!!\n");
+        break;
+    }
+}
+
 /* Do a context-cache device-selective invalidation.
  * @func_mask: FM field after shifting
  */
@@ -1893,6 +1944,17 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
                  * happened.
                  */
                 vtd_sync_shadow_page_table(vtd_as);
+                /*
+                 * Per spec, a context flush should also be followed by
+                 * PASID cache and iotlb flushes. Mark it as a TODO here.
+                 * For a device-selective context cache invalidation:
+                 * if (emulated_device)
+                 *    modify the pasid cache gen and pasid-based iotlb gen
+                 *    value (will be added in following patches)
+                 * else if (assigned_device)
+                 *    check if the device has been bound to any pasid
+                 *    invoke pasid_unbind for each bound pasid
+                 */
             }
         }
     }
@@ -3636,7 +3698,7 @@ static gboolean vtd_flush_pasid(gpointer key, gpointer value,
 {
     VTDPASIDCacheInfo *pc_info = user_data;
     VTDAddressSpace *vtd_pasid_as = value;
-    uint16_t did;
+    uint16_t did, devfn;
     uint32_t pasid;
 
     if (!vtd_pasid_as || !vtd_pasid_as->pasid_allocated) {
@@ -3645,6 +3707,7 @@ static gboolean vtd_flush_pasid(gpointer key, gpointer value,
 
     did = vtd_pe_get_domain_id(&(vtd_pasid_as->pasid_cache_entry.pasid_entry));
     pasid = vtd_pasid_as->pasid;
+    devfn = vtd_pasid_as->devfn;
     if (vtd_pasid_as->pasid_cache_entry.pasid_cache_gen &&
         (vtd_pc_is_dom_si(pc_info) ? (pc_info->domain_id == did) : 1) &&
         (vtd_pc_is_pasid_si(pc_info) ? (pc_info->pasid == pasid) : 1)) {
@@ -3655,6 +3718,19 @@ static gboolean vtd_flush_pasid(gpointer key, gpointer value,
          * cache gen is updated in a new pasid binding.
          */
         vtd_pasid_as->pasid_cache_entry.pasid_cache_gen = 0;
+        /*
+         * To be clean, invalidating the vtd_pasid_as instance is
+         * expected, but it is optional if we want to save memory
+         * allocations for frequent pasid usage on a device. This
+         * function does not release the vtd_pasid_as instance; the
+         * caller should do it explicitly.
+         */
+        vtd_bind_guest_pasid(vtd_pasid_as->iommu_state,
+                             pci_bus_num(vtd_pasid_as->bus),
+                             devfn,
+                             pasid,
+                             NULL,
+                             VTD_PASID_UNBIND);
         return true;
     }
 
@@ -3824,11 +3900,18 @@ static int vtd_pasid_cache_psi(IntelIOMMUState *s,
                                              devfn, pasid, true);
             if (!vtd_pasid_as) {
                 printf("%s, fatal error happened!\n", __func__);
+                /*
+                 * Failed to get vtd_pasid_as; issue an unbind in case
+                 * a previous bind is still in place.
+                 */
+                vtd_bind_guest_pasid(s, bus_n, devfn,
+                                     pasid, NULL, VTD_PASID_UNBIND);
                 continue;
             }
             pc_entry = &vtd_pasid_as->pasid_cache_entry;
             pc_entry->pasid_entry = pe; /* update pasid cache */
             pc_entry->pasid_cache_gen = s->pasid_cache_gen;
+            vtd_bind_guest_pasid(s, bus_n, devfn, pasid, &pe, VTD_PASID_BIND);
         }
     }
     return 0;
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index afeb6aa..f9a4ac6 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -474,6 +474,14 @@ struct VTDRootEntry {
 };
 typedef struct VTDRootEntry VTDRootEntry;
 
+enum VTDPASIDOp {
+    VTD_PASID_BIND,
+    VTD_PASID_UNBIND,
+    VTD_PASID_UPDATE,
+    VTD_OP_NUM
+};
+typedef enum VTDPASIDOp VTDPASIDOp;
+
 struct VTDPASIDCacheInfo {
 #define VTD_PASID_CACHE_DOMSI   (1ULL << 0);
 #define VTD_PASID_CACHE_PASIDSI (1ULL << 1);
@@ -540,6 +548,18 @@ typedef struct VTDPASIDCacheInfo VTDPASIDCacheInfo;
 #define VTD_SM_PASID_ENTRY_AW          7ULL /* Adjusted guest-address-width */
 #define VTD_SM_PASID_ENTRY_DID(val)    ((val) & VTD_DOMAIN_ID_MASK)
 
+/* First-level page table fields of the scalable-mode PASID entry */
+#define VTD_SM_PASID_ENTRY_FLPM          3ULL
+#define VTD_SM_PASID_ENTRY_FLPTPTR       (~0xfffULL)
+#define VTD_SM_PASID_ENTRY_SRE_BIT(val)  (!!((val) & 1ULL))
+#define VTD_SM_PASID_ENTRY_EAFE_BIT(val) (!!(((val) >> 7) & 1ULL))
+#define VTD_SM_PASID_ENTRY_PCD_BIT(val)  (!!(((val) >> 31) & 1ULL))
+#define VTD_SM_PASID_ENTRY_PWT_BIT(val)  (!!(((val) >> 30) & 1ULL))
+#define VTD_SM_PASID_ENTRY_EMTE_BIT(val) (!!(((val) >> 26) & 1ULL))
+#define VTD_SM_PASID_ENTRY_CD_BIT(val)   (!!(((val) >> 25) & 1ULL))
+#define VTD_SM_PASID_ENTRY_PAT(val)      (((val) >> 32) & 0xFFFFFFFFULL)
+#define VTD_SM_PASID_ENTRY_EMT(val)      (((val) >> 27) & 0x7ULL)
+
 /* Second Level Page Translation Pointer*/
 #define VTD_SM_PASID_ENTRY_SLPTPTR     (~0xfffULL)
 
-- 
2.7.4
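
For reference, the first-level field macros added in the hunk above can be exercised in isolation. The sketch below copies a few of them and applies them to a made-up 64-bit word. Which quadword of the PASID entry each macro is applied to is not visible in this hunk, so the `fl_attrs` struct, the `fl_attrs_decode()` helper and the input value are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Copied from the hunk above. */
#define VTD_SM_PASID_ENTRY_SRE_BIT(val)  (!!((val) & 1ULL))
#define VTD_SM_PASID_ENTRY_EAFE_BIT(val) (!!(((val) >> 7) & 1ULL))
#define VTD_SM_PASID_ENTRY_PCD_BIT(val)  (!!(((val) >> 31) & 1ULL))
#define VTD_SM_PASID_ENTRY_PWT_BIT(val)  (!!(((val) >> 30) & 1ULL))
#define VTD_SM_PASID_ENTRY_PAT(val)      (((val) >> 32) & 0xFFFFFFFFULL)
#define VTD_SM_PASID_ENTRY_EMT(val)      (((val) >> 27) & 0x7ULL)

/* Hypothetical holder for the decoded attributes; not part of the patch. */
struct fl_attrs {
    bool sre, eafe, pcd, pwt;
    uint8_t emt;
    uint32_t pat;
};

/* Decode one 64-bit word of a PASID entry (which word is an assumption). */
static struct fl_attrs fl_attrs_decode(uint64_t val)
{
    struct fl_attrs a = {
        .sre  = VTD_SM_PASID_ENTRY_SRE_BIT(val),
        .eafe = VTD_SM_PASID_ENTRY_EAFE_BIT(val),
        .pcd  = VTD_SM_PASID_ENTRY_PCD_BIT(val),
        .pwt  = VTD_SM_PASID_ENTRY_PWT_BIT(val),
        .emt  = (uint8_t)VTD_SM_PASID_ENTRY_EMT(val),
        .pat  = (uint32_t)VTD_SM_PASID_ENTRY_PAT(val),
    };
    return a;
}
```

The single-bit macros normalise to 0 or 1 with `!!`, so they can be stored directly in `bool` fields; the multi-bit fields (EMT, PAT) are shifted and masked instead.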


* [RFC v1 13/18] intel_iommu: flush pasid cache after a DSI context cache flush
  2019-07-05 11:01 [RFC v1 00/18] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (11 preceding siblings ...)
  2019-07-05 11:01 ` [RFC v1 12/18] intel_iommu: bind/unbind guest page table to host Liu Yi L
@ 2019-07-05 11:01 ` Liu Yi L
  2019-07-05 11:01 ` [RFC v1 14/18] hw/pci: add flush_pasid_iotlb() in PCIPASIDOps Liu Yi L
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 58+ messages in thread
From: Liu Yi L @ 2019-07-05 11:01 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: eric.auger, david, tianyu.lan, kevin.tian, yi.l.liu, jun.j.tian,
	yi.y.sun, kvm, Jacob Pan, Yi Sun

This patch flushes the PASID cache after a device-selective context cache
flush. This is a safety measure: strictly speaking, the programmer is
expected to issue a PASID cache flush after a device-selective context
cache invalidation.

TODO: global and domain-selective context cache flushes should also be
followed by a proper PASID cache flush. PASID bind replay also needs to
be considered.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/i386/intel_iommu.c          | 22 ++++++++++++++++++++++
 hw/i386/intel_iommu_internal.h |  2 ++
 hw/i386/trace-events           |  1 +
 3 files changed, 25 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index d897a52..3b213a4 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -70,6 +70,8 @@ static void vtd_pasid_cache_reset(IntelIOMMUState *s);
 static int vtd_pasid_cache_psi(IntelIOMMUState *s,
                                uint16_t domain_id,
                                uint32_t pasid);
+static void vtd_pasid_cache_devsi(IntelIOMMUState *s,
+                                  uint16_t devfn);
 static VTDContextCacheEntry *vtd_find_context_cache(IntelIOMMUState *s,
                                                     PCIBus *bus, int devfn);
 static void vtd_invalidate_pe_cache(IntelIOMMUState *s,
@@ -1955,6 +1957,7 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
                  *    check if the device has been bound to any pasid
                  *    invoke pasid_unbind regards to each bound pasid
                  */
+                vtd_pasid_cache_devsi(s, devfn_it);
             }
         }
     }
@@ -3686,6 +3689,11 @@ static inline bool vtd_pc_is_pasid_si(struct VTDPASIDCacheInfo *pc_info)
     return pc_info->flags & VTD_PASID_CACHE_PASIDSI;
 }
 
+static inline bool vtd_pc_is_dev_si(struct VTDPASIDCacheInfo *pc_info)
+{
+    return pc_info->flags & VTD_PASID_CACHE_DEVSI;
+}
+
 /**
  * This function is used to clear pasid_cache_gen of cached pasid
  * entry in vtd_pasid_as instance. Caller of this function should
@@ -3709,6 +3717,7 @@ static gboolean vtd_flush_pasid(gpointer key, gpointer value,
     pasid = vtd_pasid_as->pasid;
     devfn = vtd_pasid_as->devfn;
     if (vtd_pasid_as->pasid_cache_entry.pasid_cache_gen &&
+        (vtd_pc_is_dev_si(pc_info) ? (pc_info->devfn == devfn) : 1) &&
         (vtd_pc_is_dom_si(pc_info) ? (pc_info->domain_id == did) : 1) &&
         (vtd_pc_is_pasid_si(pc_info) ? (pc_info->pasid == pasid) : 1)) {
         /*
@@ -3917,6 +3926,19 @@ static int vtd_pasid_cache_psi(IntelIOMMUState *s,
     return 0;
 }
 
+static void vtd_pasid_cache_devsi(IntelIOMMUState *s,
+                                  uint16_t devfn)
+{
+    VTDPASIDCacheInfo pc_info;
+
+    trace_vtd_pasid_cache_devsi(devfn);
+
+    pc_info.flags = VTD_PASID_CACHE_DEVSI;
+    pc_info.devfn = devfn;
+
+    g_hash_table_foreach_remove(s->vtd_pasid_as, vtd_flush_pasid, &pc_info);
+}
+
 /**
  * Caller of this function should hold iommu_lock
  */
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index f9a4ac6..021d358 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -485,9 +485,11 @@ typedef enum VTDPASIDOp VTDPASIDOp;
 struct VTDPASIDCacheInfo {
 #define VTD_PASID_CACHE_DOMSI   (1ULL << 0);
 #define VTD_PASID_CACHE_PASIDSI (1ULL << 1);
+#define VTD_PASID_CACHE_DEVSI   (1ULL << 2)
     uint32_t flags;
     uint16_t domain_id;
     uint32_t pasid;
+    uint16_t devfn;
 };
 typedef struct VTDPASIDCacheInfo VTDPASIDCacheInfo;
 
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index 7912ae1..25bd6a4 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -26,6 +26,7 @@ vtd_pasid_cache_reset(void) ""
 vtd_pasid_cache_gsi(void) ""
 vtd_pasid_cache_dsi(uint16_t domain) "Domian slective PC invalidation domain 0x%"PRIx16
 vtd_pasid_cache_psi(uint16_t domain, uint32_t pasid) "PASID slective PC invalidation domain 0x%"PRIx16" pasid 0x%"PRIx32
+vtd_pasid_cache_devsi(uint16_t devfn) "Dev selective PC invalidation dev: 0x%"PRIx16
 vtd_re_not_present(uint8_t bus) "Root entry bus %"PRIu8" not present"
 vtd_ce_not_present(uint8_t bus, uint8_t devfn) "Context entry bus %"PRIu8" devfn %"PRIu8" not present"
 vtd_iotlb_page_hit(uint16_t sid, uint64_t addr, uint64_t slpte, uint16_t domain) "IOTLB page hit sid 0x%"PRIx16" iova 0x%"PRIx64" slpte 0x%"PRIx64" domain 0x%"PRIx16
-- 
2.7.4
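
The selector logic that this patch extends in vtd_flush_pasid() can be sketched on its own as below. This is a simplified stand-in, not the QEMU code: the `pc_filter` and `pc_entry` structs, the `PC_*` flag names and `pc_entry_matches()` are hypothetical. Each selector flag narrows the match only when it is set; an unset flag means "match any".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Selector flags (hypothetical names). No trailing ';' in the macro bodies. */
#define PC_DOMSI   (1u << 0)
#define PC_PASIDSI (1u << 1)
#define PC_DEVSI   (1u << 2)

/* What the invalidation asks for. */
struct pc_filter {
    uint32_t flags;
    uint16_t domain_id;
    uint32_t pasid;
    uint16_t devfn;
};

/* One cached PASID entry; cache_gen == 0 means "already invalid". */
struct pc_entry {
    uint32_t cache_gen;
    uint16_t domain_id;
    uint32_t pasid;
    uint16_t devfn;
};

/* Return true if the entry is valid and selected by the filter. */
static bool pc_entry_matches(const struct pc_filter *f,
                             const struct pc_entry *e)
{
    assert(f && e);
    if (!e->cache_gen) {
        return false;
    }
    if ((f->flags & PC_DEVSI) && f->devfn != e->devfn) {
        return false;
    }
    if ((f->flags & PC_DOMSI) && f->domain_id != e->domain_id) {
        return false;
    }
    if ((f->flags & PC_PASIDSI) && f->pasid != e->pasid) {
        return false;
    }
    return true;
}
```

Only the fields whose flag is set are read from the filter, which is why a device-selective request can leave `domain_id` and `pasid` unset.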


* [RFC v1 14/18] hw/pci: add flush_pasid_iotlb() in PCIPASIDOps
  2019-07-05 11:01 [RFC v1 00/18] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (12 preceding siblings ...)
  2019-07-05 11:01 ` [RFC v1 13/18] intel_iommu: flush pasid cache after a DSI context cache flush Liu Yi L
@ 2019-07-05 11:01 ` Liu Yi L
  2019-07-05 11:01 ` [RFC v1 15/18] vfio/pci: adds support for PASID-based iotlb flush Liu Yi L
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 58+ messages in thread
From: Liu Yi L @ 2019-07-05 11:01 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: eric.auger, david, tianyu.lan, kevin.tian, yi.l.liu, jun.j.tian,
	yi.y.sun, kvm, Jacob Pan, Yi Sun

This patch adds flush_pasid_iotlb() to PCIPASIDOps so that a guest
PASID-based IOTLB flush can be passed to the host via the VFIO interface.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/pci/pci.c         | 15 +++++++++++++++
 include/hw/pci/pci.h |  4 ++++
 2 files changed, 19 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 2229229..cf92bed 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2706,6 +2706,21 @@ void pci_device_unbind_gpasid(PCIBus *bus, int32_t devfn,
     }
 }
 
+void pci_device_flush_pasid_iotlb(PCIBus *bus, int32_t devfn,
+                            struct iommu_cache_invalidate_info *info)
+{
+    PCIDevice *dev;
+
+    if (!bus) {
+        return;
+    }
+
+    dev = bus->devices[devfn];
+    if (dev && dev->pasid_ops && dev->pasid_ops->flush_pasid_iotlb) {
+        dev->pasid_ops->flush_pasid_iotlb(bus, devfn, info);
+    }
+}
+
 static void pci_dev_get_w64(PCIBus *b, PCIDevice *dev, void *opaque)
 {
     Range *range = opaque;
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 8d849e6..77e6bb1 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -272,6 +272,8 @@ struct PCIPASIDOps {
                             struct gpasid_bind_data *g_bind_data);
     void (*unbind_gpasid)(PCIBus *bus, int32_t devfn,
                             struct gpasid_bind_data *g_bind_data);
+    void (*flush_pasid_iotlb)(PCIBus *bus, int32_t devfn,
+                            struct iommu_cache_invalidate_info *info);
 };
 
 struct PCIDevice {
@@ -506,6 +508,8 @@ void pci_device_bind_gpasid(PCIBus *bus, int32_t devfn,
                             struct gpasid_bind_data *g_bind_data);
 void pci_device_unbind_gpasid(PCIBus *bus, int32_t devfn,
                             struct gpasid_bind_data *g_bind_data);
+void pci_device_flush_pasid_iotlb(PCIBus *bus, int32_t devfn,
+                            struct iommu_cache_invalidate_info *info);
 
 static inline void
 pci_set_byte(uint8_t *config, uint8_t val)
-- 
2.7.4
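
The dispatch pattern used by pci_device_flush_pasid_iotlb() can be sketched standalone as follows. The types here (`inv_info`, `pasid_ops`, `device`) and the counting backend are hypothetical stand-ins, not the QEMU definitions. The point of the sketch is that the wrapper checks the device, the ops table and the individual callback before calling through, so a backend that fills in only part of the ops table is skipped rather than dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct iommu_cache_invalidate_info. */
struct inv_info {
    int granularity;
};

/* Ops table in which any callback may be left NULL. */
struct pasid_ops {
    void (*bind)(int devfn);
    void (*flush_pasid_iotlb)(int devfn, const struct inv_info *info);
};

struct device {
    const struct pasid_ops *ops;
};

/* Example backend: just counts how often it is invoked. */
static int flush_calls;

static void count_flush(int devfn, const struct inv_info *info)
{
    (void)devfn;
    (void)info;
    flush_calls++;
}

/* Returns 1 if the flush was forwarded, 0 if it was skipped. */
static int device_flush_pasid_iotlb(struct device *dev, int devfn,
                                    const struct inv_info *info)
{
    assert(info != NULL);
    if (dev && dev->ops && dev->ops->flush_pasid_iotlb) {
        dev->ops->flush_pasid_iotlb(devfn, info);
        return 1;
    }
    return 0;
}
```

Returning a status is a choice made for this sketch so the skip is observable; the function in the patch returns void.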


* [RFC v1 15/18] vfio/pci: adds support for PASID-based iotlb flush
  2019-07-05 11:01 [RFC v1 00/18] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (13 preceding siblings ...)
  2019-07-05 11:01 ` [RFC v1 14/18] hw/pci: add flush_pasid_iotlb() in PCIPASIDOps Liu Yi L
@ 2019-07-05 11:01 ` Liu Yi L
  2019-07-05 11:01 ` [RFC v1 16/18] intel_iommu: add PASID-based iotlb invalidation support Liu Yi L
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 58+ messages in thread
From: Liu Yi L @ 2019-07-05 11:01 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: eric.auger, david, tianyu.lan, kevin.tian, yi.l.liu, jun.j.tian,
	yi.y.sun, kvm, Jacob Pan, Yi Sun

This patch adds support for propagating guest PASID-based IOTLB flushes
to the host via a VFIO container ioctl.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/vfio/pci.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 892b46c..d8b84a5 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2796,11 +2796,34 @@ static void vfio_pci_device_unbind_gpasid(PCIBus *bus, int32_t devfn,
     g_free(bind);
 }
 
+static void vfio_pci_device_flush_pasid_iotlb(PCIBus *bus, int32_t devfn,
+                                struct iommu_cache_invalidate_info *info)
+{
+    PCIDevice *pdev = bus->devices[devfn];
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+    VFIOContainer *container = vdev->vbasedev.group->container;
+    struct vfio_iommu_type1_cache_invalidate cache_inv;
+    unsigned long argsz;
+
+    argsz = sizeof(cache_inv);
+    cache_inv.argsz = argsz;
+    cache_inv.info = *info;
+    cache_inv.flags = 0x0;
+
+    rcu_read_lock();
+    if (ioctl(container->fd, VFIO_IOMMU_CACHE_INVALIDATE, &cache_inv) != 0) {
+        error_report("vfio_pci_device_flush_pasid_iotlb:"
+                     " cache invalidation failed, container: %p", container);
+    }
+    rcu_read_unlock();
+}
+
 static PCIPASIDOps vfio_pci_pasid_ops = {
     .alloc_pasid = vfio_pci_device_request_pasid_alloc,
     .free_pasid = vfio_pci_device_request_pasid_free,
     .bind_gpasid = vfio_pci_device_bind_gpasid,
     .unbind_gpasid = vfio_pci_device_unbind_gpasid,
+    .flush_pasid_iotlb = vfio_pci_device_flush_pasid_iotlb,
 };
 
 static void vfio_realize(PCIDevice *pdev, Error **errp)
-- 
2.7.4


* [RFC v1 16/18] intel_iommu: add PASID-based iotlb invalidation support
  2019-07-05 11:01 [RFC v1 00/18] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (14 preceding siblings ...)
  2019-07-05 11:01 ` [RFC v1 15/18] vfio/pci: adds support for PASID-based iotlb flush Liu Yi L
@ 2019-07-05 11:01 ` Liu Yi L
  2019-07-05 11:01 ` [RFC v1 17/18] intel_iommu: propagate PASID-based iotlb flush to host Liu Yi L
  2019-07-05 11:01 ` [RFC v1 18/18] intel_iommu: do not passdown pasid bind for PASID #0 Liu Yi L
  17 siblings, 0 replies; 58+ messages in thread
From: Liu Yi L @ 2019-07-05 11:01 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: eric.auger, david, tianyu.lan, kevin.tian, yi.l.liu, jun.j.tian,
	yi.y.sun, kvm, Jacob Pan, Yi Sun

The PASID-based IOTLB (piotlb) caches translations obtained by walking the
Intel VT-d first-level page table. This patch adds the framework for
processing PASID-based IOTLB flushes. The detailed processing is added in
the next patch of this series.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/i386/intel_iommu.c          | 61 ++++++++++++++++++++++++++++++++++++++++++
 hw/i386/intel_iommu_internal.h | 13 +++++++++
 hw/i386/trace-events           |  1 +
 3 files changed, 75 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 3b213a4..7a778d8 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2516,6 +2516,63 @@ static bool vtd_process_pasid_desc(IntelIOMMUState *s,
     return (ret == 0) ? true : false;
 }
 
+static void vtd_piotlb_pasid_invalidate(IntelIOMMUState *s,
+                                        uint16_t domain_id,
+                                        uint32_t pasid)
+{
+}
+
+static void vtd_piotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
+                             uint32_t pasid, hwaddr addr, uint8_t am, bool ih)
+{
+}
+
+static bool vtd_process_piotlb_desc(IntelIOMMUState *s,
+                                    VTDInvDesc *inv_desc)
+{
+    uint16_t domain_id;
+    uint32_t pasid;
+    uint8_t am;
+    hwaddr addr;
+
+    if ((inv_desc->val[0] & VTD_INV_DESC_PIOTLB_RSVD_VAL0) ||
+        (inv_desc->val[1] & VTD_INV_DESC_PIOTLB_RSVD_VAL1)) {
+        trace_vtd_piotlb_inv("Non-zero reserved field",
+                          inv_desc->val[1], inv_desc->val[0]);
+        return false;
+    }
+
+    domain_id = VTD_INV_DESC_PIOTLB_DID(inv_desc->val[0]);
+    pasid = VTD_INV_DESC_PIOTLB_PASID(inv_desc->val[0]);
+    switch (inv_desc->val[0] & VTD_INV_DESC_IOTLB_G) {
+    case VTD_INV_DESC_PIOTLB_ALL_IN_PASID:
+        trace_vtd_piotlb_inv("PASID selective piotlb flush",
+                          inv_desc->val[1], inv_desc->val[0]);
+        vtd_piotlb_pasid_invalidate(s, domain_id, pasid);
+        break;
+
+    case VTD_INV_DESC_PIOTLB_PSI_IN_PASID:
+        am = VTD_INV_DESC_PIOTLB_AM(inv_desc->val[1]);
+        addr = (hwaddr) VTD_INV_DESC_PIOTLB_ADDR(inv_desc->val[1]);
+        trace_vtd_piotlb_inv("Page selective piotlb flush within a PASID",
+                          inv_desc->val[1], inv_desc->val[0]);
+        if (am > VTD_MAMV) {
+            trace_vtd_piotlb_inv("Invalid am, > max am value",
+                          inv_desc->val[1], inv_desc->val[0]);
+            return false;
+        }
+        vtd_piotlb_page_invalidate(s, domain_id, pasid,
+             addr, am, VTD_INV_DESC_PIOTLB_IH(inv_desc->val[1]));
+        break;
+
+    default:
+        trace_vtd_piotlb_inv("Invalid granularity in P-IOTLB desc",
+                          inv_desc->val[1], inv_desc->val[0]);
+        return false;
+    }
+    return true;
+}
+
 static bool vtd_process_inv_iec_desc(IntelIOMMUState *s,
                                      VTDInvDesc *inv_desc)
 {
@@ -2630,6 +2687,10 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
         break;
 
     case VTD_INV_DESC_PIOTLB:
+        trace_vtd_inv_desc("p-iotlb", inv_desc.val[1], inv_desc.val[0]);
+        if (!vtd_process_piotlb_desc(s, &inv_desc)) {
+            return false;
+        }
         break;
 
     case VTD_INV_DESC_WAIT:
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 021d358..69cd879 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -449,6 +449,19 @@ typedef union VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_PASIDC_PASID_SI   (1ULL << 4)
 #define VTD_INV_DESC_PASIDC_GLOBAL     (3ULL << 4)
 
+#define VTD_INV_DESC_PIOTLB_ALL_IN_PASID  (2ULL << 4)
+#define VTD_INV_DESC_PIOTLB_PSI_IN_PASID  (3ULL << 4)
+
+#define VTD_INV_DESC_PIOTLB_RSVD_VAL0     0xfff000000000ffc0ULL
+#define VTD_INV_DESC_PIOTLB_RSVD_VAL1     0xf80ULL
+
+#define VTD_INV_DESC_PIOTLB_PASID(val)    (((val) >> 32) & 0xfffffULL)
+#define VTD_INV_DESC_PIOTLB_DID(val)      (((val) >> 16) & \
+                                             VTD_DOMAIN_ID_MASK)
+#define VTD_INV_DESC_PIOTLB_ADDR(val)     ((val) & ~0xfffULL)
+#define VTD_INV_DESC_PIOTLB_AM(val)       ((val) & 0x3fULL)
+#define VTD_INV_DESC_PIOTLB_IH(val)       (((val) >> 6) & 0x1)
+
 /* Information about page-selective IOTLB invalidate */
 struct VTDIOTLBPageInvInfo {
     uint16_t domain_id;
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index 25bd6a4..2338be7 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -16,6 +16,7 @@ vtd_inv_desc_wait_sw(uint64_t addr, uint32_t data) "wait invalidate status write
 vtd_inv_desc_wait_irq(const char *msg) "%s"
 vtd_inv_desc_wait_write_fail(uint64_t hi, uint64_t lo) "write fail for wait desc hi 0x%"PRIx64" lo 0x%"PRIx64
 vtd_inv_desc_iec(uint32_t granularity, uint32_t index, uint32_t mask) "granularity 0x%"PRIx32" index 0x%"PRIx32" mask 0x%"PRIx32
+vtd_piotlb_inv(const char *type, uint64_t hi, uint64_t lo) "invalidate desc type %s high 0x%"PRIx64" low 0x%"PRIx64
 vtd_inv_qi_enable(bool enable) "enabled %d"
 vtd_inv_qi_setup(uint64_t addr, int size) "addr 0x%"PRIx64" size %d"
 vtd_inv_qi_head(uint16_t head) "read head %d"
-- 
2.7.4


* [RFC v1 17/18] intel_iommu: propagate PASID-based iotlb flush to host
  2019-07-05 11:01 [RFC v1 00/18] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (15 preceding siblings ...)
  2019-07-05 11:01 ` [RFC v1 16/18] intel_iommu: add PASID-based iotlb invalidation support Liu Yi L
@ 2019-07-05 11:01 ` Liu Yi L
  2019-07-05 11:01 ` [RFC v1 18/18] intel_iommu: do not passdown pasid bind for PASID #0 Liu Yi L
  17 siblings, 0 replies; 58+ messages in thread
From: Liu Yi L @ 2019-07-05 11:01 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: eric.auger, david, tianyu.lan, kevin.tian, yi.l.liu, jun.j.tian,
	yi.y.sun, kvm, Jacob Pan, Yi Sun

Intel VT-d 3.0 supports nested translation at PASID granularity. For guest
SVA support, nested translation is enabled for specific PASIDs. In that case
the guest owns the GVA->GPA translation, which is configured on the host side
as the first-level page table for a specific PASID, and the host owns the
GPA->HPA translation. Because the guest owns the first-level translation
table, the guest's PASID-based IOTLB (piotlb) flushes must be propagated to
the host: the host IOMMU caches first-level page table mappings during DMA
address translation.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/i386/intel_iommu.c          | 68 ++++++++++++++++++++++++++++++++++++++++++
 hw/i386/intel_iommu_internal.h |  7 +++++
 2 files changed, 75 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 7a778d8..e4286e5 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2516,15 +2516,83 @@ static bool vtd_process_pasid_desc(IntelIOMMUState *s,
     return (ret == 0) ? true : false;
 }
 
+static inline bool vtd_pasid_cache_valid(VTDAddressSpace *vtd_pasid_as)
+{
+    return (vtd_pasid_as->iommu_state->pasid_cache_gen &&
+            (vtd_pasid_as->iommu_state->pasid_cache_gen
+             == vtd_pasid_as->pasid_cache_entry.pasid_cache_gen));
+}
+
+static void vtd_flush_pasid_iotlb(gpointer key, gpointer value,
+                                  gpointer user_data)
+{
+    VTDPIOTLBInvInfo *piotlb_info = user_data;
+    VTDAddressSpace *vtd_pasid_as = value;
+    uint16_t did;
+
+    /*
+     * Check whether the pasid entry cache stored in vtd_pasid_as is
+     * valid. "Invalid" means the pasid cache has already been flushed,
+     * so the host should have performed the corresponding piotlb flush
+     * and nothing needs to be passed down. Only when the pasid entry
+     * cache is "valid" should a piotlb flush be passed down to the
+     * host: in that case the flush is caused by mapping changes in the
+     * guest, and a piotlb flush in the host is required.
+     */
+    if (!vtd_pasid_as || !vtd_pasid_cache_valid(vtd_pasid_as)) {
+        return;
+    }
+
+    did = vtd_pe_get_domain_id(
+                &(vtd_pasid_as->pasid_cache_entry.pasid_entry));
+    /*
+     * vtd_pasid_as should be non-NULL and the pasid_cache_gen
+     * should be non-zero. If vtd_pasid_as management is clean,
+     * checking that vtd_pasid_as is non-NULL is enough.
+     */
+    if ((piotlb_info->domain_id == did) &&
+        (piotlb_info->pasid == vtd_pasid_as->pasid)) {
+        pci_device_flush_pasid_iotlb(vtd_pasid_as->bus,
+                            vtd_pasid_as->devfn, &piotlb_info->tlb_info);
+    }
+}
+
 static void vtd_piotlb_pasid_invalidate(IntelIOMMUState *s,
                                         uint16_t domain_id,
                                         uint32_t pasid)
 {
+    VTDPIOTLBInvInfo piotlb_info;
+    struct iommu_cache_invalidate_info *inv_info = &piotlb_info.tlb_info;
+
+    piotlb_info.domain_id = domain_id;
+    piotlb_info.pasid = pasid;
+    inv_info->version = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1;
+    inv_info->cache = IOMMU_CACHE_INV_TYPE_IOTLB;
+    inv_info->granularity = IOMMU_INV_GRANU_PASID;
+    inv_info->pasid_info.pasid = pasid;
+    inv_info->pasid_info.flags = IOMMU_INV_PASID_FLAGS_PASID;
+    g_hash_table_foreach(s->vtd_pasid_as, vtd_flush_pasid_iotlb, &piotlb_info);
 }
 
 static void vtd_piotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
                              uint32_t pasid, hwaddr addr, uint8_t am, bool ih)
 {
+    VTDPIOTLBInvInfo piotlb_info;
+    struct iommu_cache_invalidate_info *inv_info = &piotlb_info.tlb_info;
+
+    piotlb_info.domain_id = domain_id;
+    piotlb_info.pasid = pasid;
+    inv_info->version = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1;
+    inv_info->cache = IOMMU_CACHE_INV_TYPE_IOTLB;
+    inv_info->granularity = IOMMU_INV_GRANU_ADDR;
+    inv_info->addr_info.flags = IOMMU_INV_ADDR_FLAGS_PASID;
+    inv_info->addr_info.flags |= ih ? IOMMU_INV_ADDR_FLAGS_LEAF : 0;
+    inv_info->addr_info.pasid = pasid;
+    inv_info->addr_info.addr = addr;
+    inv_info->addr_info.granule_size = 1ULL << (12 + am);
+    inv_info->addr_info.nb_granules = 1;
+
+    g_hash_table_foreach(s->vtd_pasid_as, vtd_flush_pasid_iotlb, &piotlb_info);
 }
 
 static bool vtd_process_piotlb_desc(IntelIOMMUState *s,
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 69cd879..556ea8d 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -506,6 +506,13 @@ struct VTDPASIDCacheInfo {
 };
 typedef struct VTDPASIDCacheInfo VTDPASIDCacheInfo;
 
+struct VTDPIOTLBInvInfo {
+    uint16_t domain_id;
+    uint32_t pasid;
+    struct iommu_cache_invalidate_info tlb_info;
+};
+typedef struct VTDPIOTLBInvInfo VTDPIOTLBInvInfo;
+
 /* Masks for struct VTDRootEntry */
 #define VTD_ROOT_ENTRY_P            1ULL
 #define VTD_ROOT_ENTRY_CTP          (~0xfffULL)
-- 
2.7.4
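
Two small pieces of the logic in this patch can be checked in isolation: the "is the cached PASID entry still valid" test and the conversion from the descriptor's address mask (AM) to a granule size. The sketch below uses hypothetical helper names and plain integers instead of the QEMU structures. It assumes, as the patch does, that AM counts powers of two of 4 KiB pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * A cached PASID entry is valid only if the global generation is
 * non-zero and the entry carries the same generation.
 */
static bool pasid_cache_valid(uint32_t iommu_gen, uint32_t entry_gen)
{
    return iommu_gen && iommu_gen == entry_gen;
}

/*
 * Granule size in bytes for address mask 'am': 2^am pages of 4 KiB.
 * The shift is done in 64 bits so large masks cannot overflow an int.
 */
static uint64_t piotlb_granule_size(uint8_t am)
{
    assert(am <= 51);              /* keep the shift below 64 bits */
    return 1ULL << (12 + am);
}
```

With these helpers, am = 0 gives 4 KiB, am = 9 gives 2 MiB and am = 18 gives 1 GiB; the flush is then described to the host as one granule of that size.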


* [RFC v1 18/18] intel_iommu: do not passdown pasid bind for PASID #0
  2019-07-05 11:01 [RFC v1 00/18] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
                   ` (16 preceding siblings ...)
  2019-07-05 11:01 ` [RFC v1 17/18] intel_iommu: propagate PASID-based iotlb flush to host Liu Yi L
@ 2019-07-05 11:01 ` Liu Yi L
  17 siblings, 0 replies; 58+ messages in thread
From: Liu Yi L @ 2019-07-05 11:01 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: eric.auger, david, tianyu.lan, kevin.tian, yi.l.liu, jun.j.tian,
	yi.y.sun, kvm, Jacob Pan, Yi Sun

The RID_PASID field was introduced in the VT-d 3.0 spec. It is used for DMA
requests without PASID (also known as IOVA) in scalable-mode VT-d. The
VT-d 3.1 spec defines it further:

"Implementations not supporting RID_PASID capability (ECAP_REG.RPS is
0b), use a PASID value of 0 to perform address translation for requests
without PASID."

This patch adds a check on the PASIDs that are about to be bound to a
device. For PASID #0 there is no need to pass down the PASID binding, since
PASID #0 is used as RID_PASID for requests without PASID. The reason is that
the current Intel vIOMMU supports guest IOVA by shadowing the guest
second-level page table. In the future, if the guest OS uses the first-level
page table to store IOVA mappings, guest IOVA support will also be done via
nested translation on the host side. The vIOMMU could then pass down the
PASID binding for PASID #0 to the host with a special PASID value, which
tells the host to bind the guest page table to a proper PASID, e.g. the PASID
value from the RID_PASID field for a PF/VF, or the default PASID for an ADI
(Assignable Device Interface in the Scalable IOV solution).

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 hw/i386/intel_iommu.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index e4286e5..ee55209 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1853,6 +1853,14 @@ static void vtd_bind_guest_pasid(IntelIOMMUState *s, int bus_n,
 {
     PCIBus *bus;
     struct gpasid_bind_data *g_bind_data;
+
+    if (pasid < VTD_MIN_HPASID) {
+        /*
+         * If pasid < VTD_MIN_HPASID, this pasid was not allocated
+         * by the host, so changes to it need not be passed down.
+         */
+        return;
+    }
     bus = vtd_find_pci_bus_from_bus_num(s, bus_n);
     g_bind_data = g_malloc0(sizeof(*g_bind_data));
 
-- 
2.7.4
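
The guard added by this patch amounts to a range check on the PASID. A minimal sketch follows; the threshold value and the helper name are hypothetical, since VTD_MIN_HPASID is defined elsewhere in the series and its value is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold; stands in for VTD_MIN_HPASID. */
#define SKETCH_MIN_HPASID 32u

/*
 * PASIDs below the threshold (including PASID #0, used as RID_PASID)
 * were not allocated by the host, so their bind/unbind is kept local.
 */
static bool pasid_needs_host_bind(uint32_t pasid)
{
    return pasid >= SKETCH_MIN_HPASID;
}
```

Note that the check covers every PASID below the threshold, not only PASID #0 as the subject line suggests.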


* Re: [RFC v1 02/18] linux-headers: import vfio.h from kernel
  2019-07-05 11:01 ` [RFC v1 02/18] linux-headers: import vfio.h " Liu Yi L
@ 2019-07-09  1:58   ` Peter Xu
  2019-07-09  8:37     ` Auger Eric
  2019-07-10 12:29     ` Liu, Yi L
  0 siblings, 2 replies; 58+ messages in thread
From: Peter Xu @ 2019-07-09  1:58 UTC (permalink / raw)
  To: Liu Yi L
  Cc: qemu-devel, mst, pbonzini, alex.williamson, eric.auger, david,
	tianyu.lan, kevin.tian, jun.j.tian, yi.y.sun, kvm, Jacob Pan,
	Yi Sun

On Fri, Jul 05, 2019 at 07:01:35PM +0800, Liu Yi L wrote:
> This patch imports the vIOMMU related definitions from kernel
> uapi/vfio.h. e.g. pasid allocation, guest pasid bind, guest pasid
> table bind and guest iommu cache invalidation.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Yi Sun <yi.y.sun@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>

Just a note that in the final version you can use
scripts/update-linux-headers.sh to update the headers.  For this RFC
it's perfectly fine.

-- 
Peter Xu

* Re: [RFC v1 03/18] hw/pci: introduce PCIPASIDOps to PCIDevice
  2019-07-05 11:01 ` [RFC v1 03/18] hw/pci: introduce PCIPASIDOps to PCIDevice Liu Yi L
@ 2019-07-09  2:12   ` Peter Xu
  2019-07-09 10:41     ` Auger Eric
  2019-07-10 11:08     ` Liu, Yi L
  0 siblings, 2 replies; 58+ messages in thread
From: Peter Xu @ 2019-07-09  2:12 UTC (permalink / raw)
  To: Liu Yi L
  Cc: qemu-devel, mst, pbonzini, alex.williamson, eric.auger, david,
	tianyu.lan, kevin.tian, jun.j.tian, yi.y.sun, kvm, Jacob Pan,
	Yi Sun

On Fri, Jul 05, 2019 at 07:01:36PM +0800, Liu Yi L wrote:
> +void pci_setup_pasid_ops(PCIDevice *dev, PCIPASIDOps *ops)
> +{
> +    assert(ops && !dev->pasid_ops);
> +    dev->pasid_ops = ops;
> +}
> +
> +bool pci_device_is_ops_set(PCIBus *bus, int32_t devfn)

Name should be "pci_device_is_pasid_ops_set".  Or maybe you can simply
drop this function: as long as you always check it in helper functions
like [1] below, it seems unnecessary.

> +{
> +    PCIDevice *dev;
> +
> +    if (!bus) {
> +        return false;
> +    }
> +
> +    dev = bus->devices[devfn];
> +    return !!(dev && dev->pasid_ops);
> +}
> +
> +int pci_device_request_pasid_alloc(PCIBus *bus, int32_t devfn,
> +                                   uint32_t min_pasid, uint32_t max_pasid)

From the VT-d spec I see that the virtual command "allocate pasid" does
not carry BDF information, so it's global, but here we've got bus/devfn.
I'm curious: is that reserved for ARM or some other arch?

> +{
> +    PCIDevice *dev;
> +
> +    if (!bus) {
> +        return -1;
> +    }
> +
> +    dev = bus->devices[devfn];
> +    if (dev && dev->pasid_ops && dev->pasid_ops->alloc_pasid) {

[1]

> +        return dev->pasid_ops->alloc_pasid(bus, devfn, min_pasid, max_pasid);
> +    }
> +    return -1;
> +}
> +
> +int pci_device_request_pasid_free(PCIBus *bus, int32_t devfn,
> +                                  uint32_t pasid)
> +{
> +    PCIDevice *dev;
> +
> +    if (!bus) {
> +        return -1;
> +    }
> +
> +    dev = bus->devices[devfn];
> +    if (dev && dev->pasid_ops && dev->pasid_ops->free_pasid) {
> +        return dev->pasid_ops->free_pasid(bus, devfn, pasid);
> +    }
> +    return -1;
> +}
> +
>  static void pci_dev_get_w64(PCIBus *b, PCIDevice *dev, void *opaque)
>  {
>      Range *range = opaque;
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index d082707..16e5b8e 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -262,6 +262,13 @@ struct PCIReqIDCache {
>  };
>  typedef struct PCIReqIDCache PCIReqIDCache;
>  
> +typedef struct PCIPASIDOps PCIPASIDOps;
> +struct PCIPASIDOps {
> +    int (*alloc_pasid)(PCIBus *bus, int32_t devfn,
> +                         uint32_t min_pasid, uint32_t max_pasid);
> +    int (*free_pasid)(PCIBus *bus, int32_t devfn, uint32_t pasid);
> +};
> +
>  struct PCIDevice {
>      DeviceState qdev;
>  
> @@ -351,6 +358,7 @@ struct PCIDevice {
>      MSIVectorUseNotifier msix_vector_use_notifier;
>      MSIVectorReleaseNotifier msix_vector_release_notifier;
>      MSIVectorPollNotifier msix_vector_poll_notifier;
> +    PCIPASIDOps *pasid_ops;
>  };
>  
>  void pci_register_bar(PCIDevice *pci_dev, int region_num,
> @@ -484,6 +492,12 @@ typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, int);
>  AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
>  void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
>  
> +void pci_setup_pasid_ops(PCIDevice *dev, PCIPASIDOps *ops);
> +bool pci_device_is_ops_set(PCIBus *bus, int32_t devfn);
> +int pci_device_request_pasid_alloc(PCIBus *bus, int32_t devfn,
> +                                   uint32_t min_pasid, uint32_t max_pasid);
> +int pci_device_request_pasid_free(PCIBus *bus, int32_t devfn, uint32_t pasid);
> +
>  static inline void
>  pci_set_byte(uint8_t *config, uint8_t val)
>  {
> -- 
> 2.7.4
> 

Regards,

-- 
Peter Xu

* Re: [RFC v1 04/18] intel_iommu: add "sm_model" option
  2019-07-05 11:01 ` [RFC v1 04/18] intel_iommu: add "sm_model" option Liu Yi L
@ 2019-07-09  2:15   ` Peter Xu
  2019-07-10 12:14     ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: Peter Xu @ 2019-07-09  2:15 UTC (permalink / raw)
  To: Liu Yi L
  Cc: qemu-devel, mst, pbonzini, alex.williamson, eric.auger, david,
	tianyu.lan, kevin.tian, jun.j.tian, yi.y.sun, kvm, Jacob Pan,
	Yi Sun

On Fri, Jul 05, 2019 at 07:01:37PM +0800, Liu Yi L wrote:
> Intel VT-d 3.0 introduces scalable mode, and it has a bunch of
> capabilities related to scalable mode translation, thus there
> are multiple combinations. While this vIOMMU implementation
> wants simplify it for user by providing typical combinations.
> User could config it by "sm_model" option. The usage is as
> below:
> 
> "-device intel-iommu,x-scalable-mode=on,sm_model=["legacy"|"scalable"]"

Is it a requirement to split into two parameters, instead of just
exposing everything about scalable mode when x-scalable-mode is set?

> 
>  - "legacy": gives support for SL page table
>  - "scalable": gives support for FL page table, pasid, virtual command
>  - default to be "legacy" if "x-scalable-mode=on while no sm_model is
>    configured
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Yi Sun <yi.y.sun@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> ---
>  hw/i386/intel_iommu.c          | 28 +++++++++++++++++++++++++++-
>  hw/i386/intel_iommu_internal.h |  2 ++
>  include/hw/i386/intel_iommu.h  |  1 +
>  3 files changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 44b1231..3160a05 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -3014,6 +3014,7 @@ static Property vtd_properties[] = {
>      DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
>      DEFINE_PROP_BOOL("x-scalable-mode", IntelIOMMUState, scalable_mode, FALSE),
>      DEFINE_PROP_BOOL("dma-drain", IntelIOMMUState, dma_drain, true),
> +    DEFINE_PROP_STRING("sm_model", IntelIOMMUState, sm_model),

Can do 's/_/-/' (i.e. "sm-model") to follow the rest if we need it.

>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> @@ -3489,6 +3490,14 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
>      return;
>  }
>  
> +const char sm_model_manual[] =
> +        "\"-device intel-iommu,x-scalable-mode=on,"
> +        "sm_model=[\"legacy\"|\"scalable\"]\"\n"
> +        " - \"legacy\" gives support for SL page table based IOVA\n"
> +        " - \"scalable\" gives support for FL page table based IOVA and SVA\n"
> +        " - default to be \"legacy\" if \"x-scalable-mode=on\""
> +        " while no sm_model is configured\n";
> +
>  /* Do the initialization. It will also be called when reset, so pay
>   * attention when adding new initialization stuff.
>   */
> @@ -3557,9 +3566,26 @@ static void vtd_init(IntelIOMMUState *s)
>          s->cap |= VTD_CAP_CM;
>      }
>  
> +    if (s->sm_model && !s->scalable_mode) {
> +        printf("\n\"sm_model\" depends on \"x-scalable-mode\"\n"
> +               "please check if \"x-scalable-mode\" is expected\n"
> +               "\"sm_model\" manual:\n%s", sm_model_manual);
> +        exit(1);

Let's avoid calling exit() directly, considering that we already have
things like vtd_decide_config() which take an Error **.  We can
introduce that into vtd_init() too and pass the error up to the caller
to handle the failure.
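
To illustrate the shape of that change, here is a minimal standalone
sketch.  It does not use QEMU's real Error API; SketchError,
sketch_error_set() and sketch_check_sm_model() are hypothetical names
made up for this example.  The point is only that the check reports
the problem through an out-parameter and a return value, and leaves
the decision to abort to the caller.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object, for illustration only. */
typedef struct SketchError {
    char msg[128];
} SketchError;

/* Record a failure message if the caller asked for one. */
static void sketch_error_set(SketchError *err, const char *msg)
{
    if (err) {
        snprintf(err->msg, sizeof(err->msg), "%s", msg);
    }
}

/*
 * Validate the option combination and report problems to the caller
 * instead of terminating the process.  Returns true on success.
 */
static bool sketch_check_sm_model(bool scalable_mode, const char *sm_model,
                                  SketchError *err)
{
    if (sm_model && !scalable_mode) {
        sketch_error_set(err, "sm_model depends on x-scalable-mode");
        return false;
    }
    if (sm_model && strcmp(sm_model, "legacy") &&
        strcmp(sm_model, "scalable")) {
        sketch_error_set(err, "sm_model must be legacy or scalable");
        return false;
    }
    return true;
}
```

A realize-time caller would then forward the error upwards rather than
printing and exiting on its own.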

> +    }
> +
>      /* TODO: read cap/ecap from host to decide which cap to be exposed. */
>      if (s->scalable_mode) {
> -        s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
> +        if (!s->sm_model || !strcmp(s->sm_model, "legacy")) {
> +            s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
> +        } else if (!strcmp(s->sm_model, "scalable")) {
> +            s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_PASID
> +                       | VTD_ECAP_FLTS;

Do you also need VTD_ECAP_SLTS here?

> +        } else {
> +            printf("\n!!!!! Invalid sm_model config !!!!!\n"
> +                "Please config sm_model=[\"legacy\"|\"scalable\"]\n"
> +                "\"sm_model\" manual:\n%s", sm_model_manual);
> +            exit(1);

Same here.

Thanks,

> +        }
>      }
>  
>      vtd_reset_caches(s);
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index c1235a7..adae198 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -190,8 +190,10 @@
>  #define VTD_ECAP_PT                 (1ULL << 6)
>  #define VTD_ECAP_MHMV               (15ULL << 20)
>  #define VTD_ECAP_SRS                (1ULL << 31)
> +#define VTD_ECAP_PASID              (1ULL << 40)
>  #define VTD_ECAP_SMTS               (1ULL << 43)
>  #define VTD_ECAP_SLTS               (1ULL << 46)
> +#define VTD_ECAP_FLTS               (1ULL << 47)
>  
>  /* CAP_REG */
>  /* (offset >> 4) << 24 */
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index 12f3d26..b51cc9f 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -270,6 +270,7 @@ struct IntelIOMMUState {
>      bool buggy_eim;                 /* Force buggy EIM unless eim=off */
>      uint8_t aw_bits;                /* Host/IOVA address width (in bits) */
>      bool dma_drain;                 /* Whether DMA r/w draining enabled */
> +    char *sm_model;          /* identify actual scalable mode iommu model*/
>  
>      /*
>       * Protects IOMMU states in general.  Currently it protects the
> -- 
> 2.7.4
> 

Regards,

-- 
Peter Xu

* Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation
  2019-07-05 11:01 ` [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation Liu Yi L
@ 2019-07-09  2:23   ` Peter Xu
  2019-07-10 12:16     ` Liu, Yi L
  2019-07-15  2:55   ` David Gibson
  1 sibling, 1 reply; 58+ messages in thread
From: Peter Xu @ 2019-07-09  2:23 UTC (permalink / raw)
  To: Liu Yi L
  Cc: qemu-devel, mst, pbonzini, alex.williamson, eric.auger, david,
	tianyu.lan, kevin.tian, jun.j.tian, yi.y.sun, kvm, Jacob Pan,
	Yi Sun

On Fri, Jul 05, 2019 at 07:01:38PM +0800, Liu Yi L wrote:
> This patch adds vfio implementation PCIPASIDOps.alloc_pasid/free_pasid().
> These two functions are used to propagate guest pasid allocation and
> free requests to host via vfio container ioctl.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Yi Sun <yi.y.sun@linux.intel.com>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> ---
>  hw/vfio/pci.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 61 insertions(+)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index ce3fe96..ab184ad 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2690,6 +2690,65 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
>      vdev->req_enabled = false;
>  }
>  
> +static int vfio_pci_device_request_pasid_alloc(PCIBus *bus,
> +                                               int32_t devfn,
> +                                               uint32_t min_pasid,
> +                                               uint32_t max_pasid)
> +{
> +    PCIDevice *pdev = bus->devices[devfn];
> +    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> +    VFIOContainer *container = vdev->vbasedev.group->container;
> +    struct vfio_iommu_type1_pasid_request req;
> +    unsigned long argsz;
> +    int pasid;
> +
> +    argsz = sizeof(req);
> +    req.argsz = argsz;
> +    req.flag = VFIO_IOMMU_PASID_ALLOC;
> +    req.min_pasid = min_pasid;
> +    req.max_pasid = max_pasid;
> +
> +    rcu_read_lock();

Could I ask what's this RCU lock protecting?

> +    pasid = ioctl(container->fd, VFIO_IOMMU_PASID_REQUEST, &req);
> +    if (pasid < 0) {
> +        error_report("vfio_pci_device_request_pasid_alloc:"
> +                     " request failed, contanier: %p", container);

Can use __func__, also since we're going to dump the error after all,
we can also include the errno (pasid) here which seems to be more
helpful than the container pointer at least to me. :)
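
A rough standalone sketch of what such a message could look like.  On
failure ioctl() returns -1 and leaves the reason in errno, so that is
the value worth printing.  The helper names below are hypothetical and
the ioctl result is passed in as plain integers so the example stays
self-contained.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a failure message that names the calling
 * function and the errno value, rather than printing a pointer.
 */
static int sketch_format_failure(char *buf, size_t len, const char *func,
                                 int err)
{
    return snprintf(buf, len, "%s: request failed: %s (errno %d)",
                    func, strerror(err), err);
}

/*
 * Stand-in for the alloc path.  ioctl_ret and ioctl_errno model what
 * ioctl() returned and what errno held right after the call.
 */
static int sketch_request_pasid_alloc(int ioctl_ret, int ioctl_errno,
                                      char *buf, size_t len)
{
    if (ioctl_ret < 0) {
        sketch_format_failure(buf, len, __func__, ioctl_errno);
        return -ioctl_errno;
    }
    return ioctl_ret;
}
```

Using __func__ keeps the message in sync if the function is renamed
later.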

> +    }
> +    rcu_read_unlock();
> +    return pasid;
> +}
> +
> +static int vfio_pci_device_request_pasid_free(PCIBus *bus,
> +                                              int32_t devfn,
> +                                              uint32_t pasid)
> +{
> +    PCIDevice *pdev = bus->devices[devfn];
> +    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> +    VFIOContainer *container = vdev->vbasedev.group->container;
> +    struct vfio_iommu_type1_pasid_request req;
> +    unsigned long argsz;
> +    int ret = 0;
> +
> +    argsz = sizeof(req);
> +    req.argsz = argsz;
> +    req.flag = VFIO_IOMMU_PASID_FREE;
> +    req.pasid = pasid;
> +
> +    rcu_read_lock();
> +    ret = ioctl(container->fd, VFIO_IOMMU_PASID_REQUEST, &req);
> +    if (ret != 0) {
> +        error_report("vfio_pci_device_request_pasid_free:"
> +                     " request failed, contanier: %p", container);
> +    }
> +    rcu_read_unlock();
> +    return ret;
> +}
> +
> +static PCIPASIDOps vfio_pci_pasid_ops = {
> +    .alloc_pasid = vfio_pci_device_request_pasid_alloc,
> +    .free_pasid = vfio_pci_device_request_pasid_free,
> +};
> +
>  static void vfio_realize(PCIDevice *pdev, Error **errp)
>  {
>      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> @@ -2991,6 +3050,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>      vfio_register_req_notifier(vdev);
>      vfio_setup_resetfn_quirk(vdev);
>  
> +    pci_setup_pasid_ops(pdev, &vfio_pci_pasid_ops);
> +
>      return;
>  
>  out_teardown:
> -- 
> 2.7.4
> 

Regards,

-- 
Peter Xu

* Re: [RFC v1 06/18] intel_iommu: support virtual command emulation and pasid request
  2019-07-05 11:01 ` [RFC v1 06/18] intel_iommu: support virtual command emulation and pasid request Liu Yi L
@ 2019-07-09  3:19   ` Peter Xu
  2019-07-10 11:51     ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: Peter Xu @ 2019-07-09  3:19 UTC (permalink / raw)
  To: Liu Yi L
  Cc: qemu-devel, mst, pbonzini, alex.williamson, eric.auger, david,
	tianyu.lan, kevin.tian, jun.j.tian, yi.y.sun, kvm, Jacob Pan,
	Yi Sun

On Fri, Jul 05, 2019 at 07:01:39PM +0800, Liu Yi L wrote:
> This patch adds virtual command support to Intel vIOMMU per Intel VT-d 3.1
> spec. This patch adds two virtual commands: alloc_pasid and free_pasid.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Yi Sun <yi.y.sun@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> ---
>  hw/i386/intel_iommu.c          | 139 ++++++++++++++++++++++++++++++++++++++++-
>  hw/i386/intel_iommu_internal.h |  30 +++++++++
>  hw/i386/trace-events           |   1 +
>  include/hw/i386/intel_iommu.h  |   6 +-
>  4 files changed, 174 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 3160a05..3cf250d 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -932,11 +932,19 @@ static VTDBus *vtd_find_as_from_bus_num(IntelIOMMUState *s, uint8_t bus_num)
>                  s->vtd_as_by_bus_num[bus_num] = vtd_bus;
>                  return vtd_bus;
>              }
> +            vtd_bus = NULL;

Can move to ...
>          }

... here?

>      }
>      return vtd_bus;
>  }
>  
> +static PCIBus *vtd_find_pci_bus_from_bus_num(IntelIOMMUState *s,
> +                                             uint8_t bus_num)
> +{
> +    VTDBus *vtd_bus = vtd_find_as_from_bus_num(s, bus_num);
> +    return vtd_bus ? vtd_bus->bus : NULL;
> +}
> +
>  /* Given the @iova, get relevant @slptep. @slpte_level will be the last level
>   * of the translation, can be used for deciding the size of large page.
>   */
> @@ -2579,6 +2587,103 @@ static void vtd_handle_iectl_write(IntelIOMMUState *s)
>      }
>  }
>  
> +static int vtd_request_pasid_alloc(IntelIOMMUState *s)
> +{
> +    PCIBus *bus;
> +    int bus_n, devfn;
> +
> +    for (bus_n = 0; bus_n < PCI_BUS_MAX; bus_n++) {
> +        bus = vtd_find_pci_bus_from_bus_num(s, bus_n);
> +        if (!bus) {
> +            continue;
> +        }
> +        for (devfn = 0; devfn < PCI_DEVFN_MAX; devfn++) {
> +            if (pci_device_is_ops_set(bus, devfn)) {
> +                return pci_device_request_pasid_alloc(bus, devfn,
> +                                                      VTD_MIN_HPASID,
> +                                                      VTD_MAX_HPASID);

Ah so here I see why pci_device_is_ops_set() is necessary... you
wanted to find a device that is vfio-pci and supports PASID.  This is
a bit awkward, but indeed I don't know of a better option to make it a
clearer interface if we can't let the IOMMU talk directly to vfio.

The thing is that VFIO_IOMMU_PASID_REQUEST seems to be defined per
VFIO container, while the VT-d spec of course defines PASID allocation
as global.  More context on how the pasid address space will be
defined and the considerations behind it (not only for this series,
but for the big picture of the SVA work) would be greatly welcomed.

> +            }
> +        }
> +    }
> +    return -1;
> +}
> +
> +static int vtd_request_pasid_free(IntelIOMMUState *s, uint32_t pasid)
> +{
> +    PCIBus *bus;
> +    int bus_n, devfn;
> +
> +    for (bus_n = 0; bus_n < PCI_BUS_MAX; bus_n++) {
> +        bus = vtd_find_pci_bus_from_bus_num(s, bus_n);
> +        if (!bus) {
> +            continue;
> +        }
> +        for (devfn = 0; devfn < PCI_DEVFN_MAX; devfn++) {
> +            if (pci_device_is_ops_set(bus, devfn)) {
> +                return pci_device_request_pasid_free(bus, devfn, pasid);
> +            }
> +        }
> +    }
> +    return -1;
> +}
> +
> +/* Handle write to Virtual Command Register */
> +static void vtd_handle_vcmd_write(IntelIOMMUState *s)
> +{
> +    uint32_t status = vtd_get_long_raw(s, DMAR_VCRSP_REG);
> +    uint32_t val = vtd_get_long_raw(s, DMAR_VCMD_REG);
> +    uint32_t pasid;
> +    int ret = -1;
> +
> +    trace_vtd_reg_write_vcmd(status, val);

Could we use s->vcrsp directly instead of using DMAR_VCRSP_REG?

> +
> +    switch (val & VTD_VCMD_CMD_MASK) {
> +    case VTD_VCMD_ALLOC_PASID:
> +        if (!(s->vccap & VTD_VCCAP_PAS) ||
> +             (s->vcrsp & 1)) {

Nit: we could consider offering some helpers for them.

Also, I think we should check vcrsp&1 at the entry for all vcmds. [1]

> +            break;
> +        }
> +        s->vcrsp = 1;
> +        vtd_set_quad_raw(s, DMAR_VCRSP_REG,
> +                         ((uint64_t) s->vcrsp));

Do we really need to emulate the "In Progress" like this?  The vcpu is
blocked here after all, and AFAICT all the rest of vcpus should not
access these registers because obviously these registers cannot be
accessed concurrently...

I think the IP bit is useful when some new vcmd would take plenty of
time so that we can do the long vcmds in an async way.  However that
doesn't seem to be the case here?

> +        ret = vtd_request_pasid_alloc(s);
> +        if (ret < 0) {
> +            s->vcrsp |= VTD_VCRSP_SC(VTD_VCMD_NO_AVAILABLE_PASID);
> +        } else {
> +            s->vcrsp |= VTD_VCRSP_RSLT(ret);
> +        }
> +        s->vcrsp &= (~((uint64_t)(0x1)));
> +        vtd_set_quad_raw(s, DMAR_VCRSP_REG,
> +                         ((uint64_t) s->vcrsp));
> +        break;
> +
> +    case VTD_VCMD_FREE_PASID:
> +        if (!(s->vccap & VTD_VCCAP_PAS) ||
> +             (s->vcrsp & 1)) {
> +            break;
> +        }
> +        s->vcrsp &= 1;
> +        vtd_set_quad_raw(s, DMAR_VCRSP_REG,
> +                         ((uint64_t) s->vcrsp));

Same here on IP bit emulation.  IMHO we can drop these and this
function can be greatly simplified.  Your call. :)

> +        pasid = VTD_VCMD_PASID_VALUE(val);
> +        ret = vtd_request_pasid_free(s, pasid);
> +        if (ret < 0) {
> +            s->vcrsp |= VTD_VCRSP_SC(VTD_VCMD_FREE_INVALID_PASID);
> +        }
> +        s->vcrsp &= (~((uint64_t)(0x1)));
> +        vtd_set_quad_raw(s, DMAR_VCRSP_REG,
> +                         ((uint64_t) s->vcrsp));
> +        break;
> +
> +    default:
> +        s->vcrsp |= VTD_VCRSP_SC(VTD_VCMD_UNDEFINED_CMD);

(IMHO you can simply do s/|=/=/ here if you handle IP well at the
 entry of the function)

> +        vtd_set_quad_raw(s, DMAR_VCRSP_REG,
> +                         ((uint64_t) s->vcrsp));
> +        printf("Virtual Command: unsupported command!!!\n");
> +        break;
> +    }
> +}
> +
>  static uint64_t vtd_mem_read(void *opaque, hwaddr addr, unsigned size)
>  {
>      IntelIOMMUState *s = opaque;
> @@ -2620,6 +2725,15 @@ static uint64_t vtd_mem_read(void *opaque, hwaddr addr, unsigned size)
>          val = s->iq >> 32;
>          break;
>  
> +    case DMAR_VCRSP_REG:
> +        val = s->vcrsp;
> +        break;
> +
> +    case DMAR_VCRSP_REG_HI:
> +        assert(size == 4);
> +        val = s->vcrsp >> 32;
> +        break;

If you always update the register with vtd_set_quad_raw() then IMHO
you can drop these lines?  vtd_mem_read() has a default case that
handles all of these.

> +
>      default:
>          if (size == 4) {
>              val = vtd_get_long(s, addr);
> @@ -2868,6 +2982,21 @@ static void vtd_mem_write(void *opaque, hwaddr addr,
>          vtd_set_long(s, addr, val);
>          break;
>  
> +    case DMAR_VCMD_REG:
> +        if (size == 4) {
> +            vtd_set_long(s, addr, val);
> +        } else {
> +            vtd_set_quad(s, addr, val);
> +        }
> +        vtd_handle_vcmd_write(s);

IMHO you should do vtd_handle_vcmd_write() first and let it return a
value: when it returns true you update the registers using vtd_set_*(),
otherwise you skip the update (e.g., when IP is set in the vcmd result
reg).
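
Roughly what I have in mind, as a standalone sketch.  SketchState, the
SKETCH_VCRSP_IP bit position and the function names are assumptions
for this example only, not the real register layout or QEMU code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout, for illustration: bit 0 of the response is "IP". */
#define SKETCH_VCRSP_IP   (1ULL << 0)

typedef struct SketchState {
    uint64_t vcmd;    /* last accepted command value */
    uint64_t vcrsp;   /* response register */
} SketchState;

/* Small helper so callers do not open-code the bit test. */
static bool sketch_vcrsp_in_progress(const SketchState *s)
{
    return (s->vcrsp & SKETCH_VCRSP_IP) != 0;
}

/*
 * Handle a command.  Returns false when a previous command is still
 * in progress, in which case the write should be ignored.
 */
static bool sketch_handle_vcmd(SketchState *s, uint64_t val)
{
    if (sketch_vcrsp_in_progress(s)) {
        return false;
    }
    /* Command emulation would go here; this sketch completes at once. */
    (void)val;
    s->vcrsp = 0;
    return true;
}

/* MMIO write path: commit the register only if the command was accepted. */
static void sketch_vcmd_write(SketchState *s, uint64_t val)
{
    if (sketch_handle_vcmd(s, val)) {
        s->vcmd = val;
    }
}
```

With the IP check done once at the entry, the per-command cases no
longer need to repeat it.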

> +        break;
> +
> +    case DMAR_VCMD_REG_HI:
> +        assert(size == 4);
> +        vtd_set_long(s, addr, val);
> +        vtd_handle_vcmd_write(s);

Same here?

> +        break;
> +
>      default:
>          if (size == 4) {
>              vtd_set_long(s, addr, val);
> @@ -3579,7 +3708,8 @@ static void vtd_init(IntelIOMMUState *s)
>              s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
>          } else if (!strcmp(s->sm_model, "scalable")) {
>              s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_PASID
> -                       | VTD_ECAP_FLTS;
> +                       | VTD_ECAP_FLTS | VTD_ECAP_VCS;
> +            s->vccap |= VTD_VCCAP_PAS;
>          } else {
>              printf("\n!!!!! Invalid sm_model config !!!!!\n"
>                  "Please config sm_model=[\"legacy\"|\"scalable\"]\n"
> @@ -3641,6 +3771,13 @@ static void vtd_init(IntelIOMMUState *s)
>       * Interrupt remapping registers.
>       */
>      vtd_define_quad(s, DMAR_IRTA_REG, 0, 0xfffffffffffff80fULL, 0);
> +
> +    /*
> +     * Virtual Command Definitions
> +     */
> +    vtd_define_quad(s, DMAR_VCCAP_REG, s->vccap, 0, 0);
> +    vtd_define_quad(s, DMAR_VCMD_REG, 0, 0xffffffffffffffffULL, 0);
> +    vtd_define_quad(s, DMAR_VCRSP_REG, 0, 0, 0);
>  }
>  
>  /* Should not reset address_spaces when reset because devices will still use
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index adae198..f5a2f0d 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -85,6 +85,12 @@
>  #define DMAR_MTRRCAP_REG_HI     0x104
>  #define DMAR_MTRRDEF_REG        0x108 /* MTRR default type */
>  #define DMAR_MTRRDEF_REG_HI     0x10c
> +#define DMAR_VCCAP_REG          0xE00 /* Virtual Command Capability Register */
> +#define DMAR_VCCAP_REG_HI       0xE04
> +#define DMAR_VCMD_REG           0xE10 /* Virtual Command Register */
> +#define DMAR_VCMD_REG_HI        0xE14
> +#define DMAR_VCRSP_REG          0xE20 /* Virtual Command Reponse Register */
> +#define DMAR_VCRSP_REG_HI       0xE24
>  
>  /* IOTLB registers */
>  #define DMAR_IOTLB_REG_OFFSET   0xf0 /* Offset to the IOTLB registers */
> @@ -192,6 +198,7 @@
>  #define VTD_ECAP_SRS                (1ULL << 31)
>  #define VTD_ECAP_PASID              (1ULL << 40)
>  #define VTD_ECAP_SMTS               (1ULL << 43)
> +#define VTD_ECAP_VCS                (1ULL << 44)
>  #define VTD_ECAP_SLTS               (1ULL << 46)
>  #define VTD_ECAP_FLTS               (1ULL << 47)
>  
> @@ -314,6 +321,29 @@ typedef enum VTDFaultReason {
>  
>  #define VTD_CONTEXT_CACHE_GEN_MAX       0xffffffffUL
>  
> +/* VCCAP_REG */
> +#define VTD_VCCAP_PAS               (1UL << 0)
> +#define VTD_MIN_HPASID              200

Comment this value a bit?

> +#define VTD_MAX_HPASID              0xFFFFF
> +
> +/* Virtual Command Register */
> +enum {
> +     VTD_VCMD_NULL_CMD = 0,
> +     VTD_VCMD_ALLOC_PASID,

Shall we spell " = 1" explicitly if defined in spec?

> +     VTD_VCMD_FREE_PASID,

Same here.
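
For example, something like the sketch below.  The names are
hypothetical, and the values simply mirror the implicit numbering of
the quoted enum (0, 1, 2); whether they match the spec encoding still
needs to be checked against the spec itself.

```c
/* Explicit values, mirroring the implicit numbering in the quoted enum. */
enum {
    SKETCH_VCMD_NULL_CMD    = 0,
    SKETCH_VCMD_ALLOC_PASID = 1,
    SKETCH_VCMD_FREE_PASID  = 2,
};
```

Spelling the values out means that inserting a new entry in the middle
cannot silently renumber the existing commands.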

Regards,

-- 
Peter Xu

* Re: [RFC v1 09/18] intel_iommu: process pasid cache invalidation
  2019-07-05 11:01 ` [RFC v1 09/18] intel_iommu: process pasid cache invalidation Liu Yi L
@ 2019-07-09  4:47   ` Peter Xu
  2019-07-11  6:22     ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: Peter Xu @ 2019-07-09  4:47 UTC (permalink / raw)
  To: Liu Yi L
  Cc: qemu-devel, mst, pbonzini, alex.williamson, eric.auger, david,
	tianyu.lan, kevin.tian, jun.j.tian, yi.y.sun, kvm, Jacob Pan,
	Yi Sun

On Fri, Jul 05, 2019 at 07:01:42PM +0800, Liu Yi L wrote:
> +static bool vtd_process_pasid_desc(IntelIOMMUState *s,
> +                                   VTDInvDesc *inv_desc)
> +{
> +    if ((inv_desc->val[0] & VTD_INV_DESC_PASIDC_RSVD_VAL0) ||
> +        (inv_desc->val[1] & VTD_INV_DESC_PASIDC_RSVD_VAL1) ||
> +        (inv_desc->val[2] & VTD_INV_DESC_PASIDC_RSVD_VAL2) ||
> +        (inv_desc->val[3] & VTD_INV_DESC_PASIDC_RSVD_VAL3)) {
> +        trace_vtd_inv_desc("non-zero-field-in-pc_inv_desc",
> +                            inv_desc->val[1], inv_desc->val[0]);

The first parameter of trace_vtd_inv_desc() should be the type.

Can use error_report_once() here.
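
The idea behind a once-only report, as a standalone sketch.  This is
not QEMU's implementation of error_report_once(); sketch_report_once()
is a hypothetical helper that only shows the behaviour: a misbehaving
guest can trigger the path many times, but the log gets a single line.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical report-once helper: prints the message the first time
 * the given flag is seen clear, and stays silent afterwards.
 * Returns true only when the message was actually printed.
 */
static bool sketch_report_once(bool *printed, const char *msg)
{
    if (*printed) {
        return false;
    }
    *printed = true;
    fprintf(stderr, "%s\n", msg);
    return true;
}
```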

> +        return false;
> +    }
> +
> +    switch (inv_desc->val[0] & VTD_INV_DESC_PASIDC_G) {
> +    case VTD_INV_DESC_PASIDC_DSI:
> +        break;
> +
> +    case VTD_INV_DESC_PASIDC_PASID_SI:
> +        break;
> +
> +    case VTD_INV_DESC_PASIDC_GLOBAL:
> +        break;
> +
> +    default:
> +        trace_vtd_inv_desc("invalid-inv-granu-in-pc_inv_desc",
> +                            inv_desc->val[1], inv_desc->val[0]);

Here too.

> +        return false;
> +    }
> +
> +    return true;
> +}

Regards,

-- 
Peter Xu

* Re: [RFC v1 10/18] intel_iommu: tag VTDAddressSpace instance with PASID
  2019-07-05 11:01 ` [RFC v1 10/18] intel_iommu: tag VTDAddressSpace instance with PASID Liu Yi L
@ 2019-07-09  6:12   ` Peter Xu
  2019-07-11  7:24     ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: Peter Xu @ 2019-07-09  6:12 UTC (permalink / raw)
  To: Liu Yi L
  Cc: qemu-devel, mst, pbonzini, alex.williamson, eric.auger, david,
	tianyu.lan, kevin.tian, jun.j.tian, yi.y.sun, kvm, Jacob Pan,
	Yi Sun

On Fri, Jul 05, 2019 at 07:01:43PM +0800, Liu Yi L wrote:
> This patch introduces new fields in VTDAddressSpace for further PASID support
> in Intel vIOMMU. In old time, each device has a VTDAddressSpace instance to
> stand for its guest IOVA address space when vIOMMU is enabled. However, when
> PASID is exposed to guest, device will have multiple address spaces which
> are tagged with PASID. To suit this change, VTDAddressSpace should be tagged
> with PASIDs in Intel vIOMMU.
> 
> To record PASID tagged VTDAddressSpaces, a hash table is introduced. The
> data in the hash table can be used for future sanity check and retrieve
> previous PASID configs of guest and also future emulated SVA DMA support
> for emulated SVA capable devices. The lookup key is a string and its format
> is as below:
> 
> "rsv%04dpasid%010dsid%06d" -- totally 32 bytes

Can we make it simply a struct?

        struct pasid_key {
                uint32_t pasid;
                uint16_t sid;
        };

Also I think we don't need to keep reserved bits because it'll be a
structure that'll only be used by QEMU so we can extend it easily in
the future when necessary.
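
A standalone sketch of the hash and equality helpers such a struct key
would need.  The names are hypothetical and the mixing function is
just one simple choice.  With GLib these would be wrapped in functions
matching GHashFunc and GEqualFunc (taking gconstpointer arguments)
before being passed to g_hash_table_new_full().

```c
#include <stdbool.h>
#include <stdint.h>

struct sketch_pasid_key {
    uint32_t pasid;
    uint16_t sid;
};

/* Compare field by field so struct padding never matters. */
static bool sketch_pasid_key_equal(const struct sketch_pasid_key *a,
                                   const struct sketch_pasid_key *b)
{
    return a->pasid == b->pasid && a->sid == b->sid;
}

/* Fold both fields into one 64-bit value, then mix down to 32 bits. */
static uint32_t sketch_pasid_key_hash(const struct sketch_pasid_key *k)
{
    uint64_t v = ((uint64_t)k->sid << 32) | k->pasid;

    return (uint32_t)(v ^ (v >> 32));
}
```

Compared with the 32-byte string key, this avoids the snprintf() on
every lookup and keeps the key a fixed, small size.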

[...]

> +static int vtd_pasid_cache_dsi(IntelIOMMUState *s, uint16_t domain_id)
> +{
> +    VTDPASIDCacheInfo pc_info;
> +
> +    trace_vtd_pasid_cache_dsi(domain_id);
> +
> +    pc_info.flags = VTD_PASID_CACHE_DOMSI;
> +    pc_info.domain_id = domain_id;
> +
> +    /*
> +     * use g_hash_table_foreach_remove(), which will free the
> +     * vtd_pasid_as instances.
> +     */
> +    g_hash_table_foreach_remove(s->vtd_pasid_as, vtd_flush_pasid, &pc_info);
> +    /*
> +     * TODO: Domain selective PASID cache invalidation
> +     * may be issued wrongly by programmer, to be safe,
> +     * after invalidating the pasid caches, emulator
> +     * needs to replay the pasid bindings by walking guest
> +     * pasid dir and pasid table.
> +     */

It seems to me that this is still unchanged for the whole series.
It's fine for RFC, but just a reminder: please either comment on why
we don't have something or implement what we need here...

[...]

>  /* Unmap the whole range in the notifier's scope. */
>  static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
>  {
> @@ -3914,6 +4076,8 @@ static void vtd_realize(DeviceState *dev, Error **errp)
>                                       g_free, g_free);
>      s->vtd_as_by_busptr = g_hash_table_new_full(vtd_uint64_hash, vtd_uint64_equal,
>                                                g_free, g_free);
> +    s->vtd_pasid_as = g_hash_table_new_full(&g_str_hash, &g_str_equal,
> +                                     g_free, hash_pasid_as_free);

Can use g_free() and drop hash_pasid_as_free()?

Also, this patch only tries to drop entries of the hash table but the
hash table is never inserted into or used.  I would suggest that you
put that part together with this patch as a whole, otherwise it's hard
to see how this hash table will be used.

Regards,

-- 
Peter Xu

* Re: [RFC v1 11/18] intel_iommu: create VTDAddressSpace per BDF+PASID
  2019-07-05 11:01 ` [RFC v1 11/18] intel_iommu: create VTDAddressSpace per BDF+PASID Liu Yi L
@ 2019-07-09  6:39   ` Peter Xu
  2019-07-11  8:13     ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: Peter Xu @ 2019-07-09  6:39 UTC (permalink / raw)
  To: Liu Yi L
  Cc: qemu-devel, mst, pbonzini, alex.williamson, eric.auger, david,
	tianyu.lan, kevin.tian, jun.j.tian, yi.y.sun, kvm, Jacob Pan,
	Yi Sun

On Fri, Jul 05, 2019 at 07:01:44PM +0800, Liu Yi L wrote:

[...]

> +/**
> + * This function finds or adds a VTDAddressSpace for a device when
> + * it is bound to a pasid
> + */
> +static VTDAddressSpace *vtd_add_find_pasid_as(IntelIOMMUState *s,
> +                                              PCIBus *bus,
> +                                              int devfn,
> +                                              uint32_t pasid,
> +                                              bool allocate)
> +{
> +    char key[32];
> +    char *new_key;
> +    VTDAddressSpace *vtd_pasid_as;
> +    uint16_t sid;
> +
> +    sid = vtd_make_source_id(pci_bus_num(bus), devfn);
> +    vtd_get_pasid_key(&key[0], 32, pasid, sid);
> +    vtd_pasid_as = g_hash_table_lookup(s->vtd_pasid_as, &key[0]);
> +
> +    if (!vtd_pasid_as && allocate) {
> +        new_key = g_malloc(32);
> +        vtd_get_pasid_key(&new_key[0], 32, pasid, sid);
> +        /*
> +         * Initiate the vtd_pasid_as structure.
> +         *
> +         * This structure here is used to track the guest pasid
> +         * binding and also serves as pasid-cache mangement entry.
> +         *
> +         * TODO: in future, if wants to support the SVA-aware DMA
> +         *       emulation, the vtd_pasid_as should be fully initialized.
> +         *       e.g. the address_space and memory region fields.
> +         */

I'm not very sure about this part.  IMHO all those memory regions are
used to inlay the whole IOMMU idea into QEMU's memory API framework.
Even without the whole PASID support we already have a workable
vtd_iommu_translate() that intercepts device DMA operations, and we
can try to translate the IOVA to anything we want.  Now the iommu_idx
parameter of vtd_iommu_translate() is never used (I'd say until now
I'm still not sure whether the "iommu_idx" idea is the best we can
have... I've tried to debate on that but... anyway I assume for Intel
we can think of it as the "pasid" information, or at least as
containing it), however in the future we can have that
PASID/iommu_idx/whatever passed into this translate() function too,
then we can walk the 1st level page table there if we find that this
device has enabled the 1st level mapping (or even nested).  I don't
see what else we need to do with extra memory regions.

Conclusion: I feel like SVA can use its own structure here instead of
reusing VTDAddressSpace, because I think those memory regions will
probably be useless.  Even if they turn out to be needed, we can
refactor the code later, but I really doubt it...

> +        vtd_pasid_as = g_malloc0(sizeof(VTDAddressSpace));
> +        vtd_pasid_as->iommu_state = s;
> +        vtd_pasid_as->bus = bus;
> +        vtd_pasid_as->devfn = devfn;
> +        vtd_pasid_as->context_cache_entry.context_cache_gen = 0;
> +        vtd_pasid_as->pasid = pasid;
> +        vtd_pasid_as->pasid_allocated = true;
> +        vtd_pasid_as->pasid_cache_entry.pasid_cache_gen = 0;
> +        g_hash_table_insert(s->vtd_pasid_as, new_key, vtd_pasid_as);
> +    }
> +    return vtd_pasid_as;
> +}

Regards,

-- 
Peter Xu

* Re: [RFC v1 07/18] hw/pci: add pci_device_bind/unbind_gpasid
  2019-07-05 11:01 ` [RFC v1 07/18] hw/pci: add pci_device_bind/unbind_gpasid Liu Yi L
@ 2019-07-09  8:37   ` Auger Eric
  2019-07-10 12:18     ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: Auger Eric @ 2019-07-09  8:37 UTC (permalink / raw)
  To: Liu Yi L, qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: david, tianyu.lan, kevin.tian, jun.j.tian, yi.y.sun, kvm,
	Jacob Pan, Yi Sun

Hi Liu,

On 7/5/19 1:01 PM, Liu Yi L wrote:
> This patch adds two callbacks pci_device_bind/unbind_gpasid() to
> PCIPASIDOps. These two callbacks are used to propagate guest pasid
> bind/unbind to host. The implementations of the callbacks would be
> device passthru modules like vfio.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Yi Sun <yi.y.sun@linux.intel.com>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> ---
>  hw/pci/pci.c         | 30 ++++++++++++++++++++++++++++++
>  include/hw/pci/pci.h |  9 +++++++++
>  2 files changed, 39 insertions(+)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 710f9e9..2229229 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -2676,6 +2676,36 @@ int pci_device_request_pasid_free(PCIBus *bus, int32_t devfn,
>      return -1;
>  }
>  
> +void pci_device_bind_gpasid(PCIBus *bus, int32_t devfn,
> +                                struct gpasid_bind_data *g_bind_data)
struct gpasid_bind_data is defined in linux headers so I think you would
need: #ifdef __linux__
> +{
> +    PCIDevice *dev;
> +
> +    if (!bus) {
> +        return;
> +    }
> +
> +    dev = bus->devices[devfn];
> +    if (dev && dev->pasid_ops) {
> +        dev->pasid_ops->bind_gpasid(bus, devfn, g_bind_data);
> +    }
> +}
> +
> +void pci_device_unbind_gpasid(PCIBus *bus, int32_t devfn,
> +                                struct gpasid_bind_data *g_bind_data)
> +{
> +    PCIDevice *dev;
> +
> +    if (!bus) {
> +        return;
> +    }
> +
> +    dev = bus->devices[devfn];
> +    if (dev && dev->pasid_ops) {
> +        dev->pasid_ops->unbind_gpasid(bus, devfn, g_bind_data);
> +    }
> +}
> +
>  static void pci_dev_get_w64(PCIBus *b, PCIDevice *dev, void *opaque)
>  {
>      Range *range = opaque;
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index 16e5b8e..8d849e6 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -9,6 +9,7 @@
>  #include "hw/isa/isa.h"
>  
>  #include "hw/pci/pcie.h"
> +#include <linux/iommu.h>
>  
>  extern bool pci_available;
>  
> @@ -267,6 +268,10 @@ struct PCIPASIDOps {
>      int (*alloc_pasid)(PCIBus *bus, int32_t devfn,
>                           uint32_t min_pasid, uint32_t max_pasid);
>      int (*free_pasid)(PCIBus *bus, int32_t devfn, uint32_t pasid);
> +    void (*bind_gpasid)(PCIBus *bus, int32_t devfn,
> +                            struct gpasid_bind_data *g_bind_data);
> +    void (*unbind_gpasid)(PCIBus *bus, int32_t devfn,
> +                            struct gpasid_bind_data *g_bind_data);
>  };
>  
>  struct PCIDevice {
> @@ -497,6 +502,10 @@ bool pci_device_is_ops_set(PCIBus *bus, int32_t devfn);
>  int pci_device_request_pasid_alloc(PCIBus *bus, int32_t devfn,
>                                     uint32_t min_pasid, uint32_t max_pasid);
>  int pci_device_request_pasid_free(PCIBus *bus, int32_t devfn, uint32_t pasid);
> +void pci_device_bind_gpasid(PCIBus *bus, int32_t devfn,
> +                            struct gpasid_bind_data *g_bind_data);
> +void pci_device_unbind_gpasid(PCIBus *bus, int32_t devfn,
> +                            struct gpasid_bind_data *g_bind_data);
>  
>  static inline void
>  pci_set_byte(uint8_t *config, uint8_t val)
> 
Thanks

Eric


* Re: [RFC v1 08/18] vfio/pci: add vfio bind/unbind_gpasid implementation
  2019-07-05 11:01 ` [RFC v1 08/18] vfio/pci: add vfio bind/unbind_gpasid implementation Liu Yi L
@ 2019-07-09  8:37   ` Auger Eric
  2019-07-10 12:30     ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: Auger Eric @ 2019-07-09  8:37 UTC (permalink / raw)
  To: Liu Yi L, qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: david, tianyu.lan, kevin.tian, jun.j.tian, yi.y.sun, kvm,
	Jacob Pan, Yi Sun

Hi Liu,

On 7/5/19 1:01 PM, Liu Yi L wrote:
> This patch adds vfio implementation PCIPASIDOps.bind_gpasid/unbind_pasid().
> These two functions are used to propagate guest pasid bind and unbind
> requests to host via vfio container ioctl.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Yi Sun <yi.y.sun@linux.intel.com>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> ---
>  hw/vfio/pci.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 54 insertions(+)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index ab184ad..892b46c 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2744,9 +2744,63 @@ static int vfio_pci_device_request_pasid_free(PCIBus *bus,
>      return ret;
>  }
>  
> +static void vfio_pci_device_bind_gpasid(PCIBus *bus, int32_t devfn,
> +                                     struct gpasid_bind_data *g_bind_data)
> +{
> +    PCIDevice *pdev = bus->devices[devfn];
> +    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> +    VFIOContainer *container = vdev->vbasedev.group->container;
> +    struct vfio_iommu_type1_bind *bind;
> +    struct vfio_iommu_type1_bind_guest_pasid *bind_guest_pasid;
> +    unsigned long argsz;
> +
> +    argsz = sizeof(*bind) + sizeof(*bind_guest_pasid);
> +    bind = g_malloc0(argsz);
> +    bind->argsz = argsz;
> +    bind->bind_type = VFIO_IOMMU_BIND_GUEST_PASID;
> +    bind_guest_pasid = (struct vfio_iommu_type1_bind_guest_pasid *) &bind->data;
> +    bind_guest_pasid->bind_data = *g_bind_data;
> +
> +    rcu_read_lock();
why do you need the rcu_read_lock?
> +    if (ioctl(container->fd, VFIO_IOMMU_BIND, bind) != 0) {
> +        error_report("vfio_pci_device_bind_gpasid:"
> +                     " bind failed, contanier: %p", container);
container
> +    }
> +    rcu_read_unlock();
> +    g_free(bind);
> +}
> +
> +static void vfio_pci_device_unbind_gpasid(PCIBus *bus, int32_t devfn,
> +                                     struct gpasid_bind_data *g_bind_data)
> +{
> +    PCIDevice *pdev = bus->devices[devfn];
> +    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> +    VFIOContainer *container = vdev->vbasedev.group->container;
> +    struct vfio_iommu_type1_bind *bind;
> +    struct vfio_iommu_type1_bind_guest_pasid *bind_guest_pasid;
> +    unsigned long argsz;
> +
> +    argsz = sizeof(*bind) + sizeof(*bind_guest_pasid);
> +    bind = g_malloc0(argsz);
> +    bind->argsz = argsz;
> +    bind->bind_type = VFIO_IOMMU_BIND_GUEST_PASID;
> +    bind_guest_pasid = (struct vfio_iommu_type1_bind_guest_pasid *) &bind->data;
> +    bind_guest_pasid->bind_data = *g_bind_data;
> +
> +    rcu_read_lock();
> +    if (ioctl(container->fd, VFIO_IOMMU_UNBIND, bind) != 0) {
> +        error_report("vfio_pci_device_unbind_gpasid:"
> +                     " unbind failed, contanier: %p", container);
container
> +    }
> +    rcu_read_unlock();
> +    g_free(bind);
> +}
> +
>  static PCIPASIDOps vfio_pci_pasid_ops = {
>      .alloc_pasid = vfio_pci_device_request_pasid_alloc,
>      .free_pasid = vfio_pci_device_request_pasid_free,
> +    .bind_gpasid = vfio_pci_device_bind_gpasid,
> +    .unbind_gpasid = vfio_pci_device_unbind_gpasid,
>  };
>  
>  static void vfio_realize(PCIDevice *pdev, Error **errp)
> 

Thanks

Eric


* Re: [RFC v1 02/18] linux-headers: import vfio.h from kernel
  2019-07-09  1:58   ` Peter Xu
@ 2019-07-09  8:37     ` Auger Eric
  2019-07-10 12:31       ` Liu, Yi L
  2019-07-10 12:29     ` Liu, Yi L
  1 sibling, 1 reply; 58+ messages in thread
From: Auger Eric @ 2019-07-09  8:37 UTC (permalink / raw)
  To: Peter Xu, Liu Yi L
  Cc: qemu-devel, mst, pbonzini, alex.williamson, david, tianyu.lan,
	kevin.tian, jun.j.tian, yi.y.sun, kvm, Jacob Pan, Yi Sun

Hi Liu,

On 7/9/19 3:58 AM, Peter Xu wrote:
> On Fri, Jul 05, 2019 at 07:01:35PM +0800, Liu Yi L wrote:
>> This patch imports the vIOMMU related definitions from kernel
>> uapi/vfio.h. e.g. pasid allocation, guest pasid bind, guest pasid
>> table bind and guest iommu cache invalidation.
>>
>> Cc: Kevin Tian <kevin.tian@intel.com>
>> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
>> Cc: Peter Xu <peterx@redhat.com>
>> Cc: Eric Auger <eric.auger@redhat.com>
>> Cc: Yi Sun <yi.y.sun@linux.intel.com>
>> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
>> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
>> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> 
> Just a note that in the last version you can use
> scripts/update-linux-headers.sh to update the headers.  For this RFC
> it's perfectly fine.
> 

You will need to update scripts/update-linux-headers.sh to import the
new iommu.h header. See "[RFC v4 02/27] update-linux-headers: Import
iommu.h"
https://www.mail-archive.com/qemu-devel@nongnu.org/msg620098.html.

Thanks

Eric


* Re: [RFC v1 03/18] hw/pci: introduce PCIPASIDOps to PCIDevice
  2019-07-09  2:12   ` Peter Xu
@ 2019-07-09 10:41     ` Auger Eric
  2019-07-10 11:08     ` Liu, Yi L
  1 sibling, 0 replies; 58+ messages in thread
From: Auger Eric @ 2019-07-09 10:41 UTC (permalink / raw)
  To: Peter Xu, Liu Yi L
  Cc: qemu-devel, mst, pbonzini, alex.williamson, david, tianyu.lan,
	kevin.tian, jun.j.tian, yi.y.sun, kvm, Jacob Pan, Yi Sun

Hi Liu, Peter,

On 7/9/19 4:12 AM, Peter Xu wrote:
> On Fri, Jul 05, 2019 at 07:01:36PM +0800, Liu Yi L wrote:
>> +void pci_setup_pasid_ops(PCIDevice *dev, PCIPASIDOps *ops)
>> +{
>> +    assert(ops && !dev->pasid_ops);
>> +    dev->pasid_ops = ops;
>> +}
>> +
>> +bool pci_device_is_ops_set(PCIBus *bus, int32_t devfn)
> 
> Name should be "pci_device_is_pasid_ops_set".  Or maybe you can simply
> drop this function, because as long as you always check it in helper
> functions like [1] below, it seems unnecessary.

I think we need such a query to know whether the PCI device needs to be
notified. This is somewhat equivalent to the flags we had before, but less
precise, as we cannot query whether a specific callback is implemented.
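
For illustration only, a per-callback query could look roughly like the
sketch below. The types are simplified stand-ins for the QEMU ones, and
pci_device_has_pasid_op(), the PASIDOp enum and demo_alloc_pasid() are
hypothetical names that are not part of the posted series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the QEMU types; only what the sketch needs. */
typedef struct PCIBus PCIBus;

typedef struct PCIPASIDOps {
    int (*alloc_pasid)(PCIBus *bus, int32_t devfn,
                       uint32_t min_pasid, uint32_t max_pasid);
    int (*free_pasid)(PCIBus *bus, int32_t devfn, uint32_t pasid);
} PCIPASIDOps;

typedef struct PCIDevice {
    PCIPASIDOps *pasid_ops;
} PCIDevice;

#define PCI_DEVFN_MAX 256

struct PCIBus {
    PCIDevice *devices[PCI_DEVFN_MAX];
};

/* Hypothetical selector naming one callback of PCIPASIDOps. */
typedef enum {
    PASID_OP_ALLOC,
    PASID_OP_FREE,
} PASIDOp;

/* Hypothetical backend callback, used only to populate the ops table. */
static int demo_alloc_pasid(PCIBus *bus, int32_t devfn,
                            uint32_t min_pasid, uint32_t max_pasid)
{
    (void)bus;
    (void)devfn;
    (void)max_pasid;
    return (int)min_pasid;
}

/* Hypothetical helper: true only if the requested callback is implemented. */
static bool pci_device_has_pasid_op(PCIBus *bus, int32_t devfn, PASIDOp op)
{
    PCIDevice *dev;

    if (!bus || devfn < 0 || devfn >= PCI_DEVFN_MAX) {
        return false;
    }
    dev = bus->devices[devfn];
    if (!dev || !dev->pasid_ops) {
        return false;
    }
    switch (op) {
    case PASID_OP_ALLOC:
        return dev->pasid_ops->alloc_pasid != NULL;
    case PASID_OP_FREE:
        return dev->pasid_ops->free_pasid != NULL;
    }
    return false;
}
```

With such a helper the vIOMMU could skip a device that registers ops but
leaves a particular callback unset, instead of treating "ops set" as
"everything supported".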

Thanks

Eric
> 
>> +{
>> +    PCIDevice *dev;
>> +
>> +    if (!bus) {
>> +        return false;
>> +    }
>> +
>> +    dev = bus->devices[devfn];
>> +    return !!(dev && dev->pasid_ops);
>> +}
>> +
>> +int pci_device_request_pasid_alloc(PCIBus *bus, int32_t devfn,
>> +                                   uint32_t min_pasid, uint32_t max_pasid)
> 
> From the VT-d spec I see that the virtual command "allocate pasid" does
> not have bdf information, so it's global, but here we've got bus/devfn.
> I'm curious: is that reserved for ARM or some other arch?
> 
>> +{
>> +    PCIDevice *dev;
>> +
>> +    if (!bus) {
>> +        return -1;
>> +    }
>> +
>> +    dev = bus->devices[devfn];
>> +    if (dev && dev->pasid_ops && dev->pasid_ops->alloc_pasid) {
> 
> [1]
> 
>> +        return dev->pasid_ops->alloc_pasid(bus, devfn, min_pasid, max_pasid);
>> +    }
>> +    return -1;
>> +}
>> +
>> +int pci_device_request_pasid_free(PCIBus *bus, int32_t devfn,
>> +                                  uint32_t pasid)
>> +{
>> +    PCIDevice *dev;
>> +
>> +    if (!bus) {
>> +        return -1;
>> +    }
>> +
>> +    dev = bus->devices[devfn];
>> +    if (dev && dev->pasid_ops && dev->pasid_ops->free_pasid) {
>> +        return dev->pasid_ops->free_pasid(bus, devfn, pasid);
>> +    }
>> +    return -1;
>> +}
>> +
>>  static void pci_dev_get_w64(PCIBus *b, PCIDevice *dev, void *opaque)
>>  {
>>      Range *range = opaque;
>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
>> index d082707..16e5b8e 100644
>> --- a/include/hw/pci/pci.h
>> +++ b/include/hw/pci/pci.h
>> @@ -262,6 +262,13 @@ struct PCIReqIDCache {
>>  };
>>  typedef struct PCIReqIDCache PCIReqIDCache;
>>  
>> +typedef struct PCIPASIDOps PCIPASIDOps;
>> +struct PCIPASIDOps {
>> +    int (*alloc_pasid)(PCIBus *bus, int32_t devfn,
>> +                         uint32_t min_pasid, uint32_t max_pasid);
>> +    int (*free_pasid)(PCIBus *bus, int32_t devfn, uint32_t pasid);
>> +};
>> +
>>  struct PCIDevice {
>>      DeviceState qdev;
>>  
>> @@ -351,6 +358,7 @@ struct PCIDevice {
>>      MSIVectorUseNotifier msix_vector_use_notifier;
>>      MSIVectorReleaseNotifier msix_vector_release_notifier;
>>      MSIVectorPollNotifier msix_vector_poll_notifier;
>> +    PCIPASIDOps *pasid_ops;
>>  };
>>  
>>  void pci_register_bar(PCIDevice *pci_dev, int region_num,
>> @@ -484,6 +492,12 @@ typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, int);
>>  AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
>>  void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
>>  
>> +void pci_setup_pasid_ops(PCIDevice *dev, PCIPASIDOps *ops);
>> +bool pci_device_is_ops_set(PCIBus *bus, int32_t devfn);
>> +int pci_device_request_pasid_alloc(PCIBus *bus, int32_t devfn,
>> +                                   uint32_t min_pasid, uint32_t max_pasid);
>> +int pci_device_request_pasid_free(PCIBus *bus, int32_t devfn, uint32_t pasid);
>> +
>>  static inline void
>>  pci_set_byte(uint8_t *config, uint8_t val)
>>  {
>> -- 
>> 2.7.4
>>
> 
> Regards,
> 


* RE: [RFC v1 03/18] hw/pci: introduce PCIPASIDOps to PCIDevice
  2019-07-09  2:12   ` Peter Xu
  2019-07-09 10:41     ` Auger Eric
@ 2019-07-10 11:08     ` Liu, Yi L
  2019-07-11  3:51       ` david
  1 sibling, 1 reply; 58+ messages in thread
From: Liu, Yi L @ 2019-07-10 11:08 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, mst, pbonzini, alex.williamson, eric.auger, david,
	Tian, Kevin, Tian, Jun J, Sun, Yi Y, kvm, Jacob Pan, Yi Sun

> From: Peter Xu [mailto:zhexu@redhat.com]
> Sent: Tuesday, July 9, 2019 10:12 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v1 03/18] hw/pci: introduce PCIPASIDOps to PCIDevice
> 
> On Fri, Jul 05, 2019 at 07:01:36PM +0800, Liu Yi L wrote:
> > +void pci_setup_pasid_ops(PCIDevice *dev, PCIPASIDOps *ops)
> > +{
> > +    assert(ops && !dev->pasid_ops);
> > +    dev->pasid_ops = ops;
> > +}
> > +
> > +bool pci_device_is_ops_set(PCIBus *bus, int32_t devfn)
> 
> Name should be "pci_device_is_pasid_ops_set".  Or maybe you can simply
> drop this function, because as long as you always check it in helper
> functions like [1] below, it seems unnecessary.

Yes, the name should be "pci_device_is_pasid_ops_set". I noticed your
comments on its necessity in another reply; let's discuss it in that thread. :-)

> > +{
> > +    PCIDevice *dev;
> > +
> > +    if (!bus) {
> > +        return false;
> > +    }
> > +
> > +    dev = bus->devices[devfn];
> > +    return !!(dev && dev->pasid_ops);
> > +}
> > +
> > +int pci_device_request_pasid_alloc(PCIBus *bus, int32_t devfn,
> > +                                   uint32_t min_pasid, uint32_t max_pasid)
> 
> From the VT-d spec I see that the virtual command "allocate pasid" does
> not have bdf information, so it's global, but here we've got bus/devfn.
> I'm curious: is that reserved for ARM or some other arch?

You are right. The VT-d spec doesn't have bdf info, but we need to pass the
allocation request via vfio, so this function takes bdf info. On the vIOMMU
side, it should select a vfio-pci device and invoke this callback when it
wants to request PASID alloc/free.
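
As a rough sketch of that selection step (not the code from the series):
walk the buses, pick the first device that registered an allocation
callback, and forward the request to it. All Demo* types and demo_*
functions are hypothetical stand-ins, and the bus count is shrunk for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BUS_MAX   4    /* shrunk for the sketch */
#define DEMO_DEVFN_MAX 256

typedef struct DemoBus DemoBus;

typedef struct DemoPASIDOps {
    int (*alloc_pasid)(DemoBus *bus, int32_t devfn,
                       uint32_t min_pasid, uint32_t max_pasid);
} DemoPASIDOps;

typedef struct DemoDevice {
    DemoPASIDOps *pasid_ops;
} DemoDevice;

struct DemoBus {
    DemoDevice *devices[DEMO_DEVFN_MAX];
};

typedef struct DemoIOMMU {
    DemoBus *buses[DEMO_BUS_MAX];
} DemoIOMMU;

/* Stand-in for the vfio backend: hand out the lowest PASID in the range. */
static int demo_vfio_alloc(DemoBus *bus, int32_t devfn,
                           uint32_t min_pasid, uint32_t max_pasid)
{
    (void)bus;
    (void)devfn;
    return min_pasid <= max_pasid ? (int)min_pasid : -1;
}

/*
 * The allocation itself is global, so any device that can reach the host
 * will do: forward the request through the first one that has a callback.
 */
static int demo_request_pasid_alloc(DemoIOMMU *s,
                                    uint32_t min_pasid, uint32_t max_pasid)
{
    int bus_n, devfn;

    for (bus_n = 0; bus_n < DEMO_BUS_MAX; bus_n++) {
        DemoBus *bus = s->buses[bus_n];

        if (!bus) {
            continue;
        }
        for (devfn = 0; devfn < DEMO_DEVFN_MAX; devfn++) {
            DemoDevice *dev = bus->devices[devfn];

            if (dev && dev->pasid_ops && dev->pasid_ops->alloc_pasid) {
                return dev->pasid_ops->alloc_pasid(bus, devfn,
                                                   min_pasid, max_pasid);
            }
        }
    }
    return -1; /* no passthru device available to carry the request */
}
```

Note that the request fails when no such device is plugged, which is one
of the awkward points of routing a global operation through a device.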

> > +{
> > +    PCIDevice *dev;
> > +
> > +    if (!bus) {
> > +        return -1;
> > +    }
> > +
> > +    dev = bus->devices[devfn];
> > +    if (dev && dev->pasid_ops && dev->pasid_ops->alloc_pasid) {
> 
> [1]
> 
> > +        return dev->pasid_ops->alloc_pasid(bus, devfn, min_pasid, max_pasid);

Thanks,
Yi Liu


* RE: [RFC v1 06/18] intel_iommu: support virtual command emulation and pasid request
  2019-07-09  3:19   ` Peter Xu
@ 2019-07-10 11:51     ` Liu, Yi L
  2019-07-11  1:13       ` Peter Xu
  0 siblings, 1 reply; 58+ messages in thread
From: Liu, Yi L @ 2019-07-10 11:51 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, mst, pbonzini, alex.williamson, eric.auger, david,
	Tian, Kevin, Tian, Jun J, Sun, Yi Y, kvm, Jacob Pan, Yi Sun

> From: Peter Xu [mailto:zhexu@redhat.com]
> Sent: Tuesday, July 9, 2019 11:19 AM
> Subject: Re: [RFC v1 06/18] intel_iommu: support virtual command emulation and
> pasid request
> 
> On Fri, Jul 05, 2019 at 07:01:39PM +0800, Liu Yi L wrote:
> > This patch adds virtual command support to Intel vIOMMU per Intel VT-d 3.1
> > spec. This patch adds two virtual commands: alloc_pasid and free_pasid.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> > ---
> >  hw/i386/intel_iommu.c          | 139
> ++++++++++++++++++++++++++++++++++++++++-
> >  hw/i386/intel_iommu_internal.h |  30 +++++++++
> >  hw/i386/trace-events           |   1 +
> >  include/hw/i386/intel_iommu.h  |   6 +-
> >  4 files changed, 174 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 3160a05..3cf250d 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -932,11 +932,19 @@ static VTDBus
> *vtd_find_as_from_bus_num(IntelIOMMUState *s, uint8_t bus_num)
> >                  s->vtd_as_by_bus_num[bus_num] = vtd_bus;
> >                  return vtd_bus;
> >              }
> > +            vtd_bus = NULL;
> 
> Can move to ...
> >          }
> 
> ... here?

Yes, it can. That would save a variable assignment.

> >      }
> >      return vtd_bus;
> >  }
> >
> > +static PCIBus *vtd_find_pci_bus_from_bus_num(IntelIOMMUState *s,
> > +                                             uint8_t bus_num)
> > +{
> > +    VTDBus *vtd_bus = vtd_find_as_from_bus_num(s, bus_num);
> > +    return vtd_bus ? vtd_bus->bus : NULL;
> > +}
> > +
> >  /* Given the @iova, get relevant @slptep. @slpte_level will be the last level
> >   * of the translation, can be used for deciding the size of large page.
> >   */
> > @@ -2579,6 +2587,103 @@ static void vtd_handle_iectl_write(IntelIOMMUState
> *s)
> >      }
> >  }
> >
> > +static int vtd_request_pasid_alloc(IntelIOMMUState *s)
> > +{
> > +    PCIBus *bus;
> > +    int bus_n, devfn;
> > +
> > +    for (bus_n = 0; bus_n < PCI_BUS_MAX; bus_n++) {
> > +        bus = vtd_find_pci_bus_from_bus_num(s, bus_n);
> > +        if (!bus) {
> > +            continue;
> > +        }
> > +        for (devfn = 0; devfn < PCI_DEVFN_MAX; devfn++) {
> > +            if (pci_device_is_ops_set(bus, devfn)) {
> > +                return pci_device_request_pasid_alloc(bus, devfn,
> > +                                                      VTD_MIN_HPASID,
> > +                                                      VTD_MAX_HPASID);
> 
> Ah so here I see why pci_device_is_ops_set() is necessary... you
> wanted to find a device that is vfio-pci and supports PASID.  This is
> a bit awkward but indeed I don't know what's a better option to make
> it a clearer interface if we can't let IOMMU to talk directly to vfio.

yes, it is.

> The thing is that VFIO_IOMMU_PASID_REQUEST seems to be defined per
> VFIO container, while the VT-d spec is of course defining PASID
> allocation as global.

right.

> More context on how the pasid address space will be
> defined and considerations behind (not only for this series, but for
> the big picture of SVA work) would be greatly welcomed.

Already noted in the other replies. Let's align on them one by one.

> > +            }
> > +        }
> > +    }
> > +    return -1;
> > +}
> > +
> > +static int vtd_request_pasid_free(IntelIOMMUState *s, uint32_t pasid)
> > +{
> > +    PCIBus *bus;
> > +    int bus_n, devfn;
> > +
> > +    for (bus_n = 0; bus_n < PCI_BUS_MAX; bus_n++) {
> > +        bus = vtd_find_pci_bus_from_bus_num(s, bus_n);
> > +        if (!bus) {
> > +            continue;
> > +        }
> > +        for (devfn = 0; devfn < PCI_DEVFN_MAX; devfn++) {
> > +            if (pci_device_is_ops_set(bus, devfn)) {
> > +                return pci_device_request_pasid_free(bus, devfn, pasid);
> > +            }
> > +        }
> > +    }
> > +    return -1;
> > +}
> > +
> > +/* Handle write to Virtual Command Register */
> > +static void vtd_handle_vcmd_write(IntelIOMMUState *s)
> > +{
> > +    uint32_t status = vtd_get_long_raw(s, DMAR_VCRSP_REG);
> > +    uint32_t val = vtd_get_long_raw(s, DMAR_VCMD_REG);
> > +    uint32_t pasid;
> > +    int ret = -1;
> > +
> > +    trace_vtd_reg_write_vcmd(status, val);
> 
> Could we use s->vcrsp directly instead of using DMAR_VCRSP_REG?

yes, I think so.

> > +
> > +    switch (val & VTD_VCMD_CMD_MASK) {
> > +    case VTD_VCMD_ALLOC_PASID:
> > +        if (!(s->vccap & VTD_VCCAP_PAS) ||
> > +             (s->vcrsp & 1)) {
> 
> Nit: we can consider to offer some helpers for them.

Will add helpers for them.

> Also, I think we should check vcrsp&1 at the entry for all vcmds. [1]

Agreed.

> > +            break;
> > +        }
> > +        s->vcrsp = 1;
> > +        vtd_set_quad_raw(s, DMAR_VCRSP_REG,
> > +                         ((uint64_t) s->vcrsp));
> 
> Do we really need to emulate the "In Progress" like this?  The vcpu is
> blocked here after all, and AFAICT all the rest of vcpus should not
> access these registers because obviously these registers cannot be
> accessed concurrently...

Other vcpus should poll the IP bit before submitting vcmds. While the IP bit
is set, other vcpus will not access these registers. Without it, they could
submit new vcmds while we have only one response register, which is not
something we support. That's why we need to set the IP bit.

> 
> I think the IP bit is useful when some new vcmd would take plenty of
> time so that we can do the long vcmds in async way.  However here it
> seems not the case?

No, so far it is handled synchronously. As mentioned above, the IP bit is to
ensure that only one vcmd is handled at a time. Other vcpus won't be able to
submit vcmds before IP is cleared.
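
A minimal sketch of the gating discussed here, with the IP check done once
at the entry for all vcmds and a return value telling the caller whether
the command was accepted. The register field layout, the status code value
and all Demo*/demo_* names are assumptions made for the example; the toy
allocator stands in for the real request to the host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field layout assumed for this sketch only; check the spec before reuse. */
#define DEMO_VCRSP_IP       (1ULL << 0)                  /* in progress */
#define DEMO_VCRSP_SC(x)    (((uint64_t)(x) & 0x3) << 1) /* status code */
#define DEMO_VCRSP_RSLT(x)  ((uint64_t)(x) << 8)         /* result */
#define DEMO_VCMD_CMD_MASK  0xffULL

enum {
    DEMO_VCMD_ALLOC_PASID = 1,
    DEMO_VCMD_FREE_PASID  = 2,
};

#define DEMO_SC_UNDEFINED_CMD 1 /* hypothetical status code value */

typedef struct DemoVcmdState {
    uint64_t vcrsp;      /* shadow of the response register */
    uint32_t next_pasid; /* toy allocator standing in for the host */
} DemoVcmdState;

/* Returns false if the write is ignored because a vcmd is in progress. */
static bool demo_handle_vcmd_write(DemoVcmdState *s, uint64_t val)
{
    uint64_t rsp = 0;

    /* Single check at the entry covers every command. */
    if (s->vcrsp & DEMO_VCRSP_IP) {
        return false;
    }

    /* Visible to other vcpus polling the register while we work. */
    s->vcrsp = DEMO_VCRSP_IP;

    switch (val & DEMO_VCMD_CMD_MASK) {
    case DEMO_VCMD_ALLOC_PASID:
        rsp = DEMO_VCRSP_RSLT(s->next_pasid++);
        break;
    case DEMO_VCMD_FREE_PASID:
        /* Success with an empty result in this toy model. */
        break;
    default:
        rsp = DEMO_VCRSP_SC(DEMO_SC_UNDEFINED_CMD);
        break;
    }

    /* Publishing the response also clears IP. */
    s->vcrsp = rsp;
    return true;
}
```

Because the response is rebuilt from zero on each accepted command, stale
status or result bits from a previous vcmd cannot leak into the next one.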

> > +        ret = vtd_request_pasid_alloc(s);
> > +        if (ret < 0) {
> > +            s->vcrsp |= VTD_VCRSP_SC(VTD_VCMD_NO_AVAILABLE_PASID);
> > +        } else {
> > +            s->vcrsp |= VTD_VCRSP_RSLT(ret);
> > +        }
> > +        s->vcrsp &= (~((uint64_t)(0x1)));
> > +        vtd_set_quad_raw(s, DMAR_VCRSP_REG,
> > +                         ((uint64_t) s->vcrsp));
> > +        break;
> > +
> > +    case VTD_VCMD_FREE_PASID:
> > +        if (!(s->vccap & VTD_VCCAP_PAS) ||
> > +             (s->vcrsp & 1)) {
> > +            break;
> > +        }
> > +        s->vcrsp &= 1;
> > +        vtd_set_quad_raw(s, DMAR_VCRSP_REG,
> > +                         ((uint64_t) s->vcrsp));
> 
> Same here on IP bit emulation.  IMHO we can drop these and this
> function can be greatly simplified.  Your call. :)
> 
> > +        pasid = VTD_VCMD_PASID_VALUE(val);
> > +        ret = vtd_request_pasid_free(s, pasid);
> > +        if (ret < 0) {
> > +            s->vcrsp |= VTD_VCRSP_SC(VTD_VCMD_FREE_INVALID_PASID);
> > +        }
> > +        s->vcrsp &= (~((uint64_t)(0x1)));
> > +        vtd_set_quad_raw(s, DMAR_VCRSP_REG,
> > +                         ((uint64_t) s->vcrsp));
> > +        break;
> > +
> > +    default:
> > +        s->vcrsp |= VTD_VCRSP_SC(VTD_VCMD_UNDEFINED_CMD);
> 
> (IMHO you can simply do s/|=/=/ here if you handle IP well at the
>  entry of the function)
> 
> > +        vtd_set_quad_raw(s, DMAR_VCRSP_REG,
> > +                         ((uint64_t) s->vcrsp));
> > +        printf("Virtual Command: unsupported command!!!\n");
> > +        break;
> > +    }
> > +}
> > +
> >  static uint64_t vtd_mem_read(void *opaque, hwaddr addr, unsigned size)
> >  {
> >      IntelIOMMUState *s = opaque;
> > @@ -2620,6 +2725,15 @@ static uint64_t vtd_mem_read(void *opaque, hwaddr
> addr, unsigned size)
> >          val = s->iq >> 32;
> >          break;
> >
> > +    case DMAR_VCRSP_REG:
> > +        val = s->vcrsp;
> > +        break;
> > +
> > +    case DMAR_VCRSP_REG_HI:
> > +        assert(size == 4);
> > +        val = s->vcrsp >> 32;
> > +        break;
> 
> If you're always with vtd_set_quad_raw()s then IMHO you can drop these
> lines?  vtd_mem_read() has a default to handle all these.

aha, yes. nice suggestion.

> > +
> >      default:
> >          if (size == 4) {
> >              val = vtd_get_long(s, addr);
> > @@ -2868,6 +2982,21 @@ static void vtd_mem_write(void *opaque, hwaddr addr,
> >          vtd_set_long(s, addr, val);
> >          break;
> >
> > +    case DMAR_VCMD_REG:
> > +        if (size == 4) {
> > +            vtd_set_long(s, addr, val);
> > +        } else {
> > +            vtd_set_quad(s, addr, val);
> > +        }
> > +        vtd_handle_vcmd_write(s);
> 
> IMHO you should do vtd_handle_vcmd_write() first and let it return a
> value; when it returns true you update the registers using vtd_set_*(),
> otherwise you should skip (e.g., when IP is set in vcmd result reg).

Good. Let me refine the logic here.

> > +        break;
> > +
> > +    case DMAR_VCMD_REG_HI:
> > +        assert(size == 4);
> > +        vtd_set_long(s, addr, val);
> > +        vtd_handle_vcmd_write(s);
> 
> Same here?

Accepted.

> > +        break;
> > +
> >      default:
> >          if (size == 4) {
> >              vtd_set_long(s, addr, val);
> > @@ -3579,7 +3708,8 @@ static void vtd_init(IntelIOMMUState *s)
> >              s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
> >          } else if (!strcmp(s->sm_model, "scalable")) {
> >              s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_PASID
> > -                       | VTD_ECAP_FLTS;
> > +                       | VTD_ECAP_FLTS | VTD_ECAP_VCS;
> > +            s->vccap |= VTD_VCCAP_PAS;
> >          } else {
> >              printf("\n!!!!! Invalid sm_model config !!!!!\n"
> >                  "Please config sm_model=[\"legacy\"|\"scalable\"]\n"
> > @@ -3641,6 +3771,13 @@ static void vtd_init(IntelIOMMUState *s)
> >       * Interrupt remapping registers.
> >       */
> >      vtd_define_quad(s, DMAR_IRTA_REG, 0, 0xfffffffffffff80fULL, 0);
> > +
> > +    /*
> > +     * Virtual Command Definitions
> > +     */
> > +    vtd_define_quad(s, DMAR_VCCAP_REG, s->vccap, 0, 0);
> > +    vtd_define_quad(s, DMAR_VCMD_REG, 0, 0xffffffffffffffffULL, 0);
> > +    vtd_define_quad(s, DMAR_VCRSP_REG, 0, 0, 0);
> >  }
> >
> >  /* Should not reset address_spaces when reset because devices will still use
> > diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> > index adae198..f5a2f0d 100644
> > --- a/hw/i386/intel_iommu_internal.h
> > +++ b/hw/i386/intel_iommu_internal.h
> > @@ -85,6 +85,12 @@
> >  #define DMAR_MTRRCAP_REG_HI     0x104
> >  #define DMAR_MTRRDEF_REG        0x108 /* MTRR default type */
> >  #define DMAR_MTRRDEF_REG_HI     0x10c
> > +#define DMAR_VCCAP_REG          0xE00 /* Virtual Command Capability Register */
> > +#define DMAR_VCCAP_REG_HI       0xE04
> > +#define DMAR_VCMD_REG           0xE10 /* Virtual Command Register */
> > +#define DMAR_VCMD_REG_HI        0xE14
> > +#define DMAR_VCRSP_REG          0xE20 /* Virtual Command Reponse Register */
> > +#define DMAR_VCRSP_REG_HI       0xE24
> >
> >  /* IOTLB registers */
> >  #define DMAR_IOTLB_REG_OFFSET   0xf0 /* Offset to the IOTLB registers */
> > @@ -192,6 +198,7 @@
> >  #define VTD_ECAP_SRS                (1ULL << 31)
> >  #define VTD_ECAP_PASID              (1ULL << 40)
> >  #define VTD_ECAP_SMTS               (1ULL << 43)
> > +#define VTD_ECAP_VCS                (1ULL << 44)
> >  #define VTD_ECAP_SLTS               (1ULL << 46)
> >  #define VTD_ECAP_FLTS               (1ULL << 47)
> >
> > @@ -314,6 +321,29 @@ typedef enum VTDFaultReason {
> >
> >  #define VTD_CONTEXT_CACHE_GEN_MAX       0xffffffffUL
> >
> > +/* VCCAP_REG */
> > +#define VTD_VCCAP_PAS               (1UL << 0)
> > +#define VTD_MIN_HPASID              200
> 
> Comment this value a bit?

The basic idea is to let the hypervisor set a range of available PASIDs for
VMs. One of the reasons is that PASID #0 is reserved for RID_PASID usage.
We have no idea how many PASIDs will be reserved in the future, so this is
just an estimated value. Honestly, setting it to "1" is enough at the
current stage.

> > +#define VTD_MAX_HPASID              0xFFFFF
> > +
> > +/* Virtual Command Register */
> > +enum {
> > +     VTD_VCMD_NULL_CMD = 0,
> > +     VTD_VCMD_ALLOC_PASID,
> 
> Shall we spell " = 1" explicitly if defined in spec?

yes, it is.

> > +     VTD_VCMD_FREE_PASID,
> 
> Same here.

Accepted.

> 
> Regards,
> 
> --
> Peter Xu

Thanks,
Yi Liu


* RE: [RFC v1 04/18] intel_iommu: add "sm_model" option
  2019-07-09  2:15   ` Peter Xu
@ 2019-07-10 12:14     ` Liu, Yi L
  2019-07-11  1:03       ` Peter Xu
  0 siblings, 1 reply; 58+ messages in thread
From: Liu, Yi L @ 2019-07-10 12:14 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, mst, pbonzini, alex.williamson, eric.auger, david,
	tianyu.lan, Tian, Kevin, Tian, Jun J, Sun, Yi Y, kvm, Jacob Pan,
	Yi Sun

> From: Peter Xu [mailto:zhexu@redhat.com]
> Sent: Tuesday, July 9, 2019 10:16 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v1 04/18] intel_iommu: add "sm_model" option
> 
> On Fri, Jul 05, 2019 at 07:01:37PM +0800, Liu Yi L wrote:
> > Intel VT-d 3.0 introduces scalable mode, and it has a bunch of
> > capabilities related to scalable mode translation, thus there
> > are multiple combinations. While this vIOMMU implementation
> > wants simplify it for user by providing typical combinations.
> > User could config it by "sm_model" option. The usage is as
> > below:
> >
> > "-device intel-iommu,x-scalable-mode=on,sm_model=["legacy"|"scalable"]"
> 
> Is it a requirement to split into two parameters, instead of just
> exposing everything about scalable mode when x-scalable-mode is set?

Yes, it is. Scalable mode has multiple capabilities, and we want to support
the most typical combinations to simplify software. E.g. the current scalable
mode vIOMMU exposes only 2nd level translation to the guest, and guest IOVA
support is via shadowing the guest 2nd level page table. We plan to move IOVA
from the 2nd level page table to the 1st level page table, so that guest IOVA
can be supported with nested translation. This also addresses the co-existence
issue of guest SVA and guest IOVA. So in the future we will have a scalable
mode vIOMMU that exposes 1st level translation only. To differentiate this
config from the current vIOMMU, we need an extra option to control it. But yes,
it is still a scalable mode vIOMMU; it just has different capabilities exposed
to the guest.
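
To make the two combinations concrete, here is a small sketch of the
mapping from the option value to the extended capability bits, following
the patch quoted below. The bit positions are the ones from
intel_iommu_internal.h as quoted in this thread; demo_sm_model_to_ecap() is
a hypothetical helper that reports a bad config through its return value
rather than exiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit positions as quoted from intel_iommu_internal.h in this thread. */
#define VTD_ECAP_SRS   (1ULL << 31)
#define VTD_ECAP_PASID (1ULL << 40)
#define VTD_ECAP_SMTS  (1ULL << 43)
#define VTD_ECAP_SLTS  (1ULL << 46)
#define VTD_ECAP_FLTS  (1ULL << 47)

/* Hypothetical helper: returns 0 and fills *ecap, or -1 on a bad config. */
static int demo_sm_model_to_ecap(bool scalable_mode, const char *sm_model,
                                 uint64_t *ecap)
{
    *ecap = 0;

    if (!scalable_mode) {
        /* sm_model depends on x-scalable-mode being enabled. */
        return sm_model ? -1 : 0;
    }
    if (!sm_model || !strcmp(sm_model, "legacy")) {
        /* Default: second level page table only. */
        *ecap = VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
        return 0;
    }
    if (!strcmp(sm_model, "scalable")) {
        /* First level page table plus PASID. */
        *ecap = VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_PASID |
                VTD_ECAP_FLTS;
        return 0;
    }
    return -1; /* unknown sm_model value */
}
```

Both values keep SMTS set, which matches the point that either way it is
still a scalable mode vIOMMU and only the exposed page table support
differs.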

BTW, do you know if I can add sub-options under "x-scalable-mode"? I think
that may show the dependency better.

> >
> >  - "legacy": gives support for SL page table
> >  - "scalable": gives support for FL page table, pasid, virtual command
> >  - default to be "legacy" if "x-scalable-mode=on while no sm_model is
> >    configured
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> > ---
> >  hw/i386/intel_iommu.c          | 28 +++++++++++++++++++++++++++-
> >  hw/i386/intel_iommu_internal.h |  2 ++
> >  include/hw/i386/intel_iommu.h  |  1 +
> >  3 files changed, 30 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 44b1231..3160a05 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -3014,6 +3014,7 @@ static Property vtd_properties[] = {
> >      DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode,
> FALSE),
> >      DEFINE_PROP_BOOL("x-scalable-mode", IntelIOMMUState, scalable_mode,
> FALSE),
> >      DEFINE_PROP_BOOL("dma-drain", IntelIOMMUState, dma_drain, true),
> > +    DEFINE_PROP_STRING("sm_model", IntelIOMMUState, sm_model),
> 
> Can do 's/-/_/' to follow the rest if we need it.

Do you mean sub-options after "x-scalable-mode"?

> >      DEFINE_PROP_END_OF_LIST(),
> >  };
> >
> > @@ -3489,6 +3490,14 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
> >      return;
> >  }
> >
> > +const char sm_model_manual[] =
> > +        "\"-device intel-iommu,x-scalable-mode=on,"
> > +        "sm_model=[\"legacy\"|\"scalable\"]\"\n"
> > +        " - \"legacy\" gives support for SL page table based IOVA\n"
> > +        " - \"scalable\" gives support for FL page table based IOVA and SVA\n"
> > +        " - default to be \"legacy\" if \"x-scalable-mode=on\""
> > +        " while no sm_model is configured\n";
> > +
> >  /* Do the initialization. It will also be called when reset, so pay
> >   * attention when adding new initialization stuff.
> >   */
> > @@ -3557,9 +3566,26 @@ static void vtd_init(IntelIOMMUState *s)
> >          s->cap |= VTD_CAP_CM;
> >      }
> >
> > +    if (s->sm_model && !s->scalable_mode) {
> > +        printf("\n\"sm_model\" depends on \"x-scalable-mode\"\n"
> > +               "please check if \"x-scalable-mode\" is expected\n"
> > +               "\"sm_model\" manual:\n%s", sm_model_manual);
> > +        exit(1);
> 
> Let's avoid calling exit() directly considering that we've had things
> like vtd_decide_config() already which allows an Error**.  We can also
> introduce that too into vtd_init() and pass the error to upper to
> handle the failure.

sure.

> > +    }
> > +
> >      /* TODO: read cap/ecap from host to decide which cap to be exposed. */
> >      if (s->scalable_mode) {
> > -        s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
> > +        if (!s->sm_model || !strcmp(s->sm_model, "legacy")) {
> > +            s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
> > +        } else if (!strcmp(s->sm_model, "scalable")) {
> > +            s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_PASID
> > +                       | VTD_ECAP_FLTS;
> 
> Do you also need VTD_ECAP_SLTS here?

As mentioned above, in the long term we want to expose only FLT to the guest.

> > +        } else {
> > +            printf("\n!!!!! Invalid sm_model config !!!!!\n"
> > +                "Please config sm_model=[\"legacy\"|\"scalable\"]\n"
> > +                "\"sm_model\" manual:\n%s", sm_model_manual);
> > +            exit(1);
> 
> Same here.

got it.

> Thanks,
> 
> > +        }
> >      }
> >
> >      vtd_reset_caches(s);
> > diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> > index c1235a7..adae198 100644
> > --- a/hw/i386/intel_iommu_internal.h
> > +++ b/hw/i386/intel_iommu_internal.h
> > @@ -190,8 +190,10 @@
> >  #define VTD_ECAP_PT                 (1ULL << 6)
> >  #define VTD_ECAP_MHMV               (15ULL << 20)
> >  #define VTD_ECAP_SRS                (1ULL << 31)
> > +#define VTD_ECAP_PASID              (1ULL << 40)
> >  #define VTD_ECAP_SMTS               (1ULL << 43)
> >  #define VTD_ECAP_SLTS               (1ULL << 46)
> > +#define VTD_ECAP_FLTS               (1ULL << 47)
> >
> >  /* CAP_REG */
> >  /* (offset >> 4) << 24 */
> > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> > index 12f3d26..b51cc9f 100644
> > --- a/include/hw/i386/intel_iommu.h
> > +++ b/include/hw/i386/intel_iommu.h
> > @@ -270,6 +270,7 @@ struct IntelIOMMUState {
> >      bool buggy_eim;                 /* Force buggy EIM unless eim=off */
> >      uint8_t aw_bits;                /* Host/IOVA address width (in bits) */
> >      bool dma_drain;                 /* Whether DMA r/w draining enabled */
> > +    char *sm_model;          /* identify actual scalable mode iommu model*/
> >
> >      /*
> >       * Protects IOMMU states in general.  Currently it protects the
> > --
> > 2.7.4
> >
> 
> Regards,
> 
> --
> Peter Xu
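
For reference, here is a rough sketch of how the two checks could report a failure to the caller instead of calling exit(). It is only an illustration under my own assumptions: sm_model_to_ecap() and its errmsg out-parameter are made-up names, and in QEMU the message would go through an Error ** rather than a plain string. The ECAP bit positions are the ones from this patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Bit positions as defined in intel_iommu_internal.h by this patch. */
#define VTD_ECAP_SRS   (1ULL << 31)
#define VTD_ECAP_PASID (1ULL << 40)
#define VTD_ECAP_SMTS  (1ULL << 43)
#define VTD_ECAP_SLTS  (1ULL << 46)
#define VTD_ECAP_FLTS  (1ULL << 47)

/*
 * Hypothetical helper (not part of the patch): returns 0 and ORs the
 * capability bits into *ecap, or returns -1 and points *errmsg at a
 * static message that the caller can report.
 */
static int sm_model_to_ecap(bool scalable_mode, const char *sm_model,
                            uint64_t *ecap, const char **errmsg)
{
    if (sm_model && !scalable_mode) {
        *errmsg = "\"sm_model\" depends on \"x-scalable-mode\"";
        return -1;
    }
    if (!scalable_mode) {
        return 0;               /* nothing to add outside scalable mode */
    }
    if (!sm_model || !strcmp(sm_model, "legacy")) {
        /* default: second level page table only */
        *ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
    } else if (!strcmp(sm_model, "scalable")) {
        /* first level page table and PASID, no SLTS */
        *ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_PASID
                 | VTD_ECAP_FLTS;
    } else {
        *errmsg = "invalid sm_model, expected \"legacy\" or \"scalable\"";
        return -1;
    }
    return 0;
}
```

The caller would then decide how to surface the message, so vtd_init() itself never terminates the process.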

Thanks,
Yi Liu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation
  2019-07-09  2:23   ` Peter Xu
@ 2019-07-10 12:16     ` Liu, Yi L
  0 siblings, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2019-07-10 12:16 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, mst, pbonzini, alex.williamson, eric.auger, david,
	Tian, Kevin, Tian, Jun J, Sun, Yi Y, kvm, Jacob Pan, Yi Sun

> From: Peter Xu [mailto:zhexu@redhat.com]
> Sent: Tuesday, July 9, 2019 10:24 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation
> 
> On Fri, Jul 05, 2019 at 07:01:38PM +0800, Liu Yi L wrote:
> > This patch adds vfio implementation PCIPASIDOps.alloc_pasid/free_pasid().
> > These two functions are used to propagate guest pasid allocation and
> > free requests to host via vfio container ioctl.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > Cc: David Gibson <david@gibson.dropbear.id.au>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> > ---
> >  hw/vfio/pci.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 61 insertions(+)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index ce3fe96..ab184ad 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -2690,6 +2690,65 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
> >      vdev->req_enabled = false;
> >  }
> >
> > +static int vfio_pci_device_request_pasid_alloc(PCIBus *bus,
> > +                                               int32_t devfn,
> > +                                               uint32_t min_pasid,
> > +                                               uint32_t max_pasid)
> > +{
> > +    PCIDevice *pdev = bus->devices[devfn];
> > +    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> > +    VFIOContainer *container = vdev->vbasedev.group->container;
> > +    struct vfio_iommu_type1_pasid_request req;
> > +    unsigned long argsz;
> > +    int pasid;
> > +
> > +    argsz = sizeof(req);
> > +    req.argsz = argsz;
> > +    req.flag = VFIO_IOMMU_PASID_ALLOC;
> > +    req.min_pasid = min_pasid;
> > +    req.max_pasid = max_pasid;
> > +
> > +    rcu_read_lock();
> 
> Could I ask what's this RCU lock protecting?

good catch, let me remove it.

> 
> > +    pasid = ioctl(container->fd, VFIO_IOMMU_PASID_REQUEST, &req);
> > +    if (pasid < 0) {
> > +        error_report("vfio_pci_device_request_pasid_alloc:"
> > +                     " request failed, contanier: %p", container);
> 
> Can use __func__, also since we're going to dump the error after all,
> we can also include the errno (pasid) here which seems to be more
> helpful than the container pointer at least to me. :)

accepted, thanks.

> > +    }
> > +    rcu_read_unlock();
> > +    return pasid;
> > +}
> > +
> > +static int vfio_pci_device_request_pasid_free(PCIBus *bus,
> > +                                              int32_t devfn,
> > +                                              uint32_t pasid)
> > +{
> > +    PCIDevice *pdev = bus->devices[devfn];
> > +    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> > +    VFIOContainer *container = vdev->vbasedev.group->container;
> > +    struct vfio_iommu_type1_pasid_request req;
> > +    unsigned long argsz;
> > +    int ret = 0;
> > +
> > +    argsz = sizeof(req);
> > +    req.argsz = argsz;
> > +    req.flag = VFIO_IOMMU_PASID_FREE;
> > +    req.pasid = pasid;
> > +
> > +    rcu_read_lock();
> > +    ret = ioctl(container->fd, VFIO_IOMMU_PASID_REQUEST, &req);
> > +    if (ret != 0) {
> > +        error_report("vfio_pci_device_request_pasid_free:"
> > +                     " request failed, contanier: %p", container);
> > +    }
> > +    rcu_read_unlock();
> > +    return ret;
> > +}
> > +
> > +static PCIPASIDOps vfio_pci_pasid_ops = {
> > +    .alloc_pasid = vfio_pci_device_request_pasid_alloc,
> > +    .free_pasid = vfio_pci_device_request_pasid_free,
> > +};
> > +
> >  static void vfio_realize(PCIDevice *pdev, Error **errp)
> >  {
> >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > @@ -2991,6 +3050,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
> >      vfio_register_req_notifier(vdev);
> >      vfio_setup_resetfn_quirk(vdev);
> >
> > +    pci_setup_pasid_ops(pdev, &vfio_pci_pasid_ops);
> > +
> >      return;
> >
> >  out_teardown:
> > --
> > 2.7.4
> >
> 
> Regards,
> 
> --
> Peter Xu
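
To make sure I understood both comments, below is a minimal sketch of the alloc path without the RCU lock and with __func__ plus errno in the message. It is an assumption-laden illustration, not the next version of the patch: the pasid_request_fn callback stands in for the real ioctl on the container fd, the two fake_request_*() stubs exist only for demonstration, and the message goes to a buffer here where QEMU would use error_report().

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the container ioctl: returns a PASID, or -1
 * with errno set on failure. */
typedef int (*pasid_request_fn)(int fd, unsigned int min_pasid,
                                unsigned int max_pasid);

static int pasid_alloc(pasid_request_fn request, int fd,
                       unsigned int min_pasid, unsigned int max_pasid,
                       char *errbuf, size_t errlen)
{
    int pasid = request(fd, min_pasid, max_pasid);

    if (pasid < 0) {
        int err = errno;        /* save before another call clobbers it */

        snprintf(errbuf, errlen, "%s: request failed: %s (errno %d)",
                 __func__, strerror(err), err);
        return -err;
    }
    return pasid;
}

/* Demonstration stubs, made up for this sketch. */
static int fake_request_ok(int fd, unsigned int min_pasid,
                           unsigned int max_pasid)
{
    (void)fd;
    (void)max_pasid;
    return (int)min_pasid;
}

static int fake_request_fail(int fd, unsigned int min_pasid,
                             unsigned int max_pasid)
{
    (void)fd;
    (void)min_pasid;
    (void)max_pasid;
    errno = ENOSPC;
    return -1;
}
```

The free path would follow the same pattern with the free request in place of the alloc one.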

Thanks,
Yi Liu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [RFC v1 07/18] hw/pci: add pci_device_bind/unbind_gpasid
  2019-07-09  8:37   ` Auger Eric
@ 2019-07-10 12:18     ` Liu, Yi L
  0 siblings, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2019-07-10 12:18 UTC (permalink / raw)
  To: Auger Eric, qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: david, Tian, Kevin, Tian, Jun J, Sun, Yi Y, kvm, Jacob Pan, Yi Sun

Hi Eric,

> From: Auger Eric [mailto:eric.auger@redhat.com]
> Sent: Tuesday, July 9, 2019 4:38 PM
> To: Liu, Yi L <yi.l.liu@intel.com>; qemu-devel@nongnu.org; mst@redhat.com;
> Subject: Re: [RFC v1 07/18] hw/pci: add pci_device_bind/unbind_gpasid
> 
> Hi Liu,
> 
> On 7/5/19 1:01 PM, Liu Yi L wrote:
> > This patch adds two callbacks pci_device_bind/unbind_gpasid() to
> > PCIPASIDOps. These two callbacks are used to propagate guest pasid
> > bind/unbind to host. The implementations of the callbacks would be
> > device passthru modules like vfio.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > Cc: David Gibson <david@gibson.dropbear.id.au>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > ---
> >  hw/pci/pci.c         | 30 ++++++++++++++++++++++++++++++
> >  include/hw/pci/pci.h |  9 +++++++++
> >  2 files changed, 39 insertions(+)
> >
> > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > index 710f9e9..2229229 100644
> > --- a/hw/pci/pci.c
> > +++ b/hw/pci/pci.c
> > @@ -2676,6 +2676,36 @@ int pci_device_request_pasid_free(PCIBus *bus, int32_t devfn,
> >      return -1;
> >  }
> >
> > +void pci_device_bind_gpasid(PCIBus *bus, int32_t devfn,
> > +                                struct gpasid_bind_data *g_bind_data)
> struct gpasid_bind_data is defined in linux headers so I think you would
> need: #ifdef __linux__

Oops, thanks for the reminder.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [RFC v1 02/18] linux-headers: import vfio.h from kernel
  2019-07-09  1:58   ` Peter Xu
  2019-07-09  8:37     ` Auger Eric
@ 2019-07-10 12:29     ` Liu, Yi L
  1 sibling, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2019-07-10 12:29 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, mst, pbonzini, alex.williamson, eric.auger, david,
	tianyu.lan, Tian, Kevin, Tian, Jun J, Sun, Yi Y, kvm, Jacob Pan,
	Yi Sun

> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf
> Of Peter Xu
> Sent: Tuesday, July 9, 2019 9:58 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v1 02/18] linux-headers: import vfio.h from kernel
> 
> On Fri, Jul 05, 2019 at 07:01:35PM +0800, Liu Yi L wrote:
> > This patch imports the vIOMMU related definitions from kernel
> > uapi/vfio.h. e.g. pasid allocation, guest pasid bind, guest pasid
> > table bind and guest iommu cache invalidation.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> 
> Just a note that in the last version you can use scripts/update-linux-headers.sh to
> update the headers.  For this RFC it's perfectly fine.

yep, thanks for the reminder.

> --
> Peter Xu

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [RFC v1 08/18] vfio/pci: add vfio bind/unbind_gpasid implementation
  2019-07-09  8:37   ` Auger Eric
@ 2019-07-10 12:30     ` Liu, Yi L
  0 siblings, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2019-07-10 12:30 UTC (permalink / raw)
  To: Auger Eric, qemu-devel, mst, pbonzini, alex.williamson, peterx
  Cc: david, tianyu.lan, Tian, Kevin, Tian, Jun J, Sun, Yi Y, kvm,
	Jacob Pan, Yi Sun

> From: Auger Eric [mailto:eric.auger@redhat.com]
> Sent: Tuesday, July 9, 2019 4:38 PM
> Subject: Re: [RFC v1 08/18] vfio/pci: add vfio bind/unbind_gpasid implementation
> 
> Hi Liu,
> 
> On 7/5/19 1:01 PM, Liu Yi L wrote:
> > This patch adds vfio implementation PCIPASIDOps.bind_gpasid/unbind_pasid().
> > These two functions are used to propagate guest pasid bind and unbind
> > requests to host via vfio container ioctl.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > Cc: David Gibson <david@gibson.dropbear.id.au>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > ---
> >  hw/vfio/pci.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 54 insertions(+)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index ab184ad..892b46c 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -2744,9 +2744,63 @@ static int vfio_pci_device_request_pasid_free(PCIBus *bus,
> >      return ret;
> >  }
> >
> > +static void vfio_pci_device_bind_gpasid(PCIBus *bus, int32_t devfn,
> > +                                     struct gpasid_bind_data *g_bind_data)
> > +{
> > +    PCIDevice *pdev = bus->devices[devfn];
> > +    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> > +    VFIOContainer *container = vdev->vbasedev.group->container;
> > +    struct vfio_iommu_type1_bind *bind;
> > +    struct vfio_iommu_type1_bind_guest_pasid *bind_guest_pasid;
> > +    unsigned long argsz;
> > +
> > +    argsz = sizeof(*bind) + sizeof(*bind_guest_pasid);
> > +    bind = g_malloc0(argsz);
> > +    bind->argsz = argsz;
> > +    bind->bind_type = VFIO_IOMMU_BIND_GUEST_PASID;
> > +    bind_guest_pasid = (struct vfio_iommu_type1_bind_guest_pasid *) &bind->data;
> > +    bind_guest_pasid->bind_data = *g_bind_data;
> > +
> > +    rcu_read_lock();
> why do you need the rcu_read_lock?
> > +    if (ioctl(container->fd, VFIO_IOMMU_BIND, bind) != 0) {
> > +        error_report("vfio_pci_device_bind_gpasid:"
> > +                     " bind failed, contanier: %p", container);
> container

nice catch. :-)

> > +    }
> > +    rcu_read_unlock();
> > +    g_free(bind);
> > +}
> > +
> > +static void vfio_pci_device_unbind_gpasid(PCIBus *bus, int32_t devfn,
> > +                                     struct gpasid_bind_data *g_bind_data)
> > +{
> > +    PCIDevice *pdev = bus->devices[devfn];
> > +    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> > +    VFIOContainer *container = vdev->vbasedev.group->container;
> > +    struct vfio_iommu_type1_bind *bind;
> > +    struct vfio_iommu_type1_bind_guest_pasid *bind_guest_pasid;
> > +    unsigned long argsz;
> > +
> > +    argsz = sizeof(*bind) + sizeof(*bind_guest_pasid);
> > +    bind = g_malloc0(argsz);
> > +    bind->argsz = argsz;
> > +    bind->bind_type = VFIO_IOMMU_BIND_GUEST_PASID;
> > +    bind_guest_pasid = (struct vfio_iommu_type1_bind_guest_pasid *) &bind->data;
> > +    bind_guest_pasid->bind_data = *g_bind_data;
> > +
> > +    rcu_read_lock();
> > +    if (ioctl(container->fd, VFIO_IOMMU_UNBIND, bind) != 0) {
> > +        error_report("vfio_pci_device_unbind_gpasid:"
> > +                     " unbind failed, contanier: %p", container);
> container

oops, Thanks,

> > +    }
> > +    rcu_read_unlock();
> > +    g_free(bind);
> > +}
> > +
> >  static PCIPASIDOps vfio_pci_pasid_ops = {
> >      .alloc_pasid = vfio_pci_device_request_pasid_alloc,
> >      .free_pasid = vfio_pci_device_request_pasid_free,
> > +    .bind_gpasid = vfio_pci_device_bind_gpasid,
> > +    .unbind_gpasid = vfio_pci_device_unbind_gpasid,
> >  };
> >
> >  static void vfio_realize(PCIDevice *pdev, Error **errp)
> >
> 
> Thanks
> 
> Eric

Yi Liu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [RFC v1 02/18] linux-headers: import vfio.h from kernel
  2019-07-09  8:37     ` Auger Eric
@ 2019-07-10 12:31       ` Liu, Yi L
  0 siblings, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2019-07-10 12:31 UTC (permalink / raw)
  To: Auger Eric, Peter Xu
  Cc: qemu-devel, mst, pbonzini, alex.williamson, david, tianyu.lan,
	Tian, Kevin, Tian, Jun J, Sun, Yi Y, kvm, Jacob Pan, Yi Sun

> From: Auger Eric [mailto:eric.auger@redhat.com]
> Sent: Tuesday, July 9, 2019 4:38 PM
> To: Peter Xu <zhexu@redhat.com>; Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v1 02/18] linux-headers: import vfio.h from kernel
> 
> Hi Liu,
> 
> On 7/9/19 3:58 AM, Peter Xu wrote:
> > On Fri, Jul 05, 2019 at 07:01:35PM +0800, Liu Yi L wrote:
> >> This patch imports the vIOMMU related definitions from kernel
> >> uapi/vfio.h. e.g. pasid allocation, guest pasid bind, guest pasid
> >> table bind and guest iommu cache invalidation.
> >>
> >> Cc: Kevin Tian <kevin.tian@intel.com>
> >> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> >> Cc: Peter Xu <peterx@redhat.com>
> >> Cc: Eric Auger <eric.auger@redhat.com>
> >> Cc: Yi Sun <yi.y.sun@linux.intel.com>
> >> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> >> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> >> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> >
> > Just a note that in the last version you can use
> > scripts/update-linux-headers.sh to update the headers.  For this RFC
> > it's perfectly fine.
> >
> 
> You will need to update scripts/update-linux-headers.sh to import the new iommu.h
> header. See "[RFC v4 02/27] update-linux-headers: Import iommu.h"
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg620098.html.

Thanks very much Eric. :-)

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC v1 04/18] intel_iommu: add "sm_model" option
  2019-07-10 12:14     ` Liu, Yi L
@ 2019-07-11  1:03       ` Peter Xu
  2019-07-11  6:25         ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: Peter Xu @ 2019-07-11  1:03 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Peter Xu, qemu-devel, mst, pbonzini, alex.williamson, eric.auger,
	david, tianyu.lan, Tian, Kevin, Tian, Jun J, Sun, Yi Y, kvm,
	Jacob Pan, Yi Sun

On Wed, Jul 10, 2019 at 12:14:44PM +0000, Liu, Yi L wrote:
> > From: Peter Xu [mailto:zhexu@redhat.com]
> > Sent: Tuesday, July 9, 2019 10:16 AM
> > To: Liu, Yi L <yi.l.liu@intel.com>
> > Subject: Re: [RFC v1 04/18] intel_iommu: add "sm_model" option
> > 
> > On Fri, Jul 05, 2019 at 07:01:37PM +0800, Liu Yi L wrote:
> > > Intel VT-d 3.0 introduces scalable mode, and it has a bunch of
> > > capabilities related to scalable mode translation, thus there
> > > are multiple combinations. While this vIOMMU implementation
> > > wants simplify it for user by providing typical combinations.
> > > User could config it by "sm_model" option. The usage is as
> > > below:
> > >
> > > "-device intel-iommu,x-scalable-mode=on,sm_model=["legacy"|"scalable"]"
> > 
> > Is it a requirement to split into two parameters, instead of just
> > exposing everything about scalable mode when x-scalable-mode is set?
> 
> yes, it is. Scalable mode has multiple capabilities. And we want to support
> the most typical combinations to simplify software. e.g. current scalable mode
> vIOMMU exposes only 2nd level translation to guest, and guest IOVA support
> is via shadowing guest 2nd level page table. We have plan to move IOVA from
> 2nd level page table to 1st level page table, thus guest IOVA can be supported
> with nested translation. And this also addresses the co-existence issue of guest
> SVA and guest IOVA. So in future we will have scalable mode vIOMMU expose
> 1st level translation only. To differentiate this config with current vIOMMU,
> we need an extra option to control it. But yes, it is still scalable mode vIOMMU.
> just has different capability exposed to guest.

I see.  Thanks for explaining.

> 
> BTW. do you know if I can add sub-options under "x-scalable-mode"? I think
> that may demonstrate the dependency better.

I'm not an expert on that, but I think we could at least make it a
string parameter, depending on what you prefer; then we can do
"x-scalable-mode=legacy|modern".  Or keeping this would be fine too.

> 
> > >
> > >  - "legacy": gives support for SL page table
> > >  - "scalable": gives support for FL page table, pasid, virtual command
> > >  - default to be "legacy" if "x-scalable-mode=on while no sm_model is
> > >    configured
> > >
> > > Cc: Kevin Tian <kevin.tian@intel.com>
> > > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > Cc: Peter Xu <peterx@redhat.com>
> > > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > > Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> > > ---
> > >  hw/i386/intel_iommu.c          | 28 +++++++++++++++++++++++++++-
> > >  hw/i386/intel_iommu_internal.h |  2 ++
> > >  include/hw/i386/intel_iommu.h  |  1 +
> > >  3 files changed, 30 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > index 44b1231..3160a05 100644
> > > --- a/hw/i386/intel_iommu.c
> > > +++ b/hw/i386/intel_iommu.c
> > > @@ -3014,6 +3014,7 @@ static Property vtd_properties[] = {
> > >      DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
> > >      DEFINE_PROP_BOOL("x-scalable-mode", IntelIOMMUState, scalable_mode, FALSE),
> > >      DEFINE_PROP_BOOL("dma-drain", IntelIOMMUState, dma_drain, true),
> > > +    DEFINE_PROP_STRING("sm_model", IntelIOMMUState, sm_model),
> > 
> > Can do 's/_/-/' to follow the rest if we need it.
> 
> Do you mean sub-options after "x-scalable-mode"?

No, I only mean "sm-model". :)

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC v1 06/18] intel_iommu: support virtual command emulation and pasid request
  2019-07-10 11:51     ` Liu, Yi L
@ 2019-07-11  1:13       ` Peter Xu
  2019-07-11  6:59         ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: Peter Xu @ 2019-07-11  1:13 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Peter Xu, qemu-devel, mst, pbonzini, alex.williamson, eric.auger,
	david, Tian, Kevin, Tian, Jun J, Sun, Yi Y, kvm, Jacob Pan,
	Yi Sun

On Wed, Jul 10, 2019 at 11:51:17AM +0000, Liu, Yi L wrote:

[...]

> > > +        s->vcrsp = 1;
> > > +        vtd_set_quad_raw(s, DMAR_VCRSP_REG,
> > > +                         ((uint64_t) s->vcrsp));
> > 
> > Do we really need to emulate the "In Progress" like this?  The vcpu is
> > blocked here after all, and AFAICT all the rest of vcpus should not
> > access these registers because obviously these registers cannot be
> > accessed concurrently...
> 
> Other vcpus should poll the IP bit before submitting vcmds. As IP bit
> is set, other vcpus will not access these bits. but if not, they may submit
> new vcmds, while we only have 1 response register, that is not we
> support. That's why we need to set IP bit.

I still don't think another CPU can use this register even if it
polled with IP==0...  The reason is simply as you described - we only
have one pair of VCMD/VRSPD registers so IMHO the guest IOMMU driver
must have a lock (probably a mutex) to guarantee sequential access of
these registers otherwise race can happen.
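
A rough sketch of what I mean, under my own assumptions: the register model, emulate_vcmd() and guest_submit_vcmd() are all made-up names, the response payload is invented, and a C11 atomic_flag spinlock stands in for whatever lock the guest driver would really use. The IP bit is taken to be bit 0, matching the "s->vcrsp = 1" in the patch.

```c
#include <stdatomic.h>
#include <stdint.h>

#define VCRSP_IP (1ULL << 0)    /* "in progress" bit, assumed at bit 0 */

/* Hypothetical model of the single VCMD/VCRSP register pair. */
struct vcmd_regs {
    uint64_t vcmd;
    uint64_t vcrsp;
};

static atomic_flag vcmd_lock = ATOMIC_FLAG_INIT;

/* Hypothetical device side: the command is handled synchronously, so IP
 * is already clear by the time the "MMIO write" returns. */
static void emulate_vcmd(struct vcmd_regs *regs)
{
    regs->vcrsp = (regs->vcmd + 1) << 8;    /* made-up response payload */
}

static uint64_t guest_submit_vcmd(struct vcmd_regs *regs, uint64_t cmd)
{
    uint64_t rsp;

    while (atomic_flag_test_and_set_explicit(&vcmd_lock,
                                             memory_order_acquire)) {
        /* spin: another vcpu owns the register pair */
    }
    regs->vcmd = cmd;
    emulate_vcmd(regs);         /* stands in for the write trapping to QEMU */
    while (regs->vcrsp & VCRSP_IP) {
        /* would only matter if a command were handled asynchronously */
    }
    rsp = regs->vcrsp;
    atomic_flag_clear_explicit(&vcmd_lock, memory_order_release);
    return rsp;
}
```

With the lock held across both the command write and the response read, a second vcpu never observes the registers mid-command, whether or not IP is emulated.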

> 
> > 
> > I think the IP bit is useful when some new vcmd would take plenty of
> > time so that we can do the long vcmds in async way.  However here it
> > seems not the case?
> 
> no, so far, it is synchronize way. As mentioned above, IP bit is to ensure
> only one vcmd is handled for a time. Other vcpus won't be able to submit
> vcmds before IP is cleared.

[...]

> > > @@ -192,6 +198,7 @@
> > >  #define VTD_ECAP_SRS                (1ULL << 31)
> > >  #define VTD_ECAP_PASID              (1ULL << 40)
> > >  #define VTD_ECAP_SMTS               (1ULL << 43)
> > > +#define VTD_ECAP_VCS                (1ULL << 44)
> > >  #define VTD_ECAP_SLTS               (1ULL << 46)
> > >  #define VTD_ECAP_FLTS               (1ULL << 47)
> > >
> > > @@ -314,6 +321,29 @@ typedef enum VTDFaultReason {
> > >
> > >  #define VTD_CONTEXT_CACHE_GEN_MAX       0xffffffffUL
> > >
> > > +/* VCCAP_REG */
> > > +#define VTD_VCCAP_PAS               (1UL << 0)
> > > +#define VTD_MIN_HPASID              200
> > 
> > Comment this value a bit?
> 
> The basic idea is to let hypervisor to set a range for available PASIDs for
> VMs. One of the reasons is PASID #0 is reserved by RID_PASID usage.
> We have no idea how many reserved PASIDs in future, so here just a
> evaluated value. Honestly, set it as "1" is enough at current stage.

That'll be a very nice initial comment for that (I mean, put it into
the patch, of course :).

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC v1 03/18] hw/pci: introduce PCIPASIDOps to PCIDevice
  2019-07-10 11:08     ` Liu, Yi L
@ 2019-07-11  3:51       ` david
  2019-07-11  7:13         ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: david @ 2019-07-11  3:51 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Peter Xu, qemu-devel, mst, pbonzini, alex.williamson, eric.auger,
	Tian, Kevin, Tian, Jun J, Sun, Yi Y, kvm, Jacob Pan, Yi Sun

[-- Attachment #1: Type: text/plain, Size: 2706 bytes --]

On Wed, Jul 10, 2019 at 11:08:15AM +0000, Liu, Yi L wrote:
> > From: Peter Xu [mailto:zhexu@redhat.com]
> > Sent: Tuesday, July 9, 2019 10:12 AM
> > To: Liu, Yi L <yi.l.liu@intel.com>
> > Subject: Re: [RFC v1 03/18] hw/pci: introduce PCIPASIDOps to PCIDevice
> > 
> > On Fri, Jul 05, 2019 at 07:01:36PM +0800, Liu Yi L wrote:
> > > +void pci_setup_pasid_ops(PCIDevice *dev, PCIPASIDOps *ops)
> > > +{
> > > +    assert(ops && !dev->pasid_ops);
> > > +    dev->pasid_ops = ops;
> > > +}
> > > +
> > > +bool pci_device_is_ops_set(PCIBus *bus, int32_t devfn)
> > 
> > Name should be "pci_device_is_pasid_ops_set".  Or maybe you can simply
> > drop this function because as long as you check it in helper functions
> > like [1] below always then it seems even unecessary.
> 
> yes, the name should be "pci_device_is_pasid_ops_set". I noticed your
> comments on the necessity in another, let's talk in that thread. :-)
> 
> > > +{
> > > +    PCIDevice *dev;
> > > +
> > > +    if (!bus) {
> > > +        return false;
> > > +    }
> > > +
> > > +    dev = bus->devices[devfn];
> > > +    return !!(dev && dev->pasid_ops);
> > > +}
> > > +
> > > +int pci_device_request_pasid_alloc(PCIBus *bus, int32_t devfn,
> > > +                                   uint32_t min_pasid, uint32_t max_pasid)
> > 
> > From VT-d spec I see that the virtual command "allocate pasid" does
> > not have bdf information so it's global, but here we've got bus/devfn.
> > I'm curious is that reserved for ARM or some other arch?
> 
> You are right. VT-d spec doesn’t have bdf info. But we need to pass the
> allocation request via vfio. So this function has bdf info. In vIOMMU side,
> it should select a vfio-pci device and invoke this callback when it wants to
> request PASID alloc/free.

That doesn't seem conceptually right.  IIUC, the pasids "belong" to a
sort of SVM context.  It seems to me the alloc should be on that
object - and that object would already have some connection to any
relevant vfio containers.  At the vfio level this seems like it should
be a container operation rather than a device operation.
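
To illustrate the shape I have in mind (purely a sketch; every name here is made up and none of it is existing QEMU or VFIO API), the pool lives in the context object and devices merely point at it:

```c
#include <stdbool.h>
#include <stdint.h>

#define PASID_CTX_MAX 1024      /* arbitrary pool size for this sketch */

/* Hypothetical per-context (per-container) PASID pool. */
struct pasid_ctx {
    bool used[PASID_CTX_MAX];
};

/* Hypothetical device: it only references the context owning its PASIDs. */
struct demo_device {
    struct pasid_ctx *ctx;
};

/* Return the lowest free PASID in [min, max], or -1 if none is free. */
static int pasid_ctx_alloc(struct pasid_ctx *ctx, uint32_t min, uint32_t max)
{
    uint32_t p;

    for (p = min; p <= max && p < PASID_CTX_MAX; p++) {
        if (!ctx->used[p]) {
            ctx->used[p] = true;
            return (int)p;
        }
    }
    return -1;
}

/* Release a PASID; returns -1 if it was not allocated from this pool. */
static int pasid_ctx_free(struct pasid_ctx *ctx, uint32_t pasid)
{
    if (pasid >= PASID_CTX_MAX || !ctx->used[pasid]) {
        return -1;
    }
    ctx->used[pasid] = false;
    return 0;
}
```

Two devices sharing one context then draw from the same pool, and the vIOMMU never has to pick an arbitrary device just to reach the allocator.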

> > > +{
> > > +    PCIDevice *dev;
> > > +
> > > +    if (!bus) {
> > > +        return -1;
> > > +    }
> > > +
> > > +    dev = bus->devices[devfn];
> > > +    if (dev && dev->pasid_ops && dev->pasid_ops->alloc_pasid) {
> > 
> > [1]
> > 
> > > +        return dev->pasid_ops->alloc_pasid(bus, devfn, min_pasid, max_pasid);
> 
> Thanks,
> Yi Liu

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [RFC v1 09/18] intel_iommu: process pasid cache invalidation
  2019-07-09  4:47   ` Peter Xu
@ 2019-07-11  6:22     ` Liu, Yi L
  0 siblings, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2019-07-11  6:22 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, mst, pbonzini, alex.williamson, eric.auger, david,
	Tian, Kevin, Tian, Jun J, Sun, Yi Y, kvm, Jacob Pan, Yi Sun

> From: Peter Xu [mailto:zhexu@redhat.com]
> Sent: Tuesday, July 9, 2019 12:48 PM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Cc: qemu-devel@nongnu.org; mst@redhat.com; pbonzini@redhat.com;
> alex.williamson@redhat.com; eric.auger@redhat.com;
> david@gibson.dropbear.id.au; tianyu.lan@intel.com; Tian, Kevin
> <kevin.tian@intel.com>; Tian, Jun J <jun.j.tian@intel.com>; Sun, Yi Y
> <yi.y.sun@intel.com>; kvm@vger.kernel.org; Jacob Pan
> <jacob.jun.pan@linux.intel.com>; Yi Sun <yi.y.sun@linux.intel.com>
> Subject: Re: [RFC v1 09/18] intel_iommu: process pasid cache invalidation
> 
> On Fri, Jul 05, 2019 at 07:01:42PM +0800, Liu Yi L wrote:
> > +static bool vtd_process_pasid_desc(IntelIOMMUState *s,
> > +                                   VTDInvDesc *inv_desc) {
> > +    if ((inv_desc->val[0] & VTD_INV_DESC_PASIDC_RSVD_VAL0) ||
> > +        (inv_desc->val[1] & VTD_INV_DESC_PASIDC_RSVD_VAL1) ||
> > +        (inv_desc->val[2] & VTD_INV_DESC_PASIDC_RSVD_VAL2) ||
> > +        (inv_desc->val[3] & VTD_INV_DESC_PASIDC_RSVD_VAL3)) {
> > +        trace_vtd_inv_desc("non-zero-field-in-pc_inv_desc",
> > +                            inv_desc->val[1], inv_desc->val[0]);
> 
> The first parameter of trace_vtd_inv_desc() should be the type.
> 
> Can use error_report_once() here.

I think so, let me switch to using it in the next version.

> > +        return false;
> > +    }
> > +
> > +    switch (inv_desc->val[0] & VTD_INV_DESC_PASIDC_G) {
> > +    case VTD_INV_DESC_PASIDC_DSI:
> > +        break;
> > +
> > +    case VTD_INV_DESC_PASIDC_PASID_SI:
> > +        break;
> > +
> > +    case VTD_INV_DESC_PASIDC_GLOBAL:
> > +        break;
> > +
> > +    default:
> > +        trace_vtd_inv_desc("invalid-inv-granu-in-pc_inv_desc",
> > +                            inv_desc->val[1], inv_desc->val[0]);
> 
> Here too.

Got it.

Thanks,
Yi Liu

> > +        return false;
> > +    }
> > +
> > +    return true;
> > +}
> 
> Regards,
> 
> --
> Peter Xu
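
For anyone following along, the report-once idea can be sketched in plain C as below. This is not QEMU's error_report_once() implementation, just an illustration of the pattern under my own assumptions: report_once(), check_reserved() and the reports_emitted counter are hypothetical names, and the counter exists only so the behaviour can be observed.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static int reports_emitted;     /* demonstration only: messages printed */

/* Print the message the first time this call site is reached, then stay
 * silent. Each expansion gets its own static flag. */
#define report_once(...)                        \
    do {                                        \
        static bool reported_;                  \
        if (!reported_) {                       \
            reported_ = true;                   \
            reports_emitted++;                  \
            fprintf(stderr, __VA_ARGS__);       \
            fputc('\n', stderr);                \
        }                                       \
    } while (0)

/* Hypothetical reserved-bit check in the style of the descriptor
 * validation quoted above. */
static bool check_reserved(uint64_t val, uint64_t rsvd_mask)
{
    if (val & rsvd_mask) {
        report_once("%s: non-zero reserved field, val=0x%llx",
                    __func__, (unsigned long long)val);
        return false;
    }
    return true;
}
```

A misbehaving guest that keeps submitting bad descriptors then produces one log line per call site instead of flooding the log.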

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [RFC v1 04/18] intel_iommu: add "sm_model" option
  2019-07-11  1:03       ` Peter Xu
@ 2019-07-11  6:25         ` Liu, Yi L
  0 siblings, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2019-07-11  6:25 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, mst, pbonzini, alex.williamson, eric.auger, david,
	tianyu.lan, Tian, Kevin, Tian, Jun J, Sun, Yi Y, kvm, Jacob Pan,
	Yi Sun

> From: Peter Xu [mailto:zhexu@redhat.com]
> Sent: Thursday, July 11, 2019 9:04 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v1 04/18] intel_iommu: add "sm_model" option
> 
> On Wed, Jul 10, 2019 at 12:14:44PM +0000, Liu, Yi L wrote:
> > > From: Peter Xu [mailto:zhexu@redhat.com]
> > > Sent: Tuesday, July 9, 2019 10:16 AM
> > > To: Liu, Yi L <yi.l.liu@intel.com>
> > > Subject: Re: [RFC v1 04/18] intel_iommu: add "sm_model" option
> > >
> > > On Fri, Jul 05, 2019 at 07:01:37PM +0800, Liu Yi L wrote:
> > > > Intel VT-d 3.0 introduces scalable mode, and it has a bunch of
> > > > capabilities related to scalable mode translation, thus there are
> > > > multiple combinations. While this vIOMMU implementation wants
> > > > simplify it for user by providing typical combinations.
> > > > User could config it by "sm_model" option. The usage is as
> > > > below:
> > > >
> > > > "-device intel-iommu,x-scalable-mode=on,sm_model=["legacy"|"scalable"]"
> > >
> > > Is it a requirement to split into two parameters, instead of just
> > > exposing everything about scalable mode when x-scalable-mode is set?
> >
> > yes, it is. Scalable mode has multiple capabilities. And we want to
> > support the most typical combinations to simplify software. e.g.
> > current scalable mode vIOMMU exposes only 2nd level translation to
> > guest, and guest IOVA support is via shadowing guest 2nd level page
> > table. We have plan to move IOVA from 2nd level page table to 1st
> > level page table, thus guest IOVA can be supported with nested
> > translation. And this also addresses the co-existence issue of guest
> > SVA and guest IOVA. So in future we will have scalable mode vIOMMU
> > expose 1st level translation only. To differentiate this config with current vIOMMU,
> we need an extra option to control it. But yes, it is still scalable mode vIOMMU.
> > just has different capability exposed to guest.
> 
> I see.  Thanks for explaining.

you are welcome. :-)

> 
> >
> > BTW. do you know if I can add sub-options under "x-scalable-mode"? I
> > think that may demonstrate the dependency better.
> 
> I'm not an expert of that, but I think at least we can make it a string parameter
> depends on what you prefer, then we can do "x-scalable-mode=legacy|modern".  Or
> keep this would be fine too.

Hmm, that's a good idea. If we agree to change x-scalable-mode to a string
parameter, I can make that change.
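
For illustration only, here is a minimal sketch of how a string-valued
x-scalable-mode might be parsed, assuming the "legacy"/"modern" names
suggested above. The enum and vtd_parse_scalable_mode() are hypothetical
names and are not part of this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical modes; names follow the "legacy|modern" suggestion. */
typedef enum {
    VTD_SM_OFF,     /* option absent: scalable mode disabled */
    VTD_SM_LEGACY,  /* second-level page table only */
    VTD_SM_MODERN,  /* first-level page table, pasid, virtual command */
} VTDScalableMode;

/* Returns true and stores the mode on success, false on an unknown string. */
static bool vtd_parse_scalable_mode(const char *str, VTDScalableMode *mode)
{
    assert(mode != NULL);

    if (str == NULL) {
        *mode = VTD_SM_OFF;
        return true;
    }
    if (strcmp(str, "legacy") == 0) {
        *mode = VTD_SM_LEGACY;
        return true;
    }
    if (strcmp(str, "modern") == 0) {
        *mode = VTD_SM_MODERN;
        return true;
    }
    return false;
}
```

With this shape, a single option carries both "is scalable mode on" and
"which capability set", so the separate sm-model option is not needed.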

> >
> > > >
> > > >  - "legacy": gives support for SL page table
> > > >  - "scalable": gives support for FL page table, pasid, virtual
> > > > command
> > > >  - default to be "legacy" if "x-scalable-mode=on while no sm_model is
> > > >    configured
> > > >
> > > > Cc: Kevin Tian <kevin.tian@intel.com>
> > > > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > > Cc: Peter Xu <peterx@redhat.com>
> > > > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > > > Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> > > > ---
> > > >  hw/i386/intel_iommu.c          | 28 +++++++++++++++++++++++++++-
> > > >  hw/i386/intel_iommu_internal.h |  2 ++
> > > >  include/hw/i386/intel_iommu.h  |  1 +
> > > >  3 files changed, 30 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > > index 44b1231..3160a05 100644
> > > > --- a/hw/i386/intel_iommu.c
> > > > +++ b/hw/i386/intel_iommu.c
> > > > @@ -3014,6 +3014,7 @@ static Property vtd_properties[] = {
> > > >      DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
> > > >      DEFINE_PROP_BOOL("x-scalable-mode", IntelIOMMUState, scalable_mode, FALSE),
> > > >      DEFINE_PROP_BOOL("dma-drain", IntelIOMMUState, dma_drain, true),
> > > > +    DEFINE_PROP_STRING("sm_model", IntelIOMMUState, sm_model),
> > >
> > > > Can do 's/_/-/' to follow the rest if we need it.
> >
> > Do you mean sub-options after "x-scalable-mode"?
> 
> No, I only mean "sm-model". :)

Got it. If we change x-scalable-mode to a string, then sm-model would be
removed.

Regards,
Yi Liu

> Regards,
> 
> --
> Peter Xu


* RE: [RFC v1 06/18] intel_iommu: support virtual command emulation and pasid request
  2019-07-11  1:13       ` Peter Xu
@ 2019-07-11  6:59         ` Liu, Yi L
  0 siblings, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2019-07-11  6:59 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, mst, pbonzini, alex.williamson, eric.auger, david,
	Tian, Kevin, Tian, Jun J, Sun, Yi Y, kvm, Jacob Pan, Yi Sun

> From: Peter Xu [mailto:zhexu@redhat.com]
> Sent: Thursday, July 11, 2019 9:13 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v1 06/18] intel_iommu: support virtual command emulation and
> pasid request
> 
> On Wed, Jul 10, 2019 at 11:51:17AM +0000, Liu, Yi L wrote:
> 
> [...]
> 
> > > > +        s->vcrsp = 1;
> > > > +        vtd_set_quad_raw(s, DMAR_VCRSP_REG,
> > > > +                         ((uint64_t) s->vcrsp));
> > >
> > > Do we really need to emulate the "In Progress" like this?  The vcpu is
> > > blocked here after all, and AFAICT all the rest of vcpus should not
> > > access these registers because obviously these registers cannot be
> > > accessed concurrently...
> >
> > > Other vcpus should poll the IP bit before submitting vcmds. While the IP
> > > bit is set, other vcpus will not access these bits. If it were not set,
> > > they might submit new vcmds, and since we only have one response
> > > register, that is not something we support. That's why we need to set
> > > the IP bit.
> 
> I still don't think another CPU can use this register even if it
> polled with IP==0...  The reason is simply as you described - we only
> have one pair of VCMD/VRSPD registers so IMHO the guest IOMMU driver
> must have a lock (probably a mutex) to guarantee sequential access of
> these registers otherwise race can happen.

Got it. So the case here is: other vcpus will not be able to access the
VCMD/VRSP registers due to the lock in the guest iommu driver. So the IP bit
is only used to block any further VCMDs from the same vcpu which holds the
lock. But we are emulating VCMD/VRSP in a synchronous manner, so the vcpu has
no way to submit new VCMDs before a prior VCMD is completed.

> >
> > >
> > > I think the IP bit is useful when some new vcmd would take plenty of
> > > time so that we can do the long vcmds in async way.  However here it
> > > seems not the case?
> >
> > > no, so far, it is synchronous. As mentioned above, the IP bit is to ensure
> > > only one vcmd is handled at a time. Other vcpus won't be able to submit
> > > vcmds before IP is cleared.
> 
> [...]
> 
> > > > @@ -192,6 +198,7 @@
> > > >  #define VTD_ECAP_SRS                (1ULL << 31)
> > > >  #define VTD_ECAP_PASID              (1ULL << 40)
> > > >  #define VTD_ECAP_SMTS               (1ULL << 43)
> > > > +#define VTD_ECAP_VCS                (1ULL << 44)
> > > >  #define VTD_ECAP_SLTS               (1ULL << 46)
> > > >  #define VTD_ECAP_FLTS               (1ULL << 47)
> > > >
> > > > @@ -314,6 +321,29 @@ typedef enum VTDFaultReason {
> > > >
> > > >  #define VTD_CONTEXT_CACHE_GEN_MAX       0xffffffffUL
> > > >
> > > > +/* VCCAP_REG */
> > > > +#define VTD_VCCAP_PAS               (1UL << 0)
> > > > +#define VTD_MIN_HPASID              200
> > >
> > > Comment this value a bit?
> >
> > > The basic idea is to let the hypervisor set a range of available PASIDs
> > > for VMs. One of the reasons is that PASID #0 is reserved for RID_PASID
> > > usage. We have no idea how many PASIDs will be reserved in future, so
> > > this is just an estimated value. Honestly, setting it to "1" is enough
> > > at the current stage.
> 
> That'll be a very nice initial comment for that (I mean, put it into
> the patch, of course :).

Got it. I will add it in the next version.
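
As a sketch of what that could look like, the explanation above can be
folded into a comment on the define. The helper vtd_hpasid_in_range() is
hypothetical and only there to show how the lower bound would be used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Lowest PASID the hypervisor makes available to VMs. PASID #0 is
 * reserved for RID_PASID usage and more PASIDs may be reserved in
 * future, so the range starts at an estimated value. "1" would be
 * enough at the current stage.
 */
#define VTD_MIN_HPASID 200

/* Hypothetical helper: is this PASID inside the range offered to VMs? */
static bool vtd_hpasid_in_range(uint32_t pasid, uint32_t max_pasid)
{
    assert(max_pasid >= VTD_MIN_HPASID);
    return pasid >= VTD_MIN_HPASID && pasid <= max_pasid;
}
```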

Thanks,
Yi Liu


* RE: [RFC v1 03/18] hw/pci: introduce PCIPASIDOps to PCIDevice
  2019-07-11  3:51       ` david
@ 2019-07-11  7:13         ` Liu, Yi L
  0 siblings, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2019-07-11  7:13 UTC (permalink / raw)
  To: david
  Cc: Peter Xu, qemu-devel, mst, pbonzini, alex.williamson, eric.auger,
	Tian, Kevin, Tian, Jun J, Sun, Yi Y, kvm, Jacob Pan, Yi Sun

> From: david@gibson.dropbear.id.au [mailto:david@gibson.dropbear.id.au]
> Sent: Thursday, July 11, 2019 11:52 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v1 03/18] hw/pci: introduce PCIPASIDOps to PCIDevice
> 
> On Wed, Jul 10, 2019 at 11:08:15AM +0000, Liu, Yi L wrote:
> > > From: Peter Xu [mailto:zhexu@redhat.com]
> > > Sent: Tuesday, July 9, 2019 10:12 AM
> > > To: Liu, Yi L <yi.l.liu@intel.com>
> > > Subject: Re: [RFC v1 03/18] hw/pci: introduce PCIPASIDOps to PCIDevice
> > >
> > > On Fri, Jul 05, 2019 at 07:01:36PM +0800, Liu Yi L wrote:
> > > > +void pci_setup_pasid_ops(PCIDevice *dev, PCIPASIDOps *ops)
> > > > +{
> > > > +    assert(ops && !dev->pasid_ops);
> > > > +    dev->pasid_ops = ops;
> > > > +}
> > > > +
> > > > +bool pci_device_is_ops_set(PCIBus *bus, int32_t devfn)
> > >
> > > > Name should be "pci_device_is_pasid_ops_set".  Or maybe you can simply
> > > > drop this function: as long as you always check it in helper functions
> > > > like [1] below, it seems unnecessary.
> >
> > > yes, the name should be "pci_device_is_pasid_ops_set". I noticed your
> > > comments on the necessity in another thread, let's talk there. :-)
> >
> > > > +{
> > > > +    PCIDevice *dev;
> > > > +
> > > > +    if (!bus) {
> > > > +        return false;
> > > > +    }
> > > > +
> > > > +    dev = bus->devices[devfn];
> > > > +    return !!(dev && dev->pasid_ops);
> > > > +}
> > > > +
> > > > +int pci_device_request_pasid_alloc(PCIBus *bus, int32_t devfn,
> > > > +                                   uint32_t min_pasid, uint32_t max_pasid)
> > >
> > > From VT-d spec I see that the virtual command "allocate pasid" does
> > > not have bdf information so it's global, but here we've got bus/devfn.
> > > I'm curious is that reserved for ARM or some other arch?
> >
> > You are right. VT-d spec doesn’t have bdf info. But we need to pass the
> > allocation request via vfio. So this function has bdf info. In vIOMMU side,
> > it should select a vfio-pci device and invoke this callback when it wants to
> > request PASID alloc/free.
> 
> That doesn't seem conceptually right.  IIUC, the pasids "belong" to a
> sort of SVM context.  It seems to be the alloc should be on that
> object - and that object would already have some connection to any
> relevant vfio containers.  At the vfio level this seems like it should
> be a container operation rather than a device operation.

Hi David,

Yeah, I agree it should ultimately be a container operation. Actually, in the
callback implementation, it is a container operation. Please refer to the
implementation in the patch below. :-)

[RFC v1 05/18] vfio/pci: add pasid alloc/free implementation

Thanks,
Yi Liu

> > > > +{
> > > > +    PCIDevice *dev;
> > > > +
> > > > +    if (!bus) {
> > > > +        return -1;
> > > > +    }
> > > > +
> > > > +    dev = bus->devices[devfn];
> > > > +    if (dev && dev->pasid_ops && dev->pasid_ops->alloc_pasid) {
> > >
> > > [1]
> > >
> > > > +        return dev->pasid_ops->alloc_pasid(bus, devfn, min_pasid, max_pasid);
> >
> > Thanks,
> > Yi Liu
> 
> --
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


* RE: [RFC v1 10/18] intel_iommu: tag VTDAddressSpace instance with PASID
  2019-07-09  6:12   ` Peter Xu
@ 2019-07-11  7:24     ` Liu, Yi L
  0 siblings, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2019-07-11  7:24 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, mst, pbonzini, alex.williamson, eric.auger, david,
	tianyu.lan, Tian, Kevin, Tian, Jun J, Sun, Yi Y, kvm, Jacob Pan,
	Yi Sun

> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf
> Of Peter Xu
> Sent: Tuesday, July 9, 2019 2:13 PM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v1 10/18] intel_iommu: tag VTDAddressSpace instance with PASID
> 
> On Fri, Jul 05, 2019 at 07:01:43PM +0800, Liu Yi L wrote:
> > This patch introduces new fields in VTDAddressSpace for further PASID
> > support in Intel vIOMMU. In old time, each device has a
> > VTDAddressSpace instance to stand for its guest IOVA address space
> > when vIOMMU is enabled. However, when PASID is exposed to guest,
> > device will have multiple address spaces which are tagged with PASID.
> > To suit this change, VTDAddressSpace should be tagged with PASIDs in
> > Intel vIOMMU.
> >
> > To record PASID tagged VTDAddressSpaces, a hash table is introduced.
> > The data in the hash table can be used for future sanity check and
> > retrieve previous PASID configs of guest and also future emulated SVA
> > DMA support for emulated SVA capable devices. The lookup key is a
> > string and its format is as below:
> >
> > "rsv%04dpasid%010dsid%06d" -- totally 32 bytes
> 
> Can we make it simply a struct?
> 
>         struct pasid_key {
>                 uint32_t pasid;
>                 uint16_t sid;
>         }

Nice suggestion. Let me try it.

> Also I think we don't need to keep reserved bits because it'll be a structure that'll
> only be used by QEMU so we can extend it easily in the future when necessary.

With a structure, there is indeed no need. :-)

> [...]
> 
> > +static int vtd_pasid_cache_dsi(IntelIOMMUState *s, uint16_t domain_id)
> > +{
> > +    VTDPASIDCacheInfo pc_info;
> > +
> > +    trace_vtd_pasid_cache_dsi(domain_id);
> > +
> > +    pc_info.flags = VTD_PASID_CACHE_DOMSI;
> > +    pc_info.domain_id = domain_id;
> > +
> > +    /*
> > +     * use g_hash_table_foreach_remove(), which will free the
> > +     * vtd_pasid_as instances.
> > +     */
> > +    g_hash_table_foreach_remove(s->vtd_pasid_as, vtd_flush_pasid, &pc_info);
> > +    /*
> > +     * TODO: Domain selective PASID cache invalidation
> > +     * may be issued wrongly by programmer, to be safe,
> > +     * after invalidating the pasid caches, emulator
> > +     * needs to replay the pasid bindings by walking guest
> > +     * pasid dir and pasid table.
> > +     */
> 
> It seems to me that this is still unchanged for the whole series.
> It's fine for RFC, but just a reminder that please either comment on why we don't
> have something or implement what we need here...

Yes, I haven't added it in this RFC, so I listed it as a TODO here. It will be
done once the work flow is clear. :-)

> [...]
> 
> >  /* Unmap the whole range in the notifier's scope. */
> >  static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
> >  {
> > @@ -3914,6 +4076,8 @@ static void vtd_realize(DeviceState *dev, Error **errp)
> >                                       g_free, g_free);
> >      s->vtd_as_by_busptr = g_hash_table_new_full(vtd_uint64_hash, vtd_uint64_equal,
> >                                                g_free, g_free);
> > +    s->vtd_pasid_as = g_hash_table_new_full(&g_str_hash, &g_str_equal,
> > +                                     g_free, hash_pasid_as_free);
> 
> Can use g_free() and drop hash_pasid_as_free()?

Nice catch. I used hash_pasid_as_free() because I'd like to do more than just
free the VTDAddressSpace instance, e.g. destroy the AddressSpace MemoryRegion
instances before freeing the VTDAddressSpace instance. That's related to
another comment from you in another thread. :-)

For now, I think it is fine to drop it and just use g_free.

> Also, this patch only tries to drop entries of the hash table but the hash table is never
> inserted or used.  I would suggest that you put that part to be with this patch as a
> whole otherwise it's hard to clarify how this hash table will be used.

Good suggestion. I will make this patch self-contained in the next version.

Thanks,
Yi Liu

> Regards,
> 
> --
> Peter Xu


* RE: [RFC v1 11/18] intel_iommu: create VTDAddressSpace per BDF+PASID
  2019-07-09  6:39   ` Peter Xu
@ 2019-07-11  8:13     ` Liu, Yi L
  0 siblings, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2019-07-11  8:13 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, mst, pbonzini, alex.williamson, eric.auger, david,
	tianyu.lan, Tian, Kevin, Tian, Jun J, Sun, Yi Y, kvm, Jacob Pan,
	Yi Sun

> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf
> Of Peter Xu
> Sent: Tuesday, July 9, 2019 2:39 PM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v1 11/18] intel_iommu: create VTDAddressSpace per BDF+PASID
> 
> On Fri, Jul 05, 2019 at 07:01:44PM +0800, Liu Yi L wrote:
> 
> [...]
> 
> > +/**
> > + * This function finds or adds a VTDAddressSpace for a device when
> > + * it is bound to a pasid
> > + */
> > +static VTDAddressSpace *vtd_add_find_pasid_as(IntelIOMMUState *s,
> > +                                              PCIBus *bus,
> > +                                              int devfn,
> > +                                              uint32_t pasid,
> > +                                              bool allocate)
> > +{
> > +    char key[32];
> > +    char *new_key;
> > +    VTDAddressSpace *vtd_pasid_as;
> > +    uint16_t sid;
> > +
> > +    sid = vtd_make_source_id(pci_bus_num(bus), devfn);
> > +    vtd_get_pasid_key(&key[0], 32, pasid, sid);
> > +    vtd_pasid_as = g_hash_table_lookup(s->vtd_pasid_as, &key[0]);
> > +
> > +    if (!vtd_pasid_as && allocate) {
> > +        new_key = g_malloc(32);
> > +        vtd_get_pasid_key(&new_key[0], 32, pasid, sid);
> > +        /*
> > +         * Initiate the vtd_pasid_as structure.
> > +         *
> > +         * This structure here is used to track the guest pasid
> > +         * binding and also serves as pasid-cache management entry.
> > +         *
> > +         * TODO: in future, if wants to support the SVA-aware DMA
> > +         *       emulation, the vtd_pasid_as should be fully initialized.
> > +         *       e.g. the address_space and memory region fields.
> > +         */
> 
> I'm not very sure about this part.  IMHO all those memory regions are
> used to inlay the whole IOMMU idea into QEMU's memory API framework.
> Now even without the whole PASID support we've already have a workable
> vtd_iommu_translate() that will intercept device DMA operations and we
> can try to translate the IOVA to anything we want.  Now the iommu_idx
> parameter of vtd_iommu_translate() is never used (I'd say until now I
> still don't sure on whether the "iommu_idx" idea is the best we can
> have... I've tried to debate on that but... anyway I assume for Intel
> we can think it as the "pasid" information or at least contains it),
> however in the further we can have that PASID/iommu_idx/whatever
> passed into this translate() function too, then we can walk the 1st
> level page table there if we found that this device had enabled the
> 1st level mapping (or even nested).  I don't see what else we need to
> do to play with extra memory regions.

I'm not sure that passing a PASID to the translate() function is a good idea,
since we may need to pass the PASID parameter through the whole QEMU
AddressSpace read/write stack.

Actually, I did some experiments with a simple emulated SVA-capable device
some time ago (there was no iommu_idx at that time). As I understand it, an
SVA-capable device model needs to fetch an AddressSpace with a PASID and
then call dma_memory_rw(), which invokes the QEMU AddressSpace read/write
stack and finally calls into vtd_iommu_translate(). In translate() we can
get the VTDAddressSpace instance, which has a "pasid_allocated" flag. If the
flag is true, we translate the input address with the page table behind the
PASID taken from the "pasid" field in VTDAddressSpace. I guess this may
introduce the fewest changes to the existing logic.

> 
> Conclusion: I feel like SVA can use its own structure here instead of
> reusing VTDAddressSpace, because I think those memory regions can
> probably be useless.  Even it will, we can refactor the code later,
> but I really doubt it...

Hmm, right. Even if they are necessary, SVA will require fewer memory regions.
I can switch to a structure named VTDPASIDAddressSpace or similar.

Thanks,
Yi Liu

> > +        vtd_pasid_as = g_malloc0(sizeof(VTDAddressSpace));
> > +        vtd_pasid_as->iommu_state = s;
> > +        vtd_pasid_as->bus = bus;
> > +        vtd_pasid_as->devfn = devfn;
> > +        vtd_pasid_as->context_cache_entry.context_cache_gen = 0;
> > +        vtd_pasid_as->pasid = pasid;
> > +        vtd_pasid_as->pasid_allocated = true;
> > +        vtd_pasid_as->pasid_cache_entry.pasid_cache_gen = 0;
> > +        g_hash_table_insert(s->vtd_pasid_as, new_key, vtd_pasid_as);
> > +    }
> > +    return vtd_pasid_as;
> > +}
> 
> Regards,
> 
> --
> Peter Xu


* Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation
  2019-07-05 11:01 ` [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation Liu Yi L
  2019-07-09  2:23   ` Peter Xu
@ 2019-07-15  2:55   ` David Gibson
  2019-07-16 10:25     ` Liu, Yi L
  1 sibling, 1 reply; 58+ messages in thread
From: David Gibson @ 2019-07-15  2:55 UTC (permalink / raw)
  To: Liu Yi L
  Cc: qemu-devel, mst, pbonzini, alex.williamson, peterx, eric.auger,
	tianyu.lan, kevin.tian, jun.j.tian, yi.y.sun, kvm, Jacob Pan,
	Yi Sun

[-- Attachment #1: Type: text/plain, Size: 4095 bytes --]

On Fri, Jul 05, 2019 at 07:01:38PM +0800, Liu Yi L wrote:
> This patch adds vfio implementation PCIPASIDOps.alloc_pasid/free_pasid().
> These two functions are used to propagate guest pasid allocation and
> free requests to host via vfio container ioctl.

As I said in an earlier comment, I think doing this on the device is
conceptually incorrect.  I think we need an explicit notion of an SVM
context (i.e. the namespace in which all the PASIDs live) - which will
IIUC usually be shared amongst multiple devices.  The create and free
PASID requests should be on that object.

> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Yi Sun <yi.y.sun@linux.intel.com>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> ---
>  hw/vfio/pci.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 61 insertions(+)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index ce3fe96..ab184ad 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2690,6 +2690,65 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
>      vdev->req_enabled = false;
>  }
>  
> +static int vfio_pci_device_request_pasid_alloc(PCIBus *bus,
> +                                               int32_t devfn,
> +                                               uint32_t min_pasid,
> +                                               uint32_t max_pasid)
> +{
> +    PCIDevice *pdev = bus->devices[devfn];
> +    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> +    VFIOContainer *container = vdev->vbasedev.group->container;
> +    struct vfio_iommu_type1_pasid_request req;
> +    unsigned long argsz;
> +    int pasid;
> +
> +    argsz = sizeof(req);
> +    req.argsz = argsz;
> +    req.flag = VFIO_IOMMU_PASID_ALLOC;
> +    req.min_pasid = min_pasid;
> +    req.max_pasid = max_pasid;
> +
> +    rcu_read_lock();
> +    pasid = ioctl(container->fd, VFIO_IOMMU_PASID_REQUEST, &req);
> +    if (pasid < 0) {
> +        error_report("vfio_pci_device_request_pasid_alloc:"
> +                     " request failed, container: %p", container);
> +    }
> +    rcu_read_unlock();
> +    return pasid;
> +}
> +
> +static int vfio_pci_device_request_pasid_free(PCIBus *bus,
> +                                              int32_t devfn,
> +                                              uint32_t pasid)
> +{
> +    PCIDevice *pdev = bus->devices[devfn];
> +    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> +    VFIOContainer *container = vdev->vbasedev.group->container;
> +    struct vfio_iommu_type1_pasid_request req;
> +    unsigned long argsz;
> +    int ret = 0;
> +
> +    argsz = sizeof(req);
> +    req.argsz = argsz;
> +    req.flag = VFIO_IOMMU_PASID_FREE;
> +    req.pasid = pasid;
> +
> +    rcu_read_lock();
> +    ret = ioctl(container->fd, VFIO_IOMMU_PASID_REQUEST, &req);
> +    if (ret != 0) {
> +        error_report("vfio_pci_device_request_pasid_free:"
> +                     " request failed, container: %p", container);
> +    }
> +    rcu_read_unlock();
> +    return ret;
> +}
> +
> +static PCIPASIDOps vfio_pci_pasid_ops = {
> +    .alloc_pasid = vfio_pci_device_request_pasid_alloc,
> +    .free_pasid = vfio_pci_device_request_pasid_free,
> +};
> +
>  static void vfio_realize(PCIDevice *pdev, Error **errp)
>  {
>      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> @@ -2991,6 +3050,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>      vfio_register_req_notifier(vdev);
>      vfio_setup_resetfn_quirk(vdev);
>  
> +    pci_setup_pasid_ops(pdev, &vfio_pci_pasid_ops);
> +
>      return;
>  
>  out_teardown:

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* RE: [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation
  2019-07-15  2:55   ` David Gibson
@ 2019-07-16 10:25     ` Liu, Yi L
  2019-07-17  3:06       ` David Gibson
  0 siblings, 1 reply; 58+ messages in thread
From: Liu, Yi L @ 2019-07-16 10:25 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-devel, mst, pbonzini, alex.williamson, peterx, eric.auger,
	Tian, Kevin, Tian, Jun J, Sun, Yi Y, kvm, Jacob Pan, Yi Sun

> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf
> Of David Gibson
> Sent: Monday, July 15, 2019 10:55 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation
> 
> On Fri, Jul 05, 2019 at 07:01:38PM +0800, Liu Yi L wrote:
> > This patch adds vfio implementation PCIPASIDOps.alloc_pasid/free_pasid().
> > These two functions are used to propagate guest pasid allocation and
> > free requests to host via vfio container ioctl.
> 
> As I said in an earlier comment, I think doing this on the device is
> conceptually incorrect.  I think we need an explicit notion of an SVM
> context (i.e. the namespace in which all the PASIDs live) - which will
> IIUC usually be shared amongst multiple devices.  The create and free
> PASID requests should be on that object.

Actually, the allocation is not done on this device. System wide, it is
done on a container. So I'm not sure whether it is the API interface that
gives you the sense that this is done on the device. Also, I'm curious about
the SVM context concept: do you mean a per-VM context or a per-SVM usage
context? Could you elaborate a little more? :-)

Thanks,
Yi Liu

> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > Cc: David Gibson <david@gibson.dropbear.id.au>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> > ---
> >  hw/vfio/pci.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 61 insertions(+)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index ce3fe96..ab184ad 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -2690,6 +2690,65 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
> >      vdev->req_enabled = false;
> >  }
> >
> > +static int vfio_pci_device_request_pasid_alloc(PCIBus *bus,
> > +                                               int32_t devfn,
> > +                                               uint32_t min_pasid,
> > +                                               uint32_t max_pasid)
> > +{
> > +    PCIDevice *pdev = bus->devices[devfn];
> > +    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> > +    VFIOContainer *container = vdev->vbasedev.group->container;
> > +    struct vfio_iommu_type1_pasid_request req;
> > +    unsigned long argsz;
> > +    int pasid;
> > +
> > +    argsz = sizeof(req);
> > +    req.argsz = argsz;
> > +    req.flag = VFIO_IOMMU_PASID_ALLOC;
> > +    req.min_pasid = min_pasid;
> > +    req.max_pasid = max_pasid;
> > +
> > +    rcu_read_lock();
> > +    pasid = ioctl(container->fd, VFIO_IOMMU_PASID_REQUEST, &req);
> > +    if (pasid < 0) {
> > +        error_report("vfio_pci_device_request_pasid_alloc:"
> > +                     " request failed, container: %p", container);
> > +    }
> > +    rcu_read_unlock();
> > +    return pasid;
> > +}
> > +
> > +static int vfio_pci_device_request_pasid_free(PCIBus *bus,
> > +                                              int32_t devfn,
> > +                                              uint32_t pasid)
> > +{
> > +    PCIDevice *pdev = bus->devices[devfn];
> > +    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> > +    VFIOContainer *container = vdev->vbasedev.group->container;
> > +    struct vfio_iommu_type1_pasid_request req;
> > +    unsigned long argsz;
> > +    int ret = 0;
> > +
> > +    argsz = sizeof(req);
> > +    req.argsz = argsz;
> > +    req.flag = VFIO_IOMMU_PASID_FREE;
> > +    req.pasid = pasid;
> > +
> > +    rcu_read_lock();
> > +    ret = ioctl(container->fd, VFIO_IOMMU_PASID_REQUEST, &req);
> > +    if (ret != 0) {
> > +        error_report("vfio_pci_device_request_pasid_free:"
> > +                     " request failed, container: %p", container);
> > +    }
> > +    rcu_read_unlock();
> > +    return ret;
> > +}
> > +
> > +static PCIPASIDOps vfio_pci_pasid_ops = {
> > +    .alloc_pasid = vfio_pci_device_request_pasid_alloc,
> > +    .free_pasid = vfio_pci_device_request_pasid_free,
> > +};
> > +
> >  static void vfio_realize(PCIDevice *pdev, Error **errp)
> >  {
> >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > @@ -2991,6 +3050,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
> >      vfio_register_req_notifier(vdev);
> >      vfio_setup_resetfn_quirk(vdev);
> >
> > +    pci_setup_pasid_ops(pdev, &vfio_pci_pasid_ops);
> > +
> >      return;
> >
> >  out_teardown:
> 
> --
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


* Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation
  2019-07-16 10:25     ` Liu, Yi L
@ 2019-07-17  3:06       ` David Gibson
  2019-07-22  7:02         ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: David Gibson @ 2019-07-17  3:06 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: qemu-devel, mst, pbonzini, alex.williamson, peterx, eric.auger,
	Tian, Kevin, Tian, Jun J, Sun, Yi Y, kvm, Jacob Pan, Yi Sun

[-- Attachment #1: Type: text/plain, Size: 5641 bytes --]

On Tue, Jul 16, 2019 at 10:25:55AM +0000, Liu, Yi L wrote:
> > From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf
> > Of David Gibson
> > Sent: Monday, July 15, 2019 10:55 AM
> > To: Liu, Yi L <yi.l.liu@intel.com>
> > Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation
> > 
> > On Fri, Jul 05, 2019 at 07:01:38PM +0800, Liu Yi L wrote:
> > > This patch adds vfio implementation PCIPASIDOps.alloc_pasid/free_pasid().
> > > These two functions are used to propagate guest pasid allocation and
> > > free requests to host via vfio container ioctl.
> > 
> > As I said in an earlier comment, I think doing this on the device is
> > conceptually incorrect.  I think we need an explicit notion of an SVM
> > context (i.e. the namespace in which all the PASIDs live) - which will
> > IIUC usually be shared amongst multiple devices.  The create and free
> > PASID requests should be on that object.
> 
> Actually, the allocation is not done on this device. System wide, it is
> done on a container. So I'm not sure whether it is the API interface that
> gives you the sense that this is done on the device.

Sorry, I should have been clearer.  I can see that at the VFIO level
it is done on the container.  However the function here takes a bus
and devfn, so this qemu internal interface is per-device, which
doesn't really make sense.

> Also, curious on the SVM context
> concept, do you mean it a per-VM context or a per-SVM usage context?
> May you elaborate a little more. :-)

Sorry, I'm struggling to find a good term for this.  By "context" I
mean a namespace containing a bunch of PASID address spaces; those
PASIDs are then visible to some group of devices.

> 
> Thanks,
> Yi Liu
> 
> > >
> > > Cc: Kevin Tian <kevin.tian@intel.com>
> > > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > Cc: Peter Xu <peterx@redhat.com>
> > > Cc: Eric Auger <eric.auger@redhat.com>
> > > Cc: Yi Sun <yi.y.sun@linux.intel.com>
> > > Cc: David Gibson <david@gibson.dropbear.id.au>
> > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > > Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> > > ---
> > >  hw/vfio/pci.c | 61
> > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 61 insertions(+)
> > >
> > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > index ce3fe96..ab184ad 100644
> > > --- a/hw/vfio/pci.c
> > > +++ b/hw/vfio/pci.c
> > > @@ -2690,6 +2690,65 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice
> > *vdev)
> > >      vdev->req_enabled = false;
> > >  }
> > >
> > > +static int vfio_pci_device_request_pasid_alloc(PCIBus *bus,
> > > +                                               int32_t devfn,
> > > +                                               uint32_t min_pasid,
> > > +                                               uint32_t max_pasid)
> > > +{
> > > +    PCIDevice *pdev = bus->devices[devfn];
> > > +    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> > > +    VFIOContainer *container = vdev->vbasedev.group->container;
> > > +    struct vfio_iommu_type1_pasid_request req;
> > > +    unsigned long argsz;
> > > +    int pasid;
> > > +
> > > +    argsz = sizeof(req);
> > > +    req.argsz = argsz;
> > > +    req.flag = VFIO_IOMMU_PASID_ALLOC;
> > > +    req.min_pasid = min_pasid;
> > > +    req.max_pasid = max_pasid;
> > > +
> > > +    rcu_read_lock();
> > > +    pasid = ioctl(container->fd, VFIO_IOMMU_PASID_REQUEST, &req);
> > > +    if (pasid < 0) {
> > > +        error_report("vfio_pci_device_request_pasid_alloc:"
> > > +                     " request failed, container: %p", container);
> > > +    }
> > > +    rcu_read_unlock();
> > > +    return pasid;
> > > +}
> > > +
> > > +static int vfio_pci_device_request_pasid_free(PCIBus *bus,
> > > +                                              int32_t devfn,
> > > +                                              uint32_t pasid)
> > > +{
> > > +    PCIDevice *pdev = bus->devices[devfn];
> > > +    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> > > +    VFIOContainer *container = vdev->vbasedev.group->container;
> > > +    struct vfio_iommu_type1_pasid_request req;
> > > +    unsigned long argsz;
> > > +    int ret = 0;
> > > +
> > > +    argsz = sizeof(req);
> > > +    req.argsz = argsz;
> > > +    req.flag = VFIO_IOMMU_PASID_FREE;
> > > +    req.pasid = pasid;
> > > +
> > > +    rcu_read_lock();
> > > +    ret = ioctl(container->fd, VFIO_IOMMU_PASID_REQUEST, &req);
> > > +    if (ret != 0) {
> > > +        error_report("vfio_pci_device_request_pasid_free:"
> > > +                     " request failed, container: %p", container);
> > > +    }
> > > +    rcu_read_unlock();
> > > +    return ret;
> > > +}
> > > +
> > > +static PCIPASIDOps vfio_pci_pasid_ops = {
> > > +    .alloc_pasid = vfio_pci_device_request_pasid_alloc,
> > > +    .free_pasid = vfio_pci_device_request_pasid_free,
> > > +};
> > > +
> > >  static void vfio_realize(PCIDevice *pdev, Error **errp)
> > >  {
> > >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > > @@ -2991,6 +3050,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
> > >      vfio_register_req_notifier(vdev);
> > >      vfio_setup_resetfn_quirk(vdev);
> > >
> > > +    pci_setup_pasid_ops(pdev, &vfio_pci_pasid_ops);
> > > +
> > >      return;
> > >
> > >  out_teardown:
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation
  2019-07-17  3:06       ` David Gibson
@ 2019-07-22  7:02         ` Liu, Yi L
  2019-07-23  3:57           ` David Gibson
  0 siblings, 1 reply; 58+ messages in thread
From: Liu, Yi L @ 2019-07-22  7:02 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-devel, mst, pbonzini, alex.williamson, peterx, eric.auger,
	Tian, Kevin, Tian, Jun J, Sun, Yi Y, kvm, Jacob Pan, Yi Sun

> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf
> Of David Gibson
> Sent: Wednesday, July 17, 2019 11:07 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation
> 
> On Tue, Jul 16, 2019 at 10:25:55AM +0000, Liu, Yi L wrote:
> > > From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
> Behalf
> > > Of David Gibson
> > > Sent: Monday, July 15, 2019 10:55 AM
> > > To: Liu, Yi L <yi.l.liu@intel.com>
> > > Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation
> > >
> > > On Fri, Jul 05, 2019 at 07:01:38PM +0800, Liu Yi L wrote:
> > > > This patch adds vfio implementation PCIPASIDOps.alloc_pasid/free_pasid().
> > > > These two functions are used to propagate guest pasid allocation and
> > > > free requests to host via vfio container ioctl.
> > >
> > > As I said in an earlier comment, I think doing this on the device is
> > > conceptually incorrect.  I think we need an explicit notion of an SVM
> > > context (i.e. the namespace in which all the PASIDs live) - which will
> > > IIUC usually be shared amongst multiple devices.  The create and free
> > > PASID requests should be on that object.
> >
> > Actually, the allocation is not doing on this device. System wide, it is
> > done on a container. So not sure if it is the API interface gives you a
> > sense that this is done on device.
> 
> Sorry, I should have been clearer.  I can see that at the VFIO level
> it is done on the container.  However the function here takes a bus
> and devfn, so this qemu internal interface is per-device, which
> doesn't really make sense.

Got it. The bus and devfn are passed so that VFIO can figure out which
container to operate on. So far in QEMU, there is no good way to connect
the vIOMMU emulator and VFIO with regard to SVM. The hw/pci layer was
chosen based on some previous discussion. But yes, I agree with you that
we may need an explicit notion for SVM. Do you think it would be good to
introduce a new abstraction layer for SVM (perhaps named SVMContext)?
The idea would be that the vIOMMU maintains the SVMContext instances and
exposes an explicit interface for VFIO to get them. VFIO then registers
notifiers on the SVMContext. When the vIOMMU emulator wants to do a
PASID alloc/free, it fires the corresponding notifier. Once called, the
notifier function in VFIO figures out the container it is bound to. This
way, it is the vIOMMU emulator's duty to pick the proper notifier to
fire, and from an interface point of view the operation is no longer
per-device. It also leaves the PASID management details to the vIOMMU
emulator, as they can be vendor specific. Does that make sense? I'd also
like to know if you have any other ideas on it. That would surely be
helpful. :-)
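
A minimal sketch of that flow, under stated assumptions: none of the
names below (SVMContext, SVMNotifier, SVMEvent,
svm_context_register_notifier(), svm_context_notify()) exist in QEMU,
and FakeContainer with its counter only stands in for the VFIO container
and its ioctl so that the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event types fired by the vIOMMU emulator. */
typedef enum { SVM_PASID_ALLOC, SVM_PASID_FREE } SVMEventType;

typedef struct SVMEvent {
    SVMEventType type;
    uint32_t min_pasid;   /* used by SVM_PASID_ALLOC */
    uint32_t max_pasid;   /* used by SVM_PASID_ALLOC */
    uint32_t pasid;       /* used by SVM_PASID_FREE */
} SVMEvent;

typedef struct SVMNotifier SVMNotifier;
struct SVMNotifier {
    int (*notify)(SVMNotifier *n, SVMEvent *ev);
    void *opaque;         /* registrant's state, e.g. its container */
    SVMNotifier *next;
};

/* Hypothetical: maintained by the vIOMMU, not tied to any one device. */
typedef struct SVMContext {
    SVMNotifier *notifiers;
} SVMContext;

/* VFIO side: hook a notifier onto the context. */
void svm_context_register_notifier(SVMContext *ctx, SVMNotifier *n)
{
    assert(ctx != NULL && n != NULL);
    n->next = ctx->notifiers;
    ctx->notifiers = n;
}

/* vIOMMU side: fire the event; stop at the first negative result. */
int svm_context_notify(SVMContext *ctx, SVMEvent *ev)
{
    int ret = 0;

    for (SVMNotifier *n = ctx->notifiers; n != NULL; n = n->next) {
        ret = n->notify(n, ev);
        if (ret < 0) {
            break;
        }
    }
    return ret;
}

/* Stand-in for a VFIO container; the counter replaces the host ioctl. */
typedef struct FakeContainer {
    int fd;
    uint32_t next_pasid;
} FakeContainer;

/* The notifier finds its own container via opaque, not via bus/devfn. */
int fake_vfio_pasid_notify(SVMNotifier *n, SVMEvent *ev)
{
    FakeContainer *c = n->opaque;

    if (ev->type == SVM_PASID_ALLOC) {
        if (c->next_pasid < ev->min_pasid) {
            c->next_pasid = ev->min_pasid;
        }
        if (c->next_pasid > ev->max_pasid) {
            return -1;    /* range exhausted */
        }
        return (int)c->next_pasid++;
    }
    return 0;             /* SVM_PASID_FREE: nothing to undo in the toy */
}
```

In this sketch the caller never names a device: the vIOMMU picks the
context, and each registered notifier resolves its own container.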

> > Also, curious on the SVM context
> > concept, do you mean it a per-VM context or a per-SVM usage context?
> > May you elaborate a little more. :-)
> 
> Sorry, I'm struggling to find a good term for this.  By "context" I
> mean a namespace containing a bunch of PASID address spaces, those
> PASIDs are then visible to some group of devices.

I see. Maybe the SVMContext instance above can include multiple PASID
address spaces. And again, I think this relationship should be maintained
in the vIOMMU emulator.

Thanks,
Yi Liu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation
  2019-07-22  7:02         ` Liu, Yi L
@ 2019-07-23  3:57           ` David Gibson
  2019-07-24  4:57             ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: David Gibson @ 2019-07-23  3:57 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: qemu-devel, mst, pbonzini, alex.williamson, peterx, eric.auger,
	Tian, Kevin, Tian, Jun J, Sun, Yi Y, kvm, Jacob Pan, Yi Sun

[-- Attachment #1: Type: text/plain, Size: 8289 bytes --]

On Mon, Jul 22, 2019 at 07:02:51AM +0000, Liu, Yi L wrote:
> > From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf
> > Of David Gibson
> > Sent: Wednesday, July 17, 2019 11:07 AM
> > To: Liu, Yi L <yi.l.liu@intel.com>
> > Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation
> > 
> > On Tue, Jul 16, 2019 at 10:25:55AM +0000, Liu, Yi L wrote:
> > > > From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
> > Behalf
> > > > Of David Gibson
> > > > Sent: Monday, July 15, 2019 10:55 AM
> > > > To: Liu, Yi L <yi.l.liu@intel.com>
> > > > Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation
> > > >
> > > > On Fri, Jul 05, 2019 at 07:01:38PM +0800, Liu Yi L wrote:
> > > > > This patch adds vfio implementation PCIPASIDOps.alloc_pasid/free_pasid().
> > > > > These two functions are used to propagate guest pasid allocation and
> > > > > free requests to host via vfio container ioctl.
> > > >
> > > > As I said in an earlier comment, I think doing this on the device is
> > > > conceptually incorrect.  I think we need an explicit notion of an SVM
> > > > context (i.e. the namespace in which all the PASIDs live) - which will
> > > > IIUC usually be shared amongst multiple devices.  The create and free
> > > > PASID requests should be on that object.
> > >
> > > Actually, the allocation is not doing on this device. System wide, it is
> > > done on a container. So not sure if it is the API interface gives you a
> > > sense that this is done on device.
> > 
> > Sorry, I should have been clearer.  I can see that at the VFIO level
> > it is done on the container.  However the function here takes a bus
> > and devfn, so this qemu internal interface is per-device, which
> > doesn't really make sense.
> 
> Got it. The reason here is to pass the bus and devfn info, so that VFIO
> can figure out a container for the operation. So far in QEMU, there is
> no good way to connect the vIOMMU emulator and VFIO regards to
> SVM.

Right, and I think that's an indication that we're not modelling
something in qemu that we should be.

> hw/pci layer is a choice based on some previous discussion. But
> yes, I agree with you that we may need to have an explicit notion for
> SVM. Do you think it is good to introduce a new abstract layer for SVM
> (may name as SVMContext).

I think so, yes.

If nothing else, I expect we'll need this concept if we ever want to
be able to implement SVM for emulated devices (which could be useful
for debugging, even if it's not something you'd do in production).

> The idea would be that vIOMMU maintain
> the SVMContext instances and expose explicit interface for VFIO to get
> it. Then VFIO register notifiers on to the SVMContext. When vIOMMU
> emulator wants to do PASID alloc/free, it fires the corresponding
> notifier. After call into VFIO, the notifier function itself figure out the
> container it is bound. In this way, it's the duty of vIOMMU emulator to
> figure out a proper notifier to fire. From interface point of view, it is no
> longer per-device.

Exactly.

> Also, it leaves the PASID management details to
> vIOMMU emulator as it can be vendor specific. Does it make sense?
> Also, I'd like to know if you have any other idea on it. That would
> surely be helpful. :-)
> 
> > > Also, curious on the SVM context
> > > concept, do you mean it a per-VM context or a per-SVM usage context?
> > > May you elaborate a little more. :-)
> > 
> > Sorry, I'm struggling to find a good term for this.  By "context" I
> > mean a namespace containing a bunch of PASID address spaces, those
> > PASIDs are then visible to some group of devices.
> 
> I see. May be the SVMContext instance above can include multiple PASID
> address spaces. And again, I think this relationship should be maintained
> in vIOMMU emulator.
> 
> Thanks,
> Yi Liu

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation
  2019-07-23  3:57           ` David Gibson
@ 2019-07-24  4:57             ` Liu, Yi L
  2019-07-24  9:33               ` Auger Eric
  0 siblings, 1 reply; 58+ messages in thread
From: Liu, Yi L @ 2019-07-24  4:57 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-devel, mst, pbonzini, alex.williamson, peterx, eric.auger,
	Tian, Kevin, Tian, Jun J, Sun, Yi Y, kvm, Jacob Pan, Yi Sun

> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf
> Of David Gibson
> Sent: Tuesday, July 23, 2019 11:58 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation
> 
> On Mon, Jul 22, 2019 at 07:02:51AM +0000, Liu, Yi L wrote:
> > > From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org]
> > > On Behalf Of David Gibson
> > > Sent: Wednesday, July 17, 2019 11:07 AM
> > > To: Liu, Yi L <yi.l.liu@intel.com>
> > > Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free
> > > implementation
> > >
> > > On Tue, Jul 16, 2019 at 10:25:55AM +0000, Liu, Yi L wrote:
> > > > > From: kvm-owner@vger.kernel.org
> > > > > [mailto:kvm-owner@vger.kernel.org] On
> > > Behalf
> > > > > Of David Gibson
> > > > > Sent: Monday, July 15, 2019 10:55 AM
> > > > > To: Liu, Yi L <yi.l.liu@intel.com>
> > > > > Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free
> > > > > implementation
> > > > >
> > > > > On Fri, Jul 05, 2019 at 07:01:38PM +0800, Liu Yi L wrote:
> > > > > > This patch adds vfio implementation PCIPASIDOps.alloc_pasid/free_pasid().
> > > > > > These two functions are used to propagate guest pasid
> > > > > > allocation and free requests to host via vfio container ioctl.
> > > > >
> > > > > As I said in an earlier comment, I think doing this on the
> > > > > device is conceptually incorrect.  I think we need an explicit
> > > > > notion of an SVM context (i.e. the namespace in which all the
> > > > > PASIDs live) - which will IIUC usually be shared amongst
> > > > > multiple devices.  The create and free PASID requests should be on that object.
> > > >
> > > > Actually, the allocation is not doing on this device. System wide,
> > > > it is done on a container. So not sure if it is the API interface
> > > > gives you a sense that this is done on device.
> > >
> > > Sorry, I should have been clearer.  I can see that at the VFIO level
> > > it is done on the container.  However the function here takes a bus
> > > and devfn, so this qemu internal interface is per-device, which
> > > doesn't really make sense.
> >
> > Got it. The reason here is to pass the bus and devfn info, so that
> > VFIO can figure out a container for the operation. So far in QEMU,
> > there is no good way to connect the vIOMMU emulator and VFIO regards
> > to SVM.
> 
> Right, and I think that's an indication that we're not modelling something in qemu
> that we should be.
> 
> > hw/pci layer is a choice based on some previous discussion. But yes, I
> > agree with you that we may need to have an explicit notion for SVM. Do
> > you think it is good to introduce a new abstract layer for SVM (may
> > name as SVMContext).
> 
> I think so, yes.
> 
> If nothing else, I expect we'll need this concept if we ever want to be able to
> implement SVM for emulated devices (which could be useful for debugging, even if
> it's not something you'd do in production).
> 
> > The idea would be that vIOMMU maintain the SVMContext instances and
> > expose explicit interface for VFIO to get it. Then VFIO register
> > notifiers on to the SVMContext. When vIOMMU emulator wants to do PASID
> > alloc/free, it fires the corresponding notifier. After call into VFIO,
> > the notifier function itself figure out the container it is bound. In
> > this way, it's the duty of vIOMMU emulator to figure out a proper
> > notifier to fire. From interface point of view, it is no longer
> > per-device.
> 
> Exactly.

Cool, let me prepare another version with these ideas. Thanks for your
review. :-)

Regards,
Yi Liu

> > Also, it leaves the PASID management details to vIOMMU emulator as it
> > can be vendor specific. Does it make sense?
> > Also, I'd like to know if you have any other idea on it. That would
> > surely be helpful. :-)
> >
> > > > Also, curious on the SVM context
> > > > concept, do you mean it a per-VM context or a per-SVM usage context?
> > > > May you elaborate a little more. :-)
> > >
> > > Sorry, I'm struggling to find a good term for this.  By "context" I
> > > mean a namespace containing a bunch of PASID address spaces, those
> > > PASIDs are then visible to some group of devices.
> >
> > I see. May be the SVMContext instance above can include multiple PASID
> > address spaces. And again, I think this relationship should be
> > maintained in vIOMMU emulator.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation
  2019-07-24  4:57             ` Liu, Yi L
@ 2019-07-24  9:33               ` Auger Eric
  2019-07-25  3:40                 ` David Gibson
  2019-07-26  5:18                 ` Liu, Yi L
  0 siblings, 2 replies; 58+ messages in thread
From: Auger Eric @ 2019-07-24  9:33 UTC (permalink / raw)
  To: Liu, Yi L, David Gibson
  Cc: qemu-devel, mst, pbonzini, alex.williamson, peterx, Tian, Kevin,
	Tian, Jun J, Sun, Yi Y, kvm, Jacob Pan, Yi Sun

Hi Yi, David,

On 7/24/19 6:57 AM, Liu, Yi L wrote:
>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf
>> Of David Gibson
>> Sent: Tuesday, July 23, 2019 11:58 AM
>> To: Liu, Yi L <yi.l.liu@intel.com>
>> Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation
>>
>> On Mon, Jul 22, 2019 at 07:02:51AM +0000, Liu, Yi L wrote:
>>>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org]
>>>> On Behalf Of David Gibson
>>>> Sent: Wednesday, July 17, 2019 11:07 AM
>>>> To: Liu, Yi L <yi.l.liu@intel.com>
>>>> Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free
>>>> implementation
>>>>
>>>> On Tue, Jul 16, 2019 at 10:25:55AM +0000, Liu, Yi L wrote:
>>>>>> From: kvm-owner@vger.kernel.org
>>>>>> [mailto:kvm-owner@vger.kernel.org] On
>>>> Behalf
>>>>>> Of David Gibson
>>>>>> Sent: Monday, July 15, 2019 10:55 AM
>>>>>> To: Liu, Yi L <yi.l.liu@intel.com>
>>>>>> Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free
>>>>>> implementation
>>>>>>
>>>>>> On Fri, Jul 05, 2019 at 07:01:38PM +0800, Liu Yi L wrote:
>>>>>>> This patch adds vfio implementation PCIPASIDOps.alloc_pasid/free_pasid().
>>>>>>> These two functions are used to propagate guest pasid
>>>>>>> allocation and free requests to host via vfio container ioctl.
>>>>>>
>>>>>> As I said in an earlier comment, I think doing this on the
>>>>>> device is conceptually incorrect.  I think we need an explicit
>>>>>> notion of an SVM context (i.e. the namespace in which all the
>>>>>> PASIDs live) - which will IIUC usually be shared amongst
>>>>>> multiple devices.  The create and free PASID requests should be on that object.
>>>>>
>>>>> Actually, the allocation is not doing on this device. System wide,
>>>>> it is done on a container. So not sure if it is the API interface
>>>>> gives you a sense that this is done on device.
>>>>
>>>> Sorry, I should have been clearer.  I can see that at the VFIO level
>>>> it is done on the container.  However the function here takes a bus
>>>> and devfn, so this qemu internal interface is per-device, which
>>>> doesn't really make sense.
>>>
>>> Got it. The reason here is to pass the bus and devfn info, so that
>>> VFIO can figure out a container for the operation. So far in QEMU,
>>> there is no good way to connect the vIOMMU emulator and VFIO regards
>>> to SVM.
>>
>> Right, and I think that's an indication that we're not modelling something in qemu
>> that we should be.
>>
>>> hw/pci layer is a choice based on some previous discussion. But yes, I
>>> agree with you that we may need to have an explicit notion for SVM. Do
>>> you think it is good to introduce a new abstract layer for SVM (may
>>> name as SVMContext).
>>
>> I think so, yes.
>>
>> If nothing else, I expect we'll need this concept if we ever want to be able to
>> implement SVM for emulated devices (which could be useful for debugging, even if
>> it's not something you'd do in production).
>>
>>> The idea would be that vIOMMU maintain the SVMContext instances and
>>> expose explicit interface for VFIO to get it. Then VFIO register
>>> notifiers on to the SVMContext. When vIOMMU emulator wants to do PASID
>>> alloc/free, it fires the corresponding notifier. After call into VFIO,
>>> the notifier function itself figure out the container it is bound. In
>>> this way, it's the duty of vIOMMU emulator to figure out a proper
>>> notifier to fire. From interface point of view, it is no longer
>>> per-device.
>>
>> Exactly.
> 
> Cool, let me prepare another version with the ideas. Thanks for your
> review. :-)
> 
> Regards,
> Yi Liu
> 
>>> Also, it leaves the PASID management details to vIOMMU emulator as it
>>> can be vendor specific. Does it make sense?
>>> Also, I'd like to know if you have any other idea on it. That would
>>> surely be helpful. :-)
>>>
>>>>> Also, curious on the SVM context
>>>>> concept, do you mean it a per-VM context or a per-SVM usage context?
>>>>> May you elaborate a little more. :-)
>>>>
>>>> Sorry, I'm struggling to find a good term for this.  By "context" I
>>>> mean a namespace containing a bunch of PASID address spaces, those
>>>> PASIDs are then visible to some group of devices.
>>>
>>> I see. May be the SVMContext instance above can include multiple PASID
>>> address spaces. And again, I think this relationship should be
>>> maintained in vIOMMU emulator.
> 
So if I understand correctly, we are now heading towards introducing new
notifiers that take an "SVMContext" as argument instead of an
IOMMUMemoryRegion.

I think we need to be clear about how the two abstractions (SVMContext
and IOMMUMemoryRegion) differ. I would also need the "SVMContext"
abstraction for 2-stage SMMU integration (to notify stage 1 config
changes and MSI bindings), so I would need this new object not to be too
closely tied to the SVM use case.

Thanks

Eric


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation
  2019-07-24  9:33               ` Auger Eric
@ 2019-07-25  3:40                 ` David Gibson
  2019-07-26  5:18                 ` Liu, Yi L
  1 sibling, 0 replies; 58+ messages in thread
From: David Gibson @ 2019-07-25  3:40 UTC (permalink / raw)
  To: Auger Eric
  Cc: Liu, Yi L, qemu-devel, mst, pbonzini, alex.williamson, peterx,
	Tian, Kevin, Tian, Jun J, Sun, Yi Y, kvm, Jacob Pan, Yi Sun

[-- Attachment #1: Type: text/plain, Size: 5447 bytes --]

On Wed, Jul 24, 2019 at 11:33:06AM +0200, Auger Eric wrote:
> Hi Yi, David,
> 
> On 7/24/19 6:57 AM, Liu, Yi L wrote:
> >> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf
> >> Of David Gibson
> >> Sent: Tuesday, July 23, 2019 11:58 AM
> >> To: Liu, Yi L <yi.l.liu@intel.com>
> >> Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation
> >>
> >> On Mon, Jul 22, 2019 at 07:02:51AM +0000, Liu, Yi L wrote:
> >>>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org]
> >>>> On Behalf Of David Gibson
> >>>> Sent: Wednesday, July 17, 2019 11:07 AM
> >>>> To: Liu, Yi L <yi.l.liu@intel.com>
> >>>> Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free
> >>>> implementation
> >>>>
> >>>> On Tue, Jul 16, 2019 at 10:25:55AM +0000, Liu, Yi L wrote:
> >>>>>> From: kvm-owner@vger.kernel.org
> >>>>>> [mailto:kvm-owner@vger.kernel.org] On
> >>>> Behalf
> >>>>>> Of David Gibson
> >>>>>> Sent: Monday, July 15, 2019 10:55 AM
> >>>>>> To: Liu, Yi L <yi.l.liu@intel.com>
> >>>>>> Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free
> >>>>>> implementation
> >>>>>>
> >>>>>> On Fri, Jul 05, 2019 at 07:01:38PM +0800, Liu Yi L wrote:
> >>>>>>> This patch adds vfio implementation PCIPASIDOps.alloc_pasid/free_pasid().
> >>>>>>> These two functions are used to propagate guest pasid
> >>>>>>> allocation and free requests to host via vfio container ioctl.
> >>>>>>
> >>>>>> As I said in an earlier comment, I think doing this on the
> >>>>>> device is conceptually incorrect.  I think we need an explicit
> >>>>>> notion of an SVM context (i.e. the namespace in which all the
> >>>>>> PASIDs live) - which will IIUC usually be shared amongst
> >>>>>> multiple devices.  The create and free PASID requests should be on that object.
> >>>>>
> >>>>> Actually, the allocation is not doing on this device. System wide,
> >>>>> it is done on a container. So not sure if it is the API interface
> >>>>> gives you a sense that this is done on device.
> >>>>
> >>>> Sorry, I should have been clearer.  I can see that at the VFIO level
> >>>> it is done on the container.  However the function here takes a bus
> >>>> and devfn, so this qemu internal interface is per-device, which
> >>>> doesn't really make sense.
> >>>
> >>> Got it. The reason here is to pass the bus and devfn info, so that
> >>> VFIO can figure out a container for the operation. So far in QEMU,
> >>> there is no good way to connect the vIOMMU emulator and VFIO regards
> >>> to SVM.
> >>
> >> Right, and I think that's an indication that we're not modelling something in qemu
> >> that we should be.
> >>
> >>> hw/pci layer is a choice based on some previous discussion. But yes, I
> >>> agree with you that we may need to have an explicit notion for SVM. Do
> >>> you think it is good to introduce a new abstract layer for SVM (may
> >>> name as SVMContext).
> >>
> >> I think so, yes.
> >>
> >> If nothing else, I expect we'll need this concept if we ever want to be able to
> >> implement SVM for emulated devices (which could be useful for debugging, even if
> >> it's not something you'd do in production).
> >>
> >>> The idea would be that vIOMMU maintain the SVMContext instances and
> >>> expose explicit interface for VFIO to get it. Then VFIO register
> >>> notifiers on to the SVMContext. When vIOMMU emulator wants to do PASID
> >>> alloc/free, it fires the corresponding notifier. After call into VFIO,
> >>> the notifier function itself figure out the container it is bound. In
> >>> this way, it's the duty of vIOMMU emulator to figure out a proper
> >>> notifier to fire. From interface point of view, it is no longer
> >>> per-device.
> >>
> >> Exactly.
> > 
> > Cool, let me prepare another version with the ideas. Thanks for your
> > review. :-)
> > 
> > Regards,
> > Yi Liu
> > 
> >>> Also, it leaves the PASID management details to vIOMMU emulator as it
> >>> can be vendor specific. Does it make sense?
> >>> Also, I'd like to know if you have any other idea on it. That would
> >>> surely be helpful. :-)
> >>>
> >>>>> Also, curious on the SVM context
> >>>>> concept, do you mean it a per-VM context or a per-SVM usage context?
> >>>>> May you elaborate a little more. :-)
> >>>>
> >>>> Sorry, I'm struggling to find a good term for this.  By "context" I
> >>>> mean a namespace containing a bunch of PASID address spaces, those
> >>>> PASIDs are then visible to some group of devices.
> >>>
> >>> I see. May be the SVMContext instance above can include multiple PASID
> >>> address spaces. And again, I think this relationship should be
> >>> maintained in vIOMMU emulator.
> > 
> So if I understand we now head towards introducing new notifiers taking
> a "SVMContext" as argument instead of an IOMMUMemoryRegion.
> 
> I think we need to be clear about how both abstractions (SVMContext and
> IOMMUMemoryRegion) differ. I would also need "SVMContext" abstraction
> for 2stage SMMU integration (to notify stage 1 config changes and MSI
> bindings) so I would need this new object to be not too much tied to SVM
> use case.

That's my suggestion.  I don't really have any authority to decide.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* RE: [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation
  2019-07-24  9:33               ` Auger Eric
  2019-07-25  3:40                 ` David Gibson
@ 2019-07-26  5:18                 ` Liu, Yi L
  2019-08-02  7:36                   ` Auger Eric
  1 sibling, 1 reply; 58+ messages in thread
From: Liu, Yi L @ 2019-07-26  5:18 UTC (permalink / raw)
  To: Auger Eric, David Gibson
  Cc: qemu-devel, mst, pbonzini, alex.williamson, peterx, Tian, Kevin,
	Tian, Jun J, Sun, Yi Y, kvm, Jacob Pan, Yi Sun

Hi Eric,

> -----Original Message-----
> From: Auger Eric [mailto:eric.auger@redhat.com]
> Sent: Wednesday, July 24, 2019 5:33 PM
> To: Liu, Yi L <yi.l.liu@intel.com>; David Gibson <david@gibson.dropbear.id.au>
> Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation
> 
> Hi Yi, David,
> 
> On 7/24/19 6:57 AM, Liu, Yi L wrote:
> >> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
> >> Behalf Of David Gibson
> >> Sent: Tuesday, July 23, 2019 11:58 AM
> >> To: Liu, Yi L <yi.l.liu@intel.com>
> >> Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free
> >> implementation
> >>
> >> On Mon, Jul 22, 2019 at 07:02:51AM +0000, Liu, Yi L wrote:
> >>>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org]
> >>>> On Behalf Of David Gibson
> >>>> Sent: Wednesday, July 17, 2019 11:07 AM
> >>>> To: Liu, Yi L <yi.l.liu@intel.com>
> >>>> Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free
> >>>> implementation
> >>>>
> >>>> On Tue, Jul 16, 2019 at 10:25:55AM +0000, Liu, Yi L wrote:
> >>>>>> From: kvm-owner@vger.kernel.org
> >>>>>> [mailto:kvm-owner@vger.kernel.org] On
> >>>> Behalf
> >>>>>> Of David Gibson
> >>>>>> Sent: Monday, July 15, 2019 10:55 AM
> >>>>>> To: Liu, Yi L <yi.l.liu@intel.com>
> >>>>>> Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free
> >>>>>> implementation
> >>>>>>
> >>>>>> On Fri, Jul 05, 2019 at 07:01:38PM +0800, Liu Yi L wrote:
> >>>>>>> This patch adds vfio implementation PCIPASIDOps.alloc_pasid/free_pasid().
> >>>>>>> These two functions are used to propagate guest pasid allocation
> >>>>>>> and free requests to host via vfio container ioctl.
> >>>>>>
> >>>>>> As I said in an earlier comment, I think doing this on the device
> >>>>>> is conceptually incorrect.  I think we need an explicit notion of
> >>>>>> an SVM context (i.e. the namespace in which all the PASIDs live)
> >>>>>> - which will IIUC usually be shared amongst multiple devices.
> >>>>>> The create and free PASID requests should be on that object.
> >>>>>
> >>>>> Actually, the allocation is not doing on this device. System wide,
> >>>>> it is done on a container. So not sure if it is the API interface
> >>>>> gives you a sense that this is done on device.
> >>>>
> >>>> Sorry, I should have been clearer.  I can see that at the VFIO
> >>>> level it is done on the container.  However the function here takes
> >>>> a bus and devfn, so this qemu internal interface is per-device,
> >>>> which doesn't really make sense.
> >>>
> >>> Got it. The reason here is to pass the bus and devfn info, so that
> >>> VFIO can figure out a container for the operation. So far in QEMU,
> >>> there is no good way to connect the vIOMMU emulator and VFIO regards
> >>> to SVM.
> >>
> >> Right, and I think that's an indication that we're not modelling
> >> something in qemu that we should be.
> >>
> >>> hw/pci layer is a choice based on some previous discussion. But yes,
> >>> I agree with you that we may need to have an explicit notion for
> >>> SVM. Do you think it is good to introduce a new abstract layer for
> >>> SVM (may name as SVMContext).
> >>
> >> I think so, yes.
> >>
> >> If nothing else, I expect we'll need this concept if we ever want to
> >> be able to implement SVM for emulated devices (which could be useful
> >> for debugging, even if it's not something you'd do in production).
> >>
> >>> The idea would be that vIOMMU maintain the SVMContext instances and
> >>> expose explicit interface for VFIO to get it. Then VFIO register
> >>> notifiers on to the SVMContext. When vIOMMU emulator wants to do
> >>> PASID alloc/free, it fires the corresponding notifier. After call
> >>> into VFIO, the notifier function itself figure out the container it
> >>> is bound. In this way, it's the duty of vIOMMU emulator to figure
> >>> out a proper notifier to fire. From interface point of view, it is
> >>> no longer per-device.
> >>
> >> Exactly.
> >
> > Cool, let me prepare another version with the ideas. Thanks for your
> > review. :-)
> >
> > Regards,
> > Yi Liu
> >
> >>> Also, it leaves the PASID management details to vIOMMU emulator as
> >>> it can be vendor specific. Does it make sense?
> >>> Also, I'd like to know if you have any other idea on it. That would
> >>> surely be helpful. :-)
> >>>
> >>>>> Also, curious on the SVM context
> >>>>> concept, do you mean it a per-VM context or a per-SVM usage context?
> >>>>> May you elaborate a little more. :-)
> >>>>
> >>>> Sorry, I'm struggling to find a good term for this.  By "context" I
> >>>> mean a namespace containing a bunch of PASID address spaces, those
> >>>> PASIDs are then visible to some group of devices.
> >>>
> >>> I see. May be the SVMContext instance above can include multiple
> >>> PASID address spaces. And again, I think this relationship should be
> >>> maintained in vIOMMU emulator.
> >
> So if I understand we now head towards introducing new notifiers taking a
> "SVMContext" as argument instead of an IOMMUMemoryRegion.

yes, this is the rough idea.
 
> I think we need to be clear about how both abstractions (SVMContext and
> IOMMUMemoryRegion) differ. I would also need "SVMContext" abstraction for
> 2stage SMMU integration (to notify stage 1 config changes and MSI
> bindings) so I would need this new object to be not too much tied to SVM use case.

I agree. SVMContext is just a proposed name. We may find a better name for it,
as long as the thing we want is a new abstraction layer between VFIO and
vIOMMU. Per my understanding, the IOMMUMemoryRegion abstraction is for
notifications about guest memory changes, e.g. VFIO needs to be notified
when a MAP/UNMAP happens. SVMContext, however, aims to be an abstraction
for SVM/PASID related operations, which have no direct relationship with
memory, e.g. for VT-d: PASID allocation, PASID bind/unbind, and PASID-based
IOTLB flush. I think PASID bind/unbind and PASID-based IOTLB flush are
equivalent to the stage 1 config changes in SMMU. If you agree to use the
same abstraction for both, how about naming it IOMMUContext? Also, please
feel free to propose your own suggestion. :-)
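
To make the proposed split concrete, here is a minimal standalone sketch of
such a context object. It is not QEMU code: every name in it (IOMMUContext,
IOMMUCtxNotifier, iommu_ctx_register, iommu_ctx_fire, ToyContainer and the
toy_* handlers) is hypothetical and chosen only for illustration. The vIOMMU
side owns the context and fires events; the VFIO side registers one notifier
per event and resolves its own container from the notifier's opaque pointer,
so no bus/devfn appears in the interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event types, one per PASID operation named above. */
typedef enum {
    IOMMU_CTX_EVENT_PASID_ALLOC,
    IOMMU_CTX_EVENT_PASID_FREE,
    IOMMU_CTX_EVENT_PASID_BIND,
    IOMMU_CTX_EVENT_PASID_UNBIND,
    IOMMU_CTX_EVENT_PASID_IOTLB_FLUSH
} IOMMUCtxEvent;

typedef struct IOMMUCtxNotifier IOMMUCtxNotifier;
typedef int (*IOMMUCtxNotifyFn)(IOMMUCtxNotifier *n, uint32_t *pasid);

/* One registered handler; 'opaque' stands in for e.g. a VFIO container. */
struct IOMMUCtxNotifier {
    IOMMUCtxEvent event;
    IOMMUCtxNotifyFn notify;
    void *opaque;
    IOMMUCtxNotifier *next;
};

/* The context itself: owned by the vIOMMU emulator, not tied to a device. */
typedef struct IOMMUContext {
    IOMMUCtxNotifier *notifiers;
} IOMMUContext;

/* Called by the backend (e.g. VFIO) to subscribe to one event type. */
static void iommu_ctx_register(IOMMUContext *ctx, IOMMUCtxNotifier *n)
{
    n->next = ctx->notifiers;
    ctx->notifiers = n;
}

/*
 * Called by the vIOMMU emulator. Returns 0 if at least one notifier
 * handled the event, -1 if none is registered for it, or the first
 * non-zero error returned by a handler.
 */
static int iommu_ctx_fire(IOMMUContext *ctx, IOMMUCtxEvent ev, uint32_t *pasid)
{
    int handled = 0;

    for (IOMMUCtxNotifier *n = ctx->notifiers; n != NULL; n = n->next) {
        if (n->event != ev) {
            continue;
        }
        int ret = n->notify(n, pasid);
        if (ret != 0) {
            return ret;
        }
        handled = 1;
    }
    return handled ? 0 : -1;
}

/* Toy stand-in for a host-side container that hands out PASIDs. */
typedef struct ToyContainer {
    uint32_t next_pasid;
    unsigned freed;
} ToyContainer;

static int toy_pasid_alloc(IOMMUCtxNotifier *n, uint32_t *pasid)
{
    ToyContainer *c = n->opaque;

    *pasid = c->next_pasid++;   /* a real backend would issue an ioctl here */
    return 0;
}

static int toy_pasid_free(IOMMUCtxNotifier *n, uint32_t *pasid)
{
    ToyContainer *c = n->opaque;

    (void)pasid;                /* the toy only counts free requests */
    c->freed++;
    return 0;
}
```

A caller would create one IOMMUContext, register the alloc and free
notifiers with a ToyContainer as opaque, and then call iommu_ctx_fire() with
IOMMU_CTX_EVENT_PASID_ALLOC to obtain a PASID. MAP/UNMAP notifications are
deliberately absent from the sketch, since in this discussion they stay with
IOMMUMemoryRegion.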

Thanks,
Yi Liu

> Thanks
> 
> Eric



* Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation
  2019-07-26  5:18                 ` Liu, Yi L
@ 2019-08-02  7:36                   ` Auger Eric
  0 siblings, 0 replies; 58+ messages in thread
From: Auger Eric @ 2019-08-02  7:36 UTC (permalink / raw)
  To: Liu, Yi L, David Gibson
  Cc: qemu-devel, mst, pbonzini, alex.williamson, peterx, Tian, Kevin,
	Tian, Jun J, Sun, Yi Y, kvm, Jacob Pan, Yi Sun

Hi Yi,

On 7/26/19 7:18 AM, Liu, Yi L wrote:
> Hi Eric,
> 
>> -----Original Message-----
>> From: Auger Eric [mailto:eric.auger@redhat.com]
>> Sent: Wednesday, July 24, 2019 5:33 PM
>> To: Liu, Yi L <yi.l.liu@intel.com>; David Gibson <david@gibson.dropbear.id.au>
>> Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation
>>
>> Hi Yi, David,
>>
>> On 7/24/19 6:57 AM, Liu, Yi L wrote:
>>>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
>>>> Behalf Of David Gibson
>>>> Sent: Tuesday, July 23, 2019 11:58 AM
>>>> To: Liu, Yi L <yi.l.liu@intel.com>
>>>> Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free
>>>> implementation
>>>>
>>>> On Mon, Jul 22, 2019 at 07:02:51AM +0000, Liu, Yi L wrote:
>>>>>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org]
>>>>>> On Behalf Of David Gibson
>>>>>> Sent: Wednesday, July 17, 2019 11:07 AM
>>>>>> To: Liu, Yi L <yi.l.liu@intel.com>
>>>>>> Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free
>>>>>> implementation
>>>>>>
>>>>>> On Tue, Jul 16, 2019 at 10:25:55AM +0000, Liu, Yi L wrote:
>>>>>>>> From: kvm-owner@vger.kernel.org
>>>>>>>> [mailto:kvm-owner@vger.kernel.org] On
>>>>>> Behalf
>>>>>>>> Of David Gibson
>>>>>>>> Sent: Monday, July 15, 2019 10:55 AM
>>>>>>>> To: Liu, Yi L <yi.l.liu@intel.com>
>>>>>>>> Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free
>>>>>>>> implementation
>>>>>>>>
>>>>>>>> On Fri, Jul 05, 2019 at 07:01:38PM +0800, Liu Yi L wrote:
>>>>>>>>> This patch adds vfio implementation PCIPASIDOps.alloc_pasid/free_pasid().
>>>>>>>>> These two functions are used to propagate guest pasid allocation
>>>>>>>>> and free requests to host via vfio container ioctl.
>>>>>>>>
>>>>>>>> As I said in an earlier comment, I think doing this on the device
>>>>>>>> is conceptually incorrect.  I think we need an explicit notion of
>>>>>>>> an SVM context (i.e. the namespace in which all the PASIDs live)
>>>>>>>> - which will IIUC usually be shared amongst multiple devices.
>>>>>>>> The create and free PASID requests should be on that object.
>>>>>>>
>>>>>>> Actually, the allocation is not doing on this device. System wide,
>>>>>>> it is done on a container. So not sure if it is the API interface
>>>>>>> gives you a sense that this is done on device.
>>>>>>
>>>>>> Sorry, I should have been clearer.  I can see that at the VFIO
>>>>>> level it is done on the container.  However the function here takes
>>>>>> a bus and devfn, so this qemu internal interface is per-device,
>>>>>> which doesn't really make sense.
>>>>>
>>>>> Got it. The reason here is to pass the bus and devfn info, so that
>>>>> VFIO can figure out a container for the operation. So far in QEMU,
>>>>> there is no good way to connect the vIOMMU emulator and VFIO regards
>>>>> to SVM.
>>>>
>>>> Right, and I think that's an indication that we're not modelling
>>>> something in qemu that we should be.
>>>>
>>>>> hw/pci layer is a choice based on some previous discussion. But yes,
>>>>> I agree with you that we may need to have an explicit notion for
>>>>> SVM. Do you think it is good to introduce a new abstract layer for
>>>>> SVM (may name as SVMContext).
>>>>
>>>> I think so, yes.
>>>>
>>>> If nothing else, I expect we'll need this concept if we ever want to
>>>> be able to implement SVM for emulated devices (which could be useful
>>>> for debugging, even if it's not something you'd do in production).
>>>>
>>>>> The idea would be that vIOMMU maintain the SVMContext instances and
>>>>> expose explicit interface for VFIO to get it. Then VFIO register
>>>>> notifiers on to the SVMContext. When vIOMMU emulator wants to do
>>>>> PASID alloc/free, it fires the corresponding notifier. After call
>>>>> into VFIO, the notifier function itself figure out the container it
>>>>> is bound. In this way, it's the duty of vIOMMU emulator to figure
>>>>> out a proper notifier to fire. From interface point of view, it is
>>>>> no longer per-device.
>>>>
>>>> Exactly.
>>>
>>> Cool, let me prepare another version with the ideas. Thanks for your
>>> review. :-)
>>>
>>> Regards,
>>> Yi Liu
>>>
>>>>> Also, it leaves the PASID management details to vIOMMU emulator as
>>>>> it can be vendor specific. Does it make sense?
>>>>> Also, I'd like to know if you have any other idea on it. That would
>>>>> surely be helpful. :-)
>>>>>
>>>>>>> Also, curious on the SVM context
>>>>>>> concept, do you mean it a per-VM context or a per-SVM usage context?
>>>>>>> May you elaborate a little more. :-)
>>>>>>
>>>>>> Sorry, I'm struggling to find a good term for this.  By "context" I
>>>>>> mean a namespace containing a bunch of PASID address spaces, those
>>>>>> PASIDs are then visible to some group of devices.
>>>>>
>>>>> I see. May be the SVMContext instance above can include multiple
>>>>> PASID address spaces. And again, I think this relationship should be
>>>>> maintained in vIOMMU emulator.
>>>
>> So if I understand we now head towards introducing new notifiers taking a
>> "SVMContext" as argument instead of an IOMMUMemoryRegion.
> 
> yes, this is the rough idea.
>  
>> I think we need to be clear about how both abstractions (SVMContext and
>> IOMMUMemoryRegion) differ. I would also need "SVMContext" abstraction for
>> 2stage SMMU integration (to notify stage 1 config changes and MSI
>> bindings) so I would need this new object to be not too much tied to SVM use case.
> 
> I agree. SVMContext is just a proposed name. We may have better naming for it
> as long as the thing we want to have is a new abstract layer between VFIO and
> vIOMMU. Per my understanding, the IOMMUMemoryRegion abstraction is for
> the notifications around guest memory changes. e.g. VFIO needs to be notified
> when there is MAP/UNMAP happened. However, for the SVMContext, it aims to
> be an abstraction for SVM/PASID related operations, which has no direct
> relationship with memory. e.g. for VT-d, pasid allocation, pasid bind/unbind,
> pasid based-iotlb flush. I think pasid bind/unbind and pasid based-iotlb flush is
> equivalent with the stage 1 config changes in SMMU. If you agree to use it
> all the same, how about naming it as IOMMUContext? Also, pls feel free to
> propose your suggestion. :-)
Sorry for the delay. Yes, IOMMUContext sounds OK to me. Looking forward
to reading your next revision.

Thanks

Eric
> 
> Thanks,
> Yi Liu
> 
>> Thanks
>>
>> Eric
> 


Thread overview: 58+ messages
2019-07-05 11:01 [RFC v1 00/18] intel_iommu: expose Shared Virtual Addressing to VM Liu Yi L
2019-07-05 11:01 ` [RFC v1 01/18] linux-headers: import iommu.h from kernel Liu Yi L
2019-07-05 11:01 ` [RFC v1 02/18] linux-headers: import vfio.h " Liu Yi L
2019-07-09  1:58   ` Peter Xu
2019-07-09  8:37     ` Auger Eric
2019-07-10 12:31       ` Liu, Yi L
2019-07-10 12:29     ` Liu, Yi L
2019-07-05 11:01 ` [RFC v1 03/18] hw/pci: introduce PCIPASIDOps to PCIDevice Liu Yi L
2019-07-09  2:12   ` Peter Xu
2019-07-09 10:41     ` Auger Eric
2019-07-10 11:08     ` Liu, Yi L
2019-07-11  3:51       ` david
2019-07-11  7:13         ` Liu, Yi L
2019-07-05 11:01 ` [RFC v1 04/18] intel_iommu: add "sm_model" option Liu Yi L
2019-07-09  2:15   ` Peter Xu
2019-07-10 12:14     ` Liu, Yi L
2019-07-11  1:03       ` Peter Xu
2019-07-11  6:25         ` Liu, Yi L
2019-07-05 11:01 ` [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation Liu Yi L
2019-07-09  2:23   ` Peter Xu
2019-07-10 12:16     ` Liu, Yi L
2019-07-15  2:55   ` David Gibson
2019-07-16 10:25     ` Liu, Yi L
2019-07-17  3:06       ` David Gibson
2019-07-22  7:02         ` Liu, Yi L
2019-07-23  3:57           ` David Gibson
2019-07-24  4:57             ` Liu, Yi L
2019-07-24  9:33               ` Auger Eric
2019-07-25  3:40                 ` David Gibson
2019-07-26  5:18                 ` Liu, Yi L
2019-08-02  7:36                   ` Auger Eric
2019-07-05 11:01 ` [RFC v1 06/18] intel_iommu: support virtual command emulation and pasid request Liu Yi L
2019-07-09  3:19   ` Peter Xu
2019-07-10 11:51     ` Liu, Yi L
2019-07-11  1:13       ` Peter Xu
2019-07-11  6:59         ` Liu, Yi L
2019-07-05 11:01 ` [RFC v1 07/18] hw/pci: add pci_device_bind/unbind_gpasid Liu Yi L
2019-07-09  8:37   ` Auger Eric
2019-07-10 12:18     ` Liu, Yi L
2019-07-05 11:01 ` [RFC v1 08/18] vfio/pci: add vfio bind/unbind_gpasid implementation Liu Yi L
2019-07-09  8:37   ` Auger Eric
2019-07-10 12:30     ` Liu, Yi L
2019-07-05 11:01 ` [RFC v1 09/18] intel_iommu: process pasid cache invalidation Liu Yi L
2019-07-09  4:47   ` Peter Xu
2019-07-11  6:22     ` Liu, Yi L
2019-07-05 11:01 ` [RFC v1 10/18] intel_iommu: tag VTDAddressSpace instance with PASID Liu Yi L
2019-07-09  6:12   ` Peter Xu
2019-07-11  7:24     ` Liu, Yi L
2019-07-05 11:01 ` [RFC v1 11/18] intel_iommu: create VTDAddressSpace per BDF+PASID Liu Yi L
2019-07-09  6:39   ` Peter Xu
2019-07-11  8:13     ` Liu, Yi L
2019-07-05 11:01 ` [RFC v1 12/18] intel_iommu: bind/unbind guest page table to host Liu Yi L
2019-07-05 11:01 ` [RFC v1 13/18] intel_iommu: flush pasid cache after a DSI context cache flush Liu Yi L
2019-07-05 11:01 ` [RFC v1 14/18] hw/pci: add flush_pasid_iotlb() in PCIPASIDOps Liu Yi L
2019-07-05 11:01 ` [RFC v1 15/18] vfio/pci: adds support for PASID-based iotlb flush Liu Yi L
2019-07-05 11:01 ` [RFC v1 16/18] intel_iommu: add PASID-based iotlb invalidation support Liu Yi L
2019-07-05 11:01 ` [RFC v1 17/18] intel_iommu: propagate PASID-based iotlb flush to host Liu Yi L
2019-07-05 11:01 ` [RFC v1 18/18] intel_iommu: do not passdown pasid bind for PASID #0 Liu Yi L
