* [PATCH v8 00/10] Nested Shared Virtual Address (SVA) VT-d support
@ 2019-12-16 19:24 Jacob Pan
  2019-12-16 19:24 ` [PATCH v8 01/10] iommu/vt-d: Move domain helper to header Jacob Pan
                   ` (9 more replies)
  0 siblings, 10 replies; 27+ messages in thread
From: Jacob Pan @ 2019-12-16 19:24 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Lu Baolu, David Woodhouse
  Cc: Tian, Kevin, Raj Ashok, Yi Liu, Eric Auger, Jacob Pan

Shared virtual address (SVA), a.k.a. shared virtual memory (SVM), on Intel
platforms allows address space sharing between device DMA and applications.
SVA can reduce programming complexity and enhance security.
This series is intended to enable SVA virtualization, i.e. to enable use of
SVA within a guest user application.

Only the IOMMU portion of the changes is included in this series. Additional
support is needed in VFIO and QEMU (to be submitted separately) to complete
this functionality.

To keep changes incremental and reduce the size of each patchset, this
series does not include support for page request services.

In the VT-d implementation, the PASID table is per device and maintained in
the host. The guest PASID table is shadowed in the VMM, where the virtual
IOMMU is emulated.

    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest process CR3, FL only|
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush -
    '-------------'                       |
    |             |                       V
    |             |                CR3 in GPA
    '-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
    .-------------.  .----------------------.
    |   pIOMMU    |  | Bind FL for GVA-GPA  |
    |             |  '----------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.------------------------------.
    |             |   |SL for GPA-HPA, default domain|
    |             |   '------------------------------'
    '-------------'
Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables
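
To make the flow concrete, below is a rough sketch of the bind call a
VMM/VFIO path would issue when shadowing a guest PASID entry. The
iommu_sva_bind_gpasid() entry point and the struct layout come from the
already-merged common code referenced below; guest_cr3_gpa, host_pasid
and guest_pasid are placeholder values.

    struct iommu_gpasid_bind_data data = {
        .version    = IOMMU_GPASID_BIND_VERSION_1,
        .format     = IOMMU_PASID_FORMAT_INTEL_VTD,
        .flags      = IOMMU_SVA_GPASID_VAL,  /* guest PASID is valid */
        .gpgd       = guest_cr3_gpa,         /* FL table root, in GPA */
        .hpasid     = host_pasid,            /* PASID used by the pIOMMU */
        .gpasid     = guest_pasid,           /* PASID seen by the guest */
        .addr_width = 48,                    /* 4-level guest paging */
    };

    int ret = iommu_sva_bind_gpasid(domain, dev, &data);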

This is the remaining VT-d-only portion of v5, since the uAPIs and IOASID
common code have been applied to Joerg's IOMMU core branch.
(https://lkml.org/lkml/2019/10/2/833)

The complete set with VFIO patches is here:
https://github.com/jacobpan/linux.git:siov_sva

The complete nested SVA upstream patches are divided into three phases:
    1. Common APIs and PCI device direct assignment
    2. Page Request Services (PRS) support
    3. Mediated device assignment

With this set and the accompanying VFIO code, we will achieve phase #1.

Thanks,

Jacob

ChangeLog:
	- v8
	  - Extracted cleanup patches from V7 and accepted into maintainer's
	    tree (https://lkml.org/lkml/2019/12/2/514).
	  - Added IOASID notifier and VT-d handler for termination of PASID
	    IOMMU context upon free. This ensures that the VFIO IOASID free
	    API succeeds regardless of whether the PASID is in use.
	    (https://lore.kernel.org/linux-iommu/1571919983-3231-1-git-send-email-yi.l.liu@intel.com/)

	- V7
	  - Respect vIOMMU PASID range in virtual command PASID/IOASID allocator
	  - Cache virtual command capabilities to avoid runtime checks that
	    could cause vmexits.

	- V6
	  - Rebased on top of Joerg's core branch
	  (git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git core)
	  - Adapt to new uAPIs and IOASID allocators

	- V5
	  Rebased on v5.3-rc4 which has some of the IOMMU fault APIs merged.
 	  Addressed v4 review comments from Eric Auger, Baolu Lu, and
	    Jonathan Cameron. Specific changes are as follows:
	  - Refined custom IOASID allocator to support multiple vIOMMU, hotplug
	    cases.
	  - Extracted vendor data from IOMMU guest PASID bind data, for VT-d
	    will support all necessary guest PASID entry fields for PASID
	    bind.
	  - Support non-identity host-guest PASID mapping
	  - Exception handling in various cases

	- V4
	  - Redesigned IOASID allocator such that it can support custom
	  allocators with shared helper functions. Use separate XArray
	  to store IOASIDs per allocator. Took advice from Eric Auger to
	  have default allocator use the generic allocator structure.
	  Combined into one patch in that the default allocator is just
	  "another" allocator now. Can be built as a module in case of
	  driver use without IOMMU.
	  - Extended bind guest PASID data to support SMMU and non-identity
	  guest to host PASID mapping https://lkml.org/lkml/2019/5/21/802
	  - Rebased on Jean's sva/api common tree, new patches start with
	   [PATCH v4 10/22]

	- V3
	  - Addressed thorough review comments from Eric Auger (Thank you!)
	  - Moved IOASID allocator from driver core to IOMMU code per
	    suggestion by Christoph Hellwig
	    (https://lkml.org/lkml/2019/4/26/462)
	  - Rebased on top of Jean's SVA API branch and Eric's v7[1]
	    (git://linux-arm.org/linux-jpb.git sva/api)
	  - All IOMMU APIs are unmodified (except the new bind guest PASID
	    call in patch 9/16)

	- V2
	  - Rebased on Joerg's IOMMU x86/vt-d branch v5.1-rc4
	  - Integrated with Eric Auger's new v7 series for common APIs
	  (https://github.com/eauger/linux/tree/v5.1-rc3-2stage-v7)
	  - Addressed review comments from Andy Shevchenko and Alex Williamson on
	    IOASID custom allocator.
	  - Support multiple custom IOASID allocators (vIOMMUs) and dynamic
	    registration.

Jacob Pan (9):
  iommu/vt-d: Move domain helper to header
  iommu/vt-d: Add nested translation helper function
  iommu/vt-d: Add bind guest PASID support
  iommu/vt-d: Support flushing more translation cache types
  iommu/vt-d: Add svm/sva invalidate function
  iommu/vt-d: Cache virtual command capability register
  iommu/vt-d: Add custom allocator for IOASID
  iommu/ioasid: Add notifier for status change
  iommu/vt-d: Handle IOASID notifications

Lu Baolu (1):
  iommu/vt-d: Enlightened PASID allocation

 drivers/iommu/dmar.c        |  47 ++++++++
 drivers/iommu/intel-iommu.c | 255 ++++++++++++++++++++++++++++++++++++++++-
 drivers/iommu/intel-pasid.c | 273 +++++++++++++++++++++++++++++++++++++++++++-
 drivers/iommu/intel-pasid.h |  25 +++-
 drivers/iommu/intel-svm.c   | 266 ++++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/ioasid.c      |  43 +++++++
 include/linux/intel-iommu.h |  46 +++++++-
 include/linux/intel-svm.h   |  17 +++
 include/linux/ioasid.h      |  20 ++++
 include/uapi/linux/iommu.h  |   5 +-
 10 files changed, 983 insertions(+), 14 deletions(-)

-- 
2.7.4



* [PATCH v8 01/10] iommu/vt-d: Move domain helper to header
  2019-12-16 19:24 [PATCH v8 00/10] Nested Shared Virtual Address (SVA) VT-d support Jacob Pan
@ 2019-12-16 19:24 ` Jacob Pan
  2019-12-16 19:24 ` [PATCH v8 02/10] iommu/vt-d: Add nested translation helper function Jacob Pan
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 27+ messages in thread
From: Jacob Pan @ 2019-12-16 19:24 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Lu Baolu, David Woodhouse
  Cc: Tian, Kevin, Raj Ashok, Yi Liu, Eric Auger, Jacob Pan

Move domain helper to header to be used by SVA code.
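
For reference, a caller in the SVA code can then convert the generic
handle without any new plumbing, e.g. (sketch):

    struct dmar_domain *dmar_domain = to_dmar_domain(domain);
    /* dmar_domain->pgd provides the second level page tables */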

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/iommu/intel-iommu.c | 6 ------
 include/linux/intel-iommu.h | 6 ++++++
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index fdb688983fec..cc89791d807c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -428,12 +428,6 @@ static void init_translation_status(struct intel_iommu *iommu)
 		iommu->flags |= VTD_FLAG_TRANS_PRE_ENABLED;
 }
 
-/* Convert generic 'struct iommu_domain to private struct dmar_domain */
-static struct dmar_domain *to_dmar_domain(struct iommu_domain *dom)
-{
-	return container_of(dom, struct dmar_domain, domain);
-}
-
 static int __init intel_iommu_setup(char *str)
 {
 	if (!str)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index aaece25c055f..74b79e2e6a73 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -589,6 +589,12 @@ static inline void __iommu_flush_cache(
 		clflush_cache_range(addr, size);
 }
 
+/* Convert generic struct iommu_domain to private struct dmar_domain */
+static inline struct dmar_domain *to_dmar_domain(struct iommu_domain *dom)
+{
+	return container_of(dom, struct dmar_domain, domain);
+}
+
 /*
  * 0: readable
  * 1: writable
-- 
2.7.4



* [PATCH v8 02/10] iommu/vt-d: Add nested translation helper function
  2019-12-16 19:24 [PATCH v8 00/10] Nested Shared Virtual Address (SVA) VT-d support Jacob Pan
  2019-12-16 19:24 ` [PATCH v8 01/10] iommu/vt-d: Move domain helper to header Jacob Pan
@ 2019-12-16 19:24 ` Jacob Pan
  2019-12-18  2:01   ` Lu Baolu
  2019-12-18  2:41   ` Lu Baolu
  2019-12-16 19:24 ` [PATCH v8 03/10] iommu/vt-d: Add bind guest PASID support Jacob Pan
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 27+ messages in thread
From: Jacob Pan @ 2019-12-16 19:24 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Lu Baolu, David Woodhouse
  Cc: Tian, Kevin, Raj Ashok, Yi Liu, Eric Auger, Jacob Pan, Liu, Yi L

Nested translation mode is supported in the VT-d 3.0 spec, chapter 3.8.
With the PASID granular translation type (PGTT) set to 011b, the
translation result from the first level (FL) is also subject to a
second level (SL) page table translation. This mode is used for SVA
virtualization, where the FL performs guest virtual to guest physical
translation and the SL performs guest physical to host physical
translation.
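
For clarity, the PGTT encodings added to intel-pasid.h below map to the
spec's binary values as follows (the trailing comments are annotations,
not part of the patch):

    #define PASID_ENTRY_PGTT_FL_ONLY  (1)  /* 001b: first level only */
    #define PASID_ENTRY_PGTT_SL_ONLY  (2)  /* 010b: second level only */
    #define PASID_ENTRY_PGTT_NESTED   (3)  /* 011b: FL output walked through SL */
    #define PASID_ENTRY_PGTT_PT       (4)  /* 100b: pass-through */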

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 drivers/iommu/intel-pasid.c | 213 ++++++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/intel-pasid.h |  12 +++
 include/linux/intel-iommu.h |   3 +
 include/uapi/linux/iommu.h  |   5 +-
 4 files changed, 232 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 3cb569e76642..b178ad9e47ae 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -359,6 +359,76 @@ pasid_set_flpm(struct pasid_entry *pe, u64 value)
 	pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2);
 }
 
+/*
+ * Setup the Extended Memory Type (EMT) field (Bits 91-93)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_emt(struct pasid_entry *pe, u64 value)
+{
+	pasid_set_bits(&pe->val[1], GENMASK_ULL(29, 27), value << 27);
+}
+
+/*
+ * Setup the Page Attribute Table (PAT) field (Bits 96-127)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_pat(struct pasid_entry *pe, u64 value)
+{
+	pasid_set_bits(&pe->val[1], GENMASK_ULL(63, 32), value << 32);
+}
+
+/*
+ * Setup the Cache Disable (CD) field (Bit 89)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_cd(struct pasid_entry *pe)
+{
+	pasid_set_bits(&pe->val[1], 1 << 25, 1);
+}
+
+/*
+ * Setup the Extended Memory Type Enable (EMTE) field (Bit 90)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_emte(struct pasid_entry *pe)
+{
+	pasid_set_bits(&pe->val[1], 1 << 26, 1);
+}
+
+/*
+ * Setup the Extended Access Flag Enable (EAFE) field (Bit 135)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_eafe(struct pasid_entry *pe)
+{
+	pasid_set_bits(&pe->val[2], 1 << 7, 1);
+}
+
+/*
+ * Setup the Page-level Cache Disable (PCD) field (Bit 95)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_pcd(struct pasid_entry *pe)
+{
+	pasid_set_bits(&pe->val[1], 1 << 31, 1);
+}
+
+/*
+ * Setup the Page-level Write-Through (PWT) field (Bit 94)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_pwt(struct pasid_entry *pe)
+{
+	pasid_set_bits(&pe->val[1], 1 << 30, 1);
+}
+
 static void
 pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
 				    u16 did, int pasid)
@@ -599,3 +669,146 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 
 	return 0;
 }
+
+static int intel_pasid_setup_bind_data(struct intel_iommu *iommu,
+				struct pasid_entry *pte,
+				struct iommu_gpasid_bind_data_vtd *pasid_data)
+{
+	/*
+	 * Not all guest PASID table entry fields are passed down during bind,
+	 * here we only set up the ones that are dependent on guest settings.
+	 * Execution related bits such as NXE, SMEP are not meaningful to IOMMU,
+	 * therefore not set. Other fields, such as snoop related, are set based
+	 * on host needs regardless of guest settings.
+	 */
+	if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_SRE) {
+		if (!ecap_srs(iommu->ecap)) {
+			pr_err("No supervisor request support on %s\n",
+			       iommu->name);
+			return -EINVAL;
+		}
+		pasid_set_sre(pte);
+	}
+
+	if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_EAFE) {
+		if (!ecap_eafs(iommu->ecap)) {
+			pr_err("No extended access flag support on %s\n",
+				iommu->name);
+			return -EINVAL;
+		}
+		pasid_set_eafe(pte);
+	}
+
+	/*
+	 * Memory type is only applicable to devices inside processor coherent
+	 * domain. PCIe devices are not included. We can skip the rest of the
+	 * flags if IOMMU does not support MTS.
+	 */
+	if (ecap_mts(iommu->ecap)) {
+		if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_EMTE) {
+			pasid_set_emte(pte);
+			pasid_set_emt(pte, pasid_data->emt);
+		}
+		if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_PCD)
+			pasid_set_pcd(pte);
+		if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_PWT)
+			pasid_set_pwt(pte);
+		if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_CD)
+			pasid_set_cd(pte);
+		pasid_set_pat(pte, pasid_data->pat);
+	} else if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_EMT_MASK) {
+		pr_warn("No memory type support for bind guest PASID on %s\n",
+			iommu->name);
+		return -EINVAL;
+	}
+
+	return 0;
+
+}
+
+/**
+ * intel_pasid_setup_nested() - Set up PASID entry for nested translation
+ * which is used for vSVA. The first level page tables are used for
+ * GVA-GPA or GIOVA-GPA translation in the guest, second level page tables
+ * are used for GPA-HPA translation.
+ *
+ * @iommu:      Iommu which the device belong to
+ * @dev:        Device to be set up for translation
+ * @gpgd:       FLPTPTR: First Level Page translation pointer in GPA
+ * @pasid:      PASID to be programmed in the device PASID table
+ * @pasid_data: Additional PASID info from the guest bind request
+ * @domain:     Domain info for setting up second level page tables
+ * @addr_width: Address width of the first level (guest)
+ */
+int intel_pasid_setup_nested(struct intel_iommu *iommu,
+			struct device *dev, pgd_t *gpgd,
+			int pasid, struct iommu_gpasid_bind_data_vtd *pasid_data,
+			struct dmar_domain *domain,
+			int addr_width)
+{
+	struct pasid_entry *pte;
+	struct dma_pte *pgd;
+	u64 pgd_val;
+	int agaw;
+	u16 did;
+
+	if (!ecap_nest(iommu->ecap)) {
+		pr_err("IOMMU: %s: No nested translation support\n",
+		       iommu->name);
+		return -EINVAL;
+	}
+
+	pte = intel_pasid_get_entry(dev, pasid);
+	if (WARN_ON(!pte))
+		return -EINVAL;
+
+	pasid_clear_entry(pte);
+
+	/* Sanity checking is performed by the caller to make sure the
+	 * address width matches in two dimensions:
+	 * 1. CPU vs. IOMMU
+	 * 2. Guest vs. Host.
+	 */
+	switch (addr_width) {
+	case ADDR_WIDTH_5LEVEL:
+		pasid_set_flpm(pte, 1);
+		break;
+	case ADDR_WIDTH_4LEVEL:
+		pasid_set_flpm(pte, 0);
+		break;
+	default:
+		dev_err(dev, "Invalid paging mode %d\n", addr_width);
+		return -EINVAL;
+	}
+
+	pasid_set_flptr(pte, (u64)gpgd);
+
+	intel_pasid_setup_bind_data(iommu, pte, pasid_data);
+
+	/* Setup the second level based on the given domain */
+	pgd = domain->pgd;
+
+	for (agaw = domain->agaw; agaw != iommu->agaw; agaw--) {
+		pgd = phys_to_virt(dma_pte_addr(pgd));
+		if (!dma_pte_present(pgd)) {
+			pasid_clear_entry(pte);
+			dev_err(dev, "Invalid domain page table\n");
+			return -EINVAL;
+		}
+	}
+	pgd_val = virt_to_phys(pgd);
+	pasid_set_slptr(pte, pgd_val);
+	pasid_set_fault_enable(pte);
+
+	did = domain->iommu_did[iommu->seq_id];
+	pasid_set_domain_id(pte, did);
+
+	pasid_set_address_width(pte, agaw);
+	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
+
+	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
+	pasid_set_present(pte);
+	pasid_flush_caches(iommu, pte, pasid, did);
+
+	return 0;
+}
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index fc8cd8f17de1..95ed160b1947 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -36,6 +36,7 @@
  * to vmalloc or even module mappings.
  */
 #define PASID_FLAG_SUPERVISOR_MODE	BIT(0)
+#define PASID_FLAG_NESTED		BIT(1)
 
 struct pasid_dir_entry {
 	u64 val;
@@ -45,6 +46,11 @@ struct pasid_entry {
 	u64 val[8];
 };
 
+#define PASID_ENTRY_PGTT_FL_ONLY	(1)
+#define PASID_ENTRY_PGTT_SL_ONLY	(2)
+#define PASID_ENTRY_PGTT_NESTED		(3)
+#define PASID_ENTRY_PGTT_PT		(4)
+
 /* The representative of a PASID table */
 struct pasid_table {
 	void			*table;		/* pasid table pointer */
@@ -93,6 +99,12 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 				   struct dmar_domain *domain,
 				   struct device *dev, int pasid);
+int intel_pasid_setup_nested(struct intel_iommu *iommu,
+			struct device *dev, pgd_t *pgd,
+			int pasid,
+			struct iommu_gpasid_bind_data_vtd *pasid_data,
+			struct dmar_domain *domain,
+			int addr_width);
 void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
 				 struct device *dev, int pasid);
 
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 74b79e2e6a73..19bf9ff180ae 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -34,6 +34,9 @@
 #define VTD_STRIDE_SHIFT        (9)
 #define VTD_STRIDE_MASK         (((u64)-1) << VTD_STRIDE_SHIFT)
 
+#define ADDR_WIDTH_5LEVEL	(57)
+#define ADDR_WIDTH_4LEVEL	(48)
+
 #define DMA_PTE_READ (1)
 #define DMA_PTE_WRITE (2)
 #define DMA_PTE_LARGE_PAGE (1 << 7)
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 4ad3496e5c43..fcafb6401430 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -284,7 +284,10 @@ struct iommu_gpasid_bind_data_vtd {
 	__u32 pat;
 	__u32 emt;
 };
-
+#define IOMMU_SVA_VTD_GPASID_EMT_MASK	(IOMMU_SVA_VTD_GPASID_CD | \
+					 IOMMU_SVA_VTD_GPASID_EMTE | \
+					 IOMMU_SVA_VTD_GPASID_PCD |  \
+					 IOMMU_SVA_VTD_GPASID_PWT)
 /**
  * struct iommu_gpasid_bind_data - Information about device and guest PASID binding
  * @version:	Version of this data structure
-- 
2.7.4



* [PATCH v8 03/10] iommu/vt-d: Add bind guest PASID support
  2019-12-16 19:24 [PATCH v8 00/10] Nested Shared Virtual Address (SVA) VT-d support Jacob Pan
  2019-12-16 19:24 ` [PATCH v8 01/10] iommu/vt-d: Move domain helper to header Jacob Pan
  2019-12-16 19:24 ` [PATCH v8 02/10] iommu/vt-d: Add nested translation helper function Jacob Pan
@ 2019-12-16 19:24 ` Jacob Pan
  2019-12-18  3:14   ` Lu Baolu
  2019-12-16 19:24 ` [PATCH v8 04/10] iommu/vt-d: Support flushing more translation cache types Jacob Pan
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Jacob Pan @ 2019-12-16 19:24 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Lu Baolu, David Woodhouse
  Cc: Tian, Kevin, Raj Ashok, Yi Liu, Eric Auger, Jacob Pan, Liu, Yi L

When supporting guest SVA with an emulated IOMMU, the guest PASID
table is shadowed in the VMM. Updates to the guest vIOMMU PASID table
result in a PASID cache flush, which is passed down to
the host as bind guest PASID calls.

The SL page tables are harvested from the device's
default domain (request w/o PASID), or from the aux domain in case of
a mediated device.

    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest process CR3, FL only|
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush -
    '-------------'                       |
    |             |                       V
    |             |                CR3 in GPA
    '-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
    .-------------.  .----------------------.
    |   pIOMMU    |  | Bind FL for GVA-GPA  |
    |             |  '----------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.------------------------------.
    |             |   |SL for GPA-HPA, default domain|
    |             |   '------------------------------'
    '-------------'
Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables
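
A rough usage sketch follows (assuming iommu_get_domain_for_dev()
returns the default DMA domain that already provides the GPA-HPA
mapping for the assigned device, and bind_data is filled as in the
cover letter example):

    struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
    int ret;

    ret = iommu_sva_bind_gpasid(domain, dev, &bind_data);
    /* ... guest uses the device with PASID bind_data.hpasid ... */
    ret = iommu_sva_unbind_gpasid(domain, dev, bind_data.hpasid);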

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 drivers/iommu/intel-iommu.c |   4 +
 drivers/iommu/intel-svm.c   | 214 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/intel-iommu.h |   8 +-
 include/linux/intel-svm.h   |  17 ++++
 4 files changed, 242 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index cc89791d807c..304654dbc622 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5993,6 +5993,10 @@ const struct iommu_ops intel_iommu_ops = {
 	.dev_disable_feat	= intel_iommu_dev_disable_feat,
 	.is_attach_deferred	= intel_iommu_is_attach_deferred,
 	.pgsize_bitmap		= INTEL_IOMMU_PGSIZES,
+#ifdef CONFIG_INTEL_IOMMU_SVM
+	.sva_bind_gpasid	= intel_svm_bind_gpasid,
+	.sva_unbind_gpasid	= intel_svm_unbind_gpasid,
+#endif
 };
 
 static void quirk_iommu_igfx(struct pci_dev *dev)
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 0fcbe631cd5f..f580b7be63c5 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -230,6 +230,220 @@ static LIST_HEAD(global_svm_list);
 	list_for_each_entry((sdev), &(svm)->devs, list)	\
 		if ((d) != (sdev)->dev) {} else
 
+int intel_svm_bind_gpasid(struct iommu_domain *domain,
+			struct device *dev,
+			struct iommu_gpasid_bind_data *data)
+{
+	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
+	struct dmar_domain *ddomain;
+	struct intel_svm_dev *sdev;
+	struct intel_svm *svm;
+	int ret = 0;
+
+	if (WARN_ON(!iommu) || !data)
+		return -EINVAL;
+
+	if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
+	    data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
+		return -EINVAL;
+
+	if (dev_is_pci(dev)) {
+		/* VT-d supports devices with full 20 bit PASIDs only */
+		if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
+			return -EINVAL;
+	} else {
+		return -ENOTSUPP;
+	}
+
+	/*
+	 * We only check host PASID range, we have no knowledge to check
+	 * guest PASID range nor do we use the guest PASID.
+	 */
+	if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
+		return -EINVAL;
+
+	ddomain = to_dmar_domain(domain);
+
+	/* Sanity check paging mode support match between host and guest */
+	if (data->addr_width == ADDR_WIDTH_5LEVEL &&
+	    !cap_5lp_support(iommu->cap)) {
+		pr_err("Cannot support 5 level paging requested by guest!\n");
+		return -EINVAL;
+	}
+
+	mutex_lock(&pasid_mutex);
+	svm = ioasid_find(NULL, data->hpasid, NULL);
+	if (IS_ERR(svm)) {
+		ret = PTR_ERR(svm);
+		goto out;
+	}
+
+	if (svm) {
+		/*
+		 * If we found svm for the PASID, there must be at
+		 * least one device bound, otherwise svm should be freed.
+		 */
+		if (WARN_ON(list_empty(&svm->devs))) {
+			ret = -EINVAL;
+			goto out;
+		}
+
+		if (svm->mm == get_task_mm(current) &&
+		    data->hpasid == svm->pasid &&
+		    data->gpasid == svm->gpasid) {
+			pr_warn("Cannot bind the same guest-host PASID for the same process\n");
+			mmput(svm->mm);
+			return -EINVAL;
+		}
+
+		for_each_svm_dev(sdev, svm, dev) {
+			/* In case of multiple sub-devices of the same pdev
+			 * assigned, we should allow multiple bind calls with
+			 * the same PASID and pdev.
+			 */
+			sdev->users++;
+			goto out;
+		}
+	} else {
+		/* We come here when PASID has never been bound to a device. */
+		svm = kzalloc(sizeof(*svm), GFP_KERNEL);
+		if (!svm) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		/* REVISIT: upper layer/VFIO can track the host process that binds the PASID.
+		 * ioasid_set = mm might be sufficient for vfio to check pasid VMM
+		 * ownership.
+		 */
+		svm->mm = get_task_mm(current);
+		svm->pasid = data->hpasid;
+		if (data->flags & IOMMU_SVA_GPASID_VAL) {
+			svm->gpasid = data->gpasid;
+			svm->flags |= SVM_FLAG_GUEST_PASID;
+		}
+		ioasid_set_data(data->hpasid, svm);
+		INIT_LIST_HEAD_RCU(&svm->devs);
+		INIT_LIST_HEAD(&svm->list);
+
+		mmput(svm->mm);
+	}
+	sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
+	if (!sdev) {
+		if (list_empty(&svm->devs)) {
+			ioasid_set_data(data->hpasid, NULL);
+			kfree(svm);
+		}
+		ret = -ENOMEM;
+		goto out;
+	}
+	sdev->dev = dev;
+	sdev->users = 1;
+
+	/* Set up device context entry for PASID if not enabled already */
+	ret = intel_iommu_enable_pasid(iommu, sdev->dev);
+	if (ret) {
+		dev_err(dev, "Failed to enable PASID capability\n");
+		kfree(sdev);
+		goto out;
+	}
+
+	/*
+	 * For guest bind, we need to set up PASID table entry as follows:
+	 * - FLPM matches guest paging mode
+	 * - turn on nested mode
+	 * - SL guest address width matching
+	 */
+	ret = intel_pasid_setup_nested(iommu,
+				dev,
+				(pgd_t *)data->gpgd,
+				data->hpasid,
+				&data->vtd,
+				ddomain,
+				data->addr_width);
+	if (ret) {
+		dev_err(dev, "Failed to set up PASID %llu in nested mode, Err %d\n",
+			data->hpasid, ret);
+		/*
+		 * PASID entry should be in cleared state if nested mode
+		 * set up failed. So we only need to clear IOASID tracking
+		 * data such that free call will succeed.
+		 */
+		ioasid_set_data(data->hpasid, NULL);
+		kfree(sdev);
+		if (list_empty(&svm->devs))
+			kfree(svm);
+
+		goto out;
+	}
+	svm->flags |= SVM_FLAG_GUEST_MODE;
+
+	init_rcu_head(&sdev->rcu);
+	list_add_rcu(&sdev->list, &svm->devs);
+ out:
+	mutex_unlock(&pasid_mutex);
+	return ret;
+}
+
+int intel_svm_unbind_gpasid(struct device *dev, int pasid)
+{
+	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
+	struct intel_svm_dev *sdev;
+	struct intel_svm *svm;
+	int ret = -EINVAL;
+
+	if (WARN_ON(!iommu))
+		return -EINVAL;
+
+	mutex_lock(&pasid_mutex);
+	svm = ioasid_find(NULL, pasid, NULL);
+	if (!svm) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (IS_ERR(svm)) {
+		ret = PTR_ERR(svm);
+		goto out;
+	}
+
+	for_each_svm_dev(sdev, svm, dev) {
+		ret = 0;
+		sdev->users--;
+		if (!sdev->users) {
+			list_del_rcu(&sdev->list);
+			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
+			/* TODO: Drain in flight PRQ for the PASID since it
+			 * may get reused soon, we don't want to
+			 * confuse with its previous life.
+			 * intel_svm_drain_prq(dev, pasid);
+			 */
+			kfree_rcu(sdev, rcu);
+
+			if (list_empty(&svm->devs)) {
+				list_del(&svm->list);
+				/*
+				 * We do not free PASID here until explicit call
+				 * from VFIO to free. The PASID life cycle
+				 * management is largely tied to VFIO management
+				 * of assigned device life cycles. In case of
+				 * guest exit without an explicit free PASID call,
+				 * the responsibility lies in VFIO layer to free
+				 * the PASIDs allocated for the guest.
+				 * For security reasons, VFIO has to track the
+				 * PASID ownership per guest anyway to ensure
+				 * that PASID allocated by one guest cannot be
+				 * used by another.
+				 */
+				ioasid_set_data(pasid, NULL);
+				kfree(svm);
+			}
+		}
+		break;
+	}
+out:
+	mutex_unlock(&pasid_mutex);
+
+	return ret;
+}
+
 int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops)
 {
 	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 19bf9ff180ae..412a90cb1738 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -671,7 +671,9 @@ int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev);
 extern void intel_svm_check(struct intel_iommu *iommu);
 extern int intel_svm_enable_prq(struct intel_iommu *iommu);
 extern int intel_svm_finish_prq(struct intel_iommu *iommu);
-
+extern int intel_svm_bind_gpasid(struct iommu_domain *domain,
+		struct device *dev, struct iommu_gpasid_bind_data *data);
+extern int intel_svm_unbind_gpasid(struct device *dev, int pasid);
 struct svm_dev_ops;
 
 struct intel_svm_dev {
@@ -688,9 +690,13 @@ struct intel_svm_dev {
 struct intel_svm {
 	struct mmu_notifier notifier;
 	struct mm_struct *mm;
+
 	struct intel_iommu *iommu;
 	int flags;
 	int pasid;
+	int gpasid; /* Guest PASID in case of vSVA bind with non-identity host
+		     * to guest PASID mapping.
+		     */
 	struct list_head devs;
 	struct list_head list;
 };
diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
index 94f047a8a845..a2c189ad0b01 100644
--- a/include/linux/intel-svm.h
+++ b/include/linux/intel-svm.h
@@ -44,6 +44,23 @@ struct svm_dev_ops {
  * do such IOTLB flushes automatically.
  */
 #define SVM_FLAG_SUPERVISOR_MODE	(1<<1)
+/*
+ * The SVM_FLAG_GUEST_MODE flag is used when a guest process binds to a device.
+ * In this case the mm_struct is in the guest kernel or userspace, and its life
+ * cycle is managed by the VMM and VFIO layers. For the IOMMU driver, this API
+ * provides means to bind/unbind guest CR3 with PASIDs allocated for a device.
+ */
+#define SVM_FLAG_GUEST_MODE	(1<<2)
+/*
+ * The SVM_FLAG_GUEST_PASID flag is used when a guest has its own PASID space,
+ * which requires guest and host PASID translation in both directions. We keep
+ * track of guest PASID in order to provide lookup service to device drivers.
+ * One such example is a physical function (PF) driver that supports mediated
+ * device (mdev) assignment. Guest programming of mdev configuration space can
+ * only be done with guest PASID, therefore PF driver needs to find the matching
+ * host PASID to program the real hardware.
+ */
+#define SVM_FLAG_GUEST_PASID	(1<<3)
 
 #ifdef CONFIG_INTEL_IOMMU_SVM
 
-- 
2.7.4



* [PATCH v8 04/10] iommu/vt-d: Support flushing more translation cache types
  2019-12-16 19:24 [PATCH v8 00/10] Nested Shared Virtual Address (SVA) VT-d support Jacob Pan
                   ` (2 preceding siblings ...)
  2019-12-16 19:24 ` [PATCH v8 03/10] iommu/vt-d: Add bind guest PASID support Jacob Pan
@ 2019-12-16 19:24 ` Jacob Pan
  2019-12-19  2:46   ` Lu Baolu
  2019-12-16 19:24 ` [PATCH v8 05/10] iommu/vt-d: Add svm/sva invalidate function Jacob Pan
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Jacob Pan @ 2019-12-16 19:24 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Lu Baolu, David Woodhouse
  Cc: Tian, Kevin, Raj Ashok, Yi Liu, Eric Auger, Jacob Pan

When Shared Virtual Memory is exposed to a guest via vIOMMU, scalable
mode IOTLB invalidation may be passed down from outside the IOMMU
subsystem. This patch adds invalidation functions that can be used for
the additional translation cache types.
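
As an example of the new helpers (hypothetical values;
QI_GRAN_PSI_PASID is the page-selective-within-PASID granularity used
by the next patch in this series):

    /* Flush a 2MB range for one PASID: 2^9 4K pages, so size_order = 9 */
    qi_flush_iotlb_pasid(iommu, did, addr, pasid, 9, QI_GRAN_PSI_PASID, 0);

    /* Flush the matching entries in the device IOTLB via ATS */
    qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid, qdep, addr, 9,
                             QI_DEV_IOTLB_GRAN_PASID_SEL);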

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/dmar.c        | 46 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/intel-pasid.c |  3 ++-
 include/linux/intel-iommu.h | 21 +++++++++++++++++----
 3 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 3acfa6a25fa2..f2f5d75da94a 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1348,6 +1348,20 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
 	qi_submit_sync(&desc, iommu);
 }
 
+/* PASID-based IOTLB Invalidate */
+void qi_flush_iotlb_pasid(struct intel_iommu *iommu, u16 did, u64 addr, u32 pasid,
+		unsigned int size_order, u64 granu, int ih)
+{
+	struct qi_desc desc = {.qw2 = 0, .qw3 = 0};
+
+	desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
+		QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE;
+	desc.qw1 = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_IH(ih) |
+		QI_EIOTLB_AM(size_order);
+
+	qi_submit_sync(&desc, iommu);
+}
+
 void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 			u16 qdep, u64 addr, unsigned mask)
 {
@@ -1371,6 +1385,38 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 	qi_submit_sync(&desc, iommu);
 }
 
+/* PASID-based device IOTLB Invalidate */
+void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
+		u32 pasid,  u16 qdep, u64 addr, unsigned size_order, u64 granu)
+{
+	struct qi_desc desc = {.qw2 = 0, .qw3 = 0};
+
+	desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) |
+		QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
+		QI_DEV_IOTLB_PFSID(pfsid);
+	desc.qw1 = QI_DEV_EIOTLB_GLOB(granu);
+
+	/* If S bit is 0, we only flush a single page. If S bit is set,
+	 * The least significant zero bit indicates the invalidation address
+	 * range. VT-d spec 6.5.2.6.
+	 * e.g. address bit 12[0] indicates 8KB, 13[0] indicates 16KB.
+	 */
+	if (!size_order) {
+		desc.qw0 |= QI_DEV_EIOTLB_ADDR(addr) & ~QI_DEV_EIOTLB_SIZE;
+	} else {
+		unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size_order);
+		desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr & ~mask) | QI_DEV_EIOTLB_SIZE;
+	}
+	qi_submit_sync(&desc, iommu);
+}
+
+void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, int pasid)
+{
+	struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0};
+
+	desc.qw0 = QI_PC_PASID(pasid) | QI_PC_DID(did) | QI_PC_GRAN(granu) | QI_PC_TYPE;
+	qi_submit_sync(&desc, iommu);
+}
 /*
  * Disable Queued Invalidation interface.
  */
diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index b178ad9e47ae..10f8c7564118 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -435,7 +435,8 @@ pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
 {
 	struct qi_desc desc;
 
-	desc.qw0 = QI_PC_DID(did) | QI_PC_PASID_SEL | QI_PC_PASID(pasid);
+	desc.qw0 = QI_PC_DID(did) | QI_PC_GRAN(QI_PC_PASID_SEL) |
+		QI_PC_PASID(pasid) | QI_PC_TYPE;
 	desc.qw1 = 0;
 	desc.qw2 = 0;
 	desc.qw3 = 0;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 412a90cb1738..ee26989df008 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -329,7 +329,7 @@ enum {
 #define QI_IOTLB_GRAN(gran) 	(((u64)gran) >> (DMA_TLB_FLUSH_GRANU_OFFSET-4))
 #define QI_IOTLB_ADDR(addr)	(((u64)addr) & VTD_PAGE_MASK)
 #define QI_IOTLB_IH(ih)		(((u64)ih) << 6)
-#define QI_IOTLB_AM(am)		(((u8)am))
+#define QI_IOTLB_AM(am)		(((u8)am) & 0x3f)
 
 #define QI_CC_FM(fm)		(((u64)fm) << 48)
 #define QI_CC_SID(sid)		(((u64)sid) << 32)
@@ -348,16 +348,21 @@ enum {
 #define QI_PC_DID(did)		(((u64)did) << 16)
 #define QI_PC_GRAN(gran)	(((u64)gran) << 4)
 
-#define QI_PC_ALL_PASIDS	(QI_PC_TYPE | QI_PC_GRAN(0))
-#define QI_PC_PASID_SEL		(QI_PC_TYPE | QI_PC_GRAN(1))
+/* PASID cache invalidation granu */
+#define QI_PC_ALL_PASIDS	0
+#define QI_PC_PASID_SEL		1
 
 #define QI_EIOTLB_ADDR(addr)	((u64)(addr) & VTD_PAGE_MASK)
 #define QI_EIOTLB_IH(ih)	(((u64)ih) << 6)
-#define QI_EIOTLB_AM(am)	(((u64)am))
+#define QI_EIOTLB_AM(am)	(((u64)am) & 0x3f)
 #define QI_EIOTLB_PASID(pasid) 	(((u64)pasid) << 32)
 #define QI_EIOTLB_DID(did)	(((u64)did) << 16)
 #define QI_EIOTLB_GRAN(gran) 	(((u64)gran) << 4)
 
+/* QI Dev-IOTLB inv granu */
+#define QI_DEV_IOTLB_GRAN_ALL		1
+#define QI_DEV_IOTLB_GRAN_PASID_SEL	0
+
 #define QI_DEV_EIOTLB_ADDR(a)	((u64)(a) & VTD_PAGE_MASK)
 #define QI_DEV_EIOTLB_SIZE	(((u64)1) << 11)
 #define QI_DEV_EIOTLB_GLOB(g)	((u64)g)
@@ -653,8 +658,16 @@ extern void qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid,
 			     u8 fm, u64 type);
 extern void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
 			  unsigned int size_order, u64 type);
+extern void qi_flush_iotlb_pasid(struct intel_iommu *iommu, u16 did, u64 addr,
+			u32 pasid, unsigned int size_order, u64 type, int ih);
 extern void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 			u16 qdep, u64 addr, unsigned mask);
+
+extern void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
+			u32 pasid, u16 qdep, u64 addr, unsigned size_order, u64 granu);
+
+extern void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, int pasid);
+
 extern int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu);
 
 extern int dmar_ir_support(void);
-- 
2.7.4



* [PATCH v8 05/10] iommu/vt-d: Add svm/sva invalidate function
  2019-12-16 19:24 [PATCH v8 00/10] Nested Shared Virtual Address (SVA) VT-d support Jacob Pan
                   ` (3 preceding siblings ...)
  2019-12-16 19:24 ` [PATCH v8 04/10] iommu/vt-d: Support flushing more translation cache types Jacob Pan
@ 2019-12-16 19:24 ` Jacob Pan
  2019-12-16 19:24 ` [PATCH v8 06/10] iommu/vt-d: Cache virtual command capability register Jacob Pan
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 27+ messages in thread
From: Jacob Pan @ 2019-12-16 19:24 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Lu Baolu, David Woodhouse
  Cc: Tian, Kevin, Raj Ashok, Yi Liu, Eric Auger, Jacob Pan, Liu, Yi L

When Shared Virtual Address (SVA) is enabled for a guest OS via
vIOMMU, we need to provide invalidation support at the IOMMU API and
driver level. This patch adds an Intel VT-d specific function to
implement the IOMMU passdown invalidate API for shared virtual address.

The use case is to support caching structure invalidation
of assigned SVM capable devices. The emulated IOMMU exposes the queued
invalidation capability and passes down all descriptors from the guest
to the physical IOMMU.

The assumption is that the guest to host device ID mapping is
resolved prior to calling the IOMMU driver. Based on the device handle,
the host IOMMU driver can replace certain fields before submitting to
the invalidation queue.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 170 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 170 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 304654dbc622..e90102c7540d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5458,6 +5458,175 @@ static void intel_iommu_aux_detach_device(struct iommu_domain *domain,
 	aux_domain_remove_dev(to_dmar_domain(domain), dev);
 }
 
+/*
+ * 2D array for converting and sanitizing IOMMU generic TLB granularity to
+ * VT-d granularity. Invalidation is typically included in the unmap operation
+ * as a result of DMA or VFIO unmap. However, an assigned device's guest may
+ * own the first level page tables without them being shadowed by QEMU. In
+ * this case there is no pass down unmap to the host IOMMU as a result of unmap
+ * in the guest. Only invalidations are trapped and passed down.
+ * In all cases, only first level TLB invalidation (request with PASID) can be
+ * passed down, therefore we do not include IOTLB granularity for request
+ * without PASID (second level).
+ *
+ * For an example, to find the VT-d granularity encoding for IOTLB
+ * type and page selective granularity within PASID:
+ * X: indexed by iommu cache type
+ * Y: indexed by enum iommu_inv_granularity
+ * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
+ *
+ * Granu_map array indicates validity of the table. 1: valid, 0: invalid
+ *
+ */
+const static int inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
+	/* PASID based IOTLB, support PASID selective and page selective */
+	{0, 1, 1},
+	/* PASID based dev TLBs, only support all PASIDs or single PASID */
+	{1, 1, 0},
+	/* PASID cache */
+	{1, 1, 0}
+};
+
+const static u64 inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
+	/* PASID based IOTLB */
+	{0, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
+	/* PASID based dev TLBs */
+	{QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0},
+	/* PASID cache */
+	{QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, 0},
+};
+
+static inline int to_vtd_granularity(int type, int granu, u64 *vtd_granu)
+{
+	if (type >= IOMMU_CACHE_INV_TYPE_NR || granu >= IOMMU_INV_GRANU_NR ||
+		!inv_type_granu_map[type][granu])
+		return -EINVAL;
+
+	*vtd_granu = inv_type_granu_table[type][granu];
+
+	return 0;
+}
+
+static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules)
+{
+	u64 nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT;
+
+	/* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9 for 2MB, etc.
+	 * IOMMU cache invalidate API passes granu_size in bytes, and number of
+	 * granu size in contiguous memory.
+	 */
+	return order_base_2(nr_pages);
+}
+
+#ifdef CONFIG_INTEL_IOMMU_SVM
+static int intel_iommu_sva_invalidate(struct iommu_domain *domain,
+		struct device *dev, struct iommu_cache_invalidate_info *inv_info)
+{
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	struct device_domain_info *info;
+	struct intel_iommu *iommu;
+	unsigned long flags;
+	int cache_type;
+	u8 bus, devfn;
+	u16 did, sid;
+	int ret = 0;
+	u64 size;
+
+	if (!inv_info || !dmar_domain ||
+		inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+		return -EINVAL;
+
+	if (!dev || !dev_is_pci(dev))
+		return -ENODEV;
+
+	iommu = device_to_iommu(dev, &bus, &devfn);
+	if (!iommu)
+		return -ENODEV;
+
+	spin_lock_irqsave(&device_domain_lock, flags);
+	spin_lock(&iommu->lock);
+	info = iommu_support_dev_iotlb(dmar_domain, iommu, bus, devfn);
+	if (!info) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+	did = dmar_domain->iommu_did[iommu->seq_id];
+	sid = PCI_DEVID(bus, devfn);
+	size = to_vtd_size(inv_info->addr_info.granule_size, inv_info->addr_info.nb_granules);
+
+	for_each_set_bit(cache_type, (unsigned long *)&inv_info->cache, IOMMU_CACHE_INV_TYPE_NR) {
+		u64 granu = 0;
+		u64 pasid = 0;
+
+		ret = to_vtd_granularity(cache_type, inv_info->granularity, &granu);
+		if (ret) {
+			pr_err("Invalid cache type and granu combination %d/%d\n", cache_type,
+				inv_info->granularity);
+			break;
+		}
+
+		/* PASID is stored in different locations based on granularity */
+		if (inv_info->granularity == IOMMU_INV_GRANU_PASID)
+			pasid = inv_info->pasid_info.pasid;
+		else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
+			pasid = inv_info->addr_info.pasid;
+		else {
+			pr_err("Cannot find PASID for given cache type and granularity\n");
+			break;
+		}
+
+		switch (BIT(cache_type)) {
+		case IOMMU_CACHE_INV_TYPE_IOTLB:
+			if (size && (inv_info->addr_info.addr & ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
+				pr_err("Address out of range, 0x%llx, size order %llu\n",
+					inv_info->addr_info.addr, size);
+				ret = -ERANGE;
+				goto out_unlock;
+			}
+
+			qi_flush_iotlb_pasid(iommu, did, mm_to_dma_pfn(inv_info->addr_info.addr),
+					pasid, size, granu, inv_info->addr_info.flags & IOMMU_INV_ADDR_FLAGS_LEAF);
+
+			/*
+			 * Always flush device IOTLB if ATS is enabled since guest
+			 * vIOMMU exposes CM = 1, no device IOTLB flush will be passed
+			 * down.
+			 */
+			if (info->ats_enabled) {
+				qi_flush_dev_iotlb_pasid(iommu, sid, info->pfsid,
+						pasid, info->ats_qdep,
+						inv_info->addr_info.addr, size,
+						granu);
+			}
+			break;
+		case IOMMU_CACHE_INV_TYPE_DEV_IOTLB:
+			if (info->ats_enabled) {
+				qi_flush_dev_iotlb_pasid(iommu, sid, info->pfsid,
+						inv_info->addr_info.pasid, info->ats_qdep,
+						inv_info->addr_info.addr, size,
+						granu);
+			} else
+				pr_warn("Passdown device IOTLB flush w/o ATS!\n");
+
+			break;
+		case IOMMU_CACHE_INV_TYPE_PASID:
+			qi_flush_pasid_cache(iommu, did, granu, inv_info->pasid_info.pasid);
+
+			break;
+		default:
+			dev_err(dev, "Unsupported IOMMU invalidation type %d\n",
+				cache_type);
+			ret = -EINVAL;
+		}
+	}
+out_unlock:
+	spin_unlock(&iommu->lock);
+	spin_unlock_irqrestore(&device_domain_lock, flags);
+
+	return ret;
+}
+#endif
+
 static int intel_iommu_map(struct iommu_domain *domain,
 			   unsigned long iova, phys_addr_t hpa,
 			   size_t size, int iommu_prot, gfp_t gfp)
@@ -5994,6 +6163,7 @@ const struct iommu_ops intel_iommu_ops = {
 	.is_attach_deferred	= intel_iommu_is_attach_deferred,
 	.pgsize_bitmap		= INTEL_IOMMU_PGSIZES,
 #ifdef CONFIG_INTEL_IOMMU_SVM
+	.cache_invalidate	= intel_iommu_sva_invalidate,
 	.sva_bind_gpasid	= intel_svm_bind_gpasid,
 	.sva_unbind_gpasid	= intel_svm_unbind_gpasid,
 #endif
-- 
2.7.4



* [PATCH v8 06/10] iommu/vt-d: Cache virtual command capability register
  2019-12-16 19:24 [PATCH v8 00/10] Nested Shared Virtual Address (SVA) VT-d support Jacob Pan
                   ` (4 preceding siblings ...)
  2019-12-16 19:24 ` [PATCH v8 05/10] iommu/vt-d: Add svm/sva invalidate function Jacob Pan
@ 2019-12-16 19:24 ` Jacob Pan
  2019-12-18  3:25   ` Lu Baolu
  2019-12-16 19:24 ` [PATCH v8 07/10] iommu/vt-d: Enlightened PASID allocation Jacob Pan
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Jacob Pan @ 2019-12-16 19:24 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Lu Baolu, David Woodhouse
  Cc: Tian, Kevin, Raj Ashok, Yi Liu, Eric Auger, Jacob Pan

Virtual command registers are used in the guest only. To avoid vmexit
cost on repeated reads, we cache the capability register value during
initialization.
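
A later patch in this series then tests the cached copy instead of
re-reading the register, roughly:

    if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap))
        register_pasid_allocator(iommu);    /* added in patch 08 */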

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/dmar.c        | 1 +
 include/linux/intel-iommu.h | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index f2f5d75da94a..3f98dd9ad004 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -953,6 +953,7 @@ static int map_iommu(struct intel_iommu *iommu, u64 phys_addr)
 		warn_invalid_dmar(phys_addr, " returns all ones");
 		goto unmap;
 	}
+	iommu->vccap = dmar_readq(iommu->reg + DMAR_VCCAP_REG);
 
 	/* the registers might be more than one page */
 	map_size = max_t(int, ecap_max_iotlb_offset(iommu->ecap),
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index ee26989df008..4d25141ec3df 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -189,6 +189,9 @@
 #define ecap_max_handle_mask(e) ((e >> 20) & 0xf)
 #define ecap_sc_support(e)	((e >> 7) & 0x1) /* Snooping Control */
 
+/* Virtual command interface capabilities */
+#define vccap_pasid(v)		((v & DMA_VCS_PAS)) /* PASID allocation */
+
 /* IOTLB_REG */
 #define DMA_TLB_FLUSH_GRANU_OFFSET  60
 #define DMA_TLB_GLOBAL_FLUSH (((u64)1) << 60)
@@ -531,6 +534,7 @@ struct intel_iommu {
 	u64		reg_size; /* size of hw register set */
 	u64		cap;
 	u64		ecap;
+	u64		vccap;
 	u32		gcmd; /* Holds TE, EAFL. Don't need SRTP, SFL, WBF */
 	raw_spinlock_t	register_lock; /* protect register handling */
 	int		seq_id;	/* sequence id of the iommu */
-- 
2.7.4



* [PATCH v8 07/10] iommu/vt-d: Enlightened PASID allocation
  2019-12-16 19:24 [PATCH v8 00/10] Nested Shared Virtual Address (SVA) VT-d support Jacob Pan
                   ` (5 preceding siblings ...)
  2019-12-16 19:24 ` [PATCH v8 06/10] iommu/vt-d: Cache virtual command capability register Jacob Pan
@ 2019-12-16 19:24 ` Jacob Pan
  2019-12-16 19:24 ` [PATCH v8 08/10] iommu/vt-d: Add custom allocator for IOASID Jacob Pan
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 27+ messages in thread
From: Jacob Pan @ 2019-12-16 19:24 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Lu Baolu, David Woodhouse
  Cc: Tian, Kevin, Raj Ashok, Yi Liu, Eric Auger, Jacob Pan

From: Lu Baolu <baolu.lu@linux.intel.com>

Enabling IOMMU in a guest requires communication with the host
driver for certain aspects. Use of PASIDs to enable Shared Virtual
Addressing (SVA) requires managing PASIDs in the host. The VT-d 3.0
spec provides a Virtual Command Register (VCMD) to facilitate this.
Writes to this register in the guest are trapped by QEMU, which
proxies the call to the host driver.

This virtual command interface consists of a capability register,
a virtual command register, and a virtual response register. Refer
to sections 10.4.42, 10.4.43 and 10.4.44 for more information.

This patch adds the enlightened PASID allocation/free interfaces
via the virtual command interface.
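
Usage from the guest driver is a simple alloc/free pair; each register
write below traps to QEMU, which forwards it to the host (sketch):

    unsigned int pasid;

    ret = vcmd_alloc_pasid(iommu, &pasid);
    if (!ret) {
        /* ... program pasid into the device PASID table, use it ... */
        vcmd_free_pasid(iommu, pasid);
    }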

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/iommu/intel-pasid.c | 57 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/intel-pasid.h | 13 ++++++++++-
 include/linux/intel-iommu.h |  2 ++
 3 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 10f8c7564118..0a135a5937f3 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -27,6 +27,63 @@
 static DEFINE_SPINLOCK(pasid_lock);
 u32 intel_pasid_max_id = PASID_MAX;
 
+int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
+{
+	unsigned long flags;
+	u8 status_code;
+	int ret = 0;
+	u64 res;
+
+	raw_spin_lock_irqsave(&iommu->register_lock, flags);
+	dmar_writeq(iommu->reg + DMAR_VCMD_REG, VCMD_CMD_ALLOC);
+	IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
+		      !(res & VCMD_VRSP_IP), res);
+	raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
+
+	status_code = VCMD_VRSP_SC(res);
+	switch (status_code) {
+	case VCMD_VRSP_SC_SUCCESS:
+		*pasid = VCMD_VRSP_RESULT_PASID(res);
+		break;
+	case VCMD_VRSP_SC_NO_PASID_AVAIL:
+		pr_info("IOMMU: %s: No PASID available\n", iommu->name);
+		ret = -ENOSPC;
+		break;
+	default:
+		ret = -ENODEV;
+		pr_warn("IOMMU: %s: Unexpected error code %d\n",
+			iommu->name, status_code);
+	}
+
+	return ret;
+}
+
+void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid)
+{
+	unsigned long flags;
+	u8 status_code;
+	u64 res;
+
+	raw_spin_lock_irqsave(&iommu->register_lock, flags);
+	dmar_writeq(iommu->reg + DMAR_VCMD_REG,
+		    VCMD_CMD_OPERAND(pasid) | VCMD_CMD_FREE);
+	IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
+		      !(res & VCMD_VRSP_IP), res);
+	raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
+
+	status_code = VCMD_VRSP_SC(res);
+	switch (status_code) {
+	case VCMD_VRSP_SC_SUCCESS:
+		break;
+	case VCMD_VRSP_SC_INVALID_PASID:
+		pr_info("IOMMU: %s: Invalid PASID\n", iommu->name);
+		break;
+	default:
+		pr_warn("IOMMU: %s: Unexpected error code %d\n",
+			iommu->name, status_code);
+	}
+}
+
 /*
  * Per device pasid table management:
  */
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index 95ed160b1947..b4c8aece979f 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -23,6 +23,16 @@
 #define is_pasid_enabled(entry)		(((entry)->lo >> 3) & 0x1)
 #define get_pasid_dir_size(entry)	(1 << ((((entry)->lo >> 9) & 0x7) + 7))
 
+/* Virtual command interface for enlightened pasid management. */
+#define VCMD_CMD_ALLOC			0x1
+#define VCMD_CMD_FREE			0x2
+#define VCMD_VRSP_IP			0x1
+#define VCMD_VRSP_SC(e)			(((e) >> 1) & 0x3)
+#define VCMD_VRSP_SC_SUCCESS		0
+#define VCMD_VRSP_SC_NO_PASID_AVAIL	1
+#define VCMD_VRSP_SC_INVALID_PASID	1
+#define VCMD_VRSP_RESULT_PASID(e)	(((e) >> 8) & 0xfffff)
+#define VCMD_CMD_OPERAND(e)		((e) << 8)
 /*
  * Domain ID reserved for pasid entries programmed for first-level
  * only and pass-through transfer modes.
@@ -107,5 +117,6 @@ int intel_pasid_setup_nested(struct intel_iommu *iommu,
 			int addr_width);
 void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
 				 struct device *dev, int pasid);
-
+int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid);
+void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid);
 #endif /* __INTEL_PASID_H */
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 4d25141ec3df..1e11560b0e59 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -164,6 +164,7 @@
 #define ecap_smpwc(e)		(((e) >> 48) & 0x1)
 #define ecap_flts(e)		(((e) >> 47) & 0x1)
 #define ecap_slts(e)		(((e) >> 46) & 0x1)
+#define ecap_vcs(e)		(((e) >> 44) & 0x1)
 #define ecap_smts(e)		(((e) >> 43) & 0x1)
 #define ecap_dit(e)		((e >> 41) & 0x1)
 #define ecap_pasid(e)		((e >> 40) & 0x1)
@@ -285,6 +286,7 @@
 
 /* PRS_REG */
 #define DMA_PRS_PPR	((u32)1)
+#define DMA_VCS_PAS	((u64)1)
 
 #define IOMMU_WAIT_OP(iommu, offset, op, cond, sts)			\
 do {									\
-- 
2.7.4



* [PATCH v8 08/10] iommu/vt-d: Add custom allocator for IOASID
  2019-12-16 19:24 [PATCH v8 00/10] Nested Shared Virtual Address (SVA) VT-d support Jacob Pan
                   ` (6 preceding siblings ...)
  2019-12-16 19:24 ` [PATCH v8 07/10] iommu/vt-d: Enlightened PASID allocation Jacob Pan
@ 2019-12-16 19:24 ` Jacob Pan
  2019-12-18  4:10   ` Lu Baolu
  2019-12-16 19:24 ` [PATCH v8 09/10] iommu/ioasid: Add notifier for status change Jacob Pan
  2019-12-16 19:24 ` [PATCH v8 10/10] iommu/vt-d: Handle IOASID notifications Jacob Pan
  9 siblings, 1 reply; 27+ messages in thread
From: Jacob Pan @ 2019-12-16 19:24 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Lu Baolu, David Woodhouse
  Cc: Tian, Kevin, Raj Ashok, Yi Liu, Eric Auger, Jacob Pan, Liu

When the VT-d driver runs in the guest, PASID allocation must be
performed via the virtual command interface. This patch registers a
custom IOASID allocator which takes precedence over the default
XArray based allocator. The resulting IOASID allocation will always
come from the host. This ensures that the PASID namespace is
system-wide.
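
Once the custom allocator is registered, existing IOASID users need no
changes; a guest-side allocation like the sketch below now transparently
returns a host-provided PASID (NULL set used for illustration):

    ioasid_t pasid;

    pasid = ioasid_alloc(NULL, PASID_MIN, intel_pasid_max_id - 1, svm);
    if (pasid == INVALID_IOASID)
        return -ENOSPC;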

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu, Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 75 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/intel-iommu.h |  2 ++
 2 files changed, 77 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index e90102c7540d..b0c0bb6f740e 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1700,6 +1700,9 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
 		if (ecap_prs(iommu->ecap))
 			intel_svm_finish_prq(iommu);
 	}
+	if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap))
+		ioasid_unregister_allocator(&iommu->pasid_allocator);
+
 #endif
 }
 
@@ -3181,6 +3184,75 @@ static int copy_translation_tables(struct intel_iommu *iommu)
 	return ret;
 }
 
+#ifdef CONFIG_INTEL_IOMMU_SVM
+static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max, void *data)
+{
+	struct intel_iommu *iommu = data;
+	ioasid_t ioasid;
+
+	/*
+	 * VT-d virtual command interface always uses the full 20 bit
+	 * PASID range. Host can partition guest PASID range based on
+	 * policies but it is out of guest's control.
+	 */
+	if (min < PASID_MIN || max > intel_pasid_max_id)
+		return INVALID_IOASID;
+
+	if (vcmd_alloc_pasid(iommu, &ioasid))
+		return INVALID_IOASID;
+
+	return ioasid;
+}
+
+static void intel_ioasid_free(ioasid_t ioasid, void *data)
+{
+	struct intel_iommu *iommu = data;
+
+	if (!iommu)
+		return;
+	/*
+	 * Sanity checking of the ioasid owner is done at the upper layer,
+	 * e.g. VFIO. We can only free the PASID when all devices are unbound.
+	 */
+	if (ioasid_find(NULL, ioasid, NULL)) {
+		pr_alert("Cannot free active IOASID %d\n", ioasid);
+		return;
+	}
+	vcmd_free_pasid(iommu, ioasid);
+}
+
+static void register_pasid_allocator(struct intel_iommu *iommu)
+{
+	if (!intel_iommu_sm) {
+		pr_warn("VT-d scalable mode not enabled\n");
+		return;
+	}
+
+	/*
+	 * Register a custom PASID allocator if we are running in a guest;
+	 * guest PASIDs must be obtained via the virtual command interface.
+	 * There can be multiple vIOMMUs in each guest but only one allocator
+	 * is active. All vIOMMU allocators will eventually call the same
+	 * host allocator.
+	 */
+	if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap)) {
+		pr_info("Register custom PASID allocator\n");
+		iommu->pasid_allocator.alloc = intel_ioasid_alloc;
+		iommu->pasid_allocator.free = intel_ioasid_free;
+		iommu->pasid_allocator.pdata = (void *)iommu;
+		if (ioasid_register_allocator(&iommu->pasid_allocator)) {
+			pr_warn("Custom PASID allocator failed, scalable mode disabled\n");
+			/*
+			 * Disable scalable mode on this IOMMU if there
+			 * is no custom allocator. Mixing SM capable vIOMMU
+			 * and non-SM vIOMMU are not supported.
+			 */
+			intel_iommu_sm = 0;
+		}
+	}
+}
+#endif
+
 static int __init init_dmars(void)
 {
 	struct dmar_drhd_unit *drhd;
@@ -3298,6 +3370,9 @@ static int __init init_dmars(void)
 	 */
 	for_each_active_iommu(iommu, drhd) {
 		iommu_flush_write_buffer(iommu);
+#ifdef CONFIG_INTEL_IOMMU_SVM
+		register_pasid_allocator(iommu);
+#endif
 		iommu_set_root_entry(iommu);
 		iommu->flush.flush_context(iommu, 0, 0, 0, DMA_CCMD_GLOBAL_INVL);
 		iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH);
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 1e11560b0e59..8c30b23bd838 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -19,6 +19,7 @@
 #include <linux/iommu.h>
 #include <linux/io-64-nonatomic-lo-hi.h>
 #include <linux/dmar.h>
+#include <linux/ioasid.h>
 
 #include <asm/cacheflush.h>
 #include <asm/iommu.h>
@@ -557,6 +558,7 @@ struct intel_iommu {
 #ifdef CONFIG_INTEL_IOMMU_SVM
 	struct page_req_dsc *prq;
 	unsigned char prq_name[16];    /* Name for PRQ interrupt */
+	struct ioasid_allocator_ops pasid_allocator; /* Custom allocator for PASIDs */
 #endif
 	struct q_inval  *qi;            /* Queued invalidation info */
 	u32 *iommu_state; /* Store iommu states between suspend and resume.*/
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v8 09/10] iommu/ioasid: Add notifier for status change
  2019-12-16 19:24 [PATCH v8 00/10] Nested Shared Virtual Address (SVA) VT-d support Jacob Pan
                   ` (7 preceding siblings ...)
  2019-12-16 19:24 ` [PATCH v8 08/10] iommu/vt-d: Add custom allocator for IOASID Jacob Pan
@ 2019-12-16 19:24 ` Jacob Pan
  2019-12-16 19:24 ` [PATCH v8 10/10] iommu/vt-d: Handle IOASID notifications Jacob Pan
  9 siblings, 0 replies; 27+ messages in thread
From: Jacob Pan @ 2019-12-16 19:24 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Lu Baolu, David Woodhouse
  Cc: Tian, Kevin, Raj Ashok, Yi Liu, Eric Auger, Jacob Pan

IOASIDs are system resources that can be shared by multiple drivers or
subsystems. When the status of an IOASID changes at runtime, all
current users need to be notified so that proper actions can be taken.

For example, an IOASID can be used by the IOMMU subsystem for guest SVM
as well as by KVM. When the guest terminates unexpectedly, both KVM and
the IOMMU need to perform cleanup actions before the IOASID is
reclaimed.

This patch adds a per-IOASID notifier that can be registered by
interested parties.
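
A hedged usage sketch (the consumer below is hypothetical; only the
ioasid_add_notifier()/ioasid_remove_notifier() calls and the
IOASID_FREE event come from this patch):

	static int my_ioasid_event(struct notifier_block *nb,
				   unsigned long code, void *data)
	{
		ioasid_t ioasid = *(ioasid_t *)data;

		if (code == IOASID_FREE)
			pr_debug("IOASID %d is being freed\n", ioasid);
		return NOTIFY_OK;
	}

	static struct notifier_block my_nb = {
		.notifier_call	= my_ioasid_event,
	};

	/* after the IOASID has been allocated */
	ret = ioasid_add_notifier(ioasid, &my_nb);
	...
	ioasid_remove_notifier(ioasid, &my_nb);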

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/ioasid.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/ioasid.h | 20 ++++++++++++++++++++
 2 files changed, 63 insertions(+)

diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
index 0f8dd377aada..53a2ab287f7d 100644
--- a/drivers/iommu/ioasid.c
+++ b/drivers/iommu/ioasid.c
@@ -15,6 +15,7 @@ struct ioasid_data {
 	struct ioasid_set *set;
 	void *private;
 	struct rcu_head rcu;
+	struct atomic_notifier_head notifications;
 };
 
 /*
@@ -314,6 +315,7 @@ ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
 
 	data->set = set;
 	data->private = private;
+	ATOMIC_INIT_NOTIFIER_HEAD(&data->notifications);
 
 	/*
 	 * Custom allocator needs allocator data to perform platform specific
@@ -360,6 +362,9 @@ void ioasid_free(ioasid_t ioasid)
 		goto exit_unlock;
 	}
 
+	/* Notify all users that this IOASID is being freed */
+	atomic_notifier_call_chain(&ioasid_data->notifications, IOASID_FREE,
+				     &ioasid);
 	active_allocator->ops->free(ioasid, active_allocator->ops->pdata);
 	/* Custom allocator needs additional steps to free the xa element */
 	if (active_allocator->flags & IOASID_ALLOCATOR_CUSTOM) {
@@ -416,6 +421,44 @@ void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
 }
 EXPORT_SYMBOL_GPL(ioasid_find);
 
+int ioasid_add_notifier(ioasid_t ioasid, struct notifier_block *nb)
+{
+	struct ioasid_allocator_data *idata;
+	struct ioasid_data *data;
+	int ret = 0;
+
+	rcu_read_lock();
+	idata = rcu_dereference(active_allocator);
+	data = xa_load(&idata->xa, ioasid);
+	if (!data) {
+		ret = -ENOENT;
+		goto unlock;
+	}
+	ret = atomic_notifier_chain_register(&data->notifications, nb);
+unlock:
+	rcu_read_unlock();
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ioasid_add_notifier);
+
+void ioasid_remove_notifier(ioasid_t ioasid, struct notifier_block *nb)
+{
+	struct ioasid_allocator_data *idata;
+	struct ioasid_data *data;
+
+	rcu_read_lock();
+	idata = rcu_dereference(active_allocator);
+	data = xa_load(&idata->xa, ioasid);
+	rcu_read_unlock();
+	if (!data) {
+		pr_err("IOASID %d not found\n", ioasid);
+		return;
+	}
+	/* Unregister can sleep, called outside RCU critical section. */
+	atomic_notifier_chain_unregister(&data->notifications, nb);
+}
+EXPORT_SYMBOL_GPL(ioasid_remove_notifier);
+
 MODULE_AUTHOR("Jean-Philippe Brucker <jean-philippe.brucker@arm.com>");
 MODULE_AUTHOR("Jacob Pan <jacob.jun.pan@linux.intel.com>");
 MODULE_DESCRIPTION("IO Address Space ID (IOASID) allocator");
diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
index 6f000d7a0ddc..4517c4be4088 100644
--- a/include/linux/ioasid.h
+++ b/include/linux/ioasid.h
@@ -4,6 +4,7 @@
 
 #include <linux/types.h>
 #include <linux/errno.h>
+#include <linux/notifier.h>
 
 #define INVALID_IOASID ((ioasid_t)-1)
 typedef unsigned int ioasid_t;
@@ -29,6 +30,12 @@ struct ioasid_allocator_ops {
 	void *pdata;
 };
 
+/* Notification data when IOASID status changed */
+enum ioasid_notify_val {
+	IOASID_FREE = 1,
+	IOASID_SUSPEND,
+};
+
 #define DECLARE_IOASID_SET(name) struct ioasid_set name = { 0 }
 
 #if IS_ENABLED(CONFIG_IOASID)
@@ -40,6 +47,8 @@ void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
 int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
 void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
 int ioasid_set_data(ioasid_t ioasid, void *data);
+int ioasid_add_notifier(ioasid_t ioasid, struct notifier_block *nb);
+void ioasid_remove_notifier(ioasid_t ioasid, struct notifier_block *nb);
 
 #else /* !CONFIG_IOASID */
 static inline ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
@@ -58,6 +67,17 @@ static inline void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
 	return NULL;
 }
 
+static inline int ioasid_add_notifier(ioasid_t ioasid,
+				      struct notifier_block *nb)
+{
+	return -ENOTSUPP;
+}
+
+static inline void ioasid_remove_notifier(ioasid_t ioasid,
+				       struct notifier_block *nb)
+{
+}
+
 static inline int ioasid_register_allocator(struct ioasid_allocator_ops *allocator)
 {
 	return -ENOTSUPP;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v8 10/10] iommu/vt-d: Handle IOASID notifications
  2019-12-16 19:24 [PATCH v8 00/10] Nested Shared Virtual Address (SVA) VT-d support Jacob Pan
                   ` (8 preceding siblings ...)
  2019-12-16 19:24 ` [PATCH v8 09/10] iommu/ioasid: Add notifier for status change Jacob Pan
@ 2019-12-16 19:24 ` Jacob Pan
  9 siblings, 0 replies; 27+ messages in thread
From: Jacob Pan @ 2019-12-16 19:24 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Lu Baolu, David Woodhouse
  Cc: Tian, Kevin, Raj Ashok, Yi Liu, Eric Auger, Jacob Pan

IOASIDs/PASIDs are shared system resources that can be freed by
software components outside the IOMMU subsystem. When the status of an
IOASID changes, e.g. it is freed or suspended, notifications are made
available to its users so that proper action can be taken.

This patch adds a notifier block such that when an IOASID is freed by
another component such as VFIO, the associated software and hardware
contexts can be cleaned up.
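
The resulting free path looks roughly like this (a sketch of the call
flow, not literal code from this series):

	/*
	 * VFIO:        ioasid_free(pasid)
	 * ioasid core: atomic_notifier_call_chain(..., IOASID_FREE, &pasid)
	 * intel-svm:   ioasid_status_change() unbinds the devices, tears
	 *              down the PASID table entries and frees the svm data
	 */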

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel-svm.c   | 52 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/intel-iommu.h |  2 +-
 2 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index f580b7be63c5..a660e741551c 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -230,6 +230,48 @@ static LIST_HEAD(global_svm_list);
 	list_for_each_entry((sdev), &(svm)->devs, list)	\
 		if ((d) != (sdev)->dev) {} else
 
+static int ioasid_status_change(struct notifier_block *nb,
+				unsigned long code, void *data)
+{
+	ioasid_t ioasid = *(ioasid_t *)data;
+	struct intel_svm_dev *sdev;
+	struct intel_svm *svm;
+
+	if (code == IOASID_FREE) {
+		/*
+		 * Unbind all devices associated with this PASID which is
+		 * being freed by other users such as VFIO.
+		 */
+		svm = ioasid_find(NULL, ioasid, NULL);
+		if (IS_ERR(svm))
+			return NOTIFY_BAD;
+
+		if (!svm || !svm->iommu)
+			return NOTIFY_DONE;
+
+		list_for_each_entry(sdev, &svm->devs, list) {
+			list_del_rcu(&sdev->list);
+			intel_pasid_tear_down_entry(svm->iommu, sdev->dev,
+						    svm->pasid);
+			kfree_rcu(sdev, rcu);
+
+			if (list_empty(&svm->devs)) {
+				list_del(&svm->list);
+				ioasid_set_data(ioasid, NULL);
+				kfree(svm);
+			}
+		}
+
+		return NOTIFY_OK;
+	}
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block svm_ioasid_notifier = {
+		.notifier_call = ioasid_status_change,
+};
+
 int intel_svm_bind_gpasid(struct iommu_domain *domain,
 			struct device *dev,
 			struct iommu_gpasid_bind_data *data)
@@ -319,6 +361,13 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain,
 			svm->gpasid = data->gpasid;
 			svm->flags |= SVM_FLAG_GUEST_PASID;
 		}
+		/* Get notified when IOASID is freed by others, e.g. VFIO */
+		ret = ioasid_add_notifier(data->hpasid, &svm_ioasid_notifier);
+		if (ret) {
+			mmput(svm->mm);
+			kfree(svm);
+			goto out;
+		}
 		ioasid_set_data(data->hpasid, svm);
 		INIT_LIST_HEAD_RCU(&svm->devs);
 		INIT_LIST_HEAD(&svm->list);
@@ -432,6 +481,9 @@ int intel_svm_unbind_gpasid(struct device *dev, int pasid)
 				 * that PASID allocated by one guest cannot be
 				 * used by another.
 				 */
+				ioasid_remove_notifier(pasid,
+						       &svm_ioasid_notifier);
+
 				ioasid_set_data(pasid, NULL);
 				kfree(svm);
 			}
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 8c30b23bd838..e2a33c794e8d 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -711,7 +711,7 @@ struct intel_svm_dev {
 struct intel_svm {
 	struct mmu_notifier notifier;
 	struct mm_struct *mm;
-
+	struct notifier_block *nb;
 	struct intel_iommu *iommu;
 	int flags;
 	int pasid;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v8 02/10] iommu/vt-d: Add nested translation helper function
  2019-12-16 19:24 ` [PATCH v8 02/10] iommu/vt-d: Add nested translation helper function Jacob Pan
@ 2019-12-18  2:01   ` Lu Baolu
  2020-01-09 17:51     ` Jacob Pan
  2019-12-18  2:41   ` Lu Baolu
  1 sibling, 1 reply; 27+ messages in thread
From: Lu Baolu @ 2019-12-18  2:01 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, David Woodhouse
  Cc: baolu.lu, Tian, Kevin, Raj Ashok, Yi Liu, Eric Auger

Hi Jacob,

On 12/17/19 3:24 AM, Jacob Pan wrote:
> Nested translation mode is supported in VT-d 3.0 Spec.CH 3.8.
> With PASID granular translation type set to 0x11b, translation
> result from the first level(FL) also subject to a second level(SL)
> page table translation. This mode is used for SVA virtualization,
> where FL performs guest virtual to guest physical translation and
> SL performs guest physical to host physical translation.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> ---
>   drivers/iommu/intel-pasid.c | 213 ++++++++++++++++++++++++++++++++++++++++++++
>   drivers/iommu/intel-pasid.h |  12 +++
>   include/linux/intel-iommu.h |   3 +
>   include/uapi/linux/iommu.h  |   5 +-
>   4 files changed, 232 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index 3cb569e76642..b178ad9e47ae 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -359,6 +359,76 @@ pasid_set_flpm(struct pasid_entry *pe, u64 value)
>   	pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2);
>   }
>   
> +/*
> + * Setup the Extended Memory Type(EMT) field (Bits 91-93)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_emt(struct pasid_entry *pe, u64 value)
> +{
> +	pasid_set_bits(&pe->val[1], GENMASK_ULL(29, 27), value << 27);
> +}
> +
> +/*
> + * Setup the Page Attribute Table (PAT) field (Bits 96-127)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_pat(struct pasid_entry *pe, u64 value)
> +{
> +	pasid_set_bits(&pe->val[1], GENMASK_ULL(63, 32), value << 27);

The last input should be "value << 32".

> +}
> +
> +/*
> + * Setup the Cache Disable (CD) field (Bit 89)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_cd(struct pasid_entry *pe)
> +{
> +	pasid_set_bits(&pe->val[1], 1 << 25, 1);

The last input should be "1 << 25".

> +}
> +
> +/*
> + * Setup the Extended Memory Type Enable (EMTE) field (Bit 90)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_emte(struct pasid_entry *pe)
> +{
> +	pasid_set_bits(&pe->val[1], 1 << 26, 1);

The last input should be "1 << 26".

> +}
> +
> +/*
> + * Setup the Extended Access Flag Enable (EAFE) field (Bit 135)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_eafe(struct pasid_entry *pe)
> +{
> +	pasid_set_bits(&pe->val[2], 1 << 7, 1);

The last input should be "1 << 7".

> +}
> +
> +/*
> + * Setup the Page-level Cache Disable (PCD) field (Bit 95)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_pcd(struct pasid_entry *pe)
> +{
> +	pasid_set_bits(&pe->val[1], 1 << 31, 1);

The last input should be "1 << 31".

> +}
> +
> +/*
> + * Setup the Page-level Write-Through (PWT)) field (Bit 94)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_pwt(struct pasid_entry *pe)
> +{
> +	pasid_set_bits(&pe->val[1], 1 << 30, 1);

The last input should be "1 << 30".

> +}
> +
>   static void
>   pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
>   				    u16 did, int pasid)
> @@ -599,3 +669,146 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
>   
>   	return 0;
>   }
> +
> +static int intel_pasid_setup_bind_data(struct intel_iommu *iommu,
> +				struct pasid_entry *pte,
> +				struct iommu_gpasid_bind_data_vtd *pasid_data)
> +{
> +	/*
> +	 * Not all guest PASID table entry fields are passed down during bind,
> +	 * here we only set up the ones that are dependent on guest settings.
> +	 * Execution related bits such as NXE, SMEP are not meaningful to IOMMU,
> +	 * therefore not set. Other fields, such as snoop related, are set based
> +	 * on host needs regardless of  guest settings.
> +	 */
> +	if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_SRE) {
> +		if (!ecap_srs(iommu->ecap)) {
> +			pr_err("No supervisor request support on %s\n",
> +			       iommu->name);
> +			return -EINVAL;
> +		}
> +		pasid_set_sre(pte);
> +	}
> +
> +	if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_EAFE) {
> +		if (!ecap_eafs(iommu->ecap)) {
> +			pr_err("No extended access flag support on %s\n",
> +				iommu->name);
> +			return -EINVAL;
> +		}
> +		pasid_set_eafe(pte);
> +	}
> +
> +	/*
> +	 * Memory type is only applicable to devices inside processor coherent
> +	 * domain. PCIe devices are not included. We can skip the rest of the
> +	 * flags if IOMMU does not support MTS.
> +	 */
> +	if (ecap_mts(iommu->ecap)) {
> +		if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_EMTE) {
> +			pasid_set_emte(pte);
> +			pasid_set_emt(pte, pasid_data->emt);
> +		}
> +		if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_PCD)
> +			pasid_set_pcd(pte);
> +		if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_PWT)
> +			pasid_set_pwt(pte);
> +		if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_CD)
> +			pasid_set_cd(pte);
> +		pasid_set_pat(pte, pasid_data->pat);
> +	} else if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_EMT_MASK) {
> +		pr_warn("No memory type support for bind guest PASID on %s\n",
> +			iommu->name);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +
> +}
> +
> +/**
> + * intel_pasid_setup_nested() - Set up PASID entry for nested translation
> + * which is used for vSVA. The first level page tables are used for

Please remove "which is used for vSVA". It should be a generic interface
for setting up nested translation mode?

> + * GVA-GPA or GIOVA-GPA translation in the guest, second level page tables
> + *  are used for GPA-HPA translation.

Nit: align with the last line.

> + *
> + * @iommu:      Iommu which the device belong to
> + * @dev:        Device to be set up for translation
> + * @gpgd:       FLPTPTR: First Level Page translation pointer in GPA
> + * @pasid:      PASID to be programmed in the device PASID table
> + * @pasid_data: Additional PASID info from the guest bind request
> + * @domain:     Domain info for setting up second level page tables
> + * @addr_width: Address width of the first level (guest)
> + */
> +int intel_pasid_setup_nested(struct intel_iommu *iommu,
> +			struct device *dev, pgd_t *gpgd,
> +			int pasid, struct iommu_gpasid_bind_data_vtd *pasid_data,
> +			struct dmar_domain *domain,
> +			int addr_width)
> +{
> +	struct pasid_entry *pte;
> +	struct dma_pte *pgd;
> +	u64 pgd_val;
> +	int agaw;
> +	u16 did;
> +
> +	if (!ecap_nest(iommu->ecap)) {
> +		pr_err("IOMMU: %s: No nested translation support\n",
> +		       iommu->name);
> +		return -EINVAL;
> +	}
> +
> +	pte = intel_pasid_get_entry(dev, pasid);
> +	if (WARN_ON(!pte))
> +		return -EINVAL;
> +
> +	pasid_clear_entry(pte);
> +
> +	/* Sanity checking performed by caller to make sure address
> +	 * width matching in two dimensions:
> +	 * 1. CPU vs. IOMMU
> +	 * 2. Guest vs. Host.
> +	 */
> +	switch (addr_width) {
> +	case ADDR_WIDTH_5LEVEL:
> +		pasid_set_flpm(pte, 1);
> +		break;
> +	case ADDR_WIDTH_4LEVEL:
> +		pasid_set_flpm(pte, 0);
> +		break;
> +	default:
> +		dev_err(dev, "Invalid paging mode %d\n", addr_width);

Invalid guest address width?

> +		return -EINVAL;
> +	}
> +
> +	pasid_set_flptr(pte, (u64)gpgd);
> +
> +	intel_pasid_setup_bind_data(iommu, pte, pasid_data);

Do you want to check and handle the errors returned from this function?

> +
> +	/* Setup the second level based on the given domain */
> +	pgd = domain->pgd;
> +
> +	for (agaw = domain->agaw; agaw != iommu->agaw; agaw--) {
> +		pgd = phys_to_virt(dma_pte_addr(pgd));
> +		if (!dma_pte_present(pgd)) {
> +			pasid_clear_entry(pte);
> +			dev_err(dev, "Invalid domain page table\n");
> +			return -EINVAL;
> +		}
> +	}
> +	pgd_val = virt_to_phys(pgd);
> +	pasid_set_slptr(pte, pgd_val);
> +	pasid_set_fault_enable(pte);
> +
> +	did = domain->iommu_did[iommu->seq_id];
> +	pasid_set_domain_id(pte, did);
> +
> +	pasid_set_address_width(pte, agaw);
> +	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
> +
> +	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
> +	pasid_set_present(pte);
> +	pasid_flush_caches(iommu, pte, pasid, did);
> +
> +	return 0;
> +}
> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
> index fc8cd8f17de1..95ed160b1947 100644
> --- a/drivers/iommu/intel-pasid.h
> +++ b/drivers/iommu/intel-pasid.h
> @@ -36,6 +36,7 @@
>    * to vmalloc or even module mappings.
>    */
>   #define PASID_FLAG_SUPERVISOR_MODE	BIT(0)
> +#define PASID_FLAG_NESTED		BIT(1)
>   
>   struct pasid_dir_entry {
>   	u64 val;
> @@ -45,6 +46,11 @@ struct pasid_entry {
>   	u64 val[8];
>   };
>   
> +#define PASID_ENTRY_PGTT_FL_ONLY	(1)
> +#define PASID_ENTRY_PGTT_SL_ONLY	(2)
> +#define PASID_ENTRY_PGTT_NESTED		(3)
> +#define PASID_ENTRY_PGTT_PT		(4)
> +
>   /* The representative of a PASID table */
>   struct pasid_table {
>   	void			*table;		/* pasid table pointer */
> @@ -93,6 +99,12 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
>   int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
>   				   struct dmar_domain *domain,
>   				   struct device *dev, int pasid);
> +int intel_pasid_setup_nested(struct intel_iommu *iommu,
> +			struct device *dev, pgd_t *pgd,
> +			int pasid,
> +			struct iommu_gpasid_bind_data_vtd *pasid_data,
> +			struct dmar_domain *domain,
> +			int addr_width);
>   void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
>   				 struct device *dev, int pasid);
>   
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 74b79e2e6a73..19bf9ff180ae 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -34,6 +34,9 @@
>   #define VTD_STRIDE_SHIFT        (9)
>   #define VTD_STRIDE_MASK         (((u64)-1) << VTD_STRIDE_SHIFT)
>   
> +#define ADDR_WIDTH_5LEVEL	(57)
> +#define ADDR_WIDTH_4LEVEL	(48)
> +
>   #define DMA_PTE_READ (1)
>   #define DMA_PTE_WRITE (2)
>   #define DMA_PTE_LARGE_PAGE (1 << 7)
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 4ad3496e5c43..fcafb6401430 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -284,7 +284,10 @@ struct iommu_gpasid_bind_data_vtd {
>   	__u32 pat;
>   	__u32 emt;
>   };
> -
> +#define IOMMU_SVA_VTD_GPASID_EMT_MASK	(IOMMU_SVA_VTD_GPASID_CD | \
> +					 IOMMU_SVA_VTD_GPASID_EMTE | \
> +					 IOMMU_SVA_VTD_GPASID_PCD |  \
> +					 IOMMU_SVA_VTD_GPASID_PWT)

Might need a separate patch?

>   /**
>    * struct iommu_gpasid_bind_data - Information about device and guest PASID binding
>    * @version:	Version of this data structure
> 

Best regards,
baolu

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8 02/10] iommu/vt-d: Add nested translation helper function
  2019-12-16 19:24 ` [PATCH v8 02/10] iommu/vt-d: Add nested translation helper function Jacob Pan
  2019-12-18  2:01   ` Lu Baolu
@ 2019-12-18  2:41   ` Lu Baolu
  2020-01-09 18:39     ` Jacob Pan
  1 sibling, 1 reply; 27+ messages in thread
From: Lu Baolu @ 2019-12-18  2:41 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, David Woodhouse
  Cc: baolu.lu, Tian, Kevin, Raj Ashok, Yi Liu, Eric Auger, Yi L

Hi again,

On 12/17/19 3:24 AM, Jacob Pan wrote:
> +/**
> + * intel_pasid_setup_nested() - Set up PASID entry for nested translation
> + * which is used for vSVA. The first level page tables are used for
> + * GVA-GPA or GIOVA-GPA translation in the guest, second level page tables
> + *  are used for GPA-HPA translation.
> + *
> + * @iommu:      Iommu which the device belong to
> + * @dev:        Device to be set up for translation
> + * @gpgd:       FLPTPTR: First Level Page translation pointer in GPA
> + * @pasid:      PASID to be programmed in the device PASID table
> + * @pasid_data: Additional PASID info from the guest bind request
> + * @domain:     Domain info for setting up second level page tables
> + * @addr_width: Address width of the first level (guest)
> + */
> +int intel_pasid_setup_nested(struct intel_iommu *iommu,
> +			struct device *dev, pgd_t *gpgd,
> +			int pasid, struct iommu_gpasid_bind_data_vtd *pasid_data,
> +			struct dmar_domain *domain,
> +			int addr_width)
> +{
> +	struct pasid_entry *pte;
> +	struct dma_pte *pgd;
> +	u64 pgd_val;
> +	int agaw;
> +	u16 did;
> +
> +	if (!ecap_nest(iommu->ecap)) {
> +		pr_err("IOMMU: %s: No nested translation support\n",
> +		       iommu->name);
> +		return -EINVAL;
> +	}
> +
> +	pte = intel_pasid_get_entry(dev, pasid);
> +	if (WARN_ON(!pte))
> +		return -EINVAL;
> +
> +	pasid_clear_entry(pte);

In some cases, e.g. nested mode for GIOVA-HPA, the PASID entry might
have already been set up for second-level translation. (This could be
checked with the Present bit.) Hence, it's safe to flush caches here.

Or, maybe intel_pasid_tear_down_entry() is more suitable?
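
Something like below, assuming a hypothetical helper that tests the
Present bit (just a sketch, not tested):

	pte = intel_pasid_get_entry(dev, pasid);
	if (WARN_ON(!pte))
		return -EINVAL;

	if (pasid_pte_is_present(pte))
		intel_pasid_tear_down_entry(iommu, dev, pasid);
	else
		pasid_clear_entry(pte);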

Best regards,
baolu

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8 03/10] iommu/vt-d: Add bind guest PASID support
  2019-12-16 19:24 ` [PATCH v8 03/10] iommu/vt-d: Add bind guest PASID support Jacob Pan
@ 2019-12-18  3:14   ` Lu Baolu
  2020-01-09 21:45     ` Jacob Pan
  0 siblings, 1 reply; 27+ messages in thread
From: Lu Baolu @ 2019-12-18  3:14 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, David Woodhouse
  Cc: baolu.lu, Tian, Kevin, Raj Ashok, Yi Liu, Eric Auger, Yi L

Hi,

On 12/17/19 3:24 AM, Jacob Pan wrote:
> When supporting guest SVA with emulated IOMMU, the guest PASID
> table is shadowed in VMM. Updates to guest vIOMMU PASID table
> will result in PASID cache flush which will be passed down to
> the host as bind guest PASID calls.
> 
> For the SL page tables, it will be harvested from device's
> default domain (request w/o PASID), or aux domain in case of
> mediated device.
> 
>      .-------------.  .---------------------------.
>      |   vIOMMU    |  | Guest process CR3, FL only|
>      |             |  '---------------------------'
>      .----------------/
>      | PASID Entry |--- PASID cache flush -
>      '-------------'                       |
>      |             |                       V
>      |             |                CR3 in GPA
>      '-------------'
> Guest
> ------| Shadow |--------------------------|--------
>        v        v                          v
> Host
>      .-------------.  .----------------------.
>      |   pIOMMU    |  | Bind FL for GVA-GPA  |
>      |             |  '----------------------'
>      .----------------/  |
>      | PASID Entry |     V (Nested xlate)
>      '----------------\.------------------------------.
>      |             |   |SL for GPA-HPA, default domain|
>      |             |   '------------------------------'
>      '-------------'
> Where:
>   - FL = First level/stage one page tables
>   - SL = Second level/stage two page tables
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> ---
>   drivers/iommu/intel-iommu.c |   4 +
>   drivers/iommu/intel-svm.c   | 214 ++++++++++++++++++++++++++++++++++++++++++++
>   include/linux/intel-iommu.h |   8 +-
>   include/linux/intel-svm.h   |  17 ++++
>   4 files changed, 242 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index cc89791d807c..304654dbc622 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5993,6 +5993,10 @@ const struct iommu_ops intel_iommu_ops = {
>   	.dev_disable_feat	= intel_iommu_dev_disable_feat,
>   	.is_attach_deferred	= intel_iommu_is_attach_deferred,
>   	.pgsize_bitmap		= INTEL_IOMMU_PGSIZES,
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +	.sva_bind_gpasid	= intel_svm_bind_gpasid,
> +	.sva_unbind_gpasid	= intel_svm_unbind_gpasid,
> +#endif
>   };
>   
>   static void quirk_iommu_igfx(struct pci_dev *dev)
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 0fcbe631cd5f..f580b7be63c5 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -230,6 +230,220 @@ static LIST_HEAD(global_svm_list);
>   	list_for_each_entry((sdev), &(svm)->devs, list)	\
>   		if ((d) != (sdev)->dev) {} else
>   
> +int intel_svm_bind_gpasid(struct iommu_domain *domain,
> +			struct device *dev,
> +			struct iommu_gpasid_bind_data *data)
> +{
> +	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> +	struct dmar_domain *ddomain;
> +	struct intel_svm_dev *sdev;
> +	struct intel_svm *svm;
> +	int ret = 0;
> +
> +	if (WARN_ON(!iommu) || !data)
> +		return -EINVAL;
> +
> +	if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> +	    data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> +		return -EINVAL;
> +
> +	if (dev_is_pci(dev)) {
> +		/* VT-d supports devices with full 20 bit PASIDs only */
> +		if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
> +			return -EINVAL;
> +	} else {
> +		return -ENOTSUPP;
> +	}
> +
> +	/*
> +	 * We only check host PASID range, we have no knowledge to check
> +	 * guest PASID range nor do we use the guest PASID.
> +	 */
> +	if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
> +		return -EINVAL;
> +
> +	ddomain = to_dmar_domain(domain);
> +
> +	/* Sanity check paging mode support match between host and guest */
> +	if (data->addr_width == ADDR_WIDTH_5LEVEL &&
> +	    !cap_5lp_support(iommu->cap)) {
> +		pr_err("Cannot support 5 level paging requested by guest!\n");
> +		return -EINVAL;
> +	}
> +
> +	mutex_lock(&pasid_mutex);
> +	svm = ioasid_find(NULL, data->hpasid, NULL);
> +	if (IS_ERR(svm)) {
> +		ret = PTR_ERR(svm);
> +		goto out;
> +	}
> +
> +	if (svm) {
> +		/*
> +		 * If we found svm for the PASID, there must be at
> +		 * least one device bond, otherwise svm should be freed.
> +		 */
> +		if (WARN_ON(list_empty(&svm->devs)))
> +			return -EINVAL;
> +
> +		if (svm->mm == get_task_mm(current) &&
> +		    data->hpasid == svm->pasid &&
> +		    data->gpasid == svm->gpasid) {
> +			pr_warn("Cannot bind the same guest-host PASID for the same process\n");
> +			mmput(svm->mm);
> +			return -EINVAL;
> +		}
> +
> +		for_each_svm_dev(sdev, svm, dev) {
> +			/* In case of multiple sub-devices of the same pdev
> +			 * assigned, we should allow multiple bind calls with
> +			 * the same PASID and pdev.
> +			 */
> +			sdev->users++;
> +			goto out;
> +		}
> +	} else {
> +		/* We come here when PASID has never been bound to a device. */
> +		svm = kzalloc(sizeof(*svm), GFP_KERNEL);
> +		if (!svm) {
> +			ret = -ENOMEM;
> +			goto out;
> +		}
> +		/* REVISIT: upper layer/VFIO can track host process that bind the PASID.
> +		 * ioasid_set = mm might be sufficient for vfio to check pasid VMM
> +		 * ownership.
> +		 */

Does this mean you need more work to enhance this? Or just a hint?

> +		svm->mm = get_task_mm(current);
> +		svm->pasid = data->hpasid;
> +		if (data->flags & IOMMU_SVA_GPASID_VAL) {
> +			svm->gpasid = data->gpasid;
> +			svm->flags |= SVM_FLAG_GUEST_PASID;
> +		}
> +		ioasid_set_data(data->hpasid, svm);
> +		INIT_LIST_HEAD_RCU(&svm->devs);
> +		INIT_LIST_HEAD(&svm->list);
> +
> +		mmput(svm->mm);
> +	}
> +	sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
> +	if (!sdev) {
> +		if (list_empty(&svm->devs)) {
> +			ioasid_set_data(data->hpasid, NULL);
> +			kfree(svm);
> +		}
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +	sdev->dev = dev;
> +	sdev->users = 1;
> +
> +	/* Set up device context entry for PASID if not enabled already */
> +	ret = intel_iommu_enable_pasid(iommu, sdev->dev);
> +	if (ret) {
> +		dev_err(dev, "Failed to enable PASID capability\n");
> +		kfree(sdev);

Need to free svm if both svm and sdev are allocated in this function.
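
Something like this, mirroring the error path used later in this
function (a sketch, not the final fix):

	kfree(sdev);
	if (list_empty(&svm->devs)) {
		ioasid_set_data(data->hpasid, NULL);
		kfree(svm);
	}
	goto out;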

> +		goto out;
> +	}
> +
> +	/*
> +	 * For guest bind, we need to set up PASID table entry as follows:
> +	 * - FLPM matches guest paging mode
> +	 * - turn on nested mode
> +	 * - SL guest address width matching
> +	 */
> +	ret = intel_pasid_setup_nested(iommu,
> +				dev,
> +				(pgd_t *)data->gpgd,
> +				data->hpasid,
> +				&data->vtd,
> +				ddomain,
> +				data->addr_width);

Align with the left parenthesis.

Best regards,
baolu

> +	if (ret) {
> +		dev_err(dev, "Failed to set up PASID %llu in nested mode, Err %d\n",
> +			data->hpasid, ret);
> +		/*
> +		 * PASID entry should be in cleared state if nested mode
> +		 * set up failed. So we only need to clear IOASID tracking
> +		 * data such that free call will succeed.
> +		 */
> +		ioasid_set_data(data->hpasid, NULL);
> +		kfree(sdev);
> +		if (list_empty(&svm->devs))
> +			kfree(svm);
> +
> +		goto out;
> +	}
> +	svm->flags |= SVM_FLAG_GUEST_MODE;
> +
> +	init_rcu_head(&sdev->rcu);
> +	list_add_rcu(&sdev->list, &svm->devs);
> + out:
> +	mutex_unlock(&pasid_mutex);
> +	return ret;
> +}
> +
> +int intel_svm_unbind_gpasid(struct device *dev, int pasid)
> +{
> +	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> +	struct intel_svm_dev *sdev;
> +	struct intel_svm *svm;
> +	int ret = -EINVAL;
> +
> +	if (WARN_ON(!iommu))
> +		return -EINVAL;
> +
> +	mutex_lock(&pasid_mutex);
> +	svm = ioasid_find(NULL, pasid, NULL);
> +	if (!svm) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	if (IS_ERR(svm)) {
> +		ret = PTR_ERR(svm);
> +		goto out;
> +	}
> +
> +	for_each_svm_dev(sdev, svm, dev) {
> +		ret = 0;
> +		sdev->users--;
> +		if (!sdev->users) {
> +			list_del_rcu(&sdev->list);
> +			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
> +			/* TODO: Drain in flight PRQ for the PASID since it
> +			 * may get reused soon, we don't want to
> +			 * confuse with its previous life.
> +			 * intel_svm_drain_prq(dev, pasid);
> +			 */
> +			kfree_rcu(sdev, rcu);
> +
> +			if (list_empty(&svm->devs)) {
> +				list_del(&svm->list);
> +				/*
> +				 * We do not free PASID here until explicit call
> +				 * from VFIO to free. The PASID life cycle
> +				 * management is largely tied to VFIO management
> +				 * of assigned device life cycles. In case of
> +				 * guest exit without a explicit free PASID call,
> +				 * the responsibility lies in VFIO layer to free
> +				 * the PASIDs allocated for the guest.
> +				 * For security reasons, VFIO has to track the
> +				 * PASID ownership per guest anyway to ensure
> +				 * that PASID allocated by one guest cannot be
> +				 * used by another.
> +				 */
> +				ioasid_set_data(pasid, NULL);
> +				kfree(svm);
> +			}
> +		}
> +		break;
> +	}
> +out:
> +	mutex_unlock(&pasid_mutex);
> +
> +	return ret;
> +}
> +
>   int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops)
>   {
>   	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 19bf9ff180ae..412a90cb1738 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -671,7 +671,9 @@ int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev);
>   extern void intel_svm_check(struct intel_iommu *iommu);
>   extern int intel_svm_enable_prq(struct intel_iommu *iommu);
>   extern int intel_svm_finish_prq(struct intel_iommu *iommu);
> -
> +extern int intel_svm_bind_gpasid(struct iommu_domain *domain,
> +		struct device *dev, struct iommu_gpasid_bind_data *data);
> +extern int intel_svm_unbind_gpasid(struct device *dev, int pasid);
>   struct svm_dev_ops;
>   
>   struct intel_svm_dev {
> @@ -688,9 +690,13 @@ struct intel_svm_dev {
>   struct intel_svm {
>   	struct mmu_notifier notifier;
>   	struct mm_struct *mm;
> +
>   	struct intel_iommu *iommu;
>   	int flags;
>   	int pasid;
> +	int gpasid; /* Guest PASID in case of vSVA bind with non-identity host
> +		     * to guest PASID mapping.
> +		     */
>   	struct list_head devs;
>   	struct list_head list;
>   };
> diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
> index 94f047a8a845..a2c189ad0b01 100644
> --- a/include/linux/intel-svm.h
> +++ b/include/linux/intel-svm.h
> @@ -44,6 +44,23 @@ struct svm_dev_ops {
>    * do such IOTLB flushes automatically.
>    */
>   #define SVM_FLAG_SUPERVISOR_MODE	(1<<1)
> +/*
> + * The SVM_FLAG_GUEST_MODE flag is used when a guest process bind to a device.
> + * In this case the mm_struct is in the guest kernel or userspace, its life
> + * cycle is managed by VMM and VFIO layer. For IOMMU driver, this API provides
> + * means to bind/unbind guest CR3 with PASIDs allocated for a device.
> + */
> +#define SVM_FLAG_GUEST_MODE	(1<<2)
> +/*
> + * The SVM_FLAG_GUEST_PASID flag is used when a guest has its own PASID space,
> + * which requires guest and host PASID translation at both directions. We keep
> + * track of guest PASID in order to provide lookup service to device drivers.
> + * One such example is a physical function (PF) driver that supports mediated
> + * device (mdev) assignment. Guest programming of mdev configuration space can
> + * only be done with guest PASID, therefore PF driver needs to find the matching
> + * host PASID to program the real hardware.
> + */
> +#define SVM_FLAG_GUEST_PASID	(1<<3)
>   
>   #ifdef CONFIG_INTEL_IOMMU_SVM
>   
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8 06/10] iommu/vt-d: Cache virtual command capability register
  2019-12-16 19:24 ` [PATCH v8 06/10] iommu/vt-d: Cache virtual command capability register Jacob Pan
@ 2019-12-18  3:25   ` Lu Baolu
  2020-01-09 21:59     ` Jacob Pan
  0 siblings, 1 reply; 27+ messages in thread
From: Lu Baolu @ 2019-12-18  3:25 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, David Woodhouse
  Cc: baolu.lu, Tian, Kevin, Raj Ashok, Yi Liu, Eric Auger

Hi,

On 12/17/19 3:24 AM, Jacob Pan wrote:
> Virtual command registers are used in the guest only, to prevent
> vmexit cost, we cache the capability and store it during initialization.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>   drivers/iommu/dmar.c        | 1 +
>   include/linux/intel-iommu.h | 4 ++++
>   2 files changed, 5 insertions(+)
> 
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index f2f5d75da94a..3f98dd9ad004 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -953,6 +953,7 @@ static int map_iommu(struct intel_iommu *iommu, u64 phys_addr)
>   		warn_invalid_dmar(phys_addr, " returns all ones");
>   		goto unmap;
>   	}
> +	iommu->vccap = dmar_readq(iommu->reg + DMAR_VCCAP_REG);
>   
>   	/* the registers might be more than one page */
>   	map_size = max_t(int, ecap_max_iotlb_offset(iommu->ecap),
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index ee26989df008..4d25141ec3df 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -189,6 +189,9 @@
>   #define ecap_max_handle_mask(e) ((e >> 20) & 0xf)
>   #define ecap_sc_support(e)	((e >> 7) & 0x1) /* Snooping Control */
>   
> +/* Virtual command interface capabilities */
> +#define vccap_pasid(v)		((v & DMA_VCS_PAS)) /* PASID allocation */

Has DMA_VCS_PAS ever been defined?
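
If not, presumably something like below is needed, assuming bit 0 of
the VCCAP register reports PASID allocation support:

	#define DMA_VCS_PAS	((u64)1)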

Best regards,
baolu

> +
>   /* IOTLB_REG */
>   #define DMA_TLB_FLUSH_GRANU_OFFSET  60
>   #define DMA_TLB_GLOBAL_FLUSH (((u64)1) << 60)
> @@ -531,6 +534,7 @@ struct intel_iommu {
>   	u64		reg_size; /* size of hw register set */
>   	u64		cap;
>   	u64		ecap;
> +	u64		vccap;
>   	u32		gcmd; /* Holds TE, EAFL. Don't need SRTP, SFL, WBF */
>   	raw_spinlock_t	register_lock; /* protect register handling */
>   	int		seq_id;	/* sequence id of the iommu */
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8 08/10] iommu/vt-d: Add custom allocator for IOASID
  2019-12-16 19:24 ` [PATCH v8 08/10] iommu/vt-d: Add custom allocator for IOASID Jacob Pan
@ 2019-12-18  4:10   ` Lu Baolu
  2020-01-09 22:06     ` Jacob Pan
  0 siblings, 1 reply; 27+ messages in thread
From: Lu Baolu @ 2019-12-18  4:10 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, David Woodhouse
  Cc: baolu.lu, Tian, Kevin, Raj Ashok, Yi Liu, Eric Auger

Hi,

On 12/17/19 3:24 AM, Jacob Pan wrote:
> When the VT-d driver runs in the guest, PASID allocation must be
> performed via the virtual command interface. This patch registers a
> custom IOASID allocator which takes precedence over the default
> XArray-based allocator. The resulting IOASID allocations will always
> come from the host. This ensures that the PASID namespace is
> system-wide.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Liu, Yi L <yi.l.liu@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>   drivers/iommu/intel-iommu.c | 75 +++++++++++++++++++++++++++++++++++++++++++++
>   include/linux/intel-iommu.h |  2 ++
>   2 files changed, 77 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index e90102c7540d..b0c0bb6f740e 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1700,6 +1700,9 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
>   		if (ecap_prs(iommu->ecap))
>   			intel_svm_finish_prq(iommu);
>   	}
> +	if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap))
> +		ioasid_unregister_allocator(&iommu->pasid_allocator);
> +
>   #endif
>   }
>   
> @@ -3181,6 +3184,75 @@ static int copy_translation_tables(struct intel_iommu *iommu)
>   	return ret;
>   }
>   
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max, void *data)
> +{
> +	struct intel_iommu *iommu = data;
> +	ioasid_t ioasid;
> +

Check for !iommu just like in the free API?

> +	/*
> +	 * VT-d virtual command interface always uses the full 20 bit
> +	 * PASID range. Host can partition guest PASID range based on
> +	 * policies but it is out of guest's control.
> +	 */
> +	if (min < PASID_MIN || max > intel_pasid_max_id)
> +		return INVALID_IOASID;
> +
> +	if (vcmd_alloc_pasid(iommu, &ioasid))
> +		return INVALID_IOASID;
> +
> +	return ioasid;
> +}
> +
> +static void intel_ioasid_free(ioasid_t ioasid, void *data)
> +{
> +	struct intel_iommu *iommu = data;
> +
> +	if (!iommu)
> +		return;
> +	/*
> +	 * Sanity checking of the IOASID owner is done at the upper layer,
> +	 * e.g. VFIO. We can only free the PASID when all devices are unbound.
> +	 */
> +	if (ioasid_find(NULL, ioasid, NULL)) {
> +		pr_alert("Cannot free active IOASID %d\n", ioasid);
> +		return;
> +	}
> +	vcmd_free_pasid(iommu, ioasid);
> +}
> +
> +static void register_pasid_allocator(struct intel_iommu *iommu)
> +{
> +	if (!intel_iommu_sm) {

Use sm_supported(iommu) instead.

> +		pr_warn("VT-d scalable mode not enabled\n");
> +		return;
> +	}
> +
> +	/*
> +	 * Register a custom PASID allocator if we are running in a guest;
> +	 * guest PASIDs must be obtained via the virtual command interface.
> +	 * There can be multiple vIOMMUs in each guest but only one allocator
> +	 * is active. All vIOMMU allocators will eventually be calling the same
> +	 * host allocator.
> +	 */
> +	if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap)) {
> +		pr_info("Register custom PASID allocator\n");
> +		iommu->pasid_allocator.alloc = intel_ioasid_alloc;
> +		iommu->pasid_allocator.free = intel_ioasid_free;
> +		iommu->pasid_allocator.pdata = (void *)iommu;
> +		if (ioasid_register_allocator(&iommu->pasid_allocator)) {
> +			pr_warn("Custom PASID allocator failed, scalable mode disabled\n");
> +			/*
> +			 * Disable scalable mode on this IOMMU if there
> +			 * is no custom allocator. Mixing SM-capable vIOMMU
> +			 * and non-SM vIOMMU is not supported.
> +			 */
> +			intel_iommu_sm = 0;
> +		}
> +	}
> +}
> +#endif
> +
>   static int __init init_dmars(void)
>   {
>   	struct dmar_drhd_unit *drhd;
> @@ -3298,6 +3370,9 @@ static int __init init_dmars(void)
>   	 */
>   	for_each_active_iommu(iommu, drhd) {
>   		iommu_flush_write_buffer(iommu);
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +		register_pasid_allocator(iommu);
> +#endif
>   		iommu_set_root_entry(iommu);
>   		iommu->flush.flush_context(iommu, 0, 0, 0, DMA_CCMD_GLOBAL_INVL);
>   		iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH);
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 1e11560b0e59..8c30b23bd838 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -19,6 +19,7 @@
>   #include <linux/iommu.h>
>   #include <linux/io-64-nonatomic-lo-hi.h>
>   #include <linux/dmar.h>
> +#include <linux/ioasid.h>
>   
>   #include <asm/cacheflush.h>
>   #include <asm/iommu.h>
> @@ -557,6 +558,7 @@ struct intel_iommu {
>   #ifdef CONFIG_INTEL_IOMMU_SVM
>   	struct page_req_dsc *prq;
>   	unsigned char prq_name[16];    /* Name for PRQ interrupt */
> +	struct ioasid_allocator_ops pasid_allocator; /* Custom allocator for PASIDs */
>   #endif
>   	struct q_inval  *qi;            /* Queued invalidation info */
>   	u32 *iommu_state; /* Store iommu states between suspend and resume.*/
> 

Best regards,
baolu

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8 04/10] iommu/vt-d: Support flushing more translation cache types
  2019-12-16 19:24 ` [PATCH v8 04/10] iommu/vt-d: Support flushing more translation cache types Jacob Pan
@ 2019-12-19  2:46   ` Lu Baolu
  2020-01-09 21:50     ` Jacob Pan
  0 siblings, 1 reply; 27+ messages in thread
From: Lu Baolu @ 2019-12-19  2:46 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, David Woodhouse
  Cc: baolu.lu, Tian, Kevin, Raj Ashok, Yi Liu, Eric Auger

Hi,

On 12/17/19 3:24 AM, Jacob Pan wrote:
> When Shared Virtual Memory is exposed to a guest via vIOMMU, scalable
> IOTLB invalidation may be passed down from outside IOMMU subsystems.
> This patch adds invalidation functions that can be used for additional
> translation cache types.
> 
> Signed-off-by: Jacob Pan<jacob.jun.pan@linux.intel.com>
> ---
>   drivers/iommu/dmar.c        | 46 +++++++++++++++++++++++++++++++++++++++++++++
>   drivers/iommu/intel-pasid.c |  3 ++-
>   include/linux/intel-iommu.h | 21 +++++++++++++++++----
>   3 files changed, 65 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index 3acfa6a25fa2..f2f5d75da94a 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -1348,6 +1348,20 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
>   	qi_submit_sync(&desc, iommu);
>   }
>   
> +/* PASID-based IOTLB Invalidate */
> +void qi_flush_iotlb_pasid(struct intel_iommu *iommu, u16 did, u64 addr, u32 pasid,
> +		unsigned int size_order, u64 granu, int ih)
> +{
> +	struct qi_desc desc = {.qw2 = 0, .qw3 = 0};
> +
> +	desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
> +		QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE;
> +	desc.qw1 = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_IH(ih) |
> +		QI_EIOTLB_AM(size_order);
> +
> +	qi_submit_sync(&desc, iommu);
> +}

There's another version of pasid-based iotlb invalidation.

https://lkml.org/lkml/2019/12/10/2128

Let's consider merging them.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8 02/10] iommu/vt-d: Add nested translation helper function
  2019-12-18  2:01   ` Lu Baolu
@ 2020-01-09 17:51     ` Jacob Pan
  0 siblings, 0 replies; 27+ messages in thread
From: Jacob Pan @ 2020-01-09 17:51 UTC (permalink / raw)
  To: Lu Baolu
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Tian, Kevin,
	Raj Ashok, Yi Liu, Eric Auger, jacob.jun.pan

Hi Baolu,

Appreciate the review. Comments inline below.

On Wed, 18 Dec 2019 10:01:17 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> Hi Jacob,
> 
> On 12/17/19 3:24 AM, Jacob Pan wrote:
> > Nested translation mode is supported in VT-d 3.0 Spec.CH 3.8.
> > With PASID granular translation type set to 0x11b, translation
> > result from the first level(FL) also subject to a second level(SL)
> > page table translation. This mode is used for SVA virtualization,
> > where FL performs guest virtual to guest physical translation and
> > SL performs guest physical to host physical translation.
> > 
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > ---
> >   drivers/iommu/intel-pasid.c | 213
> > ++++++++++++++++++++++++++++++++++++++++++++
> > drivers/iommu/intel-pasid.h |  12 +++ include/linux/intel-iommu.h
> > |   3 + include/uapi/linux/iommu.h  |   5 +-
> >   4 files changed, 232 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/intel-pasid.c
> > b/drivers/iommu/intel-pasid.c index 3cb569e76642..b178ad9e47ae
> > 100644 --- a/drivers/iommu/intel-pasid.c
> > +++ b/drivers/iommu/intel-pasid.c
> > @@ -359,6 +359,76 @@ pasid_set_flpm(struct pasid_entry *pe, u64
> > value) pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2);
> >   }
> >   
> > +/*
> > + * Setup the Extended Memory Type(EMT) field (Bits 91-93)
> > + * of a scalable mode PASID entry.
> > + */
> > +static inline void
> > +pasid_set_emt(struct pasid_entry *pe, u64 value)
> > +{
> > +	pasid_set_bits(&pe->val[1], GENMASK_ULL(29, 27), value <<
> > 27); +}
> > +
> > +/*
> > + * Setup the Page Attribute Table (PAT) field (Bits 96-127)
> > + * of a scalable mode PASID entry.
> > + */
> > +static inline void
> > +pasid_set_pat(struct pasid_entry *pe, u64 value)
> > +{
> > +	pasid_set_bits(&pe->val[1], GENMASK_ULL(63, 32), value <<
> > 27);  
> 
> The last input should be "value << 32".
> 
You are right, will fix.
> > +}
> > +
> > +/*
> > + * Setup the Cache Disable (CD) field (Bit 89)
> > + * of a scalable mode PASID entry.
> > + */
> > +static inline void
> > +pasid_set_cd(struct pasid_entry *pe)
> > +{
> > +	pasid_set_bits(&pe->val[1], 1 << 25, 1);  
> 
> The last input should be "1 << 25".
> 
Right, I misunderstood the arguments of pasid_set_bits(); same for the
other bits below.
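
For the record, pasid_set_bits() takes a mask and pre-shifted bits,
roughly like this (paraphrasing the helper in intel-pasid.c):

	static inline void pasid_set_bits(u64 *ptr, u64 mask, u64 bits)
	{
		u64 old;

		old = READ_ONCE(*ptr);
		WRITE_ONCE(*ptr, (old & ~mask) | bits);
	}

so for a single-bit field both the mask and the bits argument must be
the shifted value.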
> > +}
> > +
> > +/*
> > + * Setup the Extended Memory Type Enable (EMTE) field (Bit 90)
> > + * of a scalable mode PASID entry.
> > + */
> > +static inline void
> > +pasid_set_emte(struct pasid_entry *pe)
> > +{
> > +	pasid_set_bits(&pe->val[1], 1 << 26, 1);  
> 
> The last input should be "1 << 26".
> 
> > +}
> > +
> > +/*
> > + * Setup the Extended Access Flag Enable (EAFE) field (Bit 135)
> > + * of a scalable mode PASID entry.
> > + */
> > +static inline void
> > +pasid_set_eafe(struct pasid_entry *pe)
> > +{
> > +	pasid_set_bits(&pe->val[2], 1 << 7, 1);  
> 
> The last input should be "1 << 7".
> 
> > +}
> > +
> > +/*
> > + * Setup the Page-level Cache Disable (PCD) field (Bit 95)
> > + * of a scalable mode PASID entry.
> > + */
> > +static inline void
> > +pasid_set_pcd(struct pasid_entry *pe)
> > +{
> > +	pasid_set_bits(&pe->val[1], 1 << 31, 1);  
> 
> The last input should be "1 << 31".
> 
> > +}
> > +
> > +/*
> > + * Setup the Page-level Write-Through (PWT)) field (Bit 94)
> > + * of a scalable mode PASID entry.
> > + */
> > +static inline void
> > +pasid_set_pwt(struct pasid_entry *pe)
> > +{
> > +	pasid_set_bits(&pe->val[1], 1 << 30, 1);  
> 
> The last input should be "1 << 30".
> 
> > +}
> > +
> >   static void
> >   pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
> >   				    u16 did, int pasid)
> > @@ -599,3 +669,146 @@ int intel_pasid_setup_pass_through(struct
> > intel_iommu *iommu, 
> >   	return 0;
> >   }
> > +
> > +static int intel_pasid_setup_bind_data(struct intel_iommu *iommu,
> > +				struct pasid_entry *pte,
> > +				struct iommu_gpasid_bind_data_vtd
> > *pasid_data) +{
> > +	/*
> > +	 * Not all guest PASID table entry fields are passed down
> > during bind,
> > +	 * here we only set up the ones that are dependent on
> > guest settings.
> > +	 * Execution related bits such as NXE, SMEP are not
> > meaningful to IOMMU,
> > +	 * therefore not set. Other fields, such as snoop related,
> > are set based
> > +	 * on host needs regardless of  guest settings.
> > +	 */
> > +	if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_SRE) {
> > +		if (!ecap_srs(iommu->ecap)) {
> > +			pr_err("No supervisor request support on
> > %s\n",
> > +			       iommu->name);
> > +			return -EINVAL;
> > +		}
> > +		pasid_set_sre(pte);
> > +	}
> > +
> > +	if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_EAFE) {
> > +		if (!ecap_eafs(iommu->ecap)) {
> > +			pr_err("No extended access flag support on
> > %s\n",
> > +				iommu->name);
> > +			return -EINVAL;
> > +		}
> > +		pasid_set_eafe(pte);
> > +	}
> > +
> > +	/*
> > +	 * Memory type is only applicable to devices inside
> > processor coherent
> > +	 * domain. PCIe devices are not included. We can skip the
> > rest of the
> > +	 * flags if IOMMU does not support MTS.
> > +	 */
> > +	if (ecap_mts(iommu->ecap)) {
> > +		if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_EMTE)
> > {
> > +			pasid_set_emte(pte);
> > +			pasid_set_emt(pte, pasid_data->emt);
> > +		}
> > +		if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_PCD)
> > +			pasid_set_pcd(pte);
> > +		if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_PWT)
> > +			pasid_set_pwt(pte);
> > +		if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_CD)
> > +			pasid_set_cd(pte);
> > +		pasid_set_pat(pte, pasid_data->pat);
> > +	} else if (pasid_data->flags &
> > IOMMU_SVA_VTD_GPASID_EMT_MASK) {
> > +		pr_warn("No memory type support for bind guest
> > PASID on %s\n",
> > +			iommu->name);
> > +		return -EINVAL;
> > +	}
> > +
> > +	return 0;
> > +
> > +}
> > +
> > +/**
> > + * intel_pasid_setup_nested() - Set up PASID entry for nested
> > translation
> > + * which is used for vSVA. The first level page tables are used
> > for  
> 
> Please remove "which is used for vSVA". It should be a generic
> interface for setting up nested translation mode?
> 
Right, perhaps just mention vSVA as an example such that readers can
have some idea of how nested translation can be used? e.g.
"
 * intel_pasid_setup_nested() - Set up PASID entry for nested translation.
 * This could be used for guest shared virtual address. In this case, the
 * first level page tables are used for GVA-GPA translation in the guest,
 * second level page tables are used for GPA-HPA translation.
"

> > + * GVA-GPA or GIOVA-GPA translation in the guest, second level
> > page tables
> > + *  are used for GPA-HPA translation.  
> 
> Nit: align with the last line.
> 
Will do.
> > + *
> > + * @iommu:      Iommu which the device belong to
> > + * @dev:        Device to be set up for translation
> > + * @gpgd:       FLPTPTR: First Level Page translation pointer in
> > GPA
> > + * @pasid:      PASID to be programmed in the device PASID table
> > + * @pasid_data: Additional PASID info from the guest bind request
> > + * @domain:     Domain info for setting up second level page tables
> > + * @addr_width: Address width of the first level (guest)
> > + */
> > +int intel_pasid_setup_nested(struct intel_iommu *iommu,
> > +			struct device *dev, pgd_t *gpgd,
> > +			int pasid, struct
> > iommu_gpasid_bind_data_vtd *pasid_data,
> > +			struct dmar_domain *domain,
> > +			int addr_width)
> > +{
> > +	struct pasid_entry *pte;
> > +	struct dma_pte *pgd;
> > +	u64 pgd_val;
> > +	int agaw;
> > +	u16 did;
> > +
> > +	if (!ecap_nest(iommu->ecap)) {
> > +		pr_err("IOMMU: %s: No nested translation
> > support\n",
> > +		       iommu->name);
> > +		return -EINVAL;
> > +	}
> > +
> > +	pte = intel_pasid_get_entry(dev, pasid);
> > +	if (WARN_ON(!pte))
> > +		return -EINVAL;
> > +
> > +	pasid_clear_entry(pte);
> > +
> > +	/* Sanity checking performed by caller to make sure address
> > +	 * width matching in two dimensions:
> > +	 * 1. CPU vs. IOMMU
> > +	 * 2. Guest vs. Host.
> > +	 */
> > +	switch (addr_width) {
> > +	case ADDR_WIDTH_5LEVEL:
> > +		pasid_set_flpm(pte, 1);
> > +		break;
> > +	case ADDR_WIDTH_4LEVEL:
> > +		pasid_set_flpm(pte, 0);
> > +		break;
> > +	default:
> > +		dev_err(dev, "Invalid paging mode %d\n",
> > addr_width);  
> 
> Invalid guest address width?
> 
Sounds better, will do.
> > +		return -EINVAL;
> > +	}
> > +
> > +	pasid_set_flptr(pte, (u64)gpgd);
> > +
> > +	intel_pasid_setup_bind_data(iommu, pte, pasid_data);  
> 
> Do you want to check and handle the errors returned from this
> function?
> 
Yes, that would be necessary in case of unsupported bind data. Thanks!
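
Something along these lines (sketch):

	ret = intel_pasid_setup_bind_data(iommu, pte, pasid_data);
	if (ret) {
		pasid_clear_entry(pte);
		return ret;
	}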
> > +
> > +	/* Setup the second level based on the given domain */
> > +	pgd = domain->pgd;
> > +
> > +	for (agaw = domain->agaw; agaw != iommu->agaw; agaw--) {
> > +		pgd = phys_to_virt(dma_pte_addr(pgd));
> > +		if (!dma_pte_present(pgd)) {
> > +			pasid_clear_entry(pte);
> > +			dev_err(dev, "Invalid domain page
> > table\n");
> > +			return -EINVAL;
> > +		}
> > +	}
> > +	pgd_val = virt_to_phys(pgd);
> > +	pasid_set_slptr(pte, pgd_val);
> > +	pasid_set_fault_enable(pte);
> > +
> > +	did = domain->iommu_did[iommu->seq_id];
> > +	pasid_set_domain_id(pte, did);
> > +
> > +	pasid_set_address_width(pte, agaw);
> > +	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
> > +
> > +	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
> > +	pasid_set_present(pte);
> > +	pasid_flush_caches(iommu, pte, pasid, did);
> > +
> > +	return 0;
> > +}
> > diff --git a/drivers/iommu/intel-pasid.h
> > b/drivers/iommu/intel-pasid.h index fc8cd8f17de1..95ed160b1947
> > 100644 --- a/drivers/iommu/intel-pasid.h
> > +++ b/drivers/iommu/intel-pasid.h
> > @@ -36,6 +36,7 @@
> >    * to vmalloc or even module mappings.
> >    */
> >   #define PASID_FLAG_SUPERVISOR_MODE	BIT(0)
> > +#define PASID_FLAG_NESTED		BIT(1)
> >   
> >   struct pasid_dir_entry {
> >   	u64 val;
> > @@ -45,6 +46,11 @@ struct pasid_entry {
> >   	u64 val[8];
> >   };
> >   
> > +#define PASID_ENTRY_PGTT_FL_ONLY	(1)
> > +#define PASID_ENTRY_PGTT_SL_ONLY	(2)
> > +#define PASID_ENTRY_PGTT_NESTED		(3)
> > +#define PASID_ENTRY_PGTT_PT		(4)
> > +
> >   /* The representative of a PASID table */
> >   struct pasid_table {
> >   	void			*table;		/* pasid table pointer */
> > @@ -93,6 +99,12 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
> >   int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
> >   				   struct dmar_domain *domain,
> >   				   struct device *dev, int pasid);
> > +int intel_pasid_setup_nested(struct intel_iommu *iommu,
> > +			struct device *dev, pgd_t *pgd,
> > +			int pasid,
> > +			struct iommu_gpasid_bind_data_vtd *pasid_data,
> > +			struct dmar_domain *domain,
> > +			int addr_width);
> >   void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> >   				 struct device *dev, int pasid);
> >   
> > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> > index 74b79e2e6a73..19bf9ff180ae 100644
> > --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -34,6 +34,9 @@
> >   #define VTD_STRIDE_SHIFT        (9)
> >   #define VTD_STRIDE_MASK         (((u64)-1) << VTD_STRIDE_SHIFT)
> >   
> > +#define ADDR_WIDTH_5LEVEL	(57)
> > +#define ADDR_WIDTH_4LEVEL	(48)
> > +
> >   #define DMA_PTE_READ (1)
> >   #define DMA_PTE_WRITE (2)
> >   #define DMA_PTE_LARGE_PAGE (1 << 7)
> > diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> > index 4ad3496e5c43..fcafb6401430 100644
> > --- a/include/uapi/linux/iommu.h
> > +++ b/include/uapi/linux/iommu.h
> > @@ -284,7 +284,10 @@ struct iommu_gpasid_bind_data_vtd {
> >   	__u32 pat;
> >   	__u32 emt;
> >   };
> > -
> > +#define IOMMU_SVA_VTD_GPASID_EMT_MASK	(IOMMU_SVA_VTD_GPASID_CD |  \
> > +					 IOMMU_SVA_VTD_GPASID_EMTE | \
> > +					 IOMMU_SVA_VTD_GPASID_PCD |  \
> > +					 IOMMU_SVA_VTD_GPASID_PWT)
> 
> Might need a separate patch?
> 
Sure. I have other uapi changes; I may group them into a separate series.

> >   /**
> >    * struct iommu_gpasid_bind_data - Information about device and guest PASID binding
> >    * @version:	Version of this data structure
> >   
> 
> Best regards,
> baolu

[Jacob Pan]

* Re: [PATCH v8 02/10] iommu/vt-d: Add nested translation helper function
  2019-12-18  2:41   ` Lu Baolu
@ 2020-01-09 18:39     ` Jacob Pan
  2020-01-10  1:15       ` Lu Baolu
  0 siblings, 1 reply; 27+ messages in thread
From: Jacob Pan @ 2020-01-09 18:39 UTC (permalink / raw)
  To: Lu Baolu
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Tian, Kevin,
	Raj Ashok, Yi Liu, Eric Auger, Yi L, jacob.jun.pan

On Wed, 18 Dec 2019 10:41:53 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> Hi again,
> 
> On 12/17/19 3:24 AM, Jacob Pan wrote:
> > +/**
> > + * intel_pasid_setup_nested() - Set up PASID entry for nested translation
> > + * which is used for vSVA. The first level page tables are used for
> > + * GVA-GPA or GIOVA-GPA translation in the guest, second level page tables
> > + * are used for GPA-HPA translation.
> > + *
> > + * @iommu:      IOMMU which the device belongs to
> > + * @dev:        Device to be set up for translation
> > + * @gpgd:       FLPTPTR: First Level Page translation pointer in GPA
> > + * @pasid:      PASID to be programmed in the device PASID table
> > + * @pasid_data: Additional PASID info from the guest bind request
> > + * @domain:     Domain info for setting up second level page tables
> > + * @addr_width: Address width of the first level (guest)
> > + */
> > +int intel_pasid_setup_nested(struct intel_iommu *iommu,
> > +			struct device *dev, pgd_t *gpgd,
> > +			int pasid, struct iommu_gpasid_bind_data_vtd *pasid_data,
> > +			struct dmar_domain *domain,
> > +			int addr_width)
> > +{
> > +	struct pasid_entry *pte;
> > +	struct dma_pte *pgd;
> > +	u64 pgd_val;
> > +	int agaw;
> > +	u16 did;
> > +
> > +	if (!ecap_nest(iommu->ecap)) {
> > +		pr_err("IOMMU: %s: No nested translation support\n",
> > +		       iommu->name);
> > +		return -EINVAL;
> > +	}
> > +
> > +	pte = intel_pasid_get_entry(dev, pasid);
> > +	if (WARN_ON(!pte))
> > +		return -EINVAL;
> > +
> > +	pasid_clear_entry(pte);  
> 
> In some cases, e.g. nested mode for GIOVA-HPA, the PASID entry might
> have already been set up for second level translation. (This could be
> checked with the Present bit.) Hence, it's safe to flush caches here.
> 
> Or, maybe intel_pasid_tear_down_entry() is more suitable?
> 
We don't allow binding the same device-PASID pair twice, so if the PASID
entry was used for GIOVA/RID2PASID, it should be unbound first, with the
teardown flush included, right?

> Best regards,
> baolu

[Jacob Pan]

* Re: [PATCH v8 03/10] iommu/vt-d: Add bind guest PASID support
  2019-12-18  3:14   ` Lu Baolu
@ 2020-01-09 21:45     ` Jacob Pan
  0 siblings, 0 replies; 27+ messages in thread
From: Jacob Pan @ 2020-01-09 21:45 UTC (permalink / raw)
  To: Lu Baolu
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Tian, Kevin,
	Raj Ashok, Yi Liu, Eric Auger, Yi L, jacob.jun.pan

On Wed, 18 Dec 2019 11:14:59 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> Hi,
> 
> On 12/17/19 3:24 AM, Jacob Pan wrote:
> > When supporting guest SVA with an emulated IOMMU, the guest PASID
> > table is shadowed in the VMM. Updates to the guest vIOMMU PASID
> > table result in PASID cache flushes, which are passed down to
> > the host as bind guest PASID calls.
> > 
> > The SL page tables are harvested from the device's default domain
> > (request w/o PASID), or from the aux domain in case of a mediated
> > device.
> > 
> >      .-------------.  .---------------------------.
> >      |   vIOMMU    |  | Guest process CR3, FL only|
> >      |             |  '---------------------------'
> >      .----------------/
> >      | PASID Entry |--- PASID cache flush -
> >      '-------------'                       |
> >      |             |                       V
> >      |             |                CR3 in GPA
> >      '-------------'
> > Guest
> > ------| Shadow |--------------------------|--------
> >        v        v                          v
> > Host
> >      .-------------.  .----------------------.
> >      |   pIOMMU    |  | Bind FL for GVA-GPA  |
> >      |             |  '----------------------'
> >      .----------------/  |
> >      | PASID Entry |     V (Nested xlate)
> >      '----------------\.------------------------------.
> >      |             |   |SL for GPA-HPA, default domain|
> >      |             |   '------------------------------'
> >      '-------------'
> > Where:
> >   - FL = First level/stage one page tables
> >   - SL = Second level/stage two page tables
> > 
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > ---
> >   drivers/iommu/intel-iommu.c |   4 +
> >   drivers/iommu/intel-svm.c   | 214 ++++++++++++++++++++++++++++++++++++++++++++
> >   include/linux/intel-iommu.h |   8 +-
> >   include/linux/intel-svm.h   |  17 ++++
> >   4 files changed, 242 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index cc89791d807c..304654dbc622 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -5993,6 +5993,10 @@ const struct iommu_ops intel_iommu_ops = {
> >   	.dev_disable_feat	= intel_iommu_dev_disable_feat,
> >   	.is_attach_deferred	= intel_iommu_is_attach_deferred,
> >   	.pgsize_bitmap		= INTEL_IOMMU_PGSIZES,
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +	.sva_bind_gpasid	= intel_svm_bind_gpasid,
> > +	.sva_unbind_gpasid	= intel_svm_unbind_gpasid,
> > +#endif
> >   };
> >   
> >   static void quirk_iommu_igfx(struct pci_dev *dev)
> > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > index 0fcbe631cd5f..f580b7be63c5 100644
> > --- a/drivers/iommu/intel-svm.c
> > +++ b/drivers/iommu/intel-svm.c
> > @@ -230,6 +230,220 @@ static LIST_HEAD(global_svm_list);
> >   	list_for_each_entry((sdev), &(svm)->devs, list)	\
> >   		if ((d) != (sdev)->dev) {} else
> >   
> > +int intel_svm_bind_gpasid(struct iommu_domain *domain,
> > +			struct device *dev,
> > +			struct iommu_gpasid_bind_data *data)
> > +{
> > +	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> > +	struct dmar_domain *ddomain;
> > +	struct intel_svm_dev *sdev;
> > +	struct intel_svm *svm;
> > +	int ret = 0;
> > +
> > +	if (WARN_ON(!iommu) || !data)
> > +		return -EINVAL;
> > +
> > +	if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> > +	    data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> > +		return -EINVAL;
> > +
> > +	if (dev_is_pci(dev)) {
> > +		/* VT-d supports devices with full 20 bit PASIDs only */
> > +		if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
> > +			return -EINVAL;
> > +	} else {
> > +		return -ENOTSUPP;
> > +	}
> > +
> > +	/*
> > +	 * We only check host PASID range, we have no knowledge to check
> > +	 * guest PASID range nor do we use the guest PASID.
> > +	 */
> > +	if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
> > +		return -EINVAL;
> > +
> > +	ddomain = to_dmar_domain(domain);
> > +
> > +	/* Sanity check paging mode support match between host and guest */
> > +	if (data->addr_width == ADDR_WIDTH_5LEVEL &&
> > +	    !cap_5lp_support(iommu->cap)) {
> > +		pr_err("Cannot support 5 level paging requested by guest!\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	mutex_lock(&pasid_mutex);
> > +	svm = ioasid_find(NULL, data->hpasid, NULL);
> > +	if (IS_ERR(svm)) {
> > +		ret = PTR_ERR(svm);
> > +		goto out;
> > +	}
> > +
> > +	if (svm) {
> > +		/*
> > +		 * If we found svm for the PASID, there must be at
> > +		 * least one device bound, otherwise svm should be freed.
> > +		 */
> > +		if (WARN_ON(list_empty(&svm->devs)))
> > +			return -EINVAL;
> > +
> > +		if (svm->mm == get_task_mm(current) &&
> > +		    data->hpasid == svm->pasid &&
> > +		    data->gpasid == svm->gpasid) {
> > +			pr_warn("Cannot bind the same guest-host PASID for the same process\n");
> > +			mmput(svm->mm);
> > +			return -EINVAL;
> > +		}
> > +
> > +		for_each_svm_dev(sdev, svm, dev) {
> > +			/* In case of multiple sub-devices of the
> > same pdev
> > +			 * assigned, we should allow multiple bind
> > calls with
> > +			 * the same PASID and pdev.
> > +			 */
> > +			sdev->users++;
> > +			goto out;
> > +		}
> > +	} else {
> > +		/* We come here when PASID has never been bound to a device. */
> > +		svm = kzalloc(sizeof(*svm), GFP_KERNEL);
> > +		if (!svm) {
> > +			ret = -ENOMEM;
> > +			goto out;
> > +		}
> > +		/* REVISIT: upper layer/VFIO can track host process that binds the PASID.
> > +		 * ioasid_set = mm might be sufficient for vfio to check pasid VMM
> > +		 * ownership.
> > +		 */
> 
> Does this mean you need more work to enhance this? Or just a hint?
> 
> > +		svm->mm = get_task_mm(current);
> > +		svm->pasid = data->hpasid;
> > +		if (data->flags & IOMMU_SVA_GPASID_VAL) {
> > +			svm->gpasid = data->gpasid;
> > +			svm->flags |= SVM_FLAG_GUEST_PASID;
> > +		}
> > +		ioasid_set_data(data->hpasid, svm);
> > +		INIT_LIST_HEAD_RCU(&svm->devs);
> > +		INIT_LIST_HEAD(&svm->list);
> > +
> > +		mmput(svm->mm);
> > +	}
> > +	sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
> > +	if (!sdev) {
> > +		if (list_empty(&svm->devs)) {
> > +			ioasid_set_data(data->hpasid, NULL);
> > +			kfree(svm);
> > +		}
> > +		ret = -ENOMEM;
> > +		goto out;
> > +	}
> > +	sdev->dev = dev;
> > +	sdev->users = 1;
> > +
> > +	/* Set up device context entry for PASID if not enabled already */
> > +	ret = intel_iommu_enable_pasid(iommu, sdev->dev);
> > +	if (ret) {
> > +		dev_err(dev, "Failed to enable PASID capability\n");
> > +		kfree(sdev);  
> 
> Need to free svm if both svm and sdev are allocated in this function.
> 
Right, I will check whether the svm->devs list is empty to determine if
svm was allocated in this function, and free it if so.
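
Something like this (a sketch, mirroring the cleanup already done in the
intel_pasid_setup_nested() error path further down):

	if (ret) {
		dev_err(dev, "Failed to enable PASID capability\n");
		kfree(sdev);
		/* Undo the svm allocation only if it was created in this call */
		if (list_empty(&svm->devs)) {
			ioasid_set_data(data->hpasid, NULL);
			kfree(svm);
		}
		goto out;
	}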
> > +		goto out;
> > +	}
> > +
> > +	/*
> > +	 * For guest bind, we need to set up PASID table entry as follows:
> > +	 * - FLPM matches guest paging mode
> > +	 * - turn on nested mode
> > +	 * - SL guest address width matching
> > +	 */
> > +	ret = intel_pasid_setup_nested(iommu,
> > +				dev,
> > +				(pgd_t *)data->gpgd,
> > +				data->hpasid,
> > +				&data->vtd,
> > +				ddomain,
> > +				data->addr_width);  
> 
> Align with the left parenthesis.
> 
Sounds good.
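
i.e.

	ret = intel_pasid_setup_nested(iommu, dev, (pgd_t *)data->gpgd,
				       data->hpasid, &data->vtd, ddomain,
				       data->addr_width);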
Thanks!
> Best regards,
> baolu
> 
> > +	if (ret) {
> > +		dev_err(dev, "Failed to set up PASID %llu in nested mode, Err %d\n",
> > +			data->hpasid, ret);
> > +		/*
> > +		 * PASID entry should be in cleared state if nested mode
> > +		 * set up failed. So we only need to clear IOASID tracking
> > +		 * data such that free call will succeed.
> > +		 */
> > +		ioasid_set_data(data->hpasid, NULL);
> > +		kfree(sdev);
> > +		if (list_empty(&svm->devs))
> > +			kfree(svm);
> > +
> > +		goto out;
> > +	}
> > +	svm->flags |= SVM_FLAG_GUEST_MODE;
> > +
> > +	init_rcu_head(&sdev->rcu);
> > +	list_add_rcu(&sdev->list, &svm->devs);
> > + out:
> > +	mutex_unlock(&pasid_mutex);
> > +	return ret;
> > +}
> > +
> > +int intel_svm_unbind_gpasid(struct device *dev, int pasid)
> > +{
> > +	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> > +	struct intel_svm_dev *sdev;
> > +	struct intel_svm *svm;
> > +	int ret = -EINVAL;
> > +
> > +	if (WARN_ON(!iommu))
> > +		return -EINVAL;
> > +
> > +	mutex_lock(&pasid_mutex);
> > +	svm = ioasid_find(NULL, pasid, NULL);
> > +	if (!svm) {
> > +		ret = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	if (IS_ERR(svm)) {
> > +		ret = PTR_ERR(svm);
> > +		goto out;
> > +	}
> > +
> > +	for_each_svm_dev(sdev, svm, dev) {
> > +		ret = 0;
> > +		sdev->users--;
> > +		if (!sdev->users) {
> > +			list_del_rcu(&sdev->list);
> > +			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
> > +			/* TODO: Drain in flight PRQ for the PASID since it
> > +			 * may get reused soon, we don't want to
> > +			 * confuse with its previous life.
> > +			 * intel_svm_drain_prq(dev, pasid);
> > +			 */
> > +			kfree_rcu(sdev, rcu);
> > +
> > +			if (list_empty(&svm->devs)) {
> > +				list_del(&svm->list);
> > +				/*
> > +				 * We do not free PASID here until explicit call
> > +				 * from VFIO to free. The PASID life cycle
> > +				 * management is largely tied to VFIO management
> > +				 * of assigned device life cycles. In case of
> > +				 * guest exit without an explicit free PASID call,
> > +				 * the responsibility lies in the VFIO layer to free
> > +				 * the PASIDs allocated for the guest.
> > +				 * For security reasons, VFIO has to track the
> > +				 * PASID ownership per guest anyway to ensure
> > +				 * that a PASID allocated by one guest cannot be
> > +				 * used by another.
> > +				 */
> > +				ioasid_set_data(pasid, NULL);
> > +				kfree(svm);
> > +			}
> > +		}
> > +		break;
> > +	}
> > +out:
> > +	mutex_unlock(&pasid_mutex);
> > +
> > +	return ret;
> > +}
> > +
> >   int intel_svm_bind_mm(struct device *dev, int *pasid, int flags,
> >   			  struct svm_dev_ops *ops)
> >   {
> >   	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> > index 19bf9ff180ae..412a90cb1738 100644
> > --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -671,7 +671,9 @@ int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev);
> >   extern void intel_svm_check(struct intel_iommu *iommu);
> >   extern int intel_svm_enable_prq(struct intel_iommu *iommu);
> >   extern int intel_svm_finish_prq(struct intel_iommu *iommu);
> > -
> > +extern int intel_svm_bind_gpasid(struct iommu_domain *domain,
> > +		struct device *dev, struct iommu_gpasid_bind_data *data);
> > +extern int intel_svm_unbind_gpasid(struct device *dev, int pasid);
> >   struct svm_dev_ops;
> >   
> >   struct intel_svm_dev {
> > @@ -688,9 +690,13 @@ struct intel_svm_dev {
> >   struct intel_svm {
> >   	struct mmu_notifier notifier;
> >   	struct mm_struct *mm;
> > +
> >   	struct intel_iommu *iommu;
> >   	int flags;
> >   	int pasid;
> > +	int gpasid; /* Guest PASID in case of vSVA bind with non-identity host
> > +		     * to guest PASID mapping.
> > +		     */
> >   	struct list_head devs;
> >   	struct list_head list;
> >   };
> > diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
> > index 94f047a8a845..a2c189ad0b01 100644
> > --- a/include/linux/intel-svm.h
> > +++ b/include/linux/intel-svm.h
> > @@ -44,6 +44,23 @@ struct svm_dev_ops {
> >    * do such IOTLB flushes automatically.
> >    */
> >   #define SVM_FLAG_SUPERVISOR_MODE	(1<<1)
> > +/*
> > + * The SVM_FLAG_GUEST_MODE flag is used when a guest process binds to a device.
> > + * In this case the mm_struct is in the guest kernel or userspace, its life
> > + * cycle is managed by VMM and VFIO layer. For IOMMU driver, this API provides
> > + * means to bind/unbind guest CR3 with PASIDs allocated for a device.
> > + */
> > +#define SVM_FLAG_GUEST_MODE	(1<<2)
> > +/*
> > + * The SVM_FLAG_GUEST_PASID flag is used when a guest has its own PASID space,
> > + * which requires guest and host PASID translation in both directions. We keep
> > + * track of guest PASID in order to provide lookup service to device drivers.
> > + * One such example is a physical function (PF) driver that supports mediated
> > + * device (mdev) assignment. Guest programming of mdev configuration space can
> > + * only be done with guest PASID, therefore PF driver needs to find the matching
> > + * host PASID to program the real hardware.
> > + */
> > +#define SVM_FLAG_GUEST_PASID	(1<<3)
> >   
> >   #ifdef CONFIG_INTEL_IOMMU_SVM
> >   
> >   

[Jacob Pan]

* Re: [PATCH v8 04/10] iommu/vt-d: Support flushing more translation cache types
  2019-12-19  2:46   ` Lu Baolu
@ 2020-01-09 21:50     ` Jacob Pan
  2020-01-10  1:17       ` Lu Baolu
  0 siblings, 1 reply; 27+ messages in thread
From: Jacob Pan @ 2020-01-09 21:50 UTC (permalink / raw)
  To: Lu Baolu
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Tian, Kevin,
	Raj Ashok, Yi Liu, Eric Auger, jacob.jun.pan

On Thu, 19 Dec 2019 10:46:51 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> Hi,
> 
> On 12/17/19 3:24 AM, Jacob Pan wrote:
> > When Shared Virtual Memory is exposed to a guest via vIOMMU,
> > scalable IOTLB invalidation may be passed down from outside IOMMU
> > subsystems. This patch adds invalidation functions that can be used
> > for additional translation cache types.
> > 
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> >   drivers/iommu/dmar.c        | 46 +++++++++++++++++++++++++++++++++++++++++++++
> >   drivers/iommu/intel-pasid.c |  3 ++-
> >   include/linux/intel-iommu.h | 21 +++++++++++++++++----
> >   3 files changed, 65 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> > index 3acfa6a25fa2..f2f5d75da94a 100644
> > --- a/drivers/iommu/dmar.c
> > +++ b/drivers/iommu/dmar.c
> > @@ -1348,6 +1348,20 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
> >   	qi_submit_sync(&desc, iommu);
> >   }
> >   
> > +/* PASID-based IOTLB Invalidate */
> > +void qi_flush_iotlb_pasid(struct intel_iommu *iommu, u16 did, u64 addr, u32 pasid,
> > +		unsigned int size_order, u64 granu, int ih)
> > +{
> > +	struct qi_desc desc = {.qw2 = 0, .qw3 = 0};
> > +
> > +	desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
> > +		QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE;
> > +	desc.qw1 = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_IH(ih) |
> > +		QI_EIOTLB_AM(size_order);
> > +
> > +	qi_submit_sync(&desc, iommu);
> > +}  
> 
> There's another version of pasid-based iotlb invalidation.
> 
> https://lkml.org/lkml/2019/12/10/2128
> 
> Let's consider merging them.
> 
Absolutely. The difference I see is that the granularity is explicit
here, since here we handle invalidation requests from the guest. Perhaps
we can look at consolidation once this use case is supported?

> Best regards,
> baolu

[Jacob Pan]

* Re: [PATCH v8 06/10] iommu/vt-d: Cache virtual command capability register
  2019-12-18  3:25   ` Lu Baolu
@ 2020-01-09 21:59     ` Jacob Pan
  0 siblings, 0 replies; 27+ messages in thread
From: Jacob Pan @ 2020-01-09 21:59 UTC (permalink / raw)
  To: Lu Baolu
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Tian, Kevin,
	Raj Ashok, Yi Liu, Eric Auger, jacob.jun.pan

On Wed, 18 Dec 2019 11:25:27 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> Hi,
> 
> On 12/17/19 3:24 AM, Jacob Pan wrote:
> > Virtual command registers are used in the guest only. To avoid vmexit
> > cost, we cache the capability register during initialization.
> > 
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> >   drivers/iommu/dmar.c        | 1 +
> >   include/linux/intel-iommu.h | 4 ++++
> >   2 files changed, 5 insertions(+)
> > 
> > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> > index f2f5d75da94a..3f98dd9ad004 100644
> > --- a/drivers/iommu/dmar.c
> > +++ b/drivers/iommu/dmar.c
> > @@ -953,6 +953,7 @@ static int map_iommu(struct intel_iommu *iommu, u64 phys_addr)
> >   		warn_invalid_dmar(phys_addr, " returns all ones");
> >   		goto unmap;
> >   	}
> > +	iommu->vccap = dmar_readq(iommu->reg + DMAR_VCCAP_REG);
> >   
> >   	/* the registers might be more than one page */
> >   	map_size = max_t(int, ecap_max_iotlb_offset(iommu->ecap),
> > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> > index ee26989df008..4d25141ec3df 100644
> > --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -189,6 +189,9 @@
> >   #define ecap_max_handle_mask(e) ((e >> 20) & 0xf)
> >   #define ecap_sc_support(e)	((e >> 7) & 0x1) /* Snooping Control */
> >   
> > +/* Virtual command interface capabilities */
> > +#define vccap_pasid(v)		((v & DMA_VCS_PAS)) /* PASID allocation */
> 
> Has DMA_VCS_PAS ever been defined?
> 
Good catch. It is defined in the next patch; I need to move the #define
here. Thanks!
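
For reference, the definition that needs to move (a sketch; assuming bit 0
of VCCAP_REG reports PASID allocation support):

	/* Support PASID allocation via the virtual command interface */
	#define DMA_VCS_PAS	((u64)1)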

> Best regards,
> baolu
> 
>  [...]  

[Jacob Pan]

* Re: [PATCH v8 08/10] iommu/vt-d: Add custom allocator for IOASID
  2019-12-18  4:10   ` Lu Baolu
@ 2020-01-09 22:06     ` Jacob Pan
  2020-01-10  1:19       ` Lu Baolu
  0 siblings, 1 reply; 27+ messages in thread
From: Jacob Pan @ 2020-01-09 22:06 UTC (permalink / raw)
  To: Lu Baolu
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Tian, Kevin,
	Raj Ashok, Yi Liu, Eric Auger, jacob.jun.pan

On Wed, 18 Dec 2019 12:10:55 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> Hi,
> 
> On 12/17/19 3:24 AM, Jacob Pan wrote:
> > When VT-d driver runs in the guest, PASID allocation must be
> > performed via virtual command interface. This patch registers a
> > custom IOASID allocator which takes precedence over the default
> > XArray based allocator. The resulting IOASID allocation will always
> > come from the host. This ensures that the PASID namespace is
> > system-wide.
> > 
> > Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> > Signed-off-by: Liu, Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> >   drivers/iommu/intel-iommu.c | 75 +++++++++++++++++++++++++++++++++++++++++++++
> >   include/linux/intel-iommu.h |  2 ++
> >   2 files changed, 77 insertions(+)
> > 
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index e90102c7540d..b0c0bb6f740e 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -1700,6 +1700,9 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
> >   		if (ecap_prs(iommu->ecap))
> >   			intel_svm_finish_prq(iommu);
> >   	}
> > +	if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap))
> > +		ioasid_unregister_allocator(&iommu->pasid_allocator);
> > +
> >   #endif
> >   }
> >   
> > @@ -3181,6 +3184,75 @@ static int copy_translation_tables(struct intel_iommu *iommu)
> >   	return ret;
> >   }
> >   
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max, void *data)
> > +{
> > +	struct intel_iommu *iommu = data;
> > +	ioasid_t ioasid;
> > +  
> 
> Check !iommu just like the free api?
> 
Sounds good, I will return INVALID_IOASID if it is NULL.
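
i.e. at the top of intel_ioasid_alloc(), something like:

	if (!iommu)
		return INVALID_IOASID;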

> > +	/*
> > +	 * VT-d virtual command interface always uses the full 20 bit
> > +	 * PASID range. Host can partition guest PASID range based on
> > +	 * policies but it is out of guest's control.
> > +	 */
> > +	if (min < PASID_MIN || max > intel_pasid_max_id)
> > +		return INVALID_IOASID;
> > +
> > +	if (vcmd_alloc_pasid(iommu, &ioasid))
> > +		return INVALID_IOASID;
> > +
> > +	return ioasid;
> > +}
> > +
> > +static void intel_ioasid_free(ioasid_t ioasid, void *data)
> > +{
> > +	struct intel_iommu *iommu = data;
> > +
> > +	if (!iommu)
> > +		return;
> > +	/*
> > +	 * Sanity check of the ioasid owner is done at the upper layer,
> > +	 * e.g. VFIO. We can only free the PASID when all the devices
> > +	 * are unbound.
> > +	 */
> > +	if (ioasid_find(NULL, ioasid, NULL)) {
> > +		pr_alert("Cannot free active IOASID %d\n", ioasid);
> > +		return;
> > +	}
> > +	vcmd_free_pasid(iommu, ioasid);
> > +}
> > +
> > +static void register_pasid_allocator(struct intel_iommu *iommu)
> > +{
> > +	if (!intel_iommu_sm) {  
> 
> Use sm_supported(iommu) instead.
> 
Sounds good. It seems we could separate the SM code more cleanly in the
future to avoid all these checks.
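
i.e.

	if (!sm_supported(iommu)) {
		pr_warn("VT-d scalable mode not enabled\n");
		return;
	}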

> > +		pr_warn("VT-d scalable mode not enabled\n");
> > +		return;
> > +	}
> > +
> > +	/*
> > +	 * Register a custom PASID allocator if we are running in a guest,
> > +	 * guest PASID must be obtained via virtual command interface.
> > +	 * There can be multiple vIOMMUs in each guest but only one allocator
> > +	 * is active. All vIOMMU allocators will eventually be calling the same
> > +	 * host allocator.
> > +	 */
> > +	if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap)) {
> > +		pr_info("Register custom PASID allocator\n");
> > +		iommu->pasid_allocator.alloc = intel_ioasid_alloc;
> > +		iommu->pasid_allocator.free = intel_ioasid_free;
> > +		iommu->pasid_allocator.pdata = (void *)iommu;
> > +		if (ioasid_register_allocator(&iommu->pasid_allocator)) {
> > +			pr_warn("Custom PASID allocator failed, scalable mode disabled\n");
> > +			/*
> > +			 * Disable scalable mode on this IOMMU if there
> > +			 * is no custom allocator. Mixing SM capable vIOMMU
> > +			 * and non-SM vIOMMU is not supported.
> > +			 */
> > +			intel_iommu_sm = 0;
> > +		}
> > +	}
> > +}
> > +#endif
> > +
> >   static int __init init_dmars(void)
> >   {
> >   	struct dmar_drhd_unit *drhd;
> > @@ -3298,6 +3370,9 @@ static int __init init_dmars(void)
> >   	 */
> >   	for_each_active_iommu(iommu, drhd) {
> >   		iommu_flush_write_buffer(iommu);
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +		register_pasid_allocator(iommu);
> > +#endif
> >   		iommu_set_root_entry(iommu);
> >   		iommu->flush.flush_context(iommu, 0, 0, 0, DMA_CCMD_GLOBAL_INVL);
> >   		iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH);
> > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> > index 1e11560b0e59..8c30b23bd838 100644
> > --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -19,6 +19,7 @@
> >   #include <linux/iommu.h>
> >   #include <linux/io-64-nonatomic-lo-hi.h>
> >   #include <linux/dmar.h>
> > +#include <linux/ioasid.h>
> >   
> >   #include <asm/cacheflush.h>
> >   #include <asm/iommu.h>
> > @@ -557,6 +558,7 @@ struct intel_iommu {
> >   #ifdef CONFIG_INTEL_IOMMU_SVM
> >   	struct page_req_dsc *prq;
> >   	unsigned char prq_name[16];    /* Name for PRQ interrupt */
> > +	struct ioasid_allocator_ops pasid_allocator; /* Custom allocator for PASIDs */
> >   #endif
> >   	struct q_inval  *qi;            /* Queued invalidation info */
> >   	u32 *iommu_state; /* Store iommu states between suspend and resume.*/
> 
> Best regards,
> baolu

[Jacob Pan]

* Re: [PATCH v8 02/10] iommu/vt-d: Add nested translation helper function
  2020-01-09 18:39     ` Jacob Pan
@ 2020-01-10  1:15       ` Lu Baolu
  2020-01-10 18:25         ` Jacob Pan
  0 siblings, 1 reply; 27+ messages in thread
From: Lu Baolu @ 2020-01-10  1:15 UTC (permalink / raw)
  To: Jacob Pan
  Cc: baolu.lu, iommu, LKML, Joerg Roedel, David Woodhouse, Tian,
	Kevin, Raj Ashok, Yi Liu, Eric Auger, Yi L

Hi Jacob,

On 1/10/20 2:39 AM, Jacob Pan wrote:
> On Wed, 18 Dec 2019 10:41:53 +0800
> Lu Baolu <baolu.lu@linux.intel.com> wrote:
> 
>> Hi again,
>>
>> On 12/17/19 3:24 AM, Jacob Pan wrote:
>>> +/**
>>> + * intel_pasid_setup_nested() - Set up PASID entry for nested translation
>>> + * which is used for vSVA. The first level page tables are used for
>>> + * GVA-GPA or GIOVA-GPA translation in the guest, second level page tables
>>> + * are used for GPA-HPA translation.
>>> + *
>>> + * @iommu:      IOMMU which the device belongs to
>>> + * @dev:        Device to be set up for translation
>>> + * @gpgd:       FLPTPTR: First Level Page translation pointer in GPA
>>> + * @pasid:      PASID to be programmed in the device PASID table
>>> + * @pasid_data: Additional PASID info from the guest bind request
>>> + * @domain:     Domain info for setting up second level page tables
>>> + * @addr_width: Address width of the first level (guest)
>>> + */
>>> +int intel_pasid_setup_nested(struct intel_iommu *iommu,
>>> +			struct device *dev, pgd_t *gpgd,
>>> +			int pasid, struct iommu_gpasid_bind_data_vtd *pasid_data,
>>> +			struct dmar_domain *domain,
>>> +			int addr_width)
>>> +{
>>> +	struct pasid_entry *pte;
>>> +	struct dma_pte *pgd;
>>> +	u64 pgd_val;
>>> +	int agaw;
>>> +	u16 did;
>>> +
>>> +	if (!ecap_nest(iommu->ecap)) {
>>> +		pr_err("IOMMU: %s: No nested translation support\n",
>>> +		       iommu->name);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	pte = intel_pasid_get_entry(dev, pasid);
>>> +	if (WARN_ON(!pte))
>>> +		return -EINVAL;
>>> +
>>> +	pasid_clear_entry(pte);
>>
>> In some cases, e.g. nested mode for GIOVA-HPA, the PASID entry might
>> have already been set up for second level translation. (This could be
>> checked with the Present bit.) Hence, it's safe to flush caches here.
>>
>> Or, maybe intel_pasid_tear_down_entry() is more suitable?
>>
> We don't allow binding the same device-PASID pair twice, so if the PASID
> entry was used for GIOVA/RID2PASID, it should be unbound first, with the
> teardown flush included, right?
> 

Fair enough. Can you please add this as a comment to this function, so
that the caller of this interface can know this? Or add a check in this
function that returns an error if the pasid entry has already been bound.

Best regards,
baolu

* Re: [PATCH v8 04/10] iommu/vt-d: Support flushing more translation cache types
  2020-01-09 21:50     ` Jacob Pan
@ 2020-01-10  1:17       ` Lu Baolu
  0 siblings, 0 replies; 27+ messages in thread
From: Lu Baolu @ 2020-01-10  1:17 UTC (permalink / raw)
  To: Jacob Pan
  Cc: baolu.lu, iommu, LKML, Joerg Roedel, David Woodhouse, Tian,
	Kevin, Raj Ashok, Yi Liu, Eric Auger

Hi,

On 1/10/20 5:50 AM, Jacob Pan wrote:
> On Thu, 19 Dec 2019 10:46:51 +0800
> Lu Baolu <baolu.lu@linux.intel.com> wrote:
> 
>> Hi,
>>
>> On 12/17/19 3:24 AM, Jacob Pan wrote:
>>> When Shared Virtual Memory is exposed to a guest via vIOMMU,
>>> scalable IOTLB invalidation may be passed down from outside IOMMU
>>> subsystems. This patch adds invalidation functions that can be used
>>> for additional translation cache types.
>>>
>>> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
>>> ---
>>>    drivers/iommu/dmar.c        | 46 +++++++++++++++++++++++++++++++++++++++++++++
>>>    drivers/iommu/intel-pasid.c |  3 ++-
>>>    include/linux/intel-iommu.h | 21 +++++++++++++++++----
>>>    3 files changed, 65 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
>>> index 3acfa6a25fa2..f2f5d75da94a 100644
>>> --- a/drivers/iommu/dmar.c
>>> +++ b/drivers/iommu/dmar.c
>>> @@ -1348,6 +1348,20 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
>>>    	qi_submit_sync(&desc, iommu);
>>>    }
>>>    
>>> +/* PASID-based IOTLB Invalidate */
>>> +void qi_flush_iotlb_pasid(struct intel_iommu *iommu, u16 did, u64 addr, u32 pasid,
>>> +		unsigned int size_order, u64 granu, int ih)
>>> +{
>>> +	struct qi_desc desc = {.qw2 = 0, .qw3 = 0};
>>> +
>>> +	desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
>>> +		QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE;
>>> +	desc.qw1 = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_IH(ih) |
>>> +		QI_EIOTLB_AM(size_order);
>>> +
>>> +	qi_submit_sync(&desc, iommu);
>>> +}
>> There's another version of pasid-based iotlb invalidation.
>>
>> https://lkml.org/lkml/2019/12/10/2128
>>
>> Let's consider merging them.
>>
> Absolutely. The difference I see is that the granularity is explicit
> here, since here we handle invalidation requests from the guest. Perhaps
> we can look at consolidation once this use case is supported?
> 

Looks good to me. :-)

Best regards,
baolu

* Re: [PATCH v8 08/10] iommu/vt-d: Add custom allocator for IOASID
  2020-01-09 22:06     ` Jacob Pan
@ 2020-01-10  1:19       ` Lu Baolu
  0 siblings, 0 replies; 27+ messages in thread
From: Lu Baolu @ 2020-01-10  1:19 UTC (permalink / raw)
  To: Jacob Pan
  Cc: baolu.lu, iommu, LKML, Joerg Roedel, David Woodhouse, Tian,
	Kevin, Raj Ashok, Yi Liu, Eric Auger

Hi,

On 1/10/20 6:06 AM, Jacob Pan wrote:
>>> +static void register_pasid_allocator(struct intel_iommu *iommu)
>>> +{
>>> +	if (!intel_iommu_sm) {
>> Use sm_supported(iommu) instead.
>>
> Sounds good. It seems we could separate the SM code more cleanly in the
> future to avoid all these checks.
> 

Agreed.

Best regards,
baolu

* Re: [PATCH v8 02/10] iommu/vt-d: Add nested translation helper function
  2020-01-10  1:15       ` Lu Baolu
@ 2020-01-10 18:25         ` Jacob Pan
  0 siblings, 0 replies; 27+ messages in thread
From: Jacob Pan @ 2020-01-10 18:25 UTC (permalink / raw)
  To: Lu Baolu
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Tian, Kevin,
	Raj Ashok, Yi Liu, Eric Auger, Yi L, jacob.jun.pan

On Fri, 10 Jan 2020 09:15:45 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> Hi Jacob,
> 
> On 1/10/20 2:39 AM, Jacob Pan wrote:
> > On Wed, 18 Dec 2019 10:41:53 +0800
> > Lu Baolu <baolu.lu@linux.intel.com> wrote:
> >   
> >> Hi again,
> >>
> >> On 12/17/19 3:24 AM, Jacob Pan wrote:  
> >>> +/**
> >>> + * intel_pasid_setup_nested() - Set up PASID entry for nested translation
> >>> + * which is used for vSVA. The first level page tables are used for
> >>> + * GVA-GPA or GIOVA-GPA translation in the guest, second level page tables
> >>> + * are used for GPA-HPA translation.
> >>> + *
> >>> + * @iommu:      IOMMU which the device belongs to
> >>> + * @dev:        Device to be set up for translation
> >>> + * @gpgd:       FLPTPTR: First Level Page translation pointer in GPA
> >>> + * @pasid:      PASID to be programmed in the device PASID table
> >>> + * @pasid_data: Additional PASID info from the guest bind request
> >>> + * @domain:     Domain info for setting up second level page tables
> >>> + * @addr_width: Address width of the first level (guest)
> >>> + */
> >>> +int intel_pasid_setup_nested(struct intel_iommu *iommu,
> >>> +			struct device *dev, pgd_t *gpgd,
> >>> +			int pasid, struct iommu_gpasid_bind_data_vtd *pasid_data,
> >>> +			struct dmar_domain *domain,
> >>> +			int addr_width)
> >>> +{
> >>> +	struct pasid_entry *pte;
> >>> +	struct dma_pte *pgd;
> >>> +	u64 pgd_val;
> >>> +	int agaw;
> >>> +	u16 did;
> >>> +
> >>> +	if (!ecap_nest(iommu->ecap)) {
> >>> +		pr_err("IOMMU: %s: No nested translation support\n",
> >>> +		       iommu->name);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +	pte = intel_pasid_get_entry(dev, pasid);
> >>> +	if (WARN_ON(!pte))
> >>> +		return -EINVAL;
> >>> +
> >>> +	pasid_clear_entry(pte);  
> >>
> >> In some cases, e.g. nested mode for GIOVA-HPA, the PASID entry
> >> might have already been set up for second level translation. (This
> >> could be checked with the Present bit.) Hence, it's safe to flush
> >> caches here.
> >>
> >> Or, maybe intel_pasid_tear_down_entry() is more suitable?
> >>  
> > We don't allow binding the same device-PASID pair twice, so if the PASID
> > entry was used for GIOVA/RID2PASID, it should be unbound first, with the
> > teardown flush included, right?
> >   
> 
> Fair enough. Can you please add this as a comment to this function, so
> that the caller of this interface can know this? Or add a check in
> this function that returns an error if the pasid entry has already been
> bound.
> 
Sounds good, I will add both the comment and the check, like this:

	/*
	 * Caller must ensure PASID entry is not in use, i.e. the same
	 * PASID is not bound to the same device twice.
	 */
	if (pasid_pte_is_present(pte))
		return -EBUSY;

We already have the check in the current caller.
Thanks,
> Best regards,
> baolu

[Jacob Pan]

end of thread

Thread overview: 27+ messages
2019-12-16 19:24 [PATCH v8 00/10] Nested Shared Virtual Address (SVA) VT-d support Jacob Pan
2019-12-16 19:24 ` [PATCH v8 01/10] iommu/vt-d: Move domain helper to header Jacob Pan
2019-12-16 19:24 ` [PATCH v8 02/10] iommu/vt-d: Add nested translation helper function Jacob Pan
2019-12-18  2:01   ` Lu Baolu
2020-01-09 17:51     ` Jacob Pan
2019-12-18  2:41   ` Lu Baolu
2020-01-09 18:39     ` Jacob Pan
2020-01-10  1:15       ` Lu Baolu
2020-01-10 18:25         ` Jacob Pan
2019-12-16 19:24 ` [PATCH v8 03/10] iommu/vt-d: Add bind guest PASID support Jacob Pan
2019-12-18  3:14   ` Lu Baolu
2020-01-09 21:45     ` Jacob Pan
2019-12-16 19:24 ` [PATCH v8 04/10] iommu/vt-d: Support flushing more translation cache types Jacob Pan
2019-12-19  2:46   ` Lu Baolu
2020-01-09 21:50     ` Jacob Pan
2020-01-10  1:17       ` Lu Baolu
2019-12-16 19:24 ` [PATCH v8 05/10] iommu/vt-d: Add svm/sva invalidate function Jacob Pan
2019-12-16 19:24 ` [PATCH v8 06/10] iommu/vt-d: Cache virtual command capability register Jacob Pan
2019-12-18  3:25   ` Lu Baolu
2020-01-09 21:59     ` Jacob Pan
2019-12-16 19:24 ` [PATCH v8 07/10] iommu/vt-d: Enlightened PASID allocation Jacob Pan
2019-12-16 19:24 ` [PATCH v8 08/10] iommu/vt-d: Add custom allocator for IOASID Jacob Pan
2019-12-18  4:10   ` Lu Baolu
2020-01-09 22:06     ` Jacob Pan
2020-01-10  1:19       ` Lu Baolu
2019-12-16 19:24 ` [PATCH v8 09/10] iommu/ioasid: Add notifier for status change Jacob Pan
2019-12-16 19:24 ` [PATCH v8 10/10] iommu/vt-d: Handle IOASID notifications Jacob Pan
