* [PATCH v3 0/3] Support for nesting IOMMUs in VFIO
@ 2014-09-02  9:53 Will Deacon
       [not found] ` <1409651619-27342-1-git-send-email-will.deacon-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Will Deacon @ 2014-09-02  9:53 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Will Deacon

Hello,

This is version three of the patches I originally posted here:

  RFCv1: http://permalink.gmane.org/gmane.linux.kernel.iommu/5552
  RFCv2: http://permalink.gmane.org/gmane.linux.kernel.iommu/5700

Changes since RFCv2 include:

  - Dropped the RFC tag
  - Rebased onto 3.17-rc*
  - Iterate the supported bus_types (currently just PCI) and check that
    nesting is actually supported

The corresponding arm-smmu changes are included to show how the new
domain attribute can be used.

All feedback welcome,

Will

-->8

Will Deacon (3):
  iommu: introduce domain attribute for nesting IOMMUs
  vfio/iommu_type1: add new VFIO_TYPE1_NESTING_IOMMU IOMMU type
  iommu/arm-smmu: add support for DOMAIN_ATTR_NESTING attribute

 drivers/iommu/arm-smmu.c        | 110 ++++++++++++++++++++++++++++++++--------
 drivers/vfio/vfio_iommu_type1.c |  87 ++++++++++++++++++++++++++++---
 include/linux/iommu.h           |   1 +
 include/uapi/linux/vfio.h       |   2 +
 4 files changed, 173 insertions(+), 27 deletions(-)

-- 
2.1.0


* [PATCH v3 1/3] iommu: introduce domain attribute for nesting IOMMUs
       [not found] ` <1409651619-27342-1-git-send-email-will.deacon-5wv7dgnIgG8@public.gmane.org>
@ 2014-09-02  9:53   ` Will Deacon
  2014-09-02  9:53   ` [PATCH v3 2/3] vfio/iommu_type1: add new VFIO_TYPE1_NESTING_IOMMU IOMMU type Will Deacon
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Will Deacon @ 2014-09-02  9:53 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Will Deacon

Some IOMMUs, such as the ARM SMMU, support two stages of translation.
The idea behind such a scheme is to allow a guest operating system to
use the IOMMU for DMA mappings in the first stage of translation, with
the hypervisor then installing mappings in the second stage to provide
isolation of the DMA to the physical range assigned to that virtual
machine.

In order to allow IOMMU domains to be used for second-stage translation,
this patch adds a new iommu_attr (DOMAIN_ATTR_NESTING) for setting
second-stage domains prior to device attach. The attribute can also be
queried to see if a domain is actually making use of nesting.
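
As a rough sketch of the intended usage (illustrative only, not part of
this patch; the helper below is hypothetical and error handling is
trimmed), a caller such as VFIO would set the attribute on a freshly
allocated domain before attaching any devices:

  #include <linux/iommu.h>

  /* Hypothetical helper: ask for a nested (two-stage) domain. */
  static struct iommu_domain *alloc_nesting_domain(struct bus_type *bus)
  {
          struct iommu_domain *domain;
          int attr = 1;

          domain = iommu_domain_alloc(bus);
          if (!domain)
                  return NULL;

          /* Must be requested before any device or group is attached. */
          if (iommu_domain_set_attr(domain, DOMAIN_ATTR_NESTING, &attr)) {
                  /* The IOMMU driver cannot provide nesting here. */
                  iommu_domain_free(domain);
                  return NULL;
          }

          return domain;
  }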

Cc: Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
Cc: Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
---
 include/linux/iommu.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 20f9a527922a..7b02bcc85b9e 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -80,6 +80,7 @@ enum iommu_attr {
 	DOMAIN_ATTR_FSL_PAMU_STASH,
 	DOMAIN_ATTR_FSL_PAMU_ENABLE,
 	DOMAIN_ATTR_FSL_PAMUV1,
+	DOMAIN_ATTR_NESTING,	/* two stages of translation */
 	DOMAIN_ATTR_MAX,
 };
 
-- 
2.1.0


* [PATCH v3 2/3] vfio/iommu_type1: add new VFIO_TYPE1_NESTING_IOMMU IOMMU type
       [not found] ` <1409651619-27342-1-git-send-email-will.deacon-5wv7dgnIgG8@public.gmane.org>
  2014-09-02  9:53   ` [PATCH v3 1/3] iommu: introduce domain attribute for nesting IOMMUs Will Deacon
@ 2014-09-02  9:53   ` Will Deacon
       [not found]     ` <1409651619-27342-3-git-send-email-will.deacon-5wv7dgnIgG8@public.gmane.org>
  2014-09-02  9:53   ` [PATCH v3 3/3] iommu/arm-smmu: add support for DOMAIN_ATTR_NESTING attribute Will Deacon
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Will Deacon @ 2014-09-02  9:53 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Will Deacon

VFIO allows devices to be safely handed off to userspace by putting
them behind an IOMMU configured to ensure DMA and interrupt isolation.
This enables userspace KVM clients, such as kvmtool and qemu, to further
map the device into a virtual machine.

With IOMMUs such as the ARM SMMU, it is then possible to provide SMMU
translation services to the guest operating system, which are nested
with the existing translation installed by VFIO. However, enabling this
feature means that the IOMMU driver must be informed that the VFIO domain
is being created for the purposes of nested translation.

This patch adds a new IOMMU type (VFIO_TYPE1_NESTING_IOMMU) to the VFIO
type-1 driver. The new IOMMU type acts identically to the
VFIO_TYPE1v2_IOMMU type, but additionally sets the DOMAIN_ATTR_NESTING
attribute on its IOMMU domains. Userspace can check whether nesting is
actually available using the VFIO_CHECK_EXTENSION ioctl, in a similar
manner to checking for cache-coherent DMA.
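
For illustration only, a minimal userspace sequence might look roughly
like the following (untested sketch; the helper name and group path are
made up, and error handling is omitted):

  #include <fcntl.h>
  #include <sys/ioctl.h>
  #include <linux/vfio.h>

  /* Hypothetical helper: set up a container using nested translation. */
  static int open_nesting_container(const char *group_path)
  {
          int container = open("/dev/vfio/vfio", O_RDWR);
          int group = open(group_path, O_RDWR);  /* e.g. "/dev/vfio/26" */

          /* Is nesting advertised at all? */
          if (ioctl(container, VFIO_CHECK_EXTENSION,
                    VFIO_TYPE1_NESTING_IOMMU) != 1)
                  return -1;  /* caller falls back to VFIO_TYPE1v2_IOMMU */

          ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
          ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU);

          return container;
  }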

Cc: Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
Cc: Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
---
 drivers/vfio/vfio_iommu_type1.c | 87 +++++++++++++++++++++++++++++++++++++----
 include/uapi/linux/vfio.h       |  2 +
 2 files changed, 82 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 0734fbe5b651..50aa56c7b6a7 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -53,11 +53,15 @@ module_param_named(disable_hugepages,
 MODULE_PARM_DESC(disable_hugepages,
 		 "Disable VFIO IOMMU support for IOMMU hugepages.");
 
+/* Feature flags for VFIO Type-1 IOMMUs */
+#define VFIO_IOMMU_FEAT_V2	(1 << 0)
+#define VFIO_IOMMU_FEAT_NESTING	(1 << 1)
+
 struct vfio_iommu {
 	struct list_head	domain_list;
 	struct mutex		lock;
 	struct rb_root		dma_list;
-	bool v2;
+	int			features;
 };
 
 struct vfio_domain {
@@ -441,7 +445,7 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 	 * will only return success and a size of zero if there were no
 	 * mappings within the range.
 	 */
-	if (iommu->v2) {
+	if (iommu->features & VFIO_IOMMU_FEAT_V2) {
 		dma = vfio_find_dma(iommu, unmap->iova, 0);
 		if (dma && dma->iova != unmap->iova) {
 			ret = -EINVAL;
@@ -455,7 +459,8 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 	}
 
 	while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) {
-		if (!iommu->v2 && unmap->iova > dma->iova)
+		if (!(iommu->features & VFIO_IOMMU_FEAT_V2) &&
+		    unmap->iova > dma->iova)
 			break;
 		unmapped += dma->size;
 		vfio_remove_dma(iommu, dma);
@@ -671,7 +676,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	struct vfio_group *group, *g;
 	struct vfio_domain *domain, *d;
 	struct bus_type *bus = NULL;
-	int ret;
+	int ret, attr = 1;
 
 	mutex_lock(&iommu->lock);
 
@@ -705,6 +710,9 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 		goto out_free;
 	}
 
+	if (iommu->features & VFIO_IOMMU_FEAT_NESTING)
+		iommu_domain_set_attr(domain->domain, DOMAIN_ATTR_NESTING, &attr);
+
 	ret = iommu_attach_group(domain->domain, iommu_group);
 	if (ret)
 		goto out_domain;
@@ -815,12 +823,50 @@ done:
 	mutex_unlock(&iommu->lock);
 }
 
+static int vfio_iommus_have_nesting(void)
+{
+	static struct bus_type *bus_types[] = { &pci_bus_type };
+	int i, ret = 0;
+
+	for (i = 0; !ret && i < ARRAY_SIZE(bus_types); ++i) {
+		struct iommu_domain *domain;
+		struct bus_type *bus = bus_types[i];
+		int attr = 1;
+
+		if (!iommu_present(bus))
+			continue;
+
+		domain = iommu_domain_alloc(bus);
+		if (!domain)
+			continue;
+
+		if (!iommu_domain_set_attr(domain, DOMAIN_ATTR_NESTING, &attr))
+			ret = 1;
+
+		iommu_domain_free(domain);
+	}
+
+	return ret;
+}
+
 static void *vfio_iommu_type1_open(unsigned long arg)
 {
 	struct vfio_iommu *iommu;
-
-	if (arg != VFIO_TYPE1_IOMMU && arg != VFIO_TYPE1v2_IOMMU)
+	int features = 0;
+
+	switch (arg) {
+	case VFIO_TYPE1_IOMMU:
+		break;
+	case VFIO_TYPE1_NESTING_IOMMU:
+		if (!vfio_iommus_have_nesting())
+			return ERR_PTR(-EOPNOTSUPP);
+		features |= VFIO_IOMMU_FEAT_NESTING;
+	case VFIO_TYPE1v2_IOMMU:
+		features |= VFIO_IOMMU_FEAT_V2;
+		break;
+	default:
 		return ERR_PTR(-EINVAL);
+	}
 
 	iommu = kzalloc(sizeof(*iommu), GFP_KERNEL);
 	if (!iommu)
@@ -829,7 +875,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
 	INIT_LIST_HEAD(&iommu->domain_list);
 	iommu->dma_list = RB_ROOT;
 	mutex_init(&iommu->lock);
-	iommu->v2 = (arg == VFIO_TYPE1v2_IOMMU);
+	iommu->features = features;
 
 	return iommu;
 }
@@ -875,6 +921,29 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
 	return ret;
 }
 
+static int vfio_domains_have_iommu_nesting(struct vfio_iommu *iommu)
+{
+	struct vfio_domain *domain;
+	int ret = 1;
+
+	if (!(iommu->features & VFIO_IOMMU_FEAT_NESTING))
+		return 0;
+
+	mutex_lock(&iommu->lock);
+	list_for_each_entry(domain, &iommu->domain_list, next) {
+		int attr;
+
+		if (iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_NESTING,
+					   &attr) || !attr) {
+			ret = 0;
+			break;
+		}
+	}
+	mutex_unlock(&iommu->lock);
+
+	return ret;
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
 				   unsigned int cmd, unsigned long arg)
 {
@@ -886,6 +955,10 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		case VFIO_TYPE1_IOMMU:
 		case VFIO_TYPE1v2_IOMMU:
 			return 1;
+		case VFIO_TYPE1_NESTING_IOMMU:
+			if (!iommu)
+				return vfio_iommus_have_nesting();
+			return vfio_domains_have_iommu_nesting(iommu);
 		case VFIO_DMA_CC_IOMMU:
 			if (!iommu)
 				return 0;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 6612974c64bf..ac79f6214fd6 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -24,11 +24,13 @@
 #define VFIO_TYPE1_IOMMU		1
 #define VFIO_SPAPR_TCE_IOMMU		2
 #define VFIO_TYPE1v2_IOMMU		3
+
 /*
  * IOMMU enforces DMA cache coherence (ex. PCIe NoSnoop stripping).  This
  * capability is subject to change as groups are added or removed.
  */
 #define VFIO_DMA_CC_IOMMU		4
+#define VFIO_TYPE1_NESTING_IOMMU	5	/* Implies v2 */
 
 /* Check if EEH is supported */
 #define VFIO_EEH			5
-- 
2.1.0


* [PATCH v3 3/3] iommu/arm-smmu: add support for DOMAIN_ATTR_NESTING attribute
       [not found] ` <1409651619-27342-1-git-send-email-will.deacon-5wv7dgnIgG8@public.gmane.org>
  2014-09-02  9:53   ` [PATCH v3 1/3] iommu: introduce domain attribute for nesting IOMMUs Will Deacon
  2014-09-02  9:53   ` [PATCH v3 2/3] vfio/iommu_type1: add new VFIO_TYPE1_NESTING_IOMMU IOMMU type Will Deacon
@ 2014-09-02  9:53   ` Will Deacon
       [not found]     ` <1409651619-27342-4-git-send-email-will.deacon-5wv7dgnIgG8@public.gmane.org>
  2014-09-03 15:22   ` [PATCH v3 0/3] Support for nesting IOMMUs in VFIO Joerg Roedel
  2014-09-11 17:10   ` Will Deacon
  4 siblings, 1 reply; 12+ messages in thread
From: Will Deacon @ 2014-09-02  9:53 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Will Deacon

When domains are set with the DOMAIN_ATTR_NESTING flag, we must ensure
that we allocate them to stage-2 context banks if the hardware permits
it.

This patch adds support for the attribute to the ARM SMMU driver, with
the actual stage being determined by the features supported
by the hardware.

Signed-off-by: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/arm-smmu.c | 110 ++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 90 insertions(+), 20 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index d2d8cdaf4f0b..c745296b170f 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -404,9 +404,16 @@ struct arm_smmu_cfg {
 #define ARM_SMMU_CB_ASID(cfg)		((cfg)->cbndx)
 #define ARM_SMMU_CB_VMID(cfg)		((cfg)->cbndx + 1)
 
+enum arm_smmu_domain_stage {
+	ARM_SMMU_DOMAIN_S1 = 0,
+	ARM_SMMU_DOMAIN_S2,
+	ARM_SMMU_DOMAIN_NESTED,
+};
+
 struct arm_smmu_domain {
 	struct arm_smmu_device		*smmu;
 	struct arm_smmu_cfg		cfg;
+	enum arm_smmu_domain_stage	stage;
 	spinlock_t			lock;
 };
 
@@ -899,19 +906,46 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 	if (smmu_domain->smmu)
 		goto out_unlock;
 
-	if (smmu->features & ARM_SMMU_FEAT_TRANS_NESTED) {
+	/*
+	 * Mapping the requested stage onto what we support is surprisingly
+	 * complicated, mainly because the spec allows S1+S2 SMMUs without
+	 * support for nested translation. That means we end up with the
+	 * following table:
+	 *
+	 * Requested        Supported        Actual
+	 *     S1               N              S1
+	 *     S1             S1+S2            S1
+	 *     S1               S2             S2
+	 *     S1               S1             S1
+	 *     N                N              N
+	 *     N              S1+S2            S2
+	 *     N                S2             S2
+	 *     N                S1             S1
+	 *
+	 * Note that you can't actually request stage-2 mappings.
+	 */
+	if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1))
+		smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
+	if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S2))
+		smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
+
+	switch (smmu_domain->stage) {
+	case ARM_SMMU_DOMAIN_S1:
+		cfg->cbar = CBAR_TYPE_S1_TRANS_S2_BYPASS;
+		start = smmu->num_s2_context_banks;
+		break;
+	case ARM_SMMU_DOMAIN_NESTED:
 		/*
 		 * We will likely want to change this if/when KVM gets
 		 * involved.
 		 */
-		cfg->cbar = CBAR_TYPE_S1_TRANS_S2_BYPASS;
-		start = smmu->num_s2_context_banks;
-	} else if (smmu->features & ARM_SMMU_FEAT_TRANS_S1) {
-		cfg->cbar = CBAR_TYPE_S1_TRANS_S2_BYPASS;
-		start = smmu->num_s2_context_banks;
-	} else {
+	case ARM_SMMU_DOMAIN_S2:
 		cfg->cbar = CBAR_TYPE_S2_TRANS;
 		start = 0;
+		break;
+	default:
+		ret = -EINVAL;
+		goto out_unlock;
 	}
 
 	ret = __arm_smmu_alloc_bitmap(smmu->context_map, start,
@@ -1637,20 +1671,56 @@ static void arm_smmu_remove_device(struct device *dev)
 	iommu_group_remove_device(dev);
 }
 
+static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
+				    enum iommu_attr attr, void *data)
+{
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+
+	switch (attr) {
+	case DOMAIN_ATTR_NESTING:
+		*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
+		return 0;
+	default:
+		return -ENODEV;
+	}
+}
+
+static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
+				    enum iommu_attr attr, void *data)
+{
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+
+	switch (attr) {
+	case DOMAIN_ATTR_NESTING:
+		if (smmu_domain->smmu)
+			return -EPERM;
+		if (*(int *)data)
+			smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED;
+		else
+			smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
+
+		return 0;
+	default:
+		return -ENODEV;
+	}
+}
+
 static const struct iommu_ops arm_smmu_ops = {
-	.domain_init	= arm_smmu_domain_init,
-	.domain_destroy	= arm_smmu_domain_destroy,
-	.attach_dev	= arm_smmu_attach_dev,
-	.detach_dev	= arm_smmu_detach_dev,
-	.map		= arm_smmu_map,
-	.unmap		= arm_smmu_unmap,
-	.iova_to_phys	= arm_smmu_iova_to_phys,
-	.domain_has_cap	= arm_smmu_domain_has_cap,
-	.add_device	= arm_smmu_add_device,
-	.remove_device	= arm_smmu_remove_device,
-	.pgsize_bitmap	= (SECTION_SIZE |
-			   ARM_SMMU_PTE_CONT_SIZE |
-			   PAGE_SIZE),
+	.domain_init		= arm_smmu_domain_init,
+	.domain_destroy		= arm_smmu_domain_destroy,
+	.attach_dev		= arm_smmu_attach_dev,
+	.detach_dev		= arm_smmu_detach_dev,
+	.map			= arm_smmu_map,
+	.unmap			= arm_smmu_unmap,
+	.iova_to_phys		= arm_smmu_iova_to_phys,
+	.domain_has_cap		= arm_smmu_domain_has_cap,
+	.add_device		= arm_smmu_add_device,
+	.remove_device		= arm_smmu_remove_device,
+	.domain_get_attr	= arm_smmu_domain_get_attr,
+	.domain_set_attr	= arm_smmu_domain_set_attr,
+	.pgsize_bitmap		= (SECTION_SIZE |
+				   ARM_SMMU_PTE_CONT_SIZE |
+				   PAGE_SIZE),
 };
 
 static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
-- 
2.1.0


* RE: [PATCH v3 3/3] iommu/arm-smmu: add support for DOMAIN_ATTR_NESTING attribute
       [not found]     ` <1409651619-27342-4-git-send-email-will.deacon-5wv7dgnIgG8@public.gmane.org>
@ 2014-09-02 11:12       ` Varun Sethi
       [not found]         ` <6c60062d962340bbaeccb07c82eaae73-AZ66ij2kwaacCcN9WK45f+O6mTEJWrR4XA4E9RH9d+qIuWR1G4zioA@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Varun Sethi @ 2014-09-02 11:12 UTC (permalink / raw)
  To: Will Deacon, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hi Will,
We would still need a mechanism to distinguish stage-1 mappings from
stage-2 mappings, i.e. for nested translation we should be able to
specify whether a mapping corresponds to stage-1 or stage-2 translation.
Correspondingly, we would also require different context banks for
stage-1 and stage-2 translation.

Regards
Varun


* Re: [PATCH v3 3/3] iommu/arm-smmu: add support for DOMAIN_ATTR_NESTING attribute
       [not found]         ` <6c60062d962340bbaeccb07c82eaae73-AZ66ij2kwaacCcN9WK45f+O6mTEJWrR4XA4E9RH9d+qIuWR1G4zioA@public.gmane.org>
@ 2014-09-02 11:16           ` Will Deacon
  0 siblings, 0 replies; 12+ messages in thread
From: Will Deacon @ 2014-09-02 11:16 UTC (permalink / raw)
  To: Varun Sethi; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Tue, Sep 02, 2014 at 12:12:20PM +0100, Varun Sethi wrote:
> We would still need a mechanism to distinguish stage1 mapping from stage2
> mapping i.e. for nested translation we should be able to specify whether
> the mapping corresponds to stage1 or stage 2 translation. Also,
> correspondingly we would require different context banks for stage 1 and
> stage 2 translation.

Please take a look at my iommu/pci branch where I have much more support
for nested translation with the ARM SMMU driver. This series is purely about
getting the stage-2 part configured -- you need a virtual SMMU interface for
the stage-1 component.

Will


* Re: [PATCH v3 0/3] Support for nesting IOMMUs in VFIO
       [not found] ` <1409651619-27342-1-git-send-email-will.deacon-5wv7dgnIgG8@public.gmane.org>
                     ` (2 preceding siblings ...)
  2014-09-02  9:53   ` [PATCH v3 3/3] iommu/arm-smmu: add support for DOMAIN_ATTR_NESTING attribute Will Deacon
@ 2014-09-03 15:22   ` Joerg Roedel
       [not found]     ` <20140903152241.GK28786-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
  2014-09-11 17:10   ` Will Deacon
  4 siblings, 1 reply; 12+ messages in thread
From: Joerg Roedel @ 2014-09-03 15:22 UTC (permalink / raw)
  To: Will Deacon; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hi Will,

On Tue, Sep 02, 2014 at 10:53:36AM +0100, Will Deacon wrote:
> This is version three of the patches I originally posted here:
> 
>   RFCv1: http://permalink.gmane.org/gmane.linux.kernel.iommu/5552
>   RFCv2: http://permalink.gmane.org/gmane.linux.kernel.iommu/5700
> 
> Changes since RFCv2 include:
> 
>   - Dropped the RFC tag
>   - Rebased onto 3.17-rc*
>   - Iterate the supported bus_types (currently just PCI) and check that
>     nesting is actually supported
> 
> The corresponding arm-smmu changes are included to show how the new
> domain attribute can be used.

How is this implemented in the SMMU? Can there be only one stage-2
mapping, so that a device can be passed through to a KVM guest, or can
there be multiple stage-2 mappings in parallel (like with PRI/PASID on
PCI) to support multiple translation contexts on one device?


	Joerg


* Re: [PATCH v3 0/3] Support for nesting IOMMUs in VFIO
       [not found]     ` <20140903152241.GK28786-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
@ 2014-09-03 16:04       ` Will Deacon
  0 siblings, 0 replies; 12+ messages in thread
From: Will Deacon @ 2014-09-03 16:04 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Wed, Sep 03, 2014 at 04:22:42PM +0100, Joerg Roedel wrote:
> Hi Will,

Hi Joerg,

> On Tue, Sep 02, 2014 at 10:53:36AM +0100, Will Deacon wrote:
> > This is version three of the patches I originally posted here:
> > 
> >   RFCv1: http://permalink.gmane.org/gmane.linux.kernel.iommu/5552
> >   RFCv2: http://permalink.gmane.org/gmane.linux.kernel.iommu/5700
> > 
> > Changes since RFCv2 include:
> > 
> >   - Dropped the RFC tag
> >   - Rebased onto 3.17-rc*
> >   - Iterate the supported bus_types (currently just PCI) and check that
> >     nesting is actually supported
> > 
> > The corresponding arm-smmu changes are included to show how the new
> > domain attribute can be used.
> 
> How is this implemented in the SMMU? Can there be only one stage-2
> mapping, so that a device can be passed through to a KVM guest, or can
> there be multiple stage-2 mappings in parallel (like with PRI/PASID on
> PCI) to support multiple translation contexts on one device?

The restriction is that a stage-1 context (i.e. set of guest page tables)
can be associated with at most 1 stage-2 context. If a device is capable of
emitting multiple Stream IDs and the guest assigns those to different
stage-1 contexts, then I guess we could use different stage-2 contexts for
them, but it's not something I'd considered.

Note that the stage-2 context is used to walk the stage-1 page tables, so
you always need the guest memory to be mapped there.

Will


* Re: [PATCH v3 0/3] Support for nesting IOMMUs in VFIO
       [not found] ` <1409651619-27342-1-git-send-email-will.deacon-5wv7dgnIgG8@public.gmane.org>
                     ` (3 preceding siblings ...)
  2014-09-03 15:22   ` [PATCH v3 0/3] Support for nesting IOMMUs in VFIO Joerg Roedel
@ 2014-09-11 17:10   ` Will Deacon
  4 siblings, 0 replies; 12+ messages in thread
From: Will Deacon @ 2014-09-11 17:10 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hi Alex,

On Tue, Sep 02, 2014 at 10:53:36AM +0100, Will Deacon wrote:
> Hello,
> 
> This is version three of the patches I originally posted here:
> 
>   RFCv1: http://permalink.gmane.org/gmane.linux.kernel.iommu/5552
>   RFCv2: http://permalink.gmane.org/gmane.linux.kernel.iommu/5700
> 
> Changes since RFCv2 include:
> 
>   - Dropped the RFC tag
>   - Rebased onto 3.17-rc*
>   - Iterate the supported bus_types (currently just PCI) and check that
>     nesting is actually supported
> 
> The corresponding arm-smmu changes are included to show how the new
> domain attribute can be used.
> 
> All feedback welcome,

Do you have any comments on this series? I've got a fair amount of work
building on top of it, so it would be good to either fix it or see about
getting it merged.

Cheers,

Will


* Re: [PATCH v3 2/3] vfio/iommu_type1: add new VFIO_TYPE1_NESTING_IOMMU IOMMU type
       [not found]     ` <1409651619-27342-3-git-send-email-will.deacon-5wv7dgnIgG8@public.gmane.org>
@ 2014-09-11 19:12       ` Alex Williamson
       [not found]         ` <1410462748.2982.342.camel-85EaTFmN5p//9pzu0YdTqQ@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Alex Williamson @ 2014-09-11 19:12 UTC (permalink / raw)
  To: Will Deacon; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Tue, 2014-09-02 at 10:53 +0100, Will Deacon wrote:
> VFIO allows devices to be safely handed off to userspace by putting
> them behind an IOMMU configured to ensure DMA and interrupt isolation.
> This enables userspace KVM clients, such as kvmtool and qemu, to further
> map the device into a virtual machine.
> 
> With IOMMUs such as the ARM SMMU, it is then possible to provide SMMU
> translation services to the guest operating system, which are nested
> with the existing translation installed by VFIO. However, enabling this
> feature means that the IOMMU driver must be informed that the VFIO domain
> is being created for the purposes of nested translation.
> 
> This patch adds a new IOMMU type (VFIO_TYPE1_NESTING_IOMMU) to the VFIO
> type-1 driver. The new IOMMU type acts identically to the
> VFIO_TYPE1v2_IOMMU type, but additionally sets the DOMAIN_ATTR_NESTING
> attribute on its IOMMU domains. Userspace can check whether nesting is
> actually available using the VFIO_CHECK_EXTENSION ioctl, in a similar
> manner to checking for cache-coherent DMA.
> 
> Cc: Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
> Cc: Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 87 +++++++++++++++++++++++++++++++++++++----
>  include/uapi/linux/vfio.h       |  2 +
>  2 files changed, 82 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 0734fbe5b651..50aa56c7b6a7 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -53,11 +53,15 @@ module_param_named(disable_hugepages,
>  MODULE_PARM_DESC(disable_hugepages,
>  		 "Disable VFIO IOMMU support for IOMMU hugepages.");
>  
> +/* Feature flags for VFIO Type-1 IOMMUs */
> +#define VFIO_IOMMU_FEAT_V2	(1 << 0)
> +#define VFIO_IOMMU_FEAT_NESTING	(1 << 1)
> +
>  struct vfio_iommu {
>  	struct list_head	domain_list;
>  	struct mutex		lock;
>  	struct rb_root		dma_list;
> -	bool v2;
> +	int			features;

Personally I'd rather add a nesting bool than mask out flag bits.

>  };
>  
>  struct vfio_domain {
> @@ -441,7 +445,7 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>  	 * will only return success and a size of zero if there were no
>  	 * mappings within the range.
>  	 */
> -	if (iommu->v2) {
> +	if (iommu->features & VFIO_IOMMU_FEAT_V2) {
>  		dma = vfio_find_dma(iommu, unmap->iova, 0);
>  		if (dma && dma->iova != unmap->iova) {
>  			ret = -EINVAL;
> @@ -455,7 +459,8 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>  	}
>  
>  	while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) {
> -		if (!iommu->v2 && unmap->iova > dma->iova)
> +		if (!(iommu->features & VFIO_IOMMU_FEAT_V2) &&
> +		    unmap->iova > dma->iova)
>  			break;
>  		unmapped += dma->size;
>  		vfio_remove_dma(iommu, dma);
> @@ -671,7 +676,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  	struct vfio_group *group, *g;
>  	struct vfio_domain *domain, *d;
>  	struct bus_type *bus = NULL;
> -	int ret;
> +	int ret, attr = 1;
>  
>  	mutex_lock(&iommu->lock);
>  
> @@ -705,6 +710,9 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  		goto out_free;
>  	}
>  
> +	if (iommu->features & VFIO_IOMMU_FEAT_NESTING)
> +		iommu_domain_set_attr(domain->domain, DOMAIN_ATTR_NESTING, &attr);
> +

Shouldn't we fail if this doesn't work?

>  	ret = iommu_attach_group(domain->domain, iommu_group);
>  	if (ret)
>  		goto out_domain;
> @@ -815,12 +823,50 @@ done:
>  	mutex_unlock(&iommu->lock);
>  }
>  
> +static int vfio_iommus_have_nesting(void)
> +{
> +	static struct bus_type *bus_types[] = { &pci_bus_type };

Hmm, we just got rid of all traces of pci in this file, so I'm not too
excited about adding it back.  Don't we no longer include pci.h?  I'm
surprised this doesn't throw a warning.  We should try to be independent
of CONFIG_PCI here.

> +	int i, ret = 0;
> +
> +	for (i = 0; !ret && i < ARRAY_SIZE(bus_types); ++i) {
> +		struct iommu_domain *domain;
> +		struct bus_type *bus = bus_types[i];
> +		int attr = 1;
> +
> +		if (!iommu_present(bus))
> +			continue;
> +
> +		domain = iommu_domain_alloc(bus);
> +		if (!domain)
> +			continue;
> +
> +		if (!iommu_domain_set_attr(domain, DOMAIN_ATTR_NESTING, &attr))
> +			ret = 1;
> +
> +		iommu_domain_free(domain);
> +	}

Can this support Joerg's iommu_capable() interface so we can test w/o
creating a domain?  We still have the bus_type problem though.

> +
> +	return ret;
> +}
> +
>  static void *vfio_iommu_type1_open(unsigned long arg)
>  {
>  	struct vfio_iommu *iommu;
> -
> -	if (arg != VFIO_TYPE1_IOMMU && arg != VFIO_TYPE1v2_IOMMU)
> +	int features = 0;
> +
> +	switch (arg) {
> +	case VFIO_TYPE1_IOMMU:
> +		break;
> +	case VFIO_TYPE1_NESTING_IOMMU:
> +		if (!vfio_iommus_have_nesting())
> +			return ERR_PTR(-EOPNOTSUPP);
> +		features |= VFIO_IOMMU_FEAT_NESTING;
> +	case VFIO_TYPE1v2_IOMMU:
> +		features |= VFIO_IOMMU_FEAT_V2;
> +		break;
> +	default:
>  		return ERR_PTR(-EINVAL);
> +	}
>  
>  	iommu = kzalloc(sizeof(*iommu), GFP_KERNEL);
>  	if (!iommu)
> @@ -829,7 +875,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
>  	INIT_LIST_HEAD(&iommu->domain_list);
>  	iommu->dma_list = RB_ROOT;
>  	mutex_init(&iommu->lock);
> -	iommu->v2 = (arg == VFIO_TYPE1v2_IOMMU);
> +	iommu->features = features;
>  
>  	return iommu;
>  }
> @@ -875,6 +921,29 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>  	return ret;
>  }
>  
> +static int vfio_domains_have_iommu_nesting(struct vfio_iommu *iommu)
> +{
> +	struct vfio_domain *domain;
> +	int ret = 1;
> +
> +	if (!(iommu->features & VFIO_IOMMU_FEAT_NESTING))
> +		return 0;
> +
> +	mutex_lock(&iommu->lock);
> +	list_for_each_entry(domain, &iommu->domain_list, next) {
> +		int attr;
> +
> +		if (iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_NESTING,
> +					   &attr) || !attr) {
> +			ret = 0;
> +			break;
> +		}
> +	}
> +	mutex_unlock(&iommu->lock);
> +
> +	return ret;
> +}
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  				   unsigned int cmd, unsigned long arg)
>  {
> @@ -886,6 +955,10 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  		case VFIO_TYPE1_IOMMU:
>  		case VFIO_TYPE1v2_IOMMU:
>  			return 1;
> +		case VFIO_TYPE1_NESTING_IOMMU:
> +			if (!iommu)
> +				return vfio_iommus_have_nesting();
> +			return vfio_domains_have_iommu_nesting(iommu);

I'd just about prefer to statically return 1 for NESTING and then fail
to add groups if we can't set the attribute.  It sucks, but here we test
an arbitrary bus type to determine whether nesting is supported, allow a
nesting iommu type to be set regardless of what kind of devices are
about to go into it, and require that the user test to see whether the
requested IOMMU attribute actually stuck.  That's not a very intuitive
interface.  Thanks,

Alex

>  		case VFIO_DMA_CC_IOMMU:
>  			if (!iommu)
>  				return 0;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 6612974c64bf..ac79f6214fd6 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -24,11 +24,13 @@
>  #define VFIO_TYPE1_IOMMU		1
>  #define VFIO_SPAPR_TCE_IOMMU		2
>  #define VFIO_TYPE1v2_IOMMU		3
> +
>  /*
>   * IOMMU enforces DMA cache coherence (ex. PCIe NoSnoop stripping).  This
>   * capability is subject to change as groups are added or removed.
>   */
>  #define VFIO_DMA_CC_IOMMU		4
> +#define VFIO_TYPE1_NESTING_IOMMU	5	/* Implies v2 */
>  
>  /* Check if EEH is supported */
>  #define VFIO_EEH			5


* Re: [PATCH v3 2/3] vfio/iommu_type1: add new VFIO_TYPE1_NESTING_IOMMU IOMMU type
       [not found]         ` <1410462748.2982.342.camel-85EaTFmN5p//9pzu0YdTqQ@public.gmane.org>
@ 2014-09-12 16:55           ` Will Deacon
       [not found]             ` <20140912165520.GF12108-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Will Deacon @ 2014-09-12 16:55 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hi Alex,

Thanks for taking a look.

On Thu, Sep 11, 2014 at 08:12:28PM +0100, Alex Williamson wrote:
> On Tue, 2014-09-02 at 10:53 +0100, Will Deacon wrote:
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index 0734fbe5b651..50aa56c7b6a7 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -53,11 +53,15 @@ module_param_named(disable_hugepages,
> >  MODULE_PARM_DESC(disable_hugepages,
> >  		 "Disable VFIO IOMMU support for IOMMU hugepages.");
> >  
> > +/* Feature flags for VFIO Type-1 IOMMUs */
> > +#define VFIO_IOMMU_FEAT_V2	(1 << 0)
> > +#define VFIO_IOMMU_FEAT_NESTING	(1 << 1)
> > +
> >  struct vfio_iommu {
> >  	struct list_head	domain_list;
> >  	struct mutex		lock;
> >  	struct rb_root		dma_list;
> > -	bool v2;
> > +	int			features;
> 
> Personally I'd rather add a nesting bool than mask out flag bits.

Sure.

> > @@ -705,6 +710,9 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> >  		goto out_free;
> >  	}
> >  
> > +	if (iommu->features & VFIO_IOMMU_FEAT_NESTING)
> > +		iommu_domain_set_attr(domain->domain, DOMAIN_ATTR_NESTING, &attr);
> > +
> 
> Shouldn't we fail if this doesn't work?

The reason I don't fail is because device passthrough can still work with an
IOMMU that isn't capable of nesting. The part that won't work is attempting
to instantiate a virtual SMMU interface for the guest.

However, it just struck me that userspace knows whether or not it's going to
want a virtual SMMU interface. If it does, then failure to set NESTING is
fatal. If it doesn't, then it doesn't need to set the flag so there is no
issue.

> >  	ret = iommu_attach_group(domain->domain, iommu_group);
> >  	if (ret)
> >  		goto out_domain;
> > @@ -815,12 +823,50 @@ done:
> >  	mutex_unlock(&iommu->lock);
> >  }
> >  
> > +static int vfio_iommus_have_nesting(void)
> > +{
> > +	static struct bus_type *bus_types[] = { &pci_bus_type };
> 
> Hmm, we just got rid of all traces of pci in this file, so I'm not too
> excited about adding it back.  Don't we no longer include pci.h?  I'm
> surprised this doesn't throw a warning.  We should try to be independent
> of CONFIG_PCI here.

I don't see any warning here.

> > +		if (!iommu_present(bus))
> > +			continue;
> > +
> > +		domain = iommu_domain_alloc(bus);
> > +		if (!domain)
> > +			continue;
> > +
> > +		if (!iommu_domain_set_attr(domain, DOMAIN_ATTR_NESTING, &attr))
> > +			ret = 1;
> > +
> > +		iommu_domain_free(domain);
> > +	}
> 
> Can this support Joerg's iommu_capable() interface so we can test w/o
> creating a domain?  We still have the bus_type problem though.

[...]

> > @@ -886,6 +955,10 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> >  		case VFIO_TYPE1_IOMMU:
> >  		case VFIO_TYPE1v2_IOMMU:
> >  			return 1;
> > +		case VFIO_TYPE1_NESTING_IOMMU:
> > +			if (!iommu)
> > +				return vfio_iommus_have_nesting();
> > +			return vfio_domains_have_iommu_nesting(iommu);
> 
> I'd just about prefer to statically return 1 for NESTING and then fail
> to add groups if we can't set the attribute.  It sucks, but here we test
> an arbitrary bus type to determine whether nesting is supported, allow a
> nesting iommu type to be set regardless of what kind of devices are
> about to go into it, and require that the user test to see whether the
> requested IOMMU attribute actually stuck.  That's not a very intuitive
> interface.  Thanks,

Yes, I totally agree that the interface is clunky. Returning 1 for NESTING
solves both the pci bus problem and the iommu_capable issue, so that sounds
like a great simplification of the code.

That means userspace:

  - Can always open a VFIO_TYPE1_NESTING_IOMMU
  - VFIO_CHECK_EXTENSION will always return 1
  - When adding groups (e.g. SET_IOMMU), the iommu_domain_set_attr can
    fail, which will cause the corresponding ioctl to fail

If nesting is not required because there is no virtual SMMU interface,
then the usual TYPE1 IOMMU can be used.
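
Concretely, the application side of that contract might look something
like this (an illustrative fragment only; "want_vsmmu" is a
hypothetical flag, and it assumes VFIO_SET_IOMMU can be retried with a
different type after a failed attempt):

  if (ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU) < 0) {
          if (want_vsmmu)
                  return -1;  /* nesting was a hard requirement */

          /* No virtual SMMU needed, so plain type-1 is good enough. */
          if (ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1v2_IOMMU) < 0)
                  return -1;
  }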

Sound good?

Will


* Re: [PATCH v3 2/3] vfio/iommu_type1: add new VFIO_TYPE1_NESTING_IOMMU IOMMU type
       [not found]             ` <20140912165520.GF12108-5wv7dgnIgG8@public.gmane.org>
@ 2014-09-12 17:25               ` Alex Williamson
  0 siblings, 0 replies; 12+ messages in thread
From: Alex Williamson @ 2014-09-12 17:25 UTC (permalink / raw)
  To: Will Deacon; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Fri, 2014-09-12 at 17:55 +0100, Will Deacon wrote:
> Hi Alex,
> 
> Thanks for taking a look.
> 
> On Thu, Sep 11, 2014 at 08:12:28PM +0100, Alex Williamson wrote:
> > On Tue, 2014-09-02 at 10:53 +0100, Will Deacon wrote:
> > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > > index 0734fbe5b651..50aa56c7b6a7 100644
> > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > @@ -53,11 +53,15 @@ module_param_named(disable_hugepages,
> > >  MODULE_PARM_DESC(disable_hugepages,
> > >  		 "Disable VFIO IOMMU support for IOMMU hugepages.");
> > >  
> > > +/* Feature flags for VFIO Type-1 IOMMUs */
> > > +#define VFIO_IOMMU_FEAT_V2	(1 << 0)
> > > +#define VFIO_IOMMU_FEAT_NESTING	(1 << 1)
> > > +
> > >  struct vfio_iommu {
> > >  	struct list_head	domain_list;
> > >  	struct mutex		lock;
> > >  	struct rb_root		dma_list;
> > > -	bool v2;
> > > +	int			features;
> > 
> > Personally I'd rather add a nesting bool than mask out flag bits.
> 
> Sure.
> 
> > > @@ -705,6 +710,9 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> > >  		goto out_free;
> > >  	}
> > >  
> > > +	if (iommu->features & VFIO_IOMMU_FEAT_NESTING)
> > > +		iommu_domain_set_attr(domain->domain, DOMAIN_ATTR_NESTING, &attr);
> > > +
> > 
> > Shouldn't we fail if this doesn't work?
> 
> The reason I don't fail is because device passthrough can still work with an
> IOMMU that isn't capable of nesting. The part that won't work is attempting
> to instantiate a virtual SMMU interface for the guest.
> 
> However, it just struck me that userspace knows whether or not it's going to
> want a virtual SMMU interface. If it does, then failure to set NESTING is
> fatal. If it doesn't, then it doesn't need to set the flag so there is no
> issue.
> 
> > >  	ret = iommu_attach_group(domain->domain, iommu_group);
> > >  	if (ret)
> > >  		goto out_domain;
> > > @@ -815,12 +823,50 @@ done:
> > >  	mutex_unlock(&iommu->lock);
> > >  }
> > >  
> > > +static int vfio_iommus_have_nesting(void)
> > > +{
> > > +	static struct bus_type *bus_types[] = { &pci_bus_type };
> > 
> > Hmm, we just got rid of all traces of pci in this file, so I'm not too
> > excited about adding it back.  Don't we no longer include pci.h?  I'm
> > surprised this doesn't throw a warning.  We should try to be independent
> > of CONFIG_PCI here.
> 
> I don't see any warning here.
> 
> > > +		if (!iommu_present(bus))
> > > +			continue;
> > > +
> > > +		domain = iommu_domain_alloc(bus);
> > > +		if (!domain)
> > > +			continue;
> > > +
> > > +		if (!iommu_domain_set_attr(domain, DOMAIN_ATTR_NESTING, &attr))
> > > +			ret = 1;
> > > +
> > > +		iommu_domain_free(domain);
> > > +	}
> > 
> > Can this support Joerg's iommu_capable() interface so we can test w/o
> > creating a domain?  We still have the bus_type problem though.
> 
> [...]
> 
> > > @@ -886,6 +955,10 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> > >  		case VFIO_TYPE1_IOMMU:
> > >  		case VFIO_TYPE1v2_IOMMU:
> > >  			return 1;
> > > +		case VFIO_TYPE1_NESTING_IOMMU:
> > > +			if (!iommu)
> > > +				return vfio_iommus_have_nesting();
> > > +			return vfio_domains_have_iommu_nesting(iommu);
> > 
> > I'd just about prefer to statically return 1 for NESTING and then fail
> > to add groups if we can't set the attribute.  It sucks, but here we test
> > an arbitrary bus type to determine whether nesting is supported, allow a
> > nesting iommu type to be set regardless of what kind of devices are
> > about to go into it, and require that the user test to see whether the
> > requested IOMMU attribute actually stuck.  That's not a very intuitive
> > interface.  Thanks,
> 
> Yes, I totally agree that the interface is clunky. Returning 1 for NESTING
> solves both the pci bus problem and the iommu_capable issue, so that sounds
> like a great simplification of the code.
> 
> That means userspace:
> 
>   - Can always open a VFIO_TYPE1_NESTING_IOMMU
>   - VFIO_CHECK_EXTENSION will always return 1
>   - When adding groups (e.g. SET_IOMMU), the iommu_domain_set_attr can
>     fail, which will cause the corresponding ioctl to fail
> 
> If nesting is not required because there is no virtual SMMU interface,
> then the usual TYPE1 IOMMU can be used.
> 
> Sound good?

It's not 100% satisfying, but I can't think of anything better.
VFIO_TYPE1_NESTING_IOMMU is then just a VFIO level indication of support
with no indication of underlying hardware support.  The user doesn't
know until they add a group if it's really possible.  Not ideal from a
user perspective, but VFIO has always left itself the option of
disallowing groups from being added to a container.  Thanks,

Alex

