* [RFC PATCH 1/3] iommu: introduce IOMMU_DOMAIN_HYP domain type for hypervisor allocation
From: Will Deacon @ 2014-07-01 16:10 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Will Deacon

Some IOMMUs, such as the ARM SMMU, support two stages of translation.
The idea behind such a scheme is to allow a guest operating system to
use the IOMMU for DMA mappings in the first stage of translation, with
the hypervisor then installing mappings in the second stage to provide
isolation of the DMA to the physical range assigned to that virtual
machine.

In order to allow IOMMU domains to be allocated for second-stage
translation, this patch extends iommu_domain_alloc (and the associated
->domain_init callback on struct iommu_ops) to take a type parameter
indicating the intended purpose for the domain. The only supported types
at present are IOMMU_DOMAIN_DMA (i.e. what we have at the moment) and
IOMMU_DOMAIN_HYP, which instructs the backend driver to allocate and
initialise a second-stage domain, if possible.

All IOMMU drivers are updated to take the new type parameter, but it is
ignored at present. All callers of iommu_domain_alloc are also updated
to pass IOMMU_DOMAIN_DMA as the type parameter, apart from
kvm_iommu_map_guest, which passes the new IOMMU_DOMAIN_HYP flag.

Finally, a new IOMMU capability, IOMMU_CAP_HYP_MAPPING, is added so that
it is possible to check whether or not a domain is able to make use of
nested translation.

Cc: Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
Cc: Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
---

Hi Joerg, Alex,

Initially I wanted to use iommu_attr for this, but the IOMMU_DOMAIN_HYP
flag is not something that can change dynamically and needs to be known
at domain initialisation time. We could pass iommu_attr to domain_alloc,
but that would interact strangely with the existing attrs.

I'm open to all suggestions!

Will
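
For reference, here is a minimal sketch (not part of the patch) of how a
caller could probe the new capability on a domain obtained from
iommu_domain_alloc(..., IOMMU_DOMAIN_HYP), using the existing
iommu_domain_has_cap() interface:

	/* Illustrative only: check whether nested translation is available. */
	if (!iommu_domain_has_cap(domain, IOMMU_CAP_HYP_MAPPING))
		pr_warn("domain does not support nested translation\n");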

 arch/arm/mm/dma-mapping.c                |  2 +-
 drivers/gpu/drm/msm/msm_gpu.c            |  2 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c |  2 +-
 drivers/iommu/amd_iommu.c                |  2 +-
 drivers/iommu/amd_iommu_v2.c             |  2 +-
 drivers/iommu/arm-smmu.c                 |  2 +-
 drivers/iommu/exynos-iommu.c             |  2 +-
 drivers/iommu/fsl_pamu_domain.c          |  2 +-
 drivers/iommu/intel-iommu.c              |  2 +-
 drivers/iommu/iommu.c                    |  4 ++--
 drivers/iommu/ipmmu-vmsa.c               |  2 +-
 drivers/iommu/msm_iommu.c                |  2 +-
 drivers/iommu/omap-iommu.c               |  2 +-
 drivers/iommu/shmobile-iommu.c           |  2 +-
 drivers/iommu/tegra-gart.c               |  2 +-
 drivers/iommu/tegra-smmu.c               |  2 +-
 drivers/remoteproc/remoteproc_core.c     |  2 +-
 drivers/vfio/vfio_iommu_type1.c          |  2 +-
 include/linux/iommu.h                    | 12 +++++++++---
 virt/kvm/iommu.c                         |  3 ++-
 20 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 4c88935654ca..c768e58fa3f1 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1970,7 +1970,7 @@ arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, size_t size)
 
 	spin_lock_init(&mapping->lock);
 
-	mapping->domain = iommu_domain_alloc(bus);
+	mapping->domain = iommu_domain_alloc(bus, IOMMU_DOMAIN_DMA);
 	if (!mapping->domain)
 		goto err4;
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index c6322197db8c..aa755c54e70f 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -603,7 +603,7 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	 * and have separate page tables per context.  For now, to keep things
 	 * simple and to get something working, just use a single address space:
 	 */
-	iommu = iommu_domain_alloc(&platform_bus_type);
+	iommu = iommu_domain_alloc(&platform_bus_type, IOMMU_DOMAIN_DMA);
 	if (iommu) {
 		dev_info(drm->dev, "%s: using IOMMU\n", name);
 		gpu->mmu = msm_iommu_new(drm, iommu);
diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
index 801a1d6937e4..4bdec982fc39 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -471,7 +471,7 @@ struct usnic_uiom_pd *usnic_uiom_alloc_pd(void)
 	if (!pd)
 		return ERR_PTR(-ENOMEM);
 
-	pd->domain = domain = iommu_domain_alloc(&pci_bus_type);
+	pd->domain = domain = iommu_domain_alloc(&pci_bus_type, IOMMU_DOMAIN_DMA);
 	if (IS_ERR_OR_NULL(domain)) {
 		usnic_err("Failed to allocate IOMMU domain with err %ld\n",
 				PTR_ERR(pd->domain));
diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 4aec6a29e316..98a5cf4f4c02 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3292,7 +3292,7 @@ static int __init alloc_passthrough_domain(void)
 
 	return 0;
 }
-static int amd_iommu_domain_init(struct iommu_domain *dom)
+static int amd_iommu_domain_init(struct iommu_domain *dom, int type)
 {
 	struct protection_domain *domain;
 
diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
index 499b4366a98d..984f0aa19d6f 100644
--- a/drivers/iommu/amd_iommu_v2.c
+++ b/drivers/iommu/amd_iommu_v2.c
@@ -774,7 +774,7 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
 	if (dev_state->states == NULL)
 		goto out_free_dev_state;
 
-	dev_state->domain = iommu_domain_alloc(&pci_bus_type);
+	dev_state->domain = iommu_domain_alloc(&pci_bus_type, IOMMU_DOMAIN_DMA);
 	if (dev_state->domain == NULL)
 		goto out_free_states;
 
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index a6e38982d09c..37d36e88420b 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -940,7 +940,7 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
 	__arm_smmu_free_bitmap(smmu->context_map, cfg->cbndx);
 }
 
-static int arm_smmu_domain_init(struct iommu_domain *domain)
+static int arm_smmu_domain_init(struct iommu_domain *domain, int type)
 {
 	struct arm_smmu_domain *smmu_domain;
 	pgd_t *pgd;
diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 99054d2c040d..7f9b18cc214c 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -697,7 +697,7 @@ static inline void pgtable_flush(void *vastart, void *vaend)
 				virt_to_phys(vaend));
 }
 
-static int exynos_iommu_domain_init(struct iommu_domain *domain)
+static int exynos_iommu_domain_init(struct iommu_domain *domain, int type)
 {
 	struct exynos_iommu_domain *priv;
 	int i;
diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index 93072ba44b1d..4e1b7e1fd8e9 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -438,7 +438,7 @@ static void fsl_pamu_domain_destroy(struct iommu_domain *domain)
 	kmem_cache_free(fsl_pamu_domain_cache, dma_domain);
 }
 
-static int fsl_pamu_domain_init(struct iommu_domain *domain)
+static int fsl_pamu_domain_init(struct iommu_domain *domain, int type)
 {
 	struct fsl_dma_domain *dma_domain;
 
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 51b6b77dc3e5..415f8fdb0256 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4160,7 +4160,7 @@ static int md_domain_init(struct dmar_domain *domain, int guest_width)
 	return 0;
 }
 
-static int intel_iommu_domain_init(struct iommu_domain *domain)
+static int intel_iommu_domain_init(struct iommu_domain *domain, int type)
 {
 	struct dmar_domain *dmar_domain;
 
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index e5555fcfe703..ca0a850152ed 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -647,7 +647,7 @@ void iommu_set_fault_handler(struct iommu_domain *domain,
 }
 EXPORT_SYMBOL_GPL(iommu_set_fault_handler);
 
-struct iommu_domain *iommu_domain_alloc(struct bus_type *bus)
+struct iommu_domain *iommu_domain_alloc(struct bus_type *bus, int type)
 {
 	struct iommu_domain *domain;
 	int ret;
@@ -661,7 +661,7 @@ struct iommu_domain *iommu_domain_alloc(struct bus_type *bus)
 
 	domain->ops = bus->iommu_ops;
 
-	ret = domain->ops->domain_init(domain);
+	ret = domain->ops->domain_init(domain, type);
 	if (ret)
 		goto out_free;
 
diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 53cde086e83b..a7f7de68c830 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -854,7 +854,7 @@ done:
  * IOMMU Operations
  */
 
-static int ipmmu_domain_init(struct iommu_domain *io_domain)
+static int ipmmu_domain_init(struct iommu_domain *io_domain, int type)
 {
 	struct ipmmu_vmsa_domain *domain;
 
diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index f5ff657f49fa..1f535e468d33 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -210,7 +210,7 @@ static void __program_context(void __iomem *base, int ctx, phys_addr_t pgtable)
 	SET_M(base, ctx, 1);
 }
 
-static int msm_iommu_domain_init(struct iommu_domain *domain)
+static int msm_iommu_domain_init(struct iommu_domain *domain, int type)
 {
 	struct msm_priv *priv = kzalloc(sizeof(*priv), GFP_KERNEL);
 
diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index 895af06a667f..83dbb43ebc63 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -1160,7 +1160,7 @@ static void omap_iommu_detach_dev(struct iommu_domain *domain,
 	spin_unlock(&omap_domain->lock);
 }
 
-static int omap_iommu_domain_init(struct iommu_domain *domain)
+static int omap_iommu_domain_init(struct iommu_domain *domain, int type)
 {
 	struct omap_iommu_domain *omap_domain;
 
diff --git a/drivers/iommu/shmobile-iommu.c b/drivers/iommu/shmobile-iommu.c
index 464acda0bbc4..0ee48e3ca7f7 100644
--- a/drivers/iommu/shmobile-iommu.c
+++ b/drivers/iommu/shmobile-iommu.c
@@ -82,7 +82,7 @@ static void pgtable_write(struct shmobile_iommu_domain_pgtable *pgtable,
 				   sizeof(val) * count, DMA_TO_DEVICE);
 }
 
-static int shmobile_iommu_domain_init(struct iommu_domain *domain)
+static int shmobile_iommu_domain_init(struct iommu_domain *domain, int type)
 {
 	struct shmobile_iommu_domain *sh_domain;
 	int i, ret;
diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index dba1a9fd5070..a138d772f009 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -216,7 +216,7 @@ out:
 	spin_unlock(&gart->client_lock);
 }
 
-static int gart_iommu_domain_init(struct iommu_domain *domain)
+static int gart_iommu_domain_init(struct iommu_domain *domain, int type)
 {
 	return 0;
 }
diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 605b5b46a903..edd1ec26b408 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -868,7 +868,7 @@ out:
 	spin_unlock(&as->client_lock);
 }
 
-static int smmu_iommu_domain_init(struct iommu_domain *domain)
+static int smmu_iommu_domain_init(struct iommu_domain *domain, int type)
 {
 	int i, err = -EAGAIN;
 	unsigned long flags;
diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index 3cd85a638afa..d0bb5b64e7ee 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -110,7 +110,7 @@ static int rproc_enable_iommu(struct rproc *rproc)
 		return 0;
 	}
 
-	domain = iommu_domain_alloc(dev->bus);
+	domain = iommu_domain_alloc(dev->bus, IOMMU_DOMAIN_DMA);
 	if (!domain) {
 		dev_err(dev, "can't alloc iommu domain\n");
 		return -ENOMEM;
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 0734fbe5b651..8ae76774d28e 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -699,7 +699,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	if (ret)
 		goto out_free;
 
-	domain->domain = iommu_domain_alloc(bus);
+	domain->domain = iommu_domain_alloc(bus, IOMMU_DOMAIN_DMA);
 	if (!domain->domain) {
 		ret = -EIO;
 		goto out_free;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b96a5b2136e4..023ea8775303 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -49,6 +49,10 @@ struct iommu_domain_geometry {
 	bool force_aperture;       /* DMA only allowed in mappable range? */
 };
 
+/* iommu domain types */
+#define IOMMU_DOMAIN_DMA	0x0
+#define IOMMU_DOMAIN_HYP	0x1
+
 struct iommu_domain {
 	struct iommu_ops *ops;
 	void *priv;
@@ -59,6 +63,7 @@ struct iommu_domain {
 
 #define IOMMU_CAP_CACHE_COHERENCY	0x1
 #define IOMMU_CAP_INTR_REMAP		0x2	/* isolates device intrs */
+#define IOMMU_CAP_HYP_MAPPING		0x3	/* isolates guest DMA */
 
 /*
  * Following constraints are specifc to FSL_PAMUV1:
@@ -102,7 +107,7 @@ enum iommu_attr {
  * @pgsize_bitmap: bitmap of supported page sizes
  */
 struct iommu_ops {
-	int (*domain_init)(struct iommu_domain *domain);
+	int (*domain_init)(struct iommu_domain *domain, int type);
 	void (*domain_destroy)(struct iommu_domain *domain);
 	int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
 	void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
@@ -142,7 +147,7 @@ struct iommu_ops {
 
 extern int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops);
 extern bool iommu_present(struct bus_type *bus);
-extern struct iommu_domain *iommu_domain_alloc(struct bus_type *bus);
+extern struct iommu_domain *iommu_domain_alloc(struct bus_type *bus, int type);
 extern struct iommu_group *iommu_group_get_by_id(int id);
 extern void iommu_domain_free(struct iommu_domain *domain);
 extern int iommu_attach_device(struct iommu_domain *domain,
@@ -243,7 +248,8 @@ static inline bool iommu_present(struct bus_type *bus)
 	return false;
 }
 
-static inline struct iommu_domain *iommu_domain_alloc(struct bus_type *bus)
+static inline struct iommu_domain *iommu_domain_alloc(struct bus_type *bus,
+						      int type)
 {
 	return NULL;
 }
diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index 0df7d4b34dfe..9b025ba63898 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -238,7 +238,8 @@ int kvm_iommu_map_guest(struct kvm *kvm)
 
 	mutex_lock(&kvm->slots_lock);
 
-	kvm->arch.iommu_domain = iommu_domain_alloc(&pci_bus_type);
+	kvm->arch.iommu_domain = iommu_domain_alloc(&pci_bus_type,
+						    IOMMU_DOMAIN_HYP);
 	if (!kvm->arch.iommu_domain) {
 		r = -ENOMEM;
 		goto out_unlock;
-- 
2.0.0

* [RFC PATCH 2/3] vfio/iommu_type1: add new VFIO_TYPE1_HYP_IOMMU IOMMU type
From: Will Deacon @ 2014-07-01 16:10 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Will Deacon

VFIO allows devices to be safely handed off to userspace by putting
them behind an IOMMU configured to ensure DMA and interrupt isolation.
This enables userspace KVM clients, such as kvmtool and qemu, to further
map the device into a virtual machine.

With IOMMUs such as the ARM SMMU, it is then possible to provide SMMU
translation services to the guest operating system, despite the existing
translation installed by VFIO. However, enabling this feature means that
the IOMMU driver must be informed that the VFIO domain is being created
for the purposes of a hypervisor.

This patch adds a new IOMMU type (VFIO_TYPE1_HYP_IOMMU) to the VFIO
type-1 driver which acts identically to VFIO_TYPE1v2_IOMMU but passes
the IOMMU_DOMAIN_HYP flag when allocating IOMMU domains for the VFIO
groups attached to the IOMMU.

Cc: Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
Cc: Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
---
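
For illustration, a rough sketch (not part of the patch) of how userspace
might opt in to the new IOMMU type, falling back to plain v2 when the
extension is absent. It assumes a VFIO group has already been added to the
container with VFIO_GROUP_SET_CONTAINER; error handling is omitted:

	int container = open("/dev/vfio/vfio", O_RDWR);

	/* ... VFIO_GROUP_SET_CONTAINER on at least one group ... */

	if (ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_HYP_IOMMU) > 0)
		ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_HYP_IOMMU);
	else
		ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1v2_IOMMU);
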
 drivers/vfio/vfio_iommu_type1.c | 22 ++++++++++++++++++----
 include/uapi/linux/vfio.h       |  2 ++
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 8ae76774d28e..612db1b84ae8 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -58,6 +58,7 @@ struct vfio_iommu {
 	struct mutex		lock;
 	struct rb_root		dma_list;
 	bool v2;
+	int			domain_flags;
 };
 
 struct vfio_domain {
@@ -699,7 +700,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	if (ret)
 		goto out_free;
 
-	domain->domain = iommu_domain_alloc(bus, IOMMU_DOMAIN_DMA);
+	domain->domain = iommu_domain_alloc(bus, iommu->domain_flags);
 	if (!domain->domain) {
 		ret = -EIO;
 		goto out_free;
@@ -818,9 +819,20 @@ done:
 static void *vfio_iommu_type1_open(unsigned long arg)
 {
 	struct vfio_iommu *iommu;
-
-	if (arg != VFIO_TYPE1_IOMMU && arg != VFIO_TYPE1v2_IOMMU)
+	bool v2 = false;
+	int domain_flags = IOMMU_DOMAIN_DMA;
+
+	switch (arg) {
+	case VFIO_TYPE1_IOMMU:
+		break;
+	case VFIO_TYPE1_HYP_IOMMU:
+		domain_flags = IOMMU_DOMAIN_HYP;
+	case VFIO_TYPE1v2_IOMMU:
+		v2 = true;
+		break;
+	default:
 		return ERR_PTR(-EINVAL);
+	}
 
 	iommu = kzalloc(sizeof(*iommu), GFP_KERNEL);
 	if (!iommu)
@@ -829,7 +841,8 @@ static void *vfio_iommu_type1_open(unsigned long arg)
 	INIT_LIST_HEAD(&iommu->domain_list);
 	iommu->dma_list = RB_ROOT;
 	mutex_init(&iommu->lock);
-	iommu->v2 = (arg == VFIO_TYPE1v2_IOMMU);
+	iommu->v2 = v2;
+	iommu->domain_flags = domain_flags;
 
 	return iommu;
 }
@@ -885,6 +898,7 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		switch (arg) {
 		case VFIO_TYPE1_IOMMU:
 		case VFIO_TYPE1v2_IOMMU:
+		case VFIO_TYPE1_HYP_IOMMU:
 			return 1;
 		case VFIO_DMA_CC_IOMMU:
 			if (!iommu)
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index cb9023d4f063..871cf6dfe631 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -29,6 +29,8 @@
  * capability is subject to change as groups are added or removed.
  */
 #define VFIO_DMA_CC_IOMMU		4
+#define VFIO_TYPE1_HYP_IOMMU		5	/* Implies v2 */
+
 
 /*
  * The IOCTL interface is designed for extensibility by embedding the
-- 
2.0.0

* [RFC PATCH 3/3] iommu/arm-smmu: add support for IOMMU_DOMAIN_HYP flag
From: Will Deacon @ 2014-07-01 16:10 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Will Deacon

When domains are created with the IOMMU_DOMAIN_HYP flag, we must ensure
that we allocate them to stage-2 context banks if the hardware permits
it.

This patch attempts to allocate IOMMU_DOMAIN_HYP domains to stage-2
context-banks and keeps track of which stage the domain ended up being
assigned to. Our capability check is then updated to report
IOMMU_CAP_HYP_MAPPING only for domains using nested translation.

Signed-off-by: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/arm-smmu.c | 33 ++++++++++++++++++++++++++-------
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 37d36e88420b..e0b7c18e88e3 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -391,9 +391,16 @@ struct arm_smmu_cfg {
 #define ARM_SMMU_CB_ASID(cfg)		((cfg)->cbndx)
 #define ARM_SMMU_CB_VMID(cfg)		((cfg)->cbndx + 1)
 
+enum arm_smmu_domain_stage {
+	ARM_SMMU_DOMAIN_S1 = 0,
+	ARM_SMMU_DOMAIN_S2,
+	ARM_SMMU_DOMAIN_NESTED,
+};
+
 struct arm_smmu_domain {
 	struct arm_smmu_device		*smmu;
 	struct arm_smmu_cfg		cfg;
+	enum arm_smmu_domain_stage	stage;
 	spinlock_t			lock;
 };
 
@@ -869,19 +876,25 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 	struct arm_smmu_domain *smmu_domain = domain->priv;
 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
 
-	if (smmu->features & ARM_SMMU_FEAT_TRANS_NESTED) {
+	if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1))
+		smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
+
+	switch (smmu_domain->stage) {
+	case ARM_SMMU_DOMAIN_S1:
+		cfg->cbar = CBAR_TYPE_S1_TRANS_S2_BYPASS;
+		start = smmu->num_s2_context_banks;
+		break;
+	case ARM_SMMU_DOMAIN_NESTED:
 		/*
 		 * We will likely want to change this if/when KVM gets
 		 * involved.
 		 */
-		cfg->cbar = CBAR_TYPE_S1_TRANS_S2_BYPASS;
-		start = smmu->num_s2_context_banks;
-	} else if (smmu->features & ARM_SMMU_FEAT_TRANS_S1) {
-		cfg->cbar = CBAR_TYPE_S1_TRANS_S2_BYPASS;
-		start = smmu->num_s2_context_banks;
-	} else {
+	case ARM_SMMU_DOMAIN_S2:
 		cfg->cbar = CBAR_TYPE_S2_TRANS;
 		start = 0;
+		break;
+	default:
+		return -EINVAL;
 	}
 
 	ret = __arm_smmu_alloc_bitmap(smmu->context_map, start,
@@ -960,6 +973,10 @@ static int arm_smmu_domain_init(struct iommu_domain *domain, int type)
 	smmu_domain->cfg.pgd = pgd;
 
 	spin_lock_init(&smmu_domain->lock);
+
+	if (type == IOMMU_DOMAIN_HYP)
+		smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED;
+
 	domain->priv = smmu_domain;
 	return 0;
 
@@ -1516,6 +1533,8 @@ static int arm_smmu_domain_has_cap(struct iommu_domain *domain,
 		return features & ARM_SMMU_FEAT_COHERENT_WALK;
 	case IOMMU_CAP_INTR_REMAP:
 		return 1; /* MSIs are just memory writes */
+	case IOMMU_CAP_HYP_MAPPING:
+		return smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED;
 	default:
 		return 0;
 	}
-- 
2.0.0

* Re: [RFC PATCH 1/3] iommu: introduce IOMMU_DOMAIN_HYP domain type for hypervisor allocation
From: Alex Williamson @ 2014-07-01 17:42 UTC (permalink / raw)
  To: Will Deacon; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Tue, 2014-07-01 at 17:10 +0100, Will Deacon wrote:
> Some IOMMUs, such as the ARM SMMU, support two stages of translation.
> The idea behind such a scheme is to allow a guest operating system to
> use the IOMMU for DMA mappings in the first stage of translation, with
> the hypervisor then installing mappings in the second stage to provide
> isolation of the DMA to the physical range assigned to that virtual
> machine.
> 
> In order to allow IOMMU domains to be allocated for second-stage
> translation, this patch extends iommu_domain_alloc (and the associated
> ->domain_init callback on struct iommu_ops) to take a type parameter
> indicating the intended purpose for the domain. The only supported types
> at present are IOMMU_DOMAIN_DMA (i.e. what we have at the moment) and
> IOMMU_DOMAIN_HYP, which instructs the backend driver to allocate and
> initialise a second-stage domain, if possible.
> 
> All IOMMU drivers are updated to take the new type parameter, but it is
> ignored at present. All callers of iommu_domain_alloc are also updated
> to pass IOMMU_DOMAIN_DMA as the type parameter, apart from
> kvm_iommu_map_guest, which passes the new IOMMU_DOMAIN_HYP flag.
> 
> Finally, a new IOMMU capability, IOMMU_CAP_HYP_MAPPING, is added so that
> it is possible to check whether or not a domain is able to make use of
> nested translation.

Why is this necessarily related to HYPervisor use?  It seems like this
new domain type is effectively just a normal domain that supports some
sort of fault handler that can call out to attempt to create missing
mappings.  IOMMUs supporting PCI PRI (Page Request Interface) could
potentially make use of something like that on bare metal or under
hypervisor control.  If that's true, then could this be some sort of
iommu_domain_set_l2_handler() that happens after the domain is
allocated?

For this patch, I don't understand why legacy KVM assignment would
allocate a HYP domain while VFIO would use a DMA domain.  It seems like
you're just counting on x86 never making the distinction between the
two.

> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -49,6 +49,10 @@ struct iommu_domain_geometry {
>  	bool force_aperture;       /* DMA only allowed in mappable range? */
>  };
>  
> +/* iommu domain types */
> +#define IOMMU_DOMAIN_DMA	0x0
> +#define IOMMU_DOMAIN_HYP	0x1
> +
>  struct iommu_domain {
>  	struct iommu_ops *ops;
>  	void *priv;
> @@ -59,6 +63,7 @@ struct iommu_domain {
>  
>  #define IOMMU_CAP_CACHE_COHERENCY	0x1
>  #define IOMMU_CAP_INTR_REMAP		0x2	/* isolates device intrs */
> +#define IOMMU_CAP_HYP_MAPPING		0x3	/* isolates guest DMA */

This makes no sense, it's exactly what we do with a "DMA" domain.  I
think the code needs to focus on what is really different about this
domain, not what is the expected use case.  Thanks,

Alex

* Re: [RFC PATCH 1/3] iommu: introduce IOMMU_DOMAIN_HYP domain type for hypervisor allocation
From: Will Deacon @ 2014-07-01 18:04 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hi Alex,

Thanks for having a look.

On Tue, Jul 01, 2014 at 06:42:33PM +0100, Alex Williamson wrote:
> On Tue, 2014-07-01 at 17:10 +0100, Will Deacon wrote:
> > Some IOMMUs, such as the ARM SMMU, support two stages of translation.
> > The idea behind such a scheme is to allow a guest operating system to
> > use the IOMMU for DMA mappings in the first stage of translation, with
> > the hypervisor then installing mappings in the second stage to provide
> > isolation of the DMA to the physical range assigned to that virtual
> > machine.
> > 
> > In order to allow IOMMU domains to be allocated for second-stage
> > translation, this patch extends iommu_domain_alloc (and the associated
> > ->domain_init callback on struct iommu_ops) to take a type parameter
> > indicating the intended purpose for the domain. The only supported types
> > at present are IOMMU_DOMAIN_DMA (i.e. what we have at the moment) and
> > IOMMU_DOMAIN_HYP, which instructs the backend driver to allocate and
> > initialise a second-stage domain, if possible.
> > 
> > All IOMMU drivers are updated to take the new type parameter, but it is
> > ignored at present. All callers of iommu_domain_alloc are also updated
> > to pass IOMMU_DOMAIN_DMA as the type parameter, apart from
> > kvm_iommu_map_guest, which passes the new IOMMU_DOMAIN_HYP flag.
> > 
> > Finally, a new IOMMU capability, IOMMU_CAP_HYP_MAPPING, is added so that
> > it is possible to check whether or not a domain is able to make use of
> > nested translation.
> 
> Why is this necessarily related to HYPervisor use?  It seems like this
> new domain type is effectively just a normal domain that supports some
> sort of fault handler that can call out to attempt to create missing
> mappings.

Not quite. The idea of this domain is that it provides isolation for a
guest, so I'd actually expect these domains to contain pinned mappings most
of the time (handling guest faults in the hypervisor is pretty error-prone).

Perhaps if I explain how the ARM SMMU works, that might help (and if it
doesn't, please reply saying so :). The ARM SMMU supports two stages of
translation:

  Stage-1: Guest VA (VA) -> Guest PA (IPA, or intermediate physical address)
  Stage-2: IPA -> Host Physical Address (PA)

These can be glued together to form nested translation, where an incoming VA
is translated through both stages to get a PA. Page table walks triggered at
stage-1 expect to see IPAs for the table addresses.

An important thing to note here is that the hardware is configured
differently at each stage; the page table formats themselves are slightly
different (e.g. restricted permissions at stage-2) and certain hardware
contexts are only capable of stage-2 translation.

The way this is supposed to work is that the KVM host would install the VFIO
DMA mapping (ok, now I see why you don't like the name) at stage-2. This
allows the SMMU driver to allocate a corresponding stage-1 context for the
mapping and expose that directly to the guest as part of a virtual, stage-1-only
SMMU. Then the guest can install its own SMMU mappings at stage-1 for
contiguous DMA (in the guest VA space) without any knowledge of the hypervisor
mapping.

To do this successfully, we need to communicate the intention of the mapping
to the SMMU driver (i.e. stage-1 vs stage-2) at domain initialisation time.
I could just add ARM SMMU-specific properties there, but I thought this
might potentially be useful to others.
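
To make that concrete, here is a hypothetical host-side sketch using the
interfaces proposed in this series; "assigned_pdev", "guest_ipa" and
"host_pa" are placeholders for the assigned device and a pinned page of
guest memory:

	/* Illustrative only: stage-2 (IPA -> PA) setup in the host. */
	struct iommu_domain *s2_domain;
	int ret;

	s2_domain = iommu_domain_alloc(&pci_bus_type, IOMMU_DOMAIN_HYP);
	if (!s2_domain)
		return -ENOMEM;

	ret = iommu_attach_device(s2_domain, &assigned_pdev->dev);
	if (ret)
		goto out_free_domain;

	/* Map one page of guest memory: IPA -> PA. */
	ret = iommu_map(s2_domain, guest_ipa, host_pa, PAGE_SIZE,
			IOMMU_READ | IOMMU_WRITE);

The guest would then manage its own stage-1 (VA -> IPA) mappings via the
virtual SMMU, with no per-mapping involvement from the host.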

> IOMMUs supporting PCI PRI (Page Request Interface) could
> potentially make use of something like that on bare metal or under
> hypervisor control.  If that's true, then could this be some sort of
> iommu_domain_set_l2_handler() that happens after the domain is
> allocated?

I'm not sure that's what I was aiming for... see above.

> For this patch, I don't understand why legacy KVM assignment would
> allocate a HYP domain while VFIO would use a DMA domain.  It seems like
> you're just counting on x86 never making the distinction between the
> two.

That's true, but I was also trying to indicate the intention of the mapping
so that other IOMMUs could potentially make use of the flags.

> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -49,6 +49,10 @@ struct iommu_domain_geometry {
> >  	bool force_aperture;       /* DMA only allowed in mappable range? */
> >  };
> >  
> > +/* iommu domain types */
> > +#define IOMMU_DOMAIN_DMA	0x0
> > +#define IOMMU_DOMAIN_HYP	0x1
> > +
> >  struct iommu_domain {
> >  	struct iommu_ops *ops;
> >  	void *priv;
> > @@ -59,6 +63,7 @@ struct iommu_domain {
> >  
> >  #define IOMMU_CAP_CACHE_COHERENCY	0x1
> >  #define IOMMU_CAP_INTR_REMAP		0x2	/* isolates device intrs */
> > +#define IOMMU_CAP_HYP_MAPPING		0x3	/* isolates guest DMA */
> 
> This makes no sense, it's exactly what we do with a "DMA" domain.  I
> think the code needs to focus on what is really different about this
> domain, not what is the expected use case.  Thanks,

The use-case is certainly relevant, though. I can do device passthrough
with a stage-1 mapping for example, but you wouldn't then be able to
instantiate a virtual SMMU interface in the guest.

I could rename these IOMMU_CAP_STAGE{1,2}, but that then sounds very
ARM-specific. What do you think?

Will

* Re: [RFC PATCH 1/3] iommu: introduce IOMMU_DOMAIN_HYP domain type for hypervisor allocation
From: Alex Williamson @ 2014-07-01 19:28 UTC (permalink / raw)
  To: Will Deacon; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Tue, 2014-07-01 at 19:04 +0100, Will Deacon wrote:
> Hi Alex,
> 
> Thanks for having a look.
> 
> On Tue, Jul 01, 2014 at 06:42:33PM +0100, Alex Williamson wrote:
> > On Tue, 2014-07-01 at 17:10 +0100, Will Deacon wrote:
> > > Some IOMMUs, such as the ARM SMMU, support two stages of translation.
> > > The idea behind such a scheme is to allow a guest operating system to
> > > use the IOMMU for DMA mappings in the first stage of translation, with
> > > the hypervisor then installing mappings in the second stage to provide
> > > isolation of the DMA to the physical range assigned to that virtual
> > > machine.
> > > 
> > > In order to allow IOMMU domains to be allocated for second-stage
> > > translation, this patch extends iommu_domain_alloc (and the associated
> > > ->domain_init callback on struct iommu_ops) to take a type parameter
> > > indicating the intended purpose for the domain. The only supported types
> > > at present are IOMMU_DOMAIN_DMA (i.e. what we have at the moment) and
> > > IOMMU_DOMAIN_HYP, which instructs the backend driver to allocate and
> > > initialise a second-stage domain, if possible.
> > > 
> > > All IOMMU drivers are updated to take the new type parameter, but it is
> > > ignored at present. All callers of iommu_domain_alloc are also updated
> > > to pass IOMMU_DOMAIN_DMA as the type parameter, apart from
> > > kvm_iommu_map_guest, which passes the new IOMMU_DOMAIN_HYP flag.
> > > 
> > > Finally, a new IOMMU capability, IOMMU_CAP_HYP_MAPPING, is added so that
> > > it is possible to check whether or not a domain is able to make use of
> > > nested translation.
> > 
> > Why is this necessarily related to HYPervisor use?  It seems like this
> > new domain type is effectively just a normal domain that supports some
> > sort of fault handler that can call out to attempt to create missing
> > mappings.
> 
> Not quite. The idea of this domain is that it provides isolation for a
> guest, so I'd actually expect these domains to contain pinned mappings most
> of the time (handling guest faults in the hypervisor is pretty error-prone).

Hmm, that's exactly what an existing IOMMU domain does and how we use it
through QEMU and VFIO today.

> Perhaps if I explain how the ARM SMMU works, that might help (and if it
> doesn't, please reply saying so :). The ARM SMMU supports two stages of
> translation:
> 
>   Stage-1: Guest VA (VA) -> Guest PA (IPA, or intermediate physical address)
>   Stage-2: IPA -> Host Physical Address (PA)
> 
> These can be glued together to form nested translation, where an incoming VA
> is translated through both stages to get a PA. Page table walks triggered at
> stage-1 expect to see IPAs for the table addresses.
> 
> An important thing to note here is that the hardware is configured
> differently at each stage; the page table formats themselves are slightly
> different (e.g. restricted permissions at stage-2) and certain hardware
> contexts are only capable of stage-2 translation.
> 
> The way this is supposed to work is that the KVM host would install the VFIO
> DMA mapping (ok, now I see why you don't like the name) at stage-2. This
> allows the SMMU driver to allocate a corresponding stage-1 context for the
> mapping and expose that directly to the guest as part of a virtual, stage-1-only
> SMMU. Then the guest can install its own SMMU mappings at stage-1 for
> contiguous DMA (in the guest VA space) without any knowledge of the hypervisor
> mapping.
> 
> To do this successfully, we need to communicate the intention of the mapping
> to the SMMU driver (i.e. stage-1 vs stage-2) at domain initialisation time.
> I could just add ARM SMMU-specific properties there, but I thought this
> might potentially be useful to others.

If I understand, this is just nesting or two-dimensional paging support
and you need the lowest level translation to be setup differently
because the context entries for a stage-1 vs stage-2 page table are
different, right?  Is it then the case that you can also support a
traditional one-dimensional configuration where the SMMU is not exposed
and you'll likely have some QEMU switch to configure which will be used
for a given guest?

The code in 3/3 identifies this as a nested capable domain, so I'm not
sure why we'd want to make the assumption that it's for hypervisor use
when a hypervisor can use either type.

> > IOMMUs supporting PCI PRI (Page Request Interface) could
> > potentially make use of something like that on bare metal or under
> > hypervisor control.  If that's true, then could this be some sort of
> > iommu_domain_set_l2_handler() that happens after the domain is
> > allocated?
> 
> I'm not sure that's what I was aiming for... see above.
> 
> > For this patch, I don't understand why legacy KVM assignment would
> > allocate a HYP domain while VFIO would use a DMA domain.  It seems like
> > you're just counting on x86 never making the distinction between the
> > two.
> 
> That's true, but I was also trying to indicate the intention of the mapping
> so that other IOMMUs could potentially make use of the flags.

I find that defining an API on intentions is rarely a good idea.  Stick
with "what does this actually do".  In this case, I think we want an
interface that either specifies that the domain being allocated is
nesting capable or we want to be able to set a nesting attribute on a
domain, which may be possible after it's been allocated for a small
window before other operations have been done.

If we go the route of defining it at allocation time, I'd probably think
about adding a new interface, like:

iommu_domain_alloc_with_features(struct bus_type *bus, u32 features)

Where features is a bitmap

#define IOMMU_DOMAIN_FEATURE_NESTING (1U << 0)

iommu_domain_alloc(struct bus_type *bus) would then just be a wrapper
with features = 0.  If an IOMMU driver doesn't support a requested
feature, it should fail so we don't have cases like KVM requesting a
"HYP" domain when it doesn't need it and the IOMMU drivers don't support
it.

It would be a lot less intrusive if we could use iommu_domain_set_attr()
to enable nesting after allocation though.  This could return error if
called after the point at which it can be easily enabled.
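
As a rough sketch of that alternative (DOMAIN_ATTR_NESTING is hypothetical
here and does not exist yet; this keeps the current single-argument
iommu_domain_alloc()):

	/* Illustrative only: request nesting before any device is attached. */
	struct iommu_domain *domain = iommu_domain_alloc(bus);
	int nesting = 1;

	if (!domain)
		return -ENOMEM;

	if (iommu_domain_set_attr(domain, DOMAIN_ATTR_NESTING, &nesting))
		goto out_no_nesting;	/* e.g. fails once a device is attached */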

> > > --- a/include/linux/iommu.h
> > > +++ b/include/linux/iommu.h
> > > @@ -49,6 +49,10 @@ struct iommu_domain_geometry {
> > >  	bool force_aperture;       /* DMA only allowed in mappable range? */
> > >  };
> > >  
> > > +/* iommu domain types */
> > > +#define IOMMU_DOMAIN_DMA	0x0
> > > +#define IOMMU_DOMAIN_HYP	0x1
> > > +
> > >  struct iommu_domain {
> > >  	struct iommu_ops *ops;
> > >  	void *priv;
> > > @@ -59,6 +63,7 @@ struct iommu_domain {
> > >  
> > >  #define IOMMU_CAP_CACHE_COHERENCY	0x1
> > >  #define IOMMU_CAP_INTR_REMAP		0x2	/* isolates device intrs */
> > > +#define IOMMU_CAP_HYP_MAPPING		0x3	/* isolates guest DMA */
> > 
> > This makes no sense, it's exactly what we do with a "DMA" domain.  I
> > think the code needs to focus on what is really different about this
> > domain, not what is the expected use case.  Thanks,
> 
> The use-case is certainly relevant, though. I can do device passthrough
> with a stage-1 mapping for example, but you wouldn't then be able to
> instantiate a virtual SMMU interface in the guest.

I think you've switched the definition of stage-1 here to what we
consider a normal IOMMU domain mapping today, so I'm going to suggest
that stage-1 vs stage-2 where which is which depends on the type of
domain allocated is way too confusing.

> I could rename these IOMMU_CAP_STAGE{1,2}, but that then sounds very
> ARM-specific. What do you think?

Very much so, IOMMU_CAP_NESTING, DOMAIN_ATTR_NESTING?  Thanks,

Alex

* Re: [RFC PATCH 1/3] iommu: introduce IOMMU_DOMAIN_HYP domain type for hypervisor allocation
From: Will Deacon @ 2014-07-02 10:49 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Tue, Jul 01, 2014 at 08:28:11PM +0100, Alex Williamson wrote:
> On Tue, 2014-07-01 at 19:04 +0100, Will Deacon wrote:
> > On Tue, Jul 01, 2014 at 06:42:33PM +0100, Alex Williamson wrote:
> > > On Tue, 2014-07-01 at 17:10 +0100, Will Deacon wrote:
> > > > Finally, a new IOMMU capability, IOMMU_CAP_HYP_MAPPING, is added so that
> > > > it is possible to check whether or not a domain is able to make use of
> > > > nested translation.
> > > 
> > > Why is this necessarily related to HYPervisor use?  It seems like this
> > > new domain type is effectively just a normal domain that supports some
> > > sort of fault handler that can call out to attempt to create missing
> > > mappings.
> > 
> > Not quite. The idea of this domain is that it provides isolation for a
> > guest, so I'd actually expect these domains to contain pinned mappings most
> > of the time (handling guest faults in the hypervisor is pretty error-prone).
> 
> Hmm, that's exactly what an existing IOMMU domain does and how we use it
> through QEMU and VFIO today.

There is a subtlety here in that VFIO can be used for both KVM device
passthrough and for userspace drivers. The former wants a stage-2 mapping
(to allow nesting) whilst the latter wants stage-1 (since we can't guarantee
cache coherency with a stage-2 mapping in isolation).

> > Perhaps if I explain how the ARM SMMU works, that might help (and if it
> > doesn't, please reply saying so :). The ARM SMMU supports two stages of
> > translation:
> > 
> >   Stage-1: Guest VA (VA) -> Guest PA (IPA, or intermediate physical address)
> >   Stage-2: IPA -> Host Physical Address (PA)
> > 
> > These can be glued together to form nested translation, where an incoming VA
> > is translated through both stages to get a PA. Page table walks triggered at
> > stage-1 expect to see IPAs for the table addresses.
> > 
> > An important thing to note here is that the hardware is configured
> > differently at each stage; the page table formats themselves are slightly
> > different (e.g. restricted permissions at stage-2) and certain hardware
> > contexts are only capable of stage-2 translation.
> > 
> > The way this is supposed to work is that the KVM host would install the VFIO
> > DMA mapping (ok, now I see why you don't like the name) at stage-2. This
> > allows the SMMU driver to allocate a corresponding stage-1 context for the
> > mapping and expose that directly to the guest as part of a virtual, stage-1-only
> > SMMU. Then the guest can install its own SMMU mappings at stage-1 for
> > contiguous DMA (in the guest VA space) without any knowledge of the hypervisor
> > mapping.
> > 
> > To do this successfully, we need to communicate the intention of the mapping
> > to the SMMU driver (i.e. stage-1 vs stage-2) at domain initialisation time.
> > I could just add ARM SMMU-specific properties there, but I thought this
> > might potentially be useful to others.
> 
> If I understand, this is just nesting or two-dimensional paging support
> and you need the lowest level translation to be setup differently
> because the context entries for a stage-1 vs stage-2 page table are
> different, right?

Correct! Other differences include:

  - Slightly different page table format
  - Limited ability to manipulate memory attributes at stage-2 (which can
    affect IOMMU_CACHE mappings, as I mentioned earlier).
  - An implementation can support stage-1 only or stage-2 only if it
    wishes.

> Is it then the case that you can also support a
> traditional one-dimensional configuration where the SMMU is not exposed
> and you'll likely have some QEMU switch to configure which will be used
> for a given guest?

Yup, and this could actually be done at either stage-1 or stage-2 since VFIO
calls iommu_map directly. In the future, we'd like to share the CPU page
tables created by KVM, which would force us to use stage-2.

> The code in 3/3 identifies this as a nested capable domain, so I'm not
> sure why we'd want to make the assumption that it's for hypervisor use
> when a hypervisor can use either type.

Actually, a hypervisor would want to use stage-1 for its own mappings (back
to the IOMMU_CACHE stuff) and stage-2 for guests. Otherwise, I could simply
hack the SMMU driver to always install at the highest available stage.

> > > For this patch, I don't understand why legacy KVM assignment would
> > > allocate a HYP domain while VFIO would use a DMA domain.  It seems like
> > > you're just counting on x86 never making the distinction between the
> > > two.
> > 
> > That's true, but I was also trying to indicate the intention of the mapping
> > so that other IOMMUs could potentially make use of the flags.
> 
> I find that defining an API on intentions is rarely a good idea.  Stick
> with "what does this actually do".  In this case, I think we want an
> interface that either specifies that the domain being allocated is
> nesting capable or we want to be able to set a nesting attribute on a
> domain, which may be possible after it's been allocated for a small
> window before other operations have been done.

Ok, I can work with that. One question that immediately springs to mind is
when you have a stage-2-only SMMU. We can't support nesting, since there's
no stage-1 to give to the guest, but we'd still want to allow
device-passthrough. In this case, perhaps it's best to think of stage-2-only
SMMUs having an identity-map fixed at stage-1?

> If we go the route of defining it at allocation time, I'd probably think
> about adding a new interface, like:
> 
> iommu_domain_alloc_with_features(struct bus_type *bus, u32 features)
> 
> Where features is a bitmap
> 
> #define IOMMU_DOMAIN_FEATURE_NESTING (1U << 0)
> 
> iommu_domain_alloc(struct bus_type *bus) would then just be a wrapper
> with features = 0.  If an IOMMU driver doesn't support a requested
> feature, it should fail so we don't have cases like KVM requesting a
> "HYP" domain when it doesn't need it and the IOMMU drivers don't support
> it.
> 
> It would be a lot less intrusive if we could use iommu_domain_set_attr()
> to enable nesting after allocation though.  This could return error if
> called after the point at which it can be easily enabled.

I'll take another look at set_attr, since I agree that it would be cleaner.

> > > This makes no sense, it's exactly what we do with a "DMA" domain.  I
> > > think the code needs to focus on what is really different about this
> > > domain, not what is the expected use case.  Thanks,
> > 
> > The use-case is certainly relevant, though. I can do device passthrough
> > with a stage-1 mapping for example, but you wouldn't then be able to
> > instantiate a virtual SMMU interface in the guest.
> 
> I think you've switched the definition of stage-1 here to what we
> consider a normal IOMMU domain mapping today, so I'm going to suggest
> that stage-1 vs stage-2 where which is which depends on the type of
> domain allocated is way too confusing.

Agreed; but we do want to support device-passthrough for stage-1-only and
stage-2-only SMMUs. That detail can hopefully be hidden away in the SMMU
driver if we stick to the NESTING flag.

Thanks again for the comments, I'll try and address them for v2.

Will

* Re: [RFC PATCH 1/3] iommu: introduce IOMMU_DOMAIN_HYP domain type for hypervisor allocation
From: Will Deacon @ 2014-07-02 13:57 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Wed, Jul 02, 2014 at 11:49:02AM +0100, Will Deacon wrote:
> On Tue, Jul 01, 2014 at 08:28:11PM +0100, Alex Williamson wrote:
> > If we go the route of defining it at allocation time, I'd probably think
> > about adding a new interface, like:
> > 
> > iommu_domain_alloc_with_features(struct bus_type *bus, u32 features)
> > 
> > Where features is a bitmap
> > 
> > #define IOMMU_DOMAIN_FEATURE_NESTING (1U << 0)
> > 
> > iommu_domain_alloc(struct bus_type *bus) would then just be a wrapper
> > with features = 0.  If an IOMMU driver doesn't support a requested
> > feature, it should fail so we don't have cases like KVM requesting a
> > "HYP" domain when it doesn't need it and the IOMMU drivers don't support
> > it.
> > 
> > It would be a lot less intrusive if we could use iommu_domain_set_attr()
> > to enable nesting after allocation though.  This could return error if
> > called after the point at which it can be easily enabled.
> 
> I'll take another look at set_attr, since I agree that it would be cleaner.

Ok, we quickly end up with a bootstrap problem here:

  - We need to know whether the domain is stage-1 or stage-2 before we've
    attached any devices to it (since we can have DMA traffic once a
    device is attached).

  - Until a device is attached, we don't know the specific SMMU instance
    backing the domain.

  - Until we know the SMMU, we don't know whether or not we can do
    nesting, and therefore can't handle set_attr calls.

So the two places we can provide this information are either at domain_alloc
time or at device_attach time. I think the former makes more sense, so I'll
look at adding a new alloc function as you suggest above.

Will

* Re: [RFC PATCH 1/3] iommu: introduce IOMMU_DOMAIN_HYP domain type for hypervisor allocation
From: Alex Williamson @ 2014-07-02 15:04 UTC (permalink / raw)
  To: Will Deacon; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Wed, 2014-07-02 at 14:57 +0100, Will Deacon wrote:
> On Wed, Jul 02, 2014 at 11:49:02AM +0100, Will Deacon wrote:
> > On Tue, Jul 01, 2014 at 08:28:11PM +0100, Alex Williamson wrote:
> > > If we go the route of defining it at allocation time, I'd probably think
> > > about adding a new interface, like:
> > > 
> > > iommu_domain_alloc_with_features(struct bus_type *bus, u32 features)
> > > 
> > > Where features is a bitmap
> > > 
> > > #define IOMMU_DOMAIN_FEATURE_NESTING (1U << 0)
> > > 
> > > iommu_domain_alloc(struct bus_type *bus) would then just be a wrapper
> > > with features = 0.  If an IOMMU driver doesn't support a requested
> > > feature, it should fail so we don't have cases like KVM requesting a
> > > "HYP" domain when it doesn't need it and the IOMMU drivers don't support
> > > it.
> > > 
> > > It would be a lot less intrusive if we could use iommu_domain_set_attr()
> > > to enable nesting after allocation though.  This could return error if
> > > called after the point at which it can be easily enabled.
> > 
> > I'll take another look at set_attr, since I agree that it would be cleaner.
> 
> Ok, we quickly end up with a bootstrap problem here:
> 
>   - We need to know whether the domain is stage-1 or stage-2 before we've
>     attached any devices to it (since we can have DMA traffic once a
>     device is attached).
> 
>   - Until a device is attached, we don't know the specific SMMU instance
>     backing the domain.
> 
>   - Until we know the SMMU, we don't know whether or not we can do
>     nesting, and therefore can't handle set_attr calls.
> 
> So the two places we can provide this information are either at domain_alloc
> time or at device_attach time. I think the former makes more sense, so I'll
> look at adding a new alloc function as you suggest above.

I don't really see how using set_attr to set a domain parameter is any
different than specifying it at domain_alloc.  "Create a domain, set it
to require nesting, add devices" versus "Create a domain and set it to
require nesting, add devices".  Thanks,

Alex

* Re: [RFC PATCH 1/3] iommu: introduce IOMMU_DOMAIN_HYP domain type for hypervisor allocation
From: Will Deacon @ 2014-07-02 18:57 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Wed, Jul 02, 2014 at 04:04:15PM +0100, Alex Williamson wrote:
> On Wed, 2014-07-02 at 14:57 +0100, Will Deacon wrote:
> > On Wed, Jul 02, 2014 at 11:49:02AM +0100, Will Deacon wrote:
> > > On Tue, Jul 01, 2014 at 08:28:11PM +0100, Alex Williamson wrote:
> > > > If we go the route of defining it at allocation time, I'd probably think
> > > > about adding a new interface, like:
> > > > 
> > > > iommu_domain_alloc_with_features(struct bus_type *bus, u32 features)
> > > > 
> > > > Where features is a bitmap
> > > > 
> > > > #define IOMMU_DOMAIN_FEATURE_NESTING (1U << 0)
> > > > 
> > > > iommu_domain_alloc(struct bus_type *bus) would then just be a wrapper
> > > > with features = 0.  If an IOMMU driver doesn't support a requested
> > > > feature, it should fail so we don't have cases like KVM requesting a
> > > > "HYP" domain when it doesn't need it and the IOMMU drivers don't support
> > > > it.
> > > > 
> > > > It would be a lot less intrusive if we could use iommu_domain_set_attr()
> > > > to enable nesting after allocation though.  This could return error if
> > > > called after the point at which it can be easily enabled.
> > > 
> > > I'll take another look at set_attr, since I agree that it would be cleaner.
> > 
> > Ok, we quickly end up with a bootstrap problem here:
> > 
> >   - We need to know whether the domain is stage-1 or stage-2 before we've
> >     attached any devices to it (since we can have DMA traffic once a
> >     device is attached).
> > 
> >   - Until a device is attached, we don't know the specific SMMU instance
> >     backing the domain.
> > 
> >   - Until we know the SMMU, we don't know whether or not we can do
> >     nesting, and therefore can't handle set_attr calls.
> > 
> > So the two places we can provide this information are either at domain_alloc
> > time or at device_attach time. I think the former makes more sense, so I'll
> > look at adding a new alloc function as you suggest above.
> 
> I don't really see how using set_attr to set a domain parameter is any
> different than specifying it at domain_alloc.  "Create a domain, set it
> to require nesting, add devices" versus "Create a domain and set it to
> require nesting, add devices".  Thanks,

There's a difference in where you detect the error, but actually, that's all
under the same VFIO ioctl so I think you're right.

I'll put together a patch.

Will
