* Re: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi interrupt
  2015-09-30 14:56   ` Bharat Bhushan
@ 2015-09-30 11:02     ` kbuild test robot
  -1 siblings, 0 replies; 45+ messages in thread
From: kbuild test robot @ 2015-09-30 11:02 UTC (permalink / raw)
  To: Bharat Bhushan
  Cc: kbuild-all, kvmarm, kvm, alex.williamson, christoffer.dall,
	eric.auger, pranavkumar, marc.zyngier, will.deacon,
	Bharat Bhushan

[-- Attachment #1: Type: text/plain, Size: 634 bytes --]

Hi Bharat,

[auto build test results on v4.3-rc3 -- if it's inappropriate base, please ignore]

config: x86_64-rhel (attached as .config)
reproduce:
  git checkout 6fdf43e0b410216a2fe2d1d6e8541fb4f69557f9
  # save the attached .config to linux build tree
  make ARCH=x86_64 

All error/warnings (new ones prefixed by >>):

>> ERROR: "vfio_device_map_msi" [drivers/vfio/pci/vfio-pci.ko] undefined!
>> ERROR: "vfio_device_unmap_msi" [drivers/vfio/pci/vfio-pci.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 35272 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi interrupt
  2015-09-30 11:02     ` kbuild test robot
  (?)
@ 2015-09-30 11:32     ` Bhushan Bharat
  -1 siblings, 0 replies; 45+ messages in thread
From: Bhushan Bharat @ 2015-09-30 11:32 UTC (permalink / raw)
  To: kvm
  Cc: kvmarm, alex.williamson, christoffer.dall, eric.auger,
	pranavkumar, marc.zyngier, will.deacon


> -----Original Message-----
> From: kbuild test robot [mailto:lkp@intel.com]
> Sent: Wednesday, September 30, 2015 4:33 PM
> To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> Cc: kbuild-all@01.org; kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> alex.williamson@redhat.com; christoffer.dall@linaro.org;
> eric.auger@linaro.org; pranavkumar@linaro.org; marc.zyngier@arm.com;
> will.deacon@arm.com; Bhushan Bharat-R65777
> <Bharat.Bhushan@freescale.com>
> Subject: Re: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi
> interrupt
> 
> Hi Bharat,
> 
> [auto build test results on v4.3-rc3 -- if it's inappropriate base, please ignore]
> 
> config: x86_64-rhel (attached as .config)
> reproduce:
>   git checkout 6fdf43e0b410216a2fe2d1d6e8541fb4f69557f9
>   # save the attached .config to linux build tree
>   make ARCH=x86_64
> 
> All error/warnings (new ones prefixed by >>):
> 
> >> ERROR: "vfio_device_map_msi" [drivers/vfio/pci/vfio-pci.ko] undefined!
> >> ERROR: "vfio_device_unmap_msi" [drivers/vfio/pci/vfio-pci.ko]
> undefined!

Yes, this is the problem; I will correct it in the next version.

Thanks
-Bharat

> 
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi interrupt
  2015-09-30 14:56   ` Bharat Bhushan
@ 2015-09-30 11:34     ` kbuild test robot
  -1 siblings, 0 replies; 45+ messages in thread
From: kbuild test robot @ 2015-09-30 11:34 UTC (permalink / raw)
  To: Bharat Bhushan
  Cc: kbuild-all, kvmarm, kvm, alex.williamson, christoffer.dall,
	eric.auger, pranavkumar, marc.zyngier, will.deacon,
	Bharat Bhushan

[-- Attachment #1: Type: text/plain, Size: 576 bytes --]

Hi Bharat,

[auto build test results on v4.3-rc3 -- if it's inappropriate base, please ignore]

config: i386-allmodconfig (attached as .config)
reproduce:
  git checkout 6fdf43e0b410216a2fe2d1d6e8541fb4f69557f9
  # save the attached .config to linux build tree
  make ARCH=i386 

All error/warnings (new ones prefixed by >>):

>> ERROR: "vfio_device_map_msi" undefined!
>> ERROR: "vfio_device_unmap_msi" undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 51590 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region
@ 2015-09-30 14:56 Bharat Bhushan
  2015-09-30 14:56   ` Bharat Bhushan
                   ` (5 more replies)
  0 siblings, 6 replies; 45+ messages in thread
From: Bharat Bhushan @ 2015-09-30 14:56 UTC (permalink / raw)
  To: kvmarm, kvm, alex.williamson; +Cc: marc.zyngier, will.deacon

This patch adds VFIO APIs to add and remove reserved iova regions.
A reserved iova region can be used for mapping a specific physical
address in the iommu.

Currently we plan to use this interface to add iova regions for
creating iommu mappings of msi-pages, but the API is designed for
future extension so that other physical addresses can also be mapped.
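
For illustration, a minimal user-space sketch of the intended usage;
the container_fd, the iova and the size below are made-up example
values, and <linux/vfio.h> with the definitions from this patch is
assumed:

  struct vfio_iommu_reserved_region_add add = {
          .argsz = sizeof(add),
          .flags = VFIO_IOMMU_RES_REGION_ADD |
                   VFIO_IOMMU_RES_REGION_READ |
                   VFIO_IOMMU_RES_REGION_WRITE,
          .iova  = 0x10000000,    /* example page-aligned iova */
          .size  = 0x1000,        /* one 4K page */
  };
  struct vfio_iommu_reserved_region_del del = {
          .argsz = sizeof(del),
          .flags = VFIO_IOMMU_RES_REGION_DEL,
          .iova  = 0x10000000,
          .size  = 0x1000,
  };

  /* reserve an iova range that the kernel may later use for msi-pages */
  if (ioctl(container_fd, VFIO_IOMMU_RESERVED_REGION_ADD, &add))
          perror("VFIO_IOMMU_RESERVED_REGION_ADD");

  /* a region that is not currently mapped can be removed again */
  if (ioctl(container_fd, VFIO_IOMMU_RESERVED_REGION_DEL, &del))
          perror("VFIO_IOMMU_RESERVED_REGION_DEL");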

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
---
 drivers/vfio/vfio_iommu_type1.c | 201 +++++++++++++++++++++++++++++++++++++++-
 include/uapi/linux/vfio.h       |  43 +++++++++
 2 files changed, 243 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 57d8c37..fa5d3e4 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -59,6 +59,7 @@ struct vfio_iommu {
 	struct rb_root		dma_list;
 	bool			v2;
 	bool			nesting;
+	struct list_head	reserved_iova_list;
 };
 
 struct vfio_domain {
@@ -77,6 +78,15 @@ struct vfio_dma {
 	int			prot;		/* IOMMU_READ/WRITE */
 };
 
+struct vfio_resvd_region {
+	dma_addr_t	iova;
+	size_t		size;
+	int		prot;			/* IOMMU_READ/WRITE */
+	int		refcount;		/* ref count of mappings */
+	uint64_t	map_paddr;		/* Mapped Physical Address */
+	struct list_head next;
+};
+
 struct vfio_group {
 	struct iommu_group	*iommu_group;
 	struct list_head	next;
@@ -106,6 +116,38 @@ static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
 	return NULL;
 }
 
+/* This function must be called with iommu->lock held */
+static bool vfio_overlap_with_resvd_region(struct vfio_iommu *iommu,
+					   dma_addr_t start, size_t size)
+{
+	struct vfio_resvd_region *region;
+
+	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
+		if (region->iova < start)
+			return (start - region->iova < region->size);
+		else if (start < region->iova)
+			return (region->iova - start < size);
+
+		return (region->size > 0 && size > 0);
+	}
+
+	return false;
+}
+
+/* This function must be called with iommu->lock held */
+static
+struct vfio_resvd_region *vfio_find_resvd_region(struct vfio_iommu *iommu,
+						 dma_addr_t start, size_t size)
+{
+	struct vfio_resvd_region *region;
+
+	list_for_each_entry(region, &iommu->reserved_iova_list, next)
+		if (region->iova == start && region->size == size)
+			return region;
+
+	return NULL;
+}
+
 static void vfio_link_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
 {
 	struct rb_node **link = &iommu->dma_list.rb_node, *parent = NULL;
@@ -580,7 +622,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 
 	mutex_lock(&iommu->lock);
 
-	if (vfio_find_dma(iommu, iova, size)) {
+	if (vfio_find_dma(iommu, iova, size) ||
+	    vfio_overlap_with_resvd_region(iommu, iova, size)) {
 		mutex_unlock(&iommu->lock);
 		return -EEXIST;
 	}
@@ -626,6 +669,127 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 	return ret;
 }
 
+/* This function must be called with iommu->lock held */
+static
+int vfio_iommu_resvd_region_del(struct vfio_iommu *iommu,
+				dma_addr_t iova, size_t size, int prot)
+{
+	struct vfio_resvd_region *res_region;
+
+	res_region = vfio_find_resvd_region(iommu, iova, size);
+	/* Region should not be mapped in iommu */
+	if (res_region == NULL || res_region->map_paddr)
+		return -EINVAL;
+
+	list_del(&res_region->next);
+	kfree(res_region);
+	return 0;
+}
+
+/* This function must be called with iommu->lock held */
+static int vfio_iommu_resvd_region_add(struct vfio_iommu *iommu,
+				       dma_addr_t iova, size_t size, int prot)
+{
+	struct vfio_resvd_region *res_region;
+
+	/* Check overlap with dma mapping and reserved regions */
+	if (vfio_find_dma(iommu, iova, size) ||
+	    vfio_find_resvd_region(iommu, iova, size))
+		return -EEXIST;
+
+	res_region = kzalloc(sizeof(*res_region), GFP_KERNEL);
+	if (res_region == NULL)
+		return -ENOMEM;
+
+	res_region->iova = iova;
+	res_region->size = size;
+	res_region->prot = prot;
+	res_region->refcount = 0;
+	res_region->map_paddr = 0;
+
+	list_add(&res_region->next, &iommu->reserved_iova_list);
+
+	return 0;
+}
+
+static
+int vfio_handle_reserved_region_add(struct vfio_iommu *iommu,
+				struct vfio_iommu_reserved_region_add *region)
+{
+	dma_addr_t iova = region->iova;
+	size_t size = region->size;
+	int flags = region->flags;
+	uint64_t mask;
+	int prot = 0;
+	int ret;
+
+	/* Verify that none of our __u64 fields overflow */
+	if (region->size != size || region->iova != iova)
+		return -EINVAL;
+
+	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
+
+	WARN_ON(mask & PAGE_MASK);
+
+	if (flags & VFIO_IOMMU_RES_REGION_READ)
+		prot |= IOMMU_READ;
+	if (flags & VFIO_IOMMU_RES_REGION_WRITE)
+		prot |= IOMMU_WRITE;
+
+	if (!prot || !size || (size | iova) & mask)
+		return -EINVAL;
+
+	/* Don't allow IOVA wrap */
+	if (iova + size - 1 < iova)
+		return -EINVAL;
+
+	mutex_lock(&iommu->lock);
+
+	if (region->flags & VFIO_IOMMU_RES_REGION_ADD) {
+		ret = vfio_iommu_resvd_region_add(iommu, iova, size, prot);
+		if (ret) {
+			mutex_unlock(&iommu->lock);
+			return ret;
+		}
+	}
+
+	mutex_unlock(&iommu->lock);
+	return 0;
+}
+
+static
+int vfio_handle_reserved_region_del(struct vfio_iommu *iommu,
+				struct vfio_iommu_reserved_region_del *region)
+{
+	dma_addr_t iova = region->iova;
+	size_t size = region->size;
+	int flags = region->flags;
+	int ret;
+
+	if (!(flags & VFIO_IOMMU_RES_REGION_DEL))
+		return -EINVAL;
+
+	mutex_lock(&iommu->lock);
+
+	/* Check for the region */
+	if (vfio_find_resvd_region(iommu, iova, size) == NULL) {
+		mutex_unlock(&iommu->lock);
+		return -EINVAL;
+	}
+
+	/* remove the reserved region */
+	if (region->flags & VFIO_IOMMU_RES_REGION_DEL) {
+		ret = vfio_iommu_resvd_region_del(iommu, iova, size, flags);
+		if (ret) {
+			mutex_unlock(&iommu->lock);
+			return ret;
+		}
+	}
+
+	mutex_unlock(&iommu->lock);
+	return 0;
+}
+
 static int vfio_bus_type(struct device *dev, void *data)
 {
 	struct bus_type **bus = data;
@@ -905,6 +1069,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
 	}
 
 	INIT_LIST_HEAD(&iommu->domain_list);
+	INIT_LIST_HEAD(&iommu->reserved_iova_list);
 	iommu->dma_list = RB_ROOT;
 	mutex_init(&iommu->lock);
 
@@ -1020,6 +1185,40 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 			return ret;
 
 		return copy_to_user((void __user *)arg, &unmap, minsz);
+	} else if (cmd == VFIO_IOMMU_RESERVED_REGION_ADD) {
+		struct vfio_iommu_reserved_region_add region;
+		long ret;
+
+		minsz = offsetofend(struct vfio_iommu_reserved_region_add,
+				    size);
+		if (copy_from_user(&region, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (region.argsz < minsz)
+			return -EINVAL;
+
+		ret = vfio_handle_reserved_region_add(iommu, &region);
+		if (ret)
+			return ret;
+
+		return copy_to_user((void __user *)arg, &region, minsz);
+	} else if (cmd == VFIO_IOMMU_RESERVED_REGION_DEL) {
+		struct vfio_iommu_reserved_region_del region;
+		long ret;
+
+		minsz = offsetofend(struct vfio_iommu_reserved_region_del,
+				    size);
+		if (copy_from_user(&region, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (region.argsz < minsz)
+			return -EINVAL;
+
+		ret = vfio_handle_reserved_region_del(iommu, &region);
+		if (ret)
+			return ret;
+
+		return copy_to_user((void __user *)arg, &region, minsz);
 	}
 
 	return -ENOTTY;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index b57b750..1abd1a9 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -440,6 +440,49 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/**************** Reserved IOVA region specific APIs **********************/
+
+/*
+ * VFIO_IOMMU_RESERVED_REGION_ADD - _IO(VFIO_TYPE, VFIO_BASE + 17,
+ *					struct vfio_iommu_reserved_region_add)
+ * This is used to add a reserved iova region.
+ * @flags - Input: VFIO_IOMMU_RES_REGION_ADD flag is for adding
+ * a reserved region.
+ * Also pass the READ/WRITE flags to be used in the iommu mapping.
+ * @iova - Input: IOVA base address of reserved region
+ * @size - Input: Size of the reserved region
+ * Return: 0 on success, -errno on failure
+ */
+struct vfio_iommu_reserved_region_add {
+	__u32   argsz;
+	__u32   flags;
+#define VFIO_IOMMU_RES_REGION_ADD	(1 << 0) /* Add a reserved region */
+#define VFIO_IOMMU_RES_REGION_READ	(1 << 1) /* readable region */
+#define VFIO_IOMMU_RES_REGION_WRITE	(1 << 2) /* writable region */
+	__u64	iova;
+	__u64   size;
+};
+#define VFIO_IOMMU_RESERVED_REGION_ADD _IO(VFIO_TYPE, VFIO_BASE + 17)
+
+/*
+ * VFIO_IOMMU_RESERVED_REGION_DEL - _IO(VFIO_TYPE, VFIO_BASE + 18,
+ *					struct vfio_iommu_reserved_region_del)
+ * This is used to delete an existing reserved iova region.
+ * @flags - VFIO_IOMMU_RES_REGION_DEL flag is used for deleting a region;
+ *  only an unmapped region can be deleted.
+ * @iova - Input: IOVA base address of reserved region
+ * @size - Input: Size of the reserved region
+ * Return: 0 on success, -errno on failure
+ */
+struct vfio_iommu_reserved_region_del {
+	__u32   argsz;
+	__u32   flags;
+#define VFIO_IOMMU_RES_REGION_DEL	(1 << 0) /* unset the reserved region */
+	__u64	iova;
+	__u64   size;
+};
+#define VFIO_IOMMU_RESERVED_REGION_DEL _IO(VFIO_TYPE, VFIO_BASE + 18)
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC PATCH 2/6] iommu: Add interface to get msi-pages mapping attributes
  2015-09-30 14:56 [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region Bharat Bhushan
@ 2015-09-30 14:56   ` Bharat Bhushan
  2015-09-30 14:56   ` Bharat Bhushan
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 45+ messages in thread
From: Bharat Bhushan @ 2015-09-30 14:56 UTC (permalink / raw)
  To: kvmarm, kvm, alex.williamson; +Cc: marc.zyngier, will.deacon

This API reports whether msi-pages are automatically mapped in the
iommu with some magic iova, which is what most iommus currently do
and is the default behaviour.

It also reports whether the iommu allows the user to define a
different iova for the msi-page mapping of the domain. This is
required when an msi-capable device is directly assigned to
user-space/VM and user-space/VM needs to choose an iova for the
msi-page mapping that does not overlap with the rest of its dma-able
address space.

This patch only defines the interface; follow-up patches will
extend it.
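
As a rough sketch (not part of this patch), a caller such as the vfio
iommu driver could query the attribute like this, where the 'domain'
pointer is assumed to come from the caller:

  struct iommu_domain_msi_maps msi_maps = {};
  int ret;

  /* ask the iommu driver how msi-pages are mapped for this domain */
  ret = iommu_domain_get_attr(domain, DOMAIN_ATTR_MSI_MAPPING, &msi_maps);
  if (ret)
          return ret;

  /* automap: msi-pages are mapped automatically with some magic iova;
   * override_automap: the caller may choose the msi-page iova itself.
   */
  if (!msi_maps.automap || msi_maps.override_automap)
          pr_debug("msi-page iova must/may be supplied by the user\n");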

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
---
 drivers/iommu/arm-smmu.c        |  3 +++
 drivers/iommu/fsl_pamu_domain.c |  3 +++
 drivers/iommu/iommu.c           | 14 ++++++++++++++
 include/linux/iommu.h           |  9 ++++++++-
 4 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 66a803b..a3956fb 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1406,6 +1406,9 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 	case DOMAIN_ATTR_NESTING:
 		*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
 		return 0;
+	case DOMAIN_ATTR_MSI_MAPPING:
+		/* Dummy handling added */
+		return 0;
 	default:
 		return -ENODEV;
 	}
diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index 1d45293..9a94430 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -856,6 +856,9 @@ static int fsl_pamu_get_domain_attr(struct iommu_domain *domain,
 	case DOMAIN_ATTR_FSL_PAMUV1:
 		*(int *)data = DOMAIN_ATTR_FSL_PAMUV1;
 		break;
+	case DOMAIN_ATTR_MSI_MAPPING:
+		/* Dummy handling added */
+		break;
 	default:
 		pr_debug("Unsupported attribute type\n");
 		ret = -EINVAL;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d4f527e..16c2eab 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1216,6 +1216,7 @@ int iommu_domain_get_attr(struct iommu_domain *domain,
 	bool *paging;
 	int ret = 0;
 	u32 *count;
+	struct iommu_domain_msi_maps *msi_maps;
 
 	switch (attr) {
 	case DOMAIN_ATTR_GEOMETRY:
@@ -1236,6 +1237,19 @@ int iommu_domain_get_attr(struct iommu_domain *domain,
 			ret = -ENODEV;
 
 		break;
+	case DOMAIN_ATTR_MSI_MAPPING:
+		msi_maps = data;
+
+		/* By default msi-pages are magically mapped with some iova
+		 * and configuring a different iova is not allowed.
+		 */
+		msi_maps->automap = true;
+		msi_maps->override_automap = false;
+
+		if (domain->ops->domain_get_attr)
+			ret = domain->ops->domain_get_attr(domain, attr, data);
+
+		break;
 	default:
 		if (!domain->ops->domain_get_attr)
 			return -EINVAL;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 0546b87..6d49f3f 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -83,6 +83,13 @@ struct iommu_domain {
 	struct iommu_domain_geometry geometry;
 };
 
+struct iommu_domain_msi_maps {
+	dma_addr_t base_address;
+	dma_addr_t size;
+	bool automap;
+	bool override_automap;
+};
+
 enum iommu_cap {
 	IOMMU_CAP_CACHE_COHERENCY,	/* IOMMU can enforce cache coherent DMA
 					   transactions */
@@ -111,6 +118,7 @@ enum iommu_attr {
 	DOMAIN_ATTR_FSL_PAMU_ENABLE,
 	DOMAIN_ATTR_FSL_PAMUV1,
 	DOMAIN_ATTR_NESTING,	/* two stages of translation */
+	DOMAIN_ATTR_MSI_MAPPING, /* Provides MSIs mapping in iommu */
 	DOMAIN_ATTR_MAX,
 };
 
@@ -167,7 +175,6 @@ struct iommu_ops {
 	int (*domain_set_windows)(struct iommu_domain *domain, u32 w_count);
 	/* Get the numer of window per domain */
 	u32 (*domain_get_windows)(struct iommu_domain *domain);
-
 #ifdef CONFIG_OF_IOMMU
 	int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
 #endif
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
  2015-09-30 14:56 [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region Bharat Bhushan
@ 2015-09-30 14:56   ` Bharat Bhushan
  2015-09-30 14:56   ` Bharat Bhushan
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 45+ messages in thread
From: Bharat Bhushan @ 2015-09-30 14:56 UTC (permalink / raw)
  To: kvmarm, kvm, alex.williamson; +Cc: marc.zyngier, will.deacon

This patch lets user-space know whether msi-pages are automatically
mapped with some magic iova or not.

Even if msi-pages are automatically mapped, user-space may want to
override the automatic iova selection for the msi mapping. For that,
user-space needs to know whether it is allowed to change the
automatic mapping, and this API provides that information. Follow-up
patches will add the mechanism to actually override it.
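
For illustration, a user-space sketch of consuming the new flags;
container_fd is assumed to be an open and initialized VFIO container:

  struct vfio_iommu_type1_info info = { .argsz = sizeof(info) };

  if (ioctl(container_fd, VFIO_IOMMU_GET_INFO, &info))
          perror("VFIO_IOMMU_GET_INFO");

  if (info.flags & VFIO_IOMMU_INFO_MSI_AUTOMAP)
          printf("msi-pages are auto-mapped by the kernel\n");

  if (info.flags & VFIO_IOMMU_INFO_MSI_ALLOW_RECONFIG)
          printf("user-space may reserve the iova used for msi-pages\n");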

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
---
 drivers/vfio/vfio_iommu_type1.c | 32 ++++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h       |  3 +++
 2 files changed, 35 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index fa5d3e4..3315fb6 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -59,6 +59,7 @@ struct vfio_iommu {
 	struct rb_root		dma_list;
 	bool			v2;
 	bool			nesting;
+	bool			allow_msi_reconfig;
 	struct list_head	reserved_iova_list;
 };
 
@@ -1117,6 +1118,23 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
 	return ret;
 }
 
+static
+int vfio_domains_get_msi_maps(struct vfio_iommu *iommu,
+			      struct iommu_domain_msi_maps *msi_maps)
+{
+	struct vfio_domain *d;
+	int ret;
+
+	mutex_lock(&iommu->lock);
+	/* All domains have same msi-automap property, pick first */
+	d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
+	ret = iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_MAPPING,
+				    msi_maps);
+	mutex_unlock(&iommu->lock);
+
+	return ret;
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
 				   unsigned int cmd, unsigned long arg)
 {
@@ -1138,6 +1156,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		}
 	} else if (cmd == VFIO_IOMMU_GET_INFO) {
 		struct vfio_iommu_type1_info info;
+		struct iommu_domain_msi_maps msi_maps;
+		int ret;
 
 		minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
 
@@ -1149,6 +1169,18 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 
 		info.flags = 0;
 
+		ret = vfio_domains_get_msi_maps(iommu, &msi_maps);
+		if (ret)
+			return ret;
+
+		if (msi_maps.override_automap) {
+			info.flags |= VFIO_IOMMU_INFO_MSI_ALLOW_RECONFIG;
+			iommu->allow_msi_reconfig = true;
+		}
+
+		if (msi_maps.automap)
+			info.flags |= VFIO_IOMMU_INFO_MSI_AUTOMAP;
+
 		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
 
 		return copy_to_user((void __user *)arg, &info, minsz);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 1abd1a9..9998f6e 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -391,6 +391,9 @@ struct vfio_iommu_type1_info {
 	__u32	argsz;
 	__u32	flags;
 #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
+#define VFIO_IOMMU_INFO_MSI_AUTOMAP (1 << 1)	/* MSI pages are auto-mapped
+						   in iommu */
+#define VFIO_IOMMU_INFO_MSI_ALLOW_RECONFIG (1 << 2) /* Allows reconfig of automap */
 	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
 };
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC PATCH 4/6] vfio: Add interface to iommu-map/unmap MSI pages
  2015-09-30 14:56 [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region Bharat Bhushan
@ 2015-09-30 14:56   ` Bharat Bhushan
  2015-09-30 14:56   ` Bharat Bhushan
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 45+ messages in thread
From: Bharat Bhushan @ 2015-09-30 14:56 UTC (permalink / raw)
  To: kvmarm, kvm, alex.williamson; +Cc: marc.zyngier, will.deacon

For MSI interrupts to work with a pass-through device we need a
mapping of the msi-pages in the iommu. On some platforms (like x86)
this msi-page mapping happens magically: an iova is chosen that is
somehow known never to overlap with guest memory. Such magic iova
selection may not hold for all platforms (like PowerPC and ARM64).

Also, on x86 there is no problem as long as an x86 guest runs on an
x86 host, but there can be issues when running a non-x86 guest on an
x86 host, or with other userspace applications (I think ODP/DPDK),
because in those cases the chosen iova may overlap with the guest
memory mapping.

This patch adds an interface to iommu-map and iommu-unmap msi-pages
at a reserved iova chosen by userspace.
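
A condensed sketch of the intended in-kernel usage (the 'device'
pointer and 'msi_addr' are assumed to come from the caller, e.g.
vfio-pci in a later patch of this series):

  uint64_t msi_iova = 0;
  int ret;

  /* map the msi doorbell page at one of the user-reserved iovas */
  ret = vfio_device_map_msi(device, msi_addr, PAGE_SIZE, &msi_iova);
  if (ret)
          return ret;

  /* if msi_iova is non-zero and differs from msi_addr, the caller
   * reprograms the device so its msi writes target msi_iova
   */

  /* drop the mapping reference when the interrupt is torn down */
  vfio_device_unmap_msi(device, msi_iova, PAGE_SIZE);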

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
---
 drivers/vfio/vfio.c             |  52 +++++++++++++++++++
 drivers/vfio/vfio_iommu_type1.c | 111 ++++++++++++++++++++++++++++++++++++++++
 include/linux/vfio.h            |   9 +++-
 3 files changed, 171 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 2fb29df..a817d2d 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -605,6 +605,58 @@ static int vfio_iommu_group_notifier(struct notifier_block *nb,
 	return NOTIFY_OK;
 }
 
+int vfio_device_map_msi(struct vfio_device *device, uint64_t msi_addr,
+			uint32_t size, uint64_t *msi_iova)
+{
+	struct vfio_container *container = device->group->container;
+	struct vfio_iommu_driver *driver;
+	int ret;
+
+	/* Validate address and size */
+	if (!msi_addr || !size || !msi_iova)
+		return -EINVAL;
+
+	down_read(&container->group_lock);
+
+	driver = container->iommu_driver;
+	if (!driver || !driver->ops || !driver->ops->msi_map) {
+		up_read(&container->group_lock);
+		return -EINVAL;
+	}
+
+	ret = driver->ops->msi_map(container->iommu_data,
+				   msi_addr, size, msi_iova);
+
+	up_read(&container->group_lock);
+	return ret;
+}
+
+int vfio_device_unmap_msi(struct vfio_device *device, uint64_t msi_iova,
+			  uint64_t size)
+{
+	struct vfio_container *container = device->group->container;
+	struct vfio_iommu_driver *driver;
+	int ret;
+
+	/* Validate address and size */
+	if (!msi_iova || !size)
+		return -EINVAL;
+
+	down_read(&container->group_lock);
+
+	driver = container->iommu_driver;
+	if (!driver || !driver->ops || !driver->ops->msi_unmap) {
+		up_read(&container->group_lock);
+		return -EINVAL;
+	}
+
+	ret = driver->ops->msi_unmap(container->iommu_data,
+				     msi_iova, size);
+
+	up_read(&container->group_lock);
+	return ret;
+}
+
 /**
  * VFIO driver API
  */
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 3315fb6..ab376c2 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1003,12 +1003,34 @@ out_free:
 	return ret;
 }
 
+static void vfio_iommu_unmap_all_reserved_regions(struct vfio_iommu *iommu)
+{
+	struct vfio_resvd_region *region;
+	struct vfio_domain *d;
+
+	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
+		list_for_each_entry(d, &iommu->domain_list, next) {
+			if (!region->map_paddr)
+				continue;
+
+			if (!iommu_iova_to_phys(d->domain, region->iova))
+				continue;
+
+			iommu_unmap(d->domain, region->iova, PAGE_SIZE);
+			region->map_paddr = 0;
+			cond_resched();
+		}
+	}
+}
+
 static void vfio_iommu_unmap_unpin_all(struct vfio_iommu *iommu)
 {
 	struct rb_node *node;
 
 	while ((node = rb_first(&iommu->dma_list)))
 		vfio_remove_dma(iommu, rb_entry(node, struct vfio_dma, node));
+
+	vfio_iommu_unmap_all_reserved_regions(iommu);
 }
 
 static void vfio_iommu_type1_detach_group(void *iommu_data,
@@ -1048,6 +1070,93 @@ done:
 	mutex_unlock(&iommu->lock);
 }
 
+static int vfio_iommu_type1_msi_map(void *iommu_data, uint64_t msi_addr,
+				    uint64_t size, uint64_t *msi_iova)
+{
+	struct vfio_iommu *iommu = iommu_data;
+	struct vfio_resvd_region *region;
+	int ret;
+
+	mutex_lock(&iommu->lock);
+
+	/* Do not try to create an iommu mapping if msi reconfig is not allowed */
+	if (!iommu->allow_msi_reconfig) {
+		mutex_unlock(&iommu->lock);
+		return 0;
+	}
+
+	/* Check if there is already region mapping the msi page */
+	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
+		if (region->map_paddr == msi_addr) {
+			*msi_iova = region->iova;
+			region->refcount++;
+			mutex_unlock(&iommu->lock);
+			return 0;
+		}
+	}
+
+	/* Get an unmapped reserved region */
+	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
+		if (!region->map_paddr)
+			break;
+	}
+
+	if (region == NULL) {
+		mutex_unlock(&iommu->lock);
+		return -ENODEV;
+	}
+
+	ret = vfio_iommu_map(iommu, region->iova, msi_addr >> PAGE_SHIFT,
+			     size >> PAGE_SHIFT, region->prot);
+	if (ret) {
+		mutex_unlock(&iommu->lock);
+		return ret;
+	}
+
+	region->map_paddr = msi_addr;
+	*msi_iova = region->iova;
+	region->refcount++;
+
+	mutex_unlock(&iommu->lock);
+
+	return 0;
+}
+
+static int vfio_iommu_type1_msi_unmap(void *iommu_data, uint64_t iova,
+				      uint64_t size)
+{
+	struct vfio_iommu *iommu = iommu_data;
+	struct vfio_resvd_region *region;
+	struct vfio_domain *d;
+
+	mutex_lock(&iommu->lock);
+
+	/* find the region mapping the msi page */
+	list_for_each_entry(region, &iommu->reserved_iova_list, next)
+		if (region->iova == iova)
+			break;
+
+	if (region == NULL || region->refcount <= 0) {
+		mutex_unlock(&iommu->lock);
+		return -EINVAL;
+	}
+
+	region->refcount--;
+	if (!region->refcount) {
+		list_for_each_entry(d, &iommu->domain_list, next) {
+			if (!iommu_iova_to_phys(d->domain, iova))
+				continue;
+
+			iommu_unmap(d->domain, iova, size);
+			cond_resched();
+		}
+	}
+	region->map_paddr = 0;
+
+	mutex_unlock(&iommu->lock);
+	return 0;
+}
+
 static void *vfio_iommu_type1_open(unsigned long arg)
 {
 	struct vfio_iommu *iommu;
@@ -1264,6 +1373,8 @@ static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
 	.ioctl		= vfio_iommu_type1_ioctl,
 	.attach_group	= vfio_iommu_type1_attach_group,
 	.detach_group	= vfio_iommu_type1_detach_group,
+	.msi_map	= vfio_iommu_type1_msi_map,
+	.msi_unmap	= vfio_iommu_type1_msi_unmap,
 };
 
 static int __init vfio_iommu_type1_init(void)
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index ddb4409..b91085d 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -52,6 +52,10 @@ extern void *vfio_del_group_dev(struct device *dev);
 extern struct vfio_device *vfio_device_get_from_dev(struct device *dev);
 extern void vfio_device_put(struct vfio_device *device);
 extern void *vfio_device_data(struct vfio_device *device);
+extern int vfio_device_map_msi(struct vfio_device *device, uint64_t msi_addr,
+			       uint32_t size, uint64_t *msi_iova);
+int vfio_device_unmap_msi(struct vfio_device *device, uint64_t msi_iova,
+			  uint64_t size);
 
 /**
  * struct vfio_iommu_driver_ops - VFIO IOMMU driver callbacks
@@ -72,7 +76,10 @@ struct vfio_iommu_driver_ops {
 					struct iommu_group *group);
 	void		(*detach_group)(void *iommu_data,
 					struct iommu_group *group);
-
+	int		(*msi_map)(void *iommu_data, uint64_t msi_addr,
+				   uint64_t size, uint64_t *msi_iova);
+	int		(*msi_unmap)(void *iommu_data, uint64_t msi_iova,
+				     uint64_t size);
 };
 
 extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi interrupt
  2015-09-30 14:56 [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region Bharat Bhushan
@ 2015-09-30 14:56   ` Bharat Bhushan
  2015-09-30 14:56   ` Bharat Bhushan
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 45+ messages in thread
From: Bharat Bhushan @ 2015-09-30 14:56 UTC (permalink / raw)
  To: kvmarm, kvm, alex.williamson; +Cc: marc.zyngier, will.deacon

An MSI address is allocated and programmed into the pcie device
during interrupt configuration. For a pass-through device, try to
create the iommu mapping for this allocated/programmed msi address.
If the iommu mapping is created and the msi address programmed in the
pcie device differs from the msi-iova used in the iommu mapping, then
reconfigure the pci device to use the msi-iova as its msi address.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
---
 drivers/vfio/pci/vfio_pci_intrs.c | 36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 1f577b4..c9690af 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -312,13 +312,23 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 	int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
 	char *name = msix ? "vfio-msix" : "vfio-msi";
 	struct eventfd_ctx *trigger;
+	struct msi_msg msg;
+	struct vfio_device *device;
+	uint64_t msi_addr, msi_iova;
 	int ret;
 
 	if (vector >= vdev->num_ctx)
 		return -EINVAL;
 
+	device = vfio_device_get_from_dev(&pdev->dev);
+	if (device == NULL)
+		return -EINVAL;
+
 	if (vdev->ctx[vector].trigger) {
 		free_irq(irq, vdev->ctx[vector].trigger);
+		get_cached_msi_msg(irq, &msg);
+		msi_iova = ((u64)msg.address_hi << 32) | msg.address_lo;
+		vfio_device_unmap_msi(device, msi_iova, PAGE_SIZE);
 		kfree(vdev->ctx[vector].name);
 		eventfd_ctx_put(vdev->ctx[vector].trigger);
 		vdev->ctx[vector].trigger = NULL;
@@ -346,12 +356,11 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 	 * cached value of the message prior to enabling.
 	 */
 	if (msix) {
-		struct msi_msg msg;
-
 		get_cached_msi_msg(irq, &msg);
 		pci_write_msi_msg(irq, &msg);
 	}
 
+
 	ret = request_irq(irq, vfio_msihandler, 0,
 			  vdev->ctx[vector].name, trigger);
 	if (ret) {
@@ -360,6 +369,29 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 		return ret;
 	}
 
+	/* Re-program the new iova into the pci device in case a different
+	 * iommu mapping was created for the programmed msi address.
+	 */
+	get_cached_msi_msg(irq, &msg);
+	msi_iova = 0;
+	msi_addr = (u64)(msg.address_hi) << 32 | (u64)(msg.address_lo);
+	ret = vfio_device_map_msi(device, msi_addr, PAGE_SIZE, &msi_iova);
+	if (ret) {
+		free_irq(irq, vdev->ctx[vector].trigger);
+		kfree(vdev->ctx[vector].name);
+		eventfd_ctx_put(trigger);
+		return ret;
+	}
+
+	/* Reprogram only if iommu-mapped iova is different from msi-address */
+	if (msi_iova && (msi_iova != msi_addr)) {
+		msg.address_hi = (u32)(msi_iova >> 32);
+		/* Keep Lower bits from original msi message address */
+		msg.address_lo &= PAGE_MASK;
+		msg.address_lo |= (u32)(msi_iova & 0x00000000ffffffff);
+		pci_write_msi_msg(irq, &msg);
+	}
+
 	vdev->ctx[vector].trigger = trigger;
 
 	return 0;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC PATCH 6/6] arm-smmu: Allow to set iommu mapping for MSI
  2015-09-30 14:56 [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region Bharat Bhushan
@ 2015-09-30 14:56   ` Bharat Bhushan
  2015-09-30 14:56   ` Bharat Bhushan
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 45+ messages in thread
From: Bharat Bhushan @ 2015-09-30 14:56 UTC (permalink / raw)
  To: kvmarm, kvm, alex.williamson; +Cc: marc.zyngier, will.deacon

Finally, the ARM SMMU declares that iommu mappings for msi-pages are
not set up automatically and must be set up explicitly.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
---
 drivers/iommu/arm-smmu.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index a3956fb..9d37e72 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1401,13 +1401,18 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 				    enum iommu_attr attr, void *data)
 {
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct iommu_domain_msi_maps *msi_maps;
 
 	switch (attr) {
 	case DOMAIN_ATTR_NESTING:
 		*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
 		return 0;
 	case DOMAIN_ATTR_MSI_MAPPING:
-		/* Dummy handling added */
+		msi_maps = data;
+
+		msi_maps->automap = false;
+		msi_maps->override_automap = true;
+
 		return 0;
 	default:
 		return -ENODEV;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region
  2015-09-30 14:56 [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region Bharat Bhushan
                   ` (4 preceding siblings ...)
  2015-09-30 14:56   ` Bharat Bhushan
@ 2015-10-02 22:45 ` Alex Williamson
  2015-10-05  4:55   ` Bhushan Bharat
  5 siblings, 1 reply; 45+ messages in thread
From: Alex Williamson @ 2015-10-02 22:45 UTC (permalink / raw)
  To: Bharat Bhushan
  Cc: kvmarm, kvm, christoffer.dall, eric.auger, pranavkumar,
	marc.zyngier, will.deacon

On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> This Patch adds the VFIO APIs to add and remove reserved iova
> regions. The reserved iova region can be used for mapping some
> specific physical address in iommu.
> 
> Currently we are planning to use this interface for adding iova
> regions for creating iommu of msi-pages. But the API are designed
> for future extension where some other physical address can be mapped.
> 
> Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 201 +++++++++++++++++++++++++++++++++++++++-
>  include/uapi/linux/vfio.h       |  43 +++++++++
>  2 files changed, 243 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 57d8c37..fa5d3e4 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -59,6 +59,7 @@ struct vfio_iommu {
>  	struct rb_root		dma_list;
>  	bool			v2;
>  	bool			nesting;
> +	struct list_head	reserved_iova_list;

This alignment leads to poor packing in the structure, put it above the
bools.

>  };
>  
>  struct vfio_domain {
> @@ -77,6 +78,15 @@ struct vfio_dma {
>  	int			prot;		/* IOMMU_READ/WRITE */
>  };
>  
> +struct vfio_resvd_region {
> +	dma_addr_t	iova;
> +	size_t		size;
> +	int		prot;			/* IOMMU_READ/WRITE */
> +	int		refcount;		/* ref count of mappings */
> +	uint64_t	map_paddr;		/* Mapped Physical Address */

phys_addr_t

> +	struct list_head next;
> +};
> +
>  struct vfio_group {
>  	struct iommu_group	*iommu_group;
>  	struct list_head	next;
> @@ -106,6 +116,38 @@ static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
>  	return NULL;
>  }
>  
> +/* This function must be called with iommu->lock held */
> +static bool vfio_overlap_with_resvd_region(struct vfio_iommu *iommu,
> +					   dma_addr_t start, size_t size)
> +{
> +	struct vfio_resvd_region *region;
> +
> +	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
> +		if (region->iova < start)
> +			return (start - region->iova < region->size);
> +		else if (start < region->iova)
> +			return (region->iova - start < size);

<= on both of the return lines?

> +
> +		return (region->size > 0 && size > 0);
> +	}
> +
> +	return false;
> +}
> +
> +/* This function must be called with iommu->lock held */
> +static
> +struct vfio_resvd_region *vfio_find_resvd_region(struct vfio_iommu *iommu,
> +						 dma_addr_t start, size_t size)
> +{
> +	struct vfio_resvd_region *region;
> +
> +	list_for_each_entry(region, &iommu->reserved_iova_list, next)
> +		if (region->iova == start && region->size == size)
> +			return region;
> +
> +	return NULL;
> +}
> +
>  static void vfio_link_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
>  {
>  	struct rb_node **link = &iommu->dma_list.rb_node, *parent = NULL;
> @@ -580,7 +622,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>  
>  	mutex_lock(&iommu->lock);
>  
> -	if (vfio_find_dma(iommu, iova, size)) {
> +	if (vfio_find_dma(iommu, iova, size) ||
> +	    vfio_overlap_with_resvd_region(iommu, iova, size)) {
>  		mutex_unlock(&iommu->lock);
>  		return -EEXIST;
>  	}
> @@ -626,6 +669,127 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>  	return ret;
>  }
>  
> +/* This function must be called with iommu->lock held */
> +static
> +int vfio_iommu_resvd_region_del(struct vfio_iommu *iommu,
> +				dma_addr_t iova, size_t size, int prot)
> +{
> +	struct vfio_resvd_region *res_region;

Have some consistency in naming, just use "region".
> +
> +	res_region = vfio_find_resvd_region(iommu, iova, size);
> +	/* Region should not be mapped in iommu */
> +	if (res_region == NULL || res_region->map_paddr)
> +		return -EINVAL;

Are these two separate errors?  !region is -EINVAL, but being mapped is
-EBUSY.

> +
> +	list_del(&res_region->next);
> +	kfree(res_region);
> +	return 0;
> +}
> +
> +/* This function must be called with iommu->lock held */
> +static int vfio_iommu_resvd_region_add(struct vfio_iommu *iommu,
> +				       dma_addr_t iova, size_t size, int prot)
> +{
> +	struct vfio_resvd_region *res_region;
> +
> +	/* Check overlap with with dma maping and reserved regions */
> +	if (vfio_find_dma(iommu, iova, size) ||
> +	    vfio_find_resvd_region(iommu, iova, size))
> +		return -EEXIST;
> +
> +	res_region = kzalloc(sizeof(*res_region), GFP_KERNEL);
> +	if (res_region == NULL)
> +		return -ENOMEM;
> +
> +	res_region->iova = iova;
> +	res_region->size = size;
> +	res_region->prot = prot;
> +	res_region->refcount = 0;
> +	res_region->map_paddr = 0;

They're already 0 by the kzalloc

> +
> +	list_add(&res_region->next, &iommu->reserved_iova_list);
> +
> +	return 0;
> +}
> +
> +static
> +int vfio_handle_reserved_region_add(struct vfio_iommu *iommu,
> +				struct vfio_iommu_reserved_region_add *region)
> +{
> +	dma_addr_t iova = region->iova;
> +	size_t size = region->size;
> +	int flags = region->flags;
> +	uint64_t mask;
> +	int prot = 0;
> +	int ret;
> +
> +	/* Verify that none of our __u64 fields overflow */
> +	if (region->size != size || region->iova != iova)
> +		return -EINVAL;
> +
> +	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
> +
> +	WARN_ON(mask & PAGE_MASK);
> +
> +	if (flags & VFIO_IOMMU_RES_REGION_READ)
> +		prot |= IOMMU_READ;
> +	if (flags & VFIO_IOMMU_RES_REGION_WRITE)
> +		prot |= IOMMU_WRITE;
> +
> +	if (!prot || !size || (size | iova) & mask)
> +		return -EINVAL;
> +
> +	/* Don't allow IOVA wrap */
> +	if (iova + size - 1 < iova)
> +		return -EINVAL;
> +
> +	mutex_lock(&iommu->lock);
> +
> +	if (region->flags & VFIO_IOMMU_RES_REGION_ADD) {
> +		ret = vfio_iommu_resvd_region_add(iommu, iova, size, prot);
> +		if (ret) {
> +			mutex_unlock(&iommu->lock);
> +			return ret;
> +		}
> +	}

Silently fail if not VFIO_IOMMU_RES_REGION_ADD?

> +
> +	mutex_unlock(&iommu->lock);
> +	return 0;
> +}
> +
> +static
> +int vfio_handle_reserved_region_del(struct vfio_iommu *iommu,
> +				struct vfio_iommu_reserved_region_del *region)
> +{
> +	dma_addr_t iova = region->iova;
> +	size_t size = region->size;
> +	int flags = region->flags;
> +	int ret;
> +
> +	if (!(flags & VFIO_IOMMU_RES_REGION_DEL))
> +		return -EINVAL;
> +
> +	mutex_lock(&iommu->lock);
> +
> +	/* Check for the region */
> +	if (vfio_find_resvd_region(iommu, iova, size) == NULL) {
> +		mutex_unlock(&iommu->lock);
> +		return -EINVAL;
> +	}
> +
> +	/* remove the reserved region */
> +	if (region->flags & VFIO_IOMMU_RES_REGION_DEL) {
> +		ret = vfio_iommu_resvd_region_del(iommu, iova, size, flags);
> +		if (ret) {
> +			mutex_unlock(&iommu->lock);
> +			return ret;
> +		}
> +	}
> +
> +	mutex_unlock(&iommu->lock);
> +	return 0;
> +}
> +
>  static int vfio_bus_type(struct device *dev, void *data)
>  {
>  	struct bus_type **bus = data;
> @@ -905,6 +1069,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
>  	}
>  
>  	INIT_LIST_HEAD(&iommu->domain_list);
> +	INIT_LIST_HEAD(&iommu->reserved_iova_list);
>  	iommu->dma_list = RB_ROOT;
>  	mutex_init(&iommu->lock);
>  
> @@ -1020,6 +1185,40 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  			return ret;
>  
>  		return copy_to_user((void __user *)arg, &unmap, minsz);
> +	} else if (cmd == VFIO_IOMMU_RESERVED_REGION_ADD) {
> +		struct vfio_iommu_reserved_region_add region;
> +		long ret;
> +
> +		minsz = offsetofend(struct vfio_iommu_reserved_region_add,
> +				    size);
> +		if (copy_from_user(&region, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (region.argsz < minsz)
> +			return -EINVAL;
> +
> +		ret = vfio_handle_reserved_region_add(iommu, &region);
> +		if (ret)
> +			return ret;
> +
> +		return copy_to_user((void __user *)arg, &region, minsz);
> +	} else if (cmd == VFIO_IOMMU_RESERVED_REGION_DEL) {
> +		struct vfio_iommu_reserved_region_del region;
> +		long ret;
> +
> +		minsz = offsetofend(struct vfio_iommu_reserved_region_del,
> +				    size);
> +		if (copy_from_user(&region, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (region.argsz < minsz)
> +			return -EINVAL;
> +
> +		ret = vfio_handle_reserved_region_del(iommu, &region);
> +		if (ret)
> +			return ret;
> +
> +		return copy_to_user((void __user *)arg, &region, minsz);

So we've just created an interface that is available for all vfio-type1
users, whether it makes any sense for the platform or not, and it allows
the user to consume arbitrary amounts of kernel memory, by making an
infinitely long list of reserved iova entries, brilliant!

>  	}
>  
>  	return -ENOTTY;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index b57b750..1abd1a9 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -440,6 +440,49 @@ struct vfio_iommu_type1_dma_unmap {
>  #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
>  #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
>  
> +/**************** Reserved IOVA region specific APIs **********************/
> +
> +/*
> + * VFIO_IOMMU_RESERVED_REGION_ADD - _IO(VFIO_TYPE, VFIO_BASE + 17,
> + *					struct vfio_iommu_reserved_region_add)
> + * This is used to add a reserved iova region.
> + * @flags - Input: VFIO_IOMMU_RES_REGION_ADD flag is for adding
> + * a reserved region.

Why else would we call VFIO_IOMMU_RESERVED_REGION_ADD except to add a
region?  This flag is redundant.

> + * Also pass READ/WRITE/IOMMU flags to be used in iommu mapping.
> + * @iova - Input: IOVA base address of reserved region
> + * @size - Input: Size of the reserved region
> + * Return: 0 on success, -errno on failure
> + */
> +struct vfio_iommu_reserved_region_add {
> +	__u32   argsz;
> +	__u32   flags;
> +#define VFIO_IOMMU_RES_REGION_ADD	(1 << 0) /* Add a reserved region */
> +#define VFIO_IOMMU_RES_REGION_READ	(1 << 1) /* readable region */
> +#define VFIO_IOMMU_RES_REGION_WRITE	(1 << 2) /* writable region */
> +	__u64	iova;
> +	__u64   size;
> +};
> +#define VFIO_IOMMU_RESERVED_REGION_ADD _IO(VFIO_TYPE, VFIO_BASE + 17)
> +
> +/*
> + * VFIO_IOMMU_RESERVED_REGION_DEL - _IO(VFIO_TYPE, VFIO_BASE + 18,
> + *					struct vfio_iommu_reserved_region_del)
> + * This is used to delete an existing reserved iova region.
> + * @flags - VFIO_IOMMU_RES_REGION_DEL flag is for deleting a region use,
> + *  only a unmapped region can be deleted.
> + * @iova - Input: IOVA base address of reserved region
> + * @size - Input: Size of the reserved region
> + * Return: 0 on success, -errno on failure
> + */
> +struct vfio_iommu_reserved_region_del {
> +	__u32   argsz;
> +	__u32   flags;
> +#define VFIO_IOMMU_RES_REGION_DEL	(1 << 0) /* unset the reserved region */
> +	__u64	iova;
> +	__u64   size;
> +};
> +#define VFIO_IOMMU_RESERVED_REGION_DEL _IO(VFIO_TYPE, VFIO_BASE + 18)
> +

These are effectively both

struct vfio_iommu_type1_dma_unmap

>  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
>  
>  /*




^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 2/6] iommu: Add interface to get msi-pages mapping attributes
  2015-09-30 14:56   ` Bharat Bhushan
  (?)
@ 2015-10-02 22:45   ` Alex Williamson
  2015-10-05  5:17     ` Bhushan Bharat
  2015-10-05  5:56     ` Bhushan Bharat
  -1 siblings, 2 replies; 45+ messages in thread
From: Alex Williamson @ 2015-10-02 22:45 UTC (permalink / raw)
  To: Bharat Bhushan
  Cc: kvmarm, kvm, christoffer.dall, eric.auger, pranavkumar,
	marc.zyngier, will.deacon

[really ought to consider cc'ing the iommu list]

On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> This APIs return the capability of automatically mapping msi-pages
> in iommu with some magic iova. Which is what currently most of
> iommu's does and is the default behaviour.
> 
> Further API returns whether iommu allows the user to define different
> iova for mai-page mapping for the domain. This is required when a msi
> capable device is directly assigned to user-space/VM and user-space/VM
> need to define a non-overlapping (from other dma-able address space)
> iova for msi-pages mapping in iommu.
> 
> This patch just define the interface and follow up patches will
> extend this interface.

This is backwards, generally you want to add the infrastructure and only
expose it once all the pieces are in place for it to work.  For
instance, patch 1/6 exposes a new userspace interface for vfio that
doesn't do anything yet.  How does the user know if it's there, *and*
works?

> Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> ---
>  drivers/iommu/arm-smmu.c        |  3 +++
>  drivers/iommu/fsl_pamu_domain.c |  3 +++
>  drivers/iommu/iommu.c           | 14 ++++++++++++++
>  include/linux/iommu.h           |  9 ++++++++-
>  4 files changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 66a803b..a3956fb 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -1406,6 +1406,9 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
>  	case DOMAIN_ATTR_NESTING:
>  		*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
>  		return 0;
> +	case DOMAIN_ATTR_MSI_MAPPING:
> +		/* Dummy handling added */
> +		return 0;
>  	default:
>  		return -ENODEV;
>  	}
> diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
> index 1d45293..9a94430 100644
> --- a/drivers/iommu/fsl_pamu_domain.c
> +++ b/drivers/iommu/fsl_pamu_domain.c
> @@ -856,6 +856,9 @@ static int fsl_pamu_get_domain_attr(struct iommu_domain *domain,
>  	case DOMAIN_ATTR_FSL_PAMUV1:
>  		*(int *)data = DOMAIN_ATTR_FSL_PAMUV1;
>  		break;
> +	case DOMAIN_ATTR_MSI_MAPPING:
> +		/* Dummy handling added */
> +		break;
>  	default:
>  		pr_debug("Unsupported attribute type\n");
>  		ret = -EINVAL;
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index d4f527e..16c2eab 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1216,6 +1216,7 @@ int iommu_domain_get_attr(struct iommu_domain *domain,
>  	bool *paging;
>  	int ret = 0;
>  	u32 *count;
> +	struct iommu_domain_msi_maps *msi_maps;
>  
>  	switch (attr) {
>  	case DOMAIN_ATTR_GEOMETRY:
> @@ -1236,6 +1237,19 @@ int iommu_domain_get_attr(struct iommu_domain *domain,
>  			ret = -ENODEV;
>  
>  		break;
> +	case DOMAIN_ATTR_MSI_MAPPING:
> +		msi_maps = data;
> +
> +		/* Default MSI-pages are magically mapped with some iova and
> +		 * do now allow to configure with different iova.
> +		 */
> +		msi_maps->automap = true;
> +		msi_maps->override_automap = false;

There's no magic.  I think what you're trying to express is the
difference between platforms that support MSI within the IOMMU IOVA
space and thus need explicit IOMMU mappings vs platforms where MSI
mappings either bypass the IOMMU entirely or are setup implicitly with
interrupt remapping support.

Why does it make sense to impose any sort of defaults?  If the IOMMU
driver doesn't tell us what to do, I don't think we want to assume
anything.

> +
> +		if (domain->ops->domain_get_attr)
> +			ret = domain->ops->domain_get_attr(domain, attr, data);
> +
> +		break;
>  	default:
>  		if (!domain->ops->domain_get_attr)
>  			return -EINVAL;
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 0546b87..6d49f3f 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -83,6 +83,13 @@ struct iommu_domain {
>  	struct iommu_domain_geometry geometry;
>  };
>  
> +struct iommu_domain_msi_maps {
> +	dma_addr_t base_address;
> +	dma_addr_t size;

size_t?

> +	bool automap;
> +	bool override_automap;
> +};
> +
>  enum iommu_cap {
>  	IOMMU_CAP_CACHE_COHERENCY,	/* IOMMU can enforce cache coherent DMA
>  					   transactions */
> @@ -111,6 +118,7 @@ enum iommu_attr {
>  	DOMAIN_ATTR_FSL_PAMU_ENABLE,
>  	DOMAIN_ATTR_FSL_PAMUV1,
>  	DOMAIN_ATTR_NESTING,	/* two stages of translation */
> +	DOMAIN_ATTR_MSI_MAPPING, /* Provides MSIs mapping in iommu */
>  	DOMAIN_ATTR_MAX,
>  };
>  
> @@ -167,7 +175,6 @@ struct iommu_ops {
>  	int (*domain_set_windows)(struct iommu_domain *domain, u32 w_count);
>  	/* Get the numer of window per domain */
>  	u32 (*domain_get_windows)(struct iommu_domain *domain);
> -
>  #ifdef CONFIG_OF_IOMMU
>  	int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
>  #endif




^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
  2015-09-30 14:56   ` Bharat Bhushan
  (?)
@ 2015-10-02 22:46   ` Alex Williamson
  2015-10-05  6:00     ` Bhushan Bharat
  -1 siblings, 1 reply; 45+ messages in thread
From: Alex Williamson @ 2015-10-02 22:46 UTC (permalink / raw)
  To: Bharat Bhushan; +Cc: kvm, marc.zyngier, will.deacon, kvmarm

On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> This patch allows the user-space to know whether msi-pages
> are automatically mapped with some magic iova or not.
> 
> Even if the msi-pages are automatically mapped, still user-space
> wants to over-ride the automatic iova selection for msi-mapping.
> For this user-space need to know whether it is allowed to change
> the automatic mapping or not and this API provides this mechanism.
> Follow up patches will provide how to over-ride this.
> 
> Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 32 ++++++++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h       |  3 +++
>  2 files changed, 35 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index fa5d3e4..3315fb6 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -59,6 +59,7 @@ struct vfio_iommu {
>  	struct rb_root		dma_list;
>  	bool			v2;
>  	bool			nesting;
> +	bool			allow_msi_reconfig;
>  	struct list_head	reserved_iova_list;
>  };
>  
> @@ -1117,6 +1118,23 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>  	return ret;
>  }
>  
> +static
> +int vfio_domains_get_msi_maps(struct vfio_iommu *iommu,
> +			      struct iommu_domain_msi_maps *msi_maps)
> +{
> +	struct vfio_domain *d;
> +	int ret;
> +
> +	mutex_lock(&iommu->lock);
> +	/* All domains have same msi-automap property, pick first */
> +	d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
> +	ret = iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_MAPPING,
> +				    msi_maps);
> +	mutex_unlock(&iommu->lock);
> +
> +	return ret;
> +}
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  				   unsigned int cmd, unsigned long arg)
>  {
> @@ -1138,6 +1156,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  		}
>  	} else if (cmd == VFIO_IOMMU_GET_INFO) {
>  		struct vfio_iommu_type1_info info;
> +		struct iommu_domain_msi_maps msi_maps;
> +		int ret;
>  
>  		minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
>  
> @@ -1149,6 +1169,18 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  
>  		info.flags = 0;
>  
> +		ret = vfio_domains_get_msi_maps(iommu, &msi_maps);
> +		if (ret)
> +			return ret;

And now ioctl(VFIO_IOMMU_GET_INFO) no longer works for any IOMMU
implementing domain_get_attr but not supporting DOMAIN_ATTR_MSI_MAPPING.

> +
> +		if (msi_maps.override_automap) {
> +			info.flags |= VFIO_IOMMU_INFO_MSI_ALLOW_RECONFIG;
> +			iommu->allow_msi_reconfig = true;
> +		}
> +
> +		if (msi_maps.automap)
> +			info.flags |= VFIO_IOMMU_INFO_MSI_AUTOMAP;
> +
>  		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
>  
>  		return copy_to_user((void __user *)arg, &info, minsz);
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 1abd1a9..9998f6e 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -391,6 +391,9 @@ struct vfio_iommu_type1_info {
>  	__u32	argsz;
>  	__u32	flags;
>  #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
> +#define VFIO_IOMMU_INFO_MSI_AUTOMAP (1 << 1)	/* MSI pages are auto-mapped
> +						   in iommu */
> +#define VFIO_IOMMU_INFO_MSI_ALLOW_RECONFIG (1 << 2) /* Allows reconfig automap*/
>  	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
>  };
>  

Once again, exposing interfaces to the user before they actually do
anything is backwards.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 4/6] vfio: Add interface to iommu-map/unmap MSI pages
  2015-09-30 14:56   ` Bharat Bhushan
  (?)
@ 2015-10-02 22:46   ` Alex Williamson
  2015-10-05  6:27     ` Bhushan Bharat
  -1 siblings, 1 reply; 45+ messages in thread
From: Alex Williamson @ 2015-10-02 22:46 UTC (permalink / raw)
  To: Bharat Bhushan; +Cc: kvm, marc.zyngier, will.deacon, kvmarm

On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> For MSI interrupts to work for a pass-through devices we need
> to have mapping of msi-pages in iommu. Now on some platforms
> (like x86) does this msi-pages mapping happens magically and in these
> case they chooses an iova which they somehow know that it will never
> overlap with guest memory. But this magic iova selection
> may not be always true for all platform (like PowerPC and ARM64).
> 
> Also on x86 platform, there is no problem as long as running a x86-guest
> on x86-host but there can be issues when running a non-x86 guest on
> x86 host or other userspace applications like (I think ODP/DPDK).
> As in these cases there can be chances that it overlaps with guest
> memory mapping.

Wow, it's amazing anything works... smoke and mirrors.

> This patch add interface to iommu-map and iommu-unmap msi-pages at
> reserved iova chosen by userspace.
> 
> Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> ---
>  drivers/vfio/vfio.c             |  52 +++++++++++++++++++
>  drivers/vfio/vfio_iommu_type1.c | 111 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/vfio.h            |   9 +++-
>  3 files changed, 171 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 2fb29df..a817d2d 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -605,6 +605,58 @@ static int vfio_iommu_group_notifier(struct notifier_block *nb,
>  	return NOTIFY_OK;
>  }
>  
> +int vfio_device_map_msi(struct vfio_device *device, uint64_t msi_addr,
> +			uint32_t size, uint64_t *msi_iova)
> +{
> +	struct vfio_container *container = device->group->container;
> +	struct vfio_iommu_driver *driver;
> +	int ret;
> +
> +	/* Validate address and size */
> +	if (!msi_addr || !size || !msi_iova)
> +		return -EINVAL;
> +
> +	down_read(&container->group_lock);
> +
> +	driver = container->iommu_driver;
> +	if (!driver || !driver->ops || !driver->ops->msi_map) {
> +		up_read(&container->group_lock);
> +		return -EINVAL;
> +	}
> +
> +	ret = driver->ops->msi_map(container->iommu_data,
> +				   msi_addr, size, msi_iova);
> +
> +	up_read(&container->group_lock);
> +	return ret;
> +}
> +
> +int vfio_device_unmap_msi(struct vfio_device *device, uint64_t msi_iova,
> +			  uint64_t size)
> +{
> +	struct vfio_container *container = device->group->container;
> +	struct vfio_iommu_driver *driver;
> +	int ret;
> +
> +	/* Validate address and size */
> +	if (!msi_iova || !size)
> +		return -EINVAL;
> +
> +	down_read(&container->group_lock);
> +
> +	driver = container->iommu_driver;
> +	if (!driver || !driver->ops || !driver->ops->msi_unmap) {
> +		up_read(&container->group_lock);
> +		return -EINVAL;
> +	}
> +
> +	ret = driver->ops->msi_unmap(container->iommu_data,
> +				     msi_iova, size);
> +
> +	up_read(&container->group_lock);
> +	return ret;
> +}
> +
>  /**
>   * VFIO driver API
>   */
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 3315fb6..ab376c2 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1003,12 +1003,34 @@ out_free:
>  	return ret;
>  }
>  
> +static void vfio_iommu_unmap_all_reserved_regions(struct vfio_iommu *iommu)
> +{
> +	struct vfio_resvd_region *region;
> +	struct vfio_domain *d;
> +
> +	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
> +		list_for_each_entry(d, &iommu->domain_list, next) {
> +			if (!region->map_paddr)
> +				continue;
> +
> +			if (!iommu_iova_to_phys(d->domain, region->iova))
> +				continue;
> +
> +			iommu_unmap(d->domain, region->iova, PAGE_SIZE);

PAGE_SIZE?  Why not region->size?

> +			region->map_paddr = 0;
> +			cond_resched();
> +		}
> +	}
> +}
> +
>  static void vfio_iommu_unmap_unpin_all(struct vfio_iommu *iommu)
>  {
>  	struct rb_node *node;
>  
>  	while ((node = rb_first(&iommu->dma_list)))
>  		vfio_remove_dma(iommu, rb_entry(node, struct vfio_dma, node));
> +
> +	vfio_iommu_unmap_all_reserved_regions(iommu);
>  }
>  
>  static void vfio_iommu_type1_detach_group(void *iommu_data,
> @@ -1048,6 +1070,93 @@ done:
>  	mutex_unlock(&iommu->lock);
>  }
>  
> +static int vfio_iommu_type1_msi_map(void *iommu_data, uint64_t msi_addr,
> +				    uint64_t size, uint64_t *msi_iova)
> +{
> +	struct vfio_iommu *iommu = iommu_data;
> +	struct vfio_resvd_region *region;
> +	int ret;
> +
> +	mutex_lock(&iommu->lock);
> +
> +	/* Do not try ceate iommu-mapping if msi reconfig not allowed */
> +	if (!iommu->allow_msi_reconfig) {
> +		mutex_unlock(&iommu->lock);
> +		return 0;
> +	}
> +
> +	/* Check if there is already region mapping the msi page */
> +	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
> +		if (region->map_paddr == msi_addr) {
> +			*msi_iova = region->iova;
> +			region->refcount++;
> +			mutex_unlock(&iommu->lock);
> +			return 0;
> +		}
> +	}
> +
> +	/* Get a unmapped reserved region */
> +	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
> +		if (!region->map_paddr)
> +			break;
> +	}
> +
> +	if (region == NULL) {
> +		mutex_unlock(&iommu->lock);
> +		return -ENODEV;
> +	}
> +
> +	ret = vfio_iommu_map(iommu, region->iova, msi_addr >> PAGE_SHIFT,
> +			     size >> PAGE_SHIFT, region->prot);

So the reserved region has a size and the msi mapping has a size and we
arbitrarily decide to use the msi mapping size here?  The overlap checks
we've done for the reserved region are meaningless then.  No wonder
you're unmapping with PAGE_SIZE, we have no idea.

> +	if (ret) {
> +		mutex_unlock(&iommu->lock);
> +		return ret;
> +	}
> +
> +	region->map_paddr = msi_addr;

Is there some sort of implied page alignment with msi_addr?  I could
pass 0x0 for one call, 0x1 for another and due to the mapping shift, get
two reserved IOVAs pointing at the same msi page.

> +	*msi_iova = region->iova;
> +	region->refcount++;
> +
> +	mutex_unlock(&iommu->lock);
> +
> +	return 0;
> +}
> +
> +static int vfio_iommu_type1_msi_unmap(void *iommu_data, uint64_t iova,
> +				      uint64_t size)
> +{
> +	struct vfio_iommu *iommu = iommu_data;
> +	struct vfio_resvd_region *region;
> +	struct vfio_domain *d;
> +
> +	mutex_lock(&iommu->lock);
> +
> +	/* find the region mapping the msi page */
> +	list_for_each_entry(region, &iommu->reserved_iova_list, next)
> +		if (region->iova == iova)
> +			break;
> +
> +	if (region == NULL || region->refcount <= 0) {
> +		mutex_unlock(&iommu->lock);
> +		return -EINVAL;
> +	}
> +
> +	region->refcount--;
> +	if (!region->refcount) {
> +		list_for_each_entry(d, &iommu->domain_list, next) {
> +			if (!iommu_iova_to_phys(d->domain, iova))
> +				continue;
> +
> +			iommu_unmap(d->domain, iova, size);

And here we're just trusting that the unmap was the same size as the
map?

> +			cond_resched();
> +		}
> +	}
> +	region->map_paddr = 0;
> +
> +	mutex_unlock(&iommu->lock);
> +	return 0;
> +}
> +
>  static void *vfio_iommu_type1_open(unsigned long arg)
>  {
>  	struct vfio_iommu *iommu;
> @@ -1264,6 +1373,8 @@ static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
>  	.ioctl		= vfio_iommu_type1_ioctl,
>  	.attach_group	= vfio_iommu_type1_attach_group,
>  	.detach_group	= vfio_iommu_type1_detach_group,
> +	.msi_map	= vfio_iommu_type1_msi_map,
> +	.msi_unmap	= vfio_iommu_type1_msi_unmap,
>  };
>  
>  static int __init vfio_iommu_type1_init(void)
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index ddb4409..b91085d 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -52,6 +52,10 @@ extern void *vfio_del_group_dev(struct device *dev);
>  extern struct vfio_device *vfio_device_get_from_dev(struct device *dev);
>  extern void vfio_device_put(struct vfio_device *device);
>  extern void *vfio_device_data(struct vfio_device *device);
> +extern int vfio_device_map_msi(struct vfio_device *device, uint64_t msi_addr,
> +			       uint32_t size, uint64_t *msi_iova);
> +int vfio_device_unmap_msi(struct vfio_device *device, uint64_t msi_iova,
> +			  uint64_t size);
>  
>  /**
>   * struct vfio_iommu_driver_ops - VFIO IOMMU driver callbacks
> @@ -72,7 +76,10 @@ struct vfio_iommu_driver_ops {
>  					struct iommu_group *group);
>  	void		(*detach_group)(void *iommu_data,
>  					struct iommu_group *group);
> -
> +	int		(*msi_map)(void *iommu_data, uint64_t msi_addr,
> +				   uint64_t size, uint64_t *msi_iova);
> +	int		(*msi_unmap)(void *iommu_data, uint64_t msi_iova,
> +				     uint64_t size);
>  };
>  
>  extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);

How did this patch solve any of the problems outlined in the commit log?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi interrupt
  2015-09-30 14:56   ` Bharat Bhushan
                     ` (2 preceding siblings ...)
  (?)
@ 2015-10-02 22:46   ` Alex Williamson
  2015-10-05  7:20     ` Bhushan Bharat
  -1 siblings, 1 reply; 45+ messages in thread
From: Alex Williamson @ 2015-10-02 22:46 UTC (permalink / raw)
  To: Bharat Bhushan
  Cc: kvmarm, kvm, christoffer.dall, eric.auger, pranavkumar,
	marc.zyngier, will.deacon

On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> An MSI-address is allocated and programmed in pcie device
> during interrupt configuration. Now for a pass-through device,
> try to create the iommu mapping for this allocted/programmed
> msi-address.  If the iommu mapping is created and the msi
> address programmed in the pcie device is different from
> msi-iova as per iommu programming then reconfigure the pci
> device to use msi-iova as msi address.
> 
> Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> ---
>  drivers/vfio/pci/vfio_pci_intrs.c | 36 ++++++++++++++++++++++++++++++++++--
>  1 file changed, 34 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 1f577b4..c9690af 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -312,13 +312,23 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>  	int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
>  	char *name = msix ? "vfio-msix" : "vfio-msi";
>  	struct eventfd_ctx *trigger;
> +	struct msi_msg msg;
> +	struct vfio_device *device;
> +	uint64_t msi_addr, msi_iova;
>  	int ret;
>  
>  	if (vector >= vdev->num_ctx)
>  		return -EINVAL;
>  
> +	device = vfio_device_get_from_dev(&pdev->dev);

Have you looked at this function?  I don't think we want to be doing
that every time we want to poke the interrupt configuration.  Also note
that IOMMU mappings don't operate on devices, but groups, so maybe we
want to pass the group.

> +	if (device == NULL)
> +		return -EINVAL;

This would be a legitimate BUG_ON(!device)

> +
>  	if (vdev->ctx[vector].trigger) {
>  		free_irq(irq, vdev->ctx[vector].trigger);
> +		get_cached_msi_msg(irq, &msg);
> +		msi_iova = ((u64)msg.address_hi << 32) | msg.address_lo;
> +		vfio_device_unmap_msi(device, msi_iova, PAGE_SIZE);
>  		kfree(vdev->ctx[vector].name);
>  		eventfd_ctx_put(vdev->ctx[vector].trigger);
>  		vdev->ctx[vector].trigger = NULL;
> @@ -346,12 +356,11 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>  	 * cached value of the message prior to enabling.
>  	 */
>  	if (msix) {
> -		struct msi_msg msg;
> -
>  		get_cached_msi_msg(irq, &msg);
>  		pci_write_msi_msg(irq, &msg);
>  	}
>  
> +

gratuitous newline

>  	ret = request_irq(irq, vfio_msihandler, 0,
>  			  vdev->ctx[vector].name, trigger);
>  	if (ret) {
> @@ -360,6 +369,29 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>  		return ret;
>  	}
>  
> +	/* Re-program the new-iova in pci-device in case there is
> +	 * different iommu-mapping created for programmed msi-address.
> +	 */
> +	get_cached_msi_msg(irq, &msg);
> +	msi_iova = 0;
> +	msi_addr = (u64)(msg.address_hi) << 32 | (u64)(msg.address_lo);
> +	ret = vfio_device_map_msi(device, msi_addr, PAGE_SIZE, &msi_iova);
> +	if (ret) {
> +		free_irq(irq, vdev->ctx[vector].trigger);
> +		kfree(vdev->ctx[vector].name);
> +		eventfd_ctx_put(trigger);
> +		return ret;
> +	}
> +
> +	/* Reprogram only if iommu-mapped iova is different from msi-address */
> +	if (msi_iova && (msi_iova != msi_addr)) {
> +		msg.address_hi = (u32)(msi_iova >> 32);
> +		/* Keep Lower bits from original msi message address */
> +		msg.address_lo &= PAGE_MASK;
> +		msg.address_lo |= (u32)(msi_iova & 0x00000000ffffffff);

Seems like you're making some assumptions here that are dependent on the
architecture and maybe the platform.

> +		pci_write_msi_msg(irq, &msg);
> +	}
> +
>  	vdev->ctx[vector].trigger = trigger;
>  
>  	return 0;




^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 6/6] arm-smmu: Allow to set iommu mapping for MSI
  2015-09-30 14:56   ` Bharat Bhushan
  (?)
@ 2015-10-02 22:46   ` Alex Williamson
  2015-10-05  8:33     ` Bhushan Bharat
  -1 siblings, 1 reply; 45+ messages in thread
From: Alex Williamson @ 2015-10-02 22:46 UTC (permalink / raw)
  To: Bharat Bhushan
  Cc: kvmarm, kvm, christoffer.dall, eric.auger, pranavkumar,
	marc.zyngier, will.deacon

On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> Finally ARM SMMU declare that iommu-mapping for MSI-pages are not
> set automatically and it should be set explicitly.
> 
> Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> ---
>  drivers/iommu/arm-smmu.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index a3956fb..9d37e72 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -1401,13 +1401,18 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
>  				    enum iommu_attr attr, void *data)
>  {
>  	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +	struct iommu_domain_msi_maps *msi_maps;
>  
>  	switch (attr) {
>  	case DOMAIN_ATTR_NESTING:
>  		*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
>  		return 0;
>  	case DOMAIN_ATTR_MSI_MAPPING:
> -		/* Dummy handling added */
> +		msi_maps = data;
> +
> +		msi_maps->automap = false;
> +		msi_maps->override_automap = true;
> +
>  		return 0;
>  	default:
>  		return -ENODEV;

In previous discussions I understood one of the problems you were trying
to solve was having a limited number of MSI banks and while you may be
able to get isolated MSI banks for some number of users, it wasn't
unlimited and sharing may be required.  I don't see any of that
addressed in this series.

Also, the management of reserved IOVAs vs MSI addresses looks really
dubious to me.  How does your platform pick an MSI address and what are
we breaking by covertly changing it?  We seem to be masking over at the
VFIO level, where there should be lower level interfaces doing the right
thing when we configure MSI on the device.

The problem of reporting "automap" base address isn't addressed more
than leaving some unused field in iommu_domain_msi_maps.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region
  2015-10-02 22:45 ` [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region Alex Williamson
@ 2015-10-05  4:55   ` Bhushan Bharat
  2015-10-05 22:45     ` Alex Williamson
  0 siblings, 1 reply; 45+ messages in thread
From: Bhushan Bharat @ 2015-10-05  4:55 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvmarm, kvm, christoffer.dall, eric.auger, pranavkumar,
	marc.zyngier, will.deacon

Hi Alex,

> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Saturday, October 03, 2015 4:16 AM
> To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> christoffer.dall@linaro.org; eric.auger@linaro.org; pranavkumar@linaro.org;
> marc.zyngier@arm.com; will.deacon@arm.com
> Subject: Re: [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova
> region
> 
> On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > This Patch adds the VFIO APIs to add and remove reserved iova regions.
> > The reserved iova region can be used for mapping some specific
> > physical address in iommu.
> >
> > Currently we are planning to use this interface for adding iova
> > regions for creating iommu of msi-pages. But the API are designed for
> > future extension where some other physical address can be mapped.
> >
> > Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 201
> +++++++++++++++++++++++++++++++++++++++-
> >  include/uapi/linux/vfio.h       |  43 +++++++++
> >  2 files changed, 243 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > b/drivers/vfio/vfio_iommu_type1.c index 57d8c37..fa5d3e4 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -59,6 +59,7 @@ struct vfio_iommu {
> >  	struct rb_root		dma_list;
> >  	bool			v2;
> >  	bool			nesting;
> > +	struct list_head	reserved_iova_list;
> 
> This alignment leads to poor packing in the structure, put it above the bools.

ok

> 
> >  };
> >
> >  struct vfio_domain {
> > @@ -77,6 +78,15 @@ struct vfio_dma {
> >  	int			prot;		/* IOMMU_READ/WRITE */
> >  };
> >
> > +struct vfio_resvd_region {
> > +	dma_addr_t	iova;
> > +	size_t		size;
> > +	int		prot;			/* IOMMU_READ/WRITE */
> > +	int		refcount;		/* ref count of mappings */
> > +	uint64_t	map_paddr;		/* Mapped Physical Address
> */
> 
> phys_addr_t

Ok,

> 
> > +	struct list_head next;
> > +};
> > +
> >  struct vfio_group {
> >  	struct iommu_group	*iommu_group;
> >  	struct list_head	next;
> > @@ -106,6 +116,38 @@ static struct vfio_dma *vfio_find_dma(struct
> vfio_iommu *iommu,
> >  	return NULL;
> >  }
> >
> > +/* This function must be called with iommu->lock held */ static bool
> > +vfio_overlap_with_resvd_region(struct vfio_iommu *iommu,
> > +					   dma_addr_t start, size_t size) {
> > +	struct vfio_resvd_region *region;
> > +
> > +	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
> > +		if (region->iova < start)
> > +			return (start - region->iova < region->size);
> > +		else if (start < region->iova)
> > +			return (region->iova - start < size);
> 
> <= on both of the return lines?

I think it should be "<" and not "<=", no?
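
Writing it as a generic half-open range check, just to convince myself
(sketch only, not necessarily what goes into v2):

/*
 * Two half-open ranges [a, a + a_size) and [b, b + b_size) overlap iff
 * each one starts before the other one ends.  Strict '<' is what we
 * want; '<=' would flag adjacent, non-overlapping ranges as overlapping.
 */
static inline bool ranges_overlap(dma_addr_t a, size_t a_size,
				  dma_addr_t b, size_t b_size)
{
	return a < b + b_size && b < a + a_size;
}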

> 
> > +
> > +		return (region->size > 0 && size > 0);
> > +	}
> > +
> > +	return false;
> > +}
> > +
> > +/* This function must be called with iommu->lock held */ static
> > +struct vfio_resvd_region *vfio_find_resvd_region(struct vfio_iommu
> *iommu,
> > +						 dma_addr_t start, size_t
> size) {
> > +	struct vfio_resvd_region *region;
> > +
> > +	list_for_each_entry(region, &iommu->reserved_iova_list, next)
> > +		if (region->iova == start && region->size == size)
> > +			return region;
> > +
> > +	return NULL;
> > +}
> > +
> >  static void vfio_link_dma(struct vfio_iommu *iommu, struct vfio_dma
> > *new)  {
> >  	struct rb_node **link = &iommu->dma_list.rb_node, *parent =
> NULL; @@
> > -580,7 +622,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> >
> >  	mutex_lock(&iommu->lock);
> >
> > -	if (vfio_find_dma(iommu, iova, size)) {
> > +	if (vfio_find_dma(iommu, iova, size) ||
> > +	    vfio_overlap_with_resvd_region(iommu, iova, size)) {
> >  		mutex_unlock(&iommu->lock);
> >  		return -EEXIST;
> >  	}
> > @@ -626,6 +669,127 @@ static int vfio_dma_do_map(struct vfio_iommu
> *iommu,
> >  	return ret;
> >  }
> >
> > +/* This function must be called with iommu->lock held */ static int
> > +vfio_iommu_resvd_region_del(struct vfio_iommu *iommu,
> > +				dma_addr_t iova, size_t size, int prot) {
> > +	struct vfio_resvd_region *res_region;
> 
> Have some consistency in naming, just use "region".

Ok,

> > +
> > +	res_region = vfio_find_resvd_region(iommu, iova, size);
> > +	/* Region should not be mapped in iommu */
> > +	if (res_region == NULL || res_region->map_paddr)
> > +		return -EINVAL;
> 
> Are these two separate errors?  !region is -EINVAL, but being mapped is -
> EBUSY.

Yes, will separate them.
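
i.e. something along these lines (just a sketch; I would also drop the
unused prot argument unless there is a reason to keep it):

static int vfio_iommu_resvd_region_del(struct vfio_iommu *iommu,
				       dma_addr_t iova, size_t size)
{
	struct vfio_resvd_region *region;

	region = vfio_find_resvd_region(iommu, iova, size);
	if (!region)
		return -EINVAL;		/* no such reserved region */
	if (region->map_paddr)
		return -EBUSY;		/* still mapped in the iommu */

	list_del(&region->next);
	kfree(region);
	return 0;
}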

> 
> > +
> > +	list_del(&res_region->next);
> > +	kfree(res_region);
> > +	return 0;
> > +}
> > +
> > +/* This function must be called with iommu->lock held */ static int
> > +vfio_iommu_resvd_region_add(struct vfio_iommu *iommu,
> > +				       dma_addr_t iova, size_t size, int prot) {
> > +	struct vfio_resvd_region *res_region;
> > +
> > +	/* Check overlap with with dma maping and reserved regions */
> > +	if (vfio_find_dma(iommu, iova, size) ||
> > +	    vfio_find_resvd_region(iommu, iova, size))
> > +		return -EEXIST;
> > +
> > +	res_region = kzalloc(sizeof(*res_region), GFP_KERNEL);
> > +	if (res_region == NULL)
> > +		return -ENOMEM;
> > +
> > +	res_region->iova = iova;
> > +	res_region->size = size;
> > +	res_region->prot = prot;
> > +	res_region->refcount = 0;
> > +	res_region->map_paddr = 0;
> 
> They're already 0 by the kzalloc

Yes ;)
> 
> > +
> > +	list_add(&res_region->next, &iommu->reserved_iova_list);
> > +
> > +	return 0;
> > +}
> > +
> > +static
> > +int vfio_handle_reserved_region_add(struct vfio_iommu *iommu,
> > +				struct vfio_iommu_reserved_region_add
> *region) {
> > +	dma_addr_t iova = region->iova;
> > +	size_t size = region->size;
> > +	int flags = region->flags;
> > +	uint64_t mask;
> > +	int prot = 0;
> > +	int ret;
> > +
> > +	/* Verify that none of our __u64 fields overflow */
> > +	if (region->size != size || region->iova != iova)
> > +		return -EINVAL;
> > +
> > +	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
> > +
> > +	WARN_ON(mask & PAGE_MASK);
> > +
> > +	if (flags & VFIO_IOMMU_RES_REGION_READ)
> > +		prot |= IOMMU_READ;
> > +	if (flags & VFIO_IOMMU_RES_REGION_WRITE)
> > +		prot |= IOMMU_WRITE;
> > +
> > +	if (!prot || !size || (size | iova) & mask)
> > +		return -EINVAL;
> > +
> > +	/* Don't allow IOVA wrap */
> > +	if (iova + size - 1 < iova)
> > +		return -EINVAL;
> > +
> > +	mutex_lock(&iommu->lock);
> > +
> > +	if (region->flags & VFIO_IOMMU_RES_REGION_ADD) {
> > +		ret = vfio_iommu_resvd_region_add(iommu, iova, size,
> prot);
> > +		if (ret) {
> > +			mutex_unlock(&iommu->lock);
> > +			return ret;
> > +		}
> > +	}
> 
> Silently fail if not VFIO_IOMMU_RES_REGION_ADD?

As per the comment below, we do not need this flag, so the above flag check will be removed.

> 
> > +
> > +	mutex_unlock(&iommu->lock);
> > +	return 0;
> > +}
> > +
> > +static
> > +int vfio_handle_reserved_region_del(struct vfio_iommu *iommu,
> > +				struct vfio_iommu_reserved_region_del
> *region) {
> > +	dma_addr_t iova = region->iova;
> > +	size_t size = region->size;
> > +	int flags = region->flags;
> > +	int ret;
> > +
> > +	if (!(flags & VFIO_IOMMU_RES_REGION_DEL))
> > +		return -EINVAL;
> > +
> > +	mutex_lock(&iommu->lock);
> > +
> > +	/* Check for the region */
> > +	if (vfio_find_resvd_region(iommu, iova, size) == NULL) {
> > +		mutex_unlock(&iommu->lock);
> > +		return -EINVAL;
> > +	}
> > +
> > +	/* remove the reserved region */
> > +	if (region->flags & VFIO_IOMMU_RES_REGION_DEL) {
> > +		ret = vfio_iommu_resvd_region_del(iommu, iova, size,
> flags);
> > +		if (ret) {
> > +			mutex_unlock(&iommu->lock);
> > +			return ret;
> > +		}
> > +	}
> > +
> > +	mutex_unlock(&iommu->lock);
> > +	return 0;
> > +}
> > +
> >  static int vfio_bus_type(struct device *dev, void *data)  {
> >  	struct bus_type **bus = data;
> > @@ -905,6 +1069,7 @@ static void *vfio_iommu_type1_open(unsigned
> long arg)
> >  	}
> >
> >  	INIT_LIST_HEAD(&iommu->domain_list);
> > +	INIT_LIST_HEAD(&iommu->reserved_iova_list);
> >  	iommu->dma_list = RB_ROOT;
> >  	mutex_init(&iommu->lock);
> >
> > @@ -1020,6 +1185,40 @@ static long vfio_iommu_type1_ioctl(void
> *iommu_data,
> >  			return ret;
> >
> >  		return copy_to_user((void __user *)arg, &unmap, minsz);
> > +	} else if (cmd == VFIO_IOMMU_RESERVED_REGION_ADD) {
> > +		struct vfio_iommu_reserved_region_add region;
> > +		long ret;
> > +
> > +		minsz = offsetofend(struct
> vfio_iommu_reserved_region_add,
> > +				    size);
> > +		if (copy_from_user(&region, (void __user *)arg, minsz))
> > +			return -EFAULT;
> > +
> > +		if (region.argsz < minsz)
> > +			return -EINVAL;
> > +
> > +		ret = vfio_handle_reserved_region_add(iommu, &region);
> > +		if (ret)
> > +			return ret;
> > +
> > +		return copy_to_user((void __user *)arg, &region, minsz);
> > +	} else if (cmd == VFIO_IOMMU_RESERVED_REGION_DEL) {
> > +		struct vfio_iommu_reserved_region_del region;
> > +		long ret;
> > +
> > +		minsz = offsetofend(struct
> vfio_iommu_reserved_region_del,
> > +				    size);
> > +		if (copy_from_user(&region, (void __user *)arg, minsz))
> > +			return -EFAULT;
> > +
> > +		if (region.argsz < minsz)
> > +			return -EINVAL;
> > +
> > +		ret = vfio_handle_reserved_region_del(iommu, &region);
> > +		if (ret)
> > +			return ret;
> > +
> > +		return copy_to_user((void __user *)arg, &region, minsz);
> 
> So we've just created an interface that is available for all vfio-type1 users,
> whether it makes any sense for the platform or not,

How should we decide whether a given platform needs this or not?

> and it allows the user to
> consume arbitrary amounts of kernel memory, by making an infinitely long
> list of reserved iova entries, brilliant!

I was not sure how to limit the user. What I was thinking is to have a default limit on the number of pages a user can reserve (512 pages). We could also provide a sysfs interface so that the user can change this default. Does this sound good? If not, please suggest an alternative. A rough sketch of what I mean is below.
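
(resvd_region_count and VFIO_RESVD_REGION_MAX are made-up names here, and
the limit could as well come from a module parameter or sysfs instead of a
#define)

#define VFIO_RESVD_REGION_MAX	512

static int vfio_iommu_resvd_region_add(struct vfio_iommu *iommu,
				       dma_addr_t iova, size_t size, int prot)
{
	struct vfio_resvd_region *region;

	/* Refuse to grow the list beyond the per-container limit */
	if (iommu->resvd_region_count >= VFIO_RESVD_REGION_MAX)
		return -ENOSPC;

	/* Check overlap with dma mappings and existing reserved regions */
	if (vfio_find_dma(iommu, iova, size) ||
	    vfio_find_resvd_region(iommu, iova, size))
		return -EEXIST;

	region = kzalloc(sizeof(*region), GFP_KERNEL);
	if (!region)
		return -ENOMEM;

	region->iova = iova;
	region->size = size;
	region->prot = prot;
	list_add(&region->next, &iommu->reserved_iova_list);
	iommu->resvd_region_count++;

	return 0;
}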

> 
> >  	}
> >
> >  	return -ENOTTY;
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index b57b750..1abd1a9 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -440,6 +440,49 @@ struct vfio_iommu_type1_dma_unmap {
> >  #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
> >  #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
> >
> > +/**************** Reserved IOVA region specific APIs
> > +**********************/
> > +
> > +/*
> > + * VFIO_IOMMU_RESERVED_REGION_ADD - _IO(VFIO_TYPE, VFIO_BASE
> + 17,
> > + *					struct
> vfio_iommu_reserved_region_add)
> > + * This is used to add a reserved iova region.
> > + * @flags - Input: VFIO_IOMMU_RES_REGION_ADD flag is for adding
> > + * a reserved region.
> 
> Why else would we call VFIO_IOMMU_RESERVED_REGION_ADD except to
> add a region, this flag is redundant.

Ok, will remove this.

> 
> > + * Also pass READ/WRITE/IOMMU flags to be used in iommu mapping.
> > + * @iova - Input: IOVA base address of reserved region
> > + * @size - Input: Size of the reserved region
> > + * Return: 0 on success, -errno on failure  */ struct
> > +vfio_iommu_reserved_region_add {
> > +	__u32   argsz;
> > +	__u32   flags;
> > +#define VFIO_IOMMU_RES_REGION_ADD	(1 << 0) /* Add a reserved
> region */
> > +#define VFIO_IOMMU_RES_REGION_READ	(1 << 1) /* readable region */
> > +#define VFIO_IOMMU_RES_REGION_WRITE	(1 << 2) /* writable
> region */
> > +	__u64	iova;
> > +	__u64   size;
> > +};
> > +#define VFIO_IOMMU_RESERVED_REGION_ADD _IO(VFIO_TYPE,
> VFIO_BASE + 17)
> > +
> > +/*
> > + * VFIO_IOMMU_RESERVED_REGION_DEL - _IO(VFIO_TYPE, VFIO_BASE +
> 18,
> > + *					struct
> vfio_iommu_reserved_region_del)
> > + * This is used to delete an existing reserved iova region.
> > + * @flags - VFIO_IOMMU_RES_REGION_DEL flag is for deleting a region
> > +use,
> > + *  only a unmapped region can be deleted.
> > + * @iova - Input: IOVA base address of reserved region
> > + * @size - Input: Size of the reserved region
> > + * Return: 0 on success, -errno on failure  */ struct
> > +vfio_iommu_reserved_region_del {
> > +	__u32   argsz;
> > +	__u32   flags;
> > +#define VFIO_IOMMU_RES_REGION_DEL	(1 << 0) /* unset the
> reserved region */
> > +	__u64	iova;
> > +	__u64   size;
> > +};
> > +#define VFIO_IOMMU_RESERVED_REGION_DEL _IO(VFIO_TYPE,
> VFIO_BASE + 18)
> > +
> 
> These are effectively both
> 
> struct vfio_iommu_type1_dma_unmap

Yes, are you suggesting that we reuse "struct vfio_iommu_type1_dma_unmap"? I found that confusing.
What if we use a single "struct vfio_iommu_reserved_region" and use the flags VFIO_IOMMU_RES_REGION_ADD/VFIO_IOMMU_RES_REGION_DEL? A rough sketch of the layout I mean is below.
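
(flag values and names below are not final, this is only to show the shape)

struct vfio_iommu_reserved_region {
	__u32	argsz;
	__u32	flags;
#define VFIO_IOMMU_RES_REGION_ADD	(1 << 0) /* add the reserved region */
#define VFIO_IOMMU_RES_REGION_DEL	(1 << 1) /* delete the reserved region */
#define VFIO_IOMMU_RES_REGION_READ	(1 << 2) /* readable region */
#define VFIO_IOMMU_RES_REGION_WRITE	(1 << 3) /* writable region */
	__u64	iova;	/* IOVA base address of the reserved region */
	__u64	size;	/* size of the reserved region in bytes */
};

Userspace would then pass the same structure to both
VFIO_IOMMU_RESERVED_REGION_ADD and VFIO_IOMMU_RESERVED_REGION_DEL, or the
two ioctls could even be collapsed into one keyed off the ADD/DEL flag.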

Thanks
-Bharat

> 
> >  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU
> > -------- */
> >
> >  /*
> 
> 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: [RFC PATCH 2/6] iommu: Add interface to get msi-pages mapping attributes
  2015-10-02 22:45   ` Alex Williamson
@ 2015-10-05  5:17     ` Bhushan Bharat
  2015-10-05  5:56     ` Bhushan Bharat
  1 sibling, 0 replies; 45+ messages in thread
From: Bhushan Bharat @ 2015-10-05  5:17 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, marc.zyngier, will.deacon, kvmarm



> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Saturday, October 03, 2015 4:16 AM
> To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> christoffer.dall@linaro.org; eric.auger@linaro.org; pranavkumar@linaro.org;
> marc.zyngier@arm.com; will.deacon@arm.com
> Subject: Re: [RFC PATCH 2/6] iommu: Add interface to get msi-pages
> mapping attributes
> 
> [really ought to consider cc'ing the iommu list]
> 
> On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > This APIs return the capability of automatically mapping msi-pages in
> > iommu with some magic iova. Which is what currently most of iommu's
> > does and is the default behaviour.
> >
> > Further API returns whether iommu allows the user to define different
> > iova for mai-page mapping for the domain. This is required when a msi
> > capable device is directly assigned to user-space/VM and user-space/VM
> > need to define a non-overlapping (from other dma-able address space)
> > iova for msi-pages mapping in iommu.
> >
> > This patch just define the interface and follow up patches will extend
> > this interface.
> 
> This is backwards, generally you want to add the infrastructure and only
> expose it once all the pieces are in place for it to work.  For instance, patch
> 1/6 exposes a new userspace interface for vfio that doesn't do anything yet.
> How does the user know if it's there, *and* works?

Ok, I will reorder the patches.

> 
> > Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> > ---
> >  drivers/iommu/arm-smmu.c        |  3 +++
> >  drivers/iommu/fsl_pamu_domain.c |  3 +++
> >  drivers/iommu/iommu.c           | 14 ++++++++++++++
> >  include/linux/iommu.h           |  9 ++++++++-
> >  4 files changed, 28 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index
> > 66a803b..a3956fb 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -1406,6 +1406,9 @@ static int arm_smmu_domain_get_attr(struct
> iommu_domain *domain,
> >  	case DOMAIN_ATTR_NESTING:
> >  		*(int *)data = (smmu_domain->stage ==
> ARM_SMMU_DOMAIN_NESTED);
> >  		return 0;
> > +	case DOMAIN_ATTR_MSI_MAPPING:
> > +		/* Dummy handling added */
> > +		return 0;
> >  	default:
> >  		return -ENODEV;
> >  	}
> > diff --git a/drivers/iommu/fsl_pamu_domain.c
> > b/drivers/iommu/fsl_pamu_domain.c index 1d45293..9a94430 100644
> > --- a/drivers/iommu/fsl_pamu_domain.c
> > +++ b/drivers/iommu/fsl_pamu_domain.c
> > @@ -856,6 +856,9 @@ static int fsl_pamu_get_domain_attr(struct
> iommu_domain *domain,
> >  	case DOMAIN_ATTR_FSL_PAMUV1:
> >  		*(int *)data = DOMAIN_ATTR_FSL_PAMUV1;
> >  		break;
> > +	case DOMAIN_ATTR_MSI_MAPPING:
> > +		/* Dummy handling added */
> > +		break;
> >  	default:
> >  		pr_debug("Unsupported attribute type\n");
> >  		ret = -EINVAL;
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index
> > d4f527e..16c2eab 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -1216,6 +1216,7 @@ int iommu_domain_get_attr(struct
> iommu_domain *domain,
> >  	bool *paging;
> >  	int ret = 0;
> >  	u32 *count;
> > +	struct iommu_domain_msi_maps *msi_maps;
> >
> >  	switch (attr) {
> >  	case DOMAIN_ATTR_GEOMETRY:
> > @@ -1236,6 +1237,19 @@ int iommu_domain_get_attr(struct
> iommu_domain *domain,
> >  			ret = -ENODEV;
> >
> >  		break;
> > +	case DOMAIN_ATTR_MSI_MAPPING:
> > +		msi_maps = data;
> > +
> > +		/* Default MSI-pages are magically mapped with some iova
> and
> > +		 * do now allow to configure with different iova.
> > +		 */
> > +		msi_maps->automap = true;
> > +		msi_maps->override_automap = false;
> 
> There's no magic.  I think what you're trying to express is the difference
> between platforms that support MSI within the IOMMU IOVA space and
> thus need explicit IOMMU mappings vs platforms where MSI mappings
> either bypass the IOMMU entirely or are setup implicitly with interrupt
> remapping support.

Yes, I want to differentiate the platforms which require explicit iommu mapping for MSI from the other platforms.
I will change the comment and use a better name (need_mapping/need_iommu_mapping/require_mapping). Roughly what I have in mind is below.
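
(struct and field names are placeholders only)

struct iommu_domain_msi_attr {
	bool	need_explicit_mapping;	/* MSI pages must be mapped by the caller */
	bool	iova_configurable;	/* caller may choose the MSI IOVA */
};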

> 
> Why does it make sense to impose any sort of defaults?  If the IOMMU
> driver doesn't tell us what to do, I don't think we want to assume anything.
> 
> > +
> > +		if (domain->ops->domain_get_attr)
> > +			ret = domain->ops->domain_get_attr(domain, attr,
> data);
> > +
> > +		break;
> >  	default:
> >  		if (!domain->ops->domain_get_attr)
> >  			return -EINVAL;
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h index
> > 0546b87..6d49f3f 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -83,6 +83,13 @@ struct iommu_domain {
> >  	struct iommu_domain_geometry geometry;  };
> >
> > +struct iommu_domain_msi_maps {
> > +	dma_addr_t base_address;
> > +	dma_addr_t size;
> 
> size_t?

Will remove above two fields as they are redundant.

Thanks
-Bharat

> 
> > +	bool automap;
> > +	bool override_automap;
> > +};
> > +
> >  enum iommu_cap {
> >  	IOMMU_CAP_CACHE_COHERENCY,	/* IOMMU can enforce cache
> coherent DMA
> >  					   transactions */
> > @@ -111,6 +118,7 @@ enum iommu_attr {
> >  	DOMAIN_ATTR_FSL_PAMU_ENABLE,
> >  	DOMAIN_ATTR_FSL_PAMUV1,
> >  	DOMAIN_ATTR_NESTING,	/* two stages of translation */
> > +	DOMAIN_ATTR_MSI_MAPPING, /* Provides MSIs mapping in iommu
> */
> >  	DOMAIN_ATTR_MAX,
> >  };
> >
> > @@ -167,7 +175,6 @@ struct iommu_ops {
> >  	int (*domain_set_windows)(struct iommu_domain *domain, u32
> w_count);
> >  	/* Get the numer of window per domain */
> >  	u32 (*domain_get_windows)(struct iommu_domain *domain);
> > -
> >  #ifdef CONFIG_OF_IOMMU
> >  	int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
> > #endif
> 
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: [RFC PATCH 2/6] iommu: Add interface to get msi-pages mapping attributes
  2015-10-02 22:45   ` Alex Williamson
  2015-10-05  5:17     ` Bhushan Bharat
@ 2015-10-05  5:56     ` Bhushan Bharat
  1 sibling, 0 replies; 45+ messages in thread
From: Bhushan Bharat @ 2015-10-05  5:56 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvmarm, kvm, christoffer.dall, eric.auger, pranavkumar,
	marc.zyngier, will.deacon

Forgot to respond to one of the comments...

> -----Original Message-----
> From: Bhushan Bharat-R65777
> Sent: Monday, October 05, 2015 10:47 AM
> To: 'Alex Williamson' <alex.williamson@redhat.com>
> Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> christoffer.dall@linaro.org; eric.auger@linaro.org; pranavkumar@linaro.org;
> marc.zyngier@arm.com; will.deacon@arm.com
> Subject: RE: [RFC PATCH 2/6] iommu: Add interface to get msi-pages
> mapping attributes
> 
> 
> 
> > -----Original Message-----
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Saturday, October 03, 2015 4:16 AM
> > To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> > Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > christoffer.dall@linaro.org; eric.auger@linaro.org;
> > pranavkumar@linaro.org; marc.zyngier@arm.com; will.deacon@arm.com
> > Subject: Re: [RFC PATCH 2/6] iommu: Add interface to get msi-pages
> > mapping attributes
> >
> > [really ought to consider cc'ing the iommu list]
> >
> > On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > > This APIs return the capability of automatically mapping msi-pages
> > > in iommu with some magic iova. Which is what currently most of
> > > iommu's does and is the default behaviour.
> > >
> > > Further API returns whether iommu allows the user to define
> > > different iova for msi-page mapping for the domain. This is required
> > > when a msi capable device is directly assigned to user-space/VM and
> > > user-space/VM need to define a non-overlapping (from other dma-able
> > > address space) iova for msi-pages mapping in iommu.
> > >
> > > This patch just define the interface and follow up patches will
> > > extend this interface.
> >
> > This is backwards, generally you want to add the infrastructure and
> > only expose it once all the pieces are in place for it to work.  For
> > instance, patch
> > 1/6 exposes a new userspace interface for vfio that doesn't do anything
> yet.
> > How does the user know if it's there, *and* works?
> 
> Ok, I will reorder the patches.
> 
> >
> > > Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> > > ---
> > >  drivers/iommu/arm-smmu.c        |  3 +++
> > >  drivers/iommu/fsl_pamu_domain.c |  3 +++
> > >  drivers/iommu/iommu.c           | 14 ++++++++++++++
> > >  include/linux/iommu.h           |  9 ++++++++-
> > >  4 files changed, 28 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > index
> > > 66a803b..a3956fb 100644
> > > --- a/drivers/iommu/arm-smmu.c
> > > +++ b/drivers/iommu/arm-smmu.c
> > > @@ -1406,6 +1406,9 @@ static int arm_smmu_domain_get_attr(struct
> > iommu_domain *domain,
> > >  	case DOMAIN_ATTR_NESTING:
> > >  		*(int *)data = (smmu_domain->stage ==
> > ARM_SMMU_DOMAIN_NESTED);
> > >  		return 0;
> > > +	case DOMAIN_ATTR_MSI_MAPPING:
> > > +		/* Dummy handling added */
> > > +		return 0;
> > >  	default:
> > >  		return -ENODEV;
> > >  	}
> > > diff --git a/drivers/iommu/fsl_pamu_domain.c
> > > b/drivers/iommu/fsl_pamu_domain.c index 1d45293..9a94430 100644
> > > --- a/drivers/iommu/fsl_pamu_domain.c
> > > +++ b/drivers/iommu/fsl_pamu_domain.c
> > > @@ -856,6 +856,9 @@ static int fsl_pamu_get_domain_attr(struct
> > iommu_domain *domain,
> > >  	case DOMAIN_ATTR_FSL_PAMUV1:
> > >  		*(int *)data = DOMAIN_ATTR_FSL_PAMUV1;
> > >  		break;
> > > +	case DOMAIN_ATTR_MSI_MAPPING:
> > > +		/* Dummy handling added */
> > > +		break;
> > >  	default:
> > >  		pr_debug("Unsupported attribute type\n");
> > >  		ret = -EINVAL;
> > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index
> > > d4f527e..16c2eab 100644
> > > --- a/drivers/iommu/iommu.c
> > > +++ b/drivers/iommu/iommu.c
> > > @@ -1216,6 +1216,7 @@ int iommu_domain_get_attr(struct
> > iommu_domain *domain,
> > >  	bool *paging;
> > >  	int ret = 0;
> > >  	u32 *count;
> > > +	struct iommu_domain_msi_maps *msi_maps;
> > >
> > >  	switch (attr) {
> > >  	case DOMAIN_ATTR_GEOMETRY:
> > > @@ -1236,6 +1237,19 @@ int iommu_domain_get_attr(struct
> > iommu_domain *domain,
> > >  			ret = -ENODEV;
> > >
> > >  		break;
> > > +	case DOMAIN_ATTR_MSI_MAPPING:
> > > +		msi_maps = data;
> > > +
> > > +		/* Default MSI-pages are magically mapped with some iova
> > and
> > > +		 * do now allow to configure with different iova.
> > > +		 */
> > > +		msi_maps->automap = true;
> > > +		msi_maps->override_automap = false;
> >
> > There's no magic.  I think what you're trying to express is the
> > difference between platforms that support MSI within the IOMMU IOVA
> > space and thus need explicit IOMMU mappings vs platforms where MSI
> > mappings either bypass the IOMMU entirely or are setup implicitly with
> > interrupt remapping support.
> 
> Yes, I want to differentiate the platforms which require explicit iommu
> mapping for MSI from other platforms.
> I will change the comment and use a better name
> (need_mapping/need_iommu_mapping/require_mapping).
> 
> >
> > Why does it make sense to impose any sort of defaults?  If the IOMMU
> > driver doesn't tell us what to do, I don't think we want to assume anything.

Agreed, in this patch series I restricted the change to the SMMU only. I will try extending this to the other IOMMUs as well.

Thanks
-Bharat

> >
> > > +
> > > +		if (domain->ops->domain_get_attr)
> > > +			ret = domain->ops->domain_get_attr(domain, attr,
> > data);
> > > +
> > > +		break;
> > >  	default:
> > >  		if (!domain->ops->domain_get_attr)
> > >  			return -EINVAL;
> > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h index
> > > 0546b87..6d49f3f 100644
> > > --- a/include/linux/iommu.h
> > > +++ b/include/linux/iommu.h
> > > @@ -83,6 +83,13 @@ struct iommu_domain {
> > >  	struct iommu_domain_geometry geometry;  };
> > >
> > > +struct iommu_domain_msi_maps {
> > > +	dma_addr_t base_address;
> > > +	dma_addr_t size;
> >
> > size_t?
> 
> Will remove above two fields as they are redundant.
> 
> Thanks
> -Bharat
> 
> >
> > > +	bool automap;
> > > +	bool override_automap;
> > > +};
> > > +
> > >  enum iommu_cap {
> > >  	IOMMU_CAP_CACHE_COHERENCY,	/* IOMMU can enforce cache
> > coherent DMA
> > >  					   transactions */
> > > @@ -111,6 +118,7 @@ enum iommu_attr {
> > >  	DOMAIN_ATTR_FSL_PAMU_ENABLE,
> > >  	DOMAIN_ATTR_FSL_PAMUV1,
> > >  	DOMAIN_ATTR_NESTING,	/* two stages of translation */
> > > +	DOMAIN_ATTR_MSI_MAPPING, /* Provides MSIs mapping in iommu
> > */
> > >  	DOMAIN_ATTR_MAX,
> > >  };
> > >
> > > @@ -167,7 +175,6 @@ struct iommu_ops {
> > >  	int (*domain_set_windows)(struct iommu_domain *domain, u32
> > w_count);
> > >  	/* Get the numer of window per domain */
> > >  	u32 (*domain_get_windows)(struct iommu_domain *domain);
> > > -
> > >  #ifdef CONFIG_OF_IOMMU
> > >  	int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
> > > #endif
> >
> >


^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
  2015-10-02 22:46   ` Alex Williamson
@ 2015-10-05  6:00     ` Bhushan Bharat
  2015-10-05 22:45       ` Alex Williamson
  0 siblings, 1 reply; 45+ messages in thread
From: Bhushan Bharat @ 2015-10-05  6:00 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvmarm, kvm, christoffer.dall, eric.auger, pranavkumar,
	marc.zyngier, will.deacon



> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Saturday, October 03, 2015 4:16 AM
> To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> christoffer.dall@linaro.org; eric.auger@linaro.org; pranavkumar@linaro.org;
> marc.zyngier@arm.com; will.deacon@arm.com
> Subject: Re: [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs
> automap state
> 
> On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > This patch allows the user-space to know whether msi-pages are
> > automatically mapped with some magic iova or not.
> >
> > Even if the msi-pages are automatically mapped, still user-space wants
> > to over-ride the automatic iova selection for msi-mapping.
> > For this user-space need to know whether it is allowed to change the
> > automatic mapping or not and this API provides this mechanism.
> > Follow up patches will provide how to over-ride this.
> >
> > Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 32
> ++++++++++++++++++++++++++++++++
> >  include/uapi/linux/vfio.h       |  3 +++
> >  2 files changed, 35 insertions(+)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > b/drivers/vfio/vfio_iommu_type1.c index fa5d3e4..3315fb6 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -59,6 +59,7 @@ struct vfio_iommu {
> >  	struct rb_root		dma_list;
> >  	bool			v2;
> >  	bool			nesting;
> > +	bool			allow_msi_reconfig;
> >  	struct list_head	reserved_iova_list;
> >  };
> >
> > @@ -1117,6 +1118,23 @@ static int
> vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
> >  	return ret;
> >  }
> >
> > +static
> > +int vfio_domains_get_msi_maps(struct vfio_iommu *iommu,
> > +			      struct iommu_domain_msi_maps *msi_maps) {
> > +	struct vfio_domain *d;
> > +	int ret;
> > +
> > +	mutex_lock(&iommu->lock);
> > +	/* All domains have same msi-automap property, pick first */
> > +	d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
> > +	ret = iommu_domain_get_attr(d->domain,
> DOMAIN_ATTR_MSI_MAPPING,
> > +				    msi_maps);
> > +	mutex_unlock(&iommu->lock);
> > +
> > +	return ret;
> > +}
> > +
> >  static long vfio_iommu_type1_ioctl(void *iommu_data,
> >  				   unsigned int cmd, unsigned long arg)  { @@
> -1138,6 +1156,8 @@
> > static long vfio_iommu_type1_ioctl(void *iommu_data,
> >  		}
> >  	} else if (cmd == VFIO_IOMMU_GET_INFO) {
> >  		struct vfio_iommu_type1_info info;
> > +		struct iommu_domain_msi_maps msi_maps;
> > +		int ret;
> >
> >  		minsz = offsetofend(struct vfio_iommu_type1_info,
> iova_pgsizes);
> >
> > @@ -1149,6 +1169,18 @@ static long vfio_iommu_type1_ioctl(void
> > *iommu_data,
> >
> >  		info.flags = 0;
> >
> > +		ret = vfio_domains_get_msi_maps(iommu, &msi_maps);
> > +		if (ret)
> > +			return ret;
> 
> And now ioctl(VFIO_IOMMU_GET_INFO) no longer works for any IOMMU
> implementing domain_get_attr but not supporting
> DOMAIN_ATTR_MSI_MAPPING.

With the current patch version this will fall back to the default assumed behavior, as you commented on the previous patch.

> 
> > +
> > +		if (msi_maps.override_automap) {
> > +			info.flags |=
> VFIO_IOMMU_INFO_MSI_ALLOW_RECONFIG;
> > +			iommu->allow_msi_reconfig = true;
> > +		}
> > +
> > +		if (msi_maps.automap)
> > +			info.flags |= VFIO_IOMMU_INFO_MSI_AUTOMAP;
> > +
> >  		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
> >
> >  		return copy_to_user((void __user *)arg, &info, minsz); diff --
> git
> > a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index
> > 1abd1a9..9998f6e 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -391,6 +391,9 @@ struct vfio_iommu_type1_info {
> >  	__u32	argsz;
> >  	__u32	flags;
> >  #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page
> sizes info */
> > +#define VFIO_IOMMU_INFO_MSI_AUTOMAP (1 << 1)	/* MSI pages
> are auto-mapped
> > +						   in iommu */
> > +#define VFIO_IOMMU_INFO_MSI_ALLOW_RECONFIG (1 << 2) /* Allows
> > +reconfig automap*/
> >  	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
> >  };
> >
> 
> Once again, exposing interfaces to the user before they actually do anything
> is backwards.

Will change the order.

Thanks
-Bharat


^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: [RFC PATCH 4/6] vfio: Add interface to iommu-map/unmap MSI pages
  2015-10-02 22:46   ` Alex Williamson
@ 2015-10-05  6:27     ` Bhushan Bharat
  2015-10-05 22:45       ` Alex Williamson
  0 siblings, 1 reply; 45+ messages in thread
From: Bhushan Bharat @ 2015-10-05  6:27 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvmarm, kvm, christoffer.dall, eric.auger, pranavkumar,
	marc.zyngier, will.deacon



> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Saturday, October 03, 2015 4:16 AM
> To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> christoffer.dall@linaro.org; eric.auger@linaro.org; pranavkumar@linaro.org;
> marc.zyngier@arm.com; will.deacon@arm.com
> Subject: Re: [RFC PATCH 4/6] vfio: Add interface to iommu-map/unmap MSI
> pages
> 
> On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > For MSI interrupts to work for a pass-through devices we need to have
> > mapping of msi-pages in iommu. Now on some platforms (like x86) does
> > this msi-pages mapping happens magically and in these case they
> > chooses an iova which they somehow know that it will never overlap
> > with guest memory. But this magic iova selection may not be always
> > true for all platform (like PowerPC and ARM64).
> >
> > Also on x86 platform, there is no problem as long as running a
> > x86-guest on x86-host but there can be issues when running a non-x86
> > guest on
> > x86 host or other userspace applications like (I think ODP/DPDK).
> > As in these cases there can be chances that it overlaps with guest
> > memory mapping.
> 
> Wow, it's amazing anything works... smoke and mirrors.
> 
> > This patch add interface to iommu-map and iommu-unmap msi-pages at
> > reserved iova chosen by userspace.
> >
> > Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> > ---
> >  drivers/vfio/vfio.c             |  52 +++++++++++++++++++
> >  drivers/vfio/vfio_iommu_type1.c | 111
> ++++++++++++++++++++++++++++++++++++++++
> >  include/linux/vfio.h            |   9 +++-
> >  3 files changed, 171 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index
> > 2fb29df..a817d2d 100644
> > --- a/drivers/vfio/vfio.c
> > +++ b/drivers/vfio/vfio.c
> > @@ -605,6 +605,58 @@ static int vfio_iommu_group_notifier(struct
> notifier_block *nb,
> >  	return NOTIFY_OK;
> >  }
> >
> > +int vfio_device_map_msi(struct vfio_device *device, uint64_t msi_addr,
> > +			uint32_t size, uint64_t *msi_iova) {
> > +	struct vfio_container *container = device->group->container;
> > +	struct vfio_iommu_driver *driver;
> > +	int ret;
> > +
> > +	/* Validate address and size */
> > +	if (!msi_addr || !size || !msi_iova)
> > +		return -EINVAL;
> > +
> > +	down_read(&container->group_lock);
> > +
> > +	driver = container->iommu_driver;
> > +	if (!driver || !driver->ops || !driver->ops->msi_map) {
> > +		up_read(&container->group_lock);
> > +		return -EINVAL;
> > +	}
> > +
> > +	ret = driver->ops->msi_map(container->iommu_data,
> > +				   msi_addr, size, msi_iova);
> > +
> > +	up_read(&container->group_lock);
> > +	return ret;
> > +}
> > +
> > +int vfio_device_unmap_msi(struct vfio_device *device, uint64_t
> msi_iova,
> > +			  uint64_t size)
> > +{
> > +	struct vfio_container *container = device->group->container;
> > +	struct vfio_iommu_driver *driver;
> > +	int ret;
> > +
> > +	/* Validate address and size */
> > +	if (!msi_iova || !size)
> > +		return -EINVAL;
> > +
> > +	down_read(&container->group_lock);
> > +
> > +	driver = container->iommu_driver;
> > +	if (!driver || !driver->ops || !driver->ops->msi_unmap) {
> > +		up_read(&container->group_lock);
> > +		return -EINVAL;
> > +	}
> > +
> > +	ret = driver->ops->msi_unmap(container->iommu_data,
> > +				     msi_iova, size);
> > +
> > +	up_read(&container->group_lock);
> > +	return ret;
> > +}
> > +
> >  /**
> >   * VFIO driver API
> >   */
> > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > b/drivers/vfio/vfio_iommu_type1.c index 3315fb6..ab376c2 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -1003,12 +1003,34 @@ out_free:
> >  	return ret;
> >  }
> >
> > +static void vfio_iommu_unmap_all_reserved_regions(struct vfio_iommu
> > +*iommu) {
> > +	struct vfio_resvd_region *region;
> > +	struct vfio_domain *d;
> > +
> > +	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
> > +		list_for_each_entry(d, &iommu->domain_list, next) {
> > +			if (!region->map_paddr)
> > +				continue;
> > +
> > +			if (!iommu_iova_to_phys(d->domain, region->iova))
> > +				continue;
> > +
> > +			iommu_unmap(d->domain, region->iova,
> PAGE_SIZE);
> 
> PAGE_SIZE?  Why not region->size?

Yes, this should be region->size.

> 
> > +			region->map_paddr = 0;
> > +			cond_resched();
> > +		}
> > +	}
> > +}
> > +
> >  static void vfio_iommu_unmap_unpin_all(struct vfio_iommu *iommu)  {
> >  	struct rb_node *node;
> >
> >  	while ((node = rb_first(&iommu->dma_list)))
> >  		vfio_remove_dma(iommu, rb_entry(node, struct vfio_dma,
> node));
> > +
> > +	vfio_iommu_unmap_all_reserved_regions(iommu);
> >  }
> >
> >  static void vfio_iommu_type1_detach_group(void *iommu_data, @@
> > -1048,6 +1070,93 @@ done:
> >  	mutex_unlock(&iommu->lock);
> >  }
> >
> > +static int vfio_iommu_type1_msi_map(void *iommu_data, uint64_t
> msi_addr,
> > +				    uint64_t size, uint64_t *msi_iova) {
> > +	struct vfio_iommu *iommu = iommu_data;
> > +	struct vfio_resvd_region *region;
> > +	int ret;
> > +
> > +	mutex_lock(&iommu->lock);
> > +
> > +	/* Do not try to create iommu-mapping if msi reconfig not allowed */
> > +	if (!iommu->allow_msi_reconfig) {
> > +		mutex_unlock(&iommu->lock);
> > +		return 0;
> > +	}
> > +
> > +	/* Check if there is already region mapping the msi page */
> > +	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
> > +		if (region->map_paddr == msi_addr) {
> > +			*msi_iova = region->iova;
> > +			region->refcount++;
> > +			mutex_unlock(&iommu->lock);
> > +			return 0;
> > +		}
> > +	}
> > +
> > +	/* Get a unmapped reserved region */
> > +	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
> > +		if (!region->map_paddr)
> > +			break;
> > +	}
> > +
> > +	if (region == NULL) {
> > +		mutex_unlock(&iommu->lock);
> > +		return -ENODEV;
> > +	}
> > +
> > +	ret = vfio_iommu_map(iommu, region->iova, msi_addr >>
> PAGE_SHIFT,
> > +			     size >> PAGE_SHIFT, region->prot);
> 
> So the reserved region has a size and the msi mapping has a size and we
> arbitrarily decide to use the msi mapping size here?

The reserved region interface is generic and the user can set a reserved region of any size (a multiple of the page size). But we do not want to create an MSI address mapping beyond the MSI page, otherwise this can be a security issue. However, I am not tracking how much of the reserved iova region is actually mapped, so the unmap is not called with a matching size.


>  The overlap checks we've done for the reserved region are meaningless then.  No wonder
> you're unmapping with PAGE_SIZE, we have no idea.

Do you think we should divide the reserved region into pages and track map/unmap per page?
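
Alternatively I can record how much of the reserved region actually got mapped and use that on the unmap path, e.g. (sketch only):

        struct vfio_resvd_region {
                struct list_head        next;
                dma_addr_t              iova;
                size_t                  size;           /* size of the reservation */
                size_t                  map_size;       /* how much is currently mapped */
                phys_addr_t             map_paddr;
                int                     prot;
                int                     refcount;
        };

map_size would be set in vfio_iommu_type1_msi_map() and then passed to iommu_unmap() instead of PAGE_SIZE.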

> 
> > +	if (ret) {
> > +		mutex_unlock(&iommu->lock);
> > +		return ret;
> > +	}
> > +
> > +	region->map_paddr = msi_addr;
> 
> Is there some sort of implied page alignment with msi_addr?  I could pass 0x0
> for one call, 0x1 for another and due to the mapping shift, get two reserved
> IOVAs pointing at the same msi page.

Page-size alignment is assumed; I will add a check for it.
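
Roughly, before doing the lookup/map I would reduce the address to its page (sketch):

        /* two MSI addresses inside the same page must share one
         * reserved iova, so always operate on the doorbell page
         */
        phys_addr_t msi_page = msi_addr & PAGE_MASK;

and then compare region->map_paddr against msi_page rather than against the raw msi_addr.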

> 
> > +	*msi_iova = region->iova;
> > +	region->refcount++;
> > +
> > +	mutex_unlock(&iommu->lock);
> > +
> > +	return 0;
> > +}
> > +
> > +static int vfio_iommu_type1_msi_unmap(void *iommu_data, uint64_t
> iova,
> > +				      uint64_t size)
> > +{
> > +	struct vfio_iommu *iommu = iommu_data;
> > +	struct vfio_resvd_region *region;
> > +	struct vfio_domain *d;
> > +
> > +	mutex_lock(&iommu->lock);
> > +
> > +	/* find the region mapping the msi page */
> > +	list_for_each_entry(region, &iommu->reserved_iova_list, next)
> > +		if (region->iova == iova)
> > +			break;
> > +
> > +	if (region == NULL || region->refcount <= 0) {
> > +		mutex_unlock(&iommu->lock);
> > +		return -EINVAL;
> > +	}
> > +
> > +	region->refcount--;
> > +	if (!region->refcount) {
> > +		list_for_each_entry(d, &iommu->domain_list, next) {
> > +			if (!iommu_iova_to_phys(d->domain, iova))
> > +				continue;
> > +
> > +			iommu_unmap(d->domain, iova, size);
> 
> And here we're just trusting that the unmap was the same size as the map?
> 
> > +			cond_resched();
> > +		}
> > +	}
> > +	region->map_paddr = 0;
> > +
> > +	mutex_unlock(&iommu->lock);
> > +	return 0;
> > +}
> > +
> >  static void *vfio_iommu_type1_open(unsigned long arg)  {
> >  	struct vfio_iommu *iommu;
> > @@ -1264,6 +1373,8 @@ static const struct vfio_iommu_driver_ops
> vfio_iommu_driver_ops_type1 = {
> >  	.ioctl		= vfio_iommu_type1_ioctl,
> >  	.attach_group	= vfio_iommu_type1_attach_group,
> >  	.detach_group	= vfio_iommu_type1_detach_group,
> > +	.msi_map	= vfio_iommu_type1_msi_map,
> > +	.msi_unmap	= vfio_iommu_type1_msi_unmap,
> >  };
> >
> >  static int __init vfio_iommu_type1_init(void) diff --git
> > a/include/linux/vfio.h b/include/linux/vfio.h index ddb4409..b91085d
> > 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -52,6 +52,10 @@ extern void *vfio_del_group_dev(struct device
> > *dev);  extern struct vfio_device *vfio_device_get_from_dev(struct
> > device *dev);  extern void vfio_device_put(struct vfio_device
> > *device);  extern void *vfio_device_data(struct vfio_device *device);
> > +extern int vfio_device_map_msi(struct vfio_device *device, uint64_t
> msi_addr,
> > +			       uint32_t size, uint64_t *msi_iova); int
> > +vfio_device_unmap_msi(struct vfio_device *device, uint64_t msi_iova,
> > +			  uint64_t size);
> >
> >  /**
> >   * struct vfio_iommu_driver_ops - VFIO IOMMU driver callbacks @@
> > -72,7 +76,10 @@ struct vfio_iommu_driver_ops {
> >  					struct iommu_group *group);
> >  	void		(*detach_group)(void *iommu_data,
> >  					struct iommu_group *group);
> > -
> > +	int		(*msi_map)(void *iommu_data, uint64_t msi_addr,
> > +				   uint64_t size, uint64_t *msi_iova);
> > +	int		(*msi_unmap)(void *iommu_data, uint64_t
> msi_iova,
> > +				     uint64_t size);
> >  };
> >
> >  extern int vfio_register_iommu_driver(const struct
> > vfio_iommu_driver_ops *ops);
> 
> How did this patch solve any of the problems outlined in the commit log?

The problem outlined in the commit log is not solved by this patch alone but by the patch series as a whole. I will move the problem description to the cover letter.

Thanks
-Bharat




^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi interrupt
  2015-10-02 22:46   ` Alex Williamson
@ 2015-10-05  7:20     ` Bhushan Bharat
  2015-10-05 22:44       ` Alex Williamson
  0 siblings, 1 reply; 45+ messages in thread
From: Bhushan Bharat @ 2015-10-05  7:20 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvmarm, kvm, christoffer.dall, eric.auger, pranavkumar,
	marc.zyngier, will.deacon



> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Saturday, October 03, 2015 4:17 AM
> To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> christoffer.dall@linaro.org; eric.auger@linaro.org; pranavkumar@linaro.org;
> marc.zyngier@arm.com; will.deacon@arm.com
> Subject: Re: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi
> interrupt
> 
> On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > An MSI-address is allocated and programmed in pcie device during
> > interrupt configuration. Now for a pass-through device, try to create
> > the iommu mapping for this allocated/programmed msi-address.  If the
> > iommu mapping is created and the msi address programmed in the pcie
> > device is different from msi-iova as per iommu programming then
> > reconfigure the pci device to use msi-iova as msi address.
> >
> > Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> > ---
> >  drivers/vfio/pci/vfio_pci_intrs.c | 36
> > ++++++++++++++++++++++++++++++++++--
> >  1 file changed, 34 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c
> > b/drivers/vfio/pci/vfio_pci_intrs.c
> > index 1f577b4..c9690af 100644
> > --- a/drivers/vfio/pci/vfio_pci_intrs.c
> > +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> > @@ -312,13 +312,23 @@ static int vfio_msi_set_vector_signal(struct
> vfio_pci_device *vdev,
> >  	int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
> >  	char *name = msix ? "vfio-msix" : "vfio-msi";
> >  	struct eventfd_ctx *trigger;
> > +	struct msi_msg msg;
> > +	struct vfio_device *device;
> > +	uint64_t msi_addr, msi_iova;
> >  	int ret;
> >
> >  	if (vector >= vdev->num_ctx)
> >  		return -EINVAL;
> >
> > +	device = vfio_device_get_from_dev(&pdev->dev);
> 
> Have you looked at this function?  I don't think we want to be doing that
> every time we want to poke the interrupt configuration.

I am trying to describe what I understood: a device can have many interrupts, and we should set up the iommu mapping only once, when called for the first time to enable/set up an interrupt.
Similarly, when disabling interrupts we should do the iommu unmap when called for the last enabled interrupt of that device. With this understanding, should I move this map/unmap to separate functions and call them from vfio_msi_set_block() rather than from vfio_msi_set_vector_signal()?
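
Roughly what I have in mind (only a sketch; msi_map_refcnt would be a new field in struct vfio_pci_device, and the iova from the first map would have to be cached next to it):

        static int vfio_pci_msi_map_once(struct vfio_pci_device *vdev,
                                         struct vfio_device *device,
                                         uint64_t msi_addr, uint64_t *msi_iova)
        {
                /* map the doorbell only when the first vector is enabled */
                if (vdev->msi_map_refcnt++)
                        return 0;

                return vfio_device_map_msi(device, msi_addr, PAGE_SIZE, msi_iova);
        }

        static void vfio_pci_msi_unmap_last(struct vfio_pci_device *vdev,
                                            struct vfio_device *device,
                                            uint64_t msi_iova)
        {
                /* unmap only when the last vector is torn down */
                if (--vdev->msi_map_refcnt)
                        return;

                vfio_device_unmap_msi(device, msi_iova, PAGE_SIZE);
        }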

>  Also note that
> IOMMU mappings don't operate on devices, but groups, so maybe we want
> to pass the group.

Yes, it operates on a group. I hesitated to add an API to get the group. Do you suggest that it is OK to add an API to get the group from the device?

> 
> > +	if (device == NULL)
> > +		return -EINVAL;
> 
> This would be a legitimate BUG_ON(!device)
> 
> > +
> >  	if (vdev->ctx[vector].trigger) {
> >  		free_irq(irq, vdev->ctx[vector].trigger);
> > +		get_cached_msi_msg(irq, &msg);
> > +		msi_iova = ((u64)msg.address_hi << 32) | msg.address_lo;
> > +		vfio_device_unmap_msi(device, msi_iova, PAGE_SIZE);
> >  		kfree(vdev->ctx[vector].name);
> >  		eventfd_ctx_put(vdev->ctx[vector].trigger);
> >  		vdev->ctx[vector].trigger = NULL;
> > @@ -346,12 +356,11 @@ static int vfio_msi_set_vector_signal(struct
> vfio_pci_device *vdev,
> >  	 * cached value of the message prior to enabling.
> >  	 */
> >  	if (msix) {
> > -		struct msi_msg msg;
> > -
> >  		get_cached_msi_msg(irq, &msg);
> >  		pci_write_msi_msg(irq, &msg);
> >  	}
> >
> > +
> 
> gratuitous newline
> 
> >  	ret = request_irq(irq, vfio_msihandler, 0,
> >  			  vdev->ctx[vector].name, trigger);
> >  	if (ret) {
> > @@ -360,6 +369,29 @@ static int vfio_msi_set_vector_signal(struct
> vfio_pci_device *vdev,
> >  		return ret;
> >  	}
> >
> > +	/* Re-program the new-iova in pci-device in case there is
> > +	 * different iommu-mapping created for programmed msi-address.
> > +	 */
> > +	get_cached_msi_msg(irq, &msg);
> > +	msi_iova = 0;
> > +	msi_addr = (u64)(msg.address_hi) << 32 | (u64)(msg.address_lo);
> > +	ret = vfio_device_map_msi(device, msi_addr, PAGE_SIZE,
> &msi_iova);
> > +	if (ret) {
> > +		free_irq(irq, vdev->ctx[vector].trigger);
> > +		kfree(vdev->ctx[vector].name);
> > +		eventfd_ctx_put(trigger);
> > +		return ret;
> > +	}
> > +
> > +	/* Reprogram only if iommu-mapped iova is different from msi-
> address */
> > +	if (msi_iova && (msi_iova != msi_addr)) {
> > +		msg.address_hi = (u32)(msi_iova >> 32);
> > +		/* Keep Lower bits from original msi message address */
> > +		msg.address_lo &= PAGE_MASK;
> > +		msg.address_lo |= (u32)(msi_iova & 0x00000000ffffffff);
> 
> Seems like you're making some assumptions here that are dependent on the
> architecture and maybe the platform.

What I tried is to map the MSI page with a different, page-size-aligned iova; the offset within the page remains the same.
For example, the original MSI address was 0x0603_0040 and we have a reserved region at 0xf000_0000, so an iommu mapping is created for 0xf000_0000 => 0x0603_0000 of size 0x1000.

So the new address to be programmed in the device is 0xf000_0040, i.e. offset 0x40 added to the base address of the iommu mapping.
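
In code it would be roughly (assuming the doorbell page and the reserved iova are both PAGE_SIZE aligned; untested):

        get_cached_msi_msg(irq, &msg);
        msi_addr = ((u64)msg.address_hi << 32) | msg.address_lo;

        if (msi_iova && msi_iova != (msi_addr & PAGE_MASK)) {
                u64 new_addr = msi_iova | (msi_addr & ~PAGE_MASK);

                msg.address_hi = (u32)(new_addr >> 32);
                msg.address_lo = (u32)new_addr;
                pci_write_msi_msg(irq, &msg);
        }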

Thanks
-Bharat

> 
> > +		pci_write_msi_msg(irq, &msg);
> > +	}
> > +
> >  	vdev->ctx[vector].trigger = trigger;
> >
> >  	return 0;
> 
> 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: [RFC PATCH 6/6] arm-smmu: Allow to set iommu mapping for MSI
  2015-10-02 22:46   ` Alex Williamson
@ 2015-10-05  8:33     ` Bhushan Bharat
  2015-10-05 22:54       ` Alex Williamson
  0 siblings, 1 reply; 45+ messages in thread
From: Bhushan Bharat @ 2015-10-05  8:33 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, marc.zyngier, will.deacon, kvmarm



> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Saturday, October 03, 2015 4:17 AM
> To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> christoffer.dall@linaro.org; eric.auger@linaro.org; pranavkumar@linaro.org;
> marc.zyngier@arm.com; will.deacon@arm.com
> Subject: Re: [RFC PATCH 6/6] arm-smmu: Allow to set iommu mapping for
> MSI
> 
> On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > Finally ARM SMMU declare that iommu-mapping for MSI-pages are not set
> > automatically and it should be set explicitly.
> >
> > Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> > ---
> >  drivers/iommu/arm-smmu.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index
> > a3956fb..9d37e72 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -1401,13 +1401,18 @@ static int arm_smmu_domain_get_attr(struct
> iommu_domain *domain,
> >  				    enum iommu_attr attr, void *data)  {
> >  	struct arm_smmu_domain *smmu_domain =
> to_smmu_domain(domain);
> > +	struct iommu_domain_msi_maps *msi_maps;
> >
> >  	switch (attr) {
> >  	case DOMAIN_ATTR_NESTING:
> >  		*(int *)data = (smmu_domain->stage ==
> ARM_SMMU_DOMAIN_NESTED);
> >  		return 0;
> >  	case DOMAIN_ATTR_MSI_MAPPING:
> > -		/* Dummy handling added */
> > +		msi_maps = data;
> > +
> > +		msi_maps->automap = false;
> > +		msi_maps->override_automap = true;
> > +
> >  		return 0;
> >  	default:
> >  		return -ENODEV;
> 
> In previous discussions I understood one of the problems you were trying to
> solve was having a limited number of MSI banks and while you may be able
> to get isolated MSI banks for some number of users, it wasn't unlimited and
> sharing may be required.  I don't see any of that addressed in this series.

That problem was on PowerPC. In fact there were two problems: first, which MSI bank should be used, and second, how to create the iommu mapping for a device assigned to userspace.
The first problem is PowerPC specific and will be solved separately.
For the second problem, earlier I tried to add a couple of MSI-specific ioctls and you suggested (IIUC) that we should have a generic reserved-iova type of API; then we can map the MSI bank using a reserved iova, and this will not require MSI-specific involvement of user-space.

> 
> Also, the management of reserved IOVAs vs MSI addresses looks really
> dubious to me.  How does your platform pick an MSI address and what are
> we breaking by covertly changing it?  We seem to be masking over at the
> VFIO level, where there should be lower level interfaces doing the right thing
> when we configure MSI on the device.

Yes, in my understanding the right solution should be:
 1) The VFIO driver should know what physical MSI address will be used for the devices in an iommu group.
    I did not find a generic API for this; on PowerPC I added a function in the Freescale MSI driver and called it from vfio-iommu-fsl-pamu.c (not yet upstreamed).
 2) The VFIO driver should know what IOVA is to be used for creating the iommu mapping (the VFIO API patches of this series).
 3) The VFIO driver will create the iommu mapping using (1) and (2).
 4) The VFIO driver should be able to tell the MSI driver that for a given device it should use a different IOVA, so that when composing the MSI message (for the devices in the given iommu group) it uses that programmed iova as the MSI address. This interface also needs to be developed; a rough sketch of the overall flow is below.
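
Something like the following, where msi_get_doorbell() and msi_set_iova() do not exist today and only stand in for the interfaces needed in (1) and (4):

        static int vfio_setup_msi_mapping(struct vfio_iommu *iommu,
                                          struct iommu_group *group,
                                          struct vfio_resvd_region *region)
        {
                phys_addr_t doorbell = msi_get_doorbell(group);         /* (1) */
                int ret;

                /* (2) + (3): map the doorbell page at the user-chosen iova */
                ret = vfio_iommu_map(iommu, region->iova,
                                     doorbell >> PAGE_SHIFT, 1, region->prot);
                if (ret)
                        return ret;

                /* (4): MSI messages for this group must use region->iova */
                return msi_set_iova(group, region->iova);
        }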

I was not sure which approach we should take. The current approach in the patch was simple to develop, so I went ahead with it to get input, but I agree it does not look very good.
What do you think, should we drop this approach and work out the approach described above?

Thanks
-Bharat
> 
> The problem of reporting "automap" base address isn't addressed more than
> leaving some unused field in iommu_domain_msi_maps.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi interrupt
  2015-10-05  7:20     ` Bhushan Bharat
@ 2015-10-05 22:44       ` Alex Williamson
  2015-10-06  8:32         ` Bhushan Bharat
  0 siblings, 1 reply; 45+ messages in thread
From: Alex Williamson @ 2015-10-05 22:44 UTC (permalink / raw)
  To: Bhushan Bharat
  Cc: kvmarm, kvm, christoffer.dall, eric.auger, pranavkumar,
	marc.zyngier, will.deacon

On Mon, 2015-10-05 at 07:20 +0000, Bhushan Bharat wrote:
> 
> 
> > -----Original Message-----
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Saturday, October 03, 2015 4:17 AM
> > To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> > Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > christoffer.dall@linaro.org; eric.auger@linaro.org; pranavkumar@linaro.org;
> > marc.zyngier@arm.com; will.deacon@arm.com
> > Subject: Re: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi
> > interrupt
> > 
> > On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > > An MSI-address is allocated and programmed in pcie device during
> > > interrupt configuration. Now for a pass-through device, try to create
> > > the iommu mapping for this allocated/programmed msi-address.  If the
> > > iommu mapping is created and the msi address programmed in the pcie
> > > device is different from msi-iova as per iommu programming then
> > > reconfigure the pci device to use msi-iova as msi address.
> > >
> > > Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> > > ---
> > >  drivers/vfio/pci/vfio_pci_intrs.c | 36
> > > ++++++++++++++++++++++++++++++++++--
> > >  1 file changed, 34 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c
> > > b/drivers/vfio/pci/vfio_pci_intrs.c
> > > index 1f577b4..c9690af 100644
> > > --- a/drivers/vfio/pci/vfio_pci_intrs.c
> > > +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> > > @@ -312,13 +312,23 @@ static int vfio_msi_set_vector_signal(struct
> > vfio_pci_device *vdev,
> > >  	int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
> > >  	char *name = msix ? "vfio-msix" : "vfio-msi";
> > >  	struct eventfd_ctx *trigger;
> > > +	struct msi_msg msg;
> > > +	struct vfio_device *device;
> > > +	uint64_t msi_addr, msi_iova;
> > >  	int ret;
> > >
> > >  	if (vector >= vdev->num_ctx)
> > >  		return -EINVAL;
> > >
> > > +	device = vfio_device_get_from_dev(&pdev->dev);
> > 
> > Have you looked at this function?  I don't think we want to be doing that
> > every time we want to poke the interrupt configuration.
> 
> I am trying to describe what I understood: a device can have many interrupts, and we should set up the iommu mapping only once, when called for the first time to enable/set up an interrupt.
> Similarly, when disabling interrupts we should do the iommu unmap when called for the last enabled interrupt of that device. With this understanding, should I move this map/unmap to separate functions and call them from vfio_msi_set_block() rather than from vfio_msi_set_vector_signal()?

Interrupts can be set up and torn down at any time and I don't see how
one function or the other makes much difference.
vfio_device_get_from_dev() is enough overhead that the data we need
should be cached if we're going to call it with some regularity.  Maybe
vfio_iommu_driver_ops.open() should be called with a pointer to the
vfio_device... or the vfio_group.
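
Something along these lines, so the backend can cache whatever it needs at open time (just a sketch, not a final interface):

        /* in struct vfio_iommu_driver_ops */
        void    *(*open)(unsigned long arg, struct vfio_group *group);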

> >  Also note that
> > IOMMU mappings don't operate on devices, but groups, so maybe we want
> > to pass the group.
> 
> Yes, it operates on a group. I hesitated to add an API to get the group. Do you suggest that it is OK to add an API to get the group from the device?

No, the above suggestion is probably better.

> > 
> > > +	if (device == NULL)
> > > +		return -EINVAL;
> > 
> > This would be a legitimate BUG_ON(!device)
> > 
> > > +
> > >  	if (vdev->ctx[vector].trigger) {
> > >  		free_irq(irq, vdev->ctx[vector].trigger);
> > > +		get_cached_msi_msg(irq, &msg);
> > > +		msi_iova = ((u64)msg.address_hi << 32) | msg.address_lo;
> > > +		vfio_device_unmap_msi(device, msi_iova, PAGE_SIZE);
> > >  		kfree(vdev->ctx[vector].name);
> > >  		eventfd_ctx_put(vdev->ctx[vector].trigger);
> > >  		vdev->ctx[vector].trigger = NULL;
> > > @@ -346,12 +356,11 @@ static int vfio_msi_set_vector_signal(struct
> > vfio_pci_device *vdev,
> > >  	 * cached value of the message prior to enabling.
> > >  	 */
> > >  	if (msix) {
> > > -		struct msi_msg msg;
> > > -
> > >  		get_cached_msi_msg(irq, &msg);
> > >  		pci_write_msi_msg(irq, &msg);
> > >  	}
> > >
> > > +
> > 
> > gratuitous newline
> > 
> > >  	ret = request_irq(irq, vfio_msihandler, 0,
> > >  			  vdev->ctx[vector].name, trigger);
> > >  	if (ret) {
> > > @@ -360,6 +369,29 @@ static int vfio_msi_set_vector_signal(struct
> > vfio_pci_device *vdev,
> > >  		return ret;
> > >  	}
> > >
> > > +	/* Re-program the new-iova in pci-device in case there is
> > > +	 * different iommu-mapping created for programmed msi-address.
> > > +	 */
> > > +	get_cached_msi_msg(irq, &msg);
> > > +	msi_iova = 0;
> > > +	msi_addr = (u64)(msg.address_hi) << 32 | (u64)(msg.address_lo);
> > > +	ret = vfio_device_map_msi(device, msi_addr, PAGE_SIZE,
> > &msi_iova);
> > > +	if (ret) {
> > > +		free_irq(irq, vdev->ctx[vector].trigger);
> > > +		kfree(vdev->ctx[vector].name);
> > > +		eventfd_ctx_put(trigger);
> > > +		return ret;
> > > +	}
> > > +
> > > +	/* Reprogram only if iommu-mapped iova is different from msi-
> > address */
> > > +	if (msi_iova && (msi_iova != msi_addr)) {
> > > +		msg.address_hi = (u32)(msi_iova >> 32);
> > > +		/* Keep Lower bits from original msi message address */
> > > +		msg.address_lo &= PAGE_MASK;
> > > +		msg.address_lo |= (u32)(msi_iova & 0x00000000ffffffff);
> > 
> > Seems like you're making some assumptions here that are dependent on the
> > architecture and maybe the platform.
> 
> What I tried is to map the MSI page with a different, page-size-aligned iova; the offset within the page remains the same.
> For example, the original MSI address was 0x0603_0040 and we have a reserved region at 0xf000_0000, so an iommu mapping is created for 0xf000_0000 => 0x0603_0000 of size 0x1000.
> 
> So the new address to be programmed in the device is 0xf000_0040, i.e. offset 0x40 added to the base address of the iommu mapping.

Don't you need ~PAGE_MASK for it to work like that?  The & with
0x00000000ffffffff shouldn't be needed either, certainly not with all
the leading zeros.

> > > +		pci_write_msi_msg(irq, &msg);
> > > +	}
> > > +
> > >  	vdev->ctx[vector].trigger = trigger;
> > >
> > >  	return 0;
> > 
> > 
> 




^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
  2015-10-05  6:00     ` Bhushan Bharat
@ 2015-10-05 22:45       ` Alex Williamson
  2015-10-06  8:53         ` Bhushan Bharat
  0 siblings, 1 reply; 45+ messages in thread
From: Alex Williamson @ 2015-10-05 22:45 UTC (permalink / raw)
  To: Bhushan Bharat
  Cc: kvmarm, kvm, christoffer.dall, eric.auger, pranavkumar,
	marc.zyngier, will.deacon

On Mon, 2015-10-05 at 06:00 +0000, Bhushan Bharat wrote:
> > -1138,6 +1156,8 @@
> > > static long vfio_iommu_type1_ioctl(void *iommu_data,
> > >  		}
> > >  	} else if (cmd == VFIO_IOMMU_GET_INFO) {
> > >  		struct vfio_iommu_type1_info info;
> > > +		struct iommu_domain_msi_maps msi_maps;
> > > +		int ret;
> > >
> > >  		minsz = offsetofend(struct vfio_iommu_type1_info,
> > iova_pgsizes);
> > >
> > > @@ -1149,6 +1169,18 @@ static long vfio_iommu_type1_ioctl(void
> > > *iommu_data,
> > >
> > >  		info.flags = 0;
> > >
> > > +		ret = vfio_domains_get_msi_maps(iommu, &msi_maps);
> > > +		if (ret)
> > > +			return ret;
> > 
> > And now ioctl(VFIO_IOMMU_GET_INFO) no longer works for any IOMMU
> > implementing domain_get_attr but not supporting
> > DOMAIN_ATTR_MSI_MAPPING.
> 
> With the current patch version this will fall back to the default assumed behavior, as you commented on the previous patch.

How so?

+               msi_maps->automap = true;
+               msi_maps->override_automap = false;
+
+               if (domain->ops->domain_get_attr)
+                       ret = domain->ops->domain_get_attr(domain, attr, data);

If domain_get_attr is implemented, but DOMAIN_ATTR_MSI_MAPPING is not,
ret should be an error code.
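
i.e. either drop the special case entirely and let it fall through to the default handling, or something like:

        case DOMAIN_ATTR_MSI_MAPPING:
                if (!domain->ops->domain_get_attr)
                        return -EINVAL;

                ret = domain->ops->domain_get_attr(domain, attr, data);
                break;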


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 4/6] vfio: Add interface to iommu-map/unmap MSI pages
  2015-10-05  6:27     ` Bhushan Bharat
@ 2015-10-05 22:45       ` Alex Williamson
  2015-10-06  9:05         ` Bhushan Bharat
  0 siblings, 1 reply; 45+ messages in thread
From: Alex Williamson @ 2015-10-05 22:45 UTC (permalink / raw)
  To: Bhushan Bharat
  Cc: kvmarm, kvm, christoffer.dall, eric.auger, pranavkumar,
	marc.zyngier, will.deacon

On Mon, 2015-10-05 at 06:27 +0000, Bhushan Bharat wrote:
> 
> 
> > -----Original Message-----
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Saturday, October 03, 2015 4:16 AM
> > To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> > Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > christoffer.dall@linaro.org; eric.auger@linaro.org; pranavkumar@linaro.org;
> > marc.zyngier@arm.com; will.deacon@arm.com
> > Subject: Re: [RFC PATCH 4/6] vfio: Add interface to iommu-map/unmap MSI
> > pages
> > 
> > On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > > For MSI interrupts to work for a pass-through devices we need to have
> > > mapping of msi-pages in iommu. Now on some platforms (like x86) does
> > > this msi-pages mapping happens magically and in these case they
> > > chooses an iova which they somehow know that it will never overlap
> > > with guest memory. But this magic iova selection may not be always
> > > true for all platform (like PowerPC and ARM64).
> > >
> > > Also on x86 platform, there is no problem as long as running a
> > > x86-guest on x86-host but there can be issues when running a non-x86
> > > guest on
> > > x86 host or other userspace applications like (I think ODP/DPDK).
> > > As in these cases there can be chances that it overlaps with guest
> > > memory mapping.
> > 
> > Wow, it's amazing anything works... smoke and mirrors.
> > 
> > > This patch add interface to iommu-map and iommu-unmap msi-pages at
> > > reserved iova chosen by userspace.
> > >
> > > Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> > > ---
> > >  drivers/vfio/vfio.c             |  52 +++++++++++++++++++
> > >  drivers/vfio/vfio_iommu_type1.c | 111
> > ++++++++++++++++++++++++++++++++++++++++
> > >  include/linux/vfio.h            |   9 +++-
> > >  3 files changed, 171 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index
> > > 2fb29df..a817d2d 100644
> > > --- a/drivers/vfio/vfio.c
> > > +++ b/drivers/vfio/vfio.c
> > > @@ -605,6 +605,58 @@ static int vfio_iommu_group_notifier(struct
> > notifier_block *nb,
> > >  	return NOTIFY_OK;
> > >  }
> > >
> > > +int vfio_device_map_msi(struct vfio_device *device, uint64_t msi_addr,
> > > +			uint32_t size, uint64_t *msi_iova) {
> > > +	struct vfio_container *container = device->group->container;
> > > +	struct vfio_iommu_driver *driver;
> > > +	int ret;
> > > +
> > > +	/* Validate address and size */
> > > +	if (!msi_addr || !size || !msi_iova)
> > > +		return -EINVAL;
> > > +
> > > +	down_read(&container->group_lock);
> > > +
> > > +	driver = container->iommu_driver;
> > > +	if (!driver || !driver->ops || !driver->ops->msi_map) {
> > > +		up_read(&container->group_lock);
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	ret = driver->ops->msi_map(container->iommu_data,
> > > +				   msi_addr, size, msi_iova);
> > > +
> > > +	up_read(&container->group_lock);
> > > +	return ret;
> > > +}
> > > +
> > > +int vfio_device_unmap_msi(struct vfio_device *device, uint64_t
> > msi_iova,
> > > +			  uint64_t size)
> > > +{
> > > +	struct vfio_container *container = device->group->container;
> > > +	struct vfio_iommu_driver *driver;
> > > +	int ret;
> > > +
> > > +	/* Validate address and size */
> > > +	if (!msi_iova || !size)
> > > +		return -EINVAL;
> > > +
> > > +	down_read(&container->group_lock);
> > > +
> > > +	driver = container->iommu_driver;
> > > +	if (!driver || !driver->ops || !driver->ops->msi_unmap) {
> > > +		up_read(&container->group_lock);
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	ret = driver->ops->msi_unmap(container->iommu_data,
> > > +				     msi_iova, size);
> > > +
> > > +	up_read(&container->group_lock);
> > > +	return ret;
> > > +}
> > > +
> > >  /**
> > >   * VFIO driver API
> > >   */
> > > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > > b/drivers/vfio/vfio_iommu_type1.c index 3315fb6..ab376c2 100644
> > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > @@ -1003,12 +1003,34 @@ out_free:
> > >  	return ret;
> > >  }
> > >
> > > +static void vfio_iommu_unmap_all_reserved_regions(struct vfio_iommu
> > > +*iommu) {
> > > +	struct vfio_resvd_region *region;
> > > +	struct vfio_domain *d;
> > > +
> > > +	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
> > > +		list_for_each_entry(d, &iommu->domain_list, next) {
> > > +			if (!region->map_paddr)
> > > +				continue;
> > > +
> > > +			if (!iommu_iova_to_phys(d->domain, region->iova))
> > > +				continue;
> > > +
> > > +			iommu_unmap(d->domain, region->iova,
> > PAGE_SIZE);
> > 
> > PAGE_SIZE?  Why not region->size?
> 
> Yes, this should be region->size.
> 
> > 
> > > +			region->map_paddr = 0;
> > > +			cond_resched();
> > > +		}
> > > +	}
> > > +}
> > > +
> > >  static void vfio_iommu_unmap_unpin_all(struct vfio_iommu *iommu)  {
> > >  	struct rb_node *node;
> > >
> > >  	while ((node = rb_first(&iommu->dma_list)))
> > >  		vfio_remove_dma(iommu, rb_entry(node, struct vfio_dma,
> > node));
> > > +
> > > +	vfio_iommu_unmap_all_reserved_regions(iommu);
> > >  }
> > >
> > >  static void vfio_iommu_type1_detach_group(void *iommu_data, @@
> > > -1048,6 +1070,93 @@ done:
> > >  	mutex_unlock(&iommu->lock);
> > >  }
> > >
> > > +static int vfio_iommu_type1_msi_map(void *iommu_data, uint64_t
> > msi_addr,
> > > +				    uint64_t size, uint64_t *msi_iova) {
> > > +	struct vfio_iommu *iommu = iommu_data;
> > > +	struct vfio_resvd_region *region;
> > > +	int ret;
> > > +
> > > +	mutex_lock(&iommu->lock);
> > > +
> > > +	/* Do not try to create iommu-mapping if msi reconfig not allowed */
> > > +	if (!iommu->allow_msi_reconfig) {
> > > +		mutex_unlock(&iommu->lock);
> > > +		return 0;
> > > +	}
> > > +
> > > +	/* Check if there is already region mapping the msi page */
> > > +	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
> > > +		if (region->map_paddr == msi_addr) {
> > > +			*msi_iova = region->iova;
> > > +			region->refcount++;
> > > +			mutex_unlock(&iommu->lock);
> > > +			return 0;
> > > +		}
> > > +	}
> > > +
> > > +	/* Get a unmapped reserved region */
> > > +	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
> > > +		if (!region->map_paddr)
> > > +			break;
> > > +	}
> > > +
> > > +	if (region == NULL) {
> > > +		mutex_unlock(&iommu->lock);
> > > +		return -ENODEV;
> > > +	}
> > > +
> > > +	ret = vfio_iommu_map(iommu, region->iova, msi_addr >>
> > PAGE_SHIFT,
> > > +			     size >> PAGE_SHIFT, region->prot);
> > 
> > So the reserved region has a size and the msi mapping has a size and we
> > arbitrarily decide to use the msi mapping size here?
> 
> The reserved region interface is generic and the user can set a reserved region of any size (a multiple of the page size). But we do not want to create an MSI address mapping beyond the MSI page, otherwise this can be a security issue. However, I am not tracking how much of the reserved iova region is actually mapped, so the unmap is not called with a matching size.
> 
> 
> >  The overlap checks we've done for the reserved region are meaningless then.  No wonder
> > you're unmapping with PAGE_SIZE, we have no idea.
> 
> Do you think we should divide the reserved region into pages and track map/unmap per page?

I'd certainly expect as a user to do one large reserved region mapping
and be done rather than a large number of smaller mappings.  I don't
really understand how we're providing isolation with this interface
though, we're setting up the IOMMU so the guest has a mapping to the
MSI, but our IOMMU granularity is page size.  Aren't we giving the guest
access to everything else that might be mapped into that page?  Don't we
need to push an reservation down to the MSI allocation in order to have
isolation?  If we did that, couldn't we pretty much guarantee that all
MSI vectors would fit into a page or two?


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region
  2015-10-05  4:55   ` Bhushan Bharat
@ 2015-10-05 22:45     ` Alex Williamson
  2015-10-06  9:39       ` Bhushan Bharat
  0 siblings, 1 reply; 45+ messages in thread
From: Alex Williamson @ 2015-10-05 22:45 UTC (permalink / raw)
  To: Bhushan Bharat
  Cc: kvmarm, kvm, christoffer.dall, eric.auger, pranavkumar,
	marc.zyngier, will.deacon

On Mon, 2015-10-05 at 04:55 +0000, Bhushan Bharat wrote:
> Hi Alex,
> 
> > -----Original Message-----
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Saturday, October 03, 2015 4:16 AM
> > To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> > Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > christoffer.dall@linaro.org; eric.auger@linaro.org; pranavkumar@linaro.org;
> > marc.zyngier@arm.com; will.deacon@arm.com
> > Subject: Re: [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova
> > region
> > 
> > On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > > This Patch adds the VFIO APIs to add and remove reserved iova regions.
> > > The reserved iova region can be used for mapping some specific
> > > physical address in iommu.
> > >
> > > Currently we are planning to use this interface for adding iova
> > > regions for creating iommu of msi-pages. But the API are designed for
> > > future extension where some other physical address can be mapped.
> > >
> > > Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> > > ---
> > >  drivers/vfio/vfio_iommu_type1.c | 201
> > +++++++++++++++++++++++++++++++++++++++-
> > >  include/uapi/linux/vfio.h       |  43 +++++++++
> > >  2 files changed, 243 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > > b/drivers/vfio/vfio_iommu_type1.c index 57d8c37..fa5d3e4 100644
> > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > @@ -59,6 +59,7 @@ struct vfio_iommu {
> > >  	struct rb_root		dma_list;
> > >  	bool			v2;
> > >  	bool			nesting;
> > > +	struct list_head	reserved_iova_list;
> > 
> > This alignment leads to poor packing in the structure, put it above the bools.
> 
> ok
> 
> > 
> > >  };
> > >
> > >  struct vfio_domain {
> > > @@ -77,6 +78,15 @@ struct vfio_dma {
> > >  	int			prot;		/* IOMMU_READ/WRITE */
> > >  };
> > >
> > > +struct vfio_resvd_region {
> > > +	dma_addr_t	iova;
> > > +	size_t		size;
> > > +	int		prot;			/* IOMMU_READ/WRITE */
> > > +	int		refcount;		/* ref count of mappings */
> > > +	uint64_t	map_paddr;		/* Mapped Physical Address
> > */
> > 
> > phys_addr_t
> 
> Ok,
> 
> > 
> > > +	struct list_head next;
> > > +};
> > > +
> > >  struct vfio_group {
> > >  	struct iommu_group	*iommu_group;
> > >  	struct list_head	next;
> > > @@ -106,6 +116,38 @@ static struct vfio_dma *vfio_find_dma(struct
> > vfio_iommu *iommu,
> > >  	return NULL;
> > >  }
> > >
> > > +/* This function must be called with iommu->lock held */ static bool
> > > +vfio_overlap_with_resvd_region(struct vfio_iommu *iommu,
> > > +					   dma_addr_t start, size_t size) {
> > > +	struct vfio_resvd_region *region;
> > > +
> > > +	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
> > > +		if (region->iova < start)
> > > +			return (start - region->iova < region->size);
> > > +		else if (start < region->iova)
> > > +			return (region->iova - start < size);
> > 
> > <= on both of the return lines?
> 
> I think is should be "<" and not "=<", no ?

Yep, looks like you're right.  Maybe there's a more straightforward way
to do this.
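
e.g. a plain interval-overlap test over the whole list (untested):

        static bool vfio_overlap_with_resvd_region(struct vfio_iommu *iommu,
                                                   dma_addr_t start, size_t size)
        {
                struct vfio_resvd_region *region;

                list_for_each_entry(region, &iommu->reserved_iova_list, next) {
                        if (start < region->iova + region->size &&
                            region->iova < start + size)
                                return true;
                }

                return false;
        }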

> > 
> > > +
> > > +		return (region->size > 0 && size > 0);
> > > +	}
> > > +
> > > +	return false;
> > > +}
> > > +
> > > +/* This function must be called with iommu->lock held */ static
> > > +struct vfio_resvd_region *vfio_find_resvd_region(struct vfio_iommu
> > *iommu,
> > > +						 dma_addr_t start, size_t
> > size) {
> > > +	struct vfio_resvd_region *region;
> > > +
> > > +	list_for_each_entry(region, &iommu->reserved_iova_list, next)
> > > +		if (region->iova == start && region->size == size)
> > > +			return region;
> > > +
> > > +	return NULL;
> > > +}
> > > +
> > >  static void vfio_link_dma(struct vfio_iommu *iommu, struct vfio_dma
> > > *new)  {
> > >  	struct rb_node **link = &iommu->dma_list.rb_node, *parent =
> > NULL; @@
> > > -580,7 +622,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> > >
> > >  	mutex_lock(&iommu->lock);
> > >
> > > -	if (vfio_find_dma(iommu, iova, size)) {
> > > +	if (vfio_find_dma(iommu, iova, size) ||
> > > +	    vfio_overlap_with_resvd_region(iommu, iova, size)) {
> > >  		mutex_unlock(&iommu->lock);
> > >  		return -EEXIST;
> > >  	}
> > > @@ -626,6 +669,127 @@ static int vfio_dma_do_map(struct vfio_iommu
> > *iommu,
> > >  	return ret;
> > >  }
> > >
> > > +/* This function must be called with iommu->lock held */ static int
> > > +vfio_iommu_resvd_region_del(struct vfio_iommu *iommu,
> > > +				dma_addr_t iova, size_t size, int prot) {
> > > +	struct vfio_resvd_region *res_region;
> > 
> > Have some consistency in naming, just use "region".
> 
> Ok,
> 
> > > +
> > > +	res_region = vfio_find_resvd_region(iommu, iova, size);
> > > +	/* Region should not be mapped in iommu */
> > > +	if (res_region == NULL || res_region->map_paddr)
> > > +		return -EINVAL;
> > 
> > Are these two separate errors?  !region is -EINVAL, but being mapped is -
> > EBUSY.
> 
> Yes, will separate them.
> 
> > 
> > > +
> > > +	list_del(&res_region->next);
> > > +	kfree(res_region);
> > > +	return 0;
> > > +}
> > > +
> > > +/* This function must be called with iommu->lock held */ static int
> > > +vfio_iommu_resvd_region_add(struct vfio_iommu *iommu,
> > > +				       dma_addr_t iova, size_t size, int prot) {
> > > +	struct vfio_resvd_region *res_region;
> > > +
> > > +	/* Check overlap with with dma maping and reserved regions */
> > > +	if (vfio_find_dma(iommu, iova, size) ||
> > > +	    vfio_find_resvd_region(iommu, iova, size))
> > > +		return -EEXIST;
> > > +
> > > +	res_region = kzalloc(sizeof(*res_region), GFP_KERNEL);
> > > +	if (res_region == NULL)
> > > +		return -ENOMEM;
> > > +
> > > +	res_region->iova = iova;
> > > +	res_region->size = size;
> > > +	res_region->prot = prot;
> > > +	res_region->refcount = 0;
> > > +	res_region->map_paddr = 0;
> > 
> > They're already 0 by the kzalloc
> 
> Yes ;)
> > 
> > > +
> > > +	list_add(&res_region->next, &iommu->reserved_iova_list);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static
> > > +int vfio_handle_reserved_region_add(struct vfio_iommu *iommu,
> > > +				struct vfio_iommu_reserved_region_add
> > *region) {
> > > +	dma_addr_t iova = region->iova;
> > > +	size_t size = region->size;
> > > +	int flags = region->flags;
> > > +	uint64_t mask;
> > > +	int prot = 0;
> > > +	int ret;
> > > +
> > > +	/* Verify that none of our __u64 fields overflow */
> > > +	if (region->size != size || region->iova != iova)
> > > +		return -EINVAL;
> > > +
> > > +	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
> > > +
> > > +	WARN_ON(mask & PAGE_MASK);
> > > +
> > > +	if (flags & VFIO_IOMMU_RES_REGION_READ)
> > > +		prot |= IOMMU_READ;
> > > +	if (flags & VFIO_IOMMU_RES_REGION_WRITE)
> > > +		prot |= IOMMU_WRITE;
> > > +
> > > +	if (!prot || !size || (size | iova) & mask)
> > > +		return -EINVAL;
> > > +
> > > +	/* Don't allow IOVA wrap */
> > > +	if (iova + size - 1 < iova)
> > > +		return -EINVAL;
> > > +
> > > +	mutex_lock(&iommu->lock);
> > > +
> > > +	if (region->flags & VFIO_IOMMU_RES_REGION_ADD) {
> > > +		ret = vfio_iommu_resvd_region_add(iommu, iova, size,
> > prot);
> > > +		if (ret) {
> > > +			mutex_unlock(&iommu->lock);
> > > +			return ret;
> > > +		}
> > > +	}
> > 
> > Silently fail if not VFIO_IOMMU_RES_REGION_ADD?
> 
> As per below comment we do not need this flag. So the above flag checking will be removed.
> 
> > 
> > > +
> > > +	mutex_unlock(&iommu->lock);
> > > +	return 0;
> > > +}
> > > +
> > > +static
> > > +int vfio_handle_reserved_region_del(struct vfio_iommu *iommu,
> > > +				struct vfio_iommu_reserved_region_del
> > *region) {
> > > +	dma_addr_t iova = region->iova;
> > > +	size_t size = region->size;
> > > +	int flags = region->flags;
> > > +	int ret;
> > > +
> > > +	if (!(flags & VFIO_IOMMU_RES_REGION_DEL))
> > > +		return -EINVAL;
> > > +
> > > +	mutex_lock(&iommu->lock);
> > > +
> > > +	/* Check for the region */
> > > +	if (vfio_find_resvd_region(iommu, iova, size) == NULL) {
> > > +		mutex_unlock(&iommu->lock);
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	/* remove the reserved region */
> > > +	if (region->flags & VFIO_IOMMU_RES_REGION_DEL) {
> > > +		ret = vfio_iommu_resvd_region_del(iommu, iova, size,
> > flags);
> > > +		if (ret) {
> > > +			mutex_unlock(&iommu->lock);
> > > +			return ret;
> > > +		}
> > > +	}
> > > +
> > > +	mutex_unlock(&iommu->lock);
> > > +	return 0;
> > > +}
> > > +
> > >  static int vfio_bus_type(struct device *dev, void *data)  {
> > >  	struct bus_type **bus = data;
> > > @@ -905,6 +1069,7 @@ static void *vfio_iommu_type1_open(unsigned
> > long arg)
> > >  	}
> > >
> > >  	INIT_LIST_HEAD(&iommu->domain_list);
> > > +	INIT_LIST_HEAD(&iommu->reserved_iova_list);
> > >  	iommu->dma_list = RB_ROOT;
> > >  	mutex_init(&iommu->lock);
> > >
> > > @@ -1020,6 +1185,40 @@ static long vfio_iommu_type1_ioctl(void
> > *iommu_data,
> > >  			return ret;
> > >
> > >  		return copy_to_user((void __user *)arg, &unmap, minsz);
> > > +	} else if (cmd == VFIO_IOMMU_RESERVED_REGION_ADD) {
> > > +		struct vfio_iommu_reserved_region_add region;
> > > +		long ret;
> > > +
> > > +		minsz = offsetofend(struct
> > vfio_iommu_reserved_region_add,
> > > +				    size);
> > > +		if (copy_from_user(&region, (void __user *)arg, minsz))
> > > +			return -EFAULT;
> > > +
> > > +		if (region.argsz < minsz)
> > > +			return -EINVAL;
> > > +
> > > +		ret = vfio_handle_reserved_region_add(iommu, &region);
> > > +		if (ret)
> > > +			return ret;
> > > +
> > > +		return copy_to_user((void __user *)arg, &region, minsz);
> > > +	} else if (cmd == VFIO_IOMMU_RESERVED_REGION_DEL) {
> > > +		struct vfio_iommu_reserved_region_del region;
> > > +		long ret;
> > > +
> > > +		minsz = offsetofend(struct
> > vfio_iommu_reserved_region_del,
> > > +				    size);
> > > +		if (copy_from_user(&region, (void __user *)arg, minsz))
> > > +			return -EFAULT;
> > > +
> > > +		if (region.argsz < minsz)
> > > +			return -EINVAL;
> > > +
> > > +		ret = vfio_handle_reserved_region_del(iommu, &region);
> > > +		if (ret)
> > > +			return ret;
> > > +
> > > +		return copy_to_user((void __user *)arg, &region, minsz);
> > 
> > So we've just created an interface that is available for all vfio-type1 users,
> > whether it makes any sense for the platform or not,
> 
> How we should decide that a given platform needs this or not?

You later add new iommu interfaces; presumably, if the iommu doesn't
implement those interfaces then there's no point in us exposing these
ioctls to vfio.

> > and it allows the user to
> > consume arbitrary amounts of kernel memory, by making an infinitely long
> > list of reserved iova entries, brilliant!
> 
> I was not sure of how to limit the user. What I was thinking of having a default number of pages a user can reserve (512 pages). Also we can give a sysfs interface so that user can change the default number of pages. Does this sound good? If not please suggest?

Isn't 512 entries a lot for a linked list?  Can we use our existing rb
tree to manage these entries rather than a secondary list?  How many
entries do we realistically need?  Can the iommu callbacks help give us
a limit?  Can we somehow use information about the devices in the group
to produce a limit, ie. MSI vectors possible from the group?
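I'm imagining something like the below (rough sketch; the new field is
hypothetical): make a reserved range an ordinary vfio_dma node carrying
a flag, so vfio_find_dma() gives us the overlap check and the rb tree
gives us scalable lookup without a second data structure:

struct vfio_dma {
	struct rb_node		node;
	dma_addr_t		iova;		/* Device address */
	unsigned long		vaddr;		/* Process virtual addr */
	size_t			size;		/* Map size (bytes) */
	int			prot;		/* IOMMU_READ/WRITE */
	bool			msi_reserved;	/* hypothetical: IOVA set aside for MSI */
};

A reserved range would then just be vfio_link_dma() of a node with
msi_reserved set and no backing vaddr or page pinning.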

> > 
> > >  	}
> > >
> > >  	return -ENOTTY;
> > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > index b57b750..1abd1a9 100644
> > > --- a/include/uapi/linux/vfio.h
> > > +++ b/include/uapi/linux/vfio.h
> > > @@ -440,6 +440,49 @@ struct vfio_iommu_type1_dma_unmap {
> > >  #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
> > >  #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
> > >
> > > +/**************** Reserved IOVA region specific APIs
> > > +**********************/
> > > +
> > > +/*
> > > + * VFIO_IOMMU_RESERVED_REGION_ADD - _IO(VFIO_TYPE, VFIO_BASE
> > + 17,
> > > + *					struct
> > vfio_iommu_reserved_region_add)
> > > + * This is used to add a reserved iova region.
> > > + * @flags - Input: VFIO_IOMMU_RES_REGION_ADD flag is for adding
> > > + * a reserved region.
> > 
> > Why else would we call VFIO_IOMMU_RESERVED_REGION_ADD except to
> > add a region, this flag is redundant.
> 
> Ok, will remove this.
> 
> > 
> > > + * Also pass READ/WRITE/IOMMU flags to be used in iommu mapping.
> > > + * @iova - Input: IOVA base address of reserved region
> > > + * @size - Input: Size of the reserved region
> > > + * Return: 0 on success, -errno on failure  */ struct
> > > +vfio_iommu_reserved_region_add {
> > > +	__u32   argsz;
> > > +	__u32   flags;
> > > +#define VFIO_IOMMU_RES_REGION_ADD	(1 << 0) /* Add a reserved
> > region */
> > > +#define VFIO_IOMMU_RES_REGION_READ	(1 << 1) /* readable region */
> > > +#define VFIO_IOMMU_RES_REGION_WRITE	(1 << 2) /* writable
> > region */
> > > +	__u64	iova;
> > > +	__u64   size;
> > > +};
> > > +#define VFIO_IOMMU_RESERVED_REGION_ADD _IO(VFIO_TYPE,
> > VFIO_BASE + 17)
> > > +
> > > +/*
> > > + * VFIO_IOMMU_RESERVED_REGION_DEL - _IO(VFIO_TYPE, VFIO_BASE +
> > 18,
> > > + *					struct
> > vfio_iommu_reserved_region_del)
> > > + * This is used to delete an existing reserved iova region.
> > > + * @flags - VFIO_IOMMU_RES_REGION_DEL flag is for deleting a region
> > > +use,
> > > + *  only a unmapped region can be deleted.
> > > + * @iova - Input: IOVA base address of reserved region
> > > + * @size - Input: Size of the reserved region
> > > + * Return: 0 on success, -errno on failure  */ struct
> > > +vfio_iommu_reserved_region_del {
> > > +	__u32   argsz;
> > > +	__u32   flags;
> > > +#define VFIO_IOMMU_RES_REGION_DEL	(1 << 0) /* unset the
> > reserved region */
> > > +	__u64	iova;
> > > +	__u64   size;
> > > +};
> > > +#define VFIO_IOMMU_RESERVED_REGION_DEL _IO(VFIO_TYPE,
> > VFIO_BASE + 18)
> > > +
> > 
> > These are effectively both
> > 
> > struct vfio_iommu_type1_dma_unmap
> 
> Yes, do you want to suggest that we should use " struct vfio_iommu_type1_dma_unmap". I found that confusing.
> What is we use "struct vfio_iommu_reserved_region" and use flag VFIO_IOMMU_RES_REGION_DEL/ VFIO_IOMMU_RES_REGION_ADD.

What if we just use the existing map and unmap interface with a flag to
indicate an MSI reserved mapping?  I don't really see why we need new
ioctls for this.
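i.e. roughly the below, with the flag name and bit value purely
illustrative:

struct vfio_iommu_type1_dma_map {
	__u32	argsz;
	__u32	flags;
#define VFIO_DMA_MAP_FLAG_READ		(1 << 0)	/* readable from device */
#define VFIO_DMA_MAP_FLAG_WRITE		(1 << 1)	/* writable from device */
#define VFIO_DMA_MAP_FLAG_MSI_RESERVED	(1 << 2)	/* illustrative only */
	__u64	vaddr;		/* Process virtual address, unused for MSI reserved */
	__u64	iova;		/* IO virtual address */
	__u64	size;		/* Size of mapping (bytes) */
};

The unmap side already takes only iova/size, so it wouldn't need to
change at all.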


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 6/6] arm-smmu: Allow to set iommu mapping for MSI
  2015-10-05  8:33     ` Bhushan Bharat
@ 2015-10-05 22:54       ` Alex Williamson
  2015-10-06 10:26         ` Bhushan Bharat
  0 siblings, 1 reply; 45+ messages in thread
From: Alex Williamson @ 2015-10-05 22:54 UTC (permalink / raw)
  To: Bhushan Bharat
  Cc: kvmarm, kvm, christoffer.dall, eric.auger, pranavkumar,
	marc.zyngier, will.deacon

On Mon, 2015-10-05 at 08:33 +0000, Bhushan Bharat wrote:
> 
> 
> > -----Original Message-----
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Saturday, October 03, 2015 4:17 AM
> > To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> > Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > christoffer.dall@linaro.org; eric.auger@linaro.org; pranavkumar@linaro.org;
> > marc.zyngier@arm.com; will.deacon@arm.com
> > Subject: Re: [RFC PATCH 6/6] arm-smmu: Allow to set iommu mapping for
> > MSI
> > 
> > On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > > Finally ARM SMMU declare that iommu-mapping for MSI-pages are not set
> > > automatically and it should be set explicitly.
> > >
> > > Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> > > ---
> > >  drivers/iommu/arm-smmu.c | 7 ++++++-
> > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > index
> > > a3956fb..9d37e72 100644
> > > --- a/drivers/iommu/arm-smmu.c
> > > +++ b/drivers/iommu/arm-smmu.c
> > > @@ -1401,13 +1401,18 @@ static int arm_smmu_domain_get_attr(struct
> > iommu_domain *domain,
> > >  				    enum iommu_attr attr, void *data)  {
> > >  	struct arm_smmu_domain *smmu_domain =
> > to_smmu_domain(domain);
> > > +	struct iommu_domain_msi_maps *msi_maps;
> > >
> > >  	switch (attr) {
> > >  	case DOMAIN_ATTR_NESTING:
> > >  		*(int *)data = (smmu_domain->stage ==
> > ARM_SMMU_DOMAIN_NESTED);
> > >  		return 0;
> > >  	case DOMAIN_ATTR_MSI_MAPPING:
> > > -		/* Dummy handling added */
> > > +		msi_maps = data;
> > > +
> > > +		msi_maps->automap = false;
> > > +		msi_maps->override_automap = true;
> > > +
> > >  		return 0;
> > >  	default:
> > >  		return -ENODEV;
> > 
> > In previous discussions I understood one of the problems you were trying to
> > solve was having a limited number of MSI banks and while you may be able
> > to get isolated MSI banks for some number of users, it wasn't unlimited and
> > sharing may be required.  I don't see any of that addressed in this series.
> 
> That problem was on PowerPC. Infact there were two problems, one which MSI bank to be used and second how to create iommu-mapping for device assigned to userspace.
> First problem was PowerPC specific and that will be solved separately.
> For second problem, earlier I tried to added a couple of MSI specific ioctls and you suggested (IIUC) that we should have a generic reserved-iova type of API and then we can map MSI bank using reserved-iova and this will not require involvement of user-space.
> 
> > 
> > Also, the management of reserved IOVAs vs MSI addresses looks really
> > dubious to me.  How does your platform pick an MSI address and what are
> > we breaking by covertly changing it?  We seem to be masking over at the
> > VFIO level, where there should be lower level interfaces doing the right thing
> > when we configure MSI on the device.
> 
> Yes, In my understanding the right solution should be:
>  1) VFIO driver should know what physical-msi-address will be used for devices in an iommu-group.
>     I did not find an generic API, on PowerPC I added some function in ffrescale msi-driver and called from vfio-iommu-fsl-pamu.c (not yet upstreamed).
>  2) VFIO driver should know what IOVA to be used for creating iommu-mapping (VFIO APIs patch of this patch series)
>  3) VFIO driver will create the iommu-mapping using (1) and (2)
>  4) VFIO driver should be able to tell the msi-driver that for a given device it should use different IOVA. So when composing the msi message (for the devices is the given iommu-group) it should use that programmed iova as MSI-address. This interface also needed to be developed.
> 
> I was not sure of which approach we should take. The current approach in the patch is simple to develop so I went ahead to take input but I agree this does not look very good.
> What do you think, should drop this approach and work out the approach as described above.

I'm certainly not interested in applying and maintaining an interim
solution that isn't the right one.  It seems like VFIO is too involved
in this process in your example.  On x86 we have per vector isolation
and the only thing we're missing is reporting back of the region used by
MSI vectors as reserved IOVA space (but it's standard on x86, so an x86
VM user will never use it for IOVA).  In your model, the MSI IOVA space
is programmable, but it has page granularity (I assume).  Therefore we
shouldn't be sharing that page with anyone.  That seems to suggest we
need to allocate a page of vector space from the host kernel, setup the
IOVA mapping, and then the host kernel should know to only allocate MSI
vectors for these devices from that pre-allocated page.  Otherwise we
need to call the interrupts unsafe, like we do on x86 without interrupt
remapping.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi interrupt
  2015-10-05 22:44       ` Alex Williamson
@ 2015-10-06  8:32         ` Bhushan Bharat
  2015-10-06 15:06           ` Alex Williamson
  0 siblings, 1 reply; 45+ messages in thread
From: Bhushan Bharat @ 2015-10-06  8:32 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, marc.zyngier, will.deacon, kvmarm



> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Tuesday, October 06, 2015 4:15 AM
> To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> christoffer.dall@linaro.org; eric.auger@linaro.org; pranavkumar@linaro.org;
> marc.zyngier@arm.com; will.deacon@arm.com
> Subject: Re: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi
> interrupt
> 
> On Mon, 2015-10-05 at 07:20 +0000, Bhushan Bharat wrote:
> >
> >
> > > -----Original Message-----
> > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > Sent: Saturday, October 03, 2015 4:17 AM
> > > To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> > > Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > > christoffer.dall@linaro.org; eric.auger@linaro.org;
> > > pranavkumar@linaro.org; marc.zyngier@arm.com; will.deacon@arm.com
> > > Subject: Re: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi
> > > interrupt
> > >
> > > On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > > > An MSI-address is allocated and programmed in pcie device during
> > > > interrupt configuration. Now for a pass-through device, try to
> > > > create the iommu mapping for this allocted/programmed msi-address.
> > > > If the iommu mapping is created and the msi address programmed in
> > > > the pcie device is different from msi-iova as per iommu
> > > > programming then reconfigure the pci device to use msi-iova as msi
> address.
> > > >
> > > > Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> > > > ---
> > > >  drivers/vfio/pci/vfio_pci_intrs.c | 36
> > > > ++++++++++++++++++++++++++++++++++--
> > > >  1 file changed, 34 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c
> > > > b/drivers/vfio/pci/vfio_pci_intrs.c
> > > > index 1f577b4..c9690af 100644
> > > > --- a/drivers/vfio/pci/vfio_pci_intrs.c
> > > > +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> > > > @@ -312,13 +312,23 @@ static int vfio_msi_set_vector_signal(struct
> > > vfio_pci_device *vdev,
> > > >  	int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
> > > >  	char *name = msix ? "vfio-msix" : "vfio-msi";
> > > >  	struct eventfd_ctx *trigger;
> > > > +	struct msi_msg msg;
> > > > +	struct vfio_device *device;
> > > > +	uint64_t msi_addr, msi_iova;
> > > >  	int ret;
> > > >
> > > >  	if (vector >= vdev->num_ctx)
> > > >  		return -EINVAL;
> > > >
> > > > +	device = vfio_device_get_from_dev(&pdev->dev);
> > >
> > > Have you looked at this function?  I don't think we want to be doing
> > > that every time we want to poke the interrupt configuration.
> >
> > I am trying to describe what I understood, a device can have many
> interrupts and we should setup iommu only once, when called for the first
> time to enable/setup interrupt.
> > Similarly when disabling the interrupt we should iommu-unmap when
> > called for the last enabled interrupt for that device. Now with this
> > understanding, should I move this map-unmap to separate functions and
> > call them from vfio_msi_set_block() rather than in
> > vfio_msi_set_vector_signal()
> 
> Interrupts can be setup and torn down at any time and I don't see how one
> function or the other makes much difference.
> vfio_device_get_from_dev() is enough overhead that the data we need
> should be cached if we're going to call it with some regularity.  Maybe
> vfio_iommu_driver_ops.open() should be called with a pointer to the
> vfio_device... or the vfio_group.

Do you mean vfio_iommu_driver_ops.open()?  Or that vfio_pci_open() should be called with the vfio_device or vfio_group, and we then cache that in vfio_pci_device?

> 
> > >  Also note that
> > > IOMMU mappings don't operate on devices, but groups, so maybe we
> > > want to pass the group.
> >
> > Yes, it operates on group. I hesitated to add an API to get group. Do you
> suggest to that it is ok to add API to get group from device.
> 
> No, the above suggestion is probably better.
> 
> > >
> > > > +	if (device == NULL)
> > > > +		return -EINVAL;
> > >
> > > This would be a legitimate BUG_ON(!device)
> > >
> > > > +
> > > >  	if (vdev->ctx[vector].trigger) {
> > > >  		free_irq(irq, vdev->ctx[vector].trigger);
> > > > +		get_cached_msi_msg(irq, &msg);
> > > > +		msi_iova = ((u64)msg.address_hi << 32) | msg.address_lo;
> > > > +		vfio_device_unmap_msi(device, msi_iova, PAGE_SIZE);
> > > >  		kfree(vdev->ctx[vector].name);
> > > >  		eventfd_ctx_put(vdev->ctx[vector].trigger);
> > > >  		vdev->ctx[vector].trigger = NULL; @@ -346,12 +356,11 @@
> static
> > > > int vfio_msi_set_vector_signal(struct
> > > vfio_pci_device *vdev,
> > > >  	 * cached value of the message prior to enabling.
> > > >  	 */
> > > >  	if (msix) {
> > > > -		struct msi_msg msg;
> > > > -
> > > >  		get_cached_msi_msg(irq, &msg);
> > > >  		pci_write_msi_msg(irq, &msg);
> > > >  	}
> > > >
> > > > +
> > >
> > > gratuitous newline
> > >
> > > >  	ret = request_irq(irq, vfio_msihandler, 0,
> > > >  			  vdev->ctx[vector].name, trigger);
> > > >  	if (ret) {
> > > > @@ -360,6 +369,29 @@ static int vfio_msi_set_vector_signal(struct
> > > vfio_pci_device *vdev,
> > > >  		return ret;
> > > >  	}
> > > >
> > > > +	/* Re-program the new-iova in pci-device in case there is
> > > > +	 * different iommu-mapping created for programmed msi-address.
> > > > +	 */
> > > > +	get_cached_msi_msg(irq, &msg);
> > > > +	msi_iova = 0;
> > > > +	msi_addr = (u64)(msg.address_hi) << 32 | (u64)(msg.address_lo);
> > > > +	ret = vfio_device_map_msi(device, msi_addr, PAGE_SIZE,
> > > &msi_iova);
> > > > +	if (ret) {
> > > > +		free_irq(irq, vdev->ctx[vector].trigger);
> > > > +		kfree(vdev->ctx[vector].name);
> > > > +		eventfd_ctx_put(trigger);
> > > > +		return ret;
> > > > +	}
> > > > +
> > > > +	/* Reprogram only if iommu-mapped iova is different from msi-
> > > address */
> > > > +	if (msi_iova && (msi_iova != msi_addr)) {
> > > > +		msg.address_hi = (u32)(msi_iova >> 32);
> > > > +		/* Keep Lower bits from original msi message address */
> > > > +		msg.address_lo &= PAGE_MASK;
> > > > +		msg.address_lo |= (u32)(msi_iova & 0x00000000ffffffff);
> > >
> > > Seems like you're making some assumptions here that are dependent on
> > > the architecture and maybe the platform.
> >
> > What I tried is to map the msi page with different iova, which is page size
> aligned. But the offset within the page will remain same.
> > For example, original msi address was 0x0603_0040 and we have a reserved
> region at 0xf000_0000. So iommu mapping is created for 0xf000_0000
> =>0x0600_3000 of size 0x1000.
> >
> > So the new address to be programmed in device is 0xf000_0040, offset
> 0x40 added to base address in iommu mapping.
> 
> Don't you need ~PAGE_MASK for it to work like that?  The & with
> 0x00000000ffffffff shouldn't be needed either, certainly not with all the
> leading zeros.

Yes, I think ~PAGE_MASK can be used.
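Something like the below, keeping only the offset within the page
(sketch, assuming msi_iova is page aligned):

	msg.address_hi = (u32)(msi_iova >> 32);
	msg.address_lo = (msg.address_lo & ~PAGE_MASK) |
			 ((u32)msi_iova & PAGE_MASK);
	pci_write_msi_msg(irq, &msg);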

Thanks
-Bharat

> 
> > > > +		pci_write_msi_msg(irq, &msg);
> > > > +	}
> > > > +
> > > >  	vdev->ctx[vector].trigger = trigger;
> > > >
> > > >  	return 0;
> > >
> > >
> >
> 
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
  2015-10-05 22:45       ` Alex Williamson
@ 2015-10-06  8:53         ` Bhushan Bharat
  2015-10-06 15:11           ` Alex Williamson
  0 siblings, 1 reply; 45+ messages in thread
From: Bhushan Bharat @ 2015-10-06  8:53 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvmarm, kvm, christoffer.dall, eric.auger, pranavkumar,
	marc.zyngier, will.deacon



> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Tuesday, October 06, 2015 4:15 AM
> To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> christoffer.dall@linaro.org; eric.auger@linaro.org; pranavkumar@linaro.org;
> marc.zyngier@arm.com; will.deacon@arm.com
> Subject: Re: [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs
> automap state
> 
> On Mon, 2015-10-05 at 06:00 +0000, Bhushan Bharat wrote:
> > > -1138,6 +1156,8 @@
> > > > static long vfio_iommu_type1_ioctl(void *iommu_data,
> > > >  		}
> > > >  	} else if (cmd == VFIO_IOMMU_GET_INFO) {
> > > >  		struct vfio_iommu_type1_info info;
> > > > +		struct iommu_domain_msi_maps msi_maps;
> > > > +		int ret;
> > > >
> > > >  		minsz = offsetofend(struct vfio_iommu_type1_info,
> > > iova_pgsizes);
> > > >
> > > > @@ -1149,6 +1169,18 @@ static long vfio_iommu_type1_ioctl(void
> > > > *iommu_data,
> > > >
> > > >  		info.flags = 0;
> > > >
> > > > +		ret = vfio_domains_get_msi_maps(iommu, &msi_maps);
> > > > +		if (ret)
> > > > +			return ret;
> > >
> > > And now ioctl(VFIO_IOMMU_GET_INFO) no longer works for any
> IOMMU
> > > implementing domain_get_attr but not supporting
> > > DOMAIN_ATTR_MSI_MAPPING.
> >
> > With this current patch version this will get the default assumed behavior
> as you commented on previous patch.
> 
> How so?

You are right, the ioctl will return failure. But that should be ok, right?

> 
> +               msi_maps->automap = true;
> +               msi_maps->override_automap = false;
> +
> +               if (domain->ops->domain_get_attr)
> +                       ret = domain->ops->domain_get_attr(domain, attr,
> + data);
> 
> If domain_get_attr is implemented, but DOMAIN_ATTR_MSI_MAPPING is
> not, ret should be an error code.

Currently it returns the same error code returned by domain->ops->domain_get_attr().
I do not think we want to complicate this by returning an error to user-space just to say that MSIs probably cannot be used, since user-space can still continue with legacy interrupts. Or do you want that?
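If you would rather have VFIO_IOMMU_GET_INFO keep working regardless,
we could swallow that error in vfio and keep the x86-style default,
roughly (sketch only):

	int ret = 0;

	/* Default: assume MSIs are mapped transparently (x86-like) */
	msi_maps->automap = true;
	msi_maps->override_automap = false;

	if (domain->ops->domain_get_attr) {
		ret = domain->ops->domain_get_attr(domain, attr, data);
		if (ret == -ENODEV)	/* attribute not implemented */
			ret = 0;	/* keep the defaults above */
	}

	return ret;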

Thanks
-Bharat


^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: [RFC PATCH 4/6] vfio: Add interface to iommu-map/unmap MSI pages
  2015-10-05 22:45       ` Alex Williamson
@ 2015-10-06  9:05         ` Bhushan Bharat
  2015-10-06 15:12           ` Alex Williamson
  0 siblings, 1 reply; 45+ messages in thread
From: Bhushan Bharat @ 2015-10-06  9:05 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, marc.zyngier, will.deacon, kvmarm



> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Tuesday, October 06, 2015 4:15 AM
> To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> christoffer.dall@linaro.org; eric.auger@linaro.org; pranavkumar@linaro.org;
> marc.zyngier@arm.com; will.deacon@arm.com
> Subject: Re: [RFC PATCH 4/6] vfio: Add interface to iommu-map/unmap MSI
> pages
> 
> On Mon, 2015-10-05 at 06:27 +0000, Bhushan Bharat wrote:
> >
> >
> > > -----Original Message-----
> > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > Sent: Saturday, October 03, 2015 4:16 AM
> > > To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> > > Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > > christoffer.dall@linaro.org; eric.auger@linaro.org;
> > > pranavkumar@linaro.org; marc.zyngier@arm.com; will.deacon@arm.com
> > > Subject: Re: [RFC PATCH 4/6] vfio: Add interface to iommu-map/unmap
> > > MSI pages
> > >
> > > On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > > > For MSI interrupts to work for a pass-through devices we need to
> > > > have mapping of msi-pages in iommu. Now on some platforms (like
> > > > x86) does this msi-pages mapping happens magically and in these
> > > > case they chooses an iova which they somehow know that it will
> > > > never overlap with guest memory. But this magic iova selection may
> > > > not be always true for all platform (like PowerPC and ARM64).
> > > >
> > > > Also on x86 platform, there is no problem as long as running a
> > > > x86-guest on x86-host but there can be issues when running a
> > > > non-x86 guest on
> > > > x86 host or other userspace applications like (I think ODP/DPDK).
> > > > As in these cases there can be chances that it overlaps with guest
> > > > memory mapping.
> > >
> > > Wow, it's amazing anything works... smoke and mirrors.
> > >
> > > > This patch add interface to iommu-map and iommu-unmap msi-pages
> at
> > > > reserved iova chosen by userspace.
> > > >
> > > > Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> > > > ---
> > > >  drivers/vfio/vfio.c             |  52 +++++++++++++++++++
> > > >  drivers/vfio/vfio_iommu_type1.c | 111
> > > ++++++++++++++++++++++++++++++++++++++++
> > > >  include/linux/vfio.h            |   9 +++-
> > > >  3 files changed, 171 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index
> > > > 2fb29df..a817d2d 100644
> > > > --- a/drivers/vfio/vfio.c
> > > > +++ b/drivers/vfio/vfio.c
> > > > @@ -605,6 +605,58 @@ static int vfio_iommu_group_notifier(struct
> > > notifier_block *nb,
> > > >  	return NOTIFY_OK;
> > > >  }
> > > >
> > > > +int vfio_device_map_msi(struct vfio_device *device, uint64_t
> msi_addr,
> > > > +			uint32_t size, uint64_t *msi_iova) {
> > > > +	struct vfio_container *container = device->group->container;
> > > > +	struct vfio_iommu_driver *driver;
> > > > +	int ret;
> > > > +
> > > > +	/* Validate address and size */
> > > > +	if (!msi_addr || !size || !msi_iova)
> > > > +		return -EINVAL;
> > > > +
> > > > +	down_read(&container->group_lock);
> > > > +
> > > > +	driver = container->iommu_driver;
> > > > +	if (!driver || !driver->ops || !driver->ops->msi_map) {
> > > > +		up_read(&container->group_lock);
> > > > +		return -EINVAL;
> > > > +	}
> > > > +
> > > > +	ret = driver->ops->msi_map(container->iommu_data,
> > > > +				   msi_addr, size, msi_iova);
> > > > +
> > > > +	up_read(&container->group_lock);
> > > > +	return ret;
> > > > +}
> > > > +
> > > > +int vfio_device_unmap_msi(struct vfio_device *device, uint64_t
> > > msi_iova,
> > > > +			  uint64_t size)
> > > > +{
> > > > +	struct vfio_container *container = device->group->container;
> > > > +	struct vfio_iommu_driver *driver;
> > > > +	int ret;
> > > > +
> > > > +	/* Validate address and size */
> > > > +	if (!msi_iova || !size)
> > > > +		return -EINVAL;
> > > > +
> > > > +	down_read(&container->group_lock);
> > > > +
> > > > +	driver = container->iommu_driver;
> > > > +	if (!driver || !driver->ops || !driver->ops->msi_unmap) {
> > > > +		up_read(&container->group_lock);
> > > > +		return -EINVAL;
> > > > +	}
> > > > +
> > > > +	ret = driver->ops->msi_unmap(container->iommu_data,
> > > > +				     msi_iova, size);
> > > > +
> > > > +	up_read(&container->group_lock);
> > > > +	return ret;
> > > > +}
> > > > +
> > > >  /**
> > > >   * VFIO driver API
> > > >   */
> > > > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > > > b/drivers/vfio/vfio_iommu_type1.c index 3315fb6..ab376c2 100644
> > > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > > @@ -1003,12 +1003,34 @@ out_free:
> > > >  	return ret;
> > > >  }
> > > >
> > > > +static void vfio_iommu_unmap_all_reserved_regions(struct
> > > > +vfio_iommu
> > > > +*iommu) {
> > > > +	struct vfio_resvd_region *region;
> > > > +	struct vfio_domain *d;
> > > > +
> > > > +	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
> > > > +		list_for_each_entry(d, &iommu->domain_list, next) {
> > > > +			if (!region->map_paddr)
> > > > +				continue;
> > > > +
> > > > +			if (!iommu_iova_to_phys(d->domain, region->iova))
> > > > +				continue;
> > > > +
> > > > +			iommu_unmap(d->domain, region->iova,
> > > PAGE_SIZE);
> > >
> > > PAGE_SIZE?  Why not region->size?
> >
> > Yes, this should be region->size.
> >
> > >
> > > > +			region->map_paddr = 0;
> > > > +			cond_resched();
> > > > +		}
> > > > +	}
> > > > +}
> > > > +
> > > >  static void vfio_iommu_unmap_unpin_all(struct vfio_iommu *iommu)
> {
> > > >  	struct rb_node *node;
> > > >
> > > >  	while ((node = rb_first(&iommu->dma_list)))
> > > >  		vfio_remove_dma(iommu, rb_entry(node, struct vfio_dma,
> > > node));
> > > > +
> > > > +	vfio_iommu_unmap_all_reserved_regions(iommu);
> > > >  }
> > > >
> > > >  static void vfio_iommu_type1_detach_group(void *iommu_data, @@
> > > > -1048,6 +1070,93 @@ done:
> > > >  	mutex_unlock(&iommu->lock);
> > > >  }
> > > >
> > > > +static int vfio_iommu_type1_msi_map(void *iommu_data, uint64_t
> > > msi_addr,
> > > > +				    uint64_t size, uint64_t *msi_iova) {
> > > > +	struct vfio_iommu *iommu = iommu_data;
> > > > +	struct vfio_resvd_region *region;
> > > > +	int ret;
> > > > +
> > > > +	mutex_lock(&iommu->lock);
> > > > +
> > > > +	/* Do not try ceate iommu-mapping if msi reconfig not allowed */
> > > > +	if (!iommu->allow_msi_reconfig) {
> > > > +		mutex_unlock(&iommu->lock);
> > > > +		return 0;
> > > > +	}
> > > > +
> > > > +	/* Check if there is already region mapping the msi page */
> > > > +	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
> > > > +		if (region->map_paddr == msi_addr) {
> > > > +			*msi_iova = region->iova;
> > > > +			region->refcount++;
> > > > +			mutex_unlock(&iommu->lock);
> > > > +			return 0;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	/* Get a unmapped reserved region */
> > > > +	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
> > > > +		if (!region->map_paddr)
> > > > +			break;
> > > > +	}
> > > > +
> > > > +	if (region == NULL) {
> > > > +		mutex_unlock(&iommu->lock);
> > > > +		return -ENODEV;
> > > > +	}
> > > > +
> > > > +	ret = vfio_iommu_map(iommu, region->iova, msi_addr >>
> > > PAGE_SHIFT,
> > > > +			     size >> PAGE_SHIFT, region->prot);
> > >
> > > So the reserved region has a size and the msi mapping has a size and
> > > we arbitrarily decide to use the msi mapping size here?
> >
> > Reserved region interface is generic and user can set reserved region of
> any size (multiple of page-size). But we do not want to create MSI address
> mapping beyond the MSI-page otherwise this can be security issue. But I
> think I am not tracking how much reserved iova region is mapped, so unmap
> is called for same size.
> >
> >
> > >  The overlap checks we've done for the reserved region are
> > > meaningless then.  No wonder you're unmapping with PAGE_SIZE, we
> have no idea.
> >
> > Do you think we should divide the reserved region in pages and track
> map/unmap per page?
> 
> I'd certainly expect as a user to do one large reserved region mapping and be
> done rather than a large number of smaller mappings.  I don't really
> understand how we're providing isolation with this interface though, we're
> setting up the IOMMU so the guest has a mapping to the MSI, but our
> IOMMU granularity is page size.  Aren't we giving the guest access to
> everything else that might be mapped into that page?  Don't we need to
> push an reservation down to the MSI allocation in order to have isolation?  If
> we did that, couldn't we pretty much guarantee that all MSI vectors would fit
> into a page or two?

Normally we will reserve one MSI page per vfio-group, and all devices in the group will use the same page on PowerPC; I think the same applies to the SMMU as well. For x86 I do not know how many pages are needed per vfio-group. If one or two MSI pages are enough, then one or two reserved-iova regions of PAGE_SIZE each should be sufficient for the MSI purpose.

Thanks
-Bharat

^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region
  2015-10-05 22:45     ` Alex Williamson
@ 2015-10-06  9:39       ` Bhushan Bharat
  2015-10-06 15:21         ` Alex Williamson
  0 siblings, 1 reply; 45+ messages in thread
From: Bhushan Bharat @ 2015-10-06  9:39 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvmarm, kvm, christoffer.dall, eric.auger, pranavkumar,
	marc.zyngier, will.deacon



> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Tuesday, October 06, 2015 4:15 AM
> To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> christoffer.dall@linaro.org; eric.auger@linaro.org; pranavkumar@linaro.org;
> marc.zyngier@arm.com; will.deacon@arm.com
> Subject: Re: [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova
> region
> 
> On Mon, 2015-10-05 at 04:55 +0000, Bhushan Bharat wrote:
> > Hi Alex,
> >
> > > -----Original Message-----
> > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > Sent: Saturday, October 03, 2015 4:16 AM
> > > To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> > > Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > > christoffer.dall@linaro.org; eric.auger@linaro.org;
> > > pranavkumar@linaro.org; marc.zyngier@arm.com; will.deacon@arm.com
> > > Subject: Re: [RFC PATCH 1/6] vfio: Add interface for add/del
> > > reserved iova region
> > >
> > > On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > > > This Patch adds the VFIO APIs to add and remove reserved iova regions.
> > > > The reserved iova region can be used for mapping some specific
> > > > physical address in iommu.
> > > >
> > > > Currently we are planning to use this interface for adding iova
> > > > regions for creating iommu of msi-pages. But the API are designed
> > > > for future extension where some other physical address can be
> mapped.
> > > >
> > > > Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> > > > ---
> > > >  drivers/vfio/vfio_iommu_type1.c | 201
> > > +++++++++++++++++++++++++++++++++++++++-
> > > >  include/uapi/linux/vfio.h       |  43 +++++++++
> > > >  2 files changed, 243 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > > > b/drivers/vfio/vfio_iommu_type1.c index 57d8c37..fa5d3e4 100644
> > > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > > @@ -59,6 +59,7 @@ struct vfio_iommu {
> > > >  	struct rb_root		dma_list;
> > > >  	bool			v2;
> > > >  	bool			nesting;
> > > > +	struct list_head	reserved_iova_list;
> > >
> > > This alignment leads to poor packing in the structure, put it above the
> bools.
> >
> > ok
> >
> > >
> > > >  };
> > > >
> > > >  struct vfio_domain {
> > > > @@ -77,6 +78,15 @@ struct vfio_dma {
> > > >  	int			prot;		/* IOMMU_READ/WRITE */
> > > >  };
> > > >
> > > > +struct vfio_resvd_region {
> > > > +	dma_addr_t	iova;
> > > > +	size_t		size;
> > > > +	int		prot;			/* IOMMU_READ/WRITE */
> > > > +	int		refcount;		/* ref count of mappings */
> > > > +	uint64_t	map_paddr;		/* Mapped Physical Address
> > > */
> > >
> > > phys_addr_t
> >
> > Ok,
> >
> > >
> > > > +	struct list_head next;
> > > > +};
> > > > +
> > > >  struct vfio_group {
> > > >  	struct iommu_group	*iommu_group;
> > > >  	struct list_head	next;
> > > > @@ -106,6 +116,38 @@ static struct vfio_dma *vfio_find_dma(struct
> > > vfio_iommu *iommu,
> > > >  	return NULL;
> > > >  }
> > > >
> > > > +/* This function must be called with iommu->lock held */ static
> > > > +bool vfio_overlap_with_resvd_region(struct vfio_iommu *iommu,
> > > > +					   dma_addr_t start, size_t size) {
> > > > +	struct vfio_resvd_region *region;
> > > > +
> > > > +	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
> > > > +		if (region->iova < start)
> > > > +			return (start - region->iova < region->size);
> > > > +		else if (start < region->iova)
> > > > +			return (region->iova - start < size);
> > >
> > > <= on both of the return lines?
> >
> > I think is should be "<" and not "=<", no ?
> 
> Yep, looks like you're right.  Maybe there's a more straightforward way to do
> this.
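Maybe just the standard interval test, something like this (untested
sketch; assumes the callers already reject ranges that wrap):

static bool vfio_overlap_with_resvd_region(struct vfio_iommu *iommu,
					   dma_addr_t start, size_t size)
{
	struct vfio_resvd_region *region;

	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
		if (start < region->iova + region->size &&
		    region->iova < start + size)
			return true;
	}

	return false;
}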
> 
> > >
> > > > +
> > > > +		return (region->size > 0 && size > 0);
> > > > +	}
> > > > +
> > > > +	return false;
> > > > +}
> > > > +
> > > > +/* This function must be called with iommu->lock held */ static
> > > > +struct vfio_resvd_region *vfio_find_resvd_region(struct
> > > > +vfio_iommu
> > > *iommu,
> > > > +						 dma_addr_t start, size_t
> > > size) {
> > > > +	struct vfio_resvd_region *region;
> > > > +
> > > > +	list_for_each_entry(region, &iommu->reserved_iova_list, next)
> > > > +		if (region->iova == start && region->size == size)
> > > > +			return region;
> > > > +
> > > > +	return NULL;
> > > > +}
> > > > +
> > > >  static void vfio_link_dma(struct vfio_iommu *iommu, struct
> > > > vfio_dma
> > > > *new)  {
> > > >  	struct rb_node **link = &iommu->dma_list.rb_node, *parent =
> > > NULL; @@
> > > > -580,7 +622,8 @@ static int vfio_dma_do_map(struct vfio_iommu
> > > > *iommu,
> > > >
> > > >  	mutex_lock(&iommu->lock);
> > > >
> > > > -	if (vfio_find_dma(iommu, iova, size)) {
> > > > +	if (vfio_find_dma(iommu, iova, size) ||
> > > > +	    vfio_overlap_with_resvd_region(iommu, iova, size)) {
> > > >  		mutex_unlock(&iommu->lock);
> > > >  		return -EEXIST;
> > > >  	}
> > > > @@ -626,6 +669,127 @@ static int vfio_dma_do_map(struct
> vfio_iommu
> > > *iommu,
> > > >  	return ret;
> > > >  }
> > > >
> > > > +/* This function must be called with iommu->lock held */ static
> > > > +int vfio_iommu_resvd_region_del(struct vfio_iommu *iommu,
> > > > +				dma_addr_t iova, size_t size, int prot) {
> > > > +	struct vfio_resvd_region *res_region;
> > >
> > > Have some consistency in naming, just use "region".
> >
> > Ok,
> >
> > > > +
> > > > +	res_region = vfio_find_resvd_region(iommu, iova, size);
> > > > +	/* Region should not be mapped in iommu */
> > > > +	if (res_region == NULL || res_region->map_paddr)
> > > > +		return -EINVAL;
> > >
> > > Are these two separate errors?  !region is -EINVAL, but being mapped
> > > is - EBUSY.
> >
> > Yes, will separate them.
> >
> > >
> > > > +
> > > > +	list_del(&res_region->next);
> > > > +	kfree(res_region);
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +/* This function must be called with iommu->lock held */ static
> > > > +int vfio_iommu_resvd_region_add(struct vfio_iommu *iommu,
> > > > +				       dma_addr_t iova, size_t size, int prot) {
> > > > +	struct vfio_resvd_region *res_region;
> > > > +
> > > > +	/* Check overlap with with dma maping and reserved regions */
> > > > +	if (vfio_find_dma(iommu, iova, size) ||
> > > > +	    vfio_find_resvd_region(iommu, iova, size))
> > > > +		return -EEXIST;
> > > > +
> > > > +	res_region = kzalloc(sizeof(*res_region), GFP_KERNEL);
> > > > +	if (res_region == NULL)
> > > > +		return -ENOMEM;
> > > > +
> > > > +	res_region->iova = iova;
> > > > +	res_region->size = size;
> > > > +	res_region->prot = prot;
> > > > +	res_region->refcount = 0;
> > > > +	res_region->map_paddr = 0;
> > >
> > > They're already 0 by the kzalloc
> >
> > Yes ;)
> > >
> > > > +
> > > > +	list_add(&res_region->next, &iommu->reserved_iova_list);
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +static
> > > > +int vfio_handle_reserved_region_add(struct vfio_iommu *iommu,
> > > > +				struct vfio_iommu_reserved_region_add
> > > *region) {
> > > > +	dma_addr_t iova = region->iova;
> > > > +	size_t size = region->size;
> > > > +	int flags = region->flags;
> > > > +	uint64_t mask;
> > > > +	int prot = 0;
> > > > +	int ret;
> > > > +
> > > > +	/* Verify that none of our __u64 fields overflow */
> > > > +	if (region->size != size || region->iova != iova)
> > > > +		return -EINVAL;
> > > > +
> > > > +	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
> > > > +
> > > > +	WARN_ON(mask & PAGE_MASK);
> > > > +
> > > > +	if (flags & VFIO_IOMMU_RES_REGION_READ)
> > > > +		prot |= IOMMU_READ;
> > > > +	if (flags & VFIO_IOMMU_RES_REGION_WRITE)
> > > > +		prot |= IOMMU_WRITE;
> > > > +
> > > > +	if (!prot || !size || (size | iova) & mask)
> > > > +		return -EINVAL;
> > > > +
> > > > +	/* Don't allow IOVA wrap */
> > > > +	if (iova + size - 1 < iova)
> > > > +		return -EINVAL;
> > > > +
> > > > +	mutex_lock(&iommu->lock);
> > > > +
> > > > +	if (region->flags & VFIO_IOMMU_RES_REGION_ADD) {
> > > > +		ret = vfio_iommu_resvd_region_add(iommu, iova, size,
> > > prot);
> > > > +		if (ret) {
> > > > +			mutex_unlock(&iommu->lock);
> > > > +			return ret;
> > > > +		}
> > > > +	}
> > >
> > > Silently fail if not VFIO_IOMMU_RES_REGION_ADD?
> >
> > As per below comment we do not need this flag. So the above flag
> checking will be removed.
> >
> > >
> > > > +
> > > > +	mutex_unlock(&iommu->lock);
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +static
> > > > +int vfio_handle_reserved_region_del(struct vfio_iommu *iommu,
> > > > +				struct vfio_iommu_reserved_region_del
> > > *region) {
> > > > +	dma_addr_t iova = region->iova;
> > > > +	size_t size = region->size;
> > > > +	int flags = region->flags;
> > > > +	int ret;
> > > > +
> > > > +	if (!(flags & VFIO_IOMMU_RES_REGION_DEL))
> > > > +		return -EINVAL;
> > > > +
> > > > +	mutex_lock(&iommu->lock);
> > > > +
> > > > +	/* Check for the region */
> > > > +	if (vfio_find_resvd_region(iommu, iova, size) == NULL) {
> > > > +		mutex_unlock(&iommu->lock);
> > > > +		return -EINVAL;
> > > > +	}
> > > > +
> > > > +	/* remove the reserved region */
> > > > +	if (region->flags & VFIO_IOMMU_RES_REGION_DEL) {
> > > > +		ret = vfio_iommu_resvd_region_del(iommu, iova, size,
> > > flags);
> > > > +		if (ret) {
> > > > +			mutex_unlock(&iommu->lock);
> > > > +			return ret;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	mutex_unlock(&iommu->lock);
> > > > +	return 0;
> > > > +}
> > > > +
> > > >  static int vfio_bus_type(struct device *dev, void *data)  {
> > > >  	struct bus_type **bus = data;
> > > > @@ -905,6 +1069,7 @@ static void
> *vfio_iommu_type1_open(unsigned
> > > long arg)
> > > >  	}
> > > >
> > > >  	INIT_LIST_HEAD(&iommu->domain_list);
> > > > +	INIT_LIST_HEAD(&iommu->reserved_iova_list);
> > > >  	iommu->dma_list = RB_ROOT;
> > > >  	mutex_init(&iommu->lock);
> > > >
> > > > @@ -1020,6 +1185,40 @@ static long vfio_iommu_type1_ioctl(void
> > > *iommu_data,
> > > >  			return ret;
> > > >
> > > >  		return copy_to_user((void __user *)arg, &unmap, minsz);
> > > > +	} else if (cmd == VFIO_IOMMU_RESERVED_REGION_ADD) {
> > > > +		struct vfio_iommu_reserved_region_add region;
> > > > +		long ret;
> > > > +
> > > > +		minsz = offsetofend(struct
> > > vfio_iommu_reserved_region_add,
> > > > +				    size);
> > > > +		if (copy_from_user(&region, (void __user *)arg, minsz))
> > > > +			return -EFAULT;
> > > > +
> > > > +		if (region.argsz < minsz)
> > > > +			return -EINVAL;
> > > > +
> > > > +		ret = vfio_handle_reserved_region_add(iommu, &region);
> > > > +		if (ret)
> > > > +			return ret;
> > > > +
> > > > +		return copy_to_user((void __user *)arg, &region, minsz);
> > > > +	} else if (cmd == VFIO_IOMMU_RESERVED_REGION_DEL) {
> > > > +		struct vfio_iommu_reserved_region_del region;
> > > > +		long ret;
> > > > +
> > > > +		minsz = offsetofend(struct
> > > vfio_iommu_reserved_region_del,
> > > > +				    size);
> > > > +		if (copy_from_user(&region, (void __user *)arg, minsz))
> > > > +			return -EFAULT;
> > > > +
> > > > +		if (region.argsz < minsz)
> > > > +			return -EINVAL;
> > > > +
> > > > +		ret = vfio_handle_reserved_region_del(iommu, &region);
> > > > +		if (ret)
> > > > +			return ret;
> > > > +
> > > > +		return copy_to_user((void __user *)arg, &region, minsz);
> > >
> > > So we've just created an interface that is available for all
> > > vfio-type1 users, whether it makes any sense for the platform or
> > > not,
> >
> > How we should decide that a given platform needs this or not?
> 
> You later add new iommu interfaces, presumably if the iommu doesn't
> implement those interfaces then there's no point in us exposing these
> interfaces to vfio.

You mean that if an iommu says "MSIs do not require explicit iommu mapping", then a user-space call to these reserved-iova ioctls should return an error without adding the regions, right?
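For example, something like this at ioctl time (rough sketch, reusing
the DOMAIN_ATTR_MSI_MAPPING attribute from this series):

/* Allow the reserved-region ioctls only if at least one domain in the
 * container reports that MSIs are not mapped automatically.
 */
static bool vfio_iommu_needs_msi_mapping(struct vfio_iommu *iommu)
{
	struct iommu_domain_msi_maps msi_maps = { .automap = true };
	struct vfio_domain *d;

	list_for_each_entry(d, &iommu->domain_list, next) {
		if (!iommu_domain_get_attr(d->domain,
					   DOMAIN_ATTR_MSI_MAPPING,
					   &msi_maps) &&
		    !msi_maps.automap)
			return true;
	}

	return false;
}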

> 
> > > and it allows the user to
> > > consume arbitrary amounts of kernel memory, by making an infinitely
> > > long list of reserved iova entries, brilliant!
> >
> > I was not sure of how to limit the user. What I was thinking of having a
> default number of pages a user can reserve (512 pages). Also we can give a
> sysfs interface so that user can change the default number of pages. Does
> this sound good? If not please suggest?
> 
> Isn't 512 entries a lot for a linked list?
>  Can we use our existing rb tree to manage these entries rather than a secondary list?

I do not think so; reusing the rb tree here will complicate the code.

>  How many entries do we realistically need?

I think this should be a small number of entries, as discussed on the other patch. By small I mean 1 or 2 at most on PowerPC and the SMMU (I think).

>  Can the iommu callbacks help give us a limit?

I am not sure right now how deterministically we can get the number of regions/msi-pages needed for a given platform and for the list of devices in a group.

>  Can we somehow use information about the devices in the group to produce a limit,
> ie. MSI vectors possible from the group?

It seems like we need to know the number of vectors needed per device in the group, and how many vectors the reserved msi-page can support; if we are sharing an msi-page it becomes a little more complicated. But overall it seems like we only need 2-3 regions (a very small number), so I suggest we go with a small limit now and raise it later if required.
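If the per-group vector count route looks interesting, a rough sketch
(PCI devices only; the helper name is made up) could be:

static int vfio_add_dev_msi_vectors(struct device *dev, void *data)
{
	int *vectors = data;

	if (dev_is_pci(dev)) {
		struct pci_dev *pdev = to_pci_dev(dev);
		int n = pci_msix_vec_count(pdev);

		if (n <= 0)
			n = pci_msi_vec_count(pdev);
		if (n > 0)
			*vectors += n;
	}

	return 0;
}

/* usage: iommu_group_for_each_dev(group, &vectors, vfio_add_dev_msi_vectors) */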

> 
> > >
> > > >  	}
> > > >
> > > >  	return -ENOTTY;
> > > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > > index b57b750..1abd1a9 100644
> > > > --- a/include/uapi/linux/vfio.h
> > > > +++ b/include/uapi/linux/vfio.h
> > > > @@ -440,6 +440,49 @@ struct vfio_iommu_type1_dma_unmap {
> > > >  #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
> > > >  #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
> > > >
> > > > +/**************** Reserved IOVA region specific APIs
> > > > +**********************/
> > > > +
> > > > +/*
> > > > + * VFIO_IOMMU_RESERVED_REGION_ADD - _IO(VFIO_TYPE,
> VFIO_BASE
> > > + 17,
> > > > + *					struct
> > > vfio_iommu_reserved_region_add)
> > > > + * This is used to add a reserved iova region.
> > > > + * @flags - Input: VFIO_IOMMU_RES_REGION_ADD flag is for adding
> > > > + * a reserved region.
> > >
> > > Why else would we call VFIO_IOMMU_RESERVED_REGION_ADD except
> to add
> > > a region, this flag is redundant.
> >
> > Ok, will remove this.
> >
> > >
> > > > + * Also pass READ/WRITE/IOMMU flags to be used in iommu mapping.
> > > > + * @iova - Input: IOVA base address of reserved region
> > > > + * @size - Input: Size of the reserved region
> > > > + * Return: 0 on success, -errno on failure  */ struct
> > > > +vfio_iommu_reserved_region_add {
> > > > +	__u32   argsz;
> > > > +	__u32   flags;
> > > > +#define VFIO_IOMMU_RES_REGION_ADD	(1 << 0) /* Add a
> reserved
> > > region */
> > > > +#define VFIO_IOMMU_RES_REGION_READ	(1 << 1) /* readable
> region */
> > > > +#define VFIO_IOMMU_RES_REGION_WRITE	(1 << 2) /* writable
> > > region */
> > > > +	__u64	iova;
> > > > +	__u64   size;
> > > > +};
> > > > +#define VFIO_IOMMU_RESERVED_REGION_ADD _IO(VFIO_TYPE,
> > > VFIO_BASE + 17)
> > > > +
> > > > +/*
> > > > + * VFIO_IOMMU_RESERVED_REGION_DEL - _IO(VFIO_TYPE,
> VFIO_BASE +
> > > 18,
> > > > + *					struct
> > > vfio_iommu_reserved_region_del)
> > > > + * This is used to delete an existing reserved iova region.
> > > > + * @flags - VFIO_IOMMU_RES_REGION_DEL flag is for deleting a
> > > > +region use,
> > > > + *  only a unmapped region can be deleted.
> > > > + * @iova - Input: IOVA base address of reserved region
> > > > + * @size - Input: Size of the reserved region
> > > > + * Return: 0 on success, -errno on failure  */ struct
> > > > +vfio_iommu_reserved_region_del {
> > > > +	__u32   argsz;
> > > > +	__u32   flags;
> > > > +#define VFIO_IOMMU_RES_REGION_DEL	(1 << 0) /* unset the
> > > reserved region */
> > > > +	__u64	iova;
> > > > +	__u64   size;
> > > > +};
> > > > +#define VFIO_IOMMU_RESERVED_REGION_DEL _IO(VFIO_TYPE,
> > > VFIO_BASE + 18)
> > > > +
> > >
> > > These are effectively both
> > >
> > > struct vfio_iommu_type1_dma_unmap
> >
> > Yes, do you want to suggest that we should use " struct
> vfio_iommu_type1_dma_unmap". I found that confusing.
> > What is we use "struct vfio_iommu_reserved_region" and use flag
> VFIO_IOMMU_RES_REGION_DEL/ VFIO_IOMMU_RES_REGION_ADD.
> 
> What if we just use the existing map and unmap interface with a flag to
> indicate an MSI reserved mapping?  I don't really see why we need new ioctls
> for this.

The existing map/unmap APIs take a virtual address: user-space does an mmap() and provides that virtual address. I think in a previous discussion you commented that we should keep them separate, since overloading complicates map/unmap and can be a source of confusion, and keeping them separate also leaves room to extend the reserved-iova APIs for something else in the future (I do not know for what right now). I personally think it makes sense to keep them separate; let me know if you think otherwise.

Thanks
-Bharat

^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: [RFC PATCH 6/6] arm-smmu: Allow to set iommu mapping for MSI
  2015-10-05 22:54       ` Alex Williamson
@ 2015-10-06 10:26         ` Bhushan Bharat
  2015-10-26 15:40           ` Christoffer Dall
  2015-11-02  2:53           ` Pranavkumar Sawargaonkar
  0 siblings, 2 replies; 45+ messages in thread
From: Bhushan Bharat @ 2015-10-06 10:26 UTC (permalink / raw)
  To: Alex Williamson, christoffer.dall, marc.zyngier
  Cc: kvmarm, kvm, eric.auger, pranavkumar, will.deacon



> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Tuesday, October 06, 2015 4:25 AM
> To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> christoffer.dall@linaro.org; eric.auger@linaro.org; pranavkumar@linaro.org;
> marc.zyngier@arm.com; will.deacon@arm.com
> Subject: Re: [RFC PATCH 6/6] arm-smmu: Allow to set iommu mapping for
> MSI
> 
> On Mon, 2015-10-05 at 08:33 +0000, Bhushan Bharat wrote:
> >
> >
> > > -----Original Message-----
> > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > Sent: Saturday, October 03, 2015 4:17 AM
> > > To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> > > Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > > christoffer.dall@linaro.org; eric.auger@linaro.org;
> > > pranavkumar@linaro.org; marc.zyngier@arm.com; will.deacon@arm.com
> > > Subject: Re: [RFC PATCH 6/6] arm-smmu: Allow to set iommu mapping
> > > for MSI
> > >
> > > On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > > > Finally ARM SMMU declare that iommu-mapping for MSI-pages are not
> > > > set automatically and it should be set explicitly.
> > > >
> > > > Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> > > > ---
> > > >  drivers/iommu/arm-smmu.c | 7 ++++++-
> > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > > index
> > > > a3956fb..9d37e72 100644
> > > > --- a/drivers/iommu/arm-smmu.c
> > > > +++ b/drivers/iommu/arm-smmu.c
> > > > @@ -1401,13 +1401,18 @@ static int
> arm_smmu_domain_get_attr(struct
> > > iommu_domain *domain,
> > > >  				    enum iommu_attr attr, void *data)  {
> > > >  	struct arm_smmu_domain *smmu_domain =
> > > to_smmu_domain(domain);
> > > > +	struct iommu_domain_msi_maps *msi_maps;
> > > >
> > > >  	switch (attr) {
> > > >  	case DOMAIN_ATTR_NESTING:
> > > >  		*(int *)data = (smmu_domain->stage ==
> > > ARM_SMMU_DOMAIN_NESTED);
> > > >  		return 0;
> > > >  	case DOMAIN_ATTR_MSI_MAPPING:
> > > > -		/* Dummy handling added */
> > > > +		msi_maps = data;
> > > > +
> > > > +		msi_maps->automap = false;
> > > > +		msi_maps->override_automap = true;
> > > > +
> > > >  		return 0;
> > > >  	default:
> > > >  		return -ENODEV;
> > >
> > > In previous discussions I understood one of the problems you were
> > > trying to solve was having a limited number of MSI banks and while
> > > you may be able to get isolated MSI banks for some number of users,
> > > it wasn't unlimited and sharing may be required.  I don't see any of that
> addressed in this series.
> >
> > That problem was on PowerPC. Infact there were two problems, one which
> MSI bank to be used and second how to create iommu-mapping for device
> assigned to userspace.
> > First problem was PowerPC specific and that will be solved separately.
> > For second problem, earlier I tried to added a couple of MSI specific ioctls
> and you suggested (IIUC) that we should have a generic reserved-iova type
> of API and then we can map MSI bank using reserved-iova and this will not
> require involvement of user-space.
> >
> > >
> > > Also, the management of reserved IOVAs vs MSI addresses looks really
> > > dubious to me.  How does your platform pick an MSI address and what
> > > are we breaking by covertly changing it?  We seem to be masking over
> > > at the VFIO level, where there should be lower level interfaces
> > > doing the right thing when we configure MSI on the device.
> >
> > Yes, In my understanding the right solution should be:
> >  1) VFIO driver should know what physical-msi-address will be used for
> devices in an iommu-group.
> >     I did not find an generic API, on PowerPC I added some function in
> ffrescale msi-driver and called from vfio-iommu-fsl-pamu.c (not yet
> upstreamed).
> >  2) VFIO driver should know what IOVA to be used for creating
> > iommu-mapping (VFIO APIs patch of this patch series)
> >  3) VFIO driver will create the iommu-mapping using (1) and (2)
> >  4) VFIO driver should be able to tell the msi-driver that for a given device it
> should use different IOVA. So when composing the msi message (for the
> devices is the given iommu-group) it should use that programmed iova as
> MSI-address. This interface also needed to be developed.
> >
> > I was not sure of which approach we should take. The current approach in
> the patch is simple to develop so I went ahead to take input but I agree this
> does not look very good.
> > What do you think, should drop this approach and work out the approach
> as described above.
> 
> I'm certainly not interested in applying an maintaining an interim solution that
> isn't the right one.  It seems like VFIO is too involved in this process in your
> example.  On x86 we have per vector isolation and the only thing we're
> missing is reporting back of the region used by MSI vectors as reserved IOVA
> space (but it's standard on x86, so an x86 VM user will never use it for IOVA).

I remember you mentioned that there is no problem when running an x86 guest on an x86 host.  But it will be interesting when running a non-x86 VM on an x86 host, or for non-VM userspace use of VFIO.

> In your model, the MSI IOVA space is programmable,

Yes, in the PowerPC and ARM SMMU cases we also have to create the mapping at an IOVA. The first question is which IOVA to use, and we added the reserved-iova ioctls for that.

The second problem is that we need the physical address of an msi-page for setting up the iommu mapping, so we need to reserve an msi-page. I did this for PowerPC, but not as a generic extension of the msi driver; I will look at the code in more detail to add an interface that reserves an msi-page, or returns a shared msi-page when allow_unsafe_interrupts is set.

The third problem is reporting the reserved IOVA to be used for MSI vectors for the given set of devices (the devices in a vfio-group).
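Roughly, I imagine the per-group flow would be something like the
following (every helper named below is hypothetical, nothing like it
exists today):

/*
 * 1. Ask the MSI layer to set aside a page of vector space for the group.
 * 2. Map that page at the user-provided reserved IOVA.
 * 3. Constrain vector allocation for the group's devices to that page,
 *    so no unrelated device shares the IOMMU page.
 */
static int vfio_setup_msi_page(struct vfio_group *group,
			       struct vfio_domain *domain,
			       dma_addr_t resvd_iova)
{
	phys_addr_t msi_page;
	int ret;

	msi_page = msi_reserve_page_for_group(group);		/* hypothetical */
	if (!msi_page)
		return -ENOSPC;

	ret = iommu_map(domain->domain, resvd_iova, msi_page,
			PAGE_SIZE, IOMMU_WRITE);
	if (ret) {
		msi_release_page_for_group(group, msi_page);	/* hypothetical */
		return ret;
	}

	/* hypothetical: future vectors for this group come from msi_page,
	 * and their message address is composed from resvd_iova.
	 */
	return msi_restrict_group_to_page(group, msi_page, resvd_iova);
}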

Marc/Christoffer,
I am not an expert in this area, so I might need some time to understand that code. If you think you can provide a solution to the 2nd and 3rd problems quickly, please let me know.

Thanks
-Bharat

> but it has page
> granularity (I assume).  Therefore we shouldn't be sharing that page with
> anyone.  That seems to suggest we need to allocate a page of vector space
> from the host kernel, setup the IOVA mapping, and then the host kernel
> should know to only allocate MSI vectors for these devices from that pre-
> allocated page.  Otherwise we need to call the interrupts unsafe, like we do
> on x86 without interrupt remapping.  Thanks,
> 
> Alex


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi interrupt
  2015-10-06  8:32         ` Bhushan Bharat
@ 2015-10-06 15:06           ` Alex Williamson
  0 siblings, 0 replies; 45+ messages in thread
From: Alex Williamson @ 2015-10-06 15:06 UTC (permalink / raw)
  To: Bhushan Bharat
  Cc: kvmarm, kvm, christoffer.dall, eric.auger, pranavkumar,
	marc.zyngier, will.deacon

On Tue, 2015-10-06 at 08:32 +0000, Bhushan Bharat wrote:
> 
> 
> > -----Original Message-----
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Tuesday, October 06, 2015 4:15 AM
> > To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> > Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > christoffer.dall@linaro.org; eric.auger@linaro.org; pranavkumar@linaro.org;
> > marc.zyngier@arm.com; will.deacon@arm.com
> > Subject: Re: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi
> > interrupt
> > 
> > On Mon, 2015-10-05 at 07:20 +0000, Bhushan Bharat wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > > Sent: Saturday, October 03, 2015 4:17 AM
> > > > To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> > > > Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > > > christoffer.dall@linaro.org; eric.auger@linaro.org;
> > > > pranavkumar@linaro.org; marc.zyngier@arm.com; will.deacon@arm.com
> > > > Subject: Re: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi
> > > > interrupt
> > > >
> > > > On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > > > > An MSI-address is allocated and programmed in pcie device during
> > > > > interrupt configuration. Now for a pass-through device, try to
> > > > > create the iommu mapping for this allocted/programmed msi-address.
> > > > > If the iommu mapping is created and the msi address programmed in
> > > > > the pcie device is different from msi-iova as per iommu
> > > > > programming then reconfigure the pci device to use msi-iova as msi
> > address.
> > > > >
> > > > > Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> > > > > ---
> > > > >  drivers/vfio/pci/vfio_pci_intrs.c | 36
> > > > > ++++++++++++++++++++++++++++++++++--
> > > > >  1 file changed, 34 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c
> > > > > b/drivers/vfio/pci/vfio_pci_intrs.c
> > > > > index 1f577b4..c9690af 100644
> > > > > --- a/drivers/vfio/pci/vfio_pci_intrs.c
> > > > > +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> > > > > @@ -312,13 +312,23 @@ static int vfio_msi_set_vector_signal(struct
> > > > vfio_pci_device *vdev,
> > > > >  	int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
> > > > >  	char *name = msix ? "vfio-msix" : "vfio-msi";
> > > > >  	struct eventfd_ctx *trigger;
> > > > > +	struct msi_msg msg;
> > > > > +	struct vfio_device *device;
> > > > > +	uint64_t msi_addr, msi_iova;
> > > > >  	int ret;
> > > > >
> > > > >  	if (vector >= vdev->num_ctx)
> > > > >  		return -EINVAL;
> > > > >
> > > > > +	device = vfio_device_get_from_dev(&pdev->dev);
> > > >
> > > > Have you looked at this function?  I don't think we want to be doing
> > > > that every time we want to poke the interrupt configuration.
> > >
> > > I am trying to describe what I understood, a device can have many
> > interrupts and we should setup iommu only once, when called for the first
> > time to enable/setup interrupt.
> > > Similarly when disabling the interrupt we should iommu-unmap when
> > > called for the last enabled interrupt for that device. Now with this
> > > understanding, should I move this map-unmap to separate functions and
> > > call them from vfio_msi_set_block() rather than in
> > > vfio_msi_set_vector_signal()
> > 
> > Interrupts can be setup and torn down at any time and I don't see how one
> > function or the other makes much difference.
> > vfio_device_get_from_dev() is enough overhead that the data we need
> > should be cached if we're going to call it with some regularity.  Maybe
> > vfio_iommu_driver_ops.open() should be called with a pointer to the
> > vfio_device... or the vfio_group.
> 
> vfio_iommu_driver_ops.open() ? or do you mean vfio_pci_open() should be called with vfio_device or vfio_group, and we will cache that in vfio_pci_device ?

vfio_pci_open() is an implementation of vfio_iommu_driver_ops.open().
The internal API between vfio and vfio bus drivers would need to have a
parameter added.
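As a minimal sketch of that parameter addition, assuming the bus-driver
open callback is the one extended (the extra argument and the cached field
are hypothetical; the surrounding names are the existing ones):

/* include/linux/vfio.h -- hypothetical change, for illustration only */
struct vfio_device_ops {
	char	*name;
	int	(*open)(void *device_data, struct vfio_device *device);
	/* ...remaining callbacks unchanged... */
};

/* drivers/vfio/pci/vfio_pci.c -- cache the vfio_device once at open time */
static int vfio_pci_open(void *device_data, struct vfio_device *device)
{
	struct vfio_pci_device *vdev = device_data;

	vdev->vfio_device = device;	/* hypothetical new field */

	/* ...existing enable/refcount logic... */
	return 0;
}

With the vfio_device cached, vfio_msi_set_vector_signal() could call the
map/unmap helpers without doing a vfio_device_get_from_dev() lookup on
every vector.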


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
  2015-10-06  8:53         ` Bhushan Bharat
@ 2015-10-06 15:11           ` Alex Williamson
  0 siblings, 0 replies; 45+ messages in thread
From: Alex Williamson @ 2015-10-06 15:11 UTC (permalink / raw)
  To: Bhushan Bharat
  Cc: kvmarm, kvm, christoffer.dall, eric.auger, pranavkumar,
	marc.zyngier, will.deacon

On Tue, 2015-10-06 at 08:53 +0000, Bhushan Bharat wrote:
> 
> 
> > -----Original Message-----
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Tuesday, October 06, 2015 4:15 AM
> > To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> > Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > christoffer.dall@linaro.org; eric.auger@linaro.org; pranavkumar@linaro.org;
> > marc.zyngier@arm.com; will.deacon@arm.com
> > Subject: Re: [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs
> > automap state
> > 
> > On Mon, 2015-10-05 at 06:00 +0000, Bhushan Bharat wrote:
> > > > -1138,6 +1156,8 @@
> > > > > static long vfio_iommu_type1_ioctl(void *iommu_data,
> > > > >  		}
> > > > >  	} else if (cmd == VFIO_IOMMU_GET_INFO) {
> > > > >  		struct vfio_iommu_type1_info info;
> > > > > +		struct iommu_domain_msi_maps msi_maps;
> > > > > +		int ret;
> > > > >
> > > > >  		minsz = offsetofend(struct vfio_iommu_type1_info,
> > > > iova_pgsizes);
> > > > >
> > > > > @@ -1149,6 +1169,18 @@ static long vfio_iommu_type1_ioctl(void
> > > > > *iommu_data,
> > > > >
> > > > >  		info.flags = 0;
> > > > >
> > > > > +		ret = vfio_domains_get_msi_maps(iommu, &msi_maps);
> > > > > +		if (ret)
> > > > > +			return ret;
> > > >
> > > > And now ioctl(VFIO_IOMMU_GET_INFO) no longer works for any
> > IOMMU
> > > > implementing domain_get_attr but not supporting
> > > > DOMAIN_ATTR_MSI_MAPPING.
> > >
> > > With this current patch version this will get the default assumed behavior
> > as you commented on previous patch.
> > 
> > How so?
> 
> You are right, the ioctl will return failure. But that should be ok, right?

Not remotely.  ioctl(VFIO_IOMMU_GET_INFO) can't suddenly stop working on
some platforms.

> > 
> > +               msi_maps->automap = true;
> > +               msi_maps->override_automap = false;
> > +
> > +               if (domain->ops->domain_get_attr)
> > +                       ret = domain->ops->domain_get_attr(domain, attr,
> > + data);
> > 
> > If domain_get_attr is implemented, but DOMAIN_ATTR_MSI_MAPPING is
> > not, ret should be an error code.
> 
> Currently it returns same error code returned by domain->ops->domain_get_attr(). 
> I do not think we want to complicate that we return an error to user-space that msi's probably cannot be used but user-space can continue with Legacy interrupt, or you want that?

I can't really parse your statement, but ioctl(VFIO_IOMMU_GET_INFO)
works today and it must work with your changes.  Your change should only
affect whether some flags are visible; MSI has worked just fine up to
this point on other platforms.
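A sketch of one way to keep that guarantee: initialise the msi_maps fields
to today's behaviour and never let a failed attribute query fail the ioctl
itself (struct and field names are from the RFC patches; the info flag
shown is hypothetical):

		/* In the VFIO_IOMMU_GET_INFO path -- sketch only */
		info.flags = 0;

		/* Defaults match existing behaviour: MSI mapping is automatic. */
		msi_maps.automap = true;
		msi_maps.override_automap = false;

		/* An IOMMU without DOMAIN_ATTR_MSI_MAPPING must not break GET_INFO. */
		if (!vfio_domains_get_msi_maps(iommu, &msi_maps) &&
		    msi_maps.override_automap && !msi_maps.automap)
			info.flags |= VFIO_IOMMU_INFO_MSI_REQUIRE_MAP; /* hypothetical flag */

		return copy_to_user((void __user *)arg, &info, minsz);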




^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 4/6] vfio: Add interface to iommu-map/unmap MSI pages
  2015-10-06  9:05         ` Bhushan Bharat
@ 2015-10-06 15:12           ` Alex Williamson
  0 siblings, 0 replies; 45+ messages in thread
From: Alex Williamson @ 2015-10-06 15:12 UTC (permalink / raw)
  To: Bhushan Bharat
  Cc: kvmarm, kvm, christoffer.dall, eric.auger, pranavkumar,
	marc.zyngier, will.deacon

On Tue, 2015-10-06 at 09:05 +0000, Bhushan Bharat wrote:
> 
> 
> > -----Original Message-----
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Tuesday, October 06, 2015 4:15 AM
> > To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> > Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > christoffer.dall@linaro.org; eric.auger@linaro.org; pranavkumar@linaro.org;
> > marc.zyngier@arm.com; will.deacon@arm.com
> > Subject: Re: [RFC PATCH 4/6] vfio: Add interface to iommu-map/unmap MSI
> > pages
> > 
> > On Mon, 2015-10-05 at 06:27 +0000, Bhushan Bharat wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > > Sent: Saturday, October 03, 2015 4:16 AM
> > > > To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> > > > Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > > > christoffer.dall@linaro.org; eric.auger@linaro.org;
> > > > pranavkumar@linaro.org; marc.zyngier@arm.com; will.deacon@arm.com
> > > > Subject: Re: [RFC PATCH 4/6] vfio: Add interface to iommu-map/unmap
> > > > MSI pages
> > > >
> > > > On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > > > > For MSI interrupts to work for a pass-through devices we need to
> > > > > have mapping of msi-pages in iommu. Now on some platforms (like
> > > > > x86) does this msi-pages mapping happens magically and in these
> > > > > case they chooses an iova which they somehow know that it will
> > > > > never overlap with guest memory. But this magic iova selection may
> > > > > not be always true for all platform (like PowerPC and ARM64).
> > > > >
> > > > > Also on x86 platform, there is no problem as long as running a
> > > > > x86-guest on x86-host but there can be issues when running a
> > > > > non-x86 guest on
> > > > > x86 host or other userspace applications like (I think ODP/DPDK).
> > > > > As in these cases there can be chances that it overlaps with guest
> > > > > memory mapping.
> > > >
> > > > Wow, it's amazing anything works... smoke and mirrors.
> > > >
> > > > > This patch add interface to iommu-map and iommu-unmap msi-pages
> > at
> > > > > reserved iova chosen by userspace.
> > > > >
> > > > > Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> > > > > ---
> > > > >  drivers/vfio/vfio.c             |  52 +++++++++++++++++++
> > > > >  drivers/vfio/vfio_iommu_type1.c | 111
> > > > ++++++++++++++++++++++++++++++++++++++++
> > > > >  include/linux/vfio.h            |   9 +++-
> > > > >  3 files changed, 171 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index
> > > > > 2fb29df..a817d2d 100644
> > > > > --- a/drivers/vfio/vfio.c
> > > > > +++ b/drivers/vfio/vfio.c
> > > > > @@ -605,6 +605,58 @@ static int vfio_iommu_group_notifier(struct
> > > > notifier_block *nb,
> > > > >  	return NOTIFY_OK;
> > > > >  }
> > > > >
> > > > > +int vfio_device_map_msi(struct vfio_device *device, uint64_t
> > msi_addr,
> > > > > +			uint32_t size, uint64_t *msi_iova) {
> > > > > +	struct vfio_container *container = device->group->container;
> > > > > +	struct vfio_iommu_driver *driver;
> > > > > +	int ret;
> > > > > +
> > > > > +	/* Validate address and size */
> > > > > +	if (!msi_addr || !size || !msi_iova)
> > > > > +		return -EINVAL;
> > > > > +
> > > > > +	down_read(&container->group_lock);
> > > > > +
> > > > > +	driver = container->iommu_driver;
> > > > > +	if (!driver || !driver->ops || !driver->ops->msi_map) {
> > > > > +		up_read(&container->group_lock);
> > > > > +		return -EINVAL;
> > > > > +	}
> > > > > +
> > > > > +	ret = driver->ops->msi_map(container->iommu_data,
> > > > > +				   msi_addr, size, msi_iova);
> > > > > +
> > > > > +	up_read(&container->group_lock);
> > > > > +	return ret;
> > > > > +}
> > > > > +
> > > > > +int vfio_device_unmap_msi(struct vfio_device *device, uint64_t
> > > > msi_iova,
> > > > > +			  uint64_t size)
> > > > > +{
> > > > > +	struct vfio_container *container = device->group->container;
> > > > > +	struct vfio_iommu_driver *driver;
> > > > > +	int ret;
> > > > > +
> > > > > +	/* Validate address and size */
> > > > > +	if (!msi_iova || !size)
> > > > > +		return -EINVAL;
> > > > > +
> > > > > +	down_read(&container->group_lock);
> > > > > +
> > > > > +	driver = container->iommu_driver;
> > > > > +	if (!driver || !driver->ops || !driver->ops->msi_unmap) {
> > > > > +		up_read(&container->group_lock);
> > > > > +		return -EINVAL;
> > > > > +	}
> > > > > +
> > > > > +	ret = driver->ops->msi_unmap(container->iommu_data,
> > > > > +				     msi_iova, size);
> > > > > +
> > > > > +	up_read(&container->group_lock);
> > > > > +	return ret;
> > > > > +}
> > > > > +
> > > > >  /**
> > > > >   * VFIO driver API
> > > > >   */
> > > > > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > > > > b/drivers/vfio/vfio_iommu_type1.c index 3315fb6..ab376c2 100644
> > > > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > > > @@ -1003,12 +1003,34 @@ out_free:
> > > > >  	return ret;
> > > > >  }
> > > > >
> > > > > +static void vfio_iommu_unmap_all_reserved_regions(struct
> > > > > +vfio_iommu
> > > > > +*iommu) {
> > > > > +	struct vfio_resvd_region *region;
> > > > > +	struct vfio_domain *d;
> > > > > +
> > > > > +	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
> > > > > +		list_for_each_entry(d, &iommu->domain_list, next) {
> > > > > +			if (!region->map_paddr)
> > > > > +				continue;
> > > > > +
> > > > > +			if (!iommu_iova_to_phys(d->domain, region->iova))
> > > > > +				continue;
> > > > > +
> > > > > +			iommu_unmap(d->domain, region->iova,
> > > > PAGE_SIZE);
> > > >
> > > > PAGE_SIZE?  Why not region->size?
> > >
> > > Yes, this should be region->size.
> > >
> > > >
> > > > > +			region->map_paddr = 0;
> > > > > +			cond_resched();
> > > > > +		}
> > > > > +	}
> > > > > +}
> > > > > +
> > > > >  static void vfio_iommu_unmap_unpin_all(struct vfio_iommu *iommu)
> > {
> > > > >  	struct rb_node *node;
> > > > >
> > > > >  	while ((node = rb_first(&iommu->dma_list)))
> > > > >  		vfio_remove_dma(iommu, rb_entry(node, struct vfio_dma,
> > > > node));
> > > > > +
> > > > > +	vfio_iommu_unmap_all_reserved_regions(iommu);
> > > > >  }
> > > > >
> > > > >  static void vfio_iommu_type1_detach_group(void *iommu_data, @@
> > > > > -1048,6 +1070,93 @@ done:
> > > > >  	mutex_unlock(&iommu->lock);
> > > > >  }
> > > > >
> > > > > +static int vfio_iommu_type1_msi_map(void *iommu_data, uint64_t
> > > > msi_addr,
> > > > > +				    uint64_t size, uint64_t *msi_iova) {
> > > > > +	struct vfio_iommu *iommu = iommu_data;
> > > > > +	struct vfio_resvd_region *region;
> > > > > +	int ret;
> > > > > +
> > > > > +	mutex_lock(&iommu->lock);
> > > > > +
> > > > > +	/* Do not try ceate iommu-mapping if msi reconfig not allowed */
> > > > > +	if (!iommu->allow_msi_reconfig) {
> > > > > +		mutex_unlock(&iommu->lock);
> > > > > +		return 0;
> > > > > +	}
> > > > > +
> > > > > +	/* Check if there is already region mapping the msi page */
> > > > > +	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
> > > > > +		if (region->map_paddr == msi_addr) {
> > > > > +			*msi_iova = region->iova;
> > > > > +			region->refcount++;
> > > > > +			mutex_unlock(&iommu->lock);
> > > > > +			return 0;
> > > > > +		}
> > > > > +	}
> > > > > +
> > > > > +	/* Get a unmapped reserved region */
> > > > > +	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
> > > > > +		if (!region->map_paddr)
> > > > > +			break;
> > > > > +	}
> > > > > +
> > > > > +	if (region == NULL) {
> > > > > +		mutex_unlock(&iommu->lock);
> > > > > +		return -ENODEV;
> > > > > +	}
> > > > > +
> > > > > +	ret = vfio_iommu_map(iommu, region->iova, msi_addr >>
> > > > PAGE_SHIFT,
> > > > > +			     size >> PAGE_SHIFT, region->prot);
> > > >
> > > > So the reserved region has a size and the msi mapping has a size and
> > > > we arbitrarily decide to use the msi mapping size here?
> > >
> > > Reserved region interface is generic and user can set reserved region of
> > any size (multiple of page-size). But we do not want to create MSI address
> > mapping beyond the MSI-page otherwise this can be security issue. But I
> > think I am not tracking how much reserved iova region is mapped, so unmap
> > is called for same size.
> > >
> > >
> > > >  The overlap checks we've done for the reserved region are
> > > > meaningless then.  No wonder you're unmapping with PAGE_SIZE, we
> > have no idea.
> > >
> > > Do you think we should divide the reserved region in pages and track
> > map/unmap per page?
> > 
> > I'd certainly expect as a user to do one large reserved region mapping and be
> > done rather than a large number of smaller mappings.  I don't really
> > understand how we're providing isolation with this interface though, we're
> > setting up the IOMMU so the guest has a mapping to the MSI, but our
> > IOMMU granularity is page size.  Aren't we giving the guest access to
> > everything else that might be mapped into that page?  Don't we need to
> > push an reservation down to the MSI allocation in order to have isolation?  If
> > we did that, couldn't we pretty much guarantee that all MSI vectors would fit
> > into a page or two?
> 
> Normally we will reserve one MSI-page for a vfio-group and all devices will use same on PowerPC. I think same applies to SMMU as well. Now for X86 I do not know how many pages we needed for a vfio-group? If we needed one/two (small numbers) msi-page then I think for msi purpose one/two reserved-iova region of PAGE_SIZE is sufficient.

x86 will not be reserving pages; x86 needs a mechanism to report the
range reserved by the platform, which has yet to be addressed.
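For comparison, reporting a fixed platform-reserved range could be a small,
backward-compatible extension of the existing info ioctl; the flag and
fields below are purely hypothetical:

/* Hypothetical extension of struct vfio_iommu_type1_info -- sketch only */
struct vfio_iommu_type1_info {
	__u32	argsz;
	__u32	flags;
#define VFIO_IOMMU_INFO_PGSIZES		(1 << 0)	/* existing */
#define VFIO_IOMMU_INFO_MSI_RESV	(1 << 1)	/* hypothetical */
	__u64	iova_pgsizes;	/* existing: bitmap of supported page sizes */
	__u64	msi_resv_iova;	/* hypothetical: start of reserved MSI range */
	__u64	msi_resv_size;	/* hypothetical: size of reserved MSI range */
};

The argsz/flags convention already allows this kind of extension without
breaking existing userspace.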



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region
  2015-10-06  9:39       ` Bhushan Bharat
@ 2015-10-06 15:21         ` Alex Williamson
  0 siblings, 0 replies; 45+ messages in thread
From: Alex Williamson @ 2015-10-06 15:21 UTC (permalink / raw)
  To: Bhushan Bharat
  Cc: kvmarm, kvm, christoffer.dall, eric.auger, pranavkumar,
	marc.zyngier, will.deacon

On Tue, 2015-10-06 at 09:39 +0000, Bhushan Bharat wrote:
> 
> 
> > -----Original Message-----
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Tuesday, October 06, 2015 4:15 AM
> > To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> > Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > christoffer.dall@linaro.org; eric.auger@linaro.org; pranavkumar@linaro.org;
> > marc.zyngier@arm.com; will.deacon@arm.com
> > Subject: Re: [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova
> > region
> > 
> > On Mon, 2015-10-05 at 04:55 +0000, Bhushan Bharat wrote:
> > > Hi Alex,
> > >
> > > > -----Original Message-----
> > > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > > Sent: Saturday, October 03, 2015 4:16 AM
> > > > To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
> > > > Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > > > christoffer.dall@linaro.org; eric.auger@linaro.org;
> > > > pranavkumar@linaro.org; marc.zyngier@arm.com; will.deacon@arm.com
> > > > Subject: Re: [RFC PATCH 1/6] vfio: Add interface for add/del
> > > > reserved iova region
> > > >
> > > > On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > > > > This Patch adds the VFIO APIs to add and remove reserved iova regions.
> > > > > The reserved iova region can be used for mapping some specific
> > > > > physical address in iommu.
> > > > >
> > > > > Currently we are planning to use this interface for adding iova
> > > > > regions for creating iommu of msi-pages. But the API are designed
> > > > > for future extension where some other physical address can be
> > mapped.
> > > > >
> > > > > Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> > > > > ---
> > > > >  drivers/vfio/vfio_iommu_type1.c | 201
> > > > +++++++++++++++++++++++++++++++++++++++-
> > > > >  include/uapi/linux/vfio.h       |  43 +++++++++
> > > > >  2 files changed, 243 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > > > > b/drivers/vfio/vfio_iommu_type1.c index 57d8c37..fa5d3e4 100644
> > > > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > > > @@ -59,6 +59,7 @@ struct vfio_iommu {
> > > > >  	struct rb_root		dma_list;
> > > > >  	bool			v2;
> > > > >  	bool			nesting;
> > > > > +	struct list_head	reserved_iova_list;
> > > >
> > > > This alignment leads to poor packing in the structure, put it above the
> > bools.
> > >
> > > ok
> > >
> > > >
> > > > >  };
> > > > >
> > > > >  struct vfio_domain {
> > > > > @@ -77,6 +78,15 @@ struct vfio_dma {
> > > > >  	int			prot;		/* IOMMU_READ/WRITE */
> > > > >  };
> > > > >
> > > > > +struct vfio_resvd_region {
> > > > > +	dma_addr_t	iova;
> > > > > +	size_t		size;
> > > > > +	int		prot;			/* IOMMU_READ/WRITE */
> > > > > +	int		refcount;		/* ref count of mappings */
> > > > > +	uint64_t	map_paddr;		/* Mapped Physical Address
> > > > */
> > > >
> > > > phys_addr_t
> > >
> > > Ok,
> > >
> > > >
> > > > > +	struct list_head next;
> > > > > +};
> > > > > +
> > > > >  struct vfio_group {
> > > > >  	struct iommu_group	*iommu_group;
> > > > >  	struct list_head	next;
> > > > > @@ -106,6 +116,38 @@ static struct vfio_dma *vfio_find_dma(struct
> > > > vfio_iommu *iommu,
> > > > >  	return NULL;
> > > > >  }
> > > > >
> > > > > +/* This function must be called with iommu->lock held */ static
> > > > > +bool vfio_overlap_with_resvd_region(struct vfio_iommu *iommu,
> > > > > +					   dma_addr_t start, size_t size) {
> > > > > +	struct vfio_resvd_region *region;
> > > > > +
> > > > > +	list_for_each_entry(region, &iommu->reserved_iova_list, next) {
> > > > > +		if (region->iova < start)
> > > > > +			return (start - region->iova < region->size);
> > > > > +		else if (start < region->iova)
> > > > > +			return (region->iova - start < size);
> > > >
> > > > <= on both of the return lines?
> > >
> > > I think is should be "<" and not "=<", no ?
> > 
> > Yep, looks like you're right.  Maybe there's a more straightforward way to do
> > this.
> > 
> > > >
> > > > > +
> > > > > +		return (region->size > 0 && size > 0);
> > > > > +	}
> > > > > +
> > > > > +	return false;
> > > > > +}
> > > > > +
> > > > > +/* This function must be called with iommu->lock held */ static
> > > > > +struct vfio_resvd_region *vfio_find_resvd_region(struct
> > > > > +vfio_iommu
> > > > *iommu,
> > > > > +						 dma_addr_t start, size_t
> > > > size) {
> > > > > +	struct vfio_resvd_region *region;
> > > > > +
> > > > > +	list_for_each_entry(region, &iommu->reserved_iova_list, next)
> > > > > +		if (region->iova == start && region->size == size)
> > > > > +			return region;
> > > > > +
> > > > > +	return NULL;
> > > > > +}
> > > > > +
> > > > >  static void vfio_link_dma(struct vfio_iommu *iommu, struct
> > > > > vfio_dma
> > > > > *new)  {
> > > > >  	struct rb_node **link = &iommu->dma_list.rb_node, *parent =
> > > > NULL; @@
> > > > > -580,7 +622,8 @@ static int vfio_dma_do_map(struct vfio_iommu
> > > > > *iommu,
> > > > >
> > > > >  	mutex_lock(&iommu->lock);
> > > > >
> > > > > -	if (vfio_find_dma(iommu, iova, size)) {
> > > > > +	if (vfio_find_dma(iommu, iova, size) ||
> > > > > +	    vfio_overlap_with_resvd_region(iommu, iova, size)) {
> > > > >  		mutex_unlock(&iommu->lock);
> > > > >  		return -EEXIST;
> > > > >  	}
> > > > > @@ -626,6 +669,127 @@ static int vfio_dma_do_map(struct
> > vfio_iommu
> > > > *iommu,
> > > > >  	return ret;
> > > > >  }
> > > > >
> > > > > +/* This function must be called with iommu->lock held */ static
> > > > > +int vfio_iommu_resvd_region_del(struct vfio_iommu *iommu,
> > > > > +				dma_addr_t iova, size_t size, int prot) {
> > > > > +	struct vfio_resvd_region *res_region;
> > > >
> > > > Have some consistency in naming, just use "region".
> > >
> > > Ok,
> > >
> > > > > +
> > > > > +	res_region = vfio_find_resvd_region(iommu, iova, size);
> > > > > +	/* Region should not be mapped in iommu */
> > > > > +	if (res_region == NULL || res_region->map_paddr)
> > > > > +		return -EINVAL;
> > > >
> > > > Are these two separate errors?  !region is -EINVAL, but being mapped
> > > > is - EBUSY.
> > >
> > > Yes, will separate them.
> > >
> > > >
> > > > > +
> > > > > +	list_del(&res_region->next);
> > > > > +	kfree(res_region);
> > > > > +	return 0;
> > > > > +}
> > > > > +
> > > > > +/* This function must be called with iommu->lock held */ static
> > > > > +int vfio_iommu_resvd_region_add(struct vfio_iommu *iommu,
> > > > > +				       dma_addr_t iova, size_t size, int prot) {
> > > > > +	struct vfio_resvd_region *res_region;
> > > > > +
> > > > > +	/* Check overlap with with dma maping and reserved regions */
> > > > > +	if (vfio_find_dma(iommu, iova, size) ||
> > > > > +	    vfio_find_resvd_region(iommu, iova, size))
> > > > > +		return -EEXIST;
> > > > > +
> > > > > +	res_region = kzalloc(sizeof(*res_region), GFP_KERNEL);
> > > > > +	if (res_region == NULL)
> > > > > +		return -ENOMEM;
> > > > > +
> > > > > +	res_region->iova = iova;
> > > > > +	res_region->size = size;
> > > > > +	res_region->prot = prot;
> > > > > +	res_region->refcount = 0;
> > > > > +	res_region->map_paddr = 0;
> > > >
> > > > They're already 0 by the kzalloc
> > >
> > > Yes ;)
> > > >
> > > > > +
> > > > > +	list_add(&res_region->next, &iommu->reserved_iova_list);
> > > > > +
> > > > > +	return 0;
> > > > > +}
> > > > > +
> > > > > +static
> > > > > +int vfio_handle_reserved_region_add(struct vfio_iommu *iommu,
> > > > > +				struct vfio_iommu_reserved_region_add
> > > > *region) {
> > > > > +	dma_addr_t iova = region->iova;
> > > > > +	size_t size = region->size;
> > > > > +	int flags = region->flags;
> > > > > +	uint64_t mask;
> > > > > +	int prot = 0;
> > > > > +	int ret;
> > > > > +
> > > > > +	/* Verify that none of our __u64 fields overflow */
> > > > > +	if (region->size != size || region->iova != iova)
> > > > > +		return -EINVAL;
> > > > > +
> > > > > +	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
> > > > > +
> > > > > +	WARN_ON(mask & PAGE_MASK);
> > > > > +
> > > > > +	if (flags & VFIO_IOMMU_RES_REGION_READ)
> > > > > +		prot |= IOMMU_READ;
> > > > > +	if (flags & VFIO_IOMMU_RES_REGION_WRITE)
> > > > > +		prot |= IOMMU_WRITE;
> > > > > +
> > > > > +	if (!prot || !size || (size | iova) & mask)
> > > > > +		return -EINVAL;
> > > > > +
> > > > > +	/* Don't allow IOVA wrap */
> > > > > +	if (iova + size - 1 < iova)
> > > > > +		return -EINVAL;
> > > > > +
> > > > > +	mutex_lock(&iommu->lock);
> > > > > +
> > > > > +	if (region->flags & VFIO_IOMMU_RES_REGION_ADD) {
> > > > > +		ret = vfio_iommu_resvd_region_add(iommu, iova, size,
> > > > prot);
> > > > > +		if (ret) {
> > > > > +			mutex_unlock(&iommu->lock);
> > > > > +			return ret;
> > > > > +		}
> > > > > +	}
> > > >
> > > > Silently fail if not VFIO_IOMMU_RES_REGION_ADD?
> > >
> > > As per below comment we do not need this flag. So the above flag
> > checking will be removed.
> > >
> > > >
> > > > > +
> > > > > +	mutex_unlock(&iommu->lock);
> > > > > +	return 0;
> > > > > +}
> > > > > +
> > > > > +static
> > > > > +int vfio_handle_reserved_region_del(struct vfio_iommu *iommu,
> > > > > +				struct vfio_iommu_reserved_region_del
> > > > *region) {
> > > > > +	dma_addr_t iova = region->iova;
> > > > > +	size_t size = region->size;
> > > > > +	int flags = region->flags;
> > > > > +	int ret;
> > > > > +
> > > > > +	if (!(flags & VFIO_IOMMU_RES_REGION_DEL))
> > > > > +		return -EINVAL;
> > > > > +
> > > > > +	mutex_lock(&iommu->lock);
> > > > > +
> > > > > +	/* Check for the region */
> > > > > +	if (vfio_find_resvd_region(iommu, iova, size) == NULL) {
> > > > > +		mutex_unlock(&iommu->lock);
> > > > > +		return -EINVAL;
> > > > > +	}
> > > > > +
> > > > > +	/* remove the reserved region */
> > > > > +	if (region->flags & VFIO_IOMMU_RES_REGION_DEL) {
> > > > > +		ret = vfio_iommu_resvd_region_del(iommu, iova, size,
> > > > flags);
> > > > > +		if (ret) {
> > > > > +			mutex_unlock(&iommu->lock);
> > > > > +			return ret;
> > > > > +		}
> > > > > +	}
> > > > > +
> > > > > +	mutex_unlock(&iommu->lock);
> > > > > +	return 0;
> > > > > +}
> > > > > +
> > > > >  static int vfio_bus_type(struct device *dev, void *data)  {
> > > > >  	struct bus_type **bus = data;
> > > > > @@ -905,6 +1069,7 @@ static void
> > *vfio_iommu_type1_open(unsigned
> > > > long arg)
> > > > >  	}
> > > > >
> > > > >  	INIT_LIST_HEAD(&iommu->domain_list);
> > > > > +	INIT_LIST_HEAD(&iommu->reserved_iova_list);
> > > > >  	iommu->dma_list = RB_ROOT;
> > > > >  	mutex_init(&iommu->lock);
> > > > >
> > > > > @@ -1020,6 +1185,40 @@ static long vfio_iommu_type1_ioctl(void
> > > > *iommu_data,
> > > > >  			return ret;
> > > > >
> > > > >  		return copy_to_user((void __user *)arg, &unmap, minsz);
> > > > > +	} else if (cmd == VFIO_IOMMU_RESERVED_REGION_ADD) {
> > > > > +		struct vfio_iommu_reserved_region_add region;
> > > > > +		long ret;
> > > > > +
> > > > > +		minsz = offsetofend(struct
> > > > vfio_iommu_reserved_region_add,
> > > > > +				    size);
> > > > > +		if (copy_from_user(&region, (void __user *)arg, minsz))
> > > > > +			return -EFAULT;
> > > > > +
> > > > > +		if (region.argsz < minsz)
> > > > > +			return -EINVAL;
> > > > > +
> > > > > +		ret = vfio_handle_reserved_region_add(iommu, &region);
> > > > > +		if (ret)
> > > > > +			return ret;
> > > > > +
> > > > > +		return copy_to_user((void __user *)arg, &region, minsz);
> > > > > +	} else if (cmd == VFIO_IOMMU_RESERVED_REGION_DEL) {
> > > > > +		struct vfio_iommu_reserved_region_del region;
> > > > > +		long ret;
> > > > > +
> > > > > +		minsz = offsetofend(struct
> > > > vfio_iommu_reserved_region_del,
> > > > > +				    size);
> > > > > +		if (copy_from_user(&region, (void __user *)arg, minsz))
> > > > > +			return -EFAULT;
> > > > > +
> > > > > +		if (region.argsz < minsz)
> > > > > +			return -EINVAL;
> > > > > +
> > > > > +		ret = vfio_handle_reserved_region_del(iommu, &region);
> > > > > +		if (ret)
> > > > > +			return ret;
> > > > > +
> > > > > +		return copy_to_user((void __user *)arg, &region, minsz);
> > > >
> > > > So we've just created an interface that is available for all
> > > > vfio-type1 users, whether it makes any sense for the platform or
> > > > not,
> > >
> > > How we should decide that a given platform needs this or not?
> > 
> > You later add new iommu interfaces, presumably if the iommu doesn't
> > implement those interfaces then there's no point in us exposing these
> > interfaces to vfio.
> 
> You mean if an iommu says "does not requires explicit-iommu-mapping MSIs" then if user-space calls these reserved-iova ioctls then we should return error without adding these regions, right?

Yes, what would it mean for the user to add reserved regions if we have
no means to use them?

> > > > and it allows the user to
> > > > consume arbitrary amounts of kernel memory, by making an infinitely
> > > > long list of reserved iova entries, brilliant!
> > >
> > > I was not sure of how to limit the user. What I was thinking of having a
> > default number of pages a user can reserve (512 pages). Also we can give a
> > sysfs interface so that user can change the default number of pages. Does
> > this sound good? If not please suggest?
> > 
> > Isn't 512 entries a lot for a linked list?
> >  Can we use our existing rb tree to manage these entries rather than a secondary list?
> 
> I do not think so, it will complicate the code.

That's not a good answer.  Is a flag on the rb entry insufficient?  Why
is that more complicated than a completely separate list?  We can never
have reserved mappings and dma mappings occupy the same IOVA space and
the rb tree is tracking iova space, so it seems like a natural place to
track both dma and reserved IOVA.
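A sketch of that direction, reusing the names from the RFC patch: reserved
entries go into the same rb tree as DMA mappings, distinguished only by a
(hypothetical) flag, so vfio_find_dma() covers the overlap checks for both:

struct vfio_dma {
	struct rb_node		node;
	dma_addr_t		iova;		/* Device address */
	unsigned long		vaddr;		/* Process virtual address */
	size_t			size;		/* Map size (bytes) */
	int			prot;		/* IOMMU_READ/WRITE */
	bool			msi_reserved;	/* hypothetical: reserved IOVA, no vaddr backing */
};

static int vfio_add_reserved_iova(struct vfio_iommu *iommu,
				  dma_addr_t iova, size_t size, int prot)
{
	struct vfio_dma *dma;

	/* One lookup handles overlap with both DMA and reserved entries. */
	if (vfio_find_dma(iommu, iova, size))
		return -EEXIST;

	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
	if (!dma)
		return -ENOMEM;

	dma->iova = iova;
	dma->size = size;
	dma->prot = prot;
	dma->msi_reserved = true;
	vfio_link_dma(iommu, dma);	/* same rb tree as regular mappings */
	return 0;
}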

> >  How many entries do we realistically need?
> 
> I think this should be small number of entries as discussed on other patch. Small number means 1 or 2 max on PowerPC and smmu (I think so). 
> 
> >  Can the iommu callbacks help give us a limit?
> 
> I am not sure right now how deterministically we can get number of regions/msi-pages needed for a given platform and for a list of devices in a group?
> 
> >  Can we somehow use information about the devices in the group to produce a limit,
> > ie. MSI vectors possible from the group?
> 
> It seems like we need to know number of vectors needed per device in the group and we needed to know how many vectors can supported using the reserved msi-page. We are sharing a msi-page then it becomes a little more complicated. But overall it seems like we need a 2-3 or very small number of regions and I will suggest that we can go with this small number can enhance if required later on.
> 
> > 
> > > >
> > > > >  	}
> > > > >
> > > > >  	return -ENOTTY;
> > > > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > > > index b57b750..1abd1a9 100644
> > > > > --- a/include/uapi/linux/vfio.h
> > > > > +++ b/include/uapi/linux/vfio.h
> > > > > @@ -440,6 +440,49 @@ struct vfio_iommu_type1_dma_unmap {
> > > > >  #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
> > > > >  #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
> > > > >
> > > > > +/**************** Reserved IOVA region specific APIs
> > > > > +**********************/
> > > > > +
> > > > > +/*
> > > > > + * VFIO_IOMMU_RESERVED_REGION_ADD - _IO(VFIO_TYPE,
> > VFIO_BASE
> > > > + 17,
> > > > > + *					struct
> > > > vfio_iommu_reserved_region_add)
> > > > > + * This is used to add a reserved iova region.
> > > > > + * @flags - Input: VFIO_IOMMU_RES_REGION_ADD flag is for adding
> > > > > + * a reserved region.
> > > >
> > > > Why else would we call VFIO_IOMMU_RESERVED_REGION_ADD except
> > to add
> > > > a region, this flag is redundant.
> > >
> > > Ok, will remove this.
> > >
> > > >
> > > > > + * Also pass READ/WRITE/IOMMU flags to be used in iommu mapping.
> > > > > + * @iova - Input: IOVA base address of reserved region
> > > > > + * @size - Input: Size of the reserved region
> > > > > + * Return: 0 on success, -errno on failure  */ struct
> > > > > +vfio_iommu_reserved_region_add {
> > > > > +	__u32   argsz;
> > > > > +	__u32   flags;
> > > > > +#define VFIO_IOMMU_RES_REGION_ADD	(1 << 0) /* Add a
> > reserved
> > > > region */
> > > > > +#define VFIO_IOMMU_RES_REGION_READ	(1 << 1) /* readable
> > region */
> > > > > +#define VFIO_IOMMU_RES_REGION_WRITE	(1 << 2) /* writable
> > > > region */
> > > > > +	__u64	iova;
> > > > > +	__u64   size;
> > > > > +};
> > > > > +#define VFIO_IOMMU_RESERVED_REGION_ADD _IO(VFIO_TYPE,
> > > > VFIO_BASE + 17)
> > > > > +
> > > > > +/*
> > > > > + * VFIO_IOMMU_RESERVED_REGION_DEL - _IO(VFIO_TYPE,
> > VFIO_BASE +
> > > > 18,
> > > > > + *					struct
> > > > vfio_iommu_reserved_region_del)
> > > > > + * This is used to delete an existing reserved iova region.
> > > > > + * @flags - VFIO_IOMMU_RES_REGION_DEL flag is for deleting a
> > > > > +region use,
> > > > > + *  only a unmapped region can be deleted.
> > > > > + * @iova - Input: IOVA base address of reserved region
> > > > > + * @size - Input: Size of the reserved region
> > > > > + * Return: 0 on success, -errno on failure  */ struct
> > > > > +vfio_iommu_reserved_region_del {
> > > > > +	__u32   argsz;
> > > > > +	__u32   flags;
> > > > > +#define VFIO_IOMMU_RES_REGION_DEL	(1 << 0) /* unset the
> > > > reserved region */
> > > > > +	__u64	iova;
> > > > > +	__u64   size;
> > > > > +};
> > > > > +#define VFIO_IOMMU_RESERVED_REGION_DEL _IO(VFIO_TYPE,
> > > > VFIO_BASE + 18)
> > > > > +
> > > >
> > > > These are effectively both
> > > >
> > > > struct vfio_iommu_type1_dma_unmap
> > >
> > > Yes, do you want to suggest that we should use " struct
> > vfio_iommu_type1_dma_unmap". I found that confusing.
> > > What is we use "struct vfio_iommu_reserved_region" and use flag
> > VFIO_IOMMU_RES_REGION_DEL/ VFIO_IOMMU_RES_REGION_ADD.
> > 
> > What if we just use the existing map and unmap interface with a flag to
> > indicate an MSI reserved mapping?  I don't really see why we need new ioctls
> > for this.
> 
> Existing map/unmap APIs uses virtual address, user-space mmap() and provide virtual address as well. I think in previous discussion you commented that let's keep then separate, over-riding complicates the map/unmap and can be cause of confusion, and keeping these separate also gives space to extend these reserved iova APIs for something else in future (do not know for what right now). I personally think that it make sense to keep them separate, let me know if you think otherwise 

It seems pretty simple for a userspace app to know that the virtual
address isn't used when doing a mapping with MSI_RESERVED set.  The
proposed API doesn't allow any room for extension because it's
completely unspecified what the reserved region is intended for.  Here
we use them for MSI, but what else are we allowed to use them for?  How
would the user specify one reserved type vs. another in a future
extension?
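As a sketch of that alternative (the MSI_RESERVED flag and its semantics
are hypothetical, not existing UAPI), the reserved mapping would ride on
the existing map ioctl, with vaddr ignored and the purpose of the region
stated explicitly by the flag:

/* Reusing VFIO_IOMMU_MAP_DMA instead of new ioctls -- hypothetical sketch */
struct vfio_iommu_type1_dma_map {
	__u32	argsz;
	__u32	flags;
#define VFIO_DMA_MAP_FLAG_READ		(1 << 0)	/* existing */
#define VFIO_DMA_MAP_FLAG_WRITE		(1 << 1)	/* existing */
#define VFIO_DMA_MAP_FLAG_MSI_RESERVED	(1 << 2)	/* hypothetical */
	__u64	vaddr;	/* ignored when MSI_RESERVED is set */
	__u64	iova;	/* reserved IOVA chosen by userspace */
	__u64	size;
};

Tear-down would then go through the existing VFIO_IOMMU_UNMAP_DMA path,
keyed on iova/size alone, and a future reserved type would just be another
flag bit.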



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 6/6] arm-smmu: Allow to set iommu mapping for MSI
  2015-10-06 10:26         ` Bhushan Bharat
@ 2015-10-26 15:40           ` Christoffer Dall
  2015-11-02  2:53           ` Pranavkumar Sawargaonkar
  1 sibling, 0 replies; 45+ messages in thread
From: Christoffer Dall @ 2015-10-26 15:40 UTC (permalink / raw)
  To: Bhushan Bharat
  Cc: Alex Williamson, marc.zyngier, kvmarm, kvm, eric.auger,
	pranavkumar, will.deacon, Andre Przywara

On Tue, Oct 06, 2015 at 10:26:34AM +0000, Bhushan Bharat wrote:

[...]

> > 
> > I'm certainly not interested in applying an maintaining an interim solution that
> > isn't the right one.  It seems like VFIO is too involved in this process in your
> > example.  On x86 we have per vector isolation and the only thing we're
> > missing is reporting back of the region used by MSI vectors as reserved IOVA
> > space (but it's standard on x86, so an x86 VM user will never use it for IOVA).
> 
> I remember you mentioned that there is no problem when running an x86 guest on an x86 host.  But it will interesting when running a non-x86 VMs on an x86 host  or non-VM userspace use of VFIO though. 
> 
> > In your model, the MSI IOVA space is programmable,
> 
> Yes, on PowerPC and ARM-SMMU case also we have to create mapping with an IOVA. First question is which IOVA to be used, and we added the reserved iova ioctl for same.
> 
> Second problem is we needed an msi-page physical address for setting up iommu-mapping, and so we needed to reserve an msi-page. I did this for PowerPC but not in a generic extension in msi-driver and will look the code a bit more details on adding an interface to reserve an msi-page or get a shared msi-page with allow-unsafe-interrupt.

Sorry, I'm far from familiar with how x86 does interrupt handling and
I know very little of PCIe and MSIs, so please allow me to ask some
stupid questions:

What does an msi-page physical address mean?

> 
> Third problem is to report the reserved IOVA to be used for MSI vectors for the given set of devices (devices in a vfio-group). 

What do MSI vectors mean in this context?  Is this a Linux kernel
construct, something tied to PCIe, something tied to the interrupt
controller, or?

In the case of ARM, AFAIU, you have a single doorbell register per ITS
and devices can write to this register with their device id and the
eventid.  So it's a register in a page somewhere.

Now, what is it about the ARM case here that you don't understand?

> 
> Mark/Christopher,
> I am not an expert in this area so I might have to understand that code. If you think you can give solution to 2nd and 3rd problem quickly then please let me know.
> 
I don't really understand what you're asking, but if you can educate me
on the concepts above I may be able to offer some advice.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 6/6] arm-smmu: Allow to set iommu mapping for MSI
  2015-10-06 10:26         ` Bhushan Bharat
  2015-10-26 15:40           ` Christoffer Dall
@ 2015-11-02  2:53           ` Pranavkumar Sawargaonkar
  1 sibling, 0 replies; 45+ messages in thread
From: Pranavkumar Sawargaonkar @ 2015-11-02  2:53 UTC (permalink / raw)
  To: Bhushan Bharat
  Cc: Alex Williamson, christoffer.dall, marc.zyngier, kvmarm, kvm,
	eric.auger, will.deacon

Hi Bharat,

On 6 October 2015 at 15:56, Bhushan Bharat <Bharat.Bhushan@freescale.com> wrote:
>
>
>> -----Original Message-----
>> From: Alex Williamson [mailto:alex.williamson@redhat.com]
>> Sent: Tuesday, October 06, 2015 4:25 AM
>> To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
>> Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
>> christoffer.dall@linaro.org; eric.auger@linaro.org; pranavkumar@linaro.org;
>> marc.zyngier@arm.com; will.deacon@arm.com
>> Subject: Re: [RFC PATCH 6/6] arm-smmu: Allow to set iommu mapping for
>> MSI
>>
>> On Mon, 2015-10-05 at 08:33 +0000, Bhushan Bharat wrote:
>> >
>> >
>> > > -----Original Message-----
>> > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
>> > > Sent: Saturday, October 03, 2015 4:17 AM
>> > > To: Bhushan Bharat-R65777 <Bharat.Bhushan@freescale.com>
>> > > Cc: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org;
>> > > christoffer.dall@linaro.org; eric.auger@linaro.org;
>> > > pranavkumar@linaro.org; marc.zyngier@arm.com; will.deacon@arm.com
>> > > Subject: Re: [RFC PATCH 6/6] arm-smmu: Allow to set iommu mapping
>> > > for MSI
>> > >
>> > > On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
>> > > > Finally ARM SMMU declare that iommu-mapping for MSI-pages are not
>> > > > set automatically and it should be set explicitly.
>> > > >
>> > > > Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
>> > > > ---
>> > > >  drivers/iommu/arm-smmu.c | 7 ++++++-
>> > > >  1 file changed, 6 insertions(+), 1 deletion(-)
>> > > >
>> > > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> > > index
>> > > > a3956fb..9d37e72 100644
>> > > > --- a/drivers/iommu/arm-smmu.c
>> > > > +++ b/drivers/iommu/arm-smmu.c
>> > > > @@ -1401,13 +1401,18 @@ static int
>> arm_smmu_domain_get_attr(struct
>> > > iommu_domain *domain,
>> > > >                                     enum iommu_attr attr, void *data)  {
>> > > >         struct arm_smmu_domain *smmu_domain =
>> > > to_smmu_domain(domain);
>> > > > +       struct iommu_domain_msi_maps *msi_maps;
>> > > >
>> > > >         switch (attr) {
>> > > >         case DOMAIN_ATTR_NESTING:
>> > > >                 *(int *)data = (smmu_domain->stage ==
>> > > ARM_SMMU_DOMAIN_NESTED);
>> > > >                 return 0;
>> > > >         case DOMAIN_ATTR_MSI_MAPPING:
>> > > > -               /* Dummy handling added */
>> > > > +               msi_maps = data;
>> > > > +
>> > > > +               msi_maps->automap = false;
>> > > > +               msi_maps->override_automap = true;
>> > > > +
>> > > >                 return 0;
>> > > >         default:
>> > > >                 return -ENODEV;
>> > >
>> > > In previous discussions I understood one of the problems you were
>> > > trying to solve was having a limited number of MSI banks and while
>> > > you may be able to get isolated MSI banks for some number of users,
>> > > it wasn't unlimited and sharing may be required.  I don't see any of that
>> addressed in this series.
>> >
>> > That problem was on PowerPC. Infact there were two problems, one which
>> MSI bank to be used and second how to create iommu-mapping for device
>> assigned to userspace.
>> > First problem was PowerPC specific and that will be solved separately.
>> > For second problem, earlier I tried to added a couple of MSI specific ioctls
>> and you suggested (IIUC) that we should have a generic reserved-iova type
>> of API and then we can map MSI bank using reserved-iova and this will not
>> require involvement of user-space.
>> >
>> > >
>> > > Also, the management of reserved IOVAs vs MSI addresses looks really
>> > > dubious to me.  How does your platform pick an MSI address and what
>> > > are we breaking by covertly changing it?  We seem to be masking over
>> > > at the VFIO level, where there should be lower level interfaces
>> > > doing the right thing when we configure MSI on the device.
>> >
>> > Yes, In my understanding the right solution should be:
>> >  1) VFIO driver should know what physical-msi-address will be used for
>> devices in an iommu-group.
>> >     I did not find an generic API, on PowerPC I added some function in
>> ffrescale msi-driver and called from vfio-iommu-fsl-pamu.c (not yet
>> upstreamed).
>> >  2) VFIO driver should know what IOVA to be used for creating
>> > iommu-mapping (VFIO APIs patch of this patch series)
>> >  3) VFIO driver will create the iommu-mapping using (1) and (2)
>> >  4) VFIO driver should be able to tell the msi-driver that for a given device it
>> should use different IOVA. So when composing the msi message (for the
>> devices is the given iommu-group) it should use that programmed iova as
>> MSI-address. This interface also needed to be developed.
>> >
>> > I was not sure of which approach we should take. The current approach in
>> the patch is simple to develop so I went ahead to take input but I agree this
>> does not look very good.
>> > What do you think, should drop this approach and work out the approach
>> as described above.
>>
>> I'm certainly not interested in applying an maintaining an interim solution that
>> isn't the right one.  It seems like VFIO is too involved in this process in your
>> example.  On x86 we have per vector isolation and the only thing we're
>> missing is reporting back of the region used by MSI vectors as reserved IOVA
>> space (but it's standard on x86, so an x86 VM user will never use it for IOVA).
>
> I remember you mentioned that there is no problem when running an x86 guest on an x86 host.  But it will interesting when running a non-x86 VMs on an x86 host  or non-VM userspace use of VFIO though.
>
>> In your model, the MSI IOVA space is programmable,
>
> Yes, on PowerPC and ARM-SMMU case also we have to create mapping with an IOVA. First question is which IOVA to be used, and we added the reserved iova ioctl for same.
>
> Second problem is we needed an msi-page physical address for setting up iommu-mapping, and so we needed to reserve an msi-page. I did this for PowerPC but not in a generic extension in msi-driver and will look the code a bit more details on adding an interface to reserve an msi-page or get a shared msi-page with allow-unsafe-interrupt.

I think reserving an MSI page is tricky because on arm/arm64, with MSI
controllers like GICv2M:
1. A number of interrupts are mapped to one physical MSI address.
2. Two root complexes can share the same MSI controller, which means two EP
devices on two different RCs can have the same MSI address programmed (and
will use different data to generate different MSI interrupts).
3. This makes reserving an MSI physical address difficult, since one EP
device from one RC can be assigned to a VM while another EP device is used
by the host Linux or by a different VM.

One way to get the MSI physical address, which I tried in my patch series,
is (see the sketch after this list):

1. Let the host Linux assign and program the MSI address (by calling
request_irq()), as the current VFIO PCI driver does, and then extract this
address.
2. Map this extracted address to an IOVA.
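A rough sketch of that extraction step, assuming it runs in the VFIO PCI
driver after request_irq() has programmed the vector (error handling
trimmed; whether the composed address can always be read back this way is
part of what is being discussed here):

#include <linux/irq.h>
#include <linux/msi.h>
#include <linux/iommu.h>

/* Sketch: read back the doorbell address the host programmed for this
 * vector and map it at the userspace-chosen reserved IOVA. */
static int vfio_map_programmed_msi(struct iommu_domain *domain,
				   unsigned int irq, dma_addr_t msi_iova)
{
	struct msi_desc *desc = irq_get_msi_desc(irq);
	phys_addr_t msi_addr;

	if (!desc)
		return -EINVAL;

	msi_addr = ((u64)desc->msg.address_hi << 32) | desc->msg.address_lo;

	return iommu_map(domain, msi_iova, msi_addr & PAGE_MASK,
			 PAGE_SIZE, IOMMU_READ | IOMMU_WRITE);
}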

>
> Third problem is to report the reserved IOVA to be used for MSI vectors for the given set of devices (devices in a vfio-group).

I am not sure what the problem is here, since userspace is going to tell
us which IOVA is to be used.
Do you mean:
1. Userspace provides us a range of IOVAs and we choose one of them to map
to a physical MSI address.
2. If so, how do we then tell userspace which IOVA we have used?

>
> Mark/Christopher,
> I am not an expert in this area so I might have to understand that code. If you think you can give solution to 2nd and 3rd problem quickly then please let me know.
>
> Thanks
> -Bharat
>
>> but it has page
>> granularity (I assume).  Therefore we shouldn't be sharing that page with
>> anyone.  That seems to suggest we need to allocate a page of vector space
>> from the host kernel, setup the IOVA mapping, and then the host kernel
>> should know to only allocate MSI vectors for these devices from that pre-
>> allocated page.  Otherwise we need to call the interrupts unsafe, like we do
>> on x86 without interrupt remapping.  Thanks,
>>
>> Alex
>
Thanks,
Pranav

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2015-11-02  2:53 UTC | newest]

Thread overview: 45+ messages
2015-09-30 14:56 [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region Bharat Bhushan
2015-09-30 14:56 ` [RFC PATCH 2/6] iommu: Add interface to get msi-pages mapping attributes Bharat Bhushan
2015-09-30 14:56   ` Bharat Bhushan
2015-10-02 22:45   ` Alex Williamson
2015-10-05  5:17     ` Bhushan Bharat
2015-10-05  5:56     ` Bhushan Bharat
2015-09-30 14:56 ` [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state Bharat Bhushan
2015-09-30 14:56   ` Bharat Bhushan
2015-10-02 22:46   ` Alex Williamson
2015-10-05  6:00     ` Bhushan Bharat
2015-10-05 22:45       ` Alex Williamson
2015-10-06  8:53         ` Bhushan Bharat
2015-10-06 15:11           ` Alex Williamson
2015-09-30 14:56 ` [RFC PATCH 4/6] vfio: Add interface to iommu-map/unmap MSI pages Bharat Bhushan
2015-09-30 14:56   ` Bharat Bhushan
2015-10-02 22:46   ` Alex Williamson
2015-10-05  6:27     ` Bhushan Bharat
2015-10-05 22:45       ` Alex Williamson
2015-10-06  9:05         ` Bhushan Bharat
2015-10-06 15:12           ` Alex Williamson
2015-09-30 14:56 ` [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi interrupt Bharat Bhushan
2015-09-30 14:56   ` Bharat Bhushan
2015-09-30 11:02   ` kbuild test robot
2015-09-30 11:02     ` kbuild test robot
2015-09-30 11:32     ` Bhushan Bharat
2015-09-30 11:34   ` kbuild test robot
2015-09-30 11:34     ` kbuild test robot
2015-10-02 22:46   ` Alex Williamson
2015-10-05  7:20     ` Bhushan Bharat
2015-10-05 22:44       ` Alex Williamson
2015-10-06  8:32         ` Bhushan Bharat
2015-10-06 15:06           ` Alex Williamson
2015-09-30 14:56 ` [RFC PATCH 6/6] arm-smmu: Allow to set iommu mapping for MSI Bharat Bhushan
2015-09-30 14:56   ` Bharat Bhushan
2015-10-02 22:46   ` Alex Williamson
2015-10-05  8:33     ` Bhushan Bharat
2015-10-05 22:54       ` Alex Williamson
2015-10-06 10:26         ` Bhushan Bharat
2015-10-26 15:40           ` Christoffer Dall
2015-11-02  2:53           ` Pranavkumar Sawargaonkar
2015-10-02 22:45 ` [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region Alex Williamson
2015-10-05  4:55   ` Bhushan Bharat
2015-10-05 22:45     ` Alex Williamson
2015-10-06  9:39       ` Bhushan Bharat
2015-10-06 15:21         ` Alex Williamson
