* [PATCH v5 0/7] vfio/type1: Add support for valid iova list management
From: Shameer Kolothum @ 2018-03-15 16:35 UTC
  To: alex.williamson, eric.auger, pmorel
  Cc: kvm, linux-kernel, iommu, linuxarm, john.garry, xuwei5, Shameer Kolothum

This series introduces an iova list associated with a vfio iommu.
The list is kept up to date, taking into account the iommu apertures
and reserved regions. The series also adds checks for any conflict
with existing dma mappings whenever a new device group is attached to
the domain.

User-space can retrieve the valid iova ranges using the
VFIO_IOMMU_GET_INFO ioctl capability chain. Any dma map request
outside the valid iova ranges will be rejected.
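
For illustration, here is a minimal user-space sketch of reading
these ranges (error handling trimmed; the capability id and structs
are the ones added by patch #5, and 'container' is assumed to be an
open, iommu-enabled VFIO container fd):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static void print_valid_iovas(int container)
{
	struct vfio_iommu_type1_info *info;
	struct vfio_info_cap_header *hdr;
	__u32 argsz = sizeof(*info);

	/* First call: the kernel reports the size needed for the caps */
	info = calloc(1, argsz);
	info->argsz = argsz;
	ioctl(container, VFIO_IOMMU_GET_INFO, info);

	if (info->argsz > argsz) {
		argsz = info->argsz;
		info = realloc(info, argsz);
		memset(info, 0, argsz);
		info->argsz = argsz;
		ioctl(container, VFIO_IOMMU_GET_INFO, info);
	}

	if (!(info->flags & VFIO_IOMMU_INFO_CAPS) || !info->cap_offset)
		goto out;

	/* Walk the capability chain; 'next' is an offset from 'info' */
	hdr = (struct vfio_info_cap_header *)((char *)info + info->cap_offset);
	for (;;) {
		if (hdr->id == VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE) {
			struct vfio_iommu_type1_info_cap_iova_range *cap =
			    (struct vfio_iommu_type1_info_cap_iova_range *)hdr;
			__u32 i;

			for (i = 0; i < cap->nr_iovas; i++)
				printf("valid iova: 0x%llx - 0x%llx\n",
				       (unsigned long long)cap->iova_ranges[i].start,
				       (unsigned long long)cap->iova_ranges[i].end);
		}
		if (!hdr->next)
			break;
		hdr = (struct vfio_info_cap_header *)((char *)info + hdr->next);
	}
out:
	free(info);
}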


v4 --> v5
Rebased to next-20180315.
 
 -Incorporated the corner case bug fix suggested by Alex into patch #5.
 -Based on suggestions by Alex and Robin, added patch #7, which
  moves the PCI window reservation back into the DMA-specific path.
  This fixes the issue reported by Eric [1].

Note:
Patch #7 depends on [2] and [3].

1. https://patchwork.kernel.org/patch/10232043/
2. https://patchwork.kernel.org/patch/10216553/
3. https://patchwork.kernel.org/patch/10216555/

v3 --> v4
 Addressed comments received for v3.
 -dma_addr_t instead of phys_addr_t
 -LIST_HEAD() usage.
 -Free up iova_copy list in case of error.
 -Updated the logic for filling the iova caps info (patch #5).

RFCv2 --> v3
 Removed RFC tag.
 Addressed comments from Alex and Eric:
 - Added comments to make the iova list management logic clearer.
 - Use of iova list copy so that original is not altered in
   case of failure.

RFCv1 --> RFCv2
 Addressed comments from Alex:
 -Introduced IOVA list management and added checks for conflicts with
  existing dma map entries during attach/detach.

Shameer Kolothum (2):
  vfio/type1: Add IOVA range capability support
  iommu/dma: Move PCI window region reservation back into dma specific
    path.

Shameerali Kolothum Thodi (5):
  vfio/type1: Introduce iova list and add iommu aperture validity check
  vfio/type1: Check reserve region conflict and update iova list
  vfio/type1: Update iova list on detach
  vfio/type1: check dma map request is within a valid iova range
  vfio/type1: remove duplicate retrieval of reserved regions

 drivers/iommu/dma-iommu.c       |  54 ++---
 drivers/vfio/vfio_iommu_type1.c | 497 +++++++++++++++++++++++++++++++++++++++-
 include/uapi/linux/vfio.h       |  23 ++
 3 files changed, 533 insertions(+), 41 deletions(-)

-- 
2.7.4

* [PATCH v5 1/7] vfio/type1: Introduce iova list and add iommu aperture validity check
From: Shameer Kolothum @ 2018-03-15 16:35 UTC
  To: alex.williamson, eric.auger, pmorel
  Cc: kvm, linux-kernel, iommu, linuxarm, john.garry, xuwei5, Shameer Kolothum

This introduces an iova list of ranges that are valid for dma
mappings. During attach, make sure the new iommu aperture window
doesn't conflict with the current one or with any existing dma
mappings.
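
As an abbreviated walk-through of the attach-time update (values
purely illustrative; the conflict check against the live list and
all error handling are elided):

	LIST_HEAD(iova_copy);

	/* First attach: the list takes the full domain geometry */
	vfio_iommu_aper_resize(&iova_copy, 0x0, 0xffffffffffff);

	/*
	 * A second domain with a narrower geometry shrinks the list
	 * to the intersection, leaving a single node
	 * [0x1000, 0xffffffffff].
	 */
	vfio_iommu_aper_resize(&iova_copy, 0x1000, 0xffffffffff);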

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 drivers/vfio/vfio_iommu_type1.c | 183 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 180 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 45657e2..1123c74 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -60,6 +60,7 @@ MODULE_PARM_DESC(disable_hugepages,
 
 struct vfio_iommu {
 	struct list_head	domain_list;
+	struct list_head	iova_list;
 	struct vfio_domain	*external_domain; /* domain for external user */
 	struct mutex		lock;
 	struct rb_root		dma_list;
@@ -92,6 +93,12 @@ struct vfio_group {
 	struct list_head	next;
 };
 
+struct vfio_iova {
+	struct list_head	list;
+	dma_addr_t		start;
+	dma_addr_t		end;
+};
+
 /*
  * Guest RAM pinning working set or DMA target
  */
@@ -1204,6 +1211,149 @@ static bool vfio_iommu_has_sw_msi(struct iommu_group *group, phys_addr_t *base)
 	return ret;
 }
 
+/*
+ * This is a helper function to insert an address range into the iova list.
+ * The list starts with a single entry corresponding to the IOMMU
+ * domain geometry to which the device group is attached. The list
+ * aperture gets modified when a new domain is added to the container
+ * if the new aperture doesn't conflict with the current one or with
+ * any existing dma mappings. The list is also modified to exclude
+ * any reserved regions associated with the device group.
+ */
+static int vfio_iommu_iova_insert(struct list_head *head,
+				  dma_addr_t start, dma_addr_t end)
+{
+	struct vfio_iova *region;
+
+	region = kmalloc(sizeof(*region), GFP_KERNEL);
+	if (!region)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&region->list);
+	region->start = start;
+	region->end = end;
+
+	list_add_tail(&region->list, head);
+	return 0;
+}
+
+/*
+ * Check whether the new iommu aperture conflicts with the existing
+ * aperture or with any existing dma mappings.
+ */
+static bool vfio_iommu_aper_conflict(struct vfio_iommu *iommu,
+				     dma_addr_t start, dma_addr_t end)
+{
+	struct vfio_iova *first, *last;
+	struct list_head *iova = &iommu->iova_list;
+
+	if (list_empty(iova))
+		return false;
+
+	/* Disjoint sets, return conflict */
+	first = list_first_entry(iova, struct vfio_iova, list);
+	last = list_last_entry(iova, struct vfio_iova, list);
+	if ((start > last->end) || (end < first->start))
+		return true;
+
+	/* Check for any existing dma mappings outside the new start */
+	if (start > first->start) {
+		if (vfio_find_dma(iommu, first->start, start - first->start))
+			return true;
+	}
+
+	/* Check for any existing dma mappings outside the new end */
+	if (end < last->end) {
+		if (vfio_find_dma(iommu, end + 1, last->end - end))
+			return true;
+	}
+
+	return false;
+}
+
+/*
+ * Resize iommu iova aperture window. This is called only if the new
+ * aperture has no conflict with existing aperture and dma mappings.
+ */
+static int vfio_iommu_aper_resize(struct list_head *iova,
+				      dma_addr_t start,
+				      dma_addr_t end)
+{
+	struct vfio_iova *node, *next;
+
+	if (list_empty(iova))
+		return vfio_iommu_iova_insert(iova, start, end);
+
+	/* Adjust iova list start */
+	list_for_each_entry_safe(node, next, iova, list) {
+		if (start < node->start)
+			break;
+		if ((start >= node->start) && (start < node->end)) {
+			node->start = start;
+			break;
+		}
+		/* Delete nodes before new start */
+		list_del(&node->list);
+		kfree(node);
+	}
+
+	/* Adjust iova list end */
+	list_for_each_entry_safe(node, next, iova, list) {
+		if (end > node->end)
+			continue;
+		if ((end > node->start) && (end <= node->end)) {
+			node->end = end;
+			continue;
+		}
+		/* Delete nodes after new end */
+		list_del(&node->list);
+		kfree(node);
+	}
+
+	return 0;
+}
+
+static void vfio_iommu_iova_free(struct list_head *iova)
+{
+	struct vfio_iova *n, *next;
+
+	list_for_each_entry_safe(n, next, iova, list) {
+		list_del(&n->list);
+		kfree(n);
+	}
+}
+
+static int vfio_iommu_iova_get_copy(struct vfio_iommu *iommu,
+				    struct list_head *iova_copy)
+{
+
+	struct list_head *iova = &iommu->iova_list;
+	struct vfio_iova *n;
+	int ret;
+
+	list_for_each_entry(n, iova, list) {
+		ret = vfio_iommu_iova_insert(iova_copy, n->start, n->end);
+		if (ret)
+			goto out_free;
+	}
+
+	return 0;
+
+out_free:
+	vfio_iommu_iova_free(iova_copy);
+	return ret;
+}
+
+static void vfio_iommu_iova_insert_copy(struct vfio_iommu *iommu,
+					struct list_head *iova_copy)
+{
+	struct list_head *iova = &iommu->iova_list;
+
+	vfio_iommu_iova_free(iova);
+
+	list_splice_tail(iova_copy, iova);
+}
+
 static int vfio_iommu_type1_attach_group(void *iommu_data,
 					 struct iommu_group *iommu_group)
 {
@@ -1214,6 +1364,8 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	int ret;
 	bool resv_msi, msi_remap;
 	phys_addr_t resv_msi_base;
+	struct iommu_domain_geometry geo;
+	LIST_HEAD(iova_copy);
 
 	mutex_lock(&iommu->lock);
 
@@ -1283,6 +1435,25 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	if (ret)
 		goto out_domain;
 
+	/* Get aperture info */
+	iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY, &geo);
+
+	if (vfio_iommu_aper_conflict(iommu, geo.aperture_start,
+					    geo.aperture_end)) {
+		ret = -EINVAL;
+		goto out_detach;
+	}
+
+	/* Get a copy of the current iova list and work on it */
+	ret = vfio_iommu_iova_get_copy(iommu, &iova_copy);
+	if (ret)
+		goto out_detach;
+
+	ret = vfio_iommu_aper_resize(&iova_copy, geo.aperture_start,
+						 geo.aperture_end);
+	if (ret)
+		goto out_detach;
+
 	resv_msi = vfio_iommu_has_sw_msi(iommu_group, &resv_msi_base);
 
 	INIT_LIST_HEAD(&domain->group_list);
@@ -1316,8 +1487,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 				list_add(&group->next, &d->group_list);
 				iommu_domain_free(domain->domain);
 				kfree(domain);
-				mutex_unlock(&iommu->lock);
-				return 0;
+				goto done;
 			}
 
 			ret = iommu_attach_group(domain->domain, iommu_group);
@@ -1340,7 +1510,9 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	}
 
 	list_add(&domain->next, &iommu->domain_list);
-
+done:
+	/* Delete the old list and insert the new iova list */
+	vfio_iommu_iova_insert_copy(iommu, &iova_copy);
 	mutex_unlock(&iommu->lock);
 
 	return 0;
@@ -1349,6 +1521,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	iommu_detach_group(domain->domain, iommu_group);
 out_domain:
 	iommu_domain_free(domain->domain);
+	vfio_iommu_iova_free(&iova_copy);
 out_free:
 	kfree(domain);
 	kfree(group);
@@ -1487,6 +1660,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
 	}
 
 	INIT_LIST_HEAD(&iommu->domain_list);
+	INIT_LIST_HEAD(&iommu->iova_list);
 	iommu->dma_list = RB_ROOT;
 	mutex_init(&iommu->lock);
 	BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
@@ -1529,6 +1703,9 @@ static void vfio_iommu_type1_release(void *iommu_data)
 		list_del(&domain->next);
 		kfree(domain);
 	}
+
+	vfio_iommu_iova_free(&iommu->iova_list);
+
 	kfree(iommu);
 }
 
-- 
2.7.4

* [PATCH v5 2/7] vfio/type1: Check reserve region conflict and update iova list
From: Shameer Kolothum @ 2018-03-15 16:35 UTC
  To: alex.williamson, eric.auger, pmorel
  Cc: kvm, linux-kernel, iommu, linuxarm, john.garry, xuwei5, Shameer Kolothum

This retrieves the reserved regions associated with the device group
and checks for conflicts with any existing dma mappings. It also
updates the iova list to exclude the reserved regions.
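
A hypothetical example of the exclusion done by
vfio_iommu_resv_exclude() (values purely illustrative):

	/*
	 * Valid range:     [0x0       - 0xffffffff]
	 * Reserved region: [0x8000000 - 0x80fffff]  (e.g. an MSI window)
	 *
	 * Resulting list:  [0x0 - 0x7ffffff] [0x8100000 - 0xffffffff]
	 */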

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 drivers/vfio/vfio_iommu_type1.c | 90 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 90 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 1123c74..cfe2bb2 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1313,6 +1313,82 @@ static int vfio_iommu_aper_resize(struct list_head *iova,
 	return 0;
 }
 
+/*
+ * Check whether any reserved region conflicts with existing dma mappings
+ */
+static bool vfio_iommu_resv_conflict(struct vfio_iommu *iommu,
+				struct list_head *resv_regions)
+{
+	struct iommu_resv_region *region;
+
+	/* Check for conflict with existing dma mappings */
+	list_for_each_entry(region, resv_regions, list) {
+		if (vfio_find_dma(iommu, region->start, region->length))
+			return true;
+	}
+
+	return false;
+}
+
+/*
+ * Check for iova ranges that overlap with reserved regions and
+ * exclude them from the iommu iova range
+ */
+static int vfio_iommu_resv_exclude(struct list_head *iova,
+					struct list_head *resv_regions)
+{
+	struct iommu_resv_region *resv;
+	struct vfio_iova *n, *next;
+
+	list_for_each_entry(resv, resv_regions, list) {
+		phys_addr_t start, end;
+
+		start = resv->start;
+		end = resv->start + resv->length - 1;
+
+		list_for_each_entry_safe(n, next, iova, list) {
+			int ret = 0;
+
+			/* No overlap */
+			if ((start > n->end) || (end < n->start))
+				continue;
+			/*
+			 * Insert a new node if the current node overlaps with
+			 * the reserved region, to exclude that range from the
+			 * valid iova ranges. Note that the new node is
+			 * inserted before the current node, and the current
+			 * node is finally deleted, keeping the list updated
+			 * and sorted.
+			 */
+			if (start > n->start)
+				ret = vfio_iommu_iova_insert(&n->list,
+							n->start, start - 1);
+			if (!ret && end < n->end)
+				ret = vfio_iommu_iova_insert(&n->list,
+							end + 1, n->end);
+			if (ret)
+				return ret;
+
+			list_del(&n->list);
+			kfree(n);
+		}
+	}
+
+	if (list_empty(iova))
+		return -EINVAL;
+
+	return 0;
+}
+
+static void vfio_iommu_resv_free(struct list_head *resv_regions)
+{
+	struct iommu_resv_region *n, *next;
+
+	list_for_each_entry_safe(n, next, resv_regions, list) {
+		list_del(&n->list);
+		kfree(n);
+	}
+}
+
 static void vfio_iommu_iova_free(struct list_head *iova)
 {
 	struct vfio_iova *n, *next;
@@ -1366,6 +1442,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	phys_addr_t resv_msi_base;
 	struct iommu_domain_geometry geo;
 	LIST_HEAD(iova_copy);
+	LIST_HEAD(group_resv_regions);
 
 	mutex_lock(&iommu->lock);
 
@@ -1444,6 +1521,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 		goto out_detach;
 	}
 
+	iommu_get_group_resv_regions(iommu_group, &group_resv_regions);
+
+	if (vfio_iommu_resv_conflict(iommu, &group_resv_regions)) {
+		ret = -EINVAL;
+		goto out_detach;
+	}
+
 	/* Get a copy of the current iova list and work on it */
 	ret = vfio_iommu_iova_get_copy(iommu, &iova_copy);
 	if (ret)
@@ -1454,6 +1538,10 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	if (ret)
 		goto out_detach;
 
+	ret = vfio_iommu_resv_exclude(&iova_copy, &group_resv_regions);
+	if (ret)
+		goto out_detach;
+
 	resv_msi = vfio_iommu_has_sw_msi(iommu_group, &resv_msi_base);
 
 	INIT_LIST_HEAD(&domain->group_list);
@@ -1514,6 +1602,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	/* Delete the old one and insert new iova list */
 	vfio_iommu_iova_insert_copy(iommu, &iova_copy);
 	mutex_unlock(&iommu->lock);
+	vfio_iommu_resv_free(&group_resv_regions);
 
 	return 0;
 
@@ -1522,6 +1611,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 out_domain:
 	iommu_domain_free(domain->domain);
 	vfio_iommu_iova_free(&iova_copy);
+	vfio_iommu_resv_free(&group_resv_regions);
 out_free:
 	kfree(domain);
 	kfree(group);
-- 
2.7.4

* [PATCH v5 3/7] vfio/type1: Update iova list on detach
From: Shameer Kolothum @ 2018-03-15 16:35 UTC
  To: alex.williamson, eric.auger, pmorel
  Cc: kvm, linux-kernel, iommu, linuxarm, john.garry, xuwei5, Shameer Kolothum

Get a copy of the iova list on _group_detach and try to update it.
On success, replace the current list with the copy. Leave the list
as it is if the update fails.
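
A hypothetical example (values purely illustrative):

	/*
	 * Domain A aperture: [0x1000 - 0xffffffff]
	 * Domain B aperture: [0x0    - 0x7fffffff]
	 * Working window:    [0x1000 - 0x7fffffff]
	 *
	 * Detaching the last group of domain B frees that domain;
	 * vfio_iommu_aper_expand() then widens the window back to
	 * [0x1000 - 0xffffffff], after which vfio_iommu_resv_refresh()
	 * excludes the reserved regions of the remaining groups again.
	 */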

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 drivers/vfio/vfio_iommu_type1.c | 91 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 91 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index cfe2bb2..25e6920 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1667,12 +1667,88 @@ static void vfio_sanity_check_pfn_list(struct vfio_iommu *iommu)
 	WARN_ON(iommu->notifier.head);
 }
 
+/*
+ * Called when a domain is removed at detach. It is possible that
+ * the removed domain decided the iova aperture window. Recompute the
+ * iova aperture as the smallest window among the remaining domains.
+ */
+static void vfio_iommu_aper_expand(struct vfio_iommu *iommu,
+				   struct list_head *iova_copy)
+{
+	struct vfio_domain *domain;
+	struct iommu_domain_geometry geo;
+	struct vfio_iova *node;
+	dma_addr_t start = 0;
+	dma_addr_t end = (dma_addr_t)~0;
+
+	list_for_each_entry(domain, &iommu->domain_list, next) {
+		iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY,
+				      &geo);
+		if (geo.aperture_start > start)
+			start = geo.aperture_start;
+		if (geo.aperture_end < end)
+			end = geo.aperture_end;
+	}
+
+	/* Modify aperture limits. The new aperture is the same or bigger */
+	node = list_first_entry(iova_copy, struct vfio_iova, list);
+	node->start = start;
+	node = list_last_entry(iova_copy, struct vfio_iova, list);
+	node->end = end;
+}
+
+/*
+ * Called when a group is detached. The reserved regions for that
+ * group can be part of valid iova now. But since reserved regions
+ * may be duplicated among groups, populate the iova valid regions
+ * list again.
+ */
+static int vfio_iommu_resv_refresh(struct vfio_iommu *iommu,
+				   struct list_head *iova_copy)
+{
+	struct vfio_domain *d;
+	struct vfio_group *g;
+	struct vfio_iova *node;
+	dma_addr_t start, end;
+	LIST_HEAD(resv_regions);
+	int ret;
+
+	list_for_each_entry(d, &iommu->domain_list, next) {
+		list_for_each_entry(g, &d->group_list, next)
+			iommu_get_group_resv_regions(g->iommu_group,
+							 &resv_regions);
+	}
+
+	if (list_empty(&resv_regions))
+		return 0;
+
+	node = list_first_entry(iova_copy, struct vfio_iova, list);
+	start = node->start;
+	node = list_last_entry(iova_copy, struct vfio_iova, list);
+	end = node->end;
+
+	/* Purge the iova list and create a new one */
+	vfio_iommu_iova_free(iova_copy);
+
+	ret = vfio_iommu_aper_resize(iova_copy, start, end);
+	if (ret)
+		goto done;
+
+	/* Exclude current reserved regions from iova ranges */
+	ret = vfio_iommu_resv_exclude(iova_copy, &resv_regions);
+done:
+	vfio_iommu_resv_free(&resv_regions);
+	return ret;
+}
+
 static void vfio_iommu_type1_detach_group(void *iommu_data,
 					  struct iommu_group *iommu_group)
 {
 	struct vfio_iommu *iommu = iommu_data;
 	struct vfio_domain *domain;
 	struct vfio_group *group;
+	bool iova_copy_fail;
+	LIST_HEAD(iova_copy);
 
 	mutex_lock(&iommu->lock);
 
@@ -1695,6 +1771,12 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
 		}
 	}
 
+	/*
+	 * Get a copy of the iova list. If successful, use the copy to
+	 * update the list and then replace the current one.
+	 */
+	iova_copy_fail = !!vfio_iommu_iova_get_copy(iommu, &iova_copy);
+
 	list_for_each_entry(domain, &iommu->domain_list, next) {
 		group = find_iommu_group(domain, iommu_group);
 		if (!group)
@@ -1720,10 +1802,19 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
 			iommu_domain_free(domain->domain);
 			list_del(&domain->next);
 			kfree(domain);
+			if (!iova_copy_fail && !list_empty(&iommu->domain_list))
+				vfio_iommu_aper_expand(iommu, &iova_copy);
 		}
 		break;
 	}
 
+	if (!iova_copy_fail && !list_empty(&iommu->domain_list)) {
+		if (!vfio_iommu_resv_refresh(iommu, &iova_copy))
+			vfio_iommu_iova_insert_copy(iommu, &iova_copy);
+		else
+			vfio_iommu_iova_free(&iova_copy);
+	}
+
 detach_group_done:
 	mutex_unlock(&iommu->lock);
 }
-- 
2.7.4

* [PATCH v5 4/7] vfio/type1: check dma map request is within a valid iova range
From: Shameer Kolothum @ 2018-03-15 16:35 UTC
  To: alex.williamson, eric.auger, pmorel
  Cc: kvm, linux-kernel, iommu, linuxarm, john.garry, xuwei5, Shameer Kolothum

This checks and rejects any dma map request that falls outside the
valid iova ranges.
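
From user-space, a map request targeting an excluded hole now fails
with EINVAL. A sketch (the fd 'container', buffer 'buf'/'sz' and the
iova 0x8000000, standing in for an address inside a reserved region,
are all assumptions):

	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = (__u64)(uintptr_t)buf,
		.iova  = 0x8000000,	/* inside a reserved region */
		.size  = sz,
	};

	if (ioctl(container, VFIO_IOMMU_MAP_DMA, &map))
		perror("VFIO_IOMMU_MAP_DMA");	/* fails with EINVAL */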

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 drivers/vfio/vfio_iommu_type1.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 25e6920..d59db31 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -982,6 +982,23 @@ static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,
 	return ret;
 }
 
+/*
+ * Check that a dma map request is within a valid iova range
+ */
+static bool vfio_iommu_iova_dma_valid(struct vfio_iommu *iommu,
+				dma_addr_t start, dma_addr_t end)
+{
+	struct list_head *iova = &iommu->iova_list;
+	struct vfio_iova *node;
+
+	list_for_each_entry(node, iova, list) {
+		if ((start >= node->start) && (end <= node->end))
+			return true;
+	}
+
+	return false;
+}
+
 static int vfio_dma_do_map(struct vfio_iommu *iommu,
 			   struct vfio_iommu_type1_dma_map *map)
 {
@@ -1020,6 +1037,11 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 		goto out_unlock;
 	}
 
+	if (!vfio_iommu_iova_dma_valid(iommu, iova, iova + size - 1)) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
 	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
 	if (!dma) {
 		ret = -ENOMEM;
-- 
2.7.4

* [PATCH v5 5/7] vfio/type1: Add IOVA range capability support
@ 2018-03-15 16:35   ` Shameer Kolothum
  0 siblings, 0 replies; 63+ messages in thread
From: Shameer Kolothum @ 2018-03-15 16:35 UTC (permalink / raw)
  To: alex.williamson, eric.auger, pmorel
  Cc: kvm, linux-kernel, iommu, linuxarm, john.garry, xuwei5, Shameer Kolothum

This allows user-space to retrieve the supported IOVA range(s),
excluding any reserved regions. The implementation is based on
capability chains, added to the VFIO_IOMMU_GET_INFO ioctl.
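
A user-space consumer is expected to call VFIO_IOMMU_GET_INFO twice:
once to learn the required argsz, then again with a buffer large
enough to hold the capability chain. A minimal sketch of walking the
chain afterwards (assuming info points to that second, full-size
buffer; the resizing loop and error checks are elided):

	struct vfio_info_cap_header *hdr;
	struct vfio_iommu_type1_info_cap_iova_range *cap;
	__u32 off, i;

	if (info->flags & VFIO_IOMMU_INFO_CAPS) {
		for (off = info->cap_offset; off; off = hdr->next) {
			hdr = (struct vfio_info_cap_header *)((char *)info + off);
			if (hdr->id != VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE)
				continue;
			cap = (struct vfio_iommu_type1_info_cap_iova_range *)hdr;
			for (i = 0; i < cap->nr_iovas; i++)
				printf("valid iova: 0x%llx - 0x%llx\n",
				       (unsigned long long)cap->iova_ranges[i].start,
				       (unsigned long long)cap->iova_ranges[i].end);
		}
	}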

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 drivers/vfio/vfio_iommu_type1.c | 96 +++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h       | 23 ++++++++++
 2 files changed, 119 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index d59db31..90f195d 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1929,6 +1929,68 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
 	return ret;
 }
 
+static int vfio_iommu_iova_add_cap(struct vfio_info_cap *caps,
+		 struct vfio_iommu_type1_info_cap_iova_range *cap_iovas,
+		 size_t size)
+{
+	struct vfio_info_cap_header *header;
+	struct vfio_iommu_type1_info_cap_iova_range *iova_cap;
+
+	header = vfio_info_cap_add(caps, size,
+				VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE, 1);
+	if (IS_ERR(header))
+		return PTR_ERR(header);
+
+	iova_cap = container_of(header,
+			struct vfio_iommu_type1_info_cap_iova_range, header);
+	iova_cap->nr_iovas = cap_iovas->nr_iovas;
+	memcpy(iova_cap->iova_ranges, cap_iovas->iova_ranges,
+		cap_iovas->nr_iovas * sizeof(*cap_iovas->iova_ranges));
+	return 0;
+}
+
+static int vfio_iommu_iova_build_caps(struct vfio_iommu *iommu,
+				struct vfio_info_cap *caps)
+{
+	struct vfio_iommu_type1_info_cap_iova_range *cap_iovas;
+	struct vfio_iova *iova;
+	size_t size;
+	int iovas = 0, i = 0, ret;
+
+	mutex_lock(&iommu->lock);
+
+	list_for_each_entry(iova, &iommu->iova_list, list)
+		iovas++;
+
+	if (!iovas) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	size = sizeof(*cap_iovas) + (iovas * sizeof(*cap_iovas->iova_ranges));
+
+	cap_iovas = kzalloc(size, GFP_KERNEL);
+	if (!cap_iovas) {
+		ret = -ENOMEM;
+		goto out_unlock;
+	}
+
+	cap_iovas->nr_iovas = iovas;
+
+	list_for_each_entry(iova, &iommu->iova_list, list) {
+		cap_iovas->iova_ranges[i].start = iova->start;
+		cap_iovas->iova_ranges[i].end = iova->end;
+		i++;
+	}
+
+	ret = vfio_iommu_iova_add_cap(caps, cap_iovas, size);
+
+	kfree(cap_iovas);
+out_unlock:
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
 				   unsigned int cmd, unsigned long arg)
 {
@@ -1950,19 +2012,53 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		}
 	} else if (cmd == VFIO_IOMMU_GET_INFO) {
 		struct vfio_iommu_type1_info info;
+		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
+		unsigned long capsz;
+		int ret;
 
 		minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
 
+		/* For backward compatibility, cannot require this */
+		capsz = offsetofend(struct vfio_iommu_type1_info, cap_offset);
+
 		if (copy_from_user(&info, (void __user *)arg, minsz))
 			return -EFAULT;
 
 		if (info.argsz < minsz)
 			return -EINVAL;
 
+		if (info.argsz >= capsz) {
+			minsz = capsz;
+			info.cap_offset = 0; /* output, no-recopy necessary */
+		}
+
 		info.flags = VFIO_IOMMU_INFO_PGSIZES;
 
 		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
 
+		ret = vfio_iommu_iova_build_caps(iommu, &caps);
+		if (ret)
+			return ret;
+
+		if (caps.size) {
+			info.flags |= VFIO_IOMMU_INFO_CAPS;
+
+			if (info.argsz < sizeof(info) + caps.size) {
+				info.argsz = sizeof(info) + caps.size;
+			} else {
+				vfio_info_cap_shift(&caps, sizeof(info));
+				if (copy_to_user((void __user *)arg +
+						sizeof(info), caps.buf,
+						caps.size)) {
+					kfree(caps.buf);
+					return -EFAULT;
+				}
+				info.cap_offset = sizeof(info);
+			}
+
+			kfree(caps.buf);
+		}
+
 		return copy_to_user((void __user *)arg, &info, minsz) ?
 			-EFAULT : 0;
 
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index c743721..46b49e9 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -589,7 +589,30 @@ struct vfio_iommu_type1_info {
 	__u32	argsz;
 	__u32	flags;
 #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
+#define VFIO_IOMMU_INFO_CAPS	(1 << 1)	/* Info supports caps */
 	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
+	__u32   cap_offset;	/* Offset within info struct of first cap */
+};
+
+/*
+ * The IOVA capability allows reporting the valid IOVA range(s),
+ * excluding any reserved regions associated with the device group. Any
+ * dma map attempt outside the valid iova ranges will return an error.
+ *
+ * The structures below define version 1 of this capability.
+ */
+#define VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE  1
+
+struct vfio_iova_range {
+	__u64	start;
+	__u64	end;
+};
+
+struct vfio_iommu_type1_info_cap_iova_range {
+	struct vfio_info_cap_header header;
+	__u32	nr_iovas;
+	__u32	reserved;
+	struct vfio_iova_range iova_ranges[];
 };
 
 #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v5 6/7] vfio/type1: remove duplicate retrieval of reserved regions
  2018-03-15 16:35 ` Shameer Kolothum
@ 2018-03-15 16:35   ` Shameer Kolothum
  -1 siblings, 0 replies; 63+ messages in thread
From: Shameer Kolothum @ 2018-03-15 16:35 UTC (permalink / raw)
  To: alex.williamson, eric.auger, pmorel
  Cc: kvm, linux-kernel, iommu, linuxarm, john.garry, xuwei5, Shameer Kolothum

As we now already have the reserved regions list, just pass that into
the vfio_iommu_has_sw_msi() function.
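
The resulting call pattern in the attach path then looks roughly like
this (a sketch based on the earlier patches in this series, not an
exact quote of the final code):

	LIST_HEAD(group_resv_regions);

	iommu_get_group_resv_regions(iommu_group, &group_resv_regions);
	/* ... reserved region conflict checks and iova list updates ... */
	resv_msi = vfio_iommu_has_sw_msi(&group_resv_regions, &resv_msi_base);
	/* ... */
	vfio_iommu_resv_free(&group_resv_regions);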

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 drivers/vfio/vfio_iommu_type1.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 90f195d..65c13e7 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1204,15 +1204,13 @@ static struct vfio_group *find_iommu_group(struct vfio_domain *domain,
 	return NULL;
 }
 
-static bool vfio_iommu_has_sw_msi(struct iommu_group *group, phys_addr_t *base)
+static bool vfio_iommu_has_sw_msi(struct list_head *group_resv_regions,
+						phys_addr_t *base)
 {
-	struct list_head group_resv_regions;
-	struct iommu_resv_region *region, *next;
+	struct iommu_resv_region *region;
 	bool ret = false;
 
-	INIT_LIST_HEAD(&group_resv_regions);
-	iommu_get_group_resv_regions(group, &group_resv_regions);
-	list_for_each_entry(region, &group_resv_regions, list) {
+	list_for_each_entry(region, group_resv_regions, list) {
 		/*
 		 * The presence of any 'real' MSI regions should take
 		 * precedence over the software-managed one if the
@@ -1228,8 +1226,7 @@ static bool vfio_iommu_has_sw_msi(struct iommu_group *group, phys_addr_t *base)
 			ret = true;
 		}
 	}
-	list_for_each_entry_safe(region, next, &group_resv_regions, list)
-		kfree(region);
+
 	return ret;
 }
 
@@ -1564,7 +1561,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	if (ret)
 		goto out_detach;
 
-	resv_msi = vfio_iommu_has_sw_msi(iommu_group, &resv_msi_base);
+	resv_msi = vfio_iommu_has_sw_msi(&group_resv_regions, &resv_msi_base);
 
 	INIT_LIST_HEAD(&domain->group_list);
 	list_add(&group->next, &domain->group_list);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v5 7/7] iommu/dma: Move PCI window region reservation back into dma specific path.
@ 2018-03-15 16:35   ` Shameer Kolothum
  0 siblings, 0 replies; 63+ messages in thread
From: Shameer Kolothum @ 2018-03-15 16:35 UTC (permalink / raw)
  To: alex.williamson, eric.auger, pmorel
  Cc: kvm, linux-kernel, iommu, linuxarm, john.garry, xuwei5,
	Shameer Kolothum, Robin Murphy, Joerg Roedel

This pretty much reverts commit 273df9635385 ("iommu/dma: Make PCI
window reservation generic") by moving the PCI window region
reservation back into the dma specific path, so that these regions
don't get exposed via the IOMMU API interface. With this change,
the vfio interface will report only iommu specific reserved regions
to user space.
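
With this, a PCI host bridge memory window simply punches a hole in
the DMA iova allocator instead of appearing as a reserved region to
user space. For instance, for a hypothetical bridge window resource
spanning 0xe0000000-0xefffffff with a zero offset, the reservation
reduces to:

	lo = iova_pfn(iovad, 0xe0000000);
	hi = iova_pfn(iovad, 0xefffffff);
	reserve_iova(iovad, lo, hi);	/* excluded from DMA iova allocation */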

Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 drivers/iommu/dma-iommu.c | 54 ++++++++++++++++++++++-------------------------
 1 file changed, 25 insertions(+), 29 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f05f3cf..ddcbbdb 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -167,40 +167,16 @@ EXPORT_SYMBOL(iommu_put_dma_cookie);
  * @list: Reserved region list from iommu_get_resv_regions()
  *
  * IOMMU drivers can use this to implement their .get_resv_regions callback
- * for general non-IOMMU-specific reservations. Currently, this covers host
- * bridge windows for PCI devices and GICv3 ITS region reservation on ACPI
- * based ARM platforms that may require HW MSI reservation.
+ * for general non-IOMMU-specific reservations. Currently, this covers GICv3
+ * ITS region reservation on ACPI based ARM platforms that may require HW MSI
+ * reservation.
  */
 void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list)
 {
-	struct pci_host_bridge *bridge;
-	struct resource_entry *window;
-
-	if (!is_of_node(dev->iommu_fwspec->iommu_fwnode) &&
-		iort_iommu_msi_get_resv_regions(dev, list) < 0)
-		return;
-
-	if (!dev_is_pci(dev))
-		return;
-
-	bridge = pci_find_host_bridge(to_pci_dev(dev)->bus);
-	resource_list_for_each_entry(window, &bridge->windows) {
-		struct iommu_resv_region *region;
-		phys_addr_t start;
-		size_t length;
-
-		if (resource_type(window->res) != IORESOURCE_MEM)
-			continue;
 
-		start = window->res->start - window->offset;
-		length = window->res->end - window->res->start + 1;
-		region = iommu_alloc_resv_region(start, length, 0,
-				IOMMU_RESV_RESERVED);
-		if (!region)
-			return;
+	if (!is_of_node(dev->iommu_fwspec->iommu_fwnode))
+		iort_iommu_msi_get_resv_regions(dev, list);
 
-		list_add_tail(&region->list, list);
-	}
 }
 EXPORT_SYMBOL(iommu_dma_get_resv_regions);
 
@@ -229,6 +205,23 @@ static int cookie_init_hw_msi_region(struct iommu_dma_cookie *cookie,
 	return 0;
 }
 
+static void iova_reserve_pci_windows(struct pci_dev *dev,
+		struct iova_domain *iovad)
+{
+	struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
+	struct resource_entry *window;
+	unsigned long lo, hi;
+
+	resource_list_for_each_entry(window, &bridge->windows) {
+		if (resource_type(window->res) != IORESOURCE_MEM)
+			continue;
+
+		lo = iova_pfn(iovad, window->res->start - window->offset);
+		hi = iova_pfn(iovad, window->res->end - window->offset);
+		reserve_iova(iovad, lo, hi);
+	}
+}
+
 static int iova_reserve_iommu_regions(struct device *dev,
 		struct iommu_domain *domain)
 {
@@ -238,6 +231,9 @@ static int iova_reserve_iommu_regions(struct device *dev,
 	LIST_HEAD(resv_regions);
 	int ret = 0;
 
+	if (dev_is_pci(dev))
+		iova_reserve_pci_windows(to_pci_dev(dev), iovad);
+
 	iommu_get_resv_regions(dev, &resv_regions);
 	list_for_each_entry(region, &resv_regions, list) {
 		unsigned long lo, hi;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* RE: [PATCH v5 2/7] vfio/type1: Check reserve region conflict and update iova list
@ 2018-03-19  7:51     ` Tian, Kevin
  0 siblings, 0 replies; 63+ messages in thread
From: Tian, Kevin @ 2018-03-19  7:51 UTC (permalink / raw)
  To: Shameer Kolothum, alex.williamson, eric.auger, pmorel
  Cc: kvm, linux-kernel, xuwei5, linuxarm, iommu

> From: Shameer Kolothum
> Sent: Friday, March 16, 2018 12:35 AM
> 
> This retrieves the reserved regions associated with dev group and
> checks for conflicts with any existing dma mappings. Also update
> the iova list excluding the reserved regions.
> 
> Signed-off-by: Shameer Kolothum
> <shameerali.kolothum.thodi@huawei.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 90 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 90 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 1123c74..cfe2bb2 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1313,6 +1313,82 @@ static int vfio_iommu_aper_resize(struct list_head *iova,
>  	return 0;
>  }
> 
> +/*
> + * Check reserved region conflicts with existing dma mappings
> + */
> +static bool vfio_iommu_resv_conflict(struct vfio_iommu *iommu,
> +				struct list_head *resv_regions)
> +{
> +	struct iommu_resv_region *region;
> +
> +	/* Check for conflict with existing dma mappings */
> +	list_for_each_entry(region, resv_regions, list) {
> +		if (vfio_find_dma(iommu, region->start, region->length))
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
> +/*
> + * Check iova region overlap with  reserved regions and
> + * exclude them from the iommu iova range
> + */
> +static int vfio_iommu_resv_exclude(struct list_head *iova,
> +					struct list_head *resv_regions)
> +{
> +	struct iommu_resv_region *resv;
> +	struct vfio_iova *n, *next;
> +
> +	list_for_each_entry(resv, resv_regions, list) {
> +		phys_addr_t start, end;
> +
> +		start = resv->start;
> +		end = resv->start + resv->length - 1;
> +
> +		list_for_each_entry_safe(n, next, iova, list) {
> +			int ret = 0;
> +
> +			/* No overlap */
> +			if ((start > n->end) || (end < n->start))
> +				continue;
> +			/*
> +			 * Insert a new node if current node overlaps with the
> +			 * reserve region to exclude that from valid iova range.
> +			 * Note that, new node is inserted before the current
> +			 * node and finally the current node is deleted keeping
> +			 * the list updated and sorted.
> +			 */
> +			if (start > n->start)
> +				ret = vfio_iommu_iova_insert(&n->list,
> +							n->start, start - 1);
> +			if (!ret && end < n->end)
> +				ret = vfio_iommu_iova_insert(&n->list,
> +							end + 1, n->end);
> +			if (ret)
> +				return ret;

Is it safer to delete the 1st node here in case inserting the 2nd node fails?
There is no problem with the current logic, since upon error iova_copy will
be released anyway. However, this function alone doesn't assume it is
operating on a temporary list, so it's better to keep the list clean w/o
garbage left over from any error handling.

> +
> +			list_del(&n->list);
> +			kfree(n);
> +		}
> +	}
> +
> +	if (list_empty(iova))
> +		return -EINVAL;

If the list is empty, should this be a BUG_ON here? Or do we really need this check?

> +
> +	return 0;
> +}
> +
> +static void vfio_iommu_resv_free(struct list_head *resv_regions)
> +{
> +	struct iommu_resv_region *n, *next;
> +
> +	list_for_each_entry_safe(n, next, resv_regions, list) {
> +		list_del(&n->list);
> +		kfree(n);
> +	}
> +}
> +
>  static void vfio_iommu_iova_free(struct list_head *iova)
>  {
>  	struct vfio_iova *n, *next;
> @@ -1366,6 +1442,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  	phys_addr_t resv_msi_base;
>  	struct iommu_domain_geometry geo;
>  	LIST_HEAD(iova_copy);
> +	LIST_HEAD(group_resv_regions);
> 
>  	mutex_lock(&iommu->lock);
> 
> @@ -1444,6 +1521,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  		goto out_detach;
>  	}
> 
> +	iommu_get_group_resv_regions(iommu_group, &group_resv_regions);
> +
> +	if (vfio_iommu_resv_conflict(iommu, &group_resv_regions)) {
> +		ret = -EINVAL;
> +		goto out_detach;
> +	}
> +
>  	/* Get a copy of the current iova list and work on it */
>  	ret = vfio_iommu_iova_get_copy(iommu, &iova_copy);
>  	if (ret)
> @@ -1454,6 +1538,10 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  	if (ret)
>  		goto out_detach;
> 
> +	ret = vfio_iommu_resv_exclude(&iova_copy, &group_resv_regions);
> +	if (ret)
> +		goto out_detach;
> +
>  	resv_msi = vfio_iommu_has_sw_msi(iommu_group,
> &resv_msi_base);
> 
>  	INIT_LIST_HEAD(&domain->group_list);
> @@ -1514,6 +1602,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  	/* Delete the old one and insert new iova list */
>  	vfio_iommu_iova_insert_copy(iommu, &iova_copy);
>  	mutex_unlock(&iommu->lock);
> +	vfio_iommu_resv_free(&group_resv_regions);
> 
>  	return 0;
> 
> @@ -1522,6 +1611,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  out_domain:
>  	iommu_domain_free(domain->domain);
>  	vfio_iommu_iova_free(&iova_copy);
> +	vfio_iommu_resv_free(&group_resv_regions);
>  out_free:
>  	kfree(domain);
>  	kfree(group);
> --
> 2.7.4
> 
> 
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 63+ messages in thread

* RE: [PATCH v5 0/7] vfio/type1: Add support for valid iova list management
@ 2018-03-19  8:28   ` Tian, Kevin
  0 siblings, 0 replies; 63+ messages in thread
From: Tian, Kevin @ 2018-03-19  8:28 UTC (permalink / raw)
  To: Shameer Kolothum, alex.williamson, eric.auger, pmorel
  Cc: kvm, linux-kernel, xuwei5, linuxarm, iommu

> From: Shameer Kolothum
> Sent: Friday, March 16, 2018 12:35 AM
> 
> This series introduces an iova list associated with a vfio
> iommu. The list is kept updated taking care of iommu apertures,
> and reserved regions. Also this series adds checks for any conflict
> with existing dma mappings whenever a new device group is attached to
> the domain.
> 
> User-space can retrieve valid iova ranges using VFIO_IOMMU_GET_INFO
> ioctl capability chains. Any dma map request outside the valid iova
> range will be rejected.

GET_INFO is done at initialization time, which is good for cold-attached
devices. If a hotplugged device causes a change of the valid iova ranges
at run-time, then there could be a potential problem (which, however, is
difficult for user space or the orchestration stack to figure out in advance).
Can we do some extension like below to make the hotplug case cleaner?

- An interface allowing user space to request that VFIO reject further
attach_group if doing so may cause an iova range change, e.g. Qemu can
make such a request once the initial GET_INFO completes;

- or an event notification to user space upon a change of the valid iova
ranges when attaching a new device at run-time. This goes one step
further - even if an attach may cause an iova range change, it may still
succeed as long as Qemu hasn't allocated any iova in the impacted
range.

Thanks
Kevin

> 
> 
> v4 --> v5
> Rebased to next-20180315.
> 
>  -Incorporated the corner case bug fix suggested by Alex to patch #5.
>  -Based on suggestions by Alex and Robin, added patch#7. This
>   moves the PCI window  reservation back in to DMA specific path.
>   This is to fix the issue reported by Eric[1].
> 
> Note:
> The patch #7 has dependency with [2][3]
> 
> 1. https://patchwork.kernel.org/patch/10232043/
> 2. https://patchwork.kernel.org/patch/10216553/
> 3. https://patchwork.kernel.org/patch/10216555/
> 
> v3 --> v4
>  Addressed comments received for v3.
>  -dma_addr_t instead of phys_addr_t
>  -LIST_HEAD() usage.
>  -Free up iova_copy list in case of error.
>  -updated logic in filling the iova caps info(patch #5)
> 
> RFCv2 --> v3
>  Removed RFC tag.
>  Addressed comments from Alex and Eric:
>  - Added comments to make iova list management logic more clear.
>  - Use of iova list copy so that original is not altered in
>    case of failure.
> 
> RFCv1 --> RFCv2
>  Addressed comments from Alex:
> -Introduced IOVA list management and added checks for conflicts with
>  existing dma map entries during attach/detach.
> 
> Shameer Kolothum (2):
>   vfio/type1: Add IOVA range capability support
>   iommu/dma: Move PCI window region reservation back into dma specific
>     path.
> 
> Shameerali Kolothum Thodi (5):
>   vfio/type1: Introduce iova list and add iommu aperture validity check
>   vfio/type1: Check reserve region conflict and update iova list
>   vfio/type1: Update iova list on detach
>   vfio/type1: check dma map request is within a valid iova range
>   vfio/type1: remove duplicate retrieval of reserved regions
> 
>  drivers/iommu/dma-iommu.c       |  54 ++---
>  drivers/vfio/vfio_iommu_type1.c | 497 +++++++++++++++++++++++++++++++++++++++-
>  include/uapi/linux/vfio.h       |  23 ++
>  3 files changed, 533 insertions(+), 41 deletions(-)
> 
> --
> 2.7.4
> 
> 
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 63+ messages in thread

* RE: [PATCH v5 0/7] vfio/type1: Add support for valid iova list management
@ 2018-03-19 10:54     ` Shameerali Kolothum Thodi
  0 siblings, 0 replies; 63+ messages in thread
From: Shameerali Kolothum Thodi @ 2018-03-19 10:54 UTC (permalink / raw)
  To: Tian, Kevin, alex.williamson, eric.auger, pmorel
  Cc: kvm, linux-kernel, xuwei (O), Linuxarm, iommu

Hi Kevin,

Thanks for taking a look at this series.

> -----Original Message-----
> From: Tian, Kevin [mailto:kevin.tian@intel.com]
> Sent: Monday, March 19, 2018 8:29 AM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> alex.williamson@redhat.com; eric.auger@redhat.com;
> pmorel@linux.vnet.ibm.com
> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org; xuwei (O)
> <xuwei5@huawei.com>; Linuxarm <linuxarm@huawei.com>;
> iommu@lists.linux-foundation.org
> Subject: RE: [PATCH v5 0/7] vfio/type1: Add support for valid iova list
> management
> 
> > From: Shameer Kolothum
> > Sent: Friday, March 16, 2018 12:35 AM
> >
> > This series introduces an iova list associated with a vfio
> > iommu. The list is kept updated taking care of iommu apertures,
> > and reserved regions. Also this series adds checks for any conflict
> > with existing dma mappings whenever a new device group is attached to
> > the domain.
> >
> > User-space can retrieve valid iova ranges using VFIO_IOMMU_GET_INFO
> > ioctl capability chains. Any dma map request outside the valid iova
> > range will be rejected.
> 
> GET_INFO is done at initialization time, which is good for cold-attached
> devices. If a hotplugged device causes a change of the valid iova ranges
> at run-time, then there could be a potential problem (which, however, is
> difficult for user space or the orchestration stack to figure out in advance).
> Can we do some extension like below to make the hotplug case cleaner?

As I understand it, in case a hotplugged device results in an update to the
valid iova ranges, then in Qemu the vfio_connect_container() -->
memory_listener_register() path will fail if there is a conflict, as patch #4
checks for invalid dma map requests.

I am not sure whether your concern is about preventing hotplug well before
this happens or not.

Thanks,
Shameer

> - An interface allowing user space to request that VFIO reject further
> attach_group if doing so may cause an iova range change, e.g. Qemu can
> make such a request once the initial GET_INFO completes;
> 
> - or an event notification to user space upon a change of the valid iova
> ranges when attaching a new device at run-time. This goes one step
> further - even if an attach may cause an iova range change, it may still
> succeed as long as Qemu hasn't allocated any iova in the impacted
> range.
> 
> Thanks
> Kevin
> 
> >
> >
> > v4 --> v5
> > Rebased to next-20180315.
> >
> >  -Incorporated the corner case bug fix suggested by Alex to patch #5.
> >  -Based on suggestions by Alex and Robin, added patch#7. This
> >   moves the PCI window  reservation back in to DMA specific path.
> >   This is to fix the issue reported by Eric[1].
> >
> > Note:
> > The patch #7 has dependency with [2][3]
> >
> > 1. https://patchwork.kernel.org/patch/10232043/
> > 2. https://patchwork.kernel.org/patch/10216553/
> > 3. https://patchwork.kernel.org/patch/10216555/
> >
> > v3 --> v4
> >  Addressed comments received for v3.
> >  -dma_addr_t instead of phys_addr_t
> >  -LIST_HEAD() usage.
> >  -Free up iova_copy list in case of error.
> >  -updated logic in filling the iova caps info(patch #5)
> >
> > RFCv2 --> v3
> >  Removed RFC tag.
> >  Addressed comments from Alex and Eric:
> >  - Added comments to make iova list management logic more clear.
> >  - Use of iova list copy so that original is not altered in
> >    case of failure.
> >
> > RFCv1 --> RFCv2
> >  Addressed comments from Alex:
> > -Introduced IOVA list management and added checks for conflicts with
> >  existing dma map entries during attach/detach.
> >
> > Shameer Kolothum (2):
> >   vfio/type1: Add IOVA range capability support
> >   iommu/dma: Move PCI window region reservation back into dma specific
> >     path.
> >
> > Shameerali Kolothum Thodi (5):
> >   vfio/type1: Introduce iova list and add iommu aperture validity check
> >   vfio/type1: Check reserve region conflict and update iova list
> >   vfio/type1: Update iova list on detach
> >   vfio/type1: check dma map request is within a valid iova range
> >   vfio/type1: remove duplicate retrieval of reserved regions
> >
> >  drivers/iommu/dma-iommu.c       |  54 ++---
> >  drivers/vfio/vfio_iommu_type1.c | 497 +++++++++++++++++++++++++++++++++++++++-
> >  include/uapi/linux/vfio.h       |  23 ++
> >  3 files changed, 533 insertions(+), 41 deletions(-)
> >
> > --
> > 2.7.4
> >
> >
> > _______________________________________________
> > iommu mailing list
> > iommu@lists.linux-foundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 63+ messages in thread

* RE: [PATCH v5 2/7] vfio/type1: Check reserve region conflict and update iova list
@ 2018-03-19 10:55       ` Shameerali Kolothum Thodi
  0 siblings, 0 replies; 63+ messages in thread
From: Shameerali Kolothum Thodi @ 2018-03-19 10:55 UTC (permalink / raw)
  To: Tian, Kevin, alex.williamson, eric.auger, pmorel
  Cc: kvm, linux-kernel, xuwei (O), Linuxarm, iommu



> -----Original Message-----
> From: Tian, Kevin [mailto:kevin.tian@intel.com]
> Sent: Monday, March 19, 2018 7:52 AM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> alex.williamson@redhat.com; eric.auger@redhat.com;
> pmorel@linux.vnet.ibm.com
> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org; xuwei (O)
> <xuwei5@huawei.com>; Linuxarm <linuxarm@huawei.com>;
> iommu@lists.linux-foundation.org
> Subject: RE: [PATCH v5 2/7] vfio/type1: Check reserve region conflict and
> update iova list
> 
> > From: Shameer Kolothum
> > Sent: Friday, March 16, 2018 12:35 AM
> >
> > This retrieves the reserved regions associated with dev group and
> > checks for conflicts with any existing dma mappings. Also update
> > the iova list excluding the reserved regions.
> >
> > Signed-off-by: Shameer Kolothum
> > <shameerali.kolothum.thodi@huawei.com>
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 90
> > +++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 90 insertions(+)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > b/drivers/vfio/vfio_iommu_type1.c
> > index 1123c74..cfe2bb2 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -1313,6 +1313,82 @@ static int vfio_iommu_aper_resize(struct
> > list_head *iova,
> >  	return 0;
> >  }
> >
> > +/*
> > + * Check reserved region conflicts with existing dma mappings
> > + */
> > +static bool vfio_iommu_resv_conflict(struct vfio_iommu *iommu,
> > +				struct list_head *resv_regions)
> > +{
> > +	struct iommu_resv_region *region;
> > +
> > +	/* Check for conflict with existing dma mappings */
> > +	list_for_each_entry(region, resv_regions, list) {
> > +		if (vfio_find_dma(iommu, region->start, region->length))
> > +			return true;
> > +	}
> > +
> > +	return false;
> > +}
> > +
> > +/*
> > + * Check iova region overlap with  reserved regions and
> > + * exclude them from the iommu iova range
> > + */
> > +static int vfio_iommu_resv_exclude(struct list_head *iova,
> > +					struct list_head *resv_regions)
> > +{
> > +	struct iommu_resv_region *resv;
> > +	struct vfio_iova *n, *next;
> > +
> > +	list_for_each_entry(resv, resv_regions, list) {
> > +		phys_addr_t start, end;
> > +
> > +		start = resv->start;
> > +		end = resv->start + resv->length - 1;
> > +
> > +		list_for_each_entry_safe(n, next, iova, list) {
> > +			int ret = 0;
> > +
> > +			/* No overlap */
> > +			if ((start > n->end) || (end < n->start))
> > +				continue;
> > +			/*
> > +			 * Insert a new node if current node overlaps with
> > the
> > +			 * reserve region to exclude that from valid iova
> > range.
> > +			 * Note that, new node is inserted before the
> > current
> > +			 * node and finally the current node is deleted
> > keeping
> > +			 * the list updated and sorted.
> > +			 */
> > +			if (start > n->start)
> > +				ret = vfio_iommu_iova_insert(&n->list,
> > +							n->start, start - 1);
> > +			if (!ret && end < n->end)
> > +				ret = vfio_iommu_iova_insert(&n->list,
> > +							end + 1, n->end);
> > +			if (ret)
> > +				return ret;
> 
> Is it safer to delete the 1st node here in case of failure of the 2nd node?
> There is no problem with current logic since upon error iova_copy will
> be released anyway. However this function alone doesn't assume the
> fact of a temporary list, thus it's better to keep the list clean w/o garbage
> left from any error handling.

Agree. I will consider this.
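
Purely as a sketch of that cleanup, assuming vfio_iommu_iova_insert()
links the new node just before the current one as the comment above
describes (the unwind itself is illustrative, not part of this series):

	struct vfio_iova *first = NULL;
	int ret;

	if (start > n->start) {
		ret = vfio_iommu_iova_insert(&n->list, n->start, start - 1);
		if (ret)
			return ret;
		/* the new node now sits just before n in the sorted list */
		first = list_prev_entry(n, list);
	}
	if (end < n->end) {
		ret = vfio_iommu_iova_insert(&n->list, end + 1, n->end);
		if (ret) {
			if (first) {
				/* unwind the first split; leave no garbage */
				list_del(&first->list);
				kfree(first);
			}
			return ret;
		}
	}
	list_del(&n->list);
	kfree(n);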

> > +
> > +			list_del(&n->list);
> > +			kfree(n);
> > +		}
> > +	}
> > +
> > +	if (list_empty(iova))
> > +		return -EINVAL;
> 
> if list_empty should BUG_ON here? or do we really need this check?

I think we need the check here. This basically means there is no valid iova
region, as the reserved regions overlap it completely (very unlikely; a bad
configuration, maybe). The __attach will fail if that happens, and maybe a
WARN_ON is a good idea to notify the user.
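
For illustration, the check plus warning could be as small as this sketch
(whether to warn at all is the open question above):

	if (list_empty(iova)) {
		/* reserved regions consumed every aperture: nothing valid left */
		WARN(1, "vfio: no valid iova range left after excluding reserved regions\n");
		return -EINVAL;
	}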

Thanks,
Shameer

> > +
> > +	return 0;
> > +}
> > +
> > +static void vfio_iommu_resv_free(struct list_head *resv_regions)
> > +{
> > +	struct iommu_resv_region *n, *next;
> > +
> > +	list_for_each_entry_safe(n, next, resv_regions, list) {
> > +		list_del(&n->list);
> > +		kfree(n);
> > +	}
> > +}
> > +
> >  static void vfio_iommu_iova_free(struct list_head *iova)
> >  {
> >  	struct vfio_iova *n, *next;
> > @@ -1366,6 +1442,7 @@ static int vfio_iommu_type1_attach_group(void
> > *iommu_data,
> >  	phys_addr_t resv_msi_base;
> >  	struct iommu_domain_geometry geo;
> >  	LIST_HEAD(iova_copy);
> > +	LIST_HEAD(group_resv_regions);
> >
> >  	mutex_lock(&iommu->lock);
> >
> > @@ -1444,6 +1521,13 @@ static int vfio_iommu_type1_attach_group(void
> > *iommu_data,
> >  		goto out_detach;
> >  	}
> >
> > +	iommu_get_group_resv_regions(iommu_group,
> > &group_resv_regions);
> > +
> > +	if (vfio_iommu_resv_conflict(iommu, &group_resv_regions)) {
> > +		ret = -EINVAL;
> > +		goto out_detach;
> > +	}
> > +
> >  	/* Get a copy of the current iova list and work on it */
> >  	ret = vfio_iommu_iova_get_copy(iommu, &iova_copy);
> >  	if (ret)
> > @@ -1454,6 +1538,10 @@ static int vfio_iommu_type1_attach_group(void
> > *iommu_data,
> >  	if (ret)
> >  		goto out_detach;
> >
> > +	ret = vfio_iommu_resv_exclude(&iova_copy, &group_resv_regions);
> > +	if (ret)
> > +		goto out_detach;
> > +
> >  	resv_msi = vfio_iommu_has_sw_msi(iommu_group,
> > &resv_msi_base);
> >
> >  	INIT_LIST_HEAD(&domain->group_list);
> > @@ -1514,6 +1602,7 @@ static int vfio_iommu_type1_attach_group(void
> > *iommu_data,
> >  	/* Delete the old one and insert new iova list */
> >  	vfio_iommu_iova_insert_copy(iommu, &iova_copy);
> >  	mutex_unlock(&iommu->lock);
> > +	vfio_iommu_resv_free(&group_resv_regions);
> >
> >  	return 0;
> >
> > @@ -1522,6 +1611,7 @@ static int vfio_iommu_type1_attach_group(void
> > *iommu_data,
> >  out_domain:
> >  	iommu_domain_free(domain->domain);
> >  	vfio_iommu_iova_free(&iova_copy);
> > +	vfio_iommu_resv_free(&group_resv_regions);
> >  out_free:
> >  	kfree(domain);
> >  	kfree(group);
> > --
> > 2.7.4
> >
> >
> > _______________________________________________
> > iommu mailing list
> > iommu@lists.linux-foundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 63+ messages in thread

* RE: [PATCH v5 0/7] vfio/type1: Add support for valid iova list management
@ 2018-03-19 12:12       ` Tian, Kevin
  0 siblings, 0 replies; 63+ messages in thread
From: Tian, Kevin @ 2018-03-19 12:12 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, alex.williamson, eric.auger, pmorel
  Cc: kvm, linux-kernel, xuwei (O), Linuxarm, iommu

> From: Shameerali Kolothum Thodi
> [mailto:shameerali.kolothum.thodi@huawei.com]
> 
> Hi Kevin,
> 
> Thanks for taking a look at this series.
> 
> > -----Original Message-----
> > From: Tian, Kevin [mailto:kevin.tian@intel.com]
> > Sent: Monday, March 19, 2018 8:29 AM
> > To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>;
> > alex.williamson@redhat.com; eric.auger@redhat.com;
> > pmorel@linux.vnet.ibm.com
> > Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org; xuwei (O)
> > <xuwei5@huawei.com>; Linuxarm <linuxarm@huawei.com>;
> > iommu@lists.linux-foundation.org
> > Subject: RE: [PATCH v5 0/7] vfio/type1: Add support for valid iova list
> > management
> >
> > > From: Shameer Kolothum
> > > Sent: Friday, March 16, 2018 12:35 AM
> > >
> > > This series introduces an iova list associated with a vfio
> > > iommu. The list is kept updated taking care of iommu apertures,
> > > and reserved regions. Also this series adds checks for any conflict
> > > with existing dma mappings whenever a new device group is attached
> to
> > > the domain.
> > >
> > > User-space can retrieve valid iova ranges using
> VFIO_IOMMU_GET_INFO
> > > ioctl capability chains. Any dma map request outside the valid iova
> > > range will be rejected.
> >
> > GET_INFO is done at initialization time which is good for cold attached
> > devices. If a hotplugged device may cause change of valid iova ranges
> > at run-time, then there could be potential problem (which however is
> > difficult for user space or orchestration stack to figure out in advance)
> > Can we do some extension like below to make hotplug case cleaner?
> 
> As I understand it, if a hotplugged device results in an update to the
> valid iova ranges, then in QEMU the vfio_connect_container() -->
> memory_listener_register() path will fail when there is a conflict, since
> patch #4 rejects invalid dma map requests.

OK, possibly Qemu can do another GET_INFO upon any dma map
error to get the latest ranges and then allocate a new valid iova to
redo the map. This should work if the valid ranges shrink due to a
newly hotplugged device. But if hot-removing a device results in
more valid ranges, so far there is no way for Qemu to pick that up.
I'm not sure whether we want to go that far though...
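
Roughly, on the QEMU side (pseudocode only; vfio_dma_map() exists in
hw/vfio/common.c, but vfio_get_valid_iova_ranges() and vfio_iova_alloc()
are placeholder names for an iova allocator QEMU does not have today):

	ret = vfio_dma_map(container, iova, size, vaddr, readonly);
	if (ret) {
		/* ranges may have shrunk after a hotplug: refresh, retry once */
		nr = vfio_get_valid_iova_ranges(container, ranges);
		iova = vfio_iova_alloc(ranges, nr, size);
		if (iova != (hwaddr)-1)
			ret = vfio_dma_map(container, iova, size, vaddr, readonly);
	}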

> 
> Not sure, your concern is preventing hotplug much before this happens or
> not.
> 

Yes, my earlier thought was more about catching the problem in the attach
phase.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 63+ messages in thread

* RE: [PATCH v5 2/7] vfio/type1: Check reserve region conflict and update iova list
@ 2018-03-19 12:16         ` Tian, Kevin
  0 siblings, 0 replies; 63+ messages in thread
From: Tian, Kevin @ 2018-03-19 12:16 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, alex.williamson, eric.auger, pmorel
  Cc: kvm, linux-kernel, xuwei (O), Linuxarm, iommu

> From: Shameerali Kolothum Thodi
> [mailto:shameerali.kolothum.thodi@huawei.com]
> 
> > -----Original Message-----
> > From: Tian, Kevin [mailto:kevin.tian@intel.com]
> > Sent: Monday, March 19, 2018 7:52 AM
> > To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>;
> > alex.williamson@redhat.com; eric.auger@redhat.com;
> > pmorel@linux.vnet.ibm.com
> > Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org; xuwei (O)
> > <xuwei5@huawei.com>; Linuxarm <linuxarm@huawei.com>;
> > iommu@lists.linux-foundation.org
> > Subject: RE: [PATCH v5 2/7] vfio/type1: Check reserve region conflict and
> > update iova list
> >
> > > From: Shameer Kolothum
> > > Sent: Friday, March 16, 2018 12:35 AM
> > >
> > > This retrieves the reserved regions associated with dev group and
> > > checks for conflicts with any existing dma mappings. Also update
> > > the iova list excluding the reserved regions.
> > >
> > > Signed-off-by: Shameer Kolothum
> > > <shameerali.kolothum.thodi@huawei.com>
> > > ---
> > >  drivers/vfio/vfio_iommu_type1.c | 90
> > > +++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 90 insertions(+)
> > >
> > > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > > b/drivers/vfio/vfio_iommu_type1.c
> > > index 1123c74..cfe2bb2 100644
> > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > @@ -1313,6 +1313,82 @@ static int vfio_iommu_aper_resize(struct
> > > list_head *iova,
> > >  	return 0;
> > >  }
> > >
> > > +/*
> > > + * Check reserved region conflicts with existing dma mappings
> > > + */
> > > +static bool vfio_iommu_resv_conflict(struct vfio_iommu *iommu,
> > > +				struct list_head *resv_regions)
> > > +{
> > > +	struct iommu_resv_region *region;
> > > +
> > > +	/* Check for conflict with existing dma mappings */
> > > +	list_for_each_entry(region, resv_regions, list) {
> > > +		if (vfio_find_dma(iommu, region->start, region->length))
> > > +			return true;
> > > +	}
> > > +
> > > +	return false;
> > > +}
> > > +
> > > +/*
> > > + * Check iova region overlap with  reserved regions and
> > > + * exclude them from the iommu iova range
> > > + */
> > > +static int vfio_iommu_resv_exclude(struct list_head *iova,
> > > +					struct list_head *resv_regions)
> > > +{
> > > +	struct iommu_resv_region *resv;
> > > +	struct vfio_iova *n, *next;
> > > +
> > > +	list_for_each_entry(resv, resv_regions, list) {
> > > +		phys_addr_t start, end;
> > > +
> > > +		start = resv->start;
> > > +		end = resv->start + resv->length - 1;
> > > +
> > > +		list_for_each_entry_safe(n, next, iova, list) {
> > > +			int ret = 0;
> > > +
> > > +			/* No overlap */
> > > +			if ((start > n->end) || (end < n->start))
> > > +				continue;
> > > +			/*
> > > +			 * Insert a new node if current node overlaps with
> > > the
> > > +			 * reserve region to exclude that from valid iova
> > > range.
> > > +			 * Note that, new node is inserted before the
> > > current
> > > +			 * node and finally the current node is deleted
> > > keeping
> > > +			 * the list updated and sorted.
> > > +			 */
> > > +			if (start > n->start)
> > > +				ret = vfio_iommu_iova_insert(&n->list,
> > > +							n->start, start - 1);
> > > +			if (!ret && end < n->end)
> > > +				ret = vfio_iommu_iova_insert(&n->list,
> > > +							end + 1, n->end);
> > > +			if (ret)
> > > +				return ret;
> >
> > Is it safer to delete the 1st node here in case of failure of the 2nd node?
> > There is no problem with current logic since upon error iova_copy will
> > be released anyway. However this function alone doesn't assume the
> > fact of a temporary list, thus it's better to keep the list clean w/o garbage
> > left from any error handling.
> 
> Agree. I will consider this.
> 
> > > +
> > > +			list_del(&n->list);
> > > +			kfree(n);
> > > +		}
> > > +	}
> > > +
> > > +	if (list_empty(iova))
> > > +		return -EINVAL;
> >
> > if list_empty should BUG_ON here? or do we really need this check?
> 
> I think we need the check here. This basically means there is no valid iova
> region, as the reserved regions overlap it completely (very unlikely; a bad
> configuration, maybe). The __attach will fail if that happens, and maybe a
> WARN_ON is a good idea to notify the user.
> 

You are right. I misread the code; the deletion happens only
after the insertions...

Thanks
Kevin

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 2/7] vfio/type1: Check reserve region conflict and update iova list
@ 2018-03-20 22:37       ` Alex Williamson
  0 siblings, 0 replies; 63+ messages in thread
From: Alex Williamson @ 2018-03-20 22:37 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Shameer Kolothum, eric.auger, pmorel, kvm, linux-kernel, xuwei5,
	linuxarm, iommu

On Mon, 19 Mar 2018 07:51:58 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Shameer Kolothum
> > Sent: Friday, March 16, 2018 12:35 AM
> > 
> > This retrieves the reserved regions associated with dev group and
> > checks for conflicts with any existing dma mappings. Also update
> > the iova list excluding the reserved regions.
> > 
> > Signed-off-by: Shameer Kolothum
> > <shameerali.kolothum.thodi@huawei.com>
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 90
> > +++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 90 insertions(+)
> > 
> > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > b/drivers/vfio/vfio_iommu_type1.c
> > index 1123c74..cfe2bb2 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -1313,6 +1313,82 @@ static int vfio_iommu_aper_resize(struct
> > list_head *iova,
> >  	return 0;
> >  }
> > 
> > +/*
> > + * Check reserved region conflicts with existing dma mappings
> > + */
> > +static bool vfio_iommu_resv_conflict(struct vfio_iommu *iommu,
> > +				struct list_head *resv_regions)
> > +{
> > +	struct iommu_resv_region *region;
> > +
> > +	/* Check for conflict with existing dma mappings */
> > +	list_for_each_entry(region, resv_regions, list) {
> > +		if (vfio_find_dma(iommu, region->start, region->length))
> > +			return true;
> > +	}
> > +
> > +	return false;
> > +}
> > +
> > +/*
> > + * Check iova region overlap with reserved regions and
> > + * exclude them from the iommu iova range
> > + */
> > +static int vfio_iommu_resv_exclude(struct list_head *iova,
> > +					struct list_head *resv_regions)
> > +{
> > +	struct iommu_resv_region *resv;
> > +	struct vfio_iova *n, *next;
> > +
> > +	list_for_each_entry(resv, resv_regions, list) {
> > +		phys_addr_t start, end;
> > +
> > +		start = resv->start;
> > +		end = resv->start + resv->length - 1;
> > +
> > +		list_for_each_entry_safe(n, next, iova, list) {
> > +			int ret = 0;
> > +
> > +			/* No overlap */
> > +			if ((start > n->end) || (end < n->start))
> > +				continue;
> > +			/*
> > +			 * Insert a new node if current node overlaps with
> > the
> > +			 * reserved region to exclude that from the valid iova
> > range.
> > +			 * Note that, new node is inserted before the
> > current
> > +			 * node and finally the current node is deleted
> > keeping
> > +			 * the list updated and sorted.
> > +			 */
> > +			if (start > n->start)
> > +				ret = vfio_iommu_iova_insert(&n->list,
> > +							n->start, start - 1);
> > +			if (!ret && end < n->end)
> > +				ret = vfio_iommu_iova_insert(&n->list,
> > +							end + 1, n->end);
> > +			if (ret)
> > +				return ret;  
> 
> Is it safer to delete the 1st node here in case the 2nd node fails?
> There is no problem with the current logic, since upon error iova_copy
> will be released anyway. However, this function by itself doesn't
> assume it operates on a temporary list, so it's better to keep the
> list clean without garbage left over from any error handling.

I don't think the proposal makes the list notably more sane on failure
than we have here.  If the function returns an error and the list is
modified in any way, how can the caller recover?  We're operating on a
principle of modifying a copy and throwing it away on error; the only
function-level solution to the problem you're noting is to make each
function generate a working copy, which is clearly inefficient.  This
is a static function, not intended for general use, so I think a
sufficient approach to address your concern is to simply note the error
behavior in the comment above the function: the list is in an
unknown/inconsistent state on error.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 63+ messages in thread
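
One possible wording for the comment Alex asks for above, as a sketch;
this is not text from the posted series:

/*
 * Check iova region overlap with reserved regions and exclude them
 * from the iommu iova range.
 *
 * Note: the list is modified in place and no rollback is attempted on
 * failure; on error the list is left in an unknown/inconsistent state.
 * Callers are expected to work on a disposable copy of the iova list
 * and discard that copy if this helper fails.
 */
static int vfio_iommu_resv_exclude(struct list_head *iova,
				   struct list_head *resv_regions);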

* Re: [PATCH v5 0/7] vfio/type1: Add support for valid iova list management
@ 2018-03-20 22:55     ` Alex Williamson
  0 siblings, 0 replies; 63+ messages in thread
From: Alex Williamson @ 2018-03-20 22:55 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Shameer Kolothum, eric.auger, pmorel, kvm, linux-kernel, xuwei5,
	linuxarm, iommu

On Mon, 19 Mar 2018 08:28:32 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Shameer Kolothum
> > Sent: Friday, March 16, 2018 12:35 AM
> > 
> > This series introduces an iova list associated with a vfio
> > iommu. The list is kept updated taking care of iommu apertures,
> > and reserved regions. Also this series adds checks for any conflict
> > with existing dma mappings whenever a new device group is attached to
> > the domain.
> > 
> > User-space can retrieve valid iova ranges using VFIO_IOMMU_GET_INFO
> > ioctl capability chains. Any dma map request outside the valid iova
> > range will be rejected.  
> 
> GET_INFO is done at initialization time which is good for cold attached
> devices. If a hotplugged device may cause change of valid iova ranges
> at run-time, then there could be potential problem (which however is
> difficult for user space or orchestration stack to figure out in advance)
> Can we do some extension like below to make hotplug case cleaner?

Let's be clear what we mean by hotplug here, as I see it, the only
relevant hotplug would be a physical device, hot added to the host,
which becomes a member of an existing, in-use IOMMU group.  If, on the
other hand, we're talking about hotplug related to the user process,
there's nothing asynchronous about that.  For instance in the QEMU
case, QEMU must add the group to the container, at which point it can
evaluate the new iova list and remove the group from the container if
it doesn't like the result.  So what would be a case of the available
iova list for a group changing as a result of adding a device?

> - An interface allowing user space to request VFIO rejecting further 
> attach_group if doing so may cause iova range change. e.g. Qemu can 
> do such request once completing initial GET_INFO;

For the latter case above, it seems unnecessary, QEMU can revert the
attach, we're only promising that the attach won't interfere with
existing mappings.  For the host hotplug case, the user has no control,
the new device is a member of the iommu group and therefore necessarily
becomes a part of the container.  I imagine there are plenty of other holes
in this scenario already.
 
> - or an event notification to user space upon change of valid iova
> ranges when attaching a new device at run-time. It goes one step 
> further - even attach may cause iova range change, it may still
> succeed as long as Qemu hasn't allocated any iova in impacted 
> range

Same as above, in the QEMU hotplug case, the user is orchestrating the
adding of the group to the container, they can then check the iommu
info on their own and determine what, if any, changes are relevant to
them, knowing that the addition would not succeed if any current
mappings were affected.  In the host case, a notification would be
required, but we'd first need to identify exactly how the iova list can
change asynchronous to the user doing anything.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 63+ messages in thread
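
A sketch of the QEMU-side flow Alex describes: attach the group,
re-read the valid iova ranges via the VFIO_IOMMU_GET_INFO capability
chain this series adds, and revert the attach if the result is not
acceptable. The capability id and struct names are the ones proposed in
patch #5 (so this only builds against the updated uapi header), and
error handling is trimmed:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int attach_group_checked(int container, int group)
{
	char buf[4096];
	struct vfio_iommu_type1_info *info = (void *)buf;
	struct vfio_iommu_type1_info_cap_iova_range *cap;
	struct vfio_info_cap_header *hdr;
	__u32 off;

	/* The kernel itself rejects attaches that conflict with mappings */
	if (ioctl(group, VFIO_GROUP_SET_CONTAINER, &container))
		return -1;

	memset(buf, 0, sizeof(buf));
	info->argsz = sizeof(buf);
	if (ioctl(container, VFIO_IOMMU_GET_INFO, info) ||
	    !(info->flags & VFIO_IOMMU_INFO_CAPS))
		goto revert;

	/* Capability chain offsets are relative to the buffer start */
	for (off = info->cap_offset; off; off = hdr->next) {
		hdr = (struct vfio_info_cap_header *)(buf + off);
		if (hdr->id != VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE)
			continue;
		cap = (void *)hdr;
		if (!cap->nr_iovas)	/* no usable iova range left at all */
			goto revert;
		return 0;		/* evaluate cap->iova_ranges[] here */
	}
revert:
	/* Don't like the new iova list: take the group back out */
	ioctl(group, VFIO_GROUP_UNSET_CONTAINER, &container);
	return -1;
}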

* RE: [PATCH v5 0/7] vfio/type1: Add support for valid iova list management
@ 2018-03-21  3:28       ` Tian, Kevin
  0 siblings, 0 replies; 63+ messages in thread
From: Tian, Kevin @ 2018-03-21  3:28 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Shameer Kolothum, eric.auger, pmorel, kvm, linux-kernel, xuwei5,
	linuxarm, iommu

> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Wednesday, March 21, 2018 6:55 AM
> 
> On Mon, 19 Mar 2018 08:28:32 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Shameer Kolothum
> > > Sent: Friday, March 16, 2018 12:35 AM
> > >
> > > This series introduces an iova list associated with a vfio
> > > iommu. The list is kept updated taking care of iommu apertures,
> > > and reserved regions. Also this series adds checks for any conflict
> > > with existing dma mappings whenever a new device group is attached
> to
> > > the domain.
> > >
> > > User-space can retrieve valid iova ranges using
> VFIO_IOMMU_GET_INFO
> > > ioctl capability chains. Any dma map request outside the valid iova
> > > range will be rejected.
> >
> > GET_INFO is done at initialization time which is good for cold attached
> > devices. If a hotplugged device may cause change of valid iova ranges
> > at run-time, then there could be potential problem (which however is
> > difficult for user space or orchestration stack to figure out in advance)
> > Can we do some extension like below to make hotplug case cleaner?
> 
> Let's be clear what we mean by hotplug here, as I see it, the only
> relevant hotplug would be a physical device, hot added to the host,
> which becomes a member of an existing, in-use IOMMU group.  If, on the
> other hand, we're talking about hotplug related to the user process,
> there's nothing asynchronous about that.  For instance in the QEMU
> case, QEMU must add the group to the container, at which point it can
> evaluate the new iova list and remove the group from the container if
> it doesn't like the result.  So what would be a case of the available
> iova list for a group changing as a result of adding a device?

My original thought was about the latter case. At that moment
I was not sure whether the window between adding/removing
the group might cause an issue if some IOVA allocations happen
to be in flight in parallel. But it looks like QEMU can handle
it well anyway, as long as such a scenario is considered.

> 
> > - An interface allowing user space to request VFIO rejecting further
> > attach_group if doing so may cause iova range change. e.g. Qemu can
> > do such request once completing initial GET_INFO;
> 
> For the latter case above, it seems unnecessary, QEMU can revert the
> attach, we're only promising that the attach won't interfere with
> existing mappings.  For the host hotplug case, the user has no control,
> the new device is a member of the iommu group and therefore necessarily
> becomes a part of container.  I imagine there are plenty of other holes
> in this scenario already.
> 
> > - or an event notification to user space upon change of valid iova
> > ranges when attaching a new device at run-time. It goes one step
> > further - even attach may cause iova range change, it may still
> > succeed as long as Qemu hasn't allocated any iova in impacted
> > range
> 
> Same as above, in the QEMU hotplug case, the user is orchestrating the
> adding of the group to the container, they can then check the iommu
> info on their own and determine what, if any, changes are relevant to
> them, knowing that the addition would not succeed if any current
> mappings where affected.  In the host case, a notification would be
> required, but we'd first need to identify exactly how the iova list can
> change asynchronous to the user doing anything.  Thanks,

For host hotplug, notification could possibly be an opt-in model,
meaning that if user space doesn't explicitly request to receive
notification of such an event, the device is just left in an unused
state (vfio-pci still claims the device, assuming it is assigned to
the container owner, but the owner doesn't use it).

Thanks
Kevin

^ permalink raw reply	[flat|nested] 63+ messages in thread

* RE: [PATCH v5 2/7] vfio/type1: Check reserve region conflict and update iova list
@ 2018-03-21  3:30         ` Tian, Kevin
  0 siblings, 0 replies; 63+ messages in thread
From: Tian, Kevin @ 2018-03-21  3:30 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Shameer Kolothum, eric.auger, pmorel, kvm, linux-kernel, xuwei5,
	linuxarm, iommu

> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Wednesday, March 21, 2018 6:38 AM
> 
> On Mon, 19 Mar 2018 07:51:58 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Shameer Kolothum
> > > Sent: Friday, March 16, 2018 12:35 AM
> > >
> > > This retrieves the reserved regions associated with dev group and
> > > checks for conflicts with any existing dma mappings. Also update
> > > the iova list excluding the reserved regions.
> > >
> > > Signed-off-by: Shameer Kolothum
> > > <shameerali.kolothum.thodi@huawei.com>
> > > ---
> > >  drivers/vfio/vfio_iommu_type1.c | 90
> > > +++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 90 insertions(+)
> > >
> > > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > > b/drivers/vfio/vfio_iommu_type1.c
> > > index 1123c74..cfe2bb2 100644
> > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > @@ -1313,6 +1313,82 @@ static int vfio_iommu_aper_resize(struct
> > > list_head *iova,
> > >  	return 0;
> > >  }
> > >
> > > +/*
> > > + * Check reserved region conflicts with existing dma mappings
> > > + */
> > > +static bool vfio_iommu_resv_conflict(struct vfio_iommu *iommu,
> > > +				struct list_head *resv_regions)
> > > +{
> > > +	struct iommu_resv_region *region;
> > > +
> > > +	/* Check for conflict with existing dma mappings */
> > > +	list_for_each_entry(region, resv_regions, list) {
> > > +		if (vfio_find_dma(iommu, region->start, region->length))
> > > +			return true;
> > > +	}
> > > +
> > > +	return false;
> > > +}
> > > +
> > > +/*
> > > + * Check iova region overlap with reserved regions and
> > > + * exclude them from the iommu iova range
> > > + */
> > > +static int vfio_iommu_resv_exclude(struct list_head *iova,
> > > +					struct list_head *resv_regions)
> > > +{
> > > +	struct iommu_resv_region *resv;
> > > +	struct vfio_iova *n, *next;
> > > +
> > > +	list_for_each_entry(resv, resv_regions, list) {
> > > +		phys_addr_t start, end;
> > > +
> > > +		start = resv->start;
> > > +		end = resv->start + resv->length - 1;
> > > +
> > > +		list_for_each_entry_safe(n, next, iova, list) {
> > > +			int ret = 0;
> > > +
> > > +			/* No overlap */
> > > +			if ((start > n->end) || (end < n->start))
> > > +				continue;
> > > +			/*
> > > +			 * Insert a new node if current node overlaps with
> > > the
> > > +			 * reserved region to exclude that from the valid iova
> > > range.
> > > +			 * Note that, new node is inserted before the
> > > current
> > > +			 * node and finally the current node is deleted
> > > keeping
> > > +			 * the list updated and sorted.
> > > +			 */
> > > +			if (start > n->start)
> > > +				ret = vfio_iommu_iova_insert(&n->list,
> > > +							n->start, start - 1);
> > > +			if (!ret && end < n->end)
> > > +				ret = vfio_iommu_iova_insert(&n->list,
> > > +							end + 1, n->end);
> > > +			if (ret)
> > > +				return ret;
> >
> > Is it safer to delete the 1st node here in case the 2nd node fails?
> > There is no problem with the current logic, since upon error iova_copy
> > will be released anyway. However, this function by itself doesn't
> > assume it operates on a temporary list, so it's better to keep the
> > list clean without garbage left over from any error handling.
> 
> I don't think the proposal makes the list notably more sane on failure
> than we have here.  If the function returns an error and the list is
> modified in any way, how can the caller recover?  We're operating on a
> principle of modifying a copy and throwing it away on error; the only
> function-level solution to the problem you're noting is to make each
> function generate a working copy, which is clearly inefficient.  This
> is a static function, not intended for general use, so I think a
> sufficient approach to address your concern is to simply note the error
> behavior in the comment above the function: the list is in an
> unknown/inconsistent state on error.  Thanks,
> 

'static' doesn't mean it cannot be used for general purposes in the same
file. Clarifying the expected behavior in the comment is OK with me.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 2/7] vfio/type1: Check reserve region conflict and update iova list
@ 2018-03-21 16:31           ` Alex Williamson
  0 siblings, 0 replies; 63+ messages in thread
From: Alex Williamson @ 2018-03-21 16:31 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Shameer Kolothum, eric.auger, pmorel, kvm, linux-kernel, xuwei5,
	linuxarm, iommu

On Wed, 21 Mar 2018 03:30:29 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Wednesday, March 21, 2018 6:38 AM
> > 
> > On Mon, 19 Mar 2018 07:51:58 +0000
> > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >   
> > > > From: Shameer Kolothum
> > > > Sent: Friday, March 16, 2018 12:35 AM
> > > >
> > > > This retrieves the reserved regions associated with dev group and
> > > > checks for conflicts with any existing dma mappings. Also update
> > > > the iova list excluding the reserved regions.
> > > >
> > > > Signed-off-by: Shameer Kolothum
> > > > <shameerali.kolothum.thodi@huawei.com>
> > > > ---
> > > >  drivers/vfio/vfio_iommu_type1.c | 90
> > > > +++++++++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 90 insertions(+)
> > > >
> > > > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > > > b/drivers/vfio/vfio_iommu_type1.c
> > > > index 1123c74..cfe2bb2 100644
> > > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > > @@ -1313,6 +1313,82 @@ static int vfio_iommu_aper_resize(struct
> > > > list_head *iova,
> > > >  	return 0;
> > > >  }
> > > >
> > > > +/*
> > > > + * Check reserved region conflicts with existing dma mappings
> > > > + */
> > > > +static bool vfio_iommu_resv_conflict(struct vfio_iommu *iommu,
> > > > +				struct list_head *resv_regions)
> > > > +{
> > > > +	struct iommu_resv_region *region;
> > > > +
> > > > +	/* Check for conflict with existing dma mappings */
> > > > +	list_for_each_entry(region, resv_regions, list) {
> > > > +		if (vfio_find_dma(iommu, region->start, region->length))
> > > > +			return true;
> > > > +	}
> > > > +
> > > > +	return false;
> > > > +}
> > > > +
> > > > +/*
> > > > + * Check iova region overlap with reserved regions and
> > > > + * exclude them from the iommu iova range
> > > > + */
> > > > +static int vfio_iommu_resv_exclude(struct list_head *iova,
> > > > +					struct list_head *resv_regions)
> > > > +{
> > > > +	struct iommu_resv_region *resv;
> > > > +	struct vfio_iova *n, *next;
> > > > +
> > > > +	list_for_each_entry(resv, resv_regions, list) {
> > > > +		phys_addr_t start, end;
> > > > +
> > > > +		start = resv->start;
> > > > +		end = resv->start + resv->length - 1;
> > > > +
> > > > +		list_for_each_entry_safe(n, next, iova, list) {
> > > > +			int ret = 0;
> > > > +
> > > > +			/* No overlap */
> > > > +			if ((start > n->end) || (end < n->start))
> > > > +				continue;
> > > > +			/*
> > > > +			 * Insert a new node if current node overlaps with
> > > > the
> > > > +			 * reserved region to exclude that from the valid iova
> > > > range.
> > > > +			 * Note that, new node is inserted before the
> > > > current
> > > > +			 * node and finally the current node is deleted
> > > > keeping
> > > > +			 * the list updated and sorted.
> > > > +			 */
> > > > +			if (start > n->start)
> > > > +				ret = vfio_iommu_iova_insert(&n->list,
> > > > +							n->start, start - 1);
> > > > +			if (!ret && end < n->end)
> > > > +				ret = vfio_iommu_iova_insert(&n->list,
> > > > +							end + 1, n->end);
> > > > +			if (ret)
> > > > +				return ret;  
> > >
> > > Is it safer to delete the 1st node here in case the 2nd node fails?
> > > There is no problem with the current logic, since upon error iova_copy
> > > will be released anyway. However, this function by itself doesn't
> > > assume it operates on a temporary list, so it's better to keep the
> > > list clean without garbage left over from any error handling.
> > 
> > I don't think the proposal makes the list notably more sane on failure
> > than we have here.  If the function returns an error and the list is
> > modified in any way, how can the caller recover?  We're operating on a
> > principle of modifying a copy and throwing it away on error; the only
> > function-level solution to the problem you're noting is to make each
> > function generate a working copy, which is clearly inefficient.  This
> > is a static function, not intended for general use, so I think a
> > sufficient approach to address your concern is to simply note the error
> > behavior in the comment above the function: the list is in an
> > unknown/inconsistent state on error.  Thanks,
> >   
> 
> 'static' doesn't mean it cannot be used for general purposes in the same
> file.

Obviously this is true, but expecting robust error handling, as might
be found in an exported general-purpose function, from a static
special-purpose helper, is a bit absurd.  The strategy is therefore:
a) can we make it more general purpose without compromising the intent
of the function? probably not without the overhead of using a local
copy of the list; b) can we modify the API, function name, arg names,
etc. to make the behavior more intuitive? maybe; c) can we at least add
a comment to make the potentially non-intuitive behavior obvious? of
course.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 63+ messages in thread
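
At the call sites, the modify-a-copy principle looks roughly like the
sketch below. Helper names are modeled on this series
(vfio_iommu_iova_get_copy() and friends); locking and the remaining
attach steps are omitted:

/* Duplicate the iova list, mutate the copy, then commit or discard it. */
static int vfio_iommu_update_iova(struct vfio_iommu *iommu,
				  struct list_head *resv_regions)
{
	LIST_HEAD(iova_copy);
	int ret;

	ret = vfio_iommu_iova_get_copy(iommu, &iova_copy);
	if (ret)
		return ret;

	/* May leave iova_copy inconsistent on failure; that is fine */
	ret = vfio_iommu_resv_exclude(&iova_copy, resv_regions);
	if (ret) {
		vfio_iommu_iova_free(&iova_copy);	/* throw it away */
		return ret;
	}

	vfio_iommu_iova_insert_copy(iommu, &iova_copy);	/* commit */
	return 0;
}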

* Re: [PATCH v5 0/7] vfio/type1: Add support for valid iova list management
  2018-03-21  3:28       ` Tian, Kevin
  (?)
@ 2018-03-21 17:11       ` Alex Williamson
  2018-03-22  9:10           ` Tian, Kevin
  -1 siblings, 1 reply; 63+ messages in thread
From: Alex Williamson @ 2018-03-21 17:11 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Shameer Kolothum, eric.auger, pmorel, kvm, linux-kernel, xuwei5,
	linuxarm, iommu

On Wed, 21 Mar 2018 03:28:16 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Wednesday, March 21, 2018 6:55 AM
> > 
> > On Mon, 19 Mar 2018 08:28:32 +0000
> > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >   
> > > > From: Shameer Kolothum
> > > > Sent: Friday, March 16, 2018 12:35 AM
> > > >
> > > > This series introduces an iova list associated with a vfio
> > > > iommu. The list is kept updated taking care of iommu apertures,
> > > > and reserved regions. Also this series adds checks for any conflict
> > > > with existing dma mappings whenever a new device group is attached  
> > to  
> > > > the domain.
> > > >
> > > > User-space can retrieve valid iova ranges using  
> > VFIO_IOMMU_GET_INFO  
> > > > ioctl capability chains. Any dma map request outside the valid iova
> > > > range will be rejected.  
> > >
> > > GET_INFO is done at initialization time which is good for cold attached
> > > devices. If a hotplugged device may cause change of valid iova ranges
> > > at run-time, then there could be potential problem (which however is
> > > difficult for user space or orchestration stack to figure out in advance)
> > > Can we do some extension like below to make hotplug case cleaner?  
> > 
> > Let's be clear what we mean by hotplug here, as I see it, the only
> > relevant hotplug would be a physical device, hot added to the host,
> > which becomes a member of an existing, in-use IOMMU group.  If, on the
> > other hand, we're talking about hotplug related to the user process,
> > there's nothing asynchronous about that.  For instance in the QEMU
> > case, QEMU must add the group to the container, at which point it can
> > evaluate the new iova list and remove the group from the container if
> > it doesn't like the result.  So what would be a case of the available
> > iova list for a group changing as a result of adding a device?  
> 
> My original thought was about the latter case. At that moment
> I was not sure whether the window between adding/removing
> the group might cause an issue if some IOVA allocations happen
> to be in flight in parallel. But it looks like QEMU can handle
> it well anyway, as long as such a scenario is considered.

I believe the kernel patches here and the existing code are using
locking to prevent races between mapping changes and device changes, so
the acceptance of a new group into a container and the iova list for a
container should always be correct for the order these operations
arrive.  Beyond that, I don't believe it's the kernel's responsibility
to do anything more than block groups from being added if they conflict
with current mappings.  The user already has the minimum interfaces
they need to manage other scenarios.

> > > - An interface allowing user space to request VFIO rejecting further
> > > attach_group if doing so may cause iova range change. e.g. Qemu can
> > > do such request once completing initial GET_INFO;  
> > 
> > For the latter case above, it seems unnecessary, QEMU can revert the
> > attach, we're only promising that the attach won't interfere with
> > existing mappings.  For the host hotplug case, the user has no control,
> > the new device is a member of the iommu group and therefore necessarily
> > becomes a part of the container.  I imagine there are plenty of other holes
> > in this scenario already.
> >   
> > > - or an event notification to user space upon change of valid iova
> > > ranges when attaching a new device at run-time. It goes one step
> > > further - even attach may cause iova range change, it may still
> > > succeed as long as Qemu hasn't allocated any iova in impacted
> > > range  
> > 
> > Same as above, in the QEMU hotplug case, the user is orchestrating the
> > adding of the group to the container, they can then check the iommu
> > info on their own and determine what, if any, changes are relevant to
> > them, knowing that the addition would not succeed if any current
> > mappings were affected.  In the host case, a notification would be
> > required, but we'd first need to identify exactly how the iova list can
> > change asynchronous to the user doing anything.  Thanks,  
> 
> For host hotplug, notification could possibly be an opt-in model,
> meaning that if user space doesn't explicitly request to receive
> notification of such an event, the device is just left in an unused
> state (vfio-pci still claims the device, assuming it is assigned to
> the container owner, but the owner doesn't use it).

Currently, if a device is added to a live group, the kernel will print
a warning.  We have a todo to bind that device to a vfio-bus driver,
but I'm not sure that isn't an overreach into the system policy
decisions.  If system policy decides to bind to a vfio bus driver, all
is well, but we might be missing the iommu backend adding the device to
the iommu domain, so it probably won't work unless the requester ID is
actually aliased to the IOMMU (such as a conventional PCI device), a
standard PCIe device simply wouldn't be part of the IOMMU domain and
would generate IOMMU faults if the user attempts to use it (AFAICT).
OTOH, if system policy binds the device to a native host driver, then
the integrity of the group for userspace use is compromised, which is
terminated with prejudice via a BUG.  Obviously the user is never
obligated to listen to signals and we don't provide a signal here as
this scenario is mostly theoretical, though it would be relatively easy
with software hotplug to artificially induce such a condition.

However, I'm still not sure how adding a device to a group is
necessarily relevant to this series, particularly how it would change
the iova list.  When we add groups to a container, we're potentially
crossing boundaries between IOMMUs and may therefore pick up new
reserved resource requirements, but for devices within a group, the
reserved regions should already be accounted for in the existing group.
At least so long as we've decided reserved regions are only the
IOMMU-imposed reserved regions and not routing within the group, which
we've hand-waved as the user's problem already.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 63+ messages in thread
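
As an aside, the per-group reserved regions discussed above are already
visible to userspace via sysfs (Documentation/ABI/testing/sysfs-kernel-iommu_groups),
independently of this series; a trivial reader, assuming group 0:

#include <stdio.h>

int main(void)
{
	char line[128];
	FILE *f = fopen("/sys/kernel/iommu_groups/0/reserved_regions", "r");

	if (!f)
		return 1;
	/* One region per line: "<start> <end> <type>", e.g. "... msi" */
	while (fgets(line, sizeof(line), f))
		fputs(line, stdout);
	fclose(f);
	return 0;
}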

* RE: [PATCH v5 0/7] vfio/type1: Add support for valid iova list management
@ 2018-03-22  9:10           ` Tian, Kevin
  0 siblings, 0 replies; 63+ messages in thread
From: Tian, Kevin @ 2018-03-22  9:10 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Shameer Kolothum, eric.auger, pmorel, kvm, linux-kernel, xuwei5,
	linuxarm, iommu

> From: Alex Williamson
> Sent: Thursday, March 22, 2018 1:11 AM
> 
> On Wed, 21 Mar 2018 03:28:16 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > Sent: Wednesday, March 21, 2018 6:55 AM
> > >
> > > On Mon, 19 Mar 2018 08:28:32 +0000
> > > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > >
> > > > > From: Shameer Kolothum
> > > > > Sent: Friday, March 16, 2018 12:35 AM
> > > > >
> > > > > This series introduces an iova list associated with a vfio
> > > > > iommu. The list is kept updated taking care of iommu apertures,
> > > > > and reserved regions. Also this series adds checks for any conflict
> > > > > with existing dma mappings whenever a new device group is
> attached
> > > to
> > > > > the domain.
> > > > >
> > > > > User-space can retrieve valid iova ranges using
> > > VFIO_IOMMU_GET_INFO
> > > > > ioctl capability chains. Any dma map request outside the valid iova
> > > > > range will be rejected.
> > > >
> > > > GET_INFO is done at initialization time which is good for cold attached
> > > > devices. If a hotplugged device may cause change of valid iova ranges
> > > > at run-time, then there could be potential problem (which however is
> > > > difficult for user space or orchestration stack to figure out in advance)
> > > > Can we do some extension like below to make hotplug case cleaner?
> > >
> > > Let's be clear what we mean by hotplug here, as I see it, the only
> > > relevant hotplug would be a physical device, hot added to the host,
> > > which becomes a member of an existing, in-use IOMMU group.  If, on
> the
> > > other hand, we're talking about hotplug related to the user process,
> > > there's nothing asynchronous about that.  For instance in the QEMU
> > > case, QEMU must add the group to the container, at which point it can
> > > evaluate the new iova list and remove the group from the container if
> > > it doesn't like the result.  So what would be a case of the available
> > > iova list for a group changing as a result of adding a device?
> >
> > My original thought was about the latter case. At that moment
> > I was not sure whether the window between adding/removing
> > the group might cause an issue if some IOVA allocations happen
> > to be in flight in parallel. But it looks like QEMU can handle
> > it well anyway, as long as such a scenario is considered.
> 
> I believe the kernel patches here and the existing code are using
> locking to prevent races between mapping changes and device changes, so
> the acceptance of a new group into a container and the iova list for a
> container should always be correct for the order these operations
> arrive.  Beyond that, I don't believe it's the kernel's responsibility
> to do anything more than block groups from being added if they conflict
> with current mappings.  The user already has the minimum interfaces
> they need to manage other scenarios.

Agree

> 
> > > > - An interface allowing user space to request VFIO rejecting further
> > > > attach_group if doing so may cause iova range change. e.g. Qemu can
> > > > do such request once completing initial GET_INFO;
> > >
> > > For the latter case above, it seems unnecessary, QEMU can revert the
> > > attach, we're only promising that the attach won't interfere with
> > > existing mappings.  For the host hotplug case, the user has no control,
> > > the new device is a member of the iommu group and therefore
> necessarily
> > > becomes a part of the container.  I imagine there are plenty of other holes
> > > in this scenario already.
> > >
> > > > - or an event notification to user space upon change of valid iova
> > > > ranges when attaching a new device at run-time. It goes one step
> > > > further - even attach may cause iova range change, it may still
> > > > succeed as long as Qemu hasn't allocated any iova in impacted
> > > > range
> > >
> > > Same as above, in the QEMU hotplug case, the user is orchestrating the
> > > adding of the group to the container, they can then check the iommu
> > > info on their own and determine what, if any, changes are relevant to
> > > them, knowing that the addition would not succeed if any current
> > > mappings were affected.  In the host case, a notification would be
> > > required, but we'd first need to identify exactly how the iova list can
> > > change asynchronous to the user doing anything.  Thanks,
> >
> > For host hotplug, notification could possibly be an opt-in model,
> > meaning that if user space doesn't explicitly request to receive
> > notification of such an event, the device is just left in an unused
> > state (vfio-pci still claims the device, assuming it is assigned to
> > the container owner, but the owner doesn't use it).
> 
> Currently, if a device is added to a live group, the kernel will print
> a warning.  We have a todo to bind that device to a vfio-bus driver,
> but I'm not sure that isn't an overreach into the system policy
> decisions.  If system policy decides to bind to a vfio bus driver, all
> is well, but we might be missing the iommu backend adding the device to
> the iommu domain, so it probably won't work unless the requester ID is
> actually aliased to the IOMMU (such as a conventional PCI device), a
> standard PCIe device simply wouldn't be part of the IOMMU domain and
> would generate IOMMU faults if the user attempts to use it (AFAICT).
> OTOH, if system policy binds the device to a native host driver, then
> the integrity of the group for userspace use is compromised, which is
> terminated with prejudice via a BUG.  Obviously the user is never
> obligated to listen to signals and we don't provide a signal here as
> this scenario is mostly theoretical, though it would be relatively easy
> with software hotplug to artificially induce such a condition.
> 
> However, I'm still not sure how adding a device to a group is
> necessarily relevant to this series, particularly how it would change
> the iova list.  When we add groups to a container, we're potentially
> crossing boundaries between IOMMUs and may therefore pick up new
> reserved resource requirements, but for devices within a group, the
> reserved regions should already be accounted for in the existing group.
> At least so long as we've decided reserved regions are only the
> IOMMU-imposed reserved regions and not routing within the group, which
> we've hand-waved as the user's problem already.  Thanks,
> 

Oh, it's not relevant to this series now. As I said, my earlier concern
was more on the guest hotplug side, which has been closed. Sorry for
distracting the thread by replying further about host hotplug, which
should be a separate thread if necessary (so I'll stop commenting
here until there is a real need for that part. :-)

Thanks
Kevin

^ permalink raw reply	[flat|nested] 63+ messages in thread

* RE: [PATCH v5 2/7] vfio/type1: Check reserve region conflict and update iova list
@ 2018-03-22  9:15             ` Shameerali Kolothum Thodi
  0 siblings, 0 replies; 63+ messages in thread
From: Shameerali Kolothum Thodi @ 2018-03-22  9:15 UTC (permalink / raw)
  To: Alex Williamson, Tian, Kevin
  Cc: eric.auger, pmorel, kvm, linux-kernel, xuwei (O), Linuxarm, iommu



> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Wednesday, March 21, 2018 4:31 PM
> To: Tian, Kevin <kevin.tian@intel.com>
> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> eric.auger@redhat.com; pmorel@linux.vnet.ibm.com; kvm@vger.kernel.org;
> linux-kernel@vger.kernel.org; xuwei (O) <xuwei5@huawei.com>; Linuxarm
> <linuxarm@huawei.com>; iommu@lists.linux-foundation.org
> Subject: Re: [PATCH v5 2/7] vfio/type1: Check reserve region conflict and
> update iova list
> 
> On Wed, 21 Mar 2018 03:30:29 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > Sent: Wednesday, March 21, 2018 6:38 AM
> > >
> > > On Mon, 19 Mar 2018 07:51:58 +0000
> > > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > >
> > > > > From: Shameer Kolothum
> > > > > Sent: Friday, March 16, 2018 12:35 AM
> > > > >
> > > > > This retrieves the reserved regions associated with dev group and
> > > > > checks for conflicts with any existing dma mappings. Also update
> > > > > the iova list excluding the reserved regions.
> > > > >
> > > > > Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> > > > > ---
> > > > >  drivers/vfio/vfio_iommu_type1.c | 90 +++++++++++++++++++++++++++++++++++++++++
> > > > >  1 file changed, 90 insertions(+)
> > > > >
> > > > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > > > > index 1123c74..cfe2bb2 100644
> > > > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > > > @@ -1313,6 +1313,82 @@ static int vfio_iommu_aper_resize(struct list_head *iova,
> > > > >  	return 0;
> > > > >  }
> > > > >
> > > > > +/*
> > > > > + * Check reserved region conflicts with existing dma mappings
> > > > > + */
> > > > > +static bool vfio_iommu_resv_conflict(struct vfio_iommu *iommu,
> > > > > +				struct list_head *resv_regions)
> > > > > +{
> > > > > +	struct iommu_resv_region *region;
> > > > > +
> > > > > +	/* Check for conflict with existing dma mappings */
> > > > > +	list_for_each_entry(region, resv_regions, list) {
> > > > > +		if (vfio_find_dma(iommu, region->start, region->length))
> > > > > +			return true;
> > > > > +	}
> > > > > +
> > > > > +	return false;
> > > > > +}
> > > > > +
> > > > > +/*
> > > > > + * Check iova region overlap with reserved regions and
> > > > > + * exclude them from the iommu iova range
> > > > > + */
> > > > > +static int vfio_iommu_resv_exclude(struct list_head *iova,
> > > > > +					struct list_head *resv_regions)
> > > > > +{
> > > > > +	struct iommu_resv_region *resv;
> > > > > +	struct vfio_iova *n, *next;
> > > > > +
> > > > > +	list_for_each_entry(resv, resv_regions, list) {
> > > > > +		phys_addr_t start, end;
> > > > > +
> > > > > +		start = resv->start;
> > > > > +		end = resv->start + resv->length - 1;
> > > > > +
> > > > > +		list_for_each_entry_safe(n, next, iova, list) {
> > > > > +			int ret = 0;
> > > > > +
> > > > > +			/* No overlap */
> > > > > +			if ((start > n->end) || (end < n->start))
> > > > > +				continue;
> > > > > +			/*
> > > > > +			 * Insert a new node if current node overlaps with the
> > > > > +			 * reserve region to exclude that from valid iova range.
> > > > > +			 * Note that, new node is inserted before the current
> > > > > +			 * node and finally the current node is deleted keeping
> > > > > +			 * the list updated and sorted.
> > > > > +			 */
> > > > > +			if (start > n->start)
> > > > > +				ret = vfio_iommu_iova_insert(&n->list,
> > > > > +							n->start, start - 1);
> > > > > +			if (!ret && end < n->end)
> > > > > +				ret = vfio_iommu_iova_insert(&n->list,
> > > > > +							end + 1, n->end);
> > > > > +			if (ret)
> > > > > +				return ret;
> > > >
> > > > Is it safer to delete the 1st node here in case of failure of the 2nd node?
> > > > There is no problem with current logic since upon error iova_copy will
> > > > be released anyway. However this function on its own doesn't assume
> > > > it operates on a temporary list, thus it's better to keep the list
> > > > clean w/o garbage left from any error handling.
> > >
> > > I don't think the proposal makes the list notably more sane on failure
> > > than we have here.  If the function returns an error and the list is
> > > modified in any way, how can the caller recover?  We're operating on a
> > > principle of modify a copy and throw it away on error, the only
> > > function level solution to the problem you're noting is to make each
> > > function generate a working copy, which is clearly inefficient.  This
> > > is a static function, not intended for general use, so I think a
> > > sufficient approach to address your concern is to simply note the error
> > > behavior in the comment above the function, the list is in an
> > > unknown/inconsistent state on error.  Thanks,
> > >
> >
> > 'static' doesn't mean it cannot be used for general purpose in the same
> > file.
> 
> Obviously this is true, but expecting robust error handling, as might
> be found in an exported general purpose function, from a static
> specific purpose helper, is a bit absurd.  The strategy is therefore,
> a) can we make it more general purpose without compromising the intent
> of the function; probably not without adding overhead of using a local
> copy of the list, b) can we modify the API, function name, arg names,
> etc to make the behavior more intuitive; maybe, c) Can we at least add
> a comment to make the potentially non-intuitive behavior obvious; of
> course.  Thanks,

I had a look at this function again and agree with the observation that
the only way to make it more sane would be to create another local copy
of the list and free that up in case of error. That sounds very inefficient.

I will go with the suggestion of adding a comment here to make the
behavior obvious.
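
Something along these lines, perhaps (wording is just a suggestion,
following the modify-a-copy principle discussed above):

/*
 * Check iova region overlap with reserved regions and exclude
 * them from the valid iova range.
 *
 * Note: This may leave the iova list in an unknown/inconsistent
 * state on error, e.g. when excluding reserved [0x100, 0x1ff]
 * from a node [0x0, 0xffff], a failure of the second insert
 * returns with [0x0, 0xff] already added and the original node
 * not yet removed.  Callers are expected to work on a disposable
 * copy of the list and release that copy on failure.
 */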

Hi Alex,

If there are other comments on this series, then I will send a revised one
with this change added. Otherwise, I hope it's OK for you to add the
comment. Please let me know.

Thanks,
Shameer

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 7/7] iommu/dma: Move PCI window region reservation back into dma specific path.
@ 2018-03-22 16:21     ` Alex Williamson
  0 siblings, 0 replies; 63+ messages in thread
From: Alex Williamson @ 2018-03-22 16:21 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: eric.auger, pmorel, kvm, linux-kernel, iommu, linuxarm,
	john.garry, xuwei5, Robin Murphy, Joerg Roedel

On Thu, 15 Mar 2018 16:35:09 +0000
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> wrote:

> This pretty much reverts commit 273df9635385 ("iommu/dma: Make PCI
> window reservation generic")  by moving the PCI window region
> reservation back into the dma specific path so that these regions
> doesn't get exposed via the IOMMU API interface. With this change,
> the vfio interface will report only iommu specific reserved regions
> to the user space.
> 
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---

As currently ordered, we expose the iova list to the user in 5/7 with
the PCI window reservations still intact.  Isn't that a bisection
problem?  This patch should come before the iova list is exposed to the
user.  This is otherwise independent, so I can pop it up in the stack on
commit, but I'd need an ack from Joerg and Robin to take it via my
tree.  Thanks,

Alex

>  drivers/iommu/dma-iommu.c | 54 ++++++++++++++++++++++-------------------------
>  1 file changed, 25 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index f05f3cf..ddcbbdb 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -167,40 +167,16 @@ EXPORT_SYMBOL(iommu_put_dma_cookie);
>   * @list: Reserved region list from iommu_get_resv_regions()
>   *
>   * IOMMU drivers can use this to implement their .get_resv_regions callback
> - * for general non-IOMMU-specific reservations. Currently, this covers host
> - * bridge windows for PCI devices and GICv3 ITS region reservation on ACPI
> - * based ARM platforms that may require HW MSI reservation.
> + * for general non-IOMMU-specific reservations. Currently, this covers GICv3
> + * ITS region reservation on ACPI based ARM platforms that may require HW MSI
> + * reservation.
>   */
>  void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list)
>  {
> -	struct pci_host_bridge *bridge;
> -	struct resource_entry *window;
> -
> -	if (!is_of_node(dev->iommu_fwspec->iommu_fwnode) &&
> -		iort_iommu_msi_get_resv_regions(dev, list) < 0)
> -		return;
> -
> -	if (!dev_is_pci(dev))
> -		return;
> -
> -	bridge = pci_find_host_bridge(to_pci_dev(dev)->bus);
> -	resource_list_for_each_entry(window, &bridge->windows) {
> -		struct iommu_resv_region *region;
> -		phys_addr_t start;
> -		size_t length;
> -
> -		if (resource_type(window->res) != IORESOURCE_MEM)
> -			continue;
>  
> -		start = window->res->start - window->offset;
> -		length = window->res->end - window->res->start + 1;
> -		region = iommu_alloc_resv_region(start, length, 0,
> -				IOMMU_RESV_RESERVED);
> -		if (!region)
> -			return;
> +	if (!is_of_node(dev->iommu_fwspec->iommu_fwnode))
> +		iort_iommu_msi_get_resv_regions(dev, list);
>  
> -		list_add_tail(&region->list, list);
> -	}
>  }
>  EXPORT_SYMBOL(iommu_dma_get_resv_regions);
>  
> @@ -229,6 +205,23 @@ static int cookie_init_hw_msi_region(struct iommu_dma_cookie *cookie,
>  	return 0;
>  }
>  
> +static void iova_reserve_pci_windows(struct pci_dev *dev,
> +		struct iova_domain *iovad)
> +{
> +	struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
> +	struct resource_entry *window;
> +	unsigned long lo, hi;
> +
> +	resource_list_for_each_entry(window, &bridge->windows) {
> +		if (resource_type(window->res) != IORESOURCE_MEM)
> +			continue;
> +
> +		lo = iova_pfn(iovad, window->res->start - window->offset);
> +		hi = iova_pfn(iovad, window->res->end - window->offset);
> +		reserve_iova(iovad, lo, hi);
> +	}
> +}
> +
>  static int iova_reserve_iommu_regions(struct device *dev,
>  		struct iommu_domain *domain)
>  {
> @@ -238,6 +231,9 @@ static int iova_reserve_iommu_regions(struct device *dev,
>  	LIST_HEAD(resv_regions);
>  	int ret = 0;
>  
> +	if (dev_is_pci(dev))
> +		iova_reserve_pci_windows(to_pci_dev(dev), iovad);
> +
>  	iommu_get_resv_regions(dev, &resv_regions);
>  	list_for_each_entry(region, &resv_regions, list) {
>  		unsigned long lo, hi;

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 7/7] iommu/dma: Move PCI window region reservation back into dma specific path.
@ 2018-03-22 17:22       ` Robin Murphy
  0 siblings, 0 replies; 63+ messages in thread
From: Robin Murphy @ 2018-03-22 17:22 UTC (permalink / raw)
  To: Alex Williamson, Shameer Kolothum
  Cc: eric.auger, pmorel, kvm, linux-kernel, iommu, linuxarm,
	john.garry, xuwei5, Joerg Roedel

On 22/03/18 16:21, Alex Williamson wrote:
> On Thu, 15 Mar 2018 16:35:09 +0000
> Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> wrote:
> 
>> This pretty much reverts commit 273df9635385 ("iommu/dma: Make PCI
>> window reservation generic")  by moving the PCI window region
>> reservation back into the dma specific path so that these regions
>> doesn't get exposed via the IOMMU API interface. With this change,
>> the vfio interface will report only iommu specific reserved regions
>> to the user space.
>>
>> Cc: Robin Murphy <robin.murphy@arm.com>
>> Cc: Joerg Roedel <joro@8bytes.org>
>> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
>> ---
> 
> As currently ordered, we expose the iova list to the user in 5/7 with
> the PCI window reservations still intact.  Isn't that a bisection
> problem?  This patch should come before the iova list is exposed to the
> user.  This is otherwise independent, so I can pop it up in the stack on
> commit, but I'd need an ack from Joerg and Robin to take it via my
> tree.  Thanks,

If it counts, the changes look right, so:

Acked-by: Robin Murphy <robin.murphy@arm.com>

but it does look like there's a hard dependency on Joerg's core branch 
where Shameer's ITS workaround patches are currently queued. Otherwise, 
though, I don't think there's anything else due to be touching iommu-dma 
just yet.

Robin.

> 
> Alex
> 
>>   drivers/iommu/dma-iommu.c | 54 ++++++++++++++++++++++-------------------------
>>   1 file changed, 25 insertions(+), 29 deletions(-)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index f05f3cf..ddcbbdb 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -167,40 +167,16 @@ EXPORT_SYMBOL(iommu_put_dma_cookie);
>>    * @list: Reserved region list from iommu_get_resv_regions()
>>    *
>>    * IOMMU drivers can use this to implement their .get_resv_regions callback
>> - * for general non-IOMMU-specific reservations. Currently, this covers host
>> - * bridge windows for PCI devices and GICv3 ITS region reservation on ACPI
>> - * based ARM platforms that may require HW MSI reservation.
>> + * for general non-IOMMU-specific reservations. Currently, this covers GICv3
>> + * ITS region reservation on ACPI based ARM platforms that may require HW MSI
>> + * reservation.
>>    */
>>   void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list)
>>   {
>> -	struct pci_host_bridge *bridge;
>> -	struct resource_entry *window;
>> -
>> -	if (!is_of_node(dev->iommu_fwspec->iommu_fwnode) &&
>> -		iort_iommu_msi_get_resv_regions(dev, list) < 0)
>> -		return;
>> -
>> -	if (!dev_is_pci(dev))
>> -		return;
>> -
>> -	bridge = pci_find_host_bridge(to_pci_dev(dev)->bus);
>> -	resource_list_for_each_entry(window, &bridge->windows) {
>> -		struct iommu_resv_region *region;
>> -		phys_addr_t start;
>> -		size_t length;
>> -
>> -		if (resource_type(window->res) != IORESOURCE_MEM)
>> -			continue;
>>   
>> -		start = window->res->start - window->offset;
>> -		length = window->res->end - window->res->start + 1;
>> -		region = iommu_alloc_resv_region(start, length, 0,
>> -				IOMMU_RESV_RESERVED);
>> -		if (!region)
>> -			return;
>> +	if (!is_of_node(dev->iommu_fwspec->iommu_fwnode))
>> +		iort_iommu_msi_get_resv_regions(dev, list);
>>   
>> -		list_add_tail(&region->list, list);
>> -	}
>>   }
>>   EXPORT_SYMBOL(iommu_dma_get_resv_regions);
>>   
>> @@ -229,6 +205,23 @@ static int cookie_init_hw_msi_region(struct iommu_dma_cookie *cookie,
>>   	return 0;
>>   }
>>   
>> +static void iova_reserve_pci_windows(struct pci_dev *dev,
>> +		struct iova_domain *iovad)
>> +{
>> +	struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
>> +	struct resource_entry *window;
>> +	unsigned long lo, hi;
>> +
>> +	resource_list_for_each_entry(window, &bridge->windows) {
>> +		if (resource_type(window->res) != IORESOURCE_MEM)
>> +			continue;
>> +
>> +		lo = iova_pfn(iovad, window->res->start - window->offset);
>> +		hi = iova_pfn(iovad, window->res->end - window->offset);
>> +		reserve_iova(iovad, lo, hi);
>> +	}
>> +}
>> +
>>   static int iova_reserve_iommu_regions(struct device *dev,
>>   		struct iommu_domain *domain)
>>   {
>> @@ -238,6 +231,9 @@ static int iova_reserve_iommu_regions(struct device *dev,
>>   	LIST_HEAD(resv_regions);
>>   	int ret = 0;
>>   
>> +	if (dev_is_pci(dev))
>> +		iova_reserve_pci_windows(to_pci_dev(dev), iovad);
>> +
>>   	iommu_get_resv_regions(dev, &resv_regions);
>>   	list_for_each_entry(region, &resv_regions, list) {
>>   		unsigned long lo, hi;
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* RE: [PATCH v5 7/7] iommu/dma: Move PCI window region reservation back into dma specific path.
@ 2018-03-23  8:57         ` Shameerali Kolothum Thodi
  0 siblings, 0 replies; 63+ messages in thread
From: Shameerali Kolothum Thodi @ 2018-03-23  8:57 UTC (permalink / raw)
  To: Robin Murphy, Alex Williamson
  Cc: eric.auger, pmorel, kvm, linux-kernel, iommu, Linuxarm,
	John Garry, xuwei (O),
	Joerg Roedel



> -----Original Message-----
> From: Robin Murphy [mailto:robin.murphy@arm.com]
> Sent: Thursday, March 22, 2018 5:22 PM
> To: Alex Williamson <alex.williamson@redhat.com>; Shameerali Kolothum
> Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: eric.auger@redhat.com; pmorel@linux.vnet.ibm.com;
> kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> iommu@lists.linux-foundation.org; Linuxarm <linuxarm@huawei.com>; John Garry
> <john.garry@huawei.com>; xuwei (O) <xuwei5@huawei.com>; Joerg Roedel
> <joro@8bytes.org>
> Subject: Re: [PATCH v5 7/7] iommu/dma: Move PCI window region reservation
> back into dma specific path.
> 
> On 22/03/18 16:21, Alex Williamson wrote:
> > On Thu, 15 Mar 2018 16:35:09 +0000
> > Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> wrote:
> >
> >> This pretty much reverts commit 273df9635385 ("iommu/dma: Make PCI
> >> window reservation generic")  by moving the PCI window region
> >> reservation back into the dma specific path so that these regions
> >> doesn't get exposed via the IOMMU API interface. With this change,
> >> the vfio interface will report only iommu specific reserved regions
> >> to the user space.
> >>
> >> Cc: Robin Murphy <robin.murphy@arm.com>
> >> Cc: Joerg Roedel <joro@8bytes.org>
> >> Signed-off-by: Shameer Kolothum
> <shameerali.kolothum.thodi@huawei.com>
> >> ---
> >
> > As currently ordered, we expose the iova list to the user in 5/7 with
> > the PCI window reservations still intact.  Isn't that a bisection
> > problem?  This patch should come before the iova list is exposed to the
> > user.  This is otherwise independent, so I can pop it up in the stack on
> > commit, but I'd need an ack from Joerg and Robin to take it via my
> > tree.  Thanks,
> 
> If it counts, the changes look right, so:
> 
> Acked-by: Robin Murphy <robin.murphy@arm.com>

Thanks Robin.
 
> but it does look like there's a hard dependency on Joerg's core branch
> where Shameer's ITS workaround patches are currently queued. Otherwise,
> though, I don't think there's anything else due to be touching iommu-dma
> just yet.

True, I have mentioned this dependency[1] in the cover letter.

Thanks,
Shameer

1. https://kernel.googlesource.com/pub/scm/linux/kernel/git/joro/iommu/+log/core


> Robin.
> 
> >
> > Alex
> >
> >>   drivers/iommu/dma-iommu.c | 54 ++++++++++++++++++++++-------------------------
> >>   1 file changed, 25 insertions(+), 29 deletions(-)
> >>
> >> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> >> index f05f3cf..ddcbbdb 100644
> >> --- a/drivers/iommu/dma-iommu.c
> >> +++ b/drivers/iommu/dma-iommu.c
> >> @@ -167,40 +167,16 @@ EXPORT_SYMBOL(iommu_put_dma_cookie);
> >>    * @list: Reserved region list from iommu_get_resv_regions()
> >>    *
> >>    * IOMMU drivers can use this to implement their .get_resv_regions callback
> >> - * for general non-IOMMU-specific reservations. Currently, this covers host
> >> - * bridge windows for PCI devices and GICv3 ITS region reservation on ACPI
> >> - * based ARM platforms that may require HW MSI reservation.
> >> + * for general non-IOMMU-specific reservations. Currently, this covers GICv3
> >> + * ITS region reservation on ACPI based ARM platforms that may require HW MSI
> >> + * reservation.
> >>    */
> >>   void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list)
> >>   {
> >> -	struct pci_host_bridge *bridge;
> >> -	struct resource_entry *window;
> >> -
> >> -	if (!is_of_node(dev->iommu_fwspec->iommu_fwnode) &&
> >> -		iort_iommu_msi_get_resv_regions(dev, list) < 0)
> >> -		return;
> >> -
> >> -	if (!dev_is_pci(dev))
> >> -		return;
> >> -
> >> -	bridge = pci_find_host_bridge(to_pci_dev(dev)->bus);
> >> -	resource_list_for_each_entry(window, &bridge->windows) {
> >> -		struct iommu_resv_region *region;
> >> -		phys_addr_t start;
> >> -		size_t length;
> >> -
> >> -		if (resource_type(window->res) != IORESOURCE_MEM)
> >> -			continue;
> >>
> >> -		start = window->res->start - window->offset;
> >> -		length = window->res->end - window->res->start + 1;
> >> -		region = iommu_alloc_resv_region(start, length, 0,
> >> -				IOMMU_RESV_RESERVED);
> >> -		if (!region)
> >> -			return;
> >> +	if (!is_of_node(dev->iommu_fwspec->iommu_fwnode))
> >> +		iort_iommu_msi_get_resv_regions(dev, list);
> >>
> >> -		list_add_tail(&region->list, list);
> >> -	}
> >>   }
> >>   EXPORT_SYMBOL(iommu_dma_get_resv_regions);
> >>
> >> @@ -229,6 +205,23 @@ static int cookie_init_hw_msi_region(struct iommu_dma_cookie *cookie,
> >>   	return 0;
> >>   }
> >>
> >> +static void iova_reserve_pci_windows(struct pci_dev *dev,
> >> +		struct iova_domain *iovad)
> >> +{
> >> +	struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
> >> +	struct resource_entry *window;
> >> +	unsigned long lo, hi;
> >> +
> >> +	resource_list_for_each_entry(window, &bridge->windows) {
> >> +		if (resource_type(window->res) != IORESOURCE_MEM)
> >> +			continue;
> >> +
> >> +		lo = iova_pfn(iovad, window->res->start - window->offset);
> >> +		hi = iova_pfn(iovad, window->res->end - window->offset);
> >> +		reserve_iova(iovad, lo, hi);
> >> +	}
> >> +}
> >> +
> >>   static int iova_reserve_iommu_regions(struct device *dev,
> >>   		struct iommu_domain *domain)
> >>   {
> >> @@ -238,6 +231,9 @@ static int iova_reserve_iommu_regions(struct device *dev,
> >>   	LIST_HEAD(resv_regions);
> >>   	int ret = 0;
> >>
> >> +	if (dev_is_pci(dev))
> >> +		iova_reserve_pci_windows(to_pci_dev(dev), iovad);
> >> +
> >>   	iommu_get_resv_regions(dev, &resv_regions);
> >>   	list_for_each_entry(region, &resv_regions, list) {
> >>   		unsigned long lo, hi;
> >

^ permalink raw reply	[flat|nested] 63+ messages in thread

* RE: [PATCH v5 7/7] iommu/dma: Move PCI window region reservation back into dma specific path.
@ 2018-03-23  8:57         ` Shameerali Kolothum Thodi
  0 siblings, 0 replies; 63+ messages in thread
From: Shameerali Kolothum Thodi @ 2018-03-23  8:57 UTC (permalink / raw)
  To: Robin Murphy, Alex Williamson
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA,
	pmorel-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Linuxarm,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, xuwei (O)



> -----Original Message-----
> From: Robin Murphy [mailto:robin.murphy-5wv7dgnIgG8@public.gmane.org]
> Sent: Thursday, March 22, 2018 5:22 PM
> To: Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>; Shameerali Kolothum
> Thodi <shameerali.kolothum.thodi-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> Cc: eric.auger-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org; pmorel-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org;
> kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; iommu-cunTk1MwBs/ROKNJybVBZg@public.gmane.org
> foundation.org; Linuxarm <linuxarm-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>; John Garry
> <john.garry-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>; xuwei (O) <xuwei5-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>; Joerg Roedel
> <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
> Subject: Re: [PATCH v5 7/7] iommu/dma: Move PCI window region reservation
> back into dma specific path.
> 
> On 22/03/18 16:21, Alex Williamson wrote:
> > On Thu, 15 Mar 2018 16:35:09 +0000
> > Shameer Kolothum <shameerali.kolothum.thodi-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> wrote:
> >
> >> This pretty much reverts commit 273df9635385 ("iommu/dma: Make PCI
> >> window reservation generic")  by moving the PCI window region
> >> reservation back into the dma specific path so that these regions
> >> doesn't get exposed via the IOMMU API interface. With this change,
> >> the vfio interface will report only iommu specific reserved regions
> >> to the user space.
> >>
> >> Cc: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
> >> Cc: Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
> >> Signed-off-by: Shameer Kolothum
> <shameerali.kolothum.thodi-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> >> ---
> >
> > As currently ordered, we expose the iova list to the user in 5/7 with
> > the PCI window reservations still intact.  Isn't that a bisection
> > problem?  This patch should come before the iova list is exposed to the
> > user.  This is otherwise independent, so I can pop it up in the stack on
> > commit, but I'd need an ack from Joerg and Robin to take it via my
> > tree.  Thanks,
> 
> If it counts, the changes look right, so:
> 
> Acked-by: Robin Murphy <robin.murphy@arm.com>

Thanks Robin.
 
> but it does look like there's a hard dependency on Joerg's core branch
> where Shameer's ITS workaround patches are currently queued. Otherwise,
> though, I don't think there's anything else due to be touching iommu-dma
> just yet.

True, I have mentioned this dependency[1] in the cover letter.

Thanks,
Shameer

1. https://kernel.googlesource.com/pub/scm/linux/kernel/git/joro/iommu/+log/core


> Robin.
> 
> >
> > Alex
> >
> >>   drivers/iommu/dma-iommu.c | 54 ++++++++++++++++++++++-------------------------
> >>   1 file changed, 25 insertions(+), 29 deletions(-)
> >>
> >> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> >> index f05f3cf..ddcbbdb 100644
> >> --- a/drivers/iommu/dma-iommu.c
> >> +++ b/drivers/iommu/dma-iommu.c
> >> @@ -167,40 +167,16 @@ EXPORT_SYMBOL(iommu_put_dma_cookie);
> >>    * @list: Reserved region list from iommu_get_resv_regions()
> >>    *
> >>    * IOMMU drivers can use this to implement their .get_resv_regions callback
> >> - * for general non-IOMMU-specific reservations. Currently, this covers host
> >> - * bridge windows for PCI devices and GICv3 ITS region reservation on ACPI
> >> - * based ARM platforms that may require HW MSI reservation.
> >> + * for general non-IOMMU-specific reservations. Currently, this covers GICv3
> >> + * ITS region reservation on ACPI based ARM platforms that may require HW MSI
> >> + * reservation.
> >>    */
> >>   void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list)
> >>   {
> >> -	struct pci_host_bridge *bridge;
> >> -	struct resource_entry *window;
> >> -
> >> -	if (!is_of_node(dev->iommu_fwspec->iommu_fwnode) &&
> >> -		iort_iommu_msi_get_resv_regions(dev, list) < 0)
> >> -		return;
> >> -
> >> -	if (!dev_is_pci(dev))
> >> -		return;
> >> -
> >> -	bridge = pci_find_host_bridge(to_pci_dev(dev)->bus);
> >> -	resource_list_for_each_entry(window, &bridge->windows) {
> >> -		struct iommu_resv_region *region;
> >> -		phys_addr_t start;
> >> -		size_t length;
> >> -
> >> -		if (resource_type(window->res) != IORESOURCE_MEM)
> >> -			continue;
> >>
> >> -		start = window->res->start - window->offset;
> >> -		length = window->res->end - window->res->start + 1;
> >> -		region = iommu_alloc_resv_region(start, length, 0,
> >> -				IOMMU_RESV_RESERVED);
> >> -		if (!region)
> >> -			return;
> >> +	if (!is_of_node(dev->iommu_fwspec->iommu_fwnode))
> >> +		iort_iommu_msi_get_resv_regions(dev, list);
> >>
> >> -		list_add_tail(&region->list, list);
> >> -	}
> >>   }
> >>   EXPORT_SYMBOL(iommu_dma_get_resv_regions);
> >>
> >> @@ -229,6 +205,23 @@ static int cookie_init_hw_msi_region(struct iommu_dma_cookie *cookie,
> >>   	return 0;
> >>   }
> >>
> >> +static void iova_reserve_pci_windows(struct pci_dev *dev,
> >> +		struct iova_domain *iovad)
> >> +{
> >> +	struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
> >> +	struct resource_entry *window;
> >> +	unsigned long lo, hi;
> >> +
> >> +	resource_list_for_each_entry(window, &bridge->windows) {
> >> +		if (resource_type(window->res) != IORESOURCE_MEM)
> >> +			continue;
> >> +
> >> +		lo = iova_pfn(iovad, window->res->start - window->offset);
> >> +		hi = iova_pfn(iovad, window->res->end - window->offset);
> >> +		reserve_iova(iovad, lo, hi);
> >> +	}
> >> +}
> >> +
> >>   static int iova_reserve_iommu_regions(struct device *dev,
> >>   		struct iommu_domain *domain)
> >>   {
> >> @@ -238,6 +231,9 @@ static int iova_reserve_iommu_regions(struct device *dev,
> >>   	LIST_HEAD(resv_regions);
> >>   	int ret = 0;
> >>
> >> +	if (dev_is_pci(dev))
> >> +		iova_reserve_pci_windows(to_pci_dev(dev), iovad);
> >> +
> >>   	iommu_get_resv_regions(dev, &resv_regions);
> >>   	list_for_each_entry(region, &resv_regions, list) {
> >>   		unsigned long lo, hi;
> >

^ permalink raw reply	[flat|nested] 63+ messages in thread
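The practical payoff of keeping PCI windows out of iommu_dma_get_resv_regions()
is that the valid IOVA list exposed by patch 5/7 stays meaningful: user space
learns the usable ranges from the VFIO_IOMMU_GET_INFO capability chain rather
than guessing. Below is a minimal user-space sketch of walking that chain.
The capability ID and struct names (VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE,
struct vfio_iommu_type1_info_cap_iova_range) are the ones proposed by this
series and should be treated as not-yet-merged uAPI; container_fd is assumed
to be an open VFIO container with a group already attached.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static void print_valid_iova_ranges(int container_fd)
{
	struct vfio_iommu_type1_info *info;
	struct vfio_info_cap_header *hdr;
	__u32 argsz = sizeof(*info);
	void *tmp;

	info = calloc(1, argsz);
	if (!info)
		return;
	info->argsz = argsz;

	/* First call reports the size needed for the capability chain. */
	if (ioctl(container_fd, VFIO_IOMMU_GET_INFO, info))
		goto out;
	if (info->argsz > argsz) {
		argsz = info->argsz;
		tmp = realloc(info, argsz);
		if (!tmp)
			goto out;
		info = tmp;
		memset(info, 0, argsz);
		info->argsz = argsz;
		if (ioctl(container_fd, VFIO_IOMMU_GET_INFO, info))
			goto out;
	}

	if (!(info->flags & VFIO_IOMMU_INFO_CAPS) || !info->cap_offset)
		goto out;

	/* Capability offsets are relative to the start of the info struct. */
	for (hdr = (void *)info + info->cap_offset; ;
	     hdr = (void *)info + hdr->next) {
		if (hdr->id == VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE) {
			struct vfio_iommu_type1_info_cap_iova_range *cap =
							(void *)hdr;
			unsigned int i;

			for (i = 0; i < cap->nr_iovas; i++)
				printf("valid iova: 0x%llx..0x%llx\n",
				       (unsigned long long)cap->iova_ranges[i].start,
				       (unsigned long long)cap->iova_ranges[i].end);
		}
		if (!hdr->next)
			break;
	}
out:
	free(info);
}

A QEMU-like consumer would typically run this once per container and reject
guest memory layouts that fall outside the reported ranges, mirroring the
kernel-side DMA map check added in patch 4/7.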

* RE: [PATCH v5 7/7] iommu/dma: Move PCI window region reservation back into dma specific path.
@ 2018-03-28 13:41           ` Shameerali Kolothum Thodi
  0 siblings, 0 replies; 63+ messages in thread
From: Shameerali Kolothum Thodi @ 2018-03-28 13:41 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, Robin Murphy, Alex Williamson, Joerg Roedel
  Cc: kvm, pmorel, linux-kernel, Linuxarm, eric.auger, iommu, xuwei (O)

Hi Joerg,

> -----Original Message-----
> From: Linuxarm [mailto:linuxarm-bounces@huawei.com] On Behalf Of
> Shameerali Kolothum Thodi
> Sent: Friday, March 23, 2018 8:57 AM
> To: Robin Murphy <robin.murphy@arm.com>; Alex Williamson
> <alex.williamson@redhat.com>
> Cc: kvm@vger.kernel.org; Joerg Roedel <joro@8bytes.org>;
> pmorel@linux.vnet.ibm.com; linux-kernel@vger.kernel.org; Linuxarm
> <linuxarm@huawei.com>; eric.auger@redhat.com; iommu@lists.linux-
> foundation.org; xuwei (O) <xuwei5@huawei.com>
> Subject: RE: [PATCH v5 7/7] iommu/dma: Move PCI window region reservation
> back into dma specific path.
> 
> 
> 
> > -----Original Message-----
> > From: Robin Murphy [mailto:robin.murphy@arm.com]
> > Sent: Thursday, March 22, 2018 5:22 PM
> > To: Alex Williamson <alex.williamson@redhat.com>; Shameerali Kolothum
> > Thodi <shameerali.kolothum.thodi@huawei.com>
> > Cc: eric.auger@redhat.com; pmorel@linux.vnet.ibm.com;
> > kvm@vger.kernel.org; linux-kernel@vger.kernel.org; iommu@lists.linux-
> > foundation.org; Linuxarm <linuxarm@huawei.com>; John Garry
> > <john.garry@huawei.com>; xuwei (O) <xuwei5@huawei.com>; Joerg Roedel
> > <joro@8bytes.org>
> > Subject: Re: [PATCH v5 7/7] iommu/dma: Move PCI window region
> > reservation back into dma specific path.
> >
> > On 22/03/18 16:21, Alex Williamson wrote:
> > > On Thu, 15 Mar 2018 16:35:09 +0000
> > > Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> wrote:
> > >
> > >> This pretty much reverts commit 273df9635385 ("iommu/dma: Make PCI
> > >> window reservation generic") by moving the PCI window region
> > >> reservation back into the dma specific path so that these regions
> > >> don't get exposed via the IOMMU API interface. With this change,
> > >> the vfio interface will report only iommu specific reserved regions
> > >> to the user space.
> > >>
> > >> Cc: Robin Murphy <robin.murphy@arm.com>
> > >> Cc: Joerg Roedel <joro@8bytes.org>
> > >> Signed-off-by: Shameer Kolothum
> > <shameerali.kolothum.thodi@huawei.com>
> > >> ---
> > >
> > > As currently ordered, we expose the iova list to the user in 5/7
> > > with the PCI window reservations still intact.  Isn't that a
> > > bisection problem?  This patch should come before the iova list is
> > > exposed to the user.  This is otherwise independent, so I can pop it
> > > up in the stack on commit, but I'd need an ack from Joerg and Robin
> > > to take it via my tree.  Thanks,
> >
> > If it counts, the changes look right, so:
> >
> > Acked-by: Robin Murphy <robin.murphy@arm.com>
> 
> Thanks Robin.
> 
> > but it does look like there's a hard dependency on Joerg's core branch
> > where Shameer's ITS workaround patches are currently queued.
> > Otherwise, though, I don't think there's anything else due to be
> > touching iommu-dma just yet.
> 
> True, I have mentioned this dependency[1] in the cover letter.

Just a gentle ping on this.

If you are OK with this one, Alex might be able to take it via his tree,
as he mentioned above; otherwise, please advise on the best way to merge
it, given the dependency on the ITS workaround patches.

Thanks,
Shameer

> 1.
> https://kernel.googlesource.com/pub/scm/linux/kernel/git/joro/iommu/+log/c
> ore
> 
> 
> > Robin.
> >
> > >
> > > Alex
> > >
> > >>   drivers/iommu/dma-iommu.c | 54 ++++++++++++++++++++++-------------------------
> > >>   1 file changed, 25 insertions(+), 29 deletions(-)
> > >>
> > >> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > >> index f05f3cf..ddcbbdb 100644
> > >> --- a/drivers/iommu/dma-iommu.c
> > >> +++ b/drivers/iommu/dma-iommu.c
> > >> @@ -167,40 +167,16 @@ EXPORT_SYMBOL(iommu_put_dma_cookie);
> > >>    * @list: Reserved region list from iommu_get_resv_regions()
> > >>    *
> > >>    * IOMMU drivers can use this to implement their .get_resv_regions callback
> > >> - * for general non-IOMMU-specific reservations. Currently, this covers host
> > >> - * bridge windows for PCI devices and GICv3 ITS region reservation on ACPI
> > >> - * based ARM platforms that may require HW MSI reservation.
> > >> + * for general non-IOMMU-specific reservations. Currently, this covers GICv3
> > >> + * ITS region reservation on ACPI based ARM platforms that may require HW MSI
> > >> + * reservation.
> > >>    */
> > >>   void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list)
> > >>   {
> > >> -	struct pci_host_bridge *bridge;
> > >> -	struct resource_entry *window;
> > >> -
> > >> -	if (!is_of_node(dev->iommu_fwspec->iommu_fwnode) &&
> > >> -		iort_iommu_msi_get_resv_regions(dev, list) < 0)
> > >> -		return;
> > >> -
> > >> -	if (!dev_is_pci(dev))
> > >> -		return;
> > >> -
> > >> -	bridge = pci_find_host_bridge(to_pci_dev(dev)->bus);
> > >> -	resource_list_for_each_entry(window, &bridge->windows) {
> > >> -		struct iommu_resv_region *region;
> > >> -		phys_addr_t start;
> > >> -		size_t length;
> > >> -
> > >> -		if (resource_type(window->res) != IORESOURCE_MEM)
> > >> -			continue;
> > >>
> > >> -		start = window->res->start - window->offset;
> > >> -		length = window->res->end - window->res->start + 1;
> > >> -		region = iommu_alloc_resv_region(start, length, 0,
> > >> -				IOMMU_RESV_RESERVED);
> > >> -		if (!region)
> > >> -			return;
> > >> +	if (!is_of_node(dev->iommu_fwspec->iommu_fwnode))
> > >> +		iort_iommu_msi_get_resv_regions(dev, list);
> > >>
> > >> -		list_add_tail(&region->list, list);
> > >> -	}
> > >>   }
> > >>   EXPORT_SYMBOL(iommu_dma_get_resv_regions);
> > >>
> > >> @@ -229,6 +205,23 @@ static int cookie_init_hw_msi_region(struct iommu_dma_cookie *cookie,
> > >>   	return 0;
> > >>   }
> > >>
> > >> +static void iova_reserve_pci_windows(struct pci_dev *dev,
> > >> +		struct iova_domain *iovad)
> > >> +{
> > >> +	struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
> > >> +	struct resource_entry *window;
> > >> +	unsigned long lo, hi;
> > >> +
> > >> +	resource_list_for_each_entry(window, &bridge->windows) {
> > >> +		if (resource_type(window->res) != IORESOURCE_MEM)
> > >> +			continue;
> > >> +
> > >> +		lo = iova_pfn(iovad, window->res->start - window->offset);
> > >> +		hi = iova_pfn(iovad, window->res->end - window->offset);
> > >> +		reserve_iova(iovad, lo, hi);
> > >> +	}
> > >> +}
> > >> +
> > >>   static int iova_reserve_iommu_regions(struct device *dev,
> > >>   		struct iommu_domain *domain)
> > >>   {
> > >> @@ -238,6 +231,9 @@ static int iova_reserve_iommu_regions(struct device *dev,
> > >>   	LIST_HEAD(resv_regions);
> > >>   	int ret = 0;
> > >>
> > >> +	if (dev_is_pci(dev))
> > >> +		iova_reserve_pci_windows(to_pci_dev(dev), iovad);
> > >> +
> > >>   	iommu_get_resv_regions(dev, &resv_regions);
> > >>   	list_for_each_entry(region, &resv_regions, list) {
> > >>   		unsigned long lo, hi;
> > >
> _______________________________________________
> Linuxarm mailing list
> Linuxarm@huawei.com
> http://hulk.huawei.com/mailman/listinfo/linuxarm

^ permalink raw reply	[flat|nested] 63+ messages in thread
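Independent of how the patch gets routed, its effect is easy to verify on a
running system: the per-group reserved_regions attribute in sysfs reports
whatever iommu_get_resv_regions() returns, so after this change PCI host
bridge windows should disappear from it while MSI/ITS regions remain. A small
sketch follows; the group number 0 is a placeholder for whichever IOMMU group
the device actually landed in.

#include <stdio.h>

int main(void)
{
	/* Each line reads "<start> <end> <type>", e.g. a "msi" region. */
	FILE *f = fopen("/sys/kernel/iommu_groups/0/reserved_regions", "r");
	char line[128];

	if (!f) {
		perror("fopen");
		return 1;
	}
	while (fgets(line, sizeof(line), f))
		fputs(line, stdout);
	fclose(f);
	return 0;
}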


end of thread, other threads:[~2018-03-28 13:41 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-15 16:35 [PATCH v5 0/7] vfio/type1: Add support for valid iova list management Shameer Kolothum
2018-03-15 16:35 ` [PATCH v5 1/7] vfio/type1: Introduce iova list and add iommu aperture validity check Shameer Kolothum
2018-03-15 16:35 ` [PATCH v5 2/7] vfio/type1: Check reserve region conflict and update iova list Shameer Kolothum
2018-03-19  7:51   ` Tian, Kevin
2018-03-19 10:55     ` Shameerali Kolothum Thodi
2018-03-19 12:16       ` Tian, Kevin
2018-03-20 22:37     ` Alex Williamson
2018-03-21  3:30       ` Tian, Kevin
2018-03-21 16:31         ` Alex Williamson
2018-03-22  9:15           ` Shameerali Kolothum Thodi
2018-03-15 16:35 ` [PATCH v5 3/7] vfio/type1: Update iova list on detach Shameer Kolothum
2018-03-15 16:35 ` [PATCH v5 4/7] vfio/type1: check dma map request is within a valid iova range Shameer Kolothum
2018-03-15 16:35 ` [PATCH v5 5/7] vfio/type1: Add IOVA range capability support Shameer Kolothum
2018-03-15 16:35 ` [PATCH v5 6/7] vfio/type1: remove duplicate retrieval of reserved regions Shameer Kolothum
2018-03-15 16:35 ` [PATCH v5 7/7] iommu/dma: Move PCI window region reservation back into dma specific path Shameer Kolothum
2018-03-22 16:21   ` Alex Williamson
2018-03-22 17:22     ` Robin Murphy
2018-03-23  8:57       ` Shameerali Kolothum Thodi
2018-03-28 13:41         ` Shameerali Kolothum Thodi
2018-03-19  8:28 ` [PATCH v5 0/7] vfio/type1: Add support for valid iova list management Tian, Kevin
2018-03-19 10:54   ` Shameerali Kolothum Thodi
2018-03-19 12:12     ` Tian, Kevin
2018-03-20 22:55   ` Alex Williamson
2018-03-21  3:28     ` Tian, Kevin
2018-03-21 17:11       ` Alex Williamson
2018-03-22  9:10         ` Tian, Kevin
