KVM Archive on lore.kernel.org
* [PATCH 0/3] vfio/iommu_type1: Implement dirty log tracking based on IOMMU HWDBM
@ 2021-04-13  9:14 Keqian Zhu
  2021-04-13  9:14 ` [PATCH 1/3] vfio/iommu_type1: Add HWDBM status maintenance Keqian Zhu
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Keqian Zhu @ 2021-04-13  9:14 UTC (permalink / raw)
  To: linux-kernel, kvm, Alex Williamson, Kirti Wankhede,
	Cornelia Huck, Yi Sun, Tian Kevin
  Cc: Robin Murphy, Will Deacon, Joerg Roedel, Jean-Philippe Brucker,
	Jonathan Cameron, Lu Baolu, wanghaibin.wang, jiangkunkun,
	yuzenghui, lushenming

Hi everyone,

This patch series implements VFIO DMA dirty log tracking based on IOMMU HWDBM (hardware
dirty bit management, such as an SMMU with HTTU or an Intel IOMMU with SLADE).

This patch series is split from the series[1] that contains both the IOMMU part and
the VFIO part. Please refer to the new IOMMU part[2] for review or testing.

Intention:

As we know, vfio live migration is an important and valuable feature, but there
are still many hurdles to clear, including migration of interrupts, device state,
DMA dirty log tracking, etc.

For now, the only dirty log tracking interface is pinning. It has some drawbacks:
1. Only smart vendor drivers are aware of this.
2. It's coarse-grained: the pinned scope is generally bigger than what the device
   actually accesses.
3. It can't track dirty pages continuously and precisely: vfio reports the entire
   pinned scope as dirty, so it doesn't work well with iterative dirty log handling.

About this series:

Implement a new dirty log tracking method for vfio based on IOMMU HWDBM. A new
ioctl operation named VFIO_DIRTY_LOG_MANUAL_CLEAR is added, which can eliminate
some redundant dirty-page handling in userspace.
   
Optimizations Todo:

1. We recognized that each smmu_domain (a vfio_container may have several smmu_domains) has its
   own stage 1 mapping, and we must scan all these mappings to sync the dirty state. We plan to
   refactor smmu_domain to support more than one SMMU per smmu_domain, so that these SMMUs can
   share the same stage 1 mapping.
2. We also recognized that scanning TTDs is a performance hotspot. Recently, we implemented a
   combined SW/HW dirty log tracking scheme at the MMU side[3], which effectively solves this
   problem. The same idea can be applied to the SMMU side too.

Thanks,
Keqian

[1] https://lore.kernel.org/linux-iommu/20210310090614.26668-1-zhukeqian1@huawei.com/
[2] https://lore.kernel.org/linux-iommu/20210413085457.25400-1-zhukeqian1@huawei.com/  
[3] https://lore.kernel.org/linux-arm-kernel/20210126124444.27136-1-zhukeqian1@huawei.com/

Kunkun Jiang (3):
  vfio/iommu_type1: Add HWDBM status maintenance
  vfio/iommu_type1: Optimize dirty bitmap population based on iommu
    HWDBM
  vfio/iommu_type1: Add support for manual dirty log clear

 drivers/vfio/vfio_iommu_type1.c | 310 ++++++++++++++++++++++++++++++--
 include/uapi/linux/vfio.h       |  28 ++-
 2 files changed, 326 insertions(+), 12 deletions(-)

-- 
2.19.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/3] vfio/iommu_type1: Add HWDBM status maintenance
  2021-04-13  9:14 [PATCH 0/3] vfio/iommu_type1: Implement dirty log tracking based on IOMMU HWDBM Keqian Zhu
@ 2021-04-13  9:14 ` Keqian Zhu
  2021-04-13 16:25   ` kernel test robot
  2021-04-14  3:14   ` kernel test robot
  2021-04-13  9:14 ` [PATCH 2/3] vfio/iommu_type1: Optimize dirty bitmap population based on iommu HWDBM Keqian Zhu
  2021-04-13  9:14 ` [PATCH 3/3] vfio/iommu_type1: Add support for manual dirty log clear Keqian Zhu
  2 siblings, 2 replies; 9+ messages in thread
From: Keqian Zhu @ 2021-04-13  9:14 UTC (permalink / raw)
  To: linux-kernel, kvm, Alex Williamson, Kirti Wankhede,
	Cornelia Huck, Yi Sun, Tian Kevin
  Cc: Robin Murphy, Will Deacon, Joerg Roedel, Jean-Philippe Brucker,
	Jonathan Cameron, Lu Baolu, wanghaibin.wang, jiangkunkun,
	yuzenghui, lushenming

From: Kunkun Jiang <jiangkunkun@huawei.com>

We are going to optimize dirty log tracking based on the IOMMU
HWDBM feature, but the dirty log from the IOMMU is useful only
when all iommu-backed groups support HWDBM.

This maintains a counter in vfio_iommu, which is used by the
dirty bitmap population policy in the next patch.

This also maintains a counter in vfio_domain, which is used by
the dirty log switching policy in the next patch.

Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---
 drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 45cbfd4879a5..9cb9ce021b22 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -73,6 +73,7 @@ struct vfio_iommu {
 	unsigned int		vaddr_invalid_count;
 	uint64_t		pgsize_bitmap;
 	uint64_t		num_non_pinned_groups;
+	uint64_t		num_non_hwdbm_groups;
 	wait_queue_head_t	vaddr_wait;
 	bool			v2;
 	bool			nesting;
@@ -85,6 +86,7 @@ struct vfio_domain {
 	struct iommu_domain	*domain;
 	struct list_head	next;
 	struct list_head	group_list;
+	uint64_t		num_non_hwdbm_groups;
 	int			prot;		/* IOMMU_CACHE */
 	bool			fgsp;		/* Fine-grained super pages */
 };
@@ -116,6 +118,7 @@ struct vfio_group {
 	struct list_head	next;
 	bool			mdev_group;	/* An mdev group */
 	bool			pinned_page_dirty_scope;
+	bool			iommu_hwdbm;	/* For iommu-backed group */
 };
 
 struct vfio_iova {
@@ -2252,6 +2255,44 @@ static void vfio_iommu_iova_insert_copy(struct vfio_iommu *iommu,
 	list_splice_tail(iova_copy, iova);
 }
 
+static int vfio_dev_enable_feature(struct device *dev, void *data)
+{
+	enum iommu_dev_features *feat = data;
+
+	if (iommu_dev_feature_enabled(dev, *feat))
+		return 0;
+
+	return iommu_dev_enable_feature(dev, *feat);
+}
+
+static bool vfio_group_supports_hwdbm(struct vfio_group *group)
+{
+	enum iommu_dev_features feat = IOMMU_DEV_FEAT_HWDBM;
+
+	return !iommu_group_for_each_dev(group->iommu_group, &feat,
+					 vfio_dev_enable_feature);
+}
+
+/*
+ * Called after a new group is added to the group_list of domain, or before an
+ * old group is removed from the group_list of domain.
+ */
+static void vfio_iommu_update_hwdbm(struct vfio_iommu *iommu,
+				    struct vfio_domain *domain,
+				    struct vfio_group *group,
+				    bool attach)
+{
+	/* Update the HWDBM status of group, domain and iommu */
+	group->iommu_hwdbm = vfio_group_supports_hwdbm(group);
+	if (!group->iommu_hwdbm && attach) {
+		domain->num_non_hwdbm_groups++;
+		iommu->num_non_hwdbm_groups++;
+	} else if (!group->iommu_hwdbm && !attach) {
+		domain->num_non_hwdbm_groups--;
+		iommu->num_non_hwdbm_groups--;
+	}
+}
+
 static int vfio_iommu_type1_attach_group(void *iommu_data,
 					 struct iommu_group *iommu_group)
 {
@@ -2409,6 +2450,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 			vfio_iommu_detach_group(domain, group);
 			if (!vfio_iommu_attach_group(d, group)) {
 				list_add(&group->next, &d->group_list);
+				vfio_iommu_update_hwdbm(iommu, d, group, true);
 				iommu_domain_free(domain->domain);
 				kfree(domain);
 				goto done;
@@ -2435,6 +2477,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 
 	list_add(&domain->next, &iommu->domain_list);
 	vfio_update_pgsize_bitmap(iommu);
+	vfio_iommu_update_hwdbm(iommu, domain, group, true);
 done:
 	/* Delete the old one and insert new iova list */
 	vfio_iommu_iova_insert_copy(iommu, &iova_copy);
@@ -2618,6 +2661,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
 			continue;
 
 		vfio_iommu_detach_group(domain, group);
+		vfio_iommu_update_hwdbm(iommu, domain, group, false);
 		update_dirty_scope = !group->pinned_page_dirty_scope;
 		list_del(&group->next);
 		kfree(group);
-- 
2.19.1



* [PATCH 2/3] vfio/iommu_type1: Optimize dirty bitmap population based on iommu HWDBM
  2021-04-13  9:14 [PATCH 0/3] vfio/iommu_type1: Implement dirty log tracking based on IOMMU HWDBM Keqian Zhu
  2021-04-13  9:14 ` [PATCH 1/3] vfio/iommu_type1: Add HWDBM status maintenance Keqian Zhu
@ 2021-04-13  9:14 ` Keqian Zhu
  2021-04-13 18:05   ` kernel test robot
  2021-04-13  9:14 ` [PATCH 3/3] vfio/iommu_type1: Add support for manual dirty log clear Keqian Zhu
  2 siblings, 1 reply; 9+ messages in thread
From: Keqian Zhu @ 2021-04-13  9:14 UTC (permalink / raw)
  To: linux-kernel, kvm, Alex Williamson, Kirti Wankhede,
	Cornelia Huck, Yi Sun, Tian Kevin
  Cc: Robin Murphy, Will Deacon, Joerg Roedel, Jean-Philippe Brucker,
	Jonathan Cameron, Lu Baolu, wanghaibin.wang, jiangkunkun,
	yuzenghui, lushenming

From: Kunkun Jiang <jiangkunkun@huawei.com>

Previously, if the vfio_iommu is not of pinned_page_dirty_scope and a
vfio_dma is iommu_mapped, we populate a full dirty bitmap for that
vfio_dma. Now we first try to get the dirty log from the IOMMU before
falling back to that pessimistic choice.

The bitmap population:

In detail, if all vfio_groups are of pinned_page_dirty_scope, the
dirty bitmap population is not affected. If there are vfio_groups
not of pinned_page_dirty_scope and their domains support HWDBM,
then we can try to get the dirty log from the IOMMU. Otherwise, we
fall back to a full dirty bitmap.

Consider DMA and group hotplug:

Start dirty log tracking for a newly added DMA range, and stop it for
a DMA range that is about to be removed.

A domain may not support HWDBM at start but begin supporting it after
some groups are hotplugged (the first group with HWDBM is attached, or
all groups without HWDBM are detached). Conversely, a domain may
support HWDBM at start but stop supporting it later (a group without
HWDBM is attached, or all groups are detached). So our policy is to
switch dirty log tracking for domains dynamically.

Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---
 drivers/vfio/vfio_iommu_type1.c | 166 ++++++++++++++++++++++++++++++--
 1 file changed, 159 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 9cb9ce021b22..77950e47f56f 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1202,6 +1202,46 @@ static void vfio_update_pgsize_bitmap(struct vfio_iommu *iommu)
 	}
 }
 
+static int vfio_iommu_dirty_log_clear(struct vfio_iommu *iommu,
+				      dma_addr_t start_iova, size_t size,
+				      unsigned long *bitmap_buffer,
+				      dma_addr_t base_iova,
+				      unsigned long pgshift)
+{
+	struct vfio_domain *d;
+	int ret = 0;
+
+	list_for_each_entry(d, &iommu->domain_list, next) {
+		ret = iommu_clear_dirty_log(d->domain, start_iova, size,
+					    bitmap_buffer, base_iova, pgshift);
+		if (ret) {
+			pr_warn("vfio_iommu dirty log clear failed!\n");
+			break;
+		}
+	}
+
+	return ret;
+}
+
+static int vfio_iommu_dirty_log_sync(struct vfio_iommu *iommu,
+				     struct vfio_dma *dma,
+				     unsigned long pgshift)
+{
+	struct vfio_domain *d;
+	int ret = 0;
+
+	list_for_each_entry(d, &iommu->domain_list, next) {
+		ret = iommu_sync_dirty_log(d->domain, dma->iova, dma->size,
+					   dma->bitmap, dma->iova, pgshift);
+		if (ret) {
+			pr_warn("vfio_iommu dirty log sync failed!\n");
+			break;
+		}
+	}
+
+	return ret;
+}
+
 static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
 			      struct vfio_dma *dma, dma_addr_t base_iova,
 			      size_t pgsize)
@@ -1212,13 +1252,22 @@ static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
 	unsigned long copy_offset = bit_offset / BITS_PER_LONG;
 	unsigned long shift = bit_offset % BITS_PER_LONG;
 	unsigned long leftover;
+	int ret;
 
-	/*
-	 * mark all pages dirty if any IOMMU capable device is not able
-	 * to report dirty pages and all pages are pinned and mapped.
-	 */
-	if (iommu->num_non_pinned_groups && dma->iommu_mapped)
+	if (!iommu->num_non_pinned_groups || !dma->iommu_mapped) {
+		/* nothing to do */
+	} else if (!iommu->num_non_hwdbm_groups) {
+		/* try to get dirty log from IOMMU */
+		ret = vfio_iommu_dirty_log_sync(iommu, dma, pgshift);
+		if (ret)
+			return ret;
+	} else {
+		/*
+		 * mark all pages dirty if any IOMMU capable device is not able
+		 * to report dirty pages and all pages are pinned and mapped.
+		 */
 		bitmap_set(dma->bitmap, 0, nbits);
+	}
 
 	if (shift) {
 		bitmap_shift_left(dma->bitmap, dma->bitmap, shift,
@@ -1236,6 +1285,12 @@ static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
 			 DIRTY_BITMAP_BYTES(nbits + shift)))
 		return -EFAULT;
 
+	/* Recover the bitmap if it'll be used to clear hardware dirty log */
+	if (shift && iommu->num_non_pinned_groups && dma->iommu_mapped &&
+	    !iommu->num_non_hwdbm_groups)
+		bitmap_shift_right(dma->bitmap, dma->bitmap, shift,
+				   nbits + shift);
+
 	return 0;
 }
 
@@ -1274,6 +1329,16 @@ static int vfio_iova_dirty_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
 		if (ret)
 			return ret;
 
+		/* Clear iommu dirty log to re-enable dirty log tracking */
+		if (iommu->num_non_pinned_groups && dma->iommu_mapped &&
+		    !iommu->num_non_hwdbm_groups) {
+			ret = vfio_iommu_dirty_log_clear(iommu,	dma->iova,
+					dma->size, dma->bitmap, dma->iova,
+					pgshift);
+			if (ret)
+				return ret;
+		}
+
 		/*
 		 * Re-populate bitmap to include all pinned pages which are
 		 * considered as dirty but exclude pages which are unpinned and
@@ -1294,6 +1359,22 @@ static int verify_bitmap_size(uint64_t npages, uint64_t bitmap_size)
 	return 0;
 }
 
+static void vfio_dma_dirty_log_switch(struct vfio_iommu *iommu,
+				      struct vfio_dma *dma, bool enable)
+{
+	struct vfio_domain *d;
+
+	if (!dma->iommu_mapped)
+		return;
+
+	list_for_each_entry(d, &iommu->domain_list, next) {
+		if (d->num_non_hwdbm_groups)
+			continue;
+		WARN_ON(iommu_switch_dirty_log(d->domain, enable, dma->iova,
+					       dma->size, d->prot | dma->prot));
+	}
+}
+
 static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 			     struct vfio_iommu_type1_dma_unmap *unmap,
 			     struct vfio_bitmap *bitmap)
@@ -1446,6 +1527,10 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 				break;
 		}
 
+		/* Stop log for removed dma */
+		if (iommu->dirty_page_tracking)
+			vfio_dma_dirty_log_switch(iommu, dma, false);
+
 		unmapped += dma->size;
 		n = rb_next(n);
 		vfio_remove_dma(iommu, dma);
@@ -1677,8 +1762,13 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 
 	if (!ret && iommu->dirty_page_tracking) {
 		ret = vfio_dma_bitmap_alloc(dma, pgsize);
-		if (ret)
+		if (ret) {
 			vfio_remove_dma(iommu, dma);
+			goto out_unlock;
+		}
+
+		/* Start dirty log for newly added dma */
+		vfio_dma_dirty_log_switch(iommu, dma, true);
 	}
 
 out_unlock:
@@ -2273,6 +2363,21 @@ static bool vfio_group_supports_hwdbm(struct vfio_group *group)
 					 vfio_dev_enable_feature);
 }
 
+static void vfio_domain_dirty_log_switch(struct vfio_iommu *iommu,
+					 struct vfio_domain *d, bool enable)
+{
+	struct rb_node *n;
+	struct vfio_dma *dma;
+
+	for (n = rb_first(&iommu->dma_list); n; n = rb_next(n)) {
+		dma = rb_entry(n, struct vfio_dma, node);
+		if (!dma->iommu_mapped)
+			continue;
+		WARN_ON(iommu_switch_dirty_log(d->domain, enable, dma->iova,
+					       dma->size, d->prot | dma->prot));
+	}
+}
+
 /*
  * Called after a new group is added to the group_list of domain, or before an
  * old group is removed from the group_list of domain.
@@ -2282,6 +2387,10 @@ static void vfio_iommu_update_hwdbm(struct vfio_iommu *iommu,
 				    struct vfio_group *group,
 				    bool attach)
 {
+	uint64_t old_num_non_hwdbm = domain->num_non_hwdbm_groups;
+	bool singular = list_is_singular(&domain->group_list);
+	bool log_enabled, should_enable;
+
 	/* Update the HWDBM status of group, domain and iommu */
 	group->iommu_hwdbm = vfio_group_supports_hwdbm(group);
 	if (!group->iommu_hwdbm && attach) {
@@ -2291,6 +2400,30 @@ static void vfio_iommu_update_hwdbm(struct vfio_iommu *iommu,
 		domain->num_non_hwdbm_groups--;
 		iommu->num_non_hwdbm_groups--;
 	}
+
+	if (!iommu->dirty_page_tracking)
+		return;
+
+	/*
+	 * The vfio_domain can switch dirty log tracking dynamically due to
+	 * group attach/detach. The basic idea is to convert current dirty log
+	 * status to desired dirty log status.
+	 *
+	 * If num_non_hwdbm_groups is zero then dirty log has been enabled. One
+	 * exception is that this is the first group attached to a domain.
+	 *
+	 * If the updated num_non_hwdbm_groups is zero then dirty log should be
+	 * enabled. One exception is that this is the last group detached from
+	 * a domain.
+	 */
+	log_enabled = !old_num_non_hwdbm && !(attach && singular);
+	should_enable = !domain->num_non_hwdbm_groups && !(!attach && singular);
+
+	/* Switch dirty log tracking when status changed */
+	if (should_enable && !log_enabled)
+		vfio_domain_dirty_log_switch(iommu, domain, true);
+	else if (!should_enable && log_enabled)
+		vfio_domain_dirty_log_switch(iommu, domain, false);
 }
 
 static int vfio_iommu_type1_attach_group(void *iommu_data,
@@ -3046,6 +3179,22 @@ static int vfio_iommu_type1_unmap_dma(struct vfio_iommu *iommu,
 			-EFAULT : 0;
 }
 
+static void vfio_iommu_dirty_log_switch(struct vfio_iommu *iommu, bool enable)
+{
+	struct vfio_domain *d;
+
+	/*
+	 * Enable dirty log tracking for vfio_domains that support HWDBM.
+	 * Even if no iommu domain supports HWDBM for now, a domain may
+	 * start supporting it after some groups are detached.
+	 */
+	list_for_each_entry(d, &iommu->domain_list, next) {
+		if (d->num_non_hwdbm_groups)
+			continue;
+		vfio_domain_dirty_log_switch(iommu, d, enable);
+	}
+}
+
 static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
 					unsigned long arg)
 {
@@ -3078,8 +3227,10 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
 		pgsize = 1 << __ffs(iommu->pgsize_bitmap);
 		if (!iommu->dirty_page_tracking) {
 			ret = vfio_dma_bitmap_alloc_all(iommu, pgsize);
-			if (!ret)
+			if (!ret) {
 				iommu->dirty_page_tracking = true;
+				vfio_iommu_dirty_log_switch(iommu, true);
+			}
 		}
 		mutex_unlock(&iommu->lock);
 		return ret;
@@ -3088,6 +3239,7 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
 		if (iommu->dirty_page_tracking) {
 			iommu->dirty_page_tracking = false;
 			vfio_dma_bitmap_free_all(iommu);
+			vfio_iommu_dirty_log_switch(iommu, false);
 		}
 		mutex_unlock(&iommu->lock);
 		return 0;
-- 
2.19.1



* [PATCH 3/3] vfio/iommu_type1: Add support for manual dirty log clear
  2021-04-13  9:14 [PATCH 0/3] vfio/iommu_type1: Implement dirty log tracking based on IOMMU HWDBM Keqian Zhu
  2021-04-13  9:14 ` [PATCH 1/3] vfio/iommu_type1: Add HWDBM status maintenance Keqian Zhu
  2021-04-13  9:14 ` [PATCH 2/3] vfio/iommu_type1: Optimize dirty bitmap population based on iommu HWDBM Keqian Zhu
@ 2021-04-13  9:14 ` Keqian Zhu
  2021-04-15 20:43   ` Alex Williamson
  2 siblings, 1 reply; 9+ messages in thread
From: Keqian Zhu @ 2021-04-13  9:14 UTC (permalink / raw)
  To: linux-kernel, kvm, Alex Williamson, Kirti Wankhede,
	Cornelia Huck, Yi Sun, Tian Kevin
  Cc: Robin Murphy, Will Deacon, Joerg Roedel, Jean-Philippe Brucker,
	Jonathan Cameron, Lu Baolu, wanghaibin.wang, jiangkunkun,
	yuzenghui, lushenming

From: Kunkun Jiang <jiangkunkun@huawei.com>

In the past, we cleared the dirty log immediately after syncing it
to userspace. This may cause redundant dirty handling if userspace
handles the dirty log iteratively:

After vfio clears the dirty log, new dirty entries start to
accumulate. These new entries will be reported to userspace even if
they were generated before userspace handled the same dirty pages.

That is to say, we should minimize the time gap between dirty log
clearing and dirty log handling. So we give userspace an interface
to clear the dirty log explicitly.

Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---
 drivers/vfio/vfio_iommu_type1.c | 100 ++++++++++++++++++++++++++++++--
 include/uapi/linux/vfio.h       |  28 ++++++++-
 2 files changed, 123 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 77950e47f56f..d9c4a27b3c4e 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -78,6 +78,7 @@ struct vfio_iommu {
 	bool			v2;
 	bool			nesting;
 	bool			dirty_page_tracking;
+	bool			dirty_log_manual_clear;
 	bool			pinned_page_dirty_scope;
 	bool			container_open;
 };
@@ -1242,6 +1243,78 @@ static int vfio_iommu_dirty_log_sync(struct vfio_iommu *iommu,
 	return ret;
 }
 
+static int vfio_iova_dirty_log_clear(u64 __user *bitmap,
+				     struct vfio_iommu *iommu,
+				     dma_addr_t iova, size_t size,
+				     size_t pgsize)
+{
+	struct vfio_dma *dma;
+	struct rb_node *n;
+	dma_addr_t start_iova, end_iova, riova;
+	unsigned long pgshift = __ffs(pgsize);
+	unsigned long bitmap_size;
+	unsigned long *bitmap_buffer = NULL;
+	bool clear_valid;
+	int rs, re, start, end, dma_offset;
+	int ret = 0;
+
+	bitmap_size = DIRTY_BITMAP_BYTES(size >> pgshift);
+	bitmap_buffer = kvmalloc(bitmap_size, GFP_KERNEL);
+	if (!bitmap_buffer) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	if (copy_from_user(bitmap_buffer, bitmap, bitmap_size)) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	for (n = rb_first(&iommu->dma_list); n; n = rb_next(n)) {
+		dma = rb_entry(n, struct vfio_dma, node);
+		if (!dma->iommu_mapped)
+			continue;
+		if ((dma->iova + dma->size - 1) < iova)
+			continue;
+		if (dma->iova > iova + size - 1)
+			break;
+
+		start_iova = max(iova, dma->iova);
+		end_iova = min(iova + size, dma->iova + dma->size);
+
+		/* Similar logic as the tail of vfio_iova_dirty_bitmap */
+
+		clear_valid = false;
+		start = (start_iova - iova) >> pgshift;
+		end = (end_iova - iova) >> pgshift;
+		bitmap_for_each_set_region(bitmap_buffer, rs, re, start, end) {
+			clear_valid = true;
+			riova = iova + (rs << pgshift);
+			dma_offset = (riova - dma->iova) >> pgshift;
+			bitmap_clear(dma->bitmap, dma_offset, re - rs);
+		}
+
+		if (clear_valid)
+			vfio_dma_populate_bitmap(dma, pgsize);
+
+		if (clear_valid && !iommu->pinned_page_dirty_scope &&
+		    dma->iommu_mapped && !iommu->num_non_hwdbm_groups) {
+			ret = vfio_iommu_dirty_log_clear(iommu, start_iova,
+					end_iova - start_iova,	bitmap_buffer,
+					iova, pgshift);
+			if (ret) {
+				pr_warn("dma dirty log clear failed!\n");
+				goto out;
+			}
+		}
+
+	}
+
+out:
+	kfree(bitmap_buffer);
+	return ret;
+}
+
 static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
 			      struct vfio_dma *dma, dma_addr_t base_iova,
 			      size_t pgsize)
@@ -1329,6 +1402,10 @@ static int vfio_iova_dirty_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
 		if (ret)
 			return ret;
 
+		/* Do not clear dirty automatically when manual_clear enabled */
+		if (iommu->dirty_log_manual_clear)
+			continue;
+
 		/* Clear iommu dirty log to re-enable dirty log tracking */
 		if (iommu->num_non_pinned_groups && dma->iommu_mapped &&
 		    !iommu->num_non_hwdbm_groups) {
@@ -2946,6 +3023,11 @@ static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu,
 		if (!iommu)
 			return 0;
 		return vfio_domains_have_iommu_cache(iommu);
+	case VFIO_DIRTY_LOG_MANUAL_CLEAR:
+		if (!iommu)
+			return 0;
+		iommu->dirty_log_manual_clear = true;
+		return 1;
 	default:
 		return 0;
 	}
@@ -3201,7 +3283,8 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
 	struct vfio_iommu_type1_dirty_bitmap dirty;
 	uint32_t mask = VFIO_IOMMU_DIRTY_PAGES_FLAG_START |
 			VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP |
-			VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP;
+			VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP |
+			VFIO_IOMMU_DIRTY_PAGES_FLAG_CLEAR_BITMAP;
 	unsigned long minsz;
 	int ret = 0;
 
@@ -3243,7 +3326,8 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
 		}
 		mutex_unlock(&iommu->lock);
 		return 0;
-	} else if (dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP) {
+	} else if (dirty.flags & (VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP |
+				VFIO_IOMMU_DIRTY_PAGES_FLAG_CLEAR_BITMAP)) {
 		struct vfio_iommu_type1_dirty_bitmap_get range;
 		unsigned long pgshift;
 		size_t data_size = dirty.argsz - minsz;
@@ -3286,13 +3370,21 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
 			goto out_unlock;
 		}
 
-		if (iommu->dirty_page_tracking)
+		if (!iommu->dirty_page_tracking) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+
+		if (dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP)
 			ret = vfio_iova_dirty_bitmap(range.bitmap.data,
 						     iommu, range.iova,
 						     range.size,
 						     range.bitmap.pgsize);
 		else
-			ret = -EINVAL;
+			ret = vfio_iova_dirty_log_clear(range.bitmap.data,
+							iommu, range.iova,
+							range.size,
+							range.bitmap.pgsize);
 out_unlock:
 		mutex_unlock(&iommu->lock);
 
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 8ce36c1d53ca..784dc3cf2a8f 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -52,6 +52,14 @@
 /* Supports the vaddr flag for DMA map and unmap */
 #define VFIO_UPDATE_VADDR		10
 
+/*
+ * The vfio_iommu driver may support manual clearing of the dirty log, which
+ * means the dirty log is not cleared automatically after it is copied to
+ * userspace; it is the user's responsibility to clear it. Note: this extension
+ * is enabled as soon as the user queries it and the driver supports it.
+ */
+#define VFIO_DIRTY_LOG_MANUAL_CLEAR	11
+
 /*
  * The IOCTL interface is designed for extensibility by embedding the
  * structure length (argsz) and flags into structures passed between
@@ -1188,7 +1196,24 @@ struct vfio_iommu_type1_dma_unmap {
  * actual bitmap. If dirty pages logging is not enabled, an error will be
  * returned.
  *
- * Only one of the flags _START, _STOP and _GET may be specified at a time.
+ * Calling the IOCTL with VFIO_IOMMU_DIRTY_PAGES_FLAG_CLEAR_BITMAP flag set,
+ * instructs the IOMMU driver to clear the dirty status of pages in a bitmap
+ * for IOMMU container for a given IOVA range. The user must specify the IOVA
+ * range, the bitmap and the pgsize through the structure
+ * vfio_iommu_type1_dirty_bitmap_get in the data[] portion. This interface
+ * supports clearing a bitmap of the smallest supported pgsize only and can be
+ * modified in future to clear a bitmap of any specified supported pgsize. The
+ * user must provide a memory area for the bitmap memory and specify its size
+ * in bitmap.size. One bit is used to represent one page consecutively starting
+ * from iova offset. The user should provide page size in bitmap.pgsize field.
+ * A bit set in the bitmap indicates that the page at that offset from iova has
+ * its dirty status cleared, and dirty tracking is re-enabled for that page. The
+ * caller must set argsz to a value including the size of structure
+ * vfio_iommu_dirty_bitmap_get, but excluding the size of the actual bitmap. If
+ * dirty page logging is not enabled, an error will be returned.
+ *
+ * Only one of the flags _START, _STOP, _GET and _CLEAR may be specified at a
+ * time.
  *
  */
 struct vfio_iommu_type1_dirty_bitmap {
@@ -1197,6 +1222,7 @@ struct vfio_iommu_type1_dirty_bitmap {
 #define VFIO_IOMMU_DIRTY_PAGES_FLAG_START	(1 << 0)
 #define VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP	(1 << 1)
 #define VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP	(1 << 2)
+#define VFIO_IOMMU_DIRTY_PAGES_FLAG_CLEAR_BITMAP (1 << 3)
 	__u8         data[];
 };
 
-- 
2.19.1



* Re: [PATCH 1/3] vfio/iommu_type1: Add HWDBM status maintenance
  2021-04-13  9:14 ` [PATCH 1/3] vfio/iommu_type1: Add HWDBM status maintenance Keqian Zhu
@ 2021-04-13 16:25   ` kernel test robot
  2021-04-14  3:14   ` kernel test robot
  1 sibling, 0 replies; 9+ messages in thread
From: kernel test robot @ 2021-04-13 16:25 UTC (permalink / raw)
  To: Keqian Zhu, linux-kernel, kvm, Alex Williamson, Kirti Wankhede,
	Cornelia Huck, Yi Sun, Tian Kevin
  Cc: kbuild-all, Robin Murphy, Will Deacon, Joerg Roedel



Hi Keqian,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on vfio/next]
[also build test ERROR on linux/master linus/master v5.12-rc7 next-20210413]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Keqian-Zhu/vfio-iommu_type1-Implement-dirty-log-tracking-based-on-IOMMU-HWDBM/20210413-171632
base:   https://github.com/awilliam/linux-vfio.git next
config: arm-randconfig-r015-20210413 (attached as .config)
compiler: arm-linux-gnueabi-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/3005ed21d06a3ed861847529f08c3d8814013399
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Keqian-Zhu/vfio-iommu_type1-Implement-dirty-log-tracking-based-on-IOMMU-HWDBM/20210413-171632
        git checkout 3005ed21d06a3ed861847529f08c3d8814013399
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arm 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/vfio/vfio_iommu_type1.c: In function 'vfio_group_supports_hwdbm':
>> drivers/vfio/vfio_iommu_type1.c:2270:33: error: 'IOMMU_DEV_FEAT_HWDBM' undeclared (first use in this function); did you mean 'IOMMU_DEV_FEAT_SVA'?
    2270 |  enum iommu_dev_features feat = IOMMU_DEV_FEAT_HWDBM;
         |                                 ^~~~~~~~~~~~~~~~~~~~
         |                                 IOMMU_DEV_FEAT_SVA
   drivers/vfio/vfio_iommu_type1.c:2270:33: note: each undeclared identifier is reported only once for each function it appears in


vim +2270 drivers/vfio/vfio_iommu_type1.c

  2267	
  2268	static bool vfio_group_supports_hwdbm(struct vfio_group *group)
  2269	{
> 2270		enum iommu_dev_features feat = IOMMU_DEV_FEAT_HWDBM;
  2271	
  2272		return !iommu_group_for_each_dev(group->iommu_group, &feat,
  2273						 vfio_dev_enable_feature);
  2274	}
  2275	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 40611 bytes --]


* Re: [PATCH 2/3] vfio/iommu_type1: Optimize dirty bitmap population based on iommu HWDBM
  2021-04-13  9:14 ` [PATCH 2/3] vfio/iommu_type1: Optimize dirty bitmap population based on iommu HWDBM Keqian Zhu
@ 2021-04-13 18:05   ` kernel test robot
  0 siblings, 0 replies; 9+ messages in thread
From: kernel test robot @ 2021-04-13 18:05 UTC (permalink / raw)
  To: Keqian Zhu, linux-kernel, kvm, Alex Williamson, Kirti Wankhede,
	Cornelia Huck, Yi Sun, Tian Kevin
  Cc: kbuild-all, Robin Murphy, Will Deacon, Joerg Roedel



Hi Keqian,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on vfio/next]
[also build test ERROR on linux/master linus/master v5.12-rc7 next-20210413]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Keqian-Zhu/vfio-iommu_type1-Implement-dirty-log-tracking-based-on-IOMMU-HWDBM/20210413-171632
base:   https://github.com/awilliam/linux-vfio.git next
config: arm-randconfig-r015-20210413 (attached as .config)
compiler: arm-linux-gnueabi-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/5553c39f302409e175a70157c47679e61297dec5
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Keqian-Zhu/vfio-iommu_type1-Implement-dirty-log-tracking-based-on-IOMMU-HWDBM/20210413-171632
        git checkout 5553c39f302409e175a70157c47679e61297dec5
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arm 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/vfio/vfio_iommu_type1.c: In function 'vfio_iommu_dirty_log_clear':
>> drivers/vfio/vfio_iommu_type1.c:1215:9: error: implicit declaration of function 'iommu_clear_dirty_log' [-Werror=implicit-function-declaration]
    1215 |   ret = iommu_clear_dirty_log(d->domain, start_iova, size,
         |         ^~~~~~~~~~~~~~~~~~~~~
   drivers/vfio/vfio_iommu_type1.c: In function 'vfio_iommu_dirty_log_sync':
>> drivers/vfio/vfio_iommu_type1.c:1234:9: error: implicit declaration of function 'iommu_sync_dirty_log' [-Werror=implicit-function-declaration]
    1234 |   ret = iommu_sync_dirty_log(d->domain, dma->iova, dma->size,
         |         ^~~~~~~~~~~~~~~~~~~~
   In file included from arch/arm/include/asm/bug.h:60,
                    from include/linux/bug.h:5,
                    from include/linux/thread_info.h:12,
                    from include/asm-generic/preempt.h:5,
                    from ./arch/arm/include/generated/asm/preempt.h:1,
                    from include/linux/preempt.h:78,
                    from include/linux/spinlock.h:51,
                    from include/linux/ipc.h:5,
                    from include/uapi/linux/sem.h:5,
                    from include/linux/sem.h:5,
                    from include/linux/compat.h:14,
                    from drivers/vfio/vfio_iommu_type1.c:24:
   drivers/vfio/vfio_iommu_type1.c: In function 'vfio_dma_dirty_log_switch':
>> drivers/vfio/vfio_iommu_type1.c:1373:11: error: implicit declaration of function 'iommu_switch_dirty_log' [-Werror=implicit-function-declaration]
    1373 |   WARN_ON(iommu_switch_dirty_log(d->domain, enable, dma->iova,
         |           ^~~~~~~~~~~~~~~~~~~~~~
   include/asm-generic/bug.h:188:25: note: in definition of macro 'WARN_ON'
     188 |  int __ret_warn_on = !!(condition);    \
         |                         ^~~~~~~~~
   drivers/vfio/vfio_iommu_type1.c: In function 'vfio_group_supports_hwdbm':
   drivers/vfio/vfio_iommu_type1.c:2360:33: error: 'IOMMU_DEV_FEAT_HWDBM' undeclared (first use in this function); did you mean 'IOMMU_DEV_FEAT_SVA'?
    2360 |  enum iommu_dev_features feat = IOMMU_DEV_FEAT_HWDBM;
         |                                 ^~~~~~~~~~~~~~~~~~~~
         |                                 IOMMU_DEV_FEAT_SVA
   drivers/vfio/vfio_iommu_type1.c:2360:33: note: each undeclared identifier is reported only once for each function it appears in
   cc1: some warnings being treated as errors


vim +/iommu_clear_dirty_log +1215 drivers/vfio/vfio_iommu_type1.c

  1204	
  1205	static int vfio_iommu_dirty_log_clear(struct vfio_iommu *iommu,
  1206					      dma_addr_t start_iova, size_t size,
  1207					      unsigned long *bitmap_buffer,
  1208					      dma_addr_t base_iova,
  1209					      unsigned long pgshift)
  1210	{
  1211		struct vfio_domain *d;
  1212		int ret = 0;
  1213	
  1214		list_for_each_entry(d, &iommu->domain_list, next) {
> 1215			ret = iommu_clear_dirty_log(d->domain, start_iova, size,
  1216						    bitmap_buffer, base_iova, pgshift);
  1217			if (ret) {
  1218				pr_warn("vfio_iommu dirty log clear failed!\n");
  1219				break;
  1220			}
  1221		}
  1222	
  1223		return ret;
  1224	}
  1225	
  1226	static int vfio_iommu_dirty_log_sync(struct vfio_iommu *iommu,
  1227					     struct vfio_dma *dma,
  1228					     unsigned long pgshift)
  1229	{
  1230		struct vfio_domain *d;
  1231		int ret = 0;
  1232	
  1233		list_for_each_entry(d, &iommu->domain_list, next) {
> 1234			ret = iommu_sync_dirty_log(d->domain, dma->iova, dma->size,
  1235						   dma->bitmap, dma->iova, pgshift);
  1236			if (ret) {
  1237				pr_warn("vfio_iommu dirty log sync failed!\n");
  1238				break;
  1239			}
  1240		}
  1241	
  1242		return ret;
  1243	}
  1244	
  1245	static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
  1246				      struct vfio_dma *dma, dma_addr_t base_iova,
  1247				      size_t pgsize)
  1248	{
  1249		unsigned long pgshift = __ffs(pgsize);
  1250		unsigned long nbits = dma->size >> pgshift;
  1251		unsigned long bit_offset = (dma->iova - base_iova) >> pgshift;
  1252		unsigned long copy_offset = bit_offset / BITS_PER_LONG;
  1253		unsigned long shift = bit_offset % BITS_PER_LONG;
  1254		unsigned long leftover;
  1255		int ret;
  1256	
  1257		if (!iommu->num_non_pinned_groups || !dma->iommu_mapped) {
  1258			/* nothing to do */
  1259		} else if (!iommu->num_non_hwdbm_groups) {
  1260			/* try to get dirty log from IOMMU */
  1261			ret = vfio_iommu_dirty_log_sync(iommu, dma, pgshift);
  1262			if (ret)
  1263				return ret;
  1264		} else {
  1265			/*
  1266			 * mark all pages dirty if any IOMMU capable device is not able
  1267			 * to report dirty pages and all pages are pinned and mapped.
  1268			 */
  1269			bitmap_set(dma->bitmap, 0, nbits);
  1270		}
  1271	
  1272		if (shift) {
  1273			bitmap_shift_left(dma->bitmap, dma->bitmap, shift,
  1274					  nbits + shift);
  1275	
  1276			if (copy_from_user(&leftover,
  1277					   (void __user *)(bitmap + copy_offset),
  1278					   sizeof(leftover)))
  1279				return -EFAULT;
  1280	
  1281			bitmap_or(dma->bitmap, dma->bitmap, &leftover, shift);
  1282		}
  1283	
  1284		if (copy_to_user((void __user *)(bitmap + copy_offset), dma->bitmap,
  1285				 DIRTY_BITMAP_BYTES(nbits + shift)))
  1286			return -EFAULT;
  1287	
  1288		/* Recover the bitmap if it'll be used to clear hardware dirty log */
  1289		if (shift && iommu->num_non_pinned_groups && dma->iommu_mapped &&
  1290		    !iommu->num_non_hwdbm_groups)
  1291			bitmap_shift_right(dma->bitmap, dma->bitmap, shift,
  1292					   nbits + shift);
  1293	
  1294		return 0;
  1295	}
  1296	
  1297	static int vfio_iova_dirty_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
  1298					  dma_addr_t iova, size_t size, size_t pgsize)
  1299	{
  1300		struct vfio_dma *dma;
  1301		struct rb_node *n;
  1302		unsigned long pgshift = __ffs(pgsize);
  1303		int ret;
  1304	
  1305		/*
  1306		 * GET_BITMAP request must fully cover vfio_dma mappings.  Multiple
  1307		 * vfio_dma mappings may be clubbed by specifying large ranges, but
  1308		 * there must not be any previous mappings bisected by the range.
  1309		 * An error will be returned if these conditions are not met.
  1310		 */
  1311		dma = vfio_find_dma(iommu, iova, 1);
  1312		if (dma && dma->iova != iova)
  1313			return -EINVAL;
  1314	
  1315		dma = vfio_find_dma(iommu, iova + size - 1, 0);
  1316		if (dma && dma->iova + dma->size != iova + size)
  1317			return -EINVAL;
  1318	
  1319		for (n = rb_first(&iommu->dma_list); n; n = rb_next(n)) {
  1320			struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
  1321	
  1322			if (dma->iova < iova)
  1323				continue;
  1324	
  1325			if (dma->iova > iova + size - 1)
  1326				break;
  1327	
  1328			ret = update_user_bitmap(bitmap, iommu, dma, iova, pgsize);
  1329			if (ret)
  1330				return ret;
  1331	
  1332			/* Clear iommu dirty log to re-enable dirty log tracking */
  1333			if (iommu->num_non_pinned_groups && dma->iommu_mapped &&
  1334			    !iommu->num_non_hwdbm_groups) {
  1335				ret = vfio_iommu_dirty_log_clear(iommu,	dma->iova,
  1336						dma->size, dma->bitmap, dma->iova,
  1337						pgshift);
  1338				if (ret)
  1339					return ret;
  1340			}
  1341	
  1342			/*
  1343			 * Re-populate bitmap to include all pinned pages which are
  1344			 * considered as dirty but exclude pages which are unpinned and
  1345			 * pages which are marked dirty by vfio_dma_rw()
  1346			 */
  1347			bitmap_clear(dma->bitmap, 0, dma->size >> pgshift);
  1348			vfio_dma_populate_bitmap(dma, pgsize);
  1349		}
  1350		return 0;
  1351	}
  1352	
  1353	static int verify_bitmap_size(uint64_t npages, uint64_t bitmap_size)
  1354	{
  1355		if (!npages || !bitmap_size || (bitmap_size > DIRTY_BITMAP_SIZE_MAX) ||
  1356		    (bitmap_size < DIRTY_BITMAP_BYTES(npages)))
  1357			return -EINVAL;
  1358	
  1359		return 0;
  1360	}
  1361	
  1362	static void vfio_dma_dirty_log_switch(struct vfio_iommu *iommu,
  1363					      struct vfio_dma *dma, bool enable)
  1364	{
  1365		struct vfio_domain *d;
  1366	
  1367		if (!dma->iommu_mapped)
  1368			return;
  1369	
  1370		list_for_each_entry(d, &iommu->domain_list, next) {
  1371			if (d->num_non_hwdbm_groups)
  1372				continue;
> 1373			WARN_ON(iommu_switch_dirty_log(d->domain, enable, dma->iova,
  1374						       dma->size, d->prot | dma->prot));
  1375		}
  1376	}
  1377	
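
For reference, the word-alignment handling in update_user_bitmap() from the listing above can be modeled in a few lines of plain C. This is a simplified, hypothetical sketch over a single unsigned long of user bitmap (the kernel operates on full bitmaps via bitmap_shift_left() and bitmap_or()); all names are illustrative:

```c
#include <limits.h>
#include <stdint.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Bit position of a dma range inside the user bitmap, modulo one word,
 * mirroring bit_offset % BITS_PER_LONG in update_user_bitmap(). */
static unsigned int user_bit_shift(uint64_t dma_iova, uint64_t base_iova,
				   unsigned int pgshift)
{
	return ((dma_iova - base_iova) >> pgshift) % BITS_PER_LONG;
}

/* Shift the per-dma dirty bits into place and merge them with the
 * user's existing low bits, mirroring the bitmap_shift_left() +
 * copy_from_user(leftover) + bitmap_or() sequence in the kernel code. */
static unsigned long merge_into_user(unsigned long user_word,
				     unsigned long dma_bits,
				     unsigned int shift)
{
	unsigned long leftover = shift ? (user_word & ((1UL << shift) - 1)) : 0;

	return (dma_bits << shift) | leftover;
}
```

With pgshift = 12 and a dma range starting at iova 0x3000 above base_iova 0, shift is 3, so the user's low three bits must be read back and preserved before the merged word is copied out.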


* Re: [PATCH 1/3] vfio/iommu_type1: Add HWDBM status maintanance
  2021-04-13  9:14 ` [PATCH 1/3] vfio/iommu_type1: Add HWDBM status maintanance Keqian Zhu
  2021-04-13 16:25   ` kernel test robot
@ 2021-04-14  3:14   ` kernel test robot
  1 sibling, 0 replies; 9+ messages in thread
From: kernel test robot @ 2021-04-14  3:14 UTC (permalink / raw)
  To: Keqian Zhu, linux-kernel, kvm, Alex Williamson, Kirti Wankhede,
	Cornelia Huck, Yi Sun, Tian Kevin
  Cc: kbuild-all, clang-built-linux, Robin Murphy, Will Deacon, Joerg Roedel



Hi Keqian,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on vfio/next]
[also build test ERROR on linux/master linus/master v5.12-rc7 next-20210413]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Keqian-Zhu/vfio-iommu_type1-Implement-dirty-log-tracking-based-on-IOMMU-HWDBM/20210413-171632
base:   https://github.com/awilliam/linux-vfio.git next
config: x86_64-randconfig-a013-20210413 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 9829f5e6b1bca9b61efc629770d28bb9014dec45)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/3005ed21d06a3ed861847529f08c3d8814013399
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Keqian-Zhu/vfio-iommu_type1-Implement-dirty-log-tracking-based-on-IOMMU-HWDBM/20210413-171632
        git checkout 3005ed21d06a3ed861847529f08c3d8814013399
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> drivers/vfio/vfio_iommu_type1.c:2270:33: error: use of undeclared identifier 'IOMMU_DEV_FEAT_HWDBM'
           enum iommu_dev_features feat = IOMMU_DEV_FEAT_HWDBM;
                                          ^
   1 error generated.


vim +/IOMMU_DEV_FEAT_HWDBM +2270 drivers/vfio/vfio_iommu_type1.c

  2267	
  2268	static bool vfio_group_supports_hwdbm(struct vfio_group *group)
  2269	{
> 2270		enum iommu_dev_features feat = IOMMU_DEV_FEAT_HWDBM;
  2271	
  2272		return !iommu_group_for_each_dev(group->iommu_group, &feat,
  2273						 vfio_dev_enable_feature);
  2274	}
  2275	
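
The convention in the listing above is that the group supports HWDBM only if enabling the feature succeeds on every device, i.e. iommu_group_for_each_dev() returns 0 for all callbacks. A minimal stand-alone sketch of that pattern, with hypothetical types (not the kernel API):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device: just a flag for whether it supports HWDBM. */
struct toy_dev {
	bool has_hwdbm;
};

/* Stand-in for vfio_dev_enable_feature(): 0 on success, negative on
 * failure, following the usual kernel callback convention. */
static int toy_enable_feature(struct toy_dev *dev)
{
	return dev->has_hwdbm ? 0 : -95; /* an -EOPNOTSUPP-style error */
}

/* Stand-in for iommu_group_for_each_dev(): walk all devices and stop
 * at the first non-zero return. */
static int toy_for_each_dev(struct toy_dev *devs, size_t n,
			    int (*fn)(struct toy_dev *))
{
	for (size_t i = 0; i < n; i++) {
		int ret = fn(&devs[i]);

		if (ret)
			return ret;
	}
	return 0;
}

/* Mirrors vfio_group_supports_hwdbm(): true only if every device in
 * the group accepted the feature. */
static bool toy_group_supports_hwdbm(struct toy_dev *devs, size_t n)
{
	return !toy_for_each_dev(devs, n, toy_enable_feature);
}
```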


* Re: [PATCH 3/3] vfio/iommu_type1: Add support for manual dirty log clear
  2021-04-13  9:14 ` [PATCH 3/3] vfio/iommu_type1: Add support for manual dirty log clear Keqian Zhu
@ 2021-04-15 20:43   ` Alex Williamson
  2021-04-16  8:45     ` Keqian Zhu
  0 siblings, 1 reply; 9+ messages in thread
From: Alex Williamson @ 2021-04-15 20:43 UTC (permalink / raw)
  To: Keqian Zhu
  Cc: linux-kernel, kvm, Kirti Wankhede, Cornelia Huck, Yi Sun,
	Tian Kevin, Robin Murphy, Will Deacon, Joerg Roedel,
	Jean-Philippe Brucker, Jonathan Cameron, Lu Baolu,
	wanghaibin.wang, jiangkunkun, yuzenghui, lushenming

On Tue, 13 Apr 2021 17:14:45 +0800
Keqian Zhu <zhukeqian1@huawei.com> wrote:

> From: Kunkun Jiang <jiangkunkun@huawei.com>
> 
> Currently we clear the dirty log immediately after syncing it
> to userspace. This may cause redundant dirty handling if
> userspace handles the dirty log iteratively:
> 
> After vfio clears the dirty log, new dirty state starts to be
> generated, and it will be reported to userspace even if it was
> generated before userspace handled the same dirty pages.
> 
> That is to say, we should minimize the time gap between dirty
> log clearing and dirty log handling. We can give userspace an
> interface to clear the dirty log manually.

IIUC, a user would be expected to clear the bitmap before copying the
dirty pages, therefore you're trying to reduce the time gap between
clearing and copying, but it cannot be fully eliminated and importantly,
if the user clears after copying, they've introduced a race.  Correct?
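
The hazard can be made concrete with a toy single-page model (illustrative only, not the VFIO API): a DMA write that lands between the user's copy and a later clear is wiped out, while with clear-before-copy it survives and is resent on the next iteration.

```c
#include <stdbool.h>

static bool hw_dirty;	/* dirty bit as maintained by the IOMMU */

static void racing_dma_write(void) { hw_dirty = true; }

/* Intended ordering: clear the log, then copy the page.  A write
 * racing with the copy re-marks the page, so it is resent later. */
static bool clear_then_copy(void)
{
	hw_dirty = true;	/* page is dirty at sync time */
	hw_dirty = false;	/* user clears first */
	/* ...user copies the page here... */
	racing_dma_write();	/* DMA during/after the copy */
	return hw_dirty;	/* true: the new write is still logged */
}

/* Racy ordering: copy the page, then clear.  The same racing write
 * is discarded by the clear and the page is never resent. */
static bool copy_then_clear(void)
{
	hw_dirty = true;	/* page is dirty at sync time */
	/* ...user copies the page here... */
	racing_dma_write();	/* DMA after the copy */
	hw_dirty = false;	/* clear discards the new dirty state */
	return hw_dirty;	/* false: dirty information is lost */
}
```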

What results do you have to show that this is a worthwhile optimization?

I really don't like the semantics that testing for an IOMMU capability
enables it.  It needs to be an explicitly controllable feature, which
suggests to me that it might be a flag used in combination with _GET or
a separate _GET_NOCLEAR operation.  Thanks,

Alex


> Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 100 ++++++++++++++++++++++++++++++--
>  include/uapi/linux/vfio.h       |  28 ++++++++-
>  2 files changed, 123 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 77950e47f56f..d9c4a27b3c4e 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -78,6 +78,7 @@ struct vfio_iommu {
>  	bool			v2;
>  	bool			nesting;
>  	bool			dirty_page_tracking;
> +	bool			dirty_log_manual_clear;
>  	bool			pinned_page_dirty_scope;
>  	bool			container_open;
>  };
> @@ -1242,6 +1243,78 @@ static int vfio_iommu_dirty_log_sync(struct vfio_iommu *iommu,
>  	return ret;
>  }
>  
> +static int vfio_iova_dirty_log_clear(u64 __user *bitmap,
> +				     struct vfio_iommu *iommu,
> +				     dma_addr_t iova, size_t size,
> +				     size_t pgsize)
> +{
> +	struct vfio_dma *dma;
> +	struct rb_node *n;
> +	dma_addr_t start_iova, end_iova, riova;
> +	unsigned long pgshift = __ffs(pgsize);
> +	unsigned long bitmap_size;
> +	unsigned long *bitmap_buffer = NULL;
> +	bool clear_valid;
> +	int rs, re, start, end, dma_offset;
> +	int ret = 0;
> +
> +	bitmap_size = DIRTY_BITMAP_BYTES(size >> pgshift);
> +	bitmap_buffer = kvmalloc(bitmap_size, GFP_KERNEL);
> +	if (!bitmap_buffer) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	if (copy_from_user(bitmap_buffer, bitmap, bitmap_size)) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	for (n = rb_first(&iommu->dma_list); n; n = rb_next(n)) {
> +		dma = rb_entry(n, struct vfio_dma, node);
> +		if (!dma->iommu_mapped)
> +			continue;
> +		if ((dma->iova + dma->size - 1) < iova)
> +			continue;
> +		if (dma->iova > iova + size - 1)
> +			break;
> +
> +		start_iova = max(iova, dma->iova);
> +		end_iova = min(iova + size, dma->iova + dma->size);
> +
> +		/* Similar logic as the tail of vfio_iova_dirty_bitmap */
> +
> +		clear_valid = false;
> +		start = (start_iova - iova) >> pgshift;
> +		end = (end_iova - iova) >> pgshift;
> +		bitmap_for_each_set_region(bitmap_buffer, rs, re, start, end) {
> +			clear_valid = true;
> +			riova = iova + (rs << pgshift);
> +			dma_offset = (riova - dma->iova) >> pgshift;
> +			bitmap_clear(dma->bitmap, dma_offset, re - rs);
> +		}
> +
> +		if (clear_valid)
> +			vfio_dma_populate_bitmap(dma, pgsize);
> +
> +		if (clear_valid && !iommu->pinned_page_dirty_scope &&
> +		    dma->iommu_mapped && !iommu->num_non_hwdbm_groups) {
> +			ret = vfio_iommu_dirty_log_clear(iommu, start_iova,
> +					end_iova - start_iova,	bitmap_buffer,
> +					iova, pgshift);
> +			if (ret) {
> +				pr_warn("dma dirty log clear failed!\n");
> +				goto out;
> +			}
> +		}
> +
> +	}
> +
> +out:
> +	kvfree(bitmap_buffer);
> +	return ret;
> +}
> +
>  static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
>  			      struct vfio_dma *dma, dma_addr_t base_iova,
>  			      size_t pgsize)
> @@ -1329,6 +1402,10 @@ static int vfio_iova_dirty_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
>  		if (ret)
>  			return ret;
>  
> +		/* Do not clear dirty automatically when manual_clear enabled */
> +		if (iommu->dirty_log_manual_clear)
> +			continue;
> +
>  		/* Clear iommu dirty log to re-enable dirty log tracking */
>  		if (iommu->num_non_pinned_groups && dma->iommu_mapped &&
>  		    !iommu->num_non_hwdbm_groups) {
> @@ -2946,6 +3023,11 @@ static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu,
>  		if (!iommu)
>  			return 0;
>  		return vfio_domains_have_iommu_cache(iommu);
> +	case VFIO_DIRTY_LOG_MANUAL_CLEAR:
> +		if (!iommu)
> +			return 0;
> +		iommu->dirty_log_manual_clear = true;
> +		return 1;
>  	default:
>  		return 0;
>  	}
> @@ -3201,7 +3283,8 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
>  	struct vfio_iommu_type1_dirty_bitmap dirty;
>  	uint32_t mask = VFIO_IOMMU_DIRTY_PAGES_FLAG_START |
>  			VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP |
> -			VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP;
> +			VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP |
> +			VFIO_IOMMU_DIRTY_PAGES_FLAG_CLEAR_BITMAP;
>  	unsigned long minsz;
>  	int ret = 0;
>  
> @@ -3243,7 +3326,8 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
>  		}
>  		mutex_unlock(&iommu->lock);
>  		return 0;
> -	} else if (dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP) {
> +	} else if (dirty.flags & (VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP |
> +				VFIO_IOMMU_DIRTY_PAGES_FLAG_CLEAR_BITMAP)) {
>  		struct vfio_iommu_type1_dirty_bitmap_get range;
>  		unsigned long pgshift;
>  		size_t data_size = dirty.argsz - minsz;
> @@ -3286,13 +3370,21 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
>  			goto out_unlock;
>  		}
>  
> -		if (iommu->dirty_page_tracking)
> +		if (!iommu->dirty_page_tracking) {
> +			ret = -EINVAL;
> +			goto out_unlock;
> +		}
> +
> +		if (dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP)
>  			ret = vfio_iova_dirty_bitmap(range.bitmap.data,
>  						     iommu, range.iova,
>  						     range.size,
>  						     range.bitmap.pgsize);
>  		else
> -			ret = -EINVAL;
> +			ret = vfio_iova_dirty_log_clear(range.bitmap.data,
> +							iommu, range.iova,
> +							range.size,
> +							range.bitmap.pgsize);
>  out_unlock:
>  		mutex_unlock(&iommu->lock);
>  
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 8ce36c1d53ca..784dc3cf2a8f 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -52,6 +52,14 @@
>  /* Supports the vaddr flag for DMA map and unmap */
>  #define VFIO_UPDATE_VADDR		10
>  
> +/*
> + * The vfio_iommu driver may support manual clearing of the dirty log, which
> + * means the dirty log is not cleared automatically after it is copied to
> + * userspace; it is the user's duty to clear it. Note: when the user queries
> + * this extension and the vfio_iommu driver supports it, it is enabled.
> + */
> +#define VFIO_DIRTY_LOG_MANUAL_CLEAR	11
> +
>  /*
>   * The IOCTL interface is designed for extensibility by embedding the
>   * structure length (argsz) and flags into structures passed between
> @@ -1188,7 +1196,24 @@ struct vfio_iommu_type1_dma_unmap {
>   * actual bitmap. If dirty pages logging is not enabled, an error will be
>   * returned.
>   *
> - * Only one of the flags _START, _STOP and _GET may be specified at a time.
> + * Calling the IOCTL with VFIO_IOMMU_DIRTY_PAGES_FLAG_CLEAR_BITMAP flag set,
> + * instructs the IOMMU driver to clear the dirty status of pages in a bitmap
> + * for IOMMU container for a given IOVA range. The user must specify the IOVA
> + * range, the bitmap and the pgsize through the structure
> + * vfio_iommu_type1_dirty_bitmap_get in the data[] portion. This interface
> + * supports clearing a bitmap of the smallest supported pgsize only and can be
> + * modified in future to clear a bitmap of any specified supported pgsize. The
> + * user must provide a memory area for the bitmap memory and specify its size
> + * in bitmap.size. One bit is used to represent one page consecutively starting
> + * from iova offset. The user should provide page size in bitmap.pgsize field.
> + * A bit set in the bitmap indicates that the page at that offset from iova
> + * has its dirty status cleared, and dirty tracking is re-enabled for that page.
> + * The caller must set argsz to a value including the size of structure
> + * vfio_iommu_dirty_bitmap_get, but excluding the size of the actual bitmap. If
> + * dirty pages logging is not enabled, an error will be returned.
> + *
> + * Only one of the flags _START, _STOP, _GET and _CLEAR may be specified at a
> + * time.
>   *
>   */
>  struct vfio_iommu_type1_dirty_bitmap {
> @@ -1197,6 +1222,7 @@ struct vfio_iommu_type1_dirty_bitmap {
>  #define VFIO_IOMMU_DIRTY_PAGES_FLAG_START	(1 << 0)
>  #define VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP	(1 << 1)
>  #define VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP	(1 << 2)
> +#define VFIO_IOMMU_DIRTY_PAGES_FLAG_CLEAR_BITMAP (1 << 3)
>  	__u8         data[];
>  };
>  
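
The clear path in the quoted vfio_iova_dirty_log_clear() walks each set region of the user-supplied bitmap and clears the matching bits of the per-dma bitmap. Over a single word, the walk can be sketched as follows (hypothetical helpers, not the kernel bitmap API; assumes the region width stays below the word size):

```c
#include <stdbool.h>

/* Find the next run of set bits in one word, a stand-in for the
 * kernel's bitmap_for_each_set_region() iterator. */
static bool next_set_region(unsigned long map, unsigned int nbits,
			    unsigned int start,
			    unsigned int *rs, unsigned int *re)
{
	unsigned int i = start;

	while (i < nbits && !((map >> i) & 1UL))
		i++;
	if (i >= nbits)
		return false;
	*rs = i;
	while (i < nbits && ((map >> i) & 1UL))
		i++;
	*re = i;
	return true;
}

/* Clear every user-requested region from the dma word; the return
 * value plays the role of clear_valid in the patch. */
static bool clear_user_regions(unsigned long user_map,
			       unsigned long *dma_map,
			       unsigned int nbits)
{
	unsigned int rs, re, pos = 0;
	bool clear_valid = false;

	while (next_set_region(user_map, nbits, pos, &rs, &re)) {
		unsigned long mask = ((1UL << (re - rs)) - 1UL) << rs;

		*dma_map &= ~mask;	/* bitmap_clear(dma->bitmap, rs, re - rs) */
		clear_valid = true;
		pos = re;
	}
	return clear_valid;
}
```

For a user request of 0x66 (regions [1,3) and [5,7)) against a fully dirty 8-bit word, bits 1, 2, 5 and 6 are cleared and clear_valid is set, so the patch would go on to clear the hardware dirty log for those ranges.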



* Re: [PATCH 3/3] vfio/iommu_type1: Add support for manual dirty log clear
  2021-04-15 20:43   ` Alex Williamson
@ 2021-04-16  8:45     ` Keqian Zhu
  0 siblings, 0 replies; 9+ messages in thread
From: Keqian Zhu @ 2021-04-16  8:45 UTC (permalink / raw)
  To: Alex Williamson
  Cc: linux-kernel, kvm, Kirti Wankhede, Cornelia Huck, Yi Sun,
	Tian Kevin, Robin Murphy, Will Deacon, Joerg Roedel,
	Jean-Philippe Brucker, Jonathan Cameron, Lu Baolu,
	wanghaibin.wang, jiangkunkun, yuzenghui, lushenming

Hi Alex,

On 2021/4/16 4:43, Alex Williamson wrote:
> On Tue, 13 Apr 2021 17:14:45 +0800
> Keqian Zhu <zhukeqian1@huawei.com> wrote:
> 
>> From: Kunkun Jiang <jiangkunkun@huawei.com>
>>
>> Currently we clear the dirty log immediately after syncing it
>> to userspace. This may cause redundant dirty handling if
>> userspace handles the dirty log iteratively:
>>
>> After vfio clears the dirty log, new dirty state starts to be
>> generated, and it will be reported to userspace even if it was
>> generated before userspace handled the same dirty pages.
>>
>> That is to say, we should minimize the time gap between dirty
>> log clearing and dirty log handling. We can give userspace an
>> interface to clear the dirty log manually.
> 
> IIUC, a user would be expected to clear the bitmap before copying the
> dirty pages, therefore you're trying to reduce the time gap between
> clearing and copying, but it cannot be fully eliminated and importantly,
> if the user clears after copying, they've introduced a race.  Correct?
Yes, that's correct. If the user clears after copying, dirty information may be lost.
I'll enhance the doc.

> 
> What results do you have to show that this is a worthwhile optimization?
This optimization is inspired by KVM[1]. The results differ depending on the guest
workload: in theory, the higher the dirty rate, the better the result. Unfortunately I
tested it on our FPGA, where the dirty rate is heavily limited, so the improvement is not obvious.

> 
> I really don't like the semantics that testing for an IOMMU capability
> enables it.  It needs to be an explicitly controllable feature, which
> suggests to me that it might be a flag used in combination with _GET or
> a separate _GET_NOCLEAR operation.  Thanks,
Yes, good suggestion. We should give userspace a choice.

Thanks,
Keqian

[1] https://lore.kernel.org/kvm/1543251253-24762-1-git-send-email-pbonzini@redhat.com/

> 
> Alex
> 
> 
>> Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
>> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 100 ++++++++++++++++++++++++++++++--
>>  include/uapi/linux/vfio.h       |  28 ++++++++-
>>  2 files changed, 123 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 77950e47f56f..d9c4a27b3c4e 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -78,6 +78,7 @@ struct vfio_iommu {
>>  	bool			v2;
>>  	bool			nesting;
>>  	bool			dirty_page_tracking;
>> +	bool			dirty_log_manual_clear;
>>  	bool			pinned_page_dirty_scope;
>>  	bool			container_open;
>>  };
>> @@ -1242,6 +1243,78 @@ static int vfio_iommu_dirty_log_sync(struct vfio_iommu *iommu,
>>  	return ret;
>>  }
>>  
>> +static int vfio_iova_dirty_log_clear(u64 __user *bitmap,
>> +				     struct vfio_iommu *iommu,
>> +				     dma_addr_t iova, size_t size,
>> +				     size_t pgsize)
>> +{
>> +	struct vfio_dma *dma;
>> +	struct rb_node *n;
>> +	dma_addr_t start_iova, end_iova, riova;
>> +	unsigned long pgshift = __ffs(pgsize);
>> +	unsigned long bitmap_size;
>> +	unsigned long *bitmap_buffer = NULL;
>> +	bool clear_valid;
>> +	int rs, re, start, end, dma_offset;
>> +	int ret = 0;
>> +
>> +	bitmap_size = DIRTY_BITMAP_BYTES(size >> pgshift);
>> +	bitmap_buffer = kvmalloc(bitmap_size, GFP_KERNEL);
>> +	if (!bitmap_buffer) {
>> +		ret = -ENOMEM;
>> +		goto out;
>> +	}
>> +
>> +	if (copy_from_user(bitmap_buffer, bitmap, bitmap_size)) {
>> +		ret = -EFAULT;
>> +		goto out;
>> +	}
>> +
>> +	for (n = rb_first(&iommu->dma_list); n; n = rb_next(n)) {
>> +		dma = rb_entry(n, struct vfio_dma, node);
>> +		if (!dma->iommu_mapped)
>> +			continue;
>> +		if ((dma->iova + dma->size - 1) < iova)
>> +			continue;
>> +		if (dma->iova > iova + size - 1)
>> +			break;
>> +
>> +		start_iova = max(iova, dma->iova);
>> +		end_iova = min(iova + size, dma->iova + dma->size);
>> +
>> +		/* Similar logic as the tail of vfio_iova_dirty_bitmap */
>> +
>> +		clear_valid = false;
>> +		start = (start_iova - iova) >> pgshift;
>> +		end = (end_iova - iova) >> pgshift;
>> +		bitmap_for_each_set_region(bitmap_buffer, rs, re, start, end) {
>> +			clear_valid = true;
>> +			riova = iova + (rs << pgshift);
>> +			dma_offset = (riova - dma->iova) >> pgshift;
>> +			bitmap_clear(dma->bitmap, dma_offset, re - rs);
>> +		}
>> +
>> +		if (clear_valid)
>> +			vfio_dma_populate_bitmap(dma, pgsize);
>> +
>> +		if (clear_valid && !iommu->pinned_page_dirty_scope &&
>> +		    dma->iommu_mapped && !iommu->num_non_hwdbm_groups) {
>> +			ret = vfio_iommu_dirty_log_clear(iommu, start_iova,
>> +					end_iova - start_iova,	bitmap_buffer,
>> +					iova, pgshift);
>> +			if (ret) {
>> +				pr_warn("dma dirty log clear failed!\n");
>> +				goto out;
>> +			}
>> +		}
>> +
>> +	}
>> +
>> +out:
>> +	kvfree(bitmap_buffer);
>> +	return ret;
>> +}
>> +
>>  static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
>>  			      struct vfio_dma *dma, dma_addr_t base_iova,
>>  			      size_t pgsize)
>> @@ -1329,6 +1402,10 @@ static int vfio_iova_dirty_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
>>  		if (ret)
>>  			return ret;
>>  
>> +		/* Do not clear dirty automatically when manual_clear enabled */
>> +		if (iommu->dirty_log_manual_clear)
>> +			continue;
>> +
>>  		/* Clear iommu dirty log to re-enable dirty log tracking */
>>  		if (iommu->num_non_pinned_groups && dma->iommu_mapped &&
>>  		    !iommu->num_non_hwdbm_groups) {
>> @@ -2946,6 +3023,11 @@ static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu,
>>  		if (!iommu)
>>  			return 0;
>>  		return vfio_domains_have_iommu_cache(iommu);
>> +	case VFIO_DIRTY_LOG_MANUAL_CLEAR:
>> +		if (!iommu)
>> +			return 0;
>> +		iommu->dirty_log_manual_clear = true;
>> +		return 1;
>>  	default:
>>  		return 0;
>>  	}
>> @@ -3201,7 +3283,8 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
>>  	struct vfio_iommu_type1_dirty_bitmap dirty;
>>  	uint32_t mask = VFIO_IOMMU_DIRTY_PAGES_FLAG_START |
>>  			VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP |
>> -			VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP;
>> +			VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP |
>> +			VFIO_IOMMU_DIRTY_PAGES_FLAG_CLEAR_BITMAP;
>>  	unsigned long minsz;
>>  	int ret = 0;
>>  
>> @@ -3243,7 +3326,8 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
>>  		}
>>  		mutex_unlock(&iommu->lock);
>>  		return 0;
>> -	} else if (dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP) {
>> +	} else if (dirty.flags & (VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP |
>> +				VFIO_IOMMU_DIRTY_PAGES_FLAG_CLEAR_BITMAP)) {
>>  		struct vfio_iommu_type1_dirty_bitmap_get range;
>>  		unsigned long pgshift;
>>  		size_t data_size = dirty.argsz - minsz;
>> @@ -3286,13 +3370,21 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
>>  			goto out_unlock;
>>  		}
>>  
>> -		if (iommu->dirty_page_tracking)
>> +		if (!iommu->dirty_page_tracking) {
>> +			ret = -EINVAL;
>> +			goto out_unlock;
>> +		}
>> +
>> +		if (dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP)
>>  			ret = vfio_iova_dirty_bitmap(range.bitmap.data,
>>  						     iommu, range.iova,
>>  						     range.size,
>>  						     range.bitmap.pgsize);
>>  		else
>> -			ret = -EINVAL;
>> +			ret = vfio_iova_dirty_log_clear(range.bitmap.data,
>> +							iommu, range.iova,
>> +							range.size,
>> +							range.bitmap.pgsize);
>>  out_unlock:
>>  		mutex_unlock(&iommu->lock);
>>  
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 8ce36c1d53ca..784dc3cf2a8f 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -52,6 +52,14 @@
>>  /* Supports the vaddr flag for DMA map and unmap */
>>  #define VFIO_UPDATE_VADDR		10
>>  
>> +/*
>> + * The vfio_iommu driver may support manual clearing of the dirty log, which
>> + * means the dirty log is not cleared automatically after it is copied to
>> + * userspace; it is the user's duty to clear it. Note: when the user queries
>> + * this extension and the vfio_iommu driver supports it, it is enabled.
>> + */
>> +#define VFIO_DIRTY_LOG_MANUAL_CLEAR	11
>> +
>>  /*
>>   * The IOCTL interface is designed for extensibility by embedding the
>>   * structure length (argsz) and flags into structures passed between
>> @@ -1188,7 +1196,24 @@ struct vfio_iommu_type1_dma_unmap {
>>   * actual bitmap. If dirty pages logging is not enabled, an error will be
>>   * returned.
>>   *
>> - * Only one of the flags _START, _STOP and _GET may be specified at a time.
>> + * Calling the IOCTL with the VFIO_IOMMU_DIRTY_PAGES_FLAG_CLEAR_BITMAP flag
>> + * set instructs the IOMMU driver to clear the dirty status of pages in a
>> + * bitmap for the IOMMU container for a given IOVA range. The user must
>> + * specify the IOVA range, the bitmap and the pgsize through the structure
>> + * vfio_iommu_type1_dirty_bitmap_get in the data[] portion. This interface
>> + * supports clearing a bitmap of the smallest supported pgsize only and may
>> + * be extended in the future to clear a bitmap of any supported pgsize. The
>> + * user must provide a memory area for the bitmap and specify its size in
>> + * bitmap.size. One bit represents one page, consecutively starting from the
>> + * iova offset. The user should provide the page size in the bitmap.pgsize
>> + * field. A bit set in the bitmap indicates that the page at that offset
>> + * from iova has its dirty status cleared, and dirty tracking re-enabled for
>> + * that page. The caller must set argsz to a value including the size of
>> + * structure vfio_iommu_type1_dirty_bitmap_get, but excluding the size of
>> + * the actual bitmap. If dirty pages logging is not enabled, an error will
>> + * be returned.
>> + *
>> + * Only one of the flags _START, _STOP, _GET and _CLEAR may be specified at
>> + * a time.
>>   *
>>   */
>>  struct vfio_iommu_type1_dirty_bitmap {
>> @@ -1197,6 +1222,7 @@ struct vfio_iommu_type1_dirty_bitmap {
>>  #define VFIO_IOMMU_DIRTY_PAGES_FLAG_START	(1 << 0)
>>  #define VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP	(1 << 1)
>>  #define VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP	(1 << 2)
>> +#define VFIO_IOMMU_DIRTY_PAGES_FLAG_CLEAR_BITMAP (1 << 3)
>>  	__u8         data[];
>>  };
>>  


Thread overview: 9+ messages
2021-04-13  9:14 [PATCH 0/3] vfio/iommu_type1: Implement dirty log tracking based on IOMMU HWDBM Keqian Zhu
2021-04-13  9:14 ` [PATCH 1/3] vfio/iommu_type1: Add HWDBM status maintanance Keqian Zhu
2021-04-13 16:25   ` kernel test robot
2021-04-14  3:14   ` kernel test robot
2021-04-13  9:14 ` [PATCH 2/3] vfio/iommu_type1: Optimize dirty bitmap population based on iommu HWDBM Keqian Zhu
2021-04-13 18:05   ` kernel test robot
2021-04-13  9:14 ` [PATCH 3/3] vfio/iommu_type1: Add support for manual dirty log clear Keqian Zhu
2021-04-15 20:43   ` Alex Williamson
2021-04-16  8:45     ` Keqian Zhu
