kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Joao Martins <joao.m.martins@oracle.com>
To: iommu@lists.linux.dev
Cc: Jason Gunthorpe <jgg@nvidia.com>,
	Kevin Tian <kevin.tian@intel.com>,
	Shameerali Kolothum Thodi  <shameerali.kolothum.thodi@huawei.com>,
	Lu Baolu <baolu.lu@linux.intel.com>, Yi Liu <yi.l.liu@intel.com>,
	Yi Y Sun <yi.y.sun@intel.com>, Eric Auger <eric.auger@redhat.com>,
	Nicolin Chen <nicolinc@nvidia.com>,
	Joerg Roedel <joro@8bytes.org>,
	Jean-Philippe Brucker <jean-philippe@linaro.org>,
	Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
	Will Deacon <will@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	kvm@vger.kernel.org, Joao Martins <joao.m.martins@oracle.com>
Subject: [PATCH RFCv2 15/24] iommufd: Add a flag to skip clearing of IOPTE dirty
Date: Thu, 18 May 2023 21:46:41 +0100	[thread overview]
Message-ID: <20230518204650.14541-16-joao.m.martins@oracle.com> (raw)
In-Reply-To: <20230518204650.14541-1-joao.m.martins@oracle.com>

VFIO has an operation where you an unmap an IOVA while returning a bitmap
with the dirty data. In reality the operation doesn't quite query the IO
pagetables that the the PTE was dirty or not, but it marks as dirty in the
bitmap anything that was mapped all in one operation.

In IOMMUFD this equivalent can be done in two operations by querying with
GET_DIRTY_IOVA followed by UNMAP_IOVA. However, this would incur two TLB
flushes given that after clearing dirty bits IOMMU implementations require
invalidating their IOTLB, plus another invalidation needed for the UNMAP.
To allow dirty bits to be queried faster, we add a flag
(IOMMU_GET_DIRTY_IOVA_NO_CLEAR) that requests to not clear the dirty bits
from the PTE (but just reading them), under the expectation that the next
operation is the unmap. An alternative is to unmap and just perpectually
mark as dirty as that's the same behaviour as today. So here equivalent
functionally can be provided, and if real dirty info is required we
amortize the cost while querying.

There's still a race against DMA where in theory the unmap of the IOVA
(when the guest invalidates the IOTLB via emulated iommu) would race
against the VF performing DMA on the same IOVA being invalidated which
would be marking the PTE as dirty but losing the update in the
unmap-related IOTLB flush. The way to actually prevent the race would be to
write-protect the IOPTE, then query dirty bits and flush the IOTLB in the
unmap after.  However, this remains an issue that is so far theoretically
possible, but lacks an use case or whether the race is relevant in the
first place that justifies such complexity.

Link: https://lore.kernel.org/linux-iommu/20220502185239.GR8364@nvidia.com/
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/iommufd/hw_pagetable.c |  3 ++-
 drivers/iommu/iommufd/io_pagetable.c |  9 +++++++--
 include/uapi/linux/iommufd.h         | 12 ++++++++++++
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 25860aa0a1f8..bcb81eefe16c 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -260,7 +260,8 @@ int iommufd_hwpt_get_dirty_iova(struct iommufd_ucmd *ucmd)
 	struct iommufd_ioas *ioas;
 	int rc = -EOPNOTSUPP;
 
-	if ((cmd->flags || cmd->__reserved))
+	if ((cmd->flags & ~(IOMMU_GET_DIRTY_IOVA_NO_CLEAR)) ||
+	    cmd->__reserved)
 		return -EOPNOTSUPP;
 
 	hwpt = iommufd_get_hwpt(ucmd, cmd->hwpt_id);
diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index 01adb0f7e4d0..ef58910ec747 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -414,6 +414,7 @@ int iopt_map_user_pages(struct iommufd_ctx *ictx, struct io_pagetable *iopt,
 }
 
 struct iova_bitmap_fn_arg {
+	unsigned long flags;
 	struct iommu_domain *domain;
 	struct iommu_dirty_bitmap *dirty;
 };
@@ -426,8 +427,9 @@ static int __iommu_read_and_clear_dirty(struct iova_bitmap *bitmap,
 	struct iommu_domain *domain = arg->domain;
 	const struct iommu_domain_ops *ops = domain->ops;
 	struct iommu_dirty_bitmap *dirty = arg->dirty;
+	unsigned long flags = arg->flags;
 
-	return ops->read_and_clear_dirty(domain, iova, length, 0, dirty);
+	return ops->read_and_clear_dirty(domain, iova, length, flags, dirty);
 }
 
 static int iommu_read_and_clear_dirty(struct iommu_domain *domain,
@@ -451,11 +453,14 @@ static int iommu_read_and_clear_dirty(struct iommu_domain *domain,
 
 	iommu_dirty_bitmap_init(&dirty, iter, &gather);
 
+	arg.flags = flags;
 	arg.domain = domain;
 	arg.dirty = &dirty;
 	iova_bitmap_for_each(iter, &arg, __iommu_read_and_clear_dirty);
 
-	iommu_iotlb_sync(domain, &gather);
+	if (!(flags & IOMMU_DIRTY_NO_CLEAR))
+		iommu_iotlb_sync(domain, &gather);
+
 	iova_bitmap_free(iter);
 
 	return ret;
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index c256f7354867..a91bf845e679 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -427,6 +427,18 @@ struct iommufd_dirty_data {
 	__aligned_u64 *data;
 };
 
+/**
+ * enum iommufd_get_dirty_iova_flags - Flags for getting dirty bits
+ * @IOMMU_GET_DIRTY_IOVA_NO_CLEAR: Just read the PTEs without clearing any dirty
+ *                                 bits metadata. This flag can be passed in the
+ *                                 expectation where the next operation is
+ *                                 an unmap of the same IOVA range.
+ *
+ */
+enum iommufd_hwpt_get_dirty_iova_flags {
+	IOMMU_GET_DIRTY_IOVA_NO_CLEAR = 1,
+};
+
 /**
  * struct iommu_hwpt_get_dirty_iova - ioctl(IOMMU_HWPT_GET_DIRTY_IOVA)
  * @size: sizeof(struct iommu_hwpt_get_dirty_iova)
-- 
2.17.2


  parent reply	other threads:[~2023-05-18 20:49 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-18 20:46 [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 01/24] iommu: Add RCU-protected page free support Joao Martins
2023-05-19 13:32   ` Jason Gunthorpe
2023-05-19 16:48     ` Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 02/24] iommu: Replace put_pages_list() with iommu_free_pgtbl_pages() Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 03/24] vfio: Move iova_bitmap into iommu core Joao Martins
2023-05-18 22:35   ` Alex Williamson
2023-05-19  9:06     ` Joao Martins
2023-05-19  9:01   ` Liu, Jingqi
2023-05-19  9:07     ` Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 04/24] iommu: Add iommu_domain ops for dirty tracking Joao Martins
2023-05-19  8:42   ` Baolu Lu
2023-05-19  9:28     ` Joao Martins
2023-05-19 11:40   ` Jason Gunthorpe
2023-05-19 11:47     ` Joao Martins
2023-05-19 11:51       ` Jason Gunthorpe
2023-05-19 11:56         ` Joao Martins
2023-05-19 13:29           ` Jason Gunthorpe
2023-05-19 13:46             ` Joao Martins
2023-08-10 18:23             ` Joao Martins
2023-08-10 18:55               ` Jason Gunthorpe
2023-08-10 20:36                 ` Joao Martins
2023-08-11  1:09                   ` Jason Gunthorpe
2023-05-19 12:13         ` Baolu Lu
2023-05-19 13:22   ` Robin Murphy
2023-05-19 13:43     ` Joao Martins
2023-05-19 18:12       ` Robin Murphy
2023-05-18 20:46 ` [PATCH RFCv2 05/24] iommufd: Add a flag to enforce dirty tracking on attach Joao Martins
2023-05-19 13:34   ` Jason Gunthorpe
2023-05-18 20:46 ` [PATCH RFCv2 06/24] iommufd/selftest: Add a flags to _test_cmd_{hwpt_alloc,mock_domain} Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 07/24] iommufd/selftest: Test IOMMU_HWPT_ALLOC_ENFORCE_DIRTY Joao Martins
2023-05-19 13:35   ` Jason Gunthorpe
2023-05-19 13:52     ` Joao Martins
2023-05-19 13:55   ` Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 08/24] iommufd: Dirty tracking data support Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 09/24] iommufd: Add IOMMU_HWPT_SET_DIRTY Joao Martins
2023-05-19 13:49   ` Jason Gunthorpe
2023-05-19 14:21     ` Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 10/24] iommufd/selftest: Test IOMMU_HWPT_SET_DIRTY Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 11/24] iommufd: Add IOMMU_HWPT_GET_DIRTY_IOVA Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 12/24] iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_IOVA Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 13/24] iommufd: Add IOMMU_DEVICE_GET_CAPS Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 14/24] iommufd/selftest: Test IOMMU_DEVICE_GET_CAPS Joao Martins
2023-05-18 20:46 ` Joao Martins [this message]
2023-05-19 13:54   ` [PATCH RFCv2 15/24] iommufd: Add a flag to skip clearing of IOPTE dirty Jason Gunthorpe
2023-05-18 20:46 ` [PATCH RFCv2 16/24] iommufd/selftest: Test IOMMU_GET_DIRTY_IOVA_NO_CLEAR flag Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 17/24] iommu/amd: Access/Dirty bit support in IOPTEs Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 18/24] iommu/amd: Print access/dirty bits if supported Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 19/24] iommu/intel: Access/Dirty bit support for SL domains Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 20/24] iommu/arm-smmu-v3: Add feature detection for HTTU Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 21/24] iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping Joao Martins
2023-05-19 13:49   ` Robin Murphy
2023-05-19 14:05     ` Joao Martins
2023-05-22 10:34   ` Shameerali Kolothum Thodi
2023-05-22 10:43     ` Joao Martins
2023-06-16 17:00       ` Shameerali Kolothum Thodi
2023-06-16 18:11         ` Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 22/24] iommu/arm-smmu-v3: Add read_and_clear_dirty() support Joao Martins
2023-06-16 16:46   ` Shameerali Kolothum Thodi
2023-06-16 18:10     ` Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 23/24] iommu/arm-smmu-v3: Add set_dirty_tracking() support Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 24/24] iommu/arm-smmu-v3: Advertise IOMMU_DOMAIN_F_ENFORCE_DIRTY Joao Martins
2023-05-30 14:10   ` Shameerali Kolothum Thodi
2023-05-30 19:19     ` Joao Martins
2023-05-31  9:21       ` Shameerali Kolothum Thodi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230518204650.14541-16-joao.m.martins@oracle.com \
    --to=joao.m.martins@oracle.com \
    --cc=alex.williamson@redhat.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=eric.auger@redhat.com \
    --cc=iommu@lists.linux.dev \
    --cc=jean-philippe@linaro.org \
    --cc=jgg@nvidia.com \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=nicolinc@nvidia.com \
    --cc=robin.murphy@arm.com \
    --cc=shameerali.kolothum.thodi@huawei.com \
    --cc=suravee.suthikulpanit@amd.com \
    --cc=will@kernel.org \
    --cc=yi.l.liu@intel.com \
    --cc=yi.y.sun@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).