iommu.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking
@ 2023-05-18 20:46 Joao Martins
  2023-05-18 20:46 ` [PATCH RFCv2 01/24] iommu: Add RCU-protected page free support Joao Martins
                   ` (23 more replies)
  0 siblings, 24 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-18 20:46 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm, Joao Martins

Presented herewith is a series that extends IOMMUFD to have IOMMU
hardware support for dirty bit in the IOPTEs.

Today, AMD Milan (or more recent) supports it while ARM SMMUv3.2
alongside VT-D rev3.x are expected to eventually come along.  One
intended use-case (but not restricted!) is to support Live Migration with
SR-IOV, specially useful for live migrateable PCI devices that can't
supply its own dirty tracking hardware blocks amongst others.

At a quick glance, IOMMUFD lets the userspace create the IOAS with a
set of a IOVA ranges mapped to some physical memory composing an IO
pagetable. This is then created via HWPT_ALLOC or attached to a
particular device/hwpt, consequently creating the IOMMU domain and share
a common IO page table representing the endporint DMA-addressable guest
address space. In IOMMUFD Dirty tracking (from v1 of the series) we will
be supporting the HWPT_ALLOC model only, as opposed to simpler
autodomains model.

The result is an hw_pagetable which represents the
iommu_domain which will be directly manipulated. The IOMMUFD UAPI,
and the iommu core kAPI are then extended to provide:

1) Enforce that only devices with dirty tracking support are attached
to an IOMMU domain, to cover the case where this isn't all homogenous in
the platform. The enforcing being enabled or not is tracked by the iommu
domain op *caller* not iommu driver implementaiton, to avoid redundantly
check this in IOMMU ops.

2) Toggling of Dirty Tracking on the iommu_domain. We model as the most
common case of changing hardware translation control structures dynamically
(x86) while making it easier to have an always-enabled mode. In the
RFCv1, the ARM specific case is suggested to be always enabled instead of
having to enable the per-PTE DBM control bit (what I previously called
"range tracking"). Here, setting/clearing tracking means just clearing the
dirty bits at start. IOMMUFD wise The 'real' tracking of whether dirty
tracking is enabled is stored in the IOMMU driver, hence no new
fields are added to iommufd pagetable structures, except for the
iommu_domain enforcement part.

Note: I haven't included a GET_DIRTY ioctl but I do have it implemented.
But I am not sure this is exactly needed. I find it good to have a getter
supplied with setter in general, but looking at how other parts were
developed in the past, the getter doesn't have a usage...

3) Add a capability probing for dirty tracking, leveraging the per-device
iommu_capable() and adding a IOMMU_CAP_DIRTY. IOMMUFD we add a
DEVICE_GET_CAPS ioctl which takes a device ID and returns some
capabilities. Similarly to 1) it might make sense to move it the drivers
.attach_device validation to the caller; for now I have this in iommu
drivers;

4) Read the I/O PTEs and marshal its dirtyiness into a bitmap. The bitmap
indexes on a page_size basis the IOVAs that got written by the device.
While performing the marshalling also drivers need to clear the dirty bits
from IOPTE and allow the kAPI caller to batch the much needed IOTLB flush.
There's no copy of bitmaps to userspace backed memory, all is zerocopy
based to not add more cost to the iommu driver IOPT walker. This shares
functionality with VFIO device dirty tracking via the IOVA bitmap APIs. So
far this is a test-and-clear kind of interface given that the IOPT walk is
going to be expensive. In addition this also adds the ability to read dirty
bit info without clearing the PTE info. This is meant to cover the
unmap-and-read-dirty use-case, and avoid the second IOTLB flush.

Note: I've kept the name read_and_clear_dirty() in v2 but this might not
make sense given the name of the flags; open to suggestions.

5) I've pulled Baolu Lu's patches[0] that make the pagetables page free
path RCU-protected, which will fix both the use-after-free scenario
mentioned there, but also let us a RCU-based page table walker for
read_and_clear_dirty() iommu op, as opposed to taking the same locks on
map/unmap. These are taken exactly as they were posted.

The additional dependency are:
* HWPT_ALLOC which allows creating/manipulating iommu_domain creation[4]

While needing this to make it useful with VFIO (and consequently to VMMs):
* VFIO cdev support and to use iommufd with VFIO [3]
* VFIO PCI hot reset

Hence, I have these two dependencies applied first on top of this series.
This whole thing as posted is also here[6].

The series is organized as follows:

* Patches 1-4: Takes care of the iommu domain operations to be added.
The idea is to abstract iommu drivers from any idea of how bitmaps are
stored or propagated back to the caller, as well as allowing
control/batching over IOTLB flush. So there's a data structure and an
helper that only tells the upper layer that an IOVA range got dirty.
This logic is shared with VFIO and it's meant to walking the bitmap
user memory, and kmap-ing plus setting bits as needed. IOMMU driver
just has an idea of a 'dirty bitmap state' and recording an IOVA as
dirty. It also pulls Baolu Lu's patches for RCU-safe pagetable free.

* Patches 5-16: Adds the UAPIs for IOMMUFD, and selftests. The selftests
cover some corner cases on boundaries handling of the bitmap and various
bitmap sizes that exercise. I haven't included huge IOVA ranges to avoid
risking the selftests failing to execute due to OOM issues of mmaping bit
buffers.

I've implemented for the x86 IOMMUs that have/eventually-have IOMMU A/D
support. So the next half of the series presents said implementations
for IOMMUs:

* Patches 17-18: AMD IOMMU implementation, particularly on those having
HDSup support. Tested with a Qemu amd-iommu with HDSUp emulated[1].

* Patches 19: Intel IOMMU rev3.x+ implementation. Tested with a Qemu
based intel-iommu vIOMMU with SSADS emulation support[1].

* Patches 20-24: ARM SMMU v3 impleemntation. A lot simpler than the v1
posting. Most of the adjustments were because of the new UAPI while taking
the comments I got in v1 from everyone. *Only compile tested*. Shameerali
will be taking over the ARM SMMUv3 support;

To help testing/prototypization, I also wrote qemu iommu emulation bits
to increase coverage of this code and hopefully make this more broadly
available for fellow contributors/devs[1]; it is stored here[2] and
largelly based on Nicolin, Yi and Eric's IOMMUFD bringup work (thanks a
ton!). It also includes IOMMUFD dirty tracking supporting Qemu that got
posted in the past. I won't be exactly following up a v2 there given that
IOMMUFD support needs to be firstly supported by Qemu.

We have live migrateable VFs in VMMs these days (e.g. Qemu 8.0) so we can
now test everything in tandem, but I haven't have my hardware setup *yet*
organized in such manner that allows me to test everything, hence why I am
still marking this as an RFC with intent to drop in v3. But most
importantly, this version is for making sure that iommu/iommufd kAPIs/UAPI
are solid; I'll focus more on iommu implementations next iteration;

Sorry for such a late posting since v1; hopefully this are in a better
direction.

Feedback or any comments are very much appreciated

Thanks!
        Joao

TODOs for v3:
- Testing with a live migrateable VF;
- Improve the dirty PTE walking in Intel/AMD iommu drivers, and anything
that I may have miss

Changes since RFCv1[5]:
Too many changes but the major items were:
* Majorirty of the changes from Jason/Kevin/Baolu/Suravee:
- Improve structure and rework most commit messages
- Drop all of the VFIO-compat series
- Drop the unmap-get-dirty API
- Tie this to HWPT only, no more autodomains handling;
- Rework the UAPI widely by:
  - Having a IOMMU_DEVICE_GET_CAPS which allows to fetching capabilities
    of devices, specifically test dirty tracking support for an individual
    device
  - Add a enforce-dirty flag to the IOMMU domain via HWPT_ALLOC
  - SET_DIRTY now clears dirty tracking before asking iommu driver to do so;
  - New GET_DIRTY_IOVA flag that does not clear dirty bits
  - Add coverage for all added APIs
  - Expand GET_DIRTY_IOVA tests to cover IOVA bitmap corner cases tests
  that I had in separate; I only excluded the Terabyte IOVA range
  usecases (which test bitmaps 2M+) because those will most likely fail
  to be run as selftests (not sure yet how I can include those). I am
  not exactly sure how I can cover those, unless I do 'fake IOVA maps'
  *somehow* which do not necessarily require real buffers.
- Handle most comments in intel-iommu. Only remaining one for v3 is the
  PTE walker which will be done better.
- Handle all comments in amd-iommu, most of which regarding locking.
  Only one remaining is v3 same as Intel;
- Reflect the UAPI changes into iommu driver implementations, including
persisting dirty tracking enabling in new attach_dev calls, as well as
enforcing attach_dev enforces the requested domain flags; As well as
future devices setting Dirty activated if they get attached to a iommu
domain with dirty tracking enabled.
* Comments from Yi Sun in making sure that dirty tracking isn't
restricted into SS only, so relax the check for FL support because it's
always enabled. (Yi Sun)
* Most of code that was in v1 for dirty bitmaps got rewritten and
repurpose to also cover VFIO case; so reuse this infra here too for both.
(Jason)
* Take Robin's suggestion of always enabling dirty tracking and set_dirty
just clearing bits on 'activation', and make that a generic property to
ensure we always get accurate results between starting and stopping
tracking. (Robin Murphy)
* Address all comments from SMMUv3 into how we enable/test the DBM, or the
bits in the context descriptor with io-pgtable::quirks, etc
(Robin, Shameerali)

[0] https://lore.kernel.org/linux-iommu/20220609070811.902868-1-baolu.lu@linux.intel.com/
[1] https://lore.kernel.org/qemu-devel/20220428211351.3897-1-joao.m.martins@oracle.com/
[2] https://github.com/jpemartins/qemu/commits/iommufd-v2
[3] https://lore.kernel.org/kvm/20230426150321.454465-1-yi.l.liu@intel.com/
[4] https://lore.kernel.org/kvm/0-v7-6c0fd698eda2+5e3-iommufd_alloc_jgg@nvidia.com/
[5] https://lore.kernel.org/kvm/20220428210933.3583-1-joao.m.martins@oracle.com/
[6] https://github.com/jpemartins/linux/commits/iommufd-v2


Jean-Philippe Brucker (1):
  iommu/arm-smmu-v3: Add feature detection for HTTU

Joao Martins (19):
  vfio: Move iova_bitmap into iommu core
  iommu: Add iommu_domain ops for dirty tracking
  iommufd: Add a flag to enforce dirty tracking on attach
  iommufd/selftest: Add a flags to _test_cmd_{hwpt_alloc,mock_domain}
  iommufd/selftest: Test IOMMU_HWPT_ALLOC_ENFORCE_DIRTY
  iommufd: Dirty tracking data support
  iommufd: Add IOMMU_HWPT_SET_DIRTY
  iommufd/selftest: Test IOMMU_HWPT_SET_DIRTY
  iommufd: Add IOMMU_HWPT_GET_DIRTY_IOVA
  iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_IOVA
  iommufd: Add IOMMU_DEVICE_GET_CAPS
  iommufd/selftest: Test IOMMU_DEVICE_GET_CAPS
  iommufd: Add a flag to skip clearing of IOPTE dirty
  iommufd/selftest: Test IOMMU_GET_DIRTY_IOVA_NO_CLEAR flag
  iommu/amd: Access/Dirty bit support in IOPTEs
  iommu/amd: Print access/dirty bits if supported
  iommu/intel: Access/Dirty bit support for SL domains
  iommu/arm-smmu-v3: Add set_dirty_tracking() support
  iommu/arm-smmu-v3: Advertise IOMMU_DOMAIN_F_ENFORCE_DIRTY

Keqian Zhu (1):
  iommu/arm-smmu-v3: Add read_and_clear_dirty() support

Kunkun Jiang (1):
  iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping

Lu Baolu (2):
  iommu: Add RCU-protected page free support
  iommu: Replace put_pages_list() with iommu_free_pgtbl_pages()

 drivers/iommu/Makefile                        |   1 +
 drivers/iommu/amd/amd_iommu.h                 |   1 +
 drivers/iommu/amd/amd_iommu_types.h           |  12 +
 drivers/iommu/amd/init.c                      |   9 +
 drivers/iommu/amd/io_pgtable.c                |  89 +++++++-
 drivers/iommu/amd/iommu.c                     |  81 +++++++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  95 ++++++++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   8 +
 drivers/iommu/dma-iommu.c                     |   6 +-
 drivers/iommu/intel/iommu.c                   |  92 +++++++-
 drivers/iommu/intel/iommu.h                   |  15 ++
 drivers/iommu/intel/pasid.c                   | 103 +++++++++
 drivers/iommu/intel/pasid.h                   |   4 +
 drivers/iommu/io-pgtable-arm.c                | 115 +++++++++-
 drivers/iommu/iommu.c                         |  34 +++
 drivers/iommu/iommufd/device.c                |  28 ++-
 drivers/iommu/iommufd/hw_pagetable.c          | 112 +++++++++-
 drivers/iommu/iommufd/io_pagetable.c          | 111 ++++++++++
 drivers/iommu/iommufd/iommufd_private.h       |  27 ++-
 drivers/iommu/iommufd/iommufd_test.h          |  14 ++
 drivers/iommu/iommufd/main.c                  |   9 +
 drivers/iommu/iommufd/selftest.c              | 147 ++++++++++++-
 drivers/{vfio => iommu}/iova_bitmap.c         |   0
 drivers/vfio/Makefile                         |   3 +-
 include/linux/io-pgtable.h                    |   8 +
 include/linux/iommu.h                         |  77 +++++++
 include/uapi/linux/iommufd.h                  | 107 +++++++++
 tools/testing/selftests/iommu/Makefile        |   3 +
 tools/testing/selftests/iommu/iommufd.c       | 206 +++++++++++++++++-
 .../selftests/iommu/iommufd_fail_nth.c        |  24 +-
 tools/testing/selftests/iommu/iommufd_utils.h | 181 ++++++++++++++-
 31 files changed, 1680 insertions(+), 42 deletions(-)
 rename drivers/{vfio => iommu}/iova_bitmap.c (100%)

-- 
2.17.2


^ permalink raw reply	[flat|nested] 65+ messages in thread

* [PATCH RFCv2 01/24] iommu: Add RCU-protected page free support
  2023-05-18 20:46 [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking Joao Martins
@ 2023-05-18 20:46 ` Joao Martins
  2023-05-19 13:32   ` Jason Gunthorpe
  2023-05-18 20:46 ` [PATCH RFCv2 02/24] iommu: Replace put_pages_list() with iommu_free_pgtbl_pages() Joao Martins
                   ` (22 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: Joao Martins @ 2023-05-18 20:46 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm, Joao Martins

From: Lu Baolu <baolu.lu@linux.intel.com>

The IOMMU page tables are updated using iommu_map/unmap() interfaces.
Currently, there is no mandatory requirement for drivers to use locks
to ensure concurrent updates to page tables, because it's assumed that
overlapping IOVA ranges do not have concurrent updates. Therefore the
IOMMU drivers only need to take care of concurrent updates to level
page table entries.

But enabling new features challenges this assumption. For example, the
hardware assisted dirty page tracking feature requires scanning page
tables in interfaces other than mapping and unmapping. This might result
in a use-after-free scenario in which a level page table has been freed
by the unmap() interface, while another thread is scanning the next level
page table.

This adds RCU-protected page free support so that the pages are really
freed and reused after a RCU grace period. Hence, the page tables are
safe for scanning within a rcu_read_lock critical region. Considering
that scanning the page table is a rare case, this also adds a domain
flag and the RCU-protected page free is only used when this flat is set.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/iommu.c | 23 +++++++++++++++++++++++
 include/linux/iommu.h | 10 ++++++++++
 2 files changed, 33 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 91573efd9488..2088caae5074 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3432,3 +3432,26 @@ struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
 
 	return domain;
 }
+
+static void pgtble_page_free_rcu(struct rcu_head *rcu)
+{
+	struct page *page = container_of(rcu, struct page, rcu_head);
+
+	__free_pages(page, 0);
+}
+
+void iommu_free_pgtbl_pages(struct iommu_domain *domain,
+			    struct list_head *pages)
+{
+	struct page *page, *next;
+
+	if (!domain->concurrent_traversal) {
+		put_pages_list(pages);
+		return;
+	}
+
+	list_for_each_entry_safe(page, next, pages, lru) {
+		list_del(&page->lru);
+		call_rcu(&page->rcu_head, pgtble_page_free_rcu);
+	}
+}
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index e8c9a7da1060..39d25645a5ab 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -110,6 +110,7 @@ struct iommu_domain {
 			int users;
 		};
 	};
+	unsigned long concurrent_traversal:1;
 };
 
 static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
@@ -697,6 +698,12 @@ static inline void dev_iommu_priv_set(struct device *dev, void *priv)
 	dev->iommu->priv = priv;
 }
 
+static inline void domain_set_concurrent_traversal(struct iommu_domain *domain,
+						   bool value)
+{
+	domain->concurrent_traversal = value;
+}
+
 int iommu_probe_device(struct device *dev);
 
 int iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features f);
@@ -721,6 +728,9 @@ void iommu_detach_device_pasid(struct iommu_domain *domain,
 struct iommu_domain *
 iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid,
 			       unsigned int type);
+
+void iommu_free_pgtbl_pages(struct iommu_domain *domain,
+			    struct list_head *pages);
 #else /* CONFIG_IOMMU_API */
 
 struct iommu_ops {};
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH RFCv2 02/24] iommu: Replace put_pages_list() with iommu_free_pgtbl_pages()
  2023-05-18 20:46 [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking Joao Martins
  2023-05-18 20:46 ` [PATCH RFCv2 01/24] iommu: Add RCU-protected page free support Joao Martins
@ 2023-05-18 20:46 ` Joao Martins
  2023-05-18 20:46 ` [PATCH RFCv2 03/24] vfio: Move iova_bitmap into iommu core Joao Martins
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-18 20:46 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm, Joao Martins

From: Lu Baolu <baolu.lu@linux.intel.com>

Therefore, RCU protected page free will take effect if necessary.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/amd/io_pgtable.c | 5 ++---
 drivers/iommu/dma-iommu.c      | 6 ++++--
 drivers/iommu/intel/iommu.c    | 4 ++--
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index 1b67116882be..666b643106f8 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -430,7 +430,7 @@ static int iommu_v1_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
 	}
 
 	/* Everything flushed out, free pages now */
-	put_pages_list(&freelist);
+	iommu_free_pgtbl_pages(&dom->domain, &freelist);
 
 	return ret;
 }
@@ -511,8 +511,7 @@ static void v1_free_pgtable(struct io_pgtable *iop)
 
 	/* Make changes visible to IOMMUs */
 	amd_iommu_domain_update(dom);
-
-	put_pages_list(&freelist);
+	iommu_free_pgtbl_pages(&dom->domain, &freelist);
 }
 
 static struct io_pgtable *v1_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7a9f0b0bddbd..33925b9249b3 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -136,7 +136,8 @@ static void fq_ring_free(struct iommu_dma_cookie *cookie, struct iova_fq *fq)
 		if (fq->entries[idx].counter >= counter)
 			break;
 
-		put_pages_list(&fq->entries[idx].freelist);
+		iommu_free_pgtbl_pages(cookie->fq_domain,
+				       &fq->entries[idx].freelist);
 		free_iova_fast(&cookie->iovad,
 			       fq->entries[idx].iova_pfn,
 			       fq->entries[idx].pages);
@@ -232,7 +233,8 @@ static void iommu_dma_free_fq(struct iommu_dma_cookie *cookie)
 		struct iova_fq *fq = per_cpu_ptr(cookie->fq, cpu);
 
 		fq_ring_for_each(idx, fq)
-			put_pages_list(&fq->entries[idx].freelist);
+			iommu_free_pgtbl_pages(cookie->fq_domain,
+					       &fq->entries[idx].freelist);
 	}
 
 	free_percpu(cookie->fq);
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index b871a6afd803..4662292d60ba 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1827,7 +1827,7 @@ static void domain_exit(struct dmar_domain *domain)
 		LIST_HEAD(freelist);
 
 		domain_unmap(domain, 0, DOMAIN_MAX_PFN(domain->gaw), &freelist);
-		put_pages_list(&freelist);
+		iommu_free_pgtbl_pages(&domain->domain, &freelist);
 	}
 
 	if (WARN_ON(!list_empty(&domain->devices)))
@@ -4286,7 +4286,7 @@ static void intel_iommu_tlb_sync(struct iommu_domain *domain,
 				      start_pfn, nrpages,
 				      list_empty(&gather->freelist), 0);
 
-	put_pages_list(&gather->freelist);
+	iommu_free_pgtbl_pages(domain, &gather->freelist);
 }
 
 static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain,
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH RFCv2 03/24] vfio: Move iova_bitmap into iommu core
  2023-05-18 20:46 [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking Joao Martins
  2023-05-18 20:46 ` [PATCH RFCv2 01/24] iommu: Add RCU-protected page free support Joao Martins
  2023-05-18 20:46 ` [PATCH RFCv2 02/24] iommu: Replace put_pages_list() with iommu_free_pgtbl_pages() Joao Martins
@ 2023-05-18 20:46 ` Joao Martins
  2023-05-18 22:35   ` Alex Williamson
  2023-05-19  9:01   ` Liu, Jingqi
  2023-05-18 20:46 ` [PATCH RFCv2 04/24] iommu: Add iommu_domain ops for dirty tracking Joao Martins
                   ` (20 subsequent siblings)
  23 siblings, 2 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-18 20:46 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm, Joao Martins

Both VFIO and IOMMUFD will need iova bitmap for storing dirties and walking
the user bitmaps, so move to the common dependency into IOMMU core. IOMMUFD
can't exactly host it given that VFIO dirty tracking can be used without
IOMMUFD.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/Makefile                | 1 +
 drivers/{vfio => iommu}/iova_bitmap.c | 0
 drivers/vfio/Makefile                 | 3 +--
 3 files changed, 2 insertions(+), 2 deletions(-)
 rename drivers/{vfio => iommu}/iova_bitmap.c (100%)

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 769e43d780ce..9d9dfbd2dfc2 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_DART) += io-pgtable-dart.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
+obj-$(CONFIG_IOMMU_IOVA) += iova_bitmap.o
 obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
 obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
 obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o
diff --git a/drivers/vfio/iova_bitmap.c b/drivers/iommu/iova_bitmap.c
similarity index 100%
rename from drivers/vfio/iova_bitmap.c
rename to drivers/iommu/iova_bitmap.c
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 57c3515af606..f9cc32a9810c 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -1,8 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_VFIO) += vfio.o
 
-vfio-y += vfio_main.o \
-	  iova_bitmap.o
+vfio-y += vfio_main.o
 vfio-$(CONFIG_VFIO_DEVICE_CDEV) += device_cdev.o
 vfio-$(CONFIG_VFIO_GROUP) += group.o
 vfio-$(CONFIG_IOMMUFD) += iommufd.o
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH RFCv2 04/24] iommu: Add iommu_domain ops for dirty tracking
  2023-05-18 20:46 [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking Joao Martins
                   ` (2 preceding siblings ...)
  2023-05-18 20:46 ` [PATCH RFCv2 03/24] vfio: Move iova_bitmap into iommu core Joao Martins
@ 2023-05-18 20:46 ` Joao Martins
  2023-05-19  8:42   ` Baolu Lu
                     ` (2 more replies)
  2023-05-18 20:46 ` [PATCH RFCv2 05/24] iommufd: Add a flag to enforce dirty tracking on attach Joao Martins
                   ` (19 subsequent siblings)
  23 siblings, 3 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-18 20:46 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm, Joao Martins

Add to iommu domain operations a set of callbacks to perform dirty
tracking, particulary to start and stop tracking and finally to read and
clear the dirty data.

Drivers are generally expected to dynamically change its translation
structures to toggle the tracking and flush some form of control state
structure that stands in the IOVA translation path. Though it's not
mandatory, as drivers will be enable dirty tracking at boot, and just flush
the IO pagetables when setting dirty tracking.  For each of the newly added
IOMMU core APIs:

.supported_flags[IOMMU_DOMAIN_F_ENFORCE_DIRTY]: Introduce a set of flags
that enforce certain restrictions in the iommu_domain object. For dirty
tracking this means that when IOMMU_DOMAIN_F_ENFORCE_DIRTY is set via its
helper iommu_domain_set_flags(...) devices attached via attach_dev will
fail on devices that do *not* have dirty tracking supported. IOMMU drivers
that support dirty tracking should advertise this flag, while enforcing
that dirty tracking is supported by the device in its .attach_dev iommu op.

iommu_cap::IOMMU_CAP_DIRTY: new device iommu_capable value when probing for
capabilities of the device.

.set_dirty_tracking(): an iommu driver is expected to change its
translation structures and enable dirty tracking for the devices in the
iommu_domain. For drivers making dirty tracking always-enabled, it should
just return 0.

.read_and_clear_dirty(): an iommu driver is expected to walk the iova range
passed in and use iommu_dirty_bitmap_record() to record dirty info per
IOVA. When detecting a given IOVA is dirty it should also clear its dirty
state from the PTE, *unless* the flag IOMMU_DIRTY_NO_CLEAR is passed in --
flushing is steered from the caller of the domain_op via iotlb_gather. The
iommu core APIs use the same data structure in use for dirty tracking for
VFIO device dirty (struct iova_bitmap) abstracted by
iommu_dirty_bitmap_record() helper function.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/iommu.c      | 11 +++++++
 include/linux/io-pgtable.h |  4 +++
 include/linux/iommu.h      | 67 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 82 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2088caae5074..95acc543e8fb 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2013,6 +2013,17 @@ struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus)
 }
 EXPORT_SYMBOL_GPL(iommu_domain_alloc);
 
+int iommu_domain_set_flags(struct iommu_domain *domain,
+			   const struct bus_type *bus, unsigned long val)
+{
+	if (!(val & bus->iommu_ops->supported_flags))
+		return -EINVAL;
+
+	domain->flags |= val;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_domain_set_flags);
+
 void iommu_domain_free(struct iommu_domain *domain)
 {
 	if (domain->type == IOMMU_DOMAIN_SVA)
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 1b7a44b35616..25142a0e2fc2 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -166,6 +166,10 @@ struct io_pgtable_ops {
 			      struct iommu_iotlb_gather *gather);
 	phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops,
 				    unsigned long iova);
+	int (*read_and_clear_dirty)(struct io_pgtable_ops *ops,
+				    unsigned long iova, size_t size,
+				    unsigned long flags,
+				    struct iommu_dirty_bitmap *dirty);
 };
 
 /**
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 39d25645a5ab..992ea87f2f8e 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -13,6 +13,7 @@
 #include <linux/errno.h>
 #include <linux/err.h>
 #include <linux/of.h>
+#include <linux/iova_bitmap.h>
 #include <uapi/linux/iommu.h>
 
 #define IOMMU_READ	(1 << 0)
@@ -65,6 +66,11 @@ struct iommu_domain_geometry {
 
 #define __IOMMU_DOMAIN_SVA	(1U << 4)  /* Shared process address space */
 
+/* Domain feature flags that do not define domain types */
+#define IOMMU_DOMAIN_F_ENFORCE_DIRTY	(1U << 6)  /* Enforce attachment of
+						      dirty tracking supported
+						      devices		  */
+
 /*
  * This are the possible domain-types
  *
@@ -93,6 +99,7 @@ struct iommu_domain_geometry {
 
 struct iommu_domain {
 	unsigned type;
+	unsigned flags;
 	const struct iommu_domain_ops *ops;
 	unsigned long pgsize_bitmap;	/* Bitmap of page sizes in use */
 	struct iommu_domain_geometry geometry;
@@ -128,6 +135,7 @@ enum iommu_cap {
 	 * this device.
 	 */
 	IOMMU_CAP_ENFORCE_CACHE_COHERENCY,
+	IOMMU_CAP_DIRTY,		/* IOMMU supports dirty tracking */
 };
 
 /* These are the possible reserved region types */
@@ -220,6 +228,17 @@ struct iommu_iotlb_gather {
 	bool			queued;
 };
 
+/**
+ * struct iommu_dirty_bitmap - Dirty IOVA bitmap state
+ *
+ * @bitmap: IOVA bitmap
+ * @gather: Range information for a pending IOTLB flush
+ */
+struct iommu_dirty_bitmap {
+	struct iova_bitmap *bitmap;
+	struct iommu_iotlb_gather *gather;
+};
+
 /**
  * struct iommu_ops - iommu ops and capabilities
  * @capable: check capability
@@ -248,6 +267,7 @@ struct iommu_iotlb_gather {
  *                    pasid, so that any DMA transactions with this pasid
  *                    will be blocked by the hardware.
  * @pgsize_bitmap: bitmap of all possible supported page sizes
+ * @flags: All non domain type supported features
  * @owner: Driver module providing these ops
  */
 struct iommu_ops {
@@ -281,6 +301,7 @@ struct iommu_ops {
 
 	const struct iommu_domain_ops *default_domain_ops;
 	unsigned long pgsize_bitmap;
+	unsigned long supported_flags;
 	struct module *owner;
 };
 
@@ -316,6 +337,11 @@ struct iommu_ops {
  * @enable_nesting: Enable nesting
  * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
  * @free: Release the domain after use.
+ * @set_dirty_tracking: Enable or Disable dirty tracking on the iommu domain
+ * @read_and_clear_dirty: Walk IOMMU page tables for dirtied PTEs marshalled
+ *                        into a bitmap, with a bit represented as a page.
+ *                        Reads the dirty PTE bits and clears it from IO
+ *                        pagetables.
  */
 struct iommu_domain_ops {
 	int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
@@ -348,6 +374,12 @@ struct iommu_domain_ops {
 				  unsigned long quirks);
 
 	void (*free)(struct iommu_domain *domain);
+
+	int (*set_dirty_tracking)(struct iommu_domain *domain, bool enabled);
+	int (*read_and_clear_dirty)(struct iommu_domain *domain,
+				    unsigned long iova, size_t size,
+				    unsigned long flags,
+				    struct iommu_dirty_bitmap *dirty);
 };
 
 /**
@@ -461,6 +493,9 @@ extern bool iommu_present(const struct bus_type *bus);
 extern bool device_iommu_capable(struct device *dev, enum iommu_cap cap);
 extern bool iommu_group_has_isolated_msi(struct iommu_group *group);
 extern struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus);
+extern int iommu_domain_set_flags(struct iommu_domain *domain,
+				  const struct bus_type *bus,
+				  unsigned long flags);
 extern void iommu_domain_free(struct iommu_domain *domain);
 extern int iommu_attach_device(struct iommu_domain *domain,
 			       struct device *dev);
@@ -627,6 +662,28 @@ static inline bool iommu_iotlb_gather_queued(struct iommu_iotlb_gather *gather)
 	return gather && gather->queued;
 }
 
+static inline void iommu_dirty_bitmap_init(struct iommu_dirty_bitmap *dirty,
+					   struct iova_bitmap *bitmap,
+					   struct iommu_iotlb_gather *gather)
+{
+	if (gather)
+		iommu_iotlb_gather_init(gather);
+
+	dirty->bitmap = bitmap;
+	dirty->gather = gather;
+}
+
+static inline void
+iommu_dirty_bitmap_record(struct iommu_dirty_bitmap *dirty, unsigned long iova,
+			  unsigned long length)
+{
+	if (dirty->bitmap)
+		iova_bitmap_set(dirty->bitmap, iova, length);
+
+	if (dirty->gather)
+		iommu_iotlb_gather_add_range(dirty->gather, iova, length);
+}
+
 /* PCI device grouping function */
 extern struct iommu_group *pci_device_group(struct device *dev);
 /* Generic device grouping function */
@@ -657,6 +714,9 @@ struct iommu_fwspec {
 /* ATS is supported */
 #define IOMMU_FWSPEC_PCI_RC_ATS			(1 << 0)
 
+/* Read but do not clear any dirty bits */
+#define IOMMU_DIRTY_NO_CLEAR			(1 << 0)
+
 /**
  * struct iommu_sva - handle to a device-mm bond
  */
@@ -755,6 +815,13 @@ static inline struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus
 	return NULL;
 }
 
+static inline int iommu_domain_set_flags(struct iommu_domain *domain,
+					 const struct bus_type *bus,
+					 unsigned long flags)
+{
+	return -ENODEV;
+}
+
 static inline void iommu_domain_free(struct iommu_domain *domain)
 {
 }
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH RFCv2 05/24] iommufd: Add a flag to enforce dirty tracking on attach
  2023-05-18 20:46 [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking Joao Martins
                   ` (3 preceding siblings ...)
  2023-05-18 20:46 ` [PATCH RFCv2 04/24] iommu: Add iommu_domain ops for dirty tracking Joao Martins
@ 2023-05-18 20:46 ` Joao Martins
  2023-05-19 13:34   ` Jason Gunthorpe
  2023-05-18 20:46 ` [PATCH RFCv2 06/24] iommufd/selftest: Add a flags to _test_cmd_{hwpt_alloc,mock_domain} Joao Martins
                   ` (18 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: Joao Martins @ 2023-05-18 20:46 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm, Joao Martins

Throughout IOMMU domain lifetime that wants to use dirty tracking, some
guarantees are needed such that any device attached to the iommu_domain
supports dirty tracking.

The idea is to handle a case where IOMMUs are assymetric feature-wise and
thus the capability may not be advertised for all devices.  This is done by
adding a flag into HWPT_ALLOC namely:

	IOMMUFD_HWPT_ALLOC_ENFORCE_DIRTY

.. Passed in HWPT_ALLOC ioctl flags. The enforcement is done by creating a
iommu_domain and setting the associated flags (via iommu_domain_set_flags)
cross-checking with IOMMU driver advertised flags (and failing if it's not
advertised). Advertising the new IOMMU domain feature flag requires that
the individual iommu driver capability is supported when we attach a new
device to the iommu_domain or otherwise fail the attachment if the
capability is not set in the device.  Userspace will have also the option
of checking which that dirty tracking is supported in the IOMMU behind the
device.

Link: https://lore.kernel.org/kvm/20220721142421.GB4609@nvidia.com/
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/iommufd/device.c          |  2 +-
 drivers/iommu/iommufd/hw_pagetable.c    | 29 ++++++++++++++++++++++---
 drivers/iommu/iommufd/iommufd_private.h |  4 +++-
 include/uapi/linux/iommufd.h            |  9 ++++++++
 4 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 989bd485f92f..48d1300f0350 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -511,7 +511,7 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev,
 	}
 
 	hwpt = iommufd_hw_pagetable_alloc(idev->ictx, ioas, idev,
-					  immediate_attach);
+					  immediate_attach, false);
 	if (IS_ERR(hwpt)) {
 		destroy_hwpt = ERR_CAST(hwpt);
 		goto out_unlock;
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index cf2c1504e20d..4f0b72737ae2 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -55,12 +55,26 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
 	return 0;
 }
 
+int iommufd_hw_pagetable_enforce_dirty(struct iommufd_hw_pagetable *hwpt,
+				       struct iommufd_device *idev)
+{
+	hwpt->enforce_dirty =
+		!iommu_domain_set_flags(hwpt->domain, idev->dev->bus,
+					IOMMU_DOMAIN_F_ENFORCE_DIRTY);
+	if (!hwpt->enforce_dirty)
+		return -EINVAL;
+
+	return 0;
+}
+
 /**
  * iommufd_hw_pagetable_alloc() - Get an iommu_domain for a device
  * @ictx: iommufd context
  * @ioas: IOAS to associate the domain with
  * @idev: Device to get an iommu_domain for
  * @immediate_attach: True if idev should be attached to the hwpt
+ * @enforce_dirty: True if dirty tracking support should be enforce
+ * 		   on device attach
  *
  * Allocate a new iommu_domain and return it as a hw_pagetable. The HWPT
  * will be linked to the given ioas and upon return the underlying iommu_domain
@@ -72,7 +86,8 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
  */
 struct iommufd_hw_pagetable *
 iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
-			   struct iommufd_device *idev, bool immediate_attach)
+			   struct iommufd_device *idev, bool immediate_attach,
+			   bool enforce_dirty)
 {
 	struct iommufd_hw_pagetable *hwpt;
 	int rc;
@@ -107,6 +122,12 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 			goto out_abort;
 	}
 
+	if (enforce_dirty) {
+		rc = iommufd_hw_pagetable_enforce_dirty(hwpt, idev);
+		if (rc)
+			goto out_abort;
+	}
+
 	/*
 	 * immediate_attach exists only to accommodate iommu drivers that cannot
 	 * directly allocate a domain. These drivers do not finish creating the
@@ -141,7 +162,8 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 	struct iommufd_ioas *ioas;
 	int rc;
 
-	if (cmd->flags || cmd->__reserved)
+	if ((cmd->flags & ~(IOMMU_HWPT_ALLOC_ENFORCE_DIRTY)) ||
+	    cmd->__reserved)
 		return -EOPNOTSUPP;
 
 	idev = iommufd_get_device(ucmd, cmd->dev_id);
@@ -155,7 +177,8 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 	}
 
 	mutex_lock(&ioas->mutex);
-	hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, idev, false);
+	hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, idev, false,
+				  cmd->flags & IOMMU_HWPT_ALLOC_ENFORCE_DIRTY);
 	if (IS_ERR(hwpt)) {
 		rc = PTR_ERR(hwpt);
 		goto out_unlock;
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index dba730129b8c..2552eb44d83a 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -247,6 +247,7 @@ struct iommufd_hw_pagetable {
 	struct iommu_domain *domain;
 	bool auto_domain : 1;
 	bool enforce_cache_coherency : 1;
+	bool enforce_dirty : 1;
 	bool msi_cookie : 1;
 	/* Head at iommufd_ioas::hwpt_list */
 	struct list_head hwpt_item;
@@ -254,7 +255,8 @@ struct iommufd_hw_pagetable {
 
 struct iommufd_hw_pagetable *
 iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
-			   struct iommufd_device *idev, bool immediate_attach);
+			   struct iommufd_device *idev, bool immediate_attach,
+			   bool enforce_dirty);
 int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt);
 int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 				struct iommufd_device *idev);
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 8245c01adca6..1cd9c54d0f64 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -346,6 +346,15 @@ struct iommu_vfio_ioas {
 };
 #define IOMMU_VFIO_IOAS _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VFIO_IOAS)
 
+/**
+ * enum iommufd_hwpt_alloc_flags - Flags for alloc hwpt
+ * @IOMMU_HWPT_ALL_ENFORCE_DIRTY: Dirty tracking support for device IOMMU is
+ *                                enforced on device attachment
+ */
+enum iommufd_hwpt_alloc_flags {
+	IOMMU_HWPT_ALLOC_ENFORCE_DIRTY = 1 << 0,
+};
+
 /**
  * struct iommu_hwpt_alloc - ioctl(IOMMU_HWPT_ALLOC)
  * @size: sizeof(struct iommu_hwpt_alloc)
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH RFCv2 06/24] iommufd/selftest: Add a flags to _test_cmd_{hwpt_alloc,mock_domain}
  2023-05-18 20:46 [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking Joao Martins
                   ` (4 preceding siblings ...)
  2023-05-18 20:46 ` [PATCH RFCv2 05/24] iommufd: Add a flag to enforce dirty tracking on attach Joao Martins
@ 2023-05-18 20:46 ` Joao Martins
  2023-05-18 20:46 ` [PATCH RFCv2 07/24] iommufd/selftest: Test IOMMU_HWPT_ALLOC_ENFORCE_DIRTY Joao Martins
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-18 20:46 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm, Joao Martins

In preparation to test passing flags to HWPT_ALLOC (particularly
IOMMU_HWPT_ALLOC_ENFORCE_DIRTY), add a flags argument into the test
functions.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 tools/testing/selftests/iommu/iommufd.c       |  3 ++-
 .../selftests/iommu/iommufd_fail_nth.c        | 24 +++++++++++--------
 tools/testing/selftests/iommu/iommufd_utils.h | 22 ++++++++++-------
 3 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index dc09c1de319f..771e4a40200f 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -1346,7 +1346,8 @@ TEST_F(iommufd_mock_domain, alloc_hwpt)
 		uint32_t stddev_id;
 		uint32_t hwpt_id;
 
-		test_cmd_hwpt_alloc(self->idev_ids[0], self->ioas_id, &hwpt_id);
+		test_cmd_hwpt_alloc(self->idev_ids[0], self->ioas_id,
+				    0, &hwpt_id);
 		test_cmd_mock_domain(hwpt_id, &stddev_id, NULL, NULL);
 		test_ioctl_destroy(stddev_id);
 		test_ioctl_destroy(hwpt_id);
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index d4c552e56948..0e003069bb2a 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -315,7 +315,8 @@ TEST_FAIL_NTH(basic_fail_nth, map_domain)
 
 	fail_nth_enable();
 
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id, NULL))
+	if (_test_cmd_mock_domain(self->fd, ioas_id, 0, &stdev_id,
+				  &hwpt_id, NULL))
 		return -1;
 
 	if (_test_ioctl_ioas_map(self->fd, ioas_id, buffer, 262144, &iova,
@@ -326,7 +327,8 @@ TEST_FAIL_NTH(basic_fail_nth, map_domain)
 	if (_test_ioctl_destroy(self->fd, stdev_id))
 		return -1;
 
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id, NULL))
+	if (_test_cmd_mock_domain(self->fd, ioas_id, 0,
+				  &stdev_id, &hwpt_id, NULL))
 		return -1;
 	return 0;
 }
@@ -350,13 +352,14 @@ TEST_FAIL_NTH(basic_fail_nth, map_two_domains)
 	if (_test_ioctl_set_temp_memory_limit(self->fd, 32))
 		return -1;
 
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id, NULL))
+	if (_test_cmd_mock_domain(self->fd, ioas_id, 0,
+				  &stdev_id, &hwpt_id, NULL))
 		return -1;
 
 	fail_nth_enable();
 
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id2, &hwpt_id2,
-				  NULL))
+	if (_test_cmd_mock_domain(self->fd, ioas_id, 0,
+				  &stdev_id2, &hwpt_id2, NULL))
 		return -1;
 
 	if (_test_ioctl_ioas_map(self->fd, ioas_id, buffer, 262144, &iova,
@@ -370,9 +373,9 @@ TEST_FAIL_NTH(basic_fail_nth, map_two_domains)
 	if (_test_ioctl_destroy(self->fd, stdev_id2))
 		return -1;
 
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id, NULL))
+	if (_test_cmd_mock_domain(self->fd, ioas_id, 0, &stdev_id, &hwpt_id, NULL))
 		return -1;
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id2, &hwpt_id2,
+	if (_test_cmd_mock_domain(self->fd, ioas_id, 0, &stdev_id2, &hwpt_id2,
 				  NULL))
 		return -1;
 	return 0;
@@ -530,7 +533,8 @@ TEST_FAIL_NTH(basic_fail_nth, access_pin_domain)
 	if (_test_ioctl_set_temp_memory_limit(self->fd, 32))
 		return -1;
 
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id, NULL))
+	if (_test_cmd_mock_domain(self->fd, ioas_id, 0,
+				  &stdev_id, &hwpt_id, NULL))
 		return -1;
 
 	if (_test_ioctl_ioas_map(self->fd, ioas_id, buffer, BUFFER_SIZE, &iova,
@@ -607,11 +611,11 @@ TEST_FAIL_NTH(basic_fail_nth, device)
 
 	fail_nth_enable();
 
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, NULL,
+	if (_test_cmd_mock_domain(self->fd, ioas_id, 0, &stdev_id, NULL,
 				  &idev_id))
 		return -1;
 
-	if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, &hwpt_id))
+	if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, &hwpt_id))
 		return -1;
 
 	if (_test_cmd_mock_domain_replace(self->fd, stdev_id, ioas_id2, NULL))
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 53b4d3f2d9fc..04871bcfd34b 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -40,7 +40,8 @@ static unsigned long PAGE_SIZE;
 				&test_cmd));                                  \
 	})
 
-static int _test_cmd_mock_domain(int fd, unsigned int ioas_id, __u32 *stdev_id,
+static int _test_cmd_mock_domain(int fd, unsigned int ioas_id,
+				 __u32 stdev_flags, __u32 *stdev_id,
 				 __u32 *hwpt_id, __u32 *idev_id)
 {
 	struct iommu_test_cmd cmd = {
@@ -64,10 +65,13 @@ static int _test_cmd_mock_domain(int fd, unsigned int ioas_id, __u32 *stdev_id,
 	return 0;
 }
 #define test_cmd_mock_domain(ioas_id, stdev_id, hwpt_id, idev_id)       \
-	ASSERT_EQ(0, _test_cmd_mock_domain(self->fd, ioas_id, stdev_id, \
-					   hwpt_id, idev_id))
-#define test_err_mock_domain(_errno, ioas_id, stdev_id, hwpt_id)      \
-	EXPECT_ERRNO(_errno, _test_cmd_mock_domain(self->fd, ioas_id, \
+	ASSERT_EQ(0, _test_cmd_mock_domain(self->fd, ioas_id, 0,	\
+					   stdev_id, hwpt_id, idev_id))
+#define test_err_mock_domain(_errno, ioas_id, stdev_id, hwpt_id)         \
+	EXPECT_ERRNO(_errno, _test_cmd_mock_domain(self->fd, ioas_id, 0, \
+						   stdev_id, hwpt_id, NULL))
+#define test_err_mock_domain_flags(_errno, ioas_id, flags, stdev_id, hwpt_id) \
+	EXPECT_ERRNO(_errno, _test_cmd_mock_domain(self->fd, ioas_id, flags,  \
 						   stdev_id, hwpt_id, NULL))
 
 static int _test_cmd_mock_domain_replace(int fd, __u32 stdev_id, __u32 pt_id,
@@ -99,10 +103,11 @@ static int _test_cmd_mock_domain_replace(int fd, __u32 stdev_id, __u32 pt_id,
 							   pt_id, NULL))
 
 static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
-					 __u32 *hwpt_id)
+				__u32 flags, __u32 *hwpt_id)
 {
 	struct iommu_hwpt_alloc cmd = {
 		.size = sizeof(cmd),
+		.flags = flags,
 		.dev_id = device_id,
 		.pt_id = pt_id,
 	};
@@ -116,8 +121,9 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
 	return 0;
 }
 
-#define test_cmd_hwpt_alloc(device_id, pt_id, hwpt_id) \
-	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, hwpt_id))
+#define test_cmd_hwpt_alloc(device_id, pt_id, flags, hwpt_id) \
+	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \
+					  hwpt_id))
 
 static int _test_cmd_create_access(int fd, unsigned int ioas_id,
 				   __u32 *access_id, unsigned int flags)
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH RFCv2 07/24] iommufd/selftest: Test IOMMU_HWPT_ALLOC_ENFORCE_DIRTY
  2023-05-18 20:46 [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking Joao Martins
                   ` (5 preceding siblings ...)
  2023-05-18 20:46 ` [PATCH RFCv2 06/24] iommufd/selftest: Add a flags to _test_cmd_{hwpt_alloc,mock_domain} Joao Martins
@ 2023-05-18 20:46 ` Joao Martins
  2023-05-19 13:35   ` Jason Gunthorpe
  2023-05-19 13:55   ` Joao Martins
  2023-05-18 20:46 ` [PATCH RFCv2 08/24] iommufd: Dirty tracking data support Joao Martins
                   ` (16 subsequent siblings)
  23 siblings, 2 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-18 20:46 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm, Joao Martins

In order to selftest the iommu domain dirty enforcing we implement the
mock_domain necessary support and add a new dev_flags to test that the
attach_device fails as expected.

Expand the existing mock_domain fixture with a enforce_dirty test that
exercises the hwpt_alloc and device attachment.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/iommufd/iommufd_test.h          |  5 +++
 drivers/iommu/iommufd/selftest.c              | 16 +++++++-
 tools/testing/selftests/iommu/Makefile        |  3 ++
 tools/testing/selftests/iommu/iommufd.c       | 39 +++++++++++++++++++
 tools/testing/selftests/iommu/iommufd_utils.h |  2 +-
 5 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index dd9168a20ddf..9abcc3231137 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -39,6 +39,10 @@ enum {
 	MOCK_FLAGS_ACCESS_CREATE_NEEDS_PIN_PAGES = 1 << 0,
 };
 
+enum {
+	MOCK_FLAGS_DEVICE_NO_DIRTY = 1 << 0,
+};
+
 struct iommu_test_cmd {
 	__u32 size;
 	__u32 op;
@@ -50,6 +54,7 @@ struct iommu_test_cmd {
 			__aligned_u64 length;
 		} add_reserved;
 		struct {
+			__u32 dev_flags;
 			__u32 out_stdev_id;
 			__u32 out_hwpt_id;
 			/* out_idev_id is the standard iommufd_bind object */
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 9d43334e4faf..65daceb6e0dc 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -93,6 +93,7 @@ enum selftest_obj_type {
 
 struct mock_dev {
 	struct device dev;
+	unsigned long flags;
 };
 
 struct selftest_obj {
@@ -115,6 +116,12 @@ static void mock_domain_blocking_free(struct iommu_domain *domain)
 static int mock_domain_nop_attach(struct iommu_domain *domain,
 				  struct device *dev)
 {
+	struct mock_dev *mdev = container_of(dev, struct mock_dev, dev);
+
+	if ((domain->flags & IOMMU_DOMAIN_F_ENFORCE_DIRTY) &&
+	    (mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY))
+		return -EINVAL;
+
 	return 0;
 }
 
@@ -278,6 +285,7 @@ static void mock_domain_set_plaform_dma_ops(struct device *dev)
 
 static const struct iommu_ops mock_ops = {
 	.owner = THIS_MODULE,
+	.supported_flags = IOMMU_DOMAIN_F_ENFORCE_DIRTY,
 	.pgsize_bitmap = MOCK_IO_PAGE_SIZE,
 	.domain_alloc = mock_domain_alloc,
 	.capable = mock_domain_capable,
@@ -328,18 +336,22 @@ static void mock_dev_release(struct device *dev)
 	kfree(mdev);
 }
 
-static struct mock_dev *mock_dev_create(void)
+static struct mock_dev *mock_dev_create(unsigned long dev_flags)
 {
 	struct iommu_group *iommu_group;
 	struct dev_iommu *dev_iommu;
 	struct mock_dev *mdev;
 	int rc;
 
+	if (dev_flags & ~(MOCK_FLAGS_DEVICE_NO_DIRTY))
+		return ERR_PTR(-EINVAL);
+
 	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
 	if (!mdev)
 		return ERR_PTR(-ENOMEM);
 
 	device_initialize(&mdev->dev);
+	mdev->flags = dev_flags;
 	mdev->dev.release = mock_dev_release;
 	mdev->dev.bus = &iommufd_mock_bus_type;
 
@@ -422,7 +434,7 @@ static int iommufd_test_mock_domain(struct iommufd_ucmd *ucmd,
 	sobj->idev.ictx = ucmd->ictx;
 	sobj->type = TYPE_IDEV;
 
-	sobj->idev.mock_dev = mock_dev_create();
+	sobj->idev.mock_dev = mock_dev_create(cmd->mock_domain.dev_flags);
 	if (IS_ERR(sobj->idev.mock_dev)) {
 		rc = PTR_ERR(sobj->idev.mock_dev);
 		goto out_sobj;
diff --git a/tools/testing/selftests/iommu/Makefile b/tools/testing/selftests/iommu/Makefile
index 32c5fdfd0eef..f1aee4e5ec2e 100644
--- a/tools/testing/selftests/iommu/Makefile
+++ b/tools/testing/selftests/iommu/Makefile
@@ -1,5 +1,8 @@
 # SPDX-License-Identifier: GPL-2.0-only
 CFLAGS += -Wall -O2 -Wno-unused-function
+CFLAGS += -I../../../../tools/include/
+CFLAGS += -I../../../../include/uapi/
+CFLAGS += -I../../../../include/
 CFLAGS += $(KHDR_INCLUDES)
 
 CFLAGS += -D_GNU_SOURCE
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 771e4a40200f..da7d1dad1816 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -1354,6 +1354,45 @@ TEST_F(iommufd_mock_domain, alloc_hwpt)
 	}
 }
 
+FIXTURE(iommufd_dirty_tracking)
+{
+	int fd;
+	uint32_t ioas_id;
+	uint32_t hwpt_id;
+	uint32_t stdev_id;
+	uint32_t idev_id;
+};
+
+FIXTURE_SETUP(iommufd_dirty_tracking)
+{
+	self->fd = open("/dev/iommu", O_RDWR);
+	ASSERT_NE(-1, self->fd);
+
+	test_ioctl_ioas_alloc(&self->ioas_id);
+	test_cmd_mock_domain(self->ioas_id, &self->stdev_id,
+			     &self->hwpt_id, &self->idev_id);
+}
+
+FIXTURE_TEARDOWN(iommufd_dirty_tracking)
+{
+	teardown_iommufd(self->fd, _metadata);
+}
+
+TEST_F(iommufd_dirty_tracking, enforce_dirty)
+{
+	uint32_t dev_flags = MOCK_FLAGS_DEVICE_NO_DIRTY;
+	uint32_t stddev_id;
+	uint32_t hwpt_id;
+
+	test_cmd_hwpt_alloc(self->idev_id, self->ioas_id,
+			    IOMMU_HWPT_ALLOC_ENFORCE_DIRTY, &hwpt_id);
+	test_cmd_mock_domain(hwpt_id, &stddev_id, NULL, NULL);
+	test_err_mock_domain_flags(EINVAL, hwpt_id, dev_flags,
+				   &stddev_id, NULL);
+	test_ioctl_destroy(stddev_id);
+	test_ioctl_destroy(hwpt_id);
+}
+
 /* VFIO compatibility IOCTLs */
 
 TEST_F(iommufd, simple_ioctls)
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 04871bcfd34b..f8c926f96f23 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -48,7 +48,7 @@ static int _test_cmd_mock_domain(int fd, unsigned int ioas_id,
 		.size = sizeof(cmd),
 		.op = IOMMU_TEST_OP_MOCK_DOMAIN,
 		.id = ioas_id,
-		.mock_domain = {},
+		.mock_domain = { .dev_flags = stdev_flags },
 	};
 	int ret;
 
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH RFCv2 08/24] iommufd: Dirty tracking data support
  2023-05-18 20:46 [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking Joao Martins
                   ` (6 preceding siblings ...)
  2023-05-18 20:46 ` [PATCH RFCv2 07/24] iommufd/selftest: Test IOMMU_HWPT_ALLOC_ENFORCE_DIRTY Joao Martins
@ 2023-05-18 20:46 ` Joao Martins
  2023-05-18 20:46 ` [PATCH RFCv2 09/24] iommufd: Add IOMMU_HWPT_SET_DIRTY Joao Martins
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-18 20:46 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm, Joao Martins

Add an IO pagetable API iopt_read_and_clear_dirty_data() that performs the
reading of dirty IOPTEs for a given IOVA range and then copying back to
userspace bitmap.

Underneath it uses the IOMMU domain kernel API which will read the dirty
bits, as well as atomically clearing the IOPTE dirty bit and flushing the
IOTLB at the end. The IOVA bitmaps usage takes care of the iteration of the
bitmaps user pages efficiently and without copies.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/iommufd/io_pagetable.c    | 70 +++++++++++++++++++++++++
 drivers/iommu/iommufd/iommufd_private.h | 14 +++++
 2 files changed, 84 insertions(+)

diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index 21052f64f956..187626e5f2bc 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -15,6 +15,7 @@
 #include <linux/err.h>
 #include <linux/slab.h>
 #include <linux/errno.h>
+#include <uapi/linux/iommufd.h>
 
 #include "io_pagetable.h"
 #include "double_span.h"
@@ -412,6 +413,75 @@ int iopt_map_user_pages(struct iommufd_ctx *ictx, struct io_pagetable *iopt,
 	return 0;
 }
 
+struct iova_bitmap_fn_arg {
+	struct iommu_domain *domain;
+	struct iommu_dirty_bitmap *dirty;
+};
+
+static int __iommu_read_and_clear_dirty(struct iova_bitmap *bitmap,
+					unsigned long iova, size_t length,
+					void *opaque)
+{
+	struct iova_bitmap_fn_arg *arg = opaque;
+	struct iommu_domain *domain = arg->domain;
+	const struct iommu_domain_ops *ops = domain->ops;
+	struct iommu_dirty_bitmap *dirty = arg->dirty;
+
+	return ops->read_and_clear_dirty(domain, iova, length, 0, dirty);
+}
+
+static int iommu_read_and_clear_dirty(struct iommu_domain *domain,
+				      unsigned long flags,
+				      struct iommufd_dirty_data *bitmap)
+{
+	const struct iommu_domain_ops *ops = domain->ops;
+	struct iommu_iotlb_gather gather;
+	struct iommu_dirty_bitmap dirty;
+	struct iova_bitmap_fn_arg arg;
+	struct iova_bitmap *iter;
+	int ret = 0;
+
+	if (!ops || !ops->read_and_clear_dirty)
+		return -EOPNOTSUPP;
+
+	iter = iova_bitmap_alloc(bitmap->iova, bitmap->length,
+			     bitmap->page_size, bitmap->data);
+	if (IS_ERR(iter))
+		return -ENOMEM;
+
+	iommu_dirty_bitmap_init(&dirty, iter, &gather);
+
+	arg.domain = domain;
+	arg.dirty = &dirty;
+	iova_bitmap_for_each(iter, &arg, __iommu_read_and_clear_dirty);
+
+	iommu_iotlb_sync(domain, &gather);
+	iova_bitmap_free(iter);
+
+	return ret;
+}
+
+int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
+				   struct iommu_domain *domain,
+				   unsigned long flags,
+				   struct iommufd_dirty_data *bitmap)
+{
+	unsigned long last_iova, iova = bitmap->iova;
+	unsigned long length = bitmap->length;
+	int ret = -EOPNOTSUPP;
+
+	if ((iova & (iopt->iova_alignment - 1)))
+		return -EINVAL;
+
+	if (check_add_overflow(iova, length - 1, &last_iova))
+		return -EOVERFLOW;
+
+	down_read(&iopt->iova_rwsem);
+	ret = iommu_read_and_clear_dirty(domain, flags, bitmap);
+	up_read(&iopt->iova_rwsem);
+	return ret;
+}
+
 int iopt_get_pages(struct io_pagetable *iopt, unsigned long iova,
 		   unsigned long length, struct list_head *pages_list)
 {
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 2552eb44d83a..2259b15340e4 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -8,6 +8,8 @@
 #include <linux/xarray.h>
 #include <linux/refcount.h>
 #include <linux/uaccess.h>
+#include <linux/iommu.h>
+#include <linux/iova_bitmap.h>
 
 struct iommu_domain;
 struct iommu_group;
@@ -70,6 +72,18 @@ int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova,
 		    unsigned long length, unsigned long *unmapped);
 int iopt_unmap_all(struct io_pagetable *iopt, unsigned long *unmapped);
 
+struct iommufd_dirty_data {
+	unsigned long iova;
+	unsigned long length;
+	unsigned long page_size;
+	unsigned long long *data;
+};
+
+int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
+				   struct iommu_domain *domain,
+				   unsigned long flags,
+				   struct iommufd_dirty_data *bitmap);
+
 void iommufd_access_notify_unmap(struct io_pagetable *iopt, unsigned long iova,
 				 unsigned long length);
 int iopt_table_add_domain(struct io_pagetable *iopt,
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH RFCv2 09/24] iommufd: Add IOMMU_HWPT_SET_DIRTY
  2023-05-18 20:46 [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking Joao Martins
                   ` (7 preceding siblings ...)
  2023-05-18 20:46 ` [PATCH RFCv2 08/24] iommufd: Dirty tracking data support Joao Martins
@ 2023-05-18 20:46 ` Joao Martins
  2023-05-19 13:49   ` Jason Gunthorpe
  2023-05-18 20:46 ` [PATCH RFCv2 10/24] iommufd/selftest: Test IOMMU_HWPT_SET_DIRTY Joao Martins
                   ` (14 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: Joao Martins @ 2023-05-18 20:46 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm, Joao Martins

Every IOMMU driver should be able to implement the needed iommu domain ops
to control dirty tracking.

Connect a hw_pagetable to the IOMMU core dirty tracking ops, specifically
the ability to enable/disable dirty tracking on an IOMMU domain
(hw_pagetable id). To that end add an io_pagetable kernel API to toggle
dirty tracking:

* iopt_set_dirty_tracking(iopt, [domain], state)

The intended caller of this is via the hw_pagetable object that is created.

Internally we will ensure the leftover dirty state is cleared /right
before/ we start dirty tracking. This is also useful for iommu drivers
which may decide that dirty tracking is always-enabled without being
able to disable it via iommu domain op.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/iommufd/hw_pagetable.c    | 24 +++++++++++++++++
 drivers/iommu/iommufd/io_pagetable.c    | 36 +++++++++++++++++++++++++
 drivers/iommu/iommufd/iommufd_private.h | 11 ++++++++
 drivers/iommu/iommufd/main.c            |  3 +++
 include/uapi/linux/iommufd.h            | 27 +++++++++++++++++++
 5 files changed, 101 insertions(+)

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 4f0b72737ae2..7acbd88d05b7 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -200,3 +200,27 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 	iommufd_put_object(&idev->obj);
 	return rc;
 }
+
+int iommufd_hwpt_set_dirty(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_hwpt_set_dirty *cmd = ucmd->cmd;
+	struct iommufd_hw_pagetable *hwpt;
+	struct iommufd_ioas *ioas;
+	int rc = -EOPNOTSUPP;
+	bool enable;
+
+	hwpt = iommufd_get_hwpt(ucmd, cmd->hwpt_id);
+	if (IS_ERR(hwpt))
+		return PTR_ERR(hwpt);
+
+	if (!hwpt->enforce_dirty)
+		return -EOPNOTSUPP;
+
+	ioas = hwpt->ioas;
+	enable = cmd->flags & IOMMU_DIRTY_TRACKING_ENABLED;
+
+	rc = iopt_set_dirty_tracking(&ioas->iopt, hwpt->domain, enable);
+
+	iommufd_put_object(&hwpt->obj);
+	return rc;
+}
diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index 187626e5f2bc..01adb0f7e4d0 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -479,6 +479,42 @@ int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
 	down_read(&iopt->iova_rwsem);
 	ret = iommu_read_and_clear_dirty(domain, flags, bitmap);
 	up_read(&iopt->iova_rwsem);
+
+	return ret;
+}
+
+int iopt_set_dirty_tracking(struct io_pagetable *iopt,
+			    struct iommu_domain *domain, bool enable)
+{
+	const struct iommu_domain_ops *ops = domain->ops;
+	struct iommu_dirty_bitmap dirty;
+	struct iommu_iotlb_gather gather;
+	struct iopt_area *area;
+	int ret = 0;
+
+	if (!ops->set_dirty_tracking)
+		return -EOPNOTSUPP;
+
+	iommu_dirty_bitmap_init(&dirty, NULL, &gather);
+
+	down_write(&iopt->iova_rwsem);
+	for (area = iopt_area_iter_first(iopt, 0, ULONG_MAX);
+	     area && enable;
+	     area = iopt_area_iter_next(area, 0, ULONG_MAX)) {
+		ret = ops->read_and_clear_dirty(domain,
+						iopt_area_iova(area),
+						iopt_area_length(area), 0,
+						&dirty);
+		if (ret)
+			goto out_unlock;
+	}
+
+	iommu_iotlb_sync(domain, &gather);
+
+	ret = ops->set_dirty_tracking(domain, enable);
+
+out_unlock:
+	up_write(&iopt->iova_rwsem);
 	return ret;
 }
 
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 2259b15340e4..e902197a6a42 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -10,6 +10,7 @@
 #include <linux/uaccess.h>
 #include <linux/iommu.h>
 #include <linux/iova_bitmap.h>
+#include <uapi/linux/iommufd.h>
 
 struct iommu_domain;
 struct iommu_group;
@@ -83,6 +84,8 @@ int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
 				   struct iommu_domain *domain,
 				   unsigned long flags,
 				   struct iommufd_dirty_data *bitmap);
+int iopt_set_dirty_tracking(struct io_pagetable *iopt,
+			    struct iommu_domain *domain, bool enable);
 
 void iommufd_access_notify_unmap(struct io_pagetable *iopt, unsigned long iova,
 				 unsigned long length);
@@ -267,6 +270,14 @@ struct iommufd_hw_pagetable {
 	struct list_head hwpt_item;
 };
 
+static inline struct iommufd_hw_pagetable *iommufd_get_hwpt(
+					struct iommufd_ucmd *ucmd, u32 id)
+{
+	return container_of(iommufd_get_object(ucmd->ictx, id,
+					       IOMMUFD_OBJ_HW_PAGETABLE),
+			    struct iommufd_hw_pagetable, obj);
+}
+int iommufd_hwpt_set_dirty(struct iommufd_ucmd *ucmd);
 struct iommufd_hw_pagetable *
 iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 			   struct iommufd_device *idev, bool immediate_attach,
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 3932fe26522b..8c4640df0547 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -277,6 +277,7 @@ union ucmd_buffer {
 	struct iommu_ioas_unmap unmap;
 	struct iommu_option option;
 	struct iommu_vfio_ioas vfio_ioas;
+	struct iommu_hwpt_set_dirty set_dirty;
 #ifdef CONFIG_IOMMUFD_TEST
 	struct iommu_test_cmd test;
 #endif
@@ -318,6 +319,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 		 val64),
 	IOCTL_OP(IOMMU_VFIO_IOAS, iommufd_vfio_ioas, struct iommu_vfio_ioas,
 		 __reserved),
+	IOCTL_OP(IOMMU_HWPT_SET_DIRTY, iommufd_hwpt_set_dirty,
+		 struct iommu_hwpt_set_dirty, __reserved),
 #ifdef CONFIG_IOMMUFD_TEST
 	IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last),
 #endif
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 1cd9c54d0f64..85498f14b3ae 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -46,6 +46,7 @@ enum {
 	IOMMUFD_CMD_OPTION,
 	IOMMUFD_CMD_VFIO_IOAS,
 	IOMMUFD_CMD_HWPT_ALLOC,
+	IOMMUFD_CMD_HWPT_SET_DIRTY,
 };
 
 /**
@@ -379,4 +380,30 @@ struct iommu_hwpt_alloc {
 	__u32 __reserved;
 };
 #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)
+
+/**
+ * enum iommufd_set_dirty_flags - Flags for steering dirty tracking
+ * @IOMMU_DIRTY_TRACKING_DISABLED: Disables dirty tracking
+ * @IOMMU_DIRTY_TRACKING_ENABLED: Enables dirty tracking
+ */
+enum iommufd_hwpt_set_dirty_flags {
+	IOMMU_DIRTY_TRACKING_DISABLED = 0,
+	IOMMU_DIRTY_TRACKING_ENABLED = 1,
+};
+
+/**
+ * struct iommu_hwpt_set_dirty - ioctl(IOMMU_HWPT_SET_DIRTY)
+ * @size: sizeof(struct iommu_hwpt_set_dirty)
+ * @flags: Flags to control dirty tracking status.
+ * @hwpt_id: HW pagetable ID that represents the IOMMU domain.
+ *
+ * Toggle dirty tracking on an HW pagetable.
+ */
+struct iommu_hwpt_set_dirty {
+	__u32 size;
+	__u32 flags;
+	__u32 hwpt_id;
+	__u32 __reserved;
+};
+#define IOMMU_HWPT_SET_DIRTY _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_SET_DIRTY)
 #endif
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH RFCv2 10/24] iommufd/selftest: Test IOMMU_HWPT_SET_DIRTY
  2023-05-18 20:46 [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking Joao Martins
                   ` (8 preceding siblings ...)
  2023-05-18 20:46 ` [PATCH RFCv2 09/24] iommufd: Add IOMMU_HWPT_SET_DIRTY Joao Martins
@ 2023-05-18 20:46 ` Joao Martins
  2023-05-18 20:46 ` [PATCH RFCv2 11/24] iommufd: Add IOMMU_HWPT_GET_DIRTY_IOVA Joao Martins
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-18 20:46 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm, Joao Martins

Change mock_domain to supporting dirty tracking and add tests to exercise
the new SET_DIRTY API in the mock_domain selftest fixture.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/iommufd/selftest.c              | 21 +++++++++++++++++++
 tools/testing/selftests/iommu/iommufd.c       | 15 +++++++++++++
 tools/testing/selftests/iommu/iommufd_utils.h | 18 ++++++++++++++++
 3 files changed, 54 insertions(+)

diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 65daceb6e0dc..ee7523c8d46a 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -21,6 +21,7 @@ static struct dentry *dbgfs_root;
 size_t iommufd_test_memory_limit = 65536;
 
 enum {
+	MOCK_DIRTY_TRACK = 1,
 	MOCK_IO_PAGE_SIZE = PAGE_SIZE / 2,
 
 	/*
@@ -83,6 +84,7 @@ void iommufd_test_syz_conv_iova_id(struct iommufd_ucmd *ucmd,
 }
 
 struct mock_iommu_domain {
+	unsigned long flags;
 	struct iommu_domain domain;
 	struct xarray pfns;
 };
@@ -283,6 +285,24 @@ static void mock_domain_set_plaform_dma_ops(struct device *dev)
 	 */
 }
 
+static int mock_domain_set_dirty_tracking(struct iommu_domain *domain,
+					  bool enable)
+{
+	struct mock_iommu_domain *mock =
+		container_of(domain, struct mock_iommu_domain, domain);
+	unsigned long flags = mock->flags;
+
+	/* No change? */
+	if (!(enable ^ !!(flags & MOCK_DIRTY_TRACK)))
+		return -EINVAL;
+
+	flags = (enable ?
+		 flags | MOCK_DIRTY_TRACK : flags & ~MOCK_DIRTY_TRACK);
+
+	mock->flags = flags;
+	return 0;
+}
+
 static const struct iommu_ops mock_ops = {
 	.owner = THIS_MODULE,
 	.supported_flags = IOMMU_DOMAIN_F_ENFORCE_DIRTY,
@@ -297,6 +317,7 @@ static const struct iommu_ops mock_ops = {
 			.map_pages = mock_domain_map_pages,
 			.unmap_pages = mock_domain_unmap_pages,
 			.iova_to_phys = mock_domain_iova_to_phys,
+			.set_dirty_tracking = mock_domain_set_dirty_tracking,
 		},
 };
 
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index da7d1dad1816..8adccdde5ecc 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -1393,6 +1393,21 @@ TEST_F(iommufd_dirty_tracking, enforce_dirty)
 	test_ioctl_destroy(hwpt_id);
 }
 
+TEST_F(iommufd_dirty_tracking, set_dirty)
+{
+	uint32_t stddev_id;
+	uint32_t hwpt_id;
+
+	test_cmd_hwpt_alloc(self->idev_id, self->ioas_id,
+			    IOMMU_HWPT_ALLOC_ENFORCE_DIRTY, &hwpt_id);
+	test_cmd_mock_domain(hwpt_id, &stddev_id, NULL, NULL);
+	test_cmd_set_dirty(hwpt_id, true);
+	test_cmd_set_dirty(hwpt_id, false);
+
+	test_ioctl_destroy(stddev_id);
+	test_ioctl_destroy(hwpt_id);
+}
+
 /* VFIO compatibility IOCTLs */
 
 TEST_F(iommufd, simple_ioctls)
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index f8c926f96f23..3629c531ec9f 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -125,6 +125,24 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
 	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \
 					  hwpt_id))
 
+static int _test_cmd_set_dirty(int fd, __u32 hwpt_id, bool enabled)
+{
+	struct iommu_hwpt_set_dirty cmd = {
+		.size = sizeof(cmd),
+		.flags = enabled ? IOMMU_DIRTY_TRACKING_ENABLED :
+				   IOMMU_DIRTY_TRACKING_DISABLED,
+		.hwpt_id = hwpt_id,
+	};
+	int ret;
+
+	ret = ioctl(fd, IOMMU_HWPT_SET_DIRTY, &cmd);
+	if (ret)
+		return ret;
+	return 0;
+}
+
+#define test_cmd_set_dirty(hwpt_id, enabled) \
+	ASSERT_EQ(0, _test_cmd_set_dirty(self->fd, hwpt_id, enabled))
 static int _test_cmd_create_access(int fd, unsigned int ioas_id,
 				   __u32 *access_id, unsigned int flags)
 {
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH RFCv2 11/24] iommufd: Add IOMMU_HWPT_GET_DIRTY_IOVA
  2023-05-18 20:46 [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking Joao Martins
                   ` (9 preceding siblings ...)
  2023-05-18 20:46 ` [PATCH RFCv2 10/24] iommufd/selftest: Test IOMMU_HWPT_SET_DIRTY Joao Martins
@ 2023-05-18 20:46 ` Joao Martins
  2023-05-18 20:46 ` [PATCH RFCv2 12/24] iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_IOVA Joao Martins
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-18 20:46 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm, Joao Martins

Connect a hw_pagetable to the IOMMU core dirty tracking
read_and_clear_dirty iommu domain op. It exposes all of the functionality
for the UAPI that read the dirtied IOVAs while clearing the Dirty bits from
the PTEs

In doing so the previously internal iommufd_dirty_data structure is moved
over as the UAPI intermediate structure for representing iommufd dirty
bitmaps.

Contrary to past incantation of a similar interface in VFIO the IOVA range
to be scanned is tied in to the bitmap size, thus the application needs to
pass a appropriately sized bitmap address taking into account the iova
range being passed *and* page size ... as opposed to allowing bitmap-iova !=
iova.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/iommufd/hw_pagetable.c    | 58 +++++++++++++++++++++++++
 drivers/iommu/iommufd/iommufd_private.h | 11 ++---
 drivers/iommu/iommufd/main.c            |  3 ++
 include/uapi/linux/iommufd.h            | 36 +++++++++++++++
 4 files changed, 101 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 7acbd88d05b7..25860aa0a1f8 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -224,3 +224,61 @@ int iommufd_hwpt_set_dirty(struct iommufd_ucmd *ucmd)
 	iommufd_put_object(&hwpt->obj);
 	return rc;
 }
+
+int iommufd_check_iova_range(struct iommufd_ioas *ioas,
+			     struct iommufd_dirty_data *bitmap)
+{
+	unsigned long pgshift, npages;
+	size_t iommu_pgsize;
+	int rc = -EINVAL;
+
+	pgshift = __ffs(bitmap->page_size);
+	npages = bitmap->length >> pgshift;
+
+	if (!npages || (npages > ULONG_MAX))
+		return rc;
+
+	iommu_pgsize = 1 << __ffs(ioas->iopt.iova_alignment);
+
+	/* allow only smallest supported pgsize */
+	if (bitmap->page_size != iommu_pgsize)
+		return rc;
+
+	if (bitmap->iova & (iommu_pgsize - 1))
+		return rc;
+
+	if (!bitmap->length || bitmap->length & (iommu_pgsize - 1))
+		return rc;
+
+	return 0;
+}
+
+int iommufd_hwpt_get_dirty_iova(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_hwpt_get_dirty_iova *cmd = ucmd->cmd;
+	struct iommufd_hw_pagetable *hwpt;
+	struct iommufd_ioas *ioas;
+	int rc = -EOPNOTSUPP;
+
+	if ((cmd->flags || cmd->__reserved))
+		return -EOPNOTSUPP;
+
+	hwpt = iommufd_get_hwpt(ucmd, cmd->hwpt_id);
+	if (IS_ERR(hwpt))
+		return PTR_ERR(hwpt);
+
+	if (!hwpt->enforce_dirty)
+		return -EOPNOTSUPP;
+
+	ioas = hwpt->ioas;
+	rc = iommufd_check_iova_range(ioas, &cmd->bitmap);
+	if (rc)
+		goto out_put;
+
+	rc = iopt_read_and_clear_dirty_data(&ioas->iopt, hwpt->domain,
+					    cmd->flags, &cmd->bitmap);
+
+out_put:
+	iommufd_put_object(&hwpt->obj);
+	return rc;
+}
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index e902197a6a42..3de8046fee07 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -73,13 +73,6 @@ int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova,
 		    unsigned long length, unsigned long *unmapped);
 int iopt_unmap_all(struct io_pagetable *iopt, unsigned long *unmapped);
 
-struct iommufd_dirty_data {
-	unsigned long iova;
-	unsigned long length;
-	unsigned long page_size;
-	unsigned long long *data;
-};
-
 int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
 				   struct iommu_domain *domain,
 				   unsigned long flags,
@@ -251,6 +244,8 @@ int iommufd_option_rlimit_mode(struct iommu_option *cmd,
 			       struct iommufd_ctx *ictx);
 
 int iommufd_vfio_ioas(struct iommufd_ucmd *ucmd);
+int iommufd_check_iova_range(struct iommufd_ioas *ioas,
+			     struct iommufd_dirty_data *bitmap);
 
 /*
  * A HW pagetable is called an iommu_domain inside the kernel. This user object
@@ -278,6 +273,8 @@ static inline struct iommufd_hw_pagetable *iommufd_get_hwpt(
 			    struct iommufd_hw_pagetable, obj);
 }
 int iommufd_hwpt_set_dirty(struct iommufd_ucmd *ucmd);
+int iommufd_hwpt_get_dirty_iova(struct iommufd_ucmd *ucmd);
+
 struct iommufd_hw_pagetable *
 iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 			   struct iommufd_device *idev, bool immediate_attach,
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 8c4640df0547..f34b309a1baf 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -278,6 +278,7 @@ union ucmd_buffer {
 	struct iommu_option option;
 	struct iommu_vfio_ioas vfio_ioas;
 	struct iommu_hwpt_set_dirty set_dirty;
+	struct iommu_hwpt_get_dirty_iova get_dirty_iova;
 #ifdef CONFIG_IOMMUFD_TEST
 	struct iommu_test_cmd test;
 #endif
@@ -321,6 +322,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 		 __reserved),
 	IOCTL_OP(IOMMU_HWPT_SET_DIRTY, iommufd_hwpt_set_dirty,
 		 struct iommu_hwpt_set_dirty, __reserved),
+	IOCTL_OP(IOMMU_HWPT_GET_DIRTY_IOVA, iommufd_hwpt_get_dirty_iova,
+		 struct iommu_hwpt_get_dirty_iova, bitmap.data),
 #ifdef CONFIG_IOMMUFD_TEST
 	IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last),
 #endif
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 85498f14b3ae..44f9ddcfda58 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -47,6 +47,7 @@ enum {
 	IOMMUFD_CMD_VFIO_IOAS,
 	IOMMUFD_CMD_HWPT_ALLOC,
 	IOMMUFD_CMD_HWPT_SET_DIRTY,
+	IOMMUFD_CMD_HWPT_GET_DIRTY_IOVA,
 };
 
 /**
@@ -406,4 +407,39 @@ struct iommu_hwpt_set_dirty {
 	__u32 __reserved;
 };
 #define IOMMU_HWPT_SET_DIRTY _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_SET_DIRTY)
+
+/**
+ * struct iommufd_dirty_bitmap - Dirty IOVA tracking bitmap
+ * @iova: base IOVA of the bitmap
+ * @length: IOVA size
+ * @page_size: page size granularity of each bit in the bitmap
+ * @data: bitmap where to set the dirty bits. The bitmap bits each
+ * represent a page_size which you deviate from an arbitrary iova.
+ * Checking a given IOVA is dirty:
+ *
+ *  data[(iova / page_size) / 64] & (1ULL << (iova % 64))
+ */
+struct iommufd_dirty_data {
+	__aligned_u64 iova;
+	__aligned_u64 length;
+	__aligned_u64 page_size;
+	__aligned_u64 *data;
+};
+
+/**
+ * struct iommu_hwpt_get_dirty_iova - ioctl(IOMMU_HWPT_GET_DIRTY_IOVA)
+ * @size: sizeof(struct iommu_hwpt_get_dirty_iova)
+ * @hwpt_id: HW pagetable ID that represents the IOMMU domain.
+ * @flags: Flags to control dirty tracking status.
+ * @bitmap: Bitmap of the range of IOVA to read out
+ */
+struct iommu_hwpt_get_dirty_iova {
+	__u32 size;
+	__u32 hwpt_id;
+	__u32 flags;
+	__u32 __reserved;
+	struct iommufd_dirty_data bitmap;
+};
+#define IOMMU_HWPT_GET_DIRTY_IOVA _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_GET_DIRTY_IOVA)
+
 #endif
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH RFCv2 12/24] iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_IOVA
  2023-05-18 20:46 [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking Joao Martins
                   ` (10 preceding siblings ...)
  2023-05-18 20:46 ` [PATCH RFCv2 11/24] iommufd: Add IOMMU_HWPT_GET_DIRTY_IOVA Joao Martins
@ 2023-05-18 20:46 ` Joao Martins
  2023-05-18 20:46 ` [PATCH RFCv2 13/24] iommufd: Add IOMMU_DEVICE_GET_CAPS Joao Martins
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-18 20:46 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm, Joao Martins

Add a new test ioctl for simulating the dirty IOVAs in the mock domain, and
implement the mock iommu domain ops that get the dirty tracking supported.

The selftest exercises the usual main workflow of:

1) Setting dirty tracking from the iommu domain
2) Read and clear dirty IOPTEs

Different fixtures will test different IOVA range sizes, that exercise
corner cases of the bitmaps.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/iommufd/iommufd_test.h          |   9 ++
 drivers/iommu/iommufd/selftest.c              |  97 ++++++++++++++-
 tools/testing/selftests/iommu/iommufd.c       |  99 ++++++++++++++++
 tools/testing/selftests/iommu/iommufd_utils.h | 112 ++++++++++++++++++
 4 files changed, 314 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index 9abcc3231137..8c9f0d95ed94 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -18,6 +18,7 @@ enum {
 	IOMMU_TEST_OP_ACCESS_RW,
 	IOMMU_TEST_OP_SET_TEMP_MEMORY_LIMIT,
 	IOMMU_TEST_OP_MOCK_DOMAIN_REPLACE,
+	IOMMU_TEST_OP_DIRTY,
 };
 
 enum {
@@ -96,6 +97,14 @@ struct iommu_test_cmd {
 		struct {
 			__u32 limit;
 		} memory_limit;
+		struct {
+			__u32 flags;
+			__aligned_u64 iova;
+			__aligned_u64 length;
+			__aligned_u64 page_size;
+			__aligned_u64 uptr;
+			__aligned_u64 out_nr_dirty;
+		} dirty;
 	};
 	__u32 last;
 };
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index ee7523c8d46a..3ec0eb4dfe97 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -34,6 +34,7 @@ enum {
 	_MOCK_PFN_START = MOCK_PFN_MASK + 1,
 	MOCK_PFN_START_IOVA = _MOCK_PFN_START,
 	MOCK_PFN_LAST_IOVA = _MOCK_PFN_START,
+	MOCK_PFN_DIRTY_IOVA = _MOCK_PFN_START << 1,
 };
 
 /*
@@ -234,7 +235,7 @@ static size_t mock_domain_unmap_pages(struct iommu_domain *domain,
 
 		for (cur = 0; cur != pgsize; cur += MOCK_IO_PAGE_SIZE) {
 			ent = xa_erase(&mock->pfns, iova / MOCK_IO_PAGE_SIZE);
-			WARN_ON(!ent);
+
 			/*
 			 * iommufd generates unmaps that must be a strict
 			 * superset of the map's performend So every starting
@@ -244,12 +245,12 @@ static size_t mock_domain_unmap_pages(struct iommu_domain *domain,
 			 * passed to map_pages
 			 */
 			if (first) {
-				WARN_ON(!(xa_to_value(ent) &
+				WARN_ON(ent && !(xa_to_value(ent) &
 					  MOCK_PFN_START_IOVA));
 				first = false;
 			}
 			if (pgcount == 1 && cur + MOCK_IO_PAGE_SIZE == pgsize)
-				WARN_ON(!(xa_to_value(ent) &
+				WARN_ON(ent && !(xa_to_value(ent) &
 					  MOCK_PFN_LAST_IOVA));
 
 			iova += MOCK_IO_PAGE_SIZE;
@@ -303,6 +304,39 @@ static int mock_domain_set_dirty_tracking(struct iommu_domain *domain,
 	return 0;
 }
 
+static int mock_domain_read_and_clear_dirty(struct iommu_domain *domain,
+					    unsigned long iova, size_t size,
+					    unsigned long flags,
+					    struct iommu_dirty_bitmap *dirty)
+{
+	struct mock_iommu_domain *mock =
+		container_of(domain, struct mock_iommu_domain, domain);
+	unsigned long i, max = size / MOCK_IO_PAGE_SIZE;
+	void *ent, *old;
+
+	if (!(mock->flags & MOCK_DIRTY_TRACK) && dirty->bitmap)
+		return -EINVAL;
+
+	for (i = 0; i < max; i++) {
+		unsigned long cur = iova + i * MOCK_IO_PAGE_SIZE;
+
+		ent = xa_load(&mock->pfns, cur / MOCK_IO_PAGE_SIZE);
+		if (ent &&
+		    (xa_to_value(ent) & MOCK_PFN_DIRTY_IOVA)) {
+			unsigned long val;
+
+			/* Clear dirty */
+			val = xa_to_value(ent) & ~MOCK_PFN_DIRTY_IOVA;
+			old = xa_store(&mock->pfns, cur / MOCK_IO_PAGE_SIZE,
+				       xa_mk_value(val), GFP_KERNEL);
+			WARN_ON_ONCE(ent != old);
+			iommu_dirty_bitmap_record(dirty, cur, MOCK_IO_PAGE_SIZE);
+		}
+	}
+
+	return 0;
+}
+
 static const struct iommu_ops mock_ops = {
 	.owner = THIS_MODULE,
 	.supported_flags = IOMMU_DOMAIN_F_ENFORCE_DIRTY,
@@ -318,6 +352,7 @@ static const struct iommu_ops mock_ops = {
 			.unmap_pages = mock_domain_unmap_pages,
 			.iova_to_phys = mock_domain_iova_to_phys,
 			.set_dirty_tracking = mock_domain_set_dirty_tracking,
+			.read_and_clear_dirty = mock_domain_read_and_clear_dirty,
 		},
 };
 
@@ -994,6 +1029,56 @@ static_assert((unsigned int)MOCK_ACCESS_RW_WRITE == IOMMUFD_ACCESS_RW_WRITE);
 static_assert((unsigned int)MOCK_ACCESS_RW_SLOW_PATH ==
 	      __IOMMUFD_ACCESS_RW_SLOW_PATH);
 
+static int iommufd_test_dirty(struct iommufd_ucmd *ucmd,
+			      unsigned int mockpt_id, unsigned long iova,
+			      size_t length, unsigned long page_size,
+			      void __user *uptr, u32 flags)
+{
+	unsigned long i, max = length / page_size;
+	struct iommu_test_cmd *cmd = ucmd->cmd;
+	struct iommufd_hw_pagetable *hwpt;
+	struct mock_iommu_domain *mock;
+	int rc, count = 0;
+
+	if (iova % page_size || length % page_size ||
+	    (uintptr_t)uptr % page_size)
+		return -EINVAL;
+
+	hwpt = get_md_pagetable(ucmd, mockpt_id, &mock);
+	if (IS_ERR(hwpt))
+		return PTR_ERR(hwpt);
+
+	if (!(mock->flags & MOCK_DIRTY_TRACK)) {
+		rc = -EINVAL;
+		goto out_put;
+	}
+
+	for (i = 0; i < max; i++) {
+		unsigned long cur = iova + i * page_size;
+		void *ent, *old;
+
+		if (!test_bit(i, (unsigned long *) uptr))
+			continue;
+
+		ent = xa_load(&mock->pfns, cur / page_size);
+		if (ent) {
+			unsigned long val;
+
+			val = xa_to_value(ent) | MOCK_PFN_DIRTY_IOVA;
+			old = xa_store(&mock->pfns, cur / page_size,
+				       xa_mk_value(val), GFP_KERNEL);
+			WARN_ON_ONCE(ent != old);
+			count++;
+		}
+	}
+
+	cmd->dirty.out_nr_dirty = count;
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+out_put:
+	iommufd_put_object(&hwpt->obj);
+	return rc;
+}
+
 void iommufd_selftest_destroy(struct iommufd_object *obj)
 {
 	struct selftest_obj *sobj = container_of(obj, struct selftest_obj, obj);
@@ -1055,6 +1140,12 @@ int iommufd_test(struct iommufd_ucmd *ucmd)
 			return -EINVAL;
 		iommufd_test_memory_limit = cmd->memory_limit.limit;
 		return 0;
+	case IOMMU_TEST_OP_DIRTY:
+		return iommufd_test_dirty(
+			ucmd, cmd->id, cmd->dirty.iova,
+			cmd->dirty.length, cmd->dirty.page_size,
+			u64_to_user_ptr(cmd->dirty.uptr),
+			cmd->dirty.flags);
 	default:
 		return -EOPNOTSUPP;
 	}
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 8adccdde5ecc..818e78cd889a 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -12,6 +12,7 @@
 static unsigned long HUGEPAGE_SIZE;
 
 #define MOCK_PAGE_SIZE (PAGE_SIZE / 2)
+#define BITS_PER_BYTE 8
 
 static unsigned long get_huge_page_size(void)
 {
@@ -1361,13 +1362,47 @@ FIXTURE(iommufd_dirty_tracking)
 	uint32_t hwpt_id;
 	uint32_t stdev_id;
 	uint32_t idev_id;
+	unsigned long page_size;
+	unsigned long bitmap_size;
+	void *bitmap;
+	void *buffer;
+};
+
+FIXTURE_VARIANT(iommufd_dirty_tracking)
+{
+	unsigned long buffer_size;
 };
 
 FIXTURE_SETUP(iommufd_dirty_tracking)
 {
+	void *vrc;
+	int rc;
+
 	self->fd = open("/dev/iommu", O_RDWR);
 	ASSERT_NE(-1, self->fd);
 
+	rc = posix_memalign(&self->buffer, HUGEPAGE_SIZE, variant->buffer_size);
+	if (rc || !self->buffer) {
+		SKIP(return, "Skipping buffer_size=%lu due to errno=%d",
+			     variant->buffer_size, rc);
+	}
+
+	assert((uintptr_t)self->buffer % HUGEPAGE_SIZE == 0);
+	vrc = mmap(self->buffer, variant->buffer_size, PROT_READ | PROT_WRITE,
+                   MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+        assert(vrc == self->buffer);
+
+	self->page_size = MOCK_PAGE_SIZE;
+	self->bitmap_size = variant->buffer_size /
+			     self->page_size / BITS_PER_BYTE;
+
+	/* Provision with an extra (MOCK_PAGE_SIZE) for the unaligned case */
+	rc = posix_memalign(&self->bitmap, PAGE_SIZE,
+			    self->bitmap_size + MOCK_PAGE_SIZE);
+	assert(!rc);
+	assert(self->bitmap);
+	assert((uintptr_t)self->bitmap % PAGE_SIZE == 0);
+
 	test_ioctl_ioas_alloc(&self->ioas_id);
 	test_cmd_mock_domain(self->ioas_id, &self->stdev_id,
 			     &self->hwpt_id, &self->idev_id);
@@ -1375,9 +1410,41 @@ FIXTURE_SETUP(iommufd_dirty_tracking)
 
 FIXTURE_TEARDOWN(iommufd_dirty_tracking)
 {
+	munmap(self->buffer, variant->buffer_size);
+	munmap(self->bitmap, self->bitmap_size);
 	teardown_iommufd(self->fd, _metadata);
 }
 
+FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty128k)
+{
+	/* one u32 index bitmap */
+	.buffer_size = 128UL * 1024UL,
+};
+
+FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty256k)
+{
+	/* one u64 index bitmap */
+	.buffer_size = 256UL * 1024UL,
+};
+
+FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty640k)
+{
+	/* two u64 index and trailing end bitmap */
+	.buffer_size = 640UL * 1024UL,
+};
+
+FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty128M)
+{
+	/* 4K bitmap (128M IOVA range) */
+	.buffer_size = 128UL * 1024UL * 1024UL,
+};
+
+FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty256M)
+{
+	/* 8K bitmap (256M IOVA range) */
+	.buffer_size = 256UL * 1024UL * 1024UL,
+};
+
 TEST_F(iommufd_dirty_tracking, enforce_dirty)
 {
 	uint32_t dev_flags = MOCK_FLAGS_DEVICE_NO_DIRTY;
@@ -1408,6 +1475,38 @@ TEST_F(iommufd_dirty_tracking, set_dirty)
 	test_ioctl_destroy(hwpt_id);
 }
 
+TEST_F(iommufd_dirty_tracking, get_dirty_iova)
+{
+	uint32_t stddev_id;
+	uint32_t hwpt_id;
+	uint32_t ioas_id;
+
+	test_ioctl_ioas_alloc(&ioas_id);
+	test_ioctl_ioas_map_fixed_id(ioas_id, self->buffer,
+				     variant->buffer_size,
+				     MOCK_APERTURE_START);
+
+	test_cmd_hwpt_alloc(self->idev_id, ioas_id,
+			    IOMMU_HWPT_ALLOC_ENFORCE_DIRTY, &hwpt_id);
+	test_cmd_mock_domain(hwpt_id, &stddev_id, NULL, NULL);
+
+	test_cmd_set_dirty(hwpt_id, true);
+
+	test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
+				MOCK_APERTURE_START,
+				self->page_size, self->bitmap,
+				self->bitmap_size, _metadata);
+
+	/* PAGE_SIZE unaligned bitmap */
+	test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
+				MOCK_APERTURE_START,
+				self->page_size, self->bitmap + MOCK_PAGE_SIZE,
+				self->bitmap_size, _metadata);
+
+	test_ioctl_destroy(stddev_id);
+	test_ioctl_destroy(hwpt_id);
+}
+
 /* VFIO compatibility IOCTLs */
 
 TEST_F(iommufd, simple_ioctls)
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 3629c531ec9f..4d428fbb12e2 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -9,6 +9,8 @@
 #include <sys/ioctl.h>
 #include <stdint.h>
 #include <assert.h>
+#include <linux/bitmap.h>
+#include <linux/bitops.h>
 
 #include "../kselftest_harness.h"
 #include "../../../../drivers/iommu/iommufd/iommufd_test.h"
@@ -143,6 +145,105 @@ static int _test_cmd_set_dirty(int fd, __u32 hwpt_id, bool enabled)
 
 #define test_cmd_set_dirty(hwpt_id, enabled) \
 	ASSERT_EQ(0, _test_cmd_set_dirty(self->fd, hwpt_id, enabled))
+
+static int _test_cmd_get_dirty_iova(int fd, __u32 hwpt_id, size_t length,
+				    __u64 iova, size_t page_size, __u64 *bitmap)
+{
+	struct iommu_hwpt_get_dirty_iova cmd = {
+		.size = sizeof(cmd),
+		.hwpt_id = hwpt_id,
+		.bitmap = {
+			.iova = iova,
+			.length = length,
+			.page_size = page_size,
+			.data = bitmap,
+		}
+	};
+	int ret;
+
+	ret = ioctl(fd, IOMMU_HWPT_GET_DIRTY_IOVA, &cmd);
+	if (ret)
+		return ret;
+	return 0;
+}
+
+#define test_cmd_get_dirty_iova(fd, hwpt_id, length, iova, page_size, bitmap) \
+	ASSERT_EQ(0, _test_cmd_get_dirty_iova(fd, hwpt_id, length,            \
+					      iova, page_size, bitmap))
+
+static int _test_cmd_mock_domain_set_dirty(int fd, __u32 hwpt_id, size_t length,
+				           __u64 iova, size_t page_size,
+					   __u64 *bitmap, __u64 *dirty)
+{
+	struct iommu_test_cmd cmd = {
+		.size = sizeof(cmd),
+		.op = IOMMU_TEST_OP_DIRTY,
+		.id = hwpt_id,
+		.dirty = {
+			.iova = iova,
+			.length = length,
+			.page_size = page_size,
+			.uptr = (uintptr_t) bitmap,
+		}
+	};
+	int ret;
+
+	ret = ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_DIRTY), &cmd);
+	if (ret)
+		return -ret;
+	if (dirty)
+		*dirty = cmd.dirty.out_nr_dirty;
+	return 0;
+}
+
+#define test_cmd_mock_domain_set_dirty(fd, hwpt_id, length, iova, page_size, bitmap, nr) \
+	ASSERT_EQ(0, _test_cmd_mock_domain_set_dirty(fd, hwpt_id,            \
+						     length, iova,           \
+						     page_size, bitmap,      \
+						     nr))
+
+static int _test_mock_dirty_bitmaps(int fd, __u32 hwpt_id, size_t length,
+				    __u64 iova, size_t page_size,
+				    __u64 *bitmap, __u64 bitmap_size,
+				    struct __test_metadata *_metadata)
+{
+	unsigned long i, count, nbits = bitmap_size * BITS_PER_BYTE;
+	unsigned long nr = nbits / 2;
+	__u64 out_dirty = 0;
+
+	/* Mark all even bits as dirty in the mock domain */
+	for (count = 0, i = 0; i < nbits; count += !(i%2), i++)
+		if (!(i % 2))
+			__set_bit(i, (unsigned long *) bitmap);
+	ASSERT_EQ(nr, count);
+
+	test_cmd_mock_domain_set_dirty(fd, hwpt_id, length, iova, page_size,
+				       bitmap, &out_dirty);
+	ASSERT_EQ(nr, out_dirty);
+
+	/* Expect all even bits as dirty in the user bitmap */
+	memset(bitmap, 0, bitmap_size);
+	test_cmd_get_dirty_iova(fd, hwpt_id, length, iova, page_size, bitmap);
+	for (count = 0, i = 0; i < nbits; count += !(i%2), i++)
+		ASSERT_EQ(!(i % 2), test_bit(i, (unsigned long *) bitmap));
+	ASSERT_EQ(count, out_dirty);
+
+	memset(bitmap, 0, bitmap_size);
+	test_cmd_get_dirty_iova(fd, hwpt_id, length, iova, page_size, bitmap);
+
+	/* It as read already -- expect all zeroes */
+	for (i = 0; i < nbits; i++)
+		ASSERT_EQ(0, test_bit(i, (unsigned long *) bitmap));
+
+	return 0;
+}
+#define test_mock_dirty_bitmaps(hwpt_id, length, iova, page_size, bitmap, \
+				bitmap_size, _metadata) \
+	ASSERT_EQ(0, _test_mock_dirty_bitmaps(self->fd, hwpt_id,      \
+					      length, iova,           \
+					      page_size, bitmap,      \
+					      bitmap_size, _metadata))
+
 static int _test_cmd_create_access(int fd, unsigned int ioas_id,
 				   __u32 *access_id, unsigned int flags)
 {
@@ -267,6 +368,17 @@ static int _test_ioctl_ioas_map(int fd, unsigned int ioas_id, void *buffer,
 					     IOMMU_IOAS_MAP_READABLE));       \
 	})
 
+#define test_ioctl_ioas_map_fixed_id(ioas_id, buffer, length, iova)           \
+	({                                                                    \
+		__u64 __iova = iova;                                          \
+		ASSERT_EQ(0, _test_ioctl_ioas_map(                            \
+				     self->fd, ioas_id, buffer, length,       \
+				     &__iova,                                 \
+				     IOMMU_IOAS_MAP_FIXED_IOVA |              \
+					     IOMMU_IOAS_MAP_WRITEABLE |       \
+					     IOMMU_IOAS_MAP_READABLE));       \
+	})
+
 #define test_err_ioctl_ioas_map_fixed(_errno, buffer, length, iova)           \
 	({                                                                    \
 		__u64 __iova = iova;                                          \
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH RFCv2 13/24] iommufd: Add IOMMU_DEVICE_GET_CAPS
  2023-05-18 20:46 [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking Joao Martins
                   ` (11 preceding siblings ...)
  2023-05-18 20:46 ` [PATCH RFCv2 12/24] iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_IOVA Joao Martins
@ 2023-05-18 20:46 ` Joao Martins
  2023-05-18 20:46 ` [PATCH RFCv2 14/24] iommufd/selftest: Test IOMMU_DEVICE_GET_CAPS Joao Martins
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-18 20:46 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm, Joao Martins

Add IOMMU_DEVICE_GET_CAPS op for querying iommu capabilities for a given
device.

Capabilities are IOMMU agnostic and use device_iommu_capable() API passing
one of the IOMMU_CAP_*. Enumerate IOMMU_CAP_DIRTY for now in the out_caps
field returned back to userspace.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/iommufd/device.c          | 26 +++++++++++++++++++++++++
 drivers/iommu/iommufd/iommufd_private.h |  1 +
 drivers/iommu/iommufd/main.c            |  3 +++
 include/uapi/linux/iommufd.h            | 23 ++++++++++++++++++++++
 4 files changed, 53 insertions(+)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 48d1300f0350..63e2ffe21653 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -263,6 +263,32 @@ u32 iommufd_device_to_id(struct iommufd_device *idev)
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, IOMMUFD);
 
+int iommufd_device_get_caps(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_device_get_caps *cmd = ucmd->cmd;
+	struct iommufd_object *obj;
+	struct iommufd_device *idev;
+	int rc;
+
+	obj = iommufd_get_object(ucmd->ictx, cmd->dev_id, IOMMUFD_OBJ_DEVICE);
+	if (IS_ERR(obj))
+		return PTR_ERR(obj);
+
+	idev = container_of(obj, struct iommufd_device, obj);
+
+	cmd->out_caps = 0;
+	if (device_iommu_capable(idev->dev, IOMMU_CAP_DIRTY))
+		cmd->out_caps |= IOMMUFD_CAP_DIRTY_TRACKING;
+
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+	if (rc)
+		goto out_put;
+
+out_put:
+	iommufd_put_object(obj);
+	return rc;
+}
+
 static int iommufd_group_setup_msi(struct iommufd_group *igroup,
 				   struct iommufd_hw_pagetable *hwpt)
 {
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 3de8046fee07..e5782459e4aa 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -246,6 +246,7 @@ int iommufd_option_rlimit_mode(struct iommu_option *cmd,
 int iommufd_vfio_ioas(struct iommufd_ucmd *ucmd);
 int iommufd_check_iova_range(struct iommufd_ioas *ioas,
 			     struct iommufd_dirty_data *bitmap);
+int iommufd_device_get_caps(struct iommufd_ucmd *ucmd);
 
 /*
  * A HW pagetable is called an iommu_domain inside the kernel. This user object
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index f34b309a1baf..c4c6f900ef0a 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -279,6 +279,7 @@ union ucmd_buffer {
 	struct iommu_vfio_ioas vfio_ioas;
 	struct iommu_hwpt_set_dirty set_dirty;
 	struct iommu_hwpt_get_dirty_iova get_dirty_iova;
+	struct iommu_device_get_caps get_caps;
 #ifdef CONFIG_IOMMUFD_TEST
 	struct iommu_test_cmd test;
 #endif
@@ -324,6 +325,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 		 struct iommu_hwpt_set_dirty, __reserved),
 	IOCTL_OP(IOMMU_HWPT_GET_DIRTY_IOVA, iommufd_hwpt_get_dirty_iova,
 		 struct iommu_hwpt_get_dirty_iova, bitmap.data),
+	IOCTL_OP(IOMMU_DEVICE_GET_CAPS, iommufd_device_get_caps,
+		 struct iommu_device_get_caps, out_caps),
 #ifdef CONFIG_IOMMUFD_TEST
 	IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last),
 #endif
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 44f9ddcfda58..c256f7354867 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -48,6 +48,7 @@ enum {
 	IOMMUFD_CMD_HWPT_ALLOC,
 	IOMMUFD_CMD_HWPT_SET_DIRTY,
 	IOMMUFD_CMD_HWPT_GET_DIRTY_IOVA,
+	IOMMUFD_CMD_DEVICE_GET_CAPS,
 };
 
 /**
@@ -442,4 +443,26 @@ struct iommu_hwpt_get_dirty_iova {
 };
 #define IOMMU_HWPT_GET_DIRTY_IOVA _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_GET_DIRTY_IOVA)
 
+
+/**
+ * enum iommufd_device_caps
+ * @IOMMU_CAP_DIRTY_TRACKING: IOMMU device support for dirty tracking
+ */
+enum iommufd_device_caps {
+	IOMMUFD_CAP_DIRTY_TRACKING = 1 << 0,
+};
+
+/*
+ * struct iommu_device_caps - ioctl(IOMMU_DEVICE_GET_CAPS)
+ * @size: sizeof(struct iommu_device_caps)
+ * @dev_id: the device to query
+ * @caps: IOMMU capabilities of the device
+ */
+struct iommu_device_get_caps {
+       __u32 size;
+       __u32 dev_id;
+       __aligned_u64 out_caps;
+};
+#define IOMMU_DEVICE_GET_CAPS _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_GET_CAPS)
+
 #endif
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH RFCv2 14/24] iommufd/selftest: Test IOMMU_DEVICE_GET_CAPS
  2023-05-18 20:46 [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking Joao Martins
                   ` (12 preceding siblings ...)
  2023-05-18 20:46 ` [PATCH RFCv2 13/24] iommufd: Add IOMMU_DEVICE_GET_CAPS Joao Martins
@ 2023-05-18 20:46 ` Joao Martins
  2023-05-18 20:46 ` [PATCH RFCv2 15/24] iommufd: Add a flag to skip clearing of IOPTE dirty Joao Martins
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-18 20:46 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm, Joao Martins

Enumerate the capabilities from the mock device and test whether it
advertises as expected. Include it as part of the iommufd_dirty_tracking
suite.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/iommufd/selftest.c              | 10 +++++++++-
 tools/testing/selftests/iommu/iommufd.c       | 14 ++++++++++++++
 tools/testing/selftests/iommu/iommufd_utils.h | 19 +++++++++++++++++++
 3 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 3ec0eb4dfe97..d81a977bf3af 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -275,7 +275,15 @@ static phys_addr_t mock_domain_iova_to_phys(struct iommu_domain *domain,
 
 static bool mock_domain_capable(struct device *dev, enum iommu_cap cap)
 {
-	return cap == IOMMU_CAP_CACHE_COHERENCY;
+	switch (cap) {
+		case IOMMU_CAP_CACHE_COHERENCY:
+		case IOMMU_CAP_DIRTY:
+			return true;
+		default:
+			break;
+	}
+
+	return false;
 }
 
 static void mock_domain_set_plaform_dma_ops(struct device *dev)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 818e78cd889a..dad1eca3aa09 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -1475,6 +1475,19 @@ TEST_F(iommufd_dirty_tracking, set_dirty)
 	test_ioctl_destroy(hwpt_id);
 }
 
+TEST_F(iommufd_dirty_tracking, device_dirty_capability)
+{
+	uint32_t stddev_id;
+	uint32_t hwpt_id;
+
+	test_cmd_hwpt_alloc(self->idev_id, self->ioas_id, 0, &hwpt_id);
+	test_cmd_mock_domain(hwpt_id, &stddev_id, NULL, NULL);
+	test_cmd_get_device_caps(self->idev_id, IOMMUFD_CAP_DIRTY_TRACKING);
+
+	test_ioctl_destroy(stddev_id);
+	test_ioctl_destroy(hwpt_id);
+}
+
 TEST_F(iommufd_dirty_tracking, get_dirty_iova)
 {
 	uint32_t stddev_id;
@@ -1507,6 +1520,7 @@ TEST_F(iommufd_dirty_tracking, get_dirty_iova)
 	test_ioctl_destroy(hwpt_id);
 }
 
+
 /* VFIO compatibility IOCTLs */
 
 TEST_F(iommufd, simple_ioctls)
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 4d428fbb12e2..e942bc781f34 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -146,6 +146,25 @@ static int _test_cmd_set_dirty(int fd, __u32 hwpt_id, bool enabled)
 #define test_cmd_set_dirty(hwpt_id, enabled) \
 	ASSERT_EQ(0, _test_cmd_set_dirty(self->fd, hwpt_id, enabled))
 
+static int _test_cmd_get_device_caps(int fd, __u32 dev_id, __u64 capability)
+{
+	struct iommu_device_get_caps cmd = {
+		.size = sizeof(cmd),
+		.dev_id = dev_id,
+	};
+	int ret;
+
+	ret = ioctl(fd, IOMMU_DEVICE_GET_CAPS, &cmd);
+	if (ret)
+		return ret;
+
+	return cmd.out_caps & capability;
+}
+
+#define test_cmd_get_device_caps(dev_id, expected) \
+	ASSERT_EQ(expected, _test_cmd_get_device_caps(self->fd, dev_id, \
+						      expected))
+
 static int _test_cmd_get_dirty_iova(int fd, __u32 hwpt_id, size_t length,
 				    __u64 iova, size_t page_size, __u64 *bitmap)
 {
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH RFCv2 15/24] iommufd: Add a flag to skip clearing of IOPTE dirty
  2023-05-18 20:46 [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking Joao Martins
                   ` (13 preceding siblings ...)
  2023-05-18 20:46 ` [PATCH RFCv2 14/24] iommufd/selftest: Test IOMMU_DEVICE_GET_CAPS Joao Martins
@ 2023-05-18 20:46 ` Joao Martins
  2023-05-19 13:54   ` Jason Gunthorpe
  2023-05-18 20:46 ` [PATCH RFCv2 16/24] iommufd/selftest: Test IOMMU_GET_DIRTY_IOVA_NO_CLEAR flag Joao Martins
                   ` (8 subsequent siblings)
  23 siblings, 1 reply; 65+ messages in thread
From: Joao Martins @ 2023-05-18 20:46 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm, Joao Martins

VFIO has an operation where you an unmap an IOVA while returning a bitmap
with the dirty data. In reality the operation doesn't quite query the IO
pagetables that the the PTE was dirty or not, but it marks as dirty in the
bitmap anything that was mapped all in one operation.

In IOMMUFD this equivalent can be done in two operations by querying with
GET_DIRTY_IOVA followed by UNMAP_IOVA. However, this would incur two TLB
flushes given that after clearing dirty bits IOMMU implementations require
invalidating their IOTLB, plus another invalidation needed for the UNMAP.
To allow dirty bits to be queried faster, we add a flag
(IOMMU_GET_DIRTY_IOVA_NO_CLEAR) that requests to not clear the dirty bits
from the PTE (but just reading them), under the expectation that the next
operation is the unmap. An alternative is to unmap and just perpectually
mark as dirty as that's the same behaviour as today. So here equivalent
functionally can be provided, and if real dirty info is required we
amortize the cost while querying.

There's still a race against DMA where in theory the unmap of the IOVA
(when the guest invalidates the IOTLB via emulated iommu) would race
against the VF performing DMA on the same IOVA being invalidated which
would be marking the PTE as dirty but losing the update in the
unmap-related IOTLB flush. The way to actually prevent the race would be to
write-protect the IOPTE, then query dirty bits and flush the IOTLB in the
unmap after.  However, this remains an issue that is so far theoretically
possible, but lacks an use case or whether the race is relevant in the
first place that justifies such complexity.

Link: https://lore.kernel.org/linux-iommu/20220502185239.GR8364@nvidia.com/
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/iommufd/hw_pagetable.c |  3 ++-
 drivers/iommu/iommufd/io_pagetable.c |  9 +++++++--
 include/uapi/linux/iommufd.h         | 12 ++++++++++++
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 25860aa0a1f8..bcb81eefe16c 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -260,7 +260,8 @@ int iommufd_hwpt_get_dirty_iova(struct iommufd_ucmd *ucmd)
 	struct iommufd_ioas *ioas;
 	int rc = -EOPNOTSUPP;
 
-	if ((cmd->flags || cmd->__reserved))
+	if ((cmd->flags & ~(IOMMU_GET_DIRTY_IOVA_NO_CLEAR)) ||
+	    cmd->__reserved)
 		return -EOPNOTSUPP;
 
 	hwpt = iommufd_get_hwpt(ucmd, cmd->hwpt_id);
diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index 01adb0f7e4d0..ef58910ec747 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -414,6 +414,7 @@ int iopt_map_user_pages(struct iommufd_ctx *ictx, struct io_pagetable *iopt,
 }
 
 struct iova_bitmap_fn_arg {
+	unsigned long flags;
 	struct iommu_domain *domain;
 	struct iommu_dirty_bitmap *dirty;
 };
@@ -426,8 +427,9 @@ static int __iommu_read_and_clear_dirty(struct iova_bitmap *bitmap,
 	struct iommu_domain *domain = arg->domain;
 	const struct iommu_domain_ops *ops = domain->ops;
 	struct iommu_dirty_bitmap *dirty = arg->dirty;
+	unsigned long flags = arg->flags;
 
-	return ops->read_and_clear_dirty(domain, iova, length, 0, dirty);
+	return ops->read_and_clear_dirty(domain, iova, length, flags, dirty);
 }
 
 static int iommu_read_and_clear_dirty(struct iommu_domain *domain,
@@ -451,11 +453,14 @@ static int iommu_read_and_clear_dirty(struct iommu_domain *domain,
 
 	iommu_dirty_bitmap_init(&dirty, iter, &gather);
 
+	arg.flags = flags;
 	arg.domain = domain;
 	arg.dirty = &dirty;
 	iova_bitmap_for_each(iter, &arg, __iommu_read_and_clear_dirty);
 
-	iommu_iotlb_sync(domain, &gather);
+	if (!(flags & IOMMU_DIRTY_NO_CLEAR))
+		iommu_iotlb_sync(domain, &gather);
+
 	iova_bitmap_free(iter);
 
 	return ret;
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index c256f7354867..a91bf845e679 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -427,6 +427,18 @@ struct iommufd_dirty_data {
 	__aligned_u64 *data;
 };
 
+/**
+ * enum iommufd_get_dirty_iova_flags - Flags for getting dirty bits
+ * @IOMMU_GET_DIRTY_IOVA_NO_CLEAR: Just read the PTEs without clearing any dirty
+ *                                 bits metadata. This flag can be passed in the
+ *                                 expectation where the next operation is
+ *                                 an unmap of the same IOVA range.
+ *
+ */
+enum iommufd_hwpt_get_dirty_iova_flags {
+	IOMMU_GET_DIRTY_IOVA_NO_CLEAR = 1,
+};
+
 /**
  * struct iommu_hwpt_get_dirty_iova - ioctl(IOMMU_HWPT_GET_DIRTY_IOVA)
  * @size: sizeof(struct iommu_hwpt_get_dirty_iova)
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH RFCv2 16/24] iommufd/selftest: Test IOMMU_GET_DIRTY_IOVA_NO_CLEAR flag
  2023-05-18 20:46 [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking Joao Martins
                   ` (14 preceding siblings ...)
  2023-05-18 20:46 ` [PATCH RFCv2 15/24] iommufd: Add a flag to skip clearing of IOPTE dirty Joao Martins
@ 2023-05-18 20:46 ` Joao Martins
  2023-05-18 20:46 ` [PATCH RFCv2 17/24] iommu/amd: Access/Dirty bit support in IOPTEs Joao Martins
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-18 20:46 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm, Joao Martins

Change test_mock_dirty_bitmaps() to pass a flag where we specify the flag
under test. The test does the same thing as the GET_DIRTY_IOVA regular
test. Except that we test whether the bits we dirtied are fetched all the
same a second time as opposed to observing them cleared.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/iommufd/selftest.c              | 15 ++++---
 tools/testing/selftests/iommu/iommufd.c       | 40 ++++++++++++++++++-
 tools/testing/selftests/iommu/iommufd_utils.h | 26 +++++++-----
 3 files changed, 64 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index d81a977bf3af..ae8e94259b21 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -331,13 +331,16 @@ static int mock_domain_read_and_clear_dirty(struct iommu_domain *domain,
 		ent = xa_load(&mock->pfns, cur / MOCK_IO_PAGE_SIZE);
 		if (ent &&
 		    (xa_to_value(ent) & MOCK_PFN_DIRTY_IOVA)) {
-			unsigned long val;
-
 			/* Clear dirty */
-			val = xa_to_value(ent) & ~MOCK_PFN_DIRTY_IOVA;
-			old = xa_store(&mock->pfns, cur / MOCK_IO_PAGE_SIZE,
-				       xa_mk_value(val), GFP_KERNEL);
-			WARN_ON_ONCE(ent != old);
+			if (!(flags & IOMMU_GET_DIRTY_IOVA_NO_CLEAR)) {
+				unsigned long val;
+
+				val = xa_to_value(ent) & ~MOCK_PFN_DIRTY_IOVA;
+				old = xa_store(&mock->pfns,
+					       cur / MOCK_IO_PAGE_SIZE,
+					       xa_mk_value(val), GFP_KERNEL);
+				WARN_ON_ONCE(ent != old);
+			}
 			iommu_dirty_bitmap_record(dirty, cur, MOCK_IO_PAGE_SIZE);
 		}
 	}
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index dad1eca3aa09..7ee788ce80c8 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -1508,13 +1508,49 @@ TEST_F(iommufd_dirty_tracking, get_dirty_iova)
 	test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
 				MOCK_APERTURE_START,
 				self->page_size, self->bitmap,
-				self->bitmap_size, _metadata);
+				self->bitmap_size, 0, _metadata);
 
 	/* PAGE_SIZE unaligned bitmap */
 	test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
 				MOCK_APERTURE_START,
 				self->page_size, self->bitmap + MOCK_PAGE_SIZE,
-				self->bitmap_size, _metadata);
+				self->bitmap_size, 0, _metadata);
+
+	test_ioctl_destroy(stddev_id);
+	test_ioctl_destroy(hwpt_id);
+}
+
+TEST_F(iommufd_dirty_tracking, get_dirty_iova_no_clear)
+{
+	uint32_t stddev_id;
+	uint32_t hwpt_id;
+	uint32_t ioas_id;
+
+	test_ioctl_ioas_alloc(&ioas_id);
+	test_ioctl_ioas_map_fixed_id(ioas_id, self->buffer,
+				     variant->buffer_size,
+				     MOCK_APERTURE_START);
+
+	test_cmd_hwpt_alloc(self->idev_id, ioas_id,
+			    IOMMU_HWPT_ALLOC_ENFORCE_DIRTY, &hwpt_id);
+	test_cmd_mock_domain(hwpt_id, &stddev_id, NULL, NULL);
+
+	test_cmd_set_dirty(hwpt_id, true);
+
+	test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
+				MOCK_APERTURE_START,
+				self->page_size, self->bitmap,
+				self->bitmap_size,
+				IOMMU_GET_DIRTY_IOVA_NO_CLEAR,
+				_metadata);
+
+	/* Unaligned bitmap */
+	test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
+				MOCK_APERTURE_START,
+				self->page_size, self->bitmap + MOCK_PAGE_SIZE,
+				self->bitmap_size,
+				IOMMU_GET_DIRTY_IOVA_NO_CLEAR,
+				_metadata);
 
 	test_ioctl_destroy(stddev_id);
 	test_ioctl_destroy(hwpt_id);
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index e942bc781f34..1c0b942bcb4a 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -166,11 +166,13 @@ static int _test_cmd_get_device_caps(int fd, __u32 dev_id, __u64 capability)
 						      expected))
 
 static int _test_cmd_get_dirty_iova(int fd, __u32 hwpt_id, size_t length,
-				    __u64 iova, size_t page_size, __u64 *bitmap)
+				    __u64 iova, size_t page_size, __u64 *bitmap,
+				    __u32 flags)
 {
 	struct iommu_hwpt_get_dirty_iova cmd = {
 		.size = sizeof(cmd),
 		.hwpt_id = hwpt_id,
+		.flags = flags,
 		.bitmap = {
 			.iova = iova,
 			.length = length,
@@ -186,9 +188,10 @@ static int _test_cmd_get_dirty_iova(int fd, __u32 hwpt_id, size_t length,
 	return 0;
 }
 
-#define test_cmd_get_dirty_iova(fd, hwpt_id, length, iova, page_size, bitmap) \
+#define test_cmd_get_dirty_iova(fd, hwpt_id, length, iova, page_size, bitmap, \
+				flags) \
 	ASSERT_EQ(0, _test_cmd_get_dirty_iova(fd, hwpt_id, length,            \
-					      iova, page_size, bitmap))
+					      iova, page_size, bitmap, flags))
 
 static int _test_cmd_mock_domain_set_dirty(int fd, __u32 hwpt_id, size_t length,
 				           __u64 iova, size_t page_size,
@@ -224,6 +227,7 @@ static int _test_cmd_mock_domain_set_dirty(int fd, __u32 hwpt_id, size_t length,
 static int _test_mock_dirty_bitmaps(int fd, __u32 hwpt_id, size_t length,
 				    __u64 iova, size_t page_size,
 				    __u64 *bitmap, __u64 bitmap_size,
+				    __u32 flags,
 				    struct __test_metadata *_metadata)
 {
 	unsigned long i, count, nbits = bitmap_size * BITS_PER_BYTE;
@@ -242,26 +246,30 @@ static int _test_mock_dirty_bitmaps(int fd, __u32 hwpt_id, size_t length,
 
 	/* Expect all even bits as dirty in the user bitmap */
 	memset(bitmap, 0, bitmap_size);
-	test_cmd_get_dirty_iova(fd, hwpt_id, length, iova, page_size, bitmap);
+	test_cmd_get_dirty_iova(fd, hwpt_id, length, iova,
+				page_size, bitmap, flags);
 	for (count = 0, i = 0; i < nbits; count += !(i%2), i++)
 		ASSERT_EQ(!(i % 2), test_bit(i, (unsigned long *) bitmap));
 	ASSERT_EQ(count, out_dirty);
 
 	memset(bitmap, 0, bitmap_size);
-	test_cmd_get_dirty_iova(fd, hwpt_id, length, iova, page_size, bitmap);
+	test_cmd_get_dirty_iova(fd, hwpt_id, length, iova,
+				page_size, bitmap, flags);
 
 	/* It as read already -- expect all zeroes */
-	for (i = 0; i < nbits; i++)
-		ASSERT_EQ(0, test_bit(i, (unsigned long *) bitmap));
+	for (i = 0; i < nbits; i++) {
+		ASSERT_EQ(!(i % 2) && (flags & IOMMU_GET_DIRTY_IOVA_NO_CLEAR),
+			  test_bit(i, (unsigned long *) bitmap));
+	}
 
 	return 0;
 }
 #define test_mock_dirty_bitmaps(hwpt_id, length, iova, page_size, bitmap, \
-				bitmap_size, _metadata) \
+				bitmap_size, flags, _metadata) \
 	ASSERT_EQ(0, _test_mock_dirty_bitmaps(self->fd, hwpt_id,      \
 					      length, iova,           \
 					      page_size, bitmap,      \
-					      bitmap_size, _metadata))
+					      bitmap_size, flags, _metadata))
 
 static int _test_cmd_create_access(int fd, unsigned int ioas_id,
 				   __u32 *access_id, unsigned int flags)
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH RFCv2 17/24] iommu/amd: Access/Dirty bit support in IOPTEs
  2023-05-18 20:46 [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking Joao Martins
                   ` (15 preceding siblings ...)
  2023-05-18 20:46 ` [PATCH RFCv2 16/24] iommufd/selftest: Test IOMMU_GET_DIRTY_IOVA_NO_CLEAR flag Joao Martins
@ 2023-05-18 20:46 ` Joao Martins
  2023-05-18 20:46 ` [PATCH RFCv2 18/24] iommu/amd: Print access/dirty bits if supported Joao Martins
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-18 20:46 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm, Joao Martins

IOMMU advertises Access/Dirty bits if the extended feature register
reports it. Relevant AMD IOMMU SDM ref[0]
"1.3.8 Enhanced Support for Access and Dirty Bits"

To enable it set the DTE flag in bits 7 and 8 to enable access, or
access+dirty. With that, the IOMMU starts marking the D and A flags on
every Memory Request or ATS translation request. It is on the VMM side
to steer whether to enable dirty tracking or not, rather than wrongly
doing in IOMMU. Relevant AMD IOMMU SDM ref [0], "Table 7. Device Table
Entry (DTE) Field Definitions" particularly the entry "HAD".

To actually toggle on and off it's relatively simple as it's setting
2 bits on DTE and flush the device DTE cache.

To get what's dirtied use existing AMD io-pgtable support, by walking
the pagetables over each IOVA, with fetch_pte().  The IOTLB flushing is
left to the caller (much like unmap), and iommu_dirty_bitmap_record() is
the one adding page-ranges to invalidate. This allows caller to batch
the flush over a big span of IOVA space, without the iommu wondering
about when to flush.

Worthwhile sections from AMD IOMMU SDM:

"2.2.3.1 Host Access Support"
"2.2.3.2 Host Dirty Support"

For details on how IOMMU hardware updates the dirty bit see,
and expects from its consequent clearing by CPU:

"2.2.7.4 Updating Accessed and Dirty Bits in the Guest Address Tables"
"2.2.7.5 Clearing Accessed and Dirty Bits"

Quoting the SDM:

"The setting of accessed and dirty status bits in the page tables is
visible to both the CPU and the peripheral when sharing guest page
tables. The IOMMU interlocked operations to update A and D bits must be
64-bit operations and naturally aligned on a 64-bit boundary"

.. and for the IOMMU update sequence to Dirty bit, essentially is states:

1. Decodes the read and write intent from the memory access.
2. If P=0 in the page descriptor, fail the access.
3. Compare the A & D bits in the descriptor with the read and write
intent in the request.
4. If the A or D bits need to be updated in the descriptor:
* Start atomic operation.
* Read the descriptor as a 64-bit access.
* If the descriptor no longer appears to require an update, release the
atomic lock with
no further action and continue to step 5.
* Calculate the new A & D bits.
* Write the descriptor as a 64-bit access.
* End atomic operation.
5. Continue to the next stage of translation or to the memory access.

Access/Dirty bits readout also need to consider the non-default
page-sizes (aka replicated PTEs as mentined by manual), as AMD
supports all powers of two (except 512G) page sizes.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/amd/amd_iommu.h       |  1 +
 drivers/iommu/amd/amd_iommu_types.h | 12 +++++
 drivers/iommu/amd/init.c            |  5 ++
 drivers/iommu/amd/io_pgtable.c      | 84 +++++++++++++++++++++++++++++
 drivers/iommu/amd/iommu.c           | 81 ++++++++++++++++++++++++++++
 5 files changed, 183 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index e98f20a9bdd8..62567f275878 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -35,6 +35,7 @@ extern int amd_iommu_enable_faulting(void);
 extern int amd_iommu_guest_ir;
 extern enum io_pgtable_fmt amd_iommu_pgtable;
 extern int amd_iommu_gpt_level;
+extern bool amd_iommu_had_support;
 
 /* IOMMUv2 specific functions */
 struct iommu_domain;
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 2ddbda3a4374..3138c257338d 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -97,7 +97,9 @@
 #define FEATURE_GATS_MASK	(3ULL)
 #define FEATURE_GAM_VAPIC	(1ULL<<21)
 #define FEATURE_GIOSUP		(1ULL<<48)
+#define FEATURE_HASUP		(1ULL<<49)
 #define FEATURE_EPHSUP		(1ULL<<50)
+#define FEATURE_HDSUP		(1ULL<<52)
 #define FEATURE_SNP		(1ULL<<63)
 
 #define FEATURE_PASID_SHIFT	32
@@ -208,6 +210,7 @@
 /* macros and definitions for device table entries */
 #define DEV_ENTRY_VALID         0x00
 #define DEV_ENTRY_TRANSLATION   0x01
+#define DEV_ENTRY_HAD           0x07
 #define DEV_ENTRY_PPR           0x34
 #define DEV_ENTRY_IR            0x3d
 #define DEV_ENTRY_IW            0x3e
@@ -366,10 +369,16 @@
 #define PTE_LEVEL_PAGE_SIZE(level)			\
 	(1ULL << (12 + (9 * (level))))
 
+/*
+ * The IOPTE dirty bit
+ */
+#define IOMMU_PTE_HD_BIT (6)
+
 /*
  * Bit value definition for I/O PTE fields
  */
 #define IOMMU_PTE_PR (1ULL << 0)
+#define IOMMU_PTE_HD (1ULL << IOMMU_PTE_HD_BIT)
 #define IOMMU_PTE_U  (1ULL << 59)
 #define IOMMU_PTE_FC (1ULL << 60)
 #define IOMMU_PTE_IR (1ULL << 61)
@@ -380,6 +389,7 @@
  */
 #define DTE_FLAG_V  (1ULL << 0)
 #define DTE_FLAG_TV (1ULL << 1)
+#define DTE_FLAG_HAD (3ULL << 7)
 #define DTE_FLAG_IR (1ULL << 61)
 #define DTE_FLAG_IW (1ULL << 62)
 
@@ -409,6 +419,7 @@
 
 #define IOMMU_PAGE_MASK (((1ULL << 52) - 1) & ~0xfffULL)
 #define IOMMU_PTE_PRESENT(pte) ((pte) & IOMMU_PTE_PR)
+#define IOMMU_PTE_DIRTY(pte) ((pte) & IOMMU_PTE_HD)
 #define IOMMU_PTE_PAGE(pte) (iommu_phys_to_virt((pte) & IOMMU_PAGE_MASK))
 #define IOMMU_PTE_MODE(pte) (((pte) >> 9) & 0x07)
 
@@ -559,6 +570,7 @@ struct protection_domain {
 	int nid;		/* Node ID */
 	u64 *gcr3_tbl;		/* Guest CR3 table */
 	unsigned long flags;	/* flags to find out type of domain */
+	bool dirty_tracking;	/* dirty tracking is enabled in the domain */
 	unsigned dev_cnt;	/* devices assigned to this domain */
 	unsigned dev_iommu[MAX_IOMMUS]; /* per-IOMMU reference count */
 };
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 329a406cc37d..082f47e22c6e 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -151,6 +151,7 @@ struct ivmd_header {
 
 bool amd_iommu_dump;
 bool amd_iommu_irq_remap __read_mostly;
+bool amd_iommu_had_support __read_mostly;
 
 enum io_pgtable_fmt amd_iommu_pgtable = AMD_IOMMU_V1;
 /* Guest page table level */
@@ -2201,6 +2202,10 @@ static int __init amd_iommu_init_pci(void)
 	for_each_iommu(iommu)
 		iommu_flush_all_caches(iommu);
 
+	if (check_feature_on_all_iommus(FEATURE_HASUP) &&
+	    check_feature_on_all_iommus(FEATURE_HDSUP))
+		amd_iommu_had_support = true;
+
 	print_iommu_info();
 
 out:
diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index 666b643106f8..63b4b3ae7c71 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -486,6 +486,89 @@ static phys_addr_t iommu_v1_iova_to_phys(struct io_pgtable_ops *ops, unsigned lo
 	return (__pte & ~offset_mask) | (iova & offset_mask);
 }
 
+static bool pte_test_dirty(u64 *ptep, unsigned long size)
+{
+       bool dirty = false;
+       int i, count;
+
+       /*
+        * 2.2.3.2 Host Dirty Support
+        * When a non-default page size is used , software must OR the
+        * Dirty bits in all of the replicated host PTEs used to map
+        * the page. The IOMMU does not guarantee the Dirty bits are
+        * set in all of the replicated PTEs. Any portion of the page
+        * may have been written even if the Dirty bit is set in only
+        * one of the replicated PTEs.
+        */
+       count = PAGE_SIZE_PTE_COUNT(size);
+       for (i = 0; i < count; i++) {
+               if (test_bit(IOMMU_PTE_HD_BIT, (unsigned long *) &ptep[i])) {
+                       dirty = true;
+                       break;
+               }
+       }
+
+       return dirty;
+}
+
+static bool pte_test_and_clear_dirty(u64 *ptep, unsigned long size)
+{
+	bool dirty = false;
+	int i, count;
+
+	/*
+	 * 2.2.3.2 Host Dirty Support
+	 * When a non-default page size is used , software must OR the
+	 * Dirty bits in all of the replicated host PTEs used to map
+	 * the page. The IOMMU does not guarantee the Dirty bits are
+	 * set in all of the replicated PTEs. Any portion of the page
+	 * may have been written even if the Dirty bit is set in only
+	 * one of the replicated PTEs.
+	 */
+	count = PAGE_SIZE_PTE_COUNT(size);
+	for (i = 0; i < count; i++)
+		if (test_and_clear_bit(IOMMU_PTE_HD_BIT,
+					(unsigned long *) &ptep[i]))
+			dirty = true;
+
+	return dirty;
+}
+
+static int iommu_v1_read_and_clear_dirty(struct io_pgtable_ops *ops,
+					 unsigned long iova, size_t size,
+					 unsigned long flags,
+					 struct iommu_dirty_bitmap *dirty)
+{
+	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
+	unsigned long end = iova + size - 1;
+
+	do {
+		unsigned long pgsize = 0;
+		u64 *ptep, pte;
+
+		ptep = fetch_pte(pgtable, iova, &pgsize);
+		if (ptep)
+			pte = READ_ONCE(*ptep);
+		if (!ptep || !IOMMU_PTE_PRESENT(pte)) {
+			pgsize = pgsize ?: PTE_LEVEL_PAGE_SIZE(0);
+			iova += pgsize;
+			continue;
+		}
+
+		/*
+		 * Mark the whole IOVA range as dirty even if only one of
+		 * the replicated PTEs were marked dirty.
+		 */
+		if (((flags & IOMMU_DIRTY_NO_CLEAR) &&
+				pte_test_dirty(ptep, pgsize)) ||
+		    pte_test_and_clear_dirty(ptep, pgsize))
+			iommu_dirty_bitmap_record(dirty, iova, pgsize);
+		iova += pgsize;
+	} while (iova < end);
+
+	return 0;
+}
+
 /*
  * ----------------------------------------------------
  */
@@ -526,6 +609,7 @@ static struct io_pgtable *v1_alloc_pgtable(struct io_pgtable_cfg *cfg, void *coo
 	pgtable->iop.ops.map_pages    = iommu_v1_map_pages;
 	pgtable->iop.ops.unmap_pages  = iommu_v1_unmap_pages;
 	pgtable->iop.ops.iova_to_phys = iommu_v1_iova_to_phys;
+	pgtable->iop.ops.read_and_clear_dirty = iommu_v1_read_and_clear_dirty;
 
 	return &pgtable->iop;
 }
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 4a314647d1f7..ddb92005f018 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1586,6 +1586,9 @@ static void set_dte_entry(struct amd_iommu *iommu, u16 devid,
 			pte_root |= 1ULL << DEV_ENTRY_PPR;
 	}
 
+	if (domain->dirty_tracking)
+		pte_root |= DTE_FLAG_HAD;
+
 	if (domain->flags & PD_IOMMUV2_MASK) {
 		u64 gcr3 = iommu_virt_to_phys(domain->gcr3_tbl);
 		u64 glx  = domain->glx;
@@ -2177,6 +2180,10 @@ static int amd_iommu_attach_device(struct iommu_domain *dom,
 
 	dev_data->defer_attach = false;
 
+	if (dom->flags & IOMMU_DOMAIN_F_ENFORCE_DIRTY &&
+	    (iommu && !(iommu->features & FEATURE_HDSUP)))
+		return -EINVAL;
+
 	if (dev_data->domain)
 		detach_device(dev);
 
@@ -2293,6 +2300,8 @@ static bool amd_iommu_capable(struct device *dev, enum iommu_cap cap)
 		return amdr_ivrs_remap_support;
 	case IOMMU_CAP_ENFORCE_CACHE_COHERENCY:
 		return true;
+	case IOMMU_CAP_DIRTY:
+		return amd_iommu_had_support;
 	default:
 		break;
 	}
@@ -2300,6 +2309,75 @@ static bool amd_iommu_capable(struct device *dev, enum iommu_cap cap)
 	return false;
 }
 
+static int amd_iommu_set_dirty_tracking(struct iommu_domain *domain,
+					bool enable)
+{
+	struct protection_domain *pdomain = to_pdomain(domain);
+	struct dev_table_entry *dev_table;
+	struct iommu_dev_data *dev_data;
+	struct amd_iommu *iommu;
+	unsigned long flags;
+	u64 pte_root;
+
+	spin_lock_irqsave(&pdomain->lock, flags);
+	if (!(pdomain->dirty_tracking ^ enable)) {
+		spin_unlock_irqrestore(&pdomain->lock, flags);
+		return 0;
+	}
+
+	list_for_each_entry(dev_data, &pdomain->dev_list, list) {
+		iommu = rlookup_amd_iommu(dev_data->dev);
+		if (!iommu)
+			continue;
+
+		dev_table = get_dev_table(iommu);
+		pte_root = dev_table[dev_data->devid].data[0];
+
+		pte_root = (enable ?
+			pte_root | DTE_FLAG_HAD : pte_root & ~DTE_FLAG_HAD);
+
+		/* Flush device DTE */
+		dev_table[dev_data->devid].data[0] = pte_root;
+		device_flush_dte(dev_data);
+	}
+
+	/* Flush IOTLB to mark IOPTE dirty on the next translation(s) */
+	amd_iommu_domain_flush_tlb_pde(pdomain);
+	amd_iommu_domain_flush_complete(pdomain);
+	pdomain->dirty_tracking = enable;
+	spin_unlock_irqrestore(&pdomain->lock, flags);
+
+	return 0;
+}
+
+static int amd_iommu_read_and_clear_dirty(struct iommu_domain *domain,
+					  unsigned long iova, size_t size,
+					  unsigned long flags,
+					  struct iommu_dirty_bitmap *dirty)
+{
+	struct protection_domain *pdomain = to_pdomain(domain);
+	struct io_pgtable_ops *ops = &pdomain->iop.iop.ops;
+	unsigned long lflags;
+	int ret;
+
+	if (!ops || !ops->read_and_clear_dirty)
+		return -EOPNOTSUPP;
+
+	spin_lock_irqsave(&pdomain->lock, lflags);
+	if (!pdomain->dirty_tracking && dirty->bitmap) {
+		spin_unlock_irqrestore(&pdomain->lock, lflags);
+		return -EINVAL;
+	}
+	spin_unlock_irqrestore(&pdomain->lock, lflags);
+
+	rcu_read_lock();
+	ret = ops->read_and_clear_dirty(ops, iova, size, flags, dirty);
+	rcu_read_unlock();
+
+	return ret;
+}
+
+
 static void amd_iommu_get_resv_regions(struct device *dev,
 				       struct list_head *head)
 {
@@ -2432,6 +2510,7 @@ const struct iommu_ops amd_iommu_ops = {
 	.get_resv_regions = amd_iommu_get_resv_regions,
 	.is_attach_deferred = amd_iommu_is_attach_deferred,
 	.pgsize_bitmap	= AMD_IOMMU_PGSIZES,
+	.supported_flags = IOMMU_DOMAIN_F_ENFORCE_DIRTY,
 	.def_domain_type = amd_iommu_def_domain_type,
 	.default_domain_ops = &(const struct iommu_domain_ops) {
 		.attach_dev	= amd_iommu_attach_device,
@@ -2443,6 +2522,8 @@ const struct iommu_ops amd_iommu_ops = {
 		.iotlb_sync	= amd_iommu_iotlb_sync,
 		.free		= amd_iommu_domain_free,
 		.enforce_cache_coherency = amd_iommu_enforce_cache_coherency,
+		.set_dirty_tracking = amd_iommu_set_dirty_tracking,
+		.read_and_clear_dirty = amd_iommu_read_and_clear_dirty,
 	}
 };
 
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH RFCv2 18/24] iommu/amd: Print access/dirty bits if supported
  2023-05-18 20:46 [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking Joao Martins
                   ` (16 preceding siblings ...)
  2023-05-18 20:46 ` [PATCH RFCv2 17/24] iommu/amd: Access/Dirty bit support in IOPTEs Joao Martins
@ 2023-05-18 20:46 ` Joao Martins
  2023-05-18 20:46 ` [PATCH RFCv2 19/24] iommu/intel: Access/Dirty bit support for SL domains Joao Martins
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-18 20:46 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm, Joao Martins

Print the feature, much like other kernel-supported features.

One can still probe its actual hw support via sysfs, regardless
of what the kernel does.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/amd/init.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 082f47e22c6e..102440316c4a 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -2151,6 +2151,10 @@ static void print_iommu_info(void)
 
 			if (iommu->features & FEATURE_GAM_VAPIC)
 				pr_cont(" GA_vAPIC");
+			if (iommu->features & FEATURE_HASUP)
+				pr_cont(" HASup");
+			if (iommu->features & FEATURE_HDSUP)
+				pr_cont(" HDSup");
 
 			if (iommu->features & FEATURE_SNP)
 				pr_cont(" SNP");
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH RFCv2 19/24] iommu/intel: Access/Dirty bit support for SL domains
  2023-05-18 20:46 [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking Joao Martins
                   ` (17 preceding siblings ...)
  2023-05-18 20:46 ` [PATCH RFCv2 18/24] iommu/amd: Print access/dirty bits if supported Joao Martins
@ 2023-05-18 20:46 ` Joao Martins
  2023-05-18 20:46 ` [PATCH RFCv2 20/24] iommu/arm-smmu-v3: Add feature detection for HTTU Joao Martins
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-18 20:46 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm, Joao Martins

IOMMU advertises Access/Dirty bits for second-stage page table if the
extended capability DMAR register reports it (ECAP, mnemonic ECAP.SSADS).
The first stage table is compatible with CPU page table thus A/D bits are
implicitly supported. Relevant Intel IOMMU SDM ref for first stage table
"3.6.2 Accessed, Extended Accessed, and Dirty Flags" and second stage table
"3.7.2 Accessed and Dirty Flags".

First stage page table is enabled by default so it's allowed to set dirty
tracking and no control bits needed, we just return 0. To use SSADS, we set
bit 9 (SSADE) in the scalable-mode PASID table entry and flush the IOTLB
via pasid_flush_caches(). Relevant SDM refs:

"3.7.2 Accessed and Dirty Flags"
"6.5.3.3 Guidance to Software for Invalidations,
 Table 23. Guidance to Software for Invalidations"

PTE dirty bit is located in bit 9 and it's cached in the IOTLB so
we also need to flush IOTLB to make sure IOMMU attempts to set the
dirty bit again. Note that iommu_dirty_bitmap_record() will add the
IOVA to iotlb_gather and thus the caller of the iommu op will flush the
IOTLB. Relevant manuals over the hardware translation is
chapter 6 with some special mention to:

"6.2.3.1 Scalable-Mode PASID-Table Entry Programming Considerations"
"6.2.4 IOTLB"

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
The IOPTE walker is still inneficient, but v3 will change that. Mainly
purpose here would be to make sure the UAPI/IOMMUFD is solid and agreed upon.
---
 drivers/iommu/intel/iommu.c |  88 ++++++++++++++++++++++++++++++
 drivers/iommu/intel/iommu.h |  15 ++++++
 drivers/iommu/intel/pasid.c | 103 ++++++++++++++++++++++++++++++++++++
 drivers/iommu/intel/pasid.h |   4 ++
 4 files changed, 210 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 4662292d60ba..6cf9cbe4c299 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4112,6 +4112,10 @@ static int prepare_domain_attach_device(struct iommu_domain *domain,
 	if (!iommu)
 		return -ENODEV;
 
+	if (domain->flags & IOMMU_DOMAIN_F_ENFORCE_DIRTY &&
+	    !ecap_slads(iommu->ecap))
+		return -EINVAL;
+
 	if (dmar_domain->force_snooping && !ecap_sc_support(iommu->ecap))
 		return -EINVAL;
 
@@ -4374,6 +4378,9 @@ static bool intel_iommu_capable(struct device *dev, enum iommu_cap cap)
 		return dmar_platform_optin();
 	case IOMMU_CAP_ENFORCE_CACHE_COHERENCY:
 		return ecap_sc_support(info->iommu->ecap);
+	case IOMMU_CAP_DIRTY:
+		return sm_supported(info->iommu) &&
+			ecap_slads(info->iommu->ecap);
 	default:
 		return false;
 	}
@@ -4739,6 +4746,84 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
 	intel_pasid_tear_down_entry(iommu, dev, pasid, false);
 }
 
+static int intel_iommu_set_dirty_tracking(struct iommu_domain *domain,
+					  bool enable)
+{
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	struct device_domain_info *info;
+	int ret = -EINVAL;
+
+	spin_lock(&dmar_domain->lock);
+	if (!(dmar_domain->dirty_tracking ^ enable) ||
+	    list_empty(&dmar_domain->devices)) {
+		spin_unlock(&dmar_domain->lock);
+		return 0;
+	}
+
+	list_for_each_entry(info, &dmar_domain->devices, link) {
+		/* First-level page table always enables dirty bit*/
+		if (dmar_domain->use_first_level) {
+			ret = 0;
+			break;
+		}
+
+		ret = intel_pasid_setup_dirty_tracking(info->iommu, info->domain,
+						     info->dev, PASID_RID2PASID,
+						     enable);
+		if (ret)
+			break;
+
+	}
+
+	if (!ret)
+		dmar_domain->dirty_tracking = enable;
+	spin_unlock(&dmar_domain->lock);
+
+	return ret;
+}
+
+static int intel_iommu_read_and_clear_dirty(struct iommu_domain *domain,
+					    unsigned long iova, size_t size,
+					    unsigned long flags,
+					    struct iommu_dirty_bitmap *dirty)
+{
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	unsigned long end = iova + size - 1;
+	unsigned long pgsize;
+	bool ad_enabled;
+
+	spin_lock(&dmar_domain->lock);
+	ad_enabled = dmar_domain->dirty_tracking;
+	spin_unlock(&dmar_domain->lock);
+
+	if (!ad_enabled && dirty->bitmap)
+		return -EINVAL;
+
+	rcu_read_lock();
+	do {
+		struct dma_pte *pte;
+		int lvl = 0;
+
+		pte = pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT, &lvl,
+				     GFP_ATOMIC);
+		pgsize = level_size(lvl) << VTD_PAGE_SHIFT;
+		if (!pte || !dma_pte_present(pte)) {
+			iova += pgsize;
+			continue;
+		}
+
+		/* It is writable, set the bitmap */
+		if (((flags & IOMMU_DIRTY_NO_CLEAR) &&
+				dma_sl_pte_dirty(pte)) ||
+		    dma_sl_pte_test_and_clear_dirty(pte))
+			iommu_dirty_bitmap_record(dirty, iova, pgsize);
+		iova += pgsize;
+	} while (iova < end);
+	rcu_read_unlock();
+
+	return 0;
+}
+
 const struct iommu_ops intel_iommu_ops = {
 	.capable		= intel_iommu_capable,
 	.domain_alloc		= intel_iommu_domain_alloc,
@@ -4753,6 +4838,7 @@ const struct iommu_ops intel_iommu_ops = {
 	.def_domain_type	= device_def_domain_type,
 	.remove_dev_pasid	= intel_iommu_remove_dev_pasid,
 	.pgsize_bitmap		= SZ_4K,
+	.supported_flags	= IOMMU_DOMAIN_F_ENFORCE_DIRTY,
 #ifdef CONFIG_INTEL_IOMMU_SVM
 	.page_response		= intel_svm_page_response,
 #endif
@@ -4766,6 +4852,8 @@ const struct iommu_ops intel_iommu_ops = {
 		.iova_to_phys		= intel_iommu_iova_to_phys,
 		.free			= intel_iommu_domain_free,
 		.enforce_cache_coherency = intel_iommu_enforce_cache_coherency,
+		.set_dirty_tracking	= intel_iommu_set_dirty_tracking,
+		.read_and_clear_dirty   = intel_iommu_read_and_clear_dirty,
 	}
 };
 
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 1c5e1d88862b..56ee6ce2e09d 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -48,6 +48,9 @@
 #define DMA_FL_PTE_DIRTY	BIT_ULL(6)
 #define DMA_FL_PTE_XD		BIT_ULL(63)
 
+#define DMA_SL_PTE_DIRTY_BIT	9
+#define DMA_SL_PTE_DIRTY	BIT_ULL(DMA_SL_PTE_DIRTY_BIT)
+
 #define ADDR_WIDTH_5LEVEL	(57)
 #define ADDR_WIDTH_4LEVEL	(48)
 
@@ -592,6 +595,7 @@ struct dmar_domain {
 					 * otherwise, goes through the second
 					 * level.
 					 */
+	u8 dirty_tracking:1;		/* Dirty tracking is enabled */
 
 	spinlock_t lock;		/* Protect device tracking lists */
 	struct list_head devices;	/* all devices' list */
@@ -774,6 +778,17 @@ static inline bool dma_pte_present(struct dma_pte *pte)
 	return (pte->val & 3) != 0;
 }
 
+static inline bool dma_sl_pte_dirty(struct dma_pte *pte)
+{
+	return (pte->val & DMA_SL_PTE_DIRTY) != 0;
+}
+
+static inline bool dma_sl_pte_test_and_clear_dirty(struct dma_pte *pte)
+{
+	return test_and_clear_bit(DMA_SL_PTE_DIRTY_BIT,
+				  (unsigned long *)&pte->val);
+}
+
 static inline bool dma_pte_superpage(struct dma_pte *pte)
 {
 	return (pte->val & DMA_PTE_LARGE_PAGE);
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index c5d479770e12..c7cfa0387277 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -277,6 +277,11 @@ static inline void pasid_set_bits(u64 *ptr, u64 mask, u64 bits)
 	WRITE_ONCE(*ptr, (old & ~mask) | bits);
 }
 
+static inline u64 pasid_get_bits(u64 *ptr)
+{
+	return READ_ONCE(*ptr);
+}
+
 /*
  * Setup the DID(Domain Identifier) field (Bit 64~79) of scalable mode
  * PASID entry.
@@ -335,6 +340,45 @@ static inline void pasid_set_fault_enable(struct pasid_entry *pe)
 	pasid_set_bits(&pe->val[0], 1 << 1, 0);
 }
 
+/*
+ * Enable second level A/D bits by setting the SLADE (Second Level
+ * Access Dirty Enable) field (Bit 9) of a scalable mode PASID
+ * entry.
+ */
+static inline void pasid_set_ssade(struct pasid_entry *pe)
+{
+	pasid_set_bits(&pe->val[0], 1 << 9, 1 << 9);
+}
+
+/*
+ * Enable second level A/D bits by setting the SLADE (Second Level
+ * Access Dirty Enable) field (Bit 9) of a scalable mode PASID
+ * entry.
+ */
+static inline void pasid_clear_ssade(struct pasid_entry *pe)
+{
+	pasid_set_bits(&pe->val[0], 1 << 9, 0);
+}
+
+/*
+ * Checks if second level A/D bits by setting the SLADE (Second Level
+ * Access Dirty Enable) field (Bit 9) of a scalable mode PASID
+ * entry is enabled.
+ */
+static inline bool pasid_get_ssade(struct pasid_entry *pe)
+{
+	return pasid_get_bits(&pe->val[0]) & (1 << 9);
+}
+
+/*
+ * Setup the SRE(Supervisor Request Enable) field (Bit 128) of a
+ * scalable mode PASID entry.
+ */
+static inline void pasid_set_sre(struct pasid_entry *pe)
+{
+	pasid_set_bits(&pe->val[2], 1 << 0, 1);
+}
+
 /*
  * Setup the WPE(Write Protect Enable) field (Bit 132) of a
  * scalable mode PASID entry.
@@ -627,6 +671,8 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_SL_ONLY);
 	pasid_set_fault_enable(pte);
 	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
+	if (domain->dirty_tracking)
+		pasid_set_ssade(pte);
 
 	pasid_set_present(pte);
 	spin_unlock(&iommu->lock);
@@ -636,6 +682,63 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 	return 0;
 }
 
+/*
+ * Set up dirty tracking on a second only translation type.
+ */
+int intel_pasid_setup_dirty_tracking(struct intel_iommu *iommu,
+				     struct dmar_domain *domain,
+				     struct device *dev, u32 pasid,
+				     bool enabled)
+{
+	struct pasid_entry *pte;
+	u16 did, pgtt;
+
+	spin_lock(&iommu->lock);
+
+	did = domain_id_iommu(domain, iommu);
+	pte = intel_pasid_get_entry(dev, pasid);
+	if (!pte) {
+		spin_unlock(&iommu->lock);
+		dev_err(dev, "Failed to get pasid entry of PASID %d\n", pasid);
+		return -ENODEV;
+	}
+
+	pgtt = pasid_pte_get_pgtt(pte);
+
+	if (enabled)
+		pasid_set_ssade(pte);
+	else
+		pasid_clear_ssade(pte);
+	spin_unlock(&iommu->lock);
+
+	/*
+	 * From VT-d spec table 25 "Guidance to Software for Invalidations":
+	 *
+	 * - PASID-selective-within-Domain PASID-cache invalidation
+	 *   If (PGTT=SS or Nested)
+	 *    - Domain-selective IOTLB invalidation
+	 *   Else
+	 *    - PASID-selective PASID-based IOTLB invalidation
+	 * - If (pasid is RID_PASID)
+	 *    - Global Device-TLB invalidation to affected functions
+	 *   Else
+	 *    - PASID-based Device-TLB invalidation (with S=1 and
+	 *      Addr[63:12]=0x7FFFFFFF_FFFFF) to affected functions
+	 */
+	pasid_cache_invalidation_with_pasid(iommu, did, pasid);
+
+	if (pgtt == PASID_ENTRY_PGTT_SL_ONLY || pgtt == PASID_ENTRY_PGTT_NESTED)
+		iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
+	else
+		qi_flush_piotlb(iommu, did, pasid, 0, -1, 0);
+
+	/* Device IOTLB doesn't need to be flushed in caching mode. */
+	if (!cap_caching_mode(iommu->cap))
+		devtlb_invalidation_with_pasid(iommu, dev, pasid);
+
+	return 0;
+}
+
 /*
  * Set up the scalable mode pasid entry for passthrough translation type.
  */
diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index d6b7d21244b1..3fc5aba02971 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -108,6 +108,10 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
 int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 				   struct dmar_domain *domain,
 				   struct device *dev, u32 pasid);
+int intel_pasid_setup_dirty_tracking(struct intel_iommu *iommu,
+				     struct dmar_domain *domain,
+				     struct device *dev, u32 pasid,
+				     bool enabled);
 int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 				   struct dmar_domain *domain,
 				   struct device *dev, u32 pasid);
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH RFCv2 20/24] iommu/arm-smmu-v3: Add feature detection for HTTU
  2023-05-18 20:46 [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking Joao Martins
                   ` (18 preceding siblings ...)
  2023-05-18 20:46 ` [PATCH RFCv2 19/24] iommu/intel: Access/Dirty bit support for SL domains Joao Martins
@ 2023-05-18 20:46 ` Joao Martins
  2023-05-18 20:46 ` [PATCH RFCv2 21/24] iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping Joao Martins
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-18 20:46 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm, Joao Martins

From: Jean-Philippe Brucker <jean-philippe@linaro.org>

If the SMMU supports it and the kernel was built with HTTU support,
Probe support for Hardware Translation Table Update (HTTU) which is
essentially to enable hardware update of access and dirty flags.

Probe and set the smmu::features for Hardware Dirty and Hardware Access
bits. This is in preparation, to enable it on the context descriptors of
stage 1 format.

Link: https://lore.kernel.org/lkml/20210413085457.25400-5-zhukeqian1@huawei.com/
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
[joaomart: Change commit message to reflect the underlying changes,
 the Link points to the original version this was based]
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 32 +++++++++++++++++++++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  5 ++++
 2 files changed, 37 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 3fd83fb75722..e110ff4710bf 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3429,6 +3429,28 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 	return 0;
 }
 
+static void arm_smmu_get_httu(struct arm_smmu_device *smmu, u32 reg)
+{
+	u32 fw_features = smmu->features & (ARM_SMMU_FEAT_HA | ARM_SMMU_FEAT_HD);
+	u32 features = 0;
+
+	switch (FIELD_GET(IDR0_HTTU, reg)) {
+	case IDR0_HTTU_ACCESS_DIRTY:
+		features |= ARM_SMMU_FEAT_HD;
+		fallthrough;
+	case IDR0_HTTU_ACCESS:
+		features |= ARM_SMMU_FEAT_HA;
+	}
+
+	if (smmu->dev->of_node)
+		smmu->features |= features;
+	else if (features != fw_features)
+		/* ACPI IORT sets the HTTU bits */
+		dev_warn(smmu->dev,
+			 "IDR0.HTTU overridden by FW configuration (0x%x)\n",
+			 fw_features);
+}
+
 static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 {
 	u32 reg;
@@ -3489,6 +3511,8 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 			smmu->features |= ARM_SMMU_FEAT_E2H;
 	}
 
+	arm_smmu_get_httu(smmu, reg);
+
 	/*
 	 * The coherency feature as set by FW is used in preference to the ID
 	 * register, but warn on mismatch.
@@ -3675,6 +3699,14 @@ static int arm_smmu_device_acpi_probe(struct platform_device *pdev,
 	if (iort_smmu->flags & ACPI_IORT_SMMU_V3_COHACC_OVERRIDE)
 		smmu->features |= ARM_SMMU_FEAT_COHERENCY;
 
+	switch (FIELD_GET(ACPI_IORT_SMMU_V3_HTTU_OVERRIDE, iort_smmu->flags)) {
+	case IDR0_HTTU_ACCESS_DIRTY:
+		smmu->features |= ARM_SMMU_FEAT_HD;
+		fallthrough;
+	case IDR0_HTTU_ACCESS:
+		smmu->features |= ARM_SMMU_FEAT_HA;
+	}
+
 	return 0;
 }
 #else
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index b574c58a3487..d82dd125446c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -33,6 +33,9 @@
 #define IDR0_ASID16			(1 << 12)
 #define IDR0_ATS			(1 << 10)
 #define IDR0_HYP			(1 << 9)
+#define IDR0_HTTU			GENMASK(7, 6)
+#define IDR0_HTTU_ACCESS		1
+#define IDR0_HTTU_ACCESS_DIRTY		2
 #define IDR0_COHACC			(1 << 4)
 #define IDR0_TTF			GENMASK(3, 2)
 #define IDR0_TTF_AARCH64		2
@@ -639,6 +642,8 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_BTM		(1 << 16)
 #define ARM_SMMU_FEAT_SVA		(1 << 17)
 #define ARM_SMMU_FEAT_E2H		(1 << 18)
+#define ARM_SMMU_FEAT_HA		(1 << 19)
+#define ARM_SMMU_FEAT_HD		(1 << 20)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH RFCv2 21/24] iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping
  2023-05-18 20:46 [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking Joao Martins
                   ` (19 preceding siblings ...)
  2023-05-18 20:46 ` [PATCH RFCv2 20/24] iommu/arm-smmu-v3: Add feature detection for HTTU Joao Martins
@ 2023-05-18 20:46 ` Joao Martins
  2023-05-19 13:49   ` Robin Murphy
  2023-05-22 10:34   ` Shameerali Kolothum Thodi
  2023-05-18 20:46 ` [PATCH RFCv2 22/24] iommu/arm-smmu-v3: Add read_and_clear_dirty() support Joao Martins
                   ` (2 subsequent siblings)
  23 siblings, 2 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-18 20:46 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm, Joao Martins

From: Kunkun Jiang <jiangkunkun@huawei.com>

As nested mode is not upstreamed now, we just aim to support dirty
log tracking for stage1 with io-pgtable mapping (means not support
SVA mapping). If HTTU is supported, we enable HA/HD bits in the SMMU
CD and transfer ARM_HD quirk to io-pgtable.

We additionally filter out HD|HA if not supportted. The CD.HD bit
is not particularly useful unless we toggle the DBM bit in the PTE
entries.

Link: https://lore.kernel.org/lkml/20210413085457.25400-6-zhukeqian1@huawei.com/
Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
[joaomart:Convey HD|HA bits over to the context descriptor
 and update commit message; original in Link, where this is based on]
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 10 ++++++++++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  3 +++
 drivers/iommu/io-pgtable-arm.c              | 11 +++++++++--
 include/linux/io-pgtable.h                  |  4 ++++
 4 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index e110ff4710bf..e2b98a6a6b74 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1998,6 +1998,11 @@ static const struct iommu_flush_ops arm_smmu_flush_ops = {
 	.tlb_add_page	= arm_smmu_tlb_inv_page_nosync,
 };
 
+static bool arm_smmu_dbm_capable(struct arm_smmu_device *smmu)
+{
+	return smmu->features & (ARM_SMMU_FEAT_HD | ARM_SMMU_FEAT_COHERENCY);
+}
+
 /* IOMMU API */
 static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
 {
@@ -2124,6 +2129,8 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 			  FIELD_PREP(CTXDESC_CD_0_TCR_SH0, tcr->sh) |
 			  FIELD_PREP(CTXDESC_CD_0_TCR_IPS, tcr->ips) |
 			  CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64;
+	if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_HD)
+		cfg->cd.tcr |= CTXDESC_CD_0_TCR_HA | CTXDESC_CD_0_TCR_HD;
 	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair;
 
 	/*
@@ -2226,6 +2233,9 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
 		.iommu_dev	= smmu->dev,
 	};
 
+	if (smmu->features & arm_smmu_dbm_capable(smmu))
+		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_HD;
+
 	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
 	if (!pgtbl_ops)
 		return -ENOMEM;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index d82dd125446c..83d6f3a2554f 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -288,6 +288,9 @@
 #define CTXDESC_CD_0_TCR_IPS		GENMASK_ULL(34, 32)
 #define CTXDESC_CD_0_TCR_TBI0		(1ULL << 38)
 
+#define CTXDESC_CD_0_TCR_HA            (1UL << 43)
+#define CTXDESC_CD_0_TCR_HD            (1UL << 42)
+
 #define CTXDESC_CD_0_AA64		(1UL << 41)
 #define CTXDESC_CD_0_S			(1UL << 44)
 #define CTXDESC_CD_0_R			(1UL << 45)
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 72dcdd468cf3..b2f470529459 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -75,6 +75,7 @@
 
 #define ARM_LPAE_PTE_NSTABLE		(((arm_lpae_iopte)1) << 63)
 #define ARM_LPAE_PTE_XN			(((arm_lpae_iopte)3) << 53)
+#define ARM_LPAE_PTE_DBM		(((arm_lpae_iopte)1) << 51)
 #define ARM_LPAE_PTE_AF			(((arm_lpae_iopte)1) << 10)
 #define ARM_LPAE_PTE_SH_NS		(((arm_lpae_iopte)0) << 8)
 #define ARM_LPAE_PTE_SH_OS		(((arm_lpae_iopte)2) << 8)
@@ -84,7 +85,7 @@
 
 #define ARM_LPAE_PTE_ATTR_LO_MASK	(((arm_lpae_iopte)0x3ff) << 2)
 /* Ignore the contiguous bit for block splitting */
-#define ARM_LPAE_PTE_ATTR_HI_MASK	(((arm_lpae_iopte)6) << 52)
+#define ARM_LPAE_PTE_ATTR_HI_MASK	(((arm_lpae_iopte)13) << 51)
 #define ARM_LPAE_PTE_ATTR_MASK		(ARM_LPAE_PTE_ATTR_LO_MASK |	\
 					 ARM_LPAE_PTE_ATTR_HI_MASK)
 /* Software bit for solving coherency races */
@@ -93,6 +94,9 @@
 /* Stage-1 PTE */
 #define ARM_LPAE_PTE_AP_UNPRIV		(((arm_lpae_iopte)1) << 6)
 #define ARM_LPAE_PTE_AP_RDONLY		(((arm_lpae_iopte)2) << 6)
+#define ARM_LPAE_PTE_AP_RDONLY_BIT	7
+#define ARM_LPAE_PTE_AP_WRITABLE	(ARM_LPAE_PTE_AP_RDONLY | \
+					 ARM_LPAE_PTE_DBM)
 #define ARM_LPAE_PTE_ATTRINDX_SHIFT	2
 #define ARM_LPAE_PTE_nG			(((arm_lpae_iopte)1) << 11)
 
@@ -407,6 +411,8 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
 		pte = ARM_LPAE_PTE_nG;
 		if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
 			pte |= ARM_LPAE_PTE_AP_RDONLY;
+		else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_HD)
+			pte |= ARM_LPAE_PTE_AP_WRITABLE;
 		if (!(prot & IOMMU_PRIV))
 			pte |= ARM_LPAE_PTE_AP_UNPRIV;
 	} else {
@@ -804,7 +810,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 
 	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
 			    IO_PGTABLE_QUIRK_ARM_TTBR1 |
-			    IO_PGTABLE_QUIRK_ARM_OUTER_WBWA))
+			    IO_PGTABLE_QUIRK_ARM_OUTER_WBWA |
+			    IO_PGTABLE_QUIRK_ARM_HD))
 		return NULL;
 
 	data = arm_lpae_alloc_pgtable(cfg);
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 25142a0e2fc2..9a996ba7856d 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -85,6 +85,8 @@ struct io_pgtable_cfg {
 	 *
 	 * IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the outer-cacheability
 	 *	attributes set in the TCR for a non-coherent page-table walker.
+	 *
+	 * IO_PGTABLE_QUIRK_ARM_HD: Enables dirty tracking.
 	 */
 	#define IO_PGTABLE_QUIRK_ARM_NS			BIT(0)
 	#define IO_PGTABLE_QUIRK_NO_PERMS		BIT(1)
@@ -92,6 +94,8 @@ struct io_pgtable_cfg {
 	#define IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT	BIT(4)
 	#define IO_PGTABLE_QUIRK_ARM_TTBR1		BIT(5)
 	#define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA		BIT(6)
+	#define IO_PGTABLE_QUIRK_ARM_HD			BIT(7)
+
 	unsigned long			quirks;
 	unsigned long			pgsize_bitmap;
 	unsigned int			ias;
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH RFCv2 22/24] iommu/arm-smmu-v3: Add read_and_clear_dirty() support
  2023-05-18 20:46 [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking Joao Martins
                   ` (20 preceding siblings ...)
  2023-05-18 20:46 ` [PATCH RFCv2 21/24] iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping Joao Martins
@ 2023-05-18 20:46 ` Joao Martins
  2023-06-16 16:46   ` Shameerali Kolothum Thodi
  2023-05-18 20:46 ` [PATCH RFCv2 23/24] iommu/arm-smmu-v3: Add set_dirty_tracking() support Joao Martins
  2023-05-18 20:46 ` [PATCH RFCv2 24/24] iommu/arm-smmu-v3: Advertise IOMMU_DOMAIN_F_ENFORCE_DIRTY Joao Martins
  23 siblings, 1 reply; 65+ messages in thread
From: Joao Martins @ 2023-05-18 20:46 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm, Joao Martins

From: Keqian Zhu <zhukeqian1@huawei.com>

.read_and_clear_dirty() IOMMU domain op takes care of reading the dirty
bits (i.e. PTE has both DBM and AP[2] set) and marshalling into a bitmap of
a given page size.

While reading the dirty bits we also clear the PTE AP[2] bit to mark it as
writable-clean depending on read_and_clear_dirty() flags.

Structure it in a way that the IOPTE walker is generic, and so we pass a
function pointer over what to do on a per-PTE basis.

[Link below points to the original version that was based on]

Link: https://lore.kernel.org/lkml/20210413085457.25400-11-zhukeqian1@huawei.com/
Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
Co-developed-by: Kunkun Jiang <jiangkunkun@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
[joaomart: Massage commit message]
Co-developed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  23 +++++
 drivers/iommu/io-pgtable-arm.c              | 104 ++++++++++++++++++++
 2 files changed, 127 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index e2b98a6a6b74..2cde14003469 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2765,6 +2765,28 @@ static int arm_smmu_enable_nesting(struct iommu_domain *domain)
 	return ret;
 }
 
+static int arm_smmu_read_and_clear_dirty(struct iommu_domain *domain,
+					 unsigned long iova, size_t size,
+					 unsigned long flags,
+					 struct iommu_dirty_bitmap *dirty)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+	int ret;
+
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+		return -EINVAL;
+
+	if (!ops || !ops->read_and_clear_dirty) {
+		pr_err_once("io-pgtable don't support dirty tracking\n");
+		return -ENODEV;
+	}
+
+	ret = ops->read_and_clear_dirty(ops, iova, size, flags, dirty);
+
+	return ret;
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
 	return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2893,6 +2915,7 @@ static struct iommu_ops arm_smmu_ops = {
 		.iova_to_phys		= arm_smmu_iova_to_phys,
 		.enable_nesting		= arm_smmu_enable_nesting,
 		.free			= arm_smmu_domain_free,
+		.read_and_clear_dirty	= arm_smmu_read_and_clear_dirty,
 	}
 };
 
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index b2f470529459..de9e61f8452d 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -717,6 +717,109 @@ static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
 	return iopte_to_paddr(pte, data) | iova;
 }
 
+struct arm_lpae_iopte_read_dirty {
+	unsigned long flags;
+	struct iommu_dirty_bitmap *dirty;
+};
+
+static int __arm_lpae_read_and_clear_dirty(unsigned long iova, size_t size,
+					   arm_lpae_iopte *ptep, void *opaque)
+{
+	struct arm_lpae_iopte_read_dirty *arg = opaque;
+	struct iommu_dirty_bitmap *dirty = arg->dirty;
+	arm_lpae_iopte pte;
+
+	pte = READ_ONCE(*ptep);
+	if (WARN_ON(!pte))
+		return -EINVAL;
+
+	if ((pte & ARM_LPAE_PTE_AP_WRITABLE) == ARM_LPAE_PTE_AP_WRITABLE)
+		return 0;
+
+	iommu_dirty_bitmap_record(dirty, iova, size);
+	if (!(arg->flags & IOMMU_DIRTY_NO_CLEAR))
+		set_bit(ARM_LPAE_PTE_AP_RDONLY_BIT, (unsigned long *)ptep);
+	return 0;
+}
+
+static int __arm_lpae_iopte_walk(struct arm_lpae_io_pgtable *data,
+				 unsigned long iova, size_t size,
+				 int lvl, arm_lpae_iopte *ptep,
+				 int (*fn)(unsigned long iova, size_t size,
+					   arm_lpae_iopte *pte, void *opaque),
+				 void *opaque)
+{
+	arm_lpae_iopte pte;
+	struct io_pgtable *iop = &data->iop;
+	size_t base, next_size;
+	int ret;
+
+	if (WARN_ON_ONCE(!fn))
+		return -EINVAL;
+
+	if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+		return -EINVAL;
+
+	ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+	pte = READ_ONCE(*ptep);
+	if (WARN_ON(!pte))
+		return -EINVAL;
+
+	if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+		if (iopte_leaf(pte, lvl, iop->fmt))
+			return fn(iova, size, ptep, opaque);
+
+		/* Current level is table, traverse next level */
+		next_size = ARM_LPAE_BLOCK_SIZE(lvl + 1, data);
+		ptep = iopte_deref(pte, data);
+		for (base = 0; base < size; base += next_size) {
+			ret = __arm_lpae_iopte_walk(data, iova + base,
+						    next_size, lvl + 1, ptep,
+						    fn, opaque);
+			if (ret)
+				return ret;
+		}
+		return 0;
+	} else if (iopte_leaf(pte, lvl, iop->fmt)) {
+		return fn(iova, size, ptep, opaque);
+	}
+
+	/* Keep on walkin */
+	ptep = iopte_deref(pte, data);
+	return __arm_lpae_iopte_walk(data, iova, size, lvl + 1, ptep,
+				     fn, opaque);
+}
+
+static int arm_lpae_read_and_clear_dirty(struct io_pgtable_ops *ops,
+					 unsigned long iova, size_t size,
+					 unsigned long flags,
+					 struct iommu_dirty_bitmap *dirty)
+{
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+	struct arm_lpae_iopte_read_dirty arg = {
+		.flags = flags, .dirty = dirty,
+	};
+	arm_lpae_iopte *ptep = data->pgd;
+	int lvl = data->start_level;
+	long iaext = (s64)iova >> cfg->ias;
+
+	if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
+		return -EINVAL;
+
+	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+		iaext = ~iaext;
+	if (WARN_ON(iaext))
+		return -EINVAL;
+
+	if (data->iop.fmt != ARM_64_LPAE_S1 &&
+	    data->iop.fmt != ARM_32_LPAE_S1)
+		return -EINVAL;
+
+	return __arm_lpae_iopte_walk(data, iova, size, lvl, ptep,
+				     __arm_lpae_read_and_clear_dirty, &arg);
+}
+
 static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
 {
 	unsigned long granule, page_sizes;
@@ -795,6 +898,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
 		.map_pages	= arm_lpae_map_pages,
 		.unmap_pages	= arm_lpae_unmap_pages,
 		.iova_to_phys	= arm_lpae_iova_to_phys,
+		.read_and_clear_dirty = arm_lpae_read_and_clear_dirty,
 	};
 
 	return data;
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH RFCv2 23/24] iommu/arm-smmu-v3: Add set_dirty_tracking() support
  2023-05-18 20:46 [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking Joao Martins
                   ` (21 preceding siblings ...)
  2023-05-18 20:46 ` [PATCH RFCv2 22/24] iommu/arm-smmu-v3: Add read_and_clear_dirty() support Joao Martins
@ 2023-05-18 20:46 ` Joao Martins
  2023-05-18 20:46 ` [PATCH RFCv2 24/24] iommu/arm-smmu-v3: Advertise IOMMU_DOMAIN_F_ENFORCE_DIRTY Joao Martins
  23 siblings, 0 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-18 20:46 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm, Joao Martins

Dirty tracking will always be enabled with DBM=1 modifier enabled
by default.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 22 +++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 2cde14003469..bf0aac333725 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2787,6 +2787,27 @@ static int arm_smmu_read_and_clear_dirty(struct iommu_domain *domain,
 	return ret;
 }
 
+static int arm_smmu_set_dirty_tracking(struct iommu_domain *domain,
+				       bool enabled)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+		return -EINVAL;
+
+	if (!ops) {
+		pr_err_once("io-pgtable don't support dirty tracking\n");
+		return -ENODEV;
+	}
+
+	/*
+	 * Always enabled and the dirty bitmap is cleared prior to
+	 * set_dirty_tracking().
+	 */
+	return 0;
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
 	return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2916,6 +2937,7 @@ static struct iommu_ops arm_smmu_ops = {
 		.enable_nesting		= arm_smmu_enable_nesting,
 		.free			= arm_smmu_domain_free,
 		.read_and_clear_dirty	= arm_smmu_read_and_clear_dirty,
+		.set_dirty_tracking     = arm_smmu_set_dirty_tracking,
 	}
 };
 
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH RFCv2 24/24] iommu/arm-smmu-v3: Advertise IOMMU_DOMAIN_F_ENFORCE_DIRTY
  2023-05-18 20:46 [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking Joao Martins
                   ` (22 preceding siblings ...)
  2023-05-18 20:46 ` [PATCH RFCv2 23/24] iommu/arm-smmu-v3: Add set_dirty_tracking() support Joao Martins
@ 2023-05-18 20:46 ` Joao Martins
  2023-05-30 14:10   ` Shameerali Kolothum Thodi
  23 siblings, 1 reply; 65+ messages in thread
From: Joao Martins @ 2023-05-18 20:46 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm, Joao Martins

Now that we probe, and handle the DBM bit modifier, unblock
the kAPI usage by exposing the IOMMU_DOMAIN_F_ENFORCE_DIRTY
and implement it's requirement of revoking device attachment
in the iommu_capable. Finally expose the IOMMU_CAP_DIRTY to
users (IOMMUFD_DEVICE_GET_CAPS).

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index bf0aac333725..71dd95a687fd 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2014,6 +2014,8 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
 		return master->smmu->features & ARM_SMMU_FEAT_COHERENCY;
 	case IOMMU_CAP_NOEXEC:
 		return true;
+	case IOMMU_CAP_DIRTY:
+		return arm_smmu_dbm_capable(master->smmu);
 	default:
 		return false;
 	}
@@ -2430,6 +2432,11 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	master = dev_iommu_priv_get(dev);
 	smmu = master->smmu;
 
+	if (domain->flags & IOMMU_DOMAIN_F_ENFORCE_DIRTY &&
+	    !arm_smmu_dbm_capable(smmu))
+		return -EINVAL;
+
+
 	/*
 	 * Checking that SVA is disabled ensures that this device isn't bound to
 	 * any mm, and can be safely detached from its old domain. Bonds cannot
@@ -2913,6 +2920,7 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
 }
 
 static struct iommu_ops arm_smmu_ops = {
+	.supported_flags	= IOMMU_DOMAIN_F_ENFORCE_DIRTY,
 	.capable		= arm_smmu_capable,
 	.domain_alloc		= arm_smmu_domain_alloc,
 	.probe_device		= arm_smmu_probe_device,
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 03/24] vfio: Move iova_bitmap into iommu core
  2023-05-18 20:46 ` [PATCH RFCv2 03/24] vfio: Move iova_bitmap into iommu core Joao Martins
@ 2023-05-18 22:35   ` Alex Williamson
  2023-05-19  9:06     ` Joao Martins
  2023-05-19  9:01   ` Liu, Jingqi
  1 sibling, 1 reply; 65+ messages in thread
From: Alex Williamson @ 2023-05-18 22:35 UTC (permalink / raw)
  To: Joao Martins
  Cc: iommu, Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi,
	Lu Baolu, Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen,
	Joerg Roedel, Jean-Philippe Brucker, Suravee Suthikulpanit,
	Will Deacon, Robin Murphy, kvm

On Thu, 18 May 2023 21:46:29 +0100
Joao Martins <joao.m.martins@oracle.com> wrote:

> Both VFIO and IOMMUFD will need iova bitmap for storing dirties and walking
> the user bitmaps, so move to the common dependency into IOMMU core. IOMMUFD
> can't exactly host it given that VFIO dirty tracking can be used without
> IOMMUFD.
> 
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>  drivers/iommu/Makefile                | 1 +
>  drivers/{vfio => iommu}/iova_bitmap.c | 0
>  drivers/vfio/Makefile                 | 3 +--
>  3 files changed, 2 insertions(+), 2 deletions(-)
>  rename drivers/{vfio => iommu}/iova_bitmap.c (100%)
> 
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 769e43d780ce..9d9dfbd2dfc2 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -10,6 +10,7 @@ obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_DART) += io-pgtable-dart.o
>  obj-$(CONFIG_IOMMU_IOVA) += iova.o
> +obj-$(CONFIG_IOMMU_IOVA) += iova_bitmap.o
>  obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
>  obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
>  obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o
> diff --git a/drivers/vfio/iova_bitmap.c b/drivers/iommu/iova_bitmap.c
> similarity index 100%
> rename from drivers/vfio/iova_bitmap.c
> rename to drivers/iommu/iova_bitmap.c
> diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
> index 57c3515af606..f9cc32a9810c 100644
> --- a/drivers/vfio/Makefile
> +++ b/drivers/vfio/Makefile
> @@ -1,8 +1,7 @@
>  # SPDX-License-Identifier: GPL-2.0
>  obj-$(CONFIG_VFIO) += vfio.o
>  
> -vfio-y += vfio_main.o \
> -	  iova_bitmap.o
> +vfio-y += vfio_main.o
>  vfio-$(CONFIG_VFIO_DEVICE_CDEV) += device_cdev.o
>  vfio-$(CONFIG_VFIO_GROUP) += group.o
>  vfio-$(CONFIG_IOMMUFD) += iommufd.o

Doesn't this require more symbols to be exported for vfio?  I only see
iova_bitmap_set() as currently exported for vfio-pci variant drivers,
but vfio needs iova_bitmap_alloc(), iova_bitmap_free(), and
iova_bitmap_for_each().  Otherwise I'm happy to see it move to its new
home ;)  Thanks,

Alex


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 04/24] iommu: Add iommu_domain ops for dirty tracking
  2023-05-18 20:46 ` [PATCH RFCv2 04/24] iommu: Add iommu_domain ops for dirty tracking Joao Martins
@ 2023-05-19  8:42   ` Baolu Lu
  2023-05-19  9:28     ` Joao Martins
  2023-05-19 11:40   ` Jason Gunthorpe
  2023-05-19 13:22   ` Robin Murphy
  2 siblings, 1 reply; 65+ messages in thread
From: Baolu Lu @ 2023-05-19  8:42 UTC (permalink / raw)
  To: Joao Martins, iommu
  Cc: baolu.lu, Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm

On 2023/5/19 4:46, Joao Martins wrote:
> Add to iommu domain operations a set of callbacks to perform dirty
> tracking, particulary to start and stop tracking and finally to read and
> clear the dirty data.
> 
> Drivers are generally expected to dynamically change its translation
> structures to toggle the tracking and flush some form of control state
> structure that stands in the IOVA translation path. Though it's not
> mandatory, as drivers will be enable dirty tracking at boot, and just flush
> the IO pagetables when setting dirty tracking.  For each of the newly added
> IOMMU core APIs:
> 
> .supported_flags[IOMMU_DOMAIN_F_ENFORCE_DIRTY]: Introduce a set of flags
> that enforce certain restrictions in the iommu_domain object. For dirty
> tracking this means that when IOMMU_DOMAIN_F_ENFORCE_DIRTY is set via its
> helper iommu_domain_set_flags(...) devices attached via attach_dev will
> fail on devices that do*not*  have dirty tracking supported. IOMMU drivers
> that support dirty tracking should advertise this flag, while enforcing
> that dirty tracking is supported by the device in its .attach_dev iommu op.
> 
> iommu_cap::IOMMU_CAP_DIRTY: new device iommu_capable value when probing for
> capabilities of the device.
> 
> .set_dirty_tracking(): an iommu driver is expected to change its
> translation structures and enable dirty tracking for the devices in the
> iommu_domain. For drivers making dirty tracking always-enabled, it should
> just return 0.
> 
> .read_and_clear_dirty(): an iommu driver is expected to walk the iova range
> passed in and use iommu_dirty_bitmap_record() to record dirty info per
> IOVA. When detecting a given IOVA is dirty it should also clear its dirty
> state from the PTE,*unless*  the flag IOMMU_DIRTY_NO_CLEAR is passed in --
> flushing is steered from the caller of the domain_op via iotlb_gather. The
> iommu core APIs use the same data structure in use for dirty tracking for
> VFIO device dirty (struct iova_bitmap) abstracted by
> iommu_dirty_bitmap_record() helper function.
> 
> Signed-off-by: Joao Martins<joao.m.martins@oracle.com>
> ---
>   drivers/iommu/iommu.c      | 11 +++++++
>   include/linux/io-pgtable.h |  4 +++
>   include/linux/iommu.h      | 67 ++++++++++++++++++++++++++++++++++++++
>   3 files changed, 82 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 2088caae5074..95acc543e8fb 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2013,6 +2013,17 @@ struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus)
>   }
>   EXPORT_SYMBOL_GPL(iommu_domain_alloc);
>   
> +int iommu_domain_set_flags(struct iommu_domain *domain,
> +			   const struct bus_type *bus, unsigned long val)
> +{
> +	if (!(val & bus->iommu_ops->supported_flags))
> +		return -EINVAL;
> +
> +	domain->flags |= val;
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(iommu_domain_set_flags);

This seems to be a return to an old question. IOMMU domains are
allocated through buses, but can a domain be attached to devices on
different buses that happen to have different IOMMU ops? In reality, we
may not have such heterogeneous configurations yet, but it is better to
avoid such confusion as much as possible.

How about adding a domain op like .enforce_dirty_page_tracking. The
individual iommu driver which implements this callback will iterate all
devices that the domain has been attached and return success only if all
attached devices support dirty page tracking.

Then, in the domain's attach_dev or set_dev_pasid callbacks, if the
domain has been enforced dirty page tracking while the device to be
attached doesn't support it, -EINVAL will be returned which could be
intercepted by the caller as domain is incompatible.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 03/24] vfio: Move iova_bitmap into iommu core
  2023-05-18 20:46 ` [PATCH RFCv2 03/24] vfio: Move iova_bitmap into iommu core Joao Martins
  2023-05-18 22:35   ` Alex Williamson
@ 2023-05-19  9:01   ` Liu, Jingqi
  2023-05-19  9:07     ` Joao Martins
  1 sibling, 1 reply; 65+ messages in thread
From: Liu, Jingqi @ 2023-05-19  9:01 UTC (permalink / raw)
  To: Joao Martins, iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm


On 5/19/2023 4:46 AM, Joao Martins wrote:
> Both VFIO and IOMMUFD will need iova bitmap for storing dirties and walking
> the user bitmaps, so move to the common dependency into IOMMU core. IOMMUFD
s/move to/move
> can't exactly host it given that VFIO dirty tracking can be used without
> IOMMUFD.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>   drivers/iommu/Makefile                | 1 +
>   drivers/{vfio => iommu}/iova_bitmap.c | 0
>   drivers/vfio/Makefile                 | 3 +--
>   3 files changed, 2 insertions(+), 2 deletions(-)
>   rename drivers/{vfio => iommu}/iova_bitmap.c (100%)
>
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 769e43d780ce..9d9dfbd2dfc2 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -10,6 +10,7 @@ obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
>   obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
>   obj-$(CONFIG_IOMMU_IO_PGTABLE_DART) += io-pgtable-dart.o
>   obj-$(CONFIG_IOMMU_IOVA) += iova.o
> +obj-$(CONFIG_IOMMU_IOVA) += iova_bitmap.o
>   obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
>   obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
>   obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o
> diff --git a/drivers/vfio/iova_bitmap.c b/drivers/iommu/iova_bitmap.c
> similarity index 100%
> rename from drivers/vfio/iova_bitmap.c
> rename to drivers/iommu/iova_bitmap.c
> diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
> index 57c3515af606..f9cc32a9810c 100644
> --- a/drivers/vfio/Makefile
> +++ b/drivers/vfio/Makefile
> @@ -1,8 +1,7 @@
>   # SPDX-License-Identifier: GPL-2.0
>   obj-$(CONFIG_VFIO) += vfio.o
>   
> -vfio-y += vfio_main.o \
> -	  iova_bitmap.o
> +vfio-y += vfio_main.o
>   vfio-$(CONFIG_VFIO_DEVICE_CDEV) += device_cdev.o
>   vfio-$(CONFIG_VFIO_GROUP) += group.o
>   vfio-$(CONFIG_IOMMUFD) += iommufd.o
Thanks,
Jingqi

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 03/24] vfio: Move iova_bitmap into iommu core
  2023-05-18 22:35   ` Alex Williamson
@ 2023-05-19  9:06     ` Joao Martins
  0 siblings, 0 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-19  9:06 UTC (permalink / raw)
  To: Alex Williamson
  Cc: iommu, Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi,
	Lu Baolu, Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen,
	Joerg Roedel, Jean-Philippe Brucker, Suravee Suthikulpanit,
	Will Deacon, Robin Murphy, kvm



On 18/05/2023 23:35, Alex Williamson wrote:
> On Thu, 18 May 2023 21:46:29 +0100
> Joao Martins <joao.m.martins@oracle.com> wrote:
> 
>> Both VFIO and IOMMUFD will need iova bitmap for storing dirties and walking
>> the user bitmaps, so move to the common dependency into IOMMU core. IOMMUFD
>> can't exactly host it given that VFIO dirty tracking can be used without
>> IOMMUFD.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>  drivers/iommu/Makefile                | 1 +
>>  drivers/{vfio => iommu}/iova_bitmap.c | 0
>>  drivers/vfio/Makefile                 | 3 +--
>>  3 files changed, 2 insertions(+), 2 deletions(-)
>>  rename drivers/{vfio => iommu}/iova_bitmap.c (100%)
>>
>> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
>> index 769e43d780ce..9d9dfbd2dfc2 100644
>> --- a/drivers/iommu/Makefile
>> +++ b/drivers/iommu/Makefile
>> @@ -10,6 +10,7 @@ obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
>>  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
>>  obj-$(CONFIG_IOMMU_IO_PGTABLE_DART) += io-pgtable-dart.o
>>  obj-$(CONFIG_IOMMU_IOVA) += iova.o
>> +obj-$(CONFIG_IOMMU_IOVA) += iova_bitmap.o
>>  obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
>>  obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
>>  obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o
>> diff --git a/drivers/vfio/iova_bitmap.c b/drivers/iommu/iova_bitmap.c
>> similarity index 100%
>> rename from drivers/vfio/iova_bitmap.c
>> rename to drivers/iommu/iova_bitmap.c
>> diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
>> index 57c3515af606..f9cc32a9810c 100644
>> --- a/drivers/vfio/Makefile
>> +++ b/drivers/vfio/Makefile
>> @@ -1,8 +1,7 @@
>>  # SPDX-License-Identifier: GPL-2.0
>>  obj-$(CONFIG_VFIO) += vfio.o
>>  
>> -vfio-y += vfio_main.o \
>> -	  iova_bitmap.o
>> +vfio-y += vfio_main.o
>>  vfio-$(CONFIG_VFIO_DEVICE_CDEV) += device_cdev.o
>>  vfio-$(CONFIG_VFIO_GROUP) += group.o
>>  vfio-$(CONFIG_IOMMUFD) += iommufd.o
> 
> Doesn't this require more symbols to be exported for vfio?  I only see
> iova_bitmap_set() as currently exported for vfio-pci variant drivers,
> but vfio needs iova_bitmap_alloc(), iova_bitmap_free(), and
> iova_bitmap_for_each(). 

It does, my mistake. I was using builtin for rapid iteration and forgot the most
obvious thing to build iommufd=m I'll fix with with a precedecessor patch
exporting the needed symbols

> Otherwise I'm happy to see it move to its new
> home ;)  Thanks,
> 
;)

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 03/24] vfio: Move iova_bitmap into iommu core
  2023-05-19  9:01   ` Liu, Jingqi
@ 2023-05-19  9:07     ` Joao Martins
  0 siblings, 0 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-19  9:07 UTC (permalink / raw)
  To: Liu, Jingqi
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm, iommu

On 19/05/2023 10:01, Liu, Jingqi wrote:
> On 5/19/2023 4:46 AM, Joao Martins wrote:
>> Both VFIO and IOMMUFD will need iova bitmap for storing dirties and walking
>> the user bitmaps, so move to the common dependency into IOMMU core. IOMMUFD
> s/move to/move

Indeed

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 04/24] iommu: Add iommu_domain ops for dirty tracking
  2023-05-19  8:42   ` Baolu Lu
@ 2023-05-19  9:28     ` Joao Martins
  0 siblings, 0 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-19  9:28 UTC (permalink / raw)
  To: Baolu Lu, iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Yi Liu,
	Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm



On 19/05/2023 09:42, Baolu Lu wrote:
> On 2023/5/19 4:46, Joao Martins wrote:
>> Add to iommu domain operations a set of callbacks to perform dirty
>> tracking, particulary to start and stop tracking and finally to read and
>> clear the dirty data.
>>
>> Drivers are generally expected to dynamically change its translation
>> structures to toggle the tracking and flush some form of control state
>> structure that stands in the IOVA translation path. Though it's not
>> mandatory, as drivers will be enable dirty tracking at boot, and just flush
>> the IO pagetables when setting dirty tracking.  For each of the newly added
>> IOMMU core APIs:
>>
>> .supported_flags[IOMMU_DOMAIN_F_ENFORCE_DIRTY]: Introduce a set of flags
>> that enforce certain restrictions in the iommu_domain object. For dirty
>> tracking this means that when IOMMU_DOMAIN_F_ENFORCE_DIRTY is set via its
>> helper iommu_domain_set_flags(...) devices attached via attach_dev will
>> fail on devices that do*not*  have dirty tracking supported. IOMMU drivers
>> that support dirty tracking should advertise this flag, while enforcing
>> that dirty tracking is supported by the device in its .attach_dev iommu op.
>>
>> iommu_cap::IOMMU_CAP_DIRTY: new device iommu_capable value when probing for
>> capabilities of the device.
>>
>> .set_dirty_tracking(): an iommu driver is expected to change its
>> translation structures and enable dirty tracking for the devices in the
>> iommu_domain. For drivers making dirty tracking always-enabled, it should
>> just return 0.
>>
>> .read_and_clear_dirty(): an iommu driver is expected to walk the iova range
>> passed in and use iommu_dirty_bitmap_record() to record dirty info per
>> IOVA. When detecting a given IOVA is dirty it should also clear its dirty
>> state from the PTE,*unless*  the flag IOMMU_DIRTY_NO_CLEAR is passed in --
>> flushing is steered from the caller of the domain_op via iotlb_gather. The
>> iommu core APIs use the same data structure in use for dirty tracking for
>> VFIO device dirty (struct iova_bitmap) abstracted by
>> iommu_dirty_bitmap_record() helper function.
>>
>> Signed-off-by: Joao Martins<joao.m.martins@oracle.com>
>> ---
>>   drivers/iommu/iommu.c      | 11 +++++++
>>   include/linux/io-pgtable.h |  4 +++
>>   include/linux/iommu.h      | 67 ++++++++++++++++++++++++++++++++++++++
>>   3 files changed, 82 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 2088caae5074..95acc543e8fb 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -2013,6 +2013,17 @@ struct iommu_domain *iommu_domain_alloc(const struct
>> bus_type *bus)
>>   }
>>   EXPORT_SYMBOL_GPL(iommu_domain_alloc);
>>   +int iommu_domain_set_flags(struct iommu_domain *domain,
>> +               const struct bus_type *bus, unsigned long val)
>> +{
>> +    if (!(val & bus->iommu_ops->supported_flags))
>> +        return -EINVAL;
>> +
>> +    domain->flags |= val;
>> +    return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_domain_set_flags);
> 
> This seems to be a return to an old question. IOMMU domains are
> allocated through buses, but can a domain be attached to devices on
> different buses that happen to have different IOMMU ops? In reality, we
> may not have such heterogeneous configurations yet, but it is better to
> avoid such confusion as much as possible.
> 
> How about adding a domain op like .enforce_dirty_page_tracking. The
> individual iommu driver which implements this callback will iterate all
> devices that the domain has been attached and return success only if all
> attached devices support dirty page tracking.
> 

Hmm, but isn't the point is to actually prevent this from happening? Meaning to
ensure that only devices that support dirty tracking are attached in the domain?
The flag is meant to be advertise that individual domain knows about dirty
tracking enforcement, and it will validate it at .attach_dev when asked.

> Then, in the domain's attach_dev or set_dev_pasid callbacks, if the

I certainly didn't handle the ::set_dev_pasid callback, might have to fix next
iteration

> domain has been enforced dirty page tracking while the device to be
> attached doesn't support it, -EINVAL will be returned which could be
> intercepted by the caller as domain is incompatible.

This part is already done; I am just a little stuck on a
::enforce_dirty_tracking domain op being done /after/ devices are already
present in the domain. Note that today this is done right after we create the
hwpt (i.e. the iommu_domain) without devices being attached to it yet. I had a
separate version where I create a domain object with (bus, flags) as an
alternate incantation of this. In the variants alternatives I implemented,
ultimately picked this one as it could other similar things to sit on (e.g.
enforce_cache_coherency?)

I can switch to enforce_dirty_tracking instead of a flag, but I think it looks
more correct to ensure the property remains imutable at-domain-creation rather
than post dev attachment, unless I am missing something here.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 04/24] iommu: Add iommu_domain ops for dirty tracking
  2023-05-18 20:46 ` [PATCH RFCv2 04/24] iommu: Add iommu_domain ops for dirty tracking Joao Martins
  2023-05-19  8:42   ` Baolu Lu
@ 2023-05-19 11:40   ` Jason Gunthorpe
  2023-05-19 11:47     ` Joao Martins
  2023-05-19 13:22   ` Robin Murphy
  2 siblings, 1 reply; 65+ messages in thread
From: Jason Gunthorpe @ 2023-05-19 11:40 UTC (permalink / raw)
  To: Joao Martins
  Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
	Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm

On Thu, May 18, 2023 at 09:46:30PM +0100, Joao Martins wrote:
> Add to iommu domain operations a set of callbacks to perform dirty
> tracking, particulary to start and stop tracking and finally to read and
> clear the dirty data.
> 
> Drivers are generally expected to dynamically change its translation
> structures to toggle the tracking and flush some form of control state
> structure that stands in the IOVA translation path. Though it's not
> mandatory, as drivers will be enable dirty tracking at boot, and just flush
> the IO pagetables when setting dirty tracking.  For each of the newly added
> IOMMU core APIs:
> 
> .supported_flags[IOMMU_DOMAIN_F_ENFORCE_DIRTY]: Introduce a set of flags
> that enforce certain restrictions in the iommu_domain object. For dirty
> tracking this means that when IOMMU_DOMAIN_F_ENFORCE_DIRTY is set via its
> helper iommu_domain_set_flags(...) devices attached via attach_dev will
> fail on devices that do *not* have dirty tracking supported. IOMMU drivers
> that support dirty tracking should advertise this flag, while enforcing
> that dirty tracking is supported by the device in its .attach_dev iommu op.
> 
> iommu_cap::IOMMU_CAP_DIRTY: new device iommu_capable value when probing for
> capabilities of the device.
> 
> .set_dirty_tracking(): an iommu driver is expected to change its
> translation structures and enable dirty tracking for the devices in the
> iommu_domain. For drivers making dirty tracking always-enabled, it should
> just return 0.
> 
> .read_and_clear_dirty(): an iommu driver is expected to walk the iova range
> passed in and use iommu_dirty_bitmap_record() to record dirty info per
> IOVA. When detecting a given IOVA is dirty it should also clear its dirty
> state from the PTE, *unless* the flag IOMMU_DIRTY_NO_CLEAR is passed in --
> flushing is steered from the caller of the domain_op via iotlb_gather. The
> iommu core APIs use the same data structure in use for dirty tracking for
> VFIO device dirty (struct iova_bitmap) abstracted by
> iommu_dirty_bitmap_record() helper function.
> 
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>  drivers/iommu/iommu.c      | 11 +++++++
>  include/linux/io-pgtable.h |  4 +++
>  include/linux/iommu.h      | 67 ++++++++++++++++++++++++++++++++++++++
>  3 files changed, 82 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 2088caae5074..95acc543e8fb 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2013,6 +2013,17 @@ struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus)
>  }
>  EXPORT_SYMBOL_GPL(iommu_domain_alloc);
>  
> +int iommu_domain_set_flags(struct iommu_domain *domain,
> +			   const struct bus_type *bus, unsigned long val)
> +{

Definately no bus argument.

The supported_flags should be in the domain op not the bus op.

But I think this is sort of the wrong direction, the dirty tracking
mode should be requested when the domain is created, not changed after
the fact.

Jason

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 04/24] iommu: Add iommu_domain ops for dirty tracking
  2023-05-19 11:40   ` Jason Gunthorpe
@ 2023-05-19 11:47     ` Joao Martins
  2023-05-19 11:51       ` Jason Gunthorpe
  0 siblings, 1 reply; 65+ messages in thread
From: Joao Martins @ 2023-05-19 11:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
	Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm

On 19/05/2023 12:40, Jason Gunthorpe wrote:
> On Thu, May 18, 2023 at 09:46:30PM +0100, Joao Martins wrote:
>> Add to iommu domain operations a set of callbacks to perform dirty
>> tracking, particulary to start and stop tracking and finally to read and
>> clear the dirty data.
>>
>> Drivers are generally expected to dynamically change its translation
>> structures to toggle the tracking and flush some form of control state
>> structure that stands in the IOVA translation path. Though it's not
>> mandatory, as drivers will be enable dirty tracking at boot, and just flush
>> the IO pagetables when setting dirty tracking.  For each of the newly added
>> IOMMU core APIs:
>>
>> .supported_flags[IOMMU_DOMAIN_F_ENFORCE_DIRTY]: Introduce a set of flags
>> that enforce certain restrictions in the iommu_domain object. For dirty
>> tracking this means that when IOMMU_DOMAIN_F_ENFORCE_DIRTY is set via its
>> helper iommu_domain_set_flags(...) devices attached via attach_dev will
>> fail on devices that do *not* have dirty tracking supported. IOMMU drivers
>> that support dirty tracking should advertise this flag, while enforcing
>> that dirty tracking is supported by the device in its .attach_dev iommu op.
>>
>> iommu_cap::IOMMU_CAP_DIRTY: new device iommu_capable value when probing for
>> capabilities of the device.
>>
>> .set_dirty_tracking(): an iommu driver is expected to change its
>> translation structures and enable dirty tracking for the devices in the
>> iommu_domain. For drivers making dirty tracking always-enabled, it should
>> just return 0.
>>
>> .read_and_clear_dirty(): an iommu driver is expected to walk the iova range
>> passed in and use iommu_dirty_bitmap_record() to record dirty info per
>> IOVA. When detecting a given IOVA is dirty it should also clear its dirty
>> state from the PTE, *unless* the flag IOMMU_DIRTY_NO_CLEAR is passed in --
>> flushing is steered from the caller of the domain_op via iotlb_gather. The
>> iommu core APIs use the same data structure in use for dirty tracking for
>> VFIO device dirty (struct iova_bitmap) abstracted by
>> iommu_dirty_bitmap_record() helper function.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>  drivers/iommu/iommu.c      | 11 +++++++
>>  include/linux/io-pgtable.h |  4 +++
>>  include/linux/iommu.h      | 67 ++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 82 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 2088caae5074..95acc543e8fb 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -2013,6 +2013,17 @@ struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus)
>>  }
>>  EXPORT_SYMBOL_GPL(iommu_domain_alloc);
>>  
>> +int iommu_domain_set_flags(struct iommu_domain *domain,
>> +			   const struct bus_type *bus, unsigned long val)
>> +{
> 
> Definately no bus argument.
> 
> The supported_flags should be in the domain op not the bus op.
> 
> But I think this is sort of the wrong direction, the dirty tracking
> mode should be requested when the domain is created, not changed after
> the fact.

In practice it is done as soon after the domain is created but I understand what
you mean that both should be together; I have this implemented like that as my
first take as a domain_alloc passed flags, but I was a little undecided because
we are adding another domain_alloc() op for the user-managed pagetable and after
having another one we would end up with 3 ways of creating iommu domain -- but
maybe that's not an issue

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 04/24] iommu: Add iommu_domain ops for dirty tracking
  2023-05-19 11:47     ` Joao Martins
@ 2023-05-19 11:51       ` Jason Gunthorpe
  2023-05-19 11:56         ` Joao Martins
  2023-05-19 12:13         ` Baolu Lu
  0 siblings, 2 replies; 65+ messages in thread
From: Jason Gunthorpe @ 2023-05-19 11:51 UTC (permalink / raw)
  To: Joao Martins
  Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
	Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm

On Fri, May 19, 2023 at 12:47:24PM +0100, Joao Martins wrote:

> In practice it is done as soon after the domain is created but I understand what
> you mean that both should be together; I have this implemented like that as my
> first take as a domain_alloc passed flags, but I was a little undecided because
> we are adding another domain_alloc() op for the user-managed pagetable and after
> having another one we would end up with 3 ways of creating iommu domain -- but
> maybe that's not an issue

It should ride on the same user domain alloc op as some generic flags,
there is no immediate use case to enable dirty tracking for
non-iommufd page tables

Jason

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 04/24] iommu: Add iommu_domain ops for dirty tracking
  2023-05-19 11:51       ` Jason Gunthorpe
@ 2023-05-19 11:56         ` Joao Martins
  2023-05-19 13:29           ` Jason Gunthorpe
  2023-05-19 12:13         ` Baolu Lu
  1 sibling, 1 reply; 65+ messages in thread
From: Joao Martins @ 2023-05-19 11:56 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
	Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm



On 19/05/2023 12:51, Jason Gunthorpe wrote:
> On Fri, May 19, 2023 at 12:47:24PM +0100, Joao Martins wrote:
> 
>> In practice it is done as soon after the domain is created but I understand what
>> you mean that both should be together; I have this implemented like that as my
>> first take as a domain_alloc passed flags, but I was a little undecided because
>> we are adding another domain_alloc() op for the user-managed pagetable and after
>> having another one we would end up with 3 ways of creating iommu domain -- but
>> maybe that's not an issue
> 
> It should ride on the same user domain alloc op as some generic flags,

OK, I suppose that makes sense specially with this being tied in HWPT_ALLOC
where all this new user domain alloc does.

> there is no immediate use case to enable dirty tracking for
> non-iommufd page tables

True

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 04/24] iommu: Add iommu_domain ops for dirty tracking
  2023-05-19 11:51       ` Jason Gunthorpe
  2023-05-19 11:56         ` Joao Martins
@ 2023-05-19 12:13         ` Baolu Lu
  1 sibling, 0 replies; 65+ messages in thread
From: Baolu Lu @ 2023-05-19 12:13 UTC (permalink / raw)
  To: Jason Gunthorpe, Joao Martins
  Cc: baolu.lu, iommu, Kevin Tian, Shameerali Kolothum Thodi, Yi Liu,
	Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm

On 2023/5/19 19:51, Jason Gunthorpe wrote:
> On Fri, May 19, 2023 at 12:47:24PM +0100, Joao Martins wrote:
> 
>> In practice it is done as soon after the domain is created but I understand what
>> you mean that both should be together; I have this implemented like that as my
>> first take as a domain_alloc passed flags, but I was a little undecided because
>> we are adding another domain_alloc() op for the user-managed pagetable and after
>> having another one we would end up with 3 ways of creating iommu domain -- but
>> maybe that's not an issue
> It should ride on the same user domain alloc op as some generic flags,
> there is no immediate use case to enable dirty tracking for
> non-iommufd page tables

This is better than the current solution.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 04/24] iommu: Add iommu_domain ops for dirty tracking
  2023-05-18 20:46 ` [PATCH RFCv2 04/24] iommu: Add iommu_domain ops for dirty tracking Joao Martins
  2023-05-19  8:42   ` Baolu Lu
  2023-05-19 11:40   ` Jason Gunthorpe
@ 2023-05-19 13:22   ` Robin Murphy
  2023-05-19 13:43     ` Joao Martins
  2 siblings, 1 reply; 65+ messages in thread
From: Robin Murphy @ 2023-05-19 13:22 UTC (permalink / raw)
  To: Joao Martins, iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Alex Williamson, kvm

On 2023-05-18 21:46, Joao Martins wrote:
> Add to iommu domain operations a set of callbacks to perform dirty
> tracking, particulary to start and stop tracking and finally to read and
> clear the dirty data.
> 
> Drivers are generally expected to dynamically change its translation
> structures to toggle the tracking and flush some form of control state
> structure that stands in the IOVA translation path. Though it's not
> mandatory, as drivers will be enable dirty tracking at boot, and just flush
> the IO pagetables when setting dirty tracking.  For each of the newly added
> IOMMU core APIs:
> 
> .supported_flags[IOMMU_DOMAIN_F_ENFORCE_DIRTY]: Introduce a set of flags
> that enforce certain restrictions in the iommu_domain object. For dirty
> tracking this means that when IOMMU_DOMAIN_F_ENFORCE_DIRTY is set via its
> helper iommu_domain_set_flags(...) devices attached via attach_dev will
> fail on devices that do *not* have dirty tracking supported. IOMMU drivers
> that support dirty tracking should advertise this flag, while enforcing
> that dirty tracking is supported by the device in its .attach_dev iommu op.

Eww, no. For an internal thing, just call ->capable() - I mean, you're 
literally adding this feature as one of its caps...

However I'm not sure if we even need that - domains which don't support 
dirty tracking should just not expose the ops, and thus it ought to be 
inherently obvious.

I'm guessing most of the weirdness here is implicitly working around the 
enabled-from-the-start scenario on SMMUv3:

	domain = iommu_domain_alloc(bus);
	iommu_set_dirty_tracking(domain);
	// arm-smmu-v3 says OK since it doesn't know that it
	// definitely *isn't* possible, and saying no wouldn't
	// be helpful
	iommu_attach_group(group, domain);
	// oops, now we see that the relevant SMMU instance isn't one
	// which actually supports HTTU, what do we do? :(

I don't have any major objection to the general principle of flagging 
the domain to fail attach if it can't do what we promised, as a bodge 
for now, but please implement it privately in arm-smmu-v3 so it's easier 
to clean up again in future once until iommu_domain_alloc() gets sorted 
out properly to get rid of this awkward blind spot.

Thanks,
Robin.

> iommu_cap::IOMMU_CAP_DIRTY: new device iommu_capable value when probing for
> capabilities of the device.
> 
> .set_dirty_tracking(): an iommu driver is expected to change its
> translation structures and enable dirty tracking for the devices in the
> iommu_domain. For drivers making dirty tracking always-enabled, it should
> just return 0.
> 
> .read_and_clear_dirty(): an iommu driver is expected to walk the iova range
> passed in and use iommu_dirty_bitmap_record() to record dirty info per
> IOVA. When detecting a given IOVA is dirty it should also clear its dirty
> state from the PTE, *unless* the flag IOMMU_DIRTY_NO_CLEAR is passed in --
> flushing is steered from the caller of the domain_op via iotlb_gather. The
> iommu core APIs use the same data structure in use for dirty tracking for
> VFIO device dirty (struct iova_bitmap) abstracted by
> iommu_dirty_bitmap_record() helper function.
> 
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>   drivers/iommu/iommu.c      | 11 +++++++
>   include/linux/io-pgtable.h |  4 +++
>   include/linux/iommu.h      | 67 ++++++++++++++++++++++++++++++++++++++
>   3 files changed, 82 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 2088caae5074..95acc543e8fb 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2013,6 +2013,17 @@ struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus)
>   }
>   EXPORT_SYMBOL_GPL(iommu_domain_alloc);
>   
> +int iommu_domain_set_flags(struct iommu_domain *domain,
> +			   const struct bus_type *bus, unsigned long val)
> +{
> +	if (!(val & bus->iommu_ops->supported_flags))
> +		return -EINVAL;
> +
> +	domain->flags |= val;
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(iommu_domain_set_flags);
> +
>   void iommu_domain_free(struct iommu_domain *domain)
>   {
>   	if (domain->type == IOMMU_DOMAIN_SVA)
> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
> index 1b7a44b35616..25142a0e2fc2 100644
> --- a/include/linux/io-pgtable.h
> +++ b/include/linux/io-pgtable.h
> @@ -166,6 +166,10 @@ struct io_pgtable_ops {
>   			      struct iommu_iotlb_gather *gather);
>   	phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops,
>   				    unsigned long iova);
> +	int (*read_and_clear_dirty)(struct io_pgtable_ops *ops,
> +				    unsigned long iova, size_t size,
> +				    unsigned long flags,
> +				    struct iommu_dirty_bitmap *dirty);
>   };
>   
>   /**
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 39d25645a5ab..992ea87f2f8e 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -13,6 +13,7 @@
>   #include <linux/errno.h>
>   #include <linux/err.h>
>   #include <linux/of.h>
> +#include <linux/iova_bitmap.h>
>   #include <uapi/linux/iommu.h>
>   
>   #define IOMMU_READ	(1 << 0)
> @@ -65,6 +66,11 @@ struct iommu_domain_geometry {
>   
>   #define __IOMMU_DOMAIN_SVA	(1U << 4)  /* Shared process address space */
>   
> +/* Domain feature flags that do not define domain types */
> +#define IOMMU_DOMAIN_F_ENFORCE_DIRTY	(1U << 6)  /* Enforce attachment of
> +						      dirty tracking supported
> +						      devices		  */
> +
>   /*
>    * This are the possible domain-types
>    *
> @@ -93,6 +99,7 @@ struct iommu_domain_geometry {
>   
>   struct iommu_domain {
>   	unsigned type;
> +	unsigned flags;
>   	const struct iommu_domain_ops *ops;
>   	unsigned long pgsize_bitmap;	/* Bitmap of page sizes in use */
>   	struct iommu_domain_geometry geometry;
> @@ -128,6 +135,7 @@ enum iommu_cap {
>   	 * this device.
>   	 */
>   	IOMMU_CAP_ENFORCE_CACHE_COHERENCY,
> +	IOMMU_CAP_DIRTY,		/* IOMMU supports dirty tracking */
>   };
>   
>   /* These are the possible reserved region types */
> @@ -220,6 +228,17 @@ struct iommu_iotlb_gather {
>   	bool			queued;
>   };
>   
> +/**
> + * struct iommu_dirty_bitmap - Dirty IOVA bitmap state
> + *
> + * @bitmap: IOVA bitmap
> + * @gather: Range information for a pending IOTLB flush
> + */
> +struct iommu_dirty_bitmap {
> +	struct iova_bitmap *bitmap;
> +	struct iommu_iotlb_gather *gather;
> +};
> +
>   /**
>    * struct iommu_ops - iommu ops and capabilities
>    * @capable: check capability
> @@ -248,6 +267,7 @@ struct iommu_iotlb_gather {
>    *                    pasid, so that any DMA transactions with this pasid
>    *                    will be blocked by the hardware.
>    * @pgsize_bitmap: bitmap of all possible supported page sizes
> + * @flags: All non domain type supported features
>    * @owner: Driver module providing these ops
>    */
>   struct iommu_ops {
> @@ -281,6 +301,7 @@ struct iommu_ops {
>   
>   	const struct iommu_domain_ops *default_domain_ops;
>   	unsigned long pgsize_bitmap;
> +	unsigned long supported_flags;
>   	struct module *owner;
>   };
>   
> @@ -316,6 +337,11 @@ struct iommu_ops {
>    * @enable_nesting: Enable nesting
>    * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
>    * @free: Release the domain after use.
> + * @set_dirty_tracking: Enable or Disable dirty tracking on the iommu domain
> + * @read_and_clear_dirty: Walk IOMMU page tables for dirtied PTEs marshalled
> + *                        into a bitmap, with a bit represented as a page.
> + *                        Reads the dirty PTE bits and clears it from IO
> + *                        pagetables.
>    */
>   struct iommu_domain_ops {
>   	int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
> @@ -348,6 +374,12 @@ struct iommu_domain_ops {
>   				  unsigned long quirks);
>   
>   	void (*free)(struct iommu_domain *domain);
> +
> +	int (*set_dirty_tracking)(struct iommu_domain *domain, bool enabled);
> +	int (*read_and_clear_dirty)(struct iommu_domain *domain,
> +				    unsigned long iova, size_t size,
> +				    unsigned long flags,
> +				    struct iommu_dirty_bitmap *dirty);
>   };
>   
>   /**
> @@ -461,6 +493,9 @@ extern bool iommu_present(const struct bus_type *bus);
>   extern bool device_iommu_capable(struct device *dev, enum iommu_cap cap);
>   extern bool iommu_group_has_isolated_msi(struct iommu_group *group);
>   extern struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus);
> +extern int iommu_domain_set_flags(struct iommu_domain *domain,
> +				  const struct bus_type *bus,
> +				  unsigned long flags);
>   extern void iommu_domain_free(struct iommu_domain *domain);
>   extern int iommu_attach_device(struct iommu_domain *domain,
>   			       struct device *dev);
> @@ -627,6 +662,28 @@ static inline bool iommu_iotlb_gather_queued(struct iommu_iotlb_gather *gather)
>   	return gather && gather->queued;
>   }
>   
> +static inline void iommu_dirty_bitmap_init(struct iommu_dirty_bitmap *dirty,
> +					   struct iova_bitmap *bitmap,
> +					   struct iommu_iotlb_gather *gather)
> +{
> +	if (gather)
> +		iommu_iotlb_gather_init(gather);
> +
> +	dirty->bitmap = bitmap;
> +	dirty->gather = gather;
> +}
> +
> +static inline void
> +iommu_dirty_bitmap_record(struct iommu_dirty_bitmap *dirty, unsigned long iova,
> +			  unsigned long length)
> +{
> +	if (dirty->bitmap)
> +		iova_bitmap_set(dirty->bitmap, iova, length);
> +
> +	if (dirty->gather)
> +		iommu_iotlb_gather_add_range(dirty->gather, iova, length);
> +}
> +
>   /* PCI device grouping function */
>   extern struct iommu_group *pci_device_group(struct device *dev);
>   /* Generic device grouping function */
> @@ -657,6 +714,9 @@ struct iommu_fwspec {
>   /* ATS is supported */
>   #define IOMMU_FWSPEC_PCI_RC_ATS			(1 << 0)
>   
> +/* Read but do not clear any dirty bits */
> +#define IOMMU_DIRTY_NO_CLEAR			(1 << 0)
> +
>   /**
>    * struct iommu_sva - handle to a device-mm bond
>    */
> @@ -755,6 +815,13 @@ static inline struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus
>   	return NULL;
>   }
>   
> +static inline int iommu_domain_set_flags(struct iommu_domain *domain,
> +					 const struct bus_type *bus,
> +					 unsigned long flags)
> +{
> +	return -ENODEV;
> +}
> +
>   static inline void iommu_domain_free(struct iommu_domain *domain)
>   {
>   }

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 04/24] iommu: Add iommu_domain ops for dirty tracking
  2023-05-19 11:56         ` Joao Martins
@ 2023-05-19 13:29           ` Jason Gunthorpe
  2023-05-19 13:46             ` Joao Martins
  2023-08-10 18:23             ` Joao Martins
  0 siblings, 2 replies; 65+ messages in thread
From: Jason Gunthorpe @ 2023-05-19 13:29 UTC (permalink / raw)
  To: Joao Martins
  Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
	Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm

On Fri, May 19, 2023 at 12:56:19PM +0100, Joao Martins wrote:
> 
> 
> On 19/05/2023 12:51, Jason Gunthorpe wrote:
> > On Fri, May 19, 2023 at 12:47:24PM +0100, Joao Martins wrote:
> > 
> >> In practice it is done as soon after the domain is created but I understand what
> >> you mean that both should be together; I have this implemented like that as my
> >> first take as a domain_alloc passed flags, but I was a little undecided because
> >> we are adding another domain_alloc() op for the user-managed pagetable and after
> >> having another one we would end up with 3 ways of creating iommu domain -- but
> >> maybe that's not an issue
> > 
> > It should ride on the same user domain alloc op as some generic flags,
> 
> OK, I suppose that makes sense specially with this being tied in HWPT_ALLOC
> where all this new user domain alloc does.

Yes, it should be easy.

Then do what Robin said and make the domain ops NULL if the user
didn't ask for dirty tracking and then attach can fail if there are
domain incompatibility's.

Since alloc_user (or whatever it settles into) will have the struct
device * argument this should be easy enough with out getting mixed
with the struct bus cleanup.

Jason

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 01/24] iommu: Add RCU-protected page free support
  2023-05-18 20:46 ` [PATCH RFCv2 01/24] iommu: Add RCU-protected page free support Joao Martins
@ 2023-05-19 13:32   ` Jason Gunthorpe
  2023-05-19 16:48     ` Joao Martins
  0 siblings, 1 reply; 65+ messages in thread
From: Jason Gunthorpe @ 2023-05-19 13:32 UTC (permalink / raw)
  To: Joao Martins
  Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
	Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm

On Thu, May 18, 2023 at 09:46:27PM +0100, Joao Martins wrote:
> From: Lu Baolu <baolu.lu@linux.intel.com>
> 
> The IOMMU page tables are updated using iommu_map/unmap() interfaces.
> Currently, there is no mandatory requirement for drivers to use locks
> to ensure concurrent updates to page tables, because it's assumed that
> overlapping IOVA ranges do not have concurrent updates. Therefore the
> IOMMU drivers only need to take care of concurrent updates to level
> page table entries.
> 
> But enabling new features challenges this assumption. For example, the
> hardware assisted dirty page tracking feature requires scanning page
> tables in interfaces other than mapping and unmapping. This might result
> in a use-after-free scenario in which a level page table has been freed
> by the unmap() interface, while another thread is scanning the next level
> page table.

I'm not convinced.. The basic model we have is that the caller has to
bring the range locking and the caller has to promise it doesn't do
overlapping things to ranges.

iommufd implements this with area based IOVA range locking.

So, I don't really see an obvious reason why we can't also require
that the dirty reporting hold the area lock and domain locks while it
is calling the iommu driver?

Then we don't have a locking or RCU problem here.

Jason

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 05/24] iommufd: Add a flag to enforce dirty tracking on attach
  2023-05-18 20:46 ` [PATCH RFCv2 05/24] iommufd: Add a flag to enforce dirty tracking on attach Joao Martins
@ 2023-05-19 13:34   ` Jason Gunthorpe
  0 siblings, 0 replies; 65+ messages in thread
From: Jason Gunthorpe @ 2023-05-19 13:34 UTC (permalink / raw)
  To: Joao Martins
  Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
	Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm

On Thu, May 18, 2023 at 09:46:31PM +0100, Joao Martins wrote:
> Throughout IOMMU domain lifetime that wants to use dirty tracking, some
> guarantees are needed such that any device attached to the iommu_domain
> supports dirty tracking.
> 
> The idea is to handle a case where IOMMUs are assymetric feature-wise and
> thus the capability may not be advertised for all devices.  This is done by
> adding a flag into HWPT_ALLOC namely:
> 
> 	IOMMUFD_HWPT_ALLOC_ENFORCE_DIRTY

The flag to userspace makes sense but it should flow through as a flag
to alloc domain user not as an enforce op.

The enforce op exists for the wbinvd thing because of historical
reasons where it was supposed to auto-detect, it is not a great
pattern to copy.

Jason

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 07/24] iommufd/selftest: Test IOMMU_HWPT_ALLOC_ENFORCE_DIRTY
  2023-05-18 20:46 ` [PATCH RFCv2 07/24] iommufd/selftest: Test IOMMU_HWPT_ALLOC_ENFORCE_DIRTY Joao Martins
@ 2023-05-19 13:35   ` Jason Gunthorpe
  2023-05-19 13:52     ` Joao Martins
  2023-05-19 13:55   ` Joao Martins
  1 sibling, 1 reply; 65+ messages in thread
From: Jason Gunthorpe @ 2023-05-19 13:35 UTC (permalink / raw)
  To: Joao Martins
  Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
	Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm

On Thu, May 18, 2023 at 09:46:33PM +0100, Joao Martins wrote:

> @@ -50,6 +54,7 @@ struct iommu_test_cmd {
>  			__aligned_u64 length;
>  		} add_reserved;
>  		struct {
> +			__u32 dev_flags;
>  			__u32 out_stdev_id;
>  			__u32 out_hwpt_id;

Don't break the ABI needlessly, syzkaller relies on this

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 04/24] iommu: Add iommu_domain ops for dirty tracking
  2023-05-19 13:22   ` Robin Murphy
@ 2023-05-19 13:43     ` Joao Martins
  2023-05-19 18:12       ` Robin Murphy
  0 siblings, 1 reply; 65+ messages in thread
From: Joao Martins @ 2023-05-19 13:43 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Alex Williamson, kvm, iommu

On 19/05/2023 14:22, Robin Murphy wrote:
> On 2023-05-18 21:46, Joao Martins wrote:
>> Add to iommu domain operations a set of callbacks to perform dirty
>> tracking, particulary to start and stop tracking and finally to read and
>> clear the dirty data.
>>
>> Drivers are generally expected to dynamically change its translation
>> structures to toggle the tracking and flush some form of control state
>> structure that stands in the IOVA translation path. Though it's not
>> mandatory, as drivers will be enable dirty tracking at boot, and just flush
>> the IO pagetables when setting dirty tracking.  For each of the newly added
>> IOMMU core APIs:
>>
>> .supported_flags[IOMMU_DOMAIN_F_ENFORCE_DIRTY]: Introduce a set of flags
>> that enforce certain restrictions in the iommu_domain object. For dirty
>> tracking this means that when IOMMU_DOMAIN_F_ENFORCE_DIRTY is set via its
>> helper iommu_domain_set_flags(...) devices attached via attach_dev will
>> fail on devices that do *not* have dirty tracking supported. IOMMU drivers
>> that support dirty tracking should advertise this flag, while enforcing
>> that dirty tracking is supported by the device in its .attach_dev iommu op.
> 
> Eww, no. For an internal thing, just call ->capable() - I mean, you're literally
> adding this feature as one of its caps...
> 
> However I'm not sure if we even need that - domains which don't support dirty
> tracking should just not expose the ops, and thus it ought to be inherently
> obvious.
> 
OK.

> I'm guessing most of the weirdness here is implicitly working around the
> enabled-from-the-start scenario on SMMUv3:
> 
It has nothing to do with SMMUv3. This is to futureproof the case where the
IOMMU capabilities are not homogeneous, even though, it isn't the case today.

The only thing SMMUv3 that kinda comes for free in this series is clearing
dirties before setting dirty tracking, because [it is always enabled]. But that
is needed regardless of SMMUv3.

>     domain = iommu_domain_alloc(bus);
>     iommu_set_dirty_tracking(domain);
>     // arm-smmu-v3 says OK since it doesn't know that it
>     // definitely *isn't* possible, and saying no wouldn't
>     // be helpful
>     iommu_attach_group(group, domain);
>     // oops, now we see that the relevant SMMU instance isn't one
>     // which actually supports HTTU, what do we do? :(
> 
> I don't have any major objection to the general principle of flagging the domain
> to fail attach if it can't do what we promised 

This is the reason why the I had the flag (or now a domain_alloc flag)...

> , as a bodge for now, but please
> implement it privately in arm-smmu-v3 so it's easier to clean up again in future
> once until iommu_domain_alloc() gets sorted out properly to get rid of this
> awkward blind spot.
> 

But it wasn't related to smmu-v3 logic.

All it is meant is to guarantee that when we only ever have dirty tracking
supported in a single domain. And we don't want that to change throughout the
lifetime of the domain.

> Thanks,
> Robin.
> 
>> iommu_cap::IOMMU_CAP_DIRTY: new device iommu_capable value when probing for
>> capabilities of the device.
>>
>> .set_dirty_tracking(): an iommu driver is expected to change its
>> translation structures and enable dirty tracking for the devices in the
>> iommu_domain. For drivers making dirty tracking always-enabled, it should
>> just return 0.
>>
>> .read_and_clear_dirty(): an iommu driver is expected to walk the iova range
>> passed in and use iommu_dirty_bitmap_record() to record dirty info per
>> IOVA. When detecting a given IOVA is dirty it should also clear its dirty
>> state from the PTE, *unless* the flag IOMMU_DIRTY_NO_CLEAR is passed in --
>> flushing is steered from the caller of the domain_op via iotlb_gather. The
>> iommu core APIs use the same data structure in use for dirty tracking for
>> VFIO device dirty (struct iova_bitmap) abstracted by
>> iommu_dirty_bitmap_record() helper function.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>   drivers/iommu/iommu.c      | 11 +++++++
>>   include/linux/io-pgtable.h |  4 +++
>>   include/linux/iommu.h      | 67 ++++++++++++++++++++++++++++++++++++++
>>   3 files changed, 82 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 2088caae5074..95acc543e8fb 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -2013,6 +2013,17 @@ struct iommu_domain *iommu_domain_alloc(const struct
>> bus_type *bus)
>>   }
>>   EXPORT_SYMBOL_GPL(iommu_domain_alloc);
>>   +int iommu_domain_set_flags(struct iommu_domain *domain,
>> +               const struct bus_type *bus, unsigned long val)
>> +{
>> +    if (!(val & bus->iommu_ops->supported_flags))
>> +        return -EINVAL;
>> +
>> +    domain->flags |= val;
>> +    return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_domain_set_flags);
>> +
>>   void iommu_domain_free(struct iommu_domain *domain)
>>   {
>>       if (domain->type == IOMMU_DOMAIN_SVA)
>> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
>> index 1b7a44b35616..25142a0e2fc2 100644
>> --- a/include/linux/io-pgtable.h
>> +++ b/include/linux/io-pgtable.h
>> @@ -166,6 +166,10 @@ struct io_pgtable_ops {
>>                     struct iommu_iotlb_gather *gather);
>>       phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops,
>>                       unsigned long iova);
>> +    int (*read_and_clear_dirty)(struct io_pgtable_ops *ops,
>> +                    unsigned long iova, size_t size,
>> +                    unsigned long flags,
>> +                    struct iommu_dirty_bitmap *dirty);
>>   };
>>     /**
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 39d25645a5ab..992ea87f2f8e 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -13,6 +13,7 @@
>>   #include <linux/errno.h>
>>   #include <linux/err.h>
>>   #include <linux/of.h>
>> +#include <linux/iova_bitmap.h>
>>   #include <uapi/linux/iommu.h>
>>     #define IOMMU_READ    (1 << 0)
>> @@ -65,6 +66,11 @@ struct iommu_domain_geometry {
>>     #define __IOMMU_DOMAIN_SVA    (1U << 4)  /* Shared process address space */
>>   +/* Domain feature flags that do not define domain types */
>> +#define IOMMU_DOMAIN_F_ENFORCE_DIRTY    (1U << 6)  /* Enforce attachment of
>> +                              dirty tracking supported
>> +                              devices          */
>> +
>>   /*
>>    * This are the possible domain-types
>>    *
>> @@ -93,6 +99,7 @@ struct iommu_domain_geometry {
>>     struct iommu_domain {
>>       unsigned type;
>> +    unsigned flags;
>>       const struct iommu_domain_ops *ops;
>>       unsigned long pgsize_bitmap;    /* Bitmap of page sizes in use */
>>       struct iommu_domain_geometry geometry;
>> @@ -128,6 +135,7 @@ enum iommu_cap {
>>        * this device.
>>        */
>>       IOMMU_CAP_ENFORCE_CACHE_COHERENCY,
>> +    IOMMU_CAP_DIRTY,        /* IOMMU supports dirty tracking */
>>   };
>>     /* These are the possible reserved region types */
>> @@ -220,6 +228,17 @@ struct iommu_iotlb_gather {
>>       bool            queued;
>>   };
>>   +/**
>> + * struct iommu_dirty_bitmap - Dirty IOVA bitmap state
>> + *
>> + * @bitmap: IOVA bitmap
>> + * @gather: Range information for a pending IOTLB flush
>> + */
>> +struct iommu_dirty_bitmap {
>> +    struct iova_bitmap *bitmap;
>> +    struct iommu_iotlb_gather *gather;
>> +};
>> +
>>   /**
>>    * struct iommu_ops - iommu ops and capabilities
>>    * @capable: check capability
>> @@ -248,6 +267,7 @@ struct iommu_iotlb_gather {
>>    *                    pasid, so that any DMA transactions with this pasid
>>    *                    will be blocked by the hardware.
>>    * @pgsize_bitmap: bitmap of all possible supported page sizes
>> + * @flags: All non domain type supported features
>>    * @owner: Driver module providing these ops
>>    */
>>   struct iommu_ops {
>> @@ -281,6 +301,7 @@ struct iommu_ops {
>>         const struct iommu_domain_ops *default_domain_ops;
>>       unsigned long pgsize_bitmap;
>> +    unsigned long supported_flags;
>>       struct module *owner;
>>   };
>>   @@ -316,6 +337,11 @@ struct iommu_ops {
>>    * @enable_nesting: Enable nesting
>>    * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
>>    * @free: Release the domain after use.
>> + * @set_dirty_tracking: Enable or Disable dirty tracking on the iommu domain
>> + * @read_and_clear_dirty: Walk IOMMU page tables for dirtied PTEs marshalled
>> + *                        into a bitmap, with a bit represented as a page.
>> + *                        Reads the dirty PTE bits and clears it from IO
>> + *                        pagetables.
>>    */
>>   struct iommu_domain_ops {
>>       int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
>> @@ -348,6 +374,12 @@ struct iommu_domain_ops {
>>                     unsigned long quirks);
>>         void (*free)(struct iommu_domain *domain);
>> +
>> +    int (*set_dirty_tracking)(struct iommu_domain *domain, bool enabled);
>> +    int (*read_and_clear_dirty)(struct iommu_domain *domain,
>> +                    unsigned long iova, size_t size,
>> +                    unsigned long flags,
>> +                    struct iommu_dirty_bitmap *dirty);
>>   };
>>     /**
>> @@ -461,6 +493,9 @@ extern bool iommu_present(const struct bus_type *bus);
>>   extern bool device_iommu_capable(struct device *dev, enum iommu_cap cap);
>>   extern bool iommu_group_has_isolated_msi(struct iommu_group *group);
>>   extern struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus);
>> +extern int iommu_domain_set_flags(struct iommu_domain *domain,
>> +                  const struct bus_type *bus,
>> +                  unsigned long flags);
>>   extern void iommu_domain_free(struct iommu_domain *domain);
>>   extern int iommu_attach_device(struct iommu_domain *domain,
>>                      struct device *dev);
>> @@ -627,6 +662,28 @@ static inline bool iommu_iotlb_gather_queued(struct
>> iommu_iotlb_gather *gather)
>>       return gather && gather->queued;
>>   }
>>   +static inline void iommu_dirty_bitmap_init(struct iommu_dirty_bitmap *dirty,
>> +                       struct iova_bitmap *bitmap,
>> +                       struct iommu_iotlb_gather *gather)
>> +{
>> +    if (gather)
>> +        iommu_iotlb_gather_init(gather);
>> +
>> +    dirty->bitmap = bitmap;
>> +    dirty->gather = gather;
>> +}
>> +
>> +static inline void
>> +iommu_dirty_bitmap_record(struct iommu_dirty_bitmap *dirty, unsigned long iova,
>> +              unsigned long length)
>> +{
>> +    if (dirty->bitmap)
>> +        iova_bitmap_set(dirty->bitmap, iova, length);
>> +
>> +    if (dirty->gather)
>> +        iommu_iotlb_gather_add_range(dirty->gather, iova, length);
>> +}
>> +
>>   /* PCI device grouping function */
>>   extern struct iommu_group *pci_device_group(struct device *dev);
>>   /* Generic device grouping function */
>> @@ -657,6 +714,9 @@ struct iommu_fwspec {
>>   /* ATS is supported */
>>   #define IOMMU_FWSPEC_PCI_RC_ATS            (1 << 0)
>>   +/* Read but do not clear any dirty bits */
>> +#define IOMMU_DIRTY_NO_CLEAR            (1 << 0)
>> +
>>   /**
>>    * struct iommu_sva - handle to a device-mm bond
>>    */
>> @@ -755,6 +815,13 @@ static inline struct iommu_domain
>> *iommu_domain_alloc(const struct bus_type *bus
>>       return NULL;
>>   }
>>   +static inline int iommu_domain_set_flags(struct iommu_domain *domain,
>> +                     const struct bus_type *bus,
>> +                     unsigned long flags)
>> +{
>> +    return -ENODEV;
>> +}
>> +
>>   static inline void iommu_domain_free(struct iommu_domain *domain)
>>   {
>>   }

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 04/24] iommu: Add iommu_domain ops for dirty tracking
  2023-05-19 13:29           ` Jason Gunthorpe
@ 2023-05-19 13:46             ` Joao Martins
  2023-08-10 18:23             ` Joao Martins
  1 sibling, 0 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-19 13:46 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
	Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm

On 19/05/2023 14:29, Jason Gunthorpe wrote:
> On Fri, May 19, 2023 at 12:56:19PM +0100, Joao Martins wrote:
>>
>>
>> On 19/05/2023 12:51, Jason Gunthorpe wrote:
>>> On Fri, May 19, 2023 at 12:47:24PM +0100, Joao Martins wrote:
>>>
>>>> In practice it is done as soon after the domain is created but I understand what
>>>> you mean that both should be together; I have this implemented like that as my
>>>> first take as a domain_alloc passed flags, but I was a little undecided because
>>>> we are adding another domain_alloc() op for the user-managed pagetable and after
>>>> having another one we would end up with 3 ways of creating iommu domain -- but
>>>> maybe that's not an issue
>>>
>>> It should ride on the same user domain alloc op as some generic flags,
>>
>> OK, I suppose that makes sense specially with this being tied in HWPT_ALLOC
>> where all this new user domain alloc does.
> 
> Yes, it should be easy.
> 
> Then do what Robin said and make the domain ops NULL if the user
> didn't ask for dirty tracking and then attach can fail if there are
> domain incompatibility's.
> 
Yes

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 09/24] iommufd: Add IOMMU_HWPT_SET_DIRTY
  2023-05-18 20:46 ` [PATCH RFCv2 09/24] iommufd: Add IOMMU_HWPT_SET_DIRTY Joao Martins
@ 2023-05-19 13:49   ` Jason Gunthorpe
  2023-05-19 14:21     ` Joao Martins
  0 siblings, 1 reply; 65+ messages in thread
From: Jason Gunthorpe @ 2023-05-19 13:49 UTC (permalink / raw)
  To: Joao Martins
  Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
	Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm

On Thu, May 18, 2023 at 09:46:35PM +0100, Joao Martins wrote:
> +int iopt_set_dirty_tracking(struct io_pagetable *iopt,
> +			    struct iommu_domain *domain, bool enable)
> +{
> +	const struct iommu_domain_ops *ops = domain->ops;
> +	struct iommu_dirty_bitmap dirty;
> +	struct iommu_iotlb_gather gather;
> +	struct iopt_area *area;
> +	int ret = 0;
> +
> +	if (!ops->set_dirty_tracking)
> +		return -EOPNOTSUPP;
> +
> +	iommu_dirty_bitmap_init(&dirty, NULL, &gather);
> +
> +	down_write(&iopt->iova_rwsem);
> +	for (area = iopt_area_iter_first(iopt, 0, ULONG_MAX);
> +	     area && enable;

That's a goofy way to write this.. put this in a function and don't
call it if enable is not set.

Why is this down_write() ?

You can see that this locking already prevents racing dirty read with
domain unmap.

This domain cannot be removed from the iopt eg through
iopt_table_remove_domain() because this is holding the object
reference on the hwpt

The area cannot be unmapped because this is holding the
&iopt->iova_rwsem

There is no other way to call unmap..

You do have to check that area->pages != NULL though

Jason

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 21/24] iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping
  2023-05-18 20:46 ` [PATCH RFCv2 21/24] iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping Joao Martins
@ 2023-05-19 13:49   ` Robin Murphy
  2023-05-19 14:05     ` Joao Martins
  2023-05-22 10:34   ` Shameerali Kolothum Thodi
  1 sibling, 1 reply; 65+ messages in thread
From: Robin Murphy @ 2023-05-19 13:49 UTC (permalink / raw)
  To: Joao Martins, iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Alex Williamson, kvm

On 2023-05-18 21:46, Joao Martins wrote:
> From: Kunkun Jiang <jiangkunkun@huawei.com>
> 
> As nested mode is not upstreamed now, we just aim to support dirty
> log tracking for stage1 with io-pgtable mapping (means not support
> SVA mapping). If HTTU is supported, we enable HA/HD bits in the SMMU
> CD and transfer ARM_HD quirk to io-pgtable.
> 
> We additionally filter out HD|HA if not supportted. The CD.HD bit
> is not particularly useful unless we toggle the DBM bit in the PTE
> entries.

...seeds odd to describe the control which fundamentally enables DBM or 
not as "not particularly useful" to the DBM use-case :/

> Link: https://lore.kernel.org/lkml/20210413085457.25400-6-zhukeqian1@huawei.com/
> Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
> [joaomart:Convey HD|HA bits over to the context descriptor
>   and update commit message; original in Link, where this is based on]
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 10 ++++++++++
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  3 +++
>   drivers/iommu/io-pgtable-arm.c              | 11 +++++++++--
>   include/linux/io-pgtable.h                  |  4 ++++

For the sake of cleanliness, please split the io-pgtable and SMMU 
additions into separate patches (you could perhaps then squash 
set_dirty_tracking() into the SMMU patch as well).

Thanks,
Robin.

>   4 files changed, 26 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index e110ff4710bf..e2b98a6a6b74 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1998,6 +1998,11 @@ static const struct iommu_flush_ops arm_smmu_flush_ops = {
>   	.tlb_add_page	= arm_smmu_tlb_inv_page_nosync,
>   };
>   
> +static bool arm_smmu_dbm_capable(struct arm_smmu_device *smmu)
> +{
> +	return smmu->features & (ARM_SMMU_FEAT_HD | ARM_SMMU_FEAT_COHERENCY);
> +}
> +
>   /* IOMMU API */
>   static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
>   {
> @@ -2124,6 +2129,8 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
>   			  FIELD_PREP(CTXDESC_CD_0_TCR_SH0, tcr->sh) |
>   			  FIELD_PREP(CTXDESC_CD_0_TCR_IPS, tcr->ips) |
>   			  CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64;
> +	if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_HD)
> +		cfg->cd.tcr |= CTXDESC_CD_0_TCR_HA | CTXDESC_CD_0_TCR_HD;
>   	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair;
>   
>   	/*
> @@ -2226,6 +2233,9 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
>   		.iommu_dev	= smmu->dev,
>   	};
>   
> +	if (smmu->features & arm_smmu_dbm_capable(smmu))
> +		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_HD;
> +
>   	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
>   	if (!pgtbl_ops)
>   		return -ENOMEM;
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index d82dd125446c..83d6f3a2554f 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -288,6 +288,9 @@
>   #define CTXDESC_CD_0_TCR_IPS		GENMASK_ULL(34, 32)
>   #define CTXDESC_CD_0_TCR_TBI0		(1ULL << 38)
>   
> +#define CTXDESC_CD_0_TCR_HA            (1UL << 43)
> +#define CTXDESC_CD_0_TCR_HD            (1UL << 42)
> +
>   #define CTXDESC_CD_0_AA64		(1UL << 41)
>   #define CTXDESC_CD_0_S			(1UL << 44)
>   #define CTXDESC_CD_0_R			(1UL << 45)
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 72dcdd468cf3..b2f470529459 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -75,6 +75,7 @@
>   
>   #define ARM_LPAE_PTE_NSTABLE		(((arm_lpae_iopte)1) << 63)
>   #define ARM_LPAE_PTE_XN			(((arm_lpae_iopte)3) << 53)
> +#define ARM_LPAE_PTE_DBM		(((arm_lpae_iopte)1) << 51)
>   #define ARM_LPAE_PTE_AF			(((arm_lpae_iopte)1) << 10)
>   #define ARM_LPAE_PTE_SH_NS		(((arm_lpae_iopte)0) << 8)
>   #define ARM_LPAE_PTE_SH_OS		(((arm_lpae_iopte)2) << 8)
> @@ -84,7 +85,7 @@
>   
>   #define ARM_LPAE_PTE_ATTR_LO_MASK	(((arm_lpae_iopte)0x3ff) << 2)
>   /* Ignore the contiguous bit for block splitting */
> -#define ARM_LPAE_PTE_ATTR_HI_MASK	(((arm_lpae_iopte)6) << 52)
> +#define ARM_LPAE_PTE_ATTR_HI_MASK	(((arm_lpae_iopte)13) << 51)
>   #define ARM_LPAE_PTE_ATTR_MASK		(ARM_LPAE_PTE_ATTR_LO_MASK |	\
>   					 ARM_LPAE_PTE_ATTR_HI_MASK)
>   /* Software bit for solving coherency races */
> @@ -93,6 +94,9 @@
>   /* Stage-1 PTE */
>   #define ARM_LPAE_PTE_AP_UNPRIV		(((arm_lpae_iopte)1) << 6)
>   #define ARM_LPAE_PTE_AP_RDONLY		(((arm_lpae_iopte)2) << 6)
> +#define ARM_LPAE_PTE_AP_RDONLY_BIT	7
> +#define ARM_LPAE_PTE_AP_WRITABLE	(ARM_LPAE_PTE_AP_RDONLY | \
> +					 ARM_LPAE_PTE_DBM)
>   #define ARM_LPAE_PTE_ATTRINDX_SHIFT	2
>   #define ARM_LPAE_PTE_nG			(((arm_lpae_iopte)1) << 11)
>   
> @@ -407,6 +411,8 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
>   		pte = ARM_LPAE_PTE_nG;
>   		if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
>   			pte |= ARM_LPAE_PTE_AP_RDONLY;
> +		else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_HD)
> +			pte |= ARM_LPAE_PTE_AP_WRITABLE;
>   		if (!(prot & IOMMU_PRIV))
>   			pte |= ARM_LPAE_PTE_AP_UNPRIV;
>   	} else {
> @@ -804,7 +810,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
>   
>   	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
>   			    IO_PGTABLE_QUIRK_ARM_TTBR1 |
> -			    IO_PGTABLE_QUIRK_ARM_OUTER_WBWA))
> +			    IO_PGTABLE_QUIRK_ARM_OUTER_WBWA |
> +			    IO_PGTABLE_QUIRK_ARM_HD))
>   		return NULL;
>   
>   	data = arm_lpae_alloc_pgtable(cfg);
> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
> index 25142a0e2fc2..9a996ba7856d 100644
> --- a/include/linux/io-pgtable.h
> +++ b/include/linux/io-pgtable.h
> @@ -85,6 +85,8 @@ struct io_pgtable_cfg {
>   	 *
>   	 * IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the outer-cacheability
>   	 *	attributes set in the TCR for a non-coherent page-table walker.
> +	 *
> +	 * IO_PGTABLE_QUIRK_ARM_HD: Enables dirty tracking.
>   	 */
>   	#define IO_PGTABLE_QUIRK_ARM_NS			BIT(0)
>   	#define IO_PGTABLE_QUIRK_NO_PERMS		BIT(1)
> @@ -92,6 +94,8 @@ struct io_pgtable_cfg {
>   	#define IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT	BIT(4)
>   	#define IO_PGTABLE_QUIRK_ARM_TTBR1		BIT(5)
>   	#define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA		BIT(6)
> +	#define IO_PGTABLE_QUIRK_ARM_HD			BIT(7)
> +
>   	unsigned long			quirks;
>   	unsigned long			pgsize_bitmap;
>   	unsigned int			ias;

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 07/24] iommufd/selftest: Test IOMMU_HWPT_ALLOC_ENFORCE_DIRTY
  2023-05-19 13:35   ` Jason Gunthorpe
@ 2023-05-19 13:52     ` Joao Martins
  0 siblings, 0 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-19 13:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
	Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm

On 19/05/2023 14:35, Jason Gunthorpe wrote:
> On Thu, May 18, 2023 at 09:46:33PM +0100, Joao Martins wrote:
> 
>> @@ -50,6 +54,7 @@ struct iommu_test_cmd {
>>  			__aligned_u64 length;
>>  		} add_reserved;
>>  		struct {
>> +			__u32 dev_flags;
>>  			__u32 out_stdev_id;
>>  			__u32 out_hwpt_id;
> 
> Don't break the ABI needlessly, syzkaller relies on this

IOMMUFD_TEST was quoted "dangerous" in kconfig so I thought that ABI was loosed up

I guess I just add a new struct here that extends struct mock_domain inside the
union.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 15/24] iommufd: Add a flag to skip clearing of IOPTE dirty
  2023-05-18 20:46 ` [PATCH RFCv2 15/24] iommufd: Add a flag to skip clearing of IOPTE dirty Joao Martins
@ 2023-05-19 13:54   ` Jason Gunthorpe
  0 siblings, 0 replies; 65+ messages in thread
From: Jason Gunthorpe @ 2023-05-19 13:54 UTC (permalink / raw)
  To: Joao Martins
  Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
	Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm

On Thu, May 18, 2023 at 09:46:41PM +0100, Joao Martins wrote:
> VFIO has an operation where you an unmap an IOVA while returning a bitmap
> with the dirty data. In reality the operation doesn't quite query the IO
> pagetables that the the PTE was dirty or not, but it marks as dirty in the
> bitmap anything that was mapped all in one operation.
> 
> In IOMMUFD this equivalent can be done in two operations by querying with
> GET_DIRTY_IOVA followed by UNMAP_IOVA. However, this would incur two TLB
> flushes given that after clearing dirty bits IOMMU implementations require
> invalidating their IOTLB, plus another invalidation needed for the UNMAP.
> To allow dirty bits to be queried faster, we add a flag
> (IOMMU_GET_DIRTY_IOVA_NO_CLEAR) that requests to not clear the dirty bits
> from the PTE (but just reading them), under the expectation that the next
> operation is the unmap. An alternative is to unmap and just perpectually
> mark as dirty as that's the same behaviour as today. So here equivalent
> functionally can be provided, and if real dirty info is required we
> amortize the cost while querying.
> 
> There's still a race against DMA where in theory the unmap of the IOVA
> (when the guest invalidates the IOTLB via emulated iommu) would race
> against the VF performing DMA on the same IOVA being invalidated which
> would be marking the PTE as dirty but losing the update in the
> unmap-related IOTLB flush. The way to actually prevent the race would be to
> write-protect the IOPTE, then query dirty bits and flush the IOTLB in the
> unmap after.  However, this remains an issue that is so far theoretically
> possible, but lacks an use case or whether the race is relevant in the
> first place that justifies such complexity.
> 
> Link:
> https://lore.kernel.org/linux-iommu/20220502185239.GR8364@nvidia.com/

I think you should clip the explanation from the email into the commit
message - eg that we are accepting to resolve this race as throwing
away the DMA and it doesn't matter if it hit physical DRAM or not, the
VM can't tell if we threw it away because the DMA was blocked or
because we failed to copy the DRAM.

Jason

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 07/24] iommufd/selftest: Test IOMMU_HWPT_ALLOC_ENFORCE_DIRTY
  2023-05-18 20:46 ` [PATCH RFCv2 07/24] iommufd/selftest: Test IOMMU_HWPT_ALLOC_ENFORCE_DIRTY Joao Martins
  2023-05-19 13:35   ` Jason Gunthorpe
@ 2023-05-19 13:55   ` Joao Martins
  1 sibling, 0 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-19 13:55 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm

On 18/05/2023 21:46, Joao Martins wrote:
> diff --git a/tools/testing/selftests/iommu/Makefile b/tools/testing/selftests/iommu/Makefile
> index 32c5fdfd0eef..f1aee4e5ec2e 100644
> --- a/tools/testing/selftests/iommu/Makefile
> +++ b/tools/testing/selftests/iommu/Makefile
> @@ -1,5 +1,8 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  CFLAGS += -Wall -O2 -Wno-unused-function
> +CFLAGS += -I../../../../tools/include/
> +CFLAGS += -I../../../../include/uapi/
> +CFLAGS += -I../../../../include/
>  CFLAGS += $(KHDR_INCLUDES)
>  
>  CFLAGS += -D_GNU_SOURCE

Please ignore this hunk here. I had a few issues with headers for the bitmap
helpers and this was a temporary hack that I failed to remove.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 21/24] iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping
  2023-05-19 13:49   ` Robin Murphy
@ 2023-05-19 14:05     ` Joao Martins
  0 siblings, 0 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-19 14:05 UTC (permalink / raw)
  To: Robin Murphy, iommu
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Alex Williamson, kvm

On 19/05/2023 14:49, Robin Murphy wrote:
> On 2023-05-18 21:46, Joao Martins wrote:
>> From: Kunkun Jiang <jiangkunkun@huawei.com>
>>
>> As nested mode is not upstreamed now, we just aim to support dirty
>> log tracking for stage1 with io-pgtable mapping (means not support
>> SVA mapping). If HTTU is supported, we enable HA/HD bits in the SMMU
>> CD and transfer ARM_HD quirk to io-pgtable.
>>
>> We additionally filter out HD|HA if not supportted. The CD.HD bit
>> is not particularly useful unless we toggle the DBM bit in the PTE
>> entries.
> 
> ...seeds odd to describe the control which fundamentally enables DBM or not as
> "not particularly useful" to the DBM use-case :/
> 

This is a remnant from v1 where we would just enable the context descriptor HD
bit, but not actually enabling DBM until set_dirty_Tracking(). Which no longer
is the case. Should remove this sentence.

>> Link: https://lore.kernel.org/lkml/20210413085457.25400-6-zhukeqian1@huawei.com/
>> Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
>> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
>> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
>> [joaomart:Convey HD|HA bits over to the context descriptor
>>   and update commit message; original in Link, where this is based on]
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 10 ++++++++++
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  3 +++
>>   drivers/iommu/io-pgtable-arm.c              | 11 +++++++++--
>>   include/linux/io-pgtable.h                  |  4 ++++
> 
> For the sake of cleanliness, please split the io-pgtable and SMMU additions into
> separate patches (you could perhaps then squash set_dirty_tracking() into the
> SMMU patch as well).
> 
ack

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 09/24] iommufd: Add IOMMU_HWPT_SET_DIRTY
  2023-05-19 13:49   ` Jason Gunthorpe
@ 2023-05-19 14:21     ` Joao Martins
  0 siblings, 0 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-19 14:21 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
	Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm

On 19/05/2023 14:49, Jason Gunthorpe wrote:
> On Thu, May 18, 2023 at 09:46:35PM +0100, Joao Martins wrote:
>> +int iopt_set_dirty_tracking(struct io_pagetable *iopt,
>> +			    struct iommu_domain *domain, bool enable)
>> +{
>> +	const struct iommu_domain_ops *ops = domain->ops;
>> +	struct iommu_dirty_bitmap dirty;
>> +	struct iommu_iotlb_gather gather;
>> +	struct iopt_area *area;
>> +	int ret = 0;
>> +
>> +	if (!ops->set_dirty_tracking)
>> +		return -EOPNOTSUPP;
>> +
>> +	iommu_dirty_bitmap_init(&dirty, NULL, &gather);
>> +
>> +	down_write(&iopt->iova_rwsem);
>> +	for (area = iopt_area_iter_first(iopt, 0, ULONG_MAX);
>> +	     area && enable;
> 
> That's a goofy way to write this.. put this in a function and don't
> call it if enable is not set.
> 
I'll move this into a iopt_clear_dirty_data()

There's also another less positive aspect here on the implicit assumption I made
that a dirty::bitmap == NULL means we do not care about recording dirty bitmap.
Which is OK. But I use that field being NULL as means to ignore the iommu driver
status control to just clear the remnant dirty bits that could have been done
until we disable dirty tracking. I'm thinking in making this into a flags value,
but keeping internally only, I am not sure this should be exposed to userspace.

> Why is this down_write() ?
> 
> You can see that this locking already prevents racing dirty read with
> domain unmap.
> 
> This domain cannot be removed from the iopt eg through
> iopt_table_remove_domain() because this is holding the object
> reference on the hwpt
> 
> The area cannot be unmapped because this is holding the
> &iopt->iova_rwsem
> 
> There is no other way to call unmap..

down_read(&iopt->iova_rwsem) is more approprite;
iopt_read_and_clear_dirty_data() does so already.

But I should iterating over areas there too, which I wrongly not doing.

> You do have to check that area->pages != NULL though

OK

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 01/24] iommu: Add RCU-protected page free support
  2023-05-19 13:32   ` Jason Gunthorpe
@ 2023-05-19 16:48     ` Joao Martins
  0 siblings, 0 replies; 65+ messages in thread
From: Joao Martins @ 2023-05-19 16:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
	Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Robin Murphy, Alex Williamson, kvm

On 19/05/2023 14:32, Jason Gunthorpe wrote:
> On Thu, May 18, 2023 at 09:46:27PM +0100, Joao Martins wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>>
>> The IOMMU page tables are updated using iommu_map/unmap() interfaces.
>> Currently, there is no mandatory requirement for drivers to use locks
>> to ensure concurrent updates to page tables, because it's assumed that
>> overlapping IOVA ranges do not have concurrent updates. Therefore the
>> IOMMU drivers only need to take care of concurrent updates to level
>> page table entries.
>>
>> But enabling new features challenges this assumption. For example, the
>> hardware assisted dirty page tracking feature requires scanning page
>> tables in interfaces other than mapping and unmapping. This might result
>> in a use-after-free scenario in which a level page table has been freed
>> by the unmap() interface, while another thread is scanning the next level
>> page table.
> 
> I'm not convinced.. The basic model we have is that the caller has to
> bring the range locking and the caller has to promise it doesn't do
> overlapping things to ranges.
> 
> iommufd implements this with area based IOVA range locking.
> 
Right

> So, I don't really see an obvious reason why we can't also require
> that the dirty reporting hold the area lock and domain locks while it
> is calling the iommu driver?
> 
> Then we don't have a locking or RCU problem here.
> 
I would rather keep base on area range locking -- I think I got confused from
this other thread discussion on having RCU for the iopte walks. I'll remove it
from the series for now, and if later deemed needed it should come as a separate
thing.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 04/24] iommu: Add iommu_domain ops for dirty tracking
  2023-05-19 13:43     ` Joao Martins
@ 2023-05-19 18:12       ` Robin Murphy
  0 siblings, 0 replies; 65+ messages in thread
From: Robin Murphy @ 2023-05-19 18:12 UTC (permalink / raw)
  To: Joao Martins
  Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
	Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Alex Williamson, kvm, iommu

On 19/05/2023 2:43 pm, Joao Martins wrote:
> On 19/05/2023 14:22, Robin Murphy wrote:
>> On 2023-05-18 21:46, Joao Martins wrote:
>>> Add to iommu domain operations a set of callbacks to perform dirty
>>> tracking, particulary to start and stop tracking and finally to read and
>>> clear the dirty data.
>>>
>>> Drivers are generally expected to dynamically change its translation
>>> structures to toggle the tracking and flush some form of control state
>>> structure that stands in the IOVA translation path. Though it's not
>>> mandatory, as drivers will be enable dirty tracking at boot, and just flush
>>> the IO pagetables when setting dirty tracking.  For each of the newly added
>>> IOMMU core APIs:
>>>
>>> .supported_flags[IOMMU_DOMAIN_F_ENFORCE_DIRTY]: Introduce a set of flags
>>> that enforce certain restrictions in the iommu_domain object. For dirty
>>> tracking this means that when IOMMU_DOMAIN_F_ENFORCE_DIRTY is set via its
>>> helper iommu_domain_set_flags(...) devices attached via attach_dev will
>>> fail on devices that do *not* have dirty tracking supported. IOMMU drivers
>>> that support dirty tracking should advertise this flag, while enforcing
>>> that dirty tracking is supported by the device in its .attach_dev iommu op.
>>
>> Eww, no. For an internal thing, just call ->capable() - I mean, you're literally
>> adding this feature as one of its caps...
>>
>> However I'm not sure if we even need that - domains which don't support dirty
>> tracking should just not expose the ops, and thus it ought to be inherently
>> obvious.
>>
> OK.
> 
>> I'm guessing most of the weirdness here is implicitly working around the
>> enabled-from-the-start scenario on SMMUv3:
>>
> It has nothing to do with SMMUv3. This is to futureproof the case where the
> IOMMU capabilities are not homogeneous, even though, it isn't the case today.

Indeed, but in practice SMMUv3 is the only case where you're 
realistically likely to see heterogenous dirty-tracking support across 
instances, and the one where it's most problematic to do things with a 
domain immediately after allocation.

iommu_domain_alloc() is the last piece of the puzzle in terms of 
updating the public IOMMU API to the modern multi-instance model, and 
we're working towards the point where it will be able to return a fully 
functional domain right off the bat. Thus I'm not keen on building more 
APIs around the current behviour which would become obsolete all too 
soon and need to be unpicked again. The domain_alloc_user stuff is a 
glimpse of the future, so if you're happy to focus on the IOMMUFD case 
for now and build on that model for the initial concept, that's great - 
by the time anyone comes up with a reason for generalising it further, 
the rest of the API should have caught up.

Thanks,
Robin.

> The only thing SMMUv3 that kinda comes for free in this series is clearing
> dirties before setting dirty tracking, because [it is always enabled]. But that
> is needed regardless of SMMUv3.
> 
>>      domain = iommu_domain_alloc(bus);
>>      iommu_set_dirty_tracking(domain);
>>      // arm-smmu-v3 says OK since it doesn't know that it
>>      // definitely *isn't* possible, and saying no wouldn't
>>      // be helpful
>>      iommu_attach_group(group, domain);
>>      // oops, now we see that the relevant SMMU instance isn't one
>>      // which actually supports HTTU, what do we do? :(
>>
>> I don't have any major objection to the general principle of flagging the domain
>> to fail attach if it can't do what we promised
> 
> This is the reason why the I had the flag (or now a domain_alloc flag)...
> 
>> , as a bodge for now, but please
>> implement it privately in arm-smmu-v3 so it's easier to clean up again in future
>> once until iommu_domain_alloc() gets sorted out properly to get rid of this
>> awkward blind spot.
>>
> 
> But it wasn't related to smmu-v3 logic.
> 
> All it is meant is to guarantee that when we only ever have dirty tracking
> supported in a single domain. And we don't want that to change throughout the
> lifetime of the domain.
> 
>> Thanks,
>> Robin.
>>
>>> iommu_cap::IOMMU_CAP_DIRTY: new device iommu_capable value when probing for
>>> capabilities of the device.
>>>
>>> .set_dirty_tracking(): an iommu driver is expected to change its
>>> translation structures and enable dirty tracking for the devices in the
>>> iommu_domain. For drivers making dirty tracking always-enabled, it should
>>> just return 0.
>>>
>>> .read_and_clear_dirty(): an iommu driver is expected to walk the iova range
>>> passed in and use iommu_dirty_bitmap_record() to record dirty info per
>>> IOVA. When detecting a given IOVA is dirty it should also clear its dirty
>>> state from the PTE, *unless* the flag IOMMU_DIRTY_NO_CLEAR is passed in --
>>> flushing is steered from the caller of the domain_op via iotlb_gather. The
>>> iommu core APIs use the same data structure in use for dirty tracking for
>>> VFIO device dirty (struct iova_bitmap) abstracted by
>>> iommu_dirty_bitmap_record() helper function.
>>>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> ---
>>>    drivers/iommu/iommu.c      | 11 +++++++
>>>    include/linux/io-pgtable.h |  4 +++
>>>    include/linux/iommu.h      | 67 ++++++++++++++++++++++++++++++++++++++
>>>    3 files changed, 82 insertions(+)
>>>
>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>> index 2088caae5074..95acc543e8fb 100644
>>> --- a/drivers/iommu/iommu.c
>>> +++ b/drivers/iommu/iommu.c
>>> @@ -2013,6 +2013,17 @@ struct iommu_domain *iommu_domain_alloc(const struct
>>> bus_type *bus)
>>>    }
>>>    EXPORT_SYMBOL_GPL(iommu_domain_alloc);
>>>    +int iommu_domain_set_flags(struct iommu_domain *domain,
>>> +               const struct bus_type *bus, unsigned long val)
>>> +{
>>> +    if (!(val & bus->iommu_ops->supported_flags))
>>> +        return -EINVAL;
>>> +
>>> +    domain->flags |= val;
>>> +    return 0;
>>> +}
>>> +EXPORT_SYMBOL_GPL(iommu_domain_set_flags);
>>> +
>>>    void iommu_domain_free(struct iommu_domain *domain)
>>>    {
>>>        if (domain->type == IOMMU_DOMAIN_SVA)
>>> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
>>> index 1b7a44b35616..25142a0e2fc2 100644
>>> --- a/include/linux/io-pgtable.h
>>> +++ b/include/linux/io-pgtable.h
>>> @@ -166,6 +166,10 @@ struct io_pgtable_ops {
>>>                      struct iommu_iotlb_gather *gather);
>>>        phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops,
>>>                        unsigned long iova);
>>> +    int (*read_and_clear_dirty)(struct io_pgtable_ops *ops,
>>> +                    unsigned long iova, size_t size,
>>> +                    unsigned long flags,
>>> +                    struct iommu_dirty_bitmap *dirty);
>>>    };
>>>      /**
>>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>>> index 39d25645a5ab..992ea87f2f8e 100644
>>> --- a/include/linux/iommu.h
>>> +++ b/include/linux/iommu.h
>>> @@ -13,6 +13,7 @@
>>>    #include <linux/errno.h>
>>>    #include <linux/err.h>
>>>    #include <linux/of.h>
>>> +#include <linux/iova_bitmap.h>
>>>    #include <uapi/linux/iommu.h>
>>>      #define IOMMU_READ    (1 << 0)
>>> @@ -65,6 +66,11 @@ struct iommu_domain_geometry {
>>>      #define __IOMMU_DOMAIN_SVA    (1U << 4)  /* Shared process address space */
>>>    +/* Domain feature flags that do not define domain types */
>>> +#define IOMMU_DOMAIN_F_ENFORCE_DIRTY    (1U << 6)  /* Enforce attachment of
>>> +                              dirty tracking supported
>>> +                              devices          */
>>> +
>>>    /*
>>>     * This are the possible domain-types
>>>     *
>>> @@ -93,6 +99,7 @@ struct iommu_domain_geometry {
>>>      struct iommu_domain {
>>>        unsigned type;
>>> +    unsigned flags;
>>>        const struct iommu_domain_ops *ops;
>>>        unsigned long pgsize_bitmap;    /* Bitmap of page sizes in use */
>>>        struct iommu_domain_geometry geometry;
>>> @@ -128,6 +135,7 @@ enum iommu_cap {
>>>         * this device.
>>>         */
>>>        IOMMU_CAP_ENFORCE_CACHE_COHERENCY,
>>> +    IOMMU_CAP_DIRTY,        /* IOMMU supports dirty tracking */
>>>    };
>>>      /* These are the possible reserved region types */
>>> @@ -220,6 +228,17 @@ struct iommu_iotlb_gather {
>>>        bool            queued;
>>>    };
>>>    +/**
>>> + * struct iommu_dirty_bitmap - Dirty IOVA bitmap state
>>> + *
>>> + * @bitmap: IOVA bitmap
>>> + * @gather: Range information for a pending IOTLB flush
>>> + */
>>> +struct iommu_dirty_bitmap {
>>> +    struct iova_bitmap *bitmap;
>>> +    struct iommu_iotlb_gather *gather;
>>> +};
>>> +
>>>    /**
>>>     * struct iommu_ops - iommu ops and capabilities
>>>     * @capable: check capability
>>> @@ -248,6 +267,7 @@ struct iommu_iotlb_gather {
>>>     *                    pasid, so that any DMA transactions with this pasid
>>>     *                    will be blocked by the hardware.
>>>     * @pgsize_bitmap: bitmap of all possible supported page sizes
>>> + * @flags: All non domain type supported features
>>>     * @owner: Driver module providing these ops
>>>     */
>>>    struct iommu_ops {
>>> @@ -281,6 +301,7 @@ struct iommu_ops {
>>>          const struct iommu_domain_ops *default_domain_ops;
>>>        unsigned long pgsize_bitmap;
>>> +    unsigned long supported_flags;
>>>        struct module *owner;
>>>    };
>>>    @@ -316,6 +337,11 @@ struct iommu_ops {
>>>     * @enable_nesting: Enable nesting
>>>     * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
>>>     * @free: Release the domain after use.
>>> + * @set_dirty_tracking: Enable or Disable dirty tracking on the iommu domain
>>> + * @read_and_clear_dirty: Walk IOMMU page tables for dirtied PTEs marshalled
>>> + *                        into a bitmap, with a bit represented as a page.
>>> + *                        Reads the dirty PTE bits and clears it from IO
>>> + *                        pagetables.
>>>     */
>>>    struct iommu_domain_ops {
>>>        int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
>>> @@ -348,6 +374,12 @@ struct iommu_domain_ops {
>>>                      unsigned long quirks);
>>>          void (*free)(struct iommu_domain *domain);
>>> +
>>> +    int (*set_dirty_tracking)(struct iommu_domain *domain, bool enabled);
>>> +    int (*read_and_clear_dirty)(struct iommu_domain *domain,
>>> +                    unsigned long iova, size_t size,
>>> +                    unsigned long flags,
>>> +                    struct iommu_dirty_bitmap *dirty);
>>>    };
>>>      /**
>>> @@ -461,6 +493,9 @@ extern bool iommu_present(const struct bus_type *bus);
>>>    extern bool device_iommu_capable(struct device *dev, enum iommu_cap cap);
>>>    extern bool iommu_group_has_isolated_msi(struct iommu_group *group);
>>>    extern struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus);
>>> +extern int iommu_domain_set_flags(struct iommu_domain *domain,
>>> +                  const struct bus_type *bus,
>>> +                  unsigned long flags);
>>>    extern void iommu_domain_free(struct iommu_domain *domain);
>>>    extern int iommu_attach_device(struct iommu_domain *domain,
>>>                       struct device *dev);
>>> @@ -627,6 +662,28 @@ static inline bool iommu_iotlb_gather_queued(struct
>>> iommu_iotlb_gather *gather)
>>>        return gather && gather->queued;
>>>    }
>>>    +static inline void iommu_dirty_bitmap_init(struct iommu_dirty_bitmap *dirty,
>>> +                       struct iova_bitmap *bitmap,
>>> +                       struct iommu_iotlb_gather *gather)
>>> +{
>>> +    if (gather)
>>> +        iommu_iotlb_gather_init(gather);
>>> +
>>> +    dirty->bitmap = bitmap;
>>> +    dirty->gather = gather;
>>> +}
>>> +
>>> +static inline void
>>> +iommu_dirty_bitmap_record(struct iommu_dirty_bitmap *dirty, unsigned long iova,
>>> +              unsigned long length)
>>> +{
>>> +    if (dirty->bitmap)
>>> +        iova_bitmap_set(dirty->bitmap, iova, length);
>>> +
>>> +    if (dirty->gather)
>>> +        iommu_iotlb_gather_add_range(dirty->gather, iova, length);
>>> +}
>>> +
>>>    /* PCI device grouping function */
>>>    extern struct iommu_group *pci_device_group(struct device *dev);
>>>    /* Generic device grouping function */
>>> @@ -657,6 +714,9 @@ struct iommu_fwspec {
>>>    /* ATS is supported */
>>>    #define IOMMU_FWSPEC_PCI_RC_ATS            (1 << 0)
>>>    +/* Read but do not clear any dirty bits */
>>> +#define IOMMU_DIRTY_NO_CLEAR            (1 << 0)
>>> +
>>>    /**
>>>     * struct iommu_sva - handle to a device-mm bond
>>>     */
>>> @@ -755,6 +815,13 @@ static inline struct iommu_domain
>>> *iommu_domain_alloc(const struct bus_type *bus
>>>        return NULL;
>>>    }
>>>    +static inline int iommu_domain_set_flags(struct iommu_domain *domain,
>>> +                     const struct bus_type *bus,
>>> +                     unsigned long flags)
>>> +{
>>> +    return -ENODEV;
>>> +}
>>> +
>>>    static inline void iommu_domain_free(struct iommu_domain *domain)
>>>    {
>>>    }

^ permalink raw reply	[flat|nested] 65+ messages in thread

* RE: [PATCH RFCv2 21/24] iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping
  2023-05-18 20:46 ` [PATCH RFCv2 21/24] iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping Joao Martins
  2023-05-19 13:49   ` Robin Murphy
@ 2023-05-22 10:34   ` Shameerali Kolothum Thodi
  2023-05-22 10:43     ` Joao Martins
  1 sibling, 1 reply; 65+ messages in thread
From: Shameerali Kolothum Thodi @ 2023-05-22 10:34 UTC (permalink / raw)
  To: Joao Martins, iommu
  Cc: Jason Gunthorpe, Kevin Tian, Lu Baolu, Yi Liu, Yi Y Sun,
	Eric Auger, Nicolin Chen, Joerg Roedel, Jean-Philippe Brucker,
	Suravee Suthikulpanit, Will Deacon, Robin Murphy,
	Alex Williamson, kvm



> -----Original Message-----
> From: Joao Martins [mailto:joao.m.martins@oracle.com]
> Sent: 18 May 2023 21:47
> To: iommu@lists.linux.dev
> Cc: Jason Gunthorpe <jgg@nvidia.com>; Kevin Tian <kevin.tian@intel.com>;
> Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; Lu
> Baolu <baolu.lu@linux.intel.com>; Yi Liu <yi.l.liu@intel.com>; Yi Y Sun
> <yi.y.sun@intel.com>; Eric Auger <eric.auger@redhat.com>; Nicolin Chen
> <nicolinc@nvidia.com>; Joerg Roedel <joro@8bytes.org>; Jean-Philippe
> Brucker <jean-philippe@linaro.org>; Suravee Suthikulpanit
> <suravee.suthikulpanit@amd.com>; Will Deacon <will@kernel.org>; Robin
> Murphy <robin.murphy@arm.com>; Alex Williamson
> <alex.williamson@redhat.com>; kvm@vger.kernel.org; Joao Martins
> <joao.m.martins@oracle.com>
> Subject: [PATCH RFCv2 21/24] iommu/arm-smmu-v3: Enable HTTU for
> stage1 with io-pgtable mapping
> 
> From: Kunkun Jiang <jiangkunkun@huawei.com>
> 
> As nested mode is not upstreamed now, we just aim to support dirty
> log tracking for stage1 with io-pgtable mapping (means not support
> SVA mapping). If HTTU is supported, we enable HA/HD bits in the SMMU
> CD and transfer ARM_HD quirk to io-pgtable.
> 
> We additionally filter out HD|HA if not supportted. The CD.HD bit
> is not particularly useful unless we toggle the DBM bit in the PTE
> entries.
> 
> Link:
> https://lore.kernel.org/lkml/20210413085457.25400-6-zhukeqian1@huawei
> .com/
> Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
> [joaomart:Convey HD|HA bits over to the context descriptor
>  and update commit message; original in Link, where this is based on]
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 10 ++++++++++
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  3 +++
>  drivers/iommu/io-pgtable-arm.c              | 11 +++++++++--
>  include/linux/io-pgtable.h                  |  4 ++++
>  4 files changed, 26 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index e110ff4710bf..e2b98a6a6b74 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1998,6 +1998,11 @@ static const struct iommu_flush_ops
> arm_smmu_flush_ops = {
>  	.tlb_add_page	= arm_smmu_tlb_inv_page_nosync,
>  };
> 
> +static bool arm_smmu_dbm_capable(struct arm_smmu_device *smmu)
> +{
> +	return smmu->features & (ARM_SMMU_FEAT_HD |
> ARM_SMMU_FEAT_COHERENCY);
> +}
> +

This will claim DBM capability for systems with just ARM_SMMU_FEAT_COHERENCY.

Thanks,
Shameer

>  /* IOMMU API */
>  static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
>  {
> @@ -2124,6 +2129,8 @@ static int arm_smmu_domain_finalise_s1(struct
> arm_smmu_domain *smmu_domain,
>  			  FIELD_PREP(CTXDESC_CD_0_TCR_SH0, tcr->sh) |
>  			  FIELD_PREP(CTXDESC_CD_0_TCR_IPS, tcr->ips) |
>  			  CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64;
> +	if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_HD)
> +		cfg->cd.tcr |= CTXDESC_CD_0_TCR_HA | CTXDESC_CD_0_TCR_HD;
>  	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair;
> 
>  	/*
> @@ -2226,6 +2233,9 @@ static int arm_smmu_domain_finalise(struct
> iommu_domain *domain,
>  		.iommu_dev	= smmu->dev,
>  	};
> 
> +	if (smmu->features & arm_smmu_dbm_capable(smmu))
> +		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_HD;
> +
>  	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
>  	if (!pgtbl_ops)
>  		return -ENOMEM;
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index d82dd125446c..83d6f3a2554f 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -288,6 +288,9 @@
>  #define CTXDESC_CD_0_TCR_IPS		GENMASK_ULL(34, 32)
>  #define CTXDESC_CD_0_TCR_TBI0		(1ULL << 38)
> 
> +#define CTXDESC_CD_0_TCR_HA            (1UL << 43)
> +#define CTXDESC_CD_0_TCR_HD            (1UL << 42)
> +
>  #define CTXDESC_CD_0_AA64		(1UL << 41)
>  #define CTXDESC_CD_0_S			(1UL << 44)
>  #define CTXDESC_CD_0_R			(1UL << 45)
> diff --git a/drivers/iommu/io-pgtable-arm.c
> b/drivers/iommu/io-pgtable-arm.c
> index 72dcdd468cf3..b2f470529459 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -75,6 +75,7 @@
> 
>  #define ARM_LPAE_PTE_NSTABLE		(((arm_lpae_iopte)1) << 63)
>  #define ARM_LPAE_PTE_XN			(((arm_lpae_iopte)3) << 53)
> +#define ARM_LPAE_PTE_DBM		(((arm_lpae_iopte)1) << 51)
>  #define ARM_LPAE_PTE_AF			(((arm_lpae_iopte)1) << 10)
>  #define ARM_LPAE_PTE_SH_NS		(((arm_lpae_iopte)0) << 8)
>  #define ARM_LPAE_PTE_SH_OS		(((arm_lpae_iopte)2) << 8)
> @@ -84,7 +85,7 @@
> 
>  #define ARM_LPAE_PTE_ATTR_LO_MASK	(((arm_lpae_iopte)0x3ff) << 2)
>  /* Ignore the contiguous bit for block splitting */
> -#define ARM_LPAE_PTE_ATTR_HI_MASK	(((arm_lpae_iopte)6) << 52)
> +#define ARM_LPAE_PTE_ATTR_HI_MASK	(((arm_lpae_iopte)13) << 51)
>  #define ARM_LPAE_PTE_ATTR_MASK		(ARM_LPAE_PTE_ATTR_LO_MASK
> |	\
>  					 ARM_LPAE_PTE_ATTR_HI_MASK)
>  /* Software bit for solving coherency races */
> @@ -93,6 +94,9 @@
>  /* Stage-1 PTE */
>  #define ARM_LPAE_PTE_AP_UNPRIV		(((arm_lpae_iopte)1) << 6)
>  #define ARM_LPAE_PTE_AP_RDONLY		(((arm_lpae_iopte)2) << 6)
> +#define ARM_LPAE_PTE_AP_RDONLY_BIT	7
> +#define ARM_LPAE_PTE_AP_WRITABLE	(ARM_LPAE_PTE_AP_RDONLY | \
> +					 ARM_LPAE_PTE_DBM)
>  #define ARM_LPAE_PTE_ATTRINDX_SHIFT	2
>  #define ARM_LPAE_PTE_nG			(((arm_lpae_iopte)1) << 11)
> 
> @@ -407,6 +411,8 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct
> arm_lpae_io_pgtable *data,
>  		pte = ARM_LPAE_PTE_nG;
>  		if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
>  			pte |= ARM_LPAE_PTE_AP_RDONLY;
> +		else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_HD)
> +			pte |= ARM_LPAE_PTE_AP_WRITABLE;
>  		if (!(prot & IOMMU_PRIV))
>  			pte |= ARM_LPAE_PTE_AP_UNPRIV;
>  	} else {
> @@ -804,7 +810,8 @@ arm_64_lpae_alloc_pgtable_s1(struct
> io_pgtable_cfg *cfg, void *cookie)
> 
>  	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
>  			    IO_PGTABLE_QUIRK_ARM_TTBR1 |
> -			    IO_PGTABLE_QUIRK_ARM_OUTER_WBWA))
> +			    IO_PGTABLE_QUIRK_ARM_OUTER_WBWA |
> +			    IO_PGTABLE_QUIRK_ARM_HD))
>  		return NULL;
> 
>  	data = arm_lpae_alloc_pgtable(cfg);
> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
> index 25142a0e2fc2..9a996ba7856d 100644
> --- a/include/linux/io-pgtable.h
> +++ b/include/linux/io-pgtable.h
> @@ -85,6 +85,8 @@ struct io_pgtable_cfg {
>  	 *
>  	 * IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the
> outer-cacheability
>  	 *	attributes set in the TCR for a non-coherent page-table walker.
> +	 *
> +	 * IO_PGTABLE_QUIRK_ARM_HD: Enables dirty tracking.
>  	 */
>  	#define IO_PGTABLE_QUIRK_ARM_NS			BIT(0)
>  	#define IO_PGTABLE_QUIRK_NO_PERMS		BIT(1)
> @@ -92,6 +94,8 @@ struct io_pgtable_cfg {
>  	#define IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT	BIT(4)
>  	#define IO_PGTABLE_QUIRK_ARM_TTBR1		BIT(5)
>  	#define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA		BIT(6)
> +	#define IO_PGTABLE_QUIRK_ARM_HD			BIT(7)
> +
>  	unsigned long			quirks;
>  	unsigned long			pgsize_bitmap;
>  	unsigned int			ias;
> --
> 2.17.2


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 21/24] iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping
  2023-05-22 10:34   ` Shameerali Kolothum Thodi
@ 2023-05-22 10:43     ` Joao Martins
  2023-06-16 17:00       ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 65+ messages in thread
From: Joao Martins @ 2023-05-22 10:43 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, iommu
  Cc: Jason Gunthorpe, Kevin Tian, Lu Baolu, Yi Liu, Yi Y Sun,
	Eric Auger, Nicolin Chen, Joerg Roedel, Jean-Philippe Brucker,
	Suravee Suthikulpanit, Will Deacon, Robin Murphy,
	Alex Williamson, kvm

On 22/05/2023 11:34, Shameerali Kolothum Thodi wrote:
>> -----Original Message-----
>> From: Joao Martins [mailto:joao.m.martins@oracle.com]
>> Sent: 18 May 2023 21:47
>> To: iommu@lists.linux.dev
>> Cc: Jason Gunthorpe <jgg@nvidia.com>; Kevin Tian <kevin.tian@intel.com>;
>> Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; Lu
>> Baolu <baolu.lu@linux.intel.com>; Yi Liu <yi.l.liu@intel.com>; Yi Y Sun
>> <yi.y.sun@intel.com>; Eric Auger <eric.auger@redhat.com>; Nicolin Chen
>> <nicolinc@nvidia.com>; Joerg Roedel <joro@8bytes.org>; Jean-Philippe
>> Brucker <jean-philippe@linaro.org>; Suravee Suthikulpanit
>> <suravee.suthikulpanit@amd.com>; Will Deacon <will@kernel.org>; Robin
>> Murphy <robin.murphy@arm.com>; Alex Williamson
>> <alex.williamson@redhat.com>; kvm@vger.kernel.org; Joao Martins
>> <joao.m.martins@oracle.com>
>> Subject: [PATCH RFCv2 21/24] iommu/arm-smmu-v3: Enable HTTU for
>> stage1 with io-pgtable mapping
>>
>> From: Kunkun Jiang <jiangkunkun@huawei.com>
>>
>> As nested mode is not upstreamed now, we just aim to support dirty
>> log tracking for stage1 with io-pgtable mapping (means not support
>> SVA mapping). If HTTU is supported, we enable HA/HD bits in the SMMU
>> CD and transfer ARM_HD quirk to io-pgtable.
>>
>> We additionally filter out HD|HA if not supportted. The CD.HD bit
>> is not particularly useful unless we toggle the DBM bit in the PTE
>> entries.
>>
>> Link:
>> https://lore.kernel.org/lkml/20210413085457.25400-6-zhukeqian1@huawei
>> .com/
>> Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
>> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
>> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
>> [joaomart:Convey HD|HA bits over to the context descriptor
>>  and update commit message; original in Link, where this is based on]
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 10 ++++++++++
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  3 +++
>>  drivers/iommu/io-pgtable-arm.c              | 11 +++++++++--
>>  include/linux/io-pgtable.h                  |  4 ++++
>>  4 files changed, 26 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index e110ff4710bf..e2b98a6a6b74 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -1998,6 +1998,11 @@ static const struct iommu_flush_ops
>> arm_smmu_flush_ops = {
>>  	.tlb_add_page	= arm_smmu_tlb_inv_page_nosync,
>>  };
>>
>> +static bool arm_smmu_dbm_capable(struct arm_smmu_device *smmu)
>> +{
>> +	return smmu->features & (ARM_SMMU_FEAT_HD |
>> ARM_SMMU_FEAT_COHERENCY);
>> +}
>> +
> 
> This will claim DBM capability for systems with just ARM_SMMU_FEAT_COHERENCY.

Gah, yes. It should be:

	(smmu->features & (ARM_SMMU_FEAT_HD | ARM_SMMU_FEAT_COHERENCY)) ==
		(ARM_SMMU_FEAT_HD | ARM_SMMU_FEAT_COHERENCY)

or making these two a macro on its own.

> 
>>  /* IOMMU API */
>>  static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
>>  {
>> @@ -2124,6 +2129,8 @@ static int arm_smmu_domain_finalise_s1(struct
>> arm_smmu_domain *smmu_domain,
>>  			  FIELD_PREP(CTXDESC_CD_0_TCR_SH0, tcr->sh) |
>>  			  FIELD_PREP(CTXDESC_CD_0_TCR_IPS, tcr->ips) |
>>  			  CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64;
>> +	if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_HD)
>> +		cfg->cd.tcr |= CTXDESC_CD_0_TCR_HA | CTXDESC_CD_0_TCR_HD;
>>  	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair;
>>
>>  	/*
>> @@ -2226,6 +2233,9 @@ static int arm_smmu_domain_finalise(struct
>> iommu_domain *domain,
>>  		.iommu_dev	= smmu->dev,
>>  	};
>>
>> +	if (smmu->features & arm_smmu_dbm_capable(smmu))
>> +		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_HD;
>> +
>>  	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
>>  	if (!pgtbl_ops)
>>  		return -ENOMEM;
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> index d82dd125446c..83d6f3a2554f 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> @@ -288,6 +288,9 @@
>>  #define CTXDESC_CD_0_TCR_IPS		GENMASK_ULL(34, 32)
>>  #define CTXDESC_CD_0_TCR_TBI0		(1ULL << 38)
>>
>> +#define CTXDESC_CD_0_TCR_HA            (1UL << 43)
>> +#define CTXDESC_CD_0_TCR_HD            (1UL << 42)
>> +
>>  #define CTXDESC_CD_0_AA64		(1UL << 41)
>>  #define CTXDESC_CD_0_S			(1UL << 44)
>>  #define CTXDESC_CD_0_R			(1UL << 45)
>> diff --git a/drivers/iommu/io-pgtable-arm.c
>> b/drivers/iommu/io-pgtable-arm.c
>> index 72dcdd468cf3..b2f470529459 100644
>> --- a/drivers/iommu/io-pgtable-arm.c
>> +++ b/drivers/iommu/io-pgtable-arm.c
>> @@ -75,6 +75,7 @@
>>
>>  #define ARM_LPAE_PTE_NSTABLE		(((arm_lpae_iopte)1) << 63)
>>  #define ARM_LPAE_PTE_XN			(((arm_lpae_iopte)3) << 53)
>> +#define ARM_LPAE_PTE_DBM		(((arm_lpae_iopte)1) << 51)
>>  #define ARM_LPAE_PTE_AF			(((arm_lpae_iopte)1) << 10)
>>  #define ARM_LPAE_PTE_SH_NS		(((arm_lpae_iopte)0) << 8)
>>  #define ARM_LPAE_PTE_SH_OS		(((arm_lpae_iopte)2) << 8)
>> @@ -84,7 +85,7 @@
>>
>>  #define ARM_LPAE_PTE_ATTR_LO_MASK	(((arm_lpae_iopte)0x3ff) << 2)
>>  /* Ignore the contiguous bit for block splitting */
>> -#define ARM_LPAE_PTE_ATTR_HI_MASK	(((arm_lpae_iopte)6) << 52)
>> +#define ARM_LPAE_PTE_ATTR_HI_MASK	(((arm_lpae_iopte)13) << 51)
>>  #define ARM_LPAE_PTE_ATTR_MASK		(ARM_LPAE_PTE_ATTR_LO_MASK
>> |	\
>>  					 ARM_LPAE_PTE_ATTR_HI_MASK)
>>  /* Software bit for solving coherency races */
>> @@ -93,6 +94,9 @@
>>  /* Stage-1 PTE */
>>  #define ARM_LPAE_PTE_AP_UNPRIV		(((arm_lpae_iopte)1) << 6)
>>  #define ARM_LPAE_PTE_AP_RDONLY		(((arm_lpae_iopte)2) << 6)
>> +#define ARM_LPAE_PTE_AP_RDONLY_BIT	7
>> +#define ARM_LPAE_PTE_AP_WRITABLE	(ARM_LPAE_PTE_AP_RDONLY | \
>> +					 ARM_LPAE_PTE_DBM)
>>  #define ARM_LPAE_PTE_ATTRINDX_SHIFT	2
>>  #define ARM_LPAE_PTE_nG			(((arm_lpae_iopte)1) << 11)
>>
>> @@ -407,6 +411,8 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct
>> arm_lpae_io_pgtable *data,
>>  		pte = ARM_LPAE_PTE_nG;
>>  		if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
>>  			pte |= ARM_LPAE_PTE_AP_RDONLY;
>> +		else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_HD)
>> +			pte |= ARM_LPAE_PTE_AP_WRITABLE;
>>  		if (!(prot & IOMMU_PRIV))
>>  			pte |= ARM_LPAE_PTE_AP_UNPRIV;
>>  	} else {
>> @@ -804,7 +810,8 @@ arm_64_lpae_alloc_pgtable_s1(struct
>> io_pgtable_cfg *cfg, void *cookie)
>>
>>  	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
>>  			    IO_PGTABLE_QUIRK_ARM_TTBR1 |
>> -			    IO_PGTABLE_QUIRK_ARM_OUTER_WBWA))
>> +			    IO_PGTABLE_QUIRK_ARM_OUTER_WBWA |
>> +			    IO_PGTABLE_QUIRK_ARM_HD))
>>  		return NULL;
>>
>>  	data = arm_lpae_alloc_pgtable(cfg);
>> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
>> index 25142a0e2fc2..9a996ba7856d 100644
>> --- a/include/linux/io-pgtable.h
>> +++ b/include/linux/io-pgtable.h
>> @@ -85,6 +85,8 @@ struct io_pgtable_cfg {
>>  	 *
>>  	 * IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the
>> outer-cacheability
>>  	 *	attributes set in the TCR for a non-coherent page-table walker.
>> +	 *
>> +	 * IO_PGTABLE_QUIRK_ARM_HD: Enables dirty tracking.
>>  	 */
>>  	#define IO_PGTABLE_QUIRK_ARM_NS			BIT(0)
>>  	#define IO_PGTABLE_QUIRK_NO_PERMS		BIT(1)
>> @@ -92,6 +94,8 @@ struct io_pgtable_cfg {
>>  	#define IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT	BIT(4)
>>  	#define IO_PGTABLE_QUIRK_ARM_TTBR1		BIT(5)
>>  	#define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA		BIT(6)
>> +	#define IO_PGTABLE_QUIRK_ARM_HD			BIT(7)
>> +
>>  	unsigned long			quirks;
>>  	unsigned long			pgsize_bitmap;
>>  	unsigned int			ias;
>> --
>> 2.17.2
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* RE: [PATCH RFCv2 24/24] iommu/arm-smmu-v3: Advertise IOMMU_DOMAIN_F_ENFORCE_DIRTY
  2023-05-18 20:46 ` [PATCH RFCv2 24/24] iommu/arm-smmu-v3: Advertise IOMMU_DOMAIN_F_ENFORCE_DIRTY Joao Martins
@ 2023-05-30 14:10   ` Shameerali Kolothum Thodi
  2023-05-30 19:19     ` Joao Martins
  0 siblings, 1 reply; 65+ messages in thread
From: Shameerali Kolothum Thodi @ 2023-05-30 14:10 UTC (permalink / raw)
  To: Joao Martins, iommu
  Cc: Jason Gunthorpe, Kevin Tian, Lu Baolu, Yi Liu, Yi Y Sun,
	Eric Auger, Nicolin Chen, Joerg Roedel, Jean-Philippe Brucker,
	Suravee Suthikulpanit, Will Deacon, Robin Murphy,
	Alex Williamson, kvm



> -----Original Message-----
> From: Joao Martins [mailto:joao.m.martins@oracle.com]
> Sent: 18 May 2023 21:47
> To: iommu@lists.linux.dev
> Cc: Jason Gunthorpe <jgg@nvidia.com>; Kevin Tian <kevin.tian@intel.com>;
> Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; Lu
> Baolu <baolu.lu@linux.intel.com>; Yi Liu <yi.l.liu@intel.com>; Yi Y Sun
> <yi.y.sun@intel.com>; Eric Auger <eric.auger@redhat.com>; Nicolin Chen
> <nicolinc@nvidia.com>; Joerg Roedel <joro@8bytes.org>; Jean-Philippe
> Brucker <jean-philippe@linaro.org>; Suravee Suthikulpanit
> <suravee.suthikulpanit@amd.com>; Will Deacon <will@kernel.org>; Robin
> Murphy <robin.murphy@arm.com>; Alex Williamson
> <alex.williamson@redhat.com>; kvm@vger.kernel.org; Joao Martins
> <joao.m.martins@oracle.com>
> Subject: [PATCH RFCv2 24/24] iommu/arm-smmu-v3: Advertise
> IOMMU_DOMAIN_F_ENFORCE_DIRTY
> 
> Now that we probe, and handle the DBM bit modifier, unblock
> the kAPI usage by exposing the IOMMU_DOMAIN_F_ENFORCE_DIRTY
> and implement it's requirement of revoking device attachment
> in the iommu_capable. Finally expose the IOMMU_CAP_DIRTY to
> users (IOMMUFD_DEVICE_GET_CAPS).
> 
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index bf0aac333725..71dd95a687fd 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2014,6 +2014,8 @@ static bool arm_smmu_capable(struct device *dev,
> enum iommu_cap cap)
>  		return master->smmu->features &
> ARM_SMMU_FEAT_COHERENCY;
>  	case IOMMU_CAP_NOEXEC:
>  		return true;
> +	case IOMMU_CAP_DIRTY:
> +		return arm_smmu_dbm_capable(master->smmu);
>  	default:
>  		return false;
>  	}
> @@ -2430,6 +2432,11 @@ static int arm_smmu_attach_dev(struct
> iommu_domain *domain, struct device *dev)
>  	master = dev_iommu_priv_get(dev);
>  	smmu = master->smmu;
> 
> +	if (domain->flags & IOMMU_DOMAIN_F_ENFORCE_DIRTY &&
> +	    !arm_smmu_dbm_capable(smmu))
> +		return -EINVAL;
> +
> +

Since we have the supported_flags always set to " IOMMU_DOMAIN_F_ENFORCE_DIRTY"
below, platforms that doesn't have DBM capability will fail here, right? Or the idea is to set
domain flag only if the capability is reported true? But the iommu_domain_set_flags() doesn't
seems to check the capability though. 

(This seems to be causing problem with a rebased Qemu branch for ARM I have while sanity
testing on a platform that doesn't have DBM. I need to double check though).

Thanks,
Shameer

   
>  	/*
>  	 * Checking that SVA is disabled ensures that this device isn't bound to
>  	 * any mm, and can be safely detached from its old domain. Bonds
> cannot
> @@ -2913,6 +2920,7 @@ static void arm_smmu_remove_dev_pasid(struct
> device *dev, ioasid_t pasid)
>  }
> 
>  static struct iommu_ops arm_smmu_ops = {
> +	.supported_flags	= IOMMU_DOMAIN_F_ENFORCE_DIRTY,
>  	.capable		= arm_smmu_capable,
>  	.domain_alloc		= arm_smmu_domain_alloc,
>  	.probe_device		= arm_smmu_probe_device,
> --
> 2.17.2


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 24/24] iommu/arm-smmu-v3: Advertise IOMMU_DOMAIN_F_ENFORCE_DIRTY
  2023-05-30 14:10   ` Shameerali Kolothum Thodi
@ 2023-05-30 19:19     ` Joao Martins
  2023-05-31  9:21       ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 65+ messages in thread
From: Joao Martins @ 2023-05-30 19:19 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: Jason Gunthorpe, Kevin Tian, Lu Baolu, Yi Liu, Yi Y Sun,
	Eric Auger, Nicolin Chen, Joerg Roedel, Jean-Philippe Brucker,
	Suravee Suthikulpanit, Will Deacon, Robin Murphy,
	Alex Williamson, kvm, iommu

On 30/05/2023 15:10, Shameerali Kolothum Thodi wrote:
>> -----Original Message-----
>> From: Joao Martins [mailto:joao.m.martins@oracle.com]
>> Sent: 18 May 2023 21:47
>> To: iommu@lists.linux.dev
>> Cc: Jason Gunthorpe <jgg@nvidia.com>; Kevin Tian <kevin.tian@intel.com>;
>> Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; Lu
>> Baolu <baolu.lu@linux.intel.com>; Yi Liu <yi.l.liu@intel.com>; Yi Y Sun
>> <yi.y.sun@intel.com>; Eric Auger <eric.auger@redhat.com>; Nicolin Chen
>> <nicolinc@nvidia.com>; Joerg Roedel <joro@8bytes.org>; Jean-Philippe
>> Brucker <jean-philippe@linaro.org>; Suravee Suthikulpanit
>> <suravee.suthikulpanit@amd.com>; Will Deacon <will@kernel.org>; Robin
>> Murphy <robin.murphy@arm.com>; Alex Williamson
>> <alex.williamson@redhat.com>; kvm@vger.kernel.org; Joao Martins
>> <joao.m.martins@oracle.com>
>> Subject: [PATCH RFCv2 24/24] iommu/arm-smmu-v3: Advertise
>> IOMMU_DOMAIN_F_ENFORCE_DIRTY
>>
>> Now that we probe, and handle the DBM bit modifier, unblock
>> the kAPI usage by exposing the IOMMU_DOMAIN_F_ENFORCE_DIRTY
>> and implement it's requirement of revoking device attachment
>> in the iommu_capable. Finally expose the IOMMU_CAP_DIRTY to
>> users (IOMMUFD_DEVICE_GET_CAPS).
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index bf0aac333725..71dd95a687fd 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -2014,6 +2014,8 @@ static bool arm_smmu_capable(struct device *dev,
>> enum iommu_cap cap)
>>  		return master->smmu->features &
>> ARM_SMMU_FEAT_COHERENCY;
>>  	case IOMMU_CAP_NOEXEC:
>>  		return true;
>> +	case IOMMU_CAP_DIRTY:
>> +		return arm_smmu_dbm_capable(master->smmu);
>>  	default:
>>  		return false;
>>  	}
>> @@ -2430,6 +2432,11 @@ static int arm_smmu_attach_dev(struct
>> iommu_domain *domain, struct device *dev)
>>  	master = dev_iommu_priv_get(dev);
>>  	smmu = master->smmu;
>>
>> +	if (domain->flags & IOMMU_DOMAIN_F_ENFORCE_DIRTY &&
>> +	    !arm_smmu_dbm_capable(smmu))
>> +		return -EINVAL;
>> +
>> +
> 
> Since we have the supported_flags always set to " IOMMU_DOMAIN_F_ENFORCE_DIRTY"
> below, platforms that doesn't have DBM capability will fail here, right? 
> Or the idea is to set
> domain flag only if the capability is reported true? But the iommu_domain_set_flags() doesn't
> seems to check the capability though. 
> 
As posted the checking was only take place at device_attach (and you would set
the enforcement flag if iommufd reports the capability for the device via
IOMMU_DEVICE_GET_CAPS).

But the workflow will change a bit: while the enforcement also takes place on
device attach, but when we create a HWPT domain with flags (in
domain_alloc_user[0]), the dirty tracking is also going to be checked there
against the device passed in domain_alloc_user() in the driver implementation.
And otherwise fail if doesn't support when dirty-tracking support enforcement as
passed by flags. When we don't request dirty tracking the iommu ops that perform
the dirty tracking will also be kept cleared.

[0] https://lore.kernel.org/linux-iommu/20230511143844.22693-2-yi.l.liu@intel.com/

> (This seems to be causing problem with a rebased Qemu branch for ARM I have while sanity
> testing on a platform that doesn't have DBM. I need to double check though).
> 

Perhaps due to the broken check I had that I need validate the two bits
together, when it didn't had DBM set? Or I suspect because the qemu last patch I
was always end up setting IOMMU_DOMAIN_F_ENFORCE_DIRTY [*], and because the
checking is always enabled you can never attach devices.

[*] That last patch isn't quite there yet as it is meant to be using
device-get-caps prior to setting the enforcement, like the selftests

^ permalink raw reply	[flat|nested] 65+ messages in thread

* RE: [PATCH RFCv2 24/24] iommu/arm-smmu-v3: Advertise IOMMU_DOMAIN_F_ENFORCE_DIRTY
  2023-05-30 19:19     ` Joao Martins
@ 2023-05-31  9:21       ` Shameerali Kolothum Thodi
  0 siblings, 0 replies; 65+ messages in thread
From: Shameerali Kolothum Thodi @ 2023-05-31  9:21 UTC (permalink / raw)
  To: Joao Martins
  Cc: Jason Gunthorpe, Kevin Tian, Lu Baolu, Yi Liu, Yi Y Sun,
	Eric Auger, Nicolin Chen, Joerg Roedel, Jean-Philippe Brucker,
	Suravee Suthikulpanit, Will Deacon, Robin Murphy,
	Alex Williamson, kvm, iommu



> -----Original Message-----
> From: Joao Martins [mailto:joao.m.martins@oracle.com]
> Sent: 30 May 2023 20:20
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: Jason Gunthorpe <jgg@nvidia.com>; Kevin Tian <kevin.tian@intel.com>;
> Lu Baolu <baolu.lu@linux.intel.com>; Yi Liu <yi.l.liu@intel.com>; Yi Y Sun
> <yi.y.sun@intel.com>; Eric Auger <eric.auger@redhat.com>; Nicolin Chen
> <nicolinc@nvidia.com>; Joerg Roedel <joro@8bytes.org>; Jean-Philippe
> Brucker <jean-philippe@linaro.org>; Suravee Suthikulpanit
> <suravee.suthikulpanit@amd.com>; Will Deacon <will@kernel.org>; Robin
> Murphy <robin.murphy@arm.com>; Alex Williamson
> <alex.williamson@redhat.com>; kvm@vger.kernel.org;
> iommu@lists.linux.dev
> Subject: Re: [PATCH RFCv2 24/24] iommu/arm-smmu-v3: Advertise
> IOMMU_DOMAIN_F_ENFORCE_DIRTY
> 
> On 30/05/2023 15:10, Shameerali Kolothum Thodi wrote:
> >> -----Original Message-----
> >> From: Joao Martins [mailto:joao.m.martins@oracle.com]
> >> Sent: 18 May 2023 21:47
> >> To: iommu@lists.linux.dev
> >> Cc: Jason Gunthorpe <jgg@nvidia.com>; Kevin Tian
> <kevin.tian@intel.com>;
> >> Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> Lu
> >> Baolu <baolu.lu@linux.intel.com>; Yi Liu <yi.l.liu@intel.com>; Yi Y Sun
> >> <yi.y.sun@intel.com>; Eric Auger <eric.auger@redhat.com>; Nicolin Chen
> >> <nicolinc@nvidia.com>; Joerg Roedel <joro@8bytes.org>; Jean-Philippe
> >> Brucker <jean-philippe@linaro.org>; Suravee Suthikulpanit
> >> <suravee.suthikulpanit@amd.com>; Will Deacon <will@kernel.org>; Robin
> >> Murphy <robin.murphy@arm.com>; Alex Williamson
> >> <alex.williamson@redhat.com>; kvm@vger.kernel.org; Joao Martins
> >> <joao.m.martins@oracle.com>
> >> Subject: [PATCH RFCv2 24/24] iommu/arm-smmu-v3: Advertise
> >> IOMMU_DOMAIN_F_ENFORCE_DIRTY
> >>
> >> Now that we probe, and handle the DBM bit modifier, unblock
> >> the kAPI usage by exposing the IOMMU_DOMAIN_F_ENFORCE_DIRTY
> >> and implement it's requirement of revoking device attachment
> >> in the iommu_capable. Finally expose the IOMMU_CAP_DIRTY to
> >> users (IOMMUFD_DEVICE_GET_CAPS).
> >>
> >> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> >> ---
> >>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 8 ++++++++
> >>  1 file changed, 8 insertions(+)
> >>
> >> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> >> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> >> index bf0aac333725..71dd95a687fd 100644
> >> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> >> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> >> @@ -2014,6 +2014,8 @@ static bool arm_smmu_capable(struct device
> *dev,
> >> enum iommu_cap cap)
> >>  		return master->smmu->features &
> >> ARM_SMMU_FEAT_COHERENCY;
> >>  	case IOMMU_CAP_NOEXEC:
> >>  		return true;
> >> +	case IOMMU_CAP_DIRTY:
> >> +		return arm_smmu_dbm_capable(master->smmu);
> >>  	default:
> >>  		return false;
> >>  	}
> >> @@ -2430,6 +2432,11 @@ static int arm_smmu_attach_dev(struct
> >> iommu_domain *domain, struct device *dev)
> >>  	master = dev_iommu_priv_get(dev);
> >>  	smmu = master->smmu;
> >>
> >> +	if (domain->flags & IOMMU_DOMAIN_F_ENFORCE_DIRTY &&
> >> +	    !arm_smmu_dbm_capable(smmu))
> >> +		return -EINVAL;
> >> +
> >> +
> >
> > Since we have the supported_flags always set to "
> IOMMU_DOMAIN_F_ENFORCE_DIRTY"
> > below, platforms that doesn't have DBM capability will fail here, right?
> > Or the idea is to set
> > domain flag only if the capability is reported true? But the
> iommu_domain_set_flags() doesn't
> > seems to check the capability though.
> >
> As posted the checking was only take place at device_attach (and you would
> set
> the enforcement flag if iommufd reports the capability for the device via
> IOMMU_DEVICE_GET_CAPS).

Ok. So CAPS is retrieved before we set the enforcement flag.

> 
> But the workflow will change a bit: while the enforcement also takes place
> on
> device attach, but when we create a HWPT domain with flags (in
> domain_alloc_user[0]), the dirty tracking is also going to be checked there
> against the device passed in domain_alloc_user() in the driver
> implementation.
> And otherwise fail if doesn't support when dirty-tracking support
> enforcement as
> passed by flags. When we don't request dirty tracking the iommu ops that
> perform
> the dirty tracking will also be kept cleared.

Ok.

> 
> [0]
> https://lore.kernel.org/linux-iommu/20230511143844.22693-2-yi.l.liu@inte
> l.com/
> 
> > (This seems to be causing problem with a rebased Qemu branch for ARM I
> have while sanity
> > testing on a platform that doesn't have DBM. I need to double check
> though).
> >
> 
> Perhaps due to the broken check I had that I need validate the two bits
> together, when it didn't had DBM set?

I have that fixed in my branch now.

 Or I suspect because the qemu last
> patch I
> was always end up setting IOMMU_DOMAIN_F_ENFORCE_DIRTY [*], and
> because the
> checking is always enabled you can never attach devices.

Ah.. this is it. 

> [*] That last patch isn't quite there yet as it is meant to be using
> device-get-caps prior to setting the enforcement, like the selftests

Got it.

Thanks,
Shameer

^ permalink raw reply	[flat|nested] 65+ messages in thread

* RE: [PATCH RFCv2 22/24] iommu/arm-smmu-v3: Add read_and_clear_dirty() support
  2023-05-18 20:46 ` [PATCH RFCv2 22/24] iommu/arm-smmu-v3: Add read_and_clear_dirty() support Joao Martins
@ 2023-06-16 16:46   ` Shameerali Kolothum Thodi
  2023-06-16 18:10     ` Joao Martins
  0 siblings, 1 reply; 65+ messages in thread
From: Shameerali Kolothum Thodi @ 2023-06-16 16:46 UTC (permalink / raw)
  To: Joao Martins, iommu
  Cc: Jason Gunthorpe, Kevin Tian, Lu Baolu, Yi Liu, Yi Y Sun,
	Eric Auger, Nicolin Chen, Joerg Roedel, Jean-Philippe Brucker,
	Suravee Suthikulpanit, Will Deacon, Robin Murphy,
	Alex Williamson, kvm

Hi Joao,

> -----Original Message-----
> From: Joao Martins [mailto:joao.m.martins@oracle.com]
> Sent: 18 May 2023 21:47
> To: iommu@lists.linux.dev
> Cc: Jason Gunthorpe <jgg@nvidia.com>; Kevin Tian <kevin.tian@intel.com>;
> Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; Lu
> Baolu <baolu.lu@linux.intel.com>; Yi Liu <yi.l.liu@intel.com>; Yi Y Sun
> <yi.y.sun@intel.com>; Eric Auger <eric.auger@redhat.com>; Nicolin Chen
> <nicolinc@nvidia.com>; Joerg Roedel <joro@8bytes.org>; Jean-Philippe
> Brucker <jean-philippe@linaro.org>; Suravee Suthikulpanit
> <suravee.suthikulpanit@amd.com>; Will Deacon <will@kernel.org>; Robin
> Murphy <robin.murphy@arm.com>; Alex Williamson
> <alex.williamson@redhat.com>; kvm@vger.kernel.org; Joao Martins
> <joao.m.martins@oracle.com>
> Subject: [PATCH RFCv2 22/24] iommu/arm-smmu-v3: Add
> read_and_clear_dirty() support
> 
> From: Keqian Zhu <zhukeqian1@huawei.com>
> 
> .read_and_clear_dirty() IOMMU domain op takes care of reading the dirty
> bits (i.e. PTE has both DBM and AP[2] set) and marshalling into a bitmap of
> a given page size.
> 
> While reading the dirty bits we also clear the PTE AP[2] bit to mark it as
> writable-clean depending on read_and_clear_dirty() flags.
> 
> Structure it in a way that the IOPTE walker is generic, and so we pass a
> function pointer over what to do on a per-PTE basis.
> 
> [Link below points to the original version that was based on]
> 
> Link:
> https://lore.kernel.org/lkml/20210413085457.25400-11-zhukeqian1@huaw
> ei.com/
> Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
> Co-developed-by: Kunkun Jiang <jiangkunkun@huawei.com>
> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
> [joaomart: Massage commit message]
> Co-developed-by: Joao Martins <joao.m.martins@oracle.com>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  23 +++++
>  drivers/iommu/io-pgtable-arm.c              | 104
> ++++++++++++++++++++
>  2 files changed, 127 insertions(+)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index e2b98a6a6b74..2cde14003469 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2765,6 +2765,28 @@ static int arm_smmu_enable_nesting(struct
> iommu_domain *domain)
>  	return ret;
>  }
> 
> +static int arm_smmu_read_and_clear_dirty(struct iommu_domain
> *domain,
> +					 unsigned long iova, size_t size,
> +					 unsigned long flags,
> +					 struct iommu_dirty_bitmap *dirty)
> +{
> +	struct arm_smmu_domain *smmu_domain =
> to_smmu_domain(domain);
> +	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
> +	int ret;
> +
> +	if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
> +		return -EINVAL;
> +
> +	if (!ops || !ops->read_and_clear_dirty) {
> +		pr_err_once("io-pgtable don't support dirty tracking\n");
> +		return -ENODEV;
> +	}
> +
> +	ret = ops->read_and_clear_dirty(ops, iova, size, flags, dirty);
> +
> +	return ret;
> +}
> +
>  static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args
> *args)
>  {
>  	return iommu_fwspec_add_ids(dev, args->args, 1);
> @@ -2893,6 +2915,7 @@ static struct iommu_ops arm_smmu_ops = {
>  		.iova_to_phys		= arm_smmu_iova_to_phys,
>  		.enable_nesting		= arm_smmu_enable_nesting,
>  		.free			= arm_smmu_domain_free,
> +		.read_and_clear_dirty	= arm_smmu_read_and_clear_dirty,
>  	}
>  };
> 
> diff --git a/drivers/iommu/io-pgtable-arm.c
> b/drivers/iommu/io-pgtable-arm.c
> index b2f470529459..de9e61f8452d 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -717,6 +717,109 @@ static phys_addr_t arm_lpae_iova_to_phys(struct
> io_pgtable_ops *ops,
>  	return iopte_to_paddr(pte, data) | iova;
>  }
> 
> +struct arm_lpae_iopte_read_dirty {
> +	unsigned long flags;
> +	struct iommu_dirty_bitmap *dirty;
> +};
> +
> +static int __arm_lpae_read_and_clear_dirty(unsigned long iova, size_t size,
> +					   arm_lpae_iopte *ptep, void *opaque)
> +{
> +	struct arm_lpae_iopte_read_dirty *arg = opaque;
> +	struct iommu_dirty_bitmap *dirty = arg->dirty;
> +	arm_lpae_iopte pte;
> +
> +	pte = READ_ONCE(*ptep);
> +	if (WARN_ON(!pte))
> +		return -EINVAL;
> +
> +	if ((pte & ARM_LPAE_PTE_AP_WRITABLE) ==
> ARM_LPAE_PTE_AP_WRITABLE)
> +		return 0;
> +
> +	iommu_dirty_bitmap_record(dirty, iova, size);
> +	if (!(arg->flags & IOMMU_DIRTY_NO_CLEAR))
> +		set_bit(ARM_LPAE_PTE_AP_RDONLY_BIT, (unsigned long *)ptep);
> +	return 0;
> +}
> +
> +static int __arm_lpae_iopte_walk(struct arm_lpae_io_pgtable *data,
> +				 unsigned long iova, size_t size,
> +				 int lvl, arm_lpae_iopte *ptep,
> +				 int (*fn)(unsigned long iova, size_t size,
> +					   arm_lpae_iopte *pte, void *opaque),
> +				 void *opaque)
> +{
> +	arm_lpae_iopte pte;
> +	struct io_pgtable *iop = &data->iop;
> +	size_t base, next_size;
> +	int ret;
> +
> +	if (WARN_ON_ONCE(!fn))
> +		return -EINVAL;
> +
> +	if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
> +		return -EINVAL;
> +
> +	ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
> +	pte = READ_ONCE(*ptep);
> +	if (WARN_ON(!pte))
> +		return -EINVAL;
> +
> +	if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
> +		if (iopte_leaf(pte, lvl, iop->fmt))
> +			return fn(iova, size, ptep, opaque);
> +
> +		/* Current level is table, traverse next level */
> +		next_size = ARM_LPAE_BLOCK_SIZE(lvl + 1, data);
> +		ptep = iopte_deref(pte, data);
> +		for (base = 0; base < size; base += next_size) {
> +			ret = __arm_lpae_iopte_walk(data, iova + base,
> +						    next_size, lvl + 1, ptep,
> +						    fn, opaque);
> +			if (ret)
> +				return ret;
> +		}
> +		return 0;
> +	} else if (iopte_leaf(pte, lvl, iop->fmt)) {
> +		return fn(iova, size, ptep, opaque);
> +	}
> +
> +	/* Keep on walkin */
> +	ptep = iopte_deref(pte, data);
> +	return __arm_lpae_iopte_walk(data, iova, size, lvl + 1, ptep,
> +				     fn, opaque);
> +}
> +
> +static int arm_lpae_read_and_clear_dirty(struct io_pgtable_ops *ops,
> +					 unsigned long iova, size_t size,
> +					 unsigned long flags,
> +					 struct iommu_dirty_bitmap *dirty)
> +{
> +	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> +	struct io_pgtable_cfg *cfg = &data->iop.cfg;
> +	struct arm_lpae_iopte_read_dirty arg = {
> +		.flags = flags, .dirty = dirty,
> +	};
> +	arm_lpae_iopte *ptep = data->pgd;
> +	int lvl = data->start_level;
> +	long iaext = (s64)iova >> cfg->ias;
> +
> +	if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
> +		return -EINVAL;

I guess the size here is supposed to be one of the pgsize that iommu supports.
But looking at the code, it looks like we are passing the iova mapped length and
it fails here in my test setup. Could you please check and confirm.

Thanks,
Shameer


> +
> +	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
> +		iaext = ~iaext;
> +	if (WARN_ON(iaext))
> +		return -EINVAL;
> +
> +	if (data->iop.fmt != ARM_64_LPAE_S1 &&
> +	    data->iop.fmt != ARM_32_LPAE_S1)
> +		return -EINVAL;
> +
> +	return __arm_lpae_iopte_walk(data, iova, size, lvl, ptep,
> +				     __arm_lpae_read_and_clear_dirty, &arg);
> +}
> +
>  static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
>  {
>  	unsigned long granule, page_sizes;
> @@ -795,6 +898,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
>  		.map_pages	= arm_lpae_map_pages,
>  		.unmap_pages	= arm_lpae_unmap_pages,
>  		.iova_to_phys	= arm_lpae_iova_to_phys,
> +		.read_and_clear_dirty = arm_lpae_read_and_clear_dirty,
>  	};
> 
>  	return data;
> --
> 2.17.2


^ permalink raw reply	[flat|nested] 65+ messages in thread

* RE: [PATCH RFCv2 21/24] iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping
  2023-05-22 10:43     ` Joao Martins
@ 2023-06-16 17:00       ` Shameerali Kolothum Thodi
  2023-06-16 18:11         ` Joao Martins
  0 siblings, 1 reply; 65+ messages in thread
From: Shameerali Kolothum Thodi @ 2023-06-16 17:00 UTC (permalink / raw)
  To: Joao Martins, iommu
  Cc: Jason Gunthorpe, Kevin Tian, Lu Baolu, Yi Liu, Yi Y Sun,
	Eric Auger, Nicolin Chen, Joerg Roedel, Jean-Philippe Brucker,
	Suravee Suthikulpanit, Will Deacon, Robin Murphy,
	Alex Williamson, kvm



> -----Original Message-----
> From: Joao Martins [mailto:joao.m.martins@oracle.com]
> Sent: 22 May 2023 11:43
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> iommu@lists.linux.dev
> Cc: Jason Gunthorpe <jgg@nvidia.com>; Kevin Tian <kevin.tian@intel.com>;
> Lu Baolu <baolu.lu@linux.intel.com>; Yi Liu <yi.l.liu@intel.com>; Yi Y Sun
> <yi.y.sun@intel.com>; Eric Auger <eric.auger@redhat.com>; Nicolin Chen
> <nicolinc@nvidia.com>; Joerg Roedel <joro@8bytes.org>; Jean-Philippe
> Brucker <jean-philippe@linaro.org>; Suravee Suthikulpanit
> <suravee.suthikulpanit@amd.com>; Will Deacon <will@kernel.org>; Robin
> Murphy <robin.murphy@arm.com>; Alex Williamson
> <alex.williamson@redhat.com>; kvm@vger.kernel.org
> Subject: Re: [PATCH RFCv2 21/24] iommu/arm-smmu-v3: Enable HTTU for
> stage1 with io-pgtable mapping

[...]

> >> @@ -2226,6 +2233,9 @@ static int arm_smmu_domain_finalise(struct
> >> iommu_domain *domain,
> >>  		.iommu_dev	= smmu->dev,
> >>  	};
> >>
> >> +	if (smmu->features & arm_smmu_dbm_capable(smmu))
> >> +		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_HD;

Also, I think we should limit setting this to s1 only pgtbl_cfg.

Thanks,
Shameer

> >> +
> >>  	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
> >>  	if (!pgtbl_ops)
> >>  		return -ENOMEM;
> >> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> >> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> >> index d82dd125446c..83d6f3a2554f 100644
> >> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> >> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> >> @@ -288,6 +288,9 @@
> >>  #define CTXDESC_CD_0_TCR_IPS		GENMASK_ULL(34, 32)
> >>  #define CTXDESC_CD_0_TCR_TBI0		(1ULL << 38)
> >>
> >> +#define CTXDESC_CD_0_TCR_HA            (1UL << 43)
> >> +#define CTXDESC_CD_0_TCR_HD            (1UL << 42)
> >> +
> >>  #define CTXDESC_CD_0_AA64		(1UL << 41)
> >>  #define CTXDESC_CD_0_S			(1UL << 44)
> >>  #define CTXDESC_CD_0_R			(1UL << 45)
> >> diff --git a/drivers/iommu/io-pgtable-arm.c
> >> b/drivers/iommu/io-pgtable-arm.c index 72dcdd468cf3..b2f470529459
> >> 100644
> >> --- a/drivers/iommu/io-pgtable-arm.c
> >> +++ b/drivers/iommu/io-pgtable-arm.c
> >> @@ -75,6 +75,7 @@
> >>
> >>  #define ARM_LPAE_PTE_NSTABLE		(((arm_lpae_iopte)1) << 63)
> >>  #define ARM_LPAE_PTE_XN			(((arm_lpae_iopte)3) << 53)
> >> +#define ARM_LPAE_PTE_DBM		(((arm_lpae_iopte)1) << 51)
> >>  #define ARM_LPAE_PTE_AF			(((arm_lpae_iopte)1) << 10)
> >>  #define ARM_LPAE_PTE_SH_NS		(((arm_lpae_iopte)0) << 8)
> >>  #define ARM_LPAE_PTE_SH_OS		(((arm_lpae_iopte)2) << 8)
> >> @@ -84,7 +85,7 @@
> >>
> >>  #define ARM_LPAE_PTE_ATTR_LO_MASK	(((arm_lpae_iopte)0x3ff) <<
> 2)
> >>  /* Ignore the contiguous bit for block splitting */
> >> -#define ARM_LPAE_PTE_ATTR_HI_MASK	(((arm_lpae_iopte)6) << 52)
> >> +#define ARM_LPAE_PTE_ATTR_HI_MASK	(((arm_lpae_iopte)13) <<
> 51)
> >>  #define ARM_LPAE_PTE_ATTR_MASK
> 	(ARM_LPAE_PTE_ATTR_LO_MASK
> >> |	\
> >>  					 ARM_LPAE_PTE_ATTR_HI_MASK)
> >>  /* Software bit for solving coherency races */ @@ -93,6 +94,9 @@
> >>  /* Stage-1 PTE */
> >>  #define ARM_LPAE_PTE_AP_UNPRIV		(((arm_lpae_iopte)1) << 6)
> >>  #define ARM_LPAE_PTE_AP_RDONLY		(((arm_lpae_iopte)2) << 6)
> >> +#define ARM_LPAE_PTE_AP_RDONLY_BIT	7
> >> +#define ARM_LPAE_PTE_AP_WRITABLE
> 	(ARM_LPAE_PTE_AP_RDONLY | \
> >> +					 ARM_LPAE_PTE_DBM)
> >>  #define ARM_LPAE_PTE_ATTRINDX_SHIFT	2
> >>  #define ARM_LPAE_PTE_nG			(((arm_lpae_iopte)1) << 11)
> >>
> >> @@ -407,6 +411,8 @@ static arm_lpae_iopte
> arm_lpae_prot_to_pte(struct
> >> arm_lpae_io_pgtable *data,
> >>  		pte = ARM_LPAE_PTE_nG;
> >>  		if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
> >>  			pte |= ARM_LPAE_PTE_AP_RDONLY;
> >> +		else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_HD)
> >> +			pte |= ARM_LPAE_PTE_AP_WRITABLE;
> >>  		if (!(prot & IOMMU_PRIV))
> >>  			pte |= ARM_LPAE_PTE_AP_UNPRIV;
> >>  	} else {
> >> @@ -804,7 +810,8 @@ arm_64_lpae_alloc_pgtable_s1(struct
> >> io_pgtable_cfg *cfg, void *cookie)
> >>
> >>  	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
> >>  			    IO_PGTABLE_QUIRK_ARM_TTBR1 |
> >> -			    IO_PGTABLE_QUIRK_ARM_OUTER_WBWA))
> >> +			    IO_PGTABLE_QUIRK_ARM_OUTER_WBWA |
> >> +			    IO_PGTABLE_QUIRK_ARM_HD))
> >>  		return NULL;
> >>
> >>  	data = arm_lpae_alloc_pgtable(cfg); diff --git
> >> a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index
> >> 25142a0e2fc2..9a996ba7856d 100644
> >> --- a/include/linux/io-pgtable.h
> >> +++ b/include/linux/io-pgtable.h
> >> @@ -85,6 +85,8 @@ struct io_pgtable_cfg {
> >>  	 *
> >>  	 * IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the
> outer-cacheability
> >>  	 *	attributes set in the TCR for a non-coherent page-table walker.
> >> +	 *
> >> +	 * IO_PGTABLE_QUIRK_ARM_HD: Enables dirty tracking.
> >>  	 */
> >>  	#define IO_PGTABLE_QUIRK_ARM_NS			BIT(0)
> >>  	#define IO_PGTABLE_QUIRK_NO_PERMS		BIT(1)
> >> @@ -92,6 +94,8 @@ struct io_pgtable_cfg {
> >>  	#define IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT	BIT(4)
> >>  	#define IO_PGTABLE_QUIRK_ARM_TTBR1		BIT(5)
> >>  	#define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA		BIT(6)
> >> +	#define IO_PGTABLE_QUIRK_ARM_HD			BIT(7)
> >> +
> >>  	unsigned long			quirks;
> >>  	unsigned long			pgsize_bitmap;
> >>  	unsigned int			ias;
> >> --
> >> 2.17.2
> >

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 22/24] iommu/arm-smmu-v3: Add read_and_clear_dirty() support
  2023-06-16 16:46   ` Shameerali Kolothum Thodi
@ 2023-06-16 18:10     ` Joao Martins
  0 siblings, 0 replies; 65+ messages in thread
From: Joao Martins @ 2023-06-16 18:10 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: Jason Gunthorpe, Kevin Tian, Lu Baolu, Yi Liu, Yi Y Sun, iommu,
	Eric Auger, Nicolin Chen, Joerg Roedel, Jean-Philippe Brucker,
	Suravee Suthikulpanit, Will Deacon, Robin Murphy,
	Alex Williamson, kvm

On 16/06/2023 17:46, Shameerali Kolothum Thodi wrote:
> Hi Joao,
> 
>> -----Original Message-----
>> From: Joao Martins [mailto:joao.m.martins@oracle.com]
>> Sent: 18 May 2023 21:47
>> To: iommu@lists.linux.dev
>> Cc: Jason Gunthorpe <jgg@nvidia.com>; Kevin Tian <kevin.tian@intel.com>;
>> Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; Lu
>> Baolu <baolu.lu@linux.intel.com>; Yi Liu <yi.l.liu@intel.com>; Yi Y Sun
>> <yi.y.sun@intel.com>; Eric Auger <eric.auger@redhat.com>; Nicolin Chen
>> <nicolinc@nvidia.com>; Joerg Roedel <joro@8bytes.org>; Jean-Philippe
>> Brucker <jean-philippe@linaro.org>; Suravee Suthikulpanit
>> <suravee.suthikulpanit@amd.com>; Will Deacon <will@kernel.org>; Robin
>> Murphy <robin.murphy@arm.com>; Alex Williamson
>> <alex.williamson@redhat.com>; kvm@vger.kernel.org; Joao Martins
>> <joao.m.martins@oracle.com>
>> Subject: [PATCH RFCv2 22/24] iommu/arm-smmu-v3: Add
>> read_and_clear_dirty() support
>>
>> From: Keqian Zhu <zhukeqian1@huawei.com>
>>
>> .read_and_clear_dirty() IOMMU domain op takes care of reading the dirty
>> bits (i.e. PTE has both DBM and AP[2] set) and marshalling into a bitmap of
>> a given page size.
>>
>> While reading the dirty bits we also clear the PTE AP[2] bit to mark it as
>> writable-clean depending on read_and_clear_dirty() flags.
>>
>> Structure it in a way that the IOPTE walker is generic, and so we pass a
>> function pointer over what to do on a per-PTE basis.
>>
>> [Link below points to the original version that was based on]
>>
>> Link:
>> https://lore.kernel.org/lkml/20210413085457.25400-11-zhukeqian1@huaw
>> ei.com/
>> Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
>> Co-developed-by: Kunkun Jiang <jiangkunkun@huawei.com>
>> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
>> [joaomart: Massage commit message]
>> Co-developed-by: Joao Martins <joao.m.martins@oracle.com>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  23 +++++
>>  drivers/iommu/io-pgtable-arm.c              | 104
>> ++++++++++++++++++++
>>  2 files changed, 127 insertions(+)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index e2b98a6a6b74..2cde14003469 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -2765,6 +2765,28 @@ static int arm_smmu_enable_nesting(struct
>> iommu_domain *domain)
>>  	return ret;
>>  }
>>
>> +static int arm_smmu_read_and_clear_dirty(struct iommu_domain
>> *domain,
>> +					 unsigned long iova, size_t size,
>> +					 unsigned long flags,
>> +					 struct iommu_dirty_bitmap *dirty)
>> +{
>> +	struct arm_smmu_domain *smmu_domain =
>> to_smmu_domain(domain);
>> +	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>> +	int ret;
>> +
>> +	if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
>> +		return -EINVAL;
>> +
>> +	if (!ops || !ops->read_and_clear_dirty) {
>> +		pr_err_once("io-pgtable don't support dirty tracking\n");
>> +		return -ENODEV;
>> +	}
>> +
>> +	ret = ops->read_and_clear_dirty(ops, iova, size, flags, dirty);
>> +
>> +	return ret;
>> +}
>> +
>>  static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args
>> *args)
>>  {
>>  	return iommu_fwspec_add_ids(dev, args->args, 1);
>> @@ -2893,6 +2915,7 @@ static struct iommu_ops arm_smmu_ops = {
>>  		.iova_to_phys		= arm_smmu_iova_to_phys,
>>  		.enable_nesting		= arm_smmu_enable_nesting,
>>  		.free			= arm_smmu_domain_free,
>> +		.read_and_clear_dirty	= arm_smmu_read_and_clear_dirty,
>>  	}
>>  };
>>
>> diff --git a/drivers/iommu/io-pgtable-arm.c
>> b/drivers/iommu/io-pgtable-arm.c
>> index b2f470529459..de9e61f8452d 100644
>> --- a/drivers/iommu/io-pgtable-arm.c
>> +++ b/drivers/iommu/io-pgtable-arm.c
>> @@ -717,6 +717,109 @@ static phys_addr_t arm_lpae_iova_to_phys(struct
>> io_pgtable_ops *ops,
>>  	return iopte_to_paddr(pte, data) | iova;
>>  }
>>
>> +struct arm_lpae_iopte_read_dirty {
>> +	unsigned long flags;
>> +	struct iommu_dirty_bitmap *dirty;
>> +};
>> +
>> +static int __arm_lpae_read_and_clear_dirty(unsigned long iova, size_t size,
>> +					   arm_lpae_iopte *ptep, void *opaque)
>> +{
>> +	struct arm_lpae_iopte_read_dirty *arg = opaque;
>> +	struct iommu_dirty_bitmap *dirty = arg->dirty;
>> +	arm_lpae_iopte pte;
>> +
>> +	pte = READ_ONCE(*ptep);
>> +	if (WARN_ON(!pte))
>> +		return -EINVAL;
>> +
>> +	if ((pte & ARM_LPAE_PTE_AP_WRITABLE) ==
>> ARM_LPAE_PTE_AP_WRITABLE)
>> +		return 0;
>> +
>> +	iommu_dirty_bitmap_record(dirty, iova, size);
>> +	if (!(arg->flags & IOMMU_DIRTY_NO_CLEAR))
>> +		set_bit(ARM_LPAE_PTE_AP_RDONLY_BIT, (unsigned long *)ptep);
>> +	return 0;
>> +}
>> +
>> +static int __arm_lpae_iopte_walk(struct arm_lpae_io_pgtable *data,
>> +				 unsigned long iova, size_t size,
>> +				 int lvl, arm_lpae_iopte *ptep,
>> +				 int (*fn)(unsigned long iova, size_t size,
>> +					   arm_lpae_iopte *pte, void *opaque),
>> +				 void *opaque)
>> +{
>> +	arm_lpae_iopte pte;
>> +	struct io_pgtable *iop = &data->iop;
>> +	size_t base, next_size;
>> +	int ret;
>> +
>> +	if (WARN_ON_ONCE(!fn))
>> +		return -EINVAL;
>> +
>> +	if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
>> +		return -EINVAL;
>> +
>> +	ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
>> +	pte = READ_ONCE(*ptep);
>> +	if (WARN_ON(!pte))
>> +		return -EINVAL;
>> +
>> +	if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
>> +		if (iopte_leaf(pte, lvl, iop->fmt))
>> +			return fn(iova, size, ptep, opaque);
>> +
>> +		/* Current level is table, traverse next level */
>> +		next_size = ARM_LPAE_BLOCK_SIZE(lvl + 1, data);
>> +		ptep = iopte_deref(pte, data);
>> +		for (base = 0; base < size; base += next_size) {
>> +			ret = __arm_lpae_iopte_walk(data, iova + base,
>> +						    next_size, lvl + 1, ptep,
>> +						    fn, opaque);
>> +			if (ret)
>> +				return ret;
>> +		}
>> +		return 0;
>> +	} else if (iopte_leaf(pte, lvl, iop->fmt)) {
>> +		return fn(iova, size, ptep, opaque);
>> +	}
>> +
>> +	/* Keep on walkin */
>> +	ptep = iopte_deref(pte, data);
>> +	return __arm_lpae_iopte_walk(data, iova, size, lvl + 1, ptep,
>> +				     fn, opaque);
>> +}
>> +
>> +static int arm_lpae_read_and_clear_dirty(struct io_pgtable_ops *ops,
>> +					 unsigned long iova, size_t size,
>> +					 unsigned long flags,
>> +					 struct iommu_dirty_bitmap *dirty)
>> +{
>> +	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
>> +	struct io_pgtable_cfg *cfg = &data->iop.cfg;
>> +	struct arm_lpae_iopte_read_dirty arg = {
>> +		.flags = flags, .dirty = dirty,
>> +	};
>> +	arm_lpae_iopte *ptep = data->pgd;
>> +	int lvl = data->start_level;
>> +	long iaext = (s64)iova >> cfg->ias;
>> +
>> +	if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
>> +		return -EINVAL;
> 
> I guess the size here is supposed to be one of the pgsize that iommu supports.
> But looking at the code, it looks like we are passing the iova mapped length and
> it fails here in my test setup. Could you please check and confirm.
> 
I think this might be from the original patch, and it's meant to test that
length is aligned to the page size, but I failed to removed it this snip. We
should remove this.

> Thanks,
> Shameer
> 
> 
>> +
>> +	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
>> +		iaext = ~iaext;
>> +	if (WARN_ON(iaext))
>> +		return -EINVAL;
>> +
>> +	if (data->iop.fmt != ARM_64_LPAE_S1 &&
>> +	    data->iop.fmt != ARM_32_LPAE_S1)
>> +		return -EINVAL;
>> +
>> +	return __arm_lpae_iopte_walk(data, iova, size, lvl, ptep,
>> +				     __arm_lpae_read_and_clear_dirty, &arg);
>> +}
>> +
>>  static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
>>  {
>>  	unsigned long granule, page_sizes;
>> @@ -795,6 +898,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
>>  		.map_pages	= arm_lpae_map_pages,
>>  		.unmap_pages	= arm_lpae_unmap_pages,
>>  		.iova_to_phys	= arm_lpae_iova_to_phys,
>> +		.read_and_clear_dirty = arm_lpae_read_and_clear_dirty,
>>  	};
>>
>>  	return data;
>> --
>> 2.17.2
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 21/24] iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping
  2023-06-16 17:00       ` Shameerali Kolothum Thodi
@ 2023-06-16 18:11         ` Joao Martins
  0 siblings, 0 replies; 65+ messages in thread
From: Joao Martins @ 2023-06-16 18:11 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: Jason Gunthorpe, Kevin Tian, Lu Baolu, Yi Liu, Yi Y Sun, iommu,
	Eric Auger, Nicolin Chen, Joerg Roedel, Jean-Philippe Brucker,
	Suravee Suthikulpanit, Will Deacon, Robin Murphy,
	Alex Williamson, kvm



On 16/06/2023 18:00, Shameerali Kolothum Thodi wrote:
> 
> 
>> -----Original Message-----
>> From: Joao Martins [mailto:joao.m.martins@oracle.com]
>> Sent: 22 May 2023 11:43
>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
>> iommu@lists.linux.dev
>> Cc: Jason Gunthorpe <jgg@nvidia.com>; Kevin Tian <kevin.tian@intel.com>;
>> Lu Baolu <baolu.lu@linux.intel.com>; Yi Liu <yi.l.liu@intel.com>; Yi Y Sun
>> <yi.y.sun@intel.com>; Eric Auger <eric.auger@redhat.com>; Nicolin Chen
>> <nicolinc@nvidia.com>; Joerg Roedel <joro@8bytes.org>; Jean-Philippe
>> Brucker <jean-philippe@linaro.org>; Suravee Suthikulpanit
>> <suravee.suthikulpanit@amd.com>; Will Deacon <will@kernel.org>; Robin
>> Murphy <robin.murphy@arm.com>; Alex Williamson
>> <alex.williamson@redhat.com>; kvm@vger.kernel.org
>> Subject: Re: [PATCH RFCv2 21/24] iommu/arm-smmu-v3: Enable HTTU for
>> stage1 with io-pgtable mapping
> 
> [...]
> 
>>>> @@ -2226,6 +2233,9 @@ static int arm_smmu_domain_finalise(struct
>>>> iommu_domain *domain,
>>>>  		.iommu_dev	= smmu->dev,
>>>>  	};
>>>>
>>>> +	if (smmu->features & arm_smmu_dbm_capable(smmu))
>>>> +		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_HD;
> 
> Also, I think we should limit setting this to s1 only pgtbl_cfg.
> 
+1, makes sense.

> Thanks,
> Shameer
> 
>>>> +
>>>>  	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
>>>>  	if (!pgtbl_ops)
>>>>  		return -ENOMEM;
>>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>>>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>>>> index d82dd125446c..83d6f3a2554f 100644
>>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>>>> @@ -288,6 +288,9 @@
>>>>  #define CTXDESC_CD_0_TCR_IPS		GENMASK_ULL(34, 32)
>>>>  #define CTXDESC_CD_0_TCR_TBI0		(1ULL << 38)
>>>>
>>>> +#define CTXDESC_CD_0_TCR_HA            (1UL << 43)
>>>> +#define CTXDESC_CD_0_TCR_HD            (1UL << 42)
>>>> +
>>>>  #define CTXDESC_CD_0_AA64		(1UL << 41)
>>>>  #define CTXDESC_CD_0_S			(1UL << 44)
>>>>  #define CTXDESC_CD_0_R			(1UL << 45)
>>>> diff --git a/drivers/iommu/io-pgtable-arm.c
>>>> b/drivers/iommu/io-pgtable-arm.c index 72dcdd468cf3..b2f470529459
>>>> 100644
>>>> --- a/drivers/iommu/io-pgtable-arm.c
>>>> +++ b/drivers/iommu/io-pgtable-arm.c
>>>> @@ -75,6 +75,7 @@
>>>>
>>>>  #define ARM_LPAE_PTE_NSTABLE		(((arm_lpae_iopte)1) << 63)
>>>>  #define ARM_LPAE_PTE_XN			(((arm_lpae_iopte)3) << 53)
>>>> +#define ARM_LPAE_PTE_DBM		(((arm_lpae_iopte)1) << 51)
>>>>  #define ARM_LPAE_PTE_AF			(((arm_lpae_iopte)1) << 10)
>>>>  #define ARM_LPAE_PTE_SH_NS		(((arm_lpae_iopte)0) << 8)
>>>>  #define ARM_LPAE_PTE_SH_OS		(((arm_lpae_iopte)2) << 8)
>>>> @@ -84,7 +85,7 @@
>>>>
>>>>  #define ARM_LPAE_PTE_ATTR_LO_MASK	(((arm_lpae_iopte)0x3ff) <<
>> 2)
>>>>  /* Ignore the contiguous bit for block splitting */
>>>> -#define ARM_LPAE_PTE_ATTR_HI_MASK	(((arm_lpae_iopte)6) << 52)
>>>> +#define ARM_LPAE_PTE_ATTR_HI_MASK	(((arm_lpae_iopte)13) <<
>> 51)
>>>>  #define ARM_LPAE_PTE_ATTR_MASK
>> 	(ARM_LPAE_PTE_ATTR_LO_MASK
>>>> |	\
>>>>  					 ARM_LPAE_PTE_ATTR_HI_MASK)
>>>>  /* Software bit for solving coherency races */ @@ -93,6 +94,9 @@
>>>>  /* Stage-1 PTE */
>>>>  #define ARM_LPAE_PTE_AP_UNPRIV		(((arm_lpae_iopte)1) << 6)
>>>>  #define ARM_LPAE_PTE_AP_RDONLY		(((arm_lpae_iopte)2) << 6)
>>>> +#define ARM_LPAE_PTE_AP_RDONLY_BIT	7
>>>> +#define ARM_LPAE_PTE_AP_WRITABLE
>> 	(ARM_LPAE_PTE_AP_RDONLY | \
>>>> +					 ARM_LPAE_PTE_DBM)
>>>>  #define ARM_LPAE_PTE_ATTRINDX_SHIFT	2
>>>>  #define ARM_LPAE_PTE_nG			(((arm_lpae_iopte)1) << 11)
>>>>
>>>> @@ -407,6 +411,8 @@ static arm_lpae_iopte
>> arm_lpae_prot_to_pte(struct
>>>> arm_lpae_io_pgtable *data,
>>>>  		pte = ARM_LPAE_PTE_nG;
>>>>  		if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
>>>>  			pte |= ARM_LPAE_PTE_AP_RDONLY;
>>>> +		else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_HD)
>>>> +			pte |= ARM_LPAE_PTE_AP_WRITABLE;
>>>>  		if (!(prot & IOMMU_PRIV))
>>>>  			pte |= ARM_LPAE_PTE_AP_UNPRIV;
>>>>  	} else {
>>>> @@ -804,7 +810,8 @@ arm_64_lpae_alloc_pgtable_s1(struct
>>>> io_pgtable_cfg *cfg, void *cookie)
>>>>
>>>>  	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
>>>>  			    IO_PGTABLE_QUIRK_ARM_TTBR1 |
>>>> -			    IO_PGTABLE_QUIRK_ARM_OUTER_WBWA))
>>>> +			    IO_PGTABLE_QUIRK_ARM_OUTER_WBWA |
>>>> +			    IO_PGTABLE_QUIRK_ARM_HD))
>>>>  		return NULL;
>>>>
>>>>  	data = arm_lpae_alloc_pgtable(cfg); diff --git
>>>> a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index
>>>> 25142a0e2fc2..9a996ba7856d 100644
>>>> --- a/include/linux/io-pgtable.h
>>>> +++ b/include/linux/io-pgtable.h
>>>> @@ -85,6 +85,8 @@ struct io_pgtable_cfg {
>>>>  	 *
>>>>  	 * IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the
>> outer-cacheability
>>>>  	 *	attributes set in the TCR for a non-coherent page-table walker.
>>>> +	 *
>>>> +	 * IO_PGTABLE_QUIRK_ARM_HD: Enables dirty tracking.
>>>>  	 */
>>>>  	#define IO_PGTABLE_QUIRK_ARM_NS			BIT(0)
>>>>  	#define IO_PGTABLE_QUIRK_NO_PERMS		BIT(1)
>>>> @@ -92,6 +94,8 @@ struct io_pgtable_cfg {
>>>>  	#define IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT	BIT(4)
>>>>  	#define IO_PGTABLE_QUIRK_ARM_TTBR1		BIT(5)
>>>>  	#define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA		BIT(6)
>>>> +	#define IO_PGTABLE_QUIRK_ARM_HD			BIT(7)
>>>> +
>>>>  	unsigned long			quirks;
>>>>  	unsigned long			pgsize_bitmap;
>>>>  	unsigned int			ias;
>>>> --
>>>> 2.17.2
>>>

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 04/24] iommu: Add iommu_domain ops for dirty tracking
  2023-05-19 13:29           ` Jason Gunthorpe
  2023-05-19 13:46             ` Joao Martins
@ 2023-08-10 18:23             ` Joao Martins
  2023-08-10 18:55               ` Jason Gunthorpe
  1 sibling, 1 reply; 65+ messages in thread
From: Joao Martins @ 2023-08-10 18:23 UTC (permalink / raw)
  To: Jason Gunthorpe, Robin Murphy
  Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
	Yi Y Sun, Eric Auger, Nicolin Chen, Joerg Roedel,
	Jean-Philippe Brucker, Suravee Suthikulpanit, Will Deacon,
	Alex Williamson, kvm

On 19/05/2023 14:29, Jason Gunthorpe wrote:
> On Fri, May 19, 2023 at 12:56:19PM +0100, Joao Martins wrote:
>> On 19/05/2023 12:51, Jason Gunthorpe wrote:
>>> On Fri, May 19, 2023 at 12:47:24PM +0100, Joao Martins wrote:
>>>> In practice it is done as soon after the domain is created but I understand what
>>>> you mean that both should be together; I have this implemented like that as my
>>>> first take as a domain_alloc passed flags, but I was a little undecided because
>>>> we are adding another domain_alloc() op for the user-managed pagetable and after
>>>> having another one we would end up with 3 ways of creating iommu domain -- but
>>>> maybe that's not an issue
>>>
>>> It should ride on the same user domain alloc op as some generic flags,
>>
>> OK, I suppose that makes sense specially with this being tied in HWPT_ALLOC
>> where all this new user domain alloc does.
> 
> Yes, it should be easy.
> 
> Then do what Robin said and make the domain ops NULL if the user
> didn't ask for dirty tracking and then attach can fail if there are
> domain incompatibility's.
> 
> Since alloc_user (or whatever it settles into) will have the struct
> device * argument this should be easy enough with out getting mixed
> with the struct bus cleanup.

Taking a step back, the iommu domain ops are a shared global pointer in all
iommu domains AFAIU at least in all three iommu implementations I was targetting
with this -- init-ed from iommu_ops::domain_default_ops. Not something we can
"just" clear part of it as that's the same global pointer shared with every
other domain. We would have to duplicate for every vendor two domain ops: one
with dirty and another without dirty tracking; though the general sentiment
behind clearing makes sense

But this is for IOMMUFD API driven only, perhaps we can just enforce at HWPT
allocation time as we are given a device ID there, or via device_attach too
inside iommufd core when we attach a device to an already existent hwpt.

This is a bit simpler and as a bonus it avoids getting dependent on the
domain_alloc_user() nesting infra and no core iommu domain changes;

Unless we also need to worry about non-IOMMUFD device-attach, which I don't
think it is the case here

e.g.

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 4c37eeea2bcd..4966775f5b00 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -339,6 +339,12 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable
*hwpt,
                goto err_unlock;
        }

+       if (hwpt->enforce_dirty &&
+           !device_iommu_capable(idev->dev, IOMMU_CAP_DIRTY)) {
+               rc = -EINVAL;
+               goto out_abort;
+       }
+
        /* Try to upgrade the domain we have */
        if (idev->enforce_cache_coherency) {
                rc = iommufd_hw_pagetable_enforce_cc(hwpt);
@@ -542,7 +548,7 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev,
        }

        hwpt = iommufd_hw_pagetable_alloc(idev->ictx, ioas, idev,
-                                         immediate_attach);
+                                         immediate_attach, false);
        if (IS_ERR(hwpt)) {
                destroy_hwpt = ERR_CAST(hwpt);
                goto out_unlock;

diff --git a/drivers/iommu/iommufd/hw_pagetable.c
b/drivers/iommu/iommufd/hw_pagetable.c
index 838530460d9b..da831b4404fd 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -62,6 +62,8 @@ int iommufd_hw_pagetable_enforce_cc(struct
iommufd_hw_pagetable *hwpt)
  * @ioas: IOAS to associate the domain with
  * @idev: Device to get an iommu_domain for
  * @immediate_attach: True if idev should be attached to the hwpt
+ * @enforce_dirty: True if dirty tracking support should be enforced
+ *                 on device attach
  *
  * Allocate a new iommu_domain and return it as a hw_pagetable. The HWPT
  * will be linked to the given ioas and upon return the underlying iommu_domain
@@ -73,7 +75,8 @@ int iommufd_hw_pagetable_enforce_cc(struct
iommufd_hw_pagetable *hwpt)
  */
 struct iommufd_hw_pagetable *
 iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
-                          struct iommufd_device *idev, bool immediate_attach)
+                          struct iommufd_device *idev, bool immediate_attach,
+                          bool enforce_dirty)
 {
        const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
        struct iommufd_hw_pagetable *hwpt;
@@ -90,8 +93,17 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct
iommufd_ioas *ioas,
        refcount_inc(&ioas->obj.users);
        hwpt->ioas = ioas;

+       if (enforce_dirty &&
+           !device_iommu_capable(idev->dev, IOMMU_CAP_DIRTY)) {
+               rc = -EINVAL;
+               goto out_abort;
+       }
+
@@ -99,6 +111,8 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct
iommufd_ioas *ioas,
                        hwpt->domain = NULL;
                        goto out_abort;
                }
+
+               hwpt->enforce_dirty = enforce_dirty;
        } else {
                hwpt->domain = iommu_domain_alloc(idev->dev->bus);
                if (!hwpt->domain) {
@@ -154,7 +168,8 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
        struct iommufd_ioas *ioas;
        int rc;

-       if (cmd->flags || cmd->__reserved)
+       if ((cmd->flags & ~(IOMMU_HWPT_ALLOC_ENFORCE_DIRTY)) ||
+           cmd->__reserved)
                return -EOPNOTSUPP;

        idev = iommufd_get_device(ucmd, cmd->dev_id);
@@ -168,7 +183,8 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
        }

        mutex_lock(&ioas->mutex);
-       hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, idev, false);
+       hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, idev, false,
+                                 cmd->flags & IOMMU_HWPT_ALLOC_ENFORCE_DIRTY);
        if (IS_ERR(hwpt)) {
                rc = PTR_ERR(hwpt);
                goto out_unlock;
diff --git a/drivers/iommu/iommufd/iommufd_private.h
b/drivers/iommu/iommufd/iommufd_private.h
index 8ba786bc95ff..7f0173e54c9c 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -247,6 +247,7 @@ struct iommufd_hw_pagetable {
        struct iommu_domain *domain;
        bool auto_domain : 1;
        bool enforce_cache_coherency : 1;
+       bool enforce_dirty : 1;
        bool msi_cookie : 1;
        /* Head at iommufd_ioas::hwpt_list */
        struct list_head hwpt_item;

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 04/24] iommu: Add iommu_domain ops for dirty tracking
  2023-08-10 18:23             ` Joao Martins
@ 2023-08-10 18:55               ` Jason Gunthorpe
  2023-08-10 20:36                 ` Joao Martins
  0 siblings, 1 reply; 65+ messages in thread
From: Jason Gunthorpe @ 2023-08-10 18:55 UTC (permalink / raw)
  To: Joao Martins
  Cc: Robin Murphy, iommu, Kevin Tian, Shameerali Kolothum Thodi,
	Lu Baolu, Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen,
	Joerg Roedel, Jean-Philippe Brucker, Suravee Suthikulpanit,
	Will Deacon, Alex Williamson, kvm

On Thu, Aug 10, 2023 at 07:23:11PM +0100, Joao Martins wrote:
> On 19/05/2023 14:29, Jason Gunthorpe wrote:
> > On Fri, May 19, 2023 at 12:56:19PM +0100, Joao Martins wrote:
> >> On 19/05/2023 12:51, Jason Gunthorpe wrote:
> >>> On Fri, May 19, 2023 at 12:47:24PM +0100, Joao Martins wrote:
> >>>> In practice it is done as soon after the domain is created but I understand what
> >>>> you mean that both should be together; I have this implemented like that as my
> >>>> first take as a domain_alloc passed flags, but I was a little undecided because
> >>>> we are adding another domain_alloc() op for the user-managed pagetable and after
> >>>> having another one we would end up with 3 ways of creating iommu domain -- but
> >>>> maybe that's not an issue
> >>>
> >>> It should ride on the same user domain alloc op as some generic flags,
> >>
> >> OK, I suppose that makes sense specially with this being tied in HWPT_ALLOC
> >> where all this new user domain alloc does.
> > 
> > Yes, it should be easy.
> > 
> > Then do what Robin said and make the domain ops NULL if the user
> > didn't ask for dirty tracking and then attach can fail if there are
> > domain incompatibility's.
> > 
> > Since alloc_user (or whatever it settles into) will have the struct
> > device * argument this should be easy enough with out getting mixed
> > with the struct bus cleanup.
> 
> Taking a step back, the iommu domain ops are a shared global pointer in all
> iommu domains AFAIU at least in all three iommu implementations I was targetting
> with this -- init-ed from iommu_ops::domain_default_ops. Not something we can
> "just" clear part of it as that's the same global pointer shared with every
> other domain. We would have to duplicate for every vendor two domain ops: one
> with dirty and another without dirty tracking; though the general sentiment
> behind clearing makes sense

Yes, the "domain_default_ops" is basically a transitional hack to
help migrate to narrowly defined per-usage domain ops.

eg things like blocking and identity should not have mapping ops.

Things that don't support dirty tracking should not have dirty
tracking ops in the first place.

So the simplest version of this is that by default all domain
allocations do not support dirty tracking. This ensures maximum
cross-instance/device domain re-use.

If userspace would like to use dirty tracking it signals it to
iommufd, probably using the user domain alloc path.

The driver, if it supports it, returns a dirty capable domain with
matching dirty enabled ops.

A dirty capable domain can only be attached to a device/instance that
is compatible and continues to provide dirty tracking.

This allows HW that has special restrictions to be properly supported.
eg maybe HW can only support dirty on a specific page table
format. It can select that format during alloc.

> This is a bit simpler and as a bonus it avoids getting dependent on the
> domain_alloc_user() nesting infra and no core iommu domain changes;

We have to start tackling some of this and not just bodging on top of
bodges :\

I think the domain_alloc_user patches are in good enough shape you can
rely on them.

Return the IOMMU_CAP_DIRTY as generic data in the new GET_INFO
Accept some generic flag in the alloc_hwpt requesting dirty
Pass generic flags down to the driver.
Reject set flags and drivers that don't implement alloc_domain_user.

Driver returns a domain with the right ops 'XXX_domain_ops_dirty_paging'.
Driver refuses to attach the dirty enabled domain to places that do
dirty tracking.

Jason

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 04/24] iommu: Add iommu_domain ops for dirty tracking
  2023-08-10 18:55               ` Jason Gunthorpe
@ 2023-08-10 20:36                 ` Joao Martins
  2023-08-11  1:09                   ` Jason Gunthorpe
  0 siblings, 1 reply; 65+ messages in thread
From: Joao Martins @ 2023-08-10 20:36 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Robin Murphy, iommu, Kevin Tian, Shameerali Kolothum Thodi,
	Lu Baolu, Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen,
	Joerg Roedel, Jean-Philippe Brucker, Suravee Suthikulpanit,
	Will Deacon, Alex Williamson, kvm

On 10/08/2023 19:55, Jason Gunthorpe wrote:
> On Thu, Aug 10, 2023 at 07:23:11PM +0100, Joao Martins wrote:
>> On 19/05/2023 14:29, Jason Gunthorpe wrote:
>>> On Fri, May 19, 2023 at 12:56:19PM +0100, Joao Martins wrote:
>>>> On 19/05/2023 12:51, Jason Gunthorpe wrote:
>>>>> On Fri, May 19, 2023 at 12:47:24PM +0100, Joao Martins wrote:
>>>>>> In practice it is done as soon after the domain is created but I understand what
>>>>>> you mean that both should be together; I have this implemented like that as my
>>>>>> first take as a domain_alloc passed flags, but I was a little undecided because
>>>>>> we are adding another domain_alloc() op for the user-managed pagetable and after
>>>>>> having another one we would end up with 3 ways of creating iommu domain -- but
>>>>>> maybe that's not an issue
>>>>>
>>>>> It should ride on the same user domain alloc op as some generic flags,
>>>>
>>>> OK, I suppose that makes sense specially with this being tied in HWPT_ALLOC
>>>> where all this new user domain alloc does.
>>>
>>> Yes, it should be easy.
>>>
>>> Then do what Robin said and make the domain ops NULL if the user
>>> didn't ask for dirty tracking and then attach can fail if there are
>>> domain incompatibility's.
>>>
>>> Since alloc_user (or whatever it settles into) will have the struct
>>> device * argument this should be easy enough with out getting mixed
>>> with the struct bus cleanup.
>>
>> Taking a step back, the iommu domain ops are a shared global pointer in all
>> iommu domains AFAIU at least in all three iommu implementations I was targetting
>> with this -- init-ed from iommu_ops::domain_default_ops. Not something we can
>> "just" clear part of it as that's the same global pointer shared with every
>> other domain. We would have to duplicate for every vendor two domain ops: one
>> with dirty and another without dirty tracking; though the general sentiment
>> behind clearing makes sense
> 
> Yes, the "domain_default_ops" is basically a transitional hack to
> help migrate to narrowly defined per-usage domain ops.
> 
> eg things like blocking and identity should not have mapping ops.
> 
My earlier point was more about not what 'domain_default_ops' represents
but that it's a pointer. Shared by everyone (devices and domains alike). But you
sort of made it clear that it's OK to duplicate it to not have dirty tracking.
The duplication is what I felt a little odd.

> Things that don't support dirty tracking should not have dirty
> tracking ops in the first place.
> 
> So the simplest version of this is that by default all domain
> allocations do not support dirty tracking. This ensures maximum
> cross-instance/device domain re-use.
> 
> If userspace would like to use dirty tracking it signals it to
> iommufd, probably using the user domain alloc path.
> 
> The driver, if it supports it, returns a dirty capable domain with
> matching dirty enabled ops.
> 
> A dirty capable domain can only be attached to a device/instance that
> is compatible and continues to provide dirty tracking.
> 
> This allows HW that has special restrictions to be properly supported.
> eg maybe HW can only support dirty on a specific page table
> format. It can select that format during alloc.
> 

(...)

>> This is a bit simpler and as a bonus it avoids getting dependent on the
>> domain_alloc_user() nesting infra and no core iommu domain changes;
> 
> We have to start tackling some of this and not just bodging on top of
> bodges :\
> 

(...) I wasn't quite bodging, just trying to parallelize what was bus cleanup
could be tackling domain/device-independent ops without them being global. Maybe
I read too much into it hence my previous question.

> I think the domain_alloc_user patches are in good enough shape you can
> rely on them.
> 
I have been using them. Just needs a flags arg and I have a alternative to the
snip I pasted earlier with domain_alloc_user already there

> Return the IOMMU_CAP_DIRTY as generic data in the new GET_INFO

I have this one here:

https://lore.kernel.org/linux-iommu/20230518204650.14541-14-joao.m.martins@oracle.com/

I can rework to GET_HW_INFO but it really needs to be generic bits of data and
not iommu hw specific e.g. that translates into device_iommu_capable() cap
checking. The moment it stays hw specific and having userspace to decode what
different hw specific bits for general simple capability checking (e.g. do we
support nesting, do we support dirty tracking, etc), it breaks the point of a
generic API.

With exception I guess of going into the weeds of nesting specific formats as
there's hardware formats intimate to the userspace space process emulation for
iommu model specific thing i.e. things that can't be made generic.

Hopefully my expectation here matches yours for cap checking?

> Accept some generic flag in the alloc_hwpt requesting dirty
> Pass generic flags down to the driver.
> Reject set flags and drivers that don't implement alloc_domain_user.
> Driver refuses to attach the dirty enabled domain to places that do
> dirty tracking.
>

This is already done, and so far I have an unsigned long flags to
domain_alloc_user() and probably be kept defining it as iommu-domain-feature bit
(unless it's better to follow similar direction as hwpt_type like in
domain_alloc_user). And gets stored as iommu_domain::flags, like this series
had. Though if majority of driver rejection flows via alloc_domain_user only
(which has a struct device), perhaps it's not even needed to store as a new
iommu_domain::flags

> Driver returns a domain with the right ops 'XXX_domain_ops_dirty_paging'.

alright, you look OK with this -- I'll go with it then

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH RFCv2 04/24] iommu: Add iommu_domain ops for dirty tracking
  2023-08-10 20:36                 ` Joao Martins
@ 2023-08-11  1:09                   ` Jason Gunthorpe
  0 siblings, 0 replies; 65+ messages in thread
From: Jason Gunthorpe @ 2023-08-11  1:09 UTC (permalink / raw)
  To: Joao Martins
  Cc: Robin Murphy, iommu, Kevin Tian, Shameerali Kolothum Thodi,
	Lu Baolu, Yi Liu, Yi Y Sun, Eric Auger, Nicolin Chen,
	Joerg Roedel, Jean-Philippe Brucker, Suravee Suthikulpanit,
	Will Deacon, Alex Williamson, kvm

On Thu, Aug 10, 2023 at 09:36:37PM +0100, Joao Martins wrote:

> > Yes, the "domain_default_ops" is basically a transitional hack to
> > help migrate to narrowly defined per-usage domain ops.
> > 
> > eg things like blocking and identity should not have mapping ops.
> > 
> My earlier point was more about not what 'domain_default_ops' represents
> but that it's a pointer. Shared by everyone (devices and domains alike). But you
> sort of made it clear that it's OK to duplicate it to not have dirty tracking.
> The duplication is what I felt a little odd.

Well, it is one path, we could also add a dirty_ops to the
domain. Hard to say which is better.

> (...) I wasn't quite bodging, just trying to parallelize what was bus cleanup
> could be tackling domain/device-independent ops without them being global. Maybe
> I read too much into it hence my previous question.

domain_alloc_user bypasses the bus cleanup

> > Return the IOMMU_CAP_DIRTY as generic data in the new GET_INFO
> 
> I have this one here:
> 
> https://lore.kernel.org/linux-iommu/20230518204650.14541-14-joao.m.martins@oracle.com/
> 
> I can rework to GET_HW_INFO but it really needs to be generic bits of data and
> not iommu hw specific e.g. that translates into device_iommu_capable() cap
> checking. 

Yes, HW_INFO seems the right way. Just add a

   __aligned_u64 out_capabilities;

To that struct iommu_hw_info and fill it with generic code.

> > Accept some generic flag in the alloc_hwpt requesting dirty
> > Pass generic flags down to the driver.
> > Reject set flags and drivers that don't implement alloc_domain_user.
> > Driver refuses to attach the dirty enabled domain to places that do
> > dirty tracking.
> 
> This is already done, and so far I have an unsigned long flags to
> domain_alloc_user() and probably be kept defining it as
> iommu-domain-feature bit

Yes a flag in some way is the best choice

> (unless it's better to follow similar direction as hwpt_type like in
> domain_alloc_user). And gets stored as iommu_domain::flags, like this series
> had. Though if majority of driver rejection flows via alloc_domain_user only
> (which has a struct device), perhaps it's not even needed to store as a new
> iommu_domain::flags

Right, we don't need to reflect it back if the dirty ops are NULL.

Jason

^ permalink raw reply	[flat|nested] 65+ messages in thread

end of thread, other threads:[~2023-08-11  1:10 UTC | newest]

Thread overview: 65+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-18 20:46 [PATCH RFCv2 00/24] IOMMUFD Dirty Tracking Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 01/24] iommu: Add RCU-protected page free support Joao Martins
2023-05-19 13:32   ` Jason Gunthorpe
2023-05-19 16:48     ` Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 02/24] iommu: Replace put_pages_list() with iommu_free_pgtbl_pages() Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 03/24] vfio: Move iova_bitmap into iommu core Joao Martins
2023-05-18 22:35   ` Alex Williamson
2023-05-19  9:06     ` Joao Martins
2023-05-19  9:01   ` Liu, Jingqi
2023-05-19  9:07     ` Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 04/24] iommu: Add iommu_domain ops for dirty tracking Joao Martins
2023-05-19  8:42   ` Baolu Lu
2023-05-19  9:28     ` Joao Martins
2023-05-19 11:40   ` Jason Gunthorpe
2023-05-19 11:47     ` Joao Martins
2023-05-19 11:51       ` Jason Gunthorpe
2023-05-19 11:56         ` Joao Martins
2023-05-19 13:29           ` Jason Gunthorpe
2023-05-19 13:46             ` Joao Martins
2023-08-10 18:23             ` Joao Martins
2023-08-10 18:55               ` Jason Gunthorpe
2023-08-10 20:36                 ` Joao Martins
2023-08-11  1:09                   ` Jason Gunthorpe
2023-05-19 12:13         ` Baolu Lu
2023-05-19 13:22   ` Robin Murphy
2023-05-19 13:43     ` Joao Martins
2023-05-19 18:12       ` Robin Murphy
2023-05-18 20:46 ` [PATCH RFCv2 05/24] iommufd: Add a flag to enforce dirty tracking on attach Joao Martins
2023-05-19 13:34   ` Jason Gunthorpe
2023-05-18 20:46 ` [PATCH RFCv2 06/24] iommufd/selftest: Add a flags to _test_cmd_{hwpt_alloc,mock_domain} Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 07/24] iommufd/selftest: Test IOMMU_HWPT_ALLOC_ENFORCE_DIRTY Joao Martins
2023-05-19 13:35   ` Jason Gunthorpe
2023-05-19 13:52     ` Joao Martins
2023-05-19 13:55   ` Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 08/24] iommufd: Dirty tracking data support Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 09/24] iommufd: Add IOMMU_HWPT_SET_DIRTY Joao Martins
2023-05-19 13:49   ` Jason Gunthorpe
2023-05-19 14:21     ` Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 10/24] iommufd/selftest: Test IOMMU_HWPT_SET_DIRTY Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 11/24] iommufd: Add IOMMU_HWPT_GET_DIRTY_IOVA Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 12/24] iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_IOVA Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 13/24] iommufd: Add IOMMU_DEVICE_GET_CAPS Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 14/24] iommufd/selftest: Test IOMMU_DEVICE_GET_CAPS Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 15/24] iommufd: Add a flag to skip clearing of IOPTE dirty Joao Martins
2023-05-19 13:54   ` Jason Gunthorpe
2023-05-18 20:46 ` [PATCH RFCv2 16/24] iommufd/selftest: Test IOMMU_GET_DIRTY_IOVA_NO_CLEAR flag Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 17/24] iommu/amd: Access/Dirty bit support in IOPTEs Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 18/24] iommu/amd: Print access/dirty bits if supported Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 19/24] iommu/intel: Access/Dirty bit support for SL domains Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 20/24] iommu/arm-smmu-v3: Add feature detection for HTTU Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 21/24] iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping Joao Martins
2023-05-19 13:49   ` Robin Murphy
2023-05-19 14:05     ` Joao Martins
2023-05-22 10:34   ` Shameerali Kolothum Thodi
2023-05-22 10:43     ` Joao Martins
2023-06-16 17:00       ` Shameerali Kolothum Thodi
2023-06-16 18:11         ` Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 22/24] iommu/arm-smmu-v3: Add read_and_clear_dirty() support Joao Martins
2023-06-16 16:46   ` Shameerali Kolothum Thodi
2023-06-16 18:10     ` Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 23/24] iommu/arm-smmu-v3: Add set_dirty_tracking() support Joao Martins
2023-05-18 20:46 ` [PATCH RFCv2 24/24] iommu/arm-smmu-v3: Advertise IOMMU_DOMAIN_F_ENFORCE_DIRTY Joao Martins
2023-05-30 14:10   ` Shameerali Kolothum Thodi
2023-05-30 19:19     ` Joao Martins
2023-05-31  9:21       ` Shameerali Kolothum Thodi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).