linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 00/11] vfio/iommu_type1: Implement dirty log tracking based on smmuv3 HTTU
@ 2021-03-10  9:06 Keqian Zhu
  2021-03-10  9:06 ` [PATCH v2 01/11] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update Keqian Zhu
                   ` (10 more replies)
  0 siblings, 11 replies; 17+ messages in thread
From: Keqian Zhu @ 2021-03-10  9:06 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, iommu, Alex Williamson,
	Robin Murphy, Yi Sun, Will Deacon
  Cc: Kirti Wankhede, Cornelia Huck, Marc Zyngier, Catalin Marinas,
	Mark Rutland, James Morse, Suzuki K Poulose, wanghaibin.wang,
	jiangkunkun, yuzenghui, lushenming

Hi all,

This patch series implements VFIO DMA dirty log tracking based on the SMMUv3 HTTU feature.

changelog:

v2:
 - Address all comments on the RFC version, thanks to all of you ;-)
 - Add a bugfix that starts dirty log tracking for newly added DMA ranges and domains.

Intention:

VFIO live migration is an important and valuable feature, but there are still many
hurdles to clear, including migration of interrupt state, device state, DMA dirty
log tracking, etc.

For now, the only dirty log tracking interface is pinning. It has some drawbacks:
1. Only smart vendor drivers are aware of it.
2. It's coarse-grained: the pinned scope is generally bigger than what the device actually accesses.
3. It can't track dirty pages continuously and precisely; vfio reports the whole pinned scope as dirty,
   so it doesn't work well with iterative dirty log handling.

About SMMU HTTU:

HTTU (Hardware Translation Table Update) is a feature of the ARM SMMUv3 that lets the
hardware update the access flag and/or dirty state of a TTD (Translation Table Descriptor).
With HTTU, a stage 1 TTD is classified into one of 3 types:
                        DBM bit             AP[2](readonly bit)
1. writable_clean         1                       1
2. writable_dirty         1                       0
3. readonly               0                       1

If HTTU_HD (hardware management of dirty state) is enabled, the SMMU changes a TTD from
writable_clean to writable_dirty when a DMA write goes through it. Software can then scan
the TTDs to sync the dirty state into a dirty bitmap. With this feature, we can track the
dirty log of DMA continuously and precisely.
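
For illustration only (not part of the patches), a minimal sketch of how software could
classify a stage 1 leaf TTD under this encoding, reusing the io-pgtable-arm definitions
(ARM_LPAE_PTE_DBM is added by patch 2; the helper names here are made up):

static inline bool ttd_writable_dirty(arm_lpae_iopte pte)
{
	/* DBM set and AP[2] clear: hardware has marked this entry dirty */
	return (pte & ARM_LPAE_PTE_DBM) && !(pte & ARM_LPAE_PTE_AP_RDONLY);
}

static inline bool ttd_writable_clean(arm_lpae_iopte pte)
{
	/* DBM set and AP[2] set: writable, but not yet written by DMA */
	return (pte & ARM_LPAE_PTE_DBM) && (pte & ARM_LPAE_PTE_AP_RDONLY);
}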

About this series:

Patch 1-3: Add feature detection for SMMU HTTU and enable HTTU for SMMU stage 1 mappings,
           and add feature detection for SMMU BBML. We need to split block mappings when
           dirty log tracking starts and merge page mappings when it stops, which normally
           requires the break-before-make procedure. But break-before-make can cause problems
           when the TTD is live, as the I/O streams might not tolerate translation faults,
           so BBML should be used instead.

Patch 4-7: Add four interfaces (start_dirty_log, stop_dirty_log, sync_dirty_log and clear_dirty_log)
           to the IOMMU layer; they are essential to implement DMA dirty log tracking for vfio.
           We implement these interfaces for arm smmuv3 (a usage sketch follows this list).

Patch   8: Add HWDBM (Hardware Dirty Bit Management) device feature reporting in the IOMMU layer.

Patch 9-11: Implement a new dirty log tracking method for vfio based on IOMMU HWDBM. A new
           ioctl operation named VFIO_DIRTY_LOG_MANUAL_CLEAR is added, which can eliminate
           some redundant dirty handling in userspace.
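
For illustration only, a minimal sketch (error handling, locking and the helper name are
made up) of how a caller such as vfio_iommu_type1 might drive the dirty log interfaces
added by patches 4-7 for one tracking round, with "bitmap" covering [iova, iova + size)
at PAGE_SHIFT granularity:

static int dirty_log_round(struct iommu_domain *domain, unsigned long iova,
			   size_t size, unsigned long *bitmap, int prot)
{
	int ret;

	/* Split block mappings so dirty state is recorded per page */
	ret = iommu_start_dirty_log(domain, iova, size);
	if (ret)
		return ret;

	/* ... DMA runs for one migration iteration ... */

	/* Pull the hardware dirty state into the bitmap */
	ret = iommu_sync_dirty_log(domain, iova, size, bitmap, iova, PAGE_SHIFT);
	if (ret)
		return ret;

	/* Merge pages back into blocks once tracking is no longer needed */
	return iommu_stop_dirty_log(domain, iova, size, prot);
}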

Optimizations to do:

1. We recognized that each smmu_domain (a vfio_container may have several smmu_domains) has its
   own stage 1 mapping, and we must scan all these mappings to sync dirty state. We plan to refactor
   smmu_domain to support more than one SMMU in one smmu_domain, so that these SMMUs can share the
   same stage 1 mapping.
2. We also recognized that scanning TTDs is a performance hotspot. Recently, I implemented a
   SW/HW combined dirty log tracking at the MMU side [1], which can effectively solve this problem.
   The idea can be applied to the SMMU side too.

Thanks,
Keqian


[1] https://lore.kernel.org/linux-arm-kernel/20210126124444.27136-1-zhukeqian1@huawei.com/

Jean-Philippe Brucker (1):
  iommu/arm-smmu-v3: Add support for Hardware Translation Table Update

jiangkunkun (10):
  iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping
  iommu/arm-smmu-v3: Add feature detection for BBML
  iommu/arm-smmu-v3: Split block descriptor when start dirty log
  iommu/arm-smmu-v3: Merge a span of page when stop dirty log
  iommu/arm-smmu-v3: Scan leaf TTD to sync hardware dirty log
  iommu/arm-smmu-v3: Clear dirty log according to bitmap
  iommu/arm-smmu-v3: Add HWDBM device feature reporting
  vfio/iommu_type1: Add HWDBM status maintenance
  vfio/iommu_type1: Optimize dirty bitmap population based on iommu
    HWDBM
  vfio/iommu_type1: Add support for manual dirty log clear

 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   2 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 226 +++++++++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  14 +
 drivers/iommu/io-pgtable-arm.c                | 392 +++++++++++++++++-
 drivers/iommu/iommu.c                         | 236 +++++++++++
 drivers/vfio/vfio_iommu_type1.c               | 270 +++++++++++-
 include/linux/io-pgtable.h                    |  23 +
 include/linux/iommu.h                         |  84 ++++
 include/uapi/linux/vfio.h                     |  28 +-
 9 files changed, 1264 insertions(+), 11 deletions(-)

-- 
2.19.1



* [PATCH v2 01/11] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update
  2021-03-10  9:06 [PATCH v2 00/11] vfio/iommu_type1: Implement dirty log tracking based on smmuv3 HTTU Keqian Zhu
@ 2021-03-10  9:06 ` Keqian Zhu
  2021-03-10  9:06 ` [PATCH v2 02/11] iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping Keqian Zhu
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Keqian Zhu @ 2021-03-10  9:06 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, iommu, Alex Williamson,
	Robin Murphy, Yi Sun, Will Deacon
  Cc: Kirti Wankhede, Cornelia Huck, Marc Zyngier, Catalin Marinas,
	Mark Rutland, James Morse, Suzuki K Poulose, wanghaibin.wang,
	jiangkunkun, yuzenghui, lushenming, Jean-Philippe Brucker

From: Jean-Philippe Brucker <jean-philippe@linaro.org>

If the SMMU supports it and the kernel was built with HTTU support,
enable hardware update of access and dirty flags. This is essential for
shared page tables, to reduce the number of access faults on the fault
queue. Normal DMA with io-pgtables doesn't currently use the access or
dirty flags.

We can enable HTTU even if CPUs don't support it, because the kernel
always checks for the HW dirty bit and updates the PTE flags atomically.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  2 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 41 ++++++++++++++++++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  8 ++++
 3 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index bb251cab61f3..ae075e675892 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -121,10 +121,12 @@ static struct arm_smmu_ctx_desc *arm_smmu_alloc_shared_cd(struct mm_struct *mm)
 	if (err)
 		goto out_free_asid;
 
+	/* HA and HD will be filtered out later if not supported by the SMMU */
 	tcr = FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, 64ULL - vabits_actual) |
 	      FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, ARM_LPAE_TCR_RGN_WBWA) |
 	      FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, ARM_LPAE_TCR_RGN_WBWA) |
 	      FIELD_PREP(CTXDESC_CD_0_TCR_SH0, ARM_LPAE_TCR_SH_IS) |
+	      CTXDESC_CD_0_TCR_HA | CTXDESC_CD_0_TCR_HD |
 	      CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64;
 
 	switch (PAGE_SIZE) {
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8594b4a83043..b6d965504f44 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1012,10 +1012,17 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain, int ssid,
 		 * this substream's traffic
 		 */
 	} else { /* (1) and (2) */
+		u64 tcr = cd->tcr;
+
 		cdptr[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK);
 		cdptr[2] = 0;
 		cdptr[3] = cpu_to_le64(cd->mair);
 
+		if (!(smmu->features & ARM_SMMU_FEAT_HD))
+			tcr &= ~CTXDESC_CD_0_TCR_HD;
+		if (!(smmu->features & ARM_SMMU_FEAT_HA))
+			tcr &= ~CTXDESC_CD_0_TCR_HA;
+
 		/*
 		 * STE is live, and the SMMU might read dwords of this CD in any
 		 * order. Ensure that it observes valid values before reading
@@ -1023,7 +1030,7 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain, int ssid,
 		 */
 		arm_smmu_sync_cd(smmu_domain, ssid, true);
 
-		val = cd->tcr |
+		val = tcr |
 #ifdef __BIG_ENDIAN
 			CTXDESC_CD_0_ENDI |
 #endif
@@ -3196,6 +3203,28 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 	return 0;
 }
 
+static void arm_smmu_get_httu(struct arm_smmu_device *smmu, u32 reg)
+{
+	u32 fw_features = smmu->features & (ARM_SMMU_FEAT_HA | ARM_SMMU_FEAT_HD);
+	u32 features = 0;
+
+	switch (FIELD_GET(IDR0_HTTU, reg)) {
+	case IDR0_HTTU_ACCESS_DIRTY:
+		features |= ARM_SMMU_FEAT_HD;
+		fallthrough;
+	case IDR0_HTTU_ACCESS:
+		features |= ARM_SMMU_FEAT_HA;
+	}
+
+	if (smmu->dev->of_node)
+		smmu->features |= features;
+	else if (features != fw_features)
+		/* ACPI IORT sets the HTTU bits */
+		dev_warn(smmu->dev,
+			 "IDR0.HTTU overridden by FW configuration (0x%x)\n",
+			 fw_features);
+}
+
 static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 {
 	u32 reg;
@@ -3256,6 +3285,8 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 			smmu->features |= ARM_SMMU_FEAT_E2H;
 	}
 
+	arm_smmu_get_httu(smmu, reg);
+
 	/*
 	 * The coherency feature as set by FW is used in preference to the ID
 	 * register, but warn on mismatch.
@@ -3441,6 +3472,14 @@ static int arm_smmu_device_acpi_probe(struct platform_device *pdev,
 	if (iort_smmu->flags & ACPI_IORT_SMMU_V3_COHACC_OVERRIDE)
 		smmu->features |= ARM_SMMU_FEAT_COHERENCY;
 
+	switch (FIELD_GET(ACPI_IORT_SMMU_V3_HTTU_OVERRIDE, iort_smmu->flags)) {
+	case IDR0_HTTU_ACCESS_DIRTY:
+		smmu->features |= ARM_SMMU_FEAT_HD;
+		fallthrough;
+	case IDR0_HTTU_ACCESS:
+		smmu->features |= ARM_SMMU_FEAT_HA;
+	}
+
 	return 0;
 }
 #else
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index f985817c967a..26d6b935b383 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -33,6 +33,9 @@
 #define IDR0_ASID16			(1 << 12)
 #define IDR0_ATS			(1 << 10)
 #define IDR0_HYP			(1 << 9)
+#define IDR0_HTTU			GENMASK(7, 6)
+#define IDR0_HTTU_ACCESS		1
+#define IDR0_HTTU_ACCESS_DIRTY		2
 #define IDR0_COHACC			(1 << 4)
 #define IDR0_TTF			GENMASK(3, 2)
 #define IDR0_TTF_AARCH64		2
@@ -285,6 +288,9 @@
 #define CTXDESC_CD_0_TCR_IPS		GENMASK_ULL(34, 32)
 #define CTXDESC_CD_0_TCR_TBI0		(1ULL << 38)
 
+#define CTXDESC_CD_0_TCR_HA		(1UL << 43)
+#define CTXDESC_CD_0_TCR_HD		(1UL << 42)
+
 #define CTXDESC_CD_0_AA64		(1UL << 41)
 #define CTXDESC_CD_0_S			(1UL << 44)
 #define CTXDESC_CD_0_R			(1UL << 45)
@@ -607,6 +613,8 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_BTM		(1 << 16)
 #define ARM_SMMU_FEAT_SVA		(1 << 17)
 #define ARM_SMMU_FEAT_E2H		(1 << 18)
+#define ARM_SMMU_FEAT_HA		(1 << 19)
+#define ARM_SMMU_FEAT_HD		(1 << 20)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
-- 
2.19.1



* [PATCH v2 02/11] iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping
  2021-03-10  9:06 [PATCH v2 00/11] vfio/iommu_type1: Implement dirty log tracking based on smmuv3 HTTU Keqian Zhu
  2021-03-10  9:06 ` [PATCH v2 01/11] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update Keqian Zhu
@ 2021-03-10  9:06 ` Keqian Zhu
  2021-03-10  9:06 ` [PATCH v2 03/11] iommu/arm-smmu-v3: Add feature detection for BBML Keqian Zhu
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Keqian Zhu @ 2021-03-10  9:06 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, iommu, Alex Williamson,
	Robin Murphy, Yi Sun, Will Deacon
  Cc: Kirti Wankhede, Cornelia Huck, Marc Zyngier, Catalin Marinas,
	Mark Rutland, James Morse, Suzuki K Poulose, wanghaibin.wang,
	jiangkunkun, yuzenghui, lushenming

From: jiangkunkun <jiangkunkun@huawei.com>

If HTTU is supported, we enable the HA/HD bits in the SMMU CD (stage 1
mapping), and set the DBM bit for writable TTDs.

The dirty state information is encoded using the access permission
bits AP[2] (stage 1) or S2AP[1] (stage 2) in conjunction with the
DBM (Dirty Bit Modifier) bit, where DBM means writable and AP[2]/
S2AP[1] means dirty.

Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---

changelog:

v2:
 - Use a new quirk flag named IO_PGTABLE_QUIRK_ARM_HD to transfer
   SMMU HD feature to io-pgtable. (Robin)

 - Rebase on Jean's HTTU patch(#1).

---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 +++
 drivers/iommu/io-pgtable-arm.c              | 7 ++++++-
 include/linux/io-pgtable.h                  | 3 +++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index b6d965504f44..369c0ea7a104 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1921,6 +1921,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 			  FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, tcr->orgn) |
 			  FIELD_PREP(CTXDESC_CD_0_TCR_SH0, tcr->sh) |
 			  FIELD_PREP(CTXDESC_CD_0_TCR_IPS, tcr->ips) |
+			  CTXDESC_CD_0_TCR_HA | CTXDESC_CD_0_TCR_HD |
 			  CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64;
 	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair;
 
@@ -2026,6 +2027,8 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
 
 	if (smmu_domain->non_strict)
 		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
+	if (smmu->features & ARM_SMMU_FEAT_HD)
+		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_HD;
 
 	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
 	if (!pgtbl_ops)
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 87def58e79b5..94d790b8ed27 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -72,6 +72,7 @@
 
 #define ARM_LPAE_PTE_NSTABLE		(((arm_lpae_iopte)1) << 63)
 #define ARM_LPAE_PTE_XN			(((arm_lpae_iopte)3) << 53)
+#define ARM_LPAE_PTE_DBM		(((arm_lpae_iopte)1) << 51)
 #define ARM_LPAE_PTE_AF			(((arm_lpae_iopte)1) << 10)
 #define ARM_LPAE_PTE_SH_NS		(((arm_lpae_iopte)0) << 8)
 #define ARM_LPAE_PTE_SH_OS		(((arm_lpae_iopte)2) << 8)
@@ -81,7 +82,7 @@
 
 #define ARM_LPAE_PTE_ATTR_LO_MASK	(((arm_lpae_iopte)0x3ff) << 2)
 /* Ignore the contiguous bit for block splitting */
-#define ARM_LPAE_PTE_ATTR_HI_MASK	(((arm_lpae_iopte)6) << 52)
+#define ARM_LPAE_PTE_ATTR_HI_MASK	(((arm_lpae_iopte)13) << 51)
 #define ARM_LPAE_PTE_ATTR_MASK		(ARM_LPAE_PTE_ATTR_LO_MASK |	\
 					 ARM_LPAE_PTE_ATTR_HI_MASK)
 /* Software bit for solving coherency races */
@@ -379,6 +380,7 @@ static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
 static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
 					   int prot)
 {
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 	arm_lpae_iopte pte;
 
 	if (data->iop.fmt == ARM_64_LPAE_S1 ||
@@ -386,6 +388,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
 		pte = ARM_LPAE_PTE_nG;
 		if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
 			pte |= ARM_LPAE_PTE_AP_RDONLY;
+		else if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_HD)
+			pte |= ARM_LPAE_PTE_DBM;
+
 		if (!(prot & IOMMU_PRIV))
 			pte |= ARM_LPAE_PTE_AP_UNPRIV;
 	} else {
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index a4c9ca2c31f1..64cee6831c97 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -82,6 +82,8 @@ struct io_pgtable_cfg {
 	 *
 	 * IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the outer-cacheability
 	 *	attributes set in the TCR for a non-coherent page-table walker.
+	 *
+	 * IO_PGTABLE_QUIRK_ARM_HD: Support hardware management of dirty status.
 	 */
 	#define IO_PGTABLE_QUIRK_ARM_NS		BIT(0)
 	#define IO_PGTABLE_QUIRK_NO_PERMS	BIT(1)
@@ -89,6 +91,7 @@ struct io_pgtable_cfg {
 	#define IO_PGTABLE_QUIRK_NON_STRICT	BIT(4)
 	#define IO_PGTABLE_QUIRK_ARM_TTBR1	BIT(5)
 	#define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA	BIT(6)
+	#define IO_PGTABLE_QUIRK_ARM_HD		BIT(7)
 	unsigned long			quirks;
 	unsigned long			pgsize_bitmap;
 	unsigned int			ias;
-- 
2.19.1



* [PATCH v2 03/11] iommu/arm-smmu-v3: Add feature detection for BBML
  2021-03-10  9:06 [PATCH v2 00/11] vfio/iommu_type1: Implement dirty log tracking based on smmuv3 HTTU Keqian Zhu
  2021-03-10  9:06 ` [PATCH v2 01/11] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update Keqian Zhu
  2021-03-10  9:06 ` [PATCH v2 02/11] iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping Keqian Zhu
@ 2021-03-10  9:06 ` Keqian Zhu
  2021-03-10  9:06 ` [PATCH v2 04/11] iommu/arm-smmu-v3: Split block descriptor when start dirty log Keqian Zhu
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Keqian Zhu @ 2021-03-10  9:06 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, iommu, Alex Williamson,
	Robin Murphy, Yi Sun, Will Deacon
  Cc: Kirti Wankhede, Cornelia Huck, Marc Zyngier, Catalin Marinas,
	Mark Rutland, James Morse, Suzuki K Poulose, wanghaibin.wang,
	jiangkunkun, yuzenghui, lushenming

From: jiangkunkun <jiangkunkun@huawei.com>

When altering a translation table descriptor for certain reasons, the
break-before-make procedure is required. But it might cause problems when
the TTD is live, as the I/O streams might not tolerate translation faults.

If the SMMU supports BBM level 1 or BBM level 2, we can change the block
size without using the break-before-make sequence.

This adds feature detection for BBML; no functional change expected.

Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---

changelog:

v2:
 - Use two new quirk flags named IO_PGTABLE_QUIRK_ARM_BBML1/2 to transfer
   SMMU BBML feature to io-pgtable. (Robin)
   
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 19 +++++++++++++++++++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  6 ++++++
 include/linux/io-pgtable.h                  |  8 ++++++++
 3 files changed, 33 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 369c0ea7a104..443ac19c6da9 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2030,6 +2030,11 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
 	if (smmu->features & ARM_SMMU_FEAT_HD)
 		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_HD;
 
+	if (smmu->features & ARM_SMMU_FEAT_BBML1)
+		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_BBML1;
+	else if (smmu->features & ARM_SMMU_FEAT_BBML2)
+		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_BBML2;
+
 	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
 	if (!pgtbl_ops)
 		return -ENOMEM;
@@ -3373,6 +3378,20 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 
 	/* IDR3 */
 	reg = readl_relaxed(smmu->base + ARM_SMMU_IDR3);
+	switch (FIELD_GET(IDR3_BBML, reg)) {
+	case IDR3_BBML0:
+		break;
+	case IDR3_BBML1:
+		smmu->features |= ARM_SMMU_FEAT_BBML1;
+		break;
+	case IDR3_BBML2:
+		smmu->features |= ARM_SMMU_FEAT_BBML2;
+		break;
+	default:
+		dev_err(smmu->dev, "unknown/unsupported BBM behavior level\n");
+		return -ENXIO;
+	}
+
 	if (FIELD_GET(IDR3_RIL, reg))
 		smmu->features |= ARM_SMMU_FEAT_RANGE_INV;
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 26d6b935b383..a74125675544 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -54,6 +54,10 @@
 #define IDR1_SIDSIZE			GENMASK(5, 0)
 
 #define ARM_SMMU_IDR3			0xc
+#define IDR3_BBML			GENMASK(12, 11)
+#define IDR3_BBML0			0
+#define IDR3_BBML1			1
+#define IDR3_BBML2			2
 #define IDR3_RIL			(1 << 10)
 
 #define ARM_SMMU_IDR5			0x14
@@ -615,6 +619,8 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_E2H		(1 << 18)
 #define ARM_SMMU_FEAT_HA		(1 << 19)
 #define ARM_SMMU_FEAT_HD		(1 << 20)
+#define ARM_SMMU_FEAT_BBML1		(1 << 21)
+#define ARM_SMMU_FEAT_BBML2		(1 << 22)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 64cee6831c97..857932357f1d 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -84,6 +84,12 @@ struct io_pgtable_cfg {
 	 *	attributes set in the TCR for a non-coherent page-table walker.
 	 *
 	 * IO_PGTABLE_QUIRK_ARM_HD: Support hardware management of dirty status.
+	 *
+	 * IO_PGTABLE_QUIRK_ARM_BBML1: ARM SMMU supports BBM Level 1 behavior
+	 *	when changing block size.
+	 *
+	 * IO_PGTABLE_QUIRK_ARM_BBML2: ARM SMMU supports BBM Level 2 behavior
+	 * when changing block size.
 	 */
 	#define IO_PGTABLE_QUIRK_ARM_NS		BIT(0)
 	#define IO_PGTABLE_QUIRK_NO_PERMS	BIT(1)
@@ -92,6 +98,8 @@ struct io_pgtable_cfg {
 	#define IO_PGTABLE_QUIRK_ARM_TTBR1	BIT(5)
 	#define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA	BIT(6)
 	#define IO_PGTABLE_QUIRK_ARM_HD		BIT(7)
+	#define IO_PGTABLE_QUIRK_ARM_BBML1	BIT(8)
+	#define IO_PGTABLE_QUIRK_ARM_BBML2	BIT(9)
 	unsigned long			quirks;
 	unsigned long			pgsize_bitmap;
 	unsigned int			ias;
-- 
2.19.1



* [PATCH v2 04/11] iommu/arm-smmu-v3: Split block descriptor when start dirty log
  2021-03-10  9:06 [PATCH v2 00/11] vfio/iommu_type1: Implement dirty log tracking based on smmuv3 HTTU Keqian Zhu
                   ` (2 preceding siblings ...)
  2021-03-10  9:06 ` [PATCH v2 03/11] iommu/arm-smmu-v3: Add feature detection for BBML Keqian Zhu
@ 2021-03-10  9:06 ` Keqian Zhu
  2021-03-16  9:17   ` Yi Sun
  2021-03-10  9:06 ` [PATCH v2 05/11] iommu/arm-smmu-v3: Merge a span of page when stop " Keqian Zhu
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 17+ messages in thread
From: Keqian Zhu @ 2021-03-10  9:06 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, iommu, Alex Williamson,
	Robin Murphy, Yi Sun, Will Deacon
  Cc: Kirti Wankhede, Cornelia Huck, Marc Zyngier, Catalin Marinas,
	Mark Rutland, James Morse, Suzuki K Poulose, wanghaibin.wang,
	jiangkunkun, yuzenghui, lushenming

From: jiangkunkun <jiangkunkun@huawei.com>

A block descriptor is not a proper granule for dirty log tracking.
Take an extreme example: if DMA writes one byte, then under a 1G mapping
the dirty amount reported to userspace is 1G, but under a 4K mapping the
dirty amount is just 4K.

This adds a new interface named start_dirty_log to the IOMMU layer and
implements it for arm smmuv3; it splits block descriptors into spans of
page descriptors. Other types of IOMMU will perform architecture-specific
actions to start dirty logging.

To allow code reuse, the split_block operation is also exposed as an
iommu_ops callback. We flush all iotlbs after the whole procedure is
completed to ease the pressure on the IOMMU, as we will generally handle
a huge range of mappings.

Splitting blocks does not run concurrently with other pgtable ops, as the
only designed user is vfio, which always holds a lock, so race conditions
are not considered in the pgtable ops.

Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---

changelog:

v2:
 - Change the return type of split_block(). size_t -> int.
 - Change commit message to properly describe race condition. (Robin)
 - Change commit message to properly describe the need of split block.
 - Add a new interface named start_dirty_log(). (Sun Yi)
 - Change commit message to explain the relationship of split_block() and start_dirty_log().

---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  52 +++++++++
 drivers/iommu/io-pgtable-arm.c              | 122 ++++++++++++++++++++
 drivers/iommu/iommu.c                       |  48 ++++++++
 include/linux/io-pgtable.h                  |   2 +
 include/linux/iommu.h                       |  24 ++++
 5 files changed, 248 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 443ac19c6da9..5d2fb926a08e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2537,6 +2537,56 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
 	return ret;
 }
 
+static int arm_smmu_split_block(struct iommu_domain *domain,
+				unsigned long iova, size_t size)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+	size_t handled_size;
+
+	if (!(smmu->features & (ARM_SMMU_FEAT_BBML1 | ARM_SMMU_FEAT_BBML2))) {
+		dev_err(smmu->dev, "don't support BBML1/2, can't split block\n");
+		return -ENODEV;
+	}
+	if (!ops || !ops->split_block) {
+		pr_err("io-pgtable don't realize split block\n");
+		return -ENODEV;
+	}
+
+	handled_size = ops->split_block(ops, iova, size);
+	if (handled_size != size) {
+		pr_err("split block failed\n");
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+/*
+ * For SMMU, the action to start dirty log is splitting the block mapping. The
+ * hardware dirty management is always enabled if hardware supports HTTU HD.
+ */
+static int arm_smmu_start_dirty_log(struct iommu_domain *domain,
+				    unsigned long iova, size_t size)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+	if (!(smmu->features & ARM_SMMU_FEAT_HD))
+		return -ENODEV;
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+		return -EINVAL;
+
+	/*
+	 * Even if the split operation fails, we can still track dirty at block
+	 * granule, which is still a much better choice compared to full dirty
+	 * policy.
+	 */
+	iommu_split_block(domain, iova, size);
+	return 0;
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
 	return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2636,6 +2686,8 @@ static struct iommu_ops arm_smmu_ops = {
 	.device_group		= arm_smmu_device_group,
 	.domain_get_attr	= arm_smmu_domain_get_attr,
 	.domain_set_attr	= arm_smmu_domain_set_attr,
+	.split_block		= arm_smmu_split_block,
+	.start_dirty_log	= arm_smmu_start_dirty_log,
 	.of_xlate		= arm_smmu_of_xlate,
 	.get_resv_regions	= arm_smmu_get_resv_regions,
 	.put_resv_regions	= generic_iommu_put_resv_regions,
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 94d790b8ed27..4c4eec3c0698 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -79,6 +79,8 @@
 #define ARM_LPAE_PTE_SH_IS		(((arm_lpae_iopte)3) << 8)
 #define ARM_LPAE_PTE_NS			(((arm_lpae_iopte)1) << 5)
 #define ARM_LPAE_PTE_VALID		(((arm_lpae_iopte)1) << 0)
+/* Block descriptor bits */
+#define ARM_LPAE_PTE_NT			(((arm_lpae_iopte)1) << 16)
 
 #define ARM_LPAE_PTE_ATTR_LO_MASK	(((arm_lpae_iopte)0x3ff) << 2)
 /* Ignore the contiguous bit for block splitting */
@@ -679,6 +681,125 @@ static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
 	return iopte_to_paddr(pte, data) | iova;
 }
 
+static size_t __arm_lpae_split_block(struct arm_lpae_io_pgtable *data,
+				     unsigned long iova, size_t size, int lvl,
+				     arm_lpae_iopte *ptep);
+
+static size_t arm_lpae_do_split_blk(struct arm_lpae_io_pgtable *data,
+				    unsigned long iova, size_t size,
+				    arm_lpae_iopte blk_pte, int lvl,
+				    arm_lpae_iopte *ptep)
+{
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+	arm_lpae_iopte pte, *tablep;
+	phys_addr_t blk_paddr;
+	size_t tablesz = ARM_LPAE_GRANULE(data);
+	size_t split_sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
+	int i;
+
+	if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+		return 0;
+
+	tablep = __arm_lpae_alloc_pages(tablesz, GFP_ATOMIC, cfg);
+	if (!tablep)
+		return 0;
+
+	blk_paddr = iopte_to_paddr(blk_pte, data);
+	pte = iopte_prot(blk_pte);
+	for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz)
+		__arm_lpae_init_pte(data, blk_paddr, pte, lvl, &tablep[i]);
+
+	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_BBML1) {
+		/* Race does not exist */
+		blk_pte |= ARM_LPAE_PTE_NT;
+		__arm_lpae_set_pte(ptep, blk_pte, cfg);
+		io_pgtable_tlb_flush_walk(&data->iop, iova, size, size);
+	}
+	/* Race does not exist */
+	pte = arm_lpae_install_table(tablep, ptep, blk_pte, cfg);
+
+	/* Have we split it into pages? */
+	if (lvl == (ARM_LPAE_MAX_LEVELS - 1))
+		return size;
+
+	/* Go back to lvl - 1 */
+	ptep -= ARM_LPAE_LVL_IDX(iova, lvl - 1, data);
+	return __arm_lpae_split_block(data, iova, size, lvl - 1, ptep);
+}
+
+static size_t __arm_lpae_split_block(struct arm_lpae_io_pgtable *data,
+				     unsigned long iova, size_t size, int lvl,
+				     arm_lpae_iopte *ptep)
+{
+	arm_lpae_iopte pte;
+	struct io_pgtable *iop = &data->iop;
+	size_t base, next_size, total_size;
+
+	if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+		return 0;
+
+	ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+	pte = READ_ONCE(*ptep);
+	if (WARN_ON(!pte))
+		return 0;
+
+	if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+		if (iopte_leaf(pte, lvl, iop->fmt)) {
+			if (lvl == (ARM_LPAE_MAX_LEVELS - 1) ||
+			    (pte & ARM_LPAE_PTE_AP_RDONLY))
+				return size;
+
+			/* We find a writable block, split it. */
+			return arm_lpae_do_split_blk(data, iova, size, pte,
+					lvl + 1, ptep);
+		} else {
+			/* If it is the last table level, then nothing to do */
+			if (lvl == (ARM_LPAE_MAX_LEVELS - 2))
+				return size;
+
+			total_size = 0;
+			next_size = ARM_LPAE_BLOCK_SIZE(lvl + 1, data);
+			ptep = iopte_deref(pte, data);
+			for (base = 0; base < size; base += next_size)
+				total_size += __arm_lpae_split_block(data,
+						iova + base, next_size, lvl + 1,
+						ptep);
+			return total_size;
+		}
+	} else if (iopte_leaf(pte, lvl, iop->fmt)) {
+		WARN(1, "Can't split behind a block.\n");
+		return 0;
+	}
+
+	/* Keep on walkin */
+	ptep = iopte_deref(pte, data);
+	return __arm_lpae_split_block(data, iova, size, lvl + 1, ptep);
+}
+
+static size_t arm_lpae_split_block(struct io_pgtable_ops *ops,
+				   unsigned long iova, size_t size)
+{
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	arm_lpae_iopte *ptep = data->pgd;
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+	int lvl = data->start_level;
+	long iaext = (s64)iova >> cfg->ias;
+
+	if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
+		return 0;
+
+	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+		iaext = ~iaext;
+	if (WARN_ON(iaext))
+		return 0;
+
+	/* If it is smallest granule, then nothing to do */
+	if (size == ARM_LPAE_BLOCK_SIZE(ARM_LPAE_MAX_LEVELS - 1, data))
+		return size;
+
+	return __arm_lpae_split_block(data, iova, size, lvl, ptep);
+}
+
 static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
 {
 	unsigned long granule, page_sizes;
@@ -757,6 +878,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
 		.map		= arm_lpae_map,
 		.unmap		= arm_lpae_unmap,
 		.iova_to_phys	= arm_lpae_iova_to_phys,
+		.split_block	= arm_lpae_split_block,
 	};
 
 	return data;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d0b0a15dba84..f644e0b16843 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2720,6 +2720,54 @@ int iommu_domain_set_attr(struct iommu_domain *domain,
 }
 EXPORT_SYMBOL_GPL(iommu_domain_set_attr);
 
+int iommu_split_block(struct iommu_domain *domain, unsigned long iova,
+		      size_t size)
+{
+	const struct iommu_ops *ops = domain->ops;
+	unsigned int min_pagesz;
+	size_t pgsize;
+	int ret = 0;
+
+	if (unlikely(!ops || !ops->split_block))
+		return -ENODEV;
+
+	min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
+	if (!IS_ALIGNED(iova | size, min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
+		       iova, size, min_pagesz);
+		return -EINVAL;
+	}
+
+	while (size) {
+		pgsize = iommu_pgsize(domain, iova, size);
+
+		ret = ops->split_block(domain, iova, pgsize);
+		if (ret)
+			break;
+
+		pr_debug("split handled: iova 0x%lx size 0x%zx\n", iova, pgsize);
+
+		iova += pgsize;
+		size -= pgsize;
+	}
+	iommu_flush_iotlb_all(domain);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_split_block);
+
+int iommu_start_dirty_log(struct iommu_domain *domain, unsigned long iova,
+			  size_t size)
+{
+	const struct iommu_ops *ops = domain->ops;
+
+	if (unlikely(!ops || !ops->start_dirty_log))
+		return -ENODEV;
+
+	return ops->start_dirty_log(domain, iova, size);
+}
+EXPORT_SYMBOL_GPL(iommu_start_dirty_log);
+
 void iommu_get_resv_regions(struct device *dev, struct list_head *list)
 {
 	const struct iommu_ops *ops = dev->bus->iommu_ops;
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 857932357f1d..d86dd2ade6ad 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -167,6 +167,8 @@ struct io_pgtable_ops {
 			size_t size, struct iommu_iotlb_gather *gather);
 	phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops,
 				    unsigned long iova);
+	size_t (*split_block)(struct io_pgtable_ops *ops, unsigned long iova,
+			      size_t size);
 };
 
 /**
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5e7fe519430a..85ffa451547d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -205,6 +205,8 @@ struct iommu_iotlb_gather {
  * @device_group: find iommu group for a particular device
  * @domain_get_attr: Query domain attributes
  * @domain_set_attr: Change domain attributes
+ * @split_block: Split block mapping into page mapping
+ * @start_dirty_log: Perform actions to start dirty log tracking
  * @get_resv_regions: Request list of reserved regions for a device
  * @put_resv_regions: Free list of reserved regions for a device
  * @apply_resv_region: Temporary helper call-back for iova reserved ranges
@@ -260,6 +262,12 @@ struct iommu_ops {
 	int (*domain_set_attr)(struct iommu_domain *domain,
 			       enum iommu_attr attr, void *data);
 
+	/* Track dirty log */
+	int (*split_block)(struct iommu_domain *domain, unsigned long iova,
+			   size_t size);
+	int (*start_dirty_log)(struct iommu_domain *domain, unsigned long iova,
+			       size_t size);
+
 	/* Request/Free a list of reserved regions for a device */
 	void (*get_resv_regions)(struct device *dev, struct list_head *list);
 	void (*put_resv_regions)(struct device *dev, struct list_head *list);
@@ -511,6 +519,10 @@ extern int iommu_domain_get_attr(struct iommu_domain *domain, enum iommu_attr,
 				 void *data);
 extern int iommu_domain_set_attr(struct iommu_domain *domain, enum iommu_attr,
 				 void *data);
+extern int iommu_split_block(struct iommu_domain *domain, unsigned long iova,
+			     size_t size);
+extern int iommu_start_dirty_log(struct iommu_domain *domain,
+				 unsigned long iova, size_t size);
 
 /* Window handling function prototypes */
 extern int iommu_domain_window_enable(struct iommu_domain *domain, u32 wnd_nr,
@@ -901,6 +913,18 @@ static inline int iommu_domain_set_attr(struct iommu_domain *domain,
 	return -EINVAL;
 }
 
+static inline int iommu_split_block(struct iommu_domain *domain,
+				    unsigned long iova, size_t size)
+{
+	return -EINVAL;
+}
+
+static inline int iommu_start_dirty_log(struct iommu_domain *domain,
+					unsigned long iova, size_t size)
+{
+	return -EINVAL;
+}
+
 static inline int  iommu_device_register(struct iommu_device *iommu)
 {
 	return -ENODEV;
-- 
2.19.1



* [PATCH v2 05/11] iommu/arm-smmu-v3: Merge a span of page when stop dirty log
  2021-03-10  9:06 [PATCH v2 00/11] vfio/iommu_type1: Implement dirty log tracking based on smmuv3 HTTU Keqian Zhu
                   ` (3 preceding siblings ...)
  2021-03-10  9:06 ` [PATCH v2 04/11] iommu/arm-smmu-v3: Split block descriptor when start dirty log Keqian Zhu
@ 2021-03-10  9:06 ` Keqian Zhu
  2021-03-10  9:06 ` [PATCH v2 06/11] iommu/arm-smmu-v3: Scan leaf TTD to sync hardware " Keqian Zhu
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Keqian Zhu @ 2021-03-10  9:06 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, iommu, Alex Williamson,
	Robin Murphy, Yi Sun, Will Deacon
  Cc: Kirti Wankhede, Cornelia Huck, Marc Zyngier, Catalin Marinas,
	Mark Rutland, James Morse, Suzuki K Poulose, wanghaibin.wang,
	jiangkunkun, yuzenghui, lushenming

From: jiangkunkun <jiangkunkun@huawei.com>

When dirty log tracking is stopped, we need to restore all block descriptors
that were split when dirty log tracking started.

This adds a new interface named stop_dirty_log to the IOMMU layer and
implements it for arm smmuv3; it reinstalls block mappings and unmaps the
spans of page mappings. Other types of IOMMU perform architecture-specific
actions to stop dirty logging.

To allow code reuse, the merge_page operation is also exposed as an
iommu_ops callback. We flush all iotlbs after the whole procedure is
completed to ease the pressure on the IOMMU, as we will generally handle
a huge range of mappings.

Merging pages does not run concurrently with other pgtable ops, as the
only designed user is vfio, which always holds a lock, so race conditions
are not considered in the pgtable ops.

Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---

changelog:

v2:
 - Change the return type of merge_page(). size_t -> int.
 - Change commit message to properly describe race condition. (Robin)
 - Add a new interface named stop_dirty_log(). (Sun Yi)
 - Change commit message to explain the relationship of merge_page() and stop_dirty_log().
 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 52 +++++++++++++
 drivers/iommu/io-pgtable-arm.c              | 78 ++++++++++++++++++++
 drivers/iommu/iommu.c                       | 82 +++++++++++++++++++++
 include/linux/io-pgtable.h                  |  2 +
 include/linux/iommu.h                       | 24 ++++++
 5 files changed, 238 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 5d2fb926a08e..ac0d881c77b8 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2587,6 +2587,56 @@ static int arm_smmu_start_dirty_log(struct iommu_domain *domain,
 	return 0;
 }
 
+static int arm_smmu_merge_page(struct iommu_domain *domain,
+			       unsigned long iova, phys_addr_t paddr,
+			       size_t size, int prot)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+	size_t handled_size;
+
+	if (!(smmu->features & (ARM_SMMU_FEAT_BBML1 | ARM_SMMU_FEAT_BBML2))) {
+		dev_err(smmu->dev, "don't support BBML1/2, can't merge page\n");
+		return -ENODEV;
+	}
+	if (!ops || !ops->merge_page) {
+		pr_err("io-pgtable don't realize merge page\n");
+		return -ENODEV;
+	}
+
+	handled_size = ops->merge_page(ops, iova, paddr, size, prot);
+	if (handled_size != size) {
+		pr_err("merge page failed\n");
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+/*
+ * For SMMU, the action to stop dirty log is merging the page mappings. The hardware
+ * dirty management is always enabled if hardware supports HTTU HD.
+ */
+static int arm_smmu_stop_dirty_log(struct iommu_domain *domain,
+				   unsigned long iova, size_t size, int prot)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+	if (!(smmu->features & ARM_SMMU_FEAT_HD))
+		return -ENODEV;
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+		return -EINVAL;
+
+	/*
+	 * Even if the merge operation fails, it just affects the performance of
+	 * DMA transactions.
+	 */
+	iommu_merge_page(domain, iova, size, prot);
+	return 0;
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
 	return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2688,6 +2738,8 @@ static struct iommu_ops arm_smmu_ops = {
 	.domain_set_attr	= arm_smmu_domain_set_attr,
 	.split_block		= arm_smmu_split_block,
 	.start_dirty_log	= arm_smmu_start_dirty_log,
+	.merge_page		= arm_smmu_merge_page,
+	.stop_dirty_log		= arm_smmu_stop_dirty_log,
 	.of_xlate		= arm_smmu_of_xlate,
 	.get_resv_regions	= arm_smmu_get_resv_regions,
 	.put_resv_regions	= generic_iommu_put_resv_regions,
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 4c4eec3c0698..9028328b99b0 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -800,6 +800,83 @@ static size_t arm_lpae_split_block(struct io_pgtable_ops *ops,
 	return __arm_lpae_split_block(data, iova, size, lvl, ptep);
 }
 
+static size_t __arm_lpae_merge_page(struct arm_lpae_io_pgtable *data,
+				    unsigned long iova, phys_addr_t paddr,
+				    size_t size, int lvl, arm_lpae_iopte *ptep,
+				    arm_lpae_iopte prot)
+{
+	arm_lpae_iopte pte, *tablep;
+	struct io_pgtable *iop = &data->iop;
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+
+	if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+		return 0;
+
+	ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+	pte = READ_ONCE(*ptep);
+	if (WARN_ON(!pte))
+		return 0;
+
+	if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+		if (iopte_leaf(pte, lvl, iop->fmt))
+			return size;
+
+		/* Race does not exist */
+		if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_BBML1) {
+			prot |= ARM_LPAE_PTE_NT;
+			__arm_lpae_init_pte(data, paddr, prot, lvl, ptep);
+			io_pgtable_tlb_flush_walk(iop, iova, size,
+						  ARM_LPAE_GRANULE(data));
+
+			prot &= ~(ARM_LPAE_PTE_NT);
+			__arm_lpae_init_pte(data, paddr, prot, lvl, ptep);
+		} else {
+			__arm_lpae_init_pte(data, paddr, prot, lvl, ptep);
+		}
+
+		tablep = iopte_deref(pte, data);
+		__arm_lpae_free_pgtable(data, lvl + 1, tablep);
+		return size;
+	} else if (iopte_leaf(pte, lvl, iop->fmt)) {
+		/* The size is too small, already merged */
+		return size;
+	}
+
+	/* Keep on walkin */
+	ptep = iopte_deref(pte, data);
+	return __arm_lpae_merge_page(data, iova, paddr, size, lvl + 1, ptep, prot);
+}
+
+static size_t arm_lpae_merge_page(struct io_pgtable_ops *ops, unsigned long iova,
+				  phys_addr_t paddr, size_t size, int iommu_prot)
+{
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+	arm_lpae_iopte *ptep = data->pgd;
+	int lvl = data->start_level;
+	arm_lpae_iopte prot;
+	long iaext = (s64)iova >> cfg->ias;
+
+	/* If no access, then nothing to do */
+	if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE)))
+		return size;
+
+	if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
+		return 0;
+
+	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+		iaext = ~iaext;
+	if (WARN_ON(iaext || paddr >> cfg->oas))
+		return 0;
+
+	/* If it is smallest granule, then nothing to do */
+	if (size == ARM_LPAE_BLOCK_SIZE(ARM_LPAE_MAX_LEVELS - 1, data))
+		return size;
+
+	prot = arm_lpae_prot_to_pte(data, iommu_prot);
+	return __arm_lpae_merge_page(data, iova, paddr, size, lvl, ptep, prot);
+}
+
 static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
 {
 	unsigned long granule, page_sizes;
@@ -879,6 +956,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
 		.unmap		= arm_lpae_unmap,
 		.iova_to_phys	= arm_lpae_iova_to_phys,
 		.split_block	= arm_lpae_split_block,
+		.merge_page	= arm_lpae_merge_page,
 	};
 
 	return data;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f644e0b16843..2a10294b62a3 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2768,6 +2768,88 @@ int iommu_start_dirty_log(struct iommu_domain *domain, unsigned long iova,
 }
 EXPORT_SYMBOL_GPL(iommu_start_dirty_log);
 
+static int __iommu_merge_page(struct iommu_domain *domain,
+			      unsigned long iova, phys_addr_t paddr,
+			      size_t size, int prot)
+{
+	const struct iommu_ops *ops = domain->ops;
+	unsigned int min_pagesz;
+	size_t pgsize;
+	int ret = 0;
+
+	if (unlikely(!ops || !ops->merge_page))
+		return -ENODEV;
+
+	min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
+	if (!IS_ALIGNED(iova | paddr | size, min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx pa %pa size 0x%zx min_pagesz 0x%x\n",
+			iova, &paddr, size, min_pagesz);
+		return -EINVAL;
+	}
+
+	while (size) {
+		pgsize = iommu_pgsize(domain, iova | paddr, size);
+
+		ret = ops->merge_page(domain, iova, paddr, pgsize, prot);
+		if (ret)
+			break;
+
+		pr_debug("merge handled: iova 0x%lx pa %pa size 0x%zx\n",
+			 iova, &paddr, pgsize);
+
+		iova += pgsize;
+		paddr += pgsize;
+		size -= pgsize;
+	}
+
+	return ret;
+}
+
+int iommu_merge_page(struct iommu_domain *domain, unsigned long iova,
+		     size_t size, int prot)
+{
+	phys_addr_t phys;
+	dma_addr_t p, i;
+	size_t cont_size;
+	int ret = 0;
+
+	while (size) {
+		phys = iommu_iova_to_phys(domain, iova);
+		cont_size = PAGE_SIZE;
+		p = phys + cont_size;
+		i = iova + cont_size;
+
+		while (cont_size < size && p == iommu_iova_to_phys(domain, i)) {
+			p += PAGE_SIZE;
+			i += PAGE_SIZE;
+			cont_size += PAGE_SIZE;
+		}
+
+		ret = __iommu_merge_page(domain, iova, phys, cont_size, prot);
+		if (ret)
+			break;
+
+		iova += cont_size;
+		size -= cont_size;
+	}
+	iommu_flush_iotlb_all(domain);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_merge_page);
+
+int iommu_stop_dirty_log(struct iommu_domain *domain, unsigned long iova,
+			 size_t size, int prot)
+{
+	const struct iommu_ops *ops = domain->ops;
+
+	if (unlikely(!ops || !ops->stop_dirty_log))
+		return -ENODEV;
+
+	return ops->stop_dirty_log(domain, iova, size, prot);
+}
+EXPORT_SYMBOL_GPL(iommu_stop_dirty_log);
+
 void iommu_get_resv_regions(struct device *dev, struct list_head *list)
 {
 	const struct iommu_ops *ops = dev->bus->iommu_ops;
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index d86dd2ade6ad..38b4e17c70f0 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -169,6 +169,8 @@ struct io_pgtable_ops {
 				    unsigned long iova);
 	size_t (*split_block)(struct io_pgtable_ops *ops, unsigned long iova,
 			      size_t size);
+	size_t (*merge_page)(struct io_pgtable_ops *ops, unsigned long iova,
+			     phys_addr_t phys, size_t size, int prot);
 };
 
 /**
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 85ffa451547d..28111009cf6f 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -207,6 +207,8 @@ struct iommu_iotlb_gather {
  * @domain_set_attr: Change domain attributes
  * @split_block: Split block mapping into page mapping
  * @start_dirty_log: Perform actions to start dirty log tracking
+ * @merge_page: Merge page mapping into block mapping
+ * @stop_dirty_log: Perform actions to stop dirty log tracking
  * @get_resv_regions: Request list of reserved regions for a device
  * @put_resv_regions: Free list of reserved regions for a device
  * @apply_resv_region: Temporary helper call-back for iova reserved ranges
@@ -267,6 +269,10 @@ struct iommu_ops {
 			   size_t size);
 	int (*start_dirty_log)(struct iommu_domain *domain, unsigned long iova,
 			       size_t size);
+	int (*merge_page)(struct iommu_domain *domain, unsigned long iova,
+			  phys_addr_t phys, size_t size, int prot);
+	int (*stop_dirty_log)(struct iommu_domain *domain, unsigned long iova,
+			      size_t size, int prot);
 
 	/* Request/Free a list of reserved regions for a device */
 	void (*get_resv_regions)(struct device *dev, struct list_head *list);
@@ -523,6 +529,10 @@ extern int iommu_split_block(struct iommu_domain *domain, unsigned long iova,
 			     size_t size);
 extern int iommu_start_dirty_log(struct iommu_domain *domain,
 				 unsigned long iova, size_t size);
+extern int iommu_merge_page(struct iommu_domain *domain, unsigned long iova,
+			    size_t size, int prot);
+extern int iommu_stop_dirty_log(struct iommu_domain *domain,
+				unsigned long iova, size_t size, int prot);
 
 /* Window handling function prototypes */
 extern int iommu_domain_window_enable(struct iommu_domain *domain, u32 wnd_nr,
@@ -925,6 +935,20 @@ static inline int iommu_start_dirty_log(struct iommu_domain *domain,
 	return -EINVAL;
 }
 
+static inline int iommu_merge_page(struct iommu_domain *domain,
+				   unsigned long iova, size_t size,
+				   int prot)
+{
+	return -EINVAL;
+}
+
+static inline int iommu_stop_dirty_log(struct iommu_domain *domain,
+				       unsigned long iova, size_t size,
+				       int prot)
+{
+	return -EINVAL;
+}
+
 static inline int  iommu_device_register(struct iommu_device *iommu)
 {
 	return -ENODEV;
-- 
2.19.1



* [PATCH v2 06/11] iommu/arm-smmu-v3: Scan leaf TTD to sync hardware dirty log
  2021-03-10  9:06 [PATCH v2 00/11] vfio/iommu_type1: Implement dirty log tracking based on smmuv3 HTTU Keqian Zhu
                   ` (4 preceding siblings ...)
  2021-03-10  9:06 ` [PATCH v2 05/11] iommu/arm-smmu-v3: Merge a span of page when stop " Keqian Zhu
@ 2021-03-10  9:06 ` Keqian Zhu
  2021-03-17 10:44   ` Yi Sun
  2021-03-10  9:06 ` [PATCH v2 07/11] iommu/arm-smmu-v3: Clear dirty log according to bitmap Keqian Zhu
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 17+ messages in thread
From: Keqian Zhu @ 2021-03-10  9:06 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, iommu, Alex Williamson,
	Robin Murphy, Yi Sun, Will Deacon
  Cc: Kirti Wankhede, Cornelia Huck, Marc Zyngier, Catalin Marinas,
	Mark Rutland, James Morse, Suzuki K Poulose, wanghaibin.wang,
	jiangkunkun, yuzenghui, lushenming

From: jiangkunkun <jiangkunkun@huawei.com>

During dirty log tracking, userspace will try to retrieve the dirty log
from the IOMMU if it supports hardware dirty logging.

This adds a new interface named sync_dirty_log to the IOMMU layer and
implements it for arm smmuv3; it scans the leaf TTDs and treats a TTD as
dirty if it is writable (as we only enable HTTU for stage 1, this means
checking whether AP[2] is clear).

Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---

changelog:

v2:
 - Add new sanity check in arm_smmu_sync_dirty_log(). (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
 - Document the purpose of flush_iotlb in arm_smmu_sync_dirty_log(). (Robin)
 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 30 +++++++
 drivers/iommu/io-pgtable-arm.c              | 90 +++++++++++++++++++++
 drivers/iommu/iommu.c                       | 38 +++++++++
 include/linux/io-pgtable.h                  |  4 +
 include/linux/iommu.h                       | 18 +++++
 5 files changed, 180 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index ac0d881c77b8..7407896a710e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2637,6 +2637,35 @@ static int arm_smmu_stop_dirty_log(struct iommu_domain *domain,
 	return 0;
 }
 
+static int arm_smmu_sync_dirty_log(struct iommu_domain *domain,
+				   unsigned long iova, size_t size,
+				   unsigned long *bitmap,
+				   unsigned long base_iova,
+				   unsigned long bitmap_pgshift)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+	if (!(smmu->features & ARM_SMMU_FEAT_HD))
+		return -ENODEV;
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+		return -EINVAL;
+
+	if (!ops || !ops->sync_dirty_log) {
+		pr_err("io-pgtable don't realize sync dirty log\n");
+		return -ENODEV;
+	}
+
+	/*
+	 * Flush iotlb to ensure all inflight transactions are completed.
+	 * See doc IHI0070Da 3.13.4 "HTTU behavior summary".
+	 */
+	arm_smmu_flush_iotlb_all(domain);
+	return ops->sync_dirty_log(ops, iova, size, bitmap, base_iova,
+				   bitmap_pgshift);
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
 	return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2740,6 +2769,7 @@ static struct iommu_ops arm_smmu_ops = {
 	.start_dirty_log	= arm_smmu_start_dirty_log,
 	.merge_page		= arm_smmu_merge_page,
 	.stop_dirty_log		= arm_smmu_stop_dirty_log,
+	.sync_dirty_log		= arm_smmu_sync_dirty_log,
 	.of_xlate		= arm_smmu_of_xlate,
 	.get_resv_regions	= arm_smmu_get_resv_regions,
 	.put_resv_regions	= generic_iommu_put_resv_regions,
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 9028328b99b0..67a208a05ab2 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -877,6 +877,95 @@ static size_t arm_lpae_merge_page(struct io_pgtable_ops *ops, unsigned long iova
 	return __arm_lpae_merge_page(data, iova, paddr, size, lvl, ptep, prot);
 }
 
+static int __arm_lpae_sync_dirty_log(struct arm_lpae_io_pgtable *data,
+				     unsigned long iova, size_t size,
+				     int lvl, arm_lpae_iopte *ptep,
+				     unsigned long *bitmap,
+				     unsigned long base_iova,
+				     unsigned long bitmap_pgshift)
+{
+	arm_lpae_iopte pte;
+	struct io_pgtable *iop = &data->iop;
+	size_t base, next_size;
+	unsigned long offset;
+	int nbits, ret;
+
+	if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+		return -EINVAL;
+
+	ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+	pte = READ_ONCE(*ptep);
+	if (WARN_ON(!pte))
+		return -EINVAL;
+
+	if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+		if (iopte_leaf(pte, lvl, iop->fmt)) {
+			if (pte & ARM_LPAE_PTE_AP_RDONLY)
+				return 0;
+
+			/* It is writable, set the bitmap */
+			nbits = size >> bitmap_pgshift;
+			offset = (iova - base_iova) >> bitmap_pgshift;
+			bitmap_set(bitmap, offset, nbits);
+			return 0;
+		} else {
+			/* To traverse next level */
+			next_size = ARM_LPAE_BLOCK_SIZE(lvl + 1, data);
+			ptep = iopte_deref(pte, data);
+			for (base = 0; base < size; base += next_size) {
+				ret = __arm_lpae_sync_dirty_log(data,
+						iova + base, next_size, lvl + 1,
+						ptep, bitmap, base_iova, bitmap_pgshift);
+				if (ret)
+					return ret;
+			}
+			return 0;
+		}
+	} else if (iopte_leaf(pte, lvl, iop->fmt)) {
+		if (pte & ARM_LPAE_PTE_AP_RDONLY)
+			return 0;
+
+		/* Though the size is too small, also set bitmap */
+		nbits = size >> bitmap_pgshift;
+		offset = (iova - base_iova) >> bitmap_pgshift;
+		bitmap_set(bitmap, offset, nbits);
+		return 0;
+	}
+
+	/* Keep on walkin */
+	ptep = iopte_deref(pte, data);
+	return __arm_lpae_sync_dirty_log(data, iova, size, lvl + 1, ptep,
+			bitmap, base_iova, bitmap_pgshift);
+}
+
+static int arm_lpae_sync_dirty_log(struct io_pgtable_ops *ops,
+				   unsigned long iova, size_t size,
+				   unsigned long *bitmap,
+				   unsigned long base_iova,
+				   unsigned long bitmap_pgshift)
+{
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	arm_lpae_iopte *ptep = data->pgd;
+	int lvl = data->start_level;
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+	long iaext = (s64)iova >> cfg->ias;
+
+	if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
+		return -EINVAL;
+
+	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+		iaext = ~iaext;
+	if (WARN_ON(iaext))
+		return -EINVAL;
+
+	if (data->iop.fmt != ARM_64_LPAE_S1 &&
+	    data->iop.fmt != ARM_32_LPAE_S1)
+		return -EINVAL;
+
+	return __arm_lpae_sync_dirty_log(data, iova, size, lvl, ptep,
+					 bitmap, base_iova, bitmap_pgshift);
+}
+
 static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
 {
 	unsigned long granule, page_sizes;
@@ -957,6 +1046,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
 		.iova_to_phys	= arm_lpae_iova_to_phys,
 		.split_block	= arm_lpae_split_block,
 		.merge_page	= arm_lpae_merge_page,
+		.sync_dirty_log	= arm_lpae_sync_dirty_log,
 	};
 
 	return data;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2a10294b62a3..44dfb78f9050 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2850,6 +2850,44 @@ int iommu_stop_dirty_log(struct iommu_domain *domain, unsigned long iova,
 }
 EXPORT_SYMBOL_GPL(iommu_stop_dirty_log);
 
+int iommu_sync_dirty_log(struct iommu_domain *domain, unsigned long iova,
+			 size_t size, unsigned long *bitmap,
+			 unsigned long base_iova, unsigned long bitmap_pgshift)
+{
+	const struct iommu_ops *ops = domain->ops;
+	unsigned int min_pagesz;
+	size_t pgsize;
+	int ret = 0;
+
+	if (unlikely(!ops || !ops->sync_dirty_log))
+		return -ENODEV;
+
+	min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
+	if (!IS_ALIGNED(iova | size, min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
+		       iova, size, min_pagesz);
+		return -EINVAL;
+	}
+
+	while (size) {
+		pgsize = iommu_pgsize(domain, iova, size);
+
+		ret = ops->sync_dirty_log(domain, iova, pgsize,
+					  bitmap, base_iova, bitmap_pgshift);
+		if (ret)
+			break;
+
+		pr_debug("dirty_log_sync handle: iova 0x%lx pagesz 0x%zx\n",
+			 iova, pgsize);
+
+		iova += pgsize;
+		size -= pgsize;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_sync_dirty_log);
+
 void iommu_get_resv_regions(struct device *dev, struct list_head *list)
 {
 	const struct iommu_ops *ops = dev->bus->iommu_ops;
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 38b4e17c70f0..5107a9d4ac79 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -171,6 +171,10 @@ struct io_pgtable_ops {
 			      size_t size);
 	size_t (*merge_page)(struct io_pgtable_ops *ops, unsigned long iova,
 			     phys_addr_t phys, size_t size, int prot);
+	int (*sync_dirty_log)(struct io_pgtable_ops *ops,
+			      unsigned long iova, size_t size,
+			      unsigned long *bitmap, unsigned long base_iova,
+			      unsigned long bitmap_pgshift);
 };
 
 /**
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 28111009cf6f..7d5777acfdb7 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -209,6 +209,7 @@ struct iommu_iotlb_gather {
  * @start_dirty_log: Perform actions to start dirty log tracking
  * @merge_page: Merge page mapping into block mapping
  * @stop_dirty_log: Perform actions to stop dirty log tracking
+ * @sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap
  * @get_resv_regions: Request list of reserved regions for a device
  * @put_resv_regions: Free list of reserved regions for a device
  * @apply_resv_region: Temporary helper call-back for iova reserved ranges
@@ -273,6 +274,10 @@ struct iommu_ops {
 			  phys_addr_t phys, size_t size, int prot);
 	int (*stop_dirty_log)(struct iommu_domain *domain, unsigned long iova,
 			      size_t size, int prot);
+	int (*sync_dirty_log)(struct iommu_domain *domain,
+			      unsigned long iova, size_t size,
+			      unsigned long *bitmap, unsigned long base_iova,
+			      unsigned long bitmap_pgshift);
 
 	/* Request/Free a list of reserved regions for a device */
 	void (*get_resv_regions)(struct device *dev, struct list_head *list);
@@ -533,6 +538,10 @@ extern int iommu_merge_page(struct iommu_domain *domain, unsigned long iova,
 			    size_t size, int prot);
 extern int iommu_stop_dirty_log(struct iommu_domain *domain,
 				unsigned long iova, size_t size, int prot);
+extern int iommu_sync_dirty_log(struct iommu_domain *domain, unsigned long iova,
+				size_t size, unsigned long *bitmap,
+				unsigned long base_iova,
+				unsigned long bitmap_pgshift);
 
 /* Window handling function prototypes */
 extern int iommu_domain_window_enable(struct iommu_domain *domain, u32 wnd_nr,
@@ -949,6 +958,15 @@ static inline int iommu_stop_dirty_log(struct iommu_domain *domain,
 	return -EINVAL;
 }
 
+static inline int iommu_sync_dirty_log(struct iommu_domain *domain,
+				       unsigned long iova, size_t size,
+				       unsigned long *bitmap,
+				       unsigned long base_iova,
+				       unsigned long pgshift)
+{
+	return -EINVAL;
+}
+
 static inline int  iommu_device_register(struct iommu_device *iommu)
 {
 	return -ENODEV;
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 07/11] iommu/arm-smmu-v3: Clear dirty log according to bitmap
  2021-03-10  9:06 [PATCH v2 00/11] vfio/iommu_type1: Implement dirty log tracking based on smmuv3 HTTU Keqian Zhu
                   ` (5 preceding siblings ...)
  2021-03-10  9:06 ` [PATCH v2 06/11] iommu/arm-smmu-v3: Scan leaf TTD to sync hardware " Keqian Zhu
@ 2021-03-10  9:06 ` Keqian Zhu
  2021-03-10  9:06 ` [PATCH v2 08/11] iommu/arm-smmu-v3: Add HWDBM device feature reporting Keqian Zhu
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Keqian Zhu @ 2021-03-10  9:06 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, iommu, Alex Williamson,
	Robin Murphy, Yi Sun, Will Deacon
  Cc: Kirti Wankhede, Cornelia Huck, Marc Zyngier, Catalin Marinas,
	Mark Rutland, James Morse, Suzuki K Poulose, wanghaibin.wang,
	jiangkunkun, yuzenghui, lushenming

From: jiangkunkun <jiangkunkun@huawei.com>

After the dirty log is retrieved, the user should clear it to re-enable
dirty log tracking for the dirtied pages.

This adds a new interface named clear_dirty_log to the iommu layer and
implements it for arm smmuv3. It clears the dirty state (as we only
enable HTTU for stage1, this means setting the AP[2] bit) of the TTDs
selected by the user-provided bitmap.
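
For illustration, a caller of the new interface might look roughly like
the hypothetical helper below (not part of this patch); bits set in the
bitmap select the pages whose dirty state is cleared:

    static int clear_handled_pages(struct iommu_domain *domain,
                                   unsigned long iova, size_t size,
                                   unsigned long *bitmap,
                                   unsigned long pgshift)
    {
            /* Re-arm dirty tracking only for the pages already handled */
            return iommu_clear_dirty_log(domain, iova, size, bitmap,
                                         iova, pgshift);
    }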

Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---

changelog:

v2:
 - Add new sanity check in arm_smmu_sync_dirty_log(). (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
 - Remove extra flush_iotlb in __iommu_clear_dirty_log().
 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 25 ++++++
 drivers/iommu/io-pgtable-arm.c              | 95 +++++++++++++++++++++
 drivers/iommu/iommu.c                       | 68 +++++++++++++++
 include/linux/io-pgtable.h                  |  4 +
 include/linux/iommu.h                       | 17 ++++
 5 files changed, 209 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 7407896a710e..696df51a3282 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2666,6 +2666,30 @@ static int arm_smmu_sync_dirty_log(struct iommu_domain *domain,
 				   bitmap_pgshift);
 }
 
+static int arm_smmu_clear_dirty_log(struct iommu_domain *domain,
+				    unsigned long iova, size_t size,
+				    unsigned long *bitmap,
+				    unsigned long base_iova,
+				    unsigned long bitmap_pgshift)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+	if (!(smmu->features & ARM_SMMU_FEAT_HD))
+		return -ENODEV;
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+		return -EINVAL;
+
+	if (!ops || !ops->clear_dirty_log) {
+		pr_err("io-pgtable does not implement clear_dirty_log\n");
+		return -ENODEV;
+	}
+
+	return ops->clear_dirty_log(ops, iova, size, bitmap, base_iova,
+				    bitmap_pgshift);
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
 	return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2770,6 +2794,7 @@ static struct iommu_ops arm_smmu_ops = {
 	.merge_page		= arm_smmu_merge_page,
 	.stop_dirty_log		= arm_smmu_stop_dirty_log,
 	.sync_dirty_log		= arm_smmu_sync_dirty_log,
+	.clear_dirty_log	= arm_smmu_clear_dirty_log,
 	.of_xlate		= arm_smmu_of_xlate,
 	.get_resv_regions	= arm_smmu_get_resv_regions,
 	.put_resv_regions	= generic_iommu_put_resv_regions,
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 67a208a05ab2..e3ef0f50611c 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -966,6 +966,100 @@ static int arm_lpae_sync_dirty_log(struct io_pgtable_ops *ops,
 					 bitmap, base_iova, bitmap_pgshift);
 }
 
+static int __arm_lpae_clear_dirty_log(struct arm_lpae_io_pgtable *data,
+				      unsigned long iova, size_t size,
+				      int lvl, arm_lpae_iopte *ptep,
+				      unsigned long *bitmap,
+				      unsigned long base_iova,
+				      unsigned long bitmap_pgshift)
+{
+	arm_lpae_iopte pte;
+	struct io_pgtable *iop = &data->iop;
+	unsigned long offset;
+	size_t base, next_size;
+	int nbits, ret, i;
+
+	if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+		return -EINVAL;
+
+	ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+	pte = READ_ONCE(*ptep);
+	if (WARN_ON(!pte))
+		return -EINVAL;
+
+	if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+		if (iopte_leaf(pte, lvl, iop->fmt)) {
+			if (pte & ARM_LPAE_PTE_AP_RDONLY)
+				return 0;
+
+			/* Ensure all corresponding bits are set */
+			nbits = size >> bitmap_pgshift;
+			offset = (iova - base_iova) >> bitmap_pgshift;
+			for (i = offset; i < offset + nbits; i++) {
+				if (!test_bit(i, bitmap))
+					return 0;
+			}
+
+			/* Race does not exist */
+			pte |= ARM_LPAE_PTE_AP_RDONLY;
+			__arm_lpae_set_pte(ptep, pte, &iop->cfg);
+			return 0;
+		} else {
+			/* To traverse next level */
+			next_size = ARM_LPAE_BLOCK_SIZE(lvl + 1, data);
+			ptep = iopte_deref(pte, data);
+			for (base = 0; base < size; base += next_size) {
+				ret = __arm_lpae_clear_dirty_log(data,
+						iova + base, next_size, lvl + 1,
+						ptep, bitmap, base_iova,
+						bitmap_pgshift);
+				if (ret)
+					return ret;
+			}
+			return 0;
+		}
+	} else if (iopte_leaf(pte, lvl, iop->fmt)) {
+		/* Though the size is smaller than the block, it is already clean */
+		if (pte & ARM_LPAE_PTE_AP_RDONLY)
+			return 0;
+
+		return -EINVAL;
+	}
+
+	/* Keep on walking */
+	ptep = iopte_deref(pte, data);
+	return __arm_lpae_clear_dirty_log(data, iova, size, lvl + 1, ptep,
+			bitmap, base_iova, bitmap_pgshift);
+}
+
+static int arm_lpae_clear_dirty_log(struct io_pgtable_ops *ops,
+				    unsigned long iova, size_t size,
+				    unsigned long *bitmap,
+				    unsigned long base_iova,
+				    unsigned long bitmap_pgshift)
+{
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	arm_lpae_iopte *ptep = data->pgd;
+	int lvl = data->start_level;
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+	long iaext = (s64)iova >> cfg->ias;
+
+	if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
+		return -EINVAL;
+
+	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+		iaext = ~iaext;
+	if (WARN_ON(iaext))
+		return -EINVAL;
+
+	if (data->iop.fmt != ARM_64_LPAE_S1 &&
+	    data->iop.fmt != ARM_32_LPAE_S1)
+		return -EINVAL;
+
+	return __arm_lpae_clear_dirty_log(data, iova, size, lvl, ptep,
+			bitmap, base_iova, bitmap_pgshift);
+}
+
 static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
 {
 	unsigned long granule, page_sizes;
@@ -1047,6 +1141,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
 		.split_block	= arm_lpae_split_block,
 		.merge_page	= arm_lpae_merge_page,
 		.sync_dirty_log	= arm_lpae_sync_dirty_log,
+		.clear_dirty_log = arm_lpae_clear_dirty_log,
 	};
 
 	return data;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 44dfb78f9050..105e4c1f015e 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2888,6 +2888,74 @@ int iommu_sync_dirty_log(struct iommu_domain *domain, unsigned long iova,
 }
 EXPORT_SYMBOL_GPL(iommu_sync_dirty_log);
 
+static int __iommu_clear_dirty_log(struct iommu_domain *domain,
+				   unsigned long iova, size_t size,
+				   unsigned long *bitmap,
+				   unsigned long base_iova,
+				   unsigned long bitmap_pgshift)
+{
+	const struct iommu_ops *ops = domain->ops;
+	size_t pgsize;
+	int ret = 0;
+
+	if (unlikely(!ops || !ops->clear_dirty_log))
+		return -ENODEV;
+
+	while (size) {
+		pgsize = iommu_pgsize(domain, iova, size);
+
+		ret = ops->clear_dirty_log(domain, iova, pgsize, bitmap,
+					   base_iova, bitmap_pgshift);
+		if (ret)
+			break;
+
+		pr_debug("dirty_log_clear handled: iova 0x%lx pagesz 0x%zx\n",
+			 iova, pgsize);
+
+		iova += pgsize;
+		size -= pgsize;
+	}
+
+	return ret;
+}
+
+int iommu_clear_dirty_log(struct iommu_domain *domain,
+			  unsigned long iova, size_t size,
+			  unsigned long *bitmap, unsigned long base_iova,
+			  unsigned long bitmap_pgshift)
+{
+	unsigned long riova, rsize;
+	unsigned int min_pagesz;
+	bool flush = false;
+	int rs, re, start, end;
+	int ret = 0;
+
+	min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
+	if (!IS_ALIGNED(iova | size, min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx min_pagesz 0x%x\n",
+		       iova, min_pagesz);
+		return -EINVAL;
+	}
+
+	start = (iova - base_iova) >> bitmap_pgshift;
+	end = start + (size >> bitmap_pgshift);
+	bitmap_for_each_set_region(bitmap, rs, re, start, end) {
+		flush = true;
+		riova = iova + (rs << bitmap_pgshift);
+		rsize = (re - rs) << bitmap_pgshift;
+		ret = __iommu_clear_dirty_log(domain, riova, rsize, bitmap,
+					      base_iova, bitmap_pgshift);
+		if (ret)
+			break;
+	}
+
+	if (flush)
+		iommu_flush_iotlb_all(domain);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_clear_dirty_log);
+
 void iommu_get_resv_regions(struct device *dev, struct list_head *list)
 {
 	const struct iommu_ops *ops = dev->bus->iommu_ops;
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 5107a9d4ac79..48dbbd2e12b2 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -175,6 +175,10 @@ struct io_pgtable_ops {
 			      unsigned long iova, size_t size,
 			      unsigned long *bitmap, unsigned long base_iova,
 			      unsigned long bitmap_pgshift);
+	int (*clear_dirty_log)(struct io_pgtable_ops *ops,
+			       unsigned long iova, size_t size,
+			       unsigned long *bitmap, unsigned long base_iova,
+			       unsigned long bitmap_pgshift);
 };
 
 /**
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 7d5777acfdb7..4f7db5d23b23 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -278,6 +278,10 @@ struct iommu_ops {
 			      unsigned long iova, size_t size,
 			      unsigned long *bitmap, unsigned long base_iova,
 			      unsigned long bitmap_pgshift);
+	int (*clear_dirty_log)(struct iommu_domain *domain,
+			       unsigned long iova, size_t size,
+			       unsigned long *bitmap, unsigned long base_iova,
+			       unsigned long bitmap_pgshift);
 
 	/* Request/Free a list of reserved regions for a device */
 	void (*get_resv_regions)(struct device *dev, struct list_head *list);
@@ -542,6 +546,10 @@ extern int iommu_sync_dirty_log(struct iommu_domain *domain, unsigned long iova,
 				size_t size, unsigned long *bitmap,
 				unsigned long base_iova,
 				unsigned long bitmap_pgshift);
+extern int iommu_clear_dirty_log(struct iommu_domain *domain, unsigned long iova,
+				 size_t dma_size, unsigned long *bitmap,
+				 unsigned long base_iova,
+				 unsigned long bitmap_pgshift);
 
 /* Window handling function prototypes */
 extern int iommu_domain_window_enable(struct iommu_domain *domain, u32 wnd_nr,
@@ -967,6 +975,15 @@ static inline int iommu_sync_dirty_log(struct iommu_domain *domain,
 	return -EINVAL;
 }
 
+static inline int iommu_clear_dirty_log(struct iommu_domain *domain,
+					unsigned long iova, size_t size,
+					unsigned long *bitmap,
+					unsigned long base_iova,
+					unsigned long pgshift)
+{
+	return -EINVAL;
+}
+
 static inline int  iommu_device_register(struct iommu_device *iommu)
 {
 	return -ENODEV;
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 08/11] iommu/arm-smmu-v3: Add HWDBM device feature reporting
  2021-03-10  9:06 [PATCH v2 00/11] vfio/iommu_type1: Implement dirty log tracking based on smmuv3 HTTU Keqian Zhu
                   ` (6 preceding siblings ...)
  2021-03-10  9:06 ` [PATCH v2 07/11] iommu/arm-smmu-v3: Clear dirty log according to bitmap Keqian Zhu
@ 2021-03-10  9:06 ` Keqian Zhu
  2021-03-10  9:06 ` [PATCH v2 09/11] vfio/iommu_type1: Add HWDBM status maintenance Keqian Zhu
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Keqian Zhu @ 2021-03-10  9:06 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, iommu, Alex Williamson,
	Robin Murphy, Yi Sun, Will Deacon
  Cc: Kirti Wankhede, Cornelia Huck, Marc Zyngier, Catalin Marinas,
	Mark Rutland, James Morse, Suzuki K Poulose, wanghaibin.wang,
	jiangkunkun, yuzenghui, lushenming

From: jiangkunkun <jiangkunkun@huawei.com>

We have implemented the interfaces required to support iommu dirty
log tracking. The last step is to report this feature to upper layers,
so that users can build higher-level policy on top of it.

This adds a new device feature named IOMMU_DEV_FEAT_HWDBM to the iommu
layer. For arm smmuv3 it corresponds to ARM_SMMU_FEAT_HD and is
enabled by default if supported. Other types of IOMMU can enable it
by default or when dev_enable_feature() is called.
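
For illustration, a consumer can probe the feature per device roughly
as in the hypothetical helper below (not part of this patch; the real
user is the vfio code added later in this series):

    static bool dev_supports_hwdbm(struct device *dev)
    {
            /* Already enabled (e.g. by default for arm smmuv3)? */
            if (iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_HWDBM))
                    return true;

            /* Otherwise try to enable it on demand */
            return !iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_HWDBM);
    }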

Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---

changelog:

v2:
 - As dev_has_feature() has been removed from iommu layer, IOMMU_DEV_FEAT_HWDBM
   is designed to be used through "enable" interface.

---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 4 ++++
 include/linux/iommu.h                       | 1 +
 2 files changed, 5 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 696df51a3282..cd1627123e80 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2722,6 +2722,8 @@ static bool arm_smmu_dev_has_feature(struct device *dev,
 	switch (feat) {
 	case IOMMU_DEV_FEAT_SVA:
 		return arm_smmu_master_sva_supported(master);
+	case IOMMU_DEV_FEAT_HWDBM:
+		return !!(master->smmu->features & ARM_SMMU_FEAT_HD);
 	default:
 		return false;
 	}
@@ -2738,6 +2740,8 @@ static bool arm_smmu_dev_feature_enabled(struct device *dev,
 	switch (feat) {
 	case IOMMU_DEV_FEAT_SVA:
 		return arm_smmu_master_sva_enabled(master);
+	case IOMMU_DEV_FEAT_HWDBM:
+		return arm_smmu_dev_has_feature(dev, feat);
 	default:
 		return false;
 	}
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 4f7db5d23b23..88584a2d027c 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -160,6 +160,7 @@ struct iommu_resv_region {
 enum iommu_dev_features {
 	IOMMU_DEV_FEAT_AUX,	/* Aux-domain feature */
 	IOMMU_DEV_FEAT_SVA,	/* Shared Virtual Addresses */
+	IOMMU_DEV_FEAT_HWDBM,	/* Hardware Dirty Bit Management */
 };
 
 #define IOMMU_PASID_INVALID	(-1U)
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 09/11] vfio/iommu_type1: Add HWDBM status maintenance
  2021-03-10  9:06 [PATCH v2 00/11] vfio/iommu_type1: Implement dirty log tracking based on smmuv3 HTTU Keqian Zhu
                   ` (7 preceding siblings ...)
  2021-03-10  9:06 ` [PATCH v2 08/11] iommu/arm-smmu-v3: Add HWDBM device feature reporting Keqian Zhu
@ 2021-03-10  9:06 ` Keqian Zhu
  2021-03-10  9:06 ` [PATCH v2 10/11] vfio/iommu_type1: Optimize dirty bitmap population based on iommu HWDBM Keqian Zhu
  2021-03-10  9:06 ` [PATCH v2 11/11] vfio/iommu_type1: Add support for manual dirty log clear Keqian Zhu
  10 siblings, 0 replies; 17+ messages in thread
From: Keqian Zhu @ 2021-03-10  9:06 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, iommu, Alex Williamson,
	Robin Murphy, Yi Sun, Will Deacon
  Cc: Kirti Wankhede, Cornelia Huck, Marc Zyngier, Catalin Marinas,
	Mark Rutland, James Morse, Suzuki K Poulose, wanghaibin.wang,
	jiangkunkun, yuzenghui, lushenming

From: jiangkunkun <jiangkunkun@huawei.com>

We are going to optimize dirty log tracking based on the iommu HWDBM
feature, but the dirty log from the iommu is useful only when all
iommu-backed groups are attached to an iommu with HWDBM support.
This maintains a counter of groups without that support.
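
The counter is meant to be consumed roughly as in the hypothetical
helper below (the real check is added by the next patch):

    static bool vfio_iommu_hwdbm_usable(struct vfio_iommu *iommu)
    {
            /* Trust the iommu dirty log only if no attached group lacks HWDBM */
            return !iommu->num_non_hwdbm_groups;
    }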

Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---

changelog:

v2:
 - Simplify vfio_group_supports_hwdbm().
 - As the feature reporting of HWDBM has been changed, change vfio_dev_has_feature() to
   vfio_dev_enable_feature().

---
 drivers/vfio/vfio_iommu_type1.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 4bb162c1d649..876351c061e4 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -79,6 +79,7 @@ struct vfio_iommu {
 	bool			dirty_page_tracking;
 	bool			pinned_page_dirty_scope;
 	bool			container_open;
+	uint64_t		num_non_hwdbm_groups;
 };
 
 struct vfio_domain {
@@ -116,6 +117,7 @@ struct vfio_group {
 	struct list_head	next;
 	bool			mdev_group;	/* An mdev group */
 	bool			pinned_page_dirty_scope;
+	bool			iommu_hwdbm;	/* For iommu-backed group */
 };
 
 struct vfio_iova {
@@ -1187,6 +1189,24 @@ static void vfio_update_pgsize_bitmap(struct vfio_iommu *iommu)
 	}
 }
 
+static int vfio_dev_enable_feature(struct device *dev, void *data)
+{
+	enum iommu_dev_features *feat = data;
+
+	if (iommu_dev_feature_enabled(dev, *feat))
+		return 0;
+
+	return iommu_dev_enable_feature(dev, *feat);
+}
+
+static bool vfio_group_supports_hwdbm(struct vfio_group *group)
+{
+	enum iommu_dev_features feat = IOMMU_DEV_FEAT_HWDBM;
+
+	return !iommu_group_for_each_dev(group->iommu_group, &feat,
+					 vfio_dev_enable_feature);
+}
+
 static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
 			      struct vfio_dma *dma, dma_addr_t base_iova,
 			      size_t pgsize)
@@ -2435,6 +2455,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	 * capable via the page pinning interface.
 	 */
 	iommu->num_non_pinned_groups++;
+
+	/* Update the hwdbm status of group and iommu */
+	group->iommu_hwdbm = vfio_group_supports_hwdbm(group);
+	if (!group->iommu_hwdbm)
+		iommu->num_non_hwdbm_groups++;
+
 	mutex_unlock(&iommu->lock);
 	vfio_iommu_resv_free(&group_resv_regions);
 
@@ -2571,6 +2597,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
 	struct vfio_domain *domain;
 	struct vfio_group *group;
 	bool update_dirty_scope = false;
+	bool update_iommu_hwdbm = false;
 	LIST_HEAD(iova_copy);
 
 	mutex_lock(&iommu->lock);
@@ -2609,6 +2636,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
 
 		vfio_iommu_detach_group(domain, group);
 		update_dirty_scope = !group->pinned_page_dirty_scope;
+		update_iommu_hwdbm = !group->iommu_hwdbm;
 		list_del(&group->next);
 		kfree(group);
 		/*
@@ -2651,6 +2679,8 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
 		if (iommu->dirty_page_tracking)
 			vfio_iommu_populate_bitmap_full(iommu);
 	}
+	if (update_iommu_hwdbm)
+		iommu->num_non_hwdbm_groups--;
 	mutex_unlock(&iommu->lock);
 }
 
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 10/11] vfio/iommu_type1: Optimize dirty bitmap population based on iommu HWDBM
  2021-03-10  9:06 [PATCH v2 00/11] vfio/iommu_type1: Implement dirty log tracking based on smmuv3 HTTU Keqian Zhu
                   ` (8 preceding siblings ...)
  2021-03-10  9:06 ` [PATCH v2 09/11] vfio/iommu_type1: Add HWDBM status maintenance Keqian Zhu
@ 2021-03-10  9:06 ` Keqian Zhu
  2021-03-10  9:06 ` [PATCH v2 11/11] vfio/iommu_type1: Add support for manual dirty log clear Keqian Zhu
  10 siblings, 0 replies; 17+ messages in thread
From: Keqian Zhu @ 2021-03-10  9:06 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, iommu, Alex Williamson,
	Robin Murphy, Yi Sun, Will Deacon
  Cc: Kirti Wankhede, Cornelia Huck, Marc Zyngier, Catalin Marinas,
	Mark Rutland, James Morse, Suzuki K Poulose, wanghaibin.wang,
	jiangkunkun, yuzenghui, lushenming

From: jiangkunkun <jiangkunkun@huawei.com>

Previously, if the vfio_iommu is not of pinned_page_dirty_scope and a
vfio_dma is iommu_mapped, we populate the full dirty bitmap for that
vfio_dma. Now we can try to get the dirty log from the iommu before
falling back to that coarse decision.

In detail, if all vfio_groups are of pinned_page_dirty_scope, dirty
bitmap population is not affected. If some vfio_groups are not of
pinned_page_dirty_scope but their domains all support HWDBM, we can
get the dirty log from the IOMMU. Otherwise, we fall back to the full
dirty bitmap.

We also need to start dirty log tracking for newly added dma ranges
and domains.
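
Simplified, the resulting decision in update_user_bitmap() is roughly
the following (sketch only; "d" and "pgshift" come from the surrounding
function and error handling is omitted):

    if (iommu->num_non_pinned_groups && dma->iommu_mapped) {
            if (!iommu->num_non_hwdbm_groups) {
                    /* All groups support HWDBM: pull the precise log */
                    list_for_each_entry(d, &iommu->domain_list, next)
                            iommu_sync_dirty_log(d->domain, dma->iova,
                                                 dma->size, dma->bitmap,
                                                 dma->iova, pgshift);
            } else {
                    /* Some group cannot report dirty pages: assume all dirty */
                    bitmap_set(dma->bitmap, 0, dma->size >> pgshift);
            }
    }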

Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---

changelog:

v2:
 - Use new interfaces to start|stop dirty log, as split_block|merge_page are related to ARM SMMU. (Sun Yi)
 - Bugfix: Start dirty log for newly added dma range and domain.
 
---
 drivers/vfio/vfio_iommu_type1.c | 136 +++++++++++++++++++++++++++++++-
 1 file changed, 132 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 876351c061e4..a7ab0279eda0 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1207,6 +1207,25 @@ static bool vfio_group_supports_hwdbm(struct vfio_group *group)
 					 vfio_dev_enable_feature);
 }
 
+static int vfio_iommu_dirty_log_clear(struct vfio_iommu *iommu,
+				      dma_addr_t start_iova, size_t size,
+				      unsigned long *bitmap_buffer,
+				      dma_addr_t base_iova, size_t pgsize)
+{
+	struct vfio_domain *d;
+	unsigned long pgshift = __ffs(pgsize);
+	int ret;
+
+	list_for_each_entry(d, &iommu->domain_list, next) {
+		ret = iommu_clear_dirty_log(d->domain, start_iova, size,
+					    bitmap_buffer, base_iova, pgshift);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
 			      struct vfio_dma *dma, dma_addr_t base_iova,
 			      size_t pgsize)
@@ -1218,13 +1237,28 @@ static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
 	unsigned long shift = bit_offset % BITS_PER_LONG;
 	unsigned long leftover;
 
+	if (!iommu->num_non_pinned_groups || !dma->iommu_mapped)
+		goto bitmap_done;
+
+	/* try to get dirty log from IOMMU */
+	if (!iommu->num_non_hwdbm_groups) {
+		struct vfio_domain *d;
+
+		list_for_each_entry(d, &iommu->domain_list, next) {
+			if (iommu_sync_dirty_log(d->domain, dma->iova, dma->size,
+						dma->bitmap, dma->iova, pgshift))
+				return -EFAULT;
+		}
+		goto bitmap_done;
+	}
+
 	/*
 	 * mark all pages dirty if any IOMMU capable device is not able
 	 * to report dirty pages and all pages are pinned and mapped.
 	 */
-	if (iommu->num_non_pinned_groups && dma->iommu_mapped)
-		bitmap_set(dma->bitmap, 0, nbits);
+	bitmap_set(dma->bitmap, 0, nbits);
 
+bitmap_done:
 	if (shift) {
 		bitmap_shift_left(dma->bitmap, dma->bitmap, shift,
 				  nbits + shift);
@@ -1286,6 +1320,18 @@ static int vfio_iova_dirty_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
 		 */
 		bitmap_clear(dma->bitmap, 0, dma->size >> pgshift);
 		vfio_dma_populate_bitmap(dma, pgsize);
+
+		/* Clear iommu dirty log to re-enable dirty log tracking */
+		if (!iommu->pinned_page_dirty_scope &&
+		    dma->iommu_mapped && !iommu->num_non_hwdbm_groups) {
+			ret = vfio_iommu_dirty_log_clear(iommu,	dma->iova,
+					dma->size, dma->bitmap, dma->iova,
+					pgsize);
+			if (ret) {
+				pr_warn("dma dirty log clear failed!\n");
+				return ret;
+			}
+		}
 	}
 	return 0;
 }
@@ -1561,6 +1607,9 @@ static bool vfio_iommu_iova_dma_valid(struct vfio_iommu *iommu,
 	return list_empty(iova);
 }
 
+static void vfio_dma_dirty_log_start(struct vfio_iommu *iommu,
+				     struct vfio_dma *dma);
+
 static int vfio_dma_do_map(struct vfio_iommu *iommu,
 			   struct vfio_iommu_type1_dma_map *map)
 {
@@ -1684,8 +1733,13 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 
 	if (!ret && iommu->dirty_page_tracking) {
 		ret = vfio_dma_bitmap_alloc(dma, pgsize);
-		if (ret)
+		if (ret) {
 			vfio_remove_dma(iommu, dma);
+			goto out_unlock;
+		}
+
+		/* Start dirty log for newly added dma */
+		vfio_dma_dirty_log_start(iommu, dma);
 	}
 
 out_unlock:
@@ -2262,6 +2316,9 @@ static void vfio_iommu_iova_insert_copy(struct vfio_iommu *iommu,
 	list_splice_tail(iova_copy, iova);
 }
 
+static void vfio_domain_dirty_log_start(struct vfio_iommu *iommu,
+					struct vfio_domain *d);
+
 static int vfio_iommu_type1_attach_group(void *iommu_data,
 					 struct iommu_group *iommu_group)
 {
@@ -2445,6 +2502,10 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 
 	list_add(&domain->next, &iommu->domain_list);
 	vfio_update_pgsize_bitmap(iommu);
+
+	/* Start dirty log for newly added vfio domain */
+	if (iommu->dirty_page_tracking)
+		vfio_domain_dirty_log_start(iommu, domain);
 done:
 	/* Delete the old one and insert new iova list */
 	vfio_iommu_iova_insert_copy(iommu, &iova_copy);
@@ -3022,6 +3083,70 @@ static int vfio_iommu_type1_unmap_dma(struct vfio_iommu *iommu,
 			-EFAULT : 0;
 }
 
+static void vfio_domain_dirty_log_start(struct vfio_iommu *iommu,
+					struct vfio_domain *d)
+{
+	struct rb_node *n;
+
+	/* Go through all dmas even if some dmas failed */
+	for (n = rb_first(&iommu->dma_list); n; n = rb_next(n)) {
+		struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
+
+		if (!dma->iommu_mapped)
+			continue;
+
+		iommu_start_dirty_log(d->domain, dma->iova, dma->size);
+	}
+}
+
+static void vfio_dma_dirty_log_start(struct vfio_iommu *iommu,
+				     struct vfio_dma *dma)
+{
+	struct vfio_domain *d;
+
+	if (!dma->iommu_mapped)
+		return;
+
+	/* Go through all domains even if some domain failed */
+	list_for_each_entry(d, &iommu->domain_list, next) {
+		iommu_start_dirty_log(d->domain, dma->iova, dma->size);
+	}
+}
+
+static void vfio_dma_dirty_log_stop(struct vfio_iommu *iommu,
+				    struct vfio_dma *dma)
+{
+	struct vfio_domain *d;
+
+	if (!dma->iommu_mapped)
+		return;
+
+	/* Go through all domains even if some domain failed */
+	list_for_each_entry(d, &iommu->domain_list, next) {
+		iommu_stop_dirty_log(d->domain, dma->iova, dma->size,
+				     d->prot | dma->prot);
+	}
+}
+
+static void vfio_iommu_dirty_log_switch(struct vfio_iommu *iommu, bool start)
+{
+	struct rb_node *n;
+
+	/*
+	 * Go ahead even if not all iommu domains support HWDBM for now, as
+	 * we can still get the dirty log from the IOMMU once the domains
+	 * without HWDBM are detached.
+	 */
+	for (n = rb_first(&iommu->dma_list); n; n = rb_next(n)) {
+		struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
+
+		if (start)
+			vfio_dma_dirty_log_start(iommu, dma);
+		else
+			vfio_dma_dirty_log_stop(iommu, dma);
+	}
+}
+
 static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
 					unsigned long arg)
 {
@@ -3054,8 +3179,10 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
 		pgsize = 1 << __ffs(iommu->pgsize_bitmap);
 		if (!iommu->dirty_page_tracking) {
 			ret = vfio_dma_bitmap_alloc_all(iommu, pgsize);
-			if (!ret)
+			if (!ret) {
 				iommu->dirty_page_tracking = true;
+				vfio_iommu_dirty_log_switch(iommu, true);
+			}
 		}
 		mutex_unlock(&iommu->lock);
 		return ret;
@@ -3064,6 +3191,7 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
 		if (iommu->dirty_page_tracking) {
 			iommu->dirty_page_tracking = false;
 			vfio_dma_bitmap_free_all(iommu);
+			vfio_iommu_dirty_log_switch(iommu, false);
 		}
 		mutex_unlock(&iommu->lock);
 		return 0;
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 11/11] vfio/iommu_type1: Add support for manual dirty log clear
  2021-03-10  9:06 [PATCH v2 00/11] vfio/iommu_type1: Implement dirty log tracking based on smmuv3 HTTU Keqian Zhu
                   ` (9 preceding siblings ...)
  2021-03-10  9:06 ` [PATCH v2 10/11] vfio/iommu_type1: Optimize dirty bitmap population based on iommu HWDBM Keqian Zhu
@ 2021-03-10  9:06 ` Keqian Zhu
  10 siblings, 0 replies; 17+ messages in thread
From: Keqian Zhu @ 2021-03-10  9:06 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, iommu, Alex Williamson,
	Robin Murphy, Yi Sun, Will Deacon
  Cc: Kirti Wankhede, Cornelia Huck, Marc Zyngier, Catalin Marinas,
	Mark Rutland, James Morse, Suzuki K Poulose, wanghaibin.wang,
	jiangkunkun, yuzenghui, lushenming

From: jiangkunkun <jiangkunkun@huawei.com>

Previously, we cleared the dirty log immediately after syncing it to
userspace. This may cause redundant dirty handling if userspace
handles the dirty log iteratively:

After vfio clears the dirty log, new dirty entries start to
accumulate. These new entries are reported to userspace again even if
they were generated before userspace finished handling the same dirty
pages.

In other words, we should minimize the time gap between clearing the
dirty log and handling it. We can do so by giving userspace an
interface to clear the dirty log itself.
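
A rough userspace sketch of one iteration under manual clear
(container_fd, iova, size and pgsize are placeholders; includes, error
checking and the VFIO_DIRTY_LOG_MANUAL_CLEAR extension probe are
omitted):

    __u64 *bitmap_buf;          /* caller-allocated, one bit per page */
    struct vfio_iommu_type1_dirty_bitmap *db;
    struct vfio_iommu_type1_dirty_bitmap_get *range;
    size_t argsz = sizeof(*db) + sizeof(*range);

    bitmap_buf = calloc((size / pgsize + 63) / 64, sizeof(__u64));
    db = calloc(1, argsz);
    db->argsz = argsz;
    db->flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP;
    range = (struct vfio_iommu_type1_dirty_bitmap_get *)db->data;
    range->iova = iova;
    range->size = size;
    range->bitmap.pgsize = pgsize;
    range->bitmap.size = (size / pgsize + 63) / 64 * sizeof(__u64);
    range->bitmap.data = bitmap_buf;

    /* Fetch the dirty log; with manual clear it is not cleared implicitly */
    ioctl(container_fd, VFIO_IOMMU_DIRTY_PAGES, db);

    /* ... handle (e.g. transmit) the pages marked dirty in bitmap_buf ... */

    /* Clear exactly what was handled to re-enable tracking for those pages */
    db->flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_CLEAR_BITMAP;
    ioctl(container_fd, VFIO_IOMMU_DIRTY_PAGES, db);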

Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---

changelog:

v2:
 - Rebase to newest code, so change VFIO_DIRTY_LOG_MANUAL_CLEAR from 9 to 11.

---
 drivers/vfio/vfio_iommu_type1.c | 104 ++++++++++++++++++++++++++++++--
 include/uapi/linux/vfio.h       |  28 ++++++++-
 2 files changed, 127 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index a7ab0279eda0..94306f567894 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -77,6 +77,7 @@ struct vfio_iommu {
 	bool			v2;
 	bool			nesting;
 	bool			dirty_page_tracking;
+	bool			dirty_log_manual_clear;
 	bool			pinned_page_dirty_scope;
 	bool			container_open;
 	uint64_t		num_non_hwdbm_groups;
@@ -1226,6 +1227,78 @@ static int vfio_iommu_dirty_log_clear(struct vfio_iommu *iommu,
 	return 0;
 }
 
+static int vfio_iova_dirty_log_clear(u64 __user *bitmap,
+				     struct vfio_iommu *iommu,
+				     dma_addr_t iova, size_t size,
+				     size_t pgsize)
+{
+	struct vfio_dma *dma;
+	struct rb_node *n;
+	dma_addr_t start_iova, end_iova, riova;
+	unsigned long pgshift = __ffs(pgsize);
+	unsigned long bitmap_size;
+	unsigned long *bitmap_buffer = NULL;
+	bool clear_valid;
+	int rs, re, start, end, dma_offset;
+	int ret = 0;
+
+	bitmap_size = DIRTY_BITMAP_BYTES(size >> pgshift);
+	bitmap_buffer = kvmalloc(bitmap_size, GFP_KERNEL);
+	if (!bitmap_buffer) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	if (copy_from_user(bitmap_buffer, bitmap, bitmap_size)) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	for (n = rb_first(&iommu->dma_list); n; n = rb_next(n)) {
+		dma = rb_entry(n, struct vfio_dma, node);
+		if (!dma->iommu_mapped)
+			continue;
+		if ((dma->iova + dma->size - 1) < iova)
+			continue;
+		if (dma->iova > iova + size - 1)
+			break;
+
+		start_iova = max(iova, dma->iova);
+		end_iova = min(iova + size, dma->iova + dma->size);
+
+		/* Similar logic as the tail of vfio_iova_dirty_bitmap */
+
+		clear_valid = false;
+		start = (start_iova - iova) >> pgshift;
+		end = (end_iova - iova) >> pgshift;
+		bitmap_for_each_set_region(bitmap_buffer, rs, re, start, end) {
+			clear_valid = true;
+			riova = iova + (rs << pgshift);
+			dma_offset = (riova - dma->iova) >> pgshift;
+			bitmap_clear(dma->bitmap, dma_offset, re - rs);
+		}
+
+		if (clear_valid)
+			vfio_dma_populate_bitmap(dma, pgsize);
+
+		if (clear_valid && !iommu->pinned_page_dirty_scope &&
+		    dma->iommu_mapped && !iommu->num_non_hwdbm_groups) {
+			ret = vfio_iommu_dirty_log_clear(iommu, start_iova,
+					end_iova - start_iova,	bitmap_buffer,
+					iova, pgsize);
+			if (ret) {
+				pr_warn("dma dirty log clear failed!\n");
+				goto out;
+			}
+		}
+
+	}
+
+out:
+	kvfree(bitmap_buffer);
+	return ret;
+}
+
 static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
 			      struct vfio_dma *dma, dma_addr_t base_iova,
 			      size_t pgsize)
@@ -1275,6 +1348,11 @@ static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
 			 DIRTY_BITMAP_BYTES(nbits + shift)))
 		return -EFAULT;
 
+	/* Recover the bitmap under manual clear */
+	if (shift && iommu->dirty_log_manual_clear)
+		bitmap_shift_right(dma->bitmap, dma->bitmap, shift,
+				   nbits + shift);
+
 	return 0;
 }
 
@@ -1313,6 +1391,9 @@ static int vfio_iova_dirty_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
 		if (ret)
 			return ret;
 
+		if (iommu->dirty_log_manual_clear)
+			continue;
+
 		/*
 		 * Re-populate bitmap to include all pinned pages which are
 		 * considered as dirty but exclude pages which are unpinned and
@@ -2850,6 +2931,11 @@ static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu,
 		if (!iommu)
 			return 0;
 		return vfio_domains_have_iommu_cache(iommu);
+	case VFIO_DIRTY_LOG_MANUAL_CLEAR:
+		if (!iommu)
+			return 0;
+		iommu->dirty_log_manual_clear = true;
+		return 1;
 	default:
 		return 0;
 	}
@@ -3153,7 +3239,8 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
 	struct vfio_iommu_type1_dirty_bitmap dirty;
 	uint32_t mask = VFIO_IOMMU_DIRTY_PAGES_FLAG_START |
 			VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP |
-			VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP;
+			VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP |
+			VFIO_IOMMU_DIRTY_PAGES_FLAG_CLEAR_BITMAP;
 	unsigned long minsz;
 	int ret = 0;
 
@@ -3195,7 +3282,8 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
 		}
 		mutex_unlock(&iommu->lock);
 		return 0;
-	} else if (dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP) {
+	} else if (dirty.flags & (VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP |
+				VFIO_IOMMU_DIRTY_PAGES_FLAG_CLEAR_BITMAP)) {
 		struct vfio_iommu_type1_dirty_bitmap_get range;
 		unsigned long pgshift;
 		size_t data_size = dirty.argsz - minsz;
@@ -3238,13 +3326,21 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
 			goto out_unlock;
 		}
 
-		if (iommu->dirty_page_tracking)
+		if (!iommu->dirty_page_tracking) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+
+		if (dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP)
 			ret = vfio_iova_dirty_bitmap(range.bitmap.data,
 						     iommu, range.iova,
 						     range.size,
 						     range.bitmap.pgsize);
 		else
-			ret = -EINVAL;
+			ret = vfio_iova_dirty_log_clear(range.bitmap.data,
+							iommu, range.iova,
+							range.size,
+							range.bitmap.pgsize);
 out_unlock:
 		mutex_unlock(&iommu->lock);
 
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 8ce36c1d53ca..784dc3cf2a8f 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -52,6 +52,14 @@
 /* Supports the vaddr flag for DMA map and unmap */
 #define VFIO_UPDATE_VADDR		10
 
+/*
+ * The vfio_iommu driver may support manual clearing of the dirty log, which
+ * means the dirty log is not cleared automatically after it is copied to
+ * userspace; it is the user's responsibility to clear it. Note: when the user
+ * queries this extension and the vfio_iommu driver supports it, it is enabled.
+ */
+#define VFIO_DIRTY_LOG_MANUAL_CLEAR	11
+
 /*
  * The IOCTL interface is designed for extensibility by embedding the
  * structure length (argsz) and flags into structures passed between
@@ -1188,7 +1196,24 @@ struct vfio_iommu_type1_dma_unmap {
  * actual bitmap. If dirty pages logging is not enabled, an error will be
  * returned.
  *
- * Only one of the flags _START, _STOP and _GET may be specified at a time.
+ * Calling the IOCTL with VFIO_IOMMU_DIRTY_PAGES_FLAG_CLEAR_BITMAP flag set,
+ * instructs the IOMMU driver to clear the dirty status of pages in a bitmap
+ * for IOMMU container for a given IOVA range. The user must specify the IOVA
+ * range, the bitmap and the pgsize through the structure
+ * vfio_iommu_type1_dirty_bitmap_get in the data[] portion. This interface
+ * supports clearing a bitmap of the smallest supported pgsize only and can be
+ * modified in future to clear a bitmap of any specified supported pgsize. The
+ * user must provide a memory area for the bitmap memory and specify its size
+ * in bitmap.size. One bit is used to represent one page consecutively starting
+ * from iova offset. The user should provide page size in bitmap.pgsize field.
+ * A bit set in the bitmap indicates that the dirty status of the page at that
+ * offset from iova is cleared, and dirty tracking is re-enabled for that page.
+ * The caller must set argsz to a value including the size of structure
+ * vfio_iommu_dirty_bitmap_get, but excluding the size of the actual bitmap. If
+ * dirty pages logging is not enabled, an error will be returned.
+ *
+ * Only one of the flags _START, _STOP, _GET and _CLEAR may be specified at a
+ * time.
  *
  */
 struct vfio_iommu_type1_dirty_bitmap {
@@ -1197,6 +1222,7 @@ struct vfio_iommu_type1_dirty_bitmap {
 #define VFIO_IOMMU_DIRTY_PAGES_FLAG_START	(1 << 0)
 #define VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP	(1 << 1)
 #define VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP	(1 << 2)
+#define VFIO_IOMMU_DIRTY_PAGES_FLAG_CLEAR_BITMAP (1 << 3)
 	__u8         data[];
 };
 
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 04/11] iommu/arm-smmu-v3: Split block descriptor when start dirty log
  2021-03-10  9:06 ` [PATCH v2 04/11] iommu/arm-smmu-v3: Split block descriptor when start dirty log Keqian Zhu
@ 2021-03-16  9:17   ` Yi Sun
  2021-03-16 11:39     ` Keqian Zhu
  0 siblings, 1 reply; 17+ messages in thread
From: Yi Sun @ 2021-03-16  9:17 UTC (permalink / raw)
  To: Keqian Zhu
  Cc: linux-kernel, linux-arm-kernel, iommu, Alex Williamson,
	Robin Murphy, Will Deacon, Kirti Wankhede, Cornelia Huck,
	Marc Zyngier, Catalin Marinas, Mark Rutland, James Morse,
	Suzuki K Poulose, wanghaibin.wang, jiangkunkun, yuzenghui,
	lushenming

On 21-03-10 17:06:07, Keqian Zhu wrote:
> From: jiangkunkun <jiangkunkun@huawei.com>
> 
> Block descriptor is not a proper granule for dirty log tracking.
> Take an extreme example, if DMA writes one byte, under 1G mapping,
> the dirty amount reported to userspace is 1G, but under 4K mapping,
> the dirty amount is just 4K.
> 
> This adds a new interface named start_dirty_log in iommu layer and
> arm smmuv3 implements it, which splits block descriptor to an span
> of page descriptors. Other types of IOMMU will perform architecture
> specific actions to start dirty log.
> 
> To allow code reuse, the split_block operation is realized as an
> iommu_ops too. We flush all iotlbs after the whole procedure is
> completed to ease the pressure of iommu, as we will handle a huge
> range of mapping in general.
> 
> Splitting block does not simultaneously work with other pgtable ops,
> as the only designed user is vfio, which always holds a lock, so race
> condition is not considered in the pgtable ops.
> 
> Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
> ---
> 
> changelog:
> 
> v2:
>  - Change the return type of split_block(). size_t -> int.
>  - Change commit message to properly describe race condition. (Robin)
>  - Change commit message to properly describe the need of split block.
>  - Add a new interface named start_dirty_log(). (Sun Yi)
>  - Change commit message to explain the relationship of split_block() and start_dirty_log().
> 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  52 +++++++++
>  drivers/iommu/io-pgtable-arm.c              | 122 ++++++++++++++++++++
>  drivers/iommu/iommu.c                       |  48 ++++++++
>  include/linux/io-pgtable.h                  |   2 +
>  include/linux/iommu.h                       |  24 ++++
>  5 files changed, 248 insertions(+)
> 
Could you please split iommu common interface to a separate patch?
This may make review and comments easier.

IMHO, I think the start/stop interfaces could be merged into one, e.g:
    int iommu_domain_set_hwdbm(struct iommu_domain *domain, bool enable,
                               unsigned long iova, size_t size,
                               int prot);

Same comments to patch 5.

BRs,
Yi Sun

> -- 
> 2.19.1

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 04/11] iommu/arm-smmu-v3: Split block descriptor when start dirty log
  2021-03-16  9:17   ` Yi Sun
@ 2021-03-16 11:39     ` Keqian Zhu
  2021-03-17  3:06       ` Yi Sun
  0 siblings, 1 reply; 17+ messages in thread
From: Keqian Zhu @ 2021-03-16 11:39 UTC (permalink / raw)
  To: Yi Sun
  Cc: linux-kernel, linux-arm-kernel, iommu, Alex Williamson,
	Robin Murphy, Will Deacon, Kirti Wankhede, Cornelia Huck,
	Marc Zyngier, Catalin Marinas, Mark Rutland, James Morse,
	Suzuki K Poulose, wanghaibin.wang, jiangkunkun, yuzenghui,
	lushenming

Hi Yi,

On 2021/3/16 17:17, Yi Sun wrote:
> On 21-03-10 17:06:07, Keqian Zhu wrote:
>> From: jiangkunkun <jiangkunkun@huawei.com>
>>
>> Block descriptor is not a proper granule for dirty log tracking.
>> Take an extreme example, if DMA writes one byte, under 1G mapping,
>> the dirty amount reported to userspace is 1G, but under 4K mapping,
>> the dirty amount is just 4K.
>>
>> This adds a new interface named start_dirty_log in iommu layer and
>> arm smmuv3 implements it, which splits block descriptor to an span
>> of page descriptors. Other types of IOMMU will perform architecture
>> specific actions to start dirty log.
>>
>> To allow code reuse, the split_block operation is realized as an
>> iommu_ops too. We flush all iotlbs after the whole procedure is
>> completed to ease the pressure of iommu, as we will handle a huge
>> range of mapping in general.
>>
>> Splitting block does not simultaneously work with other pgtable ops,
>> as the only designed user is vfio, which always holds a lock, so race
>> condition is not considered in the pgtable ops.
>>
>> Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
>> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
>> ---
>>
>> changelog:
>>
>> v2:
>>  - Change the return type of split_block(). size_t -> int.
>>  - Change commit message to properly describe race condition. (Robin)
>>  - Change commit message to properly describe the need of split block.
>>  - Add a new interface named start_dirty_log(). (Sun Yi)
>>  - Change commit message to explain the relationship of split_block() and start_dirty_log().
>>
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  52 +++++++++
>>  drivers/iommu/io-pgtable-arm.c              | 122 ++++++++++++++++++++
>>  drivers/iommu/iommu.c                       |  48 ++++++++
>>  include/linux/io-pgtable.h                  |   2 +
>>  include/linux/iommu.h                       |  24 ++++
>>  5 files changed, 248 insertions(+)
>>
> Could you please split iommu common interface to a separate patch?
> This may make review and comments easier.
Yup, good suggestion.

> 
> IMHO, I think the start/stop interfaces could be merged into one, e.g:
>     int iommu_domain_set_hwdbm(struct iommu_domain *domain, bool enable,
>                                unsigned long iova, size_t size,
>                                int prot);
Looks good, this reduces some code. But I have a concern that this causes a loss of
flexibility, as we must pass the same arguments when we start|stop dirty log. What's
your opinion about this?

> 
> Same comments to patch 5.
OK. Thanks.

> 
> BRs,
> Yi Sun
> 
>> -- 
>> 2.19.1
> .
Thanks,
Keqian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 04/11] iommu/arm-smmu-v3: Split block descriptor when start dirty log
  2021-03-16 11:39     ` Keqian Zhu
@ 2021-03-17  3:06       ` Yi Sun
  0 siblings, 0 replies; 17+ messages in thread
From: Yi Sun @ 2021-03-17  3:06 UTC (permalink / raw)
  To: Keqian Zhu
  Cc: linux-kernel, linux-arm-kernel, iommu, Alex Williamson,
	Robin Murphy, Will Deacon, Kirti Wankhede, Cornelia Huck,
	Marc Zyngier, Catalin Marinas, Mark Rutland, James Morse,
	Suzuki K Poulose, wanghaibin.wang, jiangkunkun, yuzenghui,
	lushenming

On 21-03-16 19:39:47, Keqian Zhu wrote:
> Hi Yi,
> 
> On 2021/3/16 17:17, Yi Sun wrote:
> > On 21-03-10 17:06:07, Keqian Zhu wrote:
> >> From: jiangkunkun <jiangkunkun@huawei.com>
> >>
> >> Block descriptor is not a proper granule for dirty log tracking.
> >> Take an extreme example, if DMA writes one byte, under 1G mapping,
> >> the dirty amount reported to userspace is 1G, but under 4K mapping,
> >> the dirty amount is just 4K.
> >>
> >> This adds a new interface named start_dirty_log in iommu layer and
> >> arm smmuv3 implements it, which splits block descriptor to an span
> >> of page descriptors. Other types of IOMMU will perform architecture
> >> specific actions to start dirty log.
> >>
> >> To allow code reuse, the split_block operation is realized as an
> >> iommu_ops too. We flush all iotlbs after the whole procedure is
> >> completed to ease the pressure of iommu, as we will handle a huge
> >> range of mapping in general.
> >>
> >> Splitting block does not simultaneously work with other pgtable ops,
> >> as the only designed user is vfio, which always holds a lock, so race
> >> condition is not considered in the pgtable ops.
> >>
> >> Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
> >> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
> >> ---
> >>
> >> changelog:
> >>
> >> v2:
> >>  - Change the return type of split_block(). size_t -> int.
> >>  - Change commit message to properly describe race condition. (Robin)
> >>  - Change commit message to properly describe the need of split block.
> >>  - Add a new interface named start_dirty_log(). (Sun Yi)
> >>  - Change commit message to explain the relationship of split_block() and start_dirty_log().
> >>
> >> ---
> >>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  52 +++++++++
> >>  drivers/iommu/io-pgtable-arm.c              | 122 ++++++++++++++++++++
> >>  drivers/iommu/iommu.c                       |  48 ++++++++
> >>  include/linux/io-pgtable.h                  |   2 +
> >>  include/linux/iommu.h                       |  24 ++++
> >>  5 files changed, 248 insertions(+)
> >>
> > Could you please split iommu common interface to a separate patch?
> > This may make review and comments easier.
> Yup, good suggestion.
> 
> > 
> > IMHO, I think the start/stop interfaces could be merged into one, e.g:
> >     int iommu_domain_set_hwdbm(struct iommu_domain *domain, bool enable,
> >                                unsigned long iova, size_t size,
> >                                int prot);
> Looks good, this reduces some code. but I have a concern that this causes loss of flexibility,
> as we must pass same arguments when start|stop dirty log. What's your opinion about this?
> 
Per the current design, the start/stop interfaces have similar arguments, so I
think it is ok for now. For future extension, we may consider defining a
structure to pass these arguments, e.g. as sketched below.
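
Something like (hypothetical, just to illustrate the idea):

    struct iommu_hwdbm_range {
            unsigned long   iova;
            size_t          size;
            int             prot;
    };

    int iommu_domain_set_hwdbm(struct iommu_domain *domain, bool enable,
                               const struct iommu_hwdbm_range *range);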

> > 
> > Same comments to patch 5.
> OK. Thanks.
> 
> > 
> > BRs,
> > Yi Sun
> > 
> >> -- 
> >> 2.19.1
> > .
> Thanks,
> Keqian

BRs,
Yi Sun

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 06/11] iommu/arm-smmu-v3: Scan leaf TTD to sync hardware dirty log
  2021-03-10  9:06 ` [PATCH v2 06/11] iommu/arm-smmu-v3: Scan leaf TTD to sync hardware " Keqian Zhu
@ 2021-03-17 10:44   ` Yi Sun
  2021-03-17 12:59     ` Keqian Zhu
  0 siblings, 1 reply; 17+ messages in thread
From: Yi Sun @ 2021-03-17 10:44 UTC (permalink / raw)
  To: Keqian Zhu
  Cc: linux-kernel, linux-arm-kernel, iommu, Alex Williamson,
	Robin Murphy, Will Deacon, Kirti Wankhede, Cornelia Huck,
	Marc Zyngier, Catalin Marinas, Mark Rutland, James Morse,
	Suzuki K Poulose, wanghaibin.wang, jiangkunkun, yuzenghui,
	lushenming

On 21-03-10 17:06:09, Keqian Zhu wrote:
> From: jiangkunkun <jiangkunkun@huawei.com>
> 
> During dirty log tracking, user will try to retrieve dirty log from
> iommu if it supports hardware dirty log.
> 
> This adds a new interface named sync_dirty_log in iommu layer and
> arm smmuv3 implements it, which scans leaf TTD and treats it as dirty
> if it's writable (As we just enable HTTU for stage1, so check whether
> AP[2] is not set).
> 
> Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
> ---
> 
> changelog:
> 
> v2:
>  - Add new sanity check in arm_smmu_sync_dirty_log(). (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
>  - Document the purpose of flush_iotlb in arm_smmu_sync_dirty_log(). (Robin)
>  
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 30 +++++++
>  drivers/iommu/io-pgtable-arm.c              | 90 +++++++++++++++++++++
>  drivers/iommu/iommu.c                       | 38 +++++++++
>  include/linux/io-pgtable.h                  |  4 +
>  include/linux/iommu.h                       | 18 +++++
>  5 files changed, 180 insertions(+)
> 
Please split iommu common interface out. Thanks!

[...]

> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 2a10294b62a3..44dfb78f9050 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2850,6 +2850,44 @@ int iommu_stop_dirty_log(struct iommu_domain *domain, unsigned long iova,
>  }
>  EXPORT_SYMBOL_GPL(iommu_stop_dirty_log);
>  
> +int iommu_sync_dirty_log(struct iommu_domain *domain, unsigned long iova,
> +			 size_t size, unsigned long *bitmap,
> +			 unsigned long base_iova, unsigned long bitmap_pgshift)

One open question: shall we add PASID as one parameter to make iommu
know which address space to visit?

For live migration, the pasid should not be necessary. But considering
future extension, it may be required.
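
For example, a future signature might take the form below (hypothetical,
with pasid selecting the target address space):

    int iommu_sync_dirty_log(struct iommu_domain *domain, ioasid_t pasid,
                             unsigned long iova, size_t size,
                             unsigned long *bitmap,
                             unsigned long base_iova,
                             unsigned long bitmap_pgshift);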

BRs,
Yi Sun

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 06/11] iommu/arm-smmu-v3: Scan leaf TTD to sync hardware dirty log
  2021-03-17 10:44   ` Yi Sun
@ 2021-03-17 12:59     ` Keqian Zhu
  0 siblings, 0 replies; 17+ messages in thread
From: Keqian Zhu @ 2021-03-17 12:59 UTC (permalink / raw)
  To: Yi Sun
  Cc: linux-kernel, linux-arm-kernel, iommu, Alex Williamson,
	Robin Murphy, Will Deacon, Kirti Wankhede, Cornelia Huck,
	Marc Zyngier, Catalin Marinas, Mark Rutland, James Morse,
	Suzuki K Poulose, wanghaibin.wang, jiangkunkun, yuzenghui,
	lushenming

On 2021/3/17 18:44, Yi Sun wrote:
> On 21-03-10 17:06:09, Keqian Zhu wrote:
>> From: jiangkunkun <jiangkunkun@huawei.com>
>>
>> During dirty log tracking, user will try to retrieve dirty log from
>> iommu if it supports hardware dirty log.
>>
>> This adds a new interface named sync_dirty_log in iommu layer and
>> arm smmuv3 implements it, which scans leaf TTD and treats it as dirty
>> if it's writable (As we just enable HTTU for stage1, so check whether
>> AP[2] is not set).
>>
>> Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
>> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
>> ---
>>
>> changelog:
>>
>> v2:
>>  - Add new sanity check in arm_smmu_sync_dirty_log(). (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
>>  - Document the purpose of flush_iotlb in arm_smmu_sync_dirty_log(). (Robin)
>>  
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 30 +++++++
>>  drivers/iommu/io-pgtable-arm.c              | 90 +++++++++++++++++++++
>>  drivers/iommu/iommu.c                       | 38 +++++++++
>>  include/linux/io-pgtable.h                  |  4 +
>>  include/linux/iommu.h                       | 18 +++++
>>  5 files changed, 180 insertions(+)
>>
> Please split iommu common interface out. Thanks!
Yes, I will do it in v3.

> 
> [...]
> 
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 2a10294b62a3..44dfb78f9050 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -2850,6 +2850,44 @@ int iommu_stop_dirty_log(struct iommu_domain *domain, unsigned long iova,
>>  }
>>  EXPORT_SYMBOL_GPL(iommu_stop_dirty_log);
>>  
>> +int iommu_sync_dirty_log(struct iommu_domain *domain, unsigned long iova,
>> +			 size_t size, unsigned long *bitmap,
>> +			 unsigned long base_iova, unsigned long bitmap_pgshift)
> 
> One open question: shall we add PASID as one parameter to make iommu
> know which address space to visit?
> 
> For live migration, the pasid should not be necessary. But considering
Sure, for live migration we just need to care about level/stage 2 mapping under nested mode.

> future extension, it may be required.
It sounds like a good idea. I will consider this, thanks!

> 
> BRs,
> Yi Sun
> .
> 
Thanks,
Keqian

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-03-17 13:00 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-10  9:06 [PATCH v2 00/11] vfio/iommu_type1: Implement dirty log tracking based on smmuv3 HTTU Keqian Zhu
2021-03-10  9:06 ` [PATCH v2 01/11] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update Keqian Zhu
2021-03-10  9:06 ` [PATCH v2 02/11] iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping Keqian Zhu
2021-03-10  9:06 ` [PATCH v2 03/11] iommu/arm-smmu-v3: Add feature detection for BBML Keqian Zhu
2021-03-10  9:06 ` [PATCH v2 04/11] iommu/arm-smmu-v3: Split block descriptor when start dirty log Keqian Zhu
2021-03-16  9:17   ` Yi Sun
2021-03-16 11:39     ` Keqian Zhu
2021-03-17  3:06       ` Yi Sun
2021-03-10  9:06 ` [PATCH v2 05/11] iommu/arm-smmu-v3: Merge a span of page when stop " Keqian Zhu
2021-03-10  9:06 ` [PATCH v2 06/11] iommu/arm-smmu-v3: Scan leaf TTD to sync hardware " Keqian Zhu
2021-03-17 10:44   ` Yi Sun
2021-03-17 12:59     ` Keqian Zhu
2021-03-10  9:06 ` [PATCH v2 07/11] iommu/arm-smmu-v3: Clear dirty log according to bitmap Keqian Zhu
2021-03-10  9:06 ` [PATCH v2 08/11] iommu/arm-smmu-v3: Add HWDBM device feature reporting Keqian Zhu
2021-03-10  9:06 ` [PATCH v2 09/11] vfio/iommu_type1: Add HWDBM status maintenance Keqian Zhu
2021-03-10  9:06 ` [PATCH v2 10/11] vfio/iommu_type1: Optimize dirty bitmap population based on iommu HWDBM Keqian Zhu
2021-03-10  9:06 ` [PATCH v2 11/11] vfio/iommu_type1: Add support for manual dirty log clear Keqian Zhu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).