* [PATCH 00/12] Consolidate domain cache invalidation
@ 2024-03-25  2:16 Lu Baolu
  2024-03-25  2:16 ` [PATCH 01/12] iommu/vt-d: Add cache tag assignment interface Lu Baolu
                   ` (12 more replies)
  0 siblings, 13 replies; 44+ messages in thread
From: Lu Baolu @ 2024-03-25  2:16 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian, Jason Gunthorpe
  Cc: Tina Zhang, Yi Liu, iommu, linux-kernel, Lu Baolu

The IOMMU hardware caches need to be invalidated whenever the mappings
in a domain are changed. Currently, domain cache invalidation is
scattered across different places, causing several issues:

- IOMMU IOTLB Invalidation: This is done by iterating over each IOMMU the
  domain is attached to, using the following code:

        xa_for_each(&dmar_domain->iommu_array, i, info)
                iommu_flush_iotlb_psi(info->iommu, dmar_domain,
                                      start_pfn, nrpages,
                                      list_empty(&gather->freelist), 0);

  This code could theoretically cause a use-after-free problem because
  there's no lock to protect the "info" pointer within the loop.

- Inconsistent Invalidation Methods: Different domain types implement
  their own cache invalidation methods, making the code difficult to
  maintain. For example, the DMA domain, SVA domain, and nested domain
  have similar cache invalidation code scattered across different files.

- SVA Domain Inconsistency: The SVA domain implementation uses a
  completely different data structure to track attached devices compared
  to other domains. This creates unnecessary differences and, even
  worse, leads to duplicate IOTLB invalidation when an SVA domain is
  attached to devices belonging to different IOMMU domains.

- Nested Domain Dependency: The special overlap between a nested domain
  and its parent domain requires a dedicated parent_domain_flush()
  helper function to be called everywhere the parent domain's mapping
  changes.

- Limited Debugging Support: There are currently no debugging aids
  available for domain cache invalidation.

By consolidating domain cache invalidation into a common location, we
can address the issues mentioned above and improve the code's
maintainability and debuggability.
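
As a rough before-and-after sketch (condensed from the tlb_sync
conversion in patch 5 of this series), the open-coded per-IOMMU loops
collapse into single calls to the new cache tag helpers:

        /* Before: every caller iterates the domain's IOMMUs itself. */
        xa_for_each(&dmar_domain->iommu_array, i, info)
                iommu_flush_iotlb_psi(info->iommu, dmar_domain,
                                      start_pfn, nrpages,
                                      list_empty(&gather->freelist), 0);
        if (dmar_domain->nested_parent)
                parent_domain_flush(dmar_domain, start_pfn, nrpages,
                                    list_empty(&gather->freelist));

        /* After: one helper walks the domain's cache tag list. */
        cache_tag_flush_range(to_dmar_domain(domain), gather->start,
                              gather->end, list_empty(&gather->freelist));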

Jason Gunthorpe (1):
  iommu: Add ops->domain_alloc_sva()

Lu Baolu (11):
  iommu/vt-d: Add cache tag assignment interface
  iommu/vt-d: Add cache tag invalidation helpers
  iommu/vt-d: Add trace events for cache tag interface
  iommu/vt-d: Use cache_tag_flush_all() in flush_iotlb_all
  iommu/vt-d: Use cache_tag_flush_range() in tlb_sync
  iommu/vt-d: Use cache_tag_flush_cm_range() in iotlb_sync_map
  iommu/vt-d: Cleanup use of iommu_flush_iotlb_psi()
  iommu/vt-d: Use cache_tag_flush_range() in cache_invalidate_user
  iommu/vt-d: Use cache helpers in arch_invalidate_secondary_tlbs
  iommu/vt-d: Retire intel_svm_dev
  iommu/vt-d: Retire struct intel_svm

 include/linux/iommu.h        |   3 +
 drivers/iommu/intel/iommu.h  |  66 +++---
 drivers/iommu/intel/trace.h  |  97 +++++++++
 drivers/iommu/intel/cache.c  | 389 +++++++++++++++++++++++++++++++++++
 drivers/iommu/intel/iommu.c  | 294 ++++----------------------
 drivers/iommu/intel/nested.c |  71 ++-----
 drivers/iommu/intel/svm.c    | 279 ++++++-------------------
 drivers/iommu/iommu-sva.c    |  16 +-
 drivers/iommu/intel/Makefile |   2 +-
 9 files changed, 660 insertions(+), 557 deletions(-)
 create mode 100644 drivers/iommu/intel/cache.c

-- 
2.34.1



* [PATCH 01/12] iommu/vt-d: Add cache tag assignment interface
  2024-03-25  2:16 [PATCH 00/12] Consolidate domain cache invalidation Lu Baolu
@ 2024-03-25  2:16 ` Lu Baolu
  2024-03-28  7:12   ` Tian, Kevin
  2024-04-10 15:41   ` Jason Gunthorpe
  2024-03-25  2:16 ` [PATCH 02/12] iommu/vt-d: Add cache tag invalidation helpers Lu Baolu
                   ` (11 subsequent siblings)
  12 siblings, 2 replies; 44+ messages in thread
From: Lu Baolu @ 2024-03-25  2:16 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian, Jason Gunthorpe
  Cc: Tina Zhang, Yi Liu, iommu, linux-kernel, Lu Baolu

A caching tag is a combination of identifiers that the hardware uses to tag
its cached translations. Whenever a mapping in a domain is changed, the
IOMMU driver should invalidate the caches identified by these caching tags.
The VT-d specification describes caching tags in section 6.2.1, Tagging of
Cached Translations.

Add an interface to assign caching tags to an IOMMU domain when it is
attached to a RID or PASID, and to unassign them when the domain is detached
from that RID or PASID. All caching tags are kept in a per-domain list and
are protected by a dedicated lock.

In addition to the basic IOTLB and devTLB caching tag types, PARENT_IOTLB
and PARENT_DEVTLB tag types are also introduced. These tags are used for
caches that store translations for DMA accesses through a nested user
domain. They are affected by changes to mappings in the parent domain.
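
For reference, the RID attach path in this patch pairs the new interface
with the existing per-IOMMU domain ID management roughly as follows
(condensed from the iommu.c hunks below):

        /* attach: assign a tag right after the domain ID is allocated */
        did = domain_id_iommu(domain, iommu);
        ret = cache_tag_assign_domain(domain, did, dev, IOMMU_NO_PASID);
        if (ret) {
                domain_detach_iommu(domain, iommu);
                return ret;
        }

        /* detach: release the tag before dropping the domain ID */
        cache_tag_unassign_domain(info->domain, did, dev, IOMMU_NO_PASID);
        domain_detach_iommu(info->domain, iommu);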

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel/iommu.h  |  25 +++++
 drivers/iommu/intel/cache.c  | 192 +++++++++++++++++++++++++++++++++++
 drivers/iommu/intel/iommu.c  |  31 +++++-
 drivers/iommu/intel/nested.c |  21 +++-
 drivers/iommu/intel/svm.c    |  12 ++-
 drivers/iommu/intel/Makefile |   2 +-
 6 files changed, 274 insertions(+), 9 deletions(-)
 create mode 100644 drivers/iommu/intel/cache.c

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 404d2476a877..e3723b7a0b31 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -607,6 +607,9 @@ struct dmar_domain {
 	struct list_head devices;	/* all devices' list */
 	struct list_head dev_pasids;	/* all attached pasids */
 
+	spinlock_t cache_lock;		/* Protect the cache tag list */
+	struct list_head cache_tags;	/* Cache tag list */
+
 	int		iommu_superpage;/* Level of superpages supported:
 					   0 == 4KiB (no superpages), 1 == 2MiB,
 					   2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
@@ -1092,6 +1095,28 @@ struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent,
 					       const struct iommu_user_data *user_data);
 struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid);
 
+enum cache_tag_type {
+	CACHE_TAG_TYPE_IOTLB,
+	CACHE_TAG_TYPE_DEVTLB,
+	CACHE_TAG_TYPE_PARENT_IOTLB,
+	CACHE_TAG_TYPE_PARENT_DEVTLB,
+};
+
+struct cache_tag {
+	struct list_head node;
+	enum cache_tag_type type;
+	struct intel_iommu *iommu;
+	struct device *dev;
+	u16 domain_id;
+	ioasid_t pasid;
+	int users;
+};
+
+int cache_tag_assign_domain(struct dmar_domain *domain, u16 did,
+			    struct device *dev, ioasid_t pasid);
+void cache_tag_unassign_domain(struct dmar_domain *domain, u16 did,
+			       struct device *dev, ioasid_t pasid);
+
 #ifdef CONFIG_INTEL_IOMMU_SVM
 void intel_svm_check(struct intel_iommu *iommu);
 int intel_svm_enable_prq(struct intel_iommu *iommu);
diff --git a/drivers/iommu/intel/cache.c b/drivers/iommu/intel/cache.c
new file mode 100644
index 000000000000..5a4e12e494b6
--- /dev/null
+++ b/drivers/iommu/intel/cache.c
@@ -0,0 +1,192 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * cache.c - Intel VT-d cache invalidation
+ *
+ * Copyright (C) 2024 Intel Corporation
+ *
+ * Author: Lu Baolu <baolu.lu@linux.intel.com>
+ */
+
+#define pr_fmt(fmt)	"DMAR: " fmt
+
+#include <linux/dmar.h>
+#include <linux/iommu.h>
+#include <linux/memory.h>
+#include <linux/spinlock.h>
+
+#include "iommu.h"
+#include "pasid.h"
+
+/* Checks if an existing cache tag can be reused for a new association. */
+static bool cache_tag_reusable(struct cache_tag *tag, u16 domain_id,
+			       struct intel_iommu *iommu, struct device *dev,
+			       ioasid_t pasid, enum cache_tag_type type)
+{
+	if (tag->type != type)
+		return false;
+
+	if (tag->domain_id != domain_id || tag->pasid != pasid)
+		return false;
+
+	if (type == CACHE_TAG_TYPE_IOTLB)
+		return tag->iommu == iommu;
+
+	if (type == CACHE_TAG_TYPE_DEVTLB)
+		return tag->dev == dev;
+
+	return false;
+}
+
+/* Assign a cache tag with specified type to domain. */
+static int cache_tag_assign(struct dmar_domain *domain, u16 did,
+			    struct device *dev, ioasid_t pasid,
+			    enum cache_tag_type type)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct intel_iommu *iommu = info->iommu;
+	struct cache_tag *tag, *temp;
+	unsigned long flags;
+
+	tag = kzalloc(sizeof(*tag), GFP_KERNEL);
+	if (!tag)
+		return -ENOMEM;
+
+	tag->type = type;
+	tag->iommu = iommu;
+	tag->dev = dev;
+	tag->domain_id = did;
+	tag->pasid = pasid;
+	tag->users = 1;
+
+	spin_lock_irqsave(&domain->cache_lock, flags);
+	list_for_each_entry(temp, &domain->cache_tags, node) {
+		if (cache_tag_reusable(temp, did, iommu, dev, pasid, type)) {
+			temp->users++;
+			spin_unlock_irqrestore(&domain->cache_lock, flags);
+			kfree(tag);
+			return 0;
+		}
+	}
+	list_add_tail(&tag->node, &domain->cache_tags);
+	spin_unlock_irqrestore(&domain->cache_lock, flags);
+
+	return 0;
+}
+
+/* Unassign a cache tag with specified type from domain. */
+static void cache_tag_unassign(struct dmar_domain *domain, u16 did,
+			       struct device *dev, ioasid_t pasid,
+			       enum cache_tag_type type)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct intel_iommu *iommu = info->iommu;
+	struct cache_tag *tag;
+	unsigned long flags;
+
+	spin_lock_irqsave(&domain->cache_lock, flags);
+	list_for_each_entry(tag, &domain->cache_tags, node) {
+		if (cache_tag_reusable(tag, did, iommu, dev, pasid, type)) {
+			if (--tag->users == 0) {
+				list_del(&tag->node);
+				kfree(tag);
+			}
+			break;
+		}
+	}
+	spin_unlock_irqrestore(&domain->cache_lock, flags);
+}
+
+static int __cache_tag_assign_domain(struct dmar_domain *domain, u16 did,
+				     struct device *dev, ioasid_t pasid)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	int ret;
+
+	ret = cache_tag_assign(domain, did, dev, pasid, CACHE_TAG_TYPE_IOTLB);
+	if (ret || !info->ats_enabled)
+		return ret;
+
+	ret = cache_tag_assign(domain, did, dev, pasid, CACHE_TAG_TYPE_DEVTLB);
+	if (ret)
+		cache_tag_unassign(domain, did, dev, pasid, CACHE_TAG_TYPE_IOTLB);
+
+	return ret;
+}
+
+static void __cache_tag_unassign_domain(struct dmar_domain *domain, u16 did,
+					struct device *dev, ioasid_t pasid)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+
+	cache_tag_unassign(domain, did, dev, pasid, CACHE_TAG_TYPE_IOTLB);
+
+	if (info->ats_enabled)
+		cache_tag_unassign(domain, did, dev, pasid, CACHE_TAG_TYPE_DEVTLB);
+}
+
+static int __cache_tag_assign_parent_domain(struct dmar_domain *domain, u16 did,
+					    struct device *dev, ioasid_t pasid)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	int ret;
+
+	ret = cache_tag_assign(domain, did, dev, pasid, CACHE_TAG_TYPE_PARENT_IOTLB);
+	if (ret || !info->ats_enabled)
+		return ret;
+
+	ret = cache_tag_assign(domain, did, dev, pasid, CACHE_TAG_TYPE_PARENT_DEVTLB);
+	if (ret)
+		cache_tag_unassign(domain, did, dev, pasid, CACHE_TAG_TYPE_PARENT_IOTLB);
+
+	return ret;
+}
+
+static void __cache_tag_unassign_parent_domain(struct dmar_domain *domain, u16 did,
+					       struct device *dev, ioasid_t pasid)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+
+	cache_tag_unassign(domain, did, dev, pasid, CACHE_TAG_TYPE_PARENT_IOTLB);
+
+	if (info->ats_enabled)
+		cache_tag_unassign(domain, did, dev, pasid, CACHE_TAG_TYPE_PARENT_DEVTLB);
+}
+
+/*
+ * Assigns cache tags to a domain when it's associated with a device's
+ * PASID using a specific domain ID.
+ *
+ * On success (return value of 0), cache tags are created and added to the
+ * domain's cache tag list. On failure (negative return value), an error
+ * code is returned indicating the reason for the failure.
+ */
+int cache_tag_assign_domain(struct dmar_domain *domain, u16 did,
+			    struct device *dev, ioasid_t pasid)
+{
+	int ret;
+
+	ret = __cache_tag_assign_domain(domain, did, dev, pasid);
+	if (ret || domain->domain.type != IOMMU_DOMAIN_NESTED)
+		return ret;
+
+	ret = __cache_tag_assign_parent_domain(domain->s2_domain, did, dev, pasid);
+	if (ret)
+		__cache_tag_unassign_domain(domain, did, dev, pasid);
+
+	return ret;
+}
+
+/*
+ * Removes the cache tags associated with a device's PASID when the domain is
+ * detached from the device.
+ *
+ * The cache tags must be previously assigned to the domain by calling the
+ * assign interface.
+ */
+void cache_tag_unassign_domain(struct dmar_domain *domain, u16 did,
+			       struct device *dev, ioasid_t pasid)
+{
+	__cache_tag_unassign_domain(domain, did, dev, pasid);
+	if (domain->domain.type == IOMMU_DOMAIN_NESTED)
+		__cache_tag_unassign_parent_domain(domain->s2_domain, did, dev, pasid);
+}
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 50eb9aed47cc..b4efbdedccce 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1750,7 +1750,9 @@ static struct dmar_domain *alloc_domain(unsigned int type)
 	domain->has_iotlb_device = false;
 	INIT_LIST_HEAD(&domain->devices);
 	INIT_LIST_HEAD(&domain->dev_pasids);
+	INIT_LIST_HEAD(&domain->cache_tags);
 	spin_lock_init(&domain->lock);
+	spin_lock_init(&domain->cache_lock);
 	xa_init(&domain->iommu_array);
 
 	return domain;
@@ -2322,11 +2324,20 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
 	struct device_domain_info *info = dev_iommu_priv_get(dev);
 	struct intel_iommu *iommu = info->iommu;
 	unsigned long flags;
+	u16 did;
 	int ret;
 
 	ret = domain_attach_iommu(domain, iommu);
 	if (ret)
 		return ret;
+
+	did = domain_id_iommu(domain, iommu);
+	ret = cache_tag_assign_domain(domain, did, dev, IOMMU_NO_PASID);
+	if (ret) {
+		domain_detach_iommu(domain, iommu);
+		return ret;
+	}
+
 	info->domain = domain;
 	spin_lock_irqsave(&domain->lock, flags);
 	list_add(&info->link, &domain->devices);
@@ -3798,6 +3809,7 @@ void device_block_translation(struct device *dev)
 	struct device_domain_info *info = dev_iommu_priv_get(dev);
 	struct intel_iommu *iommu = info->iommu;
 	unsigned long flags;
+	u16 did;
 
 	iommu_disable_pci_caps(info);
 	if (!dev_is_real_dma_subdevice(dev)) {
@@ -3815,6 +3827,8 @@ void device_block_translation(struct device *dev)
 	list_del(&info->link);
 	spin_unlock_irqrestore(&info->domain->lock, flags);
 
+	did = domain_id_iommu(info->domain, iommu);
+	cache_tag_unassign_domain(info->domain, did, dev, IOMMU_NO_PASID);
 	domain_detach_iommu(info->domain, iommu);
 	info->domain = NULL;
 }
@@ -4595,10 +4609,12 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
 	struct dmar_domain *dmar_domain;
 	struct iommu_domain *domain;
 	unsigned long flags;
+	u16 did;
 
 	domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0);
 	if (WARN_ON_ONCE(!domain))
 		goto out_tear_down;
+	dmar_domain = to_dmar_domain(domain);
 
 	/*
 	 * The SVA implementation needs to handle its own stuffs like the mm
@@ -4607,10 +4623,11 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
 	 */
 	if (domain->type == IOMMU_DOMAIN_SVA) {
 		intel_svm_remove_dev_pasid(dev, pasid);
+		cache_tag_unassign_domain(dmar_domain,
+					  FLPT_DEFAULT_DID, dev, pasid);
 		goto out_tear_down;
 	}
 
-	dmar_domain = to_dmar_domain(domain);
 	spin_lock_irqsave(&dmar_domain->lock, flags);
 	list_for_each_entry(curr, &dmar_domain->dev_pasids, link_domain) {
 		if (curr->dev == dev && curr->pasid == pasid) {
@@ -4622,6 +4639,8 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
 	WARN_ON_ONCE(!dev_pasid);
 	spin_unlock_irqrestore(&dmar_domain->lock, flags);
 
+	did = domain_id_iommu(dmar_domain, iommu);
+	cache_tag_unassign_domain(dmar_domain, did, dev, pasid);
 	domain_detach_iommu(dmar_domain, iommu);
 	intel_iommu_debugfs_remove_dev_pasid(dev_pasid);
 	kfree(dev_pasid);
@@ -4638,6 +4657,7 @@ static int intel_iommu_set_dev_pasid(struct iommu_domain *domain,
 	struct intel_iommu *iommu = info->iommu;
 	struct dev_pasid_info *dev_pasid;
 	unsigned long flags;
+	u16 did;
 	int ret;
 
 	if (!pasid_supported(iommu) || dev_is_real_dma_subdevice(dev))
@@ -4661,6 +4681,11 @@ static int intel_iommu_set_dev_pasid(struct iommu_domain *domain,
 	if (ret)
 		goto out_free;
 
+	did = domain_id_iommu(dmar_domain, iommu);
+	ret = cache_tag_assign_domain(dmar_domain, did, dev, pasid);
+	if (ret)
+		goto out_detach_iommu;
+
 	if (domain_type_is_si(dmar_domain))
 		ret = intel_pasid_setup_pass_through(iommu, dev, pasid);
 	else if (dmar_domain->use_first_level)
@@ -4670,7 +4695,7 @@ static int intel_iommu_set_dev_pasid(struct iommu_domain *domain,
 		ret = intel_pasid_setup_second_level(iommu, dmar_domain,
 						     dev, pasid);
 	if (ret)
-		goto out_detach_iommu;
+		goto out_unassign_tag;
 
 	dev_pasid->dev = dev;
 	dev_pasid->pasid = pasid;
@@ -4682,6 +4707,8 @@ static int intel_iommu_set_dev_pasid(struct iommu_domain *domain,
 		intel_iommu_debugfs_create_dev_pasid(dev_pasid);
 
 	return 0;
+out_unassign_tag:
+	cache_tag_unassign_domain(dmar_domain, did, dev, pasid);
 out_detach_iommu:
 	domain_detach_iommu(dmar_domain, iommu);
 out_free:
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
index a7d68f3d518a..85c744099558 100644
--- a/drivers/iommu/intel/nested.c
+++ b/drivers/iommu/intel/nested.c
@@ -26,6 +26,7 @@ static int intel_nested_attach_dev(struct iommu_domain *domain,
 	struct intel_iommu *iommu = info->iommu;
 	unsigned long flags;
 	int ret = 0;
+	u16 did;
 
 	if (info->domain)
 		device_block_translation(dev);
@@ -52,13 +53,15 @@ static int intel_nested_attach_dev(struct iommu_domain *domain,
 		return ret;
 	}
 
+	did = domain_id_iommu(dmar_domain, iommu);
+	ret = cache_tag_assign_domain(dmar_domain, did, dev, IOMMU_NO_PASID);
+	if (ret)
+		goto detach_iommu;
+
 	ret = intel_pasid_setup_nested(iommu, dev,
 				       IOMMU_NO_PASID, dmar_domain);
-	if (ret) {
-		domain_detach_iommu(dmar_domain, iommu);
-		dev_err_ratelimited(dev, "Failed to setup pasid entry\n");
-		return ret;
-	}
+	if (ret)
+		goto unassign_tag;
 
 	info->domain = dmar_domain;
 	spin_lock_irqsave(&dmar_domain->lock, flags);
@@ -68,6 +71,12 @@ static int intel_nested_attach_dev(struct iommu_domain *domain,
 	domain_update_iotlb(dmar_domain);
 
 	return 0;
+unassign_tag:
+	cache_tag_unassign_domain(dmar_domain, did, dev, IOMMU_NO_PASID);
+detach_iommu:
+	domain_detach_iommu(dmar_domain, iommu);
+
+	return ret;
 }
 
 static void intel_nested_domain_free(struct iommu_domain *domain)
@@ -206,7 +215,9 @@ struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent,
 	domain->domain.type = IOMMU_DOMAIN_NESTED;
 	INIT_LIST_HEAD(&domain->devices);
 	INIT_LIST_HEAD(&domain->dev_pasids);
+	INIT_LIST_HEAD(&domain->cache_tags);
 	spin_lock_init(&domain->lock);
+	spin_lock_init(&domain->cache_lock);
 	xa_init(&domain->iommu_array);
 
 	spin_lock(&s2_domain->s1_lock);
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index c1bed89b1026..d706226e84ee 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -366,17 +366,25 @@ static int intel_svm_set_dev_pasid(struct iommu_domain *domain,
 			sdev->qdep = 0;
 	}
 
+	ret = cache_tag_assign_domain(to_dmar_domain(domain),
+				      FLPT_DEFAULT_DID, dev, pasid);
+	if (ret)
+		goto free_sdev;
+
 	/* Setup the pasid table: */
 	sflags = cpu_feature_enabled(X86_FEATURE_LA57) ? PASID_FLAG_FL5LP : 0;
 	ret = intel_pasid_setup_first_level(iommu, dev, mm->pgd, pasid,
 					    FLPT_DEFAULT_DID, sflags);
 	if (ret)
-		goto free_sdev;
+		goto unassign_tag;
 
 	list_add_rcu(&sdev->list, &svm->devs);
 
 	return 0;
 
+unassign_tag:
+	cache_tag_unassign_domain(to_dmar_domain(domain),
+				  FLPT_DEFAULT_DID, dev, pasid);
 free_sdev:
 	kfree(sdev);
 free_svm:
@@ -795,6 +803,8 @@ struct iommu_domain *intel_svm_domain_alloc(void)
 	if (!domain)
 		return NULL;
 	domain->domain.ops = &intel_svm_domain_ops;
+	INIT_LIST_HEAD(&domain->cache_tags);
+	spin_lock_init(&domain->cache_lock);
 
 	return &domain->domain;
 }
diff --git a/drivers/iommu/intel/Makefile b/drivers/iommu/intel/Makefile
index 5402b699a122..c8beb0281559 100644
--- a/drivers/iommu/intel/Makefile
+++ b/drivers/iommu/intel/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
-obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o nested.o
+obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o nested.o cache.o
 obj-$(CONFIG_DMAR_TABLE) += trace.o cap_audit.o
 obj-$(CONFIG_DMAR_PERF) += perf.o
 obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += debugfs.o
-- 
2.34.1



* [PATCH 02/12] iommu/vt-d: Add cache tag invalidation helpers
  2024-03-25  2:16 [PATCH 00/12] Consolidate domain cache invalidation Lu Baolu
  2024-03-25  2:16 ` [PATCH 01/12] iommu/vt-d: Add cache tag assignment interface Lu Baolu
@ 2024-03-25  2:16 ` Lu Baolu
  2024-03-28  7:39   ` Tian, Kevin
  2024-03-25  2:16 ` [PATCH 03/12] iommu/vt-d: Add trace events for cache tag interface Lu Baolu
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 44+ messages in thread
From: Lu Baolu @ 2024-03-25  2:16 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian, Jason Gunthorpe
  Cc: Tina Zhang, Yi Liu, iommu, linux-kernel, Lu Baolu

Add several helpers to invalidate the caches after mappings in the
affected domain are changed.

- cache_tag_flush_range() invalidates a range of caches after mappings
  within this range are changed. It uses the page-selective cache
  invalidation methods.

- cache_tag_flush_all() invalidates all caches tagged by a domain ID.
  It uses the domain-selective cache invalidation methods.

- cache_tag_flush_cm_range() invalidates a range of caches if the IOMMU is
  working in caching mode and second-only translation is used for the
  affected domain. It is called when non-present to present mappings are
  created.
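
Later patches convert the existing flush paths to these helpers; a
condensed sketch of the intended call sites (taken from the conversions
in patches 4-6) looks like:

        /* iotlb_sync: page-selective flush after unmap */
        cache_tag_flush_range(to_dmar_domain(domain), gather->start,
                              gather->end, list_empty(&gather->freelist));

        /* flush_iotlb_all: domain-selective flush */
        cache_tag_flush_all(to_dmar_domain(domain));

        /* iotlb_sync_map: caching-mode flush after creating new mappings */
        cache_tag_flush_cm_range(to_dmar_domain(domain), iova, iova + size - 1);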

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel/iommu.h |   7 ++
 drivers/iommu/intel/cache.c | 189 ++++++++++++++++++++++++++++++++++++
 drivers/iommu/intel/iommu.c |   5 -
 3 files changed, 196 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index e3723b7a0b31..d05fa0122d65 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -35,6 +35,8 @@
 #define VTD_PAGE_MASK		(((u64)-1) << VTD_PAGE_SHIFT)
 #define VTD_PAGE_ALIGN(addr)	(((addr) + VTD_PAGE_SIZE - 1) & VTD_PAGE_MASK)
 
+#define IOVA_PFN(addr)		((addr) >> PAGE_SHIFT)
+
 #define VTD_STRIDE_SHIFT        (9)
 #define VTD_STRIDE_MASK         (((u64)-1) << VTD_STRIDE_SHIFT)
 
@@ -1116,6 +1118,11 @@ int cache_tag_assign_domain(struct dmar_domain *domain, u16 did,
 			    struct device *dev, ioasid_t pasid);
 void cache_tag_unassign_domain(struct dmar_domain *domain, u16 did,
 			       struct device *dev, ioasid_t pasid);
+void cache_tag_flush_range(struct dmar_domain *domain, unsigned long start,
+			   unsigned long end, int ih);
+void cache_tag_flush_all(struct dmar_domain *domain);
+void cache_tag_flush_cm_range(struct dmar_domain *domain, unsigned long start,
+			      unsigned long end);
 
 #ifdef CONFIG_INTEL_IOMMU_SVM
 void intel_svm_check(struct intel_iommu *iommu);
diff --git a/drivers/iommu/intel/cache.c b/drivers/iommu/intel/cache.c
index 5a4e12e494b6..4c245d39faf2 100644
--- a/drivers/iommu/intel/cache.c
+++ b/drivers/iommu/intel/cache.c
@@ -12,6 +12,7 @@
 #include <linux/dmar.h>
 #include <linux/iommu.h>
 #include <linux/memory.h>
+#include <linux/pci.h>
 #include <linux/spinlock.h>
 
 #include "iommu.h"
@@ -190,3 +191,191 @@ void cache_tag_unassign_domain(struct dmar_domain *domain, u16 did,
 	if (domain->domain.type == IOMMU_DOMAIN_NESTED)
 		__cache_tag_unassign_parent_domain(domain->s2_domain, did, dev, pasid);
 }
+
+static unsigned long calculate_psi_aligned_address(unsigned long start,
+						   unsigned long end,
+						   unsigned long *_pages,
+						   unsigned long *_mask)
+{
+	unsigned long pages = (end - start + VTD_PAGE_SIZE - 1) >> VTD_PAGE_SHIFT;
+	unsigned long aligned_pages = __roundup_pow_of_two(pages);
+	unsigned long bitmask = aligned_pages - 1;
+	unsigned long mask = ilog2(aligned_pages);
+	unsigned long pfn = IOVA_PFN(start);
+
+	/*
+	 * PSI masks the low order bits of the base address. If the
+	 * address isn't aligned to the mask, then compute a mask value
+	 * needed to ensure the target range is flushed.
+	 */
+	if (unlikely(bitmask & pfn)) {
+		unsigned long end_pfn = pfn + pages - 1, shared_bits;
+
+		/*
+		 * Since end_pfn <= pfn + bitmask, the only way bits
+		 * higher than bitmask can differ in pfn and end_pfn is
+		 * by carrying. This means after masking out bitmask,
+		 * high bits starting with the first set bit in
+		 * shared_bits are all equal in both pfn and end_pfn.
+		 */
+		shared_bits = ~(pfn ^ end_pfn) & ~bitmask;
+		mask = shared_bits ? __ffs(shared_bits) : BITS_PER_LONG;
+	}
+
+	*_pages = aligned_pages;
+	*_mask = mask;
+
+	return ALIGN_DOWN(start, VTD_PAGE_SIZE);
+}
+
+/*
+ * Invalidates a range of IOVA from @start (inclusive) to @end (inclusive)
+ * when the memory mappings in the target domain have been modified.
+ */
+void cache_tag_flush_range(struct dmar_domain *domain, unsigned long start,
+			   unsigned long end, int ih)
+{
+	unsigned long pages, mask, addr;
+	struct cache_tag *tag;
+	unsigned long flags;
+
+	addr = calculate_psi_aligned_address(start, end, &pages, &mask);
+
+	spin_lock_irqsave(&domain->cache_lock, flags);
+	list_for_each_entry(tag, &domain->cache_tags, node) {
+		struct device_domain_info *info = dev_iommu_priv_get(tag->dev);
+		struct intel_iommu *iommu = tag->iommu;
+		u16 sid = PCI_DEVID(info->bus, info->devfn);
+
+		switch (tag->type) {
+		case CACHE_TAG_TYPE_IOTLB:
+		case CACHE_TAG_TYPE_PARENT_IOTLB:
+			if (domain->use_first_level) {
+				qi_flush_piotlb(iommu, tag->domain_id,
+						tag->pasid, addr, pages, ih);
+			} else {
+				/*
+				 * Fallback to domain selective flush if no
+				 * PSI support or the size is too big.
+				 */
+				if (!cap_pgsel_inv(iommu->cap) ||
+				    mask > cap_max_amask_val(iommu->cap))
+					iommu->flush.flush_iotlb(iommu, tag->domain_id,
+								 0, 0, DMA_TLB_DSI_FLUSH);
+				else
+					iommu->flush.flush_iotlb(iommu, tag->domain_id,
+								 addr | ih, mask,
+								 DMA_TLB_PSI_FLUSH);
+			}
+			break;
+		case CACHE_TAG_TYPE_DEVTLB:
+			if (tag->pasid == IOMMU_NO_PASID)
+				qi_flush_dev_iotlb(iommu, sid, info->pfsid,
+						   info->ats_qdep, addr, mask);
+			else
+				qi_flush_dev_iotlb_pasid(iommu, sid, info->pfsid,
+							 tag->pasid, info->ats_qdep,
+							 addr, mask);
+
+			quirk_extra_dev_tlb_flush(info, addr, mask, tag->pasid, info->ats_qdep);
+			break;
+		case CACHE_TAG_TYPE_PARENT_DEVTLB:
+			/*
+			 * Address translation cache in device side caches the
+			 * result of nested translation. There is no easy way
+			 * to identify the exact set of nested translations
+			 * affected by a change in S2. So just flush the entire
+			 * device cache.
+			 */
+			qi_flush_dev_iotlb(iommu, sid, info->pfsid, info->ats_qdep,
+					   0, MAX_AGAW_PFN_WIDTH);
+			quirk_extra_dev_tlb_flush(info, 0, MAX_AGAW_PFN_WIDTH,
+						  IOMMU_NO_PASID, info->ats_qdep);
+			break;
+		}
+	}
+	spin_unlock_irqrestore(&domain->cache_lock, flags);
+}
+
+/*
+ * Invalidates all ranges of IOVA when the memory mappings in the target
+ * domain have been modified.
+ */
+void cache_tag_flush_all(struct dmar_domain *domain)
+{
+	struct cache_tag *tag;
+	unsigned long flags;
+
+	spin_lock_irqsave(&domain->cache_lock, flags);
+	list_for_each_entry(tag, &domain->cache_tags, node) {
+		struct device_domain_info *info = dev_iommu_priv_get(tag->dev);
+		struct intel_iommu *iommu = tag->iommu;
+		u16 sid = PCI_DEVID(info->bus, info->devfn);
+
+		switch (tag->type) {
+		case CACHE_TAG_TYPE_IOTLB:
+		case CACHE_TAG_TYPE_PARENT_IOTLB:
+			if (domain->use_first_level)
+				qi_flush_piotlb(iommu, tag->domain_id,
+						tag->pasid, 0, -1, 0);
+			else
+				iommu->flush.flush_iotlb(iommu, tag->domain_id,
+							 0, 0, DMA_TLB_DSI_FLUSH);
+			break;
+		case CACHE_TAG_TYPE_DEVTLB:
+		case CACHE_TAG_TYPE_PARENT_DEVTLB:
+			qi_flush_dev_iotlb(iommu, sid, info->pfsid, info->ats_qdep,
+					   0, MAX_AGAW_PFN_WIDTH);
+			quirk_extra_dev_tlb_flush(info, 0, MAX_AGAW_PFN_WIDTH,
+						  IOMMU_NO_PASID, info->ats_qdep);
+			break;
+		}
+	}
+	spin_unlock_irqrestore(&domain->cache_lock, flags);
+}
+
+/*
+ * Invalidate a range of IOVA when IOMMU is in caching mode and new mappings
+ * are added to the target domain.
+ */
+void cache_tag_flush_cm_range(struct dmar_domain *domain, unsigned long start,
+			      unsigned long end)
+{
+	unsigned long pages, mask, addr;
+	struct cache_tag *tag;
+	unsigned long flags;
+
+	addr = calculate_psi_aligned_address(start, end, &pages, &mask);
+
+	spin_lock_irqsave(&domain->cache_lock, flags);
+	list_for_each_entry(tag, &domain->cache_tags, node) {
+		struct intel_iommu *iommu = tag->iommu;
+
+		/*
+		 * When IOMMU is enabled in caching mode some non-present
+		 * mappings may be cached by the IOTLB if it uses second-
+		 * only translation.
+		 */
+		if (!cap_caching_mode(iommu->cap) || domain->use_first_level) {
+			iommu_flush_write_buffer(iommu);
+			continue;
+		}
+
+		if (tag->type == CACHE_TAG_TYPE_IOTLB ||
+		    tag->type == CACHE_TAG_TYPE_PARENT_IOTLB) {
+			/*
+			 * Fallback to domain selective flush if no
+			 * PSI support or the size is too big.
+			 */
+			if (!cap_pgsel_inv(iommu->cap) ||
+			    mask > cap_max_amask_val(iommu->cap))
+				iommu->flush.flush_iotlb(iommu, tag->domain_id,
+							 0, 0, DMA_TLB_DSI_FLUSH);
+			else
+				iommu->flush.flush_iotlb(iommu, tag->domain_id,
+							 addr, mask,
+							 DMA_TLB_PSI_FLUSH);
+		}
+	}
+	spin_unlock_irqrestore(&domain->cache_lock, flags);
+}
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index b4efbdedccce..93e4422c9b10 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -54,11 +54,6 @@
 				__DOMAIN_MAX_PFN(gaw), (unsigned long)-1))
 #define DOMAIN_MAX_ADDR(gaw)	(((uint64_t)__DOMAIN_MAX_PFN(gaw)) << VTD_PAGE_SHIFT)
 
-/* IO virtual address start page frame number */
-#define IOVA_START_PFN		(1)
-
-#define IOVA_PFN(addr)		((addr) >> PAGE_SHIFT)
-
 static void __init check_tylersburg_isoch(void);
 static int rwbf_quirk;
 
-- 
2.34.1



* [PATCH 03/12] iommu/vt-d: Add trace events for cache tag interface
  2024-03-25  2:16 [PATCH 00/12] Consolidate domain cache invalidation Lu Baolu
  2024-03-25  2:16 ` [PATCH 01/12] iommu/vt-d: Add cache tag assignment interface Lu Baolu
  2024-03-25  2:16 ` [PATCH 02/12] iommu/vt-d: Add cache tag invalidation helpers Lu Baolu
@ 2024-03-25  2:16 ` Lu Baolu
  2024-03-25  2:16 ` [PATCH 04/12] iommu/vt-d: Use cache_tag_flush_all() in flush_iotlb_all Lu Baolu
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 44+ messages in thread
From: Lu Baolu @ 2024-03-25  2:16 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian, Jason Gunthorpe
  Cc: Tina Zhang, Yi Liu, iommu, linux-kernel, Lu Baolu

Add trace events for the cache tag assign/unassign/flush operations and
emit them from the cache tag interfaces. These trace events will improve
debugging capabilities by providing detailed information about cache tag
activity. A sample of the traced messages is shown below [messages have
been stripped and wrapped to keep the lines short].

 cache_tag_assign: dmar9/0000:00:01.0 type iotlb did 1 pasid 9 ref 1
 cache_tag_assign: dmar9/0000:00:01.0 type devtlb did 1 pasid 9 ref 1
 cache_tag_flush_all: dmar6/0000:8a:00.0 type iotlb did 7 pasid 0 ref 1
 cache_tag_flush_range: dmar1 0000:00:1b.0[0] type iotlb did 9
	[0xeab00000-0xeab1afff] addr 0xeab00000 pages 0x20 mask 0x5
 cache_tag_flush_range: dmar1 0000:00:1b.0[0] type iotlb did 9
	[0xeab20000-0xeab31fff] addr 0xeab20000 pages 0x20 mask 0x5
 cache_tag_flush_range: dmar1 0000:00:1b.0[0] type iotlb did 9
	[0xeaa40000-0xeaa51fff] addr 0xeaa40000 pages 0x20 mask 0x5
 cache_tag_flush_range: dmar1 0000:00:1b.0[0] type iotlb did 9
	[0x98de0000-0x98de4fff] addr 0x98de0000 pages 0x8 mask 0x3
 cache_tag_flush_range: dmar1 0000:00:1b.0[0] type iotlb did 9
	[0xe9828000-0xe9828fff] addr 0xe9828000 pages 0x1 mask 0x0
 cache_tag_unassign: dmar9/0000:00:01.0 type iotlb did 1 pasid 9 ref 1
 cache_tag_unassign: dmar9/0000:00:01.0 type devtlb did 1 pasid 9 ref 1

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel/trace.h | 97 +++++++++++++++++++++++++++++++++++++
 drivers/iommu/intel/cache.c |  8 +++
 2 files changed, 105 insertions(+)

diff --git a/drivers/iommu/intel/trace.h b/drivers/iommu/intel/trace.h
index 93d96f93a89b..b8c2ebace8af 100644
--- a/drivers/iommu/intel/trace.h
+++ b/drivers/iommu/intel/trace.h
@@ -89,6 +89,103 @@ TRACE_EVENT(prq_report,
 				      __entry->dw1, __entry->dw2, __entry->dw3)
 	)
 );
+
+DECLARE_EVENT_CLASS(cache_tag_log,
+	TP_PROTO(struct cache_tag *tag),
+	TP_ARGS(tag),
+	TP_STRUCT__entry(
+		__string(iommu, tag->iommu->name)
+		__string(dev, dev_name(tag->dev))
+		__field(u16, type)
+		__field(u16, domain_id)
+		__field(u32, pasid)
+		__field(u32, users)
+	),
+	TP_fast_assign(
+		__assign_str(iommu, tag->iommu->name);
+		__assign_str(dev, dev_name(tag->dev));
+		__entry->type = tag->type;
+		__entry->domain_id = tag->domain_id;
+		__entry->pasid = tag->pasid;
+		__entry->users = tag->users;
+	),
+	TP_printk("%s/%s type %s did %d pasid %d ref %d",
+		  __get_str(iommu), __get_str(dev),
+		  __print_symbolic(__entry->type,
+			{ CACHE_TAG_TYPE_IOTLB,		"iotlb" },
+			{ CACHE_TAG_TYPE_DEVTLB,	"devtlb" },
+			{ CACHE_TAG_TYPE_PARENT_IOTLB,	"parent_iotlb" },
+			{ CACHE_TAG_TYPE_PARENT_DEVTLB,	"parent_devtlb" }),
+		__entry->domain_id, __entry->pasid, __entry->users
+	)
+);
+
+DEFINE_EVENT(cache_tag_log, cache_tag_assign,
+	TP_PROTO(struct cache_tag *tag),
+	TP_ARGS(tag)
+);
+
+DEFINE_EVENT(cache_tag_log, cache_tag_unassign,
+	TP_PROTO(struct cache_tag *tag),
+	TP_ARGS(tag)
+);
+
+DEFINE_EVENT(cache_tag_log, cache_tag_flush_all,
+	TP_PROTO(struct cache_tag *tag),
+	TP_ARGS(tag)
+);
+
+DECLARE_EVENT_CLASS(cache_tag_flush,
+	TP_PROTO(struct cache_tag *tag, unsigned long start, unsigned long end,
+		 unsigned long addr, unsigned long pages, unsigned long mask),
+	TP_ARGS(tag, start, end, addr, pages, mask),
+	TP_STRUCT__entry(
+		__string(iommu, tag->iommu->name)
+		__string(dev, dev_name(tag->dev))
+		__field(u16, type)
+		__field(u16, domain_id)
+		__field(u32, pasid)
+		__field(unsigned long, start)
+		__field(unsigned long, end)
+		__field(unsigned long, addr)
+		__field(unsigned long, pages)
+		__field(unsigned long, mask)
+	),
+	TP_fast_assign(
+		__assign_str(iommu, tag->iommu->name);
+		__assign_str(dev, dev_name(tag->dev));
+		__entry->type = tag->type;
+		__entry->domain_id = tag->domain_id;
+		__entry->pasid = tag->pasid;
+		__entry->start = start;
+		__entry->end = end;
+		__entry->addr = addr;
+		__entry->pages = pages;
+		__entry->mask = mask;
+	),
+	TP_printk("%s %s[%d] type %s did %d [0x%lx-0x%lx] addr 0x%lx pages 0x%lx mask 0x%lx",
+		  __get_str(iommu), __get_str(dev), __entry->pasid,
+		  __print_symbolic(__entry->type,
+			{ CACHE_TAG_TYPE_IOTLB,		"iotlb" },
+			{ CACHE_TAG_TYPE_DEVTLB,	"devtlb" },
+			{ CACHE_TAG_TYPE_PARENT_IOTLB,	"parent_iotlb" },
+			{ CACHE_TAG_TYPE_PARENT_DEVTLB,	"parent_devtlb" }),
+		__entry->domain_id, __entry->start, __entry->end,
+		__entry->addr, __entry->pages, __entry->mask
+	)
+);
+
+DEFINE_EVENT(cache_tag_flush, cache_tag_flush_range,
+	TP_PROTO(struct cache_tag *tag, unsigned long start, unsigned long end,
+		 unsigned long addr, unsigned long pages, unsigned long mask),
+	TP_ARGS(tag, start, end, addr, pages, mask)
+);
+
+DEFINE_EVENT(cache_tag_flush, cache_tag_flush_cm_range,
+	TP_PROTO(struct cache_tag *tag, unsigned long start, unsigned long end,
+		 unsigned long addr, unsigned long pages, unsigned long mask),
+	TP_ARGS(tag, start, end, addr, pages, mask)
+);
 #endif /* _TRACE_INTEL_IOMMU_H */
 
 /* This part must be outside protection */
diff --git a/drivers/iommu/intel/cache.c b/drivers/iommu/intel/cache.c
index 4c245d39faf2..b35ef2ee2aca 100644
--- a/drivers/iommu/intel/cache.c
+++ b/drivers/iommu/intel/cache.c
@@ -17,6 +17,7 @@
 
 #include "iommu.h"
 #include "pasid.h"
+#include "trace.h"
 
 /* Checks if an existing cache tag can be reused for a new association. */
 static bool cache_tag_reusable(struct cache_tag *tag, u16 domain_id,
@@ -65,11 +66,13 @@ static int cache_tag_assign(struct dmar_domain *domain, u16 did,
 			temp->users++;
 			spin_unlock_irqrestore(&domain->cache_lock, flags);
 			kfree(tag);
+			trace_cache_tag_assign(temp);
 			return 0;
 		}
 	}
 	list_add_tail(&tag->node, &domain->cache_tags);
 	spin_unlock_irqrestore(&domain->cache_lock, flags);
+	trace_cache_tag_assign(tag);
 
 	return 0;
 }
@@ -87,6 +90,7 @@ static void cache_tag_unassign(struct dmar_domain *domain, u16 did,
 	spin_lock_irqsave(&domain->cache_lock, flags);
 	list_for_each_entry(tag, &domain->cache_tags, node) {
 		if (cache_tag_reusable(tag, did, iommu, dev, pasid, type)) {
+			trace_cache_tag_unassign(tag);
 			if (--tag->users == 0) {
 				list_del(&tag->node);
 				kfree(tag);
@@ -293,6 +297,8 @@ void cache_tag_flush_range(struct dmar_domain *domain, unsigned long start,
 						  IOMMU_NO_PASID, info->ats_qdep);
 			break;
 		}
+
+		trace_cache_tag_flush_range(tag, start, end, addr, pages, mask);
 	}
 	spin_unlock_irqrestore(&domain->cache_lock, flags);
 }
@@ -330,6 +336,7 @@ void cache_tag_flush_all(struct dmar_domain *domain)
 						  IOMMU_NO_PASID, info->ats_qdep);
 			break;
 		}
+		trace_cache_tag_flush_all(tag);
 	}
 	spin_unlock_irqrestore(&domain->cache_lock, flags);
 }
@@ -376,6 +383,7 @@ void cache_tag_flush_cm_range(struct dmar_domain *domain, unsigned long start,
 							 addr, mask,
 							 DMA_TLB_PSI_FLUSH);
 		}
+		trace_cache_tag_flush_cm_range(tag, start, end, addr, pages, mask);
 	}
 	spin_unlock_irqrestore(&domain->cache_lock, flags);
 }
-- 
2.34.1



* [PATCH 04/12] iommu/vt-d: Use cache_tag_flush_all() in flush_iotlb_all
  2024-03-25  2:16 [PATCH 00/12] Consolidate domain cache invalidation Lu Baolu
                   ` (2 preceding siblings ...)
  2024-03-25  2:16 ` [PATCH 03/12] iommu/vt-d: Add trace events for cache tag interface Lu Baolu
@ 2024-03-25  2:16 ` Lu Baolu
  2024-03-28  7:47   ` Tian, Kevin
  2024-03-25  2:16 ` [PATCH 05/12] iommu/vt-d: Use cache_tag_flush_range() in tlb_sync Lu Baolu
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 44+ messages in thread
From: Lu Baolu @ 2024-03-25  2:16 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian, Jason Gunthorpe
  Cc: Tina Zhang, Yi Liu, iommu, linux-kernel, Lu Baolu

The flush_iotlb_all callback is called by the iommu core to flush
all caches for the affected domain. Use cache_tag_flush_all() in
this callback.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel/iommu.c | 21 +--------------------
 1 file changed, 1 insertion(+), 20 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 93e4422c9b10..4ce98f23917c 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1560,26 +1560,7 @@ static void parent_domain_flush(struct dmar_domain *domain,
 
 static void intel_flush_iotlb_all(struct iommu_domain *domain)
 {
-	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
-	struct iommu_domain_info *info;
-	unsigned long idx;
-
-	xa_for_each(&dmar_domain->iommu_array, idx, info) {
-		struct intel_iommu *iommu = info->iommu;
-		u16 did = domain_id_iommu(dmar_domain, iommu);
-
-		if (dmar_domain->use_first_level)
-			domain_flush_pasid_iotlb(iommu, dmar_domain, 0, -1, 0);
-		else
-			iommu->flush.flush_iotlb(iommu, did, 0, 0,
-						 DMA_TLB_DSI_FLUSH);
-
-		if (!cap_caching_mode(iommu->cap))
-			iommu_flush_dev_iotlb(dmar_domain, 0, MAX_AGAW_PFN_WIDTH);
-	}
-
-	if (dmar_domain->nested_parent)
-		parent_domain_flush(dmar_domain, 0, -1, 0);
+	cache_tag_flush_all(to_dmar_domain(domain));
 }
 
 static void iommu_disable_protect_mem_regions(struct intel_iommu *iommu)
-- 
2.34.1



* [PATCH 05/12] iommu/vt-d: Use cache_tag_flush_range() in tlb_sync
  2024-03-25  2:16 [PATCH 00/12] Consolidate domain cache invalidation Lu Baolu
                   ` (3 preceding siblings ...)
  2024-03-25  2:16 ` [PATCH 04/12] iommu/vt-d: Use cache_tag_flush_all() in flush_iotlb_all Lu Baolu
@ 2024-03-25  2:16 ` Lu Baolu
  2024-03-25  2:16 ` [PATCH 06/12] iommu/vt-d: Use cache_tag_flush_cm_range() in iotlb_sync_map Lu Baolu
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 44+ messages in thread
From: Lu Baolu @ 2024-03-25  2:16 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian, Jason Gunthorpe
  Cc: Tina Zhang, Yi Liu, iommu, linux-kernel, Lu Baolu

The tlb_sync callback is called by the iommu core to flush a range of
caches for the affected domain. Use cache_tag_flush_range() in this
callback.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel/iommu.c | 21 ++-------------------
 1 file changed, 2 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 4ce98f23917c..1c03d2dafb9d 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4112,25 +4112,8 @@ static size_t intel_iommu_unmap_pages(struct iommu_domain *domain,
 static void intel_iommu_tlb_sync(struct iommu_domain *domain,
 				 struct iommu_iotlb_gather *gather)
 {
-	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
-	unsigned long iova_pfn = IOVA_PFN(gather->start);
-	size_t size = gather->end - gather->start;
-	struct iommu_domain_info *info;
-	unsigned long start_pfn;
-	unsigned long nrpages;
-	unsigned long i;
-
-	nrpages = aligned_nrpages(gather->start, size);
-	start_pfn = mm_to_dma_pfn_start(iova_pfn);
-
-	xa_for_each(&dmar_domain->iommu_array, i, info)
-		iommu_flush_iotlb_psi(info->iommu, dmar_domain,
-				      start_pfn, nrpages,
-				      list_empty(&gather->freelist), 0);
-
-	if (dmar_domain->nested_parent)
-		parent_domain_flush(dmar_domain, start_pfn, nrpages,
-				    list_empty(&gather->freelist));
+	cache_tag_flush_range(to_dmar_domain(domain), gather->start,
+			      gather->end, list_empty(&gather->freelist));
 	put_pages_list(&gather->freelist);
 }
 
-- 
2.34.1



* [PATCH 06/12] iommu/vt-d: Use cache_tag_flush_cm_range() in iotlb_sync_map
  2024-03-25  2:16 [PATCH 00/12] Consolidate domain cache invalidation Lu Baolu
                   ` (4 preceding siblings ...)
  2024-03-25  2:16 ` [PATCH 05/12] iommu/vt-d: Use cache_tag_flush_range() in tlb_sync Lu Baolu
@ 2024-03-25  2:16 ` Lu Baolu
  2024-03-28  7:48   ` Tian, Kevin
  2024-03-25  2:17 ` [PATCH 07/12] iommu/vt-d: Cleanup use of iommu_flush_iotlb_psi() Lu Baolu
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 44+ messages in thread
From: Lu Baolu @ 2024-03-25  2:16 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian, Jason Gunthorpe
  Cc: Tina Zhang, Yi Liu, iommu, linux-kernel, Lu Baolu

The iotlb_sync_map callback is called by the iommu core after non-present
to present mappings are created. The iommu driver uses this callback to
invalidate caches if the IOMMU is working in caching mode and second-only
translation is used for the domain. Use cache_tag_flush_cm_range() in this
callback.
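
For reference, the per-IOMMU decision inside cache_tag_flush_cm_range()
(added in patch 2) boils down to the following, so domains using
first-level translation, or hardware not in caching mode, only get a
write buffer flush:

        if (!cap_caching_mode(iommu->cap) || domain->use_first_level) {
                iommu_flush_write_buffer(iommu);
                continue;
        }
        /* otherwise: page- or domain-selective IOTLB flush for the range */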

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel/iommu.c | 22 +---------------------
 1 file changed, 1 insertion(+), 21 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 1c03d2dafb9d..2dcab1e5cd4d 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1504,20 +1504,6 @@ static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
 		iommu_flush_dev_iotlb(domain, addr, mask);
 }
 
-/* Notification for newly created mappings */
-static void __mapping_notify_one(struct intel_iommu *iommu, struct dmar_domain *domain,
-				 unsigned long pfn, unsigned int pages)
-{
-	/*
-	 * It's a non-present to present mapping. Only flush if caching mode
-	 * and second level.
-	 */
-	if (cap_caching_mode(iommu->cap) && !domain->use_first_level)
-		iommu_flush_iotlb_psi(iommu, domain, pfn, pages, 0, 1);
-	else
-		iommu_flush_write_buffer(iommu);
-}
-
 /*
  * Flush the relevant caches in nested translation if the domain
  * also serves as a parent
@@ -4549,14 +4535,8 @@ static bool risky_device(struct pci_dev *pdev)
 static int intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
 				      unsigned long iova, size_t size)
 {
-	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
-	unsigned long pages = aligned_nrpages(iova, size);
-	unsigned long pfn = iova >> VTD_PAGE_SHIFT;
-	struct iommu_domain_info *info;
-	unsigned long i;
+	cache_tag_flush_cm_range(to_dmar_domain(domain), iova, iova + size - 1);
 
-	xa_for_each(&dmar_domain->iommu_array, i, info)
-		__mapping_notify_one(info->iommu, dmar_domain, pfn, pages);
 	return 0;
 }
 
-- 
2.34.1



* [PATCH 07/12] iommu/vt-d: Cleanup use of iommu_flush_iotlb_psi()
  2024-03-25  2:16 [PATCH 00/12] Consolidate domain cache invalidation Lu Baolu
                   ` (5 preceding siblings ...)
  2024-03-25  2:16 ` [PATCH 06/12] iommu/vt-d: Use cache_tag_flush_cm_range() in iotlb_sync_map Lu Baolu
@ 2024-03-25  2:17 ` Lu Baolu
  2024-03-28  7:50   ` Tian, Kevin
  2024-03-25  2:17 ` [PATCH 08/12] iommu/vt-d: Use cache_tag_flush_range() in cache_invalidate_user Lu Baolu
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 44+ messages in thread
From: Lu Baolu @ 2024-03-25  2:17 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian, Jason Gunthorpe
  Cc: Tina Zhang, Yi Liu, iommu, linux-kernel, Lu Baolu

Use cache_tag_flush_range() in switch_to_super_page() to invalidate the
necessary caches when switching mappings from normal to super pages. The
iommu_flush_iotlb_psi() call in intel_iommu_memory_notifier() is
unnecessary since there should be no cache invalidation for the identity
domain.

Clean up iommu_flush_iotlb_psi() after the last call site is removed.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel/iommu.c | 175 +-----------------------------------
 1 file changed, 2 insertions(+), 173 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 2dcab1e5cd4d..6e019297843b 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1389,161 +1389,6 @@ static void __iommu_flush_dev_iotlb(struct device_domain_info *info,
 	quirk_extra_dev_tlb_flush(info, addr, mask, IOMMU_NO_PASID, qdep);
 }
 
-static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
-				  u64 addr, unsigned mask)
-{
-	struct dev_pasid_info *dev_pasid;
-	struct device_domain_info *info;
-	unsigned long flags;
-
-	if (!domain->has_iotlb_device)
-		return;
-
-	spin_lock_irqsave(&domain->lock, flags);
-	list_for_each_entry(info, &domain->devices, link)
-		__iommu_flush_dev_iotlb(info, addr, mask);
-
-	list_for_each_entry(dev_pasid, &domain->dev_pasids, link_domain) {
-		info = dev_iommu_priv_get(dev_pasid->dev);
-
-		if (!info->ats_enabled)
-			continue;
-
-		qi_flush_dev_iotlb_pasid(info->iommu,
-					 PCI_DEVID(info->bus, info->devfn),
-					 info->pfsid, dev_pasid->pasid,
-					 info->ats_qdep, addr,
-					 mask);
-	}
-	spin_unlock_irqrestore(&domain->lock, flags);
-}
-
-static void domain_flush_pasid_iotlb(struct intel_iommu *iommu,
-				     struct dmar_domain *domain, u64 addr,
-				     unsigned long npages, bool ih)
-{
-	u16 did = domain_id_iommu(domain, iommu);
-	struct dev_pasid_info *dev_pasid;
-	unsigned long flags;
-
-	spin_lock_irqsave(&domain->lock, flags);
-	list_for_each_entry(dev_pasid, &domain->dev_pasids, link_domain)
-		qi_flush_piotlb(iommu, did, dev_pasid->pasid, addr, npages, ih);
-
-	if (!list_empty(&domain->devices))
-		qi_flush_piotlb(iommu, did, IOMMU_NO_PASID, addr, npages, ih);
-	spin_unlock_irqrestore(&domain->lock, flags);
-}
-
-static void __iommu_flush_iotlb_psi(struct intel_iommu *iommu, u16 did,
-				    unsigned long pfn, unsigned int pages,
-				    int ih)
-{
-	unsigned int aligned_pages = __roundup_pow_of_two(pages);
-	unsigned long bitmask = aligned_pages - 1;
-	unsigned int mask = ilog2(aligned_pages);
-	u64 addr = (u64)pfn << VTD_PAGE_SHIFT;
-
-	/*
-	 * PSI masks the low order bits of the base address. If the
-	 * address isn't aligned to the mask, then compute a mask value
-	 * needed to ensure the target range is flushed.
-	 */
-	if (unlikely(bitmask & pfn)) {
-		unsigned long end_pfn = pfn + pages - 1, shared_bits;
-
-		/*
-		 * Since end_pfn <= pfn + bitmask, the only way bits
-		 * higher than bitmask can differ in pfn and end_pfn is
-		 * by carrying. This means after masking out bitmask,
-		 * high bits starting with the first set bit in
-		 * shared_bits are all equal in both pfn and end_pfn.
-		 */
-		shared_bits = ~(pfn ^ end_pfn) & ~bitmask;
-		mask = shared_bits ? __ffs(shared_bits) : BITS_PER_LONG;
-	}
-
-	/*
-	 * Fallback to domain selective flush if no PSI support or
-	 * the size is too big.
-	 */
-	if (!cap_pgsel_inv(iommu->cap) || mask > cap_max_amask_val(iommu->cap))
-		iommu->flush.flush_iotlb(iommu, did, 0, 0,
-					 DMA_TLB_DSI_FLUSH);
-	else
-		iommu->flush.flush_iotlb(iommu, did, addr | ih, mask,
-					 DMA_TLB_PSI_FLUSH);
-}
-
-static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
-				  struct dmar_domain *domain,
-				  unsigned long pfn, unsigned int pages,
-				  int ih, int map)
-{
-	unsigned int aligned_pages = __roundup_pow_of_two(pages);
-	unsigned int mask = ilog2(aligned_pages);
-	uint64_t addr = (uint64_t)pfn << VTD_PAGE_SHIFT;
-	u16 did = domain_id_iommu(domain, iommu);
-
-	if (WARN_ON(!pages))
-		return;
-
-	if (ih)
-		ih = 1 << 6;
-
-	if (domain->use_first_level)
-		domain_flush_pasid_iotlb(iommu, domain, addr, pages, ih);
-	else
-		__iommu_flush_iotlb_psi(iommu, did, pfn, pages, ih);
-
-	/*
-	 * In caching mode, changes of pages from non-present to present require
-	 * flush. However, device IOTLB doesn't need to be flushed in this case.
-	 */
-	if (!cap_caching_mode(iommu->cap) || !map)
-		iommu_flush_dev_iotlb(domain, addr, mask);
-}
-
-/*
- * Flush the relevant caches in nested translation if the domain
- * also serves as a parent
- */
-static void parent_domain_flush(struct dmar_domain *domain,
-				unsigned long pfn,
-				unsigned long pages, int ih)
-{
-	struct dmar_domain *s1_domain;
-
-	spin_lock(&domain->s1_lock);
-	list_for_each_entry(s1_domain, &domain->s1_domains, s2_link) {
-		struct device_domain_info *device_info;
-		struct iommu_domain_info *info;
-		unsigned long flags;
-		unsigned long i;
-
-		xa_for_each(&s1_domain->iommu_array, i, info)
-			__iommu_flush_iotlb_psi(info->iommu, info->did,
-						pfn, pages, ih);
-
-		if (!s1_domain->has_iotlb_device)
-			continue;
-
-		spin_lock_irqsave(&s1_domain->lock, flags);
-		list_for_each_entry(device_info, &s1_domain->devices, link)
-			/*
-			 * Address translation cache in device side caches the
-			 * result of nested translation. There is no easy way
-			 * to identify the exact set of nested translations
-			 * affected by a change in S2. So just flush the entire
-			 * device cache.
-			 */
-			__iommu_flush_dev_iotlb(device_info, 0,
-						MAX_AGAW_PFN_WIDTH);
-		spin_unlock_irqrestore(&s1_domain->lock, flags);
-	}
-	spin_unlock(&domain->s1_lock);
-}
-
 static void intel_flush_iotlb_all(struct iommu_domain *domain)
 {
 	cache_tag_flush_all(to_dmar_domain(domain));
@@ -1995,9 +1840,7 @@ static void switch_to_super_page(struct dmar_domain *domain,
 				 unsigned long end_pfn, int level)
 {
 	unsigned long lvl_pages = lvl_to_nr_pages(level);
-	struct iommu_domain_info *info;
 	struct dma_pte *pte = NULL;
-	unsigned long i;
 
 	while (start_pfn <= end_pfn) {
 		if (!pte)
@@ -2009,13 +1852,8 @@ static void switch_to_super_page(struct dmar_domain *domain,
 					       start_pfn + lvl_pages - 1,
 					       level + 1);
 
-			xa_for_each(&domain->iommu_array, i, info)
-				iommu_flush_iotlb_psi(info->iommu, domain,
-						      start_pfn, lvl_pages,
-						      0, 0);
-			if (domain->nested_parent)
-				parent_domain_flush(domain, start_pfn,
-						    lvl_pages, 0);
+			cache_tag_flush_range(domain, start_pfn << VTD_PAGE_SHIFT,
+					      end_pfn << VTD_PAGE_SHIFT, 0);
 		}
 
 		pte++;
@@ -3387,18 +3225,9 @@ static int intel_iommu_memory_notifier(struct notifier_block *nb,
 	case MEM_OFFLINE:
 	case MEM_CANCEL_ONLINE:
 		{
-			struct dmar_drhd_unit *drhd;
-			struct intel_iommu *iommu;
 			LIST_HEAD(freelist);
 
 			domain_unmap(si_domain, start_vpfn, last_vpfn, &freelist);
-
-			rcu_read_lock();
-			for_each_active_iommu(iommu, drhd)
-				iommu_flush_iotlb_psi(iommu, si_domain,
-					start_vpfn, mhp->nr_pages,
-					list_empty(&freelist), 0);
-			rcu_read_unlock();
 			put_pages_list(&freelist);
 		}
 		break;
-- 
2.34.1



* [PATCH 08/12] iommu/vt-d: Use cache_tag_flush_range() in cache_invalidate_user
  2024-03-25  2:16 [PATCH 00/12] Consolidate domain cache invalidation Lu Baolu
                   ` (6 preceding siblings ...)
  2024-03-25  2:17 ` [PATCH 07/12] iommu/vt-d: Cleanup use of iommu_flush_iotlb_psi() Lu Baolu
@ 2024-03-25  2:17 ` Lu Baolu
  2024-03-28  7:54   ` Tian, Kevin
  2024-03-25  2:17 ` [PATCH 09/12] iommu/vt-d: Use cache helpers in arch_invalidate_secondary_tlbs Lu Baolu
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 44+ messages in thread
From: Lu Baolu @ 2024-03-25  2:17 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian, Jason Gunthorpe
  Cc: Tina Zhang, Yi Liu, iommu, linux-kernel, Lu Baolu

The cache_invalidate_user callback is called to invalidate a range
of caches for the affected user domain. Use cache_tag_flush_range()
in this callback.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel/nested.c | 50 +++---------------------------------
 1 file changed, 3 insertions(+), 47 deletions(-)

diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
index 85c744099558..e251507cfcc0 100644
--- a/drivers/iommu/intel/nested.c
+++ b/drivers/iommu/intel/nested.c
@@ -90,50 +90,6 @@ static void intel_nested_domain_free(struct iommu_domain *domain)
 	kfree(dmar_domain);
 }
 
-static void nested_flush_dev_iotlb(struct dmar_domain *domain, u64 addr,
-				   unsigned int mask)
-{
-	struct device_domain_info *info;
-	unsigned long flags;
-	u16 sid, qdep;
-
-	spin_lock_irqsave(&domain->lock, flags);
-	list_for_each_entry(info, &domain->devices, link) {
-		if (!info->ats_enabled)
-			continue;
-		sid = info->bus << 8 | info->devfn;
-		qdep = info->ats_qdep;
-		qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
-				   qdep, addr, mask);
-		quirk_extra_dev_tlb_flush(info, addr, mask,
-					  IOMMU_NO_PASID, qdep);
-	}
-	spin_unlock_irqrestore(&domain->lock, flags);
-}
-
-static void intel_nested_flush_cache(struct dmar_domain *domain, u64 addr,
-				     u64 npages, bool ih)
-{
-	struct iommu_domain_info *info;
-	unsigned int mask;
-	unsigned long i;
-
-	xa_for_each(&domain->iommu_array, i, info)
-		qi_flush_piotlb(info->iommu,
-				domain_id_iommu(domain, info->iommu),
-				IOMMU_NO_PASID, addr, npages, ih);
-
-	if (!domain->has_iotlb_device)
-		return;
-
-	if (npages == U64_MAX)
-		mask = 64 - VTD_PAGE_SHIFT;
-	else
-		mask = ilog2(__roundup_pow_of_two(npages));
-
-	nested_flush_dev_iotlb(domain, addr, mask);
-}
-
 static int intel_nested_cache_invalidate_user(struct iommu_domain *domain,
 					      struct iommu_user_data_array *array)
 {
@@ -166,9 +122,9 @@ static int intel_nested_cache_invalidate_user(struct iommu_domain *domain,
 			break;
 		}
 
-		intel_nested_flush_cache(dmar_domain, inv_entry.addr,
-					 inv_entry.npages,
-					 inv_entry.flags & IOMMU_VTD_INV_FLAGS_LEAF);
+		cache_tag_flush_range(dmar_domain, inv_entry.addr,
+				      inv_entry.npages,
+				      inv_entry.flags & IOMMU_VTD_INV_FLAGS_LEAF);
 		processed++;
 	}
 
-- 
2.34.1



* [PATCH 09/12] iommu/vt-d: Use cache helpers in arch_invalidate_secondary_tlbs
  2024-03-25  2:16 [PATCH 00/12] Consolidate domain cache invalidation Lu Baolu
                   ` (7 preceding siblings ...)
  2024-03-25  2:17 ` [PATCH 08/12] iommu/vt-d: Use cache_tag_flush_range() in cache_invalidate_user Lu Baolu
@ 2024-03-25  2:17 ` Lu Baolu
  2024-03-25  2:17 ` [PATCH 10/12] iommu/vt-d: Retire intel_svm_dev Lu Baolu
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 44+ messages in thread
From: Lu Baolu @ 2024-03-25  2:17 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian, Jason Gunthorpe
  Cc: Tina Zhang, Yi Liu, iommu, linux-kernel, Lu Baolu

The arch_invalidate_secondary_tlbs callback is called in the SVA mm
notification path. It invalidates all or a range of caches after the
CPU page table is modified. Use the cache tag helpers in this path.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel/iommu.h |  1 +
 drivers/iommu/intel/svm.c   | 76 +++----------------------------------
 2 files changed, 6 insertions(+), 71 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index d05fa0122d65..2575ded9f354 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -1148,6 +1148,7 @@ struct intel_svm {
 	struct mm_struct *mm;
 	u32 pasid;
 	struct list_head devs;
+	struct dmar_domain *domain;
 };
 #else
 static inline void intel_svm_check(struct intel_iommu *iommu) {}
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index d706226e84ee..751fab476fa2 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -168,88 +168,20 @@ void intel_svm_check(struct intel_iommu *iommu)
 	iommu->flags |= VTD_FLAG_SVM_CAPABLE;
 }
 
-static void __flush_svm_range_dev(struct intel_svm *svm,
-				  struct intel_svm_dev *sdev,
-				  unsigned long address,
-				  unsigned long pages, int ih)
-{
-	struct device_domain_info *info = dev_iommu_priv_get(sdev->dev);
-
-	if (WARN_ON(!pages))
-		return;
-
-	qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, address, pages, ih);
-	if (info->ats_enabled) {
-		qi_flush_dev_iotlb_pasid(sdev->iommu, sdev->sid, info->pfsid,
-					 svm->pasid, sdev->qdep, address,
-					 order_base_2(pages));
-		quirk_extra_dev_tlb_flush(info, address, order_base_2(pages),
-					  svm->pasid, sdev->qdep);
-	}
-}
-
-static void intel_flush_svm_range_dev(struct intel_svm *svm,
-				      struct intel_svm_dev *sdev,
-				      unsigned long address,
-				      unsigned long pages, int ih)
-{
-	unsigned long shift = ilog2(__roundup_pow_of_two(pages));
-	unsigned long align = (1ULL << (VTD_PAGE_SHIFT + shift));
-	unsigned long start = ALIGN_DOWN(address, align);
-	unsigned long end = ALIGN(address + (pages << VTD_PAGE_SHIFT), align);
-
-	while (start < end) {
-		__flush_svm_range_dev(svm, sdev, start, align >> VTD_PAGE_SHIFT, ih);
-		start += align;
-	}
-}
-
-static void intel_flush_svm_range(struct intel_svm *svm, unsigned long address,
-				unsigned long pages, int ih)
-{
-	struct intel_svm_dev *sdev;
-
-	rcu_read_lock();
-	list_for_each_entry_rcu(sdev, &svm->devs, list)
-		intel_flush_svm_range_dev(svm, sdev, address, pages, ih);
-	rcu_read_unlock();
-}
-
-static void intel_flush_svm_all(struct intel_svm *svm)
-{
-	struct device_domain_info *info;
-	struct intel_svm_dev *sdev;
-
-	rcu_read_lock();
-	list_for_each_entry_rcu(sdev, &svm->devs, list) {
-		info = dev_iommu_priv_get(sdev->dev);
-
-		qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, 0, -1UL, 0);
-		if (info->ats_enabled) {
-			qi_flush_dev_iotlb_pasid(sdev->iommu, sdev->sid, info->pfsid,
-						 svm->pasid, sdev->qdep,
-						 0, 64 - VTD_PAGE_SHIFT);
-			quirk_extra_dev_tlb_flush(info, 0, 64 - VTD_PAGE_SHIFT,
-						  svm->pasid, sdev->qdep);
-		}
-	}
-	rcu_read_unlock();
-}
-
 /* Pages have been freed at this point */
 static void intel_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn,
 					struct mm_struct *mm,
 					unsigned long start, unsigned long end)
 {
 	struct intel_svm *svm = container_of(mn, struct intel_svm, notifier);
+	struct dmar_domain *domain = svm->domain;
 
 	if (start == 0 && end == -1UL) {
-		intel_flush_svm_all(svm);
+		cache_tag_flush_all(domain);
 		return;
 	}
 
-	intel_flush_svm_range(svm, start,
-			      (end - start + PAGE_SIZE - 1) >> VTD_PAGE_SHIFT, 0);
+	cache_tag_flush_range(domain, start, end, 0);
 }
 
 static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
@@ -336,6 +268,7 @@ static int intel_svm_set_dev_pasid(struct iommu_domain *domain,
 		INIT_LIST_HEAD_RCU(&svm->devs);
 
 		svm->notifier.ops = &intel_mmuops;
+		svm->domain = to_dmar_domain(domain);
 		ret = mmu_notifier_register(&svm->notifier, mm);
 		if (ret) {
 			kfree(svm);
@@ -803,6 +736,7 @@ struct iommu_domain *intel_svm_domain_alloc(void)
 	if (!domain)
 		return NULL;
 	domain->domain.ops = &intel_svm_domain_ops;
+	domain->use_first_level = true;
 	INIT_LIST_HEAD(&domain->cache_tags);
 	spin_lock_init(&domain->cache_lock);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 10/12] iommu/vt-d: Retire intel_svm_dev
  2024-03-25  2:16 [PATCH 00/12] Consolidate domain cache invalidation Lu Baolu
                   ` (8 preceding siblings ...)
  2024-03-25  2:17 ` [PATCH 09/12] iommu/vt-d: Use cache helpers in arch_invalidate_secondary_tlbs Lu Baolu
@ 2024-03-25  2:17 ` Lu Baolu
  2024-03-25  2:17 ` [PATCH 11/12] iommu: Add ops->domain_alloc_sva() Lu Baolu
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 44+ messages in thread
From: Lu Baolu @ 2024-03-25  2:17 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian, Jason Gunthorpe
  Cc: Tina Zhang, Yi Liu, iommu, linux-kernel, Lu Baolu

The intel_svm_dev data structure used in the sva implementation for the
Intel IOMMU driver stores information about a device attached to an SVA
domain. It is a duplicate of dev_pasid_info that serves the same purpose.

Replace intel_svm_dev with dev_pasid_info and clean up the use of
intel_svm_dev.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel/iommu.h |  15 +----
 drivers/iommu/intel/iommu.c |  30 ++++-----
 drivers/iommu/intel/svm.c   | 131 +++++++++++-------------------------
 3 files changed, 55 insertions(+), 121 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 2575ded9f354..5dcced6f3634 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -650,6 +650,7 @@ struct dmar_domain {
 			struct list_head s2_link;
 		};
 	};
+	struct intel_svm *svm;
 
 	struct iommu_domain domain;	/* generic domain data structure for
 					   iommu core */
@@ -1131,23 +1132,13 @@ int intel_svm_finish_prq(struct intel_iommu *iommu);
 void intel_svm_page_response(struct device *dev, struct iopf_fault *evt,
 			     struct iommu_page_response *msg);
 struct iommu_domain *intel_svm_domain_alloc(void);
-void intel_svm_remove_dev_pasid(struct device *dev, ioasid_t pasid);
+void intel_svm_remove_dev_pasid(struct iommu_domain *domain);
 void intel_drain_pasid_prq(struct device *dev, u32 pasid);
 
-struct intel_svm_dev {
-	struct list_head list;
-	struct rcu_head rcu;
-	struct device *dev;
-	struct intel_iommu *iommu;
-	u16 did;
-	u16 sid, qdep;
-};
-
 struct intel_svm {
 	struct mmu_notifier notifier;
 	struct mm_struct *mm;
 	u32 pasid;
-	struct list_head devs;
 	struct dmar_domain *domain;
 };
 #else
@@ -1158,7 +1149,7 @@ static inline struct iommu_domain *intel_svm_domain_alloc(void)
 	return NULL;
 }
 
-static inline void intel_svm_remove_dev_pasid(struct device *dev, ioasid_t pasid)
+static inline void intel_svm_remove_dev_pasid(struct iommu_domain *domain)
 {
 }
 #endif
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 6e019297843b..2ac1fc041333 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4384,18 +4384,6 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
 		goto out_tear_down;
 	dmar_domain = to_dmar_domain(domain);
 
-	/*
-	 * The SVA implementation needs to handle its own stuffs like the mm
-	 * notification. Before consolidating that code into iommu core, let
-	 * the intel sva code handle it.
-	 */
-	if (domain->type == IOMMU_DOMAIN_SVA) {
-		intel_svm_remove_dev_pasid(dev, pasid);
-		cache_tag_unassign_domain(dmar_domain,
-					  FLPT_DEFAULT_DID, dev, pasid);
-		goto out_tear_down;
-	}
-
 	spin_lock_irqsave(&dmar_domain->lock, flags);
 	list_for_each_entry(curr, &dmar_domain->dev_pasids, link_domain) {
 		if (curr->dev == dev && curr->pasid == pasid) {
@@ -4407,10 +4395,20 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
 	WARN_ON_ONCE(!dev_pasid);
 	spin_unlock_irqrestore(&dmar_domain->lock, flags);
 
-	did = domain_id_iommu(dmar_domain, iommu);
-	cache_tag_unassign_domain(dmar_domain, did, dev, pasid);
-	domain_detach_iommu(dmar_domain, iommu);
-	intel_iommu_debugfs_remove_dev_pasid(dev_pasid);
+	/*
+	 * The SVA implementation needs to handle its own stuffs like the mm
+	 * notification. Before consolidating that code into iommu core, let
+	 * the intel sva code handle it.
+	 */
+	if (domain->type == IOMMU_DOMAIN_SVA) {
+		cache_tag_unassign_domain(dmar_domain, FLPT_DEFAULT_DID, dev, pasid);
+		intel_svm_remove_dev_pasid(domain);
+	} else {
+		did = domain_id_iommu(dmar_domain, iommu);
+		cache_tag_unassign_domain(dmar_domain, did, dev, pasid);
+		domain_detach_iommu(dmar_domain, iommu);
+		intel_iommu_debugfs_remove_dev_pasid(dev_pasid);
+	}
 	kfree(dev_pasid);
 out_tear_down:
 	intel_pasid_tear_down_entry(iommu, dev, pasid, false);
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 751fab476fa2..0b767d16fb71 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -43,23 +43,6 @@ static void *pasid_private_find(ioasid_t pasid)
 	return xa_load(&pasid_private_array, pasid);
 }
 
-static struct intel_svm_dev *
-svm_lookup_device_by_dev(struct intel_svm *svm, struct device *dev)
-{
-	struct intel_svm_dev *sdev = NULL, *t;
-
-	rcu_read_lock();
-	list_for_each_entry_rcu(t, &svm->devs, list) {
-		if (t->dev == dev) {
-			sdev = t;
-			break;
-		}
-	}
-	rcu_read_unlock();
-
-	return sdev;
-}
-
 int intel_svm_enable_prq(struct intel_iommu *iommu)
 {
 	struct iopf_queue *iopfq;
@@ -187,7 +170,10 @@ static void intel_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn,
 static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
 {
 	struct intel_svm *svm = container_of(mn, struct intel_svm, notifier);
-	struct intel_svm_dev *sdev;
+	struct dmar_domain *domain = svm->domain;
+	struct dev_pasid_info *dev_pasid;
+	struct device_domain_info *info;
+	unsigned long flags;
 
 	/* This might end up being called from exit_mmap(), *before* the page
 	 * tables are cleared. And __mmu_notifier_release() will delete us from
@@ -201,11 +187,13 @@ static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
 	 * page) so that we end up taking a fault that the hardware really
 	 * *has* to handle gracefully without affecting other processes.
 	 */
-	rcu_read_lock();
-	list_for_each_entry_rcu(sdev, &svm->devs, list)
-		intel_pasid_tear_down_entry(sdev->iommu, sdev->dev,
-					    svm->pasid, true);
-	rcu_read_unlock();
+	spin_lock_irqsave(&domain->lock, flags);
+	list_for_each_entry(dev_pasid, &domain->dev_pasids, link_domain) {
+		info = dev_iommu_priv_get(dev_pasid->dev);
+		intel_pasid_tear_down_entry(info->iommu, dev_pasid->dev,
+					    dev_pasid->pasid, true);
+	}
+	spin_unlock_irqrestore(&domain->lock, flags);
 
 }
 
@@ -214,47 +202,17 @@ static const struct mmu_notifier_ops intel_mmuops = {
 	.arch_invalidate_secondary_tlbs = intel_arch_invalidate_secondary_tlbs,
 };
 
-static int pasid_to_svm_sdev(struct device *dev, unsigned int pasid,
-			     struct intel_svm **rsvm,
-			     struct intel_svm_dev **rsdev)
-{
-	struct intel_svm_dev *sdev = NULL;
-	struct intel_svm *svm;
-
-	if (pasid == IOMMU_PASID_INVALID || pasid >= PASID_MAX)
-		return -EINVAL;
-
-	svm = pasid_private_find(pasid);
-	if (IS_ERR(svm))
-		return PTR_ERR(svm);
-
-	if (!svm)
-		goto out;
-
-	/*
-	 * If we found svm for the PASID, there must be at least one device
-	 * bond.
-	 */
-	if (WARN_ON(list_empty(&svm->devs)))
-		return -EINVAL;
-	sdev = svm_lookup_device_by_dev(svm, dev);
-
-out:
-	*rsvm = svm;
-	*rsdev = sdev;
-
-	return 0;
-}
-
 static int intel_svm_set_dev_pasid(struct iommu_domain *domain,
 				   struct device *dev, ioasid_t pasid)
 {
 	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
 	struct intel_iommu *iommu = info->iommu;
 	struct mm_struct *mm = domain->mm;
-	struct intel_svm_dev *sdev;
+	struct dev_pasid_info *dev_pasid;
 	struct intel_svm *svm;
 	unsigned long sflags;
+	unsigned long flags;
 	int ret = 0;
 
 	svm = pasid_private_find(pasid);
@@ -265,7 +223,6 @@ static int intel_svm_set_dev_pasid(struct iommu_domain *domain,
 
 		svm->pasid = pasid;
 		svm->mm = mm;
-		INIT_LIST_HEAD_RCU(&svm->devs);
 
 		svm->notifier.ops = &intel_mmuops;
 		svm->domain = to_dmar_domain(domain);
@@ -283,26 +240,19 @@ static int intel_svm_set_dev_pasid(struct iommu_domain *domain,
 		}
 	}
 
-	sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
-	if (!sdev) {
-		ret = -ENOMEM;
+	dmar_domain->svm = svm;
+
+	dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
+	if (!dev_pasid)
 		goto free_svm;
-	}
 
-	sdev->dev = dev;
-	sdev->iommu = iommu;
-	sdev->did = FLPT_DEFAULT_DID;
-	sdev->sid = PCI_DEVID(info->bus, info->devfn);
-	if (info->ats_enabled) {
-		sdev->qdep = info->ats_qdep;
-		if (sdev->qdep >= QI_DEV_EIOTLB_MAX_INVS)
-			sdev->qdep = 0;
-	}
+	dev_pasid->dev = dev;
+	dev_pasid->pasid = pasid;
 
 	ret = cache_tag_assign_domain(to_dmar_domain(domain),
 				      FLPT_DEFAULT_DID, dev, pasid);
 	if (ret)
-		goto free_sdev;
+		goto free_dev_pasid;
 
 	/* Setup the pasid table: */
 	sflags = cpu_feature_enabled(X86_FEATURE_LA57) ? PASID_FLAG_FL5LP : 0;
@@ -311,17 +261,19 @@ static int intel_svm_set_dev_pasid(struct iommu_domain *domain,
 	if (ret)
 		goto unassign_tag;
 
-	list_add_rcu(&sdev->list, &svm->devs);
+	spin_lock_irqsave(&dmar_domain->lock, flags);
+	list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids);
+	spin_unlock_irqrestore(&dmar_domain->lock, flags);
 
 	return 0;
 
 unassign_tag:
 	cache_tag_unassign_domain(to_dmar_domain(domain),
 				  FLPT_DEFAULT_DID, dev, pasid);
-free_sdev:
-	kfree(sdev);
+free_dev_pasid:
+	kfree(dev_pasid);
 free_svm:
-	if (list_empty(&svm->devs)) {
+	if (list_empty(&dmar_domain->dev_pasids)) {
 		mmu_notifier_unregister(&svm->notifier, mm);
 		pasid_private_remove(pasid);
 		kfree(svm);
@@ -330,26 +282,17 @@ static int intel_svm_set_dev_pasid(struct iommu_domain *domain,
 	return ret;
 }
 
-void intel_svm_remove_dev_pasid(struct device *dev, u32 pasid)
+void intel_svm_remove_dev_pasid(struct iommu_domain *domain)
 {
-	struct intel_svm_dev *sdev;
-	struct intel_svm *svm;
-	struct mm_struct *mm;
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	struct intel_svm *svm = dmar_domain->svm;
+	struct mm_struct *mm = domain->mm;
 
-	if (pasid_to_svm_sdev(dev, pasid, &svm, &sdev))
-		return;
-	mm = svm->mm;
-
-	if (sdev) {
-		list_del_rcu(&sdev->list);
-		kfree_rcu(sdev, rcu);
-
-		if (list_empty(&svm->devs)) {
-			if (svm->notifier.ops)
-				mmu_notifier_unregister(&svm->notifier, mm);
-			pasid_private_remove(svm->pasid);
-			kfree(svm);
-		}
+	if (list_empty(&dmar_domain->dev_pasids)) {
+		if (svm->notifier.ops)
+			mmu_notifier_unregister(&svm->notifier, mm);
+		pasid_private_remove(svm->pasid);
+		kfree(svm);
 	}
 }
 
@@ -737,8 +680,10 @@ struct iommu_domain *intel_svm_domain_alloc(void)
 		return NULL;
 	domain->domain.ops = &intel_svm_domain_ops;
 	domain->use_first_level = true;
+	INIT_LIST_HEAD(&domain->dev_pasids);
 	INIT_LIST_HEAD(&domain->cache_tags);
 	spin_lock_init(&domain->cache_lock);
+	spin_lock_init(&domain->lock);
 
 	return &domain->domain;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 11/12] iommu: Add ops->domain_alloc_sva()
  2024-03-25  2:16 [PATCH 00/12] Consolidate domain cache invalidation Lu Baolu
                   ` (9 preceding siblings ...)
  2024-03-25  2:17 ` [PATCH 10/12] iommu/vt-d: Retire intel_svm_dev Lu Baolu
@ 2024-03-25  2:17 ` Lu Baolu
  2024-03-25  2:17 ` [PATCH 12/12] iommu/vt-d: Retire struct intel_svm Lu Baolu
  2024-03-28  7:59 ` [PATCH 00/12] Consolidate domain cache invalidation Tian, Kevin
  12 siblings, 0 replies; 44+ messages in thread
From: Lu Baolu @ 2024-03-25  2:17 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian, Jason Gunthorpe
  Cc: Tina Zhang, Yi Liu, iommu, linux-kernel, Jason Gunthorpe,
	Vasant Hegde, Lu Baolu

From: Jason Gunthorpe <jgg@nvidia.com>

Make a new op that receives the device and the mm_struct that the SVA
domain should be created for. Unlike domain_alloc_paging() the dev
argument is never NULL here.

This allows drivers to fully initialize the SVA domain and allocate the
mmu_notifier during allocation. It allows the notifier lifetime to follow
the lifetime of the iommu_domain.

Since we have only one call site, upgrade the new op to return ERR_PTR
instead of NULL.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[Removed smmu3 related changes - Vasant]
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20240311090843.133455-15-vasant.hegde@amd.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/iommu.h     |  3 +++
 drivers/iommu/iommu-sva.c | 16 +++++++++++-----
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2e925b5eba53..8aabe83af8f2 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -518,6 +518,7 @@ static inline int __iommu_copy_struct_from_user_array(
  *                     Upon failure, ERR_PTR must be returned.
  * @domain_alloc_paging: Allocate an iommu_domain that can be used for
  *                       UNMANAGED, DMA, and DMA_FQ domain types.
+ * @domain_alloc_sva: Allocate an iommu_domain for Shared Virtual Addressing.
  * @probe_device: Add device to iommu driver handling
  * @release_device: Remove device from iommu driver handling
  * @probe_finalize: Do final setup work after the device is added to an IOMMU
@@ -558,6 +559,8 @@ struct iommu_ops {
 		struct device *dev, u32 flags, struct iommu_domain *parent,
 		const struct iommu_user_data *user_data);
 	struct iommu_domain *(*domain_alloc_paging)(struct device *dev);
+	struct iommu_domain *(*domain_alloc_sva)(struct device *dev,
+						 struct mm_struct *mm);
 
 	struct iommu_device *(*probe_device)(struct device *dev);
 	void (*release_device)(struct device *dev);
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 640acc804e8c..18a35e798b72 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -108,8 +108,8 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 
 	/* Allocate a new domain and set it on device pasid. */
 	domain = iommu_sva_domain_alloc(dev, mm);
-	if (!domain) {
-		ret = -ENOMEM;
+	if (IS_ERR(domain)) {
+		ret = PTR_ERR(domain);
 		goto out_free_handle;
 	}
 
@@ -283,9 +283,15 @@ struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
 	const struct iommu_ops *ops = dev_iommu_ops(dev);
 	struct iommu_domain *domain;
 
-	domain = ops->domain_alloc(IOMMU_DOMAIN_SVA);
-	if (!domain)
-		return NULL;
+	if (ops->domain_alloc_sva) {
+		domain = ops->domain_alloc_sva(dev, mm);
+		if (IS_ERR(domain))
+			return domain;
+	} else {
+		domain = ops->domain_alloc(IOMMU_DOMAIN_SVA);
+		if (!domain)
+			return ERR_PTR(-ENOMEM);
+	}
 
 	domain->type = IOMMU_DOMAIN_SVA;
 	mmgrab(mm);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 12/12] iommu/vt-d: Retire struct intel_svm
  2024-03-25  2:16 [PATCH 00/12] Consolidate domain cache invalidation Lu Baolu
                   ` (10 preceding siblings ...)
  2024-03-25  2:17 ` [PATCH 11/12] iommu: Add ops->domain_alloc_sva() Lu Baolu
@ 2024-03-25  2:17 ` Lu Baolu
  2024-03-28  7:59 ` [PATCH 00/12] Consolidate domain cache invalidation Tian, Kevin
  12 siblings, 0 replies; 44+ messages in thread
From: Lu Baolu @ 2024-03-25  2:17 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian, Jason Gunthorpe
  Cc: Tina Zhang, Yi Liu, iommu, linux-kernel, Lu Baolu

The struct intel_svm was used to keep information about the devices attached
to an SVA domain. Since an SVA domain is a kind of iommu_domain, struct
dmar_domain should centralize all info of an SVA domain, including the info
of attached devices. Therefore, retire struct intel_svm to clean up the code.

Besides, allocate the SVA domain in the domain_alloc_sva() callback, which
allows the memory management notifier's lifetime to follow the lifetime of
the iommu_domain.

Co-developed-by: Tina Zhang <tina.zhang@intel.com>
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel/iommu.h | 26 ++++------
 drivers/iommu/intel/iommu.c |  9 +---
 drivers/iommu/intel/svm.c   | 94 +++++++++----------------------------
 3 files changed, 32 insertions(+), 97 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 5dcced6f3634..7a4645e06bb7 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -649,8 +649,12 @@ struct dmar_domain {
 			/* link to parent domain siblings */
 			struct list_head s2_link;
 		};
+
+		/* SVA domain */
+		struct {
+			struct mmu_notifier notifier;
+		};
 	};
-	struct intel_svm *svm;
 
 	struct iommu_domain domain;	/* generic domain data structure for
 					   iommu core */
@@ -1131,26 +1135,16 @@ int intel_svm_enable_prq(struct intel_iommu *iommu);
 int intel_svm_finish_prq(struct intel_iommu *iommu);
 void intel_svm_page_response(struct device *dev, struct iopf_fault *evt,
 			     struct iommu_page_response *msg);
-struct iommu_domain *intel_svm_domain_alloc(void);
-void intel_svm_remove_dev_pasid(struct iommu_domain *domain);
+struct iommu_domain *intel_svm_domain_alloc(struct device *dev,
+					    struct mm_struct *mm);
 void intel_drain_pasid_prq(struct device *dev, u32 pasid);
-
-struct intel_svm {
-	struct mmu_notifier notifier;
-	struct mm_struct *mm;
-	u32 pasid;
-	struct dmar_domain *domain;
-};
 #else
 static inline void intel_svm_check(struct intel_iommu *iommu) {}
 static inline void intel_drain_pasid_prq(struct device *dev, u32 pasid) {}
-static inline struct iommu_domain *intel_svm_domain_alloc(void)
-{
-	return NULL;
-}
-
-static inline void intel_svm_remove_dev_pasid(struct iommu_domain *domain)
+static inline struct iommu_domain *intel_svm_domain_alloc(struct device *dev,
+							  struct mm_struct *mm)
 {
+	return ERR_PTR(-ENODEV);
 }
 #endif
 
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 2ac1fc041333..cdf7882f58f3 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3687,8 +3687,6 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
 		return domain;
 	case IOMMU_DOMAIN_IDENTITY:
 		return &si_domain->domain;
-	case IOMMU_DOMAIN_SVA:
-		return intel_svm_domain_alloc();
 	default:
 		return NULL;
 	}
@@ -4395,14 +4393,8 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
 	WARN_ON_ONCE(!dev_pasid);
 	spin_unlock_irqrestore(&dmar_domain->lock, flags);
 
-	/*
-	 * The SVA implementation needs to handle its own stuffs like the mm
-	 * notification. Before consolidating that code into iommu core, let
-	 * the intel sva code handle it.
-	 */
 	if (domain->type == IOMMU_DOMAIN_SVA) {
 		cache_tag_unassign_domain(dmar_domain, FLPT_DEFAULT_DID, dev, pasid);
-		intel_svm_remove_dev_pasid(domain);
 	} else {
 		did = domain_id_iommu(dmar_domain, iommu);
 		cache_tag_unassign_domain(dmar_domain, did, dev, pasid);
@@ -4631,6 +4623,7 @@ const struct iommu_ops intel_iommu_ops = {
 	.hw_info		= intel_iommu_hw_info,
 	.domain_alloc		= intel_iommu_domain_alloc,
 	.domain_alloc_user	= intel_iommu_domain_alloc_user,
+	.domain_alloc_sva	= intel_svm_domain_alloc,
 	.probe_device		= intel_iommu_probe_device,
 	.probe_finalize		= intel_iommu_probe_finalize,
 	.release_device		= intel_iommu_release_device,
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 0b767d16fb71..47e475f67046 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -26,23 +26,6 @@
 
 static irqreturn_t prq_event_thread(int irq, void *d);
 
-static DEFINE_XARRAY_ALLOC(pasid_private_array);
-static int pasid_private_add(ioasid_t pasid, void *priv)
-{
-	return xa_alloc(&pasid_private_array, &pasid, priv,
-			XA_LIMIT(pasid, pasid), GFP_ATOMIC);
-}
-
-static void pasid_private_remove(ioasid_t pasid)
-{
-	xa_erase(&pasid_private_array, pasid);
-}
-
-static void *pasid_private_find(ioasid_t pasid)
-{
-	return xa_load(&pasid_private_array, pasid);
-}
-
 int intel_svm_enable_prq(struct intel_iommu *iommu)
 {
 	struct iopf_queue *iopfq;
@@ -156,8 +139,7 @@ static void intel_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn,
 					struct mm_struct *mm,
 					unsigned long start, unsigned long end)
 {
-	struct intel_svm *svm = container_of(mn, struct intel_svm, notifier);
-	struct dmar_domain *domain = svm->domain;
+	struct dmar_domain *domain = container_of(mn, struct dmar_domain, notifier);
 
 	if (start == 0 && end == -1UL) {
 		cache_tag_flush_all(domain);
@@ -169,8 +151,7 @@ static void intel_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn,
 
 static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
 {
-	struct intel_svm *svm = container_of(mn, struct intel_svm, notifier);
-	struct dmar_domain *domain = svm->domain;
+	struct dmar_domain *domain = container_of(mn, struct dmar_domain, notifier);
 	struct dev_pasid_info *dev_pasid;
 	struct device_domain_info *info;
 	unsigned long flags;
@@ -210,41 +191,13 @@ static int intel_svm_set_dev_pasid(struct iommu_domain *domain,
 	struct intel_iommu *iommu = info->iommu;
 	struct mm_struct *mm = domain->mm;
 	struct dev_pasid_info *dev_pasid;
-	struct intel_svm *svm;
 	unsigned long sflags;
 	unsigned long flags;
 	int ret = 0;
 
-	svm = pasid_private_find(pasid);
-	if (!svm) {
-		svm = kzalloc(sizeof(*svm), GFP_KERNEL);
-		if (!svm)
-			return -ENOMEM;
-
-		svm->pasid = pasid;
-		svm->mm = mm;
-
-		svm->notifier.ops = &intel_mmuops;
-		svm->domain = to_dmar_domain(domain);
-		ret = mmu_notifier_register(&svm->notifier, mm);
-		if (ret) {
-			kfree(svm);
-			return ret;
-		}
-
-		ret = pasid_private_add(svm->pasid, svm);
-		if (ret) {
-			mmu_notifier_unregister(&svm->notifier, mm);
-			kfree(svm);
-			return ret;
-		}
-	}
-
-	dmar_domain->svm = svm;
-
 	dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
 	if (!dev_pasid)
-		goto free_svm;
+		return -ENOMEM;
 
 	dev_pasid->dev = dev;
 	dev_pasid->pasid = pasid;
@@ -272,30 +225,10 @@ static int intel_svm_set_dev_pasid(struct iommu_domain *domain,
 				  FLPT_DEFAULT_DID, dev, pasid);
 free_dev_pasid:
 	kfree(dev_pasid);
-free_svm:
-	if (list_empty(&dmar_domain->dev_pasids)) {
-		mmu_notifier_unregister(&svm->notifier, mm);
-		pasid_private_remove(pasid);
-		kfree(svm);
-	}
 
 	return ret;
 }
 
-void intel_svm_remove_dev_pasid(struct iommu_domain *domain)
-{
-	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
-	struct intel_svm *svm = dmar_domain->svm;
-	struct mm_struct *mm = domain->mm;
-
-	if (list_empty(&dmar_domain->dev_pasids)) {
-		if (svm->notifier.ops)
-			mmu_notifier_unregister(&svm->notifier, mm);
-		pasid_private_remove(svm->pasid);
-		kfree(svm);
-	}
-}
-
 /* Page request queue descriptor */
 struct page_req_dsc {
 	union {
@@ -663,7 +596,12 @@ void intel_svm_page_response(struct device *dev, struct iopf_fault *evt,
 
 static void intel_svm_domain_free(struct iommu_domain *domain)
 {
-	kfree(to_dmar_domain(domain));
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+
+	if (dmar_domain->notifier.ops)
+		mmu_notifier_unregister(&dmar_domain->notifier, domain->mm);
+
+	kfree(dmar_domain);
 }
 
 static const struct iommu_domain_ops intel_svm_domain_ops = {
@@ -671,13 +609,16 @@ static const struct iommu_domain_ops intel_svm_domain_ops = {
 	.free			= intel_svm_domain_free
 };
 
-struct iommu_domain *intel_svm_domain_alloc(void)
+struct iommu_domain *intel_svm_domain_alloc(struct device *dev,
+					    struct mm_struct *mm)
 {
 	struct dmar_domain *domain;
+	int ret;
 
 	domain = kzalloc(sizeof(*domain), GFP_KERNEL);
 	if (!domain)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
+
 	domain->domain.ops = &intel_svm_domain_ops;
 	domain->use_first_level = true;
 	INIT_LIST_HEAD(&domain->dev_pasids);
@@ -685,5 +626,12 @@ struct iommu_domain *intel_svm_domain_alloc(void)
 	spin_lock_init(&domain->cache_lock);
 	spin_lock_init(&domain->lock);
 
+	domain->notifier.ops = &intel_mmuops;
+	ret = mmu_notifier_register(&domain->notifier, mm);
+	if (ret) {
+		kfree(domain);
+		return ERR_PTR(ret);
+	}
+
 	return &domain->domain;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* RE: [PATCH 01/12] iommu/vt-d: Add cache tag assignment interface
  2024-03-25  2:16 ` [PATCH 01/12] iommu/vt-d: Add cache tag assignment interface Lu Baolu
@ 2024-03-28  7:12   ` Tian, Kevin
  2024-04-06 12:55     ` Baolu Lu
  2024-04-10 15:41   ` Jason Gunthorpe
  1 sibling, 1 reply; 44+ messages in thread
From: Tian, Kevin @ 2024-03-28  7:12 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe
  Cc: Zhang, Tina, Liu, Yi L, iommu, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Monday, March 25, 2024 10:17 AM
> 
> +enum cache_tag_type {
> +	CACHE_TAG_TYPE_IOTLB,
> +	CACHE_TAG_TYPE_DEVTLB,
> +	CACHE_TAG_TYPE_PARENT_IOTLB,
> +	CACHE_TAG_TYPE_PARENT_DEVTLB,
> +};

'_TYPE_' can be removed to make it shorter

> +
> +/* Checks if an existing cache tag can be reused for a new association. */
> +static bool cache_tag_reusable(struct cache_tag *tag, u16 domain_id,
> +			       struct intel_iommu *iommu, struct device *dev,
> +			       ioasid_t pasid, enum cache_tag_type type)

cache_tage_match()

> +{
> +	if (tag->type != type)
> +		return false;
> +
> +	if (tag->domain_id != domain_id || tag->pasid != pasid)
> +		return false;
> +
> +	if (type == CACHE_TAG_TYPE_IOTLB)
> +		return tag->iommu == iommu;
> +
> +	if (type == CACHE_TAG_TYPE_DEVTLB)
> +		return tag->dev == dev;
> +
> +	return false;

why do you disallow PARENT_TYPE from reusing? It's not uncommon
to have two devices attached to the same nested domain, hence with
the same parent domain. Disallowing tag reuse implies unnecessarily
duplicated cache flushes...

> +}
> +
> +/* Assign a cache tag with specified type to domain. */
> +static int cache_tag_assign(struct dmar_domain *domain, u16 did,
> +			    struct device *dev, ioasid_t pasid,
> +			    enum cache_tag_type type)
> +{
> +	struct device_domain_info *info = dev_iommu_priv_get(dev);
> +	struct intel_iommu *iommu = info->iommu;
> +	struct cache_tag *tag, *temp;
> +	unsigned long flags;
> +
> +	tag = kzalloc(sizeof(*tag), GFP_KERNEL);
> +	if (!tag)
> +		return -ENOMEM;
> +
> +	tag->type = type;
> +	tag->iommu = iommu;
> +	tag->dev = dev;

should we set tag->dev only for DEVTLB type? It's a bit confusing to set
it for IOTLB type which doesn't care about device. Actually doing so
is instead misleading as the 1st device creating the tag may have been
detached but then it will still show up in the trace when the last device
detach destroys the tag.

> +static int __cache_tag_assign_parent_domain(struct dmar_domain
> *domain, u16 did,
> +					    struct device *dev, ioasid_t pasid)

this pair is similar to the earlier one, except for the difference in type.

what about keeping just one pair which accepts a 'parent' argument to
decide the type internally?
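
For illustration, one possible shape of such a single pair (just a sketch:
the 'parent' flag, the cache_tag_unassign() counterpart and the
info->ats_enabled check are assumptions based on the rest of the patch, not
the actual code):

static int __cache_tag_assign_domain(struct dmar_domain *domain, u16 did,
                                     struct device *dev, ioasid_t pasid,
                                     bool parent)
{
        struct device_domain_info *info = dev_iommu_priv_get(dev);
        enum cache_tag_type iotlb, devtlb;
        int ret;

        iotlb = parent ? CACHE_TAG_TYPE_PARENT_IOTLB : CACHE_TAG_TYPE_IOTLB;
        devtlb = parent ? CACHE_TAG_TYPE_PARENT_DEVTLB : CACHE_TAG_TYPE_DEVTLB;

        /* Always assign an IOTLB tag; DEVTLB only matters with ATS enabled. */
        ret = cache_tag_assign(domain, did, dev, pasid, iotlb);
        if (ret || !info->ats_enabled)
                return ret;

        ret = cache_tag_assign(domain, did, dev, pasid, devtlb);
        if (ret)
                cache_tag_unassign(domain, did, dev, pasid, iotlb);

        return ret;
}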


> +/*
> + * Assigns cache tags to a domain when it's associated with a device's
> + * PASID using a specific domain ID.

s/Assigns/Assign/

> +
> +	did = domain_id_iommu(domain, iommu);
> +	ret = cache_tag_assign_domain(domain, did, dev,
> IOMMU_NO_PASID);

there are many occurrences of this pattern. What about passing in
an 'iommu' parameter and getting 'did' inside the helper? For SVM
it can be specialized internally too.
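
For illustration, such a wrapper could look roughly like this (a sketch only:
the function name is hypothetical, and handling SVA via FLPT_DEFAULT_DID is
just one way the "specialized internally" part could be done):

static int cache_tag_assign_domain_for(struct dmar_domain *domain,
                                       struct intel_iommu *iommu,
                                       struct device *dev, ioasid_t pasid)
{
        u16 did;

        /* SVA domains always use the first-level default domain ID. */
        if (domain->domain.type == IOMMU_DOMAIN_SVA)
                did = FLPT_DEFAULT_DID;
        else
                did = domain_id_iommu(domain, iommu);

        return cache_tag_assign_domain(domain, did, dev, pasid);
}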

> @@ -4607,10 +4623,11 @@ static void
> intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
>  	 */
>  	if (domain->type == IOMMU_DOMAIN_SVA) {
>  		intel_svm_remove_dev_pasid(dev, pasid);
> +		cache_tag_unassign_domain(dmar_domain,
> +					  FLPT_DEFAULT_DID, dev, pasid);

is it correct to destroy the tag before teardown completes, e.g. iotlb still
needs to be flushed in intel_pasid_tear_down_entry()?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH 02/12] iommu/vt-d: Add cache tag invalidation helpers
  2024-03-25  2:16 ` [PATCH 02/12] iommu/vt-d: Add cache tag invalidation helpers Lu Baolu
@ 2024-03-28  7:39   ` Tian, Kevin
  2024-04-07  5:33     ` Baolu Lu
  0 siblings, 1 reply; 44+ messages in thread
From: Tian, Kevin @ 2024-03-28  7:39 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe
  Cc: Zhang, Tina, Liu, Yi L, iommu, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Monday, March 25, 2024 10:17 AM
> 
> +
> +static unsigned long calculate_psi_aligned_address(unsigned long start,
> +						   unsigned long end,
> +						   unsigned long *_pages,
> +						   unsigned long *_mask)
> +{
> +	unsigned long pages = (end - start + VTD_PAGE_SIZE - 1) >>
> VTD_PAGE_SHIFT;

this doesn't sound correct. You'd want to follow aligned_nrpages().

> +		case CACHE_TAG_TYPE_DEVTLB:
> +			if (tag->pasid == IOMMU_NO_PASID)
> +				qi_flush_dev_iotlb(iommu, sid, info->pfsid,
> +						   info->ats_qdep, addr,
> mask);
> +			else
> +				qi_flush_dev_iotlb_pasid(iommu, sid, info-
> >pfsid,
> +							 tag->pasid, info-
> >ats_qdep,
> +							 addr, mask);
> +
> +			quirk_extra_dev_tlb_flush(info, addr, mask, tag-
> >pasid, info->ats_qdep);
> +			break;
> +		case CACHE_TAG_TYPE_PARENT_DEVTLB:
> +			/*
> +			 * Address translation cache in device side caches the
> +			 * result of nested translation. There is no easy way
> +			 * to identify the exact set of nested translations
> +			 * affected by a change in S2. So just flush the entire
> +			 * device cache.
> +			 */
> +			qi_flush_dev_iotlb(iommu, sid, info->pfsid, info-
> >ats_qdep,
> +					   0, MAX_AGAW_PFN_WIDTH);
> +			quirk_extra_dev_tlb_flush(info, 0,
> MAX_AGAW_PFN_WIDTH,
> +						  IOMMU_NO_PASID, info-
> >ats_qdep);
> +			break;

parent devtlb can have a pasid too, though it's not enabled now. As core helpers
we can probably support it now, leading to simpler code:

	case CACHE_TAG_TYPE_PARENT_DEVTLB:
		//change addr/mask
		//fallthrough
	case CACHE_TAG_TYPE_DEVTLB:
		//what this patch does

> +void cache_tag_flush_all(struct dmar_domain *domain)
> +{
> +	struct cache_tag *tag;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&domain->cache_lock, flags);
> +	list_for_each_entry(tag, &domain->cache_tags, node) {
> +		struct device_domain_info *info = dev_iommu_priv_get(tag-
> >dev);
> +		struct intel_iommu *iommu = tag->iommu;
> +		u16 sid = PCI_DEVID(info->bus, info->devfn);
> +
> +		switch (tag->type) {
> +		case CACHE_TAG_TYPE_IOTLB:
> +		case CACHE_TAG_TYPE_PARENT_IOTLB:
> +			if (domain->use_first_level)
> +				qi_flush_piotlb(iommu, tag->domain_id,
> +						tag->pasid, 0, -1, 0);
> +			else
> +				iommu->flush.flush_iotlb(iommu, tag-
> >domain_id,
> +							 0, 0,
> DMA_TLB_DSI_FLUSH);
> +			break;
> +		case CACHE_TAG_TYPE_DEVTLB:
> +		case CACHE_TAG_TYPE_PARENT_DEVTLB:
> +			qi_flush_dev_iotlb(iommu, sid, info->pfsid, info-
> >ats_qdep,
> +					   0, MAX_AGAW_PFN_WIDTH);
> +			quirk_extra_dev_tlb_flush(info, 0,
> MAX_AGAW_PFN_WIDTH,
> +						  IOMMU_NO_PASID, info-
> >ats_qdep);
> +			break;
> +		}
> +	}

could this loop be consolidated with the one in the previous helper? Anyway,
the only difference is the addr/mask...

> +/*
> + * Invalidate a range of IOVA when IOMMU is in caching mode and new
> mappings
> + * are added to the target domain.
> + */
> +void cache_tag_flush_cm_range(struct dmar_domain *domain, unsigned
> long start,
> +			      unsigned long end)
> +{

I'm also not sure why this is worth a separate helper. Why couldn't it
be managed by the previous flush_range()?

> +	unsigned long pages, mask, addr;
> +	struct cache_tag *tag;
> +	unsigned long flags;
> +
> +	addr = calculate_psi_aligned_address(start, end, &pages, &mask);
> +
> +	spin_lock_irqsave(&domain->cache_lock, flags);
> +	list_for_each_entry(tag, &domain->cache_tags, node) {
> +		struct intel_iommu *iommu = tag->iommu;
> +
> +		/*
> +		 * When IOMMU is enabled in caching mode some non-
> present
> +		 * mappings may be cached by the IOTLB if it uses second-
> +		 * only translation.
> +		 */
> +		if (!cap_caching_mode(iommu->cap) || domain-
> >use_first_level) {
> +			iommu_flush_write_buffer(iommu);
> +			continue;
> +		}

the comment really doesn't tell what this block does. from its intention
it is probably more suitable to stay on the caller side, as today.

> +
> +		if (tag->type == CACHE_TAG_TYPE_IOTLB ||
> +		    tag->type == CACHE_TAG_TYPE_PARENT_IOTLB) {
> +			/*
> +			 * Fallback to domain selective flush if no
> +			 * PSI support or the size is too big.
> +			 */
> +			if (!cap_pgsel_inv(iommu->cap) ||
> +			    mask > cap_max_amask_val(iommu->cap))
> +				iommu->flush.flush_iotlb(iommu, tag-
> >domain_id,
> +							 0, 0,
> DMA_TLB_DSI_FLUSH);
> +			else
> +				iommu->flush.flush_iotlb(iommu, tag-
> >domain_id,
> +							 addr, mask,
> +
> DMA_TLB_PSI_FLUSH);
> +		}
> +	}

You skipped devtlb invalidation. Yes, it's not necessary based on the current
nested translation design, and this part is inconsistent in different paths.

But as a semantics change you may want to first make removing the devtlb
invalidation a separate patch and then do this cleanup, so bisecting is easier.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH 04/12] iommu/vt-d: Use cache_tag_flush_all() in flush_iotlb_all
  2024-03-25  2:16 ` [PATCH 04/12] iommu/vt-d: Use cache_tag_flush_all() in flush_iotlb_all Lu Baolu
@ 2024-03-28  7:47   ` Tian, Kevin
  2024-04-07  5:56     ` Baolu Lu
  0 siblings, 1 reply; 44+ messages in thread
From: Tian, Kevin @ 2024-03-28  7:47 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe
  Cc: Zhang, Tina, Liu, Yi L, iommu, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Monday, March 25, 2024 10:17 AM
> 
> The flush_iotlb_all callback is called by the iommu core to flush
> all caches for the affected domain. Use cache_tag_flush_all() in
> this callback.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel/iommu.c | 21 +--------------------
>  1 file changed, 1 insertion(+), 20 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 93e4422c9b10..4ce98f23917c 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1560,26 +1560,7 @@ static void parent_domain_flush(struct
> dmar_domain *domain,
> 
>  static void intel_flush_iotlb_all(struct iommu_domain *domain)
>  {
> -	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> -	struct iommu_domain_info *info;
> -	unsigned long idx;
> -
> -	xa_for_each(&dmar_domain->iommu_array, idx, info) {
> -		struct intel_iommu *iommu = info->iommu;
> -		u16 did = domain_id_iommu(dmar_domain, iommu);
> -
> -		if (dmar_domain->use_first_level)
> -			domain_flush_pasid_iotlb(iommu, dmar_domain, 0,
> -1, 0);
> -		else
> -			iommu->flush.flush_iotlb(iommu, did, 0, 0,
> -						 DMA_TLB_DSI_FLUSH);
> -
> -		if (!cap_caching_mode(iommu->cap))
> -			iommu_flush_dev_iotlb(dmar_domain, 0,
> MAX_AGAW_PFN_WIDTH);
> -	}
> -
> -	if (dmar_domain->nested_parent)
> -		parent_domain_flush(dmar_domain, 0, -1, 0);
> +	cache_tag_flush_all(to_dmar_domain(domain));
>  }
> 

this replacement causes a functional change. Now devtlb is always
invalidated, while the old code doesn't do so in caching mode.

You probably want to first clean up all the inconsistent devtlb
invalidation policies for caching mode before moving to this series...
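
For reference, this is roughly the guard the old path applied, and the kind
of check that could be folded into the DEVTLB cases of the new helper (a
sketch under that assumption, not the actual patch):

                case CACHE_TAG_TYPE_DEVTLB:
                case CACHE_TAG_TYPE_PARENT_DEVTLB:
                        /* The old code skipped the devtlb flush in caching mode. */
                        if (cap_caching_mode(iommu->cap))
                                break;
                        qi_flush_dev_iotlb(iommu, sid, info->pfsid, info->ats_qdep,
                                           0, MAX_AGAW_PFN_WIDTH);
                        quirk_extra_dev_tlb_flush(info, 0, MAX_AGAW_PFN_WIDTH,
                                                  IOMMU_NO_PASID, info->ats_qdep);
                        break;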

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH 06/12] iommu/vt-d: Use cache_tag_flush_cm_range() in iotlb_sync_map
  2024-03-25  2:16 ` [PATCH 06/12] iommu/vt-d: Use cache_tag_flush_cm_range() in iotlb_sync_map Lu Baolu
@ 2024-03-28  7:48   ` Tian, Kevin
  2024-04-07  6:41     ` Baolu Lu
  0 siblings, 1 reply; 44+ messages in thread
From: Tian, Kevin @ 2024-03-28  7:48 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe
  Cc: Zhang, Tina, Liu, Yi L, iommu, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Monday, March 25, 2024 10:17 AM
> 
> The iotlb_sync_map callback is called by the iommu core after non-present
> to present mappings are created. The iommu driver uses this callback to
> invalidate caches if IOMMU is working in caching mode and second-only
> translation is used for the domain. Use cache_tag_flush_cm_range() in this
> callback.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel/iommu.c | 22 +---------------------
>  1 file changed, 1 insertion(+), 21 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 1c03d2dafb9d..2dcab1e5cd4d 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1504,20 +1504,6 @@ static void iommu_flush_iotlb_psi(struct
> intel_iommu *iommu,
>  		iommu_flush_dev_iotlb(domain, addr, mask);
>  }
> 
> -/* Notification for newly created mappings */
> -static void __mapping_notify_one(struct intel_iommu *iommu, struct
> dmar_domain *domain,
> -				 unsigned long pfn, unsigned int pages)
> -{
> -	/*
> -	 * It's a non-present to present mapping. Only flush if caching mode
> -	 * and second level.
> -	 */
> -	if (cap_caching_mode(iommu->cap) && !domain->use_first_level)
> -		iommu_flush_iotlb_psi(iommu, domain, pfn, pages, 0, 1);
> -	else
> -		iommu_flush_write_buffer(iommu);
> -}

iommu_flush_write_buffer() is for quite a different issue. It's clearer to
keep it separate from the iotlb helpers.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH 07/12] iommu/vt-d: Cleanup use of iommu_flush_iotlb_psi()
  2024-03-25  2:17 ` [PATCH 07/12] iommu/vt-d: Cleanup use of iommu_flush_iotlb_psi() Lu Baolu
@ 2024-03-28  7:50   ` Tian, Kevin
  2024-04-07  7:06     ` Baolu Lu
  0 siblings, 1 reply; 44+ messages in thread
From: Tian, Kevin @ 2024-03-28  7:50 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe
  Cc: Zhang, Tina, Liu, Yi L, iommu, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Monday, March 25, 2024 10:17 AM
> 
> Use cache_tag_flush_range() in switch_to_super_page() to invalidate the
> necessary caches when switching mappings from normal to super pages. The
> iommu_flush_iotlb_psi() call in intel_iommu_memory_notifier() is
> unnecessary since there should be no cache invalidation for the identity
> domain.
> 

what about a software identity domain?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH 08/12] iommu/vt-d: Use cache_tag_flush_range() in cache_invalidate_user
  2024-03-25  2:17 ` [PATCH 08/12] iommu/vt-d: Use cache_tag_flush_range() in cache_invalidate_user Lu Baolu
@ 2024-03-28  7:54   ` Tian, Kevin
  2024-04-07  7:15     ` Baolu Lu
  0 siblings, 1 reply; 44+ messages in thread
From: Tian, Kevin @ 2024-03-28  7:54 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe
  Cc: Zhang, Tina, Liu, Yi L, iommu, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Monday, March 25, 2024 10:17 AM
>
> @@ -166,9 +122,9 @@ static int intel_nested_cache_invalidate_user(struct
> iommu_domain *domain,
>  			break;
>  		}
> 
> -		intel_nested_flush_cache(dmar_domain, inv_entry.addr,
> -					 inv_entry.npages,
> -					 inv_entry.flags &
> IOMMU_VTD_INV_FLAGS_LEAF);
> +		cache_tag_flush_range(dmar_domain, inv_entry.addr,
> +				      inv_entry.npages,
> +				      inv_entry.flags &
> IOMMU_VTD_INV_FLAGS_LEAF);

the helper requires an 'end' address but 'npages' is incorrectly used here.
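
For illustration, one way the call could be converted to pass an end address
(a sketch; treating npages == U64_MAX as "whole address space" follows the
check in the removed intel_nested_flush_cache(), but the exact convention
expected by cache_tag_flush_range() is an assumption):

		/* Convert the user-provided page count into an inclusive end address. */
		u64 end = (inv_entry.npages == U64_MAX) ? U64_MAX :
			  inv_entry.addr + (inv_entry.npages << VTD_PAGE_SHIFT) - 1;

		cache_tag_flush_range(dmar_domain, inv_entry.addr, end,
				      inv_entry.flags & IOMMU_VTD_INV_FLAGS_LEAF);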

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH 00/12] Consolidate domain cache invalidation
  2024-03-25  2:16 [PATCH 00/12] Consolidate domain cache invalidation Lu Baolu
                   ` (11 preceding siblings ...)
  2024-03-25  2:17 ` [PATCH 12/12] iommu/vt-d: Retire struct intel_svm Lu Baolu
@ 2024-03-28  7:59 ` Tian, Kevin
  2024-04-07  7:28   ` Baolu Lu
  12 siblings, 1 reply; 44+ messages in thread
From: Tian, Kevin @ 2024-03-28  7:59 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe
  Cc: Zhang, Tina, Liu, Yi L, iommu, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Monday, March 25, 2024 10:17 AM
> 
> The IOMMU hardware cache needs to be invalidated whenever the
> mappings
> in the domain are changed. Currently, domain cache invalidation is
> scattered across different places, causing several issues:
> 
> - IOMMU IOTLB Invalidation: This is done by iterating through the domain
>   IDs of each domain using the following code:
> 
>         xa_for_each(&dmar_domain->iommu_array, i, info)
>                 iommu_flush_iotlb_psi(info->iommu, dmar_domain,
>                                       start_pfn, nrpages,
>                                       list_empty(&gather->freelist), 0);
> 
>   This code could theoretically cause a use-after-free problem because
>   there's no lock to protect the "info" pointer within the loop.
> 
> - Inconsistent Invalidation Methods: Different domain types implement
>   their own cache invalidation methods, making the code difficult to
>   maintain. For example, the DMA domain, SVA domain, and nested domain
>   have similar cache invalidation code scattered across different files.
> 
> - SVA Domain Inconsistency: The SVA domain implementation uses a
>   completely different data structure to track attached devices compared
>   to other domains. This creates unnecessary differences and, even
>   worse, leads to duplicate IOTLB invalidation when an SVA domain is
>   attached to devices belonging to different IOMMU domains.

can you elaborate how duplicated invalidations are caused?

> 
> - Nested Domain Dependency: The special overlap between a nested domain
>   and its parent domain requires a dedicated parent_domain_flush()
>   helper function to be called everywhere the parent domain's mapping
>   changes.
> 
> - Limited Debugging Support: There are currently no debugging aids
>   available for domain cache invalidation.
> 
> By consolidating domain cache invalidation into a common location, we
> can address the issues mentioned above and improve the code's
> maintainability and debuggability.
> 

overall this is nice work!

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 01/12] iommu/vt-d: Add cache tag assignment interface
  2024-03-28  7:12   ` Tian, Kevin
@ 2024-04-06 12:55     ` Baolu Lu
  2024-04-07  4:35       ` Baolu Lu
  0 siblings, 1 reply; 44+ messages in thread
From: Baolu Lu @ 2024-04-06 12:55 UTC (permalink / raw)
  To: Tian, Kevin, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe
  Cc: baolu.lu, Zhang, Tina, Liu, Yi L, iommu, linux-kernel

Hi Kevin,

Thanks for your review comments.

On 3/28/24 3:12 PM, Tian, Kevin wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>> Sent: Monday, March 25, 2024 10:17 AM
>>
>> +enum cache_tag_type {
>> +	CACHE_TAG_TYPE_IOTLB,
>> +	CACHE_TAG_TYPE_DEVTLB,
>> +	CACHE_TAG_TYPE_PARENT_IOTLB,
>> +	CACHE_TAG_TYPE_PARENT_DEVTLB,
>> +};
> 
> '_TYPE_' can be removed to make it shorter

Okay.

> 
>> +
>> +/* Checks if an existing cache tag can be reused for a new association. */
>> +static bool cache_tag_reusable(struct cache_tag *tag, u16 domain_id,
>> +			       struct intel_iommu *iommu, struct device *dev,
>> +			       ioasid_t pasid, enum cache_tag_type type)
> 
> cache_tage_match()

Okay.

> 
>> +{
>> +	if (tag->type != type)
>> +		return false;
>> +
>> +	if (tag->domain_id != domain_id || tag->pasid != pasid)
>> +		return false;
>> +
>> +	if (type == CACHE_TAG_TYPE_IOTLB)
>> +		return tag->iommu == iommu;
>> +
>> +	if (type == CACHE_TAG_TYPE_DEVTLB)
>> +		return tag->dev == dev;
>> +
>> +	return false;
> 
> why do you disallow PARENT_TYPE from reusing? It's not uncommon
> to have two devices attached to the same nested domain, hence with
> the same parent domain. Disallowing tag reuse implies unnecessarily
> duplicated cache flushes...

PARENT_TYPE could be reused. The new helper looks like the following:

/* Checks if an existing cache tag can be reused for a new association. */
static bool cache_tage_match(struct cache_tag *tag, u16 domain_id,
                                struct intel_iommu *iommu, struct device *dev,
                                ioasid_t pasid, enum cache_tag_type type)
{
         if (tag->type != type)
                 return false;

         if (tag->domain_id != domain_id || tag->pasid != pasid)
                 return false;

         if (type == CACHE_TAG_IOTLB || type == CACHE_TAG_PARENT_IOTLB)
                 return tag->iommu == iommu;

         if (type == CACHE_TAG_DEVTLB || type == CACHE_TAG_PARENT_DEVTLB)
                 return tag->dev == dev;

         return false;
}

>> +}
>> +
>> +/* Assign a cache tag with specified type to domain. */
>> +static int cache_tag_assign(struct dmar_domain *domain, u16 did,
>> +			    struct device *dev, ioasid_t pasid,
>> +			    enum cache_tag_type type)
>> +{
>> +	struct device_domain_info *info = dev_iommu_priv_get(dev);
>> +	struct intel_iommu *iommu = info->iommu;
>> +	struct cache_tag *tag, *temp;
>> +	unsigned long flags;
>> +
>> +	tag = kzalloc(sizeof(*tag), GFP_KERNEL);
>> +	if (!tag)
>> +		return -ENOMEM;
>> +
>> +	tag->type = type;
>> +	tag->iommu = iommu;
>> +	tag->dev = dev;
> 
> should we set tag->dev only for DEVTLB type? It's a bit confusing to set
> it for IOTLB type which doesn't care about device. Actually doing so
> is instead misleading as the 1st device creating the tag may have been
> detached but then it will still show up in the trace when the last device
> detach destroys the tag.

For IOTLB types, perhaps we could set tag->dev to the IOMMU's struct device
instead. This way, tag->dev would more directly indicate the device
implementing the cache.
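
A minimal sketch of that idea (assuming the struct device embedded in the
IOMMU's iommu_device is the right pointer to use; this is not the final
code):

        /* Device TLB tags point at the endpoint; IOTLB tags at the IOMMU itself. */
        if (type == CACHE_TAG_DEVTLB || type == CACHE_TAG_PARENT_DEVTLB)
                tag->dev = dev;
        else
                tag->dev = iommu->iommu.dev;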

> 
>> +static int __cache_tag_assign_parent_domain(struct dmar_domain
>> *domain, u16 did,
>> +					    struct device *dev, ioasid_t pasid)
> 
> this pair is similar to the earlier one, except for the difference in type.
> 
> what about keeping just one pair which accepts a 'parent' argument to
> decide the type internally?

Okay, let try to refine it.

> 
> 
>> +/*
>> + * Assigns cache tags to a domain when it's associated with a device's
>> + * PASID using a specific domain ID.
> 
> s/Assigns/Assign/

Done.

> 
>> +
>> +	did = domain_id_iommu(domain, iommu);
>> +	ret = cache_tag_assign_domain(domain, did, dev,
>> IOMMU_NO_PASID);
> 
> there are many occurrences of this pattern. What about passing in
> an 'iommu' parameter and getting 'did' inside the helper? For SVM
> it can be specialized internally too.

Perhaps, let me try it later and see what the code looks like.

> 
>> @@ -4607,10 +4623,11 @@ static void
>> intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
>>   	 */
>>   	if (domain->type == IOMMU_DOMAIN_SVA) {
>>   		intel_svm_remove_dev_pasid(dev, pasid);
>> +		cache_tag_unassign_domain(dmar_domain,
>> +					  FLPT_DEFAULT_DID, dev, pasid);
> 
> is it correct to destroy the tag before teardown completes, e.g. iotlb still
> needs to be flushed in intel_pasid_tear_down_entry()?

You are right. iotlb still needs to be there until the teardown
completes. I will investigate this more later.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 01/12] iommu/vt-d: Add cache tag assignment interface
  2024-04-06 12:55     ` Baolu Lu
@ 2024-04-07  4:35       ` Baolu Lu
  2024-04-08  2:28         ` Tian, Kevin
  0 siblings, 1 reply; 44+ messages in thread
From: Baolu Lu @ 2024-04-07  4:35 UTC (permalink / raw)
  To: Tian, Kevin, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe
  Cc: baolu.lu, Zhang, Tina, Liu, Yi L, iommu, linux-kernel

On 4/6/24 8:55 PM, Baolu Lu wrote:
>>
>>> @@ -4607,10 +4623,11 @@ static void
>>> intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
>>>        */
>>>       if (domain->type == IOMMU_DOMAIN_SVA) {
>>>           intel_svm_remove_dev_pasid(dev, pasid);
>>> +        cache_tag_unassign_domain(dmar_domain,
>>> +                      FLPT_DEFAULT_DID, dev, pasid);
>>
>> is it correct to destroy the tag before teardown completes, e.g. iotlb 
>> still
>> needs to be flushed in intel_pasid_tear_down_entry()?
> 
> You are right. iotlb still needs to be there until the teardown
> completes. I will investigate this more later.

I reviewed this again. Cache tags are designed specifically for mapping
and unmapping paths. Therefore, there is no required order for attaching
and detaching paths.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 02/12] iommu/vt-d: Add cache tag invalidation helpers
  2024-03-28  7:39   ` Tian, Kevin
@ 2024-04-07  5:33     ` Baolu Lu
  2024-04-08  2:33       ` Tian, Kevin
  0 siblings, 1 reply; 44+ messages in thread
From: Baolu Lu @ 2024-04-07  5:33 UTC (permalink / raw)
  To: Tian, Kevin, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe
  Cc: baolu.lu, Zhang, Tina, Liu, Yi L, iommu, linux-kernel

On 3/28/24 3:39 PM, Tian, Kevin wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>> Sent: Monday, March 25, 2024 10:17 AM
>>
>> +
>> +static unsigned long calculate_psi_aligned_address(unsigned long start,
>> +						   unsigned long end,
>> +						   unsigned long *_pages,
>> +						   unsigned long *_mask)
>> +{
>> +	unsigned long pages = (end - start + VTD_PAGE_SIZE - 1) >>
>> VTD_PAGE_SHIFT;
> 
> this doesn't sound correct. You'd want to follow aligned_nrpages().

Okay, I will make it like below.

	unsigned long pages = aligned_nrpages(start, end - start + 1);

> 
>> +		case CACHE_TAG_TYPE_DEVTLB:
>> +			if (tag->pasid == IOMMU_NO_PASID)
>> +				qi_flush_dev_iotlb(iommu, sid, info->pfsid,
>> +						   info->ats_qdep, addr,
>> mask);
>> +			else
>> +				qi_flush_dev_iotlb_pasid(iommu, sid, info-
>>> pfsid,
>> +							 tag->pasid, info-
>>> ats_qdep,
>> +							 addr, mask);
>> +
>> +			quirk_extra_dev_tlb_flush(info, addr, mask, tag-
>>> pasid, info->ats_qdep);
>> +			break;
>> +		case CACHE_TAG_TYPE_PARENT_DEVTLB:
>> +			/*
>> +			 * Address translation cache in device side caches the
>> +			 * result of nested translation. There is no easy way
>> +			 * to identify the exact set of nested translations
>> +			 * affected by a change in S2. So just flush the entire
>> +			 * device cache.
>> +			 */
>> +			qi_flush_dev_iotlb(iommu, sid, info->pfsid, info-
>>> ats_qdep,
>> +					   0, MAX_AGAW_PFN_WIDTH);
>> +			quirk_extra_dev_tlb_flush(info, 0,
>> MAX_AGAW_PFN_WIDTH,
>> +						  IOMMU_NO_PASID, info-
>>> ats_qdep);
>> +			break;
> 
> parent devtlb can have a pasid too, though it's not enabled now. As core helpers
> we can probably support it now, leading to simpler code:
> 
> 	case CACHE_TAG_TYPE_PARENT_DEVTLB:
> 		//change addr/mask
> 		//fallthrough
> 	case CACHE_TAG_TYPE_DEVTLB:
> 		//what this patch does

Yes. Yours is better. I will make it like below:

+               case CACHE_TAG_PARENT_DEVTLB:
+                       /*
+                        * Address translation cache in device side caches the
+                        * result of nested translation. There is no easy way
+                        * to identify the exact set of nested translations
+                        * affected by a change in S2. So just flush the entire
+                        * device cache.
+                        */
+                       addr = 0;
+                       mask = MAX_AGAW_PFN_WIDTH;
+                       fallthrough;
                 case CACHE_TAG_DEVTLB:


>> +void cache_tag_flush_all(struct dmar_domain *domain)
>> +{
>> +	struct cache_tag *tag;
>> +	unsigned long flags;
>> +
>> +	spin_lock_irqsave(&domain->cache_lock, flags);
>> +	list_for_each_entry(tag, &domain->cache_tags, node) {
>> +		struct device_domain_info *info = dev_iommu_priv_get(tag-
>>> dev);
>> +		struct intel_iommu *iommu = tag->iommu;
>> +		u16 sid = PCI_DEVID(info->bus, info->devfn);
>> +
>> +		switch (tag->type) {
>> +		case CACHE_TAG_TYPE_IOTLB:
>> +		case CACHE_TAG_TYPE_PARENT_IOTLB:
>> +			if (domain->use_first_level)
>> +				qi_flush_piotlb(iommu, tag->domain_id,
>> +						tag->pasid, 0, -1, 0);
>> +			else
>> +				iommu->flush.flush_iotlb(iommu, tag-
>>> domain_id,
>> +							 0, 0,
>> DMA_TLB_DSI_FLUSH);
>> +			break;
>> +		case CACHE_TAG_TYPE_DEVTLB:
>> +		case CACHE_TAG_TYPE_PARENT_DEVTLB:
>> +			qi_flush_dev_iotlb(iommu, sid, info->pfsid, info-
>>> ats_qdep,
>> +					   0, MAX_AGAW_PFN_WIDTH);
>> +			quirk_extra_dev_tlb_flush(info, 0,
>> MAX_AGAW_PFN_WIDTH,
>> +						  IOMMU_NO_PASID, info-
>>> ats_qdep);
>> +			break;
>> +		}
>> +	}
> 
> could this loop be consolidated with the one in previous helper? anyway
> the only difference is on addr/mask...
> 
>> +/*
>> + * Invalidate a range of IOVA when IOMMU is in caching mode and new
>> mappings
>> + * are added to the target domain.
>> + */
>> +void cache_tag_flush_cm_range(struct dmar_domain *domain, unsigned
>> long start,
>> +			      unsigned long end)
>> +{
> 
> I'm also not sure why this is worth a separate helper. why couldn't it
> be managed by previous flush_range()?

This is only my preference. I'd like to separate things belonging to
different paths so that the code is easier to maintain. For example, if,
in the future, we need to add or enhance something for a specific case,
we don't need to touch the other cases.

> 
>> +	unsigned long pages, mask, addr;
>> +	struct cache_tag *tag;
>> +	unsigned long flags;
>> +
>> +	addr = calculate_psi_aligned_address(start, end, &pages, &mask);
>> +
>> +	spin_lock_irqsave(&domain->cache_lock, flags);
>> +	list_for_each_entry(tag, &domain->cache_tags, node) {
>> +		struct intel_iommu *iommu = tag->iommu;
>> +
>> +		/*
>> +		 * When IOMMU is enabled in caching mode some non-
>> present
>> +		 * mappings may be cached by the IOTLB if it uses second-
>> +		 * only translation.
>> +		 */
>> +		if (!cap_caching_mode(iommu->cap) || domain-
>>> use_first_level) {
>> +			iommu_flush_write_buffer(iommu);
>> +			continue;
>> +		}
> 
> the comment really doesn't tell what this block does. from its intention
> it probably more suitable to be in the caller side as today.

Yes, agreed. Will remove it.

> 
>> +
>> +		if (tag->type == CACHE_TAG_TYPE_IOTLB ||
>> +		    tag->type == CACHE_TAG_TYPE_PARENT_IOTLB) {
>> +			/*
>> +			 * Fallback to domain selective flush if no
>> +			 * PSI support or the size is too big.
>> +			 */
>> +			if (!cap_pgsel_inv(iommu->cap) ||
>> +			    mask > cap_max_amask_val(iommu->cap))
>> +				iommu->flush.flush_iotlb(iommu, tag-
>>> domain_id,
>> +							 0, 0,
>> DMA_TLB_DSI_FLUSH);
>> +			else
>> +				iommu->flush.flush_iotlb(iommu, tag-
>>> domain_id,
>> +							 addr, mask,
>> +
>> DMA_TLB_PSI_FLUSH);
>> +		}
>> +	}
> 
> You skipped devtlb invalidation. yes it's not necessary based on current
> nested translation design and this part is inconsistent in different paths.
> 
> but as a semantics change you may want to first make removing devtlb
> invalidation a separate patch and then do this cleanup, so bisect is easier.

This helper is designed for map paths when the IOMMU is in caching mode.
Caching mode doesn't impact the device TLB, so we should not invalidate
the device TLB here.

I guess the confusion comes from the code below.


         /*
          * In caching mode, changes of pages from non-present to present
          * require flush. However, device IOTLB doesn't need to be flushed
          * in this case.
          */
         if (!cap_caching_mode(iommu->cap) || !map)
                 iommu_flush_dev_iotlb(domain, addr, mask);

The real code doesn't match the comments, and I think the comments
describe the right thing. So perhaps the code should be changed to match
the comments.

	if (!map)
		iommu_flush_dev_iotlb(domain, addr, mask);

??

Best regards,
baolu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 04/12] iommu/vt-d: Use cache_tag_flush_all() in flush_iotlb_all
  2024-03-28  7:47   ` Tian, Kevin
@ 2024-04-07  5:56     ` Baolu Lu
  0 siblings, 0 replies; 44+ messages in thread
From: Baolu Lu @ 2024-04-07  5:56 UTC (permalink / raw)
  To: Tian, Kevin, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe
  Cc: baolu.lu, Zhang, Tina, Liu, Yi L, iommu, linux-kernel

On 3/28/24 3:47 PM, Tian, Kevin wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>> Sent: Monday, March 25, 2024 10:17 AM
>>
>> The flush_iotlb_all callback is called by the iommu core to flush
>> all caches for the affected domain. Use cache_tag_flush_all() in
>> this callback.
>>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>   drivers/iommu/intel/iommu.c | 21 +--------------------
>>   1 file changed, 1 insertion(+), 20 deletions(-)
>>
>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>> index 93e4422c9b10..4ce98f23917c 100644
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -1560,26 +1560,7 @@ static void parent_domain_flush(struct
>> dmar_domain *domain,
>>
>>   static void intel_flush_iotlb_all(struct iommu_domain *domain)
>>   {
>> -	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
>> -	struct iommu_domain_info *info;
>> -	unsigned long idx;
>> -
>> -	xa_for_each(&dmar_domain->iommu_array, idx, info) {
>> -		struct intel_iommu *iommu = info->iommu;
>> -		u16 did = domain_id_iommu(dmar_domain, iommu);
>> -
>> -		if (dmar_domain->use_first_level)
>> -			domain_flush_pasid_iotlb(iommu, dmar_domain, 0,
>> -1, 0);
>> -		else
>> -			iommu->flush.flush_iotlb(iommu, did, 0, 0,
>> -						 DMA_TLB_DSI_FLUSH);
>> -
>> -		if (!cap_caching_mode(iommu->cap))
>> -			iommu_flush_dev_iotlb(dmar_domain, 0,
>> MAX_AGAW_PFN_WIDTH);
>> -	}
>> -
>> -	if (dmar_domain->nested_parent)
>> -		parent_domain_flush(dmar_domain, 0, -1, 0);
>> +	cache_tag_flush_all(to_dmar_domain(domain));
>>   }
>>
> 
> this replacement causes a functional change. Now devtlb is always
> invalidated while old code doesn't do so for caching mode.

You are right.

As I understand the VT-d spec, caching mode has nothing to do with the
device TLB, hence we should remove all checks of caching mode before a
device TLB invalidation.

I understand that people leverage caching mode to avoid unnecessary
VM exits from the guest OS, since the host iommu unmap() path ensures
that the device TLB is also invalidated.

Our direction should be to use batched queued invalidation requests. As
we are consolidating the code into a common place, that becomes much
easier. :-)

> Probably you may want to first clean up all inconsistent devtlb
> invalidation policies for caching mode before going to this series...

Sure thing!

Best regards,
baolu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 06/12] iommu/vt-d: Use cache_tag_flush_cm_range() in iotlb_sync_map
  2024-03-28  7:48   ` Tian, Kevin
@ 2024-04-07  6:41     ` Baolu Lu
  2024-04-08  2:51       ` Tian, Kevin
  0 siblings, 1 reply; 44+ messages in thread
From: Baolu Lu @ 2024-04-07  6:41 UTC (permalink / raw)
  To: Tian, Kevin, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe
  Cc: baolu.lu, Zhang, Tina, Liu, Yi L, iommu, linux-kernel

On 3/28/24 3:48 PM, Tian, Kevin wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>> Sent: Monday, March 25, 2024 10:17 AM
>>
>> The iotlb_sync_map callback is called by the iommu core after non-present
>> to present mappings are created. The iommu driver uses this callback to
>> invalidate caches if IOMMU is working in caching mode and second-only
>> translation is used for the domain. Use cache_tag_flush_cm_range() in this
>> callback.
>>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>   drivers/iommu/intel/iommu.c | 22 +---------------------
>>   1 file changed, 1 insertion(+), 21 deletions(-)
>>
>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>> index 1c03d2dafb9d..2dcab1e5cd4d 100644
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -1504,20 +1504,6 @@ static void iommu_flush_iotlb_psi(struct
>> intel_iommu *iommu,
>>   		iommu_flush_dev_iotlb(domain, addr, mask);
>>   }
>>
>> -/* Notification for newly created mappings */
>> -static void __mapping_notify_one(struct intel_iommu *iommu, struct
>> dmar_domain *domain,
>> -				 unsigned long pfn, unsigned int pages)
>> -{
>> -	/*
>> -	 * It's a non-present to present mapping. Only flush if caching mode
>> -	 * and second level.
>> -	 */
>> -	if (cap_caching_mode(iommu->cap) && !domain->use_first_level)
>> -		iommu_flush_iotlb_psi(iommu, domain, pfn, pages, 0, 1);
>> -	else
>> -		iommu_flush_write_buffer(iommu);
>> -}
> 
> iommu_flush_write_buffer is for a quite different issue. it's clearer to
> keep it separated from the iotlb helpers.

The VT-d spec describes the write buffer flushing in section 6.8. It
states,

1. Write buffer flushing is required only for earlier VT-d hardware
    implementations. Newer hardware implementations are expected to NOT
    require this.
2. Write buffer flushing is related to cache invalidation. The hardware
    performs an implicit write-buffer-flushing as a pre-condition to
    the invalidation operations.

According to this, write buffer flushing is only required in the map
path when IOMMU is not in caching mode and the RWBF iommu capability is
set, which perfectly matches the case of cache_tag_flush_cm_range().

By doing it in cache_tag_flush_cm_range(), the code will be simpler and
easier to maintain.
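
Concretely, folding it in would make the per-IOMMU decision inside the
helper look roughly like the sketch below. This is only an illustration
of the point above; the PSI/DSI fallback is omitted, and I'm assuming
iommu_flush_write_buffer() keeps its internal RWBF capability check so
it is a no-op on newer hardware.

	list_for_each_entry(tag, &domain->cache_tags, node) {
		struct intel_iommu *iommu = tag->iommu;

		if (!cap_caching_mode(iommu->cap) || domain->use_first_level) {
			/*
			 * Nothing caches non-present entries in this case; at
			 * most the write buffer of early RWBF hardware needs
			 * flushing on the map path.
			 */
			iommu_flush_write_buffer(iommu);
			continue;
		}

		/* Caching mode with second-stage translation: flush the IOTLB. */
		iommu->flush.flush_iotlb(iommu, tag->domain_id, addr, mask,
					 DMA_TLB_PSI_FLUSH);
	}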

Best regards,
baolu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 07/12] iommu/vt-d: Cleanup use of iommu_flush_iotlb_psi()
  2024-03-28  7:50   ` Tian, Kevin
@ 2024-04-07  7:06     ` Baolu Lu
  2024-04-08  2:57       ` Tian, Kevin
  0 siblings, 1 reply; 44+ messages in thread
From: Baolu Lu @ 2024-04-07  7:06 UTC (permalink / raw)
  To: Tian, Kevin, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe
  Cc: baolu.lu, Zhang, Tina, Liu, Yi L, iommu, linux-kernel

On 3/28/24 3:50 PM, Tian, Kevin wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>> Sent: Monday, March 25, 2024 10:17 AM
>>
>> Use cache_tag_flush_range() in switch_to_super_page() to invalidate the
>> necessary caches when switching mappings from normal to super pages. The
>> iommu_flush_iotlb_psi() call in intel_iommu_memory_notifier() is
>> unnecessary since there should be no cache invalidation for the identity
>> domain.
>>
> 
> what about a software identity domain?

The software identity domain is used to fake the hardware passthrough
capability on early VT-d hardware that doesn't implement passthrough
mode. It is not any kind of protection domain, hence the OS is not
required to manage cache synchronization for it.

Although I hope we can remove it someday and force the DMA domain
instead, we still need to carry it for now. However, we need to make it
consistent with hardware passthrough. That is, since hardware passthrough
doesn't require any cache invalidation in the memory hot-plug path,
software passthrough should not either.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 08/12] iommu/vt-d: Use cache_tag_flush_range() in cache_invalidate_user
  2024-03-28  7:54   ` Tian, Kevin
@ 2024-04-07  7:15     ` Baolu Lu
  0 siblings, 0 replies; 44+ messages in thread
From: Baolu Lu @ 2024-04-07  7:15 UTC (permalink / raw)
  To: Tian, Kevin, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe
  Cc: baolu.lu, Zhang, Tina, Liu, Yi L, iommu, linux-kernel

On 3/28/24 3:54 PM, Tian, Kevin wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>> Sent: Monday, March 25, 2024 10:17 AM
>>
>> @@ -166,9 +122,9 @@ static int intel_nested_cache_invalidate_user(struct
>> iommu_domain *domain,
>>   			break;
>>   		}
>>
>> -		intel_nested_flush_cache(dmar_domain, inv_entry.addr,
>> -					 inv_entry.npages,
>> -					 inv_entry.flags &
>> IOMMU_VTD_INV_FLAGS_LEAF);
>> +		cache_tag_flush_range(dmar_domain, inv_entry.addr,
>> +				      inv_entry.npages,
>> +				      inv_entry.flags &
>> IOMMU_VTD_INV_FLAGS_LEAF);
> 
> the helper requires an 'end' address but 'npages' is incorrectly used here.

Oh yes! Thanks for pointing this out. I will fix it like below.

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index c139524274b9..cef997c07158 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -1050,6 +1050,12 @@ static inline unsigned long aligned_nrpages(unsigned long host_addr, size_t size
         return PAGE_ALIGN(host_addr + size) >> VTD_PAGE_SHIFT;
  }

+/* Return a size from number of VTD pages. */
+static inline unsigned long nrpages_to_size(unsigned long npages)
+{
+       return npages << VTD_PAGE_SHIFT;
+}
+
  /* Convert value to context PASID directory size field coding. */
  #define context_pdts(pds)      (((pds) & 0x7) << 9)

diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
index e251507cfcc0..ffbd8d98a3b8 100644
--- a/drivers/iommu/intel/nested.c
+++ b/drivers/iommu/intel/nested.c
@@ -123,7 +123,7 @@ static int intel_nested_cache_invalidate_user(struct iommu_domain *domain,
                 }

                 cache_tag_flush_range(dmar_domain, inv_entry.addr,
-                                     inv_entry.npages,
+                                     inv_entry.addr + nrpages_to_size(inv_entry.npages) - 1,
                                       inv_entry.flags & IOMMU_VTD_INV_FLAGS_LEAF);
                 processed++;
         }

Best regards,
baolu

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 00/12] Consolidate domain cache invalidation
  2024-03-28  7:59 ` [PATCH 00/12] Consolidate domain cache invalidation Tian, Kevin
@ 2024-04-07  7:28   ` Baolu Lu
  2024-04-08  3:03     ` Tian, Kevin
  0 siblings, 1 reply; 44+ messages in thread
From: Baolu Lu @ 2024-04-07  7:28 UTC (permalink / raw)
  To: Tian, Kevin, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe
  Cc: baolu.lu, Zhang, Tina, Liu, Yi L, iommu, linux-kernel

On 3/28/24 3:59 PM, Tian, Kevin wrote:
>> From: Lu Baolu<baolu.lu@linux.intel.com>
>> Sent: Monday, March 25, 2024 10:17 AM
>>
>> The IOMMU hardware cache needs to be invalidated whenever the
>> mappings
>> in the domain are changed. Currently, domain cache invalidation is
>> scattered across different places, causing several issues:
>>
>> - IOMMU IOTLB Invalidation: This is done by iterating through the domain
>>    IDs of each domain using the following code:
>>
>>          xa_for_each(&dmar_domain->iommu_array, i, info)
>>                  iommu_flush_iotlb_psi(info->iommu, dmar_domain,
>>                                        start_pfn, nrpages,
>>                                        list_empty(&gather->freelist), 0);
>>
>>    This code could theoretically cause a use-after-free problem because
>>    there's no lock to protect the "info" pointer within the loop.
>>
>> - Inconsistent Invalidation Methods: Different domain types implement
>>    their own cache invalidation methods, making the code difficult to
>>    maintain. For example, the DMA domain, SVA domain, and nested domain
>>    have similar cache invalidation code scattered across different files.
>>
>> - SVA Domain Inconsistency: The SVA domain implementation uses a
>>    completely different data structure to track attached devices compared
>>    to other domains. This creates unnecessary differences and, even
>>    worse, leads to duplicate IOTLB invalidation when an SVA domain is
>>    attached to devices belonging to different IOMMU domains.
> can you elaborate how duplicated invalidations are caused?

Yes, sure.

The current Intel SVA implementation keeps the bond between an mm and a
PASID of a device in a list of intel_svm_dev structures. In the mm
notifier callback, it iterates over all intel_svm_dev entries in the
list and invalidates the IOTLB and device TLB sequentially.

If multiple devices belong to a single IOMMU, the IOTLB will be flushed
multiple times. However, since these devices share the same domain ID
and PASID, a single IOTLB cache invalidation is sufficient. The
additional flushes are redundant and negatively impact performance.
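
With the per-domain cache tag list, this deduplication can fall out of
the assignment path itself: if a matching tag already exists, bump a
reference count instead of adding a second entry, so the flush loop only
ever sees one tag per (iommu, domain ID, PASID, type). A minimal sketch
of that idea follows; cache_tag_match() is a hypothetical helper, not
necessarily how the series implements it.

	spin_lock_irqsave(&domain->cache_lock, flags);
	list_for_each_entry(tmp, &domain->cache_tags, node) {
		if (cache_tag_match(tmp, tag)) {
			/* Already tracked: one flush of this tag covers it. */
			tmp->users++;
			spin_unlock_irqrestore(&domain->cache_lock, flags);
			kfree(tag);
			return 0;
		}
	}
	list_add_tail(&tag->node, &domain->cache_tags);
	spin_unlock_irqrestore(&domain->cache_lock, flags);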

Best regards,
baolu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH 01/12] iommu/vt-d: Add cache tag assignment interface
  2024-04-07  4:35       ` Baolu Lu
@ 2024-04-08  2:28         ` Tian, Kevin
  0 siblings, 0 replies; 44+ messages in thread
From: Tian, Kevin @ 2024-04-08  2:28 UTC (permalink / raw)
  To: Baolu Lu, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe
  Cc: Zhang, Tina, Liu, Yi L, iommu, linux-kernel

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Sunday, April 7, 2024 12:35 PM
> 
> On 4/6/24 8:55 PM, Baolu Lu wrote:
> >>
> >>> @@ -4607,10 +4623,11 @@ static void
> >>> intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
> >>>        */
> >>>       if (domain->type == IOMMU_DOMAIN_SVA) {
> >>>           intel_svm_remove_dev_pasid(dev, pasid);
> >>> +        cache_tag_unassign_domain(dmar_domain,
> >>> +                      FLPT_DEFAULT_DID, dev, pasid);
> >>
> >> is it correct to destroy the tag before teardown completes, e.g. iotlb
> >> still
> >> needs to be flushed in intel_pasid_tear_down_entry()?
> >
> > You are right. iotlb still needs to be there until the teardown
> > completes. I will investigate this more later.
> 
> I reviewed this again. Cache tags are designed specifically for mapping
> and unmapping paths. Therefore, there is no required order for attaching
> and detaching paths.
> 

Okay. intel_pasid_tear_down_entry() directly retrieves the information
from the PASID entry instead of relying on the domain's cache tag info,
so yes, destroying the tag at this point is fine.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH 02/12] iommu/vt-d: Add cache tag invalidation helpers
  2024-04-07  5:33     ` Baolu Lu
@ 2024-04-08  2:33       ` Tian, Kevin
  2024-04-08  2:53         ` Baolu Lu
  0 siblings, 1 reply; 44+ messages in thread
From: Tian, Kevin @ 2024-04-08  2:33 UTC (permalink / raw)
  To: Baolu Lu, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe
  Cc: Zhang, Tina, Liu, Yi L, iommu, linux-kernel

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Sunday, April 7, 2024 1:33 PM
> 
> On 3/28/24 3:39 PM, Tian, Kevin wrote:
> >> From: Lu Baolu <baolu.lu@linux.intel.com>
> >> Sent: Monday, March 25, 2024 10:17 AM
> >>
> >> +/*
> >> + * Invalidate a range of IOVA when IOMMU is in caching mode and new
> >> mappings
> >> + * are added to the target domain.
> >> + */
> >> +void cache_tag_flush_cm_range(struct dmar_domain *domain, unsigned
> >> long start,
> >> +			      unsigned long end)
> >> +{
> >
> > I'm also not sure why this is worth a separate helper. why couldn't it
> > be managed by previous flush_range()?
> This is only my preference. I'd like to separate things belonging to
> different paths, so that it's easier for maintenance. For example, if,
> in the future, we need to add or enhance something for a specific case,
> we don't need to care about other cases.

IMHO caching mode is an attribute of the low-level IOMMU which can be
handled perfectly well within the helper by checking that attribute.

It sounds a bit weird for the caller to know that detail and call
different helpers when all paths just want to request a flush of a
specific range.

> >> +
> >> +		if (tag->type == CACHE_TAG_TYPE_IOTLB ||
> >> +		    tag->type == CACHE_TAG_TYPE_PARENT_IOTLB) {
> >> +			/*
> >> +			 * Fallback to domain selective flush if no
> >> +			 * PSI support or the size is too big.
> >> +			 */
> >> +			if (!cap_pgsel_inv(iommu->cap) ||
> >> +			    mask > cap_max_amask_val(iommu->cap))
> >> +				iommu->flush.flush_iotlb(iommu, tag-
> >>> domain_id,
> >> +							 0, 0,
> >> DMA_TLB_DSI_FLUSH);
> >> +			else
> >> +				iommu->flush.flush_iotlb(iommu, tag-
> >>> domain_id,
> >> +							 addr, mask,
> >> +
> >> DMA_TLB_PSI_FLUSH);
> >> +		}
> >> +	}
> >
> > You skipped devtlb invalidation. yes it's not necessary based on current
> > nested translation design and this part is inconsistent in different paths.
> >
> > but as a semantics change you may want to first make removing devtlb
> > invalidation a separate patch and then do this cleanup, so bisect is easier.
> 
> This helper is designed for map paths when the IOMMU is in caching mode.
> Caching mode doesn't impact the device TLB, so we should not invalidate
> the device TLB here.
> 
> I guess the confusing thing is about the code below.
> 
> 
>          /*
>           * In caching mode, changes of pages from non-present to
> present require
>           * flush. However, device IOTLB doesn't need to be flushed in
> this case.
>           */
>          if (!cap_caching_mode(iommu->cap) || !map)
>                  iommu_flush_dev_iotlb(domain, addr, mask);
> 
> The real code doesn't match the comments, and I think the comments
> describe the right thing. So perhaps the code should be changed to match
> the comments.
> 
> 	if (!map)
> 		iommu_flush_dev_iotlb(domain, addr, mask);
> 
> ??
> 

Yes. That should be fixed separately.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH 06/12] iommu/vt-d: Use cache_tag_flush_cm_range() in iotlb_sync_map
  2024-04-07  6:41     ` Baolu Lu
@ 2024-04-08  2:51       ` Tian, Kevin
  2024-04-08  2:57         ` Baolu Lu
  0 siblings, 1 reply; 44+ messages in thread
From: Tian, Kevin @ 2024-04-08  2:51 UTC (permalink / raw)
  To: Baolu Lu, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe
  Cc: Zhang, Tina, Liu, Yi L, iommu, linux-kernel

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Sunday, April 7, 2024 2:41 PM
> 
> On 3/28/24 3:48 PM, Tian, Kevin wrote:
> >> From: Lu Baolu <baolu.lu@linux.intel.com>
> >> Sent: Monday, March 25, 2024 10:17 AM
> >>
> >> The iotlb_sync_map callback is called by the iommu core after non-
> present
> >> to present mappings are created. The iommu driver uses this callback to
> >> invalidate caches if IOMMU is working in caching mode and second-only
> >> translation is used for the domain. Use cache_tag_flush_cm_range() in
> this
> >> callback.
> >>
> >> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> >> ---
> >>   drivers/iommu/intel/iommu.c | 22 +---------------------
> >>   1 file changed, 1 insertion(+), 21 deletions(-)
> >>
> >> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> >> index 1c03d2dafb9d..2dcab1e5cd4d 100644
> >> --- a/drivers/iommu/intel/iommu.c
> >> +++ b/drivers/iommu/intel/iommu.c
> >> @@ -1504,20 +1504,6 @@ static void iommu_flush_iotlb_psi(struct
> >> intel_iommu *iommu,
> >>   		iommu_flush_dev_iotlb(domain, addr, mask);
> >>   }
> >>
> >> -/* Notification for newly created mappings */
> >> -static void __mapping_notify_one(struct intel_iommu *iommu, struct
> >> dmar_domain *domain,
> >> -				 unsigned long pfn, unsigned int pages)
> >> -{
> >> -	/*
> >> -	 * It's a non-present to present mapping. Only flush if caching mode
> >> -	 * and second level.
> >> -	 */
> >> -	if (cap_caching_mode(iommu->cap) && !domain->use_first_level)
> >> -		iommu_flush_iotlb_psi(iommu, domain, pfn, pages, 0, 1);
> >> -	else
> >> -		iommu_flush_write_buffer(iommu);
> >> -}
> >
> > iommu_flush_write_buffer is for a quite different issue. it's clearer to
> > keep it separated from the iotlb helpers.
> 
> The VT-d spec describes the write buffer flushing in section 6.8. It
> states,
> 
> 1. Write buffer flushing is required only for earlier VT-d hardware
>     implementations. Newer hardware implementations are expected to NOT
>     require this.
> 2. Write buffer flushing is related to cache invalidation. The hardware
>     performs an implicit write-buffer-flushing as a pre-condition to
>     the invalidation operations.
> 
> According to this, write buffer flushing is only required in the map
> path when IOMMU is not in caching mode and the RWBF iommu capability is
> set, which perfectly matches the case of cache_tag_flush_cm_range().
> 
> By doing it in cache_tag_flush_cm_range(), the code will be simpler and
> easier for maintenance.
> 

But the above actually applies when caching mode is false or when the
1st stage is being used, which is the opposite of what flush_cm_range()
implies.

If we do it through a general flush_range() helper which accepts a
parameter indicating the map path, then it's reasonable to move this
special handling into that helper, as that is another trick which the
caller doesn't need to know about.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 02/12] iommu/vt-d: Add cache tag invalidation helpers
  2024-04-08  2:33       ` Tian, Kevin
@ 2024-04-08  2:53         ` Baolu Lu
  2024-04-08  3:14           ` Tian, Kevin
  0 siblings, 1 reply; 44+ messages in thread
From: Baolu Lu @ 2024-04-08  2:53 UTC (permalink / raw)
  To: Tian, Kevin, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe
  Cc: baolu.lu, Zhang, Tina, Liu, Yi L, iommu, linux-kernel

On 4/8/24 10:33 AM, Tian, Kevin wrote:
>> From: Baolu Lu<baolu.lu@linux.intel.com>
>> Sent: Sunday, April 7, 2024 1:33 PM
>>
>> On 3/28/24 3:39 PM, Tian, Kevin wrote:
>>>> From: Lu Baolu<baolu.lu@linux.intel.com>
>>>> Sent: Monday, March 25, 2024 10:17 AM
>>>>
>>>> +/*
>>>> + * Invalidate a range of IOVA when IOMMU is in caching mode and new
>>>> mappings
>>>> + * are added to the target domain.
>>>> + */
>>>> +void cache_tag_flush_cm_range(struct dmar_domain *domain, unsigned
>>>> long start,
>>>> +			      unsigned long end)
>>>> +{
>>> I'm also not sure why this is worth a separate helper. why couldn't it
>>> be managed by previous flush_range()?
>> This is only my preference. I'd like to separate things belonging to
>> different paths, so that it's easier for maintenance. For example, if,
>> in the future, we need to add or enhance something for a specific case,
>> we don't need to care about other cases.
> IMHO caching mode is an attribute in low level iommu which can be
> handled perfectly well within the helper by checking that attribute.
> 
> it sounds a bit weird for the caller to know that detail and call different
> helpers when all paths just want to request to flush a specific range.

I see. The helper name is a bit confusing.

Generally speaking, cache_tag_flush_range() and cache_tag_flush_all()
are designed to flush caches for mapping changes from present to non-
present, while cache_tag_flush_cm_range() is designed to flush caches
for mapping changes from non-present to present.

How about renaming these helpers to

cache_tag_flush_present_range()
cache_tag_flush_present_all()
cache_tag_flush_non_present_range()

?

In cache_tag_flush_non_present_range(),

- if IOMMU is not in caching mode, flush the write buffer if necessary,
   or it's a no-op.
- if IOMMU is in caching mode, flush the IOTLB caches.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH 07/12] iommu/vt-d: Cleanup use of iommu_flush_iotlb_psi()
  2024-04-07  7:06     ` Baolu Lu
@ 2024-04-08  2:57       ` Tian, Kevin
  0 siblings, 0 replies; 44+ messages in thread
From: Tian, Kevin @ 2024-04-08  2:57 UTC (permalink / raw)
  To: Baolu Lu, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe
  Cc: Zhang, Tina, Liu, Yi L, iommu, linux-kernel

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Sunday, April 7, 2024 3:06 PM
> 
> On 3/28/24 3:50 PM, Tian, Kevin wrote:
> >> From: Lu Baolu <baolu.lu@linux.intel.com>
> >> Sent: Monday, March 25, 2024 10:17 AM
> >>
> >> Use cache_tag_flush_range() in switch_to_super_page() to invalidate the
> >> necessary caches when switching mappings from normal to super pages.
> The
> >> iommu_flush_iotlb_psi() call in intel_iommu_memory_notifier() is
> >> unnecessary since there should be no cache invalidation for the identity
> >> domain.
> >>
> >
> > what about a software identity domain?
> 
> Software identity domain is used to fake the hardware passthrough
> capability, on early VT-d hardware which doesn't implement the
> passthrough mode. It's not any kind of protection domain, hence the OS
> is not required to manage the cache synchronization.
> 
> Although I hope we can remove it someday and force the DMA domain
> instead, we still need to carry it nowadays. However, we need to make it
> consistent with the hardware passthrough. That is, hardware passthrough
> doesn't require any cache invalidation in memory hot-plug paths, the
> software passthrough should not either.
> 

Yes, that makes sense.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 06/12] iommu/vt-d: Use cache_tag_flush_cm_range() in iotlb_sync_map
  2024-04-08  2:51       ` Tian, Kevin
@ 2024-04-08  2:57         ` Baolu Lu
  0 siblings, 0 replies; 44+ messages in thread
From: Baolu Lu @ 2024-04-08  2:57 UTC (permalink / raw)
  To: Tian, Kevin, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe
  Cc: baolu.lu, Zhang, Tina, Liu, Yi L, iommu, linux-kernel

On 4/8/24 10:51 AM, Tian, Kevin wrote:
>> From: Baolu Lu<baolu.lu@linux.intel.com>
>> Sent: Sunday, April 7, 2024 2:41 PM
>>
>> On 3/28/24 3:48 PM, Tian, Kevin wrote:
>>>> From: Lu Baolu<baolu.lu@linux.intel.com>
>>>> Sent: Monday, March 25, 2024 10:17 AM
>>>>
>>>> The iotlb_sync_map callback is called by the iommu core after non-
>> present
>>>> to present mappings are created. The iommu driver uses this callback to
>>>> invalidate caches if IOMMU is working in caching mode and second-only
>>>> translation is used for the domain. Use cache_tag_flush_cm_range() in
>> this
>>>> callback.
>>>>
>>>> Signed-off-by: Lu Baolu<baolu.lu@linux.intel.com>
>>>> ---
>>>>    drivers/iommu/intel/iommu.c | 22 +---------------------
>>>>    1 file changed, 1 insertion(+), 21 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>>>> index 1c03d2dafb9d..2dcab1e5cd4d 100644
>>>> --- a/drivers/iommu/intel/iommu.c
>>>> +++ b/drivers/iommu/intel/iommu.c
>>>> @@ -1504,20 +1504,6 @@ static void iommu_flush_iotlb_psi(struct
>>>> intel_iommu *iommu,
>>>>    		iommu_flush_dev_iotlb(domain, addr, mask);
>>>>    }
>>>>
>>>> -/* Notification for newly created mappings */
>>>> -static void __mapping_notify_one(struct intel_iommu *iommu, struct
>>>> dmar_domain *domain,
>>>> -				 unsigned long pfn, unsigned int pages)
>>>> -{
>>>> -	/*
>>>> -	 * It's a non-present to present mapping. Only flush if caching mode
>>>> -	 * and second level.
>>>> -	 */
>>>> -	if (cap_caching_mode(iommu->cap) && !domain->use_first_level)
>>>> -		iommu_flush_iotlb_psi(iommu, domain, pfn, pages, 0, 1);
>>>> -	else
>>>> -		iommu_flush_write_buffer(iommu);
>>>> -}
>>> iommu_flush_write_buffer is for a quite different issue. it's clearer to
>>> keep it separated from the iotlb helpers.
>> The VT-d spec describes the write buffer flushing in section 6.8. It
>> states,
>>
>> 1. Write buffer flushing is required only for earlier VT-d hardware
>>      implementations. Newer hardware implementations are expected to NOT
>>      require this.
>> 2. Write buffer flushing is related to cache invalidation. The hardware
>>      performs an implicit write-buffer-flushing as a pre-condition to
>>      the invalidation operations.
>>
>> According to this, write buffer flushing is only required in the map
>> path when IOMMU is not in caching mode and the RWBF iommu capability is
>> set, which perfectly matches the case of cache_tag_flush_cm_range().
>>
>> By doing it in cache_tag_flush_cm_range(), the code will be simpler and
>> easier for maintenance.
>>
> but above applies actually when caching mode is false or when 1st
> stage is being used. this is the opposite of what flush_cm_range()
> implies.
> 
> if we do it through a general flush_range() helper which accepts
> a parameter to indicate the map path, then it's reasonable to move
> this special handling to that helper as that is another trick which
> the caller doesn't need to know.

This could be an option. How about my proposal in another reply?

Best regards,
baolu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH 00/12] Consolidate domain cache invalidation
  2024-04-07  7:28   ` Baolu Lu
@ 2024-04-08  3:03     ` Tian, Kevin
  2024-04-08  3:05       ` Baolu Lu
  0 siblings, 1 reply; 44+ messages in thread
From: Tian, Kevin @ 2024-04-08  3:03 UTC (permalink / raw)
  To: Baolu Lu, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe
  Cc: Zhang, Tina, Liu, Yi L, iommu, linux-kernel

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Sunday, April 7, 2024 3:28 PM
> 
> On 3/28/24 3:59 PM, Tian, Kevin wrote:
> >> From: Lu Baolu<baolu.lu@linux.intel.com>
> >> Sent: Monday, March 25, 2024 10:17 AM
> >>
> >> The IOMMU hardware cache needs to be invalidated whenever the
> >> mappings
> >> in the domain are changed. Currently, domain cache invalidation is
> >> scattered across different places, causing several issues:
> >>
> >> - IOMMU IOTLB Invalidation: This is done by iterating through the domain
> >>    IDs of each domain using the following code:
> >>
> >>          xa_for_each(&dmar_domain->iommu_array, i, info)
> >>                  iommu_flush_iotlb_psi(info->iommu, dmar_domain,
> >>                                        start_pfn, nrpages,
> >>                                        list_empty(&gather->freelist), 0);
> >>
> >>    This code could theoretically cause a use-after-free problem because
> >>    there's no lock to protect the "info" pointer within the loop.
> >>
> >> - Inconsistent Invalidation Methods: Different domain types implement
> >>    their own cache invalidation methods, making the code difficult to
> >>    maintain. For example, the DMA domain, SVA domain, and nested
> domain
> >>    have similar cache invalidation code scattered across different files.
> >>
> >> - SVA Domain Inconsistency: The SVA domain implementation uses a
> >>    completely different data structure to track attached devices compared
> >>    to other domains. This creates unnecessary differences and, even
> >>    worse, leads to duplicate IOTLB invalidation when an SVA domain is
> >>    attached to devices belonging to different IOMMU domains.
> > can you elaborate how duplicated invalidations are caused?
> 
> Yes, sure.
> 
> Current Intel SVA implementation keeps the bond between mm and a PASID
> of a device in a list of intel_svm_dev. In the mm notifier callback, it
> iterates all intel_svam_dev in the list and invalidates the IOTLB and
> device TLB sequentially.
> 
> If multiple devices belong to a single IOMMU, the IOTLB will be flushed
> multiple times. However, since these devices share the same domain ID
> and PASID, a single IOTLB cache invalidation is sufficient. The
> additional flushes are redundant and negatively impact performance.
> 

Yes, it's redundant. But what does "devices belonging to different
IOMMU domains" in the original context try to convey? From the above
explanation it sounds irrelevant...

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 00/12] Consolidate domain cache invalidation
  2024-04-08  3:03     ` Tian, Kevin
@ 2024-04-08  3:05       ` Baolu Lu
  0 siblings, 0 replies; 44+ messages in thread
From: Baolu Lu @ 2024-04-08  3:05 UTC (permalink / raw)
  To: Tian, Kevin, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe
  Cc: baolu.lu, Zhang, Tina, Liu, Yi L, iommu, linux-kernel

On 4/8/24 11:03 AM, Tian, Kevin wrote:
>> From: Baolu Lu <baolu.lu@linux.intel.com>
>> Sent: Sunday, April 7, 2024 3:28 PM
>>
>> On 3/28/24 3:59 PM, Tian, Kevin wrote:
>>>> From: Lu Baolu<baolu.lu@linux.intel.com>
>>>> Sent: Monday, March 25, 2024 10:17 AM
>>>>
>>>> The IOMMU hardware cache needs to be invalidated whenever the
>>>> mappings
>>>> in the domain are changed. Currently, domain cache invalidation is
>>>> scattered across different places, causing several issues:
>>>>
>>>> - IOMMU IOTLB Invalidation: This is done by iterating through the domain
>>>>     IDs of each domain using the following code:
>>>>
>>>>           xa_for_each(&dmar_domain->iommu_array, i, info)
>>>>                   iommu_flush_iotlb_psi(info->iommu, dmar_domain,
>>>>                                         start_pfn, nrpages,
>>>>                                         list_empty(&gather->freelist), 0);
>>>>
>>>>     This code could theoretically cause a use-after-free problem because
>>>>     there's no lock to protect the "info" pointer within the loop.
>>>>
>>>> - Inconsistent Invalidation Methods: Different domain types implement
>>>>     their own cache invalidation methods, making the code difficult to
>>>>     maintain. For example, the DMA domain, SVA domain, and nested
>> domain
>>>>     have similar cache invalidation code scattered across different files.
>>>>
>>>> - SVA Domain Inconsistency: The SVA domain implementation uses a
>>>>     completely different data structure to track attached devices compared
>>>>     to other domains. This creates unnecessary differences and, even
>>>>     worse, leads to duplicate IOTLB invalidation when an SVA domain is
>>>>     attached to devices belonging to different IOMMU domains.
>>> can you elaborate how duplicated invalidations are caused?
>>
>> Yes, sure.
>>
>> Current Intel SVA implementation keeps the bond between mm and a PASID
>> of a device in a list of intel_svm_dev. In the mm notifier callback, it
>> iterates all intel_svam_dev in the list and invalidates the IOTLB and
>> device TLB sequentially.
>>
>> If multiple devices belong to a single IOMMU, the IOTLB will be flushed
>> multiple times. However, since these devices share the same domain ID
>> and PASID, a single IOTLB cache invalidation is sufficient. The
>> additional flushes are redundant and negatively impact performance.
>>
> 
> yes it's redundant. But what does "devices belonging to different
> IOMMU domains" in the original context try to convey? From above
> explanation it sounds irrelevant...

My typo. :-) Sorry for the confusion.

I should say,

"... attached to devices belonging to a same IOMMU ..."

Best regards,
baolu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH 02/12] iommu/vt-d: Add cache tag invalidation helpers
  2024-04-08  2:53         ` Baolu Lu
@ 2024-04-08  3:14           ` Tian, Kevin
  0 siblings, 0 replies; 44+ messages in thread
From: Tian, Kevin @ 2024-04-08  3:14 UTC (permalink / raw)
  To: Baolu Lu, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe
  Cc: Zhang, Tina, Liu, Yi L, iommu, linux-kernel

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Monday, April 8, 2024 10:54 AM
> 
> On 4/8/24 10:33 AM, Tian, Kevin wrote:
> >> From: Baolu Lu<baolu.lu@linux.intel.com>
> >> Sent: Sunday, April 7, 2024 1:33 PM
> >>
> >> On 3/28/24 3:39 PM, Tian, Kevin wrote:
> >>>> From: Lu Baolu<baolu.lu@linux.intel.com>
> >>>> Sent: Monday, March 25, 2024 10:17 AM
> >>>>
> >>>> +/*
> >>>> + * Invalidate a range of IOVA when IOMMU is in caching mode and
> new
> >>>> mappings
> >>>> + * are added to the target domain.
> >>>> + */
> >>>> +void cache_tag_flush_cm_range(struct dmar_domain *domain,
> unsigned
> >>>> long start,
> >>>> +			      unsigned long end)
> >>>> +{
> >>> I'm also not sure why this is worth a separate helper. why couldn't it
> >>> be managed by previous flush_range()?
> >> This is only my preference. I'd like to separate things belonging to
> >> different paths, so that it's easier for maintenance. For example, if,
> >> in the future, we need to add or enhance something for a specific case,
> >> we don't need to care about other cases.
> > IMHO caching mode is an attribute in low level iommu which can be
> > handled perfectly well within the helper by checking that attribute.
> >
> > it sounds a bit weird for the caller to know that detail and call different
> > helpers when all paths just want to request to flush a specific range.
> 
> I see. The helper name is a bit confusing.
> 
> Generally speaking, cache_tag_flush_range() and cache_tag_flush_all()
> are designed to flush caches for mapping change from present to non-
> present. While cache_tag_flush_cm_range() is designed to flush caches
> for mapping change from non-present to present.
> 
> How about renaming these helpers to
> 
> cache_tag_flush_present_range()
> cache_tag_flush_present_all()
> cache_tag_flush_non_present_range()

I'll probably keep cache_tag_flush_range/all() as they are, because the
naming matches the general interpretation of the TLB semantics, and then
name the third one cache_tag_flush_range_np() for the various additional
requirements on transitions from non-present entries.

> 
> ?
> 
> In cache_tag_flush_non_present_range(),
> 
> - if IOMMU is not in caching mode, flush the write buffer if necessary,

or if using stage-1

>    or it's a no-op.
> - if IOMMU is in caching mode, flush the IOTLB caches.
> 
> Best regards,
> baolu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 01/12] iommu/vt-d: Add cache tag assignment interface
  2024-03-25  2:16 ` [PATCH 01/12] iommu/vt-d: Add cache tag assignment interface Lu Baolu
  2024-03-28  7:12   ` Tian, Kevin
@ 2024-04-10 15:41   ` Jason Gunthorpe
  2024-04-10 23:14     ` Tian, Kevin
                       ` (2 more replies)
  1 sibling, 3 replies; 44+ messages in thread
From: Jason Gunthorpe @ 2024-04-10 15:41 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian, Tina Zhang,
	Yi Liu, iommu, linux-kernel

On Mon, Mar 25, 2024 at 10:16:54AM +0800, Lu Baolu wrote:
> Caching tag is a combination of tags used by the hardware to cache various
> translations. Whenever a mapping in a domain is changed, the IOMMU driver
> should invalidate the caches with the caching tags. The VT-d specification
> describes caching tags in section 6.2.1, Tagging of Cached Translations.
> 
> Add interface to assign caching tags to an IOMMU domain when attached to a
> RID or PASID, and unassign caching tags when a domain is detached from a
> RID or PASID. All caching tags are listed in the per-domain tag list and
> are protected by a dedicated lock.
> 
> In addition to the basic IOTLB and devTLB caching tag types, PARENT_IOTLB
> and PARENT_DEVTLB tag types are also introduced. These tags are used for
> caches that store translations for DMA accesses through a nested user
> domain. They are affected by changes to mappings in the parent domain.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel/iommu.h  |  25 +++++
>  drivers/iommu/intel/cache.c  | 192 +++++++++++++++++++++++++++++++++++
>  drivers/iommu/intel/iommu.c  |  31 +++++-
>  drivers/iommu/intel/nested.c |  21 +++-
>  drivers/iommu/intel/svm.c    |  12 ++-
>  drivers/iommu/intel/Makefile |   2 +-
>  6 files changed, 274 insertions(+), 9 deletions(-)
>  create mode 100644 drivers/iommu/intel/cache.c
> 
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index 404d2476a877..e3723b7a0b31 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -607,6 +607,9 @@ struct dmar_domain {
>  	struct list_head devices;	/* all devices' list */
>  	struct list_head dev_pasids;	/* all attached pasids */
>  
> +	spinlock_t cache_lock;		/* Protect the cache tag list */
> +	struct list_head cache_tags;	/* Cache tag list */

That is quite a neat trick - though building a dedicated invalidation
list duplicates data stored in the attached devices list?

You didn't try to make it RCU safe for invalidation?

> +struct cache_tag {
> +	struct list_head node;
> +	enum cache_tag_type type;
> +	struct intel_iommu *iommu;
> +	struct device *dev;

iommu and dev probably don't both need to be stored together. We have
iommu_get_iommu_dev() now.. I suppose this is probably a union of the
two pointers depending on tag. DEVTLB needs the dev and IOTLB needs
the iommu.

> +	u16 domain_id;
> +	ioasid_t pasid;
> +	int users;

unsigned int

> +static int __cache_tag_assign_parent_domain(struct dmar_domain *domain, u16 did,
> +					    struct device *dev, ioasid_t pasid)
> +{
> +	struct device_domain_info *info = dev_iommu_priv_get(dev);
> +	int ret;
> +
> +	ret = cache_tag_assign(domain, did, dev, pasid, CACHE_TAG_TYPE_PARENT_IOTLB);
> +	if (ret || !info->ats_enabled)
> +		return ret;

I'm not sure I understood the point of PARENT_IOTLB? I didn't see any
different implementation?

Isn't this backwards though? Each domain should have a list of things
to invalidate if the domain itself changes.

So the nesting parent should have a list of CHILD_DEVTLB's that need
cleaning. That list is changed when the nesting domains are attached
to something.

And a list of CHILD_IOTLBs, but the HW doesn't seem to need that?

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH 01/12] iommu/vt-d: Add cache tag assignment interface
  2024-04-10 15:41   ` Jason Gunthorpe
@ 2024-04-10 23:14     ` Tian, Kevin
  2024-04-11 13:17       ` Baolu Lu
  2024-04-11 12:10     ` Baolu Lu
  2024-04-11 12:38     ` Baolu Lu
  2 siblings, 1 reply; 44+ messages in thread
From: Tian, Kevin @ 2024-04-10 23:14 UTC (permalink / raw)
  To: Jason Gunthorpe, Lu Baolu
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Zhang, Tina, Liu, Yi L,
	iommu, linux-kernel

> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Wednesday, April 10, 2024 11:42 PM
> 
> On Mon, Mar 25, 2024 at 10:16:54AM +0800, Lu Baolu wrote:
> > +static int __cache_tag_assign_parent_domain(struct dmar_domain
> *domain, u16 did,
> > +					    struct device *dev, ioasid_t pasid)
> > +{
> > +	struct device_domain_info *info = dev_iommu_priv_get(dev);
> > +	int ret;
> > +
> > +	ret = cache_tag_assign(domain, did, dev, pasid,
> CACHE_TAG_TYPE_PARENT_IOTLB);
> > +	if (ret || !info->ats_enabled)
> > +		return ret;
> 
> I'm not sure I understood the point of PARENT_IOTLB? I didn't see any
> different implementation?
> 
> Isn't this backwards though? Each domain should have a list of things
> to invalidate if the domain itself changes.
> 
> So the nesting parent should have a list of CHILD_DEVTLB's that need
> cleaning. That list is changed when the nesting domains are attached
> to something.
> 

Probably just a naming confusion. It's called PARENT_IOTLB from the
angle that this domain is used as a parent domain, but it actually
tracks the child tags in a nested attach.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 01/12] iommu/vt-d: Add cache tag assignment interface
  2024-04-10 15:41   ` Jason Gunthorpe
  2024-04-10 23:14     ` Tian, Kevin
@ 2024-04-11 12:10     ` Baolu Lu
  2024-04-11 12:38     ` Baolu Lu
  2 siblings, 0 replies; 44+ messages in thread
From: Baolu Lu @ 2024-04-11 12:10 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Tina Zhang, Yi Liu, iommu, linux-kernel

On 2024/4/10 23:41, Jason Gunthorpe wrote:
> On Mon, Mar 25, 2024 at 10:16:54AM +0800, Lu Baolu wrote:
>> Caching tag is a combination of tags used by the hardware to cache various
>> translations. Whenever a mapping in a domain is changed, the IOMMU driver
>> should invalidate the caches with the caching tags. The VT-d specification
>> describes caching tags in section 6.2.1, Tagging of Cached Translations.
>>
>> Add interface to assign caching tags to an IOMMU domain when attached to a
>> RID or PASID, and unassign caching tags when a domain is detached from a
>> RID or PASID. All caching tags are listed in the per-domain tag list and
>> are protected by a dedicated lock.
>>
>> In addition to the basic IOTLB and devTLB caching tag types, PARENT_IOTLB
>> and PARENT_DEVTLB tag types are also introduced. These tags are used for
>> caches that store translations for DMA accesses through a nested user
>> domain. They are affected by changes to mappings in the parent domain.
>>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>   drivers/iommu/intel/iommu.h  |  25 +++++
>>   drivers/iommu/intel/cache.c  | 192 +++++++++++++++++++++++++++++++++++
>>   drivers/iommu/intel/iommu.c  |  31 +++++-
>>   drivers/iommu/intel/nested.c |  21 +++-
>>   drivers/iommu/intel/svm.c    |  12 ++-
>>   drivers/iommu/intel/Makefile |   2 +-
>>   6 files changed, 274 insertions(+), 9 deletions(-)
>>   create mode 100644 drivers/iommu/intel/cache.c
>>
>> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
>> index 404d2476a877..e3723b7a0b31 100644
>> --- a/drivers/iommu/intel/iommu.h
>> +++ b/drivers/iommu/intel/iommu.h
>> @@ -607,6 +607,9 @@ struct dmar_domain {
>>   	struct list_head devices;	/* all devices' list */
>>   	struct list_head dev_pasids;	/* all attached pasids */
>>   
>> +	spinlock_t cache_lock;		/* Protect the cache tag list */
>> +	struct list_head cache_tags;	/* Cache tag list */
> 
> That is quite a neat trick - though building a dedicated invalidation
> list duplicates data stored in the attached devices list?

Yes. The devices and dev_pasids lists appear to duplicate it. I plan to
remove those two lists later.

> You didn't try to make it RCU safe for invalidation?

The queued invalidation interface is a bit complicated, especially when
it comes to device TLB invalidation. Device TLB invalidation might
result in a timeout, which requires special treatment.

>> +struct cache_tag {
>> +	struct list_head node;
>> +	enum cache_tag_type type;
>> +	struct intel_iommu *iommu;
>> +	struct device *dev;
> 
> iommu and dev probably don't both need to be stored together. We have
> iommu_get_iommu_dev() now.. I suppose this is probably a union of the
> two pointers depending on tag. DEVTLB needs the dev and IOTLB needs
> the iommu.
> 
>> +	u16 domain_id;
>> +	ioasid_t pasid;
>> +	int users;
> 
> unsigned int

Sure.

> 
>> +static int __cache_tag_assign_parent_domain(struct dmar_domain *domain, u16 did,
>> +					    struct device *dev, ioasid_t pasid)
>> +{
>> +	struct device_domain_info *info = dev_iommu_priv_get(dev);
>> +	int ret;
>> +
>> +	ret = cache_tag_assign(domain, did, dev, pasid, CACHE_TAG_TYPE_PARENT_IOTLB);
>> +	if (ret || !info->ats_enabled)
>> +		return ret;
> 
> I'm not sure I understood the point of PARENT_IOTLB? I didn't see any
> different implementation?
> 
> Isn't this backwards though? Each domain should have a list of things
> to invalidate if the domain itself changes.
> 
> So the nesting parent should have a list of CHILD_DEVTLB's that need
> cleaning. That list is changed when the nesting domains are attached
> to something.
> 
> And a list of CHILD_IOTLBs, but the HW doesn't seem to need that?

This is a partial replacement of the series below.

https://lore.kernel.org/all/20240208082307.15759-1-yi.l.liu@intel.com/

Best regards,
baolu


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 01/12] iommu/vt-d: Add cache tag assignment interface
  2024-04-10 15:41   ` Jason Gunthorpe
  2024-04-10 23:14     ` Tian, Kevin
  2024-04-11 12:10     ` Baolu Lu
@ 2024-04-11 12:38     ` Baolu Lu
  2024-04-12  3:38       ` Tian, Kevin
  2 siblings, 1 reply; 44+ messages in thread
From: Baolu Lu @ 2024-04-11 12:38 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Tina Zhang, Yi Liu, iommu, linux-kernel

On 2024/4/10 23:41, Jason Gunthorpe wrote:
>> +struct cache_tag {
>> +	struct list_head node;
>> +	enum cache_tag_type type;
>> +	struct intel_iommu *iommu;
>> +	struct device *dev;
> iommu and dev probably don't both need to be stored together. We have
> iommu_get_iommu_dev() now.. I suppose this is probably a union of the
> two pointers depending on tag. DEVTLB needs the dev and IOTLB needs
> the iommu.

I forgot to reply to this comment in my previous reply. Sorry about that.

struct cache_tag {
         [ ... ]
         struct intel_iommu *iommu;
         struct device *dev;
         [ ... ]
};

I treat @iommu as the queued invalidation interface. All cache
invalidations are raised to the hardware through the invalidation queue.

The @dev field represents the location of the cache. The IOTLB cache
resides on the IOMMU hardware, so for it the field stores the device
pointer of the IOMMU. The DevTLB cache is located in the PCIe endpoint,
so there the field stores the device pointer of that endpoint.

A correctly set @dev pointer allows users to see more accurate trace
messages.
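
Spelling that out as comments on the fields (just a sketch restating the
explanation above, with the unsigned type for @users from the earlier
review comment):

struct cache_tag {
	struct list_head node;
	enum cache_tag_type type;
	/* The IOMMU whose invalidation queue is used to flush this tag. */
	struct intel_iommu *iommu;
	/*
	 * Location of the cache: the IOMMU device for IOTLB tags, or the
	 * PCIe endpoint for DevTLB tags. Mainly used for tracing.
	 */
	struct device *dev;
	u16 domain_id;
	ioasid_t pasid;
	unsigned int users;
};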

Best regards,
baolu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 01/12] iommu/vt-d: Add cache tag assignment interface
  2024-04-10 23:14     ` Tian, Kevin
@ 2024-04-11 13:17       ` Baolu Lu
  2024-04-11 13:42         ` Jason Gunthorpe
  0 siblings, 1 reply; 44+ messages in thread
From: Baolu Lu @ 2024-04-11 13:17 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: baolu.lu, Joerg Roedel, Will Deacon, Robin Murphy, Zhang, Tina,
	Liu, Yi L, iommu, linux-kernel

On 2024/4/11 7:14, Tian, Kevin wrote:
>> From: Jason Gunthorpe <jgg@ziepe.ca>
>> Sent: Wednesday, April 10, 2024 11:42 PM
>>
>> On Mon, Mar 25, 2024 at 10:16:54AM +0800, Lu Baolu wrote:
>>> +static int __cache_tag_assign_parent_domain(struct dmar_domain
>> *domain, u16 did,
>>> +					    struct device *dev, ioasid_t pasid)
>>> +{
>>> +	struct device_domain_info *info = dev_iommu_priv_get(dev);
>>> +	int ret;
>>> +
>>> +	ret = cache_tag_assign(domain, did, dev, pasid,
>> CACHE_TAG_TYPE_PARENT_IOTLB);
>>> +	if (ret || !info->ats_enabled)
>>> +		return ret;
>>
>> I'm not sure I understood the point of PARENT_IOTLB? I didn't see any
>> different implementation?
>>
>> Isn't this backwards though? Each domain should have a list of things
>> to invalidate if the domain itself changes.
>>
>> So the nesting parent should have a list of CHILD_DEVTLB's that need
>> cleaning. That list is changed when the nesting domains are attached
>> to something.
>>
> 
> probably just a naming confusion. it's called PARENT_IOTLB from the
> angle that this domain is used as a parent domain but actually it
> tracks the child tags in nested attach.

Is NESTING_IOTLB more readable?

Best regards,
baolu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 01/12] iommu/vt-d: Add cache tag assignment interface
  2024-04-11 13:17       ` Baolu Lu
@ 2024-04-11 13:42         ` Jason Gunthorpe
  0 siblings, 0 replies; 44+ messages in thread
From: Jason Gunthorpe @ 2024-04-11 13:42 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Tian, Kevin, Joerg Roedel, Will Deacon, Robin Murphy, Zhang,
	Tina, Liu, Yi L, iommu, linux-kernel

On Thu, Apr 11, 2024 at 09:17:41PM +0800, Baolu Lu wrote:
> On 2024/4/11 7:14, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@ziepe.ca>
> > > Sent: Wednesday, April 10, 2024 11:42 PM
> > > 
> > > On Mon, Mar 25, 2024 at 10:16:54AM +0800, Lu Baolu wrote:
> > > > +static int __cache_tag_assign_parent_domain(struct dmar_domain
> > > *domain, u16 did,
> > > > +					    struct device *dev, ioasid_t pasid)
> > > > +{
> > > > +	struct device_domain_info *info = dev_iommu_priv_get(dev);
> > > > +	int ret;
> > > > +
> > > > +	ret = cache_tag_assign(domain, did, dev, pasid,
> > > CACHE_TAG_TYPE_PARENT_IOTLB);
> > > > +	if (ret || !info->ats_enabled)
> > > > +		return ret;
> > > 
> > > I'm not sure I understood the point of PARENT_IOTLB? I didn't see any
> > > different implementation?
> > > 
> > > Isn't this backwards though? Each domain should have a list of things
> > > to invalidate if the domain itself changes.
> > > 
> > > So the nesting parent should have a list of CHILD_DEVTLB's that need
> > > cleaning. That list is changed when the nesting domains are attached
> > > to something.
> > > 
> > 
> > probably just a naming confusion. it's called PARENT_IOTLB from the
> > angle that this domain is used as a parent domain but actually it
> > tracks the child tags in nested attach.
> 
> Is NESTING_IOTLB more readable?

Yes

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH 01/12] iommu/vt-d: Add cache tag assignment interface
  2024-04-11 12:38     ` Baolu Lu
@ 2024-04-12  3:38       ` Tian, Kevin
  0 siblings, 0 replies; 44+ messages in thread
From: Tian, Kevin @ 2024-04-12  3:38 UTC (permalink / raw)
  To: Baolu Lu, Jason Gunthorpe
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Zhang, Tina, Liu, Yi L,
	iommu, linux-kernel

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Thursday, April 11, 2024 8:39 PM
> 
> On 2024/4/10 23:41, Jason Gunthorpe wrote:
> >> +struct cache_tag {
> >> +	struct list_head node;
> >> +	enum cache_tag_type type;
> >> +	struct intel_iommu *iommu;
> >> +	struct device *dev;
> > iommu and dev probably don't both need to be stored together. We have
> > iommu_get_iommu_dev() now.. I suppose this is probably a union of the
> > two pointers depending on tag. DEVTLB needs the dev and IOTLB needs
> > the iommu.
> 
> I forgot to reply this comment in previous reply. Sorry about it.
> 
> struct cache_tag {
>          [ ... ]
>          struct intel_iommu *iommu;
>          struct device *dev;
>          [ ... ]
> };
> 
> I treat @iommu as the queued invalidation interface. All cache
> invalidation raises to hardware through the invalidation queue.
> 
> The @dev field represents the location of the cache. For IOTLB cache, it
> resides on the IOMMU hardware. In this case, the field stores the device
> pointer to the IOMMU hardware. For DevTLB cache, it locates in the PCIe
> endpoint. Here, the field stores the device pointer to that endpoint.
> 
> A correctly set @dev pointer allows users to see more accurate trace
> message.
> 

It's not a bad idea to add a comment for @dev here.

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2024-04-12  3:38 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-25  2:16 [PATCH 00/12] Consolidate domain cache invalidation Lu Baolu
2024-03-25  2:16 ` [PATCH 01/12] iommu/vt-d: Add cache tag assignment interface Lu Baolu
2024-03-28  7:12   ` Tian, Kevin
2024-04-06 12:55     ` Baolu Lu
2024-04-07  4:35       ` Baolu Lu
2024-04-08  2:28         ` Tian, Kevin
2024-04-10 15:41   ` Jason Gunthorpe
2024-04-10 23:14     ` Tian, Kevin
2024-04-11 13:17       ` Baolu Lu
2024-04-11 13:42         ` Jason Gunthorpe
2024-04-11 12:10     ` Baolu Lu
2024-04-11 12:38     ` Baolu Lu
2024-04-12  3:38       ` Tian, Kevin
2024-03-25  2:16 ` [PATCH 02/12] iommu/vt-d: Add cache tag invalidation helpers Lu Baolu
2024-03-28  7:39   ` Tian, Kevin
2024-04-07  5:33     ` Baolu Lu
2024-04-08  2:33       ` Tian, Kevin
2024-04-08  2:53         ` Baolu Lu
2024-04-08  3:14           ` Tian, Kevin
2024-03-25  2:16 ` [PATCH 03/12] iommu/vt-d: Add trace events for cache tag interface Lu Baolu
2024-03-25  2:16 ` [PATCH 04/12] iommu/vt-d: Use cache_tag_flush_all() in flush_iotlb_all Lu Baolu
2024-03-28  7:47   ` Tian, Kevin
2024-04-07  5:56     ` Baolu Lu
2024-03-25  2:16 ` [PATCH 05/12] iommu/vt-d: Use cache_tag_flush_range() in tlb_sync Lu Baolu
2024-03-25  2:16 ` [PATCH 06/12] iommu/vt-d: Use cache_tag_flush_cm_range() in iotlb_sync_map Lu Baolu
2024-03-28  7:48   ` Tian, Kevin
2024-04-07  6:41     ` Baolu Lu
2024-04-08  2:51       ` Tian, Kevin
2024-04-08  2:57         ` Baolu Lu
2024-03-25  2:17 ` [PATCH 07/12] iommu/vt-d: Cleanup use of iommu_flush_iotlb_psi() Lu Baolu
2024-03-28  7:50   ` Tian, Kevin
2024-04-07  7:06     ` Baolu Lu
2024-04-08  2:57       ` Tian, Kevin
2024-03-25  2:17 ` [PATCH 08/12] iommu/vt-d: Use cache_tag_flush_range() in cache_invalidate_user Lu Baolu
2024-03-28  7:54   ` Tian, Kevin
2024-04-07  7:15     ` Baolu Lu
2024-03-25  2:17 ` [PATCH 09/12] iommu/vt-d: Use cache helpers in arch_invalidate_secondary_tlbs Lu Baolu
2024-03-25  2:17 ` [PATCH 10/12] iommu/vt-d: Retire intel_svm_dev Lu Baolu
2024-03-25  2:17 ` [PATCH 11/12] iommu: Add ops->domain_alloc_sva() Lu Baolu
2024-03-25  2:17 ` [PATCH 12/12] iommu/vt-d: Retire struct intel_svm Lu Baolu
2024-03-28  7:59 ` [PATCH 00/12] Consolidate domain cache invalidation Tian, Kevin
2024-04-07  7:28   ` Baolu Lu
2024-04-08  3:03     ` Tian, Kevin
2024-04-08  3:05       ` Baolu Lu
