* [PATCH v2 0/7] iommu/vt-d: Add page request draining support
@ 2020-04-15  5:25 Lu Baolu
  2020-04-15  5:25 ` [PATCH v2 1/7] iommu/vt-d: Refactor parameters for qi_submit_sync() Lu Baolu
                   ` (7 more replies)
  0 siblings, 8 replies; 24+ messages in thread
From: Lu Baolu @ 2020-04-15  5:25 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: ashok.raj, jacob.jun.pan, Liu Yi L, kevin.tian, iommu,
	linux-kernel, Lu Baolu

When a PASID is stopped or terminated, there can be pending PRQs
(requests that haven't received responses) in the software and
remapping hardware. The pending page requests must be drained
so that the PASID can be reused. The register-level interface
for page request draining is defined in 7.11 of the VT-d spec.
This series adds support for page request draining.

This includes two parts:
 - PATCH 1/7 ~ 3/7: refactor the qi_submit_sync() to support
   multiple descriptors per submission which will be used by
   PATCH 6/7.
 - PATCH 4/7 ~ 7/7: add page request drain support after a
   pasid entry is torn down due to an unbind operation.

Please help to review.

Best regards,
baolu

Change log:
 v1->v2:
  - Fix race between multiple prq handling threads

Lu Baolu (7):
  iommu/vt-d: Refactor parameters for qi_submit_sync()
  iommu/vt-d: Multiple descriptors per qi_submit_sync()
  iommu/vt-d: debugfs: Add support to show inv queue internals
  iommu/vt-d: Refactor prq_event_thread()
  iommu/vt-d: Save prq descriptors in an internal list
  iommu/vt-d: Add page request draining support
  iommu/vt-d: Remove redundant IOTLB flush

 drivers/iommu/dmar.c                |  63 +++--
 drivers/iommu/intel-iommu-debugfs.c |  62 +++++
 drivers/iommu/intel-pasid.c         |   4 +-
 drivers/iommu/intel-svm.c           | 383 ++++++++++++++++++----------
 drivers/iommu/intel_irq_remapping.c |   2 +-
 include/linux/intel-iommu.h         |  12 +-
 6 files changed, 369 insertions(+), 157 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 1/7] iommu/vt-d: Refactor parameters for qi_submit_sync()
  2020-04-15  5:25 [PATCH v2 0/7] iommu/vt-d: Add page request draining support Lu Baolu
@ 2020-04-15  5:25 ` Lu Baolu
  2020-04-15  8:02   ` Tian, Kevin
  2020-04-15  5:25 ` [PATCH v2 2/7] iommu/vt-d: Multiple descriptors per qi_submit_sync() Lu Baolu
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Lu Baolu @ 2020-04-15  5:25 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: ashok.raj, jacob.jun.pan, Liu Yi L, kevin.tian, iommu,
	linux-kernel, Lu Baolu

Currently, qi_submit_sync() supports a single invalidation descriptor
per submission and appends a wait descriptor after each submission
to poll for hardware completion. This patch adjusts the parameters
of this function so that multiple descriptors per submission can
be supported.
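
For callers that only need one descriptor, the conversion is purely
mechanical, as the hunks below show:

	/* Before: descriptor first, wait descriptor appended implicitly. */
	qi_submit_sync(&desc, iommu);

	/* After: iommu first, then the descriptor array, a count, and an
	 * options bitmap (0 requests no special behavior).
	 */
	qi_submit_sync(iommu, &desc, 1, 0);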

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/dmar.c                | 24 ++++++++++++++----------
 drivers/iommu/intel-pasid.c         |  4 ++--
 drivers/iommu/intel-svm.c           |  6 +++---
 drivers/iommu/intel_irq_remapping.c |  2 +-
 include/linux/intel-iommu.h         |  8 +++++++-
 5 files changed, 27 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index d9dc787feef7..bb42177e2369 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1225,10 +1225,14 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
 }
 
 /*
- * Submit the queued invalidation descriptor to the remapping
- * hardware unit and wait for its completion.
+ * Function to submit invalidation descriptors of all types to the queued
+ * invalidation interface (QI). Multiple descriptors can be submitted at a
+ * time; a wait descriptor will be appended to each submission to ensure
+ * hardware has completed the invalidation before return. Wait descriptors
+ * can be part of the submission, but they will not be polled for completion.
  */
-int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
+int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
+		   unsigned int count, unsigned long options)
 {
 	int rc;
 	struct q_inval *qi = iommu->qi;
@@ -1318,7 +1322,7 @@ void qi_global_iec(struct intel_iommu *iommu)
 	desc.qw3 = 0;
 
 	/* should never fail */
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 void qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid, u8 fm,
@@ -1332,7 +1336,7 @@ void qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid, u8 fm,
 	desc.qw2 = 0;
 	desc.qw3 = 0;
 
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
@@ -1356,7 +1360,7 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
 	desc.qw2 = 0;
 	desc.qw3 = 0;
 
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
@@ -1378,7 +1382,7 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 	desc.qw2 = 0;
 	desc.qw3 = 0;
 
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 /* PASID-based IOTLB invalidation */
@@ -1419,7 +1423,7 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,
 				QI_EIOTLB_AM(mask);
 	}
 
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 /* PASID-based device IOTLB Invalidate */
@@ -1448,7 +1452,7 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 	if (size_order)
 		desc.qw1 |= QI_DEV_EIOTLB_SIZE;
 
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did,
@@ -1458,7 +1462,7 @@ void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did,
 
 	desc.qw0 = QI_PC_PASID(pasid) | QI_PC_DID(did) |
 			QI_PC_GRAN(granu) | QI_PC_TYPE;
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 /*
diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 48cc9ca5f3dc..7969e3dac2ad 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -498,7 +498,7 @@ pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
 	desc.qw2 = 0;
 	desc.qw3 = 0;
 
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 static void
@@ -512,7 +512,7 @@ iotlb_invalidation_with_pasid(struct intel_iommu *iommu, u16 did, u32 pasid)
 	desc.qw2 = 0;
 	desc.qw3 = 0;
 
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 static void
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index e9f4e979a71f..83dc4319f661 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -138,7 +138,7 @@ static void intel_flush_svm_range_dev (struct intel_svm *svm, struct intel_svm_d
 	}
 	desc.qw2 = 0;
 	desc.qw3 = 0;
-	qi_submit_sync(&desc, svm->iommu);
+	qi_submit_sync(svm->iommu, &desc, 1, 0);
 
 	if (sdev->dev_iotlb) {
 		desc.qw0 = QI_DEV_EIOTLB_PASID(svm->pasid) |
@@ -162,7 +162,7 @@ static void intel_flush_svm_range_dev (struct intel_svm *svm, struct intel_svm_d
 		}
 		desc.qw2 = 0;
 		desc.qw3 = 0;
-		qi_submit_sync(&desc, svm->iommu);
+		qi_submit_sync(svm->iommu, &desc, 1, 0);
 	}
 }
 
@@ -850,7 +850,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 				       sizeof(req->priv_data));
 			resp.qw2 = 0;
 			resp.qw3 = 0;
-			qi_submit_sync(&resp, iommu);
+			qi_submit_sync(iommu, &resp, 1, 0);
 		}
 		head = (head + sizeof(*req)) & PRQ_RING_MASK;
 	}
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 81e43c1df7ec..a042f123b091 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -151,7 +151,7 @@ static int qi_flush_iec(struct intel_iommu *iommu, int index, int mask)
 	desc.qw2 = 0;
 	desc.qw3 = 0;
 
-	return qi_submit_sync(&desc, iommu);
+	return qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 static int modify_irte(struct irq_2_iommu *irq_iommu,
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index cfe720f10112..ee2d5cdd8339 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -710,7 +710,13 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu,
 			  int pasid);
 
-extern int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu);
+int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
+		   unsigned int count, unsigned long options);
+/*
+ * Options used in qi_submit_sync:
+ * QI_OPT_WAIT_DRAIN - Wait for PRQ drain completion, spec 6.5.2.8.
+ */
+#define QI_OPT_WAIT_DRAIN		BIT(0)
 
 extern int dmar_ir_support(void);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 2/7] iommu/vt-d: Multiple descriptors per qi_submit_sync()
  2020-04-15  5:25 [PATCH v2 0/7] iommu/vt-d: Add page request draining support Lu Baolu
  2020-04-15  5:25 ` [PATCH v2 1/7] iommu/vt-d: Refactor parameters for qi_submit_sync() Lu Baolu
@ 2020-04-15  5:25 ` Lu Baolu
  2020-04-15  8:18   ` Tian, Kevin
  2020-04-15  5:25 ` [PATCH v2 3/7] iommu/vt-d: debugfs: Add support to show inv queue internals Lu Baolu
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Lu Baolu @ 2020-04-15  5:25 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: ashok.raj, jacob.jun.pan, Liu Yi L, kevin.tian, iommu,
	linux-kernel, Lu Baolu

Extend qi_submit_sync() function to support multiple descriptors.
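
A rough sketch of the intended usage (the descriptor contents are
placeholders borrowed from a later patch in this series; iommu, pasid,
did, sid, qdep and pfsid are assumed to be known to the caller):

	struct qi_desc desc[2] = {};

	desc[0].qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
			QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) | QI_EIOTLB_TYPE;
	desc[1].qw0 = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) |
			QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
			QI_DEV_IOTLB_PFSID(pfsid);

	/* One submission covers both descriptors; a single wait
	 * descriptor is appended and polled for completion.
	 */
	qi_submit_sync(iommu, desc, 2, 0);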

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/dmar.c        | 39 +++++++++++++++++++++++--------------
 include/linux/intel-iommu.h |  1 +
 2 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index bb42177e2369..61d049e91f84 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1157,12 +1157,11 @@ static inline void reclaim_free_desc(struct q_inval *qi)
 	}
 }
 
-static int qi_check_fault(struct intel_iommu *iommu, int index)
+static int qi_check_fault(struct intel_iommu *iommu, int index, int wait_index)
 {
 	u32 fault;
 	int head, tail;
 	struct q_inval *qi = iommu->qi;
-	int wait_index = (index + 1) % QI_LENGTH;
 	int shift = qi_shift(iommu);
 
 	if (qi->desc_status[wait_index] == QI_ABORT)
@@ -1234,12 +1233,12 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
 int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
 		   unsigned int count, unsigned long options)
 {
-	int rc;
 	struct q_inval *qi = iommu->qi;
-	int offset, shift, length;
 	struct qi_desc wait_desc;
 	int wait_index, index;
 	unsigned long flags;
+	int offset, shift;
+	int rc, i;
 
 	if (!qi)
 		return 0;
@@ -1248,32 +1247,41 @@ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
 	rc = 0;
 
 	raw_spin_lock_irqsave(&qi->q_lock, flags);
-	while (qi->free_cnt < 3) {
+	/*
+	 * Check if we have enough empty slots in the queue to submit,
+	 * the calculation is based on:
+	 * # of desc + 1 wait desc + 1 space between head and tail
+	 */
+	while (qi->free_cnt < count + 2) {
 		raw_spin_unlock_irqrestore(&qi->q_lock, flags);
 		cpu_relax();
 		raw_spin_lock_irqsave(&qi->q_lock, flags);
 	}
 
 	index = qi->free_head;
-	wait_index = (index + 1) % QI_LENGTH;
+	wait_index = (index + count) % QI_LENGTH;
 	shift = qi_shift(iommu);
-	length = 1 << shift;
 
-	qi->desc_status[index] = qi->desc_status[wait_index] = QI_IN_USE;
+	for (i = 0; i < count; i++) {
+		offset = ((index + i) % QI_LENGTH) << shift;
+		memcpy(qi->desc + offset, &desc[i], 1 << shift);
+		qi->desc_status[(index + i) % QI_LENGTH] = QI_IN_USE;
+	}
+	qi->desc_status[wait_index] = QI_IN_USE;
 
-	offset = index << shift;
-	memcpy(qi->desc + offset, desc, length);
 	wait_desc.qw0 = QI_IWD_STATUS_DATA(QI_DONE) |
 			QI_IWD_STATUS_WRITE | QI_IWD_TYPE;
+	if (options & QI_OPT_WAIT_DRAIN)
+		wait_desc.qw0 |= QI_IWD_PRQ_DRAIN;
 	wait_desc.qw1 = virt_to_phys(&qi->desc_status[wait_index]);
 	wait_desc.qw2 = 0;
 	wait_desc.qw3 = 0;
 
 	offset = wait_index << shift;
-	memcpy(qi->desc + offset, &wait_desc, length);
+	memcpy(qi->desc + offset, &wait_desc, 1 << shift);
 
-	qi->free_head = (qi->free_head + 2) % QI_LENGTH;
-	qi->free_cnt -= 2;
+	qi->free_head = (qi->free_head + count + 1) % QI_LENGTH;
+	qi->free_cnt -= count + 1;
 
 	/*
 	 * update the HW tail register indicating the presence of
@@ -1289,7 +1297,7 @@ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
 		 * a deadlock where the interrupt context can wait indefinitely
 		 * for free slots in the queue.
 		 */
-		rc = qi_check_fault(iommu, index);
+		rc = qi_check_fault(iommu, index, wait_index);
 		if (rc)
 			break;
 
@@ -1298,7 +1306,8 @@ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
 		raw_spin_lock(&qi->q_lock);
 	}
 
-	qi->desc_status[index] = QI_DONE;
+	for (i = 0; i < count; i++)
+		qi->desc_status[(index + i) % QI_LENGTH] = QI_DONE;
 
 	reclaim_free_desc(qi);
 	raw_spin_unlock_irqrestore(&qi->q_lock, flags);
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index ee2d5cdd8339..cca1e5f9aeaa 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -333,6 +333,7 @@ enum {
 
 #define QI_IWD_STATUS_DATA(d)	(((u64)d) << 32)
 #define QI_IWD_STATUS_WRITE	(((u64)1) << 5)
+#define QI_IWD_PRQ_DRAIN	(((u64)1) << 7)
 
 #define QI_IOTLB_DID(did) 	(((u64)did) << 16)
 #define QI_IOTLB_DR(dr) 	(((u64)dr) << 7)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 3/7] iommu/vt-d: debugfs: Add support to show inv queue internals
  2020-04-15  5:25 [PATCH v2 0/7] iommu/vt-d: Add page request draining support Lu Baolu
  2020-04-15  5:25 ` [PATCH v2 1/7] iommu/vt-d: Refactor parameters for qi_submit_sync() Lu Baolu
  2020-04-15  5:25 ` [PATCH v2 2/7] iommu/vt-d: Multiple descriptors per qi_submit_sync() Lu Baolu
@ 2020-04-15  5:25 ` Lu Baolu
  2020-04-15  5:25 ` [PATCH v2 4/7] iommu/vt-d: Refactor prq_event_thread() Lu Baolu
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Lu Baolu @ 2020-04-15  5:25 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: ashok.raj, jacob.jun.pan, Liu Yi L, kevin.tian, iommu,
	linux-kernel, Lu Baolu

Export the invalidation queue internals of each IOMMU device
through debugfs.

Example of such dump on a Skylake machine:

$ sudo cat /sys/kernel/debug/iommu/intel/invalidation_queue
Invalidation queue on IOMMU: dmar1
 Base: 0x1672c9000      Head: 80        Tail: 80
Index           qw0                     qw1                     status
    0   0000000000000004        0000000000000000        0000000000000000
    1   0000000200000025        00000001672be804        0000000000000000
    2   0000000000000011        0000000000000000        0000000000000000
    3   0000000200000025        00000001672be80c        0000000000000000
    4   00000000000000d2        0000000000000000        0000000000000000
    5   0000000200000025        00000001672be814        0000000000000000
    6   0000000000000014        0000000000000000        0000000000000000
    7   0000000200000025        00000001672be81c        0000000000000000
    8   0000000000000014        0000000000000000        0000000000000000
    9   0000000200000025        00000001672be824        0000000000000000

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-iommu-debugfs.c | 62 +++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/drivers/iommu/intel-iommu-debugfs.c b/drivers/iommu/intel-iommu-debugfs.c
index 3eb1fe240fb0..e3089865b8f3 100644
--- a/drivers/iommu/intel-iommu-debugfs.c
+++ b/drivers/iommu/intel-iommu-debugfs.c
@@ -372,6 +372,66 @@ static int domain_translation_struct_show(struct seq_file *m, void *unused)
 }
 DEFINE_SHOW_ATTRIBUTE(domain_translation_struct);
 
+static void invalidation_queue_entry_show(struct seq_file *m,
+					  struct intel_iommu *iommu)
+{
+	int index, shift = qi_shift(iommu);
+	struct qi_desc *desc;
+	int offset;
+
+	if (ecap_smts(iommu->ecap))
+		seq_puts(m, "Index\t\tqw0\t\t\tqw1\t\t\tqw2\t\t\tqw3\t\t\tstatus\n");
+	else
+		seq_puts(m, "Index\t\tqw0\t\t\tqw1\t\t\tstatus\n");
+
+	for (index = 0; index < QI_LENGTH; index++) {
+		offset = index << shift;
+		desc = iommu->qi->desc + offset;
+		if (ecap_smts(iommu->ecap))
+			seq_printf(m, "%5d\t%016llx\t%016llx\t%016llx\t%016llx\t%016x\n",
+				   index, desc->qw0, desc->qw1,
+				   desc->qw2, desc->qw3,
+				   iommu->qi->desc_status[index]);
+		else
+			seq_printf(m, "%5d\t%016llx\t%016llx\t%016x\n",
+				   index, desc->qw0, desc->qw1,
+				   iommu->qi->desc_status[index]);
+	}
+}
+
+static int invalidation_queue_show(struct seq_file *m, void *unused)
+{
+	struct dmar_drhd_unit *drhd;
+	struct intel_iommu *iommu;
+	unsigned long flags;
+	struct q_inval *qi;
+	int shift;
+
+	rcu_read_lock();
+	for_each_active_iommu(iommu, drhd) {
+		qi = iommu->qi;
+		shift = qi_shift(iommu);
+
+		if (!qi || !ecap_qis(iommu->ecap))
+			continue;
+
+		seq_printf(m, "Invalidation queue on IOMMU: %s\n", iommu->name);
+
+		raw_spin_lock_irqsave(&qi->q_lock, flags);
+		seq_printf(m, " Base: 0x%llx\tHead: %lld\tTail: %lld\n",
+			   virt_to_phys(qi->desc),
+			   dmar_readq(iommu->reg + DMAR_IQH_REG) >> shift,
+			   dmar_readq(iommu->reg + DMAR_IQT_REG) >> shift);
+		invalidation_queue_entry_show(m, iommu);
+		raw_spin_unlock_irqrestore(&qi->q_lock, flags);
+		seq_putc(m, '\n');
+	}
+	rcu_read_unlock();
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(invalidation_queue);
+
 #ifdef CONFIG_IRQ_REMAP
 static void ir_tbl_remap_entry_show(struct seq_file *m,
 				    struct intel_iommu *iommu)
@@ -490,6 +550,8 @@ void __init intel_iommu_debugfs_init(void)
 	debugfs_create_file("domain_translation_struct", 0444,
 			    intel_iommu_debug, NULL,
 			    &domain_translation_struct_fops);
+	debugfs_create_file("invalidation_queue", 0444, intel_iommu_debug,
+			    NULL, &invalidation_queue_fops);
 #ifdef CONFIG_IRQ_REMAP
 	debugfs_create_file("ir_translation_struct", 0444, intel_iommu_debug,
 			    NULL, &ir_translation_struct_fops);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 4/7] iommu/vt-d: Refactor prq_event_thread()
  2020-04-15  5:25 [PATCH v2 0/7] iommu/vt-d: Add page request draining support Lu Baolu
                   ` (2 preceding siblings ...)
  2020-04-15  5:25 ` [PATCH v2 3/7] iommu/vt-d: debugfs: Add support to show inv queue internals Lu Baolu
@ 2020-04-15  5:25 ` Lu Baolu
  2020-04-15  9:15   ` Tian, Kevin
  2020-04-15  5:25 ` [PATCH v2 5/7] iommu/vt-d: Save prq descriptors in an internal list Lu Baolu
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Lu Baolu @ 2020-04-15  5:25 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: ashok.raj, jacob.jun.pan, Liu Yi L, kevin.tian, iommu,
	linux-kernel, Lu Baolu

Move the part of prq_event_thread() that processes page request
descriptors in software into a separate function. No functional
changes.
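
The resulting call structure (function names as introduced below):

	prq_event_thread()
	    intel_svm_process_prq()     /* walk the ring from head to tail */
	        process_single_prq()    /* handle one page request descriptor */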

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-svm.c | 256 ++++++++++++++++++++------------------
 1 file changed, 135 insertions(+), 121 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 83dc4319f661..a1921b462783 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -722,142 +722,156 @@ static bool is_canonical_address(u64 addr)
 	return (((saddr << shift) >> shift) == saddr);
 }
 
-static irqreturn_t prq_event_thread(int irq, void *d)
+static void process_single_prq(struct intel_iommu *iommu,
+			       struct page_req_dsc *req)
 {
-	struct intel_iommu *iommu = d;
-	struct intel_svm *svm = NULL;
-	int head, tail, handled = 0;
-
-	/* Clear PPR bit before reading head/tail registers, to
-	 * ensure that we get a new interrupt if needed. */
-	writel(DMA_PRS_PPR, iommu->reg + DMAR_PRS_REG);
-
-	tail = dmar_readq(iommu->reg + DMAR_PQT_REG) & PRQ_RING_MASK;
-	head = dmar_readq(iommu->reg + DMAR_PQH_REG) & PRQ_RING_MASK;
-	while (head != tail) {
-		struct intel_svm_dev *sdev;
-		struct vm_area_struct *vma;
-		struct page_req_dsc *req;
-		struct qi_desc resp;
-		int result;
-		vm_fault_t ret;
-		u64 address;
-
-		handled = 1;
-
-		req = &iommu->prq[head / sizeof(*req)];
+	int result = QI_RESP_FAILURE;
+	struct intel_svm_dev *sdev;
+	struct vm_area_struct *vma;
+	struct intel_svm *svm;
+	struct qi_desc resp;
+	vm_fault_t ret;
+	u64 address;
+
+	address = (u64)req->addr << VTD_PAGE_SHIFT;
+	if (!req->pasid_present) {
+		pr_err("%s: Page request without PASID: %08llx %08llx\n",
+		       iommu->name, ((unsigned long long *)req)[0],
+		       ((unsigned long long *)req)[1]);
+		goto no_pasid;
+	}
 
-		result = QI_RESP_FAILURE;
-		address = (u64)req->addr << VTD_PAGE_SHIFT;
-		if (!req->pasid_present) {
-			pr_err("%s: Page request without PASID: %08llx %08llx\n",
-			       iommu->name, ((unsigned long long *)req)[0],
-			       ((unsigned long long *)req)[1]);
-			goto no_pasid;
-		}
+	rcu_read_lock();
+	svm = ioasid_find(NULL, req->pasid, NULL);
+	/*
+	 * It *can't* go away, because the driver is not permitted
+	 * to unbind the mm while any page faults are outstanding.
+	 * So we only need RCU to protect the internal idr code.
+	 */
+	rcu_read_unlock();
 
-		if (!svm || svm->pasid != req->pasid) {
-			rcu_read_lock();
-			svm = ioasid_find(NULL, req->pasid, NULL);
-			/* It *can't* go away, because the driver is not permitted
-			 * to unbind the mm while any page faults are outstanding.
-			 * So we only need RCU to protect the internal idr code. */
-			rcu_read_unlock();
-			if (IS_ERR_OR_NULL(svm)) {
-				pr_err("%s: Page request for invalid PASID %d: %08llx %08llx\n",
-				       iommu->name, req->pasid, ((unsigned long long *)req)[0],
-				       ((unsigned long long *)req)[1]);
-				goto no_pasid;
-			}
-		}
+	if (IS_ERR_OR_NULL(svm)) {
+		pr_err("%s: Page request for invalid PASID %d: %08llx %08llx\n",
+		       iommu->name, req->pasid, ((unsigned long long *)req)[0],
+		       ((unsigned long long *)req)[1]);
+		goto no_pasid;
+	}
 
-		result = QI_RESP_INVALID;
-		/* Since we're using init_mm.pgd directly, we should never take
-		 * any faults on kernel addresses. */
-		if (!svm->mm)
-			goto bad_req;
+	result = QI_RESP_INVALID;
+	/* Since we're using init_mm.pgd directly, we should never take
+	 * any faults on kernel addresses. */
+	if (!svm->mm)
+		goto bad_req;
+
+	/* If address is not canonical, return invalid response */
+	if (!is_canonical_address(address))
+		goto bad_req;
+
+	/* If the mm is already defunct, don't handle faults. */
+	if (!mmget_not_zero(svm->mm))
+		goto bad_req;
+
+	down_read(&svm->mm->mmap_sem);
+	vma = find_extend_vma(svm->mm, address);
+	if (!vma || address < vma->vm_start)
+		goto invalid;
+
+	if (access_error(vma, req))
+		goto invalid;
+
+	ret = handle_mm_fault(vma, address,
+			      req->wr_req ? FAULT_FLAG_WRITE : 0);
+	if (ret & VM_FAULT_ERROR)
+		goto invalid;
+
+	result = QI_RESP_SUCCESS;
+invalid:
+	up_read(&svm->mm->mmap_sem);
+	mmput(svm->mm);
+bad_req:
+	/* Accounting for major/minor faults? */
+	rcu_read_lock();
+	list_for_each_entry_rcu(sdev, &svm->devs, list) {
+		if (sdev->sid == req->rid)
+			break;
+	}
 
-		/* If address is not canonical, return invalid response */
-		if (!is_canonical_address(address))
-			goto bad_req;
+	/* Other devices can go away, but the drivers are not permitted
+	 * to unbind while any page faults might be in flight. So it's
+	 * OK to drop the 'lock' here now we have it. */
+	rcu_read_unlock();
 
-		/* If the mm is already defunct, don't handle faults. */
-		if (!mmget_not_zero(svm->mm))
-			goto bad_req;
+	if (WARN_ON(&sdev->list == &svm->devs))
+		sdev = NULL;
 
-		down_read(&svm->mm->mmap_sem);
-		vma = find_extend_vma(svm->mm, address);
-		if (!vma || address < vma->vm_start)
-			goto invalid;
+	if (sdev && sdev->ops && sdev->ops->fault_cb) {
+		int rwxp = (req->rd_req << 3) | (req->wr_req << 2) |
+			(req->exe_req << 1) | (req->pm_req);
+		sdev->ops->fault_cb(sdev->dev, req->pasid, req->addr,
+				    req->priv_data, rwxp, result);
+	}
 
-		if (access_error(vma, req))
-			goto invalid;
+	/* We get here in the error case where the PASID lookup failed,
+	   and these can be NULL. Do not use them below this point! */
+	sdev = NULL;
+	svm = NULL;
+no_pasid:
+	if (req->lpig || req->priv_data_present) {
+		/*
+		 * Per VT-d spec. v3.0 ch7.7, system software must
+		 * respond with page group response if private data
+		 * is present (PDP) or last page in group (LPIG) bit
+		 * is set. This is an additional VT-d feature beyond
+		 * PCI ATS spec.
+		 */
+		resp.qw0 = QI_PGRP_PASID(req->pasid) |
+			QI_PGRP_DID(req->rid) |
+			QI_PGRP_PASID_P(req->pasid_present) |
+			QI_PGRP_PDP(req->pasid_present) |
+			QI_PGRP_RESP_CODE(result) |
+			QI_PGRP_RESP_TYPE;
+		resp.qw1 = QI_PGRP_IDX(req->prg_index) |
+			QI_PGRP_LPIG(req->lpig);
+
+		if (req->priv_data_present)
+			memcpy(&resp.qw2, req->priv_data,
+			       sizeof(req->priv_data));
+		resp.qw2 = 0;
+		resp.qw3 = 0;
+		qi_submit_sync(iommu, &resp, 1, 0);
+	}
+}
 
-		ret = handle_mm_fault(vma, address,
-				      req->wr_req ? FAULT_FLAG_WRITE : 0);
-		if (ret & VM_FAULT_ERROR)
-			goto invalid;
+static void intel_svm_process_prq(struct intel_iommu *iommu,
+				  struct page_req_dsc *prq,
+				  int head, int tail)
+{
+	struct page_req_dsc *req;
 
-		result = QI_RESP_SUCCESS;
-	invalid:
-		up_read(&svm->mm->mmap_sem);
-		mmput(svm->mm);
-	bad_req:
-		/* Accounting for major/minor faults? */
-		rcu_read_lock();
-		list_for_each_entry_rcu(sdev, &svm->devs, list) {
-			if (sdev->sid == req->rid)
-				break;
-		}
-		/* Other devices can go away, but the drivers are not permitted
-		 * to unbind while any page faults might be in flight. So it's
-		 * OK to drop the 'lock' here now we have it. */
-		rcu_read_unlock();
-
-		if (WARN_ON(&sdev->list == &svm->devs))
-			sdev = NULL;
-
-		if (sdev && sdev->ops && sdev->ops->fault_cb) {
-			int rwxp = (req->rd_req << 3) | (req->wr_req << 2) |
-				(req->exe_req << 1) | (req->pm_req);
-			sdev->ops->fault_cb(sdev->dev, req->pasid, req->addr,
-					    req->priv_data, rwxp, result);
-		}
-		/* We get here in the error case where the PASID lookup failed,
-		   and these can be NULL. Do not use them below this point! */
-		sdev = NULL;
-		svm = NULL;
-	no_pasid:
-		if (req->lpig || req->priv_data_present) {
-			/*
-			 * Per VT-d spec. v3.0 ch7.7, system software must
-			 * respond with page group response if private data
-			 * is present (PDP) or last page in group (LPIG) bit
-			 * is set. This is an additional VT-d feature beyond
-			 * PCI ATS spec.
-			 */
-			resp.qw0 = QI_PGRP_PASID(req->pasid) |
-				QI_PGRP_DID(req->rid) |
-				QI_PGRP_PASID_P(req->pasid_present) |
-				QI_PGRP_PDP(req->pasid_present) |
-				QI_PGRP_RESP_CODE(result) |
-				QI_PGRP_RESP_TYPE;
-			resp.qw1 = QI_PGRP_IDX(req->prg_index) |
-				QI_PGRP_LPIG(req->lpig);
-
-			if (req->priv_data_present)
-				memcpy(&resp.qw2, req->priv_data,
-				       sizeof(req->priv_data));
-			resp.qw2 = 0;
-			resp.qw3 = 0;
-			qi_submit_sync(iommu, &resp, 1, 0);
-		}
+	while (head != tail) {
+		req = &iommu->prq[head / sizeof(*req)];
+		process_single_prq(iommu, req);
 		head = (head + sizeof(*req)) & PRQ_RING_MASK;
 	}
+}
+
+static irqreturn_t prq_event_thread(int irq, void *d)
+{
+	struct intel_iommu *iommu = d;
+	int head, tail;
 
+	/*
+	 * Clear PPR bit before reading head/tail registers, to
+	 * ensure that we get a new interrupt if needed.
+	 */
+	writel(DMA_PRS_PPR, iommu->reg + DMAR_PRS_REG);
+
+	tail = dmar_readq(iommu->reg + DMAR_PQT_REG) & PRQ_RING_MASK;
+	head = dmar_readq(iommu->reg + DMAR_PQH_REG) & PRQ_RING_MASK;
+	intel_svm_process_prq(iommu, iommu->prq, head, tail);
 	dmar_writeq(iommu->reg + DMAR_PQH_REG, tail);
 
-	return IRQ_RETVAL(handled);
+	return IRQ_RETVAL(1);
 }
 
 #define to_intel_svm_dev(handle) container_of(handle, struct intel_svm_dev, sva)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 5/7] iommu/vt-d: Save prq descriptors in an internal list
  2020-04-15  5:25 [PATCH v2 0/7] iommu/vt-d: Add page request draining support Lu Baolu
                   ` (3 preceding siblings ...)
  2020-04-15  5:25 ` [PATCH v2 4/7] iommu/vt-d: Refactor prq_event_thread() Lu Baolu
@ 2020-04-15  5:25 ` Lu Baolu
  2020-04-15  9:30   ` Tian, Kevin
  2020-04-15  5:25 ` [PATCH v2 6/7] iommu/vt-d: Add page request draining support Lu Baolu
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Lu Baolu @ 2020-04-15  5:25 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: ashok.raj, jacob.jun.pan, Liu Yi L, kevin.tian, iommu,
	linux-kernel, Lu Baolu

Currently, the page request interrupt thread handles the page
requests in the queue in this way:

- Clear the PPR bit to ensure that a new interrupt can come in;
- Read and record the head and tail registers;
- Handle all descriptors between head and tail;
- Write the tail value to the head register.

This might cause some descriptors to be handled multiple times.
An example sequence:

- Thread A gets scheduled with PRQ_1 and PRQ_2 in the queue;
- Thread A clears the PPR bit and records the head and tail;
- A new PRQ_3 arrives and Thread B gets scheduled;
- Thread B records the head and tail, which include PRQ_1
  and PRQ_2.

As a result, PRQ_1 and PRQ_2 are handled twice, once in Thread_A
and once in Thread_B.

       Thread_A            Thread_B
      .--------.          .--------.
      |        |          |        |
      .--------.          .--------.
  head| PRQ_1  |      head| PRQ_1  |
      .--------.          .--------.
      | PRQ_2  |          | PRQ_2  |
      .--------.          .--------.
  tail|        |          | PRQ_3  |
      .--------.          .--------.
      |        |      tail|        |
      '--------'          '--------'

To avoid this, we could take a spinlock to ensure that PRQs are
handled in a serialized way, but that would mean calling
intel_svm_process_prq() with the spinlock held, which adds extra
complexity to intel_svm_process_prq().

Instead, this patch serializes the handling of PRQ descriptors
while removing the need to hold the spinlock across
intel_svm_process_prq(), by saving the descriptors in a list.
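
In outline, the scheme implemented by the hunks below looks like this
(locking and error handling abbreviated):

	/* prq_event_thread(), under prq_lock: copy requests off the ring. */
	req = kzalloc(sizeof(*req), GFP_ATOMIC);
	req->desc = *dsc;
	list_add_tail(&req->list, &iommu->prq_list);

	/* intel_svm_process_prq(): handle list entries one by one with
	 * prq_lock dropped around the actual processing, then mark them
	 * completed so they can be freed on the next pass.
	 */
	req->processing = true;
	spin_unlock_irqrestore(&iommu->prq_lock, flags);
	process_single_prq(iommu, &req->desc);
	spin_lock_irqsave(&iommu->prq_lock, flags);
	req->completed = true;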

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-svm.c   | 58 ++++++++++++++++++++++++++++++-------
 include/linux/intel-iommu.h |  2 ++
 2 files changed, 49 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index a1921b462783..05aeb8ea51c4 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -50,6 +50,8 @@ int intel_svm_enable_prq(struct intel_iommu *iommu)
 		return ret;
 	}
 	iommu->pr_irq = irq;
+	INIT_LIST_HEAD(&iommu->prq_list);
+	spin_lock_init(&iommu->prq_lock);
 
 	snprintf(iommu->prq_name, sizeof(iommu->prq_name), "dmar%d-prq", iommu->seq_id);
 
@@ -698,6 +700,14 @@ struct page_req_dsc {
 
 #define PRQ_RING_MASK	((0x1000 << PRQ_ORDER) - 0x20)
 
+struct page_req {
+	struct list_head list;
+	struct page_req_dsc desc;
+	unsigned int processing:1;
+	unsigned int drained:1;
+	unsigned int completed:1;
+};
+
 static bool access_error(struct vm_area_struct *vma, struct page_req_dsc *req)
 {
 	unsigned long requested = 0;
@@ -842,34 +852,60 @@ static void process_single_prq(struct intel_iommu *iommu,
 	}
 }
 
-static void intel_svm_process_prq(struct intel_iommu *iommu,
-				  struct page_req_dsc *prq,
-				  int head, int tail)
+static void intel_svm_process_prq(struct intel_iommu *iommu)
 {
-	struct page_req_dsc *req;
-
-	while (head != tail) {
-		req = &iommu->prq[head / sizeof(*req)];
-		process_single_prq(iommu, req);
-		head = (head + sizeof(*req)) & PRQ_RING_MASK;
+	struct page_req *req;
+	unsigned long flags;
+
+	spin_lock_irqsave(&iommu->prq_lock, flags);
+	while (!list_empty(&iommu->prq_list)) {
+		req = list_first_entry(&iommu->prq_list, struct page_req, list);
+		if (!req->processing) {
+			req->processing = true;
+			spin_unlock_irqrestore(&iommu->prq_lock, flags);
+			process_single_prq(iommu, &req->desc);
+			spin_lock_irqsave(&iommu->prq_lock, flags);
+			req->completed = true;
+		} else if (req->completed) {
+			list_del(&req->list);
+			kfree(req);
+		} else {
+			break;
+		}
 	}
+	spin_unlock_irqrestore(&iommu->prq_lock, flags);
 }
 
 static irqreturn_t prq_event_thread(int irq, void *d)
 {
 	struct intel_iommu *iommu = d;
+	unsigned long flags;
 	int head, tail;
 
+	spin_lock_irqsave(&iommu->prq_lock, flags);
 	/*
 	 * Clear PPR bit before reading head/tail registers, to
 	 * ensure that we get a new interrupt if needed.
 	 */
 	writel(DMA_PRS_PPR, iommu->reg + DMAR_PRS_REG);
-
 	tail = dmar_readq(iommu->reg + DMAR_PQT_REG) & PRQ_RING_MASK;
 	head = dmar_readq(iommu->reg + DMAR_PQH_REG) & PRQ_RING_MASK;
-	intel_svm_process_prq(iommu, iommu->prq, head, tail);
+	while (head != tail) {
+		struct page_req_dsc *dsc;
+		struct page_req *req;
+
+		dsc = &iommu->prq[head / sizeof(*dsc)];
+		req = kzalloc(sizeof (*req), GFP_ATOMIC);
+		if (!req)
+			break;
+		req->desc = *dsc;
+		list_add_tail(&req->list, &iommu->prq_list);
+		head = (head + sizeof(*dsc)) & PRQ_RING_MASK;
+	}
 	dmar_writeq(iommu->reg + DMAR_PQH_REG, tail);
+	spin_unlock_irqrestore(&iommu->prq_lock, flags);
+
+	intel_svm_process_prq(iommu);
 
 	return IRQ_RETVAL(1);
 }
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index cca1e5f9aeaa..80715a59491c 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -590,6 +590,8 @@ struct intel_iommu {
 #ifdef CONFIG_INTEL_IOMMU_SVM
 	struct page_req_dsc *prq;
 	unsigned char prq_name[16];    /* Name for PRQ interrupt */
+	struct list_head prq_list;	/* pending page request list */
+	spinlock_t prq_lock;		/* protect above pending list */
 	struct ioasid_allocator_ops pasid_allocator; /* Custom allocator for PASIDs */
 #endif
 	struct q_inval  *qi;            /* Queued invalidation info */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 6/7] iommu/vt-d: Add page request draining support
  2020-04-15  5:25 [PATCH v2 0/7] iommu/vt-d: Add page request draining support Lu Baolu
                   ` (4 preceding siblings ...)
  2020-04-15  5:25 ` [PATCH v2 5/7] iommu/vt-d: Save prq descriptors in an internal list Lu Baolu
@ 2020-04-15  5:25 ` Lu Baolu
  2020-04-15 11:10   ` Tian, Kevin
  2020-04-15  5:25 ` [PATCH v2 7/7] iommu/vt-d: Remove redundant IOTLB flush Lu Baolu
  2020-04-15  7:57 ` [PATCH v2 0/7] iommu/vt-d: Add page request draining support Tian, Kevin
  7 siblings, 1 reply; 24+ messages in thread
From: Lu Baolu @ 2020-04-15  5:25 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: ashok.raj, jacob.jun.pan, Liu Yi L, kevin.tian, iommu,
	linux-kernel, Lu Baolu

When a PASID is stopped or terminated, there can be pending
PRQs (requests that haven't received responses) in the remapping
hardware. This patch adds an interface to drain page requests and
calls it when a PASID is terminated.
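
Each unbind path then follows the same ordering (taken from the hunks
below; the exact argument variables differ slightly per call site):

	intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
	intel_svm_drain_prq(dev, svm->pasid);
	intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);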

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-svm.c   | 90 ++++++++++++++++++++++++++++++++++---
 include/linux/intel-iommu.h |  1 +
 2 files changed, 86 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 05aeb8ea51c4..736dd39fb52b 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -23,6 +23,7 @@
 #include "intel-pasid.h"
 
 static irqreturn_t prq_event_thread(int irq, void *d);
+static void intel_svm_drain_prq(struct device *dev, int pasid);
 
 #define PRQ_ORDER 0
 
@@ -210,6 +211,7 @@ static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
 	rcu_read_lock();
 	list_for_each_entry_rcu(sdev, &svm->devs, list) {
 		intel_pasid_tear_down_entry(svm->iommu, sdev->dev, svm->pasid);
+		intel_svm_drain_prq(sdev->dev, svm->pasid);
 		intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
 	}
 	rcu_read_unlock();
@@ -403,12 +405,8 @@ int intel_svm_unbind_gpasid(struct device *dev, int pasid)
 		if (!sdev->users) {
 			list_del_rcu(&sdev->list);
 			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
+			intel_svm_drain_prq(dev, svm->pasid);
 			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
-			/* TODO: Drain in flight PRQ for the PASID since it
-			 * may get reused soon, we don't want to
-			 * confuse with its previous life.
-			 * intel_svm_drain_prq(dev, pasid);
-			 */
 			kfree_rcu(sdev, rcu);
 
 			if (list_empty(&svm->devs)) {
@@ -646,6 +644,7 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
 			 * large and has to be physically contiguous. So it's
 			 * hard to be as defensive as we might like. */
 			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
+			intel_svm_drain_prq(dev, svm->pasid);
 			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
 			kfree_rcu(sdev, rcu);
 
@@ -703,6 +702,7 @@ struct page_req_dsc {
 struct page_req {
 	struct list_head list;
 	struct page_req_dsc desc;
+	struct completion complete;
 	unsigned int processing:1;
 	unsigned int drained:1;
 	unsigned int completed:1;
@@ -732,9 +732,83 @@ static bool is_canonical_address(u64 addr)
 	return (((saddr << shift) >> shift) == saddr);
 }
 
+/**
+ * intel_svm_drain_prq:
+ *
+ * Drain all pending page requests related to a specific pasid in both
+ * software and hardware. The caller must guarantee that no more page
+ * requests related to this pasid will come.
+ */
+static void intel_svm_drain_prq(struct device *dev, int pasid)
+{
+	struct device_domain_info *info;
+	struct dmar_domain *domain;
+	struct intel_iommu *iommu;
+	struct qi_desc desc[3];
+	struct pci_dev *pdev;
+	struct page_req *req;
+	unsigned long flags;
+	u16 sid, did;
+	int qdep;
+
+	info = get_domain_info(dev);
+	if (WARN_ON(!info || !dev_is_pci(dev)))
+		return;
+
+	iommu = info->iommu;
+	domain = info->domain;
+	pdev = to_pci_dev(dev);
+
+	/* Mark all related pending requests drained. */
+	spin_lock_irqsave(&iommu->prq_lock, flags);
+	list_for_each_entry(req, &iommu->prq_list, list)
+		if (req->desc.pasid_present && req->desc.pasid == pasid)
+			req->drained = true;
+	spin_unlock_irqrestore(&iommu->prq_lock, flags);
+
+	/* Wait until all related pending requests complete. */
+retry:
+	spin_lock_irqsave(&iommu->prq_lock, flags);
+	list_for_each_entry(req, &iommu->prq_list, list) {
+		if (req->desc.pasid_present &&
+		    req->desc.pasid == pasid &&
+		    !req->completed) {
+			spin_unlock_irqrestore(&iommu->prq_lock, flags);
+			wait_for_completion_timeout(&req->complete, 5 * HZ);
+			goto retry;
+		}
+	}
+	spin_unlock_irqrestore(&iommu->prq_lock, flags);
+
+	/*
+	 * Perform steps described in VT-d spec CH7.10 to drain page
+	 * requests and responses in hardware.
+	 */
+	sid = PCI_DEVID(info->bus, info->devfn);
+	did = domain->iommu_did[iommu->seq_id];
+	qdep = pci_ats_queue_depth(pdev);
+
+	memset(desc, 0, sizeof(desc));
+	desc[0].qw0 = QI_IWD_STATUS_DATA(QI_DONE) |
+			QI_IWD_FENCE |
+			QI_IWD_TYPE;
+	desc[1].qw0 = QI_EIOTLB_PASID(pasid) |
+			QI_EIOTLB_DID(did) |
+			QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) |
+			QI_EIOTLB_TYPE;
+	desc[2].qw0 = QI_DEV_EIOTLB_PASID(pasid) |
+			QI_DEV_EIOTLB_SID(sid) |
+			QI_DEV_EIOTLB_QDEP(qdep) |
+			QI_DEIOTLB_TYPE |
+			QI_DEV_IOTLB_PFSID(info->pfsid);
+
+	qi_submit_sync(iommu, desc, 3, QI_OPT_WAIT_DRAIN);
+}
+
 static void process_single_prq(struct intel_iommu *iommu,
 			       struct page_req_dsc *req)
 {
+	struct page_req *p_req = container_of(req, struct page_req, desc);
 	int result = QI_RESP_FAILURE;
 	struct intel_svm_dev *sdev;
 	struct vm_area_struct *vma;
@@ -768,6 +842,10 @@ static void process_single_prq(struct intel_iommu *iommu,
 	}
 
 	result = QI_RESP_INVALID;
+
+	if (p_req->drained)
+		goto bad_req;
+
 	/* Since we're using init_mm.pgd directly, we should never take
 	 * any faults on kernel addresses. */
 	if (!svm->mm)
@@ -868,6 +946,7 @@ static void intel_svm_process_prq(struct intel_iommu *iommu)
 			req->completed = true;
 		} else if (req->completed) {
 			list_del(&req->list);
+			complete(&req->complete);
 			kfree(req);
 		} else {
 			break;
@@ -899,6 +978,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 		if (!req)
 			break;
 		req->desc = *dsc;
+		init_completion(&req->complete);
 		list_add_tail(&req->list, &iommu->prq_list);
 		head = (head + sizeof(*dsc)) & PRQ_RING_MASK;
 	}
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 80715a59491c..714a0df3d879 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -333,6 +333,7 @@ enum {
 
 #define QI_IWD_STATUS_DATA(d)	(((u64)d) << 32)
 #define QI_IWD_STATUS_WRITE	(((u64)1) << 5)
+#define QI_IWD_FENCE		(((u64)1) << 6)
 #define QI_IWD_PRQ_DRAIN	(((u64)1) << 7)
 
 #define QI_IOTLB_DID(did) 	(((u64)did) << 16)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 7/7] iommu/vt-d: Remove redundant IOTLB flush
  2020-04-15  5:25 [PATCH v2 0/7] iommu/vt-d: Add page request draining support Lu Baolu
                   ` (5 preceding siblings ...)
  2020-04-15  5:25 ` [PATCH v2 6/7] iommu/vt-d: Add page request draining support Lu Baolu
@ 2020-04-15  5:25 ` Lu Baolu
  2020-04-15  7:57 ` [PATCH v2 0/7] iommu/vt-d: Add page request draining support Tian, Kevin
  7 siblings, 0 replies; 24+ messages in thread
From: Lu Baolu @ 2020-04-15  5:25 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: ashok.raj, jacob.jun.pan, Liu Yi L, kevin.tian, iommu,
	linux-kernel, Lu Baolu

The IOTLB flush is already included in the PASID tear-down and
page request drain process. There is no need to flush again.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-svm.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 736dd39fb52b..56e8d35225fc 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -212,7 +212,6 @@ static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
 	list_for_each_entry_rcu(sdev, &svm->devs, list) {
 		intel_pasid_tear_down_entry(svm->iommu, sdev->dev, svm->pasid);
 		intel_svm_drain_prq(sdev->dev, svm->pasid);
-		intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
 	}
 	rcu_read_unlock();
 
@@ -406,7 +405,6 @@ int intel_svm_unbind_gpasid(struct device *dev, int pasid)
 			list_del_rcu(&sdev->list);
 			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
 			intel_svm_drain_prq(dev, svm->pasid);
-			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
 			kfree_rcu(sdev, rcu);
 
 			if (list_empty(&svm->devs)) {
@@ -645,7 +643,6 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
 			 * hard to be as defensive as we might like. */
 			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
 			intel_svm_drain_prq(dev, svm->pasid);
-			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
 			kfree_rcu(sdev, rcu);
 
 			if (list_empty(&svm->devs)) {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* RE: [PATCH v2 0/7] iommu/vt-d: Add page request draining support
  2020-04-15  5:25 [PATCH v2 0/7] iommu/vt-d: Add page request draining support Lu Baolu
                   ` (6 preceding siblings ...)
  2020-04-15  5:25 ` [PATCH v2 7/7] iommu/vt-d: Remove redundant IOTLB flush Lu Baolu
@ 2020-04-15  7:57 ` Tian, Kevin
  2020-04-15  8:25   ` Lu Baolu
  7 siblings, 1 reply; 24+ messages in thread
From: Tian, Kevin @ 2020-04-15  7:57 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel
  Cc: Raj, Ashok, jacob.jun.pan, Liu, Yi L, iommu, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Wednesday, April 15, 2020 1:26 PM
> 
> When a PASID is stopped or terminated, there can be pending PRQs
> (requests that haven't received responses) in the software and
> remapping hardware. The pending page requests must be drained
> so that the PASID can be reused. The register-level interface
> for page request draining is defined in 7.11 of the VT-d spec.
> This series adds support for page request draining.

7.11 doesn't include register-level interface. It just talks about
the general requirements on system software, endpoint device
and its driver.

Thanks
Kevin

> 
> This includes two parts:
>  - PATCH 1/7 ~ 3/7: refactor the qi_submit_sync() to support
>    multiple descriptors per submission which will be used by
>    PATCH 6/7.
>  - PATCH 4/7 ~ 7/7: add page request drain support after a
>    pasid entry is torn down due to an unbind operation.
> 
> Please help to review.
> 
> Best regards,
> baolu
> 
> Change log:
>  v1->v2:
>   - Fix race between multiple prq handling threads
> 
> Lu Baolu (7):
>   iommu/vt-d: Refactor parameters for qi_submit_sync()
>   iommu/vt-d: Multiple descriptors per qi_submit_sync()
>   iommu/vt-d: debugfs: Add support to show inv queue internals
>   iommu/vt-d: Refactor prq_event_thread()
>   iommu/vt-d: Save prq descriptors in an internal list
>   iommu/vt-d: Add page request draining support
>   iommu/vt-d: Remove redundant IOTLB flush
> 
>  drivers/iommu/dmar.c                |  63 +++--
>  drivers/iommu/intel-iommu-debugfs.c |  62 +++++
>  drivers/iommu/intel-pasid.c         |   4 +-
>  drivers/iommu/intel-svm.c           | 383 ++++++++++++++++++----------
>  drivers/iommu/intel_irq_remapping.c |   2 +-
>  include/linux/intel-iommu.h         |  12 +-
>  6 files changed, 369 insertions(+), 157 deletions(-)
> 
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH v2 1/7] iommu/vt-d: Refactor parameters for qi_submit_sync()
  2020-04-15  5:25 ` [PATCH v2 1/7] iommu/vt-d: Refactor parameters for qi_submit_sync() Lu Baolu
@ 2020-04-15  8:02   ` Tian, Kevin
  2020-04-15  8:33     ` Lu Baolu
  0 siblings, 1 reply; 24+ messages in thread
From: Tian, Kevin @ 2020-04-15  8:02 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel
  Cc: Raj, Ashok, jacob.jun.pan, Liu, Yi L, iommu, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Wednesday, April 15, 2020 1:26 PM
> 
> Currently, qi_submit_sync() supports a single invalidation descriptor
> per submission and appends a wait descriptor after each submission
> to poll for hardware completion. This patch adjusts the parameters
> of this function so that multiple descriptors per submission can
> be supported.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/dmar.c                | 24 ++++++++++++++----------
>  drivers/iommu/intel-pasid.c         |  4 ++--
>  drivers/iommu/intel-svm.c           |  6 +++---
>  drivers/iommu/intel_irq_remapping.c |  2 +-
>  include/linux/intel-iommu.h         |  8 +++++++-
>  5 files changed, 27 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index d9dc787feef7..bb42177e2369 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -1225,10 +1225,14 @@ static int qi_check_fault(struct intel_iommu
> *iommu, int index)
>  }
> 
>  /*
> - * Submit the queued invalidation descriptor to the remapping
> - * hardware unit and wait for its completion.
> + * Function to submit invalidation descriptors of all types to the queued
> + * invalidation interface (QI). Multiple descriptors can be submitted at a
> + * time; a wait descriptor will be appended to each submission to ensure
> + * hardware has completed the invalidation before return. Wait descriptors
> + * can be part of the submission, but they will not be polled for completion.
>   */
> -int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
> +int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
> +		   unsigned int count, unsigned long options)

Adding parameters w/o actually using them is not a typical way of
splitting patches. Better to squash this with 2/7.

>  {
>  	int rc;
>  	struct q_inval *qi = iommu->qi;
> @@ -1318,7 +1322,7 @@ void qi_global_iec(struct intel_iommu *iommu)
>  	desc.qw3 = 0;
> 
>  	/* should never fail */
> -	qi_submit_sync(&desc, iommu);
> +	qi_submit_sync(iommu, &desc, 1, 0);
>  }
> 
>  void qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid, u8 fm,
> @@ -1332,7 +1336,7 @@ void qi_flush_context(struct intel_iommu *iommu,
> u16 did, u16 sid, u8 fm,
>  	desc.qw2 = 0;
>  	desc.qw3 = 0;
> 
> -	qi_submit_sync(&desc, iommu);
> +	qi_submit_sync(iommu, &desc, 1, 0);
>  }
> 
>  void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
> @@ -1356,7 +1360,7 @@ void qi_flush_iotlb(struct intel_iommu *iommu,
> u16 did, u64 addr,
>  	desc.qw2 = 0;
>  	desc.qw3 = 0;
> 
> -	qi_submit_sync(&desc, iommu);
> +	qi_submit_sync(iommu, &desc, 1, 0);
>  }
> 
>  void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> @@ -1378,7 +1382,7 @@ void qi_flush_dev_iotlb(struct intel_iommu
> *iommu, u16 sid, u16 pfsid,
>  	desc.qw2 = 0;
>  	desc.qw3 = 0;
> 
> -	qi_submit_sync(&desc, iommu);
> +	qi_submit_sync(iommu, &desc, 1, 0);
>  }
> 
>  /* PASID-based IOTLB invalidation */
> @@ -1419,7 +1423,7 @@ void qi_flush_piotlb(struct intel_iommu *iommu,
> u16 did, u32 pasid, u64 addr,
>  				QI_EIOTLB_AM(mask);
>  	}
> 
> -	qi_submit_sync(&desc, iommu);
> +	qi_submit_sync(iommu, &desc, 1, 0);
>  }
> 
>  /* PASID-based device IOTLB Invalidate */
> @@ -1448,7 +1452,7 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu
> *iommu, u16 sid, u16 pfsid,
>  	if (size_order)
>  		desc.qw1 |= QI_DEV_EIOTLB_SIZE;
> 
> -	qi_submit_sync(&desc, iommu);
> +	qi_submit_sync(iommu, &desc, 1, 0);
>  }
> 
>  void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did,
> @@ -1458,7 +1462,7 @@ void qi_flush_pasid_cache(struct intel_iommu
> *iommu, u16 did,
> 
>  	desc.qw0 = QI_PC_PASID(pasid) | QI_PC_DID(did) |
>  			QI_PC_GRAN(granu) | QI_PC_TYPE;
> -	qi_submit_sync(&desc, iommu);
> +	qi_submit_sync(iommu, &desc, 1, 0);
>  }
> 
>  /*
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index 48cc9ca5f3dc..7969e3dac2ad 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -498,7 +498,7 @@ pasid_cache_invalidation_with_pasid(struct
> intel_iommu *iommu,
>  	desc.qw2 = 0;
>  	desc.qw3 = 0;
> 
> -	qi_submit_sync(&desc, iommu);
> +	qi_submit_sync(iommu, &desc, 1, 0);
>  }
> 
>  static void
> @@ -512,7 +512,7 @@ iotlb_invalidation_with_pasid(struct intel_iommu
> *iommu, u16 did, u32 pasid)
>  	desc.qw2 = 0;
>  	desc.qw3 = 0;
> 
> -	qi_submit_sync(&desc, iommu);
> +	qi_submit_sync(iommu, &desc, 1, 0);
>  }
> 
>  static void
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index e9f4e979a71f..83dc4319f661 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -138,7 +138,7 @@ static void intel_flush_svm_range_dev (struct
> intel_svm *svm, struct intel_svm_d
>  	}
>  	desc.qw2 = 0;
>  	desc.qw3 = 0;
> -	qi_submit_sync(&desc, svm->iommu);
> +	qi_submit_sync(svm->iommu, &desc, 1, 0);
> 
>  	if (sdev->dev_iotlb) {
>  		desc.qw0 = QI_DEV_EIOTLB_PASID(svm->pasid) |
> @@ -162,7 +162,7 @@ static void intel_flush_svm_range_dev (struct
> intel_svm *svm, struct intel_svm_d
>  		}
>  		desc.qw2 = 0;
>  		desc.qw3 = 0;
> -		qi_submit_sync(&desc, svm->iommu);
> +		qi_submit_sync(svm->iommu, &desc, 1, 0);
>  	}
>  }
> 
> @@ -850,7 +850,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
>  				       sizeof(req->priv_data));
>  			resp.qw2 = 0;
>  			resp.qw3 = 0;
> -			qi_submit_sync(&resp, iommu);
> +			qi_submit_sync(iommu, &resp, 1, 0);
>  		}
>  		head = (head + sizeof(*req)) & PRQ_RING_MASK;
>  	}
> diff --git a/drivers/iommu/intel_irq_remapping.c
> b/drivers/iommu/intel_irq_remapping.c
> index 81e43c1df7ec..a042f123b091 100644
> --- a/drivers/iommu/intel_irq_remapping.c
> +++ b/drivers/iommu/intel_irq_remapping.c
> @@ -151,7 +151,7 @@ static int qi_flush_iec(struct intel_iommu *iommu, int
> index, int mask)
>  	desc.qw2 = 0;
>  	desc.qw3 = 0;
> 
> -	return qi_submit_sync(&desc, iommu);
> +	return qi_submit_sync(iommu, &desc, 1, 0);
>  }
> 
>  static int modify_irte(struct irq_2_iommu *irq_iommu,
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index cfe720f10112..ee2d5cdd8339 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -710,7 +710,13 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu
> *iommu, u16 sid, u16 pfsid,
>  void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu,
>  			  int pasid);
> 
> -extern int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu);
> +int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
> +		   unsigned int count, unsigned long options);
> +/*
> + * Options used in qi_submit_sync:
> + * QI_OPT_WAIT_DRAIN - Wait for PRQ drain completion, spec 6.5.2.8.
> + */
> +#define QI_OPT_WAIT_DRAIN		BIT(0)

no one uses this flag in this patch

> 
>  extern int dmar_ir_support(void);
> 
> --
> 2.17.1

Thanks
Kevin

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH v2 2/7] iommu/vt-d: Multiple descriptors per qi_submit_sync()
  2020-04-15  5:25 ` [PATCH v2 2/7] iommu/vt-d: Multiple descriptors per qi_submit_sync() Lu Baolu
@ 2020-04-15  8:18   ` Tian, Kevin
  2020-04-15  8:30     ` Lu Baolu
  0 siblings, 1 reply; 24+ messages in thread
From: Tian, Kevin @ 2020-04-15  8:18 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel
  Cc: Raj, Ashok, jacob.jun.pan, Liu, Yi L, iommu, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Wednesday, April 15, 2020 1:26 PM
> 
> Extend qi_submit_sync() function to support multiple descriptors.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/dmar.c        | 39 +++++++++++++++++++++++--------------
>  include/linux/intel-iommu.h |  1 +
>  2 files changed, 25 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index bb42177e2369..61d049e91f84 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -1157,12 +1157,11 @@ static inline void reclaim_free_desc(struct
> q_inval *qi)
>  	}
>  }
> 
> -static int qi_check_fault(struct intel_iommu *iommu, int index)
> +static int qi_check_fault(struct intel_iommu *iommu, int index, int
> wait_index)
>  {
>  	u32 fault;
>  	int head, tail;
>  	struct q_inval *qi = iommu->qi;
> -	int wait_index = (index + 1) % QI_LENGTH;
>  	int shift = qi_shift(iommu);
> 
>  	if (qi->desc_status[wait_index] == QI_ABORT)
> @@ -1234,12 +1233,12 @@ static int qi_check_fault(struct intel_iommu
> *iommu, int index)
>  int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
>  		   unsigned int count, unsigned long options)
>  {
> -	int rc;
>  	struct q_inval *qi = iommu->qi;
> -	int offset, shift, length;
>  	struct qi_desc wait_desc;
>  	int wait_index, index;
>  	unsigned long flags;
> +	int offset, shift;
> +	int rc, i;
> 
>  	if (!qi)
>  		return 0;
> @@ -1248,32 +1247,41 @@ int qi_submit_sync(struct intel_iommu *iommu,
> struct qi_desc *desc,
>  	rc = 0;
> 
>  	raw_spin_lock_irqsave(&qi->q_lock, flags);
> -	while (qi->free_cnt < 3) {
> +	/*
> +	 * Check if we have enough empty slots in the queue to submit,
> +	 * the calculation is based on:
> +	 * # of desc + 1 wait desc + 1 space between head and tail
> +	 */
> +	while (qi->free_cnt < count + 2) {
>  		raw_spin_unlock_irqrestore(&qi->q_lock, flags);
>  		cpu_relax();
>  		raw_spin_lock_irqsave(&qi->q_lock, flags);
>  	}
> 
>  	index = qi->free_head;
> -	wait_index = (index + 1) % QI_LENGTH;
> +	wait_index = (index + count) % QI_LENGTH;
>  	shift = qi_shift(iommu);
> -	length = 1 << shift;
> 
> -	qi->desc_status[index] = qi->desc_status[wait_index] = QI_IN_USE;
> +	for (i = 0; i < count; i++) {
> +		offset = ((index + i) % QI_LENGTH) << shift;
> +		memcpy(qi->desc + offset, &desc[i], 1 << shift);
> +		qi->desc_status[(index + i) % QI_LENGTH] = QI_IN_USE;
> +	}

what about doing one memcpy and leaving the loop only for updating
the qi status?
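
For illustration, one way that could look (not part of the posted
patch; a single copy only works when the in-memory descriptor stride
matches the queue entry size and the slots don't wrap around the ring):

	if ((1 << shift) == sizeof(struct qi_desc) &&
	    index + count <= QI_LENGTH) {
		/* Same stride, contiguous slots: one copy is enough. */
		memcpy(qi->desc + (index << shift), desc, count << shift);
	} else {
		for (i = 0; i < count; i++)
			memcpy(qi->desc + (((index + i) % QI_LENGTH) << shift),
			       &desc[i], 1 << shift);
	}

	for (i = 0; i < count; i++)
		qi->desc_status[(index + i) % QI_LENGTH] = QI_IN_USE;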

> +	qi->desc_status[wait_index] = QI_IN_USE;
> 
> -	offset = index << shift;
> -	memcpy(qi->desc + offset, desc, length);
>  	wait_desc.qw0 = QI_IWD_STATUS_DATA(QI_DONE) |
>  			QI_IWD_STATUS_WRITE | QI_IWD_TYPE;
> +	if (options & QI_OPT_WAIT_DRAIN)
> +		wait_desc.qw0 |= QI_IWD_PRQ_DRAIN;
>  	wait_desc.qw1 = virt_to_phys(&qi->desc_status[wait_index]);
>  	wait_desc.qw2 = 0;
>  	wait_desc.qw3 = 0;
> 
>  	offset = wait_index << shift;
> -	memcpy(qi->desc + offset, &wait_desc, length);
> +	memcpy(qi->desc + offset, &wait_desc, 1 << shift);
> 
> -	qi->free_head = (qi->free_head + 2) % QI_LENGTH;
> -	qi->free_cnt -= 2;
> +	qi->free_head = (qi->free_head + count + 1) % QI_LENGTH;
> +	qi->free_cnt -= count + 1;
> 
>  	/*
>  	 * update the HW tail register indicating the presence of
> @@ -1289,7 +1297,7 @@ int qi_submit_sync(struct intel_iommu *iommu,
> struct qi_desc *desc,
>  		 * a deadlock where the interrupt context can wait
> indefinitely
>  		 * for free slots in the queue.
>  		 */
> -		rc = qi_check_fault(iommu, index);
> +		rc = qi_check_fault(iommu, index, wait_index);
>  		if (rc)
>  			break;
> 
> @@ -1298,7 +1306,8 @@ int qi_submit_sync(struct intel_iommu *iommu,
> struct qi_desc *desc,
>  		raw_spin_lock(&qi->q_lock);
>  	}
> 
> -	qi->desc_status[index] = QI_DONE;
> +	for (i = 0; i < count; i++)
> +		qi->desc_status[(index + i) % QI_LENGTH] = QI_DONE;
> 
>  	reclaim_free_desc(qi);
>  	raw_spin_unlock_irqrestore(&qi->q_lock, flags);
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index ee2d5cdd8339..cca1e5f9aeaa 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -333,6 +333,7 @@ enum {
> 
>  #define QI_IWD_STATUS_DATA(d)	(((u64)d) << 32)
>  #define QI_IWD_STATUS_WRITE	(((u64)1) << 5)
> +#define QI_IWD_PRQ_DRAIN	(((u64)1) << 7)
> 
>  #define QI_IOTLB_DID(did) 	(((u64)did) << 16)
>  #define QI_IOTLB_DR(dr) 	(((u64)dr) << 7)
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/7] iommu/vt-d: Add page request draining support
  2020-04-15  7:57 ` [PATCH v2 0/7] iommu/vt-d: Add page request draining support Tian, Kevin
@ 2020-04-15  8:25   ` Lu Baolu
  0 siblings, 0 replies; 24+ messages in thread
From: Lu Baolu @ 2020-04-15  8:25 UTC (permalink / raw)
  To: Tian, Kevin, Joerg Roedel
  Cc: baolu.lu, Raj, Ashok, jacob.jun.pan, Liu, Yi L, iommu, linux-kernel

Hi Kevin,

On 2020/4/15 15:57, Tian, Kevin wrote:
>> From: Lu Baolu<baolu.lu@linux.intel.com>
>> Sent: Wednesday, April 15, 2020 1:26 PM
>>
>> When a PASID is stopped or terminated, there can be pending PRQs
>> (requests that haven't received responses) in the software and
>> remapping hardware. The pending page requests must be drained
>> so that the pasid could be reused. The register level interface
>> for page request draining is defined in 7.11 of the VT-d spec.
>> This series adds the support for page requests draining.
> 7.11 doesn't include register-level interface. It just talks about
> the general requirements on system software, endpoint device
> and its driver.
> 

I will replace this with "spec 7.10 specifies the software steps to
drain page requests and responses".

Best regards,
baolu

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/7] iommu/vt-d: Multiple descriptors per qi_submit_sync()
  2020-04-15  8:18   ` Tian, Kevin
@ 2020-04-15  8:30     ` Lu Baolu
  2020-04-15  8:51       ` Tian, Kevin
  0 siblings, 1 reply; 24+ messages in thread
From: Lu Baolu @ 2020-04-15  8:30 UTC (permalink / raw)
  To: Tian, Kevin, Joerg Roedel
  Cc: baolu.lu, Raj, Ashok, jacob.jun.pan, Liu, Yi L, iommu, linux-kernel

On 2020/4/15 16:18, Tian, Kevin wrote:
>> From: Lu Baolu<baolu.lu@linux.intel.com>
>> Sent: Wednesday, April 15, 2020 1:26 PM
>>
>> Extend qi_submit_sync() function to support multiple descriptors.
>>
>> Signed-off-by: Jacob Pan<jacob.jun.pan@linux.intel.com>
>> Signed-off-by: Lu Baolu<baolu.lu@linux.intel.com>
>> ---
>>   drivers/iommu/dmar.c        | 39 +++++++++++++++++++++++--------------
>>   include/linux/intel-iommu.h |  1 +
>>   2 files changed, 25 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
>> index bb42177e2369..61d049e91f84 100644
>> --- a/drivers/iommu/dmar.c
>> +++ b/drivers/iommu/dmar.c
>> @@ -1157,12 +1157,11 @@ static inline void reclaim_free_desc(struct
>> q_inval *qi)
>>   	}
>>   }
>>
>> -static int qi_check_fault(struct intel_iommu *iommu, int index)
>> +static int qi_check_fault(struct intel_iommu *iommu, int index, int
>> wait_index)
>>   {
>>   	u32 fault;
>>   	int head, tail;
>>   	struct q_inval *qi = iommu->qi;
>> -	int wait_index = (index + 1) % QI_LENGTH;
>>   	int shift = qi_shift(iommu);
>>
>>   	if (qi->desc_status[wait_index] == QI_ABORT)
>> @@ -1234,12 +1233,12 @@ static int qi_check_fault(struct intel_iommu
>> *iommu, int index)
>>   int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
>>   		   unsigned int count, unsigned long options)
>>   {
>> -	int rc;
>>   	struct q_inval *qi = iommu->qi;
>> -	int offset, shift, length;
>>   	struct qi_desc wait_desc;
>>   	int wait_index, index;
>>   	unsigned long flags;
>> +	int offset, shift;
>> +	int rc, i;
>>
>>   	if (!qi)
>>   		return 0;
>> @@ -1248,32 +1247,41 @@ int qi_submit_sync(struct intel_iommu *iommu,
>> struct qi_desc *desc,
>>   	rc = 0;
>>
>>   	raw_spin_lock_irqsave(&qi->q_lock, flags);
>> -	while (qi->free_cnt < 3) {
>> +	/*
>> +	 * Check if we have enough empty slots in the queue to submit,
>> +	 * the calculation is based on:
>> +	 * # of desc + 1 wait desc + 1 space between head and tail
>> +	 */
>> +	while (qi->free_cnt < count + 2) {
>>   		raw_spin_unlock_irqrestore(&qi->q_lock, flags);
>>   		cpu_relax();
>>   		raw_spin_lock_irqsave(&qi->q_lock, flags);
>>   	}
>>
>>   	index = qi->free_head;
>> -	wait_index = (index + 1) % QI_LENGTH;
>> +	wait_index = (index + count) % QI_LENGTH;
>>   	shift = qi_shift(iommu);
>> -	length = 1 << shift;
>>
>> -	qi->desc_status[index] = qi->desc_status[wait_index] = QI_IN_USE;
>> +	for (i = 0; i < count; i++) {
>> +		offset = ((index + i) % QI_LENGTH) << shift;
>> +		memcpy(qi->desc + offset, &desc[i], 1 << shift);
>> +		qi->desc_status[(index + i) % QI_LENGTH] = QI_IN_USE;
>> +	}
> what about doing one memcpy and leaving the loop only for updating
> qi status?
> 

One memcpy might cross the queue's wrap-around boundary.
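
For illustration, a bulk copy would roughly look like the sketch below
(not part of this patch). Besides splitting at the wrap-around point,
it is only valid when (1 << shift) == sizeof(struct qi_desc), i.e. with
256-bit descriptors; with 128-bit descriptors the source entries are
still spaced sizeof(struct qi_desc) apart and have to be copied one by
one anyway, which is why the per-descriptor loop is the simpler choice:

	/* Sketch only: bulk copy split at the QI_LENGTH wrap-around. */
	unsigned int first = min_t(unsigned int, count, QI_LENGTH - index);

	memcpy(qi->desc + (index << shift), desc, first << shift);
	if (first < count)
		/* wrapped: copy the remainder to the start of the queue */
		memcpy(qi->desc, desc + first, (count - first) << shift);
	for (i = 0; i < count; i++)
		qi->desc_status[(index + i) % QI_LENGTH] = QI_IN_USE;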

Best regards,
baolu

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/7] iommu/vt-d: Refactor parameters for qi_submit_sync()
  2020-04-15  8:02   ` Tian, Kevin
@ 2020-04-15  8:33     ` Lu Baolu
  0 siblings, 0 replies; 24+ messages in thread
From: Lu Baolu @ 2020-04-15  8:33 UTC (permalink / raw)
  To: Tian, Kevin, Joerg Roedel
  Cc: baolu.lu, Raj, Ashok, jacob.jun.pan, Liu, Yi L, iommu, linux-kernel

On 2020/4/15 16:02, Tian, Kevin wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>> Sent: Wednesday, April 15, 2020 1:26 PM
>>
>> Current qi_submit_sync() supports single invalidation descriptor
>> per submission and appends wait descriptor after each submission
>> to poll hardware completion. This patch adjusts the parameters
>> of this function so that multiple descriptors per submission can
>> be supported.
>>
>> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>   drivers/iommu/dmar.c                | 24 ++++++++++++++----------
>>   drivers/iommu/intel-pasid.c         |  4 ++--
>>   drivers/iommu/intel-svm.c           |  6 +++---
>>   drivers/iommu/intel_irq_remapping.c |  2 +-
>>   include/linux/intel-iommu.h         |  8 +++++++-
>>   5 files changed, 27 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
>> index d9dc787feef7..bb42177e2369 100644
>> --- a/drivers/iommu/dmar.c
>> +++ b/drivers/iommu/dmar.c
>> @@ -1225,10 +1225,14 @@ static int qi_check_fault(struct intel_iommu
>> *iommu, int index)
>>   }
>>
>>   /*
>> - * Submit the queued invalidation descriptor to the remapping
>> - * hardware unit and wait for its completion.
>> + * Function to submit invalidation descriptors of all types to the queued
>> + * invalidation interface(QI). Multiple descriptors can be submitted at a
>> + * time, a wait descriptor will be appended to each submission to ensure
>> + * hardware has completed the invalidation before return. Wait descriptors
>> + * can be part of the submission but it will not be polled for completion.
>>    */
>> -int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
>> +int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
>> +		   unsigned int count, unsigned long options)
> 
> Adding parameter w/o actually using them is not typical way of splitting
> patches. Better squash this with 2/7 together.

My original thought was to make it easier for code review. No particular
preference. Both are okay to me. :-)

Best regards,
baolu

> 
>>   {
>>   	int rc;
>>   	struct q_inval *qi = iommu->qi;
>> @@ -1318,7 +1322,7 @@ void qi_global_iec(struct intel_iommu *iommu)
>>   	desc.qw3 = 0;
>>
>>   	/* should never fail */
>> -	qi_submit_sync(&desc, iommu);
>> +	qi_submit_sync(iommu, &desc, 1, 0);
>>   }
>>
>>   void qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid, u8 fm,
>> @@ -1332,7 +1336,7 @@ void qi_flush_context(struct intel_iommu *iommu,
>> u16 did, u16 sid, u8 fm,
>>   	desc.qw2 = 0;
>>   	desc.qw3 = 0;
>>
>> -	qi_submit_sync(&desc, iommu);
>> +	qi_submit_sync(iommu, &desc, 1, 0);
>>   }
>>
>>   void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
>> @@ -1356,7 +1360,7 @@ void qi_flush_iotlb(struct intel_iommu *iommu,
>> u16 did, u64 addr,
>>   	desc.qw2 = 0;
>>   	desc.qw3 = 0;
>>
>> -	qi_submit_sync(&desc, iommu);
>> +	qi_submit_sync(iommu, &desc, 1, 0);
>>   }
>>
>>   void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
>> @@ -1378,7 +1382,7 @@ void qi_flush_dev_iotlb(struct intel_iommu
>> *iommu, u16 sid, u16 pfsid,
>>   	desc.qw2 = 0;
>>   	desc.qw3 = 0;
>>
>> -	qi_submit_sync(&desc, iommu);
>> +	qi_submit_sync(iommu, &desc, 1, 0);
>>   }
>>
>>   /* PASID-based IOTLB invalidation */
>> @@ -1419,7 +1423,7 @@ void qi_flush_piotlb(struct intel_iommu *iommu,
>> u16 did, u32 pasid, u64 addr,
>>   				QI_EIOTLB_AM(mask);
>>   	}
>>
>> -	qi_submit_sync(&desc, iommu);
>> +	qi_submit_sync(iommu, &desc, 1, 0);
>>   }
>>
>>   /* PASID-based device IOTLB Invalidate */
>> @@ -1448,7 +1452,7 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu
>> *iommu, u16 sid, u16 pfsid,
>>   	if (size_order)
>>   		desc.qw1 |= QI_DEV_EIOTLB_SIZE;
>>
>> -	qi_submit_sync(&desc, iommu);
>> +	qi_submit_sync(iommu, &desc, 1, 0);
>>   }
>>
>>   void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did,
>> @@ -1458,7 +1462,7 @@ void qi_flush_pasid_cache(struct intel_iommu
>> *iommu, u16 did,
>>
>>   	desc.qw0 = QI_PC_PASID(pasid) | QI_PC_DID(did) |
>>   			QI_PC_GRAN(granu) | QI_PC_TYPE;
>> -	qi_submit_sync(&desc, iommu);
>> +	qi_submit_sync(iommu, &desc, 1, 0);
>>   }
>>
>>   /*
>> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
>> index 48cc9ca5f3dc..7969e3dac2ad 100644
>> --- a/drivers/iommu/intel-pasid.c
>> +++ b/drivers/iommu/intel-pasid.c
>> @@ -498,7 +498,7 @@ pasid_cache_invalidation_with_pasid(struct
>> intel_iommu *iommu,
>>   	desc.qw2 = 0;
>>   	desc.qw3 = 0;
>>
>> -	qi_submit_sync(&desc, iommu);
>> +	qi_submit_sync(iommu, &desc, 1, 0);
>>   }
>>
>>   static void
>> @@ -512,7 +512,7 @@ iotlb_invalidation_with_pasid(struct intel_iommu
>> *iommu, u16 did, u32 pasid)
>>   	desc.qw2 = 0;
>>   	desc.qw3 = 0;
>>
>> -	qi_submit_sync(&desc, iommu);
>> +	qi_submit_sync(iommu, &desc, 1, 0);
>>   }
>>
>>   static void
>> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
>> index e9f4e979a71f..83dc4319f661 100644
>> --- a/drivers/iommu/intel-svm.c
>> +++ b/drivers/iommu/intel-svm.c
>> @@ -138,7 +138,7 @@ static void intel_flush_svm_range_dev (struct
>> intel_svm *svm, struct intel_svm_d
>>   	}
>>   	desc.qw2 = 0;
>>   	desc.qw3 = 0;
>> -	qi_submit_sync(&desc, svm->iommu);
>> +	qi_submit_sync(svm->iommu, &desc, 1, 0);
>>
>>   	if (sdev->dev_iotlb) {
>>   		desc.qw0 = QI_DEV_EIOTLB_PASID(svm->pasid) |
>> @@ -162,7 +162,7 @@ static void intel_flush_svm_range_dev (struct
>> intel_svm *svm, struct intel_svm_d
>>   		}
>>   		desc.qw2 = 0;
>>   		desc.qw3 = 0;
>> -		qi_submit_sync(&desc, svm->iommu);
>> +		qi_submit_sync(svm->iommu, &desc, 1, 0);
>>   	}
>>   }
>>
>> @@ -850,7 +850,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
>>   				       sizeof(req->priv_data));
>>   			resp.qw2 = 0;
>>   			resp.qw3 = 0;
>> -			qi_submit_sync(&resp, iommu);
>> +			qi_submit_sync(iommu, &resp, 1, 0);
>>   		}
>>   		head = (head + sizeof(*req)) & PRQ_RING_MASK;
>>   	}
>> diff --git a/drivers/iommu/intel_irq_remapping.c
>> b/drivers/iommu/intel_irq_remapping.c
>> index 81e43c1df7ec..a042f123b091 100644
>> --- a/drivers/iommu/intel_irq_remapping.c
>> +++ b/drivers/iommu/intel_irq_remapping.c
>> @@ -151,7 +151,7 @@ static int qi_flush_iec(struct intel_iommu *iommu, int
>> index, int mask)
>>   	desc.qw2 = 0;
>>   	desc.qw3 = 0;
>>
>> -	return qi_submit_sync(&desc, iommu);
>> +	return qi_submit_sync(iommu, &desc, 1, 0);
>>   }
>>
>>   static int modify_irte(struct irq_2_iommu *irq_iommu,
>> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
>> index cfe720f10112..ee2d5cdd8339 100644
>> --- a/include/linux/intel-iommu.h
>> +++ b/include/linux/intel-iommu.h
>> @@ -710,7 +710,13 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu
>> *iommu, u16 sid, u16 pfsid,
>>   void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu,
>>   			  int pasid);
>>
>> -extern int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu);
>> +int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
>> +		   unsigned int count, unsigned long options);
>> +/*
>> + * Options used in qi_submit_sync:
>> + * QI_OPT_WAIT_DRAIN - Wait for PRQ drain completion, spec 6.5.2.8.
>> + */
>> +#define QI_OPT_WAIT_DRAIN		BIT(0)
> 
> no one uses this flag in this patch
> 
>>
>>   extern int dmar_ir_support(void);
>>
>> --
>> 2.17.1
> 
> Thanks
> Kevin
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH v2 2/7] iommu/vt-d: Multiple descriptors per qi_submit_sync()
  2020-04-15  8:30     ` Lu Baolu
@ 2020-04-15  8:51       ` Tian, Kevin
  0 siblings, 0 replies; 24+ messages in thread
From: Tian, Kevin @ 2020-04-15  8:51 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel
  Cc: Raj, Ashok, jacob.jun.pan, Liu, Yi L, iommu, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Wednesday, April 15, 2020 4:30 PM
> 
> On 2020/4/15 16:18, Tian, Kevin wrote:
> >> From: Lu Baolu<baolu.lu@linux.intel.com>
> >> Sent: Wednesday, April 15, 2020 1:26 PM
> >>
> >> Extend qi_submit_sync() function to support multiple descriptors.
> >>
> >> Signed-off-by: Jacob Pan<jacob.jun.pan@linux.intel.com>
> >> Signed-off-by: Lu Baolu<baolu.lu@linux.intel.com>
> >> ---
> >>   drivers/iommu/dmar.c        | 39 +++++++++++++++++++++++--------------
> >>   include/linux/intel-iommu.h |  1 +
> >>   2 files changed, 25 insertions(+), 15 deletions(-)
> >>
> >> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> >> index bb42177e2369..61d049e91f84 100644
> >> --- a/drivers/iommu/dmar.c
> >> +++ b/drivers/iommu/dmar.c
> >> @@ -1157,12 +1157,11 @@ static inline void reclaim_free_desc(struct
> >> q_inval *qi)
> >>   	}
> >>   }
> >>
> >> -static int qi_check_fault(struct intel_iommu *iommu, int index)
> >> +static int qi_check_fault(struct intel_iommu *iommu, int index, int
> >> wait_index)
> >>   {
> >>   	u32 fault;
> >>   	int head, tail;
> >>   	struct q_inval *qi = iommu->qi;
> >> -	int wait_index = (index + 1) % QI_LENGTH;
> >>   	int shift = qi_shift(iommu);
> >>
> >>   	if (qi->desc_status[wait_index] == QI_ABORT)
> >> @@ -1234,12 +1233,12 @@ static int qi_check_fault(struct intel_iommu
> >> *iommu, int index)
> >>   int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
> >>   		   unsigned int count, unsigned long options)
> >>   {
> >> -	int rc;
> >>   	struct q_inval *qi = iommu->qi;
> >> -	int offset, shift, length;
> >>   	struct qi_desc wait_desc;
> >>   	int wait_index, index;
> >>   	unsigned long flags;
> >> +	int offset, shift;
> >> +	int rc, i;
> >>
> >>   	if (!qi)
> >>   		return 0;
> >> @@ -1248,32 +1247,41 @@ int qi_submit_sync(struct intel_iommu
> *iommu,
> >> struct qi_desc *desc,
> >>   	rc = 0;
> >>
> >>   	raw_spin_lock_irqsave(&qi->q_lock, flags);
> >> -	while (qi->free_cnt < 3) {
> >> +	/*
> >> +	 * Check if we have enough empty slots in the queue to submit,
> >> +	 * the calculation is based on:
> >> +	 * # of desc + 1 wait desc + 1 space between head and tail
> >> +	 */
> >> +	while (qi->free_cnt < count + 2) {
> >>   		raw_spin_unlock_irqrestore(&qi->q_lock, flags);
> >>   		cpu_relax();
> >>   		raw_spin_lock_irqsave(&qi->q_lock, flags);
> >>   	}
> >>
> >>   	index = qi->free_head;
> >> -	wait_index = (index + 1) % QI_LENGTH;
> >> +	wait_index = (index + count) % QI_LENGTH;
> >>   	shift = qi_shift(iommu);
> >> -	length = 1 << shift;
> >>
> >> -	qi->desc_status[index] = qi->desc_status[wait_index] = QI_IN_USE;
> >> +	for (i = 0; i < count; i++) {
> >> +		offset = ((index + i) % QI_LENGTH) << shift;
> >> +		memcpy(qi->desc + offset, &desc[i], 1 << shift);
> >> +		qi->desc_status[(index + i) % QI_LENGTH] = QI_IN_USE;
> >> +	}
> > what about doing one memcpy and leaving the loop only for updating
> > qi status?
> >
> 
> One memcpy might cross the queue's wrap-around boundary.
> 

Thanks. You are right.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH v2 4/7] iommu/vt-d: Refactor prq_event_thread()
  2020-04-15  5:25 ` [PATCH v2 4/7] iommu/vt-d: Refactor prq_event_thread() Lu Baolu
@ 2020-04-15  9:15   ` Tian, Kevin
  2020-04-16  1:33     ` Lu Baolu
  0 siblings, 1 reply; 24+ messages in thread
From: Tian, Kevin @ 2020-04-15  9:15 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel
  Cc: Raj, Ashok, jacob.jun.pan, Liu, Yi L, iommu, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Wednesday, April 15, 2020 1:26 PM
> 
> Move the software processing page request descriptors part from
> prq_event_thread() into a separated function. No any functional
> changes.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel-svm.c | 256 ++++++++++++++++++++------------------
>  1 file changed, 135 insertions(+), 121 deletions(-)
> 
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 83dc4319f661..a1921b462783 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -722,142 +722,156 @@ static bool is_canonical_address(u64 addr)
>  	return (((saddr << shift) >> shift) == saddr);
>  }
> 
> -static irqreturn_t prq_event_thread(int irq, void *d)
> +static void process_single_prq(struct intel_iommu *iommu,
> +			       struct page_req_dsc *req)
>  {
> -	struct intel_iommu *iommu = d;
> -	struct intel_svm *svm = NULL;
> -	int head, tail, handled = 0;
> -
> -	/* Clear PPR bit before reading head/tail registers, to
> -	 * ensure that we get a new interrupt if needed. */
> -	writel(DMA_PRS_PPR, iommu->reg + DMAR_PRS_REG);
> -
> -	tail = dmar_readq(iommu->reg + DMAR_PQT_REG) &
> PRQ_RING_MASK;
> -	head = dmar_readq(iommu->reg + DMAR_PQH_REG) &
> PRQ_RING_MASK;
> -	while (head != tail) {
> -		struct intel_svm_dev *sdev;
> -		struct vm_area_struct *vma;
> -		struct page_req_dsc *req;
> -		struct qi_desc resp;
> -		int result;
> -		vm_fault_t ret;
> -		u64 address;
> -
> -		handled = 1;
> -
> -		req = &iommu->prq[head / sizeof(*req)];
> +	int result = QI_RESP_FAILURE;
> +	struct intel_svm_dev *sdev;
> +	struct vm_area_struct *vma;
> +	struct intel_svm *svm;
> +	struct qi_desc resp;
> +	vm_fault_t ret;
> +	u64 address;
> +
> +	address = (u64)req->addr << VTD_PAGE_SHIFT;
> +	if (!req->pasid_present) {
> +		pr_err("%s: Page request without PASID: %08llx %08llx\n",
> +		       iommu->name, ((unsigned long long *)req)[0],
> +		       ((unsigned long long *)req)[1]);
> +		goto no_pasid;
> +	}
> 
> -		result = QI_RESP_FAILURE;
> -		address = (u64)req->addr << VTD_PAGE_SHIFT;
> -		if (!req->pasid_present) {
> -			pr_err("%s: Page request without
> PASID: %08llx %08llx\n",
> -			       iommu->name, ((unsigned long long *)req)[0],
> -			       ((unsigned long long *)req)[1]);
> -			goto no_pasid;
> -		}
> +	rcu_read_lock();
> +	svm = ioasid_find(NULL, req->pasid, NULL);
> +	/*
> +	 * It *can't* go away, because the driver is not permitted
> +	 * to unbind the mm while any page faults are outstanding.
> +	 * So we only need RCU to protect the internal idr code.
> +	 */
> +	rcu_read_unlock();
> 
> -		if (!svm || svm->pasid != req->pasid) {
> -			rcu_read_lock();
> -			svm = ioasid_find(NULL, req->pasid, NULL);
> -			/* It *can't* go away, because the driver is not
> permitted
> -			 * to unbind the mm while any page faults are
> outstanding.
> -			 * So we only need RCU to protect the internal idr
> code. */
> -			rcu_read_unlock();
> -			if (IS_ERR_OR_NULL(svm)) {
> -				pr_err("%s: Page request for invalid
> PASID %d: %08llx %08llx\n",
> -				       iommu->name, req->pasid, ((unsigned
> long long *)req)[0],
> -				       ((unsigned long long *)req)[1]);
> -				goto no_pasid;
> -			}
> -		}
> +	if (IS_ERR_OR_NULL(svm)) {
> +		pr_err("%s: Page request for invalid
> PASID %d: %08llx %08llx\n",
> +		       iommu->name, req->pasid, ((unsigned long long *)req)[0],
> +		       ((unsigned long long *)req)[1]);
> +		goto no_pasid;
> +	}
> 
> -		result = QI_RESP_INVALID;
> -		/* Since we're using init_mm.pgd directly, we should never
> take
> -		 * any faults on kernel addresses. */
> -		if (!svm->mm)
> -			goto bad_req;
> +	result = QI_RESP_INVALID;
> +	/* Since we're using init_mm.pgd directly, we should never take
> +	 * any faults on kernel addresses. */
> +	if (!svm->mm)
> +		goto bad_req;
> +
> +	/* If address is not canonical, return invalid response */
> +	if (!is_canonical_address(address))
> +		goto bad_req;
> +
> +	/* If the mm is already defunct, don't handle faults. */
> +	if (!mmget_not_zero(svm->mm))
> +		goto bad_req;
> +
> +	down_read(&svm->mm->mmap_sem);
> +	vma = find_extend_vma(svm->mm, address);
> +	if (!vma || address < vma->vm_start)
> +		goto invalid;
> +
> +	if (access_error(vma, req))
> +		goto invalid;
> +
> +	ret = handle_mm_fault(vma, address,
> +			      req->wr_req ? FAULT_FLAG_WRITE : 0);
> +	if (ret & VM_FAULT_ERROR)
> +		goto invalid;
> +
> +	result = QI_RESP_SUCCESS;
> +invalid:
> +	up_read(&svm->mm->mmap_sem);
> +	mmput(svm->mm);
> +bad_req:
> +	/* Accounting for major/minor faults? */
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(sdev, &svm->devs, list) {
> +		if (sdev->sid == req->rid)
> +			break;
> +	}
> 
> -		/* If address is not canonical, return invalid response */
> -		if (!is_canonical_address(address))
> -			goto bad_req;
> +	/* Other devices can go away, but the drivers are not permitted
> +	 * to unbind while any page faults might be in flight. So it's
> +	 * OK to drop the 'lock' here now we have it. */
> +	rcu_read_unlock();
> 
> -		/* If the mm is already defunct, don't handle faults. */
> -		if (!mmget_not_zero(svm->mm))
> -			goto bad_req;
> +	if (WARN_ON(&sdev->list == &svm->devs))
> +		sdev = NULL;
> 
> -		down_read(&svm->mm->mmap_sem);
> -		vma = find_extend_vma(svm->mm, address);
> -		if (!vma || address < vma->vm_start)
> -			goto invalid;
> +	if (sdev && sdev->ops && sdev->ops->fault_cb) {
> +		int rwxp = (req->rd_req << 3) | (req->wr_req << 2) |
> +			(req->exe_req << 1) | (req->pm_req);
> +		sdev->ops->fault_cb(sdev->dev, req->pasid, req->addr,
> +				    req->priv_data, rwxp, result);
> +	}
> 
> -		if (access_error(vma, req))
> -			goto invalid;
> +	/* We get here in the error case where the PASID lookup failed,
> +	   and these can be NULL. Do not use them below this point! */
> +	sdev = NULL;
> +	svm = NULL;
> +no_pasid:
> +	if (req->lpig || req->priv_data_present) {
> +		/*
> +		 * Per VT-d spec. v3.0 ch7.7, system software must
> +		 * respond with page group response if private data
> +		 * is present (PDP) or last page in group (LPIG) bit
> +		 * is set. This is an additional VT-d feature beyond
> +		 * PCI ATS spec.
> +		 */
> +		resp.qw0 = QI_PGRP_PASID(req->pasid) |
> +			QI_PGRP_DID(req->rid) |
> +			QI_PGRP_PASID_P(req->pasid_present) |
> +			QI_PGRP_PDP(req->pasid_present) |
> +			QI_PGRP_RESP_CODE(result) |
> +			QI_PGRP_RESP_TYPE;
> +		resp.qw1 = QI_PGRP_IDX(req->prg_index) |
> +			QI_PGRP_LPIG(req->lpig);
> +
> +		if (req->priv_data_present)
> +			memcpy(&resp.qw2, req->priv_data,
> +			       sizeof(req->priv_data));
> +		resp.qw2 = 0;
> +		resp.qw3 = 0;
> +		qi_submit_sync(iommu, &resp, 1, 0);
> +	}
> +}
> 
> -		ret = handle_mm_fault(vma, address,
> -				      req->wr_req ? FAULT_FLAG_WRITE : 0);
> -		if (ret & VM_FAULT_ERROR)
> -			goto invalid;
> +static void intel_svm_process_prq(struct intel_iommu *iommu,
> +				  struct page_req_dsc *prq,
> +				  int head, int tail)
> +{
> +	struct page_req_dsc *req;
> 
> -		result = QI_RESP_SUCCESS;
> -	invalid:
> -		up_read(&svm->mm->mmap_sem);
> -		mmput(svm->mm);
> -	bad_req:
> -		/* Accounting for major/minor faults? */
> -		rcu_read_lock();
> -		list_for_each_entry_rcu(sdev, &svm->devs, list) {
> -			if (sdev->sid == req->rid)
> -				break;
> -		}
> -		/* Other devices can go away, but the drivers are not
> permitted
> -		 * to unbind while any page faults might be in flight. So it's
> -		 * OK to drop the 'lock' here now we have it. */
> -		rcu_read_unlock();
> -
> -		if (WARN_ON(&sdev->list == &svm->devs))
> -			sdev = NULL;
> -
> -		if (sdev && sdev->ops && sdev->ops->fault_cb) {
> -			int rwxp = (req->rd_req << 3) | (req->wr_req << 2) |
> -				(req->exe_req << 1) | (req->pm_req);
> -			sdev->ops->fault_cb(sdev->dev, req->pasid, req-
> >addr,
> -					    req->priv_data, rwxp, result);
> -		}
> -		/* We get here in the error case where the PASID lookup
> failed,
> -		   and these can be NULL. Do not use them below this point!
> */
> -		sdev = NULL;
> -		svm = NULL;
> -	no_pasid:
> -		if (req->lpig || req->priv_data_present) {
> -			/*
> -			 * Per VT-d spec. v3.0 ch7.7, system software must
> -			 * respond with page group response if private data
> -			 * is present (PDP) or last page in group (LPIG) bit
> -			 * is set. This is an additional VT-d feature beyond
> -			 * PCI ATS spec.
> -			 */
> -			resp.qw0 = QI_PGRP_PASID(req->pasid) |
> -				QI_PGRP_DID(req->rid) |
> -				QI_PGRP_PASID_P(req->pasid_present) |
> -				QI_PGRP_PDP(req->pasid_present) |
> -				QI_PGRP_RESP_CODE(result) |
> -				QI_PGRP_RESP_TYPE;
> -			resp.qw1 = QI_PGRP_IDX(req->prg_index) |
> -				QI_PGRP_LPIG(req->lpig);
> -
> -			if (req->priv_data_present)
> -				memcpy(&resp.qw2, req->priv_data,
> -				       sizeof(req->priv_data));
> -			resp.qw2 = 0;
> -			resp.qw3 = 0;
> -			qi_submit_sync(iommu, &resp, 1, 0);
> -		}
> +	while (head != tail) {
> +		req = &iommu->prq[head / sizeof(*req)];
> +		process_single_prq(iommu, req);
>  		head = (head + sizeof(*req)) & PRQ_RING_MASK;
>  	}
> +}
> +
> +static irqreturn_t prq_event_thread(int irq, void *d)
> +{
> +	struct intel_iommu *iommu = d;
> +	int head, tail;
> 
> +	/*
> +	 * Clear PPR bit before reading head/tail registers, to
> +	 * ensure that we get a new interrupt if needed.
> +	 */
> +	writel(DMA_PRS_PPR, iommu->reg + DMAR_PRS_REG);
> +
> +	tail = dmar_readq(iommu->reg + DMAR_PQT_REG) &
> PRQ_RING_MASK;
> +	head = dmar_readq(iommu->reg + DMAR_PQH_REG) &
> PRQ_RING_MASK;
> +	intel_svm_process_prq(iommu, iommu->prq, head, tail);
>  	dmar_writeq(iommu->reg + DMAR_PQH_REG, tail);
> 
> -	return IRQ_RETVAL(handled);
> +	return IRQ_RETVAL(1);

this might be a functional change, since previously (0) could
be returned when head==tail.

>  }
> 
>  #define to_intel_svm_dev(handle) container_of(handle, struct
> intel_svm_dev, sva)
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH v2 5/7] iommu/vt-d: Save prq descriptors in an internal list
  2020-04-15  5:25 ` [PATCH v2 5/7] iommu/vt-d: Save prq descriptors in an internal list Lu Baolu
@ 2020-04-15  9:30   ` Tian, Kevin
  2020-04-16  1:46     ` Lu Baolu
  0 siblings, 1 reply; 24+ messages in thread
From: Tian, Kevin @ 2020-04-15  9:30 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel
  Cc: Raj, Ashok, jacob.jun.pan, Liu, Yi L, iommu, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Wednesday, April 15, 2020 1:26 PM
> 
> Currently, the page request interrupt thread handles the page
> requests in the queue in this way:
> 
> - Clear PPR bit to ensure new interrupt could come in;
> - Read and record the head and tail registers;
> - Handle all descriptors between head and tail;
> - Write tail to head register.
> 
> This might cause some descriptors to be handles multiple times.
> An example sequence:
> 
> - Thread A got scheduled with PRQ_1 and PRQ_2 in the queue;
> - Thread A clear the PPR bit and record the head and tail;
> - A new PRQ_3 comes and Thread B gets scheduled;
> - Thread B record the head and tail which includes PRQ_1
>   and PRQ_2.

I may be overlooking something, but isn't the prq interrupt thread
per iommu? Then why would two prq threads contend here?

Thanks,
Kevin

> 
> As the result, PRQ_1 and PRQ_2 are handled twice in Thread_A and
> Thread_B.
> 
>        Thread_A            Thread_B
>       .--------.          .--------.
>       |        |          |        |
>       .--------.          .--------.
>   head| PRQ_1  |      head| PRQ_1  |
>       .--------.          .--------.
>       | PRQ_2  |          | PRQ_2  |
>       .--------.          .--------.
>   tail|        |          | PRQ_3  |
>       .--------.          .--------.
>       |        |      tail|        |
>       '--------'          '--------'
> 
> To avoid this, probably, we need to apply a spinlock to ensure
> that PRQs are handled in a serialized way. But that means the
> intel_svm_process_prq() will be called with a spinlock held.
> This causes extra complexities in intel_svm_process_prq().
> 
> This aims to make PRQ descriptors to be handled in a serialized
> way while remove the requirement of holding the spin lock in
> intel_svm_process_prq() by saving the descriptors in a list.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel-svm.c   | 58 ++++++++++++++++++++++++++++++-------
>  include/linux/intel-iommu.h |  2 ++
>  2 files changed, 49 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index a1921b462783..05aeb8ea51c4 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -50,6 +50,8 @@ int intel_svm_enable_prq(struct intel_iommu *iommu)
>  		return ret;
>  	}
>  	iommu->pr_irq = irq;
> +	INIT_LIST_HEAD(&iommu->prq_list);
> +	spin_lock_init(&iommu->prq_lock);
> 
>  	snprintf(iommu->prq_name, sizeof(iommu->prq_name), "dmar%d-
> prq", iommu->seq_id);
> 
> @@ -698,6 +700,14 @@ struct page_req_dsc {
> 
>  #define PRQ_RING_MASK	((0x1000 << PRQ_ORDER) - 0x20)
> 
> +struct page_req {
> +	struct list_head list;
> +	struct page_req_dsc desc;
> +	unsigned int processing:1;
> +	unsigned int drained:1;
> +	unsigned int completed:1;
> +};
> +
>  static bool access_error(struct vm_area_struct *vma, struct page_req_dsc
> *req)
>  {
>  	unsigned long requested = 0;
> @@ -842,34 +852,60 @@ static void process_single_prq(struct intel_iommu
> *iommu,
>  	}
>  }
> 
> -static void intel_svm_process_prq(struct intel_iommu *iommu,
> -				  struct page_req_dsc *prq,
> -				  int head, int tail)
> +static void intel_svm_process_prq(struct intel_iommu *iommu)
>  {
> -	struct page_req_dsc *req;
> -
> -	while (head != tail) {
> -		req = &iommu->prq[head / sizeof(*req)];
> -		process_single_prq(iommu, req);
> -		head = (head + sizeof(*req)) & PRQ_RING_MASK;
> +	struct page_req *req;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&iommu->prq_lock, flags);
> +	while (!list_empty(&iommu->prq_list)) {
> +		req = list_first_entry(&iommu->prq_list, struct page_req, list);
> +		if (!req->processing) {
> +			req->processing = true;
> +			spin_unlock_irqrestore(&iommu->prq_lock, flags);
> +			process_single_prq(iommu, &req->desc);
> +			spin_lock_irqsave(&iommu->prq_lock, flags);
> +			req->completed = true;
> +		} else if (req->completed) {
> +			list_del(&req->list);
> +			kfree(req);
> +		} else {
> +			break;
> +		}
>  	}
> +	spin_unlock_irqrestore(&iommu->prq_lock, flags);
>  }
> 
>  static irqreturn_t prq_event_thread(int irq, void *d)
>  {
>  	struct intel_iommu *iommu = d;
> +	unsigned long flags;
>  	int head, tail;
> 
> +	spin_lock_irqsave(&iommu->prq_lock, flags);
>  	/*
>  	 * Clear PPR bit before reading head/tail registers, to
>  	 * ensure that we get a new interrupt if needed.
>  	 */
>  	writel(DMA_PRS_PPR, iommu->reg + DMAR_PRS_REG);
> -
>  	tail = dmar_readq(iommu->reg + DMAR_PQT_REG) &
> PRQ_RING_MASK;
>  	head = dmar_readq(iommu->reg + DMAR_PQH_REG) &
> PRQ_RING_MASK;
> -	intel_svm_process_prq(iommu, iommu->prq, head, tail);
> +	while (head != tail) {
> +		struct page_req_dsc *dsc;
> +		struct page_req *req;
> +
> +		dsc = &iommu->prq[head / sizeof(*dsc)];
> +		req = kzalloc(sizeof (*req), GFP_ATOMIC);
> +		if (!req)
> +			break;
> +		req->desc = *dsc;
> +		list_add_tail(&req->list, &iommu->prq_list);
> +		head = (head + sizeof(*dsc)) & PRQ_RING_MASK;
> +	}
>  	dmar_writeq(iommu->reg + DMAR_PQH_REG, tail);
> +	spin_unlock_irqrestore(&iommu->prq_lock, flags);
> +
> +	intel_svm_process_prq(iommu);
> 
>  	return IRQ_RETVAL(1);
>  }
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index cca1e5f9aeaa..80715a59491c 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -590,6 +590,8 @@ struct intel_iommu {
>  #ifdef CONFIG_INTEL_IOMMU_SVM
>  	struct page_req_dsc *prq;
>  	unsigned char prq_name[16];    /* Name for PRQ interrupt */
> +	struct list_head prq_list;	/* pending page request list */
> +	spinlock_t prq_lock;		/* protect above pending list */
>  	struct ioasid_allocator_ops pasid_allocator; /* Custom allocator for
> PASIDs */
>  #endif
>  	struct q_inval  *qi;            /* Queued invalidation info */
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH v2 6/7] iommu/vt-d: Add page request draining support
  2020-04-15  5:25 ` [PATCH v2 6/7] iommu/vt-d: Add page request draining support Lu Baolu
@ 2020-04-15 11:10   ` Tian, Kevin
  2020-04-16  2:19     ` Lu Baolu
  2020-04-16  8:38     ` Lu Baolu
  0 siblings, 2 replies; 24+ messages in thread
From: Tian, Kevin @ 2020-04-15 11:10 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel
  Cc: Raj, Ashok, jacob.jun.pan, Liu, Yi L, iommu, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Wednesday, April 15, 2020 1:26 PM
> 
> When a PASID is stopped or terminated, there can be pending
> PRQs (requests that haven't received responses) in remapping
> hardware. This adds the interface to drain page requests and
> call it when a PASID is terminated.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel-svm.c   | 90 ++++++++++++++++++++++++++++++++++---
>  include/linux/intel-iommu.h |  1 +
>  2 files changed, 86 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 05aeb8ea51c4..736dd39fb52b 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -23,6 +23,7 @@
>  #include "intel-pasid.h"
> 
>  static irqreturn_t prq_event_thread(int irq, void *d);
> +static void intel_svm_drain_prq(struct device *dev, int pasid);
> 
>  #define PRQ_ORDER 0
> 
> @@ -210,6 +211,7 @@ static void intel_mm_release(struct mmu_notifier
> *mn, struct mm_struct *mm)
>  	rcu_read_lock();
>  	list_for_each_entry_rcu(sdev, &svm->devs, list) {
>  		intel_pasid_tear_down_entry(svm->iommu, sdev->dev, svm-
> >pasid);
> +		intel_svm_drain_prq(sdev->dev, svm->pasid);

I feel there is a problem here. If you clear the PASID entry before draining,
in-flight requests will hit an unrecoverable fault instead, due to the invalid
PASID entry.

>  		intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
>  	}
>  	rcu_read_unlock();
> @@ -403,12 +405,8 @@ int intel_svm_unbind_gpasid(struct device *dev, int
> pasid)
>  		if (!sdev->users) {
>  			list_del_rcu(&sdev->list);
>  			intel_pasid_tear_down_entry(iommu, dev, svm-
> >pasid);
> +			intel_svm_drain_prq(dev, svm->pasid);
>  			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
> -			/* TODO: Drain in flight PRQ for the PASID since it
> -			 * may get reused soon, we don't want to
> -			 * confuse with its previous life.
> -			 * intel_svm_drain_prq(dev, pasid);
> -			 */
>  			kfree_rcu(sdev, rcu);
> 
>  			if (list_empty(&svm->devs)) {
> @@ -646,6 +644,7 @@ int intel_svm_unbind_mm(struct device *dev, int
> pasid)
>  			 * large and has to be physically contiguous. So it's
>  			 * hard to be as defensive as we might like. */
>  			intel_pasid_tear_down_entry(iommu, dev, svm-
> >pasid);
> +			intel_svm_drain_prq(dev, svm->pasid);
>  			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
>  			kfree_rcu(sdev, rcu);
> 
> @@ -703,6 +702,7 @@ struct page_req_dsc {
>  struct page_req {
>  	struct list_head list;
>  	struct page_req_dsc desc;
> +	struct completion complete;
>  	unsigned int processing:1;
>  	unsigned int drained:1;
>  	unsigned int completed:1;
> @@ -732,9 +732,83 @@ static bool is_canonical_address(u64 addr)
>  	return (((saddr << shift) >> shift) == saddr);
>  }
> 
> +/**
> + * intel_svm_drain_prq:
> + *
> + * Drain all pending page requests related to a specific pasid in both
> + * software and hardware. The caller must guarantee that no more page
> + * requests related to this pasid coming.
> + */
> +static void intel_svm_drain_prq(struct device *dev, int pasid)
> +{
> +	struct device_domain_info *info;
> +	struct dmar_domain *domain;
> +	struct intel_iommu *iommu;
> +	struct qi_desc desc[3];
> +	struct pci_dev *pdev;
> +	struct page_req *req;
> +	unsigned long flags;
> +	u16 sid, did;
> +	int qdep;
> +
> +	info = get_domain_info(dev);
> +	if (WARN_ON(!info || !dev_is_pci(dev)))
> +		return;
> +
> +	iommu = info->iommu;
> +	domain = info->domain;
> +	pdev = to_pci_dev(dev);
> +
> +	/* Mark all related pending requests drained. */
> +	spin_lock_irqsave(&iommu->prq_lock, flags);
> +	list_for_each_entry(req, &iommu->prq_list, list)
> +		if (req->desc.pasid_present && req->desc.pasid == pasid)
> +			req->drained = true;
> +	spin_unlock_irqrestore(&iommu->prq_lock, flags);
> +
> +	/* Wait until all related pending requests complete. */
> +retry:
> +	spin_lock_irqsave(&iommu->prq_lock, flags);
> +	list_for_each_entry(req, &iommu->prq_list, list) {
> +		if (req->desc.pasid_present &&
> +		    req->desc.pasid == pasid &&
> +		    !req->completed) {
> +			spin_unlock_irqrestore(&iommu->prq_lock, flags);
> +			wait_for_completion_timeout(&req->complete, 5 *
> HZ);
> +			goto retry;
> +		}
> +	}
> +	spin_unlock_irqrestore(&iommu->prq_lock, flags);
> +
> +	/*
> +	 * Perform steps described in VT-d spec CH7.10 to drain page
> +	 * request and responses in hardware.
> +	 */
> +	sid = PCI_DEVID(info->bus, info->devfn);
> +	did = domain->iommu_did[iommu->seq_id];
> +	qdep = pci_ats_queue_depth(pdev);
> +
> +	memset(desc, 0, sizeof(desc));
> +	desc[0].qw0 = QI_IWD_STATUS_DATA(QI_DONE) |
> +			QI_IWD_FENCE |
> +			QI_IWD_TYPE;
> +	desc[1].qw0 = QI_EIOTLB_PASID(pasid) |
> +			QI_EIOTLB_DID(did) |
> +			QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) |
> +			QI_EIOTLB_TYPE;
> +	desc[2].qw0 = QI_DEV_EIOTLB_PASID(pasid) |
> +			QI_DEV_EIOTLB_SID(sid) |
> +			QI_DEV_EIOTLB_QDEP(qdep) |
> +			QI_DEIOTLB_TYPE |
> +			QI_DEV_IOTLB_PFSID(info->pfsid);
> +
> +	qi_submit_sync(iommu, desc, 3, QI_OPT_WAIT_DRAIN);

The completion of the above sequence ensures that previously queued
page group responses are sent out and received by the endpoint, and,
conversely, that all in-flight page requests from the endpoint are queued
in the iommu page request queue. Then comes a problem - you didn't
wait for the completion of those newly-queued requests and their
responses.

According to VT-d spec 7.10, step (d) mentions that when a queue overflow
happens, software needs to repeat the above draining sequence to drain
the auto-responses.
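
If the sequence has to be repeated, a rough sketch could look like the
below (DMA_PRS_PRO is assumed here to be the RW1C Page Request Overflow
bit in DMAR_PRS_REG; it is not used anywhere in this patch):

	/* Sketch only: re-drain while the page request queue overflowed. */
	do {
		qi_submit_sync(iommu, desc, 3, QI_OPT_WAIT_DRAIN);
		if (!(readl(iommu->reg + DMAR_PRS_REG) & DMA_PRS_PRO))
			break;
		/* clear the overflow condition and drain the auto-responses */
		writel(DMA_PRS_PRO, iommu->reg + DMAR_PRS_REG);
	} while (1);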

According to VT-d spec 7.11, the device driver must be notified to
revoke the PASID before this draining sequence happens. When
does that happen? It might be worth adding a comment to explain
this background.

Thanks
Kevin

> +}
> +
>  static void process_single_prq(struct intel_iommu *iommu,
>  			       struct page_req_dsc *req)
>  {
> +	struct page_req *p_req = container_of(req, struct page_req, desc);
>  	int result = QI_RESP_FAILURE;
>  	struct intel_svm_dev *sdev;
>  	struct vm_area_struct *vma;
> @@ -768,6 +842,10 @@ static void process_single_prq(struct intel_iommu
> *iommu,
>  	}
> 
>  	result = QI_RESP_INVALID;
> +
> +	if (p_req->drained)
> +		goto bad_req;
> +
>  	/* Since we're using init_mm.pgd directly, we should never take
>  	 * any faults on kernel addresses. */
>  	if (!svm->mm)
> @@ -868,6 +946,7 @@ static void intel_svm_process_prq(struct
> intel_iommu *iommu)
>  			req->completed = true;
>  		} else if (req->completed) {
>  			list_del(&req->list);
> +			complete(&req->complete);
>  			kfree(req);
>  		} else {
>  			break;
> @@ -899,6 +978,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
>  		if (!req)
>  			break;
>  		req->desc = *dsc;
> +		init_completion(&req->complete);
>  		list_add_tail(&req->list, &iommu->prq_list);
>  		head = (head + sizeof(*dsc)) & PRQ_RING_MASK;
>  	}
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 80715a59491c..714a0df3d879 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -333,6 +333,7 @@ enum {
> 
>  #define QI_IWD_STATUS_DATA(d)	(((u64)d) << 32)
>  #define QI_IWD_STATUS_WRITE	(((u64)1) << 5)
> +#define QI_IWD_FENCE		(((u64)1) << 6)
>  #define QI_IWD_PRQ_DRAIN	(((u64)1) << 7)
> 
>  #define QI_IOTLB_DID(did) 	(((u64)did) << 16)
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 4/7] iommu/vt-d: Refactor prq_event_thread()
  2020-04-15  9:15   ` Tian, Kevin
@ 2020-04-16  1:33     ` Lu Baolu
  0 siblings, 0 replies; 24+ messages in thread
From: Lu Baolu @ 2020-04-16  1:33 UTC (permalink / raw)
  To: Tian, Kevin, Joerg Roedel
  Cc: baolu.lu, Raj, Ashok, jacob.jun.pan, Liu, Yi L, iommu, linux-kernel

On 2020/4/15 17:15, Tian, Kevin wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>> Sent: Wednesday, April 15, 2020 1:26 PM
>>
>> Move the software processing page request descriptors part from
>> prq_event_thread() into a separated function. No any functional
>> changes.
>>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>   drivers/iommu/intel-svm.c | 256 ++++++++++++++++++++------------------
>>   1 file changed, 135 insertions(+), 121 deletions(-)
>>
>> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
>> index 83dc4319f661..a1921b462783 100644
>> --- a/drivers/iommu/intel-svm.c
>> +++ b/drivers/iommu/intel-svm.c
>> @@ -722,142 +722,156 @@ static bool is_canonical_address(u64 addr)
>>   	return (((saddr << shift) >> shift) == saddr);
>>   }
>>
>> -static irqreturn_t prq_event_thread(int irq, void *d)
>> +static void process_single_prq(struct intel_iommu *iommu,
>> +			       struct page_req_dsc *req)
>>   {
>> -	struct intel_iommu *iommu = d;
>> -	struct intel_svm *svm = NULL;
>> -	int head, tail, handled = 0;
>> -
>> -	/* Clear PPR bit before reading head/tail registers, to
>> -	 * ensure that we get a new interrupt if needed. */
>> -	writel(DMA_PRS_PPR, iommu->reg + DMAR_PRS_REG);
>> -
>> -	tail = dmar_readq(iommu->reg + DMAR_PQT_REG) &
>> PRQ_RING_MASK;
>> -	head = dmar_readq(iommu->reg + DMAR_PQH_REG) &
>> PRQ_RING_MASK;
>> -	while (head != tail) {
>> -		struct intel_svm_dev *sdev;
>> -		struct vm_area_struct *vma;
>> -		struct page_req_dsc *req;
>> -		struct qi_desc resp;
>> -		int result;
>> -		vm_fault_t ret;
>> -		u64 address;
>> -
>> -		handled = 1;
>> -
>> -		req = &iommu->prq[head / sizeof(*req)];
>> +	int result = QI_RESP_FAILURE;
>> +	struct intel_svm_dev *sdev;
>> +	struct vm_area_struct *vma;
>> +	struct intel_svm *svm;
>> +	struct qi_desc resp;
>> +	vm_fault_t ret;
>> +	u64 address;
>> +
>> +	address = (u64)req->addr << VTD_PAGE_SHIFT;
>> +	if (!req->pasid_present) {
>> +		pr_err("%s: Page request without PASID: %08llx %08llx\n",
>> +		       iommu->name, ((unsigned long long *)req)[0],
>> +		       ((unsigned long long *)req)[1]);
>> +		goto no_pasid;
>> +	}
>>
>> -		result = QI_RESP_FAILURE;
>> -		address = (u64)req->addr << VTD_PAGE_SHIFT;
>> -		if (!req->pasid_present) {
>> -			pr_err("%s: Page request without
>> PASID: %08llx %08llx\n",
>> -			       iommu->name, ((unsigned long long *)req)[0],
>> -			       ((unsigned long long *)req)[1]);
>> -			goto no_pasid;
>> -		}
>> +	rcu_read_lock();
>> +	svm = ioasid_find(NULL, req->pasid, NULL);
>> +	/*
>> +	 * It *can't* go away, because the driver is not permitted
>> +	 * to unbind the mm while any page faults are outstanding.
>> +	 * So we only need RCU to protect the internal idr code.
>> +	 */
>> +	rcu_read_unlock();
>>
>> -		if (!svm || svm->pasid != req->pasid) {
>> -			rcu_read_lock();
>> -			svm = ioasid_find(NULL, req->pasid, NULL);
>> -			/* It *can't* go away, because the driver is not
>> permitted
>> -			 * to unbind the mm while any page faults are
>> outstanding.
>> -			 * So we only need RCU to protect the internal idr
>> code. */
>> -			rcu_read_unlock();
>> -			if (IS_ERR_OR_NULL(svm)) {
>> -				pr_err("%s: Page request for invalid
>> PASID %d: %08llx %08llx\n",
>> -				       iommu->name, req->pasid, ((unsigned
>> long long *)req)[0],
>> -				       ((unsigned long long *)req)[1]);
>> -				goto no_pasid;
>> -			}
>> -		}
>> +	if (IS_ERR_OR_NULL(svm)) {
>> +		pr_err("%s: Page request for invalid
>> PASID %d: %08llx %08llx\n",
>> +		       iommu->name, req->pasid, ((unsigned long long *)req)[0],
>> +		       ((unsigned long long *)req)[1]);
>> +		goto no_pasid;
>> +	}
>>
>> -		result = QI_RESP_INVALID;
>> -		/* Since we're using init_mm.pgd directly, we should never
>> take
>> -		 * any faults on kernel addresses. */
>> -		if (!svm->mm)
>> -			goto bad_req;
>> +	result = QI_RESP_INVALID;
>> +	/* Since we're using init_mm.pgd directly, we should never take
>> +	 * any faults on kernel addresses. */
>> +	if (!svm->mm)
>> +		goto bad_req;
>> +
>> +	/* If address is not canonical, return invalid response */
>> +	if (!is_canonical_address(address))
>> +		goto bad_req;
>> +
>> +	/* If the mm is already defunct, don't handle faults. */
>> +	if (!mmget_not_zero(svm->mm))
>> +		goto bad_req;
>> +
>> +	down_read(&svm->mm->mmap_sem);
>> +	vma = find_extend_vma(svm->mm, address);
>> +	if (!vma || address < vma->vm_start)
>> +		goto invalid;
>> +
>> +	if (access_error(vma, req))
>> +		goto invalid;
>> +
>> +	ret = handle_mm_fault(vma, address,
>> +			      req->wr_req ? FAULT_FLAG_WRITE : 0);
>> +	if (ret & VM_FAULT_ERROR)
>> +		goto invalid;
>> +
>> +	result = QI_RESP_SUCCESS;
>> +invalid:
>> +	up_read(&svm->mm->mmap_sem);
>> +	mmput(svm->mm);
>> +bad_req:
>> +	/* Accounting for major/minor faults? */
>> +	rcu_read_lock();
>> +	list_for_each_entry_rcu(sdev, &svm->devs, list) {
>> +		if (sdev->sid == req->rid)
>> +			break;
>> +	}
>>
>> -		/* If address is not canonical, return invalid response */
>> -		if (!is_canonical_address(address))
>> -			goto bad_req;
>> +	/* Other devices can go away, but the drivers are not permitted
>> +	 * to unbind while any page faults might be in flight. So it's
>> +	 * OK to drop the 'lock' here now we have it. */
>> +	rcu_read_unlock();
>>
>> -		/* If the mm is already defunct, don't handle faults. */
>> -		if (!mmget_not_zero(svm->mm))
>> -			goto bad_req;
>> +	if (WARN_ON(&sdev->list == &svm->devs))
>> +		sdev = NULL;
>>
>> -		down_read(&svm->mm->mmap_sem);
>> -		vma = find_extend_vma(svm->mm, address);
>> -		if (!vma || address < vma->vm_start)
>> -			goto invalid;
>> +	if (sdev && sdev->ops && sdev->ops->fault_cb) {
>> +		int rwxp = (req->rd_req << 3) | (req->wr_req << 2) |
>> +			(req->exe_req << 1) | (req->pm_req);
>> +		sdev->ops->fault_cb(sdev->dev, req->pasid, req->addr,
>> +				    req->priv_data, rwxp, result);
>> +	}
>>
>> -		if (access_error(vma, req))
>> -			goto invalid;
>> +	/* We get here in the error case where the PASID lookup failed,
>> +	   and these can be NULL. Do not use them below this point! */
>> +	sdev = NULL;
>> +	svm = NULL;
>> +no_pasid:
>> +	if (req->lpig || req->priv_data_present) {
>> +		/*
>> +		 * Per VT-d spec. v3.0 ch7.7, system software must
>> +		 * respond with page group response if private data
>> +		 * is present (PDP) or last page in group (LPIG) bit
>> +		 * is set. This is an additional VT-d feature beyond
>> +		 * PCI ATS spec.
>> +		 */
>> +		resp.qw0 = QI_PGRP_PASID(req->pasid) |
>> +			QI_PGRP_DID(req->rid) |
>> +			QI_PGRP_PASID_P(req->pasid_present) |
>> +			QI_PGRP_PDP(req->pasid_present) |
>> +			QI_PGRP_RESP_CODE(result) |
>> +			QI_PGRP_RESP_TYPE;
>> +		resp.qw1 = QI_PGRP_IDX(req->prg_index) |
>> +			QI_PGRP_LPIG(req->lpig);
>> +
>> +		if (req->priv_data_present)
>> +			memcpy(&resp.qw2, req->priv_data,
>> +			       sizeof(req->priv_data));
>> +		resp.qw2 = 0;
>> +		resp.qw3 = 0;
>> +		qi_submit_sync(iommu, &resp, 1, 0);
>> +	}
>> +}
>>
>> -		ret = handle_mm_fault(vma, address,
>> -				      req->wr_req ? FAULT_FLAG_WRITE : 0);
>> -		if (ret & VM_FAULT_ERROR)
>> -			goto invalid;
>> +static void intel_svm_process_prq(struct intel_iommu *iommu,
>> +				  struct page_req_dsc *prq,
>> +				  int head, int tail)
>> +{
>> +	struct page_req_dsc *req;
>>
>> -		result = QI_RESP_SUCCESS;
>> -	invalid:
>> -		up_read(&svm->mm->mmap_sem);
>> -		mmput(svm->mm);
>> -	bad_req:
>> -		/* Accounting for major/minor faults? */
>> -		rcu_read_lock();
>> -		list_for_each_entry_rcu(sdev, &svm->devs, list) {
>> -			if (sdev->sid == req->rid)
>> -				break;
>> -		}
>> -		/* Other devices can go away, but the drivers are not
>> permitted
>> -		 * to unbind while any page faults might be in flight. So it's
>> -		 * OK to drop the 'lock' here now we have it. */
>> -		rcu_read_unlock();
>> -
>> -		if (WARN_ON(&sdev->list == &svm->devs))
>> -			sdev = NULL;
>> -
>> -		if (sdev && sdev->ops && sdev->ops->fault_cb) {
>> -			int rwxp = (req->rd_req << 3) | (req->wr_req << 2) |
>> -				(req->exe_req << 1) | (req->pm_req);
>> -			sdev->ops->fault_cb(sdev->dev, req->pasid, req-
>>> addr,
>> -					    req->priv_data, rwxp, result);
>> -		}
>> -		/* We get here in the error case where the PASID lookup
>> failed,
>> -		   and these can be NULL. Do not use them below this point!
>> */
>> -		sdev = NULL;
>> -		svm = NULL;
>> -	no_pasid:
>> -		if (req->lpig || req->priv_data_present) {
>> -			/*
>> -			 * Per VT-d spec. v3.0 ch7.7, system software must
>> -			 * respond with page group response if private data
>> -			 * is present (PDP) or last page in group (LPIG) bit
>> -			 * is set. This is an additional VT-d feature beyond
>> -			 * PCI ATS spec.
>> -			 */
>> -			resp.qw0 = QI_PGRP_PASID(req->pasid) |
>> -				QI_PGRP_DID(req->rid) |
>> -				QI_PGRP_PASID_P(req->pasid_present) |
>> -				QI_PGRP_PDP(req->pasid_present) |
>> -				QI_PGRP_RESP_CODE(result) |
>> -				QI_PGRP_RESP_TYPE;
>> -			resp.qw1 = QI_PGRP_IDX(req->prg_index) |
>> -				QI_PGRP_LPIG(req->lpig);
>> -
>> -			if (req->priv_data_present)
>> -				memcpy(&resp.qw2, req->priv_data,
>> -				       sizeof(req->priv_data));
>> -			resp.qw2 = 0;
>> -			resp.qw3 = 0;
>> -			qi_submit_sync(iommu, &resp, 1, 0);
>> -		}
>> +	while (head != tail) {
>> +		req = &iommu->prq[head / sizeof(*req)];
>> +		process_single_prq(iommu, req);
>>   		head = (head + sizeof(*req)) & PRQ_RING_MASK;
>>   	}
>> +}
>> +
>> +static irqreturn_t prq_event_thread(int irq, void *d)
>> +{
>> +	struct intel_iommu *iommu = d;
>> +	int head, tail;
>>
>> +	/*
>> +	 * Clear PPR bit before reading head/tail registers, to
>> +	 * ensure that we get a new interrupt if needed.
>> +	 */
>> +	writel(DMA_PRS_PPR, iommu->reg + DMAR_PRS_REG);
>> +
>> +	tail = dmar_readq(iommu->reg + DMAR_PQT_REG) &
>> PRQ_RING_MASK;
>> +	head = dmar_readq(iommu->reg + DMAR_PQH_REG) &
>> PRQ_RING_MASK;
>> +	intel_svm_process_prq(iommu, iommu->prq, head, tail);
>>   	dmar_writeq(iommu->reg + DMAR_PQH_REG, tail);
>>
>> -	return IRQ_RETVAL(handled);
>> +	return IRQ_RETVAL(1);
> 
> this might be a functional change, since previously (0) could
> be returned when head==tail.

Yes.

I will change it to
	return IRQ_RETVAL(head != tail);

Best regards,
baolu

> 
>>   }
>>
>>   #define to_intel_svm_dev(handle) container_of(handle, struct
>> intel_svm_dev, sva)
>> --
>> 2.17.1
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 5/7] iommu/vt-d: Save prq descriptors in an internal list
  2020-04-15  9:30   ` Tian, Kevin
@ 2020-04-16  1:46     ` Lu Baolu
  2020-04-17  3:25       ` Lu Baolu
  0 siblings, 1 reply; 24+ messages in thread
From: Lu Baolu @ 2020-04-16  1:46 UTC (permalink / raw)
  To: Tian, Kevin, Joerg Roedel
  Cc: baolu.lu, Raj, Ashok, jacob.jun.pan, Liu, Yi L, iommu, linux-kernel

On 2020/4/15 17:30, Tian, Kevin wrote:
>> From: Lu Baolu<baolu.lu@linux.intel.com>
>> Sent: Wednesday, April 15, 2020 1:26 PM
>>
>> Currently, the page request interrupt thread handles the page
>> requests in the queue in this way:
>>
>> - Clear PPR bit to ensure new interrupt could come in;
>> - Read and record the head and tail registers;
>> - Handle all descriptors between head and tail;
>> - Write tail to head register.
>>
>> This might cause some descriptors to be handles multiple times.
>> An example sequence:
>>
>> - Thread A got scheduled with PRQ_1 and PRQ_2 in the queue;
>> - Thread A clear the PPR bit and record the head and tail;
>> - A new PRQ_3 comes and Thread B gets scheduled;
>> - Thread B record the head and tail which includes PRQ_1
>>    and PRQ_2.
> I may be overlooking something, but isn't the prq interrupt thread
> per iommu? Then why would two prq threads contend here?

The prq interrupt can be masked by the PPR (Pending Page Request) bit
in the Page Request Status Register. In the interrupt handling thread, once
this bit is cleared, new prq interrupts are allowed to be generated.

So, if a page request is being processed and the PPR bit is cleared, another
page request from any device under the same iommu could trigger another
interrupt thread.
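
Mapping that onto the handler as of patch 4/7, the window is between
clearing PPR and updating PQH (annotated excerpt, for illustration only):

	writel(DMA_PRS_PPR, iommu->reg + DMAR_PRS_REG);	/* new prq interrupts allowed */
	tail = dmar_readq(iommu->reg + DMAR_PQT_REG) & PRQ_RING_MASK;
	head = dmar_readq(iommu->reg + DMAR_PQH_REG) & PRQ_RING_MASK;
	/*
	 * A second prq interrupt scheduled here starts another thread,
	 * which reads the same (not yet advanced) head together with the
	 * new tail, so everything between head and the old tail is
	 * handled twice.
	 */
	intel_svm_process_prq(iommu, iommu->prq, head, tail);
	dmar_writeq(iommu->reg + DMAR_PQH_REG, tail);	/* head only advances here */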

Best regards,
baolu

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 6/7] iommu/vt-d: Add page request draining support
  2020-04-15 11:10   ` Tian, Kevin
@ 2020-04-16  2:19     ` Lu Baolu
  2020-04-16  8:38     ` Lu Baolu
  1 sibling, 0 replies; 24+ messages in thread
From: Lu Baolu @ 2020-04-16  2:19 UTC (permalink / raw)
  To: Tian, Kevin, Joerg Roedel
  Cc: baolu.lu, Raj, Ashok, jacob.jun.pan, Liu, Yi L, iommu, linux-kernel

On 2020/4/15 19:10, Tian, Kevin wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>> Sent: Wednesday, April 15, 2020 1:26 PM
>>
>> When a PASID is stopped or terminated, there can be pending
>> PRQs (requests that haven't received responses) in remapping
>> hardware. This adds the interface to drain page requests and
>> call it when a PASID is terminated.
>>
>> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
>> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>   drivers/iommu/intel-svm.c   | 90 ++++++++++++++++++++++++++++++++++---
>>   include/linux/intel-iommu.h |  1 +
>>   2 files changed, 86 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
>> index 05aeb8ea51c4..736dd39fb52b 100644
>> --- a/drivers/iommu/intel-svm.c
>> +++ b/drivers/iommu/intel-svm.c
>> @@ -23,6 +23,7 @@
>>   #include "intel-pasid.h"
>>
>>   static irqreturn_t prq_event_thread(int irq, void *d);
>> +static void intel_svm_drain_prq(struct device *dev, int pasid);
>>
>>   #define PRQ_ORDER 0
>>
>> @@ -210,6 +211,7 @@ static void intel_mm_release(struct mmu_notifier
>> *mn, struct mm_struct *mm)
>>   	rcu_read_lock();
>>   	list_for_each_entry_rcu(sdev, &svm->devs, list) {
>>   		intel_pasid_tear_down_entry(svm->iommu, sdev->dev, svm-
>>> pasid);
>> +		intel_svm_drain_prq(sdev->dev, svm->pasid);
> 
> I feel there is a problem here. If you clear the PASID entry before draining,
> in-flight requests will hit an unrecoverable fault instead, due to the invalid
> PASID entry.

The in-flight requests will be ignored by the IOMMU if the pasid entry is
empty. They won't result in an unrecoverable fault.

> 
>>   		intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
>>   	}
>>   	rcu_read_unlock();
>> @@ -403,12 +405,8 @@ int intel_svm_unbind_gpasid(struct device *dev, int
>> pasid)
>>   		if (!sdev->users) {
>>   			list_del_rcu(&sdev->list);
>>   			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
>> +			intel_svm_drain_prq(dev, svm->pasid);
>>   			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
>> -			/* TODO: Drain in flight PRQ for the PASID since it
>> -			 * may get reused soon, we don't want to
>> -			 * confuse with its previous life.
>> -			 * intel_svm_drain_prq(dev, pasid);
>> -			 */
>>   			kfree_rcu(sdev, rcu);
>>
>>   			if (list_empty(&svm->devs)) {
>> @@ -646,6 +644,7 @@ int intel_svm_unbind_mm(struct device *dev, int
>> pasid)
>>   			 * large and has to be physically contiguous. So it's
>>   			 * hard to be as defensive as we might like. */
>>   			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
>> +			intel_svm_drain_prq(dev, svm->pasid);
>>   			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
>>   			kfree_rcu(sdev, rcu);
>>
>> @@ -703,6 +702,7 @@ struct page_req_dsc {
>>   struct page_req {
>>   	struct list_head list;
>>   	struct page_req_dsc desc;
>> +	struct completion complete;
>>   	unsigned int processing:1;
>>   	unsigned int drained:1;
>>   	unsigned int completed:1;
>> @@ -732,9 +732,83 @@ static bool is_canonical_address(u64 addr)
>>   	return (((saddr << shift) >> shift) == saddr);
>>   }
>>
>> +/**
>> + * intel_svm_drain_prq:
>> + *
>> + * Drain all pending page requests related to a specific pasid in both
>> + * software and hardware. The caller must guarantee that no more page
>> + * requests related to this pasid will come.
>> + */
>> +static void intel_svm_drain_prq(struct device *dev, int pasid)
>> +{
>> +	struct device_domain_info *info;
>> +	struct dmar_domain *domain;
>> +	struct intel_iommu *iommu;
>> +	struct qi_desc desc[3];
>> +	struct pci_dev *pdev;
>> +	struct page_req *req;
>> +	unsigned long flags;
>> +	u16 sid, did;
>> +	int qdep;
>> +
>> +	info = get_domain_info(dev);
>> +	if (WARN_ON(!info || !dev_is_pci(dev)))
>> +		return;
>> +
>> +	iommu = info->iommu;
>> +	domain = info->domain;
>> +	pdev = to_pci_dev(dev);
>> +
>> +	/* Mark all related pending requests drained. */
>> +	spin_lock_irqsave(&iommu->prq_lock, flags);
>> +	list_for_each_entry(req, &iommu->prq_list, list)
>> +		if (req->desc.pasid_present && req->desc.pasid == pasid)
>> +			req->drained = true;
>> +	spin_unlock_irqrestore(&iommu->prq_lock, flags);
>> +
>> +	/* Wait until all related pending requests complete. */
>> +retry:
>> +	spin_lock_irqsave(&iommu->prq_lock, flags);
>> +	list_for_each_entry(req, &iommu->prq_list, list) {
>> +		if (req->desc.pasid_present &&
>> +		    req->desc.pasid == pasid &&
>> +		    !req->completed) {
>> +			spin_unlock_irqrestore(&iommu->prq_lock, flags);
>> +			wait_for_completion_timeout(&req->complete, 5 * HZ);
>> +			goto retry;
>> +		}
>> +	}
>> +	spin_unlock_irqrestore(&iommu->prq_lock, flags);
>> +
>> +	/*
>> +	 * Perform steps described in VT-d spec CH7.10 to drain page
>> +	 * request and responses in hardware.
>> +	 */
>> +	sid = PCI_DEVID(info->bus, info->devfn);
>> +	did = domain->iommu_did[iommu->seq_id];
>> +	qdep = pci_ats_queue_depth(pdev);
>> +
>> +	memset(desc, 0, sizeof(desc));
>> +	desc[0].qw0 = QI_IWD_STATUS_DATA(QI_DONE) |
>> +			QI_IWD_FENCE |
>> +			QI_IWD_TYPE;
>> +	desc[1].qw0 = QI_EIOTLB_PASID(pasid) |
>> +			QI_EIOTLB_DID(did) |
>> +			QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) |
>> +			QI_EIOTLB_TYPE;
>> +	desc[2].qw0 = QI_DEV_EIOTLB_PASID(pasid) |
>> +			QI_DEV_EIOTLB_SID(sid) |
>> +			QI_DEV_EIOTLB_QDEP(qdep) |
>> +			QI_DEIOTLB_TYPE |
>> +			QI_DEV_IOTLB_PFSID(info->pfsid);
>> +
>> +	qi_submit_sync(iommu, desc, 3, QI_OPT_WAIT_DRAIN);
> 
> the completion of the above sequence ensures that previously queued
> page group responses are sent out and received by the endpoint and,
> vice versa, all in-flight page requests from the endpoint are queued
> in the iommu page request queue. Then comes a problem - you didn't
> wait for completion of those newly-queued requests and their
> responses.

We have emptied the pasid entry and invalidated the related caches, so the
IOMMU will ignore any newly arriving page requests.

> 
> According to VT-d spec 7.10, step (d) mentions that when a queue overflow
> happens, software needs to repeat the above draining sequence to drain
> the auto-responses.

Page request queue overflow is not yet checked or handled in the prq
interrupt thread. My plan is to add it in a separate patch set. Maybe I
need to state this in the cover letter.
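
For whenever that separate patch set materializes, here is a rough,
hypothetical sketch of the "repeat the drain" step. DMA_PRS_PRO is meant
to be the PRO (Page Request Overflow) bit of the Page Request Status
Register and may not be defined in the tree yet; prq_overflowed(),
intel_svm_drain_prq_all() and the retry bound are made up for
illustration only.

	#define DMA_PRS_PRO	((u32)2)	/* PRO bit, assumed per VT-d spec */

	/* Check and acknowledge a page request queue overflow. */
	static bool prq_overflowed(struct intel_iommu *iommu)
	{
		if (!(readl(iommu->reg + DMAR_PRS_REG) & DMA_PRS_PRO))
			return false;

		/* Write 1 back to clear the overflow indication. */
		writel(DMA_PRS_PRO, iommu->reg + DMAR_PRS_REG);
		return true;
	}

	/* Re-run the drain sequence while overflows keep occurring. */
	static void intel_svm_drain_prq_all(struct intel_iommu *iommu,
					    struct device *dev, int pasid)
	{
		int retries = 10;	/* arbitrary bound for the sketch */

		do {
			intel_svm_drain_prq(dev, pasid);
		} while (prq_overflowed(iommu) && --retries);
	}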

> 
> According to VT-d spec 7.11, the device driver must be notified to
> revoke the PASID before this draining sequence happens. When
> does that happen? Perhaps add a comment to explain this background.

Currently, page request draining only happens in unbind() operations.
That ensures that the device driver and the endpoint device have revoked
the pasid. As for how the kernel should handle pasid termination before
unbind(), that is still under discussion. For now, AFAICS, the acceptable
solution seems to be to delay the release of a pasid until unbind()
happens.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 6/7] iommu/vt-d: Add page request draining support
  2020-04-15 11:10   ` Tian, Kevin
  2020-04-16  2:19     ` Lu Baolu
@ 2020-04-16  8:38     ` Lu Baolu
  2020-04-17  2:27       ` Tian, Kevin
  1 sibling, 1 reply; 24+ messages in thread
From: Lu Baolu @ 2020-04-16  8:38 UTC (permalink / raw)
  To: Tian, Kevin, Joerg Roedel
  Cc: baolu.lu, Raj, Ashok, jacob.jun.pan, Liu, Yi L, iommu, linux-kernel

Hi Kevin,

On 2020/4/15 19:10, Tian, Kevin wrote:
> the completion of the above sequence ensures that previously queued
> page group responses are sent out and received by the endpoint and,
> vice versa, all in-flight page requests from the endpoint are queued
> in the iommu page request queue. Then comes a problem - you didn't
> wait for completion of those newly-queued requests and their
> responses.

I thought about this again.

We do page request draining after the PASID table entry gets torn down
and the devTLB gets invalidated. At this point, if any new page request
for this pasid comes in, the IOMMU will generate an unrecoverable fault
and respond to the device with IR (Invalid Request). The IOMMU won't put
this page request into the queue. [VT-d spec 7.4.1]

Hence, we don't need to worry about the newly-queued requests here.
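
For clarity, this is the ordering the argument relies on, as it appears
in the quoted hunks of this patch:

	/* The PASID entry is torn down first, so hardware auto-responds
	 * to any new page request for this pasid with IR and never
	 * queues it; the drain then only has to cope with requests that
	 * were already in the queue. */
	intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
	intel_svm_drain_prq(dev, svm->pasid);
	intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);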

Best regards,
baolu

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH v2 6/7] iommu/vt-d: Add page request draining support
  2020-04-16  8:38     ` Lu Baolu
@ 2020-04-17  2:27       ` Tian, Kevin
  0 siblings, 0 replies; 24+ messages in thread
From: Tian, Kevin @ 2020-04-17  2:27 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel
  Cc: Raj, Ashok, jacob.jun.pan, Liu, Yi L, iommu, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Thursday, April 16, 2020 4:38 PM
> 
> Hi Kevin,
> 
> On 2020/4/15 19:10, Tian, Kevin wrote:
> > the completion of the above sequence ensures that previously queued
> > page group responses are sent out and received by the endpoint and,
> > vice versa, all in-flight page requests from the endpoint are queued
> > in the iommu page request queue. Then comes a problem - you didn't
> > wait for completion of those newly-queued requests and their
> > responses.
> 
> I thought about this again.
> 
> We do page request draining after the PASID table entry gets torn down
> and the devTLB gets invalidated. At this point, if any new page request
> for this pasid comes in, the IOMMU will generate an unrecoverable fault
> and respond to the device with IR (Invalid Request). The IOMMU won't put
> this page request into the queue. [VT-d spec 7.4.1]

A non-recoverable fault implies severe errors, so I don't see why we should
allow such a thing to happen when handling a normal situation. If you look at
the start of chapter 7:
--
Non-recoverable Faults: Requests that encounter non-recoverable 
address translation faults are aborted by the remapping hardware, 
and typically require a reset of the device (such as through a function-
level-reset) to recover and re-initialize the device to put it back into 
service.
--

> 
> Hence, we don't need to worry about the newly-queued requests here.
> 
> Best regards,
> Baolu

Thanks
Kevin

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 5/7] iommu/vt-d: Save prq descriptors in an internal list
  2020-04-16  1:46     ` Lu Baolu
@ 2020-04-17  3:25       ` Lu Baolu
  0 siblings, 0 replies; 24+ messages in thread
From: Lu Baolu @ 2020-04-17  3:25 UTC (permalink / raw)
  To: Tian, Kevin, Joerg Roedel
  Cc: baolu.lu, Raj, Ashok, jacob.jun.pan, Liu, Yi L, iommu, linux-kernel

Hi Kevin,

On 2020/4/16 9:46, Lu Baolu wrote:
> On 2020/4/15 17:30, Tian, Kevin wrote:
>>> From: Lu Baolu<baolu.lu@linux.intel.com>
>>> Sent: Wednesday, April 15, 2020 1:26 PM
>>>
>>> Currently, the page request interrupt thread handles the page
>>> requests in the queue in this way:
>>>
>>> - Clear PPR bit to ensure new interrupt could come in;
>>> - Read and record the head and tail registers;
>>> - Handle all descriptors between head and tail;
>>> - Write tail to head register.
>>>
>>> This might cause some descriptors to be handled multiple times.
>>> An example sequence:
>>>
>>> - Thread A gets scheduled with PRQ_1 and PRQ_2 in the queue;
>>> - Thread A clears the PPR bit and records the head and tail;
>>> - A new PRQ_3 comes and Thread B gets scheduled;
>>> - Thread B records the head and tail, which include PRQ_1
>>>    and PRQ_2.
>> I may be overlooking something, but isn't the prq interrupt thread
>> per iommu? Then why would two prq threads contend here?
> 
> The prq interrupt could be masked by the PPR (Pending Page Request) bit
> in the Page Request Status Register. Once the interrupt handling thread
> clears this bit, new prq interrupts are allowed to be generated.
> 
> So, if a page request is in process and the PPR bit is cleared, another
> page request from any device under the same iommu could trigger another
> interrupt thread.

Rechecked the code. You are right. As long as the interrupt thread is
per iommu, only a single prq thread will be scheduled. I will change
this accordingly in the new version. Thank you for pointing this out.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2020-04-17  3:25 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-15  5:25 [PATCH v2 0/7] iommu/vt-d: Add page request draining support Lu Baolu
2020-04-15  5:25 ` [PATCH v2 1/7] iommu/vt-d: Refactor parameters for qi_submit_sync() Lu Baolu
2020-04-15  8:02   ` Tian, Kevin
2020-04-15  8:33     ` Lu Baolu
2020-04-15  5:25 ` [PATCH v2 2/7] iommu/vt-d: Multiple descriptors per qi_submit_sync() Lu Baolu
2020-04-15  8:18   ` Tian, Kevin
2020-04-15  8:30     ` Lu Baolu
2020-04-15  8:51       ` Tian, Kevin
2020-04-15  5:25 ` [PATCH v2 3/7] iommu/vt-d: debugfs: Add support to show inv queue internals Lu Baolu
2020-04-15  5:25 ` [PATCH v2 4/7] iommu/vt-d: Refactor prq_event_thread() Lu Baolu
2020-04-15  9:15   ` Tian, Kevin
2020-04-16  1:33     ` Lu Baolu
2020-04-15  5:25 ` [PATCH v2 5/7] iommu/vt-d: Save prq descriptors in an internal list Lu Baolu
2020-04-15  9:30   ` Tian, Kevin
2020-04-16  1:46     ` Lu Baolu
2020-04-17  3:25       ` Lu Baolu
2020-04-15  5:25 ` [PATCH v2 6/7] iommu/vt-d: Add page request draining support Lu Baolu
2020-04-15 11:10   ` Tian, Kevin
2020-04-16  2:19     ` Lu Baolu
2020-04-16  8:38     ` Lu Baolu
2020-04-17  2:27       ` Tian, Kevin
2020-04-15  5:25 ` [PATCH v2 7/7] iommu/vt-d: Remove redundant IOTLB flush Lu Baolu
2020-04-15  7:57 ` [PATCH v2 0/7] iommu/vt-d: Add page request draining support Tian, Kevin
2020-04-15  8:25   ` Lu Baolu
