* [PATCH v4 0/5] iommu/vt-d: Add page request draining support
@ 2020-05-07  0:55 Lu Baolu
  2020-05-07  0:55 ` [PATCH v4 1/5] iommu/vt-d: Multiple descriptors per qi_submit_sync() Lu Baolu
                   ` (5 more replies)
  0 siblings, 6 replies; 21+ messages in thread
From: Lu Baolu @ 2020-05-07  0:55 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: ashok.raj, jacob.jun.pan, Liu Yi L, kevin.tian, iommu,
	linux-kernel, Lu Baolu

When a PASID is stopped or terminated, there can be pending PRQs
(page requests that have not yet received responses) in the software
and in the remapping hardware. These pending page requests must be
drained so that the PASID can be reused. Chapter 7.10 of the VT-d
specification describes the software steps to drain pending page
requests and responses.

This series includes two parts:
 - PATCH 1/5 ~ 2/5: refactor qi_submit_sync() to support multiple
   descriptors per submission (used by the later patches) and export
   the invalidation queue internals through debugfs.
 - PATCH 3/5 ~ 5/5: add page request drain support after a PASID entry
   is torn down.

Best regards,
baolu

Change log:
v3->v4:
  - Remove the page request drain from the mm notifier;
  - Set the PASID FPD bit when the PASID entry is cleared in the mm
    notifier and clear it in unbind().

v2->v3:
  - Address Kevin's review comments:
    - Squash the first 2 patches together;
    - The prq thread is serialized, no need to consider reentrance;
    - Ensure no newly arriving prq before draining the prq in the queue;
    - Handle the page request overflow case.

v1->v2:
  - Fix the race between multiple prq handling threads.

Lu Baolu (5):
  iommu/vt-d: Multiple descriptors per qi_submit_sync()
  iommu/vt-d: debugfs: Add support to show inv queue internals
  iommu/vt-d: Disable non-recoverable fault processing before unbind
  iommu/vt-d: Add page request draining support
  iommu/vt-d: Remove redundant IOTLB flush

 drivers/iommu/dmar.c                |  63 ++++++++------
 drivers/iommu/intel-iommu-debugfs.c |  62 ++++++++++++++
 drivers/iommu/intel-iommu.c         |   4 +-
 drivers/iommu/intel-pasid.c         |  30 +++++--
 drivers/iommu/intel-pasid.h         |   3 +-
 drivers/iommu/intel-svm.c           | 123 ++++++++++++++++++++++++----
 drivers/iommu/intel_irq_remapping.c |   2 +-
 include/linux/intel-iommu.h         |  13 ++-
 8 files changed, 247 insertions(+), 53 deletions(-)

-- 
2.17.1



* [PATCH v4 1/5] iommu/vt-d: Multiple descriptors per qi_submit_sync()
  2020-05-07  0:55 [PATCH v4 0/5] iommu/vt-d: Add page request draining support Lu Baolu
@ 2020-05-07  0:55 ` Lu Baolu
  2020-05-07  5:34   ` Tian, Kevin
  2020-05-07  0:55 ` [PATCH v4 2/5] iommu/vt-d: debugfs: Add support to show inv queue internals Lu Baolu
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Lu Baolu @ 2020-05-07  0:55 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: ashok.raj, jacob.jun.pan, Liu Yi L, kevin.tian, iommu,
	linux-kernel, Lu Baolu

The current qi_submit_sync() only supports a single invalidation
descriptor per submission and appends a wait descriptor after each
submission to poll for hardware completion. This extends the
qi_submit_sync() helper to support multiple descriptors and adds an
option so that the caller can set the Page-request Drain (PD) bit in
the wait descriptor.
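
For illustration, a caller of the extended helper could then look
roughly like the sketch below (pasid/did/sid/qdep and the choice of
descriptors are placeholders, not taken from a real call site):

        struct qi_desc desc[2];

        memset(desc, 0, sizeof(desc));
        /* PASID-based IOTLB invalidation covering the whole PASID */
        desc[0].qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
                      QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) | QI_EIOTLB_TYPE;
        /* matching PASID-based device IOTLB invalidation */
        desc[1].qw0 = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) |
                      QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE;

        /* one submission: both descriptors plus an appended wait descriptor
         * with the Page-request Drain bit set in it
         */
        qi_submit_sync(iommu, desc, 2, QI_OPT_WAIT_DRAIN);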

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/dmar.c                | 63 +++++++++++++++++------------
 drivers/iommu/intel-pasid.c         |  4 +-
 drivers/iommu/intel-svm.c           |  6 +--
 drivers/iommu/intel_irq_remapping.c |  2 +-
 include/linux/intel-iommu.h         |  9 ++++-
 5 files changed, 52 insertions(+), 32 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index d9dc787feef7..61d049e91f84 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1157,12 +1157,11 @@ static inline void reclaim_free_desc(struct q_inval *qi)
 	}
 }
 
-static int qi_check_fault(struct intel_iommu *iommu, int index)
+static int qi_check_fault(struct intel_iommu *iommu, int index, int wait_index)
 {
 	u32 fault;
 	int head, tail;
 	struct q_inval *qi = iommu->qi;
-	int wait_index = (index + 1) % QI_LENGTH;
 	int shift = qi_shift(iommu);
 
 	if (qi->desc_status[wait_index] == QI_ABORT)
@@ -1225,17 +1224,21 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
 }
 
 /*
- * Submit the queued invalidation descriptor to the remapping
- * hardware unit and wait for its completion.
+ * Function to submit invalidation descriptors of all types to the queued
+ * invalidation (QI) interface. Multiple descriptors can be submitted at a
+ * time; a wait descriptor is appended to each submission to ensure that
+ * hardware has completed the invalidation before returning. Wait descriptors
+ * can be part of the submission, but they will not be polled for completion.
  */
-int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
+int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
+		   unsigned int count, unsigned long options)
 {
-	int rc;
 	struct q_inval *qi = iommu->qi;
-	int offset, shift, length;
 	struct qi_desc wait_desc;
 	int wait_index, index;
 	unsigned long flags;
+	int offset, shift;
+	int rc, i;
 
 	if (!qi)
 		return 0;
@@ -1244,32 +1247,41 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 	rc = 0;
 
 	raw_spin_lock_irqsave(&qi->q_lock, flags);
-	while (qi->free_cnt < 3) {
+	/*
+	 * Check if we have enough empty slots in the queue to submit,
+	 * the calculation is based on:
+	 * # of desc + 1 wait desc + 1 space between head and tail
+	 */
+	while (qi->free_cnt < count + 2) {
 		raw_spin_unlock_irqrestore(&qi->q_lock, flags);
 		cpu_relax();
 		raw_spin_lock_irqsave(&qi->q_lock, flags);
 	}
 
 	index = qi->free_head;
-	wait_index = (index + 1) % QI_LENGTH;
+	wait_index = (index + count) % QI_LENGTH;
 	shift = qi_shift(iommu);
-	length = 1 << shift;
 
-	qi->desc_status[index] = qi->desc_status[wait_index] = QI_IN_USE;
+	for (i = 0; i < count; i++) {
+		offset = ((index + i) % QI_LENGTH) << shift;
+		memcpy(qi->desc + offset, &desc[i], 1 << shift);
+		qi->desc_status[(index + i) % QI_LENGTH] = QI_IN_USE;
+	}
+	qi->desc_status[wait_index] = QI_IN_USE;
 
-	offset = index << shift;
-	memcpy(qi->desc + offset, desc, length);
 	wait_desc.qw0 = QI_IWD_STATUS_DATA(QI_DONE) |
 			QI_IWD_STATUS_WRITE | QI_IWD_TYPE;
+	if (options & QI_OPT_WAIT_DRAIN)
+		wait_desc.qw0 |= QI_IWD_PRQ_DRAIN;
 	wait_desc.qw1 = virt_to_phys(&qi->desc_status[wait_index]);
 	wait_desc.qw2 = 0;
 	wait_desc.qw3 = 0;
 
 	offset = wait_index << shift;
-	memcpy(qi->desc + offset, &wait_desc, length);
+	memcpy(qi->desc + offset, &wait_desc, 1 << shift);
 
-	qi->free_head = (qi->free_head + 2) % QI_LENGTH;
-	qi->free_cnt -= 2;
+	qi->free_head = (qi->free_head + count + 1) % QI_LENGTH;
+	qi->free_cnt -= count + 1;
 
 	/*
 	 * update the HW tail register indicating the presence of
@@ -1285,7 +1297,7 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 		 * a deadlock where the interrupt context can wait indefinitely
 		 * for free slots in the queue.
 		 */
-		rc = qi_check_fault(iommu, index);
+		rc = qi_check_fault(iommu, index, wait_index);
 		if (rc)
 			break;
 
@@ -1294,7 +1306,8 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 		raw_spin_lock(&qi->q_lock);
 	}
 
-	qi->desc_status[index] = QI_DONE;
+	for (i = 0; i < count; i++)
+		qi->desc_status[(index + i) % QI_LENGTH] = QI_DONE;
 
 	reclaim_free_desc(qi);
 	raw_spin_unlock_irqrestore(&qi->q_lock, flags);
@@ -1318,7 +1331,7 @@ void qi_global_iec(struct intel_iommu *iommu)
 	desc.qw3 = 0;
 
 	/* should never fail */
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 void qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid, u8 fm,
@@ -1332,7 +1345,7 @@ void qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid, u8 fm,
 	desc.qw2 = 0;
 	desc.qw3 = 0;
 
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
@@ -1356,7 +1369,7 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
 	desc.qw2 = 0;
 	desc.qw3 = 0;
 
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
@@ -1378,7 +1391,7 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 	desc.qw2 = 0;
 	desc.qw3 = 0;
 
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 /* PASID-based IOTLB invalidation */
@@ -1419,7 +1432,7 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,
 				QI_EIOTLB_AM(mask);
 	}
 
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 /* PASID-based device IOTLB Invalidate */
@@ -1448,7 +1461,7 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 	if (size_order)
 		desc.qw1 |= QI_DEV_EIOTLB_SIZE;
 
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did,
@@ -1458,7 +1471,7 @@ void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did,
 
 	desc.qw0 = QI_PC_PASID(pasid) | QI_PC_DID(did) |
 			QI_PC_GRAN(granu) | QI_PC_TYPE;
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 /*
diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 48cc9ca5f3dc..7969e3dac2ad 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -498,7 +498,7 @@ pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
 	desc.qw2 = 0;
 	desc.qw3 = 0;
 
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 static void
@@ -512,7 +512,7 @@ iotlb_invalidation_with_pasid(struct intel_iommu *iommu, u16 did, u32 pasid)
 	desc.qw2 = 0;
 	desc.qw3 = 0;
 
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 static void
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index e9f4e979a71f..83dc4319f661 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -138,7 +138,7 @@ static void intel_flush_svm_range_dev (struct intel_svm *svm, struct intel_svm_d
 	}
 	desc.qw2 = 0;
 	desc.qw3 = 0;
-	qi_submit_sync(&desc, svm->iommu);
+	qi_submit_sync(svm->iommu, &desc, 1, 0);
 
 	if (sdev->dev_iotlb) {
 		desc.qw0 = QI_DEV_EIOTLB_PASID(svm->pasid) |
@@ -162,7 +162,7 @@ static void intel_flush_svm_range_dev (struct intel_svm *svm, struct intel_svm_d
 		}
 		desc.qw2 = 0;
 		desc.qw3 = 0;
-		qi_submit_sync(&desc, svm->iommu);
+		qi_submit_sync(svm->iommu, &desc, 1, 0);
 	}
 }
 
@@ -850,7 +850,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 				       sizeof(req->priv_data));
 			resp.qw2 = 0;
 			resp.qw3 = 0;
-			qi_submit_sync(&resp, iommu);
+			qi_submit_sync(iommu, &resp, 1, 0);
 		}
 		head = (head + sizeof(*req)) & PRQ_RING_MASK;
 	}
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 81e43c1df7ec..a042f123b091 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -151,7 +151,7 @@ static int qi_flush_iec(struct intel_iommu *iommu, int index, int mask)
 	desc.qw2 = 0;
 	desc.qw3 = 0;
 
-	return qi_submit_sync(&desc, iommu);
+	return qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 static int modify_irte(struct irq_2_iommu *irq_iommu,
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index cfe720f10112..cca1e5f9aeaa 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -333,6 +333,7 @@ enum {
 
 #define QI_IWD_STATUS_DATA(d)	(((u64)d) << 32)
 #define QI_IWD_STATUS_WRITE	(((u64)1) << 5)
+#define QI_IWD_PRQ_DRAIN	(((u64)1) << 7)
 
 #define QI_IOTLB_DID(did) 	(((u64)did) << 16)
 #define QI_IOTLB_DR(dr) 	(((u64)dr) << 7)
@@ -710,7 +711,13 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu,
 			  int pasid);
 
-extern int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu);
+int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
+		   unsigned int count, unsigned long options);
+/*
+ * Options used in qi_submit_sync:
+ * QI_OPT_WAIT_DRAIN - Wait for PRQ drain completion, spec 6.5.2.8.
+ */
+#define QI_OPT_WAIT_DRAIN		BIT(0)
 
 extern int dmar_ir_support(void);
 
-- 
2.17.1



* [PATCH v4 2/5] iommu/vt-d: debugfs: Add support to show inv queue internals
  2020-05-07  0:55 [PATCH v4 0/5] iommu/vt-d: Add page request draining support Lu Baolu
  2020-05-07  0:55 ` [PATCH v4 1/5] iommu/vt-d: Multiple descriptors per qi_submit_sync() Lu Baolu
@ 2020-05-07  0:55 ` Lu Baolu
  2020-05-07  5:39   ` Tian, Kevin
  2020-05-07 16:47   ` Jacob Pan
  2020-05-07  0:55 ` [PATCH v4 3/5] iommu/vt-d: Disable non-recoverable fault processing before unbind Lu Baolu
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 21+ messages in thread
From: Lu Baolu @ 2020-05-07  0:55 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: ashok.raj, jacob.jun.pan, Liu Yi L, kevin.tian, iommu,
	linux-kernel, Lu Baolu

Export the invalidation queue internals of each IOMMU device through
debugfs.

An example of such a dump on a Skylake machine:

$ sudo cat /sys/kernel/debug/iommu/intel/invalidation_queue
Invalidation queue on IOMMU: dmar1
 Base: 0x1672c9000      Head: 80        Tail: 80
Index           qw0                     qw1                     status
    0   0000000000000004        0000000000000000        0000000000000000
    1   0000000200000025        00000001672be804        0000000000000000
    2   0000000000000011        0000000000000000        0000000000000000
    3   0000000200000025        00000001672be80c        0000000000000000
    4   00000000000000d2        0000000000000000        0000000000000000
    5   0000000200000025        00000001672be814        0000000000000000
    6   0000000000000014        0000000000000000        0000000000000000
    7   0000000200000025        00000001672be81c        0000000000000000
    8   0000000000000014        0000000000000000        0000000000000000
    9   0000000200000025        00000001672be824        0000000000000000

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-iommu-debugfs.c | 62 +++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/drivers/iommu/intel-iommu-debugfs.c b/drivers/iommu/intel-iommu-debugfs.c
index 3eb1fe240fb0..e3089865b8f3 100644
--- a/drivers/iommu/intel-iommu-debugfs.c
+++ b/drivers/iommu/intel-iommu-debugfs.c
@@ -372,6 +372,66 @@ static int domain_translation_struct_show(struct seq_file *m, void *unused)
 }
 DEFINE_SHOW_ATTRIBUTE(domain_translation_struct);
 
+static void invalidation_queue_entry_show(struct seq_file *m,
+					  struct intel_iommu *iommu)
+{
+	int index, shift = qi_shift(iommu);
+	struct qi_desc *desc;
+	int offset;
+
+	if (ecap_smts(iommu->ecap))
+		seq_puts(m, "Index\t\tqw0\t\t\tqw1\t\t\tqw2\t\t\tqw3\t\t\tstatus\n");
+	else
+		seq_puts(m, "Index\t\tqw0\t\t\tqw1\t\t\tstatus\n");
+
+	for (index = 0; index < QI_LENGTH; index++) {
+		offset = index << shift;
+		desc = iommu->qi->desc + offset;
+		if (ecap_smts(iommu->ecap))
+			seq_printf(m, "%5d\t%016llx\t%016llx\t%016llx\t%016llx\t%016x\n",
+				   index, desc->qw0, desc->qw1,
+				   desc->qw2, desc->qw3,
+				   iommu->qi->desc_status[index]);
+		else
+			seq_printf(m, "%5d\t%016llx\t%016llx\t%016x\n",
+				   index, desc->qw0, desc->qw1,
+				   iommu->qi->desc_status[index]);
+	}
+}
+
+static int invalidation_queue_show(struct seq_file *m, void *unused)
+{
+	struct dmar_drhd_unit *drhd;
+	struct intel_iommu *iommu;
+	unsigned long flags;
+	struct q_inval *qi;
+	int shift;
+
+	rcu_read_lock();
+	for_each_active_iommu(iommu, drhd) {
+		qi = iommu->qi;
+		shift = qi_shift(iommu);
+
+		if (!qi || !ecap_qis(iommu->ecap))
+			continue;
+
+		seq_printf(m, "Invalidation queue on IOMMU: %s\n", iommu->name);
+
+		raw_spin_lock_irqsave(&qi->q_lock, flags);
+		seq_printf(m, " Base: 0x%llx\tHead: %lld\tTail: %lld\n",
+			   virt_to_phys(qi->desc),
+			   dmar_readq(iommu->reg + DMAR_IQH_REG) >> shift,
+			   dmar_readq(iommu->reg + DMAR_IQT_REG) >> shift);
+		invalidation_queue_entry_show(m, iommu);
+		raw_spin_unlock_irqrestore(&qi->q_lock, flags);
+		seq_putc(m, '\n');
+	}
+	rcu_read_unlock();
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(invalidation_queue);
+
 #ifdef CONFIG_IRQ_REMAP
 static void ir_tbl_remap_entry_show(struct seq_file *m,
 				    struct intel_iommu *iommu)
@@ -490,6 +550,8 @@ void __init intel_iommu_debugfs_init(void)
 	debugfs_create_file("domain_translation_struct", 0444,
 			    intel_iommu_debug, NULL,
 			    &domain_translation_struct_fops);
+	debugfs_create_file("invalidation_queue", 0444, intel_iommu_debug,
+			    NULL, &invalidation_queue_fops);
 #ifdef CONFIG_IRQ_REMAP
 	debugfs_create_file("ir_translation_struct", 0444, intel_iommu_debug,
 			    NULL, &ir_translation_struct_fops);
-- 
2.17.1



* [PATCH v4 3/5] iommu/vt-d: Disable non-recoverable fault processing before unbind
  2020-05-07  0:55 [PATCH v4 0/5] iommu/vt-d: Add page request draining support Lu Baolu
  2020-05-07  0:55 ` [PATCH v4 1/5] iommu/vt-d: Multiple descriptors per qi_submit_sync() Lu Baolu
  2020-05-07  0:55 ` [PATCH v4 2/5] iommu/vt-d: debugfs: Add support to show inv queue internals Lu Baolu
@ 2020-05-07  0:55 ` Lu Baolu
  2020-05-07  5:45   ` Tian, Kevin
  2020-05-07 16:55   ` Jacob Pan
  2020-05-07  0:55 ` [PATCH v4 4/5] iommu/vt-d: Add page request draining support Lu Baolu
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 21+ messages in thread
From: Lu Baolu @ 2020-05-07  0:55 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: ashok.raj, jacob.jun.pan, Liu Yi L, kevin.tian, iommu,
	linux-kernel, Lu Baolu

When a PASID is used for SVA by a device, it is possible that the PASID
entry is cleared before the device flushes all ongoing DMA requests, and
the IOMMU should ignore the non-recoverable faults caused by those
requests. Intel VT-d provides this capability through the FPD (Fault
Processing Disable) bit in the PASID entry. This sets the FPD bit when
the PASID entry is cleared in the mm notifier and clears it when the
PASID is unbound.
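
A rough sketch of the intended call pattern (function arguments as used
by the SVM code later in this series):

        /* mm exit (intel_mm_release): clear the entry, suppress faults */
        intel_pasid_tear_down_entry(svm->iommu, sdev->dev, svm->pasid, true);

        /* unbind: clear the entry, including the FPD bit, for good */
        intel_pasid_tear_down_entry(iommu, dev, svm->pasid, false);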

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-iommu.c |  4 ++--
 drivers/iommu/intel-pasid.c | 26 +++++++++++++++++++++-----
 drivers/iommu/intel-pasid.h |  3 ++-
 drivers/iommu/intel-svm.c   |  9 ++++++---
 4 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index d1866c0905b1..7811422b5a68 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5352,7 +5352,7 @@ static void __dmar_remove_one_dev_info(struct device_domain_info *info)
 	if (info->dev) {
 		if (dev_is_pci(info->dev) && sm_supported(iommu))
 			intel_pasid_tear_down_entry(iommu, info->dev,
-					PASID_RID2PASID);
+					PASID_RID2PASID, false);
 
 		iommu_disable_dev_iotlb(info);
 		domain_context_clear(iommu, info->dev);
@@ -5587,7 +5587,7 @@ static void aux_domain_remove_dev(struct dmar_domain *domain,
 	auxiliary_unlink_device(domain, dev);
 
 	spin_lock(&iommu->lock);
-	intel_pasid_tear_down_entry(iommu, dev, domain->default_pasid);
+	intel_pasid_tear_down_entry(iommu, dev, domain->default_pasid, false);
 	domain_detach_iommu(domain, iommu);
 	spin_unlock(&iommu->lock);
 
diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 7969e3dac2ad..11aef6c12972 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -292,7 +292,20 @@ static inline void pasid_clear_entry(struct pasid_entry *pe)
 	WRITE_ONCE(pe->val[7], 0);
 }
 
-static void intel_pasid_clear_entry(struct device *dev, int pasid)
+static inline void pasid_clear_entry_with_fpd(struct pasid_entry *pe)
+{
+	WRITE_ONCE(pe->val[0], PASID_PTE_FPD);
+	WRITE_ONCE(pe->val[1], 0);
+	WRITE_ONCE(pe->val[2], 0);
+	WRITE_ONCE(pe->val[3], 0);
+	WRITE_ONCE(pe->val[4], 0);
+	WRITE_ONCE(pe->val[5], 0);
+	WRITE_ONCE(pe->val[6], 0);
+	WRITE_ONCE(pe->val[7], 0);
+}
+
+static void
+intel_pasid_clear_entry(struct device *dev, int pasid, bool pf_ignore)
 {
 	struct pasid_entry *pe;
 
@@ -300,7 +313,10 @@ static void intel_pasid_clear_entry(struct device *dev, int pasid)
 	if (WARN_ON(!pe))
 		return;
 
-	pasid_clear_entry(pe);
+	if (pf_ignore)
+		pasid_clear_entry_with_fpd(pe);
+	else
+		pasid_clear_entry(pe);
 }
 
 static inline void pasid_set_bits(u64 *ptr, u64 mask, u64 bits)
@@ -533,8 +549,8 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
 	qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - VTD_PAGE_SHIFT);
 }
 
-void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
-				 struct device *dev, int pasid)
+void intel_pasid_tear_down_entry(struct intel_iommu *iommu, struct device *dev,
+				 int pasid, bool pf_ignore)
 {
 	struct pasid_entry *pte;
 	u16 did;
@@ -544,7 +560,7 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
 		return;
 
 	did = pasid_get_domain_id(pte);
-	intel_pasid_clear_entry(dev, pasid);
+	intel_pasid_clear_entry(dev, pasid, pf_ignore);
 
 	if (!ecap_coherent(iommu->ecap))
 		clflush_cache_range(pte, sizeof(*pte));
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index a41b09b3ffde..e6dd95ffe381 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -15,6 +15,7 @@
 #define PASID_MAX			0x100000
 #define PASID_PTE_MASK			0x3F
 #define PASID_PTE_PRESENT		1
+#define PASID_PTE_FPD			2
 #define PDE_PFN_MASK			PAGE_MASK
 #define PASID_PDE_SHIFT			6
 #define MAX_NR_PASID_BITS		20
@@ -120,7 +121,7 @@ int intel_pasid_setup_nested(struct intel_iommu *iommu,
 			     struct iommu_gpasid_bind_data_vtd *pasid_data,
 			     struct dmar_domain *domain, int addr_width);
 void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
-				 struct device *dev, int pasid);
+				 struct device *dev, int pasid, bool pf_ignore);
 int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid);
 void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid);
 #endif /* __INTEL_PASID_H */
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 83dc4319f661..9561ba59a170 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -207,7 +207,8 @@ static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
 	 */
 	rcu_read_lock();
 	list_for_each_entry_rcu(sdev, &svm->devs, list) {
-		intel_pasid_tear_down_entry(svm->iommu, sdev->dev, svm->pasid);
+		intel_pasid_tear_down_entry(svm->iommu, sdev->dev,
+					    svm->pasid, true);
 		intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
 	}
 	rcu_read_unlock();
@@ -400,7 +401,8 @@ int intel_svm_unbind_gpasid(struct device *dev, int pasid)
 		sdev->users--;
 		if (!sdev->users) {
 			list_del_rcu(&sdev->list);
-			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
+			intel_pasid_tear_down_entry(iommu, dev,
+						    svm->pasid, false);
 			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
 			/* TODO: Drain in flight PRQ for the PASID since it
 			 * may get reused soon, we don't want to
@@ -643,7 +645,8 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
 			 * to use. We have a *shared* PASID table, because it's
 			 * large and has to be physically contiguous. So it's
 			 * hard to be as defensive as we might like. */
-			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
+			intel_pasid_tear_down_entry(iommu, dev,
+						    svm->pasid, false);
 			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
 			kfree_rcu(sdev, rcu);
 
-- 
2.17.1



* [PATCH v4 4/5] iommu/vt-d: Add page request draining support
  2020-05-07  0:55 [PATCH v4 0/5] iommu/vt-d: Add page request draining support Lu Baolu
                   ` (2 preceding siblings ...)
  2020-05-07  0:55 ` [PATCH v4 3/5] iommu/vt-d: Disable non-recoverable fault processing before unbind Lu Baolu
@ 2020-05-07  0:55 ` Lu Baolu
  2020-05-07  6:35   ` Tian, Kevin
  2020-05-07  0:55 ` [PATCH v4 5/5] iommu/vt-d: Remove redundant IOTLB flush Lu Baolu
  2020-05-07  6:38 ` [PATCH v4 0/5] iommu/vt-d: Add page request draining support Tian, Kevin
  5 siblings, 1 reply; 21+ messages in thread
From: Lu Baolu @ 2020-05-07  0:55 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: ashok.raj, jacob.jun.pan, Liu Yi L, kevin.tian, iommu,
	linux-kernel, Lu Baolu

When a PASID is stopped or terminated, there can be pending PRQs
(page requests that have not yet received responses) in the remapping
hardware. This adds an interface to drain page requests and calls it
when a PASID is terminated.
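
At a high level, the unbind path becomes the sequence sketched below
(taken from the hunks in this patch; intel_svm_drain_prq() implements
the drain steps described in spec chapter 7.10):

        intel_pasid_tear_down_entry(iommu, dev, svm->pasid, false);
        intel_svm_drain_prq(dev, svm->pasid);
        intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
        kfree_rcu(sdev, rcu);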

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-svm.c   | 102 ++++++++++++++++++++++++++++++++++--
 include/linux/intel-iommu.h |   4 ++
 2 files changed, 101 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 9561ba59a170..7256eb965cf6 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -23,6 +23,7 @@
 #include "intel-pasid.h"
 
 static irqreturn_t prq_event_thread(int irq, void *d);
+static void intel_svm_drain_prq(struct device *dev, int pasid);
 
 #define PRQ_ORDER 0
 
@@ -66,6 +67,8 @@ int intel_svm_enable_prq(struct intel_iommu *iommu)
 	dmar_writeq(iommu->reg + DMAR_PQT_REG, 0ULL);
 	dmar_writeq(iommu->reg + DMAR_PQA_REG, virt_to_phys(iommu->prq) | PRQ_ORDER);
 
+	init_completion(&iommu->prq_complete);
+
 	return 0;
 }
 
@@ -403,12 +406,8 @@ int intel_svm_unbind_gpasid(struct device *dev, int pasid)
 			list_del_rcu(&sdev->list);
 			intel_pasid_tear_down_entry(iommu, dev,
 						    svm->pasid, false);
+			intel_svm_drain_prq(dev, svm->pasid);
 			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
-			/* TODO: Drain in flight PRQ for the PASID since it
-			 * may get reused soon, we don't want to
-			 * confuse with its previous life.
-			 * intel_svm_drain_prq(dev, pasid);
-			 */
 			kfree_rcu(sdev, rcu);
 
 			if (list_empty(&svm->devs)) {
@@ -647,6 +646,7 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
 			 * hard to be as defensive as we might like. */
 			intel_pasid_tear_down_entry(iommu, dev,
 						    svm->pasid, false);
+			intel_svm_drain_prq(dev, svm->pasid);
 			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
 			kfree_rcu(sdev, rcu);
 
@@ -725,6 +725,92 @@ static bool is_canonical_address(u64 addr)
 	return (((saddr << shift) >> shift) == saddr);
 }
 
+/**
+ * intel_svm_drain_prq:
+ *
+ * Drain all pending page requests and responses related to a specific
+ * pasid in both software and hardware.
+ */
+static void intel_svm_drain_prq(struct device *dev, int pasid)
+{
+	struct device_domain_info *info;
+	struct dmar_domain *domain;
+	struct intel_iommu *iommu;
+	struct qi_desc desc[3];
+	struct pci_dev *pdev;
+	int head, tail;
+	u16 sid, did;
+	int qdep;
+
+	info = get_domain_info(dev);
+	if (WARN_ON(!info || !dev_is_pci(dev)))
+		return;
+
+	if (!info->ats_enabled)
+		return;
+
+	iommu = info->iommu;
+	domain = info->domain;
+	pdev = to_pci_dev(dev);
+	sid = PCI_DEVID(info->bus, info->devfn);
+	did = domain->iommu_did[iommu->seq_id];
+	qdep = pci_ats_queue_depth(pdev);
+
+	memset(desc, 0, sizeof(desc));
+	desc[0].qw0 = QI_IWD_STATUS_DATA(QI_DONE) |
+			QI_IWD_FENCE |
+			QI_IWD_TYPE;
+	desc[1].qw0 = QI_EIOTLB_PASID(pasid) |
+			QI_EIOTLB_DID(did) |
+			QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) |
+			QI_EIOTLB_TYPE;
+	desc[2].qw0 = QI_DEV_EIOTLB_PASID(pasid) |
+			QI_DEV_EIOTLB_SID(sid) |
+			QI_DEV_EIOTLB_QDEP(qdep) |
+			QI_DEIOTLB_TYPE |
+			QI_DEV_IOTLB_PFSID(info->pfsid);
+
+	/*
+	 * Submit an invalidation wait descriptor with fence and page request
+	 * drain flags set to invalidation queue. This ensures that all requests
+	 * submitted to the invalidation queue ahead of this wait descriptor are
+	 * processed and completed, and all already issued page requests from
+	 * the device are put in the page request queue.
+	 */
+	qi_submit_sync(iommu, desc, 1, QI_OPT_WAIT_DRAIN);
+
+	/*
+	 * Check and wait until all pending page requests in the queue are
+	 * handled by the intr thread.
+	 */
+prq_retry:
+	tail = dmar_readq(iommu->reg + DMAR_PQT_REG) & PRQ_RING_MASK;
+	head = dmar_readq(iommu->reg + DMAR_PQH_REG) & PRQ_RING_MASK;
+	while (head != tail) {
+		struct page_req_dsc *req;
+
+		req = &iommu->prq[head / sizeof(*req)];
+		if (!req->pasid_present || req->pasid != pasid) {
+			head = (head + sizeof(*req)) & PRQ_RING_MASK;
+			continue;
+		}
+
+		wait_for_completion_timeout(&iommu->prq_complete, HZ);
+		goto prq_retry;
+	}
+
+	/*
+	 * Perform steps described in VT-d spec CH7.10 to drain page
+	 * requests and responses in hardware.
+	 */
+qi_retry:
+	qi_submit_sync(iommu, desc, 3, QI_OPT_WAIT_DRAIN);
+	if (readl(iommu->reg + DMAR_PRS_REG) & DMA_PRS_PRO) {
+		wait_for_completion_timeout(&iommu->prq_complete, HZ);
+		goto qi_retry;
+	}
+}
+
 static irqreturn_t prq_event_thread(int irq, void *d)
 {
 	struct intel_iommu *iommu = d;
@@ -859,6 +945,12 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 	}
 
 	dmar_writeq(iommu->reg + DMAR_PQH_REG, tail);
+	/*
+	 * Clear the page request overflow bit and wake up all threads that
+	 * are waiting for the completion of this handling.
+	 */
+	writel(DMA_PRS_PRO, iommu->reg + DMAR_PRS_REG);
+	complete(&iommu->prq_complete);
 
 	return IRQ_RETVAL(handled);
 }
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index cca1e5f9aeaa..a0512b401a59 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -292,6 +292,8 @@
 
 /* PRS_REG */
 #define DMA_PRS_PPR	((u32)1)
+#define DMA_PRS_PRO	((u32)2)
+
 #define DMA_VCS_PAS	((u64)1)
 
 #define IOMMU_WAIT_OP(iommu, offset, op, cond, sts)			\
@@ -333,6 +335,7 @@ enum {
 
 #define QI_IWD_STATUS_DATA(d)	(((u64)d) << 32)
 #define QI_IWD_STATUS_WRITE	(((u64)1) << 5)
+#define QI_IWD_FENCE		(((u64)1) << 6)
 #define QI_IWD_PRQ_DRAIN	(((u64)1) << 7)
 
 #define QI_IOTLB_DID(did) 	(((u64)did) << 16)
@@ -590,6 +593,7 @@ struct intel_iommu {
 #ifdef CONFIG_INTEL_IOMMU_SVM
 	struct page_req_dsc *prq;
 	unsigned char prq_name[16];    /* Name for PRQ interrupt */
+	struct completion prq_complete;
 	struct ioasid_allocator_ops pasid_allocator; /* Custom allocator for PASIDs */
 #endif
 	struct q_inval  *qi;            /* Queued invalidation info */
-- 
2.17.1



* [PATCH v4 5/5] iommu/vt-d: Remove redundant IOTLB flush
  2020-05-07  0:55 [PATCH v4 0/5] iommu/vt-d: Add page request draining support Lu Baolu
                   ` (3 preceding siblings ...)
  2020-05-07  0:55 ` [PATCH v4 4/5] iommu/vt-d: Add page request draining support Lu Baolu
@ 2020-05-07  0:55 ` Lu Baolu
  2020-05-07  6:37   ` Tian, Kevin
  2020-05-07  6:38 ` [PATCH v4 0/5] iommu/vt-d: Add page request draining support Tian, Kevin
  5 siblings, 1 reply; 21+ messages in thread
From: Lu Baolu @ 2020-05-07  0:55 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: ashok.raj, jacob.jun.pan, Liu Yi L, kevin.tian, iommu,
	linux-kernel, Lu Baolu

The IOTLB flush is already included in the PASID tear-down and the page
request drain process. There is no need to flush again.
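
For context, a rough sketch of why the extra flush is redundant (helper
names come from the existing code in this series; the exact invalidation
granularities are elided):

        intel_pasid_tear_down_entry(iommu, dev, svm->pasid, false);
                /* flushes the PASID cache, IOTLB and device TLB for the
                 * PASID as part of tearing down the entry
                 */
        intel_svm_drain_prq(dev, svm->pasid);
                /* submits PASID-based IOTLB and device-TLB invalidation
                 * descriptors again, with the page-request drain flag set
                 */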

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-svm.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 7256eb965cf6..5ff05adc96e9 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -209,11 +209,9 @@ static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
 	 * *has* to handle gracefully without affecting other processes.
 	 */
 	rcu_read_lock();
-	list_for_each_entry_rcu(sdev, &svm->devs, list) {
+	list_for_each_entry_rcu(sdev, &svm->devs, list)
 		intel_pasid_tear_down_entry(svm->iommu, sdev->dev,
 					    svm->pasid, true);
-		intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
-	}
 	rcu_read_unlock();
 
 }
@@ -407,7 +405,6 @@ int intel_svm_unbind_gpasid(struct device *dev, int pasid)
 			intel_pasid_tear_down_entry(iommu, dev,
 						    svm->pasid, false);
 			intel_svm_drain_prq(dev, svm->pasid);
-			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
 			kfree_rcu(sdev, rcu);
 
 			if (list_empty(&svm->devs)) {
@@ -647,7 +644,6 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
 			intel_pasid_tear_down_entry(iommu, dev,
 						    svm->pasid, false);
 			intel_svm_drain_prq(dev, svm->pasid);
-			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
 			kfree_rcu(sdev, rcu);
 
 			if (list_empty(&svm->devs)) {
-- 
2.17.1



* RE: [PATCH v4 1/5] iommu/vt-d: Multiple descriptors per qi_submit_sync()
  2020-05-07  0:55 ` [PATCH v4 1/5] iommu/vt-d: Multiple descriptors per qi_submit_sync() Lu Baolu
@ 2020-05-07  5:34   ` Tian, Kevin
  0 siblings, 0 replies; 21+ messages in thread
From: Tian, Kevin @ 2020-05-07  5:34 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel
  Cc: Raj, Ashok, jacob.jun.pan, Liu, Yi L, iommu, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Thursday, May 7, 2020 8:56 AM
> 
> Current qi_submit_sync() only supports single invalidation descriptor
> per submission and appends wait descriptor after each submission to
> poll the hardware completion. This extends the qi_submit_sync() helper
> to support multiple descriptors, and add an option so that the caller
> could specify the Page-request Drain (PD) bit in the wait descriptor.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/dmar.c                | 63 +++++++++++++++++------------
>  drivers/iommu/intel-pasid.c         |  4 +-
>  drivers/iommu/intel-svm.c           |  6 +--
>  drivers/iommu/intel_irq_remapping.c |  2 +-
>  include/linux/intel-iommu.h         |  9 ++++-
>  5 files changed, 52 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index d9dc787feef7..61d049e91f84 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -1157,12 +1157,11 @@ static inline void reclaim_free_desc(struct
> q_inval *qi)
>  	}
>  }
> 
> -static int qi_check_fault(struct intel_iommu *iommu, int index)
> +static int qi_check_fault(struct intel_iommu *iommu, int index, int
> wait_index)
>  {
>  	u32 fault;
>  	int head, tail;
>  	struct q_inval *qi = iommu->qi;
> -	int wait_index = (index + 1) % QI_LENGTH;
>  	int shift = qi_shift(iommu);
> 
>  	if (qi->desc_status[wait_index] == QI_ABORT)
> @@ -1225,17 +1224,21 @@ static int qi_check_fault(struct intel_iommu
> *iommu, int index)
>  }
> 
>  /*
> - * Submit the queued invalidation descriptor to the remapping
> - * hardware unit and wait for its completion.
> + * Function to submit invalidation descriptors of all types to the queued
> + * invalidation interface(QI). Multiple descriptors can be submitted at a
> + * time, a wait descriptor will be appended to each submission to ensure
> + * hardware has completed the invalidation before return. Wait descriptors
> + * can be part of the submission but it will not be polled for completion.
>   */
> -int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
> +int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
> +		   unsigned int count, unsigned long options)
>  {
> -	int rc;
>  	struct q_inval *qi = iommu->qi;
> -	int offset, shift, length;
>  	struct qi_desc wait_desc;
>  	int wait_index, index;
>  	unsigned long flags;
> +	int offset, shift;
> +	int rc, i;
> 
>  	if (!qi)
>  		return 0;
> @@ -1244,32 +1247,41 @@ int qi_submit_sync(struct qi_desc *desc, struct
> intel_iommu *iommu)
>  	rc = 0;
> 
>  	raw_spin_lock_irqsave(&qi->q_lock, flags);
> -	while (qi->free_cnt < 3) {
> +	/*
> +	 * Check if we have enough empty slots in the queue to submit,
> +	 * the calculation is based on:
> +	 * # of desc + 1 wait desc + 1 space between head and tail
> +	 */
> +	while (qi->free_cnt < count + 2) {
>  		raw_spin_unlock_irqrestore(&qi->q_lock, flags);
>  		cpu_relax();
>  		raw_spin_lock_irqsave(&qi->q_lock, flags);
>  	}
> 
>  	index = qi->free_head;
> -	wait_index = (index + 1) % QI_LENGTH;
> +	wait_index = (index + count) % QI_LENGTH;
>  	shift = qi_shift(iommu);
> -	length = 1 << shift;
> 
> -	qi->desc_status[index] = qi->desc_status[wait_index] = QI_IN_USE;
> +	for (i = 0; i < count; i++) {
> +		offset = ((index + i) % QI_LENGTH) << shift;
> +		memcpy(qi->desc + offset, &desc[i], 1 << shift);
> +		qi->desc_status[(index + i) % QI_LENGTH] = QI_IN_USE;
> +	}
> +	qi->desc_status[wait_index] = QI_IN_USE;
> 
> -	offset = index << shift;
> -	memcpy(qi->desc + offset, desc, length);
>  	wait_desc.qw0 = QI_IWD_STATUS_DATA(QI_DONE) |
>  			QI_IWD_STATUS_WRITE | QI_IWD_TYPE;
> +	if (options & QI_OPT_WAIT_DRAIN)
> +		wait_desc.qw0 |= QI_IWD_PRQ_DRAIN;
>  	wait_desc.qw1 = virt_to_phys(&qi->desc_status[wait_index]);
>  	wait_desc.qw2 = 0;
>  	wait_desc.qw3 = 0;
> 
>  	offset = wait_index << shift;
> -	memcpy(qi->desc + offset, &wait_desc, length);
> +	memcpy(qi->desc + offset, &wait_desc, 1 << shift);
> 
> -	qi->free_head = (qi->free_head + 2) % QI_LENGTH;
> -	qi->free_cnt -= 2;
> +	qi->free_head = (qi->free_head + count + 1) % QI_LENGTH;
> +	qi->free_cnt -= count + 1;
> 
>  	/*
>  	 * update the HW tail register indicating the presence of
> @@ -1285,7 +1297,7 @@ int qi_submit_sync(struct qi_desc *desc, struct
> intel_iommu *iommu)
>  		 * a deadlock where the interrupt context can wait
> indefinitely
>  		 * for free slots in the queue.
>  		 */
> -		rc = qi_check_fault(iommu, index);
> +		rc = qi_check_fault(iommu, index, wait_index);
>  		if (rc)
>  			break;
> 
> @@ -1294,7 +1306,8 @@ int qi_submit_sync(struct qi_desc *desc, struct
> intel_iommu *iommu)
>  		raw_spin_lock(&qi->q_lock);
>  	}
> 
> -	qi->desc_status[index] = QI_DONE;
> +	for (i = 0; i < count; i++)
> +		qi->desc_status[(index + i) % QI_LENGTH] = QI_DONE;
> 
>  	reclaim_free_desc(qi);
>  	raw_spin_unlock_irqrestore(&qi->q_lock, flags);
> @@ -1318,7 +1331,7 @@ void qi_global_iec(struct intel_iommu *iommu)
>  	desc.qw3 = 0;
> 
>  	/* should never fail */
> -	qi_submit_sync(&desc, iommu);
> +	qi_submit_sync(iommu, &desc, 1, 0);
>  }
> 
>  void qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid, u8 fm,
> @@ -1332,7 +1345,7 @@ void qi_flush_context(struct intel_iommu *iommu,
> u16 did, u16 sid, u8 fm,
>  	desc.qw2 = 0;
>  	desc.qw3 = 0;
> 
> -	qi_submit_sync(&desc, iommu);
> +	qi_submit_sync(iommu, &desc, 1, 0);
>  }
> 
>  void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
> @@ -1356,7 +1369,7 @@ void qi_flush_iotlb(struct intel_iommu *iommu,
> u16 did, u64 addr,
>  	desc.qw2 = 0;
>  	desc.qw3 = 0;
> 
> -	qi_submit_sync(&desc, iommu);
> +	qi_submit_sync(iommu, &desc, 1, 0);
>  }
> 
>  void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> @@ -1378,7 +1391,7 @@ void qi_flush_dev_iotlb(struct intel_iommu
> *iommu, u16 sid, u16 pfsid,
>  	desc.qw2 = 0;
>  	desc.qw3 = 0;
> 
> -	qi_submit_sync(&desc, iommu);
> +	qi_submit_sync(iommu, &desc, 1, 0);
>  }
> 
>  /* PASID-based IOTLB invalidation */
> @@ -1419,7 +1432,7 @@ void qi_flush_piotlb(struct intel_iommu *iommu,
> u16 did, u32 pasid, u64 addr,
>  				QI_EIOTLB_AM(mask);
>  	}
> 
> -	qi_submit_sync(&desc, iommu);
> +	qi_submit_sync(iommu, &desc, 1, 0);
>  }
> 
>  /* PASID-based device IOTLB Invalidate */
> @@ -1448,7 +1461,7 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu
> *iommu, u16 sid, u16 pfsid,
>  	if (size_order)
>  		desc.qw1 |= QI_DEV_EIOTLB_SIZE;
> 
> -	qi_submit_sync(&desc, iommu);
> +	qi_submit_sync(iommu, &desc, 1, 0);
>  }
> 
>  void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did,
> @@ -1458,7 +1471,7 @@ void qi_flush_pasid_cache(struct intel_iommu
> *iommu, u16 did,
> 
>  	desc.qw0 = QI_PC_PASID(pasid) | QI_PC_DID(did) |
>  			QI_PC_GRAN(granu) | QI_PC_TYPE;
> -	qi_submit_sync(&desc, iommu);
> +	qi_submit_sync(iommu, &desc, 1, 0);
>  }
> 
>  /*
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index 48cc9ca5f3dc..7969e3dac2ad 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -498,7 +498,7 @@ pasid_cache_invalidation_with_pasid(struct
> intel_iommu *iommu,
>  	desc.qw2 = 0;
>  	desc.qw3 = 0;
> 
> -	qi_submit_sync(&desc, iommu);
> +	qi_submit_sync(iommu, &desc, 1, 0);
>  }
> 
>  static void
> @@ -512,7 +512,7 @@ iotlb_invalidation_with_pasid(struct intel_iommu
> *iommu, u16 did, u32 pasid)
>  	desc.qw2 = 0;
>  	desc.qw3 = 0;
> 
> -	qi_submit_sync(&desc, iommu);
> +	qi_submit_sync(iommu, &desc, 1, 0);
>  }
> 
>  static void
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index e9f4e979a71f..83dc4319f661 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -138,7 +138,7 @@ static void intel_flush_svm_range_dev (struct
> intel_svm *svm, struct intel_svm_d
>  	}
>  	desc.qw2 = 0;
>  	desc.qw3 = 0;
> -	qi_submit_sync(&desc, svm->iommu);
> +	qi_submit_sync(svm->iommu, &desc, 1, 0);
> 
>  	if (sdev->dev_iotlb) {
>  		desc.qw0 = QI_DEV_EIOTLB_PASID(svm->pasid) |
> @@ -162,7 +162,7 @@ static void intel_flush_svm_range_dev (struct
> intel_svm *svm, struct intel_svm_d
>  		}
>  		desc.qw2 = 0;
>  		desc.qw3 = 0;
> -		qi_submit_sync(&desc, svm->iommu);
> +		qi_submit_sync(svm->iommu, &desc, 1, 0);
>  	}
>  }
> 
> @@ -850,7 +850,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
>  				       sizeof(req->priv_data));
>  			resp.qw2 = 0;
>  			resp.qw3 = 0;
> -			qi_submit_sync(&resp, iommu);
> +			qi_submit_sync(iommu, &resp, 1, 0);
>  		}
>  		head = (head + sizeof(*req)) & PRQ_RING_MASK;
>  	}
> diff --git a/drivers/iommu/intel_irq_remapping.c
> b/drivers/iommu/intel_irq_remapping.c
> index 81e43c1df7ec..a042f123b091 100644
> --- a/drivers/iommu/intel_irq_remapping.c
> +++ b/drivers/iommu/intel_irq_remapping.c
> @@ -151,7 +151,7 @@ static int qi_flush_iec(struct intel_iommu *iommu, int
> index, int mask)
>  	desc.qw2 = 0;
>  	desc.qw3 = 0;
> 
> -	return qi_submit_sync(&desc, iommu);
> +	return qi_submit_sync(iommu, &desc, 1, 0);
>  }
> 
>  static int modify_irte(struct irq_2_iommu *irq_iommu,
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index cfe720f10112..cca1e5f9aeaa 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -333,6 +333,7 @@ enum {
> 
>  #define QI_IWD_STATUS_DATA(d)	(((u64)d) << 32)
>  #define QI_IWD_STATUS_WRITE	(((u64)1) << 5)
> +#define QI_IWD_PRQ_DRAIN	(((u64)1) << 7)
> 
>  #define QI_IOTLB_DID(did) 	(((u64)did) << 16)
>  #define QI_IOTLB_DR(dr) 	(((u64)dr) << 7)
> @@ -710,7 +711,13 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu
> *iommu, u16 sid, u16 pfsid,
>  void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu,
>  			  int pasid);
> 
> -extern int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu);
> +int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
> +		   unsigned int count, unsigned long options);
> +/*
> + * Options used in qi_submit_sync:
> + * QI_OPT_WAIT_DRAIN - Wait for PRQ drain completion, spec 6.5.2.8.
> + */
> +#define QI_OPT_WAIT_DRAIN		BIT(0)
> 
>  extern int dmar_ir_support(void);
> 
> --
> 2.17.1

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* RE: [PATCH v4 2/5] iommu/vt-d: debugfs: Add support to show inv queue internals
  2020-05-07  0:55 ` [PATCH v4 2/5] iommu/vt-d: debugfs: Add support to show inv queue internals Lu Baolu
@ 2020-05-07  5:39   ` Tian, Kevin
  2020-05-07 16:47   ` Jacob Pan
  1 sibling, 0 replies; 21+ messages in thread
From: Tian, Kevin @ 2020-05-07  5:39 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel
  Cc: Raj, Ashok, jacob.jun.pan, Liu, Yi L, iommu, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Thursday, May 7, 2020 8:56 AM
> 
> Export invalidation queue internals of each iommu device through the
> debugfs.
> 
> Example of such dump on a Skylake machine:
> 
> $ sudo cat /sys/kernel/debug/iommu/intel/invalidation_queue
> Invalidation queue on IOMMU: dmar1
>  Base: 0x1672c9000      Head: 80        Tail: 80
> Index           qw0                     qw1                     status
>     0   0000000000000004        0000000000000000        0000000000000000
>     1   0000000200000025        00000001672be804        0000000000000000
>     2   0000000000000011        0000000000000000        0000000000000000
>     3   0000000200000025        00000001672be80c        0000000000000000
>     4   00000000000000d2        0000000000000000        0000000000000000
>     5   0000000200000025        00000001672be814        0000000000000000
>     6   0000000000000014        0000000000000000        0000000000000000
>     7   0000000200000025        00000001672be81c        0000000000000000
>     8   0000000000000014        0000000000000000        0000000000000000
>     9   0000000200000025        00000001672be824        0000000000000000
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel-iommu-debugfs.c | 62
> +++++++++++++++++++++++++++++
>  1 file changed, 62 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu-debugfs.c b/drivers/iommu/intel-
> iommu-debugfs.c
> index 3eb1fe240fb0..e3089865b8f3 100644
> --- a/drivers/iommu/intel-iommu-debugfs.c
> +++ b/drivers/iommu/intel-iommu-debugfs.c
> @@ -372,6 +372,66 @@ static int domain_translation_struct_show(struct
> seq_file *m, void *unused)
>  }
>  DEFINE_SHOW_ATTRIBUTE(domain_translation_struct);
> 
> +static void invalidation_queue_entry_show(struct seq_file *m,
> +					  struct intel_iommu *iommu)
> +{
> +	int index, shift = qi_shift(iommu);
> +	struct qi_desc *desc;
> +	int offset;
> +
> +	if (ecap_smts(iommu->ecap))
> +		seq_puts(m,
> "Index\t\tqw0\t\t\tqw1\t\t\tqw2\t\t\tqw3\t\t\tstatus\n");
> +	else
> +		seq_puts(m, "Index\t\tqw0\t\t\tqw1\t\t\tstatus\n");
> +
> +	for (index = 0; index < QI_LENGTH; index++) {
> +		offset = index << shift;
> +		desc = iommu->qi->desc + offset;
> +		if (ecap_smts(iommu->ecap))
> +			seq_printf(m,
> "%5d\t%016llx\t%016llx\t%016llx\t%016llx\t%016x\n",
> +				   index, desc->qw0, desc->qw1,
> +				   desc->qw2, desc->qw3,
> +				   iommu->qi->desc_status[index]);
> +		else
> +			seq_printf(m, "%5d\t%016llx\t%016llx\t%016x\n",
> +				   index, desc->qw0, desc->qw1,
> +				   iommu->qi->desc_status[index]);
> +	}
> +}
> +
> +static int invalidation_queue_show(struct seq_file *m, void *unused)
> +{
> +	struct dmar_drhd_unit *drhd;
> +	struct intel_iommu *iommu;
> +	unsigned long flags;
> +	struct q_inval *qi;
> +	int shift;
> +
> +	rcu_read_lock();
> +	for_each_active_iommu(iommu, drhd) {
> +		qi = iommu->qi;
> +		shift = qi_shift(iommu);
> +
> +		if (!qi || !ecap_qis(iommu->ecap))
> +			continue;
> +
> +		seq_printf(m, "Invalidation queue on IOMMU: %s\n",
> iommu->name);
> +
> +		raw_spin_lock_irqsave(&qi->q_lock, flags);
> +		seq_printf(m, " Base: 0x%llx\tHead: %lld\tTail: %lld\n",
> +			   virt_to_phys(qi->desc),
> +			   dmar_readq(iommu->reg + DMAR_IQH_REG) >>
> shift,
> +			   dmar_readq(iommu->reg + DMAR_IQT_REG) >>
> shift);
> +		invalidation_queue_entry_show(m, iommu);
> +		raw_spin_unlock_irqrestore(&qi->q_lock, flags);
> +		seq_putc(m, '\n');
> +	}
> +	rcu_read_unlock();
> +
> +	return 0;
> +}
> +DEFINE_SHOW_ATTRIBUTE(invalidation_queue);
> +
>  #ifdef CONFIG_IRQ_REMAP
>  static void ir_tbl_remap_entry_show(struct seq_file *m,
>  				    struct intel_iommu *iommu)
> @@ -490,6 +550,8 @@ void __init intel_iommu_debugfs_init(void)
>  	debugfs_create_file("domain_translation_struct", 0444,
>  			    intel_iommu_debug, NULL,
>  			    &domain_translation_struct_fops);
> +	debugfs_create_file("invalidation_queue", 0444, intel_iommu_debug,
> +			    NULL, &invalidation_queue_fops);
>  #ifdef CONFIG_IRQ_REMAP
>  	debugfs_create_file("ir_translation_struct", 0444,
> intel_iommu_debug,
>  			    NULL, &ir_translation_struct_fops);
> --
> 2.17.1

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* RE: [PATCH v4 3/5] iommu/vt-d: Disable non-recoverable fault processing before unbind
  2020-05-07  0:55 ` [PATCH v4 3/5] iommu/vt-d: Disable non-recoverable fault processing before unbind Lu Baolu
@ 2020-05-07  5:45   ` Tian, Kevin
  2020-05-07 13:23     ` Lu Baolu
  2020-05-07 16:55   ` Jacob Pan
  1 sibling, 1 reply; 21+ messages in thread
From: Tian, Kevin @ 2020-05-07  5:45 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel
  Cc: Raj, Ashok, jacob.jun.pan, Liu, Yi L, iommu, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Thursday, May 7, 2020 8:56 AM
> 
> When a PASID is used for SVA by the device, it's possible that the PASID
> entry is cleared before the device flushes all ongoing DMA requests. The
> IOMMU should ignore the non-recoverable faults caused by these requests.
> Intel VT-d provides such function through the FPD bit of the PASID entry.
> This sets FPD bit when PASID entry is cleared in the mm notifier and
> clear it when the pasid is unbound.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel-iommu.c |  4 ++--
>  drivers/iommu/intel-pasid.c | 26 +++++++++++++++++++++-----
>  drivers/iommu/intel-pasid.h |  3 ++-
>  drivers/iommu/intel-svm.c   |  9 ++++++---
>  4 files changed, 31 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index d1866c0905b1..7811422b5a68 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5352,7 +5352,7 @@ static void __dmar_remove_one_dev_info(struct
> device_domain_info *info)
>  	if (info->dev) {
>  		if (dev_is_pci(info->dev) && sm_supported(iommu))
>  			intel_pasid_tear_down_entry(iommu, info->dev,
> -					PASID_RID2PASID);
> +					PASID_RID2PASID, false);
> 
>  		iommu_disable_dev_iotlb(info);
>  		domain_context_clear(iommu, info->dev);
> @@ -5587,7 +5587,7 @@ static void aux_domain_remove_dev(struct
> dmar_domain *domain,
>  	auxiliary_unlink_device(domain, dev);
> 
>  	spin_lock(&iommu->lock);
> -	intel_pasid_tear_down_entry(iommu, dev, domain->default_pasid);
> +	intel_pasid_tear_down_entry(iommu, dev, domain->default_pasid,
> false);
>  	domain_detach_iommu(domain, iommu);
>  	spin_unlock(&iommu->lock);
> 
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index 7969e3dac2ad..11aef6c12972 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -292,7 +292,20 @@ static inline void pasid_clear_entry(struct
> pasid_entry *pe)
>  	WRITE_ONCE(pe->val[7], 0);
>  }
> 
> -static void intel_pasid_clear_entry(struct device *dev, int pasid)
> +static inline void pasid_clear_entry_with_fpd(struct pasid_entry *pe)
> +{
> +	WRITE_ONCE(pe->val[0], PASID_PTE_FPD);
> +	WRITE_ONCE(pe->val[1], 0);
> +	WRITE_ONCE(pe->val[2], 0);
> +	WRITE_ONCE(pe->val[3], 0);
> +	WRITE_ONCE(pe->val[4], 0);
> +	WRITE_ONCE(pe->val[5], 0);
> +	WRITE_ONCE(pe->val[6], 0);
> +	WRITE_ONCE(pe->val[7], 0);
> +}
> +
> +static void
> +intel_pasid_clear_entry(struct device *dev, int pasid, bool pf_ignore)

Hi, Baolu,

Just curious whether it makes sense to always set FPD here. Yes, SVA is
one known example where a non-recoverable fault associated with a PASID
entry might be raised after the entry is cleared, and those faults are
considered benign. But even in a general context (w/o SVA), why do we
care about such faults after a PASID entry is torn down?

Thanks
Kevin

>  {
>  	struct pasid_entry *pe;
> 
> @@ -300,7 +313,10 @@ static void intel_pasid_clear_entry(struct device
> *dev, int pasid)
>  	if (WARN_ON(!pe))
>  		return;
> 
> -	pasid_clear_entry(pe);
> +	if (pf_ignore)
> +		pasid_clear_entry_with_fpd(pe);
> +	else
> +		pasid_clear_entry(pe);
>  }
> 
>  static inline void pasid_set_bits(u64 *ptr, u64 mask, u64 bits)
> @@ -533,8 +549,8 @@ devtlb_invalidation_with_pasid(struct intel_iommu
> *iommu,
>  	qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - VTD_PAGE_SHIFT);
>  }
> 
> -void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> -				 struct device *dev, int pasid)
> +void intel_pasid_tear_down_entry(struct intel_iommu *iommu, struct
> device *dev,
> +				 int pasid, bool pf_ignore)
>  {
>  	struct pasid_entry *pte;
>  	u16 did;
> @@ -544,7 +560,7 @@ void intel_pasid_tear_down_entry(struct
> intel_iommu *iommu,
>  		return;
> 
>  	did = pasid_get_domain_id(pte);
> -	intel_pasid_clear_entry(dev, pasid);
> +	intel_pasid_clear_entry(dev, pasid, pf_ignore);
> 
>  	if (!ecap_coherent(iommu->ecap))
>  		clflush_cache_range(pte, sizeof(*pte));
> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
> index a41b09b3ffde..e6dd95ffe381 100644
> --- a/drivers/iommu/intel-pasid.h
> +++ b/drivers/iommu/intel-pasid.h
> @@ -15,6 +15,7 @@
>  #define PASID_MAX			0x100000
>  #define PASID_PTE_MASK			0x3F
>  #define PASID_PTE_PRESENT		1
> +#define PASID_PTE_FPD			2
>  #define PDE_PFN_MASK			PAGE_MASK
>  #define PASID_PDE_SHIFT			6
>  #define MAX_NR_PASID_BITS		20
> @@ -120,7 +121,7 @@ int intel_pasid_setup_nested(struct intel_iommu
> *iommu,
>  			     struct iommu_gpasid_bind_data_vtd *pasid_data,
>  			     struct dmar_domain *domain, int addr_width);
>  void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> -				 struct device *dev, int pasid);
> +				 struct device *dev, int pasid, bool pf_ignore);
>  int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid);
>  void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid);
>  #endif /* __INTEL_PASID_H */
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 83dc4319f661..9561ba59a170 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -207,7 +207,8 @@ static void intel_mm_release(struct mmu_notifier
> *mn, struct mm_struct *mm)
>  	 */
>  	rcu_read_lock();
>  	list_for_each_entry_rcu(sdev, &svm->devs, list) {
> -		intel_pasid_tear_down_entry(svm->iommu, sdev->dev, svm-
> >pasid);
> +		intel_pasid_tear_down_entry(svm->iommu, sdev->dev,
> +					    svm->pasid, true);
>  		intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
>  	}
>  	rcu_read_unlock();
> @@ -400,7 +401,8 @@ int intel_svm_unbind_gpasid(struct device *dev, int
> pasid)
>  		sdev->users--;
>  		if (!sdev->users) {
>  			list_del_rcu(&sdev->list);
> -			intel_pasid_tear_down_entry(iommu, dev, svm-
> >pasid);
> +			intel_pasid_tear_down_entry(iommu, dev,
> +						    svm->pasid, false);
>  			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
>  			/* TODO: Drain in flight PRQ for the PASID since it
>  			 * may get reused soon, we don't want to
> @@ -643,7 +645,8 @@ int intel_svm_unbind_mm(struct device *dev, int
> pasid)
>  			 * to use. We have a *shared* PASID table, because
> it's
>  			 * large and has to be physically contiguous. So it's
>  			 * hard to be as defensive as we might like. */
> -			intel_pasid_tear_down_entry(iommu, dev, svm-
> >pasid);
> +			intel_pasid_tear_down_entry(iommu, dev,
> +						    svm->pasid, false);
>  			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
>  			kfree_rcu(sdev, rcu);
> 
> --
> 2.17.1



* RE: [PATCH v4 4/5] iommu/vt-d: Add page request draining support
  2020-05-07  0:55 ` [PATCH v4 4/5] iommu/vt-d: Add page request draining support Lu Baolu
@ 2020-05-07  6:35   ` Tian, Kevin
  2020-05-08  2:26     ` Lu Baolu
  0 siblings, 1 reply; 21+ messages in thread
From: Tian, Kevin @ 2020-05-07  6:35 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel; +Cc: Raj, Ashok, linux-kernel, iommu

> From: Lu Baolu
> Sent: Thursday, May 7, 2020 8:56 AM
> 
> When a PASID is stopped or terminated, there can be pending PRQs
> (requests that haven't received responses) in remapping hardware.
> This adds the interface to drain page requests and call it when a
> PASID is terminated.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel-svm.c   | 102 ++++++++++++++++++++++++++++++++++-
> -
>  include/linux/intel-iommu.h |   4 ++
>  2 files changed, 101 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 9561ba59a170..7256eb965cf6 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -23,6 +23,7 @@
>  #include "intel-pasid.h"
> 
>  static irqreturn_t prq_event_thread(int irq, void *d);
> +static void intel_svm_drain_prq(struct device *dev, int pasid);
> 
>  #define PRQ_ORDER 0
> 
> @@ -66,6 +67,8 @@ int intel_svm_enable_prq(struct intel_iommu *iommu)
>  	dmar_writeq(iommu->reg + DMAR_PQT_REG, 0ULL);
>  	dmar_writeq(iommu->reg + DMAR_PQA_REG, virt_to_phys(iommu-
> >prq) | PRQ_ORDER);
> 
> +	init_completion(&iommu->prq_complete);
> +
>  	return 0;
>  }
> 
> @@ -403,12 +406,8 @@ int intel_svm_unbind_gpasid(struct device *dev, int
> pasid)
>  			list_del_rcu(&sdev->list);
>  			intel_pasid_tear_down_entry(iommu, dev,
>  						    svm->pasid, false);
> +			intel_svm_drain_prq(dev, svm->pasid);
>  			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
> -			/* TODO: Drain in flight PRQ for the PASID since it
> -			 * may get reused soon, we don't want to
> -			 * confuse with its previous life.
> -			 * intel_svm_drain_prq(dev, pasid);
> -			 */
>  			kfree_rcu(sdev, rcu);
> 
>  			if (list_empty(&svm->devs)) {
> @@ -647,6 +646,7 @@ int intel_svm_unbind_mm(struct device *dev, int
> pasid)
>  			 * hard to be as defensive as we might like. */
>  			intel_pasid_tear_down_entry(iommu, dev,
>  						    svm->pasid, false);
> +			intel_svm_drain_prq(dev, svm->pasid);
>  			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
>  			kfree_rcu(sdev, rcu);
> 
> @@ -725,6 +725,92 @@ static bool is_canonical_address(u64 addr)
>  	return (((saddr << shift) >> shift) == saddr);
>  }
> 
> +/**
> + * intel_svm_drain_prq:
> + *
> + * Drain all pending page requests and responses related to a specific
> + * pasid in both software and hardware.
> + */
> +static void intel_svm_drain_prq(struct device *dev, int pasid)
> +{
> +	struct device_domain_info *info;
> +	struct dmar_domain *domain;
> +	struct intel_iommu *iommu;
> +	struct qi_desc desc[3];
> +	struct pci_dev *pdev;
> +	int head, tail;
> +	u16 sid, did;
> +	int qdep;
> +
> +	info = get_domain_info(dev);
> +	if (WARN_ON(!info || !dev_is_pci(dev)))
> +		return;
> +
> +	if (!info->ats_enabled)
> +		return;

ats_enabled -> pri_enabled

> +
> +	iommu = info->iommu;
> +	domain = info->domain;
> +	pdev = to_pci_dev(dev);
> +	sid = PCI_DEVID(info->bus, info->devfn);
> +	did = domain->iommu_did[iommu->seq_id];
> +	qdep = pci_ats_queue_depth(pdev);
> +
> +	memset(desc, 0, sizeof(desc));
> +	desc[0].qw0 = QI_IWD_STATUS_DATA(QI_DONE) |
> +			QI_IWD_FENCE |
> +			QI_IWD_TYPE;
> +	desc[1].qw0 = QI_EIOTLB_PASID(pasid) |
> +			QI_EIOTLB_DID(did) |
> +			QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) |
> +			QI_EIOTLB_TYPE;
> +	desc[2].qw0 = QI_DEV_EIOTLB_PASID(pasid) |
> +			QI_DEV_EIOTLB_SID(sid) |
> +			QI_DEV_EIOTLB_QDEP(qdep) |
> +			QI_DEIOTLB_TYPE |
> +			QI_DEV_IOTLB_PFSID(info->pfsid);
> +
> +	/*
> +	 * Submit an invalidation wait descriptor with fence and page request
> +	 * drain flags set to invalidation queue. This ensures that all requests
> +	 * submitted to the invalidation queue ahead of this wait descriptor are
> +	 * processed and completed, and all already issued page requests from
> +	 * the device are put in the page request queue.
> +	 */

I feel this comment would be better moved earlier since it explains the overall
flow including all 3 descriptors. Also, it is not one wait descriptor that gets
both the fence and drain flags set; there are two wait descriptors, one setting
fence and the other setting drain.

> +	qi_submit_sync(iommu, desc, 1, QI_OPT_WAIT_DRAIN);

the count is '3' instead of '1'.
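
For reference, a rough sketch of the arrangement being described, assuming
patch 1/5 (not quoted in this reply) makes qi_submit_sync() append its own
trailing wait descriptor and set the drain bit when QI_OPT_WAIT_DRAIN is
passed; the 'options' parameter name below is an assumption:

	/* caller side: desc[0] is the fenced wait descriptor, desc[1] and
	 * desc[2] are the PASID-based IOTLB and device-TLB invalidations */
	qi_submit_sync(iommu, desc, 3, QI_OPT_WAIT_DRAIN);

	/* inside qi_submit_sync(), roughly: the helper appends one more
	 * wait descriptor and marks it as the page request drain */
	wait.qw0 = QI_IWD_STATUS_DATA(QI_DONE) | QI_IWD_STATUS_WRITE |
		   QI_IWD_TYPE;
	if (options & QI_OPT_WAIT_DRAIN)
		wait.qw0 |= QI_IWD_PRQ_DRAIN;

So the fence lives in the explicit desc[0] and the drain in the wait
descriptor appended by the helper, i.e. two different wait descriptors.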

> +
> +	/*
> +	 * Check and wait until all pending page requests in the queue are
> +	 * handled by the intr thread.
> +	 */
> +prq_retry:
> +	tail = dmar_readq(iommu->reg + DMAR_PQT_REG) & PRQ_RING_MASK;
> +	head = dmar_readq(iommu->reg + DMAR_PQH_REG) & PRQ_RING_MASK;
> +	while (head != tail) {
> +		struct page_req_dsc *req;
> +
> +		req = &iommu->prq[head / sizeof(*req)];
> +		if (!req->pasid_present || req->pasid != pasid) {
> +			head = (head + sizeof(*req)) & PRQ_RING_MASK;
> +			continue;
> +		}
> +
> +		wait_for_completion_timeout(&iommu->prq_complete, HZ);
> +		goto prq_retry;
> +	}
> +
> +	/*
> +	 * Perform steps described in VT-d spec CH7.10 to drain page
> +	 * requests and responses in hardware.
> +	 */
> +qi_retry:
> +	qi_submit_sync(iommu, desc, 3, QI_OPT_WAIT_DRAIN);
> +	if (readl(iommu->reg + DMAR_PRS_REG) & DMA_PRS_PRO) {
> +		wait_for_completion_timeout(&iommu->prq_complete, HZ);
> +		goto qi_retry;
> +	}
> +}
> +
>  static irqreturn_t prq_event_thread(int irq, void *d)
>  {
>  	struct intel_iommu *iommu = d;
> @@ -859,6 +945,12 @@ static irqreturn_t prq_event_thread(int irq, void *d)
>  	}
> 
>  	dmar_writeq(iommu->reg + DMAR_PQH_REG, tail);
> +	/*
> +	 * Clear the page request overflow bit and wake up all threads that
> +	 * are waiting for the completion of this handling.
> +	 */
> +	writel(DMA_PRS_PRO, iommu->reg + DMAR_PRS_REG);
> +	complete(&iommu->prq_complete);
> 
>  	return IRQ_RETVAL(handled);
>  }
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index cca1e5f9aeaa..a0512b401a59 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -292,6 +292,8 @@
> 
>  /* PRS_REG */
>  #define DMA_PRS_PPR	((u32)1)
> +#define DMA_PRS_PRO	((u32)2)
> +
>  #define DMA_VCS_PAS	((u64)1)
> 
>  #define IOMMU_WAIT_OP(iommu, offset, op, cond, sts)
> 	\
> @@ -333,6 +335,7 @@ enum {
> 
>  #define QI_IWD_STATUS_DATA(d)	(((u64)d) << 32)
>  #define QI_IWD_STATUS_WRITE	(((u64)1) << 5)
> +#define QI_IWD_FENCE		(((u64)1) << 6)
>  #define QI_IWD_PRQ_DRAIN	(((u64)1) << 7)
> 
>  #define QI_IOTLB_DID(did) 	(((u64)did) << 16)
> @@ -590,6 +593,7 @@ struct intel_iommu {
>  #ifdef CONFIG_INTEL_IOMMU_SVM
>  	struct page_req_dsc *prq;
>  	unsigned char prq_name[16];    /* Name for PRQ interrupt */
> +	struct completion prq_complete;
>  	struct ioasid_allocator_ops pasid_allocator; /* Custom allocator for
> PASIDs */
>  #endif
>  	struct q_inval  *qi;            /* Queued invalidation info */
> --
> 2.17.1
> 
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v4 5/5] iommu/vt-d: Remove redundant IOTLB flush
  2020-05-07  0:55 ` [PATCH v4 5/5] iommu/vt-d: Remove redundant IOTLB flush Lu Baolu
@ 2020-05-07  6:37   ` Tian, Kevin
  0 siblings, 0 replies; 21+ messages in thread
From: Tian, Kevin @ 2020-05-07  6:37 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel; +Cc: Raj, Ashok, linux-kernel, iommu

> From: Lu Baolu
> Sent: Thursday, May 7, 2020 8:56 AM
> 
> The IOTLB flush is already included in the PASID tear down and the page
> request drain process. There is no need to flush again.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel-svm.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 7256eb965cf6..5ff05adc96e9 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -209,11 +209,9 @@ static void intel_mm_release(struct mmu_notifier
> *mn, struct mm_struct *mm)
>  	 * *has* to handle gracefully without affecting other processes.
>  	 */
>  	rcu_read_lock();
> -	list_for_each_entry_rcu(sdev, &svm->devs, list) {
> +	list_for_each_entry_rcu(sdev, &svm->devs, list)
>  		intel_pasid_tear_down_entry(svm->iommu, sdev->dev,
>  					    svm->pasid, true);
> -		intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
> -	}
>  	rcu_read_unlock();
> 
>  }
> @@ -407,7 +405,6 @@ int intel_svm_unbind_gpasid(struct device *dev, int
> pasid)
>  			intel_pasid_tear_down_entry(iommu, dev,
>  						    svm->pasid, false);
>  			intel_svm_drain_prq(dev, svm->pasid);
> -			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
>  			kfree_rcu(sdev, rcu);
> 
>  			if (list_empty(&svm->devs)) {
> @@ -647,7 +644,6 @@ int intel_svm_unbind_mm(struct device *dev, int
> pasid)
>  			intel_pasid_tear_down_entry(iommu, dev,
>  						    svm->pasid, false);
>  			intel_svm_drain_prq(dev, svm->pasid);
> -			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
>  			kfree_rcu(sdev, rcu);
> 
>  			if (list_empty(&svm->devs)) {
> --
> 2.17.1

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v4 0/5] iommu/vt-d: Add page request draining support
  2020-05-07  0:55 [PATCH v4 0/5] iommu/vt-d: Add page request draining support Lu Baolu
                   ` (4 preceding siblings ...)
  2020-05-07  0:55 ` [PATCH v4 5/5] iommu/vt-d: Remove redundant IOTLB flush Lu Baolu
@ 2020-05-07  6:38 ` Tian, Kevin
  2020-05-07  6:47   ` Lu Baolu
  5 siblings, 1 reply; 21+ messages in thread
From: Tian, Kevin @ 2020-05-07  6:38 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel
  Cc: Raj, Ashok, jacob.jun.pan, Liu, Yi L, iommu, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Thursday, May 7, 2020 8:55 AM
> 
> When a PASID is stopped or terminated, there can be pending PRQs
> (requests that haven't received responses) in the software and
> remapping hardware. The pending page requests must be drained
> so that the pasid could be reused. The chapter 7.10 in the VT-d
> specification specifies the software steps to drain pending page
> requests and responses.
> 
> This includes two parts:
>  - PATCH 1/4 ~ 2/4: refactor the qi_submit_sync() to support multiple
>    descriptors per submission which will be used in the following
>    patch.
>  - PATCH 3/4 ~ 4/4: add page request drain support after a pasid entry
>    is torn down.
> 

I think you should mention that this series depends on Jacob's nested
SVA series.

> Best regards,
> baolu
> 
> Change log:
> v3->v4:
>   - Remove prq drain in mm notifier;
>   - Set PASID FPD bit when pasid is cleared in mm notifier and clear
>     it in unbound().
> 
>  v2->v3:
>   - Address Kevin's review comments
>     - Squash the first 2 patches together;
>     - The prq thread is serialized, no need to consider reentrance;
>     - Ensure no new-coming prq before drain prq in queue;
>     - Handle page request overflow case.
> 
>  v1->v2:
>   - Fix race between multiple prq handling threads.
> 
> Lu Baolu (5):
>   iommu/vt-d: Multiple descriptors per qi_submit_sync()
>   iommu/vt-d: debugfs: Add support to show inv queue internals
>   iommu/vt-d: Disable non-recoverable fault processing before unbind
>   iommu/vt-d: Add page request draining support
>   iommu/vt-d: Remove redundant IOTLB flush
> 
>  drivers/iommu/dmar.c                |  63 ++++++++------
>  drivers/iommu/intel-iommu-debugfs.c |  62 ++++++++++++++
>  drivers/iommu/intel-iommu.c         |   4 +-
>  drivers/iommu/intel-pasid.c         |  30 +++++--
>  drivers/iommu/intel-pasid.h         |   3 +-
>  drivers/iommu/intel-svm.c           | 123 ++++++++++++++++++++++++----
>  drivers/iommu/intel_irq_remapping.c |   2 +-
>  include/linux/intel-iommu.h         |  13 ++-
>  8 files changed, 247 insertions(+), 53 deletions(-)
> 
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v4 0/5] iommu/vt-d: Add page request draining support
  2020-05-07  6:38 ` [PATCH v4 0/5] iommu/vt-d: Add page request draining support Tian, Kevin
@ 2020-05-07  6:47   ` Lu Baolu
  0 siblings, 0 replies; 21+ messages in thread
From: Lu Baolu @ 2020-05-07  6:47 UTC (permalink / raw)
  To: Tian, Kevin, Joerg Roedel
  Cc: baolu.lu, Raj, Ashok, jacob.jun.pan, Liu, Yi L, iommu, linux-kernel

Hi Kevin,

Thanks a lot for reviewing.

On 2020/5/7 14:38, Tian, Kevin wrote:
>> From: Lu Baolu<baolu.lu@linux.intel.com>
>> Sent: Thursday, May 7, 2020 8:55 AM
>>
>> When a PASID is stopped or terminated, there can be pending PRQs
>> (requests that haven't received responses) in the software and
>> remapping hardware. The pending page requests must be drained
>> so that the pasid could be reused. The chapter 7.10 in the VT-d
>> specification specifies the software steps to drain pending page
>> requests and responses.
>>
>> This includes two parts:
>>   - PATCH 1/4 ~ 2/4: refactor the qi_submit_sync() to support multiple
>>     descriptors per submission which will be used in the following
>>     patch.
>>   - PATCH 3/4 ~ 4/4: add page request drain support after a pasid entry
>>     is torn down.
>>
> I think you should mention that this series depends on Jacob's nested
> SVA series.
> 

Yes. It's based on Jacob's vSVA series since guest unbind also requires
prq draining.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v4 3/5] iommu/vt-d: Disable non-recoverable fault processing before unbind
  2020-05-07  5:45   ` Tian, Kevin
@ 2020-05-07 13:23     ` Lu Baolu
  2020-05-08  2:12       ` Tian, Kevin
  0 siblings, 1 reply; 21+ messages in thread
From: Lu Baolu @ 2020-05-07 13:23 UTC (permalink / raw)
  To: Tian, Kevin, Joerg Roedel
  Cc: baolu.lu, Raj, Ashok, jacob.jun.pan, Liu, Yi L, iommu, linux-kernel

Hi Kevin,

On 2020/5/7 13:45, Tian, Kevin wrote:
>> From: Lu Baolu<baolu.lu@linux.intel.com>
>> Sent: Thursday, May 7, 2020 8:56 AM
>>
>> When a PASID is used for SVA by the device, it's possible that the PASID
>> entry is cleared before the device flushes all ongoing DMA requests. The
>> IOMMU should ignore the non-recoverable faults caused by these requests.
>> Intel VT-d provides such function through the FPD bit of the PASID entry.
>> This sets FPD bit when PASID entry is cleared in the mm notifier and
>> clear it when the pasid is unbound.
>>
>> Signed-off-by: Lu Baolu<baolu.lu@linux.intel.com>
>> ---
>>   drivers/iommu/intel-iommu.c |  4 ++--
>>   drivers/iommu/intel-pasid.c | 26 +++++++++++++++++++++-----
>>   drivers/iommu/intel-pasid.h |  3 ++-
>>   drivers/iommu/intel-svm.c   |  9 ++++++---
>>   4 files changed, 31 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index d1866c0905b1..7811422b5a68 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -5352,7 +5352,7 @@ static void __dmar_remove_one_dev_info(struct
>> device_domain_info *info)
>>   	if (info->dev) {
>>   		if (dev_is_pci(info->dev) && sm_supported(iommu))
>>   			intel_pasid_tear_down_entry(iommu, info->dev,
>> -					PASID_RID2PASID);
>> +					PASID_RID2PASID, false);
>>
>>   		iommu_disable_dev_iotlb(info);
>>   		domain_context_clear(iommu, info->dev);
>> @@ -5587,7 +5587,7 @@ static void aux_domain_remove_dev(struct
>> dmar_domain *domain,
>>   	auxiliary_unlink_device(domain, dev);
>>
>>   	spin_lock(&iommu->lock);
>> -	intel_pasid_tear_down_entry(iommu, dev, domain->default_pasid);
>> +	intel_pasid_tear_down_entry(iommu, dev, domain->default_pasid,
>> false);
>>   	domain_detach_iommu(domain, iommu);
>>   	spin_unlock(&iommu->lock);
>>
>> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
>> index 7969e3dac2ad..11aef6c12972 100644
>> --- a/drivers/iommu/intel-pasid.c
>> +++ b/drivers/iommu/intel-pasid.c
>> @@ -292,7 +292,20 @@ static inline void pasid_clear_entry(struct
>> pasid_entry *pe)
>>   	WRITE_ONCE(pe->val[7], 0);
>>   }
>>
>> -static void intel_pasid_clear_entry(struct device *dev, int pasid)
>> +static inline void pasid_clear_entry_with_fpd(struct pasid_entry *pe)
>> +{
>> +	WRITE_ONCE(pe->val[0], PASID_PTE_FPD);
>> +	WRITE_ONCE(pe->val[1], 0);
>> +	WRITE_ONCE(pe->val[2], 0);
>> +	WRITE_ONCE(pe->val[3], 0);
>> +	WRITE_ONCE(pe->val[4], 0);
>> +	WRITE_ONCE(pe->val[5], 0);
>> +	WRITE_ONCE(pe->val[6], 0);
>> +	WRITE_ONCE(pe->val[7], 0);
>> +}
>> +
>> +static void
>> +intel_pasid_clear_entry(struct device *dev, int pasid, bool pf_ignore)
> Hi, Baolu,
> 
> Just curious whether it makes sense to always set FPD here. Yes, SVA is
> one known example where a non-recoverable fault associated with a PASID
> entry might be raised after the entry is cleared, and those faults are
> considered benign. But even in a general context (w/o SVA), why do we care
> about such faults after a PASID entry is torn down?

First level page tables are also used for DMA protection. For example,
Thunderbolt peripherals are always untrusted and should be protected by
the IOMMU. The IOMMU should always report unrecoverable faults generated
by those devices so that possible DMA attacks can be detected.

ATS/PRI devices are always trusted, hence we can tolerate setting the
FPD bit in the time window between the mm_release notifier and
unbind().

Best regards,
baolu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v4 2/5] iommu/vt-d: debugfs: Add support to show inv queue internals
  2020-05-07  0:55 ` [PATCH v4 2/5] iommu/vt-d: debugfs: Add support to show inv queue internals Lu Baolu
  2020-05-07  5:39   ` Tian, Kevin
@ 2020-05-07 16:47   ` Jacob Pan
  2020-05-08  1:37     ` Lu Baolu
  1 sibling, 1 reply; 21+ messages in thread
From: Jacob Pan @ 2020-05-07 16:47 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Joerg Roedel, ashok.raj, Liu Yi L, kevin.tian, iommu,
	linux-kernel, jacob.jun.pan

Hi Baolu,

Very helpful feature, thanks for doing this. Just a small suggestion.

On Thu,  7 May 2020 08:55:31 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> Export invalidation queue internals of each iommu device through the
> debugfs.
> 
> Example of such dump on a Skylake machine:
> 
> $ sudo cat /sys/kernel/debug/iommu/intel/invalidation_queue
> Invalidation queue on IOMMU: dmar1
>  Base: 0x1672c9000      Head: 80        Tail: 80
> Index           qw0                     qw1                     status
>     0   0000000000000004        0000000000000000        0000000000000000
>     1   0000000200000025        00000001672be804        0000000000000000
>     2   0000000000000011        0000000000000000        0000000000000000
>     3   0000000200000025        00000001672be80c        0000000000000000
>     4   00000000000000d2        0000000000000000        0000000000000000
>     5   0000000200000025        00000001672be814        0000000000000000
>     6   0000000000000014        0000000000000000        0000000000000000
>     7   0000000200000025        00000001672be81c        0000000000000000
>     8   0000000000000014        0000000000000000        0000000000000000
>     9   0000000200000025        00000001672be824        0000000000000000
> 
Head and Tail show the offset, while the queue is dumped by index. Would it
be nice to mark where the Head and Tail are in the list?
In your example the queue is empty (H=T); it would be nice to see where
the previous entry is if there were any faults.
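
Something along these lines would do, as a rough sketch only (the extra
head and tail index parameters and the marker strings are made up for
illustration; they are not part of the posted patch):

	/* hypothetical tweak to invalidation_queue_entry_show(): pass the
	 * head and tail indices in and tag the matching rows in the dump */
	seq_printf(m, "%5d\t%016llx\t%016llx\t%016x\t%s%s\n",
		   index, desc->qw0, desc->qw1,
		   iommu->qi->desc_status[index],
		   index == head ? " <-- Head" : "",
		   index == tail ? " <-- Tail" : "");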

> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel-iommu-debugfs.c | 62
> +++++++++++++++++++++++++++++ 1 file changed, 62 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu-debugfs.c b/drivers/iommu/intel-iommu-debugfs.c
> index 3eb1fe240fb0..e3089865b8f3 100644
> --- a/drivers/iommu/intel-iommu-debugfs.c
> +++ b/drivers/iommu/intel-iommu-debugfs.c
> @@ -372,6 +372,66 @@ static int domain_translation_struct_show(struct seq_file *m, void *unused)
>  }
>  DEFINE_SHOW_ATTRIBUTE(domain_translation_struct);
>  
> +static void invalidation_queue_entry_show(struct seq_file *m,
> +					  struct intel_iommu *iommu)
> +{
> +	int index, shift = qi_shift(iommu);
> +	struct qi_desc *desc;
> +	int offset;
> +
> +	if (ecap_smts(iommu->ecap))
> +		seq_puts(m,
> "Index\t\tqw0\t\t\tqw1\t\t\tqw2\t\t\tqw3\t\t\tstatus\n");
> +	else
> +		seq_puts(m, "Index\t\tqw0\t\t\tqw1\t\t\tstatus\n");
> +
> +	for (index = 0; index < QI_LENGTH; index++) {
> +		offset = index << shift;
> +		desc = iommu->qi->desc + offset;
> +		if (ecap_smts(iommu->ecap))
> +			seq_printf(m,
> "%5d\t%016llx\t%016llx\t%016llx\t%016llx\t%016x\n",
> +				   index, desc->qw0, desc->qw1,
> +				   desc->qw2, desc->qw3,
> +				   iommu->qi->desc_status[index]);
> +		else
> +			seq_printf(m,
> "%5d\t%016llx\t%016llx\t%016x\n",
> +				   index, desc->qw0, desc->qw1,
> +				   iommu->qi->desc_status[index]);
> +	}
> +}
> +
> +static int invalidation_queue_show(struct seq_file *m, void *unused)
> +{
> +	struct dmar_drhd_unit *drhd;
> +	struct intel_iommu *iommu;
> +	unsigned long flags;
> +	struct q_inval *qi;
> +	int shift;
> +
> +	rcu_read_lock();
> +	for_each_active_iommu(iommu, drhd) {
> +		qi = iommu->qi;
> +		shift = qi_shift(iommu);
> +
> +		if (!qi || !ecap_qis(iommu->ecap))
> +			continue;
> +
> +		seq_printf(m, "Invalidation queue on IOMMU: %s\n",
> iommu->name); +
> +		raw_spin_lock_irqsave(&qi->q_lock, flags);
> +		seq_printf(m, " Base: 0x%llx\tHead: %lld\tTail:
> %lld\n",
> +			   virt_to_phys(qi->desc),
> +			   dmar_readq(iommu->reg + DMAR_IQH_REG) >>
> shift,
> +			   dmar_readq(iommu->reg + DMAR_IQT_REG) >>
> shift);
> +		invalidation_queue_entry_show(m, iommu);
> +		raw_spin_unlock_irqrestore(&qi->q_lock, flags);
> +		seq_putc(m, '\n');
> +	}
> +	rcu_read_unlock();
> +
> +	return 0;
> +}
> +DEFINE_SHOW_ATTRIBUTE(invalidation_queue);
> +
>  #ifdef CONFIG_IRQ_REMAP
>  static void ir_tbl_remap_entry_show(struct seq_file *m,
>  				    struct intel_iommu *iommu)
> @@ -490,6 +550,8 @@ void __init intel_iommu_debugfs_init(void)
>  	debugfs_create_file("domain_translation_struct", 0444,
>  			    intel_iommu_debug, NULL,
>  			    &domain_translation_struct_fops);
> +	debugfs_create_file("invalidation_queue", 0444,
> intel_iommu_debug,
> +			    NULL, &invalidation_queue_fops);
>  #ifdef CONFIG_IRQ_REMAP
>  	debugfs_create_file("ir_translation_struct", 0444,
> intel_iommu_debug, NULL, &ir_translation_struct_fops);

[Jacob Pan]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v4 3/5] iommu/vt-d: Disable non-recoverable fault processing before unbind
  2020-05-07  0:55 ` [PATCH v4 3/5] iommu/vt-d: Disable non-recoverable fault processing before unbind Lu Baolu
  2020-05-07  5:45   ` Tian, Kevin
@ 2020-05-07 16:55   ` Jacob Pan
  2020-05-08  1:39     ` Lu Baolu
  1 sibling, 1 reply; 21+ messages in thread
From: Jacob Pan @ 2020-05-07 16:55 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Joerg Roedel, ashok.raj, Liu Yi L, kevin.tian, iommu,
	linux-kernel, jacob.jun.pan

On Thu,  7 May 2020 08:55:32 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> When a PASID is used for SVA by the device, it's possible that the
> PASID entry is cleared before the device flushes all ongoing DMA
> requests. The IOMMU should ignore the non-recoverable faults caused
> by these requests.
Perhaps be more specific: only untranslated requests cause UR.

> Intel VT-d provides such function through the FPD
> bit of the PASID entry. This sets FPD bit when PASID entry is cleared
> in the mm notifier and clear it when the pasid is unbound.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel-iommu.c |  4 ++--
>  drivers/iommu/intel-pasid.c | 26 +++++++++++++++++++++-----
>  drivers/iommu/intel-pasid.h |  3 ++-
>  drivers/iommu/intel-svm.c   |  9 ++++++---
>  4 files changed, 31 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index d1866c0905b1..7811422b5a68 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5352,7 +5352,7 @@ static void __dmar_remove_one_dev_info(struct
> device_domain_info *info) if (info->dev) {
>  		if (dev_is_pci(info->dev) && sm_supported(iommu))
>  			intel_pasid_tear_down_entry(iommu, info->dev,
> -					PASID_RID2PASID);
> +					PASID_RID2PASID, false);
>  
>  		iommu_disable_dev_iotlb(info);
>  		domain_context_clear(iommu, info->dev);
> @@ -5587,7 +5587,7 @@ static void aux_domain_remove_dev(struct
> dmar_domain *domain, auxiliary_unlink_device(domain, dev);
>  
>  	spin_lock(&iommu->lock);
> -	intel_pasid_tear_down_entry(iommu, dev,
> domain->default_pasid);
> +	intel_pasid_tear_down_entry(iommu, dev,
> domain->default_pasid, false); domain_detach_iommu(domain, iommu);
>  	spin_unlock(&iommu->lock);
>  
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index 7969e3dac2ad..11aef6c12972 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -292,7 +292,20 @@ static inline void pasid_clear_entry(struct
> pasid_entry *pe) WRITE_ONCE(pe->val[7], 0);
>  }
>  
> -static void intel_pasid_clear_entry(struct device *dev, int pasid)
> +static inline void pasid_clear_entry_with_fpd(struct pasid_entry *pe)
> +{
> +	WRITE_ONCE(pe->val[0], PASID_PTE_FPD);
> +	WRITE_ONCE(pe->val[1], 0);
> +	WRITE_ONCE(pe->val[2], 0);
> +	WRITE_ONCE(pe->val[3], 0);
> +	WRITE_ONCE(pe->val[4], 0);
> +	WRITE_ONCE(pe->val[5], 0);
> +	WRITE_ONCE(pe->val[6], 0);
> +	WRITE_ONCE(pe->val[7], 0);
> +}
> +
> +static void
> +intel_pasid_clear_entry(struct device *dev, int pasid, bool
> pf_ignore) {
>  	struct pasid_entry *pe;
>  
> @@ -300,7 +313,10 @@ static void intel_pasid_clear_entry(struct
> device *dev, int pasid) if (WARN_ON(!pe))
>  		return;
>  
> -	pasid_clear_entry(pe);
> +	if (pf_ignore)
> +		pasid_clear_entry_with_fpd(pe);
> +	else
> +		pasid_clear_entry(pe);
>  }
>  
>  static inline void pasid_set_bits(u64 *ptr, u64 mask, u64 bits)
> @@ -533,8 +549,8 @@ devtlb_invalidation_with_pasid(struct intel_iommu
> *iommu, qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 -
> VTD_PAGE_SHIFT); }
>  
> -void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> -				 struct device *dev, int pasid)
> +void intel_pasid_tear_down_entry(struct intel_iommu *iommu, struct
> device *dev,
> +				 int pasid, bool pf_ignore)
>  {
>  	struct pasid_entry *pte;
>  	u16 did;
> @@ -544,7 +560,7 @@ void intel_pasid_tear_down_entry(struct
> intel_iommu *iommu, return;
>  
>  	did = pasid_get_domain_id(pte);
> -	intel_pasid_clear_entry(dev, pasid);
> +	intel_pasid_clear_entry(dev, pasid, pf_ignore);
>  
>  	if (!ecap_coherent(iommu->ecap))
>  		clflush_cache_range(pte, sizeof(*pte));
> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
> index a41b09b3ffde..e6dd95ffe381 100644
> --- a/drivers/iommu/intel-pasid.h
> +++ b/drivers/iommu/intel-pasid.h
> @@ -15,6 +15,7 @@
>  #define PASID_MAX			0x100000
>  #define PASID_PTE_MASK			0x3F
>  #define PASID_PTE_PRESENT		1
> +#define PASID_PTE_FPD			2
>  #define PDE_PFN_MASK			PAGE_MASK
>  #define PASID_PDE_SHIFT			6
>  #define MAX_NR_PASID_BITS		20
> @@ -120,7 +121,7 @@ int intel_pasid_setup_nested(struct intel_iommu
> *iommu, struct iommu_gpasid_bind_data_vtd *pasid_data,
>  			     struct dmar_domain *domain, int
> addr_width); void intel_pasid_tear_down_entry(struct intel_iommu
> *iommu,
> -				 struct device *dev, int pasid);
> +				 struct device *dev, int pasid, bool
> pf_ignore); int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned
> int *pasid); void vcmd_free_pasid(struct intel_iommu *iommu, unsigned
> int pasid); #endif /* __INTEL_PASID_H */
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 83dc4319f661..9561ba59a170 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -207,7 +207,8 @@ static void intel_mm_release(struct mmu_notifier
> *mn, struct mm_struct *mm) */
>  	rcu_read_lock();
>  	list_for_each_entry_rcu(sdev, &svm->devs, list) {
> -		intel_pasid_tear_down_entry(svm->iommu, sdev->dev,
> svm->pasid);
> +		intel_pasid_tear_down_entry(svm->iommu, sdev->dev,
> +					    svm->pasid, true);
>  		intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
>  	}
>  	rcu_read_unlock();
> @@ -400,7 +401,8 @@ int intel_svm_unbind_gpasid(struct device *dev,
> int pasid) sdev->users--;
>  		if (!sdev->users) {
>  			list_del_rcu(&sdev->list);
> -			intel_pasid_tear_down_entry(iommu, dev,
> svm->pasid);
> +			intel_pasid_tear_down_entry(iommu, dev,
> +						    svm->pasid,
> false); intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
>  			/* TODO: Drain in flight PRQ for the PASID
> since it
>  			 * may get reused soon, we don't want to
> @@ -643,7 +645,8 @@ int intel_svm_unbind_mm(struct device *dev, int
> pasid)
>  			 * to use. We have a *shared* PASID table,
> because it's
>  			 * large and has to be physically
> contiguous. So it's
>  			 * hard to be as defensive as we might like.
> */
> -			intel_pasid_tear_down_entry(iommu, dev,
> svm->pasid);
> +			intel_pasid_tear_down_entry(iommu, dev,
> +						    svm->pasid,
> false); intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
>  			kfree_rcu(sdev, rcu);
>  
Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v4 2/5] iommu/vt-d: debugfs: Add support to show inv queue internals
  2020-05-07 16:47   ` Jacob Pan
@ 2020-05-08  1:37     ` Lu Baolu
  0 siblings, 0 replies; 21+ messages in thread
From: Lu Baolu @ 2020-05-08  1:37 UTC (permalink / raw)
  To: Jacob Pan
  Cc: baolu.lu, Joerg Roedel, ashok.raj, Liu Yi L, kevin.tian, iommu,
	linux-kernel

Hi Jacob,

On 5/8/20 12:47 AM, Jacob Pan wrote:
> Hi Baolu,
> 
> Very helpful feature, thanks for doing this. Just a small suggestion.

Thanks a lot for reviewing my patch.

> 
> On Thu,  7 May 2020 08:55:31 +0800
> Lu Baolu<baolu.lu@linux.intel.com>  wrote:
> 
>> Export invalidation queue internals of each iommu device through the
>> debugfs.
>>
>> Example of such dump on a Skylake machine:
>>
>> $ sudo cat /sys/kernel/debug/iommu/intel/invalidation_queue
>> Invalidation queue on IOMMU: dmar1
>>   Base: 0x1672c9000      Head: 80        Tail: 80
>> Index           qw0                     qw1                     status
>>      0   0000000000000004        0000000000000000        0000000000000000
>>      1   0000000200000025        00000001672be804        0000000000000000
>>      2   0000000000000011        0000000000000000        0000000000000000
>>      3   0000000200000025        00000001672be80c        0000000000000000
>>      4   00000000000000d2        0000000000000000        0000000000000000
>>      5   0000000200000025        00000001672be814        0000000000000000
>>      6   0000000000000014        0000000000000000        0000000000000000
>>      7   0000000200000025        00000001672be81c        0000000000000000
>>      8   0000000000000014        0000000000000000        0000000000000000
>>      9   0000000200000025        00000001672be824        0000000000000000
>>
> Head and Tail shows the offset, and queue is dump with index. Would it
> be nice to mark where the Head and Tail is in the list?

The Head and Tail actually show the index. I will mark it clearly in the
dump to avoid any confusion. Thanks for the reminder.

> In your example, the queue is empty (H=T), would be nice to see where
> the previous entry is if there were any faults.
> 

qi_check_fault() has already cleared the faults and moved the HEAD
register ahead, so developers will probably have to check the kernel log
and locate the fault descriptor themselves.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v4 3/5] iommu/vt-d: Disable non-recoverable fault processing before unbind
  2020-05-07 16:55   ` Jacob Pan
@ 2020-05-08  1:39     ` Lu Baolu
  0 siblings, 0 replies; 21+ messages in thread
From: Lu Baolu @ 2020-05-08  1:39 UTC (permalink / raw)
  To: Jacob Pan
  Cc: baolu.lu, Joerg Roedel, ashok.raj, Liu Yi L, kevin.tian, iommu,
	linux-kernel

On 5/8/20 12:55 AM, Jacob Pan wrote:
> On Thu,  7 May 2020 08:55:32 +0800
> Lu Baolu <baolu.lu@linux.intel.com> wrote:
> 
>> When a PASID is used for SVA by the device, it's possible that the
>> PASID entry is cleared before the device flushes all ongoing DMA
>> requests. The IOMMU should ignore the non-recoverable faults caused
>> by these requests.
> Perhaps be more specific, only untranslated requests causes UR.

Yes. Sure!

> 
>> Intel VT-d provides such function through the FPD
>> bit of the PASID entry. This sets FPD bit when PASID entry is cleared
>> in the mm notifier and clear it when the pasid is unbound.
>>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>   drivers/iommu/intel-iommu.c |  4 ++--
>>   drivers/iommu/intel-pasid.c | 26 +++++++++++++++++++++-----
>>   drivers/iommu/intel-pasid.h |  3 ++-
>>   drivers/iommu/intel-svm.c   |  9 ++++++---
>>   4 files changed, 31 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index d1866c0905b1..7811422b5a68 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -5352,7 +5352,7 @@ static void __dmar_remove_one_dev_info(struct
>> device_domain_info *info) if (info->dev) {
>>   		if (dev_is_pci(info->dev) && sm_supported(iommu))
>>   			intel_pasid_tear_down_entry(iommu, info->dev,
>> -					PASID_RID2PASID);
>> +					PASID_RID2PASID, false);
>>   
>>   		iommu_disable_dev_iotlb(info);
>>   		domain_context_clear(iommu, info->dev);
>> @@ -5587,7 +5587,7 @@ static void aux_domain_remove_dev(struct
>> dmar_domain *domain, auxiliary_unlink_device(domain, dev);
>>   
>>   	spin_lock(&iommu->lock);
>> -	intel_pasid_tear_down_entry(iommu, dev,
>> domain->default_pasid);
>> +	intel_pasid_tear_down_entry(iommu, dev,
>> domain->default_pasid, false); domain_detach_iommu(domain, iommu);
>>   	spin_unlock(&iommu->lock);
>>   
>> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
>> index 7969e3dac2ad..11aef6c12972 100644
>> --- a/drivers/iommu/intel-pasid.c
>> +++ b/drivers/iommu/intel-pasid.c
>> @@ -292,7 +292,20 @@ static inline void pasid_clear_entry(struct
>> pasid_entry *pe) WRITE_ONCE(pe->val[7], 0);
>>   }
>>   
>> -static void intel_pasid_clear_entry(struct device *dev, int pasid)
>> +static inline void pasid_clear_entry_with_fpd(struct pasid_entry *pe)
>> +{
>> +	WRITE_ONCE(pe->val[0], PASID_PTE_FPD);
>> +	WRITE_ONCE(pe->val[1], 0);
>> +	WRITE_ONCE(pe->val[2], 0);
>> +	WRITE_ONCE(pe->val[3], 0);
>> +	WRITE_ONCE(pe->val[4], 0);
>> +	WRITE_ONCE(pe->val[5], 0);
>> +	WRITE_ONCE(pe->val[6], 0);
>> +	WRITE_ONCE(pe->val[7], 0);
>> +}
>> +
>> +static void
>> +intel_pasid_clear_entry(struct device *dev, int pasid, bool
>> pf_ignore) {
>>   	struct pasid_entry *pe;
>>   
>> @@ -300,7 +313,10 @@ static void intel_pasid_clear_entry(struct
>> device *dev, int pasid) if (WARN_ON(!pe))
>>   		return;
>>   
>> -	pasid_clear_entry(pe);
>> +	if (pf_ignore)
>> +		pasid_clear_entry_with_fpd(pe);
>> +	else
>> +		pasid_clear_entry(pe);
>>   }
>>   
>>   static inline void pasid_set_bits(u64 *ptr, u64 mask, u64 bits)
>> @@ -533,8 +549,8 @@ devtlb_invalidation_with_pasid(struct intel_iommu
>> *iommu, qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 -
>> VTD_PAGE_SHIFT); }
>>   
>> -void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
>> -				 struct device *dev, int pasid)
>> +void intel_pasid_tear_down_entry(struct intel_iommu *iommu, struct
>> device *dev,
>> +				 int pasid, bool pf_ignore)
>>   {
>>   	struct pasid_entry *pte;
>>   	u16 did;
>> @@ -544,7 +560,7 @@ void intel_pasid_tear_down_entry(struct
>> intel_iommu *iommu, return;
>>   
>>   	did = pasid_get_domain_id(pte);
>> -	intel_pasid_clear_entry(dev, pasid);
>> +	intel_pasid_clear_entry(dev, pasid, pf_ignore);
>>   
>>   	if (!ecap_coherent(iommu->ecap))
>>   		clflush_cache_range(pte, sizeof(*pte));
>> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
>> index a41b09b3ffde..e6dd95ffe381 100644
>> --- a/drivers/iommu/intel-pasid.h
>> +++ b/drivers/iommu/intel-pasid.h
>> @@ -15,6 +15,7 @@
>>   #define PASID_MAX			0x100000
>>   #define PASID_PTE_MASK			0x3F
>>   #define PASID_PTE_PRESENT		1
>> +#define PASID_PTE_FPD			2
>>   #define PDE_PFN_MASK			PAGE_MASK
>>   #define PASID_PDE_SHIFT			6
>>   #define MAX_NR_PASID_BITS		20
>> @@ -120,7 +121,7 @@ int intel_pasid_setup_nested(struct intel_iommu
>> *iommu, struct iommu_gpasid_bind_data_vtd *pasid_data,
>>   			     struct dmar_domain *domain, int
>> addr_width); void intel_pasid_tear_down_entry(struct intel_iommu
>> *iommu,
>> -				 struct device *dev, int pasid);
>> +				 struct device *dev, int pasid, bool
>> pf_ignore); int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned
>> int *pasid); void vcmd_free_pasid(struct intel_iommu *iommu, unsigned
>> int pasid); #endif /* __INTEL_PASID_H */
>> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
>> index 83dc4319f661..9561ba59a170 100644
>> --- a/drivers/iommu/intel-svm.c
>> +++ b/drivers/iommu/intel-svm.c
>> @@ -207,7 +207,8 @@ static void intel_mm_release(struct mmu_notifier
>> *mn, struct mm_struct *mm) */
>>   	rcu_read_lock();
>>   	list_for_each_entry_rcu(sdev, &svm->devs, list) {
>> -		intel_pasid_tear_down_entry(svm->iommu, sdev->dev,
>> svm->pasid);
>> +		intel_pasid_tear_down_entry(svm->iommu, sdev->dev,
>> +					    svm->pasid, true);
>>   		intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
>>   	}
>>   	rcu_read_unlock();
>> @@ -400,7 +401,8 @@ int intel_svm_unbind_gpasid(struct device *dev,
>> int pasid) sdev->users--;
>>   		if (!sdev->users) {
>>   			list_del_rcu(&sdev->list);
>> -			intel_pasid_tear_down_entry(iommu, dev,
>> svm->pasid);
>> +			intel_pasid_tear_down_entry(iommu, dev,
>> +						    svm->pasid,
>> false); intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
>>   			/* TODO: Drain in flight PRQ for the PASID
>> since it
>>   			 * may get reused soon, we don't want to
>> @@ -643,7 +645,8 @@ int intel_svm_unbind_mm(struct device *dev, int
>> pasid)
>>   			 * to use. We have a *shared* PASID table,
>> because it's
>>   			 * large and has to be physically
>> contiguous. So it's
>>   			 * hard to be as defensive as we might like.
>> */
>> -			intel_pasid_tear_down_entry(iommu, dev,
>> svm->pasid);
>> +			intel_pasid_tear_down_entry(iommu, dev,
>> +						    svm->pasid,
>> false); intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
>>   			kfree_rcu(sdev, rcu);
>>   
> Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> 

Best regards,
baolu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v4 3/5] iommu/vt-d: Disable non-recoverable fault processing before unbind
  2020-05-07 13:23     ` Lu Baolu
@ 2020-05-08  2:12       ` Tian, Kevin
  2020-05-08  2:49         ` Lu Baolu
  0 siblings, 1 reply; 21+ messages in thread
From: Tian, Kevin @ 2020-05-08  2:12 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel
  Cc: Raj, Ashok, jacob.jun.pan, Liu, Yi L, iommu, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Thursday, May 7, 2020 9:23 PM
> 
> Hi Kevin,
> 
> On 2020/5/7 13:45, Tian, Kevin wrote:
> >> From: Lu Baolu<baolu.lu@linux.intel.com>
> >> Sent: Thursday, May 7, 2020 8:56 AM
> >>
> >> When a PASID is used for SVA by the device, it's possible that the PASID
> >> entry is cleared before the device flushes all ongoing DMA requests. The
> >> IOMMU should ignore the non-recoverable faults caused by these
> requests.
> >> Intel VT-d provides such function through the FPD bit of the PASID entry.
> >> This sets FPD bit when PASID entry is cleared in the mm notifier and
> >> clear it when the pasid is unbound.
> >>
> >> Signed-off-by: Lu Baolu<baolu.lu@linux.intel.com>
> >> ---
> >>   drivers/iommu/intel-iommu.c |  4 ++--
> >>   drivers/iommu/intel-pasid.c | 26 +++++++++++++++++++++-----
> >>   drivers/iommu/intel-pasid.h |  3 ++-
> >>   drivers/iommu/intel-svm.c   |  9 ++++++---
> >>   4 files changed, 31 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> >> index d1866c0905b1..7811422b5a68 100644
> >> --- a/drivers/iommu/intel-iommu.c
> >> +++ b/drivers/iommu/intel-iommu.c
> >> @@ -5352,7 +5352,7 @@ static void
> __dmar_remove_one_dev_info(struct
> >> device_domain_info *info)
> >>   	if (info->dev) {
> >>   		if (dev_is_pci(info->dev) && sm_supported(iommu))
> >>   			intel_pasid_tear_down_entry(iommu, info->dev,
> >> -					PASID_RID2PASID);
> >> +					PASID_RID2PASID, false);
> >>
> >>   		iommu_disable_dev_iotlb(info);
> >>   		domain_context_clear(iommu, info->dev);
> >> @@ -5587,7 +5587,7 @@ static void aux_domain_remove_dev(struct
> >> dmar_domain *domain,
> >>   	auxiliary_unlink_device(domain, dev);
> >>
> >>   	spin_lock(&iommu->lock);
> >> -	intel_pasid_tear_down_entry(iommu, dev, domain->default_pasid);
> >> +	intel_pasid_tear_down_entry(iommu, dev, domain->default_pasid,
> >> false);
> >>   	domain_detach_iommu(domain, iommu);
> >>   	spin_unlock(&iommu->lock);
> >>
> >> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> >> index 7969e3dac2ad..11aef6c12972 100644
> >> --- a/drivers/iommu/intel-pasid.c
> >> +++ b/drivers/iommu/intel-pasid.c
> >> @@ -292,7 +292,20 @@ static inline void pasid_clear_entry(struct
> >> pasid_entry *pe)
> >>   	WRITE_ONCE(pe->val[7], 0);
> >>   }
> >>
> >> -static void intel_pasid_clear_entry(struct device *dev, int pasid)
> >> +static inline void pasid_clear_entry_with_fpd(struct pasid_entry *pe)
> >> +{
> >> +	WRITE_ONCE(pe->val[0], PASID_PTE_FPD);
> >> +	WRITE_ONCE(pe->val[1], 0);
> >> +	WRITE_ONCE(pe->val[2], 0);
> >> +	WRITE_ONCE(pe->val[3], 0);
> >> +	WRITE_ONCE(pe->val[4], 0);
> >> +	WRITE_ONCE(pe->val[5], 0);
> >> +	WRITE_ONCE(pe->val[6], 0);
> >> +	WRITE_ONCE(pe->val[7], 0);
> >> +}
> >> +
> >> +static void
> >> +intel_pasid_clear_entry(struct device *dev, int pasid, bool pf_ignore)
> > Hi, Baolu,
> >
> > Just curious whether it makes sense to always set FPD here. Yes, SVA is
> > one known example that non-recoverable fault associated with a PASID
> > entry might be caused after the entry is cleared and those are considered
> > benign.  But even in a general context (w/o SVA) why do we care about
> > such faults after a PASID entry is torn down?
> 
> First level page tables are also used for DMA protection. For example,
> thunderbolt peripherals are always untrusted and should be protected
> with IOMMU. IOMMU should always report unrecoverable faults generated
> by those device to detect possible DMA attacks.

When untrusted devices are protected by the IOMMU, I don't think the
PASID entry (of RID2PASID) will have its present bit cleared.

> 
> ATS/PRI devices are always trusted devices, hence we could tolerate
> setting FPD bit in the time window between mm_release notifier and
> unbind().
> 
> Best regards,
> baolu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v4 4/5] iommu/vt-d: Add page request draining support
  2020-05-07  6:35   ` Tian, Kevin
@ 2020-05-08  2:26     ` Lu Baolu
  0 siblings, 0 replies; 21+ messages in thread
From: Lu Baolu @ 2020-05-08  2:26 UTC (permalink / raw)
  To: Tian, Kevin, Joerg Roedel; +Cc: baolu.lu, Raj, Ashok, linux-kernel, iommu

Hi Kevin,

On 5/7/20 2:35 PM, Tian, Kevin wrote:
>> From: Lu Baolu
>> Sent: Thursday, May 7, 2020 8:56 AM
>>
>> When a PASID is stopped or terminated, there can be pending PRQs
>> (requests that haven't received responses) in remapping hardware.
>> This adds the interface to drain page requests and call it when a
>> PASID is terminated.
>>
>> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
>> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>   drivers/iommu/intel-svm.c   | 102 ++++++++++++++++++++++++++++++++++-
>> -
>>   include/linux/intel-iommu.h |   4 ++
>>   2 files changed, 101 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
>> index 9561ba59a170..7256eb965cf6 100644
>> --- a/drivers/iommu/intel-svm.c
>> +++ b/drivers/iommu/intel-svm.c
>> @@ -23,6 +23,7 @@
>>   #include "intel-pasid.h"
>>
>>   static irqreturn_t prq_event_thread(int irq, void *d);
>> +static void intel_svm_drain_prq(struct device *dev, int pasid);
>>
>>   #define PRQ_ORDER 0
>>
>> @@ -66,6 +67,8 @@ int intel_svm_enable_prq(struct intel_iommu *iommu)
>>   	dmar_writeq(iommu->reg + DMAR_PQT_REG, 0ULL);
>>   	dmar_writeq(iommu->reg + DMAR_PQA_REG, virt_to_phys(iommu-
>>> prq) | PRQ_ORDER);
>>
>> +	init_completion(&iommu->prq_complete);
>> +
>>   	return 0;
>>   }
>>
>> @@ -403,12 +406,8 @@ int intel_svm_unbind_gpasid(struct device *dev, int
>> pasid)
>>   			list_del_rcu(&sdev->list);
>>   			intel_pasid_tear_down_entry(iommu, dev,
>>   						    svm->pasid, false);
>> +			intel_svm_drain_prq(dev, svm->pasid);
>>   			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
>> -			/* TODO: Drain in flight PRQ for the PASID since it
>> -			 * may get reused soon, we don't want to
>> -			 * confuse with its previous life.
>> -			 * intel_svm_drain_prq(dev, pasid);
>> -			 */
>>   			kfree_rcu(sdev, rcu);
>>
>>   			if (list_empty(&svm->devs)) {
>> @@ -647,6 +646,7 @@ int intel_svm_unbind_mm(struct device *dev, int
>> pasid)
>>   			 * hard to be as defensive as we might like. */
>>   			intel_pasid_tear_down_entry(iommu, dev,
>>   						    svm->pasid, false);
>> +			intel_svm_drain_prq(dev, svm->pasid);
>>   			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
>>   			kfree_rcu(sdev, rcu);
>>
>> @@ -725,6 +725,92 @@ static bool is_canonical_address(u64 addr)
>>   	return (((saddr << shift) >> shift) == saddr);
>>   }
>>
>> +/**
>> + * intel_svm_drain_prq:
>> + *
>> + * Drain all pending page requests and responses related to a specific
>> + * pasid in both software and hardware.
>> + */
>> +static void intel_svm_drain_prq(struct device *dev, int pasid)
>> +{
>> +	struct device_domain_info *info;
>> +	struct dmar_domain *domain;
>> +	struct intel_iommu *iommu;
>> +	struct qi_desc desc[3];
>> +	struct pci_dev *pdev;
>> +	int head, tail;
>> +	u16 sid, did;
>> +	int qdep;
>> +
>> +	info = get_domain_info(dev);
>> +	if (WARN_ON(!info || !dev_is_pci(dev)))
>> +		return;
>> +
>> +	if (!info->ats_enabled)
>> +		return;
> 
> ats_enabled -> pri_enabled

Yes. Sure!

> 
>> +
>> +	iommu = info->iommu;
>> +	domain = info->domain;
>> +	pdev = to_pci_dev(dev);
>> +	sid = PCI_DEVID(info->bus, info->devfn);
>> +	did = domain->iommu_did[iommu->seq_id];
>> +	qdep = pci_ats_queue_depth(pdev);
>> +
>> +	memset(desc, 0, sizeof(desc));
>> +	desc[0].qw0 = QI_IWD_STATUS_DATA(QI_DONE) |
>> +			QI_IWD_FENCE |
>> +			QI_IWD_TYPE;
>> +	desc[1].qw0 = QI_EIOTLB_PASID(pasid) |
>> +			QI_EIOTLB_DID(did) |
>> +			QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) |
>> +			QI_EIOTLB_TYPE;
>> +	desc[2].qw0 = QI_DEV_EIOTLB_PASID(pasid) |
>> +			QI_DEV_EIOTLB_SID(sid) |
>> +			QI_DEV_EIOTLB_QDEP(qdep) |
>> +			QI_DEIOTLB_TYPE |
>> +			QI_DEV_IOTLB_PFSID(info->pfsid);
>> +
>> +	/*
>> +	 * Submit an invalidation wait descriptor with fence and page request
>> +	 * drain flags set to invalidation queue. This ensures that all requests
>> +	 * submitted to the invalidation queue ahead of this wait descriptor
>> are
>> +	 * processed and completed, and all already issued page requests
>> from
>> +	 * the device are put in the page request queue.
>> +	 */
> 
> I feel this comment is better moved earlier since it explains the overall
> flow including all 3 descriptors. Also it is not one wait descriptor which
> gets both fence and drain flags set. There are two wait descriptors with
> one setting fence and the other setting drain.
> 
>> +	qi_submit_sync(iommu, desc, 1, QI_OPT_WAIT_DRAIN);
> 
> the count is '3' instead of '1'.

Yes. Will fix it in the next version.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v4 3/5] iommu/vt-d: Disable non-recoverable fault processing before unbind
  2020-05-08  2:12       ` Tian, Kevin
@ 2020-05-08  2:49         ` Lu Baolu
  0 siblings, 0 replies; 21+ messages in thread
From: Lu Baolu @ 2020-05-08  2:49 UTC (permalink / raw)
  To: Tian, Kevin, Joerg Roedel
  Cc: baolu.lu, Raj, Ashok, jacob.jun.pan, Liu, Yi L, iommu, linux-kernel

On 5/8/20 10:12 AM, Tian, Kevin wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>> Sent: Thursday, May 7, 2020 9:23 PM
>>
>> Hi Kevin,
>>
>> On 2020/5/7 13:45, Tian, Kevin wrote:
>>>> From: Lu Baolu<baolu.lu@linux.intel.com>
>>>> Sent: Thursday, May 7, 2020 8:56 AM
>>>>
>>>> When a PASID is used for SVA by the device, it's possible that the PASID
>>>> entry is cleared before the device flushes all ongoing DMA requests. The
>>>> IOMMU should ignore the non-recoverable faults caused by these
>> requests.
>>>> Intel VT-d provides such function through the FPD bit of the PASID entry.
>>>> This sets FPD bit when PASID entry is cleared in the mm notifier and
>>>> clear it when the pasid is unbound.
>>>>
>>>> Signed-off-by: Lu Baolu<baolu.lu@linux.intel.com>
>>>> ---
>>>>    drivers/iommu/intel-iommu.c |  4 ++--
>>>>    drivers/iommu/intel-pasid.c | 26 +++++++++++++++++++++-----
>>>>    drivers/iommu/intel-pasid.h |  3 ++-
>>>>    drivers/iommu/intel-svm.c   |  9 ++++++---
>>>>    4 files changed, 31 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>>>> index d1866c0905b1..7811422b5a68 100644
>>>> --- a/drivers/iommu/intel-iommu.c
>>>> +++ b/drivers/iommu/intel-iommu.c
>>>> @@ -5352,7 +5352,7 @@ static void
>> __dmar_remove_one_dev_info(struct
>>>> device_domain_info *info)
>>>>    	if (info->dev) {
>>>>    		if (dev_is_pci(info->dev) && sm_supported(iommu))
>>>>    			intel_pasid_tear_down_entry(iommu, info->dev,
>>>> -					PASID_RID2PASID);
>>>> +					PASID_RID2PASID, false);
>>>>
>>>>    		iommu_disable_dev_iotlb(info);
>>>>    		domain_context_clear(iommu, info->dev);
>>>> @@ -5587,7 +5587,7 @@ static void aux_domain_remove_dev(struct
>>>> dmar_domain *domain,
>>>>    	auxiliary_unlink_device(domain, dev);
>>>>
>>>>    	spin_lock(&iommu->lock);
>>>> -	intel_pasid_tear_down_entry(iommu, dev, domain->default_pasid);
>>>> +	intel_pasid_tear_down_entry(iommu, dev, domain->default_pasid,
>>>> false);
>>>>    	domain_detach_iommu(domain, iommu);
>>>>    	spin_unlock(&iommu->lock);
>>>>
>>>> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
>>>> index 7969e3dac2ad..11aef6c12972 100644
>>>> --- a/drivers/iommu/intel-pasid.c
>>>> +++ b/drivers/iommu/intel-pasid.c
>>>> @@ -292,7 +292,20 @@ static inline void pasid_clear_entry(struct
>>>> pasid_entry *pe)
>>>>    	WRITE_ONCE(pe->val[7], 0);
>>>>    }
>>>>
>>>> -static void intel_pasid_clear_entry(struct device *dev, int pasid)
>>>> +static inline void pasid_clear_entry_with_fpd(struct pasid_entry *pe)
>>>> +{
>>>> +	WRITE_ONCE(pe->val[0], PASID_PTE_FPD);
>>>> +	WRITE_ONCE(pe->val[1], 0);
>>>> +	WRITE_ONCE(pe->val[2], 0);
>>>> +	WRITE_ONCE(pe->val[3], 0);
>>>> +	WRITE_ONCE(pe->val[4], 0);
>>>> +	WRITE_ONCE(pe->val[5], 0);
>>>> +	WRITE_ONCE(pe->val[6], 0);
>>>> +	WRITE_ONCE(pe->val[7], 0);
>>>> +}
>>>> +
>>>> +static void
>>>> +intel_pasid_clear_entry(struct device *dev, int pasid, bool pf_ignore)
>>> Hi, Baolu,
>>>
>>> Just curious whether it makes sense to always set FPD here. Yes, SVA is
>>> one known example that non-recoverable fault associated with a PASID
>>> entry might be caused after the entry is cleared and those are considered
>>> benign.  But even in a general context (w/o SVA) why do we care about
>>> such faults after a PASID entry is torn down?
>>
>> First level page tables are also used for DMA protection. For example,
>> thunderbolt peripherals are always untrusted and should be protected
>> with IOMMU. IOMMU should always report unrecoverable faults generated
>> by those device to detect possible DMA attacks.
> 
> when untrusted devices are protected by IOMMU, I don't think PASID
> entry (of RID2PASID) will have present bit cleared.

I mean, protecting the system from malicious devices. In any case, the
IOMMU should report and log the unrecoverable faults caused by untrusted
devices.

Best regards,
baolu

> 
>>
>> ATS/PRI devices are always trusted devices, hence we could tolerate
>> setting FPD bit in the time window between mm_release notifier and
>> unbind().
>>
>> Best regards,
>> baolu

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2020-05-08  2:53 UTC | newest]

Thread overview: 21+ messages
2020-05-07  0:55 [PATCH v4 0/5] iommu/vt-d: Add page request draining support Lu Baolu
2020-05-07  0:55 ` [PATCH v4 1/5] iommu/vt-d: Multiple descriptors per qi_submit_sync() Lu Baolu
2020-05-07  5:34   ` Tian, Kevin
2020-05-07  0:55 ` [PATCH v4 2/5] iommu/vt-d: debugfs: Add support to show inv queue internals Lu Baolu
2020-05-07  5:39   ` Tian, Kevin
2020-05-07 16:47   ` Jacob Pan
2020-05-08  1:37     ` Lu Baolu
2020-05-07  0:55 ` [PATCH v4 3/5] iommu/vt-d: Disable non-recoverable fault processing before unbind Lu Baolu
2020-05-07  5:45   ` Tian, Kevin
2020-05-07 13:23     ` Lu Baolu
2020-05-08  2:12       ` Tian, Kevin
2020-05-08  2:49         ` Lu Baolu
2020-05-07 16:55   ` Jacob Pan
2020-05-08  1:39     ` Lu Baolu
2020-05-07  0:55 ` [PATCH v4 4/5] iommu/vt-d: Add page request draining support Lu Baolu
2020-05-07  6:35   ` Tian, Kevin
2020-05-08  2:26     ` Lu Baolu
2020-05-07  0:55 ` [PATCH v4 5/5] iommu/vt-d: Remove redundant IOTLB flush Lu Baolu
2020-05-07  6:37   ` Tian, Kevin
2020-05-07  6:38 ` [PATCH v4 0/5] iommu/vt-d: Add page request draining support Tian, Kevin
2020-05-07  6:47   ` Lu Baolu
