* [PATCH v3 0/4] iommu/vt-d: Add page request draining support
From: Lu Baolu @ 2020-04-22  8:06 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: ashok.raj, jacob.jun.pan, Liu Yi L, kevin.tian, iommu,
	linux-kernel, Lu Baolu

When a PASID is stopped or terminated, there can be pending PRQs (page
requests that have not yet received responses) in software and in the
remapping hardware. These pending page requests must be drained before
the PASID can be reused. Chapter 7.10 of the VT-d specification
describes the software steps for draining pending page requests and
responses.
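
In outline, the drain sequence implemented by this series looks as
follows (a condensed sketch of the code added in patch 3/4, with error
handling omitted; the patch itself is authoritative):

	/* Stop the PASID first so that no new page requests arrive. */
	intel_pasid_tear_down_entry(iommu, dev, pasid);

	/*
	 * Push in-flight page requests into the page request queue with
	 * a fenced, drain-flagged wait descriptor, wait until the PRQ
	 * interrupt thread has handled every request for this PASID,
	 * then flush the PASID's IOTLB and device TLB entries.
	 */
	intel_svm_drain_prq(dev, pasid);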

This series consists of two parts:
 - PATCH 1/4 ~ 2/4: refactor qi_submit_sync() to support multiple
   descriptors per submission, which will be used by the following
   patches.
 - PATCH 3/4 ~ 4/4: add page request draining support after a PASID
   entry is torn down.

Please help review.

Best regards,
baolu

Change log:
 v2->v3:
  - Address Kevin's review comments
    - Squash the first 2 patches together;
    - The prq thread is serialized; no need to consider reentrance;
    - Ensure no newly arriving prq before draining prqs in the queue;
    - Handle page request overflow case.

 v1->v2:
  - Fix race between multiple prq handling threads.


Lu Baolu (4):
  iommu/vt-d: Multiple descriptors per qi_submit_sync()
  iommu/vt-d: debugfs: Add support to show inv queue internals
  iommu/vt-d: Add page request draining support
  iommu/vt-d: Remove redundant IOTLB flush

 drivers/iommu/dmar.c                |  63 +++++++++-------
 drivers/iommu/intel-iommu-debugfs.c |  62 +++++++++++++++
 drivers/iommu/intel-pasid.c         |   4 +-
 drivers/iommu/intel-svm.c           | 112 +++++++++++++++++++++++++---
 drivers/iommu/intel_irq_remapping.c |   2 +-
 include/linux/intel-iommu.h         |  13 +++-
 6 files changed, 216 insertions(+), 40 deletions(-)

-- 
2.17.1



* [PATCH v3 1/4] iommu/vt-d: Multiple descriptors per qi_submit_sync()
From: Lu Baolu @ 2020-04-22  8:06 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: ashok.raj, jacob.jun.pan, Liu Yi L, kevin.tian, iommu,
	linux-kernel, Lu Baolu

The current qi_submit_sync() supports only a single invalidation
descriptor per submission and appends a wait descriptor after each
submission to poll for hardware completion. Extend the qi_submit_sync()
helper to support multiple descriptors, and add an option so that the
caller can set the Page-request Drain (PD) bit in the wait descriptor.
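
For example, with this change a caller can batch several descriptors and
request the PD bit in one call. An illustrative sketch only, reusing the
PASID invalidation descriptors introduced later in this series (pasid,
did, sid, qdep, pfsid and iommu are placeholders):

	struct qi_desc desc[2];

	memset(desc, 0, sizeof(desc));
	/* PASID-based IOTLB invalidation ... */
	desc[0].qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
		      QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) | QI_EIOTLB_TYPE;
	/* ... followed by a PASID-based device IOTLB invalidation. */
	desc[1].qw0 = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) |
		      QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
		      QI_DEV_IOTLB_PFSID(pfsid);

	/* One wait descriptor with the PD bit set is appended for us. */
	qi_submit_sync(iommu, desc, 2, QI_OPT_WAIT_DRAIN);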

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/dmar.c                | 63 +++++++++++++++++------------
 drivers/iommu/intel-pasid.c         |  4 +-
 drivers/iommu/intel-svm.c           |  6 +--
 drivers/iommu/intel_irq_remapping.c |  2 +-
 include/linux/intel-iommu.h         |  9 ++++-
 5 files changed, 52 insertions(+), 32 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index d9dc787feef7..61d049e91f84 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1157,12 +1157,11 @@ static inline void reclaim_free_desc(struct q_inval *qi)
 	}
 }
 
-static int qi_check_fault(struct intel_iommu *iommu, int index)
+static int qi_check_fault(struct intel_iommu *iommu, int index, int wait_index)
 {
 	u32 fault;
 	int head, tail;
 	struct q_inval *qi = iommu->qi;
-	int wait_index = (index + 1) % QI_LENGTH;
 	int shift = qi_shift(iommu);
 
 	if (qi->desc_status[wait_index] == QI_ABORT)
@@ -1225,17 +1224,21 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
 }
 
 /*
- * Submit the queued invalidation descriptor to the remapping
- * hardware unit and wait for its completion.
+ * Function to submit invalidation descriptors of all types to the queued
+ * invalidation interface (QI). Multiple descriptors can be submitted at a
+ * time; a wait descriptor is appended to each submission to ensure that
+ * hardware has completed the invalidation before return. Wait descriptors
+ * can be part of the submission, but they will not be polled for completion.
  */
-int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
+int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
+		   unsigned int count, unsigned long options)
 {
-	int rc;
 	struct q_inval *qi = iommu->qi;
-	int offset, shift, length;
 	struct qi_desc wait_desc;
 	int wait_index, index;
 	unsigned long flags;
+	int offset, shift;
+	int rc, i;
 
 	if (!qi)
 		return 0;
@@ -1244,32 +1247,41 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 	rc = 0;
 
 	raw_spin_lock_irqsave(&qi->q_lock, flags);
-	while (qi->free_cnt < 3) {
+	/*
+	 * Check if we have enough empty slots in the queue to submit;
+	 * the calculation is based on:
+	 * # of desc + 1 wait desc + 1 space between head and tail
+	 */
+	while (qi->free_cnt < count + 2) {
 		raw_spin_unlock_irqrestore(&qi->q_lock, flags);
 		cpu_relax();
 		raw_spin_lock_irqsave(&qi->q_lock, flags);
 	}
 
 	index = qi->free_head;
-	wait_index = (index + 1) % QI_LENGTH;
+	wait_index = (index + count) % QI_LENGTH;
 	shift = qi_shift(iommu);
-	length = 1 << shift;
 
-	qi->desc_status[index] = qi->desc_status[wait_index] = QI_IN_USE;
+	for (i = 0; i < count; i++) {
+		offset = ((index + i) % QI_LENGTH) << shift;
+		memcpy(qi->desc + offset, &desc[i], 1 << shift);
+		qi->desc_status[(index + i) % QI_LENGTH] = QI_IN_USE;
+	}
+	qi->desc_status[wait_index] = QI_IN_USE;
 
-	offset = index << shift;
-	memcpy(qi->desc + offset, desc, length);
 	wait_desc.qw0 = QI_IWD_STATUS_DATA(QI_DONE) |
 			QI_IWD_STATUS_WRITE | QI_IWD_TYPE;
+	if (options & QI_OPT_WAIT_DRAIN)
+		wait_desc.qw0 |= QI_IWD_PRQ_DRAIN;
 	wait_desc.qw1 = virt_to_phys(&qi->desc_status[wait_index]);
 	wait_desc.qw2 = 0;
 	wait_desc.qw3 = 0;
 
 	offset = wait_index << shift;
-	memcpy(qi->desc + offset, &wait_desc, length);
+	memcpy(qi->desc + offset, &wait_desc, 1 << shift);
 
-	qi->free_head = (qi->free_head + 2) % QI_LENGTH;
-	qi->free_cnt -= 2;
+	qi->free_head = (qi->free_head + count + 1) % QI_LENGTH;
+	qi->free_cnt -= count + 1;
 
 	/*
 	 * update the HW tail register indicating the presence of
@@ -1285,7 +1297,7 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 		 * a deadlock where the interrupt context can wait indefinitely
 		 * for free slots in the queue.
 		 */
-		rc = qi_check_fault(iommu, index);
+		rc = qi_check_fault(iommu, index, wait_index);
 		if (rc)
 			break;
 
@@ -1294,7 +1306,8 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 		raw_spin_lock(&qi->q_lock);
 	}
 
-	qi->desc_status[index] = QI_DONE;
+	for (i = 0; i < count; i++)
+		qi->desc_status[(index + i) % QI_LENGTH] = QI_DONE;
 
 	reclaim_free_desc(qi);
 	raw_spin_unlock_irqrestore(&qi->q_lock, flags);
@@ -1318,7 +1331,7 @@ void qi_global_iec(struct intel_iommu *iommu)
 	desc.qw3 = 0;
 
 	/* should never fail */
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 void qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid, u8 fm,
@@ -1332,7 +1345,7 @@ void qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid, u8 fm,
 	desc.qw2 = 0;
 	desc.qw3 = 0;
 
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
@@ -1356,7 +1369,7 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
 	desc.qw2 = 0;
 	desc.qw3 = 0;
 
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
@@ -1378,7 +1391,7 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 	desc.qw2 = 0;
 	desc.qw3 = 0;
 
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 /* PASID-based IOTLB invalidation */
@@ -1419,7 +1432,7 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,
 				QI_EIOTLB_AM(mask);
 	}
 
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 /* PASID-based device IOTLB Invalidate */
@@ -1448,7 +1461,7 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 	if (size_order)
 		desc.qw1 |= QI_DEV_EIOTLB_SIZE;
 
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did,
@@ -1458,7 +1471,7 @@ void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did,
 
 	desc.qw0 = QI_PC_PASID(pasid) | QI_PC_DID(did) |
 			QI_PC_GRAN(granu) | QI_PC_TYPE;
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 /*
diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 48cc9ca5f3dc..7969e3dac2ad 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -498,7 +498,7 @@ pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
 	desc.qw2 = 0;
 	desc.qw3 = 0;
 
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 static void
@@ -512,7 +512,7 @@ iotlb_invalidation_with_pasid(struct intel_iommu *iommu, u16 did, u32 pasid)
 	desc.qw2 = 0;
 	desc.qw3 = 0;
 
-	qi_submit_sync(&desc, iommu);
+	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 static void
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index e9f4e979a71f..83dc4319f661 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -138,7 +138,7 @@ static void intel_flush_svm_range_dev (struct intel_svm *svm, struct intel_svm_d
 	}
 	desc.qw2 = 0;
 	desc.qw3 = 0;
-	qi_submit_sync(&desc, svm->iommu);
+	qi_submit_sync(svm->iommu, &desc, 1, 0);
 
 	if (sdev->dev_iotlb) {
 		desc.qw0 = QI_DEV_EIOTLB_PASID(svm->pasid) |
@@ -162,7 +162,7 @@ static void intel_flush_svm_range_dev (struct intel_svm *svm, struct intel_svm_d
 		}
 		desc.qw2 = 0;
 		desc.qw3 = 0;
-		qi_submit_sync(&desc, svm->iommu);
+		qi_submit_sync(svm->iommu, &desc, 1, 0);
 	}
 }
 
@@ -850,7 +850,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 				       sizeof(req->priv_data));
 			resp.qw2 = 0;
 			resp.qw3 = 0;
-			qi_submit_sync(&resp, iommu);
+			qi_submit_sync(iommu, &resp, 1, 0);
 		}
 		head = (head + sizeof(*req)) & PRQ_RING_MASK;
 	}
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 81e43c1df7ec..a042f123b091 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -151,7 +151,7 @@ static int qi_flush_iec(struct intel_iommu *iommu, int index, int mask)
 	desc.qw2 = 0;
 	desc.qw3 = 0;
 
-	return qi_submit_sync(&desc, iommu);
+	return qi_submit_sync(iommu, &desc, 1, 0);
 }
 
 static int modify_irte(struct irq_2_iommu *irq_iommu,
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index cfe720f10112..cca1e5f9aeaa 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -333,6 +333,7 @@ enum {
 
 #define QI_IWD_STATUS_DATA(d)	(((u64)d) << 32)
 #define QI_IWD_STATUS_WRITE	(((u64)1) << 5)
+#define QI_IWD_PRQ_DRAIN	(((u64)1) << 7)
 
 #define QI_IOTLB_DID(did) 	(((u64)did) << 16)
 #define QI_IOTLB_DR(dr) 	(((u64)dr) << 7)
@@ -710,7 +711,13 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu,
 			  int pasid);
 
-extern int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu);
+int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
+		   unsigned int count, unsigned long options);
+/*
+ * Options used in qi_submit_sync:
+ * QI_OPT_WAIT_DRAIN - Wait for PRQ drain completion, spec 6.5.2.8.
+ */
+#define QI_OPT_WAIT_DRAIN		BIT(0)
 
 extern int dmar_ir_support(void);
 
-- 
2.17.1



* [PATCH v3 2/4] iommu/vt-d: debugfs: Add support to show inv queue internals
From: Lu Baolu @ 2020-04-22  8:06 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: ashok.raj, jacob.jun.pan, Liu Yi L, kevin.tian, iommu,
	linux-kernel, Lu Baolu

Export the invalidation queue internals of each IOMMU device through
debugfs.

Example of such dump on a Skylake machine:

$ sudo cat /sys/kernel/debug/iommu/intel/invalidation_queue
Invalidation queue on IOMMU: dmar1
 Base: 0x1672c9000      Head: 80        Tail: 80
Index           qw0                     qw1                     status
    0   0000000000000004        0000000000000000        0000000000000000
    1   0000000200000025        00000001672be804        0000000000000000
    2   0000000000000011        0000000000000000        0000000000000000
    3   0000000200000025        00000001672be80c        0000000000000000
    4   00000000000000d2        0000000000000000        0000000000000000
    5   0000000200000025        00000001672be814        0000000000000000
    6   0000000000000014        0000000000000000        0000000000000000
    7   0000000200000025        00000001672be81c        0000000000000000
    8   0000000000000014        0000000000000000        0000000000000000
    9   0000000200000025        00000001672be824        0000000000000000
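
The low four bits of qw0 encode the descriptor type (assuming the
standard VT-d queued invalidation descriptor encoding in bits 3:0), so
the dump can be decoded by eye. For instance, in the sample above:

	0x0000000000000004	type 0x4: interrupt entry cache invalidate
	0x0000000200000025	type 0x5: invalidation wait descriptor,
				with qw1 holding the status-write address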

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-iommu-debugfs.c | 62 +++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/drivers/iommu/intel-iommu-debugfs.c b/drivers/iommu/intel-iommu-debugfs.c
index 3eb1fe240fb0..e3089865b8f3 100644
--- a/drivers/iommu/intel-iommu-debugfs.c
+++ b/drivers/iommu/intel-iommu-debugfs.c
@@ -372,6 +372,66 @@ static int domain_translation_struct_show(struct seq_file *m, void *unused)
 }
 DEFINE_SHOW_ATTRIBUTE(domain_translation_struct);
 
+static void invalidation_queue_entry_show(struct seq_file *m,
+					  struct intel_iommu *iommu)
+{
+	int index, shift = qi_shift(iommu);
+	struct qi_desc *desc;
+	int offset;
+
+	if (ecap_smts(iommu->ecap))
+		seq_puts(m, "Index\t\tqw0\t\t\tqw1\t\t\tqw2\t\t\tqw3\t\t\tstatus\n");
+	else
+		seq_puts(m, "Index\t\tqw0\t\t\tqw1\t\t\tstatus\n");
+
+	for (index = 0; index < QI_LENGTH; index++) {
+		offset = index << shift;
+		desc = iommu->qi->desc + offset;
+		if (ecap_smts(iommu->ecap))
+			seq_printf(m, "%5d\t%016llx\t%016llx\t%016llx\t%016llx\t%016x\n",
+				   index, desc->qw0, desc->qw1,
+				   desc->qw2, desc->qw3,
+				   iommu->qi->desc_status[index]);
+		else
+			seq_printf(m, "%5d\t%016llx\t%016llx\t%016x\n",
+				   index, desc->qw0, desc->qw1,
+				   iommu->qi->desc_status[index]);
+	}
+}
+
+static int invalidation_queue_show(struct seq_file *m, void *unused)
+{
+	struct dmar_drhd_unit *drhd;
+	struct intel_iommu *iommu;
+	unsigned long flags;
+	struct q_inval *qi;
+	int shift;
+
+	rcu_read_lock();
+	for_each_active_iommu(iommu, drhd) {
+		qi = iommu->qi;
+		shift = qi_shift(iommu);
+
+		if (!qi || !ecap_qis(iommu->ecap))
+			continue;
+
+		seq_printf(m, "Invalidation queue on IOMMU: %s\n", iommu->name);
+
+		raw_spin_lock_irqsave(&qi->q_lock, flags);
+		seq_printf(m, " Base: 0x%llx\tHead: %lld\tTail: %lld\n",
+			   virt_to_phys(qi->desc),
+			   dmar_readq(iommu->reg + DMAR_IQH_REG) >> shift,
+			   dmar_readq(iommu->reg + DMAR_IQT_REG) >> shift);
+		invalidation_queue_entry_show(m, iommu);
+		raw_spin_unlock_irqrestore(&qi->q_lock, flags);
+		seq_putc(m, '\n');
+	}
+	rcu_read_unlock();
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(invalidation_queue);
+
 #ifdef CONFIG_IRQ_REMAP
 static void ir_tbl_remap_entry_show(struct seq_file *m,
 				    struct intel_iommu *iommu)
@@ -490,6 +550,8 @@ void __init intel_iommu_debugfs_init(void)
 	debugfs_create_file("domain_translation_struct", 0444,
 			    intel_iommu_debug, NULL,
 			    &domain_translation_struct_fops);
+	debugfs_create_file("invalidation_queue", 0444, intel_iommu_debug,
+			    NULL, &invalidation_queue_fops);
 #ifdef CONFIG_IRQ_REMAP
 	debugfs_create_file("ir_translation_struct", 0444, intel_iommu_debug,
 			    NULL, &ir_translation_struct_fops);
-- 
2.17.1



* [PATCH v3 3/4] iommu/vt-d: Add page request draining support
From: Lu Baolu @ 2020-04-22  8:06 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: ashok.raj, jacob.jun.pan, Liu Yi L, kevin.tian, iommu,
	linux-kernel, Lu Baolu

When a PASID is stopped or terminated, there can be pending PRQs (page
requests that have not yet received responses) in the remapping
hardware. Add an interface to drain page requests and call it when a
PASID is terminated.
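
With this patch applied, every PASID teardown path follows the same
pattern (a sketch of the hunks below):

	/* Block the PASID so that no new page requests are generated. */
	intel_pasid_tear_down_entry(iommu, dev, pasid);
	/* Drain pending requests and responses, per VT-d spec ch. 7.10. */
	intel_svm_drain_prq(dev, pasid);
	/* Flush remaining cached translations for this PASID. */
	intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);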

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-svm.c   | 103 ++++++++++++++++++++++++++++++++++--
 include/linux/intel-iommu.h |   4 ++
 2 files changed, 102 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 83dc4319f661..2534641ef707 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -23,6 +23,7 @@
 #include "intel-pasid.h"
 
 static irqreturn_t prq_event_thread(int irq, void *d);
+static void intel_svm_drain_prq(struct device *dev, int pasid);
 
 #define PRQ_ORDER 0
 
@@ -66,6 +67,8 @@ int intel_svm_enable_prq(struct intel_iommu *iommu)
 	dmar_writeq(iommu->reg + DMAR_PQT_REG, 0ULL);
 	dmar_writeq(iommu->reg + DMAR_PQA_REG, virt_to_phys(iommu->prq) | PRQ_ORDER);
 
+	init_completion(&iommu->prq_complete);
+
 	return 0;
 }
 
@@ -208,6 +211,7 @@ static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
 	rcu_read_lock();
 	list_for_each_entry_rcu(sdev, &svm->devs, list) {
 		intel_pasid_tear_down_entry(svm->iommu, sdev->dev, svm->pasid);
+		intel_svm_drain_prq(sdev->dev, svm->pasid);
 		intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
 	}
 	rcu_read_unlock();
@@ -401,12 +405,8 @@ int intel_svm_unbind_gpasid(struct device *dev, int pasid)
 		if (!sdev->users) {
 			list_del_rcu(&sdev->list);
 			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
+			intel_svm_drain_prq(dev, svm->pasid);
 			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
-			/* TODO: Drain in flight PRQ for the PASID since it
-			 * may get reused soon, we don't want to
-			 * confuse with its previous life.
-			 * intel_svm_drain_prq(dev, pasid);
-			 */
 			kfree_rcu(sdev, rcu);
 
 			if (list_empty(&svm->devs)) {
@@ -644,6 +644,7 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
 			 * large and has to be physically contiguous. So it's
 			 * hard to be as defensive as we might like. */
 			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
+			intel_svm_drain_prq(dev, svm->pasid);
 			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
 			kfree_rcu(sdev, rcu);
 
@@ -722,6 +723,92 @@ static bool is_canonical_address(u64 addr)
 	return (((saddr << shift) >> shift) == saddr);
 }
 
+/**
+ * intel_svm_drain_prq:
+ *
+ * Drain all pending page requests and responses related to a specific
+ * pasid in both software and hardware.
+ */
+static void intel_svm_drain_prq(struct device *dev, int pasid)
+{
+	struct device_domain_info *info;
+	struct dmar_domain *domain;
+	struct intel_iommu *iommu;
+	struct qi_desc desc[3];
+	struct pci_dev *pdev;
+	int head, tail;
+	u16 sid, did;
+	int qdep;
+
+	info = get_domain_info(dev);
+	if (WARN_ON(!info || !dev_is_pci(dev)))
+		return;
+
+	if (!info->ats_enabled)
+		return;
+
+	iommu = info->iommu;
+	domain = info->domain;
+	pdev = to_pci_dev(dev);
+	sid = PCI_DEVID(info->bus, info->devfn);
+	did = domain->iommu_did[iommu->seq_id];
+	qdep = pci_ats_queue_depth(pdev);
+
+	memset(desc, 0, sizeof(desc));
+	desc[0].qw0 = QI_IWD_STATUS_DATA(QI_DONE) |
+			QI_IWD_FENCE |
+			QI_IWD_TYPE;
+	desc[1].qw0 = QI_EIOTLB_PASID(pasid) |
+			QI_EIOTLB_DID(did) |
+			QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) |
+			QI_EIOTLB_TYPE;
+	desc[2].qw0 = QI_DEV_EIOTLB_PASID(pasid) |
+			QI_DEV_EIOTLB_SID(sid) |
+			QI_DEV_EIOTLB_QDEP(qdep) |
+			QI_DEIOTLB_TYPE |
+			QI_DEV_IOTLB_PFSID(info->pfsid);
+
+	/*
+	 * Submit an invalidation wait descriptor with the fence and page
+	 * request drain flags set to the invalidation queue. This ensures
+	 * that all requests submitted ahead of this wait descriptor are
+	 * processed and completed, and that all page requests already
+	 * issued by the device have been put in the page request queue.
+	 */
+	qi_submit_sync(iommu, desc, 1, QI_OPT_WAIT_DRAIN);
+
+	/*
+	 * Check and wait until all pending page requests in the queue are
+	 * handled by the intr thread.
+	 */
+prq_retry:
+	tail = dmar_readq(iommu->reg + DMAR_PQT_REG) & PRQ_RING_MASK;
+	head = dmar_readq(iommu->reg + DMAR_PQH_REG) & PRQ_RING_MASK;
+	while (head != tail) {
+		struct page_req_dsc *req;
+
+		req = &iommu->prq[head / sizeof(*req)];
+		if (!req->pasid_present || req->pasid != pasid) {
+			head = (head + sizeof(*req)) & PRQ_RING_MASK;
+			continue;
+		}
+
+		wait_for_completion_timeout(&iommu->prq_complete, HZ);
+		goto prq_retry;
+	}
+
+	/*
+	 * Perform steps described in VT-d spec CH7.10 to drain page
+	 * requests and responses in hardware.
+	 */
+qi_retry:
+	qi_submit_sync(iommu, desc, 3, QI_OPT_WAIT_DRAIN);
+	if (readl(iommu->reg + DMAR_PRS_REG) & DMA_PRS_PRO) {
+		wait_for_completion_timeout(&iommu->prq_complete, HZ);
+		goto qi_retry;
+	}
+}
+
 static irqreturn_t prq_event_thread(int irq, void *d)
 {
 	struct intel_iommu *iommu = d;
@@ -856,6 +943,12 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 	}
 
 	dmar_writeq(iommu->reg + DMAR_PQH_REG, tail);
+	/*
+	 * Clear the page request overflow bit and wake up all threads that
+	 * are waiting for the completion of this handling.
+	 */
+	writel(DMA_PRS_PRO, iommu->reg + DMAR_PRS_REG);
+	complete(&iommu->prq_complete);
 
 	return IRQ_RETVAL(handled);
 }
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index cca1e5f9aeaa..a0512b401a59 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -292,6 +292,8 @@
 
 /* PRS_REG */
 #define DMA_PRS_PPR	((u32)1)
+#define DMA_PRS_PRO	((u32)2)
+
 #define DMA_VCS_PAS	((u64)1)
 
 #define IOMMU_WAIT_OP(iommu, offset, op, cond, sts)			\
@@ -333,6 +335,7 @@ enum {
 
 #define QI_IWD_STATUS_DATA(d)	(((u64)d) << 32)
 #define QI_IWD_STATUS_WRITE	(((u64)1) << 5)
+#define QI_IWD_FENCE		(((u64)1) << 6)
 #define QI_IWD_PRQ_DRAIN	(((u64)1) << 7)
 
 #define QI_IOTLB_DID(did) 	(((u64)did) << 16)
@@ -590,6 +593,7 @@ struct intel_iommu {
 #ifdef CONFIG_INTEL_IOMMU_SVM
 	struct page_req_dsc *prq;
 	unsigned char prq_name[16];    /* Name for PRQ interrupt */
+	struct completion prq_complete;
 	struct ioasid_allocator_ops pasid_allocator; /* Custom allocator for PASIDs */
 #endif
 	struct q_inval  *qi;            /* Queued invalidation info */
-- 
2.17.1



* [PATCH v3 4/4] iommu/vt-d: Remove redundant IOTLB flush
From: Lu Baolu @ 2020-04-22  8:06 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: ashok.raj, jacob.jun.pan, Liu Yi L, kevin.tian, iommu,
	linux-kernel, Lu Baolu

The IOTLB flush is already included in the PASID tear-down and the page
request drain process. There is no need to flush again.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-svm.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 2534641ef707..543109fe2003 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -212,7 +212,6 @@ static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
 	list_for_each_entry_rcu(sdev, &svm->devs, list) {
 		intel_pasid_tear_down_entry(svm->iommu, sdev->dev, svm->pasid);
 		intel_svm_drain_prq(sdev->dev, svm->pasid);
-		intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
 	}
 	rcu_read_unlock();
 
@@ -406,7 +405,6 @@ int intel_svm_unbind_gpasid(struct device *dev, int pasid)
 			list_del_rcu(&sdev->list);
 			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
 			intel_svm_drain_prq(dev, svm->pasid);
-			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
 			kfree_rcu(sdev, rcu);
 
 			if (list_empty(&svm->devs)) {
@@ -645,7 +643,6 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
 			 * hard to be as defensive as we might like. */
 			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
 			intel_svm_drain_prq(dev, svm->pasid);
-			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
 			kfree_rcu(sdev, rcu);
 
 			if (list_empty(&svm->devs)) {
-- 
2.17.1



* Re: [PATCH v3 0/4] iommu/vt-d: Add page request draining support
From: Lu Baolu @ 2020-04-28  1:47 UTC (permalink / raw)
  To: kevin.tian
  Cc: Joerg Roedel, baolu.lu, ashok.raj, jacob.jun.pan, Liu Yi L,
	iommu, linux-kernel

Hi Kevin,

On 2020/4/22 16:06, Lu Baolu wrote:
> When a PASID is stopped or terminated, there can be pending PRQs (page
> requests that have not yet received responses) in software and in the
> remapping hardware. These pending page requests must be drained before
> the PASID can be reused. Chapter 7.10 of the VT-d specification
> describes the software steps for draining pending page requests and
> responses.
> 
> This series consists of two parts:
>   - PATCH 1/4 ~ 2/4: refactor qi_submit_sync() to support multiple
>     descriptors per submission, which will be used by the following
>     patches.
>   - PATCH 3/4 ~ 4/4: add page request draining support after a PASID
>     entry is torn down.
> 
> Please help review.
> 
> Best regards,
> baolu
> 
> Change log:
>   v2->v3:
>    - Address Kevin's review comments
>      - Squash the first 2 patches together;
>      - The prq thread is serialized, no need to consider reentrance;
>      - Ensure no new-coming prq before drain prq in queue;
>      - Handle page request overflow case.

Your review comments are very much appreciated.

How about these changes? Any comments?

Best regards,
baolu


* Re: [PATCH v3 3/4] iommu/vt-d: Add page request draining support
From: Jacob Pan @ 2020-04-29  3:36 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Joerg Roedel, ashok.raj, Liu Yi L, kevin.tian, iommu,
	linux-kernel, jacob.jun.pan

On Wed, 22 Apr 2020 16:06:10 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> When a PASID is stopped or terminated, there can be pending PRQs (page
> requests that have not yet received responses) in the remapping
> hardware. Add an interface to drain page requests and call it when a
> PASID is terminated.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel-svm.c   | 103 ++++++++++++++++++++++++++++++++++--
>  include/linux/intel-iommu.h |   4 ++
>  2 files changed, 102 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 83dc4319f661..2534641ef707 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -23,6 +23,7 @@
>  #include "intel-pasid.h"
>  
>  static irqreturn_t prq_event_thread(int irq, void *d);
> +static void intel_svm_drain_prq(struct device *dev, int pasid);
>  
>  #define PRQ_ORDER 0
>  
> @@ -66,6 +67,8 @@ int intel_svm_enable_prq(struct intel_iommu *iommu)
>  	dmar_writeq(iommu->reg + DMAR_PQT_REG, 0ULL);
>  	dmar_writeq(iommu->reg + DMAR_PQA_REG, virt_to_phys(iommu->prq) | PRQ_ORDER);
> +	init_completion(&iommu->prq_complete);
> +
>  	return 0;
>  }
>  
> @@ -208,6 +211,7 @@ static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
>  	rcu_read_lock();
>  	list_for_each_entry_rcu(sdev, &svm->devs, list) {
>  		intel_pasid_tear_down_entry(svm->iommu, sdev->dev, svm->pasid);
> +		intel_svm_drain_prq(sdev->dev, svm->pasid);
mmu_notifier release is called in atomic context, but drain_prq needs to
wait for completion. I tested the exit path when a process crashes and got:

[  +0.696214] BUG: sleeping function called from invalid context at kernel/sched/completion.c:101
[  +0.000068] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3235, name: dsatest
[  +0.000046] INFO: lockdep is turned off.
[  +0.000002] CPU: 1 PID: 3235 Comm: dsatest Not tainted 5.7.0-rc1-z-svmtest+ #1637
[  +0.000000] Hardware name: Intel Corporation Kabylake Client platform/Skylake Halo DDR4 RVP11, BIOS 04.1709050855 09/05/2017
[  +0.000001] Call Trace:
[  +0.000004]  dump_stack+0x68/0x9b
[  +0.000003]  ___might_sleep+0x229/0x250
[  +0.000003]  wait_for_completion_timeout+0x3c/0x110
[  +0.000003]  intel_svm_drain_prq+0x12f/0x210
[  +0.000003]  intel_mm_release+0x73/0x110
[  +0.000003]  __mmu_notifier_release+0x94/0x220
[  +0.000002]  ? do_munmap+0x10/0x10
[  +0.000002]  ? prepare_ftrace_return+0x5c/0x80
[  +0.000003]  exit_mmap+0x156/0x1a0
[  +0.000002]  ? mmput+0x44/0x120
[  +0.000003]  ? exit_mmap+0x5/0x1a0
[  +0.000002]  ? ftrace_graph_caller+0xa0/0xa0
[  +0.000001]  mmput+0x5e/0x120


>  		intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
>  	}
>  	rcu_read_unlock();
> @@ -401,12 +405,8 @@ int intel_svm_unbind_gpasid(struct device *dev, int pasid)
>  		if (!sdev->users) {
>  			list_del_rcu(&sdev->list);
>  			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
> +			intel_svm_drain_prq(dev, svm->pasid);
>  			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
> -			/* TODO: Drain in flight PRQ for the PASID since it
> -			 * may get reused soon, we don't want to
> -			 * confuse with its previous life.
> -			 * intel_svm_drain_prq(dev, pasid);
> -			 */
>  			kfree_rcu(sdev, rcu);
>  
>  			if (list_empty(&svm->devs)) {
> @@ -644,6 +644,7 @@ int intel_svm_unbind_mm(struct device *dev, int
> pasid)
>  			 * large and has to be physically
> contiguous. So it's
>  			 * hard to be as defensive as we might like.
> */ intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
> +			intel_svm_drain_prq(dev, svm->pasid);
>  			intel_flush_svm_range_dev(svm, sdev, 0, -1,
> 0); kfree_rcu(sdev, rcu);
>  
> @@ -722,6 +723,92 @@ static bool is_canonical_address(u64 addr)
>  	return (((saddr << shift) >> shift) == saddr);
>  }
>  
> +/**
> + * intel_svm_drain_prq:
> + *
> + * Drain all pending page requests and responses related to a specific
> + * pasid in both software and hardware.
> + */
> +static void intel_svm_drain_prq(struct device *dev, int pasid)
> +{
> +	struct device_domain_info *info;
> +	struct dmar_domain *domain;
> +	struct intel_iommu *iommu;
> +	struct qi_desc desc[3];
> +	struct pci_dev *pdev;
> +	int head, tail;
> +	u16 sid, did;
> +	int qdep;
> +
> +	info = get_domain_info(dev);
> +	if (WARN_ON(!info || !dev_is_pci(dev)))
> +		return;
> +
> +	if (!info->ats_enabled)
> +		return;
> +
> +	iommu = info->iommu;
> +	domain = info->domain;
> +	pdev = to_pci_dev(dev);
> +	sid = PCI_DEVID(info->bus, info->devfn);
> +	did = domain->iommu_did[iommu->seq_id];
> +	qdep = pci_ats_queue_depth(pdev);
> +
> +	memset(desc, 0, sizeof(desc));
> +	desc[0].qw0 = QI_IWD_STATUS_DATA(QI_DONE) |
> +			QI_IWD_FENCE |
> +			QI_IWD_TYPE;
> +	desc[1].qw0 = QI_EIOTLB_PASID(pasid) |
> +			QI_EIOTLB_DID(did) |
> +			QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) |
> +			QI_EIOTLB_TYPE;
> +	desc[2].qw0 = QI_DEV_EIOTLB_PASID(pasid) |
> +			QI_DEV_EIOTLB_SID(sid) |
> +			QI_DEV_EIOTLB_QDEP(qdep) |
> +			QI_DEIOTLB_TYPE |
> +			QI_DEV_IOTLB_PFSID(info->pfsid);
> +
> +	/*
> +	 * Submit an invalidation wait descriptor with fence and page request
> +	 * drain flags set to invalidation queue. This ensures that all
> +	 * requests submitted to the invalidation queue ahead of this wait
> +	 * descriptor are processed and completed, and all already issued
> +	 * page requests from the device are put in the page request queue.
> +	 */
> +	qi_submit_sync(iommu, desc, 1, QI_OPT_WAIT_DRAIN);
> +
> +	/*
> +	 * Check and wait until all pending page requests in the queue are
> +	 * handled by the intr thread.
> +	 */
> +prq_retry:
> +	tail = dmar_readq(iommu->reg + DMAR_PQT_REG) & PRQ_RING_MASK;
> +	head = dmar_readq(iommu->reg + DMAR_PQH_REG) & PRQ_RING_MASK;
> +	while (head != tail) {
> +		struct page_req_dsc *req;
> +
> +		req = &iommu->prq[head / sizeof(*req)];
> +		if (!req->pasid_present || req->pasid != pasid) {
> +			head = (head + sizeof(*req)) & PRQ_RING_MASK;
> +			continue;
> +		}
> +
> +		wait_for_completion_timeout(&iommu->prq_complete, HZ);
> +		goto prq_retry;
> +	}
> +
> +	/*
> +	 * Perform steps described in VT-d spec CH7.10 to drain page
> +	 * requests and responses in hardware.
> +	 */
> +qi_retry:
> +	qi_submit_sync(iommu, desc, 3, QI_OPT_WAIT_DRAIN);
> +	if (readl(iommu->reg + DMAR_PRS_REG) & DMA_PRS_PRO) {
> +		wait_for_completion_timeout(&iommu->prq_complete, HZ);
> +		goto qi_retry;
> +	}
> +}
> +
>  static irqreturn_t prq_event_thread(int irq, void *d)
>  {
>  	struct intel_iommu *iommu = d;
> @@ -856,6 +943,12 @@ static irqreturn_t prq_event_thread(int irq, void *d)
>  	}
>  
>  	dmar_writeq(iommu->reg + DMAR_PQH_REG, tail);
> +	/*
> +	 * Clear the page request overflow bit and wake up all threads that
> +	 * are waiting for the completion of this handling.
> +	 */
> +	writel(DMA_PRS_PRO, iommu->reg + DMAR_PRS_REG);
> +	complete(&iommu->prq_complete);
>  
>  	return IRQ_RETVAL(handled);
>  }
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index cca1e5f9aeaa..a0512b401a59 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -292,6 +292,8 @@
>  
>  /* PRS_REG */
>  #define DMA_PRS_PPR	((u32)1)
> +#define DMA_PRS_PRO	((u32)2)
> +
>  #define DMA_VCS_PAS	((u64)1)
>  
>  #define IOMMU_WAIT_OP(iommu, offset, op, cond, sts)			\
> @@ -333,6 +335,7 @@ enum {
>  
>  #define QI_IWD_STATUS_DATA(d)	(((u64)d) << 32)
>  #define QI_IWD_STATUS_WRITE	(((u64)1) << 5)
> +#define QI_IWD_FENCE		(((u64)1) << 6)
>  #define QI_IWD_PRQ_DRAIN	(((u64)1) << 7)
>  
>  #define QI_IOTLB_DID(did) 	(((u64)did) << 16)
> @@ -590,6 +593,7 @@ struct intel_iommu {
>  #ifdef CONFIG_INTEL_IOMMU_SVM
>  	struct page_req_dsc *prq;
>  	unsigned char prq_name[16];    /* Name for PRQ interrupt */
> +	struct completion prq_complete;
>  	struct ioasid_allocator_ops pasid_allocator; /* Custom
> allocator for PASIDs */ #endif
>  	struct q_inval  *qi;            /* Queued invalidation info
> */
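
As an aside on the prq_retry loop above: head and tail are byte offsets
into the page request queue, so indexing divides by the descriptor size
and wrap-around uses a power-of-two mask. A self-contained userspace
sketch of the same arithmetic (hypothetical sizes and names, not the
driver code):

	#include <stdio.h>
	#include <string.h>

	#define RING_SIZE 4096			/* hypothetical; must be a power of two */
	#define RING_MASK (RING_SIZE - 1)

	struct req { int pasid; };		/* stand-in for struct page_req_dsc */

	/* Visit every descriptor from head to tail, wrapping at the end
	 * of the ring, the same way the prq_retry loop walks the PRQ. */
	static void walk_ring(struct req *ring, unsigned int head, unsigned int tail)
	{
		while (head != tail) {
			struct req *r = &ring[head / sizeof(*r)];
			printf("pasid %d\n", r->pasid);
			head = (head + sizeof(*r)) & RING_MASK;
		}
	}

	int main(void)
	{
		struct req ring[RING_SIZE / sizeof(struct req)];

		memset(ring, 0, sizeof(ring));
		walk_ring(ring, 0, 4 * sizeof(struct req));	/* visits four entries */
		return 0;
	}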

[Jacob Pan]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 3/4] iommu/vt-d: Add page request draining support
  2020-04-29  3:36     ` Jacob Pan
@ 2020-04-29  6:00       ` Lu Baolu
  -1 siblings, 0 replies; 18+ messages in thread
From: Lu Baolu @ 2020-04-29  6:00 UTC (permalink / raw)
  To: Jacob Pan
  Cc: baolu.lu, Joerg Roedel, ashok.raj, Liu Yi L, kevin.tian, iommu,
	linux-kernel

Hi Jacob,

On 2020/4/29 11:36, Jacob Pan wrote:
> On Wed, 22 Apr 2020 16:06:10 +0800
> Lu Baolu<baolu.lu@linux.intel.com>  wrote:
> 
>> When a PASID is stopped or terminated, there can be pending PRQs
>> (requests that haven't received responses) in remapping hardware.
>> This adds the interface to drain page requests and call it when a
>> PASID is terminated.
>>
>> Signed-off-by: Jacob Pan<jacob.jun.pan@linux.intel.com>
>> Signed-off-by: Liu Yi L<yi.l.liu@intel.com>
>> Signed-off-by: Lu Baolu<baolu.lu@linux.intel.com>
>> ---
>>   drivers/iommu/intel-svm.c   | 103 ++++++++++++++++++++++++++++++++++--
>>   include/linux/intel-iommu.h |   4 ++
>>   2 files changed, 102 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
>> index 83dc4319f661..2534641ef707 100644
>> --- a/drivers/iommu/intel-svm.c
>> +++ b/drivers/iommu/intel-svm.c
>> @@ -23,6 +23,7 @@
>>   #include "intel-pasid.h"
>>   
>>   static irqreturn_t prq_event_thread(int irq, void *d);
>> +static void intel_svm_drain_prq(struct device *dev, int pasid);
>>   
>>   #define PRQ_ORDER 0
>>   
>> @@ -66,6 +67,8 @@ int intel_svm_enable_prq(struct intel_iommu *iommu)
>>   	dmar_writeq(iommu->reg + DMAR_PQT_REG, 0ULL);
>>   	dmar_writeq(iommu->reg + DMAR_PQA_REG, virt_to_phys(iommu->prq) | PRQ_ORDER);
>> +	init_completion(&iommu->prq_complete);
>> +
>>   	return 0;
>>   }
>>   
>> @@ -208,6 +211,7 @@ static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
>>   	rcu_read_lock();
>>   	list_for_each_entry_rcu(sdev, &svm->devs, list) {
>>   		intel_pasid_tear_down_entry(svm->iommu, sdev->dev, svm->pasid);
>> +		intel_svm_drain_prq(sdev->dev, svm->pasid);
> mmu_notifier release is called in atomic context, but drain_prq needs
> to wait for a completion. I tested the exit path when a process
> crashes, and got:
> 
> [  +0.696214] BUG: sleeping function called from invalid context at kernel/sched/completion.c:101
> [  +0.000068] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3235, name: dsatest
> [  +0.000046] INFO: lockdep is turned off.
> [  +0.000002] CPU: 1 PID: 3235 Comm: dsatest Not tainted 5.7.0-rc1-z-svmtest+ #1637
> [  +0.000000] Hardware name: Intel Corporation Kabylake Client platform/Skylake Halo DDR4 RVP11, BIOS
> 04.1709050855 09/05/2017
> [  +0.000001] Call Trace:
> [  +0.000004]  dump_stack+0x68/0x9b
> [  +0.000003]  ___might_sleep+0x229/0x250
> [  +0.000003]  wait_for_completion_timeout+0x3c/0x110
> [  +0.000003]  intel_svm_drain_prq+0x12f/0x210
> [  +0.000003]  intel_mm_release+0x73/0x110
> [  +0.000003]  __mmu_notifier_release+0x94/0x220
> [  +0.000002]  ? do_munmap+0x10/0x10
> [  +0.000002]  ? prepare_ftrace_return+0x5c/0x80
> [  +0.000003]  exit_mmap+0x156/0x1a0
> [  +0.000002]  ? mmput+0x44/0x120
> [  +0.000003]  ? exit_mmap+0x5/0x1a0
> [  +0.000002]  ? ftrace_graph_caller+0xa0/0xa0
> [  +0.000001]  mmput+0x5e/0x120
> 
> 

Thanks a lot!

Actually, we can't drain page requests in this mm_notifier code path,
right? The assumptions behind page request draining are that 1) the
device driver has drained DMA requests on the device end; and 2) the
pasid entry has been marked as non-present. So we can only drain page
requests in the unbind path.
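
As a sketch of the ordering those assumptions imply (the function names
are taken from the patch, but the wrapper and the device-side quiesce
step are hypothetical illustration, not the posted code):

	/*
	 * Hypothetical teardown-ordering sketch for the unbind path.
	 * Draining is only safe once no new page requests can arrive
	 * for this PASID.
	 */
	static void unbind_teardown_sketch(struct intel_iommu *iommu,
					   struct device *dev,
					   struct intel_svm *svm,
					   struct intel_svm_dev *sdev,
					   int pasid)
	{
		/* 0) device driver has already quiesced DMA for the PASID */
		intel_pasid_tear_down_entry(iommu, dev, pasid);	/* 1) entry non-present */
		intel_svm_drain_prq(dev, pasid);		/* 2) drain is now race-free */
		intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);	/* 3) flush stale translations */
	}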

Thoughts?

Best regards,
baolu

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 3/4] iommu/vt-d: Add page request draining support
  2020-04-29  6:00       ` Lu Baolu
@ 2020-04-29 22:23         ` Jacob Pan
  -1 siblings, 0 replies; 18+ messages in thread
From: Jacob Pan @ 2020-04-29 22:23 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Joerg Roedel, ashok.raj, Liu Yi L, kevin.tian, iommu,
	linux-kernel, jacob.jun.pan

On Wed, 29 Apr 2020 14:00:05 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> Hi Jacob,
> 
> On 2020/4/29 11:36, Jacob Pan wrote:
> > On Wed, 22 Apr 2020 16:06:10 +0800
> > Lu Baolu<baolu.lu@linux.intel.com>  wrote:
> >   
> >> When a PASID is stopped or terminated, there can be pending PRQs
> >> (requests that haven't received responses) in remapping hardware.
> >> This adds the interface to drain page requests and call it when a
> >> PASID is terminated.
> >>
> >> Signed-off-by: Jacob Pan<jacob.jun.pan@linux.intel.com>
> >> Signed-off-by: Liu Yi L<yi.l.liu@intel.com>
> >> Signed-off-by: Lu Baolu<baolu.lu@linux.intel.com>
> >> ---
> >>   drivers/iommu/intel-svm.c   | 103 ++++++++++++++++++++++++++++++++++--
> >>   include/linux/intel-iommu.h |   4 ++
> >>   2 files changed, 102 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> >> index 83dc4319f661..2534641ef707 100644
> >> --- a/drivers/iommu/intel-svm.c
> >> +++ b/drivers/iommu/intel-svm.c
> >> @@ -23,6 +23,7 @@
> >>   #include "intel-pasid.h"
> >>   
> >>   static irqreturn_t prq_event_thread(int irq, void *d);
> >> +static void intel_svm_drain_prq(struct device *dev, int pasid);
> >>   
> >>   #define PRQ_ORDER 0
> >>   
> >> @@ -66,6 +67,8 @@ int intel_svm_enable_prq(struct intel_iommu *iommu)
> >>   	dmar_writeq(iommu->reg + DMAR_PQT_REG, 0ULL);
> >>   	dmar_writeq(iommu->reg + DMAR_PQA_REG, virt_to_phys(iommu->prq) | PRQ_ORDER);
> >> +	init_completion(&iommu->prq_complete);
> >> +
> >>   	return 0;
> >>   }
> >>   
> >> @@ -208,6 +211,7 @@ static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
> >>   	rcu_read_lock();
> >>   	list_for_each_entry_rcu(sdev, &svm->devs, list) {
> >>   		intel_pasid_tear_down_entry(svm->iommu, sdev->dev, svm->pasid);
> >> +		intel_svm_drain_prq(sdev->dev, svm->pasid);  
> > mmu_notifier release is called in atomic context, but drain_prq
> > needs to wait for a completion. I tested the exit path when a
> > process crashes, and got:
> > 
> > [  +0.696214] BUG: sleeping function called from invalid context at
> > kernel/sched/completion.c:101 [  +0.000068] in_atomic(): 1,
> > irqs_disabled(): 0, non_block: 0, pid: 3235, name: dsatest
> > [  +0.000046] INFO: lockdep is turned off. [  +0.000002] CPU: 1
> > PID: 3235 Comm: dsatest Not tainted 5.7.0-rc1-z-svmtest+ #1637
> > [  +0.000000] Hardware name: Intel Corporation Kabylake Client
> > platform/Skylake Halo DDR4 RVP11, BIOS 04.1709050855 09/05/2017
> > [  +0.000001] Call Trace: [  +0.000004]  dump_stack+0x68/0x9b
> > [  +0.000003]  ___might_sleep+0x229/0x250
> > [  +0.000003]  wait_for_completion_timeout+0x3c/0x110
> > [  +0.000003]  intel_svm_drain_prq+0x12f/0x210
> > [  +0.000003]  intel_mm_release+0x73/0x110
> > [  +0.000003]  __mmu_notifier_release+0x94/0x220
> > [  +0.000002]  ? do_munmap+0x10/0x10
> > [  +0.000002]  ? prepare_ftrace_return+0x5c/0x80
> > [  +0.000003]  exit_mmap+0x156/0x1a0
> > [  +0.000002]  ? mmput+0x44/0x120
> > [  +0.000003]  ? exit_mmap+0x5/0x1a0
> > [  +0.000002]  ? ftrace_graph_caller+0xa0/0xa0
> > [  +0.000001]  mmput+0x5e/0x120
> > 
> >   
> 
> Thanks a lot!
> 
> Actually, we can't drain page requests in this mm_notifier code path,
> right? The assumptions behind page request draining are that 1) the
> device driver has drained DMA requests on the device end; and 2) the
> pasid entry has been marked as non-present. So we can only drain page
> requests in the unbind path.
> 
> Thoughts?
> 
Right, we can skip the drain here. unbind will come soon after the mm
exits, so even if in-flight PRs come through, we can simply respond
with "Invalid Response" once mm exit starts. The current code already
checks whether the mm is exiting via mmget_not_zero().
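
A condensed sketch of that check as it sits in the request handler (the
response code is the real QI_RESP_INVALID; the surrounding flow and the
bad_req label are simplified from the existing prq_event_thread path):

	/*
	 * If the mm is already exiting, mmget_not_zero() fails and the
	 * fault is answered with an invalid response instead of being
	 * serviced against a dying address space.
	 */
	if (!mmget_not_zero(svm->mm)) {
		result = QI_RESP_INVALID;	/* the "Invalid Response" code */
		goto bad_req;			/* shared error path in the handler */
	}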

> Best regards,
> baolu

[Jacob Pan]

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-04-29 22:17 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-22  8:06 [PATCH v3 0/4] iommu/vt-d: Add page request draining support Lu Baolu
2020-04-22  8:06 ` Lu Baolu
2020-04-22  8:06 ` [PATCH v3 1/4] iommu/vt-d: Multiple descriptors per qi_submit_sync() Lu Baolu
2020-04-22  8:06   ` Lu Baolu
2020-04-22  8:06 ` [PATCH v3 2/4] iommu/vt-d: debugfs: Add support to show inv queue internals Lu Baolu
2020-04-22  8:06   ` Lu Baolu
2020-04-22  8:06 ` [PATCH v3 3/4] iommu/vt-d: Add page request draining support Lu Baolu
2020-04-22  8:06   ` Lu Baolu
2020-04-29  3:36   ` Jacob Pan
2020-04-29  3:36     ` Jacob Pan
2020-04-29  6:00     ` Lu Baolu
2020-04-29  6:00       ` Lu Baolu
2020-04-29 22:23       ` Jacob Pan
2020-04-29 22:23         ` Jacob Pan
2020-04-22  8:06 ` [PATCH v3 4/4] iommu/vt-d: Remove redundant IOTLB flush Lu Baolu
2020-04-22  8:06   ` Lu Baolu
2020-04-28  1:47 ` [PATCH v3 0/4] iommu/vt-d: Add page request draining support Lu Baolu
2020-04-28  1:47   ` Lu Baolu
