Linux-PCI Archive on lore.kernel.org
 help / color / Atom feed
* [RFC PATCH 0/2] iommu: Avoid unnecessary PRI queue flushes
@ 2020-10-15  9:00 Jean-Philippe Brucker
  2020-10-15  9:00 ` [RFC PATCH 1/2] iommu: Add flags to sva_unbind() Jean-Philippe Brucker
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Jean-Philippe Brucker @ 2020-10-15  9:00 UTC (permalink / raw)
  To: dwmw2, baolu.lu, joro, zhangfei.gao, wangzhou1
  Cc: arnd, gregkh, iommu, linux-kernel, linux-accelerators,
	kevin.tian, jacob.jun.pan, ashok.raj, linux-pci,
	Jean-Philippe Brucker

Add a parameter to iommu_sva_unbind_device() that tells the IOMMU driver
whether the PRI queue needs flushing. When looking at the PCIe spec
again I noticed that most of the time the SMMUv3 driver doesn't actually
need to flush the PRI queue. Does this make sense for Intel VT-d as well
or did I overlook something?

Before calling iommu_sva_unbind_device(), device drivers must stop the
device from using the PASID. For PCIe devices, that consists of
completing any pending DMA, and completing any pending page request
unless the device uses Stop Markers. So unless the device uses Stop
Markers, we don't need to flush the PRI queue. For SMMUv3, stopping DMA
means completing all stall events, so we never need to flush the event
queue.

First patch adds flags to unbind(), and the second one lets device
drivers tell whether the PRI queue needs to be flushed.

Other remarks:

* The PCIe spec (see quote on patch 2), says that the device signals
  whether it has sent a Stop Marker or not during Stop PASID. In reality
  it's unlikely that a given device will sometimes use one stop method
  and sometimes the other, so it could be a device-wide flag rather than
  passing it at each unbind(). I don't want to speculate too much about
  future implementation so I prefer having the flag in unbind().

* In patch 1, uacce passes 0 to unbind(). To pass the right flag I'm
  thinking that uacce->ops->stop_queue(), which tells the device driver
  to stop DMA, should return whether faults are pending. This can be
  added later once uacce has an actual PCIe user, but we need to
  remember to do it.

Jean-Philippe Brucker (2):
  iommu: Add flags to sva_unbind()
  iommu: Add IOMMU_UNBIND_FAULT_PENDING flag

 include/linux/intel-iommu.h |  2 +-
 include/linux/iommu.h       | 38 ++++++++++++++++++++++++++++++++++---
 drivers/iommu/intel/svm.c   | 10 ++++++----
 drivers/iommu/iommu.c       | 10 +++++++---
 drivers/misc/uacce/uacce.c  |  4 ++--
 5 files changed, 51 insertions(+), 13 deletions(-)

-- 
2.28.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC PATCH 1/2] iommu: Add flags to sva_unbind()
  2020-10-15  9:00 [RFC PATCH 0/2] iommu: Avoid unnecessary PRI queue flushes Jean-Philippe Brucker
@ 2020-10-15  9:00 ` Jean-Philippe Brucker
  2020-10-15  9:00 ` [RFC PATCH 2/2] iommu: Add IOMMU_UNBIND_FAULT_PENDING flag Jean-Philippe Brucker
  2020-10-15 18:22 ` [RFC PATCH 0/2] iommu: Avoid unnecessary PRI queue flushes Raj, Ashok
  2 siblings, 0 replies; 10+ messages in thread
From: Jean-Philippe Brucker @ 2020-10-15  9:00 UTC (permalink / raw)
  To: dwmw2, baolu.lu, joro, zhangfei.gao, wangzhou1
  Cc: arnd, gregkh, iommu, linux-kernel, linux-accelerators,
	kevin.tian, jacob.jun.pan, ashok.raj, linux-pci,
	Jean-Philippe Brucker

Provide a way for device drivers to tell IOMMU drivers about the device
state and the cleanup work to be done, when unbinding. No functional
change.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 include/linux/intel-iommu.h | 2 +-
 include/linux/iommu.h       | 7 ++++---
 drivers/iommu/intel/svm.c   | 7 ++++---
 drivers/iommu/iommu.c       | 5 +++--
 drivers/misc/uacce/uacce.c  | 4 ++--
 5 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index fbf5b3e7707e..5b66b23d591d 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -747,7 +747,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
 int intel_svm_unbind_gpasid(struct device *dev, u32 pasid);
 struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm,
 				 void *drvdata);
-void intel_svm_unbind(struct iommu_sva *handle);
+void intel_svm_unbind(struct iommu_sva *handle, unsigned long flags);
 u32 intel_svm_get_pasid(struct iommu_sva *handle);
 int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt,
 			    struct iommu_page_response *msg);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b95a6f8db6ff..26c1358a2a37 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -285,7 +285,7 @@ struct iommu_ops {
 
 	struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm,
 				      void *drvdata);
-	void (*sva_unbind)(struct iommu_sva *handle);
+	void (*sva_unbind)(struct iommu_sva *handle, unsigned long flags);
 	u32 (*sva_get_pasid)(struct iommu_sva *handle);
 
 	int (*page_response)(struct device *dev,
@@ -636,7 +636,7 @@ int iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev);
 struct iommu_sva *iommu_sva_bind_device(struct device *dev,
 					struct mm_struct *mm,
 					void *drvdata);
-void iommu_sva_unbind_device(struct iommu_sva *handle);
+void iommu_sva_unbind_device(struct iommu_sva *handle, unsigned long flags);
 u32 iommu_sva_get_pasid(struct iommu_sva *handle);
 
 #else /* CONFIG_IOMMU_API */
@@ -1026,7 +1026,8 @@ iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
 	return NULL;
 }
 
-static inline void iommu_sva_unbind_device(struct iommu_sva *handle)
+static inline void iommu_sva_unbind_device(struct iommu_sva *handle,
+					   unsigned long flags)
 {
 }
 
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index f1861fa3d0e4..700b05612af9 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -651,7 +651,8 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
 }
 
 /* Caller must hold pasid_mutex */
-static int intel_svm_unbind_mm(struct device *dev, u32 pasid)
+static int intel_svm_unbind_mm(struct device *dev, u32 pasid,
+			       unsigned long flags)
 {
 	struct intel_svm_dev *sdev;
 	struct intel_iommu *iommu;
@@ -1091,13 +1092,13 @@ intel_svm_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
 	return sva;
 }
 
-void intel_svm_unbind(struct iommu_sva *sva)
+void intel_svm_unbind(struct iommu_sva *sva, unsigned long flags)
 {
 	struct intel_svm_dev *sdev;
 
 	mutex_lock(&pasid_mutex);
 	sdev = to_intel_svm_dev(sva);
-	intel_svm_unbind_mm(sdev->dev, sdev->pasid);
+	intel_svm_unbind_mm(sdev->dev, sdev->pasid, flags);
 	mutex_unlock(&pasid_mutex);
 }
 
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8c470f451a32..741c463095a8 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2991,6 +2991,7 @@ EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
 /**
  * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device
  * @handle: the handle returned by iommu_sva_bind_device()
+ * @flags: IOMMU_UNBIND_* flags
  *
  * Put reference to a bond between device and address space. The device should
  * not be issuing any more transaction for this PASID. All outstanding page
@@ -2998,7 +2999,7 @@ EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
  *
  * Returns 0 on success, or an error value
  */
-void iommu_sva_unbind_device(struct iommu_sva *handle)
+void iommu_sva_unbind_device(struct iommu_sva *handle, unsigned long flags)
 {
 	struct iommu_group *group;
 	struct device *dev = handle->dev;
@@ -3012,7 +3013,7 @@ void iommu_sva_unbind_device(struct iommu_sva *handle)
 		return;
 
 	mutex_lock(&group->mutex);
-	ops->sva_unbind(handle);
+	ops->sva_unbind(handle, flags);
 	mutex_unlock(&group->mutex);
 
 	iommu_group_put(group);
diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index 56dd98ab5a81..0800566a6656 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -105,7 +105,7 @@ static int uacce_bind_queue(struct uacce_device *uacce, struct uacce_queue *q)
 
 	pasid = iommu_sva_get_pasid(handle);
 	if (pasid == IOMMU_PASID_INVALID) {
-		iommu_sva_unbind_device(handle);
+		iommu_sva_unbind_device(handle, 0);
 		return -ENODEV;
 	}
 
@@ -118,7 +118,7 @@ static void uacce_unbind_queue(struct uacce_queue *q)
 {
 	if (!q->handle)
 		return;
-	iommu_sva_unbind_device(q->handle);
+	iommu_sva_unbind_device(q->handle, 0);
 	q->handle = NULL;
 }
 
-- 
2.28.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC PATCH 2/2] iommu: Add IOMMU_UNBIND_FAULT_PENDING flag
  2020-10-15  9:00 [RFC PATCH 0/2] iommu: Avoid unnecessary PRI queue flushes Jean-Philippe Brucker
  2020-10-15  9:00 ` [RFC PATCH 1/2] iommu: Add flags to sva_unbind() Jean-Philippe Brucker
@ 2020-10-15  9:00 ` Jean-Philippe Brucker
  2020-10-16  7:07   ` Christoph Hellwig
  2020-10-15 18:22 ` [RFC PATCH 0/2] iommu: Avoid unnecessary PRI queue flushes Raj, Ashok
  2 siblings, 1 reply; 10+ messages in thread
From: Jean-Philippe Brucker @ 2020-10-15  9:00 UTC (permalink / raw)
  To: dwmw2, baolu.lu, joro, zhangfei.gao, wangzhou1
  Cc: arnd, gregkh, iommu, linux-kernel, linux-accelerators,
	kevin.tian, jacob.jun.pan, ashok.raj, linux-pci,
	Jean-Philippe Brucker

IOMMU drivers only need to flush their PRI queue when faults might be
pending. According to the PCIe spec (quoted below) this only happens
when using the "Stop Marker" method. Otherwise the function waits for
pending faults before signaling to the device driver that it can
unbind().

Add the IOMMU_UNBIND_FAULT_PENDING flags to unbind(), to tell the IOMMU
driver whether it's worth flushing the queue.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 include/linux/iommu.h     | 31 +++++++++++++++++++++++++++++++
 drivers/iommu/intel/svm.c |  3 ++-
 drivers/iommu/iommu.c     |  5 ++++-
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 26c1358a2a37..fd9630b1240d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -163,6 +163,37 @@ enum iommu_dev_features {
 
 #define IOMMU_PASID_INVALID	(-1U)
 
+/*
+ * Indicate that a device stops using a PASID by issuing a Stop Marker Message.
+ * From PCIe 4.0r1.0, 10.4.1.2 Managing PASID TLP Prefix Usage:
+ *
+ * "To stop without using a Stop Marker Message, the Function shall:
+ *  1. Stop queueing new Page Request Messages for this PASID.
+ *  2. Finish transmitting any multi-page Page Request Messages for this PASID
+ *     (i.e. send the Page Request Message with the L bit Set).
+ *  3. Wait for PRG Response Messages associated any outstanding Page Request
+ *     Messages for the PASID.
+ *  4. Indicate that the PASID has stopped using a device specific mechanism.
+ *     This mechanism must indicate that a Stop Marker Message will not be
+ *     generated.
+ *  To stop with the use of a Stop Marker Message the Function shall:
+ * [1. and 2. are the same]
+ *  3. Internally mark all outstanding Page Request Messages for this PASID as
+ *     stale. PRG Response Messages associated with these requests will return
+ *     Page Request Allocation credits and PRG Index values but are otherwise
+ *     ignored.
+ *  4. Indicate that the PASID has stopped using a device specific mechanism.
+ *     This mechanism must indicate that a Stop Marker Message will be
+ *     generated.
+ *  5. Send a Stop Marker Message to indicate to the host that all subsequent
+ *     Page Request Messages for this PASID are for a new use of the PASID
+ *     value."
+ *
+ * If the device indicates that the Stop Marker Message will be generated, the
+ * device driver should set the IOMMU_UNBIND_FAULT_PENDING flag.
+ */
+#define IOMMU_UNBIND_FAULT_PENDING	(1UL << 0)
+
 #ifdef CONFIG_IOMMU_API
 
 /**
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 700b05612af9..aa1fcb66fa95 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -680,7 +680,8 @@ static int intel_svm_unbind_mm(struct device *dev, u32 pasid,
 			 * hard to be as defensive as we might like. */
 			intel_pasid_tear_down_entry(iommu, dev,
 						    svm->pasid, false);
-			intel_svm_drain_prq(dev, svm->pasid);
+			if (flags & IOMMU_UNBIND_FAULT_PENDING)
+				intel_svm_drain_prq(dev, svm->pasid);
 			kfree_rcu(sdev, rcu);
 
 			if (list_empty(&svm->devs)) {
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 741c463095a8..eede0592a2c0 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2995,7 +2995,10 @@ EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
  *
  * Put reference to a bond between device and address space. The device should
  * not be issuing any more transaction for this PASID. All outstanding page
- * requests for this PASID must have been flushed to the IOMMU.
+ * requests for this PASID must have been completed, or flushed to the IOMMU. If
+ * they have not been completed, for example when using a Stop Marker Message to
+ * stop PASID in a PCIe device, then the caller must set the flag
+ * %IOMMU_UNBIND_FAULT_PENDING.
  *
  * Returns 0 on success, or an error value
  */
-- 
2.28.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/2] iommu: Avoid unnecessary PRI queue flushes
  2020-10-15  9:00 [RFC PATCH 0/2] iommu: Avoid unnecessary PRI queue flushes Jean-Philippe Brucker
  2020-10-15  9:00 ` [RFC PATCH 1/2] iommu: Add flags to sva_unbind() Jean-Philippe Brucker
  2020-10-15  9:00 ` [RFC PATCH 2/2] iommu: Add IOMMU_UNBIND_FAULT_PENDING flag Jean-Philippe Brucker
@ 2020-10-15 18:22 ` Raj, Ashok
  2020-10-16  7:59   ` Jean-Philippe Brucker
  2 siblings, 1 reply; 10+ messages in thread
From: Raj, Ashok @ 2020-10-15 18:22 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: dwmw2, baolu.lu, joro, zhangfei.gao, wangzhou1, arnd, gregkh,
	iommu, linux-kernel, linux-accelerators, kevin.tian,
	jacob.jun.pan, linux-pci, Ashok Raj, Lu, Baolu, Jacon Jun Pan

Hi Jean

+ Baolu who is looking into this.


On Thu, Oct 15, 2020 at 11:00:27AM +0200, Jean-Philippe Brucker wrote:
> Add a parameter to iommu_sva_unbind_device() that tells the IOMMU driver
> whether the PRI queue needs flushing. When looking at the PCIe spec
> again I noticed that most of the time the SMMUv3 driver doesn't actually
> need to flush the PRI queue. Does this make sense for Intel VT-d as well
> or did I overlook something?
> 
> Before calling iommu_sva_unbind_device(), device drivers must stop the
> device from using the PASID. For PCIe devices, that consists of
> completing any pending DMA, and completing any pending page request
> unless the device uses Stop Markers. So unless the device uses Stop
> Markers, we don't need to flush the PRI queue. For SMMUv3, stopping DMA
> means completing all stall events, so we never need to flush the event
> queue.

I don't think this is true. Baolu is working on an enhancement to this,
I'll quickly summarize this below:

Stop markers are weird, I'm not certain there is any device today that
sends STOP markers. Even if they did, markers don't have a required
response, they are fire and forget from the device pov.

I'm not sure about other IOMMU's how they behave, When there is no space in
the PRQ, IOMMU auto-responds to the device. This puts the device in a
while (1) loop. The fake successful response will let the device do a ATS
lookup, and that would fail forcing the device to do another PRQ. The idea
is somewhere there the OS has repeated the others and this will find a way
in the PRQ. The point is this is less reliable and can't be the only
indication. PRQ draining has a specific sequence. 

The detailed steps are outlined in 
"Chapter 7.10 "Software Steps to Drain Page Requests & Responses"

- Submit invalidation wait with fence flag to ensure all prior
  invalidations are processed.
- submit iotlb followed by devtlb invalidation
- Submit invalidation wait with page-drain to make sure any page-requests
  issued by the device are flushed when this invalidation wait completes.
- If during the above process there was a queue overflow SW can assume no
  outstanding page-requests are there. If we had a queue full condition,
  then sw must repeat step 2,3 above.


To that extent the proposal is as follows: names are suggestive :-) I'm
making this up as I go!

- iommu_stop_page_req() - Kernel needs to make sure we respond to all
  outstanding requests (since we can't drop responses) 
  - Ensure we respond immediatly for anything that comes before the drain
    sequence completes
- iommu_drain_page_req() - Which does the above invalidation with
  Page_drain set.

Once the driver has performed a reset and before issuing any new request,
it does iommu_resume_page_req() this will ensure we start processing
incoming page-req after this point.


> 
> First patch adds flags to unbind(), and the second one lets device
> drivers tell whether the PRI queue needs to be flushed.
> 
> Other remarks:
> 
> * The PCIe spec (see quote on patch 2), says that the device signals
>   whether it has sent a Stop Marker or not during Stop PASID. In reality
>   it's unlikely that a given device will sometimes use one stop method
>   and sometimes the other, so it could be a device-wide flag rather than
>   passing it at each unbind(). I don't want to speculate too much about
>   future implementation so I prefer having the flag in unbind().
> 
> * In patch 1, uacce passes 0 to unbind(). To pass the right flag I'm
>   thinking that uacce->ops->stop_queue(), which tells the device driver
>   to stop DMA, should return whether faults are pending. This can be
>   added later once uacce has an actual PCIe user, but we need to
>   remember to do it.

I think intel iommmu driver does this today for SVA when pasid is being
freed. Its still important to go through the drain before that PASID can be
re-purposed.

Cheers,
Ashok

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 2/2] iommu: Add IOMMU_UNBIND_FAULT_PENDING flag
  2020-10-15  9:00 ` [RFC PATCH 2/2] iommu: Add IOMMU_UNBIND_FAULT_PENDING flag Jean-Philippe Brucker
@ 2020-10-16  7:07   ` Christoph Hellwig
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2020-10-16  7:07 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: dwmw2, baolu.lu, joro, zhangfei.gao, wangzhou1, arnd, gregkh,
	iommu, linux-kernel, linux-accelerators, kevin.tian,
	jacob.jun.pan, ashok.raj, linux-pci

On Thu, Oct 15, 2020 at 11:00:29AM +0200, Jean-Philippe Brucker wrote:
> IOMMU drivers only need to flush their PRI queue when faults might be
> pending. According to the PCIe spec (quoted below) this only happens
> when using the "Stop Marker" method. Otherwise the function waits for
> pending faults before signaling to the device driver that it can
> unbind().
> 
> Add the IOMMU_UNBIND_FAULT_PENDING flags to unbind(), to tell the IOMMU
> driver whether it's worth flushing the queue.

As we have no code using the "stop marker" method please just remove
the code entirely instead of adding a flag that is just dead code.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/2] iommu: Avoid unnecessary PRI queue flushes
  2020-10-15 18:22 ` [RFC PATCH 0/2] iommu: Avoid unnecessary PRI queue flushes Raj, Ashok
@ 2020-10-16  7:59   ` Jean-Philippe Brucker
  2020-10-17 11:25     ` Raj, Ashok
  0 siblings, 1 reply; 10+ messages in thread
From: Jean-Philippe Brucker @ 2020-10-16  7:59 UTC (permalink / raw)
  To: Raj, Ashok
  Cc: dwmw2, baolu.lu, joro, zhangfei.gao, wangzhou1, arnd, gregkh,
	iommu, linux-kernel, linux-accelerators, kevin.tian,
	jacob.jun.pan, linux-pci, Lu, Baolu, Jacon Jun Pan

On Thu, Oct 15, 2020 at 11:22:11AM -0700, Raj, Ashok wrote:
> Hi Jean
> 
> + Baolu who is looking into this.
> 
> 
> On Thu, Oct 15, 2020 at 11:00:27AM +0200, Jean-Philippe Brucker wrote:
> > Add a parameter to iommu_sva_unbind_device() that tells the IOMMU driver
> > whether the PRI queue needs flushing. When looking at the PCIe spec
> > again I noticed that most of the time the SMMUv3 driver doesn't actually
> > need to flush the PRI queue. Does this make sense for Intel VT-d as well
> > or did I overlook something?
> > 
> > Before calling iommu_sva_unbind_device(), device drivers must stop the
> > device from using the PASID. For PCIe devices, that consists of
> > completing any pending DMA, and completing any pending page request
> > unless the device uses Stop Markers. So unless the device uses Stop
> > Markers, we don't need to flush the PRI queue. For SMMUv3, stopping DMA
> > means completing all stall events, so we never need to flush the event
> > queue.
> 
> I don't think this is true. Baolu is working on an enhancement to this,
> I'll quickly summarize this below:
> 
> Stop markers are weird, I'm not certain there is any device today that
> sends STOP markers. Even if they did, markers don't have a required
> response, they are fire and forget from the device pov.

Definitely agree with this, and I hope none will implement stop marker.
For devices that *don't* use a stop marker, the PCIe spec says (10.4.1.2):

  To stop [using a PASID] without using a Stop Marker Message, the
  function shall:
  1. Stop queueing new Page Request Messages for this PASID.
  2. Finish transmitting any multi-page Page Request Messages for this
     PASID (i.e. send the Page Request Message with the L bit Set).
  3. Wait for PRG Response Messages associated any outstanding Page
     Request Messages for the PASID.

So they have to flush their PR themselves. And since the device driver
completes this sequence before calling unbind(), then there shouldn't be
any oustanding PR for the PASID, and unbind() doesn't need to flush,
right?

> I'm not sure about other IOMMU's how they behave, When there is no space in
> the PRQ, IOMMU auto-responds to the device. This puts the device in a
> while (1) loop. The fake successful response will let the device do a ATS
> lookup, and that would fail forcing the device to do another PRQ.

But in the sequence above, step 1 should ensure that the device will not
send another PR for any successful response coming back at step 3.

So I agree with the below if we suspect there could be pending PR, but
given that pending PR are a stop marker thing and we don't know any device
using stop markers, I wondered why I bothered implementing PRIq flush at
all for SMMUv3, hence this RFC.

Thanks,
Jean


> The idea
> is somewhere there the OS has repeated the others and this will find a way
> in the PRQ. The point is this is less reliable and can't be the only
> indication. PRQ draining has a specific sequence. 
> 
> The detailed steps are outlined in 
> "Chapter 7.10 "Software Steps to Drain Page Requests & Responses"
> 
> - Submit invalidation wait with fence flag to ensure all prior
>   invalidations are processed.
> - submit iotlb followed by devtlb invalidation
> - Submit invalidation wait with page-drain to make sure any page-requests
>   issued by the device are flushed when this invalidation wait completes.
> - If during the above process there was a queue overflow SW can assume no
>   outstanding page-requests are there. If we had a queue full condition,
>   then sw must repeat step 2,3 above.
> 
> 
> To that extent the proposal is as follows: names are suggestive :-) I'm
> making this up as I go!
> 
> - iommu_stop_page_req() - Kernel needs to make sure we respond to all
>   outstanding requests (since we can't drop responses) 
>   - Ensure we respond immediatly for anything that comes before the drain
>     sequence completes
> - iommu_drain_page_req() - Which does the above invalidation with
>   Page_drain set.
> 
> Once the driver has performed a reset and before issuing any new request,
> it does iommu_resume_page_req() this will ensure we start processing
> incoming page-req after this point.
> 
> 
> > 
> > First patch adds flags to unbind(), and the second one lets device
> > drivers tell whether the PRI queue needs to be flushed.
> > 
> > Other remarks:
> > 
> > * The PCIe spec (see quote on patch 2), says that the device signals
> >   whether it has sent a Stop Marker or not during Stop PASID. In reality
> >   it's unlikely that a given device will sometimes use one stop method
> >   and sometimes the other, so it could be a device-wide flag rather than
> >   passing it at each unbind(). I don't want to speculate too much about
> >   future implementation so I prefer having the flag in unbind().
> > 
> > * In patch 1, uacce passes 0 to unbind(). To pass the right flag I'm
> >   thinking that uacce->ops->stop_queue(), which tells the device driver
> >   to stop DMA, should return whether faults are pending. This can be
> >   added later once uacce has an actual PCIe user, but we need to
> >   remember to do it.
> 
> I think intel iommmu driver does this today for SVA when pasid is being
> freed. Its still important to go through the drain before that PASID can be
> re-purposed.
> 
> Cheers,
> Ashok

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/2] iommu: Avoid unnecessary PRI queue flushes
  2020-10-16  7:59   ` Jean-Philippe Brucker
@ 2020-10-17 11:25     ` Raj, Ashok
  2020-10-19 14:08       ` Jean-Philippe Brucker
  0 siblings, 1 reply; 10+ messages in thread
From: Raj, Ashok @ 2020-10-17 11:25 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: dwmw2, baolu.lu, joro, zhangfei.gao, wangzhou1, arnd, gregkh,
	iommu, linux-kernel, linux-accelerators, kevin.tian,
	jacob.jun.pan, linux-pci, Lu, Baolu, Jacon Jun Pan, Ashok Raj

Hi Jean

On Fri, Oct 16, 2020 at 09:59:23AM +0200, Jean-Philippe Brucker wrote:
> On Thu, Oct 15, 2020 at 11:22:11AM -0700, Raj, Ashok wrote:
> > Hi Jean
> > 
> > + Baolu who is looking into this.
> > 
> > 
> > On Thu, Oct 15, 2020 at 11:00:27AM +0200, Jean-Philippe Brucker wrote:
> > > Add a parameter to iommu_sva_unbind_device() that tells the IOMMU driver
> > > whether the PRI queue needs flushing. When looking at the PCIe spec
> > > again I noticed that most of the time the SMMUv3 driver doesn't actually
> > > need to flush the PRI queue. Does this make sense for Intel VT-d as well
> > > or did I overlook something?
> > > 
> > > Before calling iommu_sva_unbind_device(), device drivers must stop the
> > > device from using the PASID. For PCIe devices, that consists of
> > > completing any pending DMA, and completing any pending page request
> > > unless the device uses Stop Markers. So unless the device uses Stop
> > > Markers, we don't need to flush the PRI queue. For SMMUv3, stopping DMA
> > > means completing all stall events, so we never need to flush the event
> > > queue.
> > 
> > I don't think this is true. Baolu is working on an enhancement to this,
> > I'll quickly summarize this below:
> > 
> > Stop markers are weird, I'm not certain there is any device today that
> > sends STOP markers. Even if they did, markers don't have a required
> > response, they are fire and forget from the device pov.
> 
> Definitely agree with this, and I hope none will implement stop marker.
> For devices that *don't* use a stop marker, the PCIe spec says (10.4.1.2):
> 
>   To stop [using a PASID] without using a Stop Marker Message, the
>   function shall:
>   1. Stop queueing new Page Request Messages for this PASID.

The device driver would need to tell stop sending any new PR's.

>   2. Finish transmitting any multi-page Page Request Messages for this
>      PASID (i.e. send the Page Request Message with the L bit Set).
>   3. Wait for PRG Response Messages associated any outstanding Page
>      Request Messages for the PASID.
> 
> So they have to flush their PR themselves. And since the device driver
> completes this sequence before calling unbind(), then there shouldn't be
> any oustanding PR for the PASID, and unbind() doesn't need to flush,
> right?

I can see how the device can complete #2,3 above. But the device driver
isn't the one managing page-responses right. So in order for the device to
know the above sequence is complete, it would need to get some assist from
IOMMU driver?

How does the driver know that everything host received has been responded
back to device?

> 
> > I'm not sure about other IOMMU's how they behave, When there is no space in
> > the PRQ, IOMMU auto-responds to the device. This puts the device in a
> > while (1) loop. The fake successful response will let the device do a ATS
> > lookup, and that would fail forcing the device to do another PRQ.
> 
> But in the sequence above, step 1 should ensure that the device will not
> send another PR for any successful response coming back at step 3.

True, but there could be some page-request in flight on its way to the
IOMMU. By draining and getting that round trip back to IOMMU we gaurantee
things in flight are flushed to PRQ after that Drain completes.
> 
> So I agree with the below if we suspect there could be pending PR, but
> given that pending PR are a stop marker thing and we don't know any device
> using stop markers, I wondered why I bothered implementing PRIq flush at
> all for SMMUv3, hence this RFC.
> 

Cheers,
Ashok

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/2] iommu: Avoid unnecessary PRI queue flushes
  2020-10-17 11:25     ` Raj, Ashok
@ 2020-10-19 14:08       ` Jean-Philippe Brucker
  2020-10-19 18:33         ` Jacob Pan
  2020-10-19 21:16         ` Raj, Ashok
  0 siblings, 2 replies; 10+ messages in thread
From: Jean-Philippe Brucker @ 2020-10-19 14:08 UTC (permalink / raw)
  To: Raj, Ashok
  Cc: dwmw2, baolu.lu, joro, zhangfei.gao, wangzhou1, arnd, gregkh,
	iommu, linux-kernel, linux-accelerators, kevin.tian,
	jacob.jun.pan, linux-pci, Lu, Baolu, Jacon Jun Pan

On Sat, Oct 17, 2020 at 04:25:25AM -0700, Raj, Ashok wrote:
> > For devices that *don't* use a stop marker, the PCIe spec says (10.4.1.2):
> > 
> >   To stop [using a PASID] without using a Stop Marker Message, the
> >   function shall:
> >   1. Stop queueing new Page Request Messages for this PASID.
> 
> The device driver would need to tell stop sending any new PR's.
> 
> >   2. Finish transmitting any multi-page Page Request Messages for this
> >      PASID (i.e. send the Page Request Message with the L bit Set).
> >   3. Wait for PRG Response Messages associated any outstanding Page
> >      Request Messages for the PASID.
> > 
> > So they have to flush their PR themselves. And since the device driver
> > completes this sequence before calling unbind(), then there shouldn't be
> > any oustanding PR for the PASID, and unbind() doesn't need to flush,
> > right?
> 
> I can see how the device can complete #2,3 above. But the device driver
> isn't the one managing page-responses right. So in order for the device to
> know the above sequence is complete, it would need to get some assist from
> IOMMU driver?

No the device driver just waits for the device to indicate that it has
completed the sequence. That's what the magic stop-PASID mechanism
described by PCIe does. In 6.20.1 "Managing PASID TLP Prefix Usage" it
says:

"A Function must have a mechanism to request that it gracefully stop using
 a specific PASID. This mechanism is device specific but must satisfy the
 following rules:
 [...]
 * When the stop request mechanism indicates completion, the Function has:
   [...]
   * Complied with additional rules described in Address Translation
     Services (Chapter 10 [10.4.1.2 quoted above]) if Address Translations
     or Page Requests were issued on the behalf of this PASID."

So after the device driver initiates this mechanism in the device, the
device must be able to indicate completion of the mechanism, which
includes completing all in-flight Page Requests. At that point the device
driver can call unbind() knowing there is no pending PR for this PASID.

Thanks,
Jean

> 
> How does the driver know that everything host received has been responded
> back to device?
> 
> > 
> > > I'm not sure about other IOMMU's how they behave, When there is no space in
> > > the PRQ, IOMMU auto-responds to the device. This puts the device in a
> > > while (1) loop. The fake successful response will let the device do a ATS
> > > lookup, and that would fail forcing the device to do another PRQ.
> > 
> > But in the sequence above, step 1 should ensure that the device will not
> > send another PR for any successful response coming back at step 3.
> 
> True, but there could be some page-request in flight on its way to the
> IOMMU. By draining and getting that round trip back to IOMMU we gaurantee
> things in flight are flushed to PRQ after that Drain completes.
> > 
> > So I agree with the below if we suspect there could be pending PR, but
> > given that pending PR are a stop marker thing and we don't know any device
> > using stop markers, I wondered why I bothered implementing PRIq flush at
> > all for SMMUv3, hence this RFC.
> > 
> 
> Cheers,
> Ashok

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/2] iommu: Avoid unnecessary PRI queue flushes
  2020-10-19 14:08       ` Jean-Philippe Brucker
@ 2020-10-19 18:33         ` Jacob Pan
  2020-10-19 21:16         ` Raj, Ashok
  1 sibling, 0 replies; 10+ messages in thread
From: Jacob Pan @ 2020-10-19 18:33 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Raj, Ashok, dwmw2, baolu.lu, joro, zhangfei.gao, wangzhou1, arnd,
	gregkh, iommu, linux-kernel, linux-accelerators, kevin.tian,
	linux-pci, Lu, Baolu, Jacon Jun Pan, jacob.jun.pan

Hi Jean-Philippe,

On Mon, 19 Oct 2020 16:08:24 +0200, Jean-Philippe Brucker
<jean-philippe@linaro.org> wrote:

> On Sat, Oct 17, 2020 at 04:25:25AM -0700, Raj, Ashok wrote:
> > > For devices that *don't* use a stop marker, the PCIe spec says
> > > (10.4.1.2):
> > > 
> > >   To stop [using a PASID] without using a Stop Marker Message, the
> > >   function shall:
> > >   1. Stop queueing new Page Request Messages for this PASID.  
> > 
> > The device driver would need to tell stop sending any new PR's.
> >   
> > >   2. Finish transmitting any multi-page Page Request Messages for this
> > >      PASID (i.e. send the Page Request Message with the L bit Set).
> > >   3. Wait for PRG Response Messages associated any outstanding Page
> > >      Request Messages for the PASID.
> > > 
> > > So they have to flush their PR themselves. And since the device driver
> > > completes this sequence before calling unbind(), then there shouldn't
> > > be any oustanding PR for the PASID, and unbind() doesn't need to
> > > flush, right?  
> > 
> > I can see how the device can complete #2,3 above. But the device driver
> > isn't the one managing page-responses right. So in order for the device
> > to know the above sequence is complete, it would need to get some
> > assist from IOMMU driver?  
> 
> No the device driver just waits for the device to indicate that it has
> completed the sequence. That's what the magic stop-PASID mechanism
> described by PCIe does. In 6.20.1 "Managing PASID TLP Prefix Usage" it
> says:
> 
> "A Function must have a mechanism to request that it gracefully stop using
>  a specific PASID. This mechanism is device specific but must satisfy the
>  following rules:
>  [...]
>  * When the stop request mechanism indicates completion, the Function has:
>    [...]
>    * Complied with additional rules described in Address Translation
>      Services (Chapter 10 [10.4.1.2 quoted above]) if Address Translations
>      or Page Requests were issued on the behalf of this PASID."
> 
> So after the device driver initiates this mechanism in the device, the
> device must be able to indicate completion of the mechanism, which
> includes completing all in-flight Page Requests. At that point the device
> driver can call unbind() knowing there is no pending PR for this PASID.
> 
In step #3, I think it is possible that device driver received page response
as part of the auto page response, so it may not guarantee all the in-flight
PRQs are completed inside IOMMU. Therefore, drain is _always_ needed to be
sure?

> Thanks,
> Jean
> 
> > 
> > How does the driver know that everything host received has been
> > responded back to device?
> >   
> > >   
> > > > I'm not sure about other IOMMU's how they behave, When there is no
> > > > space in the PRQ, IOMMU auto-responds to the device. This puts the
> > > > device in a while (1) loop. The fake successful response will let
> > > > the device do a ATS lookup, and that would fail forcing the device
> > > > to do another PRQ.  
> > > 
> > > But in the sequence above, step 1 should ensure that the device will
> > > not send another PR for any successful response coming back at step
> > > 3.  
> > 
> > True, but there could be some page-request in flight on its way to the
> > IOMMU. By draining and getting that round trip back to IOMMU we
> > gaurantee things in flight are flushed to PRQ after that Drain
> > completes.  
> > > 
> > > So I agree with the below if we suspect there could be pending PR, but
> > > given that pending PR are a stop marker thing and we don't know any
> > > device using stop markers, I wondered why I bothered implementing
> > > PRIq flush at all for SMMUv3, hence this RFC.
> > >   
> > 
> > Cheers,
> > Ashok  


Thanks,

Jacob

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/2] iommu: Avoid unnecessary PRI queue flushes
  2020-10-19 14:08       ` Jean-Philippe Brucker
  2020-10-19 18:33         ` Jacob Pan
@ 2020-10-19 21:16         ` Raj, Ashok
  1 sibling, 0 replies; 10+ messages in thread
From: Raj, Ashok @ 2020-10-19 21:16 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: dwmw2, baolu.lu, joro, zhangfei.gao, wangzhou1, arnd, gregkh,
	iommu, linux-kernel, linux-accelerators, kevin.tian,
	jacob.jun.pan, linux-pci, Lu, Baolu, Jacon Jun Pan, Ashok Raj

Hi Jean

On Mon, Oct 19, 2020 at 04:08:24PM +0200, Jean-Philippe Brucker wrote:
> On Sat, Oct 17, 2020 at 04:25:25AM -0700, Raj, Ashok wrote:
> > > For devices that *don't* use a stop marker, the PCIe spec says (10.4.1.2):
> > > 
> > >   To stop [using a PASID] without using a Stop Marker Message, the
> > >   function shall:
> > >   1. Stop queueing new Page Request Messages for this PASID.
> > 
> > The device driver would need to tell stop sending any new PR's.
> > 
> > >   2. Finish transmitting any multi-page Page Request Messages for this
> > >      PASID (i.e. send the Page Request Message with the L bit Set).
> > >   3. Wait for PRG Response Messages associated any outstanding Page
> > >      Request Messages for the PASID.
> > > 
> > > So they have to flush their PR themselves. And since the device driver
> > > completes this sequence before calling unbind(), then there shouldn't be
> > > any oustanding PR for the PASID, and unbind() doesn't need to flush,
> > > right?
> > 
> > I can see how the device can complete #2,3 above. But the device driver
> > isn't the one managing page-responses right. So in order for the device to
> > know the above sequence is complete, it would need to get some assist from
> > IOMMU driver?
> 
> No the device driver just waits for the device to indicate that it has
> completed the sequence. That's what the magic stop-PASID mechanism
> described by PCIe does. In 6.20.1 "Managing PASID TLP Prefix Usage" it
> says:

The goal is we do this when the device is in a messup up state. So we can't
take for granted the device is properly operating which is why we are going
to wack the device with a flr().

The only thing that's supposed to work without a brain freeze is the
invalidation logic. Spec requires devices to respond to invalidations even when
they are in the process of flr().

So when IOMMU does an invalidation wait with a Page-Drain, IOMMU waits till
the response for that arrives before completing the descriptor. Due to 
the posted semantics it will ensure any PR's issued and in the fabric are flushed 
out to memory. 

I suppose you can wait for the device to vouch for all responses, but that
is assuming the device is still functioning properly. Given that we use it
in two places,

* Reclaiming a PASID - only during a tear down sequence, skipping it
  doesn't really buy us much.
* During FLR this can't be skipped anyway due to the above sequence
  requirement. 

> 
> "A Function must have a mechanism to request that it gracefully stop using
>  a specific PASID. This mechanism is device specific but must satisfy the
>  following rules:
>  [...]
>  * When the stop request mechanism indicates completion, the Function has:
>    [...]
>    * Complied with additional rules described in Address Translation
>      Services (Chapter 10 [10.4.1.2 quoted above]) if Address Translations
>      or Page Requests were issued on the behalf of this PASID."
> 
> So after the device driver initiates this mechanism in the device, the
> device must be able to indicate completion of the mechanism, which
> includes completing all in-flight Page Requests. At that point the device
> driver can call unbind() knowing there is no pending PR for this PASID.
> 

Cheers,
Ashok

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, back to index

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-15  9:00 [RFC PATCH 0/2] iommu: Avoid unnecessary PRI queue flushes Jean-Philippe Brucker
2020-10-15  9:00 ` [RFC PATCH 1/2] iommu: Add flags to sva_unbind() Jean-Philippe Brucker
2020-10-15  9:00 ` [RFC PATCH 2/2] iommu: Add IOMMU_UNBIND_FAULT_PENDING flag Jean-Philippe Brucker
2020-10-16  7:07   ` Christoph Hellwig
2020-10-15 18:22 ` [RFC PATCH 0/2] iommu: Avoid unnecessary PRI queue flushes Raj, Ashok
2020-10-16  7:59   ` Jean-Philippe Brucker
2020-10-17 11:25     ` Raj, Ashok
2020-10-19 14:08       ` Jean-Philippe Brucker
2020-10-19 18:33         ` Jacob Pan
2020-10-19 21:16         ` Raj, Ashok

Linux-PCI Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-pci/0 linux-pci/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-pci linux-pci/ https://lore.kernel.org/linux-pci \
		linux-pci@vger.kernel.org
	public-inbox-index linux-pci

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-pci


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git