linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/1] IOMMU SVA device driver interface
@ 2019-02-20 14:27 Jean-Philippe Brucker
  2019-02-20 14:27 ` [PATCH 1/1] iommu: Bind process address spaces to devices Jean-Philippe Brucker
  0 siblings, 1 reply; 13+ messages in thread
From: Jean-Philippe Brucker @ 2019-02-20 14:27 UTC (permalink / raw)
  To: joro
  Cc: iommu, linux-kernel, christian.koenig, kevin.tian, jacob.jun.pan,
	ashok.raj, baolu.lu, alex.williamson

This series focuses on the device driver API for SVA, as I'd like to get
at least parts of it merged in v5.2 [1]. If we can get consensus on the
interface, it will be easier for device drivers to start adding SVA
support while we're improving the IOMMU side.

Since v3 [2] I changed the interface as requested, and changed iommu-sva
to be a set of helpers rather than the main entry point. This way
intel-svm and amd_iommu_v2 can support the common bind() API without
having to move to the generic implementation (which I'm still rewriting).

The four dev_feature functions are implemented by Lu Baolu's IOMMU-aware
mdev series [3].
 
* iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_SVA) -> true/false
  - reports if SVA is supported by IOMMU and device
  - iommu_ops->dev_has_feat()

* iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) -> 0/err
  - enables SVA if IOMMU and device support it.
  - iommu_ops->dev_enable_feat()

* iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_SVA) -> 0/err
  - disable SVA if it was enabled
  - iommu_ops->dev_disable_feat()

* iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_SVA) -> true/false
  - shows the SVA state (could be implemented in iommu.c)
  - iommu_ops->dev_feature_enabled()

* iommu_sva_bind(dev, mm, &pasid, mm_exit_cb, drvdata) -> 0/err
  - Binds dev to mm
  - Sets a callback to disable the PASID, in case mm exits before the dd
    calls unbind()
  - iommu_ops->bind_mm()

* iommu_sva_unbind(dev, &pasid)
  - Unbinds dev from mm
  - iommu_ops->unbind_mm()

Please have a look and tell me what needs to change to be compatible
with intel-svm, amd_iommu_v2 and AMD KFD. As the only SVA user currently
upstream AMD KFD will get preferential treatment, but to keep this
simple I didn't add the min/max_pasid params that AMD KFD requires to
set non-discoverable PASID limits. I could add iommu_dev_set_sva_param()
or something like that?

Thanks,
Jean

[1] The full patch stack contains:
    * bind()/unbind() API
    * fault reporting API
    * ATS for SMMUv3
    * generic IOASID
    * IO mm tracking
    * IO page fault handler
    * PRI + stall for SMMUv3
    * PASID for SMMUv3
    * VFIO usage example
    I'd like to get at least the first three into v5.2

    git://linux-arm.org/linux-jpb.git sva/current
    http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/sva/current
    Passes all my tests, but needs some refinement.

[2] [PATCH v3 00/10] Shared Virtual Addressing for the IOMMU
    https://www.spinics.net/lists/iommu/msg30076.html
    See also, for more recent interface discussion:
    [RFC PATCH 0/6] Auxiliary IOMMU domains and Arm SMMUv3
    https://www.spinics.net/lists/iommu/msg30637.html

[3] [PATCH v6 0/9] vfio/mdev: IOMMU aware mediated device
    https://lore.kernel.org/lkml/20190213040301.23021-10-baolu.lu@linux.intel.com/T/


---
If you're not sure what this all means:

Shared Virtual Addressing (SVA) is the ability to share process address
spaces with devices. It is called "SVM" (Shared Virtual Memory) by OpenCL
and some IOMMU architectures, but since that abbreviation is already used
for AMD virtualisation in Linux (Secure Virtual Machine), we prefer the
less ambiguous "SVA".

Sharing process address spaces with devices allows to rely on core kernel
memory management for DMA, removing some complexity from application and
device drivers. After binding to a device, applications can instruct it to
perform DMA on buffers obtained with malloc.

The device, bus and IOMMU must support the following features:

* Multiple address spaces per device, for example using the PCI PASID
  (Process Address Space ID) extension. The IOMMU driver allocates a
  PASID and the device uses it in DMA transactions.
* I/O Page Faults (IOPF), for example PCI PRI (Page Request Interface) or
  Arm SMMU stall. The core mm handles translation faults from the IOMMU.
* MMU and IOMMU implement compatible page table formats.
---

Jean-Philippe Brucker (1):
  iommu: Bind process address spaces to devices

 drivers/iommu/iommu.c | 104 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/iommu.h |  24 ++++++++++
 2 files changed, 128 insertions(+)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/1] iommu: Bind process address spaces to devices
  2019-02-20 14:27 [PATCH 0/1] IOMMU SVA device driver interface Jean-Philippe Brucker
@ 2019-02-20 14:27 ` Jean-Philippe Brucker
  2019-02-26 11:17   ` Joerg Roedel
  0 siblings, 1 reply; 13+ messages in thread
From: Jean-Philippe Brucker @ 2019-02-20 14:27 UTC (permalink / raw)
  To: joro
  Cc: iommu, linux-kernel, christian.koenig, kevin.tian, jacob.jun.pan,
	ashok.raj, baolu.lu, alex.williamson

Add bind() and unbind() operations to the IOMMU API. Bind() returns a
PASID to the device driver (by convention, a 20-bit system-wide ID
representing the address space). When programming DMA addresses, device
drivers include this PASID in a device-specific manner, to let the device
access the given address space. Since the process memory may be paged out,
device and IOMMU must support I/O page faults (e.g. PCI PRI).

Device drivers pass an mm_exit() callback to bind(), that is called by the
IOMMU driver if the process exits before the device driver called
unbind(). In mm_exit(), device driver should disable DMA from the given
context, so that the core IOMMU can reallocate the PASID.

To use these functions, device driver must first enable the
IOMMU_DEV_FEAT_SVA device feature with iommu_dev_enable_feature().

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/iommu.c | 104 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/iommu.h |  24 ++++++++++
 2 files changed, 128 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 1629354255c3..5feba98566b2 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2351,3 +2351,107 @@ int iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
 	return ret;
 }
 EXPORT_SYMBOL_GPL(iommu_aux_get_pasid);
+
+/**
+ * iommu_sva_bind_device() - Bind a process address space to a device
+ * @dev: the device
+ * @mm: the mm to bind, caller must hold a reference to it
+ * @pasid: valid address where the PASID will be stored
+ * @mm_exit: notifier function to call when the mm exits
+ * @drvdata: private data passed to the mm exit handler
+ *
+ * Create a bond between device and task, allowing the device to access the mm
+ * using the returned PASID. If unbind() isn't called first, a subsequent bind()
+ * for the same device and mm fails with -EEXIST.
+ *
+ * iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) must be called first, to
+ * initialize the required SVA features.
+ *
+ * @mm_exit is called when the mm is about to be torn down by exit_mmap. After
+ * @mm_exit returns, the device must not issue any more transaction with the
+ * PASID given as argument.
+ *
+ * The @mm_exit handler is allowed to sleep. Be careful about the locks taken in
+ * @mm_exit, because they might lead to deadlocks if they are also held when
+ * dropping references to the mm. Consider the following call chain:
+ *   mutex_lock(A); mmput(mm) -> exit_mm() -> @mm_exit() -> mutex_lock(A)
+ * Using mmput_async() prevents this scenario.
+ *
+ * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an error
+ * is returned.
+ */
+int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
+			  iommu_mm_exit_handler_t mm_exit, void *drvdata)
+{
+	int ret = -EINVAL;
+	struct iommu_group *group;
+	const struct iommu_ops *ops = dev->bus->iommu_ops;
+
+	if (!pasid)
+		return -EINVAL;
+
+	if (!ops || !ops->bind_mm)
+		return -ENODEV;
+
+	group = iommu_group_get(dev);
+	if (!group)
+		return -ENODEV;
+
+	/* Ensure device count and domain don't change while we're binding */
+	mutex_lock(&group->mutex);
+
+	/*
+	 * To keep things simple, SVA currently doesn't support IOMMU groups
+	 * with more than one device. Existing SVA-capable systems are not
+	 * affected by the problems that required IOMMU groups (lack of ACS
+	 * isolation, device ID aliasing and other hardware issues).
+	 */
+	if (iommu_group_device_count(group) != 1)
+		goto out_unlock;
+
+	ret = ops->bind_mm(dev, mm, pasid, mm_exit, drvdata);
+
+out_unlock:
+	mutex_unlock(&group->mutex);
+	iommu_group_put(group);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
+
+/**
+ * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device
+ * @dev: the device
+ * @pasid: the pasid returned by bind()
+ *
+ * Remove bond between device and address space identified by @pasid. Users
+ * should not call unbind() if the corresponding mm exited (as the PASID might
+ * have been reallocated for another process).
+ *
+ * The device must not be issuing any more transaction for this PASID. All
+ * outstanding page requests for this PASID must have been flushed to the IOMMU.
+ *
+ * Returns 0 on success, or an error value
+ */
+int iommu_sva_unbind_device(struct device *dev, int pasid)
+{
+	int ret = -EINVAL;
+	struct iommu_group *group;
+	const struct iommu_ops *ops = dev->bus->iommu_ops;
+
+	if (!ops || !ops->unbind_mm)
+		return -ENODEV;
+
+	group = iommu_group_get(dev);
+	if (!group)
+		return -ENODEV;
+
+	mutex_lock(&group->mutex);
+	ret = ops->unbind_mm(dev, pasid);
+	mutex_unlock(&group->mutex);
+
+	iommu_group_put(group);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index bdd68778ceb5..a69aeea58818 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -55,6 +55,7 @@ struct iommu_fault_event;
 typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
 			struct device *, unsigned long, int, void *);
 typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault_event *, void *);
+typedef int (*iommu_mm_exit_handler_t)(struct device *dev, int pasid, void *);
 
 struct iommu_domain_geometry {
 	dma_addr_t aperture_start; /* First address that can be mapped    */
@@ -159,6 +160,7 @@ struct iommu_resv_region {
 /* Per device IOMMU features */
 enum iommu_dev_features {
 	IOMMU_DEV_FEAT_AUX,	/* Aux-domain feature */
+	IOMMU_DEV_FEAT_SVA,	/* Shared Virtual Addresses */
 };
 
 #ifdef CONFIG_IOMMU_API
@@ -228,6 +230,8 @@ struct page_response_msg {
  * @dev_feat_enabled: check enabled feature
  * @aux_attach/detach_dev: aux-domain specific attach/detach entries.
  * @aux_get_pasid: get the pasid given an aux-domain
+ * @bind_mm: Bind process address space to device
+ * @unbind_mm: Unbind process address space from device
  * @page_response: handle page request response
  * @pgsize_bitmap: bitmap of all possible supported page sizes
  */
@@ -283,6 +287,9 @@ struct iommu_ops {
 	void (*aux_detach_dev)(struct iommu_domain *domain, struct device *dev);
 	int (*aux_get_pasid)(struct iommu_domain *domain, struct device *dev);
 
+	int (*bind_mm)(struct device *dev, struct mm_struct *mm, int *pasid,
+			iommu_mm_exit_handler_t mm_exit, void *drvdata);
+	int (*unbind_mm)(struct device *dev, int pasid);
 	int (*page_response)(struct device *dev, struct page_response_msg *msg);
 
 	unsigned long pgsize_bitmap;
@@ -541,6 +548,11 @@ int iommu_aux_attach_device(struct iommu_domain *domain, struct device *dev);
 void iommu_aux_detach_device(struct iommu_domain *domain, struct device *dev);
 int iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev);
 
+extern int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
+				 int *pasid, iommu_mm_exit_handler_t mm_exit,
+				 void *drvdata);
+extern int iommu_sva_unbind_device(struct device *dev, int pasid);
+
 #else /* CONFIG_IOMMU_API */
 
 struct iommu_ops {};
@@ -889,6 +901,18 @@ iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
 	return -ENODEV;
 }
 
+static inline int
+iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
+		      iommu_mm_exit_handler_t mm_exit, void *drvdata)
+{
+	return -ENODEV;
+}
+
+static inline int iommu_sva_unbind_device(struct device *dev, int pasid)
+{
+	return -ENODEV;
+}
+
 #endif /* CONFIG_IOMMU_API */
 
 #ifdef CONFIG_IOMMU_DEBUGFS
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] iommu: Bind process address spaces to devices
  2019-02-20 14:27 ` [PATCH 1/1] iommu: Bind process address spaces to devices Jean-Philippe Brucker
@ 2019-02-26 11:17   ` Joerg Roedel
  2019-02-26 12:49     ` Jean-Philippe Brucker
  2019-02-27 21:41     ` Jacob Pan
  0 siblings, 2 replies; 13+ messages in thread
From: Joerg Roedel @ 2019-02-26 11:17 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: iommu, linux-kernel, christian.koenig, kevin.tian, jacob.jun.pan,
	ashok.raj, baolu.lu, alex.williamson

Hi Jean-Philippe,

Thanks for the patch! I think this is getting close to be applied after
the next merge window.

On Wed, Feb 20, 2019 at 02:27:59PM +0000, Jean-Philippe Brucker wrote:
> +int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
> +			  iommu_mm_exit_handler_t mm_exit, void *drvdata)

I think we are better of with introducing a sva-bind handle which can be
used to extend and further configure the binding done with this
function.

How about a 'struct iommu_sva' with an iommu-private definition that is
returned by this function:

	struct iommu_sva *iommu_sva_bind_device(struct device *dev,
						struct mm_struct *mm);

and the corresponding unbind function:

	int iommu_sva_unbind_device(struct iommu_sva* *handle);

(Btw, does this need to return and int? Can unbinding fail?).

With that in place we can implement and extentable API base on the
handle:

	int iommu_sva_get_pasid(struct iommu_sva *handle);
	void iommu_sva_set_exit_handler(struct iommu_sva *handle,
					iommu_mm_exit_handler_t mm_exit);

I think at least the AMD IOMMU driver needs more call-backs like a
handler that is invoked when a fault can not be resolved. And there
might be others in the future, putting them all in the parameter list of
the bind function doesn't scale well.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] iommu: Bind process address spaces to devices
  2019-02-26 11:17   ` Joerg Roedel
@ 2019-02-26 12:49     ` Jean-Philippe Brucker
  2019-02-26 13:02       ` Joerg Roedel
  2019-02-27 21:41     ` Jacob Pan
  1 sibling, 1 reply; 13+ messages in thread
From: Jean-Philippe Brucker @ 2019-02-26 12:49 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: kevin.tian, alex.williamson, ashok.raj, linux-kernel, iommu,
	christian.koenig

On 26/02/2019 11:17, Joerg Roedel wrote:
> Hi Jean-Philippe,
> 
> Thanks for the patch! I think this is getting close to be applied after
> the next merge window.
> 
> On Wed, Feb 20, 2019 at 02:27:59PM +0000, Jean-Philippe Brucker wrote:
>> +int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
>> +			  iommu_mm_exit_handler_t mm_exit, void *drvdata)
> 
> I think we are better of with introducing a sva-bind handle which can be
> used to extend and further configure the binding done with this
> function.
> 
> How about a 'struct iommu_sva' with an iommu-private definition that is
> returned by this function:
> 
> 	struct iommu_sva *iommu_sva_bind_device(struct device *dev,
> 						struct mm_struct *mm);
> 
> and the corresponding unbind function:
> 
> 	int iommu_sva_unbind_device(struct iommu_sva* *handle);
> 
> (Btw, does this need to return and int? Can unbinding fail?).

With the pasid-based interface, unbind would have failed if the mm had
exited before the device driver called unbind (and with invalid
parameters). But even then returning an error is only useful for debug,
since callers usually can't handle or propagate release errors.

> With that in place we can implement and extentable API base on the
> handle:
> 
> 	int iommu_sva_get_pasid(struct iommu_sva *handle);
> 	void iommu_sva_set_exit_handler(struct iommu_sva *handle,
> 					iommu_mm_exit_handler_t mm_exit);

Ok sounds good. It doesn't look like this interface requires a lot of
changes on my side (iommu_sva corresponds to the iommu_bond structure
I've been using internally) but I might find problems while implementing it.

> I think at least the AMD IOMMU driver needs more call-backs like a
> handler that is invoked when a fault can not be resolved. And there
> might be others in the future, putting them all in the parameter list of
> the bind function doesn't scale well.

Device drivers will also want to have some private data to easily
identify the faulting or exiting context. How about:

    struct iommu_sva_ops {
        void (*mm_exit)(struct iommu_sva *handle, void *drvdata);
    };
    int iommu_sva_set_ops(struct iommu_sva *handle,
                          const struct iommu_sva_ops *ops,
                          void *drvdata);

I now think that device driver should always call unbind() to release
the iommu_sva handle, even if they got notified by mm_exit.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] iommu: Bind process address spaces to devices
  2019-02-26 12:49     ` Jean-Philippe Brucker
@ 2019-02-26 13:02       ` Joerg Roedel
  0 siblings, 0 replies; 13+ messages in thread
From: Joerg Roedel @ 2019-02-26 13:02 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: kevin.tian, alex.williamson, ashok.raj, linux-kernel, iommu,
	christian.koenig

On Tue, Feb 26, 2019 at 12:49:15PM +0000, Jean-Philippe Brucker wrote:
> On 26/02/2019 11:17, Joerg Roedel wrote:
> > 	int iommu_sva_get_pasid(struct iommu_sva *handle);
> > 	void iommu_sva_set_exit_handler(struct iommu_sva *handle,
> > 					iommu_mm_exit_handler_t mm_exit);
> 
> Ok sounds good. It doesn't look like this interface requires a lot of
> changes on my side (iommu_sva corresponds to the iommu_bond structure
> I've been using internally) but I might find problems while implementing it.

Great!

> Device drivers will also want to have some private data to easily
> identify the faulting or exiting context. How about:
> 
>     struct iommu_sva_ops {
>         void (*mm_exit)(struct iommu_sva *handle, void *drvdata);
>     };
>     int iommu_sva_set_ops(struct iommu_sva *handle,
>                           const struct iommu_sva_ops *ops,
>                           void *drvdata);

Okay, we can also do it this way. But then please pass the drvdata via
the bind() call and not via set_ops(). Set_ops() should then really only
pass the call-backs, as the name implies.

> I now think that device driver should always call unbind() to release
> the iommu_sva handle, even if they got notified by mm_exit.

Yes, this should be required. I think it also makes driver
implementation easier because it doesn't need to care too much about
this special case.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] iommu: Bind process address spaces to devices
  2019-02-26 11:17   ` Joerg Roedel
  2019-02-26 12:49     ` Jean-Philippe Brucker
@ 2019-02-27 21:41     ` Jacob Pan
  2019-02-28  1:10       ` Tian, Kevin
                         ` (2 more replies)
  1 sibling, 3 replies; 13+ messages in thread
From: Jacob Pan @ 2019-02-27 21:41 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Jean-Philippe Brucker, iommu, linux-kernel, christian.koenig,
	kevin.tian, ashok.raj, baolu.lu, alex.williamson, jacob.jun.pan

On Tue, 26 Feb 2019 12:17:43 +0100
Joerg Roedel <joro@8bytes.org> wrote:

> Hi Jean-Philippe,
> 
> Thanks for the patch! I think this is getting close to be applied
> after the next merge window.
> 
> On Wed, Feb 20, 2019 at 02:27:59PM +0000, Jean-Philippe Brucker wrote:
> > +int iommu_sva_bind_device(struct device *dev, struct mm_struct
> > *mm, int *pasid,
> > +			  iommu_mm_exit_handler_t mm_exit, void
> > *drvdata)  
> 
> I think we are better of with introducing a sva-bind handle which can
> be used to extend and further configure the binding done with this
> function.
> 
> How about a 'struct iommu_sva' with an iommu-private definition that
> is returned by this function:
> 
> 	struct iommu_sva *iommu_sva_bind_device(struct device *dev,
> 						struct mm_struct *mm);
> 
Just trying to understand how to use this API.
So if we bind the same mm to two different devices, we should get two
different iommu_sva handle, right?
I think intel-svm still needs a flag argument for supervisor pasid etc.
Other than that, I think both interface should work for vt-d.

Another question is that for nested SVA, we will need to bind guest mm.
Do you think we should try to reuse this or have it separate? I am
working on a separate API for now.

> and the corresponding unbind function:
> 
> 	int iommu_sva_unbind_device(struct iommu_sva* *handle);
> 
> (Btw, does this need to return and int? Can unbinding fail?).
> 
> With that in place we can implement and extentable API base on the
> handle:
> 
> 	int iommu_sva_get_pasid(struct iommu_sva *handle);
If multiple bind to the same mm gets multiple handles, this API should
retrieve the same pasid for different handle?

Just curious why making
the handle private instead of returning the pasid value in the handle?

> 	void iommu_sva_set_exit_handler(struct iommu_sva *handle,
> 					iommu_mm_exit_handler_t
> mm_exit);
> 
> I think at least the AMD IOMMU driver needs more call-backs like a
> handler that is invoked when a fault can not be resolved. And there
> might be others in the future, putting them all in the parameter list
> of the bind function doesn't scale well.
> 

> Regards,
> 
> 	Joerg


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH 1/1] iommu: Bind process address spaces to devices
  2019-02-27 21:41     ` Jacob Pan
@ 2019-02-28  1:10       ` Tian, Kevin
  2019-02-28 18:53         ` Jacob Pan
  2019-02-28 12:19       ` Jean-Philippe Brucker
  2019-02-28 14:09       ` Joerg Roedel
  2 siblings, 1 reply; 13+ messages in thread
From: Tian, Kevin @ 2019-02-28  1:10 UTC (permalink / raw)
  To: Jacob Pan, Joerg Roedel
  Cc: Jean-Philippe Brucker, iommu, linux-kernel, christian.koenig,
	Raj, Ashok, baolu.lu, alex.williamson

> From: Jacob Pan [mailto:jacob.jun.pan@linux.intel.com]
> Sent: Thursday, February 28, 2019 5:41 AM
> 
> On Tue, 26 Feb 2019 12:17:43 +0100
> Joerg Roedel <joro@8bytes.org> wrote:
> 
> >
> > How about a 'struct iommu_sva' with an iommu-private definition that
> > is returned by this function:
> >
> > 	struct iommu_sva *iommu_sva_bind_device(struct device *dev,
> > 						struct mm_struct *mm);
> >
> Just trying to understand how to use this API.
> So if we bind the same mm to two different devices, we should get two
> different iommu_sva handle, right?
> I think intel-svm still needs a flag argument for supervisor pasid etc.
> Other than that, I think both interface should work for vt-d.
> 
> Another question is that for nested SVA, we will need to bind guest mm.
> Do you think we should try to reuse this or have it separate? I am
> working on a separate API for now.
> 

It has to be different. Host doesn't know guest mm.

Also note that from virtualization p.o.v we just focus on 'nested
translation' in host side. The 1st level may point to guest CPU
page table (SVA), or IOVA page table. In that manner, the API
(as currently defined in your series) is purely about setting up
nested translation on VFIO assigned device. 

Thanks
Kevin

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] iommu: Bind process address spaces to devices
  2019-02-27 21:41     ` Jacob Pan
  2019-02-28  1:10       ` Tian, Kevin
@ 2019-02-28 12:19       ` Jean-Philippe Brucker
  2019-02-28 18:32         ` Jacob Pan
  2019-02-28 14:09       ` Joerg Roedel
  2 siblings, 1 reply; 13+ messages in thread
From: Jean-Philippe Brucker @ 2019-02-28 12:19 UTC (permalink / raw)
  To: Jacob Pan, Joerg Roedel
  Cc: iommu, linux-kernel, christian.koenig, kevin.tian, ashok.raj,
	baolu.lu, alex.williamson

On 27/02/2019 21:41, Jacob Pan wrote:
> On Tue, 26 Feb 2019 12:17:43 +0100
> Joerg Roedel <joro@8bytes.org> wrote:
> 
>> Hi Jean-Philippe,
>>
>> Thanks for the patch! I think this is getting close to be applied
>> after the next merge window.
>>
>> On Wed, Feb 20, 2019 at 02:27:59PM +0000, Jean-Philippe Brucker wrote:
>>> +int iommu_sva_bind_device(struct device *dev, struct mm_struct
>>> *mm, int *pasid,
>>> +			  iommu_mm_exit_handler_t mm_exit, void
>>> *drvdata)  
>>
>> I think we are better of with introducing a sva-bind handle which can
>> be used to extend and further configure the binding done with this
>> function.
>>
>> How about a 'struct iommu_sva' with an iommu-private definition that
>> is returned by this function:
>>
>> 	struct iommu_sva *iommu_sva_bind_device(struct device *dev,
>> 						struct mm_struct *mm);
>>
> Just trying to understand how to use this API.
> So if we bind the same mm to two different devices, we should get two
> different iommu_sva handle, right?

Yes, the iommu_sva handle is the bond between one mm and one device, so
you will get two different handles.

> I think intel-svm still needs a flag argument for supervisor pasid etc.
> Other than that, I think both interface should work for vt-d.

Is supervisor PASID still needed now that we have auxiliary domains, and
now that VT-d supports nested IOVA? You could have private kernel
address spaces through auxiliary domains, or simply use DMA API as usual
with PASID#0. I've been reluctant to make that feature common because it
looks risky - gives full access to the kernel address space to devices
and no notification on mapping change.

> Another question is that for nested SVA, we will need to bind guest mm.
> Do you think we should try to reuse this or have it separate? I am
> working on a separate API for now.

I also think it should be separate. That bind() operation is performed
on an auxiliary domain, I guess?

>> and the corresponding unbind function:
>>
>> 	int iommu_sva_unbind_device(struct iommu_sva* *handle);
>>
>> (Btw, does this need to return and int? Can unbinding fail?).
>>
>> With that in place we can implement and extentable API base on the
>> handle:
>>
>> 	int iommu_sva_get_pasid(struct iommu_sva *handle);
> If multiple bind to the same mm gets multiple handles, this API should
> retrieve the same pasid for different handle?

Yes

> Just curious why making
> the handle private instead of returning the pasid value in the handle?

I don't have a strong objection against that. One reason to have an
accessor is that the PASID is freed on mm_exit, so until the device
driver calls unbind(), the PASID contained in the handle is stale (and
the accessor returns PASID_INVALID). But that's a bit pedantic, the
device driver should know that the handle is stale since it got notified
of it. Having an accessor might also help tracking uses of the handle in
the kernel, and make future API modifications easier.

Thanks,
Jean

>> 	void iommu_sva_set_exit_handler(struct iommu_sva *handle,
>> 					iommu_mm_exit_handler_t
>> mm_exit);
>>
>> I think at least the AMD IOMMU driver needs more call-backs like a
>> handler that is invoked when a fault can not be resolved. And there
>> might be others in the future, putting them all in the parameter list
>> of the bind function doesn't scale well.
>>
> 
>> Regards,
>>
>> 	Joerg
> 
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] iommu: Bind process address spaces to devices
  2019-02-27 21:41     ` Jacob Pan
  2019-02-28  1:10       ` Tian, Kevin
  2019-02-28 12:19       ` Jean-Philippe Brucker
@ 2019-02-28 14:09       ` Joerg Roedel
  2019-02-28 21:15         ` Jacob Pan
  2 siblings, 1 reply; 13+ messages in thread
From: Joerg Roedel @ 2019-02-28 14:09 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Jean-Philippe Brucker, iommu, linux-kernel, christian.koenig,
	kevin.tian, ashok.raj, baolu.lu, alex.williamson

Hi Jacob,

On Wed, Feb 27, 2019 at 01:41:29PM -0800, Jacob Pan wrote:
> On Tue, 26 Feb 2019 12:17:43 +0100
> Joerg Roedel <joro@8bytes.org> wrote:

> Just trying to understand how to use this API.
> So if we bind the same mm to two different devices, we should get two
> different iommu_sva handle, right?
> I think intel-svm still needs a flag argument for supervisor pasid etc.
> Other than that, I think both interface should work for vt-d.

I second Jean's question here, is supervisor pasid still needed with
scalable mode? What is the use-case and which mm_struct will be used for
supervisor accesses?

> Another question is that for nested SVA, we will need to bind guest mm.
> Do you think we should try to reuse this or have it separate? I am
> working on a separate API for now.

I think a separate API makes more sense. It could be somehow fit into
this as well, but having it separate is cleaner. And we already have
separate API for aux-domains, so this would be just another extension of
the IOMMU-API for using PASIDs.


> > 	int iommu_sva_get_pasid(struct iommu_sva *handle);
> If multiple bind to the same mm gets multiple handles, this API should
> retrieve the same pasid for different handle?

It can return the same handle if we store the pasid in the mm_struct,
for example ...
> Just curious why making the handle private instead of returning the
> pasid value in the handle?

... which is also the reason why I prefer the accessor function, it
allows to have the pasid not in the iommu_sva handle, but to retrieve it
from somewhere else (like the mm_struct).

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] iommu: Bind process address spaces to devices
  2019-02-28 12:19       ` Jean-Philippe Brucker
@ 2019-02-28 18:32         ` Jacob Pan
  0 siblings, 0 replies; 13+ messages in thread
From: Jacob Pan @ 2019-02-28 18:32 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Joerg Roedel, iommu, linux-kernel, christian.koenig, kevin.tian,
	ashok.raj, baolu.lu, alex.williamson, jacob.jun.pan

On Thu, 28 Feb 2019 12:19:22 +0000
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> On 27/02/2019 21:41, Jacob Pan wrote:
> > On Tue, 26 Feb 2019 12:17:43 +0100
> > Joerg Roedel <joro@8bytes.org> wrote:
> >   
> >> Hi Jean-Philippe,
> >>
> >> Thanks for the patch! I think this is getting close to be applied
> >> after the next merge window.
> >>
> >> On Wed, Feb 20, 2019 at 02:27:59PM +0000, Jean-Philippe Brucker
> >> wrote:  
> >>> +int iommu_sva_bind_device(struct device *dev, struct mm_struct
> >>> *mm, int *pasid,
> >>> +			  iommu_mm_exit_handler_t mm_exit, void
> >>> *drvdata)    
> >>
> >> I think we are better of with introducing a sva-bind handle which
> >> can be used to extend and further configure the binding done with
> >> this function.
> >>
> >> How about a 'struct iommu_sva' with an iommu-private definition
> >> that is returned by this function:
> >>
> >> 	struct iommu_sva *iommu_sva_bind_device(struct device *dev,
> >> 						struct mm_struct
> >> *mm); 
> > Just trying to understand how to use this API.
> > So if we bind the same mm to two different devices, we should get
> > two different iommu_sva handle, right?  
> 
> Yes, the iommu_sva handle is the bond between one mm and one device,
> so you will get two different handles.
> 
> > I think intel-svm still needs a flag argument for supervisor pasid
> > etc. Other than that, I think both interface should work for vt-d.  
> 
> Is supervisor PASID still needed now that we have auxiliary domains,
> and now that VT-d supports nested IOVA? You could have private kernel
> address spaces through auxiliary domains, or simply use DMA API as
> usual with PASID#0. I've been reluctant to make that feature common
> because it looks risky - gives full access to the kernel address
> space to devices and no notification on mapping change.
> 
It is still in the VT-d spec. Ashok will be able to answer this
better :)
> > Another question is that for nested SVA, we will need to bind guest
> > mm. Do you think we should try to reuse this or have it separate? I
> > am working on a separate API for now.  
> 
> I also think it should be separate. That bind() operation is performed
> on an auxiliary domain, I guess?
> 
yes the 2nd level is retrieved from aux domain for mdev, but for pdev,
2nd level comes from rid2pasid/default domain.
> >> and the corresponding unbind function:
> >>
> >> 	int iommu_sva_unbind_device(struct iommu_sva* *handle);
> >>
> >> (Btw, does this need to return and int? Can unbinding fail?).
> >>
> >> With that in place we can implement and extentable API base on the
> >> handle:
> >>
> >> 	int iommu_sva_get_pasid(struct iommu_sva *handle);  
> > If multiple bind to the same mm gets multiple handles, this API
> > should retrieve the same pasid for different handle?  
> 
> Yes
> 
> > Just curious why making
> > the handle private instead of returning the pasid value in the
> > handle?  
> 
> I don't have a strong objection against that. One reason to have an
> accessor is that the PASID is freed on mm_exit, so until the device
> driver calls unbind(), the PASID contained in the handle is stale (and
> the accessor returns PASID_INVALID). But that's a bit pedantic, the
> device driver should know that the handle is stale since it got
> notified of it. Having an accessor might also help tracking uses of
> the handle in the kernel, and make future API modifications easier.
> 
make sense.
> Thanks,
> Jean
> 
> >> 	void iommu_sva_set_exit_handler(struct iommu_sva *handle,
> >> 					iommu_mm_exit_handler_t
> >> mm_exit);
> >>
> >> I think at least the AMD IOMMU driver needs more call-backs like a
> >> handler that is invoked when a fault can not be resolved. And there
> >> might be others in the future, putting them all in the parameter
> >> list of the bind function doesn't scale well.
> >>  
> >   
> >> Regards,
> >>
> >> 	Joerg  
> > 
> >   
> 

[Jacob Pan]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] iommu: Bind process address spaces to devices
  2019-02-28  1:10       ` Tian, Kevin
@ 2019-02-28 18:53         ` Jacob Pan
  0 siblings, 0 replies; 13+ messages in thread
From: Jacob Pan @ 2019-02-28 18:53 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Joerg Roedel, Jean-Philippe Brucker, iommu, linux-kernel,
	christian.koenig, Raj, Ashok, baolu.lu, alex.williamson,
	jacob.jun.pan

On Thu, 28 Feb 2019 01:10:55 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Jacob Pan [mailto:jacob.jun.pan@linux.intel.com]
> > Sent: Thursday, February 28, 2019 5:41 AM
> > 
> > On Tue, 26 Feb 2019 12:17:43 +0100
> > Joerg Roedel <joro@8bytes.org> wrote:
> >   
> > >
> > > How about a 'struct iommu_sva' with an iommu-private definition
> > > that is returned by this function:
> > >
> > > 	struct iommu_sva *iommu_sva_bind_device(struct device
> > > *dev, struct mm_struct *mm);
> > >  
> > Just trying to understand how to use this API.
> > So if we bind the same mm to two different devices, we should get
> > two different iommu_sva handle, right?
> > I think intel-svm still needs a flag argument for supervisor pasid
> > etc. Other than that, I think both interface should work for vt-d.
> > 
> > Another question is that for nested SVA, we will need to bind guest
> > mm. Do you think we should try to reuse this or have it separate? I
> > am working on a separate API for now.
> >   
> 
> It has to be different. Host doesn't know guest mm.
> 
> Also note that from virtualization p.o.v we just focus on 'nested
> translation' in host side. The 1st level may point to guest CPU
> page table (SVA), or IOVA page table. In that manner, the API
> (as currently defined in your series) is purely about setting up
> nested translation on VFIO assigned device. 
>
Sounds good, will keep them separate.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] iommu: Bind process address spaces to devices
  2019-02-28 14:09       ` Joerg Roedel
@ 2019-02-28 21:15         ` Jacob Pan
  2019-02-28 22:04           ` Raj, Ashok
  0 siblings, 1 reply; 13+ messages in thread
From: Jacob Pan @ 2019-02-28 21:15 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Jean-Philippe Brucker, iommu, linux-kernel, christian.koenig,
	kevin.tian, ashok.raj, baolu.lu, alex.williamson, jacob.jun.pan

On Thu, 28 Feb 2019 15:09:50 +0100
Joerg Roedel <joro@8bytes.org> wrote:

> Hi Jacob,
> 
> On Wed, Feb 27, 2019 at 01:41:29PM -0800, Jacob Pan wrote:
> > On Tue, 26 Feb 2019 12:17:43 +0100
> > Joerg Roedel <joro@8bytes.org> wrote:  
> 
> > Just trying to understand how to use this API.
> > So if we bind the same mm to two different devices, we should get
> > two different iommu_sva handle, right?
> > I think intel-svm still needs a flag argument for supervisor pasid
> > etc. Other than that, I think both interface should work for vt-d.  
> 
> I second Jean's question here, is supervisor pasid still needed with
> scalable mode? What is the use-case and which mm_struct will be used
> for supervisor accesses?
> 
I will delegate this to Ashok.

> > Another question is that for nested SVA, we will need to bind guest
> > mm. Do you think we should try to reuse this or have it separate? I
> > am working on a separate API for now.  
> 
> I think a separate API makes more sense. It could be somehow fit into
> this as well, but having it separate is cleaner. And we already have
> separate API for aux-domains, so this would be just another extension
> of the IOMMU-API for using PASIDs.
> 
Agreed.
> 
> > > 	int iommu_sva_get_pasid(struct iommu_sva *handle);  
> > If multiple bind to the same mm gets multiple handles, this API
> > should retrieve the same pasid for different handle?  
> 
> It can return the same handle if we store the pasid in the mm_struct,
> for example ...
> > Just curious why making the handle private instead of returning the
> > pasid value in the handle?  
> 
> ... which is also the reason why I prefer the accessor function, it
> allows to have the pasid not in the iommu_sva handle, but to retrieve
> it from somewhere else (like the mm_struct).
make sense, more flexible storage and controlled access too. thanks for
explaining.


Jacob

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] iommu: Bind process address spaces to devices
  2019-02-28 21:15         ` Jacob Pan
@ 2019-02-28 22:04           ` Raj, Ashok
  0 siblings, 0 replies; 13+ messages in thread
From: Raj, Ashok @ 2019-02-28 22:04 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Joerg Roedel, Jean-Philippe Brucker, iommu, linux-kernel,
	christian.koenig, kevin.tian, baolu.lu, alex.williamson,
	Ashok Raj

On Thu, Feb 28, 2019 at 01:15:49PM -0800, Jacob Pan wrote:
> On Thu, 28 Feb 2019 15:09:50 +0100
> Joerg Roedel <joro@8bytes.org> wrote:
> 
> > Hi Jacob,
> > 
> > On Wed, Feb 27, 2019 at 01:41:29PM -0800, Jacob Pan wrote:
> > > On Tue, 26 Feb 2019 12:17:43 +0100
> > > Joerg Roedel <joro@8bytes.org> wrote:  
> > 
> > > Just trying to understand how to use this API.
> > > So if we bind the same mm to two different devices, we should get
> > > two different iommu_sva handle, right?
> > > I think intel-svm still needs a flag argument for supervisor pasid
> > > etc. Other than that, I think both interface should work for vt-d.  
> > 
> > I second Jean's question here, is supervisor pasid still needed with
> > scalable mode? What is the use-case and which mm_struct will be used
> > for supervisor accesses?
> > 
> I will delegate this to Ashok.

Supervisor PASID is still required for some kernel clients. Some of our
IB folks had asked for it. Current implementation uses init_mm, but
we know this is dangerous. Plus since the kernel has no support for
mmu_notifiers for kernel memory we were not able to invalidate
device tlb after memory was freed.

Suppose we could just build regular page-tables much like how the map/unmap
does today, but bind it with a Supervisor PASID. This way we don't open
up the kimono to the device, but only open select portions on request.

we haven't spent enough time on it lately, but will focus once the core
pieces are completed for the baseline support for Scalable mode.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-02-28 22:04 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-20 14:27 [PATCH 0/1] IOMMU SVA device driver interface Jean-Philippe Brucker
2019-02-20 14:27 ` [PATCH 1/1] iommu: Bind process address spaces to devices Jean-Philippe Brucker
2019-02-26 11:17   ` Joerg Roedel
2019-02-26 12:49     ` Jean-Philippe Brucker
2019-02-26 13:02       ` Joerg Roedel
2019-02-27 21:41     ` Jacob Pan
2019-02-28  1:10       ` Tian, Kevin
2019-02-28 18:53         ` Jacob Pan
2019-02-28 12:19       ` Jean-Philippe Brucker
2019-02-28 18:32         ` Jacob Pan
2019-02-28 14:09       ` Joerg Roedel
2019-02-28 21:15         ` Jacob Pan
2019-02-28 22:04           ` Raj, Ashok

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).