* Re: [Patch v8 03/10] vfio/type1: Report iommu nesting info to userspace
  2021-03-02 20:35   ` Liu Yi L
@ 2021-03-02 12:52     ` Jason Gunthorpe
  -1 siblings, 0 replies; 40+ messages in thread
From: Jason Gunthorpe @ 2021-03-02 12:52 UTC (permalink / raw)
  To: Liu Yi L
  Cc: alex.williamson, eric.auger, baolu.lu, joro, kevin.tian,
	jacob.jun.pan, ashok.raj, jun.j.tian, yi.y.sun, jean-philippe,
	peterx, jasowang, hao.wu, stefanha, iommu, kvm, Lingshan.Zhu,
	vivek.gautam

On Wed, Mar 03, 2021 at 04:35:38AM +0800, Liu Yi L wrote:
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 4bb162c1d649..3a5c84d4f19b 100644
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -63,22 +63,24 @@ MODULE_PARM_DESC(dma_entry_limit,
>  		 "Maximum number of user DMA mappings per container (65535).");
>  
>  struct vfio_iommu {
> -	struct list_head	domain_list;
> -	struct list_head	iova_list;
> -	struct vfio_domain	*external_domain; /* domain for external user */
> -	struct mutex		lock;
> -	struct rb_root		dma_list;
> -	struct blocking_notifier_head notifier;
> -	unsigned int		dma_avail;
> -	unsigned int		vaddr_invalid_count;
> -	uint64_t		pgsize_bitmap;
> -	uint64_t		num_non_pinned_groups;
> -	wait_queue_head_t	vaddr_wait;
> -	bool			v2;
> -	bool			nesting;
> -	bool			dirty_page_tracking;
> -	bool			pinned_page_dirty_scope;
> -	bool			container_open;
> +	struct list_head		domain_list;
> +	struct list_head		iova_list;
> +	/* domain for external user */
> +	struct vfio_domain		*external_domain;
> +	struct mutex			lock;
> +	struct rb_root			dma_list;
> +	struct blocking_notifier_head	notifier;
> +	unsigned int			dma_avail;
> +	unsigned int			vaddr_invalid_count;
> +	uint64_t			pgsize_bitmap;
> +	uint64_t			num_non_pinned_groups;
> +	wait_queue_head_t		vaddr_wait;
> +	struct iommu_nesting_info	*nesting_info;
> +	bool				v2;
> +	bool				nesting;
> +	bool				dirty_page_tracking;
> +	bool				pinned_page_dirty_scope;
> +	bool				container_open;
>  };

I always hate seeing one-line patches done like this. If you want to
re-indent, you should remove the horizontal whitespace, not add an
unreadable amount more.

Also, Linus has been unhappy before about seeing lists of bools in structs
due to the huge amount of memory they waste.
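
For illustration, a minimal hypothetical sketch (invented struct, not the
actual vfio_iommu layout) of collapsing such a run of bools into a single
flags word:

/* hypothetical example only, not the real struct vfio_iommu */
#include <linux/types.h>

#define EX_IOMMU_V2			(1U << 0)
#define EX_IOMMU_NESTING		(1U << 1)
#define EX_IOMMU_DIRTY_TRACKING		(1U << 2)
#define EX_IOMMU_PINNED_DIRTY_SCOPE	(1U << 3)
#define EX_IOMMU_CONTAINER_OPEN		(1U << 4)

struct ex_iommu {
	unsigned int	flags;		/* five former bools in one word */
};

static inline bool ex_iommu_nesting(const struct ex_iommu *x)
{
	return x->flags & EX_IOMMU_NESTING;
}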

Jason

* Re: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID
  2021-03-02 20:35   ` Liu Yi L
@ 2021-03-02 12:56     ` Jason Gunthorpe
  -1 siblings, 0 replies; 40+ messages in thread
From: Jason Gunthorpe @ 2021-03-02 12:56 UTC (permalink / raw)
  To: Liu Yi L
  Cc: alex.williamson, eric.auger, baolu.lu, joro, kevin.tian,
	jacob.jun.pan, ashok.raj, jun.j.tian, yi.y.sun, jean-philippe,
	peterx, jasowang, hao.wu, stefanha, iommu, kvm, Lingshan.Zhu,
	vivek.gautam

On Wed, Mar 03, 2021 at 04:35:39AM +0800, Liu Yi L wrote:
>  
> +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
> +{
> +	struct domain_capsule *dc = (struct domain_capsule *)data;
> +	unsigned long arg = *(unsigned long *)dc->data;
> +
> +	return iommu_uapi_sva_bind_gpasid(dc->domain, dev,
> +					  (void __user *)arg);

This arg business is really tortured. The type should be set at the
ioctl, not constantly passed down as unsigned long or, worse, void *.

And why is this passing a __user pointer deep into an iommu_* API??
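
For illustration, a hypothetical, concretely typed variant of the callback
could look like the sketch below. All names prefixed with ex_ are invented
for the sketch and are not part of this series; the point is that the ioctl
layer copies and validates the user data once and then passes a typed struct
down:

/* hypothetical sketch: decode at the ioctl, pass typed data down */
#include <linux/device.h>
#include <linux/iommu.h>

struct ex_bind_ctx {
	struct iommu_domain		*domain;
	struct iommu_gpasid_bind_data	bind;	/* copied from user once, at the ioctl */
};

/* hypothetical helper that takes already copied, typed data */
int ex_iommu_bind_gpasid(struct iommu_domain *domain, struct device *dev,
			 struct iommu_gpasid_bind_data *bind);

static int ex_dev_bind_gpasid_fn(struct device *dev, void *data)
{
	struct ex_bind_ctx *ctx = data;

	return ex_iommu_bind_gpasid(ctx->domain, dev, &ctx->bind);
}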

> +/**
> + * VFIO_IOMMU_NESTING_OP - _IOW(VFIO_TYPE, VFIO_BASE + 18,
> + *				struct vfio_iommu_type1_nesting_op)
> + *
> + * This interface allows userspace to utilize the nesting IOMMU
> + * capabilities as reported in VFIO_IOMMU_TYPE1_INFO_CAP_NESTING
> + * cap through VFIO_IOMMU_GET_INFO. For platforms which require
> + * system wide PASID, PASID will be allocated by VFIO_IOMMU_PASID
> + * _REQUEST.
> + *
> + * @data[] types defined for each op:
> + * +=================+===============================================+
> + * | NESTING OP      |      @data[]                                  |
> + * +=================+===============================================+
> + * | BIND_PGTBL      |      struct iommu_gpasid_bind_data            |
> + * +-----------------+-----------------------------------------------+
> + * | UNBIND_PGTBL    |      struct iommu_gpasid_bind_data            |
> + * +-----------------+-----------------------------------------------+

If the type is known why does the struct have a flex array?

Jason

* Re: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID
  2021-03-02 12:56     ` Jason Gunthorpe
@ 2021-03-02 17:13       ` Jacob Pan
  -1 siblings, 0 replies; 40+ messages in thread
From: Jacob Pan @ 2021-03-02 17:13 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu Yi L, alex.williamson, eric.auger, baolu.lu, joro,
	kevin.tian, ashok.raj, jun.j.tian, yi.y.sun, jean-philippe,
	peterx, jasowang, hao.wu, stefanha, iommu, kvm, Lingshan.Zhu,
	vivek.gautam, jacob.jun.pan

Hi Jason,

On Tue, 2 Mar 2021 08:56:28 -0400, Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Wed, Mar 03, 2021 at 04:35:39AM +0800, Liu Yi L wrote:
> >  
> > +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
> > +{
> > +	struct domain_capsule *dc = (struct domain_capsule *)data;
> > +	unsigned long arg = *(unsigned long *)dc->data;
> > +
> > +	return iommu_uapi_sva_bind_gpasid(dc->domain, dev,
> > +					  (void __user *)arg);  
> 
> > This arg business is really tortured. The type should be set at the
> > ioctl, not constantly passed down as unsigned long or, worse, void *.
> 
> And why is this passing a __user pointer deep into an iommu_* API??
> 
The idea was that IOMMU UAPI (not API) is independent of VFIO or other user
driver frameworks. The design is documented here:
Documentation/userspace-api/iommu.rst
IOMMU UAPI handles the typing and sanitization of user-provided data.

Could you be more specific about your concerns?

> > +/**
> > + * VFIO_IOMMU_NESTING_OP - _IOW(VFIO_TYPE, VFIO_BASE + 18,
> > + *				struct vfio_iommu_type1_nesting_op)
> > + *
> > + * This interface allows userspace to utilize the nesting IOMMU
> > + * capabilities as reported in VFIO_IOMMU_TYPE1_INFO_CAP_NESTING
> > + * cap through VFIO_IOMMU_GET_INFO. For platforms which require
> > + * system wide PASID, PASID will be allocated by VFIO_IOMMU_PASID
> > + * _REQUEST.
> > + *
> > + * @data[] types defined for each op:
> > + * +=================+===============================================+
> > + * | NESTING OP      |      @data[]                                  |
> > + * +=================+===============================================+
> > + * | BIND_PGTBL      |      struct iommu_gpasid_bind_data            |
> > + * +-----------------+-----------------------------------------------+
> > + * | UNBIND_PGTBL    |      struct iommu_gpasid_bind_data            |
> > + * +-----------------+-----------------------------------------------+
> 
> If the type is known why does the struct have a flex array?
> 
This will be extended to other types in the next patches.

> Jason


Thanks,

Jacob

* Re: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID
  2021-03-02 17:13       ` Jacob Pan
@ 2021-03-02 17:15         ` Jason Gunthorpe
  -1 siblings, 0 replies; 40+ messages in thread
From: Jason Gunthorpe @ 2021-03-02 17:15 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Liu Yi L, alex.williamson, eric.auger, baolu.lu, joro,
	kevin.tian, ashok.raj, jun.j.tian, yi.y.sun, jean-philippe,
	peterx, jasowang, hao.wu, stefanha, iommu, kvm, Lingshan.Zhu,
	vivek.gautam

On Tue, Mar 02, 2021 at 09:13:19AM -0800, Jacob Pan wrote:
> Hi Jason,
> 
> On Tue, 2 Mar 2021 08:56:28 -0400, Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > On Wed, Mar 03, 2021 at 04:35:39AM +0800, Liu Yi L wrote:
> > >  
> > > +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
> > > +{
> > > +	struct domain_capsule *dc = (struct domain_capsule *)data;
> > > +	unsigned long arg = *(unsigned long *)dc->data;
> > > +
> > > +	return iommu_uapi_sva_bind_gpasid(dc->domain, dev,
> > > +					  (void __user *)arg);  
> > 
> > This arg business is really tortured. The type should be set at the
> > ioctl, not constantly passed down as unsigned long or, worse, void *.
> > 
> > And why is this passing a __user pointer deep into an iommu_* API??
> > 
> The idea was that IOMMU UAPI (not API) is independent of VFIO or other user
> driver frameworks. The design is documented here:
> Documentation/userspace-api/iommu.rst
> IOMMU UAPI handles the typing and sanitization of user-provided data.

Why? If it is uapi it has defined types and those types should be
completely clear from the C code, not obfuscated.

I haven't looked at the design doc yet, but this is just a big red
flag: you shouldn't be tunneling one subsystem's uAPI through another
subsystem.

If you need to hook two subsystems together it should be done more
directly: VFIO takes in the IOMMU FD and 'registers' itself in
some way with the IOMMU, then you can do the IOMMU actions through the
IOMMU FD and it can call back to VFIO as needed.

At least in this way we can swap VFIO for other things in the API.

Having every subsystem that wants to implement IOMMU also implement
tunneled ops seems very backwards.
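
As a rough sketch of the kind of direct hookup described above (every
symbol here is invented; no such interface exists in this series):

/*
 * Hypothetical registration interface: VFIO (or any other consumer)
 * registers with an IOMMU FD, and IOMMU FD ioctls later call back
 * through these ops instead of tunneling IOMMU uAPI through VFIO.
 */
#include <linux/device.h>

struct ex_iommufd_consumer_ops {
	int	(*attach_device)(void *priv, struct device *dev);
	void	(*detach_device)(void *priv, struct device *dev);
};

int ex_iommufd_register_consumer(int iommu_fd,
				 const struct ex_iommufd_consumer_ops *ops,
				 void *priv);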

> Could you be more specific about your concerns?

Avoid using unsigned long, void * and flex arrays to describe
concretely typed things.

Jason

* [Patch v8 00/10] vfio: expose virtual Shared Virtual Addressing to VMs
@ 2021-03-02 20:35 ` Liu Yi L
  0 siblings, 0 replies; 40+ messages in thread
From: Liu Yi L @ 2021-03-02 20:35 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: kevin.tian, jacob.jun.pan, ashok.raj, yi.l.liu, jun.j.tian,
	yi.y.sun, jean-philippe, peterx, jasowang, hao.wu, stefanha,
	iommu, kvm, jgg, Lingshan.Zhu, vivek.gautam

Shared Virtual Addressing (SVA), a.k.a. Shared Virtual Memory (SVM) on
Intel platforms, allows address space sharing between device DMA and
applications. SVA can reduce programming complexity and enhance security.

This VFIO series is intended to expose SVA usage to VMs, i.e. sharing a
guest application's address space with passthrough devices. This is called
vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
changes. The IOMMU and QEMU changes are in separate series (listed
in the "Related series").

The high-level architecture for SVA virtualization is shown below; the key
design of vSVA support is to utilize the dual-stage IOMMU translation
(also known as IOMMU nesting translation) capability of the host IOMMU.


    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest process CR3, FL only|
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush -
    '-------------'                       |
    |             |                       V
    |             |                CR3 in GPA
    '-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
    .-------------.  .----------------------.
    |   pIOMMU    |  | Bind FL for GVA-GPA  |
    |             |  '----------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.------------------------------.
    |             |   |SL for GPA-HPA, default domain|
    |             |   '------------------------------'
    '-------------'
Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables

This patch series has been updated with regard to the discussion around PASID
allocation in v7 [1]. The PASID allocation has been removed, and the series
adapted to the /dev/ioasid solution [2]. The patches in this series have
therefore been re-ordered. The patch overview is as below:

 1. reports IOMMU nesting info to userspace (patch 0001, 0002, 0003, 0010)
 2. vfio support for binding guest page table to host (patch 0004)
 3. vfio support for IOMMU cache invalidation from VMs (patch 0005)
 4. vfio support for vSVA usage on IOMMU-backed mdevs (patch 0006, 0007)
 5. expose PASID capability to VM (patch 0008)
 6. add doc for VFIO dual stage control (patch 0009)

The complete vSVA kernel upstream patches are divided into three phases:
    1. Common APIs and PCI device direct assignment
    2. IOMMU-backed Mediated Device assignment
    3. Page Request Services (PRS) support

This patchset aims at phase 1 and phase 2, and it has a dependency on the IOASID
extension from Jacob Pan [3]. The complete set of current vSVA kernel and QEMU patches
can be found in [4] and [5].

[1] https://lore.kernel.org/kvm/DM5PR11MB14351121729909028D6EB365C31D0@DM5PR11MB1435.namprd11.prod.outlook.com/
[2] https://lore.kernel.org/linux-iommu/1614463286-97618-19-git-send-email-jacob.jun.pan@linux.intel.com/
[3] https://lore.kernel.org/linux-iommu/1614463286-97618-1-git-send-email-jacob.jun.pan@linux.intel.com/
[4] https://github.com/jacobpan/linux/tree/vsva-linux-5.12-rc1-v8
[5] https://github.com/luxis1999/qemu/tree/vsva_5.12_rc1_qemu_rfcv11

Regards,
Yi Liu

Changelog:
	- Patch v7 -> Patch v8:
	  a) removed the PASID allocation out of this series, it is covered by below patch:
	     https://lore.kernel.org/linux-iommu/1614463286-97618-19-git-send-email-jacob.jun.pan@linux.intel.com/
	  Patch v7: https://lore.kernel.org/kvm/1599734733-6431-1-git-send-email-yi.l.liu@intel.com/

	- Patch v6 -> Patch v7:
	  a) drop [PATCH v6 01/15] of v6 as it's merged by Alex.
	  b) rebase on Jacob's v8 IOMMU uapi enhancement and v2 IOASID extension patchset.
	  c) Address comments against v6 from Alex and Eric.
	  Patch v6: https://lore.kernel.org/kvm/1595917664-33276-1-git-send-email-yi.l.liu@intel.com/

	- Patch v5 -> Patch v6:
	  a) Address comments against v5 from Eric.
	  b) rebase on Jacob's v6 IOMMU uapi enhancement
	  Patch v5: https://lore.kernel.org/kvm/1594552870-55687-1-git-send-email-yi.l.liu@intel.com/

	- Patch v4 -> Patch v5:
	  a) Address comments against v4
	  Patch v4: https://lore.kernel.org/kvm/1593861989-35920-1-git-send-email-yi.l.liu@intel.com/

	- Patch v3 -> Patch v4:
	  a) Address comments against v3
	  b) Add rb from Stefan on patch 14/15
	  Patch v3: https://lore.kernel.org/kvm/1592988927-48009-1-git-send-email-yi.l.liu@intel.com/

	- Patch v2 -> Patch v3:
	  a) Rebase on top of Jacob's v3 iommu uapi patchset
	  b) Address comments from Kevin and Stefan Hajnoczi
	  c) Reuse DOMAIN_ATTR_NESTING to get iommu nesting info
	  d) Drop [PATCH v2 07/15] iommu/uapi: Add iommu_gpasid_unbind_data
	  Patch v2: https://lore.kernel.org/kvm/1591877734-66527-1-git-send-email-yi.l.liu@intel.com/

	- Patch v1 -> Patch v2:
	  a) Refactor vfio_iommu_type1_ioctl() per suggestion from Christoph
	     Hellwig.
	  b) Re-sequence the patch series for better bisect support.
	  c) Report IOMMU nesting cap info in detail instead of a format in
	     v1.
	  d) Enforce one group per nesting type container for vfio iommu type1
	     driver.
	  e) Build the vfio_mm related code from vfio.c to be a separate
	     vfio_pasid.ko.
	  f) Add PASID ownership check in IOMMU driver.
	  g) Adapted to the latest IOMMU UAPI design. Removed the IOMMU UAPI version
	     check. Added iommu_gpasid_unbind_data for unbind requests from
	     userspace.
	  h) Define a single ioctl: VFIO_IOMMU_NESTING_OP for bind/unbind_gtbl
	     and cache_invld.
	  i) Document dual stage control in vfio.rst.
	  Patch v1: https://lore.kernel.org/kvm/1584880325-10561-1-git-send-email-yi.l.liu@intel.com/

	- RFC v3 -> Patch v1:
	  a) Address comments to the PASID request (alloc/free) path
	  b) Report PASID alloc/free availability to user-space
	  c) Add a vfio_iommu_type1 parameter to support pasid quota tuning
	  d) Adjusted to latest ioasid code implementation. e.g. remove the
	     code for tracking the allocated PASIDs as latest ioasid code
	     will track it, VFIO could use ioasid_free_set() to free all
	     PASIDs.
	  RFC v3: https://lore.kernel.org/kvm/1580299912-86084-1-git-send-email-yi.l.liu@intel.com/

	- RFC v2 -> v3:
	  a) Refine the whole patchset to roughly fit the parts in this series
	  b) Adds complete vfio PASID management framework. e.g. pasid alloc,
	  free, reclaim in VM crash/down and per-VM PASID quota to prevent
	  PASID abuse.
	  c) Adds IOMMU uAPI version check and page table format check to ensure
	  version compatibility and hardware compatibility.
	  d) Adds vSVA vfio support for IOMMU-backed mdevs.
	  RFC v2: https://lore.kernel.org/kvm/1571919983-3231-1-git-send-email-yi.l.liu@intel.com/

	- RFC v1 -> v2:
	  Dropped vfio: VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE.
	  RFC v1: https://lore.kernel.org/kvm/1562324772-3084-1-git-send-email-yi.l.liu@intel.com/

---
Eric Auger (1):
  vfio: Document dual stage control

Liu Yi L (8):
  iommu: Report domain nesting info
  iommu/smmu: Report empty domain nesting info
  vfio/type1: Report iommu nesting info to userspace
  vfio/type1: Support binding guest page tables to PASID
  vfio/type1: Allow invalidating first-level/stage IOMMU cache
  vfio/type1: Add vSVA support for IOMMU-backed mdevs
  vfio/pci: Expose PCIe PASID capability to userspace
  iommu/vt-d: Support reporting nesting capability info

Yi Sun (1):
  iommu: Pass domain to sva_unbind_gpasid()

 Documentation/driver-api/vfio.rst           |  77 +++++
 Documentation/userspace-api/iommu.rst       |   5 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  29 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.c       |  29 +-
 drivers/iommu/intel/cap_audit.h             |   7 +
 drivers/iommu/intel/iommu.c                 |  68 ++++-
 drivers/iommu/intel/svm.c                   |   3 +-
 drivers/iommu/iommu.c                       |   2 +-
 drivers/vfio/pci/vfio_pci_config.c          |   2 +-
 drivers/vfio/vfio_iommu_type1.c             | 321 +++++++++++++++++++-
 include/linux/intel-iommu.h                 |   3 +-
 include/linux/iommu.h                       |   3 +-
 include/uapi/linux/iommu.h                  |  72 +++++
 include/uapi/linux/vfio.h                   |  57 ++++
 14 files changed, 651 insertions(+), 27 deletions(-)

-- 
2.25.1


* [Patch v8 01/10] iommu: Report domain nesting info
  2021-03-02 20:35 ` Liu Yi L
@ 2021-03-02 20:35   ` Liu Yi L
  -1 siblings, 0 replies; 40+ messages in thread
From: Liu Yi L @ 2021-03-02 20:35 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: kevin.tian, jacob.jun.pan, ashok.raj, yi.l.liu, jun.j.tian,
	yi.y.sun, jean-philippe, peterx, jasowang, hao.wu, stefanha,
	iommu, kvm, jgg, Lingshan.Zhu, vivek.gautam

IOMMUs that support nesting translation need to report the capability info
to userspace. It gives information about the requirements userspace needs
to implement, plus other features characterizing the physical implementation.

This patch introduces a new IOMMU UAPI struct that gives information about
the nesting capabilities and features. This struct is supposed to be returned
by iommu_domain_get_attr() with the DOMAIN_ATTR_NESTING attribute parameter, on
a domain whose DOMAIN_ATTR_NESTING attribute has been set.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v7 -> v8:
*) add padding in struct iommu_nesting_info_vtd
*) describe future extension rules for struct iommu_nesting_info in iommu.rst.
*) remove SYSWIDE_PASID

v6 -> v7:
*) rephrase the commit message, replace the @data[] field in struct
   iommu_nesting_info with union per comments from Eric Auger.

v5 -> v6:
*) rephrase the feature notes per comments from Eric Auger.
*) rename @size of struct iommu_nesting_info to @argsz.

v4 -> v5:
*) address comments from Eric Auger.

v3 -> v4:
*) split the SMMU driver changes to be a separate patch
*) move the @addr_width and @pasid_bits from vendor specific
   part to generic part.
*) tweak the description for the @features field of struct
   iommu_nesting_info.
*) add description on the @data[] field of struct iommu_nesting_info

v2 -> v3:
*) remove cap/ecap_mask in iommu_nesting_info.
*) reuse DOMAIN_ATTR_NESTING to get nesting info.
*) return an empty iommu_nesting_info for SMMU drivers per Jean's
   suggestion.
---
 Documentation/userspace-api/iommu.rst |  5 +-
 include/uapi/linux/iommu.h            | 72 +++++++++++++++++++++++++++
 2 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/Documentation/userspace-api/iommu.rst b/Documentation/userspace-api/iommu.rst
index d3108c1519d5..ad06bb94aad5 100644
--- a/Documentation/userspace-api/iommu.rst
+++ b/Documentation/userspace-api/iommu.rst
@@ -26,6 +26,7 @@ supported user-kernel APIs are as follows:
 2. Bind/Unbind guest PASID table (e.g. ARM SMMU)
 3. Invalidate IOMMU caches upon guest requests
 4. Report errors to the guest and serve page requests
+5. Read iommu_nesting_info from kernel
 
 Requirements
 ============
@@ -96,7 +97,9 @@ kernel. Simply recompiling existing code with newer kernel header should
 not be an issue in that only existing flags are used.
 
 IOMMU vendor driver should report the below features to IOMMU UAPI
-consumers (e.g. via VFIO).
+consumers (e.g. via VFIO). The feature list is passed by struct
+iommu_nesting_info. The future extension to this structure follows
+the rule defined in section "Extension Rules & Precautions".
 
 1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
 2. IOMMU_NESTING_FEAT_BIND_PGTBL
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index e1d9e75f2c94..e924bfc091e8 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -338,4 +338,76 @@ struct iommu_gpasid_bind_data {
 	} vendor;
 };
 
+/*
+ * struct iommu_nesting_info_vtd - Intel VT-d specific nesting info.
+ *
+ * @flags:	VT-d specific flags. Currently reserved for future
+ *		extension. must be set to 0.
+ * @cap_reg:	Describe basic capabilities as defined in VT-d capability
+ *		register.
+ * @ecap_reg:	Describe the extended capabilities as defined in VT-d
+ *		extended capability register.
+ */
+struct iommu_nesting_info_vtd {
+	__u32	flags;
+	__u8	padding[12];
+	__u64	cap_reg;
+	__u64	ecap_reg;
+};
+
+/*
+ * struct iommu_nesting_info - Information for nesting-capable IOMMU.
+ *			       userspace should check it before using
+ *			       nesting capability.
+ *
+ * @argsz:	size of the whole structure.
+ * @flags:	currently reserved for future extension. must set to 0.
+ * @format:	PASID table entry format, the same definition as struct
+ *		iommu_gpasid_bind_data @format.
+ * @features:	supported nesting features.
+ * @addr_width:	the output addr width of first level/stage translation
+ * @pasid_bits:	maximum supported PASID bits, 0 represents no PASID
+ *		support.
+ * @vendor:	vendor specific data, structure type can be deduced from
+ *		@format field.
+ *
+ * +===============+======================================================+
+ * | feature       |  Notes                                               |
+ * +===============+======================================================+
+ * | BIND_PGTBL    |  IOMMU vendor driver sets it to mandate userspace to |
+ * |               |  bind the first level/stage page table to associated |
+ * |               |  PASID (either the one specified in bind request or  |
+ * |               |  the default PASID of iommu domain), through IOMMU   |
+ * |               |  UAPI.                                               |
+ * +---------------+------------------------------------------------------+
+ * | CACHE_INVLD   |  IOMMU vendor driver sets it to mandate userspace to |
+ * |               |  explicitly invalidate the IOMMU cache through IOMMU |
+ * |               |  UAPI according to vendor-specific requirement when  |
+ * |               |  changing the 1st level/stage page table.            |
+ * +---------------+------------------------------------------------------+
+ *
+ * data struct types defined for @format:
+ * +================================+=====================================+
+ * | @format                        | data struct                         |
+ * +================================+=====================================+
+ * | IOMMU_PASID_FORMAT_INTEL_VTD   | struct iommu_nesting_info_vtd       |
+ * +--------------------------------+-------------------------------------+
+ *
+ */
+struct iommu_nesting_info {
+	__u32	argsz;
+	__u32	flags;
+	__u32	format;
+#define IOMMU_NESTING_FEAT_BIND_PGTBL		(1 << 0)
+#define IOMMU_NESTING_FEAT_CACHE_INVLD		(1 << 1)
+	__u32	features;
+	__u16	addr_width;
+	__u16	pasid_bits;
+	__u8	padding[12];
+	/* Vendor specific data */
+	union {
+		struct iommu_nesting_info_vtd vtd;
+	} vendor;
+};
+
 #endif /* _UAPI_IOMMU_H */
-- 
2.25.1
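
For illustration, a hypothetical userspace check of the reported info could
look like the sketch below. It assumes the uapi header from this patch is
installed as <linux/iommu.h> and that the struct has already been obtained
(for example via the VFIO capability added later in this series); the
function name is invented:

/*
 * Hypothetical userspace-side check of the reported nesting info.
 * How the struct is obtained is outside the scope of this sketch.
 */
#include <stdio.h>
#include <stdbool.h>
#include <stddef.h>
#include <linux/iommu.h>	/* assumes the uapi header from this patch */

static bool ex_nesting_usable(const struct iommu_nesting_info *info,
			      size_t buf_size)
{
	/* the kernel reports the required size in argsz if the buffer was too small */
	if (info->argsz > buf_size) {
		fprintf(stderr, "need %u bytes for nesting info\n", info->argsz);
		return false;
	}

	printf("format %u, addr_width %u, pasid_bits %u\n",
	       info->format, (unsigned int)info->addr_width,
	       (unsigned int)info->pasid_bits);

	/* both features are needed for the full vSVA flow in this series */
	return (info->features & IOMMU_NESTING_FEAT_BIND_PGTBL) &&
	       (info->features & IOMMU_NESTING_FEAT_CACHE_INVLD);
}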


* [Patch v8 02/10] iommu/smmu: Report empty domain nesting info
  2021-03-02 20:35 ` Liu Yi L
@ 2021-03-02 20:35   ` Liu Yi L
  -1 siblings, 0 replies; 40+ messages in thread
From: Liu Yi L @ 2021-03-02 20:35 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: kevin.tian, jacob.jun.pan, ashok.raj, yi.l.liu, jun.j.tian,
	yi.y.sun, jean-philippe, peterx, jasowang, hao.wu, stefanha,
	iommu, kvm, jgg, Lingshan.Zhu, vivek.gautam, Will Deacon,
	Robin Murphy

This patch is added because, instead of returning a boolean for DOMAIN_ATTR_NESTING,
iommu_domain_get_attr() should return an iommu_nesting_info handle. For
now, return an empty nesting info struct, as true nesting is not
yet supported by the SMMUs.

Note: this patch just ensures there is no compile issue. To be functionally
ready for the ARM platform, the patches from Vivek Gautam in the link below
need to be applied.

https://lore.kernel.org/linux-iommu/20210212105859.8445-1-vivek.gautam@arm.com/

Cc: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Suggested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
v5 -> v6:
*) add review-by from Eric Auger.

v4 -> v5:
*) address comments from Eric Auger.
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 29 +++++++++++++++++++--
 drivers/iommu/arm/arm-smmu/arm-smmu.c       | 29 +++++++++++++++++++--
 2 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8594b4a83043..99ea3ee35826 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2449,6 +2449,32 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
 	return group;
 }
 
+static int arm_smmu_domain_nesting_info(struct arm_smmu_domain *smmu_domain,
+					void *data)
+{
+	struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
+	unsigned int size;
+
+	if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+		return -ENODEV;
+
+	size = sizeof(struct iommu_nesting_info);
+
+	/*
+	 * if provided buffer size is smaller than expected, should
+	 * return 0 and also the expected buffer size to caller.
+	 */
+	if (info->argsz < size) {
+		info->argsz = size;
+		return 0;
+	}
+
+	/* report an empty iommu_nesting_info for now */
+	memset(info, 0x0, size);
+	info->argsz = size;
+	return 0;
+}
+
 static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 				    enum iommu_attr attr, void *data)
 {
@@ -2458,8 +2484,7 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 	case IOMMU_DOMAIN_UNMANAGED:
 		switch (attr) {
 		case DOMAIN_ATTR_NESTING:
-			*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
-			return 0;
+			return arm_smmu_domain_nesting_info(smmu_domain, data);
 		default:
 			return -ENODEV;
 		}
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index d8c6bfde6a61..d874c580ea80 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1481,6 +1481,32 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
 	return group;
 }
 
+static int arm_smmu_domain_nesting_info(struct arm_smmu_domain *smmu_domain,
+					void *data)
+{
+	struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
+	unsigned int size;
+
+	if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+		return -ENODEV;
+
+	size = sizeof(struct iommu_nesting_info);
+
+	/*
+	 * if provided buffer size is smaller than expected, should
+	 * return 0 and also the expected buffer size to caller.
+	 */
+	if (info->argsz < size) {
+		info->argsz = size;
+		return 0;
+	}
+
+	/* report an empty iommu_nesting_info for now */
+	memset(info, 0x0, size);
+	info->argsz = size;
+	return 0;
+}
+
 static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 				    enum iommu_attr attr, void *data)
 {
@@ -1490,8 +1516,7 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 	case IOMMU_DOMAIN_UNMANAGED:
 		switch (attr) {
 		case DOMAIN_ATTR_NESTING:
-			*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
-			return 0;
+			return arm_smmu_domain_nesting_info(smmu_domain, data);
 		case DOMAIN_ATTR_IO_PGTABLE_CFG: {
 			struct io_pgtable_domain_attr *pgtbl_cfg = data;
 			*pgtbl_cfg = smmu_domain->pgtbl_cfg;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch v8 02/10] iommu/smmu: Report empty domain nesting info
@ 2021-03-02 20:35   ` Liu Yi L
  0 siblings, 0 replies; 40+ messages in thread
From: Liu Yi L @ 2021-03-02 20:35 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, vivek.gautam, yi.y.sun, jgg, Robin Murphy, Lingshan.Zhu,
	Will Deacon, jasowang, hao.wu

This patch is added as instead of returning a boolean for DOMAIN_ATTR_NESTING,
iommu_domain_get_attr() should return an iommu_nesting_info handle. For
now, return an empty nesting info struct for now as true nesting is not
yet supported by the SMMUs.

Note: this patch just ensure no compiling issue, to be functional ready
fro ARM platform, needs to apply patches from Vivek Gautam in below link.

https://lore.kernel.org/linux-iommu/20210212105859.8445-1-vivek.gautam@arm.com/

Cc: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Suggested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
v5 -> v6:
*) add review-by from Eric Auger.

v4 -> v5:
*) address comments from Eric Auger.
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 29 +++++++++++++++++++--
 drivers/iommu/arm/arm-smmu/arm-smmu.c       | 29 +++++++++++++++++++--
 2 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8594b4a83043..99ea3ee35826 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2449,6 +2449,32 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
 	return group;
 }
 
+static int arm_smmu_domain_nesting_info(struct arm_smmu_domain *smmu_domain,
+					void *data)
+{
+	struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
+	unsigned int size;
+
+	if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+		return -ENODEV;
+
+	size = sizeof(struct iommu_nesting_info);
+
+	/*
+	 * If the provided buffer size is smaller than expected, we should
+	 * return 0 and report the expected buffer size to the caller.
+	 */
+	if (info->argsz < size) {
+		info->argsz = size;
+		return 0;
+	}
+
+	/* report an empty iommu_nesting_info for now */
+	memset(info, 0x0, size);
+	info->argsz = size;
+	return 0;
+}
+
 static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 				    enum iommu_attr attr, void *data)
 {
@@ -2458,8 +2484,7 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 	case IOMMU_DOMAIN_UNMANAGED:
 		switch (attr) {
 		case DOMAIN_ATTR_NESTING:
-			*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
-			return 0;
+			return arm_smmu_domain_nesting_info(smmu_domain, data);
 		default:
 			return -ENODEV;
 		}
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index d8c6bfde6a61..d874c580ea80 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1481,6 +1481,32 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
 	return group;
 }
 
+static int arm_smmu_domain_nesting_info(struct arm_smmu_domain *smmu_domain,
+					void *data)
+{
+	struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
+	unsigned int size;
+
+	if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+		return -ENODEV;
+
+	size = sizeof(struct iommu_nesting_info);
+
+	/*
+	 * If the provided buffer size is smaller than expected, we should
+	 * return 0 and report the expected buffer size to the caller.
+	 */
+	if (info->argsz < size) {
+		info->argsz = size;
+		return 0;
+	}
+
+	/* report an empty iommu_nesting_info for now */
+	memset(info, 0x0, size);
+	info->argsz = size;
+	return 0;
+}
+
 static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 				    enum iommu_attr attr, void *data)
 {
@@ -1490,8 +1516,7 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 	case IOMMU_DOMAIN_UNMANAGED:
 		switch (attr) {
 		case DOMAIN_ATTR_NESTING:
-			*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
-			return 0;
+			return arm_smmu_domain_nesting_info(smmu_domain, data);
 		case DOMAIN_ATTR_IO_PGTABLE_CFG: {
 			struct io_pgtable_domain_attr *pgtbl_cfg = data;
 			*pgtbl_cfg = smmu_domain->pgtbl_cfg;
-- 
2.25.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch v8 03/10] vfio/type1: Report iommu nesting info to userspace
  2021-03-02 20:35 ` Liu Yi L
@ 2021-03-02 20:35   ` Liu Yi L
  -1 siblings, 0 replies; 40+ messages in thread
From: Liu Yi L @ 2021-03-02 20:35 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: kevin.tian, jacob.jun.pan, ashok.raj, yi.l.liu, jun.j.tian,
	yi.y.sun, jean-philippe, peterx, jasowang, hao.wu, stefanha,
	iommu, kvm, jgg, Lingshan.Zhu, vivek.gautam

This patch exports iommu nesting capability info to user space through
VFIO. Userspace is expected to check this info for the supported uAPIs
(e.g. bind page table, cache invalidation) and for the vendor-specific
format information of the first-level/stage page table that will be bound.

The nesting info is available only after the container is set to the
NESTED type. The current implementation imposes one limitation - one
nesting container should include at most one iommu group. The philosophy
of a vfio container is having all groups/devices within the container
share the same IOMMU context. When vSVA is enabled, one IOMMU context
could include one 2nd-level address space and multiple 1st-level address
spaces. While the 2nd-level address space is reasonably sharable by
multiple groups, blindly sharing 1st-level address spaces across all
groups within the container might instead break the guest expectation.
In the future a sub/super container concept might be introduced to allow
partial address space sharing within an IOMMU context. But for now let's
go with this restriction by requiring a singleton container for using the
nesting iommu features. The link below has the related discussion about
this decision.

https://lore.kernel.org/kvm/20200515115924.37e6996d@w520.home/

This patch also changes the NESTING type container behaviour. Something
that would have succeeded before will now fail: before this series, if the
user asked for VFIO_IOMMU_TYPE1_NESTING, it would have succeeded even if
the SMMU didn't support stage-2, as the driver would have silently fallen
back on stage-1 mappings (which work exactly the same as stage-2 only,
since there was no nesting support). After the series, we do check for
DOMAIN_ATTR_NESTING, so if the user asks for VFIO_IOMMU_TYPE1_NESTING and
the SMMU doesn't support stage-2, the ioctl fails. But it should be a good
fix and completely harmless. Details can be found in the link below as
well.

https://lore.kernel.org/kvm/20200717090900.GC4850@myrica/
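
For illustration, a hedged userspace sketch (not part of this patch) of how
a VMM might locate the new capability in the VFIO_IOMMU_GET_INFO capability
chain; buffer sizing and error handling are simplified, and
find_nesting_info() is just a name chosen for the example:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/iommu.h>
#include <linux/vfio.h>

/* @buf must be large enough for the info struct plus its capability chain */
static struct iommu_nesting_info *find_nesting_info(int container, void *buf,
						     size_t bufsz)
{
	struct vfio_iommu_type1_info *info = buf;
	uint8_t *base = buf;
	uint32_t off;

	memset(buf, 0, bufsz);
	info->argsz = bufsz;
	if (ioctl(container, VFIO_IOMMU_GET_INFO, info))
		return NULL;
	if (!(info->flags & VFIO_IOMMU_INFO_CAPS) || !info->cap_offset)
		return NULL;

	/* Walk the capability chain looking for the nesting capability */
	for (off = info->cap_offset; off; ) {
		struct vfio_info_cap_header *hdr = (void *)(base + off);

		if (hdr->id == VFIO_IOMMU_TYPE1_INFO_CAP_NESTING)
			return &((struct vfio_iommu_type1_info_cap_nesting *)hdr)->info;
		off = hdr->next;
	}
	return NULL;
}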

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
v7 -> v8:
*) tweak per Alex's comments against v7.
*) check "iommu->nesting_info->format == 0" in attach_group()

v6 -> v7:
*) using vfio_info_add_capability() for adding nesting cap per suggestion
   from Eric.

v5 -> v6:
*) address comments against v5 from Eric Auger.
*) don't report nesting cap to userspace if the nesting_info->format is
   invalid.

v4 -> v5:
*) address comments from Eric Auger.
*) return struct iommu_nesting_info for VFIO_IOMMU_TYPE1_INFO_CAP_NESTING as
   cap is quite "cheap"; if it needs extension in the future, just define
   another cap.
   https://lore.kernel.org/kvm/20200708132947.5b7ee954@x1.home/

v3 -> v4:
*) address comments against v3.

v1 -> v2:
*) added in v2
---
 drivers/vfio/vfio_iommu_type1.c | 102 +++++++++++++++++++++++++++-----
 include/uapi/linux/vfio.h       |  19 ++++++
 2 files changed, 105 insertions(+), 16 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 4bb162c1d649..3a5c84d4f19b 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -63,22 +63,24 @@ MODULE_PARM_DESC(dma_entry_limit,
 		 "Maximum number of user DMA mappings per container (65535).");
 
 struct vfio_iommu {
-	struct list_head	domain_list;
-	struct list_head	iova_list;
-	struct vfio_domain	*external_domain; /* domain for external user */
-	struct mutex		lock;
-	struct rb_root		dma_list;
-	struct blocking_notifier_head notifier;
-	unsigned int		dma_avail;
-	unsigned int		vaddr_invalid_count;
-	uint64_t		pgsize_bitmap;
-	uint64_t		num_non_pinned_groups;
-	wait_queue_head_t	vaddr_wait;
-	bool			v2;
-	bool			nesting;
-	bool			dirty_page_tracking;
-	bool			pinned_page_dirty_scope;
-	bool			container_open;
+	struct list_head		domain_list;
+	struct list_head		iova_list;
+	/* domain for external user */
+	struct vfio_domain		*external_domain;
+	struct mutex			lock;
+	struct rb_root			dma_list;
+	struct blocking_notifier_head	notifier;
+	unsigned int			dma_avail;
+	unsigned int			vaddr_invalid_count;
+	uint64_t			pgsize_bitmap;
+	uint64_t			num_non_pinned_groups;
+	wait_queue_head_t		vaddr_wait;
+	struct iommu_nesting_info	*nesting_info;
+	bool				v2;
+	bool				nesting;
+	bool				dirty_page_tracking;
+	bool				pinned_page_dirty_scope;
+	bool				container_open;
 };
 
 struct vfio_domain {
@@ -144,6 +146,9 @@ struct vfio_regions {
 #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)	\
 					(!list_empty(&iommu->domain_list))
 
+#define CONTAINER_HAS_DOMAIN(iommu)	(((iommu)->external_domain) || \
+					 (!list_empty(&(iommu)->domain_list)))
+
 #define DIRTY_BITMAP_BYTES(n)	(ALIGN(n, BITS_PER_TYPE(u64)) / BITS_PER_BYTE)
 
 /*
@@ -2242,6 +2247,12 @@ static void vfio_iommu_iova_insert_copy(struct vfio_iommu *iommu,
 	list_splice_tail(iova_copy, iova);
 }
 
+static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)
+{
+	kfree(iommu->nesting_info);
+	iommu->nesting_info = NULL;
+}
+
 static int vfio_iommu_type1_attach_group(void *iommu_data,
 					 struct iommu_group *iommu_group)
 {
@@ -2264,6 +2275,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 		return -EINVAL;
 	}
 
+	/* Nesting type container can include only one group */
+	if (iommu->nesting && CONTAINER_HAS_DOMAIN(iommu)) {
+		mutex_unlock(&iommu->lock);
+		return -EINVAL;
+	}
+
 	group = kzalloc(sizeof(*group), GFP_KERNEL);
 	domain = kzalloc(sizeof(*domain), GFP_KERNEL);
 	if (!group || !domain) {
@@ -2332,6 +2349,31 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	if (ret)
 		goto out_domain;
 
+	/* Nesting cap info is available only after attaching */
+	if (iommu->nesting) {
+		int size = sizeof(struct iommu_nesting_info);
+
+		iommu->nesting_info = kzalloc(size, GFP_KERNEL);
+		if (!iommu->nesting_info) {
+			ret = -ENOMEM;
+			goto out_detach;
+		}
+
+		/* Now get the nesting info */
+		iommu->nesting_info->argsz = size;
+		ret = iommu_domain_get_attr(domain->domain,
+					    DOMAIN_ATTR_NESTING,
+					    iommu->nesting_info);
+		if (ret)
+			goto out_detach;
+
+		/* when @format of nesting_info is 0, fail the attach */
+		if (iommu->nesting_info->format == 0) {
+			ret = -ENOENT;
+			goto out_detach;
+		}
+	}
+
 	/* Get aperture info */
 	iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY, &geo);
 
@@ -2441,6 +2483,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	return 0;
 
 out_detach:
+	vfio_iommu_release_nesting_info(iommu);
 	vfio_iommu_detach_group(domain, group);
 out_domain:
 	iommu_domain_free(domain->domain);
@@ -2626,6 +2669,8 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
 				} else {
 					vfio_iommu_unmap_unpin_reaccount(iommu);
 				}
+
+				vfio_iommu_release_nesting_info(iommu);
 			}
 			iommu_domain_free(domain->domain);
 			list_del(&domain->next);
@@ -2854,6 +2899,28 @@ static int vfio_iommu_dma_avail_build_caps(struct vfio_iommu *iommu,
 					sizeof(cap_dma_avail));
 }
 
+static int vfio_iommu_nesting_build_caps(struct vfio_iommu *iommu,
+					 struct vfio_info_cap *caps)
+{
+	struct vfio_iommu_type1_info_cap_nesting nesting_cap;
+	size_t size;
+
+	/* when nesting_info is null, no need to go further */
+	if (!iommu->nesting_info)
+		return 0;
+
+	size = offsetof(struct vfio_iommu_type1_info_cap_nesting, info) +
+	       iommu->nesting_info->argsz;
+
+	nesting_cap.header.id = VFIO_IOMMU_TYPE1_INFO_CAP_NESTING;
+	nesting_cap.header.version = 1;
+
+	memcpy(&nesting_cap.info, iommu->nesting_info,
+	       iommu->nesting_info->argsz);
+
+	return vfio_info_add_capability(caps, &nesting_cap.header, size);
+}
+
 static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
 				     unsigned long arg)
 {
@@ -2892,6 +2959,9 @@ static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
 	if (!ret)
 		ret = vfio_iommu_iova_build_caps(iommu, &caps);
 
+	if (!ret)
+		ret = vfio_iommu_nesting_build_caps(iommu, &caps);
+
 	mutex_unlock(&iommu->lock);
 
 	if (ret)
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 8ce36c1d53ca..5dab20aaf7e7 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -14,6 +14,7 @@
 
 #include <linux/types.h>
 #include <linux/ioctl.h>
+#include <linux/iommu.h>
 
 #define VFIO_API_VERSION	0
 
@@ -1073,6 +1074,24 @@ struct vfio_iommu_type1_info_dma_avail {
 	__u32	avail;
 };
 
+/*
+ * The nesting capability allows reporting the related capability
+ * and info for the nesting iommu type.
+ *
+ * The structures below define version 1 of this capability.
+ *
+ * Nested capabilities should be checked by the userspace after
+ * setting VFIO_TYPE1_NESTING_IOMMU.
+ *
+ * @info: the nesting info provided by IOMMU driver.
+ */
+#define VFIO_IOMMU_TYPE1_INFO_CAP_NESTING  4
+
+struct vfio_iommu_type1_info_cap_nesting {
+	struct	vfio_info_cap_header header;
+	struct	iommu_nesting_info info;
+};
+
 #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
 
 /**
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch v8 03/10] vfio/type1: Report iommu nesting info to userspace
@ 2021-03-02 20:35   ` Liu Yi L
  0 siblings, 0 replies; 40+ messages in thread
From: Liu Yi L @ 2021-03-02 20:35 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, vivek.gautam, yi.y.sun, jgg, Lingshan.Zhu, jasowang,
	hao.wu

This patch exports iommu nesting capability info to user space through
VFIO. Userspace is expected to check this info for the supported uAPIs
(e.g. bind page table, cache invalidation) and for the vendor-specific
format information of the first-level/stage page table that will be bound.

The nesting info is available only after the container is set to the
NESTED type. The current implementation imposes one limitation - one
nesting container should include at most one iommu group. The philosophy
of a vfio container is having all groups/devices within the container
share the same IOMMU context. When vSVA is enabled, one IOMMU context
could include one 2nd-level address space and multiple 1st-level address
spaces. While the 2nd-level address space is reasonably sharable by
multiple groups, blindly sharing 1st-level address spaces across all
groups within the container might instead break the guest expectation.
In the future a sub/super container concept might be introduced to allow
partial address space sharing within an IOMMU context. But for now let's
go with this restriction by requiring a singleton container for using the
nesting iommu features. The link below has the related discussion about
this decision.

https://lore.kernel.org/kvm/20200515115924.37e6996d@w520.home/

This patch also changes the NESTING type container behaviour. Something
that would have succeeded before will now fail: before this series, if the
user asked for VFIO_IOMMU_TYPE1_NESTING, it would have succeeded even if
the SMMU didn't support stage-2, as the driver would have silently fallen
back on stage-1 mappings (which work exactly the same as stage-2 only,
since there was no nesting support). After the series, we do check for
DOMAIN_ATTR_NESTING, so if the user asks for VFIO_IOMMU_TYPE1_NESTING and
the SMMU doesn't support stage-2, the ioctl fails. But it should be a good
fix and completely harmless. Details can be found in the link below as
well.

https://lore.kernel.org/kvm/20200717090900.GC4850@myrica/

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
v7 -> v8:
*) tweak per Alex's comments against v7.
*) check "iommu->nesting_info->format == 0" in attach_group()

v6 -> v7:
*) using vfio_info_add_capability() for adding nesting cap per suggestion
   from Eric.

v5 -> v6:
*) address comments against v5 from Eric Auger.
*) don't report nesting cap to userspace if the nesting_info->format is
   invalid.

v4 -> v5:
*) address comments from Eric Auger.
*) return struct iommu_nesting_info for VFIO_IOMMU_TYPE1_INFO_CAP_NESTING as
   cap is quite "cheap"; if it needs extension in the future, just define
   another cap.
   https://lore.kernel.org/kvm/20200708132947.5b7ee954@x1.home/

v3 -> v4:
*) address comments against v3.

v1 -> v2:
*) added in v2
---
 drivers/vfio/vfio_iommu_type1.c | 102 +++++++++++++++++++++++++++-----
 include/uapi/linux/vfio.h       |  19 ++++++
 2 files changed, 105 insertions(+), 16 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 4bb162c1d649..3a5c84d4f19b 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -63,22 +63,24 @@ MODULE_PARM_DESC(dma_entry_limit,
 		 "Maximum number of user DMA mappings per container (65535).");
 
 struct vfio_iommu {
-	struct list_head	domain_list;
-	struct list_head	iova_list;
-	struct vfio_domain	*external_domain; /* domain for external user */
-	struct mutex		lock;
-	struct rb_root		dma_list;
-	struct blocking_notifier_head notifier;
-	unsigned int		dma_avail;
-	unsigned int		vaddr_invalid_count;
-	uint64_t		pgsize_bitmap;
-	uint64_t		num_non_pinned_groups;
-	wait_queue_head_t	vaddr_wait;
-	bool			v2;
-	bool			nesting;
-	bool			dirty_page_tracking;
-	bool			pinned_page_dirty_scope;
-	bool			container_open;
+	struct list_head		domain_list;
+	struct list_head		iova_list;
+	/* domain for external user */
+	struct vfio_domain		*external_domain;
+	struct mutex			lock;
+	struct rb_root			dma_list;
+	struct blocking_notifier_head	notifier;
+	unsigned int			dma_avail;
+	unsigned int			vaddr_invalid_count;
+	uint64_t			pgsize_bitmap;
+	uint64_t			num_non_pinned_groups;
+	wait_queue_head_t		vaddr_wait;
+	struct iommu_nesting_info	*nesting_info;
+	bool				v2;
+	bool				nesting;
+	bool				dirty_page_tracking;
+	bool				pinned_page_dirty_scope;
+	bool				container_open;
 };
 
 struct vfio_domain {
@@ -144,6 +146,9 @@ struct vfio_regions {
 #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)	\
 					(!list_empty(&iommu->domain_list))
 
+#define CONTAINER_HAS_DOMAIN(iommu)	(((iommu)->external_domain) || \
+					 (!list_empty(&(iommu)->domain_list)))
+
 #define DIRTY_BITMAP_BYTES(n)	(ALIGN(n, BITS_PER_TYPE(u64)) / BITS_PER_BYTE)
 
 /*
@@ -2242,6 +2247,12 @@ static void vfio_iommu_iova_insert_copy(struct vfio_iommu *iommu,
 	list_splice_tail(iova_copy, iova);
 }
 
+static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)
+{
+	kfree(iommu->nesting_info);
+	iommu->nesting_info = NULL;
+}
+
 static int vfio_iommu_type1_attach_group(void *iommu_data,
 					 struct iommu_group *iommu_group)
 {
@@ -2264,6 +2275,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 		return -EINVAL;
 	}
 
+	/* Nesting type container can include only one group */
+	if (iommu->nesting && CONTAINER_HAS_DOMAIN(iommu)) {
+		mutex_unlock(&iommu->lock);
+		return -EINVAL;
+	}
+
 	group = kzalloc(sizeof(*group), GFP_KERNEL);
 	domain = kzalloc(sizeof(*domain), GFP_KERNEL);
 	if (!group || !domain) {
@@ -2332,6 +2349,31 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	if (ret)
 		goto out_domain;
 
+	/* Nesting cap info is available only after attaching */
+	if (iommu->nesting) {
+		int size = sizeof(struct iommu_nesting_info);
+
+		iommu->nesting_info = kzalloc(size, GFP_KERNEL);
+		if (!iommu->nesting_info) {
+			ret = -ENOMEM;
+			goto out_detach;
+		}
+
+		/* Now get the nesting info */
+		iommu->nesting_info->argsz = size;
+		ret = iommu_domain_get_attr(domain->domain,
+					    DOMAIN_ATTR_NESTING,
+					    iommu->nesting_info);
+		if (ret)
+			goto out_detach;
+
+		/* when @format of nesting_info is 0, fail the attach */
+		if (iommu->nesting_info->format == 0) {
+			ret = -ENOENT;
+			goto out_detach;
+		}
+	}
+
 	/* Get aperture info */
 	iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY, &geo);
 
@@ -2441,6 +2483,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	return 0;
 
 out_detach:
+	vfio_iommu_release_nesting_info(iommu);
 	vfio_iommu_detach_group(domain, group);
 out_domain:
 	iommu_domain_free(domain->domain);
@@ -2626,6 +2669,8 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
 				} else {
 					vfio_iommu_unmap_unpin_reaccount(iommu);
 				}
+
+				vfio_iommu_release_nesting_info(iommu);
 			}
 			iommu_domain_free(domain->domain);
 			list_del(&domain->next);
@@ -2854,6 +2899,28 @@ static int vfio_iommu_dma_avail_build_caps(struct vfio_iommu *iommu,
 					sizeof(cap_dma_avail));
 }
 
+static int vfio_iommu_nesting_build_caps(struct vfio_iommu *iommu,
+					 struct vfio_info_cap *caps)
+{
+	struct vfio_iommu_type1_info_cap_nesting nesting_cap;
+	size_t size;
+
+	/* when nesting_info is null, no need to go further */
+	if (!iommu->nesting_info)
+		return 0;
+
+	size = offsetof(struct vfio_iommu_type1_info_cap_nesting, info) +
+	       iommu->nesting_info->argsz;
+
+	nesting_cap.header.id = VFIO_IOMMU_TYPE1_INFO_CAP_NESTING;
+	nesting_cap.header.version = 1;
+
+	memcpy(&nesting_cap.info, iommu->nesting_info,
+	       iommu->nesting_info->argsz);
+
+	return vfio_info_add_capability(caps, &nesting_cap.header, size);
+}
+
 static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
 				     unsigned long arg)
 {
@@ -2892,6 +2959,9 @@ static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
 	if (!ret)
 		ret = vfio_iommu_iova_build_caps(iommu, &caps);
 
+	if (!ret)
+		ret = vfio_iommu_nesting_build_caps(iommu, &caps);
+
 	mutex_unlock(&iommu->lock);
 
 	if (ret)
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 8ce36c1d53ca..5dab20aaf7e7 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -14,6 +14,7 @@
 
 #include <linux/types.h>
 #include <linux/ioctl.h>
+#include <linux/iommu.h>
 
 #define VFIO_API_VERSION	0
 
@@ -1073,6 +1074,24 @@ struct vfio_iommu_type1_info_dma_avail {
 	__u32	avail;
 };
 
+/*
+ * The nesting capability allows reporting the related capability
+ * and info for the nesting iommu type.
+ *
+ * The structures below define version 1 of this capability.
+ *
+ * Nested capabilities should be checked by the userspace after
+ * setting VFIO_TYPE1_NESTING_IOMMU.
+ *
+ * @info: the nesting info provided by IOMMU driver.
+ */
+#define VFIO_IOMMU_TYPE1_INFO_CAP_NESTING  4
+
+struct vfio_iommu_type1_info_cap_nesting {
+	struct	vfio_info_cap_header header;
+	struct	iommu_nesting_info info;
+};
+
 #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
 
 /**
-- 
2.25.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID
  2021-03-02 20:35 ` Liu Yi L
@ 2021-03-02 20:35   ` Liu Yi L
  -1 siblings, 0 replies; 40+ messages in thread
From: Liu Yi L @ 2021-03-02 20:35 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: kevin.tian, jacob.jun.pan, ashok.raj, yi.l.liu, jun.j.tian,
	yi.y.sun, jean-philippe, peterx, jasowang, hao.wu, stefanha,
	iommu, kvm, jgg, Lingshan.Zhu, vivek.gautam

Nesting translation allows two levels/stages of page tables, with the
1st level for guest translations (e.g. GVA->GPA) and the 2nd level for
host translations (e.g. GPA->HPA). This patch adds an interface for
binding guest page tables to a PASID. This PASID must have been allocated
by userspace before the binding request, e.g. allocated from /dev/ioasid.
As the bind data is parsed by the iommu abstraction layer, this patch
doesn't have the ownership check against the PASID from userspace. That
is done in the iommu sub-system.
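
As a usage illustration (not part of the patch), here is a hedged userspace
sketch of issuing the BIND_PGTBL op; the PASID values and the guest page
directory pointer are placeholders the VMM gets from /dev/ioasid and from
the guest's PASID table, the VT-d vendor sub-fields are omitted for
brevity, and bind_guest_pgtbl() is just a name chosen for the example:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/iommu.h>
#include <linux/vfio.h>

static int bind_guest_pgtbl(int container, __u64 hpasid, __u64 gpasid,
			    __u64 gpgd)
{
	struct {
		struct vfio_iommu_type1_nesting_op op;
		struct iommu_gpasid_bind_data bind;
	} req;

	memset(&req, 0, sizeof(req));
	req.op.argsz = sizeof(req);
	req.op.flags = VFIO_IOMMU_NESTING_OP_BIND_PGTBL;

	req.bind.argsz = sizeof(req.bind);
	req.bind.version = IOMMU_GPASID_BIND_VERSION_1;
	req.bind.format = IOMMU_PASID_FORMAT_INTEL_VTD; /* per nesting_info */
	req.bind.addr_width = 48;			/* guest AW, example */
	req.bind.flags = IOMMU_SVA_GPASID_VAL;
	req.bind.gpgd = gpgd;				/* guest 1st-level PGD */
	req.bind.hpasid = hpasid;
	req.bind.gpasid = gpasid;

	return ioctl(container, VFIO_IOMMU_NESTING_OP, &req);
}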

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v7 -> v8:
*) adapt to /dev/ioasid
*) address comments from Alex on v7.
*) adapt to latest iommu_sva_unbind_gpasid() implementation.
*) remove the OP check against VFIO_IOMMU_NESTING_OP_NUM as it's redundant
   to the default switch case in vfio_iommu_handle_pgtbl_op().

v6 -> v7:
*) introduced @user in struct domain_capsule to simplify the code per Eric's
   suggestion.
*) introduced VFIO_IOMMU_NESTING_OP_NUM for sanitizing op from userspace.
*) corrected the @argsz value of unbind_data in vfio_group_unbind_gpasid_fn().

v5 -> v6:
*) dropped vfio_find_nesting_group() and add vfio_get_nesting_domain_capsule().
   per comment from Eric.
*) use iommu_uapi_sva_bind/unbind_gpasid() and iommu_sva_unbind_gpasid() in
   linux/iommu.h for userspace operation and in-kernel operation.

v3 -> v4:
*) address comments from Alex on v3

v2 -> v3:
*) use __iommu_sva_unbind_gpasid() for unbind call issued by VFIO
   https://lore.kernel.org/linux-iommu/1592931837-58223-6-git-send-email-jacob.jun.pan@linux.intel.com/

v1 -> v2:
*) rename subject from "vfio/type1: Bind guest page tables to host"
*) remove VFIO_IOMMU_BIND, introduce VFIO_IOMMU_NESTING_OP to support bind/
   unbind guest page table
*) replaced vfio_iommu_for_each_dev() with a group level loop since this
   series enforces one group per container w/ nesting type as start.
*) rename vfio_bind/unbind_gpasid_fn() to vfio_dev_bind/unbind_gpasid_fn()
*) vfio_dev_unbind_gpasid() always successful
*) use vfio_mm->pasid_lock to avoid race between PASID free and page table
   bind/unbind
---
 drivers/vfio/vfio_iommu_type1.c | 156 ++++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h       |  35 +++++++
 2 files changed, 191 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 3a5c84d4f19b..0044931b80dc 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -164,6 +164,34 @@ struct vfio_regions {
 
 #define WAITED 1
 
+struct domain_capsule {
+	struct vfio_group	*group;
+	struct iommu_domain	*domain;
+	void			*data;
+	/* set if @data contains a user pointer */
+	bool			user;
+};
+
+/* iommu->lock must be held */
+static int vfio_prepare_nesting_domain_capsule(struct vfio_iommu *iommu,
+					       struct domain_capsule *dc)
+{
+	struct vfio_domain *domain;
+	struct vfio_group *group;
+
+	if (!iommu->nesting_info)
+		return -EINVAL;
+
+	domain = list_first_entry(&iommu->domain_list,
+				  struct vfio_domain, next);
+	group = list_first_entry(&domain->group_list,
+				 struct vfio_group, next);
+	dc->group = group;
+	dc->domain = domain->domain;
+	dc->user = true;
+	return 0;
+}
+
 static int put_pfn(unsigned long pfn, int prot);
 
 static struct vfio_group *vfio_iommu_find_iommu_group(struct vfio_iommu *iommu,
@@ -2607,6 +2635,51 @@ static int vfio_iommu_resv_refresh(struct vfio_iommu *iommu,
 	return ret;
 }
 
+static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
+{
+	struct domain_capsule *dc = (struct domain_capsule *)data;
+	unsigned long arg = *(unsigned long *)dc->data;
+
+	return iommu_uapi_sva_bind_gpasid(dc->domain, dev,
+					  (void __user *)arg);
+}
+
+static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
+{
+	struct domain_capsule *dc = (struct domain_capsule *)data;
+
+	/*
+	 * dc->user is a toggle for the unbind operation. When set,
+	 * dc->data carries a __user pointer and iommu_uapi_sva_unbind_gpasid()
+	 * is used, which copies the unbind data from the user buffer.
+	 * When clear, dc->data carries the pasid to be unbound, so there
+	 * is no need to copy data from userspace.
+	 */
+	if (dc->user) {
+		unsigned long arg = *(unsigned long *)dc->data;
+
+		iommu_uapi_sva_unbind_gpasid(dc->domain,
+					     dev, (void __user *)arg);
+	} else {
+		ioasid_t pasid = *(ioasid_t *)dc->data;
+
+		iommu_sva_unbind_gpasid(dc->domain, dev, pasid);
+	}
+	return 0;
+}
+
+static void vfio_group_unbind_gpasid_fn(ioasid_t pasid, void *data)
+{
+	struct domain_capsule *dc = (struct domain_capsule *)data;
+
+	dc->user = false;
+	dc->data = &pasid;
+
+	iommu_group_for_each_dev(dc->group->iommu_group,
+				 dc, vfio_dev_unbind_gpasid_fn);
+}
+
 static void vfio_iommu_type1_detach_group(void *iommu_data,
 					  struct iommu_group *iommu_group)
 {
@@ -2650,6 +2723,27 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
 		if (!group)
 			continue;
 
+		if (iommu->nesting_info &&
+		    iommu->nesting_info->features &
+					IOMMU_NESTING_FEAT_BIND_PGTBL) {
+			struct domain_capsule dc = { .group = group,
+						     .domain = domain->domain,
+						     .data = NULL };
+			struct ioasid_user *iuser;
+
+			/*
+			 * For devices attached to nesting type iommu,
+			 * VFIO should unbind page tables bound with the
+			 * devices in the iommu group before detaching.
+			 */
+			iuser = ioasid_user_get_from_task(current);
+			if (!(IS_ERR(iuser) || !iuser)) {
+				ioasid_user_for_each_id(iuser, &dc,
+					       vfio_group_unbind_gpasid_fn);
+				ioasid_user_put(iuser);
+			}
+		}
+
 		vfio_iommu_detach_group(domain, group);
 		update_dirty_scope = !group->pinned_page_dirty_scope;
 		list_del(&group->next);
@@ -3166,6 +3260,66 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
 	return -EINVAL;
 }
 
+static long vfio_iommu_handle_pgtbl_op(struct vfio_iommu *iommu,
+				       bool is_bind, unsigned long arg)
+{
+	struct domain_capsule dc = { .data = &arg, .user = true };
+	struct iommu_nesting_info *info;
+	int ret;
+
+	mutex_lock(&iommu->lock);
+
+	info = iommu->nesting_info;
+	if (!info || !(info->features & IOMMU_NESTING_FEAT_BIND_PGTBL)) {
+		ret = -EOPNOTSUPP;
+		goto out_unlock;
+	}
+
+	ret = vfio_prepare_nesting_domain_capsule(iommu, &dc);
+	if (ret)
+		goto out_unlock;
+
+	if (is_bind)
+		ret = iommu_group_for_each_dev(dc.group->iommu_group, &dc,
+					       vfio_dev_bind_gpasid_fn);
+	if (ret || !is_bind)
+		iommu_group_for_each_dev(dc.group->iommu_group,
+					 &dc, vfio_dev_unbind_gpasid_fn);
+
+out_unlock:
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
+static long vfio_iommu_type1_nesting_op(struct vfio_iommu *iommu,
+					unsigned long arg)
+{
+	struct vfio_iommu_type1_nesting_op hdr;
+	unsigned int minsz;
+	int ret;
+
+	minsz = offsetofend(struct vfio_iommu_type1_nesting_op, flags);
+
+	if (copy_from_user(&hdr, (void __user *)arg, minsz))
+		return -EFAULT;
+
+	if (hdr.argsz < minsz || hdr.flags & ~VFIO_NESTING_OP_MASK)
+		return -EINVAL;
+
+	switch (hdr.flags & VFIO_NESTING_OP_MASK) {
+	case VFIO_IOMMU_NESTING_OP_BIND_PGTBL:
+		ret = vfio_iommu_handle_pgtbl_op(iommu, true, arg + minsz);
+		break;
+	case VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL:
+		ret = vfio_iommu_handle_pgtbl_op(iommu, false, arg + minsz);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
 				   unsigned int cmd, unsigned long arg)
 {
@@ -3182,6 +3336,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		return vfio_iommu_type1_unmap_dma(iommu, arg);
 	case VFIO_IOMMU_DIRTY_PAGES:
 		return vfio_iommu_type1_dirty_pages(iommu, arg);
+	case VFIO_IOMMU_NESTING_OP:
+		return vfio_iommu_type1_nesting_op(iommu, arg);
 	default:
 		return -ENOTTY;
 	}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 5dab20aaf7e7..985e6cf4c52d 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1227,6 +1227,41 @@ struct vfio_iommu_type1_dirty_bitmap_get {
 
 #define VFIO_IOMMU_DIRTY_PAGES             _IO(VFIO_TYPE, VFIO_BASE + 17)
 
+/**
+ * VFIO_IOMMU_NESTING_OP - _IOW(VFIO_TYPE, VFIO_BASE + 18,
+ *				struct vfio_iommu_type1_nesting_op)
+ *
+ * This interface allows userspace to utilize the nesting IOMMU
+ * capabilities as reported in VFIO_IOMMU_TYPE1_INFO_CAP_NESTING
+ * cap through VFIO_IOMMU_GET_INFO. For platforms which require
+ * system wide PASID, PASID will be allocated by VFIO_IOMMU_PASID
+ * _REQUEST.
+ *
+ * @data[] types defined for each op:
+ * +=================+===============================================+
+ * | NESTING OP      |      @data[]                                  |
+ * +=================+===============================================+
+ * | BIND_PGTBL      |      struct iommu_gpasid_bind_data            |
+ * +-----------------+-----------------------------------------------+
+ * | UNBIND_PGTBL    |      struct iommu_gpasid_bind_data            |
+ * +-----------------+-----------------------------------------------+
+ *
+ * returns: 0 on success, -errno on failure.
+ */
+struct vfio_iommu_type1_nesting_op {
+	__u32	argsz;
+	__u32	flags;
+#define VFIO_NESTING_OP_MASK	(0xffff) /* lower 16-bits for op */
+	__u8	data[];
+};
+
+enum {
+	VFIO_IOMMU_NESTING_OP_BIND_PGTBL,
+	VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL,
+};
+
+#define VFIO_IOMMU_NESTING_OP		_IO(VFIO_TYPE, VFIO_BASE + 18)
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID
@ 2021-03-02 20:35   ` Liu Yi L
  0 siblings, 0 replies; 40+ messages in thread
From: Liu Yi L @ 2021-03-02 20:35 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, vivek.gautam, yi.y.sun, jgg, Lingshan.Zhu, jasowang,
	hao.wu

Nesting translation allows two levels/stages of page tables, with the
1st level for guest translations (e.g. GVA->GPA) and the 2nd level for
host translations (e.g. GPA->HPA). This patch adds an interface for
binding guest page tables to a PASID. This PASID must have been allocated
by userspace before the binding request, e.g. allocated from /dev/ioasid.
As the bind data is parsed by the iommu abstraction layer, this patch
doesn't have the ownership check against the PASID from userspace. That
is done in the iommu sub-system.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v7 -> v8:
*) adapt to /dev/ioasid
*) address comments from Alex on v7.
*) adapt to latest iommu_sva_unbind_gpasid() implementation.
*) remove the OP check against VFIO_IOMMU_NESTING_OP_NUM as it's redundant
   to the default switch case in vfio_iommu_handle_pgtbl_op().

v6 -> v7:
*) introduced @user in struct domain_capsule to simplify the code per Eric's
   suggestion.
*) introduced VFIO_IOMMU_NESTING_OP_NUM for sanitizing op from userspace.
*) corrected the @argsz value of unbind_data in vfio_group_unbind_gpasid_fn().

v5 -> v6:
*) dropped vfio_find_nesting_group() and add vfio_get_nesting_domain_capsule().
   per comment from Eric.
*) use iommu_uapi_sva_bind/unbind_gpasid() and iommu_sva_unbind_gpasid() in
   linux/iommu.h for userspace operation and in-kernel operation.

v3 -> v4:
*) address comments from Alex on v3

v2 -> v3:
*) use __iommu_sva_unbind_gpasid() for unbind call issued by VFIO
   https://lore.kernel.org/linux-iommu/1592931837-58223-6-git-send-email-jacob.jun.pan@linux.intel.com/

v1 -> v2:
*) rename subject from "vfio/type1: Bind guest page tables to host"
*) remove VFIO_IOMMU_BIND, introduce VFIO_IOMMU_NESTING_OP to support bind/
   unbind guest page table
*) replaced vfio_iommu_for_each_dev() with a group level loop since this
   series enforces one group per container w/ nesting type as start.
*) rename vfio_bind/unbind_gpasid_fn() to vfio_dev_bind/unbind_gpasid_fn()
*) vfio_dev_unbind_gpasid() always successful
*) use vfio_mm->pasid_lock to avoid race between PASID free and page table
   bind/unbind
---
 drivers/vfio/vfio_iommu_type1.c | 156 ++++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h       |  35 +++++++
 2 files changed, 191 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 3a5c84d4f19b..0044931b80dc 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -164,6 +164,34 @@ struct vfio_regions {
 
 #define WAITED 1
 
+struct domain_capsule {
+	struct vfio_group	*group;
+	struct iommu_domain	*domain;
+	void			*data;
+	/* set if @data contains a user pointer */
+	bool			user;
+};
+
+/* iommu->lock must be held */
+static int vfio_prepare_nesting_domain_capsule(struct vfio_iommu *iommu,
+					       struct domain_capsule *dc)
+{
+	struct vfio_domain *domain;
+	struct vfio_group *group;
+
+	if (!iommu->nesting_info)
+		return -EINVAL;
+
+	domain = list_first_entry(&iommu->domain_list,
+				  struct vfio_domain, next);
+	group = list_first_entry(&domain->group_list,
+				 struct vfio_group, next);
+	dc->group = group;
+	dc->domain = domain->domain;
+	dc->user = true;
+	return 0;
+}
+
 static int put_pfn(unsigned long pfn, int prot);
 
 static struct vfio_group *vfio_iommu_find_iommu_group(struct vfio_iommu *iommu,
@@ -2607,6 +2635,51 @@ static int vfio_iommu_resv_refresh(struct vfio_iommu *iommu,
 	return ret;
 }
 
+static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
+{
+	struct domain_capsule *dc = (struct domain_capsule *)data;
+	unsigned long arg = *(unsigned long *)dc->data;
+
+	return iommu_uapi_sva_bind_gpasid(dc->domain, dev,
+					  (void __user *)arg);
+}
+
+static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
+{
+	struct domain_capsule *dc = (struct domain_capsule *)data;
+
+	/*
+	 * dc->user is a toggle for the unbind operation. When set,
+	 * dc->data carries a __user pointer and iommu_uapi_sva_unbind_gpasid()
+	 * is used, which copies the unbind data from the user buffer.
+	 * When clear, dc->data carries the pasid to be unbound, so there
+	 * is no need to copy data from userspace.
+	 */
+	if (dc->user) {
+		unsigned long arg = *(unsigned long *)dc->data;
+
+		iommu_uapi_sva_unbind_gpasid(dc->domain,
+					     dev, (void __user *)arg);
+	} else {
+		ioasid_t pasid = *(ioasid_t *)dc->data;
+
+		iommu_sva_unbind_gpasid(dc->domain, dev, pasid);
+	}
+	return 0;
+}
+
+static void vfio_group_unbind_gpasid_fn(ioasid_t pasid, void *data)
+{
+	struct domain_capsule *dc = (struct domain_capsule *)data;
+
+	dc->user = false;
+	dc->data = &pasid;
+
+	iommu_group_for_each_dev(dc->group->iommu_group,
+				 dc, vfio_dev_unbind_gpasid_fn);
+}
+
 static void vfio_iommu_type1_detach_group(void *iommu_data,
 					  struct iommu_group *iommu_group)
 {
@@ -2650,6 +2723,27 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
 		if (!group)
 			continue;
 
+		if (iommu->nesting_info &&
+		    iommu->nesting_info->features &
+					IOMMU_NESTING_FEAT_BIND_PGTBL) {
+			struct domain_capsule dc = { .group = group,
+						     .domain = domain->domain,
+						     .data = NULL };
+			struct ioasid_user *iuser;
+
+			/*
+			 * For devices attached to nesting type iommu,
+			 * VFIO should unbind page tables bound with the
+			 * devices in the iommu group before detaching.
+			 */
+			iuser = ioasid_user_get_from_task(current);
+			if (!(IS_ERR(iuser) || !iuser)) {
+				ioasid_user_for_each_id(iuser, &dc,
+					       vfio_group_unbind_gpasid_fn);
+				ioasid_user_put(iuser);
+			}
+		}
+
 		vfio_iommu_detach_group(domain, group);
 		update_dirty_scope = !group->pinned_page_dirty_scope;
 		list_del(&group->next);
@@ -3166,6 +3260,66 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
 	return -EINVAL;
 }
 
+static long vfio_iommu_handle_pgtbl_op(struct vfio_iommu *iommu,
+				       bool is_bind, unsigned long arg)
+{
+	struct domain_capsule dc = { .data = &arg, .user = true };
+	struct iommu_nesting_info *info;
+	int ret;
+
+	mutex_lock(&iommu->lock);
+
+	info = iommu->nesting_info;
+	if (!info || !(info->features & IOMMU_NESTING_FEAT_BIND_PGTBL)) {
+		ret = -EOPNOTSUPP;
+		goto out_unlock;
+	}
+
+	ret = vfio_prepare_nesting_domain_capsule(iommu, &dc);
+	if (ret)
+		goto out_unlock;
+
+	if (is_bind)
+		ret = iommu_group_for_each_dev(dc.group->iommu_group, &dc,
+					       vfio_dev_bind_gpasid_fn);
+	if (ret || !is_bind)
+		iommu_group_for_each_dev(dc.group->iommu_group,
+					 &dc, vfio_dev_unbind_gpasid_fn);
+
+out_unlock:
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
+static long vfio_iommu_type1_nesting_op(struct vfio_iommu *iommu,
+					unsigned long arg)
+{
+	struct vfio_iommu_type1_nesting_op hdr;
+	unsigned int minsz;
+	int ret;
+
+	minsz = offsetofend(struct vfio_iommu_type1_nesting_op, flags);
+
+	if (copy_from_user(&hdr, (void __user *)arg, minsz))
+		return -EFAULT;
+
+	if (hdr.argsz < minsz || hdr.flags & ~VFIO_NESTING_OP_MASK)
+		return -EINVAL;
+
+	switch (hdr.flags & VFIO_NESTING_OP_MASK) {
+	case VFIO_IOMMU_NESTING_OP_BIND_PGTBL:
+		ret = vfio_iommu_handle_pgtbl_op(iommu, true, arg + minsz);
+		break;
+	case VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL:
+		ret = vfio_iommu_handle_pgtbl_op(iommu, false, arg + minsz);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
 				   unsigned int cmd, unsigned long arg)
 {
@@ -3182,6 +3336,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		return vfio_iommu_type1_unmap_dma(iommu, arg);
 	case VFIO_IOMMU_DIRTY_PAGES:
 		return vfio_iommu_type1_dirty_pages(iommu, arg);
+	case VFIO_IOMMU_NESTING_OP:
+		return vfio_iommu_type1_nesting_op(iommu, arg);
 	default:
 		return -ENOTTY;
 	}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 5dab20aaf7e7..985e6cf4c52d 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1227,6 +1227,41 @@ struct vfio_iommu_type1_dirty_bitmap_get {
 
 #define VFIO_IOMMU_DIRTY_PAGES             _IO(VFIO_TYPE, VFIO_BASE + 17)
 
+/**
+ * VFIO_IOMMU_NESTING_OP - _IOW(VFIO_TYPE, VFIO_BASE + 18,
+ *				struct vfio_iommu_type1_nesting_op)
+ *
+ * This interface allows userspace to utilize the nesting IOMMU
+ * capabilities as reported in VFIO_IOMMU_TYPE1_INFO_CAP_NESTING
+ * cap through VFIO_IOMMU_GET_INFO. For platforms which require
+ * system wide PASID, PASID will be allocated by VFIO_IOMMU_PASID
+ * _REQUEST.
+ *
+ * @data[] types defined for each op:
+ * +=================+===============================================+
+ * | NESTING OP      |      @data[]                                  |
+ * +=================+===============================================+
+ * | BIND_PGTBL      |      struct iommu_gpasid_bind_data            |
+ * +-----------------+-----------------------------------------------+
+ * | UNBIND_PGTBL    |      struct iommu_gpasid_bind_data            |
+ * +-----------------+-----------------------------------------------+
+ *
+ * returns: 0 on success, -errno on failure.
+ */
+struct vfio_iommu_type1_nesting_op {
+	__u32	argsz;
+	__u32	flags;
+#define VFIO_NESTING_OP_MASK	(0xffff) /* lower 16-bits for op */
+	__u8	data[];
+};
+
+enum {
+	VFIO_IOMMU_NESTING_OP_BIND_PGTBL,
+	VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL,
+};
+
+#define VFIO_IOMMU_NESTING_OP		_IO(VFIO_TYPE, VFIO_BASE + 18)
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*
-- 
2.25.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch v8 05/10] vfio/type1: Allow invalidating first-level/stage IOMMU cache
  2021-03-02 20:35 ` Liu Yi L
@ 2021-03-02 20:35   ` Liu Yi L
  -1 siblings, 0 replies; 40+ messages in thread
From: Liu Yi L @ 2021-03-02 20:35 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: kevin.tian, jacob.jun.pan, ashok.raj, yi.l.liu, jun.j.tian,
	yi.y.sun, jean-philippe, peterx, jasowang, hao.wu, stefanha,
	iommu, kvm, jgg, Lingshan.Zhu, vivek.gautam

This patch provides an interface allowing userspace to invalidate the
IOMMU cache for the first-level page table. It is required when the
first-level IOMMU page table is not managed by the host kernel in the
nested translation setup.
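
For illustration, a hedged userspace sketch (not part of the patch) of a
PASID-wide first-level TLB invalidation issued after the guest modified its
page table; the exact cache/granularity combination a vendor driver accepts
may differ, and invalidate_pasid() is just a name chosen for the example:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/iommu.h>
#include <linux/vfio.h>

static int invalidate_pasid(int container, __u64 hpasid)
{
	struct {
		struct vfio_iommu_type1_nesting_op op;
		struct iommu_cache_invalidate_info inv;
	} req;

	memset(&req, 0, sizeof(req));
	req.op.argsz = sizeof(req);
	req.op.flags = VFIO_IOMMU_NESTING_OP_CACHE_INVLD;

	req.inv.argsz = sizeof(req.inv);
	req.inv.version = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1;
	req.inv.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
	req.inv.granularity = IOMMU_INV_GRANU_PASID;
	req.inv.granu.pasid_info.flags = IOMMU_INV_PASID_FLAGS_PASID;
	req.inv.granu.pasid_info.pasid = hpasid;

	return ioctl(container, VFIO_IOMMU_NESTING_OP, &req);
}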

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v1 -> v2:
*) rename from "vfio/type1: Flush stage-1 IOMMU cache for nesting type"
*) rename vfio_cache_inv_fn() to vfio_dev_cache_invalidate_fn()
*) vfio_dev_cache_inv_fn() always successful
*) remove VFIO_IOMMU_CACHE_INVALIDATE, and reuse VFIO_IOMMU_NESTING_OP
---
 drivers/vfio/vfio_iommu_type1.c | 38 +++++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h       |  3 +++
 2 files changed, 41 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 0044931b80dc..86b6d8f9789a 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -3291,6 +3291,41 @@ static long vfio_iommu_handle_pgtbl_op(struct vfio_iommu *iommu,
 	return ret;
 }
 
+static int vfio_dev_cache_invalidate_fn(struct device *dev, void *data)
+{
+	struct domain_capsule *dc = (struct domain_capsule *)data;
+	unsigned long arg = *(unsigned long *)dc->data;
+
+	iommu_uapi_cache_invalidate(dc->domain, dev, (void __user *)arg);
+	return 0;
+}
+
+static long vfio_iommu_invalidate_cache(struct vfio_iommu *iommu,
+					unsigned long arg)
+{
+	struct domain_capsule dc = { .data = &arg };
+	struct iommu_nesting_info *info;
+	int ret;
+
+	mutex_lock(&iommu->lock);
+	info = iommu->nesting_info;
+	if (!info || !(info->features & IOMMU_NESTING_FEAT_CACHE_INVLD)) {
+		ret = -EOPNOTSUPP;
+		goto out_unlock;
+	}
+
+	ret = vfio_prepare_nesting_domain_capsule(iommu, &dc);
+	if (ret)
+		goto out_unlock;
+
+	iommu_group_for_each_dev(dc.group->iommu_group, &dc,
+				 vfio_dev_cache_invalidate_fn);
+
+out_unlock:
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
 static long vfio_iommu_type1_nesting_op(struct vfio_iommu *iommu,
 					unsigned long arg)
 {
@@ -3313,6 +3348,9 @@ static long vfio_iommu_type1_nesting_op(struct vfio_iommu *iommu,
 	case VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL:
 		ret = vfio_iommu_handle_pgtbl_op(iommu, false, arg + minsz);
 		break;
+	case VFIO_IOMMU_NESTING_OP_CACHE_INVLD:
+		ret = vfio_iommu_invalidate_cache(iommu, arg + minsz);
+		break;
 	default:
 		ret = -EINVAL;
 	}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 985e6cf4c52d..08b8d236dfee 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1245,6 +1245,8 @@ struct vfio_iommu_type1_dirty_bitmap_get {
  * +-----------------+-----------------------------------------------+
  * | UNBIND_PGTBL    |      struct iommu_gpasid_bind_data            |
  * +-----------------+-----------------------------------------------+
+ * | CACHE_INVLD     |      struct iommu_cache_invalidate_info       |
+ * +-----------------+-----------------------------------------------+
  *
  * returns: 0 on success, -errno on failure.
  */
@@ -1258,6 +1260,7 @@ struct vfio_iommu_type1_nesting_op {
 enum {
 	VFIO_IOMMU_NESTING_OP_BIND_PGTBL,
 	VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL,
+	VFIO_IOMMU_NESTING_OP_CACHE_INVLD,
 };
 
 #define VFIO_IOMMU_NESTING_OP		_IO(VFIO_TYPE, VFIO_BASE + 18)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch v8 05/10] vfio/type1: Allow invalidating first-level/stage IOMMU cache
@ 2021-03-02 20:35   ` Liu Yi L
  0 siblings, 0 replies; 40+ messages in thread
From: Liu Yi L @ 2021-03-02 20:35 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, vivek.gautam, yi.y.sun, jgg, Lingshan.Zhu, jasowang,
	hao.wu

This patch provides an interface allowing userspace to invalidate the
IOMMU cache for the first-level page table. It is required when the
first-level IOMMU page table is not managed by the host kernel in the
nested translation setup.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v1 -> v2:
*) rename from "vfio/type1: Flush stage-1 IOMMU cache for nesting type"
*) rename vfio_cache_inv_fn() to vfio_dev_cache_invalidate_fn()
*) vfio_dev_cache_inv_fn() always successful
*) remove VFIO_IOMMU_CACHE_INVALIDATE, and reuse VFIO_IOMMU_NESTING_OP
---
 drivers/vfio/vfio_iommu_type1.c | 38 +++++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h       |  3 +++
 2 files changed, 41 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 0044931b80dc..86b6d8f9789a 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -3291,6 +3291,41 @@ static long vfio_iommu_handle_pgtbl_op(struct vfio_iommu *iommu,
 	return ret;
 }
 
+static int vfio_dev_cache_invalidate_fn(struct device *dev, void *data)
+{
+	struct domain_capsule *dc = (struct domain_capsule *)data;
+	unsigned long arg = *(unsigned long *)dc->data;
+
+	iommu_uapi_cache_invalidate(dc->domain, dev, (void __user *)arg);
+	return 0;
+}
+
+static long vfio_iommu_invalidate_cache(struct vfio_iommu *iommu,
+					unsigned long arg)
+{
+	struct domain_capsule dc = { .data = &arg };
+	struct iommu_nesting_info *info;
+	int ret;
+
+	mutex_lock(&iommu->lock);
+	info = iommu->nesting_info;
+	if (!info || !(info->features & IOMMU_NESTING_FEAT_CACHE_INVLD)) {
+		ret = -EOPNOTSUPP;
+		goto out_unlock;
+	}
+
+	ret = vfio_prepare_nesting_domain_capsule(iommu, &dc);
+	if (ret)
+		goto out_unlock;
+
+	iommu_group_for_each_dev(dc.group->iommu_group, &dc,
+				 vfio_dev_cache_invalidate_fn);
+
+out_unlock:
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
 static long vfio_iommu_type1_nesting_op(struct vfio_iommu *iommu,
 					unsigned long arg)
 {
@@ -3313,6 +3348,9 @@ static long vfio_iommu_type1_nesting_op(struct vfio_iommu *iommu,
 	case VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL:
 		ret = vfio_iommu_handle_pgtbl_op(iommu, false, arg + minsz);
 		break;
+	case VFIO_IOMMU_NESTING_OP_CACHE_INVLD:
+		ret = vfio_iommu_invalidate_cache(iommu, arg + minsz);
+		break;
 	default:
 		ret = -EINVAL;
 	}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 985e6cf4c52d..08b8d236dfee 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1245,6 +1245,8 @@ struct vfio_iommu_type1_dirty_bitmap_get {
  * +-----------------+-----------------------------------------------+
  * | UNBIND_PGTBL    |      struct iommu_gpasid_bind_data            |
  * +-----------------+-----------------------------------------------+
+ * | CACHE_INVLD     |      struct iommu_cache_invalidate_info       |
+ * +-----------------+-----------------------------------------------+
  *
  * returns: 0 on success, -errno on failure.
  */
@@ -1258,6 +1260,7 @@ struct vfio_iommu_type1_nesting_op {
 enum {
 	VFIO_IOMMU_NESTING_OP_BIND_PGTBL,
 	VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL,
+	VFIO_IOMMU_NESTING_OP_CACHE_INVLD,
 };
 
 #define VFIO_IOMMU_NESTING_OP		_IO(VFIO_TYPE, VFIO_BASE + 18)
-- 
2.25.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch v8 06/10] iommu: Pass domain to sva_unbind_gpasid()
  2021-03-02 20:35 ` Liu Yi L
@ 2021-03-02 20:35   ` Liu Yi L
  -1 siblings, 0 replies; 40+ messages in thread
From: Liu Yi L @ 2021-03-02 20:35 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: kevin.tian, jacob.jun.pan, ashok.raj, yi.l.liu, jun.j.tian,
	yi.y.sun, jean-philippe, peterx, jasowang, hao.wu, stefanha,
	iommu, kvm, jgg, Lingshan.Zhu, vivek.gautam

From: Yi Sun <yi.y.sun@intel.com>

The current interface is good enough for SVA virtualization on an
assigned physical PCI device, but when it comes to mediated devices, a
physical device may be attached to multiple aux-domains. Also, for guest
unbind, the PASID to be unbound should have been allocated to the VM.
This check requires knowing the ioasid_set which is associated with the
domain.

So this interface needs to pass in the domain info. The iommu driver is
then able to know which domain will be used for the 2nd-stage translation
of the nesting mode and is also able to do the PASID ownership check.
This patch passes @domain for the above reason.
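
For illustration only, a hedged sketch of what the updated callback looks
like from a vendor driver's side; the my_*() names are hypothetical and the
body is a stub, only the extra @domain argument comes from this patch:

#include <linux/iommu.h>

static int my_sva_unbind_gpasid(struct iommu_domain *domain,
				struct device *dev, u32 pasid)
{
	/*
	 * @domain identifies the 2nd-stage context, letting the driver
	 * look up the associated ioasid_set for the ownership check
	 * before tearing down the 1st-stage mapping for @pasid.
	 */
	return 0;
}

static const struct iommu_ops my_iommu_ops = {
	/* other callbacks elided */
	.sva_unbind_gpasid	= my_sva_unbind_gpasid,
};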

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Sun <yi.y.sun@intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
v7 -> v8:
*) tweaked the commit message.

v6 -> v7:
*) correct the link for the details of modifying pasid prototype to be "u32".
*) hold off r-b from Eric Auger as there is modification in this patch, will
   seek r-b in this version.

v5 -> v6:
*) use "u32" prototype for @pasid.
*) add review-by from Eric Auger.

v2 -> v3:
*) pass in domain info only
*) use u32 for pasid instead of int type

v1 -> v2:
*) added in v2.
---
 drivers/iommu/intel/svm.c   | 3 ++-
 drivers/iommu/iommu.c       | 2 +-
 include/linux/intel-iommu.h | 3 ++-
 include/linux/iommu.h       | 3 ++-
 4 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 561d011c7287..7521b4aefd16 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -496,7 +496,8 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
 	return ret;
 }
 
-int intel_svm_unbind_gpasid(struct device *dev, u32 pasid)
+int intel_svm_unbind_gpasid(struct iommu_domain *domain,
+			    struct device *dev, u32 pasid)
 {
 	struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
 	struct intel_svm_dev *sdev;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d46f103a1e4b..822e485683ae 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2185,7 +2185,7 @@ int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
 	if (unlikely(!domain->ops->sva_unbind_gpasid))
 		return -ENODEV;
 
-	return domain->ops->sva_unbind_gpasid(dev, pasid);
+	return domain->ops->sva_unbind_gpasid(domain, dev, pasid);
 }
 EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
 
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 554aa946f142..aaf403966444 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -755,7 +755,8 @@ extern int intel_svm_enable_prq(struct intel_iommu *iommu);
 extern int intel_svm_finish_prq(struct intel_iommu *iommu);
 int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
 			  struct iommu_gpasid_bind_data *data);
-int intel_svm_unbind_gpasid(struct device *dev, u32 pasid);
+int intel_svm_unbind_gpasid(struct iommu_domain *domain,
+			    struct device *dev, u32 pasid);
 struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm,
 				 void *drvdata);
 void intel_svm_unbind(struct iommu_sva *handle);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5e7fe519430a..4840217a590b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -299,7 +299,8 @@ struct iommu_ops {
 	int (*sva_bind_gpasid)(struct iommu_domain *domain,
 			struct device *dev, struct iommu_gpasid_bind_data *data);
 
-	int (*sva_unbind_gpasid)(struct device *dev, u32 pasid);
+	int (*sva_unbind_gpasid)(struct iommu_domain *domain,
+				 struct device *dev, u32 pasid);
 
 	int (*def_domain_type)(struct device *dev);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch v8 06/10] iommu: Pass domain to sva_unbind_gpasid()
@ 2021-03-02 20:35   ` Liu Yi L
  0 siblings, 0 replies; 40+ messages in thread
From: Liu Yi L @ 2021-03-02 20:35 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, vivek.gautam, yi.y.sun, jgg, Lingshan.Zhu, jasowang,
	hao.wu

From: Yi Sun <yi.y.sun@intel.com>

The current interface is good enough for SVA virtualization on an assigned
physical PCI device, but when it comes to mediated devices, a physical
device may be attached to multiple aux-domains. Also, for guest unbind,
the PASID being unbound must have been allocated to the VM. This check
requires knowing the ioasid_set associated with the domain.

So this interface needs to take the domain as well. With it, the iommu
driver knows which domain is used for the second-stage translation in
nesting mode and can also perform the PASID ownership check. That is why
this patch passes @domain.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Sun <yi.y.sun@intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
v7 -> v8:
*) tweaked the commit message.

v6 -> v7:
*) correct the link to the details of changing the pasid prototype to be "u32".
*) hold off the r-b from Eric Auger as this patch has been modified; will
   seek the r-b again for this version.

v5 -> v6:
*) use "u32" prototype for @pasid.
*) add review-by from Eric Auger.

v2 -> v3:
*) pass in domain info only
*) use u32 for pasid instead of int type

v1 -> v2:
*) added in v2.
---
 drivers/iommu/intel/svm.c   | 3 ++-
 drivers/iommu/iommu.c       | 2 +-
 include/linux/intel-iommu.h | 3 ++-
 include/linux/iommu.h       | 3 ++-
 4 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 561d011c7287..7521b4aefd16 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -496,7 +496,8 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
 	return ret;
 }
 
-int intel_svm_unbind_gpasid(struct device *dev, u32 pasid)
+int intel_svm_unbind_gpasid(struct iommu_domain *domain,
+			    struct device *dev, u32 pasid)
 {
 	struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
 	struct intel_svm_dev *sdev;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d46f103a1e4b..822e485683ae 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2185,7 +2185,7 @@ int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
 	if (unlikely(!domain->ops->sva_unbind_gpasid))
 		return -ENODEV;
 
-	return domain->ops->sva_unbind_gpasid(dev, pasid);
+	return domain->ops->sva_unbind_gpasid(domain, dev, pasid);
 }
 EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
 
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 554aa946f142..aaf403966444 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -755,7 +755,8 @@ extern int intel_svm_enable_prq(struct intel_iommu *iommu);
 extern int intel_svm_finish_prq(struct intel_iommu *iommu);
 int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
 			  struct iommu_gpasid_bind_data *data);
-int intel_svm_unbind_gpasid(struct device *dev, u32 pasid);
+int intel_svm_unbind_gpasid(struct iommu_domain *domain,
+			    struct device *dev, u32 pasid);
 struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm,
 				 void *drvdata);
 void intel_svm_unbind(struct iommu_sva *handle);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5e7fe519430a..4840217a590b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -299,7 +299,8 @@ struct iommu_ops {
 	int (*sva_bind_gpasid)(struct iommu_domain *domain,
 			struct device *dev, struct iommu_gpasid_bind_data *data);
 
-	int (*sva_unbind_gpasid)(struct device *dev, u32 pasid);
+	int (*sva_unbind_gpasid)(struct iommu_domain *domain,
+				 struct device *dev, u32 pasid);
 
 	int (*def_domain_type)(struct device *dev);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch v8 07/10] vfio/type1: Add vSVA support for IOMMU-backed mdevs
  2021-03-02 20:35 ` Liu Yi L
@ 2021-03-02 20:35   ` Liu Yi L
  -1 siblings, 0 replies; 40+ messages in thread
From: Liu Yi L @ 2021-03-02 20:35 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: kevin.tian, jacob.jun.pan, ashok.raj, yi.l.liu, jun.j.tian,
	yi.y.sun, jean-philippe, peterx, jasowang, hao.wu, stefanha,
	iommu, kvm, jgg, Lingshan.Zhu, vivek.gautam

In recent years, the mediated device pass-through framework (e.g. vfio-mdev)
has been used to achieve flexible device sharing across domains (e.g. VMs).
There are also hardware-assisted mediated pass-through solutions from
platform vendors, e.g. Intel VT-d scalable mode, which supports the Intel
Scalable I/O Virtualization technology. Such mdevs are called IOMMU-backed
mdevs because the IOMMU enforces DMA isolation for them. In the kernel,
IOMMU-backed mdevs are exposed to the IOMMU layer through the aux-domain
concept, which means each mdev is protected by an iommu domain auxiliary to
the domain that the kernel driver primarily uses for the DMA API. Details
can be found in the KVM presentation below:

https://events19.linuxfoundation.org/wp-content/uploads/2017/12/Hardware-Assisted-Mediated-Pass-Through-with-VFIO-Kevin-Tian-Intel.pdf

This patch extends the NESTING_IOMMU ops to IOMMU-backed mdev devices. The
main requirement is to use the auxiliary domain associated with the mdev.
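
For context, here is a minimal sketch (assuming the mdev_set_iommu_device()
helper available in this timeframe; my_mdev_create() and its surroundings
are invented for illustration) of how an mdev parent driver registers the
physical device whose aux-domain backs the mdev, so that VFIO can later
resolve it via vfio_mdev_get_iommu_device():

    #include <linux/mdev.h>

    static int my_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
    {
            struct device *parent = mdev_parent_dev(mdev);
            int ret;

            /* declare which physical device's IOMMU context backs this mdev */
            ret = mdev_set_iommu_device(mdev_dev(mdev), parent);
            if (ret)
                    return ret;

            /* ... allocate per-mdev state, aux-domain PASID, etc. ... */
            return 0;
    }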

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
CC: Jun Tian <jun.j.tian@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
v5 -> v6:
*) add review-by from Eric Auger.

v1 -> v2:
*) check the iommu_device to ensure the handling mdev is IOMMU-backed
---
 drivers/vfio/vfio_iommu_type1.c | 35 ++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 86b6d8f9789a..883a79f36c46 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2635,18 +2635,37 @@ static int vfio_iommu_resv_refresh(struct vfio_iommu *iommu,
 	return ret;
 }
 
+static struct device *vfio_get_iommu_device(struct vfio_group *group,
+					    struct device *dev)
+{
+	if (group->mdev_group)
+		return vfio_mdev_get_iommu_device(dev);
+	else
+		return dev;
+}
+
 static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
 {
 	struct domain_capsule *dc = (struct domain_capsule *)data;
 	unsigned long arg = *(unsigned long *)dc->data;
+	struct device *iommu_device;
+
+	iommu_device = vfio_get_iommu_device(dc->group, dev);
+	if (!iommu_device)
+		return -EINVAL;
 
-	return iommu_uapi_sva_bind_gpasid(dc->domain, dev,
+	return iommu_uapi_sva_bind_gpasid(dc->domain, iommu_device,
 					  (void __user *)arg);
 }
 
 static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
 {
 	struct domain_capsule *dc = (struct domain_capsule *)data;
+	struct device *iommu_device;
+
+	iommu_device = vfio_get_iommu_device(dc->group, dev);
+	if (!iommu_device)
+		return -EINVAL;
 
 	/*
 	 * dc->user is a toggle for the unbind operation. When user
@@ -2659,12 +2678,12 @@ static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
 	if (dc->user) {
 		unsigned long arg = *(unsigned long *)dc->data;
 
-		iommu_uapi_sva_unbind_gpasid(dc->domain,
-					     dev, (void __user *)arg);
+		iommu_uapi_sva_unbind_gpasid(dc->domain, iommu_device,
+					     (void __user *)arg);
 	} else {
 		ioasid_t pasid = *(ioasid_t *)dc->data;
 
-		iommu_sva_unbind_gpasid(dc->domain, dev, pasid);
+		iommu_sva_unbind_gpasid(dc->domain, iommu_device, pasid);
 	}
 	return 0;
 }
@@ -3295,8 +3314,14 @@ static int vfio_dev_cache_invalidate_fn(struct device *dev, void *data)
 {
 	struct domain_capsule *dc = (struct domain_capsule *)data;
 	unsigned long arg = *(unsigned long *)dc->data;
+	struct device *iommu_device;
+
+	iommu_device = vfio_get_iommu_device(dc->group, dev);
+	if (!iommu_device)
+		return -EINVAL;
 
-	iommu_uapi_cache_invalidate(dc->domain, dev, (void __user *)arg);
+	iommu_uapi_cache_invalidate(dc->domain, iommu_device,
+				    (void __user *)arg);
 	return 0;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch v8 07/10] vfio/type1: Add vSVA support for IOMMU-backed mdevs
@ 2021-03-02 20:35   ` Liu Yi L
  0 siblings, 0 replies; 40+ messages in thread
From: Liu Yi L @ 2021-03-02 20:35 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, vivek.gautam, yi.y.sun, jgg, Lingshan.Zhu, jasowang,
	hao.wu

In recent years, the mediated device pass-through framework (e.g. vfio-mdev)
has been used to achieve flexible device sharing across domains (e.g. VMs).
There are also hardware-assisted mediated pass-through solutions from
platform vendors, e.g. Intel VT-d scalable mode, which supports the Intel
Scalable I/O Virtualization technology. Such mdevs are called IOMMU-backed
mdevs because the IOMMU enforces DMA isolation for them. In the kernel,
IOMMU-backed mdevs are exposed to the IOMMU layer through the aux-domain
concept, which means each mdev is protected by an iommu domain auxiliary to
the domain that the kernel driver primarily uses for the DMA API. Details
can be found in the KVM presentation below:

https://events19.linuxfoundation.org/wp-content/uploads/2017/12/Hardware-Assisted-Mediated-Pass-Through-with-VFIO-Kevin-Tian-Intel.pdf

This patch extends the NESTING_IOMMU ops to IOMMU-backed mdev devices. The
main requirement is to use the auxiliary domain associated with the mdev.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
CC: Jun Tian <jun.j.tian@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
v5 -> v6:
*) add review-by from Eric Auger.

v1 -> v2:
*) check the iommu_device to ensure the handling mdev is IOMMU-backed
---
 drivers/vfio/vfio_iommu_type1.c | 35 ++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 86b6d8f9789a..883a79f36c46 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2635,18 +2635,37 @@ static int vfio_iommu_resv_refresh(struct vfio_iommu *iommu,
 	return ret;
 }
 
+static struct device *vfio_get_iommu_device(struct vfio_group *group,
+					    struct device *dev)
+{
+	if (group->mdev_group)
+		return vfio_mdev_get_iommu_device(dev);
+	else
+		return dev;
+}
+
 static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
 {
 	struct domain_capsule *dc = (struct domain_capsule *)data;
 	unsigned long arg = *(unsigned long *)dc->data;
+	struct device *iommu_device;
+
+	iommu_device = vfio_get_iommu_device(dc->group, dev);
+	if (!iommu_device)
+		return -EINVAL;
 
-	return iommu_uapi_sva_bind_gpasid(dc->domain, dev,
+	return iommu_uapi_sva_bind_gpasid(dc->domain, iommu_device,
 					  (void __user *)arg);
 }
 
 static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
 {
 	struct domain_capsule *dc = (struct domain_capsule *)data;
+	struct device *iommu_device;
+
+	iommu_device = vfio_get_iommu_device(dc->group, dev);
+	if (!iommu_device)
+		return -EINVAL;
 
 	/*
 	 * dc->user is a toggle for the unbind operation. When user
@@ -2659,12 +2678,12 @@ static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
 	if (dc->user) {
 		unsigned long arg = *(unsigned long *)dc->data;
 
-		iommu_uapi_sva_unbind_gpasid(dc->domain,
-					     dev, (void __user *)arg);
+		iommu_uapi_sva_unbind_gpasid(dc->domain, iommu_device,
+					     (void __user *)arg);
 	} else {
 		ioasid_t pasid = *(ioasid_t *)dc->data;
 
-		iommu_sva_unbind_gpasid(dc->domain, dev, pasid);
+		iommu_sva_unbind_gpasid(dc->domain, iommu_device, pasid);
 	}
 	return 0;
 }
@@ -3295,8 +3314,14 @@ static int vfio_dev_cache_invalidate_fn(struct device *dev, void *data)
 {
 	struct domain_capsule *dc = (struct domain_capsule *)data;
 	unsigned long arg = *(unsigned long *)dc->data;
+	struct device *iommu_device;
+
+	iommu_device = vfio_get_iommu_device(dc->group, dev);
+	if (!iommu_device)
+		return -EINVAL;
 
-	iommu_uapi_cache_invalidate(dc->domain, dev, (void __user *)arg);
+	iommu_uapi_cache_invalidate(dc->domain, iommu_device,
+				    (void __user *)arg);
 	return 0;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch v8 08/10] vfio/pci: Expose PCIe PASID capability to userspace
  2021-03-02 20:35 ` Liu Yi L
@ 2021-03-02 20:35   ` Liu Yi L
  -1 siblings, 0 replies; 40+ messages in thread
From: Liu Yi L @ 2021-03-02 20:35 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: kevin.tian, jacob.jun.pan, ashok.raj, yi.l.liu, jun.j.tian,
	yi.y.sun, jean-philippe, peterx, jasowang, hao.wu, stefanha,
	iommu, kvm, jgg, Lingshan.Zhu, vivek.gautam

This patch exposes the PCIe PASID capability to userspace and defines where
to emulate this capability if userspace wants to further expose it to a VM.

Note that this patch only exposes the PASID capability for devices that have
the PCIe PASID extended structure in their configuration space. For VFs,
userspace is still unable to see this capability, as the SR-IOV spec forbids
a VF from implementing the PASID extended capability structure; supporting
that is a future TODO. Related discussion can be found at the link below:

https://lore.kernel.org/kvm/20200407095801.648b1371@w520.home/
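
As a hedged userspace sketch (assuming `device` is an already-open VFIO
device fd on a kernel with this patch applied), the newly exposed capability
can be located by walking the extended capability list in the emulated
config space:

    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    static int find_pasid_cap(int device)
    {
            struct vfio_region_info reg = {
                    .argsz = sizeof(reg),
                    .index = VFIO_PCI_CONFIG_REGION_INDEX,
            };
            uint32_t hdr;
            uint16_t pos = 0x100;           /* extended caps start here */

            if (ioctl(device, VFIO_DEVICE_GET_REGION_INFO, &reg))
                    return -1;

            while (pos) {
                    if (pread(device, &hdr, sizeof(hdr),
                              reg.offset + pos) != sizeof(hdr))
                            return -1;
                    if ((hdr & 0xffff) == 0x1b) {   /* PCI_EXT_CAP_ID_PASID */
                            printf("PASID capability at 0x%x\n", pos);
                            return pos;
                    }
                    pos = hdr >> 20;        /* next capability offset */
            }
            return 0;                       /* capability not exposed */
    }

Without this patch, VFIO hides the PASID extended capability from the
virtualized config space, so the walk above would simply not find it.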

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
v7 -> v8:
*) refine the commit message and the subject.

v5 -> v6:
*) add review-by from Eric Auger.

v1 -> v2:
*) added in v2; it was previously sent in a separate patch series
---
 drivers/vfio/pci/vfio_pci_config.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index a402adee8a21..95b5478f51ac 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -95,7 +95,7 @@ static const u16 pci_ext_cap_length[PCI_EXT_CAP_ID_MAX + 1] = {
 	[PCI_EXT_CAP_ID_LTR]	=	PCI_EXT_CAP_LTR_SIZEOF,
 	[PCI_EXT_CAP_ID_SECPCI]	=	0,	/* not yet */
 	[PCI_EXT_CAP_ID_PMUX]	=	0,	/* not yet */
-	[PCI_EXT_CAP_ID_PASID]	=	0,	/* not yet */
+	[PCI_EXT_CAP_ID_PASID]	=	PCI_EXT_CAP_PASID_SIZEOF,
 };
 
 /*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch v8 08/10] vfio/pci: Expose PCIe PASID capability to userspace
@ 2021-03-02 20:35   ` Liu Yi L
  0 siblings, 0 replies; 40+ messages in thread
From: Liu Yi L @ 2021-03-02 20:35 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, vivek.gautam, yi.y.sun, jgg, Lingshan.Zhu, jasowang,
	hao.wu

This patch exposes the PCIe PASID capability to userspace and defines where
to emulate this capability if userspace wants to further expose it to a VM.

Note that this patch only exposes the PASID capability for devices that have
the PCIe PASID extended structure in their configuration space. For VFs,
userspace is still unable to see this capability, as the SR-IOV spec forbids
a VF from implementing the PASID extended capability structure; supporting
that is a future TODO. Related discussion can be found at the link below:

https://lore.kernel.org/kvm/20200407095801.648b1371@w520.home/

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
v7 -> v8:
*) refine the commit message and the subject.

v5 -> v6:
*) add review-by from Eric Auger.

v1 -> v2:
*) added in v2; it was previously sent in a separate patch series
---
 drivers/vfio/pci/vfio_pci_config.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index a402adee8a21..95b5478f51ac 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -95,7 +95,7 @@ static const u16 pci_ext_cap_length[PCI_EXT_CAP_ID_MAX + 1] = {
 	[PCI_EXT_CAP_ID_LTR]	=	PCI_EXT_CAP_LTR_SIZEOF,
 	[PCI_EXT_CAP_ID_SECPCI]	=	0,	/* not yet */
 	[PCI_EXT_CAP_ID_PMUX]	=	0,	/* not yet */
-	[PCI_EXT_CAP_ID_PASID]	=	0,	/* not yet */
+	[PCI_EXT_CAP_ID_PASID]	=	PCI_EXT_CAP_PASID_SIZEOF,
 };
 
 /*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch v8 09/10] vfio: Document dual stage control
  2021-03-02 20:35 ` Liu Yi L
@ 2021-03-02 20:35   ` Liu Yi L
  -1 siblings, 0 replies; 40+ messages in thread
From: Liu Yi L @ 2021-03-02 20:35 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: kevin.tian, jacob.jun.pan, ashok.raj, yi.l.liu, jun.j.tian,
	yi.y.sun, jean-philippe, peterx, jasowang, hao.wu, stefanha,
	iommu, kvm, jgg, Lingshan.Zhu, vivek.gautam, Stefan Hajnoczi

From: Eric Auger <eric.auger@redhat.com>

The VFIO API was enhanced to support nested stage control: a set of
new ioctls and their usage guidelines.

Let's document the process to follow to set up nested mode.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
v7 -> v8:
*) remove SYSWIDE_PASID description, point to /dev/ioasid when mentioning
   PASID allocation from host.

v6 -> v7:
*) tweak per Eric's comments.

v5 -> v6:
*) tweak per Eric's comments.

v3 -> v4:
*) add review-by from Stefan Hajnoczi

v2 -> v3:
*) address comments from Stefan Hajnoczi

v1 -> v2:
*) new in v2, compared with Eric's original version, pasid table bind
   and fault reporting is removed as this series doesn't cover them.
   Original version from Eric.
   https://lore.kernel.org/kvm/20200320161911.27494-12-eric.auger@redhat.com/
---
 Documentation/driver-api/vfio.rst | 77 +++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
index f1a4d3c3ba0b..9ccf9d63b72f 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -239,6 +239,83 @@ group and can access them as follows::
 	/* Gratuitous device reset and go... */
 	ioctl(device, VFIO_DEVICE_RESET);
 
+IOMMU Dual Stage Control
+------------------------
+
+Some IOMMUs support 2 stages/levels of translation. Stage corresponds
+to the ARM terminology while level corresponds to Intel's terminology.
+In the following text, we use either without distinction.
+
+This is useful when the guest is exposed to a virtual IOMMU and some
+devices are assigned to the guest through VFIO. The guest OS can then
+use stage-1 (GIOVA -> GPA or GVA -> GPA), while the hypervisor uses
+stage-2 for VM isolation (GPA -> HPA).
+
+Under dual-stage translation, the guest gets ownership of the stage-1
+page tables or both the stage-1 configuration structures and page tables.
+This depends on the vendor: e.g. on Intel platforms the guest owns the
+stage-1 page tables under nesting, while on ARM it owns both the stage-1
+configuration structures and page tables under nesting. The hypervisor
+owns the root configuration structure (for security reasons), including
+stage-2 configuration. This works as long as configuration structures
+and page table formats are compatible between the virtual IOMMU and the
+physical IOMMU.
+
+Assuming the HW supports it, this nested mode is selected by choosing the
+VFIO_TYPE1_NESTING_IOMMU type through:
+
+    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU);
+
+This forces the hypervisor to use stage-2, leaving stage-1 available
+for guest usage.
+The stage-1 format and binding method are reported in the nesting capability
+(VFIO_IOMMU_TYPE1_INFO_CAP_NESTING) through VFIO_IOMMU_GET_INFO:
+
+    ioctl(container->fd, VFIO_IOMMU_GET_INFO, &nesting_info);
+
+The nesting cap info is available only after NESTING_IOMMU is selected.
+If the underlying IOMMU doesn't support nesting, VFIO_SET_IOMMU fails and
+userspace should try other IOMMU types. Details of the nesting cap info
+can be found in Documentation/userspace-api/iommu.rst.
+
+Binding the stage-1 page table to the IOMMU differs per platform. On Intel,
+the stage-1 page table info is mediated by userspace for each PASID.
+On ARM, userspace directly passes the GPA of the whole PASID table.
+Currently only Intel's binding (IOMMU_NESTING_FEAT_BIND_PGTBL)
+is supported:
+
+    nesting_op->flags = VFIO_IOMMU_NESTING_OP_BIND_PGTBL;
+    memcpy(&nesting_op->data, &bind_data, sizeof(bind_data));
+    ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op);
+
+When multiple stage-1 page tables are supported on a device, each page
+table is associated with a PASID (Process Address Space ID) to distinguish
+it from the others. In that case, userspace should include the PASID in the
+bind_data when issuing direct binding requests.
+
+A PASID can be managed per-device or system-wide which, again, depends on
+the IOMMU vendor. e.g. on Intel platforms userspace *must* allocate the PASID
+from the host before attempting to bind a stage-1 page table; the allocation
+is done through the /dev/ioasid interface. On systems without /dev/ioasid,
+userspace should not go on to bind page tables, and such attempts will be
+failed by the kernel. For the usage of /dev/ioasid, please refer to:
+
+    Documentation/userspace-api/ioasid.rst
+
+Once the stage-1 page table is bound to the IOMMU, the guest is allowed to
+fully manage its mappings at its own discretion. The IOMMU walks the nested
+stage-1 and stage-2 page tables when serving DMA requests from the assigned
+device, and may cache the stage-1 mappings in the IOTLB. When required
+(IOMMU_NESTING_FEAT_CACHE_INVLD), userspace *must* forward guest stage-1
+invalidations to the host, so the IOTLB is invalidated:
+
+    nesting_op->flags = VFIO_IOMMU_NESTING_OP_CACHE_INVLD;
+    memcpy(&nesting_op->data, &cache_inv_data, sizeof(cache_inv_data));
+    ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op);
+
+Forwarded invalidations can happen at various granularity levels (page
+level, context level, etc.).
+
 VFIO User API
 -------------------------------------------------------------------------------
 
-- 
2.25.1
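
For reference, the documented sequence can be strung together in userspace
roughly as below. This is a sketch only: VFIO_IOMMU_NESTING_OP,
VFIO_IOMMU_NESTING_OP_BIND_PGTBL and struct vfio_iommu_type1_nesting_op are
introduced by this series (their final layout may change), while struct
iommu_gpasid_bind_data comes from the existing IOMMU UAPI; error handling
and the capability-chain parsing after VFIO_IOMMU_GET_INFO are omitted:

    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>
    #include <linux/iommu.h>

    static int setup_nesting(int container, __u64 guest_pgd, __u32 pasid)
    {
            struct vfio_iommu_type1_info info = { .argsz = sizeof(info) };
            struct iommu_gpasid_bind_data bind = {
                    .argsz = sizeof(bind),
                    .version = IOMMU_GPASID_BIND_VERSION_1,
                    .format = IOMMU_PASID_FORMAT_INTEL_VTD,
                    .gpgd = guest_pgd,      /* guest stage-1 page table */
                    .hpasid = pasid,        /* PASID allocated via /dev/ioasid */
                    .addr_width = 48,
            };
            char buf[sizeof(struct vfio_iommu_type1_nesting_op) + sizeof(bind)]
                    __attribute__((aligned(8)));
            struct vfio_iommu_type1_nesting_op *op = (void *)buf;

            /* 1. select nesting mode; this fails if the IOMMU cannot nest */
            if (ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU))
                    return -1;

            /* 2. read back the info/capability chain carrying the nesting cap */
            if (ioctl(container, VFIO_IOMMU_GET_INFO, &info))
                    return -1;

            /* 3. bind the guest stage-1 page table to @pasid */
            memset(buf, 0, sizeof(buf));
            op->argsz = sizeof(buf);
            op->flags = VFIO_IOMMU_NESTING_OP_BIND_PGTBL;
            memcpy(&op->data, &bind, sizeof(bind));
            return ioctl(container, VFIO_IOMMU_NESTING_OP, op);
    }

A cache invalidation request follows the same pattern, using
VFIO_IOMMU_NESTING_OP_CACHE_INVLD with struct iommu_cache_invalidate_info as
the payload.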


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch v8 09/10] vfio: Document dual stage control
@ 2021-03-02 20:35   ` Liu Yi L
  0 siblings, 0 replies; 40+ messages in thread
From: Liu Yi L @ 2021-03-02 20:35 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	Stefan Hajnoczi, iommu, vivek.gautam, yi.y.sun, jgg,
	Lingshan.Zhu, jasowang, hao.wu

From: Eric Auger <eric.auger@redhat.com>

The VFIO API was enhanced to support nested stage control: a set of
new ioctls and their usage guidelines.

Let's document the process to follow to set up nested mode.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
v7 -> v8:
*) remove SYSWIDE_PASID description, point to /dev/ioasid when mentioning
   PASID allocation from host.

v6 -> v7:
*) tweak per Eric's comments.

v5 -> v6:
*) tweak per Eric's comments.

v3 -> v4:
*) add review-by from Stefan Hajnoczi

v2 -> v3:
*) address comments from Stefan Hajnoczi

v1 -> v2:
*) new in v2, compared with Eric's original version, pasid table bind
   and fault reporting is removed as this series doesn't cover them.
   Original version from Eric.
   https://lore.kernel.org/kvm/20200320161911.27494-12-eric.auger@redhat.com/
---
 Documentation/driver-api/vfio.rst | 77 +++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
index f1a4d3c3ba0b..9ccf9d63b72f 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -239,6 +239,83 @@ group and can access them as follows::
 	/* Gratuitous device reset and go... */
 	ioctl(device, VFIO_DEVICE_RESET);
 
+IOMMU Dual Stage Control
+------------------------
+
+Some IOMMUs support 2 stages/levels of translation. Stage corresponds
+to the ARM terminology while level corresponds to Intel's terminology.
+In the following text, we use either without distinction.
+
+This is useful when the guest is exposed to a virtual IOMMU and some
+devices are assigned to the guest through VFIO. The guest OS can then
+use stage-1 (GIOVA -> GPA or GVA -> GPA), while the hypervisor uses
+stage-2 for VM isolation (GPA -> HPA).
+
+Under dual-stage translation, the guest gets ownership of the stage-1
+page tables or both the stage-1 configuration structures and page tables.
+This depends on the vendor: e.g. on Intel platforms the guest owns the
+stage-1 page tables under nesting, while on ARM it owns both the stage-1
+configuration structures and page tables under nesting. The hypervisor
+owns the root configuration structure (for security reasons), including
+stage-2 configuration. This works as long as configuration structures
+and page table formats are compatible between the virtual IOMMU and the
+physical IOMMU.
+
+Assuming the HW supports it, this nested mode is selected by choosing the
+VFIO_TYPE1_NESTING_IOMMU type through:
+
+    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU);
+
+This forces the hypervisor to use stage-2, leaving stage-1 available
+for guest usage.
+The stage-1 format and binding method are reported in the nesting capability
+(VFIO_IOMMU_TYPE1_INFO_CAP_NESTING) through VFIO_IOMMU_GET_INFO:
+
+    ioctl(container->fd, VFIO_IOMMU_GET_INFO, &nesting_info);
+
+The nesting cap info is available only after NESTING_IOMMU is selected.
+If the underlying IOMMU doesn't support nesting, VFIO_SET_IOMMU fails and
+userspace should try other IOMMU types. Details of the nesting cap info
+can be found in Documentation/userspace-api/iommu.rst.
+
+Binding the stage-1 page table to the IOMMU differs per platform. On Intel,
+the stage-1 page table info is mediated by userspace for each PASID.
+On ARM, userspace directly passes the GPA of the whole PASID table.
+Currently only Intel's binding (IOMMU_NESTING_FEAT_BIND_PGTBL)
+is supported:
+
+    nesting_op->flags = VFIO_IOMMU_NESTING_OP_BIND_PGTBL;
+    memcpy(&nesting_op->data, &bind_data, sizeof(bind_data));
+    ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op);
+
+When multiple stage-1 page tables are supported on a device, each page
+table is associated with a PASID (Process Address Space ID) to distinguish
+it from the others. In that case, userspace should include the PASID in the
+bind_data when issuing direct binding requests.
+
+A PASID can be managed per-device or system-wide which, again, depends on
+the IOMMU vendor. e.g. on Intel platforms userspace *must* allocate the PASID
+from the host before attempting to bind a stage-1 page table; the allocation
+is done through the /dev/ioasid interface. On systems without /dev/ioasid,
+userspace should not go on to bind page tables, and such attempts will be
+failed by the kernel. For the usage of /dev/ioasid, please refer to:
+
+    Documentation/userspace-api/ioasid.rst
+
+Once the stage-1 page table is bound to the IOMMU, the guest is allowed to
+fully manage its mappings at its own discretion. The IOMMU walks the nested
+stage-1 and stage-2 page tables when serving DMA requests from the assigned
+device, and may cache the stage-1 mappings in the IOTLB. When required
+(IOMMU_NESTING_FEAT_CACHE_INVLD), userspace *must* forward guest stage-1
+invalidations to the host, so the IOTLB is invalidated:
+
+    nesting_op->flags = VFIO_IOMMU_NESTING_OP_CACHE_INVLD;
+    memcpy(&nesting_op->data, &cache_inv_data, sizeof(cache_inv_data));
+    ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op);
+
+Forwarded invalidations can happen at various granularity levels (page
+level, context level, etc.).
+
 VFIO User API
 -------------------------------------------------------------------------------
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch v8 10/10] iommu/vt-d: Support reporting nesting capability info
  2021-03-02 20:35 ` Liu Yi L
@ 2021-03-02 20:35   ` Liu Yi L
  -1 siblings, 0 replies; 40+ messages in thread
From: Liu Yi L @ 2021-03-02 20:35 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: kevin.tian, jacob.jun.pan, ashok.raj, yi.l.liu, jun.j.tian,
	yi.y.sun, jean-philippe, peterx, jasowang, hao.wu, stefanha,
	iommu, kvm, jgg, Lingshan.Zhu, vivek.gautam

This patch reports the nesting info when iommu_domain_get_attr() is called
with DOMAIN_ATTR_NESTING on a domain that has nesting enabled.
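
For illustration, a possible in-kernel caller (e.g. the VFIO type1 code in
this series) could query the attribute as sketched below; struct
iommu_nesting_info and its argsz convention come from earlier patches in the
series, and the two-call pattern simply follows that convention:

    #include <linux/iommu.h>
    #include <linux/slab.h>

    struct iommu_nesting_info *get_nesting_info(struct iommu_domain *domain)
    {
            struct iommu_nesting_info probe = { .argsz = sizeof(probe) };
            struct iommu_nesting_info *info;

            /* a too-small buffer returns 0 with argsz set to the needed size */
            if (iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, &probe))
                    return NULL;

            info = kzalloc(probe.argsz, GFP_KERNEL);
            if (!info)
                    return NULL;

            info->argsz = probe.argsz;
            if (iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, info)) {
                    kfree(info);
                    return NULL;
            }
            return info;
    }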

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v7 -> v8:
*) tweak per latest code base

v6 -> v7:
*) split the patch in v6 into two patches:
   [PATCH v7 15/16] iommu/vt-d: Only support nesting when nesting caps are consistent across iommu units
   [PATCH v7 16/16] iommu/vt-d: Support reporting nesting capability info

v2 -> v3:
*) remove cap/ecap_mask in iommu_nesting_info.
---
 drivers/iommu/intel/cap_audit.h |  7 ++++
 drivers/iommu/intel/iommu.c     | 68 ++++++++++++++++++++++++++++++++-
 2 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/cap_audit.h b/drivers/iommu/intel/cap_audit.h
index 74cfccae0e81..787e98282a02 100644
--- a/drivers/iommu/intel/cap_audit.h
+++ b/drivers/iommu/intel/cap_audit.h
@@ -60,6 +60,13 @@
 #define ECAP_QI_MASK		BIT_ULL(1)
 #define ECAP_C_MASK		BIT_ULL(0)
 
+/* Capabilities related to nested translation */
+#define VTD_CAP_MASK		(CAP_FL1GP_MASK | CAP_FL5LP_MASK)
+
+#define VTD_ECAP_MASK		(ECAP_PRS_MASK | ECAP_ERS_MASK | \
+				 ECAP_SRS_MASK | ECAP_EAFS_MASK | \
+				 ECAP_PASID_MASK)
+
 /*
  * u64 intel_iommu_cap_sanity, intel_iommu_ecap_sanity will be adjusted as each
  * IOMMU gets audited.
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 4409d86b4e18..f7432fb1c6ea 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5508,13 +5508,79 @@ static bool domain_use_flush_queue(void)
 	return r;
 }
 
+static int intel_iommu_get_nesting_info(struct iommu_domain *domain,
+					struct iommu_nesting_info *info)
+{
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	u64 cap = VTD_CAP_MASK, ecap = VTD_ECAP_MASK;
+	struct device_domain_info *domain_info;
+	struct iommu_nesting_info_vtd vtd;
+	unsigned int size;
+
+	if (!info)
+		return -EINVAL;
+
+	if (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE))
+		return -ENODEV;
+
+	size = sizeof(struct iommu_nesting_info);
+	/*
+	 * If the provided buffer size is smaller than expected, return
+	 * 0 and report the expected buffer size to the caller.
+	 */
+	if (info->argsz < size) {
+		info->argsz = size;
+		return 0;
+	}
+
+	/*
+	 * arbitrarily select the first domain_info as all nesting
+	 * related capabilities should be consistent across iommu
+	 * units.
+	 */
+	domain_info = list_first_entry(&dmar_domain->devices,
+				       struct device_domain_info, link);
+	cap &= domain_info->iommu->cap;
+	ecap &= domain_info->iommu->ecap;
+
+	info->addr_width = dmar_domain->gaw;
+	info->format = IOMMU_PASID_FORMAT_INTEL_VTD;
+	info->features = IOMMU_NESTING_FEAT_BIND_PGTBL |
+			 IOMMU_NESTING_FEAT_CACHE_INVLD;
+	info->pasid_bits = ilog2(intel_pasid_max_id);
+	memset(&info->padding, 0x0, 12);
+
+	vtd.flags = 0;
+	memset(&vtd.padding, 0x0, 12);
+	vtd.cap_reg = cap & VTD_CAP_MASK;
+	vtd.ecap_reg = ecap & VTD_ECAP_MASK;
+
+	memcpy(&info->vendor.vtd, &vtd, sizeof(vtd));
+	return 0;
+}
+
 static int
 intel_iommu_domain_get_attr(struct iommu_domain *domain,
 			    enum iommu_attr attr, void *data)
 {
 	switch (domain->type) {
 	case IOMMU_DOMAIN_UNMANAGED:
-		return -ENODEV;
+		switch (attr) {
+		case DOMAIN_ATTR_NESTING:
+		{
+			struct iommu_nesting_info *info =
+				(struct iommu_nesting_info *)data;
+			unsigned long flags;
+			int ret;
+
+			spin_lock_irqsave(&device_domain_lock, flags);
+			ret = intel_iommu_get_nesting_info(domain, info);
+			spin_unlock_irqrestore(&device_domain_lock, flags);
+			return ret;
+		}
+		default:
+			return -ENODEV;
+		}
 	case IOMMU_DOMAIN_DMA:
 		switch (attr) {
 		case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Patch v8 10/10] iommu/vt-d: Support reporting nesting capability info
@ 2021-03-02 20:35   ` Liu Yi L
  0 siblings, 0 replies; 40+ messages in thread
From: Liu Yi L @ 2021-03-02 20:35 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, vivek.gautam, yi.y.sun, jgg, Lingshan.Zhu, jasowang,
	hao.wu

This patch reports the nesting info when iommu_domain_get_attr() is called
with DOMAIN_ATTR_NESTING on a domain that has nesting enabled.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v7 -> v8:
*) tweak per latest code base

v6 -> v7:
*) split the patch in v6 into two patches:
   [PATCH v7 15/16] iommu/vt-d: Only support nesting when nesting caps are consistent across iommu units
   [PATCH v7 16/16] iommu/vt-d: Support reporting nesting capability info

v2 -> v3:
*) remove cap/ecap_mask in iommu_nesting_info.
---
 drivers/iommu/intel/cap_audit.h |  7 ++++
 drivers/iommu/intel/iommu.c     | 68 ++++++++++++++++++++++++++++++++-
 2 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/cap_audit.h b/drivers/iommu/intel/cap_audit.h
index 74cfccae0e81..787e98282a02 100644
--- a/drivers/iommu/intel/cap_audit.h
+++ b/drivers/iommu/intel/cap_audit.h
@@ -60,6 +60,13 @@
 #define ECAP_QI_MASK		BIT_ULL(1)
 #define ECAP_C_MASK		BIT_ULL(0)
 
+/* Capabilities related to nested translation */
+#define VTD_CAP_MASK		(CAP_FL1GP_MASK | CAP_FL5LP_MASK)
+
+#define VTD_ECAP_MASK		(ECAP_PRS_MASK | ECAP_ERS_MASK | \
+				 ECAP_SRS_MASK | ECAP_EAFS_MASK | \
+				 ECAP_PASID_MASK)
+
 /*
  * u64 intel_iommu_cap_sanity, intel_iommu_ecap_sanity will be adjusted as each
  * IOMMU gets audited.
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 4409d86b4e18..f7432fb1c6ea 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5508,13 +5508,79 @@ static bool domain_use_flush_queue(void)
 	return r;
 }
 
+static int intel_iommu_get_nesting_info(struct iommu_domain *domain,
+					struct iommu_nesting_info *info)
+{
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	u64 cap = VTD_CAP_MASK, ecap = VTD_ECAP_MASK;
+	struct device_domain_info *domain_info;
+	struct iommu_nesting_info_vtd vtd;
+	unsigned int size;
+
+	if (!info)
+		return -EINVAL;
+
+	if (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE))
+		return -ENODEV;
+
+	size = sizeof(struct iommu_nesting_info);
+	/*
+	 * If the provided buffer size is smaller than expected, return
+	 * 0 and report the expected buffer size to the caller.
+	 */
+	if (info->argsz < size) {
+		info->argsz = size;
+		return 0;
+	}
+
+	/*
+	 * arbitrarily select the first domain_info as all nesting
+	 * related capabilities should be consistent across iommu
+	 * units.
+	 */
+	domain_info = list_first_entry(&dmar_domain->devices,
+				       struct device_domain_info, link);
+	cap &= domain_info->iommu->cap;
+	ecap &= domain_info->iommu->ecap;
+
+	info->addr_width = dmar_domain->gaw;
+	info->format = IOMMU_PASID_FORMAT_INTEL_VTD;
+	info->features = IOMMU_NESTING_FEAT_BIND_PGTBL |
+			 IOMMU_NESTING_FEAT_CACHE_INVLD;
+	info->pasid_bits = ilog2(intel_pasid_max_id);
+	memset(&info->padding, 0x0, 12);
+
+	vtd.flags = 0;
+	memset(&vtd.padding, 0x0, 12);
+	vtd.cap_reg = cap & VTD_CAP_MASK;
+	vtd.ecap_reg = ecap & VTD_ECAP_MASK;
+
+	memcpy(&info->vendor.vtd, &vtd, sizeof(vtd));
+	return 0;
+}
+
 static int
 intel_iommu_domain_get_attr(struct iommu_domain *domain,
 			    enum iommu_attr attr, void *data)
 {
 	switch (domain->type) {
 	case IOMMU_DOMAIN_UNMANAGED:
-		return -ENODEV;
+		switch (attr) {
+		case DOMAIN_ATTR_NESTING:
+		{
+			struct iommu_nesting_info *info =
+				(struct iommu_nesting_info *)data;
+			unsigned long flags;
+			int ret;
+
+			spin_lock_irqsave(&device_domain_lock, flags);
+			ret = intel_iommu_get_nesting_info(domain, info);
+			spin_unlock_irqrestore(&device_domain_lock, flags);
+			return ret;
+		}
+		default:
+			return -ENODEV;
+		}
 	case IOMMU_DOMAIN_DMA:
 		switch (attr) {
 		case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* RE: [Patch v8 03/10] vfio/type1: Report iommu nesting info to userspace
  2021-03-02 12:52     ` Jason Gunthorpe
@ 2021-03-03  9:53       ` Liu, Yi L
  -1 siblings, 0 replies; 40+ messages in thread
From: Liu, Yi L @ 2021-03-03  9:53 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: alex.williamson, eric.auger, baolu.lu, joro, Tian, Kevin,
	jacob.jun.pan, Raj, Ashok, Tian, Jun J, Sun, Yi Y, jean-philippe,
	peterx, jasowang, Wu, Hao, stefanha, iommu, kvm, Zhu, Lingshan,
	vivek.gautam

Hi Jason,

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, March 2, 2021 8:52 PM
> 
> On Wed, Mar 03, 2021 at 04:35:38AM +0800, Liu Yi L wrote:
> > diff --git a/drivers/vfio/vfio_iommu_type1.c
> b/drivers/vfio/vfio_iommu_type1.c
> > index 4bb162c1d649..3a5c84d4f19b 100644
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -63,22 +63,24 @@ MODULE_PARM_DESC(dma_entry_limit,
> >  		 "Maximum number of user DMA mappings per container
> (65535).");
> >
> >  struct vfio_iommu {
> > -	struct list_head	domain_list;
> > -	struct list_head	iova_list;
> > -	struct vfio_domain	*external_domain; /* domain for external
> user */
> > -	struct mutex		lock;
> > -	struct rb_root		dma_list;
> > -	struct blocking_notifier_head notifier;
> > -	unsigned int		dma_avail;
> > -	unsigned int		vaddr_invalid_count;
> > -	uint64_t		pgsize_bitmap;
> > -	uint64_t		num_non_pinned_groups;
> > -	wait_queue_head_t	vaddr_wait;
> > -	bool			v2;
> > -	bool			nesting;
> > -	bool			dirty_page_tracking;
> > -	bool			pinned_page_dirty_scope;
> > -	bool			container_open;
> > +	struct list_head		domain_list;
> > +	struct list_head		iova_list;
> > +	/* domain for external user */
> > +	struct vfio_domain		*external_domain;
> > +	struct mutex			lock;
> > +	struct rb_root			dma_list;
> > +	struct blocking_notifier_head	notifier;
> > +	unsigned int			dma_avail;
> > +	unsigned int			vaddr_invalid_count;
> > +	uint64_t			pgsize_bitmap;
> > +	uint64_t			num_non_pinned_groups;
> > +	wait_queue_head_t		vaddr_wait;
> > +	struct iommu_nesting_info	*nesting_info;
> > +	bool				v2;
> > +	bool				nesting;
> > +	bool				dirty_page_tracking;
> > +	bool				pinned_page_dirty_scope;
> > +	bool				container_open;
> >  };
> 
> I always hate seeing one line patches done like this. If you want to
> re-indent you should remove the horizontal whitespace, not add an
> unreadable amount more.

Oops, will be careful in the next version. Perhaps there is no need to
re-indent the existing fields, to avoid the extra whitespace?

> 
> Also, Linus has been unhappy before to see lists of bool's in structs
> due to the huge amount of memory they waste.

How about something like below? I can do it if Alex is fine with it.

		u64 v2:1;
		u64 nesting:1;
		u64 dirty_page_tracking:1;
		u64 pinned_page_dirty_scope:1;
		u64 container_open:1;
		u64 reserved:59;
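
As a standalone sketch (field names mirror the proposal above, but this is
deliberately not the real struct vfio_iommu), the bit-field variant can be
compile-checked like this; the five flags then share a single 64-bit word
instead of occupying one byte each:

    #include <stdio.h>
    #include <stdint.h>

    struct iommu_flags {
            uint64_t v2:1;
            uint64_t nesting:1;
            uint64_t dirty_page_tracking:1;
            uint64_t pinned_page_dirty_scope:1;
            uint64_t container_open:1;
            uint64_t reserved:59;
    };

    int main(void)
    {
            struct iommu_flags f = { .v2 = 1, .container_open = 1 };

            printf("sizeof(struct iommu_flags) = %zu\n", sizeof(f));
            printf("v2=%u nesting=%u\n",
                   (unsigned)f.v2, (unsigned)f.nesting);
            return 0;
    }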

And thanks for sharing what Linus prefers.

Regards,
Yi Liu

> Jason

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [Patch v8 03/10] vfio/type1: Report iommu nesting info to userspace
@ 2021-03-03  9:53       ` Liu, Yi L
  0 siblings, 0 replies; 40+ messages in thread
From: Liu, Yi L @ 2021-03-03  9:53 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, vivek.gautam,
	jasowang, stefanha, Sun, Yi Y, alex.williamson, iommu, Zhu,
	Lingshan, Wu, Hao, Tian,  Jun J

Hi Jason,

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, March 2, 2021 8:52 PM
> 
> On Wed, Mar 03, 2021 at 04:35:38AM +0800, Liu Yi L wrote:
> > diff --git a/drivers/vfio/vfio_iommu_type1.c
> b/drivers/vfio/vfio_iommu_type1.c
> > index 4bb162c1d649..3a5c84d4f19b 100644
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -63,22 +63,24 @@ MODULE_PARM_DESC(dma_entry_limit,
> >  		 "Maximum number of user DMA mappings per container
> (65535).");
> >
> >  struct vfio_iommu {
> > -	struct list_head	domain_list;
> > -	struct list_head	iova_list;
> > -	struct vfio_domain	*external_domain; /* domain for external
> user */
> > -	struct mutex		lock;
> > -	struct rb_root		dma_list;
> > -	struct blocking_notifier_head notifier;
> > -	unsigned int		dma_avail;
> > -	unsigned int		vaddr_invalid_count;
> > -	uint64_t		pgsize_bitmap;
> > -	uint64_t		num_non_pinned_groups;
> > -	wait_queue_head_t	vaddr_wait;
> > -	bool			v2;
> > -	bool			nesting;
> > -	bool			dirty_page_tracking;
> > -	bool			pinned_page_dirty_scope;
> > -	bool			container_open;
> > +	struct list_head		domain_list;
> > +	struct list_head		iova_list;
> > +	/* domain for external user */
> > +	struct vfio_domain		*external_domain;
> > +	struct mutex			lock;
> > +	struct rb_root			dma_list;
> > +	struct blocking_notifier_head	notifier;
> > +	unsigned int			dma_avail;
> > +	unsigned int			vaddr_invalid_count;
> > +	uint64_t			pgsize_bitmap;
> > +	uint64_t			num_non_pinned_groups;
> > +	wait_queue_head_t		vaddr_wait;
> > +	struct iommu_nesting_info	*nesting_info;
> > +	bool				v2;
> > +	bool				nesting;
> > +	bool				dirty_page_tracking;
> > +	bool				pinned_page_dirty_scope;
> > +	bool				container_open;
> >  };
> 
> I always hate seeing one line patches done like this. If you want to
> re-indent you should remove the horizontal whitespace, not add an
> unreadable amount more.

Oops, will be careful in the next version. Perhaps there is no need to
re-indent the existing fields, to avoid the extra whitespace?

> 
> Also, Linus has been unhappy before to see lists of bool's in structs
> due to the huge amount of memory they waste.

How about something like below? I can do it if Alex is fine with it.

		u64 v2:1;
		u64 nesting:1;
		u64 dirty_page_tracking:1;
		u64 pinned_page_dirty_scope:1;
		u64 container_open:1;
		u64 reserved:59;

And thanks for sharing what Linus prefers.

Regards,
Yi Liu

> Jason

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID
  2021-03-02 17:15         ` Jason Gunthorpe
@ 2021-03-03 19:42           ` Jacob Pan
  -1 siblings, 0 replies; 40+ messages in thread
From: Jacob Pan @ 2021-03-03 19:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu Yi L, alex.williamson, eric.auger, baolu.lu, joro,
	kevin.tian, ashok.raj, jun.j.tian, yi.y.sun, jean-philippe,
	peterx, jasowang, hao.wu, stefanha, iommu, kvm, Lingshan.Zhu,
	vivek.gautam, jacob.jun.pan, Jason Wang

Hi Jason,

On Tue, 2 Mar 2021 13:15:51 -0400, Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Tue, Mar 02, 2021 at 09:13:19AM -0800, Jacob Pan wrote:
> > Hi Jason,
> > 
> > On Tue, 2 Mar 2021 08:56:28 -0400, Jason Gunthorpe <jgg@nvidia.com>
> > wrote: 
> > > On Wed, Mar 03, 2021 at 04:35:39AM +0800, Liu Yi L wrote:  
> > > >  
> > > > +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
> > > > +{
> > > > +	struct domain_capsule *dc = (struct domain_capsule *)data;
> > > > +	unsigned long arg = *(unsigned long *)dc->data;
> > > > +
> > > > +	return iommu_uapi_sva_bind_gpasid(dc->domain, dev,
> > > > +					  (void __user *)arg);    
> > > 
> > > This arg buisness is really tortured. The type should be set at the
> > > ioctl, not constantly passed down as unsigned long or worse void *.
> > > 
> > > And why is this passing a __user pointer deep into an iommu_* API??
> > >   
> > The idea was that IOMMU UAPI (not API) is independent of VFIO or other
> > user driver frameworks. The design is documented here:
> > Documentation/userspace-api/iommu.rst
> > IOMMU UAPI handles the type and sanitation of user provided data.  
> 
> Why? If it is uapi it has defined types and those types should be
> completely clear from the C code, not obfuscated.
> 
From the user's p.o.v., it is plain C code, nothing obfuscated. As for
kernel handling of the data types, that has to be answered by the bigger
question of how we deal with sharing the IOMMU among multiple subsystems
with UAPIs.
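
To make that concrete, the argsz-first pattern the IOMMU UAPI helpers use to
sanitize user data before dispatching to the vendor driver looks roughly
like the sketch below (a simplified illustration of the pattern only, not
the actual iommu.c code; the minsz/flag/version checks differ in detail):

    #include <linux/uaccess.h>

    static int uapi_copy_checked(void __user *uptr, void *kbuf,
                                 u32 minsz, u32 maxsz)
    {
            u32 argsz;

            /* read argsz first, without trusting anything else */
            if (get_user(argsz, (u32 __user *)uptr))
                    return -EFAULT;

            /* bound it: at least the original size, at most what we know of */
            if (argsz < minsz || argsz > maxsz)
                    return -EINVAL;

            /* copy only what the user claims to provide */
            if (copy_from_user(kbuf, uptr, argsz))
                    return -EFAULT;

            /* the caller still validates version/flags before using the data */
            return 0;
    }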

> I haven't looked at the design doc yet, but this is a just a big red
> flag, you shouldn't be tunneling one subsytems uAPI through another
> subsystem.
>
> If you need to hook two subsystems together it should be more
> directly, like VFIO takes in the IOMMU FD and 'registers' itself in
> some way with the IOMMU then you can do the IOMMU actions through the
> IOMMU FD and it can call back to VFIO as needed.
> 
> At least in this way we can swap VFIO for other things in the API.
> 
> Having every subsystem that wants to implement IOMMU also implement
> tunneled ops seems very backwards.
>
Let me try to understand your proposal. You are saying:
1. Create a new user interface such that FDs can be obtained as a reference
to an IOMMU.
2. Hand over the IOMMU FD to VFIO or any other subsystem that wishes to
register for IOMMU services.

However, the IOMMU is a system device that has little value being exposed to
userspace, not to mention the device-IOMMU affinity/topology. VFIO nicely
abstracts the IOMMU from userspace; why do we want to reverse that?

With this UAPI layering approach, we converge common code at the IOMMU
layer. Without introducing new user interfaces, we can support multiple
subsystems that want to use IOMMU services, e.g. VDPA and VFIO.

> > Could you be more specific about your concerns?  
> 
> Avoid using unsigned long, void * and flex arrays to describe
> concretely typed things.
> 
> Jason


Thanks,

Jacob

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID
@ 2021-03-03 19:42           ` Jacob Pan
  0 siblings, 0 replies; 40+ messages in thread
From: Jacob Pan @ 2021-03-03 19:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, vivek.gautam,
	Jason Wang, stefanha, yi.y.sun, alex.williamson, iommu,
	Lingshan.Zhu, hao.wu, jun.j.tian

Hi Jason,

On Tue, 2 Mar 2021 13:15:51 -0400, Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Tue, Mar 02, 2021 at 09:13:19AM -0800, Jacob Pan wrote:
> > Hi Jason,
> > 
> > On Tue, 2 Mar 2021 08:56:28 -0400, Jason Gunthorpe <jgg@nvidia.com>
> > wrote: 
> > > On Wed, Mar 03, 2021 at 04:35:39AM +0800, Liu Yi L wrote:  
> > > >  
> > > > +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
> > > > +{
> > > > +	struct domain_capsule *dc = (struct domain_capsule *)data;
> > > > +	unsigned long arg = *(unsigned long *)dc->data;
> > > > +
> > > > +	return iommu_uapi_sva_bind_gpasid(dc->domain, dev,
> > > > +					  (void __user *)arg);    
> > > 
> > > This arg buisness is really tortured. The type should be set at the
> > > ioctl, not constantly passed down as unsigned long or worse void *.
> > > 
> > > And why is this passing a __user pointer deep into an iommu_* API??
> > >   
> > The idea was that IOMMU UAPI (not API) is independent of VFIO or other
> > user driver frameworks. The design is documented here:
> > Documentation/userspace-api/iommu.rst
> > IOMMU UAPI handles the type and sanitation of user provided data.  
> 
> Why? If it is uapi it has defined types and those types should be
> completely clear from the C code, not obfuscated.
> 
From the user's p.o.v., it is plain C code, nothing obfuscated. As for
kernel handling of the data types, that has to be answered by the bigger
question of how we deal with sharing the IOMMU among multiple subsystems
with UAPIs.

> I haven't looked at the design doc yet, but this is a just a big red
> flag, you shouldn't be tunneling one subsytems uAPI through another
> subsystem.
>
> If you need to hook two subsystems together it should be more
> directly, like VFIO takes in the IOMMU FD and 'registers' itself in
> some way with the IOMMU then you can do the IOMMU actions through the
> IOMMU FD and it can call back to VFIO as needed.
> 
> At least in this way we can swap VFIO for other things in the API.
> 
> Having every subsystem that wants to implement IOMMU also implement
> tunneled ops seems very backwards.
>
Let me try to understand your proposal. You are saying:
1. Create a new user interface such that FDs can be obtained as a reference
to an IOMMU.
2. Hand over the IOMMU FD to VFIO or any other subsystem that wishes to
register for IOMMU services.

However, the IOMMU is a system device that has little value being exposed to
userspace, not to mention the device-IOMMU affinity/topology. VFIO nicely
abstracts the IOMMU from userspace; why do we want to reverse that?

With this UAPI layering approach, we converge common code at the IOMMU
layer. Without introducing new user interfaces, we can support multiple
subsystems that want to use IOMMU services, e.g. VDPA and VFIO.

> > Could you be more specific about your concerns?  
> 
> Avoid using unsigned long, void * and flex arrays to describe
> concretely typed things.
> 
> Jason


Thanks,

Jacob

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID
  2021-03-03 19:42           ` Jacob Pan
@ 2021-03-03 19:45             ` Jason Gunthorpe
  -1 siblings, 0 replies; 40+ messages in thread
From: Jason Gunthorpe @ 2021-03-03 19:45 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Liu Yi L, alex.williamson, eric.auger, baolu.lu, joro,
	kevin.tian, ashok.raj, jun.j.tian, yi.y.sun, jean-philippe,
	peterx, jasowang, hao.wu, stefanha, iommu, kvm, Lingshan.Zhu,
	vivek.gautam

On Wed, Mar 03, 2021 at 11:42:12AM -0800, Jacob Pan wrote:
> Hi Jason,
> 
> On Tue, 2 Mar 2021 13:15:51 -0400, Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > On Tue, Mar 02, 2021 at 09:13:19AM -0800, Jacob Pan wrote:
> > > Hi Jason,
> > > 
> > > On Tue, 2 Mar 2021 08:56:28 -0400, Jason Gunthorpe <jgg@nvidia.com>
> > > wrote: 
> > > > On Wed, Mar 03, 2021 at 04:35:39AM +0800, Liu Yi L wrote:  
> > > > >  
> > > > > +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
> > > > > +{
> > > > > +	struct domain_capsule *dc = (struct domain_capsule *)data;
> > > > > +	unsigned long arg = *(unsigned long *)dc->data;
> > > > > +
> > > > > +	return iommu_uapi_sva_bind_gpasid(dc->domain, dev,
> > > > > +					  (void __user *)arg);    
> > > > 
> > > > This arg buisness is really tortured. The type should be set at the
> > > > ioctl, not constantly passed down as unsigned long or worse void *.
> > > > 
> > > > And why is this passing a __user pointer deep into an iommu_* API??
> > > >   
> > > The idea was that IOMMU UAPI (not API) is independent of VFIO or other
> > > user driver frameworks. The design is documented here:
> > > Documentation/userspace-api/iommu.rst
> > > IOMMU UAPI handles the type and sanitation of user provided data.  
> > 
> > Why? If it is uapi it has defined types and those types should be
> > completely clear from the C code, not obfuscated.
> > 
> From the user's p.o.v., it is plain c code nothing obfuscated. As for
> kernel handling of the data types, it has to be answered by the bigger
> question of how we deal with sharing IOMMU among multiple subsystems with
> UAPIs.

As I said, don't obfuscate types like this in the kernel. It is not
good style.

> However, IOMMU is a system device which has little value to be exposed to
> the userspace. Not to mention the device-IOMMU affinity/topology. VFIO
> nicely abstracts IOMMU from the userspace, why do we want to reverse that?

The other patch was talking about a /dev/ioasid - why can't this stuff
be run over that?

Jason

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID
@ 2021-03-03 19:45             ` Jason Gunthorpe
  0 siblings, 0 replies; 40+ messages in thread
From: Jason Gunthorpe @ 2021-03-03 19:45 UTC (permalink / raw)
  To: Jacob Pan
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, vivek.gautam,
	jasowang, stefanha, yi.y.sun, alex.williamson, iommu,
	Lingshan.Zhu, hao.wu, jun.j.tian

On Wed, Mar 03, 2021 at 11:42:12AM -0800, Jacob Pan wrote:
> Hi Jason,
> 
> On Tue, 2 Mar 2021 13:15:51 -0400, Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > On Tue, Mar 02, 2021 at 09:13:19AM -0800, Jacob Pan wrote:
> > > Hi Jason,
> > > 
> > > On Tue, 2 Mar 2021 08:56:28 -0400, Jason Gunthorpe <jgg@nvidia.com>
> > > wrote: 
> > > > On Wed, Mar 03, 2021 at 04:35:39AM +0800, Liu Yi L wrote:  
> > > > >  
> > > > > +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
> > > > > +{
> > > > > +	struct domain_capsule *dc = (struct domain_capsule *)data;
> > > > > +	unsigned long arg = *(unsigned long *)dc->data;
> > > > > +
> > > > > +	return iommu_uapi_sva_bind_gpasid(dc->domain, dev,
> > > > > +					  (void __user *)arg);    
> > > > 
> > > > This arg business is really tortured. The type should be set at the
> > > > ioctl, not constantly passed down as unsigned long or worse void *.
> > > > 
> > > > And why is this passing a __user pointer deep into an iommu_* API??
> > > >   
> > > The idea was that IOMMU UAPI (not API) is independent of VFIO or other
> > > user driver frameworks. The design is documented here:
> > > Documentation/userspace-api/iommu.rst
> > > IOMMU UAPI handles the type and sanitation of user provided data.  
> > 
> > Why? If it is uapi it has defined types and those types should be
> > completely clear from the C code, not obfuscated.
> > 
> From the user's p.o.v., it is plain C code, nothing obfuscated. As for
> kernel handling of the data types, it has to be answered by the bigger
> question of how we deal with sharing IOMMU among multiple subsystems with
> UAPIs.

As I said, don't obfuscate types like this in the kernel. It is not
good style.

> However, IOMMU is a system device which has little value to be exposed to
> the userspace. Not to mention the device-IOMMU affinity/topology. VFIO
> nicely abstracts IOMMU from the userspace, why do we want to reverse that?

The other patch was talking about a /dev/ioasid - why can't this stuff
be run over that?

Jason
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID
  2021-03-03 19:45             ` Jason Gunthorpe
@ 2021-03-04  7:20               ` Liu, Yi L
  -1 siblings, 0 replies; 40+ messages in thread
From: Liu, Yi L @ 2021-03-04  7:20 UTC (permalink / raw)
  To: Jason Gunthorpe, Jacob Pan
  Cc: alex.williamson, eric.auger, baolu.lu, joro, Tian, Kevin, Raj,
	Ashok, Tian, Jun J, Sun, Yi Y, jean-philippe, peterx, jasowang,
	Wu, Hao, stefanha, iommu, kvm, Zhu, Lingshan, vivek.gautam

Hi Jason,

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, March 4, 2021 3:45 AM
> 
> On Wed, Mar 03, 2021 at 11:42:12AM -0800, Jacob Pan wrote:
> > Hi Jason,
> >
> > On Tue, 2 Mar 2021 13:15:51 -0400, Jason Gunthorpe <jgg@nvidia.com>
> wrote:
> >
> > > On Tue, Mar 02, 2021 at 09:13:19AM -0800, Jacob Pan wrote:
> > > > Hi Jason,
> > > >
> > > > On Tue, 2 Mar 2021 08:56:28 -0400, Jason Gunthorpe <jgg@nvidia.com>
> > > > wrote:
> > > > > On Wed, Mar 03, 2021 at 04:35:39AM +0800, Liu Yi L wrote:
> > > > > >
> > > > > > +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
> > > > > > +{
> > > > > > +	struct domain_capsule *dc = (struct domain_capsule *)data;
> > > > > > +	unsigned long arg = *(unsigned long *)dc->data;
> > > > > > +
> > > > > > +	return iommu_uapi_sva_bind_gpasid(dc->domain, dev,
> > > > > > +					  (void __user *)arg);
> > > > >
> > > > > This arg business is really tortured. The type should be set at the
> > > > > ioctl, not constantly passed down as unsigned long or worse void *.
> > > > >
> > > > > And why is this passing a __user pointer deep into an iommu_* API??
> > > > >
> > > > The idea was that IOMMU UAPI (not API) is independent of VFIO or
> other
> > > > user driver frameworks. The design is documented here:
> > > > Documentation/userspace-api/iommu.rst
> > > > IOMMU UAPI handles the type and sanitation of user provided data.
> > >
> > > Why? If it is uapi it has defined types and those types should be
> > > completely clear from the C code, not obfuscated.
> > >
> > From the user's p.o.v., it is plain C code, nothing obfuscated. As for
> > kernel handling of the data types, it has to be answered by the bigger
> > question of how we deal with sharing IOMMU among multiple
> subsystems with
> > UAPIs.
> 
> As I said, don't obfuscate types like this in the kernel. It is not
> good style.
> 
> > However, IOMMU is a system device which has little value to be exposed
> to
> > the userspace. Not to mention the device-IOMMU affinity/topology. VFIO
> > nicely abstracts IOMMU from the userspace, why do we want to reverse
> that?
> 
> The other patch was talking about a /dev/ioasid - why can't this stuff
> be run over that?

The stuff in this patch is actually iommu domain operations, which are
ultimately backed by the iommu domain ops. /dev/ioasid in the other patch,
by contrast, is created for IOASID allocation/free to fit the PASID
allocation requirements of both vSVA and vDPA. It has no idea about the
iommu domain, nor about the device information. Without such info,
/dev/ioasid is unable to run this stuff.

Thanks,
Yi Liu
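
(To make the contrast concrete, a sketch rather than code from the series:
the allocation path needs only a PASID set and a range, while the bind path
needs the iommu domain and the device, which only the VFIO side has at hand.
host_pasid_set, min, max, dc and arg stand in for whatever the real callers
have in scope.)

/* allocation/free side: what /dev/ioasid covers -- no domain, no device */
ioasid_t pasid = ioasid_alloc(host_pasid_set, min, max, NULL);
if (pasid == INVALID_IOASID)
	return -ENOSPC;

/* bind side: what this patch covers -- needs both the domain and the device */
ret = iommu_uapi_sva_bind_gpasid(dc->domain, dev, (void __user *)arg);
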

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID
@ 2021-03-04  7:20               ` Liu, Yi L
  0 siblings, 0 replies; 40+ messages in thread
From: Liu, Yi L @ 2021-03-04  7:20 UTC (permalink / raw)
  To: Jason Gunthorpe, Jacob Pan
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, vivek.gautam,
	jasowang, stefanha, Sun, Yi Y, alex.williamson, iommu, Zhu,
	Lingshan, Wu, Hao, Tian, Jun J

Hi Jason,

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, March 4, 2021 3:45 AM
> 
> On Wed, Mar 03, 2021 at 11:42:12AM -0800, Jacob Pan wrote:
> > Hi Jason,
> >
> > On Tue, 2 Mar 2021 13:15:51 -0400, Jason Gunthorpe <jgg@nvidia.com>
> wrote:
> >
> > > On Tue, Mar 02, 2021 at 09:13:19AM -0800, Jacob Pan wrote:
> > > > Hi Jason,
> > > >
> > > > On Tue, 2 Mar 2021 08:56:28 -0400, Jason Gunthorpe <jgg@nvidia.com>
> > > > wrote:
> > > > > On Wed, Mar 03, 2021 at 04:35:39AM +0800, Liu Yi L wrote:
> > > > > >
> > > > > > +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
> > > > > > +{
> > > > > > +	struct domain_capsule *dc = (struct domain_capsule *)data;
> > > > > > +	unsigned long arg = *(unsigned long *)dc->data;
> > > > > > +
> > > > > > +	return iommu_uapi_sva_bind_gpasid(dc->domain, dev,
> > > > > > +					  (void __user *)arg);
> > > > >
> > > > > This arg business is really tortured. The type should be set at the
> > > > > ioctl, not constantly passed down as unsigned long or worse void *.
> > > > >
> > > > > And why is this passing a __user pointer deep into an iommu_* API??
> > > > >
> > > > The idea was that IOMMU UAPI (not API) is independent of VFIO or
> other
> > > > user driver frameworks. The design is documented here:
> > > > Documentation/userspace-api/iommu.rst
> > > > IOMMU UAPI handles the type and sanitation of user provided data.
> > >
> > > Why? If it is uapi it has defined types and those types should be
> > > completely clear from the C code, not obfuscated.
> > >
> > From the user's p.o.v., it is plain C code, nothing obfuscated. As for
> > kernel handling of the data types, it has to be answered by the bigger
> > question of how we deal with sharing IOMMU among multiple
> subsystems with
> > UAPIs.
> 
> As I said, don't obfuscate types like this in the kernel. It is not
> good style.
> 
> > However, IOMMU is a system device which has little value to be exposed
> to
> > the userspace. Not to mention the device-IOMMU affinity/topology. VFIO
> > nicely abstracts IOMMU from the userspace, why do we want to reverse
> that?
> 
> The other patch was talking about a /dev/ioasid - why can't this stuff
> be run over that?

The stuff in this patch is actually iommu domain operations, which are
ultimately backed by the iommu domain ops. /dev/ioasid in the other patch,
by contrast, is created for IOASID allocation/free to fit the PASID
allocation requirements of both vSVA and vDPA. It has no idea about the
iommu domain, nor about the device information. Without such info,
/dev/ioasid is unable to run this stuff.

Thanks,
Yi Liu
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID
  2021-03-04  7:20               ` Liu, Yi L
@ 2021-03-04 12:52                 ` Jason Gunthorpe
  -1 siblings, 0 replies; 40+ messages in thread
From: Jason Gunthorpe @ 2021-03-04 12:52 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Jacob Pan, alex.williamson, eric.auger, baolu.lu, joro, Tian,
	Kevin, Raj, Ashok, Tian, Jun J, Sun, Yi Y, jean-philippe, peterx,
	jasowang, Wu, Hao, stefanha, iommu, kvm, Zhu, Lingshan,
	vivek.gautam

On Thu, Mar 04, 2021 at 07:20:22AM +0000, Liu, Yi L wrote:
> > > However, IOMMU is a system device which has little value to be exposed
> > to
> > > the userspace. Not to mention the device-IOMMU affinity/topology. VFIO
> > > nicely abstracts IOMMU from the userspace, why do we want to reverse
> > that?
> > 
> > The other patch was talking about a /dev/ioasid - why can't this stuff
> > be run over that?
> 
> The stuff in this patch is actually iommu domain operations, which are
> ultimately backed by the iommu domain ops. /dev/ioasid in the other patch,
> by contrast, is created for IOASID allocation/free to fit the PASID
> allocation requirements of both vSVA and vDPA. It has no idea about the
> iommu domain, nor about the device information. Without such info,
> /dev/ioasid is unable to run this stuff.

Why can't it know? My point was that VFIO should interact with
/dev/ioasid to exchange the information both sides need, and once that
is exchanged, control over the IOMMU should be done through /dev/ioasid,
not inside VFIO.

Jason
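
(A purely hypothetical userspace sketch of the split being suggested here;
none of the IOASID ioctls, request structs, or the VFIO attach ioctl below
exist in this series or in the kernel at this point, they only illustrate
where the control would sit.)

#include <fcntl.h>
#include <sys/ioctl.h>

static int bind_via_ioasid(int device_fd, void *alloc_req, void *bind_req)
{
	int ioasid_fd = open("/dev/ioasid", O_RDWR);

	if (ioasid_fd < 0)
		return -1;

	/* VFIO and /dev/ioasid exchange device/IOMMU information once */
	if (ioctl(device_fd, VFIO_DEVICE_ATTACH_IOASID_FD, &ioasid_fd))	/* hypothetical */
		return -1;

	/* after that, PASID allocation and page-table binding are driven
	 * through /dev/ioasid rather than through VFIO_IOMMU_* ioctls */
	if (ioctl(ioasid_fd, IOASID_ALLOC, alloc_req))			/* hypothetical */
		return -1;
	return ioctl(ioasid_fd, IOASID_BIND_PGTABLE, bind_req);		/* hypothetical */
}
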

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID
@ 2021-03-04 12:52                 ` Jason Gunthorpe
  0 siblings, 0 replies; 40+ messages in thread
From: Jason Gunthorpe @ 2021-03-04 12:52 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, vivek.gautam,
	jasowang, stefanha, Sun, Yi Y, alex.williamson, iommu, Zhu,
	Lingshan, Wu, Hao, Tian, Jun J

On Thu, Mar 04, 2021 at 07:20:22AM +0000, Liu, Yi L wrote:
> > > However, IOMMU is a system device which has little value to be exposed
> > to
> > > the userspace. Not to mention the device-IOMMU affinity/topology. VFIO
> > > nicely abstracts IOMMU from the userspace, why do we want to reverse
> > that?
> > 
> > The other patch was talking about a /dev/ioasid - why can't this stuff
> > be run over that?
> 
> The stuff in this patch is actually iommu domain operations, which are
> ultimately backed by the iommu domain ops. /dev/ioasid in the other patch,
> by contrast, is created for IOASID allocation/free to fit the PASID
> allocation requirements of both vSVA and vDPA. It has no idea about the
> iommu domain, nor about the device information. Without such info,
> /dev/ioasid is unable to run this stuff.

Why can't it know? My point was that VFIO should interact with
/dev/ioasid to exchange the information both sides need, and once that
is exchanged, control over the IOMMU should be done through /dev/ioasid,
not inside VFIO.

Jason
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2021-03-04 12:53 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-02 20:35 [Patch v8 00/10] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
2021-03-02 20:35 ` Liu Yi L
2021-03-02 20:35 ` [Patch v8 01/10] iommu: Report domain nesting info Liu Yi L
2021-03-02 20:35   ` Liu Yi L
2021-03-02 20:35 ` [Patch v8 02/10] iommu/smmu: Report empty " Liu Yi L
2021-03-02 20:35   ` Liu Yi L
2021-03-02 20:35 ` [Patch v8 03/10] vfio/type1: Report iommu nesting info to userspace Liu Yi L
2021-03-02 20:35   ` Liu Yi L
2021-03-02 12:52   ` Jason Gunthorpe
2021-03-02 12:52     ` Jason Gunthorpe
2021-03-03  9:53     ` Liu, Yi L
2021-03-03  9:53       ` Liu, Yi L
2021-03-02 20:35 ` [Patch v8 04/10] vfio/type1: Support binding guest page tables to PASID Liu Yi L
2021-03-02 20:35   ` Liu Yi L
2021-03-02 12:56   ` Jason Gunthorpe
2021-03-02 12:56     ` Jason Gunthorpe
2021-03-02 17:13     ` Jacob Pan
2021-03-02 17:13       ` Jacob Pan
2021-03-02 17:15       ` Jason Gunthorpe
2021-03-02 17:15         ` Jason Gunthorpe
2021-03-03 19:42         ` Jacob Pan
2021-03-03 19:42           ` Jacob Pan
2021-03-03 19:45           ` Jason Gunthorpe
2021-03-03 19:45             ` Jason Gunthorpe
2021-03-04  7:20             ` Liu, Yi L
2021-03-04  7:20               ` Liu, Yi L
2021-03-04 12:52               ` Jason Gunthorpe
2021-03-04 12:52                 ` Jason Gunthorpe
2021-03-02 20:35 ` [Patch v8 05/10] vfio/type1: Allow invalidating first-level/stage IOMMU cache Liu Yi L
2021-03-02 20:35   ` Liu Yi L
2021-03-02 20:35 ` [Patch v8 06/10] iommu: Pass domain to sva_unbind_gpasid() Liu Yi L
2021-03-02 20:35   ` Liu Yi L
2021-03-02 20:35 ` [Patch v8 07/10] vfio/type1: Add vSVA support for IOMMU-backed mdevs Liu Yi L
2021-03-02 20:35   ` Liu Yi L
2021-03-02 20:35 ` [Patch v8 08/10] vfio/pci: Expose PCIe PASID capability to userspace Liu Yi L
2021-03-02 20:35   ` Liu Yi L
2021-03-02 20:35 ` [Patch v8 09/10] vfio: Document dual stage control Liu Yi L
2021-03-02 20:35   ` Liu Yi L
2021-03-02 20:35 ` [Patch v8 10/10] iommu/vt-d: Support reporting nesting capability info Liu Yi L
2021-03-02 20:35   ` Liu Yi L
