kvm.vger.kernel.org archive mirror
From: "Liu, Yi L" <yi.l.liu@intel.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: "eric.auger@redhat.com" <eric.auger@redhat.com>,
	"baolu.lu@linux.intel.com" <baolu.lu@linux.intel.com>,
	"joro@8bytes.org" <joro@8bytes.org>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	"jacob.jun.pan@linux.intel.com" <jacob.jun.pan@linux.intel.com>,
	"Raj, Ashok" <ashok.raj@intel.com>,
	"Tian, Jun J" <jun.j.tian@intel.com>,
	"Sun, Yi Y" <yi.y.sun@intel.com>,
	"jean-philippe@linaro.org" <jean-philippe@linaro.org>,
	"peterx@redhat.com" <peterx@redhat.com>,
	"Wu, Hao" <hao.wu@intel.com>,
	"iommu@lists.linux-foundation.org"
	<iommu@lists.linux-foundation.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: RE: [PATCH v3 06/14] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)
Date: Fri, 3 Jul 2020 06:28:04 +0000
Message-ID: <CY4PR11MB1432DD97F44EB8AA5CCC87D8C36A0@CY4PR11MB1432.namprd11.prod.outlook.com>
In-Reply-To: <20200702151832.048b44d1@x1.home>

Hi Alex,

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, July 3, 2020 5:19 AM
> 
> On Wed, 24 Jun 2020 01:55:19 -0700
> Liu Yi L <yi.l.liu@intel.com> wrote:
> 
> > This patch allows user space to request PASID allocation/free, e.g.
> > when serving the request from the guest.
> >
> > PASIDs that are not freed by userspace are automatically freed when
> > the IOASID set is destroyed when process exits.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> > v1 -> v2:
> > *) move the vfio_mm related code to be a separate module
> > *) use a single structure for alloc/free, could support a range of
> > PASIDs
> > *) fetch vfio_mm at group_attach time instead of at iommu driver open
> > time
> > ---
> >  drivers/vfio/Kconfig            |  1 +
> >  drivers/vfio/vfio_iommu_type1.c | 96 ++++++++++++++++++++++++++++++++++++++++-
> >  drivers/vfio/vfio_pasid.c       | 10 +++++
> >  include/linux/vfio.h            |  6 +++
> >  include/uapi/linux/vfio.h       | 36 ++++++++++++++++
> >  5 files changed, 147 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> > index 3d8a108..95d90c6 100644
> > --- a/drivers/vfio/Kconfig
> > +++ b/drivers/vfio/Kconfig
> > @@ -2,6 +2,7 @@
> >  config VFIO_IOMMU_TYPE1
> >  	tristate
> >  	depends on VFIO
> > +	select VFIO_PASID if (X86)
> >  	default n
> >
> >  config VFIO_IOMMU_SPAPR_TCE
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index 8c143d5..d0891c5 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -73,6 +73,7 @@ struct vfio_iommu {
> >  	bool			v2;
> >  	bool			nesting;
> >  	struct iommu_nesting_info *nesting_info;
> > +	struct vfio_mm		*vmm;
> 
> Structure alignment again.

Sure. We can reach agreement on this in the earlier email.

> 
> >  	bool			dirty_page_tracking;
> >  	bool			pinned_page_dirty_scope;
> >  };
> > @@ -1933,6 +1934,17 @@ static void vfio_iommu_iova_insert_copy(struct vfio_iommu *iommu,
> >
> >  	list_splice_tail(iova_copy, iova);
> >  }
> > +
> > +static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)
> > +{
> > +	if (iommu->vmm) {
> > +		vfio_mm_put(iommu->vmm);
> > +		iommu->vmm = NULL;
> > +	}
> > +
> > +	kfree(iommu->nesting_info);
> 
> iommu->nesting_info = NULL;

got it.

> > +}
> > +
> >  static int vfio_iommu_type1_attach_group(void *iommu_data,
> >  					 struct iommu_group *iommu_group)
> >  {
> > @@ -2067,6 +2079,25 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> >  			goto out_detach;
> >  		}
> >  		iommu->nesting_info = info;
> > +
> > +		if (info->features & IOMMU_NESTING_FEAT_SYSWIDE_PASID) {
> > +			struct vfio_mm *vmm;
> > +			int sid;
> > +
> > +			vmm = vfio_mm_get_from_task(current);
> > +			if (IS_ERR(vmm)) {
> > +				ret = PTR_ERR(vmm);
> > +				goto out_detach;
> > +			}
> > +			iommu->vmm = vmm;
> > +
> > +			sid = vfio_mm_ioasid_sid(vmm);
> > +			ret = iommu_domain_set_attr(domain->domain,
> > +						    DOMAIN_ATTR_IOASID_SID,
> > +						    &sid);
> 
> This looks pretty dicey in the case of !CONFIG_VFIO_PASID, can we get here in
> that case?  If so it looks like we're doing bad things with setting the
> domain->ioasid_sid.

I guess not. So far, vfio_iommu_type1 selects CONFIG_VFIO_PASID on X86.
Do you think that is enough?
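
To make the concern concrete, here is a minimal userspace sketch (not kernel code). It models the !CONFIG_VFIO_PASID stub of vfio_mm_ioasid_sid() from this patch, which returns -ENOTTY, and shows one possible guard that rejects a negative set ID before it is stored. The fake_* names and the guard itself are assumptions made for illustration; nothing here is part of the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the domain; only the fields of interest. */
struct fake_domain {
	int ioasid_sid;
	int sid_valid;
};

/* Models the !CONFIG_VFIO_PASID stub in the patch, which returns -ENOTTY. */
static int fake_mm_ioasid_sid_stub(void)
{
	return -ENOTTY;
}

/*
 * Assumed guard (not in the patch): refuse a negative set ID instead of
 * storing an errno value as if it were a valid IOASID set.
 */
static int fake_set_sid(struct fake_domain *d, int sid)
{
	if (sid < 0)
		return sid;

	d->ioasid_sid = sid;
	d->sid_valid = 1;
	return 0;
}
```

With such a guard, the stub's error would propagate to the caller rather than reach the domain attribute.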

> 
> > +			if (ret)
> > +				goto out_detach;
> > +		}
> >  	}
> >
> >  	/* Get aperture info */
> > @@ -2178,7 +2209,8 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> >  	return 0;
> >
> >  out_detach:
> > -	kfree(iommu->nesting_info);
> > +	if (iommu->nesting_info)
> > +		vfio_iommu_release_nesting_info(iommu);
> 
> Make vfio_iommu_release_nesting_info() check iommu->nesting_info, then call
> it unconditionally?

got it. :-)
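
A minimal userspace sketch of the helper with both of Alex's suggestions applied (check iommu->nesting_info inside the helper, and clear the pointer after freeing). The fake_* types and fake_vmm_put() are hypothetical stand-ins for the kernel structures and vfio_mm_put(). Placing the check first assumes that vmm is only ever set after nesting_info, as in the attach path of this patch.

```c
#include <stdlib.h>

/* Hypothetical userspace stand-ins for the kernel types in the patch. */
struct fake_vmm {
	int refs;
};

struct fake_iommu {
	struct fake_vmm *vmm;
	void *nesting_info;
};

/* Stand-in for vfio_mm_put(): drop one reference. */
static void fake_vmm_put(struct fake_vmm *vmm)
{
	vmm->refs--;
}

/* Safe to call unconditionally, and safe to call more than once. */
static void fake_release_nesting_info(struct fake_iommu *iommu)
{
	if (!iommu->nesting_info)
		return;

	if (iommu->vmm) {
		fake_vmm_put(iommu->vmm);
		iommu->vmm = NULL;
	}

	free(iommu->nesting_info);
	iommu->nesting_info = NULL;
}
```

Callers such as the out_detach path could then drop their own `if (iommu->nesting_info)` test.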

> >  	vfio_iommu_detach_group(domain, group);
> >  out_domain:
> >  	iommu_domain_free(domain->domain);
> > @@ -2380,7 +2412,8 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
> >  				else
> >  					vfio_iommu_unmap_unpin_reaccount(iommu);
> >
> > -				kfree(iommu->nesting_info);
> > +				if (iommu->nesting_info)
> > +					vfio_iommu_release_nesting_info(iommu);
> >  			}
> >  			iommu_domain_free(domain->domain);
> >  			list_del(&domain->next);
> > @@ -2852,6 +2885,63 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
> >  	return -EINVAL;
> >  }
> >
> > +static int vfio_iommu_type1_pasid_alloc(struct vfio_iommu *iommu,
> > +					unsigned int min,
> > +					unsigned int max)
> > +{
> > +	int ret = -ENOTSUPP;
> > +
> > +	mutex_lock(&iommu->lock);
> > +	if (iommu->vmm)
> > +		ret = vfio_pasid_alloc(iommu->vmm, min, max);
> > +	mutex_unlock(&iommu->lock);
> > +	return ret;
> > +}
> > +
> > +static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu,
> > +					unsigned int min,
> > +					unsigned int max)
> > +{
> > +	int ret = -ENOTSUPP;
> > +
> > +	mutex_lock(&iommu->lock);
> > +	if (iommu->vmm) {
> > +		vfio_pasid_free_range(iommu->vmm, min, max);
> > +		ret = 0;
> > +	}
> > +	mutex_unlock(&iommu->lock);
> > +	return ret;
> > +}
> > +
> > +static int vfio_iommu_type1_pasid_request(struct vfio_iommu *iommu,
> > +					  unsigned long arg)
> > +{
> > +	struct vfio_iommu_type1_pasid_request req;
> > +	unsigned long minsz;
> > +
> > +	minsz = offsetofend(struct vfio_iommu_type1_pasid_request, range);
> > +
> > +	if (copy_from_user(&req, (void __user *)arg, minsz))
> > +		return -EFAULT;
> > +
> > +	if (req.argsz < minsz || (req.flags & ~VFIO_PASID_REQUEST_MASK))
> > +		return -EINVAL;
> > +
> > +	if (req.range.min > req.range.max)
> 
> Is it exploitable that a user can spin the kernel for a long time in the case of a free
> by calling this with [0, MAX_UINT] regardless of their actual allocations?

IOASID ensures that a user can only free the PASIDs allocated to that
user. But it's true that the kernel needs to loop over all the PASIDs
within the range provided by the user, which may take a long time. Is
there anything we can do? One option might be to limit the range the
user can provide.
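
One way to picture the range-limit idea is the following userspace sketch. It rejects any request whose upper bound exceeds the 20-bit PASID space defined by PCIe, so a free could walk at most 2^20 values. FAKE_PASID_MAX and fake_check_pasid_range() are hypothetical names, and picking the PCIe width as the limit is an assumption for illustration, not something this thread has agreed on.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Assumption: cap requests at the 20-bit PASID space defined by PCIe.
 * The name and the choice of limit are illustrative, not from the patch.
 */
#define FAKE_PASID_MAX ((1u << 20) - 1)

/* Returns 0 if [min, max] is an acceptable range, -EINVAL otherwise. */
static int fake_check_pasid_range(uint32_t min, uint32_t max)
{
	if (min > max)
		return -EINVAL;
	if (max > FAKE_PASID_MAX)
		return -EINVAL;
	return 0;
}
```

A call with [0, UINT32_MAX] would then fail up front instead of spinning in the free loop.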

> > +		return -EINVAL;
> > +
> > +	switch (req.flags & VFIO_PASID_REQUEST_MASK) {
> > +	case VFIO_IOMMU_ALLOC_PASID:
> > +		return vfio_iommu_type1_pasid_alloc(iommu,
> > +					req.range.min, req.range.max);
> > +	case VFIO_IOMMU_FREE_PASID:
> > +		return vfio_iommu_type1_pasid_free(iommu,
> > +					req.range.min, req.range.max);
> > +	default:
> > +		return -EINVAL;
> > +	}
> > +}
> > +
> >  static long vfio_iommu_type1_ioctl(void *iommu_data,
> >  				   unsigned int cmd, unsigned long arg)
> >  {
> > @@ -2868,6 +2958,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> >  		return vfio_iommu_type1_unmap_dma(iommu, arg);
> >  	case VFIO_IOMMU_DIRTY_PAGES:
> >  		return vfio_iommu_type1_dirty_pages(iommu, arg);
> > +	case VFIO_IOMMU_PASID_REQUEST:
> > +		return vfio_iommu_type1_pasid_request(iommu, arg);
> >  	}
> >
> >  	return -ENOTTY;
> > diff --git a/drivers/vfio/vfio_pasid.c b/drivers/vfio/vfio_pasid.c
> > index dd5b6d1..2ea9f1a 100644
> > --- a/drivers/vfio/vfio_pasid.c
> > +++ b/drivers/vfio/vfio_pasid.c
> > @@ -54,6 +54,7 @@ void vfio_mm_put(struct vfio_mm *vmm)
> >  {
> >  	kref_put_mutex(&vmm->kref, vfio_mm_release, &vfio_pasid.vfio_mm_lock);
> >  }
> > +EXPORT_SYMBOL_GPL(vfio_mm_put);
> >
> >  static void vfio_mm_get(struct vfio_mm *vmm)
> >  {
> > @@ -103,6 +104,13 @@ struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task)
> >  	mmput(mm);
> >  	return vmm;
> >  }
> > +EXPORT_SYMBOL_GPL(vfio_mm_get_from_task);
> > +
> > +int vfio_mm_ioasid_sid(struct vfio_mm *vmm)
> > +{
> > +	return vmm->ioasid_sid;
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_mm_ioasid_sid);
> >
> >  int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max)
> >  {
> > @@ -112,6 +120,7 @@ int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max)
> >
> >  	return (pasid == INVALID_IOASID) ? -ENOSPC : pasid;
> >  }
> > +EXPORT_SYMBOL_GPL(vfio_pasid_alloc);
> >
> >  void vfio_pasid_free_range(struct vfio_mm *vmm,
> >  			    ioasid_t min, ioasid_t max)
> > @@ -129,6 +138,7 @@ void vfio_pasid_free_range(struct vfio_mm *vmm,
> >  	for (; pasid <= max; pasid++)
> >  		ioasid_free(pasid);
> >  }
> > +EXPORT_SYMBOL_GPL(vfio_pasid_free_range);
> >
> >  static int __init vfio_pasid_init(void)
> >  {
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index 74e077d..8e60a32 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -101,6 +101,7 @@ struct vfio_mm;
> >  #if IS_ENABLED(CONFIG_VFIO_PASID)
> >  extern struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task);
> >  extern void vfio_mm_put(struct vfio_mm *vmm);
> > +int vfio_mm_ioasid_sid(struct vfio_mm *vmm);
> >  extern int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max);
> >  extern void vfio_pasid_free_range(struct vfio_mm *vmm,
> >  					ioasid_t min, ioasid_t max);
> > @@ -114,6 +115,11 @@ static inline void vfio_mm_put(struct vfio_mm *vmm)
> >  {
> >  }
> >
> > +static inline int vfio_mm_ioasid_sid(struct vfio_mm *vmm)
> > +{
> > +	return -ENOTTY;
> > +}
> > +
> >  static inline int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max)
> >  {
> >  	return -ENOTTY;
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index f1f39e1..657b2db 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -1162,6 +1162,42 @@ struct vfio_iommu_type1_dirty_bitmap_get {
> >
> >  #define VFIO_IOMMU_DIRTY_PAGES             _IO(VFIO_TYPE, VFIO_BASE + 17)
> >
> > +/**
> > + * VFIO_IOMMU_PASID_REQUEST - _IOWR(VFIO_TYPE, VFIO_BASE + 18,
> > + *				struct vfio_iommu_type1_pasid_request)
> > + *
> > + * PASID (Process Address Space ID) is a PCIe concept for tagging
> > + * address spaces in DMA requests. When system-wide PASID allocation
> > + * is required by underlying iommu driver (e.g. Intel VT-d), this
> > + * provides an interface for userspace to request pasid alloc/free
> > + * for its assigned devices. Userspace should check the availability
> > + * of this API through VFIO_IOMMU_GET_INFO.
> > + *
> > + * @flags=VFIO_IOMMU_ALLOC_PASID, allocate a single PASID within @range.
> > + * @flags=VFIO_IOMMU_FREE_PASID, free the PASIDs within @range.
> > + * @range is [min, max], which means both @min and @max are inclusive.
> > + * ALLOC_PASID and FREE_PASID are mutually exclusive.
> > + *
> > + * returns: allocated PASID value on success, -errno on failure for
> > + *	     ALLOC_PASID;
> > + *	     0 for FREE_PASID operation;
> > + */
> > +struct vfio_iommu_type1_pasid_request {
> > +	__u32	argsz;
> > +#define VFIO_IOMMU_ALLOC_PASID	(1 << 0)
> > +#define VFIO_IOMMU_FREE_PASID	(1 << 1)
> 
> VFIO_IOMMU_PASID_FLAG_{ALLOC,FREE} would be more similar to other VFIO
> UAPI conventions.  Thanks,

Yes, much better. I will rename them.
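
A minimal userspace sketch of how the flag check and dispatch in vfio_iommu_type1_pasid_request() might look with the suggested naming pattern. The FAKE_ prefix marks every identifier as illustrative; the final UAPI names and the fake_pasid_decode() helper are assumptions, not part of this patch.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Names follow the suggested VFIO_IOMMU_PASID_FLAG_{ALLOC,FREE} pattern;
 * the FAKE_ prefix marks them as illustrative only.
 */
#define FAKE_PASID_FLAG_ALLOC	(1u << 0)
#define FAKE_PASID_FLAG_FREE	(1u << 1)
#define FAKE_PASID_REQUEST_MASK	(FAKE_PASID_FLAG_ALLOC | FAKE_PASID_FLAG_FREE)

enum fake_pasid_op {
	FAKE_OP_ALLOC = 1,
	FAKE_OP_FREE = 2,
};

/* Returns the requested operation, or -EINVAL for unknown, combined or empty flags. */
static int fake_pasid_decode(uint32_t flags)
{
	if (flags & ~FAKE_PASID_REQUEST_MASK)
		return -EINVAL;

	switch (flags & FAKE_PASID_REQUEST_MASK) {
	case FAKE_PASID_FLAG_ALLOC:
		return FAKE_OP_ALLOC;
	case FAKE_PASID_FLAG_FREE:
		return FAKE_OP_FREE;
	default:
		return -EINVAL;
	}
}
```

As in the patch, the switch is what enforces that ALLOC and FREE are mutually exclusive: setting both bits falls through to the default case.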

Thanks,
Yi Liu

> Alex
> 
> > +	__u32	flags;
> > +	struct {
> > +		__u32	min;
> > +		__u32	max;
> > +	} range;
> > +};
> > +
> > +#define VFIO_PASID_REQUEST_MASK	(VFIO_IOMMU_ALLOC_PASID | \
> > +					 VFIO_IOMMU_FREE_PASID)
> > +
> > +#define VFIO_IOMMU_PASID_REQUEST	_IO(VFIO_TYPE, VFIO_BASE + 18)
> > +
> >  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
> >
> >  /*

