linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jacob Pan <jacob.jun.pan@intel.com>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>,
	"Raj, Ashok" <ashok.raj@intel.com>,
	"Bie, Tiwei" <tiwei.bie@intel.com>,
	"Kumar, Sanjay K" <sanjay.k.kumar@intel.com>,
	Kirti Wankhede <kwankhede@nvidia.com>,
	"iommu@lists.linux-foundation.org"
	<iommu@lists.linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	David Woodhouse <dwmw2@infradead.org>,
	"Sun, Yi Y" <yi.y.sun@intel.com>,
	jacob.jun.pan@intel.com
Subject: Re: [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device
Date: Thu, 20 Sep 2018 08:53:39 -0700	[thread overview]
Message-ID: <20180920085339.2ea67e72@jacob-builder> (raw)
In-Reply-To: <AADFC41AFE54684AB9EE6CBC0274A5D19130ED5B@SHSMSX101.ccr.corp.intel.com>

On Wed, 19 Sep 2018 02:22:03 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> > Sent: Tuesday, September 18, 2018 11:47 PM
> > 
> > On 14/09/2018 22:04, Jacob Pan wrote:  
> > >> This example only needs to modify first-level translation, and
> > >> works with SMMUv3. The kernel here could be the host, in which
> > >> case second-level translation is disabled in the SMMU, or it
> > >> could be the guest, in which case second-level mappings are
> > >> created by QEMU and first-level translation is managed by
> > >> assigning PASID tables to the guest.  
> > > There is a difference in case of guest SVA. VT-d v3 will bind
> > > guest PASID and guest CR3 instead of the guest PASID table. Then
> > > turn on nesting. In case of mdev, the second level is obtained
> > > from the aux domain which was setup for the default PASID. Or in
> > > case of PCI device, second level is harvested from RID2PASID.  
> > 
> > Right, though I wasn't talking about the host managing guest SVA
> > here, but a kernel binding the address space of one of its
> > userspace drivers to the mdev.
> >   
> > >> So (2) would use iommu_sva_bind_device(),  
> > > We would need something different than that for guest bind, just
> > > to show the two cases:>
> > > int iommu_sva_bind_device(struct device *dev, struct mm_struct
> > > *mm,  
> > int  
> > > *pasid, unsigned long flags, void *drvdata)
> > >
> > > (WIP)
> > > int sva_bind_gpasid(struct device *dev, struct gpasid_bind_data
> > > *data) where:
> > > /**
> > >  * struct gpasid_bind_data - Information about device and guest
> > > PASID binding
> > >  * @pasid:       Process address space ID used for the guest mm
> > >  * @addr_width:  Guest address width. Paging mode can also be
> > > derived.
> > >  * @gcr3:        Guest CR3 value from guest mm
> > >  */
> > > struct gpasid_bind_data {
> > >         __u32 pasid;
> > >         __u64 gcr3;
> > >         __u32 addr_width;
> > >         __u32 flags;
> > > #define IOMMU_SVA_GPASID_SRE    BIT(0) /* supervisor request */
> > > };
> > > Perhaps there is room to merge with io_mm but the life cycle  
> > management  
> > > of guest PASID and host PASID will be different if you rely on mm
> > > release callback than FD.  
> 
> let's not calling gpasid here - which makes sense only in
> bind_pasid_table proposal where pasid table thus pasid space is
> managed by guest. In above context it is always about host pasid
> (allocated in system-wide), which could point to a host cr3 (user
> process) or a guest cr3 (vm case).
> 
I agree this gpasid is confusing, we have a system wide PASID
name space. Just a way to differentiate different bind, perhaps
just a flag indicating the PASID is used for guest.
i.e.
struct pasid_bind_data {
         __u32 pasid;
         __u64 gcr3;
         __u32 addr_width;
         __u32 flags;
#define IOMMU_SVA_GPASID_SRE    BIT(0) /* supervisor request */
#define IOMMU_SVA_PASID_GUEST   BIT(0) /* host pasid used by guest */
};

> > I think gpasid management should stay separate from io_mm, since in
> > your case VFIO mechanisms are used for life cycle management of the
> > VM, similarly to the former bind_pasid_table proposal. For example
> > closing the container fd would unbind all guest page tables. The
> > QEMU process' address space lifetime seems like the wrong thing to
> > track for gpasid.  
> 
> I sort of agree (though not thinking through all the flow carefully).
> PASIDs are allocated per iommu domain, thus release also happens when
> domain is detached (along with container fd close).
> 
I also prefer to keep gpasid separate.

But I don't think we need to have per iommu domain per PASID for guest
SVA case. Assuming you are talking about host IOMMU domain. The PASID
bind call is a result of guest PASID cache flush with a PASID
previously allocated. The host just need to put gcr3 into the PASID
entry then harvest the second level from the existing domain.
> >   
> > >> but (1) needs something
> > >> else. Aren't auxiliary domains suitable for (1)? Why limit
> > >> auxiliary domain to second-level or nested translation? It seems
> > >> silly to use a different API for first-level, since the flow in
> > >> userspace and VFIO is the same as your second-level case as far
> > >> as MAP_DMA ioctl goes. The difference is that in your case the
> > >> auxiliary domain supports an additional operation which binds
> > >> first-level page tables. An auxiliary domain that only supports
> > >> first-level wouldn't support this operation, but it can still
> > >> implement iommu_map/unmap/etc. 
> > > I think the intention is that when a mdev is created, we don;t
> > > know whether it will be used for SVA or IOVA. So aux domain is
> > > here to "hold a spot" for the default PASID such that MAP_DMA
> > > calls can work as usual, which is second level only. Later, if
> > > SVA is used on the mdev there will be another PASID allocated for
> > > that purpose. Do we need to create an aux domain for each PASID?
> > > the translation can be looked up by the combination of parent dev
> > > and pasid.  
> > 
> > When allocating a new PASID for the guest, I suppose you need to
> > clone the second-level translation config? In which case a single
> > aux domain for the mdev might be easier to implement in the IOMMU
> > driver. Entirely up to you since we don't have this case on SMMUv3
> >   
> 
> One thing to highlight in related discussions (also mentioned in other
> thread). There is not a new iommu domain type called 'aux'. 'aux'
> matters only to a specific device when a domain is attached to that
> device which has aux capability enabled. Same domain can be attached
> to other device as normal domain. In that case multiple PASIDs
> allocated on same mdev are tied to same aux domain, same bare metal
> SVA case, i.e. any domain (normal or aux) can include 2nd level
> structure and multiple 1st level structures. Jean is correct - all
> PASIDs in same domain then share 2nd level translation, and there are
> io_mm or similar tracking structures to associate each PASID to a 1st
> level translation structure.
> 
I think we are all talking about the same thing :)
yes, 2nd level is cloned from aux domain/default PASID for mdev, and
pdev similarly from DMA_MAP domain.

> Thanks
> Kevin
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu

[Jacob Pan]

  reply	other threads:[~2018-09-20 15:52 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-30  4:09 [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 01/10] iommu: Add APIs for multiple domains per device Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 02/10] iommu/vt-d: Add multiple domains per device query Lu Baolu
2018-09-05 19:35   ` Alex Williamson
2018-09-06  0:54     ` Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 03/10] iommu/amd: Add default branch in amd_iommu_capable() Lu Baolu
2018-09-05 19:37   ` Alex Williamson
2018-09-06  0:55     ` Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 04/10] iommu/vt-d: Enable/disable multiple domains per device Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 05/10] iommu/vt-d: Attach/detach domains in auxiliary mode Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 06/10] iommu/vt-d: Return ID associated with an auxiliary domain Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 07/10] vfio/mdev: Add mediated device domain type Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 08/10] vfio/type1: Add domain at(de)taching group helpers Lu Baolu
2018-09-10 16:23   ` Jean-Philippe Brucker
2018-09-12  5:02     ` Lu Baolu
2018-09-12 17:54       ` Jean-Philippe Brucker
2018-09-13  0:35         ` Tian, Kevin
2018-09-14 14:45           ` Jean-Philippe Brucker
2018-09-15  2:36             ` Tian, Kevin
2018-09-18 15:52               ` Jean-Philippe Brucker
     [not found]                 ` <AADFC41AFE54684AB9EE6CBC0274A5D19130EAD7@SHSMSX101.ccr.corp.intel.com>
2018-09-19  2:10                   ` Lu Baolu
2018-09-25 17:55                   ` Jean-Philippe Brucker
2018-09-26  2:11                     ` Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 09/10] vfio/type1: Determine domain type of an mdev group Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 10/10] vfio/type1: Attach domain for " Lu Baolu
2018-09-05  3:01 ` [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device Tian, Kevin
2018-09-05 19:15   ` Alex Williamson
2018-09-06  1:29     ` Lu Baolu
2018-09-10 16:22 ` Jean-Philippe Brucker
2018-09-12  2:42   ` Lu Baolu
2018-09-12 17:54     ` Jean-Philippe Brucker
2018-09-13  0:19       ` Tian, Kevin
2018-09-13 15:03         ` Jean-Philippe Brucker
2018-09-13 16:55           ` Raj, Ashok
2018-09-14 14:39             ` Jean-Philippe Brucker
     [not found]           ` <AADFC41AFE54684AB9EE6CBC0274A5D191302ECE@SHSMSX101.ccr.corp.intel.com>
2018-09-14 14:40             ` Jean-Philippe Brucker
2018-09-14 21:04           ` Jacob Pan
2018-09-18 15:46             ` Jean-Philippe Brucker
2018-09-19  2:22               ` Tian, Kevin
2018-09-20 15:53                 ` Jacob Pan [this message]
2018-09-14  2:46       ` Lu Baolu
2018-09-14  2:53         ` Tian, Kevin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180920085339.2ea67e72@jacob-builder \
    --to=jacob.jun.pan@intel.com \
    --cc=alex.williamson@redhat.com \
    --cc=ashok.raj@intel.com \
    --cc=dwmw2@infradead.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jean-philippe.brucker@arm.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sanjay.k.kumar@intel.com \
    --cc=tiwei.bie@intel.com \
    --cc=yi.y.sun@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).