linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Tian, Kevin" <kevin.tian@intel.com>
To: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	Joerg Roedel <joro@8bytes.org>,
	David Woodhouse <dwmw2@infradead.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	"Kirti Wankhede" <kwankhede@nvidia.com>
Cc: "Raj, Ashok" <ashok.raj@intel.com>,
	"Bie, Tiwei" <tiwei.bie@intel.com>,
	"Kumar, Sanjay K" <sanjay.k.kumar@intel.com>,
	"iommu@lists.linux-foundation.org"
	<iommu@lists.linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Sun, Yi Y" <yi.y.sun@intel.com>,
	"Pan, Jacob jun" <jacob.jun.pan@intel.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: RE: [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device
Date: Thu, 13 Sep 2018 00:19:44 +0000	[thread overview]
Message-ID: <AADFC41AFE54684AB9EE6CBC0274A5D191300D06@SHSMSX101.ccr.corp.intel.com> (raw)
In-Reply-To: <3602f8c1-df17-4894-1bcc-4d779f9aa7fd@arm.com>

> From: Jean-Philippe Brucker
> Sent: Thursday, September 13, 2018 1:54 AM
> 
> On 12/09/2018 03:42, Lu Baolu wrote:
> > Hi,
> >
> > On 09/11/2018 12:22 AM, Jean-Philippe Brucker wrote:
> >> Hi,
> >>
> >> On 30/08/2018 05:09, Lu Baolu wrote:
> >>> Below APIs are introduced in the IOMMU glue for device drivers to use
> >>> the finer granularity translation.
> >>>
> >>> * iommu_capable(IOMMU_CAP_AUX_DOMAIN)
> >>>    - Represents the ability for supporting multiple domains per device
> >>>      (a.k.a. finer granularity translations) of the IOMMU hardware.
> >>
> >> iommu_capable() cannot represent hardware capabilities, we need
> >> something else for systems with multiple IOMMUs that have different
> >> caps. How about iommu_domain_get_attr on the device's domain
> instead?
> >
> > Domain is not a good choice for per iommu cap query. A domain might be
> > attached to devices belonging to different iommu's.
> >
> > How about an API with device structure as parameter? A device always
> > belongs to a specific iommu. This API is supposed to be used the
> > device driver.
> 
> Ah right, domain attributes won't work. Your suggestion seems more
> suitable, but maybe users can simply try to enable auxiliary domains
> first, and conclude that the IOMMU doesn't support it if it returns an error
> 
> >>> * iommu_en(dis)able_aux_domain(struct device *dev)
> >>>    - Enable/disable the multiple domains capability for a device
> >>>      referenced by @dev.
> 
> It strikes me now that in the IOMMU driver,
> iommu_enable/disable_aux_domain() will do the same thing as
> iommu_sva_device_init/shutdown()
> (https://www.spinics.net/lists/arm-kernel/msg651896.html). Some IOMMU
> drivers want to enable PASID and allocate PASID tables only when
> requested by users, in the sva_init_device IOMMU op (see Joerg's comment
> last year https://patchwork.kernel.org/patch/9989307/#21025429). Maybe
> we could simply add a flag to iommu_sva_device_init?

We could combine, but definitely 'sva' should be removed :-)

> 
> >>> * iommu_auxiliary_id(struct iommu_domain *domain)
> >>>    - Return the index value used for finer-granularity DMA translation.
> >>>      The specific device driver needs to feed the hardware with this
> >>>      value, so that hardware device could issue the DMA transaction with
> >>>      this value tagged.
> >>
> >> This could also reuse iommu_domain_get_attr.
> >>
> >>
> >> More generally I'm having trouble understanding how auxiliary domains
> >> will be used. So VFIO allocates PASIDs like this:
> >
> > As I wrote in the cover letter, "auxiliary domain" is just a name to
> > ease discussion. It's actually has no special meaning (we think a domain
> > as an isolation boundary which could be used by the IOMMU to isolate
> > the DMA transactions out of a PCI device or partial of it).
> >
> > So drivers like vfio should see no difference when use an auxiliary
> > domain. The auxiliary domain is not aware out of iommu driver.
> 
> For an auxiliary domain, VFIO does need to retrieve the PASID and write
> it to hardware. But being able to reuse
> iommu_map/unmap/iova_to_phys/etc
> on the auxiliary domain is nice.
> 
> >> * iommu_enable_aux_domain(parent_dev)
> >> * iommu_domain_alloc() -> dom1
> >> * iommu_domain_alloc() -> dom2
> >> * iommu_attach_device(dom1, parent_dev)
> >>    -> dom1 gets PASID #1
> >> * iommu_attach_device(dom2, parent_dev)
> >>    -> dom2 gets PASID #2
> >>
> >> Then I'm not sure about the next steps, when userspace does
> >> VFIO_IOMMU_MAP_DMA or VFIO_IOMMU_BIND on an mdev's
> container. Is the
> >> following use accurate?
> >>
> >> For the single translation level:
> >> * iommu_map(dom1, ...) updates first-level/second-level pgtables for
> >> PASID #1
> >> * iommu_map(dom2, ...) updates first-level/second-level pgtables for
> >> PASID #2
> >>
> >> Nested translation:
> >> * iommu_map(dom1, ...) updates second-level pgtables for PASID #1
> >> * iommu_bind_table(dom1, ...) binds first-level pgtables, provided by
> >> the guest, for PASID #1
> >> * iommu_map(dom2, ...) updates second-level pgtables for PASID #2
> >> * iommu_bind_table(dom2, ...) binds first-level pgtables for PASID #2
> >>>
> >> I'm trying to understand how to implement this with SMMU and other
> >
> > This is proposed for architectures which support finer granularity
> > second level translation with no impact on architectures which only
> > support Source ID or the similar granularity.
> 
> Just to be clear, in this paragraph you're only referring to the
> Nested/second-level translation for mdev, which is specific to vt-d
> rev3? Other architectures can still do first-level translation with
> PASID, to support some use-cases of IOMMU aware mediated device
> (assigning mdevs to userspace drivers, for example)

yes. aux domain concept applies only to vt-d rev3 which introduces
scalable mode. Care is taken to avoid breaking usages on existing
architectures.

one note. Assigning mdevs to user space alone doesn't imply IOMMU
aware. All existing mdev usages use software or proprietary methods to
isolate DMA. There is only one potential IOMMU aware mdev usage 
which we talked not rely on vt-d rev3 scalable mode - wrap a random 
PCI device into a single mdev instance (no sharing). In that case mdev 
inherits RID from parent PCI device, thus is isolated by IOMMU in RID 
granular. Our RFC supports this usage too. In VFIO two usages (PASID-
based and RID-based) use same code path, i.e. always binding domain to
the parent device of mdev. But within IOMMU they go different paths.
PASID-based will go to aux-domain as iommu_enable_aux_domain
has been called on that device. RID-based will follow existing 
unmanaged domain path, as if it is parent device assignment.

> 
> >> IOMMUs. It's not a clean fit since we have a single domain to hold the
> >> second-level pgtables.
> >
> > Do you mind explaining why a domain holds multiple second-level
> > pgtables? Shouldn't that be multiple domains?
> 
> I didn't mean a single domain holding multiple second-level pgtables,
> but a single domain holding a single set of second-level pgtables for
> all mdevs. But let's ignore that, mdev and second-level isn't realistic
> for arm SMMU.

yes. single second-level doesn't allow multiple mdevs (each mdev
assigned to different user process or VM). that is why vt-d rev3
introduces scalable mode. :-)

> 
> >> Then again, the nested case probably doesn't
> >> matter for us - we might as well assign the parent directly, since all
> >> mdevs have the same second-level and can only be assigned to the same
> VM.
> >>
> >>
> >> Also, can non-VFIO device drivers use auxiliary domains to do
> map/unmap
> >> on PASIDs? They are asking to do that and I'm proposing the private
> >> PASID thing, but since aux domains provide a similar feature we should
> >> probably converge somehow.
> >
> > Yes, any non-VFIO device driver could use aux domain as well. The use
> > model is:
> >
> > iommu_enable_aux_domain(dev)
> > -- enables aux domain support for this device
> >
> > iommu_domain_alloc(dev)
> > -- allocate an iommu domain
> >
> > iommu_attach_device(domain, dev)
> > -- attach the domain to device
> >
> > iommu_auxiliary_id(domain)
> > -- retrieve the pasid id used by this domain
> >
> > The device driver then
> >
> > iommu_map(domain, ...)
> >
> > set the pasid id to hardware register and start to do dma.
> 
> Sounds good, I'll drop the private PASID patch if we can figure out a
> solution to the attach/detach_dev problem discussed on patch 8/10
> 

Can you elaborate a bit on private PASID usage? what is the
high level flow on it? 

Again based on earlier explanation, aux domain is specific to IOMMU
architecture supporting vtd scalable mode-like capability, which allows
separate 2nd/1st level translations per PASID. Need a better understanding
how private PASID is relevant here.

Thanks
Kevin

  reply	other threads:[~2018-09-13  0:22 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-30  4:09 [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 01/10] iommu: Add APIs for multiple domains per device Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 02/10] iommu/vt-d: Add multiple domains per device query Lu Baolu
2018-09-05 19:35   ` Alex Williamson
2018-09-06  0:54     ` Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 03/10] iommu/amd: Add default branch in amd_iommu_capable() Lu Baolu
2018-09-05 19:37   ` Alex Williamson
2018-09-06  0:55     ` Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 04/10] iommu/vt-d: Enable/disable multiple domains per device Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 05/10] iommu/vt-d: Attach/detach domains in auxiliary mode Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 06/10] iommu/vt-d: Return ID associated with an auxiliary domain Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 07/10] vfio/mdev: Add mediated device domain type Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 08/10] vfio/type1: Add domain at(de)taching group helpers Lu Baolu
2018-09-10 16:23   ` Jean-Philippe Brucker
2018-09-12  5:02     ` Lu Baolu
2018-09-12 17:54       ` Jean-Philippe Brucker
2018-09-13  0:35         ` Tian, Kevin
2018-09-14 14:45           ` Jean-Philippe Brucker
2018-09-15  2:36             ` Tian, Kevin
2018-09-18 15:52               ` Jean-Philippe Brucker
     [not found]                 ` <AADFC41AFE54684AB9EE6CBC0274A5D19130EAD7@SHSMSX101.ccr.corp.intel.com>
2018-09-19  2:10                   ` Lu Baolu
2018-09-25 17:55                   ` Jean-Philippe Brucker
2018-09-26  2:11                     ` Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 09/10] vfio/type1: Determine domain type of an mdev group Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 10/10] vfio/type1: Attach domain for " Lu Baolu
2018-09-05  3:01 ` [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device Tian, Kevin
2018-09-05 19:15   ` Alex Williamson
2018-09-06  1:29     ` Lu Baolu
2018-09-10 16:22 ` Jean-Philippe Brucker
2018-09-12  2:42   ` Lu Baolu
2018-09-12 17:54     ` Jean-Philippe Brucker
2018-09-13  0:19       ` Tian, Kevin [this message]
2018-09-13 15:03         ` Jean-Philippe Brucker
2018-09-13 16:55           ` Raj, Ashok
2018-09-14 14:39             ` Jean-Philippe Brucker
     [not found]           ` <AADFC41AFE54684AB9EE6CBC0274A5D191302ECE@SHSMSX101.ccr.corp.intel.com>
2018-09-14 14:40             ` Jean-Philippe Brucker
2018-09-14 21:04           ` Jacob Pan
2018-09-18 15:46             ` Jean-Philippe Brucker
2018-09-19  2:22               ` Tian, Kevin
2018-09-20 15:53                 ` Jacob Pan
2018-09-14  2:46       ` Lu Baolu
2018-09-14  2:53         ` Tian, Kevin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AADFC41AFE54684AB9EE6CBC0274A5D191300D06@SHSMSX101.ccr.corp.intel.com \
    --to=kevin.tian@intel.com \
    --cc=alex.williamson@redhat.com \
    --cc=ashok.raj@intel.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=dwmw2@infradead.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jacob.jun.pan@intel.com \
    --cc=jean-philippe.brucker@arm.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sanjay.k.kumar@intel.com \
    --cc=tiwei.bie@intel.com \
    --cc=yi.y.sun@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).