All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Raj, Ashok" <ashok.raj@intel.com>
To: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	Joerg Roedel <joro@8bytes.org>,
	David Woodhouse <dwmw2@infradead.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	Kirti Wankhede <kwankhede@nvidia.com>,
	"Bie, Tiwei" <tiwei.bie@intel.com>,
	"Kumar, Sanjay K" <sanjay.k.kumar@intel.com>,
	"iommu@lists.linux-foundation.org"
	<iommu@lists.linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Sun, Yi Y" <yi.y.sun@intel.com>,
	"Pan, Jacob jun" <jacob.jun.pan@intel.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Ashok Raj <ashok.raj@intel.com>
Subject: Re: [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device
Date: Thu, 13 Sep 2018 09:55:21 -0700	[thread overview]
Message-ID: <20180913165520.GA14731@otc-nc-03> (raw)
In-Reply-To: <03d496b0-84c2-b3ca-5be5-d4540c6d8ec7@arm.com>

On Thu, Sep 13, 2018 at 04:03:01PM +0100, Jean-Philippe Brucker wrote:
> On 13/09/2018 01:19, Tian, Kevin wrote:
> >>> This is proposed for architectures which support finer granularity
> >>> second level translation with no impact on architectures which only
> >>> support Source ID or the similar granularity.
> >>
> >> Just to be clear, in this paragraph you're only referring to the
> >> Nested/second-level translation for mdev, which is specific to vt-d
> >> rev3? Other architectures can still do first-level translation with
> >> PASID, to support some use-cases of IOMMU aware mediated device
> >> (assigning mdevs to userspace drivers, for example)
> > 
> > yes. aux domain concept applies only to vt-d rev3 which introduces
> > scalable mode. Care is taken to avoid breaking usages on existing
> > architectures.
> > 
> > one note. Assigning mdevs to user space alone doesn't imply IOMMU
> > aware. All existing mdev usages use software or proprietary methods to
> > isolate DMA. There is only one potential IOMMU aware mdev usage 
> > which we talked not rely on vt-d rev3 scalable mode - wrap a random 
> > PCI device into a single mdev instance (no sharing). In that case mdev 
> > inherits RID from parent PCI device, thus is isolated by IOMMU in RID 
> > granular. Our RFC supports this usage too. In VFIO two usages (PASID-
> > based and RID-based) use same code path, i.e. always binding domain to
> > the parent device of mdev. But within IOMMU they go different paths.
> > PASID-based will go to aux-domain as iommu_enable_aux_domain
> > has been called on that device. RID-based will follow existing 
> > unmanaged domain path, as if it is parent device assignment.
> 
> For Arm SMMU we're more interested in the PASID-granular case than the
> RID-granular one. It doesn't necessarily require vt-d rev3 scalable
> mode, the following example can be implemented with an SMMUv3, since it
> only needs PASID-granular first-level translation:

You are right, you can simply use the first level as IOVA for every PASID.

Only issue becomes when you need to assign that to a guest, you would be required
to shadow the 1st level. If you have a 2nd level per-pasid first level can
be managed in guest and don't require to shadow them.

> 
> We have a PCI function that supports PASID, and can be partitioned into
> multiple isolated entities, mdevs. Each mdev has an MMIO frame, an MSI
> vector and a PASID.
> 
> Different processes (userspace drivers, not QEMU) each open one mdev. A
> process controlling one mdev has two ways of doing DMA:
> 
> (1) Classically, the process uses a VFIO_TYPE1v2_IOMMU container. This
> creates an auxiliary domain for the mdev, with PASID #35. The process
> creates DMA mappings with VFIO_IOMMU_MAP_DMA. VFIO calls iommu_map on
> the auxiliary domain. The IOMMU driver populates the pgtables associated
> with PASID #35.
> 
> (2) SVA. One way of doing it: the process uses a new
> "VFIO_TYPE1_SVA_IOMMU" type of container. VFIO binds the process address
> space to the device, gets PASID #35. Simpler, but not everyone wants to
> use SVA, especially not userspace drivers which need the highest
> performance.
> 
> 
> This example only needs to modify first-level translation, and works
> with SMMUv3. The kernel here could be the host, in which case
> second-level translation is disabled in the SMMU, or it could be the
> guest, in which case second-level mappings are created by QEMU and
> first-level translation is managed by assigning PASID tables to the guest.
> 
> So (2) would use iommu_sva_bind_device(), but (1) needs something else.
> Aren't auxiliary domains suitable for (1)? Why limit auxiliary domain to
> second-level or nested translation? It seems silly to use a different
> API for first-level, since the flow in userspace and VFIO is the same as
> your second-level case as far as MAP_DMA ioctl goes. The difference is
> that in your case the auxiliary domain supports an additional operation
> which binds first-level page tables. An auxiliary domain that only
> supports first-level wouldn't support this operation, but it can still
> implement iommu_map/unmap/etc.
> 
> 
> Another note: if for some reason you did want to allow userspace to
> choose between first-level or second-level, you could implement the
> VFIO_TYPE1_NESTING_IOMMU container. It acts like a VFIO_TYPE1v2_IOMMU,
> but also sets the DOMAIN_ATTR_NESTING on the IOMMU domain. So DMA_MAP
> ioctl on a NESTING container would populate second-level, and DMA_MAP on
> a normal container populates first-level. But if you're always going to
> use second-level by default, the distinction isn't necessary.

Where is the nesting attribute specified? in vt-d2 it was part of context
entry, so also meant all PASID's are nested now. In vt-d3 its part of
PASID context.

It seems unsafe to share PASID's with different VM's since any request 
W/O PASID has only one mapping.

> 
> 
> >> Sounds good, I'll drop the private PASID patch if we can figure out a
> >> solution to the attach/detach_dev problem discussed on patch 8/10
> >>
> > 
> > Can you elaborate a bit on private PASID usage? what is the
> > high level flow on it? 
> > 
> > Again based on earlier explanation, aux domain is specific to IOMMU
> > architecture supporting vtd scalable mode-like capability, which allows
> > separate 2nd/1st level translations per PASID. Need a better understanding
> > how private PASID is relevant here.
> 
> Private PASIDs are used for doing iommu_map/iommu_unmap on PASIDs
> (first-level translation):
> https://www.spinics.net/lists/dri-devel/msg177003.html As above, some
> people don't want SVA, some can't do it, some may even want a few
> private address spaces just for their kernel driver. They need a way to
> allocate PASIDs and do iommu_map/iommu_unmap on them, without binding to
> a process. I was planning to add the private PASID patch to my SVA
> series, but in my opinion the feature overlaps with auxiliary domains.

It sounds like it maps to AUX domains.

WARNING: multiple messages have this Message-ID (diff)
From: "Raj, Ashok" <ashok.raj-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
To: Jean-Philippe Brucker
	<jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
Cc: "Tian,
	Kevin" <kevin.tian-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	"kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"Bie, Tiwei" <tiwei.bie-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	"Kumar,
	Sanjay K"
	<sanjay.k.kumar-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	Kirti Wankhede
	<kwankhede-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>,
	"iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org"
	<iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>,
	"linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Alex Williamson
	<alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	"Pan,
	Jacob jun"
	<jacob.jun.pan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	Ashok Raj <ashok.raj-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	David Woodhouse <dwmw2-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
	"Sun, Yi Y" <yi.y.sun-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Subject: Re: [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device
Date: Thu, 13 Sep 2018 09:55:21 -0700	[thread overview]
Message-ID: <20180913165520.GA14731@otc-nc-03> (raw)
In-Reply-To: <03d496b0-84c2-b3ca-5be5-d4540c6d8ec7-5wv7dgnIgG8@public.gmane.org>

On Thu, Sep 13, 2018 at 04:03:01PM +0100, Jean-Philippe Brucker wrote:
> On 13/09/2018 01:19, Tian, Kevin wrote:
> >>> This is proposed for architectures which support finer granularity
> >>> second level translation with no impact on architectures which only
> >>> support Source ID or the similar granularity.
> >>
> >> Just to be clear, in this paragraph you're only referring to the
> >> Nested/second-level translation for mdev, which is specific to vt-d
> >> rev3? Other architectures can still do first-level translation with
> >> PASID, to support some use-cases of IOMMU aware mediated device
> >> (assigning mdevs to userspace drivers, for example)
> > 
> > yes. aux domain concept applies only to vt-d rev3 which introduces
> > scalable mode. Care is taken to avoid breaking usages on existing
> > architectures.
> > 
> > one note. Assigning mdevs to user space alone doesn't imply IOMMU
> > aware. All existing mdev usages use software or proprietary methods to
> > isolate DMA. There is only one potential IOMMU aware mdev usage 
> > which we talked not rely on vt-d rev3 scalable mode - wrap a random 
> > PCI device into a single mdev instance (no sharing). In that case mdev 
> > inherits RID from parent PCI device, thus is isolated by IOMMU in RID 
> > granular. Our RFC supports this usage too. In VFIO two usages (PASID-
> > based and RID-based) use same code path, i.e. always binding domain to
> > the parent device of mdev. But within IOMMU they go different paths.
> > PASID-based will go to aux-domain as iommu_enable_aux_domain
> > has been called on that device. RID-based will follow existing 
> > unmanaged domain path, as if it is parent device assignment.
> 
> For Arm SMMU we're more interested in the PASID-granular case than the
> RID-granular one. It doesn't necessarily require vt-d rev3 scalable
> mode, the following example can be implemented with an SMMUv3, since it
> only needs PASID-granular first-level translation:

You are right, you can simply use the first level as IOVA for every PASID.

Only issue becomes when you need to assign that to a guest, you would be required
to shadow the 1st level. If you have a 2nd level per-pasid first level can
be managed in guest and don't require to shadow them.

> 
> We have a PCI function that supports PASID, and can be partitioned into
> multiple isolated entities, mdevs. Each mdev has an MMIO frame, an MSI
> vector and a PASID.
> 
> Different processes (userspace drivers, not QEMU) each open one mdev. A
> process controlling one mdev has two ways of doing DMA:
> 
> (1) Classically, the process uses a VFIO_TYPE1v2_IOMMU container. This
> creates an auxiliary domain for the mdev, with PASID #35. The process
> creates DMA mappings with VFIO_IOMMU_MAP_DMA. VFIO calls iommu_map on
> the auxiliary domain. The IOMMU driver populates the pgtables associated
> with PASID #35.
> 
> (2) SVA. One way of doing it: the process uses a new
> "VFIO_TYPE1_SVA_IOMMU" type of container. VFIO binds the process address
> space to the device, gets PASID #35. Simpler, but not everyone wants to
> use SVA, especially not userspace drivers which need the highest
> performance.
> 
> 
> This example only needs to modify first-level translation, and works
> with SMMUv3. The kernel here could be the host, in which case
> second-level translation is disabled in the SMMU, or it could be the
> guest, in which case second-level mappings are created by QEMU and
> first-level translation is managed by assigning PASID tables to the guest.
> 
> So (2) would use iommu_sva_bind_device(), but (1) needs something else.
> Aren't auxiliary domains suitable for (1)? Why limit auxiliary domain to
> second-level or nested translation? It seems silly to use a different
> API for first-level, since the flow in userspace and VFIO is the same as
> your second-level case as far as MAP_DMA ioctl goes. The difference is
> that in your case the auxiliary domain supports an additional operation
> which binds first-level page tables. An auxiliary domain that only
> supports first-level wouldn't support this operation, but it can still
> implement iommu_map/unmap/etc.
> 
> 
> Another note: if for some reason you did want to allow userspace to
> choose between first-level or second-level, you could implement the
> VFIO_TYPE1_NESTING_IOMMU container. It acts like a VFIO_TYPE1v2_IOMMU,
> but also sets the DOMAIN_ATTR_NESTING on the IOMMU domain. So DMA_MAP
> ioctl on a NESTING container would populate second-level, and DMA_MAP on
> a normal container populates first-level. But if you're always going to
> use second-level by default, the distinction isn't necessary.

Where is the nesting attribute specified? in vt-d2 it was part of context
entry, so also meant all PASID's are nested now. In vt-d3 its part of
PASID context.

It seems unsafe to share PASID's with different VM's since any request 
W/O PASID has only one mapping.

> 
> 
> >> Sounds good, I'll drop the private PASID patch if we can figure out a
> >> solution to the attach/detach_dev problem discussed on patch 8/10
> >>
> > 
> > Can you elaborate a bit on private PASID usage? what is the
> > high level flow on it? 
> > 
> > Again based on earlier explanation, aux domain is specific to IOMMU
> > architecture supporting vtd scalable mode-like capability, which allows
> > separate 2nd/1st level translations per PASID. Need a better understanding
> > how private PASID is relevant here.
> 
> Private PASIDs are used for doing iommu_map/iommu_unmap on PASIDs
> (first-level translation):
> https://www.spinics.net/lists/dri-devel/msg177003.html As above, some
> people don't want SVA, some can't do it, some may even want a few
> private address spaces just for their kernel driver. They need a way to
> allocate PASIDs and do iommu_map/iommu_unmap on them, without binding to
> a process. I was planning to add the private PASID patch to my SVA
> series, but in my opinion the feature overlaps with auxiliary domains.

It sounds like it maps to AUX domains.

  reply	other threads:[~2018-09-13 17:35 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-30  4:09 [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 01/10] iommu: Add APIs for multiple domains per device Lu Baolu
2018-08-30  4:09   ` Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 02/10] iommu/vt-d: Add multiple domains per device query Lu Baolu
2018-08-30  4:09   ` Lu Baolu
2018-09-05 19:35   ` Alex Williamson
2018-09-06  0:54     ` Lu Baolu
2018-09-06  0:54       ` Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 03/10] iommu/amd: Add default branch in amd_iommu_capable() Lu Baolu
2018-08-30  4:09   ` Lu Baolu
2018-09-05 19:37   ` Alex Williamson
2018-09-05 19:37     ` Alex Williamson
2018-09-06  0:55     ` Lu Baolu
2018-09-06  0:55       ` Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 04/10] iommu/vt-d: Enable/disable multiple domains per device Lu Baolu
2018-08-30  4:09   ` Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 05/10] iommu/vt-d: Attach/detach domains in auxiliary mode Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 06/10] iommu/vt-d: Return ID associated with an auxiliary domain Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 07/10] vfio/mdev: Add mediated device domain type Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 08/10] vfio/type1: Add domain at(de)taching group helpers Lu Baolu
2018-09-10 16:23   ` Jean-Philippe Brucker
2018-09-10 16:23     ` Jean-Philippe Brucker
2018-09-12  5:02     ` Lu Baolu
2018-09-12 17:54       ` Jean-Philippe Brucker
2018-09-12 17:54         ` Jean-Philippe Brucker
2018-09-13  0:35         ` Tian, Kevin
2018-09-14 14:45           ` Jean-Philippe Brucker
2018-09-15  2:36             ` Tian, Kevin
2018-09-18 15:52               ` Jean-Philippe Brucker
     [not found]                 ` <f46bcf5f-002b-4269-a750-c4254c9fc89f-5wv7dgnIgG8@public.gmane.org>
2018-09-18 23:26                   ` Tian, Kevin
2018-09-19  2:10                     ` Lu Baolu
2018-09-25 17:55                     ` Jean-Philippe Brucker
2018-09-26  2:11                       ` Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 09/10] vfio/type1: Determine domain type of an mdev group Lu Baolu
2018-08-30  4:09 ` [RFC PATCH v2 10/10] vfio/type1: Attach domain for " Lu Baolu
2018-09-05  3:01 ` [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device Tian, Kevin
2018-09-05  3:01   ` Tian, Kevin
2018-09-05 19:15   ` Alex Williamson
2018-09-06  1:29     ` Lu Baolu
2018-09-10 16:22 ` Jean-Philippe Brucker
2018-09-12  2:42   ` Lu Baolu
2018-09-12 17:54     ` Jean-Philippe Brucker
2018-09-13  0:19       ` Tian, Kevin
2018-09-13  0:19         ` Tian, Kevin
2018-09-13 15:03         ` Jean-Philippe Brucker
2018-09-13 16:55           ` Raj, Ashok [this message]
2018-09-13 16:55             ` Raj, Ashok
2018-09-14 14:39             ` Jean-Philippe Brucker
     [not found]           ` <03d496b0-84c2-b3ca-5be5-d4540c6d8ec7-5wv7dgnIgG8@public.gmane.org>
2018-09-14  0:39             ` Tian, Kevin
2018-09-14 14:40               ` Jean-Philippe Brucker
2018-09-14 14:40                 ` Jean-Philippe Brucker
2018-09-14 21:04           ` Jacob Pan
2018-09-18 15:46             ` Jean-Philippe Brucker
2018-09-19  2:22               ` Tian, Kevin
2018-09-20 15:53                 ` Jacob Pan
2018-09-14  2:46       ` Lu Baolu
2018-09-14  2:53         ` Tian, Kevin
2018-09-14  2:53           ` Tian, Kevin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180913165520.GA14731@otc-nc-03 \
    --to=ashok.raj@intel.com \
    --cc=alex.williamson@redhat.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=dwmw2@infradead.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jacob.jun.pan@intel.com \
    --cc=jean-philippe.brucker@arm.com \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sanjay.k.kumar@intel.com \
    --cc=tiwei.bie@intel.com \
    --cc=yi.y.sun@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.