From: Alex Williamson <alex.williamson@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: dwmw2@infradead.org, iommu@lists.linux-foundation.org,
	qemu-devel@nongnu.org, joerg.roedel@amd.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 1/3] iommu: Introduce iommu_group
Date: Wed, 18 Apr 2012 14:07:57 -0600
Message-ID: <1334779677.3112.190.camel@bling.home>
In-Reply-To: <20120418095824.GB5387@truffala.fritz.box>

On Wed, 2012-04-18 at 19:58 +1000, David Gibson wrote:
> On Mon, Apr 02, 2012 at 03:14:40PM -0600, Alex Williamson wrote:
> > IOMMUs often do not have visibility of individual devices in the
> > system.  Due to IOMMU design, bus topology, or device quirks, we
> > can often only identify groups of devices.  Examples include
> > Intel VT-d & AMD-Vi which often have function level visibility
> > compared to POWER partitionable endpoints which have bridge level
> > granularity.
> 
> That's a significant oversimplification of the situation on POWER,
> although it doesn't really matter in this context.  On older (i.e. pre
> PCI-E) hardware, PEs have either host bridge (i.e. domain)
> granularity, or, IIUC, in some cases p2p bridge granularity, using
> special p2p bridges, since that's the only real way to do iommu
> differentiation without the PCI-E requestor IDs.  This isn't as coarse
> as it seems in practice, because the hardware is usually built with a
> bridge per physical PCI slot.
> 
> On newer PCI-E hardware, the PE granularity is basically a firmware
> decision, and can go down to function level.  I believe pHyp puts the
> granularity at the bridge level.  Our non-virtualized Linux "firmware"
> currently does put it at the function level, but Ben is thinking about
> changing that to bridge level: again, because of the hardware design,
> that isn't as coarse as it seems, and at this level we can guarantee
> isolation in hardware to a degree that's not possible at the function
> level.

Ok, thanks for the clarification.  This should support either model and
it will be up to the iommu driver to fill the groups with the right
devices.
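
As a rough illustration of the "either model" point (this is not code from
the series, and the helper name is made up), an iommu driver picking the
device whose requestor ID anchors a group might do something like:

    #include <linux/pci.h>

    /*
     * Illustrative sketch only: walk from a PCI function up to the device
     * that supplies the IOMMU-visible requestor ID for its group.  A
     * function-granularity driver (VT-d/AMD-Vi style) can usually stop at
     * the function itself; a bridge-granularity platform (POWER-style PEs)
     * keeps walking up to a bridge.
     */
    static struct pci_dev *example_group_anchor(struct pci_dev *pdev,
                                                bool bridge_granularity)
    {
            struct pci_dev *anchor = pdev;

            /*
             * Conventional PCI behind a PCIe-to-PCI bridge: DMA shows up
             * with the bridge's requestor ID, so group at the bridge.
             */
            while (!pci_is_pcie(anchor) && anchor->bus->self)
                    anchor = anchor->bus->self;

            /* Platform policy may group at the bridge even for PCIe. */
            if (bridge_granularity && anchor->bus->self)
                    anchor = anchor->bus->self;

            return anchor;
    }

However the walk ends up looking, the group code itself doesn't make that
decision; the iommu driver does.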

> >  PCIe-to-PCI bridges also often cloud the IOMMU
> > visibility as the IOMMU cannot distinguish devices behind the bridge.
> > Devices can also sometimes hurt themselves by initiating DMA using
> > the wrong source ID on a multifunction PCI device.
> > 
> > IOMMU groups are meant to help solve these problems and hopefully
> > become the working unit of the IOMMU API.
> 
> So far, so simple.  No objections here.  I am trying to work out what
> the real difference in approach is between this series and either
> your or my earlier isolation group series.  AFAICT it's just that this
> approach is explicitly only about IOMMU identity, ignoring (here) any
> other factors which might affect isolation.  Or am I missing
> something?

Yes, they are very similar and actually also similar to how VFIO manages
groups.  It's easy to start some kind of group structure; the hard part
is in the details, particularly where to stop.  My attempt to figure
out where isolation groups stop went quite poorly, ending up with a
ridiculously complicated mess of hierarchical groups that got worse as I
tried to fill in the gaps.

With iommu groups I try to take a step back and simplify.  I initially
had a goal of describing only the minimum iommu granularity sets; this
is where the dma_dev idea came from.  But the iommu granularity doesn't
really guarantee all that much, nor does it give the iommu driver any
ability to add additional policies that would be useful for userspace
drivers (e.g. multi-function device and peer-to-peer isolation).  So
again I've had to allow that a group might contain multiple visible
requestor IDs.  This time, though, I'm trying to disallow hierarchies,
which means that even kernel (dma_ops) usage of groups is restricted
to a single, platform-defined level of isolation.  I'm also trying to
stay out of the business of providing a group management interface; I
only want to describe groups.  Things like stopping driver probes
should be device-level problems.  In effect, this level should not be
providing enforcement and ownership; something like VFIO will do that.
So the
differences are subtle, but important.  Thanks,

Alex
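
As a rough sketch of the "describe only, don't manage" split above (the
structure and function names here are illustrative, not necessarily what
the RFC patch uses):

    #include <linux/device.h>
    #include <linux/kobject.h>
    #include <linux/list.h>
    #include <linux/mutex.h>

    /*
     * Illustrative only: a group is just a named set of devices that the
     * iommu driver declares, at a single platform-defined level.  No
     * hierarchy, no ownership, no enforcement; something like VFIO would
     * layer that on top by claiming every device in the group.
     */
    struct example_iommu_group {
            struct kobject kobj;        /* exposed to userspace via sysfs */
            int id;                     /* stable group number */
            struct list_head devices;   /* membership list of devices */
            struct mutex mutex;         /* protects the membership list */
    };

    /* The iommu driver fills groups; everyone else can only look them up. */
    int example_iommu_group_add_device(struct example_iommu_group *group,
                                       struct device *dev);
    struct example_iommu_group *example_iommu_group_get(struct device *dev);

Nothing in a sketch like this stops driver probes or grants ownership; it
only answers "which devices share isolation with this one".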



Thread overview: 9+ messages
2012-04-02 21:14 [RFC PATCH 0/3] IOMMU groups Alex Williamson
2012-04-02 21:14 ` [RFC PATCH 1/3] iommu: Introduce iommu_group Alex Williamson
2012-04-18  9:58   ` David Gibson
2012-04-18 20:07     ` Alex Williamson [this message]
2012-04-02 21:14 ` [RFC PATCH 2/3] iommu: Create basic group infrastructure and update AMD-Vi & Intel VT-d Alex Williamson
2012-04-18 11:55   ` David Gibson
2012-04-18 20:28     ` Alex Williamson
2012-04-02 21:14 ` [RFC PATCH 3/3] iommu: Create attach/detach group interface Alex Williamson
2012-04-10 21:03 ` [RFC PATCH 0/3] IOMMU groups Alex Williamson
