linux-kernel.vger.kernel.org archive mirror
From: Christoffer Dall <christoffer.dall@linaro.org>
To: Don Dutile <ddutile@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Eric Auger <eric.auger@redhat.com>,
	eric.auger.pro@gmail.com, marc.zyngier@arm.com,
	robin.murphy@arm.com, joro@8bytes.org, tglx@linutronix.de,
	jason@lakedaemon.net, linux-arm-kernel@lists.infradead.org,
	kvm@vger.kernel.org, drjones@redhat.com,
	linux-kernel@vger.kernel.org, pranav.sawargaonkar@gmail.com,
	iommu@lists.linux-foundation.org, punit.agrawal@arm.com,
	diana.craciun@nxp.com, benh@kernel.crashing.org, arnd@arndb.de,
	jcm@redhat.com, dwmw@amazon.co.uk
Subject: Re: Summary of LPC guest MSI discussion in Santa Fe
Date: Wed, 9 Nov 2016 20:23:03 +0100
Message-ID: <20161109192303.GD15676@cbox>
In-Reply-To: <582371FB.2040808@redhat.com>

On Wed, Nov 09, 2016 at 01:59:07PM -0500, Don Dutile wrote:
> On 11/09/2016 12:03 PM, Will Deacon wrote:
> >On Tue, Nov 08, 2016 at 09:52:33PM -0500, Don Dutile wrote:
> >>On 11/08/2016 06:35 PM, Alex Williamson wrote:
> >>>On Tue, 8 Nov 2016 21:29:22 +0100
> >>>Christoffer Dall <christoffer.dall@linaro.org> wrote:
> >>>>Is my understanding correct, that you need to tell userspace about the
> >>>>location of the doorbell (in the IOVA space) in case (2), because even
> >>>>though the configuration of the device is handled by the (host) kernel
> >>>>through trapping of the BARs, we have to avoid the VFIO user programming
> >>>>the device to create other DMA transactions to this particular address,
> >>>>since that will obviously conflict and either not produce the desired
> >>>>DMA transactions or result in unintended weird interrupts?
> >
> >Yes, that's the crux of the issue.
> >
> >>>Correct, if the MSI doorbell IOVA range overlaps RAM in the VM, then
> >>>it's potentially a DMA target and we'll get bogus data on DMA read from
> >>>the device, and lose data and potentially trigger spurious interrupts on
> >>>DMA write from the device.  Thanks,
> >>>
> >>That's b/c the MSI doorbells are not positioned *above* the SMMU, i.e.,
> >>they address match before the SMMU checks are done.  if
> >>all DMA addrs had to go through SMMU first, then the DMA access could
> >>be ignored/rejected.
> >
> >That's actually not true :( The SMMU can't generally distinguish between MSI
> >writes and DMA writes, so it would just see a write transaction to the
> >doorbell address, regardless of how it was generated by the endpoint.
> >
> >Will
> >
> So, we have real systems where MSI doorbells are placed at the same IOVA
> that could have memory for a guest

I don't think this is a property of a hardware system.  The problem is
userspace not knowing where in the IOVA space the kernel is going to
place the doorbell, so you can end up (basically by chance) with some
IPA range of guest memory overlapping the IOVA range of the doorbell.
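To make the "by chance" overlap concrete, here is a minimal C sketch. It assumes, purely for illustration, that the VFIO user maps all of guest RAM for DMA at IOVA == guest IPA, so a guest RAM range is also an IOVA range the device may target. struct range, ranges_overlap() and guest_ram_hits_doorbell() are hypothetical names, not kernel or QEMU APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical half-open address range [start, start + size). */
struct range {
	uint64_t start;
	uint64_t size;
};

/* True if the two ranges share at least one address. */
static bool ranges_overlap(struct range a, struct range b)
{
	if (a.size == 0 || b.size == 0)
		return false;            /* empty ranges never overlap */

	/* Ranges must not wrap past the top of the 64-bit space. */
	assert(a.size - 1 <= UINT64_MAX - a.start);
	assert(b.size - 1 <= UINT64_MAX - b.start);

	/* Compare inclusive last addresses to avoid overflow. */
	uint64_t a_last = a.start + (a.size - 1);
	uint64_t b_last = b.start + (b.size - 1);
	return a.start <= b_last && b.start <= a_last;
}

/*
 * Assumption (hypothetical): guest RAM is mapped for DMA at
 * IOVA == IPA, so a guest RAM range is also an IOVA range the
 * device may target.
 */
static bool guest_ram_hits_doorbell(struct range guest_ram_ipa,
				    struct range doorbell_iova)
{
	return ranges_overlap(guest_ram_ipa, doorbell_iova);
}
```

For example, with made-up addresses, a guest with 1 GiB of RAM at IPA 0x40000000 collides with a doorbell the host happened to place at IOVA 0x48000000, but not with one at 0x08000000. Neither side did anything wrong on its own; they simply never compared notes.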


>, but not at the same IOVA as memory on real hw ?

On real hardware without an IOMMU, the system designer would have to
keep the MSI doorbell and RAM apart in the physical address space.  With
an IOMMU, the SMMU driver just makes sure to allocate separate regions in
the IOVA space.
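A toy illustration of the single-owner case, assuming one allocator owns the whole IOVA space and knows where the doorbell window is. It is a first-fit bump allocator written only for this example, not the kernel's IOVA allocator; toy_iova_space and toy_iova_alloc() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy IOVA space owned by a single allocator (hypothetical; not the
 * kernel's IOVA allocator). Because the allocator knows where the MSI
 * doorbell window is, it simply never hands that window out.
 */
struct toy_iova_space {
	uint64_t next;        /* next candidate IOVA (bump pointer) */
	uint64_t limit;       /* end of the IOVA space, exclusive */
	uint64_t resv_start;  /* doorbell window start */
	uint64_t resv_end;    /* doorbell window end, exclusive */
};

/* Allocate size bytes of IOVA; returns false if it does not fit. */
static bool toy_iova_alloc(struct toy_iova_space *s, uint64_t size,
			   uint64_t *iova)
{
	assert(size > 0);
	uint64_t cand = s->next;

	/* Bail out early so cand + size below cannot overflow. */
	if (cand > s->limit || size > s->limit - cand)
		return false;

	/* If [cand, cand + size) touches the doorbell window, move the
	 * candidate to just above the window. */
	if (cand < s->resv_end && cand + size > s->resv_start)
		cand = s->resv_end;

	/* Re-check the fit after skipping the window. */
	if (cand > s->limit || size > s->limit - cand)
		return false;

	*iova = cand;
	s->next = cand + size;
	return true;
}
```

Because the same code both reserves the doorbell and hands out DMA ranges, no conflict can arise. In the VM case these two jobs are split between the host kernel and the VFIO user, which is where the trouble starts.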

The challenge, as I understand it, happens with the VM, because the VM
doesn't allocate the IOVA for the MSI doorbell itself, but the host
kernel does this, independently from the attributes (e.g. memory map) of
the VM.

Because the IOVA space is a single resource, but with two independent
entities allocating chunks of it (the host kernel for the MSI doorbell
IOVA, and the VFIO user for other DMA operations), you have to provide
some coordination between those two entities to avoid conflicts.  In the
case of KVM, the two entities are the host kernel and the VFIO user
(QEMU/the VM), and the host kernel has to tell the VFIO user never to use
the doorbell IOVA, which the host kernel has already reserved, for DMA.

One way to do that is to ensure that the IPA space of the VFIO user
corresponding to the doorbell IOVA is simply not valid, i.e. it is
covered by reserved regions that prevent, for example, QEMU from
allocating RAM there.
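A sketch of what "not valid in the IPA space" could look like on the VFIO user side: given the host-reserved regions (sorted, non-overlapping, half-open), lay guest RAM out in chunks that step around them. layout_guest_ram() and struct region are hypothetical; this is not QEMU's memory-map code, and how the reserved list reaches userspace (the RFC series in this thread proposes a VFIO capability for it) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open region [start, end). */
struct region {
	uint64_t start;
	uint64_t end;
};

/*
 * Place ram_size bytes of guest RAM at or above ram_base, stepping
 * around the reserved regions in resv[] (sorted by start, no overlap).
 * Writes at most max_out RAM chunks to out[] and returns the number
 * written, or 0 if the RAM could not be placed.
 */
static size_t layout_guest_ram(uint64_t ram_base, uint64_t ram_size,
			       const struct region *resv, size_t nr_resv,
			       struct region *out, size_t max_out)
{
	uint64_t cur = ram_base;
	uint64_t left = ram_size;
	size_t n = 0;

	/* Preconditions: each region is non-empty and the list is sorted. */
	for (size_t i = 0; i < nr_resv; i++) {
		assert(resv[i].start < resv[i].end);
		assert(i == 0 || resv[i - 1].end <= resv[i].start);
	}

	for (size_t i = 0; i <= nr_resv && left > 0; i++) {
		/* The free gap runs from cur up to the next reserved region,
		 * or to the top of the address space after the last one. */
		uint64_t gap_end = (i < nr_resv) ? resv[i].start : UINT64_MAX;

		if (gap_end > cur) {
			uint64_t chunk = gap_end - cur;
			if (chunk > left)
				chunk = left;
			if (n == max_out)
				return 0;        /* caller's array is too small */
			out[n].start = cur;
			out[n].end = cur + chunk;
			n++;
			left -= chunk;
			cur += chunk;
		}

		/* Never place RAM inside the reserved region itself. */
		if (i < nr_resv && cur < resv[i].end)
			cur = resv[i].end;
	}
	return left == 0 ? n : 0;
}
```

With 512 MiB of RAM at 0x40000000 and a 1 MiB reserved window at 0x48000000 (made-up values), the sketch returns two chunks, [0x40000000, 0x48000000) and [0x48100000, 0x60100000), so no guest RAM ever sits behind the doorbell IOVA.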

(I suppose it's technically possible to get around this issue by letting
QEMU place RAM wherever it wants but tell the guest to never use a
particular subset of its RAM for DMA, because that would conflict with
the doorbell IOVA or be seen as p2p transactions.  But I think we all
probably agree that it's a disgusting idea.)

> How are memory holes passed to SMMU so it doesn't have this issue for bare-metal
> (assign an IOVA that overlaps an MSI doorbell address)?
> 

As I understand it, the SMMU driver manages the whole IOVA space when
VFIO is *not* involved, so it simply allocates non-overlapping regions.

The problem occurs when you have two independent entities essentially
attempting to manage the same resource (and the problem is exacerbated by
the VM potentially allocating slots in the IOVA space which may have
other limitations it doesn't know about, for example the p2p regions,
because the VM doesn't know anything about the topology of the
underlying physical system).

Christoffer


Thread overview: 48+ messages
2016-11-03 21:39 [RFC 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 (Alt II) Eric Auger
2016-11-03 21:39 ` [RFC 1/8] vfio: fix vfio_info_cap_add/shift Eric Auger
2016-11-03 21:39 ` [RFC 2/8] iommu/iova: fix __alloc_and_insert_iova_range Eric Auger
2016-11-03 21:39 ` [RFC 3/8] iommu/dma: Allow MSI-only cookies Eric Auger
2016-11-03 21:39 ` [RFC 4/8] iommu: Add a list of iommu_reserved_region in iommu_domain Eric Auger
2016-11-03 21:39 ` [RFC 5/8] vfio/type1: Introduce RESV_IOVA_RANGE capability Eric Auger
2016-11-03 21:39 ` [RFC 6/8] iommu: Handle the list of reserved regions Eric Auger
2016-11-03 21:39 ` [RFC 7/8] iommu/vt-d: Implement add_reserved_regions callback Eric Auger
2016-11-03 21:39 ` [RFC 8/8] iommu/arm-smmu: implement " Eric Auger
2016-11-04  4:02 ` [RFC 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 (Alt II) Alex Williamson
2016-11-08  2:45   ` Summary of LPC guest MSI discussion in Santa Fe (was: Re: [RFC 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 (Alt II)) Will Deacon
2016-11-08 14:27     ` Summary of LPC guest MSI discussion in Santa Fe Auger Eric
2016-11-08 17:54       ` Will Deacon
2016-11-08 19:02         ` Don Dutile
2016-11-08 19:10           ` Will Deacon
2016-11-09  7:43           ` Auger Eric
2016-11-08 16:02     ` Don Dutile
2016-11-08 20:29     ` Summary of LPC guest MSI discussion in Santa Fe (was: Re: [RFC 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 (Alt II)) Christoffer Dall
2016-11-08 23:35       ` Alex Williamson
2016-11-09  2:52         ` Summary of LPC guest MSI discussion in Santa Fe Don Dutile
2016-11-09 17:03           ` Will Deacon
2016-11-09 18:59             ` Don Dutile
2016-11-09 19:23               ` Christoffer Dall [this message]
2016-11-09 20:01                 ` Alex Williamson
2016-11-10 14:40                   ` Joerg Roedel
2016-11-10 17:07                     ` Alex Williamson
2016-11-09 20:31                 ` Will Deacon
2016-11-09 22:17                   ` Alex Williamson
2016-11-09 22:25                     ` Will Deacon
2016-11-09 23:24                       ` Alex Williamson
2016-11-09 23:38                         ` Will Deacon
2016-11-09 23:59                           ` Alex Williamson
2016-11-10  0:14                             ` Auger Eric
2016-11-10  0:55                               ` Alex Williamson
2016-11-10  2:01                                 ` Will Deacon
2016-11-10 11:14                                   ` Auger Eric
2016-11-10 17:46                                     ` Alex Williamson
2016-11-11 11:19                                       ` Joerg Roedel
2016-11-11 15:50                                         ` Alex Williamson
2016-11-11 16:05                                           ` Alex Williamson
2016-11-14 15:19                                             ` Joerg Roedel
2016-11-11 16:25                                           ` Don Dutile
2016-11-11 16:00                                         ` Don Dutile
2016-11-10 14:52                               ` Joerg Roedel
2016-11-09 20:11               ` Robin Murphy
2016-11-10 15:18                 ` Joerg Roedel
2016-11-21  5:13     ` Jon Masters
2016-11-23 20:12       ` Don Dutile
