linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Will Deacon <will.deacon@arm.com>
To: Auger Eric <eric.auger@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	drjones@redhat.com, jason@lakedaemon.net, kvm@vger.kernel.org,
	marc.zyngier@arm.com, benh@kernel.crashing.org, joro@8bytes.org,
	punit.agrawal@arm.com, linux-kernel@vger.kernel.org,
	arnd@arndb.de, diana.craciun@nxp.com,
	iommu@lists.linux-foundation.org, pranav.sawargaonkar@gmail.com,
	ddutile@redhat.com, linux-arm-kernel@lists.infradead.org,
	jcm@redhat.com, tglx@linutronix.de, robin.murphy@arm.com,
	dwmw@amazon.co.uk, christoffer.dall@linaro.org,
	eric.auger.pro@gmail.com
Subject: Re: Summary of LPC guest MSI discussion in Santa Fe
Date: Tue, 8 Nov 2016 17:54:57 +0000	[thread overview]
Message-ID: <20161108175457.GK20591@arm.com> (raw)
In-Reply-To: <dae12190-1eb6-20a9-5740-9e5be8bb65fc@redhat.com>

On Tue, Nov 08, 2016 at 03:27:23PM +0100, Auger Eric wrote:
> On 08/11/2016 03:45, Will Deacon wrote:
> > Rather than treat these as separate problems, a better interface is to
> > tell userspace about a set of reserved regions, and have this include
> > the MSI doorbell, irrespective of whether or not it can be remapped.
> > Don suggested that we statically pick an address for the doorbell in a
> > similar way to x86, and have the kernel map it there. We could even pick
> > 0xfee00000. If it conflicts with a reserved region on the platform (due
> > to (4)), then we'd obviously have to (deterministically?) allocate it
> > somewhere else, but probably within the bottom 4G.
> 
> This is tentatively achieved now with
> [1] [RFC v2 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 - Alt II
> (http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1264506.html)

Yup, I saw that fly by. Hopefully some of the internals can be reused
with the current thinking on user ABI.

> > The next question is how to tell userspace about all of the reserved
> > regions. Initially, the idea was to extend VFIO, however Alex pointed
> > out a horrible scenario:
> > 
> >   1. QEMU spawns a VM on system 0
> >   2. VM is migrated to system 1
> >   3. QEMU attempts to passthrough a device using PCI hotplug
> > 
> > In this scenario, the guest memory map is chosen at step (1), yet there
> > is no VFIO fd available to determine the reserved regions. Furthermore,
> > the reserved regions may vary between system 0 and system 1. This pretty
> > much rules out using VFIO to determine the reserved regions.Alex suggested
> > that the SMMU driver can advertise the regions via /sys/class/iommu/. This
> > would solve part of the problem, but migration between systems with
> > different memory maps can still cause problems if the reserved regions
> > of the new system conflict with the guest memory map chosen by QEMU.
> 
> 
> OK so I understand we do not want anymore the VFIO chain capability API
> (patch 5 of above series) but we prefer a sysfs approach instead.

Right.

> I understand the sysfs approach which allows the userspace to get the
> info earlier and independently on VFIO. Keeping in mind current QEMU
> virt - which is not the only userspace - will not do much from this info
> until we bring upheavals in virt address space management. So if I am
> not wrong, at the moment the main action to be undertaken is the
> rejection of the PCI hotplug in case we detect a collision.

I don't think so; it should be up to userspace to reject the hotplug.
If userspace doesn't have support for the regions, then that's fine --
you just end up in a situation where the CPU page table maps memory
somewhere that the device can't see. In other words, you'll end up with
spurious DMA failures, but that's exactly what happens with current systems
if you passthrough an overlapping region (Robin demonstrated this on Juno).

Additionally, you can imagine some future support where you can tell the
guest not to use certain regions of its memory for DMA. In this case, you
wouldn't want to refuse the hotplug in the case of overlapping regions.

Really, I think the kernel side just needs to enumerate the fixed reserved
regions, place the doorbell at a fixed address and then advertise these
via sysfs.

> I can respin [1]
> - studying and taking into account Robin's comments about dm_regions
> similarities
> - removing the VFIO capability chain and replacing this by a sysfs API

Ideally, this would be reusable between different SMMU drivers so the sysfs
entries have the same format etc.

> Would that be OK?

Sounds good to me. Are you in a position to prototype something on the qemu
side once we've got kernel-side agreement?

> What about Alex comments who wanted to report the usable memory ranges
> instead of unusable memory ranges?
> 
> Also did you have a chance to discuss the following items:
> 1) the VFIO irq safety assessment

The discussion really focussed on system topology, as opposed to properties
of the doorbell. Regardless of how the device talks to the doorbell, if
the doorbell can't protect against things like MSI spoofing, then it's
unsafe. My opinion is that we shouldn't allow passthrough by default on
systems with unsafe doorbells (we could piggyback on allow_unsafe_interrupts
cmdline option to VFIO).

A first step would be making all this opt-in, and only supporting GICv3
ITS for now.

> 2) the MSI reserved size computation (is an arbitrary size OK?)

If we fix the base address, we could fix a size too. However, we'd still
need to enumerate the doorbells to check that they fit in the region we
have. If not, then we can warn during boot and treat it the same way as
a resource conflict (that is, reallocate the region in some deterministic
way).

Will

  reply	other threads:[~2016-11-08 17:55 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-03 21:39 [RFC 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 (Alt II) Eric Auger
2016-11-03 21:39 ` [RFC 1/8] vfio: fix vfio_info_cap_add/shift Eric Auger
2016-11-03 21:39 ` [RFC 2/8] iommu/iova: fix __alloc_and_insert_iova_range Eric Auger
2016-11-03 21:39 ` [RFC 3/8] iommu/dma: Allow MSI-only cookies Eric Auger
2016-11-03 21:39 ` [RFC 4/8] iommu: Add a list of iommu_reserved_region in iommu_domain Eric Auger
2016-11-03 21:39 ` [RFC 5/8] vfio/type1: Introduce RESV_IOVA_RANGE capability Eric Auger
2016-11-03 21:39 ` [RFC 6/8] iommu: Handle the list of reserved regions Eric Auger
2016-11-03 21:39 ` [RFC 7/8] iommu/vt-d: Implement add_reserved_regions callback Eric Auger
2016-11-03 21:39 ` [RFC 8/8] iommu/arm-smmu: implement " Eric Auger
2016-11-04  4:02 ` [RFC 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 (Alt II) Alex Williamson
2016-11-08  2:45   ` Summary of LPC guest MSI discussion in Santa Fe (was: Re: [RFC 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 (Alt II)) Will Deacon
2016-11-08 14:27     ` Summary of LPC guest MSI discussion in Santa Fe Auger Eric
2016-11-08 17:54       ` Will Deacon [this message]
2016-11-08 19:02         ` Don Dutile
2016-11-08 19:10           ` Will Deacon
2016-11-09  7:43           ` Auger Eric
2016-11-08 16:02     ` Don Dutile
2016-11-08 20:29     ` Summary of LPC guest MSI discussion in Santa Fe (was: Re: [RFC 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 (Alt II)) Christoffer Dall
2016-11-08 23:35       ` Alex Williamson
2016-11-09  2:52         ` Summary of LPC guest MSI discussion in Santa Fe Don Dutile
2016-11-09 17:03           ` Will Deacon
2016-11-09 18:59             ` Don Dutile
2016-11-09 19:23               ` Christoffer Dall
2016-11-09 20:01                 ` Alex Williamson
2016-11-10 14:40                   ` Joerg Roedel
2016-11-10 17:07                     ` Alex Williamson
2016-11-09 20:31                 ` Will Deacon
2016-11-09 22:17                   ` Alex Williamson
2016-11-09 22:25                     ` Will Deacon
2016-11-09 23:24                       ` Alex Williamson
2016-11-09 23:38                         ` Will Deacon
2016-11-09 23:59                           ` Alex Williamson
2016-11-10  0:14                             ` Auger Eric
2016-11-10  0:55                               ` Alex Williamson
2016-11-10  2:01                                 ` Will Deacon
2016-11-10 11:14                                   ` Auger Eric
2016-11-10 17:46                                     ` Alex Williamson
2016-11-11 11:19                                       ` Joerg Roedel
2016-11-11 15:50                                         ` Alex Williamson
2016-11-11 16:05                                           ` Alex Williamson
2016-11-14 15:19                                             ` Joerg Roedel
2016-11-11 16:25                                           ` Don Dutile
2016-11-11 16:00                                         ` Don Dutile
2016-11-10 14:52                               ` Joerg Roedel
2016-11-09 20:11               ` Robin Murphy
2016-11-10 15:18                 ` Joerg Roedel
2016-11-21  5:13     ` Jon Masters
2016-11-23 20:12       ` Don Dutile

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161108175457.GK20591@arm.com \
    --to=will.deacon@arm.com \
    --cc=alex.williamson@redhat.com \
    --cc=arnd@arndb.de \
    --cc=benh@kernel.crashing.org \
    --cc=christoffer.dall@linaro.org \
    --cc=ddutile@redhat.com \
    --cc=diana.craciun@nxp.com \
    --cc=drjones@redhat.com \
    --cc=dwmw@amazon.co.uk \
    --cc=eric.auger.pro@gmail.com \
    --cc=eric.auger@redhat.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jason@lakedaemon.net \
    --cc=jcm@redhat.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=marc.zyngier@arm.com \
    --cc=pranav.sawargaonkar@gmail.com \
    --cc=punit.agrawal@arm.com \
    --cc=robin.murphy@arm.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).