From: Auger Eric <eric.auger@redhat.com>
To: Don Dutile <ddutile@redhat.com>, Will Deacon <will.deacon@arm.com>
Cc: drjones@redhat.com, christoffer.dall@linaro.org,
jason@lakedaemon.net, kvm@vger.kernel.org, marc.zyngier@arm.com,
benh@kernel.crashing.org, joro@8bytes.org, punit.agrawal@arm.com,
linux-kernel@vger.kernel.org, iommu@lists.linux-foundation.org,
diana.craciun@nxp.com,
Alex Williamson <alex.williamson@redhat.com>,
pranav.sawargaonkar@gmail.com, arnd@arndb.de, dwmw@amazon.co.uk,
jcm@redhat.com, tglx@linutronix.de, robin.murphy@arm.com,
linux-arm-kernel@lists.infradead.org, eric.auger.pro@gmail.com
Subject: Re: Summary of LPC guest MSI discussion in Santa Fe
Date: Wed, 9 Nov 2016 08:43:12 +0100
Message-ID: <001659cd-5806-7729-8cea-dd6982010c9f@redhat.com>
In-Reply-To: <5822214F.2070500@redhat.com>
Hi Will,
On 08/11/2016 20:02, Don Dutile wrote:
> On 11/08/2016 12:54 PM, Will Deacon wrote:
>> On Tue, Nov 08, 2016 at 03:27:23PM +0100, Auger Eric wrote:
>>> On 08/11/2016 03:45, Will Deacon wrote:
>>>> Rather than treat these as separate problems, a better interface is to
>>>> tell userspace about a set of reserved regions, and have this include
>>>> the MSI doorbell, irrespective of whether or not it can be remapped.
>>>> Don suggested that we statically pick an address for the doorbell in a
>>>> similar way to x86, and have the kernel map it there. We could even pick
>>>> 0xfee00000. If it conflicts with a reserved region on the platform (due
>>>> to (4)), then we'd obviously have to (deterministically?) allocate it
>>>> somewhere else, but probably within the bottom 4G.
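To make the placement rule concrete, here is a minimal sketch of one possible deterministic fallback, written as userspace-testable C. The struct resv_region type, the 1MB window size and the "try the default, then scan down, then scan up, staying below 4G" policy are all assumptions for illustration; nothing in this thread fixes them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reserved IOVA range [start, end], end inclusive. */
struct resv_region {
	uint64_t start;
	uint64_t end;
};

#define MSI_DOORBELL_DEFAULT	0xfee00000ULL	/* x86-style default */
#define MSI_WINDOW_SIZE		0x100000ULL	/* assumed 1MB window */
#define LOW_4G			0x100000000ULL

/* True if [base, base + size) intersects the reserved range r. */
static bool window_overlaps(uint64_t base, uint64_t size,
			    const struct resv_region *r)
{
	return base <= r->end && r->start <= base + size - 1;
}

static bool window_is_free(uint64_t base, const struct resv_region *regions,
			   size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (window_overlaps(base, MSI_WINDOW_SIZE, &regions[i]))
			return false;
	return true;
}

/*
 * Return the doorbell base: the default if free, otherwise the nearest free
 * window below it, otherwise the nearest free window above it (still below
 * 4G). Returns 0 if nothing fits. The result depends only on the input list.
 */
uint64_t pick_msi_doorbell_base(const struct resv_region *regions, size_t n)
{
	uint64_t base;

	if (window_is_free(MSI_DOORBELL_DEFAULT, regions, n))
		return MSI_DOORBELL_DEFAULT;

	/* The default is window-aligned, so this stops exactly at 0
	 * (the window at 0 is skipped because 0 signals failure). */
	for (base = MSI_DOORBELL_DEFAULT - MSI_WINDOW_SIZE; base != 0;
	     base -= MSI_WINDOW_SIZE)
		if (window_is_free(base, regions, n))
			return base;

	for (base = MSI_DOORBELL_DEFAULT + MSI_WINDOW_SIZE;
	     base + MSI_WINDOW_SIZE <= LOW_4G; base += MSI_WINDOW_SIZE)
		if (window_is_free(base, regions, n))
			return base;

	return 0;
}
```

Because the result depends only on the reserved-region list, two hosts with identical platform reservations would pick the same doorbell address, which is the property that matters for the migration scenario Alex raised.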
>>> This is tentatively achieved now with
>>> [1] [RFC v2 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 - Alt II
>>> (http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1264506.html)
>>>
>> Yup, I saw that fly by. Hopefully some of the internals can be reused
>> with the current thinking on user ABI.
>>
>>>> The next question is how to tell userspace about all of the reserved
>>>> regions. Initially, the idea was to extend VFIO, however Alex pointed
>>>> out a horrible scenario:
>>>>
>>>> 1. QEMU spawns a VM on system 0
>>>> 2. VM is migrated to system 1
>>>> 3. QEMU attempts to passthrough a device using PCI hotplug
>>>>
>>>> In this scenario, the guest memory map is chosen at step (1), yet there
>>>> is no VFIO fd available to determine the reserved regions. Furthermore,
>>>> the reserved regions may vary between system 0 and system 1. This pretty
>>>> much rules out using VFIO to determine the reserved regions. Alex suggested
>>>> that the SMMU driver can advertise the regions via /sys/class/iommu/. This
>>>> would solve part of the problem, but migration between systems with
>>>> different memory maps can still cause problems if the reserved regions
>>>> of the new system conflict with the guest memory map chosen by QEMU.
>>>
>>> OK, so I understand we no longer want the VFIO capability chain API
>>> (patch 5 of the above series) and prefer a sysfs approach instead.
>> Right.
>>
>>> I understand the sysfs approach, which allows userspace to get the
>>> info earlier and independently of VFIO. Keep in mind that the current
>>> QEMU virt machine (which is not the only userspace) will not do much
>>> with this info until we overhaul its guest address space management. So
>>> if I am not wrong, at the moment the main action to be undertaken is
>>> rejecting the PCI hotplug when we detect a collision.
>> I don't think so; it should be up to userspace to reject the hotplug.
>> If userspace doesn't have support for the regions, then that's fine --
>> you just end up in a situation where the CPU page table maps memory
>> somewhere that the device can't see. In other words, you'll end up with
>> spurious DMA failures, but that's exactly what happens with current systems
>> if you pass through an overlapping region (Robin demonstrated this on
>> Juno).
>>
>> Additionally, you can imagine some future support where you can tell the
>> guest not to use certain regions of its memory for DMA. In this case, you
>> wouldn't want to refuse the hotplug in the case of overlapping regions.
>>
>> Really, I think the kernel side just needs to enumerate the fixed reserved
>> regions, place the doorbell at a fixed address and then advertise these
>> via sysfs.
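As an illustration of the userspace side, a minimal C sketch that parses one reserved-region entry and checks it against a guest RAM range. The "start end type" line format (hex addresses, inclusive end) and both function names are hypothetical; the thread only agrees that the regions should be advertised via sysfs, not on a format. What userspace does on a collision (refuse the hotplug, warn, or keep the guest from using that range for DMA) is left to policy, as Will describes above.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse one line of a hypothetical reserved-region listing, e.g.
 * "0x00000000fee00000 0x00000000feefffff msi". The format is an assumption.
 * type must point to a buffer of at least 16 bytes.
 */
bool parse_resv_line(const char *line, uint64_t *start, uint64_t *end,
		     char *type)
{
	/* %x accepts an optional "0x" prefix, as strtoul() with base 16 does. */
	if (sscanf(line, "%" SCNx64 " %" SCNx64 " %15s", start, end, type) != 3)
		return false;
	return *start <= *end;
}

/* True if guest RAM [gpa, gpa + size) overlaps reserved range [start, end]. */
bool guest_ram_collides(uint64_t gpa, uint64_t size, uint64_t start,
			uint64_t end)
{
	return size != 0 && gpa <= end && start <= gpa + size - 1;
}
```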
>>
>>> I can respin [1]
>>> - studying and taking into account Robin's comments about the
>>>   similarities with dm_regions
>>> - removing the VFIO capability chain and replacing it with a sysfs API
>> Ideally, this would be reusable between different SMMU drivers so the sysfs
>> entries have the same format etc.
>>
>>> Would that be OK?
>> Sounds good to me. Are you in a position to prototype something on the QEMU
>> side once we've got kernel-side agreement?
Yes, sure.
>>
>>> What about Alex's comment that we should report the usable memory ranges
>>> instead of the unusable ones?
>>>
>>> Also did you have a chance to discuss the following items:
>>> 1) the VFIO irq safety assessment
>> The discussion really focussed on system topology, as opposed to properties
>> of the doorbell. Regardless of how the device talks to the doorbell, if
>> the doorbell can't protect against things like MSI spoofing, then it's
>> unsafe. My opinion is that we shouldn't allow passthrough by default on
>> systems with unsafe doorbells (we could piggyback on VFIO's
>> allow_unsafe_interrupts module parameter).
OK.
>>
>> A first step would be making all this opt-in, and only supporting GICv3
>> ITS for now.
> You're trying to support a config that is < GICv3 and has no ITS? ...
> That would be the equivalent of x86 before interrupt remapping, and that's
> why the allow_unsafe_interrupts hook was created ... to enable
> devel/kick-the-tires.
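The default-deny policy discussed above could look roughly like the following C sketch. struct msi_doorbell_info and check_msi_passthrough() are hypothetical names; the only real piece is VFIO type1's allow_unsafe_interrupts parameter, which, to my understanding, already gates group attach on interrupt remapping support in a similar way on x86.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of what the platform says about an MSI doorbell. */
struct msi_doorbell_info {
	/* Can the doorbell tell devices apart and block spoofed MSIs
	 * (e.g. a GICv3 ITS)? */
	bool isolates_msis;
};

/*
 * Refuse passthrough by default when the doorbell is unsafe; an explicit
 * admin opt-in, modelled on VFIO's allow_unsafe_interrupts, overrides it.
 */
int check_msi_passthrough(const struct msi_doorbell_info *db,
			  bool allow_unsafe_interrupts)
{
	if (db->isolates_msis || allow_unsafe_interrupts)
		return 0;
	return -EPERM;
}
```

Keeping the override as a single explicit flag matches the "opt-in, GICv3 ITS only for now" approach: safe doorbells work out of the box, everything else needs a deliberate admin decision.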
>>> 2) the MSI reserved size computation (is an arbitrary size OK?)
>> If we fix the base address, we could fix a size too. However, we'd still
>> need to enumerate the doorbells to check that they fit in the region we
>> have. If not, then we can warn during boot and treat it the same way as
>> a resource conflict (that is, reallocate the region in some deterministic
>> way).
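A rough sketch of that fit check, assuming a hypothetical fixed 1MB MSI IOVA window and a hypothetical struct doorbell that only records the size of each doorbell mapping. Each doorbell is rounded up to the IOMMU page size (assumed to be a power of two) before summing, since that is the granularity at which it would be mapped.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MSI_IOVA_SIZE	0x100000ULL	/* assumed fixed 1MB MSI window */

/* Hypothetical doorbell descriptor: bytes that must be mapped for it. */
struct doorbell {
	uint64_t size;
};

/*
 * Round each doorbell up to page_size (must be a power of two), sum them and
 * check the total fits in the fixed window. Warns and returns false if not,
 * so the caller can relocate the window as for any other resource conflict.
 */
bool doorbells_fit(const struct doorbell *db, size_t n, uint64_t page_size)
{
	uint64_t needed = 0;

	for (size_t i = 0; i < n; i++)
		needed += (db[i].size + page_size - 1) & ~(page_size - 1);

	if (needed > MSI_IOVA_SIZE) {
		fprintf(stderr, "MSI window too small: need %#" PRIx64
			", have %#" PRIx64 "\n", needed, (uint64_t)MSI_IOVA_SIZE);
		return false;
	}
	return true;
}
```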
OK
Thanks
Eric
>>
>> Will