linux-kernel.vger.kernel.org archive mirror
From: Auger Eric <eric.auger@redhat.com>
To: Don Dutile <ddutile@redhat.com>, Will Deacon <will.deacon@arm.com>
Cc: drjones@redhat.com, christoffer.dall@linaro.org,
	jason@lakedaemon.net, kvm@vger.kernel.org, marc.zyngier@arm.com,
	benh@kernel.crashing.org, joro@8bytes.org, punit.agrawal@arm.com,
	linux-kernel@vger.kernel.org, iommu@lists.linux-foundation.org,
	diana.craciun@nxp.com,
	Alex Williamson <alex.williamson@redhat.com>,
	pranav.sawargaonkar@gmail.com, arnd@arndb.de, dwmw@amazon.co.uk,
	jcm@redhat.com, tglx@linutronix.de, robin.murphy@arm.com,
	linux-arm-kernel@lists.infradead.org, eric.auger.pro@gmail.com
Subject: Re: Summary of LPC guest MSI discussion in Santa Fe
Date: Wed, 9 Nov 2016 08:43:12 +0100
Message-ID: <001659cd-5806-7729-8cea-dd6982010c9f@redhat.com>
In-Reply-To: <5822214F.2070500@redhat.com>

Hi Will,
On 08/11/2016 20:02, Don Dutile wrote:
> On 11/08/2016 12:54 PM, Will Deacon wrote:
>> On Tue, Nov 08, 2016 at 03:27:23PM +0100, Auger Eric wrote:
>>> On 08/11/2016 03:45, Will Deacon wrote:
>>>> Rather than treat these as separate problems, a better interface is to
>>>> tell userspace about a set of reserved regions, and have this include
>>>> the MSI doorbell, irrespective of whether or not it can be remapped.
>>>> Don suggested that we statically pick an address for the doorbell in a
>>>> similar way to x86, and have the kernel map it there. We could even
>>>> pick 0xfee00000. If it conflicts with a reserved region on the platform
>>>> (due to (4)), then we'd obviously have to (deterministically?) allocate
>>>> it somewhere else, but probably within the bottom 4G.
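To make the placement idea above concrete, here is a minimal sketch of "prefer 0xfee00000, otherwise fall back deterministically below 4G". It is not taken from the RFC series: the function names, the 1 MiB alignment and the top-down scan order are all assumptions chosen for illustration.

```python
FOUR_GIB = 1 << 32
DEFAULT_DOORBELL_BASE = 0xFEE00000  # the address floated in the discussion


def overlaps(start_a, size_a, start_b, size_b):
    """True if the half-open ranges [start, start + size) intersect."""
    return start_a < start_b + size_b and start_b < start_a + size_a


def pick_doorbell_base(reserved, size, align=0x100000,
                       preferred=DEFAULT_DOORBELL_BASE):
    """Return a base address for the MSI doorbell window, or None.

    reserved: list of (start, size) platform reserved regions.
    The alignment and the scan order are hypothetical; the only property
    that matters here is that the same inputs always give the same answer.
    """
    def is_free(base):
        if base + size > FOUR_GIB:
            return False
        return not any(overlaps(base, size, s, n) for s, n in reserved)

    if is_free(preferred):
        return preferred

    # Deterministic fallback: walk aligned candidates from the top of the
    # bottom 4G downwards and take the first one that does not collide.
    base = (FOUR_GIB - size) & ~(align - 1)
    while base >= 0:
        if is_free(base):
            return base
        base -= align
    return None
```

With no reserved regions the preferred address is returned unchanged; a reserved region covering 0xfee00000 pushes the window to the first free aligned slot found by the scan.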
>>> This is tentatively achieved now with
>>> [1] [RFC v2 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 - Alt II
>>> (http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1264506.html)
>>>
>> Yup, I saw that fly by. Hopefully some of the internals can be reused
>> with the current thinking on user ABI.
>>
>>>> The next question is how to tell userspace about all of the reserved
>>>> regions. Initially, the idea was to extend VFIO, however Alex pointed
>>>> out a horrible scenario:
>>>>
>>>>    1. QEMU spawns a VM on system 0
>>>>    2. VM is migrated to system 1
>>>>    3. QEMU attempts to passthrough a device using PCI hotplug
>>>>
>>>> In this scenario, the guest memory map is chosen at step (1), yet there
>>>> is no VFIO fd available to determine the reserved regions. Furthermore,
>>>> the reserved regions may vary between system 0 and system 1. This pretty
>>>> much rules out using VFIO to determine the reserved regions. Alex
>>>> suggested that the SMMU driver can advertise the regions via
>>>> /sys/class/iommu/. This would solve part of the problem, but migration
>>>> between systems with different memory maps can still cause problems if
>>>> the reserved regions of the new system conflict with the guest memory
>>>> map chosen by QEMU.
>>>
>>> OK, so I understand we no longer want the VFIO capability chain API
>>> (patch 5 of the above series) and prefer a sysfs approach instead.
>> Right.
>>
>>> I understand the sysfs approach, which lets userspace get the info
>>> earlier and independently of VFIO. Keep in mind that the current QEMU
>>> virt machine - which is not the only userspace - will not do much with
>>> this info until we overhaul virt address space management. So if I am
>>> not wrong, at the moment the main action to be undertaken is the
>>> rejection of the PCI hotplug in case we detect a collision.
>> I don't think so; it should be up to userspace to reject the hotplug.
>> If userspace doesn't have support for the regions, then that's fine --
>> you just end up in a situation where the CPU page table maps memory
>> somewhere that the device can't see. In other words, you'll end up with
>> spurious DMA failures, but that's exactly what happens with current
>> systems if you pass through an overlapping region (Robin demonstrated
>> this on Juno).
>>
>> Additionally, you can imagine some future support where you can tell the
>> guest not to use certain regions of its memory for DMA. In this case, you
>> wouldn't want to refuse the hotplug in the case of overlapping regions.
>>
>> Really, I think the kernel side just needs to enumerate the fixed
>> reserved regions, place the doorbell at a fixed address and then
>> advertise these via sysfs.
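As an illustration of what userspace could do once it has the reserved regions in hand, here is a small sketch of the collision check against the guest memory map. How the regions are read from sysfs is deliberately left out, since the entry format was still open at this point; the function name and the (start, size) representation are assumptions.

```python
def find_collisions(guest_ram, reserved):
    """Return every (ram_region, reserved_region) pair that overlaps.

    guest_ram, reserved: lists of (start, size) pairs in the same address
    space (guest-physical addresses used as IOVAs). Both names are
    hypothetical; nothing here is an existing QEMU or kernel interface.
    """
    hits = []
    for ram_start, ram_size in guest_ram:
        for res_start, res_size in reserved:
            # Half-open interval overlap test.
            if (ram_start < res_start + res_size
                    and res_start < ram_start + ram_size):
                hits.append(((ram_start, ram_size), (res_start, res_size)))
    return hits
```

What to do with a non-empty result stays a userspace policy decision: refuse the hotplug, or in some future scheme tell the guest to avoid DMA to the affected ranges.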
>>
>>> I can respin [1] by:
>>> - studying and taking into account Robin's comments about the
>>>   similarities with dm_regions
>>> - removing the VFIO capability chain and replacing it with a sysfs API
>> Ideally, this would be reusable between different SMMU drivers so the
>> sysfs entries have the same format etc.
>>
>>> Would that be OK?
>> Sounds good to me. Are you in a position to prototype something on the
>> QEMU side once we've got kernel-side agreement?
Yes, sure.
>>
>>> What about Alex's comment that we should report the usable memory
>>> ranges instead of the unusable ones?
>>>
>>> Also did you have a chance to discuss the following items:
>>> 1) the VFIO irq safety assessment
>> The discussion really focussed on system topology, as opposed to
>> properties of the doorbell. Regardless of how the device talks to the
>> doorbell, if the doorbell can't protect against things like MSI spoofing,
>> then it's unsafe. My opinion is that we shouldn't allow passthrough by
>> default on systems with unsafe doorbells (we could piggyback on the
>> allow_unsafe_interrupts cmdline option to VFIO).
OK.
>>
>> A first step would be making all this opt-in, and only supporting GICv3
>> ITS for now.
> You're trying to support a config that is < GICv3 and has no ITS? ...
> That would be the equivalent of x86 pre-interrupt-remapping, and that's
> why the allow_unsafe_interrupts hook was created ... to enable
> devel/kick-the-tires.
>>> 2) the MSI reserved size computation (is an arbitrary size OK?)
>> If we fix the base address, we could fix a size too. However, we'd still
>> need to enumerate the doorbells to check that they fit in the region we
>> have. If not, then we can warn during boot and treat it the same way as
>> a resource conflict (that is, reallocate the region in some deterministic
>> way).
OK
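
For reference, the "do the doorbells fit" check could look roughly like the sketch below. It assumes that each distinct page containing a doorbell register needs one page of IOVA space in the reserved window; that assumption, the 4 KiB page size and the function name are mine and not something settled in this thread.

```python
def msi_window_fits(doorbells, window_size, page_size=0x1000):
    """Check whether all doorbell pages fit in a fixed-size MSI window.

    doorbells: list of (phys_addr, size) doorbell registers.
    Returns (fits, bytes_needed). Doorbells sharing a page are counted
    once, on the assumption that one mapping can serve both.
    """
    pages = set()
    for addr, size in doorbells:
        first = addr // page_size
        last = (addr + size - 1) // page_size
        pages.update(range(first, last + 1))
    needed = len(pages) * page_size
    return needed <= window_size, needed
```

If the first element of the result is false, that is the case where the kernel would warn at boot and treat it like any other resource conflict.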

Thanks

Eric
>>
>> Will
> 
> 
