From: Auger Eric <eric.auger@redhat.com>
To: Will Deacon <will.deacon@arm.com>, Alex Williamson <alex.williamson@redhat.com>
Cc: drjones@redhat.com, jason@lakedaemon.net, kvm@vger.kernel.org, marc.zyngier@arm.com, benh@kernel.crashing.org, joro@8bytes.org, punit.agrawal@arm.com, linux-kernel@vger.kernel.org, arnd@arndb.de, diana.craciun@nxp.com, iommu@lists.linux-foundation.org, pranav.sawargaonkar@gmail.com, ddutile@redhat.com, linux-arm-kernel@lists.infradead.org, jcm@redhat.com, tglx@linutronix.de, robin.murphy@arm.com, dwmw@amazon.co.uk, christoffer.dall@linaro.org, eric.auger.pro@gmail.com
Subject: Re: Summary of LPC guest MSI discussion in Santa Fe
Date: Tue, 8 Nov 2016 15:27:23 +0100
Message-ID: <dae12190-1eb6-20a9-5740-9e5be8bb65fc@redhat.com>
In-Reply-To: <20161108024559.GA20591@arm.com>

Hi Will,

On 08/11/2016 03:45, Will Deacon wrote:
> Hi all,
>
> I figured this was a reasonable post to piggy-back on for the LPC minutes
> relating to guest MSIs on arm64.
>
> On Thu, Nov 03, 2016 at 10:02:05PM -0600, Alex Williamson wrote:
>> We can always have QEMU reject hot-adding the device if the reserved
>> region overlaps existing guest RAM, but I don't even really see how we
>> advise users to give them a reasonable chance of avoiding that
>> possibility. Apparently there are also ARM platforms where MSI pages
>> cannot be remapped to support the previous programmable user/VM
>> address, is it even worthwhile to support those platforms? Does that
>> decision influence whether user programmable MSI reserved regions are
>> really a second class citizen to fixed reserved regions? I expect
>> we'll be talking about this tomorrow morning, but I certainly haven't
>> come up with any viable solutions to this. Thanks,
>
> At LPC last week, we discussed guest MSIs on arm64 as part of the PCI
> microconference. I presented some slides to illustrate some of the issues
> we're trying to solve:
>
> http://www.willdeacon.ukfsn.org/bitbucket/lpc-16/msi-in-guest-arm64.pdf
>
> Punit took some notes (thanks!) on the etherpad here:
>
> https://etherpad.openstack.org/p/LPC2016_PCI

Thanks to both of you for the minutes and slides. Unfortunately I could
not travel, but my ears were burning ;-)

>
> although the discussion was pretty lively and jumped about, so I've had
> to go from memory where the notes didn't capture everything that was
> said.
>
> To summarise, arm64 platforms differ in their handling of MSIs when compared
> to x86:
>
> 1. The physical memory map is not standardised (Jon pointed out that
>    this is something that was realised late on)
> 2. MSIs are usually treated the same as DMA writes, in that they must be
>    mapped by the SMMU page tables so that they target a physical MSI
>    doorbell
> 3. On some platforms, MSIs bypass the SMMU entirely (e.g. due to an MSI
>    doorbell built into the PCI RC)
> 4. Platforms typically have some set of addresses that abort before
>    reaching the SMMU (e.g. because the PCI identifies them as P2P).
>
> All of this means that userspace (QEMU) needs to identify the memory
> regions corresponding to points (3) and (4) and ensure that they are
> not allocated in the guest physical (IPA) space. For platforms that can
> remap the MSI doorbell as in (2), then some space also needs to be
> allocated for that.
>
> Rather than treat these as separate problems, a better interface is to
> tell userspace about a set of reserved regions, and have this include
> the MSI doorbell, irrespective of whether or not it can be remapped.
> Don suggested that we statically pick an address for the doorbell in a
> similar way to x86, and have the kernel map it there. We could even pick
> 0xfee00000. If it conflicts with a reserved region on the platform (due
> to (4)), then we'd obviously have to (deterministically?) allocate it
> somewhere else, but probably within the bottom 4G.

This is tentatively achieved now with
[1] [RFC v2 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 - Alt II
(http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1264506.html)

>
> The next question is how to tell userspace about all of the reserved
> regions. Initially, the idea was to extend VFIO, however Alex pointed
> out a horrible scenario:
>
> 1. QEMU spawns a VM on system 0
> 2. VM is migrated to system 1
> 3. QEMU attempts to passthrough a device using PCI hotplug
>
> In this scenario, the guest memory map is chosen at step (1), yet there
> is no VFIO fd available to determine the reserved regions. Furthermore,
> the reserved regions may vary between system 0 and system 1. This pretty
> much rules out using VFIO to determine the reserved regions. Alex suggested
> that the SMMU driver can advertise the regions via /sys/class/iommu/. This
> would solve part of the problem, but migration between systems with
> different memory maps can still cause problems if the reserved regions
> of the new system conflict with the guest memory map chosen by QEMU.

OK, so I understand we no longer want the VFIO capability chain API
(patch 5 of the above series) and prefer a sysfs approach instead.

I understand the sysfs approach lets userspace get the info earlier and
independently of VFIO. Keep in mind that the current QEMU virt machine -
which is not the only userspace - will not do much with this info until
we bring upheavals to virt address space management. So, if I am not
wrong, at the moment the main action to be undertaken is the rejection
of the PCI hotplug in case we detect a collision.

I can respin [1]
- studying and taking into account Robin's comments about dm_regions
  similarities
- removing the VFIO capability chain and replacing it with a sysfs API

Would that be OK?

What about Alex's comments? He wanted to report the usable memory ranges
instead of the unusable memory ranges.

Also, did you have a chance to discuss the following items:
1) the VFIO irq safety assessment
2) the MSI reserved size computation (is an arbitrary size OK?)

Thanks

Eric

> Jon pointed out that most people are pretty conservative about hardware
> choices when migrating between them -- that is, they may only migrate
> between different revisions of the same SoC, or they know ahead of time
> all of the memory maps they want to support and this could be communicated
> by way of configuration to libvirt. It would be up to QEMU to fail the
> hotplug if it detected a conflict. Alex asked if there was a security
> issue with DMA bypassing the SMMU, but there aren't currently any systems
> where that is known to happen. Such a system would surely not be safe for
> passthrough.
>
> Ben mused that a way to handle conflicts dynamically might be to hotplug
> on the entire host bridge in the guest, passing firmware tables describing
> the new reserved regions as a property of the host bridge. Whilst this
> may well solve the issue, it was largely considered future work due to
> its invasive nature and dependency on firmware tables (and guest support)
> that do not currently exist.
>
> Will
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
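
For illustration only, the collision check that the thread expects userspace to perform before accepting a hotplug can be sketched as a plain interval-overlap test. This is a minimal sketch and not code from QEMU or the kernel: `struct addr_range`, `ranges_overlap()` and `hotplug_would_collide()` are hypothetical names, and it assumes userspace has already obtained the host's reserved regions by some means (the thread has not settled on the interface) as half-open [start, end) ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open address range [start, end); not a kernel or QEMU type. */
struct addr_range {
    uint64_t start;
    uint64_t end;
};

/* Two half-open ranges overlap iff each one starts before the other ends. */
static bool ranges_overlap(struct addr_range a, struct addr_range b)
{
    /* Both ranges are assumed to be well formed. */
    assert(a.start <= a.end && b.start <= b.end);
    return a.start < b.end && b.start < a.end;
}

/*
 * Returns true if any reserved region reported for the host collides with
 * any range already handed out in the guest physical (IPA) map. Under the
 * policy discussed in the thread, a true result means the hotplug would be
 * rejected.
 */
static bool hotplug_would_collide(const struct addr_range *guest, size_t nguest,
                                  const struct addr_range *resv, size_t nresv)
{
    for (size_t i = 0; i < nguest; i++) {
        for (size_t j = 0; j < nresv; j++) {
            if (ranges_overlap(guest[i], resv[j]))
                return true;
        }
    }
    return false;
}
```

With guest RAM at [0x40000000, 0x80000000) and a reserved window starting at 0xfee00000, the check reports no collision; a reserved window straddling 0x80000000 would be flagged. The quadratic scan is a deliberate simplification, on the assumption that both lists are short.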