From: Robin Murphy <robin.murphy@arm.com>
To: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Eric Auger <eric.auger@redhat.com>, eric.auger.pro@gmail.com,
	marc.zyngier@arm.com, alex.williamson@redhat.com,
	will.deacon@arm.com, joro@8bytes.org, tglx@linutronix.de,
	jason@lakedaemon.net, linux-arm-kernel@lists.infradead.org,
	kvm@vger.kernel.org, drjones@redhat.com,
	linux-kernel@vger.kernel.org, Bharat.Bhushan@freescale.com,
	pranav.sawargaonkar@gmail.com, p.fedin@samsung.com,
	iommu@lists.linux-foundation.org, Jean-Philippe.Brucker@arm.com,
	yehuday@marvell.com, Manish.Jaggi@caviumnetworks.com,
	Peter Maydell <peter.maydell@linaro.org>
Subject: Re: [RFC 05/11] iommu/dma: iommu_dma_(un)map_mixed
Date: Tue, 4 Oct 2016 18:18:12 +0100
Message-ID: <7474f131-9e44-bfde-6937-7cdbd6b2c8a5@arm.com>
In-Reply-To: <20161002095614.GA23218@cbox>

On 02/10/16 10:56, Christoffer Dall wrote:
> On Fri, Sep 30, 2016 at 02:24:40PM +0100, Robin Murphy wrote:
>> Hi Eric,
>>
>> On 27/09/16 21:48, Eric Auger wrote:
>>> iommu_dma_map_mixed and iommu_dma_unmap_mixed operate on
>>> IOMMU_DOMAIN_MIXED typed domains. On top of standard iommu_map/unmap
>>> they reserve the IOVA window to prevent the iova allocator to
>>> allocate in those areas.
>>>
>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>> ---
>>>  drivers/iommu/dma-iommu.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++
>>>  include/linux/dma-iommu.h | 18 ++++++++++++++++++
>>>  2 files changed, 66 insertions(+)
>>>
>>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>>> index 04bbc85..db21143 100644
>>> --- a/drivers/iommu/dma-iommu.c
>>> +++ b/drivers/iommu/dma-iommu.c
>>> @@ -759,3 +759,51 @@ int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>>  	return 0;
>>>  }
>>>  EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
>>> +
>>> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
>>> +			phys_addr_t paddr, size_t size, int prot)
>>> +{
>>> +	struct iova_domain *iovad;
>>> +	unsigned long lo, hi;
>>> +	int ret;
>>> +
>>> +	if (domain->type != IOMMU_DOMAIN_MIXED)
>>> +		return -EINVAL;
>>> +
>>> +	if (!domain->iova_cookie)
>>> +		return -EINVAL;
>>> +
>>> +	iovad = cookie_iovad(domain);
>>> +
>>> +	lo = iova_pfn(iovad, iova);
>>> +	hi = iova_pfn(iovad, iova + size - 1);
>>> +	reserve_iova(iovad, lo, hi);
>>
>> This can't work reliably - reserve_iova() will (for good reason) merge
>> any adjacent or overlapping entries, so any unmap is liable to free more
>> IOVA space than actually gets unmapped, and things will get subtly out
>> of sync and go wrong later.
>>
>> The more general issue with this whole approach, though, is that it
>> effectively rules out userspace doing guest memory hotplug or similar,
>> and I'm not we want to paint ourselves into that corner. Basically, as
>> soon as a device is attached to a guest, the entirety of the unallocated
>> IPA space becomes reserved, and userspace can never add anything further
>> to it, because any given address *might* be in use for an MSI mapping.
>
> Ah, we didn't think of that when discussing this design at KVM Forum,
> because the idea was that the IOVA allocator was in charge of that
> resource, and the IOVA was a separate concept from the IPA space.
>
> I think what tripped us up, is that while the above is true for the MSI
> configuration where we trap the bar and do the allocation at VFIO init
> time, the guest device driver can program DMA to any address without
> trapping, and therefore there's an inherent relationship between the
> IOVA and the IPA space. Is that right?

Yes, for anything the guest knows about and/or can touch directly, IOVA
must equal IPA, or DMA is going to go horribly wrong. It's only direct
interactions between device and host behind the guest's back where we
(may) have some freedom with IOVA assignment.

>> I think it still makes most sense to stick with the original approach of
>> cooperating with userspace to reserve a bounded area - it's just that we
>> can then let automatic mapping take care of itself within that area.
>
> I was thinking that it's also possible to do it the other way around: To
> let userspace say wherever memory may be hotplugged and do the
> allocation within the remaining area, but I suppose that's pretty much
> the same thing, and it should just depend on what's easiest to implement
> and what userspace can best predict.

Indeed, if userspace *is* able to pre-emptively claim everything it
might ever want, that does kind of implicitly solve the "tell me where I
can put this" problem (assuming it doesn't simply claim the whole
address space, of course), but I'm not so sure it works well if there
are any specific restrictions (e.g. if some device is going to require
the MSI range to be 32-bit addressable). It also fails to address the
issue below...

>> Speaking of which, I've realised the same fundamental reservation
>> problem already applies to PCI without ACS, regardless of MSIs. I just
>> tried on my Juno with guest memory placed at 0x4000000000, (i.e.
>> matching the host PA of the 64-bit PCI window), and sure enough when the
>> guest kicks off some DMA on the passed-through NIC, the root complex
>> interprets the guest IPA as (unsupported) peer-to-peer DMA to a BAR
>> claimed by the video card, and it fails. I guess this doesn't get hit in
>> practice on x86 because the guest memory map is unlikely to be much
>> different from the host's.
>>
>> It seems like we basically need a general way of communicating fixed and
>> movable host reservations to userspace :/
>>
>
> Yes, this makes sense to me. Do we have any existing way of
> discovering this from userspace or can we think of something?

I know virtually nothing about the userspace interface, but I was under
the impression it would require something new. I wasn't even aware you
could do the VFIO-under-QEMU-TCG thing which Eric points out, so it
seems like the general "tell userspace about addresses it can't use"
issue is perhaps the more pressing one.

On investigation, QEMU's static memory map with RAM at 0x4000000 is
already busted for VFIO on Juno, as that results in attempting DMA to
config space, which goes about as well as one might expect.

Robin.

>
> Thanks,
> -Christoffer
>
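
The reserve_iova() objection in the message above (adjacent or overlapping
reservations get merged, so a later unmap can release more than was
mapped) can be illustrated with a small standalone model. This is only a
sketch under stated assumptions, not the kernel implementation: every
toy_* name, the fixed-size array and TOY_MAX_RANGES are hypothetical and
exist purely for illustration, whereas the real allocator tracks ranges
in an rbtree inside struct iova_domain. It also assumes that the unmap
path frees whichever entry covers the address as a whole; the unmap half
of the patch is not quoted in the message, so that behaviour is a guess.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: NOT the kernel's rbtree-based struct iova_domain. */
#define TOY_MAX_RANGES 16

struct toy_range {
	unsigned long lo, hi;		/* inclusive PFN bounds */
};

struct toy_reserved {
	struct toy_range r[TOY_MAX_RANGES];
	size_t n;
};

/*
 * Reserve [lo, hi], coalescing with any overlapping or adjacent entry.
 * PFNs are assumed to be far below ULONG_MAX, so "+ 1" cannot overflow.
 */
void toy_reserve(struct toy_reserved *t, unsigned long lo, unsigned long hi)
{
	size_t i = 0;

	assert(lo <= hi);
	while (i < t->n) {
		struct toy_range *e = &t->r[i];

		if (e->lo <= hi + 1 && lo <= e->hi + 1) {
			/* Absorb the existing entry into the new range. */
			if (e->lo < lo)
				lo = e->lo;
			if (e->hi > hi)
				hi = e->hi;
			*e = t->r[--t->n];	/* drop it (swap with last) */
			i = 0;			/* grown range may touch others */
		} else {
			i++;
		}
	}
	assert(t->n < TOY_MAX_RANGES);
	t->r[t->n++] = (struct toy_range){ lo, hi };
}

/* Free the whole entry containing pfn; returns how many PFNs were released. */
unsigned long toy_free_containing(struct toy_reserved *t, unsigned long pfn)
{
	for (size_t i = 0; i < t->n; i++) {
		if (t->r[i].lo <= pfn && pfn <= t->r[i].hi) {
			unsigned long freed = t->r[i].hi - t->r[i].lo + 1;

			t->r[i] = t->r[--t->n];
			return freed;
		}
	}
	return 0;
}

bool toy_is_reserved(const struct toy_reserved *t, unsigned long pfn)
{
	for (size_t i = 0; i < t->n; i++)
		if (t->r[i].lo <= pfn && pfn <= t->r[i].hi)
			return true;
	return false;
}

/*
 * Map two adjacent 4-page windows, then unmap only the first one.
 * Returns whether the second window is still reserved afterwards.
 */
bool toy_demo_second_still_reserved(void)
{
	struct toy_reserved t = { .n = 0 };

	toy_reserve(&t, 0x100, 0x103);
	toy_reserve(&t, 0x104, 0x107);	/* adjacent: merged into one entry */
	toy_free_containing(&t, 0x100);	/* releases the merged entry */
	return toy_is_reserved(&t, 0x104);
}
```

In this model the two reservations collapse into a single entry, so
freeing by the first window's address also drops the reservation for the
second window while its mapping would still be live. Tracking each
mapping's extent separately from the allocator's reserved ranges would
avoid that, at the cost of extra bookkeeping.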