From: Robin Murphy <robin.murphy@arm.com>
Subject: Re: [RFC 05/11] iommu/dma: iommu_dma_(un)map_mixed
Date: Fri, 30 Sep 2016 14:24:40 +0100
To: Eric Auger <eric.auger@redhat.com>, eric.auger.pro@gmail.com,
 christoffer.dall@linaro.org, marc.zyngier@arm.com,
 alex.williamson@redhat.com, will.deacon@arm.com, joro@8bytes.org,
 tglx@linutronix.de, jason@lakedaemon.net,
 linux-arm-kernel@lists.infradead.org
Cc: kvm@vger.kernel.org, drjones@redhat.com, linux-kernel@vger.kernel.org,
 Bharat.Bhushan@freescale.com, pranav.sawargaonkar@gmail.com,
 p.fedin@samsung.com, iommu@lists.linux-foundation.org,
 Jean-Philippe.Brucker@arm.com, yehuday@marvell.com,
 Manish.Jaggi@caviumnetworks.com
Message-ID: <1b1b30b3-4199-9e18-362c-b8bc9d45277d@arm.com>
In-Reply-To: <1475009318-2617-6-git-send-email-eric.auger@redhat.com>
References: <1475009318-2617-1-git-send-email-eric.auger@redhat.com>
 <1475009318-2617-6-git-send-email-eric.auger@redhat.com>

Hi Eric,

On 27/09/16 21:48, Eric Auger wrote:
> iommu_dma_map_mixed and iommu_dma_unmap_mixed operate on
> IOMMU_DOMAIN_MIXED typed domains. On top of standard iommu_map/unmap
> they reserve the IOVA window to prevent the iova allocator to
> allocate in those areas.
>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>  drivers/iommu/dma-iommu.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/dma-iommu.h | 18 ++++++++++++++++++
>  2 files changed, 66 insertions(+)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 04bbc85..db21143 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -759,3 +759,51 @@ int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>  	return 0;
>  }
>  EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
> +
> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
> +			phys_addr_t paddr, size_t size, int prot)
> +{
> +	struct iova_domain *iovad;
> +	unsigned long lo, hi;
> +	int ret;
> +
> +	if (domain->type != IOMMU_DOMAIN_MIXED)
> +		return -EINVAL;
> +
> +	if (!domain->iova_cookie)
> +		return -EINVAL;
> +
> +	iovad = cookie_iovad(domain);
> +
> +	lo = iova_pfn(iovad, iova);
> +	hi = iova_pfn(iovad, iova + size - 1);
> +	reserve_iova(iovad, lo, hi);

This can't work reliably - reserve_iova() will (for good reason) merge
any adjacent or overlapping entries, so any unmap is liable to free
more IOVA space than actually gets unmapped, and things will get subtly
out of sync and go wrong later.

The more general issue with this whole approach, though, is that it
effectively rules out userspace doing guest memory hotplug or similar,
and I'm not sure we want to paint ourselves into that corner.
Basically, as soon as a device is attached to a guest, the entirety of
the unallocated IPA space becomes reserved, and userspace can never add
anything further to it, because any given address *might* be in use for
an MSI mapping.
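To make that concrete, here's a minimal sketch of the sequence I have
in mind (the pfn values are made up for illustration):

	/* Two back-to-back mappings via iommu_dma_map_mixed():
	 * reserve_iova() merges the adjacent ranges into a single
	 * struct iova covering pfns 0x100-0x2ff. */
	reserve_iova(iovad, 0x100, 0x1ff);	/* first mapping */
	reserve_iova(iovad, 0x200, 0x2ff);	/* merged with the first */

	/* Unmapping only the first region: free_iova() looks up the
	 * entry containing pfn 0x100 - now the merged 0x100-0x2ff one -
	 * and frees the lot, handing the still-mapped second region's
	 * IOVA space back to the allocator. */
	free_iova(iovad, 0x100);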
I think it still makes most sense to stick with the original approach
of cooperating with userspace to reserve a bounded area - it's just
that we can then let automatic mapping take care of itself within that
area (see the sketch after the quoted patch below).

Speaking of which, I've realised the same fundamental reservation
problem already applies to PCI without ACS, regardless of MSIs. I just
tried on my Juno with guest memory placed at 0x4000000000 (i.e.
matching the host PA of the 64-bit PCI window), and sure enough when
the guest kicks off some DMA on the passed-through NIC, the root
complex interprets the guest IPA as (unsupported) peer-to-peer DMA to a
BAR claimed by the video card, and it fails. I guess this doesn't get
hit in practice on x86 because the guest memory map is unlikely to be
much different from the host's.

It seems like we basically need a general way of communicating fixed
and movable host reservations to userspace :/

Robin.

> +	ret = iommu_map(domain, iova, paddr, size, prot);
> +	if (ret)
> +		free_iova(iovad, lo);
> +	return ret;
> +}
> +EXPORT_SYMBOL(iommu_dma_map_mixed);
> +
> +size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
> +			     size_t size)
> +{
> +	struct iova_domain *iovad;
> +	unsigned long lo;
> +	size_t ret;
> +
> +	if (domain->type != IOMMU_DOMAIN_MIXED)
> +		return -EINVAL;
> +
> +	if (!domain->iova_cookie)
> +		return -EINVAL;
> +
> +	iovad = cookie_iovad(domain);
> +	lo = iova_pfn(iovad, iova);
> +
> +	ret = iommu_unmap(domain, iova, size);
> +	if (ret == size)
> +		free_iova(iovad, lo);
> +	return ret;
> +}
> +EXPORT_SYMBOL(iommu_dma_unmap_mixed);
> diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
> index 1c55413..f2aa855 100644
> --- a/include/linux/dma-iommu.h
> +++ b/include/linux/dma-iommu.h
> @@ -70,6 +70,12 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);
>  int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>  		dma_addr_t base, u64 size);
>
> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
> +			phys_addr_t paddr, size_t size, int prot);
> +
> +size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
> +			     size_t size);
> +
>  #else
>
>  struct iommu_domain;
> @@ -99,6 +105,18 @@ static inline int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>  	return -ENODEV;
>  }
>
> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
> +			phys_addr_t paddr, size_t size, int prot)
> +{
> +	return -ENODEV;
> +}
> +
> +size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
> +			     size_t size)
> +{
> +	return -ENODEV;
> +}
> +
> #endif /* CONFIG_IOMMU_DMA */
> #endif /* __KERNEL__ */
> #endif /* __DMA_IOMMU_H */
>
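P.S. A rough sketch of the bounded-area approach referenced above,
reusing iommu_get_dma_msi_region_cookie() from earlier in this series -
the window values and the vfio_set_msi_window() wrapper are purely
hypothetical, for illustration:

	/* Hypothetical window; in practice the base/size would be
	 * negotiated with userspace rather than hard-coded. */
	#define MSI_IOVA_BASE	0x8000000UL
	#define MSI_IOVA_SIZE	0x100000UL

	static int vfio_set_msi_window(struct iommu_domain *domain)
	{
		/* Reserve one bounded window up front; MSI doorbell
		 * mappings are then allocated automatically within it,
		 * leaving the rest of the IPA space free for guest
		 * memory hotplug and the like. */
		return iommu_get_dma_msi_region_cookie(domain,
						       MSI_IOVA_BASE,
						       MSI_IOVA_SIZE);
	}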