linux-kernel.vger.kernel.org archive mirror
* [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64
@ 2016-03-01 18:27 Eric Auger
  2016-03-01 18:27 ` [RFC v5 01/17] iommu: Add DOMAIN_ATTR_MSI_MAPPING attribute Eric Auger
                   ` (17 more replies)
  0 siblings, 18 replies; 24+ messages in thread
From: Eric Auger @ 2016-03-01 18:27 UTC (permalink / raw)
  To: eric.auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel, kvmarm, kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu

This series addresses KVM PCIe passthrough with MSI enabled on ARM/ARM64.
It continues the efforts made in [1], [2] and [3]. PowerPC platforms have
the same need, although the corresponding integration remains to be
carried out there.

On x86, all accesses to the 1MB PA region [FEE0_0000h - FEF0_0000h] are treated
as interrupt messages: accesses to this special PA window directly target the
APIC configuration space and not DRAM, meaning the downstream IOMMU is bypassed.

This is not the case on the above-mentioned platforms, where MSI messages
emitted by devices are conveyed through the IOMMU. This means an IOVA/host PA
mapping must exist for the MSI to reach the MSI controller. The normal way to
create IOVA bindings is to use the VFIO DMA MAP API. However, in this case
the MSI IOVA is not mapped onto guest RAM but onto a host physical page (the
MSI controller frame).
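The practical consequence is that the address programmed into a device's MSI message must be the IOVA rather than the doorbell's physical address. A minimal sketch of that substitution (the `msi_msg` layout here is a simplified stand-in for the kernel's struct, and the doorbell addresses are made up):

```c
#include <stdint.h>

/* Simplified stand-in for the kernel's struct msi_msg. */
struct msi_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/*
 * Compose an MSI message targeting @doorbell. On x86 this would be the
 * doorbell PA; behind an IOMMU it must be the IOVA mapped onto that PA.
 */
static void msi_compose(struct msi_msg *msg, uint64_t doorbell, uint32_t data)
{
	msg->address_lo = (uint32_t)(doorbell & 0xffffffffu);
	msg->address_hi = (uint32_t)(doorbell >> 32);
	msg->data = data;
}
```

With a hypothetical GICv2m doorbell at PA 0x80090040 mapped at IOVA 0xfffff040, the device would be programmed with the latter.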

In a nutshell, this series does:
- introduce a new DMA-RESERVED-IOMMU API to register an IOVA window usable for
  reserved mappings and to allocate/bind IOVAs to host physical addresses
- reuse the VFIO DMA MAP ioctl with a new flag to plug onto that new API
- check whether the MSI mapping is safe when attaching the vfio group to the
  container (allow_unsafe_interrupts modality)
- allow the MSI subsystem to map/unmap the doorbell on MSI message composition
- allow user-space to know how many IOVA pages are required

Best Regards

Eric

Testing:
- functional on ARM64 AMD Overdrive HW (single GICv2m frame) with
  x Intel e1000e PCIe card
  x Intel X540-T2 (SR-IOV capable)
- Not tested: ARM GICv3 ITS

References:
[1] [RFC 0/2] VFIO: Add virtual MSI doorbell support
    (https://lkml.org/lkml/2015/7/24/135)
[2] [RFC PATCH 0/6] vfio: Add interface to map MSI pages
    (https://lists.cs.columbia.edu/pipermail/kvmarm/2015-September/016607.html)
[3] [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO
    (http://permalink.gmane.org/gmane.comp.emulators.kvm.arm.devel/3858)

Git:
https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.5-rc6-pcie-passthrough-rfcv5

previous version at
v3: https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.5-rc5-pcie-passthrough-rfcv4

QEMU Integration:
[RFC v2 0/8] KVM PCI/MSI passthrough with mach-virt
(http://lists.gnu.org/archive/html/qemu-arm/2016-01/msg00444.html)
https://git.linaro.org/people/eric.auger/qemu.git/shortlog/refs/heads/v2.5.0-pci-passthrough-rfc-v2

User Hints:
To allow PCI/MSI passthrough with GICv2M, compile VFIO as a module and
load the vfio_iommu_type1 module with allow_unsafe_interrupts param:
sudo modprobe -v vfio-pci
sudo modprobe -r vfio_iommu_type1
sudo modprobe -v vfio_iommu_type1 allow_unsafe_interrupts=1

History:

RFC v4 -> RFC v5:
- take into account Thomas' comments on MSI related patches
  - split "msi: IOMMU map the doorbell address when needed"
  - increase readability and add comments
  - fix style issues
- split "iommu: Add DOMAIN_ATTR_MSI_MAPPING attribute"
- platform ITS now advertises IOMMU_CAP_INTR_REMAP
- fix compilation issue with CONFIG_IOMMU_API unset
- arm-smmu-v3 now advertises DOMAIN_ATTR_MSI_MAPPING

RFC v3 -> v4:
- Move doorbell mapping/unmapping in msi.c
- fix a ref count issue on set_affinity: in case of a change in the address,
  the previous address's binding ref count is decremented
- doorbell map/unmap is now done on MSI composition. This should cover the
  platform MSI controller use case
- create dma-reserved-iommu.h/c exposing/implementing a new API dedicated
  to reserved IOVA management (looking like dma-iommu glue)
- series reordering to ease the review:
  - first part is related to IOMMU
  - second related to MSI sub-system
  - third related to VFIO (except arm-smmu IOMMU_CAP_INTR_REMAP removal)
- expose the number of requested IOVA pages through VFIO_IOMMU_GET_INFO
  [this partially addresses Marc's comments on iommu_get/put_single_reserved
   size/alignment problematic - which I did not ignore - but I don't know
   how much I can do at the moment]

RFC v2 -> RFC v3:
- should fix wrong handling of some CONFIG combinations:
  CONFIG_IOVA, CONFIG_IOMMU_API, CONFIG_PCI_MSI_IRQ_DOMAIN
- fix MSI_FLAG_IRQ_REMAPPING setting in GICv3 ITS (although not tested)

PATCH v1 -> RFC v2:
- reverted to RFC since it looks more reasonable ;-) the code is split
  between VFIO, IOMMU, MSI controller and I am not sure I made the right
  choices. Also the API needs to be further discussed.
- iova API usage in arm-smmu.c.
- MSI controller natively programs the MSI addr with either the PA or IOVA.
  This is not done anymore in vfio-pci driver as suggested by Alex.
- check irq remapping capability of the group

RFC v1 [2] -> PATCH v1:
- use the existing dma map/unmap ioctl interface with a flag to register a
  reserved IOVA range. Use the legacy RB tree to store this special vfio_dma.
- a single contiguous reserved IOVA region is now allowed
- use of an RB tree indexed by PA to store allocated reserved slots
- use of a vfio_domain iova_domain to manage iova allocation within the
  window provided by the userspace
- vfio alloc_map/unmap_free take a vfio_group handle
- vfio_group handle is cached in vfio_pci_device
- add ref counting to bindings
- user modality enabled at the end of the series


Eric Auger (17):
  iommu: Add DOMAIN_ATTR_MSI_MAPPING attribute
  iommu/arm-smmu: advertise DOMAIN_ATTR_MSI_MAPPING attribute
  iommu: introduce a reserved iova cookie
  dma-reserved-iommu: alloc/free_reserved_iova_domain
  dma-reserved-iommu: reserved binding rb-tree and helpers
  dma-reserved-iommu: iommu_get/put_single_reserved
  dma-reserved-iommu: iommu_unmap_reserved
  msi: Add a new MSI_FLAG_IRQ_REMAPPING flag
  irqchip/gic-v3-its: ITS advertises MSI_FLAG_IRQ_REMAPPING
  msi: export msi_get_domain_info
  msi: msi_compose wrapper
  msi: IOMMU map the doorbell address when needed
  vfio: introduce VFIO_IOVA_RESERVED vfio_dma type
  vfio: allow the user to register reserved iova range for MSI mapping
  vfio/type1: also check IRQ remapping capability at msi domain
  iommu/arm-smmu: do not advertise IOMMU_CAP_INTR_REMAP
  vfio/type1: return MSI mapping requirements with VFIO_IOMMU_GET_INFO

 drivers/iommu/Kconfig                         |   8 +
 drivers/iommu/Makefile                        |   1 +
 drivers/iommu/arm-smmu-v3.c                   |   2 +
 drivers/iommu/arm-smmu.c                      |   4 +-
 drivers/iommu/dma-reserved-iommu.c            | 270 ++++++++++++++++++++
 drivers/iommu/iommu.c                         |   1 +
 drivers/irqchip/irq-gic-v3-its-pci-msi.c      |   3 +-
 drivers/irqchip/irq-gic-v3-its-platform-msi.c |   3 +-
 drivers/vfio/vfio_iommu_type1.c               | 351 +++++++++++++++++++++++++-
 include/linux/dma-reserved-iommu.h            |  78 ++++++
 include/linux/iommu.h                         |   6 +
 include/linux/msi.h                           |  17 ++
 include/uapi/linux/vfio.h                     |  14 +-
 kernel/irq/msi.c                              | 139 +++++++++-
 14 files changed, 885 insertions(+), 12 deletions(-)
 create mode 100644 drivers/iommu/dma-reserved-iommu.c
 create mode 100644 include/linux/dma-reserved-iommu.h

-- 
1.9.1


* [RFC v5 01/17] iommu: Add DOMAIN_ATTR_MSI_MAPPING attribute
  2016-03-01 18:27 [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
@ 2016-03-01 18:27 ` Eric Auger
  2016-03-01 18:27 ` [RFC v5 02/17] iommu/arm-smmu: advertise " Eric Auger
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Eric Auger @ 2016-03-01 18:27 UTC (permalink / raw)
  To: eric.auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel, kvmarm, kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu

Introduce a new DOMAIN_ATTR_MSI_MAPPING domain attribute. If supported,
this means the MSI addresses need to be mapped in the IOMMU.

x86 IOMMUs typically don't expose the attribute since, on x86, MSI write
transaction addresses are always within the 1MB PA region [FEE0_0000h -
FEF0_0000h], a window which directly targets the APIC configuration space
and hence bypasses the IOMMU. On ARM and PowerPC however, MSI transactions
are conveyed through the IOMMU.
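A consumer would probe the attribute via iommu_domain_get_attr() and treat a zero return as "MSI mapping required". A userspace sketch of that contract, with a local enum and a stub standing in for the real driver callback (the names mirror the kernel API, but the enum and stub are illustrative):

```c
#include <errno.h>

/* Local stand-in for the kernel's enum iommu_attr. */
enum iommu_attr { DOMAIN_ATTR_NESTING, DOMAIN_ATTR_MSI_MAPPING };

/*
 * Stub for a driver's domain_get_attr: arm-smmu would return 0 for
 * DOMAIN_ATTR_MSI_MAPPING, while an x86 IOMMU would fall through
 * to -ENODEV for unknown attributes.
 */
static int arm_smmu_domain_get_attr(enum iommu_attr attr, void *data)
{
	(void)data;	/* the data field is not used for this attribute */

	switch (attr) {
	case DOMAIN_ATTR_MSI_MAPPING:
		return 0;	/* supported: MSIs must be IOMMU-mapped */
	default:
		return -ENODEV;
	}
}

/* A caller such as VFIO only cares whether the attribute is exposed. */
static int msi_mapping_required(void)
{
	int data;

	return arm_smmu_domain_get_attr(DOMAIN_ATTR_MSI_MAPPING, &data) == 0;
}
```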

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v4 -> v5:
- introduce the user in the next patch

RFC v1 -> v1:
- the data field is not used
- for this attribute, domain_get_attr simply returns 0 if the MSI_MAPPING
  capability is needed or <0 if not
- removed struct iommu_domain_msi_maps
---
 include/linux/iommu.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a5c539f..a4fe04a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -112,6 +112,7 @@ enum iommu_attr {
 	DOMAIN_ATTR_FSL_PAMU_ENABLE,
 	DOMAIN_ATTR_FSL_PAMUV1,
 	DOMAIN_ATTR_NESTING,	/* two stages of translation */
+	DOMAIN_ATTR_MSI_MAPPING, /* Require MSIs mapping in iommu */
 	DOMAIN_ATTR_MAX,
 };
 
-- 
1.9.1


* [RFC v5 02/17] iommu/arm-smmu: advertise DOMAIN_ATTR_MSI_MAPPING attribute
  2016-03-01 18:27 [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
  2016-03-01 18:27 ` [RFC v5 01/17] iommu: Add DOMAIN_ATTR_MSI_MAPPING attribute Eric Auger
@ 2016-03-01 18:27 ` Eric Auger
  2016-03-01 18:27 ` [RFC v5 03/17] iommu: introduce a reserved iova cookie Eric Auger
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Eric Auger @ 2016-03-01 18:27 UTC (permalink / raw)
  To: eric.auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel, kvmarm, kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu

On ARM, MSI write transactions from devices upstream of the smmu
are conveyed through the iommu. Therefore target physical addresses
must be mapped, and DOMAIN_ATTR_MSI_MAPPING is set to advertise
this requirement on arm-smmu and arm-smmu-v3.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>

---

v4 -> v5:
- don't handle fsl_pamu_domain anymore
- handle arm-smmu-v3
---
 drivers/iommu/arm-smmu-v3.c | 2 ++
 drivers/iommu/arm-smmu.c    | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 2087534..1d7b506 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1895,6 +1895,8 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 	case DOMAIN_ATTR_NESTING:
 		*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
 		return 0;
+	case DOMAIN_ATTR_MSI_MAPPING:
+		return 0;
 	default:
 		return -ENODEV;
 	}
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 59ee4b8..c8b7e71 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1409,6 +1409,8 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 	case DOMAIN_ATTR_NESTING:
 		*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
 		return 0;
+	case DOMAIN_ATTR_MSI_MAPPING:
+		return 0;
 	default:
 		return -ENODEV;
 	}
-- 
1.9.1


* [RFC v5 03/17] iommu: introduce a reserved iova cookie
  2016-03-01 18:27 [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
  2016-03-01 18:27 ` [RFC v5 01/17] iommu: Add DOMAIN_ATTR_MSI_MAPPING attribute Eric Auger
  2016-03-01 18:27 ` [RFC v5 02/17] iommu/arm-smmu: advertise " Eric Auger
@ 2016-03-01 18:27 ` Eric Auger
  2016-03-03 16:26   ` Julien Grall
  2016-03-01 18:27 ` [RFC v5 04/17] dma-reserved-iommu: alloc/free_reserved_iova_domain Eric Auger
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 24+ messages in thread
From: Eric Auger @ 2016-03-01 18:27 UTC (permalink / raw)
  To: eric.auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel, kvmarm, kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu

This patch introduces some new fields in the iommu_domain struct,
dedicated to reserved iova management.

In a similar way to the DMA mapping IOVA window, we need to store
information related to a reserved IOVA window.

The reserved_iova_cookie will store the reserved iova_domain
handle. An RB tree indexed by physical address is introduced to
store the host physical addresses bound to reserved IOVAs.

Those physical addresses will correspond to MSI frame base
addresses, also referred to as doorbells. Their number should be
quite limited per domain.

Also a mutex is introduced to protect accesses to the iova_domain
and RB tree.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 drivers/iommu/iommu.c | 1 +
 include/linux/iommu.h | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 0e3b009..7b2bb94 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1072,6 +1072,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
 
 	domain->ops  = bus->iommu_ops;
 	domain->type = type;
+	mutex_init(&domain->reserved_mutex);
 
 	return domain;
 }
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a4fe04a..0189144 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -82,6 +82,11 @@ struct iommu_domain {
 	void *handler_token;
 	struct iommu_domain_geometry geometry;
 	void *iova_cookie;
+	void *reserved_iova_cookie;
+	/* rb tree indexed by PA, for reserved bindings only */
+	struct rb_root reserved_binding_list;
+	/* protects reserved cookie and rbtree manipulation */
+	struct mutex reserved_mutex;
 };
 
 enum iommu_cap {
-- 
1.9.1


* [RFC v5 04/17] dma-reserved-iommu: alloc/free_reserved_iova_domain
  2016-03-01 18:27 [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (2 preceding siblings ...)
  2016-03-01 18:27 ` [RFC v5 03/17] iommu: introduce a reserved iova cookie Eric Auger
@ 2016-03-01 18:27 ` Eric Auger
  2016-03-01 18:27 ` [RFC v5 05/17] dma-reserved-iommu: reserved binding rb-tree and helpers Eric Auger
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Eric Auger @ 2016-03-01 18:27 UTC (permalink / raw)
  To: eric.auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel, kvmarm, kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu

Introduce alloc/free_reserved_iova_domain in the IOMMU API.
alloc_reserved_iova_domain initializes an iova domain at a given
iova base address and with a given size. This iova domain will
be used to allocate iovas within that window. Those IOVAs will be
reserved for special purposes, typically MSI frame binding. Allocation
functions within the reserved iova domain will be introduced in
subsequent patches.

Those functions are implemented and exposed if CONFIG_IOMMU_DMA_RESERVED
is set.
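The window's base and size must be aligned to the IOMMU page granule derived from @order. A runnable sketch of that validation, mirroring the checks at the top of iommu_alloc_reserved_iova_domain (the window values in the usage note are made up):

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Validate a reserved IOVA window against a page order, as
 * iommu_alloc_reserved_iova_domain() does before allocating
 * the iova_domain.
 */
static int check_reserved_window(uint64_t iova, size_t size,
				 unsigned long order)
{
	unsigned long granule = 1UL << order;	/* IOMMU page size */
	unsigned long mask = granule - 1;

	/* base and size must be granule-aligned, and size non-zero */
	if ((iova & mask) || !size || (size & mask))
		return -EINVAL;
	return 0;
}
```

For example, with 4K pages (order 12), a window at 0x8000000 of size 0x10000 passes, while an unaligned base such as 0x8000100 is rejected.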

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v3 -> v4:
- formerly in "iommu/arm-smmu: implement alloc/free_reserved_iova_domain" &
  "iommu: add alloc/free_reserved_iova_domain"

v2 -> v3:
- remove iommu_alloc_reserved_iova_domain & iommu_free_reserved_iova_domain
  static implementation in case CONFIG_IOMMU_API is not set

v1 -> v2:
- moved from vfio API to IOMMU API
---
 drivers/iommu/Kconfig              |  8 +++++
 drivers/iommu/Makefile             |  1 +
 drivers/iommu/dma-reserved-iommu.c | 74 ++++++++++++++++++++++++++++++++++++++
 include/linux/dma-reserved-iommu.h | 45 +++++++++++++++++++++++
 4 files changed, 128 insertions(+)
 create mode 100644 drivers/iommu/dma-reserved-iommu.c
 create mode 100644 include/linux/dma-reserved-iommu.h

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index a1e75cb..0775143 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -55,6 +55,12 @@ config IOMMU_DMA
 	select IOMMU_API
 	select IOMMU_IOVA
 
+# IOMMU reserved IOVA mapping (MSI doorbell)
+config IOMMU_DMA_RESERVED
+	bool
+	select IOMMU_API
+	select IOMMU_IOVA
+
 config FSL_PAMU
 	bool "Freescale IOMMU support"
 	depends on PPC32
@@ -288,6 +294,7 @@ config SPAPR_TCE_IOMMU
 config ARM_SMMU
 	bool "ARM Ltd. System MMU (SMMU) Support"
 	depends on (ARM64 || ARM) && MMU
+	select IOMMU_DMA_RESERVED
 	select IOMMU_API
 	select IOMMU_IO_PGTABLE_LPAE
 	select ARM_DMA_USE_IOMMU if ARM
@@ -301,6 +308,7 @@ config ARM_SMMU
 config ARM_SMMU_V3
 	bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
 	depends on ARM64 && PCI
+	select IOMMU_DMA_RESERVED
 	select IOMMU_API
 	select IOMMU_IO_PGTABLE_LPAE
 	select GENERIC_MSI_IRQ_DOMAIN
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 42fc0c2..ea68d23 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -2,6 +2,7 @@ obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
 obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
+obj-$(CONFIG_IOMMU_DMA_RESERVED) += dma-reserved-iommu.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
diff --git a/drivers/iommu/dma-reserved-iommu.c b/drivers/iommu/dma-reserved-iommu.c
new file mode 100644
index 0000000..41a1add
--- /dev/null
+++ b/drivers/iommu/dma-reserved-iommu.c
@@ -0,0 +1,74 @@
+/*
+ * Reserved IOVA Management
+ *
+ * Copyright (c) 2015 Linaro Ltd.
+ *              www.linaro.org
+ *
+ * Copyright (C) 2000-2004 Russell King
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/iommu.h>
+#include <linux/iova.h>
+
+int iommu_alloc_reserved_iova_domain(struct iommu_domain *domain,
+				     dma_addr_t iova, size_t size,
+				     unsigned long order)
+{
+	unsigned long granule, mask;
+	struct iova_domain *iovad;
+	int ret = 0;
+
+	granule = 1UL << order;
+	mask = granule - 1;
+	if (iova & mask || (!size) || (size & mask))
+		return -EINVAL;
+
+	mutex_lock(&domain->reserved_mutex);
+
+	if (domain->reserved_iova_cookie) {
+		ret = -EEXIST;
+		goto unlock;
+	}
+
+	iovad = kzalloc(sizeof(struct iova_domain), GFP_KERNEL);
+	if (!iovad) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	init_iova_domain(iovad, granule,
+			 iova >> order, (iova + size - 1) >> order);
+	domain->reserved_iova_cookie = iovad;
+
+unlock:
+	mutex_unlock(&domain->reserved_mutex);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_alloc_reserved_iova_domain);
+
+void iommu_free_reserved_iova_domain(struct iommu_domain *domain)
+{
+	struct iova_domain *iovad =
+		(struct iova_domain *)domain->reserved_iova_cookie;
+
+	if (!iovad)
+		return;
+
+	mutex_lock(&domain->reserved_mutex);
+
+	put_iova_domain(iovad);
+	kfree(iovad);
+
+	mutex_unlock(&domain->reserved_mutex);
+}
+EXPORT_SYMBOL_GPL(iommu_free_reserved_iova_domain);
diff --git a/include/linux/dma-reserved-iommu.h b/include/linux/dma-reserved-iommu.h
new file mode 100644
index 0000000..5bf863b
--- /dev/null
+++ b/include/linux/dma-reserved-iommu.h
@@ -0,0 +1,45 @@
+/*
+ * Copyright (c) 2015 Linaro Ltd.
+ *              www.linaro.org
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+#ifndef __DMA_RESERVED_IOMMU_H
+#define __DMA_RESERVED_IOMMU_H
+
+#ifdef __KERNEL__
+#include <asm/errno.h>
+
+#ifdef CONFIG_IOMMU_DMA_RESERVED
+#include <linux/iommu.h>
+
+/**
+ * iommu_alloc_reserved_iova_domain: allocate the reserved iova domain
+ *
+ * @domain: iommu domain handle
+ * @iova: base iova address
+ * @size: iova window size
+ * @order: page order
+ */
+int iommu_alloc_reserved_iova_domain(struct iommu_domain *domain,
+				     dma_addr_t iova, size_t size,
+				     unsigned long order);
+
+/**
+ * iommu_free_reserved_iova_domain: free the reserved iova domain
+ *
+ * @domain: iommu domain handle
+ */
+void iommu_free_reserved_iova_domain(struct iommu_domain *domain);
+
+#endif	/* CONFIG_IOMMU_DMA_RESERVED */
+#endif	/* __KERNEL__ */
+#endif	/* __DMA_RESERVED_IOMMU_H */
-- 
1.9.1


* [RFC v5 05/17] dma-reserved-iommu: reserved binding rb-tree and helpers
  2016-03-01 18:27 [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (3 preceding siblings ...)
  2016-03-01 18:27 ` [RFC v5 04/17] dma-reserved-iommu: alloc/free_reserved_iova_domain Eric Auger
@ 2016-03-01 18:27 ` Eric Auger
  2016-03-01 18:27 ` [RFC v5 06/17] dma-reserved-iommu: iommu_get/put_single_reserved Eric Auger
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Eric Auger @ 2016-03-01 18:27 UTC (permalink / raw)
  To: eric.auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel, kvmarm, kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu

We will need to track which host physical addresses are mapped to
reserved IOVAs. For that purpose we introduce a new RB tree indexed
by physical address. This RB tree is only used for reserved IOVA
bindings.

It is expected this RB tree will contain very few bindings. Those
generally correspond to single pages mapping one MSI frame (GICv2m
frame or ITS GITS_TRANSLATER frame).
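The tree descends on whole-interval comparisons: a candidate [start, start+size) lies left of a binding if it ends at or before the binding's base, right of it if it starts at or after the binding's end, and overlaps otherwise. A runnable sketch of that predicate (the binding values below are illustrative):

```c
#include <stdint.h>

/* Simplified view of a reserved binding's PA interval. */
struct binding {
	uint64_t addr;	/* base physical address */
	uint64_t size;
};

/*
 * -1: descend left subtree, 1: descend right subtree, 0: overlap,
 * mirroring the comparisons used by find_reserved_binding().
 */
static int compare_interval(const struct binding *b,
			    uint64_t start, uint64_t size)
{
	if (start + size <= b->addr)
		return -1;
	if (start >= b->addr + b->size)
		return 1;
	return 0;
}
```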

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v3 -> v4:
- that code was formerly in "iommu/arm-smmu: add a reserved binding RB tree"
---
 drivers/iommu/dma-reserved-iommu.c | 60 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/drivers/iommu/dma-reserved-iommu.c b/drivers/iommu/dma-reserved-iommu.c
index 41a1add..30d54d0 100644
--- a/drivers/iommu/dma-reserved-iommu.c
+++ b/drivers/iommu/dma-reserved-iommu.c
@@ -20,6 +20,66 @@
 #include <linux/iommu.h>
 #include <linux/iova.h>
 
+struct iommu_reserved_binding {
+	struct kref		kref;
+	struct rb_node		node;
+	struct iommu_domain	*domain;
+	phys_addr_t		addr;
+	dma_addr_t		iova;
+	size_t			size;
+};
+
+/* Reserved binding RB-tree manipulation */
+
+static struct iommu_reserved_binding *find_reserved_binding(
+				    struct iommu_domain *d,
+				    phys_addr_t start, size_t size)
+{
+	struct rb_node *node = d->reserved_binding_list.rb_node;
+
+	while (node) {
+		struct iommu_reserved_binding *binding =
+			rb_entry(node, struct iommu_reserved_binding, node);
+
+		if (start + size <= binding->addr)
+			node = node->rb_left;
+		else if (start >= binding->addr + binding->size)
+			node = node->rb_right;
+		else
+			return binding;
+	}
+
+	return NULL;
+}
+
+static void link_reserved_binding(struct iommu_domain *d,
+				  struct iommu_reserved_binding *new)
+{
+	struct rb_node **link = &d->reserved_binding_list.rb_node;
+	struct rb_node *parent = NULL;
+	struct iommu_reserved_binding *binding;
+
+	while (*link) {
+		parent = *link;
+		binding = rb_entry(parent, struct iommu_reserved_binding,
+				   node);
+
+		if (new->addr + new->size <= binding->addr)
+			link = &(*link)->rb_left;
+		else
+			link = &(*link)->rb_right;
+	}
+
+	rb_link_node(&new->node, parent, link);
+	rb_insert_color(&new->node, &d->reserved_binding_list);
+}
+
+static void unlink_reserved_binding(struct iommu_domain *d,
+				    struct iommu_reserved_binding *old)
+{
+	rb_erase(&old->node, &d->reserved_binding_list);
+}
+
 int iommu_alloc_reserved_iova_domain(struct iommu_domain *domain,
 				     dma_addr_t iova, size_t size,
 				     unsigned long order)
-- 
1.9.1


* [RFC v5 06/17] dma-reserved-iommu: iommu_get/put_single_reserved
  2016-03-01 18:27 [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (4 preceding siblings ...)
  2016-03-01 18:27 ` [RFC v5 05/17] dma-reserved-iommu: reserved binding rb-tree and helpers Eric Auger
@ 2016-03-01 18:27 ` Eric Auger
  2016-03-10 11:52   ` Jean-Philippe Brucker
  2016-03-01 18:27 ` [RFC v5 07/17] dma-reserved-iommu: iommu_unmap_reserved Eric Auger
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 24+ messages in thread
From: Eric Auger @ 2016-03-01 18:27 UTC (permalink / raw)
  To: eric.auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel, kvmarm, kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu

This patch introduces iommu_get/put_single_reserved.

iommu_get_single_reserved allocates a new reserved iova page and maps
it onto the physical page that contains a given physical address.
The page size is the IOMMU page size. It is the responsibility of the
system integrator to make sure the IOMMU page size in use corresponds
to the granularity of the MSI frame.

It returns the iova that is mapped onto the provided physical address.
Hence the physical address passed as argument does not need to be aligned.

In case a mapping already exists between both pages, the IOVA mapped
to the PA is directly returned.

Each time an iova is successfully returned, a binding ref count is
incremented.

iommu_put_single_reserved decrements the ref count and, when it reaches
zero, destroys the mapping and releases the iova.
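The get/put semantics can be illustrated with a toy single-slot refcount model (a userspace sketch, not the kernel implementation; the single fixed slot, page size and addresses are all illustrative):

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT	12
#define TOY_PAGE_MASK	((uint64_t)((1UL << TOY_PAGE_SHIFT) - 1))

/* Toy single-slot model of one reserved binding with a ref count. */
static struct {
	uint64_t pa_page;	/* aligned doorbell PA, 0 = unused */
	uint64_t iova_page;	/* aligned IOVA mapped onto it */
	int refs;
} slot;

static uint64_t next_iova = 0xfff00000ULL;	/* toy IOVA allocator */

/* Return an IOVA for @pa, creating or reusing the page binding. */
static uint64_t get_single_reserved(uint64_t pa)
{
	uint64_t aligned = pa & ~TOY_PAGE_MASK;
	uint64_t off = pa & TOY_PAGE_MASK;

	if (slot.refs && slot.pa_page == aligned) {
		slot.refs++;			/* existing binding: ref++ */
	} else {
		slot.pa_page = aligned;		/* new binding: map one page */
		slot.iova_page = next_iova;
		next_iova += TOY_PAGE_MASK + 1;
		slot.refs = 1;
	}
	/* unaligned PAs are fine: the offset carries over to the IOVA */
	return slot.iova_page + off;
}

/* Drop a reference; tear the mapping down on the last put. */
static void put_single_reserved(void)
{
	if (--slot.refs == 0)
		slot.pa_page = 0;
}
```

Two gets on addresses within the same doorbell page share one binding (ref count 2) and return IOVAs differing only in the page offset; two puts then release it.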

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Ankit Jindal <ajindal@apm.com>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>

---

v3 -> v4:
- formerly in iommu: iommu_get/put_single_reserved &
  iommu/arm-smmu: implement iommu_get/put_single_reserved
- Attempted to address Marc's doubts about missing size/alignment
  at VFIO level (user-space knows the IOMMU page size and the number
  of IOVA pages to provision)

v2 -> v3:
- remove static implementation of iommu_get_single_reserved &
  iommu_put_single_reserved when CONFIG_IOMMU_API is not set

v1 -> v2:
- previously a VFIO API, named vfio_alloc_map/unmap_free_reserved_iova
---
 drivers/iommu/dma-reserved-iommu.c | 115 +++++++++++++++++++++++++++++++++++++
 include/linux/dma-reserved-iommu.h |  26 +++++++++
 2 files changed, 141 insertions(+)

diff --git a/drivers/iommu/dma-reserved-iommu.c b/drivers/iommu/dma-reserved-iommu.c
index 30d54d0..537c83e 100644
--- a/drivers/iommu/dma-reserved-iommu.c
+++ b/drivers/iommu/dma-reserved-iommu.c
@@ -132,3 +132,118 @@ void iommu_free_reserved_iova_domain(struct iommu_domain *domain)
 	mutex_unlock(&domain->reserved_mutex);
 }
 EXPORT_SYMBOL_GPL(iommu_free_reserved_iova_domain);
+
+int iommu_get_single_reserved(struct iommu_domain *domain,
+			      phys_addr_t addr, int prot,
+			      dma_addr_t *iova)
+{
+	unsigned long order = __ffs(domain->ops->pgsize_bitmap);
+	size_t page_size = 1 << order;
+	phys_addr_t mask = page_size - 1;
+	phys_addr_t aligned_addr = addr & ~mask;
+	phys_addr_t offset  = addr - aligned_addr;
+	struct iommu_reserved_binding *b;
+	struct iova *p_iova;
+	struct iova_domain *iovad =
+		(struct iova_domain *)domain->reserved_iova_cookie;
+	int ret;
+
+	if (!iovad)
+		return -EINVAL;
+
+	mutex_lock(&domain->reserved_mutex);
+
+	b = find_reserved_binding(domain, aligned_addr, page_size);
+	if (b) {
+		*iova = b->iova + offset;
+		kref_get(&b->kref);
+		ret = 0;
+		goto unlock;
+	}
+
+	/* there is no existing reserved iova for this pa */
+	p_iova = alloc_iova(iovad, 1, iovad->dma_32bit_pfn, true);
+	if (!p_iova) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
+	*iova = p_iova->pfn_lo << order;
+
+	b = kzalloc(sizeof(*b), GFP_KERNEL);
+	if (!b) {
+		ret = -ENOMEM;
+		goto free_iova_unlock;
+	}
+
+	ret = iommu_map(domain, *iova, aligned_addr, page_size, prot);
+	if (ret)
+		goto free_binding_iova_unlock;
+
+	kref_init(&b->kref);
+	kref_get(&b->kref);
+	b->domain = domain;
+	b->addr = aligned_addr;
+	b->iova = *iova;
+	b->size = page_size;
+
+	link_reserved_binding(domain, b);
+
+	*iova += offset;
+	goto unlock;
+
+free_binding_iova_unlock:
+	kfree(b);
+free_iova_unlock:
+	free_iova(iovad, *iova >> order);
+unlock:
+	mutex_unlock(&domain->reserved_mutex);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_get_single_reserved);
+
+/* called with reserved_mutex locked */
+static void reserved_binding_release(struct kref *kref)
+{
+	struct iommu_reserved_binding *b =
+		container_of(kref, struct iommu_reserved_binding, kref);
+	struct iommu_domain *d = b->domain;
+	struct iova_domain *iovad =
+		(struct iova_domain *)d->reserved_iova_cookie;
+	unsigned long order = __ffs(b->size);
+
+	iommu_unmap(d, b->iova, b->size);
+	free_iova(iovad, b->iova >> order);
+	unlink_reserved_binding(d, b);
+	kfree(b);
+}
+
+void iommu_put_single_reserved(struct iommu_domain *domain, dma_addr_t iova)
+{
+	unsigned long order;
+	phys_addr_t aligned_addr;
+	dma_addr_t aligned_iova, page_size, mask, offset;
+	struct iommu_reserved_binding *b;
+
+	order = __ffs(domain->ops->pgsize_bitmap);
+	page_size = (uint64_t)1 << order;
+	mask = page_size - 1;
+
+	aligned_iova = iova & ~mask;
+	offset = iova - aligned_iova;
+
+	aligned_addr = iommu_iova_to_phys(domain, aligned_iova);
+
+	mutex_lock(&domain->reserved_mutex);
+
+	b = find_reserved_binding(domain, aligned_addr, page_size);
+	if (!b)
+		goto unlock;
+	kref_put(&b->kref, reserved_binding_release);
+
+unlock:
+	mutex_unlock(&domain->reserved_mutex);
+}
+EXPORT_SYMBOL_GPL(iommu_put_single_reserved);
+
+
+
diff --git a/include/linux/dma-reserved-iommu.h b/include/linux/dma-reserved-iommu.h
index 5bf863b..71ec800 100644
--- a/include/linux/dma-reserved-iommu.h
+++ b/include/linux/dma-reserved-iommu.h
@@ -40,6 +40,32 @@ int iommu_alloc_reserved_iova_domain(struct iommu_domain *domain,
  */
 void iommu_free_reserved_iova_domain(struct iommu_domain *domain);
 
+/**
+ * iommu_get_single_reserved: allocate a reserved iova page and bind
+ * it onto the page that contains a physical address (@addr)
+ *
+ * @domain: iommu domain handle
+ * @addr: physical address to bind
+ * @prot: mapping protection attribute
+ * @iova: returned iova
+ *
+ * In case the page containing @addr is already bound, simply return
+ * the existing @iova and increment its reference count.
+ */
+int iommu_get_single_reserved(struct iommu_domain *domain,
+			      phys_addr_t addr, int prot,
+			      dma_addr_t *iova);
+
+/**
+ * iommu_put_single_reserved: decrement the reference count of the iova page
+ *
+ * @domain: iommu domain handle
+ * @iova: iova whose binding ref count is decremented
+ *
+ * If the reference count drops to zero, unmap the iova page and release the iova
+ */
+void iommu_put_single_reserved(struct iommu_domain *domain, dma_addr_t iova);
+
 #endif	/* CONFIG_IOMMU_DMA_RESERVED */
 #endif	/* __KERNEL__ */
 #endif	/* __DMA_RESERVED_IOMMU_H */
-- 
1.9.1
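For readers skimming the patch, the alignment arithmetic in iommu_put_single_reserved() (order from the lowest bit of pgsize_bitmap, then rounding the IOVA down) can be modelled in plain userspace C. This is only an illustrative sketch; the type alias and helper names below are stand-ins, not kernel API:

```c
#include <stdint.h>

typedef uint64_t dma_addr_t;	/* userspace stand-in for the kernel type */

/* Index of the lowest set bit of pgsize_bitmap, i.e. the order of the
 * smallest IOMMU page supported (__ffs() in the kernel). */
static unsigned int smallest_page_order(unsigned long pgsize_bitmap)
{
	return (unsigned int)__builtin_ctzl(pgsize_bitmap);
}

/* Round an IOVA down to the smallest-page boundary, as the put path
 * does before looking up the reserved binding. */
static dma_addr_t align_iova_down(dma_addr_t iova, unsigned long pgsize_bitmap)
{
	dma_addr_t page_size = (dma_addr_t)1 << smallest_page_order(pgsize_bitmap);

	return iova & ~(page_size - 1);
}
```

With a 4K-only bitmap (0x1000) the order is 12, so any IOVA inside a 4K page collapses to the same aligned address used as the binding key.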

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC v5 07/17] dma-reserved-iommu: iommu_unmap_reserved
  2016-03-01 18:27 [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (5 preceding siblings ...)
  2016-03-01 18:27 ` [RFC v5 06/17] dma-reserved-iommu: iommu_get/put_single_reserved Eric Auger
@ 2016-03-01 18:27 ` Eric Auger
  2016-03-01 18:27 ` [RFC v5 08/17] msi: Add a new MSI_FLAG_IRQ_REMAPPING flag Eric Auger
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Eric Auger @ 2016-03-01 18:27 UTC (permalink / raw)
  To: eric.auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel, kvmarm, kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu

Introduce a new function that unmaps all allocated reserved IOVAs and
frees the reserved IOVA domain.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v3 -> v4:
- previously "iommu/arm-smmu: relinquish reserved resources on
  domain deletion"
---
 drivers/iommu/dma-reserved-iommu.c | 27 ++++++++++++++++++++++++---
 include/linux/dma-reserved-iommu.h |  7 +++++++
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/dma-reserved-iommu.c b/drivers/iommu/dma-reserved-iommu.c
index 537c83e..7217bb7 100644
--- a/drivers/iommu/dma-reserved-iommu.c
+++ b/drivers/iommu/dma-reserved-iommu.c
@@ -116,7 +116,7 @@ unlock:
 }
 EXPORT_SYMBOL_GPL(iommu_alloc_reserved_iova_domain);
 
-void iommu_free_reserved_iova_domain(struct iommu_domain *domain)
+void __iommu_free_reserved_iova_domain(struct iommu_domain *domain)
 {
 	struct iova_domain *iovad =
 		(struct iova_domain *)domain->reserved_iova_cookie;
@@ -124,11 +124,14 @@ void iommu_free_reserved_iova_domain(struct iommu_domain *domain)
 	if (!iovad)
 		return;
 
-	mutex_lock(&domain->reserved_mutex);
-
 	put_iova_domain(iovad);
 	kfree(iovad);
+}
 
+void iommu_free_reserved_iova_domain(struct iommu_domain *domain)
+{
+	mutex_lock(&domain->reserved_mutex);
+	__iommu_free_reserved_iova_domain(domain);
 	mutex_unlock(&domain->reserved_mutex);
 }
 EXPORT_SYMBOL_GPL(iommu_free_reserved_iova_domain);
@@ -245,5 +248,23 @@ unlock:
 }
 EXPORT_SYMBOL_GPL(iommu_put_single_reserved);
 
+void iommu_unmap_reserved(struct iommu_domain *domain)
+{
+	struct rb_node *node;
+
+	mutex_lock(&domain->reserved_mutex);
+	while ((node = rb_first(&domain->reserved_binding_list))) {
+		struct iommu_reserved_binding *b =
+			rb_entry(node, struct iommu_reserved_binding, node);
+
+		while (!kref_put(&b->kref, reserved_binding_release))
+			;
+	}
+	domain->reserved_binding_list = RB_ROOT;
+	__iommu_free_reserved_iova_domain(domain);
+	mutex_unlock(&domain->reserved_mutex);
+}
+EXPORT_SYMBOL_GPL(iommu_unmap_reserved);
+
 
 
diff --git a/include/linux/dma-reserved-iommu.h b/include/linux/dma-reserved-iommu.h
index 71ec800..766c58c 100644
--- a/include/linux/dma-reserved-iommu.h
+++ b/include/linux/dma-reserved-iommu.h
@@ -66,6 +66,13 @@ int iommu_get_single_reserved(struct iommu_domain *domain,
  */
 void iommu_put_single_reserved(struct iommu_domain *domain, dma_addr_t iova);
 
+/**
+ * iommu_unmap_reserved: unmap & destroy the reserved iova bindings
+ *
+ * @domain: iommu domain handle
+ */
+void iommu_unmap_reserved(struct iommu_domain *domain);
+
 #endif	/* CONFIG_IOMMU_DMA_RESERVED */
 #endif	/* __KERNEL__ */
 #endif	/* __DMA_RESERVED_IOMMU_H */
-- 
1.9.1


* [RFC v5 08/17] msi: Add a new MSI_FLAG_IRQ_REMAPPING flag
  2016-03-01 18:27 [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (6 preceding siblings ...)
  2016-03-01 18:27 ` [RFC v5 07/17] dma-reserved-iommu: iommu_unmap_reserved Eric Auger
@ 2016-03-01 18:27 ` Eric Auger
  2016-03-01 18:27 ` [RFC v5 09/17] irqchip/gic-v3-its: ITS advertises MSI_FLAG_IRQ_REMAPPING Eric Auger
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Eric Auger @ 2016-03-01 18:27 UTC (permalink / raw)
  To: eric.auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel, kvmarm, kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu

Let's introduce a new msi_domain_info flag, MSI_FLAG_IRQ_REMAPPING,
meant to advertise that the domain supports IRQ remapping, also known
as an Interrupt Translation Service. On Intel hardware this IRQ
remapping capability is abstracted on the IOMMU side, while on ARM it
is abstracted on the MSI controller side. This flag will be used to
decide whether MSI passthrough is safe.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v4 -> v5:
- separate flag introduction from first user addition (ITS)
---
 include/linux/msi.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index a2a0068..03eda72 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -261,6 +261,8 @@ enum {
 	MSI_FLAG_MULTI_PCI_MSI		= (1 << 3),
 	/* Support PCI MSIX interrupts */
 	MSI_FLAG_PCI_MSIX		= (1 << 4),
+	/* Support MSI IRQ remapping service */
+	MSI_FLAG_IRQ_REMAPPING		= (1 << 5),
 };
 
 int msi_domain_set_affinity(struct irq_data *data, const struct cpumask *mask,
-- 
1.9.1
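A minimal userspace model of how a consumer (VFIO, in later patches) could test this flag is sketched below. The flag values mirror the patch; the struct and helper names are hypothetical stand-ins, not the kernel's msi_domain_info API:

```c
#include <stdbool.h>
#include <stddef.h>

/* Flag values as defined by the patch (include/linux/msi.h) */
enum {
	MSI_FLAG_MULTI_PCI_MSI = (1 << 3),
	MSI_FLAG_PCI_MSIX      = (1 << 4),
	MSI_FLAG_IRQ_REMAPPING = (1 << 5),
};

/* Illustrative stand-in for struct msi_domain_info */
struct msi_domain_info_model {
	unsigned int flags;
};

/* MSIs routed through this domain are considered isolated only when
 * the domain advertises MSI_FLAG_IRQ_REMAPPING. */
static bool domain_has_irq_remapping(const struct msi_domain_info_model *info)
{
	return info != NULL && (info->flags & MSI_FLAG_IRQ_REMAPPING) != 0;
}
```

A domain lacking the flag (as GICv2m would, in this series) is treated as unsafe unless allow_unsafe_interrupts is set.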


* [RFC v5 09/17] irqchip/gic-v3-its: ITS advertises MSI_FLAG_IRQ_REMAPPING
  2016-03-01 18:27 [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (7 preceding siblings ...)
  2016-03-01 18:27 ` [RFC v5 08/17] msi: Add a new MSI_FLAG_IRQ_REMAPPING flag Eric Auger
@ 2016-03-01 18:27 ` Eric Auger
  2016-03-01 18:27 ` [RFC v5 10/17] msi: export msi_get_domain_info Eric Auger
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Eric Auger @ 2016-03-01 18:27 UTC (permalink / raw)
  To: eric.auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel, kvmarm, kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu

The ITS is the first ARM MSI controller to advertise the new
MSI_FLAG_IRQ_REMAPPING flag, since it implements an interrupt
translation service. This hardware support provides MSI isolation,
a feature relied upon for KVM device passthrough.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v5: new
---
 drivers/irqchip/irq-gic-v3-its-pci-msi.c      | 3 ++-
 drivers/irqchip/irq-gic-v3-its-platform-msi.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its-pci-msi.c b/drivers/irqchip/irq-gic-v3-its-pci-msi.c
index aee60ed..8223765 100644
--- a/drivers/irqchip/irq-gic-v3-its-pci-msi.c
+++ b/drivers/irqchip/irq-gic-v3-its-pci-msi.c
@@ -96,7 +96,8 @@ static struct msi_domain_ops its_pci_msi_ops = {
 
 static struct msi_domain_info its_pci_msi_domain_info = {
 	.flags	= (MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
-		   MSI_FLAG_MULTI_PCI_MSI | MSI_FLAG_PCI_MSIX),
+		   MSI_FLAG_MULTI_PCI_MSI | MSI_FLAG_PCI_MSIX |
+		   MSI_FLAG_IRQ_REMAPPING),
 	.ops	= &its_pci_msi_ops,
 	.chip	= &its_msi_irq_chip,
 };
diff --git a/drivers/irqchip/irq-gic-v3-its-platform-msi.c b/drivers/irqchip/irq-gic-v3-its-platform-msi.c
index 470b4aa..8c0d69d 100644
--- a/drivers/irqchip/irq-gic-v3-its-platform-msi.c
+++ b/drivers/irqchip/irq-gic-v3-its-platform-msi.c
@@ -63,7 +63,8 @@ static struct msi_domain_ops its_pmsi_ops = {
 };
 
 static struct msi_domain_info its_pmsi_domain_info = {
-	.flags	= (MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS),
+	.flags	= (MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
+		   MSI_FLAG_IRQ_REMAPPING),
 	.ops	= &its_pmsi_ops,
 	.chip	= &its_pmsi_irq_chip,
 };
-- 
1.9.1


* [RFC v5 10/17] msi: export msi_get_domain_info
  2016-03-01 18:27 [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (8 preceding siblings ...)
  2016-03-01 18:27 ` [RFC v5 09/17] irqchip/gic-v3-its: ITS advertises MSI_FLAG_IRQ_REMAPPING Eric Auger
@ 2016-03-01 18:27 ` Eric Auger
  2016-03-01 18:27 ` [RFC v5 11/17] msi: msi_compose wrapper Eric Auger
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Eric Auger @ 2016-03-01 18:27 UTC (permalink / raw)
  To: eric.auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel, kvmarm, kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu

We plan to use msi_get_domain_info in the VFIO module, so let's export it.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v2 -> v3:
- remove static implementation in case CONFIG_PCI_MSI_IRQ_DOMAIN is not set
---
 kernel/irq/msi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 38e89ce..9b0ba4a 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -400,5 +400,6 @@ struct msi_domain_info *msi_get_domain_info(struct irq_domain *domain)
 {
 	return (struct msi_domain_info *)domain->host_data;
 }
+EXPORT_SYMBOL_GPL(msi_get_domain_info);
 
 #endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */
-- 
1.9.1


* [RFC v5 11/17] msi: msi_compose wrapper
  2016-03-01 18:27 [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (9 preceding siblings ...)
  2016-03-01 18:27 ` [RFC v5 10/17] msi: export msi_get_domain_info Eric Auger
@ 2016-03-01 18:27 ` Eric Auger
  2016-03-01 18:27 ` [RFC v5 12/17] msi: IOMMU map the doorbell address when needed Eric Auger
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Eric Auger @ 2016-03-01 18:27 UTC (permalink / raw)
  To: eric.auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel, kvmarm, kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu

Currently the MSI message is composed by calling
irq_chip_compose_msi_msg directly, and erased by zeroing the message
memory.

On some platforms, this composition will need to become more involved
in order to properly handle MSI emission through an IOMMU. We will
also need to track when the MSI message is erased.

We propose to introduce a common wrapper covering both the actual
composition and the erasure: msi_compose.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---
v4 -> v5:
- just introduce the msi-compose wrapper without adding new
  functionalities

v3 -> v4:
- that code was formerly in irq-gic-common.c
  "irqchip/gicv2m/v3-its-pci-msi: IOMMU map the MSI frame when needed"
  also the [un]mapping was done in irq_write_msi_msg; now done on compose

v2 -> v3:
- protect iova/addr manipulation with CONFIG_ARCH_DMA_ADDR_T_64BIT and
  CONFIG_PHYS_ADDR_T_64BIT
- only expose gic_pci_msi_domain_write_msg in case CONFIG_IOMMU_API &
  CONFIG_PCI_MSI_IRQ_DOMAIN are set.
- gic_set/unset_msi_addr duly become static
---
 kernel/irq/msi.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 9b0ba4a..72bf4d6 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -55,6 +55,19 @@ static inline void irq_chip_write_msi_msg(struct irq_data *data,
 	data->chip->irq_write_msi_msg(data, msg);
 }
 
+static int msi_compose(struct irq_data *irq_data,
+		       struct msi_msg *msg, bool erase)
+{
+	int ret = 0;
+
+	if (erase)
+		memset(msg, 0, sizeof(*msg));
+	else
+		ret = irq_chip_compose_msi_msg(irq_data, msg);
+
+	return ret;
+}
+
 /**
  * msi_domain_set_affinity - Generic affinity setter function for MSI domains
  * @irq_data:	The irq data associated to the interrupt
@@ -73,7 +86,7 @@ int msi_domain_set_affinity(struct irq_data *irq_data,
 
 	ret = parent->chip->irq_set_affinity(parent, mask, force);
 	if (ret >= 0 && ret != IRQ_SET_MASK_OK_DONE) {
-		BUG_ON(irq_chip_compose_msi_msg(irq_data, &msg));
+		BUG_ON(msi_compose(irq_data, &msg, false));
 		irq_chip_write_msi_msg(irq_data, &msg);
 	}
 
@@ -85,7 +98,7 @@ static void msi_domain_activate(struct irq_domain *domain,
 {
 	struct msi_msg msg;
 
-	BUG_ON(irq_chip_compose_msi_msg(irq_data, &msg));
+	BUG_ON(msi_compose(irq_data, &msg, false));
 	irq_chip_write_msi_msg(irq_data, &msg);
 }
 
@@ -94,7 +107,7 @@ static void msi_domain_deactivate(struct irq_domain *domain,
 {
 	struct msi_msg msg;
 
-	memset(&msg, 0, sizeof(msg));
+	msi_compose(irq_data, &msg, true);
 	irq_chip_write_msi_msg(irq_data, &msg);
 }
 
-- 
1.9.1


* [RFC v5 12/17] msi: IOMMU map the doorbell address when needed
  2016-03-01 18:27 [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (10 preceding siblings ...)
  2016-03-01 18:27 ` [RFC v5 11/17] msi: msi_compose wrapper Eric Auger
@ 2016-03-01 18:27 ` Eric Auger
  2016-03-01 18:27 ` [RFC v5 13/17] vfio: introduce VFIO_IOVA_RESERVED vfio_dma type Eric Auger
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Eric Auger @ 2016-03-01 18:27 UTC (permalink / raw)
  To: eric.auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel, kvmarm, kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu

When an MSI is emitted by a device attached to an IOMMU domain, and
that domain requires MSI mapping, the MSI address (aka the doorbell)
must be mapped in the IOMMU; otherwise the MSI write transaction will
cause a fault.

We perform this action at MSI message composition time. On any MSI
address change, and on MSI message erasure, we decrement the
reference count on the IOMMU binding.

In case the mapping fails we simply WARN_ON.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v5:
- use macros to increase the readability
- add comments
- fix a typo that caused a compilation error if CONFIG_IOMMU_API
  is not set
---
 include/linux/msi.h |  15 +++++++
 kernel/irq/msi.c    | 119 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 134 insertions(+)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index 03eda72..b920cac 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -10,6 +10,21 @@ struct msi_msg {
 	u32	data;		/* 16 bits of msi message data */
 };
 
+/* Helpers to convert the msi message address to an iova/physical address */
+#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
+#define msg_to_dma_addr(msg) \
+	(((dma_addr_t)((msg)->address_hi) << 32) | (msg)->address_lo)
+#else
+#define msg_to_dma_addr(msg) ((msg)->address_lo)
+#endif
+
+#ifdef CONFIG_PHYS_ADDR_T_64BIT
+#define msg_to_phys_addr(msg) \
+	(((phys_addr_t)((msg)->address_hi) << 32) | (msg)->address_lo)
+#else
+#define msg_to_phys_addr(msg)	((msg)->address_lo)
+#endif
+
 extern int pci_msi_ignore_mask;
 /* Helper functions */
 struct irq_data;
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 72bf4d6..8ddbe57 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -17,6 +17,8 @@
 
 /* Temparory solution for building, will be removed later */
 #include <linux/pci.h>
+#include <linux/iommu.h>
+#include <linux/dma-reserved-iommu.h>
 
 struct msi_desc *alloc_msi_entry(struct device *dev)
 {
@@ -55,16 +57,133 @@ static inline void irq_chip_write_msi_msg(struct irq_data *data,
 	data->chip->irq_write_msi_msg(data, msg);
 }
 
+/**
+ * msi_map_doorbell: make sure an IOMMU mapping exists on domain @d
+ * for the message physical address (aka. doorbell)
+ *
+ * Either allocate an IOVA and create a mapping or simply increment
+ * a reference count on the existing IOMMU mapping
+ * @d: iommu domain handle the mapping belongs to
+ * @msg: msi message handle
+ */
+static int msi_map_doorbell(struct iommu_domain *d, struct msi_msg *msg)
+{
+#ifdef CONFIG_IOMMU_DMA_RESERVED
+	phys_addr_t addr;
+	dma_addr_t iova;
+	int ret;
+
+	addr = msg_to_phys_addr(msg);
+	ret = iommu_get_single_reserved(d, addr, IOMMU_WRITE, &iova);
+	if (!ret) {
+		msg->address_lo = lower_32_bits(iova);
+		msg->address_hi = upper_32_bits(iova);
+	}
+	return ret;
+#else
+	return -ENODEV;
+#endif
+}
+
+/**
+ * msi_unmap_doorbell: decrements the reference count on an existing
+ * doorbell IOMMU mapping
+ *
+ * @d: iommu domain the mapping is attached to
+ * @msg: msi message containing the doorbell IOVA to unbind
+ */
+static void msi_unmap_doorbell(struct iommu_domain *d, struct msi_msg *msg)
+{
+#ifdef CONFIG_IOMMU_DMA_RESERVED
+	dma_addr_t iova;
+
+	iova = msg_to_dma_addr(msg);
+	iommu_put_single_reserved(d, iova);
+#endif
+}
+
+#ifdef CONFIG_IOMMU_API
+/**
+ * irq_data_to_msi_mapping_domain: checks if an irq corresponds to
+ * an MSI whose write address must be mapped in an IOMMU domain
+ *
+ * determine whether the irq corresponds to an MSI emitted by a device,
+ * upstream to an IOMMU, and if this IOMMU requires a binding of the
+ * MSI address
+ *
+ * @irq_data: irq data handle
+ */
+static struct iommu_domain *
+irq_data_to_msi_mapping_domain(struct irq_data *irq_data)
+{
+	struct iommu_domain *d;
+	struct msi_desc *desc;
+	struct device *dev;
+	int ret;
+
+	desc = irq_data_get_msi_desc(irq_data);
+	if (!desc)
+		return NULL;
+
+	dev = msi_desc_to_dev(desc);
+
+	d = iommu_get_domain_for_dev(dev);
+	if (!d)
+		return NULL;
+
+	ret = iommu_domain_get_attr(d, DOMAIN_ATTR_MSI_MAPPING, NULL);
+	if (!ret)
+		return d;
+	else
+		return NULL;
+}
+#else
+static inline struct iommu_domain *
+irq_data_to_msi_mapping_domain(struct irq_data *irq_data)
+{
+	return NULL;
+}
+#endif /* CONFIG_IOMMU_API */
+
 static int msi_compose(struct irq_data *irq_data,
 		       struct msi_msg *msg, bool erase)
 {
+	struct msi_msg old_msg;
+	struct iommu_domain *d;
 	int ret = 0;
 
+	/*
+	 * Does this IRQ require an MSI address mapping in an IOMMU?
+	 * If it does, read the existing cached message. This allows us
+	 * to check whether the IOMMU mapping needs an update.
+	 */
+	d = irq_data_to_msi_mapping_domain(irq_data);
+	if (unlikely(d))
+		get_cached_msi_msg(irq_data->irq, &old_msg);
+
 	if (erase)
 		memset(msg, 0, sizeof(*msg));
 	else
 		ret = irq_chip_compose_msi_msg(irq_data, msg);
 
+	if (!d)
+		goto out;
+
+	/*
+	 * An MSI address IOMMU binding needs to be handled.
+	 * In case we have a change in the MSI address or an MSI
+	 * message erasure, destroy the existing binding.
+	 * In case we have an actual MSI message composition,
+	 * bind the new MSI address.
+	 */
+	if ((old_msg.address_lo != msg->address_lo) ||
+	    (old_msg.address_hi != msg->address_hi))
+		msi_unmap_doorbell(d, &old_msg);
+
+	if (!erase)
+		WARN_ON(msi_map_doorbell(d, msg));
+
+out:
 	return ret;
 }
 
-- 
1.9.1
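The address_hi/address_lo round trip that msi_map_doorbell() performs — reading the doorbell with msg_to_phys_addr() and writing the allocated IOVA back with lower_32_bits()/upper_32_bits() — can be sketched in standalone C. The struct below is a userspace stand-in for struct msi_msg, assuming 64-bit phys_addr_t/dma_addr_t:

```c
#include <stdint.h>

/* Userspace model of struct msi_msg */
struct msi_msg_model {
	uint32_t address_lo;	/* low 32 bits of msi message address */
	uint32_t address_hi;	/* high 32 bits of msi message address */
	uint32_t data;		/* 16 bits of msi message data */
};

/* Equivalent of msg_to_phys_addr()/msg_to_dma_addr() on 64-bit configs */
static uint64_t msg_to_addr(const struct msi_msg_model *msg)
{
	return ((uint64_t)msg->address_hi << 32) | msg->address_lo;
}

/* The rewrite msi_map_doorbell() performs once an IOVA is allocated */
static void msg_set_addr(struct msi_msg_model *msg, uint64_t addr)
{
	msg->address_lo = (uint32_t)(addr & 0xffffffffu);	/* lower_32_bits() */
	msg->address_hi = (uint32_t)(addr >> 32);		/* upper_32_bits() */
}
```

After the rewrite, the device writes its MSI to the IOVA rather than to the doorbell's physical address, and the IOMMU translates it back.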


* [RFC v5 13/17] vfio: introduce VFIO_IOVA_RESERVED vfio_dma type
  2016-03-01 18:27 [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (11 preceding siblings ...)
  2016-03-01 18:27 ` [RFC v5 12/17] msi: IOMMU map the doorbell address when needed Eric Auger
@ 2016-03-01 18:27 ` Eric Auger
  2016-03-01 18:27 ` [RFC v5 14/17] vfio: allow the user to register reserved iova range for MSI mapping Eric Auger
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Eric Auger @ 2016-03-01 18:27 UTC (permalink / raw)
  To: eric.auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel, kvmarm, kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu

We introduce a vfio_dma type since we will need to discriminate legacy
vfio_dma entries from the new reserved ones. Since the latter are not
mapped at registration time, some code paths (removal, replay) need to
be reworked. For now those paths simply skip reserved entries; they
are reworked in subsequent patches.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 drivers/vfio/vfio_iommu_type1.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 6f1ea3d..692e9a2 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -53,6 +53,15 @@ module_param_named(disable_hugepages,
 MODULE_PARM_DESC(disable_hugepages,
 		 "Disable VFIO IOMMU support for IOMMU hugepages.");
 
+enum vfio_iova_type {
+	VFIO_IOVA_USER = 0, /* standard IOVA used to map user vaddr */
+	/*
+	 * IOVA reserved to map special host physical addresses,
+	 * MSI frames for instance
+	 */
+	VFIO_IOVA_RESERVED,
+};
+
 struct vfio_iommu {
 	struct list_head	domain_list;
 	struct mutex		lock;
@@ -75,6 +84,7 @@ struct vfio_dma {
 	unsigned long		vaddr;		/* Process virtual addr */
 	size_t			size;		/* Map size (bytes) */
 	int			prot;		/* IOMMU_READ/WRITE */
+	enum vfio_iova_type	type;		/* type of IOVA */
 };
 
 struct vfio_group {
@@ -395,7 +405,8 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
 
 static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
 {
-	vfio_unmap_unpin(iommu, dma);
+	if (likely(dma->type != VFIO_IOVA_RESERVED))
+		vfio_unmap_unpin(iommu, dma);
 	vfio_unlink_dma(iommu, dma);
 	kfree(dma);
 }
@@ -671,6 +682,10 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 		dma_addr_t iova;
 
 		dma = rb_entry(n, struct vfio_dma, node);
+
+		if (unlikely(dma->type == VFIO_IOVA_RESERVED))
+			continue;
+
 		iova = dma->iova;
 
 		while (iova < dma->iova + dma->size) {
-- 
1.9.1


* [RFC v5 14/17] vfio: allow the user to register reserved iova range for MSI mapping
  2016-03-01 18:27 [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (12 preceding siblings ...)
  2016-03-01 18:27 ` [RFC v5 13/17] vfio: introduce VFIO_IOVA_RESERVED vfio_dma type Eric Auger
@ 2016-03-01 18:27 ` Eric Auger
  2016-03-01 18:27 ` [RFC v5 15/17] vfio/type1: also check IRQ remapping capability at msi domain Eric Auger
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Eric Auger @ 2016-03-01 18:27 UTC (permalink / raw)
  To: eric.auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel, kvmarm, kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu

The user is allowed to [un]register a reserved IOVA range by using the
DMA MAP API and setting the new flag, VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
The user provides the base address and the size. This region is stored
in the vfio_dma rb-tree. At that point the IOVA range is not mapped to
any target address yet. The host kernel will use those IOVAs when
needed, typically when the VFIO-PCI device allocates its MSIs.

This patch also handles the destruction of the reserved binding RB-tree and
domain's iova_domains.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>

---
v3 -> v4:
- use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
- protect vfio_register_reserved_iova_range implementation with
  CONFIG_IOMMU_DMA_RESERVED
- handle unregistration by user-space and on vfio_iommu_type1 release

v1 -> v2:
- set returned value according to alloc_reserved_iova_domain result
- free the iova domains in case any error occurs

RFC v1 -> v1:
- takes into account Alex comments, based on
  [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
- use the existing dma map/unmap ioctl interface with a flag to register
  a reserved IOVA range. A single reserved iova region is allowed.
---
 drivers/vfio/vfio_iommu_type1.c | 141 +++++++++++++++++++++++++++++++++++++++-
 include/uapi/linux/vfio.h       |  12 +++-
 2 files changed, 150 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 692e9a2..4e01ebe 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -36,6 +36,7 @@
 #include <linux/uaccess.h>
 #include <linux/vfio.h>
 #include <linux/workqueue.h>
+#include <linux/dma-reserved-iommu.h>
 
 #define DRIVER_VERSION  "0.2"
 #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
@@ -403,10 +404,22 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
 	vfio_lock_acct(-unlocked);
 }
 
+static void vfio_unmap_reserved(struct vfio_iommu *iommu)
+{
+#ifdef CONFIG_IOMMU_DMA_RESERVED
+	struct vfio_domain *d;
+
+	list_for_each_entry(d, &iommu->domain_list, next)
+		iommu_unmap_reserved(d->domain);
+#endif
+}
+
 static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
 {
 	if (likely(dma->type != VFIO_IOVA_RESERVED))
 		vfio_unmap_unpin(iommu, dma);
+	else
+		vfio_unmap_reserved(iommu);
 	vfio_unlink_dma(iommu, dma);
 	kfree(dma);
 }
@@ -489,7 +502,8 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 	 */
 	if (iommu->v2) {
 		dma = vfio_find_dma(iommu, unmap->iova, 0);
-		if (dma && dma->iova != unmap->iova) {
+		if (dma && (dma->iova != unmap->iova ||
+			   (dma->type == VFIO_IOVA_RESERVED))) {
 			ret = -EINVAL;
 			goto unlock;
 		}
@@ -501,6 +515,10 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 	}
 
 	while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) {
+		if (dma->type == VFIO_IOVA_RESERVED) {
+			ret = -EINVAL;
+			goto unlock;
+		}
 		if (!iommu->v2 && unmap->iova > dma->iova)
 			break;
 		unmapped += dma->size;
@@ -650,6 +668,114 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 	return ret;
 }
 
+static int vfio_register_reserved_iova_range(struct vfio_iommu *iommu,
+			   struct vfio_iommu_type1_dma_map *map)
+{
+#ifdef CONFIG_IOMMU_DMA_RESERVED
+	dma_addr_t iova = map->iova;
+	size_t size = map->size;
+	uint64_t mask;
+	struct vfio_dma *dma;
+	int ret = 0;
+	struct vfio_domain *d;
+	unsigned long order;
+
+	/* Verify that none of our __u64 fields overflow */
+	if (map->size != size || map->iova != iova)
+		return -EINVAL;
+
+	order =  __ffs(vfio_pgsize_bitmap(iommu));
+	mask = ((uint64_t)1 << order) - 1;
+
+	WARN_ON(mask & PAGE_MASK);
+
+	if (!size || (size | iova) & mask)
+		return -EINVAL;
+
+	/* Don't allow IOVA address wrap */
+	if (iova + size - 1 < iova)
+		return -EINVAL;
+
+	mutex_lock(&iommu->lock);
+
+	if (vfio_find_dma(iommu, iova, size)) {
+		ret =  -EEXIST;
+		goto out;
+	}
+
+	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
+	if (!dma) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	dma->iova = iova;
+	dma->size = size;
+	dma->type = VFIO_IOVA_RESERVED;
+
+	list_for_each_entry(d, &iommu->domain_list, next)
+		ret |= iommu_alloc_reserved_iova_domain(d->domain, iova,
+							size, order);
+
+	if (ret) {
+		list_for_each_entry(d, &iommu->domain_list, next)
+			iommu_free_reserved_iova_domain(d->domain);
+		goto out;
+	}
+
+	vfio_link_dma(iommu, dma);
+
+out:
+	mutex_unlock(&iommu->lock);
+	return ret;
+#else /* CONFIG_IOMMU_DMA_RESERVED */
+	return -ENODEV;
+#endif
+}
+
+static void vfio_unregister_reserved_iova_range(struct vfio_iommu *iommu,
+				struct vfio_iommu_type1_dma_unmap *unmap)
+{
+#ifdef CONFIG_IOMMU_DMA_RESERVED
+	dma_addr_t iova = unmap->iova;
+	struct vfio_dma *dma;
+	size_t size = unmap->size;
+	uint64_t mask;
+	unsigned long order;
+
+	/* Verify that none of our __u64 fields overflow */
+	if (unmap->size != size || unmap->iova != iova)
+		return;
+
+	order =  __ffs(vfio_pgsize_bitmap(iommu));
+	mask = ((uint64_t)1 << order) - 1;
+
+	WARN_ON(mask & PAGE_MASK);
+
+	if (!size || (size | iova) & mask)
+		return;
+
+	/* Don't allow IOVA address wrap */
+	if (iova + size - 1 < iova)
+		return;
+
+	mutex_lock(&iommu->lock);
+
+	dma = vfio_find_dma(iommu, iova, size);
+
+	if (!dma || (dma->type != VFIO_IOVA_RESERVED)) {
+		unmap->size = 0;
+		goto out;
+	}
+
+	unmap->size =  dma->size;
+	vfio_remove_dma(iommu, dma);
+
+out:
+	mutex_unlock(&iommu->lock);
+#endif
+}
+
 static int vfio_bus_type(struct device *dev, void *data)
 {
 	struct bus_type **bus = data;
@@ -946,6 +1072,7 @@ static void vfio_iommu_type1_release(void *iommu_data)
 	struct vfio_group *group, *group_tmp;
 
 	vfio_iommu_unmap_unpin_all(iommu);
+	vfio_unmap_reserved(iommu);
 
 	list_for_each_entry_safe(domain, domain_tmp,
 				 &iommu->domain_list, next) {
@@ -1019,7 +1146,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 	} else if (cmd == VFIO_IOMMU_MAP_DMA) {
 		struct vfio_iommu_type1_dma_map map;
 		uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
-				VFIO_DMA_MAP_FLAG_WRITE;
+				VFIO_DMA_MAP_FLAG_WRITE |
+				VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA;
 
 		minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
 
@@ -1029,6 +1157,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		if (map.argsz < minsz || map.flags & ~mask)
 			return -EINVAL;
 
+		if (map.flags & VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA)
+			return vfio_register_reserved_iova_range(iommu, &map);
+
 		return vfio_dma_do_map(iommu, &map);
 
 	} else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
@@ -1043,10 +1174,16 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		if (unmap.argsz < minsz || unmap.flags)
 			return -EINVAL;
 
+		if (unmap.flags & VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA) {
+			vfio_unregister_reserved_iova_range(iommu, &unmap);
+			goto out;
+		}
+
 		ret = vfio_dma_do_unmap(iommu, &unmap);
 		if (ret)
 			return ret;
 
+out:
 		return copy_to_user((void __user *)arg, &unmap, minsz);
 	}
 
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 7d7a4c6..d5a48e7 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -410,12 +410,21 @@ struct vfio_iommu_type1_info {
  *
  * Map process virtual addresses to IO virtual addresses using the
  * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
+ *
+ * In case the MSI_RESERVED_IOVA flag is set, the API only registers an
+ * IOVA region which will be used on some platforms to map the host MSI
+ * frame. In that specific case, vaddr and prot are ignored. The need to
+ * provision such an IOVA range can be checked by calling
+ * VFIO_IOMMU_GET_INFO with the VFIO_IOMMU_INFO_REQUIRE_MSI_MAP
+ * attribute. A single MSI_RESERVED_IOVA region can be registered.
  */
 struct vfio_iommu_type1_dma_map {
 	__u32	argsz;
 	__u32	flags;
 #define VFIO_DMA_MAP_FLAG_READ (1 << 0)		/* readable from device */
 #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)	/* writable from device */
+/* reserved iova for MSI vectors */
+#define VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA (1 << 2)
 	__u64	vaddr;				/* Process virtual address */
 	__u64	iova;				/* IO virtual address */
 	__u64	size;				/* Size of mapping (bytes) */
@@ -431,7 +440,8 @@ struct vfio_iommu_type1_dma_map {
  * Caller sets argsz.  The actual unmapped size is returned in the size
  * field.  No guarantee is made to the user that arbitrary unmaps of iova
  * or size different from those used in the original mapping call will
- * succeed.
+ * succeed. A reserved IOVA region must be unmapped with the
+ * MSI_RESERVED_IOVA flag set.
  */
 struct vfio_iommu_type1_dma_unmap {
 	__u32	argsz;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC v5 15/17] vfio/type1: also check IRQ remapping capability at msi domain
  2016-03-01 18:27 [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (13 preceding siblings ...)
  2016-03-01 18:27 ` [RFC v5 14/17] vfio: allow the user to register reserved iova range for MSI mapping Eric Auger
@ 2016-03-01 18:27 ` Eric Auger
  2016-03-01 18:27 ` [RFC v5 16/17] iommu/arm-smmu: do not advertise IOMMU_CAP_INTR_REMAP Eric Auger
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Eric Auger @ 2016-03-01 18:27 UTC (permalink / raw)
  To: eric.auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel, kvmarm, kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu

On x86, IRQ remapping is abstracted by the IOMMU. On ARM it is abstracted
by the MSI controller. vfio_safe_irq_domain checks whether interrupts are
"safe" for a given device: they are if the device does not use MSI, or if
it uses MSI and its msi-parent controller supports IRQ remapping.

Then we check at group level if all devices have safe interrupts: if not,
we only allow the group to be attached if allow_unsafe_interrupts is set.

At this point the ARM SMMU still advertises IOMMU_CAP_INTR_REMAP. This
is changed in the next patch.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---
v3 -> v4:
- rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
  and irq_remapping into safe_irq_domains

v2 -> v3:
- protect vfio_msi_parent_irq_remapping_capable with
  CONFIG_GENERIC_MSI_IRQ_DOMAIN
---
 drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 4e01ebe..88a40f1 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -37,6 +37,8 @@
 #include <linux/vfio.h>
 #include <linux/workqueue.h>
 #include <linux/dma-reserved-iommu.h>
+#include <linux/irqdomain.h>
+#include <linux/msi.h>
 
 #define DRIVER_VERSION  "0.2"
 #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
@@ -788,6 +790,33 @@ static int vfio_bus_type(struct device *dev, void *data)
 	return 0;
 }
 
+/**
+ * vfio_safe_irq_domain: check whether the irq domain the device is
+ * attached to is safe with respect to MSI isolation.
+ * If the irq domain is not an MSI domain, it is considered safe.
+ *
+ * @dev: device handle
+ * @data: unused
+ * returns 0 if the irq domain is safe, -1 if not.
+ */
+static int vfio_safe_irq_domain(struct device *dev, void *data)
+{
+#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
+	struct irq_domain *domain;
+	struct msi_domain_info *info;
+
+	domain = dev_get_msi_domain(dev);
+	if (!domain)
+		return 0;
+
+	info = msi_get_domain_info(domain);
+
+	if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
+		return -1;
+#endif
+	return 0;
+}
+
 static int vfio_iommu_replay(struct vfio_iommu *iommu,
 			     struct vfio_domain *domain)
 {
@@ -882,7 +911,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	struct vfio_group *group, *g;
 	struct vfio_domain *domain, *d;
 	struct bus_type *bus = NULL;
-	int ret;
+	int ret, safe_irq_domains;
 
 	mutex_lock(&iommu->lock);
 
@@ -905,6 +934,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 
 	group->iommu_group = iommu_group;
 
+	/*
+	 * Determine if all the devices of the group have a safe irq domain
+	 * with respect to MSI isolation
+	 */
+	safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
+				       vfio_safe_irq_domain);
+
 	/* Determine bus_type in order to allocate a domain */
 	ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
 	if (ret)
@@ -932,8 +968,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	INIT_LIST_HEAD(&domain->group_list);
 	list_add(&group->next, &domain->group_list);
 
+	/*
+	 * To advertise safe interrupts, either the IOMMU or the MSI
+	 * controllers must support IRQ remapping/interrupt translation.
+	 */
 	if (!allow_unsafe_interrupts &&
-	    !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
+	    (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
 		pr_warn("%s: No interrupt remapping support.  Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
 		       __func__);
 		ret = -EPERM;
-- 
1.9.1


* [RFC v5 16/17] iommu/arm-smmu: do not advertise IOMMU_CAP_INTR_REMAP
  2016-03-01 18:27 [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (14 preceding siblings ...)
  2016-03-01 18:27 ` [RFC v5 15/17] vfio/type1: also check IRQ remapping capability at msi domain Eric Auger
@ 2016-03-01 18:27 ` Eric Auger
  2016-03-01 18:27 ` [RFC v5 17/17] vfio/type1: return MSI mapping requirements with VFIO_IOMMU_GET_INFO Eric Auger
  2016-03-02  8:11 ` [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64 Jaggi, Manish
  17 siblings, 0 replies; 24+ messages in thread
From: Eric Auger @ 2016-03-01 18:27 UTC (permalink / raw)
  To: eric.auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel, kvmarm, kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu

Do not advertise IOMMU_CAP_INTR_REMAP for arm-smmu. On ARM the IRQ
remapping capability is abstracted on the irqchip side, as opposed to
the Intel IOMMU, which features IRQ remapping HW.

So to check the IRQ remapping capability, the MSI domain needs to be
checked instead.

This commit needs to be applied after "vfio/type1: also check IRQ
remapping capability at msi domain" else the legacy interrupt
assignment gets broken with arm-smmu.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 drivers/iommu/arm-smmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c8b7e71..ce988fb 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1284,7 +1284,7 @@ static bool arm_smmu_capable(enum iommu_cap cap)
 		 */
 		return true;
 	case IOMMU_CAP_INTR_REMAP:
-		return true; /* MSIs are just memory writes */
+		return false; /* interrupt translation handled at MSI controller level */
 	case IOMMU_CAP_NOEXEC:
 		return true;
 	default:
-- 
1.9.1


* [RFC v5 17/17] vfio/type1: return MSI mapping requirements with VFIO_IOMMU_GET_INFO
  2016-03-01 18:27 [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (15 preceding siblings ...)
  2016-03-01 18:27 ` [RFC v5 16/17] iommu/arm-smmu: do not advertise IOMMU_CAP_INTR_REMAP Eric Auger
@ 2016-03-01 18:27 ` Eric Auger
  2016-03-02  8:11 ` [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64 Jaggi, Manish
  17 siblings, 0 replies; 24+ messages in thread
From: Eric Auger @ 2016-03-01 18:27 UTC (permalink / raw)
  To: eric.auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel, kvmarm, kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu

This patch allows user space to know whether MSI addresses need to be
mapped in the IOMMU. User space uses the VFIO_IOMMU_GET_INFO ioctl, and
VFIO_IOMMU_INFO_REQUIRE_MSI_MAP gets set if they do.

Also the number of IOMMU pages needed to map them is returned in the
msi_iova_pages field. User space must use this information to allocate a
contiguous IOVA region of size msi_iova_pages * (smallest page size in
iova_pgsizes) and pass it through the VFIO_IOMMU_MAP_DMA ioctl with
VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA set.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

Currently it is assumed that a single doorbell page is used per MSI
controller. This is the case for the known ARM MSI controllers (GICv2M,
GICv3 ITS, ...). If an MSI controller were to expose more doorbells, it
could implement a new callback at the irq_chip interface.

v4 -> v5:
- move msi_info and ret declaration within the conditional code

v3 -> v4:
- replace former vfio_domains_require_msi_mapping by
  more complex computation of MSI mapping requirements, especially the
  number of pages to be provided by the user-space.
- reword patch title

RFC v1 -> v1:
- derived from
  [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
- renamed allow_msi_reconfig into require_msi_mapping
- fixed VFIO_IOMMU_GET_INFO
---
 drivers/vfio/vfio_iommu_type1.c | 149 ++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h       |   2 +
 2 files changed, 151 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 88a40f1..17a941c 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -39,6 +39,7 @@
 #include <linux/dma-reserved-iommu.h>
 #include <linux/irqdomain.h>
 #include <linux/msi.h>
+#include <linux/irq.h>
 
 #define DRIVER_VERSION  "0.2"
 #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
@@ -95,6 +96,18 @@ struct vfio_group {
 	struct list_head	next;
 };
 
+struct vfio_irq_chip {
+	struct list_head next;
+	struct irq_chip *chip;
+};
+
+struct vfio_msi_map_info {
+	bool mapping_required;
+	size_t page_size;
+	unsigned int iova_pages;
+	struct list_head irq_chip_list;
+};
+
 /*
  * This code handles mapping and unmapping of user data buffers
  * into DMA'ble space using the IOMMU
@@ -267,6 +280,128 @@ static int vaddr_get_pfn(unsigned long vaddr, int prot, unsigned long *pfn)
 	return ret;
 }
 
+#if defined(CONFIG_GENERIC_MSI_IRQ_DOMAIN) && defined(CONFIG_IOMMU_DMA_RESERVED)
+/**
+ * vfio_dev_compute_msi_map_info: augment MSI mapping info (@data) with
+ * the @dev device requirements.
+ *
+ * @dev: device handle
+ * @data: opaque pointing to a struct vfio_msi_map_info
+ *
+ * returns 0 upon success or -ENOMEM
+ */
+static int vfio_dev_compute_msi_map_info(struct device *dev, void *data)
+{
+	struct irq_domain *domain;
+	struct msi_domain_info *info;
+	struct vfio_msi_map_info *msi_info = (struct vfio_msi_map_info *)data;
+	struct irq_chip *chip;
+	struct vfio_irq_chip *iter, *new;
+
+	domain = dev_get_msi_domain(dev);
+	if (!domain)
+		return 0;
+
+	/* Let's compute the needs for the MSI domain */
+	info = msi_get_domain_info(domain);
+	chip = info->chip;
+	list_for_each_entry(iter, &msi_info->irq_chip_list, next) {
+		if (iter->chip == chip)
+			return 0;
+	}
+
+	new = kzalloc(sizeof(*new), GFP_KERNEL);
+	if (!new)
+		return -ENOMEM;
+
+	new->chip = chip;
+
+	list_add(&new->next, &msi_info->irq_chip_list);
+
+	/*
+	 * New irq_chip to be taken into account; we currently assume
+	 * a single IOVA doorbell per irq chip requesting MSI mapping.
+	 */
+	msi_info->iova_pages += 1;
+	return 0;
+}
+
+/**
+ * vfio_domain_compute_msi_map_info: compute MSI mapping requirements (@data)
+ * for vfio_domain @d
+ *
+ * @d: vfio domain handle
+ * @data: opaque pointing to a struct vfio_msi_map_info
+ *
+ * returns 0 upon success or -ENOMEM
+ */
+static int vfio_domain_compute_msi_map_info(struct vfio_domain *d, void *data)
+{
+	int ret = 0;
+	struct vfio_msi_map_info *msi_info = (struct vfio_msi_map_info *)data;
+	struct vfio_irq_chip *iter, *tmp;
+	struct vfio_group *g;
+
+	msi_info->iova_pages = 0;
+	INIT_LIST_HEAD(&msi_info->irq_chip_list);
+
+	if (iommu_domain_get_attr(d->domain,
+				   DOMAIN_ATTR_MSI_MAPPING, NULL))
+		return 0;
+	msi_info->mapping_required = true;
+	list_for_each_entry(g, &d->group_list, next) {
+		ret = iommu_group_for_each_dev(g->iommu_group, msi_info,
+			   vfio_dev_compute_msi_map_info);
+		if (ret)
+			goto out;
+	}
+out:
+	list_for_each_entry_safe(iter, tmp, &msi_info->irq_chip_list, next) {
+		list_del(&iter->next);
+		kfree(iter);
+	}
+	return ret;
+}
+
+/**
+ * vfio_compute_msi_map_info: compute MSI mapping requirements
+ *
+ * Do some MSI addresses need to be mapped? IOMMU page size?
+ * Max number of IOVA pages needed by any domain to map MSI
+ *
+ * @iommu: iommu handle
+ * @info: msi map info handle
+ *
+ * returns 0 upon success or -ENOMEM
+ */
+static int vfio_compute_msi_map_info(struct vfio_iommu *iommu,
+				 struct vfio_msi_map_info *msi_info)
+{
+	int ret = 0;
+	struct vfio_domain *d;
+	unsigned long bitmap = ULONG_MAX;
+	unsigned int iova_pages = 0;
+
+	msi_info->mapping_required = false;
+
+	mutex_lock(&iommu->lock);
+	list_for_each_entry(d, &iommu->domain_list, next) {
+		bitmap &= d->domain->ops->pgsize_bitmap;
+		ret = vfio_domain_compute_msi_map_info(d, msi_info);
+		if (ret)
+			goto out;
+		if (msi_info->iova_pages > iova_pages)
+			iova_pages = msi_info->iova_pages;
+	}
+out:
+	msi_info->page_size = 1 << __ffs(bitmap);
+	msi_info->iova_pages = iova_pages;
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
+#endif
+
 /*
  * Attempt to pin pages.  We really don't want to track all the pfns and
  * the iommu can only map chunks of consecutive pfns anyway, so get the
@@ -1179,6 +1314,20 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 
 		info.flags = VFIO_IOMMU_INFO_PGSIZES;
 
+#if defined(CONFIG_GENERIC_MSI_IRQ_DOMAIN) && defined(CONFIG_IOMMU_DMA_RESERVED)
+		{
+			struct vfio_msi_map_info msi_info;
+			int ret;
+
+			ret = vfio_compute_msi_map_info(iommu, &msi_info);
+			if (ret)
+				return ret;
+
+			if (msi_info.mapping_required)
+				info.flags |= VFIO_IOMMU_INFO_REQUIRE_MSI_MAP;
+			info.msi_iova_pages = msi_info.iova_pages;
+		}
+#endif
 		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
 
 		return copy_to_user((void __user *)arg, &info, minsz);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index d5a48e7..863c68a 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -400,7 +400,9 @@ struct vfio_iommu_type1_info {
 	__u32	argsz;
 	__u32	flags;
 #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
+#define VFIO_IOMMU_INFO_REQUIRE_MSI_MAP (1 << 1)/* MSI must be mapped */
 	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
+	__u32   msi_iova_pages;	/* number of IOVA pages needed to map MSIs */
 };
 
 #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
-- 
1.9.1


* Re: [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64
  2016-03-01 18:27 [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (16 preceding siblings ...)
  2016-03-01 18:27 ` [RFC v5 17/17] vfio/type1: return MSI mapping requirements with VFIO_IOMMU_GET_INFO Eric Auger
@ 2016-03-02  8:11 ` Jaggi, Manish
  2016-03-02 12:30   ` Eric Auger
  17 siblings, 1 reply; 24+ messages in thread
From: Jaggi, Manish @ 2016-03-02  8:11 UTC (permalink / raw)
  To: Eric Auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel, kvmarm, kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Bharat.Bhushan,
	pranav.sawargaonkar, p.fedin, iommu



>>From: Eric Auger <eric.auger@linaro.org>
>>Sent: Tuesday, March 1, 2016 11:57 PM
>>To: eric.auger@st.com; eric.auger@linaro.org; robin.murphy@arm.com; alex.williamson@redhat.com; will.deacon@arm.com; joro@8bytes.org; tglx@linutronix.de; >>jason@lakedaemon.net; marc.zyngier@arm.com; christoffer.dall@linaro.org; linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org
>>Cc: suravee.suthikulpanit@amd.com; patches@linaro.org; linux-kernel@vger.kernel.org; Jaggi, Manish; Bharat.Bhushan@freescale.com; >>pranav.sawargaonkar@gmail.com; p.fedin@samsung.com; iommu@lists.linux-foundation.org
>>Subject: [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64

>>This series addresses KVM PCIe passthrough with MSI enabled on ARM/ARM64.
>>It pursues the efforts done on [1], [2], [3]. It also aims at covering the
>>same need on PowerPC platforms although the same kind of integration
>>should be carried out.
>>
[snip]
>>- Not tested: ARM GICv3 ITS

[snip]
>>QEMU Integration:
>>[RFC v2 0/8] KVM PCI/MSI passthrough with mach-virt
>>(http://lists.gnu.org/archive/html/qemu-arm/2016-01/msg00444.html)
>>https://git.linaro.org/people/eric.auger/qemu.git/shortlog/refs/heads/v2.5.0-pci-passthrough-rfc-v2

For GICv3 ITS, I believe the below QEMU and kernel series are required:

[RFC PATCH v3 0/5] vITS support
https://lists.gnu.org/archive/html/qemu-devel/2015-11/msg05197.html

and in kernel CONFIG_HAVE_KVM_MSI must be enabled so that qemu sees MSI capability KVM_CAP_SIGNAL_MSI

This has a dependency on gsi routing support
KVM: arm/arm64: gsi routing support
https://lkml.org/lkml/2015/6/29/290

I had both of the above series in my local 4.2 tree.

BR
-Manish


* Re: [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64
  2016-03-02  8:11 ` [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64 Jaggi, Manish
@ 2016-03-02 12:30   ` Eric Auger
  0 siblings, 0 replies; 24+ messages in thread
From: Eric Auger @ 2016-03-02 12:30 UTC (permalink / raw)
  To: Jaggi, Manish, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel, kvmarm, kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Bharat.Bhushan,
	pranav.sawargaonkar, p.fedin, iommu

Hi Manish,
On 03/02/2016 09:11 AM, Jaggi, Manish wrote:
> 
> 
>>> From: Eric Auger <eric.auger@linaro.org>
>>> Sent: Tuesday, March 1, 2016 11:57 PM
>>> To: eric.auger@st.com; eric.auger@linaro.org; robin.murphy@arm.com; alex.williamson@redhat.com; will.deacon@arm.com; joro@8bytes.org; tglx@linutronix.de; >>jason@lakedaemon.net; marc.zyngier@arm.com; christoffer.dall@linaro.org; linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org
>>> Cc: suravee.suthikulpanit@amd.com; patches@linaro.org; linux-kernel@vger.kernel.org; Jaggi, Manish; Bharat.Bhushan@freescale.com; >>pranav.sawargaonkar@gmail.com; p.fedin@samsung.com; iommu@lists.linux-foundation.org
>>> Subject: [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64
> 
>>> This series addresses KVM PCIe passthrough with MSI enabled on ARM/ARM64.
>>> It pursues the efforts done on [1], [2], [3]. It also aims at covering the
>>> same need on PowerPC platforms although the same kind of integration
>>> should be carried out.
>>>
> [snip]
>>> - Not tested: ARM GICv3 ITS
> 
> [snip]
>>> QEMU Integration:
>>> [RFC v2 0/8] KVM PCI/MSI passthrough with mach-virt
>>> (http://lists.gnu.org/archive/html/qemu-arm/2016-01/msg00444.html)
>>> https://git.linaro.org/people/eric.auger/qemu.git/shortlog/refs/heads/v2.5.0-pci-passthrough-rfc-v2
> 
> For gicv3 its, I believe, the below series for qemu and kernel is required for gicv3-its
> 
> [RFC PATCH v3 0/5] vITS support
> https://lists.gnu.org/archive/html/qemu-devel/2015-11/msg05197.html
> 
> and in kernel CONFIG_HAVE_KVM_MSI must be enabled so that qemu sees MSI capability KVM_CAP_SIGNAL_MSI
> 
> This has a dependency on gsi routing support
> KVM: arm/arm64: gsi routing support
> https://lkml.org/lkml/2015/6/29/290

which has a dependency on Andre's ITS emulation series too.

The Kernel series will be resent soon on top on new vgic design.
> 
> I had both the above series in 4.2 in my local 4.2 tree. 

Did you have a chance to test with GICv3 ITS already?

Best Regards

Eric


> 
> BR
> -Manish
> 
> 


* Re: [RFC v5 03/17] iommu: introduce a reserved iova cookie
  2016-03-01 18:27 ` [RFC v5 03/17] iommu: introduce a reserved iova cookie Eric Auger
@ 2016-03-03 16:26   ` Julien Grall
  2016-03-29 17:26     ` Eric Auger
  0 siblings, 1 reply; 24+ messages in thread
From: Julien Grall @ 2016-03-03 16:26 UTC (permalink / raw)
  To: Eric Auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel, kvmarm, kvm
  Cc: patches, Manish.Jaggi, linux-kernel, iommu

Hi Eric,

On 01/03/16 18:27, Eric Auger wrote:
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 0e3b009..7b2bb94 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1072,6 +1072,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
>
>   	domain->ops  = bus->iommu_ops;
>   	domain->type = type;
> +	mutex_init(&domain->reserved_mutex);

For consistency, the RB-tree reserved_binding_list should be initialized 
too:

domain->reserved_binding_list = RB_ROOT;

Cheers,

-- 
Julien Grall


* Re: [RFC v5 06/17] dma-reserved-iommu: iommu_get/put_single_reserved
  2016-03-01 18:27 ` [RFC v5 06/17] dma-reserved-iommu: iommu_get/put_single_reserved Eric Auger
@ 2016-03-10 11:52   ` Jean-Philippe Brucker
  2016-03-29 17:07     ` Eric Auger
  0 siblings, 1 reply; 24+ messages in thread
From: Jean-Philippe Brucker @ 2016-03-10 11:52 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, robin.murphy, alex.williamson, will.deacon, joro,
	tglx, jason, marc.zyngier, christoffer.dall, linux-arm-kernel,
	kvmarm, kvm, patches, Manish.Jaggi, linux-kernel, iommu

Hi Eric,

On Tue, Mar 01, 2016 at 06:27:46PM +0000, Eric Auger wrote:
>[...]
> +
> +int iommu_get_single_reserved(struct iommu_domain *domain,
> +			      phys_addr_t addr, int prot,
> +			      dma_addr_t *iova)
> +{
> +	unsigned long order = __ffs(domain->ops->pgsize_bitmap);
> +	size_t page_size = 1 << order;
> +	phys_addr_t mask = page_size - 1;
> +	phys_addr_t aligned_addr = addr & ~mask;
> +	phys_addr_t offset  = addr - aligned_addr;
> +	struct iommu_reserved_binding *b;
> +	struct iova *p_iova;
> +	struct iova_domain *iovad =
> +		(struct iova_domain *)domain->reserved_iova_cookie;
> +	int ret;
> +
> +	if (!iovad)
> +		return -EINVAL;
> +
> +	mutex_lock(&domain->reserved_mutex);

I believe this function could get called from the chunk of __setup_irq
that is executed atomically:

    * request_threaded_irq
    * __setup_irq
    * irq_startup
    * irq_domain_activate_irq
    * msi_domain_activate
    * msi_compose
    * iommu_get_single_reserved

If this is the case, we should probably use a spinlock to protect the
iova_domain...

> +
> +	b = find_reserved_binding(domain, aligned_addr, page_size);
> +	if (b) {
> +		*iova = b->iova + offset;
> +		kref_get(&b->kref);
> +		ret = 0;
> +		goto unlock;
> +	}
> +
> +	/* there is no existing reserved iova for this pa */
> +	p_iova = alloc_iova(iovad, 1, iovad->dma_32bit_pfn, true);
> +	if (!p_iova) {
> +		ret = -ENOMEM;
> +		goto unlock;
> +	}
> +	*iova = p_iova->pfn_lo << order;
> +
> +	b = kzalloc(sizeof(*b), GFP_KERNEL);

... and GFP_ATOMIC here.

Thanks,
Jean-Philippe

> +	if (!b) {
> +		ret = -ENOMEM;
> +		goto free_iova_unlock;
> +	}
> +
> +	ret = iommu_map(domain, *iova, aligned_addr, page_size, prot);
> +	if (ret)
> +		goto free_binding_iova_unlock;
> +
> +	kref_init(&b->kref);
> +	kref_get(&b->kref);
> +	b->domain = domain;
> +	b->addr = aligned_addr;
> +	b->iova = *iova;
> +	b->size = page_size;
> +
> +	link_reserved_binding(domain, b);
> +
> +	*iova += offset;
> +	goto unlock;
> +
> +free_binding_iova_unlock:
> +	kfree(b);
> +free_iova_unlock:
> +	free_iova(iovad, *iova >> order);
> +unlock:
> +	mutex_unlock(&domain->reserved_mutex);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_get_single_reserved);


* Re: [RFC v5 06/17] dma-reserved-iommu: iommu_get/put_single_reserved
  2016-03-10 11:52   ` Jean-Philippe Brucker
@ 2016-03-29 17:07     ` Eric Auger
  0 siblings, 0 replies; 24+ messages in thread
From: Eric Auger @ 2016-03-29 17:07 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: eric.auger, robin.murphy, alex.williamson, will.deacon, joro,
	tglx, jason, marc.zyngier, christoffer.dall, linux-arm-kernel,
	kvmarm, kvm, patches, Manish.Jaggi, linux-kernel, iommu

Hi Jean-Philippe,
On 03/10/2016 12:52 PM, Jean-Philippe Brucker wrote:
> Hi Eric,
> 
> On Tue, Mar 01, 2016 at 06:27:46PM +0000, Eric Auger wrote:
>> [...]
>> +
>> +int iommu_get_single_reserved(struct iommu_domain *domain,
>> +			      phys_addr_t addr, int prot,
>> +			      dma_addr_t *iova)
>> +{
>> +	unsigned long order = __ffs(domain->ops->pgsize_bitmap);
>> +	size_t page_size = 1 << order;
>> +	phys_addr_t mask = page_size - 1;
>> +	phys_addr_t aligned_addr = addr & ~mask;
>> +	phys_addr_t offset  = addr - aligned_addr;
>> +	struct iommu_reserved_binding *b;
>> +	struct iova *p_iova;
>> +	struct iova_domain *iovad =
>> +		(struct iova_domain *)domain->reserved_iova_cookie;
>> +	int ret;
>> +
>> +	if (!iovad)
>> +		return -EINVAL;
>> +
>> +	mutex_lock(&domain->reserved_mutex);
> 
> I believe this function could get called from the chunk of __setup_irq
> that is executed atomically:
> 
>     * request_threaded_irq
>     * __setup_irq
>     * irq_startup
>     * irq_domain_activate_irq
>     * msi_domain_activate
>     * msi_compose
>     * iommu_get_single_reserved
> 
> If this is the case, we should probably use a spinlock to protect the
> iova_domain...
Apologies for the delay, I was on vacation.
Thank you for spotting this flow. I will rework the locking.
> 
>> +
>> +	b = find_reserved_binding(domain, aligned_addr, page_size);
>> +	if (b) {
>> +		*iova = b->iova + offset;
>> +		kref_get(&b->kref);
>> +		ret = 0;
>> +		goto unlock;
>> +	}
>> +
>> +	/* there is no existing reserved iova for this pa */
>> +	p_iova = alloc_iova(iovad, 1, iovad->dma_32bit_pfn, true);
>> +	if (!p_iova) {
>> +		ret = -ENOMEM;
>> +		goto unlock;
>> +	}
>> +	*iova = p_iova->pfn_lo << order;
>> +
>> +	b = kzalloc(sizeof(*b), GFP_KERNEL);
> 
> ... and GFP_ATOMIC here.
OK

Thank you for your time!

Best Regards

Eric
> 
> Thanks,
> Jean-Philippe
> 
>> +	if (!b) {
>> +		ret = -ENOMEM;
>> +		goto free_iova_unlock;
>> +	}
>> +
>> +	ret = iommu_map(domain, *iova, aligned_addr, page_size, prot);
>> +	if (ret)
>> +		goto free_binding_iova_unlock;
>> +
>> +	kref_init(&b->kref);
>> +	kref_get(&b->kref);
>> +	b->domain = domain;
>> +	b->addr = aligned_addr;
>> +	b->iova = *iova;
>> +	b->size = page_size;
>> +
>> +	link_reserved_binding(domain, b);
>> +
>> +	*iova += offset;
>> +	goto unlock;
>> +
>> +free_binding_iova_unlock:
>> +	kfree(b);
>> +free_iova_unlock:
>> +	free_iova(iovad, *iova >> order);
>> +unlock:
>> +	mutex_unlock(&domain->reserved_mutex);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_get_single_reserved);


* Re: [RFC v5 03/17] iommu: introduce a reserved iova cookie
  2016-03-03 16:26   ` Julien Grall
@ 2016-03-29 17:26     ` Eric Auger
  0 siblings, 0 replies; 24+ messages in thread
From: Eric Auger @ 2016-03-29 17:26 UTC (permalink / raw)
  To: Julien Grall, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel, kvmarm, kvm
  Cc: patches, Manish.Jaggi, linux-kernel, iommu

Hi Julien,
On 03/03/2016 05:26 PM, Julien Grall wrote:
> Hi Eric,
> 
> On 01/03/16 18:27, Eric Auger wrote:
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 0e3b009..7b2bb94 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1072,6 +1072,7 @@ static struct iommu_domain
>> *__iommu_domain_alloc(struct bus_type *bus,
>>
>>       domain->ops  = bus->iommu_ops;
>>       domain->type = type;
>> +    mutex_init(&domain->reserved_mutex);
> 
> For consistency, the RB-tree reserved_binding_list should be initialized
> too:
> 
> domain->reserved_binding_list = RB_ROOT;
Sure

Thank you

Eric
> 
> Cheers,
> 


end of thread, other threads:[~2016-03-29 17:27 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-01 18:27 [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
2016-03-01 18:27 ` [RFC v5 01/17] iommu: Add DOMAIN_ATTR_MSI_MAPPING attribute Eric Auger
2016-03-01 18:27 ` [RFC v5 02/17] iommu/arm-smmu: advertise " Eric Auger
2016-03-01 18:27 ` [RFC v5 03/17] iommu: introduce a reserved iova cookie Eric Auger
2016-03-03 16:26   ` Julien Grall
2016-03-29 17:26     ` Eric Auger
2016-03-01 18:27 ` [RFC v5 04/17] dma-reserved-iommu: alloc/free_reserved_iova_domain Eric Auger
2016-03-01 18:27 ` [RFC v5 05/17] dma-reserved-iommu: reserved binding rb-tree and helpers Eric Auger
2016-03-01 18:27 ` [RFC v5 06/17] dma-reserved-iommu: iommu_get/put_single_reserved Eric Auger
2016-03-10 11:52   ` Jean-Philippe Brucker
2016-03-29 17:07     ` Eric Auger
2016-03-01 18:27 ` [RFC v5 07/17] dma-reserved-iommu: iommu_unmap_reserved Eric Auger
2016-03-01 18:27 ` [RFC v5 08/17] msi: Add a new MSI_FLAG_IRQ_REMAPPING flag Eric Auger
2016-03-01 18:27 ` [RFC v5 09/17] irqchip/gic-v3-its: ITS advertises MSI_FLAG_IRQ_REMAPPING Eric Auger
2016-03-01 18:27 ` [RFC v5 10/17] msi: export msi_get_domain_info Eric Auger
2016-03-01 18:27 ` [RFC v5 11/17] msi: msi_compose wrapper Eric Auger
2016-03-01 18:27 ` [RFC v5 12/17] msi: IOMMU map the doorbell address when needed Eric Auger
2016-03-01 18:27 ` [RFC v5 13/17] vfio: introduce VFIO_IOVA_RESERVED vfio_dma type Eric Auger
2016-03-01 18:27 ` [RFC v5 14/17] vfio: allow the user to register reserved iova range for MSI mapping Eric Auger
2016-03-01 18:27 ` [RFC v5 15/17] vfio/type1: also check IRQ remapping capability at msi domain Eric Auger
2016-03-01 18:27 ` [RFC v5 16/17] iommu/arm-smmu: do not advertise IOMMU_CAP_INTR_REMAP Eric Auger
2016-03-01 18:27 ` [RFC v5 17/17] vfio/type1: return MSI mapping requirements with VFIO_IOMMU_GET_INFO Eric Auger
2016-03-02  8:11 ` [RFC v5 00/17] KVM PCIe/MSI passthrough on ARM/ARM64 Jaggi, Manish
2016-03-02 12:30   ` Eric Auger
