linux-kernel.vger.kernel.org archive mirror
* [PATCH v9 0/7] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 3/3: vfio changes
@ 2016-05-04 11:54 Eric Auger
  2016-05-04 11:54 ` [PATCH v9 1/7] vfio: introduce a vfio_dma type field Eric Auger
                   ` (7 more replies)
  0 siblings, 8 replies; 38+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
  To: eric.auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel
  Cc: patches, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, julien.grall, yehuday

This series allows userspace to register a reserved IOVA domain.
This completes the kernel integration of the whole functionality on top
of part 1 (v9) & 2 (v8).

It also depends on the "[PATCH 1/3] iommu: Add MMIO mapping type" series,
http://comments.gmane.org/gmane.linux.kernel.iommu/12869

We reuse the VFIO DMA MAP ioctl with a new flag to bridge to the
msi-iommu API. The need for provisioning such an MSI IOVA range is reported
through the capability chain, using VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY.

vfio_iommu_type1 checks if the MSI mapping is safe when attaching the
vfio group to the container (allow_unsafe_interrupts modality).

On ARM/ARM64, the IOMMU does not abstract IRQ remapping; the modality is
abstracted on the MSI controller side. The GICv3 ITS is the first
controller advertising the modality.

More details & context can be found at:
http://www.linaro.org/blog/core-dump/kvm-pciemsi-passthrough-armarm64/

Best Regards

Eric

Testing:
- functional on ARM64 AMD Overdrive HW (single GICv2m frame) with
  Intel X540-T2 (SR-IOV capable)
- also tested on Armada-7040 using an Intel ixgbe (82599ES) by
  Yehuda Yitschak (v8)
- Not tested: ARM GICv3 ITS

References:
[1] [RFC 0/2] VFIO: Add virtual MSI doorbell support
    (https://lkml.org/lkml/2015/7/24/135)
[2] [RFC PATCH 0/6] vfio: Add interface to map MSI pages
    (https://lists.cs.columbia.edu/pipermail/kvmarm/2015-September/016607.html)
[3] [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO
    (http://permalink.gmane.org/gmane.comp.emulators.kvm.arm.devel/3858)

Git: complete series available at
https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.6-rc6-pcie-passthrough-v9

previous version at
https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.6-rc5-pcie-passthrough-v8

History:
v8 -> v9:
- report MSI geometry through the capability chain (last patch only);
  with the current limitation that an arbitrary requirement of 16 pages
  is reported. To be improved later on.


v7 -> v8:
- use renamed msi-iommu API
- VFIO only responsible for setting the IOVA aperture
- use new DOMAIN_ATTR_MSI_GEOMETRY iommu domain attribute

v6 -> v7:
- vfio_find_dma now accepts a dma_type argument.
- should have recovered the capability to unmap the whole user IOVA range
- remove computation of nb IOVA pages -> will post a separate RFC for that
  while respinning the QEMU part

RFC v5 -> patch v6:
- split to ease the review process

RFC v4 -> RFC v5:
- take into account Thomas' comments on MSI related patches
  - split "msi: IOMMU map the doorbell address when needed"
  - increase readability and add comments
  - fix style issues
- split "iommu: Add DOMAIN_ATTR_MSI_MAPPING attribute"
- platform ITS now advertises IOMMU_CAP_INTR_REMAP
- fix compilation issue with CONFIG_IOMMU_API unset
- arm-smmu-v3 now advertises DOMAIN_ATTR_MSI_MAPPING

RFC v3 -> v4:
- Move doorbell mapping/unmapping in msi.c
- fix ref count issue on set_affinity: in case of a change in the address
  the previous address is decremented
- doorbell map/unmap now is done on msi composition. Should allow the use
  case for platform MSI controllers
- create dma-reserved-iommu.h/c exposing/implementing a new API dedicated
  to reserved IOVA management (looking like dma-iommu glue)
- series reordering to ease the review:
  - first part is related to IOMMU
  - second related to MSI sub-system
  - third related to VFIO (except arm-smmu IOMMU_CAP_INTR_REMAP removal)
- expose the number of requested IOVA pages through VFIO_IOMMU_GET_INFO
  [this partially addresses Marc's comments on the
   iommu_get/put_single_reserved size/alignment problem - which I did not
   ignore - but I don't know how much I can do at the moment]

RFC v2 -> RFC v3:
- should fix wrong handling of some CONFIG combinations:
  CONFIG_IOVA, CONFIG_IOMMU_API, CONFIG_PCI_MSI_IRQ_DOMAIN
- fix MSI_FLAG_IRQ_REMAPPING setting in GICv3 ITS (although not tested)

PATCH v1 -> RFC v2:
- reverted to RFC since it looks more reasonable ;-) the code is split
  between VFIO, IOMMU and the MSI controller and I am not sure I made the
  right choices. Also the API needs to be further discussed.
- iova API usage in arm-smmu.c.
- MSI controller natively programs the MSI addr with either the PA or IOVA.
  This is not done anymore in vfio-pci driver as suggested by Alex.
- check irq remapping capability of the group

RFC v1 [2] -> PATCH v1:
- use the existing dma map/unmap ioctl interface with a flag to register a
  reserved IOVA range. Use the legacy Rb to store this special vfio_dma.
- a single contiguous reserved IOVA region is now allowed
- use of an RB tree indexed by PA to store allocated reserved slots
- use of a vfio_domain iova_domain to manage iova allocation within the
  window provided by the userspace
- vfio alloc_map/unmap_free take a vfio_group handle
- vfio_group handle is cached in vfio_pci_device
- add ref counting to bindings
- user modality enabled at the end of the series


Eric Auger (7):
  vfio: introduce a vfio_dma type field
  vfio/type1: vfio_find_dma accepting a type argument
  vfio/type1: bypass unmap/unpin and replay for VFIO_IOVA_RESERVED slots
  vfio: allow reserved msi iova registration
  vfio/type1: also check IRQ remapping capability at msi domain
  iommu/arm-smmu: do not advertise IOMMU_CAP_INTR_REMAP
  vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability
    chains

 drivers/iommu/arm-smmu-v3.c     |   3 +-
 drivers/iommu/arm-smmu.c        |   3 +-
 drivers/vfio/vfio_iommu_type1.c | 270 +++++++++++++++++++++++++++++++++++++---
 include/uapi/linux/vfio.h       |  40 +++++-
 4 files changed, 298 insertions(+), 18 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v9 1/7] vfio: introduce a vfio_dma type field
  2016-05-04 11:54 [PATCH v9 0/7] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 3/3: vfio changes Eric Auger
@ 2016-05-04 11:54 ` Eric Auger
  2016-05-04 11:54 ` [PATCH v9 2/7] vfio/type1: vfio_find_dma accepting a type argument Eric Auger
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
  To: eric.auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel
  Cc: patches, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, julien.grall, yehuday

We introduce a vfio_dma type since we will need to discriminate
different types of dma slots:
- VFIO_IOVA_USER: IOVA region used to map user vaddr
- VFIO_IOVA_RESERVED: IOVA region reserved to map host device PA such
  as MSI doorbells

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v6 -> v7:
- add VFIO_IOVA_ANY
- do not introduce yet any VFIO_IOVA_RESERVED handling
---
 drivers/vfio/vfio_iommu_type1.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 75b24e9..aaf5a6c 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -53,6 +53,16 @@ module_param_named(disable_hugepages,
 MODULE_PARM_DESC(disable_hugepages,
 		 "Disable VFIO IOMMU support for IOMMU hugepages.");
 
+enum vfio_iova_type {
+	VFIO_IOVA_USER = 0,	/* standard IOVA used to map user vaddr */
+	/*
+	 * IOVA reserved to map special host physical addresses,
+	 * MSI frames for instance
+	 */
+	VFIO_IOVA_RESERVED,
+	VFIO_IOVA_ANY,		/* matches any IOVA type */
+};
+
 struct vfio_iommu {
 	struct list_head	domain_list;
 	struct mutex		lock;
@@ -75,6 +85,7 @@ struct vfio_dma {
 	unsigned long		vaddr;		/* Process virtual addr */
 	size_t			size;		/* Map size (bytes) */
 	int			prot;		/* IOMMU_READ/WRITE */
+	enum vfio_iova_type	type;		/* type of IOVA */
 };
 
 struct vfio_group {
-- 
1.9.1


* [PATCH v9 2/7] vfio/type1: vfio_find_dma accepting a type argument
  2016-05-04 11:54 [PATCH v9 0/7] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 3/3: vfio changes Eric Auger
  2016-05-04 11:54 ` [PATCH v9 1/7] vfio: introduce a vfio_dma type field Eric Auger
@ 2016-05-04 11:54 ` Eric Auger
  2016-05-09 22:49   ` Alex Williamson
  2016-05-04 11:54 ` [PATCH v9 3/7] vfio/type1: bypass unmap/unpin and replay for VFIO_IOVA_RESERVED slots Eric Auger
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 38+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
  To: eric.auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel
  Cc: patches, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, julien.grall, yehuday

In our RB-tree we now have slots of different types (USER and RESERVED).
It becomes useful to be able to search for dma slots of a specific type,
or of any type. This patch proposes an implementation for that modality
and also changes the existing callers to use the USER type.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 drivers/vfio/vfio_iommu_type1.c | 63 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 53 insertions(+), 10 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index aaf5a6c..2d769d4 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -98,23 +98,64 @@ struct vfio_group {
  * into DMA'ble space using the IOMMU
  */
 
-static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
-				      dma_addr_t start, size_t size)
+/**
+ * vfio_find_dma_from_node: looks for a dma slot intersecting a window
+ * from a given rb tree node
+ * @top: top rb tree node where the search starts (including this node)
+ * @start: window start
+ * @size: window size
+ * @type: window type
+ */
+static struct vfio_dma *vfio_find_dma_from_node(struct rb_node *top,
+						dma_addr_t start, size_t size,
+						enum vfio_iova_type type)
 {
-	struct rb_node *node = iommu->dma_list.rb_node;
+	struct rb_node *node = top;
+	struct vfio_dma *dma;
 
 	while (node) {
-		struct vfio_dma *dma = rb_entry(node, struct vfio_dma, node);
-
+		dma = rb_entry(node, struct vfio_dma, node);
 		if (start + size <= dma->iova)
 			node = node->rb_left;
 		else if (start >= dma->iova + dma->size)
 			node = node->rb_right;
 		else
+			break;
+	}
+	if (!node)
+		return NULL;
+
+	/* a dma slot intersects our window, check the type also matches */
+	if (type == VFIO_IOVA_ANY || dma->type == type)
+		return dma;
+
+	/* restart 2 searches skipping the current node */
+	if (start < dma->iova) {
+		dma = vfio_find_dma_from_node(node->rb_left, start,
+					      size, type);
+		if (dma)
 			return dma;
 	}
+	if (start + size > dma->iova + dma->size)
+		dma = vfio_find_dma_from_node(node->rb_right, start,
+					      size, type);
+	return dma;
+}
+
+/**
+ * vfio_find_dma: find a dma slot intersecting a given window
+ * @iommu: vfio iommu handle
+ * @start: window base iova
+ * @size: window size
+ * @type: window type
+ */
+static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
+				      dma_addr_t start, size_t size,
+				      enum vfio_iova_type type)
+{
+	struct rb_node *top_node = iommu->dma_list.rb_node;
 
-	return NULL;
+	return vfio_find_dma_from_node(top_node, start, size, type);
 }
 
 static void vfio_link_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
@@ -488,19 +529,21 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 	 * mappings within the range.
 	 */
 	if (iommu->v2) {
-		dma = vfio_find_dma(iommu, unmap->iova, 0);
+		dma = vfio_find_dma(iommu, unmap->iova, 0, VFIO_IOVA_USER);
 		if (dma && dma->iova != unmap->iova) {
 			ret = -EINVAL;
 			goto unlock;
 		}
-		dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0);
+		dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0,
+				    VFIO_IOVA_USER);
 		if (dma && dma->iova + dma->size != unmap->iova + unmap->size) {
 			ret = -EINVAL;
 			goto unlock;
 		}
 	}
 
-	while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) {
+	while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size,
+				    VFIO_IOVA_USER))) {
 		if (!iommu->v2 && unmap->iova > dma->iova)
 			break;
 		unmapped += dma->size;
@@ -604,7 +647,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 
 	mutex_lock(&iommu->lock);
 
-	if (vfio_find_dma(iommu, iova, size)) {
+	if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
 		mutex_unlock(&iommu->lock);
 		return -EEXIST;
 	}
-- 
1.9.1


* [PATCH v9 3/7] vfio/type1: bypass unmap/unpin and replay for VFIO_IOVA_RESERVED slots
  2016-05-04 11:54 [PATCH v9 0/7] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 3/3: vfio changes Eric Auger
  2016-05-04 11:54 ` [PATCH v9 1/7] vfio: introduce a vfio_dma type field Eric Auger
  2016-05-04 11:54 ` [PATCH v9 2/7] vfio/type1: vfio_find_dma accepting a type argument Eric Auger
@ 2016-05-04 11:54 ` Eric Auger
  2016-05-09 22:49   ` Alex Williamson
  2016-05-04 11:54 ` [PATCH v9 4/7] vfio: allow reserved msi iova registration Eric Auger
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 38+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
  To: eric.auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel
  Cc: patches, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, julien.grall, yehuday

Before allowing the end-user to create VFIO_IOVA_RESERVED dma slots,
let's implement the expected behavior for removal and replay. As opposed
to user dma slots, IOVAs are not systematically bound to PAs and PAs are
not pinned. VFIO just initializes the IOVA "aperture". IOVAs are allocated
outside of the VFIO framework, typically by the MSI layer, which is
responsible for freeing and unmapping them. The MSI mapping resources are
freed by the IOMMU driver on domain destruction.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v7 -> v8:
- do no destroy anything anymore, just bypass unmap/unpin and iommu_map
  on replay
---
 drivers/vfio/vfio_iommu_type1.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2d769d4..94a9916 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -391,7 +391,7 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
 	struct vfio_domain *domain, *d;
 	long unlocked = 0;
 
-	if (!dma->size)
+	if (!dma->size || dma->type != VFIO_IOVA_USER)
 		return;
 	/*
 	 * We use the IOMMU to track the physical addresses, otherwise we'd
@@ -727,6 +727,9 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 		dma = rb_entry(n, struct vfio_dma, node);
 		iova = dma->iova;
 
+		if (dma->type == VFIO_IOVA_RESERVED)
+			continue;
+
 		while (iova < dma->iova + dma->size) {
 			phys_addr_t phys = iommu_iova_to_phys(d->domain, iova);
 			size_t size;
-- 
1.9.1


* [PATCH v9 4/7] vfio: allow reserved msi iova registration
  2016-05-04 11:54 [PATCH v9 0/7] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 3/3: vfio changes Eric Auger
                   ` (2 preceding siblings ...)
  2016-05-04 11:54 ` [PATCH v9 3/7] vfio/type1: bypass unmap/unpin and replay for VFIO_IOVA_RESERVED slots Eric Auger
@ 2016-05-04 11:54 ` Eric Auger
  2016-05-05 19:22   ` Chalamarla, Tirumalesh
  2016-05-10 15:29   ` Alex Williamson
  2016-05-04 11:54 ` [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain Eric Auger
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 38+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
  To: eric.auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel
  Cc: patches, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, julien.grall, yehuday

The user is allowed to register a reserved MSI IOVA range by using the
DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
This region is stored in the vfio_dma rb tree. At that point the iova
range is not mapped to any target address yet. The host kernel will use
those IOVAs when needed, typically when MSIs are allocated.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>

---
v7 -> v8:
- use iommu_msi_set_aperture function. There is no notion of
  unregistration anymore since the reserved msi slot remains
  until the container gets closed.

v6 -> v7:
- use iommu_free_reserved_iova_domain
- convey prot attributes downto dma-reserved-iommu iova domain creation
- reserved bindings teardown now performed on iommu domain destruction
- rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
         VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
- change title
- pass the protection attribute to dma-reserved-iommu API

v3 -> v4:
- use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
- protect vfio_register_reserved_iova_range implementation with
  CONFIG_IOMMU_DMA_RESERVED
- handle unregistration by user-space and on vfio_iommu_type1 release

v1 -> v2:
- set returned value according to alloc_reserved_iova_domain result
- free the iova domains in case any error occurs

RFC v1 -> v1:
- takes into account Alex comments, based on
  [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
- use the existing dma map/unmap ioctl interface with a flag to register
  a reserved IOVA range. A single reserved iova region is allowed.
---
 drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
 include/uapi/linux/vfio.h       | 10 +++++-
 2 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 94a9916..4d3a6f1 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -36,6 +36,7 @@
 #include <linux/uaccess.h>
 #include <linux/vfio.h>
 #include <linux/workqueue.h>
+#include <linux/msi-iommu.h>
 
 #define DRIVER_VERSION  "0.2"
 #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
@@ -445,6 +446,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
 	vfio_lock_acct(-unlocked);
 }
 
+static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
+				dma_addr_t iova, size_t size)
+{
+	struct vfio_domain *d;
+	int ret = 0;
+
+	list_for_each_entry(d, &iommu->domain_list, next) {
+		ret = iommu_msi_set_aperture(d->domain, iova, iova + size - 1);
+		if (ret)
+			break;
+	}
+	return ret;
+}
+
 static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
 {
 	vfio_unmap_unpin(iommu, dma);
@@ -693,6 +708,63 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 	return ret;
 }
 
+static int vfio_register_msi_range(struct vfio_iommu *iommu,
+				   struct vfio_iommu_type1_dma_map *map)
+{
+	dma_addr_t iova = map->iova;
+	size_t size = map->size;
+	int ret = 0;
+	struct vfio_dma *dma;
+	unsigned long order;
+	uint64_t mask;
+
+	/* Verify that none of our __u64 fields overflow */
+	if (map->size != size || map->iova != iova)
+		return -EINVAL;
+
+	order =  __ffs(vfio_pgsize_bitmap(iommu));
+	mask = ((uint64_t)1 << order) - 1;
+
+	WARN_ON(mask & PAGE_MASK);
+
+	if (!size || (size | iova) & mask)
+		return -EINVAL;
+
+	/* Don't allow IOVA address wrap */
+	if (iova + size - 1 < iova)
+		return -EINVAL;
+
+	mutex_lock(&iommu->lock);
+
+	if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
+		ret =  -EEXIST;
+		goto unlock;
+	}
+
+	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
+	if (!dma) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	dma->iova = iova;
+	dma->size = size;
+	dma->type = VFIO_IOVA_RESERVED;
+
+	ret = vfio_set_msi_aperture(iommu, iova, size);
+	if (ret)
+		goto free_unlock;
+
+	vfio_link_dma(iommu, dma);
+	goto unlock;
+
+free_unlock:
+	kfree(dma);
+unlock:
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
 static int vfio_bus_type(struct device *dev, void *data)
 {
 	struct bus_type **bus = data;
@@ -1062,7 +1134,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 	} else if (cmd == VFIO_IOMMU_MAP_DMA) {
 		struct vfio_iommu_type1_dma_map map;
 		uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
-				VFIO_DMA_MAP_FLAG_WRITE;
+				VFIO_DMA_MAP_FLAG_WRITE |
+				VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA;
 
 		minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
 
@@ -1072,6 +1145,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		if (map.argsz < minsz || map.flags & ~mask)
 			return -EINVAL;
 
+		if (map.flags & VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA)
+			return vfio_register_msi_range(iommu, &map);
+
 		return vfio_dma_do_map(iommu, &map);
 
 	} else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 255a211..4a9dbc2 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -498,12 +498,19 @@ struct vfio_iommu_type1_info {
  *
  * Map process virtual addresses to IO virtual addresses using the
  * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
+ *
+ * In case RESERVED_MSI_IOVA flag is set, the API only aims at registering an
+ * IOVA region that will be used on some platforms to map the host MSI frames.
+ * In that specific case, vaddr is ignored. Once registered, an MSI reserved
+ * IOVA region stays until the container is closed.
  */
 struct vfio_iommu_type1_dma_map {
 	__u32	argsz;
 	__u32	flags;
 #define VFIO_DMA_MAP_FLAG_READ (1 << 0)		/* readable from device */
 #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)	/* writable from device */
+/* reserved iova for MSI vectors*/
+#define VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA (1 << 2)
 	__u64	vaddr;				/* Process virtual address */
 	__u64	iova;				/* IO virtual address */
 	__u64	size;				/* Size of mapping (bytes) */
@@ -519,7 +526,8 @@ struct vfio_iommu_type1_dma_map {
  * Caller sets argsz.  The actual unmapped size is returned in the size
  * field.  No guarantee is made to the user that arbitrary unmaps of iova
  * or size different from those used in the original mapping call will
- * succeed.
+ * succeed. Once registered, an MSI region cannot be unmapped and stays
+ * until the container is closed.
  */
 struct vfio_iommu_type1_dma_unmap {
 	__u32	argsz;
-- 
1.9.1


* [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
  2016-05-04 11:54 [PATCH v9 0/7] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 3/3: vfio changes Eric Auger
                   ` (3 preceding siblings ...)
  2016-05-04 11:54 ` [PATCH v9 4/7] vfio: allow reserved msi iova registration Eric Auger
@ 2016-05-04 11:54 ` Eric Auger
  2016-05-05 19:23   ` Chalamarla, Tirumalesh
  2016-05-09 22:49   ` Alex Williamson
  2016-05-04 11:54 ` [PATCH v9 6/7] iommu/arm-smmu: do not advertise IOMMU_CAP_INTR_REMAP Eric Auger
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 38+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
  To: eric.auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel
  Cc: patches, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, julien.grall, yehuday

On x86, IRQ remapping is abstracted by the IOMMU. On ARM, this is
abstracted by the MSI controller. vfio_safe_irq_domain allows checking
whether interrupts are "safe" for a given device. They are if the device
does not use MSI, or if the device uses MSI and the msi-parent controller
supports IRQ remapping.

Then we check at group level if all devices have safe interrupts: if not,
we only allow the group to be attached if allow_unsafe_interrupts is set.

At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
changed in next patch.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---
v3 -> v4:
- rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
  and irq_remapping into safe_irq_domains

v2 -> v3:
- protect vfio_msi_parent_irq_remapping_capable with
  CONFIG_GENERIC_MSI_IRQ_DOMAIN
---
 drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 4d3a6f1..2fc8197 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -37,6 +37,8 @@
 #include <linux/vfio.h>
 #include <linux/workqueue.h>
 #include <linux/msi-iommu.h>
+#include <linux/irqdomain.h>
+#include <linux/msi.h>
 
 #define DRIVER_VERSION  "0.2"
 #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
@@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev, void *data)
 	return 0;
 }
 
+/**
+ * vfio_safe_irq_domain: returns whether the irq domain
+ * the device is attached to is safe with respect to MSI isolation.
+ * If the irq domain is not an MSI domain, we return it is safe.
+ *
+ * @dev: device handle
+ * @data: unused
+ * returns 0 if the irq domain is safe, -1 if not.
+ */
+static int vfio_safe_irq_domain(struct device *dev, void *data)
+{
+#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
+	struct irq_domain *domain;
+	struct msi_domain_info *info;
+
+	domain = dev_get_msi_domain(dev);
+	if (!domain)
+		return 0;
+
+	info = msi_get_domain_info(domain);
+
+	if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
+		return -1;
+#endif
+	return 0;
+}
+
 static int vfio_iommu_replay(struct vfio_iommu *iommu,
 			     struct vfio_domain *domain)
 {
@@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	struct vfio_group *group, *g;
 	struct vfio_domain *domain, *d;
 	struct bus_type *bus = NULL;
-	int ret;
+	int ret, safe_irq_domains;
 
 	mutex_lock(&iommu->lock);
 
@@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 
 	group->iommu_group = iommu_group;
 
+	/*
+	 * Determine if all the devices of the group have a safe irq domain
+	 * with respect to MSI isolation
+	 */
+	safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
+				       vfio_safe_irq_domain);
+
 	/* Determine bus_type in order to allocate a domain */
 	ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
 	if (ret)
@@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	INIT_LIST_HEAD(&domain->group_list);
 	list_add(&group->next, &domain->group_list);
 
+	/*
+	 * to advertise safe interrupts either the IOMMU or the MSI controllers
+	 * must support IRQ remapping/interrupt translation
+	 */
 	if (!allow_unsafe_interrupts &&
-	    !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
+	    (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
 		pr_warn("%s: No interrupt remapping support.  Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
 		       __func__);
 		ret = -EPERM;
-- 
1.9.1


* [PATCH v9 6/7] iommu/arm-smmu: do not advertise IOMMU_CAP_INTR_REMAP
  2016-05-04 11:54 [PATCH v9 0/7] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 3/3: vfio changes Eric Auger
                   ` (4 preceding siblings ...)
  2016-05-04 11:54 ` [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain Eric Auger
@ 2016-05-04 11:54 ` Eric Auger
  2016-05-04 11:54 ` [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains Eric Auger
  2016-05-20 16:01 ` [PATCH v9 0/7] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 3/3: vfio changes Eric Auger
  7 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
  To: eric.auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel
  Cc: patches, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, julien.grall, yehuday

Do not advertise IOMMU_CAP_INTR_REMAP for arm-smmu(-v3). Indeed, on ARM,
the IRQ remapping capability is abstracted on the irqchip side, as
opposed to the Intel IOMMU, which features IRQ remapping HW.

So to check IRQ remapping capability, the msi domain needs to be
checked instead.

This commit needs to be applied after "vfio/type1: also check IRQ
remapping capability at msi domain" else the legacy interrupt
assignment gets broken with arm-smmu.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 drivers/iommu/arm-smmu-v3.c | 3 ++-
 drivers/iommu/arm-smmu.c    | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index b5d9826..778212c 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1386,7 +1386,8 @@ static bool arm_smmu_capable(enum iommu_cap cap)
 	case IOMMU_CAP_CACHE_COHERENCY:
 		return true;
 	case IOMMU_CAP_INTR_REMAP:
-		return true; /* MSIs are just memory writes */
+		/* interrupt translation handled at MSI controller level */
+		return false;
 	case IOMMU_CAP_NOEXEC:
 		return true;
 	default:
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 0908985..e7c9e89 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1320,7 +1320,8 @@ static bool arm_smmu_capable(enum iommu_cap cap)
 		 */
 		return true;
 	case IOMMU_CAP_INTR_REMAP:
-		return true; /* MSIs are just memory writes */
+		/* interrupt translation handled at MSI controller level */
+		return false;
 	case IOMMU_CAP_NOEXEC:
 		return true;
 	default:
-- 
1.9.1


* [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains
  2016-05-04 11:54 [PATCH v9 0/7] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 3/3: vfio changes Eric Auger
                   ` (5 preceding siblings ...)
  2016-05-04 11:54 ` [PATCH v9 6/7] iommu/arm-smmu: do not advertise IOMMU_CAP_INTR_REMAP Eric Auger
@ 2016-05-04 11:54 ` Eric Auger
  2016-05-04 12:06   ` Eric Auger
  2016-05-09 22:49   ` Alex Williamson
  2016-05-20 16:01 ` [PATCH v9 0/7] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 3/3: vfio changes Eric Auger
  7 siblings, 2 replies; 38+ messages in thread
From: Eric Auger @ 2016-05-04 11:54 UTC (permalink / raw)
  To: eric.auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel
  Cc: patches, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, julien.grall, yehuday

This patch allows userspace to retrieve the MSI geometry. The
implementation is based on capability chains, now also added to
VFIO_IOMMU_GET_INFO.

The returned info comprises:
- whether the MSI IOVAs are constrained to a reserved range (x86 case)
  and, if so, the start/end of the aperture,
- or whether the IOVA aperture needs to be set by the userspace. In that
  case, the size and alignment of the IOVA region to be provided are
  returned.

In case userspace must provide the IOVA range, we currently return
an arbitrary number of IOVA pages (16), assumed to fulfill the needs of
current ARM platforms. This may be superseded by a more sophisticated
computation later on.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---
v8 -> v9:
- use iommu_msi_supported flag instead of programmable
- replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
  capability chain, reporting the MSI geometry

v7 -> v8:
- use iommu_domain_msi_geometry

v6 -> v7:
- remove the computation of the number of IOVA pages to be provisioned.
  This number depends on the domain/group/device topology, which can
  dynamically change. Let's instead rely on an arbitrary max depending
  on the system

v4 -> v5:
- move msi_info and ret declaration within the conditional code

v3 -> v4:
- replace former vfio_domains_require_msi_mapping by
  more complex computation of MSI mapping requirements, especially the
  number of pages to be provided by the user-space.
- reword patch title

RFC v1 -> v1:
- derived from
  [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
- renamed allow_msi_reconfig into require_msi_mapping
- fixed VFIO_IOMMU_GET_INFO
---
 drivers/vfio/vfio_iommu_type1.c | 69 +++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h       | 30 +++++++++++++++++-
 2 files changed, 98 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2fc8197..841360b 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1134,6 +1134,50 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
 	return ret;
 }
 
+static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
+				     struct vfio_info_cap *caps)
+{
+	struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
+	struct iommu_domain_msi_geometry msi_geometry;
+	struct vfio_info_cap_header *header;
+	struct vfio_domain *d;
+	bool mapping_required;
+	size_t size;
+
+	mutex_lock(&iommu->lock);
+	/* All domains have same require_msi_map property, pick first */
+	d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
+	iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
+			      &msi_geometry);
+	mapping_required = msi_geometry.iommu_msi_supported;
+
+	mutex_unlock(&iommu->lock);
+
+	size = sizeof(*vfio_msi_geometry);
+	header = vfio_info_cap_add(caps, size,
+				   VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
+
+	if (IS_ERR(header))
+		return PTR_ERR(header);
+
+	vfio_msi_geometry = container_of(header,
+				struct vfio_iommu_type1_info_cap_msi_geometry,
+				header);
+
+	vfio_msi_geometry->reserved = !mapping_required;
+	if (vfio_msi_geometry->reserved) {
+		vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
+		vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;
+		return 0;
+	}
+
+	vfio_msi_geometry->alignment = 1 << __ffs(vfio_pgsize_bitmap(iommu));
+	/* we currently report the need for an arbitray number of 16 pages */
+	vfio_msi_geometry->size = 16 * vfio_msi_geometry->alignment;
+
+	return 0;
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
 				   unsigned int cmd, unsigned long arg)
 {
@@ -1155,6 +1199,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		}
 	} else if (cmd == VFIO_IOMMU_GET_INFO) {
 		struct vfio_iommu_type1_info info;
+		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
+		int ret;
 
 		minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
 
@@ -1168,6 +1214,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 
 		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
 
+		ret = compute_msi_geometry_caps(iommu, &caps);
+		if (ret)
+			return ret;
+
+		if (caps.size) {
+			info.flags |= VFIO_IOMMU_INFO_CAPS;
+			if (info.argsz < sizeof(info) + caps.size) {
+				info.argsz = sizeof(info) + caps.size;
+				info.cap_offset = 0;
+			} else {
+				vfio_info_cap_shift(&caps, sizeof(info));
+				if (copy_to_user((void __user *)arg +
+						sizeof(info), caps.buf,
+						caps.size)) {
+					kfree(caps.buf);
+					return -EFAULT;
+				}
+				info.cap_offset = sizeof(info);
+			}
+
+			kfree(caps.buf);
+		}
+
 		return copy_to_user((void __user *)arg, &info, minsz) ?
 			-EFAULT : 0;
 
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 4a9dbc2..0ff6a8d 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -488,7 +488,33 @@ struct vfio_iommu_type1_info {
 	__u32	argsz;
 	__u32	flags;
 #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
-	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
+#define VFIO_IOMMU_INFO_CAPS	(1 << 1)	/* Info supports caps */
+	__u32   cap_offset;	/* Offset within info struct of first cap */
+	__u64	iova_pgsizes;	/* Bitmap of supported page sizes */
+};
+
+#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY	1
+
+/*
+ * The MSI geometry capability allows to report the MSI IOVA geometry:
+ * - either the MSI IOVAs are constrained within a reserved IOVA aperture
+ *   whose boundaries are given by [@aperture_start, @aperture_end].
+ *   this is typically the case on x86 host. The userspace is not allowed
+ *   to map userspace memory at IOVAs intersecting this range using
+ *   VFIO_IOMMU_MAP_DMA.
+ * - or the MSI IOVAs are not requested to belong to any reserved range;
+ *   in that case the userspace must provide an IOVA window characterized by
+ *   @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
+ */
+struct vfio_iommu_type1_info_cap_msi_geometry {
+	struct vfio_info_cap_header header;
+	bool reserved; /* Are MSI IOVAs within a reserved aperture? */
+	/* reserved */
+	__u64 aperture_start;
+	__u64 aperture_end;
+	/* not reserved */
+	__u64 size; /* IOVA aperture size in bytes the userspace must provide */
+	__u64 alignment; /* alignment of the window, in bytes */
 };
 
 #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
@@ -503,6 +529,8 @@ struct vfio_iommu_type1_info {
  * IOVA region that will be used on some platforms to map the host MSI frames.
  * In that specific case, vaddr is ignored. Once registered, an MSI reserved
  * IOVA region stays until the container is closed.
+ * The requirement for provisioning such reserved IOVA range can be checked by
+ * checking the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability.
  */
 struct vfio_iommu_type1_dma_map {
 	__u32	argsz;
-- 
1.9.1

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains
  2016-05-04 11:54 ` [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains Eric Auger
@ 2016-05-04 12:06   ` Eric Auger
  2016-05-09 23:03     ` Alex Williamson
  2016-05-09 22:49   ` Alex Williamson
  1 sibling, 1 reply; 38+ messages in thread
From: Eric Auger @ 2016-05-04 12:06 UTC (permalink / raw)
  To: eric.auger, robin.murphy, alex.williamson, will.deacon, joro,
	tglx, jason, marc.zyngier, christoffer.dall, linux-arm-kernel
  Cc: patches, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, julien.grall, yehuday

Hi Alex,
On 05/04/2016 01:54 PM, Eric Auger wrote:
> This patch allows the user-space to retrieve the MSI geometry. The
> implementation is based on capability chains, now also added to
> VFIO_IOMMU_GET_INFO.

If you prefer, we could consider this patch outside of the main series
since it brings extra functionality (MSI geometry reporting). In a
first QEMU integration we could live without knowing the MSI geometry, I
think, all the more so as I currently report an arbitrary number of
requested IOVA pages. Computing the exact number of doorbells
to map brings extra complexity and I have not addressed this issue yet.

It sketches a possible user API to report the MSI geometry based on the
capability chains, as you suggested some time ago. I am currently busy
drafting a QEMU integration.

Best Regards

Eric

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v9 4/7] vfio: allow reserved msi iova registration
  2016-05-04 11:54 ` [PATCH v9 4/7] vfio: allow reserved msi iova registration Eric Auger
@ 2016-05-05 19:22   ` Chalamarla, Tirumalesh
  2016-05-09  7:55     ` Eric Auger
  2016-05-10 15:29   ` Alex Williamson
  1 sibling, 1 reply; 38+ messages in thread
From: Chalamarla, Tirumalesh @ 2016-05-05 19:22 UTC (permalink / raw)
  To: Eric Auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel
  Cc: julien.grall, patches, Jean-Philippe.Brucker, p.fedin,
	linux-kernel, Bharat.Bhushan, iommu, pranav.sawargaonkar,
	yehuday






On 5/4/16, 4:54 AM, "linux-arm-kernel on behalf of Eric Auger" <linux-arm-kernel-bounces@lists.infradead.org on behalf of eric.auger@linaro.org> wrote:

>The user is allowed to register a reserved MSI IOVA range by using the
>DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
>This region is stored in the vfio_dma rb tree. At that point the iova
>range is not mapped to any target address yet. The host kernel will use
>those IOVAs when needed, typically when MSIs are allocated.
>
>Signed-off-by: Eric Auger <eric.auger@linaro.org>
>Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
>
>---
>v7 -> v8:
>- use iommu_msi_set_aperture function. There is no notion of
>  unregistration anymore since the reserved msi slot remains
>  until the container gets closed.
>
>v6 -> v7:
>- use iommu_free_reserved_iova_domain
>- convey prot attributes downto dma-reserved-iommu iova domain creation
>- reserved bindings teardown now performed on iommu domain destruction
>- rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
>         VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
>- change title
>- pass the protection attribute to dma-reserved-iommu API
>
>v3 -> v4:
>- use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
>- protect vfio_register_reserved_iova_range implementation with
>  CONFIG_IOMMU_DMA_RESERVED
>- handle unregistration by user-space and on vfio_iommu_type1 release
>
>v1 -> v2:
>- set returned value according to alloc_reserved_iova_domain result
>- free the iova domains in case any error occurs
>
>RFC v1 -> v1:
>- takes into account Alex comments, based on
>  [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
>- use the existing dma map/unmap ioctl interface with a flag to register
>  a reserved IOVA range. A single reserved iova region is allowed.
>---
> drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
> include/uapi/linux/vfio.h       | 10 +++++-
> 2 files changed, 86 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>index 94a9916..4d3a6f1 100644
>--- a/drivers/vfio/vfio_iommu_type1.c
>+++ b/drivers/vfio/vfio_iommu_type1.c
>@@ -36,6 +36,7 @@
> #include <linux/uaccess.h>
> #include <linux/vfio.h>
> #include <linux/workqueue.h>
>+#include <linux/msi-iommu.h>
> 
> #define DRIVER_VERSION  "0.2"
> #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
>@@ -445,6 +446,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
> 	vfio_lock_acct(-unlocked);
> }
> 
>+static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
>+				dma_addr_t iova, size_t size)
>+{
>+	struct vfio_domain *d;
>+	int ret = 0;
>+
>+	list_for_each_entry(d, &iommu->domain_list, next) {
>+		ret = iommu_msi_set_aperture(d->domain, iova, iova + size - 1);
>+		if (ret)
>+			break;
>+	}
>+	return ret;
>+}
>+
> static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
> {
> 	vfio_unmap_unpin(iommu, dma);
>@@ -693,6 +708,63 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> 	return ret;
> }
> 
>+static int vfio_register_msi_range(struct vfio_iommu *iommu,
>+				   struct vfio_iommu_type1_dma_map *map)
>+{
>+	dma_addr_t iova = map->iova;
>+	size_t size = map->size;
>+	int ret = 0;
>+	struct vfio_dma *dma;
>+	unsigned long order;
>+	uint64_t mask;
>+
>+	/* Verify that none of our __u64 fields overflow */
>+	if (map->size != size || map->iova != iova)
>+		return -EINVAL;
>+
>+	order =  __ffs(vfio_pgsize_bitmap(iommu));
>+	mask = ((uint64_t)1 << order) - 1;
>+
>+	WARN_ON(mask & PAGE_MASK);
>+
>+	if (!size || (size | iova) & mask)
>+		return -EINVAL;
>+
>+	/* Don't allow IOVA address wrap */
>+	if (iova + size - 1 < iova)
>+		return -EINVAL;
>+
>+	mutex_lock(&iommu->lock);
>+
>+	if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
>+		ret =  -EEXIST;
>+		goto unlock;
>+	}
>+
>+	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
>+	if (!dma) {
>+		ret = -ENOMEM;
>+		goto unlock;
>+	}
>+
>+	dma->iova = iova;
>+	dma->size = size;
>+	dma->type = VFIO_IOVA_RESERVED;
>+
>+	ret = vfio_set_msi_aperture(iommu, iova, size);
>+	if (ret)
>+		goto free_unlock;
>+
>+	vfio_link_dma(iommu, dma);
>+	goto unlock;
>+
>+free_unlock:
>+	kfree(dma);
>+unlock:
>+	mutex_unlock(&iommu->lock);
>+	return ret;
>+}
>+
> static int vfio_bus_type(struct device *dev, void *data)
> {
> 	struct bus_type **bus = data;
>@@ -1062,7 +1134,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> 	} else if (cmd == VFIO_IOMMU_MAP_DMA) {
> 		struct vfio_iommu_type1_dma_map map;
> 		uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
>-				VFIO_DMA_MAP_FLAG_WRITE;
>+				VFIO_DMA_MAP_FLAG_WRITE |
>+				VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA;
> 
> 		minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
> 
>@@ -1072,6 +1145,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> 		if (map.argsz < minsz || map.flags & ~mask)
> 			return -EINVAL;
> 
>+		if (map.flags & VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA)
>+			return vfio_register_msi_range(iommu, &map);
>+

How are we getting the PA and any validation needed?
> 		return vfio_dma_do_map(iommu, &map);
> 
> 	} else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
>diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>index 255a211..4a9dbc2 100644
>--- a/include/uapi/linux/vfio.h
>+++ b/include/uapi/linux/vfio.h
>@@ -498,12 +498,19 @@ struct vfio_iommu_type1_info {
>  *
>  * Map process virtual addresses to IO virtual addresses using the
>  * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
>+ *
>+ * In case RESERVED_MSI_IOVA flag is set, the API only aims at registering an
>+ * IOVA region that will be used on some platforms to map the host MSI frames.
>+ * In that specific case, vaddr is ignored. Once registered, an MSI reserved
>+ * IOVA region stays until the container is closed.
>  */
> struct vfio_iommu_type1_dma_map {
> 	__u32	argsz;
> 	__u32	flags;
> #define VFIO_DMA_MAP_FLAG_READ (1 << 0)		/* readable from device */
> #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)	/* writable from device */
>+/* reserved iova for MSI vectors*/
>+#define VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA (1 << 2)
> 	__u64	vaddr;				/* Process virtual address */
> 	__u64	iova;				/* IO virtual address */
> 	__u64	size;				/* Size of mapping (bytes) */
>@@ -519,7 +526,8 @@ struct vfio_iommu_type1_dma_map {
>  * Caller sets argsz.  The actual unmapped size is returned in the size
>  * field.  No guarantee is made to the user that arbitrary unmaps of iova
>  * or size different from those used in the original mapping call will
>- * succeed.
>+ * succeed. Once registered, an MSI region cannot be unmapped and stays
>+ * until the container is closed.
>  */
> struct vfio_iommu_type1_dma_unmap {
> 	__u32	argsz;
>-- 
>1.9.1
>
>
>_______________________________________________
>linux-arm-kernel mailing list
>linux-arm-kernel@lists.infradead.org
>http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
  2016-05-04 11:54 ` [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain Eric Auger
@ 2016-05-05 19:23   ` Chalamarla, Tirumalesh
  2016-05-09  8:05     ` Eric Auger
  2016-05-09 22:49   ` Alex Williamson
  1 sibling, 1 reply; 38+ messages in thread
From: Chalamarla, Tirumalesh @ 2016-05-05 19:23 UTC (permalink / raw)
  To: Eric Auger, eric.auger, robin.murphy, alex.williamson,
	will.deacon, joro, tglx, jason, marc.zyngier, christoffer.dall,
	linux-arm-kernel
  Cc: julien.grall, patches, Jean-Philippe.Brucker, p.fedin,
	linux-kernel, Bharat.Bhushan, iommu, pranav.sawargaonkar,
	yehuday






On 5/4/16, 4:54 AM, "linux-arm-kernel on behalf of Eric Auger" <linux-arm-kernel-bounces@lists.infradead.org on behalf of eric.auger@linaro.org> wrote:

>On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
>by the MSI controller. vfio_safe_irq_domain allows checking whether
>interrupts are "safe" for a given device. They are if the device does
>not use MSI or if the device uses MSI and the msi-parent controller
>supports IRQ remapping.
>
>Then we check at group level if all devices have safe interrupts: if not,
>we only allow the group to be attached if allow_unsafe_interrupts is set.
>
>At this point the ARM SMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>changed in next patch.

Will this work in systems with multiple ITS?
>
>Signed-off-by: Eric Auger <eric.auger@linaro.org>
>
>---
>v3 -> v4:
>- rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
>  and irq_remapping into safe_irq_domains
>
>v2 -> v3:
>- protect vfio_msi_parent_irq_remapping_capable with
>  CONFIG_GENERIC_MSI_IRQ_DOMAIN
>---
> drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 42 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>index 4d3a6f1..2fc8197 100644
>--- a/drivers/vfio/vfio_iommu_type1.c
>+++ b/drivers/vfio/vfio_iommu_type1.c
>@@ -37,6 +37,8 @@
> #include <linux/vfio.h>
> #include <linux/workqueue.h>
> #include <linux/msi-iommu.h>
>+#include <linux/irqdomain.h>
>+#include <linux/msi.h>
> 
> #define DRIVER_VERSION  "0.2"
> #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
>@@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev, void *data)
> 	return 0;
> }
> 
>+/**
>+ * vfio_safe_irq_domain: returns whether the irq domain
>+ * the device is attached to is safe with respect to MSI isolation.
>+ * If the irq domain is not an MSI domain, we return it is safe.
>+ *
>+ * @dev: device handle
>+ * @data: unused
>+ * returns 0 if the irq domain is safe, -1 if not.
>+ */
>+static int vfio_safe_irq_domain(struct device *dev, void *data)
>+{
>+#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>+	struct irq_domain *domain;
>+	struct msi_domain_info *info;
>+
>+	domain = dev_get_msi_domain(dev);
>+	if (!domain)
>+		return 0;
>+
>+	info = msi_get_domain_info(domain);
>+
>+	if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>+		return -1;
>+#endif
>+	return 0;
>+}
>+
> static int vfio_iommu_replay(struct vfio_iommu *iommu,
> 			     struct vfio_domain *domain)
> {
>@@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> 	struct vfio_group *group, *g;
> 	struct vfio_domain *domain, *d;
> 	struct bus_type *bus = NULL;
>-	int ret;
>+	int ret, safe_irq_domains;
> 
> 	mutex_lock(&iommu->lock);
> 
>@@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> 
> 	group->iommu_group = iommu_group;
> 
>+	/*
>+	 * Determine if all the devices of the group have a safe irq domain
>+	 * with respect to MSI isolation
>+	 */
>+	safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>+				       vfio_safe_irq_domain);
>+
> 	/* Determine bus_type in order to allocate a domain */
> 	ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
> 	if (ret)
>@@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> 	INIT_LIST_HEAD(&domain->group_list);
> 	list_add(&group->next, &domain->group_list);
> 
>+	/*
>+	 * to advertise safe interrupts either the IOMMU or the MSI controllers
>+	 * must support IRQ remapping/interrupt translation
>+	 */
> 	if (!allow_unsafe_interrupts &&
>-	    !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>+	    (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
> 		pr_warn("%s: No interrupt remapping support.  Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
> 		       __func__);
> 		ret = -EPERM;
>-- 
>1.9.1
>
>
>_______________________________________________
>linux-arm-kernel mailing list
>linux-arm-kernel@lists.infradead.org
>http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v9 4/7] vfio: allow reserved msi iova registration
  2016-05-05 19:22   ` Chalamarla, Tirumalesh
@ 2016-05-09  7:55     ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-05-09  7:55 UTC (permalink / raw)
  To: Chalamarla, Tirumalesh, eric.auger, robin.murphy,
	alex.williamson, will.deacon, joro, tglx, jason, marc.zyngier,
	christoffer.dall, linux-arm-kernel
  Cc: julien.grall, patches, Jean-Philippe.Brucker, p.fedin,
	linux-kernel, Bharat.Bhushan, iommu, pranav.sawargaonkar,
	yehuday

Hi Chalamarla,
On 05/05/2016 09:22 PM, Chalamarla, Tirumalesh wrote:
> 
> 
> 
> 
> 
> On 5/4/16, 4:54 AM, "linux-arm-kernel on behalf of Eric Auger" <linux-arm-kernel-bounces@lists.infradead.org on behalf of eric.auger@linaro.org> wrote:
> 
>> The user is allowed to register a reserved MSI IOVA range by using the
>> DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
>> This region is stored in the vfio_dma rb tree. At that point the iova
>> range is not mapped to any target address yet. The host kernel will use
>> those iova when needed, typically when MSIs are allocated.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>> Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
>>
>> [...]
>> @@ -1072,6 +1145,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>> 		if (map.argsz < minsz || map.flags & ~mask)
>> 			return -EINVAL;
>>
>> +		if (map.flags & VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA)
>> +			return vfio_register_msi_range(iommu, &map);
>> +
> 
> How are we getting the PA and any validation needed?
Sorry, I am not sure I get your question. The principle is that userspace
registers a spare guest PA range that the kernel can safely use to map
host MSI frames. In the QEMU integration we use unmapped GPA space from
the platform bus to make things transparent for the virt machine.

As for validation: we currently make sure the registered GPA range does
not overlap any region already registered as RAM. The kernel also checks
it is properly aligned with respect to the IOMMU page size.
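For illustration, the alignment and wrap checks applied by the kernel in
vfio_register_msi_range() can be mirrored in userspace before issuing the
ioctl. The helper below is a sketch, not part of this series; the function
name is hypothetical and the page-size bitmap is assumed to come from
VFIO_IOMMU_GET_INFO:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the sanity checks vfio_register_msi_range() performs, so
 * userspace can validate a candidate MSI IOVA window up front.
 * pgsize_bitmap is assumed to be the iova_pgsizes value returned by
 * VFIO_IOMMU_GET_INFO; the helper name is illustrative only.
 */
static bool msi_window_valid(uint64_t iova, uint64_t size,
			     uint64_t pgsize_bitmap)
{
	/* smallest supported IOMMU page size: lowest set bit */
	uint64_t min_pgsize = pgsize_bitmap & -pgsize_bitmap;
	uint64_t mask = min_pgsize - 1;

	if (!size || ((size | iova) & mask))
		return false;		/* empty or misaligned window */

	if (iova + size - 1 < iova)
		return false;		/* IOVA address wrap */

	return true;
}
```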

Please let me know if I answered your question.

Best Regards

Eric

>> 		return vfio_dma_do_map(iommu, &map);
>>
>> 	} else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 255a211..4a9dbc2 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -498,12 +498,19 @@ struct vfio_iommu_type1_info {
>>  *
>>  * Map process virtual addresses to IO virtual addresses using the
>>  * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
>> + *
>> + * In case RESERVED_MSI_IOVA flag is set, the API only aims at registering an
>> + * IOVA region that will be used on some platforms to map the host MSI frames.
>> + * In that specific case, vaddr is ignored. Once registered, an MSI reserved
>> + * IOVA region stays until the container is closed.
>>  */
>> struct vfio_iommu_type1_dma_map {
>> 	__u32	argsz;
>> 	__u32	flags;
>> #define VFIO_DMA_MAP_FLAG_READ (1 << 0)		/* readable from device */
>> #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)	/* writable from device */
>> +/* reserved iova for MSI vectors*/
>> +#define VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA (1 << 2)
>> 	__u64	vaddr;				/* Process virtual address */
>> 	__u64	iova;				/* IO virtual address */
>> 	__u64	size;				/* Size of mapping (bytes) */
>> @@ -519,7 +526,8 @@ struct vfio_iommu_type1_dma_map {
>>  * Caller sets argsz.  The actual unmapped size is returned in the size
>>  * field.  No guarantee is made to the user that arbitrary unmaps of iova
>>  * or size different from those used in the original mapping call will
>> - * succeed.
>> + * succeed. Once registered, an MSI region cannot be unmapped and stays
>> + * until the container is closed.
>>  */
>> struct vfio_iommu_type1_dma_unmap {
>> 	__u32	argsz;
>> -- 
>> 1.9.1
>>
>>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
  2016-05-05 19:23   ` Chalamarla, Tirumalesh
@ 2016-05-09  8:05     ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-05-09  8:05 UTC (permalink / raw)
  To: Chalamarla, Tirumalesh, eric.auger, robin.murphy,
	alex.williamson, will.deacon, joro, tglx, jason, marc.zyngier,
	christoffer.dall, linux-arm-kernel
  Cc: julien.grall, patches, Jean-Philippe.Brucker, p.fedin,
	linux-kernel, Bharat.Bhushan, iommu, pranav.sawargaonkar,
	yehuday

Hi Chalamarla,
On 05/05/2016 09:23 PM, Chalamarla, Tirumalesh wrote:
> 
> 
> 
> 
> 
> On 5/4/16, 4:54 AM, "linux-arm-kernel on behalf of Eric Auger" <linux-arm-kernel-bounces@lists.infradead.org on behalf of eric.auger@linaro.org> wrote:
> 
>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
>> by the msi controller. vfio_safe_irq_domain allows to check whether
>> interrupts are "safe" for a given device. They are if the device does
>> not use MSI or if the device uses MSI and the msi-parent controller
>> supports IRQ remapping.
>>
>> Then we check at group level if all devices have safe interrupts: if not,
>> we only allow the group to be attached if allow_unsafe_interrupts is set.
>>
>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>> changed in next patch.
> 
> Will this work in systems with multiple ITS?
Yes it should support multiple ITS.

Please note however that the series does not yet implement the
msi_doorbell_info callback in the GICv3 ITS PCIe/platform irqchip. Without
that, the pci_enable path is going to fail. I am going to submit a
separate RFC for that, but I can't test it at the moment.

Best Regards

Eric
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>> v3 -> v4:
>> - rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
>>  and irq_remapping into safe_irq_domains
>>
>> v2 -> v3:
>> - protect vfio_msi_parent_irq_remapping_capable with
>>  CONFIG_GENERIC_MSI_IRQ_DOMAIN
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
>> 1 file changed, 42 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 4d3a6f1..2fc8197 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -37,6 +37,8 @@
>> #include <linux/vfio.h>
>> #include <linux/workqueue.h>
>> #include <linux/msi-iommu.h>
>> +#include <linux/irqdomain.h>
>> +#include <linux/msi.h>
>>
>> #define DRIVER_VERSION  "0.2"
>> #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev, void *data)
>> 	return 0;
>> }
>>
>> +/**
>> + * vfio_safe_irq_domain: returns whether the irq domain
>> + * the device is attached to is safe with respect to MSI isolation.
>> + * If the irq domain is not an MSI domain, we return it is safe.
>> + *
>> + * @dev: device handle
>> + * @data: unused
>> + * returns 0 if the irq domain is safe, -1 if not.
>> + */
>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>> +{
>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>> +	struct irq_domain *domain;
>> +	struct msi_domain_info *info;
>> +
>> +	domain = dev_get_msi_domain(dev);
>> +	if (!domain)
>> +		return 0;
>> +
>> +	info = msi_get_domain_info(domain);
>> +
>> +	if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>> +		return -1;
>> +#endif
>> +	return 0;
>> +}
>> +
>> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>> 			     struct vfio_domain *domain)
>> {
>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>> 	struct vfio_group *group, *g;
>> 	struct vfio_domain *domain, *d;
>> 	struct bus_type *bus = NULL;
>> -	int ret;
>> +	int ret, safe_irq_domains;
>>
>> 	mutex_lock(&iommu->lock);
>>
>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>>
>> 	group->iommu_group = iommu_group;
>>
>> +	/*
>> +	 * Determine if all the devices of the group have a safe irq domain
>> +	 * with respect to MSI isolation
>> +	 */
>> +	safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>> +				       vfio_safe_irq_domain);
>> +
>> 	/* Determine bus_type in order to allocate a domain */
>> 	ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
>> 	if (ret)
>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>> 	INIT_LIST_HEAD(&domain->group_list);
>> 	list_add(&group->next, &domain->group_list);
>>
>> +	/*
>> +	 * to advertise safe interrupts either the IOMMU or the MSI controllers
>> +	 * must support IRQ remapping/interrupt translation
>> +	 */
>> 	if (!allow_unsafe_interrupts &&
>> -	    !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>> +	    (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
>> 		pr_warn("%s: No interrupt remapping support.  Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
>> 		       __func__);
>> 		ret = -EPERM;
>> -- 
>> 1.9.1
>>
>>


* Re: [PATCH v9 2/7] vfio/type1: vfio_find_dma accepting a type argument
  2016-05-04 11:54 ` [PATCH v9 2/7] vfio/type1: vfio_find_dma accepting a type argument Eric Auger
@ 2016-05-09 22:49   ` Alex Williamson
  2016-05-10 14:54     ` Eric Auger
  0 siblings, 1 reply; 38+ messages in thread
From: Alex Williamson @ 2016-05-09 22:49 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, robin.murphy, will.deacon, joro, tglx, jason,
	marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, julien.grall, yehuday

On Wed,  4 May 2016 11:54:13 +0000
Eric Auger <eric.auger@linaro.org> wrote:

> In our RB-tree we now have slots of different types (USER and RESERVED).
> It becomes useful to be able to search for dma slots of a specific type or
> any type. This patch proposes an implementation for that modality and also
> changes the existing callers using the USER type.
> 
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 63 ++++++++++++++++++++++++++++++++++-------
>  1 file changed, 53 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index aaf5a6c..2d769d4 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -98,23 +98,64 @@ struct vfio_group {
>   * into DMA'ble space using the IOMMU
>   */
>  
> -static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
> -				      dma_addr_t start, size_t size)
> +/**
> + * vfio_find_dma_from_node: looks for a dma slot intersecting a window
> + * from a given rb tree node
> + * @top: top rb tree node where the search starts (including this node)
> + * @start: window start
> + * @size: window size
> + * @type: window type
> + */
> +static struct vfio_dma *vfio_find_dma_from_node(struct rb_node *top,
> +						dma_addr_t start, size_t size,
> +						enum vfio_iova_type type)
>  {
> -	struct rb_node *node = iommu->dma_list.rb_node;
> +	struct rb_node *node = top;
> +	struct vfio_dma *dma;
>  
>  	while (node) {
> -		struct vfio_dma *dma = rb_entry(node, struct vfio_dma, node);
> -
> +		dma = rb_entry(node, struct vfio_dma, node);
>  		if (start + size <= dma->iova)
>  			node = node->rb_left;
>  		else if (start >= dma->iova + dma->size)
>  			node = node->rb_right;
>  		else
> +			break;
> +	}
> +	if (!node)
> +		return NULL;
> +
> +	/* a dma slot intersects our window, check the type also matches */
> +	if (type == VFIO_IOVA_ANY || dma->type == type)
> +		return dma;
> +
> +	/* restart 2 searches skipping the current node */
> +	if (start < dma->iova) {
> +		dma = vfio_find_dma_from_node(node->rb_left, start,
> +					      size, type);
> +		if (dma)
>  			return dma;
>  	}
> +	if (start + size > dma->iova + dma->size)
> +		dma = vfio_find_dma_from_node(node->rb_right, start,
> +					      size, type);

It might be nice to split this little bit of trickery out to a separate
patch so that you can better explain what it does.  It's really not
needed until we start creating user-type entries, so it shouldn't hurt
bisection.
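The restart trick can be illustrated outside the kernel with a sorted array
standing in for the rb-tree (the middle of each slice plays the role of a
subtree root): when the first intersecting slot has the wrong type, the
window may still intersect slots on either side, so both sub-ranges are
searched again. This is a simplified, self-contained sketch, not the kernel
code:

```c
#include <stddef.h>
#include <stdint.h>

enum iova_type { IOVA_USER, IOVA_RESERVED, IOVA_ANY };

struct slot {
	uint64_t iova;
	uint64_t size;
	enum iova_type type;
};

/*
 * Stand-in for vfio_find_dma_from_node(): the sorted slice [lo, hi)
 * acts as an rb subtree. On a type mismatch at the "root", both sides
 * are searched again, like the two restarted searches in the patch.
 */
static struct slot *find_slot(struct slot *s, size_t lo, size_t hi,
			      uint64_t start, uint64_t size,
			      enum iova_type type)
{
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		struct slot *d = &s[mid];

		if (start + size <= d->iova) {
			hi = mid;		/* window entirely left */
		} else if (start >= d->iova + d->size) {
			lo = mid + 1;		/* window entirely right */
		} else {
			/* intersection found: check the type too */
			if (type == IOVA_ANY || d->type == type)
				return d;
			/* wrong type: restart on both sides of this node */
			if (start < d->iova) {
				struct slot *r = find_slot(s, lo, mid,
							   start, size, type);
				if (r)
					return r;
			}
			if (start + size > d->iova + d->size)
				return find_slot(s, mid + 1, hi,
						 start, size, type);
			return NULL;
		}
	}
	return NULL;
}
```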

> +	return dma;
> +}
> +
> +/**
> + * vfio_find_dma: find a dma slot intersecting a given window
> + * @iommu: vfio iommu handle
> + * @start: window base iova
> + * @size: window size
> + * @type: window type
> + */
> +static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
> +				      dma_addr_t start, size_t size,
> +				      enum vfio_iova_type type)
> +{
> +	struct rb_node *top_node = iommu->dma_list.rb_node;
>  
> -	return NULL;
> +	return vfio_find_dma_from_node(top_node, start, size, type);
>  }
>  
>  static void vfio_link_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
> @@ -488,19 +529,21 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>  	 * mappings within the range.
>  	 */
>  	if (iommu->v2) {
> -		dma = vfio_find_dma(iommu, unmap->iova, 0);
> +		dma = vfio_find_dma(iommu, unmap->iova, 0, VFIO_IOVA_USER);
>  		if (dma && dma->iova != unmap->iova) {
>  			ret = -EINVAL;
>  			goto unlock;
>  		}
> -		dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0);
> +		dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0,
> +				    VFIO_IOVA_USER);
>  		if (dma && dma->iova + dma->size != unmap->iova + unmap->size) {
>  			ret = -EINVAL;
>  			goto unlock;
>  		}
>  	}
>  
> -	while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) {
> +	while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size,
> +				    VFIO_IOVA_USER))) {
>  		if (!iommu->v2 && unmap->iova > dma->iova)
>  			break;
>  		unmapped += dma->size;
> @@ -604,7 +647,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>  
>  	mutex_lock(&iommu->lock);
>  
> -	if (vfio_find_dma(iommu, iova, size)) {
> +	if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
>  		mutex_unlock(&iommu->lock);
>  		return -EEXIST;
>  	}


* Re: [PATCH v9 3/7] vfio/type1: bypass unmap/unpin and replay for VFIO_IOVA_RESERVED slots
  2016-05-04 11:54 ` [PATCH v9 3/7] vfio/type1: bypass unmap/unpin and replay for VFIO_IOVA_RESERVED slots Eric Auger
@ 2016-05-09 22:49   ` Alex Williamson
  2016-05-11 12:58     ` Eric Auger
  0 siblings, 1 reply; 38+ messages in thread
From: Alex Williamson @ 2016-05-09 22:49 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, robin.murphy, will.deacon, joro, tglx, jason,
	marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, julien.grall, yehuday

On Wed,  4 May 2016 11:54:14 +0000
Eric Auger <eric.auger@linaro.org> wrote:

> Before allowing the end-user to create VFIO_IOVA_RESERVED dma slots,
> let's implement the expected behavior for removal and replay. As opposed
> to user dma slots, IOVAs are not systematically bound to PAs and PAs are
> not pinned. VFIO just initializes the IOVA "aperture". IOVAs are allocated
> outside of the VFIO framework, typically by the MSI layer, which is
> responsible for freeing and unmapping them. The MSI mapping resources are
> freed by the IOMMU driver on domain destruction.
> 
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> 
> ---
> 
> v7 -> v8:
> - do no destroy anything anymore, just bypass unmap/unpin and iommu_map
>   on replay
> ---
>  drivers/vfio/vfio_iommu_type1.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 2d769d4..94a9916 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -391,7 +391,7 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
>  	struct vfio_domain *domain, *d;
>  	long unlocked = 0;
>  
> -	if (!dma->size)
> +	if (!dma->size || dma->type != VFIO_IOVA_USER)
>  		return;
>  	/*
>  	 * We use the IOMMU to track the physical addresses, otherwise we'd
> @@ -727,6 +727,9 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
>  		dma = rb_entry(n, struct vfio_dma, node);
>  		iova = dma->iova;
>  
> +		if (dma->type == VFIO_IOVA_RESERVED)
> +			continue;
> +

But you do still need some sort of replay mechanism, right?  Not to
replay the IOVA to PA mapping, but to call iommu_msi_set_aperture() for
the new domain.  How will you know that this entry is an MSI reserved
range or something else?  Perhaps we can't have a generic "reserved"
type here.

>  		while (iova < dma->iova + dma->size) {
>  			phys_addr_t phys = iommu_iova_to_phys(d->domain, iova);
>  			size_t size;


* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
  2016-05-04 11:54 ` [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain Eric Auger
  2016-05-05 19:23   ` Chalamarla, Tirumalesh
@ 2016-05-09 22:49   ` Alex Williamson
  2016-05-10 16:10     ` Eric Auger
  1 sibling, 1 reply; 38+ messages in thread
From: Alex Williamson @ 2016-05-09 22:49 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, robin.murphy, will.deacon, joro, tglx, jason,
	marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, julien.grall, yehuday

On Wed,  4 May 2016 11:54:16 +0000
Eric Auger <eric.auger@linaro.org> wrote:

> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
> by the msi controller. vfio_safe_irq_domain allows to check whether
> interrupts are "safe" for a given device. They are if the device does
> not use MSI

Are we sure we're not opening a security hole here?  An MSI is simply a
DMA write, so really whether or not a device uses MSI is irrelevant.
If it can generate a DMA to the MSI doorbell then we need to be
protected and I think we pretty much need to assume that devices are
DMA capable.  Do the MSI domain checks cover this?

> or if the device uses MSI and the msi-parent controller
> supports IRQ remapping.
> 
> Then we check at group level if all devices have safe interrupts: if not,
> we only allow the group to be attached if allow_unsafe_interrupts is set.
> 
> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
> changed in next patch.
> 
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> 
> ---
> v3 -> v4:
> - rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
>   and irq_remapping into safe_irq_domains
> 
> v2 -> v3:
> - protect vfio_msi_parent_irq_remapping_capable with
>   CONFIG_GENERIC_MSI_IRQ_DOMAIN
> ---
>  drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 42 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 4d3a6f1..2fc8197 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -37,6 +37,8 @@
>  #include <linux/vfio.h>
>  #include <linux/workqueue.h>
>  #include <linux/msi-iommu.h>
> +#include <linux/irqdomain.h>
> +#include <linux/msi.h>
>  
>  #define DRIVER_VERSION  "0.2"
>  #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev, void *data)
>  	return 0;
>  }
>  
> +/**
> + * vfio_safe_irq_domain: returns whether the irq domain
> + * the device is attached to is safe with respect to MSI isolation.
> + * If the irq domain is not an MSI domain, we return it is safe.
> + *
> + * @dev: device handle
> + * @data: unused
> + * returns 0 if the irq domain is safe, -1 if not.
> + */
> +static int vfio_safe_irq_domain(struct device *dev, void *data)
> +{
> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
> +	struct irq_domain *domain;
> +	struct msi_domain_info *info;
> +
> +	domain = dev_get_msi_domain(dev);
> +	if (!domain)
> +		return 0;
> +
> +	info = msi_get_domain_info(domain);
> +
> +	if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
> +		return -1;
> +#endif
> +	return 0;
> +}
> +
>  static int vfio_iommu_replay(struct vfio_iommu *iommu,
>  			     struct vfio_domain *domain)
>  {
> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  	struct vfio_group *group, *g;
>  	struct vfio_domain *domain, *d;
>  	struct bus_type *bus = NULL;
> -	int ret;
> +	int ret, safe_irq_domains;
>  
>  	mutex_lock(&iommu->lock);
>  
> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  
>  	group->iommu_group = iommu_group;
>  
> +	/*
> +	 * Determine if all the devices of the group have a safe irq domain
> +	 * with respect to MSI isolation
> +	 */
> +	safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
> +				       vfio_safe_irq_domain);
> +
>  	/* Determine bus_type in order to allocate a domain */
>  	ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
>  	if (ret)
> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  	INIT_LIST_HEAD(&domain->group_list);
>  	list_add(&group->next, &domain->group_list);
>  
> +	/*
> +	 * to advertise safe interrupts either the IOMMU or the MSI controllers
> +	 * must support IRQ remapping/interrupt translation
> +	 */
>  	if (!allow_unsafe_interrupts &&
> -	    !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
> +	    (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
>  		pr_warn("%s: No interrupt remapping support.  Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
>  		       __func__);
>  		ret = -EPERM;


* Re: [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains
  2016-05-04 11:54 ` [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains Eric Auger
  2016-05-04 12:06   ` Eric Auger
@ 2016-05-09 22:49   ` Alex Williamson
  2016-05-10 16:36     ` Eric Auger
  1 sibling, 1 reply; 38+ messages in thread
From: Alex Williamson @ 2016-05-09 22:49 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, robin.murphy, will.deacon, joro, tglx, jason,
	marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, julien.grall, yehuday

On Wed,  4 May 2016 11:54:18 +0000
Eric Auger <eric.auger@linaro.org> wrote:

> This patch allows the user-space to retrieve the MSI geometry. The
> implementation is based on capability chains, now also added to
> VFIO_IOMMU_GET_INFO.
> 
> The returned info comprise:
> - whether the MSI IOVA are constrained to a reserved range (x86 case) and
>   in the positive, the start/end of the aperture,
> - or whether the IOVA aperture need to be set by the userspace. In that
>   case, the size and alignment of the IOVA region to be provided are
>   returned.
> 
> In case the userspace must provide the IOVA range, we currently return
> an arbitrary number of IOVA pages (16), supposed to fulfill the needs of
> current ARM platforms. This may be deprecated by a more sophisticated
> computation later on.
> 
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> 
> ---
> v8 -> v9:
> - use iommu_msi_supported flag instead of programmable
> - replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
>   capability chain, reporting the MSI geometry
> 
> v7 -> v8:
> - use iommu_domain_msi_geometry
> 
> v6 -> v7:
> - remove the computation of the number of IOVA pages to be provisioned.
>   This number depends on the domain/group/device topology, which can
>   dynamically change. Let's instead rely on an arbitrary max depending
>   on the system
> 
> v4 -> v5:
> - move msi_info and ret declaration within the conditional code
> 
> v3 -> v4:
> - replace former vfio_domains_require_msi_mapping by
>   more complex computation of MSI mapping requirements, especially the
>   number of pages to be provided by the user-space.
> - reword patch title
> 
> RFC v1 -> v1:
> - derived from
>   [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
> - renamed allow_msi_reconfig into require_msi_mapping
> - fixed VFIO_IOMMU_GET_INFO
> ---
>  drivers/vfio/vfio_iommu_type1.c | 69 +++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h       | 30 +++++++++++++++++-
>  2 files changed, 98 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 2fc8197..841360b 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1134,6 +1134,50 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>  	return ret;
>  }
>  
> +static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
> +				     struct vfio_info_cap *caps)
> +{
> +	struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
> +	struct iommu_domain_msi_geometry msi_geometry;
> +	struct vfio_info_cap_header *header;
> +	struct vfio_domain *d;
> +	bool mapping_required;
> +	size_t size;
> +
> +	mutex_lock(&iommu->lock);
> +	/* All domains have same require_msi_map property, pick first */
> +	d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
> +	iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
> +			      &msi_geometry);
> +	mapping_required = msi_geometry.iommu_msi_supported;
> +
> +	mutex_unlock(&iommu->lock);
> +
> +	size = sizeof(*vfio_msi_geometry);
> +	header = vfio_info_cap_add(caps, size,
> +				   VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
> +
> +	if (IS_ERR(header))
> +		return PTR_ERR(header);
> +
> +	vfio_msi_geometry = container_of(header,
> +				struct vfio_iommu_type1_info_cap_msi_geometry,
> +				header);
> +
> +	vfio_msi_geometry->reserved = !mapping_required;
> +	if (vfio_msi_geometry->reserved) {
> +		vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
> +		vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;
> +		return 0;
> +	}
> +
> +	vfio_msi_geometry->alignment = 1 << __ffs(vfio_pgsize_bitmap(iommu));
> +	/* we currently report the need for an arbitrary number of 16 pages */
> +	vfio_msi_geometry->size = 16 * vfio_msi_geometry->alignment;

Hmm, that really is arbitrary.  How could we know a real value here?

> +
> +	return 0;
> +}
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  				   unsigned int cmd, unsigned long arg)
>  {
> @@ -1155,6 +1199,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  		}
>  	} else if (cmd == VFIO_IOMMU_GET_INFO) {
>  		struct vfio_iommu_type1_info info;
> +		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
> +		int ret;
>  
>  		minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
>  
> @@ -1168,6 +1214,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  
>  		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
>  
> +		ret = compute_msi_geometry_caps(iommu, &caps);
> +		if (ret)
> +			return ret;
> +
> +		if (caps.size) {
> +			info.flags |= VFIO_IOMMU_INFO_CAPS;
> +			if (info.argsz < sizeof(info) + caps.size) {
> +				info.argsz = sizeof(info) + caps.size;
> +				info.cap_offset = 0;
> +			} else {
> +				vfio_info_cap_shift(&caps, sizeof(info));
> +				if (copy_to_user((void __user *)arg +
> +						sizeof(info), caps.buf,
> +						caps.size)) {
> +					kfree(caps.buf);
> +					return -EFAULT;
> +				}
> +				info.cap_offset = sizeof(info);
> +			}
> +
> +			kfree(caps.buf);
> +		}
> +
>  		return copy_to_user((void __user *)arg, &info, minsz) ?
>  			-EFAULT : 0;
>  
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 4a9dbc2..0ff6a8d 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -488,7 +488,33 @@ struct vfio_iommu_type1_info {
>  	__u32	argsz;
>  	__u32	flags;
>  #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
> -	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
> +#define VFIO_IOMMU_INFO_CAPS	(1 << 1)	/* Info supports caps */
> +	__u32   cap_offset;	/* Offset within info struct of first cap */
> +	__u64	iova_pgsizes;	/* Bitmap of supported page sizes */

This would break existing users; we can't arbitrarily change the offset
of iova_pgsizes.  We can add cap_offset to the end, and I think
everything above would work if we do that.
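The ABI concern can be checked mechanically with offsetof(): inserting
cap_offset before iova_pgsizes moves an existing field, while appending it
keeps the old offsets stable. A quick sketch (field names from the patch;
the struct names here are illustrative, and uint32_t/uint64_t stand in for
__u32/__u64):

```c
#include <stddef.h>
#include <stdint.h>

/* existing uapi layout: iova_pgsizes at offset 8 */
struct info_old {
	uint32_t argsz;
	uint32_t flags;
	uint64_t iova_pgsizes;
};

/*
 * patch as posted: cap_offset inserted before iova_pgsizes, which
 * pushes iova_pgsizes to offset 16 and breaks existing users
 */
struct info_inserted {
	uint32_t argsz;
	uint32_t flags;
	uint32_t cap_offset;
	uint64_t iova_pgsizes;	/* padded up to offset 16 */
};

/* suggested fix: append cap_offset, old offsets unchanged */
struct info_appended {
	uint32_t argsz;
	uint32_t flags;
	uint64_t iova_pgsizes;	/* still at offset 8 */
	uint32_t cap_offset;
};
```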

> +};
> +
> +#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY	1
> +
> +/*
> + * The MSI geometry capability allows to report the MSI IOVA geometry:
> + * - either the MSI IOVAs are constrained within a reserved IOVA aperture
> + *   whose boundaries are given by [@aperture_start, @aperture_end].
> + *   this is typically the case on x86 host. The userspace is not allowed
> + *   to map userspace memory at IOVAs intersecting this range using
> + *   VFIO_IOMMU_MAP_DMA.
> + * - or the MSI IOVAs are not requested to belong to any reserved range;
> + *   in that case the userspace must provide an IOVA window characterized by
> + *   @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
> + */
> +struct vfio_iommu_type1_info_cap_msi_geometry {
> +	struct vfio_info_cap_header header;
> +	bool reserved; /* Are MSI IOVAs within a reserved aperture? */

Do bools have a guaranteed user size?  Let's make this a __u32 and call
it flags with bit 0 defined as reserved.  I'm tempted to suggest we
could figure out how to make alignment fit in another __u32 so we have a
properly packed structure, otherwise we should make a reserved __u32.
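One possible packed layout along these lines replaces the bool with a
__u32 flags word (bit 0 = "reserved aperture") plus an explicit pad, so
the __u64 members stay naturally aligned with no holes. This is only a
sketch of the suggestion, not the uapi as merged; the flag name is
hypothetical, and uint16_t/uint32_t/uint64_t stand in for the __u16/__u32/
__u64 uapi types:

```c
#include <stddef.h>
#include <stdint.h>

/* mirrors struct vfio_info_cap_header from include/uapi/linux/vfio.h */
struct vfio_info_cap_header {
	uint16_t id;
	uint16_t version;
	uint32_t next;
};

struct msi_geometry_cap {
	struct vfio_info_cap_header header;
#define MSI_GEOMETRY_RESERVED (1U << 0)	/* hypothetical flag name */
	uint32_t flags;			/* bit 0: reserved aperture */
	uint32_t pad;			/* keeps the u64s aligned, no holes */
	uint64_t aperture_start;	/* valid when RESERVED is set */
	uint64_t aperture_end;
	uint64_t size;			/* valid when RESERVED is clear */
	uint64_t alignment;
};
```

With this layout the structure has the same size with or without compiler
packing, which is the property a uapi struct needs.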

> +	/* reserved */
> +	__u64 aperture_start;
> +	__u64 aperture_end;
> +	/* not reserved */
> +	__u64 size; /* IOVA aperture size in bytes the userspace must provide */
> +	__u64 alignment; /* alignment of the window, in bytes */
>  };
>  
>  #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
> @@ -503,6 +529,8 @@ struct vfio_iommu_type1_info {
>   * IOVA region that will be used on some platforms to map the host MSI frames.
>   * In that specific case, vaddr is ignored. Once registered, an MSI reserved
>   * IOVA region stays until the container is closed.
> + * The requirement for provisioning such reserved IOVA range can be checked by
> + * checking the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability.
>   */
>  struct vfio_iommu_type1_dma_map {
>  	__u32	argsz;


* Re: [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains
  2016-05-04 12:06   ` Eric Auger
@ 2016-05-09 23:03     ` Alex Williamson
  2016-05-10 16:50       ` Eric Auger
  0 siblings, 1 reply; 38+ messages in thread
From: Alex Williamson @ 2016-05-09 23:03 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, robin.murphy, will.deacon, joro, tglx, jason,
	marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, julien.grall, yehuday

On Wed, 4 May 2016 14:06:19 +0200
Eric Auger <eric.auger@linaro.org> wrote:

> Hi Alex,
> On 05/04/2016 01:54 PM, Eric Auger wrote:
> > This patch allows the user-space to retrieve the MSI geometry. The
> > implementation is based on capability chains, now also added to
> > VFIO_IOMMU_GET_INFO.  
> 
> If you prefer, we could consider this patch outside of the main series
> since it brings extra functionality (MSI geometry reporting). In a
> first QEMU integration we could live without knowing the MSI geometry, I
> think, all the more so as I currently report an arbitrary number of
> requested IOVA pages. Computing the exact number of doorbells to map
> brings extra complexity, and I have not addressed this issue yet.
> 
> It sketches a possible user API to report the MSI geometry based on the
> capability chains, as you suggested some time ago. I am currently busy
> drafting a QEMU integration.

How would the user know that reserved MSI mappings are required or
available without this?  Wouldn't the only option be for userspace to
try to map something with the reserved MSI flag set and see if the
kernel accepts it?  That's not a very desirable programming model.  The
arbitrary size is pretty ugly, but it at least makes for a consistent
user interface.  Is it a functional issue if we overestimate the size
or is it just a matter of wasting IOVA space?  Is there significant
harm in making it obscenely large, like 1MB?  The reference counting and
re-use of IOVA pages seems like we may often only be using a single
IOVA page for multiple doorbells.  I guess I'm leaning towards defining
the API even if the value is somewhat arbitrary, because we'd rather
have control of this than have the user guess and then try to rope them
back in later to use a kernel-recommended value.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v9 2/7] vfio/type1: vfio_find_dma accepting a type argument
  2016-05-09 22:49   ` Alex Williamson
@ 2016-05-10 14:54     ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-05-10 14:54 UTC (permalink / raw)
  To: Alex Williamson
  Cc: eric.auger, robin.murphy, will.deacon, joro, tglx, jason,
	marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, julien.grall, yehuday

Hi Alex,
On 05/10/2016 12:49 AM, Alex Williamson wrote:
> On Wed,  4 May 2016 11:54:13 +0000
> Eric Auger <eric.auger@linaro.org> wrote:
> 
>> In our RB-tree we now have slots of different types (USER and RESERVED).
>> It becomes useful to be able to search for dma slots of a specific type or
>> any type. This patch proposes an implementation for that modality and also
>> changes the existing callers using the USER type.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 63 ++++++++++++++++++++++++++++++++++-------
>>  1 file changed, 53 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index aaf5a6c..2d769d4 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -98,23 +98,64 @@ struct vfio_group {
>>   * into DMA'ble space using the IOMMU
>>   */
>>  
>> -static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
>> -				      dma_addr_t start, size_t size)
>> +/**
>> + * vfio_find_dma_from_node: looks for a dma slot intersecting a window
>> + * from a given rb tree node
>> + * @top: top rb tree node where the search starts (including this node)
>> + * @start: window start
>> + * @size: window size
>> + * @type: window type
>> + */
>> +static struct vfio_dma *vfio_find_dma_from_node(struct rb_node *top,
>> +						dma_addr_t start, size_t size,
>> +						enum vfio_iova_type type)
>>  {
>> -	struct rb_node *node = iommu->dma_list.rb_node;
>> +	struct rb_node *node = top;
>> +	struct vfio_dma *dma;
>>  
>>  	while (node) {
>> -		struct vfio_dma *dma = rb_entry(node, struct vfio_dma, node);
>> -
>> +		dma = rb_entry(node, struct vfio_dma, node);
>>  		if (start + size <= dma->iova)
>>  			node = node->rb_left;
>>  		else if (start >= dma->iova + dma->size)
>>  			node = node->rb_right;
>>  		else
>> +			break;
>> +	}
>> +	if (!node)
>> +		return NULL;
>> +
>> +	/* a dma slot intersects our window, check the type also matches */
>> +	if (type == VFIO_IOVA_ANY || dma->type == type)
>> +		return dma;
>> +
>> +	/* restart 2 searches skipping the current node */
>> +	if (start < dma->iova) {
>> +		dma = vfio_find_dma_from_node(node->rb_left, start,
>> +					      size, type);
>> +		if (dma)
>>  			return dma;
>>  	}
>> +	if (start + size > dma->iova + dma->size)
>> +		dma = vfio_find_dma_from_node(node->rb_right, start,
>> +					      size, type);
> 
> It might be nice to split this little bit of trickery out to a separate
> patch so that you can better explain what it does.  It's really not
> needed until we start creating user-type entries, so it shouldn't hurt
> bisection.
OK I will

Regards

Eric
> 
>> +	return dma;
>> +}
>> +
>> +/**
>> + * vfio_find_dma: find a dma slot intersecting a given window
>> + * @iommu: vfio iommu handle
>> + * @start: window base iova
>> + * @size: window size
>> + * @type: window type
>> + */
>> +static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
>> +				      dma_addr_t start, size_t size,
>> +				      enum vfio_iova_type type)
>> +{
>> +	struct rb_node *top_node = iommu->dma_list.rb_node;
>>  
>> -	return NULL;
>> +	return vfio_find_dma_from_node(top_node, start, size, type);
>>  }
>>  
>>  static void vfio_link_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
>> @@ -488,19 +529,21 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>>  	 * mappings within the range.
>>  	 */
>>  	if (iommu->v2) {
>> -		dma = vfio_find_dma(iommu, unmap->iova, 0);
>> +		dma = vfio_find_dma(iommu, unmap->iova, 0, VFIO_IOVA_USER);
>>  		if (dma && dma->iova != unmap->iova) {
>>  			ret = -EINVAL;
>>  			goto unlock;
>>  		}
>> -		dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0);
>> +		dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0,
>> +				    VFIO_IOVA_USER);
>>  		if (dma && dma->iova + dma->size != unmap->iova + unmap->size) {
>>  			ret = -EINVAL;
>>  			goto unlock;
>>  		}
>>  	}
>>  
>> -	while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) {
>> +	while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size,
>> +				    VFIO_IOVA_USER))) {
>>  		if (!iommu->v2 && unmap->iova > dma->iova)
>>  			break;
>>  		unmapped += dma->size;
>> @@ -604,7 +647,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>  
>>  	mutex_lock(&iommu->lock);
>>  
>> -	if (vfio_find_dma(iommu, iova, size)) {
>> +	if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
>>  		mutex_unlock(&iommu->lock);
>>  		return -EEXIST;
>>  	}
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v9 4/7] vfio: allow reserved msi iova registration
  2016-05-04 11:54 ` [PATCH v9 4/7] vfio: allow reserved msi iova registration Eric Auger
  2016-05-05 19:22   ` Chalamarla, Tirumalesh
@ 2016-05-10 15:29   ` Alex Williamson
  2016-05-10 15:34     ` Eric Auger
  1 sibling, 1 reply; 38+ messages in thread
From: Alex Williamson @ 2016-05-10 15:29 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, robin.murphy, will.deacon, joro, tglx, jason,
	marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, julien.grall, yehuday

On Wed,  4 May 2016 11:54:15 +0000
Eric Auger <eric.auger@linaro.org> wrote:

> The user is allowed to register a reserved MSI IOVA range by using the
> DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
> This region is stored in the vfio_dma rb tree. At that point the iova
> range is not mapped to any target address yet. The host kernel will use
> those iova when needed, typically when MSIs are allocated.
> 
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> 
> ---
> v7 -> v8:
> - use iommu_msi_set_aperture function. There is no notion of
>   unregistration anymore since the reserved msi slot remains
>   until the container gets closed.
> 
> v6 -> v7:
> - use iommu_free_reserved_iova_domain
> - convey prot attributes downto dma-reserved-iommu iova domain creation
> - reserved bindings teardown now performed on iommu domain destruction
> - rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
>          VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
> - change title
> - pass the protection attribute to dma-reserved-iommu API
> 
> v3 -> v4:
> - use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
> - protect vfio_register_reserved_iova_range implementation with
>   CONFIG_IOMMU_DMA_RESERVED
> - handle unregistration by user-space and on vfio_iommu_type1 release
> 
> v1 -> v2:
> - set returned value according to alloc_reserved_iova_domain result
> - free the iova domains in case any error occurs
> 
> RFC v1 -> v1:
> - takes into account Alex comments, based on
>   [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
> - use the existing dma map/unmap ioctl interface with a flag to register
>   a reserved IOVA range. A single reserved iova region is allowed.
> ---
>  drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
>  include/uapi/linux/vfio.h       | 10 +++++-
>  2 files changed, 86 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 94a9916..4d3a6f1 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -36,6 +36,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/vfio.h>
>  #include <linux/workqueue.h>
> +#include <linux/msi-iommu.h>
>  
>  #define DRIVER_VERSION  "0.2"
>  #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
> @@ -445,6 +446,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
>  	vfio_lock_acct(-unlocked);
>  }
>  
> +static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
> +				dma_addr_t iova, size_t size)
> +{
> +	struct vfio_domain *d;
> +	int ret = 0;
> +
> +	list_for_each_entry(d, &iommu->domain_list, next) {
> +		ret = iommu_msi_set_aperture(d->domain, iova, iova + size - 1);
> +		if (ret)
> +			break;
> +	}
> +	return ret;
> +}
> +
>  static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>  {
>  	vfio_unmap_unpin(iommu, dma);
> @@ -693,6 +708,63 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>  	return ret;
>  }
>  
> +static int vfio_register_msi_range(struct vfio_iommu *iommu,
> +				   struct vfio_iommu_type1_dma_map *map)
> +{
> +	dma_addr_t iova = map->iova;
> +	size_t size = map->size;
> +	int ret = 0;
> +	struct vfio_dma *dma;
> +	unsigned long order;
> +	uint64_t mask;
> +
> +	/* Verify that none of our __u64 fields overflow */
> +	if (map->size != size || map->iova != iova)
> +		return -EINVAL;
> +
> +	order =  __ffs(vfio_pgsize_bitmap(iommu));
> +	mask = ((uint64_t)1 << order) - 1;
> +
> +	WARN_ON(mask & PAGE_MASK);
> +
> +	if (!size || (size | iova) & mask)
> +		return -EINVAL;
> +
> +	/* Don't allow IOVA address wrap */
> +	if (iova + size - 1 < iova)
> +		return -EINVAL;
> +
> +	mutex_lock(&iommu->lock);
> +
> +	if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
> +		ret =  -EEXIST;
> +		goto unlock;
> +	}
> +
> +	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
> +	if (!dma) {
> +		ret = -ENOMEM;
> +		goto unlock;
> +	}
> +
> +	dma->iova = iova;
> +	dma->size = size;
> +	dma->type = VFIO_IOVA_RESERVED;

[oops, forgot to send this reply with the others]

I'm tempted to suggest we set type explicitly in the USER case too just
to make that abundantly clear rather than taking advantage of the
kzalloc struct.

> +
> +	ret = vfio_set_msi_aperture(iommu, iova, size);
> +	if (ret)
> +		goto free_unlock;
> +
> +	vfio_link_dma(iommu, dma);
> +	goto unlock;
> +
> +free_unlock:
> +	kfree(dma);
> +unlock:
> +	mutex_unlock(&iommu->lock);
> +	return ret;
> +}
> +
>  static int vfio_bus_type(struct device *dev, void *data)
>  {
>  	struct bus_type **bus = data;
> @@ -1062,7 +1134,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  	} else if (cmd == VFIO_IOMMU_MAP_DMA) {
>  		struct vfio_iommu_type1_dma_map map;
>  		uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
> -				VFIO_DMA_MAP_FLAG_WRITE;
> +				VFIO_DMA_MAP_FLAG_WRITE |
> +				VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA;
>  
>  		minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
>  
> @@ -1072,6 +1145,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  		if (map.argsz < minsz || map.flags & ~mask)
>  			return -EINVAL;
>  
> +		if (map.flags & VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA)
> +			return vfio_register_msi_range(iommu, &map);
> +
>  		return vfio_dma_do_map(iommu, &map);
>  
>  	} else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 255a211..4a9dbc2 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -498,12 +498,19 @@ struct vfio_iommu_type1_info {
>   *
>   * Map process virtual addresses to IO virtual addresses using the
>   * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
> + *
> + * In case RESERVED_MSI_IOVA flag is set, the API only aims at registering an
> + * IOVA region that will be used on some platforms to map the host MSI frames.
> + * In that specific case, vaddr is ignored. Once registered, an MSI reserved
> + * IOVA region stays until the container is closed.
>   */
>  struct vfio_iommu_type1_dma_map {
>  	__u32	argsz;
>  	__u32	flags;
>  #define VFIO_DMA_MAP_FLAG_READ (1 << 0)		/* readable from device */
>  #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)	/* writable from device */
> +/* reserved iova for MSI vectors*/
> +#define VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA (1 << 2)
>  	__u64	vaddr;				/* Process virtual address */
>  	__u64	iova;				/* IO virtual address */
>  	__u64	size;				/* Size of mapping (bytes) */
> @@ -519,7 +526,8 @@ struct vfio_iommu_type1_dma_map {
>   * Caller sets argsz.  The actual unmapped size is returned in the size
>   * field.  No guarantee is made to the user that arbitrary unmaps of iova
>   * or size different from those used in the original mapping call will
> - * succeed.
> + * succeed. Once registered, an MSI region cannot be unmapped and stays
> + * until the container is closed.
>   */
>  struct vfio_iommu_type1_dma_unmap {
>  	__u32	argsz;

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v9 4/7] vfio: allow reserved msi iova registration
  2016-05-10 15:29   ` Alex Williamson
@ 2016-05-10 15:34     ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-05-10 15:34 UTC (permalink / raw)
  To: Alex Williamson
  Cc: eric.auger, robin.murphy, will.deacon, joro, tglx, jason,
	marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, julien.grall, yehuday

Hi Alex,
On 05/10/2016 05:29 PM, Alex Williamson wrote:
> On Wed,  4 May 2016 11:54:15 +0000
> Eric Auger <eric.auger@linaro.org> wrote:
> 
>> The user is allowed to register a reserved MSI IOVA range by using the
>> DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
>> This region is stored in the vfio_dma rb tree. At that point the iova
>> range is not mapped to any target address yet. The host kernel will use
>> those iova when needed, typically when MSIs are allocated.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>> Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
>>
>> ---
>> v7 -> v8:
>> - use iommu_msi_set_aperture function. There is no notion of
>>   unregistration anymore since the reserved msi slot remains
>>   until the container gets closed.
>>
>> v6 -> v7:
>> - use iommu_free_reserved_iova_domain
>> - convey prot attributes downto dma-reserved-iommu iova domain creation
>> - reserved bindings teardown now performed on iommu domain destruction
>> - rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
>>          VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
>> - change title
>> - pass the protection attribute to dma-reserved-iommu API
>>
>> v3 -> v4:
>> - use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
>> - protect vfio_register_reserved_iova_range implementation with
>>   CONFIG_IOMMU_DMA_RESERVED
>> - handle unregistration by user-space and on vfio_iommu_type1 release
>>
>> v1 -> v2:
>> - set returned value according to alloc_reserved_iova_domain result
>> - free the iova domains in case any error occurs
>>
>> RFC v1 -> v1:
>> - takes into account Alex comments, based on
>>   [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
>> - use the existing dma map/unmap ioctl interface with a flag to register
>>   a reserved IOVA range. A single reserved iova region is allowed.
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
>>  include/uapi/linux/vfio.h       | 10 +++++-
>>  2 files changed, 86 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 94a9916..4d3a6f1 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -36,6 +36,7 @@
>>  #include <linux/uaccess.h>
>>  #include <linux/vfio.h>
>>  #include <linux/workqueue.h>
>> +#include <linux/msi-iommu.h>
>>  
>>  #define DRIVER_VERSION  "0.2"
>>  #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
>> @@ -445,6 +446,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>  	vfio_lock_acct(-unlocked);
>>  }
>>  
>> +static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
>> +				dma_addr_t iova, size_t size)
>> +{
>> +	struct vfio_domain *d;
>> +	int ret = 0;
>> +
>> +	list_for_each_entry(d, &iommu->domain_list, next) {
>> +		ret = iommu_msi_set_aperture(d->domain, iova, iova + size - 1);
>> +		if (ret)
>> +			break;
>> +	}
>> +	return ret;
>> +}
>> +
>>  static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>  {
>>  	vfio_unmap_unpin(iommu, dma);
>> @@ -693,6 +708,63 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>  	return ret;
>>  }
>>  
>> +static int vfio_register_msi_range(struct vfio_iommu *iommu,
>> +				   struct vfio_iommu_type1_dma_map *map)
>> +{
>> +	dma_addr_t iova = map->iova;
>> +	size_t size = map->size;
>> +	int ret = 0;
>> +	struct vfio_dma *dma;
>> +	unsigned long order;
>> +	uint64_t mask;
>> +
>> +	/* Verify that none of our __u64 fields overflow */
>> +	if (map->size != size || map->iova != iova)
>> +		return -EINVAL;
>> +
>> +	order =  __ffs(vfio_pgsize_bitmap(iommu));
>> +	mask = ((uint64_t)1 << order) - 1;
>> +
>> +	WARN_ON(mask & PAGE_MASK);
>> +
>> +	if (!size || (size | iova) & mask)
>> +		return -EINVAL;
>> +
>> +	/* Don't allow IOVA address wrap */
>> +	if (iova + size - 1 < iova)
>> +		return -EINVAL;
>> +
>> +	mutex_lock(&iommu->lock);
>> +
>> +	if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
>> +		ret =  -EEXIST;
>> +		goto unlock;
>> +	}
>> +
>> +	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
>> +	if (!dma) {
>> +		ret = -ENOMEM;
>> +		goto unlock;
>> +	}
>> +
>> +	dma->iova = iova;
>> +	dma->size = size;
>> +	dma->type = VFIO_IOVA_RESERVED;
> 
> [oops, forgot to send this reply with the others]
> 
> I'm tempted to suggest we set type explicitly in the USER case too just
> to make that abundantly clear rather than taking advantage of the
> kzalloc struct.

no worries. OK I will set the dma type explicitly in vfio_dma_do_map too.

Thanks

Eric
> 
>> +
>> +	ret = vfio_set_msi_aperture(iommu, iova, size);
>> +	if (ret)
>> +		goto free_unlock;
>> +
>> +	vfio_link_dma(iommu, dma);
>> +	goto unlock;
>> +
>> +free_unlock:
>> +	kfree(dma);
>> +unlock:
>> +	mutex_unlock(&iommu->lock);
>> +	return ret;
>> +}
>> +
>>  static int vfio_bus_type(struct device *dev, void *data)
>>  {
>>  	struct bus_type **bus = data;
>> @@ -1062,7 +1134,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>  	} else if (cmd == VFIO_IOMMU_MAP_DMA) {
>>  		struct vfio_iommu_type1_dma_map map;
>>  		uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
>> -				VFIO_DMA_MAP_FLAG_WRITE;
>> +				VFIO_DMA_MAP_FLAG_WRITE |
>> +				VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA;
>>  
>>  		minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
>>  
>> @@ -1072,6 +1145,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>  		if (map.argsz < minsz || map.flags & ~mask)
>>  			return -EINVAL;
>>  
>> +		if (map.flags & VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA)
>> +			return vfio_register_msi_range(iommu, &map);
>> +
>>  		return vfio_dma_do_map(iommu, &map);
>>  
>>  	} else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 255a211..4a9dbc2 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -498,12 +498,19 @@ struct vfio_iommu_type1_info {
>>   *
>>   * Map process virtual addresses to IO virtual addresses using the
>>   * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
>> + *
>> + * In case RESERVED_MSI_IOVA flag is set, the API only aims at registering an
>> + * IOVA region that will be used on some platforms to map the host MSI frames.
>> + * In that specific case, vaddr is ignored. Once registered, an MSI reserved
>> + * IOVA region stays until the container is closed.
>>   */
>>  struct vfio_iommu_type1_dma_map {
>>  	__u32	argsz;
>>  	__u32	flags;
>>  #define VFIO_DMA_MAP_FLAG_READ (1 << 0)		/* readable from device */
>>  #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)	/* writable from device */
>> +/* reserved iova for MSI vectors*/
>> +#define VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA (1 << 2)
>>  	__u64	vaddr;				/* Process virtual address */
>>  	__u64	iova;				/* IO virtual address */
>>  	__u64	size;				/* Size of mapping (bytes) */
>> @@ -519,7 +526,8 @@ struct vfio_iommu_type1_dma_map {
>>   * Caller sets argsz.  The actual unmapped size is returned in the size
>>   * field.  No guarantee is made to the user that arbitrary unmaps of iova
>>   * or size different from those used in the original mapping call will
>> - * succeed.
>> + * succeed. Once registered, an MSI region cannot be unmapped and stays
>> + * until the container is closed.
>>   */
>>  struct vfio_iommu_type1_dma_unmap {
>>  	__u32	argsz;
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
  2016-05-09 22:49   ` Alex Williamson
@ 2016-05-10 16:10     ` Eric Auger
  2016-05-10 17:24       ` Robin Murphy
  0 siblings, 1 reply; 38+ messages in thread
From: Eric Auger @ 2016-05-10 16:10 UTC (permalink / raw)
  To: Alex Williamson
  Cc: eric.auger, robin.murphy, will.deacon, joro, tglx, jason,
	marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, julien.grall, yehuday

Hi Alex,
On 05/10/2016 12:49 AM, Alex Williamson wrote:
> On Wed,  4 May 2016 11:54:16 +0000
> Eric Auger <eric.auger@linaro.org> wrote:
> 
>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
>> by the msi controller. vfio_safe_irq_domain allows to check whether
>> interrupts are "safe" for a given device. They are if the device does
>> not use MSI
> 
> Are we sure we're not opening a security hole here?  An MSI is simply a
> DMA write, so really whether or not a device uses MSI is irrelevant.
> If it can generate a DMA to the MSI doorbell then we need to be
> protected and I think we pretty much need to assume that devices are
> DMA capable.  Do the MSI domain checks cover this?
Let me try to rephrase: we check the device is not attached to an MSI
controller (I think this is the semantic of dev_get_msi_domain(dev)).

If it is not, we don't have to care about MSI isolation: there will be
no IOMMU binding between the device and any MSI doorbell. If it is, we
check that the msi domain is backed by an MSI controller able to
perform MSI isolation.

So effectively "usage of MSIs" is improper - since it is decided after
the group attachment anyway - and the commit message should rather
state "if the device is linked to an MSI controller" (the dt msi-parent
notion, I think).

Does it sound better?

Regards

Eric
> 
>> or if the device uses MSI and the msi-parent controller
>> supports IRQ remapping.
>>
>> Then we check at group level if all devices have safe interrupts: if not,
>> we only allow the group to be attached if allow_unsafe_interrupts is set.
>>
>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>> changed in next patch.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>> v3 -> v4:
>> - rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
>>   and irq_remapping into safe_irq_domains
>>
>> v2 -> v3:
>> - protect vfio_msi_parent_irq_remapping_capable with
>>   CONFIG_GENERIC_MSI_IRQ_DOMAIN
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 42 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 4d3a6f1..2fc8197 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -37,6 +37,8 @@
>>  #include <linux/vfio.h>
>>  #include <linux/workqueue.h>
>>  #include <linux/msi-iommu.h>
>> +#include <linux/irqdomain.h>
>> +#include <linux/msi.h>
>>  
>>  #define DRIVER_VERSION  "0.2"
>>  #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev, void *data)
>>  	return 0;
>>  }
>>  
>> +/**
>> + * vfio_safe_irq_domain: returns whether the irq domain
>> + * the device is attached to is safe with respect to MSI isolation.
>> + * If the irq domain is not an MSI domain, we return it is safe.
>> + *
>> + * @dev: device handle
>> + * @data: unused
>> + * returns 0 if the irq domain is safe, -1 if not.
>> + */
>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>> +{
>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>> +	struct irq_domain *domain;
>> +	struct msi_domain_info *info;
>> +
>> +	domain = dev_get_msi_domain(dev);
>> +	if (!domain)
>> +		return 0;
>> +
>> +	info = msi_get_domain_info(domain);
>> +
>> +	if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>> +		return -1;
>> +#endif
>> +	return 0;
>> +}
>> +
>>  static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>  			     struct vfio_domain *domain)
>>  {
>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>>  	struct vfio_group *group, *g;
>>  	struct vfio_domain *domain, *d;
>>  	struct bus_type *bus = NULL;
>> -	int ret;
>> +	int ret, safe_irq_domains;
>>  
>>  	mutex_lock(&iommu->lock);
>>  
>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>>  
>>  	group->iommu_group = iommu_group;
>>  
>> +	/*
>> +	 * Determine if all the devices of the group have a safe irq domain
>> +	 * with respect to MSI isolation
>> +	 */
>> +	safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>> +				       vfio_safe_irq_domain);
>> +
>>  	/* Determine bus_type in order to allocate a domain */
>>  	ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
>>  	if (ret)
>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>>  	INIT_LIST_HEAD(&domain->group_list);
>>  	list_add(&group->next, &domain->group_list);
>>  
>> +	/*
>> +	 * to advertise safe interrupts either the IOMMU or the MSI controllers
>> +	 * must support IRQ remapping/interrupt translation
>> +	 */
>>  	if (!allow_unsafe_interrupts &&
>> -	    !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>> +	    (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
>>  		pr_warn("%s: No interrupt remapping support.  Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
>>  		       __func__);
>>  		ret = -EPERM;
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains
  2016-05-09 22:49   ` Alex Williamson
@ 2016-05-10 16:36     ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-05-10 16:36 UTC (permalink / raw)
  To: Alex Williamson
  Cc: eric.auger, robin.murphy, will.deacon, joro, tglx, jason,
	marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, julien.grall, yehuday

Hi Alex,
On 05/10/2016 12:49 AM, Alex Williamson wrote:
> On Wed,  4 May 2016 11:54:18 +0000
> Eric Auger <eric.auger@linaro.org> wrote:
> 
>> This patch allows the user-space to retrieve the MSI geometry. The
>> implementation is based on capability chains, now also added to
>> VFIO_IOMMU_GET_INFO.
>>
>> The returned info comprise:
>> - whether the MSI IOVA are constrained to a reserved range (x86 case) and
>>   in the positive, the start/end of the aperture,
>> - or whether the IOVA aperture need to be set by the userspace. In that
>>   case, the size and alignment of the IOVA region to be provided are
>>   returned.
>>
>> In case the userspace must provide the IOVA range, we currently return
>> an arbitrary number of IOVA pages (16), supposed to fulfill the needs of
>> current ARM platforms. This may be deprecated by a more sophisticated
>> computation later on.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>> v8 -> v9:
>> - use iommu_msi_supported flag instead of programmable
>> - replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
>>   capability chain, reporting the MSI geometry
>>
>> v7 -> v8:
>> - use iommu_domain_msi_geometry
>>
>> v6 -> v7:
>> - remove the computation of the number of IOVA pages to be provisionned.
>>   This number depends on the domain/group/device topology which can
>>   dynamically change. Let's rely instead rely on an arbitrary max depending
>>   on the system
>>
>> v4 -> v5:
>> - move msi_info and ret declaration within the conditional code
>>
>> v3 -> v4:
>> - replace former vfio_domains_require_msi_mapping by
>>   more complex computation of MSI mapping requirements, especially the
>>   number of pages to be provided by the user-space.
>> - reword patch title
>>
>> RFC v1 -> v1:
>> - derived from
>>   [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
>> - renamed allow_msi_reconfig into require_msi_mapping
>> - fixed VFIO_IOMMU_GET_INFO
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 69 +++++++++++++++++++++++++++++++++++++++++
>>  include/uapi/linux/vfio.h       | 30 +++++++++++++++++-
>>  2 files changed, 98 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 2fc8197..841360b 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -1134,6 +1134,50 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>>  	return ret;
>>  }
>>  
>> +static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
>> +				     struct vfio_info_cap *caps)
>> +{
>> +	struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
>> +	struct iommu_domain_msi_geometry msi_geometry;
>> +	struct vfio_info_cap_header *header;
>> +	struct vfio_domain *d;
>> +	bool mapping_required;
>> +	size_t size;
>> +
>> +	mutex_lock(&iommu->lock);
>> +	/* All domains have same require_msi_map property, pick first */
>> +	d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
>> +	iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
>> +			      &msi_geometry);
>> +	mapping_required = msi_geometry.iommu_msi_supported;
>> +
>> +	mutex_unlock(&iommu->lock);
>> +
>> +	size = sizeof(*vfio_msi_geometry);
>> +	header = vfio_info_cap_add(caps, size,
>> +				   VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
>> +
>> +	if (IS_ERR(header))
>> +		return PTR_ERR(header);
>> +
>> +	vfio_msi_geometry = container_of(header,
>> +				struct vfio_iommu_type1_info_cap_msi_geometry,
>> +				header);
>> +
>> +	vfio_msi_geometry->reserved = !mapping_required;
>> +	if (vfio_msi_geometry->reserved) {
>> +		vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
>> +		vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;
>> +		return 0;
>> +	}
>> +
>> +	vfio_msi_geometry->alignment = 1 << __ffs(vfio_pgsize_bitmap(iommu));
>> +	/* we currently report the need for an arbitrary number of 16 pages */
>> +	vfio_msi_geometry->size = 16 * vfio_msi_geometry->alignment;
> 
> Hmm, that really is arbitrary.  How could we know a real value here?
Yes, I fully agree, and this is acknowledged in the cover/commit message.
I dared to do that because it has the benefit of letting us introduce the
userspace API now while refining this computation later on.

I have not yet found an elegant solution for computing the platform's
maximum number/size of doorbells (besides what I did in the past, which
depended on the current group/device topology).

Maybe an option would be to have the relevant MSI controllers register
their doorbells in a global list at probe time, and then enumerate all of
them. I was reluctant to add this new functionality to the series at this
stage, hence the current simplification.
> 
>> +
>> +	return 0;
>> +}
>> +
>>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>>  				   unsigned int cmd, unsigned long arg)
>>  {
>> @@ -1155,6 +1199,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>  		}
>>  	} else if (cmd == VFIO_IOMMU_GET_INFO) {
>>  		struct vfio_iommu_type1_info info;
>> +		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
>> +		int ret;
>>  
>>  		minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
>>  
>> @@ -1168,6 +1214,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>  
>>  		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
>>  
>> +		ret = compute_msi_geometry_caps(iommu, &caps);
>> +		if (ret)
>> +			return ret;
>> +
>> +		if (caps.size) {
>> +			info.flags |= VFIO_IOMMU_INFO_CAPS;
>> +			if (info.argsz < sizeof(info) + caps.size) {
>> +				info.argsz = sizeof(info) + caps.size;
>> +				info.cap_offset = 0;
>> +			} else {
>> +				vfio_info_cap_shift(&caps, sizeof(info));
>> +				if (copy_to_user((void __user *)arg +
>> +						sizeof(info), caps.buf,
>> +						caps.size)) {
>> +					kfree(caps.buf);
>> +					return -EFAULT;
>> +				}
>> +				info.cap_offset = sizeof(info);
>> +			}
>> +
>> +			kfree(caps.buf);
>> +		}
>> +
>>  		return copy_to_user((void __user *)arg, &info, minsz) ?
>>  			-EFAULT : 0;
>>  
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 4a9dbc2..0ff6a8d 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -488,7 +488,33 @@ struct vfio_iommu_type1_info {
>>  	__u32	argsz;
>>  	__u32	flags;
>>  #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
>> -	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
>> +#define VFIO_IOMMU_INFO_CAPS	(1 << 1)	/* Info supports caps */
>> +	__u32   cap_offset;	/* Offset within info struct of first cap */
>> +	__u64	iova_pgsizes;	/* Bitmap of supported page sizes */
> 
> This would break existing users, we can't arbitrarily change the offset
> of iova_pgsizes.  We can add cap_offset to the end and I think
> everything would work about above if we do that.
Hmm yes, sorry for the lack of care.
> 
>> +};
>> +
>> +#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY	1
>> +
>> +/*
>> + * The MSI geometry capability allows reporting the MSI IOVA geometry:
>> + * - either the MSI IOVAs are constrained within a reserved IOVA aperture
>> + *   whose boundaries are given by [@aperture_start, @aperture_end].
>> + *   This is typically the case on an x86 host. Userspace is not allowed
>> + *   to map userspace memory at IOVAs intersecting this range using
>> + *   VFIO_IOMMU_MAP_DMA.
>> + * - or the MSI IOVAs are not required to belong to any reserved range;
>> + *   in that case userspace must provide an IOVA window characterized by
>> + *   @size and @alignment using VFIO_IOMMU_MAP_DMA with the RESERVED_MSI_IOVA flag.
>> + */
>> +struct vfio_iommu_type1_info_cap_msi_geometry {
>> +	struct vfio_info_cap_header header;
>> +	bool reserved; /* Are MSI IOVAs within a reserved aperture? */
> 
> Do bools have a guaranteed user size?  Let's make this a __u32 and call
> it flags with bit 0 defined as reserved.  I'm tempted to suggest we
> could figure out how to make alignment fit in another __u32 so we have a
> properly packed structure, otherwise we should make a reserved __u32.
OK will rewrite & check that.

Thanks

Eric
> 
>> +	/* reserved */
>> +	__u64 aperture_start;
>> +	__u64 aperture_end;
>> +	/* not reserved */
>> +	__u64 size; /* IOVA aperture size in bytes the userspace must provide */
>> +	__u64 alignment; /* alignment of the window, in bytes */
>>  };
>>  
>>  #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
>> @@ -503,6 +529,8 @@ struct vfio_iommu_type1_info {
>>   * IOVA region that will be used on some platforms to map the host MSI frames.
>>   * In that specific case, vaddr is ignored. Once registered, an MSI reserved
>>   * IOVA region stays until the container is closed.
>> + * The requirement for provisioning such a reserved IOVA range can be
>> + * discovered via the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability.
>>   */
>>  struct vfio_iommu_type1_dma_map {
>>  	__u32	argsz;
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains
  2016-05-09 23:03     ` Alex Williamson
@ 2016-05-10 16:50       ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-05-10 16:50 UTC (permalink / raw)
  To: Alex Williamson
  Cc: eric.auger, robin.murphy, will.deacon, joro, tglx, jason,
	marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, julien.grall, yehuday

On 05/10/2016 01:03 AM, Alex Williamson wrote:
> On Wed, 4 May 2016 14:06:19 +0200
> Eric Auger <eric.auger@linaro.org> wrote:
> 
>> Hi Alex,
>> On 05/04/2016 01:54 PM, Eric Auger wrote:
>>> This patch allows the user-space to retrieve the MSI geometry. The
>>> implementation is based on capability chains, now also added to
>>> VFIO_IOMMU_GET_INFO.  
>>
>> If you prefer, we could consider this patch outside of the main series
>> since it brings extra functionality (MSI geometry reporting). In a
>> first QEMU integration we could live without knowing the MSI geometry, I
>> think, especially since I currently report an arbitrary number of
>> requested IOVA pages. Computing the exact number of doorbells
>> to map brings extra complexity and I have not addressed this issue yet.
>>
>> It sketches a possible user API to report the MSI geometry based on the
>> capability chains, as you suggested some time ago. I am currently busy
>> drafting a QEMU integration.
> 
> How would the user know that reserved MSI mappings are required or
> available without this?  Wouldn't the only option be for userspace to
> try to map something with the reserved MSI flag set and see if the
> kernel accepts it?
Well, my first guess was that the (QEMU) virt machine using KVM/PCIe-MSI
passthrough could hardcode an arbitrary "large" IOVA size (currently 16
64kB pages in my QEMU integration). If this is not sufficient to map all
host doorbells, we would see MSI allocation failing. If the need shows
up, we could increase the value later on.
> That's not a very desirable programming model.  The
> arbitrary size is pretty ugly, but it at least makes for a consistent
> user interface.  Is it a functional issue if we overestimate the size
> or is it just a matter of wasting IOVA space?  Is there significant
> harm in making it obscenely large, like 1MB?
No, just a waste of IOVA space. To make it as transparent as possible for
the virt machine I wanted to hide it in the platform bus reserved IOVA
region.
> The reference counting and
> re-use of IOVA pages seems like we may often only be using a single
> IOVA page for multiple doorbells.  I guess I'm leaning towards defining
> the API even if the value is somewhat arbitrary because we'd rather have
> control of this rather than having the user guess and try to rope them
> back in later to use a kernel recommended value.  Thanks,

OK

Thanks

Eric
> 
> Alex
> 


* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
  2016-05-10 16:10     ` Eric Auger
@ 2016-05-10 17:24       ` Robin Murphy
  2016-05-11  8:38         ` Eric Auger
  0 siblings, 1 reply; 38+ messages in thread
From: Robin Murphy @ 2016-05-10 17:24 UTC (permalink / raw)
  To: Eric Auger, Alex Williamson
  Cc: eric.auger, will.deacon, joro, tglx, jason, marc.zyngier,
	christoffer.dall, linux-arm-kernel, patches, linux-kernel,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	Jean-Philippe.Brucker, julien.grall, yehuday

Hi Eric,

On 10/05/16 17:10, Eric Auger wrote:
> Hi Alex,
> On 05/10/2016 12:49 AM, Alex Williamson wrote:
>> On Wed,  4 May 2016 11:54:16 +0000
>> Eric Auger <eric.auger@linaro.org> wrote:
>>
>>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
>>> by the msi controller. vfio_safe_irq_domain allows to check whether
>>> interrupts are "safe" for a given device. They are if the device does
>>> not use MSI
>>
>> Are we sure we're not opening a security hole here?  An MSI is simply a
>> DMA write, so really whether or not a device uses MSI is irrelevant.
>> If it can generate a DMA to the MSI doorbell then we need to be
>> protected and I think we pretty much need to assume that devices are
>> DMA capable.  Do the MSI domain checks cover this?
> Let me try to rephrase: we check the device is not attached to an MSI
> controller (I think this is the semantic of dev_get_msi_domain(dev)).
>
> If it is not, we don't have to care about MSI isolation: there will be
> no IOMMU binding between the device and any MSI doorbell. If it is we
> check the msi domain is backed by an MSI controller able to perform MSI
> isolation.
 >
> So effectively "usage of MSIs" is improper - since it is decided after
> the group attachment anyway  - and the commit message should rather
> state "if the device is linked to an MSI controller" (dt msi-parent
> notion I think).

Hmm, I think Alex has a point here - on a GICv2m I can happily fire 
arbitrary MSIs from _a shell_ (using /dev/mem), and the CPUs definitely 
aren't in an MSI domain, so I don't think it's valid to assume that a 
device using only wired interrupts, therefore with no connection to any 
MSI controller, isn't still capable of maliciously spewing DMA all over 
any and every doorbell region in the system.

Robin.

> Does it sound better?
>
> Regards
>
> Eric
>>
>>> or if the device uses MSI and the msi-parent controller
>>> supports IRQ remapping.
>>>
>>> Then we check at group level if all devices have safe interrupts: if not,
>>> we only allow the group to be attached if allow_unsafe_interrupts is set.
>>>
>>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>>> changed in next patch.
>>>
>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>
>>> ---
>>> v3 -> v4:
>>> - rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
>>>    and irq_remapping into safe_irq_domains
>>>
>>> v2 -> v3:
>>> - protect vfio_msi_parent_irq_remapping_capable with
>>>    CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>> ---
>>>   drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
>>>   1 file changed, 42 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>> index 4d3a6f1..2fc8197 100644
>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>> @@ -37,6 +37,8 @@
>>>   #include <linux/vfio.h>
>>>   #include <linux/workqueue.h>
>>>   #include <linux/msi-iommu.h>
>>> +#include <linux/irqdomain.h>
>>> +#include <linux/msi.h>
>>>
>>>   #define DRIVER_VERSION  "0.2"
>>>   #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
>>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev, void *data)
>>>   	return 0;
>>>   }
>>>
>>> +/**
>>> + * vfio_safe_irq_domain: returns whether the irq domain
>>> + * the device is attached to is safe with respect to MSI isolation.
>>> + * If the irq domain is not an MSI domain, we return it is safe.
>>> + *
>>> + * @dev: device handle
>>> + * @data: unused
>>> + * returns 0 if the irq domain is safe, -1 if not.
>>> + */
>>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>>> +{
>>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>> +	struct irq_domain *domain;
>>> +	struct msi_domain_info *info;
>>> +
>>> +	domain = dev_get_msi_domain(dev);
>>> +	if (!domain)
>>> +		return 0;
>>> +
>>> +	info = msi_get_domain_info(domain);
>>> +
>>> +	if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>>> +		return -1;
>>> +#endif
>>> +	return 0;
>>> +}
>>> +
>>>   static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>>   			     struct vfio_domain *domain)
>>>   {
>>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>>>   	struct vfio_group *group, *g;
>>>   	struct vfio_domain *domain, *d;
>>>   	struct bus_type *bus = NULL;
>>> -	int ret;
>>> +	int ret, safe_irq_domains;
>>>
>>>   	mutex_lock(&iommu->lock);
>>>
>>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>>>
>>>   	group->iommu_group = iommu_group;
>>>
>>> +	/*
>>> +	 * Determine if all the devices of the group have a safe irq domain
>>> +	 * with respect to MSI isolation
>>> +	 */
>>> +	safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>>> +				       vfio_safe_irq_domain);
>>> +
>>>   	/* Determine bus_type in order to allocate a domain */
>>>   	ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
>>>   	if (ret)
>>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>>>   	INIT_LIST_HEAD(&domain->group_list);
>>>   	list_add(&group->next, &domain->group_list);
>>>
>>> +	/*
>>> +	 * to advertise safe interrupts either the IOMMU or the MSI controllers
>>> +	 * must support IRQ remapping/interrupt translation
>>> +	 */
>>>   	if (!allow_unsafe_interrupts &&
>>> -	    !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>>> +	    (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
>>>   		pr_warn("%s: No interrupt remapping support.  Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
>>>   		       __func__);
>>>   		ret = -EPERM;
>>
>


* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
  2016-05-10 17:24       ` Robin Murphy
@ 2016-05-11  8:38         ` Eric Auger
  2016-05-11  9:31           ` Robin Murphy
  0 siblings, 1 reply; 38+ messages in thread
From: Eric Auger @ 2016-05-11  8:38 UTC (permalink / raw)
  To: Robin Murphy, Alex Williamson
  Cc: eric.auger, will.deacon, joro, tglx, jason, marc.zyngier,
	christoffer.dall, linux-arm-kernel, patches, linux-kernel,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	Jean-Philippe.Brucker, julien.grall, yehuday

Hi Robin, Alex,
On 05/10/2016 07:24 PM, Robin Murphy wrote:
> Hi Eric,
> 
> On 10/05/16 17:10, Eric Auger wrote:
>> Hi Alex,
>> On 05/10/2016 12:49 AM, Alex Williamson wrote:
>>> On Wed,  4 May 2016 11:54:16 +0000
>>> Eric Auger <eric.auger@linaro.org> wrote:
>>>
>>>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is
>>>> abstracted
>>>> by the msi controller. vfio_safe_irq_domain allows to check whether
>>>> interrupts are "safe" for a given device. They are if the device does
>>>> not use MSI
>>>
>>> Are we sure we're not opening a security hole here?  An MSI is simply a
>>> DMA write, so really whether or not a device uses MSI is irrelevant.
>>> If it can generate a DMA to the MSI doorbell then we need to be
>>> protected and I think we pretty much need to assume that devices are
>>> DMA capable.  Do the MSI domain checks cover this?
>> Let me try to rephrase: we check the device is not attached to an MSI
>> controller (I think this is the semantic of dev_get_msi_domain(dev)).
>>
>> If it is not, we don't have to care about MSI isolation: there will be
>> no IOMMU binding between the device and any MSI doorbell. If it is we
>> check the msi domain is backed by an MSI controller able to perform MSI
>> isolation.
>>
>> So effectively "usage of MSIs" is improper - since it is decided after
>> the group attachment anyway  - and the commit message should rather
>> state "if the device is linked to an MSI controller" (dt msi-parent
>> notion I think).
> 
> Hmm, I think Alex has a point here - on a GICv2m I can happily fire
> arbitrary MSIs from _a shell_ (using /dev/mem), and the CPUs definitely
> aren't in an MSI domain, so I don't think it's valid to assume that a
> device using only wired interrupts, therefore with no connection to any
> MSI controller, isn't still capable of maliciously spewing DMA all over
> any and every doorbell region in the system.

Sorry but I still don't get the point. For the device to reach the
doorbell there must be an IOMMU mapping.

- if the device is not attached to an MSI domain, there won't be any
doorbell iommu mapping built by this series, so no risk, right?

The device will only be allowed to reach memory IOMMU-mapped by userspace
with the standard VFIO DMA MAP API. Of course, if userspace can mmap all
of the host PA space, that's a more general issue, right?

- If the device is attached to an MSI domain (msi-parent link), 2 cases:
1) the MSI controller advertises MSI isolation (ITS cases), no risk
2) the MSI controller does not advertise MSI isolation (GICv2m), there
is a security hole.
   a) by default we reject the device attachment
   b) if userspace overrides the safe interrupt option, it accepts
the security hole

What am I missing?

Best Regards

Eric
> 
> Robin.
> 
>> Does it sound better?
>>
>> Regards
>>
>> Eric
>>>
>>>> or if the device uses MSI and the msi-parent controller
>>>> supports IRQ remapping.
>>>>
>>>> Then we check at group level if all devices have safe interrupts: if
>>>> not,
>>>> we only allow the group to be attached if allow_unsafe_interrupts is
>>>> set.
>>>>
>>>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>>>> changed in next patch.
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>
>>>> [...]
>>>
>>
> 


* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
  2016-05-11  8:38         ` Eric Auger
@ 2016-05-11  9:31           ` Robin Murphy
  2016-05-11  9:44             ` Eric Auger
  0 siblings, 1 reply; 38+ messages in thread
From: Robin Murphy @ 2016-05-11  9:31 UTC (permalink / raw)
  To: Eric Auger, Alex Williamson
  Cc: eric.auger, will.deacon, joro, tglx, jason, marc.zyngier,
	christoffer.dall, linux-arm-kernel, patches, linux-kernel,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	Jean-Philippe.Brucker, julien.grall, yehuday

On 11/05/16 09:38, Eric Auger wrote:
> Hi Robin, Alex,
> On 05/10/2016 07:24 PM, Robin Murphy wrote:
>> Hi Eric,
>>
>> On 10/05/16 17:10, Eric Auger wrote:
>>> Hi Alex,
>>> On 05/10/2016 12:49 AM, Alex Williamson wrote:
>>>> On Wed,  4 May 2016 11:54:16 +0000
>>>> Eric Auger <eric.auger@linaro.org> wrote:
>>>>
>>>>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is
>>>>> abstracted
>>>>> by the msi controller. vfio_safe_irq_domain allows to check whether
>>>>> interrupts are "safe" for a given device. They are if the device does
>>>>> not use MSI
>>>>
>>>> Are we sure we're not opening a security hole here?  An MSI is simply a
>>>> DMA write, so really whether or not a device uses MSI is irrelevant.
>>>> If it can generate a DMA to the MSI doorbell then we need to be
>>>> protected and I think we pretty much need to assume that devices are
>>>> DMA capable.  Do the MSI domain checks cover this?
>>> Let me try to rephrase: we check the device is not attached to an MSI
>>> controller (I think this is the semantic of dev_get_msi_domain(dev)).
>>>
>>> If it is not, we don't have to care about MSI isolation: there will be
>>> no IOMMU binding between the device and any MSI doorbell. If it is we
>>> check the msi domain is backed by an MSI controller able to perform MSI
>>> isolation.
>>>
>>> So effectively "usage of MSIs" is improper - since it is decided after
>>> the group attachment anyway  - and the commit message should rather
>>> state "if the device is linked to an MSI controller" (dt msi-parent
>>> notion I think).
>>
>> Hmm, I think Alex has a point here - on a GICv2m I can happily fire
>> arbitrary MSIs from _a shell_ (using /dev/mem), and the CPUs definitely
>> aren't in an MSI domain, so I don't think it's valid to assume that a
>> device using only wired interrupts, therefore with no connection to any
>> MSI controller, isn't still capable of maliciously spewing DMA all over
>> any and every doorbell region in the system.
>
> Sorry but I still don't get the point. For the device to reach the
> doorbell there must be an IOMMU mapping.

Only when the doorbell is _downstream_ of IOMMU translation.

> - if the device is not attached to an MSI domain, there won't be any
> doorbell iommu mapping built by this series, so no risk, right?
>
> The device will be allowed to reach only memory iommu mapped by
> userspace with VFIO DMA MAP standard API. Of course if the userspace can
> mmap all the host PA that's a more general issue, right?
>
> - If the device is attached to an MSI domain (msi-parent link), 2 cases:
> 1) the MSI controller advertises MSI isolation (ITS cases), no risk
> 2) the MSI controller does not advertise MSI isolation (GICv2m), there
> is a security hole.
>     a) by default we reject the device attachment
>     b) if the userspace overrides the safe interrupt option he accepts
> the security hole
>
> What am I missing?

The x86-with-interrupt-remapping-disabled case. Currently, without 
unsafe_interrupts, everything is rejected - with this patch, we'll still 
reject anything with an MSI domain, but e.g. legacy PCI devices using 
INTx would be free to scribble all over the un-translated interrupt 
region willy-nilly. That's the subtle, but significant, change in behaviour.

Robin.

> Best Regards
>
> Eric
>>
>> Robin.
>>
>>> Does it sound better?
>>>
>>> Regards
>>>
>>> Eric
>>>>
>>>>> or if the device uses MSI and the msi-parent controller
>>>>> supports IRQ remapping.
>>>>>
>>>>> Then we check at group level if all devices have safe interrupts: if
>>>>> not,
>>>>> we only allow the group to be attached if allow_unsafe_interrupts is
>>>>> set.
>>>>>
>>>>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>>>>> changed in next patch.
>>>>>
>>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>>
>>>>> [...]
>>>>
>>>
>>
>


* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
  2016-05-11  9:31           ` Robin Murphy
@ 2016-05-11  9:44             ` Eric Auger
  2016-05-11 13:48               ` Robin Murphy
  0 siblings, 1 reply; 38+ messages in thread
From: Eric Auger @ 2016-05-11  9:44 UTC (permalink / raw)
  To: Robin Murphy, Alex Williamson
  Cc: eric.auger, will.deacon, joro, tglx, jason, marc.zyngier,
	christoffer.dall, linux-arm-kernel, patches, linux-kernel,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	Jean-Philippe.Brucker, julien.grall, yehuday

Hi Robin,
On 05/11/2016 11:31 AM, Robin Murphy wrote:
> On 11/05/16 09:38, Eric Auger wrote:
>> Hi Robin, Alex,
>> On 05/10/2016 07:24 PM, Robin Murphy wrote:
>>> Hi Eric,
>>>
>>> On 10/05/16 17:10, Eric Auger wrote:
>>>> Hi Alex,
>>>> On 05/10/2016 12:49 AM, Alex Williamson wrote:
>>>>> On Wed,  4 May 2016 11:54:16 +0000
>>>>> Eric Auger <eric.auger@linaro.org> wrote:
>>>>>
>>>>>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is
>>>>>> abstracted
>>>>>> by the msi controller. vfio_safe_irq_domain allows to check whether
>>>>>> interrupts are "safe" for a given device. They are if the device does
>>>>>> not use MSI
>>>>>
>>>>> Are we sure we're not opening a security hole here?  An MSI is
>>>>> simply a
>>>>> DMA write, so really whether or not a device uses MSI is irrelevant.
>>>>> If it can generate a DMA to the MSI doorbell then we need to be
>>>>> protected and I think we pretty much need to assume that devices are
>>>>> DMA capable.  Do the MSI domain checks cover this?
>>>> Let me try to rephrase: we check the device is not attached to an MSI
>>>> controller (I think this is the semantic of dev_get_msi_domain(dev)).
>>>>
>>>> If it is not, we don't have to care about MSI isolation: there will be
>>>> no IOMMU binding between the device and any MSI doorbell. If it is we
>>>> check the msi domain is backed by an MSI controller able to perform MSI
>>>> isolation.
>>>>
>>>> So effectively "usage of MSIs" is improper - since it is decided after
>>>> the group attachment anyway  - and the commit message should rather
>>>> state "if the device is linked to an MSI controller" (dt msi-parent
>>>> notion I think).
>>>
>>> Hmm, I think Alex has a point here - on a GICv2m I can happily fire
>>> arbitrary MSIs from _a shell_ (using /dev/mem), and the CPUs definitely
>>> aren't in an MSI domain, so I don't think it's valid to assume that a
>>> device using only wired interrupts, therefore with no connection to any
>>> MSI controller, isn't still capable of maliciously spewing DMA all over
>>> any and every doorbell region in the system.
>>
>> Sorry but I still don't get the point. For the device to reach the
>> doorbell there must be an IOMMU mapping.
> 
> Only when the doorbell is _downstream_ of IOMMU translation.
Yes but that's the root hypothesis with VFIO assignment, right?

Unless the no-iommu option is explicitly set by the userspace, there
must be an IOMMU downstream of the device that protects all its DMA
transactions.
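For reference, the per-device decision that follows from this reasoning
(the vfio_safe_irq_domain() check in patch 5/7) can be modelled in a few
lines of user-space C; the struct and the flag value below are
stand-ins for illustration, not the kernel API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MSI_FLAG_IRQ_REMAPPING (1u << 0)  /* stand-in flag value */

/* Stand-in for the kernel's msi_domain_info */
struct msi_domain_model {
    unsigned int flags;
};

/* A device with no MSI domain gets no doorbell IOMMU mapping from this
 * series, so it is reported safe; otherwise the MSI controller backing
 * the domain must advertise interrupt isolation. */
static bool device_irqs_safe(const struct msi_domain_model *msi_domain)
{
    if (!msi_domain)
        return true;
    return (msi_domain->flags & MSI_FLAG_IRQ_REMAPPING) != 0;
}
```

So an ITS-backed domain (which advertises isolation) passes, a GICv2m
one does not, and a device with wired interrupts only is treated as safe.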
> 
>> - if the device is not attached to an MSI domain, there won't be any
>> doorbell iommu mapping built by this series, so no risk, right?
>>
>> The device will be allowed to reach only memory iommu mapped by
>> userspace with VFIO DMA MAP standard API. Of course if the userspace can
>> mmap all the host PA that's a more general issue, right?
>>
>> - If the device is attached to an MSI domain (msi-parent link), 2 cases:
>> 1) the MSI controller advertises MSI isolation (ITS cases), no risk
>> 2) the MSI controller does not advertise MSI isolation (GICv2m), there
>> is a security hole.
>>     a) by default we reject the device attachment
>>     b) if the userspace overrides the safe interrupt option he accepts
>> the security hole
>>
>> What am I missing?
> 
> The x86-with-interrupt-remapping-disabled case. Currently, without
> unsafe_interrupts, everything is rejected - with this patch, we'll still
> reject anything with an MSI domain, but e.g. legacy PCI devices using
> INTx would be free to scribble all over the un-translated interrupt
> region willy-nilly.
Assuming we have an IOMMU, any transaction is either translated or
faults, right? So if a legacy PCI device tries to write into a doorbell,
it must fault because there is no IOMMU mapping for the doorbell?

Best Regards

Eric
> That's the subtle, but significant, change in
> behaviour.
> 
> Robin.
> 
>> Best Regards
>>
>> Eric
>>>
>>> Robin.
>>>
>>>> Does it sound better?
>>>>
>>>> Regards
>>>>
>>>> Eric
>>>>>
>>>>>> or if the device uses MSI and the msi-parent controller
>>>>>> supports IRQ remapping.
>>>>>>
>>>>>> Then we check at group level if all devices have safe interrupts: if
>>>>>> not,
>>>>>> we only allow the group to be attached if allow_unsafe_interrupts is
>>>>>> set.
>>>>>>
>>>>>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>>>>>> changed in next patch.
>>>>>>
>>>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>>>
>>>>>> ---
>>>>>> v3 -> v4:
>>>>>> - rename vfio_msi_parent_irq_remapping_capable into
>>>>>> vfio_safe_irq_domain
>>>>>>     and irq_remapping into safe_irq_domains
>>>>>>
>>>>>> v2 -> v3:
>>>>>> - protect vfio_msi_parent_irq_remapping_capable with
>>>>>>     CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>>> ---
>>>>>>    drivers/vfio/vfio_iommu_type1.c | 44
>>>>>> +++++++++++++++++++++++++++++++++++++++--
>>>>>>    1 file changed, 42 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c
>>>>>> b/drivers/vfio/vfio_iommu_type1.c
>>>>>> index 4d3a6f1..2fc8197 100644
>>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>>>> @@ -37,6 +37,8 @@
>>>>>>    #include <linux/vfio.h>
>>>>>>    #include <linux/workqueue.h>
>>>>>>    #include <linux/msi-iommu.h>
>>>>>> +#include <linux/irqdomain.h>
>>>>>> +#include <linux/msi.h>
>>>>>>
>>>>>>    #define DRIVER_VERSION  "0.2"
>>>>>>    #define DRIVER_AUTHOR   "Alex Williamson
>>>>>> <alex.williamson@redhat.com>"
>>>>>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev,
>>>>>> void *data)
>>>>>>        return 0;
>>>>>>    }
>>>>>>
>>>>>> +/**
>>>>>> + * vfio_safe_irq_domain: returns whether the irq domain
>>>>>> + * the device is attached to is safe with respect to MSI isolation.
>>>>>> + * If the irq domain is not an MSI domain, we return it is safe.
>>>>>> + *
>>>>>> + * @dev: device handle
>>>>>> + * @data: unused
>>>>>> + * returns 0 if the irq domain is safe, -1 if not.
>>>>>> + */
>>>>>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>>>>>> +{
>>>>>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>>> +    struct irq_domain *domain;
>>>>>> +    struct msi_domain_info *info;
>>>>>> +
>>>>>> +    domain = dev_get_msi_domain(dev);
>>>>>> +    if (!domain)
>>>>>> +        return 0;
>>>>>> +
>>>>>> +    info = msi_get_domain_info(domain);
>>>>>> +
>>>>>> +    if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>>>>>> +        return -1;
>>>>>> +#endif
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>>    static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>>>>>                     struct vfio_domain *domain)
>>>>>>    {
>>>>>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void
>>>>>> *iommu_data,
>>>>>>        struct vfio_group *group, *g;
>>>>>>        struct vfio_domain *domain, *d;
>>>>>>        struct bus_type *bus = NULL;
>>>>>> -    int ret;
>>>>>> +    int ret, safe_irq_domains;
>>>>>>
>>>>>>        mutex_lock(&iommu->lock);
>>>>>>
>>>>>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void
>>>>>> *iommu_data,
>>>>>>
>>>>>>        group->iommu_group = iommu_group;
>>>>>>
>>>>>> +    /*
>>>>>> +     * Determine if all the devices of the group have a safe irq
>>>>>> domain
>>>>>> +     * with respect to MSI isolation
>>>>>> +     */
>>>>>> +    safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>>>>>> +                       vfio_safe_irq_domain);
>>>>>> +
>>>>>>        /* Determine bus_type in order to allocate a domain */
>>>>>>        ret = iommu_group_for_each_dev(iommu_group, &bus,
>>>>>> vfio_bus_type);
>>>>>>        if (ret)
>>>>>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void
>>>>>> *iommu_data,
>>>>>>        INIT_LIST_HEAD(&domain->group_list);
>>>>>>        list_add(&group->next, &domain->group_list);
>>>>>>
>>>>>> +    /*
>>>>>> +     * to advertise safe interrupts either the IOMMU or the MSI
>>>>>> controllers
>>>>>> +     * must support IRQ remapping/interrupt translation
>>>>>> +     */
>>>>>>        if (!allow_unsafe_interrupts &&
>>>>>> -        !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>>>>>> +        (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) &&
>>>>>> !safe_irq_domains)) {
>>>>>>            pr_warn("%s: No interrupt remapping support.  Use the
>>>>>> module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU
>>>>>> support on this platform\n",
>>>>>>                   __func__);
>>>>>>            ret = -EPERM;
>>>>>
>>>>
>>>
>>
> 


* Re: [PATCH v9 3/7] vfio/type1: bypass unmap/unpin and replay for VFIO_IOVA_RESERVED slots
  2016-05-09 22:49   ` Alex Williamson
@ 2016-05-11 12:58     ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-05-11 12:58 UTC (permalink / raw)
  To: Alex Williamson
  Cc: eric.auger, robin.murphy, will.deacon, joro, tglx, jason,
	marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, julien.grall, yehuday

Hi Alex,
On 05/10/2016 12:49 AM, Alex Williamson wrote:
> On Wed,  4 May 2016 11:54:14 +0000
> Eric Auger <eric.auger@linaro.org> wrote:
> 
>> Before allowing the end-user to create VFIO_IOVA_RESERVED dma slots,
>> let's implement the expected behavior for removal and replay. As opposed
>> to user dma slots, IOVAs are not systematically bound to PAs and PAs are
>> not pinned. VFIO just initializes the IOVA "aperture". IOVAs are allocated
>> outside of the VFIO framework, typically by the MSI layer, which is
>> responsible for freeing and unmapping them. The MSI mapping resources are
>> freed by the IOMMU driver on domain destruction.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>>
>> v7 -> v8:
>> - do no destroy anything anymore, just bypass unmap/unpin and iommu_map
>>   on replay
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 2d769d4..94a9916 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -391,7 +391,7 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>  	struct vfio_domain *domain, *d;
>>  	long unlocked = 0;
>>  
>> -	if (!dma->size)
>> +	if (!dma->size || dma->type != VFIO_IOVA_USER)
>>  		return;
>>  	/*
>>  	 * We use the IOMMU to track the physical addresses, otherwise we'd
>> @@ -727,6 +727,9 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>  		dma = rb_entry(n, struct vfio_dma, node);
>>  		iova = dma->iova;
>>  
>> +		if (dma->type == VFIO_IOVA_RESERVED)
>> +			continue;
>> +
> 
> But you do still need some sort of replay mechanism, right?  Not to
> replay the IOVA to PA mapping, but to call iommu_msi_set_aperture() for
> the new domain.  How will you know that this entry is an MSI reserved
> range or something else?  Perhaps we can't have a generic "reserved"
> type here.
Thanks for spotting this bug. I was not testing this case yet and
effectively I need to replay the set_aperture.
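A user-space sketch of the fixed-up replay dispatch might look as
follows (the stub types and the two counters stand in for the real
iommu_map()/iommu_msi_set_aperture() side effects; this illustrates the
intended logic, not the kernel code itself):

```c
#include <assert.h>
#include <stddef.h>

enum vfio_iova_type { VFIO_IOVA_USER, VFIO_IOVA_RESERVED };

struct dma_slot {
    enum vfio_iova_type type;
    unsigned long iova;
    unsigned long size;
};

/* Counters standing in for the side effects of iommu_map() and
 * iommu_msi_set_aperture(), so the dispatch can be checked. */
static int user_replayed, apertures_set;

static void replay(const struct dma_slot *slots, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (slots[i].type == VFIO_IOVA_RESERVED) {
            /* No IOVA->PA pages to remap: just re-establish the MSI
             * aperture on the new domain. */
            apertures_set++;
            continue;
        }
        /* User slot: iommu_map() the pinned pages as before. */
        user_replayed++;
    }
}
```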

Thanks for the review

Best Regards

Eric

> 
>>  		while (iova < dma->iova + dma->size) {
>>  			phys_addr_t phys = iommu_iova_to_phys(d->domain, iova);
>>  			size_t size;
> 


* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
  2016-05-11  9:44             ` Eric Auger
@ 2016-05-11 13:48               ` Robin Murphy
  2016-05-11 14:37                 ` Eric Auger
  0 siblings, 1 reply; 38+ messages in thread
From: Robin Murphy @ 2016-05-11 13:48 UTC (permalink / raw)
  To: Eric Auger, Alex Williamson
  Cc: eric.auger, will.deacon, joro, tglx, jason, marc.zyngier,
	christoffer.dall, linux-arm-kernel, patches, linux-kernel,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	Jean-Philippe.Brucker, julien.grall, yehuday

On 11/05/16 10:44, Eric Auger wrote:
> Hi Robin,
> On 05/11/2016 11:31 AM, Robin Murphy wrote:
>> On 11/05/16 09:38, Eric Auger wrote:
>>> Hi Robin, Alex,
>>> On 05/10/2016 07:24 PM, Robin Murphy wrote:
>>>> Hi Eric,
>>>>
>>>> On 10/05/16 17:10, Eric Auger wrote:
>>>>> Hi Alex,
>>>>> On 05/10/2016 12:49 AM, Alex Williamson wrote:
>>>>>> On Wed,  4 May 2016 11:54:16 +0000
>>>>>> Eric Auger <eric.auger@linaro.org> wrote:
>>>>>>
>>>>>>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is
>>>>>>> abstracted
>>>>>>> by the msi controller. vfio_safe_irq_domain allows to check whether
>>>>>>> interrupts are "safe" for a given device. They are if the device does
>>>>>>> not use MSI
>>>>>>
>>>>>> Are we sure we're not opening a security hole here?  An MSI is
>>>>>> simply a
>>>>>> DMA write, so really whether or not a device uses MSI is irrelevant.
>>>>>> If it can generate a DMA to the MSI doorbell then we need to be
>>>>>> protected and I think we pretty much need to assume that devices are
>>>>>> DMA capable.  Do the MSI domain checks cover this?
>>>>> Let me try to rephrase: we check the device is not attached to an MSI
>>>>> controller (I think this is the semantic of dev_get_msi_domain(dev)).
>>>>>
>>>>> If it is not, we don't have to care about MSI isolation: there will be
>>>>> no IOMMU binding between the device and any MSI doorbell. If it is we
>>>>> check the msi domain is backed by an MSI controller able to perform MSI
>>>>> isolation.
>>>>>
>>>>> So effectively "usage of MSIs" is improper - since it is decided after
>>>>> the group attachment anyway  - and the commit message should rather
>>>>> state "if the device is linked to an MSI controller" (dt msi-parent
>>>>> notion I think).
>>>>
>>>> Hmm, I think Alex has a point here - on a GICv2m I can happily fire
>>>> arbitrary MSIs from _a shell_ (using /dev/mem), and the CPUs definitely
>>>> aren't in an MSI domain, so I don't think it's valid to assume that a
>>>> device using only wired interrupts, therefore with no connection to any
>>>> MSI controller, isn't still capable of maliciously spewing DMA all over
>>>> any and every doorbell region in the system.
>>>
>>> Sorry but I still don't get the point. For the device to reach the
>>> doorbell there must be an IOMMU mapping.
>>
>> Only when the doorbell is _downstream_ of IOMMU translation.
> Yes but that's the root hypothesis with VFIO assignment, right?
>
> Unless the no-iommu option is explicitly set by the userspace, there
> must be an IOMMU downstream of the device that protects all its DMA
> transactions.

The IOMMU is downstream of the device, clearly, but the MSI machinery 
itself could be _in between_ the two - say, integrated into a PCIe root 
complex served by an external SMMU. The doorbell is carved out of the 
upstream address space so that a write targeting the appropriate address 
hits it directly and never even goes out to the SMMU - e.g. 
http://thread.gmane.org/gmane.linux.kernel.pci/47174/focus=47268

>>> - if the device is not attached to an MSI domain, there won't be any
>>> doorbell iommu mapping built by this series, so no risk, right?
>>>
>>> The device will be allowed to reach only memory iommu mapped by
>>> userspace with VFIO DMA MAP standard API. Of course if the userspace can
>>> mmap all the host PA that's a more general issue, right?
>>>
>>> - If the device is attached to an MSI domain (msi-parent link), 2 cases:
>>> 1) the MSI controller advertises MSI isolation (ITS cases), no risk
>>> 2) the MSI controller does not advertise MSI isolation (GICv2m), there
>>> is a security hole.
>>>      a) by default we reject the device attachment
>>>      b) if the userspace overrides the safe interrupt option he accepts
>>> the security hole
>>>
>>> What am I missing?
>>
>> The x86-with-interrupt-remapping-disabled case. Currently, without
>> unsafe_interrupts, everything is rejected - with this patch, we'll still
>> reject anything with an MSI domain, but e.g. legacy PCI devices using
>> INTx would be free to scribble all over the un-translated interrupt
>> region willy-nilly.
> Assuming we have an IOMMU, any transaction is either translated or
> faults, right? So if a legacy PCI device tries to write into a doorbell,
> it must fault because there is no IOMMU mapping for the doorbell?

To the best of my understanding, having been wading through the VT-d 
spec, the case on x86 is as above (OK, so strictly it's more "in 
parallel with" than "in front of") - writes outside the 0xFEExxxxx 
interrupt region go to the DMA remapping units, whereas writes within 
that region go to the interrupt remapping unit, and if that's disabled 
or not implemented, simply sail straight through to the APIC beyond.
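That decode can be summarised in a couple of lines of plain C (the
0xFEExxxxx window is the x86 interrupt address range; the helper names
are illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* On x86, a DMA write to the 0xFEExxxxx window is treated as an
 * interrupt message: it goes to the interrupt remapping unit, or, if
 * that is disabled or absent, straight through to the APIC. */
static bool in_intr_window(uint64_t addr)
{
    return (addr & 0xFFF00000ULL) == 0xFEE00000ULL;
}

/* Everything else goes to the DMA remapping units, i.e. it needs an
 * IOMMU mapping to succeed. */
static bool dma_translated(uint64_t addr)
{
    return !in_intr_window(addr);
}
```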

Robin.

> Best Regards
>
> Eric
>>   That's the subtle, but significant, change in
>> behaviour.
>>
>> Robin.
>>
>>> Best Regards
>>>
>>> Eric
>>>>
>>>> Robin.
>>>>
>>>>> Does it sound better?
>>>>>
>>>>> Regards
>>>>>
>>>>> Eric
>>>>>>
>>>>>>> or if the device uses MSI and the msi-parent controller
>>>>>>> supports IRQ remapping.
>>>>>>>
>>>>>>> Then we check at group level if all devices have safe interrupts: if
>>>>>>> not,
>>>>>>> we only allow the group to be attached if allow_unsafe_interrupts is
>>>>>>> set.
>>>>>>>
>>>>>>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>>>>>>> changed in next patch.
>>>>>>>
>>>>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>>>>
>>>>>>> ---
>>>>>>> v3 -> v4:
>>>>>>> - rename vfio_msi_parent_irq_remapping_capable into
>>>>>>> vfio_safe_irq_domain
>>>>>>>      and irq_remapping into safe_irq_domains
>>>>>>>
>>>>>>> v2 -> v3:
>>>>>>> - protect vfio_msi_parent_irq_remapping_capable with
>>>>>>>      CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>>>> ---
>>>>>>>     drivers/vfio/vfio_iommu_type1.c | 44
>>>>>>> +++++++++++++++++++++++++++++++++++++++--
>>>>>>>     1 file changed, 42 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c
>>>>>>> b/drivers/vfio/vfio_iommu_type1.c
>>>>>>> index 4d3a6f1..2fc8197 100644
>>>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>>>>> @@ -37,6 +37,8 @@
>>>>>>>     #include <linux/vfio.h>
>>>>>>>     #include <linux/workqueue.h>
>>>>>>>     #include <linux/msi-iommu.h>
>>>>>>> +#include <linux/irqdomain.h>
>>>>>>> +#include <linux/msi.h>
>>>>>>>
>>>>>>>     #define DRIVER_VERSION  "0.2"
>>>>>>>     #define DRIVER_AUTHOR   "Alex Williamson
>>>>>>> <alex.williamson@redhat.com>"
>>>>>>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev,
>>>>>>> void *data)
>>>>>>>         return 0;
>>>>>>>     }
>>>>>>>
>>>>>>> +/**
>>>>>>> + * vfio_safe_irq_domain: returns whether the irq domain
>>>>>>> + * the device is attached to is safe with respect to MSI isolation.
>>>>>>> + * If the irq domain is not an MSI domain, we return it is safe.
>>>>>>> + *
>>>>>>> + * @dev: device handle
>>>>>>> + * @data: unused
>>>>>>> + * returns 0 if the irq domain is safe, -1 if not.
>>>>>>> + */
>>>>>>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>>>>>>> +{
>>>>>>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>>>> +    struct irq_domain *domain;
>>>>>>> +    struct msi_domain_info *info;
>>>>>>> +
>>>>>>> +    domain = dev_get_msi_domain(dev);
>>>>>>> +    if (!domain)
>>>>>>> +        return 0;
>>>>>>> +
>>>>>>> +    info = msi_get_domain_info(domain);
>>>>>>> +
>>>>>>> +    if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>>>>>>> +        return -1;
>>>>>>> +#endif
>>>>>>> +    return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>>     static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>>>>>>                      struct vfio_domain *domain)
>>>>>>>     {
>>>>>>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void
>>>>>>> *iommu_data,
>>>>>>>         struct vfio_group *group, *g;
>>>>>>>         struct vfio_domain *domain, *d;
>>>>>>>         struct bus_type *bus = NULL;
>>>>>>> -    int ret;
>>>>>>> +    int ret, safe_irq_domains;
>>>>>>>
>>>>>>>         mutex_lock(&iommu->lock);
>>>>>>>
>>>>>>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void
>>>>>>> *iommu_data,
>>>>>>>
>>>>>>>         group->iommu_group = iommu_group;
>>>>>>>
>>>>>>> +    /*
>>>>>>> +     * Determine if all the devices of the group have a safe irq
>>>>>>> domain
>>>>>>> +     * with respect to MSI isolation
>>>>>>> +     */
>>>>>>> +    safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
>>>>>>> +                       vfio_safe_irq_domain);
>>>>>>> +
>>>>>>>         /* Determine bus_type in order to allocate a domain */
>>>>>>>         ret = iommu_group_for_each_dev(iommu_group, &bus,
>>>>>>> vfio_bus_type);
>>>>>>>         if (ret)
>>>>>>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void
>>>>>>> *iommu_data,
>>>>>>>         INIT_LIST_HEAD(&domain->group_list);
>>>>>>>         list_add(&group->next, &domain->group_list);
>>>>>>>
>>>>>>> +    /*
>>>>>>> +     * to advertise safe interrupts either the IOMMU or the MSI
>>>>>>> controllers
>>>>>>> +     * must support IRQ remapping/interrupt translation
>>>>>>> +     */
>>>>>>>         if (!allow_unsafe_interrupts &&
>>>>>>> -        !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>>>>>>> +        (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) &&
>>>>>>> !safe_irq_domains)) {
>>>>>>>             pr_warn("%s: No interrupt remapping support.  Use the
>>>>>>> module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU
>>>>>>> support on this platform\n",
>>>>>>>                    __func__);
>>>>>>>             ret = -EPERM;
>>>>>>
>>>>>
>>>>
>>>
>>
>


* Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain
  2016-05-11 13:48               ` Robin Murphy
@ 2016-05-11 14:37                 ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-05-11 14:37 UTC (permalink / raw)
  To: Robin Murphy, Alex Williamson
  Cc: eric.auger, will.deacon, joro, tglx, jason, marc.zyngier,
	christoffer.dall, linux-arm-kernel, patches, linux-kernel,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	Jean-Philippe.Brucker, julien.grall, yehuday

Hi Robin,
On 05/11/2016 03:48 PM, Robin Murphy wrote:
> On 11/05/16 10:44, Eric Auger wrote:
>> Hi Robin,
>> On 05/11/2016 11:31 AM, Robin Murphy wrote:
>>> On 11/05/16 09:38, Eric Auger wrote:
>>>> Hi Robin, Alex,
>>>> On 05/10/2016 07:24 PM, Robin Murphy wrote:
>>>>> Hi Eric,
>>>>>
>>>>> On 10/05/16 17:10, Eric Auger wrote:
>>>>>> Hi Alex,
>>>>>> On 05/10/2016 12:49 AM, Alex Williamson wrote:
>>>>>>> On Wed,  4 May 2016 11:54:16 +0000
>>>>>>> Eric Auger <eric.auger@linaro.org> wrote:
>>>>>>>
>>>>>>>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is
>>>>>>>> abstracted
>>>>>>>> by the msi controller. vfio_safe_irq_domain allows to check whether
>>>>>>>> interrupts are "safe" for a given device. They are if the device
>>>>>>>> does
>>>>>>>> not use MSI
>>>>>>>
>>>>>>> Are we sure we're not opening a security hole here?  An MSI is
>>>>>>> simply a
>>>>>>> DMA write, so really whether or not a device uses MSI is irrelevant.
>>>>>>> If it can generate a DMA to the MSI doorbell then we need to be
>>>>>>> protected and I think we pretty much need to assume that devices are
>>>>>>> DMA capable.  Do the MSI domain checks cover this?
>>>>>> Let me try to rephrase: we check the device is not attached to an MSI
>>>>>> controller (I think this is the semantic of dev_get_msi_domain(dev)).
>>>>>>
>>>>>> If it is not, we don't have to care about MSI isolation: there
>>>>>> will be
>>>>>> no IOMMU binding between the device and any MSI doorbell. If it is we
>>>>>> check the msi domain is backed by an MSI controller able to
>>>>>> perform MSI
>>>>>> isolation.
>>>>>>
>>>>>> So effectively "usage of MSIs" is improper - since it is decided
>>>>>> after
>>>>>> the group attachment anyway  - and the commit message should rather
>>>>>> state "if the device is linked to an MSI controller" (dt msi-parent
>>>>>> notion I think).
>>>>>
>>>>> Hmm, I think Alex has a point here - on a GICv2m I can happily fire
>>>>> arbitrary MSIs from _a shell_ (using /dev/mem), and the CPUs
>>>>> definitely
>>>>> aren't in an MSI domain, so I don't think it's valid to assume that a
>>>>> device using only wired interrupts, therefore with no connection to
>>>>> any
>>>>> MSI controller, isn't still capable of maliciously spewing DMA all
>>>>> over
>>>>> any and every doorbell region in the system.
>>>>
>>>> Sorry but I still don't get the point. For the device to reach the
>>>> doorbell there must be an IOMMU mapping.
>>>
>>> Only when the doorbell is _downstream_ of IOMMU translation.
>> Yes but that's the root hypothesis with VFIO assignment, right?
>>
>> Unless the no-iommu option is explicitly set by the userspace, there
>> must be an IOMMU downstream of the device that protects all its DMA
>> transactions.
> 
> The IOMMU is downstream of the device, clearly, but the MSI machinery
> itself could be _in between_ the two - say, integrated into a PCIe root
> complex served by an external SMMU. The doorbell is carved out of the
> upstream address space so that a write targeting the appropriate address
> hits it directly and never even goes out to the SMMU - e.g.
> http://thread.gmane.org/gmane.linux.kernel.pci/47174/focus=47268

Ah OK, I now better understand the point. So in that case not all the
DMA accesses of the device are properly protected by the sMMU. Doesn't
this look like a kind of PCIe Access Control Services (ACS) issue, where
transactions are able to bypass the translation service (downstream of
the root complex)?

I clearly did not address that kind of topology in this series. In such
a case the MSIs may not even be mapped at all, whereas mapping them was
the primary purpose of the series.

Do you have any clue about how to detect that situation? If we cannot
rely on any IOMMU service, I don't see any other solution but to
recognize this situation and reject the assignment.

> 
>>>> - if the device is not attached to an MSI domain, there won't be any
>>>> doorbell iommu mapping built by this series, so no risk, right?
>>>>
>>>> The device will be allowed to reach only memory iommu mapped by
>>>> userspace with VFIO DMA MAP standard API. Of course if the userspace
>>>> can
>>>> mmap all the host PA that's a more general issue, right?
>>>>
>>>> - If the device is attached to an MSI domain (msi-parent link), 2
>>>> cases:
>>>> 1) the MSI controller advertises MSI isolation (ITS cases), no risk
>>>> 2) the MSI controller does not advertise MSI isolation (GICv2m), there
>>>> is a security hole.
>>>>      a) by default we reject the device attachment
>>>>      b) if the userspace overrides the safe interrupt option he accepts
>>>> the security hole
>>>>
>>>> What am I missing?
>>>
>>> The x86-with-interrupt-remapping-disabled case. Currently, without
>>> unsafe_interrupts, everything is rejected - with this patch, we'll still
>>> reject anything with an MSI domain, but e.g. legacy PCI devices using
>>> INTx would be free to scribble all over the un-translated interrupt
>>> region willy-nilly.
>> Assuming we have an IOMMU, any transaction is either translated or
>> faults, right? So if a legacy PCI device tries to write into a doorbell,
>> it must fault because there is no IOMMU mapping for the doorbell?
> 
> To the best of my understanding, having been wading through the VT-d
> spec, the case on x86 is as above (OK, so strictly it's more "in
> parallel with" than "in front of") - writes outside the 0xFEExxxxx
> interrupt region go to the DMA remapping units, whereas writes within
> that region go to the interrupt remapping unit, and if that's disabled
> or not implemented, simply sail straight through to the APIC beyond.
that's my understanding too

Best Regards

Eric
> 
> Robin.
> 
>> Best Regards
>>
>> Eric
>>>   That's the subtle, but significant, change in
>>> behaviour.
>>>
>>> Robin.
>>>
>>>> Best Regards
>>>>
>>>> Eric
>>>>>
>>>>> Robin.
>>>>>
>>>>>> Does it sound better?
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Eric
>>>>>>>
>>>>>>>> or if the device uses MSI and the msi-parent controller
>>>>>>>> supports IRQ remapping.
>>>>>>>>
>>>>>>>> Then we check at group level if all devices have safe
>>>>>>>> interrupts: if
>>>>>>>> not,
>>>>>>>> we only allow the group to be attached if
>>>>>>>> allow_unsafe_interrupts is
>>>>>>>> set.
>>>>>>>>
>>>>>>>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP.
>>>>>>>> This is
>>>>>>>> changed in next patch.
>>>>>>>>
>>>>>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>>>>>
>>>>>>>> ---
>>>>>>>> v3 -> v4:
>>>>>>>> - rename vfio_msi_parent_irq_remapping_capable into
>>>>>>>> vfio_safe_irq_domain
>>>>>>>>      and irq_remapping into safe_irq_domains
>>>>>>>>
>>>>>>>> v2 -> v3:
>>>>>>>> - protect vfio_msi_parent_irq_remapping_capable with
>>>>>>>>      CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>>>>> ---
>>>>>>>>     drivers/vfio/vfio_iommu_type1.c | 44
>>>>>>>> +++++++++++++++++++++++++++++++++++++++--
>>>>>>>>     1 file changed, 42 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c
>>>>>>>> b/drivers/vfio/vfio_iommu_type1.c
>>>>>>>> index 4d3a6f1..2fc8197 100644
>>>>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>>>>>> @@ -37,6 +37,8 @@
>>>>>>>>     #include <linux/vfio.h>
>>>>>>>>     #include <linux/workqueue.h>
>>>>>>>>     #include <linux/msi-iommu.h>
>>>>>>>> +#include <linux/irqdomain.h>
>>>>>>>> +#include <linux/msi.h>
>>>>>>>>
>>>>>>>>     #define DRIVER_VERSION  "0.2"
>>>>>>>>     #define DRIVER_AUTHOR   "Alex Williamson
>>>>>>>> <alex.williamson@redhat.com>"
>>>>>>>> @@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev,
>>>>>>>> void *data)
>>>>>>>>         return 0;
>>>>>>>>     }
>>>>>>>>
>>>>>>>> +/**
>>>>>>>> + * vfio_safe_irq_domain: returns whether the irq domain
>>>>>>>> + * the device is attached to is safe with respect to MSI
>>>>>>>> isolation.
>>>>>>>> + * If the irq domain is not an MSI domain, we return it is safe.
>>>>>>>> + *
>>>>>>>> + * @dev: device handle
>>>>>>>> + * @data: unused
>>>>>>>> + * returns 0 if the irq domain is safe, -1 if not.
>>>>>>>> + */
>>>>>>>> +static int vfio_safe_irq_domain(struct device *dev, void *data)
>>>>>>>> +{
>>>>>>>> +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>>>>>>>> +    struct irq_domain *domain;
>>>>>>>> +    struct msi_domain_info *info;
>>>>>>>> +
>>>>>>>> +    domain = dev_get_msi_domain(dev);
>>>>>>>> +    if (!domain)
>>>>>>>> +        return 0;
>>>>>>>> +
>>>>>>>> +    info = msi_get_domain_info(domain);
>>>>>>>> +
>>>>>>>> +    if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>>>>>>>> +        return -1;
>>>>>>>> +#endif
>>>>>>>> +    return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>     static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>>>>>>>                      struct vfio_domain *domain)
>>>>>>>>     {
>>>>>>>> @@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void
>>>>>>>> *iommu_data,
>>>>>>>>         struct vfio_group *group, *g;
>>>>>>>>         struct vfio_domain *domain, *d;
>>>>>>>>         struct bus_type *bus = NULL;
>>>>>>>> -    int ret;
>>>>>>>> +    int ret, safe_irq_domains;
>>>>>>>>
>>>>>>>>         mutex_lock(&iommu->lock);
>>>>>>>>
>>>>>>>> @@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void
>>>>>>>> *iommu_data,
>>>>>>>>
>>>>>>>>         group->iommu_group = iommu_group;
>>>>>>>>
>>>>>>>> +    /*
>>>>>>>> +     * Determine if all the devices of the group have a safe irq
>>>>>>>> domain
>>>>>>>> +     * with respect to MSI isolation
>>>>>>>> +     */
>>>>>>>> +    safe_irq_domains = !iommu_group_for_each_dev(iommu_group,
>>>>>>>> &bus,
>>>>>>>> +                       vfio_safe_irq_domain);
>>>>>>>> +
>>>>>>>>         /* Determine bus_type in order to allocate a domain */
>>>>>>>>         ret = iommu_group_for_each_dev(iommu_group, &bus,
>>>>>>>> vfio_bus_type);
>>>>>>>>         if (ret)
>>>>>>>> @@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void
>>>>>>>> *iommu_data,
>>>>>>>>         INIT_LIST_HEAD(&domain->group_list);
>>>>>>>>         list_add(&group->next, &domain->group_list);
>>>>>>>>
>>>>>>>> +    /*
>>>>>>>> +     * to advertise safe interrupts either the IOMMU or the MSI
>>>>>>>> controllers
>>>>>>>> +     * must support IRQ remapping/interrupt translation
>>>>>>>> +     */
>>>>>>>>         if (!allow_unsafe_interrupts &&
>>>>>>>> -        !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>>>>>>>> +        (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) &&
>>>>>>>> !safe_irq_domains)) {
>>>>>>>>             pr_warn("%s: No interrupt remapping support.  Use the
>>>>>>>> module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU
>>>>>>>> support on this platform\n",
>>>>>>>>                    __func__);
>>>>>>>>             ret = -EPERM;
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v9 0/7] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 3/3: vfio changes
  2016-05-04 11:54 [PATCH v9 0/7] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 3/3: vfio changes Eric Auger
                   ` (6 preceding siblings ...)
  2016-05-04 11:54 ` [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains Eric Auger
@ 2016-05-20 16:01 ` Eric Auger
  2016-06-08  8:29   ` Auger Eric
  7 siblings, 1 reply; 38+ messages in thread
From: Eric Auger @ 2016-05-20 16:01 UTC (permalink / raw)
  To: eric.auger, robin.murphy, alex.williamson, will.deacon, joro,
	tglx, jason, marc.zyngier, christoffer.dall, linux-arm-kernel
  Cc: patches, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, julien.grall, yehuday

Alex, Robin,

While my 3-part series primarily addresses the problem of mapping
MSI doorbells into arm-smmu, it falls short in:

1) determining whether the MSI controller is downstream or upstream of
the IOMMU,
	=> indicates whether the MSI doorbell must be mapped
	=> participates in the decision about 2)

2) determining whether it is safe to assign a PCIe device.

I think Robin and I share this understanding. All of the above of course
applies to ARM.

I am stuck on these two issues and have a few questions about IOMMU
group setup, PCIe, and IOMMU DT/ACPI description. I would be grateful
if you could answer some of these questions and advise on a strategy
to fix them.

Best Regards

Eric

QUESTIONS:

1) Robin, you pointed out some host controllers which are also MSI
controllers
(http://thread.gmane.org/gmane.linux.kernel.pci/47174/focus=47268). In
that case MSIs never reach the IOMMU. I failed to find anything about
MSIs in the PCIe ACS spec. What should the iommu groups be in that
situation? Isn't the upstream code able to see that some DMA transfers
are not properly isolated and alias the devices into the same group?
Based on your security warning, Alex, I would think the code does not
recognize this; can you confirm?

2) can other PCIe components be MSI controllers?

3) Am I obliged to consider arbitrary topologies where an MSI controller
stands between the PCIe host and the IOMMU, whether in the PCIe space or
in the platform space? If this only relates to PCIe, couldn't I check
whether an MSI controller exists in the PCIe tree?

4) Robin suggested in a private thread to enumerate a list of
"registered" doorbells and, if any belongs to an unsafe MSI controller,
consider the assignment unsafe. This would be a first step before
doing something more complex. Alex, would that be acceptable to you for
issue #2?

5) About issue #1: don't we miss tools in DT/ACPI to describe the
location of the IOMMU on ARM? This is not needed on x86 because
irq remapping and the IOMMU are at the same place, but my understanding
is that it is needed on ARM, where:
- there is no connection between the MSI controller - which implements
irq remapping - and the IOMMU;
- MSIs are conveyed on the same address space as standard memory
transactions.

6) can't we live with uncertainty about the respective locations of the
IOMMU and the MSI controller?

- In my current series, with the above Xilinx MSI controller, I would
see an arm-smmu requiring mapping behind the PCI host and would query
the characteristics of the MSI doorbell (not implemented by that
controller), so no mapping would be done. So I think it would work.
- However, with this topology: PCIe host -> MSI controller generally
used behind an IOMMU (so registering a doorbell) -> IOMMU, this
wouldn't work since the doorbell would be mapped.




On 05/04/2016 01:54 PM, Eric Auger wrote:
> This series allows the user-space to register a reserved IOVA domain.
> This completes the kernel integration of the whole functionality on top
> of part 1 (v9) & 2 (v8).
> 
> It also depends on [PATCH 1/3] iommu: Add MMIO mapping type series,
> http://comments.gmane.org/gmane.linux.kernel.iommu/12869
> 
> We reuse the VFIO DMA MAP ioctl with a new flag to bridge to the
> msi-iommu API. The need for provisioning such MSI IOVA range is reported
> through capability chain, using VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY.
> 
> vfio_iommu_type1 checks if the MSI mapping is safe when attaching the
> vfio group to the container (allow_unsafe_interrupts modality).
> 
> On ARM/ARM64, the IOMMU does not abstract IRQ remapping. The modality is
> abstracted on the MSI controller side. The GICv3 ITS is the first controller
> advertising the modality.
> 
> More details & context can be found at:
> http://www.linaro.org/blog/core-dump/kvm-pciemsi-passthrough-armarm64/
> 
> Best Regards
> 
> Eric
> 
> Testing:
> - functional on ARM64 AMD Overdrive HW (single GICv2m frame) with
>   Intel X540-T2 (SR-IOV capable)
> - also tested on Armada-7040 using an intel IXGBE (82599ES) by
>   Yehuda Yitschak (v8)
> - Not tested: ARM GICv3 ITS
> 
> References:
> [1] [RFC 0/2] VFIO: Add virtual MSI doorbell support
>     (https://lkml.org/lkml/2015/7/24/135)
> [2] [RFC PATCH 0/6] vfio: Add interface to map MSI pages
>     (https://lists.cs.columbia.edu/pipermail/kvmarm/2015-September/016607.html)
> [3] [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO
>     (http://permalink.gmane.org/gmane.comp.emulators.kvm.arm.devel/3858)
> 
> Git: complete series available at
> https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.6-rc6-pcie-passthrough-v9
> 
> previous version at
> https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.6-rc5-pcie-passthrough-v8
> 
> History:
> v8 -> v9:
> - report MSI geometry through capability chain (last patch only);
>   with the current limitation that an arbitrary number of 16 page
>   requirement is reported. To be improved later on.
> 
> 
> v7 -> v8:
> - use renamed msi-iommu API
> - VFIO only responsible for setting the IOVA aperture
> - use new DOMAIN_ATTR_MSI_GEOMETRY iommu domain attribute
> 
> v6 -> v7:
> - vfio_find_dma now accepts a dma_type argument.
> - should have recovered the capability to unmap the whole user IOVA range
> - remove computation of nb IOVA pages -> will post a separate RFC for that
>   while respinning the QEMU part
> 
> RFC v5 -> patch v6:
> - split to ease the review process
> 
> RFC v4 -> RFC v5:
> - take into account Thomas' comments on MSI related patches
>   - split "msi: IOMMU map the doorbell address when needed"
>   - increase readability and add comments
>   - fix style issues
>  - split "iommu: Add DOMAIN_ATTR_MSI_MAPPING attribute"
>  - platform ITS now advertises IOMMU_CAP_INTR_REMAP
>  - fix compilation issue with CONFIG_IOMMU API unset
>  - arm-smmu-v3 now advertises DOMAIN_ATTR_MSI_MAPPING
> 
> RFC v3 -> v4:
> - Move doorbell mapping/unmapping in msi.c
> - fix ref count issue on set_affinity: in case of a change in the address
>   the previous address is decremented
> - doorbell map/unmap now is done on msi composition. Should allow the use
>   case for platform MSI controllers
> - create dma-reserved-iommu.h/c exposing/implementing a new API dedicated
>   to reserved IOVA management (looking like dma-iommu glue)
> - series reordering to ease the review:
>   - first part is related to IOMMU
>   - second related to MSI sub-system
>   - third related to VFIO (except arm-smmu IOMMU_CAP_INTR_REMAP removal)
> - expose the number of requested IOVA pages through VFIO_IOMMU_GET_INFO
>   [this partially addresses Marc's comments on iommu_get/put_single_reserved
>    size/alignment problematic - which I did not ignore - but I don't know
>    how much I can do at the moment]
> 
> RFC v2 -> RFC v3:
> - should fix wrong handling of some CONFIG combinations:
>   CONFIG_IOVA, CONFIG_IOMMU_API, CONFIG_PCI_MSI_IRQ_DOMAIN
> - fix MSI_FLAG_IRQ_REMAPPING setting in GICv3 ITS (although not tested)
> 
> PATCH v1 -> RFC v2:
> - reverted to RFC since it looks more reasonable ;-) the code is split
>   between VFIO, IOMMU, MSI controller and I am not sure I did the right
>   choices. Also API need to be further discussed.
> - iova API usage in arm-smmu.c.
> - MSI controller natively programs the MSI addr with either the PA or IOVA.
>   This is not done anymore in vfio-pci driver as suggested by Alex.
> - check irq remapping capability of the group
> 
> RFC v1 [2] -> PATCH v1:
> - use the existing dma map/unmap ioctl interface with a flag to register a
>   reserved IOVA range. Use the legacy Rb to store this special vfio_dma.
> - a single reserved IOVA contiguous region now is allowed
> - use of an RB tree indexed by PA to store allocated reserved slots
> - use of a vfio_domain iova_domain to manage iova allocation within the
>   window provided by the userspace
> - vfio alloc_map/unmap_free take a vfio_group handle
> - vfio_group handle is cached in vfio_pci_device
> - add ref counting to bindings
> - user modality enabled at the end of the series
> 
> 
> Eric Auger (7):
>   vfio: introduce a vfio_dma type field
>   vfio/type1: vfio_find_dma accepting a type argument
>   vfio/type1: bypass unmap/unpin and replay for VFIO_IOVA_RESERVED slots
>   vfio: allow reserved msi iova registration
>   vfio/type1: also check IRQ remapping capability at msi domain
>   iommu/arm-smmu: do not advertise IOMMU_CAP_INTR_REMAP
>   vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability
>     chains
> 
>  drivers/iommu/arm-smmu-v3.c     |   3 +-
>  drivers/iommu/arm-smmu.c        |   3 +-
>  drivers/vfio/vfio_iommu_type1.c | 270 +++++++++++++++++++++++++++++++++++++---
>  include/uapi/linux/vfio.h       |  40 +++++-
>  4 files changed, 298 insertions(+), 18 deletions(-)
> 


* Re: [PATCH v9 0/7] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 3/3: vfio changes
  2016-05-20 16:01 ` [PATCH v9 0/7] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 3/3: vfio changes Eric Auger
@ 2016-06-08  8:29   ` Auger Eric
  2016-06-08 21:06     ` Alex Williamson
  0 siblings, 1 reply; 38+ messages in thread
From: Auger Eric @ 2016-06-08  8:29 UTC (permalink / raw)
  To: eric.auger, robin.murphy, alex.williamson, will.deacon, joro,
	tglx, jason, marc.zyngier, christoffer.dall, linux-arm-kernel
  Cc: patches, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, julien.grall, yehuday

Dear all,
On 20/05/2016 at 18:01, Eric Auger wrote:
> Alex, Robin,
> 
> While my 3 part series primarily addresses the problematic of mapping
> MSI doorbells into arm-smmu, it fails in :
> 
> 1) determining whether the MSI controller is downstream or upstream to
> the IOMMU,
> 	=> indicates whether the MSI doorbell must be mapped
> 	=> participates in the decision about 2)
> 
> 2) determining whether it is safe to assign a PCIe device.
> 
> I think we share this understanding with Robin. All above of course
> stands for ARM.
> 
> I get stuck with those 2 issues and I have few questions about iommu
> group setup, PCIe, iommu dt/ACPI description. I would be grateful to you
> if you could answer part of those questions and advise about the
> strategy to fix those.

A gentle reminder about the questions below; I hope I did not miss any
reply. If anybody has some time to spend on this topic...

> 
> Best Regards
> 
> Eric
> 
> QUESTIONS:
> 
> 1) Robin, you pointed some host controllers which also are MSI
> controllers
> (http://thread.gmane.org/gmane.linux.kernel.pci/47174/focus=47268). In
> that case MSIs never reach the IOMMU. I failed in finding anything about
> MSIs in PCIe ACS spec. What should be the iommu groups in that
> situation. Isn't the upstreamed code able to see some DMA transfers are
> not properly isolated and alias devices in the same group? According to
> your security warning, Alex, I would think the code does not recognize
> it, can you confirm please?
My current understanding is that endpoints would be in separate groups
(assuming ACS support), although the MSI controller frame is not properly
protected.
> 
> 2) can other PCIe components be MSI controllers?
> 
> 3) Am I obliged to consider arbitrary topologies where an MSI controller
> stands between the PCIe host and the iommu? in the PCIe space or
> platform space? If this only relates to PCIe couldn' I check if an MSI
> controller exists in the PCIe tree?
In my last series, I consider the assignment of a platform device unsafe as
soon as there is a GICv2m. This is a change in the user experience compared
to what we had before.
> 
> 4) Robin suggested in a private thread to enumerate through a list of
> "registered" doorbells and if any belongs to an unsafe MSI controller,
> consider the assignment is unsafe. This would be a first step before
> doing something more complex. Alex, would that be acceptable to you for
> issue #2?
I implemented this technique in my last series while waiting for more
discussion on questions 4 and 5.
> 
> 5) About issue #1: don't we miss tools in dt/ACPI to describe the
> location of the iommu on ARM? This is not needed on x86 because
> irq_remapping and IOMMU are at the same place but my understanding is
> that it is on ARM where
> - there is no connection between the MSI controller - which implements
> irq remapping - and the iommu
> - MSI are conveyed on the same address space as standard memory
> transactions.
> 
> 6)  can't we live with iommu/MSI controller respective location uncertainty?
> 
> - in my current series, with the above Xilinx MSI controller, I would
> see there is an arm-smmu requiring mapping behind the PCI host, would
> query the characteristics of the MSI doorbell (not implemented by that
> controller), so no mapping would be done. So it would work I think.
> - However in case we have this topology: PCIe host -> MSI controller
> generally used behind an IOMMU (so registering a doorbell) -> IOMMU,
> this wouldn't work since the doorbell would be mapped.

Best Regards

Eric


* Re: [PATCH v9 0/7] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 3/3: vfio changes
  2016-06-08  8:29   ` Auger Eric
@ 2016-06-08 21:06     ` Alex Williamson
  2016-06-09  7:55       ` Auger Eric
  0 siblings, 1 reply; 38+ messages in thread
From: Alex Williamson @ 2016-06-08 21:06 UTC (permalink / raw)
  To: Auger Eric
  Cc: eric.auger, robin.murphy, will.deacon, joro, tglx, jason,
	marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, julien.grall, yehuday

On Wed, 8 Jun 2016 10:29:35 +0200
Auger Eric <eric.auger@linaro.org> wrote:

> Dear all,
> Le 20/05/2016 à 18:01, Eric Auger a écrit :
> > Alex, Robin,
> > 
> > While my 3 part series primarily addresses the problematic of mapping
> > MSI doorbells into arm-smmu, it fails in :
> > 
> > 1) determining whether the MSI controller is downstream or upstream to
> > the IOMMU,  
> > 	=> indicates whether the MSI doorbell must be mapped
> > 	=> participates in the decision about 2)  
> > 
> > 2) determining whether it is safe to assign a PCIe device.
> > 
> > I think we share this understanding with Robin. All above of course
> > stands for ARM.
> > 
> > I get stuck with those 2 issues and I have few questions about iommu
> > group setup, PCIe, iommu dt/ACPI description. I would be grateful to you
> > if you could answer part of those questions and advise about the
> > strategy to fix those.  
> 
> gentle reminder about the questions below; hope I did not miss any reply.
> If anybody has some time to spent on this topic...
> 
> > 
> > Best Regards
> > 
> > Eric
> > 
> > QUESTIONS:
> > 
> > 1) Robin, you pointed some host controllers which also are MSI
> > controllers
> > (http://thread.gmane.org/gmane.linux.kernel.pci/47174/focus=47268). In
> > that case MSIs never reach the IOMMU. I failed in finding anything about
> > MSIs in PCIe ACS spec. What should be the iommu groups in that
> > situation. Isn't the upstreamed code able to see some DMA transfers are
> > not properly isolated and alias devices in the same group? According to
> > your security warning, Alex, I would think the code does not recognize
> > it, can you confirm please?  
> my current understanding is end points would be in separate groups (assuming
> ACS support) although MSI controller frame is not properly protected.

We don't currently consider MSI differently from other DMA and we don't
currently have any sort of concept of a device within the intermediate
fabric as being a DMA target.  We expect fabric devices to only be
transaction routers.  We use ACS to determine whether there's any
possibility of DMA being redirected before it reaches the IOMMU, but it
seems that a DMA being consumed by an interrupt controller before it
reaches the IOMMU would be another cause for an isolation breach.
 
> > 2) can other PCIe components be MSI controllers?

I'm not even entirely sure what this means.  Would a DMA write from an
endpoint target the MMIO space of an intermediate, fabric device?
 
> > 3) Am I obliged to consider arbitrary topologies where an MSI controller
> > stands between the PCIe host and the iommu? in the PCIe space or
> > platform space? If this only relates to PCIe couldn' I check if an MSI
> > controller exists in the PCIe tree?  
> In my last series, I consider the assignment of platform device unsafe as
> soon as there is a GICv2m. This is a change in the user experience compared to
> what we have before.

If the MSI controller is downstream of our DMA translation, it doesn't
seem like we have much choice but to mark it unsafe.  The endpoint is
fully able to attempt to exploit it.
 
> > 4) Robin suggested in a private thread to enumerate through a list of
> > "registered" doorbells and if any belongs to an unsafe MSI controller,
> > consider the assignment is unsafe. This would be a first step before
> > doing something more complex. Alex, would that be acceptable to you for
> > issue #2?  
> I implemented this technique in my last series waiting for more discussion
> on 4, 5.

Seems sufficient.  I don't mind taking a broad swing versus all the
extra complexity of defining which devices are safe vs unsafe.
 
> > 5) About issue #1: don't we miss tools in dt/ACPI to describe the
> > location of the iommu on ARM? This is not needed on x86 because
> > irq_remapping and IOMMU are at the same place but my understanding is
> > that it is on ARM where
> > - there is no connection between the MSI controller - which implements
> > irq remapping - and the iommu
> > - MSI are conveyed on the same address space as standard memory
> > transactions.

It seems pretty dubious to me to have fixed address, unprotected MSI
controllers sitting in the DMA space of a device before IOMMU
translation.  Seems like you not only need to mark interrupts as
unsafe, but exclude the address space of the MSI controller from the
available IOVA space to the user.
 
> > 6)  can't we live with iommu/MSI controller respective location uncertainty?
> > 
> > - in my current series, with the above Xilinx MSI controller, I would
> > see there is an arm-smmu requiring mapping behind the PCI host, would
> > query the characteristics of the MSI doorbell (not implemented by that
> > controller), so no mapping would be done. So it would work I think.
> > - However in case we have this topology: PCIe host -> MSI controller
> > generally used behind an IOMMU (so registering a doorbell) -> IOMMU,
> > this wouldn't work since the doorbell would be mapped.  

I'm a little confused which direction "behind" is here, but it seems
like any time the MSI controller lives in the DMA address space of the
endpoint, both interfering with the available IOVA space to the user
and potentially an attack vector for the user, we need to call it out
as unsafe.  Maybe some of them are for exclusive use of the device and
the attack vector is relatively contained, but they still affect the
IOVA space of the user.  Such a configuration might be safe, but as I
said I'm not opposed to being pretty liberal in applying the unsafe
requirement if the platform has done something unfriendly.  Thanks,

Alex


* Re: [PATCH v9 0/7] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 3/3: vfio changes
  2016-06-08 21:06     ` Alex Williamson
@ 2016-06-09  7:55       ` Auger Eric
  2016-06-09 19:44         ` Alex Williamson
  2016-06-20 15:42         ` Pranav Sawargaonkar
  0 siblings, 2 replies; 38+ messages in thread
From: Auger Eric @ 2016-06-09  7:55 UTC (permalink / raw)
  To: Alex Williamson
  Cc: eric.auger.pro, robin.murphy, will.deacon, joro, tglx, jason,
	marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, julien.grall, yehuday

Alex,
> On Wed, 8 Jun 2016 10:29:35 +0200
> Auger Eric <eric.auger@linaro.org> wrote:
> 
>> Dear all,
>> Le 20/05/2016 à 18:01, Eric Auger a écrit :
>>> Alex, Robin,
>>>
>>> While my 3 part series primarily addresses the problematic of mapping
>>> MSI doorbells into arm-smmu, it fails in :
>>>
>>> 1) determining whether the MSI controller is downstream or upstream to
>>> the IOMMU,  
>>> 	=> indicates whether the MSI doorbell must be mapped
>>> 	=> participates in the decision about 2)  
>>>
>>> 2) determining whether it is safe to assign a PCIe device.
>>>
>>> I think we share this understanding with Robin. All above of course
>>> stands for ARM.
>>>
>>> I get stuck with those 2 issues and I have few questions about iommu
>>> group setup, PCIe, iommu dt/ACPI description. I would be grateful to you
>>> if you could answer part of those questions and advise about the
>>> strategy to fix those.  
>>
>> gentle reminder about the questions below; hope I did not miss any reply.
>> If anybody has some time to spent on this topic...
>>
>>>
>>> Best Regards
>>>
>>> Eric
>>>
>>> QUESTIONS:
>>>
>>> 1) Robin, you pointed some host controllers which also are MSI
>>> controllers
>>> (http://thread.gmane.org/gmane.linux.kernel.pci/47174/focus=47268). In
>>> that case MSIs never reach the IOMMU. I failed in finding anything about
>>> MSIs in PCIe ACS spec. What should be the iommu groups in that
>>> situation. Isn't the upstreamed code able to see some DMA transfers are
>>> not properly isolated and alias devices in the same group? According to
>>> your security warning, Alex, I would think the code does not recognize
>>> it, can you confirm please?  
>> my current understanding is end points would be in separate groups (assuming
>> ACS support) although MSI controller frame is not properly protected.
> 
> We don't currently consider MSI differently from other DMA and we don't
> currently have any sort of concept of a device within the intermediate
> fabric as being a DMA target.  We expect fabric devices to only be
> transaction routers.  We use ACS to determine whether there's any
> possibility of DMA being redirected before it reaches the IOMMU, but it
> seems that a DMA being consumed by an interrupt controller before it
> reaches the IOMMU would be another cause for an isolation breach.
>  
OK, thank you for the confirmation.
>>> 2) can other PCIe components be MSI controllers?
> 
> I'm not even entirely sure what this means.  Would a DMA write from an
> endpoint target the MMIO space of an intermediate, fabric device?
With the example provided by Robin, we have a host controller acting as
an MSI controller. I wondered whether other fabric devices (downstream
of the host controller in PCIe terminology) might also act as MSI
controllers.
>  
>>> 3) Am I obliged to consider arbitrary topologies where an MSI controller
>>> stands between the PCIe host and the iommu? in the PCIe space or
>>> platform space? If this only relates to PCIe couldn' I check if an MSI
>>> controller exists in the PCIe tree?  
>> In my last series, I consider the assignment of platform device unsafe as
>> soon as there is a GICv2m. This is a change in the user experience compared to
>> what we have before.
> 
> If the MSI controller is downstream of our DMA translation, it doesn't
> seem like we have much choice but to mark it unsafe.  The endpoint is
> fully able to attempt to exploit it.
OK, the original question was related to non-PCIe topologies:

- we know some PCIe fabric topologies where the PCIe host controller
implements the MSI controller.
- Shall we be prepared to address the same kind of issue with platform
MSI controllers? Are there SoCs where an unsafe platform MSI controller
sits before IOMMU translation, or do we consider that a platform
topology we don't support for assignment?

>  
>>> 4) Robin suggested in a private thread to enumerate through a list of
>>> "registered" doorbells and if any belongs to an unsafe MSI controller,
>>> consider the assignment is unsafe. This would be a first step before
>>> doing something more complex. Alex, would that be acceptable to you for
>>> issue #2?  
>> I implemented this technique in my last series waiting for more discussion
>> on 4, 5.
> 
> Seems sufficient.  I don't mind taking a broad swing versus all the
> extra complexity of defining which devices are safe vs unsafe.
OK
>  
>>> 5) About issue #1: don't we miss tools in dt/ACPI to describe the
>>> location of the iommu on ARM? This is not needed on x86 because
>>> irq_remapping and IOMMU are at the same place but my understanding is
>>> that it is on ARM where
>>> - there is no connection between the MSI controller - which implements
>>> irq remapping - and the iommu
>>> - MSI are conveyed on the same address space as standard memory
>>> transactions.
> 
> It seems pretty dubious to me to have fixed address, unprotected MSI
> controllers sitting in the DMA space of a device before IOMMU
> translation.
same for me ;-)
  Seems like you not only need to mark interrupts as
> unsafe, but exclude the address space of the MSI controller from the
> available IOVA space to the user.
I currently do not see how to achieve that. The guest can program the
assigned device's DMA target address with the MSI frame PA, and there is no
IOMMU to protect against that. How can we enforce it if we don't trap DMA
programming?
>  
>>> 6) can't we live with uncertainty about the respective locations of the iommu and MSI controller?
>>>
>>> - in my current series, with the above Xilinx MSI controller, I would
>>> see there is an arm-smmu requiring mapping behind the PCI host, would
>>> query the characteristics of the MSI doorbell (not implemented by that
>>> controller), so no mapping would be done. So it would work I think.
>>> - However in case we have this topology: PCIe host -> MSI controller
>>> generally used behind an IOMMU (so registering a doorbell) -> IOMMU,
>>> this wouldn't work since the doorbell would be mapped.  
> 
> I'm a little confused which direction "behind" is here, but it seems
> like any time the MSI controller lives in the DMA address space of the
> endpoint, both interfering with the available IOVA space to the user
> and potentially an attack vector for the user, we need to call it out
> as unsafe.  Maybe some of them are for exclusive use of the device and
> the attack vector is relatively contained, but they still affect the
> IOVA space of the user.  Such a configuration might be safe, but as I
> said I'm not opposed to being pretty liberal in applying the unsafe
> requirement if the platform has done something unfriendly.  Thanks,
OK that's clear.

Thank you for your feedback

Best Regards

Eric
> 
> Alex
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v9 0/7] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 3/3: vfio changes
  2016-06-09  7:55       ` Auger Eric
@ 2016-06-09 19:44         ` Alex Williamson
  2016-06-20 15:42         ` Pranav Sawargaonkar
  1 sibling, 0 replies; 38+ messages in thread
From: Alex Williamson @ 2016-06-09 19:44 UTC (permalink / raw)
  To: Auger Eric
  Cc: eric.auger.pro, robin.murphy, will.deacon, joro, tglx, jason,
	marc.zyngier, christoffer.dall, linux-arm-kernel, patches,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, julien.grall, yehuday

On Thu, 9 Jun 2016 09:55:37 +0200
Auger Eric <eric.auger@redhat.com> wrote:

> Alex,
> > On Wed, 8 Jun 2016 10:29:35 +0200
> > Auger Eric <eric.auger@linaro.org> wrote:
> >   
> >> Dear all,
> >> Le 20/05/2016 à 18:01, Eric Auger a écrit :  
> >>> Alex, Robin,
> >>>
> >>> While my 3 part series primarily addresses the problem of mapping
> >>> MSI doorbells into arm-smmu, it fails in:
> >>>
> >>> 1) determining whether the MSI controller is downstream or upstream to
> >>> the IOMMU,    
> >>> 	=> indicates whether the MSI doorbell must be mapped
> >>> 	=> participates in the decision about 2)    
> >>>
> >>> 2) determining whether it is safe to assign a PCIe device.
> >>>
> >>> I think we share this understanding with Robin. All above of course
> >>> stands for ARM.
> >>>
> >>> I get stuck with those 2 issues and I have a few questions about iommu
> >>> group setup, PCIe, iommu dt/ACPI description. I would be grateful to you
> >>> if you could answer part of those questions and advise about the
> >>> strategy to fix those.    
> >>
> >> gentle reminder about the questions below; hope I did not miss any reply.
> >> If anybody has some time to spend on this topic...
> >>  
> >>>
> >>> Best Regards
> >>>
> >>> Eric
> >>>
> >>> QUESTIONS:
> >>>
> >>> 1) Robin, you pointed some host controllers which also are MSI
> >>> controllers
> >>> (http://thread.gmane.org/gmane.linux.kernel.pci/47174/focus=47268). In
> >>> that case MSIs never reach the IOMMU. I failed in finding anything about
> >>> MSIs in PCIe ACS spec. What should be the iommu groups in that
> >>> situation. Isn't the upstreamed code able to see some DMA transfers are
> >>> not properly isolated and alias devices in the same group? According to
> >>> your security warning, Alex, I would think the code does not recognize
> >>> it, can you confirm please?    
> >> my current understanding is endpoints would be in separate groups (assuming
> >> ACS support) although the MSI controller frame is not properly protected.  
> > 
> > We don't currently consider MSI differently from other DMA and we don't
> > currently have any sort of concept of a device within the intermediate
> > fabric as being a DMA target.  We expect fabric devices to only be
> > transaction routers.  We use ACS to determine whether there's any
> > possibility of DMA being redirected before it reaches the IOMMU, but it
> > seems that a DMA being consumed by an interrupt controller before it
> > reaches the IOMMU would be another cause for an isolation breach.
> >    
> OK thank you for the confirmation
> >>> 2) can other PCIe components be MSI controllers?  
> > 
> > I'm not even entirely sure what this means.  Would a DMA write from an
> > endpoint target the MMIO space of an intermediate, fabric device?  
> With the example provided by Robin we have a host controller acting as
> an MSI controller. I wondered whether we could have some other fabric
> devices (downstream to the host controller in PCIe terminology) also
> likely to act as MSI controllers.
> >    
> >>> 3) Am I obliged to consider arbitrary topologies where an MSI controller
> >>> stands between the PCIe host and the iommu? In the PCIe space or
> >>> platform space? If this only relates to PCIe couldn't I check if an MSI
> >>> controller exists in the PCIe tree?    
> >> In my last series, I consider the assignment of platform devices unsafe as
> >> soon as there is a GICv2m. This is a change in the user experience compared
> >> to what we had before.  
> > 
> > If the MSI controller is downstream of our DMA translation, it doesn't
> > seem like we have much choice but to mark it unsafe.  The endpoint is
> > fully able to attempt to exploit it.  
> OK, the original question was related to non-PCIe topologies:
> 
> - we know some PCIe fabric topologies where the PCIe host controller
> implements the MSI controller.
> - shall we be prepared to address the same kind of issues with platform
> MSI controllers? Are there SoCs that would put an unsafe platform MSI
> controller before IOMMU translation, or do we consider that a platform
> topology we don't support for assignment?

I was trying to answer as generically (non-PCI) as I could, but I guess
I still slipped an "endpoint" in there.  So if we define an MSI
controller generically as a DMA write target which generates platform
specific interrupts in response to those writes, and that MSI
controller is fixed in the address space of the interrupt generating
device (ie. there's no IOMMU translation applied), then I think:

a) The part of the address space consumed by the MSI controller needs
to be described to the user

b) Depending on the scope of the interrupts the MSI controller is able
to generate, the interrupts need to be marked unsafe

This is really not all that different from x86 with interrupt
remapping.  We have an address space hole where the MSI controller
lives at 0xfee00000.  We would like to describe this, but we currently
don't have a strong need to do so because it's architecturally fixed.
With interrupt remapping, the scope of the interrupts the device is
able to generate is only the interrupts intended for the device, which
we consider safe.  Without interrupt remapping, the IOMMU does not do
translation of this range (so it doesn't matter whether the MSI
controller is upstream, downstream, or coincident with the IOMMU), and
the scope of the interrupts the device can target are sufficiently
large to make it unsafe.

I believe what you're trying to account for is an MSI controller that
might arbitrarily consume a fixed portion of the IOVA space for a
device and for which its ability to generate interrupts may or may not
be limited in scope.

I don't currently have a strong preference whether we do or don't
support assignment of such devices, but if we do and we have no ability
to manage the device's access to the MSI controller's DMA address space
(ie. no IOMMU in the DMA path) and no guarantees of the scope of the
interrupts that the controller can generate through that DMA window, it
needs to be marked unsafe.  If we don't plan to support devices behind
such topologies then we need to prevent their use through vfio.

> >>> 4) Robin suggested in a private thread to enumerate through a list of
> >>> "registered" doorbells and if any belongs to an unsafe MSI controller,
> >>> consider the assignment is unsafe. This would be a first step before
> >>> doing something more complex. Alex, would that be acceptable to you for
> >>> issue #2?    
> >> I implemented this technique in my last series waiting for more discussion
> >> on 4, 5.  
> > 
> > Seems sufficient.  I don't mind taking a broad swing versus all the
> > extra complexity of defining which devices are safe vs unsafe.  
> OK
> >    
> >>> 5) About issue #1: don't we miss tools in dt/ACPI to describe the
> >>> location of the iommu on ARM? This is not needed on x86 because
> >>> irq_remapping and IOMMU are at the same place but my understanding is
> >>> that it is on ARM where
> >>> - there is no connection between the MSI controller - which implements
> >>> irq remapping - and the iommu
> >>> - MSI are conveyed on the same address space as standard memory
> >>> transactions.  
> > 
> > It seems pretty dubious to me to have fixed address, unprotected MSI
> > controllers sitting in the DMA space of a device before IOMMU
> > translation.  
> same for me ;-)
>   Seems like you not only need to mark interrupts as
> > unsafe, but exclude the address space of the MSI controller from the
> > available IOVA space to the user.  
> I currently do not see how to achieve that. The guest can program the
> assigned device's DMA target address with the MSI frame PA, and there is no
> IOMMU to protect against that. How can we enforce it if we don't trap DMA
> programming?

All we can do regarding "excluding the address space of the MSI
controller" is to create a vfio interface to describe that exclusion.
Without an IOMMU we can't prevent access to it.  Therefore if we go
back to b) above, a user should only be allowed access to the device if
either the scope is sufficiently limited to be considered safe or we
have an opt-in for unsafe interrupts.

Still you have the issue of how the device actually gets programmed to
hit its DMA target to generate interrupts.  This is a virtualization
question though, a userspace driver needs to know both the available
IOVA space and the MSI target addresses.  It's userspace drivers like
QEMU that need to provide further virtualization to intercept guest
programming of the device and "do the right thing".  For this you
either need to rely on standards-based interfaces, like PCI MSI/X or
you get to write device specific virtualization code that traps the
interrupt programming.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v9 0/7] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 3/3: vfio changes
  2016-06-09  7:55       ` Auger Eric
  2016-06-09 19:44         ` Alex Williamson
@ 2016-06-20 15:42         ` Pranav Sawargaonkar
  2016-06-20 15:46           ` Pranav Sawargaonkar
  1 sibling, 1 reply; 38+ messages in thread
From: Pranav Sawargaonkar @ 2016-06-20 15:42 UTC (permalink / raw)
  To: Auger Eric
  Cc: Alex Williamson, eric.auger.pro, robin.murphy, will.deacon, joro,
	tglx, jason, marc.zyngier, christoffer.dall, linux-arm-kernel,
	patches, linux-kernel, Bharat Bhushan, Pavel Fedin, iommu,
	Jean-Philippe.Brucker, julien.grall, yehuday,
	Pranavkumar Sawargaonkar

Hi Eric,

Tested this series on APM X-Gene2 with an E1000 NIC and a sata_sil SATA card.
Tested-by: Pranavkumar Sawargaonkar <psawargaonkar@apm.com>

Thanks,
Pranav


On Thu, Jun 9, 2016 at 1:25 PM, Auger Eric <eric.auger@redhat.com> wrote:
> Alex,
>> On Wed, 8 Jun 2016 10:29:35 +0200
>> Auger Eric <eric.auger@linaro.org> wrote:
>>
>>> Dear all,
>>> Le 20/05/2016 à 18:01, Eric Auger a écrit :
>>>> Alex, Robin,
>>>>
>>>> While my 3 part series primarily addresses the problematic of mapping
>>>> MSI doorbells into arm-smmu, it fails in :
>>>>
>>>> 1) determining whether the MSI controller is downstream or upstream to
>>>> the IOMMU,
>>>>     => indicates whether the MSI doorbell must be mapped
>>>>     => participates in the decision about 2)
>>>>
>>>> 2) determining whether it is safe to assign a PCIe device.
>>>>
>>>> I think we share this understanding with Robin. All above of course
>>>> stands for ARM.
>>>>
>>>> I get stuck with those 2 issues and I have few questions about iommu
>>>> group setup, PCIe, iommu dt/ACPI description. I would be grateful to you
>>>> if you could answer part of those questions and advise about the
>>>> strategy to fix those.
>>>
>>> gentle reminder about the questions below; hope I did not miss any reply.
>>> If anybody has some time to spent on this topic...
>>>
>>>>
>>>> Best Regards
>>>>
>>>> Eric
>>>>
>>>> QUESTIONS:
>>>>
>>>> 1) Robin, you pointed some host controllers which also are MSI
>>>> controllers
>>>> (http://thread.gmane.org/gmane.linux.kernel.pci/47174/focus=47268). In
>>>> that case MSIs never reach the IOMMU. I failed in finding anything about
>>>> MSIs in PCIe ACS spec. What should be the iommu groups in that
>>>> situation. Isn't the upstreamed code able to see some DMA transfers are
>>>> not properly isolated and alias devices in the same group? According to
>>>> your security warning, Alex, I would think the code does not recognize
>>>> it, can you confirm please?
>>> my current understanding is end points would be in separate groups (assuming
>>> ACS support) although MSI controller frame is not properly protected.
>>
>> We don't currently consider MSI differently from other DMA and we don't
>> currently have any sort of concept of a device within the intermediate
>> fabric as being a DMA target.  We expect fabric devices to only be
>> transaction routers.  We use ACS to determine whether there's any
>> possibility of DMA being redirected before it reaches the IOMMU, but it
>> seems that a DMA being consumed by an interrupt controller before it
>> reaches the IOMMU would be another cause for an isolation breach.
>>
> OK thank you for the confirmation
>>>> 2) can other PCIe components be MSI controllers?
>>
>> I'm not even entirely sure what this means.  Would a DMA write from an
>> endpoint target the MMIO space of an intermediate, fabric device?
> With the example provided by Robin we have a host controller acting as
> an MSI controller. I wondered whether we could have some other fabric
> devices (downstream to the host controller in PCIe terminology) also
> likely to act as MSI controllers.
>>
>>>> 3) Am I obliged to consider arbitrary topologies where an MSI controller
>>>> stands between the PCIe host and the iommu? in the PCIe space or
>>>> platform space? If this only relates to PCIe couldn't I check if an MSI
>>>> controller exists in the PCIe tree?
>>> In my last series, I consider the assignment of platform device unsafe as
>>> soon as there is a GICv2m. This is a change in the user experience compared to
>>> what we have before.
>>
>> If the MSI controller is downstream of our DMA translation, it doesn't
>> seem like we have much choice but to mark it unsafe.  The endpoint is
>> fully able to attempt to exploit it.
> OK, the original question was related to non-PCIe topologies:
>
> - we know some PCIe fabric topologies where the PCIe host controller
> implements MSI controller.
> - Shall we be prepared to address the same kind of issues with platform
> MSI controllers. Are there some SOCs where we would put an unsafe MSI
> platform controller before IOMMU translation. Or do we consider it is a
> platform topology we don't support for assignment?
>
>>
>>>> 4) Robin suggested in a private thread to enumerate through a list of
>>>> "registered" doorbells and if any belongs to an unsafe MSI controller,
>>>> consider the assignment is unsafe. This would be a first step before
>>>> doing something more complex. Alex, would that be acceptable to you for
>>>> issue #2?
>>> I implemented this technique in my last series waiting for more discussion
>>> on 4, 5.
>>
>> Seems sufficient.  I don't mind taking a broad swing versus all the
>> extra complexity of defining which devices are safe vs unsafe.
> OK
>>
>>>> 5) About issue #1: don't we miss tools in dt/ACPI to describe the
>>>> location of the iommu on ARM? This is not needed on x86 because
>>>> irq_remapping and IOMMU are at the same place but my understanding is
>>>> that it is on ARM where
>>>> - there is no connection between the MSI controller - which implements
>>>> irq remapping - and the iommu
>>>> - MSI are conveyed on the same address space as standard memory
>>>> transactions.
>>
>> It seems pretty dubious to me to have fixed address, unprotected MSI
>> controllers sitting in the DMA space of a device before IOMMU
>> translation.
> same for me ;-)
>   Seems like you not only need to mark interrupts as
>> unsafe, but exclude the address space of the MSI controller from the
>> available IOVA space to the user.
> I currently do not see how to achieve that. The guest can program the
> assigned device DMA target address with the MSI frame PA. there is no
> IOMMU to protect. How can we make it if we don't trap on DMA programming?
>>
>>>> 6)  can't we live with iommu/MSI controller respective location uncertainty?
>>>>
>>>> - in my current series, with the above Xilinx MSI controller, I would
>>>> see there is an arm-smmu requiring mapping behind the PCI host, would
>>>> query the characteristics of the MSI doorbell (not implemented by that
>>>> controller), so no mapping would be done. So it would work I think.
>>>> - However in case we have this topology: PCIe host -> MSI controller
>>>> generally used behind an IOMMU (so registering a doorbell) -> IOMMU,
>>>> this wouldn't work since the doorbell would be mapped.
>>
>> I'm a little confused which direction "behind" is here, but it seems
>> like any time the MSI controller lives in the DMA address space of the
>> endpoint, both interfering with the available IOVA space to the user
>> and potentially an attack vector for the user, we need to call it out
>> as unsafe.  Maybe some of them are for exclusive use of the device and
>> the attack vector is relatively contained, but they still affect the
>> IOVA space of the user.  Such a configuration might be safe, but as I
>> said I'm not opposed to being pretty liberal in applying the unsafe
>> requirement if the platform has done something unfriendly.  Thanks,
> OK that's clear.
>
> Thank you for your feedback
>
> Best Regards
>
> Eric
>>
>> Alex
>>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v9 0/7] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 3/3: vfio changes
  2016-06-20 15:42         ` Pranav Sawargaonkar
@ 2016-06-20 15:46           ` Pranav Sawargaonkar
  0 siblings, 0 replies; 38+ messages in thread
From: Pranav Sawargaonkar @ 2016-06-20 15:46 UTC (permalink / raw)
  To: Auger Eric
  Cc: Alex Williamson, eric.auger.pro, robin.murphy, will.deacon, joro,
	tglx, jason, marc.zyngier, christoffer.dall, linux-arm-kernel,
	patches, linux-kernel, Bharat Bhushan, Pavel Fedin, iommu,
	Jean-Philippe.Brucker, julien.grall, yehuday,
	Pranavkumar Sawargaonkar

On Mon, Jun 20, 2016 at 9:12 PM, Pranav Sawargaonkar
<pranav.sawargaonkar@gmail.com> wrote:
> Hi Eric,
>
> Tested this series on APM X-Gene2 with E1000 and sata sil card.
> Tested-By: Pranavkumar Sawargaonkar <psawargaonkar@apm.com>
>
> Thanks,
> Pranav
>
>
> On Thu, Jun 9, 2016 at 1:25 PM, Auger Eric <eric.auger@redhat.com> wrote:
>> [full quote of the earlier thread trimmed]

Oops, sorry for top-posting in the earlier reply.

Tested this series on APM X-Gene2 with an E1000 NIC and a sata_sil SATA card.
Tested-by: Pranavkumar Sawargaonkar <psawargaonkar@apm.com>

Thanks,
Pranav

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2016-06-20 15:47 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-04 11:54 [PATCH v9 0/7] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 3/3: vfio changes Eric Auger
2016-05-04 11:54 ` [PATCH v9 1/7] vfio: introduce a vfio_dma type field Eric Auger
2016-05-04 11:54 ` [PATCH v9 2/7] vfio/type1: vfio_find_dma accepting a type argument Eric Auger
2016-05-09 22:49   ` Alex Williamson
2016-05-10 14:54     ` Eric Auger
2016-05-04 11:54 ` [PATCH v9 3/7] vfio/type1: bypass unmap/unpin and replay for VFIO_IOVA_RESERVED slots Eric Auger
2016-05-09 22:49   ` Alex Williamson
2016-05-11 12:58     ` Eric Auger
2016-05-04 11:54 ` [PATCH v9 4/7] vfio: allow reserved msi iova registration Eric Auger
2016-05-05 19:22   ` Chalamarla, Tirumalesh
2016-05-09  7:55     ` Eric Auger
2016-05-10 15:29   ` Alex Williamson
2016-05-10 15:34     ` Eric Auger
2016-05-04 11:54 ` [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain Eric Auger
2016-05-05 19:23   ` Chalamarla, Tirumalesh
2016-05-09  8:05     ` Eric Auger
2016-05-09 22:49   ` Alex Williamson
2016-05-10 16:10     ` Eric Auger
2016-05-10 17:24       ` Robin Murphy
2016-05-11  8:38         ` Eric Auger
2016-05-11  9:31           ` Robin Murphy
2016-05-11  9:44             ` Eric Auger
2016-05-11 13:48               ` Robin Murphy
2016-05-11 14:37                 ` Eric Auger
2016-05-04 11:54 ` [PATCH v9 6/7] iommu/arm-smmu: do not advertise IOMMU_CAP_INTR_REMAP Eric Auger
2016-05-04 11:54 ` [PATCH v9 7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains Eric Auger
2016-05-04 12:06   ` Eric Auger
2016-05-09 23:03     ` Alex Williamson
2016-05-10 16:50       ` Eric Auger
2016-05-09 22:49   ` Alex Williamson
2016-05-10 16:36     ` Eric Auger
2016-05-20 16:01 ` [PATCH v9 0/7] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 3/3: vfio changes Eric Auger
2016-06-08  8:29   ` Auger Eric
2016-06-08 21:06     ` Alex Williamson
2016-06-09  7:55       ` Auger Eric
2016-06-09 19:44         ` Alex Williamson
2016-06-20 15:42         ` Pranav Sawargaonkar
2016-06-20 15:46           ` Pranav Sawargaonkar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).