* [RFC v3 00/15] KVM PCIe/MSI passthrough on ARM/ARM64
@ 2016-02-12  8:13 Eric Auger
  2016-02-12  8:13 ` [RFC v3 01/15] iommu: Add DOMAIN_ATTR_MSI_MAPPING attribute Eric Auger
                   ` (14 more replies)
  0 siblings, 15 replies; 29+ messages in thread
From: Eric Auger @ 2016-02-12  8:13 UTC (permalink / raw)
  To: eric.auger, eric.auger, alex.williamson, will.deacon, joro, tglx,
	jason, marc.zyngier, christoffer.dall, linux-arm-kernel, kvmarm,
	kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	sherry.hurwitz, brijesh.singh, leo.duran, Thomas.Lendacky

This series addresses KVM PCIe passthrough with MSI enabled on ARM/ARM64.
It pursues the efforts done in [1], [2], [3]. It also aims at covering the
same need on PowerPC platforms, although a similar integration effort still
needs to be carried out there.

On x86 all accesses to the 1MB PA region [FEE0_0000h - FEF0_0000h] are handled
as interrupt messages: accesses to this special PA window directly target the
APIC configuration space and not DRAM, meaning the downstream IOMMU is bypassed.

This is not the case on the above-mentioned platforms, where MSI messages
emitted by devices are conveyed through the IOMMU. This means an IOVA/host PA
mapping must exist for the MSI to reach the MSI controller. The normal way to
create IOVA bindings is the VFIO DMA MAP API. However, in this case the MSI
IOVA must be mapped not onto guest RAM but onto a host physical page (the MSI
controller frame).

In a nutshell, this series does:
- introduce an IOMMU API to register an IOVA window usable for reserved mappings
- reuse the VFIO DMA MAP ioctl with a new flag to plug onto that new API
  (see the userspace sketch in the User Hints section below)
- check whether the MSI-parent controllers of all the devices in a given group
  support IRQ remapping (otherwise the unsafe interrupt modality must be
  explicitly allowed)
- introduce a new IOMMU API to allocate reserved IOVAs and bind them onto
  a physical address
- allow the GICv2M and GICv3-ITS PCI irqchips to map/unmap the MSI frame
  on irq_write_msi_msg

Best Regards

Eric

Testing:
- functional on ARM64 AMD Overdrive HW (single GICv2m frame) with an e1000e
  PCIe card.
- tested that there is no regression on
  x non-assigned PCIe drivers
  x platform device passthrough
- Not tested: ARM with SR-IOV, ARM GICv3 ITS, ...

References:
[1] [RFC 0/2] VFIO: Add virtual MSI doorbell support
    (https://lkml.org/lkml/2015/7/24/135)
[2] [RFC PATCH 0/6] vfio: Add interface to map MSI pages
    (https://lists.cs.columbia.edu/pipermail/kvmarm/2015-September/016607.html)
[3] [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO
    (http://permalink.gmane.org/gmane.comp.emulators.kvm.arm.devel/3858)

Git:
https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.5-rc3-pcie-passthrough-rfcv3

Previous versions:
v2: https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.5-rc3-pcie-passthrough-rfcv2
v1: https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.5-rc1-pcie-passthrough-v1

QEMU Integration:
[RFC v2 0/8] KVM PCI/MSI passthrough with mach-virt
(http://lists.gnu.org/archive/html/qemu-arm/2016-01/msg00444.html)
https://git.linaro.org/people/eric.auger/qemu.git/shortlog/refs/heads/v2.5.0-pci-passthrough-rfc-v2

User Hints:
To allow PCI/MSI passthrough with GICv2M, compile VFIO as a module and
load the vfio_iommu_type1 module with the allow_unsafe_interrupts param:
sudo modprobe -v vfio-pci
sudo modprobe -r vfio_iommu_type1
sudo modprobe -v vfio_iommu_type1 allow_unsafe_interrupts=1
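
Once the container is set up (group attach and VFIO_SET_IOMMU are omitted
here), userspace can check whether MSI mapping is required and register the
reserved IOVA window. Below is a minimal, untested sketch; the IOVA base and
size are purely hypothetical and must match a range unused by the guest:

  #include <sys/ioctl.h>
  #include <linux/vfio.h>

  static int setup_msi_reserved_iova(int container)
  {
          struct vfio_iommu_type1_info info = { .argsz = sizeof(info) };
          struct vfio_iommu_type1_dma_map map = {
                  .argsz = sizeof(map),
                  .flags = VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA,
                  .iova  = 0x8000000,  /* hypothetical unused IOVA window */
                  .size  = 0x10000,    /* hypothetical 64kB window */
          };

          if (ioctl(container, VFIO_IOMMU_GET_INFO, &info))
                  return -1;

          /* only needed when MSIs are translated by the IOMMU */
          if (!(info.flags & VFIO_IOMMU_INFO_REQUIRE_MSI_MAP))
                  return 0;

          /* register the IOVA window the kernel will use for MSI frames */
          return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
  }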

History:

RFC v2 -> RFC v3:
- should fix wrong handling of some CONFIG combinations:
  CONFIG_IOVA, CONFIG_IOMMU_API, CONFIG_PCI_MSI_IRQ_DOMAIN
- fix MSI_FLAG_IRQ_REMAPPING setting in GICv3 ITS (although not tested)

PATCH v1 -> RFC v2:
- reverted to RFC since it looks more reasonable ;-) the code is split
  between VFIO, IOMMU, MSI controller and I am not sure I made the right
  choices. Also the API needs to be further discussed.
- iova API usage in arm-smmu.c.
- the MSI controller natively programs the MSI address with either the PA or
  the IOVA. This is not done in the vfio-pci driver anymore, as suggested by Alex.
- check irq remapping capability of the group

RFC v1 [2] -> PATCH v1:
- use the existing dma map/unmap ioctl interface with a flag to register a
  reserved IOVA range. Use the legacy RB tree to store this special vfio_dma.
- a single contiguous reserved IOVA region is now allowed
- use of an RB tree indexed by PA to store allocated reserved slots
- use of a vfio_domain iova_domain to manage iova allocation within the
  window provided by the userspace
- vfio alloc_map/unmap_free take a vfio_group handle
- vfio_group handle is cached in vfio_pci_device
- add ref counting to bindings
- user modality enabled at the end of the series


Eric Auger (15):
  iommu: Add DOMAIN_ATTR_MSI_MAPPING attribute
  vfio: expose MSI mapping requirement through VFIO_IOMMU_GET_INFO
  vfio: introduce VFIO_IOVA_RESERVED vfio_dma type
  iommu: add alloc/free_reserved_iova_domain
  iommu/arm-smmu: implement alloc/free_reserved_iova_domain
  iommu/arm-smmu: add a reserved binding RB tree
  iommu: iommu_get/put_single_reserved
  iommu/arm-smmu: implement iommu_get/put_single_reserved
  iommu/arm-smmu: relinquish reserved resources on domain deletion
  vfio: allow the user to register reserved iova range for MSI mapping
  msi: Add a new MSI_FLAG_IRQ_REMAPPING flag
  msi: export msi_get_domain_info
  vfio/type1: also check IRQ remapping capability at msi domain
  iommu/arm-smmu: do not advertise IOMMU_CAP_INTR_REMAP
  irqchip/gicv2m/v3-its-pci-msi: IOMMU map the MSI frame when needed

 drivers/iommu/Kconfig                    |   2 +
 drivers/iommu/arm-smmu.c                 | 292 +++++++++++++++++++++++++++++--
 drivers/iommu/fsl_pamu_domain.c          |   2 +
 drivers/iommu/iommu.c                    |  43 +++++
 drivers/irqchip/irq-gic-common.c         |  69 ++++++++
 drivers/irqchip/irq-gic-common.h         |   5 +
 drivers/irqchip/irq-gic-v2m.c            |   7 +-
 drivers/irqchip/irq-gic-v3-its-pci-msi.c |   8 +-
 drivers/vfio/vfio_iommu_type1.c          | 157 ++++++++++++++++-
 include/linux/iommu.h                    |  31 ++++
 include/linux/msi.h                      |   2 +
 include/uapi/linux/vfio.h                |  10 ++
 kernel/irq/msi.c                         |   1 +
 13 files changed, 607 insertions(+), 22 deletions(-)

-- 
1.9.1

* [RFC v3 01/15] iommu: Add DOMAIN_ATTR_MSI_MAPPING attribute
  2016-02-12  8:13 [RFC v3 00/15] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
@ 2016-02-12  8:13 ` Eric Auger
  2016-02-12  8:13 ` [RFC v3 02/15] vfio: expose MSI mapping requirement through VFIO_IOMMU_GET_INFO Eric Auger
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2016-02-12  8:13 UTC (permalink / raw)
  To: eric.auger, eric.auger, alex.williamson, will.deacon, joro, tglx,
	jason, marc.zyngier, christoffer.dall, linux-arm-kernel, kvmarm,
	kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	sherry.hurwitz, brijesh.singh, leo.duran, Thomas.Lendacky

Introduce a new DOMAIN_ATTR_MSI_MAPPING domain attribute. If supported,
it means the MSI addresses need to be mapped in the IOMMU. ARM SMMUs
and FSL PAMU, at least, expose this attribute.

x86 IOMMUs typically don't expose the attribute since on x86 MSI write
transaction addresses are always within the 1MB PA region [FEE0_0000h -
FEF0_0000h] window, which directly targets the APIC configuration space and
hence bypasses the IOMMU.
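
For illustration, a consumer (e.g. VFIO, see patch 02) can probe the attribute
with a minimal sketch like the following; domain_get_attr returns 0 when the
attribute is supported and <0 otherwise (see the changelog note below):

	/* sketch: true if MSI write addresses must be mapped in this domain */
	static bool domain_requires_msi_mapping(struct iommu_domain *domain)
	{
		return iommu_domain_get_attr(domain,
					     DOMAIN_ATTR_MSI_MAPPING, NULL) == 0;
	}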

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

RFC v1 -> v1:
- the data field is not used
- for this attribute, domain_get_attr simply returns 0 if the MSI_MAPPING
  capability is supported, or <0 if not.
- removed struct iommu_domain_msi_maps
---
 drivers/iommu/arm-smmu.c        | 2 ++
 drivers/iommu/fsl_pamu_domain.c | 2 ++
 include/linux/iommu.h           | 1 +
 3 files changed, 5 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 59ee4b8..c8b7e71 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1409,6 +1409,8 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 	case DOMAIN_ATTR_NESTING:
 		*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
 		return 0;
+	case DOMAIN_ATTR_MSI_MAPPING:
+		return 0;
 	default:
 		return -ENODEV;
 	}
diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index da0e1e3..46d5c6a 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -856,6 +856,8 @@ static int fsl_pamu_get_domain_attr(struct iommu_domain *domain,
 	case DOMAIN_ATTR_FSL_PAMUV1:
 		*(int *)data = DOMAIN_ATTR_FSL_PAMUV1;
 		break;
+	case DOMAIN_ATTR_MSI_MAPPING:
+		break;
 	default:
 		pr_debug("Unsupported attribute type\n");
 		ret = -EINVAL;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a5c539f..a4fe04a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -112,6 +112,7 @@ enum iommu_attr {
 	DOMAIN_ATTR_FSL_PAMU_ENABLE,
 	DOMAIN_ATTR_FSL_PAMUV1,
 	DOMAIN_ATTR_NESTING,	/* two stages of translation */
+	DOMAIN_ATTR_MSI_MAPPING, /* Require MSIs mapping in iommu */
 	DOMAIN_ATTR_MAX,
 };
 
-- 
1.9.1

* [RFC v3 02/15] vfio: expose MSI mapping requirement through VFIO_IOMMU_GET_INFO
  2016-02-12  8:13 [RFC v3 00/15] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
  2016-02-12  8:13 ` [RFC v3 01/15] iommu: Add DOMAIN_ATTR_MSI_MAPPING attribute Eric Auger
@ 2016-02-12  8:13 ` Eric Auger
  2016-02-18  9:34   ` Marc Zyngier
  2016-02-12  8:13 ` [RFC v3 03/15] vfio: introduce VFIO_IOVA_RESERVED vfio_dma type Eric Auger
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 29+ messages in thread
From: Eric Auger @ 2016-02-12  8:13 UTC (permalink / raw)
  To: eric.auger, eric.auger, alex.williamson, will.deacon, joro, tglx,
	jason, marc.zyngier, christoffer.dall, linux-arm-kernel, kvmarm,
	kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	sherry.hurwitz, brijesh.singh, leo.duran, Thomas.Lendacky

This patch allows user space to retrieve whether MSI write transaction
addresses must be mapped. This is reported through the VFIO_IOMMU_GET_INFO
API and its new flag: VFIO_IOMMU_INFO_REQUIRE_MSI_MAP.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

RFC v1 -> v1:
- derived from
  [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
- renamed allow_msi_reconfig into require_msi_mapping
- fixed VFIO_IOMMU_GET_INFO
---
 drivers/vfio/vfio_iommu_type1.c | 26 ++++++++++++++++++++++++++
 include/uapi/linux/vfio.h       |  1 +
 2 files changed, 27 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 6f1ea3d..c5b57e1 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -255,6 +255,29 @@ static int vaddr_get_pfn(unsigned long vaddr, int prot, unsigned long *pfn)
 }
 
 /*
+ * vfio_domains_require_msi_mapping: indicates whether MSI write transaction
+ * addresses must be mapped
+ *
+ * returns true if it does
+ */
+static bool vfio_domains_require_msi_mapping(struct vfio_iommu *iommu)
+{
+	struct vfio_domain *d;
+	bool ret;
+
+	mutex_lock(&iommu->lock);
+	/* All domains have same require_msi_map property, pick first */
+	d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
+	if (iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_MAPPING, NULL) < 0)
+		ret = false;
+	else
+		ret = true;
+	mutex_unlock(&iommu->lock);
+
+	return ret;
+}
+
+/*
  * Attempt to pin pages.  We really don't want to track all the pfns and
  * the iommu can only map chunks of consecutive pfns anyway, so get the
  * first page and all consecutive pages with the same locking.
@@ -997,6 +1020,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 
 		info.flags = VFIO_IOMMU_INFO_PGSIZES;
 
+		if (vfio_domains_require_msi_mapping(iommu))
+			info.flags |= VFIO_IOMMU_INFO_REQUIRE_MSI_MAP;
+
 		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
 
 		return copy_to_user((void __user *)arg, &info, minsz);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 7d7a4c6..43e183b 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -400,6 +400,7 @@ struct vfio_iommu_type1_info {
 	__u32	argsz;
 	__u32	flags;
 #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
+#define VFIO_IOMMU_INFO_REQUIRE_MSI_MAP (1 << 1)/* MSI must be mapped */
 	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
 };
 
-- 
1.9.1

* [RFC v3 03/15] vfio: introduce VFIO_IOVA_RESERVED vfio_dma type
  2016-02-12  8:13 [RFC v3 00/15] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
  2016-02-12  8:13 ` [RFC v3 01/15] iommu: Add DOMAIN_ATTR_MSI_MAPPING attribute Eric Auger
  2016-02-12  8:13 ` [RFC v3 02/15] vfio: expose MSI mapping requirement through VFIO_IOMMU_GET_INFO Eric Auger
@ 2016-02-12  8:13 ` Eric Auger
  2016-02-12  8:13 ` [RFC v3 04/15] iommu: add alloc/free_reserved_iova_domain Eric Auger
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2016-02-12  8:13 UTC (permalink / raw)
  To: eric.auger, eric.auger, alex.williamson, will.deacon, joro, tglx,
	jason, marc.zyngier, christoffer.dall, linux-arm-kernel, kvmarm,
	kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	sherry.hurwitz, brijesh.singh, leo.duran, Thomas.Lendacky

We introduce a vfio_dma type since we will need to discriminate legacy
vfio_dma's from the new reserved ones. Since the latter are not mapped at
registration time, some code paths need to be reworked: removal and replay.
For now the reserved entries are simply skipped there; subsequent patches
will rework those paths.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 drivers/vfio/vfio_iommu_type1.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index c5b57e1..b9326c9 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -53,6 +53,15 @@ module_param_named(disable_hugepages,
 MODULE_PARM_DESC(disable_hugepages,
 		 "Disable VFIO IOMMU support for IOMMU hugepages.");
 
+enum vfio_iova_type {
+	VFIO_IOVA_USER = 0, /* standard IOVA used to map user vaddr */
+	/*
+	 * IOVA reserved to map special host physical addresses,
+	 * MSI frames for instance
+	 */
+	VFIO_IOVA_RESERVED,
+};
+
 struct vfio_iommu {
 	struct list_head	domain_list;
 	struct mutex		lock;
@@ -75,6 +84,7 @@ struct vfio_dma {
 	unsigned long		vaddr;		/* Process virtual addr */
 	size_t			size;		/* Map size (bytes) */
 	int			prot;		/* IOMMU_READ/WRITE */
+	enum vfio_iova_type	type;		/* type of IOVA */
 };
 
 struct vfio_group {
@@ -418,7 +428,8 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
 
 static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
 {
-	vfio_unmap_unpin(iommu, dma);
+	if (likely(dma->type != VFIO_IOVA_RESERVED))
+		vfio_unmap_unpin(iommu, dma);
 	vfio_unlink_dma(iommu, dma);
 	kfree(dma);
 }
@@ -694,6 +705,10 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 		dma_addr_t iova;
 
 		dma = rb_entry(n, struct vfio_dma, node);
+
+		if (unlikely(dma->type == VFIO_IOVA_RESERVED))
+			continue;
+
 		iova = dma->iova;
 
 		while (iova < dma->iova + dma->size) {
-- 
1.9.1

* [RFC v3 04/15] iommu: add alloc/free_reserved_iova_domain
  2016-02-12  8:13 [RFC v3 00/15] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (2 preceding siblings ...)
  2016-02-12  8:13 ` [RFC v3 03/15] vfio: introduce VFIO_IOVA_RESERVED vfio_dma type Eric Auger
@ 2016-02-12  8:13 ` Eric Auger
  2016-02-12  8:13 ` [RFC v3 05/15] iommu/arm-smmu: implement alloc/free_reserved_iova_domain Eric Auger
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2016-02-12  8:13 UTC (permalink / raw)
  To: eric.auger, eric.auger, alex.williamson, will.deacon, joro, tglx,
	jason, marc.zyngier, christoffer.dall, linux-arm-kernel, kvmarm,
	kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	sherry.hurwitz, brijesh.singh, leo.duran, Thomas.Lendacky

Introduce alloc/free_reserved_iova_domain in the IOMMU API.
alloc_reserved_iova_domain initializes an iova domain at a given
iova base address and with a given size. This iova domain will
be used to allocate IOVAs within that window. Those IOVAs will be reserved
for special purposes, typically MSI frame binding. Allocation functions
within the reserved iova domain will be introduced in subsequent patches.

It is the responsibility of the API user to make sure any IOVA
belonging to that domain is allocated with those allocation functions.
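
A minimal sketch of the expected call sequence from an API user (the VFIO
integration comes later in this series); the 64kB window below is hypothetical:

	/* reserve a hypothetical 64kB IOVA window for MSI bindings */
	static int example_setup_reserved_window(struct iommu_domain *domain)
	{
		unsigned long order = __ffs(domain->ops->pgsize_bitmap);

		return iommu_alloc_reserved_iova_domain(domain, 0x8000000,
							0x10000, order);
	}

iommu_free_reserved_iova_domain() is the counterpart to be called on teardown.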

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---
v2 -> v3:
- remove iommu_alloc_reserved_iova_domain & iommu_free_reserved_iova_domain
  static implementation in case CONFIG_IOMMU_API is not set

v1 -> v2:
- moved from vfio API to IOMMU API
---
 drivers/iommu/iommu.c | 22 ++++++++++++++++++++++
 include/linux/iommu.h | 10 ++++++++++
 2 files changed, 32 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 0e3b009..a994f34 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1557,6 +1557,28 @@ int iommu_domain_set_attr(struct iommu_domain *domain,
 }
 EXPORT_SYMBOL_GPL(iommu_domain_set_attr);
 
+int iommu_alloc_reserved_iova_domain(struct iommu_domain *domain,
+				     dma_addr_t iova, size_t size,
+				     unsigned long order)
+{
+	int ret;
+
+	if (!domain->ops->alloc_reserved_iova_domain)
+		return -EINVAL;
+	ret = domain->ops->alloc_reserved_iova_domain(domain, iova,
+						      size, order);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_alloc_reserved_iova_domain);
+
+void iommu_free_reserved_iova_domain(struct iommu_domain *domain)
+{
+	if (!domain->ops->free_reserved_iova_domain)
+		return;
+	domain->ops->free_reserved_iova_domain(domain);
+}
+EXPORT_SYMBOL_GPL(iommu_free_reserved_iova_domain);
+
 void iommu_get_dm_regions(struct device *dev, struct list_head *list)
 {
 	const struct iommu_ops *ops = dev->bus->iommu_ops;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a4fe04a..2d1f155 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -195,6 +195,12 @@ struct iommu_ops {
 	int (*domain_set_windows)(struct iommu_domain *domain, u32 w_count);
 	/* Get the number of windows per domain */
 	u32 (*domain_get_windows)(struct iommu_domain *domain);
+	/* allocates the reserved iova domain */
+	int (*alloc_reserved_iova_domain)(struct iommu_domain *domain,
+					  dma_addr_t iova, size_t size,
+					  unsigned long order);
+	/* frees the reserved iova domain */
+	void (*free_reserved_iova_domain)(struct iommu_domain *domain);
 
 #ifdef CONFIG_OF_IOMMU
 	int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
@@ -266,6 +272,10 @@ extern int iommu_domain_get_attr(struct iommu_domain *domain, enum iommu_attr,
 				 void *data);
 extern int iommu_domain_set_attr(struct iommu_domain *domain, enum iommu_attr,
 				 void *data);
+extern int iommu_alloc_reserved_iova_domain(struct iommu_domain *domain,
+					    dma_addr_t iova, size_t size,
+					    unsigned long order);
+extern void iommu_free_reserved_iova_domain(struct iommu_domain *domain);
 struct device *iommu_device_create(struct device *parent, void *drvdata,
 				   const struct attribute_group **groups,
 				   const char *fmt, ...) __printf(4, 5);
-- 
1.9.1

* [RFC v3 05/15] iommu/arm-smmu: implement alloc/free_reserved_iova_domain
  2016-02-12  8:13 [RFC v3 00/15] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (3 preceding siblings ...)
  2016-02-12  8:13 ` [RFC v3 04/15] iommu: add alloc/free_reserved_iova_domain Eric Auger
@ 2016-02-12  8:13 ` Eric Auger
  2016-02-18 11:09   ` Robin Murphy
  2016-02-12  8:13 ` [RFC v3 06/15] iommu/arm-smmu: add a reserved binding RB tree Eric Auger
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 29+ messages in thread
From: Eric Auger @ 2016-02-12  8:13 UTC (permalink / raw)
  To: eric.auger, eric.auger, alex.williamson, will.deacon, joro, tglx,
	jason, marc.zyngier, christoffer.dall, linux-arm-kernel, kvmarm,
	kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	sherry.hurwitz, brijesh.singh, leo.duran, Thomas.Lendacky

Implement alloc/free_reserved_iova_domain for arm-smmu. We use
the iova allocator (iova.c). The iova_domain is attached to the
arm_smmu_domain struct. A mutex is introduced to protect it.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---
v2 -> v3:
- select IOMMU_IOVA when ARM_SMMU or ARM_SMMU_V3 is set

v1 -> v2:
- formerly implemented in vfio_iommu_type1
---
 drivers/iommu/Kconfig    |  2 ++
 drivers/iommu/arm-smmu.c | 87 +++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 74 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index a1e75cb..1106528 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -289,6 +289,7 @@ config ARM_SMMU
 	bool "ARM Ltd. System MMU (SMMU) Support"
 	depends on (ARM64 || ARM) && MMU
 	select IOMMU_API
+	select IOMMU_IOVA
 	select IOMMU_IO_PGTABLE_LPAE
 	select ARM_DMA_USE_IOMMU if ARM
 	help
@@ -302,6 +303,7 @@ config ARM_SMMU_V3
 	bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
 	depends on ARM64 && PCI
 	select IOMMU_API
+	select IOMMU_IOVA
 	select IOMMU_IO_PGTABLE_LPAE
 	select GENERIC_MSI_IRQ_DOMAIN
 	help
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c8b7e71..f42341d 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -42,6 +42,7 @@
 #include <linux/platform_device.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
+#include <linux/iova.h>
 
 #include <linux/amba/bus.h>
 
@@ -347,6 +348,9 @@ struct arm_smmu_domain {
 	enum arm_smmu_domain_stage	stage;
 	struct mutex			init_mutex; /* Protects smmu pointer */
 	struct iommu_domain		domain;
+	struct iova_domain		*reserved_iova_domain;
+	/* protects reserved domain manipulation */
+	struct mutex			reserved_mutex;
 };
 
 static struct iommu_ops arm_smmu_ops;
@@ -975,6 +979,7 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 		return NULL;
 
 	mutex_init(&smmu_domain->init_mutex);
+	mutex_init(&smmu_domain->reserved_mutex);
 	spin_lock_init(&smmu_domain->pgtbl_lock);
 
 	return &smmu_domain->domain;
@@ -1446,22 +1451,74 @@ out_unlock:
 	return ret;
 }
 
+static int arm_smmu_alloc_reserved_iova_domain(struct iommu_domain *domain,
+					       dma_addr_t iova, size_t size,
+					       unsigned long order)
+{
+	unsigned long granule, mask;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	int ret = 0;
+
+	granule = 1UL << order;
+	mask = granule - 1;
+	if (iova & mask || (!size) || (size & mask))
+		return -EINVAL;
+
+	if (smmu_domain->reserved_iova_domain)
+		return -EEXIST;
+
+	mutex_lock(&smmu_domain->reserved_mutex);
+
+	smmu_domain->reserved_iova_domain =
+		kzalloc(sizeof(struct iova_domain), GFP_KERNEL);
+	if (!smmu_domain->reserved_iova_domain) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	init_iova_domain(smmu_domain->reserved_iova_domain,
+			 granule, iova >> order, (iova + size - 1) >> order);
+
+unlock:
+	mutex_unlock(&smmu_domain->reserved_mutex);
+	return ret;
+}
+
+static void arm_smmu_free_reserved_iova_domain(struct iommu_domain *domain)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct iova_domain *iovad = smmu_domain->reserved_iova_domain;
+
+	if (!iovad)
+		return;
+
+	mutex_lock(&smmu_domain->reserved_mutex);
+
+	put_iova_domain(iovad);
+	kfree(iovad);
+
+	mutex_unlock(&smmu_domain->reserved_mutex);
+}
+
 static struct iommu_ops arm_smmu_ops = {
-	.capable		= arm_smmu_capable,
-	.domain_alloc		= arm_smmu_domain_alloc,
-	.domain_free		= arm_smmu_domain_free,
-	.attach_dev		= arm_smmu_attach_dev,
-	.detach_dev		= arm_smmu_detach_dev,
-	.map			= arm_smmu_map,
-	.unmap			= arm_smmu_unmap,
-	.map_sg			= default_iommu_map_sg,
-	.iova_to_phys		= arm_smmu_iova_to_phys,
-	.add_device		= arm_smmu_add_device,
-	.remove_device		= arm_smmu_remove_device,
-	.device_group		= arm_smmu_device_group,
-	.domain_get_attr	= arm_smmu_domain_get_attr,
-	.domain_set_attr	= arm_smmu_domain_set_attr,
-	.pgsize_bitmap		= -1UL, /* Restricted during device attach */
+	.capable			= arm_smmu_capable,
+	.domain_alloc			= arm_smmu_domain_alloc,
+	.domain_free			= arm_smmu_domain_free,
+	.attach_dev			= arm_smmu_attach_dev,
+	.detach_dev			= arm_smmu_detach_dev,
+	.map				= arm_smmu_map,
+	.unmap				= arm_smmu_unmap,
+	.map_sg				= default_iommu_map_sg,
+	.iova_to_phys			= arm_smmu_iova_to_phys,
+	.add_device			= arm_smmu_add_device,
+	.remove_device			= arm_smmu_remove_device,
+	.device_group			= arm_smmu_device_group,
+	.domain_get_attr		= arm_smmu_domain_get_attr,
+	.domain_set_attr		= arm_smmu_domain_set_attr,
+	.alloc_reserved_iova_domain	= arm_smmu_alloc_reserved_iova_domain,
+	.free_reserved_iova_domain	= arm_smmu_free_reserved_iova_domain,
+	/* Page size bitmap, restricted during device attach */
+	.pgsize_bitmap			= -1UL,
 };
 
 static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
-- 
1.9.1

* [RFC v3 06/15] iommu/arm-smmu: add a reserved binding RB tree
  2016-02-12  8:13 [RFC v3 00/15] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (4 preceding siblings ...)
  2016-02-12  8:13 ` [RFC v3 05/15] iommu/arm-smmu: implement alloc/free_reserved_iova_domain Eric Auger
@ 2016-02-12  8:13 ` Eric Auger
  2016-02-12  8:13 ` [RFC v3 07/15] iommu: iommu_get/put_single_reserved Eric Auger
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2016-02-12  8:13 UTC (permalink / raw)
  To: eric.auger, eric.auger, alex.williamson, will.deacon, joro, tglx,
	jason, marc.zyngier, christoffer.dall, linux-arm-kernel, kvmarm,
	kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	sherry.hurwitz, brijesh.singh, leo.duran, Thomas.Lendacky

We will need to track which host physical addresses are mapped to
reserved IOVAs. For that purpose we introduce a new RB tree indexed
by physical address. This RB tree is only used for reserved IOVA
bindings.

It is expected that this RB tree will contain very few bindings. Those
generally correspond to single pages, each mapping one MSI frame (GICv2m
frame or ITS GITS_TRANSLATER frame).

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---
---
 drivers/iommu/arm-smmu.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 64 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index f42341d..729a4c6 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -349,10 +349,21 @@ struct arm_smmu_domain {
 	struct mutex			init_mutex; /* Protects smmu pointer */
 	struct iommu_domain		domain;
 	struct iova_domain		*reserved_iova_domain;
-	/* protects reserved domain manipulation */
+	/* rb tree indexed by PA, for reserved bindings only */
+	struct rb_root			reserved_binding_list;
+	/* protects reserved domain and rbtree manipulation */
 	struct mutex			reserved_mutex;
 };
 
+struct arm_smmu_reserved_binding {
+	struct kref		kref;
+	struct rb_node		node;
+	struct arm_smmu_domain	*domain;
+	phys_addr_t		addr;
+	dma_addr_t		iova;
+	size_t			size;
+};
+
 static struct iommu_ops arm_smmu_ops;
 
 static DEFINE_SPINLOCK(arm_smmu_devices_lock);
@@ -400,6 +411,57 @@ static struct device_node *dev_get_dev_node(struct device *dev)
 	return dev->of_node;
 }
 
+/* Reserved binding RB-tree manipulation */
+
+static struct arm_smmu_reserved_binding *find_reserved_binding(
+				    struct arm_smmu_domain *d,
+				    phys_addr_t start, size_t size)
+{
+	struct rb_node *node = d->reserved_binding_list.rb_node;
+
+	while (node) {
+		struct arm_smmu_reserved_binding *binding =
+			rb_entry(node, struct arm_smmu_reserved_binding, node);
+
+		if (start + size <= binding->addr)
+			node = node->rb_left;
+		else if (start >= binding->addr + binding->size)
+			node = node->rb_right;
+		else
+			return binding;
+	}
+
+	return NULL;
+}
+
+static void link_reserved_binding(struct arm_smmu_domain *d,
+				  struct arm_smmu_reserved_binding *new)
+{
+	struct rb_node **link = &d->reserved_binding_list.rb_node;
+	struct rb_node *parent = NULL;
+	struct arm_smmu_reserved_binding *binding;
+
+	while (*link) {
+		parent = *link;
+		binding = rb_entry(parent, struct arm_smmu_reserved_binding,
+				   node);
+
+		if (new->addr + new->size <= binding->addr)
+			link = &(*link)->rb_left;
+		else
+			link = &(*link)->rb_right;
+	}
+
+	rb_link_node(&new->node, parent, link);
+	rb_insert_color(&new->node, &d->reserved_binding_list);
+}
+
+static void unlink_reserved_binding(struct arm_smmu_domain *d,
+				    struct arm_smmu_reserved_binding *old)
+{
+	rb_erase(&old->node, &d->reserved_binding_list);
+}
+
 static struct arm_smmu_master *find_smmu_master(struct arm_smmu_device *smmu,
 						struct device_node *dev_node)
 {
@@ -981,6 +1043,7 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 	mutex_init(&smmu_domain->init_mutex);
 	mutex_init(&smmu_domain->reserved_mutex);
 	spin_lock_init(&smmu_domain->pgtbl_lock);
+	smmu_domain->reserved_binding_list = RB_ROOT;
 
 	return &smmu_domain->domain;
 }
-- 
1.9.1

* [RFC v3 07/15] iommu: iommu_get/put_single_reserved
  2016-02-12  8:13 [RFC v3 00/15] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (5 preceding siblings ...)
  2016-02-12  8:13 ` [RFC v3 06/15] iommu/arm-smmu: add a reserved binding RB tree Eric Auger
@ 2016-02-12  8:13 ` Eric Auger
  2016-02-18 11:06   ` Marc Zyngier
  2016-02-12  8:13 ` [RFC v3 08/15] iommu/arm-smmu: implement iommu_get/put_single_reserved Eric Auger
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 29+ messages in thread
From: Eric Auger @ 2016-02-12  8:13 UTC (permalink / raw)
  To: eric.auger, eric.auger, alex.williamson, will.deacon, joro, tglx,
	jason, marc.zyngier, christoffer.dall, linux-arm-kernel, kvmarm,
	kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	sherry.hurwitz, brijesh.singh, leo.duran, Thomas.Lendacky

This patch introduces iommu_get/put_single_reserved.

iommu_get_single_reserved allows the caller to allocate a new reserved iova
page and map it onto the physical page that contains a given physical address.
It returns the iova that is mapped onto the provided physical address.
Hence the physical address passed as an argument does not need to be aligned.

In case a mapping already exists between both pages, the IOVA mapped
to the PA is directly returned.

Each time an iova is successfully returned a binding ref count is
incremented.

iommu_put_single_reserved decrements the ref count and, when it reaches
zero, the mapping is destroyed and the iova is released.
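
A minimal sketch of the intended usage (the real consumer is the MSI frame
mapping done at irq_write_msi_msg time, later in this series); doorbell_pa
stands for the physical address of an MSI doorbell:

	static int example_map_doorbell(struct iommu_domain *domain,
					phys_addr_t doorbell_pa,
					dma_addr_t *msi_iova)
	{
		/* maps the page containing doorbell_pa, or reuses an
		 * existing binding, and takes a reference on it */
		return iommu_get_single_reserved(domain, doorbell_pa,
						 IOMMU_WRITE, msi_iova);
	}

iommu_put_single_reserved(domain, *msi_iova) drops one reference; the last put
unmaps the page and releases the IOVA.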

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Ankit Jindal <ajindal@apm.com>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>

---

v2 -> v3:
- remove static implementation of iommu_get_single_reserved &
  iommu_put_single_reserved when CONFIG_IOMMU_API is not set

v1 -> v2:
- previously a VFIO API, named vfio_alloc_map/unmap_free_reserved_iova
---
 drivers/iommu/iommu.c | 21 +++++++++++++++++++++
 include/linux/iommu.h | 20 ++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index a994f34..14ebde1 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1415,6 +1415,27 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
 	return unmapped;
 }
 EXPORT_SYMBOL_GPL(iommu_unmap);
+int iommu_get_single_reserved(struct iommu_domain *domain,
+			      phys_addr_t addr, int prot,
+			      dma_addr_t *iova)
+{
+	if (!domain->ops->get_single_reserved)
+		return  -ENODEV;
+
+	return domain->ops->get_single_reserved(domain, addr, prot, iova);
+
+}
+EXPORT_SYMBOL_GPL(iommu_get_single_reserved);
+
+void iommu_put_single_reserved(struct iommu_domain *domain,
+			       dma_addr_t iova)
+{
+	if (!domain->ops->put_single_reserved)
+		return;
+
+	domain->ops->put_single_reserved(domain, iova);
+}
+EXPORT_SYMBOL_GPL(iommu_put_single_reserved);
 
 size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 			 struct scatterlist *sg, unsigned int nents, int prot)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2d1f155..1e00c1b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -201,6 +201,21 @@ struct iommu_ops {
 					  unsigned long order);
 	/* frees the reserved iova domain */
 	void (*free_reserved_iova_domain)(struct iommu_domain *domain);
+	/**
+	 * allocate a reserved iova page and bind it onto the page that
+	 * contains a physical address (@addr), returns the @iova bound to
+	 * @addr. In case the 2 pages already are bound simply return @iova
+	 * and increment a ref count.
+	 */
+	int (*get_single_reserved)(struct iommu_domain *domain,
+					 phys_addr_t addr, int prot,
+					 dma_addr_t *iova);
+	/**
+	 * decrement a ref count of the iova page. If null, unmap the iova page
+	 * and release the iova
+	 */
+	void (*put_single_reserved)(struct iommu_domain *domain,
+					   dma_addr_t iova);
 
 #ifdef CONFIG_OF_IOMMU
 	int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
@@ -276,6 +291,11 @@ extern int iommu_alloc_reserved_iova_domain(struct iommu_domain *domain,
 					    dma_addr_t iova, size_t size,
 					    unsigned long order);
 extern void iommu_free_reserved_iova_domain(struct iommu_domain *domain);
+extern int iommu_get_single_reserved(struct iommu_domain *domain,
+				     phys_addr_t paddr, int prot,
+				     dma_addr_t *iova);
+extern void iommu_put_single_reserved(struct iommu_domain *domain,
+				      dma_addr_t iova);
 struct device *iommu_device_create(struct device *parent, void *drvdata,
 				   const struct attribute_group **groups,
 				   const char *fmt, ...) __printf(4, 5);
-- 
1.9.1

* [RFC v3 08/15] iommu/arm-smmu: implement iommu_get/put_single_reserved
  2016-02-12  8:13 [RFC v3 00/15] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (6 preceding siblings ...)
  2016-02-12  8:13 ` [RFC v3 07/15] iommu: iommu_get/put_single_reserved Eric Auger
@ 2016-02-12  8:13 ` Eric Auger
  2016-02-12  8:13 ` [RFC v3 09/15] iommu/arm-smmu: relinquish reserved resources on domain deletion Eric Auger
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2016-02-12  8:13 UTC (permalink / raw)
  To: eric.auger, eric.auger, alex.williamson, will.deacon, joro, tglx,
	jason, marc.zyngier, christoffer.dall, linux-arm-kernel, kvmarm,
	kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	sherry.hurwitz, brijesh.singh, leo.duran, Thomas.Lendacky

Implement the iommu_get/put_single_reserved API in arm-smmu.

In order to track which physical addresses are already mapped, we use
the RB tree indexed by PA.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v1 -> v2:
- previously in vfio_iommu_type1.c
---
 drivers/iommu/arm-smmu.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 114 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 729a4c6..9961bfd 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1563,6 +1563,118 @@ static void arm_smmu_free_reserved_iova_domain(struct iommu_domain *domain)
 	mutex_unlock(&smmu_domain->reserved_mutex);
 }
 
+static int arm_smmu_get_single_reserved(struct iommu_domain *domain,
+					phys_addr_t addr, int prot,
+					dma_addr_t *iova)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	unsigned long order = __ffs(domain->ops->pgsize_bitmap);
+	size_t page_size = 1 << order;
+	phys_addr_t mask = page_size - 1;
+	phys_addr_t aligned_addr = addr & ~mask;
+	phys_addr_t offset  = addr - aligned_addr;
+	struct arm_smmu_reserved_binding *b;
+	struct iova *p_iova;
+	struct iova_domain *iovad = smmu_domain->reserved_iova_domain;
+	int ret;
+
+	if (!iovad)
+		return -EINVAL;
+
+	mutex_lock(&smmu_domain->reserved_mutex);
+
+	b = find_reserved_binding(smmu_domain, aligned_addr, page_size);
+	if (b) {
+		*iova = b->iova + offset;
+		kref_get(&b->kref);
+		ret = 0;
+		goto unlock;
+	}
+
+	/* there is no existing reserved iova for this pa */
+	p_iova = alloc_iova(iovad, 1, iovad->dma_32bit_pfn, true);
+	if (!p_iova) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
+	*iova = p_iova->pfn_lo << order;
+
+	b = kzalloc(sizeof(*b), GFP_KERNEL);
+	if (!b) {
+		ret = -ENOMEM;
+		goto free_iova_unlock;
+	}
+
+	ret = arm_smmu_map(domain, *iova, aligned_addr, page_size, prot);
+	if (ret)
+		goto free_binding_iova_unlock;
+
+	kref_init(&b->kref);
+	kref_get(&b->kref);
+	b->domain = smmu_domain;
+	b->addr = aligned_addr;
+	b->iova = *iova;
+	b->size = page_size;
+
+	link_reserved_binding(smmu_domain, b);
+
+	*iova += offset;
+	goto unlock;
+
+free_binding_iova_unlock:
+	kfree(b);
+free_iova_unlock:
+	free_iova(smmu_domain->reserved_iova_domain, *iova >> order);
+unlock:
+	mutex_unlock(&smmu_domain->reserved_mutex);
+	return ret;
+}
+
+/* called with reserved_mutex locked */
+static void reserved_binding_release(struct kref *kref)
+{
+	struct arm_smmu_reserved_binding *b =
+		container_of(kref, struct arm_smmu_reserved_binding, kref);
+	struct arm_smmu_domain *smmu_domain = b->domain;
+	struct iommu_domain *d = &smmu_domain->domain;
+	unsigned long order = __ffs(b->size);
+
+
+	arm_smmu_unmap(d, b->iova, b->size);
+	free_iova(smmu_domain->reserved_iova_domain, b->iova >> order);
+	unlink_reserved_binding(smmu_domain, b);
+	kfree(b);
+}
+
+static void arm_smmu_put_single_reserved(struct iommu_domain *domain,
+					 dma_addr_t iova)
+{
+	unsigned long order;
+	phys_addr_t aligned_addr;
+	dma_addr_t aligned_iova, page_size, mask, offset;
+	struct arm_smmu_reserved_binding *b;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+	order = __ffs(domain->ops->pgsize_bitmap);
+	page_size = (uint64_t)1 << order;
+	mask = page_size - 1;
+
+	aligned_iova = iova & ~mask;
+	offset = iova - aligned_iova;
+
+	aligned_addr = iommu_iova_to_phys(domain, aligned_iova);
+
+	mutex_lock(&smmu_domain->reserved_mutex);
+
+	b = find_reserved_binding(smmu_domain, aligned_addr, page_size);
+	if (!b)
+		goto unlock;
+	kref_put(&b->kref, reserved_binding_release);
+
+unlock:
+	mutex_unlock(&smmu_domain->reserved_mutex);
+}
+
 static struct iommu_ops arm_smmu_ops = {
 	.capable			= arm_smmu_capable,
 	.domain_alloc			= arm_smmu_domain_alloc,
@@ -1580,6 +1692,8 @@ static struct iommu_ops arm_smmu_ops = {
 	.domain_set_attr		= arm_smmu_domain_set_attr,
 	.alloc_reserved_iova_domain	= arm_smmu_alloc_reserved_iova_domain,
 	.free_reserved_iova_domain	= arm_smmu_free_reserved_iova_domain,
+	.get_single_reserved		= arm_smmu_get_single_reserved,
+	.put_single_reserved		= arm_smmu_put_single_reserved,
 	/* Page size bitmap, restricted during device attach */
 	.pgsize_bitmap			= -1UL,
 };
-- 
1.9.1

* [RFC v3 09/15] iommu/arm-smmu: relinquish reserved resources on domain deletion
  2016-02-12  8:13 [RFC v3 00/15] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (7 preceding siblings ...)
  2016-02-12  8:13 ` [RFC v3 08/15] iommu/arm-smmu: implement iommu_get/put_single_reserved Eric Auger
@ 2016-02-12  8:13 ` Eric Auger
  2016-02-12  8:13 ` [RFC v3 10/15] vfio: allow the user to register reserved iova range for MSI mapping Eric Auger
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2016-02-12  8:13 UTC (permalink / raw)
  To: eric.auger, eric.auger, alex.williamson, will.deacon, joro, tglx,
	jason, marc.zyngier, christoffer.dall, linux-arm-kernel, kvmarm,
	kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	sherry.hurwitz, brijesh.singh, leo.duran, Thomas.Lendacky

arm_smmu_unmap_reserved releases all reserved binding resources: it
destroys all bindings, frees the IOVAs and frees the iova_domain. This
happens on domain deletion.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 drivers/iommu/arm-smmu.c | 34 +++++++++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 9961bfd..ae8a97d 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -363,6 +363,7 @@ struct arm_smmu_reserved_binding {
 	dma_addr_t		iova;
 	size_t			size;
 };
+static void arm_smmu_unmap_reserved(struct iommu_domain *domain);
 
 static struct iommu_ops arm_smmu_ops;
 
@@ -1057,6 +1058,7 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 	 * already been detached.
 	 */
 	arm_smmu_destroy_domain_context(domain);
+	arm_smmu_unmap_reserved(domain);
 	kfree(smmu_domain);
 }
 
@@ -1547,19 +1549,23 @@ unlock:
 	return ret;
 }
 
-static void arm_smmu_free_reserved_iova_domain(struct iommu_domain *domain)
+static void __arm_smmu_free_reserved_iova_domain(struct arm_smmu_domain *sd)
 {
-	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
-	struct iova_domain *iovad = smmu_domain->reserved_iova_domain;
+	struct iova_domain *iovad = sd->reserved_iova_domain;
 
 	if (!iovad)
 		return;
 
-	mutex_lock(&smmu_domain->reserved_mutex);
-
 	put_iova_domain(iovad);
 	kfree(iovad);
+}
 
+static void arm_smmu_free_reserved_iova_domain(struct iommu_domain *domain)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+	mutex_lock(&smmu_domain->reserved_mutex);
+	__arm_smmu_free_reserved_iova_domain(smmu_domain);
 	mutex_unlock(&smmu_domain->reserved_mutex);
 }
 
@@ -1675,6 +1681,24 @@ unlock:
 	mutex_unlock(&smmu_domain->reserved_mutex);
 }
 
+static void arm_smmu_unmap_reserved(struct iommu_domain *domain)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct rb_node *node;
+
+	mutex_lock(&smmu_domain->reserved_mutex);
+	while ((node = rb_first(&smmu_domain->reserved_binding_list))) {
+		struct arm_smmu_reserved_binding *b =
+			rb_entry(node, struct arm_smmu_reserved_binding, node);
+
+		while (!kref_put(&b->kref, reserved_binding_release))
+			;
+	}
+	smmu_domain->reserved_binding_list = RB_ROOT;
+	__arm_smmu_free_reserved_iova_domain(smmu_domain);
+	mutex_unlock(&smmu_domain->reserved_mutex);
+}
+
 static struct iommu_ops arm_smmu_ops = {
 	.capable			= arm_smmu_capable,
 	.domain_alloc			= arm_smmu_domain_alloc,
-- 
1.9.1

* [RFC v3 10/15] vfio: allow the user to register reserved iova range for MSI mapping
  2016-02-12  8:13 [RFC v3 00/15] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (8 preceding siblings ...)
  2016-02-12  8:13 ` [RFC v3 09/15] iommu/arm-smmu: relinquish reserved resources on domain deletion Eric Auger
@ 2016-02-12  8:13 ` Eric Auger
  2016-02-12  8:13 ` [RFC v3 11/15] msi: Add a new MSI_FLAG_IRQ_REMAPPING flag Eric Auger
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2016-02-12  8:13 UTC (permalink / raw)
  To: eric.auger, eric.auger, alex.williamson, will.deacon, joro, tglx,
	jason, marc.zyngier, christoffer.dall, linux-arm-kernel, kvmarm,
	kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	sherry.hurwitz, brijesh.singh, leo.duran, Thomas.Lendacky

The user is allowed to register a reserved IOVA range by using the
DMA MAP API and setting the new flag VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
The user provides the base address and the size. This region is stored in the
vfio_dma RB tree. At that point the IOVA range is not mapped to any target
address yet. The host kernel will use those IOVAs when needed, typically
when the VFIO-PCI device allocates its MSIs.

This patch also handles the destruction of the reserved binding RB tree and
the domains' iova_domains.
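
For illustration, a hedged userspace sketch of the registration (addresses
hypothetical, container fd setup omitted); note that vaddr and prot are
ignored with this flag and that only a single reserved region may be
registered, a second attempt failing with EEXIST:

  struct vfio_iommu_type1_dma_map map = {
          .argsz = sizeof(map),
          .flags = VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA,
          .iova  = 0x8000000,   /* hypothetical unused IOVA */
          .size  = 0x10000,     /* must be aligned to the IOMMU page size */
  };

  ioctl(container, VFIO_IOMMU_MAP_DMA, &map);   /* registers the window */
  ioctl(container, VFIO_IOMMU_MAP_DMA, &map);   /* fails, errno == EEXIST */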

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>

---

v1 -> v2:
- set returned value according to alloc_reserved_iova_domain result
- free the iova domains in case any error occurs

RFC v1 -> v1:
- takes into account Alex comments, based on
  [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
- use the existing dma map/unmap ioctl interface with a flag to register
  a reserved IOVA range. A single reserved iova region is allowed.
---
 drivers/vfio/vfio_iommu_type1.c | 75 ++++++++++++++++++++++++++++++++++++++++-
 include/uapi/linux/vfio.h       |  9 +++++
 2 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index b9326c9..c5d3b48 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -673,6 +673,75 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 	return ret;
 }
 
+static int vfio_register_reserved_iova_range(struct vfio_iommu *iommu,
+			   struct vfio_iommu_type1_dma_map *map)
+{
+	dma_addr_t iova = map->iova;
+	size_t size = map->size;
+	uint64_t mask;
+	struct vfio_dma *dma;
+	int ret = 0;
+	struct vfio_domain *d;
+	unsigned long order;
+
+	/* Verify that none of our __u64 fields overflow */
+	if (map->size != size || map->iova != iova)
+		return -EINVAL;
+
+	order =  __ffs(vfio_pgsize_bitmap(iommu));
+	mask = ((uint64_t)1 << order) - 1;
+
+	WARN_ON(mask & PAGE_MASK);
+
+	/* we currently only support MSI_RESERVED_IOVA */
+	if (!(map->flags & VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA))
+		return -EINVAL;
+
+	if (!size || (size | iova) & mask)
+		return -EINVAL;
+
+	/* Don't allow IOVA address wrap */
+	if (iova + size - 1 < iova)
+		return -EINVAL;
+
+	mutex_lock(&iommu->lock);
+
+	/* check if the iova domain has not been instantiated already*/
+	d = list_first_entry(&iommu->domain_list,
+				  struct vfio_domain, next);
+
+	if (vfio_find_dma(iommu, iova, size)) {
+		ret =  -EEXIST;
+		goto out;
+	}
+
+	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
+	if (!dma) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	dma->iova = iova;
+	dma->size = size;
+	dma->type = VFIO_IOVA_RESERVED;
+
+	list_for_each_entry(d, &iommu->domain_list, next)
+		ret |= iommu_alloc_reserved_iova_domain(d->domain, iova,
+							size, order);
+
+	if (ret) {
+		list_for_each_entry(d, &iommu->domain_list, next)
+			iommu_free_reserved_iova_domain(d->domain);
+		goto out;
+	}
+
+	vfio_link_dma(iommu, dma);
+
+out:
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
 static int vfio_bus_type(struct device *dev, void *data)
 {
 	struct bus_type **bus = data;
@@ -1045,7 +1114,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 	} else if (cmd == VFIO_IOMMU_MAP_DMA) {
 		struct vfio_iommu_type1_dma_map map;
 		uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
-				VFIO_DMA_MAP_FLAG_WRITE;
+				VFIO_DMA_MAP_FLAG_WRITE |
+				VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA;
 
 		minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
 
@@ -1055,6 +1125,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		if (map.argsz < minsz || map.flags & ~mask)
 			return -EINVAL;
 
+		if (map.flags & VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA)
+			return vfio_register_reserved_iova_range(iommu, &map);
+
 		return vfio_dma_do_map(iommu, &map);
 
 	} else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 43e183b..982e326 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -411,12 +411,21 @@ struct vfio_iommu_type1_info {
  *
  * Map process virtual addresses to IO virtual addresses using the
  * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
+ *
+ * In case MSI_RESERVED_IOVA is set, the API only aims at registering an IOVA
+ * region which will be used on some platforms to map the host MSI frame.
+ * in that specific case, vaddr and prot are ignored. The requirement for
+ * provisioning such IOVA range can be checked by calling VFIO_IOMMU_GET_INFO
+ * with the VFIO_IOMMU_INFO_REQUIRE_MSI_MAP attribute. A single
+ * MSI_RESERVED_IOVA region can be registered
  */
 struct vfio_iommu_type1_dma_map {
 	__u32	argsz;
 	__u32	flags;
 #define VFIO_DMA_MAP_FLAG_READ (1 << 0)		/* readable from device */
 #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)	/* writable from device */
+/* reserved iova for MSI vectors*/
+#define VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA (1 << 2)
 	__u64	vaddr;				/* Process virtual address */
 	__u64	iova;				/* IO virtual address */
 	__u64	size;				/* Size of mapping (bytes) */
-- 
1.9.1

* [RFC v3 11/15] msi: Add a new MSI_FLAG_IRQ_REMAPPING flag
  2016-02-12  8:13 [RFC v3 00/15] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (9 preceding siblings ...)
  2016-02-12  8:13 ` [RFC v3 10/15] vfio: allow the user to register reserved iova range for MSI mapping Eric Auger
@ 2016-02-12  8:13 ` Eric Auger
  2016-02-12  8:13 ` [RFC v3 12/15] msi: export msi_get_domain_info Eric Auger
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2016-02-12  8:13 UTC (permalink / raw)
  To: eric.auger, eric.auger, alex.williamson, will.deacon, joro, tglx,
	jason, marc.zyngier, christoffer.dall, linux-arm-kernel, kvmarm,
	kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	sherry.hurwitz, brijesh.singh, leo.duran, Thomas.Lendacky

Let's introduce a new msi_domain_info flag value, MSI_FLAG_IRQ_REMAPPING,
meant to tell that the domain supports IRQ remapping, also known as Interrupt
Translation Service. On Intel HW this capability is abstracted on the IOMMU
side while on ARM it is abstracted on the MSI controller side.

The GICv3 ITS is the first HW advertising that feature.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 drivers/irqchip/irq-gic-v3-its-pci-msi.c | 3 ++-
 include/linux/msi.h                      | 2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic-v3-its-pci-msi.c b/drivers/irqchip/irq-gic-v3-its-pci-msi.c
index aee60ed..8223765 100644
--- a/drivers/irqchip/irq-gic-v3-its-pci-msi.c
+++ b/drivers/irqchip/irq-gic-v3-its-pci-msi.c
@@ -96,7 +96,8 @@ static struct msi_domain_ops its_pci_msi_ops = {
 
 static struct msi_domain_info its_pci_msi_domain_info = {
 	.flags	= (MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
-		   MSI_FLAG_MULTI_PCI_MSI | MSI_FLAG_PCI_MSIX),
+		   MSI_FLAG_MULTI_PCI_MSI | MSI_FLAG_PCI_MSIX |
+		   MSI_FLAG_IRQ_REMAPPING),
 	.ops	= &its_pci_msi_ops,
 	.chip	= &its_msi_irq_chip,
 };
diff --git a/include/linux/msi.h b/include/linux/msi.h
index a2a0068..03eda72 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -261,6 +261,8 @@ enum {
 	MSI_FLAG_MULTI_PCI_MSI		= (1 << 3),
 	/* Support PCI MSIX interrupts */
 	MSI_FLAG_PCI_MSIX		= (1 << 4),
+	/* Support MSI IRQ remapping service */
+	MSI_FLAG_IRQ_REMAPPING		= (1 << 5),
 };
 
 int msi_domain_set_affinity(struct irq_data *data, const struct cpumask *mask,
-- 
1.9.1

* [RFC v3 12/15] msi: export msi_get_domain_info
  2016-02-12  8:13 [RFC v3 00/15] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (10 preceding siblings ...)
  2016-02-12  8:13 ` [RFC v3 11/15] msi: Add a new MSI_FLAG_IRQ_REMAPPING flag Eric Auger
@ 2016-02-12  8:13 ` Eric Auger
  2016-02-12  8:13 ` [RFC v3 13/15] vfio/type1: also check IRQ remapping capability at msi domain Eric Auger
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2016-02-12  8:13 UTC (permalink / raw)
  To: eric.auger, eric.auger, alex.williamson, will.deacon, joro, tglx,
	jason, marc.zyngier, christoffer.dall, linux-arm-kernel, kvmarm,
	kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	sherry.hurwitz, brijesh.singh, leo.duran, Thomas.Lendacky

We plan to use msi_get_domain_info in the VFIO module, so let's export it.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v2 -> v3:
- remove static implementation in case CONFIG_PCI_MSI_IRQ_DOMAIN is not set
---
 kernel/irq/msi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 38e89ce..9b0ba4a 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -400,5 +400,6 @@ struct msi_domain_info *msi_get_domain_info(struct irq_domain *domain)
 {
 	return (struct msi_domain_info *)domain->host_data;
 }
+EXPORT_SYMBOL_GPL(msi_get_domain_info);
 
 #endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */
-- 
1.9.1

* [RFC v3 13/15] vfio/type1: also check IRQ remapping capability at msi domain
  2016-02-12  8:13 [RFC v3 00/15] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (11 preceding siblings ...)
  2016-02-12  8:13 ` [RFC v3 12/15] msi: export msi_get_domain_info Eric Auger
@ 2016-02-12  8:13 ` Eric Auger
  2016-02-12  8:13 ` [RFC v3 14/15] iommu/arm-smmu: do not advertise IOMMU_CAP_INTR_REMAP Eric Auger
  2016-02-12  8:13 ` [RFC v3 15/15] irqchip/gicv2m/v3-its-pci-msi: IOMMU map the MSI frame when needed Eric Auger
  14 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2016-02-12  8:13 UTC (permalink / raw)
  To: eric.auger, eric.auger, alex.williamson, will.deacon, joro, tglx,
	jason, marc.zyngier, christoffer.dall, linux-arm-kernel, kvmarm,
	kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	sherry.hurwitz, brijesh.singh, leo.duran, Thomas.Lendacky

On x86, IRQ remapping is abstracted by the IOMMU. On ARM it is abstracted
by the MSI controller. vfio_msi_parent_irq_remapping_capable allows checking
whether interrupts are "safe" for a given device. They are if the device does
not use MSI, or if the device uses MSI and its msi-parent controller supports
IRQ remapping.

Then we check at group level whether all devices have safe interrupts: if not,
the group is only allowed to be attached when allow_unsafe_interrupts is set.

At this point the ARM SMMU still advertises IOMMU_CAP_INTR_REMAP. This is
changed in the next patch.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v2 -> v3:
- protect vfio_msi_parent_irq_remapping_capable with
  CONFIG_GENERIC_MSI_IRQ_DOMAIN
---
 drivers/vfio/vfio_iommu_type1.c | 39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index c5d3b48..3afb815 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -36,6 +36,8 @@
 #include <linux/uaccess.h>
 #include <linux/vfio.h>
 #include <linux/workqueue.h>
+#include <linux/irqdomain.h>
+#include <linux/msi.h>
 
 #define DRIVER_VERSION  "0.2"
 #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
@@ -754,6 +756,32 @@ static int vfio_bus_type(struct device *dev, void *data)
 	return 0;
 }
 
+/**
+ * vfio_msi_parent_irq_remapping_capable: returns whether the device msi-parent
+ * controller supports IRQ remapping, aka interrupt translation
+ *
+ * @dev: device handle
+ * @data: unused
+ * returns 0 if irq remapping is supported or -1 if not supported.
+ */
+static int vfio_msi_parent_irq_remapping_capable(struct device *dev, void *data)
+{
+#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
+	struct irq_domain *domain;
+	struct msi_domain_info *info;
+
+	domain = dev_get_msi_domain(dev);
+	if (!domain)
+		return 0;
+
+	info = msi_get_domain_info(domain);
+
+	if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
+		return -1;
+#endif
+	return 0;
+}
+
 static int vfio_iommu_replay(struct vfio_iommu *iommu,
 			     struct vfio_domain *domain)
 {
@@ -848,7 +876,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	struct vfio_group *group, *g;
 	struct vfio_domain *domain, *d;
 	struct bus_type *bus = NULL;
-	int ret;
+	int ret, irq_remapping;
 
 	mutex_lock(&iommu->lock);
 
@@ -871,6 +899,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 
 	group->iommu_group = iommu_group;
 
+	/*
+	 * Determine if all the devices of the group has an MSI-parent that
+	 * supports irq remapping
+	 */
+	irq_remapping = !iommu_group_for_each_dev(iommu_group, &bus,
+				       vfio_msi_parent_irq_remapping_capable);
+
 	/* Determine bus_type in order to allocate a domain */
 	ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
 	if (ret)
@@ -899,7 +934,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	list_add(&group->next, &domain->group_list);
 
 	if (!allow_unsafe_interrupts &&
-	    !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
+	    (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !irq_remapping)) {
 		pr_warn("%s: No interrupt remapping support.  Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
 		       __func__);
 		ret = -EPERM;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC v3 14/15] iommu/arm-smmu: do not advertise IOMMU_CAP_INTR_REMAP
  2016-02-12  8:13 [RFC v3 00/15] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (12 preceding siblings ...)
  2016-02-12  8:13 ` [RFC v3 13/15] vfio/type1: also check IRQ remapping capability at msi domain Eric Auger
@ 2016-02-12  8:13 ` Eric Auger
  2016-02-12  8:13 ` [RFC v3 15/15] irqchip/gicv2m/v3-its-pci-msi: IOMMU map the MSI frame when needed Eric Auger
  14 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2016-02-12  8:13 UTC (permalink / raw)
  To: eric.auger, eric.auger, alex.williamson, will.deacon, joro, tglx,
	jason, marc.zyngier, christoffer.dall, linux-arm-kernel, kvmarm,
	kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	sherry.hurwitz, brijesh.singh, leo.duran, Thomas.Lendacky

Do not advertise IOMMU_CAP_INTR_REMAP for arm-smmu. Indeed the IRQ
remapping capability is abstracted on the irqchip side for ARM, as
opposed to the Intel IOMMU which features IRQ remapping HW.

So to check the IRQ remapping capability, the MSI domain needs to be
checked instead.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 drivers/iommu/arm-smmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index ae8a97d..9a83285 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1354,7 +1354,7 @@ static bool arm_smmu_capable(enum iommu_cap cap)
 		 */
 		return true;
 	case IOMMU_CAP_INTR_REMAP:
-		return true; /* MSIs are just memory writes */
+		return false; /* MSIs are just memory writes */
 	case IOMMU_CAP_NOEXEC:
 		return true;
 	default:
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC v3 15/15] irqchip/gicv2m/v3-its-pci-msi: IOMMU map the MSI frame when needed
  2016-02-12  8:13 [RFC v3 00/15] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
                   ` (13 preceding siblings ...)
  2016-02-12  8:13 ` [RFC v3 14/15] iommu/arm-smmu: do not advertise IOMMU_CAP_INTR_REMAP Eric Auger
@ 2016-02-12  8:13 ` Eric Auger
  2016-02-18 11:33   ` Marc Zyngier
  14 siblings, 1 reply; 29+ messages in thread
From: Eric Auger @ 2016-02-12  8:13 UTC (permalink / raw)
  To: eric.auger, eric.auger, alex.williamson, will.deacon, joro, tglx,
	jason, marc.zyngier, christoffer.dall, linux-arm-kernel, kvmarm,
	kvm
  Cc: suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	sherry.hurwitz, brijesh.singh, leo.duran, Thomas.Lendacky

In case the msi_desc references a device attached to an iommu
domain, the msi address needs to be mapped in the IOMMU. Otherwise any
MSI write transaction will cause a fault.

gic_set_msi_addr detects that case and allocates an iova bound
to the physical address page containing the MSI frame. This iova
is then used as the msi_msg address. The unset operation decrements the
reference on the binding.

The functions are called in the irq_write_msi_msg ops implementation.
At that time we can recognize whether the msi is being set up or torn down
by looking at the msi_msg content. Indeed msi_domain_deactivate zeroes all
the fields.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v2 -> v3:
- protect iova/addr manipulation with CONFIG_ARCH_DMA_ADDR_T_64BIT and
  CONFIG_PHYS_ADDR_T_64BIT
- only expose gic_pci_msi_domain_write_msg in case CONFIG_IOMMU_API &
  CONFIG_PCI_MSI_IRQ_DOMAIN are set.
- gic_set/unset_msi_addr duly become static
---
 drivers/irqchip/irq-gic-common.c         | 69 ++++++++++++++++++++++++++++++++
 drivers/irqchip/irq-gic-common.h         |  5 +++
 drivers/irqchip/irq-gic-v2m.c            |  7 +++-
 drivers/irqchip/irq-gic-v3-its-pci-msi.c |  5 +++
 4 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic-common.c b/drivers/irqchip/irq-gic-common.c
index f174ce0..46cd06c 100644
--- a/drivers/irqchip/irq-gic-common.c
+++ b/drivers/irqchip/irq-gic-common.c
@@ -18,6 +18,8 @@
 #include <linux/io.h>
 #include <linux/irq.h>
 #include <linux/irqchip/arm-gic.h>
+#include <linux/iommu.h>
+#include <linux/msi.h>
 
 #include "irq-gic-common.h"
 
@@ -121,3 +123,70 @@ void gic_cpu_config(void __iomem *base, void (*sync_access)(void))
 	if (sync_access)
 		sync_access();
 }
+
+#if defined(CONFIG_IOMMU_API) && defined(CONFIG_PCI_MSI_IRQ_DOMAIN)
+static int gic_set_msi_addr(struct irq_data *data, struct msi_msg *msg)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+	struct device *dev = msi_desc_to_dev(desc);
+	struct iommu_domain *d;
+	phys_addr_t addr;
+	dma_addr_t iova;
+	int ret;
+
+	d = iommu_get_domain_for_dev(dev);
+	if (!d)
+		return 0;
+#ifdef CONFIG_PHYS_ADDR_T_64BIT
+	addr = ((phys_addr_t)(msg->address_hi) << 32) | msg->address_lo;
+#else
+	addr = msg->address_lo;
+#endif
+
+	ret = iommu_get_single_reserved(d, addr, IOMMU_WRITE, &iova);
+
+	if (!ret) {
+		msg->address_lo = lower_32_bits(iova);
+		msg->address_hi = upper_32_bits(iova);
+	}
+	return ret;
+}
+
+
+static void gic_unset_msi_addr(struct irq_data *data)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+	struct device *dev;
+	struct iommu_domain *d;
+	dma_addr_t iova;
+
+#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
+	iova = ((dma_addr_t)(desc->msg.address_hi) << 32) |
+		desc->msg.address_lo;
+#else
+	iova = desc->msg.address_lo;
+#endif
+
+	dev = msi_desc_to_dev(desc);
+	if (!dev)
+		return;
+
+	d = iommu_get_domain_for_dev(dev);
+	if (!d)
+		return;
+
+	iommu_put_single_reserved(d, iova);
+}
+
+void gic_pci_msi_domain_write_msg(struct irq_data *irq_data,
+				  struct msi_msg *msg)
+{
+	if (!msg->address_hi && !msg->address_lo && !msg->data)
+		gic_unset_msi_addr(irq_data); /* deactivate */
+	else
+		gic_set_msi_addr(irq_data, msg); /* activate, set_affinity */
+
+	pci_msi_domain_write_msg(irq_data, msg);
+}
+#endif
+
diff --git a/drivers/irqchip/irq-gic-common.h b/drivers/irqchip/irq-gic-common.h
index fff697d..98681fd 100644
--- a/drivers/irqchip/irq-gic-common.h
+++ b/drivers/irqchip/irq-gic-common.h
@@ -35,4 +35,9 @@ void gic_cpu_config(void __iomem *base, void (*sync_access)(void));
 void gic_enable_quirks(u32 iidr, const struct gic_quirk *quirks,
 		void *data);
 
+#if defined(CONFIG_PCI_MSI_IRQ_DOMAIN) && defined(CONFIG_IOMMU_API)
+void gic_pci_msi_domain_write_msg(struct irq_data *irq_data,
+				  struct msi_msg *msg);
+#endif
+
 #endif /* _IRQ_GIC_COMMON_H */
diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c
index c779f83..692d809 100644
--- a/drivers/irqchip/irq-gic-v2m.c
+++ b/drivers/irqchip/irq-gic-v2m.c
@@ -24,6 +24,7 @@
 #include <linux/of_pci.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
+#include "irq-gic-common.h"
 
 /*
 * MSI_TYPER:
@@ -83,7 +84,11 @@ static struct irq_chip gicv2m_msi_irq_chip = {
 	.irq_mask		= gicv2m_mask_msi_irq,
 	.irq_unmask		= gicv2m_unmask_msi_irq,
 	.irq_eoi		= irq_chip_eoi_parent,
-	.irq_write_msi_msg	= pci_msi_domain_write_msg,
+#ifdef CONFIG_IOMMU_API
+	.irq_write_msi_msg	= gic_pci_msi_domain_write_msg,
+#else
+	.irq_write_msi_msg      = pci_msi_domain_write_msg,
+#endif
 };
 
 static struct msi_domain_info gicv2m_msi_domain_info = {
diff --git a/drivers/irqchip/irq-gic-v3-its-pci-msi.c b/drivers/irqchip/irq-gic-v3-its-pci-msi.c
index 8223765..690504e 100644
--- a/drivers/irqchip/irq-gic-v3-its-pci-msi.c
+++ b/drivers/irqchip/irq-gic-v3-its-pci-msi.c
@@ -19,6 +19,7 @@
 #include <linux/of.h>
 #include <linux/of_irq.h>
 #include <linux/of_pci.h>
+#include "irq-gic-common.h"
 
 static void its_mask_msi_irq(struct irq_data *d)
 {
@@ -37,7 +38,11 @@ static struct irq_chip its_msi_irq_chip = {
 	.irq_unmask		= its_unmask_msi_irq,
 	.irq_mask		= its_mask_msi_irq,
 	.irq_eoi		= irq_chip_eoi_parent,
+#ifdef CONFIG_IOMMU_API
+	.irq_write_msi_msg	= gic_pci_msi_domain_write_msg,
+#else
 	.irq_write_msi_msg	= pci_msi_domain_write_msg,
+#endif
 };
 
 struct its_pci_alias {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [RFC v3 02/15] vfio: expose MSI mapping requirement through VFIO_IOMMU_GET_INFO
  2016-02-12  8:13 ` [RFC v3 02/15] vfio: expose MSI mapping requirement through VFIO_IOMMU_GET_INFO Eric Auger
@ 2016-02-18  9:34   ` Marc Zyngier
  2016-02-18 15:26     ` Eric Auger
  0 siblings, 1 reply; 29+ messages in thread
From: Marc Zyngier @ 2016-02-18  9:34 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, alex.williamson, will.deacon, joro, tglx, jason,
	christoffer.dall, linux-arm-kernel, kvmarm, kvm,
	suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	sherry.hurwitz, brijesh.singh, leo.duran, Thomas.Lendacky

On Fri, 12 Feb 2016 08:13:04 +0000
Eric Auger <eric.auger@linaro.org> wrote:

> This patch allows the user-space to retrieve whether msi write
> transaction addresses must be mapped. This is returned through the
> VFIO_IOMMU_GET_INFO API and its new flag: VFIO_IOMMU_INFO_REQUIRE_MSI_MAP.
> 
> Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> 
> ---
> 
> RFC v1 -> v1:
> - derived from
>   [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
> - renamed allow_msi_reconfig into require_msi_mapping
> - fixed VFIO_IOMMU_GET_INFO
> ---
>  drivers/vfio/vfio_iommu_type1.c | 26 ++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h       |  1 +
>  2 files changed, 27 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 6f1ea3d..c5b57e1 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -255,6 +255,29 @@ static int vaddr_get_pfn(unsigned long vaddr, int prot, unsigned long *pfn)
>  }
>  
>  /*
> + * vfio_domains_require_msi_mapping: indicates whether MSI write transaction
> + * addresses must be mapped
> + *
> + * returns true if it does
> + */
> +static bool vfio_domains_require_msi_mapping(struct vfio_iommu *iommu)
> +{
> +	struct vfio_domain *d;
> +	bool ret;
> +
> +	mutex_lock(&iommu->lock);
> +	/* All domains have same require_msi_map property, pick first */
> +	d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
> +	if (iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_MAPPING, NULL) < 0)
> +		ret = false;
> +	else
> +		ret = true;

nit: this could be simplified as:

ret = (iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_MAPPING, NULL) == 0);

> +	mutex_unlock(&iommu->lock);
> +
> +	return ret;
> +}
> +
> +/*
>   * Attempt to pin pages.  We really don't want to track all the pfns and
>   * the iommu can only map chunks of consecutive pfns anyway, so get the
>   * first page and all consecutive pages with the same locking.
> @@ -997,6 +1020,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  
>  		info.flags = VFIO_IOMMU_INFO_PGSIZES;
>  
> +		if (vfio_domains_require_msi_mapping(iommu))
> +			info.flags |= VFIO_IOMMU_INFO_REQUIRE_MSI_MAP;
> +
>  		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
>  
>  		return copy_to_user((void __user *)arg, &info, minsz);
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 7d7a4c6..43e183b 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -400,6 +400,7 @@ struct vfio_iommu_type1_info {
>  	__u32	argsz;
>  	__u32	flags;
>  #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
> +#define VFIO_IOMMU_INFO_REQUIRE_MSI_MAP (1 << 1)/* MSI must be mapped */
>  	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
>  };
>  


FWIW:

Acked-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead. It just smells funny.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC v3 07/15] iommu: iommu_get/put_single_reserved
  2016-02-12  8:13 ` [RFC v3 07/15] iommu: iommu_get/put_single_reserved Eric Auger
@ 2016-02-18 11:06   ` Marc Zyngier
  2016-02-18 16:42     ` Eric Auger
  0 siblings, 1 reply; 29+ messages in thread
From: Marc Zyngier @ 2016-02-18 11:06 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, alex.williamson, will.deacon, joro, tglx, jason,
	christoffer.dall, linux-arm-kernel, kvmarm, kvm,
	suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	sherry.hurwitz, brijesh.singh, leo.duran, Thomas.Lendacky

On Fri, 12 Feb 2016 08:13:09 +0000
Eric Auger <eric.auger@linaro.org> wrote:

> This patch introduces iommu_get/put_single_reserved.
> 
> iommu_get_single_reserved allows to allocate a new reserved iova page
> and map it onto the physical page that contains a given physical address.
> It returns the iova that is mapped onto the provided physical address.
> Hence the physical address passed in argument does not need to be aligned.
> 
> In case a mapping already exists between both pages, the IOVA mapped
> to the PA is directly returned.
> 
> Each time an iova is successfully returned a binding ref count is
> incremented.
> 
> iommu_put_single_reserved decrements the ref count and when this latter
> is null, the mapping is destroyed and the iova is released.

I wonder if there is a requirement for the caller to find out about the
size of the mapping, or to impose a given size... MSIs clearly do not
have that requirement (this is always a 32bit value), but since
allocations usually pair address and size, I thought I'd ask...

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC v3 05/15] iommu/arm-smmu: implement alloc/free_reserved_iova_domain
  2016-02-12  8:13 ` [RFC v3 05/15] iommu/arm-smmu: implement alloc/free_reserved_iova_domain Eric Auger
@ 2016-02-18 11:09   ` Robin Murphy
  2016-02-18 15:22     ` Eric Auger
  2016-02-18 16:06     ` Alex Williamson
  0 siblings, 2 replies; 29+ messages in thread
From: Robin Murphy @ 2016-02-18 11:09 UTC (permalink / raw)
  To: Eric Auger, eric.auger, alex.williamson, will.deacon, joro, tglx,
	jason, marc.zyngier, christoffer.dall, linux-arm-kernel, kvmarm,
	kvm
  Cc: Thomas.Lendacky, brijesh.singh, patches, Manish.Jaggi, p.fedin,
	linux-kernel, iommu, pranav.sawargaonkar, sherry.hurwitz

Hi Eric,

On 12/02/16 08:13, Eric Auger wrote:
> Implement alloc/free_reserved_iova_domain for arm-smmu. we use
> the iova allocator (iova.c). The iova_domain is attached to the
> arm_smmu_domain struct. A mutex is introduced to protect it.

The IOMMU API currently leaves IOVA management entirely up to the caller 
- VFIO is already managing its own IOVA space, so what warrants this 
being pushed all the way down to the IOMMU driver? All I see here is 
abstract code with no hardware-specific details that'll have to be 
copy-pasted into other IOMMU drivers (e.g. SMMUv3), which strongly 
suggests it's the wrong place to do it.

As I understand the problem, VFIO has a generic "configure an IOMMU to 
point at an MSI doorbell" step to do in the process of attaching a 
device, which hasn't needed implementing yet due to VT-d's 
IOMMU_CAP_I_AM_ALSO_ACTUALLY_THE_MSI_CONTROLLER_IN_DISGUISE flag, which 
most of us have managed to misinterpret so far. AFAICS all the IOMMU 
driver should need to know about this is an iommu_map() call (which will 
want a slight extension[1] to make things behave properly). We should be 
fixing the abstraction to be less x86-centric, not hacking up all the 
ARM drivers to emulate x86 hardware behaviour in software.
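
Concretely, something as simple as the following should be all the IOMMU
driver ever sees (the IOMMU_MSI prot flag is only illustrative of the
extension proposed in [1], not its actual name):

#include <linux/iommu.h>

/* caller-managed IOVA: VFIO picks msi_iova itself and simply maps it
 * onto the doorbell page, like any other mapping it owns */
static int vfio_map_msi_doorbell(struct iommu_domain *domain,
				 unsigned long msi_iova,
				 phys_addr_t doorbell_pa)
{
	return iommu_map(domain, msi_iova, doorbell_pa & PAGE_MASK,
			 PAGE_SIZE, IOMMU_WRITE | IOMMU_MSI);
}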

Robin.

[1]:http://article.gmane.org/gmane.linux.kernel.cross-arch/30833

> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>
> ---
> v2 -> v3:
> - select IOMMU_IOVA when ARM_SMMU or ARM_SMMU_V3 is set
>
> v1 -> v2:
> - formerly implemented in vfio_iommu_type1
> ---
>   drivers/iommu/Kconfig    |  2 ++
>   drivers/iommu/arm-smmu.c | 87 +++++++++++++++++++++++++++++++++++++++---------
>   2 files changed, 74 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index a1e75cb..1106528 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -289,6 +289,7 @@ config ARM_SMMU
>   	bool "ARM Ltd. System MMU (SMMU) Support"
>   	depends on (ARM64 || ARM) && MMU
>   	select IOMMU_API
> +	select IOMMU_IOVA
>   	select IOMMU_IO_PGTABLE_LPAE
>   	select ARM_DMA_USE_IOMMU if ARM
>   	help
> @@ -302,6 +303,7 @@ config ARM_SMMU_V3
>   	bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
>   	depends on ARM64 && PCI
>   	select IOMMU_API
> +	select IOMMU_IOVA
>   	select IOMMU_IO_PGTABLE_LPAE
>   	select GENERIC_MSI_IRQ_DOMAIN
>   	help
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index c8b7e71..f42341d 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -42,6 +42,7 @@
>   #include <linux/platform_device.h>
>   #include <linux/slab.h>
>   #include <linux/spinlock.h>
> +#include <linux/iova.h>
>
>   #include <linux/amba/bus.h>
>
> @@ -347,6 +348,9 @@ struct arm_smmu_domain {
>   	enum arm_smmu_domain_stage	stage;
>   	struct mutex			init_mutex; /* Protects smmu pointer */
>   	struct iommu_domain		domain;
> +	struct iova_domain		*reserved_iova_domain;
> +	/* protects reserved domain manipulation */
> +	struct mutex			reserved_mutex;
>   };
>
>   static struct iommu_ops arm_smmu_ops;
> @@ -975,6 +979,7 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
>   		return NULL;
>
>   	mutex_init(&smmu_domain->init_mutex);
> +	mutex_init(&smmu_domain->reserved_mutex);
>   	spin_lock_init(&smmu_domain->pgtbl_lock);
>
>   	return &smmu_domain->domain;
> @@ -1446,22 +1451,74 @@ out_unlock:
>   	return ret;
>   }
>
> +static int arm_smmu_alloc_reserved_iova_domain(struct iommu_domain *domain,
> +					       dma_addr_t iova, size_t size,
> +					       unsigned long order)
> +{
> +	unsigned long granule, mask;
> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +	int ret = 0;
> +
> +	granule = 1UL << order;
> +	mask = granule - 1;
> +	if (iova & mask || (!size) || (size & mask))
> +		return -EINVAL;
> +
> +	if (smmu_domain->reserved_iova_domain)
> +		return -EEXIST;
> +
> +	mutex_lock(&smmu_domain->reserved_mutex);
> +
> +	smmu_domain->reserved_iova_domain =
> +		kzalloc(sizeof(struct iova_domain), GFP_KERNEL);
> +	if (!smmu_domain->reserved_iova_domain) {
> +		ret = -ENOMEM;
> +		goto unlock;
> +	}
> +
> +	init_iova_domain(smmu_domain->reserved_iova_domain,
> +			 granule, iova >> order, (iova + size - 1) >> order);
> +
> +unlock:
> +	mutex_unlock(&smmu_domain->reserved_mutex);
> +	return ret;
> +}
> +
> +static void arm_smmu_free_reserved_iova_domain(struct iommu_domain *domain)
> +{
> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +	struct iova_domain *iovad = smmu_domain->reserved_iova_domain;
> +
> +	if (!iovad)
> +		return;
> +
> +	mutex_lock(&smmu_domain->reserved_mutex);
> +
> +	put_iova_domain(iovad);
> +	kfree(iovad);
> +
> +	mutex_unlock(&smmu_domain->reserved_mutex);
> +}
> +
>   static struct iommu_ops arm_smmu_ops = {
> -	.capable		= arm_smmu_capable,
> -	.domain_alloc		= arm_smmu_domain_alloc,
> -	.domain_free		= arm_smmu_domain_free,
> -	.attach_dev		= arm_smmu_attach_dev,
> -	.detach_dev		= arm_smmu_detach_dev,
> -	.map			= arm_smmu_map,
> -	.unmap			= arm_smmu_unmap,
> -	.map_sg			= default_iommu_map_sg,
> -	.iova_to_phys		= arm_smmu_iova_to_phys,
> -	.add_device		= arm_smmu_add_device,
> -	.remove_device		= arm_smmu_remove_device,
> -	.device_group		= arm_smmu_device_group,
> -	.domain_get_attr	= arm_smmu_domain_get_attr,
> -	.domain_set_attr	= arm_smmu_domain_set_attr,
> -	.pgsize_bitmap		= -1UL, /* Restricted during device attach */
> +	.capable			= arm_smmu_capable,
> +	.domain_alloc			= arm_smmu_domain_alloc,
> +	.domain_free			= arm_smmu_domain_free,
> +	.attach_dev			= arm_smmu_attach_dev,
> +	.detach_dev			= arm_smmu_detach_dev,
> +	.map				= arm_smmu_map,
> +	.unmap				= arm_smmu_unmap,
> +	.map_sg				= default_iommu_map_sg,
> +	.iova_to_phys			= arm_smmu_iova_to_phys,
> +	.add_device			= arm_smmu_add_device,
> +	.remove_device			= arm_smmu_remove_device,
> +	.device_group			= arm_smmu_device_group,
> +	.domain_get_attr		= arm_smmu_domain_get_attr,
> +	.domain_set_attr		= arm_smmu_domain_set_attr,
> +	.alloc_reserved_iova_domain	= arm_smmu_alloc_reserved_iova_domain,
> +	.free_reserved_iova_domain	= arm_smmu_free_reserved_iova_domain,
> +	/* Page size bitmap, restricted during device attach */
> +	.pgsize_bitmap			= -1UL,
>   };
>
>   static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC v3 15/15] irqchip/gicv2m/v3-its-pci-msi: IOMMU map the MSI frame when needed
  2016-02-12  8:13 ` [RFC v3 15/15] irqchip/gicv2m/v3-its-pci-msi: IOMMU map the MSI frame when needed Eric Auger
@ 2016-02-18 11:33   ` Marc Zyngier
  2016-02-18 15:33     ` Eric Auger
  0 siblings, 1 reply; 29+ messages in thread
From: Marc Zyngier @ 2016-02-18 11:33 UTC (permalink / raw)
  To: Eric Auger, leo.duran
  Cc: eric.auger, alex.williamson, will.deacon, joro, tglx, jason,
	christoffer.dall, linux-arm-kernel, kvmarm, kvm,
	suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	sherry.hurwitz, brijesh.singh, Thomas.Lendacky

On Fri, 12 Feb 2016 08:13:17 +0000
Eric Auger <eric.auger@linaro.org> wrote:

> In case the msi_desc references a device attached to an iommu
> domain, the msi address needs to be mapped in the IOMMU. Else any
> MSI write transaction will cause a fault.
> 
> gic_set_msi_addr detects that case and allocates an iova bound
> to the physical address page comprising the MSI frame. This iova
> then is used as the msi_msg address. Unset operation decrements the
> reference on the binding.
> 
> The functions are called in the irq_write_msi_msg ops implementation.
> At that time we can recognize whether the msi is setup or teared down
> looking at the msi_msg content. Indeed msi_domain_deactivate zeroes all
> the fields.
> 
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> 
> ---
> 
> v2 -> v3:
> - protect iova/addr manipulation with CONFIG_ARCH_DMA_ADDR_T_64BIT and
>   CONFIG_PHYS_ADDR_T_64BIT
> - only expose gic_pci_msi_domain_write_msg in case CONFIG_IOMMU_API &
>   CONFIG_PCI_MSI_IRQ_DOMAIN are set.
> - gic_set/unset_msi_addr duly become static
> ---
>  drivers/irqchip/irq-gic-common.c         | 69 ++++++++++++++++++++++++++++++++
>  drivers/irqchip/irq-gic-common.h         |  5 +++
>  drivers/irqchip/irq-gic-v2m.c            |  7 +++-
>  drivers/irqchip/irq-gic-v3-its-pci-msi.c |  5 +++
>  4 files changed, 85 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/irqchip/irq-gic-common.c b/drivers/irqchip/irq-gic-common.c
> index f174ce0..46cd06c 100644
> --- a/drivers/irqchip/irq-gic-common.c
> +++ b/drivers/irqchip/irq-gic-common.c
> @@ -18,6 +18,8 @@
>  #include <linux/io.h>
>  #include <linux/irq.h>
>  #include <linux/irqchip/arm-gic.h>
> +#include <linux/iommu.h>
> +#include <linux/msi.h>
>  
>  #include "irq-gic-common.h"
>  
> @@ -121,3 +123,70 @@ void gic_cpu_config(void __iomem *base, void (*sync_access)(void))
>  	if (sync_access)
>  		sync_access();
>  }
> +
> +#if defined(CONFIG_IOMMU_API) && defined(CONFIG_PCI_MSI_IRQ_DOMAIN)
> +static int gic_set_msi_addr(struct irq_data *data, struct msi_msg *msg)
> +{
> +	struct msi_desc *desc = irq_data_get_msi_desc(data);
> +	struct device *dev = msi_desc_to_dev(desc);
> +	struct iommu_domain *d;
> +	phys_addr_t addr;
> +	dma_addr_t iova;
> +	int ret;
> +
> +	d = iommu_get_domain_for_dev(dev);
> +	if (!d)
> +		return 0;
> +#ifdef CONFIG_PHYS_ADDR_T_64BIT
> +	addr = ((phys_addr_t)(msg->address_hi) << 32) | msg->address_lo;
> +#else
> +	addr = msg->address_lo;
> +#endif
> +
> +	ret = iommu_get_single_reserved(d, addr, IOMMU_WRITE, &iova);
> +
> +	if (!ret) {
> +		msg->address_lo = lower_32_bits(iova);
> +		msg->address_hi = upper_32_bits(iova);
> +	}
> +	return ret;
> +}
> +
> +
> +static void gic_unset_msi_addr(struct irq_data *data)
> +{
> +	struct msi_desc *desc = irq_data_get_msi_desc(data);
> +	struct device *dev;
> +	struct iommu_domain *d;
> +	dma_addr_t iova;
> +
> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
> +	iova = ((dma_addr_t)(desc->msg.address_hi) << 32) |
> +		desc->msg.address_lo;
> +#else
> +	iova = desc->msg.address_lo;
> +#endif
> +
> +	dev = msi_desc_to_dev(desc);
> +	if (!dev)
> +		return;
> +
> +	d = iommu_get_domain_for_dev(dev);
> +	if (!d)
> +		return;
> +
> +	iommu_put_single_reserved(d, iova);
> +}
> +
> +void gic_pci_msi_domain_write_msg(struct irq_data *irq_data,
> +				  struct msi_msg *msg)
> +{
> +	if (!msg->address_hi && !msg->address_lo && !msg->data)
> +		gic_unset_msi_addr(irq_data); /* deactivate */
> +	else
> +		gic_set_msi_addr(irq_data, msg); /* activate, set_affinity */
> +
> +	pci_msi_domain_write_msg(irq_data, msg);
> +}

So by doing that, you are specializing this infrastructure to PCI.
If you hijacked irq_compose_msi_msg() instead, you'd have both
platform and PCI MSI for the same price.

I can see a potential problem with the teardown of an MSI (I don't
think the compose method is called on teardown), but I think this could
be easily addressed.

> +#endif
> +
> diff --git a/drivers/irqchip/irq-gic-common.h b/drivers/irqchip/irq-gic-common.h
> index fff697d..98681fd 100644
> --- a/drivers/irqchip/irq-gic-common.h
> +++ b/drivers/irqchip/irq-gic-common.h
> @@ -35,4 +35,9 @@ void gic_cpu_config(void __iomem *base, void (*sync_access)(void));
>  void gic_enable_quirks(u32 iidr, const struct gic_quirk *quirks,
>  		void *data);
>  
> +#if defined(CONFIG_PCI_MSI_IRQ_DOMAIN) && defined(CONFIG_IOMMU_API)
> +void gic_pci_msi_domain_write_msg(struct irq_data *irq_data,
> +				  struct msi_msg *msg);
> +#endif
> +
>  #endif /* _IRQ_GIC_COMMON_H */
> diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c
> index c779f83..692d809 100644
> --- a/drivers/irqchip/irq-gic-v2m.c
> +++ b/drivers/irqchip/irq-gic-v2m.c
> @@ -24,6 +24,7 @@
>  #include <linux/of_pci.h>
>  #include <linux/slab.h>
>  #include <linux/spinlock.h>
> +#include "irq-gic-common.h"
>  
>  /*
>  * MSI_TYPER:
> @@ -83,7 +84,11 @@ static struct irq_chip gicv2m_msi_irq_chip = {
>  	.irq_mask		= gicv2m_mask_msi_irq,
>  	.irq_unmask		= gicv2m_unmask_msi_irq,
>  	.irq_eoi		= irq_chip_eoi_parent,
> -	.irq_write_msi_msg	= pci_msi_domain_write_msg,
> +#ifdef CONFIG_IOMMU_API
> +	.irq_write_msi_msg	= gic_pci_msi_domain_write_msg,
> +#else
> +	.irq_write_msi_msg      = pci_msi_domain_write_msg,
> +#endif

Irrespective of the way you implement the translation procedure, you
should make this unconditional, and have the #ifdefery in the code that
implements it.

>  };
>  
>  static struct msi_domain_info gicv2m_msi_domain_info = {
> diff --git a/drivers/irqchip/irq-gic-v3-its-pci-msi.c b/drivers/irqchip/irq-gic-v3-its-pci-msi.c
> index 8223765..690504e 100644
> --- a/drivers/irqchip/irq-gic-v3-its-pci-msi.c
> +++ b/drivers/irqchip/irq-gic-v3-its-pci-msi.c
> @@ -19,6 +19,7 @@
>  #include <linux/of.h>
>  #include <linux/of_irq.h>
>  #include <linux/of_pci.h>
> +#include "irq-gic-common.h"
>  
>  static void its_mask_msi_irq(struct irq_data *d)
>  {
> @@ -37,7 +38,11 @@ static struct irq_chip its_msi_irq_chip = {
>  	.irq_unmask		= its_unmask_msi_irq,
>  	.irq_mask		= its_mask_msi_irq,
>  	.irq_eoi		= irq_chip_eoi_parent,
> +#ifdef CONFIG_IOMMU_API
> +	.irq_write_msi_msg	= gic_pci_msi_domain_write_msg,
> +#else
>  	.irq_write_msi_msg	= pci_msi_domain_write_msg,
> +#endif
>  };
>  
>  struct its_pci_alias {

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC v3 05/15] iommu/arm-smmu: implement alloc/free_reserved_iova_domain
  2016-02-18 11:09   ` Robin Murphy
@ 2016-02-18 15:22     ` Eric Auger
  2016-02-18 16:06     ` Alex Williamson
  1 sibling, 0 replies; 29+ messages in thread
From: Eric Auger @ 2016-02-18 15:22 UTC (permalink / raw)
  To: Robin Murphy, eric.auger, alex.williamson, will.deacon, joro,
	tglx, jason, marc.zyngier, christoffer.dall, linux-arm-kernel,
	kvmarm, kvm
  Cc: Thomas.Lendacky, brijesh.singh, patches, Manish.Jaggi, p.fedin,
	linux-kernel, iommu, pranav.sawargaonkar, sherry.hurwitz

Hi Robin,
On 02/18/2016 12:09 PM, Robin Murphy wrote:
> Hi Eric,
> 
> On 12/02/16 08:13, Eric Auger wrote:
>> Implement alloc/free_reserved_iova_domain for arm-smmu. we use
>> the iova allocator (iova.c). The iova_domain is attached to the
>> arm_smmu_domain struct. A mutex is introduced to protect it.
> 
> The IOMMU API currently leaves IOVA management entirely up to the caller
I agree

> - VFIO is already managing its own IOVA space, so what warrants this
> being pushed all the way down to the IOMMU driver?
In practice, with upstreamed code, VFIO uses IOVA = GPA provided by the
user-space (corresponding to RAM regions) and does not allocate IOVA
itself. The IOVA is passed through the VFIO_IOMMU_MAP_DMA ioctl.

With this series we propose that user-space provides a pool of
unused IOVAs that can be used to map host physical addresses (the MSI
frame address). So effectively someone needs an iova allocator to
allocate within that window. This can be either vfio or the iommu driver,
but in both cases it is a new capability introduced in that component.
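
To illustrate what I mean by "pool of unused IOVA", here is a purely
hypothetical user-space sketch; the flag name and value below are made up
for illustration and are not the ones this series actually defines:

#include <sys/ioctl.h>
#include <linux/vfio.h>

/* made-up placeholder, not the flag introduced by this series */
#define VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA	(1 << 2)

/* container_fd: an already opened and configured VFIO container */
static int register_msi_iova_window(int container_fd)
{
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA,
		.iova  = 0x8000000,	/* IOVA window unused by the guest */
		.size  = 0x10000,	/* enough for a few doorbell pages */
	};

	return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}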

In the first version of this series
(https://lkml.org/lkml/2016/1/26/371) I put this iova allocation in
vfio_iommu_type1. The vfio-pci driver then used this vfio-internal
API to overwrite the physical address written into the PCI device by the
MSI controller.

However I was advised by Alex to move things to a lower level
(http://www.spinics.net/lists/kvm/msg126809.html), the IOMMU core or the
irq remapping driver; also the MSI controller should directly program the
IOVA address into the PCI device.

On ARM, irq remapping is somewhat abstracted in the ITS driver. We also
need that functionality for GICv2M, so I eventually chose to put it in
the IOMMU driver. Since iova.c is not compiled by everyone and since that
functionality is only needed on a restricted set of architectures
(ARM/ARM64 & PowerPC), I chose to implement this in arch-specific code,
for the time being in arm-smmu.c.

This allows the MSI controller to interact with the IOMMU API to bind
its MSI address. I think it may be feasible to have the MSI controller
interact with the vfio external user API instead, but would that look any
better?

Assuming we can agree on the relevance of adding that functionality at
the IOMMU API level, maybe we can create a separate .c file to share code
between arm-smmu.c and arm-smmu-v3.c? Or I could even dare to add this
to the generic iommu code. What is your opinion?

> All I see here is
> abstract code with no hardware-specific details that'll have to be
> copy-pasted into other IOMMU drivers (e.g. SMMUv3), which strongly
> suggests it's the wrong place to do it.
> 
> As I understand the problem, VFIO has a generic "configure an IOMMU to
> point at an MSI doorbell" step to do in the process of attaching a
> device, which hasn't needed implementing yet due to VT-d's
> IOMMU_CAP_I_AM_ALSO_ACTUALLY_THE_MSI_CONTROLLER_IN_DISGUISE flag, which
> most of us have managed to misinterpret so far.

Maybe I misunderstood the above comment but I would say it is the
contrary: i.e. up to now, VFIO did not need to care about that issue since
MSI addresses were not mapped in the IOMMU on x86. Now they need to be,
so we need to extend an existing API, be it the VFIO external user API
or the IOMMU API. But please correct me if I misunderstood you.

Also I found it more practical to have an all-in-one API doing both the
iova allocation and the binding (dma_map_single like). The user of the API
does not have to care about the iommu page size.
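
To make that concrete, here is a minimal caller-side sketch of the
proposed helper pair (program_device_msi_address() is a made-up
placeholder for whatever writes the address into the device):

/* bind a device MSI doorbell through the proposed all-in-one API */
static int bind_msi_doorbell(struct device *dev, phys_addr_t doorbell,
			     dma_addr_t *iova)
{
	struct iommu_domain *d = iommu_get_domain_for_dev(dev);
	int ret;

	if (!d)
		return -ENODEV;

	/* allocates a reserved iova page and maps it onto the page
	 * containing the doorbell; granularity is handled internally */
	ret = iommu_get_single_reserved(d, doorbell, IOMMU_WRITE, iova);
	if (!ret)
		program_device_msi_address(dev, *iova);
	return ret;
}

and symmetrically iommu_put_single_reserved(d, iova) drops the reference
when the MSI is torn down; the mapping disappears once it reaches zero.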

Thanks for your time and looking forward to reading from you!

Best Regards

Eric

> AFAICS all the IOMMU
> driver should need to know about this is an iommu_map() call (which will
> want a slight extension[1] to make things behave properly). We should be
> fixing the abstraction to be less x86-centric, not hacking up all the
> ARM drivers to emulate x86 hardware behaviour in software.
> 
> Robin.
> 
> [1]:http://article.gmane.org/gmane.linux.kernel.cross-arch/30833
> 
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>> v2 -> v3:
>> - select IOMMU_IOVA when ARM_SMMU or ARM_SMMU_V3 is set
>>
>> v1 -> v2:
>> - formerly implemented in vfio_iommu_type1
>> ---
>>   drivers/iommu/Kconfig    |  2 ++
>>   drivers/iommu/arm-smmu.c | 87
>> +++++++++++++++++++++++++++++++++++++++---------
>>   2 files changed, 74 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
>> index a1e75cb..1106528 100644
>> --- a/drivers/iommu/Kconfig
>> +++ b/drivers/iommu/Kconfig
>> @@ -289,6 +289,7 @@ config ARM_SMMU
>>       bool "ARM Ltd. System MMU (SMMU) Support"
>>       depends on (ARM64 || ARM) && MMU
>>       select IOMMU_API
>> +    select IOMMU_IOVA
>>       select IOMMU_IO_PGTABLE_LPAE
>>       select ARM_DMA_USE_IOMMU if ARM
>>       help
>> @@ -302,6 +303,7 @@ config ARM_SMMU_V3
>>       bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
>>       depends on ARM64 && PCI
>>       select IOMMU_API
>> +    select IOMMU_IOVA
>>       select IOMMU_IO_PGTABLE_LPAE
>>       select GENERIC_MSI_IRQ_DOMAIN
>>       help
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index c8b7e71..f42341d 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -42,6 +42,7 @@
>>   #include <linux/platform_device.h>
>>   #include <linux/slab.h>
>>   #include <linux/spinlock.h>
>> +#include <linux/iova.h>
>>
>>   #include <linux/amba/bus.h>
>>
>> @@ -347,6 +348,9 @@ struct arm_smmu_domain {
>>       enum arm_smmu_domain_stage    stage;
>>       struct mutex            init_mutex; /* Protects smmu pointer */
>>       struct iommu_domain        domain;
>> +    struct iova_domain        *reserved_iova_domain;
>> +    /* protects reserved domain manipulation */
>> +    struct mutex            reserved_mutex;
>>   };
>>
>>   static struct iommu_ops arm_smmu_ops;
>> @@ -975,6 +979,7 @@ static struct iommu_domain
>> *arm_smmu_domain_alloc(unsigned type)
>>           return NULL;
>>
>>       mutex_init(&smmu_domain->init_mutex);
>> +    mutex_init(&smmu_domain->reserved_mutex);
>>       spin_lock_init(&smmu_domain->pgtbl_lock);
>>
>>       return &smmu_domain->domain;
>> @@ -1446,22 +1451,74 @@ out_unlock:
>>       return ret;
>>   }
>>
>> +static int arm_smmu_alloc_reserved_iova_domain(struct iommu_domain
>> *domain,
>> +                           dma_addr_t iova, size_t size,
>> +                           unsigned long order)
>> +{
>> +    unsigned long granule, mask;
>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +    int ret = 0;
>> +
>> +    granule = 1UL << order;
>> +    mask = granule - 1;
>> +    if (iova & mask || (!size) || (size & mask))
>> +        return -EINVAL;
>> +
>> +    if (smmu_domain->reserved_iova_domain)
>> +        return -EEXIST;
>> +
>> +    mutex_lock(&smmu_domain->reserved_mutex);
>> +
>> +    smmu_domain->reserved_iova_domain =
>> +        kzalloc(sizeof(struct iova_domain), GFP_KERNEL);
>> +    if (!smmu_domain->reserved_iova_domain) {
>> +        ret = -ENOMEM;
>> +        goto unlock;
>> +    }
>> +
>> +    init_iova_domain(smmu_domain->reserved_iova_domain,
>> +             granule, iova >> order, (iova + size - 1) >> order);
>> +
>> +unlock:
>> +    mutex_unlock(&smmu_domain->reserved_mutex);
>> +    return ret;
>> +}
>> +
>> +static void arm_smmu_free_reserved_iova_domain(struct iommu_domain
>> *domain)
>> +{
>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +    struct iova_domain *iovad = smmu_domain->reserved_iova_domain;
>> +
>> +    if (!iovad)
>> +        return;
>> +
>> +    mutex_lock(&smmu_domain->reserved_mutex);
>> +
>> +    put_iova_domain(iovad);
>> +    kfree(iovad);
>> +
>> +    mutex_unlock(&smmu_domain->reserved_mutex);
>> +}
>> +
>>   static struct iommu_ops arm_smmu_ops = {
>> -    .capable        = arm_smmu_capable,
>> -    .domain_alloc        = arm_smmu_domain_alloc,
>> -    .domain_free        = arm_smmu_domain_free,
>> -    .attach_dev        = arm_smmu_attach_dev,
>> -    .detach_dev        = arm_smmu_detach_dev,
>> -    .map            = arm_smmu_map,
>> -    .unmap            = arm_smmu_unmap,
>> -    .map_sg            = default_iommu_map_sg,
>> -    .iova_to_phys        = arm_smmu_iova_to_phys,
>> -    .add_device        = arm_smmu_add_device,
>> -    .remove_device        = arm_smmu_remove_device,
>> -    .device_group        = arm_smmu_device_group,
>> -    .domain_get_attr    = arm_smmu_domain_get_attr,
>> -    .domain_set_attr    = arm_smmu_domain_set_attr,
>> -    .pgsize_bitmap        = -1UL, /* Restricted during device attach */
>> +    .capable            = arm_smmu_capable,
>> +    .domain_alloc            = arm_smmu_domain_alloc,
>> +    .domain_free            = arm_smmu_domain_free,
>> +    .attach_dev            = arm_smmu_attach_dev,
>> +    .detach_dev            = arm_smmu_detach_dev,
>> +    .map                = arm_smmu_map,
>> +    .unmap                = arm_smmu_unmap,
>> +    .map_sg                = default_iommu_map_sg,
>> +    .iova_to_phys            = arm_smmu_iova_to_phys,
>> +    .add_device            = arm_smmu_add_device,
>> +    .remove_device            = arm_smmu_remove_device,
>> +    .device_group            = arm_smmu_device_group,
>> +    .domain_get_attr        = arm_smmu_domain_get_attr,
>> +    .domain_set_attr        = arm_smmu_domain_set_attr,
>> +    .alloc_reserved_iova_domain    =
>> arm_smmu_alloc_reserved_iova_domain,
>> +    .free_reserved_iova_domain    = arm_smmu_free_reserved_iova_domain,
>> +    /* Page size bitmap, restricted during device attach */
>> +    .pgsize_bitmap            = -1UL,
>>   };
>>
>>   static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
>>
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC v3 02/15] vfio: expose MSI mapping requirement through VFIO_IOMMU_GET_INFO
  2016-02-18  9:34   ` Marc Zyngier
@ 2016-02-18 15:26     ` Eric Auger
  0 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2016-02-18 15:26 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: eric.auger, alex.williamson, will.deacon, joro, tglx, jason,
	christoffer.dall, linux-arm-kernel, kvmarm, kvm,
	suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	sherry.hurwitz, brijesh.singh, leo.duran, Thomas.Lendacky

Hi Marc,
On 02/18/2016 10:34 AM, Marc Zyngier wrote:
> On Fri, 12 Feb 2016 08:13:04 +0000
> Eric Auger <eric.auger@linaro.org> wrote:
> 
>> This patch allows the user-space to retrieve whether msi write
>> transaction addresses must be mapped. This is returned through the
>> VFIO_IOMMU_GET_INFO API and its new flag: VFIO_IOMMU_INFO_REQUIRE_MSI_MAP.
>>
>> Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>>
>> RFC v1 -> v1:
>> - derived from
>>   [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
>> - renamed allow_msi_reconfig into require_msi_mapping
>> - fixed VFIO_IOMMU_GET_INFO
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 26 ++++++++++++++++++++++++++
>>  include/uapi/linux/vfio.h       |  1 +
>>  2 files changed, 27 insertions(+)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 6f1ea3d..c5b57e1 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -255,6 +255,29 @@ static int vaddr_get_pfn(unsigned long vaddr, int prot, unsigned long *pfn)
>>  }
>>  
>>  /*
>> + * vfio_domains_require_msi_mapping: indicates whether MSI write transaction
>> + * addresses must be mapped
>> + *
>> + * returns true if it does
>> + */
>> +static bool vfio_domains_require_msi_mapping(struct vfio_iommu *iommu)
>> +{
>> +	struct vfio_domain *d;
>> +	bool ret;
>> +
>> +	mutex_lock(&iommu->lock);
>> +	/* All domains have same require_msi_map property, pick first */
>> +	d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
>> +	if (iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_MAPPING, NULL) < 0)
>> +		ret = false;
>> +	else
>> +		ret = true;
> 
> nit: this could be simplified as:
> 
> ret = (iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_MAPPING, NULL) == 0);
sure ;-)
> 
>> +	mutex_unlock(&iommu->lock);
>> +
>> +	return ret;
>> +}
>> +
>> +/*
>>   * Attempt to pin pages.  We really don't want to track all the pfns and
>>   * the iommu can only map chunks of consecutive pfns anyway, so get the
>>   * first page and all consecutive pages with the same locking.
>> @@ -997,6 +1020,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>  
>>  		info.flags = VFIO_IOMMU_INFO_PGSIZES;
>>  
>> +		if (vfio_domains_require_msi_mapping(iommu))
>> +			info.flags |= VFIO_IOMMU_INFO_REQUIRE_MSI_MAP;
>> +
>>  		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
>>  
>>  		return copy_to_user((void __user *)arg, &info, minsz);
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 7d7a4c6..43e183b 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -400,6 +400,7 @@ struct vfio_iommu_type1_info {
>>  	__u32	argsz;
>>  	__u32	flags;
>>  #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
>> +#define VFIO_IOMMU_INFO_REQUIRE_MSI_MAP (1 << 1)/* MSI must be mapped */
>>  	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
>>  };
>>  
> 
> 
> FWIW:
> 
> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
thanks

Eric
> 
> 	M.
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC v3 15/15] irqchip/gicv2m/v3-its-pci-msi: IOMMU map the MSI frame when needed
  2016-02-18 11:33   ` Marc Zyngier
@ 2016-02-18 15:33     ` Eric Auger
  2016-02-18 15:47       ` Marc Zyngier
  0 siblings, 1 reply; 29+ messages in thread
From: Eric Auger @ 2016-02-18 15:33 UTC (permalink / raw)
  To: Marc Zyngier, leo.duran
  Cc: eric.auger, alex.williamson, will.deacon, joro, tglx, jason,
	christoffer.dall, linux-arm-kernel, kvmarm, kvm,
	suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	sherry.hurwitz, brijesh.singh, Thomas.Lendacky

Hi Marc,
On 02/18/2016 12:33 PM, Marc Zyngier wrote:
> On Fri, 12 Feb 2016 08:13:17 +0000
> Eric Auger <eric.auger@linaro.org> wrote:
> 
>> In case the msi_desc references a device attached to an iommu
>> domain, the msi address needs to be mapped in the IOMMU. Else any
>> MSI write transaction will cause a fault.
>>
>> gic_set_msi_addr detects that case and allocates an iova bound
>> to the physical address page comprising the MSI frame. This iova
>> then is used as the msi_msg address. Unset operation decrements the
>> reference on the binding.
>>
>> The functions are called in the irq_write_msi_msg ops implementation.
>> At that time we can recognize whether the msi is setup or teared down
>> looking at the msi_msg content. Indeed msi_domain_deactivate zeroes all
>> the fields.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>>
>> v2 -> v3:
>> - protect iova/addr manipulation with CONFIG_ARCH_DMA_ADDR_T_64BIT and
>>   CONFIG_PHYS_ADDR_T_64BIT
>> - only expose gic_pci_msi_domain_write_msg in case CONFIG_IOMMU_API &
>>   CONFIG_PCI_MSI_IRQ_DOMAIN are set.
>> - gic_set/unset_msi_addr duly become static
>> ---
>>  drivers/irqchip/irq-gic-common.c         | 69 ++++++++++++++++++++++++++++++++
>>  drivers/irqchip/irq-gic-common.h         |  5 +++
>>  drivers/irqchip/irq-gic-v2m.c            |  7 +++-
>>  drivers/irqchip/irq-gic-v3-its-pci-msi.c |  5 +++
>>  4 files changed, 85 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/irqchip/irq-gic-common.c b/drivers/irqchip/irq-gic-common.c
>> index f174ce0..46cd06c 100644
>> --- a/drivers/irqchip/irq-gic-common.c
>> +++ b/drivers/irqchip/irq-gic-common.c
>> @@ -18,6 +18,8 @@
>>  #include <linux/io.h>
>>  #include <linux/irq.h>
>>  #include <linux/irqchip/arm-gic.h>
>> +#include <linux/iommu.h>
>> +#include <linux/msi.h>
>>  
>>  #include "irq-gic-common.h"
>>  
>> @@ -121,3 +123,70 @@ void gic_cpu_config(void __iomem *base, void (*sync_access)(void))
>>  	if (sync_access)
>>  		sync_access();
>>  }
>> +
>> +#if defined(CONFIG_IOMMU_API) && defined(CONFIG_PCI_MSI_IRQ_DOMAIN)
>> +static int gic_set_msi_addr(struct irq_data *data, struct msi_msg *msg)
>> +{
>> +	struct msi_desc *desc = irq_data_get_msi_desc(data);
>> +	struct device *dev = msi_desc_to_dev(desc);
>> +	struct iommu_domain *d;
>> +	phys_addr_t addr;
>> +	dma_addr_t iova;
>> +	int ret;
>> +
>> +	d = iommu_get_domain_for_dev(dev);
>> +	if (!d)
>> +		return 0;
>> +#ifdef CONFIG_PHYS_ADDR_T_64BIT
>> +	addr = ((phys_addr_t)(msg->address_hi) << 32) | msg->address_lo;
>> +#else
>> +	addr = msg->address_lo;
>> +#endif
>> +
>> +	ret = iommu_get_single_reserved(d, addr, IOMMU_WRITE, &iova);
>> +
>> +	if (!ret) {
>> +		msg->address_lo = lower_32_bits(iova);
>> +		msg->address_hi = upper_32_bits(iova);
>> +	}
>> +	return ret;
>> +}
>> +
>> +
>> +static void gic_unset_msi_addr(struct irq_data *data)
>> +{
>> +	struct msi_desc *desc = irq_data_get_msi_desc(data);
>> +	struct device *dev;
>> +	struct iommu_domain *d;
>> +	dma_addr_t iova;
>> +
>> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
>> +	iova = ((dma_addr_t)(desc->msg.address_hi) << 32) |
>> +		desc->msg.address_lo;
>> +#else
>> +	iova = desc->msg.address_lo;
>> +#endif
>> +
>> +	dev = msi_desc_to_dev(desc);
>> +	if (!dev)
>> +		return;
>> +
>> +	d = iommu_get_domain_for_dev(dev);
>> +	if (!d)
>> +		return;
>> +
>> +	iommu_put_single_reserved(d, iova);
>> +}
>> +
>> +void gic_pci_msi_domain_write_msg(struct irq_data *irq_data,
>> +				  struct msi_msg *msg)
>> +{
>> +	if (!msg->address_hi && !msg->address_lo && !msg->data)
>> +		gic_unset_msi_addr(irq_data); /* deactivate */
>> +	else
>> +		gic_set_msi_addr(irq_data, msg); /* activate, set_affinity */
>> +
>> +	pci_msi_domain_write_msg(irq_data, msg);
>> +}
> 
> So by doing that, you are specializing this infrastructure to PCI.
> If you hijacked irq_compose_msi_msg() instead, you'd have both
> platform and PCI MSI for the same price.
> 
> I can see a potential problem with the teardown of an MSI (I don't
> think the compose method is called on teardown), but I think this could
> be easily addressed.
Yes, effectively this is the reason why I moved it from
irq_compose_msi_msg (my original implementation) to irq_write_msi_msg. I
noticed I had no way to detect the teardown, whereas msi_domain_deactivate
also calls irq_write_msi_msg, which is quite practical ;-) To be honest I
need to look further at the non-PCI implementation.


> 
>> +#endif
>> +
>> diff --git a/drivers/irqchip/irq-gic-common.h b/drivers/irqchip/irq-gic-common.h
>> index fff697d..98681fd 100644
>> --- a/drivers/irqchip/irq-gic-common.h
>> +++ b/drivers/irqchip/irq-gic-common.h
>> @@ -35,4 +35,9 @@ void gic_cpu_config(void __iomem *base, void (*sync_access)(void));
>>  void gic_enable_quirks(u32 iidr, const struct gic_quirk *quirks,
>>  		void *data);
>>  
>> +#if defined(CONFIG_PCI_MSI_IRQ_DOMAIN) && defined(CONFIG_IOMMU_API)
>> +void gic_pci_msi_domain_write_msg(struct irq_data *irq_data,
>> +				  struct msi_msg *msg);
>> +#endif
>> +
>>  #endif /* _IRQ_GIC_COMMON_H */
>> diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c
>> index c779f83..692d809 100644
>> --- a/drivers/irqchip/irq-gic-v2m.c
>> +++ b/drivers/irqchip/irq-gic-v2m.c
>> @@ -24,6 +24,7 @@
>>  #include <linux/of_pci.h>
>>  #include <linux/slab.h>
>>  #include <linux/spinlock.h>
>> +#include "irq-gic-common.h"
>>  
>>  /*
>>  * MSI_TYPER:
>> @@ -83,7 +84,11 @@ static struct irq_chip gicv2m_msi_irq_chip = {
>>  	.irq_mask		= gicv2m_mask_msi_irq,
>>  	.irq_unmask		= gicv2m_unmask_msi_irq,
>>  	.irq_eoi		= irq_chip_eoi_parent,
>> -	.irq_write_msi_msg	= pci_msi_domain_write_msg,
>> +#ifdef CONFIG_IOMMU_API
>> +	.irq_write_msi_msg	= gic_pci_msi_domain_write_msg,
>> +#else
>> +	.irq_write_msi_msg      = pci_msi_domain_write_msg,
>> +#endif
> 
> Irrespective of the way you implement the translation procedure, you
> should make this unconditional, and have the #ifdefery in the code that
> implements it.

OK

Thanks

Eric
> 
>>  };
>>  
>>  static struct msi_domain_info gicv2m_msi_domain_info = {
>> diff --git a/drivers/irqchip/irq-gic-v3-its-pci-msi.c b/drivers/irqchip/irq-gic-v3-its-pci-msi.c
>> index 8223765..690504e 100644
>> --- a/drivers/irqchip/irq-gic-v3-its-pci-msi.c
>> +++ b/drivers/irqchip/irq-gic-v3-its-pci-msi.c
>> @@ -19,6 +19,7 @@
>>  #include <linux/of.h>
>>  #include <linux/of_irq.h>
>>  #include <linux/of_pci.h>
>> +#include "irq-gic-common.h"
>>  
>>  static void its_mask_msi_irq(struct irq_data *d)
>>  {
>> @@ -37,7 +38,11 @@ static struct irq_chip its_msi_irq_chip = {
>>  	.irq_unmask		= its_unmask_msi_irq,
>>  	.irq_mask		= its_mask_msi_irq,
>>  	.irq_eoi		= irq_chip_eoi_parent,
>> +#ifdef CONFIG_IOMMU_API
>> +	.irq_write_msi_msg	= gic_pci_msi_domain_write_msg,
>> +#else
>>  	.irq_write_msi_msg	= pci_msi_domain_write_msg,
>> +#endif
>>  };
>>  
>>  struct its_pci_alias {
> 
> Thanks,
> 
> 	M.
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC v3 15/15] irqchip/gicv2m/v3-its-pci-msi: IOMMU map the MSI frame when needed
  2016-02-18 15:33     ` Eric Auger
@ 2016-02-18 15:47       ` Marc Zyngier
  2016-02-18 16:58         ` Eric Auger
  0 siblings, 1 reply; 29+ messages in thread
From: Marc Zyngier @ 2016-02-18 15:47 UTC (permalink / raw)
  To: Eric Auger, leo.duran
  Cc: eric.auger, alex.williamson, will.deacon, joro, tglx, jason,
	christoffer.dall, linux-arm-kernel, kvmarm, kvm,
	suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	sherry.hurwitz, brijesh.singh, Thomas.Lendacky

On 18/02/16 15:33, Eric Auger wrote:
> Hi Marc,
> On 02/18/2016 12:33 PM, Marc Zyngier wrote:
>> On Fri, 12 Feb 2016 08:13:17 +0000
>> Eric Auger <eric.auger@linaro.org> wrote:
>>
>>> In case the msi_desc references a device attached to an iommu
>>> domain, the msi address needs to be mapped in the IOMMU. Else any
>>> MSI write transaction will cause a fault.
>>>
>>> gic_set_msi_addr detects that case and allocates an iova bound
>>> to the physical address page comprising the MSI frame. This iova
>>> then is used as the msi_msg address. Unset operation decrements the
>>> reference on the binding.
>>>
>>> The functions are called in the irq_write_msi_msg ops implementation.
>>> At that time we can recognize whether the msi is setup or teared down
>>> looking at the msi_msg content. Indeed msi_domain_deactivate zeroes all
>>> the fields.
>>>
>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>
>>> ---
>>>
>>> v2 -> v3:
>>> - protect iova/addr manipulation with CONFIG_ARCH_DMA_ADDR_T_64BIT and
>>>   CONFIG_PHYS_ADDR_T_64BIT
>>> - only expose gic_pci_msi_domain_write_msg in case CONFIG_IOMMU_API &
>>>   CONFIG_PCI_MSI_IRQ_DOMAIN are set.
>>> - gic_set/unset_msi_addr duly become static
>>> ---
>>>  drivers/irqchip/irq-gic-common.c         | 69 ++++++++++++++++++++++++++++++++
>>>  drivers/irqchip/irq-gic-common.h         |  5 +++
>>>  drivers/irqchip/irq-gic-v2m.c            |  7 +++-
>>>  drivers/irqchip/irq-gic-v3-its-pci-msi.c |  5 +++
>>>  4 files changed, 85 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/irqchip/irq-gic-common.c b/drivers/irqchip/irq-gic-common.c
>>> index f174ce0..46cd06c 100644
>>> --- a/drivers/irqchip/irq-gic-common.c
>>> +++ b/drivers/irqchip/irq-gic-common.c
>>> @@ -18,6 +18,8 @@
>>>  #include <linux/io.h>
>>>  #include <linux/irq.h>
>>>  #include <linux/irqchip/arm-gic.h>
>>> +#include <linux/iommu.h>
>>> +#include <linux/msi.h>
>>>  
>>>  #include "irq-gic-common.h"
>>>  
>>> @@ -121,3 +123,70 @@ void gic_cpu_config(void __iomem *base, void (*sync_access)(void))
>>>  	if (sync_access)
>>>  		sync_access();
>>>  }
>>> +
>>> +#if defined(CONFIG_IOMMU_API) && defined(CONFIG_PCI_MSI_IRQ_DOMAIN)
>>> +static int gic_set_msi_addr(struct irq_data *data, struct msi_msg *msg)
>>> +{
>>> +	struct msi_desc *desc = irq_data_get_msi_desc(data);
>>> +	struct device *dev = msi_desc_to_dev(desc);
>>> +	struct iommu_domain *d;
>>> +	phys_addr_t addr;
>>> +	dma_addr_t iova;
>>> +	int ret;
>>> +
>>> +	d = iommu_get_domain_for_dev(dev);
>>> +	if (!d)
>>> +		return 0;
>>> +#ifdef CONFIG_PHYS_ADDR_T_64BIT
>>> +	addr = ((phys_addr_t)(msg->address_hi) << 32) | msg->address_lo;
>>> +#else
>>> +	addr = msg->address_lo;
>>> +#endif
>>> +
>>> +	ret = iommu_get_single_reserved(d, addr, IOMMU_WRITE, &iova);
>>> +
>>> +	if (!ret) {
>>> +		msg->address_lo = lower_32_bits(iova);
>>> +		msg->address_hi = upper_32_bits(iova);
>>> +	}
>>> +	return ret;
>>> +}
>>> +
>>> +
>>> +static void gic_unset_msi_addr(struct irq_data *data)
>>> +{
>>> +	struct msi_desc *desc = irq_data_get_msi_desc(data);
>>> +	struct device *dev;
>>> +	struct iommu_domain *d;
>>> +	dma_addr_t iova;
>>> +
>>> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
>>> +	iova = ((dma_addr_t)(desc->msg.address_hi) << 32) |
>>> +		desc->msg.address_lo;
>>> +#else
>>> +	iova = desc->msg.address_lo;
>>> +#endif
>>> +
>>> +	dev = msi_desc_to_dev(desc);
>>> +	if (!dev)
>>> +		return;
>>> +
>>> +	d = iommu_get_domain_for_dev(dev);
>>> +	if (!d)
>>> +		return;
>>> +
>>> +	iommu_put_single_reserved(d, iova);
>>> +}
>>> +
>>> +void gic_pci_msi_domain_write_msg(struct irq_data *irq_data,
>>> +				  struct msi_msg *msg)
>>> +{
>>> +	if (!msg->address_hi && !msg->address_lo && !msg->data)
>>> +		gic_unset_msi_addr(irq_data); /* deactivate */
>>> +	else
>>> +		gic_set_msi_addr(irq_data, msg); /* activate, set_affinity */
>>> +
>>> +	pci_msi_domain_write_msg(irq_data, msg);
>>> +}
>>
>> So by doing that, you are specializing this infrastructure to PCI.
>> If you hijacked irq_compose_msi_msg() instead, you'd have both
>> platform and PCI MSI for the same price.
>>
>> I can see a potential problem with the teardown of an MSI (I don't
>> think the compose method is called on teardown), but I think this could
>> be easily addressed.
> Yes effectively this is the reason why I moved it from
> irq_compose_msi_msg (my original implementation) to irq_write_msi_msg. I
> noticed I had no way to detect the teardown whereas the
> msi_domain_deactivate also calls irq_write_msi_msg which is quite
> practical ;-) To be honest I need to further look at the non PCI
> implementation.

Another thing to consider is that MSI controllers may use different
doorbells for different CPU affinities. With your implementation,
repeatedly changing the affinity from one CPU to another would increase
the refcounting, and never actually lower it (you don't necessarily go
via an "unmap"). Of course, none of that applies to GICv2m/GICv3-ITS,
but that's worth considering.

So I think we may need some better tracking of the IOVA we program in
the device, and offer a generic infrastructure for this instead of
hiding it in the MSI controller drivers.
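
As a purely illustrative sketch of what I mean by tracking (the per-irq
"tracked_iova" state below is hypothetical, not something this series
provides), the write path could drop the reference taken by the previous
write, so each irq only ever holds one reference on its current doorbell:

#include <linux/iommu.h>
#include <linux/msi.h>

static int track_msi_doorbell(struct irq_data *data, struct msi_msg *msg,
			      dma_addr_t *tracked_iova)
{
	struct device *dev = msi_desc_to_dev(irq_data_get_msi_desc(data));
	struct iommu_domain *d = iommu_get_domain_for_dev(dev);
	phys_addr_t addr;
	dma_addr_t iova;
	int ret;

	if (!d)
		return 0;

	addr = ((u64)msg->address_hi << 32) | msg->address_lo;
	ret = iommu_get_single_reserved(d, addr, IOMMU_WRITE, &iova);
	if (ret)
		return ret;

	/* release the binding taken for the previously programmed doorbell */
	if (*tracked_iova)
		iommu_put_single_reserved(d, *tracked_iova);
	*tracked_iova = iova;

	msg->address_lo = lower_32_bits(iova);
	msg->address_hi = upper_32_bits(iova);
	return 0;
}

The deactivate path would then simply drop the last tracked reference.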

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC v3 05/15] iommu/arm-smmu: implement alloc/free_reserved_iova_domain
  2016-02-18 11:09   ` Robin Murphy
  2016-02-18 15:22     ` Eric Auger
@ 2016-02-18 16:06     ` Alex Williamson
  1 sibling, 0 replies; 29+ messages in thread
From: Alex Williamson @ 2016-02-18 16:06 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Eric Auger, eric.auger, will.deacon, joro, tglx, jason,
	marc.zyngier, christoffer.dall, linux-arm-kernel, kvmarm, kvm,
	Thomas.Lendacky, brijesh.singh, patches, Manish.Jaggi, p.fedin,
	linux-kernel, iommu, pranav.sawargaonkar, sherry.hurwitz

On Thu, 18 Feb 2016 11:09:17 +0000
Robin Murphy <robin.murphy@arm.com> wrote:

> Hi Eric,
> 
> On 12/02/16 08:13, Eric Auger wrote:
> > Implement alloc/free_reserved_iova_domain for arm-smmu. We use
> > the iova allocator (iova.c). The iova_domain is attached to the
> > arm_smmu_domain struct. A mutex is introduced to protect it.  
> 
> The IOMMU API currently leaves IOVA management entirely up to the caller 
> - VFIO is already managing its own IOVA space, so what warrants this 
> being pushed all the way down to the IOMMU driver? All I see here is 
> abstract code with no hardware-specific details that'll have to be 
> copy-pasted into other IOMMU drivers (e.g. SMMUv3), which strongly 
> suggests it's the wrong place to do it.
> 
> As I understand the problem, VFIO has a generic "configure an IOMMU to 
> point at an MSI doorbell" step to do in the process of attaching a 
> device, which hasn't needed implementing yet due to VT-d's 
> IOMMU_CAP_I_AM_ALSO_ACTUALLY_THE_MSI_CONTROLLER_IN_DISGUISE flag, which 
> most of us have managed to misinterpret so far. AFAICS all the IOMMU 
> driver should need to know about this is an iommu_map() call (which will 
> want a slight extension[1] to make things behave properly). We should be 
> fixing the abstraction to be less x86-centric, not hacking up all the 
> ARM drivers to emulate x86 hardware behaviour in software.

The gap I see, the one that the I_AM_ALSO_ACTUALLY_THE_MSI...
solution transparently fixes, is that there's no connection between
pci_enable_msi{x}_range and the IOMMU API.  If I want to allow a device
managed by an IOMMU API domain to perform MSI, I need to go scrape the
MSI vectors out of the device, set up a translation into my IOVA space,
and re-write those vectors.  Not to mention that as an end user, I
have no idea what might be sharing the page those vectors target, nor
what I might be allowing the user DMA access to.  MSI
setup necessarily makes use of the IOVA space of the device, so
there's clearly an opportunity to interact with the IOMMU API to manage
that IOVA usage.  x86 has an implicit range of IOVA space for MSI; this
makes it an explicit range, reserved by the IOMMU API user for this
purpose.  At the vfio level, I just want to be able to call the PCI
MSI/X setup routines and have them automatically program vectors that
make use of IOVA space that I've already marked reserved for this
purpose.  I don't see how that's x86-centric, other than that x86 has
already managed to make this transparent and spoiled users into expecting
working IOVAs on the device after using the standard MSI vector setup
callbacks.  That's the goal I'm looking for.  Thanks,

Alex
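
For illustration, the reservation step Alex describes (marking an IOVA window
as reserved for MSI before the standard MSI/X setup runs) might look like this
from user space; the flag name and value below are assumptions made for the
sketch, not taken from the posted patches:

#include <linux/vfio.h>
#include <sys/ioctl.h>

/* Hypothetical flag: name and value are assumptions for this sketch. */
#define VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA	(1 << 2)

/*
 * Reserve an IOVA window for MSI doorbell mappings.  No vaddr is given:
 * nothing here is backed by user memory; the kernel carves the doorbell
 * mappings out of this window when the device's vectors are programmed.
 */
static int reserve_msi_iova(int container_fd, __u64 iova, __u64 size)
{
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA,
		.iova  = iova,
		.size  = size,
	};

	return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}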

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC v3 07/15] iommu: iommu_get/put_single_reserved
  2016-02-18 11:06   ` Marc Zyngier
@ 2016-02-18 16:42     ` Eric Auger
  2016-02-18 16:51       ` Marc Zyngier
  0 siblings, 1 reply; 29+ messages in thread
From: Eric Auger @ 2016-02-18 16:42 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: eric.auger, alex.williamson, will.deacon, joro, tglx, jason,
	christoffer.dall, linux-arm-kernel, kvmarm, kvm,
	suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	sherry.hurwitz, brijesh.singh, leo.duran, Thomas.Lendacky

Hello,
On 02/18/2016 12:06 PM, Marc Zyngier wrote:
> On Fri, 12 Feb 2016 08:13:09 +0000
> Eric Auger <eric.auger@linaro.org> wrote:
> 
>> This patch introduces iommu_get/put_single_reserved.
>>
>> iommu_get_single_reserved allows to allocate a new reserved iova page
>> and map it onto the physical page that contains a given physical address.
>> It returns the iova that is mapped onto the provided physical address.
>> Hence the physical address passed in argument does not need to be aligned.
>>
>> In case a mapping already exists between both pages, the IOVA mapped
>> to the PA is directly returned.
>>
>> Each time an iova is successfully returned a binding ref count is
>> incremented.
>>
>> iommu_put_single_reserved decrements the ref count and when this latter
>> is null, the mapping is destroyed and the iova is released.
> 
> I wonder if there is a requirement for the caller to find out about the
> size of the mapping, or to impose a given size... MSIs clearly do not
> have that requirement (this is always a 32-bit value), but since
> allocations usually pair address and size, I thought I'd ask...
Yes. Currently this only makes sure the host PA is mapped and returns
the corresponding IOVA. It is part of the discussion we need to have on
the API, besides the question of which API it should belong to.
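
Based only on the semantics described in the commit message, the intended
pairing on the caller side would look roughly like this (a sketch; everything
except the two proposed functions is made up):

#include <linux/iommu.h>

/*
 * Caller-side view of the proposed semantics: the page containing the
 * doorbell PA is mapped into the reserved IOVA window (or an existing
 * mapping's reference count is bumped), and the binding is dropped when
 * the MSI is torn down.
 */
static int map_msi_doorbell(struct iommu_domain *d, phys_addr_t doorbell_pa,
			    dma_addr_t *iova)
{
	/* *iova points at doorbell_pa; doorbell_pa need not be page aligned. */
	return iommu_get_single_reserved(d, doorbell_pa, IOMMU_WRITE, iova);
}

static void unmap_msi_doorbell(struct iommu_domain *d, dma_addr_t iova)
{
	/* Mapping and IOVA are freed only when the last reference goes. */
	iommu_put_single_reserved(d, iova);
}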

Thanks

Eric
> 
> Thanks,
> 
> 	M.
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC v3 07/15] iommu: iommu_get/put_single_reserved
  2016-02-18 16:42     ` Eric Auger
@ 2016-02-18 16:51       ` Marc Zyngier
  2016-02-18 17:18         ` Eric Auger
  0 siblings, 1 reply; 29+ messages in thread
From: Marc Zyngier @ 2016-02-18 16:51 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, alex.williamson, will.deacon, joro, tglx, jason,
	christoffer.dall, linux-arm-kernel, kvmarm, kvm,
	suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	sherry.hurwitz, brijesh.singh, leo.duran, Thomas.Lendacky

On 18/02/16 16:42, Eric Auger wrote:
> Hello,
> On 02/18/2016 12:06 PM, Marc Zyngier wrote:
>> On Fri, 12 Feb 2016 08:13:09 +0000
>> Eric Auger <eric.auger@linaro.org> wrote:
>>
>>> This patch introduces iommu_get/put_single_reserved.
>>>
>>> iommu_get_single_reserved allows to allocate a new reserved iova page
>>> and map it onto the physical page that contains a given physical address.
>>> It returns the iova that is mapped onto the provided physical address.
>>> Hence the physical address passed in argument does not need to be aligned.
>>>
>>> In case a mapping already exists between both pages, the IOVA mapped
>>> to the PA is directly returned.
>>>
>>> Each time an iova is successfully returned a binding ref count is
>>> incremented.
>>>
>>> iommu_put_single_reserved decrements the ref count and when this latter
>>> is null, the mapping is destroyed and the iova is released.
>>
>> I wonder if there is a requirement for the caller to find out about the
>> size of the mapping, or to impose a given size... MSIs clearly do not
>> have that requirement (this is always a 32-bit value), but since
>> allocations usually pair address and size, I thought I'd ask...
> Yes. Currently this only makes sure the host PA is mapped and returns
> the corresponding IOVA. It is part of the discussion we need to have on
> the API, besides the question of which API it should belong to.

One of the issues I have with the API at the moment is that there is no
control on the page size. Imagine you have allocated a 4kB IOVA window
for your MSI, but your IOMMU can only map 64kB (not unreasonable to
imagine on arm64). What happens then?

Somehow, userspace should be told about it, one way or another.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC v3 15/15] irqchip/gicv2m/v3-its-pci-msi: IOMMU map the MSI frame when needed
  2016-02-18 15:47       ` Marc Zyngier
@ 2016-02-18 16:58         ` Eric Auger
  0 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2016-02-18 16:58 UTC (permalink / raw)
  To: Marc Zyngier, leo.duran
  Cc: eric.auger, alex.williamson, will.deacon, joro, tglx, jason,
	christoffer.dall, linux-arm-kernel, kvmarm, kvm,
	suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	sherry.hurwitz, brijesh.singh, Thomas.Lendacky

Hi Marc,
On 02/18/2016 04:47 PM, Marc Zyngier wrote:
> On 18/02/16 15:33, Eric Auger wrote:
>> Hi Marc,
>> On 02/18/2016 12:33 PM, Marc Zyngier wrote:
>>> On Fri, 12 Feb 2016 08:13:17 +0000
>>> Eric Auger <eric.auger@linaro.org> wrote:
>>>
>>>> In case the msi_desc references a device attached to an iommu
>>>> domain, the msi address needs to be mapped in the IOMMU. Else any
>>>> MSI write transaction will cause a fault.
>>>>
>>>> gic_set_msi_addr detects that case and allocates an iova bound
>>>> to the physical address page comprising the MSI frame. This iova
>>>> then is used as the msi_msg address. Unset operation decrements the
>>>> reference on the binding.
>>>>
>>>> The functions are called in the irq_write_msi_msg ops implementation.
>>>> At that time we can recognize whether the MSI is being set up or torn
>>>> down by looking at the msi_msg content. Indeed msi_domain_deactivate zeroes all
>>>> the fields.
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>
>>>> ---
>>>>
>>>> v2 -> v3:
>>>> - protect iova/addr manipulation with CONFIG_ARCH_DMA_ADDR_T_64BIT and
>>>>   CONFIG_PHYS_ADDR_T_64BIT
>>>> - only expose gic_pci_msi_domain_write_msg in case CONFIG_IOMMU_API &
>>>>   CONFIG_PCI_MSI_IRQ_DOMAIN are set.
>>>> - gic_set/unset_msi_addr duly become static
>>>> ---
>>>>  drivers/irqchip/irq-gic-common.c         | 69 ++++++++++++++++++++++++++++++++
>>>>  drivers/irqchip/irq-gic-common.h         |  5 +++
>>>>  drivers/irqchip/irq-gic-v2m.c            |  7 +++-
>>>>  drivers/irqchip/irq-gic-v3-its-pci-msi.c |  5 +++
>>>>  4 files changed, 85 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/irqchip/irq-gic-common.c b/drivers/irqchip/irq-gic-common.c
>>>> index f174ce0..46cd06c 100644
>>>> --- a/drivers/irqchip/irq-gic-common.c
>>>> +++ b/drivers/irqchip/irq-gic-common.c
>>>> @@ -18,6 +18,8 @@
>>>>  #include <linux/io.h>
>>>>  #include <linux/irq.h>
>>>>  #include <linux/irqchip/arm-gic.h>
>>>> +#include <linux/iommu.h>
>>>> +#include <linux/msi.h>
>>>>  
>>>>  #include "irq-gic-common.h"
>>>>  
>>>> @@ -121,3 +123,70 @@ void gic_cpu_config(void __iomem *base, void (*sync_access)(void))
>>>>  	if (sync_access)
>>>>  		sync_access();
>>>>  }
>>>> +
>>>> +#if defined(CONFIG_IOMMU_API) && defined(CONFIG_PCI_MSI_IRQ_DOMAIN)
>>>> +static int gic_set_msi_addr(struct irq_data *data, struct msi_msg *msg)
>>>> +{
>>>> +	struct msi_desc *desc = irq_data_get_msi_desc(data);
>>>> +	struct device *dev = msi_desc_to_dev(desc);
>>>> +	struct iommu_domain *d;
>>>> +	phys_addr_t addr;
>>>> +	dma_addr_t iova;
>>>> +	int ret;
>>>> +
>>>> +	d = iommu_get_domain_for_dev(dev);
>>>> +	if (!d)
>>>> +		return 0;
>>>> +#ifdef CONFIG_PHYS_ADDR_T_64BIT
>>>> +	addr = ((phys_addr_t)(msg->address_hi) << 32) | msg->address_lo;
>>>> +#else
>>>> +	addr = msg->address_lo;
>>>> +#endif
>>>> +
>>>> +	ret = iommu_get_single_reserved(d, addr, IOMMU_WRITE, &iova);
>>>> +
>>>> +	if (!ret) {
>>>> +		msg->address_lo = lower_32_bits(iova);
>>>> +		msg->address_hi = upper_32_bits(iova);
>>>> +	}
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +
>>>> +static void gic_unset_msi_addr(struct irq_data *data)
>>>> +{
>>>> +	struct msi_desc *desc = irq_data_get_msi_desc(data);
>>>> +	struct device *dev;
>>>> +	struct iommu_domain *d;
>>>> +	dma_addr_t iova;
>>>> +
>>>> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
>>>> +	iova = ((dma_addr_t)(desc->msg.address_hi) << 32) |
>>>> +		desc->msg.address_lo;
>>>> +#else
>>>> +	iova = desc->msg.address_lo;
>>>> +#endif
>>>> +
>>>> +	dev = msi_desc_to_dev(desc);
>>>> +	if (!dev)
>>>> +		return;
>>>> +
>>>> +	d = iommu_get_domain_for_dev(dev);
>>>> +	if (!d)
>>>> +		return;
>>>> +
>>>> +	iommu_put_single_reserved(d, iova);
>>>> +}
>>>> +
>>>> +void gic_pci_msi_domain_write_msg(struct irq_data *irq_data,
>>>> +				  struct msi_msg *msg)
>>>> +{
>>>> +	if (!msg->address_hi && !msg->address_lo && !msg->data)
>>>> +		gic_unset_msi_addr(irq_data); /* deactivate */
>>>> +	else
>>>> +		gic_set_msi_addr(irq_data, msg); /* activate, set_affinity */
>>>> +
>>>> +	pci_msi_domain_write_msg(irq_data, msg);
>>>> +}
>>>
>>> So by doing that, you are specializing this infrastructure to PCI.
>>> If you hijacked irq_compose_msi_msg() instead, you'd have both
>>> platform and PCI MSI for the same price.
>>>
>>> I can see a potential problem with the teardown of an MSI (I don't
>>> think the compose method is called on teardown), but I think this could
>>> be easily addressed.
>> Yes, effectively this is the reason why I moved it from
>> irq_compose_msi_msg (my original implementation) to irq_write_msi_msg: I
>> noticed I had no way to detect the teardown there, whereas
>> msi_domain_deactivate also calls irq_write_msi_msg, which is quite
>> practical ;-) To be honest, I need to look further at the non-PCI
>> implementation.
> 
> Another thing to consider is that MSI controllers may use different
> doorbells for different CPU affinities.

OK thanks for pointing this out.

This is also a good confirmation that a single IOVA address is not
always sufficient (at some point we wondered if we could directly use
the MSI controller guest PA instead of having user space provide
an IOVA pool).

> With your implementation,
> repeatedly changing the affinity from one CPU to another would increase
> the refcounting, and never actually lower it (you don't necessarily go
> via an "unmap").

> Of course, none of that applies to GICv2m/GICv3-ITS,
> but that's worth considering.
> 
> So I think we may need some better tracking of the IOVA we program in
> the device, and offer a generic infrastructure for this instead of
> hiding it in the MSI controller drivers.
OK I will study that.

Thanks for your time!

Eric
> 
> Thanks,
> 
> 	M.
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC v3 07/15] iommu: iommu_get/put_single_reserved
  2016-02-18 16:51       ` Marc Zyngier
@ 2016-02-18 17:18         ` Eric Auger
  0 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2016-02-18 17:18 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: eric.auger, alex.williamson, will.deacon, joro, tglx, jason,
	christoffer.dall, linux-arm-kernel, kvmarm, kvm,
	suravee.suthikulpanit, patches, linux-kernel, Manish.Jaggi,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	sherry.hurwitz, brijesh.singh, leo.duran, Thomas.Lendacky

Hi Marc,
On 02/18/2016 05:51 PM, Marc Zyngier wrote:
> On 18/02/16 16:42, Eric Auger wrote:
>> Hello,
>> On 02/18/2016 12:06 PM, Marc Zyngier wrote:
>>> On Fri, 12 Feb 2016 08:13:09 +0000
>>> Eric Auger <eric.auger@linaro.org> wrote:
>>>
>>>> This patch introduces iommu_get/put_single_reserved.
>>>>
>>>> iommu_get_single_reserved allows to allocate a new reserved iova page
>>>> and map it onto the physical page that contains a given physical address.
>>>> It returns the iova that is mapped onto the provided physical address.
>>>> Hence the physical address passed in argument does not need to be aligned.
>>>>
>>>> In case a mapping already exists between both pages, the IOVA mapped
>>>> to the PA is directly returned.
>>>>
>>>> Each time an iova is successfully returned a binding ref count is
>>>> incremented.
>>>>
>>>> iommu_put_single_reserved decrements the ref count and when this latter
>>>> is null, the mapping is destroyed and the iova is released.
>>>
>>> I wonder if there is a requirement for the caller to find out about the
>>> size of the mapping, or to impose a given size... MSIs clearly do not
>>> have that requirement (this is always a 32-bit value), but since
>>> allocations usually pair address and size, I thought I'd ask...
>> Yes. Currently this only makes sure the host PA is mapped and returns
>> the corresponding IOVA. It is part of the discussion we need to have on
>>> the API, besides the question of which API it should belong to.
> 
> One of the issues I have with the API at the moment is that there is no
> control on the page size. Imagine you have allocated a 4kB IOVA window
> for your MSI, but your IOMMU can only map 64kB (not unreasonable to
> imagine on arm64). What happens then?
The code checks that the IOVA window size is aligned with the IOMMU page
size, so I think that case is handled at iova domain creation
(arm_smmu_alloc_reserved_iova_domain).
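
The check referred to can be pictured as the same alignment test iommu_map()
applies to ordinary mappings; a sketch, with an invented helper name and the
pgsize_bitmap taken from iommu_ops as in v4.5:

#include <linux/iommu.h>
#include <linux/kernel.h>
#include <linux/bitops.h>

/*
 * Sketch: reject a reserved IOVA window whose base or size is not a
 * multiple of the smallest page size the IOMMU can map.
 */
static int check_reserved_window(struct iommu_domain *domain,
				 dma_addr_t iova, size_t size)
{
	unsigned long min_pagesz = 1UL << __ffs(domain->ops->pgsize_bitmap);

	if (!IS_ALIGNED(iova | size, min_pagesz))
		return -EINVAL;

	return 0;
}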
> 
> Somehow, userspace should be told about it, one way or another.
I agree on that point. User space should be provided with the
information about the requested iova pool size and alignment. This is
missing in the current RFC series.
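
Part of that information is already discoverable today: the IOMMU page sizes
reported by VFIO_IOMMU_GET_INFO bound the size and alignment of any reserved
MSI window. A user-space sketch (the required pool size itself would still
need a new field or capability, which is not invented here):

#include <linux/vfio.h>
#include <sys/ioctl.h>

/*
 * The smallest bit set in iova_pgsizes is the minimum size and alignment
 * a reserved MSI window can have on this container.
 */
static __u64 msi_window_alignment(int container_fd)
{
	struct vfio_iommu_type1_info info = { .argsz = sizeof(info) };

	if (ioctl(container_fd, VFIO_IOMMU_GET_INFO, &info))
		return 0;

	if (!(info.flags & VFIO_IOMMU_INFO_PGSIZES))
		return 0;

	return info.iova_pgsizes & ~(info.iova_pgsizes - 1);
}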

Best Regards

Eric
> 
> Thanks,
> 
> 	M.
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2016-02-18 17:19 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-12  8:13 [RFC v3 00/15] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
2016-02-12  8:13 ` [RFC v3 01/15] iommu: Add DOMAIN_ATTR_MSI_MAPPING attribute Eric Auger
2016-02-12  8:13 ` [RFC v3 02/15] vfio: expose MSI mapping requirement through VFIO_IOMMU_GET_INFO Eric Auger
2016-02-18  9:34   ` Marc Zyngier
2016-02-18 15:26     ` Eric Auger
2016-02-12  8:13 ` [RFC v3 03/15] vfio: introduce VFIO_IOVA_RESERVED vfio_dma type Eric Auger
2016-02-12  8:13 ` [RFC v3 04/15] iommu: add alloc/free_reserved_iova_domain Eric Auger
2016-02-12  8:13 ` [RFC v3 05/15] iommu/arm-smmu: implement alloc/free_reserved_iova_domain Eric Auger
2016-02-18 11:09   ` Robin Murphy
2016-02-18 15:22     ` Eric Auger
2016-02-18 16:06     ` Alex Williamson
2016-02-12  8:13 ` [RFC v3 06/15] iommu/arm-smmu: add a reserved binding RB tree Eric Auger
2016-02-12  8:13 ` [RFC v3 07/15] iommu: iommu_get/put_single_reserved Eric Auger
2016-02-18 11:06   ` Marc Zyngier
2016-02-18 16:42     ` Eric Auger
2016-02-18 16:51       ` Marc Zyngier
2016-02-18 17:18         ` Eric Auger
2016-02-12  8:13 ` [RFC v3 08/15] iommu/arm-smmu: implement iommu_get/put_single_reserved Eric Auger
2016-02-12  8:13 ` [RFC v3 09/15] iommu/arm-smmu: relinquish reserved resources on domain deletion Eric Auger
2016-02-12  8:13 ` [RFC v3 10/15] vfio: allow the user to register reserved iova range for MSI mapping Eric Auger
2016-02-12  8:13 ` [RFC v3 11/15] msi: Add a new MSI_FLAG_IRQ_REMAPPING flag Eric Auger
2016-02-12  8:13 ` [RFC v3 12/15] msi: export msi_get_domain_info Eric Auger
2016-02-12  8:13 ` [RFC v3 13/15] vfio/type1: also check IRQ remapping capability at msi domain Eric Auger
2016-02-12  8:13 ` [RFC v3 14/15] iommu/arm-smmu: do not advertise IOMMU_CAP_INTR_REMAP Eric Auger
2016-02-12  8:13 ` [RFC v3 15/15] irqchip/gicv2m/v3-its-pci-msi: IOMMU map the MSI frame when needed Eric Auger
2016-02-18 11:33   ` Marc Zyngier
2016-02-18 15:33     ` Eric Auger
2016-02-18 15:47       ` Marc Zyngier
2016-02-18 16:58         ` Eric Auger
