All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 00/11] KVM PCIe/MSI passthrough on ARM/ARM64: re-design with transparent MSI mapping
@ 2016-09-27 20:48 ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-09-27 20:48 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

Following Robin's series [1] addressing MSI IOMMU mapping for devices
attached to a DMA ops domain, my previous 3 part series (v12) lost most
of its consistency. msi-iommu API role now is handled at dma-iommu level
while MSI doorbell registration API only is used for security assessment.
Also MSI layer part is not needed anymore since mapping directly is
done in the compose callback.

Here I propose an alternative approach, based on [1]. This approach
was discussed at the KVM forum with Christoffer Dall and Marc Zyngier,
and was suggested by Christoffer. The idea is we could let the iommu
layer transparently allocate MSI frame IOVAs in the holes left between
UNMANAGED iova slots, set by the iommu-api user.

This series introduces a new IOMMU domain type that allows mixing of
unmanaged and managed IOVA slots. We define an IOVA domain whose
aperture covers the GPA address range. Each time the IOMMU-API
user maps iova/pa, we reserve the IOVA range to prevent the iova
allocator from using it for MSI mapping.

This simplifies the user part which does not need anymore to provide an
IOVA aperture anymore.

The current series does not address the interrupt safety assessment,
which may be considered as a separate issue. Currently the assignemnt
is considered as unsafe, on ARM (even with a GICv3 ITS).

Please let me know what is your feeling wrt this alternative approach.

dependency:
[1] [PATCH v7 00/22] Generic DT bindings for PCI IOMMUs and ARM SMMU
http://www.spinics.net/lists/arm-kernel/msg531110.html

Best Regards

Eric

Testing:
- functional on ARM64 AMD Overdrive HW (single GICv2m frame). Lack of contexts
  prevents me from testing multiple assignment.

Git: complete series available at
https://github.com/eauger/linux/tree/generic-v7-pcie-passthru-redesign-rfc
previous: https://github.com/eauger/linux/tree/v4.7-rc7-passthrough-v12

the above branch includes a temporary patch to work around a ThunderX pci
bus reset crash (which I think unrelated to this series):
"vfio: pci: HACK! workaround thunderx pci_try_reset_bus crash"
Do not take this one for other platforms.


Eric Auger (10):
  iommu: Add iommu_domain_msi_geometry and DOMAIN_ATTR_MSI_GEOMETRY
  iommu: Introduce IOMMU_CAP_TRANSLATE_MSI capability
  iommu: Introduce IOMMU_DOMAIN_MIXED
  iommu/dma: iommu_dma_(un)map_mixed
  iommu/arm-smmu: Allow IOMMU_DOMAIN_MIXED domain allocation
  iommu: Use IOMMU_DOMAIN_MIXED typed domain when IOMMU translates MSI
  vfio/type1: Sets the IOVA window in case MSI IOVA need to be allocated
  vfio/type1: Reserve IOVAs for IOMMU_DOMAIN_MIXED domains
  iommu/arm-smmu: Do not advertise IOMMU_CAP_INTR_REMAP
  iommu/arm-smmu: Advertise IOMMU_CAP_TRANSLATE_MSI

Robin Murphy (1):
  iommu/dma: Allow MSI-only cookies

 drivers/iommu/arm-smmu-v3.c     |  8 +++-
 drivers/iommu/arm-smmu.c        |  8 +++-
 drivers/iommu/dma-iommu.c       | 91 +++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/iommu.c           | 10 ++++-
 drivers/vfio/vfio_iommu_type1.c | 48 ++++++++++++++++++----
 include/linux/dma-iommu.h       | 27 ++++++++++++
 include/linux/iommu.h           | 23 +++++++++++
 7 files changed, 203 insertions(+), 12 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 00/11] KVM PCIe/MSI passthrough on ARM/ARM64: re-design with transparent MSI mapping
@ 2016-09-27 20:48 ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-09-27 20:48 UTC (permalink / raw)
  To: eric.auger-H+wXaHxf7aLQT0dZR+AlfA,
	eric.auger.pro-Re5JQEeQqe8AvxtiuMwx3w,
	christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
	marc.zyngier-5wv7dgnIgG8, robin.murphy-5wv7dgnIgG8,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	joro-zLv9SwRftAIdnm+yROfE0A, tglx-hfZtesqFncYOwBW4kG4KsQ,
	jason-NLaQJdtUoK4Be96aLqz0jA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: drjones-H+wXaHxf7aLQT0dZR+AlfA, kvm-u79uwXL29TY76Z2rM5mHXA,
	Manish.Jaggi-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
	yehuday-eYqpPyKDWXRBDgjK7y7TUQ

Following Robin's series [1] addressing MSI IOMMU mapping for devices
attached to a DMA ops domain, my previous 3 part series (v12) lost most
of its consistency. msi-iommu API role now is handled at dma-iommu level
while MSI doorbell registration API only is used for security assessment.
Also MSI layer part is not needed anymore since mapping directly is
done in the compose callback.

Here I propose an alternative approach, based on [1]. This approach
was discussed at the KVM forum with Christoffer Dall and Marc Zyngier,
and was suggested by Christoffer. The idea is we could let the iommu
layer transparently allocate MSI frame IOVAs in the holes left between
UNMANAGED iova slots, set by the iommu-api user.

This series introduces a new IOMMU domain type that allows mixing of
unmanaged and managed IOVA slots. We define an IOVA domain whose
aperture covers the GPA address range. Each time the IOMMU-API
user maps iova/pa, we reserve the IOVA range to prevent the iova
allocator from using it for MSI mapping.

This simplifies the user part which does not need anymore to provide an
IOVA aperture anymore.

The current series does not address the interrupt safety assessment,
which may be considered as a separate issue. Currently the assignemnt
is considered as unsafe, on ARM (even with a GICv3 ITS).

Please let me know what is your feeling wrt this alternative approach.

dependency:
[1] [PATCH v7 00/22] Generic DT bindings for PCI IOMMUs and ARM SMMU
http://www.spinics.net/lists/arm-kernel/msg531110.html

Best Regards

Eric

Testing:
- functional on ARM64 AMD Overdrive HW (single GICv2m frame). Lack of contexts
  prevents me from testing multiple assignment.

Git: complete series available at
https://github.com/eauger/linux/tree/generic-v7-pcie-passthru-redesign-rfc
previous: https://github.com/eauger/linux/tree/v4.7-rc7-passthrough-v12

the above branch includes a temporary patch to work around a ThunderX pci
bus reset crash (which I think unrelated to this series):
"vfio: pci: HACK! workaround thunderx pci_try_reset_bus crash"
Do not take this one for other platforms.


Eric Auger (10):
  iommu: Add iommu_domain_msi_geometry and DOMAIN_ATTR_MSI_GEOMETRY
  iommu: Introduce IOMMU_CAP_TRANSLATE_MSI capability
  iommu: Introduce IOMMU_DOMAIN_MIXED
  iommu/dma: iommu_dma_(un)map_mixed
  iommu/arm-smmu: Allow IOMMU_DOMAIN_MIXED domain allocation
  iommu: Use IOMMU_DOMAIN_MIXED typed domain when IOMMU translates MSI
  vfio/type1: Sets the IOVA window in case MSI IOVA need to be allocated
  vfio/type1: Reserve IOVAs for IOMMU_DOMAIN_MIXED domains
  iommu/arm-smmu: Do not advertise IOMMU_CAP_INTR_REMAP
  iommu/arm-smmu: Advertise IOMMU_CAP_TRANSLATE_MSI

Robin Murphy (1):
  iommu/dma: Allow MSI-only cookies

 drivers/iommu/arm-smmu-v3.c     |  8 +++-
 drivers/iommu/arm-smmu.c        |  8 +++-
 drivers/iommu/dma-iommu.c       | 91 +++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/iommu.c           | 10 ++++-
 drivers/vfio/vfio_iommu_type1.c | 48 ++++++++++++++++++----
 include/linux/dma-iommu.h       | 27 ++++++++++++
 include/linux/iommu.h           | 23 +++++++++++
 7 files changed, 203 insertions(+), 12 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 00/11] KVM PCIe/MSI passthrough on ARM/ARM64: re-design with transparent MSI mapping
@ 2016-09-27 20:48 ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-09-27 20:48 UTC (permalink / raw)
  To: linux-arm-kernel

Following Robin's series [1] addressing MSI IOMMU mapping for devices
attached to a DMA ops domain, my previous 3 part series (v12) lost most
of its consistency. msi-iommu API role now is handled at dma-iommu level
while MSI doorbell registration API only is used for security assessment.
Also MSI layer part is not needed anymore since mapping directly is
done in the compose callback.

Here I propose an alternative approach, based on [1]. This approach
was discussed at the KVM forum with Christoffer Dall and Marc Zyngier,
and was suggested by Christoffer. The idea is we could let the iommu
layer transparently allocate MSI frame IOVAs in the holes left between
UNMANAGED iova slots, set by the iommu-api user.

This series introduces a new IOMMU domain type that allows mixing of
unmanaged and managed IOVA slots. We define an IOVA domain whose
aperture covers the GPA address range. Each time the IOMMU-API
user maps iova/pa, we reserve the IOVA range to prevent the iova
allocator from using it for MSI mapping.

This simplifies the user part which does not need anymore to provide an
IOVA aperture anymore.

The current series does not address the interrupt safety assessment,
which may be considered as a separate issue. Currently the assignemnt
is considered as unsafe, on ARM (even with a GICv3 ITS).

Please let me know what is your feeling wrt this alternative approach.

dependency:
[1] [PATCH v7 00/22] Generic DT bindings for PCI IOMMUs and ARM SMMU
http://www.spinics.net/lists/arm-kernel/msg531110.html

Best Regards

Eric

Testing:
- functional on ARM64 AMD Overdrive HW (single GICv2m frame). Lack of contexts
  prevents me from testing multiple assignment.

Git: complete series available at
https://github.com/eauger/linux/tree/generic-v7-pcie-passthru-redesign-rfc
previous: https://github.com/eauger/linux/tree/v4.7-rc7-passthrough-v12

the above branch includes a temporary patch to work around a ThunderX pci
bus reset crash (which I think unrelated to this series):
"vfio: pci: HACK! workaround thunderx pci_try_reset_bus crash"
Do not take this one for other platforms.


Eric Auger (10):
  iommu: Add iommu_domain_msi_geometry and DOMAIN_ATTR_MSI_GEOMETRY
  iommu: Introduce IOMMU_CAP_TRANSLATE_MSI capability
  iommu: Introduce IOMMU_DOMAIN_MIXED
  iommu/dma: iommu_dma_(un)map_mixed
  iommu/arm-smmu: Allow IOMMU_DOMAIN_MIXED domain allocation
  iommu: Use IOMMU_DOMAIN_MIXED typed domain when IOMMU translates MSI
  vfio/type1: Sets the IOVA window in case MSI IOVA need to be allocated
  vfio/type1: Reserve IOVAs for IOMMU_DOMAIN_MIXED domains
  iommu/arm-smmu: Do not advertise IOMMU_CAP_INTR_REMAP
  iommu/arm-smmu: Advertise IOMMU_CAP_TRANSLATE_MSI

Robin Murphy (1):
  iommu/dma: Allow MSI-only cookies

 drivers/iommu/arm-smmu-v3.c     |  8 +++-
 drivers/iommu/arm-smmu.c        |  8 +++-
 drivers/iommu/dma-iommu.c       | 91 +++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/iommu.c           | 10 ++++-
 drivers/vfio/vfio_iommu_type1.c | 48 ++++++++++++++++++----
 include/linux/dma-iommu.h       | 27 ++++++++++++
 include/linux/iommu.h           | 23 +++++++++++
 7 files changed, 203 insertions(+), 12 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 01/11] iommu: Add iommu_domain_msi_geometry and DOMAIN_ATTR_MSI_GEOMETRY
@ 2016-09-27 20:48   ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-09-27 20:48 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

Introduce a new DOMAIN_ATTR_MSI_GEOMETRY domain attribute. It enables
to query the aperture of the IOVA window dedicated to MSIs.

x86 IOMMUs will typically expose an MSI aperture matching the 1MB
region [FEE0_0000h - FEF0_000h] corresponding to the APIC configuration
 space and no support for MSI translation.

On ARM, arm-smmu(-v3) translate MSIs. Aperture is not set indicating
MSI IOVA can live everywhere in the IOVA address space

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/iommu/iommu.c |  5 +++++
 include/linux/iommu.h | 13 +++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9a2f196..617cb2b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1485,6 +1485,7 @@ int iommu_domain_get_attr(struct iommu_domain *domain,
 			  enum iommu_attr attr, void *data)
 {
 	struct iommu_domain_geometry *geometry;
+	struct iommu_domain_msi_geometry *msi_geometry;
 	bool *paging;
 	int ret = 0;
 	u32 *count;
@@ -1495,6 +1496,10 @@ int iommu_domain_get_attr(struct iommu_domain *domain,
 		*geometry = domain->geometry;
 
 		break;
+	case DOMAIN_ATTR_MSI_GEOMETRY:
+		msi_geometry  = data;
+		*msi_geometry = domain->msi_geometry;
+		break;
 	case DOMAIN_ATTR_PAGING:
 		paging  = data;
 		*paging = (domain->pgsize_bitmap != 0UL);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 436dc21..ef6e047 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -52,6 +52,11 @@ struct iommu_domain_geometry {
 	bool force_aperture;       /* DMA only allowed in mappable range? */
 };
 
+struct iommu_domain_msi_geometry {
+	dma_addr_t aperture_start; /* First address usable for MSI IOVA     */
+	dma_addr_t aperture_end;   /* Last address usable for MSI IOVA      */
+};
+
 /* Domain feature flags */
 #define __IOMMU_DOMAIN_PAGING	(1U << 0)  /* Support for iommu_map/unmap */
 #define __IOMMU_DOMAIN_DMA_API	(1U << 1)  /* Domain for use in DMA-API
@@ -83,6 +88,7 @@ struct iommu_domain {
 	iommu_fault_handler_t handler;
 	void *handler_token;
 	struct iommu_domain_geometry geometry;
+	struct iommu_domain_msi_geometry msi_geometry;
 	void *iova_cookie;
 };
 
@@ -108,6 +114,7 @@ enum iommu_cap {
 
 enum iommu_attr {
 	DOMAIN_ATTR_GEOMETRY,
+	DOMAIN_ATTR_MSI_GEOMETRY,
 	DOMAIN_ATTR_PAGING,
 	DOMAIN_ATTR_WINDOWS,
 	DOMAIN_ATTR_FSL_PAMU_STASH,
@@ -352,6 +359,12 @@ int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
 void iommu_fwspec_free(struct device *dev);
 int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
 
+static inline bool iommu_domain_msi_aperture_valid(struct iommu_domain *domain)
+{
+	return (domain->msi_geometry.aperture_end >
+		domain->msi_geometry.aperture_start);
+}
+
 #else /* CONFIG_IOMMU_API */
 
 struct iommu_ops {};
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 01/11] iommu: Add iommu_domain_msi_geometry and DOMAIN_ATTR_MSI_GEOMETRY
@ 2016-09-27 20:48   ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-09-27 20:48 UTC (permalink / raw)
  To: eric.auger-H+wXaHxf7aLQT0dZR+AlfA,
	eric.auger.pro-Re5JQEeQqe8AvxtiuMwx3w,
	christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
	marc.zyngier-5wv7dgnIgG8, robin.murphy-5wv7dgnIgG8,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	joro-zLv9SwRftAIdnm+yROfE0A, tglx-hfZtesqFncYOwBW4kG4KsQ,
	jason-NLaQJdtUoK4Be96aLqz0jA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: drjones-H+wXaHxf7aLQT0dZR+AlfA, kvm-u79uwXL29TY76Z2rM5mHXA,
	Manish.Jaggi-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
	yehuday-eYqpPyKDWXRBDgjK7y7TUQ

Introduce a new DOMAIN_ATTR_MSI_GEOMETRY domain attribute. It enables
to query the aperture of the IOVA window dedicated to MSIs.

x86 IOMMUs will typically expose an MSI aperture matching the 1MB
region [FEE0_0000h - FEF0_000h] corresponding to the APIC configuration
 space and no support for MSI translation.

On ARM, arm-smmu(-v3) translate MSIs. Aperture is not set indicating
MSI IOVA can live everywhere in the IOVA address space

Signed-off-by: Eric Auger <eric.auger-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Suggested-by: Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 drivers/iommu/iommu.c |  5 +++++
 include/linux/iommu.h | 13 +++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9a2f196..617cb2b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1485,6 +1485,7 @@ int iommu_domain_get_attr(struct iommu_domain *domain,
 			  enum iommu_attr attr, void *data)
 {
 	struct iommu_domain_geometry *geometry;
+	struct iommu_domain_msi_geometry *msi_geometry;
 	bool *paging;
 	int ret = 0;
 	u32 *count;
@@ -1495,6 +1496,10 @@ int iommu_domain_get_attr(struct iommu_domain *domain,
 		*geometry = domain->geometry;
 
 		break;
+	case DOMAIN_ATTR_MSI_GEOMETRY:
+		msi_geometry  = data;
+		*msi_geometry = domain->msi_geometry;
+		break;
 	case DOMAIN_ATTR_PAGING:
 		paging  = data;
 		*paging = (domain->pgsize_bitmap != 0UL);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 436dc21..ef6e047 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -52,6 +52,11 @@ struct iommu_domain_geometry {
 	bool force_aperture;       /* DMA only allowed in mappable range? */
 };
 
+struct iommu_domain_msi_geometry {
+	dma_addr_t aperture_start; /* First address usable for MSI IOVA     */
+	dma_addr_t aperture_end;   /* Last address usable for MSI IOVA      */
+};
+
 /* Domain feature flags */
 #define __IOMMU_DOMAIN_PAGING	(1U << 0)  /* Support for iommu_map/unmap */
 #define __IOMMU_DOMAIN_DMA_API	(1U << 1)  /* Domain for use in DMA-API
@@ -83,6 +88,7 @@ struct iommu_domain {
 	iommu_fault_handler_t handler;
 	void *handler_token;
 	struct iommu_domain_geometry geometry;
+	struct iommu_domain_msi_geometry msi_geometry;
 	void *iova_cookie;
 };
 
@@ -108,6 +114,7 @@ enum iommu_cap {
 
 enum iommu_attr {
 	DOMAIN_ATTR_GEOMETRY,
+	DOMAIN_ATTR_MSI_GEOMETRY,
 	DOMAIN_ATTR_PAGING,
 	DOMAIN_ATTR_WINDOWS,
 	DOMAIN_ATTR_FSL_PAMU_STASH,
@@ -352,6 +359,12 @@ int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
 void iommu_fwspec_free(struct device *dev);
 int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
 
+static inline bool iommu_domain_msi_aperture_valid(struct iommu_domain *domain)
+{
+	return (domain->msi_geometry.aperture_end >
+		domain->msi_geometry.aperture_start);
+}
+
 #else /* CONFIG_IOMMU_API */
 
 struct iommu_ops {};
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 02/11] iommu: Introduce IOMMU_CAP_TRANSLATE_MSI capability
@ 2016-09-27 20:48   ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-09-27 20:48 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

Introduce a new IOMMU_CAP_TRANSLATE_MSI capability to check whether
the IOMMU translates MSI write transactions. This capability will be
checked on domain allocation to check the type of the domain that
shall be used.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 include/linux/iommu.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ef6e047..5c2673a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -97,6 +97,7 @@ enum iommu_cap {
 					   transactions */
 	IOMMU_CAP_INTR_REMAP,		/* IOMMU supports interrupt isolation */
 	IOMMU_CAP_NOEXEC,		/* IOMMU_NOEXEC flag */
+	IOMMU_CAP_TRANSLATE_MSI,	/* IOMMU translates MSI transactions */
 };
 
 /*
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 02/11] iommu: Introduce IOMMU_CAP_TRANSLATE_MSI capability
@ 2016-09-27 20:48   ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-09-27 20:48 UTC (permalink / raw)
  To: eric.auger-H+wXaHxf7aLQT0dZR+AlfA,
	eric.auger.pro-Re5JQEeQqe8AvxtiuMwx3w,
	christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
	marc.zyngier-5wv7dgnIgG8, robin.murphy-5wv7dgnIgG8,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	joro-zLv9SwRftAIdnm+yROfE0A, tglx-hfZtesqFncYOwBW4kG4KsQ,
	jason-NLaQJdtUoK4Be96aLqz0jA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: drjones-H+wXaHxf7aLQT0dZR+AlfA, kvm-u79uwXL29TY76Z2rM5mHXA,
	Manish.Jaggi-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
	yehuday-eYqpPyKDWXRBDgjK7y7TUQ

Introduce a new IOMMU_CAP_TRANSLATE_MSI capability to check whether
the IOMMU translates MSI write transactions. This capability will be
checked on domain allocation to check the type of the domain that
shall be used.

Signed-off-by: Eric Auger <eric.auger-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 include/linux/iommu.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ef6e047..5c2673a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -97,6 +97,7 @@ enum iommu_cap {
 					   transactions */
 	IOMMU_CAP_INTR_REMAP,		/* IOMMU supports interrupt isolation */
 	IOMMU_CAP_NOEXEC,		/* IOMMU_NOEXEC flag */
+	IOMMU_CAP_TRANSLATE_MSI,	/* IOMMU translates MSI transactions */
 };
 
 /*
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 03/11] iommu: Introduce IOMMU_DOMAIN_MIXED
@ 2016-09-27 20:48   ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-09-27 20:48 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

We introduce a new IOMMU domain type, dubbed IOMMU_DOMAIN_MIXED.
It is bound to be used as a replacement of IOMMU_DOMAIN_UNMANAGED
when the IOMMU translates MSI addresses. Such domain hosts
"unmanaged" reserved IOVA ranges chosen by the iommu-api user,
dished out as part of alloc_iova. Rest if available for internal
iova needs such as MSI frame IOVA allocation.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 include/linux/iommu.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5c2673a..44fe213 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -62,6 +62,9 @@ struct iommu_domain_msi_geometry {
 #define __IOMMU_DOMAIN_DMA_API	(1U << 1)  /* Domain for use in DMA-API
 					      implementation              */
 #define __IOMMU_DOMAIN_PT	(1U << 2)  /* Domain is identity mapped   */
+#define __IOMMU_DOMAIN_MIXED	(1U << 3)  /* Domain mixing unmanaged and
+					    * managed IOVAS
+					    */
 
 /*
  * This are the possible domain-types
@@ -71,6 +74,9 @@ struct iommu_domain_msi_geometry {
  *	IOMMU_DOMAIN_IDENTITY	- DMA addresses are system physical addresses
  *	IOMMU_DOMAIN_UNMANAGED	- DMA mappings managed by IOMMU-API user, used
  *				  for VMs
+ *	IOMMU_DOMAIN_MIXED	- Most DMA mappings are managed by IOMMU-API
+ *				  users and holes are left available for
+				  internal use such as MSI frame IOVA allocation
  *	IOMMU_DOMAIN_DMA	- Internally used for DMA-API implementations.
  *				  This flag allows IOMMU drivers to implement
  *				  certain optimizations for these domains
@@ -80,6 +86,9 @@ struct iommu_domain_msi_geometry {
 #define IOMMU_DOMAIN_UNMANAGED	(__IOMMU_DOMAIN_PAGING)
 #define IOMMU_DOMAIN_DMA	(__IOMMU_DOMAIN_PAGING |	\
 				 __IOMMU_DOMAIN_DMA_API)
+#define IOMMU_DOMAIN_MIXED	(__IOMMU_DOMAIN_MIXED |		\
+				 __IOMMU_DOMAIN_PAGING |	\
+				 __IOMMU_DOMAIN_DMA_API)
 
 struct iommu_domain {
 	unsigned type;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 03/11] iommu: Introduce IOMMU_DOMAIN_MIXED
@ 2016-09-27 20:48   ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-09-27 20:48 UTC (permalink / raw)
  To: eric.auger-H+wXaHxf7aLQT0dZR+AlfA,
	eric.auger.pro-Re5JQEeQqe8AvxtiuMwx3w,
	christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
	marc.zyngier-5wv7dgnIgG8, robin.murphy-5wv7dgnIgG8,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	joro-zLv9SwRftAIdnm+yROfE0A, tglx-hfZtesqFncYOwBW4kG4KsQ,
	jason-NLaQJdtUoK4Be96aLqz0jA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: drjones-H+wXaHxf7aLQT0dZR+AlfA, kvm-u79uwXL29TY76Z2rM5mHXA,
	Manish.Jaggi-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
	yehuday-eYqpPyKDWXRBDgjK7y7TUQ

We introduce a new IOMMU domain type, dubbed IOMMU_DOMAIN_MIXED.
It is bound to be used as a replacement of IOMMU_DOMAIN_UNMANAGED
when the IOMMU translates MSI addresses. Such domain hosts
"unmanaged" reserved IOVA ranges chosen by the iommu-api user,
dished out as part of alloc_iova. Rest if available for internal
iova needs such as MSI frame IOVA allocation.

Signed-off-by: Eric Auger <eric.auger-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 include/linux/iommu.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5c2673a..44fe213 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -62,6 +62,9 @@ struct iommu_domain_msi_geometry {
 #define __IOMMU_DOMAIN_DMA_API	(1U << 1)  /* Domain for use in DMA-API
 					      implementation              */
 #define __IOMMU_DOMAIN_PT	(1U << 2)  /* Domain is identity mapped   */
+#define __IOMMU_DOMAIN_MIXED	(1U << 3)  /* Domain mixing unmanaged and
+					    * managed IOVAS
+					    */
 
 /*
  * This are the possible domain-types
@@ -71,6 +74,9 @@ struct iommu_domain_msi_geometry {
  *	IOMMU_DOMAIN_IDENTITY	- DMA addresses are system physical addresses
  *	IOMMU_DOMAIN_UNMANAGED	- DMA mappings managed by IOMMU-API user, used
  *				  for VMs
+ *	IOMMU_DOMAIN_MIXED	- Most DMA mappings are managed by IOMMU-API
+ *				  users and holes are left available for
+				  internal use such as MSI frame IOVA allocation
  *	IOMMU_DOMAIN_DMA	- Internally used for DMA-API implementations.
  *				  This flag allows IOMMU drivers to implement
  *				  certain optimizations for these domains
@@ -80,6 +86,9 @@ struct iommu_domain_msi_geometry {
 #define IOMMU_DOMAIN_UNMANAGED	(__IOMMU_DOMAIN_PAGING)
 #define IOMMU_DOMAIN_DMA	(__IOMMU_DOMAIN_PAGING |	\
 				 __IOMMU_DOMAIN_DMA_API)
+#define IOMMU_DOMAIN_MIXED	(__IOMMU_DOMAIN_MIXED |		\
+				 __IOMMU_DOMAIN_PAGING |	\
+				 __IOMMU_DOMAIN_DMA_API)
 
 struct iommu_domain {
 	unsigned type;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 04/11] iommu/dma: Allow MSI-only cookies
@ 2016-09-27 20:48   ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-09-27 20:48 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

From: Robin Murphy <robin.murphy@arm.com>

IOMMU domain users such as VFIO face a similar problem to DMA API ops
with regard to mapping MSI messages in systems where the MSI write is
subject to IOMMU translation. With the relevant infrastructure now in
place for managed DMA domains, it's actually really simple for other
users to piggyback off that and reap the benefits without giving up
their own IOVA management, and without having to reinvent their own
wheel in the MSI layer.

Allow such users to opt into automatic MSI remapping by dedicating a
region of their IOVA space to a managed cookie.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v1 -> v2:
- add NULL last param to iommu_dma_init_domain
- use cookie_iovad()
- check against IOMMU_DOMAIN_MIXED domain type
---
 drivers/iommu/dma-iommu.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-iommu.h |  9 +++++++++
 2 files changed, 52 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index c5ab866..04bbc85 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -716,3 +716,46 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
 		msg->address_lo += lower_32_bits(msi_page->iova);
 	}
 }
+
+/**
+ * iommu_get_dma_msi_region_cookie - Configure a domain for MSI remapping only
+ * @domain: IOMMU domain to prepare
+ * @base: Base address of IOVA region to use as the MSI remapping aperture
+ * @size: Size of the desired MSI aperture
+ *
+ * Users who manage their own IOVA allocation and do not want DMA API support,
+ * but would still like to take advantage of automatic MSI remapping, can use
+ * this to initialise their own domain appropriately.
+ */
+int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
+		dma_addr_t base, u64 size)
+{
+	struct iova_domain *iovad;
+	int ret;
+
+	if (domain->type != IOMMU_DOMAIN_MIXED)
+		return -EINVAL;
+
+	/* check the iova domain intersects the MSI window */
+	if (iommu_domain_msi_aperture_valid(domain) &&
+		(domain->msi_geometry.aperture_end < base ||
+		 domain->msi_geometry.aperture_start > base + size - 1))
+		return -EINVAL;
+
+	ret = iommu_get_dma_cookie(domain);
+	if (ret)
+		return ret;
+
+	ret = iommu_dma_init_domain(domain, base, size, NULL);
+	if (ret) {
+		iommu_put_dma_cookie(domain);
+		return ret;
+	}
+
+	iovad = cookie_iovad(domain);
+	if (base < U64_MAX - size)
+		reserve_iova(iovad, iova_pfn(iovad, base + size), ULONG_MAX);
+
+	return 0;
+}
+EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 32c5890..1c55413 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -67,6 +67,9 @@ int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
 /* The DMA API isn't _quite_ the whole story, though... */
 void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);
 
+int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
+		dma_addr_t base, u64 size);
+
 #else
 
 struct iommu_domain;
@@ -90,6 +93,12 @@ static inline void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
 {
 }
 
+static inline int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
+		dma_addr_t base, u64 size)
+{
+	return -ENODEV;
+}
+
 #endif	/* CONFIG_IOMMU_DMA */
 #endif	/* __KERNEL__ */
 #endif	/* __DMA_IOMMU_H */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 04/11] iommu/dma: Allow MSI-only cookies
@ 2016-09-27 20:48   ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-09-27 20:48 UTC (permalink / raw)
  To: eric.auger-H+wXaHxf7aLQT0dZR+AlfA,
	eric.auger.pro-Re5JQEeQqe8AvxtiuMwx3w,
	christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
	marc.zyngier-5wv7dgnIgG8, robin.murphy-5wv7dgnIgG8,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	joro-zLv9SwRftAIdnm+yROfE0A, tglx-hfZtesqFncYOwBW4kG4KsQ,
	jason-NLaQJdtUoK4Be96aLqz0jA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: drjones-H+wXaHxf7aLQT0dZR+AlfA, kvm-u79uwXL29TY76Z2rM5mHXA,
	Manish.Jaggi-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
	yehuday-eYqpPyKDWXRBDgjK7y7TUQ

From: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>

IOMMU domain users such as VFIO face a similar problem to DMA API ops
with regard to mapping MSI messages in systems where the MSI write is
subject to IOMMU translation. With the relevant infrastructure now in
place for managed DMA domains, it's actually really simple for other
users to piggyback off that and reap the benefits without giving up
their own IOVA management, and without having to reinvent their own
wheel in the MSI layer.

Allow such users to opt into automatic MSI remapping by dedicating a
region of their IOVA space to a managed cookie.

Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
Signed-off-by: Eric Auger <eric.auger-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

---

v1 -> v2:
- add NULL last param to iommu_dma_init_domain
- use cookie_iovad()
- check against IOMMU_DOMAIN_MIXED domain type
---
 drivers/iommu/dma-iommu.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-iommu.h |  9 +++++++++
 2 files changed, 52 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index c5ab866..04bbc85 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -716,3 +716,46 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
 		msg->address_lo += lower_32_bits(msi_page->iova);
 	}
 }
+
+/**
+ * iommu_get_dma_msi_region_cookie - Configure a domain for MSI remapping only
+ * @domain: IOMMU domain to prepare
+ * @base: Base address of IOVA region to use as the MSI remapping aperture
+ * @size: Size of the desired MSI aperture
+ *
+ * Users who manage their own IOVA allocation and do not want DMA API support,
+ * but would still like to take advantage of automatic MSI remapping, can use
+ * this to initialise their own domain appropriately.
+ */
+int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
+		dma_addr_t base, u64 size)
+{
+	struct iova_domain *iovad;
+	int ret;
+
+	if (domain->type != IOMMU_DOMAIN_MIXED)
+		return -EINVAL;
+
+	/* check the iova domain intersects the MSI window */
+	if (iommu_domain_msi_aperture_valid(domain) &&
+		(domain->msi_geometry.aperture_end < base ||
+		 domain->msi_geometry.aperture_start > base + size - 1))
+		return -EINVAL;
+
+	ret = iommu_get_dma_cookie(domain);
+	if (ret)
+		return ret;
+
+	ret = iommu_dma_init_domain(domain, base, size, NULL);
+	if (ret) {
+		iommu_put_dma_cookie(domain);
+		return ret;
+	}
+
+	iovad = cookie_iovad(domain);
+	if (base < U64_MAX - size)
+		reserve_iova(iovad, iova_pfn(iovad, base + size), ULONG_MAX);
+
+	return 0;
+}
+EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 32c5890..1c55413 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -67,6 +67,9 @@ int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
 /* The DMA API isn't _quite_ the whole story, though... */
 void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);
 
+int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
+		dma_addr_t base, u64 size);
+
 #else
 
 struct iommu_domain;
@@ -90,6 +93,12 @@ static inline void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
 {
 }
 
+static inline int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
+		dma_addr_t base, u64 size)
+{
+	return -ENODEV;
+}
+
 #endif	/* CONFIG_IOMMU_DMA */
 #endif	/* __KERNEL__ */
 #endif	/* __DMA_IOMMU_H */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 05/11] iommu/dma: iommu_dma_(un)map_mixed
@ 2016-09-27 20:48   ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-09-27 20:48 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

iommu_dma_map_mixed and iommu_dma_unmap_mixed operate on
IOMMU_DOMAIN_MIXED typed domains. On top of standard iommu_map/unmap
they reserve the IOVA window to prevent the iova allocator to
allocate in those areas.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/iommu/dma-iommu.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-iommu.h | 18 ++++++++++++++++++
 2 files changed, 66 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 04bbc85..db21143 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -759,3 +759,51 @@ int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
 	return 0;
 }
 EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
+
+int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
+			phys_addr_t paddr, size_t size, int prot)
+{
+	struct iova_domain *iovad;
+	unsigned long lo, hi;
+	int ret;
+
+	if (domain->type != IOMMU_DOMAIN_MIXED)
+		return -EINVAL;
+
+	if (!domain->iova_cookie)
+		return -EINVAL;
+
+	iovad = cookie_iovad(domain);
+
+	lo = iova_pfn(iovad, iova);
+	hi = iova_pfn(iovad, iova + size - 1);
+	reserve_iova(iovad, lo, hi);
+	ret = iommu_map(domain, iova, paddr, size, prot);
+	if (ret)
+		free_iova(iovad, lo);
+	return ret;
+}
+EXPORT_SYMBOL(iommu_dma_map_mixed);
+
+size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
+			     size_t size)
+{
+	struct iova_domain *iovad;
+	unsigned long lo;
+	size_t ret;
+
+	if (domain->type != IOMMU_DOMAIN_MIXED)
+		return -EINVAL;
+
+	if (!domain->iova_cookie)
+		return -EINVAL;
+
+	iovad = cookie_iovad(domain);
+	lo = iova_pfn(iovad, iova);
+
+	ret = iommu_unmap(domain, iova, size);
+	if (ret == size)
+		free_iova(iovad, lo);
+	return ret;
+}
+EXPORT_SYMBOL(iommu_dma_unmap_mixed);
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 1c55413..f2aa855 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -70,6 +70,12 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);
 int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
 		dma_addr_t base, u64 size);
 
+int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
+			phys_addr_t paddr, size_t size, int prot);
+
+size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
+			     size_t size);
+
 #else
 
 struct iommu_domain;
@@ -99,6 +105,18 @@ static inline int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
 	return -ENODEV;
 }
 
+int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
+			phys_addr_t paddr, size_t size, int prot)
+{
+	return -ENODEV;
+}
+
+size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
+			     size_t size)
+{
+	return -ENODEV;
+}
+
 #endif	/* CONFIG_IOMMU_DMA */
 #endif	/* __KERNEL__ */
 #endif	/* __DMA_IOMMU_H */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 05/11] iommu/dma: iommu_dma_(un)map_mixed
@ 2016-09-27 20:48   ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-09-27 20:48 UTC (permalink / raw)
  To: eric.auger-H+wXaHxf7aLQT0dZR+AlfA,
	eric.auger.pro-Re5JQEeQqe8AvxtiuMwx3w,
	christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
	marc.zyngier-5wv7dgnIgG8, robin.murphy-5wv7dgnIgG8,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	joro-zLv9SwRftAIdnm+yROfE0A, tglx-hfZtesqFncYOwBW4kG4KsQ,
	jason-NLaQJdtUoK4Be96aLqz0jA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: drjones-H+wXaHxf7aLQT0dZR+AlfA, kvm-u79uwXL29TY76Z2rM5mHXA,
	Manish.Jaggi-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
	yehuday-eYqpPyKDWXRBDgjK7y7TUQ

iommu_dma_map_mixed and iommu_dma_unmap_mixed operate on
IOMMU_DOMAIN_MIXED typed domains. On top of standard iommu_map/unmap
they reserve the IOVA window to prevent the iova allocator to
allocate in those areas.

Signed-off-by: Eric Auger <eric.auger-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 drivers/iommu/dma-iommu.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-iommu.h | 18 ++++++++++++++++++
 2 files changed, 66 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 04bbc85..db21143 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -759,3 +759,51 @@ int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
 	return 0;
 }
 EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
+
+int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
+			phys_addr_t paddr, size_t size, int prot)
+{
+	struct iova_domain *iovad;
+	unsigned long lo, hi;
+	int ret;
+
+	if (domain->type != IOMMU_DOMAIN_MIXED)
+		return -EINVAL;
+
+	if (!domain->iova_cookie)
+		return -EINVAL;
+
+	iovad = cookie_iovad(domain);
+
+	lo = iova_pfn(iovad, iova);
+	hi = iova_pfn(iovad, iova + size - 1);
+	reserve_iova(iovad, lo, hi);
+	ret = iommu_map(domain, iova, paddr, size, prot);
+	if (ret)
+		free_iova(iovad, lo);
+	return ret;
+}
+EXPORT_SYMBOL(iommu_dma_map_mixed);
+
+size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
+			     size_t size)
+{
+	struct iova_domain *iovad;
+	unsigned long lo;
+	size_t ret;
+
+	if (domain->type != IOMMU_DOMAIN_MIXED)
+		return -EINVAL;
+
+	if (!domain->iova_cookie)
+		return -EINVAL;
+
+	iovad = cookie_iovad(domain);
+	lo = iova_pfn(iovad, iova);
+
+	ret = iommu_unmap(domain, iova, size);
+	if (ret == size)
+		free_iova(iovad, lo);
+	return ret;
+}
+EXPORT_SYMBOL(iommu_dma_unmap_mixed);
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 1c55413..f2aa855 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -70,6 +70,12 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);
 int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
 		dma_addr_t base, u64 size);
 
+int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
+			phys_addr_t paddr, size_t size, int prot);
+
+size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
+			     size_t size);
+
 #else
 
 struct iommu_domain;
@@ -99,6 +105,18 @@ static inline int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
 	return -ENODEV;
 }
 
+int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
+			phys_addr_t paddr, size_t size, int prot)
+{
+	return -ENODEV;
+}
+
+size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
+			     size_t size)
+{
+	return -ENODEV;
+}
+
 #endif	/* CONFIG_IOMMU_DMA */
 #endif	/* __KERNEL__ */
 #endif	/* __DMA_IOMMU_H */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 06/11] iommu/arm-smmu: Allow IOMMU_DOMAIN_MIXED domain allocation
@ 2016-09-27 20:48   ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-09-27 20:48 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

This patch allows the allocation of IOMMU_DOMAIN_MIXED typed
domains in arm-smmu and arm-smmu-v3.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/iommu/arm-smmu-v3.c | 3 ++-
 drivers/iommu/arm-smmu.c    | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 15c01c3..e825679 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1383,7 +1383,8 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 {
 	struct arm_smmu_domain *smmu_domain;
 
-	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
+	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA &&
+	    type != IOMMU_DOMAIN_MIXED)
 		return NULL;
 
 	/*
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index ac4aab9..707c09b 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1003,7 +1003,8 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 {
 	struct arm_smmu_domain *smmu_domain;
 
-	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
+	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA &&
+	    type != IOMMU_DOMAIN_MIXED)
 		return NULL;
 	/*
 	 * Allocate the domain and initialise some of its data structures.
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 06/11] iommu/arm-smmu: Allow IOMMU_DOMAIN_MIXED domain allocation
@ 2016-09-27 20:48   ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-09-27 20:48 UTC (permalink / raw)
  To: eric.auger-H+wXaHxf7aLQT0dZR+AlfA,
	eric.auger.pro-Re5JQEeQqe8AvxtiuMwx3w,
	christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
	marc.zyngier-5wv7dgnIgG8, robin.murphy-5wv7dgnIgG8,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	joro-zLv9SwRftAIdnm+yROfE0A, tglx-hfZtesqFncYOwBW4kG4KsQ,
	jason-NLaQJdtUoK4Be96aLqz0jA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: drjones-H+wXaHxf7aLQT0dZR+AlfA, kvm-u79uwXL29TY76Z2rM5mHXA,
	Manish.Jaggi-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
	yehuday-eYqpPyKDWXRBDgjK7y7TUQ

This patch allows the allocation of IOMMU_DOMAIN_MIXED typed
domains in arm-smmu and arm-smmu-v3.

Signed-off-by: Eric Auger <eric.auger-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 drivers/iommu/arm-smmu-v3.c | 3 ++-
 drivers/iommu/arm-smmu.c    | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 15c01c3..e825679 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1383,7 +1383,8 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 {
 	struct arm_smmu_domain *smmu_domain;
 
-	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
+	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA &&
+	    type != IOMMU_DOMAIN_MIXED)
 		return NULL;
 
 	/*
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index ac4aab9..707c09b 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1003,7 +1003,8 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 {
 	struct arm_smmu_domain *smmu_domain;
 
-	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
+	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA &&
+	    type != IOMMU_DOMAIN_MIXED)
 		return NULL;
 	/*
 	 * Allocate the domain and initialise some of its data structures.
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 07/11] iommu: Use IOMMU_DOMAIN_MIXED typed domain when IOMMU translates MSI
@ 2016-09-27 20:48   ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-09-27 20:48 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

If the IOMMU advertises IOMMU_CAP_TRANSLATE_MSI choose
IOMMU_DOMAIN_MIXED domain type instead of IOMMU_DOMAIN_UNMANAGED
to allow transparent allocation of MSI frame IOVAs.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/iommu/iommu.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 617cb2b..8b4b90c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1069,7 +1069,10 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
 
 struct iommu_domain *iommu_domain_alloc(struct bus_type *bus)
 {
-	return __iommu_domain_alloc(bus, IOMMU_DOMAIN_UNMANAGED);
+	if (bus->iommu_ops->capable(IOMMU_CAP_TRANSLATE_MSI))
+		return __iommu_domain_alloc(bus, IOMMU_DOMAIN_MIXED);
+	else
+		return __iommu_domain_alloc(bus, IOMMU_DOMAIN_UNMANAGED);
 }
 EXPORT_SYMBOL_GPL(iommu_domain_alloc);
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 07/11] iommu: Use IOMMU_DOMAIN_MIXED typed domain when IOMMU translates MSI
@ 2016-09-27 20:48   ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-09-27 20:48 UTC (permalink / raw)
  To: eric.auger-H+wXaHxf7aLQT0dZR+AlfA,
	eric.auger.pro-Re5JQEeQqe8AvxtiuMwx3w,
	christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
	marc.zyngier-5wv7dgnIgG8, robin.murphy-5wv7dgnIgG8,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	joro-zLv9SwRftAIdnm+yROfE0A, tglx-hfZtesqFncYOwBW4kG4KsQ,
	jason-NLaQJdtUoK4Be96aLqz0jA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: drjones-H+wXaHxf7aLQT0dZR+AlfA, kvm-u79uwXL29TY76Z2rM5mHXA,
	Manish.Jaggi-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
	yehuday-eYqpPyKDWXRBDgjK7y7TUQ

If the IOMMU advertises IOMMU_CAP_TRANSLATE_MSI choose
IOMMU_DOMAIN_MIXED domain type instead of IOMMU_DOMAIN_UNMANAGED
to allow transparent allocation of MSI frame IOVAs.

Signed-off-by: Eric Auger <eric.auger-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 drivers/iommu/iommu.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 617cb2b..8b4b90c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1069,7 +1069,10 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
 
 struct iommu_domain *iommu_domain_alloc(struct bus_type *bus)
 {
-	return __iommu_domain_alloc(bus, IOMMU_DOMAIN_UNMANAGED);
+	if (bus->iommu_ops->capable(IOMMU_CAP_TRANSLATE_MSI))
+		return __iommu_domain_alloc(bus, IOMMU_DOMAIN_MIXED);
+	else
+		return __iommu_domain_alloc(bus, IOMMU_DOMAIN_UNMANAGED);
 }
 EXPORT_SYMBOL_GPL(iommu_domain_alloc);
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 08/11] vfio/type1: Sets the IOVA window in case MSI IOVA need to be allocated
@ 2016-09-27 20:48   ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-09-27 20:48 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

In case the IOMMU translates MSI, MSI IOVA will need to be transparently
allocated by the kernel in unused GPA space. Let's initialize a DMA domain
where some slots will be reserved "unmanaged" slots and rest is left
available for IOVA allocation.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/vfio/vfio_iommu_type1.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2ba1942..ad08fd0 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -36,6 +36,7 @@
 #include <linux/uaccess.h>
 #include <linux/vfio.h>
 #include <linux/workqueue.h>
+#include <linux/dma-iommu.h>
 
 #define DRIVER_VERSION  "0.2"
 #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
@@ -829,6 +830,11 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 
 	vfio_test_domain_fgsp(domain);
 
+	/* sets the IOVA window */
+	if (iommu_capable(bus, IOMMU_CAP_TRANSLATE_MSI) &&
+	    iommu_get_dma_msi_region_cookie(domain->domain, 0, ULONG_MAX))
+		goto out_detach;
+
 	/* replay mappings on new domains */
 	ret = vfio_iommu_replay(iommu, domain);
 	if (ret)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 08/11] vfio/type1: Sets the IOVA window in case MSI IOVA need to be allocated
@ 2016-09-27 20:48   ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-09-27 20:48 UTC (permalink / raw)
  To: eric.auger-H+wXaHxf7aLQT0dZR+AlfA,
	eric.auger.pro-Re5JQEeQqe8AvxtiuMwx3w,
	christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
	marc.zyngier-5wv7dgnIgG8, robin.murphy-5wv7dgnIgG8,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	joro-zLv9SwRftAIdnm+yROfE0A, tglx-hfZtesqFncYOwBW4kG4KsQ,
	jason-NLaQJdtUoK4Be96aLqz0jA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: drjones-H+wXaHxf7aLQT0dZR+AlfA, kvm-u79uwXL29TY76Z2rM5mHXA,
	Manish.Jaggi-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
	yehuday-eYqpPyKDWXRBDgjK7y7TUQ

In case the IOMMU translates MSI, MSI IOVA will need to be transparently
allocated by the kernel in unused GPA space. Let's initialize a DMA domain
where some slots will be reserved "unmanaged" slots and rest is left
available for IOVA allocation.

Signed-off-by: Eric Auger <eric.auger-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 drivers/vfio/vfio_iommu_type1.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2ba1942..ad08fd0 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -36,6 +36,7 @@
 #include <linux/uaccess.h>
 #include <linux/vfio.h>
 #include <linux/workqueue.h>
+#include <linux/dma-iommu.h>
 
 #define DRIVER_VERSION  "0.2"
 #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>"
@@ -829,6 +830,11 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 
 	vfio_test_domain_fgsp(domain);
 
+	/* sets the IOVA window */
+	if (iommu_capable(bus, IOMMU_CAP_TRANSLATE_MSI) &&
+	    iommu_get_dma_msi_region_cookie(domain->domain, 0, ULONG_MAX))
+		goto out_detach;
+
 	/* replay mappings on new domains */
 	ret = vfio_iommu_replay(iommu, domain);
 	if (ret)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 09/11] vfio/type1: Reserve IOVAs for IOMMU_DOMAIN_MIXED domains
@ 2016-09-27 20:48   ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-09-27 20:48 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

For IOMMU_DOMAIN_MIXED domain types, the user space defines most
of the IOVA regions (UNMANAGED) and map them. Since we want to allow
the kernel to allocate spare IOVAs within the holes we need to reserve
the user space defined IOVA regions.

in vfio_test_domain_fgsp, we keep the original iommu_(un)map since
the purpose just is to try map/unmap and not to keep it.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/vfio/vfio_iommu_type1.c | 42 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 35 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index ad08fd0..2d1eede 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -37,6 +37,7 @@
 #include <linux/vfio.h>
 #include <linux/workqueue.h>
 #include <linux/dma-iommu.h>
+#include <linux/iova.h>
 
 #define DRIVER_VERSION  "0.2"
 #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
@@ -255,6 +256,32 @@ static int vaddr_get_pfn(unsigned long vaddr, int prot, unsigned long *pfn)
 	return ret;
 }
 
+static int iommu_dma_map(struct iommu_domain *domain, unsigned long iova,
+			 phys_addr_t paddr, size_t size, int prot)
+{
+	switch (domain->type) {
+	case IOMMU_DOMAIN_UNMANAGED:
+		return iommu_map(domain, iova, paddr, size, prot);
+	case IOMMU_DOMAIN_MIXED:
+		return iommu_dma_map_mixed(domain, iova, paddr, size, prot);
+	default:
+		return -ENOENT;
+	}
+}
+
+static size_t iommu_dma_unmap(struct iommu_domain *domain, unsigned long iova,
+			      size_t size)
+{
+	switch (domain->type) {
+	case IOMMU_DOMAIN_UNMANAGED:
+		return iommu_unmap(domain, iova, size);
+	case IOMMU_DOMAIN_MIXED:
+		return iommu_dma_unmap_mixed(domain, iova, size);
+	default:
+		return -ENOENT;
+	}
+}
+
 /*
  * Attempt to pin pages.  We really don't want to track all the pfns and
  * the iommu can only map chunks of consecutive pfns anyway, so get the
@@ -353,7 +380,7 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
 				      struct vfio_domain, next);
 
 	list_for_each_entry_continue(d, &iommu->domain_list, next) {
-		iommu_unmap(d->domain, dma->iova, dma->size);
+		iommu_dma_unmap(d->domain, dma->iova, dma->size);
 		cond_resched();
 	}
 
@@ -379,7 +406,7 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
 				break;
 		}
 
-		unmapped = iommu_unmap(domain->domain, iova, len);
+		unmapped = iommu_dma_unmap(domain->domain, iova, len);
 		if (WARN_ON(!unmapped))
 			break;
 
@@ -519,7 +546,7 @@ static int map_try_harder(struct vfio_domain *domain, dma_addr_t iova,
 	int ret = 0;
 
 	for (i = 0; i < npage; i++, pfn++, iova += PAGE_SIZE) {
-		ret = iommu_map(domain->domain, iova,
+		ret = iommu_dma_map(domain->domain, iova,
 				(phys_addr_t)pfn << PAGE_SHIFT,
 				PAGE_SIZE, prot | domain->prot);
 		if (ret)
@@ -527,7 +554,7 @@ static int map_try_harder(struct vfio_domain *domain, dma_addr_t iova,
 	}
 
 	for (; i < npage && i > 0; i--, iova -= PAGE_SIZE)
-		iommu_unmap(domain->domain, iova, PAGE_SIZE);
+		iommu_dma_unmap(domain->domain, iova, PAGE_SIZE);
 
 	return ret;
 }
@@ -539,7 +566,8 @@ static int vfio_iommu_map(struct vfio_iommu *iommu, dma_addr_t iova,
 	int ret;
 
 	list_for_each_entry(d, &iommu->domain_list, next) {
-		ret = iommu_map(d->domain, iova, (phys_addr_t)pfn << PAGE_SHIFT,
+		ret = iommu_dma_map(d->domain, iova,
+				(phys_addr_t)pfn << PAGE_SHIFT,
 				npage << PAGE_SHIFT, prot | d->prot);
 		if (ret) {
 			if (ret != -EBUSY ||
@@ -554,7 +582,7 @@ static int vfio_iommu_map(struct vfio_iommu *iommu, dma_addr_t iova,
 
 unwind:
 	list_for_each_entry_continue_reverse(d, &iommu->domain_list, next)
-		iommu_unmap(d->domain, iova, npage << PAGE_SHIFT);
+		iommu_dma_unmap(d->domain, iova, npage << PAGE_SHIFT);
 
 	return ret;
 }
@@ -690,7 +718,7 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 								 iova + size))
 				size += PAGE_SIZE;
 
-			ret = iommu_map(domain->domain, iova, phys,
+			ret = iommu_dma_map(domain->domain, iova, phys,
 					size, dma->prot | domain->prot);
 			if (ret)
 				return ret;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 09/11] vfio/type1: Reserve IOVAs for IOMMU_DOMAIN_MIXED domains
@ 2016-09-27 20:48   ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-09-27 20:48 UTC (permalink / raw)
  To: eric.auger-H+wXaHxf7aLQT0dZR+AlfA,
	eric.auger.pro-Re5JQEeQqe8AvxtiuMwx3w,
	christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
	marc.zyngier-5wv7dgnIgG8, robin.murphy-5wv7dgnIgG8,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	joro-zLv9SwRftAIdnm+yROfE0A, tglx-hfZtesqFncYOwBW4kG4KsQ,
	jason-NLaQJdtUoK4Be96aLqz0jA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: drjones-H+wXaHxf7aLQT0dZR+AlfA, kvm-u79uwXL29TY76Z2rM5mHXA,
	Manish.Jaggi-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
	yehuday-eYqpPyKDWXRBDgjK7y7TUQ

For IOMMU_DOMAIN_MIXED domain types, the user space defines most
of the IOVA regions (UNMANAGED) and map them. Since we want to allow
the kernel to allocate spare IOVAs within the holes we need to reserve
the user space defined IOVA regions.

in vfio_test_domain_fgsp, we keep the original iommu_(un)map since
the purpose just is to try map/unmap and not to keep it.

Signed-off-by: Eric Auger <eric.auger-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 drivers/vfio/vfio_iommu_type1.c | 42 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 35 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index ad08fd0..2d1eede 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -37,6 +37,7 @@
 #include <linux/vfio.h>
 #include <linux/workqueue.h>
 #include <linux/dma-iommu.h>
+#include <linux/iova.h>
 
 #define DRIVER_VERSION  "0.2"
 #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>"
@@ -255,6 +256,32 @@ static int vaddr_get_pfn(unsigned long vaddr, int prot, unsigned long *pfn)
 	return ret;
 }
 
+static int iommu_dma_map(struct iommu_domain *domain, unsigned long iova,
+			 phys_addr_t paddr, size_t size, int prot)
+{
+	switch (domain->type) {
+	case IOMMU_DOMAIN_UNMANAGED:
+		return iommu_map(domain, iova, paddr, size, prot);
+	case IOMMU_DOMAIN_MIXED:
+		return iommu_dma_map_mixed(domain, iova, paddr, size, prot);
+	default:
+		return -ENOENT;
+	}
+}
+
+static size_t iommu_dma_unmap(struct iommu_domain *domain, unsigned long iova,
+			      size_t size)
+{
+	switch (domain->type) {
+	case IOMMU_DOMAIN_UNMANAGED:
+		return iommu_unmap(domain, iova, size);
+	case IOMMU_DOMAIN_MIXED:
+		return iommu_dma_unmap_mixed(domain, iova, size);
+	default:
+		return -ENOENT;
+	}
+}
+
 /*
  * Attempt to pin pages.  We really don't want to track all the pfns and
  * the iommu can only map chunks of consecutive pfns anyway, so get the
@@ -353,7 +380,7 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
 				      struct vfio_domain, next);
 
 	list_for_each_entry_continue(d, &iommu->domain_list, next) {
-		iommu_unmap(d->domain, dma->iova, dma->size);
+		iommu_dma_unmap(d->domain, dma->iova, dma->size);
 		cond_resched();
 	}
 
@@ -379,7 +406,7 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
 				break;
 		}
 
-		unmapped = iommu_unmap(domain->domain, iova, len);
+		unmapped = iommu_dma_unmap(domain->domain, iova, len);
 		if (WARN_ON(!unmapped))
 			break;
 
@@ -519,7 +546,7 @@ static int map_try_harder(struct vfio_domain *domain, dma_addr_t iova,
 	int ret = 0;
 
 	for (i = 0; i < npage; i++, pfn++, iova += PAGE_SIZE) {
-		ret = iommu_map(domain->domain, iova,
+		ret = iommu_dma_map(domain->domain, iova,
 				(phys_addr_t)pfn << PAGE_SHIFT,
 				PAGE_SIZE, prot | domain->prot);
 		if (ret)
@@ -527,7 +554,7 @@ static int map_try_harder(struct vfio_domain *domain, dma_addr_t iova,
 	}
 
 	for (; i < npage && i > 0; i--, iova -= PAGE_SIZE)
-		iommu_unmap(domain->domain, iova, PAGE_SIZE);
+		iommu_dma_unmap(domain->domain, iova, PAGE_SIZE);
 
 	return ret;
 }
@@ -539,7 +566,8 @@ static int vfio_iommu_map(struct vfio_iommu *iommu, dma_addr_t iova,
 	int ret;
 
 	list_for_each_entry(d, &iommu->domain_list, next) {
-		ret = iommu_map(d->domain, iova, (phys_addr_t)pfn << PAGE_SHIFT,
+		ret = iommu_dma_map(d->domain, iova,
+				(phys_addr_t)pfn << PAGE_SHIFT,
 				npage << PAGE_SHIFT, prot | d->prot);
 		if (ret) {
 			if (ret != -EBUSY ||
@@ -554,7 +582,7 @@ static int vfio_iommu_map(struct vfio_iommu *iommu, dma_addr_t iova,
 
 unwind:
 	list_for_each_entry_continue_reverse(d, &iommu->domain_list, next)
-		iommu_unmap(d->domain, iova, npage << PAGE_SHIFT);
+		iommu_dma_unmap(d->domain, iova, npage << PAGE_SHIFT);
 
 	return ret;
 }
@@ -690,7 +718,7 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 								 iova + size))
 				size += PAGE_SIZE;
 
-			ret = iommu_map(domain->domain, iova, phys,
+			ret = iommu_dma_map(domain->domain, iova, phys,
 					size, dma->prot | domain->prot);
 			if (ret)
 				return ret;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 10/11] iommu/arm-smmu: Do not advertise IOMMU_CAP_INTR_REMAP
@ 2016-09-27 20:48   ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-09-27 20:48 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

Do not advertise IOMMU_CAP_INTR_REMAP for arm-smmu(-v3). Indeed the
irq_remapping capability is abstracted on irqchip side for ARM as
opposed to Intel IOMMU featuring IRQ remapping HW.

So for the time being assignement is considered as unsafe on ARM,
until we get an accurate description of whether the MSI controllers
are downstream to smmus.

This commit affects platform and PCIe device assignment use cases.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
---
 drivers/iommu/arm-smmu-v3.c | 3 ++-
 drivers/iommu/arm-smmu.c    | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index e825679..c86ba84 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1371,7 +1371,8 @@ static bool arm_smmu_capable(enum iommu_cap cap)
 	case IOMMU_CAP_CACHE_COHERENCY:
 		return true;
 	case IOMMU_CAP_INTR_REMAP:
-		return true; /* MSIs are just memory writes */
+		/* interrupt translation handled at MSI controller level */
+		return false;
 	case IOMMU_CAP_NOEXEC:
 		return true;
 	default:
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 707c09b..7af1dd0 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1359,7 +1359,8 @@ static bool arm_smmu_capable(enum iommu_cap cap)
 		 */
 		return true;
 	case IOMMU_CAP_INTR_REMAP:
-		return true; /* MSIs are just memory writes */
+		/* interrupt translation handled at MSI controller level */
+		return false;
 	case IOMMU_CAP_NOEXEC:
 		return true;
 	default:
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 10/11] iommu/arm-smmu: Do not advertise IOMMU_CAP_INTR_REMAP
@ 2016-09-27 20:48   ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-09-27 20:48 UTC (permalink / raw)
  To: eric.auger-H+wXaHxf7aLQT0dZR+AlfA,
	eric.auger.pro-Re5JQEeQqe8AvxtiuMwx3w,
	christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
	marc.zyngier-5wv7dgnIgG8, robin.murphy-5wv7dgnIgG8,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	joro-zLv9SwRftAIdnm+yROfE0A, tglx-hfZtesqFncYOwBW4kG4KsQ,
	jason-NLaQJdtUoK4Be96aLqz0jA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: drjones-H+wXaHxf7aLQT0dZR+AlfA, kvm-u79uwXL29TY76Z2rM5mHXA,
	Manish.Jaggi-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
	yehuday-eYqpPyKDWXRBDgjK7y7TUQ

Do not advertise IOMMU_CAP_INTR_REMAP for arm-smmu(-v3). Indeed the
irq_remapping capability is abstracted on irqchip side for ARM as
opposed to Intel IOMMU featuring IRQ remapping HW.

So for the time being assignement is considered as unsafe on ARM,
until we get an accurate description of whether the MSI controllers
are downstream to smmus.

This commit affects platform and PCIe device assignment use cases.

Signed-off-by: Eric Auger <eric.auger-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

---
---
 drivers/iommu/arm-smmu-v3.c | 3 ++-
 drivers/iommu/arm-smmu.c    | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index e825679..c86ba84 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1371,7 +1371,8 @@ static bool arm_smmu_capable(enum iommu_cap cap)
 	case IOMMU_CAP_CACHE_COHERENCY:
 		return true;
 	case IOMMU_CAP_INTR_REMAP:
-		return true; /* MSIs are just memory writes */
+		/* interrupt translation handled at MSI controller level */
+		return false;
 	case IOMMU_CAP_NOEXEC:
 		return true;
 	default:
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 707c09b..7af1dd0 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1359,7 +1359,8 @@ static bool arm_smmu_capable(enum iommu_cap cap)
 		 */
 		return true;
 	case IOMMU_CAP_INTR_REMAP:
-		return true; /* MSIs are just memory writes */
+		/* interrupt translation handled at MSI controller level */
+		return false;
 	case IOMMU_CAP_NOEXEC:
 		return true;
 	default:
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 11/11] iommu/arm-smmu: Advertise IOMMU_CAP_TRANSLATE_MSI
@ 2016-09-27 20:48   ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-09-27 20:48 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

arm-smmu and arm-smmu-v3 do translate MSI write transactions
emitted by downstream devices. Advertise this property through
the capable operation.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/iommu/arm-smmu-v3.c | 2 ++
 drivers/iommu/arm-smmu.c    | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index c86ba84..431ba8c 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1375,6 +1375,8 @@ static bool arm_smmu_capable(enum iommu_cap cap)
 		return false;
 	case IOMMU_CAP_NOEXEC:
 		return true;
+	case IOMMU_CAP_TRANSLATE_MSI:
+		return true;
 	default:
 		return false;
 	}
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 7af1dd0..b862a1c 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1363,6 +1363,8 @@ static bool arm_smmu_capable(enum iommu_cap cap)
 		return false;
 	case IOMMU_CAP_NOEXEC:
 		return true;
+	case IOMMU_CAP_TRANSLATE_MSI:
+		return true;
 	default:
 		return false;
 	}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 11/11] iommu/arm-smmu: Advertise IOMMU_CAP_TRANSLATE_MSI
@ 2016-09-27 20:48   ` Eric Auger
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Auger @ 2016-09-27 20:48 UTC (permalink / raw)
  To: eric.auger-H+wXaHxf7aLQT0dZR+AlfA,
	eric.auger.pro-Re5JQEeQqe8AvxtiuMwx3w,
	christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
	marc.zyngier-5wv7dgnIgG8, robin.murphy-5wv7dgnIgG8,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	joro-zLv9SwRftAIdnm+yROfE0A, tglx-hfZtesqFncYOwBW4kG4KsQ,
	jason-NLaQJdtUoK4Be96aLqz0jA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: drjones-H+wXaHxf7aLQT0dZR+AlfA, kvm-u79uwXL29TY76Z2rM5mHXA,
	Manish.Jaggi-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
	yehuday-eYqpPyKDWXRBDgjK7y7TUQ

arm-smmu and arm-smmu-v3 do translate MSI write transactions
emitted by downstream devices. Advertise this property through
the capable operation.

Signed-off-by: Eric Auger <eric.auger-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 drivers/iommu/arm-smmu-v3.c | 2 ++
 drivers/iommu/arm-smmu.c    | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index c86ba84..431ba8c 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1375,6 +1375,8 @@ static bool arm_smmu_capable(enum iommu_cap cap)
 		return false;
 	case IOMMU_CAP_NOEXEC:
 		return true;
+	case IOMMU_CAP_TRANSLATE_MSI:
+		return true;
 	default:
 		return false;
 	}
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 7af1dd0..b862a1c 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1363,6 +1363,8 @@ static bool arm_smmu_capable(enum iommu_cap cap)
 		return false;
 	case IOMMU_CAP_NOEXEC:
 		return true;
+	case IOMMU_CAP_TRANSLATE_MSI:
+		return true;
 	default:
 		return false;
 	}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [RFC 05/11] iommu/dma: iommu_dma_(un)map_mixed
@ 2016-09-30 13:24     ` Robin Murphy
  0 siblings, 0 replies; 38+ messages in thread
From: Robin Murphy @ 2016-09-30 13:24 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

Hi Eric,

On 27/09/16 21:48, Eric Auger wrote:
> iommu_dma_map_mixed and iommu_dma_unmap_mixed operate on
> IOMMU_DOMAIN_MIXED typed domains. On top of standard iommu_map/unmap
> they reserve the IOVA window to prevent the iova allocator to
> allocate in those areas.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>  drivers/iommu/dma-iommu.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/dma-iommu.h | 18 ++++++++++++++++++
>  2 files changed, 66 insertions(+)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 04bbc85..db21143 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -759,3 +759,51 @@ int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>  	return 0;
>  }
>  EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
> +
> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
> +			phys_addr_t paddr, size_t size, int prot)
> +{
> +	struct iova_domain *iovad;
> +	unsigned long lo, hi;
> +	int ret;
> +
> +	if (domain->type != IOMMU_DOMAIN_MIXED)
> +		return -EINVAL;
> +
> +	if (!domain->iova_cookie)
> +		return -EINVAL;
> +
> +	iovad = cookie_iovad(domain);
> +
> +	lo = iova_pfn(iovad, iova);
> +	hi = iova_pfn(iovad, iova + size - 1);
> +	reserve_iova(iovad, lo, hi);

This can't work reliably - reserve_iova() will (for good reason) merge
any adjacent or overlapping entries, so any unmap is liable to free more
IOVA space than actually gets unmapped, and things will get subtly out
of sync and go wrong later.

The more general issue with this whole approach, though, is that it
effectively rules out userspace doing guest memory hotplug or similar,
and I'm not we want to paint ourselves into that corner. Basically, as
soon as a device is attached to a guest, the entirety of the unallocated
IPA space becomes reserved, and userspace can never add anything further
to it, because any given address *might* be in use for an MSI mapping.

I think it still makes most sense to stick with the original approach of
cooperating with userspace to reserve a bounded area - it's just that we
can then let automatic mapping take care of itself within that area.

Speaking of which, I've realised the same fundamental reservation
problem already applies to PCI without ACS, regardless of MSIs. I just
tried on my Juno with guest memory placed at 0x4000000000, (i.e.
matching the host PA of the 64-bit PCI window), and sure enough when the
guest kicks off some DMA on the passed-through NIC, the root complex
interprets the guest IPA as (unsupported) peer-to-peer DMA to a BAR
claimed by the video card, and it fails. I guess this doesn't get hit in
practice on x86 because the guest memory map is unlikely to be much
different from the host's.

It seems like we basically need a general way of communicating fixed and
movable host reservations to userspace :/

Robin.

> +	ret = iommu_map(domain, iova, paddr, size, prot);
> +	if (ret)
> +		free_iova(iovad, lo);
> +	return ret;
> +}
> +EXPORT_SYMBOL(iommu_dma_map_mixed);
> +
> +size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
> +			     size_t size)
> +{
> +	struct iova_domain *iovad;
> +	unsigned long lo;
> +	size_t ret;
> +
> +	if (domain->type != IOMMU_DOMAIN_MIXED)
> +		return -EINVAL;
> +
> +	if (!domain->iova_cookie)
> +		return -EINVAL;
> +
> +	iovad = cookie_iovad(domain);
> +	lo = iova_pfn(iovad, iova);
> +
> +	ret = iommu_unmap(domain, iova, size);
> +	if (ret == size)
> +		free_iova(iovad, lo);
> +	return ret;
> +}
> +EXPORT_SYMBOL(iommu_dma_unmap_mixed);
> diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
> index 1c55413..f2aa855 100644
> --- a/include/linux/dma-iommu.h
> +++ b/include/linux/dma-iommu.h
> @@ -70,6 +70,12 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);
>  int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>  		dma_addr_t base, u64 size);
>  
> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
> +			phys_addr_t paddr, size_t size, int prot);
> +
> +size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
> +			     size_t size);
> +
>  #else
>  
>  struct iommu_domain;
> @@ -99,6 +105,18 @@ static inline int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>  	return -ENODEV;
>  }
>  
> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
> +			phys_addr_t paddr, size_t size, int prot)
> +{
> +	return -ENODEV;
> +}
> +
> +size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
> +			     size_t size)
> +{
> +	return -ENODEV;
> +}
> +
>  #endif	/* CONFIG_IOMMU_DMA */
>  #endif	/* __KERNEL__ */
>  #endif	/* __DMA_IOMMU_H */
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 05/11] iommu/dma: iommu_dma_(un)map_mixed
@ 2016-09-30 13:24     ` Robin Murphy
  0 siblings, 0 replies; 38+ messages in thread
From: Robin Murphy @ 2016-09-30 13:24 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro-Re5JQEeQqe8AvxtiuMwx3w,
	christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
	marc.zyngier-5wv7dgnIgG8, alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	will.deacon-5wv7dgnIgG8, joro-zLv9SwRftAIdnm+yROfE0A,
	tglx-hfZtesqFncYOwBW4kG4KsQ, jason-NLaQJdtUoK4Be96aLqz0jA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: drjones-H+wXaHxf7aLQT0dZR+AlfA, kvm-u79uwXL29TY76Z2rM5mHXA,
	Manish.Jaggi-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
	yehuday-eYqpPyKDWXRBDgjK7y7TUQ

Hi Eric,

On 27/09/16 21:48, Eric Auger wrote:
> iommu_dma_map_mixed and iommu_dma_unmap_mixed operate on
> IOMMU_DOMAIN_MIXED typed domains. On top of standard iommu_map/unmap
> they reserve the IOVA window to prevent the iova allocator to
> allocate in those areas.
> 
> Signed-off-by: Eric Auger <eric.auger-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  drivers/iommu/dma-iommu.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/dma-iommu.h | 18 ++++++++++++++++++
>  2 files changed, 66 insertions(+)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 04bbc85..db21143 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -759,3 +759,51 @@ int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>  	return 0;
>  }
>  EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
> +
> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
> +			phys_addr_t paddr, size_t size, int prot)
> +{
> +	struct iova_domain *iovad;
> +	unsigned long lo, hi;
> +	int ret;
> +
> +	if (domain->type != IOMMU_DOMAIN_MIXED)
> +		return -EINVAL;
> +
> +	if (!domain->iova_cookie)
> +		return -EINVAL;
> +
> +	iovad = cookie_iovad(domain);
> +
> +	lo = iova_pfn(iovad, iova);
> +	hi = iova_pfn(iovad, iova + size - 1);
> +	reserve_iova(iovad, lo, hi);

This can't work reliably - reserve_iova() will (for good reason) merge
any adjacent or overlapping entries, so any unmap is liable to free more
IOVA space than actually gets unmapped, and things will get subtly out
of sync and go wrong later.

The more general issue with this whole approach, though, is that it
effectively rules out userspace doing guest memory hotplug or similar,
and I'm not we want to paint ourselves into that corner. Basically, as
soon as a device is attached to a guest, the entirety of the unallocated
IPA space becomes reserved, and userspace can never add anything further
to it, because any given address *might* be in use for an MSI mapping.

I think it still makes most sense to stick with the original approach of
cooperating with userspace to reserve a bounded area - it's just that we
can then let automatic mapping take care of itself within that area.

Speaking of which, I've realised the same fundamental reservation
problem already applies to PCI without ACS, regardless of MSIs. I just
tried on my Juno with guest memory placed at 0x4000000000, (i.e.
matching the host PA of the 64-bit PCI window), and sure enough when the
guest kicks off some DMA on the passed-through NIC, the root complex
interprets the guest IPA as (unsupported) peer-to-peer DMA to a BAR
claimed by the video card, and it fails. I guess this doesn't get hit in
practice on x86 because the guest memory map is unlikely to be much
different from the host's.

It seems like we basically need a general way of communicating fixed and
movable host reservations to userspace :/

Robin.

> +	ret = iommu_map(domain, iova, paddr, size, prot);
> +	if (ret)
> +		free_iova(iovad, lo);
> +	return ret;
> +}
> +EXPORT_SYMBOL(iommu_dma_map_mixed);
> +
> +size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
> +			     size_t size)
> +{
> +	struct iova_domain *iovad;
> +	unsigned long lo;
> +	size_t ret;
> +
> +	if (domain->type != IOMMU_DOMAIN_MIXED)
> +		return -EINVAL;
> +
> +	if (!domain->iova_cookie)
> +		return -EINVAL;
> +
> +	iovad = cookie_iovad(domain);
> +	lo = iova_pfn(iovad, iova);
> +
> +	ret = iommu_unmap(domain, iova, size);
> +	if (ret == size)
> +		free_iova(iovad, lo);
> +	return ret;
> +}
> +EXPORT_SYMBOL(iommu_dma_unmap_mixed);
> diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
> index 1c55413..f2aa855 100644
> --- a/include/linux/dma-iommu.h
> +++ b/include/linux/dma-iommu.h
> @@ -70,6 +70,12 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);
>  int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>  		dma_addr_t base, u64 size);
>  
> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
> +			phys_addr_t paddr, size_t size, int prot);
> +
> +size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
> +			     size_t size);
> +
>  #else
>  
>  struct iommu_domain;
> @@ -99,6 +105,18 @@ static inline int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>  	return -ENODEV;
>  }
>  
> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
> +			phys_addr_t paddr, size_t size, int prot)
> +{
> +	return -ENODEV;
> +}
> +
> +size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
> +			     size_t size)
> +{
> +	return -ENODEV;
> +}
> +
>  #endif	/* CONFIG_IOMMU_DMA */
>  #endif	/* __KERNEL__ */
>  #endif	/* __DMA_IOMMU_H */
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 05/11] iommu/dma: iommu_dma_(un)map_mixed
@ 2016-09-30 13:24     ` Robin Murphy
  0 siblings, 0 replies; 38+ messages in thread
From: Robin Murphy @ 2016-09-30 13:24 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Eric,

On 27/09/16 21:48, Eric Auger wrote:
> iommu_dma_map_mixed and iommu_dma_unmap_mixed operate on
> IOMMU_DOMAIN_MIXED typed domains. On top of standard iommu_map/unmap
> they reserve the IOVA window to prevent the iova allocator to
> allocate in those areas.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>  drivers/iommu/dma-iommu.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/dma-iommu.h | 18 ++++++++++++++++++
>  2 files changed, 66 insertions(+)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 04bbc85..db21143 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -759,3 +759,51 @@ int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>  	return 0;
>  }
>  EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
> +
> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
> +			phys_addr_t paddr, size_t size, int prot)
> +{
> +	struct iova_domain *iovad;
> +	unsigned long lo, hi;
> +	int ret;
> +
> +	if (domain->type != IOMMU_DOMAIN_MIXED)
> +		return -EINVAL;
> +
> +	if (!domain->iova_cookie)
> +		return -EINVAL;
> +
> +	iovad = cookie_iovad(domain);
> +
> +	lo = iova_pfn(iovad, iova);
> +	hi = iova_pfn(iovad, iova + size - 1);
> +	reserve_iova(iovad, lo, hi);

This can't work reliably - reserve_iova() will (for good reason) merge
any adjacent or overlapping entries, so any unmap is liable to free more
IOVA space than actually gets unmapped, and things will get subtly out
of sync and go wrong later.

The more general issue with this whole approach, though, is that it
effectively rules out userspace doing guest memory hotplug or similar,
and I'm not we want to paint ourselves into that corner. Basically, as
soon as a device is attached to a guest, the entirety of the unallocated
IPA space becomes reserved, and userspace can never add anything further
to it, because any given address *might* be in use for an MSI mapping.

I think it still makes most sense to stick with the original approach of
cooperating with userspace to reserve a bounded area - it's just that we
can then let automatic mapping take care of itself within that area.

Speaking of which, I've realised the same fundamental reservation
problem already applies to PCI without ACS, regardless of MSIs. I just
tried on my Juno with guest memory placed at 0x4000000000, (i.e.
matching the host PA of the 64-bit PCI window), and sure enough when the
guest kicks off some DMA on the passed-through NIC, the root complex
interprets the guest IPA as (unsupported) peer-to-peer DMA to a BAR
claimed by the video card, and it fails. I guess this doesn't get hit in
practice on x86 because the guest memory map is unlikely to be much
different from the host's.

It seems like we basically need a general way of communicating fixed and
movable host reservations to userspace :/

Robin.

> +	ret = iommu_map(domain, iova, paddr, size, prot);
> +	if (ret)
> +		free_iova(iovad, lo);
> +	return ret;
> +}
> +EXPORT_SYMBOL(iommu_dma_map_mixed);
> +
> +size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
> +			     size_t size)
> +{
> +	struct iova_domain *iovad;
> +	unsigned long lo;
> +	size_t ret;
> +
> +	if (domain->type != IOMMU_DOMAIN_MIXED)
> +		return -EINVAL;
> +
> +	if (!domain->iova_cookie)
> +		return -EINVAL;
> +
> +	iovad = cookie_iovad(domain);
> +	lo = iova_pfn(iovad, iova);
> +
> +	ret = iommu_unmap(domain, iova, size);
> +	if (ret == size)
> +		free_iova(iovad, lo);
> +	return ret;
> +}
> +EXPORT_SYMBOL(iommu_dma_unmap_mixed);
> diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
> index 1c55413..f2aa855 100644
> --- a/include/linux/dma-iommu.h
> +++ b/include/linux/dma-iommu.h
> @@ -70,6 +70,12 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);
>  int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>  		dma_addr_t base, u64 size);
>  
> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
> +			phys_addr_t paddr, size_t size, int prot);
> +
> +size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
> +			     size_t size);
> +
>  #else
>  
>  struct iommu_domain;
> @@ -99,6 +105,18 @@ static inline int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>  	return -ENODEV;
>  }
>  
> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
> +			phys_addr_t paddr, size_t size, int prot)
> +{
> +	return -ENODEV;
> +}
> +
> +size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
> +			     size_t size)
> +{
> +	return -ENODEV;
> +}
> +
>  #endif	/* CONFIG_IOMMU_DMA */
>  #endif	/* __KERNEL__ */
>  #endif	/* __DMA_IOMMU_H */
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 05/11] iommu/dma: iommu_dma_(un)map_mixed
@ 2016-10-02  9:56       ` Christoffer Dall
  0 siblings, 0 replies; 38+ messages in thread
From: Christoffer Dall @ 2016-10-02  9:56 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Eric Auger, eric.auger.pro, marc.zyngier, alex.williamson,
	will.deacon, joro, tglx, jason, linux-arm-kernel, kvm, drjones,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi,
	Peter Maydell

On Fri, Sep 30, 2016 at 02:24:40PM +0100, Robin Murphy wrote:
> Hi Eric,
> 
> On 27/09/16 21:48, Eric Auger wrote:
> > iommu_dma_map_mixed and iommu_dma_unmap_mixed operate on
> > IOMMU_DOMAIN_MIXED typed domains. On top of standard iommu_map/unmap
> > they reserve the IOVA window to prevent the iova allocator to
> > allocate in those areas.
> > 
> > Signed-off-by: Eric Auger <eric.auger@redhat.com>
> > ---
> >  drivers/iommu/dma-iommu.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/dma-iommu.h | 18 ++++++++++++++++++
> >  2 files changed, 66 insertions(+)
> > 
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index 04bbc85..db21143 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -759,3 +759,51 @@ int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
> >  	return 0;
> >  }
> >  EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
> > +
> > +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
> > +			phys_addr_t paddr, size_t size, int prot)
> > +{
> > +	struct iova_domain *iovad;
> > +	unsigned long lo, hi;
> > +	int ret;
> > +
> > +	if (domain->type != IOMMU_DOMAIN_MIXED)
> > +		return -EINVAL;
> > +
> > +	if (!domain->iova_cookie)
> > +		return -EINVAL;
> > +
> > +	iovad = cookie_iovad(domain);
> > +
> > +	lo = iova_pfn(iovad, iova);
> > +	hi = iova_pfn(iovad, iova + size - 1);
> > +	reserve_iova(iovad, lo, hi);
> 
> This can't work reliably - reserve_iova() will (for good reason) merge
> any adjacent or overlapping entries, so any unmap is liable to free more
> IOVA space than actually gets unmapped, and things will get subtly out
> of sync and go wrong later.
> 
> The more general issue with this whole approach, though, is that it
> effectively rules out userspace doing guest memory hotplug or similar,
> and I'm not we want to paint ourselves into that corner. Basically, as
> soon as a device is attached to a guest, the entirety of the unallocated
> IPA space becomes reserved, and userspace can never add anything further
> to it, because any given address *might* be in use for an MSI mapping.

Ah, we didn't think of that when discussing this design at KVM Forum,
because the idea was that the IOVA allocator was in charge of that
resource, and the IOVA was a separate concept from the IPA space.

I think what tripped us up, is that while the above is true for the MSI
configuration where we trap the bar and do the allocation at VFIO init
time, the guest device driver can program DMA to any address without
trapping, and therefore there's an inherent relationship between the
IOVA and the IPA space.  Is that right?

> 
> I think it still makes most sense to stick with the original approach of
> cooperating with userspace to reserve a bounded area - it's just that we
> can then let automatic mapping take care of itself within that area.

I was thinking that it's also possible to do it the other way around: To
let userspace say wherever memory may be hotplugged and do the
allocation within the remaining area, but I suppose that's pretty much
the same thing, and it should just depend on what's easiest to implement
and what userspace can best predict.

> 
> Speaking of which, I've realised the same fundamental reservation
> problem already applies to PCI without ACS, regardless of MSIs. I just
> tried on my Juno with guest memory placed at 0x4000000000, (i.e.
> matching the host PA of the 64-bit PCI window), and sure enough when the
> guest kicks off some DMA on the passed-through NIC, the root complex
> interprets the guest IPA as (unsupported) peer-to-peer DMA to a BAR
> claimed by the video card, and it fails. I guess this doesn't get hit in
> practice on x86 because the guest memory map is unlikely to be much
> different from the host's.
> 
> It seems like we basically need a general way of communicating fixed and
> movable host reservations to userspace :/
> 

Yes, this makes sense to me.   Do we have any existing way of
discovering this from userspace or can we think of something?

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 05/11] iommu/dma: iommu_dma_(un)map_mixed
@ 2016-10-02  9:56       ` Christoffer Dall
  0 siblings, 0 replies; 38+ messages in thread
From: Christoffer Dall @ 2016-10-02  9:56 UTC (permalink / raw)
  To: Robin Murphy
  Cc: yehuday-eYqpPyKDWXRBDgjK7y7TUQ, drjones-H+wXaHxf7aLQT0dZR+AlfA,
	jason-NLaQJdtUoK4Be96aLqz0jA, kvm-u79uwXL29TY76Z2rM5mHXA,
	Peter Maydell, marc.zyngier-5wv7dgnIgG8,
	p.fedin-Sze3O3UU22JBDgjK7y7TUQ, will.deacon-5wv7dgnIgG8,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
	tglx-hfZtesqFncYOwBW4kG4KsQ,
	Manish.Jaggi-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	eric.auger.pro-Re5JQEeQqe8AvxtiuMwx3w

On Fri, Sep 30, 2016 at 02:24:40PM +0100, Robin Murphy wrote:
> Hi Eric,
> 
> On 27/09/16 21:48, Eric Auger wrote:
> > iommu_dma_map_mixed and iommu_dma_unmap_mixed operate on
> > IOMMU_DOMAIN_MIXED typed domains. On top of standard iommu_map/unmap
> > they reserve the IOVA window to prevent the iova allocator to
> > allocate in those areas.
> > 
> > Signed-off-by: Eric Auger <eric.auger-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > ---
> >  drivers/iommu/dma-iommu.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/dma-iommu.h | 18 ++++++++++++++++++
> >  2 files changed, 66 insertions(+)
> > 
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index 04bbc85..db21143 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -759,3 +759,51 @@ int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
> >  	return 0;
> >  }
> >  EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
> > +
> > +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
> > +			phys_addr_t paddr, size_t size, int prot)
> > +{
> > +	struct iova_domain *iovad;
> > +	unsigned long lo, hi;
> > +	int ret;
> > +
> > +	if (domain->type != IOMMU_DOMAIN_MIXED)
> > +		return -EINVAL;
> > +
> > +	if (!domain->iova_cookie)
> > +		return -EINVAL;
> > +
> > +	iovad = cookie_iovad(domain);
> > +
> > +	lo = iova_pfn(iovad, iova);
> > +	hi = iova_pfn(iovad, iova + size - 1);
> > +	reserve_iova(iovad, lo, hi);
> 
> This can't work reliably - reserve_iova() will (for good reason) merge
> any adjacent or overlapping entries, so any unmap is liable to free more
> IOVA space than actually gets unmapped, and things will get subtly out
> of sync and go wrong later.
> 
> The more general issue with this whole approach, though, is that it
> effectively rules out userspace doing guest memory hotplug or similar,
> and I'm not we want to paint ourselves into that corner. Basically, as
> soon as a device is attached to a guest, the entirety of the unallocated
> IPA space becomes reserved, and userspace can never add anything further
> to it, because any given address *might* be in use for an MSI mapping.

Ah, we didn't think of that when discussing this design at KVM Forum,
because the idea was that the IOVA allocator was in charge of that
resource, and the IOVA was a separate concept from the IPA space.

I think what tripped us up, is that while the above is true for the MSI
configuration where we trap the bar and do the allocation at VFIO init
time, the guest device driver can program DMA to any address without
trapping, and therefore there's an inherent relationship between the
IOVA and the IPA space.  Is that right?

> 
> I think it still makes most sense to stick with the original approach of
> cooperating with userspace to reserve a bounded area - it's just that we
> can then let automatic mapping take care of itself within that area.

I was thinking that it's also possible to do it the other way around: To
let userspace say wherever memory may be hotplugged and do the
allocation within the remaining area, but I suppose that's pretty much
the same thing, and it should just depend on what's easiest to implement
and what userspace can best predict.

> 
> Speaking of which, I've realised the same fundamental reservation
> problem already applies to PCI without ACS, regardless of MSIs. I just
> tried on my Juno with guest memory placed at 0x4000000000, (i.e.
> matching the host PA of the 64-bit PCI window), and sure enough when the
> guest kicks off some DMA on the passed-through NIC, the root complex
> interprets the guest IPA as (unsupported) peer-to-peer DMA to a BAR
> claimed by the video card, and it fails. I guess this doesn't get hit in
> practice on x86 because the guest memory map is unlikely to be much
> different from the host's.
> 
> It seems like we basically need a general way of communicating fixed and
> movable host reservations to userspace :/
> 

Yes, this makes sense to me.   Do we have any existing way of
discovering this from userspace or can we think of something?

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 05/11] iommu/dma: iommu_dma_(un)map_mixed
@ 2016-10-02  9:56       ` Christoffer Dall
  0 siblings, 0 replies; 38+ messages in thread
From: Christoffer Dall @ 2016-10-02  9:56 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Sep 30, 2016 at 02:24:40PM +0100, Robin Murphy wrote:
> Hi Eric,
> 
> On 27/09/16 21:48, Eric Auger wrote:
> > iommu_dma_map_mixed and iommu_dma_unmap_mixed operate on
> > IOMMU_DOMAIN_MIXED typed domains. On top of standard iommu_map/unmap
> > they reserve the IOVA window to prevent the iova allocator to
> > allocate in those areas.
> > 
> > Signed-off-by: Eric Auger <eric.auger@redhat.com>
> > ---
> >  drivers/iommu/dma-iommu.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/dma-iommu.h | 18 ++++++++++++++++++
> >  2 files changed, 66 insertions(+)
> > 
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index 04bbc85..db21143 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -759,3 +759,51 @@ int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
> >  	return 0;
> >  }
> >  EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
> > +
> > +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
> > +			phys_addr_t paddr, size_t size, int prot)
> > +{
> > +	struct iova_domain *iovad;
> > +	unsigned long lo, hi;
> > +	int ret;
> > +
> > +	if (domain->type != IOMMU_DOMAIN_MIXED)
> > +		return -EINVAL;
> > +
> > +	if (!domain->iova_cookie)
> > +		return -EINVAL;
> > +
> > +	iovad = cookie_iovad(domain);
> > +
> > +	lo = iova_pfn(iovad, iova);
> > +	hi = iova_pfn(iovad, iova + size - 1);
> > +	reserve_iova(iovad, lo, hi);
> 
> This can't work reliably - reserve_iova() will (for good reason) merge
> any adjacent or overlapping entries, so any unmap is liable to free more
> IOVA space than actually gets unmapped, and things will get subtly out
> of sync and go wrong later.
> 
> The more general issue with this whole approach, though, is that it
> effectively rules out userspace doing guest memory hotplug or similar,
> and I'm not we want to paint ourselves into that corner. Basically, as
> soon as a device is attached to a guest, the entirety of the unallocated
> IPA space becomes reserved, and userspace can never add anything further
> to it, because any given address *might* be in use for an MSI mapping.

Ah, we didn't think of that when discussing this design at KVM Forum,
because the idea was that the IOVA allocator was in charge of that
resource, and the IOVA was a separate concept from the IPA space.

I think what tripped us up, is that while the above is true for the MSI
configuration where we trap the bar and do the allocation at VFIO init
time, the guest device driver can program DMA to any address without
trapping, and therefore there's an inherent relationship between the
IOVA and the IPA space.  Is that right?

> 
> I think it still makes most sense to stick with the original approach of
> cooperating with userspace to reserve a bounded area - it's just that we
> can then let automatic mapping take care of itself within that area.

I was thinking that it's also possible to do it the other way around: To
let userspace say wherever memory may be hotplugged and do the
allocation within the remaining area, but I suppose that's pretty much
the same thing, and it should just depend on what's easiest to implement
and what userspace can best predict.

> 
> Speaking of which, I've realised the same fundamental reservation
> problem already applies to PCI without ACS, regardless of MSIs. I just
> tried on my Juno with guest memory placed at 0x4000000000, (i.e.
> matching the host PA of the 64-bit PCI window), and sure enough when the
> guest kicks off some DMA on the passed-through NIC, the root complex
> interprets the guest IPA as (unsupported) peer-to-peer DMA to a BAR
> claimed by the video card, and it fails. I guess this doesn't get hit in
> practice on x86 because the guest memory map is unlikely to be much
> different from the host's.
> 
> It seems like we basically need a general way of communicating fixed and
> movable host reservations to userspace :/
> 

Yes, this makes sense to me.   Do we have any existing way of
discovering this from userspace or can we think of something?

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 05/11] iommu/dma: iommu_dma_(un)map_mixed
@ 2016-10-03  9:38       ` Auger Eric
  0 siblings, 0 replies; 38+ messages in thread
From: Auger Eric @ 2016-10-03  9:38 UTC (permalink / raw)
  To: Robin Murphy, eric.auger.pro, christoffer.dall, marc.zyngier,
	alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

Hi Robin,

On 30/09/2016 15:24, Robin Murphy wrote:
> Hi Eric,
> 
> On 27/09/16 21:48, Eric Auger wrote:
>> iommu_dma_map_mixed and iommu_dma_unmap_mixed operate on
>> IOMMU_DOMAIN_MIXED typed domains. On top of standard iommu_map/unmap
>> they reserve the IOVA window to prevent the iova allocator to
>> allocate in those areas.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>>  drivers/iommu/dma-iommu.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/dma-iommu.h | 18 ++++++++++++++++++
>>  2 files changed, 66 insertions(+)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index 04bbc85..db21143 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -759,3 +759,51 @@ int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>  	return 0;
>>  }
>>  EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
>> +
>> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
>> +			phys_addr_t paddr, size_t size, int prot)
>> +{
>> +	struct iova_domain *iovad;
>> +	unsigned long lo, hi;
>> +	int ret;
>> +
>> +	if (domain->type != IOMMU_DOMAIN_MIXED)
>> +		return -EINVAL;
>> +
>> +	if (!domain->iova_cookie)
>> +		return -EINVAL;
>> +
>> +	iovad = cookie_iovad(domain);
>> +
>> +	lo = iova_pfn(iovad, iova);
>> +	hi = iova_pfn(iovad, iova + size - 1);
>> +	reserve_iova(iovad, lo, hi);
> 
> This can't work reliably - reserve_iova() will (for good reason) merge
> any adjacent or overlapping entries, so any unmap is liable to free more
> IOVA space than actually gets unmapped, and things will get subtly out
> of sync and go wrong later.
OK. I did not notice that.
> 
> The more general issue with this whole approach, though, is that it
> effectively rules out userspace doing guest memory hotplug or similar,
> and I'm not we want to paint ourselves into that corner. Basically, as
> soon as a device is attached to a guest, the entirety of the unallocated
> IPA space becomes reserved, and userspace can never add anything further
> to it, because any given address *might* be in use for an MSI mapping.
I fully agree. My bad, I mixed up about how/when the PCI MMIO space was
iommu mapped. So we don't have any other solution than having the guest
providing unused and non reserved GPA. Back to the original approach then.
> 
> I think it still makes most sense to stick with the original approach of
> cooperating with userspace to reserve a bounded area - it's just that we
> can then let automatic mapping take care of itself within that area.
OK will respin asap.
> 
> Speaking of which, I've realised the same fundamental reservation
> problem already applies to PCI without ACS, regardless of MSIs. I just
> tried on my Juno with guest memory placed at 0x4000000000, (i.e.
> matching the host PA of the 64-bit PCI window), and sure enough when the
> guest kicks off some DMA on the passed-through NIC, the root complex
> interprets the guest IPA as (unsupported) peer-to-peer DMA to a BAR
> claimed by the video card, and it fails. I guess this doesn't get hit in
> practice on x86 because the guest memory map is unlikely to be much
> different from the host's.
> 
> It seems like we basically need a general way of communicating fixed and
> movable host reservations to userspace :/

Yes I saw "iommu/dma: Avoid PCI host bridge windows". Well this looks
like a generalisation of the MSI geometry issue (they also face this one
on x86 with a non x86 guest). This will also hit the fact that on QEMU
the ARM guest memory map is static.

Thank you for your time

Best Regards

Eric
> 
> Robin.
> 
>> +	ret = iommu_map(domain, iova, paddr, size, prot);
>> +	if (ret)
>> +		free_iova(iovad, lo);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL(iommu_dma_map_mixed);
>> +
>> +size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
>> +			     size_t size)
>> +{
>> +	struct iova_domain *iovad;
>> +	unsigned long lo;
>> +	size_t ret;
>> +
>> +	if (domain->type != IOMMU_DOMAIN_MIXED)
>> +		return -EINVAL;
>> +
>> +	if (!domain->iova_cookie)
>> +		return -EINVAL;
>> +
>> +	iovad = cookie_iovad(domain);
>> +	lo = iova_pfn(iovad, iova);
>> +
>> +	ret = iommu_unmap(domain, iova, size);
>> +	if (ret == size)
>> +		free_iova(iovad, lo);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL(iommu_dma_unmap_mixed);
>> diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
>> index 1c55413..f2aa855 100644
>> --- a/include/linux/dma-iommu.h
>> +++ b/include/linux/dma-iommu.h
>> @@ -70,6 +70,12 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);
>>  int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>  		dma_addr_t base, u64 size);
>>  
>> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
>> +			phys_addr_t paddr, size_t size, int prot);
>> +
>> +size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
>> +			     size_t size);
>> +
>>  #else
>>  
>>  struct iommu_domain;
>> @@ -99,6 +105,18 @@ static inline int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>  	return -ENODEV;
>>  }
>>  
>> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
>> +			phys_addr_t paddr, size_t size, int prot)
>> +{
>> +	return -ENODEV;
>> +}
>> +
>> +size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
>> +			     size_t size)
>> +{
>> +	return -ENODEV;
>> +}
>> +
>>  #endif	/* CONFIG_IOMMU_DMA */
>>  #endif	/* __KERNEL__ */
>>  #endif	/* __DMA_IOMMU_H */
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 05/11] iommu/dma: iommu_dma_(un)map_mixed
@ 2016-10-03  9:38       ` Auger Eric
  0 siblings, 0 replies; 38+ messages in thread
From: Auger Eric @ 2016-10-03  9:38 UTC (permalink / raw)
  To: Robin Murphy, eric.auger.pro-Re5JQEeQqe8AvxtiuMwx3w,
	christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
	marc.zyngier-5wv7dgnIgG8, alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	will.deacon-5wv7dgnIgG8, joro-zLv9SwRftAIdnm+yROfE0A,
	tglx-hfZtesqFncYOwBW4kG4KsQ, jason-NLaQJdtUoK4Be96aLqz0jA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: drjones-H+wXaHxf7aLQT0dZR+AlfA, kvm-u79uwXL29TY76Z2rM5mHXA,
	Manish.Jaggi-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
	yehuday-eYqpPyKDWXRBDgjK7y7TUQ

Hi Robin,

On 30/09/2016 15:24, Robin Murphy wrote:
> Hi Eric,
> 
> On 27/09/16 21:48, Eric Auger wrote:
>> iommu_dma_map_mixed and iommu_dma_unmap_mixed operate on
>> IOMMU_DOMAIN_MIXED typed domains. On top of standard iommu_map/unmap
>> they reserve the IOVA window to prevent the iova allocator to
>> allocate in those areas.
>>
>> Signed-off-by: Eric Auger <eric.auger-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>> ---
>>  drivers/iommu/dma-iommu.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/dma-iommu.h | 18 ++++++++++++++++++
>>  2 files changed, 66 insertions(+)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index 04bbc85..db21143 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -759,3 +759,51 @@ int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>  	return 0;
>>  }
>>  EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
>> +
>> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
>> +			phys_addr_t paddr, size_t size, int prot)
>> +{
>> +	struct iova_domain *iovad;
>> +	unsigned long lo, hi;
>> +	int ret;
>> +
>> +	if (domain->type != IOMMU_DOMAIN_MIXED)
>> +		return -EINVAL;
>> +
>> +	if (!domain->iova_cookie)
>> +		return -EINVAL;
>> +
>> +	iovad = cookie_iovad(domain);
>> +
>> +	lo = iova_pfn(iovad, iova);
>> +	hi = iova_pfn(iovad, iova + size - 1);
>> +	reserve_iova(iovad, lo, hi);
> 
> This can't work reliably - reserve_iova() will (for good reason) merge
> any adjacent or overlapping entries, so any unmap is liable to free more
> IOVA space than actually gets unmapped, and things will get subtly out
> of sync and go wrong later.
OK. I did not notice that.
> 
> The more general issue with this whole approach, though, is that it
> effectively rules out userspace doing guest memory hotplug or similar,
> and I'm not we want to paint ourselves into that corner. Basically, as
> soon as a device is attached to a guest, the entirety of the unallocated
> IPA space becomes reserved, and userspace can never add anything further
> to it, because any given address *might* be in use for an MSI mapping.
I fully agree. My bad, I mixed up about how/when the PCI MMIO space was
iommu mapped. So we don't have any other solution than having the guest
providing unused and non reserved GPA. Back to the original approach then.
> 
> I think it still makes most sense to stick with the original approach of
> cooperating with userspace to reserve a bounded area - it's just that we
> can then let automatic mapping take care of itself within that area.
OK will respin asap.
> 
> Speaking of which, I've realised the same fundamental reservation
> problem already applies to PCI without ACS, regardless of MSIs. I just
> tried on my Juno with guest memory placed at 0x4000000000, (i.e.
> matching the host PA of the 64-bit PCI window), and sure enough when the
> guest kicks off some DMA on the passed-through NIC, the root complex
> interprets the guest IPA as (unsupported) peer-to-peer DMA to a BAR
> claimed by the video card, and it fails. I guess this doesn't get hit in
> practice on x86 because the guest memory map is unlikely to be much
> different from the host's.
> 
> It seems like we basically need a general way of communicating fixed and
> movable host reservations to userspace :/

Yes I saw "iommu/dma: Avoid PCI host bridge windows". Well this looks
like a generalisation of the MSI geometry issue (they also face this one
on x86 with a non x86 guest). This will also hit the fact that on QEMU
the ARM guest memory map is static.

Thank you for your time

Best Regards

Eric
> 
> Robin.
> 
>> +	ret = iommu_map(domain, iova, paddr, size, prot);
>> +	if (ret)
>> +		free_iova(iovad, lo);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL(iommu_dma_map_mixed);
>> +
>> +size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
>> +			     size_t size)
>> +{
>> +	struct iova_domain *iovad;
>> +	unsigned long lo;
>> +	size_t ret;
>> +
>> +	if (domain->type != IOMMU_DOMAIN_MIXED)
>> +		return -EINVAL;
>> +
>> +	if (!domain->iova_cookie)
>> +		return -EINVAL;
>> +
>> +	iovad = cookie_iovad(domain);
>> +	lo = iova_pfn(iovad, iova);
>> +
>> +	ret = iommu_unmap(domain, iova, size);
>> +	if (ret == size)
>> +		free_iova(iovad, lo);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL(iommu_dma_unmap_mixed);
>> diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
>> index 1c55413..f2aa855 100644
>> --- a/include/linux/dma-iommu.h
>> +++ b/include/linux/dma-iommu.h
>> @@ -70,6 +70,12 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);
>>  int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>  		dma_addr_t base, u64 size);
>>  
>> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
>> +			phys_addr_t paddr, size_t size, int prot);
>> +
>> +size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
>> +			     size_t size);
>> +
>>  #else
>>  
>>  struct iommu_domain;
>> @@ -99,6 +105,18 @@ static inline int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>  	return -ENODEV;
>>  }
>>  
>> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
>> +			phys_addr_t paddr, size_t size, int prot)
>> +{
>> +	return -ENODEV;
>> +}
>> +
>> +size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
>> +			     size_t size)
>> +{
>> +	return -ENODEV;
>> +}
>> +
>>  #endif	/* CONFIG_IOMMU_DMA */
>>  #endif	/* __KERNEL__ */
>>  #endif	/* __DMA_IOMMU_H */
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 05/11] iommu/dma: iommu_dma_(un)map_mixed
@ 2016-10-03  9:38       ` Auger Eric
  0 siblings, 0 replies; 38+ messages in thread
From: Auger Eric @ 2016-10-03  9:38 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Robin,

On 30/09/2016 15:24, Robin Murphy wrote:
> Hi Eric,
> 
> On 27/09/16 21:48, Eric Auger wrote:
>> iommu_dma_map_mixed and iommu_dma_unmap_mixed operate on
>> IOMMU_DOMAIN_MIXED typed domains. On top of standard iommu_map/unmap
>> they reserve the IOVA window to prevent the iova allocator to
>> allocate in those areas.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>>  drivers/iommu/dma-iommu.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/dma-iommu.h | 18 ++++++++++++++++++
>>  2 files changed, 66 insertions(+)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index 04bbc85..db21143 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -759,3 +759,51 @@ int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>  	return 0;
>>  }
>>  EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
>> +
>> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
>> +			phys_addr_t paddr, size_t size, int prot)
>> +{
>> +	struct iova_domain *iovad;
>> +	unsigned long lo, hi;
>> +	int ret;
>> +
>> +	if (domain->type != IOMMU_DOMAIN_MIXED)
>> +		return -EINVAL;
>> +
>> +	if (!domain->iova_cookie)
>> +		return -EINVAL;
>> +
>> +	iovad = cookie_iovad(domain);
>> +
>> +	lo = iova_pfn(iovad, iova);
>> +	hi = iova_pfn(iovad, iova + size - 1);
>> +	reserve_iova(iovad, lo, hi);
> 
> This can't work reliably - reserve_iova() will (for good reason) merge
> any adjacent or overlapping entries, so any unmap is liable to free more
> IOVA space than actually gets unmapped, and things will get subtly out
> of sync and go wrong later.
OK. I did not notice that.
> 
> The more general issue with this whole approach, though, is that it
> effectively rules out userspace doing guest memory hotplug or similar,
> and I'm not we want to paint ourselves into that corner. Basically, as
> soon as a device is attached to a guest, the entirety of the unallocated
> IPA space becomes reserved, and userspace can never add anything further
> to it, because any given address *might* be in use for an MSI mapping.
I fully agree. My bad, I mixed up about how/when the PCI MMIO space was
iommu mapped. So we don't have any other solution than having the guest
providing unused and non reserved GPA. Back to the original approach then.
> 
> I think it still makes most sense to stick with the original approach of
> cooperating with userspace to reserve a bounded area - it's just that we
> can then let automatic mapping take care of itself within that area.
OK will respin asap.
> 
> Speaking of which, I've realised the same fundamental reservation
> problem already applies to PCI without ACS, regardless of MSIs. I just
> tried on my Juno with guest memory placed at 0x4000000000, (i.e.
> matching the host PA of the 64-bit PCI window), and sure enough when the
> guest kicks off some DMA on the passed-through NIC, the root complex
> interprets the guest IPA as (unsupported) peer-to-peer DMA to a BAR
> claimed by the video card, and it fails. I guess this doesn't get hit in
> practice on x86 because the guest memory map is unlikely to be much
> different from the host's.
> 
> It seems like we basically need a general way of communicating fixed and
> movable host reservations to userspace :/

Yes I saw "iommu/dma: Avoid PCI host bridge windows". Well this looks
like a generalisation of the MSI geometry issue (they also face this one
on x86 with a non x86 guest). This will also hit the fact that on QEMU
the ARM guest memory map is static.

Thank you for your time

Best Regards

Eric
> 
> Robin.
> 
>> +	ret = iommu_map(domain, iova, paddr, size, prot);
>> +	if (ret)
>> +		free_iova(iovad, lo);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL(iommu_dma_map_mixed);
>> +
>> +size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
>> +			     size_t size)
>> +{
>> +	struct iova_domain *iovad;
>> +	unsigned long lo;
>> +	size_t ret;
>> +
>> +	if (domain->type != IOMMU_DOMAIN_MIXED)
>> +		return -EINVAL;
>> +
>> +	if (!domain->iova_cookie)
>> +		return -EINVAL;
>> +
>> +	iovad = cookie_iovad(domain);
>> +	lo = iova_pfn(iovad, iova);
>> +
>> +	ret = iommu_unmap(domain, iova, size);
>> +	if (ret == size)
>> +		free_iova(iovad, lo);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL(iommu_dma_unmap_mixed);
>> diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
>> index 1c55413..f2aa855 100644
>> --- a/include/linux/dma-iommu.h
>> +++ b/include/linux/dma-iommu.h
>> @@ -70,6 +70,12 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);
>>  int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>  		dma_addr_t base, u64 size);
>>  
>> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
>> +			phys_addr_t paddr, size_t size, int prot);
>> +
>> +size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
>> +			     size_t size);
>> +
>>  #else
>>  
>>  struct iommu_domain;
>> @@ -99,6 +105,18 @@ static inline int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>  	return -ENODEV;
>>  }
>>  
>> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
>> +			phys_addr_t paddr, size_t size, int prot)
>> +{
>> +	return -ENODEV;
>> +}
>> +
>> +size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
>> +			     size_t size)
>> +{
>> +	return -ENODEV;
>> +}
>> +
>>  #endif	/* CONFIG_IOMMU_DMA */
>>  #endif	/* __KERNEL__ */
>>  #endif	/* __DMA_IOMMU_H */
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 05/11] iommu/dma: iommu_dma_(un)map_mixed
  2016-10-02  9:56       ` Christoffer Dall
@ 2016-10-04 17:18         ` Robin Murphy
  -1 siblings, 0 replies; 38+ messages in thread
From: Robin Murphy @ 2016-10-04 17:18 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Eric Auger, eric.auger.pro, marc.zyngier, alex.williamson,
	will.deacon, joro, tglx, jason, linux-arm-kernel, kvm, drjones,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi,
	Peter Maydell

On 02/10/16 10:56, Christoffer Dall wrote:
> On Fri, Sep 30, 2016 at 02:24:40PM +0100, Robin Murphy wrote:
>> Hi Eric,
>>
>> On 27/09/16 21:48, Eric Auger wrote:
>>> iommu_dma_map_mixed and iommu_dma_unmap_mixed operate on
>>> IOMMU_DOMAIN_MIXED typed domains. On top of standard iommu_map/unmap
>>> they reserve the IOVA window to prevent the iova allocator to
>>> allocate in those areas.
>>>
>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>> ---
>>>  drivers/iommu/dma-iommu.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++
>>>  include/linux/dma-iommu.h | 18 ++++++++++++++++++
>>>  2 files changed, 66 insertions(+)
>>>
>>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>>> index 04bbc85..db21143 100644
>>> --- a/drivers/iommu/dma-iommu.c
>>> +++ b/drivers/iommu/dma-iommu.c
>>> @@ -759,3 +759,51 @@ int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>>  	return 0;
>>>  }
>>>  EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
>>> +
>>> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
>>> +			phys_addr_t paddr, size_t size, int prot)
>>> +{
>>> +	struct iova_domain *iovad;
>>> +	unsigned long lo, hi;
>>> +	int ret;
>>> +
>>> +	if (domain->type != IOMMU_DOMAIN_MIXED)
>>> +		return -EINVAL;
>>> +
>>> +	if (!domain->iova_cookie)
>>> +		return -EINVAL;
>>> +
>>> +	iovad = cookie_iovad(domain);
>>> +
>>> +	lo = iova_pfn(iovad, iova);
>>> +	hi = iova_pfn(iovad, iova + size - 1);
>>> +	reserve_iova(iovad, lo, hi);
>>
>> This can't work reliably - reserve_iova() will (for good reason) merge
>> any adjacent or overlapping entries, so any unmap is liable to free more
>> IOVA space than actually gets unmapped, and things will get subtly out
>> of sync and go wrong later.
>>
>> The more general issue with this whole approach, though, is that it
>> effectively rules out userspace doing guest memory hotplug or similar,
>> and I'm not we want to paint ourselves into that corner. Basically, as
>> soon as a device is attached to a guest, the entirety of the unallocated
>> IPA space becomes reserved, and userspace can never add anything further
>> to it, because any given address *might* be in use for an MSI mapping.
> 
> Ah, we didn't think of that when discussing this design at KVM Forum,
> because the idea was that the IOVA allocator was in charge of that
> resource, and the IOVA was a separate concept from the IPA space.
> 
> I think what tripped us up, is that while the above is true for the MSI
> configuration where we trap the bar and do the allocation at VFIO init
> time, the guest device driver can program DMA to any address without
> trapping, and therefore there's an inherent relationship between the
> IOVA and the IPA space.  Is that right?

Yes, for anything the guest knows about and/or can touch directly, IOVA
must equal IPA, or DMA is going to go horribly wrong. It's only direct
interactions between device and host behind the guest's back where we
(may) have some freedom with IOVA assignment.

>> I think it still makes most sense to stick with the original approach of
>> cooperating with userspace to reserve a bounded area - it's just that we
>> can then let automatic mapping take care of itself within that area.
> 
> I was thinking that it's also possible to do it the other way around: To
> let userspace say wherever memory may be hotplugged and do the
> allocation within the remaining area, but I suppose that's pretty much
> the same thing, and it should just depend on what's easiest to implement
> and what userspace can best predict.

Indeed, if userspace *is* able to pre-emptively claim everything it
might ever want, that does kind of implicitly solve the "tell me where I
can put this" problem (assuming it doesn't simply claim the whole
address space, of course), but I'm not so sure it works well if there
are any specific restrictions (e.g. if some device is going to require
the MSI range to be 32-bit addressable). It also fails to address the
issue below...

>> Speaking of which, I've realised the same fundamental reservation
>> problem already applies to PCI without ACS, regardless of MSIs. I just
>> tried on my Juno with guest memory placed at 0x4000000000, (i.e.
>> matching the host PA of the 64-bit PCI window), and sure enough when the
>> guest kicks off some DMA on the passed-through NIC, the root complex
>> interprets the guest IPA as (unsupported) peer-to-peer DMA to a BAR
>> claimed by the video card, and it fails. I guess this doesn't get hit in
>> practice on x86 because the guest memory map is unlikely to be much
>> different from the host's.
>>
>> It seems like we basically need a general way of communicating fixed and
>> movable host reservations to userspace :/
>>
> 
> Yes, this makes sense to me.   Do we have any existing way of
> discovering this from userspace or can we think of something?

I know virtually nothing about the userspace interface, but I was under
the impression it would require something new. I wasn't even aware you
could do the VFIO-under-QEMU-TCG thing which Eric points out, so it
seems like the general "tell userspace about addresses it can't use"
issue is perhaps the more pressing one. On investigation, QEMU's static
memory map with RAM at 0x4000000 is already busted for VFIO on Juno, as
that results in attempting DMA to config space, which goes about as well
as one might expect.

Robin.

> 
> Thanks,
> -Christoffer
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 05/11] iommu/dma: iommu_dma_(un)map_mixed
@ 2016-10-04 17:18         ` Robin Murphy
  0 siblings, 0 replies; 38+ messages in thread
From: Robin Murphy @ 2016-10-04 17:18 UTC (permalink / raw)
  To: linux-arm-kernel

On 02/10/16 10:56, Christoffer Dall wrote:
> On Fri, Sep 30, 2016 at 02:24:40PM +0100, Robin Murphy wrote:
>> Hi Eric,
>>
>> On 27/09/16 21:48, Eric Auger wrote:
>>> iommu_dma_map_mixed and iommu_dma_unmap_mixed operate on
>>> IOMMU_DOMAIN_MIXED typed domains. On top of standard iommu_map/unmap
>>> they reserve the IOVA window to prevent the iova allocator to
>>> allocate in those areas.
>>>
>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>> ---
>>>  drivers/iommu/dma-iommu.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++
>>>  include/linux/dma-iommu.h | 18 ++++++++++++++++++
>>>  2 files changed, 66 insertions(+)
>>>
>>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>>> index 04bbc85..db21143 100644
>>> --- a/drivers/iommu/dma-iommu.c
>>> +++ b/drivers/iommu/dma-iommu.c
>>> @@ -759,3 +759,51 @@ int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>>  	return 0;
>>>  }
>>>  EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
>>> +
>>> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
>>> +			phys_addr_t paddr, size_t size, int prot)
>>> +{
>>> +	struct iova_domain *iovad;
>>> +	unsigned long lo, hi;
>>> +	int ret;
>>> +
>>> +	if (domain->type != IOMMU_DOMAIN_MIXED)
>>> +		return -EINVAL;
>>> +
>>> +	if (!domain->iova_cookie)
>>> +		return -EINVAL;
>>> +
>>> +	iovad = cookie_iovad(domain);
>>> +
>>> +	lo = iova_pfn(iovad, iova);
>>> +	hi = iova_pfn(iovad, iova + size - 1);
>>> +	reserve_iova(iovad, lo, hi);
>>
>> This can't work reliably - reserve_iova() will (for good reason) merge
>> any adjacent or overlapping entries, so any unmap is liable to free more
>> IOVA space than actually gets unmapped, and things will get subtly out
>> of sync and go wrong later.
>>
>> The more general issue with this whole approach, though, is that it
>> effectively rules out userspace doing guest memory hotplug or similar,
>> and I'm not we want to paint ourselves into that corner. Basically, as
>> soon as a device is attached to a guest, the entirety of the unallocated
>> IPA space becomes reserved, and userspace can never add anything further
>> to it, because any given address *might* be in use for an MSI mapping.
> 
> Ah, we didn't think of that when discussing this design at KVM Forum,
> because the idea was that the IOVA allocator was in charge of that
> resource, and the IOVA was a separate concept from the IPA space.
> 
> I think what tripped us up, is that while the above is true for the MSI
> configuration where we trap the bar and do the allocation at VFIO init
> time, the guest device driver can program DMA to any address without
> trapping, and therefore there's an inherent relationship between the
> IOVA and the IPA space.  Is that right?

Yes, for anything the guest knows about and/or can touch directly, IOVA
must equal IPA, or DMA is going to go horribly wrong. It's only direct
interactions between device and host behind the guest's back where we
(may) have some freedom with IOVA assignment.

>> I think it still makes most sense to stick with the original approach of
>> cooperating with userspace to reserve a bounded area - it's just that we
>> can then let automatic mapping take care of itself within that area.
> 
> I was thinking that it's also possible to do it the other way around: To
> let userspace say wherever memory may be hotplugged and do the
> allocation within the remaining area, but I suppose that's pretty much
> the same thing, and it should just depend on what's easiest to implement
> and what userspace can best predict.

Indeed, if userspace *is* able to pre-emptively claim everything it
might ever want, that does kind of implicitly solve the "tell me where I
can put this" problem (assuming it doesn't simply claim the whole
address space, of course), but I'm not so sure it works well if there
are any specific restrictions (e.g. if some device is going to require
the MSI range to be 32-bit addressable). It also fails to address the
issue below...

>> Speaking of which, I've realised the same fundamental reservation
>> problem already applies to PCI without ACS, regardless of MSIs. I just
>> tried on my Juno with guest memory placed at 0x4000000000, (i.e.
>> matching the host PA of the 64-bit PCI window), and sure enough when the
>> guest kicks off some DMA on the passed-through NIC, the root complex
>> interprets the guest IPA as (unsupported) peer-to-peer DMA to a BAR
>> claimed by the video card, and it fails. I guess this doesn't get hit in
>> practice on x86 because the guest memory map is unlikely to be much
>> different from the host's.
>>
>> It seems like we basically need a general way of communicating fixed and
>> movable host reservations to userspace :/
>>
> 
> Yes, this makes sense to me.   Do we have any existing way of
> discovering this from userspace or can we think of something?

I know virtually nothing about the userspace interface, but I was under
the impression it would require something new. I wasn't even aware you
could do the VFIO-under-QEMU-TCG thing which Eric points out, so it
seems like the general "tell userspace about addresses it can't use"
issue is perhaps the more pressing one. On investigation, QEMU's static
memory map with RAM at 0x4000000 is already busted for VFIO on Juno, as
that results in attempting DMA to config space, which goes about as well
as one might expect.

Robin.

> 
> Thanks,
> -Christoffer
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 05/11] iommu/dma: iommu_dma_(un)map_mixed
  2016-10-04 17:18         ` Robin Murphy
@ 2016-10-04 17:37           ` Auger Eric
  -1 siblings, 0 replies; 38+ messages in thread
From: Auger Eric @ 2016-10-04 17:37 UTC (permalink / raw)
  To: Robin Murphy, Christoffer Dall
  Cc: eric.auger.pro, marc.zyngier, alex.williamson, will.deacon, joro,
	tglx, jason, linux-arm-kernel, kvm, drjones, linux-kernel,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	Jean-Philippe.Brucker, yehuday, Manish.Jaggi, Peter Maydell

Hi Robin,

On 04/10/2016 19:18, Robin Murphy wrote:
> On 02/10/16 10:56, Christoffer Dall wrote:
>> On Fri, Sep 30, 2016 at 02:24:40PM +0100, Robin Murphy wrote:
>>> Hi Eric,
>>>
>>> On 27/09/16 21:48, Eric Auger wrote:
>>>> iommu_dma_map_mixed and iommu_dma_unmap_mixed operate on
>>>> IOMMU_DOMAIN_MIXED typed domains. On top of standard iommu_map/unmap
>>>> they reserve the IOVA window to prevent the iova allocator to
>>>> allocate in those areas.
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>> ---
>>>>  drivers/iommu/dma-iommu.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++
>>>>  include/linux/dma-iommu.h | 18 ++++++++++++++++++
>>>>  2 files changed, 66 insertions(+)
>>>>
>>>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>>>> index 04bbc85..db21143 100644
>>>> --- a/drivers/iommu/dma-iommu.c
>>>> +++ b/drivers/iommu/dma-iommu.c
>>>> @@ -759,3 +759,51 @@ int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>>>  	return 0;
>>>>  }
>>>>  EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
>>>> +
>>>> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
>>>> +			phys_addr_t paddr, size_t size, int prot)
>>>> +{
>>>> +	struct iova_domain *iovad;
>>>> +	unsigned long lo, hi;
>>>> +	int ret;
>>>> +
>>>> +	if (domain->type != IOMMU_DOMAIN_MIXED)
>>>> +		return -EINVAL;
>>>> +
>>>> +	if (!domain->iova_cookie)
>>>> +		return -EINVAL;
>>>> +
>>>> +	iovad = cookie_iovad(domain);
>>>> +
>>>> +	lo = iova_pfn(iovad, iova);
>>>> +	hi = iova_pfn(iovad, iova + size - 1);
>>>> +	reserve_iova(iovad, lo, hi);
>>>
>>> This can't work reliably - reserve_iova() will (for good reason) merge
>>> any adjacent or overlapping entries, so any unmap is liable to free more
>>> IOVA space than actually gets unmapped, and things will get subtly out
>>> of sync and go wrong later.
>>>
>>> The more general issue with this whole approach, though, is that it
>>> effectively rules out userspace doing guest memory hotplug or similar,
>>> and I'm not we want to paint ourselves into that corner. Basically, as
>>> soon as a device is attached to a guest, the entirety of the unallocated
>>> IPA space becomes reserved, and userspace can never add anything further
>>> to it, because any given address *might* be in use for an MSI mapping.
>>
>> Ah, we didn't think of that when discussing this design at KVM Forum,
>> because the idea was that the IOVA allocator was in charge of that
>> resource, and the IOVA was a separate concept from the IPA space.
>>
>> I think what tripped us up, is that while the above is true for the MSI
>> configuration where we trap the bar and do the allocation at VFIO init
>> time, the guest device driver can program DMA to any address without
>> trapping, and therefore there's an inherent relationship between the
>> IOVA and the IPA space.  Is that right?
> 
> Yes, for anything the guest knows about and/or can touch directly, IOVA
> must equal IPA, or DMA is going to go horribly wrong. It's only direct
> interactions between device and host behind the guest's back where we
> (may) have some freedom with IOVA assignment.
> 
>>> I think it still makes most sense to stick with the original approach of
>>> cooperating with userspace to reserve a bounded area - it's just that we
>>> can then let automatic mapping take care of itself within that area.
>>
>> I was thinking that it's also possible to do it the other way around: To
>> let userspace say wherever memory may be hotplugged and do the
>> allocation within the remaining area, but I suppose that's pretty much
>> the same thing, and it should just depend on what's easiest to implement
>> and what userspace can best predict.
> 
> Indeed, if userspace *is* able to pre-emptively claim everything it
> might ever want, that does kind of implicitly solve the "tell me where I
> can put this" problem (assuming it doesn't simply claim the whole
> address space, of course), but I'm not so sure it works well if there
> are any specific restrictions (e.g. if some device is going to require
> the MSI range to be 32-bit addressable). It also fails to address the
> issue below...
> 
>>> Speaking of which, I've realised the same fundamental reservation
>>> problem already applies to PCI without ACS, regardless of MSIs. I just
>>> tried on my Juno with guest memory placed at 0x4000000000, (i.e.
>>> matching the host PA of the 64-bit PCI window), and sure enough when the
>>> guest kicks off some DMA on the passed-through NIC, the root complex
>>> interprets the guest IPA as (unsupported) peer-to-peer DMA to a BAR
>>> claimed by the video card, and it fails. I guess this doesn't get hit in
>>> practice on x86 because the guest memory map is unlikely to be much
>>> different from the host's.
>>>
>>> It seems like we basically need a general way of communicating fixed and
>>> movable host reservations to userspace :/
>>>
>>
>> Yes, this makes sense to me.   Do we have any existing way of
>> discovering this from userspace or can we think of something?
> 
> I know virtually nothing about the userspace interface, but I was under
> the impression it would require something new. I wasn't even aware you
> could do the VFIO-under-QEMU-TCG thing which Eric points out,
I meant running a non x86 VM on an x86 host. Quoting Alex:

"x86 isn't problem-free in this space.  An x86 VM is going to know that
the 0xfee00000 address range is special, it won't be backed by RAM and
won't be a DMA target, thus we'll never attempt to map it for an iova
address.  However, if we run a non-x86 VM or a userspace driver, it
doesn't necessarily know that there's anything special about that range
of iovas.  I intend to resolve this with an extension to the iommu info
ioctl that describes the available iova space for the iommu.  The
interrupt region would simply be excluded."

In my v12 I added such VFIO IOMMU info ioctl to retrieve the MSI
topology. Now for the issue you pointed out (PCI without ACS) I
understand this is a generalisation of the same issue and the VFIO IOMMU
info capability chain API could be used as well. I can submit something
separately. But anyway at QEMU level, due to the static mapping in
mach-virt, at the moment, we just can reject the assignment I am afraid.

Thanks

Eric

 so it
> seems like the general "tell userspace about addresses it can't use"
> issue is perhaps the more pressing one. On investigation, QEMU's static
> memory map with RAM at 0x4000000 is already busted for VFIO on Juno, as
> that results in attempting DMA to config space, which goes about as well
> as one might expect.
> 
> Robin.
> 
>>
>> Thanks,
>> -Christoffer
>>
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC 05/11] iommu/dma: iommu_dma_(un)map_mixed
@ 2016-10-04 17:37           ` Auger Eric
  0 siblings, 0 replies; 38+ messages in thread
From: Auger Eric @ 2016-10-04 17:37 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Robin,

On 04/10/2016 19:18, Robin Murphy wrote:
> On 02/10/16 10:56, Christoffer Dall wrote:
>> On Fri, Sep 30, 2016 at 02:24:40PM +0100, Robin Murphy wrote:
>>> Hi Eric,
>>>
>>> On 27/09/16 21:48, Eric Auger wrote:
>>>> iommu_dma_map_mixed and iommu_dma_unmap_mixed operate on
>>>> IOMMU_DOMAIN_MIXED typed domains. On top of standard iommu_map/unmap
>>>> they reserve the IOVA window to prevent the iova allocator to
>>>> allocate in those areas.
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>> ---
>>>>  drivers/iommu/dma-iommu.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++
>>>>  include/linux/dma-iommu.h | 18 ++++++++++++++++++
>>>>  2 files changed, 66 insertions(+)
>>>>
>>>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>>>> index 04bbc85..db21143 100644
>>>> --- a/drivers/iommu/dma-iommu.c
>>>> +++ b/drivers/iommu/dma-iommu.c
>>>> @@ -759,3 +759,51 @@ int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>>>  	return 0;
>>>>  }
>>>>  EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
>>>> +
>>>> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
>>>> +			phys_addr_t paddr, size_t size, int prot)
>>>> +{
>>>> +	struct iova_domain *iovad;
>>>> +	unsigned long lo, hi;
>>>> +	int ret;
>>>> +
>>>> +	if (domain->type != IOMMU_DOMAIN_MIXED)
>>>> +		return -EINVAL;
>>>> +
>>>> +	if (!domain->iova_cookie)
>>>> +		return -EINVAL;
>>>> +
>>>> +	iovad = cookie_iovad(domain);
>>>> +
>>>> +	lo = iova_pfn(iovad, iova);
>>>> +	hi = iova_pfn(iovad, iova + size - 1);
>>>> +	reserve_iova(iovad, lo, hi);
>>>
>>> This can't work reliably - reserve_iova() will (for good reason) merge
>>> any adjacent or overlapping entries, so any unmap is liable to free more
>>> IOVA space than actually gets unmapped, and things will get subtly out
>>> of sync and go wrong later.
>>>
>>> The more general issue with this whole approach, though, is that it
>>> effectively rules out userspace doing guest memory hotplug or similar,
>>> and I'm not we want to paint ourselves into that corner. Basically, as
>>> soon as a device is attached to a guest, the entirety of the unallocated
>>> IPA space becomes reserved, and userspace can never add anything further
>>> to it, because any given address *might* be in use for an MSI mapping.
>>
>> Ah, we didn't think of that when discussing this design at KVM Forum,
>> because the idea was that the IOVA allocator was in charge of that
>> resource, and the IOVA was a separate concept from the IPA space.
>>
>> I think what tripped us up, is that while the above is true for the MSI
>> configuration where we trap the bar and do the allocation at VFIO init
>> time, the guest device driver can program DMA to any address without
>> trapping, and therefore there's an inherent relationship between the
>> IOVA and the IPA space.  Is that right?
> 
> Yes, for anything the guest knows about and/or can touch directly, IOVA
> must equal IPA, or DMA is going to go horribly wrong. It's only direct
> interactions between device and host behind the guest's back where we
> (may) have some freedom with IOVA assignment.
> 
>>> I think it still makes most sense to stick with the original approach of
>>> cooperating with userspace to reserve a bounded area - it's just that we
>>> can then let automatic mapping take care of itself within that area.
>>
>> I was thinking that it's also possible to do it the other way around: To
>> let userspace say wherever memory may be hotplugged and do the
>> allocation within the remaining area, but I suppose that's pretty much
>> the same thing, and it should just depend on what's easiest to implement
>> and what userspace can best predict.
> 
> Indeed, if userspace *is* able to pre-emptively claim everything it
> might ever want, that does kind of implicitly solve the "tell me where I
> can put this" problem (assuming it doesn't simply claim the whole
> address space, of course), but I'm not so sure it works well if there
> are any specific restrictions (e.g. if some device is going to require
> the MSI range to be 32-bit addressable). It also fails to address the
> issue below...
> 
>>> Speaking of which, I've realised the same fundamental reservation
>>> problem already applies to PCI without ACS, regardless of MSIs. I just
>>> tried on my Juno with guest memory placed at 0x4000000000, (i.e.
>>> matching the host PA of the 64-bit PCI window), and sure enough when the
>>> guest kicks off some DMA on the passed-through NIC, the root complex
>>> interprets the guest IPA as (unsupported) peer-to-peer DMA to a BAR
>>> claimed by the video card, and it fails. I guess this doesn't get hit in
>>> practice on x86 because the guest memory map is unlikely to be much
>>> different from the host's.
>>>
>>> It seems like we basically need a general way of communicating fixed and
>>> movable host reservations to userspace :/
>>>
>>
>> Yes, this makes sense to me.   Do we have any existing way of
>> discovering this from userspace or can we think of something?
> 
> I know virtually nothing about the userspace interface, but I was under
> the impression it would require something new. I wasn't even aware you
> could do the VFIO-under-QEMU-TCG thing which Eric points out,
I meant running a non x86 VM on an x86 host. Quoting Alex:

"x86 isn't problem-free in this space.  An x86 VM is going to know that
the 0xfee00000 address range is special, it won't be backed by RAM and
won't be a DMA target, thus we'll never attempt to map it for an iova
address.  However, if we run a non-x86 VM or a userspace driver, it
doesn't necessarily know that there's anything special about that range
of iovas.  I intend to resolve this with an extension to the iommu info
ioctl that describes the available iova space for the iommu.  The
interrupt region would simply be excluded."

In my v12 I added such VFIO IOMMU info ioctl to retrieve the MSI
topology. Now for the issue you pointed out (PCI without ACS) I
understand this is a generalisation of the same issue and the VFIO IOMMU
info capability chain API could be used as well. I can submit something
separately. But anyway at QEMU level, due to the static mapping in
mach-virt, at the moment, we just can reject the assignment I am afraid.

Thanks

Eric

 so it
> seems like the general "tell userspace about addresses it can't use"
> issue is perhaps the more pressing one. On investigation, QEMU's static
> memory map with RAM at 0x4000000 is already busted for VFIO on Juno, as
> that results in attempting DMA to config space, which goes about as well
> as one might expect.
> 
> Robin.
> 
>>
>> Thanks,
>> -Christoffer
>>
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2016-10-04 17:37 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-27 20:48 [RFC 00/11] KVM PCIe/MSI passthrough on ARM/ARM64: re-design with transparent MSI mapping Eric Auger
2016-09-27 20:48 ` Eric Auger
2016-09-27 20:48 ` Eric Auger
2016-09-27 20:48 ` [RFC 01/11] iommu: Add iommu_domain_msi_geometry and DOMAIN_ATTR_MSI_GEOMETRY Eric Auger
2016-09-27 20:48   ` Eric Auger
2016-09-27 20:48 ` [RFC 02/11] iommu: Introduce IOMMU_CAP_TRANSLATE_MSI capability Eric Auger
2016-09-27 20:48   ` Eric Auger
2016-09-27 20:48 ` [RFC 03/11] iommu: Introduce IOMMU_DOMAIN_MIXED Eric Auger
2016-09-27 20:48   ` Eric Auger
2016-09-27 20:48 ` [RFC 04/11] iommu/dma: Allow MSI-only cookies Eric Auger
2016-09-27 20:48   ` Eric Auger
2016-09-27 20:48 ` [RFC 05/11] iommu/dma: iommu_dma_(un)map_mixed Eric Auger
2016-09-27 20:48   ` Eric Auger
2016-09-30 13:24   ` Robin Murphy
2016-09-30 13:24     ` Robin Murphy
2016-09-30 13:24     ` Robin Murphy
2016-10-02  9:56     ` Christoffer Dall
2016-10-02  9:56       ` Christoffer Dall
2016-10-02  9:56       ` Christoffer Dall
2016-10-04 17:18       ` Robin Murphy
2016-10-04 17:18         ` Robin Murphy
2016-10-04 17:37         ` Auger Eric
2016-10-04 17:37           ` Auger Eric
2016-10-03  9:38     ` Auger Eric
2016-10-03  9:38       ` Auger Eric
2016-10-03  9:38       ` Auger Eric
2016-09-27 20:48 ` [RFC 06/11] iommu/arm-smmu: Allow IOMMU_DOMAIN_MIXED domain allocation Eric Auger
2016-09-27 20:48   ` Eric Auger
2016-09-27 20:48 ` [RFC 07/11] iommu: Use IOMMU_DOMAIN_MIXED typed domain when IOMMU translates MSI Eric Auger
2016-09-27 20:48   ` Eric Auger
2016-09-27 20:48 ` [RFC 08/11] vfio/type1: Sets the IOVA window in case MSI IOVA need to be allocated Eric Auger
2016-09-27 20:48   ` Eric Auger
2016-09-27 20:48 ` [RFC 09/11] vfio/type1: Reserve IOVAs for IOMMU_DOMAIN_MIXED domains Eric Auger
2016-09-27 20:48   ` Eric Auger
2016-09-27 20:48 ` [RFC 10/11] iommu/arm-smmu: Do not advertise IOMMU_CAP_INTR_REMAP Eric Auger
2016-09-27 20:48   ` Eric Auger
2016-09-27 20:48 ` [RFC 11/11] iommu/arm-smmu: Advertise IOMMU_CAP_TRANSLATE_MSI Eric Auger
2016-09-27 20:48   ` Eric Auger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.